[19:21] Anyone here? If so, your website appears to have been hacked: http://urlte.am/2011.06.ARCHIVE/html.php?good2.png
[19:21] Redirects to a Russian malware site.
[20:32] SketchCow: Are you there?
[20:33] I am always here.
[20:33] I saw that mention, I have to get in there and fix it.
[20:35] Any chance we can get the urlteam site away from dreamhost?
[20:35] It's primarily static, that's true.
[20:35] The urlte.am site is 100% static, so it's not a broken script or something, it's a problem with the hosting
[20:36] I can run it
[20:36] https://github.com/ArchiveTeam/urlteam-stuff/tree/master/website contains the code for the website, in case you need one that is malware-free
[20:36] rad
[20:36] maybe I'll look into putting up a redirector too
[20:37] chronomex: That will need a lot of disk space
[20:38] btw, dec 31st is coming up, 6 months after our last release
[20:40] I started preparing the next release, but we might have to push it back a little
[21:03] what will eat up disk space exactly?
[21:03] what release? of what?
[21:03] The redirector?
[21:04] bsmith094: We put out a torrent of all our data (~40 GB of compressed txt files) on May 31st and planned on releasing the next one 6 months after that
[21:04] We are actually already past the 6 months
[21:09] hm. maybe I should get a bigger linode instance, if it's that big
[21:09] I forgot how big!
[21:10] 40 GB of compressed material in the last torrent = 160 GB of uncompressed data, without even any indexes (you could do binary search or something though)
[21:10] I'd probably just throw it into postgres like everything else I have
[21:10] And I assume we add 10-20 GB in the next release, so maybe 50 GB compressed = 200 GB uncompressed
[21:11] Yeah, I would love to try to put all the data into postgres, but my database server doesn't have enough space :-/
[21:11] maybe I'll run it on my home box, I have space there ;)
[21:11] It would be even cooler if we had reverse search: "Which shorturls link to this domain?" or similar
[21:12] hmmmmmm.
[21:12] that might be worthwhile
[21:13] I'd leave that to whoever else wants to do it :P
[21:13] wait wait wait, you guys have 40 GB!? of url shortener data?
[21:13] bsmith: Compressed with "xz -9 -M 1000M" so it's about 25% of the full data
[21:14] I estimate we only have between 5-10% of bit.ly though, so there is still lots of data we do not have
[21:14] Only problem is that bit.ly started blocking us
[21:14] fuckers
[21:15] what's the xz algorithm?
[21:15] new compression?
[21:15] Improved Lempel-Ziv-Markov chain
[21:16] bitchin compression
[21:16] Since I am using lots of ram to compress that thing, you will probably need 100 MB or so to actually decompress it
[21:16] what does the -m option do? I can't find it
[21:17] -M (uppercase) sets maximum memory usage
[21:17] 1000 MB max memory for compressing means between 50 MB and 200 MB memory usage while decompressing, according to the xz man page
[21:18] is there a size of archive above which these things have a stroke?
[21:18] Nah, I think the only limit is that we try to stay below 4 GB for the compressed files so FAT32 users can still download them
[21:19] 2 GB per file is a better metric to hew to
[21:19] Oh, if 2 GB is the limit I have to redo the new tinyurl.com stuff
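(On the binary-search remark at 21:10: since the release files are sorted plain-text dumps, a single lookup needs no index at all. Below is a minimal Python sketch of the idea, assuming a bytewise-sorted file of tab-separated "code<TAB>long URL" lines; the file name, code, and exact column layout are hypothetical, not the actual release format.)

    import os

    def lookup(path, code):
        """Binary-search a large, bytewise-sorted 'code<TAB>long-url' text file."""
        target = code.encode()
        with open(path, "rb") as f:
            lo, hi = 0, os.path.getsize(path)
            while lo < hi:
                mid = (lo + hi) // 2
                f.seek(mid)
                if mid > 0:
                    f.readline()        # probably landed mid-line; skip to the next full line
                line = f.readline()
                if not line:            # ran off the end of the file; search the lower half
                    hi = mid
                    continue
                if line.split(b"\t", 1)[0] < target:
                    lo = f.tell()       # every line up to here sorts before the target
                else:
                    hi = mid
            # lo now sits at most one line before the first key >= target,
            # so scan forward a line or two and check for an exact match.
            f.seek(lo)
            for line in iter(f.readline, b""):
                key, _, url = line.rstrip(b"\n").partition(b"\t")
                if key == target:
                    return url.decode()
                if key > target:
                    break
        return None

    # Hypothetical usage:
    # print(lookup("tinyurl_6char_sorted.txt", "abc123"))

Each lookup only touches a logarithmic number of disk blocks, so answering "what does code X point to" doesn't strictly require loading the dumps into postgres; a database only becomes necessary for things like the reverse search mentioned above.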
[21:19] how long does it take to compress that much data?
[21:19] bsmith094: a weekend or so
[21:20] where do u run that, personal machine or a dedicated box?
[21:20] bedroom heater
[21:20] lol
[21:20] Nvm, the biggest file for tinyurl is 1.3 GB compressed, though it was something like 5.5 GB uncompressed
[21:20] The maximum possible size for a file on a FAT32 volume is 4 GB minus 1 byte, or 4,294,967,295 (2^32 − 1) bytes.
[21:20] It's winter here and I still haven't turned on the heat in my room
[21:21] soultcer: where?
[21:21] Innsbruck, Austria
[21:21] psh, australia has different seasons
[21:21] what kind of computing resources do you have? anybody
[21:21] austria
[21:21] :P
[21:21] I have computers?
[21:22] I don't know, what are you asking?
[21:22] like mega huge racks or just a particularly powerful personal machine?
[21:22] I have some cheap-ass virtual servers for the tinyurl scraping and some typical computer gear (intel core 2 something something) to do the sorting, compressing, ...
[21:22] My personal machine is over 5 years old :P
[21:22] similar
[21:22] I KNEW virtualization would pay off at some point
[21:22] if you don't mind waiting, it's not that bad
[21:23] I've been helping archiveteam on a 4-year-old laptop
[21:23] It's cheaper to buy a new VPS at 123systems or buyvm than getting a second IP address on a VPS I already own
[21:23] that's odd
[21:24] that's ridic
[21:28] Well, back to the new release thingie
[21:28] What else did we download in the last 6 months?
[21:28] I'll admit, I was kinda lazy this time, I only got the tinyurl 6-letter stuff
[21:29] I didn't do shit on urlteam
[21:29] but I would like to get http://xy.zz.urlte.am/foobar redirecting
[21:29] Your is.gd scrape is now worth 1000x more because they started rate-limiting, and we got almost all of it from before the rate limit
[21:30] <3
[21:30] good
[21:30] xy.z.urlte.am/foobar sounds nice, we would need a server with some big hdd for that though
[21:30] I'm willing to figure it out
[21:30] Do you think dreamhost with their "unlimited" offer would mind if we filled their mysql db with a couple billion rows?
[21:31] good question.
[21:31] mysql is rarely the right answer
[21:31] I think this may be one of those rare times
[21:31] Well, it's just key-value, what could go wrong?
[21:31] yes, that's all mysql is good for :P
[21:33] The downside is that you have to prepay for two years, and if they decide after one month that they don't like you, you are out 23 months of webhosting fees
[21:34] Maybe get one of the storage plans on buyvm.net, slap some postgresql on it and see what happens?
[21:35] http://buyvm.net/ -> Storage button
[21:38] hm, cheap.
[21:38] And they could double as a seed box for the torrent
[21:38] Although outgoing network bandwidth is limited
[21:39] I might have to move numbertron over to buyvm if it outgrows what I can afford on linode
[21:39] Well, those plans are for "backup storage", so the hdd will probably be rather slow
[21:39] aye
[21:57] Hehe, there is actually someone downloading the urlteam torrent from me right now
[21:58] A seedbox apparently. That's sweet, it will hopefully seed after it finishes
[22:11] Oh, this reminds me
[22:13] Blaine Cook, who was lead developer at Twitter, started a website called tinyarchive.org a few years ago in response to the "url shorteners are bad" outcry
[22:13] He made it so that every URL with a 301 redirect that was submitted to his site was archived there
[22:13] hm, seems dead-ish
[22:14] i.e. you visited tinyarchive.org/http://bit.ly/bla and it would store the bit.ly link
[22:14] hm.
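(A minimal Python sketch of that idea, i.e. what tinyarchive did and what shortener scrapers do in principle: ask the shortener where a code points and record the Location header of its 301/302 response instead of following it. This is an illustration only, not the project's actual scraper code; the example URL is the one from the log, and some shorteners may require a GET rather than a HEAD.)

    import http.client
    from urllib.parse import urlparse

    def resolve(short_url):
        """Ask the shortener where a link points, without following the redirect."""
        u = urlparse(short_url)
        conn = http.client.HTTPConnection(u.netloc, timeout=10)
        try:
            conn.request("HEAD", u.path or "/")
            resp = conn.getresponse()
            if resp.status in (301, 302):
                return resp.getheader("Location")
            return None                 # no redirect: code unused, deleted, or blocked
        finally:
            conn.close()

    # Hypothetical usage:
    # print(resolve("http://bit.ly/bla"))

Nothing beyond the redirect itself is fetched, which is what makes scraping billions of codes feasible at all.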
[22:14] He said "It's on google app engine so I can just buy more quote if we get millions of links" [22:14] Turns out he didn't think that through (remember: Lead dev at twitter) since buying storage for all bit.ly links on google's app engine would be expensive [22:14] heh. [22:15] And he later abandoned the domain [22:15] figures. [22:15] The source is still on github: https://github.com/blaine/tinyarchive [22:15] And when I noticed the domain was unused I bought it [22:15] smrt [22:15] So, we do have control of the tinyarchive.org domain in case we want to use it [22:16] I don't know if using urlte.am or tinyarchive.org is better. Or maybe we should use both, I don't really know [22:17] I like urlte.am [22:17] do you also have the data that went with it? [22:17] bsmith094: Nope, but it was just a couple of random bit.ly links I think. As I said, he didn't really think the whole thing through and figured it would be only a few million links instead of a couple billion [22:18] ah, oh well [22:18] when you buy a domain form someone who already has it, do they have to also manually giv you the data that goes with it, or is that automatic? [22:19] I think soultcer is saying he bought it on the drop [22:19] I bought the domain after he dropped it, so technically I bought a "fresh" domain [22:21] Some squatter owns tinyarchive.com though, so we are in the same situation as urlteam, where urlteam.com, urlteam.net and urlteam.org are owned by squatters [22:23] fuckres [22:23] owell [22:23] can't own everything, right? [22:24] Would be kind of expensive [22:25] kindof [22:31] xz is amazing. Compressed from 1.1 GB to 197 MB [22:43] over 90 percent, wow that is good [22:44] Over 90? [22:56] yeah, xz is rad
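(A quick sanity check of that ratio, assuming "1.1 GB" means roughly 1126 MB: 197 MB is about 17-18% of the original, so the saving is closer to 82% than to over 90%, which is presumably what the "Over 90?" is questioning.)

    original_mb = 1.1 * 1024                   # ~1126 MB, assuming "1.1 GB" means GiB
    compressed_mb = 197
    print(compressed_mb / original_mb)         # ~0.175, i.e. about 17.5% of the original size
    print(1 - compressed_mb / original_mb)     # ~0.825, i.e. roughly an 82% reduction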