#urlteam 2011-12-19,Mon


Time Nickname Message
19:21 n00b311 Anyone here? If so, your website appears to have been hacked: http://urlte.am/2011.06.ARCHIVE/html.php?good2.png
19:21 n00b311 Redirects to a Russian malware site.
20:32 soultcer SketchCow: Are you there?
20:33 SketchCow I am always here.
20:33 SketchCow I saw that mention, I have to get in there and fix.
20:35 soultcer Any chance we can get the urlteam site away from dreamhost?
20:35 SketchCow It's primarily static, that's true.
20:35 soultcer The urlte.am site is 100% static, so it's not a broken script or something, it's a problem with the hosting
20:36 chronomex I can run it
20:36 soultcer https://github.com/ArchiveTeam/urlteam-stuff/tree/master/website contains the code for the website, in case you need one that is malware-free
20:36 chronomex rad
20:36 chronomex maybe I'll look into putting up a redirector too
20:37 soultcer chronomex: That will need a lot of disk space
20:38 soultcer btw, dec 31st is coming up, 6 months after our last release
20:40 soultcer I started preparing the next release, but we might have to push it back a little
21:03 chronomex what will eat up disk space exactly?
21:03 bsmith094 what release? of what?
21:03 soultcer The redirector?
21:04 soultcer bsmith094: We put out a torrent of all our data (~40 GB of compressed txt files) on May 31st and planned on releasing the next one 6 months after that
21:04 soultcer We are actually already over the 6 months
21:09 chronomex hm. maybe I should get a bigger linode instance, if it's that big
21:09 chronomex I forgot how big!
21:10 soultcer 40 GB of compressed material in last torrent = 160 GB of uncompressed data, without even any indexes (you could do binary search or something though)
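soultcer's aside that the uncompressed dump has no index but could still be searched by bisection can be sketched as below. Assumptions not stated in the log: each uncompressed file holds one `shortcode|url` record per line, sorted bytewise by shortcode (for example with `LC_ALL=C sort -t '|' -k1,1`), and `lookup` is a hypothetical helper name. The search seeks to byte offsets and skips forward to the next line start, so the file is never loaded into memory.

```python
import os
from typing import BinaryIO, Optional


def _line_at(f: BinaryIO, pos: int) -> bytes:
    """Return the first complete line that starts at or after byte `pos`."""
    if pos == 0:
        f.seek(0)
    else:
        f.seek(pos - 1)
        f.readline()  # finish the line that contains byte pos-1
    return f.readline()  # b"" at end of file


def lookup(path: str, code: str) -> Optional[str]:
    """Binary-search a sorted `shortcode|url` file (hypothetical layout) for `code`."""
    target = code.encode("utf-8")
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path)
        # Find the smallest offset whose following line has key >= target (or is EOF).
        while lo < hi:
            mid = (lo + hi) // 2
            line = _line_at(f, mid)
            if line and line.split(b"|", 1)[0] < target:
                lo = mid + 1
            else:
                hi = mid
        line = _line_at(f, lo)
        key, _, url = line.rstrip(b"\r\n").partition(b"|")
        return url.decode("utf-8") if line and key == target else None
```

A call such as `lookup("tinyurl.txt", "abc123")` (file name hypothetical) needs only about log2 of the file size in probes, roughly 38 for a single 160 GB file.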
21:10 chronomex I'd probably just throw it into postgres like everything else I have
21:10 soultcer And I assume we add 10-20 GB in the next release, so maybe 50 GB compressed = 200 GB uncompressed
21:11 soultcer Yeah, I would love to try to put the whole data into postgres, but my database server doesn't have enough space :-/
21:11 chronomex maybe I'll run it on my home box, I have space there ;)
21:11 soultcer It would be even cooler if we had reverse search: "Which shorturls link to this domain?" or similar
21:12 chronomex hmmmmmm.
21:12 chronomex that might be worthwhile
21:13 chronomex I'd leave that to whoever else wants to do it :P
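The reverse search soultcer suggests ("Which shorturls link to this domain?") amounts to an inverted index from destination host to short codes. A minimal in-memory sketch, assuming records arrive as hypothetical `(shortener, code, url)` tuples; at the scale discussed in this channel the same idea would live in a database index rather than a Python dict. Only exact hostnames match; subdomains are not folded in.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple
from urllib.parse import urlsplit

Record = Tuple[str, str, str]  # (shortener, short code, long URL): hypothetical layout


def build_domain_index(records: Iterable[Record]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each destination hostname to the (shortener, code) pairs pointing at it."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for shortener, code, url in records:
        host = urlsplit(url).hostname  # lowercased, port dropped; None if absent
        if host:
            index[host].append((shortener, code))
    return index


def shorturls_for(index: Dict[str, List[Tuple[str, str]]], domain: str) -> List[str]:
    """Return the short URLs whose target host is exactly `domain`."""
    return [f"http://{s}/{c}" for s, c in index.get(domain.lower(), [])]
```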
21:13 bsmith094 wait wait wait, you guys have 40gb!? of url shortener data?
21:13 soultcer bsmith: Compressed with "xz -9 -M 1000M" so it's about 25% of the full data
21:14 soultcer I estimate we only have between 5-10% of bit.ly though, so there is still lots of data we do not have
21:14 soultcer Only problem is that bitly started blocking us
21:14 chronomex fuckers
21:15 bsmith094 whats the xz algorithm?
21:15 bsmith094 new compression?
21:15 soultcer Improved Lempel-Ziv-Markov chain
21:16 chronomex bitchin compression
21:16 soultcer Since I am using lots of ram to compress that thing you will probably need 100mb or so to actually decompress it
21:16 bsmith094 whats the -m option do, i cant find it
21:17 soultcer -M (uppercase) sets maximum memory usage
21:17 soultcer 1000 MB max memory for compressing means between 50 MB and 200 MB memory usage while decompressing according to the xz man page
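For readers who want to handle these files from code, Python's standard `lzma` module writes and reads the same .xz container as the `xz` tool, and `preset=9` corresponds to the `-9` level mentioned above. As far as I know, the module's compressor has no counterpart to `-M`; only the decompressor accepts a memory limit, which is the setting that matters to people downloading the release. The xz man page lists a 64 MiB dictionary for `-9` and about 65 MiB of memory to decompress, within the range soultcer quotes. The 100 MiB default cap below simply mirrors soultcer's estimate and is an assumption, as are the helper names.

```python
import lzma


def pack(text: str, preset: int = 9) -> bytes:
    """Compress text into the .xz container; preset 9 corresponds to `xz -9`."""
    return lzma.compress(text.encode("utf-8"), format=lzma.FORMAT_XZ, preset=preset)


def unpack(blob: bytes, memlimit_mib: int = 100) -> str:
    """Decompress, refusing to let the decoder use more than `memlimit_mib` MiB.

    If the stream needs more memory than the limit, lzma raises LZMAError.
    """
    data = lzma.decompress(blob, format=lzma.FORMAT_XZ, memlimit=memlimit_mib * 1024 * 1024)
    return data.decode("utf-8")
```

For multi-gigabyte files, streaming with `lzma.open(path, "rt")` avoids holding the whole text in memory, although that interface does not take a memory limit.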
21:18 bsmith094 is there a size of archive above which these things have a stroke?
21:18 soultcer Nah, I think the only limit is that we try to stay below 4 GB for the compressed files so fat32 users can still download them
21:19 chronomex 2gb per file is a better metric to hew to
21:19 soultcer Oh, if 2 GB is the limit I have to redo the new tinyurl.com stuff
21:19 bsmith094 how long does it take to compress that much data?
21:19 chronomex bsmith094: weekend or so
21:20 bsmith094 where do u run that, personal machine or a dedicated box
21:20 chronomex bedroom heater
21:20 bsmith094 lol
21:20 soultcer Nvm, the biggest file for tinyurl is 1.3G compressed, though it was something like 5.5 GB uncompressed
21:20 chronomex The maximum possible size for a file on a FAT32 volume is 4 GB minus 1 byte or 4 294 967 295 (2^32 − 1) bytes.
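The size limits in this exchange reduce to byte arithmetic. This sketch checks a file against the FAT32 ceiling quoted above and counts how many pieces a file would need under chronomex's 2 GB rule of thumb. Reading "2 GB" as 2 GiB, and the helper names, are assumptions.

```python
FAT32_MAX_FILE = 2**32 - 1  # 4,294,967,295 bytes: the FAT32 ceiling quoted above
SAFE_CAP = 2 * 1024**3      # "2gb per file", read here as 2 GiB (assumption)


def fits(size_bytes: int, cap: int = FAT32_MAX_FILE) -> bool:
    """True if a file of this size stays at or below `cap` bytes."""
    return size_bytes <= cap


def parts_needed(size_bytes: int, cap: int = SAFE_CAP) -> int:
    """Number of pieces needed so that no piece exceeds `cap` bytes."""
    return max(1, -(-size_bytes // cap))  # integer ceiling division
```

The 1.3 GB compressed tinyurl file passes both checks, while its roughly 5.5 GB uncompressed form would not fit on FAT32 at all and would need three 2 GiB pieces.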
21:20 soultcer It's winter here and I still haven't turned on the heat in my room
21:21 chronomex soultcer: where?
21:21 soultcer Innsbruck, Austria
21:21 chronomex psh, australia has different seasons
21:21 bsmith094 what kind of computing resources do you have? anybody
21:21 bsmith094 austria
21:21 chronomex :P
21:21 chronomex I have computers?
21:22 chronomex I don't know, what are you asking?
21:22 bsmith094 like mega huge racks or just a particularly powerful personal machine?
21:22 soultcer I have some cheap-ass virtual servers for the tinyurl scraping and some typical computer gear (intel core 2 something something) to do the sorting, compressing, ...
21:22 soultcer My personal machine is over 5 years old :P
21:22 chronomex similar
21:22 bsmith094 i KNEW virtualization would pay off at some point
21:22 chronomex if you don't mind waiting, it's not that bad
21:23 bsmith094 ive been helping archiveteam on a 4 year old laptop
21:23 soultcer It's cheaper to buy a new VPS at 123systems or buyvm than getting a second IP address on a vps i already own
21:23 bsmith094 thats odd
21:24 chronomex that's ridic
21:28 soultcer Well, back to the new release thingie
21:28 soultcer What else did we download in the last 6 months?
21:28 soultcer I'll admit, I was kinda lazy this time, I only got tinyurl 6-letter-stuff
21:29 chronomex I didn't do shit on urlteam
21:29 chronomex but I would like to get http://xy.zz.urlte.am/foobar redirecting
21:29 soultcer Your is.gd scrape is now worth 1000x more because they started rate-limiting and we got almost all from their pre-rate-limit stuff
21:30 chronomex <3
21:30 chronomex good
21:30 soultcer xy.z.urlte.am/foobar sounds nice, we would need a server with some big hdd for that though
21:30 chronomex I'm willing to figure it out
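chronomex's `http://xy.zz.urlte.am/foobar` idea suggests carrying the shortener's hostname as a subdomain of urlte.am and the short code as the path. A minimal WSGI sketch under that reading, assuming wildcard DNS for `*.urlte.am` points at the server; `parse_request`, `make_app`, and the injected `lookup` callable are hypothetical names, and the storage behind `lookup` is left open. Short codes are treated as case-sensitive, so only the host is lowercased.

```python
from typing import Callable, Optional, Tuple

SUFFIX = ".urlte.am"


def parse_request(host: str, path: str) -> Optional[Tuple[str, str]]:
    """Split e.g. ('is.gd.urlte.am', '/foobar') into ('is.gd', 'foobar')."""
    host = host.split(":", 1)[0].lower()  # drop any :port
    if not host.endswith(SUFFIX):
        return None
    shortener = host[: -len(SUFFIX)]
    code = path.lstrip("/")
    if not shortener or not code:
        return None
    return shortener, code


def make_app(lookup: Callable[[str, str], Optional[str]]):
    """Build a WSGI app that 301-redirects archived short URLs and 404s the rest."""
    def app(environ, start_response):
        parsed = parse_request(environ.get("HTTP_HOST", ""), environ.get("PATH_INFO", ""))
        url = lookup(*parsed) if parsed else None
        if url is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"not in the urlteam archive\n"]
        start_response("301 Moved Permanently", [("Location", url)])
        return [b""]
    return app
```

Served locally with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`, a request with `Host: is.gd.urlte.am` and path `/foobar` would get a 301 to the archived target, or a 404 if that code was never scraped.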
21:30 soultcer Do you think dreamhost with their "unlimited" offer would mind if we fill their mysql db with a couple billion rows?
21:31 chronomex good question.
21:31 chronomex mysql is rarely the right answer
21:31 chronomex I think this may be one of those rare times
21:31 soultcer Well, it's just key-value, what could go wrong?
21:31 chronomex yes, that's all mysql is good for :P
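Since the redirector only needs key-value lookups, the storage side can be sketched as a single table. This uses Python's built-in `sqlite3` as a stand-in for the MySQL or PostgreSQL options discussed here; the table and column names are hypothetical. The composite primary key doubles as the lookup index, and `WITHOUT ROWID` stores rows in that key order, so no second index is needed. Whether any of this holds up at a couple billion rows is untested.

```python
import sqlite3
from typing import Iterable, Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS shorturls (
    shortener TEXT NOT NULL,   -- e.g. 'is.gd'
    code      TEXT NOT NULL,   -- case-sensitive short code
    url       TEXT NOT NULL,   -- long URL the code redirected to
    PRIMARY KEY (shortener, code)
) WITHOUT ROWID
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def load(conn: sqlite3.Connection, rows: Iterable[Tuple[str, str, str]]) -> None:
    with conn:  # one transaction for the whole batch, committed on success
        conn.executemany("INSERT OR REPLACE INTO shorturls VALUES (?, ?, ?)", rows)


def resolve(conn: sqlite3.Connection, shortener: str, code: str) -> Optional[str]:
    row = conn.execute(
        "SELECT url FROM shorturls WHERE shortener = ? AND code = ?",
        (shortener, code),
    ).fetchone()
    return row[0] if row else None
```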
21:33 soultcer The downside is that you have to prepay for two years, and if they decide after one month that they don't like you, you are out 23 months of webhosting fees
21:34 soultcer Maybe get one of the storage plans on buyvm.net, slap some postgresql on it and see what happens?
21:35 soultcer http://buyvm.net/ -> Storage button
21:38 chronomex hm, cheap.
21:38 soultcer And they could double as seed box for the torrent
21:38 soultcer Although network bandwidth outgoing is limited
21:39 chronomex I might have to move numbertron over to buyvm if it outgrows what I can afford on linode
21:39 soultcer Well those plans are for "backup storage", so the hdd will probably be rather slow
21:39 chronomex aye
21:57 soultcer Hehe, there is actually someone downloading the urlteam torrent from me right now
21:58 soultcer A seedbox apparently. That's sweet, it will hopefully seed after it finished
22:11 soultcer Oh, this reminds me
22:13 soultcer Blaine Cook, who was lead developer at Twitter, started a website called tinyarchive.org a few years ago in response to the "url shorteners are bad" outcry
22:13 soultcer He made it so that every URL with a 301 redirect that was submitted to his site was archived there
22:13 chronomex hm, seems dead-ish
22:14 soultcer i.e. you visited tinyarchive.org/http://bit.ly/bla and it would store the bit.ly link
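The tinyarchive.org scheme soultcer describes puts the short URL directly in the request path. A minimal sketch of recovering it from a path like `/http://bit.ly/bla`; this is not taken from the tinyarchive source, and the function name is hypothetical. It also accepts a single slash after the scheme, on the assumption that some front-end proxy might collapse `//`. Resolving the 301 and storing the result is left out.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# "/http://host/code" or "/http:/host/code" (double slash possibly collapsed)
_EMBEDDED = re.compile(r"^/(https?):/+(.+)$")


def embedded_short_url(path: str) -> Optional[str]:
    """Return the short URL embedded in a request path, or None if there is none."""
    m = _EMBEDDED.match(path)
    if not m:
        return None
    url = f"{m.group(1)}://{m.group(2)}"
    return url if urlsplit(url).hostname else None
```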
22:14 chronomex hm.
22:14 soultcer He said "It's on google app engine so I can just buy more quota if we get millions of links"
22:14 soultcer Turns out he didn't think that through (remember: Lead dev at twitter) since buying storage for all bit.ly links on google's app engine would be expensive
22:14 chronomex heh.
22:15 soultcer And he later abandoned the domain
22:15 chronomex figures.
22:15 soultcer The source is still on github: https://github.com/blaine/tinyarchive
22:15 soultcer And when I noticed the domain was unused I bought it
22:15 chronomex smrt
22:15 soultcer So, we do have control of the tinyarchive.org domain in case we want to use it
22:16 soultcer I don't know if using urlte.am or tinyarchive.org is better. Or maybe we should use both, I don't really know
22:17 chronomex I like urlte.am
22:17 bsmith094 do you also have the data that went with it?
22:17 soultcer bsmith094: Nope, but it was just a couple of random bit.ly links I think. As I said, he didn't really think the whole thing through and figured it would be only a few million links instead of a couple billion
22:18 bsmith094 ah, oh well
22:18 bsmith094 when you buy a domain from someone who already has it, do they have to also manually give you the data that goes with it, or is that automatic?
22:19 chronomex I think soultcer is saying he bought it on the drop
22:19 soultcer I bought the domain after he dropped it, so technically I bought a "fresh" domain
22:21 soultcer Some squatter owns tinyarchive.com though, so we are in the same situation as urlteam, where urlteam.com, urlteam.net and urlteam.org are owned by squatters
22:23 chronomex fuckers
22:23 chronomex owell
22:23 chronomex can't own everything, right?
22:24 soultcer Would be kind of expensive
22:25 chronomex kindof
22:31 soultcer xz is amazing. Compressed from 1.1 GB to 197 MB
22:43 bsmith094 over 90 percent, wow that is good
22:44 soultcer Over 90?
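Working the arithmetic behind soultcer's doubt: going from 1.1 GB to 197 MB is about an 82% reduction, not over 90%. The same two helpers reproduce the earlier estimate that ~40 GB compressed at roughly 25% of the original means ~160 GB uncompressed. Treating 1.1 GB as 1100 MB is an assumption; using 1126.4 MiB instead gives about 82.5%. The helper names are hypothetical.

```python
def reduction(original: float, compressed: float) -> float:
    """Fraction of the original size removed by compression."""
    return 1 - compressed / original


def estimate_original(compressed: float, ratio: float) -> float:
    """Original size implied by a compressed size and a compressed/original ratio."""
    return compressed / ratio


print(f"{reduction(1100, 197):.1%}")  # 82.1%  (1.1 GB -> 197 MB)
print(estimate_original(40, 0.25))    # 160.0  (GB, the 21:10 estimate)
```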
22:56 chronomex yeah, xz is rad
