[04:22] who is "Start" on the archive team wiki
[04:22] apparently he downloaded foxytunes
[04:34] Had a nice chat with Brewster.
[04:34] It's nice to have a boss/ceo you really love just chit-chatting with
[04:36] aw :)
[04:48] SketchCow seems to have a man crush
[04:49] I do like the guy a lot.
[05:53] First machine reboot in a while!
[05:53] 1:53am, how did you arrive
[08:07] -rw-r--r-- 1 tim.bowers games 984M Jul 5 09:07 ./bin/ign/storage/pouet/pouet.net_06052013.cdx
[08:07] -rw-r--r-- 1 tim.bowers games 247G Jul 5 09:07 ./bin/ign/storage/pouet/pouet.net_06052013.warc
[08:07] keep on ROLLIN'
[08:07] 2 Months and counting.
[08:10] Smiley, have you looked at the cdx file for that warc
[08:11] skim off the first 100,000 lines and set it up as a gz file for me to grab, I need to check a hypothesis
[08:12] omf_: not yet, but I can do
[08:13] head -n -10000 ./file.cdx > ./file_for_omf ? :D
[08:13] ooh 100000
[08:14] will take a while I feel D:
[08:14] for 100k lines, it should be short
[08:15] wait, you WANT the first 100k?
[08:16] sup timmeh
[08:16] 100k is shorthand for 100,000
[08:16] yes
[08:17] i thought you wanted me to strip off the first 100k :P
[08:17] where to put this file d:
[08:17] Smiley, looking at your command and my comment I was unclear
[08:18] me too :D
[08:18] I want: head -n 100000 ./cdx > new_file_
[08:18] where shall I put this, I doubt many pastebins like 100k lines being pasted.
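A minimal sketch of the request above (first 100,000 lines of the cdx, delivered as a gz file), assuming `gzip` is installed. The helper name `first_lines_gz` and the paths in the usage comment are hypothetical and do not appear in the log:

```shell
#!/bin/sh
# Hypothetical helper (not from the log): copy the first N lines of a file
# into a gzip-compressed output file.
# Usage: first_lines_gz INPUT N OUTPUT
first_lines_gz() {
    # head -n N keeps the first N lines. With GNU head, a negative count
    # (head -n -N) instead prints everything except the last N lines,
    # which is not what was asked for here.
    head -n "$2" "$1" | gzip -c > "$3"
}

# Example call with placeholder paths:
# first_lines_gz ./file.cdx 100000 ./file_for_omf.gz
```

Piping straight into `gzip` avoids writing an uncompressed intermediate file, and `gzip -dc OUTPUT | wc -l` can confirm the line count afterwards.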
[08:18] omf_: yeah, I've got it :)
[08:18] just stick it in /home on anarchive if you can
[08:19] i can ;)
[08:19] done
[08:21] thanks
[08:26] Smiley, what are the stats when you: wc blah.cdx
[08:28] well lines was 100,000 :D
[08:28] tim.bowers@timDesktop ~ $ wc ./for_omf 100000 1100001 22557370 ./for_omf
[08:29] I mean the original please
[09:14] oh
[09:14] * Smiley calculates
[09:19] so i'm now backing up 2012 of techcrunch.com
[09:22] i have 7.4gb of techcrunch.com so far
[09:23] tim.bowers@timDesktop ~ $ wc ./bin/ign/storage/pouet/pouet.net_06052013.cdx 3868315 42551466 1032158108 ./bin/ign/storage/pouet/pouet.net_06052013.cdx
[09:23] omf_:
[09:24] I see
[09:32] the rest of hackaday.com is going up
[09:32] just the first 6 months of 2013
[10:04] http://www.flickr.com/photos/textfiles/sets/72157634488809303/with/9215318638/
[10:31] Beautifully
[14:18] http://web.archive.org/web/*/www.vms2linux.de/ods5fs.html -- wtf? that site has no robots.txt!
[14:41] The requested URL /robots.txt was not found on this server.
[14:42] indeed, but at time of crawling?
[14:50] i found out when it does have one: http://web.archive.org/web/20110725130646/http://vms2linux.de/robots.txt
[14:50] and again: http://web.archive.org/web/20111231183927/http://vms2linux.de/robots.txt
[14:51] that site is very weird
[14:51] it sometimes has robots.txt and then the next crawl doesn't have it
[14:56] two people uploading and one has deleted it? XD