#archiveteam 2013-07-05,Fri


Time Nickname Message
04:22 🔗 winr4r who is "Start" on the archive team wiki
04:22 🔗 winr4r apparently he downloaded foxytunes
04:34 🔗 SketchCow Had a nice chat with Brewster.
04:34 🔗 SketchCow It's nice to have a boss/ceo you really love just chit-chatting with
04:36 🔗 winr4r aw :)
04:48 🔗 BlueMax SketchCow seems to have a man crush
04:49 🔗 SketchCow I do like the guy a lot.
05:53 🔗 SketchCow First machine reboot in a while!
05:53 🔗 SketchCow 1:53am, how did you arrive
08:07 🔗 Smiley -rw-r--r-- 1 tim.bowers games 984M Jul 5 09:07 ./bin/ign/storage/pouet/pouet.net_06052013.cdx
08:07 🔗 Smiley -rw-r--r-- 1 tim.bowers games 247G Jul 5 09:07 ./bin/ign/storage/pouet/pouet.net_06052013.warc
08:07 🔗 Smiley keep on ROLLIN'
08:07 🔗 Smiley 2 Months and counting.
08:10 🔗 omf_ Smiley, have you looked at the cdx file for that warc
08:11 🔗 omf_ skim off the first 100,000 lines and set it up as a gz file for me to grab, I need to check a hypothesis
08:12 🔗 Smiley omf_: not yet, but I can do
08:13 🔗 Smiley head -n -10000 ./file.cdx > ./file_for_omf ? :D
08:13 🔗 Smiley ooh 100000
08:14 🔗 Smiley will take a while I feel D:
08:14 🔗 omf_ for 100k lines, it should be short
08:15 🔗 Smiley wait, you WANT the first 100k?
08:16 🔗 winr4r sup timmeh
08:16 🔗 omf_ 100k is shorthand for 100,000
08:16 🔗 Smiley yes
08:17 🔗 Smiley i thought you wanted me to strip off the first 100k :P
08:17 🔗 Smiley where to put this file d:
08:17 🔗 omf_ Smiley, looking at your command and my comment I was unclear
08:18 🔗 Smiley me too :D
08:18 🔗 omf_ I want: head -n 100000 ./cdx > new_file_
08:18 🔗 Smiley where shall I put this, I doubt many pastebins like 100k lines being pasted.
08:18 🔗 Smiley omf_: yeah, I've got it :)
08:18 🔗 omf_ just stick it in /home on anarchive if you can
08:19 🔗 Smiley i can ;)
08:19 🔗 Smiley done
08:21 🔗 omf_ thanks
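Note on the head exchange above: `head -n 100000` prints the first 100,000 lines, while a negative count such as `head -n -10000` prints everything except the last 10,000 lines, which is why the two commands in the conversation are not equivalent. The sketch below shows the difference on a small stand-in file. The file names (sample.cdx, first.txt, all_but_last.txt) and the scaled-down counts are made up for illustration, and the negative count assumes GNU coreutils `head`.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is overwritten.
cd "$(mktemp -d)" || exit 1

# Build a 10-line stand-in for the .cdx file (hypothetical name).
seq 1 10 > sample.cdx

# First 3 lines: the form omf_ asked for, scaled down from 100000.
head -n 3 sample.cdx > first.txt

# GNU head only: a negative count prints all but the LAST 3 lines.
head -n -3 sample.cdx > all_but_last.txt

# Compress the sample for transfer; -c writes to stdout and keeps first.txt.
gzip -c first.txt > first.txt.gz

# Show how many lines each variant kept.
wc -l < first.txt
wc -l < all_but_last.txt
```

On the real file the equivalent would presumably be something like `head -n 100000 file.cdx | gzip > sample.gz`, with both names chosen by whoever runs it.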
08:26 🔗 omf_ Smiley, what are the stats when you: wc blah.cdx
08:28 🔗 Smiley well lines was 100,000 :D
08:28 🔗 Smiley tim.bowers@timDesktop ~ $ wc ./for_omf 100000 1100001 22557370 ./for_omf
08:29 🔗 omf_ I mean the original please
09:14 🔗 Smiley oh
09:14 🔗 * Smiley calculates
09:19 🔗 godane so i'm now backing up 2012 of techcrunch.com
09:22 🔗 godane i have 7.4gb of techcrunch.com so far
09:23 🔗 Smiley tim.bowers@timDesktop ~ $ wc ./bin/ign/storage/pouet/pouet.net_06052013.cdx 3868315 42551466 1032158108 ./bin/ign/storage/pouet/pouet.net_06052013.cdx
09:23 🔗 Smiley omf_:
09:24 🔗 omf_ I see
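Note on the wc output above: `wc` with no options prints lines, words and bytes, in that order. A minimal sketch of the arithmetic those three figures allow, using only the numbers Smiley pasted (the variable names are illustrative, and shell arithmetic truncates to integers):

```shell
#!/bin/sh
# Figures copied from Smiley's wc output: lines, words, bytes.
lines=3868315
words=42551466
bytes=1032158108

# POSIX shell arithmetic is integer-only, so every result is truncated.
echo "MiB: $((bytes / 1048576))"
echo "bytes per line: $((bytes / lines))"
echo "words per line: $((words / lines))"
```

The size works out to 984 MiB, which agrees with the 984M shown in the earlier `ls` listing for the same .cdx file. The averages come to about 266 bytes and 11 whitespace-separated words per line.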
09:32 🔗 godane the rest of hackaday.com is going up
09:32 🔗 godane just the first 6 months of 2013
10:04 🔗 SketchCow http://www.flickr.com/photos/textfiles/sets/72157634488809303/with/9215318638/
10:31 🔗 ersi Beautifully
14:18 🔗 balrog http://web.archive.org/web/*/www.vms2linux.de/ods5fs.html -- wtf? that site has no robots.txt!
14:41 🔗 Smiley The requested URL /robots.txt was not found on this server.
14:42 🔗 Smiley indeed, but at time of crawling?
14:50 🔗 godane i found out when it does have one: http://web.archive.org/web/20110725130646/http://vms2linux.de/robots.txt
14:50 🔗 godane and again: http://web.archive.org/web/20111231183927/http://vms2linux.de/robots.txt
14:51 🔗 godane that site is very weird
14:51 🔗 godane it sometimes has robots.txt and then the next crawl doesn't have it
14:56 🔗 Smiley two people uploading and one has deleted it? XD
