Time |
Nickname |
Message |
04:22
🔗
|
winr4r |
who is "Start" on the archive team wiki |
04:22
🔗
|
winr4r |
apparently he downloaded foxytunes |
04:34
🔗
|
SketchCow |
Had a nice chat with Brewster. |
04:34
🔗
|
SketchCow |
It's nice to have a boss/ceo you really love just chit-chatting with |
04:36
🔗
|
winr4r |
aw :) |
04:48
🔗
|
BlueMax |
SketchCow seems to have a man crush |
04:49
🔗
|
SketchCow |
I do like the guy a lot. |
05:53
🔗
|
SketchCow |
First machine reboot in a while! |
05:53
🔗
|
SketchCow |
1:53am, how did you arrive |
08:07
🔗
|
Smiley |
-rw-r--r-- 1 tim.bowers games 984M Jul 5 09:07 ./bin/ign/storage/pouet/pouet.net_06052013.cdx |
08:07
🔗
|
Smiley |
-rw-r--r-- 1 tim.bowers games 247G Jul 5 09:07 ./bin/ign/storage/pouet/pouet.net_06052013.warc |
08:07
🔗
|
Smiley |
keep on ROLLIN' |
08:07
🔗
|
Smiley |
2 Months and counting. |
08:10
🔗
|
omf_ |
Smiley, have you looked at the cdx file for that warc |
08:11
🔗
|
omf_ |
skim off the first 100,000 lines and set it up as a gz file for me to grab, I need to check a hypothesis |
08:12
🔗
|
Smiley |
omf_: not yet, but I can do |
08:13
🔗
|
Smiley |
head -n -10000 ./file.cdx > ./file_for_omf ? :D |
08:13
🔗
|
Smiley |
ooh 100000 |
08:14
🔗
|
Smiley |
will take a while I feel D: |
08:14
🔗
|
omf_ |
for 100k lines, it should be short |
08:15
🔗
|
Smiley |
wait, you WANT the first 100k? |
08:16
🔗
|
winr4r |
sup timmeh |
08:16
🔗
|
omf_ |
100k is shorthand for 100,000 |
08:16
🔗
|
Smiley |
yes |
08:17
🔗
|
Smiley |
i thought you wanted me to strip off the first 100k :P |
08:17
🔗
|
Smiley |
where to put this file d: |
08:17
🔗
|
omf_ |
Smiley, looking at your command and my comment I was unclear |
08:18
🔗
|
Smiley |
me too :D |
08:18
🔗
|
omf_ |
I want: head -n 100000 ./cdx > new_file_ |
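The two `head` invocations in this exchange differ only in the sign of the count, which is what caused the mix-up. A minimal sketch of what omf_ asked for (first 100,000 lines, gzipped), assuming a hypothetical stand-in file generated with `seq` rather than the real pouet.net CDX; the file names are illustrative only:

```shell
# Hypothetical stand-in for the CDX file: 250,000 numbered lines.
src=$(mktemp)
seq 1 250000 > "$src"

# Keep only the first 100,000 lines and gzip them for transfer.
out="$src.first100k.gz"
head -n 100000 "$src" | gzip > "$out"

# Check the line count of the compressed sample.
sample_lines=$(gzip -dc "$out" | wc -l | tr -d ' ')
echo "lines in sample: $sample_lines"

# With GNU head, a negative count (head -n -10000) instead prints
# everything except the last 10,000 lines, so on a multi-million-line
# file the output is nearly the whole file.
```

The positive form stops reading after 100,000 lines, which is why it should finish quickly even on a very large file.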
08:18
🔗
|
Smiley |
where shall I put this, I doubt many pastebins like 100k lines being pasted. |
08:18
🔗
|
Smiley |
omf_: yeah, I've got it :) |
08:18
🔗
|
omf_ |
just stick it in /home on anarchive if you can |
08:19
🔗
|
Smiley |
i can ;) |
08:19
🔗
|
Smiley |
done |
08:21
🔗
|
omf_ |
thanks |
08:26
🔗
|
omf_ |
Smiley, what are the stats when you: wc blah.cdx |
08:28
🔗
|
Smiley |
well lines was 100,000 :D |
08:28
🔗
|
Smiley |
tim.bowers@timDesktop ~ $ wc ./for_omf 100000 1100001 22557370 ./for_omf |
08:29
🔗
|
omf_ |
I mean the original please |
09:14
🔗
|
Smiley |
oh |
09:14
🔗
|
* |
Smiley calculates |
09:19
🔗
|
godane |
so i'm now backing up 2012 of techcrunch.com |
09:22
🔗
|
godane |
i have 7.4gb of techcrunch.com so far |
09:23
🔗
|
Smiley |
tim.bowers@timDesktop ~ $ wc ./bin/ign/storage/pouet/pouet.net_06052013.cdx 3868315 42551466 1032158108 ./bin/ign/storage/pouet/pouet.net_06052013.cdx |
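`wc` prints lines, words, and bytes, in that order. A rough sketch of the arithmetic on the figures pasted above; the field count of 11 is an assumption inferred from the numbers, not something stated in the log:

```shell
# Figures copied from the wc output for the full CDX file.
lines=3868315
words=42551466
bytes=1032158108

# Assumed number of whitespace-separated fields per CDX line.
fields=11

# Words left over after assigning 11 words to every line.
extra=$((words - fields * lines))

# Average bytes per line and total size in MiB (integer division).
avg=$((bytes / lines))
mib=$((bytes / 1048576))

echo "extra words beyond $fields per line: $extra"
echo "average bytes per line: $avg"
echo "size in MiB: $mib"
```

The size works out to the 984M shown in the earlier `ls` listing, and the 100,000-line sample shows the same pattern (1100001 words is 11 per line plus one). The single extra word would be consistent with one line, possibly a header, carrying one more token than the rest, though the log does not confirm that.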
09:23
🔗
|
Smiley |
omf_: |
09:24
🔗
|
omf_ |
I see |
09:32
🔗
|
godane |
the rest of hackaday.com is going up |
09:32
🔗
|
godane |
just the first 6 months of 2013 |
10:04
🔗
|
SketchCow |
http://www.flickr.com/photos/textfiles/sets/72157634488809303/with/9215318638/ |
10:31
🔗
|
ersi |
Beautifully |
14:18
🔗
|
balrog |
http://web.archive.org/web/*/www.vms2linux.de/ods5fs.html -- wtf? that site has no robots.txt! |
14:41
🔗
|
Smiley |
The requested URL /robots.txt was not found on this server. |
14:42
🔗
|
Smiley |
indeed, but at time of crawling? |
14:50
🔗
|
godane |
i found out when it does have one: http://web.archive.org/web/20110725130646/http://vms2linux.de/robots.txt |
14:50
🔗
|
godane |
and again: http://web.archive.org/web/20111231183927/http://vms2linux.de/robots.txt |
14:51
🔗
|
godane |
that site is very weird |
14:51
🔗
|
godane |
it sometimes has robots.txt and then the next crawl doesn't have it |
14:56
🔗
|
Smiley |
two people uploading and one has deleted it? XD |
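The snapshot links godane posted follow a visible pattern: a 14-digit timestamp followed by the original URL. A small sketch of building such links to compare `robots.txt` across crawl dates; the helper name `wayback_url` is hypothetical, and the pattern is taken only from the URLs in this log:

```shell
# Build a Wayback Machine snapshot URL from a timestamp (YYYYMMDDhhmmss)
# and a target URL. Hypothetical helper, based on the links pasted above.
wayback_url() {
  printf 'http://web.archive.org/web/%s/%s\n' "$1" "$2"
}

# The two robots.txt captures mentioned in the channel.
wayback_url 20110725130646 http://vms2linux.de/robots.txt
wayback_url 20111231183927 http://vms2linux.de/robots.txt
```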