Time |
Nickname |
Message |
00:19
|
chronomex |
erp |
04:06
|
tef |
any attempts to archive aaron's stuff yet |
04:09
|
kennethre |
tef: i'll assist in any way, if needed |
04:10
|
tef |
i'm firing up my work's crawler atm on news.yc +1 |
04:17
|
Famicoman |
jasons been working on some stuff |
04:17
|
Famicoman |
godane grabbed a copy of his site |
04:17
|
Famicoman |
and I'm sure there are other things |
04:19
|
tef |
ah cool, I was wondering if someone would do that |
04:19
|
tef |
I did the newsyc for sopa |
04:28
|
godane |
Famicoman: I got SpikeTV Xbox 360 2011 coverage |
04:29
|
Famicoman |
noice |
04:36
|
godane |
its also 720p version |
04:37
|
godane |
whats funny is the release group calls themself "Aggressive Archive Force" |
04:38
|
Famicoman |
haha wow |
04:38
|
GLaDOS |
Heh |
04:47
|
SketchCow |
OK, now on a proper laptop. |
04:47
|
SketchCow |
We have some rough stuff. |
04:48
|
tef |
SketchCow: i'm running a crawl of hn frontpage + all links appearing on it. should have snapshots, and ajax shit too. |
04:48
|
tef |
in the warcs. |
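A minimal sketch of the crawl scope described above (the front page plus every link appearing on it, i.e. one hop out). This is not tef's crawler: the names `LinkCollector` and `plus_one_urls` and the sample HTML are made up for illustration, and the sketch only builds the URL list; it neither fetches pages nor writes WARCs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collect absolute http(s) targets of <a href=...> tags, in page order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop any #fragment.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                is_web = absolute.startswith(("http://", "https://"))
                if is_web and absolute not in self.links:
                    self.links.append(absolute)


def plus_one_urls(front_page_url, front_page_html):
    """Return the front page followed by every page it links to (one hop)."""
    collector = LinkCollector(front_page_url)
    collector.feed(front_page_html)
    return [front_page_url] + [u for u in collector.links if u != front_page_url]


# Hypothetical front-page fragment, used only to exercise the function.
SAMPLE_HTML = (
    '<a href="item?id=1">comments</a> '
    '<a href="http://example.com/post#top">story</a> '
    '<a href="item?id=1">again</a>'
)

if __name__ == "__main__":
    for url in plus_one_urls("https://news.ycombinator.com/", SAMPLE_HTML):
        print(url)
```

The resulting list could then be handed to whatever tool does the actual fetching and WARC writing.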
04:48
|
chronomex |
great |
04:49
|
tef |
not sure what to do about twitter |
04:49
|
tef |
especially #pdftribute |
05:02
|
SketchCow |
My co-workers and I made duplicate pages. |
05:02
|
SketchCow |
http://archive.org/details/ark-aaronsw |
05:02
|
SketchCow |
and |
05:03
|
SketchCow |
http://archive.org/details/aaronsw |
05:06
|
tef |
oops |
05:07
|
tef |
about halfway with the hackernews +1 link |
05:13
|
godane |
https://www.youtube.com/watch?v=AqZNebWoqnc |
05:13
|
godane |
that is another video for len sassaman afk |
05:30
|
balrog_ |
SketchCow: I notice that several interesting sections of Aaron's website were removed and blocked with robots.txt in the past but I'm sure you're all aware |
05:33
|
tef |
ppfft who looks at robots.txts. wimpy crawlers. |
05:34
|
Cameron_D |
I do, to find bonus things to crawl :3 |
05:35
|
GLaDOS |
"Disallow? Must be some saucy stuff in here.." |
05:41
|
balrog_ |
yeah but some of that was pulled too which means its not in Wayback |
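For readers unfamiliar with the robots.txt exchange above, here is a minimal sketch of the two views of the same file: a polite crawler asks whether a URL may be fetched, while a curious one just lists the Disallow paths. It uses Python's standard `urllib.robotparser`; the helper `disallowed_paths`, the bot name, and the sample robots.txt are assumptions for illustration and are not taken from Aaron's site.

```python
from urllib.robotparser import RobotFileParser


def disallowed_paths(robots_txt):
    """Return every non-empty Disallow path in the file, ignoring comments."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt content, not the real file from any site.
SAMPLE = """User-agent: *
Disallow: /private/
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())

if __name__ == "__main__":
    # What a rule-following crawler would decide:
    print(parser.can_fetch("examplebot", "http://example.com/private/notes.html"))
    print(parser.can_fetch("examplebot", "http://example.com/index.html"))
    # What someone hunting for "bonus things to crawl" would read:
    print(disallowed_paths(SAMPLE))
```

Note that neither view helps with balrog_'s point: content removed from the live site cannot be fetched whatever the crawler chooses to honour.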
05:51
|
godane |
starting the upload of sega visions now |
06:27
|
godane |
some of my items are not showing up |
06:31
|
godane |
i'm going to bed |
06:31
|
godane |
hope the internal error stuff goes away |
08:11
|
GLaDOS |
Aaron Swartz is trending worldwide on twitter. |
08:11
|
GLaDOS |
Wow. |
08:12
|
kennethre |
GLaDOS: incredible |
08:32
|
GLaDOS |
...and he disappears. |
11:06
|
SketchCow |
Huuuuug |
11:09
|
SketchCow |
I'm adding a pile of material (Atari, Creative Computing, sorting BITSAVERS) |
12:56
|
Nemo_bis |
NATO's ftp done: Downloaded: 16099 files, 58G in 11d 12h 4m 0s (61.6 KB/s) |
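A quick arithmetic check of the summary line above, as a hedged sketch: converting the elapsed time to seconds and multiplying by the reported rate should land close to the reported 58G. It assumes "KB" and "G" are binary units (1024-based), which the log does not state; the function name is made up.

```python
def duration_seconds(days, hours, minutes, seconds):
    """Convert a d/h/m/s duration to whole seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# Figures copied from the log line: 11d 12h 4m 0s at 61.6 KB/s.
elapsed = duration_seconds(11, 12, 4, 0)

# Assumption: 1 KB = 1024 bytes and 1 G = 1024**3 bytes.
rate_bytes_per_second = 61.6 * 1024
implied_gib = rate_bytes_per_second * elapsed / 1024**3

if __name__ == "__main__":
    print(f"elapsed seconds: {elapsed}")
    print(f"implied total:   {implied_gib:.1f} GiB")
```

Under those unit assumptions the implied total falls between 58 and 59 GiB, consistent with the rounded "58G" in the message.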
14:50
|
schbiridi |
nice, Nemo_bis |
16:52
|
emijrp |
i haz a script to move videos from youtube to internet archive |
16:53
|
emijrp |
so, if we are ok with uploading all videos about aaron (including copyrighted ones) we can proceed... |
17:09
|
ersi |
IA has some youtube-grabbing infra as well afaik |
17:18
|
adamcaudi |
Can someone that's a bit more familiar with wget / warc files take a look at this and see if I've done anything stupid? https://gist.github.com/4524708 |
17:19
|
adamcaudi |
It seems right to me, but I'd rather not collect a few GB of mirrors then realize I missed something |
17:23
|
balrog_ |
I hope someone's archiving the current #pdftribute |
17:23
|
balrog_ |
(twitter hashtag) |
17:25
|
emijrp |
and his tw account? |
17:29
|
balrog_ |
hmm. |
17:29
|
balrog_ |
#pdftribute is various academics posting their papers to be freely available in protest of paywalls |
17:45
|
ersi |
emijrp: then again, it's always nice to have a copy if you're able to grab |
17:46
|
emijrp |
i will send the script to Nemo_bis |
17:46
|
emijrp |
i dont have upload bandwidth for that |
17:46
|
emijrp |
go go go http://archiveteam.org/index.php?title=Aaron_Swartz |
17:46
|
Nemo_bis |
emijrp: how many are they? |
17:46
|
emijrp |
250 |
17:46
|
Nemo_bis |
oh, should be feasible then |
17:47
|
Nemo_bis |
I don't have much free upload or disk right now |
17:47
|
SketchCow |
Please grab PDFtributes if possible |
17:48
|
emijrp |
http://archiveteam.org/index.php?title=Aaron_Swartz/YouTube_videos |
17:49
|
emijrp |
add links in wiki to the grabs, so we see what is complete |
17:49
|
Nemo_bis |
emijrp: are you talking to me? |
17:49
|
emijrp |
no |
17:49
|
Nemo_bis |
ah ok |
17:49
|
* |
Nemo_bis waiting for the script |
17:49
|
Nemo_bis |
if someone else could run it I wouldn't be offended though ^^ |
17:52
|
emijrp |
Nemo_bis: http://code.google.com/p/emijrp/source/browse/trunk/scrapers/youtube2internetarchive.py |
17:53
|
Nemo_bis |
emijrp: have you updated the collections and so on? |
17:53
|
emijrp |
no |
17:53
|
emijrp |
wait.. |
18:25
|
emijrp |
Nemo_bis: try now, read the instructions http://code.google.com/p/emijrp/source/browse/trunk/scrapers/youtube2internetarchive.py |
18:26
|
emijrp |
save links to videos in download/videostodo.txt |
18:26
|
emijrp |
and then python youtube2internetarchive.py english all aaronsw |
18:29
|
Nemo_bis |
'es': {'01':'january', '02': 'february', '03':'march', '04':'april', '05':'may', '06':'june', '07':'july', '08':'august','09':'september','10':'october', '11':'november', '12':'december'} |
18:29
|
Nemo_bis |
File "youtube2internetarchive.py", line 59 |
18:29
|
Nemo_bis |
^ |
18:29
|
Nemo_bis |
SyntaxError: invalid syntax |
18:29
|
godane |
this is cool: http://web.archive.org/web/20120911024204/http://www.underground-gamer.com/forums.php?action=viewforum&forumid=40&page=25 |
18:30
|
emijrp |
fixed Nemo_bis |
18:30
|
balrog_ |
godane: why is that on IA? |
18:30
|
godane |
cause i mirrored it |
18:30
|
balrog_ |
ah. |
18:30
|
ersi |
awesome ;D |
18:30
|
Smiley |
it was a int, and now you make it a string? |
18:30
|
ersi |
godane: Hello there, chris1975 |
18:31
|
godane |
thats my username there |
18:31
|
godane |
what sucks is i didn't do this to bitgamer forums |
18:32
|
balrog_ |
:( |
18:32
|
Nemo_bis |
emijrp: what keys should I use? |
18:34
|
balrog_ |
godane: I wish they had imported the forums to the ug forums |
18:34
|
balrog_ |
would have been nice |
18:34
|
emijrp |
Nemo_bis: yours? |
18:34
|
Nemo_bis |
emijrp: what collection is it? |
18:35
|
emijrp |
aaronsw |
18:38
|
Nemo_bis |
and everyone can write to it? |
18:39
|
emijrp |
dont know |
18:39
|
emijrp |
you can request admin role to SketchCow ? |
18:41
|
Nemo_bis |
they all seem to be erroring on download |
18:41
|
emijrp |
update youtube-dl .. |
18:42
|
godane |
balrog_: i'm getting other crap like spiketv video game awards too |
18:42
|
godane |
found some copies going back to 2008 |
18:44
|
godane |
so there is going to be a spiketv-specials collection in computer and tech videos collections sometime |
18:45
|
Nemo_bis |
Traceback (most recent call last): |
18:45
|
Nemo_bis |
File "youtube2internetarchive.py", line 138, in <module> |
18:45
|
Nemo_bis |
KeyError: 'english' |
18:45
|
Nemo_bis |
upload_month = num2month[language][json_['upload_date'][4:6]] |
18:45
|
Nemo_bis |
emijrp: ^ |
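The KeyError above is what a nested dict lookup raises when the language argument (`english`) is not one of the table's top-level keys. A minimal sketch of a more forgiving lookup follows, assuming the table is keyed by short codes such as `en` and that `upload_date` is a `YYYYMMDD` string; `NUM2MONTH`, `LANGUAGE_ALIASES`, and `upload_month` are hypothetical names, and this is not necessarily how emijrp fixed the script.

```python
MONTHS_EN = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]

# Month-number lookup keyed by language code, then by zero-padded month.
NUM2MONTH = {
    "en": {f"{i:02d}": name for i, name in enumerate(MONTHS_EN, start=1)},
}

# Hypothetical mapping from command-line words to table keys.
LANGUAGE_ALIASES = {"english": "en"}


def upload_month(language, upload_date):
    """Return the month name for a YYYYMMDD date in the given language."""
    code = LANGUAGE_ALIASES.get(language.lower(), language.lower())
    if code not in NUM2MONTH:
        known = ", ".join(sorted(NUM2MONTH))
        raise ValueError(f"unsupported language {language!r}; known: {known}")
    # Characters 4-5 of YYYYMMDD are the zero-padded month number.
    return NUM2MONTH[code][upload_date[4:6]]


if __name__ == "__main__":
    print(upload_month("english", "20130112"))  # prints "january"
```

Raising a ValueError that lists the known keys makes a mistyped language argument easier to diagnose than a bare KeyError deep in the script.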
18:46
|
ersi |
/query |
18:47
|
Smiley |
stop trying to convert int to string? |
18:47
|
emijrp |
Nemo_bis: fixed |
18:47
|
emijrp |
Smiley: not that |
18:48
|
Smiley |
D: |
18:52
|
Nemo_bis |
emijrp: are you adding a keyword? |
18:52
|
emijrp |
yes... look at the code |
18:52
|
Nemo_bis |
ok |
18:53
|
godane |
there is only ~8000 urls from g4tv.com feed to go |
18:54
|
godane |
*thefeed |
18:54
|
Famicoman |
godane let me know if you ever find the halo 2 specials done by mtv and spiketv in 2004 |
18:55
|
Famicoman |
Also, I think I have the first spiketv video game awards on vhs somewhere around here |
18:55
|
godane |
Famicoman: did you get g4 e3 2007 or 2008 |
18:55
|
godane |
i'm also looking for g4 ces from 2008 |
18:56
|
Famicoman |
nah, I haven't found too many g4 specials |
18:56
|
godane |
what do you have? |
18:56
|
Famicoman |
I don't know, probably more techtv stuff than anything else |
18:57
|
Famicoman |
I don't remember where I put it all |
18:57
|
godane |
whats funny is i have most of that uploaded to archive.org now |
18:57
|
Famicoman |
I feel like demonoid had a good amount of g4 stuff before it went down |
18:57
|
Famicoman |
I think I had G4 comicon coverage for a few years |
18:58
|
godane |
i have 2011 up and 2012 on my drive |
18:58
|
emijrp |
Nemo_bis: works fine? |
18:58
|
godane |
do you have any attack of the shows from 2010? |
18:59
|
godane |
i have nov and dec of 2010 |
18:59
|
godane |
the full year of 2011 |
19:00
|
Nemo_bis |
emijrp: no |
19:00
|
emijrp |
lol |
19:00
|
emijrp |
query me |
19:01
|
godane |
Famicoman: spiketv halo 2?: http://www.spike.com/full-episodes/blhn9j/gttv-halo-4-season-5-ep-528 |
19:14
|
godane |
now this you would not have without my help: http://web.archive.org/web/20120919075719/http://www.underground-gamer.com/forums.php?action=viewtopic&topicid=742&page=841 |
19:15
|
godane |
its 1500+ page forums from underground gamer in brasil |
19:17
|
godane |
i am surprised how much i got from ug as far as the site looking the right |
19:17
|
godane |
*the right way |
20:18
|
dashcloud |
hi guys, found this: http://pdftribute.net/ |
20:18
|
dashcloud |
someone's getting all the #pdftribute links with papers and collecting them there |
20:19
|
dashcloud |
here's a second site doing it as well: http://pdftribute.loc-com.de/ |
20:20
|
tef |
nice |
20:20
|
dashcloud |
and this person: https://twitter.com/thejbf/statuses/290551198757560320 is archiving all of the #pdftribute tweets |