Time |
Nickname |
Message |
01:01
🔗
|
Cameron_D |
http://allthingsd.com/20131231/you-say-goodbye-and-we-say-hello/ |
02:17
🔗
|
joepie91 |
Cameron_D: right |
02:17
🔗
|
joepie91 |
game plan? |
02:17
🔗
|
joepie91 |
because it sounds like we have less than 24 hours |
02:18
🔗
|
joepie91 |
fuck it |
02:18
🔗
|
joepie91 |
will throw it into archivebot |
02:18
🔗
|
joepie91 |
and see what happens |
02:20
🔗
|
Cameron_D |
that should get most of it |
02:20
🔗
|
Cameron_D |
although they do seem to have a fair chunk of video content http://allthingsd.com/video/ |
02:27
🔗
|
joepie91 |
Cameron_D: I have to say that I'm a bit taken aback by the non-noisiness of the URLs on allthingsd |
02:27
🔗
|
joepie91 |
it all seems... pretty sane |
02:29
🔗
|
Cameron_D |
yeah, which is nice |
02:29
🔗
|
joepie91 |
yes :P |
02:30
🔗
|
Cameron_D |
The comments are all hosted externally so I don't think it's downloading them |
02:46
🔗
|
joepie91 |
Cameron_D |
02:46
🔗
|
joepie91 |
it seems to do comments fine |
02:46
🔗
|
joepie91 |
it's grabbing stuff from avatars.fyre.co anyway |
02:46
🔗
|
joepie91 |
(fyre == livefyre == afaik the comments system they use) |
02:49
🔗
|
Cameron_D |
ah cool |
04:09
🔗
|
godane |
i think i'm grabbing allthingsd videos |
04:09
🔗
|
godane |
m.wsj.net/video/ is not 403 error |
04:09
🔗
|
godane |
i can grab all of it |
05:30
🔗
|
godane |
Happy New Year! |
05:32
🔗
|
dashcloud |
happy new year! |
05:37
🔗
|
BiggieJon |
watch live.twit.tv, much better than any network tv |
05:56
🔗
|
ivan` |
someone with upstream please grab https://www.youtube.com/user/AllThingsD/videos |
05:56
🔗
|
ivan` |
youtube-dl handles /user/ URLs |
05:57
🔗
|
ivan` |
youtube-dl --title --continue --retries 4 --write-info-json --write-description --write-thumbnail --write-annotations --all-subs --ignore-errors "https://www.youtube.com/user/AllThingsD/videos" |
05:59
🔗
|
godane |
i found a way to grab the wsj source video |
05:59
🔗
|
godane |
there is going to be tons of warc.gz of that |
05:59
🔗
|
ivan` |
cool |
06:00
🔗
|
godane |
ivan`: http://m.wsj.net/video/ |
06:00
🔗
|
godane |
i think all the wsj videos are there |
06:46
🔗
|
Dessiato |
Where can I find an archive of /soc/ about 3-4 months ago |
06:55
🔗
|
wp494 |
ah hell no |
06:55
🔗
|
wp494 |
http://techcrunch.com/2013/12/31/google-to-close-bump-and-flock-its-recently-acquired-file-sharing-apps/ |
07:21
🔗
|
Cameron_D |
aww man, bump was great |
16:18
🔗
|
Schbirid |
anyone bored? some netlabel that could use its releases put into IA. i am not affiliated, just randomly found it. please be nice and slow as it is a lot of releases. tell me if you are doing it! http://www.darklandrecordings.com/releases |
16:19
🔗
|
Schbirid |
also: |
16:19
🔗
|
Schbirid |
http://www.starquakerecords.com/all.html |
16:21
🔗
|
Schbirid |
and http://odgprod.com/ |
16:25
🔗
|
Schbirid |
last one should be easy http://odgprod.com/son/zip/ (but of course extracting and metadata is the hard work anyways) |
16:30
🔗
|
Schbirid |
another http://www.endlessascent.com/ |
16:52
🔗
|
godane |
!ao http://www.slate.com/blogs/behold/2013/12/30/paula_salischiker_photographs_hoarders_in_britain_in_her_series_the_art.html |
16:52
🔗
|
godane |
sorry |
16:52
🔗
|
godane |
wrong channel |
16:52
🔗
|
Smiley |
:D |
18:36
🔗
|
SketchCow |
Internet Archive got $1.3 million for fund drive |
18:39
🔗
|
Smiley |
\o/ |
18:40
🔗
|
SketchCow |
10 petabytes to be purchased for disk space, apparently. |
18:44
🔗
|
balrog |
wow nice! |
18:48
🔗
|
ersi |
Yay! |
19:37
🔗
|
SketchCow |
Yes, we've not quite outgrown the archive yet. |
19:45
🔗
|
balrog |
yet. |
19:46
🔗
|
SketchCow |
We are a little nutty with the space. |
19:48
🔗
|
Nemo_bis |
I can upload the whole Wikimedia Commons repository 40 times in that space, hmm |
19:48
🔗
|
turnip |
Oh god please don't |
19:49
🔗
|
Nemo_bis |
...of course not |
19:52
🔗
|
Smiley |
:D |
19:52
🔗
|
Smiley |
that'd be odd. |
20:28
🔗
|
balrog |
http://blog.bu.mp/post/71781606704/all-good-things |
21:12
🔗
|
zenguy_pc |
how long will 10 petabytes last? |
21:13
🔗
|
SketchCow |
We estimate 18 months |
21:14
🔗
|
Nemo_bis |
Did the on-demand wayback machine archiving increase the rate at which space is consumed? |
21:15
🔗
|
godane |
SketchCow: i figured you would want to know about this: http://m.wsj.net/video/ |
21:16
🔗
|
godane |
all wall street journal videos |
21:16
🔗
|
godane |
i'm making a collection of sorts: https://archive.org/search.php?query=creator%3A%22m.wsj.net%22 |
21:16
🔗
|
Schbirid |
awesome |
21:17
🔗
|
godane |
think all things d is in the 19000xxx numbers |
21:20
🔗
|
godane |
also you should know this bug has been around since christmas |
21:20
🔗
|
godane |
based on google cache |
21:23
🔗
|
ivan` |
SketchCow: any guess on how many TB of YouTube wayback has? |
21:27
🔗
|
SketchCow |
Oh no idea. |
22:05
🔗
|
godane |
i'm doing a grab of the index of m.wsj.net/video/ |
22:06
🔗
|
godane |
that way we can at least grab the files even if this folder is 403 again |
22:08
🔗
|
DFJustin |
ivan`: 932.48 TB in the youtubecrawl collection |
22:23
🔗
|
ivan` |
DFJustin: wow, my guess was closer to 200TB |
22:25
🔗
|
ivan` |
that's a lot of YouTube |
22:37
🔗
|
SketchCow |
Mmmmm, sorting godane uploads |