[01:01] http://allthingsd.com/20131231/you-say-goodbye-and-we-say-hello/
[02:17] Cameron_D: right
[02:17] game plan?
[02:17] because it sounds like we have less than 24 hours
[02:18] fuck it
[02:18] will throw it into archivebot
[02:18] and see what happens
[02:20] that should get most of it
[02:20] although they do seem to have a fair chunk of video content http://allthingsd.com/video/
[02:27] Cameron_D: I have to say that I'm a bit taken aback by the non-noisiness of the URLs on allthingsd
[02:27] it all seems... pretty sane
[02:29] yeah, which is nice
[02:29] yes :P
[02:30] The comments are all hosted externally so I don't think it's downloading them
[02:46] Cameron_D
[02:46] it seems to do comments fine
[02:46] it's grabbing stuff from avatars.fyre.co anyway
[02:46] (fyre == livefyre == afaik the comments system they use)
[02:49] ah cool
[04:09] i think i'm grabbing allthingsd videos
[04:09] m.wsj.net/video/ is not 403 error
[04:09] i can grab all of it
[05:30] Happy New Year!
[05:32] happy new year!
[05:37] watch live.twit.tv, much better than any network tv
[05:56] someone with upstream please grab https://www.youtube.com/user/AllThingsD/videos
[05:56] youtube-dl handles /user/ URLs
[05:57] youtube-dl --title --continue --retries 4 --write-info-json --write-description --write-thumbnail --write-annotations --all-subs --ignore-errors "https://www.youtube.com/user/AllThingsD/videos"
[05:59] i found a way to grab the wsj source video
[05:59] there is going to be tons of warc.gz of that
[05:59] cool
[06:00] ivan`: http://m.wsj.net/video/
[06:00] i think all of wsj's videos are there
[06:46] Where can I find an archive of /soc/ from about 3-4 months ago?
[06:55] ah hell no
[06:55] http://techcrunch.com/2013/12/31/google-to-close-bump-and-flock-its-recently-acquired-file-sharing-apps/
[07:21] aww man, bump was great
[16:18] anyone bored? some netlabel that could use its releases put into IA. i am not affiliated, just randomly found it. please be nice and slow as it is a lot of releases. tell me if you are doing it! http://www.darklandrecordings.com/releases
[16:19] also:
[16:19] http://www.starquakerecords.com/all.html
[16:21] and http://odgprod.com/
[16:25] last one should be easy http://odgprod.com/son/zip/ (but of course extracting and metadata is the hard work anyways)
[16:30] another http://www.endlessascent.com/
[16:52] !ao http://www.slate.com/blogs/behold/2013/12/30/paula_salischiker_photographs_hoarders_in_britain_in_her_series_the_art.html
[16:52] sorry
[16:52] wrong channel
[16:52] :D
[18:36] Internet Archive got $1.3 million for fund drive
[18:39] \o/
[18:40] 10 petabytes to be purchased for disk space, apparently.
[18:44] wow nice!
[18:48] Yay!
[19:37] Yes, we've not quite outgrown the archive yet.
[19:45] yet.
[19:46] We are a little nutty with the space.
[19:48] I can upload the whole Wikimedia Commons repository 40 times in that space, hmm
[19:48] Oh god please don't
[19:49] ...of course not
[19:52] :D
[19:52] that'd be odd.
[20:28] http://blog.bu.mp/post/71781606704/all-good-things
[21:12] how long will 10 petabytes last?
[21:13] We estimate 18 months
[21:14] Did the on-demand wayback machine archiving increase the rate at which space is consumed?
[21:15] SketchCow: i figured you would want to know about this: http://m.wsj.net/video/
[21:16] all wall street journal videos
[21:16] i'm making a collection of sorts: https://archive.org/search.php?query=creator%3A%22m.wsj.net%22
[21:16] awesome
[21:17] think all things d is in the 19000xxx numbers
[21:20] also you should know this bug has been around since christmas
[21:20] based on google cache
[21:23] SketchCow: any guess on how many TB of YouTube wayback has?
[21:27] Oh no idea.
[22:05] i'm doing a grab of the index of m.wsj.net/video/
[22:06] that way we can at least grab the files even if this folder is 403 again
[22:08] ivan`: 932.48 TB in the youtubecrawl collection
[22:23] DFJustin: wow, my guess was closer to 200TB
[22:25] that's a lot of YouTube
[22:37] Mmmmm, sorting godane uploads