[06:03] tephra: Thanks a ton. Now grabbing.
[06:13] The thing is working nicely!
[06:21] You're even finding shows I missed
[07:15] For whoever was helping with this... SadDM I think. It appears a ton of shows that were "deleted" are back.
[07:15] No idea what's up.
[12:01] someone else please grab https://www.youtube.com/user/ElliotRodger/videos too
[12:09] ivan`: i started
[12:09] https://www.youtube.com/user/ElliotRodger/videos
[12:09] with youtube-dl -t --embed-subs --embed-thumbnail --add-metadata --all-subs --write-thumbnail --write-info-json --write-annotations --youtube-include-dash-manifest https://www.youtube.com/user/ElliotRodger/videos
[12:12] thanks
[12:15] fwiw, if you want 1080p from youtube, make sure you have a recent ffmpeg/avconv installed and use python -m youtube_dl --title --continue --retries 4 --youtube-include-dash-manifest --write-info-json --write-description --write-thumbnail --write-annotations --all-subs --ignore-errors -f 38/138+141/138+22/138+140/138+139/264+141/264+22/264+140/264+139/137+141/137+22/137+140/137+139/22+141/136+141/22/136+22/136+140/136+139/135+141/135+140/135+139/134+141/134+140/134+139/133+141/133+140/133+139/160+141/160+140/160+139/best
[12:16] that also gets you 480p instead of 360p and higher-quality audio in some cases
[12:17] also https://github.com/ludios/youtube-dl/commits/prime if you want it to work right
[12:18] if you don't want to deal with that, that's cool too ;)
[12:18] avconv version 0.8.10-6:0.8.10-1,
[12:19] i am using the git master of youtube-dl
[12:19] that might be too old to merge DASH
[12:19] I have avconv version 9.13-6:9.13-0ubuntu0.14.04.1, Copyright (c) 2000-2014 the Libav developers
[12:20] recent ffmpeg builds work too
[12:20] [youtube] -0iLHGRaYRI: Downloading DASH manifest
[12:22] nearly 2 gbyte
[12:22] I got a total of 1.18GB?
[12:22] [download] Downloading video #21 of 22
[12:24] https://gist.github.com/nsapa/dfcdef35017110f9ba32
[12:25] oh, yeah, cause you're using the upstream youtube-dl that doesn't delete the .fNN.mp4 files after merging
[12:25] where should i upload them?
[12:27] I don't know if IA has a "mass murderers" collection yet
[12:27] just wait a few days and see if they get deleted from youtube
[12:33] ivan`: upstream update procedure: git stash && git stash drop && git pull && make cleanall && make
[12:33] it give you a real working youtube-dl
[12:34] you don't need to build anything to get a working youtube-dl
[12:34] it's a python module
[12:35] yes but i prefer calling ~nico/Dev/youtube-dl/youtube-dl to push ~nico/Dev/youtube-dl/ && python -m youtube_dl && popd
[12:35] SketchCow: that's great news. I think there was something wonky going on on their end the night we were looking at it. Maybe things have settled down now.
[12:35] yeah I call it with a shell function
[12:36] PYTHONPATH, btw
[12:37] http://chfoo-d1.mooo.com:8031/puush/ <= more item todo than usual, no ?
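
(A minimal sketch of the shell-function approach mentioned at 12:34-12:36, assuming the git checkout lives at ~/Dev/youtube-dl as above; the function name and the shortened format string are illustrative only, not anyone's exact setup:)

    # run youtube-dl straight from a git checkout -- no build step, just PYTHONPATH
    ytdl() {
        PYTHONPATH="$HOME/Dev/youtube-dl" python -m youtube_dl \
            --title --continue --all-subs --write-info-json \
            --youtube-include-dash-manifest -f '137+141/22/best' "$@"
    }
    # e.g. ytdl https://www.youtube.com/user/ElliotRodger/videos
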
[16:01] Heh, gzip is really confused by megawarc files. http://cl.ly/image/3Z1429070d1a
[16:22] keep an eye on springpad: https://springpad.com/blog/2014/05/announcement-springpad-shutting-down/ it's going down on June 25th, but supposedly in the next couple of days, there will be a nice export tool released
[17:30] I think that's because it's a series of concatenated gzip chunks instead of a normal single-pass
[17:30] or something like that
[17:30] Oh, he's gone anyway
[17:31] I wish people would idle more
[18:19] i idle all the time :P
[18:29] waxy, underscor: I've had the same thing happen on gzip files that are more than 4G, also on gzip files created by streaming thru gzip
[18:44] mhm
[18:44] I think it works better (in the streaming case, at least) with the --rsyncable flag
[19:06] I wouldn't expect so
[19:09] oh, hm, I guess not, cause it still doesn't know the final size
[19:09] yeah, rsyncable just resets the dictionary periodically
[19:56] EVR just deleted all their audio.
[19:57] Boom baby
[20:00] And we got it all.
[20:00] Archive.org will put it all up.
[20:03] yeahhhhhh
[20:17] awesome
[22:50] Someone PLEASE get this:
[22:50] https://groups.google.com/forum/#!search/bitsavers%7Csort:date/comp.os.vms/hntRR5eKKVM/rSCcPfMl3fgJ
[23:04] Should I just be able to use the standard wget call here? http://www.archiveteam.org/index.php?title=Wget
[23:04] Sorry, not too familiar with grabbing stuff like this, but I'll do my best
[23:04] danneh_: try it and see if it manages to get pdf files
[23:06] Downloading now
[23:06] I'll keep an eye on it
[23:08] Hmm, just grabbed that one site, I'll need to look and see hosts/etc/etc, will keep you updated
[23:08] that one page*
[23:11] I think it's because the URLs are like: http://h18000.www1.hp.com/cpq-products/quickspecs/productbulletin.html#!spectype=soc&path=soc_80000/soc_80001&docid=soc_80002
[23:11] and it doesn't worry about the fragments being different, just that they point to the same .html page
[23:14] Anyone encountered this before, or any sorta command line switches I can add to just interpret the URLs as a whole, rather than stopping at the #?
[23:15] I'm having a look about online, all else fails I'll just try to modify wget and recompile
[23:23] Uploading endless EVR shows
[23:23] danneh_: it feels like they're using js
[23:24] yep :(
[23:24] i'm on season 13 of the joy of painting series
[23:24] https://github.com/mirror/wget/blob/38a7829dcb4eb5dba28dbf0f05c6a80fea9217f8/src/url.c#L1004
[23:24] I'll try to modify the thing and recompile it
[23:24] danneh_: that probably won't help.
[23:24] the fragment triggers a script on the page
[23:24] wget has no idea what to do about js
[23:24] :/
[23:25] hmm, that sucks
[23:25] wpull might
[23:25] perhaps, using phantomjs
[23:25] wpull in phantomjs does, but not without that
[23:26] that said, wpull's phantomjs mode only scrolls
[23:26] ah :/
[23:26] I'll just try to do it manually then
[23:26] extract URLs, feed them into wget
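
(If wpull is tried instead of patching wget, a hedged sketch might look like the following -- option names should be checked against wpull --help, and as noted above its PhantomJS mode mostly just scrolls, so it may still miss content loaded via the #! routes:)

    # drive the page through PhantomJS so its JavaScript runs, recording into a WARC
    wpull 'http://h18000.www1.hp.com/cpq-products/quickspecs/productbulletin.html' \
        --phantomjs --recursive --page-requisites --warc-file quickspecs
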
[23:48] Hey, all. I'm trying to extract the Upcoming.org save, trying to figure out the most efficient approach to reconstruct a site from .warcs or .megawarcs
[23:48] Is warctozip my best bet?
[23:49] Trying to minimize steps, there's something like 3.5TB of gzipped content
[23:50] waxpancak: hmm, have to think about it. the WARCs preserve the http headers and all
[23:50] (FWIW http://wakaba.c3.cx/s/apps/unarchiver has WARC support, since I insisted ;) )
[23:53] Ha, looking for something that'll run under Linux so I can avoid transferring 3.5TB to my desktop.
[23:53] But that's awesome.
[23:54] (he's ported that utility to the GNUStep libs which is great. It's awesome btw for extracting old .sits and whatever else I very frequently run into)
[23:54] warc-tools seems to have an extract program
[23:55] warcat seems to as well. what sort of output format do you need?
[23:55] just the files dumped out to disk?
[23:55] yep
[23:56] I've tried warctools before and had issues. might be more stable now though. either one of those should work... based on the documentation.
[23:56] they're listed at http://www.archiveteam.org/index.php?title=The_WARC_Ecosystem
[23:56] https://github.com/internetarchive/warctools/ or https://github.com/chfoo/warcat
[23:57] if you run into issues extracting a particular warc, please let me know... I've had platform issues with wget on arm :/
[23:57] (putting in zero content-lengths and stuff like that)
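
(For dumping the files out of a WARC onto disk, a possible starting point with warcat, with the caveat that the exact subcommand and option names should be verified against python -m warcat --help; the input filename here is illustrative:)

    pip install warcat
    # walk the (mega)WARC and write each record's payload out under ./upcoming/
    python -m warcat extract upcoming.megawarc.warc.gz --output-dir ./upcoming/ --progress
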