#archiveteam 2014-05-24,Sat


Time Nickname Message
06:03 🔗 SketchCow tephra: Thanks a ton. Now grabbing.
06:13 🔗 SketchCow The thing is working nicely!
06:21 🔗 SketchCow You're even finding shows I missed
07:15 🔗 SketchCow For whoever was helping with this... SadDM I think. It appears a ton of shows that were "deleted" are back.
07:15 🔗 SketchCow No idea what's up.
12:01 🔗 ivan` someone else please grab https://www.youtube.com/user/ElliotRodger/videos too
12:09 🔗 nico ivan`: i started
12:09 🔗 nico https://www.youtube.com/user/ElliotRodger/videos
12:09 🔗 nico with youtube-dl -t --embed-subs --embed-thumbnail --add-metadata --all-subs --write-thumbnail --write-info-json --write-annotations --youtube-include-dash-manifest https://www.youtube.com/user/ElliotRodger/videos
12:12 🔗 ivan` thanks
12:15 🔗 ivan` 6+22/136+140/136+139/135+141/135+140/135+139/134+141/134+140/134+139/133+141/133+140/133+139/160+141/160+140/160+139/best
12:15 🔗 ivan` fwiw, if you want 1080p from youtube, make sure you have a recent ffmpeg/avconv installed and use python -m youtube_dl --title --continue --retries 4 --youtube-include-dash-manifest --write-info-json --write-description --write-thumbnail --write-annotations --all-subs --ignore-errors -f 38/138+141/138+22/138+140/138+139/264+141/264+22/264+140/264+139/137+141/137+22/137+140/137+139/22+141/136+141/22/13
12:16 🔗 ivan` that also gets you 480p instead of 360p and higher-quality audio in some cases
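For context, a minimal illustration of youtube-dl's -f fallback syntax used in that command, with a placeholder video URL: "A+B" merges DASH video stream A with audio stream B via ffmpeg/avconv, and "/" separates alternatives tried left to right.

    # 137+141 = 1080p MP4 video merged with 256k AAC audio;
    # /22 = else fall back to the pre-muxed 720p stream; /best = else best available
    youtube-dl -f '137+141/22/best' 'https://www.youtube.com/watch?v=PLACEHOLDER'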
12:17 🔗 ivan` also https://github.com/ludios/youtube-dl/commits/prime if you want it to work right
12:18 🔗 ivan` if you don't want to deal with that, that's cool too ;)
12:18 🔗 nico avconv version 0.8.10-6:0.8.10-1,
12:19 🔗 nico i am using the git master of youtube-dl
12:19 🔗 ivan` that might be too old to merge DASH
12:19 🔗 ivan` I have avconv version 9.13-6:9.13-0ubuntu0.14.04.1, Copyright (c) 2000-2014 the Libav developers
12:20 🔗 ivan` recent ffmpeg builds work too
12:20 🔗 nico [youtube] -0iLHGRaYRI: Downloading DASH manifest
12:22 🔗 nico nearly 2 GB
12:22 🔗 ivan` I got a total of 1.18GB?
12:22 🔗 nico [download] Downloading video #21 of 22
12:24 🔗 nico https://gist.github.com/nsapa/dfcdef35017110f9ba32
12:25 🔗 ivan` oh, yeah, cause you're using the upstream youtube-dl that doesn't delete the .fNN.mp4 files after merging
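A hedged cleanup sketch for those leftovers, assuming the merged outputs have been verified intact and the parts follow youtube-dl's usual .fNNN naming:

    # list the leftover per-format DASH parts first (dry run), then delete by hand
    find . -name '*.f[0-9]*.mp4' -o -name '*.f[0-9]*.m4a'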
12:25 🔗 nico where should i upload them?
12:27 🔗 ivan` I don't know if IA has a "mass murderers" collection yet
12:27 🔗 ivan` just wait a few days and see if they get deleted from youtube
12:33 🔗 nico ivan`: upstream update procedure: git stash && git stash drop && git pull && make cleanall && make
12:33 🔗 nico it gives you a real working youtube-dl
12:34 🔗 ivan` you don't need to build anything to get a working youtube-dl
12:34 🔗 ivan` it's a python module
12:35 🔗 nico yes but i prefer calling ~nico/Dev/youtube-dl/youtube-dl to pushd ~nico/Dev/youtube-dl/ && python -m youtube_dl && popd
12:35 🔗 SadDM SketchCow: that's great news. I think there was something wonky going on on their end the night we were looking at it. Maybe things have settled down now.
12:35 🔗 ivan` yeah I call it with a shell function
12:36 🔗 ivan` PYTHONPATH, btw
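A minimal sketch of that setup, assuming the repo is checked out at ~/Dev/youtube-dl; no build step is needed since it is importable as a module:

    # shell function instead of pushd/popd into the checkout
    ytdl() {
        PYTHONPATH="$HOME/Dev/youtube-dl" python -m youtube_dl "$@"
    }
    # usage: ytdl --write-info-json 'https://www.youtube.com/watch?v=PLACEHOLDER'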
12:37 🔗 nico http://chfoo-d1.mooo.com:8031/puush/ <= more items todo than usual, no?
16:01 🔗 waxy Heh, gzip is really confused by megawarc files. http://cl.ly/image/3Z1429070d1a
16:22 🔗 dashcloud keep an eye on springpad: https://springpad.com/blog/2014/05/announcement-springpad-shutting-down/ it's going down on June 25th, but supposedly in the next couple of days, there will be a nice export tool released
17:30 🔗 underscor I think that's because it's a series of concatenated gzip chunks instead of a normal single-pass
17:30 🔗 underscor or something like that
17:30 🔗 underscor Oh, he's gone anyway
17:31 🔗 underscor I wish people would idle more
18:19 🔗 godane i idle all the time :P
18:29 🔗 exmic waxy, underscor: I've had the same thing happen on gzip files that are more than 4G, also on gzip files created by streaming thru gzip
18:44 🔗 underscor mhm
18:44 🔗 underscor I think it works better (in the streaming case, at least) with the --rsyncable flag
19:06 🔗 exmic I wouldn't expect so
19:09 🔗 underscor oh, hm, I guess not, cause it still doesn't know the final size
19:09 🔗 exmic yeah, rsyncable just resets the dictionary periodically
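The megawarc confusion is reproducible with any multi-member gzip file: gzip -l reads only the final member's ISIZE trailer field, and ISIZE is stored mod 2^32, which also covers the >4G case. A minimal demonstration:

    printf 'hello' | gzip >  multi.gz
    printf 'world' | gzip >> multi.gz
    gzip -l multi.gz   # reports only the final member's uncompressed size
    zcat multi.gz      # decompression still yields both members: "helloworld"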
19:56 🔗 SketchCow EVR just deleted all their audio.
19:57 🔗 SketchCow Boom baby
20:00 🔗 SketchCow And we got it all.
20:00 🔗 SketchCow Archive.org will put it all up.
20:03 🔗 exmic yeahhhhhh
20:17 🔗 tephra awesome
22:50 🔗 SketchCow Someone PLEASE get this:
22:50 🔗 SketchCow https://groups.google.com/forum/#!search/bitsavers%7Csort:date/comp.os.vms/hntRR5eKKVM/rSCcPfMl3fgJ
23:04 🔗 danneh_ Should I just be able to use the standard wget call here? http://www.archiveteam.org/index.php?title=Wget
23:04 🔗 danneh_ Sorry, not too familiar with grabbing stuff like this, but I'll do my best
23:04 🔗 balrog danneh_: try it and see if it manages to get pdf files
23:06 🔗 danneh_ Downloading now
23:06 🔗 danneh_ I'll keep an eye on it
23:08 🔗 danneh_ Hmm, just grabbed that one site, I'll need to look and see hosts/etc/etc, will keep you updated
23:08 🔗 danneh_ that one page*
23:11 🔗 danneh_ I think it's because the URLs are like: http://h18000.www1.hp.com/cpq-products/quickspecs/productbulletin.html#!spectype=soc&path=soc_80000/soc_80001&docid=soc_80002
23:11 🔗 danneh_ and it doesn't worry about the fragments being different, just that they point to the same .html page
23:14 🔗 danneh_ Anyone encountered this before, or any sorta command line switches I can add to just interpret the URLs as a whole, rather than stopping at the #?
23:15 🔗 danneh_ I'm having a look about online, all else fails I'll just try to modify wget and recompile
23:23 🔗 SketchCow Uploading endless EVR shows
23:23 🔗 balrog danneh_: it feels like they're using js
23:24 🔗 balrog yep :(
23:24 🔗 godane i'm on season 13 of the joy of painting series
23:24 🔗 danneh_ https://github.com/mirror/wget/blob/38a7829dcb4eb5dba28dbf0f05c6a80fea9217f8/src/url.c#L1004
23:24 🔗 danneh_ I'll try to modify the thing and recompile it
23:24 🔗 balrog danneh_: that probably won't help.
23:24 🔗 balrog the fragment triggers a script on the page
23:24 🔗 balrog wget has no idea what to do about js
23:24 🔗 balrog :/
23:25 🔗 danneh_ hmm, that sucks
23:25 🔗 danneh_ wpull might
23:25 🔗 balrog perhaps, using phantomjs
23:25 🔗 yipdw wpull in phantomjs does, but not without that
23:26 🔗 yipdw that said, wpull's phantomjs mode only scrolls
23:26 🔗 balrog ah :/
23:26 🔗 danneh_ I'll just try to do it manually then
23:26 🔗 danneh_ extract URLs, feed them into wget
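A sketch of that manual route, assuming the real resource URLs the page's JS fetches behind the #! fragments have already been worked out (e.g. from browser dev tools); urls.txt is a hypothetical hand-built list:

    # wget discards fragments anyway, so feed it the underlying resources directly
    wget --warc-file=productbulletin -i urls.txt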
23:48 🔗 waxpancak Hey, all. I'm trying to extract the Upcoming.org save, trying to figure out the most efficient approach to reconstruct a site from .warcs or .megawarcs
23:48 🔗 waxpancak Is warctozip my best bet?
23:49 🔗 waxpancak Trying to minimize steps, there's something like 3.5TB of gzipped content
23:50 🔗 balrog waxpancak: hmm, have to think about it. the WARCs preserve the http headers and all
23:50 🔗 balrog (FWIW http://wakaba.c3.cx/s/apps/unarchiver has WARC support, since I insisted ;) )
23:53 🔗 waxpancak Ha, looking for something that'll run under Linux so I can avoid transferring 3.5TB to my desktop.
23:53 🔗 waxpancak But that's awesome.
23:54 🔗 balrog (he's ported that utility to the GNUstep libs which is great. It's awesome btw for extracting old .sits and whatever else I very frequently run into)
23:54 🔗 balrog warc-tools seems to have an extract program
23:55 🔗 balrog warcat seems to as well. what sort of output format do you need?
23:55 🔗 balrog just the files dumped out to disk?
23:55 🔗 waxpancak yep
23:56 🔗 balrog I've tried warctools before and had issues. might be more stable now though. either one of those should work... based on the documentation.
23:56 🔗 balrog they're listed at http://www.archiveteam.org/index.php?title=The_WARC_Ecosystem
23:56 🔗 balrog https://github.com/internetarchive/warctools/ or https://github.com/chfoo/warcat
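For the dump-to-disk case, a hedged sketch with warcat (the subcommand and flag are from its documentation of the era and worth checking against python -m warcat --help; the filename is illustrative):

    pip install warcat
    # walk the WARC and write each record's payload out as files on disk
    python -m warcat extract example.megawarc.warc.gz --output-dir extracted/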
23:57 🔗 balrog if you run into issues extracting a particular warc, please let me know... I've had platform issues with wget on arm :/
23:57 🔗 balrog (putting in zero content-lengths and stuff like that)
