Time | Nickname | Message
06:03 | SketchCow | tephra: Thanks a ton. Now grabbing.
06:13 | SketchCow | The thing is working nicely!
06:21 | SketchCow | You're even finding shows I missed
07:15 | SketchCow | For whoever was helping with this... SadDM I think. It appears a ton of shows that were "deleted" are back.
07:15 | SketchCow | No idea what's up.
12:01 | ivan` | someone else please grab https://www.youtube.com/user/ElliotRodger/videos too
12:09 | nico | ivan`: i started
12:09 | nico | https://www.youtube.com/user/ElliotRodger/videos
12:09 | nico | with youtube-dl -t --embed-subs --embed-thumbnail --add-metadata --all-subs --write-thumbnail --write-info-json --write-annotations --youtube-include-dash-manifest https://www.youtube.com/user/ElliotRodger/videos
12:12 | ivan` | thanks
12:15 | ivan` | 6+22/136+140/136+139/135+141/135+140/135+139/134+141/134+140/134+139/133+141/133+140/133+139/160+141/160+140/160+139/best
12:15 | ivan` | fwiw, if you want 1080p from youtube, make sure you have a recent ffmpeg/avconv installed and use python -m youtube_dl --title --continue --retries 4 --youtube-include-dash-manifest --write-info-json --write-description --write-thumbnail --write-annotations --all-subs --ignore-errors -f 38/138+141/138+22/138+140/138+139/264+141/264+22/264+140/264+139/137+141/137+22/137+140/137+139/22+141/136+141/22/13
12:16 | ivan` | that also gets you 480p instead of 360p and higher-quality audio in some cases
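The bare format string ivan` pasted at 12:15 reads like the tail of the -f argument in his next line, chopped off by IRC's message length limit. For readers unfamiliar with that syntax: an entry like 137+141 means "download DASH video stream 137 (1080p MP4, video only) and audio stream 141 separately, then merge them", and the slash-separated entries are fallbacks tried left to right; the merge step is why a recent ffmpeg or avconv matters. A minimal sketch of the same idea, using the channel URL from this grab:

    # 137 = 1080p MP4 video-only, 141 = high-bitrate M4A audio-only;
    # youtube-dl fetches both and merges them with ffmpeg/avconv,
    # falling back to format 22 (720p with audio already muxed)
    python -m youtube_dl --youtube-include-dash-manifest \
        -f '137+141/22' \
        'https://www.youtube.com/user/ElliotRodger/videos'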
12:17 | ivan` | also https://github.com/ludios/youtube-dl/commits/prime if you want it to work right
12:18 | ivan` | if you don't want to deal with that, that's cool too ;)
12:18 | nico | avconv version 0.8.10-6:0.8.10-1,
12:19 | nico | i am using the git master of youtube-dl
12:19 | ivan` | that might be too old to merge DASH
12:19 | ivan` | I have avconv version 9.13-6:9.13-0ubuntu0.14.04.1, Copyright (c) 2000-2014 the Libav developers
12:20 | ivan` | recent ffmpeg builds work too
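The version exchange matters because youtube-dl shells out to whichever ffmpeg or avconv it finds on the PATH to do that merge, and ivan` suspects nico's avconv 0.8.x is too old for it. A quick way to see what would be picked up:

    # print the first version line of whichever merger is installed
    ffmpeg -version 2>/dev/null | head -n1
    avconv -version 2>/dev/null | head -n1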
12:22 | nico | [youtube] -0iLHGRaYRI: Downloading DASH manifest
12:22 | nico | nearly 2 gbyte
12:22 | ivan` | I got a total of 1.18GB?
12:24 | nico | [download] Downloading video #21 of 22
12:24 | nico | https://gist.github.com/nsapa/dfcdef35017110f9ba32
12:25 | ivan` | oh, yeah, cause you're using the upstream youtube-dl that doesn't delete the .fNN.mp4 files after merging
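ivan`'s explanation of the size gap: stock youtube-dl at the time left the intermediate single-stream pieces (named like Title-ID.f137.mp4) on disk after merging, which is why nico's total came out well above ivan`'s 1.18GB. If you hit the same thing, something along these lines clears them once the merged files have been checked (a rough sketch; confirm the glob matches only the leftovers before deleting):

    # list the per-format leftovers first
    find . -name '*.f[0-9]*.mp4' -print
    # then remove them once the merged .mp4 files play correctly
    find . -name '*.f[0-9]*.mp4' -delete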
12:25 | nico | where should i upload them?
12:27 | ivan` | I don't know if IA has a "mass murderers" collection yet
12:27 | ivan` | just wait a few days and see if they get deleted from youtube
12:33 | nico | ivan`: upstream update procedure: git stash && git stash drop && git pull && make cleanall && make
12:33 | nico | it gives you a real working youtube-dl
12:34 | ivan` | you don't need to build anything to get a working youtube-dl
12:34 | ivan` | it's a python module
12:35 | nico | yes but i prefer calling ~nico/Dev/youtube-dl/youtube-dl to pushd ~nico/Dev/youtube-dl/ && python -m youtube_dl && popd
12:35 | SadDM | SketchCow: that's great news. I think there was something wonky going on on their end the night we were looking at it. Maybe things have settled down now.
12:35 | ivan` | yeah I call it with a shell function
12:36 | ivan` | PYTHONPATH, btw
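ivan`'s setup isn't spelled out beyond those two hints, but a shell function plus PYTHONPATH presumably amounts to something like the sketch below, which runs a git checkout of youtube-dl without any make step; the checkout path is nico's from earlier and the function name is invented:

    # hypothetical ~/.bashrc helper: run youtube-dl straight from a checkout
    ytdl() {
        PYTHONPATH="$HOME/Dev/youtube-dl" python -m youtube_dl "$@"
    }

After a git pull in the checkout, the next ytdl call picks up the new code with nothing to rebuild.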
12:37 | nico | http://chfoo-d1.mooo.com:8031/puush/ <= more items to do than usual, no?
16:01 | waxy | Heh, gzip is really confused by megawarc files. http://cl.ly/image/3Z1429070d1a
16:22 | dashcloud | keep an eye on springpad: https://springpad.com/blog/2014/05/announcement-springpad-shutting-down/ it's going down on June 25th, but supposedly in the next couple of days, there will be a nice export tool released
17:30 | underscor | I think that's because it's a series of concatenated gzip chunks instead of a normal single-pass
17:30 | underscor | or something like that
17:30 | underscor | Oh, he's gone anyway
17:31 | underscor | I wish people would idle more
18:19 | godane | i idle all the time :P
18:29 | exmic | waxy, underscor: I've had the same thing happen on gzip files that are more than 4G, also on gzip files created by streaming thru gzip
18:44 | underscor | mhm
18:44 | underscor | I think it works better (in the streaming case, at least) with the --rsyncable flag
19:06 | exmic | I wouldn't expect so
19:09 | underscor | oh, hm, I guess not, cause it still doesn't know the final size
19:09 | exmic | yeah, rsyncable just resets the dictionary periodically
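The confusion waxy screenshotted comes from how gzip -l computes its size column: gzip of that era took the 32-bit uncompressed-size field from the trailer at the end of the file, i.e. from the last member only, so the number is wrong both for multi-member files (a megawarc is a long train of concatenated gzip members) and for anything over 4 GiB, exactly the two cases exmic describes. A small demonstration:

    # build a two-member gzip file, the same shape as a megawarc
    printf 'hello\n' | gzip > a.gz
    printf 'world\n' | gzip > b.gz
    cat a.gz b.gz > ab.gz
    gzip -l ab.gz        # the "uncompressed" column misses the first member
    zcat ab.gz | wc -c   # decompressing reports the true total: 12 bytes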
19:56 | SketchCow | EVR just deleted all their audio.
19:57 | SketchCow | Boom baby
20:00 | SketchCow | And we got it all.
20:00 | SketchCow | Archive.org will put it all up.
20:03 | exmic | yeahhhhhh
20:17 | tephra | awesome
22:50 | SketchCow | Someone PLEASE get this:
22:50 | SketchCow | https://groups.google.com/forum/#!search/bitsavers%7Csort:date/comp.os.vms/hntRR5eKKVM/rSCcPfMl3fgJ
23:04 | danneh_ | Should I just be able to use the standard wget call here? http://www.archiveteam.org/index.php?title=Wget
23:04 | danneh_ | Sorry, not too familiar with grabbing stuff like this, but I'll do my best
23:04 | balrog | danneh_: try it and see if it manages to get pdf files
23:06 | danneh_ | Downloading now
23:06 | danneh_ | I'll keep an eye on it
23:08 | danneh_ | Hmm, just grabbed that one site, I'll need to look and see hosts/etc/etc, will keep you updated
23:08 | danneh_ | that one page*
23:11 | danneh_ | I think it's because the URLs are like: http://h18000.www1.hp.com/cpq-products/quickspecs/productbulletin.html#!spectype=soc&path=soc_80000/soc_80001&docid=soc_80002
23:11 | danneh_ | and it doesn't worry about the fragments being different, just that they point to the same .html page
23:14 | danneh_ | Anyone encountered this before, or any sorta command line switches I can add to just interpret the URLs as a whole, rather than stopping at the #?
23:15 | danneh_ | I'm having a look about online, all else fails I'll just try to modify wget and recompile
23:23 | SketchCow | Uploading endless EVR shows
23:23 | balrog | danneh_: it feels like they're using js
23:24 | balrog | yep :(
23:24 | godane | i'm on season 13 of the joy of painting series
23:24 | danneh_ | https://github.com/mirror/wget/blob/38a7829dcb4eb5dba28dbf0f05c6a80fea9217f8/src/url.c#L1004
23:24 | danneh_ | I'll try to modify the thing and recompile it
23:24 | balrog | danneh_: that probably won't help.
23:24 | balrog | the fragment triggers a script on the page
23:24 | balrog | wget has no idea what to do about js
23:25 | balrog | :/
23:25 | danneh_ | hmm, that sucks
23:25 | danneh_ | wpull might
23:25 | balrog | perhaps, using phantomjs
23:26 | yipdw | wpull in phantomjs does, but not without that
23:26 | yipdw | that said, wpull's phantomjs mode only scrolls
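For reference, wpull's PhantomJS integration at the time was enabled with a single flag, roughly as below; as yipdw notes it mainly scrolls the page to trigger lazy loading, so it would not click through the #! fragment navigation on the QuickSpecs viewer by itself (flags taken from wpull's documentation of that era and worth re-checking):

    wpull 'http://h18000.www1.hp.com/cpq-products/quickspecs/productbulletin.html' \
        --phantomjs \
        --warc-file hp-quickspecs \
        --no-robots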
23:26 | balrog | ah :/
23:26 | danneh_ | I'll just try to do it manually then
23:48 | danneh_ | extract URLs, feed them into wget
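A sketch of the manual route danneh_ describes, assuming the real document URLs behind the #! fragments can first be pulled out of the page's JavaScript or API responses; urls.txt is a hypothetical file with one such URL per line:

    # fetch a prepared list of URLs and keep a WARC of the responses
    wget --input-file=urls.txt \
        --warc-file=hp-productbulletin \
        --page-requisites \
        --no-verbose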
23:48 | waxpancak | Hey, all. I'm trying to extract the Upcoming.org save, trying to figure out the most efficient approach to reconstruct a site from .warcs or .megawarcs
23:49 | waxpancak | Is warctozip my best bet?
23:50 | waxpancak | Trying to minimize steps, there's something like 3.5TB of gzipped content
23:50 | balrog | waxpancak: hmm, have to think about it. the WARCs preserve the http headers and all
23:53 | balrog | (FWIW http://wakaba.c3.cx/s/apps/unarchiver has WARC support, since I insisted ;) )
23:53 | waxpancak | Ha, looking for something that'll run under Linux so I can avoid transferring 3.5TB to my desktop.
23:54 | waxpancak | But that's awesome.
23:54 | balrog | (he's ported that utility to the GNUStep libs which is great. It's awesome btw for extracting old .sits and whatever else I very frequently run into)
23:55 | balrog | warc-tools seems to have an extract program
23:55 | balrog | warcat seems to as well. what sort of output format do you need?
23:55 | balrog | just the files dumped out to disk?
23:56 | waxpancak | yep
23:56 | balrog | I've tried warctools before and had issues. might be more stable now though. either one of those should work... based on the documentation.
23:56 | balrog | they're listed at http://www.archiveteam.org/index.php?title=The_WARC_Ecosystem
23:57 | balrog | https://github.com/internetarchive/warctools/ or https://github.com/chfoo/warcat
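Of those two, warcat is the one with a straightforward dump-to-disk mode; per its README, extraction looks roughly like the following (the megawarc filename is a placeholder, and a megawarc can be treated as one very large multi-record WARC):

    pip install warcat
    # unpack the archived documents into files under ./upcoming/
    python -m warcat extract upcoming-001.megawarc.warc.gz \
        --output-dir ./upcoming --progress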
23:57 | balrog | if you run into issues extracting a particular warc, please let me know... I've had platform issues with wget on arm :/
23:57 | balrog | (putting in zero content-lengths and stuff like that)