Time | Nickname | Message
06:03 | SketchCow | tephra: Thanks a ton. Now grabbing.
06:13 | SketchCow | The thing is working nicely!
06:21 | SketchCow | You're even finding shows I missed
07:15 | SketchCow | For whoever was helping with this... SadDM I think. It appears a ton of shows that were "deleted" are back.
07:15 | SketchCow | No idea what's up.
12:01 | ivan` | someone else please grab https://www.youtube.com/user/ElliotRodger/videos too
12:09 | nico | ivan`: i started
12:09 | nico | https://www.youtube.com/user/ElliotRodger/videos
12:09 | nico | with youtube-dl -t --embed-subs --embed-thumbnail --add-metadata --all-subs --write-thumbnail --write-info-json --write-annotations --youtube-include-dash-manifest https://www.youtube.com/user/ElliotRodger/videos
12:12 | ivan` | thanks
12:15 | ivan` | 6+22/136+140/136+139/135+141/135+140/135+139/134+141/134+140/134+139/133+141/133+140/133+139/160+141/160+140/160+139/best
12:15 | ivan` | fwiw, if you want 1080p from youtube, make sure you have a recent ffmpeg/avconv installed and use python -m youtube_dl --title --continue --retries 4 --youtube-include-dash-manifest --write-info-json --write-description --write-thumbnail --write-annotations --all-subs --ignore-errors -f 38/138+141/138+22/138+140/138+139/264+141/264+22/264+140/264+139/137+141/137+22/137+140/137+139/22+141/136+141/22/13
12:16 | ivan` | that also gets you 480p instead of 360p and higher-quality audio in some cases
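The bare format string ivan` pasted at 12:15 reads like the tail of the -f argument in his next line, chopped off by IRC's message length limit. For readers unfamiliar with that syntax: an entry like 137+141 means "download DASH video stream 137 (1080p MP4, video only) and audio stream 141 separately, then merge them", and the slash-separated entries are fallbacks tried left to right; the merge step is why a recent ffmpeg or avconv matters. A minimal sketch of the same idea, using the channel URL from this grab:

    # 137 = 1080p MP4 video-only, 141 = high-bitrate M4A audio-only;
    # youtube-dl fetches both and merges them with ffmpeg/avconv,
    # falling back to format 22 (720p with audio already muxed)
    python -m youtube_dl --youtube-include-dash-manifest \
        -f '137+141/22' \
        'https://www.youtube.com/user/ElliotRodger/videos'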
12:17 | ivan` | also https://github.com/ludios/youtube-dl/commits/prime if you want it to work right
12:18 | ivan` | if you don't want to deal with that, that's cool too ;)
12:18 | nico | avconv version 0.8.10-6:0.8.10-1,
12:19 | nico | i am using the git master of youtube-dl
12:19 | ivan` | that might be too old to merge DASH
12:19 | ivan` | I have avconv version 9.13-6:9.13-0ubuntu0.14.04.1, Copyright (c) 2000-2014 the Libav developers
12:20 | ivan` | recent ffmpeg builds work too
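The version exchange matters because youtube-dl shells out to whichever ffmpeg or avconv it finds on the PATH to do that merge, and ivan` suspects nico's avconv 0.8.x is too old for it. A quick way to see what would be picked up:

    # print the first version line of whichever merger is installed
    ffmpeg -version 2>/dev/null | head -n1
    avconv -version 2>/dev/null | head -n1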
12:22 | nico | [youtube] -0iLHGRaYRI: Downloading DASH manifest
12:22 | nico | nearly 2 gbyte
12:22 | ivan` | I got a total of 1.18GB?
12:24 | nico | [download] Downloading video #21 of 22
12:24 | nico | https://gist.github.com/nsapa/dfcdef35017110f9ba32
12:25 | ivan` | oh, yeah, cause you're using the upstream youtube-dl that doesn't delete the .fNN.mp4 files after merging
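ivan`'s explanation of the size gap: stock youtube-dl at the time left the intermediate single-stream pieces (named like Title-ID.f137.mp4) on disk after merging, which is why nico's total came out well above ivan`'s 1.18GB. If you hit the same thing, something along these lines clears them once the merged files have been checked (a rough sketch; confirm the glob matches only the leftovers before deleting):

    # list the per-format leftovers first
    find . -name '*.f[0-9]*.mp4' -print
    # then remove them once the merged .mp4 files play correctly
    find . -name '*.f[0-9]*.mp4' -delete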
12:25 | nico | where should i upload them?
12:27 | ivan` | I don't know if IA has a "mass murderers" collection yet
12:27 | ivan` | just wait a few days and see if they get deleted from youtube
12:33 | nico | ivan`: upstream update procedure: git stash && git stash drop && git pull && make cleanall && make
12:33 | nico | it gives you a real working youtube-dl
12:34 | ivan` | you don't need to build anything to get a working youtube-dl
12:34 | ivan` | it's a python module
12:35 | nico | yes but i prefer calling ~nico/Dev/youtube-dl/youtube-dl to pushd ~nico/Dev/youtube-dl/ && python -m youtube_dl && popd
12:35 | SadDM | SketchCow: that's great news. I think there was something wonky going on on their end the night we were looking at it. Maybe things have settled down now.
12:35 | ivan` | yeah I call it with a shell function
12:36 | ivan` | PYTHONPATH, btw
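ivan`'s setup isn't spelled out beyond those two hints, but a shell function plus PYTHONPATH presumably amounts to something like the sketch below, which runs a git checkout of youtube-dl without any make step; the checkout path is nico's from earlier and the function name is invented:

    # hypothetical ~/.bashrc helper: run youtube-dl straight from a checkout
    ytdl() {
        PYTHONPATH="$HOME/Dev/youtube-dl" python -m youtube_dl "$@"
    }

After a git pull in the checkout, the next ytdl call picks up the new code with nothing to rebuild.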
12:37 | nico | http://chfoo-d1.mooo.com:8031/puush/ <= more items to do than usual, no?
16:01 | waxy | Heh, gzip is really confused by megawarc files. http://cl.ly/image/3Z1429070d1a
16:22 | dashcloud | keep an eye on springpad: https://springpad.com/blog/2014/05/announcement-springpad-shutting-down/ it's going down on June 25th, but supposedly in the next couple of days, there will be a nice export tool released
17:30 | underscor | I think that's because it's a series of concatenated gzip chunks instead of a normal single-pass
17:30 | underscor | or something like that
17:30 | underscor | Oh, he's gone anyway
17:31 | underscor | I wish people would idle more
18:19 | godane | i idle all the time :P
18:29 | exmic | waxy, underscor: I've had the same thing happen on gzip files that are more than 4G, also on gzip files created by streaming thru gzip
18:44 | underscor | mhm
18:44 | underscor | I think it works better (in the streaming case, at least) with the --rsyncable flag
19:06 | exmic | I wouldn't expect so
19:09 | underscor | oh, hm, I guess not, cause it still doesn't know the final size
19:09 | exmic | yeah, rsyncable just resets the dictionary periodically
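The confusion waxy screenshotted comes from how gzip -l computes its size column: gzip of that era took the 32-bit uncompressed-size field from the trailer at the end of the file, i.e. from the last member only, so the number is wrong both for multi-member files (a megawarc is a long train of concatenated gzip members) and for anything over 4 GiB, exactly the two cases exmic describes. A small demonstration:

    # build a two-member gzip file, the same shape as a megawarc
    printf 'hello\n' | gzip > a.gz
    printf 'world\n' | gzip > b.gz
    cat a.gz b.gz > ab.gz
    gzip -l ab.gz        # the "uncompressed" column misses the first member
    zcat ab.gz | wc -c   # decompressing reports the true total: 12 bytes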
19:56 | SketchCow | EVR just deleted all their audio.
19:57 | SketchCow | Boom baby
20:00 | SketchCow | And we got it all.
20:00 | SketchCow | Archive.org will put it all up.
20:03 | exmic | yeahhhhhh
20:17 | tephra | awesome
22:50 | SketchCow | Someone PLEASE get this:
22:50 | SketchCow | https://groups.google.com/forum/#!search/bitsavers%7Csort:date/comp.os.vms/hntRR5eKKVM/rSCcPfMl3fgJ
23:04 | danneh_ | Should I just be able to use the standard wget call here? http://www.archiveteam.org/index.php?title=Wget
23:04 | danneh_ | Sorry, not too familiar with grabbing stuff like this, but I'll do my best
23:04 | balrog | danneh_: try it and see if it manages to get pdf files
23:06 | danneh_ | Downloading now
23:06 | danneh_ | I'll keep an eye on it
23:08 | danneh_ | Hmm, just grabbed that one site, I'll need to look and see hosts/etc/etc, will keep you updated
23:08 | danneh_ | that one page*
23:11 | danneh_ | I think it's because the URLs are like: http://h18000.www1.hp.com/cpq-products/quickspecs/productbulletin.html#!spectype=soc&path=soc_80000/soc_80001&docid=soc_80002
23:11 | danneh_ | and it doesn't worry about the fragments being different, just that they point to the same .html page
23:14 | danneh_ | Anyone encountered this before, or any sorta command line switches I can add to just interpret the URLs as a whole, rather than stopping at the #?
23:15 | danneh_ | I'm having a look about online, all else fails I'll just try to modify wget and recompile
23:23 | SketchCow | Uploading endless EVR shows
23:23 | balrog | danneh_: it feels like they're using js
23:24 | balrog | yep :(
23:24 | godane | i'm on season 13 of the joy of painting series
23:24 | danneh_ | https://github.com/mirror/wget/blob/38a7829dcb4eb5dba28dbf0f05c6a80fea9217f8/src/url.c#L1004
23:24 | danneh_ | I'll try to modify the thing and recompile it
23:24 | balrog | danneh_: that probably won't help.
23:24 | balrog | the fragment triggers a script on the page
23:24 | balrog | wget has no idea what to do about js
23:25 | balrog | :/
23:25 | danneh_ | hmm, that sucks
23:25 | danneh_ | wpull might
23:25 | balrog | perhaps, using phantomjs
23:26 | yipdw | wpull in phantomjs does, but not without that
23:26 | yipdw | that said, wpull's phantomjs mode only scrolls
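For reference, wpull's PhantomJS integration at the time was enabled with a single flag, roughly as below; as yipdw notes it mainly scrolls the page to trigger lazy loading, so it would not click through the #! fragment navigation on the QuickSpecs viewer by itself (flags taken from wpull's documentation of that era and worth re-checking):

    wpull 'http://h18000.www1.hp.com/cpq-products/quickspecs/productbulletin.html' \
        --phantomjs \
        --warc-file hp-quickspecs \
        --no-robots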
23:26 | balrog | ah :/
23:26 | danneh_ | I'll just try to do it manually then
23:48 | danneh_ | extract URLs, feed them into wget
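A sketch of the manual route danneh_ describes, assuming the real document URLs behind the #! fragments can first be pulled out of the page's JavaScript or API responses; urls.txt is a hypothetical file with one such URL per line:

    # fetch a prepared list of URLs and keep a WARC of the responses
    wget --input-file=urls.txt \
        --warc-file=hp-productbulletin \
        --page-requisites \
        --no-verbose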
23:48 | waxpancak | Hey, all. I'm trying to extract the Upcoming.org save, trying to figure out the most efficient approach to reconstruct a site from .warcs or .megawarcs
23:49 | waxpancak | Is warctozip my best bet?
23:50 | waxpancak | Trying to minimize steps, there's something like 3.5TB of gzipped content
23:50 | balrog | waxpancak: hmm, have to think about it. the WARCs preserve the http headers and all
23:53 | balrog | (FWIW http://wakaba.c3.cx/s/apps/unarchiver has WARC support, since I insisted ;) )
23:53 | waxpancak | Ha, looking for something that'll run under Linux so I can avoid transferring 3.5TB to my desktop.
23:54 | waxpancak | But that's awesome.
23:54 | balrog | (he's ported that utility to the GNUStep libs which is great. It's awesome btw for extracting old .sits and whatever else I very frequently run into)
23:55 | balrog | warc-tools seems to have an extract program
23:55 | balrog | warcat seems to as well. what sort of output format do you need?
23:55 | balrog | just the files dumped out to disk?
23:56 | waxpancak | yep
23:56 | balrog | I've tried warctools before and had issues. might be more stable now though. either one of those should work... based on the documentation.
23:56 | balrog | they're listed at http://www.archiveteam.org/index.php?title=The_WARC_Ecosystem
23:57 | balrog | https://github.com/internetarchive/warctools/ or https://github.com/chfoo/warcat
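Of those two, warcat is the one with a straightforward dump-to-disk mode; per its README, extraction looks roughly like the following (the megawarc filename is a placeholder, and a megawarc can be treated as one very large multi-record WARC):

    pip install warcat
    # unpack the archived documents into files under ./upcoming/
    python -m warcat extract upcoming-001.megawarc.warc.gz \
        --output-dir ./upcoming --progress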
23:57 | balrog | if you run into issues extracting a particular warc, please let me know... I've had platform issues with wget on arm :/
23:57 | balrog | (putting in zero content-lengths and stuff like that)