#archiveteam-bs 2013-12-06,Fri

Time Nickname Message
00:03 🔗 w0rp I'm going to be running my YouTube backup script from a cron job for a list of usernames, and I'll put danooct1 in there. That's the guy I've mentioned before who's been recording the effects of old PC viruses.
00:04 🔗 DFJustin what are you using, youtube-dl?
00:05 🔗 w0rp https://github.com/w0rp/riptube I wrote a new one.
00:05 🔗 DFJustin does it handle youtube's new crap for 1080p videos
00:05 🔗 w0rp The sucky part is that it gets a bunch of request errors, seemingly at random, so it kind of just retries video downloads with a short timeout again and again until it gets it.
00:06 🔗 Marcelo danooct1 was one of my first subscriptions
00:06 🔗 w0rp The good thing is that you can pretty much max out your connection speed downloading from YouTube, so it's kind of like instant download, instant download, error, error, instant download, etc.
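The retry behaviour w0rp describes (short timeout, retry until the download succeeds) can be sketched roughly as below. This is a minimal illustration, not riptube's actual code: the function name `fetch_with_retries`, its parameters, and the injectable `opener` are all assumptions made for the example.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retries(url, timeout=5.0, max_attempts=10, delay=0.5,
                       opener=urllib.request.urlopen):
    """Download `url`, retrying on request errors with a short timeout.

    Hypothetical helper: the name, defaults, and `opener` hook are
    assumptions for illustration, not taken from riptube.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        try:
            response = opener(url, timeout=timeout)
            try:
                return response.read()
            finally:
                response.close()
        except OSError as error:
            # URLError and socket timeouts are both OSError subclasses
            # on Python 3.3+, so one except clause covers them.
            last_error = error
            time.sleep(delay)
    # Every attempt failed: surface the most recent error.
    raise last_error
```

A short timeout keeps a stalled request from blocking the whole run, and capping the attempts keeps a permanently failing video from looping forever.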
00:07 🔗 DFJustin (youtube-dl silently falls back to 720p and doesn't tell you you didn't get the full quality)
00:08 🔗 w0rp It *should* get the highest quality available for a video, including new stuff.
00:08 🔗 w0rp I know that the 4-5 accounts I downloaded thus far total 40GB, so the video files are pretty big at least. And most of them appear to be webm of some kind.
00:13 🔗 w0rp It saves a JSON file along with the video file with metadata, which looks like this. http://bpaste.net/show/JhutcFaVP7HXkp38aPSf/ (That's for me backing up retsupurae) I may have to test video quality more.
00:16 🔗 ex-parrot w0rp: how have you dealt with the obfuscation algorithms etc?
00:16 🔗 DFJustin ugh needs python 3.3
00:17 🔗 DFJustin anyway https://www.youtube.com/user/officialBEG is an example of an account with 1080p videos if you wanna try a couple
00:17 🔗 w0rp I could probably rewrite the "yield from" line and make it work with 3.2
00:17 🔗 ex-parrot how far away from working with 2.7 is it?
00:17 🔗 ex-parrot I've been working on a "personal archiver" django app but it's all in 2.7
00:18 🔗 DFJustin (cygwin64 python3 is 3.2.5)
00:18 🔗 w0rp I'll play with that. I probably just need to adjust some import lines for the old urllib structure and change the format parts.
00:18 🔗 w0rp That and account for str/unicode and bytes/str
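The two adjustments w0rp mentions (import lines for the old urllib layout, and the str/unicode versus bytes/str split) usually look something like the sketch below. The names `text_type`, `binary_type`, and `to_text` are assumptions for illustration and are not taken from riptube.

```python
import sys

try:
    # Python 3 layout
    from urllib.parse import urlencode
    from urllib.request import urlopen
except ImportError:
    # Python 2 layout
    from urllib import urlencode
    from urllib2 import urlopen

if sys.version_info[0] >= 3:
    text_type = str
    binary_type = bytes
else:
    text_type = unicode  # noqa: F821 (only defined on Python 2)
    binary_type = str


def to_text(value, encoding="utf-8"):
    """Return `value` as text, decoding it if it is a byte string.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, binary_type):
        return value.decode(encoding)
    return value
```

Decoding once at the boundary (for example, right after reading a response body) keeps the rest of the code working with text on both major versions.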
00:19 🔗 ex-parrot my current plan for youtube is to just shell out to jwz's solution
00:21 🔗 w0rp What's kind of annoying is that Python 3.3 is the earliest Python 3 version with u"" support.
00:21 🔗 DFJustin ah dang
00:23 🔗 DFJustin for my own needs I've ended up using a modified version of this script https://github.com/rg3/youtube-dl/issues/1643#issuecomment-28317090
00:59 🔗 w0rp DFJustin: I just added Python 2.7 and 3.2 support.
00:59 🔗 w0rp I haven't tested the 3.2 support, but I have tested the 2.7 support a little.
01:00 🔗 w0rp As always, I can't feel good about unicode in Python 2.
01:04 🔗 DFJustin AttributeError: 'datetime.datetime' object has no attribute 'timestamp'
01:04 🔗 DFJustin File "./riptube.py", line 585, in to_epoch
01:04 🔗 DFJustin return datetime_obj.timestamp()
01:05 🔗 DFJustin going afk for a while
01:11 🔗 w0rp DFJustin: You can try again now. Turns out that method was introduced in 3.3.
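`datetime.timestamp()` only exists from Python 3.3 onward, which is what the traceback above runs into on 3.2. One way to write a `to_epoch` that avoids it is sketched here. The function name comes from the traceback, but the body is an assumption and not necessarily the fix that went into riptube.

```python
import calendar
import datetime
import time


def to_epoch(datetime_obj):
    """Return seconds since the Unix epoch without datetime.timestamp().

    Sketch only: naive datetimes are treated as local time, which
    mirrors what datetime.timestamp() does on Python 3.3+.
    """
    fraction = datetime_obj.microsecond / 1e6
    if datetime_obj.tzinfo is None:
        # Naive datetime: interpret as local time.
        return time.mktime(datetime_obj.timetuple()) + fraction
    # Aware datetime: normalise to UTC, then count seconds from the epoch.
    return calendar.timegm(datetime_obj.utctimetuple()) + fraction
```

For example, an aware datetime for 2013-12-06 00:00:00 UTC maps to 1386288000.0.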
01:35 🔗 w0rp https://www.youtube.com/watch?v=Q5okb9Vc8SY Kinect? SOLD.
01:43 🔗 ivan` w0rp: I don't get why you're writing youtube-dl
01:43 🔗 ivan` does riptube have incompatible design goals?
01:45 🔗 w0rp I want the video metadata in an easy to read format with the best possible video quality for all videos in an account, with an implementation of reasonable quality.
01:45 🔗 ivan` if youtube-dl's HTTP code sucks (super-long timeouts and lack of retries) I'm sure they would take a patch that fixed it, and it would work with a few hundred other video platforms
01:45 🔗 ivan` oh
01:47 🔗 w0rp youtube-dl's code is pretty unreadable.
01:51 🔗 ivan` it does tend to go on and on
01:54 🔗 ivan` but it does defeat their flash and js encryption schemes
01:54 🔗 ivan` and get subtitles and annotations and thumbnails
02:13 🔗 DFJustin nope this is getting the 720p version of 1080p videos
02:47 🔗 ivan` wayback's coverage of the long tail of the english wikipedia is somewhat lacking
03:32 🔗 dashcloud it's really kind of amazing that this logo even exists: https://twitter.com/ODNIgov/status/408712553179533312/photo/1
03:43 🔗 dashcloud it's an octopus sitting on top of Earth with lettering around the edges of the patch: NROL-39, and "Nothing is beyond our reach" (apparently the NRO's slogan)
11:57 🔗 Arkiver2 chaosradio.ccc.de download already grown to 100 GB...
11:57 🔗 Arkiver2 in just two days. O.o
12:29 🔗 GLaDOS o. http://scr.terrywri.st/1386332968.png
12:36 🔗 Baljem sounds about right
13:02 🔗 joepie91 chfoo: good work on the docs!
13:02 🔗 joepie91 from cursory reading, they seem pretty solid and well laid-out
13:03 🔗 joepie91 (oh, and the diagram is awesome :D)
13:03 🔗 Baljem +1
15:15 🔗 chfoo thanks
15:20 🔗 chfoo i think it's worth documenting as much as we can
15:26 🔗 chfoo speaking of documentation, i'd love to see how geocities was saved technically. ie, the process of finding usernames, programs used, etc
15:28 🔗 Schbirid we need an archiveteam movie kickstarter
15:31 🔗 Baljem that might be more SketchCow than even his usual documentary audiences can handle ;)
22:53 🔗 SketchCow Noooooo
22:53 🔗 SketchCow Well, maybe
22:53 🔗 SketchCow But not directed by me
22:53 🔗 SketchCow Someone else doing the travel, that would be neat
23:33 🔗 joepie91 SketchCow: don't lie, you just -love- overnight flights
23:33 🔗 joepie91 :)
23:42 🔗 BlueMax probably the only place where he can escape the nightmarish realm of archiving everything. :P
23:57 🔗 SketchCow Overnighting
23:57 🔗 SketchCow This one was me at 7:30am and arriving at 11am
23:57 🔗 SketchCow Westbound flight
