#archiveteam-bs 2014-05-31,Sat


Time Nickname Message
04:29 🔗 godane anyone working on justin.tv?
04:30 🔗 godane SketchCow said on #archiveteam that it's deleting all archive footage in a week
04:48 🔗 underscor ohhdemgir: 700gb is pretty large for an item
04:48 🔗 underscor I don't think that will successfully derive ever
04:49 🔗 underscor not positive, but I think few if any of the workers have that much free space at once
04:49 🔗 underscor (plus room for all the output files)
04:50 🔗 godane SketchCow: i found something useful
04:50 🔗 godane you can grab the id of a video then put it like this: http://api.justin.tv/api/clip/show/502307186.xml
04:50 🔗 godane then get the flv file
05:00 🔗 godane anyways i'm grabbing the twit videos
05:20 🔗 godane so they ip banned me on the api end
06:20 🔗 godane so i figured out my other problem
06:21 🔗 godane i was seeing like 41 hours of twit on the web pages
06:21 🔗 godane but only got about 30 mins
06:21 🔗 godane turns out that clip/show is just one of the files
06:21 🔗 godane the best way is do it this way: http://api.justin.tv/api/broadcast/by_archive/502307186.xml
06:21 🔗 godane that way you get the full broadcast
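
The two endpoint patterns godane describes can be sketched as a pair of URL builders. This is only a minimal illustration: the function names and the `API_BASE` constant are made up for the example, and nothing here parses the XML the API returns, since the log does not show its layout.

```python
# Hypothetical helpers; only the URL patterns come from the log above.
API_BASE = "http://api.justin.tv/api"


def clip_url(video_id):
    """URL for a single clip, which is just one file of a broadcast."""
    return f"{API_BASE}/clip/show/{video_id}.xml"


def broadcast_url(video_id):
    """URL for the full broadcast that the archive id belongs to."""
    return f"{API_BASE}/broadcast/by_archive/{video_id}.xml"


if __name__ == "__main__":
    print(clip_url(502307186))
    print(broadcast_url(502307186))
```

Per the messages above, the `broadcast/by_archive` form is the one to prefer when the goal is every piece of a long stream rather than a single 30-minute file.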
06:37 🔗 underscor youtube-dl will do it too
06:37 🔗 underscor and get all the pieces
06:37 🔗 underscor it can even save the metadata, too
06:38 🔗 underscor youtube-dl http://justin.tv/marugawa/b/533453715 --restrict-filenames --write-info-json --write-annotations --write-thumbnail
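
For a whole list of URLs, the same youtube-dl invocation could be assembled as an argument list, roughly as below. The flags are the ones underscor gives; the `ytdl_command` helper is a hypothetical name, and the sketch only builds the command rather than running it.

```python
# Flags copied from the command in the log; the helper name is an assumption.
YTDL_FLAGS = [
    "--restrict-filenames",
    "--write-info-json",
    "--write-annotations",
    "--write-thumbnail",
]


def ytdl_command(url):
    """Return the argv list for one youtube-dl download with metadata."""
    return ["youtube-dl", url, *YTDL_FLAGS]


if __name__ == "__main__":
    # Pass the list to subprocess.run(..., check=True) to actually download.
    print(" ".join(ytdl_command("http://justin.tv/marugawa/b/533453715")))
```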
07:45 🔗 godane can i just give one of you guys a list of urls?
07:45 🔗 godane i'm grabbing it based on the api
08:02 🔗 godane it's now close to 15k videos just for twit alone
08:03 🔗 godane i'm going to pastebin the video list
08:03 🔗 godane i don't think i have the storage to grab it anyways
08:04 🔗 godane with most videos of twit being 500mb x 15k
08:05 🔗 godane 1G x 7500
08:05 🔗 godane shit
08:05 🔗 godane make that 17k
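
The back-of-the-envelope storage figures above work out as follows. This is a rough sketch that assumes decimal units (1 TB = 1,000,000 MB) and takes the 500 MB and 1 GB averages as godane's guesses, not measured sizes.

```python
def estimate_tb(video_count, avg_mb):
    """Total size in decimal terabytes for video_count files of avg_mb each."""
    return video_count * avg_mb / 1_000_000


if __name__ == "__main__":
    print(estimate_tb(15_000, 500))   # 500 MB x 15k
    print(estimate_tb(7_500, 1_000))  # 1 GB x 7500
    print(estimate_tb(17_000, 500))   # revised count of 17k
```

Both of the first two estimates come to 7.5 TB, and the revised 17k count comes to 8.5 TB at the same 500 MB average.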
08:10 🔗 godane part 1: http://pastebin.com/SzBzn6XE
08:12 🔗 godane part 2: http://pastebin.com/rSEBJHjr
08:13 🔗 godane part 3: http://pastebin.com/Fbnk9B8f
08:14 🔗 godane part 4: http://pastebin.com/7vhSqCVV
08:15 🔗 godane underscor: you can use the backend of IA right?
09:01 🔗 midas godane: grab them one by one or all at once, how strict is the banning policy?
09:05 🔗 godane i don't know if there's strict banning with videos
09:05 🔗 godane but there is when hitting the api
09:06 🔗 godane i'm grabbing geek beat live episodes
09:06 🔗 midas part1, lots missing
09:07 🔗 godane did you get stuff in 2010-4 folders
09:08 🔗 godane i only ask cause there was a lot there
09:13 🔗 midas right, using more than 2 wgets will break it, it seems
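
One way to stay at or under the two-download limit midas observed is a fixed-size worker pool. The sketch below is an assumption-laden illustration: `fetch_all` and `MAX_PARALLEL` are hypothetical names, and the `fetch` callable (for example, a wrapper that shells out to wget) is left to the caller.

```python
from concurrent.futures import ThreadPoolExecutor

# Two at a time, per the observation above; not a documented server limit.
MAX_PARALLEL = 2


def fetch_all(urls, fetch, max_parallel=MAX_PARALLEL):
    """Run fetch(url) for every URL with at most max_parallel in flight.

    Results come back in the same order as the input list.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(fetch, urls))


if __name__ == "__main__":
    # Stand-in fetch that only echoes the URL, so the sketch runs offline.
    print(fetch_all(["http://example.invalid/a", "http://example.invalid/b"], str))
```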
09:14 🔗 midas getting the videos is easy like this, but we probably need some metadata to go with that
09:22 🔗 godane i think people are grabbing a ton of stuff from justin.tv
09:22 🔗 godane keep getting error page
09:22 🔗 godane *pages
09:43 🔗 midas justin downloads are between 60K and 18MB/s so yeah, no idea! :p
17:07 🔗 antomatic any help needed with justin?
17:09 🔗 Smiley antomatic: talk to midas / SketchCow
18:12 🔗 midas and to add, godane
21:09 🔗 SketchCow We need help.
21:20 🔗 godane SketchCow: here is an example of the api to grab the justin.tv archive broadcast: http://api.justin.tv/api/broadcast/by_archive/326563367.xml
21:21 🔗 godane the number is the video id
21:22 🔗 godane midas has my 4 part twit channel list
21:22 🔗 godane since that has close to 4 years of twit.tv 24/7 live stream
21:23 🔗 godane based on what i can tell it may be close to 7.5TB to 10TB alone
21:23 🔗 godane midas: re check 404 errors
21:24 🔗 godane the trove has given me fake 404 errors so they may be fake
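
Re-checking suspected fake 404s could look something like the sketch below: probe each failed URL a few more times and keep only the ones that return 404 on every attempt. The helper names, the attempt count, and the use of HEAD requests are all assumptions for illustration; the log does not say how the re-check was actually done or whether the server answers HEAD the same way as GET.

```python
import time
import urllib.error
import urllib.request


def http_status(url):
    """Return the HTTP status for a HEAD request (network errors propagate)."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def recheck_404s(urls, probe=http_status, attempts=3, delay=0.0):
    """Return the URLs that answered 404 on every one of `attempts` probes."""
    still_missing = list(urls)
    for attempt in range(attempts):
        still_missing = [u for u in still_missing if probe(u) == 404]
        if not still_missing:
            break
        if attempt < attempts - 1:
            time.sleep(delay)  # pause between rounds to avoid hammering the server
    return still_missing
```

Anything that drops out of the returned list answered with something other than 404 at least once, so it is worth queueing for another download attempt.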
22:31 🔗 SketchCow Only I care about this, but....
22:32 🔗 SketchCow I wrote this word cloud generation into subject keywords thing on IA.
22:32 🔗 SketchCow It worked, but occasionally it hit some bug in the system
22:32 🔗 SketchCow Fixed it... now adding those one or two words every 12 items that were missing
22:53 🔗 godane you guys will be getting a GeekBeat.TV Live collection at some point
22:53 🔗 godane based on these justin.tv rips
23:48 🔗 SadDM SketchCow: for the record, I care... that thing is super-awesome.
