Time |
Nickname |
Message |
04:29
|
godane |
anyone working on justin.tv? |
04:30
|
godane |
SketchCow said on #archiveteam that it's deleting all archive footage in a week |
04:48
|
underscor |
ohhdemgir: 700gb is pretty large for an item |
04:48
|
underscor |
I don't think that will successfully derive ever |
04:49
|
underscor |
not positive, but I think few if any of the workers have that much free space at once |
04:49
|
underscor |
(plus room for all the output files) |
04:50
|
godane |
SketchCow: i found something useful |
04:50
|
godane |
you can grab the id of a video then put it like this: http://api.justin.tv/api/clip/show/502307186.xml |
04:50
|
godane |
then get the flv file |
05:00
|
godane |
anyways i'm grabbing the twit videos |
05:20
|
godane |
so they ip banned me on the api end |
06:20
|
godane |
so i figured out my other problem |
06:21
|
godane |
i was seeing like 41 hours of twit on the web pages |
06:21
|
godane |
but only got about 30 mins |
06:21
|
godane |
turns out that clip/show is just one of the files |
06:21
|
godane |
the best way is to do it this way: http://api.justin.tv/api/broadcast/by_archive/502307186.xml |
06:21
|
godane |
that way you get the full broadcast |
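The two endpoints godane quotes differ only in their path. Below is a minimal Python sketch of building both URLs and pulling .flv links out of a response. The log does not show the XML layout, so the code scans every element instead of assuming tag names; the helper names are hypothetical, and the API may not behave this way at all.

```python
import xml.etree.ElementTree as ET

# Host and paths exactly as quoted in the log above.
API_BASE = "http://api.justin.tv/api"


def clip_url(video_id):
    """URL for clip/show, which godane found returns only one file of a broadcast."""
    return f"{API_BASE}/clip/show/{video_id}.xml"


def broadcast_url(video_id):
    """URL for broadcast/by_archive, which godane reports covers the full broadcast."""
    return f"{API_BASE}/broadcast/by_archive/{video_id}.xml"


def find_flv_links(xml_text):
    """Return every element text that looks like an .flv URL, in document order.

    Assumption: the links appear as element text. No tag names are assumed,
    because the log does not show the response format.
    """
    root = ET.fromstring(xml_text)
    links = []
    for element in root.iter():
        text = (element.text or "").strip()
        if text.startswith("http") and text.endswith(".flv"):
            links.append(text)
    return links
```

Fetching the XML itself is left out on purpose, since godane was IP banned after hitting the API heavily.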
06:37
|
underscor |
youtube-dl will do it too |
06:37
|
underscor |
and get all the pieces |
06:37
|
underscor |
it can even save the metadata, too |
06:38
|
underscor |
youtube-dl http://justin.tv/marugawa/b/533453715 --restrict-filenames --write-info-json --write-annotations --write-thumbnail |
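For a whole list of broadcast URLs, underscor's command can be wrapped in a small Python helper. This is only a sketch: it assumes `youtube-dl` is installed and on the PATH, uses exactly the flags from the message above, and the function names are made up for illustration.

```python
import subprocess

# Flags copied from underscor's command line; nothing added.
YOUTUBE_DL_FLAGS = [
    "--restrict-filenames",
    "--write-info-json",
    "--write-annotations",
    "--write-thumbnail",
]


def youtube_dl_command(url):
    """Build the argument list for one broadcast URL."""
    return ["youtube-dl", url, *YOUTUBE_DL_FLAGS]


def download(url):
    """Run youtube-dl for one URL and return its exit code (0 means success)."""
    return subprocess.run(youtube_dl_command(url), check=False).returncode
```

Passing the arguments as a list avoids shell quoting problems if a URL contains characters such as `&`.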
07:45
|
godane |
can i just give one of you guys a list of urls? |
07:45
|
godane |
i'm grabbing it based on the api |
08:02
|
godane |
it's now close to 15k videos just for twit alone |
08:03
|
godane |
i'm going to pastebin the video list |
08:03
|
godane |
i don't think i have the storage to grab it anyways |
08:04
|
godane |
with most videos of twit being 500mb x 15k |
08:05
|
godane |
1G x 7500 |
08:05
|
godane |
shit |
08:05
|
godane |
make that 17k |
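godane's back-of-the-envelope storage figures can be checked with a few lines of Python. The sketch assumes decimal units (1 TB = 1,000,000 MB) and treats 500 MB as a rough per-video average, as in the messages above; `estimate_tb` is a hypothetical helper name.

```python
def estimate_tb(video_count, avg_size_mb):
    """Total size in terabytes, using decimal units (1 TB = 1,000,000 MB)."""
    return video_count * avg_size_mb / 1_000_000


# The three estimates mentioned in the log.
first_guess = estimate_tb(15_000, 500)     # 500mb x 15k
same_total = estimate_tb(7_500, 1_000)     # 1G x 7500
revised_guess = estimate_tb(17_000, 500)   # after "make that 17k"
```

Both of the first two figures come to 7.5 TB, and the revised count of 17k videos gives 8.5 TB at the same average size.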
08:10
|
godane |
part 1: http://pastebin.com/SzBzn6XE |
08:12
|
godane |
part 2: http://pastebin.com/rSEBJHjr |
08:13
|
godane |
part 3: |
08:13
|
godane |
part 3: http://pastebin.com/Fbnk9B8f |
08:14
|
godane |
part 4: http://pastebin.com/7vhSqCVV |
08:15
|
godane |
underscor: you can use the backend of IA right? |
09:01
|
midas |
godane: grab them one by one or all at once, how strict is the banning policy? |
09:05
|
godane |
i don't know if there's strict banning with videos |
09:05
|
godane |
but there is when hitting the api |
09:06
|
godane |
i'm grabbing geek beat live episodes |
09:06
|
midas |
part1, lots missing |
09:07
|
godane |
did you get stuff in 2010-4 folders |
09:08
|
godane |
i only ask cause there was a lot there |
09:13
|
midas |
right, using more than 2 wgets will break it, it seems |
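midas's observation suggests capping parallel downloads at two. A minimal Python sketch of that cap follows; `download_all` and the `download_one` callback are hypothetical names, and the limit of 2 is taken from midas's message, not from any documented justin.tv policy.

```python
from concurrent.futures import ThreadPoolExecutor

# midas reported breakage with more than 2 simultaneous wgets.
MAX_PARALLEL = 2


def download_all(urls, download_one, max_parallel=MAX_PARALLEL):
    """Call download_one(url) for every URL, never more than max_parallel at once.

    Results are returned in the same order as the input URLs.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(download_one, urls))
```

Here `download_one` could be any function that fetches a single URL, for example one that shells out to wget.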
09:14
|
midas |
getting the videos is easy like this, but we probably need some metadata to go with that |
09:22
|
godane |
i think people are grabbing a ton of stuff from justin.tv |
09:22
|
godane |
keep getting error page |
09:22
|
godane |
*pages |
09:43
|
midas |
justin downloads are between 60K and 18MB/s so yeah, no idea! :p |
17:07
|
antomatic |
any help needed with justin? |
17:09
|
Smiley |
antomatic: talk to midas / SketchCow |
18:12
|
midas |
and to add, godane |
21:09
|
SketchCow |
We need help. |
21:20
|
godane |
SketchCow: here is an example of the api to grab the justin.tv archive broadcast: http://api.justin.tv/api/broadcast/by_archive/326563367.xml |
21:21
|
godane |
the number is the video id |
21:22
|
godane |
midas has my 4 part twit channel list |
21:22
|
godane |
since that has close to 4 years of twit.tv 24 live stream |
21:23
|
godane |
based on what i can tell it may be close to 7.5TB to 10TB alone |
21:23
|
godane |
midas: re check 404 errors |
21:24
|
godane |
the trove has given me fake 404 errors, so they may be fake |
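Re-checking suspect 404s can be sketched as a simple retry loop in Python. Everything here is an assumption for illustration: `recheck_404s` and the `get_status` callback are hypothetical names, and the attempt count and delay are arbitrary defaults, not values anyone in the channel suggested.

```python
import time


def recheck_404s(urls, get_status, attempts=3, delay_seconds=5.0):
    """Return the URLs that still answer 404 after `attempts` tries each.

    get_status(url) must return the HTTP status code for one request.
    A URL is dropped from the result as soon as any try is not a 404.
    """
    still_missing = []
    for url in urls:
        for attempt in range(attempts):
            if get_status(url) != 404:
                break  # the earlier 404 was apparently spurious
            if attempt < attempts - 1:
                time.sleep(delay_seconds)  # wait before trying again
        else:
            # Every attempt returned 404, so treat it as genuinely missing.
            still_missing.append(url)
    return still_missing
```

Only URLs that fail every attempt stay on the missing list, so a single spurious 404 no longer drops a video from the grab.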
22:31
|
SketchCow |
Only I care about this, but.... |
22:32
|
SketchCow |
I wrote this word cloud generation into subject keywords thing on IA. |
22:32
|
SketchCow |
It worked, but occasionally it hit some bug in the system |
22:32
|
SketchCow |
Fixed it... now adding those one or two words every 12 items that were missing |
22:53
|
godane |
you guys will be getting a GeekBeat.TV Live collection at some point |
22:53
|
godane |
based on these justin.tv rips |
23:48
|
SadDM |
SketchCow: for the record, I care... that thing is super-awesome. |