| Time |
Nickname |
Message |
|
04:29
|
godane |
anyone working on justin.tv? |
|
04:30
|
godane |
SketchCow said on #archiveteam that it's deleting all archive footage in a week |
|
04:48
|
underscor |
ohhdemgir: 700gb is pretty large for an item |
|
04:48
|
underscor |
I don't think that will successfully derive ever |
|
04:49
|
underscor |
not positive, but I think few if any of the workers have that much free space at once |
|
04:49
|
underscor |
(plus room for all the output files) |
|
04:50
|
godane |
SketchCow: i found something useful |
|
04:50
|
godane |
you can grab the id of a video then put it like this: http://api.justin.tv/api/clip/show/502307186.xml |
|
04:50
|
godane |
then get the flv file |
|
05:00
|
godane |
anyways i'm grabbing the twit videos |
|
05:20
|
godane |
so they ip banned me on the api end |
|
06:20
|
godane |
so i figured out my other problem |
|
06:21
|
godane |
i was seeing like 41 hours of twit on the web pages |
|
06:21
|
godane |
but only got about 30 mins |
|
06:21
|
godane |
turns out that clip/show is just one of the files |
|
06:21
|
godane |
the best way is to do it this way: http://api.justin.tv/api/broadcast/by_archive/502307186.xml |
|
06:21
|
godane |
that way you get the full broadcast |
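The two API endpoints mentioned above differ only in their path. A minimal sketch, assuming the numeric video id is the only variable part; the helper names are hypothetical, and since the XML response layout is not shown in this log, nothing is fetched or parsed here:

```python
# Base taken from the URLs quoted in the log.
API_BASE = "http://api.justin.tv/api"


def clip_show_url(video_id):
    """URL for clip/show, which per the log returns only one file of a broadcast."""
    return f"{API_BASE}/clip/show/{int(video_id)}.xml"


def broadcast_by_archive_url(video_id):
    """URL for broadcast/by_archive, which per the log covers the full broadcast."""
    return f"{API_BASE}/broadcast/by_archive/{int(video_id)}.xml"


print(broadcast_by_archive_url(502307186))
```

Since the API end reportedly IP-bans heavy users, any real loop over ids would presumably need to be throttled.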
|
06:37
|
underscor |
youtube-dl will do it too |
|
06:37
|
underscor |
and get all the pieces |
|
06:37
|
underscor |
it can even save the metadata, too |
|
06:38
|
underscor |
youtube-dl http://justin.tv/marugawa/b/533453715 --restrict-filenames --write-info-json --write-annotations --write-thumbnail |
|
07:45
|
godane |
can i just give one of you guys a list of urls? |
|
07:45
|
godane |
i'm grabbing it based on the api |
|
08:02
|
godane |
it's now close to 15k videos just for twit alone |
|
08:03
|
godane |
i'm going to pastebin the video list |
|
08:03
|
godane |
i don't think i have the storage to grab it anyways |
|
08:04
|
godane |
with most videos of twit being 500mb x 15k |
|
08:05
|
godane |
1G x 7500 |
|
08:05
|
godane |
shit |
|
08:05
|
godane |
make that 17k |
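The back-of-the-envelope storage figures above can be checked with a short sketch. It assumes decimal units (1 TB = 1,000,000 MB) and a flat per-video average, both simplifications; the function name is hypothetical:

```python
def estimate_tb(video_count, avg_mb):
    """Total size in decimal terabytes for video_count files of avg_mb megabytes each."""
    return video_count * avg_mb / 1_000_000


# Figures quoted in the log.
print(estimate_tb(15_000, 500))   # 500 MB x 15k
print(estimate_tb(7_500, 1_000))  # 1 GB x 7500
print(estimate_tb(17_000, 500))   # revised count of 17k at 500 MB
```

The first two cases both come to 7.5 TB, and the revised count of 17k at 500 MB each comes to 8.5 TB.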
|
08:10
|
godane |
part 1: http://pastebin.com/SzBzn6XE |
|
08:12
|
godane |
part 2: http://pastebin.com/rSEBJHjr |
|
08:13
|
godane |
part 3: |
|
08:13
|
godane |
part 3: http://pastebin.com/Fbnk9B8f |
|
08:14
|
godane |
part 4: http://pastebin.com/7vhSqCVV |
|
08:15
|
godane |
underscor: you can use the backend of IA right? |
|
09:01
|
midas |
godane: grab them one by one or all at once, how strict is the banning policy? |
|
09:05
|
godane |
i don't know if there's strict banning with videos |
|
09:05
|
godane |
but there is when hitting the api |
|
09:06
|
godane |
i'm grabbing geek beat live episodes |
|
09:06
|
midas |
part1, lots missing |
|
09:07
|
godane |
did you get stuff in 2010-4 folders? |
|
09:08
|
godane |
i only ask cause there was a lot there |
|
09:13
|
midas |
right, using more than 2 wgets will break it, it seems |
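One way to respect the "no more than 2 at once" observation is to cap the number of parallel workers. This is a sketch only, not what anyone in the channel actually ran: `fetch_all` is a hypothetical helper, and the `fetch` callable (for example a wrapper that shells out to wget) is left to the caller:

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=2):
    """Call fetch(url) for every URL with at most max_workers running at once.

    Results come back in the same order as the input URLs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

The default of 2 mirrors the limit reported above; whether the server tolerates even that for long is not established in the log.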
|
09:14
|
midas |
getting the videos is easy like this, but we probably need some metadata to go with that |
|
09:22
|
godane |
i think people are grabbing a ton of stuff from justin.tv |
|
09:22
|
godane |
keep getting error page |
|
09:22
|
godane |
*pages |
|
09:43
|
midas |
justin downloads are between 60K and 18MB/s so yeah, no idea! :p |
|
17:07
|
antomatic |
any help needed with justin? |
|
17:09
|
Smiley |
antomatic: talk to midas / SketchCow |
|
18:12
|
midas |
and to add, godane |
|
21:09
|
SketchCow |
We need help. |
|
21:20
|
godane |
SketchCow: here is an example of the api to grab the justin.tv archive broadcast: http://api.justin.tv/api/broadcast/by_archive/326563367.xml |
|
21:21
|
godane |
the number is the video id |
|
21:22
|
godane |
midas has my 4 part twit channel list |
|
21:22
|
godane |
since that has close to 4 years of twit.tv 24 live stream |
|
21:23
|
godane |
based on what i can tell it may be close to 7.5TB to 10TB alone |
|
21:23
|
godane |
midas: recheck 404 errors |
|
21:24
|
godane |
the trove has given me fake 404 errors, so they may be fake |
|
22:31
|
SketchCow |
Only I care about this, but.... |
|
22:32
|
SketchCow |
I wrote this word cloud generation into subject keywords thing on IA. |
|
22:32
|
SketchCow |
It worked, but occasionally it hit some bug in the system |
|
22:32
|
SketchCow |
Fixed it... now adding those one or two words every 12 items that were missing |
|
22:53
|
godane |
you guys will be getting a GeekBeat.TV Live collection at some point |
|
22:53
|
godane |
based on these justin.tv rips |
|
23:48
|
SadDM |
SketchCow: for the record, I care... that thing is super-awesome. |