#archiveteam 2014-06-17,Tue

Time Nickname Message
00:04 🔗 SketchCow Understood and known.
00:04 🔗 SketchCow Hey, anyone got a server they can give me an account on to run firefox and an X server (small one) on?
00:35 🔗 SN4T14 SketchCow, still need that server?
00:45 🔗 SketchCow I'd like to play around on one, yes.
00:46 🔗 SN4T14 ohhdemgir, are you there?
09:16 🔗 midas https://blog.box.com/2014/06/box-acquiring-streem-bringing-the-cloud-to-your-desktop/
09:20 🔗 garyrh "Streem has been acquired by Box! We're creating an optional migration path so all of your data will be safe!"
09:39 🔗 Nemo_bis "optional" and "all" don't get along well
09:43 🔗 garyrh looks like streem has disabled all public videos.
10:01 🔗 garyrh actually, the videos are still technically accessible
10:01 🔗 garyrh e.g. https://streem.s3.amazonaws.com/objects/9b38131adb0b0c6e36830d8fbeeb3fb4/LinuxCon_and_CloudOpen_North_America_2013_-_Linux_Kernel_Panel.mp4
10:02 🔗 ohhdemgir SN4T14, you can give SketchCo1 an account on arc01 if you want
11:30 🔗 * SketchCow jumps up and down at the counter
11:32 🔗 BlueMax one million hard drives, sir?
11:45 🔗 midas Ryan Kearney is delivering his 1PB drive
14:14 🔗 SN4T14 SketchCow, hang on, let me get you an account. :p
14:19 🔗 SN4T14 Although, you will have to set up the DE and everything yourself (or you can set up a VM and have the installer do it for you)
15:28 🔗 Arkiver2 http://www.theguardian.com/technology/2014/jun/17/youtube-indie-labels-music-subscription
15:33 🔗 db48x Arkiver2: fun
15:34 🔗 db48x Arkiver2: how will we identify the videos that are likely to be taken down?
15:34 🔗 Arkiver2 db48x: I have no idea
15:34 🔗 Arkiver2 is it public with which music labels google made an agreement?
15:34 🔗 Arkiver2 Does google release those kind of contracts?
15:35 🔗 db48x I don't suppose there's a list of independant artists and their youtube channels...
15:35 🔗 db48x hah, no
15:38 🔗 Arkiver2 found some
15:38 🔗 Arkiver2 Adele and Arctic Monkeys are two examples of artists that are going to be blocked
15:38 🔗 Arkiver2 so let's do those at least
15:39 🔗 db48x https://www.youtube.com/playlist?list=PL55DF5F0E7C2C2DD3
15:39 🔗 db48x https://www.youtube.com/playlist?list=PLV7t4yekvqhv9x2_P8Jvue6IvtH4LCm20
15:39 🔗 db48x https://www.youtube.com/user/notsignedtv
15:40 🔗 ivan` prepare for blocked youtube content http://www.theguardian.com/technology/2014/jun/17/youtube-indie-labels-music-subscription
15:40 🔗 Arkiver2 ivan`: we're discussing it already :P
15:40 🔗 ivan` I missed it :)
15:40 🔗 Arkiver2 lol
15:40 🔗 Arkiver2 we're searching for independent artists
15:41 🔗 db48x the best way I know of to download a youtube channel is to use http://www.jwz.org/hacks/youtubefeed.pl
15:41 🔗 ivan` does that get you 1080p and 256kbit DASH?
15:41 🔗 Arkiver2 db48x: but that doesn't create a warc with youtube vids right?
15:41 🔗 db48x yea, it sorts by quality and grabs the best one
15:41 🔗 db48x Arkiver2: nope
15:42 🔗 db48x ivan`: of course the list of video types could easily be wrong or out of date, we'd want to double-check :)
15:42 🔗 ivan` db48x: have you observed it download a 1080p video after 2013-10?
15:42 🔗 db48x yes
15:42 🔗 db48x well
15:42 🔗 Arkiver2 db48x: I'll start crawls with heritrix on the channels of some artists so the pages of their videos are saved in the warcs, BUT NOT THE VIDEOS THEMSELVES
15:42 🔗 db48x in 2013 yes, dunno about after november actually...
15:43 🔗 db48x ivan`: we could modify it to just grab all the offered videos, that would do the trick
15:43 🔗 ivan` anyway this is how I use youtube-dl https://www.refheap.com/d97ee2660f3ebec52c8265f1e/raw
15:43 🔗 Nemo_bis emijrp made me upload thousands of videos with https://code.google.com/p/emijrp/source/browse/trunk/scrapers/youtube2internetarchive.py :)
15:43 🔗 db48x already have to modify it not to skip videos older than 2 days
15:45 🔗 db48x I think the article is probably a bit alarmist though
15:46 🔗 db48x I can't see them banning every account where someone sat down with a guitar in front of a camera
15:46 🔗 yipdw they'll ban you if you don't use your Real Name
15:46 🔗 yipdw at least
15:46 🔗 db48x on the other hand, they could just review all videos that seem to have music but that don't fall afoul of their stupid copyrighted-music detector
15:47 🔗 yipdw Rumour Has It that that's the case
15:47 🔗 yipdw I guess everyone will have to find Someone Like Youtube
15:48 🔗 db48x still, let's engage our paranoia anyway
15:51 🔗 joepie91_ Arkiver2, db48x: not a whole lot of "you" left in "youtube"
15:51 🔗 db48x heh
15:52 🔗 joepie91_ also
15:52 🔗 joepie91_ [17:46] <db48x> I can't see them banning every account where someone sat down with a guitar in front of a camera
15:52 🔗 joepie91_ you'd be amazed
16:01 🔗 db48x yea, I couldn't imagine them finding them all, but then I remembered their copyrighted-music detector
16:02 🔗 db48x anything that _doesn't_ get flagged by that but that doesn't look like speech is basically going to be independant music
16:05 🔗 yipdw it's good to know that Youtube is no better than the labels
16:05 🔗 yipdw creative destruction of expectations
16:16 🔗 SN4T14 db48x, stop being silly and start using youtube-dl. :p
16:20 🔗 db48x youtube-dl won't download a whole rss feed
16:20 🔗 SN4T14 Why do you need RSS feeds?
16:20 🔗 db48x so that I can go around finding and downloading whole channels
16:21 🔗 db48x rather than individual videos
16:21 🔗 Arkiver2 I have a windows program here that downloads videos in the best quality
16:22 🔗 SN4T14 db48x, youtube-dl can download entire channels
16:22 🔗 schbirid youtube-dl does that just fine afaik
16:22 🔗 schbirid and playlists etc
16:22 🔗 SN4T14 No need to mess around with RSS feeds when there's simpler ways. ;)
16:23 🔗 db48x it'd be nice if the documentation mentioned that
16:24 🔗 schbirid man youtube-dl :P
16:25 🔗 SN4T14 ^
16:25 🔗 db48x obviously that's no help if you haven't installed it because it looks like it won't do what you want :)
16:25 🔗 schbirid heh true
16:26 🔗 SN4T14 also https://www.google.is/search?q=youtube+dl+entire+channel
16:26 🔗 SN4T14 ;)
16:26 🔗 SN4T14 Yes, that is an Icelandic Google link because I'm lazy. :p
16:27 🔗 db48x you're supposed to use lmgtfy.com :)
16:28 🔗 SN4T14 www.lmgtfy.com/?q=no
16:28 🔗 SN4T14 :p
16:28 🔗 db48x heh
16:28 🔗 db48x should we build a list of channels on an etherpad or the wiki or something?
16:29 🔗 SN4T14 piratepad! :D
16:29 🔗 db48x :)
16:29 🔗 schbirid wiki
16:30 🔗 schbirid isnt it what its for?
16:30 🔗 db48x wikis are great, but there's more round-trip time
16:31 🔗 SN4T14 Make a wiki page for it, and link to the piratepad list. ;)
16:31 🔗 joepie91_ db48x: or, y'know, the bottom of http://rg3.github.io/youtube-dl/supportedsites.html
16:31 🔗 joepie91_ :P
16:31 🔗 db48x http://piratepad.net/C2ioWiy8fG
16:32 🔗 SN4T14 youtube-dl supports downloading archive.org, we should archive it! :D
16:32 🔗 joepie91_ lol
16:32 🔗 SN4T14 db48x, I liked your old name! :p
16:33 🔗 db48x what was it?
16:35 🔗 SN4T14 Add name here
16:35 🔗 SN4T14 :p
16:36 🔗 db48x ah
16:40 🔗 db48x does freedb have information about labels in it?
16:46 🔗 yipdw db48x: not sure, but MusicBrainz does
16:46 🔗 yipdw db48x: and their data is freely available -> http://musicbrainz.org/doc/MusicBrainz_Database
16:48 🔗 db48x nice
16:48 🔗 db48x want to find all the artists with no label, then do some youtube searches?
16:49 🔗 yipdw at some point, but it's also artists on non-participating labels
16:49 🔗 yipdw e.g. for Adele you'd probably want to look up XL
16:49 🔗 yipdw XL Recordings that is
16:51 🔗 db48x yea
16:53 🔗 yipdw the constituents of the Worldwide Independent Network might be a good place to start for that list
16:54 🔗 db48x good idea
16:55 🔗 db48x not many actual artists in a youtube search for 'indenendant artist'
16:56 🔗 db48x however I spell it
16:56 🔗 db48x mostly interviews, promoters and consultants
16:59 🔗 db48x this is good though: https://www.youtube.com/watch?v=Hbxy9xvpZ10&list=PLC32FEF51263DD92C
16:59 🔗 SN4T14 db48x, it's spelled "independent"
17:00 🔗 db48x yes, I spelled it correctly when I did the search
18:37 🔗 SketchCow http://instagram.com/p/pWpuz6MxuB/ (Server decommissioned)
18:39 🔗 SN4T14 That looks so fun. :D
18:48 🔗 godane i think i surpass my old record in godaneinbox
18:48 🔗 godane its at 18735 now
18:51 🔗 DFJustin cripes
18:55 🔗 midas so, about this RAWPORTER, did I miss anything?
18:56 🔗 SketchCow I think I punched them
18:56 🔗 SketchCow Then we sat
18:56 🔗 SketchCow But if we can pull ANYTHING out of them, do it.
18:57 🔗 SketchCow Chances might be it's not possible.
18:57 🔗 SketchCow Might be limited release, but scan them
18:57 🔗 midas someone did a scan already if im not mistaken
18:58 🔗 midas or was that steem
18:59 🔗 midas joepie91_: you did some rawporter work yesterday with the markers
19:01 🔗 midas i think the s3 wasnt secured
19:01 🔗 midas so we can grab all pictures and videos
19:01 🔗 midas http://rawporter.s3.amazonaws.com/
19:03 🔗 SketchCow Well, do it.
19:11 🔗 midas s3cmd du s3://rawporter
19:11 🔗 midas WARNING: Retrying failed request: /?marker=thumbs/l_f5fnivczoddwq7.jpg (timed out)
19:11 🔗 midas WARNING: Waiting 3 sec...
19:11 🔗 midas 78880173037 s3://rawporter/
19:11 🔗 midas well peeps?
19:12 🔗 SN4T14 78GB? That's pretty small...
19:12 🔗 midas thats what is on the s3
19:14 🔗 SN4T14 Weird
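For reference, the `/?marker=...` request in the s3cmd warning above is how a publicly listable bucket gets paged through: each response carries one page of keys plus an `IsTruncated` flag, and the next request passes the last key seen as `marker`. A minimal Python sketch of that loop follows. It assumes the bucket allows anonymous listing and returns the standard `ListBucketResult` XML. The function names (`parse_listing`, `http_fetch`, `list_bucket`) are hypothetical, and the sketch only enumerates keys and sizes; it does not download anything or write WARC.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used by S3 bucket listings.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def parse_listing(xml_text):
    """Return ([(key, size_in_bytes), ...], is_truncated) for one listing page."""
    root = ET.fromstring(xml_text)
    entries = [
        (item.findtext(S3_NS + "Key"), int(item.findtext(S3_NS + "Size")))
        for item in root.iter(S3_NS + "Contents")
    ]
    truncated = root.findtext(S3_NS + "IsTruncated") == "true"
    return entries, truncated


def http_fetch(bucket_url, marker):
    """Fetch one listing page over plain HTTP (assumes anonymous listing works)."""
    url = bucket_url + "/"
    if marker:
        url += "?" + urllib.parse.urlencode({"marker": marker})
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def list_bucket(bucket_url, fetch=http_fetch):
    """Yield (key, size) for every object, following marker-based pagination."""
    marker = ""
    while True:
        entries, truncated = parse_listing(fetch(bucket_url, marker))
        yield from entries
        if not truncated or not entries:
            break
        # Without a delimiter, the next page starts after the last key returned.
        marker = entries[-1][0]
```

Summing the sizes, e.g. `sum(size for _, size in list_bucket("http://rawporter.s3.amazonaws.com"))`, would give the same kind of byte total that `s3cmd du` reports, provided the bucket is still listable. The `fetch` parameter is there so the paging logic can be exercised without touching the network.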
19:18 🔗 joepie91_ :P
19:18 🔗 joepie91_ just grab all of S3
19:19 🔗 midas meh, good point
19:20 🔗 joepie91_ grab first, assess later
19:21 🔗 yipdw that advice has also served me well on the North Side of Chicago
19:21 🔗 joepie91_ yipdw: ?
19:21 🔗 yipdw bad regional joke
19:21 🔗 joepie91_ (also, do we have a way of grabbing an entire S3 bucket with WARC?)
19:21 🔗 joepie91_ lol
19:22 🔗 * joepie91_ is not from that region
19:22 🔗 yipdw basically the north side and the rest of the North Shore area is unusually sexually active
19:22 🔗 SN4T14 So basically sex apartheid
19:23 🔗 yipdw nah
19:23 🔗 yipdw we have real racism in Chicag
19:23 🔗 yipdw o
19:23 🔗 joepie91_ "unusually sexually active"?
19:23 🔗 yipdw it's on the high end of the curve
19:23 🔗 yipdw anyway
19:24 🔗 yipdw I think I passed the -bs threshold on line 1
19:27 🔗 joepie91_ lol
19:30 🔗 midas grabbing s3 now
19:32 🔗 SN4T14 midas, according to my calculations, you're going to cost them $4-$9.5 in S3 costs from grabbing all of that. :p
19:33 🔗 midas im going to grab it 20 times SN4T14 ;)
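The $4-$9.5 figure is consistent with multiplying the 78880173037 bytes reported by `s3cmd du` by a per-GB transfer price. A small Python sketch of that arithmetic is below. The two rates ($0.05 and $0.12 per GB) are assumptions picked because they reproduce the quoted range; nobody in the channel stated which prices were used.

```python
# Byte total reported by `s3cmd du s3://rawporter` in the log.
BUCKET_SIZE_BYTES = 78880173037

# Assumed outbound-transfer prices in USD per decimal GB (not confirmed in the log).
ASSUMED_LOW_RATE = 0.05
ASSUMED_HIGH_RATE = 0.12


def bytes_to_gb(size_bytes):
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return size_bytes / 1e9


def transfer_cost(size_bytes, usd_per_gb):
    """Estimate the cost of transferring size_bytes once at a flat per-GB rate."""
    return bytes_to_gb(size_bytes) * usd_per_gb


if __name__ == "__main__":
    size_gb = bytes_to_gb(BUCKET_SIZE_BYTES)
    low = transfer_cost(BUCKET_SIZE_BYTES, ASSUMED_LOW_RATE)
    high = transfer_cost(BUCKET_SIZE_BYTES, ASSUMED_HIGH_RATE)
    print(f"{size_gb:.2f} GB costs between ${low:.2f} and ${high:.2f} per full download")
```

Under those assumed rates one full download of about 78.88 GB comes to roughly $3.94 to $9.47, which rounds to the range SN4T14 quoted.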
19:35 🔗 schbirid earbits.com mp3 tars are incoming, grab them while you can: https://archive.org/search.php?query=subject%3A%22earbits.com%22%20mp3
19:35 🔗 SN4T14 while true; do curl https://rawporter.s3.amazonaws.com/AWOL/gallery.swf -o /dev/null; done
19:35 🔗 SN4T14 :p
19:35 🔗 SN4T14 Not sure if that's correct, I rarely use curl. :p
19:36 🔗 midas s3cmd get s3://rawporter --recursive /hurr/durr
19:36 🔗 SN4T14 while true; do s3cmd get s3://rawporter --recursive /dev/null &; done
19:36 🔗 SN4T14 :D
19:36 🔗 schbirid lol, open s3 is the new ID iteration
19:37 🔗 SN4T14 ID iteration?
19:37 🔗 midas open s3 is running around with your middlefingers in the air and screaming
19:38 🔗 joepie91_ schbirid: nah, open S3 is much more efficient
19:38 🔗 joepie91_ than ID iteration
19:38 🔗 joepie91_ :P
19:38 🔗 schbirid :>
19:38 🔗 schbirid SN4T14: wget http://www.internet.com/file?id=123
19:38 🔗 joepie91_ jesus wtf, 28KB/sec
19:38 🔗 joepie91_ how congested is IA
19:39 🔗 midas jeez joepie91_
19:39 🔗 midas are you on dialup?
19:39 🔗 joepie91_ 20kb/sec now, cancelled it
19:39 🔗 joepie91_ midas: IA is, apparently
19:40 🔗 schbirid not from here
19:41 🔗 midas i
19:41 🔗 midas i've seen worse according to the weathermap
19:41 🔗 midas slowdown rate on s3 is also very low
19:41 🔗 midas are you downloading or uploading joepie91_
19:41 🔗 midas ?
19:41 🔗 joepie91_ dfl
19:41 🔗 joepie91_ dl *
19:43 🔗 joepie91_ it's going over HE
19:43 🔗 joepie91_ now brb
19:49 🔗 Muad-Dib http://arstechnica.com/business/2014/06/artists-who-dont-sign-with-youtubes-new-subscription-service-to-be-blocked/
19:52 🔗 exmic yeah that's fucked up
19:55 🔗 db48x Muad-Dib: we need someone to build a list of independent artists
19:56 🔗 db48x yipdw suggested looking at the members of the Worldwide Independent Network
20:40 🔗 Nemo_bis congrats midas and ohhdemgir :) https://archive.org/metamgr.php?f=histogram&group=uploader&w_collection=ftpsites
20:40 🔗 SN4T14 "You must be logged in to access this service." >.>
20:40 🔗 schbirid :)
20:41 🔗 Nemo_bis and why aren't you logged in on archive.org, aren't you going after spam and support requests in forums etc. etc.
20:42 🔗 db48x aww, I'm not authorized
20:42 🔗 Nemo_bis oh, look, there is only one 2331388015 KB item https://archive.org/metamgr.php?f=histogram&group=size&w_collection=ftpsites
20:42 🔗 Nemo_bis The second is 893892396 KB from another wikisourceror, I swear I didn't suggest him
20:45 🔗 schbirid how can i see how much i uploaded?
20:48 🔗 Nemo_bis schbirid: https://archive.org/metamgr.php?f=histogram&group=size&w_uploader=spirit@quaddicted.com but I'm not sure if there's a way to sum the first column
20:49 🔗 schbirid oi, dont leak my mail address to irc please
20:49 🔗 schbirid thanks
20:49 🔗 exmic well, meatmgr
20:49 🔗 Nemo_bis sorry, I had a doubt for a moment but then thought it's in all the xml files anyway :p
20:49 🔗 exmic :P
20:50 🔗 Nemo_bis I should have used mine as example
20:50 🔗 schbirid yeah but those are for admins only
20:50 🔗 schbirid while in this channel i maybe know 10%
20:50 🔗 schbirid no biggie
20:50 🔗 exmic what, the xml files?
20:50 🔗 schbirid yeah
20:51 🔗 Nemo_bis everyone can download them
20:51 🔗 exmic hmm
20:51 🔗 exmic that's what I thought, Nemo_bis
20:51 🔗 schbirid its kinda crazy how much archive.org shows stupid admins like me :D
20:51 🔗 schbirid oh? :(
20:51 🔗 * db48x spams schbirid
20:52 🔗 Nemo_bis they're not even hidden behind the "HTTPS"/download link in items like https://archive.org/details/wiki-wikiurbandeadcom
20:54 🔗 garyrh joepie91, i've been downloading rawporter
20:54 🔗 garyrh nearly done
20:55 🔗 garyrh is anyone else downloading rawporter?
20:56 🔗 SN4T14 I think midas was as well.
20:58 🔗 midas just the s3 files
20:58 🔗 midas 6200 of 39K
20:59 🔗 midas probably done in the morning
21:00 🔗 joepie91_ schbirid: anybody can see uploader, yes
21:02 🔗 Nemo_bis OTOH, http://blog.archive.org/2013/10/25/reader-privacy-at-the-internet-archive/ : almost nobody on the web is so good
21:07 🔗 DFJustin I do wish there was an uploader privacy option for items though
21:07 🔗 DFJustin other than registering a throwaway email
21:08 🔗 midas DFJustin: darken it directly after uploading?
21:08 🔗 midas altho, it wont be findable anymore
21:09 🔗 exmic also won't be downloadable or anything
21:24 🔗 Nemo_bis lol mistym
21:25 🔗 Nemo_bis * midas
21:51 🔗 ohhdemgir Nemo_bis, "User: ohhdemgirls is not authorized to access this service."
21:52 🔗 Nemo_bis well, you're second with most items in ftpsites
21:53 🔗 ohhdemgir i wanna see!!
21:57 🔗 SN4T14 I think midas was as well.
21:57 🔗 SN4T14 Whoops, this isn't Cygwin
21:57 🔗 SN4T14 lol
23:25 🔗 underscor schbirid: Nemo_bis: yeah, they're totally open in the current system
23:25 🔗 underscor Much of the system is architected on that being the case but eventually we want to move to a different user ID
23:25 🔗 underscor as manpower allows
