Time |
Nickname |
Message |
00:01
🔗
|
SadDM |
I did some poking around on this earlier in the week |
00:06
🔗
|
SadDM |
the streams are all stored in urls of the following form: http:////www.eastvillageradio.com/archivedshows/{p}/{f}.mp3 |
00:06
🔗
|
SadDM |
where {f} and {p} are the same as the url parameters to the streaming player |
00:08
🔗
|
SadDM |
dashcloud: fwiw I sussed that out with chrome's developer tools... so there's that |
00:19
🔗
|
SadDM |
also, I have all of the World Wide Mash streams... 28GB |
00:45
🔗
|
SketchCow |
SadDM: So do we think we can do this? |
00:45
🔗
|
SketchCow |
If someone writes some scripts, I can run them at IA and use the pipe. |
00:46
🔗
|
SadDM |
They seem to limit downloads to about 500KB/s |
00:46
🔗
|
SadDM |
but I did that whole show without them limiting me further |
00:46
🔗
|
SadDM |
It might be doable |
00:47
🔗
|
SadDM |
scraping out the necessary parameters is a bit of a pain but not terrible |
00:47
🔗
|
SadDM |
if a couple of people split up the task of generating the URLs and fed them to you... that might be the best bet |
00:49
🔗
|
SketchCow |
Give me an example of an mp3. |
00:49
🔗
|
SadDM |
stand by... |
00:50
🔗
|
SadDM |
http:////www.eastvillageradio.com/archivedshows/1300/1300-18839-20090716.mp3 |
00:52
🔗
|
SadDM |
whoops... there's a couple extra slashes in there for some reason |
00:53
🔗
|
SadDM |
there are 58 shows, and the one I did took about 13-14 hours |
00:53
🔗
|
SadDM |
parallelization could speed that up, or they could catch on to you and throttle |
01:02
🔗
|
SketchCow |
Yes, but --2014-05-20 01:02:12-- http://www.eastvillageradio.com/archivedshows/1300/1300-18839-20090716.mp3 |
01:02
🔗
|
SketchCow |
Resolving www.eastvillageradio.com... 98.129.116.171 |
01:02
🔗
|
SketchCow |
Connecting to www.eastvillageradio.com|98.129.116.171|:80... connected. |
01:02
🔗
|
SketchCow |
HTTP request sent, awaiting response... 404 Not Found |
01:02
🔗
|
SketchCow |
2014-05-20 01:02:12 ERROR 404: Not Found. |
01:02
🔗
|
SadDM |
really? |
01:02
🔗
|
SadDM |
I took that right from the list of urls that I downloaded |
01:04
🔗
|
SketchCow |
Tried it at two different IPs, different mechanisms. |
01:04
🔗
|
SketchCow |
Nothing |
01:04
🔗
|
SadDM |
lemme try again |
01:05
🔗
|
SadDM |
Even stranger... I'm getting a 400: Bad Request |
01:06
🔗
|
SadDM |
yeah, it definitely seems to be gone now |
01:09
🔗
|
SketchCow |
17,390,160 1.16MB/s eta 82s |
01:09
🔗
|
SketchCow |
I'm getting 1.16MB/s |
01:09
🔗
|
SadDM |
I just double-checked the streaming player and tried it with a url that I was streaming, but still got a 400 |
01:09
🔗
|
SadDM |
you got it to work? |
01:10
🔗
|
SketchCow |
yes. |
01:10
🔗
|
SketchCow |
I could probably stream a good ripper tonight. |
01:10
🔗
|
SketchCow |
Script a good stream ripper |
01:10
🔗
|
SketchCow |
sorry |
01:11
🔗
|
SketchCow |
I was just stunned that John Lennon sold his house to Ringo Starr |
01:11
🔗
|
SadDM |
:-D |
01:11
🔗
|
SketchCow |
Ok, this one is on me. |
01:11
🔗
|
SketchCow |
I can do this. |
01:11
🔗
|
SketchCow |
The ripper is easy. |
01:11
🔗
|
SadDM |
do you want me to start feeding you lists of magic numbers? |
01:11
🔗
|
SketchCow |
I can acquire those myself. |
01:11
🔗
|
SketchCow |
No, the problem is I want to come up with a way to turn it into an IA item. |
01:11
🔗
|
SadDM |
alright then |
01:11
🔗
|
SketchCow |
I'm THINKING |
01:12
🔗
|
SketchCow |
Look at this BRAIN |
01:12
🔗
|
SadDM |
yeah, they have playlists for each show too which would make great metadata for each show stream |
01:14
🔗
|
SketchCow |
No other reason to do it. |
01:15
🔗
|
SadDM |
Anything I can do to help? |
01:16
🔗
|
SadDM |
I'm almost begged for the night, but I could start serious work after the day-job ends tomorrow |
01:28
🔗
|
SketchCow |
I think someone assisting me with ripping out html tables from the playlists would help. |
01:29
🔗
|
SketchCow |
I mean, I can do all this, but I have a massive to-do list |
01:30
🔗
|
SadDM |
OK, I'll start looking into that tomorrow night. |
01:32
🔗
|
SketchCow |
OK, so, division of labor |
01:32
🔗
|
SketchCow |
I am going to go ahead and just start taking in mp3s. |
01:33
🔗
|
SketchCow |
Since it's XXXX-YYYYY-ZZZZZZZZ.mp3 where XXXX is the show id, the resulting pile of ids can be stripped. |
01:33
🔗
|
SketchCow |
So we can take this mp3 set, and turn them into described items. |
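A minimal sketch of the naming idea just described, assuming only what the messages state: the part of the mp3 name before the first hyphen is the show id. The `evr_` prefix mirrors the identifier of the experimental item posted in this log; the function names `show_id` and `item_id` are hypothetical helpers, not part of any script used here.

```shell
#!/bin/sh
# Sketch: work with EVR mp3 names of the form XXXX-YYYYY-ZZZZZZZZ.mp3.

show_id() {
    # Strip the directory and the .mp3 suffix, then keep
    # everything before the first hyphen (the show id).
    name=$(basename "$1" .mp3)
    printf '%s\n' "${name%%-*}"
}

item_id() {
    # Build an identifier in the style evr_<name without .mp3>.
    printf 'evr_%s\n' "$(basename "$1" .mp3)"
}
```

For example, `show_id 1300-18839-20090716.mp3` prints `1300`, so a directory of downloads can be grouped by show before any metadata is attached.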
01:42
🔗
|
SketchCow |
https://archive.org/details/evr_5744-50944-20140513 |
01:42
🔗
|
SketchCow |
Experiment #1 |
01:46
🔗
|
SketchCow |
This actually won't be hard on the suck side |
01:59
🔗
|
SketchCow |
PROC=$$ |
01:59
🔗
|
SketchCow |
wget "$1" |
01:59
🔗
|
SketchCow |
mv nowplaying* $PROC.showarchive.txt |
01:59
🔗
|
SketchCow |
for each in `cat $PROC.showarchive.txt | grep shows/player/main | sed 's/.*p=//g' | sed 's/\".*//g' | sed 's/^/http:\/\/www.eastvillageradio.com\/archivedshows\//g' | sed 's/\&f=/\//g' | sed 's/$/.mp3/g'` |
01:59
🔗
|
SketchCow |
do |
01:59
🔗
|
SketchCow |
wget --user-agent="EVR Will Never Die" "$each" |
01:59
🔗
|
SketchCow |
done |
01:59
🔗
|
SketchCow |
Turns out it wasn't hard at all. |
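Read as one script, the pasted lines fetch a show's archive page (the `wget "$1"` step), save it under a per-process name, rewrite every player link into an mp3 URL, and download each one. The sketch below restates that flow as two functions. It is an illustration and not the script that was actually run: the function names, the use of `wget -O` in place of the `mv nowplaying*` rename, and the exact shape of the player link (`p=<show>&f=<file>` ending at a double quote) are assumptions drawn from the sed chain.

```shell
#!/bin/sh
# Sketch: turn player links in a saved archive page into mp3 URLs.
# Assumption: each link line contains "shows/player/main", then
# p=<show>&f=<file>, terminated by a double quote.

BASE_URL="http://www.eastvillageradio.com/archivedshows"

extract_mp3_urls() {
    # Reads HTML on stdin, prints one mp3 URL per matching line.
    grep 'shows/player/main' |
        sed -e 's/.*p=//' \
            -e 's/".*//' \
            -e 's/&f=/\//' \
            -e "s|^|$BASE_URL/|" \
            -e 's/$/.mp3/'
}

fetch_show() {
    # $1 is the URL of a show's archive page.
    page="$$.showarchive.txt"
    wget -O "$page" "$1" || return 1
    extract_mp3_urls < "$page" | while IFS= read -r url; do
        wget --user-agent="EVR Will Never Die" "$url"
    done
}
```

Keeping the link rewriting in its own function means it can be checked against a saved page without touching the network.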
02:08
🔗
|
SketchCow |
Pulling in 6 hours of radio every 4 minutes. |
02:19
🔗
|
SketchCow |
Looks like 7 simultaneous streams is about the smartest |
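One way to run a fixed number of simultaneous downloads, sketched with `xargs -P` (available in GNU and BSD xargs, not required by POSIX). The log does not say how the seven streams were actually run; `fetch_parallel` is a hypothetical helper and `urls.txt` a hypothetical file with one URL per line.

```shell
#!/bin/sh
# Sketch: run a command once per input line, several at a time.

fetch_parallel() {
    # $1: maximum number of simultaneous jobs.
    # Remaining arguments: the command to run; each input line
    # is appended to it as the final argument.
    n=$1
    shift
    xargs -n 1 -P "$n" "$@"
}

# Usage (not executed here):
#   fetch_parallel 7 wget --user-agent="EVR Will Never Die" < urls.txt
```

Passing the command as arguments allows a dry run with `echo` before pointing it at the real server.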
02:33
🔗
|
dashcloud |
wow- that was quick! |
03:45
🔗
|
SketchCow |
712 hours grabbed so far. |
03:46
🔗
|
SketchCow |
So, one month. |
11:46
🔗
|
damongant |
For anyone with access to the 4chan article -i can't be bothered to sign in - we only archive images for 7 days (I'm the admin of deniableplausibility) |
12:50
🔗
|
SketchCow |
1,879 hours grabbed. |
12:52
🔗
|
SketchCow |
Shows are falling fast! |
12:52
🔗
|
SadDM |
SketchCow: I'm pulling down the playlists as we speak... I'll parse out the tables tonight |
12:52
🔗
|
SketchCow |
Actually, sorry, it's actually 3758 hours, 154 days. |
12:52
🔗
|
SketchCow |
SadDM: Thanks. |
12:53
🔗
|
SketchCow |
It'll be piles of mp3s with names like: |
12:53
🔗
|
SketchCow |
1232-234867-20140429.mp3 |
12:53
🔗
|
SadDM |
I also grabbed the show descriptions and art. |
12:53
🔗
|
SketchCow |
So, I'm doing an experiment, which is not coming out well. |
12:53
🔗
|
SketchCow |
I went to a wayback copy of the site, to find the shows now gone |
12:53
🔗
|
SketchCow |
And not surprisingly, their mp3s are wiped. |
12:54
🔗
|
SketchCow |
Also, as our wayback archives prove, this whole "playlist here, click here to listen" thing starts up in 2009. |
12:54
🔗
|
SadDM |
ok, when I get the tables parsed out I'll put them in text files something like 1232-234867-20140429.desc? Something like that seems pretty script friendly to me. |
12:55
🔗
|
SketchCow |
So while we won't get ALL the shows that were alive on EVR, we do have things going back to the full range of archive the site had. |
12:55
🔗
|
SketchCow |
SadDM: Definitely do a single one for me to see, and we'll experiment with injecting it into the page. |
12:59
🔗
|
SadDM |
I'll get that to you some time tonight. It looks like mapping the playlist to the show's id is going to take a tiny bit of work... probably more than I can get accomplished on my breaks today. |
13:00
🔗
|
SketchCow |
It's not THAT bad. |
13:00
🔗
|
SketchCow |
But I agree. |
13:00
🔗
|
SadDM |
yeah, it's just that the xxxx-yyyyy-zzzzzzzz number isn't in the url |
13:00
🔗
|
SketchCow |
It is, it's in the "listen" |
13:01
🔗
|
SketchCow |
So that page has playlist page and listen link |
13:01
🔗
|
SadDM |
right |
13:01
🔗
|
SadDM |
so I just need to grab that one piece of data to look up another... not too bad |
13:02
🔗
|
SadDM |
it's just not *in* the playlist page's url |
13:03
🔗
|
SketchCow |
So, as expected (?) the fact is, of the "shows" I can download from, they are only the shows that are still around, and then going back as far as the shows were streamed under the "new" system (2009) |
13:03
🔗
|
SketchCow |
And in some cases, mp3s have been removed regardless, even though it's an active show, so only the last couple of years. |
13:05
🔗
|
SketchCow |
But it is VERY obviously going to go past 4000 hours of music |
13:05
🔗
|
SketchCow |
it is very hard to complain |
13:14
🔗
|
SadDM |
yup... it's going to be a nice collection of hipster rage. |
16:54
🔗
|
SadDM |
SketchCow: for https://archive.org/details/evr_5744-50944-20140513 how did you extract that table? Did you just do a copy & paste? I'm asking because the tables in the html files seem to be several separate tables juggled into place with javascript. |
17:05
🔗
|
SketchCow |
I did it by hand as proof. |
17:10
🔗
|
SadDM |
ugh... I was afraid of that |
17:12
🔗
|
SadDM |
I'm open to suggestions from *anybody* on how to programmatically rip the table from this page: http://www.eastvillageradio.com/shows/playlists.aspx?contentid=1208&showid=511106&list=206717 |
17:31
🔗
|
SadDM |
They miss the step where they call us: http://www.smashingmagazine.com/2014/05/19/last-goodbye-shut-down-failing-product/ |
17:34
🔗
|
SketchCow |
SadDM: I am asking my co-employee at archive.org. |
17:37
🔗
|
SketchCow |
He wants it. |
17:37
🔗
|
SketchCow |
You shouldn't work on it. He has it. |
17:37
🔗
|
SketchCow |
He caused us to pay attention to it, he will eat the pain. |
17:38
🔗
|
SadDM |
LOL... good enough. I've got thousands more gaming zines to concentrate on anyway. |
18:54
🔗
|
SketchCow |
SadDM: https://archive.org/details/evr_test_item |
18:56
🔗
|
SadDM |
Your guy did this? What show and date is it? |
19:05
🔗
|
SketchCow |
https://archive.org/details/evr_test_item |
19:05
🔗
|
SketchCow |
Now has logo. He's working on date. |
19:05
🔗
|
SketchCow |
Logo AND description. |
19:20
🔗
|
SadDM |
that's looking pretty good |
19:21
🔗
|
SadDM |
I'd be interested to know how he's (?) re-assembling the set list. |
20:00
🔗
|
SketchCow |
He's a python genius |
20:00
🔗
|
SketchCow |
I bet he's just doing parsing |
20:01
🔗
|
SketchCow |
Like, I bet he's just got an HTML ingestor. |
20:03
🔗
|
SadDM |
I love that the world is filled with people that are smarter, and have more experience than me. |
21:33
🔗
|
monod |
helloooooooooooooooo |
21:33
🔗
|
monod |
I have a request |
21:34
🔗
|
monod |
Oh, btw, it's not a famous website, I think |
21:34
🔗
|
monod |
And I'm still browsing it |
21:34
🔗
|
monod |
So, it might even turn unvaluable |
21:34
🔗
|
monod |
But, I'd ask if it is possible to save smartphrases.com |
21:34
🔗
|
monod |
smartphrase.com* |
21:35
🔗
|
monod |
That's all. |
21:35
🔗
|
monod |
I don't think it's closing down, but I dunno |
21:35
🔗
|
ivan` |
archivebot is on it |
21:38
🔗
|
monod |
Oh my go |
21:38
🔗
|
monod |
d |
21:38
🔗
|
monod |
Do you mean you were already archiving that website? Or that you're going to archive it now? |
21:38
🔗
|
monod |
Or something else? |
21:38
🔗
|
ivan` |
started it just now |
21:38
🔗
|
ivan` |
http://archivebot.at.ninjawedding.org:4567/ |
21:39
🔗
|
monod |
I wonder if that website isn't too big! |
21:39
🔗
|
ivan` |
I very much doubt that :) |
21:39
🔗
|
monod |
33666.57 MB |
21:39
🔗
|
monod |
33 gigs??? |
21:39
🔗
|
monod |
Oops |
21:40
🔗
|
monod |
4.86 MB? o_O |
21:40
🔗
|
monod |
Is it for real?? XD |
21:40
🔗
|
Smiley |
so far monod |
21:40
🔗
|
monod |
Oh |
21:41
🔗
|
monod |
Guys, couldn't you get some colleges involved in your project? You'd get a lot of bandwidth, e.g. |
21:42
🔗
|
ivan` |
if you know someone with a spare xeon sitting around in a college please send them our way |
21:42
🔗
|
monod |
Uhm |
21:42
🔗
|
monod |
xeon == server? Or what? |
21:42
🔗
|
ivan` |
intel's server chip |
21:43
🔗
|
monod |
How does one cost? Also, what about bandwidth? Isn't it uncorrelated to server chips? |
21:43
🔗
|
monod |
How much* does one cost |
21:44
🔗
|
ivan` |
sure, you need bandwidth, CPU, memory, and disk |
21:44
🔗
|
ivan` |
$1100ish for a server? or ~$60/mo on OVH |
21:45
🔗
|
monod |
That's another question: who has all that archiving capacity? HDD capacity I mean |
21:45
🔗
|
ivan` |
for archivebot? for all the other projects? whoever here pays for it |
21:46
🔗
|
monod |
Online storage??? |
21:46
🔗
|
ivan` |
or do you mean who stores everything long-term? that would be archive.org |
21:46
🔗
|
monod |
Uhm |
21:47
🔗
|
monod |
I kinda meant: where are all the files being downloaded right now? :) And yeah, also who keeps them in the long-term, to which you already answered |
21:47
🔗
|
ivan` |
there's being downloaded to an OVH machine in Canada |
21:48
🔗
|
ivan` |
s/'s/'re/ |
21:48
🔗
|
ivan` |
gah need sleep |
21:48
🔗
|
monod |
same :D |
21:48
🔗
|
monod |
Going to get some in minutes ;) |
21:48
🔗
|
monod |
Anyway, then you re-download from the OVH servers to your "home", @archive.org |
21:49
🔗
|
monod |
Right? |
21:49
🔗
|
ivan` |
no, they're uploaded to fos.textfiles.com |
21:49
🔗
|
ivan` |
from there they make it into an archivebot collection on archive.org |
21:50
🔗
|
ivan` |
https://archive.org/details/archivebot |
21:50
🔗
|
monod |
Thanks |
22:08
🔗
|
monod |
Cya all! |
22:12
🔗
|
SketchCow |
Where's my hug |
22:13
🔗
|
* |
exmic points to the door |
22:13
🔗
|
exmic |
he'll be around shortly |
22:13
🔗
|
* |
Baljem provides interim SketchCow-hugging services |
22:14
🔗
|
Baljem |
my rates are exceedingly reasonable, too! |
22:16
🔗
|
exmic |
they exceed reasonability |
22:18
🔗
|
SketchCow |
I just love it when someone goes running in with questions. |
22:19
🔗
|
Baljem |
I was disappointed there wasn't more head-explodey action with that one |
22:19
🔗
|
Baljem |
that's always the best bit |
22:19
🔗
|
SketchCow |
I like the ones where someone goes "ok got the minimum amount of information OKAY PEOPLE HERE IS MY GROUND UP REWRITE FOR A COMPLETE OVERHAUL OF THE ARCHIVE TEAM PROCESS" |
22:20
🔗
|
SketchCow |
There's ossification of procedure and there's not making the same fundamental mistake 4,000 times |
22:21
🔗
|
SketchCow |
WHY IS THE TRACKER NOT IN RUBY ON RAILS |
22:21
🔗
|
amerrykan |
this thing you've been doing for years? yeah, it sucks. i re-engineered the entire thing while standing in the shower this morning |
22:21
🔗
|
exmic |
the only ruby on rails that's acceptable is http://rubylovesyou.com/ |
22:22
🔗
|
exmic |
nsfwish |
22:22
🔗
|
exmic |
I guess this is getting kind of offtopic |
22:23
🔗
|
SketchCow |
Or really, really ontopic |
22:23
🔗
|
exmic |
or that |
22:23
🔗
|
Baljem |
yeah, see, my rates are nowhere near her rates |
22:23
🔗
|
amerrykan |
she takes care of all the microsoft boys |
22:23
🔗
|
exmic |
lol |
22:24
🔗
|
Baljem |
admittedly my services are limited to hugs, though, so y'know. |
22:24
🔗
|
exmic |
you know they sell whips, right? |
22:27
🔗
|
amerrykan |
i really don't want to know what financial domination is, do i |
22:28
🔗
|
SketchCow |
And now I am playing the latest episode of Veep at +20% speed |
22:28
🔗
|
SketchCow |
Apparently the Internet Archive and Wayback machine are mentioned. |
22:28
🔗
|
yipdw |
oh speaking of which I should make sure my DigitalOcean account has enough money |
22:28
🔗
|
yipdw |
it'd be hilarious if archivebot.at.ninjawedding.org just died |
22:28
🔗
|
exmic |
yeah, hilarious |
22:28
🔗
|
yipdw |
all good |
22:28
🔗
|
SketchCow |
Laff Riot |
22:29
🔗
|
exmic |
glad that someone is on that |