| Time |
Nickname |
Message |
|
00:01
🔗
|
SadDM |
I did some poking around on this earlier in the week |
|
00:06
🔗
|
SadDM |
the streams are all stored in urls of the following form: http:////www.eastvillageradio.com/archivedshows/{p}/{f}.mp3 |
|
00:06
🔗
|
SadDM |
where {f} and {p} are the same as the url parameters to the streaming player |
|
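A minimal sketch of the URL pattern SadDM describes. It assumes `p` and `f` are passed through exactly as they appear in the streaming player's query string. The helper name `evr_mp3_url` is hypothetical, and the URL uses a single `//` after `http:`; the pasted URLs in this log carry extra slashes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an archived-show MP3 URL from the player's
# p (show directory) and f (file stem) parameters.
evr_mp3_url() {
  local p="$1" f="$2"
  printf 'http://www.eastvillageradio.com/archivedshows/%s/%s.mp3\n' "$p" "$f"
}

# Example, using the p/f values from the sample URL quoted in this log:
# evr_mp3_url 1300 1300-18839-20090716
```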
00:08
🔗
|
SadDM |
dashcloud: fwiw I sussed that out with chrome's developer tools... so there's that |
|
00:19
🔗
|
SadDM |
also, I have all of the World Wide Mash streams... 28GB |
|
00:45
🔗
|
SketchCow |
SadDM: So do we think we can do this? |
|
00:45
🔗
|
SketchCow |
If someone writes some scripts, I can run them at IA and use the pipe. |
|
00:46
🔗
|
SadDM |
They seem to limit downloads to about 500KB/s |
|
00:46
🔗
|
SadDM |
but I did that whole show without them limiting me further |
|
00:46
🔗
|
SadDM |
It might be doable |
|
00:47
🔗
|
SadDM |
scraping out the necessary parameters is a bit of a pain but not terrible |
|
00:47
🔗
|
SadDM |
if a couple of people split up the task of generating the URLs and fed them to you... that might be the best bet |
|
00:49
🔗
|
SketchCow |
Give me an example of an mp3. |
|
00:49
🔗
|
SadDM |
stand by... |
|
00:50
🔗
|
SadDM |
http:////www.eastvillageradio.com/archivedshows/1300/1300-18839-20090716.mp3 |
|
00:52
🔗
|
SadDM |
whoops... there's a couple extra slashes in there for some reason |
|
00:53
🔗
|
SadDM |
there are 58 shows, and the one I did took about 13-14 hours |
|
00:53
🔗
|
SadDM |
parallelization could speed that up, or they could catch on to you and throttle |
|
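SadDM's figures can be sanity-checked with a back-of-envelope estimate. This sketch assumes decimal units (1 GB = 1,000,000 KB) and a constant rate, and `transfer_hours` is a hypothetical helper. At the ~500 KB/s cap, 28 GB works out to roughly 15.6 hours. That is in the same range as the 13-14 hours reported, which suggests the real average rate was a little above 500 KB/s.

```shell
#!/usr/bin/env bash
# Hypothetical helper: hours needed to move SIZE_GB at RATE_KBPS,
# assuming decimal units and a constant transfer rate.
transfer_hours() {
  awk -v gb="$1" -v kbps="$2" 'BEGIN { printf "%.1f\n", gb * 1000000 / kbps / 3600 }'
}

transfer_hours 28 500   # → 15.6
```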
01:02
🔗
|
SketchCow |
Yes, but --2014-05-20 01:02:12-- http://www.eastvillageradio.com/archivedshows/1300/1300-18839-20090716.mp3 |
|
01:02
🔗
|
SketchCow |
Resolving www.eastvillageradio.com... 98.129.116.171 |
|
01:02
🔗
|
SketchCow |
Connecting to www.eastvillageradio.com|98.129.116.171|:80... connected. |
|
01:02
🔗
|
SketchCow |
HTTP request sent, awaiting response... 404 Not Found |
|
01:02
🔗
|
SketchCow |
2014-05-20 01:02:12 ERROR 404: Not Found. |
|
01:02
🔗
|
SadDM |
really? |
|
01:02
🔗
|
SadDM |
I took that right from the list of urls that I downloaded |
|
01:04
🔗
|
SketchCow |
Tried it at two different IPs, different mechanisms. |
|
01:04
🔗
|
SketchCow |
Nothing |
|
01:04
🔗
|
SadDM |
lemme try again |
|
01:05
🔗
|
SadDM |
Even stranger... I'm getting a 400: Bad Request |
|
01:06
🔗
|
SadDM |
yeah, it definitely seems to be gone now |
|
01:09
🔗
|
SketchCow |
17,390,160 1.16MB/s eta 82s |
|
01:09
🔗
|
SketchCow |
I'm getting 1.16MB/s |
|
01:09
🔗
|
SadDM |
I just double-checked the streaming player and tried it with a url that I was streaming, but still got a 400 |
|
01:09
🔗
|
SadDM |
you got it to work? |
|
01:10
🔗
|
SketchCow |
yes. |
|
01:10
🔗
|
SketchCow |
I could probably stream a good ripper tonight. |
|
01:10
🔗
|
SketchCow |
Script a good stream ripper |
|
01:10
🔗
|
SketchCow |
sorry |
|
01:11
🔗
|
SketchCow |
I was just stunned that John Lennon sold his house to Ringo Starr |
|
01:11
🔗
|
SadDM |
:-D |
|
01:11
🔗
|
SketchCow |
Ok, this one is on me. |
|
01:11
🔗
|
SketchCow |
I can do this. |
|
01:11
🔗
|
SketchCow |
The ripper is easy. |
|
01:11
🔗
|
SadDM |
do you want me to start feeding you lists of magic numbers? |
|
01:11
🔗
|
SketchCow |
I can acquire those myself. |
|
01:11
🔗
|
SketchCow |
No, the problem is I want to come up with a way to turn it into an IA item. |
|
01:11
🔗
|
SadDM |
alright then |
|
01:11
🔗
|
SketchCow |
I'm THINKING |
|
01:12
🔗
|
SketchCow |
Look at this BRAIN |
|
01:12
🔗
|
SadDM |
yeah, they have playlists for each show too which would make great metadata for each show stream |
|
01:14
🔗
|
SketchCow |
No other reason to do it. |
|
01:15
🔗
|
SadDM |
Anything I can do to help? |
|
01:16
🔗
|
SadDM |
I'm almost done for the night, but I could start serious work after the day job ends tomorrow |
|
01:28
🔗
|
SketchCow |
I think someone assisting me with ripping out html tables from the playlists would help. |
|
01:29
🔗
|
SketchCow |
I mean, I can do all this, but I have a massive to-do list |
|
01:30
🔗
|
SadDM |
OK, I'll start looking into that tomorrow night. |
|
01:32
🔗
|
SketchCow |
OK, so, division of labor |
|
01:32
🔗
|
SketchCow |
I am going to go ahead and just start taking in mp3s. |
|
01:33
🔗
|
SketchCow |
Since it's XXXX-YYYYY-ZZZZZZZZ.mp3 where XXXX is the show id, the resulting pile of ids can be stripped. |
|
01:33
🔗
|
SketchCow |
So we can take this mp3 set, and turn them into described items. |
|
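A sketch of the stripping step. It assumes every file follows the `SHOWID-EPISODEID-YYYYMMDD.mp3` pattern seen in this log. The middle field varies in length (e.g. `1232-234867-20140429.mp3`), so the sketch splits on hyphens rather than on fixed widths. The `evr_` item prefix mirrors the `evr_5744-50944-20140513` experiment item, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split an EVR archive filename into its fields.
# Prints: show_id episode_id date item_id
evr_parse_name() {
  local base show episode date
  base="$(basename "$1" .mp3)"
  IFS=- read -r show episode date <<< "$base"
  printf '%s %s %s evr_%s\n' "$show" "$episode" "$date" "$base"
}

# Hypothetical helper: list the distinct show ids in a directory of MP3s.
evr_show_ids() {
  local f
  for f in "$1"/*.mp3; do
    [ -e "$f" ] || continue        # no matches: skip the literal glob
    basename "$f" | cut -d- -f1
  done | sort -u
}
```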
01:42
🔗
|
SketchCow |
https://archive.org/details/evr_5744-50944-20140513 |
|
01:42
🔗
|
SketchCow |
Experiment #1 |
|
01:46
🔗
|
SketchCow |
This actually won't be hard on the suck side |
|
01:59
🔗
|
SketchCow |
PROC=$$ |
|
01:59
🔗
|
SketchCow |
wget "$1" |
|
01:59
🔗
|
SketchCow |
mv nowplaying* $PROC.showarchive.txt |
|
01:59
🔗
|
SketchCow |
for each in `cat $PROC.showarchive.txt | grep shows/player/main | sed 's/.*p=//g' | sed 's/\".*//g' | sed 's/^/http:\/\/www.eastvillageradio.com\/archivedshows\//g' | sed 's/\&f=/\//g' | sed 's/$/.mp3/g'` |
|
01:59
🔗
|
SketchCow |
do |
|
01:59
🔗
|
SketchCow |
wget --user-agent="EVR Will Never Die" "$each" |
|
01:59
🔗
|
SketchCow |
done |
|
01:59
🔗
|
SketchCow |
Turns out it wasn't hard at all. |
|
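For readability, here is the ripper reassembled into one script in the order it has to run. It fetches the show-archive page, pulls out every `shows/player/main` link, turns that link's `p` and `f` parameters into an MP3 URL, and downloads each file. This is a sketch, not the exact script that ran. It replaces the original `mv nowplaying*` rename with `wget -O` into a temp file, and the function names are hypothetical. It also assumes the player links look like `...shows/player/main...?p=<p>&f=<f>"`, which is the shape the original `sed` chain expects.

```shell
#!/usr/bin/env bash
# Sketch of the EVR ripper. Assumes each player link on the show-archive
# page contains "shows/player/main" and carries p=<dir>&f=<stem>
# followed by a closing double quote.

# Turn a saved archive page into one MP3 URL per line (the same
# substitutions as the original pipeline, in a single sed call).
evr_urls_from_page() {
  grep 'shows/player/main' "$1" \
    | sed -e 's/.*p=//' \
          -e 's/".*//' \
          -e 's/&f=/\//' \
          -e 's|^|http://www.eastvillageradio.com/archivedshows/|' \
          -e 's/$/.mp3/'
}

# Fetch one show-archive page ($1) and download every MP3 it links to.
evr_rip_show() {
  local page url
  page="$(mktemp)" || return 1
  if wget -q -O "$page" "$1"; then
    evr_urls_from_page "$page" | while IFS= read -r url; do
      wget --user-agent="EVR Will Never Die" "$url"
    done
  fi
  rm -f "$page"
}

# Usage (archive page URL as the only argument):
# evr_rip_show "<show archive page URL>"
```

Keeping the URL extraction in its own function means it can be checked against a saved page without touching the network.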
02:08
🔗
|
SketchCow |
Pulling in 6 hours of radio every 4 minutes. |
|
02:19
🔗
|
SketchCow |
Looks like 7 simultaneous streams is about the smartest |
|
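Capping the number of simultaneous downloads can be sketched with `xargs -P`. This assumes GNU xargs, because `-P`, `-r` and `-d` are extensions and not POSIX. It also assumes a file with one MP3 URL per line. `fetch_parallel` is a hypothetical name. The default of 7 is simply the figure from the log, not a tuned value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: download the URLs listed (one per line) in file $1,
# running at most $2 wget processes at a time (default 7).
# GNU xargs: -d '\n' splits on newlines only, -r skips empty input,
# -P sets the number of parallel processes.
fetch_parallel() {
  local url_file="$1" jobs="${2:-7}"
  xargs -r -d '\n' -n 1 -P "$jobs" \
    wget -q --user-agent="EVR Will Never Die" < "$url_file"
}

# Usage: fetch_parallel urls.txt 7
```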
02:33
🔗
|
dashcloud |
wow- that was quick! |
|
03:45
🔗
|
SketchCow |
712 hours grabbed so far. |
|
03:46
🔗
|
SketchCow |
So, one month. |
|
11:46
🔗
|
damongant |
For anyone with access to the 4chan article (I can't be bothered to sign in): we only archive images for 7 days. I'm the admin of deniableplausibility. |
|
12:50
🔗
|
SketchCow |
1,879 hours grabbed. |
|
12:52
🔗
|
SketchCow |
Shows are falling fast! |
|
12:52
🔗
|
SadDM |
SketchCow: I'm pulling down the playlists as we speak... I'll parse out the tables tonight |
|
12:52
🔗
|
SketchCow |
Actually, sorry, it's actually 3758 hours, 154 days. |
|
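Converting the running totals to days is division by 24. A quick check with a hypothetical helper, not part of the original tooling, gives these results. 712 hours is about 29.7 days, which matches "one month". 3758 hours is about 156.6 days, so the "154 days" figure appears slightly low; 154 days would be 3696 hours.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a count of hours of audio to days (one decimal).
hours_to_days() {
  awk -v h="$1" 'BEGIN { printf "%.1f\n", h / 24 }'
}

hours_to_days 712    # → 29.7
hours_to_days 3758   # → 156.6
```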
12:52
🔗
|
SketchCow |
SadDM: Thanks. |
|
12:53
🔗
|
SketchCow |
It'll be piles of MP3s with names like: |
|
12:53
🔗
|
SketchCow |
1232-234867-20140429.mp3 |
|
12:53
🔗
|
SadDM |
I also grabbed the show descriptions and art. |
|
12:53
🔗
|
SketchCow |
So, I'm doing an experiment, which is not coming out well. |
|
12:53
🔗
|
SketchCow |
I went to a wayback copy of the site, to find the shows now gone |
|
12:53
🔗
|
SketchCow |
And not surprisingly, their mp3s are wiped. |
|
12:54
🔗
|
SketchCow |
Also, as our wayback archives prove, this whole "playlist here, click here to listen" thing starts up in 2009. |
|
12:54
🔗
|
SadDM |
ok, when I get the tables parsed out I'll put them in text files named something like 1232-234867-20140429.desc? That seems pretty script-friendly to me. |
|
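SadDM proposes putting each playlist in a text file named after its MP3, such as `1232-234867-20140429.desc` next to `1232-234867-20140429.mp3`. With that scheme, pairing the two files is just a matter of swapping the extension. A minimal sketch, assuming both files sit in the same directory; `evr_missing_desc` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every MP3 in directory $1 that has no
# matching .desc playlist file (same basename, .desc extension).
evr_missing_desc() {
  local mp3
  for mp3 in "$1"/*.mp3; do
    [ -e "$mp3" ] || continue          # empty directory: glob did not match
    [ -f "${mp3%.mp3}.desc" ] || basename "$mp3"
  done
}
```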
12:55
🔗
|
SketchCow |
So while we won't get ALL the shows that were alive on EVR, we do have things going back through the full range of the site's archive. |
|
12:55
🔗
|
SketchCow |
SadDM: Definitely do a single one for me to see, and we'll experiment with injecting it into the page. |
|
12:59
🔗
|
SadDM |
I'll get that to you some time tonight. It looks like mapping the playlist to the show's id is going to take a tiny bit of work... probably more than I can get accomplished on my breaks today. |
|
13:00
🔗
|
SketchCow |
It's not THAT bad. |
|
13:00
🔗
|
SketchCow |
But I agree. |
|
13:00
🔗
|
SadDM |
yeah, it's just that the xxx-yyyyy-zzzzzzzzz number isn't in the url |
|
13:00
🔗
|
SketchCow |
It is, it's in the "listen" |
|
13:01
🔗
|
SketchCow |
So that page has playlist page and listen link |
|
13:01
🔗
|
SadDM |
right |
|
13:01
🔗
|
SadDM |
so I just need to grab that one piece of data to look up another... not too bad |
|
13:02
🔗
|
SadDM |
it's just not *in* the playlist page's url |
|
13:03
🔗
|
SketchCow |
So, as expected (?) the fact is, of the "shows" I can download from, they are only the shows that are still around, and then going back as far as the shows were streamed under the "new" system (2009) |
|
13:03
🔗
|
SketchCow |
And in some cases, mp3s have been removed regardless, even though it's an active show, so only the last couple of years. |
|
13:05
🔗
|
SketchCow |
But it is VERY obviously going to go past 4000 hours of music |
|
13:05
🔗
|
SketchCow |
it is very hard to complain |
|
13:14
🔗
|
SadDM |
yup... it's going to be a nice collection of hipster rage. |
|
16:54
🔗
|
SadDM |
SketchCow: for https://archive.org/details/evr_5744-50944-20140513 how did you extract that table? Did you just do a copy & paste? I'm asking because the tables in the html files seem to be several separate tables juggled into place with javascript. |
|
17:05
🔗
|
SketchCow |
I did it by hand as proof. |
|
17:10
🔗
|
SadDM |
ugh... I was afraid of that |
|
17:12
🔗
|
SadDM |
I'm open to suggestions from *anybody* on how to programmatically rip the table from this page: http://www.eastvillageradio.com/shows/playlists.aspx?contentid=1208&showid=511106&list=206717 |
|
17:31
🔗
|
SadDM |
They miss the step where they call us: http://www.smashingmagazine.com/2014/05/19/last-goodbye-shut-down-failing-product/ |
|
17:34
🔗
|
SketchCow |
SadDM: I am asking my co-employee at archive.org. |
|
17:37
🔗
|
SketchCow |
He wants it. |
|
17:37
🔗
|
SketchCow |
You shouldn't work on it. He has it. |
|
17:37
🔗
|
SketchCow |
He caused us to pay attention to it, he will eat the pain. |
|
17:38
🔗
|
SadDM |
LOL... good enough. I've got thousands more gaming zines to concentrate on anyway. |
|
18:54
🔗
|
SketchCow |
SadDM: https://archive.org/details/evr_test_item |
|
18:56
🔗
|
SadDM |
Your guy did this? What show and date is it? |
|
19:05
🔗
|
SketchCow |
https://archive.org/details/evr_test_item |
|
19:05
🔗
|
SketchCow |
Now has logo. He's working on date. |
|
19:05
🔗
|
SketchCow |
Logo AND description. |
|
19:20
🔗
|
SadDM |
that's looking pretty good |
|
19:21
🔗
|
SadDM |
I'd be interested to know how he's (?) re-assembling the set list. |
|
20:00
🔗
|
SketchCow |
He's a python genius |
|
20:00
🔗
|
SketchCow |
I bet he's just doing parsing |
|
20:01
🔗
|
SketchCow |
Like, I bet he's just got an HTML ingestor. |
|
20:03
🔗
|
SadDM |
I love that the world is filled with people that are smarter, and have more experience than me. |
|
21:33
🔗
|
monod |
helloooooooooooooooo |
|
21:33
🔗
|
monod |
I have a request |
|
21:34
🔗
|
monod |
Oh, btw, it's not a famous website, I think |
|
21:34
🔗
|
monod |
And I'm still browsing it |
|
21:34
🔗
|
monod |
So, it might even turn unvaluable |
|
21:34
🔗
|
monod |
But, I'd ask if it is possible to save smartphrases.com |
|
21:34
🔗
|
monod |
smartphrase.com* |
|
21:35
🔗
|
monod |
That's all. |
|
21:35
🔗
|
monod |
I don't think it's closing down, but I dunno |
|
21:35
🔗
|
ivan` |
archivebot is on it |
|
21:38
🔗
|
monod |
Oh my go |
|
21:38
🔗
|
monod |
d |
|
21:38
🔗
|
monod |
Do you mean you were already archiving that website? Or that you're going to archive it now? |
|
21:38
🔗
|
monod |
Or something else? |
|
21:38
🔗
|
ivan` |
started it just now |
|
21:38
🔗
|
ivan` |
http://archivebot.at.ninjawedding.org:4567/ |
|
21:39
🔗
|
monod |
I wonder if that website isn't too big! |
|
21:39
🔗
|
ivan` |
I very much doubt that :) |
|
21:39
🔗
|
monod |
33666.57 MB |
|
21:39
🔗
|
monod |
33 gigs??? |
|
21:39
🔗
|
monod |
Oops |
|
21:40
🔗
|
monod |
4.86 MB? o_O |
|
21:40
🔗
|
monod |
Is it for real?? XD |
|
21:40
🔗
|
Smiley |
so far monod |
|
21:40
🔗
|
monod |
Oh |
|
21:41
🔗
|
monod |
Guys, couldn't you get some colleges involved in your project? You'd get a lot of bandwidth, e.g. |
|
21:42
🔗
|
ivan` |
if you know someone with a spare xeon sitting around in a college please send them our way |
|
21:42
🔗
|
monod |
Uhm |
|
21:42
🔗
|
monod |
xeon == server? Or what? |
|
21:42
🔗
|
ivan` |
intel's server chip |
|
21:43
🔗
|
monod |
How does one cost? Also, what about bandwidth? Isn't it uncorrelated to server chips? |
|
21:43
🔗
|
monod |
How much* does one cost |
|
21:44
🔗
|
ivan` |
sure, you need bandwidth, CPU, memory, and disk |
|
21:44
🔗
|
ivan` |
$1100ish for a server? or ~$60/mo on OVH |
|
21:45
🔗
|
monod |
That's another question: who has all that archiving capacity? HDD capacity I mean |
|
21:45
🔗
|
ivan` |
for archivebot? for all the other projects? whoever here pays for it |
|
21:46
🔗
|
monod |
Online storage??? |
|
21:46
🔗
|
ivan` |
or do you mean who stores everything long-term? that would be archive.org |
|
21:46
🔗
|
monod |
Uhm |
|
21:47
🔗
|
monod |
I kinda meant: where are all the files being downloaded right now? :) And yeah, also who keeps them in the long-term, to which you already answered |
|
21:47
🔗
|
ivan` |
there's being downloaded to an OVH machine in Canada |
|
21:48
🔗
|
ivan` |
s/'s/'re/ |
|
21:48
🔗
|
ivan` |
gah need sleep |
|
21:48
🔗
|
monod |
same :D |
|
21:48
🔗
|
monod |
Going to get some in minutes ;) |
|
21:48
🔗
|
monod |
Anyway, then you re-download from the OVH servers to your "home", @archive.org |
|
21:49
🔗
|
monod |
Right? |
|
21:49
🔗
|
ivan` |
no, they're uploaded to fos.textfiles.com |
|
21:49
🔗
|
ivan` |
from there they make it into an archivebot collection on archive.org |
|
21:50
🔗
|
ivan` |
https://archive.org/details/archivebot |
|
21:50
🔗
|
monod |
Thanks |
|
22:08
🔗
|
monod |
Cya all! |
|
22:12
🔗
|
SketchCow |
Where's my hug |
|
22:13
🔗
|
* |
exmic points to the door |
|
22:13
🔗
|
exmic |
he'll be around shortly |
|
22:13
🔗
|
* |
Baljem provides interim SketchCow-hugging services |
|
22:14
🔗
|
Baljem |
my rates are exceedingly reasonable, too! |
|
22:16
🔗
|
exmic |
they exceed reasonability |
|
22:18
🔗
|
SketchCow |
I just love it when someone goes running in with questions. |
|
22:19
🔗
|
Baljem |
I was disappointed there wasn't more head-explodey action with that one |
|
22:19
🔗
|
Baljem |
that's always the best bit |
|
22:19
🔗
|
SketchCow |
I like the ones where someone goes "ok got the minimum amount of information OKAY PEOPLE HERE IS MY GROUND UP REWRITE FOR A COMPLETE OVERHAUL OF THE ARCHIVE TEAM PROCESS" |
|
22:20
🔗
|
SketchCow |
There's ossification of procedure and there's not making the same fundamental mistake 4,000 times |
|
22:21
🔗
|
SketchCow |
WHY IS THE TRACKER NOT IN RUBY ON RAILS |
|
22:21
🔗
|
amerrykan |
this thing you've been doing for years? yeah, it sucks. i re-engineered the entire thing while standing in the shower this morning |
|
22:21
🔗
|
exmic |
the only ruby on rails that's acceptable is http://rubylovesyou.com/ |
|
22:22
🔗
|
exmic |
nsfwish |
|
22:22
🔗
|
exmic |
I guess this is getting kind of offtopic |
|
22:23
🔗
|
SketchCow |
Or really, really ontopic |
|
22:23
🔗
|
exmic |
or that |
|
22:23
🔗
|
Baljem |
yeah, see, my rates are nowhere near her rates |
|
22:23
🔗
|
amerrykan |
she takes care of all the microsoft boys |
|
22:23
🔗
|
exmic |
lol |
|
22:24
🔗
|
Baljem |
admittedly my services are limited to hugs, though, so y'know. |
|
22:24
🔗
|
exmic |
you know they sell whips, right? |
|
22:27
🔗
|
amerrykan |
i really don't want to know what financial domination is, do i |
|
22:28
🔗
|
SketchCow |
And now I am playing the latest episode of Veep at +20% speed |
|
22:28
🔗
|
SketchCow |
Apparently the Internet Archive and Wayback machine are mentioned. |
|
22:28
🔗
|
yipdw |
oh speaking of which I should make sure my DigitalOcean account has enough money |
|
22:28
🔗
|
yipdw |
it'd be hilarious if archivebot.at.ninjawedding.org just died |
|
22:28
🔗
|
exmic |
yeah, hilarious |
|
22:28
🔗
|
yipdw |
all good |
|
22:28
🔗
|
SketchCow |
Laff Riot |
|
22:29
🔗
|
exmic |
glad that someone is on that |