#archiveteam 2011-09-19,Mon

Time Nickname Message
01:03 🔗 undersco2 alard: Mind if I look at your scripts?
01:03 🔗 undersco2 :)
02:51 🔗 underscor alard: actually, poke me here instead
02:51 🔗 underscor since I keep this client open
02:51 🔗 underscor :)
04:04 🔗 SketchCow I agree we should grab as many videos as possible.
04:27 🔗 SketchCow In another window, I'm digitizing a BetacamSP tape from G4 TV regarding GDC's nominees and winners, as well as a bunch of G4 spots and icons.
04:27 🔗 SketchCow This is as high quality as you can possibly get, so I've got some pretty amazing items going by.
04:29 🔗 SketchCow --
04:29 🔗 SketchCow OK, again, let's go with the high level of this JSTOR thing.
04:30 🔗 SketchCow I'd like to make it that you browse JSTOR, you see the JSTOR, but when you click and read, it downloads it and then uploads to us.
04:30 🔗 SketchCow With all the metadata.
04:30 🔗 SketchCow So a person is "reading" and they're also saving for us.
04:30 🔗 SketchCow And having it so you're asking US what to read, so we don't get overlaps.
04:30 🔗 SketchCow We'll demolish the lists in no time WITHOUT using Scripts or DDOS.
04:30 🔗 SketchCow We COULD use scripts, that's not the point, this is more hilarious.
04:31 🔗 SketchCow So go for hilarious, we'll get attention and make people snicker.
06:14 🔗 SketchCow And I think people can get through the back catalog easily.
06:39 🔗 ersi_ Sounds great
06:46 🔗 chronomex didn't themes.freshmeat.net use to exist?
06:46 🔗 chronomex or am I thinking of something else?
06:46 🔗 SketchCow Yes, something like that.
06:46 🔗 chronomex huh.
07:59 🔗 SketchCow HEY SO IT TURNS OUT
08:00 🔗 SketchCow If you demand and get a dedicated laptop, you can have this laptop and a Betamax player make out for days on end.
08:00 🔗 SketchCow And they'll be able to ingest so much video, it's fuckin' sick.
08:00 🔗 chronomex get a room
08:01 🔗 SketchCow They did. My room.
08:01 🔗 SketchCow Oh look, it's Peter Molyneux
08:01 🔗 chronomex I kind of wonder what the lady thinks of that
08:01 🔗 chronomex but then she knows you're kind of an archivist
08:02 🔗 SketchCow She's got a place in the city
08:02 🔗 SketchCow The lady and I don't live together
08:02 🔗 chronomex aye
08:02 🔗 SketchCow I live in the fucksticks and come down often.
08:02 🔗 SketchCow I think she's been up here.... maybe twice, three times
08:02 🔗 chronomex huh, okay, blows that theory out of the water
08:02 🔗 SketchCow The landscape is pretty but the house itself's kinda awful
08:24 🔗 alard SketchCow: have you had a look at the JSTOR thing?
08:27 🔗 SketchCow Not yet.
08:28 🔗 SketchCow Should I?
08:29 🔗 alard Well, why not? It's supposed to help with your no-scripts idea. :)
08:29 🔗 alard It doesn't download metadata, but it does download and upload pdfs.
08:29 🔗 SketchCow Give me the link again.
08:29 🔗 alard http://severe-samurai-6114.heroku.com/
08:30 🔗 chronomex you should call it J-U-Stor-It
08:30 🔗 ersi Y-U-stor-it
08:31 🔗 alard Heh.
08:31 🔗 SketchCow Y U SO ACADEMIC
08:32 🔗 SketchCow Come on dude.
08:33 🔗 SketchCow Is that not the most beautiful thing ever.
08:33 🔗 SketchCow Isn't that so much better than Archive Team scans and downloads
08:33 🔗 db48x I like JUStorIt better
08:34 🔗 SketchCow Where do these PDFs end up, by the way.
08:34 🔗 SketchCow And yes, we really do need the metadata.
08:34 🔗 db48x yea, with a collection that large there's no point in bothering if you don't have the metadata
08:34 🔗 SketchCow But this is obviously 90% of what I was requesting.
08:34 🔗 db48x true
08:35 🔗 db48x should turn it into a restartless Firefox addon
08:35 🔗 db48x that makes it automatic
08:35 🔗 SketchCow We don't want automatic.
08:35 🔗 db48x hmm
08:35 🔗 SketchCow We want thousands of people to get this, run it, and be liberating JSTOR at a slower, don't go to jail pace
08:36 🔗 SketchCow And JSTOR running around, watching everything go in every direction.
08:36 🔗 SketchCow Embarrassed as hell.
08:36 🔗 db48x ah, I see
08:36 🔗 SketchCow Style, it's about style.
08:36 🔗 chronomex we should explain that we *COULD* do it the obvious rapey way but look see that's entirely unnecessary
08:37 🔗 SketchCow I'd not.
08:37 🔗 chronomex try to dissuade people from "helping" us in exactly the way we're avoiding
08:37 🔗 SketchCow But I do agree on dissuading.
08:37 🔗 SketchCow I could see writing something like "It turns out if you download too much you go to jail"
08:37 🔗 SketchCow I'll compose something, how about that.
08:37 🔗 SketchCow Since we'll have a wiki page for that.
08:37 🔗 SketchCow but I'd like to see the metadata thing working.
08:37 🔗 chronomex sure
08:38 🔗 alard The point about the metadata is this:
08:38 🔗 alard 1. at the moment, the thing needs to be fed a list of article IDs
08:38 🔗 alard 2. if you have collected the article IDs, you also have the corresponding metadata.
08:38 🔗 SketchCow ...
08:39 🔗 SketchCow And if you're browsing, and not downloading, you're not doing the click through license!
08:39 🔗 SketchCow Where are these uploading, by the way.
08:40 🔗 alard They're uploading back to the application. File name of the pdf + the data itself. (And in this example setup, nothing is saved.)
08:43 🔗 alard Server-side it's pretty simple: there's something that provides the next id and something that receives the POSTed data.
08:43 🔗 SketchCow Are you saying severe-samurai-6114.heroku.com is getting the data?
08:44 🔗 alard Yes.
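The two server-side pieces alard describes (one that hands out the next article ID, one that receives the POSTed file name and data) could be sketched roughly as follows. This is not alard's code: the `WorkQueue` class, its method names, and the article IDs are all hypothetical, and it keeps uploads in memory purely for illustration, whereas the example setup in the channel saves nothing.

```python
from collections import deque


class WorkQueue:
    """Hypothetical in-memory stand-in for the two server-side pieces."""

    def __init__(self, article_ids):
        self.pending = deque(article_ids)  # IDs not yet handed out
        self.assigned = set()              # IDs handed out, awaiting an upload
        self.received = {}                 # article ID -> (file name, pdf bytes)

    def next_id(self):
        """Hand out the next unassigned article ID, or None when the list is done."""
        if not self.pending:
            return None
        article_id = self.pending.popleft()
        self.assigned.add(article_id)
        return article_id

    def receive(self, article_id, filename, data):
        """Accept an upload; reject IDs that were never handed out."""
        if article_id not in self.assigned:
            return False
        self.assigned.discard(article_id)
        self.received[article_id] = (filename, data)
        return True


queue = WorkQueue(["1001", "1002"])  # made-up article IDs
first = queue.next_id()
second = queue.next_id()
accepted = queue.receive(first, first + ".pdf", b"%PDF-1.4 ...")
```

Handing each ID out only once is what avoids the overlaps mentioned earlier in the channel. A real deployment would presumably also need to re-queue IDs that are assigned but never uploaded.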
08:45 🔗 db48x alard: why do you need the Base64 class? why not use the built-in functions btoa and atob?
08:45 🔗 alard Ignorance?
08:47 🔗 db48x oh, you're using typed arrays too
08:47 🔗 Soojin you should put that info on the side of the page so all my retard friends don't need to ask me what it does:P
08:47 🔗 Soojin that way you'll get more "hosts" ;)
08:47 🔗 SketchCow Which info.
08:47 🔗 Soojin the mission info :)
08:48 🔗 SketchCow P.S. This guy and I are now working together: http://vimeo.com/29184137
08:48 🔗 SketchCow Thanks, Soojin.
08:48 🔗 SketchCow I mean, wait...
08:48 🔗 SketchCow ...duh
08:48 🔗 SketchCow I'd rather alard and db48x work to make the code work as best it can, I'll make sure the rest is smooth.
08:48 🔗 SketchCow But I want that side of things nice and tight, I can get a couple hosts going, etc.
08:48 🔗 alard db48x: Yes, it took some trickery to get the binary pdfs to download and upload and arrive in one piece.
08:49 🔗 alard Good.
08:50 🔗 db48x yea
08:51 🔗 db48x I'm working on a parser in javascript that uses them at the moment
08:51 🔗 alard For the metadata, I think the 'Summary' box contains everything?
08:51 🔗 SketchCow Possibly.
08:52 🔗 alard So maybe it's an idea to grab that and submit it with the data, then figure out how to parse it later?
08:52 🔗 db48x hrm
08:53 🔗 db48x Abstract(back to top)
08:53 🔗 db48x An abstract for this item is not available.
08:54 🔗 alard But the bibliographic information is there.
08:55 🔗 db48x ugh, the source for these pages is annoying
08:57 🔗 SketchCow I agree, the bibliographic info is there.
08:57 🔗 SketchCow Quickly checking to see if there's any other way to get those.
08:58 🔗 SketchCow http://www.jstor.org/action/downloadCitation?format=bibtex&include=abs
08:58 🔗 SketchCow Sorry, session in there.
08:58 🔗 SketchCow @article{1909,
08:58 🔗 SketchCow @comment{{NUMBER OF CITATIONS : 1}}
08:58 🔗 SketchCow author = {Alphonsus, Brother},
08:58 🔗 SketchCow jstor_articletype = {research-article},
08:58 🔗 SketchCow title = {Birds Found in St. Joseph CO., Ind., Each Day in June, 1990},
08:58 🔗 SketchCow journal = {Midland Naturalist},
08:58 🔗 SketchCow jstor_issuetitle = {},
08:58 🔗 SketchCow volume = {1},
08:58 🔗 SketchCow number = {4},
08:58 🔗 SketchCow jstor_formatteddate = {Oct., 1909},
08:58 🔗 SketchCow pages = {pp. 97-99},
08:58 🔗 SketchCow url = {http://www.jstor.org/stable/2993227},
08:58 🔗 SketchCow ISSN = {02716844},
08:59 🔗 SketchCow abstract = {},
08:59 🔗 SketchCow language = {English},
08:59 🔗 SketchCow year = {1909},
08:59 🔗 SketchCow publisher = {The University of Notre Dame},
08:59 🔗 SketchCow copyright = {Copyright © 1909 The University of Notre Dame},
08:59 🔗 SketchCow }
08:59 🔗 SketchCow I think that's probably superior, you'll agree.
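Because each field in the export sits on its own `key = {value},` line, pulling the metadata out is straightforward. A minimal sketch, assuming records always look like the one pasted above (one field per line, no nested braces spanning lines); `parse_fields` is a hypothetical helper and not a general BibTeX parser:

```python
import re

# Matches lines of the form:  key = {value},
FIELD = re.compile(r"^\s*(\w+)\s*=\s*\{(.*)\},?\s*$")


def parse_fields(record):
    """Return a dict of field name -> value for one exported record."""
    fields = {}
    for line in record.splitlines():
        match = FIELD.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


# Abbreviated copy of the record pasted in the channel.
sample = """@article{1909,
 author = {Alphonsus, Brother},
 journal = {Midland Naturalist},
 volume = {1},
 number = {4},
 pages = {pp. 97-99},
 url = {http://www.jstor.org/stable/2993227},
 abstract = {},
 year = {1909},
}"""

fields = parse_fields(sample)
```

Empty fields such as `abstract = {}` come back as empty strings, so a caller can tell "field present but blank" apart from "field missing".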
09:00 🔗 chronomex looks wonderfully structured
09:00 🔗 db48x indeed
09:00 🔗 db48x bibtex is the way to go
09:00 🔗 alard But is it complete? (Sometimes they leave things out.)
09:00 🔗 chronomex it is, after all, bibtex
09:00 🔗 chronomex why not get both
09:01 🔗 chronomex i feel like jstor may be a false flag distracting us from real fires
09:02 🔗 alard bibtex is somewhat expensive for JSTOR, since it has to be generated with a separate request.
09:02 🔗 SketchCow Don't care about that.
09:02 🔗 chronomex however, i 1) am not equipped to do this argument now and 2) have not seen other fires
09:02 🔗 SketchCow chronomex: I wouldn't have set alard on this if I thought it was time consuming.
09:03 🔗 SketchCow And it's not, this is less than 24 hours of effort.
09:03 🔗 chronomex aye
09:03 🔗 chronomex anyway, bedtime
09:03 🔗 SketchCow This little show plays well, when describing it.
09:03 🔗 SketchCow Turning thousands of people who are pissed about JSTOR into mules
09:04 🔗 SketchCow And we can constantly mention how this has to be done so people aren't sent to jail for 30 years.
09:04 🔗 db48x :)
09:04 🔗 SketchCow For my own bit, Friendster material goes up soon, and in doing that, it'll make life easier for the poor server
09:04 🔗 SketchCow Which is now crazy clogged with data
09:04 🔗 alard Where did you get the bibtex? Is that via Export Citation?
09:04 🔗 db48x alard: yes
09:04 🔗 SketchCow Yes
09:05 🔗 alard db48x: btoa doesn't give the same results as Base64.encode
09:05 🔗 db48x alard: no, not for a typed array :)
09:05 🔗 alard Okay, I'll stop trying then.
09:06 🔗 db48x (it actually stringifies the array first, so it's really doing btoa("[object ArrayBuffer]") or whatever)
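The pitfall db48x describes is not specific to JavaScript: if a buffer is turned into its string representation before encoding, the Base64 output describes that string and not the file. A Python analogue of the same mistake, offered only as an illustration (the app itself is JavaScript, and the sample bytes here are made up):

```python
import base64

# "%PDF-" followed by two bytes that are not printable text.
pdf_bytes = bytes([0x25, 0x50, 0x44, 0x46, 0x2D, 0xFF, 0x00])

# Encoding the raw bytes round-trips cleanly.
correct = base64.b64encode(pdf_bytes)

# Encoding a stringified placeholder (the text db48x quotes) encodes
# the placeholder itself, so the original data is lost.
placeholder = "[object ArrayBuffer]"
wrong = base64.b64encode(placeholder.encode("ascii"))

round_trip = base64.b64decode(correct)
decoded_wrong = base64.b64decode(wrong)
```

Decoding `wrong` gives back only the placeholder text, which is why the two encoders disagreed on typed-array input.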
09:08 🔗 db48x I'm going to file a bug
09:17 🔗 db48x bug 687418
09:18 🔗 db48x (https://bugzilla.mozilla.org/show_bug.cgi?id=687418)
09:20 🔗 alard Cool.
09:25 🔗 SketchCow So, have the wget guys taken the warc stuff yet?
09:27 🔗 alard The wget guy has said that he would look at the code, but that's a few weeks ago.
09:27 🔗 SketchCow Ask him if there's anything you can answer or help with.
09:27 🔗 SketchCow Just a way to say hi.
09:27 🔗 SketchCow Without demanding or complaining.
09:27 🔗 alard Yeah.
09:28 🔗 alard I've just sent the copyright assignment stuff back to them, maybe that's also a good reason to email.
09:28 🔗 alard It's more for gnulib than for wget, but it's a reason.
09:32 🔗 alard Okay, the JSTOR thing should now download the bibtex and include the contents from the abstract/bibliographic sections.
09:35 🔗 alard What next?
09:35 🔗 SketchCow I'd like a limited test.
09:35 🔗 SketchCow Throw it to a few random folks, see what comes out the other end.
09:35 🔗 SketchCow See how it pulls, etc.
09:38 🔗 alard Then the question becomes: where to host this thing?
09:39 🔗 SketchCow I'll figure that out.
10:43 🔗 godane i'm starting to archive gbtv
10:43 🔗 godane and the screen savers
10:43 🔗 godane looks like there is like 12 months of glenn beck on archive.org
10:56 🔗 SketchCow Yes
11:00 🔗 godane the white balance in a lot of youtube screen savers episodes is off
11:00 🔗 godane like it's too bright
18:40 🔗 Coderjoe I was at a They Might Be Giants concert last night... they were mentioning the various social media they were on between songs, and mentioned friendster several times
19:40 🔗 ersi Coderjoe: did you LOL?
19:40 🔗 Coderjoe yes
19:40 🔗 Coderjoe and a coworker gave me a glance and LOLd as well
19:45 🔗 ersi Hah
20:09 🔗 alard Shouldn't we do something with the Delicious archiving? Has been asked before, I know, and there are the scripts by db48x and user lists by SketchCow, but is anyone actually running those?
21:01 🔗 db48x alard: the scripts were mostly written by underscore :)
21:01 🔗 db48x I'm not sure how complete they are
21:01 🔗 alard Ah, I see :)
21:02 🔗 alard They seem pretty complete.
21:02 🔗 db48x that's good
21:02 🔗 alard At least the cannibal.sh does seem to download most of the interesting bits.
21:02 🔗 db48x I want to sit down and review them again, compare what they download with the site
21:03 🔗 alard I had to replace grep -oP with pcregrep -o, though. Sometimes my grep -oP only said 'Aborted' and didn't grep any bookmarks.
21:03 🔗 db48x hmm
21:05 🔗 db48x won't be today though
21:14 🔗 Coderjoe too bad there is non-public stuff that we can't get at
21:14 🔗 Coderjoe does this script also handle additional bits, like delicious library?
21:15 🔗 alard What's that?
21:59 🔗 Coderjoe alard: I haven't used it, but a friend mentioned he uses it to keep track of things like his DVD collection
22:00 🔗 alard But isn't that a separate program?
22:00 🔗 alard A shiny mac app?
22:05 🔗 Coderjoe I don't really know