[01:03] alard: Mind if I look at your scripts?
[01:03] :)
[02:51] alard: actually, poke me here instead
[02:51] since I keep this client open
[02:51] :)
[04:04] I agree we should grab as many videos as possible.
[04:27] In another window, I'm digitizing a Betacam SP tape from G4 TV regarding GDC's nominees and winners, as well as a bunch of G4 spots and icons.
[04:27] This is as high quality as you can possibly get, so I've got some pretty amazing items going by.
[04:29] --
[04:29] OK, again, let's go with the high level of this JSTOR thing.
[04:30] I'd like to make it so that you browse JSTOR, you see JSTOR, but when you click and read, it downloads the article and then uploads it to us.
[04:30] With all the metadata.
[04:30] So a person is "reading" and they're also saving for us.
[04:30] And having it so you're asking US what to read, so we don't get overlaps.
[04:30] We'll demolish the lists in no time WITHOUT using scripts or DDoS.
[04:30] We COULD use scripts, that's not the point, this is more hilarious.
[04:31] So go for hilarious, we'll get attention and make people snicker.
[06:14] And I think people can get through the back catalog easily.
[06:39] Sounds great
[06:46] didn't themes.freshmeat.net use to exist?
[06:46] or am I thinking of something else?
[06:46] Yes, something like that.
[06:46] huh.
[07:59] HEY SO IT TURNS OUT
[08:00] If you demand and get a dedicated laptop, you can have this laptop and a Betamax player make out for days on end.
[08:00] And they'll be able to ingest so much video, it's fuckin' sick.
[08:00] get a room
[08:01] They did. My room.
[08:01] Oh look, it's Peter Molyneux
[08:01] I kind of wonder what the lady thinks of that
[08:01] but then she knows you're kind of an archivist
[08:02] She's got a place in the city
[08:02] The lady and I don't live together
[08:02] aye
[08:02] I live in the fucksticks and come down often.
[08:02] I think she's been up here.... maybe twice, three times
[08:02] huh, okay, blows that theory out of the water
[08:02] The landscape is pretty but the house itself's kinda awful
[08:24] SketchCow: have you had a look at the JSTOR thing?
[08:27] Not yet.
[08:28] Should I?
[08:29] Well, why not? It's supposed to help with your no-scripts idea. :)
[08:29] It doesn't download metadata, but it does download and upload PDFs.
[08:29] Give me the link again.
[08:29] http://severe-samurai-6114.heroku.com/
[08:30] you should call it J-U-Stor-It
[08:30] Y-U-stor-it
[08:31] Heh.
[08:31] Y U SO ACADEMIC
[08:32] Come on dude.
[08:33] Is that not the most beautiful thing ever.
[08:33] Isn't that so much better than Archive Team scans and downloads
[08:33] I like JUStorIt better
[08:34] Where do these PDFs end up, by the way.
[08:34] And yes, we really do need the metadata.
[08:34] yea, with a collection that large there's no point in bothering if you don't have the metadata
[08:34] But this is obviously 90% of what I was requesting.
[08:34] true
[08:35] should turn it into a restartless Firefox addon
[08:35] that makes it automatic
[08:35] We don't want automatic.
[08:35] hmm
[08:35] We want thousands of people to get this, run it, and be liberating JSTOR at a slower, don't-go-to-jail pace
[08:36] And JSTOR running around, watching everything go in every direction.
[08:36] Embarrassed as hell.
[08:36] ah, I see
[08:36] Style, it's about style.
[08:36] we should explain that we *COULD* do it the obvious rapey way but look see that's entirely unnecessary
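(A rough sketch of the loop being described, for readers following along: the coordinating server hands out the next article, the page fetches the PDF while its human "reads", and the bytes go back up. The /next and /upload routes and the PDF URL pattern are invented for illustration; this is not alard's actual code, and browser same-origin restrictions are glossed over.)

    // Minimal sketch of the "reading is saving" flow, assuming hypothetical
    // /next and /upload routes on the coordinating server.
    const COORDINATOR = 'http://severe-samurai-6114.heroku.com';

    async function liberateNext() {
      // Ask the coordinator which article to fetch, so readers don't overlap.
      const id = (await (await fetch(COORDINATOR + '/next')).text()).trim();

      // Download the PDF as raw bytes while the user "reads" it.
      // (URL pattern assumed for illustration.)
      const pdfUrl = 'http://www.jstor.org/stable/pdfplus/' + id + '.pdf';
      const pdf = await (await fetch(pdfUrl, { credentials: 'include' })).arrayBuffer();

      // POST the file back: the PDF's name plus the data itself.
      await fetch(COORDINATOR + '/upload', {
        method: 'POST',
        headers: { 'Content-Type': 'application/octet-stream',
                   'X-Filename': id + '.pdf' },
        body: pdf
      });
    }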
[08:37] I'd not try to dissuade people from "helping" us in exactly the way we're avoiding
[08:37] But I do agree on dissuading.
[08:37] I could see writing something like "It turns out if you download too much you go to jail"
[08:37] I'll compose something, how about that.
[08:37] Since we'll have a wiki page for that.
[08:37] but I'd like to see the metadata thing working.
[08:37] sure
[08:38] The point about the metadata is this:
[08:38] 1. at the moment, the thing needs to be fed a list of article IDs
[08:38] 2. if you have collected the article IDs, you also have the corresponding metadata.
[08:38] ...
[08:39] And if you're browsing, and not downloading, you're not doing the click-through license!
[08:39] Where are these uploading, by the way.
[08:40] They're uploading back to the application. File name of the PDF + the data itself. (And in this example setup, nothing is saved.)
[08:43] Server-side it's pretty simple: there's something that provides the next ID and something that receives the POSTed data.
[08:43] Are you saying severe-samurai-6114.heroku.com is getting the data?
[08:44] Yes.
[08:45] alard: why do you need the Base64 class? why not use the built-in functions btoa and atob?
[08:45] Ignorance?
[08:47] oh, you're using typed arrays too
[08:47] you should put that info on the side of the page so all my retard friends don't need to ask me what it does :P
[08:47] that way you'll get more "hosts" ;)
[08:47] Which info.
[08:47] the mission info :)
[08:48] P.S. This guy and I are now working together: http://vimeo.com/29184137
[08:48] Thanks, Soojin.
[08:48] I mean, wait...
[08:48] ...duh
[08:48] I'd rather alard and db48x work to make the code work as best it can, I'll make sure the rest is smooth.
[08:48] But I want that side of things nice and tight, I can get a couple hosts going, etc.
[08:48] db48x: Yes, it took some trickery to get the binary PDFs to download and upload and arrive in one piece.
[08:49] Good.
[08:50] yea
[08:51] I'm working on a parser in javascript that uses them at the moment
[08:51] For the metadata, I think the 'Summary' box contains everything?
[08:51] Possibly.
[08:52] So maybe it's an idea to grab that and submit it with the data, then figure out how to parse it later?
[08:52] hrm
[08:53] Abstract (back to top)
[08:53] An abstract for this item is not available.
[08:54] But the bibliographic information is there.
[08:55] ugh, the source for these pages is annoying
[08:57] I agree, the bibliographic info is there.
[08:57] Quickly checking to see if there's any other way to get those.
[08:58] http://www.jstor.org/action/downloadCitation?format=bibtex&include=abs
[08:58] Sorry, session in there.
[08:58] @article{1909,
[08:58] @comment{{NUMBER OF CITATIONS : 1}}
[08:58] author = {Alphonsus, Brother},
[08:58] jstor_articletype = {research-article},
[08:58] title = {Birds Found in St. Joseph Co., Ind., Each Day in June, 1909},
[08:58] journal = {Midland Naturalist},
[08:58] jstor_issuetitle = {},
[08:58] volume = {1},
[08:58] number = {4},
[08:58] jstor_formatteddate = {Oct., 1909},
[08:58] pages = {pp. 97-99},
[08:58] url = {http://www.jstor.org/stable/2993227},
[08:58] ISSN = {02716844},
[08:59] abstract = {},
[08:59] language = {English},
[08:59] year = {1909},
[08:59] publisher = {The University of Notre Dame},
[08:59] copyright = {Copyright © 1909 The University of Notre Dame},
[08:59] }
[08:59] I think that's probably superior, you'll agree.
[09:00] looks wonderfully structured
[09:00] indeed
[09:00] bibtex is the way to go
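(Fetching that record programmatically might look like the sketch below, using the export endpoint quoted at 08:58. The "doi" parameter name is a guess; the URL alard pasted had the session baked in, so this would presumably need the same session cookies to work.)

    // Sketch: pull the BibTeX record for one article. The id parameter
    // name is an assumption, and JSTOR session state is assumed present.
    async function fetchBibtex(articleId) {
      const url = 'http://www.jstor.org/action/downloadCitation' +
                  '?format=bibtex&include=abs&doi=' + encodeURIComponent(articleId);
      const response = await fetch(url, { credentials: 'include' });
      return response.text();  // e.g. "@article{1909, author = {Alphonsus, Brother}, ..."
    }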
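(Context for the Base64-versus-btoa question at 08:45, resolved just below at 09:05: btoa takes a string, so handing it a typed array or ArrayBuffer stringifies it first and the data is lost. One common workaround, sketched here, is to widen the bytes into a binary string before encoding; this is illustrative, not necessarily what alard's Base64 class does.)

    // btoa(buffer) would effectively compute btoa("[object ArrayBuffer]"),
    // losing the contents. Building a byte-by-byte binary string first works:
    function arrayBufferToBase64(buffer) {
      const bytes = new Uint8Array(buffer);
      let binary = '';
      for (let i = 0; i < bytes.length; i++) {
        binary += String.fromCharCode(bytes[i]);
      }
      return btoa(binary);
    }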
[09:00] But is it complete? (Sometimes they leave things out.)
[09:00] it is, after all, bibtex
[09:00] why not get both
[09:01] i feel like jstor may be a false flag distracting us from real fires
[09:02] bibtex is somewhat expensive for JSTOR, since it has to be generated with a separate request.
[09:02] Don't care about that.
[09:02] however, i 1) am not equipped to have this argument now and 2) have not seen other fires
[09:02] chronomex: I wouldn't have set alard on this if I thought it was time-consuming.
[09:03] And it's not, this is less than 24 hours of effort.
[09:03] aye
[09:03] anyway, bedtime
[09:03] This little show plays well, when describing it.
[09:03] Turning thousands of people who are pissed about JSTOR into mules
[09:04] And we can constantly mention how this has to be done so people aren't sent to jail for 30 years.
[09:04] :)
[09:04] For my own bit, Friendster material goes up soon, and in doing that, it'll make life easier for the poor server
[09:04] Which is now crazy clogged with data
[09:04] Where did you get the bibtex? Is that via Export Citation?
[09:04] alard: yes
[09:04] Yes
[09:05] db48x: btoa doesn't give the same results as Base64.encode
[09:05] alard: no, not for a typed array :)
[09:05] Okay, I'll stop trying then.
[09:06] (it actually stringifies the array first, so it's really doing btoa("[object ArrayBuffer]") or whatever)
[09:08] I'm going to file a bug
[09:17] bug 687418
[09:18] (https://bugzilla.mozilla.org/show_bug.cgi?id=687418)
[09:20] Cool.
[09:25] So, have the wget guys taken the WARC stuff yet?
[09:27] The wget guy has said that he would look at the code, but that was a few weeks ago.
[09:27] Ask him if there's anything you can answer or help with.
[09:27] Just a way to say hi.
[09:27] Without demanding or complaining.
[09:27] Yeah.
[09:28] I've just sent the copyright assignment stuff back to them, maybe that's also a good reason to email.
[09:28] It's more for gnulib than for wget, but it's a reason.
[09:32] Okay, the JSTOR thing should now download the bibtex and include the contents from the abstract/bibliographic sections.
[09:35] What next?
[09:35] I'd like a limited test.
[09:35] Throw it to a few random folks, see what comes out the other end.
[09:35] See how it pulls, etc.
[09:38] Then the question becomes: where to host this thing?
[09:39] I'll figure that out.
[10:43] i'm starting to archive gbtv
[10:43] and the screen savers
[10:43] looks like there is like 12 months of glenn beck on archive.org
[10:56] Yes
[11:00] the white balance in a lot of youtube screen savers episodes is off
[11:00] like it's too bright
[18:40] I was at a They Might Be Giants concert last night... they were mentioning the various social media they were on between songs, and mentioned friendster several times
[19:40] Coderjoe: did you LOL?
[19:40] yes
[19:40] and a coworker gave me a glance and LOLd as well
[19:45] Hah
[20:09] Shouldn't we do something with the Delicious archiving? It has been asked before, I know, and there are the scripts by db48x and user lists by SketchCow, but is anyone actually running those?
[21:01] alard: the scripts were mostly written by underscore :)
[21:01] I'm not sure how complete they are
[21:01] Ah, I see :)
[21:02] They seem pretty complete.
[21:02] that's good
[21:02] At least cannibal.sh does seem to download most of the interesting bits.
[21:02] I want to sit down and review them again, compare what they download with the site
[21:03] I had to replace grep -oP with pcregrep -o, though. Sometimes my grep -oP only said 'Aborted' and didn't grep any bookmarks.
[21:03] hmm
[21:05] won't be today though
[21:14] too bad there is non-public stuff that we can't get at
[21:14] does this script also handle additional bits, like Delicious Library?
[21:15] What's that?
[21:59] alard: I haven't used it, but a friend mentioned he uses it to keep track of things like his DVD collection
[22:00] But isn't that a separate program?
[22:00] A shiny Mac app?
[22:05] I don't really know