#archiveteam 2011-09-19,Mon


Time	Nickname	Message
01:03 ^🔗	undersco2	alard: Mind if I look at your scripts?
01:03 ^🔗	undersco2	:)
02:51 ^🔗	underscor	alard: actually, poke me here instead
02:51 ^🔗	underscor	since I keep this client open
02:51 ^🔗	underscor	:)
04:04 ^🔗	SketchCow	I agree we should grab as many videos as possible.
04:27 ^🔗	SketchCow	In another window, I'm digitizing a BetacamSP tape from G4 TV regarding GDC's nominees and winners, as well as a bunch of G4 spots and icons.
04:27 ^🔗	SketchCow	This is as high quality as you can possibly get, so I've got some pretty amazing items going by.
04:29 ^🔗	SketchCow	--
04:29 ^🔗	SketchCow	OK, again, let's go with the high level of this JSTOR thing.
04:30 ^🔗	SketchCow	I'd like to make it that you browse JSTOR, you see the JSTOR, but when you click and read, it downloads it and then uploads to us.
04:30 ^🔗	SketchCow	With all the metadata.
04:30 ^🔗	SketchCow	So a person is "reading" and they're also saving for us.
04:30 ^🔗	SketchCow	And having it so you're asking US what to read, so we don't get overlaps.
04:30 ^🔗	SketchCow	We'll demolish the lists in no time WITHOUT using Scripts or DDOS.
04:30 ^🔗	SketchCow	We COULD use scripts, that's not the point, this is more hilarious.
04:31 ^🔗	SketchCow	So go for hilarious, we'll get attention and make people snicker.
06:14 ^🔗	SketchCow	And I think people can get through the back catalog easily.
06:39 ^🔗	ersi_	Sounds great
06:46 ^🔗	chronomex	didn't themes.freshmeat.net used to exist?
06:46 ^🔗	chronomex	or am I thinking of something else?
06:46 ^🔗	SketchCow	Yes, something like that.
06:46 ^🔗	chronomex	huh.
07:59 ^🔗	SketchCow	HEY SO IT TURNS OUT
08:00 ^🔗	SketchCow	If you demand and get a dedicated laptop, you can have this laptop and a Betamax player make out for days on end.
08:00 ^🔗	SketchCow	And they'll be able to ingest so much video, it's fuckin' sick.
08:00 ^🔗	chronomex	get a room
08:01 ^🔗	SketchCow	They did. My room.
08:01 ^🔗	SketchCow	Oh look, it's Peter Molyneux
08:01 ^🔗	chronomex	I kind of wonder what the lady thinks of that
08:01 ^🔗	chronomex	but then she knows you're kind of an archivist
08:02 ^🔗	SketchCow	She's got a place in the city
08:02 ^🔗	SketchCow	The lady and I don't live together
08:02 ^🔗	chronomex	aye
08:02 ^🔗	SketchCow	I live in the fucksticks and come down often.
08:02 ^🔗	SketchCow	I think she's been up here.... maybe twice, three times
08:02 ^🔗	chronomex	huh, okay, blows that theory out of the water
08:02 ^🔗	SketchCow	The landscape is pretty but the house itself's kinda awful
08:24 ^🔗	alard	SketchCow: have you had a look at the JSTOR thing?
08:27 ^🔗	SketchCow	Not yet.
08:28 ^🔗	SketchCow	Should I?
08:29 ^🔗	alard	Well, why not? It's supposed to help with your no-scripts idea. :)
08:29 ^🔗	alard	It doesn't download metadata, but it does download and upload pdfs.
08:29 ^🔗	SketchCow	Give me the link again.
08:29 ^🔗	alard	http://severe-samurai-6114.heroku.com/
08:30 ^🔗	chronomex	you should call it J-U-Stor-It
08:30 ^🔗	ersi	Y-U-stor-it
08:31 ^🔗	alard	Heh.
08:31 ^🔗	SketchCow	Y U SO ACADEMIC
08:32 ^🔗	SketchCow	Come on dude.
08:33 ^🔗	SketchCow	Is that not the most beautiful thing ever.
08:33 ^🔗	SketchCow	Isn't that so much better than Archive Team scans and downloads
08:33 ^🔗	db48x	I like JUStorIt better
08:34 ^🔗	SketchCow	Where do these PDFs end up, by the way.
08:34 ^🔗	SketchCow	And yes, we really do need the metadata.
08:34 ^🔗	db48x	yea, with a collection that large there's no point in bothering if you don't have the metadata
08:34 ^🔗	SketchCow	But this is obviously 90% of what I was requesting.
08:34 ^🔗	db48x	true
08:35 ^🔗	db48x	should turn it into a restartless Firefox addon
08:35 ^🔗	db48x	that makes it automatic
08:35 ^🔗	SketchCow	We don't want automatic.
08:35 ^🔗	db48x	hmm
08:35 ^🔗	SketchCow	We want thousands of people to get this, run it, and be liberating JSTOR at a slower, don't go to jail pace
08:36 ^🔗	SketchCow	And JSTOR running around, watching everything go in every direction.
08:36 ^🔗	SketchCow	Embarrassed as hell.
08:36 ^🔗	db48x	ah, I see
08:36 ^🔗	SketchCow	Style, it's about style.
08:36 ^🔗	chronomex	we should explain that we COULD do it the obvious rapey way but look see that's entirely unnecessary
08:37 ^🔗	SketchCow	I'd not.
08:37 ^🔗	chronomex	try to dissuade people from "helping" us in exactly the way we're avoiding
08:37 ^🔗	SketchCow	But I do agree on dissuading.
08:37 ^🔗	SketchCow	I could see writing something like "It turns out if you download too much you go to jail"
08:37 ^🔗	SketchCow	I'll compose something, how about that.
08:37 ^🔗	SketchCow	Since we'll have a wiki page for that.
08:37 ^🔗	SketchCow	but I'd like to see the metadata thing working.
08:37 ^🔗	chronomex	sure
08:38 ^🔗	alard	The point about the metadata is this:
08:38 ^🔗	alard	1. at the moment, the thing needs to be fed a list of article IDs
08:38 ^🔗	alard	2. if you have collected the article IDs, you also have the corresponding metadata.
08:38 ^🔗	SketchCow	...
08:39 ^🔗	SketchCow	And if you're browsing, and not downloading, you're not doing the click through license!
08:39 ^🔗	SketchCow	Where are these uploading, by the way.
08:40 ^🔗	alard	They're uploading back to the application. File name of the pdf + the data itself. (And in this example setup, nothing is saved.)
08:43 ^🔗	alard	Server-side it's pretty simple: there's something that provides the next id and something that receives the POSTed data.
08:43 ^🔗	SketchCow	Are you saying severe-samurai-6114.heroku.com is getting the data?
08:44 ^🔗	alard	Yes.
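[Editor's note: the server side alard describes (one piece that hands out the next article ID, one that receives the POSTed file name and data) could be sketched roughly as below. This is a minimal in-memory illustration, not alard's actual code: the class name `ArticleQueue`, its methods, and the sample IDs are all hypothetical, and a real deployment would sit behind HTTP handlers and persist what it receives.]

```javascript
// Hypothetical sketch: hands out each article ID once and accepts uploads
// only for IDs that were actually handed out.
class ArticleQueue {
  constructor(ids) {
    this.pending = [...ids];     // IDs nobody has been given yet
    this.assigned = new Set();   // IDs handed out, upload not yet received
    this.received = new Map();   // id -> { filename, base64Data }
  }

  // "Something that provides the next id": each caller gets a different ID,
  // so two volunteers never fetch the same article.
  nextId() {
    const id = this.pending.shift();
    if (id === undefined) return null; // nothing left to hand out
    this.assigned.add(id);
    return id;
  }

  // "Something that receives the POSTed data": file name plus the data itself.
  receive(id, filename, base64Data) {
    if (!this.assigned.has(id)) return false; // unknown or already uploaded
    this.assigned.delete(id);
    this.received.set(id, { filename, base64Data });
    return true;
  }
}

// Usage with made-up IDs:
const queue = new ArticleQueue(["1000001", "1000002"]);
const first = queue.nextId();
queue.receive(first, first + ".pdf", "JVBERg==");
```

[In an HTTP setup, `nextId` would back a GET endpoint and `receive` a POST endpoint; both of those endpoints are assumptions here, not something the log specifies.]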
08:45 ^🔗	db48x	alard: why do you need the Base64 class? why not use the built-in functions btoa and atob?
08:45 ^🔗	alard	Ignorance?
08:47 ^🔗	db48x	oh, you're using typed arrays too
08:47 ^🔗	Soojin	you should put that info on the side of the page so all my retard friends don't need to ask me what it does:P
08:47 ^🔗	Soojin	that way you'll get more "hosts" ;)
08:47 ^🔗	SketchCow	Which info.
08:47 ^🔗	Soojin	the mission info :)
08:48 ^🔗	SketchCow	P.S. This guy and I are now working together: http://vimeo.com/29184137
08:48 ^🔗	SketchCow	Thanks, Soojin.
08:48 ^🔗	SketchCow	I mean, wait...
08:48 ^🔗	SketchCow	...duh
08:48 ^🔗	SketchCow	I'd rather alard and db48x work to make the code work as best it can, I'll make sure the rest is smooth.
08:48 ^🔗	SketchCow	But I want that side of things nice and tight, I can get a couple hosts going, etc.
08:48 ^🔗	alard	db48x: Yes, it took some trickery to get the binary pdfs to download and upload and arrive in one piece.
08:49 ^🔗	alard	Good.
08:50 ^🔗	db48x	yea
08:51 ^🔗	db48x	I'm working on a parser in javascript that uses them at the moment
08:51 ^🔗	alard	For the metadata, I think the 'Summary' box contains everything?
08:51 ^🔗	SketchCow	Possibly.
08:52 ^🔗	alard	So maybe it's an idea to grab that and submit it with the data, then figure out how to parse it later?
08:52 ^🔗	db48x	hrm
08:53 ^🔗	db48x	Abstract(back to top)
08:53 ^🔗	db48x	An abstract for this item is not available.
08:54 ^🔗	alard	But the bibliographic information is there.
08:55 ^🔗	db48x	ugh, the source for these pages is annoying
08:57 ^🔗	SketchCow	I agree, the bibliographic info is there.
08:57 ^🔗	SketchCow	Quickly checking to see if there's any other way to get those.
08:58 ^🔗	SketchCow	http://www.jstor.org/action/downloadCitation?format=bibtex&include=abs
08:58 ^🔗	SketchCow	Sorry, session in there.
08:58 ^🔗	SketchCow	@article{1909,
08:58 ^🔗	SketchCow	@comment{{NUMBER OF CITATIONS : 1}}
08:58 ^🔗	SketchCow	author = {Alphonsus, Brother},
08:58 ^🔗	SketchCow	jstor_articletype = {research-article},
08:58 ^🔗	SketchCow	title = {Birds Found in St. Joseph CO., Ind., Each Day in June, 1990},
08:58 ^🔗	SketchCow	journal = {Midland Naturalist},
08:58 ^🔗	SketchCow	jstor_issuetitle = {},
08:58 ^🔗	SketchCow	volume = {1},
08:58 ^🔗	SketchCow	number = {4},
08:58 ^🔗	SketchCow	jstor_formatteddate = {Oct., 1909},
08:58 ^🔗	SketchCow	pages = {pp. 97-99},
08:58 ^🔗	SketchCow	url = {http://www.jstor.org/stable/2993227},
08:58 ^🔗	SketchCow	ISSN = {02716844},
08:59 ^🔗	SketchCow	abstract = {},
08:59 ^🔗	SketchCow	language = {English},
08:59 ^🔗	SketchCow	year = {1909},
08:59 ^🔗	SketchCow	publisher = {The University of Notre Dame},
08:59 ^🔗	SketchCow	copyright = {Copyright © 1909 The University of Notre Dame},
08:59 ^🔗	SketchCow	}
08:59 ^🔗	SketchCow	I think that's probably superior, you'll agree.
09:00 ^🔗	chronomex	looks wonderfully structured
09:00 ^🔗	db48x	indeed
09:00 ^🔗	db48x	bibtex is the way to go
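[Editor's note: to illustrate why the BibTeX export is easier to work with than scraped HTML, here is a rough sketch of pulling the `key = {value}` fields out of an entry like the one pasted above. The function name `parseBibtexFields` is hypothetical, and the line-by-line regular expression is a simplification that assumes one field per line, as in this export; it is not a full BibTeX parser.]

```javascript
// Hypothetical helper: extracts `key = {value},` lines into a plain object.
// Lines without a `key = {...}` shape (the @article header, @comment, the
// closing brace) are skipped.
function parseBibtexFields(entry) {
  const fields = {};
  for (const line of entry.split("\n")) {
    const match = line.match(/^\s*([A-Za-z_]+)\s*=\s*\{(.*)\},?\s*$/);
    if (match) {
      fields[match[1]] = match[2];
    }
  }
  return fields;
}

// Usage with a few lines taken from the entry in the log:
const sampleEntry = [
  "@article{1909,",
  "author = {Alphonsus, Brother},",
  "journal = {Midland Naturalist},",
  "pages = {pp. 97-99},",
  "abstract = {},",
  "}",
].join("\n");

const sampleFields = parseBibtexFields(sampleEntry);
console.log(sampleFields.journal); // "Midland Naturalist"
```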
09:00 ^🔗	alard	But is it complete? (Sometimes they leave things out.)
09:00 ^🔗	chronomex	it is, after all, bibtex
09:00 ^🔗	chronomex	why not get both
09:01 ^🔗	chronomex	i feel like jstor may be a false flag distracting us from real fires
09:02 ^🔗	alard	bibtex is somewhat expensive for JSTOR, since it has to be generated with a separate request.
09:02 ^🔗	SketchCow	Don't care about that.
09:02 ^🔗	chronomex	however, i 1) am not equipped to do this argument now and 2) have not seen other fires
09:02 ^🔗	SketchCow	chronomex: I wouldn't have set alard on this if I thought it was time consuming.
09:03 ^🔗	SketchCow	And it's not, this is less than 24 hours of effort.
09:03 ^🔗	chronomex	aye
09:03 ^🔗	chronomex	anyway, bedtime
09:03 ^🔗	SketchCow	This little show plays well, when describing it.
09:03 ^🔗	SketchCow	Turning thousands of people who are pissed about JSTOR into mules
09:04 ^🔗	SketchCow	And we can constantly mention how this has to be done so people aren't sent to jail for 30 years.
09:04 ^🔗	db48x	:)
09:04 ^🔗	SketchCow	For my own bit, Friendster material goes up soon, and in doing that, it'll make life easier for the poor server
09:04 ^🔗	SketchCow	Which is now crazy clogged with data
09:04 ^🔗	alard	Where did you get the bibtex? Is that via Export Citation?
09:04 ^🔗	db48x	alard: yes
09:04 ^🔗	SketchCow	Yes
09:05 ^🔗	alard	db48x: btoa doesn't give the same results as Base64.encode
09:05 ^🔗	db48x	alard: no, not for a typed array :)
09:05 ^🔗	alard	Okay, I'll stop trying then.
09:06 ^🔗	db48x	(it actually stringifies the array first, so it's really doing btoa("[object ArrayBuffer]") or whatever)
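[Editor's note: the point db48x makes can be shown in a few lines. `btoa` expects a string, so handing it a buffer encodes the buffer's string form rather than its bytes. A common workaround, sketched here as one option and not as what alard's Base64 class does, is to turn the bytes into a one-character-per-byte string first. The helper names `bytesToBase64` and `base64ToBytes` are hypothetical.]

```javascript
// Encode raw bytes (a Uint8Array) with the built-in btoa by first building a
// "binary string" in which each character code is one byte value.
function bytesToBase64(bytes) {
  let binary = "";
  const chunkSize = 0x8000; // convert in chunks to keep apply() argument lists small
  for (let i = 0; i < bytes.length; i += chunkSize) {
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + chunkSize));
  }
  return btoa(binary);
}

// Reverse direction: atob returns a binary string, copied back into bytes.
function base64ToBytes(base64) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// The first four bytes of a PDF file: "%PDF".
const pdfHeader = new Uint8Array([0x25, 0x50, 0x44, 0x46]);
console.log(bytesToBase64(pdfHeader)); // "JVBERg=="

// What goes wrong without the conversion: the buffer is stringified first,
// so this encodes the text "[object ArrayBuffer]", not the PDF bytes.
console.log(btoa(String(pdfHeader.buffer)) === btoa("[object ArrayBuffer]")); // true
```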
09:08 ^🔗	db48x	I'm going to file a bug
09:17 ^🔗	db48x	bug 687418
09:18 ^🔗	db48x	(https://bugzilla.mozilla.org/show_bug.cgi?id=687418)
09:20 ^🔗	alard	Cool.
09:25 ^🔗	SketchCow	So, have the wget guys taken the warc stuff yet?
09:27 ^🔗	alard	The wget guy has said that he would look at the code, but that's a few weeks ago.
09:27 ^🔗	SketchCow	Ask him if there's anything you can answer or help with.
09:27 ^🔗	SketchCow	Just a way to say hi.
09:27 ^🔗	SketchCow	Without demanding or complaining.
09:27 ^🔗	alard	Yeah.
09:28 ^🔗	alard	I've just sent the copyright assignment stuff back to them, maybe that's also a good reason to email.
09:28 ^🔗	alard	It's more for gnulib than for wget, but it's a reason.
09:32 ^🔗	alard	Okay, the JSTOR thing should now download the bibtex and include the contents from the abstract/bibliographic sections.
09:35 ^🔗	alard	What next?
09:35 ^🔗	SketchCow	I'd like a limited test.
09:35 ^🔗	SketchCow	Throw it to a few random folks, see what comes out the other end.
09:35 ^🔗	SketchCow	See how it pulls, etc.
09:38 ^🔗	alard	Then the question becomes: where to host this thing?
09:39 ^🔗	SketchCow	I'll figure that out.
10:43 ^🔗	godane	i'm starting to archive gbtv
10:43 ^🔗	godane	and the screen savers
10:43 ^🔗	godane	looks like there is like 12 months of glenn beck on archive.org
10:56 ^🔗	SketchCow	Yes
11:00 ^🔗	godane	the white balance in a lot of youtube screen savers episodes is off
11:00 ^🔗	godane	like its too bright
18:40 ^🔗	Coderjoe	I was at a They Might Be Giants concert last night... they were mentioning the various social media they were on between songs, and mentioned friendster several times
19:40 ^🔗	ersi	Coderjoe: did you LOL?
19:40 ^🔗	Coderjoe	yes
19:40 ^🔗	Coderjoe	and a coworker gave me a glance and LOLd as well
19:45 ^🔗	ersi	Hah
20:09 ^🔗	alard	Shouldn't we do something with the Delicious archiving? Has been asked before, I know, and there are the scripts by db48x and user lists by SketchCow, but is anyone actually running those?
21:01 ^🔗	db48x	alard: the scripts were mostly written by underscore :)
21:01 ^🔗	db48x	I'm not sure how complete they are
21:01 ^🔗	alard	Ah, I see :)
21:02 ^🔗	alard	They seem pretty complete.
21:02 ^🔗	db48x	that's good
21:02 ^🔗	alard	At least the cannibal.sh does seem to download most of the interesting bits.
21:02 ^🔗	db48x	I want to sit down and review them again, compare what they download with the site
21:03 ^🔗	alard	I had to replace grep -oP with pcregrep -o, though. Sometimes my grep -oP only said 'Aborted' and didn't grep any bookmarks.
21:03 ^🔗	db48x	hmm
21:05 ^🔗	db48x	won't be today though
21:14 ^🔗	Coderjoe	too bad there is non-public stuff that we can't get at
21:14 ^🔗	Coderjoe	does this script also handle additional bits, like delicious library?
21:15 ^🔗	alard	What's that?
21:59 ^🔗	Coderjoe	alard: I haven't used it, but a friend mentioned he uses it to keep track of things like his DVD collection
22:00 ^🔗	alard	But isn't that a separate program?
22:00 ^🔗	alard	A shiny mac app?
22:05 ^🔗	Coderjoe	I don't really know
