Time |
Nickname |
Message |
01:03
🔗
|
undersco2 |
alard: Mind if I look at your scripts? |
01:03
🔗
|
undersco2 |
:) |
02:51
🔗
|
underscor |
alard: actually, poke me here instead |
02:51
🔗
|
underscor |
since I keep this client open |
02:51
🔗
|
underscor |
:) |
04:04
🔗
|
SketchCow |
I agree we should grab as many videos as possible. |
04:27
🔗
|
SketchCow |
In another window, I'm digitizing a BetacamSP tape from G4 TV regarding GDC's nominees and winners, as well as a bunch of G4 spots and icons. |
04:27
🔗
|
SketchCow |
This is as high quality as you can possibly get, so I've got some pretty amazing items going by. |
04:29
🔗
|
SketchCow |
-- |
04:29
🔗
|
SketchCow |
OK, again, let's go with the high level of this JSTOR thing. |
04:30
🔗
|
SketchCow |
I'd like to make it that you browse JSTOR, you see the JSTOR, but when you click and read, it downloads it and then uploads to us. |
04:30
🔗
|
SketchCow |
With all the metadata. |
04:30
🔗
|
SketchCow |
So a person is "reading" and they're also saving for us. |
04:30
🔗
|
SketchCow |
And having it so you're asking US what to read, so we don't get overlaps. |
04:30
🔗
|
SketchCow |
We'll demolish the lists in no time WITHOUT using Scripts or DDOS. |
04:30
🔗
|
SketchCow |
We COULD use scripts, that's not the point, this is more hilarious. |
04:31
🔗
|
SketchCow |
So go for hilarious, we'll get attention and make people snicker. |
06:14
🔗
|
SketchCow |
And I think people can get through the back catalog easily. |
06:39
🔗
|
ersi_ |
Sounds great |
06:46
🔗
|
chronomex |
didn't themes.freshmeat.net used to exist? |
06:46
🔗
|
chronomex |
or am I thinking of something else? |
06:46
🔗
|
SketchCow |
Yes, something like that. |
06:46
🔗
|
chronomex |
huh. |
07:59
🔗
|
SketchCow |
HEY SO IT TURNS OUT |
08:00
🔗
|
SketchCow |
If you demand and get a dedicated laptop, you can have this laptop and a Betamax player make out for days on end. |
08:00
🔗
|
SketchCow |
And they'll be able to ingest so much video, it's fuckin' sick. |
08:00
🔗
|
chronomex |
get a room |
08:01
🔗
|
SketchCow |
They did. My room. |
08:01
🔗
|
SketchCow |
Oh look, it's Peter Molyneux |
08:01
🔗
|
chronomex |
I kind of wonder what the lady thinks of that |
08:01
🔗
|
chronomex |
but then she knows you're kind of an archivist |
08:02
🔗
|
SketchCow |
She's got a place in the city |
08:02
🔗
|
SketchCow |
The lady and I don't live together |
08:02
🔗
|
chronomex |
aye |
08:02
🔗
|
SketchCow |
I live in the fucksticks and come down often. |
08:02
🔗
|
SketchCow |
I think she's been up here.... maybe twice, three times |
08:02
🔗
|
chronomex |
huh, okay, blows that theory out of the water |
08:02
🔗
|
SketchCow |
The landscape is pretty but the house itself's kinda awful |
08:24
🔗
|
alard |
SketchCow: have you had a look at the JSTOR thing? |
08:27
🔗
|
SketchCow |
Not yet. |
08:28
🔗
|
SketchCow |
Should I? |
08:29
🔗
|
alard |
Well, why not? It's supposed to help with your no-scripts idea. :) |
08:29
🔗
|
alard |
It doesn't download metadata, but it does download and upload pdfs. |
08:29
🔗
|
SketchCow |
Give me the link again. |
08:29
🔗
|
alard |
http://severe-samurai-6114.heroku.com/ |
08:30
🔗
|
chronomex |
you should call it J-U-Stor-It |
08:30
🔗
|
ersi |
Y-U-stor-it |
08:31
🔗
|
alard |
Heh. |
08:31
🔗
|
SketchCow |
Y U SO ACADEMIC |
08:32
🔗
|
SketchCow |
Come on dude. |
08:33
🔗
|
SketchCow |
Is that not the most beautiful thing ever. |
08:33
🔗
|
SketchCow |
Isn't that so much better than Archive Team scans and downloads |
08:33
🔗
|
db48x |
I like JUStorIt better |
08:34
🔗
|
SketchCow |
Where do these PDFs end up, by the way. |
08:34
🔗
|
SketchCow |
And yes, we really do need the metadata. |
08:34
🔗
|
db48x |
yea, with a collection that large there's no point in bothering if you don't have the metadata |
08:34
🔗
|
SketchCow |
But this is obviously 90% of what I was requesting. |
08:34
🔗
|
db48x |
true |
08:35
🔗
|
db48x |
should turn it into a restartless Firefox addon |
08:35
🔗
|
db48x |
that makes it automatic |
08:35
🔗
|
SketchCow |
We don't want automatic. |
08:35
🔗
|
db48x |
hmm |
08:35
🔗
|
SketchCow |
We want thousands of people to get this, run it, and be liberating JSTOR at a slower, don't go to jail pace |
08:36
🔗
|
SketchCow |
And JSTOR running around, watching everything go in every direction. |
08:36
🔗
|
SketchCow |
Embarrassed as hell. |
08:36
🔗
|
db48x |
ah, I see |
08:36
🔗
|
SketchCow |
Style, it's about style. |
08:36
🔗
|
chronomex |
we should explain that we *COULD* do it the obvious rapey way but look see that's entirely unnecessary |
08:37
🔗
|
SketchCow |
I'd not. |
08:37
🔗
|
chronomex |
try to dissuade people from "helping" us in exactly the way we're avoiding |
08:37
🔗
|
SketchCow |
But I do agree on dissuading. |
08:37
🔗
|
SketchCow |
I could see writing something like "It turns out if you download too much you go to jail" |
08:37
🔗
|
SketchCow |
I'll compose something, how about that. |
08:37
🔗
|
SketchCow |
Since we'll have a wiki page for that. |
08:37
🔗
|
SketchCow |
but I'd like to see the metadata thing working. |
08:37
🔗
|
chronomex |
sure |
08:38
🔗
|
alard |
The point about the metadata is this: |
08:38
🔗
|
alard |
1. at the moment, the thing needs to be fed a list of article IDs |
08:38
🔗
|
alard |
2. if you have collected the article IDs, you also have the corresponding metadata. |
08:38
🔗
|
SketchCow |
... |
08:39
🔗
|
SketchCow |
And if you're browsing, and not downloading, you're not doing the click through license! |
08:39
🔗
|
SketchCow |
Where are these uploading, by the way. |
08:40
🔗
|
alard |
They're uploading back to the application. File name of the pdf + the data itself. (And in this example setup, nothing is saved.) |
08:43
🔗
|
alard |
Server-side it's pretty simple: there's something that provides the next id and something that receives the POSTed data. |
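A minimal sketch of the two server-side pieces alard describes: one that hands out the next article ID and one that accepts the uploaded file. Everything here is an assumption for illustration (the `createCoordinator` name, the in-memory storage, the base64 payload); the actual application's code is not shown in this log.

```javascript
// Hypothetical in-memory coordinator, not the real application.
// It hands out each article ID once so volunteers do not overlap,
// and records whatever a volunteer posts back.
function createCoordinator(articleIds) {
  const pending = [...articleIds]; // IDs nobody has been given yet
  const received = new Map();      // id -> { fileName, base64Data }

  return {
    // "Something that provides the next id": null when the list is empty.
    nextId() {
      return pending.length > 0 ? pending.shift() : null;
    },
    // "Something that receives the POSTed data": file name plus contents.
    receive(id, fileName, base64Data) {
      received.set(id, { fileName, base64Data });
    },
    receivedCount() {
      return received.size;
    },
  };
}
```

In a real deployment these two functions would sit behind HTTP endpoints, and the storage would need to outlive the process; the example setup in the chat saves nothing at all.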
08:43
🔗
|
SketchCow |
Are you saying severe-samurai-6114.heroku.com is getting the data? |
08:44
🔗
|
alard |
Yes. |
08:45
🔗
|
db48x |
alard: why do you need the Base64 class? why not use the built-in functions btoa and atob? |
08:45
🔗
|
alard |
Ignorance? |
08:47
🔗
|
db48x |
oh, you're using typed arrays too |
08:47
🔗
|
Soojin |
you should put that info on the side of the page so all my retard friends don't need to ask me what it does:P |
08:47
🔗
|
Soojin |
that way you'll get more "hosts" ;) |
08:47
🔗
|
SketchCow |
Which info. |
08:47
🔗
|
Soojin |
the mission info :) |
08:48
🔗
|
SketchCow |
P.S. This guy and I are now working together: http://vimeo.com/29184137 |
08:48
🔗
|
SketchCow |
Thanks, Soojin. |
08:48
🔗
|
SketchCow |
I mean, wait... |
08:48
🔗
|
SketchCow |
...duh |
08:48
🔗
|
SketchCow |
I'd rather alard and db48x work to make the code work as best it can, I'll make sure the rest is smooth. |
08:48
🔗
|
SketchCow |
But I want that side of things nice and tight, I can get a couple hosts going, etc. |
08:48
🔗
|
alard |
db48x: Yes, it took some trickery to get the binary pdfs to download and upload and arrive in one piece. |
08:49
🔗
|
alard |
Good. |
08:50
🔗
|
db48x |
yea |
08:51
🔗
|
db48x |
I'm working on a parser in javascript that uses them at the moment |
08:51
🔗
|
alard |
For the metadata, I think the 'Summary' box contains everything? |
08:51
🔗
|
SketchCow |
Possibly. |
08:52
🔗
|
alard |
So maybe it's an idea to grab that and submit it with the data, then figure out how to parse it later? |
08:52
🔗
|
db48x |
hrm |
08:53
🔗
|
db48x |
Abstract(back to top) |
08:53
🔗
|
db48x |
An abstract for this item is not available. |
08:54
🔗
|
alard |
But the bibliographic information is there. |
08:55
🔗
|
db48x |
ugh, the source for these pages is annoying |
08:57
🔗
|
SketchCow |
I agree, the bibliographic info is there. |
08:57
🔗
|
SketchCow |
Quickly checking to see if there's any other way to get those. |
08:58
🔗
|
SketchCow |
http://www.jstor.org/action/downloadCitation?format=bibtex&include=abs |
08:58
🔗
|
SketchCow |
Sorry, session in there. |
08:58
🔗
|
SketchCow |
@article{1909, |
08:58
🔗
|
SketchCow |
@comment{{NUMBER OF CITATIONS : 1}} |
08:58
🔗
|
SketchCow |
author = {Alphonsus, Brother}, |
08:58
🔗
|
SketchCow |
jstor_articletype = {research-article}, |
08:58
🔗
|
SketchCow |
title = {Birds Found in St. Joseph CO., Ind., Each Day in June, 1990}, |
08:58
🔗
|
SketchCow |
journal = {Midland Naturalist}, |
08:58
🔗
|
SketchCow |
jstor_issuetitle = {}, |
08:58
🔗
|
SketchCow |
volume = {1}, |
08:58
🔗
|
SketchCow |
number = {4}, |
08:58
🔗
|
SketchCow |
jstor_formatteddate = {Oct., 1909}, |
08:58
🔗
|
SketchCow |
pages = {pp. 97-99}, |
08:58
🔗
|
SketchCow |
url = {http://www.jstor.org/stable/2993227}, |
08:58
🔗
|
SketchCow |
ISSN = {02716844}, |
08:59
🔗
|
SketchCow |
abstract = {}, |
08:59
🔗
|
SketchCow |
language = {English}, |
08:59
🔗
|
SketchCow |
year = {1909}, |
08:59
🔗
|
SketchCow |
publisher = {The University of Notre Dame}, |
08:59
🔗
|
SketchCow |
copyright = {Copyright © 1909 The University of Notre Dame}, |
08:59
🔗
|
SketchCow |
} |
08:59
🔗
|
SketchCow |
I think that's probably superior, you'll agree. |
09:00
🔗
|
chronomex |
looks wonderfully structured |
09:00
🔗
|
db48x |
indeed |
09:00
🔗
|
db48x |
bibtex is the way to go |
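Part of why the BibTeX export looks "wonderfully structured" is that flat `key = {value},` lines like the ones pasted above can be pulled apart with very little code. The sketch below is an assumption-laden illustration, not the tool's code: `parseBibtexFields` is a hypothetical helper, and it only handles single-line fields without nested braces.

```javascript
// Hypothetical helper: reads flat, single-line BibTeX fields of the form
//   key = {value},
// It ignores the @article/@comment lines and the closing brace.
function parseBibtexFields(record) {
  const fields = {};
  for (const line of record.split("\n")) {
    const match = line.match(/^\s*([A-Za-z_]+)\s*=\s*\{(.*)\},?\s*$/);
    if (match) {
      fields[match[1]] = match[2];
    }
  }
  return fields;
}

// A shortened copy of the record pasted in the channel.
const sampleRecord = [
  "@article{1909,",
  "  journal = {Midland Naturalist},",
  "  volume = {1},",
  "  number = {4},",
  "  abstract = {},",
  "  year = {1909},",
  "}",
].join("\n");

const sampleFields = parseBibtexFields(sampleRecord);
console.log(sampleFields.journal, sampleFields.year);
```

An empty field such as `abstract = {}` comes back as an empty string, which makes it easy to spot the records where JSTOR left something out.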
09:00
🔗
|
alard |
But is it complete? (Sometimes they leave things out.) |
09:00
🔗
|
chronomex |
it is, after all, bibtex |
09:00
🔗
|
chronomex |
why not get both |
09:01
🔗
|
chronomex |
i feel like jstor may be a false flag distracting us from real fires |
09:02
🔗
|
alard |
bibtex is somewhat expensive for JSTOR, since it has to be generated with a separate request. |
09:02
🔗
|
SketchCow |
Don't care about that. |
09:02
🔗
|
chronomex |
however, i 1) am not equipped to do this argument now and 2) have not seen other fires |
09:02
🔗
|
SketchCow |
chronomex: I wouldn't have set alard on this if I thought it was time consuming. |
09:03
🔗
|
SketchCow |
And it's not, this is less than 24 hours of effort. |
09:03
🔗
|
chronomex |
aye |
09:03
🔗
|
chronomex |
anyway, bedtime |
09:03
🔗
|
SketchCow |
This little show plays well, when describing it. |
09:03
🔗
|
SketchCow |
Turning thousands of people who are pissed about JSTOR into mules |
09:04
🔗
|
SketchCow |
And we can constantly mention how this has to be done so people aren't sent to jail for 30 years. |
09:04
🔗
|
db48x |
:) |
09:04
🔗
|
SketchCow |
For my own bit, Friendster material goes up soon, and in doing that, it'll make life easier for the poor server |
09:04
🔗
|
SketchCow |
Which is now crazy clogged with data |
09:04
🔗
|
alard |
Where did you get the bibtex? Is that via Export Citation? |
09:04
🔗
|
db48x |
alard: yes |
09:04
🔗
|
SketchCow |
Yes |
09:05
🔗
|
alard |
db48x: btoa doesn't give the same results as Base64.encode |
09:05
🔗
|
db48x |
alard: no, not for a typed array :) |
09:05
🔗
|
alard |
Okay, I'll stop trying then. |
09:06
🔗
|
db48x |
(it actually stringifies the array first, so it's really doing btoa("[object ArrayBuffer]") or whatever) |
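The behaviour db48x describes can be reproduced in a few lines. This is a sketch, not the code from alard's tool: `bytesToBase64` is a hypothetical helper, and the four bytes are simply the `%PDF` signature that PDF files begin with.

```javascript
// Hypothetical helper: build a binary string one byte at a time,
// then hand that string to btoa.
function bytesToBase64(bytes) {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

const pdfSignature = new Uint8Array([0x25, 0x50, 0x44, 0x46]); // "%PDF"

// btoa converts its argument to a string first, so passing the buffer
// encodes the buffer's string form rather than its bytes.
const encodedWrong = btoa(pdfSignature.buffer);

// Encoding the bytes themselves gives the expected base64.
const encodedRight = bytesToBase64(pdfSignature);

console.log(atob(encodedWrong)); // the stringified buffer, not the PDF bytes
console.log(encodedRight);       // JVBERg==
```

Building the string byte by byte is slow for large files but keeps the example simple; a custom Base64 class like the one alard used avoids the intermediate string.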
09:08
🔗
|
db48x |
I'm going to file a bug |
09:17
🔗
|
db48x |
bug 687418 |
09:18
🔗
|
db48x |
(https://bugzilla.mozilla.org/show_bug.cgi?id=687418) |
09:20
🔗
|
alard |
Cool. |
09:25
🔗
|
SketchCow |
So, have the wget guys taken the warc stuff yet? |
09:27
🔗
|
alard |
The wget guy has said that he would look at the code, but that's a few weeks ago. |
09:27
🔗
|
SketchCow |
Ask him if there's anything you can answer or help with. |
09:27
🔗
|
SketchCow |
Just a way to say hi. |
09:27
🔗
|
SketchCow |
Without demanding or complaining. |
09:27
🔗
|
alard |
Yeah. |
09:28
🔗
|
alard |
I've just sent the copyright assignment stuff back to them, maybe that's also a good reason to email. |
09:28
🔗
|
alard |
It's more for gnulib than for wget, but it's a reason. |
09:32
🔗
|
alard |
Okay, the JSTOR thing should now download the bibtex and include the contents from the abstract/bibliographic sections. |
09:35
🔗
|
alard |
What next? |
09:35
🔗
|
SketchCow |
I'd like a limited test. |
09:35
🔗
|
SketchCow |
Throw it to a few random folks, see what comes out the other end. |
09:35
🔗
|
SketchCow |
See how it pulls, etc. |
09:38
🔗
|
alard |
Then the question becomes: where to host this thing? |
09:39
🔗
|
SketchCow |
I'll figure that out. |
10:43
🔗
|
godane |
i'm starting to archive gbtv |
10:43
🔗
|
godane |
and the screen savers |
10:43
🔗
|
godane |
looks like there is like 12 months of glenn beck on archive.org |
10:56
🔗
|
SketchCow |
Yes |
11:00
🔗
|
godane |
the white balance in a lot of youtube screen savers episodes is off |
11:00
🔗
|
godane |
like its too bright |
18:40
🔗
|
Coderjoe |
I was at a They Might Be Giants concert last night... they were mentioning the various social media they were on between songs, and mentioned friendster several times |
19:40
🔗
|
ersi |
Coderjoe: did you LOL? |
19:40
🔗
|
Coderjoe |
yes |
19:40
🔗
|
Coderjoe |
and a coworker gave me a glance and LOLd as well |
19:45
🔗
|
ersi |
Hah |
20:09
🔗
|
alard |
Shouldn't we do something with the Delicious archiving? Has been asked before, I know, and there are the scripts by db48x and user lists by SketchCow, but is anyone actually running those? |
21:01
🔗
|
db48x |
alard: the scripts were mostly written by underscore :) |
21:01
🔗
|
db48x |
I'm not sure how complete they are |
21:01
🔗
|
alard |
Ah, I see :) |
21:02
🔗
|
alard |
They seem pretty complete. |
21:02
🔗
|
db48x |
that's good |
21:02
🔗
|
alard |
At least the cannibal.sh does seem to download most of the interesting bits. |
21:02
🔗
|
db48x |
I want to sit down and review them again, compare what they download with the site |
21:03
🔗
|
alard |
I had to replace grep -oP with pcregrep -o, though. Sometimes my grep -oP only said 'Aborted' and didn't grep any bookmarks. |
21:03
🔗
|
db48x |
hmm |
21:05
🔗
|
db48x |
won't be today though |
21:14
🔗
|
Coderjoe |
too bad there is non-public stuff that we can't get at |
21:14
🔗
|
Coderjoe |
does this script also handle additional bits, like delicious library? |
21:15
🔗
|
alard |
What's that? |
21:59
🔗
|
Coderjoe |
alard: I haven't used it, but a friend mentioned he uses it to keep track of things like his DVD collection |
22:00
🔗
|
alard |
But isn't that a separate program? |
22:00
🔗
|
alard |
A shiny mac app? |
22:05
🔗
|
Coderjoe |
I don't really know |