[01:00] I wish the amount of data being seeded into JSTOR was being seen at the bottom of the pool
[01:07] What do you mean?
[01:11] The torrent is slow as hell, but I know there's a lot of bandwidth being pumped into it
[01:11] nearly every time I check I'm pumping out as much or more than I'm downloading
[01:12] Yeah, I'm downloading like 800k, and pushing 11MBps
[01:12] I don't have that kind of line, but I'm pushing 200k and pulling ~100k
[01:13] ahahahaha
[01:13] I just realized I'm seeding to someone else at the college
[01:13] That's great
[01:13] awesome
[01:13] Internal bandwidth is free :D
[01:13] External too, actually
[01:13] Gigabit to every room
[01:13] oh?
[01:13] It's super sexyt
[01:13] sexy*
[01:13] nice!
[01:14] Publicly routable IPs too
[01:14] It's absolutely lovely
[01:14] wow, that's nice
[01:14] I'll be at cal poly and they have one of the worst nets I've ever seen
[01:14] tbh, the campus internet is a huge factor in whether I consider a school
[01:14] Really?
[01:14] yes
[01:15] dorm internet is so fragile that if you plug a upnp device in, it takes out your whole floor's internet
[01:15] That sucks
[01:15] Really?
[01:15] hahahaha
[01:15] they dpi for filesharing
[01:15] and then ban people from the net for doing it
[01:15] Ugh, that's gross
[01:15] And I mean even vpn traffic
[01:15] if they see one good https connection always running etc, they'll come and search your computer
[01:16] and yes, you give up that right when you go into the dorms
[01:16] The IT guy was like, "You have 3 letter strikes. On the 3rd letter, I have to ban you.
But usually I'll just give it back in a week"
[01:16] fortunately, I don't have to live in the dorms
[01:16] (letter=DMCA letter)
[01:16] yeh
[01:17] See that's the thing, I don't get those
[01:17] And I know cal poly gets their bandwidth fre
[01:17] free
[01:17] they installed the fiber ring that's in town
[01:17] Yeah, that instantly disqualifies a school for me
[01:17] lol
[01:17] the problem is they've privatized everything on campus
[01:18] that's the thing, the IT people for the dorms are a company
[01:18] not campus IT
[01:18] oh
[01:18] and they have `basic computing requirements` for being on the net
[01:18] you have to have a whitelisted antivirus, as well as antimalware
[01:18] they use the cisco clean access agent to verify
[01:19] It won't give you net access without being vetted
[01:22] even macs have to install it and do all that
[01:22] but any linux box is specifically exempt
[01:22] Yeah
[01:22] It's great
[01:22] They have the same thing here, but I run linux, so
[02:20] 855
[02:20] cat google_verizon*|sort -u|wc -l
[02:20] Not too shabby so far
[05:20] that's gross
[05:25] Moooorrrrning
[06:02] Eighty children were shot dead in Norway yesterday.
[06:31] Yeah, what the fuck's up with that
[07:09] SketchCow: can you create an IA collection for wikiteam? then i can add items, or only admins can?
[07:37] why doesn't internet archive translate the interface?
[07:37] and faqs, and so on..
[07:37] that can help with visits
[07:37] translate into what?
[07:39] other languages
[07:40] :P
[07:40] like what, lisp?
[07:40] SketchCow: A right-wing extremist wanted to kill his future political opponents
[07:40] lol lisp, sure that may help with visits
[07:40] Oh, and he probably also hated them because they didn't hate immigrants.
[07:40] Sorry if this is off topic, I'm just furious
[07:41] use your fury to archive all local newspapers mainpages, open a collection on IA
[07:42] It's not going to go away, so what's the use?
[08:00] SketchCow, JSTOR PDF test upload: http://www.archive.org/details/philtrans11070387
[08:01] Are metadata fields correct?
[08:01] I invented some of them...
[08:01] namely identifier-doi, page-ending, page-starting
[08:02] and there are other fields like journal_issue according to advanced search, didn't know which one to choose
[08:03] http://www.archive.org/post/351922/creating-a-collection
[08:04] * Nemo_bis needs a "subscribe to thread" feature on IA forum
[08:16] arghhhhhhh http://www.us.archive.org/log_show.php?task_id=81041940
[08:16] rsync: getaddrinfo: ia600502.us.archive.org 873: Temporary failure in name resolution
[08:16] rsync error: error in socket IO (code 10) at clientserver.c(122) [Receiver=3.0.7]
[08:25] Open Library copyright warning: By saving a change to this wiki, you agree that your contribution is given freely to the world under CC0. Yippee!
[08:27] ohhhdears
[08:32] Man, there are gazillions of projects to help. This is infinite.
[09:37] the internet never stops
[13:58] Nemo_bis: Looking now.
[13:59] SketchCow, http://www.us.archive.org/log_show.php?task_id=81041940 ?
[14:00] Yeah, re-running.
[14:02] Strange little issue there.
[14:06] OK, first, I successfully ran it.
[14:07] Now, second. Could you paste the metadata list in here?
[14:08] SketchCow, yes
[14:08] Everything is ready now
[14:08] I've prepared the script to create the files to upload
[14:08] And metadata.csv tu upload it
[14:08] *to
[14:08] Yes, fine. Please paste the metadata from that entry in here.
[14:08] I want to see what was missed or included and how.
[14:08] only that entry?
[14:09] I think we could agree pasting all the entries would be a little tedious in here
[14:09] I just want to compare how it went.
[14:09] well, but it's among other files
[14:09] For example, the Publisher seems wrong. But show me what the metadata showed.
[14:09] For fuck's sake, sir.
[14:09] I'll go look myself, one moment.
[14:10] http://ia600502.us.archive.org/25/items/philtrans11070387/11070387.pdf_meta.txt
[14:10] Sorry, I deleted that file...
[14:10] I'm going to go look at the original because I have a copy
[14:10] Which someone blabbed about
[14:10] One moment
[14:11] http://p.defau.lt/?4vQqWdjzG_g9Xcz4GDtsSQ
[14:11] Sorry, didn't understand you
[14:11] and my browser crashed
[14:12] Dog ate your homework
[14:13] :-/
[14:13] http://p.defau.lt/?4vQqWdjzG_g9Xcz4GDtsSQ
[14:13] Wait
[14:13] x-archive-meta-publisher: Proceedings of the Royal Society of London. Series B, Containing Papers of a Biological Character (1905-1934)
[14:13] That one. I feel like that one's wrong. Let me double check.
[14:13] The rest are all perfect, good job there.
[14:13] Also, I see you're using s3, bravo
[14:14] you can script that and upload at a fantastic rate.
[14:14] Give me a moment to compare with a couple others.
[14:15] Yes, i didn't know how to treat that
[14:16] Have a look at http://www.archive.org/details/philtrans00097255 as well: I left some tags because they don't seem to harm and may turn out useful sooner or later
[14:17] TY - JOUR
[14:17] T1 - An Account of an Appulse of the Moon to the Planet Jupiter, Observed at Chelsea, by Mr. Samuel Dunn
[14:17] JF - Philosophical Transactions (1683-1775)
[14:17] VL - 53
[14:17] SP - 31
[14:17] EP - 31
[14:17] PY - 1763/01/01/
[14:17] UR - http://dx.doi.org/10.1098/rstl.1763.0010
[14:17] M3 - doi:10.1098/rstl.1763.0010
[14:17] AU - Dunn, S.
[14:19] OK, let's see.
[14:19] First of all, no, drop the latex
[14:19] Second, check the metadata, because maxwell said he included HTML by mistake.
[14:19] removed it
[14:19] Drop the latex.
[14:19] ok
[14:20] Include the original .txt file in the upload.
[14:20] ok
[14:20] that's how to not lose data
[14:20] So, creator, date, identifier-doi, great.
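(Editor's sketch, not part of the log.) The citation record pasted at 14:17 looks like an RIS-style export, one two-character tag per line. A minimal way to read such a record into fields, assuming the "TAG - value" layout exactly as it appears above, might look like the following. The function name parse_ris and the choice to keep every tag as a list (so repeated AU lines are all kept) are illustrative assumptions, not the script actually used in the channel.

```python
import re

# One RIS-style line: a two-character tag, a dash, then the value.
# Whitespace around the dash is tolerated because the log shows "TY - JOUR".
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])\s+-\s?(.*)$")


def parse_ris(text):
    """Return a dict mapping each tag to the list of values seen for it."""
    record = {}
    for line in text.splitlines():
        match = RIS_LINE.match(line.strip())
        if not match:
            continue  # skip blank or malformed lines rather than guessing
        tag, value = match.groups()
        # Keep every occurrence: a paper can carry several AU (author) lines.
        record.setdefault(tag, []).append(value.strip())
    return record


SAMPLE = """\
TY - JOUR
T1 - An Account of an Appulse of the Moon to the Planet Jupiter, Observed at Chelsea, by Mr. Samuel Dunn
JF - Philosophical Transactions (1683-1775)
VL - 53
SP - 31
EP - 31
PY - 1763/01/01/
UR - http://dx.doi.org/10.1098/rstl.1763.0010
M3 - doi:10.1098/rstl.1763.0010
AU - Dunn, S.
"""

record = parse_ris(SAMPLE)
print(record["JF"][0], record["VL"][0], record["SP"][0], record["EP"][0])
```

Storing lists instead of single strings is one way to follow the "that's how to not lose data" advice when a tag repeats.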
[14:22] x-archive-meta-journal--issue: 566
[14:22] that should be journal-issue
[14:22] hm, yes, weird
[14:23] make volume into journal-volume
[14:23] ah, I used journal_issue
[14:23] I found it in the advanced search
[14:23] If we're going to make up metadata tags, might as well be consistently informative.
[14:23] Yes. I don't know what the standard tags are.
[14:24] Instead of publisher, make it "journal-title"
[14:24] I'll just obey you. :-p
[14:24] ok
[14:24] set publisher to "Royal Society of London"
[14:24] ok
[14:24] Because we know all of these are.
[14:25] Now, personally, I find these look a lot better if you generate a description.
[14:25] Even though it's redundant, I'd do it.
[14:26] And to generate a description, I'd concatenate these tags.
[14:26] Now, here's where the STYLE kicks in
[14:26] Style like bling
[14:26] Let's quickly find the current bibliographic format for journal cite
[14:27] Because that's sexy, that's long-term thinking.
[14:28] Theory of the Luminosity Produced in Certain Substances by $ \alpha $-Rays
[14:28] Proc. R. Soc. Lond. A May 11, 1910 83:561-572
[14:28] So, SketchCow, I take it the archive said these were safe?
[14:28] Let's not discuss the archive
[14:28] Oh
[14:28] Okay
[14:28] Now, Nemo_bis -
[14:28] See that format how it shows up there?
[14:29] yes I know...
[14:30] no idea how to convert it
[14:30] That's fine.
[14:30] I'm composing that right now.
[14:31] TITLE. AUTHOR. JOURNAL. DATE. VOLUME:STARTINGPAGE-ENDINGPAGE
[14:31] That should be the description.
[14:32] Generate one, paste it in here, we'll see how it looks
[14:32] That should be the description. Looks MUCH better in the entry, gives more info, plays out nicely.
[14:35] There are up to 4 authors
[14:35] Concatenate them
[14:35] There were more, in about 50 cases, but I dropped them
[14:35] Never drop information
[14:35] (by error)
[14:35] I know
[14:36] Jason, you're a fucking hero.
(Good metadata isn't close to god, but it makes the bastard easy to pick out in a database.)
[14:36] If this is going to go on, it should go on right
[14:37] I'm not very comfortable adding that description, it's going to be a horrible regex
[14:38] Why would it be a regex, versus writing a second script that when called with the filename, generates this description?
[14:39] Because I'm using the perl script with metadata in csv
[14:40] That's what I have now: https://docs.google.com/leaf?id=0B0bTq2pGEiCoOTIwMTYzYzMtNWY4ZS00ZDBjLThmZjItNzc4Yjg2YThkZmY5&hl=en_US
[14:40] It's not that difficult to add a description like that, just very boring.
[14:40] (If you want to avoid errors.)
[14:40] Accurate > Interesting
[14:42] What's boring, the scriptwriting, or the result?
[14:42] The head-scratching
[14:43] On first thought I don't know how to do it, let's see...
[14:43] And the resulting work in doing it right, a total of a couple hours of work, results in 18,000 accurate and information-rich entries getting into the archive.
[14:44] 200 years of papers.
[14:44] It's only duplicate info, but I'm not saying I won't do it. ;-)
[14:44] it's not duplicate info.
[14:45] Any more than having an element table makes a park redundant
[14:45] When you do this, re-run the s3 injection into philtrans00097255
[14:46] It should just work
[14:46] And replace the information
[14:46] Or than having the wikitext rendered in HTML
[14:48] It's not a problem for the perl script to have empty fields within "", is it?
[14:48] Well, I can remove it later.
[14:55] http://www.archive.org/details/SoftwareVault_198&reCache=1
[14:55] That's looking sweeeeeeeeeet
[14:58] While i fix the metadata, anyone in the USA willing to run a sweet script which will create the cleaned JPG and upload them to the IA for sweet derivation?
[14:59] You only need ~600 GiB, some bandwidth and a min to run the command.
:-)
[15:12] http://p.defau.lt/?eifOyZ9yL49Shb0_u2PcTA and then a lot of crap to clean
[15:13] hm, no, too much crap, better to risk some empty references
[15:15] Nemo_bis: 600GB concurrently?
[15:15] underscor, yes, unless you want to split the metadata file
[15:15] (which is not so difficult, only boring)
[15:15] Oh I see
[15:15] I have gigabit here, but only like 20GB
[15:15] It's crazy fast to the archive, too
[15:16] Cause I'm only 3 hops away
[15:16] (The college has level 3 fiber)
[15:16] So it goes from here to DC, Texas, and the archive
[15:17] Usually the S3 interface doesn't receive all my bandwidth
[15:17] so it's download 600 GB, run something on it, and re-upload it?
[15:17] I don't know if it's a problem with lag
[15:18] No, it's download 30 GiB, convert to 600 GiB (less compressed) images, upload
[15:18] or find a solution to this...
[15:18] How hard would it be to split the metadata file?
[15:18] I
[15:18] http://lists.freedesktop.org/archives/poppler/2011-July/007653.html
[15:18] 'm happy to help, just can't do it all in one shot
[15:18] 20 GiB is not enough, it would be very boring
[15:19] Nemo_bis: See query
[15:20] isnt there a database law in the US?
[15:20] i know we have one in germany
[15:20] wait, bullshit
[15:20] does not cover the contents
[15:23] no database law in USA
[15:23] Only EU
[15:23] In fact, it's much safer for a USA citizen to upload this
[15:24] That's a rarity ;)
[15:24] indeed
[15:25] I have to go out to the conference
[15:26] A couple hours tweaking this means 18,000 items are properly done
[15:29] SketchCow, yes. 16000 should be ok now
[15:29] now edge cases :-/
[15:31] Database law?
[15:33] http://en.wikipedia.org/wiki/Database_right
[15:34] A really horrible thing
[15:35] why that?
[15:36] Eww, that's super ugly. Surprised this is the first I've heard of it.
[15:53] I'd like to see it.
[15:53] Obviously.
[15:54] Whenever.
[15:54] Just overwrite that one we're working on.
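(Editor's sketch, not part of the log.) The description format requested at 14:31, TITLE. AUTHOR. JOURNAL. DATE. VOLUME:STARTINGPAGE-ENDINGPAGE, can be produced without a regex by a small helper that joins whatever fields are present. The function name, its parameters, and the choice of "; " as the author separator are hypothetical; the log only says to concatenate the authors and never drop information. Empty fields are skipped, which is one possible answer to the "some empty references" edge cases.

```python
def build_description(title, authors, journal, date, volume, first_page, last_page):
    """Join citation fields as TITLE. AUTHOR. JOURNAL. DATE. VOLUME:START-END."""
    # Keep every non-empty author; "; " is an assumed separator because
    # names such as "Dunn, S." already contain commas.
    author_text = "; ".join(a.strip() for a in authors if a and a.strip())
    # Build "START-END", or just one page number if the other is missing.
    pages = "-".join(p for p in (first_page, last_page) if p)
    # Build "VOLUME:PAGES", or whichever half is available.
    locator = ":".join(p for p in (volume, pages) if p)
    parts = [title, author_text, journal, date, locator]
    # Drop empty fields and trailing periods so the ". " join never doubles up.
    cleaned = [p.strip().rstrip(".") for p in parts if p and p.strip()]
    return ". ".join(cleaned)


description = build_description(
    title="An Account of an Appulse of the Moon to the Planet Jupiter, "
          "Observed at Chelsea, by Mr. Samuel Dunn",
    authors=["Dunn, S."],
    journal="Philosophical Transactions (1683-1775)",
    date="1763",
    volume="53",
    first_page="31",
    last_page="31",
)
print(description)
```

Called once per row of the metadata file, a helper like this would fill the description column for every item in a single pass.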
[15:54] I can see about wrapping up this set into a single collection.
[16:00] argh, edge cases
[16:00] do you want to see where I am now?
[16:00] Just keep going
[16:00] Gotta go down and see.... an apple II emulator
[16:00] ....written entirely in javascript
[16:00] Which is some badass right there
[16:11] You heard about the Iraq National Library and Archive destruction, right?
[16:11] What a shame.
[16:16] A lot of museums got destroyed.
[16:17] SketchCow, does the IA have any ongoing project of digitization of outdated scientific journals?
[16:17] I read about the "abandoned books".
[16:17] But for instance my university has thousands of meters of old journals lying in deposits which cost a lot and are going to be destroyed sooner or later.
[16:18] (shelf meters)
[16:18] We should spend millions to archive them properly, and nobody will ever look at them
[16:28] If they are not digitized and saved on my hard disk, they are not properly archived.
[16:29] So how much knowledge could be properly archived?
[16:30] (If the government doesn't donate you a bunch of hard disks, I mean.)
[16:32] For your nightmares tonight: http://www.webarchive.org.uk/wayback/archive/20100427132359/http://www.bl.uk/iraqdiary/cilippics.html
[16:33] it makes my flesh creep
[16:34] * Nemo_bis doesn't know if this translates "mi si accappona la pelle" correctly
[16:41] SketchCow, https://docs.google.com/leaf?id=0B0bTq2pGEiCoYTkzMGZiYzYtYWY4NC00MWFiLWExN2QtMTlhMjYzOTJmNTFj
[16:42] Almost finished. A few dozen descriptions are still missing (I'm adding them now), but should be correct.
[17:06] SketchCo1, did you see the link?
[17:54] * Nemo_bis checking 19000 cells for errors
[17:56] and now again...
[18:01] Ok, that's the metadata.csv https://docs.google.com/leaf?id=0B0bTq2pGEiCoNmRkZjIzYmYtZTA4NS00Y2IyLTg3ODktNWFmZGI0M2FmNjY5&hl=en_US
[18:01] If someone could check it it would be great.
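(Editor's sketch, not part of the log.) Checking 19000 cells by eye is error-prone, so a mechanical pass over metadata.csv could catch the simplest problems first. The sketch below assumes a header row and columns named identifier and title; those column names, the function name check_metadata, and the sample titles are hypothetical, since the real layout of the file is not shown in the channel. It flags rows with the wrong number of cells, empty required cells, and repeated identifiers.

```python
import csv
import io


def check_metadata(csv_text, required=("identifier", "title")):
    """Return a list of human-readable problems found in the CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    problems = []
    seen = set()
    # Row 1 is the header, so data rows are numbered from 2.
    for row_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append(
                f"row {row_no}: expected {len(header)} cells, found {len(row)}"
            )
            continue
        record = dict(zip(header, row))
        for name in required:
            if name in record and not record[name].strip():
                problems.append(f"row {row_no}: empty {name}")
        identifier = record.get("identifier", "").strip()
        if identifier:
            if identifier in seen:
                problems.append(f"row {row_no}: duplicate identifier {identifier}")
            seen.add(identifier)
    return problems


# Identifiers are taken from the log; titles and descriptions are made up.
SAMPLE_CSV = """\
identifier,title,description
philtrans00097255,First sample title,Sample description
philtrans00097255,Second sample title,Sample description
philtrans11070387,,Sample description
philtrans99999999,Too few cells
"""

for problem in check_metadata(SAMPLE_CSV):
    print(problem)
```

A pass like this only catches structural slips; whether a publisher or journal-title value is actually right still needs a human comparison against the source records.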
[18:01] SketchCo1, ^
[18:22] RE: iraq national library being destroyed: Some of the oil rich shah's over there are gathering up priceless art and literature and making storage facilities for it, I saw a tv program on it a couple years back.
[20:19] Yes!
[20:19] My Jizz Strong bracelets are here!
[20:19] * underscor is so excited
[20:19] No offense if that offends you, btw
[20:21] I'm offense
[20:22] :O
[20:22] You're just jealous because you don't have a glow-in-the-dark live strong bracelet
[20:22] s/live/jizz/
[20:23] Wait seriously? Glow in the dark?
[20:23] Yes
[20:25] Some hospitals have reportedly cut the Livestrong wristbands from patients' wrists because they resemble the yellow "Do Not Resuscitate" bands
[20:25] (wikipedia)
[20:25] Oh, these are white
[20:26] do not jizz
[20:26] (because they glow)
[20:26] The best thing is, they can't take them away at school
[20:26] why not?
[20:26] Because there was a court thing or something about the "I <3 Boobies" bracelets
[20:27] Why could they take it away? It's not a weapon, is it?
[20:27] mmm, boobies
[20:27] freespeech
[20:27] and the outcome was that they were allowed, as long as they were related to a charitable cause
[20:27] schools in the states have broad authority to do whatever the hell they want
[20:27] I should have claimed my laser pointer is free speech ;-)
[20:27] So these are for "Prostate Cancer Awareness Year"
[20:27] it's a strange area of law
[20:27] ;)
[20:27] Well, they allow the boobies at our school, at least
[20:27] But I think it was pretty high up
[20:28] Dunno if it was outside state court though
[20:28] If Archivism is free speech too, is deleting stuff denying free speech?
[20:28] underscor: They allow boobies? or they allow "I <3 Boobies" bracelets? Important detail!
[20:28] The bracelets
[20:28] ...
[20:28] hahahha
[20:28] http://www.huffingtonpost.com/2011/04/12/i-heart-boobies-bracelets_n_848208.html
[20:29] "Prostate Cancer Awareness Year"? Where did you get it?
[20:29] I made it up
[20:30] Ah, nice
[20:30] do they have little penises on them or something?
[20:30] Since it's Prostate Cancer Awareness Year, it runs for 365 days, and begins again!
[20:30] Nope
[20:30] Just "JIZZSTRONG"
[20:30] ha
[20:30] In Impact, iirc
[20:30] Well, better than "I <3 Jizz" I guess ;-)
[20:30] Ahahahaha
[20:31] Well, we got the director of the summer program here to buy one, so I think we're safe
[20:31] We'll see what my high school officials say
[20:32] they'll probably say "there is no such thing" but you can say "well, see, you haven't heard of it, so I clearly need to raise some awareness"
[20:32] hahaha
[20:32] I think SketchCow needs one
[20:32] Definitely
[20:33] woop woop woop off-topic siren
[20:33] Hey! Shut that off!
[20:33] woop woop woop off-topic siren
[20:33] D:
[20:34] @quiet
[20:34] hey, it doesn't work on this channel
[20:34] That looks like a bot command
[20:34] :P
[20:34] @huggle underscor
[20:34] this neither
[20:35] lol
[20:35] it's StewardsBot
[20:35] * StewardsBot :No such nick/channel
[20:35] :P
[20:36] in FreeNode
[20:36] Oh
[20:37] StewardBot, actually
[20:37] every channel needs its friendly bot
[20:38] lol
[21:50] SketchCow, can you raise underscor's upload rate limit via S3? things are getting in quite slowly, apparently
[21:50] (if such a limit exists that way)
[21:50] I think it might just be a limitation of the system
[21:51] But he mentioned uploading at something like 60 MiB/s
[21:52] Although it could be a limitation for things coming from outside
[21:53] Probably
[21:54] Also, 60MBps is above the read capacity of this HD
[21:54] haha
[21:54] :-D
[21:55] clearly you need a bigger array
[21:55] ^