[00:01] They're working on mirroring text dumps. They're already something like 10 TB https://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps
[00:01] A mirror has been added recently, an rsync to a second server is happening right now, a new cluster was ordered yesterday.
[00:01] So, things are moving a bit.
[00:15] qls
[00:15] Whoops, wrong tab
[02:51] any ideas for how to scrape fanfiction.net
[02:53] bsmith094, didn't you get that already?
[02:53] got the story id numbers, not the stories
[05:56] heh, wow, www.naenara.com.kp uses client-side imagemaps
[05:56] I haven't seen those in a long time
[13:04] The article is up
[13:04] (Tech Review)
[13:04] You can read it and decide if my concerns were accurate.
[13:10] http://www.technologyreview.com/article/39317/
[13:12] nice comic :)
[13:34] Hoh! Awesome
[14:14] haha, awesome quotes
[14:23] :]
[14:25] I did somehow expect Schwartz to totally go nuts on AT though
[14:26] dunno why, might have something to do with his personal site
[15:34] the images rock
[15:49] shmmmm.
[15:59] good morning fellas
[17:17] huh
[17:17] I wonder how they figured out some of the handles in this channel
[17:17] most likely by looking at archiveteam.org and inferring
[17:17] OR PERHAPS SOMEONE IN HERE IS A MOLE
[17:18] also, I've got a WARC of naenara.com.kp; what's the easiest way to get that to IA? register and upload?
[17:19] yep
[17:19] is it in .warc?
[17:20] yes
[17:20] it's only 1.6 GB, gzipped
[17:20] I've got a significant part of the North Korean internet on a USB pen drive
[17:20] that's an awesome thought
[17:21] rad.
[17:23] I'd say upload it, then let info@archive.org know.
[17:24] using http://www.archive.org/create/, I guess?
[17:24] or is there a specialized upload point for WARCs?
[17:24] that link was just the first I found
[17:27] yes, that
[17:30] "You appear to be using the Firefox browser.
[17:30] The browser will only upload files of 2GB or less."
[17:30] that's right
[17:30] good thing that fits within those limits
[17:32] heh
[17:33] I understand how a browser can parse that, but I'm simultaneously amazed that it works
[17:34] yipdw: that is beautiful
[17:35] browsers must have some of the best backwards compatibility ever
[17:57] alright, uploaded and notified
[18:14] mm quirks mode
[18:15] you can also create items through the s3 interface. each item is a bucket and the first file uploaded to the bucket creates it.
[20:05] I like the TR article
[20:18] which article?
[20:23] bsmith094, http://www.technologyreview.com/article/39317/
[21:18] Tjamls sp ,icj. yipdw
[21:18] Tjat
[21:18] Thanks so much yipdw
[21:18] That's the golden stuff.
[21:20] np
[21:20] I'll see what else I can get before they officially replace Dear Leader
[21:32] Excellent.
[21:32] Actually, I think the grandson is already dear leader.
[21:35] oh, good point
[21:35] ha, naenara updaetd
[21:35] updated
[21:35] might as well run the mirror again
[21:38] Obviously get the sets.
[21:38] Want a place to FTP?
[21:40] I'm uploading to archive.org right now via their HTTP interface
[21:40] FTP would probably be nicer, though
[21:40] though I guess I can also use the S3-alike
[21:43] Get a free account
[21:43] and you can upload via FTP
[21:43] I can help with that.
[21:46] SketchCow: got the account
[21:46] er, I mean, I have an account
[21:46] brb
[22:52] hm. I still hate magtape.
[22:58] those fuckers in the 70s
[22:59] the oxide is glued to the tape with a urethane compound, which gets gummy over the course of about 10 years
[22:59] whee
[22:59] you can fix it by baking the thing at 135-140F
[22:59] and warp the base
[23:00] less than ~135 won't do anything, more than 140 will cause printthrough
[23:00] I am helping negotiate the possible transfer of something like 135,000 tapes
[23:00] the base of my carts is a 2mm aluminum slab
[23:00] Isn't that exciting.
[23:00] ooh, what of?
[23:00] and what type of tapes?
[23:00] http://www.sfgate.com/cgi-bin/article.cgi?f=/n/a/2011/12/20/national/a065958S85.DTL
[23:00] It might be 35,000, someone might have typo'd.
[23:01] Reel to reel of some sort
[23:01] They talked about Base64, a program that compresses digital documents for speedy transmission by removing all the spaces and punctuation marks.
[23:01] :|
[23:01] Everything the Christian Science Monitor recorded for radio, ever
[23:01] ah. i wish these were reel tapes, that would solve some problems :|
[23:01] wow
[23:01] mmm... so one side of the tape gets extra crispy while the other stays original recipe
[23:01] Coderjoe: ?
[23:01] oh
[23:01] They're being digitized, we're just discussing having the original tapes.
[23:02] cool
[23:02] I hate tape *cartridges*.
[23:02] chronomex: the 2mm aluminum slab. it will cook one side of the tape more than the other
[23:03] almost to the point where i'm going to pay someone to do this for me
[23:03] Coderjoe: Ah. I see. No.
[23:03] tape is at right angle to the base
[23:04] duh
[23:04] well maybe trks 0 and 1 vs 2 and 3
[23:04] side being EDGE not flat
[23:05] who ever talks about datatape that way :P
[23:06] Anyone feel like typing in Compute! tables of contents?
[23:08] archiveteam returns "this site may be compromised" on google
[23:09] Gotta fix that.
[23:13] anything reasonably sized that needs downloading? not mobileme, too big, I couldn't put the start of a dent in that
[23:15] goddamnit, I sent ^C to the wrong terminal
[23:15] I hate when I kill a process that's been running for an hour or so
[23:15] at least it's idempotent
[23:21] worse still when it's 80% done with a two-week job.
[23:22] I haven't done that before
[23:22] it sucks hard
[23:23] even worse is when things get OOM-killed
[23:23] I'm still, STILL adding Jamendo.
[23:33] yeah. oom is the pits.
[23:45] did thingiverse ever finish up, or is that still open?
[23:48] We did a full round
[23:48] We'll do another round at some point.
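
Editor's note on the Base64 description quoted at [23:01]: Base64 is an encoding, not a compressor, and it does not strip spaces or punctuation. A minimal sketch, assuming Python's standard `base64` module; the sample string is made up for illustration and is not from the log.

```python
import base64

# Hypothetical sample text containing spaces and punctuation.
sample = b"Hello, world! Spaces and punctuation survive the round trip."

# Encode to Base64, then decode back to the original bytes.
encoded = base64.b64encode(sample)
decoded = base64.b64decode(encoded)

# Every 3 input bytes become 4 output characters (with '=' padding),
# so the encoded form is larger than the input, not smaller.
print("original bytes:", len(sample))
print("encoded bytes: ", len(encoded))
print("round trip ok: ", decoded == sample)
```

The encoded length is always 4 * ceil(n / 3) for n input bytes, i.e. roughly a third larger than the input, and decoding restores every byte unchanged.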