[00:36] can I begin rsyncing splinder while still downloading?
[00:37] yes
[00:37] ok, I need to get on that ASAP. Gotta catch SketchCow for a slot?
[00:37] indeed
[00:37] thanks
[01:38] BACK
[01:38] dnova: use the upload script
[01:39] # (ask SketchCow for a module name)
[01:39] lol
[01:40] I know. I meant when you do get a module name, use the upload script
[01:40] SketchCow: I want to start uploading my splinders
[01:40] Coderjoe: mos def
[02:16] http://i.imgur.com/1fcec.png
[02:20] hah
[02:26] *facepalm*
[02:28] never put dicks in your ears
[02:58] Coderjoe: you think cleaning out earwax is hard?
[03:08] Hello, everyone.
[03:08] There are two reporters, Eva Talmadge and Matt/Matthias Schwartz, trying to do a story on Archive Team.
[03:08] Please do not talk to them.
[03:08] Let's put that in the lines.
[03:08] --------------------------------------
[03:08] Hello, everyone.
[03:08] There are two reporters, Eva Talmadge and Matt/Matthias Schwartz, trying to do a story on Archive Team.
[03:08] Let's put that in the lines.
[03:08] Please do not talk to them.
[03:08] --------------------------------------
[03:25] SketchCow: channel topic, perhaps?
[03:28] I expect some people will ignore it.
[03:28] But I did want to say it.
[03:31] out of curiosity, what's your reasoning there?
[03:51] http://www.mattathiasschwartz.com/
[03:51] Go read the other articles
[03:51] and tell me how we'll fare.
[03:58] it looks like there is no way to simply turn a Wikipedia dump into a Wikipedia website
[03:59] are there any tools you guys use to read wiki dumps as a fully indexed website?
[04:12] godane: what's the goal?
[04:34] someone at some point in this channel asked for a copy of Coming Soon (online magazine) (www.csoon.com) - I tried using wget-warc to make a copy of it
[04:38] dashcloud: Any idea what it'd require to verify that you did things correctly? I'd love to help, but know very little about wget-warc.
[04:41] here's the command I used to grab it: http://pastebin.com/Yzzw28ep, and the site's still up, minus 10-20 pages
[04:43] I'm short on time right now, but I'm happy to send over my copy tomorrow
[04:44] Cool. I'll take a stab at it if no one else more qualified steps forward.
[04:44] the only other thing I think you need is to make sure all the directories mentioned in the command exist
[04:45] (i.e., don't rely on wget to create them)
[04:45] good night, folks!
[05:19] chronomex: was trying to host a local LAN version of Wikipedia
[05:20] ah, hm.
[05:20] wow, this Matt guy is really artsy-fartsy with his writing
[05:55] Can't we just throw them off instead?
[05:59] Today has been catch-up day.
[05:59] Also, have you got that rsync slot set up for me?
[06:01] I can do that.
[06:06] While you're around, I'd like to throw out an "archive.org was the only source for an extremely helpful page" testimonial, as if you needed more
[06:06] So much good for the Internet.
[06:07] note: this is archiveteam.org. We only have access to archive.org; we don't run it.
[06:08] and by access we mean we have no more access than anyone else with an account
[06:08] for the most part
[06:08] What he said.
[06:14] I know. That was directed at SketchCow.
[06:15] Since this is a convenient way to throw quick thoughts at him
[06:23] He's the same as what we said: only has member access.
[06:39] Fair enough. My understanding was that he worked a bit closer with them.
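A minimal sketch of the upload step discussed at the top of this chunk, assuming a module name has already been handed out; the host, module name, nick, and local path below are all placeholders, not the real upload script:

    # Placeholder target -- ask SketchCow for the actual module name,
    # and prefer the project's upload script when one exists.
    # --partial keeps interrupted transfers resumable.
    rsync -avz --partial --progress splinder-data/ \
        rsync://archive.example.org/MODULE/yournick-splinder/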
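dashcloud's exact pastebin command isn't reproduced here; the snippet below is only a rough sketch of a grab of that shape. It assumes the wget-warc fork's --warc-file option, uses placeholder paths, and, per the note above, creates every directory up front rather than relying on wget to make them:

    # Sketch only -- not dashcloud's actual command.
    mkdir -p csoon-files csoon-warc        # create the directories first
    wget --mirror --page-requisites --wait=1 \
         --warc-file=csoon-warc/csoon \
         --directory-prefix=csoon-files \
         http://www.csoon.com/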
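On godane's local-LAN Wikipedia question: there is indeed no one-step dump-to-website converter, but one known route is importing the XML dump into a stock MediaWiki install using its maintenance scripts (Kiwix's offline reader is another option). A sketch, assuming MediaWiki and its database are already set up; the dump filename is a placeholder:

    # Import pages into the local wiki's database (very slow for a full
    # Wikipedia dump), then rebuild derived tables so pages render and link.
    php maintenance/importDump.php < enwiki-latest-pages-articles.xml
    php maintenance/rebuildrecentchanges.php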
[08:47] so, google buzz is going down soonish, I hear
[08:50] that's the buzz
[10:08] lol
[10:37] sharing info about the libraries damaged by Hurricane Irene through Facebook pages (see the last 3 paragraphs), looks like a long-term solution, hell yeah http://www.librarian.net/stax/3652/helping-libraries-damaged-by-hurricane-irene/
[10:38] WebCite allows uploading link batches: http://www.webcitation.org/comb
[10:38] it is very useful for archiving tons of links
[10:39] the Archive-It tool from the Internet Archive is not free, so in this case IA sucks
[10:40] :/
[10:43] metadata for 15000+ knols complete
[10:44] the channel is #klol
[11:15] trying to archive the whole AT wiki using the WebCite comb www.webcitation.org/comb
[11:17] clicked the submit button, but the process is slow... waiting
[11:19] you put 15000 URLs into www.webcitation.org/comb?
[11:19] no
[11:19] 15000 knols' metadata downloaded, really 20000 now
[11:20] the WebCite submit is this: http://www.archiveteam.org/index.php?title=Template:Navigation_box
[11:20] less than 100 links
[11:20] it scrapes all the links, you checkbox the desired links, and it archives them
[11:20] ooh
[11:21] that makes more sense
[11:22] by the way, uploading knol link batches to WebCite is one option
[11:23] just downloading all the knols, tar-gzipping them, and uploading to IA is shit
[11:23] most AT projects are not viewable
[11:23] just huge packs
[11:31] yeah, given their size that's been the easiest way to go
[11:43] He's the same as what we said: only has member access.
[11:43] Actually, he's a full admin, iirc
[12:23] ~25k metadata chunk for your tests: http://www.sendspace.com/file/o8fthv
[12:23] tab-delimited
[13:54] underscor: interesting.
[15:06] Brp
[17:47] how many sites have closed this year?
[17:48] millions
[17:51] emijrp: check out the wiki for the Deathwatch pages
[17:51] i mean, I feel that this year has been very bad
[17:52] it can only get worse
[18:18] SketchCow: why doesn't IA set up anything like this to let people transcribe books? https://es.wikisource.org/w/index.php?title=P%C3%A1gina:Plat%C3%B3n_-_La_Rep%C3%BAblica_%281805%29,_Tomo_1.djvu/322&action=edit&redlink=1 IA's OCR is the worst ever
[18:18] that would be really cool
[18:20] emijrp: that text made little sense
[18:21] the text on the left is the OCR autofill; afterwards a person rewrites the phrases that need it
[18:23] a corrected page looks like this: https://es.wikisource.org/wiki/P%C3%A1gina:Plat%C3%B3n_-_La_Rep%C3%BAblica_%281805%29,_Tomo_1.djvu/89
[18:40] https://twitter.com/#!/brewster_kahle
[18:42] emijrp: point being? A specific tweet? Click the "X hours ago" to get a direct link
[18:42] no, just recommending that Twitter account
[18:42] Okay
[18:43] * Schbirid sends ersi into the fresh air
[18:43] Weeeee!
[19:11] The digital materials, we can make copies of. And we’ve—we have two copies within the United States, and we have a partial copy in Alexandria, Egypt, which is, I guess, fitting, as we have a large-scale swap agreement with them to archive their materials, and they archive ours. And also in Amsterdam, we have a partial copy. If there are five or six copies of these materials worldwide, I think I’d feel safe.
[19:11] http://www.democracynow.org/2011/8/24/pioneering_internet_archivists_brewster_kahle_and
[19:12] So, ...
[22:42] So at what point should I just give up and kill a wget?
[23:30] Wyatt: Check the files directory. If the most recent directory was recently modified, it's still going.
[23:31] And if there are a crapload of domains, it's trying to download all of Splinder/MobileMe/whatever.
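The "huge packs" workflow emijrp is complaining about, sketched against IA's S3-compatible API; the item name is a placeholder and ACCESSKEY/SECRET stand in for the keys from your archive.org account settings:

    # Pack the grab, then push it into an item via s3.us.archive.org.
    tar czf knol-grab.tar.gz knols/
    curl --location \
         --header "authorization: LOW ACCESSKEY:SECRET" \
         --header "x-archive-auto-make-bucket:1" \
         --upload-file knol-grab.tar.gz \
         http://s3.us.archive.org/knol-grab-item/knol-grab.tar.gz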
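The liveness check described in the last two messages can be scripted; a sketch assuming the grab is writing under files/ (the path is a guess):

    # Show the most recently modified directories (GNU find); a fresh
    # timestamp at the top means wget is still making progress.
    find files/ -type d -printf '%T@ %p\n' | sort -rn | head -5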