[01:36] *** GLaDOS has quit (Ping timeout: 512 seconds)
[01:56] *** Kazzy has quit (Read error: Operation timed out)
[01:56] *** Kazzy (~Kaz@[redacted]) has joined #internetarchive.bak
[01:56] *** svchfoo1 gives channel operator status to Kazzy
[03:23] *** bzc6p__ (~bzc6p@[redacted]) has joined #internetarchive.bak
[03:29] *** bzc6p_ has quit (Ping timeout: 600 seconds)
[03:30] *** GLaDOS (~STR_IDENT@[redacted]) has joined #internetarchive.bak
[03:30] *** svchfoo1 gives channel operator status to GLaDOS
[03:42] *** GLaDOS has quit (Ping timeout: 260 seconds)
[03:45] *** GLaDOS (~STR_IDENT@[redacted]) has joined #internetarchive.bak
[03:45] *** svchfoo1 gives channel operator status to GLaDOS
[04:26] *** wp494 has quit (LOUD UNNECESSARY QUIT MESSAGES)
[05:28] *** wp494 (~wickedpla@[redacted]) has joined #internetarchive.bak
[06:04] *** zottelbey (~zottelbey@[redacted]) has joined #internetarchive.bak
[09:47] *** Start has quit (Ping timeout: 740 seconds)
[10:06] *** Start (~Start@[redacted]) has joined #internetarchive.bak
[10:06] *** svchfoo2 gives channel operator status to Start
[10:49] *** svchfoo2 has quit (Remote host closed the connection)
[10:51] *** svchfoo2 (~chfoo2@[redacted]) has joined #internetarchive.bak
[10:58] *** svchfoo1 gives channel operator status to svchfoo2
[11:21] *** zottelbey has quit (Remote host closed the connection)
[11:25] *** bzc6p__ is now known as bzc6p
[12:00] *** zottelbey (~zottelbey@[redacted]) has joined #internetarchive.bak
[12:27] *** zottelbey has quit (Remote host closed the connection)
[13:12] *** zottelbey (~zottelbey@[redacted]) has joined #internetarchive.bak
[13:20] *** zottelbey has quit (Remote host closed the connection)
[13:35] *** zottelbey (~zottelbey@[redacted]) has joined #internetarchive.bak
[14:53] *** wp494 has quit (LOUD UNNECESSARY QUIT MESSAGES)
[15:28] 1.5TB so far!
[15:58] *** bzc6p_ (~bzc6p@[redacted]) has joined #internetarchive.bak
[16:04] *** bzc6p has quit (Ping timeout: 600 seconds)
[16:40] *** patricko- is now known as patrickod
[16:57] !
[16:58] How big is the data in total?
[16:58] wiki says shard1 is 2.91TB
[17:00] Fools, forces beyond your control, etc
[17:00] We should probably have a second person
[17:00] Anyone got space?
[17:07] *** patrickod is now known as patricko-
[17:23] I'm not going to do a full TB, but I'll pick up 500 or so gb
[17:24] I want it separate from you.
[17:24] You're on point for server and shard and watching what goes bazinga.
[17:25] If need be, I will do it, but I'd rather it be outsider.
[17:25] closure: how many other people requested keys?
[17:25] so far 3 people have keys
[17:27] so just me, midas and underscor?
[17:35] How much space is needed?
[17:35] For this test set?
[17:36] it is 2.91 tb, but no one person has to take the whole thing
[17:38] What OS should they be running
[17:38] linux or OSX
[17:40] *** closure finishes a (slow) stats run
[17:40] numcopies +0: 72540
[17:40] numcopies +1: 30793
[17:40] numcopies +2: 10
[17:40] Calling out.
[17:40] +0 is only in IA, +1 one backup, etc
[17:41] that's by files
[17:41] *** balrog (~balrog@[redacted]) has joined #internetarchive.bak
[17:41] nice call out
[17:42] Yes.
[17:42] Well, time to up the test, man.
[17:42] oh btw guys, I forgot to mention this little config option: git config annex.diskreserve 200GB
[17:43] then it'll leave 200 gb free when you let it rip
[17:44] oh, doing git-annex?
[17:44] right, for git-annex get
[17:44] (currently I'm hitting some FTPs)
[17:46] git config annex.web-options --limit-rate=200k
[17:46] you can also configure it to tell wget to --limit-rate for bandwidth
[17:48] So.
[17:48] People are going to start to arrive in here, over the next day or two.
[17:48] I'm primarily aiming them at you, closure.
[17:48] We should probably wiki instructions, so feel free to make subpages or new pages on the wiki.
[17:51] I've got three volunteers so far.
[17:52] so I currently have ~6TB of slow but open storage, and I'm bringing 12TB online this week. expanding to 30TB over time.
[17:52] Obviously I don't want to use all of this for IA.bak
[17:52] (right now I'm hitting FTP servers)
[17:59] I don't think you should use more than 500gb at this point.
[18:00] *** Owen-x (~owen@[redacted]) has joined #internetarchive.bak
[18:01] 8 people are interested.
[18:01] We'll see if the IRC hurdle is an issue.
[18:01] closure: What's in the test set, by the way.
[18:02] It's internetarchivebooks and usenethistorical collections
[18:03] Aww, that's a nice set.
[18:04] yeah, I noticed the tweet, but I don't have the hd space to spare right now, busy archiving other stuff
[18:04] (as i told you in PM SketchCow, just didn't know a discussion about it was going on here)
[18:08] Hello - I've got 1TB online here but could probably spare 3 or 4 in the near future
[18:08] What do I do?
[18:09] Owen-x: take a read over this page: http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK/git-annex_implementation#shard1
[18:09] ah OK, that one
[18:09] Owen-x: If there's clarifications you want or issues you hit, please bring them, so closure can make the instructions more definite.
[18:10] yes, this is all very rough and we're learning by doing
[18:10] OK, will do, I'll work through it now
[18:12] *** jxyzn (cfbc1136@[redacted]) has joined #internetarchive.bak
[18:12] *** yuppie_ (webchat@[redacted]) has joined #internetarchive.bak
[18:13] *** yuppie_ has quit (Client Quit)
[18:14] closure: i think we should add that ppa to the list
[18:15] midas: go ahead
[18:15] OK, how do I get my id_rsa.pub added so I can clone the git?
[18:15] im currently on my cellphone but ill add it when i get home :p
[18:15] Owen-x: you can just send it to me
[18:16] email?
[18:16] /msg is fine
[18:16] OK
[18:16] it's a public key, you could take out an ad in the NYT ;)
[18:18] Owen-x: key added
[18:18] but dont, the ads in the nyt are expensive.
[18:18] yes, buy TB with that $$$ instead
[18:19] whoop! Here we go
[18:19] ----------------------------------------------------------------------
[18:20] If ANYONE in here is comfortable working with closure to make
[18:20] a nice graphic interface to the activities of the server,
[18:20] *** jxyzn (cfbc1136@[redacted]) has left #internetarchive.bak
[18:20] speak up. Bonus for crazy loopy stuff
[18:20] ----------------------------------------------------------------------
[18:20] *** jxyzn (607e660e@[redacted]) has joined #internetarchive.bak
[18:20] woop woop craaazy loops
[18:21] closure: what kind of data are you getting back for graphs etc?
[18:21] all it gets back is pushes of the git-annex branch that say what file was in what clone, when
[18:23] any examples?
[18:23] (pastebin or something)
[18:27] http://pastebin.com/ZyQiRTSC
[18:28] it would be doable to mine those git commits and do some status display showing files that are backed up and not, or number of bytes stored
[18:28] how did you get that list earlier, of +0 +1 +2 files?
[18:29] "MD5-s28966-" is a file of 28966 bytes
[18:29] I used "git annex status ." in the repo for that, but it takes a couple minutes to read everything
[18:30] hm
[18:30] i was hoping on some more eatable data so i could use google charts
[18:33] SketchCow: food for thought, speed getting data from IA. im on a 200Mbit pipe here and barely hitting 200KB/s. of course it depends on location what kind of speed someone will get.
[18:33] but it took me ~2 days for 40GB already
[18:56] *** hater (bneA465dS9@[redacted]) has joined #internetarchive.bak
[19:56] *** garyrh gives channel operator status to arkiver balrog chfoo db48x
[19:56] *** garyrh gives channel operator status to underscor
[20:00] *** bzc6p_ is now known as bzc6p
[20:21] Understood.
[20:22] midas - Where are you pulling from
[20:22] Is this teamarchive1?
[20:23] grabbing from Location: https://ia600505.us.archive.org/15/items/americaneducator08fost/americaneducator08fost_orig_jp2.tar [following]
[20:23] So, that's DIRECTLY from archive.
[20:23] 1. yay
[20:23] yeah
[20:23] 2. ohh
[20:33] *** richo (~richo@[redacted]) has joined #internetarchive.bak
[20:45] I've provided a number of people links to the channel. Not sure how many will show.
[20:52] k
[20:57] zottelbey: I've added your keu
[20:57] er, key
[20:57] thanks
[21:16] *** zottelbey has quit (Remote host closed the connection)
[21:22] I've been informed there's a global load balancing issue with the machines right now at the IA, so that's why some items are slow. We should proceed, but that's why.
[21:39] *** wp494 (~wickedpla@[redacted]) has joined #internetarchive.bak
[21:41] okay, thanks for the heads up SketchCow
[21:52] who is in charge of the git-annex-implementation or the server hosting the files (or the git-repo)?
[21:52] closure.
[21:52] he's running the test.
[21:53] SketchCow: thx
[21:55] closure: are you recording the serverload per client?
[21:58] and do you need some help?
[22:46] *** wp494 has quit (LOUD UNNECESSARY QUIT MESSAGES)
[22:52] *** Owen-x has quit (Owen-x)
[22:53] *** wp494 (~wickedpla@[redacted]) has joined #internetarchive.bak
[22:56] Closure may have reasonable waking hours, so he might not get back yet.
[23:08] server's load is 0 0 0
[23:09] *** Owen-x (~owen@[redacted]) has joined #internetarchive.bak
[23:10] closure: is anyone downloading files atm?
[23:11] sure, from the IA. No idea what their load is ;)
[23:11] our server just gets a git push from time to time, I imagine it will scale quite a way
[23:20] *** Owen-x has quit (Owen-x)
[23:20] *** jxyzn has quit (Quit: http://chat.efnet.org (Session timeout))
[23:22] *** aschmitz has quit (Read error: Connection reset by peer)
[23:41] *** Owen-x (~owen@[redacted]) has joined #internetarchive.bak
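
Editor's sketch, not part of the channel log: the 18:28 and 18:29 messages note that a key such as "MD5-s28966-" encodes a file of 28966 bytes, and that a status display of files and bytes per backup count would be doable. The Python below is one rough way such a tally might look. It assumes the size field directly follows the backend name, as in the quoted key. The sample keys, their hashes, and the `copies_by_key` mapping are hypothetical illustrations, not data from the real shard or output of any git-annex command.

```python
import re
from collections import Counter

# Assumption: the size field ("-s<bytes>-") directly follows the backend
# name, as in the key fragment quoted in the log ("MD5-s28966-").
KEY_SIZE = re.compile(r"^[A-Z0-9_]+-s(\d+)-")


def key_size(key):
    """Return the byte size encoded in a git-annex style key, or None."""
    match = KEY_SIZE.match(key)
    return int(match.group(1)) if match else None


def summarize(copies_by_key):
    """Tally file counts and bytes per extra-copy count (+0, +1, ...)."""
    files = Counter()
    size = Counter()
    for key, extra in copies_by_key.items():
        files[extra] += 1
        size[extra] += key_size(key) or 0  # keys without a size count as 0
    return files, size


# Hypothetical sample: key -> number of copies beyond the IA original.
sample = {
    "MD5-s28966--0123456789abcdef0123456789abcdef": 1,
    "MD5-s1000--fedcba9876543210fedcba9876543210": 0,
    "MD5-s500--00112233445566778899aabbccddeeff": 0,
}

files, size = summarize(sample)
for extra in sorted(files):
    print(f"numcopies +{extra}: {files[extra]} files, {size[extra]} bytes")
```

Per-bucket totals like these are small enough to feed a charting tool directly, which is roughly what the 18:30 message was asking for. How the key-to-copies mapping would be extracted from the pushed git-annex branch is left open here.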