[01:31] *** Start has quit IRC (west.us.hub irc.mzima.net)
[01:39] *** Start has joined #internetarchive.bak
[02:50] Shard 13, the lesser-known sequel to District 13
[02:52] *** VADemon has quit IRC (Read error: Operation timed out)
[04:15] Kaz: I'd be happy to show you how to create a shard
[04:16] also to try to figure out why your iabak isn't downloading any more stuff
[04:16] do you have a NOMORE file lying around?
[04:16] that status page is great
[04:27] *** kyan has quit IRC (Quit: Leaving)
[06:03] Kaz: I love how simple the HTML for your status page is :)
[06:06] Kaz: but you should check in a script to create it
[06:18] *** Start has quit IRC (Remote host closed the connection)
[06:22] *** Start has joined #internetarchive.bak
[06:54] *** Start has quit IRC (Quit: Disconnected.)
[07:00] db48x: there's a script to create it, but I'm trying to work out a way to have it work out active shards automatically, and keep them in order
[07:01] as for downloads, I don't have a NOMORE file, it just runs through each of the active shards and then eventually says "we've run out of things for you to download" etc, don't have a log file to hand atm
[09:23] *** sevs has joined #internetarchive.bak
[10:47] trying to download my second shard is not going well. I need to re-arrange my partitions it seems
[11:38] registrar master 60d8023 other SHARD5/pubkeys registration of milenko on SHARD5
[12:44] registrar master 8ceaa41 other SHARD18/pubkeys registration of fusl on SHARD18
[13:30] *** kurt has joined #internetarchive.bak
[13:40] registrar master d84e93b other SHARD11/pubkeys registration of fusl on SHARD11
[13:47] registrar master c55efc5 other SHARD6/pubkeys registration of Kaz on SHARD6
[13:54] registrar master 46ab4da other SHARD11/pubkeys registration of fusl on SHARD11
[14:03] *** VADemon has joined #internetarchive.bak
[14:17] *** VADemon has quit IRC (Read error: Operation timed out)
[14:30] *** sep332_ has joined #internetarchive.bak
[15:02] *** Start has joined #internetarchive.bak
[15:21] registrar master 60daf86 other SHARD18/pubkeys registration of fusl on SHARD18
[15:49] *** Start has quit IRC (Quit: Disconnected.)
[15:50] registrar master 2396904 other SHARD17/pubkeys registration of fusl on SHARD17
[15:51] *** atomotic has joined #internetarchive.bak
[16:02] what I've probably done wrong is, I put 1T at ~iabak, and ran that to get my first shard (3); then, on re-run determined I was picking up shard4, and put another 1T mount at ./shard4
[16:03] so ~iabak is pretty much full, and ./iabak is not happy about that, but it manages to fetch 10-20GB or so each time I run it before giving up
[16:03] into iabak4 that is
[16:03] I should probably move ~iabak -> ~iabak/shard3 and put a small-ish LV at ~iabak for transient stuff
[16:16] registrar master d735526 other SHARD10/pubkeys registration of rtucker-iabak on SHARD10
[17:06] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[17:36] I can get 10.5TB of storage for £42.65, or get 6TB for £23 (the 10TB is an i7, and the 6TB is an ARM). (All pricing is monthly) What do people suggest? Spend more for non ARM?
[17:46] registrar master 3db70fa other SHARD5/pubkeys registration of deewiant+ia.bak on SHARD5
[17:46] depends how long you plan to keep it for
[17:47] I'm using the ARM atm, and it's gotten to 600GB and given up
[17:47] as it can't checksum fast
[17:47] £43/mo adds up very quickly compared to building a half-decent machine and buying your own drives
[17:48] yea
[17:48] 12+ drives to a case, replace/expand as and when you want etc
[17:52] Kaz, I want to build something, but don't really know where to get started
[17:53] depends what your budget is to start out really. then you've got building your own vs buying decom'd servers off ebay etc
[18:35] *** VADemon has joined #internetarchive.bak
[21:15] *** kyan has joined #internetarchive.bak
[21:21] registrar master 1bc4740 other SHARD10/pubkeys registration of Kaz on SHARD10
[21:25] db48x: just for some context as I'm going through testing things. SHARD19 is my testbed for now
[21:26] 'git annex get --auto -J50' caps CPU at 100% for ~30 seconds, then exits
[21:27] 'git annex get -J50' is now downloading normally (though I assume this is just going to grab every file it can get)
[21:35] registrar master 11cedde other SHARD19/pubkeys registration of Kaz on SHARD19
[22:25] registrar master 58ace3c other SHARD6/pubkeys registration of me on SHARD6
[22:40] registrar master a78b3da other SHARD5/pubkeys registration of me on SHARD5
[23:06] Kaz: indeed
[23:07] git-annex get --auto tells it to use the stored preferences for what files it should get, but we don't use that git-annex feature
[23:07] see git-annex help prefferred-content
[23:08] err, preferred
[23:12] the reason why iabak doesn't use preferred-content settings is that git-annex doesn't randomize the order in which it scans the repository, so everyone would tend to download the same files when they all jumped into a brand new shard
[23:13] *** Start has joined #internetarchive.bak
[23:14] iabak uses git-annex find --not --copies 4 to select files to download, shuffles that list, and then feeds it back to git-annex get
[23:16] we also process it slightly to change it from a list of files into a list of items first, so that we tell git-annex to download the whole item
[23:16] that helps the user when they go to view the files because they'll have complete items rather than one file from each one
[23:23] registrar master 33a35c1 other SHARD6/pubkeys registration of milenko on SHARD6
[23:25] Kaz: btw, the new status page needs a fast-forward button, so that we can skip over the boring parts of waiting for the clients to download things
[23:25] Is there a prototype of a new status page?
[23:26] SketchCow: http://iabak.archiveteam.org/status.html
[23:27] https://s-media-cache-ak0.pinimg.com/236x/1b/ea/58/1bea585a849b1de39c4122fcb1dbcf62.jpg can replace the missing one
[23:27] nice
[23:28] Not too sure I understand what you mean?
[23:29] Kaz: it currently takes _days_ to see any changes in the graphs on that page
[23:30] right
[23:31] will have a look into things, not too sure on the best way to go about that really
[23:31] so if you put in a fast-forward button that speeds up the changes to the graphs so that they go up more quickly, that'd be great
[23:31] I'll leave the implementation details up to you
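
Note on the selection procedure described at [23:14]-[23:16]: the steps (list under-replicated files, collapse them into items, shuffle, hand the result to git-annex get) can be sketched roughly as below. This is only an illustration, not iabak's actual code. The function name `shuffled_items` is made up, and the rule that an item is the first path component of each file is an assumption about the shard layout that the log does not confirm.

```python
import random


def shuffled_items(file_paths, rng=None):
    """Collapse file paths into unique items and return them in random order.

    Assumption (not confirmed by the log): an item is the first path
    component of each file, e.g. "itemA/a.txt" belongs to item "itemA".
    """
    if rng is None:
        rng = random.Random()
    # dict.fromkeys removes duplicate items while keeping first-seen order.
    items = list(dict.fromkeys(p.split("/", 1)[0] for p in file_paths if p))
    # Shuffle so that clients joining a new shard do not all start
    # with the same items.
    rng.shuffle(items)
    return items


if __name__ == "__main__":
    # Hypothetical sample of what a file listing might look like.
    sample = ["itemA/a.txt", "itemA/b.txt", "itemB/c.txt", "itemC/d.txt"]
    for item in shuffled_items(sample):
        print(item)
```

In practice the input lines would come from the output of `git annex find --not --copies 4`, and the shuffled item list would be passed to `git annex get`, as described in the log. Shuffling whole items rather than individual files keeps each downloaded item complete while still spreading clients across the shard.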