[00:00] when you check files X, Y and Z into a git repository, git creates a tree object which lists the files, their modes, and the hash of their content
[00:00] however, this is recursive, so files in a directory get their own tree objects
[00:01] i think it's more the logistics of a git repo with 100000 files
[00:01] and using that as the primary data structure
[00:01] yes
[00:01] It's not as if git-annex is slow
[00:02] larger git repositories are generally slower
[00:02] the other problem is i'm not exactly using fast disks
[00:03] shards only have approximately 100k files because putting all 271 million files in a single git repository would be insane indeed
[00:04] Now that would be slow
[00:04] i'm attaching all the storage i have lying around and fiddling around with union filesystems
[00:04] aufs seems to be doing okayish
[00:05] I have a number of little shardlets on disks of their own
[00:05] it's probably pretty rare to have that many free inodes sitting around as well
[00:05] And an IDE dock
[00:06] Senji: how do you split a shard across two disks so you don't double-download anything?
[00:06] the real question is that i have no clue what comes after /dev/sdz
[00:06] tpw_rules: downloaded it first and split it later
[00:07] db48x: jdamery-iabak@cleopatra:/mnt/daled/ia/IA.BAK/shard2 has 385M free inodes
[00:07] tpw_rules: Looks like what comes after /dev/sdz is /dev/sdaa.
[00:07] (but only 639G free space)
[00:08] So, it's basically base-26 numbered. Simple enough.
[00:08] aww i didn't want you to spoil it
[00:08] Sorry.
[00:08] Senji how did you split it?
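On what comes after `/dev/sdz`: the naming is closer to bijective base-26 than to ordinary base-26. There is no "zero" letter, so `sdz` is followed by `sdaa`, and `sdzz` by `sdaaa`. A minimal Python sketch of that scheme, assuming the usual Linux SCSI/SATA disk naming; the function name `sd_name` is hypothetical and this is not the kernel's own code.

```python
def sd_name(index: int) -> str:
    """Map a zero-based disk index to its /dev/sd* name.

    Bijective base-26: sda..sdz, then sdaa..sdaz, sdba..sdzz, sdaaa...
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    letters = []
    n = index + 1  # shift to 1-based so no digit ever stands for "zero"
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters.append(chr(ord("a") + rem))
    return "/dev/sd" + "".join(reversed(letters))


for i in (0, 25, 26, 701, 702):
    print(i, sd_name(i))
```

With this sketch, index 25 gives `/dev/sdz` and index 26 gives `/dev/sdaa`, matching the answer given in the channel.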
[00:09] Senji: fair enough; I only have 60M
[00:09] plus, you need two per file once the repository fills up :)
[00:09] db48x: scary thing; my personal backup disk is using 170M inodes
[00:10] tpw_rules: guddle around in .git/annex/objects :)
[00:10] ah okay
[00:10] so there's no like git annex cut yet
[00:10] tpw_rules: http://git-annex.branchable.com/design/balanced_preferred_content/
[00:10] that'd let you cut a repository in half
[00:11] add both repositories to a group, and make it want one copy out of all the repositories in that grounp
[00:11] group
[00:14] hmm. that looks a little complex
[00:29] no more so than the rest of git annex :)
[00:33] well you put in a nice script for me :P
[00:33] does fscking only start when all the shards are downloaded?
[00:37] *** primus has quit IRC (Read error: Operation timed out)
[00:37] *** primus104 has quit IRC (Read error: Operation timed out)
[00:38] *** primus104 has joined #internetarchive.bak
[00:39] *** primus has joined #internetarchive.bak
[00:49] *** VADemon has quit IRC (Read error: Connection reset by peer)
[00:51] *** primus105 has joined #internetarchive.bak
[00:57] *** primus104 has quit IRC (Read error: Operation timed out)
[01:00] *** primus105 has quit IRC (Read error: Operation timed out)
[01:01] *** primus104 has joined #internetarchive.bak
[01:15] *** garyrh has joined #internetarchive.bak
[01:59] *** primus104 has quit IRC (Leaving.)
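The goal described for the balanced preferred content design is a group of repositories that, between them, want exactly one copy of each file. One way to get that effect without coordination is a deterministic hash-based split, sketched below in Python. This is an illustration of the idea only, not git-annex's implementation or its preferred-content syntax; `assign_repo`, the disk names, and the example keys are all hypothetical.

```python
import hashlib


def assign_repo(key: str, repos: list[str]) -> str:
    """Pick exactly one repository out of `repos` to hold `key`.

    Hashing the key makes the choice deterministic, so every clone that
    runs this gets the same answer without any coordination.
    """
    if not repos:
        raise ValueError("need at least one repository")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Treat the first 8 bytes of the digest as an integer and reduce mod N.
    bucket = int.from_bytes(digest[:8], "big") % len(repos)
    # Sort first so the caller's list order does not change the answer.
    return sorted(repos)[bucket]


# Hypothetical example: split a shard's keys between two disks.
disks = ["disk-a", "disk-b"]
for key in ["SHA256E-s1024--example1.pdf", "SHA256E-s2048--example2.mp3"]:
    print(key, "->", assign_repo(key, disks))
```

A plain modulo split reshuffles most keys when a disk is added or removed; the linked design page is the place to check how git-annex itself deals with changes to the group.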
[02:29] tpw_rules: it depends on what mode the shard is in
[02:58] it seems to be skipping the fsck of shard 1
[03:36] i'd like to force that to run to make sure it's still good
[03:38] i think somehow the shards got out of order so it's working on 5 first
[05:26] well, all it does is grab a list of them, then run handleshard in each one, sequentially
[05:26] it doesn't make any effort to do them in any particular order, it just does them in whatever order your filesystem coughs them up
[05:27] usually that's the order of creation, but I don't think filesystems make any guarantees about the order in which they return directory entries
[06:46] *** chazchaz has quit IRC (Ping timeout: 369 seconds)
[06:50] *** chazchaz has joined #internetarchive.bak
[07:57] *** primus104 has joined #internetarchive.bak
[08:29] *** primus104 has quit IRC (Leaving.)
[09:14] registrar master 48ecb76 other SHARD4/pubkeys registration of db48x iabak on SHARD4
[10:04] hmm
[10:04] Timeout, server iabak.archiveteam.org not responding.236.00 KiB/s
[10:04] fatal: The remote end hung up unexpectedly
[10:04] seconds after registering; weird coincidence
[10:05] well, dunno about seconds
[10:05] at some point before it finished the initial clone
[10:05] anyway, time to sleep
[11:18] *** primus104 has joined #internetarchive.bak
[11:40] db48x: install-fsck-service will try and install a systemd service even if there's a crontab line
[12:25] *** ohhdemgir has quit IRC (Ping timeout: 512 seconds)
[12:55] *** primus104 has quit IRC (Leaving.)
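Since directory listings come back in no guaranteed order, a loop over shards that should run in a predictable order has to sort explicitly. The actual script is shell and its `handleshard` step is not reproduced here; the following is only a hypothetical Python sketch of the sorting step, using a natural sort so that `shard10` comes after `shard9` rather than after `shard1`.

```python
import os
import re


def natural_key(name: str) -> list:
    """Split a name into text and number chunks: 'shard10' -> ['shard', 10, '']."""
    parts = re.split(r"(\d+)", name)
    # re.split with a capturing group alternates text, digits, text, ...
    # so odd positions are always digit runs and compare as integers.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def shards_in_order(root: str) -> list[str]:
    """Return the subdirectories of `root` in a stable, human-friendly order.

    os.listdir() makes no promise about ordering, so sort explicitly.
    """
    names = [n for n in os.listdir(root) if os.path.isdir(os.path.join(root, n))]
    return sorted(names, key=natural_key)
```

Sorting costs nothing noticeable for a handful of shard directories and makes runs reproducible across filesystems.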
[13:00] *** ohhdemgir has joined #internetarchive.bak
[14:04] *** protodev has quit IRC (Ping timeout: 606 seconds)
[14:08] *** protodev has joined #internetarchive.bak
[14:08] *** GLaDOS has quit IRC (Excess Flood)
[14:09] *** GLaDOS has joined #internetarchive.bak
[14:09] *** svchfoo3 sets mode: +o GLaDOS
[16:32] registrar master 8782199 other SHARD1/pubkeys registration of twatson52 on SHARD1
[16:32] registrar master c16516a other SHARD5/pubkeys registration of twatson52 on SHARD5
[16:32] sorry, breaking everything
[16:32] registrar master 9eb16e5 other SHARD6/pubkeys registration of twatson52 on SHARD6
[16:32] *** primus104 has joined #internetarchive.bak
[16:39] i ruined everything again
[16:42] registrar master ee3df6b other SHARD1/pubkeys registration of twatson52 on SHARD1
[16:43] Control socket connect(.git/annex/ssh/SHARD1@iabak.archiveteam.org): Connection refused i keep getting
[16:47] registrar master c6f6fc5 other SHARD4/pubkeys registration of twatson52 on SHARD4
[16:48] sorry
[16:50] registrar master b779ea5 other SHARD5/pubkeys registration of twatson52 on SHARD5
[16:51] registrar master 4881f9e other SHARD6/pubkeys registration of twatson52 on SHARD6
[16:51] that should be the last one until i catch everything on fire again
[16:57] syncing is just not happening. Control socket connect(.git/annex/ssh/SHARD1@iabak.archiveteam.org): Connection refused
[16:57] i deleted everything, copied the objects folder from the old one, and i still can't sync
[16:59] does it create any kind of sockets on the filesystem or something?
[17:15] oh ok, i did a "git config annex.sshcaching false" and that fixed it because i'm on a weird filesystem
[19:18] *** primus104 has quit IRC (Leaving.)
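The fix above, `git config annex.sshcaching false`, turns off git-annex's SSH connection caching, which keeps a control socket under `.git/annex/ssh/` (the path in the error message). Whether a given filesystem can hold such a socket at all can be probed directly. A hypothetical Python helper, assuming Linux or another POSIX system; `supports_unix_sockets` is not part of git-annex.

```python
import os
import socket


def supports_unix_sockets(directory: str) -> bool:
    """Return True if a Unix domain socket file can be created in `directory`.

    Some filesystems refuse socket files, which breaks tools that keep a
    control socket on disk. Note: a path longer than the AF_UNIX limit
    (around 108 bytes on Linux) also fails here and reads as False.
    """
    if not hasattr(socket, "AF_UNIX"):
        return False
    path = os.path.join(directory, f".sockprobe-{os.getpid()}")
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    bound = False
    try:
        sock.bind(path)  # creates the socket file if the filesystem allows it
        bound = True
    except OSError:
        pass
    finally:
        sock.close()
        if bound:
            os.unlink(path)  # remove the probe socket file again
    return bound


print(supports_unix_sockets(".git/annex") if os.path.isdir(".git/annex") else "no .git/annex here")
```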
[19:51] *** primus104 has joined #internetarchive.bak
[20:07] oops i already broke a hard drive
[20:12] i somehow got a 'data phase error'
[20:12] not a clue what that means
[20:32] fun :)
[20:32] the net says http://www.spinics.net/lists/usb/msg01782.html
[20:33] Senji: hmm, I suppose that's true. why would you want to use a crontab line when you have systemd though?
[20:35] well it's a sata adapter so *shrug*
[20:35] i think i overheated it
[20:35] also fsck resume and parallelization doesn't seem to be working right
[20:57] how so?
[20:59] oops
[21:01] anglachel  db48x  ~  projects  IA.BAK  shard4  du -chsL 2>&1 | sort -h
[21:01] sort: write failed: /tmp/sortfSTPMa: No space left on device
[21:03] it completely ignored my diskreserve
[21:11] db48x: two fscks checksum the same files
[21:11] i think
[21:14] i'll let it complete a full one and see what happens
[21:15] ugh no this drive is broken. next time i can dick around with it is wednesday
[21:16] or it at least needs to be zeroed
[21:19] but that might mean i'll have to reregister all the shards because i r dum
[21:39] reregistering is fine :)
[21:41] db48x: I don't have systemd; I have enough bits of systemd to make other things work, which apparently includes the executable that you're checking for :)
[21:42] I have no idea if a systemd user service will work or not
[21:44] db48x: there's at least one bug with diskreserve that closure has fixed in his development version
[21:44] (of git annex)
[22:05] Senji: hmm. what's your PID 1?
[22:24] sysvinit
[22:25] Senji: not a symlink to systemd?
[22:25] ;)
[22:30] no
[22:39] huh
[22:40] how does having systemctl make other things work?
[22:40] and what does it actually do?
[22:40] do you have /bin/systemd as well?
[22:42] db48x: debian packages systemd in fairly large lumps I think.
[22:42] I do have /bin/systemd
[22:48] Senji: what does "dpkg -l sysvinit systemd-sysv" say?
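The `diskreserve` complaint comes down to a check that should run before each download: would this file still leave at least the reserved amount free? A minimal Python sketch of that check with hypothetical names; git-annex's own `annex.diskreserve` handling is not reproduced here, and per the discussion it had at least one bug at the time.

```python
import shutil


def has_room(path: str, incoming_bytes: int, reserve_bytes: int) -> bool:
    """Return True if writing `incoming_bytes` to the filesystem holding
    `path` would still leave at least `reserve_bytes` free afterwards."""
    free = shutil.disk_usage(path).free
    return free - incoming_bytes >= reserve_bytes


# Hypothetical use: refuse a 4 GiB download unless 1 GiB stays free.
GiB = 1024 ** 3
if not has_room(".", 4 * GiB, 1 * GiB):
    print("skipping download: would eat into the disk reserve")
```

The check only covers the filesystem that holds `path`, and another process can consume space between the check and the write. The `sort` failure above was on `/tmp`, which may be a different filesystem entirely.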
[22:52] un systemd-sysv (no description available)
[22:52] ii sysvinit 2.88dsf-59 amd64 System-V-like init utilities - tr
[23:11] so you have both installed, but you're not using systemd at all
[23:11] except to confuse programs that try to do the right thing in both init systems
[23:15] Yes, because things depend on it
[23:19] *** ohhdemgir has quit IRC (Quit: Leaving)
[23:19] Looks as if the fundamental thing pulling systemd in is xfce4's session manager
[23:24] *** ryang has quit IRC (Connection closed)
[23:25] does xfce4's session manager work if you have systemd but aren't actually using it?
[23:26] I have no idea, I don't use xfce myself; but I've not had any complaints :)
[23:26] Various bits of gnome would pull it in too
[23:26] heh
[23:26] But I don't do gnome
[23:27] *** ryang has joined #internetarchive.bak
[23:41] Apparently the official way to check for running systemd is the existence of /run/systemd/system
[23:41] I don't have a systemd system to try that on :)
[23:47] that's easy enough to check for
[23:47] do you have documentation though?
[23:50] Only stackexchange and reddit answers
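For the record, the `/run/systemd/system` check is documented: the `sd_booted(3)` man page describes it as checking whether the directory `/run/systemd/system/` exists. That tests whether systemd is actually running as init, not merely installed, which is exactly the distinction in Senji's setup (a `systemctl` binary and `/bin/systemd` present, sysvinit as PID 1). A hypothetical Python version of the check; the `root` parameter is an addition for testing only.

```python
import os


def systemd_is_running(root: str = "/") -> bool:
    """Return True if the system was booted with systemd as init.

    Mirrors the check described in sd_booted(3): the existence of the
    /run/systemd/system/ directory. Having systemctl or /bin/systemd
    installed is not enough. `root` exists only so tests can point the
    check at a fake directory tree.
    """
    return os.path.isdir(os.path.join(root, "run", "systemd", "system"))


if systemd_is_running():
    print("install a systemd service")
else:
    print("fall back to a crontab line")
```

An installer such as `install-fsck-service` could use a check like this to choose between a systemd unit and a crontab line, instead of testing for the `systemctl` executable.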