[00:03] *** Lord_Nigh has joined #internetarchive.bak
[00:03] registrar master 4403a1e other SHARD11/pubkeys registration of me on SHARD11
[00:13] registrar master dfc24e1 other SHARD9/pubkeys registration of me on SHARD9
[00:20] registrar master 31d6f00 other SHARD15/pubkeys registration of me on SHARD15
[00:33] registrar master 3e27759 other SHARD16/pubkeys registration of me on SHARD16
[00:34] registrar master f1ab75d other SHARD3/pubkeys registration of me on SHARD3
[01:12] *** kyan_ has joined #internetarchive.bak
[01:13] *** kyan_ has quit IRC (Remote host closed the connection)
[01:42] registrar master fd6f3c7 other SHARD14/pubkeys registration of me on SHARD14
[01:43] registrar master 9c87e34 other SHARD4/pubkeys registration of me on SHARD4
[01:50] *** Start has joined #internetarchive.bak
[02:07] registrar master e7a438b other SHARD17/pubkeys registration of me on SHARD17
[02:08] registrar master a30dd6b other SHARD18/pubkeys registration of me on SHARD18
[02:09] registrar master 60f749d other SHARD19/pubkeys registration of me on SHARD19
[02:16] Aww wyea
[03:19] Frogging: you might find this useful to run, inside SHARDn directories, to make wget time out: git config annex.web-options --timeout=600
[03:20] all right, I'll try it
[03:20] although, wget is supposed to have a default 900 second read timeout
[03:21] a shorter timeout like 1 should also work fine
[03:39] *** bwn has quit IRC (Ping timeout: 961 seconds)
[04:15] *** bwn has joined #internetarchive.bak
[05:01] bam
[05:01] I think I've got a script to update a shard from the latest state of IA
[05:02] closure: question for you
[05:02] closure: if I run git-annex rekey on a file that doesn't exist in the repository, what happens?
[05:03] nothing
[05:05] db48x: does your update script remove files that have been darked?
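Editor's note on the [03:19] suggestion: the setting has to be applied inside each SHARDn directory separately. A minimal Python sketch that only builds the commands, without running them, might look like this. The helper name `timeout_commands` and the assumption that all SHARDn directories sit under one `root` are illustrative and not from the log; the `git config` arguments are copied from the log as quoted.

```python
from pathlib import Path


def timeout_commands(root, seconds=600):
    """Build one `git config` command per SHARDn directory under root.

    Hypothetical helper: the annex.web-options value is the one quoted
    in the log, and nothing is executed here.
    """
    commands = []
    for shard in sorted(Path(root).glob("SHARD*")):
        if not shard.is_dir():
            continue  # skip stray files whose names start with SHARD
        commands.append(
            ["git", "-C", str(shard), "config",
             "annex.web-options", f"--timeout={seconds}"]
        )
    return commands
```

Each returned list could then be handed to `subprocess.run(cmd, check=True)` once the output has been checked by eye.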
[05:06] registrar master a1abb0f other SHARD12/pubkeys registration of milenko on SHARD12
[05:08] *** VADemon has quit IRC (Quit: left4dead)
[05:23] closure: yes
[05:23] closure: though it doesn't notice files that have been removed from non-dark items
[05:30] https://gist.github.com/db48x/66f41a91120266fe195aacec4c5531c2
[05:34] db48x: I'm unclear why you're putting disk space checks into iabak, are git-annex's not good enough?
[06:10] *** sevs has joined #internetarchive.bak
[06:22] closure: with just git-annex it can sit there for hours spewing out error messages about not having enough disk space
[06:23] in the possibly-vain hope that there will be a small file that it can squeeze in
[07:13] *** kyan has quit IRC (Quit: Leaving)
[07:14] *** Start has quit IRC (Ping timeout: 506 seconds)
[07:31] registrar master ccef620 other SHARD12/pubkeys registration of removed@gmail.com on SHARD12
[07:33] *** Start has joined #internetarchive.bak
[08:29] registrar master 22c5e73 other SHARD15/pubkeys registration of alex on SHARD15
[08:30] registrar master 140dd41 other SHARD10/pubkeys registration of alex on SHARD10
[08:37] *** atomotic has joined #internetarchive.bak
[08:37] *** atomotic has quit IRC (Connection closed)
[09:34] *** asktoomuc has joined #internetarchive.bak
[09:58] *** asktoomuc has quit IRC (Read error: Connection reset by peer)
[09:59] *** asktoomuc has joined #internetarchive.bak
[10:35] *** bwn has quit IRC (Ping timeout: 244 seconds)
[10:35] *** db48x` has joined #internetarchive.bak
[10:36] *** db48x has quit IRC (Ping timeout: 255 seconds)
[11:40] *** bwn has joined #internetarchive.bak
[12:37] registrar master f3de9f7 other SHARD9/pubkeys registration of alex on SHARD9
[12:55] *** asktoomuc has quit IRC (Read error: Connection reset by peer)
[12:55] *** asktoomuc has joined #internetarchive.bak
[13:00] *** VADemon has joined #internetarchive.bak
[13:05] *** asktoomuc has quit IRC (Read error: Connection reset by peer)
[13:05] *** asktoomuc has joined #internetarchive.bak
[13:06] *** milenko has quit IRC (Read error: Operation timed out)
[13:10] registrar master b128fc5 other SHARD12/pubkeys registration of milenko on SHARD12
[13:10] registrar master b8b4a58 other SHARD12/pubkeys registration of milenko on SHARD12
[13:10] registrar master 6eaae00 other SHARD12/pubkeys registration of milenko on SHARD12
[13:10] registrar master fd1ae9b other SHARD12/pubkeys registration of milenko on SHARD12
[13:11] registrar master d00b134 other SHARD12/pubkeys registration of milenko on SHARD12
[13:12] registrar master 462226d other SHARD12/pubkeys registration of milenko on SHARD12
[13:13] registrar master dff42d8 other SHARD12/pubkeys registration of milenko on SHARD12
[13:16] registrar master 3258c18 other SHARD12/pubkeys registration of milenko on SHARD12
[13:20] registrar master 7bbf5c4 other SHARD12/pubkeys registration of milenko on SHARD12
[14:02] *** asktoomuc has quit IRC (Read error: Connection reset by peer)
[14:03] *** asktoomuc has joined #internetarchive.bak
[14:04] *** asktoomuc has quit IRC (Client Quit)
[14:04] *** asktoomuc has joined #internetarchive.bak
[14:10] * db48x` yawns
[14:10] *** db48x` is now known as db48
[14:10] *** db48 is now known as db48x
[14:10] * db48x is confused
[14:11] I'm getting a lot of duplicate output from this script
[14:28] *** milenko has joined #internetarchive.bak
[14:50] 928G 30G 97% /home/iabak! but hm. exactly the same as yesterday. iabak seems to have stalled
[14:50] 78rpm/KanesHawaiians-AlekokiAndLiliuE/KanesHawaiians-AlekokiAndLiliuE_meta.xml has been downloading for >24 hours
[14:50] * Jon might restart it
[14:53] *** sep332 has joined #internetarchive.bak
[15:04] Jon: sounds like the download failed to time out
[15:04] someone else saw the same thing
[15:09] * Jon kicks it and restarts
[15:12] *** asktoomuc has quit IRC (Ping timeout: 260 seconds)
[15:12] *** asktoomuc has joined #internetarchive.bak
[15:14] *** asktoomuc has quit IRC (Client Quit)
[15:17] *** bwn has quit IRC (Ping timeout: 961 seconds)
[15:21] I wish I could have brought my monitors on vacation with me
[15:22] db48x: I downloaded a couple TB of shard1 back before the cron job was written :)
[15:22] is there something wrong with my shard3?
[15:23] *** asktoomuc has joined #internetarchive.bak
[15:24] this is the thing where some of the files aren't in the IA any more?
[15:48] sep332: no?
[15:49] but if I sent you a public key and you sent me a public key, and we each set up ssh access for the other, then our iabaks would be able to pull files from each other
[15:50] since a fair portion of the files in the early shards have gone dark, it would bring the percentages up even though the files aren't available from IA any longer
[15:53] Is there any point in storing files from shard3 that will never be needed for a restore?
[15:54] ^ SketchCow
[15:54] yes
[15:55] I mean if the archive deleted them they probably don't want them back...
[15:55] Is it not better to remove them from the shards and free up disk space for other data?
[15:56] sep332: dark items aren't deleted, they're merely not actively distributed
[15:56] *** bwn has joined #internetarchive.bak
[15:57] YESSSSS
[15:57] SO MUCH POINT
[15:58] man... I did not know that
[15:58] ok
[15:58] You are a bad backup.
[15:58] * SketchCow shakes collar
[15:58] * SketchCow slap
[15:58] grr
[15:58] IA sometimes loses stuff due to outside forces
[15:58] Backups prevent that from being an issue
[15:58] That's a living, breathing reason besides all the other theoretical concerns and general architectural worries.
[15:59] i'll try to be more rogue in future
[16:00] db48x: so, I'm currently adding tor P2P to git-annex :)
[16:00] closure: yes, via Tor
[16:01] on the one hand that is cool
[16:01] on the other hand it's not ssh
[16:01] indeed, tossing a lot of data over tor may not be fast
[16:03] * db48x grumbles; the metadata for dark items doesn't include the collection
[16:03] if some of the early shards have dark items and only a few copies, it would be a good chance to test out *restore* from iabak
[16:04] closure: my thoughts precisely (was just thinking that in a day or two I'll ambush someone else, but tell them that it's a firedrill for restoring)
[16:04] restoring just those files to a server, which could then send them out to other nodes
[16:04] how is restoring going to work?
[16:05] Senji: the plan, as I recall, is to restore files to a repository that is accessible via HTTP(S)
[16:05] well, the clients and server have ssh connections. git-annex can upload over ssh. so..
[16:06] yea, that could work as well
[16:06] (I wish I had one of the HSMs I use at work for backing up iabak data)
[16:07] though I figured it would need to be a dedicated server, separate from the server(s) hosting the shards themselves
[16:07] db48x what happened to the files that have gone dark? The IA lost some data? They were set to private collections?
[16:07] asktoomuc: usually it's copyright-related
[16:07] certainly would need a different, larger server for restoring a large quantity of data
[16:08] all the ssh public keys of clients are checked into a git repo, so other servers can use them
[16:08] closure: oh, that's cool
[16:09] nifty
[16:09] (private git repo hosted on gitlab as registration data is somewhat confidential)
[16:10] anyway, it would be useful to have a tool to update a shard, removing the IA url for files that the IA no longer has
[16:11] then, the restore command on the client is something like: git annex copy --not --in web --to restoreserver
[16:12] I'm working on that tool, but it's giving me a headache
[16:13] I need a DAG of pipes
[16:13] lemme know if the headache is on the git-annex side..
[16:14] closure: so far I'm just dumping the git-annex commands into a file, so that I can look at what I'm generating
[16:14] sounds like you need a real programming language :)
[16:14] A restore is definitely on the cards in December
[16:14] That's the plan
[16:14] I just wanted us to continue the refinements and dealing we're going with at the moment.
[16:14] Since it's super important.
[16:14] closure: it's getting more and more likely
[16:15] cut your losses and use go or whatever
[16:15] I can see a haskell library that can make a DAG of pipes in my mind
[16:16] maybe it is a mirage
[16:17] there's one called "pipes" :P
[16:17] heh :)
[16:17] "Elegant semantics: Use practical category theory "
[16:17] only in haskell library descriptions
[16:23] heh, nice description
[16:23] a question occurs
[16:24] what do we want to do if a shard currently contains an item, but then it gets moved out of the collection that the shard contains?
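Editor's note: the shard update tool discussed above has to decide, file by file, whether to drop the IA URL. A minimal Python sketch of that decision follows, in the spirit of dumping the git-annex commands to a file for inspection before running anything. The `ShardFile` record, the status labels, and `plan_update` are hypothetical names, and the step that works out each status from IA metadata is left out entirely. The only real command used is `git annex rmurl FILE URL`.

```python
import shlex
from dataclasses import dataclass


@dataclass
class ShardFile:
    path: str    # file path inside the shard repository
    url: str     # IA download URL recorded for the file
    status: str  # assumed labels: "present", "dark", or "moved"


def plan_update(files):
    """Return shell command lines to review; nothing is executed."""
    commands = []
    for f in files:
        if f.status == "dark":
            # IA no longer serves the file, so drop the web URL and keep
            # whatever copies the clients already hold.
            commands.append(
                f"git annex rmurl {shlex.quote(f.path)} {shlex.quote(f.url)}"
            )
        # "present" and "moved" files keep their URL in this sketch.
    return commands
```

Writing the returned lines to a file gives the same kind of reviewable output the log describes, and files whose URL has been removed are the ones a later `git annex copy --not --in web --to restoreserver` would pick up.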
[16:24] it's not dark, so we don't want to rmurl it
[16:29] *** ask_ has joined #internetarchive.bak
[16:30] https://gist.github.com/db48x/66f41a91120266fe195aacec4c5531c2#file-update-shard-sh-L24
[16:30] so very unpipelined
[16:34] http://iabak.archiveteam.org/SHARD4.html
[16:34] I'm fascinated at how it seems to stop
[16:34] Is that because we need more people? Or something else.
[16:37] *** asktoomuc has quit IRC (Ping timeout: 740 seconds)
[16:38] Hmm, apparently I have an unregistered shard on that shard. I should fix that up when I'm back at home tomorrow :-D
[16:42] SketchCow: shard4 contains the 9/11 TV Archive, which is stream-only; I haven't checked this, but I think that translates to "cannot download"
[16:45] how do you fix unregister segments? I think I have a couple
[16:56] hmm I still seem to be invisible/unregistered on iabak pages. wonder why.
[16:56] done 1T of shard3
[16:58] *** asktoomuc has joined #internetarchive.bak
[17:02] I've got 786G of shard4
[17:02] any idea why I get those? "git-annex: unable to decommit memory: Invalid argument"
[17:04] what distro?
[17:08] this feels like it's working: https://gist.github.com/db48x/66f41a91120266fe195aacec4c5531c2
[17:09] I'll check back after lunch
[17:10] *** kula has joined #internetarchive.bak
[17:15] *** sevs has quit IRC (Quit: Page closed)
[17:18] *** sevs has joined #internetarchive.bak
[17:26] that's annoying
[17:26] it looks like these errors are killing my process, hence I haven't backed up anything in the last 24 hours
[17:41] db48x: i need to safely shutdown the iabak server for maintenance. there seems to be tmux sessions running, any idea who's running those and how can i arrange to get them safely stopped?
[17:51] sec
[17:51] it's probably me
[17:51] mine is gone
[17:53] I feel like something isn't working properly. On shard3 (822MB on my side): "git-annex: unable to decommit memory: Invalid argument Wow! I'm done downloading this shard of the IA!"
[17:54] man this is hard. I have been at it for the past few days and I'm having a really hard time making it work on a pretty standard Debian 8 install. I wonder how you guys do it...
[17:58] Kaz: nah it's db48x mostly
[18:03] asktoomuc: what's your kernel version, and is there anything non-standard about your setup?
[18:03] any changed options, etc
[18:04] it's 3.16.0-4-amd64
[18:05] and no, I just installed it as usual in a VM from the Debian iso: debian-8.6.0-amd64-netinst.iso
[18:05] the share drive is mounted using NFS (I installed nfs-common and git and git-annex from apt-get)
[18:05] so I can't see anything out of place really
[18:06] I was able to download ~80G this weekend but since I updated the script to the latest version yesterday, it doesn't seem to work for me anymore
[18:07] probably need to change the git-annex build to the i386ancient one to support these old kernels
[18:09] old? XD
[18:09] that's the Debian stable version
[18:09] but I can deploy it in a Debian 10 install maybe
[18:09] what's the minimum version of the kernel recommended?
[18:10] https://downloads.kitenet.net/git-annex/linux/current/git-annex-standalone-i386-ancient.tar.gz
[18:10] untar and replace git-annex.linux directory in the iabak repo with that
[18:10] kernel 3.16 was released August 2014
[18:12] well okay, I love Debian but I'll admit they are lagging behind with their stable version
[18:12] I'm running Debian 8 as well but I realized I also updated my kernel to 4.7.3
[18:13] thanks closure
[18:13] it might be the kernel taking issue with git-annex doing things with a 1TB vm allocation
[18:13] which I've noticed it's doing on mine
[18:14] it's apparently a new way to free memory which libc has started using w/o caring if the kernel supports it
[18:14] not sure why we need a new way to free memory in 2016
[18:14] munmap()? :p
[18:15] hmmm ok, I'll try the package you gave me and if that doesn't work, I'll scrap the VM and get a Ubuntu server or Debian unstable
[18:15] or just update the kernel
[18:15] apt-get dist-upgrade or is it more involved?
[18:16] I'm actually not sure, I compiled mine because of reasons, but there might well be a "friendlier" way
[18:16] I'm not bad with a computer but very far from being a sysadmin
[18:16] might be easier to just reinstall a latest distro in that case
[18:17] here https://packages.debian.org/jessie-backports/linux-image-amd64
[18:17] you should be able to just add jessie-backports to your apt sources
[18:18] https://backports.debian.org/Instructions/#index2h2
[18:19] ah and there's a special command to install from backports
[18:19] on that page
[18:20] oh ok, thanks Frogging
[18:20] packages from backports have lower priority, that's why you need this command
[18:20] yeah^
[18:20] learn something new everyday
[18:20] maybe it's to avoid accidentally dragging the whole system forward with dependencies when installing stuff
[18:21] which can happen if you add unstable repos to your sources and such :p
[18:21] yes, you either opt-in individually, or override all priorities at once
[18:22] *** ask_ has quit IRC (Quit: Bye)
[18:29] *** komarEX has joined #internetarchive.bak
[18:29] hello
[18:29] guys git annex is so slow with huge number of files that I could cry a river
[18:30] git repack -ad && git gc && git update-index --index-version 4
[18:30] should I ?
[19:11] *** VADemon has quit IRC (Quit: left4dead)
[19:44] Right, not too sure what on earth is happening here
[19:44] the script has decided to attempt bittorrent downloads
[19:44] I've gone through and disabled the bittorrent remote (or so I thought)
[20:37] *** atomotic has joined #internetarchive.bak
[21:15] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[21:39] what's the best way to run the script once logged out? using screen? launching the script in the background with "&"
[21:41] I already ran the "loginctl enable-linger user" as recommended
[21:41] screen works
[21:41] then screen works
[21:41] it installs a crontab too
[21:56] are those expected? "Continuing any downloads that were previously interrupted... flock: cannot open lock file .git/annex/iasyncer.lock: No such file or directory"
[21:57] I think I will start again from scratch, too many issues
[21:58] I must have broken something
[22:02] i get that flock line all the time, doesn't seem to affect anything
[22:18] *** komarEX has quit IRC (Quit: Page closed)
[22:18] registrar master 2677c00 other SHARD14/pubkeys registration of roninfight on SHARD14
[22:20] *** asktoomuc has quit IRC (Quit: Page closed)
[22:25] registrar master 058800e other SHARD14/pubkeys registration of roninfight on SHARD14
[22:29] registrar master 448a743 other SHARD14/pubkeys registration of roninfight on SHARD14
[22:30] registrar master 885dc43 other SHARD14/pubkeys registration of roninfight on SHARD14
[22:34] registrar master e429741 other SHARD14/pubkeys registration of roninfight on SHARD14
[22:40] *** asktoomuc has joined #internetarchive.bak
[22:40] *** asktoomuc has quit IRC (Client Quit)
[22:50] *** bwn has quit IRC (Ping timeout: 244 seconds)
[23:03] *** sep332 has quit IRC (konversation out)
[23:14] do some shards have their stats update more often than others? I have 190G from shard10 and have for over a day, shard15 I have 150G, neither of those appear to be updated, but I am showing up on shard14 and shard16