#internetarchive.bak 2016-11-22,Tue


Time Nickname Message
00:03 🔗 Lord_Nigh has joined #internetarchive.bak
00:03 🔗 iabak-reg registrar master 4403a1e other SHARD11/pubkeys registration of me on SHARD11
00:13 🔗 iabak-reg registrar master dfc24e1 other SHARD9/pubkeys registration of me on SHARD9
00:20 🔗 iabak-reg registrar master 31d6f00 other SHARD15/pubkeys registration of me on SHARD15
00:33 🔗 iabak-reg registrar master 3e27759 other SHARD16/pubkeys registration of me on SHARD16
00:34 🔗 iabak-reg registrar master f1ab75d other SHARD3/pubkeys registration of me on SHARD3
01:12 🔗 kyan_ has joined #internetarchive.bak
01:13 🔗 kyan_ has quit IRC (Remote host closed the connection)
01:42 🔗 iabak-reg registrar master fd6f3c7 other SHARD14/pubkeys registration of me on SHARD14
01:43 🔗 iabak-reg registrar master 9c87e34 other SHARD4/pubkeys registration of me on SHARD4
01:50 🔗 Start has joined #internetarchive.bak
02:07 🔗 iabak-reg registrar master e7a438b other SHARD17/pubkeys registration of me on SHARD17
02:08 🔗 iabak-reg registrar master a30dd6b other SHARD18/pubkeys registration of me on SHARD18
02:09 🔗 iabak-reg registrar master 60f749d other SHARD19/pubkeys registration of me on SHARD19
02:16 🔗 SketchCow Aww wyea
03:19 🔗 closure Frogging: you might find this useful to run, inside SHARDn directories, to make wget time out: git config annex.web-options --timeout=600
03:20 🔗 Frogging all right, I'll try it
03:20 🔗 closure although, wget is supposed to have a default 900 second read timeout
03:21 🔗 closure a shorter timeout like 1 should also work fine
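Editor's sketch: closure's suggestion above can be applied to every shard at once. This is a minimal Python sketch, assuming the iabak checkout contains directories named SHARD1, SHARD2, and so on; the helper names `web_timeout_command` and `shard_dirs` are hypothetical and not part of iabak. It only prints the commands, so nothing is changed until you run them yourself.

```python
import shlex
from pathlib import Path
from typing import List


def web_timeout_command(seconds: int) -> List[str]:
    """Build the `git config` command quoted in the log for a given timeout."""
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return ["git", "config", "annex.web-options", f"--timeout={seconds}"]


def shard_dirs(iabak_root: Path) -> List[Path]:
    """List SHARDn directories under the (assumed) iabak checkout, sorted by name."""
    return sorted(p for p in iabak_root.glob("SHARD*") if p.is_dir())


if __name__ == "__main__":
    # Print one command per shard directory instead of executing it.
    for shard in shard_dirs(Path(".")):
        print(f"(cd {shlex.quote(str(shard))} && {shlex.join(web_timeout_command(600))})")
```

The 600 second value is the one from the message above; any other positive value can be passed in.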
03:39 🔗 bwn has quit IRC (Ping timeout: 961 seconds)
04:15 🔗 bwn has joined #internetarchive.bak
05:01 🔗 db48x bam
05:01 🔗 db48x I think I've got a script to update a shard from the latest state of IA
05:02 🔗 db48x closure: question for you
05:02 🔗 db48x closure: if I run git-annex rekey on a file that doesn't exist in the repository, what happens?
05:03 🔗 closure nothing
05:05 🔗 closure db48x: does your update script remove files that have been darked?
05:06 🔗 iabak-reg registrar master a1abb0f other SHARD12/pubkeys registration of milenko on SHARD12
05:08 🔗 VADemon has quit IRC (Quit: left4dead)
05:23 🔗 db48x closure: yes
05:23 🔗 db48x closure: though it doesn't notice files that have been removed from non-dark items
05:30 🔗 db48x https://gist.github.com/db48x/66f41a91120266fe195aacec4c5531c2
05:34 🔗 closure db48x: I'm unclear why you're putting disk space checks into iabak, are git-annex's not good enough?
06:10 🔗 sevs has joined #internetarchive.bak
06:22 🔗 db48x closure: with just git-annex it can sit there for hours spewing out error messages about not having enough disk space
06:23 🔗 db48x in the possibly-vain hope that there will be a small file that it can squeeze in
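Editor's sketch: the kind of up-front disk space check db48x describes could look roughly like the following. This is not the code from his gist; the function names and the 1 GiB reserve are assumptions chosen for illustration. The idea is to stop early when even the smallest remaining file cannot fit, rather than letting each download attempt fail in turn.

```python
import shutil


def has_room(free_bytes: int, file_bytes: int, reserve_bytes: int = 0) -> bool:
    """True if a file of file_bytes fits while leaving reserve_bytes untouched."""
    return free_bytes - reserve_bytes >= file_bytes


def should_keep_downloading(path: str, smallest_remaining_bytes: int,
                            reserve_bytes: int = 1 << 30) -> bool:
    """Check the filesystem holding `path` before starting another download pass.

    smallest_remaining_bytes is the size of the smallest file still wanted;
    if that does not fit, no other file will either.
    """
    free = shutil.disk_usage(path).free
    return has_room(free, smallest_remaining_bytes, reserve_bytes)
```

A wrapper script could call `should_keep_downloading(".", size)` once per pass and exit cleanly when it returns False.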
07:13 🔗 kyan has quit IRC (Quit: Leaving)
07:14 🔗 Start has quit IRC (Ping timeout: 506 seconds)
07:31 🔗 iabak-reg registrar master ccef620 other SHARD12/pubkeys registration of removed@gmail.com on SHARD12
07:33 🔗 Start has joined #internetarchive.bak
08:29 🔗 iabak-reg registrar master 22c5e73 other SHARD15/pubkeys registration of alex on SHARD15
08:30 🔗 iabak-reg registrar master 140dd41 other SHARD10/pubkeys registration of alex on SHARD10
08:37 🔗 atomotic has joined #internetarchive.bak
08:37 🔗 atomotic has quit IRC (Connection closed)
09:34 🔗 asktoomuc has joined #internetarchive.bak
09:58 🔗 asktoomuc has quit IRC (Read error: Connection reset by peer)
09:59 🔗 asktoomuc has joined #internetarchive.bak
10:35 🔗 bwn has quit IRC (Ping timeout: 244 seconds)
10:35 🔗 db48x` has joined #internetarchive.bak
10:36 🔗 db48x has quit IRC (Ping timeout: 255 seconds)
11:40 🔗 bwn has joined #internetarchive.bak
12:37 🔗 iabak-reg registrar master f3de9f7 other SHARD9/pubkeys registration of alex on SHARD9
12:55 🔗 asktoomuc has quit IRC (Read error: Connection reset by peer)
12:55 🔗 asktoomuc has joined #internetarchive.bak
13:00 🔗 VADemon has joined #internetarchive.bak
13:05 🔗 asktoomuc has quit IRC (Read error: Connection reset by peer)
13:05 🔗 asktoomuc has joined #internetarchive.bak
13:06 🔗 milenko has quit IRC (Read error: Operation timed out)
13:10 🔗 iabak-reg registrar master b128fc5 other SHARD12/pubkeys registration of milenko on SHARD12
13:10 🔗 iabak-reg registrar master b8b4a58 other SHARD12/pubkeys registration of milenko on SHARD12
13:10 🔗 iabak-reg registrar master 6eaae00 other SHARD12/pubkeys registration of milenko on SHARD12
13:10 🔗 iabak-reg registrar master fd1ae9b other SHARD12/pubkeys registration of milenko on SHARD12
13:11 🔗 iabak-reg registrar master d00b134 other SHARD12/pubkeys registration of milenko on SHARD12
13:12 🔗 iabak-reg registrar master 462226d other SHARD12/pubkeys registration of milenko on SHARD12
13:13 🔗 iabak-reg registrar master dff42d8 other SHARD12/pubkeys registration of milenko on SHARD12
13:16 🔗 iabak-reg registrar master 3258c18 other SHARD12/pubkeys registration of milenko on SHARD12
13:20 🔗 iabak-reg registrar master 7bbf5c4 other SHARD12/pubkeys registration of milenko on SHARD12
14:02 🔗 asktoomuc has quit IRC (Read error: Connection reset by peer)
14:03 🔗 asktoomuc has joined #internetarchive.bak
14:04 🔗 asktoomuc has quit IRC (Client Quit)
14:04 🔗 asktoomuc has joined #internetarchive.bak
14:10 🔗 * db48x` yawns
14:10 🔗 db48x` is now known as db48
14:10 🔗 db48 is now known as db48x
14:10 🔗 * db48x is confused
14:11 🔗 db48x I'm getting a lot of duplicate output from this script
14:28 🔗 milenko has joined #internetarchive.bak
14:50 🔗 Jon 928G 30G 97% /home/iabak! but hm. exactly the same as yesterday. iabak seems to have stalled
14:50 🔗 Jon 78rpm/KanesHawaiians-AlekokiAndLiliuE/KanesHawaiians-AlekokiAndLiliuE_meta.xml has been downloading for >24 hours
14:50 🔗 * Jon might restart it
14:53 🔗 sep332 has joined #internetarchive.bak
15:04 🔗 db48x Jon: sounds like the download failed to time out
15:04 🔗 db48x someone else saw the same thing
15:09 🔗 * Jon kicks it and restarts
15:12 🔗 asktoomuc has quit IRC (Ping timeout: 260 seconds)
15:12 🔗 asktoomuc has joined #internetarchive.bak
15:14 🔗 asktoomuc has quit IRC (Client Quit)
15:17 🔗 bwn has quit IRC (Ping timeout: 961 seconds)
15:21 🔗 db48x I wish I could have brought my monitors on vacation with me
15:22 🔗 sep332 db48x: I downloaded a couple TB of shard1 back before the cron job was written :)
15:22 🔗 sep332 is there something wrong with my shard3?
15:23 🔗 asktoomuc has joined #internetarchive.bak
15:24 🔗 Senji this is the thing where some of the files aren't in the IA any more?
15:48 🔗 db48x sep332: no?
15:49 🔗 db48x but if I sent you a public key and you sent me a public key, and we each set up ssh access for the other, then our iabaks would be able to pull files from each other
15:50 🔗 db48x since a fair portion of the files in the early shards have gone dark, it would bring the percentages up even though the files aren't available from IA any longer
15:53 🔗 sep332 Is there any point in storing files from shard3 that will never be needed for a restore?
15:54 🔗 sep332 ^ SketchCow
15:54 🔗 db48x yes
15:55 🔗 sep332 I mean if the archive deleted them they probably don't want them back...
15:55 🔗 Senji Is it not better to remove them from the shards and free up disk space for other data?
15:56 🔗 db48x sep332: dark items aren't deleted, they're merely not actively distributed
15:56 🔗 bwn has joined #internetarchive.bak
15:57 🔗 SketchCow YESSSSS
15:57 🔗 SketchCow SO MUCH POINT
15:58 🔗 sep332 man... I did not know that
15:58 🔗 sep332 ok
15:58 🔗 SketchCow You are a bad backup.
15:58 🔗 * SketchCow shakes collar
15:58 🔗 * SketchCow slap
15:58 🔗 sep332 grr
15:58 🔗 SketchCow IA sometimes loses stuff due to outside forces
15:58 🔗 SketchCow Backups prevent that from being an issue
15:58 🔗 SketchCow That's a living, breathing reason besides all the other theoretical concerns and general architectural worries.
15:59 🔗 sep332 i'll try to be more rogue in future
16:00 🔗 closure db48x: so, I'm currently adding tor P2P to git-annex :)
16:00 🔗 db48x closure: yes, via Tor
16:01 🔗 db48x on the one hand that is cool
16:01 🔗 db48x on the other hand it's not ssh
16:01 🔗 closure indeed, tossing a lot of data over tor may not be fast
16:03 🔗 * db48x grumbles; the metadata for dark items doesn't include the collection
16:03 🔗 closure if some of the early shards have dark items and only a few copies, it would be a good chance to test out *restore* from iabak
16:04 🔗 db48x closure: my thoughts precisely (was just thinking that in a day or two I'll ambush someone else, but tell them that it's a firedrill for restoring)
16:04 🔗 closure restoring just those files to a server, which could then send them out to other nodes
16:04 🔗 Senji how is restoring going to work?
16:05 🔗 db48x Senji: the plan, as I recall, is to restore files to a repository that is accessible via HTTP(S)
16:05 🔗 closure well, the clients and server have ssh connections. git-annex can upload over ssh. so..
16:06 🔗 db48x yea, that could work as well
16:06 🔗 Senji (I wish I had one of the HSMs I use at work for backing up iabak data)
16:07 🔗 db48x though I figured it would need to be a dedicated server, separate from the server(s) hosting the shards themselves
16:07 🔗 asktoomuc db48x what happened to the files that have gone dark? The IA lost some data? They were set to private collections?
16:07 🔗 db48x asktoomuc: usually it's copyright-related
16:07 🔗 closure certainly would need a different, larger server for restoring a large quantity of data
16:08 🔗 closure all the ssh public keys of clients are checked into a git repo, so other servers can use them
16:08 🔗 db48x closure: oh, that's cool
16:09 🔗 db48x nifty
16:09 🔗 closure (private git repo hosted on gitlab as registration data is somewhat confidential)
16:10 🔗 closure anyway, it would be useful to have a tool to update a shard, removing the IA url for files that the IA no longer has
16:11 🔗 closure then, the restore command on the client is something like: git annex copy --not --in web --to restoreserver
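Editor's sketch: the restore command closure gives above, wrapped so a client script could build it for any remote. This is a minimal sketch, assuming a git-annex remote named `restoreserver` has already been configured on the client; the function name `restore_command` is hypothetical. The `--not --in web` part selects files that git-annex no longer believes are available from the web, which is why the shard's IA URLs need to be removed first.

```python
import shlex
from typing import List


def restore_command(remote: str = "restoreserver") -> List[str]:
    """Build `git annex copy --not --in web --to <remote>` as an argv list."""
    if not remote:
        raise ValueError("remote name must not be empty")
    return ["git", "annex", "copy", "--not", "--in", "web", "--to", remote]


if __name__ == "__main__":
    # Print the command rather than running it, so it can be reviewed first.
    print(shlex.join(restore_command()))
```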
16:12 🔗 db48x I'm working on that tool, but it's giving me a headache
16:13 🔗 db48x I need a DAG of pipes
16:13 🔗 closure lemme know if the headache is on the git-annex side..
16:14 🔗 db48x closure: so far I'm just dumping the git-annex commands into a file, so that I can look at what I'm generating
16:14 🔗 closure sounds like you need a real programming language :)
16:14 🔗 SketchCow A restore is definitely on the cards in December
16:14 🔗 SketchCow That's the plan
16:14 🔗 SketchCow I just wanted us to continue the refinements and dealing we're going with at the moment.
16:14 🔗 SketchCow Since it's super important.
16:14 🔗 db48x closure: it's getting more and more likely
16:15 🔗 closure cut your losses and use go or whatever
16:15 🔗 db48x I can see a haskell library that can make a DAG of pipes in my mind
16:16 🔗 db48x maybe it is a mirage
16:17 🔗 closure there's one called "pipes" :P
16:17 🔗 db48x heh :)
16:17 🔗 closure "Elegant semantics: Use practical category theory "
16:17 🔗 closure only in haskell library descriptions
16:23 🔗 db48x heh, nice description
16:23 🔗 db48x a question occurs
16:24 🔗 db48x what do we want to do if a shard currently contains an item, but then it gets moved out of the collection that the shard contains?
16:24 🔗 db48x it's not dark, so we don't want to rmurl it
16:29 🔗 ask_ has joined #internetarchive.bak
16:30 🔗 db48x https://gist.github.com/db48x/66f41a91120266fe195aacec4c5531c2#file-update-shard-sh-L24
16:30 🔗 db48x so very unpipelined
16:34 🔗 SketchCow http://iabak.archiveteam.org/SHARD4.html
16:34 🔗 SketchCow I'm fascinated at how it seems to stop
16:34 🔗 SketchCow Is that because we need more people? Or something else.
16:37 🔗 asktoomuc has quit IRC (Ping timeout: 740 seconds)
16:38 🔗 Senji Hmm, apparently I have an unregistered shard on that shard. I should fix that up when I'm back at home tomorrow :-D
16:42 🔗 yipdw SketchCow: shard4 contains the 9/11 TV Archive, which is stream-only; I haven't checked this, but I think that translates to "cannot download"
16:45 🔗 ask_ how do you fix unregistered segments? I think I have a couple
16:56 🔗 Jon hmm I still seem to be invisible/unregistered on iabak pages. wonder why.
16:56 🔗 Jon done 1T of shard3
16:58 🔗 asktoomuc has joined #internetarchive.bak
17:02 🔗 Frogging I've got 786G of shard4
17:02 🔗 asktoomuc any idea why I get those? "git-annex: unable to decommit memory: Invalid argument"
17:04 🔗 Frogging what distro?
17:08 🔗 db48x this feels like it's working: https://gist.github.com/db48x/66f41a91120266fe195aacec4c5531c2
17:09 🔗 db48x I'll check back after lunch
17:10 🔗 kula has joined #internetarchive.bak
17:15 🔗 sevs has quit IRC (Quit: Page closed)
17:18 🔗 sevs has joined #internetarchive.bak
17:26 🔗 asktoomuc that's annoying
17:26 🔗 asktoomuc it looks like these errors are killing my process, hence I haven't backed up anything in the last 24 hours
17:41 🔗 Kenshin db48x: i need to safely shutdown the iabak server for maintenance. there seems to be tmux sessions running, any idea who's running those and how can i arrange to get them safely stopped?
17:51 🔗 Kaz sec
17:51 🔗 Kaz it's probably me
17:51 🔗 Kaz mine is gone
17:53 🔗 asktoomuc I feel like something isn't working properly. On shard3 (822MB on my side): "git-annex: unable to decommit memory: Invalid argument Wow! I'm done downloading this shard of the IA!"
17:54 🔗 asktoomuc man this is hard. I have been at it for the past few days and I'm having a really hard time making it work on a pretty standard Debian 8 install. I wonder how you guys do it...
17:58 🔗 Kenshin Kaz: nah it's db48x mostly
18:03 🔗 Frogging asktoomuc: what's your kernel version, and is there anything non-standard about your setup?
18:03 🔗 Frogging any changed options, etc
18:04 🔗 asktoomuc it's 3.16.0-4-amd64
18:05 🔗 asktoomuc and no, I just installed it as usual in a VM from the Debian iso: debian-8.6.0-amd64-netinst.iso
18:05 🔗 asktoomuc the share drive is mounted using NFS (I installed nfs-common and git and git-annex from apt-get)
18:05 🔗 asktoomuc so I can't see anything out of place really
18:06 🔗 asktoomuc I was able to download ~80G this weekend but since I updated the script to the latest version yesterday, it doesn't seem to work for me anymore
18:07 🔗 closure probably need to change the git-annex build to the i386ancient one to support these old kernels
18:09 🔗 asktoomuc old? XD
18:09 🔗 asktoomuc that's the Debian stable version
18:09 🔗 asktoomuc but I can deploy it in a Debian 10 install maybe
18:09 🔗 asktoomuc what's the minimum version of the kernel recommended?
18:10 🔗 closure https://downloads.kitenet.net/git-annex/linux/current/git-annex-standalone-i386-ancient.tar.gz
18:10 🔗 closure untar and replace git-annex.linux directory in the iabak repo with that
18:10 🔗 sep332 kernel 3.16 was released August 2014
18:12 🔗 asktoomuc well okay, I love Debian but I'll admit they are lagging behind with their stable version
18:12 🔗 Frogging I'm running Debian 8 as well but I realized I also updated my kernel to 4.7.3
18:13 🔗 asktoomuc thanks closure
18:13 🔗 Frogging it might be the kernel taking issue with git-annex doing things with a 1TB vm allocation
18:13 🔗 Frogging which I've noticed it's doing on mine
18:14 🔗 closure it's apparently a new way to free memory which libc has started using w/o caring if the kernel supports it
18:14 🔗 closure not sure why we need a new way to free memory in 2016
18:14 🔗 Frogging munmap()? :p
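Editor's sketch: a quick way to check whether a machine's kernel is older than a given release before choosing a git-annex build. The log does not say which kernel version is required. The default threshold of 4.5 below is an assumption, based on the guess that the failing call is `madvise` with `MADV_FREE`, which to the editor's knowledge first appeared in Linux 4.5. Pass a different `minimum` if that guess is wrong.

```python
import platform
import re
from typing import Tuple


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '3.16.0-4-amd64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_at_least(release: str, minimum: Tuple[int, int] = (4, 5)) -> bool:
    """Compare a release string against a (major, minor) threshold.

    The (4, 5) default is an assumption, not a figure from the log.
    """
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    release = platform.release()
    print(release, "meets the assumed minimum:", kernel_at_least(release))
```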
18:15 🔗 asktoomuc hmmm ok, I'll try the package you gave me and if that doesn't work, I'll scrap the VM and get a Ubuntu server or Debian unstable
18:15 🔗 Frogging or just update the kernel
18:15 🔗 asktoomuc apt-get dist-upgrade or is it more involved?
18:16 🔗 Frogging I'm actually not sure, I compiled mine because of reasons, but there might well be a "friendlier" way
18:16 🔗 asktoomuc I'm not bad with a computer but very far from being a sysadmin
18:16 🔗 asktoomuc might be easier to just reinstall a latest distro in that case
18:17 🔗 Frogging here https://packages.debian.org/jessie-backports/linux-image-amd64
18:17 🔗 Frogging you should be able to just add jessie-backports to your apt sources
18:18 🔗 Frogging https://backports.debian.org/Instructions/#index2h2
18:19 🔗 Frogging ah and there's a special command to install from backports
18:19 🔗 Frogging on that page
18:20 🔗 asktoomuc oh ok, thanks Frogging
18:20 🔗 Meroje packages from backports have lower priority, that's why you need this command
18:20 🔗 Frogging yeah^
18:20 🔗 asktoomuc learn something new everyday
18:20 🔗 Frogging maybe it's to avoid accidentally dragging the whole system forward with dependencies when installing stuff
18:21 🔗 Frogging which can happen if you add unstable repos to your sources and such :p
18:21 🔗 Meroje yes, you either opt-in individually, or override all priorities at once
18:22 🔗 ask_ has quit IRC (Quit: Bye)
18:29 🔗 komarEX has joined #internetarchive.bak
18:29 🔗 komarEX hello
18:29 🔗 komarEX guys git annex is so slow with huge number of files that I could cry a river
18:30 🔗 komarEX git repack -ad && git gc && git update-index --index-version 4
18:30 🔗 komarEX should I ?
19:11 🔗 VADemon has quit IRC (Quit: left4dead)
19:44 🔗 Kaz Right, not too sure what on earth is happening here
19:44 🔗 Kaz the script has decided to attempt bittorrent downloads
19:44 🔗 Kaz I've gone through and disabled the bittorrent remote (or so I thought)
20:37 🔗 atomotic has joined #internetarchive.bak
21:15 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
21:39 🔗 asktoomuc what's the best way to run the script once logged out? using screen? launching the script in the background with "&"
21:41 🔗 asktoomuc I already ran the "loginctl enable-linger user" as recommended
21:41 🔗 sep332 screen works
21:41 🔗 Meroje then screen works
21:41 🔗 Meroje it installs a crontab too
21:56 🔗 asktoomuc are those expected? "Continuing any downloads that were previously interrupted... flock: cannot open lock file .git/annex/iasyncer.lock: No such file or directory"
21:57 🔗 asktoomuc I think I will start again from scratch, too many issues
21:58 🔗 asktoomuc I must have broken something
22:02 🔗 thelsdj i get that flock line all the time, doesn't seem to affect anything
22:18 🔗 komarEX has quit IRC (Quit: Page closed)
22:18 🔗 iabak-reg registrar master 2677c00 other SHARD14/pubkeys registration of roninfight on SHARD14
22:20 🔗 asktoomuc has quit IRC (Quit: Page closed)
22:25 🔗 iabak-reg registrar master 058800e other SHARD14/pubkeys registration of roninfight on SHARD14
22:29 🔗 iabak-reg registrar master 448a743 other SHARD14/pubkeys registration of roninfight on SHARD14
22:30 🔗 iabak-reg registrar master 885dc43 other SHARD14/pubkeys registration of roninfight on SHARD14
22:34 🔗 iabak-reg registrar master e429741 other SHARD14/pubkeys registration of roninfight on SHARD14
22:40 🔗 asktoomuc has joined #internetarchive.bak
22:40 🔗 asktoomuc has quit IRC (Client Quit)
22:50 🔗 bwn has quit IRC (Ping timeout: 244 seconds)
23:03 🔗 sep332 has quit IRC (konversation out)
23:14 🔗 thelsdj do some shards have their stats update more often than others? I have 190G from shard10 and have for over a day, shard15 I have 150G, neither of those appear to be updated, but I am showing up on shard14 and shard16
