#internetarchive.bak 2016-11-19,Sat


Time Nickname Message
00:03 bwn has joined #internetarchive.bak
02:07 Start has quit IRC (Read error: Connection reset by peer)
02:08 Start_ has joined #internetarchive.bak
02:34 db48x heh, I've been letting iabak download some stuff to test the new code I wrote
02:34 db48x but I just realized that it's working on an item with ~650 files
02:35 db48x 80+ GB
02:37 db48x https://archive.org/details/13jany2014warcs
02:38 patrickod has quit IRC (Quit: ZNC - http://znc.in)
02:38 patrickod has joined #internetarchive.bak
02:40 patrickod has quit IRC (Client Quit)
02:40 patrickod has joined #internetarchive.bak
02:43 db48x` has joined #internetarchive.bak
02:45 db48x has quit IRC (Ping timeout: 255 seconds)
02:54 Start_ is now known as Start
04:28 db48x` oops, no wonder
04:28 db48x` I made a list of 5 items, then counted from 1 to 6 to download them
05:51 db48x` is now known as db48x
06:03 yipdw Kaz: yes
06:03 yipdw what's up
06:51 db48x yipdw: not sure what he was going to ask, but the stats aren't getting to graphite
06:52 yipdw hmm
06:52 db48x want to check it out?
06:52 db48x he fixed the cronjobs, which apparently had vanished
06:52 yipdw I can poke at it slowly over the next day or two
06:52 db48x ah :)
06:52 db48x well, I have a few hours before I sleep
06:56 db48x a lot of exceptions
07:00 db48x well, carbon is getting lots of connections
07:00 db48x 19/11/2016 02:00:34 :: MetricLineReceiver connection with 127.0.0.1:37932 established
07:00 db48x 19/11/2016 02:00:34 :: MetricLineReceiver connection with 127.0.0.1:37932 closed cleanly
07:00 db48x 19/11/2016 02:00:34 :: MetricLineReceiver connection with 127.0.0.1:37933 established
07:00 db48x 19/11/2016 02:00:34 :: MetricLineReceiver connection with 127.0.0.1:37933 closed cleanly
07:06 bwn has quit IRC (Ping timeout: 961 seconds)
07:06 db48x /var/lib/graphite/whisper/iabak/shardstats/connections/all.wsp has a very recent modification time
07:16 bwn has joined #internetarchive.bak
07:28 kyan has quit IRC (Quit: Leaving)
10:45 bwn has quit IRC (Ping timeout: 244 seconds)
10:56 atomotic has joined #internetarchive.bak
11:07 bwn has joined #internetarchive.bak
11:15 asktoomuc has joined #internetarchive.bak
12:14 Whopper has joined #internetarchive.bak
12:18 Optical has joined #internetarchive.bak
12:20 Optical I was wondering, guys: since you are sort of an online group, have you tried reaching out to Seagate or other HDD companies to sponsor you with HDDs for IA.BAK and other projects?
12:20 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
12:47 sevs has joined #internetarchive.bak
12:53 sevs db48x: you here?
13:08 atomotic has joined #internetarchive.bak
13:17 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
13:41 iabak-reg registrar master fd84ce3 other SHARD15/pubkeys registration of ronin_fight on SHARD15
13:53 iabak-reg registrar master 53598d0 other SHARD12/pubkeys registration of roninfight on SHARD12
13:57 asktoomuc even with NFS enabled (to test), this process is anything but straightforward. I'm now getting "Connection refused" errors to .git/annex/ssh/SHARD12@iabak.archiveteam.org
13:57 asktoomuc not sure what I'm doing wrong
14:17 kurt db48x / yipdw I haven't added the cronjobs back in - I was running the scripts manually to see if it actually ended up giving us updated graphs
14:26 trs80 asktoomuch: can you pastebin the error? we'll need more details to work out what's going on
15:43 cmaldonad has joined #internetarchive.bak
15:44 cmaldonad has quit IRC (Client Quit)
16:24 Optical has quit IRC (Quit: Page closed)
16:29 asktoomuc http://pastebin.com/E9gdpN1Q
16:31 iabak-reg registrar master 2af9b61 other SHARD12/pubkeys registration of removed on SHARD12
16:33 iabak-reg registrar master 9391ff2 other SHARD12/pubkeys registration of removed@gmail.com on SHARD12
16:50 db48x asktoomuc: do ssh control master connections generally work for you, or is the error message just saying that you couldn't connect to iabak.archiveteam.org?
16:54 db48x asktoomuc: http://stackoverflow.com/questions/36459785/shared-ssh-connection-with-control-master-not-working perhaps?
16:54 db48x oh, is this another NFS-or-SMB thing?
16:58 iabak-reg registrar master 4ed30a9 other SHARD12/pubkeys registration of removed@gmail.com on SHARD12
17:20 asktoomuc db48x: NFS this time round
17:21 asktoomuc as SMB doesn't seem to support symlinks. I created a special share with NFS enabled for it
17:22 asktoomuc I'm not sure what shared ssh or control master are tbh. I have just installed Debian 8 and I'm running the process from it
17:23 db48x it's a way for multiple SSH processes to share the same TCP connection, when they're all talking to the same server
17:24 db48x it makes it quicker to start multiple concurrent downloads, since they only have to set up the connection once
17:25 db48x but it requires creating a unix socket file
17:25 asktoomuc ok, I see
17:25 asktoomuc do I need any special configuration to make it work?
17:26 db48x hmm
17:26 asktoomuc I'm not too sure what's wrong from the logs. I have linked the pastebin of the execution as you've probably seen
17:27 db48x yes, see lines 167 and 168
17:28 db48x that's git printing out a message that it couldn't use the socket file we specified
17:28 asktoomuc hmmm ok
17:29 db48x or rather, it's git-annex that specifies where the socket should be stored
17:32 db48x ok, this is configurable in git-annex
17:32 db48x go into the shard directory and run 'git config annex.sshcaching false'
17:33 db48x of course, as soon as it checks out a new shard that shard will be broken
17:33 asktoomuc that command isn't supposed to return anything, is it?
17:34 db48x no
17:34 db48x you can run git config --list to see what settings are set
17:35 db48x I will think about a way to make this easier
17:36 asktoomuc http://pastebin.com/uCHZZfXb
17:36 asktoomuc looks like the setting is set
17:37 asktoomuc can I run the main iabak script in the parent directory after that?
17:50 sevs db48x: you should be able to put git options into the ANNEXGETOPTS file, I think there is a switch for that
17:52 sevs -c name=value Overrides git configuration settings.
17:54 asktoomuc hmmm this didn't fix it
17:55 asktoomuc "./iabak-helper: line 162: 5400000000000 - : syntax error: operand expected (error token is "- ")"
17:56 asktoomuc http://pastebin.com/7YiUKJ7k
17:56 asktoomuc sorry guys, I'm just trying to contribute but I end up creating a lot of problems...
17:57 sevs just had that a couple hours ago
17:57 sevs one sec
17:59 sevs go to the shard dir, and set the git config variable annex.diskreserve=100M
18:01 asktoomuc now that I think about it, the script never asked me how much free space I wanted to keep
18:01 asktoomuc I seem to remember reading about that in the wiki
18:02 asktoomuc "It should prompt you for how much disk space to not use. To adjust this value later, use git config annex.diskreserve 200GB in all of the IA.BAK/shard* directories."
18:02 asktoomuc yep, it didn't
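Both the wiki's `annex.diskreserve` advice and db48x's `annex.sshcaching false` workaround are per-shard settings, and, as db48x notes, any shard checked out later starts without them. A minimal sketch of applying both in one pass over every `shard*` directory, assuming each shard is an ordinary git checkout with a `.git` directory. The function name `configure_shards`, its arguments, and the `200GB` default (the wiki's example value) are hypothetical choices, not part of iabak, and the `sshcaching` line is only needed where SSH control sockets cannot be created (for example on NFS or SMB mounts).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of iabak): apply per-shard git-annex settings.
# Usage: configure_shards [IA.BAK directory] [reserve]
configure_shards() {
    local base=${1:-.} reserve=${2:-200GB} shard
    for shard in "$base"/shard*/; do
        # Skip anything that is not a git checkout (including an unmatched glob).
        [[ -d ${shard}.git ]] || continue
        git -C "$shard" config annex.diskreserve "$reserve"
        # Only needed where ssh control-master sockets cannot be created (NFS/SMB).
        git -C "$shard" config annex.sshcaching false
    done
}
```

Rerunning it after iabak checks out a new shard brings that shard in line with the others.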
18:04 sevs yeah, on one machine it was set, on another it wasn't
18:05 sevs db48x: I believe this was introduced with the newest commit to https://github.com/ArchiveTeam/IA.BAK
18:08 db48x yep
18:08 db48x annoyingly I thought I handled the case where there was no reserve set
18:08 db48x except right there, obviously
18:09 asktoomuc :)
18:10 asktoomuc well at least I'm happy that I'm helping with finding issues in the current process
18:10 asktoomuc it seems to be downloading now
18:10 sevs if i got my bits of bash right you tried with "${GIT} config annex.diskreserve || true" except it should be "|| 0"
18:11 db48x sevs: that's a good idea
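The error at iabak-helper line 162 comes from an empty reserve: when `annex.diskreserve` is unset, the expression expands to `5400000000000 - ` and bash arithmetic has no right-hand operand. Note that in bash a literal `|| 0` would try to run a command named `0`; the usual idioms are `|| echo 0` after a command substitution, or a `${var:-0}` default. A minimal sketch of the guarded subtraction, assuming both sizes are already in bytes; `available_bytes` is a hypothetical name, not an iabak function.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: free space minus the reserve, in bytes.
# An empty or missing reserve is treated as 0 instead of breaking $(( )).
available_bytes() {
    local free=$1 reserve=${2:-0}
    echo $(( free - reserve ))
}

available_bytes 5400000000000 100000000   # 5399900000000, as in the log
available_bytes 5400000000000 ""          # 5400000000000
```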
18:11 sevs asktoomuc: yay!
18:11 asktoomuc the downloading seems a bit slow (~500KB/s) but I can live with that
18:13 sevs asktoomuc: by default it only downloads files sequentially
18:14 sevs you can put a file ANNEXGETOPTS in your ia.bak dir
18:14 asktoomuc yeah, I saw a message about enabling concurrent downloads, but I don't know how to do that, and I'm worried it won't work with my setup, which doesn't seem to support control master connections
18:14 asktoomuc oh ok, well I'll try that
18:14 sevs with the content "-J<num>"
18:14 kurt bit of an edge case here really, but on the subject of concurrent downloads
18:14 asktoomuc the message says: "(Not using enough bandwith? Enable concurrent downloads with: echo -J5 > ANNEXGETOPTS)"
18:14 sevs exactly
18:15 kurt if you've got more than one concurrent download going and then kill the script, it'll finish them off one by one once you restart it
18:15 kurt any way to avoid that?
18:15 asktoomuc so I create ANNEXGETOPTS. What's a sensible value for n on a VM running on a Core i7 with enough RAM and a 1Gbps symmetrical connection?
18:16 db48x it'll work, it'll just make an extra TCP connection for each concurrent download
18:16 db48x kurt: tweak the rundownloaddirs function and send us a pull request :)
18:16 sevs yes? i remember deleting everything in <shard>/.git/annex/tmp/ worked at some point
18:17 asktoomuc *value for J sorry
18:17 sevs at least once that worked, no idea if i was just lucky
18:19 sevs asktoomuc: start with 10, see where that gets you
18:19 asktoomuc ok thanks!
18:20 sevs do you have iftop? wonderful tool, shows the traffic/sec
18:20 asktoomuc nope, I'll install the package now
18:21 sevs in the lower right corner you see the total incoming and outgoing rate
18:27 Deewiant has quit IRC (Quit: Viivan loppu.)
18:30 kurt sevs: looks like that's just restarted the downloads :(
18:31 sevs puhh, yeah, no idea
18:31 db48x yes, it'll restart any downloads that were interrupted
18:32 db48x but it doesn't do so concurrently
18:32 db48x once those are done it'll go back to normal operation, which can be concurrent
18:33 sevs the one time I did this I got "Hey, you have shuf installed ..."
18:33 db48x yes, you'll get that next
18:33 db48x see https://github.com/ArchiveTeam/IA.BAK/blob/master/iabak-helper#L144-L149
18:36 kurt I know what the normal operation is - just wondering if there were a way to have it 'forget' that it's got some partly-finished downloads so it goes straight to concurrent downloads
18:37 kurt my issue is that huge 20gb files at 10mbit/s each 40 times is time inefficient
18:37 kurt I'll have a poke around anywho
18:39 db48x kurt: sure, run git annex unused
18:40 db48x then git annex dropunused
18:42 asktoomuc I keep getting that message "Filled up available disk space, so stopping here!"
18:43 sevs db48x: shouldn't it be possible to run the list from "git annex unused" through the same "| dirname_pipe | sumofbytes | shuffle | rundownloads" pipeline?
18:43 asktoomuc "Wow! I'm done downloading this shard of the IA!" <= that seems too good to be true
18:44 sevs asktoomuc: which shard were you working on?
18:44 asktoomuc shard 12
18:44 sevs might be that there were enough copies
18:44 asktoomuc I chose for me, I didn't specify anything
18:44 asktoomuc *it
18:45 sevs yeah
18:45 sevs you *do* have space free?
18:45 asktoomuc it says the shard directory is 641M (du -sh)
18:45 asktoomuc I have ~6TB free
18:45 sevs hmm
18:46 db48x what is your annex.diskreserve setting in shard12?
19:06 bwn has quit IRC (Ping timeout: 964 seconds)
19:09 asktoomuc annex.diskreserve=100M
19:09 asktoomuc how do you choose which shard you download?
19:09 asktoomuc and why does the script exit after just finishing 1 shard?
19:12 sevs has quit IRC (Ping timeout: 268 seconds)
19:14 kyan has joined #internetarchive.bak
19:15 db48x it's not supposed to
19:17 asktoomuc it probably has to do with my "Filled up available disk space" error message then
19:20 Kenshin who's managing the iabak node currently?
19:20 Kenshin i need to move the VM to another machine with more space, and also IP change
19:21 HCross db48x and closure
19:21 HCross probably need SketchCow for DNS
19:23 db48x Kenshin: why do you say that?
19:24 Kenshin db48x: ?
19:24 Kenshin db48x: what do you mean
19:25 db48x oh, I misread
19:28 db48x asktoomuc: so, what does "df -Ph ." print out on your system?
19:31 asktoomuc http://pastebin.com/sKGhShBx
19:31 db48x you forgot the .
19:31 db48x but presumably it's just the first and last lines of that
19:35 asktoomuc sorry
19:35 asktoomuc Filesystem Size Used Avail Use% Mounted on /dev/sda1 1.9G 1012M 726M 59% /
19:36 db48x ok, so iabak did the right thing
19:37 db48x you have 700M available and wanted to reserve 100M, so it downloaded ~600M of stuff :)
19:38 asktoomuc wrong directory...
19:38 asktoomuc man I'm useless
19:39 asktoomuc root@IABAK-VM:/mnt/IABAK/IA.BAK# df -Ph . Filesystem Size Used Avail Use% Mounted on 192.168.11.98:/mnt/user/IABAK 7.3T 1.9T 5.4T 26% /mnt/IABAK
19:39 iabak-reg registrar master 16a57e4 other SHARD12/pubkeys registration of hcross on SHARD12
19:40 asktoomuc but I'm thinking what you were saying is related somehow. It's suspicious that it stopped at ~640M when the space available on the main disk is 726M
19:44 bwn has joined #internetarchive.bak
19:48 SketchCow ?
19:56 yipdw asktoomuc: that's the reserve behavior; that seems expected to me
19:59 db48x yipdw: except that he actually has terabytes available in the mount
20:00 db48x asktoomuc: just for kicks, what do you get when you run it in the shard directory?
20:05 iabak-reg registrar master fa6c2de other SHARD10/pubkeys registration of Kaz on SHARD10
20:06 yipdw oh hmm
20:06 yipdw I missed that
20:08 asktoomuc when I run what, db48x?
20:22 asktoomuc this? root@IABAK-VM:/mnt/IABAK/IA.BAK/shard12# df -Ph . Filesystem Size Used Avail Use% Mounted on 192.168.11.98:/mnt/user/IABAK 7.3T 1.9T 5.4T 26% /mnt/IABAK
20:24 iabak-reg registrar master 2e6043e other SHARD16/pubkeys registration of Kaz on SHARD16
20:30 thelsdj I'm having the out of disk space problem as well; it only started when I re-ran iabak, which updated its code from git
20:31 thelsdj I have 8.8T free (what df says) and it's set to save 7TB
20:34 thelsdj Filesystem Size Used Available Capacity Mounted on
20:34 thelsdj /dev/sda1 16.0T 7.1T 8.8T 45% /mnt/DroboFS
20:34 thelsdj annex.diskreserve=7TB
20:34 thelsdj Checking for any items that still need to be downloaded...
20:34 thelsdj oops, out of disk space
20:40 thelsdj tried setting it in GB, so 7168GB, and got the same problem; now trying with the limit removed to verify that it's the cause
20:41 thelsdj now it says i'm done with the shard, is shard16 only 202G? or is that another bug?
20:46 thelsdj huh, so i removed NOMORE and it won't even start another shard, says it's used all the disk space (even though I don't have a limit set in my existing shard)
20:49 kurt diskreserve = how much it'll keep free, no?
20:50 asktoomuc yeah
20:50 kurt and /mnt/IABAK has 5.4TB free?
20:51 kurt or are you no longer using /mnt/IABAK
20:51 asktoomuc no no, I'm indeed using it. And yes, it has 5.4TB free
20:52 kurt so you want to keep 7TB free
20:52 asktoomuc so I should be using it until it's almost full with the current setting
20:52 kurt you have 5.4TB free
20:52 kurt and you're wondering why it won't download more?
20:52 asktoomuc no, that's thelsdj
20:52 kurt you are correct, I am an idiot
20:52 kurt names are hard, I want nick colors back, sorry
20:53 asktoomuc no worries
20:54 db48x kurt: :)
20:55 db48x so apparently I broke how it measures the free disk space?
20:55 db48x asktoomuc: can you do some debugging for me?
20:55 thelsdj db48x: i can as well, yeah you seem to have broken it
20:55 asktoomuc sure, thanks for trying to help
20:56 iabak-reg registrar master 6193808 other SHARD10/pubkeys registration of thelsdj on SHARD10
20:56 thelsdj ok so hmm
20:56 thelsdj i set the annex.diskreserve on the IA.BAK directory as well
20:57 thelsdj and now it seems to be working
20:57 db48x https://gist.github.com/anonymous/41195d51c16e880df5e67c62fb46cca6
20:57 db48x ah, hmm
20:59 thelsdj also, sort of unrelated, how can i double-check that I can believe it when it tells me it's finished getting a shard? is shard16 only 202G? i thought it was more than that
21:00 kurt heh, now I don't even have concurrent grabs at all
21:00 db48x forgot one function: https://gist.github.com/db48x/b079eaf83d33361d28c8115e8e5352da
21:01 db48x thelsdj: you can use git annex list --all to see which remotes have copies of which files
21:01 db48x you can use git annex list --not --copies 4 to see a list of all files that don't have enough copies
21:02 db48x asktoomuc: if you can download the test.sh from that second gist and source it, then you'll be able to call the functions and make sure they work correctly
21:03 db48x for example, bytesFromSize $(annexreserved) should print out 100000000
21:03 db48x and bytesFromSize $(diskfree) should print out 5400000000000
21:03 thelsdj line 13 has syntax error i think
21:03 asktoomuc I need a tiny bit more hand-holding, sorry
21:04 asktoomuc I can download the file, where do you want me to put it and what should I do with it?
21:04 db48x asktoomuc: put it in IA.BAK
21:04 thelsdj no, just my shell was weird
21:04 db48x then use the "source" command to add it to your current environment
21:04 db48x (basically "source test.sh")
21:05 db48x thelsdj: it almost has LTS, but if you get an error message let me know what it is :)
21:05 thelsdj needs 'pow' as well
21:05 db48x ah, right
21:06 asktoomuc ok, copied and sourced
21:06 asktoomuc root@IABAK-VM:/mnt/IABAK/IA.BAK# bytesFromSize $(annexreserved) 100000000
21:06 db48x updated: https://gist.github.com/db48x/b079eaf83d33361d28c8115e8e5352da
21:07 asktoomuc root@IABAK-VM:/mnt/IABAK/IA.BAK# bytesFromSize $(diskfree) 5400000000000
21:07 db48x asktoomuc: ok, excellent
21:07 db48x asktoomuc: that rules out a class of problems
21:07 db48x asktoomuc: although for completeness, let's make sure that subtraction works :)
21:07 thelsdj so for me annexreserved is 0
21:07 thelsdj but i don't think that's right
21:08 db48x echo $(($(bytesFromSize $(diskfree)) - $(bytesFromSize $(annexreserved))))
21:08 asktoomuc root@IABAK-VM:/mnt/IABAK/IA.BAK# echo $(($(bytesFromSize $(diskfree)) - $(bytesFromSize $(annexreserved)))) 5399900000000
21:08 db48x asktoomuc: perfect
21:08 db48x thelsdj: annexreserved tries to be smart
21:09 db48x if you run annexreserved . it looks in the current directory
21:09 db48x if you just run annexreserved it looks in a shard* subdirectory of the current directory
21:10 db48x asktoomuc: ok, another thing we can do is to run iabak-helper with debugging output
21:10 thelsdj oh i guess my new shard wasn't really set up yet and didn't have the reserve
21:11 db48x thelsdj: ok, that's a bug we need to track down separately
21:11 db48x asktoomuc: if you edit iabak-helper and add "set -x" as the second line, then when you run iabak it'll be super verbose
21:11 db48x then I can read that output and see what's going on
21:11 asktoomuc ok, let's give that a try
21:12 db48x I'll be back in 5 minutes
21:19 iabak-reg registrar master a3e059a other SHARD15/pubkeys registration of thelsdj on SHARD15
21:19 iabak-reg registrar master 62053cb other SHARD16/pubkeys registration of octobyt3 on SHARD16
21:20 thelsdj + available=-6992000000000000
21:20 thelsdj + [[ -6992000000000000 -gt 34359738368 ]]
21:20 thelsdj + [[ -6992000000000000 -gt 0 ]]
21:20 thelsdj + echo 'oops, out of disk space'
21:20 thelsdj hmmm
21:20 thelsdj too much space free maybe?
21:20 thelsdj lol
21:23 kurt are you running FreeNAS or something? could try changing the dataset quota if so
21:27 thelsdj db48x: so your bytesFromSize returns different things if I do 7TB or 7T
21:27 thelsdj maybe that's the bug?
21:28 thelsdj or a bug, not sure, still messing and trying to figure this out
21:32 db48x thelsdj: ah, indeed
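thelsdj found that `bytesFromSize` gives different results for `7TB` and `7T`. A minimal sketch of a size parser that treats the two spellings the same, assuming decimal (SI) multipliers, which is consistent with `100M` becoming 100000000 and `5.4T` becoming 5400000000000 earlier in the log. It is not iabak's actual `bytesFromSize`, and `size_to_bytes` is a hypothetical name. Bash arithmetic is integer-only, so the fractional digits are handled as a string rather than with floating point.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not iabak's bytesFromSize: convert sizes such as
# "100M", "5.4T", "7T" or "7TB" to bytes using decimal (SI) multipliers.
size_to_bytes() {
    local size=${1%[Bb]}            # accept an optional trailing B: "7TB" -> "7T"
    local unit=${size: -1} exp=0
    case $unit in
        [Kk]) exp=3 ;;
        [Mm]) exp=6 ;;
        [Gg]) exp=9 ;;
        [Tt]) exp=12 ;;
        *) unit="" ;;               # no suffix: the value is already in bytes
    esac
    local number=${size%"$unit"} int frac=""
    int=${number%%.*}
    [[ $number == *.* ]] && frac=${number#*.}
    # Pad the fractional digits with zeros, keep exactly `exp` of them, and
    # append them to the integer part (5.4 with exp=12 -> "5" + "400000000000").
    frac=${frac}000000000000
    frac=${frac:0:exp}
    echo $(( 10#${int}${frac} ))
}
```

With this parser `7T`, `7TB` and `7000G` all come out as 7000000000000.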
21:33 asktoomuc on my side, rerunning it with set -x seems to have changed something. It is still downloading for now and hasn't aborted with the no space message yet
21:34 db48x asktoomuc: heh, ok. can you scroll back up to just before it started downloading?
21:34 asktoomuc sure
21:35 asktoomuc there's a bunch of stuff there
21:35 db48x look for this:
21:35 db48x + echo 'Checking for any files that still need to be downloaded...'
21:35 db48x + periodicsync
21:35 db48x Checking for any files that still need to be downloaded...
21:35 db48x + find_insufficient_copies
21:36 db48x and capture the log down until it starts downloading something
21:36 db48x (personally, I use tmux, so I can search backwards through the back buffer for the string 'find_insufficient_copies' and I'm there; perhaps you can do something similar in your setup)
21:37 asktoomuc http://pastebin.com/fVGUgQbv
21:37 db48x ok, your flock is breaking too, but that's not the cause of this problem
21:37 asktoomuc I'm not, and PuTTY is very annoying because it keeps scrolling down the window when updating the download graph
21:38 asktoomuc hopefully I captured everything you needed
21:38 db48x yes, it looks good
21:38 db48x + available=5399900000000
21:38 db48x so it knows that it has plenty of space available
21:39 db48x + [[ 5399900000000 -gt 34359738368 ]]
21:39 db48x + available=34359738368
21:39 db48x + [[ 34359738368 -gt 0 ]]
21:39 db48x + spacelimit=34359738368
21:39 db48x it limits it to a cap that I put in on a whim
21:40 db48x ah:
21:40 db48x + read -d '' bytes filename
21:40 db48x + [[ 387325952 -lt 34359738368 ]]
21:40 db48x + spaceneeded=387325952
21:40 db48x + files+=(${filename})
21:40 db48x + read -d '' bytes filename
21:40 db48x + numfiles=1
21:40 db48x we read a name from the list of things to download, and end up only finding one thing
21:40 db48x occupying 387MB
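The trace above shows the batching step: `bytes filename` records are read one at a time with `read -d ''`, and files are added while `spaceneeded + bytes` stays under `spacelimit`. A minimal sketch of that loop, assuming NUL-terminated input and that the batch simply ends at the first file that would cross the cap; the real iabak-helper may differ, and `select_batch` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the batching loop seen in the set -x trace.
# Input: NUL-terminated "bytes filename" records. Output: the chosen
# filenames, one per line.
select_batch() {
    local spacelimit=$1 spaceneeded=0 bytes filename
    local -a files=()
    while read -r -d '' bytes filename; do
        # Stop at the first file that would push the total past the cap.
        (( spaceneeded + bytes < spacelimit )) || break
        spaceneeded=$(( spaceneeded + bytes ))
        files+=("$filename")
    done
    if (( ${#files[@]} > 0 )); then
        printf '%s\n' "${files[@]}"
    fi
}
```

For example, `printf '%s\0' '100 a' '200 b' '300 c' | select_batch 350` selects `a` and `b`. The loop counts every listed file against the cap, so files that are already present locally still use up `spacelimit` even though nothing needs to be downloaded for them.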
21:42 thelsdj huh, so if i manually run the find_insufficient_copies in my shard dir i get a ton of files, but the script doesn't seem to find anything
21:44 thelsdj + files=()
21:44 thelsdj + read -d '' bytes filename
21:44 thelsdj + numfiles=0
21:44 db48x hrm
21:50 thelsdj looks like my xargs and dirname don't behave as expected
21:51 db48x perfect
21:51 thelsdj maybe they are being short circuited or not, let me try hard coding it to 'cat' and see if that fixes it
21:56 thelsdj is the sumofbytes supposed to only print one line?
21:56 db48x potentially
21:56 db48x it groups them by whatever is in the second column
21:57 db48x and we use dirname to shorten the second column and get the item names
21:57 db48x if there's only one item in the shard then there will only be one line in the final output
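db48x describes `sumofbytes` as an awk script that groups lines by their second column (the item name produced by `dirname`) and totals the byte counts. Because awk behaved differently on thelsdj's busybox system, here is a minimal awk-free sketch of the same grouping in bash. It assumes NUL-terminated `bytes name` records and bash 4 or later for associative arrays; it is not the script iabak ships, and `sum_bytes_by_name` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical awk-free sketch of the grouping sumofbytes is described as doing.
# Input and output: NUL-terminated "bytes name" records; one output record per
# distinct name, in first-seen order, with the byte counts summed.
sum_bytes_by_name() {
    local -A total=()
    local -a order=()
    local bytes name
    while read -r -d '' bytes name; do
        if [[ -z ${total[$name]+set} ]]; then
            order+=("$name")
            total[$name]=0
        fi
        total[$name]=$(( ${total[$name]} + bytes ))
    done
    for name in "${order[@]}"; do
        printf '%s %s\0' "${total[$name]}" "$name"
    done
}
```

As db48x says, when every name is distinct each record passes through unchanged; only repeated names are merged.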
21:57 thelsdj i just get 734964 archivebot/archiveteam_archivebot_go_080/00000_Header.png but that's with cutting out the xargs/dirname stuff and just using 'cat'
21:57 db48x then no
21:57 thelsdj but there's a ton of items i don't yet have
21:58 db48x with filenames like that, sumofbytes should return them unchanged
21:58 db48x since there shouldn't be any duplicates in the second column to group the lines up by
21:59 thelsdj so by making it just find_insufficient_copies | rundownloads it works
21:59 thelsdj annoying that these pipes are so hard to debug
22:01 db48x thelsdj: :)
22:01 thelsdj hmm maybe not
22:01 db48x find_insufficient_copies | head -z | tr '\000' '\n'
22:01 db48x find_insufficient_copies | head -z | dirname_pipe | tr '\000' '\n', etc
22:02 VADemon has joined #internetarchive.bak
22:03 db48x (it would be nice if bash had a better debugger; one that let you set breakpoints, and inspect everything that had flowed through a pipeline)
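Short of a real debugger, one way to see what flows through each stage is a pass-through function that copies every record to stderr and reports a count at the end. A minimal sketch, assuming NUL-terminated records as in the iabak pipeline; `peek` is a hypothetical name, and a final record without a trailing NUL would be dropped by `read -d ''`.

```shell
#!/usr/bin/env bash
# Hypothetical debugging helper: pass NUL-terminated records through unchanged
# while printing each one, and a final count, to stderr under a label.
peek() {
    local label=$1 count=0 record
    while IFS= read -r -d '' record; do
        printf '[%s] %s\n' "$label" "$record" >&2
        printf '%s\0' "$record"
        count=$(( count + 1 ))
    done
    printf '[%s] %d records\n' "$label" "$count" >&2
}
```

Inserted between stages, for example `find_insufficient_copies | peek raw | dirname_pipe | peek items | sumofbytes | peek sums > /dev/null`, the per-label counts show at which step records go missing, such as the ~4800 names that shrink to 383.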
22:05 asktoomuc it's still downloading on my side. I'm confused
22:06 db48x asktoomuc: :)
22:06 db48x gremlins?
22:09 asktoomuc yeah I don't know. The only thing I changed was the set -x after running the tests you asked me to run
22:10 asktoomuc anyway, not going to complain
22:12 thelsdj hmmm still not working, it tries to download 2 torrents that i don't have the tools for, but there are still a TON of files that find_insufficient_copies returns that aren't being attempted
22:32 thelsdj so i don't quite get shard16: i have 202G and i seem to have everything that doesn't have 4 copies, and yet like half the shard is still IA-only? are the torrent files really 50+% of the shard?
22:33 jsp12345 has quit IRC (Read error: Operation timed out)
22:33 thelsdj i guess if deewiant has 1.34T then maybe the torrents really are 1.1T total
22:34 thelsdj huh, now git annex list is showing a ton of stuff that i don't have but is on web, and yet iabak is not downloading it
22:39 thelsdj find_insufficient_copies | tr '\000' '\n' | awk '{print $2}' | xargs -n 100 ../git-annex.linux/git annex list | grep "^__X_" | wc -l
22:39 thelsdj 3874
22:40 thelsdj so by my basic understanding, iabak _should_ be trying to download these?
22:46 iabak-reg registrar master c9abe05 other SHARD3/pubkeys registration of mitch on SHARD3
22:53 db48x thelsdj: we're also having some trouble with the stats on the website
22:53 thelsdj i think i also may be hitting a bash pipe limit
22:54 db48x yes. it sounds to me like it's not processing that pipeline correctly or something
22:54 thelsdj the output of find_insufficient_copies is 490k but i think my bash gives up at 256k
22:54 db48x well, it's not like a pipe can only transfer a limited amount of data
22:55 thelsdj right, it doesn't make sense, but stuff's not coming out the end, even if i remove all the steps
22:55 db48x weird
22:56 db48x what is find_insufficient_copies | tr '\000' '\n' | wc -l
22:57 komarEX has joined #internetarchive.bak
22:57 komarEX hey everyone
22:58 komarEX is it me, or has ANNEXGETOPTS disappeared from iabak?
22:58 db48x komarEX: no, it's still there
22:58 komarEX well, the information is, but I don't see it being used
22:58 thelsdj the 'read' is only getting 383 different filenames
22:59 thelsdj out of ~4800 that start the pipe
22:59 db48x thelsdj: funky
22:59 komarEX I have -J15 in the file and it still downloads just one file
22:59 kurt komarEX: glad I'm not the only one noticing that today
22:59 kurt but I can't see any changes that would affect it in the iabak repo
23:00 komarEX Then I guess I have to look into my files backup
23:00 HCross2 Mine is doing that too
23:01 db48x it's right there on line... oh, uh
23:02 db48x sorry about that, I did break it
23:03 komarEX so let's just restart iabak, right?
23:04 db48x yep
23:04 komarEX could you confirm 2 things with me
23:04 db48x hopefully :)
23:04 komarEX 1. shard4 is 1,41TB?
23:04 db48x no, shard1 has 2.71 TB of stuff
23:05 komarEX oh
23:05 komarEX ok
23:05 komarEX 2. what will happen if I let annex use e.g. 4TB but the shard is 2,71TB?
23:05 db48x it will move on to another shard
23:05 komarEX ok
23:06 komarEX oh
23:06 komarEX btw
23:06 db48x feel free to also peruse the content of the shards and manually run git annex get for any files you would like to use
23:06 komarEX shard4 =/= shard1 I believe :d
23:06 db48x music you'd like to listen to, for instance
23:07 db48x oh, shard 4 is 1.41 TB
23:07 komarEX ok and one more thing
23:07 db48x sure
23:07 komarEX can you prevent annex/iabak from downloading other shards?
23:08 db48x sure, just edit the repolist
23:08 db48x set them all to "maint"
23:08 komarEX can you tell me which file/command?
23:09 db48x use your favorite text editor
23:09 db48x it's a very simple file
23:09 komarEX oh I'm blind
23:09 komarEX it's in the main dir, ok
23:11 thelsdj so it seems to be stopping right before a 10G file, which i obviously have space for, so I'm guessing the issue is with: [[ $((${spaceneeded} + ${bytes})) -lt ${spacelimit} ]]
23:13 db48x thelsdj: what are the other variables at the time?
23:14 db48x komarEX: just be aware that this may require you to manually pull future updates
23:14 komarEX db48x: I'm aware
23:15 bwn has quit IRC (Ping timeout: 244 seconds)
23:15 db48x :)
23:15 thelsdj it gives up between these two: XXX:spaceneeded:70256110876,bytes:262,spacelimit:34359738368,archivebot/archiveteam_archivebot_go_081/www.geekhard.fr-shallow-20140726-175015-1f1e6.json
23:15 thelsdj XXX:spaceneeded:80994514176,bytes:10738403300,spacelimit:34359738368,archivebot/archiveteam_archivebot_go_081/www.genealogy.com-inf-20140605-184144-ahip0-00000.warc.gz
23:16 thelsdj (i took out the check so i can get output from before and after)
23:16 komarEX db48x: I guess I should just add this file to .gitignore?
23:17 db48x thelsdj: each time you run it it's going to get the list of files in a different order, and it'll stop at a different place
23:17 thelsdj no, it's the same place every time
23:17 thelsdj i took out the randomize
23:17 thelsdj since i don't have shuf anyways
23:17 db48x ah
23:17 db48x ok, so why is spaceneeded 70G even though spacelimit is 34G?
23:18 db48x it's supposed to stop when spaceneeded + bytes -gt spacelimit
23:18 asktoomuc I'm still pulling files. I'm at roughly 15G for now, it's quite slow (between 1 and 6MB/s) but I guess I'm not in a hurry. I just hope it will keep working
23:19 thelsdj well, why is spacelimit 34G when I should have like 1.7T available for it
23:19 db48x thelsdj: because I had the idea that iabak should sync the repository more frequently
23:19 db48x so I wrote it to stop after a slightly random threshold in the hopes that it would do so
23:20 db48x (the threshold is 8 hours at 1MB/s)
23:21 thelsdj ok, so right, there are like 300+ files that it goes through with no problem, doesn't take any time
23:21 thelsdj and it always tries to download them
23:21 thelsdj so i guess it's blowing through that in like 20 seconds
23:22 db48x metadata and torrents and stuff that fails?
23:22 thelsdj i've manually run the git annex get command for them and there's no output, return value is 0
23:22 thelsdj doesn't seem to be failures as far as i can tell
23:22 thelsdj just silently succeeds immediately
23:22 db48x then you already have those files
23:23 db48x that may be something we forgot along the way, now that I think about it
23:23 thelsdj right, so i already have the files; they aren't at 4 copies yet, but they aren't filtered out
23:23 thelsdj ok that makes sense
23:24 db48x and now that I've added a threshold, they're clogging up the works
23:24 thelsdj right
23:24 db48x good bug report :)
23:24 db48x --not --copies 4 --not --here?
23:25 thelsdj yeah --not is there
23:26 db48x ah, --not --copies 4 --not --in=here?
23:27 thelsdj oh i get it, ok adding that and trying
23:29 komarEX has quit IRC (Quit: Page closed)
23:31 thelsdj combined with changing sumofbytes to cat, it seems to be downloading again
23:31 db48x ok, so sumofbytes is broken
23:31 db48x it's just an awk script; could your awk be broken?
23:31 thelsdj at least on my system i think it prints only 1 line
23:31 thelsdj yeah it's possible my awk is weird
23:32 thelsdj i have busybox awk
23:32 db48x oh
23:32 db48x busybox
23:33 thelsdj i'm surprised that this works as well as it does on my Drobo NAS
23:33 db48x gawk instead?
23:34 thelsdj if the git-annex arm build could include gawk, that would be great
23:34 thelsdj seeing if i have a gawk elsewhere or available
23:35 db48x heh
23:36 thelsdj not in any obvious places, or in the repo of what people have built for the Drobo
23:38 thelsdj yeah so i think if it's needed, having git-annex include it in its arm binary would be very useful
23:40 asktoomuc am I supposed to see multiple progress bars when concurrent downloads happen?
23:43 asktoomuc because right now it only seems to download files sequentially even though I created the ANNEXGETOPTS file
23:43 asktoomuc iabak@IABAK-VM:/mnt/IABAK/IA.BAK$ cat ANNEXGETOPTS J9
23:43 thelsdj should be -J9, right?
23:44 asktoomuc indeed, thanks for spotting my mistake
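asktoomuc's `ANNEXGETOPTS` contained `J9` rather than `-J9`. The file holds extra options for `git annex get`, where `-J` sets the number of parallel jobs. A minimal sketch of reading the file with a sanity check for that typo; `read_annexgetopts` is a hypothetical name, it only reads the first line of the file, and it is not how iabak-helper itself reads `ANNEXGETOPTS`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: read whitespace-separated options from an ANNEXGETOPTS
# file, print them one per line, and warn about entries missing a leading "-".
read_annexgetopts() {
    local file=${1:-ANNEXGETOPTS} opt
    local -a opts=()
    [[ -r $file ]] || return 0          # no file: no extra options
    read -r -a opts < "$file" || true   # read fails at EOF without a newline
    for opt in "${opts[@]}"; do
        if [[ $opt != -* ]]; then
            echo "warning: '$opt' in $file does not start with '-' (did you mean -$opt?)" >&2
        fi
        printf '%s\n' "$opt"
    done
}
```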
23:45 asktoomuc time to go to bed I guess, I can't read properly anymore
23:45 asktoomuc thank you all for your help today!
23:45 bwn has joined #internetarchive.bak
23:48 db48x asktoomuc: you're welcome!
23:48 db48x asktoomuc: thanks for helping us out
23:48 db48x thelsdj: git-annex doesn't need awk at all
23:48 db48x iabak does, but iabak isn't going to distribute awk
23:48 db48x I mean, we could distribute bash too, and perl
23:48 db48x so I'd rather not distribute any of them
23:49 yipdw i have been on this weird rust kick lately
23:49 thelsdj yeah, i mean it's worth discussing, as embedded NASes are a good source of space, so it would be nice to be able to run on them without much problem
23:54 db48x thelsdj: I'd rather have a few lines at the top of the file like AWK=awk that people can edit
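db48x's suggestion, sketched: a variable at the top of the script that defaults to `awk` but can be edited, or overridden from the environment, to point at `gawk` on systems such as the Drobo whose busybox awk misbehaves. The `${AWK:-awk}` form and the `total_bytes` example function are hypothetical illustrations, not current iabak code.

```shell
#!/usr/bin/env bash
# Tool selection: edit this line, or set AWK=gawk in the environment,
# if the system awk misbehaves.
AWK=${AWK:-awk}

# Example consumer: sum the first column of "bytes name" lines.
total_bytes() {
    "$AWK" '{ sum += $1 } END { print sum + 0 }'
}
```

Every awk call in the script would then go through `"$AWK"`, so switching implementations is a one-line change.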
23:57 db48x I added some things to the readme
23:57 db48x anything else we've covered today that I've forgotten?
