[00:03] *** bwn has joined #internetarchive.bak
[02:07] *** Start has quit IRC (Read error: Connection reset by peer)
[02:08] *** Start_ has joined #internetarchive.bak
[02:34] heh, I've been letting iabak download some stuff to test the new code I wrote
[02:34] but I just realized that it's working on an item with ~650 files
[02:35] 80+ GB
[02:37] https://archive.org/details/13jany2014warcs
[02:38] *** patrickod has quit IRC (Quit: ZNC - http://znc.in)
[02:38] *** patrickod has joined #internetarchive.bak
[02:40] *** patrickod has quit IRC (Client Quit)
[02:40] *** patrickod has joined #internetarchive.bak
[02:43] *** db48x` has joined #internetarchive.bak
[02:45] *** db48x has quit IRC (Ping timeout: 255 seconds)
[02:54] *** Start_ is now known as Start
[04:28] oops, no wonder
[04:28] I made a list of 5 items, then counted from 1 to 6 to download them
[05:51] *** db48x` is now known as db48x
[06:03] Kaz: yes
[06:03] what's up
[06:51] yipdw: not sure what he was going to ask, but the stats aren't getting to graphite
[06:52] hmm
[06:52] want to check it out?
[06:52] he fixed the cronjobs, which apparently had vanished
[06:52] I can poke at it slowly over the next day or two
[06:52] ah :)
[06:52] well, I have a few hours before I sleep
[06:56] a lot of exceptions
[07:00] well, carbon is getting lots of connections
[07:00] 19/11/2016 02:00:34 :: MetricLineReceiver connection with 127.0.0.1:37932 established
[07:00] 19/11/2016 02:00:34 :: MetricLineReceiver connection with 127.0.0.1:37932 closed cleanly
[07:00] 19/11/2016 02:00:34 :: MetricLineReceiver connection with 127.0.0.1:37933 established
[07:00] 19/11/2016 02:00:34 :: MetricLineReceiver connection with 127.0.0.1:37933 closed cleanly
[07:06] *** bwn has quit IRC (Ping timeout: 961 seconds)
[07:06] /var/lib/graphite/whisper/iabak/shardstats/connections/all.wsp has a very recent modification time
[07:16] *** bwn has joined #internetarchive.bak
[07:28] *** kyan has quit IRC (Quit: Leaving)
[10:45] *** bwn has quit IRC (Ping timeout: 244 seconds)
[10:56] *** atomotic has joined #internetarchive.bak
[11:07] *** bwn has joined #internetarchive.bak
[11:15] *** asktoomuc has joined #internetarchive.bak
[12:14] *** Whopper has joined #internetarchive.bak
[12:18] *** Optical has joined #internetarchive.bak
[12:20] I was wondering guys as you are sort of a online group have you tried reaching out to Seagate or other HDD companies to sponsor you for HDDs for the IA.BAK and other projects?
[12:20] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[12:47] *** sevs has joined #internetarchive.bak
[12:53] db48x: you here?
[13:08] *** atomotic has joined #internetarchive.bak
[13:17] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[13:41] registrar master fd84ce3 other SHARD15/pubkeys registration of ronin_fight on SHARD15
[13:53] registrar master 53598d0 other SHARD12/pubkeys registration of roninfight on SHARD12
[13:57] even with NFS enabled (to test), this process is anything but straightforward. I'm now getting "Connection refused" errors to .git/annex/ssh/SHARD12@iabak.archiveteam.org
[13:57] not sure what I'm doing wrong
[14:17] db48x / yipdw I haven't added the cronjobs back in - I was running the scripts manually to see if it actually ended up giving us updated graphs
[14:26] asktoomuch: can you pastebin the error? we'll need more details to work out what's going on
[15:43] *** cmaldonad has joined #internetarchive.bak
[15:44] *** cmaldonad has quit IRC (Client Quit)
[16:24] *** Optical has quit IRC (Quit: Page closed)
[16:29] http://pastebin.com/E9gdpN1Q
[16:31] registrar master 2af9b61 other SHARD12/pubkeys registration of removed on SHARD12
[16:33] registrar master 9391ff2 other SHARD12/pubkeys registration of removed@gmail.com on SHARD12
[16:50] asktoomuc: do ssh control master connections generally work for you, or is the error message just saying that you couldn't connect to iabak.archiveteam.org?
[16:54] asktoomuc: http://stackoverflow.com/questions/36459785/shared-ssh-connection-with-control-master-not-working perhaps?
[16:54] oh, is this another NFS-or-SMB thing?
[16:58] registrar master 4ed30a9 other SHARD12/pubkeys registration of removed@gmail.com on SHARD12
[17:20] db48x: NFS this time round
[17:21] as SMB isn't supported with symlinks it seems. I created a special share with NFS enabled for it
[17:22] I'm not sure what shared ssh or control master are tbh. I have just installed a Debian 8 and I'm running the process from it
[17:23] it's a way for multiple SSH processes to share the same TCP connection, when they're all talking to the same server
[17:24] it makes it quicker to start multiple concurrent downloads, since they only have to set up the connection once
[17:25] but it requires creating a unix socket file
[17:25] ok, I see
[17:25] do I need any special configuration to make it work?
[17:26] hmm
[17:26] I'm not too sure what's wrong from the logs. I have linked the pastebin of the execution as you've probably seen
[17:27] yes, see lines 167 and 168
[17:28] that's git printing out a message that it couldn't use the socket file we specified
[17:28] hmmm ok
[17:29] or rather, it's git-annex that specifies where the socket should be stored
[17:32] ok, this is configurable in git-annex
[17:32] go into the shard directory and run 'git config annex.sshcaching false'
[17:33] of course, as soon as it checks out a new shard that shard will be broken
[17:33] that command isn't supposed to return anything, is it?
[17:34] no
[17:34] you can run git config --list to see what settings are set
[17:35] I will think about a way to make this easier
[17:36] http://pastebin.com/uCHZZfXb
[17:36] looks like the setting is set
[17:37] can I run the main iabak script in the parent directory after that?
[17:50] db48x: you should be able to put git options into the ANNEXGETOPTS file, I think there is a switch for that
[17:52] -c name=value Overrides git configuration settings.
[17:54] hmmm this didn't fix it
[17:55] "./iabak-helper: line 162: 5400000000000 - : syntax error: operand expected (error token is "- ")"
[17:56] http://pastebin.com/7YiUKJ7k
[17:56] sorry guys, I'm just trying to contribute but I end up creating a lot of problems...
[17:57] just had that a couple hours ago
[17:57] one sec
[17:59] go to the shard dir, and set the git config variable annex.diskreserve=100M
[18:01] now that I think about it, the script never asked me how much free space I wanted to keep
[18:01] I seem to remember reading about that in the wiki
[18:02] "It should prompt you for how much disk space to not use. To adjust this value later, use git config annex.diskreserve 200GB in all of the IA.BAK/shard* directories."
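The `5400000000000 - : syntax error: operand expected` error at 17:55 is what bash reports when one operand of `$(( ... ))` expands to nothing, which is what happens here when no `annex.diskreserve` is set. A minimal sketch of a guard, assuming the reserve has already been converted to bytes; `free_minus_reserve` is a hypothetical helper, not a function from iabak-helper. (Taken literally, the `|| 0` suggested at 18:10 would try to run a command named `0`; inside a command substitution the usual idiom is `|| echo 0`.)

```shell
# Hypothetical helper: subtract a reserve from the free space, both in bytes.
# ${2:-0} substitutes 0 when the reserve is unset or empty, so the arithmetic
# never sees "5400000000000 - " with a missing right-hand operand.
free_minus_reserve() {
  local free=$1 reserve=${2:-0}
  echo $(( free - reserve ))
}

# Reading the raw setting: `git config` exits non-zero when the key is unset,
# and `|| echo 0` then supplies a default on stdout.
# reserve=$(git config annex.diskreserve || echo 0)

free_minus_reserve 5400000000000 ""           # prints 5400000000000
free_minus_reserve 5400000000000 100000000    # prints 5399900000000
```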
[18:02] yep, it didn't
[18:04] yeah, on one machine it was set, on another it wasn't
[18:05] db48x: I believe this was introduced with the newest commit to https://github.com/ArchiveTeam/IA.BAK
[18:08] yep
[18:08] annoyingly I thought I handled the case where there was no reserve set
[18:08] except right there, obviously
[18:09] :)
[18:10] well at least I'm happy that I'm helping with finding issues in the current process
[18:10] it seems to be downloading now
[18:10] if i got my bits of bash right you tried with "${GIT} config annex.diskreserve || true" except it should be "|| 0"
[18:11] sevs: that's a good idea
[18:11] asktoomuc: yay!
[18:11] the downloading seems a bit slow (~500KB/s) but I can live with that
[18:13] asktoomuc: by default it only downloads files sequentially
[18:14] you can put a file ANNEXGETOPTS in your ia.bak dir
[18:14] yeah, I saw a message about enabling concurrent download but I don't know how to do that and I'm worried it would not work with my setup that doesn't support control master connections it seems
[18:14] oh ok, well I'll try that
[18:14] with the content "-J"
[18:14] bit of an edge case here really, but on the subject of concurrent downloads
[18:14] the message says: "(Not using enough bandwith? Enable concurrent downloads with: echo -J5 > ANNEXGETOPTS)"
[18:14] exactly
[18:15] if you've got more than one concurrent download going then kill the script, it'll finish them off one by one once you restart the script
[18:15] any way to avoid that?
[18:15] so I create ANNEXGETOPTS. What's a sensible value for n on a VM running on Core i7 with enough RAM and a 1Gbps symmetrical connection?
[18:16] it'll work, it'll just make an extra TCP connection for each concurrent download
[18:16] kurt: tweak the rundownloaddirs function and send us a pull request :)
[18:16] yes? i remember deleting everything in /.git/annex/tmp/ worked at some point
[18:17] *value for J sorry
[18:17] at least once that worked, no idea if i was just lucky
[18:19] asktoomuc: start with 10, see where that brings you to
[18:19] ok thanks!
[18:20] do you have iftop? wonderful tool, shows the traffic/sec
[18:20] nope, I'll install the package now
[18:21] in the lower right corner you see the total incoming and outgoing rate
[18:27] *** Deewiant has quit IRC (Quit: Viivan loppu.)
[18:30] sevs: looks like that's just restarted the downloads :(
[18:31] puhh, yeah, no idea
[18:31] yes, it'll restart any downloads that were interrupted
[18:32] but it doesn't do so concurrently
[18:32] once those are done it'll go back to normal operation, which can be concurrent
[18:33] the one time I did this I got "Hey, you have shuf installed ..."
[18:33] yes, you'll get that next
[18:33] see https://github.com/ArchiveTeam/IA.BAK/blob/master/iabak-helper#L144-L149
[18:36] I know what the normal operation is - just wondering if there were a way to have it 'forget' that it's got some partly-finished downloads so it goes straight to concurrent downloads
[18:37] my issue is that huge 20gb files at 10mbit/s each 40 times is time inefficient
[18:37] I'll have a poke around anywho
[18:39] kurt: sure, run git annex unused
[18:40] then git annex dropunused
[18:42] I keep getting that message "Filled up available disk space, so stopping here!"
[18:43] db48x: shouldn't it be possible to run the list from "git annex unused" through the same "| dirname_pipe | sumofbytes | shuffle | rundownloads" pipeline?
[18:43] "Wow! I'm done downloading this shard of the IA!" <= that seems too good to be true
[18:44] asktoomuc: which shard were you working on?
[18:44] shard 12
[18:44] might be that there were enough copies
[18:44] I chose for me, I didn't specify anything
[18:44] *it
[18:45] yeah
[18:45] you *do* have space free?
[18:45] it says the shard directory is 641M (du -sh)
[18:45] I have ~6TB free
[18:45] hmm
[18:46] what is your annex.diskreserve setting in shard12?
[19:06] *** bwn has quit IRC (Ping timeout: 964 seconds)
[19:09] annex.diskreserve=100M
[19:09] how do you choose which shard you download?
[19:09] and why does the script exits after just finishing 1 shard?
[19:12] *** sevs has quit IRC (Ping timeout: 268 seconds)
[19:14] *** kyan has joined #internetarchive.bak
[19:15] it's not supposed to
[19:17] it probably has to do with my "Filled up available disk space" error message then
[19:20] who's managing the iabak node currently?
[19:20] i need to move the VM to another machine with more space, and also IP change
[19:21] db48x and closure
[19:21] probably need SketchCow for DNS
[19:23] Kenshin: why do you say that?
[19:24] db48x: ?
[19:24] db48x: what do you mean
[19:25] oh, I misread
[19:28] asktoomuc: so, what does "df -Ph ." print out on your system?
[19:31] http://pastebin.com/sKGhShBx
[19:31] you forgot the .
[19:31] but presumably it's just the first and last lines of that
[19:35] sorry
[19:35] Filesystem Size Used Avail Use% Mounted on /dev/sda1 1.9G 1012M 726M 59% /
[19:36] ok, so iabak did the right thing
[19:37] you have 700M available and wanted to reserve 100M, so it downloaded ~600M of stuff :)
[19:38] wrong directory...
[19:38] man I'm useless
[19:39] root@IABAK-VM:/mnt/IABAK/IA.BAK# df -Ph . Filesystem Size Used Avail Use% Mounted on 192.168.11.98:/mnt/user/IABAK 7.3T 1.9T 5.4T 26% /mnt/IABAK
[19:39] registrar master 16a57e4 other SHARD12/pubkeys registration of hcross on SHARD12
[19:40] but I'm thinking what you were saying is related somehow. It's suspicious that it stopped at ~640M when the space available on the main disk is 726M
[19:44] *** bwn has joined #internetarchive.bak
[19:48] ?
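Part of the confusion between 19:28 and 19:40 came from running `df` in a directory that was not on the NFS mount. A sketch of a helper that always measures an explicit directory and reports bytes, assuming POSIX `df -P -k` output (a header line, then one line per filesystem with available 1024-byte blocks in the fourth column); `disk_free_bytes` is hypothetical and is not iabak's `diskfree`.

```shell
# Hypothetical helper: print the bytes available on the filesystem holding $1.
# `df -P -k` forces POSIX output: one header line, then one line per
# filesystem, with available space in 1024-byte blocks in column 4.
disk_free_bytes() {
  local dir=${1:-.}
  df -P -k -- "$dir" | {
    read -r _header                        # skip the header line
    read -r _fs _size _used avail _rest    # fs, size, used, avail, capacity, mount
    echo $(( avail * 1024 ))
  }
}

# Point it at the annex directory itself, e.g.
# disk_free_bytes /mnt/IABAK/IA.BAK/shard12
```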
[19:56] asktoomuc: that's the reserve behavior; that seems expected to me
[19:59] yipdw: except that he actually has terabytes available in the mount
[20:00] asktoomuc: just for kicks, what do you get when you run it in the shard directory?
[20:05] registrar master fa6c2de other SHARD10/pubkeys registration of Kaz on SHARD10
[20:06] oh hmm
[20:06] I missed that
[20:08] when I run what, db48x?
[20:22] this? root@IABAK-VM:/mnt/IABAK/IA.BAK/shard12# df -Ph . Filesystem Size Used Avail Use% Mounted on 192.168.11.98:/mnt/user/IABAK 7.3T 1.9T 5.4T 26% /mnt/IABAK
[20:24] registrar master 2e6043e other SHARD16/pubkeys registration of Kaz on SHARD16
[20:30] I'm having the out of disk space problem as well, only started when I re-ran iabak which updated code from git
[20:31] I have 8.8T free (what df says) and its set to save 7TB
[20:34] Filesystem Size Used Available Capacity Mounted on
[20:34] /dev/sda1 16.0T 7.1T 8.8T 45% /mnt/DroboFS
[20:34] annex.diskreserve=7TB
[20:34] Checking for any items that still need to be downloaded...
[20:34] oops, out of disk space
[20:40] tried setting it in GB so 7168GB and same problem, trying by removing the limit to verify that is the problem
[20:41] now it says i'm done with the shard, is shard16 only 202G? or is that another bug?
[20:46] huh, so i removed NOMORE and it won't even start another shard, says its used all the disk space (even though I don't have limit set in my existing shard)
[20:49] diskreserve = how much it'll keep free, no?
[20:50] yeah
[20:50] and /mnt/IABAK has 5.4TB free?
[20:51] or are you no longer using /mnt/IABAK
[20:51] no no, I'm indeed using it. And yes, it has 5.4TB free
[20:52] so you want to keep 7TB free
[20:52] so I should be using it until it's almost full with the current setting
[20:52] you have 5.4TB free
[20:52] and you're wondering why it won't download more?
[20:52] no, that's thelsdj
[20:52] you are correct, I am an idiot
[20:52] names are hard I want nick colors back, sorry
[20:53] no worries
[20:54] kurt: :)
[20:55] so apparently I broke how it measures the free disk space?
[20:55] asktoomuc: can you do some debugging for me?
[20:55] db48x: i can as well, yeah you seem to have broken it
[20:55] sure, thanks for trying to help
[20:56] registrar master 6193808 other SHARD10/pubkeys registration of thelsdj on SHARD10
[20:56] ok so hmm
[20:56] i set the annex.diskreserve on the IA.BAK directory as well
[20:57] and now it seems to be working
[20:57] https://gist.github.com/anonymous/41195d51c16e880df5e67c62fb46cca6
[20:57] ah, hmm
[20:59] also, sort of unrelated, how can i double check that if it tells me its finished getting a shard that I can believe it? is shard16 only 202G? i thought it was more than that
[21:00] heh, now I don't even have concurrent grabs at all
[21:00] forgot one function: https://gist.github.com/db48x/b079eaf83d33361d28c8115e8e5352da
[21:01] thelsdj: you can use git annex list --all to see which remotes have copies of which files
[21:01] you can use git annex list --not --copies 4 to see a list of all files that don't have enough copies
[21:02] asktoomuc: if you can download the test.sh from that second gist and source it, then you'll be able to call the functions and make sure they work correctly
[21:03] for example, bytesFromSize $(annexreserved) should print out 100000000
[21:03] and bytesFromSize $(diskfree) should print out 540000000000000
[21:03] line 13 has syntax error i think
[21:03] I need a tiny bit more hand-holding, sorry
[21:04] I can download the file, where do you want me to put it and what to do with it?
[21:04] asktoomuc: put it in IA.BAK
[21:04] no, just my shell was weird
[21:04] then use the "source" command to add it to your current environment
[21:04] (basically "source test.sh")
[21:05] thelsdj: it almost has LTS, but if you get an error message let me know what it is :)
[21:05] needs 'pow' as well
[21:05] ah, right
[21:06] ok, copied and sourced
[21:06] root@IABAK-VM:/mnt/IABAK/IA.BAK# bytesFromSize $(annexreserved) 100000000
[21:06] updated: https://gist.github.com/db48x/b079eaf83d33361d28c8115e8e5352da
[21:07] root@IABAK-VM:/mnt/IABAK/IA.BAK# bytesFromSize $(diskfree) 5400000000000
[21:07] asktoomuc: ok, excellent
[21:07] asktoomuc: that rules out a class of problems
[21:07] asktoomuc: although for completeness, let's make sure that subtraction works :)
[21:07] so for me annexreserved is 0
[21:07] but i don't think thats right
[21:08] echo $(($(bytesFromSize $(diskfree)) - $(bytesFromSize $(annexreserved))))
[21:08] root@IABAK-VM:/mnt/IABAK/IA.BAK# echo $(($(bytesFromSize $(diskfree)) - $(bytesFromSize $(annexreserved)))) 5399900000000
[21:08] asktoomuc: perfect
[21:08] thelsdj: annexreserved tries to be smart
[21:09] if you run annexreserved . it looks in the current directory
[21:09] if you just run annexreserved it looks in a shard* subdirectory of the current directory
[21:10] asktoomuc: ok, another thing we can do is to run iabak-helper with debugging output
[21:10] oh i guess my new shard wasn't really setup yet and didn't have the reserve
[21:11] thelsdj: ok, that's a bug we need to track down separately
[21:11] asktoomuc: if you edit iabak-helper and add "set -x" as the second line, then when you run iabak it'll be super verbose
[21:11] then I can read that output and see what's going on
[21:11] ok, let's give that a try
[21:12] I'll be back in 5 minutes
[21:19] registrar master a3e059a other SHARD15/pubkeys registration of thelsdj on SHARD15
[21:19] registrar master 62053cb other SHARD16/pubkeys registration of octobyt3 on SHARD16
[21:20] + available=-6992000000000000
[21:20] + [[ -6992000000000000 -gt 34359738368 ]]
[21:20] + [[ -6992000000000000 -gt 0 ]]
[21:20] + echo 'oops, out of disk space'
[21:20] hmmm
[21:20] too much space free maybe?
[21:20] lol
[21:23] are you running freenas or something? could try changing the dataset quota if so
[21:27] db48x: so your bytesFromSize returns different things if I do 7TB or 7T
[21:27] maybe thats the bug?
[21:28] or a bug, not sure, still messing and trying to figure this out
[21:32] thelsdj: ah, indeed
[21:33] on my side, rerunning it with set -x seems to have changed something. It is still downloading for now and hasn't aborted with the no space message yet
[21:34] asktoomuc: heh, ok. can you scroll back up to just before it started downloading?
[21:34] sure
[21:35] there's a bunch of stuff there
[21:35] look for this:
[21:35] + echo 'Checking for any files that still need to be downloaded...'
[21:35] + periodicsync
[21:35] Checking for any files that still need to be downloaded...
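At 21:27 thelsdj noticed that iabak's `bytesFromSize` returns different values for `7TB` and `7T`. A sketch of a parser that treats both the same, assuming decimal multipliers (which agrees with `100M` becoming `100000000` at 21:06) and accepting a fractional part such as df's `8.8T`; `bytes_from_size` is a hypothetical stand-in, not the function in iabak-helper, and needs bash 4 for `${1^^}`.

```shell
# Hypothetical size parser: "100M", "7T", "7TB", "7tb" and "8.8T" -> bytes.
# Decimal (SI) multipliers are an assumption: 1K = 1000 bytes here.
bytes_from_size() {
  local size=${1^^}              # upper-case the suffix (bash 4+)
  size=${size%B}                 # optional trailing B: 7TB -> 7T
  local suffix=${size: -1} mult
  case $suffix in
    K) mult=1000 ;;
    M) mult=1000000 ;;
    G) mult=1000000000 ;;
    T) mult=1000000000000 ;;
    P) mult=1000000000000000 ;;
    [0-9]) mult=1 suffix= ;;     # plain number of bytes
    *) echo "bytes_from_size: cannot parse '$1'" >&2; return 1 ;;
  esac
  local num=${size%"$suffix"}
  local int=${num%%.*} frac=
  if [[ $num == *.* ]]; then frac=${num#*.}; fi
  # 10# forces base 10, so "08" is not rejected as a bad octal number
  local bytes=$(( 10#$int * mult ))
  if [[ -n $frac ]]; then
    # scale the fraction by the unit first to avoid 64-bit overflow;
    # digits finer than one byte of the unit are dropped
    bytes=$(( bytes + 10#$frac * (mult / 10 ** ${#frac}) ))
  fi
  echo "$bytes"
}
```

With this sketch, `bytes_from_size 7TB` and `bytes_from_size 7T` both print 7000000000000.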
[21:35] + find_insufficient_copies
[21:36] and capture the log down until it starts downloading something
[21:36] (personally, I use tmux, so I can search backwards through the back buffer for the string 'find_insufficient_copies' and I'm there; perhaps you can do something similar in your setup)
[21:37] http://pastebin.com/fVGUgQbv
[21:37] ok, your flock is breaking too, but that's not the cause of this problem
[21:37] I'm not and PuTTY is very annoying because it keeps scrolling down the window when updating the download graph
[21:38] hopefully I captured everything you needed
[21:38] yes, it looks good
[21:38] + available=5399900000000
[21:38] so it knows that it has plenty of space available
[21:39] + [[ 5399900000000 -gt 34359738368 ]]
[21:39] + available=34359738368
[21:39] + [[ 34359738368 -gt 0 ]]
[21:39] + spacelimit=34359738368
[21:39] it limits it to a cap that I put in on a whim
[21:40] ah:
[21:40] + read -d '' bytes filename
[21:40] + [[ 387325952 -lt 34359738368 ]]
[21:40] + spaceneeded=387325952
[21:40] + files+=(${filename})
[21:40] + read -d '' bytes filename
[21:40] + numfiles=1
[21:40] we read a name from the list of things to download, and end up only finding one thing
[21:40] occupying 387MB
[21:42] huh, so if i manually run the find_insufficient_copies in my shard dir i get a ton of files, but the script doesn't seem to find anything
[21:44] + files=()
[21:44] + read -d '' bytes filename
[21:44] + numfiles=0
[21:44] hrm
[21:50] looks like my xargs and dirname don't behave as expected
[21:51] perfect
[21:51] maybe they are being short circuited or not, let me try hard coding it to 'cat' and see if that fixes it
[21:56] is the sumofbytes supposed to only print one line?
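The `set -x` trace at 21:38 to 21:39 shows the order of operations: free space minus the reserve, clamped to a cap of 34359738368 bytes (32 GiB), with a non-positive result reported as out of disk space, which is the branch thelsdj's `available=-6992000000000000` hit at 21:20. A sketch of the same arithmetic, assuming both inputs are already in bytes; `space_limit` and `SPACE_CAP` are hypothetical names, not iabak-helper's.

```shell
# Hypothetical re-statement of the limit logic seen in the set -x trace.
SPACE_CAP=34359738368    # 32 GiB, the cap that appears in the trace

space_limit() {
  local free=$1 reserve=${2:-0}
  local available=$(( free - reserve ))
  if (( available > SPACE_CAP )); then
    available=$SPACE_CAP             # download at most this much per pass
  fi
  if (( available > 0 )); then
    echo "$available"
  else
    echo "oops, out of disk space" >&2
    return 1
  fi
}
```

With asktoomuc's numbers, `space_limit 5400000000000 100000000` prints 34359738368, matching `spacelimit` in the trace.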
[21:56] potentially
[21:56] it groups them by whatever is in the second column
[21:57] and we use dirname to shorten the second column and get the item names
[21:57] if there's only one item in the shard then there will only be one line in the final output
[21:57] i just get 734964 archivebot/archiveteam_archivebot_go_080/00000_Header.png but thats with cutting out the xargs/dirname stuff and just using 'cat'
[21:57] then no
[21:57] but theres a ton of items i don't yet have
[21:58] with filenames like that, sumofbytes should return them unchanged
[21:58] since there shouldn't be any duplicates in the second column to group the lines up by
[21:59] so by making it just find_insufficient_copies | rundownloads it works
[21:59] annoying that these pipes are so hard to debug
[22:01] thelsdj: :)
[22:01] hmm maybe not
[22:01] find_insufficient_copies | head -z | tr '\000' '\n'
[22:01] find_insufficient_copies | head -z | dirname_pipe | tr '\000' '\n', etc
[22:02] *** VADemon has joined #internetarchive.bak
[22:03] (it would be nice if bash had a better debugger; one that let you set breakpoints, and inspect everything that had flowed through a pipeline)
[22:05] it's still downloading on my side. I'm confused
[22:06] asktoomuc: :)
[22:06] gremlins?
[22:09] yeah I don't know. The only thing I changed was the set -x after running the tests you asked me
[22:10] anyway, not going to complain
[22:12] hmmm still not working, it tries to download 2 torrents that i don't have the tools for, but there's still a TON of files that find_insufficient_copies returns but aren't being attempted
[22:32] si don't quite get shard16, i have 202G and i seem to have everything that there aren't 4 copies of and yet like half the shard is still IA only? are the torrent files really 50+% of the shard?
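db48x describes `sumofbytes` (21:56 to 21:58) as grouping records by their second column, the item name produced by `dirname_pipe`, and summing the first. iabak does this with an awk script; a sketch of the same grouping in pure bash 4 avoids depending on which awk is installed. It assumes NUL-terminated `bytes name` records, as the `read -d ''` lines in the trace suggest; `sum_of_bytes` is hypothetical and its output order is unspecified.

```shell
# Hypothetical pure-bash "group by name, sum bytes" over NUL-terminated
# "bytes name" records; needs bash 4+ for associative arrays.
sum_of_bytes() {
  local -A total=()
  local bytes name
  while IFS=' ' read -r -d '' bytes name; do
    total[$name]=$(( ${total[$name]:-0} + bytes ))
  done
  for name in "${!total[@]}"; do
    printf '%s %s\0' "${total[$name]}" "$name"
  done
}

# Example: two files in item1 and one in item2 become two records.
printf '%s\0' '10 item1' '5 item1' '7 item2' | sum_of_bytes | tr '\000' '\n'
```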
[22:33] *** jsp12345 has quit IRC (Read error: Operation timed out)
[22:33] i guess if deewiant has 1.34T then maybe the torrents really are 1.1T total
[22:34] huh, now git annex list is showing a ton of stuff i don't have but is on web and yet iabak is not downloading it
[22:39] find_insufficient_copies |tr '\000' '\n'| awk '{print $2}' |xargs -n 100 ../git-annex.linux/git annex list|grep "^__X_"|wc -l
[22:39] 3874
[22:40] so by my basic understanding, iabak _should_ be trying to download these?
[22:46] registrar master c9abe05 other SHARD3/pubkeys registration of mitch on SHARD3
[22:53] thelsdj: we're also having some trouble with the stats on the website
[22:53] i think i also may be hitting a bash pipe limit
[22:54] yes. it sounds to me like it's not processing that pipeline correctly or something
[22:54] the output of find_insufficient_copies is 490k but i think my bash gives up at 256k
[22:54] well, it's not like a pipe can only transfer a limited amount of data
[22:55] right, it doesn't make sense but stuffs not coming out the end, even if i remove all the steps
[22:55] weird
[22:56] what is find_insufficient_copies |tr '\000' '\n'|wc -l
[22:57] *** komarEX has joined #internetarchive.bak
[22:57] hey everyone
[22:58] is it me or ANNEXGETOPTS disappeared from iabak ?
[22:58] komarEX: no, it's still there
[22:58] well the information is but I don't see usage
[22:58] the 'read' is only getting 383 different filenames
[22:59] out of ~4800 that start the pipe
[22:59] thelsdj: funky
[22:59] I have -J15 in file and it still downloads just one file
[22:59] komarEX: glad I'm not the only one noticing that today
[22:59] but I can't see any changes that would affect it in the iabak repo
[23:00] Then I guess I have to look into my files backup
[23:00] Mine is doing that too
[23:01] it's right there on line... oh, uh
[23:02] sorry about that, I did break it
[23:03] so let's just restart iabak right?
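When roughly 4800 records entered thelsdj's pipeline but the `read` saw only 383 filenames (22:58), counting NUL-terminated records after each stage narrows down where they go missing faster than reading the output. A sketch; `count_records` is hypothetical, while the stage names in the usage comments are iabak's own functions and assume iabak-helper's functions have been sourced into the shell.

```shell
# Hypothetical helper: count NUL-terminated records on stdin.
# tr -cd '\000' deletes everything except the NUL bytes, wc -c counts them,
# and $(( )) strips the padding some wc implementations add.
count_records() {
  echo $(( $(tr -cd '\000' | wc -c) ))
}

# Usage, one stage at a time:
#   find_insufficient_copies | count_records
#   find_insufficient_copies | dirname_pipe | count_records
#   find_insufficient_copies | dirname_pipe | sumofbytes | count_records
```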
[23:04] yep
[23:04] could you confirm 2 things with me
[23:04] hopefully :)
[23:04] 1. shard4 is 1,41TB ?
[23:04] no, shard1 has 2.71 TB of stuff
[23:05] oh
[23:05] ok
[23:05] 2. what will happen if I let annex use for ex. 4TB but shard is 2,71
[23:05] it will move on to another shard
[23:05] ok
[23:06] oh
[23:06] btw
[23:06] feel free to also peruse the content of the shards and manually run git annex get for any files you would like to use
[23:06] shard4 =/= shard1 I believe :d
[23:06] music you'd like to listen to, for instance
[23:07] oh, shard 4 is 1.41 TB
[23:07] ok and one more thing
[23:07] sure
[23:07] can you deny annex/iabak to download other shards ?
[23:08] sure, just edit the repolist
[23:08] set them all to "maint"
[23:08] can you tell me which file/command ?
[23:09] use your favorite text editor
[23:09] it's a very simple file
[23:09] oh I'm blind
[23:09] it's in main dir ok
[23:11] so it seems to be stopping right before a 10G file, which i obviously have space for, so guessing issue is with: [[ $((${spaceneeded} + ${bytes})) -lt ${spacelimit} ]]
[23:13] thelsdj: what are the other variables at the time?
[23:14] komarEX: just be aware that this may require you to manually pull future updates
[23:14] db48x: I'm aware
[23:15] *** bwn has quit IRC (Ping timeout: 244 seconds)
[23:15] :)
[23:15] it gives up between these two: XXX:spaceneeded:70256110876,bytes:262,spacelimit:34359738368,archivebot/archiveteam_archivebot_go_081/www.geekhard.fr-shallow-20140726-175015-1f1e6.json
[23:15] XXX:spaceneeded:80994514176,bytes:10738403300,spacelimit:34359738368,archivebot/archiveteam_archivebot_go_081/www.genealogy.com-inf-20140605-184144-ahip0-00000.warc.gz
[23:16] (i took out the check so i can get output from before and after)
[23:16] db48x: I guess I should just throw this file to .gitignore?
[23:17] thelsdj: each time you run it it's going to get the list of files in a different order, and it'll stop at a different place
[23:17] no, its same space every time
[23:17] i took out the randomize
[23:17] since i don't have shuf anyways
[23:17] ah
[23:17] ok, so why is spaceneeded 70G even though spacelimit is 34G?
[23:18] it's supposed to stop when spaceneeded + bytes -gt spacelimit
[23:18] I'm still pulling files. I'm at roughly 15G for now, it's quite slow (between 1 and 6MB/s) but I guess I'm not in a hurry. I just hope it will keep working
[23:19] well, why is spacelimit 34G when I should have like 1.7T available for it
[23:19] thelsdj: because I had the idea that iabak should sync the repository more frequently
[23:19] so I wrote it to stop after a slightly random threshold in the hopes that it would do so
[23:20] (the threshold is 8 hours at 1MB/s)
[23:21] ok, so right, theres like 300+ files that it goes through no problem, doesn't take any time
[23:21] and it always tries to download them
[23:21] so i guess its blowing through that in like 20 seconds
[23:22] metadata and torrents and stuff that fails?
[23:22] i've manually run the git annex get command for them and theres no output, return value is 0
[23:22] doesn't seem to be failures as far as i can tell
[23:22] just silently succeeds immediately
[23:22] then you already have those files
[23:23] that may be something we forgot along the way, now that I think about it
[23:23] right so i already have the files, but they aren't at 4 copies yet, but they aren't filtered out
[23:23] ok that makes sense
[23:24] and now that I've added a threshold, they're clogging up the works
[23:24] right
[23:24] good bug report :)
[23:24] --not --copies 4 --not --here?
[23:25] yeah --not is there
[23:26] ah, --not --copies 4 --not --in=here?
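The discussion from 23:11 to 23:26 concerns the loop that picks a batch of files: it reads `bytes filename` records, keeps adding files while the running total stays under `spacelimit`, and stops at the first file that would cross it. As the log suggests, files that are already present were still being selected, so they could fill a batch without downloading anything, which is why `--not --in=here` mattered. A sketch of the selection step only, assuming NUL-terminated records as in the trace; `select_batch` is hypothetical and just prints the chosen names.

```shell
# Hypothetical batch selector: read NUL-terminated "bytes filename" records
# and print (NUL-terminated) the filenames that fit under a byte limit,
# stopping at the first file that would push the total to the limit or past it.
select_batch() {
  local limit=$1 needed=0 bytes filename
  while IFS=' ' read -r -d '' bytes filename; do
    if (( needed + bytes >= limit )); then
      break                        # mirrors the -lt test seen in the trace
    fi
    needed=$(( needed + bytes ))
    printf '%s\0' "$filename"
  done
}
```

Stopping rather than skipping (`continue`) means one large file, such as the 10 GB WARC at 23:15, ends the batch; skipping would let smaller files through instead. Either way, a file larger than the cap needs separate handling.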
[23:27] oh i get it, ok adding that and trying
[23:29] *** komarEX has quit IRC (Quit: Page closed)
[23:31] combined with changing sumofbytes to cat, it seems to be downloading again
[23:31] ok, so sumofbytes is broken
[23:31] it's just an awk script; could your awk be broken?
[23:31] at least on my system i think it prints only 1 line
[23:31] yeah its possible my awk is weird
[23:32] i have busybox awk
[23:32] oh
[23:32] busybox
[23:33] i'm surprised that this works as well as it does on my Drobo NAS
[23:33] gawk instead?
[23:34] if git-annex arm build could include gawk that would be great
[23:34] seeing if i have a gawk elsewhere or available
[23:35] heh
[23:36] not in any obvious places or in the repo for what people have built for the Drobo
[23:38] yeah so i think if its needed having git-annex include it in its arm binary would be very useful
[23:40] am I supposed to see multiple progress bars when concurrent downloads happen?
[23:43] because right now it only seems to download files sequentially even though I created the ANNEXGETOPTS file
[23:43] iabak@IABAK-VM:/mnt/IABAK/IA.BAK$ cat ANNEXGETOPTS J9
[23:43] should be -J9 right?
[23:44] indeed, thanks for spotting my mistake
[23:45] time to go to bed I guess, I can't read properly anymore
[23:45] thank you all for your help today!
[23:45] *** bwn has joined #internetarchive.bak
[23:48] asktoomuc: you're welcome!
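Downloads stayed sequential at 23:43 because `ANNEXGETOPTS` held `J9` rather than `-J9`. A sketch of a sanity check one could run before starting iabak, assuming the file holds whitespace-separated options and that concurrency is requested as `-JN` or `--jobs=N`; `check_getopts` is hypothetical and is not part of iabak.

```shell
# Hypothetical check: report the -JN / --jobs=N setting in an ANNEXGETOPTS file,
# or complain on stderr if there is none.
check_getopts() {
  local file=${1:-ANNEXGETOPTS} opt
  local re='^(-J|--jobs=)([0-9]+)$'
  if [[ ! -r $file ]]; then
    echo "no $file; downloads will be sequential" >&2
    return 1
  fi
  for opt in $(<"$file"); do       # word-split the file's contents
    if [[ $opt =~ $re ]]; then
      echo "concurrent downloads: ${BASH_REMATCH[2]}"
      return 0
    fi
  done
  echo "$file has no -JN option (found: $(<"$file"))" >&2
  return 1
}
```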
[23:48] asktoomuc: thanks for helping us out
[23:48] thelsdj: git-annex doesn't need awk at all
[23:48] iabak does, but iabak isn't going to distribute awk
[23:48] I mean, we could distribute bash too, and perl
[23:48] so I'd rather not distribute any of them
[23:49] i have been on this weird rust kick lately
[23:49] yeah, i mean its worth discussing as embedded NAS' are a good source of space so would be nice to be able to run on them without much problem
[23:54] thelsdj: I'd rather have a few lines at the top of the file like AWK=awk that people can edit
[23:57] I added some things to the readme
[23:57] anything else we've covered today that I've forgotten?
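db48x's suggestion at 23:54 of editable lines such as `AWK=awk` at the top of the script could be combined with a lookup that prefers a fuller implementation when one is installed. A sketch, assuming `gawk` is preferred over whatever `awk` is on the PATH and that a value the user sets in the environment should win; `pick_tool` is hypothetical.

```shell
# Hypothetical helper: print the first of the given commands found on PATH.
pick_tool() {
  local tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      return 0
    fi
  done
  return 1
}

# A user-set AWK wins; otherwise prefer gawk, then fall back to plain awk.
AWK=${AWK:-$(pick_tool gawk awk)}
```

The script would then call `"$AWK"` wherever it currently calls `awk`.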