Time |
Nickname |
Message |
00:03
🔗
|
|
bwn has joined #internetarchive.bak |
02:07
🔗
|
|
Start has quit IRC (Read error: Connection reset by peer) |
02:08
🔗
|
|
Start_ has joined #internetarchive.bak |
02:34
🔗
|
db48x |
heh, I've been letting iabak download some stuff to test the new code I wrote |
02:34
🔗
|
db48x |
but I just realized that it's working on an item with ~650 files |
02:35
🔗
|
db48x |
80+ GB |
02:37
🔗
|
db48x |
https://archive.org/details/13jany2014warcs |
02:38
🔗
|
|
patrickod has quit IRC (Quit: ZNC - http://znc.in) |
02:38
🔗
|
|
patrickod has joined #internetarchive.bak |
02:40
🔗
|
|
patrickod has quit IRC (Client Quit) |
02:40
🔗
|
|
patrickod has joined #internetarchive.bak |
02:43
🔗
|
|
db48x` has joined #internetarchive.bak |
02:45
🔗
|
|
db48x has quit IRC (Ping timeout: 255 seconds) |
02:54
🔗
|
|
Start_ is now known as Start |
04:28
🔗
|
db48x` |
oops, no wonder |
04:28
🔗
|
db48x` |
I made a list of 5 items, then counted from 1 to 6 to download them |
05:51
🔗
|
|
db48x` is now known as db48x |
06:03
🔗
|
yipdw |
Kaz: yes |
06:03
🔗
|
yipdw |
what's up |
06:51
🔗
|
db48x |
yipdw: not sure what he was going to ask, but the stats aren't getting to graphite |
06:52
🔗
|
yipdw |
hmm |
06:52
🔗
|
db48x |
want to check it out? |
06:52
🔗
|
db48x |
he fixed the cronjobs, which apparently had vanished |
06:52
🔗
|
yipdw |
I can poke at it slowly over the next day or two |
06:52
🔗
|
db48x |
ah :) |
06:52
🔗
|
db48x |
well, I have a few hours before I sleep |
06:56
🔗
|
db48x |
a lot of exceptions |
07:00
🔗
|
db48x |
well, carbon is getting lots of connections |
07:00
🔗
|
db48x |
19/11/2016 02:00:34 :: MetricLineReceiver connection with 127.0.0.1:37932 established |
07:00
🔗
|
db48x |
19/11/2016 02:00:34 :: MetricLineReceiver connection with 127.0.0.1:37932 closed cleanly |
07:00
🔗
|
db48x |
19/11/2016 02:00:34 :: MetricLineReceiver connection with 127.0.0.1:37933 established |
07:00
🔗
|
db48x |
19/11/2016 02:00:34 :: MetricLineReceiver connection with 127.0.0.1:37933 closed cleanly |
07:06
🔗
|
|
bwn has quit IRC (Ping timeout: 961 seconds) |
07:06
🔗
|
db48x |
/var/lib/graphite/whisper/iabak/shardstats/connections/all.wsp has a very recent modification time |
07:16
🔗
|
|
bwn has joined #internetarchive.bak |
07:28
🔗
|
|
kyan has quit IRC (Quit: Leaving) |
10:45
🔗
|
|
bwn has quit IRC (Ping timeout: 244 seconds) |
10:56
🔗
|
|
atomotic has joined #internetarchive.bak |
11:07
🔗
|
|
bwn has joined #internetarchive.bak |
11:15
🔗
|
|
asktoomuc has joined #internetarchive.bak |
12:14
🔗
|
|
Whopper has joined #internetarchive.bak |
12:18
🔗
|
|
Optical has joined #internetarchive.bak |
12:20
🔗
|
Optical |
I was wondering, guys: as you are sort of an online group, have you tried reaching out to Seagate or other HDD companies to sponsor you with HDDs for IA.BAK and other projects? |
12:20
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
12:47
🔗
|
|
sevs has joined #internetarchive.bak |
12:53
🔗
|
sevs |
db48x: you here? |
13:08
🔗
|
|
atomotic has joined #internetarchive.bak |
13:17
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
13:41
🔗
|
iabak-reg |
registrar master fd84ce3 other SHARD15/pubkeys registration of ronin_fight on SHARD15 |
13:53
🔗
|
iabak-reg |
registrar master 53598d0 other SHARD12/pubkeys registration of roninfight on SHARD12 |
13:57
🔗
|
asktoomuc |
even with NFS enabled (to test), this process is anything but straightforward. I'm now getting "Connection refused" errors to .git/annex/ssh/SHARD12@iabak.archiveteam.org |
13:57
🔗
|
asktoomuc |
not sure what I'm doing wrong |
14:17
🔗
|
kurt |
db48x / yipdw I haven't added the cronjobs back in - I was running the scripts manually to see if it actually ended up giving us updated graphs |
14:26
🔗
|
trs80 |
asktoomuc: can you pastebin the error? we'll need more details to work out what's going on |
15:43
🔗
|
|
cmaldonad has joined #internetarchive.bak |
15:44
🔗
|
|
cmaldonad has quit IRC (Client Quit) |
16:24
🔗
|
|
Optical has quit IRC (Quit: Page closed) |
16:29
🔗
|
asktoomuc |
http://pastebin.com/E9gdpN1Q |
16:31
🔗
|
iabak-reg |
registrar master 2af9b61 other SHARD12/pubkeys registration of removed on SHARD12 |
16:33
🔗
|
iabak-reg |
registrar master 9391ff2 other SHARD12/pubkeys registration of removed@gmail.com on SHARD12 |
16:50
🔗
|
db48x |
asktoomuc: do ssh control master connections generally work for you, or is the error message just saying that you couldn't connect to iabak.archiveteam.org? |
16:54
🔗
|
db48x |
asktoomuc: http://stackoverflow.com/questions/36459785/shared-ssh-connection-with-control-master-not-working perhaps? |
16:54
🔗
|
db48x |
oh, is this another NFS-or-SMB thing? |
16:58
🔗
|
iabak-reg |
registrar master 4ed30a9 other SHARD12/pubkeys registration of removed@gmail.com on SHARD12 |
17:20
🔗
|
asktoomuc |
db48x: NFS this time round |
17:21
🔗
|
asktoomuc |
as SMB isn't supported with symlinks it seems. I created a special share with NFS enabled for it |
17:22
🔗
|
asktoomuc |
I'm not sure what shared ssh or control master are tbh. I have just installed a Debian 8 and I'm running the process from it |
17:23
🔗
|
db48x |
it's a way for multiple SSH processes to share the same TCP connection, when they're all talking to the same server |
17:24
🔗
|
db48x |
it makes it quicker to start multiple concurrent downloads, since they only have to set up the connection once |
17:25
🔗
|
db48x |
but it requires creating a unix socket file |
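Editor's sketch of the connection sharing described above. `ControlMaster`, `ControlPath` and `ControlPersist` are standard OpenSSH options; the helper name `control_master_opts` and the socket directory are hypothetical and are not taken from iabak or git-annex. The point it illustrates is that the socket path has to live on a filesystem that can hold unix sockets, which appears to be what fails on the NFS share here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from iabak or git-annex): print the ssh options that
# enable connection sharing with the control socket kept in SOCKET_DIR.
control_master_opts() {
  local socket_dir=$1
  printf '%s\n' \
    -o ControlMaster=auto \
    -o "ControlPath=${socket_dir}/%r@%h:%p" \
    -o ControlPersist=60
}

# Example (commented out because it opens a real connection):
#   mapfile -t opts < <(control_master_opts "$HOME/.ssh")
#   ssh "${opts[@]}" user@example.org true
```

Pointing the socket directory at a local disk rather than the network share is one way to test whether the share is the culprit.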
17:25
🔗
|
asktoomuc |
ok, I see |
17:25
🔗
|
asktoomuc |
do I need any special configuration to make it work? |
17:26
🔗
|
db48x |
hmm |
17:26
🔗
|
asktoomuc |
I'm not too sure what's wrong from the logs. I have linked the pastebin of the execution as you've probably seen |
17:27
🔗
|
db48x |
yes, see lines 167 and 168 |
17:28
🔗
|
db48x |
that's git printing out a message that it couldn't use the socket file we specified |
17:28
🔗
|
asktoomuc |
hmmm ok |
17:29
🔗
|
db48x |
or rather, it's git-annex that specifies where the socket should be stored |
17:32
🔗
|
db48x |
ok, this is configurable in git-annex |
17:32
🔗
|
db48x |
go into the shard directory and run 'git config annex.sshcaching false' |
17:33
🔗
|
db48x |
of course, as soon as it checks out a new shard that shard will be broken |
17:33
🔗
|
asktoomuc |
that command isn't supposed to return anything, is it? |
17:34
🔗
|
db48x |
no |
17:34
🔗
|
db48x |
you can run git config --list to see what settings are set |
17:35
🔗
|
db48x |
I will think about a way to make this easier |
17:36
🔗
|
asktoomuc |
http://pastebin.com/uCHZZfXb |
17:36
🔗
|
asktoomuc |
looks like the setting is set |
17:37
🔗
|
asktoomuc |
can I run the main iabak script in the parent directory after that? |
17:50
🔗
|
sevs |
db48x: you should be able to put git options into the ANNEXGETOPTS file, I think there is a switch for that |
17:52
🔗
|
sevs |
-c name=value Overrides git configuration settings. |
17:54
🔗
|
asktoomuc |
hmmm this didn't fix it |
17:55
🔗
|
asktoomuc |
"./iabak-helper: line 162: 5400000000000 - : syntax error: operand expected (error token is "- ")" |
17:56
🔗
|
asktoomuc |
http://pastebin.com/7YiUKJ7k |
17:56
🔗
|
asktoomuc |
sorry guys, I'm just trying to contribute but I end up creating a lot of problems... |
17:57
🔗
|
sevs |
just had that a couple hours ago |
17:57
🔗
|
sevs |
one sec |
17:59
🔗
|
sevs |
go to the shard dir, and set the git config variable annex.diskreserve=100M |
18:01
🔗
|
asktoomuc |
now that I think about it, the script never asked me how much free space I wanted to keep |
18:01
🔗
|
asktoomuc |
I seem to remember reading about that in the wiki |
18:02
🔗
|
asktoomuc |
"It should prompt you for how much disk space to not use. To adjust this value later, use git config annex.diskreserve 200GB in all of the IA.BAK/shard* directories." |
18:02
🔗
|
asktoomuc |
yep, it didn't |
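Editor's sketch for applying one setting to every shard checkout, as the wiki text quoted above suggests. The function name `set_in_all_shards` is hypothetical; it assumes it is run from the IA.BAK directory and that the shard checkouts match `shard*/`. `GIT` can be overridden in the same spirit as the `${GIT}` variable the script uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of iabak: apply one git config setting
# to every shard checkout below the current directory.
GIT=${GIT:-git}

set_in_all_shards() {
  local key=$1 value=$2 dir
  for dir in shard*/; do
    [ -d "$dir" ] || continue        # the glob matched nothing
    "$GIT" -C "$dir" config "$key" "$value"
  done
}

# Example:
#   cd IA.BAK && set_in_all_shards annex.diskreserve 200GB
```

A shard that is checked out later will not have the setting, so the loop would need to be rerun after iabak moves on to a new shard.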
18:04
🔗
|
sevs |
yeah, on one machine it was set, on another it wasn't |
18:05
🔗
|
sevs |
db48x: I believe this was introduced with the newest commit to https://github.com/ArchiveTeam/IA.BAK |
18:08
🔗
|
db48x |
yep |
18:08
🔗
|
db48x |
annoyingly I thought I handled the case where there was no reserve set |
18:08
🔗
|
db48x |
except right there, obviously |
18:09
🔗
|
asktoomuc |
:) |
18:10
🔗
|
asktoomuc |
well at least I'm happy that I'm helping with finding issues in the current process |
18:10
🔗
|
asktoomuc |
it seems to be downloading now |
18:10
🔗
|
sevs |
if i got my bits of bash right you tried with "${GIT} config annex.diskreserve || true" except it should be "|| 0" |
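Editor's sketch of the fallback being discussed. If the script really does use `|| true`, the substitution yields an empty string when the key is unset, which would explain the "operand expected" arithmetic error quoted earlier. A literal `|| 0` would make bash try to run a command named `0`, so the usual idiom is to print the default instead. `value_or_default` is a hypothetical name, not a function from iabak-helper.

```shell
#!/usr/bin/env bash
# value_or_default DEFAULT CMD...: print CMD's output, or DEFAULT when CMD
# fails or prints nothing.
value_or_default() {
  local default=$1 out
  shift
  out=$("$@" 2>/dev/null) || out=""
  printf '%s\n' "${out:-$default}"
}

# Example: `git config KEY` exits non-zero when KEY is unset.
#   reserve=$(value_or_default 0 git config annex.diskreserve)
```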
18:11
🔗
|
db48x |
sevs: that's a good idea |
18:11
🔗
|
sevs |
asktoomuc: yay! |
18:11
🔗
|
asktoomuc |
the downloading seems a bit slow (~500KB/s) but I can live with that |
18:13
🔗
|
sevs |
asktoomuc: by default it only downloads files sequentially |
18:14
🔗
|
sevs |
you can put a file ANNEXGETOPTS in your ia.bak dir |
18:14
🔗
|
asktoomuc |
yeah, I saw a message about enabling concurrent download but I don't know how to do that and I'm worried it would not work with my setup that doesn't support control master connections it seems |
18:14
🔗
|
asktoomuc |
oh ok, well I'll try that |
18:14
🔗
|
sevs |
with the content "-J<num>" |
18:14
🔗
|
kurt |
bit of an edge case here really, but on the subject of concurrent downloads |
18:14
🔗
|
asktoomuc |
the message says: "(Not using enough bandwith? Enable concurrent downloads with: echo -J5 > ANNEXGETOPTS)" |
18:14
🔗
|
sevs |
exactly |
18:15
🔗
|
kurt |
if you've got more than one concurrent download going then kill the script, it'll finish them off one by one once you restart the script |
18:15
🔗
|
kurt |
any way to avoid that? |
18:15
🔗
|
asktoomuc |
so I create ANNEXGETOPTS. What's a sensible value for n on a VM running on Core i7 with enough RAM and a 1Gbps symmetrical connection? |
18:16
🔗
|
db48x |
it'll work, it'll just make an extra TCP connection for each concurrent download |
18:16
🔗
|
db48x |
kurt: tweak the rundownloaddirs function and send us a pull request :) |
18:16
🔗
|
sevs |
yes? i remember deleting everything in <shard>/.git/annex/tmp/ worked at some point |
18:17
🔗
|
asktoomuc |
*value for J sorry |
18:17
🔗
|
sevs |
at least once that worked, no idea if i was just lucky |
18:19
🔗
|
sevs |
asktoomuc: start with 10, see where that brings you to |
18:19
🔗
|
asktoomuc |
ok thanks! |
18:20
🔗
|
sevs |
do you have iftop? wonderful tool, shows the traffic/sec |
18:20
🔗
|
asktoomuc |
nope, I'll install the package now |
18:21
🔗
|
sevs |
in the lower right corner you see the total incoming and outgoing rate |
18:27
🔗
|
|
Deewiant has quit IRC (Quit: Viivan loppu.) |
18:30
🔗
|
kurt |
sevs: looks like that's just restarted the downloads :( |
18:31
🔗
|
sevs |
puhh, yeah, no idea |
18:31
🔗
|
db48x |
yes, it'll restart any downloads that were interrupted |
18:32
🔗
|
db48x |
but it doesn't do so concurrently |
18:32
🔗
|
db48x |
once those are done it'll go back to normal operation, which can be concurrent |
18:33
🔗
|
sevs |
the one time I did this I got "Hey, you have shuf installed ..." |
18:33
🔗
|
db48x |
yes, you'll get that next |
18:33
🔗
|
db48x |
see https://github.com/ArchiveTeam/IA.BAK/blob/master/iabak-helper#L144-L149 |
18:36
🔗
|
kurt |
I know what the normal operation is - just wondering if there were a way to have it 'forget' that it's got some partly-finished downloads so it goes straight to concurrent downloads |
18:37
🔗
|
kurt |
my issue is that huge 20gb files at 10mbit/s each 40 times is time inefficient |
18:37
🔗
|
kurt |
I'll have a poke around anywho |
18:39
🔗
|
db48x |
kurt: sure, run git annex unused |
18:40
🔗
|
db48x |
then git annex dropunused |
18:42
🔗
|
asktoomuc |
I keep getting that message "Filled up available disk space, so stopping here!" |
18:43
🔗
|
sevs |
db48x: shouldn't it be possible to run the list from "git annex unused" through the same "| dirname_pipe | sumofbytes | shuffle | rundownloads" pipeline? |
18:43
🔗
|
asktoomuc |
"Wow! I'm done downloading this shard of the IA!" <= that seems too good to be true |
18:44
🔗
|
sevs |
asktoomuc: which shard were you working on? |
18:44
🔗
|
asktoomuc |
shard 12 |
18:44
🔗
|
sevs |
might be that there were enough copies |
18:44
🔗
|
asktoomuc |
I chose for me, I didn't specify anything |
18:44
🔗
|
asktoomuc |
*it |
18:45
🔗
|
sevs |
yeah |
18:45
🔗
|
sevs |
you *do* have space free? |
18:45
🔗
|
asktoomuc |
it says the shard directory is 641M (du -sh) |
18:45
🔗
|
asktoomuc |
I have ~6TB free |
18:45
🔗
|
sevs |
hmm |
18:46
🔗
|
db48x |
what is your annex.diskreserve setting in shard12? |
19:06
🔗
|
|
bwn has quit IRC (Ping timeout: 964 seconds) |
19:09
🔗
|
asktoomuc |
annex.diskreserve=100M |
19:09
🔗
|
asktoomuc |
how do you choose which shard you download? |
19:09
🔗
|
asktoomuc |
and why does the script exit after just finishing 1 shard? |
19:12
🔗
|
|
sevs has quit IRC (Ping timeout: 268 seconds) |
19:14
🔗
|
|
kyan has joined #internetarchive.bak |
19:15
🔗
|
db48x |
it's not supposed to |
19:17
🔗
|
asktoomuc |
it probably has to do with my "Filled up available disk space" error message then |
19:20
🔗
|
Kenshin |
who's managing the iabak node currently? |
19:20
🔗
|
Kenshin |
i need to move the VM to another machine with more space, and also IP change |
19:21
🔗
|
HCross |
db48x and closure |
19:21
🔗
|
HCross |
probably need SketchCow for DNS |
19:23
🔗
|
db48x |
Kenshin: why do you say that? |
19:24
🔗
|
Kenshin |
db48x: ? |
19:24
🔗
|
Kenshin |
db48x: what do you mean |
19:25
🔗
|
db48x |
oh, I misread |
19:28
🔗
|
db48x |
asktoomuc: so, what does "df -Ph ." print out on your system? |
19:31
🔗
|
asktoomuc |
http://pastebin.com/sKGhShBx |
19:31
🔗
|
db48x |
you forgot the . |
19:31
🔗
|
db48x |
but presumably it's just the first and last lines of that |
19:35
🔗
|
asktoomuc |
sorry |
19:35
🔗
|
asktoomuc |
Filesystem Size Used Avail Use% Mounted on /dev/sda1 1.9G 1012M 726M 59% / |
19:36
🔗
|
db48x |
ok, so iabak did the right thing |
19:37
🔗
|
db48x |
you have 700M available and wanted to reserve 100M, so it downloaded ~600M of stuff :) |
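Editor's sketch of reading the free space for a specific directory, since the answer depends on which filesystem the directory lives on. `free_bytes` is a hypothetical helper and not the `diskfree` function from the gist; it assumes a POSIX `df` and a mount point without spaces in its name.

```shell
#!/usr/bin/env bash
# free_bytes [DIR]: bytes available on the filesystem holding DIR
# (defaults to the current directory).
free_bytes() {
  local dir=${1:-.} avail_kb
  # -P forces one POSIX-format line per filesystem; -k reports 1024-byte blocks.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  echo $(( avail_kb * 1024 ))
}

# Example: compare the root disk with the share that holds the shards.
#   free_bytes /
#   free_bytes /mnt/IABAK/IA.BAK
```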
19:38
🔗
|
asktoomuc |
wrong directory... |
19:38
🔗
|
asktoomuc |
man I'm useless |
19:39
🔗
|
asktoomuc |
root@IABAK-VM:/mnt/IABAK/IA.BAK# df -Ph . Filesystem Size Used Avail Use% Mounted on 192.168.11.98:/mnt/user/IABAK 7.3T 1.9T 5.4T 26% /mnt/IABAK |
19:39
🔗
|
iabak-reg |
registrar master 16a57e4 other SHARD12/pubkeys registration of hcross on SHARD12 |
19:40
🔗
|
asktoomuc |
but I'm thinking what you were saying is related somehow. It's suspicious that it stopped at ~640M when the space available on the main disk is 726M |
19:44
🔗
|
|
bwn has joined #internetarchive.bak |
19:48
🔗
|
SketchCow |
? |
19:56
🔗
|
yipdw |
asktoomuc: that's the reserve behavior; that seems expected to me |
19:59
🔗
|
db48x |
yipdw: except that he actually has terabytes available in the mount |
20:00
🔗
|
db48x |
asktoomuc: just for kicks, what do you get when you run it in the shard directory? |
20:05
🔗
|
iabak-reg |
registrar master fa6c2de other SHARD10/pubkeys registration of Kaz on SHARD10 |
20:06
🔗
|
yipdw |
oh hmm |
20:06
🔗
|
yipdw |
I missed that |
20:08
🔗
|
asktoomuc |
when I run what, db48x ? |
20:22
🔗
|
asktoomuc |
this? root@IABAK-VM:/mnt/IABAK/IA.BAK/shard12# df -Ph . Filesystem Size Used Avail Use% Mounted on 192.168.11.98:/mnt/user/IABAK 7.3T 1.9T 5.4T 26% /mnt/IABAK |
20:24
🔗
|
iabak-reg |
registrar master 2e6043e other SHARD16/pubkeys registration of Kaz on SHARD16 |
20:30
🔗
|
thelsdj |
I'm having the out of disk space problem as well, only started when I re-ran iabak which updated code from git |
20:31
🔗
|
thelsdj |
I have 8.8T free (what df says) and its set to save 7TB |
20:34
🔗
|
thelsdj |
Filesystem Size Used Available Capacity Mounted on |
20:34
🔗
|
thelsdj |
/dev/sda1 16.0T 7.1T 8.8T 45% /mnt/DroboFS |
20:34
🔗
|
thelsdj |
annex.diskreserve=7TB |
20:34
🔗
|
thelsdj |
Checking for any items that still need to be downloaded... |
20:34
🔗
|
thelsdj |
oops, out of disk space |
20:40
🔗
|
thelsdj |
tried setting it in GB so 7168GB and same problem, trying by removing the limit to verify that is the problem |
20:41
🔗
|
thelsdj |
now it says i'm done with the shard, is shard16 only 202G? or is that another bug? |
20:46
🔗
|
thelsdj |
huh, so i removed NOMORE and it won't even start another shard, says its used all the disk space (even though I don't have limit set in my existing shard) |
20:49
🔗
|
kurt |
diskreserve = how much it'll keep free, no? |
20:50
🔗
|
asktoomuc |
yeah |
20:50
🔗
|
kurt |
and /mnt/IABAK has 5.4TB free? |
20:51
🔗
|
kurt |
or are you no longer using /mnt/IABAK |
20:51
🔗
|
asktoomuc |
no no, I'm indeed using it. And yes, it has 5.4TB free |
20:52
🔗
|
kurt |
so you want to keep 7TB free |
20:52
🔗
|
asktoomuc |
so I should be using it until it's almost full with the current setting |
20:52
🔗
|
kurt |
you have 5.4TB free |
20:52
🔗
|
kurt |
and you're wondering why it won't download more? |
20:52
🔗
|
asktoomuc |
no, that's thelsdj |
20:52
🔗
|
kurt |
you are correct, I am an idiot |
20:52
🔗
|
kurt |
names are hard I want nick colors back, sorry |
20:53
🔗
|
asktoomuc |
no worries |
20:54
🔗
|
db48x |
kurt: :) |
20:55
🔗
|
db48x |
so apparently I broke how it measures the free disk space? |
20:55
🔗
|
db48x |
asktoomuc: can you do some debugging for me? |
20:55
🔗
|
thelsdj |
db48x: i can as well, yeah you seem to have broken it |
20:55
🔗
|
asktoomuc |
sure, thanks for trying to help |
20:56
🔗
|
iabak-reg |
registrar master 6193808 other SHARD10/pubkeys registration of thelsdj on SHARD10 |
20:56
🔗
|
thelsdj |
ok so hmm |
20:56
🔗
|
thelsdj |
i set the annex.diskreserve on the IA.BAK directory as well |
20:57
🔗
|
thelsdj |
and now it seems to be working |
20:57
🔗
|
db48x |
https://gist.github.com/anonymous/41195d51c16e880df5e67c62fb46cca6 |
20:57
🔗
|
db48x |
ah, hmm |
20:59
🔗
|
thelsdj |
also, sort of unrelated, how can i double check that if it tells me its finished getting a shard that I can believe it? is shard16 only 202G? i thought it was more than that |
21:00
🔗
|
kurt |
heh, now I don't even have concurrent grabs at all |
21:00
🔗
|
db48x |
forgot one function: https://gist.github.com/db48x/b079eaf83d33361d28c8115e8e5352da |
21:01
🔗
|
db48x |
thelsdj: you can use git annex list --all to see which remotes have copies of which files |
21:01
🔗
|
db48x |
you can use git annex list --not --copies 4 to see a list of all files that don't have enough copies |
21:02
🔗
|
db48x |
asktoomuc: if you can download the test.sh from that second gist and source it, then you'll be able to call the functions and make sure they work correctly |
21:03
🔗
|
db48x |
for example, bytesFromSize $(annexreserved) should print out 100000000 |
21:03
🔗
|
db48x |
and bytesFromSize $(diskfree) should print out 5400000000000 |
21:03
🔗
|
thelsdj |
line 13 has syntax error i think |
21:03
🔗
|
asktoomuc |
I need a tiny bit more hand-holding, sorry |
21:04
🔗
|
asktoomuc |
I can download the file, where do you want me to put it and what to do with it? |
21:04
🔗
|
db48x |
asktoomuc: put it in IA.BAK |
21:04
🔗
|
thelsdj |
no, just my shell was weird |
21:04
🔗
|
db48x |
then use the "source" command to add it to your current environment |
21:04
🔗
|
db48x |
(basically "source test.sh") |
21:05
🔗
|
db48x |
thelsdj: it almost has LTS, but if you get an error message let me know what it is :) |
21:05
🔗
|
thelsdj |
needs 'pow' as well |
21:05
🔗
|
db48x |
ah, right |
21:06
🔗
|
asktoomuc |
ok, copied and sourced |
21:06
🔗
|
asktoomuc |
root@IABAK-VM:/mnt/IABAK/IA.BAK# bytesFromSize $(annexreserved) 100000000 |
21:06
🔗
|
db48x |
updated: https://gist.github.com/db48x/b079eaf83d33361d28c8115e8e5352da |
21:07
🔗
|
asktoomuc |
root@IABAK-VM:/mnt/IABAK/IA.BAK# bytesFromSize $(diskfree) 5400000000000 |
21:07
🔗
|
db48x |
asktoomuc: ok, excellent |
21:07
🔗
|
db48x |
asktoomuc: that rules out a class of problems |
21:07
🔗
|
db48x |
asktoomuc: although for completeness, let's make sure that subtraction works :) |
21:07
🔗
|
thelsdj |
so for me annexreserved is 0 |
21:07
🔗
|
thelsdj |
but i don't think thats right |
21:08
🔗
|
db48x |
echo $(($(bytesFromSize $(diskfree)) - $(bytesFromSize $(annexreserved)))) |
21:08
🔗
|
asktoomuc |
root@IABAK-VM:/mnt/IABAK/IA.BAK# echo $(($(bytesFromSize $(diskfree)) - $(bytesFromSize $(annexreserved)))) 5399900000000 |
21:08
🔗
|
db48x |
asktoomuc: perfect |
21:08
🔗
|
db48x |
thelsdj: annexreserved tries to be smart |
21:09
🔗
|
db48x |
if you run annexreserved . it looks in the current directory |
21:09
🔗
|
db48x |
if you just run annexreserved it looks in a shard* subdirectory of the current directory |
21:10
🔗
|
db48x |
asktoomuc: ok, another thing we can do is to run iabak-helper with debugging output |
21:10
🔗
|
thelsdj |
oh i guess my new shard wasn't really setup yet and didn't have the reserve |
21:11
🔗
|
db48x |
thelsdj: ok, that's a bug we need to track down separately |
21:11
🔗
|
db48x |
asktoomuc: if you edit iabak-helper and add "set -x" as the second line, then when you run iabak it'll be super verbose |
21:11
🔗
|
db48x |
then I can read that output and see what's going on |
21:11
🔗
|
asktoomuc |
ok, let's give that a try |
21:12
🔗
|
db48x |
I'll be back in 5 minutes |
21:19
🔗
|
iabak-reg |
registrar master a3e059a other SHARD15/pubkeys registration of thelsdj on SHARD15 |
21:19
🔗
|
iabak-reg |
registrar master 62053cb other SHARD16/pubkeys registration of octobyt3 on SHARD16 |
21:20
🔗
|
thelsdj |
+ available=-6992000000000000 |
21:20
🔗
|
thelsdj |
+ [[ -6992000000000000 -gt 34359738368 ]] |
21:20
🔗
|
thelsdj |
+ [[ -6992000000000000 -gt 0 ]] |
21:20
🔗
|
thelsdj |
+ echo 'oops, out of disk space' |
21:20
🔗
|
thelsdj |
hmmm |
21:20
🔗
|
thelsdj |
too much space free maybe? |
21:20
🔗
|
thelsdj |
lol |
21:23
🔗
|
kurt |
are you running freenas or something? could try changing the dataset quota if so |
21:27
🔗
|
thelsdj |
db48x: so your bytesFromSize returns different things if I do 7TB or 7T |
21:27
🔗
|
thelsdj |
maybe thats the bug? |
21:28
🔗
|
thelsdj |
or a bug, not sure, still messing and trying to figure this out |
21:32
🔗
|
db48x |
thelsdj: ah, indeed |
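Editor's sketch of a size parser that treats `7T` and `7TB` identically. This is not the `bytesFromSize` from the gist; `bytes_from_size` is a hypothetical name, and the choice of decimal (power-of-1000) units is an assumption based on the values printed earlier in the log (100M giving 100000000).

```shell
#!/usr/bin/env bash
# bytes_from_size SIZE: convert "100M", "7T", "7TB" or "5.4T" to a byte count.
# Assumption: decimal (power-of-1000) units, with or without a trailing "B".
bytes_from_size() {
  local size number unit factor=1
  size=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  size=${size%B}                  # "7TB" -> "7T"; a bare "512B" -> "512"
  number=${size%[KMGTP]}          # strip at most one unit letter
  unit=${size#"$number"}          # whatever was stripped, possibly empty
  case $unit in
    K) factor=1000 ;;
    M) factor=1000000 ;;
    G) factor=1000000000 ;;
    T) factor=1000000000000 ;;
    P) factor=1000000000000000 ;;
  esac
  # awk copes with fractional input such as "5.4T"; %.0f avoids exponent notation.
  awk -v n="$number" -v f="$factor" 'BEGIN { printf "%.0f\n", n * f }'
}
```

Normalising the suffix before looking up the multiplier means both spellings go through the same branch, so they cannot drift apart by a factor of 1000.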
21:33
🔗
|
asktoomuc |
on my side, rerunning it with set -x seems to have changed something. It is still downloading for now and hasn't aborted with the no space message yet |
21:34
🔗
|
db48x |
asktoomuc: heh, ok. can you scroll back up to just before it started downloading? |
21:34
🔗
|
asktoomuc |
sure |
21:35
🔗
|
asktoomuc |
there's a bunch of stuff there |
21:35
🔗
|
db48x |
look for this: |
21:35
🔗
|
db48x |
+ echo 'Checking for any files that still need to be downloaded...' |
21:35
🔗
|
db48x |
+ periodicsync |
21:35
🔗
|
db48x |
Checking for any files that still need to be downloaded... |
21:35
🔗
|
db48x |
+ find_insufficient_copies |
21:36
🔗
|
db48x |
and capture the log down until it starts downloading something |
21:36
🔗
|
db48x |
(personally, I use tmux, so I can search backwards through the back buffer for the string 'find_insufficient_copies' and I'm there; perhaps you can do something similar in your setup) |
21:37
🔗
|
asktoomuc |
http://pastebin.com/fVGUgQbv |
21:37
🔗
|
db48x |
ok, your flock is breaking too, but that's not the cause of this problem |
21:37
🔗
|
asktoomuc |
I'm not and Putty is very annoying because it keeps scrolling down the window when updating the download graph |
21:38
🔗
|
asktoomuc |
hopefully I captured everything you needed |
21:38
🔗
|
db48x |
yes, it looks good |
21:38
🔗
|
db48x |
+ available=5399900000000 |
21:38
🔗
|
db48x |
so it knows that it has plenty of space available |
21:39
🔗
|
db48x |
+ [[ 5399900000000 -gt 34359738368 ]] |
21:39
🔗
|
db48x |
+ available=34359738368 |
21:39
🔗
|
db48x |
+ [[ 34359738368 -gt 0 ]] |
21:39
🔗
|
db48x |
+ spacelimit=34359738368 |
21:39
🔗
|
db48x |
it limits it to a cap that I put in on a whim |
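Editor's sketch of the logic visible in the trace above: subtract the reserve from the free space, clamp the result to the cap, and give up if nothing is left. `space_limit` is a hypothetical name; only the cap value 34359738368 (32 GiB) and the error text come from the log.

```shell
#!/usr/bin/env bash
# space_limit FREE RESERVE [CAP]: bytes the next download batch may use.
space_limit() {
  local free=$1 reserve=$2 cap=${3:-34359738368}   # default cap as seen in the trace
  local available=$(( free - reserve ))
  if [[ $available -gt $cap ]]; then
    available=$cap
  fi
  if [[ $available -gt 0 ]]; then
    echo "$available"
  else
    echo 'oops, out of disk space' >&2
    return 1
  fi
}
```

With a reserve that is mis-parsed as larger than the free space, `available` goes negative and the function fails, which matches the negative value in thelsdj's trace.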
21:40
🔗
|
db48x |
ah: |
21:40
🔗
|
db48x |
+ read -d '' bytes filename |
21:40
🔗
|
db48x |
+ [[ 387325952 -lt 34359738368 ]] |
21:40
🔗
|
db48x |
+ spaceneeded=387325952 |
21:40
🔗
|
db48x |
+ files+=(${filename}) |
21:40
🔗
|
db48x |
+ read -d '' bytes filename |
21:40
🔗
|
db48x |
+ numfiles=1 |
21:40
🔗
|
db48x |
we read a name from the list of things to download, and end up only finding one thing |
21:40
🔗
|
db48x |
occupying 387MB |
21:42
🔗
|
thelsdj |
huh, so if i manually run the find_insufficient_copies in my shard dir i get a ton of files, but the script doesn't seem to find anything |
21:44
🔗
|
thelsdj |
+ files=() |
21:44
🔗
|
thelsdj |
+ read -d '' bytes filename |
21:44
🔗
|
thelsdj |
+ numfiles=0 |
21:44
🔗
|
db48x |
hrm |
21:50
🔗
|
thelsdj |
looks like my xargs and dirname don't behave as expected |
21:51
🔗
|
db48x |
perfect |
21:51
🔗
|
thelsdj |
maybe they are being short circuited or not, let me try hard coding it to 'cat' and see if that fixes it |
21:56
🔗
|
thelsdj |
is the sumofbytes supposed to only print one line? |
21:56
🔗
|
db48x |
potentially |
21:56
🔗
|
db48x |
it groups them by whatever is in the second column |
21:57
🔗
|
db48x |
and we use dirname to shorten the second column and get the item names |
21:57
🔗
|
db48x |
if there's only one item in the shard then there will only be one line in the final output |
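Editor's sketch of the grouping step described above: shorten each path to its directory, then total the byte counts per directory. This is not the awk script from iabak-helper; `sum_by_item` is a hypothetical name, and it assumes newline-terminated `BYTES PATH` records with no spaces in the path, whereas the real pipeline passes NUL-terminated records.

```shell
#!/usr/bin/env bash
# sum_by_item: read "BYTES PATH" lines on stdin, replace PATH with its
# directory, and print one "TOTAL DIRECTORY" line per directory.
sum_by_item() {
  awk '{
    sub("/[^/]*$", "", $2)      # drop the last path component (like dirname)
    total[$2] += $1
  }
  END {
    for (item in total) print total[item], item   # output order is unspecified
  }'
}

# Example:
#   printf '10 a/x\n20 a/y\n5 b/z\n' | sum_by_item | sort
```

If the directory-shortening step is replaced by `cat`, every path is unique and the totals equal the inputs, which is the "returned unchanged" case mentioned later in the conversation.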
21:57
🔗
|
thelsdj |
i just get 734964 archivebot/archiveteam_archivebot_go_080/00000_Header.png but thats with cutting out the xargs/dirname stuff and just using 'cat' |
21:57
🔗
|
db48x |
then no |
21:57
🔗
|
thelsdj |
but theres a ton of items i don't yet have |
21:58
🔗
|
db48x |
with filenames like that, sumofbytes should return them unchanged |
21:58
🔗
|
db48x |
since there shouldn't be any duplicates in the second column to group the lines up by |
21:59
🔗
|
thelsdj |
so by making it just find_insufficient_copies | rundownloads it works |
21:59
🔗
|
thelsdj |
annoying that these pipes are so hard to debug |
22:01
🔗
|
db48x |
thelsdj: :) |
22:01
🔗
|
thelsdj |
hmm maybe not |
22:01
🔗
|
db48x |
find_insufficient_copies | head -z | tr '\000' '\n' |
22:01
🔗
|
db48x |
find_insufficient_copies | head -z | dirname_pipe | tr '\000' '\n', etc |
22:02
🔗
|
|
VADemon has joined #internetarchive.bak |
22:03
🔗
|
db48x |
(it would be nice if bash had a better debugger; one that let you set breakpoints, and inspect everything that had flowed through a pipeline) |
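Editor's sketch of a poor man's pipeline inspector: a pass-through stage that reports on stderr how many NUL-terminated records reached it. `count_records` and `tap` are hypothetical helpers, not part of iabak. `tap` buffers the whole stream in a temporary file, which is acceptable for debugging but not for production use.

```shell
#!/usr/bin/env bash
# count_records: count NUL-terminated records arriving on stdin.
count_records() {
  local n
  n=$(tr -cd '\000' | wc -c)
  echo $(( n ))          # arithmetic expansion strips any padding from wc
}

# tap LABEL: pass stdin through unchanged and report the record count on stderr.
tap() {
  local label=$1 tmp
  tmp=$(mktemp)
  cat > "$tmp"
  printf '%s: %s records\n' "$label" "$(count_records < "$tmp")" >&2
  cat "$tmp"
  rm -f "$tmp"
}

# Example, using the stage names mentioned in the channel:
#   find_insufficient_copies | tap found | dirname_pipe | tap shortened | sumofbytes | tap summed
```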
22:05
🔗
|
asktoomuc |
it's still downloading on my side. I'm confused |
22:06
🔗
|
db48x |
asktoomuc: :) |
22:06
🔗
|
db48x |
gremlins? |
22:09
🔗
|
asktoomuc |
yeah I don't know. The only thing I changed was the set -x after running the tests you asked me |
22:10
🔗
|
asktoomuc |
anyway, not going to complain |
22:12
🔗
|
thelsdj |
hmmm still not working, it tries to download 2 torrents that i don't have the tools for, but there's still a TON of files that find_insufficient_copies returns but aren't being attempted |
22:32
🔗
|
thelsdj |
i don't quite get shard16, i have 202G and i seem to have everything that there aren't 4 copies of and yet like half the shard is still IA only? are the torrent files really 50+% of the shard? |
22:33
🔗
|
|
jsp12345 has quit IRC (Read error: Operation timed out) |
22:33
🔗
|
thelsdj |
i guess if deewiant has 1.34T then maybe the torrents really are 1.1T total |
22:34
🔗
|
thelsdj |
huh, now git annex list is showing a ton of stuff i don't have but is on web and yet iabak is not downloading it |
22:39
🔗
|
thelsdj |
find_insufficient_copies |tr '\000' '\n'| awk '{print $2}' |xargs -n 100 ../git-annex.linux/git annex list|grep "^__X_"|wc -l |
22:39
🔗
|
thelsdj |
3874 |
22:40
🔗
|
thelsdj |
so by my basic understanding, iabak _should_ be trying to download these? |
22:46
🔗
|
iabak-reg |
registrar master c9abe05 other SHARD3/pubkeys registration of mitch on SHARD3 |
22:53
🔗
|
db48x |
thelsdj: we're also having some trouble with the stats on the website |
22:53
🔗
|
thelsdj |
i think i also may be hitting a bash pipe limit |
22:54
🔗
|
db48x |
yes. it sounds to me like it's not processing that pipeline correctly or something |
22:54
🔗
|
thelsdj |
the output of find_insufficient_copies is 490k but i think my bash gives up at 256k |
22:54
🔗
|
db48x |
well, it's not like a pipe can only transfer a limited amount of data |
22:55
🔗
|
thelsdj |
right, it doesn't make sense but stuffs not coming out the end, even if i remove all the steps |
22:55
🔗
|
db48x |
weird |
22:56
🔗
|
db48x |
what is find_insufficient_copies |tr '\000' '\n'|wc -l |
22:57
🔗
|
|
komarEX has joined #internetarchive.bak |
22:57
🔗
|
komarEX |
hey everyone |
22:58
🔗
|
komarEX |
is it me or ANNEXGETOPTS disappeared from iabak ? |
22:58
🔗
|
db48x |
komarEX: no, it's still there |
22:58
🔗
|
komarEX |
well the information is but I don't see usage |
22:58
🔗
|
thelsdj |
the 'read' is only getting 383 different filenames |
22:59
🔗
|
thelsdj |
out of ~4800 that start the pipe |
22:59
🔗
|
db48x |
thelsdj: funky |
22:59
🔗
|
komarEX |
I have -J15 in file and it still downloads just one file |
22:59
🔗
|
kurt |
komarEX: glad I'm not the only one noticing that today |
22:59
🔗
|
kurt |
but I can't see any changes that would affect it in the iabak repo |
23:00
🔗
|
komarEX |
Then I guess I have to look into my files backup |
23:00
🔗
|
HCross2 |
Mine is doing that too |
23:01
🔗
|
db48x |
it's right there on line... oh, uh |
23:02
🔗
|
db48x |
sorry about that, I did break it |
23:03
🔗
|
komarEX |
so let's just restart iabak right? |
23:04
🔗
|
db48x |
yep |
23:04
🔗
|
komarEX |
could you confirm 2 things with me |
23:04
🔗
|
db48x |
hopefully :) |
23:04
🔗
|
komarEX |
1. shard4 is 1,41TB ? |
23:04
🔗
|
db48x |
no, shard1 has 2.71 TB of stuff |
23:05
🔗
|
komarEX |
oh |
23:05
🔗
|
komarEX |
ok |
23:05
🔗
|
komarEX |
2. what will happen if I let annex use for ex. 4TB but shard is 2,71 |
23:05
🔗
|
db48x |
it will move on to another shard |
23:05
🔗
|
komarEX |
ok |
23:06
🔗
|
komarEX |
oh |
23:06
🔗
|
komarEX |
btw |
23:06
🔗
|
db48x |
feel free to also peruse the content of the shards and manually run git annex get for any files you would like to use |
23:06
🔗
|
komarEX |
shard4 =/= shard1 I believe :d |
23:06
🔗
|
db48x |
music you'd like to listen to, for instance |
23:07
🔗
|
db48x |
oh, shard 4 is 1.41 TB |
23:07
🔗
|
komarEX |
ok and one more thing |
23:07
🔗
|
db48x |
sure |
23:07
🔗
|
komarEX |
can you deny annex/iabak to download other shards ? |
23:08
🔗
|
db48x |
sure, just edit the repolist |
23:08
🔗
|
db48x |
set them all to "maint" |
23:08
🔗
|
komarEX |
can you tell me which file/command ? |
23:09
🔗
|
db48x |
use your favorite text editor |
23:09
🔗
|
db48x |
it's a very simple file |
23:09
🔗
|
komarEX |
oh I'm blind |
23:09
🔗
|
komarEX |
it's in main dir ok |
23:11
🔗
|
thelsdj |
so it seems to be stopping right before a 10G file, which i obviously have space for, so guessing issue is with: [[ $((${spaceneeded} + ${bytes})) -lt ${spacelimit} ]] |
23:13
🔗
|
db48x |
thelsdj: what are the other variables at the time? |
23:14
🔗
|
db48x |
komarEX: just be aware that this may require you to manually pull future updates |
23:14
🔗
|
komarEX |
db48x: I'm aware |
23:15
🔗
|
|
bwn has quit IRC (Ping timeout: 244 seconds) |
23:15
🔗
|
db48x |
:) |
23:15
🔗
|
thelsdj |
it gives up between these two: XXX:spaceneeded:70256110876,bytes:262,spacelimit:34359738368,archivebot/archiveteam_archivebot_go_081/www.geekhard.fr-shallow-20140726-175015-1f1e6.json |
23:15
🔗
|
thelsdj |
XXX:spaceneeded:80994514176,bytes:10738403300,spacelimit:34359738368,archivebot/archiveteam_archivebot_go_081/www.genealogy.com-inf-20140605-184144-ahip0-000 |
23:16
🔗
|
thelsdj |
00.warc.gz |
23:16
🔗
|
thelsdj |
(i took out the check so i can get output from before and after) |
23:16
🔗
|
komarEX |
db48x: I guess I should just throw this file to .gitignore? |
23:17
🔗
|
db48x |
thelsdj: each time you run it it's going to get the list of files in a different order, and it'll stop at a different place |
23:17
🔗
|
thelsdj |
no, its same space every time |
23:17
🔗
|
thelsdj |
i took out the randomize |
23:17
🔗
|
thelsdj |
since i don't have shuf anyways |
23:17
🔗
|
db48x |
ah |
23:17
🔗
|
db48x |
ok, so why is spaceneeded 70G even though spacelimit is 34G? |
23:18
🔗
|
db48x |
it's supposed to stop when spaceneeded + bytes -gt spacelimit |
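Editor's sketch of the batching rule as stated here: keep adding files until the next one would push the running total past the limit. `select_batch` is a hypothetical name; the `read -d ''` form and the `BYTES FILENAME` record layout follow the trace shown earlier, but this is not the loop from iabak-helper.

```shell
#!/usr/bin/env bash
# select_batch LIMIT: read NUL-terminated "BYTES FILENAME" records from stdin
# and print filenames, one per line, until the next record would exceed LIMIT.
select_batch() {
  local limit=$1 needed=0 bytes filename
  while IFS=' ' read -r -d '' bytes filename; do
    if (( needed + bytes > limit )); then
      break
    fi
    needed=$(( needed + bytes ))
    printf '%s\n' "$filename"
  done
}

# Example:
#   printf '%s\0' '10 a' '20 b' '50 c' '5 d' | select_batch 40
```

Every record fed in counts toward the limit, so files that are already present locally use up the batch unless they are filtered out before this stage.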
23:18
🔗
|
asktoomuc |
I'm still pulling files. I'm at roughly 15G for now, it's quite slow (between 1 and 6MB/s) but I guess I'm not in a hurry. I just hope it will keep working |
23:19
🔗
|
thelsdj |
well, why is spacelimit 34G when I should have like 1.7T available for it |
23:19
🔗
|
db48x |
thelsdj: because I had the idea that iabak should sync the repository more frequently |
23:19
🔗
|
db48x |
so I wrote it to stop after a slightly random threshold in the hopes that it would do so |
23:20
🔗
|
db48x |
(the threshold is 8 hours at 1MB/s) |
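As a rough check of that figure, 8 hours at 1MB/s works out as below. This is only arithmetic; the log does not show how iabak applies the random component, and whether "MB" means 10^6 or 2^20 bytes is an assumption either way.

```bash
#!/usr/bin/env bash
seconds=$((8 * 60 * 60))              # 8 hours in seconds
decimal=$((seconds * 1000 * 1000))    # 1 MB = 10^6 bytes
binary=$((seconds * 1024 * 1024))     # 1 MiB = 2^20 bytes
logged=$((32 * 1024 * 1024 * 1024))   # 32 GiB

echo "${decimal}"   # 28800000000
echo "${binary}"    # 30198988800
echo "${logged}"    # 34359738368, the spacelimit in the debug output
```

The logged spacelimit is exactly 32 GiB, somewhat above either base figure.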
23:21
🔗
|
thelsdj |
ok, so right, theres like 300+ files that it goes through no problem, doesn't take any time |
23:21
🔗
|
thelsdj |
and it always tries to download them |
23:21
🔗
|
thelsdj |
so i guess its blowing through that in like 20 seconds |
23:22
🔗
|
db48x |
metadata and torrents and stuff that fails? |
23:22
🔗
|
thelsdj |
i've manually run the git annex get command for them and theres no output, return value is 0 |
23:22
🔗
|
thelsdj |
doesn't seem to be failures as far as i can tell |
23:22
🔗
|
thelsdj |
just silently succeeds immediately |
23:22
🔗
|
db48x |
then you already have those files |
23:23
🔗
|
db48x |
that may be something we forgot along the way, now that I think about it |
23:23
🔗
|
thelsdj |
right so i already have the files, but they aren't at 4 copies yet, so they aren't filtered out |
23:23
🔗
|
thelsdj |
ok that makes sense |
23:24
🔗
|
db48x |
and now that I've added a threshold, they're clogging up the works |
23:24
🔗
|
thelsdj |
right |
23:24
🔗
|
db48x |
good bug report :) |
23:24
🔗
|
db48x |
--not --copies 4 --not --here? |
23:25
🔗
|
thelsdj |
yeah --not is there |
23:26
🔗
|
db48x |
ah, --not --copies 4 --not --in=here? |
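A sketch of the filter being proposed, written as git-annex matching options. Pairing it with `git annex find` is an assumption (the log does not show which subcommand iabak passes these to), and `--copies=4` is the `=` spelling of the `--copies 4` typed above.

```bash
#!/usr/bin/env bash
# Match files that have fewer than 4 known copies AND whose content is not
# already present in this repository.
match_opts=(--not --copies=4 --not --in=here)

# Print the command instead of running it, so the sketch works outside a
# git-annex repository.
cmd="git annex find ${match_opts[*]}"
echo "${cmd}"
```

Without the second `--not`, files that are already present locally but still below 4 copies keep matching, which is the behaviour thelsdj reported.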
23:27
🔗
|
thelsdj |
oh i get it, ok adding that and trying |
23:29
🔗
|
|
komarEX has quit IRC (Quit: Page closed) |
23:31
🔗
|
thelsdj |
combined with changing sumofbytes to cat, it seems to be downloading again |
23:31
🔗
|
db48x |
ok, so sumofbytes is broken |
23:31
🔗
|
db48x |
it's just an awk script; could your awk be broken? |
23:31
🔗
|
thelsdj |
at least on my system i think it prints only 1 line |
23:31
🔗
|
thelsdj |
yeah its possible my awk is weird |
23:32
🔗
|
thelsdj |
i have busybox awk |
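The real sumofbytes script is not shown in the log. As a hypothetical stand-in, a running-total filter in plain POSIX awk might look like the sketch below; the function name running_total and the "<bytes> <name>" input format are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for sumofbytes: reads "<bytes> <name>" lines and
# prints every line with the running total prepended.
running_total() {
  awk '{ total += $1; printf "%.0f %s\n", total, $0 }'
}

out=$(printf '%s\n' '262 a.json' '10738403300 b.warc.gz' | running_total)
echo "${out}"
# 262 262 a.json
# 10738403562 10738403300 b.warc.gz
```

Using `%.0f` instead of `%d` avoids depending on how a particular awk build formats integers above 2^31. Whether that has anything to do with the busybox behaviour seen here is not established by the log.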
23:32
🔗
|
db48x |
oh |
23:32
🔗
|
db48x |
busybox |
23:33
🔗
|
thelsdj |
i'm surprised that this works as well as it does on my Drobo NAS |
23:33
🔗
|
db48x |
gawk instead? |
23:34
🔗
|
thelsdj |
if git-annex arm build could include gawk that would be great |
23:34
🔗
|
thelsdj |
seeing if i have a gawk elsewhere or available |
23:35
🔗
|
db48x |
heh |
23:36
🔗
|
thelsdj |
not in any obvious places or in the repo for what people have built for the Drobo |
23:38
🔗
|
thelsdj |
yeah so i think if its needed having git-annex include it in its arm binary would be very useful |
23:40
🔗
|
asktoomuc |
am I supposed to see multiple progress bars when concurrent downloads happen? |
23:43
🔗
|
asktoomuc |
because right now it only seems to download files sequentially even though I created the ANNEXGETOPTS file |
23:43
🔗
|
asktoomuc |
iabak@IABAK-VM:/mnt/IABAK/IA.BAK$ cat ANNEXGETOPTS J9 |
23:43
🔗
|
thelsdj |
should be -J9 right? |
23:44
🔗
|
asktoomuc |
indeed, thanks for spotting my mistake |
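A small sketch of the corrected file and a sanity check for it. How iabak actually reads ANNEXGETOPTS is not shown in the log, so the reading step and the warning are assumptions; `-J9` is git-annex's short form for running 9 concurrent jobs.

```bash
#!/usr/bin/env bash
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)

# The file should contain the option with its leading dash.
printf '%s\n' '-J9' > "${workdir}/ANNEXGETOPTS"

# Assumed consumption: split the file's single line into words.
read -r -a annex_opts < "${workdir}/ANNEXGETOPTS"

# Warn about entries such as "J9" that are missing the dash.
for opt in "${annex_opts[@]}"; do
  case "${opt}" in
    -*) ;;
    *) echo "warning: '${opt}' does not look like an option" >&2 ;;
  esac
done

# Show the command that would result, without running it.
echo "git annex get ${annex_opts[*]}"
rm -rf "${workdir}"
```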
23:45
🔗
|
asktoomuc |
time to go to bed I guess, I can't read properly anymore |
23:45
🔗
|
asktoomuc |
thank you all for your help today! |
23:45
🔗
|
|
bwn has joined #internetarchive.bak |
23:48
🔗
|
db48x |
asktoomuc: you're welcome! |
23:48
🔗
|
db48x |
asktoomuc: thanks for helping us out |
23:48
🔗
|
db48x |
thelsdj: git-annex doesn't need awk at all |
23:48
🔗
|
db48x |
iabak does, but iabak isn't going to distribute awk |
23:48
🔗
|
db48x |
I mean, we could distribute bash too, and perl |
23:48
🔗
|
db48x |
so I'd rather not distribute any of them |
23:49
🔗
|
yipdw |
i have been on this weird rust kick lately |
23:49
🔗
|
thelsdj |
yeah, i mean it's worth discussing, as embedded NASes are a good source of space, so it would be nice to be able to run on them without much problem |
23:54
🔗
|
db48x |
thelsdj: I'd rather have a few lines at the top of the file like AWK=awk that people can edit |
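One way the editable setting suggested here could look. The `${AWK:-awk}` default, which also lets the environment override the value, and the early check are additions for illustration, not something the log says iabak does.

```bash
#!/usr/bin/env bash
# Editable tool name at the top of the script, e.g. change to gawk.
AWK=${AWK:-awk}

# Fail early with a clear message if the chosen awk is missing.
if ! command -v "${AWK}" >/dev/null 2>&1; then
  echo "error: ${AWK} not found; edit AWK= at the top of this file" >&2
  exit 1
fi

# Every later call goes through the variable instead of a hard-coded name.
sum=$(printf '%s\n' 1 2 3 | "${AWK}" '{ s += $1 } END { print s }')
echo "${sum}"   # prints 6
```

On a system like the Drobo, running the script as `AWK=/path/to/gawk ./script` would then pick a different awk without editing the file; the path and script name are placeholders.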
23:57
🔗
|
db48x |
I added some things to the readme |
23:57
🔗
|
db48x |
anything else we've covered today that I've forgotten? |