Time |
Nickname |
Message |
00:04
🔗
|
|
cmaldonad has joined #internetarchive.bak |
00:06
🔗
|
|
cmaldonad has quit IRC (Client Quit) |
00:17
🔗
|
sevs |
So, I've got a problem |
00:17
🔗
|
sevs |
iabak repeatedly hangs on some random item |
00:18
🔗
|
sevs |
just sits there and nothing is happening until I CTRL-C and restart |
00:18
🔗
|
sevs |
no network traffic, no cpu usage, nothing |
00:36
🔗
|
SketchCow |
I don't necessarily want to support SMB |
00:36
🔗
|
SketchCow |
But I want our docs to go "so, you're saddled with SMB - do this" |
00:37
🔗
|
db48x |
SketchCow: :) |
00:37
🔗
|
db48x |
sevs: what does it say before it hangs? |
00:38
🔗
|
sevs |
db48x: nothing, says "get starr/kajongsonsaengmu038800/kajongsonsaengmu038800_scandata.xml (from web...)" and nothing after that |
00:39
🔗
|
db48x |
ok, so perhaps it's a network thing? |
00:40
🔗
|
db48x |
go into that shard's directory and run ../git-annex.linux/git-annex get starr/kajongsonsaengmu038800/kajongsonsaengmu038800_scandata.xml |
00:40
🔗
|
sevs |
hmm, perhaps? I'm ssh'd into the machine, when I restart it it works again for a couple items |
00:41
🔗
|
sevs |
and it happens on two machines, two different shards, two different networks |
00:41
🔗
|
db48x |
hmm |
00:41
🔗
|
sevs |
sec |
00:41
🔗
|
sevs |
ok, that works |
00:42
🔗
|
sevs |
downloads the one file and everything is fine |
00:44
🔗
|
db48x |
and you say that it does download files for a while and then hangs |
00:44
🔗
|
db48x |
intermittent problem? |
00:44
🔗
|
sevs |
now i've started iabak again, downloads two items and sits on the third |
00:45
🔗
|
db48x |
if you leave it running for a while while it's hung, does it time out and start going again? |
00:48
🔗
|
sevs |
http://pastebin.com/WhmSYAc1 |
00:48
🔗
|
sevs |
it does not timeout |
00:48
🔗
|
sevs |
had it sitting there for more than 20 minutes |
00:49
🔗
|
db48x |
the message about not being able to verify the content is fine; that happens for some metadata for which we don't have hashes |
00:49
🔗
|
db48x |
but not timing out is weird |
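Since the hung `git-annex get` never times out on its own, one workaround is to impose a limit from outside. The following is a minimal sketch, not part of iabak: `run_with_timeout` and the 600-second limit are hypothetical, and the path in the usage comment is the one quoted earlier in the log.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its exit code, or None if it ran longer than timeout_s.

    subprocess.run kills the child itself when the timeout expires.
    """
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None


# Hypothetical usage from inside a shard directory:
# run_with_timeout(
#     ["../git-annex.linux/git-annex", "get",
#      "starr/kajongsonsaengmu038800/kajongsonsaengmu038800_scandata.xml"],
#     timeout_s=600,
# )
```

Only the direct child is killed on timeout; any helper processes it spawned may need separate cleanup.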
00:50
🔗
|
sevs |
i have iftop open in another window and aside from some broadcasts there is literally no traffic |
00:51
🔗
|
sevs |
yeah no idea what the cause could be |
00:51
🔗
|
sevs |
was working fine yesterday |
00:51
🔗
|
db48x |
use tcpdump to capture the actual packets? |
00:52
🔗
|
sevs |
uhhh |
00:53
🔗
|
sevs |
would need you to walk me through that or some tutorial |
00:56
🔗
|
db48x |
:) |
00:56
🔗
|
db48x |
I have to eat, but I'll be back in a bit |
00:56
🔗
|
db48x |
perhaps if you google for instructions for using tcpdump to capture the traffic generated by a program you'll find something useful |
00:56
🔗
|
db48x |
bbl |
00:57
🔗
|
sevs |
ok, I'll be waiting |
01:09
🔗
|
sevs |
db48x: is it even possible to filter that traffic by pid? |
01:17
🔗
|
sevs |
so, "tcpdump -i bond0 -w tcp.dump" should capture everything, later on i can then take a look at it with wireshark |
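On the pid question: tcpdump filter expressions match on packet fields such as host, port, and protocol, not on the owning process, so a capture is usually narrowed by peer or port instead. A small sketch that builds such a command line follows. `build_capture_cmd` is a hypothetical helper, `bond0` and `tcp.dump` come from the message above, and the HTTP/HTTPS port filter is an assumption based on the "from web" downloads.

```python
import shlex


def build_capture_cmd(interface, out_file, filter_expr=None):
    """Build a tcpdump argv that writes raw packets to out_file.

    filter_expr, if given, is passed as a single capture filter expression.
    """
    cmd = ["tcpdump", "-i", interface, "-w", out_file]
    if filter_expr:
        cmd.append(filter_expr)
    return cmd


# Assumed filter: only web traffic, to keep the capture file small.
cmd = build_capture_cmd("bond0", "tcp.dump", "tcp port 80 or tcp port 443")
print(shlex.join(cmd))
```

The resulting file can still be opened in wireshark afterwards, as planned above.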
01:55
🔗
|
sevs |
ofc now it works without problem ffs |
02:20
🔗
|
|
svchfoo3 sets mode: +o balrog |
02:23
🔗
|
db48x |
sevs: heh, that figures :) |
03:55
🔗
|
|
db48x has quit IRC (Quit: new ssd) |
04:04
🔗
|
|
kyan has quit IRC (Quit: Leaving) |
06:19
🔗
|
|
bwn has quit IRC (Ping timeout: 244 seconds) |
06:50
🔗
|
|
Start has joined #internetarchive.bak |
08:00
🔗
|
|
bwn has joined #internetarchive.bak |
08:03
🔗
|
|
yipdw has quit IRC (Read error: Operation timed out) |
08:04
🔗
|
|
yipdw has joined #internetarchive.bak |
08:04
🔗
|
|
svchfoo1 sets mode: +o yipdw |
08:31
🔗
|
|
atomotic has joined #internetarchive.bak |
08:53
🔗
|
|
sevs has quit IRC (Ping timeout: 268 seconds) |
08:55
🔗
|
|
zhongfu has quit IRC (Ping timeout: 260 seconds) |
09:01
🔗
|
|
zhongfu has joined #internetarchive.bak |
10:41
🔗
|
iabak-reg |
registrar master 307eced other SHARD4/pubkeys registration of mr.business1148 on SHARD4 |
10:41
🔗
|
iabak-reg |
registrar master af79bb9 other SHARD4/pubkeys registration of mr.business1148 on SHARD4 |
10:41
🔗
|
iabak-reg |
registrar master 4958ddc other SHARD4/pubkeys registration of mr.business1148 on SHARD4 |
10:41
🔗
|
iabak-reg |
registrar master d394608 other SHARD4/pubkeys registration of mr.business1148 on SHARD4 |
10:45
🔗
|
|
bwn has quit IRC (Ping timeout: 244 seconds) |
10:55
🔗
|
|
bwn has joined #internetarchive.bak |
11:04
🔗
|
|
atomotic has quit IRC (Remote host closed the connection) |
11:08
🔗
|
|
atomotic has joined #internetarchive.bak |
12:19
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
12:35
🔗
|
iabak-reg |
registrar master 642b216 other SHARD10/pubkeys registration of mr.business1148 on SHARD10 |
12:35
🔗
|
iabak-reg |
registrar master d510cf1 other SHARD10/pubkeys registration of mr.business1148 on SHARD10 |
12:43
🔗
|
|
atomotic has joined #internetarchive.bak |
13:20
🔗
|
iabak-reg |
registrar master aff34a8 other SHARD14/pubkeys registration of mr.business1148 on SHARD14 |
13:20
🔗
|
iabak-reg |
registrar master 33bb081 other SHARD14/pubkeys registration of mr.business1148 on SHARD14 |
13:20
🔗
|
iabak-reg |
registrar master 1520d84 other SHARD14/pubkeys registration of mr.business1148 on SHARD14 |
13:21
🔗
|
iabak-reg |
registrar master 0125bf8 other SHARD14/pubkeys registration of mr.business1148 on SHARD14 |
15:03
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
15:04
🔗
|
Jon |
1008G 928G 30G 97% /home/iabak \o/ |
15:06
🔗
|
|
sep332 has quit IRC (Konversation terminated!) |
16:09
🔗
|
|
atomotic has joined #internetarchive.bak |
16:12
🔗
|
|
milenko has quit IRC (Ping timeout: 250 seconds) |
16:13
🔗
|
|
milenko has joined #internetarchive.bak |
16:18
🔗
|
|
godane has quit IRC (Quit: Leaving.) |
16:33
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
16:35
🔗
|
SketchCow |
Hey, so, this brings up a question. |
16:35
🔗
|
SketchCow |
Here we have registration of Mr. Business multiple times on each shard |
16:35
🔗
|
SketchCow |
Does this mean the same person has two copies of the data? |
16:36
🔗
|
Senji |
It *probably* means the client has bugged out again and done multiple registrations. But I can't tell from just the info in this channel. |
16:44
🔗
|
|
asktoomuc has joined #internetarchive.bak |
17:02
🔗
|
|
kyan has joined #internetarchive.bak |
17:22
🔗
|
|
minus_ has left WeeChat 1.6 |
17:27
🔗
|
|
kyan has quit IRC (Remote host closed the connection) |
17:30
🔗
|
Frogging |
oh looks like I got the hanging problem too |
17:34
🔗
|
Frogging |
I'll just restart it for now |
17:46
🔗
|
|
kyan has joined #internetarchive.bak |
17:51
🔗
|
iabak-reg |
registrar master 6777bca other SHARD14/pubkeys registration of mitch on SHARD14 |
18:02
🔗
|
|
CyberJaco is now known as zz_CyberJ |
18:20
🔗
|
|
luckcolor has quit IRC (Remote host closed the connection) |
18:21
🔗
|
|
luckcolor has joined #internetarchive.bak |
18:25
🔗
|
|
luckcolor has quit IRC (Read error: Connection reset by peer) |
18:26
🔗
|
|
luckcolor has joined #internetarchive.bak |
18:28
🔗
|
|
luckcolor has quit IRC (Remote host closed the connection) |
18:31
🔗
|
|
luckcolor has joined #internetarchive.bak |
18:39
🔗
|
|
Start_ has joined #internetarchive.bak |
18:39
🔗
|
|
Start has quit IRC (Read error: Connection reset by peer) |
18:40
🔗
|
|
kyan has quit IRC (Remote host closed the connection) |
18:48
🔗
|
|
luckcolor has quit IRC (Read error: Connection reset by peer) |
18:56
🔗
|
|
luckcolor has joined #internetarchive.bak |
19:01
🔗
|
|
bwn has quit IRC (Ping timeout: 961 seconds) |
19:04
🔗
|
|
kyan has joined #internetarchive.bak |
20:05
🔗
|
|
db48x has joined #internetarchive.bak |
20:05
🔗
|
db48x |
big new SSD :) |
20:24
🔗
|
|
Start_ is now known as Start |
20:28
🔗
|
SketchCow |
closure: I notice the graph on the bottom of http://iabak.archiveteam.org/ has died |
20:32
🔗
|
|
bwn has joined #internetarchive.bak |
20:36
🔗
|
db48x |
hmm |
20:44
🔗
|
db48x |
I guess flatlining is a form of death |
20:47
🔗
|
db48x |
HCross: ping? |
20:48
🔗
|
Kaz |
I'll go and have a look |
20:49
🔗
|
Kaz |
oh wait that's from grafana or whatever |
20:49
🔗
|
db48x |
Kaz: do we have yet a file detailing the status of present and future shards, for coordination? |
20:49
🔗
|
Kaz |
as far as I'm aware, we do not |
20:49
🔗
|
Kaz |
by the looks of it, nothing is being pushed into graphite |
20:50
🔗
|
db48x |
https://github.com/ArchiveTeam/IA.BAK/blob/server/shardstats#L159 |
20:54
🔗
|
Kaz |
server has uptime of 3 days, graph died 3 days ago |
20:55
🔗
|
db48x |
Last update: Fri Nov 18 15:30:01 EST 2016 |
20:55
🔗
|
db48x |
run it by hand with -x and see what it's doing? |
20:56
🔗
|
Kaz |
that's a pretty big script |
20:56
🔗
|
Kaz |
as far as I can tell, it does 'things' |
20:57
🔗
|
db48x |
true |
20:57
🔗
|
db48x |
but with -x you can follow along |
21:06
🔗
|
Kaz |
I have either broken it, or fixed it |
21:06
🔗
|
Kaz |
only time will tell |
21:07
🔗
|
db48x |
heh |
21:08
🔗
|
db48x |
how did you fix it? |
21:11
🔗
|
Kaz |
well let's not get too hasty, the graph hasn't updated yet |
21:12
🔗
|
db48x |
:) |
21:12
🔗
|
db48x |
what change did you make that you hope will fix it? |
21:12
🔗
|
Kaz |
https://github.com/ArchiveTeam/IA.BAK/blob/propellor/IABak.hs#L37 |
21:12
🔗
|
Kaz |
as far as I can tell, these cronjobs don't exist on the server |
21:14
🔗
|
db48x |
ah, clever |
21:16
🔗
|
|
sep332_ is now known as sep332 |
21:16
🔗
|
Kaz |
this doesn't seem to have worked |
21:17
🔗
|
db48x |
it should run on the half-hour |
21:17
🔗
|
db48x |
give it another 15 minutes? |
21:17
🔗
|
Kaz |
well |
21:18
🔗
|
Kaz |
I decided to be brave and ran it myself once first, to see if it updated the graph |
21:18
🔗
|
Kaz |
it did not |
21:18
🔗
|
db48x |
hrm |
21:19
🔗
|
db48x |
did you run it with -x? |
21:19
🔗
|
Kaz |
no |
21:19
🔗
|
db48x |
oh |
21:19
🔗
|
db48x |
if you had, you could scroll back and figure out what went wrong |
21:20
🔗
|
db48x |
(I set my tmux to hold a million lines of buffer; it's very handy even if it can waste a gigabyte of memory) |
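An alternative to relying on terminal scrollback is to save the `-x` trace to a file. The sketch below assumes the stats script is a shell script that bash can run, which the `-x` suggestion implies but the log does not confirm. `run_traced` and the log file name are hypothetical.

```python
import subprocess


def run_traced(script_path, log_path):
    """Run a shell script under `bash -x`, saving its output and trace to log_path.

    Returns the script's exit code.
    """
    with open(log_path, "w") as log:
        result = subprocess.run(
            ["bash", "-x", script_path],
            stdout=log,
            stderr=subprocess.STDOUT,  # the -x trace is written to stderr
        )
    return result.returncode


# Hypothetical usage:
# run_traced("./shardstats", "shardstats-trace.log")
```

Each traced command appears in the log prefixed with `+`, so the last few lines usually show where a run stopped or failed.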
21:28
🔗
|
Kaz |
yipdw around? |
21:34
🔗
|
Kaz |
db48x: so by the looks of it it's going through sendstat with the values we need, but graphite isn't actually doing anything with the data |
21:34
🔗
|
db48x |
hmm |
21:35
🔗
|
db48x |
is it listening on the UDP port? |
21:36
🔗
|
db48x |
port 2003 |
21:36
🔗
|
Kaz |
I see tcp on 2003&2004 |
21:36
🔗
|
db48x |
no, we use udp |
21:37
🔗
|
Kaz |
yes |
21:37
🔗
|
Kaz |
I'm trying to work out if it's my netstat flags that are the issue |
21:37
🔗
|
db48x |
err, or do we? |
21:37
🔗
|
db48x |
netstat -4lnp is what I use |
21:37
🔗
|
db48x |
ok, I guess we're actually using tcp |
21:38
🔗
|
db48x |
https://github.com/ArchiveTeam/IA.BAK/blob/73394b30ac6684e2c06eabca5fdc663b80e0d4a4/shardstats#L25 |
21:38
🔗
|
db48x |
check graphite's logs, I guess |
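To test graphite by hand, independent of the stats script, one option is to check that the TCP listener accepts connections and then push a single test value. This sketch relies on Carbon's plaintext protocol, which conventionally takes one `path value timestamp` line per metric on TCP port 2003. The function names, host, and metric path are placeholders, not values from the IA.BAK server.

```python
import socket
import time


def tcp_port_open(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def format_metric(path, value, timestamp=None):
    """Format one plaintext-protocol line: path, value, Unix timestamp, newline."""
    if timestamp is None:
        timestamp = time.time()
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(host, port, path, value, timestamp=None, timeout_s=5.0):
    """Send one metric line over TCP and return the line that was sent."""
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        sock.sendall(line.encode("ascii"))
    return line


# Hypothetical usage against a local graphite instance:
# if tcp_port_open("localhost", 2003):
#     send_metric("localhost", 2003, "iabak.test.manual", 1)
```

If the port is open and the test value still never shows up in a graph, graphite's own logs are the next place to look, as suggested above.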
22:05
🔗
|
|
komarEX has joined #internetarchive.bak |
22:06
🔗
|
komarEX |
hello |
22:06
🔗
|
komarEX |
I'm getting a lot of "Unable to access these remotes: web" errors |
22:06
🔗
|
komarEX |
what do? |
22:10
🔗
|
Kaz |
komarEX: don't worry too much, most likely it's just darked (hidden) items from IA |
22:11
🔗
|
Kaz |
does it have any other error messages other than the "unable to access remotes"? |
22:11
🔗
|
komarEX |
Kaz: a lot of "MD5-something" gets |
22:12
🔗
|
komarEX |
Kaz: http://pastebin.com/raw/8UXqifh0 |
22:12
🔗
|
komarEX |
there are way too many errors like these |
22:13
🔗
|
Kaz |
which shard is this? |
22:14
🔗
|
|
sep332 is now known as sep332_ |
22:15
🔗
|
komarEX |
Kaz: shard4 |
22:16
🔗
|
Kaz |
ah right okay |
22:16
🔗
|
Kaz |
yeah, 99% nothing to worry about. darked items are something we can't really avoid as such |
22:17
🔗
|
Kaz |
unless you're seeing it for *every* item |
22:17
🔗
|
komarEX |
Kaz: in the meantime I saw maybe 2-3 files that downloaded normally |
22:17
🔗
|
db48x |
it also frequently happens for metadata files, since we don't have a hash for them |
22:21
🔗
|
komarEX |
when I'm at it |
22:21
🔗
|
komarEX |
why doesn't my local shard size update on the site? http://iabak.archiveteam.org/client/b32d595395f444f44bd26c500db785fc228d3bed.html |
22:22
🔗
|
komarEX |
stays at 80GB when I have about 600 or so already downloaded |
22:31
🔗
|
db48x |
komarEX: we're looking at a problem with the stats at the moment :) |
22:31
🔗
|
komarEX |
db48x: ah ok |
23:06
🔗
|
|
komarEX has quit IRC (Quit: Page closed) |
23:17
🔗
|
|
asktoomuc has quit IRC (Quit: Page closed) |
23:46
🔗
|
|
bwn has quit IRC (Ping timeout: 244 seconds) |