#internetarchive.bak 2016-11-18,Fri


Time Nickname Message
00:04 🔗 cmaldonad has joined #internetarchive.bak
00:06 🔗 cmaldonad has quit IRC (Client Quit)
00:17 🔗 sevs So, I've got a problem
00:17 🔗 sevs iabak repeatedly hangs on some random item
00:18 🔗 sevs just sits there and nothing is happening until I CTRL-C and restart
00:18 🔗 sevs no network traffic, no cpu usage, nothing
00:36 🔗 SketchCow I don't necessarily want to support SMB
00:36 🔗 SketchCow But I want our docs to go "so, you're saddled with SMB - do this"
00:37 🔗 db48x SketchCow: :)
00:37 🔗 db48x sevs: what does it say before it hangs?
00:38 🔗 sevs db48x: nothing, says "get starr/kajongsonsaengmu038800/kajongsonsaengmu038800_scandata.xml (from web...)" and nothing after that
00:39 🔗 db48x ok, so perhaps it's a network thing?
00:40 🔗 db48x go into that shard's directory and run ../git-annex.linux/git-annex get starr/kajongsonsaengmu038800/kajongsonsaengmu038800_scandata.xml
00:40 🔗 sevs hmm, perhaps? I'm ssh'd into the machine, when I restart it it works again for a couple items
00:41 🔗 sevs and it happens on two machines, two different shards, two different networks
00:41 🔗 db48x hmm
00:41 🔗 sevs sec
00:41 🔗 sevs ok, that works
00:42 🔗 sevs downloads the one file and everything is fine
00:44 🔗 db48x and you say that it does download files for a while and then hangs
00:44 🔗 db48x intermittent problem?
00:44 🔗 sevs now i've started iabak again, downloads two items and sits on the third
00:45 🔗 db48x if you leave it running for a while while it's hung, does it time out and start going again?
00:48 🔗 sevs http://pastebin.com/WhmSYAc1
00:48 🔗 sevs it does not timeout
00:48 🔗 sevs had it sitting there for more than 20 minutes
00:49 🔗 db48x the message about not being able to verify the content is fine; that happens for some metadata for which we don't have hashes
00:49 🔗 db48x but not timing out is weird
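
The missing timeout discussed above can be approximated from the outside with a watchdog. This is a minimal sketch, not something iabak or git-annex does itself: `run_with_timeout` is a hypothetical helper name, and the command in the usage example is a harmless stand-in for the real download command. It kills only the direct child process, so helper processes that the child spawned may need separate handling.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its exit code, or None if it exceeded timeout_s seconds.

    Hypothetical helper for illustration; not part of iabak or git-annex.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return None
    return completed.returncode


if __name__ == "__main__":
    # Stand-in command (assumption): replace with the real download command.
    result = run_with_timeout([sys.executable, "-c", "print('done')"], 60)
    print("timed out" if result is None else f"exit code {result}")
```

A wrapper script could call this in a loop and restart the download whenever it returns `None`, which is roughly what sevs was doing by hand with CTRL-C.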
00:50 🔗 sevs i have iftop open in another window and aside from some broadcasts there is literally no traffic
00:51 🔗 sevs yeah no idea what the cause could be
00:51 🔗 sevs was working fine yesterday
00:51 🔗 db48x use tcpdump to capture the actual packets?
00:52 🔗 sevs uhhh
00:53 🔗 sevs would need you to walk me through that or some tutorial
00:56 🔗 db48x :)
00:56 🔗 db48x I have to eat, but I'll be back in a bit
00:56 🔗 db48x perhaps if you google for instructions for using tcpdump to capture the traffic generated by a program you'll find something useful
00:56 🔗 db48x bbl
00:57 🔗 sevs ok, I'll be waiting
01:09 🔗 sevs db48x: is it even possible to filter that traffic by pid?
01:17 🔗 sevs so, "tcpdump -i bond0 -w tcp.dump" should capture everything, later on i can then take a look at it with wireshark
01:55 🔗 sevs ofc now it works without problem ffs
02:20 🔗 svchfoo3 sets mode: +o balrog
02:23 🔗 db48x sevs: heh, that figures :)
03:55 🔗 db48x has quit IRC (Quit: new ssd)
04:04 🔗 kyan has quit IRC (Quit: Leaving)
06:19 🔗 bwn has quit IRC (Ping timeout: 244 seconds)
06:50 🔗 Start has joined #internetarchive.bak
08:00 🔗 bwn has joined #internetarchive.bak
08:03 🔗 yipdw has quit IRC (Read error: Operation timed out)
08:04 🔗 yipdw has joined #internetarchive.bak
08:04 🔗 svchfoo1 sets mode: +o yipdw
08:31 🔗 atomotic has joined #internetarchive.bak
08:53 🔗 sevs has quit IRC (Ping timeout: 268 seconds)
08:55 🔗 zhongfu has quit IRC (Ping timeout: 260 seconds)
09:01 🔗 zhongfu has joined #internetarchive.bak
10:41 🔗 iabak-reg registrar master 307eced other SHARD4/pubkeys registration of mr.business1148 on SHARD4
10:41 🔗 iabak-reg registrar master af79bb9 other SHARD4/pubkeys registration of mr.business1148 on SHARD4
10:41 🔗 iabak-reg registrar master 4958ddc other SHARD4/pubkeys registration of mr.business1148 on SHARD4
10:41 🔗 iabak-reg registrar master d394608 other SHARD4/pubkeys registration of mr.business1148 on SHARD4
10:45 🔗 bwn has quit IRC (Ping timeout: 244 seconds)
10:55 🔗 bwn has joined #internetarchive.bak
11:04 🔗 atomotic has quit IRC (Remote host closed the connection)
11:08 🔗 atomotic has joined #internetarchive.bak
12:19 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
12:35 🔗 iabak-reg registrar master 642b216 other SHARD10/pubkeys registration of mr.business1148 on SHARD10
12:35 🔗 iabak-reg registrar master d510cf1 other SHARD10/pubkeys registration of mr.business1148 on SHARD10
12:43 🔗 atomotic has joined #internetarchive.bak
13:20 🔗 iabak-reg registrar master aff34a8 other SHARD14/pubkeys registration of mr.business1148 on SHARD14
13:20 🔗 iabak-reg registrar master 33bb081 other SHARD14/pubkeys registration of mr.business1148 on SHARD14
13:20 🔗 iabak-reg registrar master 1520d84 other SHARD14/pubkeys registration of mr.business1148 on SHARD14
13:21 🔗 iabak-reg registrar master 0125bf8 other SHARD14/pubkeys registration of mr.business1148 on SHARD14
15:03 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
15:04 🔗 Jon 1008G 928G 30G 97% /home/iabak \o/
15:06 🔗 sep332 has quit IRC (Konversation terminated!)
16:09 🔗 atomotic has joined #internetarchive.bak
16:12 🔗 milenko has quit IRC (Ping timeout: 250 seconds)
16:13 🔗 milenko has joined #internetarchive.bak
16:18 🔗 godane has quit IRC (Quit: Leaving.)
16:33 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
16:35 🔗 SketchCow Hey, so, this brings up a question.
16:35 🔗 SketchCow Here we have registration of Mr. Business multiple times on each shard
16:35 🔗 SketchCow Does this mean the same person has two copies of the data?
16:36 🔗 Senji It *probably* means the client has bugged out again and done multiple registrations. But I can't tell from just the info in this channel.
16:44 🔗 asktoomuc has joined #internetarchive.bak
17:02 🔗 kyan has joined #internetarchive.bak
17:22 🔗 minus_ has left WeeChat 1.6
17:27 🔗 kyan has quit IRC (Remote host closed the connection)
17:30 🔗 Frogging oh looks like I got the hanging problem too
17:34 🔗 Frogging I'll just restart it for now
17:46 🔗 kyan has joined #internetarchive.bak
17:51 🔗 iabak-reg registrar master 6777bca other SHARD14/pubkeys registration of mitch on SHARD14
18:02 🔗 CyberJaco is now known as zz_CyberJ
18:20 🔗 luckcolor has quit IRC (Remote host closed the connection)
18:21 🔗 luckcolor has joined #internetarchive.bak
18:25 🔗 luckcolor has quit IRC (Read error: Connection reset by peer)
18:26 🔗 luckcolor has joined #internetarchive.bak
18:28 🔗 luckcolor has quit IRC (Remote host closed the connection)
18:31 🔗 luckcolor has joined #internetarchive.bak
18:39 🔗 Start_ has joined #internetarchive.bak
18:39 🔗 Start has quit IRC (Read error: Connection reset by peer)
18:40 🔗 kyan has quit IRC (Remote host closed the connection)
18:48 🔗 luckcolor has quit IRC (Read error: Connection reset by peer)
18:56 🔗 luckcolor has joined #internetarchive.bak
19:01 🔗 bwn has quit IRC (Ping timeout: 961 seconds)
19:04 🔗 kyan has joined #internetarchive.bak
20:05 🔗 db48x has joined #internetarchive.bak
20:05 🔗 db48x big new SSD :)
20:24 🔗 Start_ is now known as Start
20:28 🔗 SketchCow closure: I notice the graph on the bottom of http://iabak.archiveteam.org/ has died
20:32 🔗 bwn has joined #internetarchive.bak
20:36 🔗 db48x hmm
20:44 🔗 db48x I guess flatlining is a form of death
20:47 🔗 db48x HCross: ping?
20:48 🔗 Kaz I'll go and have a look
20:49 🔗 Kaz oh wait that's from grafana or whatever
20:49 🔗 db48x Kaz: do we have a file yet detailing the status of present and future shards, for coordination?
20:49 🔗 Kaz as far as I'm aware, we do not
20:49 🔗 Kaz by the looks of it, nothing is being pushed into graphite
20:50 🔗 db48x https://github.com/ArchiveTeam/IA.BAK/blob/server/shardstats#L159
20:54 🔗 Kaz server has uptime of 3 days, graph died 3 days ago
20:55 🔗 db48x Last update: Fri Nov 18 15:30:01 EST 2016
20:55 🔗 db48x run it by hand with -x and see what it's doing?
20:56 🔗 Kaz that's a pretty big script
20:56 🔗 Kaz as far as I can tell, it does 'things'
20:57 🔗 db48x true
20:57 🔗 db48x but with -x you can follow along
21:06 🔗 Kaz I have either broken it, or fixed it
21:06 🔗 Kaz only time will tell
21:07 🔗 db48x heh
21:08 🔗 db48x how did you fix it?
21:11 🔗 Kaz well let's not get too hasty, the graph hasn't updated yet
21:12 🔗 db48x :)
21:12 🔗 db48x what change did you make that you hope will fix it?
21:12 🔗 Kaz https://github.com/ArchiveTeam/IA.BAK/blob/propellor/IABak.hs#L37
21:12 🔗 Kaz as far as I can tell, these cronjobs don't exist on the server
21:14 🔗 db48x ah, clever
21:16 🔗 sep332_ is now known as sep332
21:16 🔗 Kaz this doesn't seem to have worked
21:17 🔗 db48x it should run on the half-hour
21:17 🔗 db48x give it another 15 minutes?
21:17 🔗 Kaz well
21:18 🔗 Kaz I decided to be brave and ran it myself once first, to see if it updated the graph
21:18 🔗 Kaz it did not
21:18 🔗 db48x hrm
21:19 🔗 db48x did you run it with -x?
21:19 🔗 Kaz no
21:19 🔗 db48x oh
21:19 🔗 db48x if you had, you could scroll back and figure out what went wrong
21:20 🔗 db48x (I set my tmux to hold a million lines of buffer; it's very handy even if it can waste a gigabyte of memory)
21:28 🔗 Kaz yipdw around?
21:34 🔗 Kaz db48x: so by the looks of it it's going through sendstat with the values we need, but graphite isn't actually doing anything with the data
21:34 🔗 db48x hmm
21:35 🔗 db48x is it listening on the UDP port?
21:36 🔗 db48x port 2003
21:36 🔗 Kaz I see tcp on 2003&2004
21:36 🔗 db48x no, we use udp
21:37 🔗 Kaz yes
21:37 🔗 Kaz I'm trying to work out if it's my netstat flags that are the issue
21:37 🔗 db48x err, or do we?
21:37 🔗 db48x netstat -4lnp is what I use
21:37 🔗 db48x ok, I guess we're actually using tcp
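
The netstat check above can be complemented by testing the port from the client side. This is a rough sketch under assumptions: `tcp_port_accepts` is a hypothetical helper name, and the host in the usage comment is a placeholder for wherever graphite actually runs. It confirms only that something accepts TCP connections on the port, not that it is graphite or that it processes the data.

```python
import socket


def tcp_port_accepts(host, port, timeout_s=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout_s.

    Hypothetical helper for illustration only.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Usage (placeholder host, assumption): tcp_port_accepts("127.0.0.1", 2003)
```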
21:38 🔗 db48x https://github.com/ArchiveTeam/IA.BAK/blob/73394b30ac6684e2c06eabca5fdc663b80e0d4a4/shardstats#L25
21:38 🔗 db48x check graphite's logs, I guess
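
To rule out the sending side, a single test metric can be pushed by hand using graphite's plaintext protocol, which is one `path value timestamp` line per metric over TCP (port 2003 by default). This is a minimal sketch and not the sendstat code from shardstats: the function names, the metric path, and the host in the usage comment are made up for illustration.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line of graphite's plaintext protocol: 'path value timestamp\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(host, port, path, value, timestamp=None, timeout_s=5.0):
    """Send a single metric line over TCP and close the connection."""
    line = format_metric(path, value, timestamp).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        sock.sendall(line)


# Usage (hypothetical metric name and host, assumptions):
# send_metric("127.0.0.1", 2003, "iabak.example.test_metric", 1)
```

If the line is accepted on the socket but never shows up in a graph, the problem is more likely on the graphite side, which is where its logs come in.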
22:05 🔗 komarEX has joined #internetarchive.bak
22:06 🔗 komarEX hello
22:06 🔗 komarEX I'm getting a lot of "Unable to access these remotes: web" errors
22:06 🔗 komarEX what do?
22:10 🔗 Kaz komarEX: don't worry too much, most likely it's just darked (hidden) items from IA
22:11 🔗 Kaz does it have any other error messages other than the "unable to access remotes"?
22:11 🔗 komarEX Kaz: a lot of "MD5-something" gets
22:12 🔗 komarEX Kaz: http://pastebin.com/raw/8UXqifh0
22:12 🔗 komarEX way too many errors like these
22:13 🔗 Kaz which shard is this?
22:14 🔗 sep332 is now known as sep332_
22:15 🔗 komarEX Kaz: shard4
22:16 🔗 Kaz ah right okay
22:16 🔗 Kaz yeah, 99% nothing to worry about. darked items are something we can't really avoid as such
22:17 🔗 Kaz unless you're seeing it for *every* item
22:17 🔗 komarEX Kaz: in the meantime I saw maybe 2-3 files that downloaded normally
22:17 🔗 db48x it also frequently happens for metadata files, since we don't have a hash for them
22:21 🔗 komarEX when I'm at it
22:21 🔗 komarEX why my local shard size doesn't update on site? http://iabak.archiveteam.org/client/b32d595395f444f44bd26c500db785fc228d3bed.html
22:22 🔗 komarEX stays at 80GB when I have about 600 or so already downloaded
22:31 🔗 db48x komarEX: we're looking at a problem with the stats at the moment :)
22:31 🔗 komarEX db48x: ah ok
23:06 🔗 komarEX has quit IRC (Quit: Page closed)
23:17 🔗 asktoomuc has quit IRC (Quit: Page closed)
23:46 🔗 bwn has quit IRC (Ping timeout: 244 seconds)
