[00:04] *** cmaldonad has joined #internetarchive.bak
[00:06] *** cmaldonad has quit IRC (Client Quit)
[00:17] So, I've got a problem
[00:17] iabak repeatedly hangs on some random item
[00:18] just sits there and nothing is happening until I CTRL-C and restart
[00:18] no network traffic, no cpu usage, nothing
[00:36] I don't necessarily want to support SMB
[00:36] But I want our docs to go "so, you're saddled with SMB - do this"
[00:37] SketchCow: :)
[00:37] sevs: what does it say before it hangs?
[00:38] db48x: nothing, says "get starr/kajongsonsaengmu038800/kajongsonsaengmu038800_scandata.xml (from web...)" and nothing after that
[00:39] ok, so perhaps it's a network thing?
[00:40] go into that shard's directory and run ../git-annex.linux/git-annex get starr/kajongsonsaengmu038800/kajongsonsaengmu038800_scandata.xml
[00:40] hmm, perhaps? I'm ssh'd into the machine, when I restart it it works again for a couple items
[00:41] and it happens on two machines, two different shards, two different networks
[00:41] hmm
[00:41] sec
[00:41] ok, that works
[00:42] downloads the one file and everything is fine
[00:44] and you say that it does download files for a while and then hangs
[00:44] intermittent problem?
[00:44] now i've started iabak again, downloads two items and sits on the third
[00:45] if you leave it running for a while while it's hung, does it time out and start going again?
[00:48] http://pastebin.com/WhmSYAc1
[00:48] it does not time out
[00:48] had it sitting there for more than 20 minutes
[00:49] the message about not being able to verify the content is fine; that happens for some metadata for which we don't have hashes
[00:49] but not timing out is weird
[00:50] i have iftop open in another window and aside from some broadcasts there is literally no traffic
[00:51] yeah no idea what the cause could be
[00:51] was working fine yesterday
[00:51] use tcpdump to capture the actual packets?
[00:52] uhhh
[00:53] would need you to walk me through that or some tutorial
[00:56] :)
[00:56] I have to eat, but I'll be back in a bit
[00:56] perhaps if you google for instructions for using tcpdump to capture the traffic generated by a program you'll find something useful
[00:56] bbl
[00:57] ok, I'll be waiting
[01:09] db48x: is it even possible to filter that traffic by pid?
[01:17] so, "tcpdump -i bond0 -w tcp.dump" should capture everything, later on i can then take a look at it with wireshark
[01:55] ofc now it works without problem ffs
[02:20] *** svchfoo3 sets mode: +o balrog
[02:23] sevs: heh, that figures :)
[03:55] *** db48x has quit IRC (Quit: new ssd)
[04:04] *** kyan has quit IRC (Quit: Leaving)
[06:19] *** bwn has quit IRC (Ping timeout: 244 seconds)
[06:50] *** Start has joined #internetarchive.bak
[08:00] *** bwn has joined #internetarchive.bak
[08:03] *** yipdw has quit IRC (Read error: Operation timed out)
[08:04] *** yipdw has joined #internetarchive.bak
[08:04] *** svchfoo1 sets mode: +o yipdw
[08:31] *** atomotic has joined #internetarchive.bak
[08:53] *** sevs has quit IRC (Ping timeout: 268 seconds)
[08:55] *** zhongfu has quit IRC (Ping timeout: 260 seconds)
[09:01] *** zhongfu has joined #internetarchive.bak
[10:41] registrar master 307eced other SHARD4/pubkeys registration of mr.business1148 on SHARD4
[10:41] registrar master af79bb9 other SHARD4/pubkeys registration of mr.business1148 on SHARD4
[10:41] registrar master 4958ddc other SHARD4/pubkeys registration of mr.business1148 on SHARD4
[10:41] registrar master d394608 other SHARD4/pubkeys registration of mr.business1148 on SHARD4
[10:45] *** bwn has quit IRC (Ping timeout: 244 seconds)
[10:55] *** bwn has joined #internetarchive.bak
[11:04] *** atomotic has quit IRC (Remote host closed the connection)
[11:08] *** atomotic has joined #internetarchive.bak
[12:19] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[12:35] registrar master 642b216 other SHARD10/pubkeys registration of mr.business1148 on SHARD10
[12:35] registrar master d510cf1 other SHARD10/pubkeys registration of mr.business1148 on SHARD10
[12:43] *** atomotic has joined #internetarchive.bak
[13:20] registrar master aff34a8 other SHARD14/pubkeys registration of mr.business1148 on SHARD14
[13:20] registrar master 33bb081 other SHARD14/pubkeys registration of mr.business1148 on SHARD14
[13:20] registrar master 1520d84 other SHARD14/pubkeys registration of mr.business1148 on SHARD14
[13:21] registrar master 0125bf8 other SHARD14/pubkeys registration of mr.business1148 on SHARD14
[15:03] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[15:04] 1008G 928G 30G 97% /home/iabak \o/
[15:06] *** sep332 has quit IRC (Konversation terminated!)
[16:09] *** atomotic has joined #internetarchive.bak
[16:12] *** milenko has quit IRC (Ping timeout: 250 seconds)
[16:13] *** milenko has joined #internetarchive.bak
[16:18] *** godane has quit IRC (Quit: Leaving.)
[16:33] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[16:35] Hey, so, this brings up a question.
[16:35] Here we have registration of Mr. Business multiple times on each shard
[16:35] Does this mean the same person has two copies of the data?
[16:36] It *probably* means the client has bugged out again and done multiple registrations. But I can't tell from just the info in this channel.
[16:44] *** asktoomuc has joined #internetarchive.bak
[17:02] *** kyan has joined #internetarchive.bak
[17:22] *** minus_ has left WeeChat 1.6
[17:27] *** kyan has quit IRC (Remote host closed the connection)
[17:30] oh looks like I got the hanging problem too
[17:34] I'll just restart it for now
[17:46] *** kyan has joined #internetarchive.bak
[17:51] registrar master 6777bca other SHARD14/pubkeys registration of mitch on SHARD14
[18:02] *** CyberJaco is now known as zz_CyberJ
[18:20] *** luckcolor has quit IRC (Remote host closed the connection)
[18:21] *** luckcolor has joined #internetarchive.bak
[18:25] *** luckcolor has quit IRC (Read error: Connection reset by peer)
[18:26] *** luckcolor has joined #internetarchive.bak
[18:28] *** luckcolor has quit IRC (Remote host closed the connection)
[18:31] *** luckcolor has joined #internetarchive.bak
[18:39] *** Start_ has joined #internetarchive.bak
[18:39] *** Start has quit IRC (Read error: Connection reset by peer)
[18:40] *** kyan has quit IRC (Remote host closed the connection)
[18:48] *** luckcolor has quit IRC (Read error: Connection reset by peer)
[18:56] *** luckcolor has joined #internetarchive.bak
[19:01] *** bwn has quit IRC (Ping timeout: 961 seconds)
[19:04] *** kyan has joined #internetarchive.bak
[20:05] *** db48x has joined #internetarchive.bak
[20:05] big new SSD :)
[20:24] *** Start_ is now known as Start
[20:28] closure: I notice the graph on the bottom of http://iabak.archiveteam.org/ has died
[20:32] *** bwn has joined #internetarchive.bak
[20:36] hmm
[20:44] I guess flatlining is a form of death
[20:47] HCross: ping?
[20:48] I'll go and have a look
[20:49] oh wait that's from grafana or whatever
[20:49] Kaz: do we have yet a file detailing the status of present and future shards, for coordination?
[20:49] as far as I'm aware, we do not
[20:49] by the looks of it, nothing is being pushed into graphite
[20:50] https://github.com/ArchiveTeam/IA.BAK/blob/server/shardstats#L159
[20:54] server has uptime of 3 days, graph died 3 days ago
[20:55] Last update: Fri Nov 18 15:30:01 EST 2016
[20:55] run it by hand with -x and see what it's doing?
[20:56] that's a pretty big script
[20:56] as far as I can tell, it does 'things'
[20:57] true
[20:57] but with -x you can follow along
[21:06] I have either broken it, or fixed it
[21:06] only time will tell
[21:07] heh
[21:08] how did you fix it?
[21:11] well let's not get too hasty, the graph hasn't updated yet
[21:12] :)
[21:12] what change did you make that you hope will fix it?
[21:12] https://github.com/ArchiveTeam/IA.BAK/blob/propellor/IABak.hs#L37
[21:12] as far as I can tell, these cronjobs don't exist on the server
[21:14] ah, clever
[21:16] *** sep332_ is now known as sep332
[21:16] this doesn't seem to have worked
[21:17] it should run on the half-hour
[21:17] give it another 15 minutes?
[21:17] well
[21:18] I decided to be brave and ran it myself once first, to see if it updated the graph
[21:18] it did not
[21:18] hrm
[21:19] did you run it with -x?
[21:19] no
[21:19] oh
[21:19] if you had, you could scroll back and figure out what went wrong
[21:20] (I set my tmux to hold a million lines of buffer; it's very handy even if it can waste a gigabyte of memory)
[21:28] yipdw around?
[21:34] db48x: so by the looks of it it's going through sendstat with the values we need, but graphite isn't actually doing anything with the data
[21:34] hmm
[21:35] is it listening on the UDP port?
[21:36] port 2003
[21:36] I see tcp on 2003&2004
[21:36] no, we use udp
[21:37] yes
[21:37] I'm trying to work out if it's my netstat flags that are the issue
[21:37] err, or do we?
[21:37] netstat -4lnp is what I use
[21:37] ok, I guess we're actually using tcp
[21:38] https://github.com/ArchiveTeam/IA.BAK/blob/73394b30ac6684e2c06eabca5fdc663b80e0d4a4/shardstats#L25
[21:38] check graphite's logs, I guess
[22:05] *** komarEX has joined #internetarchive.bak
[22:06] hello
[22:06] I'm getting a lot of "Unable to access these remotes: web" errors
[22:06] what do?
[22:10] komarEX: don't worry too much, most likely it's just darked (hidden) items from IA
[22:11] does it have any other error messages other than the "unable to access remotes"?
[22:11] Kaz: a lot of "MD5-something" gets
[22:12] Kaz: http://pastebin.com/raw/8UXqifh0
[22:12] errors like these in count of way too much
[22:13] which shard is this?
[22:14] *** sep332 is now known as sep332_
[22:15] Kaz: shard4
[22:16] ah right okay
[22:16] yeah, 99% nothing to worry about. darked items are something we can't really avoid as such
[22:17] unless you're seeing it for *every* item
[22:17] Kaz: in the meantime I saw maybe 2-3 files that downloaded normally
[22:17] it also frequently happens for metadata files, since we don't have a hash for them
[22:21] when I'm at it
[22:21] why my local shard size doesn't update on site? http://iabak.archiveteam.org/client/b32d595395f444f44bd26c500db785fc228d3bed.html
[22:22] stays at 80GB when I have about 600 or so already downloaded
[22:31] komarEX: we're looking at a problem with the stats at the moment :)
[22:31] db48x: ah ok
[23:06] *** komarEX has quit IRC (Quit: Page closed)
[23:17] *** asktoomuc has quit IRC (Quit: Page closed)
[23:46] *** bwn has quit IRC (Ping timeout: 244 seconds)
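
Note on the graphite exchange around 21:34 to 21:38: one way to check whether graphite accepts data at all is to send it a single test point by hand. The sketch below assumes a stock carbon plaintext listener on TCP port 2003, where each data point is one line of the form "path value timestamp". The metric name, host, and function names are hypothetical, and this is not a copy of what shardstats or sendstat actually do.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one plaintext-protocol line: path, value, Unix timestamp, newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, host="localhost", port=2003, timeout=5.0):
    """Send a single data point over TCP.

    Raises OSError (for example ConnectionRefusedError) if nothing is
    listening on host:port, which is itself a useful diagnostic.
    """
    line = format_metric(path, value)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.encode("ascii"))
    return line
```

Calling send_metric("iabak.test.probe", 1) on the graphite host and then looking for that (hypothetical) metric in the web UI would help separate "nothing is listening" from "data arrives but is not stored". A refused connection points at the listener; a clean send with no new metric points at carbon's storage side or its logs.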