[00:31] *** garyrh has quit (Write error: Connection reset by peer)
[01:00] *** garyrh (garyrh@[redacted]) has joined #internetarchive.bak
[01:00] *** svchfoo3 gives channel operator status to garyrh
[01:24] *** zottelbey has quit (Remote host closed the connection)
[01:38] wow, 50% 4 copies, and 50% 3 copies!
[01:39] my node has finished downloading, about 3 hours ago
[01:40] so about 48h all up
[01:41] oh, you have it all?
[01:41] yep
[01:41] there's always SHARD2 ;)
[01:42] Wow! This shard of the IA is fully backed up now!
[01:42] I know of 4 copies of every file (including one copy at the IA itself).
[01:43] ah, that message is a little wrong isn't it
[01:43] yeah, I was just thinking that
[01:44] how do I start on SHARD2?
[01:44] run ./checkoutshard shard2
[01:45] IA.BAK/master 65b9cc0 Joey Hess: fix message when done; a client may have downloaded everything but that does not mean NUMCOPIES was reached yet
[01:45] IA.BAK/master 691c95a Joey Hess: work on newest shards first, those are most likely to not be done yet
[01:45] [IA.BAK] joeyh pushed 2 new commits to master: http://git.io/ve3Um
[01:46] details, details
[01:46] IA.BAK/master e67be5b Joey Hess: accept uppercase SHARD names
[01:46] [IA.BAK] joeyh pushed 1 new commit to master: http://git.io/ve3U8
[01:48] IA.BAK/master 1ca901e Joey Hess: better, just grep -i for shard names
[01:48] [IA.BAK] joeyh pushed 1 new commit to master: http://git.io/ve3U7
[01:49] joeyh: tac doesn't work (shard2 is returned first by find for me)
[01:49] drat
[01:50] sort -nr
[01:50] fails at 100
[01:51] -n needs only numbers I think
[01:52] I know, I'll add the ctime to the find output
[01:55] IA.BAK/master 0d43159 Joey Hess: sort by ctime
[01:55] [IA.BAK] joeyh pushed 1 new commit to master: http://git.io/ve3Tx
[01:56] yep, running shard2 now
[01:56] it's 3.2tb iirc
[01:57] hmm, my diskreserve wasn't applied in the shard2/.git/config
[01:58] I forget if I tested that part
[01:58] code for it is in checkoutshard
[01:58] git user.name and email were set
[01:59] I suspect the find didn't work for the same reason
[01:59] ah, I see
[02:00] found shard2
[02:01] IA.BAK/master ab2c2c9 Joey Hess: find prevshard before we create a new shard
[02:01] [IA.BAK] joeyh pushed 1 new commit to master: http://git.io/ve3kd
[02:02] cheers
[02:12] why is 3 copies red now?
[02:14] because the coloring is broken
[02:14] it just averages between green and red, and with 2 current groups, one is green and one red
[02:15] 3 copies was appearing grey earlier too
[02:15] *** trs80 nods
[02:16] I think SketchCow can fix it by filling in a 0.. if you view source, some lines just lack a number in them
[02:19] hmm, no, doesn't help.
[02:20] filling in a dummy value of 1 does seem to work
[02:20] heh, 0.0001 works
[02:20] where's the code for the stats page?
[02:20] not in git wherever it is..
[02:21] aha. i don't see a dot for me on the map :(
[02:21] thought i'd poke around about that.
[02:22] http://iabak.archiveteam.org/stats/SHARD1.geolist
[02:24] oh, he seems to be generating it in my home directory
[02:24] that's convenient for me finding it
[02:26] Alas, the entry for me is slightly wrong.
[02:27] ... admittedly only trivially so, I think that's the right ZIP code for the other side of the street. :)
[02:27] heh
[02:29] fixed
[02:32] *** svchfoo2 gives channel operator status to chfoo
[02:33] it thinks I'm in milan and SC, and NJ. Wrong on 3 counts
[02:33] the geolocations are pretty good though
[02:34] closure: while you're in there, I posted a change that would add the counts to the tooltips
[02:34] we should probably pull it into git
[02:34] yesterday afternoon sometime
[02:34] agreed!
[02:35] dinner, bbl
[02:40] hmm. not sure we can avoid using `find`s printf feature now that it's used to output the ctime for sorting.
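(Editor's sketch, not from the channel.) The exchange above about ordering shards newest-first (plain reverse ordering breaking down, `sort -nr` needing purely numeric keys, then sorting by ctime) can be illustrated in Python. This is only a minimal sketch: the real IA.BAK scripts are shell, and the function names and the `(ctime, name)` input shape below are assumptions made for illustration.

```python
import re
from typing import Iterable, List, Tuple


def shard_number(name: str) -> int:
    """Extract the numeric suffix of a shard name such as 'shard2' or 'SHARD10'.

    Matching is case-insensitive, mirroring the 'grep -i' commit in the log.
    """
    match = re.fullmatch(r"shard(\d+)", name, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a shard name: {name!r}")
    return int(match.group(1))


def newest_first_by_number(names: Iterable[str]) -> List[str]:
    """Order shard names by numeric suffix, highest first.

    A plain reverse string sort would place 'shard2' ahead of 'shard100'.
    """
    return sorted(names, key=shard_number, reverse=True)


def newest_first_by_ctime(entries: Iterable[Tuple[float, str]]) -> List[str]:
    """Order (ctime, name) pairs by ctime, newest first.

    The pairs are assumed to have been collected elsewhere,
    for example from a directory listing that reports ctime.
    """
    return [name for _ctime, name in sorted(entries, reverse=True)]
```

Sorting on the extracted number avoids the lexical-order problem once shard numbers reach two or three digits; sorting on ctime avoids parsing names at all, which is the route the 0d43159 commit message describes.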
[02:42] IA.BAK/master c35feeb Joey Hess: docs
[02:42] IA.BAK/server 2effa3d Joey Hess: add web page generator script
[02:42] [IA.BAK] joeyh pushed 1 new commit to master: http://git.io/ve3YT
[02:42] [IA.BAK] joeyh pushed 1 new commit to server: http://git.io/ve3Yk
[02:42] beardicus: well, we can add a fallback that works less well
[02:42] k. i was just using awk instead to strip the './'
[02:44] though it looks like that's no longer happening and was completely unnecessary anyways :)
[02:49] *** S[h]O[r]T (~Sh]Or]T@[redacted]) has joined #internetarchive.bak
[02:49] who can take my pub key? :)
[02:50] dude
[02:50] over here
[02:52] IA.BAK/pubkey 1754006 Joey Hess: add S[h]O[r]T
[02:52] [IA.BAK] joeyh pushed 1 new commit to pubkey: http://git.io/ve3OZ
[02:59] IA.BAK/master b134621 Joey Hess: hit webhook to pull new keys in registration process (github's web hook seems slow or not always arriving)
[02:59] [IA.BAK] joeyh pushed 1 new commit to master: http://git.io/ve33Y
[03:08] is the "get" part of git-annex or is that really just wget?
[03:08] as in "git annex get" ?
[03:09] yeah, i assume that is what is downloading the files
[03:09] it's 3 layers with wget underneath in this case
[03:16] ah ok.
[03:16] if i read this right i can run iabak multiple times without issue?
[03:19] indeed
[03:19] one of the things one of those layers allows :)
[03:55] Hey what wha
[04:10] Good cheap hack.
[04:10] Meanwhile, we're close to 100% 4
[04:46] that's wicked progress from like yesterday
[04:46] in fact all my downloads are done
[04:50] Sweet, we've only got 3 copy stuff left.
[04:51] closure: you need to add some sort of percentage to reading the size info for "git annex info"
[05:52] *** niyaje3 (~niyaje@[redacted]) has joined #internetarchive.bak
[07:24] *** niyaje3 has quit (Ping timeout: 600 seconds)
[08:34] *** bzc6p_ (~bzc6p@[redacted]) has joined #internetarchive.bak
[08:34] *** bzc6p has quit (Read error: Operation timed out)
[08:38] *** bzc6p_ is now known as bzc6p
[11:32] *** zottelbey (~zottelbey@[redacted]) has joined #internetarchive.bak
[12:04] *** Senji (jdamery@[redacted]) has joined #internetarchive.bak
[12:05] Hello, can I sign up?
[12:11] Senji: git clone https://github.com/ArchiveTeam/IA.BAK then have a look at readme.md
[12:12] you may need to wait for someone (closure) who can add your ssh key
[12:12] Yes, I've done that :) iabak told me to come to irc and give someone an ssh key :)
[12:12] that!
[12:15] Senji: dump it here, closure will add it
[12:15] closure is in the US, so might still be sleeping. in the interim you could add it to https://github.com/ArchiveTeam/IA.BAK/blob/pubkey/SHARD1/pubkeys and send a pull request if your git-fu is up to it
[12:16] *** bzc6p (~bzc6p@[redacted]) has left #internetarchive.bak
[12:16] ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCY455AWlXCJQjmZf6UZQHNlS0JdE7+24HLaNzZd2lWTDL9xHQkO/m2itYrhh3l5keT8Og3ggTyK+xWM7+1J7Vr7AtjPrM9fmwOwJ9YqFTy6T1ELHNriAl1KJOB75NDF2XKymp6v+T1MSTwPTDx/jmlh4g5C4TJjMNzX6KzKSQO4fNpPqAXZGANcTEWUyD58otq0ihCLfuG6ziyWIKZIoKSkGzf1RML1jsSnyuegbQ6bolMqct5YSt9yv21yoBYNabXGZKOlKyqA7h7Y12SlYKOok9fTmU51EKRJfLkuQETzIp9i0kvIvVWysWJJ6lX5FIo0PmLclq/Vxs+/3dfxlDD jdamery@cleopatra
[12:17] trs80: I'm not really up to handling github faff from my phone (in a break during a con at the moment :))
[12:18] yeah, fair enough. eastercon?
[12:18] Yup
[12:51] wish i was getting better speeds from IA. running the script around 7x. was doing almost 1MB/s each during the night but slowed down to around 100KB/s-200KB/s this morning
[12:55] i feel like it hardly matters how many copies you run at a certain point. there is hardly any difference in speed for me whether i run 10 or 20 copies. 2MB/s is the maximum i will get. i guess the peering to germany is terrible
[12:56] sometimes i get peaks of 7MB/s on single files. but thats rare enough.
[12:58] I'll add your key, one sec
[13:08] IA.BAK/pubkey 0263695 Daniel Brooks: add a key for Senji
[13:08] [IA.BAK] db48x pushed 1 new commit to pubkey: http://git.io/vesbl
[14:00] thanks
[14:02] you're welcome
[15:40] *** kyan has quit (Ping timeout: 258 seconds)
[17:19] Things look like we're getting stuff done, which is great.
[18:21] *** kyan (~kyan@[redacted]) has joined #internetarchive.bak
[18:26] *** kyan has quit (Client Quit)
[19:10] SketchCow: any idea if IA.BAK is putting extra pressure on IA's network? the ISC link has been in the red zone since the past 2 months
[19:10] that's probably partially the reason for fluctuating download speeds
[19:14] Hmm, iabak still can't get access to SHARD1@iabak.archiveteam.org:shard - do I need to wait for someone to pull the key to the shard server or something? :)
[19:47] It is NOT putting extra pressure.
[19:47] What it's doing is revealing slowdowns.
[19:47] Which is great
[19:48] We're going to fix these things soon
[19:48] *** SN4T14_ (~SN4T14@[redacted]) has joined #internetarchive.bak
[19:48] Yeah, they don't appear to have any instances of falling over or anything.
[19:49] This certainly hasn't caused a DOS on the Archive. :)
[19:50] *** kyan (~kyan@[redacted]) has joined #internetarchive.bak
[19:56] *** SN4T14__ has quit (Ping timeout: 512 seconds)
[20:03] Senji: your key is on the server afaics
[20:04] Aha it's working now. great!
[20:05] IA.BAK/master 8c57f7a Joey Hess: fix url to force immediate key deploy
[20:05] [IA.BAK] joeyh pushed 1 new commit to master: http://git.io/veZSa
[20:07] *** kyan has quit (Quit: This computer has gone to sleep)
[20:07] We should figure out how long it's taken to initiate and complete this 2.91gb shard
[20:08] http://iabak.archiveteam.org/stats/SHARD1.filestransferred
[20:08] we have most of the history here
[20:08] (still needs a graph of this made)
[20:09] what kind of data is that closure ? total transferred?
[20:10] number of downloads of files
[20:10] *** closure sees SHARD2 already has 5 clients and 32k files transferred
[20:11] oh shard2 is active?
[20:11] ./checkoutshard SHARD2
[20:11] ill start on that in a bit
[20:15] *** closure is a little concerned by the 21486 files on SHARD1 that have more than 3 copies downloaded
[20:15] that's a quarter of the whole thing that was downloaded more than necessary
[20:16] Huh. Race condition?
[20:16] yes, 2 clients decide to get the same file same time
[20:16] Presumably there's no simple way to get two hosts to coordinate on that.
[20:18] What about the 15 files at +7? that's a lot of race
[20:18] closure: are you forcing the ignore if downloaded more than once?
[20:18] because i'm not running with any copies limitation
[20:19] tpw_rules: well, if you're just running git annex get without --not --copies 4, it'll just get everything
[20:19] exactly. i think people are doing that
[20:19] maybe
[20:19] we might get cleaner data from the next shard.
[20:20] blah i don't have disk space for that
[20:20] Though I'd highly recommend coordinating with others for it, it *might* be worthwhile for some people with a lot of redundant copies to drop them.
[20:21] Particularly if it'd let you get more helpful data.
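(Editor's sketch, not from the channel.) The concern above about files downloaded more often than needed can be made concrete with a small tally. This is a hypothetical illustration only: it assumes a mapping from file name to the number of client copies has already been gathered somehow, and it does not call git-annex. The function name and the sample data are invented for the example.

```python
from collections import Counter
from typing import Dict, Mapping


def replication_summary(copies_per_file: Mapping[str, int], target: int) -> Dict[str, object]:
    """Summarise how many files sit above or below a target copy count.

    copies_per_file is a hypothetical {file name: copies} mapping;
    target is the desired number of downloaded copies per file.
    """
    counts = list(copies_per_file.values())
    total = len(counts)
    over = sum(1 for n in counts if n > target)
    under = sum(1 for n in counts if n < target)
    return {
        "total": total,
        "over": over,
        "under": under,
        # Share of files fetched more often than the target requires.
        "over_fraction": over / total if total else 0.0,
        # How many files have exactly 1, 2, 3, ... copies.
        "histogram": dict(Counter(counts)),
    }


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    sample = {"a.warc": 3, "b.warc": 4, "c.warc": 7, "d.warc": 2}
    print(replication_summary(sample, target=3))
```

A tally like this only reports the surplus after the fact. As noted in the log, avoiding it on the client side relies on running `git annex get` with `--not --copies 4` rather than fetching everything.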
[20:25] *** closure notices that the first several files have 6 and 7 copies
[20:25] so probably those ones are just people running git annex get manually
[20:33] hmm, i'm getting errors: dirname: invalid option -- 'z'
[20:34] Senji: it's an old version of coreutils iirc
[20:34] touch NOSHUF and it will avoid that codepath
[20:36] right that's happily downloading now
[20:38] closure: thats probably me :p
[20:38] actually, I forgot I did a manual git annex get in a lot of usenethistorical, so probably me too ;)
[20:39] lets see if munin likes to plot some downloaded files in a neat way
[20:41] in fact, you can see the culprits in git annex whereis --copies 5 .. the last listed client is the one that got the file last
[20:42] closure: http://dit.serveert.me.uk:8081/munin/serveert.me.uk/dit.serveert.me.uk/iabak.html
[20:43] graph seems empty?
[20:43] it grabs your input file from http://iabak.archiveteam.org/stats/SHARD1.filestransferred and feeds it into munin
[20:43] yeah only started like 5 minutes ago :p
[20:44] ah, it's not getting old data
[20:44] nope
[20:44] i could add that
[20:44] let me make a ALL.filestransferred across all shards and you could add that
[20:44] that would be perfect
[20:51] IA.BAK/server 0debc26 Joey Hess: get stats for ALL
[20:51] IA.BAK/server 6ce9b36 Joey Hess: support ALL
[20:51] [IA.BAK] joeyh pushed 2 new commits to server: http://git.io/veZpe
[20:53] there we go
[20:54] http://iabak.archiveteam.org/stats/ALL.filestransferred is the output file?
[20:55] SketchCow: BTW, I don't know what you're planning for the graphs as we get more shards, but you could use this to do a page overviewing them all
[20:55] midas: yes
[20:55] perfect. ill change the input file
[21:01] midas: also, can munin make anything of http://iabak.archiveteam.org/stats/ALL.clientconnsperhour ?
[21:02] sure, but i dont see any data in it
[21:02] just timestamps
[21:03] the data is the first number
[21:04] i prob should make similar graphs on my client. ive just been looking at my download speeds & disk utilization http://74.63.230.61/munin/localdomain/localhost.localdomain/if_eth0.html http://74.63.230.61/munin/localdomain/localhost.localdomain/diskstats_utilization/vdc.html
[21:04] [IA.BAK] joeyh pushed 1 new commit to server: http://git.io/veZjM
[21:04] IA.BAK/server 59eaeb4 Joey Hess: add ALL file that sums up the numcopies stats for all shards
[21:04] yeah, that's a confusing format
[21:04] ah right, yeah now i get it :p
[21:04] shouldnt be a problem, ill build that in a minute
[21:06] IA.BAK/server 96a29e6 Joey Hess: move file into place
[21:06] [IA.BAK] joeyh pushed 1 new commit to server: http://git.io/veneq
[21:07] [IA.BAK] joeyh pushed 1 new commit to server: http://git.io/veneR
[21:08] IA.BAK/server 212b7af Joey Hess: improve format
[21:08] IA.BAK/server a1dc565 Joey Hess: include 0
[21:08] [IA.BAK] joeyh pushed 1 new commit to server: http://git.io/vener
[21:33] seems to be working closure, i've added the graphs per client/hour but that needs to update
[21:42] http://dit.serveert.me.uk:8081/munin/serveert.me.uk/dit.serveert.me.uk/iaconnected-day.png closure
[21:49] closure: My plan for the graphs that I'm on is just to concatenate all the shards.
[21:50] I'm just waiting to hit the full SHARD1 and then we'll throw them together
[21:55] Since people are jumping the gun and grabbing shard2, when we add it in, it'll be well along.
[22:22] *** beardicus has quit (Quit: My MacBook Pro has gone to sleep. ZZZzzz…)
[22:44] *** beardicus (~beardicus@[redacted]) has joined #internetarchive.bak
[22:50] *** niyaje3 (~niyaje@[redacted]) has joined #internetarchive.bak
[22:56] *** beardicus has quit (Quit: My MacBook Pro has gone to sleep. ZZZzzz…)
[22:56] *** beardicus (~beardicus@[redacted]) has joined #internetarchive.bak
[23:00] *** beardicus has quit (Client Quit)
[23:00] *** beardicus (~beardicus@[redacted]) has joined #internetarchive.bak
[23:04] *** niyaje3 has quit (Read error: Connection reset by peer)
[23:14] *** zottelbey has quit (Remote host closed the connection)
[23:15] *** niyaje3 (~niyaje@[redacted]) has joined #internetarchive.bak
[23:21] IA.BAK/server d6512ee Joey Hess: add SHARDn.size file giving size in tb
[23:21] [IA.BAK] joeyh pushed 1 new commit to server: http://git.io/venRY
[23:29] IA.BAK/server 8ec147e Joey Hess: fix
[23:29] [IA.BAK] joeyh pushed 1 new commit to server: http://git.io/ven0Q
[23:42] IA.BAK/server 4e51e2c Joey Hess: fix
[23:42] IA.BAK/server 97896ea Joey Hess: prepare for SHARD2 and ALL; add munin graphs
[23:42] [IA.BAK] joeyh pushed 2 new commits to server: http://git.io/venuB
[23:44] IA.BAK/server 4adbce0 Joey Hess: redundant div
[23:44] [IA.BAK] joeyh pushed 1 new commit to server: http://git.io/venuM
[23:44] midas: odd saw-toothing on the connected users grap
[23:44] +h
[23:46] closure: I see that git-annex is in FreeBSD's ports collection; would you recommend it as stable enough for this project?
[23:46] my big storage thing is a ZFS pool that I'm looking to migrate off of Solaris
[23:47] what's the version number?
[23:47] if the version is new enough
[23:47] needs 201502something
[23:47] I don't know
[23:47] oh wait, version of git-annex or the ZFS pool
[23:47] the former
[23:47] oh I plan to use whatever is newest and most appropriate
[23:48] I haven't actually built the FreeBSD system yet, just want to know if I'll hit any known roadblocks doing so
[23:50] accessing the pool via CIFS on an Ubuntu machine seemed(?) to work but I'd rather have one system powered instead of two
[23:51] plus doing that I can avoid all the messages about crippled filesystems :P
[23:52] IA.BAK/server ac9ef18 Joey Hess: generate stats for all individual shards, and combined stats
[23:52] [IA.BAK] joeyh pushed 1 new commit to server: http://git.io/venzN
[23:55] http://teamarchive1.fnf.archive.org/ia.bak/SHARD2.html
[23:56] nice
[23:57] oh dear
[23:57] i don't have the disk for that :(
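(Editor's sketch, not from the channel.) The per-shard stats files discussed earlier in this log (for example SHARD1.filestransferred and the combined ALL file) could be summed along the following lines. The log does not document the file format; the only hint is the remark that "the data is the first number". The sketch therefore assumes each line is `<value> <timestamp>` with integer fields separated by whitespace, which may not match the real files. Function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Tuple

Row = Tuple[int, int]  # (timestamp, value)


def parse_stats(text: str) -> List[Row]:
    """Parse lines assumed to look like '<value> <timestamp>'.

    Blank or short lines are skipped. The field order is an assumption
    based on the remark that the data is the first number.
    """
    rows: List[Row] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        value, timestamp = int(fields[0]), int(fields[1])
        rows.append((timestamp, value))
    return rows


def combine_shards(per_shard: Mapping[str, List[Row]]) -> List[Row]:
    """Sum values across shards for each timestamp, sorted by timestamp."""
    totals: Dict[int, int] = defaultdict(int)
    for rows in per_shard.values():
        for timestamp, value in rows:
            totals[timestamp] += value
    return sorted(totals.items())


if __name__ == "__main__":
    # Invented sample contents, for illustration only.
    shard1 = parse_stats("5 1000\n7 2000\n")
    shard2 = parse_stats("1 2000\n\n2 3000\n")
    print(combine_shards({"SHARD1": shard1, "SHARD2": shard2}))
```

Keying the totals on the timestamp means shards that started at different times still combine cleanly; timestamps present in only one shard simply carry that shard's value.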