[00:00] SketchCow: may I suggest linking to https://github.com/ArchiveTeam/IA.BAK/ or https://github.com/ArchiveTeam/IA.BAK/blob/master/README.md instead of that wiki page?
[00:12] *** trs80 has quit IRC (Ping timeout: 186 seconds)
[00:44] *** Quile has quit IRC (Ping timeout: 186 seconds)
[00:45] *** kyan has joined #internetarchive.bak
[00:50] *** Quile has joined #internetarchive.bak
[01:07] *** mhazinsk has quit IRC (Ping timeout: 186 seconds)
[01:08] *** mhazinsk has joined #internetarchive.bak
[01:08] *** svchfoo3 sets mode: +o mhazinsk
[01:19] *** trs80 has joined #internetarchive.bak
[01:28] SketchCow: nice!
[01:45] *** trs80 has quit IRC (Ping timeout: 186 seconds)
[01:46] *** Start-mob has joined #internetarchive.bak
[01:46] *** trs80 has joined #internetarchive.bak
[01:47] *** mattl____ has joined #internetarchive.bak
[01:48] threw up a VM with 500gb disk space to test this :)
[01:48] *** logchfoo starts logging #internetarchive.bak at Thu Apr 23 01:48:24 2015
[01:48] *** logchfoo has joined #internetarchive.bak
[01:49] *** mattl____ is now known as mattl
[01:50] hey mattl!
[01:50] hey!
[01:51] *** closure is now known as joeyh
[01:51] just to avoid confusion
[01:51] ahhh
[01:51] i was looking for you.
[01:57] joeyh: i like the message reminding me to make a cronjob and not just be lazy and run everything inside screen
[01:59] well, it's a start
[01:59] we need auto-cron, but it's hard to set that up in the right way for everyone
[02:00] i have 4 screens with ./iabak running in each one, let's see how that works. Bytemark have decent bandwidth, shouldn't take too long
[02:00] I finally hit my space allocation so I've only got a cron job.
[02:01] *** trs80 has quit IRC (Ping timeout: 186 seconds)
[02:01] hmm, is BigV cheap enough to keep .5 tb spinning there?
[02:01] yep.
[02:01] 20GBP a month or something like that.
[02:02] bytemark may not have the best bw to the IA, it seems it can be slow outside the US
[02:02] well, this is a good test either way. CC is moving most things over to BigV.
[02:02] *** trs80 has joined #internetarchive.bak
[02:03] and we need to start talking to IA
[02:03] sep332: Fix the wiki
[02:11] i added minimal steps to get started and a link to the readme for the rest
[02:12] also the iabak script tells people to read the readme when you run it
[02:29] *** iabak-reg has joined #internetarchive.bak
[02:30] *** iabak-reg has quit IRC (Client Quit)
[02:31] *** iabak-reg has joined #internetarchive.bak
[02:32] Nobody hopped on from this bit, I think
[02:32] I will now have to push it
[02:46] master 8e314b5 other fast forward
[02:47] hmm, not quite iabak-reg
[02:48] registrar master a8adff8 other SHARD2/pubkeys registration of wdenton on SHARD2
[02:48] that's more like it!
[02:57] registrar master 79de7f9 other SHARD2/pubkeys registration of justtesting on SHARD2
[03:12] *** Start-mob has quit IRC (Remote host closed the connection)
[03:24] ooh
[04:06] registrar master 4a3edb9 other SHARD2/pubkeys registration of archiveteam on SHARD2
[04:13] registrar master a62663c other SHARD3/pubkeys registration of archiveteam on SHARD3
[04:15] *** kalleboo has joined #internetarchive.bak
[04:18] hi. when I run iabak, my terminal fills up with "dirname: invalid option -- 'z'"
[04:19] this is with GNU coreutils 8.4
[04:20] and that is the problem.
[04:21] workaround: touch IA.BAK/NOSHUF and restart
[04:22] ok cool
[04:23] yeah this is one of those "lying around doing one old thing" servers which isn't really eligible for upgrading everything to the latest and greatest. it's on some quite-old distribution of centos
[04:24] we could fix it with a perl command that reads stdin, breaks on \0, truncates to the directory name, and outputs back out with \0
[04:27] *** zottelbey has joined #internetarchive.bak
[04:52] https://www.reddit.com/r/DataHoarder/comments/33iz8b/that_time_archive_team_decided_to_back_up_the/
[04:54] *** mhazinsk has quit IRC (Ping timeout: 186 seconds)
[04:56] what is with /r/DataHoarder and assholes
[04:56] the correlation coefficient is almost 1
[04:57] 1? That's not very high.
[04:58] it is for the correlation coefficient
[04:59] NOT HIGH.
[04:59] you're right, that was two days ago
[05:03] And the place where it's higher? HN.
[05:23] how can the correlation be higher than 1?
[05:31] "They backed up 9.12 TB? I don't mean to be a party pooper but that doesn't seem impressive."
[05:31] That...
[05:31] That sounds like someone who isn't all that cognizant of what all that involves.
[05:32] It's not like we had some guy with a small number of empty drives mash wget.
[06:02] also, 9.1 * 3
[06:04] *nod*
[06:21] well it is /r/DataHoarder/ where there is a dude with a literal petabyte in his house
[06:21] (hi ohhdemgir)
[06:22] other forums might be more impressed
[06:44] It won't be impressive until the number spikes up past a petabyte
[07:23] *** stapper has joined #internetarchive.bak
[07:31] I'm confused how the ia.bak git annex server is setup
[07:32] does it just have a local copy of all the shards or is stuff setup to use a remove backed by internetarchive s3 or something
[07:33] remote*
[07:36] registrar master 0ea43c4 other SHARD3/pubkeys registration of wild.dominic on SHARD3
[07:36] or are the files themselves backed by urls or something
[07:38] espes___: each shard is a git annex repository where each file is added using git annex addurl
[07:39] oh neat
[07:42] if you do git annex whereis | less it'll show you where each file is located, including the url for the web remote
[08:19] *** cloudmons has joined #internetarchive.bak
[08:40] *** atomotic has joined #internetarchive.bak
[08:43] Hmm, someone appears to have grabbed 2/3 of shard 2 overnight :-)
[08:43] AKA yay shard2 finished
[08:44] *** cloudmons has quit IRC (ircd.choopa.net irc.mzima.net)
[09:09] Hurrah
[09:30] *** marvinw has quit IRC (Read error: Operation timed out)
[09:32] *** logchfoo_ starts logging #internetarchive.bak at Thu Apr 23 09:32:52 2015
[09:32] *** logchfoo_ has joined #internetarchive.bak
[09:34] *** GLaDOS has quit IRC (Read error: Operation timed out)
[09:36] *** GLaDOS has joined #internetarchive.bak
[09:37] *** svchfoo2 sets mode: +o GLaDOS
[09:52] *** Start has quit IRC (ircd.shaw.ca irc.shaw.ca)
[09:52] *** chfoo- has quit IRC (ircd.shaw.ca irc.shaw.ca)
[09:52] *** wp494 has quit IRC (ircd.shaw.ca irc.shaw.ca)
[09:52] *** garyrh has quit IRC (ircd.shaw.ca irc.shaw.ca)
[09:52] *** DFJustin has quit IRC (ircd.shaw.ca irc.shaw.ca)
[09:52] *** matthusby has quit IRC (ircd.shaw.ca irc.shaw.ca)
[09:52] *** Sanqui has quit IRC (ircd.shaw.ca irc.shaw.ca)
[09:52] *** underscor has quit IRC (ircd.shaw.ca irc.shaw.ca)
[09:52] *** csssuf has quit IRC (ircd.shaw.ca irc.shaw.ca)
[09:57] *** Start has joined #internetarchive.bak
[09:57] *** chfoo- has joined #internetarchive.bak
[09:57] *** wp494 has joined #internetarchive.bak
[09:57] *** garyrh has joined #internetarchive.bak
[09:57] *** matthusby has joined #internetarchive.bak
[09:57] *** DFJustin has joined #internetarchive.bak
[09:57] *** Sanqui has joined #internetarchive.bak
[09:57] *** underscor has joined #internetarchive.bak
[09:57] *** csssuf has joined #internetarchive.bak
[09:57] *** irc.shaw.ca sets mode: +o DFJustin
[10:02] *** marvinw has joined #internetarchive.bak
[10:22] *** marvinw has quit IRC (Ping timeout: 606 seconds)
[10:30] *** S[h]O[r]T has joined #internetarchive.bak
[10:30] *** sep332 has joined #internetarchive.bak
[10:31] *** svchfoo2 sets mode: +o sep332
[10:32] *** marvinw has joined #internetarchive.bak
[10:33] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[11:11] *** kalleboo has quit IRC (Linkinus - http://linkinus.com)
[11:22] *** richo has joined #internetarchive.bak
[11:23] *** zottelbey has quit IRC (Remote host closed the connection)
[11:24] *** zottelbey has joined #internetarchive.bak
[11:24] woooo
[11:29] clients should be switching over to shard3
[11:38] *** atomotic has joined #internetarchive.bak
[11:41] registrar master 9409bbf other SHARD1/pubkeys registration of id on SHARD1
[11:41] registrar master 3fbf3d3 other SHARD2/pubkeys registration of id on SHARD2
[11:41] registrar master 8fc1f58 other SHARD3/pubkeys registration of id on SHARD3
[11:45] *** zottelbey has quit IRC (Remote host closed the connection)
[11:47] *** zottelbey has joined #internetarchive.bak
[11:48] so shard2 took 10 days
[11:51] Right. Although we were quiet about it initially.
[11:51] We should probably set up the next 10 shards.
[11:51] or 5 at least.
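The 07:38 answer says each shard is a git-annex repository whose files were added with git annex addurl, so the server tracks archive.org URLs rather than holding the data itself. A minimal Python sketch of what one such addition could look like, assuming each item is stored as IDENTIFIER/FILENAME inside the shard and fetched from the usual https://archive.org/download/IDENTIFIER/FILENAME URL. The helper name `addurl_argv` and that layout are hypothetical, not taken from the IA.BAK scripts. It only builds the command line (a dry run), so it runs without git-annex installed.

```python
from urllib.parse import quote


def addurl_argv(identifier: str, filename: str) -> list[str]:
    """Build a `git annex addurl` command for one archive.org file.

    Hypothetical helper: the IDENTIFIER/FILENAME layout inside the shard
    is an assumption for illustration, not IA.BAK's documented layout.
    """
    # Percent-encode the URL path; keep "/" so files in subdirectories work.
    url = "https://archive.org/download/{}/{}".format(
        quote(identifier, safe=""), quote(filename, safe="/")
    )
    # --file sets the work-tree path instead of deriving one from the URL.
    return ["git", "annex", "addurl", "--file", f"{identifier}/{filename}", url]


if __name__ == "__main__":
    # Dry run: print the command instead of executing it.
    print(" ".join(addurl_argv("example-item", "disk 1.iso")))
```

To run it for real you would pass the list to subprocess.run inside a shard checkout; per the 07:42 message, git annex whereis would then list the web remote with that URL.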
[11:53] And is there scripting or contingency yet for the script to go "oh, there's more shards and I have more space"?
[11:54] there is, it needs a slight bit of dehardcoding to not just switch to shard3 though
[12:02] I'd say work on that, next.
[12:02] Then we can start making sure that people with space who show up aren't waiting for assignment.
[12:03] Obviously, as time goes on, people with multi-terabyte sets are going to help us hit larger and larger collections.
[12:23] registrar master 669c3d1 other SHARD3/pubkeys registration of id on SHARD3
[13:02] *** sankin has joined #internetarchive.bak
[13:57] *** Start has quit IRC (Disconnected.)
[13:57] SketchCow: any collection recs for new shards?
[14:16] I note that Shard 1 still doesn't have 100% >=3 backups (indeed it appears to have one file not backed up at all); is there anything going on to deal with that?
[14:17] Shard 2 currently shows 100% >=3 so is comparatively in better shape
[14:18] joeyh: I'll ping you with them
[14:50] *** Start has joined #internetarchive.bak
[14:57] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[14:57] *** Start has quit IRC (Disconnected.)
[15:03] *** Start has joined #internetarchive.bak
[15:05] *** phuzion has joined #internetarchive.bak
[15:17] so, I think we'll soon be able to configure git-annex like so: ((balanced_amoung(backup) and not (copies=backup:3)) or present
[15:18] and the files will be spread among the repos in a balanced way, w/o resorting to randomness like we do now
[15:18] and without the extra copies some files get now
[15:18] (well, with less of them anyhow)
[15:37] So I'm still downloading files for SHARD2, but it looks to be 100% backed up now. Will the git-annex stuff be smart enough to start me downloading SHARD3?
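joeyh's 15:17 proposal replaces random file selection with a deterministic, balanced spread capped at three copies per file. As a rough illustration only (balanced_amoung was a proposed term, and this is not git-annex's implementation), one common way to get such a spread is to hash the file's key and use the hash to pick one of the group's repositories that does not hold the file yet. The function names, repository names, and key string below are hypothetical.

```python
import hashlib


def balanced_choice(key: str, candidates: set[str]) -> str:
    """Deterministically map a key to one candidate repository."""
    ordered = sorted(candidates)  # identical order on every client
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return ordered[int.from_bytes(digest[:8], "big") % len(ordered)]


def wants(key: str, me: str, group: set[str], holders: set[str], needed: int = 3) -> bool:
    """Sketch of ((balanced among group) and not copies>=needed) or present."""
    if me in holders:
        return True  # "or present": keep anything already stored here
    if len(holders & group) >= needed:
        return False  # the group already has enough copies
    candidates = group - holders
    if not candidates:
        return False
    return balanced_choice(key, candidates) == me


if __name__ == "__main__":
    group = {"repo-a", "repo-b", "repo-c", "repo-d", "repo-e"}
    holders: set[str] = set()
    # Each round exactly one repo lacking the file wants it,
    # until the group holds `needed` copies; then nobody else does.
    while True:
        takers = [r for r in sorted(group)
                  if r not in holders and wants("SHA256E-s1024--example", r, group, holders)]
        if not takers:
            break
        holders.add(takers[0])
    print(sorted(holders))
```

Because every client computes the same hash, only one repository wants each missing copy at a time, which is how this sketch avoids the extra copies mentioned at 15:18; stale location data (the 15:38 caveat) could still cause some duplicates.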
[15:37] it should switch you over, yes
[15:37] ok
[15:38] there might be a little period where your client hasn't heard in from the others that shard2 is done and does a little extra downloading
[15:51] db48x: hey, I see you installed fail2ban on the server. any particular reason?
[15:51] *** Start has quit IRC (Disconnected.)
[15:51] I'm inclined to just disable all password auth, but let clients connect as often as they like
[15:51] heey, this might explain some of the spikes in the graph, if a client got banned for a while
[16:00] *** Start has joined #internetarchive.bak
[16:26] *** real_eyes is now known as realeyes
[16:32] registrar master d7b6ef4 other SHARD3/pubkeys registration of bas+at on SHARD3
[16:35] seems to work fine on OS X :)
[16:45] *** Start has quit IRC (Disconnected.)
[17:15] *** VADemon has joined #internetarchive.bak
[17:20] *** db48x has quit IRC (Ping timeout: 258 seconds)
[17:28] iabak does not work on freenas, is this known?
[17:28] (freenas is BSD based, I know)
[17:31] there's no git-annex build for it, that'd be the first problem
[17:35] ok, just wanted to make sure I wasn't going crazy
[17:41] well, you're using BSD so you can't be too sure
[17:41] * ersi hides
[17:42] Hah
[17:56] *** kyan has quit IRC (Quit: This computer has gone to sleep)
[18:11] *** garyrh has quit IRC (Remote host closed the connection)
[18:46] registrar master 761215d other SHARD3/pubkeys registration of chris on SHARD3
[19:22] Hey guys, I'm getting the following error, any ideas? http://pastebin.com/Tv3n79Yh
[19:23] Should I just delete the files in question and rerun ./iabak?
[19:32] *** Senji2 has joined #internetarchive.bak
[19:33] mmm, fscking shard2. Guess it's time to setup cronjob on cleopatra
[19:34] *** garyrh has joined #internetarchive.bak
[19:43] phuzion: hmm, I wonder if your repository is in direct mode?
[19:43] you could run git reset --hard in there
[19:43] I nuked it, I'm gonna try again
[19:43] don't know why a file would be changed though
[19:43] wait one sec
[19:44] you don't want to commit a deletion of that file
[19:44] so git reset --hard
[19:44] The original directory's already gone, sorry.
[19:44] I'm starting from scratch.
[19:44] oh, ok
[19:45] oh, this is on freebsd?
[19:45] nope
[19:45] centos 7
[19:45] what filesystem?
[19:45] i'm saving to an NFS share if that makes any difference
[19:45] registrar master b729821 other SHARD3/pubkeys registration of chris on SHARD3
[19:45] that's me re-registering
[19:45] nfs is gonna be flakey one way or another
[19:45] Bummer
[19:46] Flaky as in I shouldn't bother with it?
[19:47] depends, I've never seen it flake out this way before
[19:48] out of curiosity, what kind of network performance are you seeing when you're pulling objects?
[19:48] I'm getting about 200KiB/s
[19:48] sounds about right
[19:49] that's on the low end. I'd say run concurrent iabak, but that is known to not be wise on nfs
[19:49] I can max out my adsl with 10 copies
[19:50] Would iSCSI perform better?
[19:57] gut feeling: concurrent ./iabak on OS X performs poorly due to lack of shuf? (lots of "transfer already in progress")
[19:59] lhobas: it'll tend to contend with itself like that yes.
[19:59] on linux with NOSHUF that doesn't slow things down much
[20:00] makes it do a bit more work to find each file
[20:02] yeah, but most of the time you're downloading 300+MB files at 200kB/s rather than finding a new file
[20:03] seeing lots of small files and contending atm with (beginning of) shard3
[20:04] wow <10% of the files are IA-only now
[20:04] if the files are primarily small then you might be better off running one copy until you get to some bigger ones
[20:05] depending how big 'small' is
[20:06] Senji2: doing that for now
[20:06] any chance of changing the hostname that is synced through git-annex? (and showing up in http://iabak.archiveteam.org/stats/SHARD3.leaderboard etc) Did not consider how it might leak personal info
[20:09] for shard2 I got a bit over a TB with concurrent NOSHUF, but my machine running shard3 has a working readlink so I don't know what 3 is like in that regard.
[20:12] lhobas: just cd SHARDn; git annex describe . whatever
[20:17] *** atomotic has joined #internetarchive.bak
[20:37] moar red :)
[20:39] joeyh: I might be able to throw like 2-3TB at this if I can figure out iSCSI on this NAS
[20:41] job for next week is to see how much of the spare disk pile still works :)
[20:51] *** atomotic has quit IRC (Ping timeout: 260 seconds)
[20:58] *** sankin has quit IRC (Leaving.)
[21:39] registrar master 948a90f other SHARD4/pubkeys registration of sean.palmer on SHARD4
[21:53] I am having a hard time doing the math behind this project
[21:53] do we intend this to be cold storage, or online?
[22:13] *** kyan has joined #internetarchive.bak
[22:21] nearline/online
[22:39] registrar master d682f2b other SHARD3/pubkeys registration of primus1024 on SHARD3
[22:42] lhobas: for changing your hostname, cd into the shard directory, then run 'git annex describe . '
[22:43] fixed it, thanks joeyh & Kazzy
[22:44] oh right, didn't see message from joeyh between commit messages, heh
[22:44] *** primus102 has joined #internetarchive.bak
[22:47] Hi, can someone help me with a problem running ./iabak? When it starts it keeps showing msg: dirname: invalid option -- 'z'
[22:47] Try `dirname --help' for more information.
[22:55] *** primus104 has joined #internetarchive.bak
[22:55] *** primus102 has quit IRC (Remote host closed the connection)
[22:56] touching the NOSHUF file in your IA.BAK folder will stop that (but also stop it shuffling the order of downloads)
[22:57] *** primus has joined #internetarchive.bak
[22:58] thanks, so if i understand correctly it's not a serious error msg and it's ok to leave it working like that?
[22:58] correct
[22:59] thank you
[23:11] registrar master 27f9cd3 other SHARD4/pubkeys registration of primus1024 on SHARD4
[23:26] *** zottelbey has quit IRC (Remote host closed the connection)
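Both "dirname: invalid option -- 'z'" reports (04:18 and 22:47) come from a dirname that lacks NUL-delimited (-z) support, which the log ties to GNU coreutils 8.4. The 04:24 suggestion was a small filter that reads stdin, splits on \0, keeps the directory part, and writes \0-terminated output. Below is a Python sketch of that filter, not code from IA.BAK. It approximates POSIX dirname rules (trailing slashes are ignored, "foo" gives ".") and reads all of stdin into memory, assuming the path list fits in RAM. The helper names and the --stdin switch are hypothetical.

```python
import sys


def posix_dirname(path: bytes) -> bytes:
    """Approximate POSIX dirname(1) for a single path (hypothetical helper)."""
    stripped = path.rstrip(b"/")
    if not stripped:
        # "" -> ".", while "/" or "///" -> "/"
        return b"/" if path else b"."
    head, sep, _tail = stripped.rpartition(b"/")
    if not sep:
        return b"."  # no slash at all: "foo" -> "."
    return head.rstrip(b"/") or b"/"  # "/foo" -> "/", "a//b" -> "a"


def dirname_z(data: bytes) -> bytes:
    """Emulate `dirname -z`: NUL-separated paths in, NUL-terminated dirnames out."""
    records = data.split(b"\0")
    if records and records[-1] == b"":
        records.pop()  # input normally ends with a trailing NUL
    return b"".join(posix_dirname(r) + b"\0" for r in records)


if __name__ == "__main__":
    if sys.argv[1:] == ["--stdin"]:
        # Filter mode: read NUL-separated paths, write NUL-terminated dirnames.
        sys.stdout.buffer.write(dirname_z(sys.stdin.buffer.read()))
    else:
        # Demo mode: show the transformation with NULs rendered as "|".
        print(dirname_z(b"a/b\0c\0/d\0").replace(b"\0", b"|").decode())
```

Saved under a name of your choosing, it could sit in a pipeline as `... | python3 dirname_z.py --stdin | ...`; until something like it is wired in, the NOSHUF workaround from 04:21 and 22:56 is what the log recommends.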