[00:04] *** GLaDOS has quit (Read error: Operation timed out)
[00:05] *** GLaDOS (~STR_IDENT@[redacted]) has joined #internetarchive.bak
[00:05] *** svchfoo2 gives channel operator status to GLaDOS
[00:35] *** svchfoo1 has quit (Read error: Operation timed out)
[00:36] *** wp494_ (~wickedpla@[redacted]) has joined #internetarchive.bak
[00:37] *** svchfoo1 (~chfoo1@[redacted]) has joined #internetarchive.bak
[00:37] *** svchfoo2 gives channel operator status to svchfoo1
[00:39] *** wp494 has quit (Read error: Operation timed out)
[00:54] *** patricko- is now known as patrickod
[01:04] *** wp494_ is now known as wp494
[01:05] closure: Yo, bro
[01:06] Turns out, flying to Sweden overnight got me a little time-messed.
[01:23] *** patrickod is now known as patricko-
[01:26] *** patricko- is now known as patrickod
[01:32] *** patrickod is now known as patricko-
[01:59] *** GLaDOS has quit (Read error: Operation timed out)
[02:00] *** GLaDOS (~STR_IDENT@[redacted]) has joined #internetarchive.bak
[02:01] *** svchfoo2 gives channel operator status to GLaDOS
[02:10] *** GLaDOS has quit (Ping timeout: 260 seconds)
[02:11] *** GLaDOS (~STR_IDENT@[redacted]) has joined #internetarchive.bak
[02:11] *** svchfoo2 gives channel operator status to GLaDOS
[02:26] *** GLaDOS has quit (Read error: Operation timed out)
[02:26] *** GLaDOS (~STR_IDENT@[redacted]) has joined #internetarchive.bak
[02:27] *** svchfoo1 gives channel operator status to GLaDOS
[02:39] *** GLaDOS has quit (Read error: Operation timed out)
[02:42] *** GLaDOS (~STR_IDENT@[redacted]) has joined #internetarchive.bak
[02:42] *** svchfoo2 gives channel operator status to GLaDOS
[02:52] *** GLaDOS has quit (Ping timeout: 260 seconds)
[02:55] *** GLaDOS (~STR_IDENT@[redacted]) has joined #internetarchive.bak
[02:55] *** svchfoo2 gives channel operator status to GLaDOS
[05:36] *** zottelbey (~zottelbey@[redacted]) has joined #internetarchive.bak
[06:01] *** bzc6p has quit (bzc6p)
[06:10] *** GLaDOS has quit (Read error:
Operation timed out)
[06:11] *** GLaDOS (~STR_IDENT@[redacted]) has joined #internetarchive.bak
[06:11] *** svchfoo2 gives channel operator status to GLaDOS
[06:28] *** wp494 has quit (Ping timeout: 740 seconds)
[06:53] *** wp494 (~wickedpla@[redacted]) has joined #internetarchive.bak
[08:20] *** niyaje (~niyaje@[redacted]) has joined #internetarchive.bak
[09:19] *** niyaje has quit (Ping timeout: 600 seconds)
[09:20] *** niyaje (~niyaje@[redacted]) has joined #internetarchive.bak
[09:42] *** Start has quit (ircd.shaw.ca irc.shaw.ca)
[09:42] *** csssuf has quit (ircd.shaw.ca irc.shaw.ca)
[09:42] *** garyrh has quit (ircd.shaw.ca irc.shaw.ca)
[09:42] *** pikhq has quit (ircd.shaw.ca irc.shaw.ca)
[09:42] *** wp494 has quit (ircd.shaw.ca irc.shaw.ca)
[10:02] *** Start (~Start@[redacted]) has joined #internetarchive.bak
[10:02] *** csssuf (~csssuf@[redacted]) has joined #internetarchive.bak
[10:02] *** garyrh (garyrh@[redacted]) has joined #internetarchive.bak
[10:02] *** irc.shaw.ca gives channel operator status to Start garyrh pikhq
[10:02] *** pikhq (~pikhq@[redacted]) has joined #internetarchive.bak
[10:02] *** wp494 (~wickedpla@[redacted]) has joined #internetarchive.bak
[10:02] *** niyaje has quit (Ping timeout: 600 seconds)
[10:10] *** niyaje (~niyaje@[redacted]) has joined #internetarchive.bak
[10:29] *** cloudmons has quit (Read error: Connection reset by peer)
[10:29] *** cloudmons (~quassel@[redacted]) has joined #internetarchive.bak
[10:37] *** niyaje2 (~niyaje@[redacted]) has joined #internetarchive.bak
[10:37] *** niyaje has quit (Ping timeout: 600 seconds)
[10:41] *** niyaje (~niyaje@[redacted]) has joined #internetarchive.bak
[10:47] *** niyaje3 (~niyaje@[redacted]) has joined #internetarchive.bak
[10:47] *** niyaje2 has quit (Ping timeout: 600 seconds)
[10:49] *** niyaje3 has quit (Client Quit)
[10:54] *** niyaje has quit (Ping timeout: 600 seconds)
[11:21] *** SketchCo1 (~jscott@[redacted]) has joined #internetarchive.bak
[11:21] *** Kazzy_
(~Kaz@[redacted]) has joined #internetarchive.bak
[11:26] *** Kazzy has quit (hub.se efnet.portlane.se)
[11:26] *** SketchCow has quit (hub.se efnet.portlane.se)
[11:26] *** underscor has quit (hub.se efnet.portlane.se)
[11:42] *** underscor (~quassel@[redacted]) has joined #internetarchive.bak
[11:42] *** Kazzy_ is now known as Kazzy
[11:42] *** svchfoo2 gives channel operator status to Kazzy
[13:46] *** Start has quit (Disconnected.)
[14:22] *** bzc6p (~bzc6p@[redacted]) has joined #internetarchive.bak
[14:34] *** Start (~Start@[redacted]) has joined #internetarchive.bak
[14:35] *** Start has quit (Read error: Connection reset by peer)
[14:35] *** Start_ (~Start@[redacted]) has joined #internetarchive.bak
[14:46] *** Start_ has quit (Read error: Connection reset by peer)
[14:46] *** Start (~Start@[redacted]) has joined #internetarchive.bak
[15:18] *** Start has quit (Disconnected.)
[15:25] *** Start (~Start@[redacted]) has joined #internetarchive.bak
[15:26] *** Start has quit (Read error: Connection reset by peer)
[15:28] *** Start (~Start@[redacted]) has joined #internetarchive.bak
[15:29] *** kofica (~TheBuda@[redacted]) has joined #internetarchive.bak
[15:29] @find jerusalem bible
[15:29] *** kofica (~TheBuda@[redacted]) has left #internetarchive.bak
[15:30] *** kofica (~TheBuda@[redacted]) has joined #internetarchive.bak
[15:30] *** kofica (~TheBuda@[redacted]) has left #internetarchive.bak
[15:43] *** zottelbey has quit (Remote host closed the connection)
[15:51] *** garyrh has quit (Remote host closed the connection)
[15:51] *** Start has quit (Disconnected.)
[16:01] *** VADemon (~VADemon@[redacted]) has joined #internetarchive.bak
[16:22] *** zottelbey (~zottelbey@[redacted]) has joined #internetarchive.bak
[16:24] *** zottelbey has quit (Remote host closed the connection)
[16:26] *** zottelbey (~zottelbey@[redacted]) has joined #internetarchive.bak
[16:28] *** Start (~Start@[redacted]) has joined #internetarchive.bak
[16:29] *** Start has quit (Read error: Connection reset by peer)
[16:29] *** Start_ (~Start@[redacted]) has joined #internetarchive.bak
[16:29] *** zottelbey has quit (Remote host closed the connection)
[16:29] *** Start_ is now known as Start
[16:33] *** zottelbey (~zottelbey@[redacted]) has joined #internetarchive.bak
[16:45] *** Start has quit (Disconnected.)
[16:55] *** patricko- is now known as patrickod
[17:08] *** patrickod is now known as patricko-
[17:43] *** SketchCo1 is now known as SketchCOw
[17:43] *** SketchCOw is now known as SketchCow
[17:43] *** svchfoo2 gives channel operator status to SketchCow
[17:50] closure: I'm around
[17:54] SketchCow: so, we need a machine or container that can be the git server. Any thoughts what to use?
[17:56] *** Start (~Start@[redacted]) has joined #internetarchive.bak
[18:01] How much space does it need?
[18:01] I can probably throw you on sisyphus
[18:02] I think not a whole lot of space.. let's see
[18:03] Also, this MIGHT be a job for Kenshin
[18:03] Who wants to help but IA may be slow talking about CDN, while I am much happier to utilize resources for a project like this.
[18:04] get closure on sisyphus and get the ball rollin'
[18:04] the demo shard is 51 mb
[18:04] times 1770 shards
[18:04] hardly nothing
[18:07] we may end up wanting a separate unix account per shard though, or something like that, to limit the ssh keys that can access it
[18:09] 90gb, if I see that correctly.
[18:09] yipdw: Yes, I agree
[18:11] oh I was going for the rock puns
[18:12] So, game plan.
[18:12] 1.
Ignore people saying it can't be done, we're fucked, look I have numbers
[18:12] 2. Set up a version of this with teamarchive1/sisyphus
[18:13] 3. Have a few folks using it who step forward, who are not anywhere on IA infrastructure
[18:13] 4. Deal with the 1,409 problems that crop up
[18:13] 5. Hack away at a pretty interface once this is working, to show it working
[18:13] 6. Increase backup size when we think it works and more people step forward
[18:13] 7. Repeat 6
[18:14] 8. Success/Failure/Wonder/Sadness
[18:14] where does "try a restore" fit in?
[18:14] I mostly just want to see that work; I know it's going to be largely provided by git-annex
[18:16] that's basically the same list I have at http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK/git-annex_implementation
[18:21] btw the dshr article is interesting, entangled storage is a neat thing (that I'd be happy to add a git-annex backend for if someone points me at the tools), but I think he has some holes in his analysis
[18:21] closure: teamarchive1.fnf.archive.org
[18:26] closure: let me know how much resource you need in terms of ram/hdd storage (please don't tell me the more the merrier), how many VMs you need, plan to need, etc
[18:27] i just got back so i'm going to clear up some pending stuff, but i'll jump in shortly after
[18:28] closure: Let me know if you hit snags with logging in
[18:28] Kenshin: this is the controller, not the storage, it doesn't need much
[18:28] SketchCow: logged in. now what
[18:28] Kenshin: I suspect this will.. yeah, what he said
[18:28] closure: hahahah
[18:28] THE QUESTION
[18:28] That question every nerd asks after a successful slack or OS install
[18:28] my thought would be: add a SHARD1 user account
[18:28] closure: so i assume ssd storage would be better :)
[18:28] closure: high RAM requirement?
[18:28] Kenshin: absolutely
[18:29] closure: Try doing a sudo
[18:29] ram should be low, it's just running some git stuff
[18:29] where do you prefer it?
lax, nyc, uk
[18:31] Kenshin: near IA seems to make sense
[18:32] closure is not in the sudoers file. This incident will be reported.
[18:32] LAX then. what's your preferred distribution?
[18:32] debian
[18:33] christ SketchCow this has a lot of cpus
[18:34] oh, it's qemu
[18:35] OH SHIT REPORTED INCIDENT
[18:35] *** SketchCow and closure run through the streets, dogs barking, lights shining on them
[18:35] We had a good run, man
[18:35] *** SketchCow holds the bullet wound but it's too late
[18:36] tell..
[18:36] telll underscor his ass was mighty fine
[18:36] the box.. it's full of cpus
[18:36] closure: Fixed
[18:36] Have at, try not to destroy the box, I do like it
[18:37] *** Start has quit (Disconnected.)
[18:38] So, just one example where I disagree with Rosenthal
[18:38] He talks about the recovery situation
[18:38] And he mentions the crater scenario (no IA) and the slightly gone scenario (a drive pair blew up)
[18:39] And he then applies current bandwidth situation as his calculation for how long recovery takes.
[18:39] Except
[18:39] A cratered, returned IA will be hosted somewhere else, and I guarantee the organization would be paying for RIDICULOUS amounts of upstream
[18:39] RIDICULOUS
[18:40] And we'd have huge amounts of changes for allowing the maximum amount of data to flow in.
[18:41] So that's one problem right there. An academic forgets how much money can be thrown at a problem and how much people will do if given enough money for a task.
[18:41] But I'm more concerned about lost items. specific ones
[18:41] Either by mistake, hard drive failure, etc.
[18:59] *** patricko- is now known as patrickod
[19:03] *** Start (~Start@[redacted]) has joined #internetarchive.bak
[19:19] git clone SHARD1@124.6.40.227:shard1
[19:19] I'm using Kenshin's VM for now, I like ssd
[19:20] give me a ssh public key and I'll give you access to this repo, to test it
[19:25] *** Start has quit (Disconnected.)
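Editor's note: the sizing exchange at 18:04 to 18:09 (51 MB per demo shard, 1770 shards, roughly 90 GB) can be checked with a quick calculation. This is only a sketch; the variable names are illustrative, and the conversion assumes decimal units (1000 MB per GB), which the log does not specify.

```shell
#!/bin/sh
# Figures quoted in the log: size of the demo shard's git repo and the shard count.
shard_mb=51
shard_count=1770

# Total size of all shard repos on the git server, in MB.
total_mb=$(( shard_mb * shard_count ))

# Assumption: decimal gigabytes (1000 MB per GB); integer division truncates.
total_gb=$(( total_mb / 1000 ))

echo "total: ${total_mb} MB (about ${total_gb} GB)"
```

This covers only the git repositories on the controller, not the archived content itself, which the log later puts at around 2 TB for this one shard.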
[19:29] *** patrickod is now known as patricko-
[19:32] *** Start (~Start@[redacted]) has joined #internetarchive.bak
[19:33] *** Start has quit (Read error: Connection reset by peer)
[19:33] *** Start_ (~Start@[redacted]) has joined #internetarchive.bak
[19:36] *** Start_ is now known as Start
[19:38] anyone? cat .ssh/id_rsa.pub to me
[19:43] how big is it?
[19:45] *** Start has quit (Read error: Connection reset by peer)
[19:45] *** Start_ (~Start@[redacted]) has joined #internetarchive.bak
[19:45] *** Start_ is now known as Start
[19:46] 50 mb
[19:46] *** patricko- is now known as patrickod
[19:48] sep332: you should be able to clone it now, and see http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK/git-annex_implementation#demo_shard for other commands
[19:49] I have also done some initial tweaks to make the repo mostly read-only
[19:53] *** patrickod is now known as patricko-
[20:00] so my username is SHARD1 ?
[20:01] that's the git repo's username, yes
[20:07] got it. what's a good way to test?
[20:08] after a git clone
[20:08] pick some of the items, and git annex get them, to start
[20:08] then git annex sync , and it should tell the server what you have
[20:08] oh you updated the wiki already :p
[20:08] thanks
[20:09] you could just "git annex get ." to start downloading from the top
[20:09] obvs we'll need better ways to allocate items to clients later
[20:10] it's around 2 tb if you download everything ;)
[20:10] ok, i've got ~800GB from the other test shard so far
[20:10] which other one? the old http url?
[20:11] on another box though so I can't just cp them... i should just fix that box i guess
[20:11] if you already downloaded that much, you should convert it to use the new repo as origin. It's the same git repo, same daya
[20:11] data
[20:11] from testrepo1
[20:11] yep, shard1 *is* testrepo1
[20:12] good lord 800 gb?
[20:12] gotcha.
my git-fu is nonexistent :)
[20:12] oh right, this is ArchiveTeam, give them a list of urls, and wham
[20:12] well i had a 4TB HD sitting here all lonely
[20:13] so, you'll need to give me the ssh public key for the account on that other box, and then edit .git/config, swap out the http url with SHARD1@124.6.40.227:shard1
[20:13] and then it will just switch over
[20:14] nice to know you could pull 800 gb this way from IA over the past 2 weeks I was away
[20:14] i can use the same key.
[20:14] sure
[20:17] so here's something you could do.. once you set up that other box, run git annex sync in both repos. Then, in the new repo: git annex get --not --copies=2
[20:17] that will only download files that don't have 2 known copies, and the IA counts as 1 copy, so it will get files you have not already gotten in the old repo
[20:17] ok cool
[20:18] i'm getting an error on git annex sync though
[20:18] testshard: error while loading shared libraries: testshard: cannot open shared object file: No such file or directory
[20:18] hmm
[20:18] Please make sure you have the correct access rights and the repository exists.
[20:19] I think that's a bug with your installation of git-annex
[20:19] it says that twice and then
[20:19] (non-fast-forward problems can be solved by setting receive.denyNonFastforwards to false in the remote's git config)
[20:19] I guess you're using the standalone tarball of git-annex and maybe not in the right way
[20:19] that's... spookily accurate
[20:19] I actually tried these shards with the git-annex in debian stable, and it works ok
[20:20] it just can't git annex fsck, otherwise things work
[20:20] oh! ok then. i'll do that.
[20:20] or you might try git-annex.linux/runshell , which should give you a shell environment using the right libs
[20:22] *** Start has quit (Disconnected.)
[20:23] runshell doesn't seem to help.
i'm going to try whatever ubuntu has for git-annex
[20:23] hmm, I'm not so sure it's the local install
[20:24] paste git-annex sync --debug
[20:29] http://pastebin.com/raw.php?i=FdbmsBQn
[20:32] what does this say? git config remote.origin.url
[20:33] testshard:shard1
[20:33] testshard is a line in my .ssh/config
[20:33] 124.6.40.227, user SHARD1
[20:33] yeah, I think you got that wrong somehow :)
[20:34] maybe run: git config remote.origin.url SHARD1@124.6.40.227:shard1
[20:36] i get the same error on git-annex sync though
[20:37] git fetch origin is what's failing there
[20:38] ok now how do i tell git to use my key without putting it in .ssh/config ?
[20:39] oh, I see your problem there.. yeah, that's a pain to set up. Why not just give me a second ssh key?
[20:40] i reverted to testshard:shard1 and "git fetch origin" doesn't fail
[20:40] doesn't seem to do anything
[20:42] ok i got SHARD1@124.6.40.227:shard1 to work using ssh-agent
[20:42] but same old error
[20:51] dude, I dunno. you're way down some nonstandard, bad-idea config rabbithole
[20:51] alright, don't sweat it then
[20:51] when git starts trying to load your .ssh/config dummy hostname as a shared library, you've probably done something wrong
[20:52] i'll sleep on it and poke it with a stick tomorrow :)
[21:45] *** zottelbey has quit (Remote host closed the connection)
[22:00] *** patricko- is now known as patrickod
[22:02] *** patrickod is now known as patricko-
[22:27] *** kaizoku (~kaizoku@[redacted]) has left #internetarchive.bak
[22:39] *** Start (~Start@[redacted]) has joined #internetarchive.bak
[22:39] *** svchfoo1 gives channel operator status to Start
[23:00] closure: this isn't really your department, but when I try to install git annex from source it fails to install one of the dependencies (Test.Tasty.QuickCheck). have you noticed this?
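Editor's note: the origin switch discussed between 20:13 and 20:34 (replacing the old http URL in .git/config with SHARD1@124.6.40.227:shard1) might look roughly like the sketch below. It runs against a scratch repository so that it needs no network access or server account. The old http URL never appears in the log, so the one used here is a hypothetical placeholder, and the `repo` and `origin_url` variable names are illustrative.

```shell
#!/bin/sh
set -e

# Scratch repository standing in for the existing clone on the other box.
repo=$(mktemp -d)
git init -q "$repo"

# Hypothetical placeholder: the log does not give the old http URL.
git -C "$repo" remote add origin "http://example.invalid/testrepo1"

# Point origin at the new shard repo, using the address quoted in the log.
# This has the same effect as hand-editing the url line in .git/config.
git -C "$repo" remote set-url origin "SHARD1@124.6.40.227:shard1"

# Same check requested at 20:32.
origin_url=$(git -C "$repo" config remote.origin.url)
echo "origin is now: $origin_url"

# In a real clone with git-annex installed and an authorized ssh key,
# the log's suggested follow-up (not run here) is:
#   git annex sync
#   git annex get --not --copies=2

rm -rf "$repo"
```

Using `git remote set-url` rather than editing the file by hand is a choice made for this sketch; the log itself suggests editing .git/config or running `git config remote.origin.url` directly.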