#internetarchive.bak 2015-04-23,Thu


Time Nickname Message
00:00 🔗 sep332 SketchCow: may I suggest linking to https://github.com/ArchiveTeam/IA.BAK/ or https://github.com/ArchiveTeam/IA.BAK/blob/master/README.md instead of that wiki page?
00:12 🔗 trs80 has quit IRC (Ping timeout: 186 seconds)
00:44 🔗 Quile has quit IRC (Ping timeout: 186 seconds)
00:45 🔗 kyan has joined #internetarchive.bak
00:50 🔗 Quile has joined #internetarchive.bak
01:07 🔗 mhazinsk has quit IRC (Ping timeout: 186 seconds)
01:08 🔗 mhazinsk has joined #internetarchive.bak
01:08 🔗 svchfoo3 sets mode: +o mhazinsk
01:19 🔗 trs80 has joined #internetarchive.bak
01:28 🔗 closure SketchCow: nice!
01:45 🔗 trs80 has quit IRC (Ping timeout: 186 seconds)
01:46 🔗 Start-mob has joined #internetarchive.bak
01:46 🔗 trs80 has joined #internetarchive.bak
01:47 🔗 mattl____ has joined #internetarchive.bak
01:48 🔗 mattl____ threw up a VM with 500gb disk space to test this :)
01:48 🔗 logchfoo starts logging #internetarchive.bak at Thu Apr 23 01:48:24 2015
01:48 🔗 logchfoo has joined #internetarchive.bak
01:49 🔗 mattl____ is now known as mattl
01:50 🔗 closure hey mattl!
01:50 🔗 mattl hey!
01:51 🔗 closure is now known as joeyh
01:51 🔗 joeyh just to avoid confusion
01:51 🔗 mattl ahhh
01:51 🔗 mattl i was looking for you.
01:57 🔗 mattl joeyh: i like the message reminding me to make a cronjob and not just be lazy and run everything inside screen
01:59 🔗 joeyh well, it's a start
01:59 🔗 joeyh we need auto-cron, but it's hard to set that up in the right way for everyone
02:00 🔗 mattl i have 4 screens with ./iabak running in each one, let's see how that works. Bytemark have decent bandwidth, shouldn't take too long
02:00 🔗 pikhq I finally hit my space allocation so I've only got a cron job.
02:01 🔗 trs80 has quit IRC (Ping timeout: 186 seconds)
02:01 🔗 joeyh hmm, is BigV cheap enough to keep .5 tb spinning there?
02:01 🔗 mattl yep.
02:01 🔗 mattl 20GBP a month or something like that.
02:02 🔗 joeyh bytemark may not have the best bw to the IA, it seems it can be slow outside the US
02:02 🔗 mattl well, this is a good test either way. CC is moving most things over to BigV.
02:02 🔗 trs80 has joined #internetarchive.bak
02:03 🔗 mattl and we need to start talking to IA
02:03 🔗 SketchCow sep332: Fix the wiki
02:11 🔗 sep332 i added minimal steps to get started and a link to the readme for the rest
02:12 🔗 sep332 also the iabak script tells people to read the readme when you run it
02:29 🔗 iabak-reg has joined #internetarchive.bak
02:30 🔗 iabak-reg has quit IRC (Client Quit)
02:31 🔗 iabak-reg has joined #internetarchive.bak
02:32 🔗 SketchCow Nobody hopped on from this bit, I think
02:32 🔗 SketchCow I will now have to push it
02:46 🔗 iabak-reg master 8e314b5 other fast forward
02:47 🔗 joeyh hmm, not quite iabak-reg
02:48 🔗 iabak-reg registrar master a8adff8 other SHARD2/pubkeys registration of wdenton on SHARD2
02:48 🔗 joeyh that's more like it!
02:57 🔗 iabak-reg registrar master 79de7f9 other SHARD2/pubkeys registration of justtesting on SHARD2
03:12 🔗 Start-mob has quit IRC (Remote host closed the connection)
03:24 🔗 sep332 ooh
04:06 🔗 iabak-reg registrar master 4a3edb9 other SHARD2/pubkeys registration of archiveteam on SHARD2
04:13 🔗 iabak-reg registrar master a62663c other SHARD3/pubkeys registration of archiveteam on SHARD3
04:15 🔗 kalleboo has joined #internetarchive.bak
04:18 🔗 kalleboo hi. when I run iabak, my terminal fills up with "dirname: invalid option -- 'z'"
04:19 🔗 kalleboo this is with GNU coreutils 8.4
04:20 🔗 joeyh and that is the problem.
04:21 🔗 joeyh workaround: touch IA.BAK/NOSHUF and restart
04:22 🔗 kalleboo ok cool
04:23 🔗 kalleboo yeah this is one of those "lying around doing one old thing" servers which isn't really eligible for upgrading everything to the latest and greatest. it's on some quite-old distribution of centos
04:24 🔗 joeyh we could fix it with a perl command that reads stdin, breaks on \0 , truncates to the directory name, and outputs back out with \0
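The filter joeyh describes (read stdin, split on \0, truncate each path to its directory name, write back out with \0) can be sketched as follows. This is a minimal illustration in Python rather than the perl one-liner he suggests; the function name `nul_dirnames` is hypothetical, and it assumes the iabak script only needs `dirname -z` style behavior on POSIX paths.

```python
import posixpath


def nul_dirnames(data: bytes) -> bytes:
    """Map NUL-separated paths to NUL-terminated directory names.

    Hypothetical stand-in for `dirname -z` on systems whose coreutils
    lack the -z option; not taken from the iabak script itself.
    """
    out = []
    for path in data.split(b"\0"):
        if not path:
            continue  # skip the empty record after a trailing NUL
        # Drop trailing slashes first, as dirname does ("a/b/" -> "a").
        stripped = path.rstrip(b"/") or b"/"
        # A bare file name has no directory part; dirname reports ".".
        parent = posixpath.dirname(stripped) or b"."
        out.append(parent + b"\0")
    return b"".join(out)


# Small demonstration on in-memory data.
print(nul_dirnames(b"SHARD2/item/file.txt\0lonefile\0"))
```

To use it as a pipeline filter, one would pass `sys.stdin.buffer.read()` to the function and write the result to `sys.stdout.buffer`.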
04:27 🔗 zottelbey has joined #internetarchive.bak
04:52 🔗 garyrh https://www.reddit.com/r/DataHoarder/comments/33iz8b/that_time_archive_team_decided_to_back_up_the/
04:54 🔗 mhazinsk has quit IRC (Ping timeout: 186 seconds)
04:56 🔗 yipdw what is with /r/DataHoarder and assholes
04:56 🔗 yipdw the correlation coefficient is almost 1
04:57 🔗 garyrh 1? That's not very high.
04:58 🔗 yipdw it is for the correlation coefficient
04:59 🔗 garyrh NOT HIGH.
04:59 🔗 yipdw you're right, that was two days ago
05:03 🔗 garyrh And the place where it's higher? HN.
05:23 🔗 db48x how can the correlation be higher than 1?
05:31 🔗 pikhq "They backed up 9.12 TB? I don't mean to be a party pooper but that doesn't seem impressive."
05:31 🔗 pikhq That...
05:31 🔗 pikhq That sounds like someone who isn't all that cognizant of what all that involves.
05:32 🔗 pikhq It's not like we had some guy with a small number of empty drives mash wget.
06:02 🔗 joeyh also, 9.1 * 3
06:04 🔗 pikhq *nod*
06:21 🔗 DFJustin well it is /r/DataHoarder/ where there is a dude with a literal petabyte in his house
06:21 🔗 DFJustin (hi ohhdemgir)
06:22 🔗 DFJustin other forums might be more impressed
06:44 🔗 SketchCow It won't be impressive until the number spikes up past a petabyte
07:23 🔗 stapper has joined #internetarchive.bak
07:31 🔗 espes___ I'm confused about how the ia.bak git annex server is set up
07:32 🔗 espes___ does it just have a local copy of all the shards or is stuff setup to use a remove backed by internetarchive s3 or something
07:33 🔗 espes___ remote*
07:36 🔗 iabak-reg registrar master 0ea43c4 other SHARD3/pubkeys registration of wild.dominic on SHARD3
07:36 🔗 espes___ or are the files themselves backed by urls or something
07:38 🔗 db48x espes___: each shard is a git annex repository where each file is added using git annex addurl
07:39 🔗 espes___ oh neat
07:42 🔗 db48x if you do git annex whereis | less it'll show you where each file is located, including the url for the web remote
08:19 🔗 cloudmons has joined #internetarchive.bak
08:40 🔗 atomotic has joined #internetarchive.bak
08:43 🔗 Senji Hmm, someone appears to have grabbed 2/3 of shard 2 overnight :-)
08:43 🔗 Senji AKA yay shard2 finished
08:44 🔗 cloudmons has quit IRC (ircd.choopa.net irc.mzima.net)
09:09 🔗 SketchCow Hurrah
09:30 🔗 marvinw has quit IRC (Read error: Operation timed out)
09:32 🔗 logchfoo_ starts logging #internetarchive.bak at Thu Apr 23 09:32:52 2015
09:32 🔗 logchfoo_ has joined #internetarchive.bak
09:34 🔗 GLaDOS has quit IRC (Read error: Operation timed out)
09:36 🔗 GLaDOS has joined #internetarchive.bak
09:37 🔗 svchfoo2 sets mode: +o GLaDOS
09:52 🔗 Start has quit IRC (ircd.shaw.ca irc.shaw.ca)
09:52 🔗 chfoo- has quit IRC (ircd.shaw.ca irc.shaw.ca)
09:52 🔗 wp494 has quit IRC (ircd.shaw.ca irc.shaw.ca)
09:52 🔗 garyrh has quit IRC (ircd.shaw.ca irc.shaw.ca)
09:52 🔗 DFJustin has quit IRC (ircd.shaw.ca irc.shaw.ca)
09:52 🔗 matthusby has quit IRC (ircd.shaw.ca irc.shaw.ca)
09:52 🔗 Sanqui has quit IRC (ircd.shaw.ca irc.shaw.ca)
09:52 🔗 underscor has quit IRC (ircd.shaw.ca irc.shaw.ca)
09:52 🔗 csssuf has quit IRC (ircd.shaw.ca irc.shaw.ca)
09:57 🔗 Start has joined #internetarchive.bak
09:57 🔗 chfoo- has joined #internetarchive.bak
09:57 🔗 wp494 has joined #internetarchive.bak
09:57 🔗 garyrh has joined #internetarchive.bak
09:57 🔗 matthusby has joined #internetarchive.bak
09:57 🔗 DFJustin has joined #internetarchive.bak
09:57 🔗 Sanqui has joined #internetarchive.bak
09:57 🔗 underscor has joined #internetarchive.bak
09:57 🔗 csssuf has joined #internetarchive.bak
09:57 🔗 irc.shaw.ca sets mode: +o DFJustin
10:02 🔗 marvinw has joined #internetarchive.bak
10:22 🔗 marvinw has quit IRC (Ping timeout: 606 seconds)
10:30 🔗 S[h]O[r]T has joined #internetarchive.bak
10:30 🔗 sep332 has joined #internetarchive.bak
10:31 🔗 svchfoo2 sets mode: +o sep332
10:32 🔗 marvinw has joined #internetarchive.bak
10:33 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
11:11 🔗 kalleboo has quit IRC (Linkinus - http://linkinus.com)
11:22 🔗 richo has joined #internetarchive.bak
11:23 🔗 zottelbey has quit IRC (Remote host closed the connection)
11:24 🔗 zottelbey has joined #internetarchive.bak
11:24 🔗 joeyh woooo
11:29 🔗 joeyh clients should be switching over to shard3
11:38 🔗 atomotic has joined #internetarchive.bak
11:41 🔗 iabak-reg registrar master 9409bbf other SHARD1/pubkeys registration of id on SHARD1
11:41 🔗 iabak-reg registrar master 3fbf3d3 other SHARD2/pubkeys registration of id on SHARD2
11:41 🔗 iabak-reg registrar master 8fc1f58 other SHARD3/pubkeys registration of id on SHARD3
11:45 🔗 zottelbey has quit IRC (Remote host closed the connection)
11:47 🔗 zottelbey has joined #internetarchive.bak
11:48 🔗 joeyh so shard2 took 10 days
11:51 🔗 SketchCow Right. Although we were quiet about it initially.
11:51 🔗 SketchCow We should probably set up the next 10 shards.
11:51 🔗 SketchCow or 5 at least.
11:53 🔗 SketchCow And is there scripting or contingency yet for the script to go "oh, there's more shards and I have more space"
11:54 🔗 joeyh there is, it needs a slight bit of dehardcoding to not just switch to shard3 though
12:02 🔗 SketchCow I'd say work on that, next.
12:02 🔗 SketchCow Then we can start making sure that people with space who show up aren't waiting for assignment.
12:03 🔗 SketchCow Obviously, as time goes on, people with multi-terabyte sets are going to help us hit larger and larger collections.
12:23 🔗 iabak-reg registrar master 669c3d1 other SHARD3/pubkeys registration of id on SHARD3
13:02 🔗 sankin has joined #internetarchive.bak
13:57 🔗 Start has quit IRC (Disconnected.)
13:57 🔗 joeyh SketchCow: any collection recs for new shards?
14:16 🔗 Senji I note that Shard 1 still doesn't have 100% >=3 backups (indeed it appears to have one file not backed up at all); is there anything going on to deal with that?
14:17 🔗 Senji Shard 2 currently shows 100% >=3 so is comparatively in better shape
14:18 🔗 SketchCow joeyh: I'll ping you with them
14:50 🔗 Start has joined #internetarchive.bak
14:57 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
14:57 🔗 Start has quit IRC (Disconnected.)
15:03 🔗 Start has joined #internetarchive.bak
15:05 🔗 phuzion has joined #internetarchive.bak
15:17 🔗 joeyh so, I think we'll soon be able to configure git-annex like so: (balanced_amoung(backup) and not (copies=backup:3)) or present
15:18 🔗 joeyh and the files will be spread among the repos in a balanced way, w/o resorting to randomness like we do now
15:18 🔗 joeyh and without the extra copies some files get now
15:18 🔗 joeyh (well, with less of them anyhow)
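The idea of spreading files evenly instead of picking repositories at random can be illustrated with a toy model. This is a sketch under stated assumptions, not git-annex's implementation: the function `place_balanced`, the greedy least-loaded rule, and the repository names are all hypothetical. Each file goes to the three repositories that currently hold the least data, so no file gets more than three copies.

```python
import heapq


def place_balanced(file_sizes, repos, copies=3):
    """Greedy toy placement: each file goes to the least-loaded repos.

    Hypothetical illustration only; git-annex's preferred-content
    machinery is not modelled here.
    """
    if copies > len(repos):
        raise ValueError("need at least as many repos as copies")
    load = {r: 0 for r in repos}
    placement = {}
    # Placing large files first tends to even out the final loads.
    for name, size in sorted(file_sizes.items(), key=lambda kv: -kv[1]):
        chosen = heapq.nsmallest(copies, repos, key=lambda r: (load[r], r))
        for r in chosen:
            load[r] += size
        placement[name] = chosen
    return placement, load


files = {f"file{i}": 10 for i in range(8)}
placement, load = place_balanced(files, ["repoA", "repoB", "repoC", "repoD"])
print(load)
```

With random selection, by contrast, some repositories fill faster than others and clients that race each other can end up storing a fourth or fifth copy of the same file.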
15:37 🔗 swebb So I'm still downloading files for SHARD2, but it looks to be 100% backed up now. Will the git-annex stuff be smart enough to start me downloading SHARD3?
15:37 🔗 joeyh it should switch you over, yes
15:37 🔗 swebb ok
15:38 🔗 joeyh there might be a little period where your client hasn't heard from the others that shard2 is done and does a little extra downloading
15:51 🔗 joeyh db48x: hey, I see you installed fail2ban on the server. any particular reason?
15:51 🔗 Start has quit IRC (Disconnected.)
15:51 🔗 joeyh I'm inclined to just disable all password auth, but let clients connect as often as they like
15:51 🔗 joeyh heey, this might explain some of the spikes in the graph, if a client got banned for a while
16:00 🔗 Start has joined #internetarchive.bak
16:26 🔗 real_eyes is now known as realeyes
16:32 🔗 iabak-reg registrar master d7b6ef4 other SHARD3/pubkeys registration of bas+at on SHARD3
16:35 🔗 lhobas seems to work fine on OS X :)
16:45 🔗 Start has quit IRC (Disconnected.)
17:15 🔗 VADemon has joined #internetarchive.bak
17:20 🔗 db48x has quit IRC (Ping timeout: 258 seconds)
17:28 🔗 phuzion iabak does not work on freenas, is this known?
17:28 🔗 phuzion (freenas is BSD based, I know)
17:31 🔗 joeyh there's no git-annex build for it, that'd be the first problem
17:35 🔗 phuzion ok, just wanted to make sure I wasn't going crazy
17:41 🔗 ersi well, you're using BSD so you can't be too sure
17:41 🔗 * ersi hides
17:42 🔗 phuzion Hah
17:56 🔗 kyan has quit IRC (Quit: This computer has gone to sleep)
18:11 🔗 garyrh has quit IRC (Remote host closed the connection)
18:46 🔗 iabak-reg registrar master 761215d other SHARD3/pubkeys registration of chris on SHARD3
19:22 🔗 phuzion Hey guys, I'm getting the following error, any ideas? http://pastebin.com/Tv3n79Yh
19:23 🔗 phuzion Should I just delete the files in question and rerun ./iabak?
19:32 🔗 Senji2 has joined #internetarchive.bak
19:33 🔗 Senji2 mmm, fscking shard2. Guess it's time to setup cronjob on cleopatra
19:34 🔗 garyrh has joined #internetarchive.bak
19:43 🔗 joeyh phuzion: hmm, I wonder if your repository is in direct mode?
19:43 🔗 joeyh you could run git reset --hard in there
19:43 🔗 phuzion I nuked it, I'm gonna try again
19:43 🔗 joeyh don't know why a file would be changed though
19:43 🔗 joeyh wait one sec
19:44 🔗 joeyh you don't want to commit a deletion of that file
19:44 🔗 joeyh so git reset --hard
19:44 🔗 phuzion The original directory's already gone, sorry.
19:44 🔗 phuzion I'm starting from scratch.
19:44 🔗 joeyh oh, ok
19:45 🔗 joeyh oh, this is on freebsd?
19:45 🔗 phuzion nope
19:45 🔗 phuzion centos 7
19:45 🔗 joeyh what filesystem?
19:45 🔗 phuzion i'm saving to an NFS share if that makes any difference
19:45 🔗 iabak-reg registrar master b729821 other SHARD3/pubkeys registration of chris on SHARD3
19:45 🔗 phuzion that's me re-registering
19:45 🔗 joeyh nfs is gonna be flakey one way or another
19:45 🔗 phuzion Bummer
19:46 🔗 phuzion Flaky as in I shouldn't bother with it?
19:47 🔗 joeyh depends, I've never seen it flake out this way before
19:48 🔗 phuzion out of curiosity, what kind of network performance are you seeing when you're pulling objects?
19:48 🔗 phuzion I'm getting about 200KiB/s
19:48 🔗 Senji2 sounds about right
19:49 🔗 joeyh that's on the low end. I'd say run concurrent iabak, but that is known to not be wise on nfs
19:49 🔗 Senji2 I can max out my adsl with 10 copies
19:50 🔗 phuzion Would iSCSI perform better?
19:57 🔗 lhobas gut feeling: concurrent ./iabak on OS X performs poorly due to lack of shuf? (lots of "transfer already in progress")
19:59 🔗 joeyh lhobas: it'll tend to contend with itself like that yes.
19:59 🔗 Senji2 on linux with NOSHUF that doesn't slow things down much
20:00 🔗 joeyh makes it do a bit more work to find each file
20:02 🔗 Senji2 yeah, but most of the time you're downloading 300+MB files at 200kB/s rather than finding a new file
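The arithmetic behind that point is worked through below, assuming decimal units (1 MB = 1000 kB) and the figures quoted in the channel. The helper name `download_seconds` is hypothetical. A 300 MB file at 200 kB/s takes about 25 minutes, so a few extra seconds spent finding the next file is a small fraction of each transfer.

```python
def download_seconds(size_mb: float, rate_kb_per_s: float) -> float:
    """Seconds to fetch size_mb megabytes at rate_kb_per_s kB/s.

    Assumes decimal units: 1 MB = 1000 kB.
    """
    return size_mb * 1000 / rate_kb_per_s


seconds = download_seconds(300, 200)
print(f"{seconds:.0f} s, about {seconds / 60:.0f} minutes per file")
```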
20:03 🔗 lhobas seeing lots of small files and contending atm with (beginning of) shard3
20:04 🔗 sep332 wow <10% of the files are IA-only now
20:04 🔗 Senji2 if the files are primarily small then you might be better off running one copy until you get to some bigger ones
20:05 🔗 Senji2 depending how big 'small' is
20:06 🔗 lhobas Senji2: doing that for now
20:06 🔗 lhobas any chance of changing the hostname that is synced through git-annex? (and showing up in http://iabak.archiveteam.org/stats/SHARD3.leaderboard etc) Did not consider how it might leak personal info
20:09 🔗 Senji2 for shard2 I got a bit over a TB with concurrent NOSHUF, but my machine running shard3 has a working readlink so I don't know what 3 is like in that regard.
20:12 🔗 joeyh lhobas: just cd SHARDn; git annex describe . whatever
20:17 🔗 atomotic has joined #internetarchive.bak
20:37 🔗 joeyh moar red :)
20:39 🔗 phuzion joeyh: I might be able to throw like 2-3TB at this if I can figure out iSCSI on this NAS
20:41 🔗 Senji2 job for next week is to see how much of the spare disk pile still works :)
20:51 🔗 atomotic has quit IRC (Ping timeout: 260 seconds)
20:58 🔗 sankin has quit IRC (Leaving.)
21:39 🔗 iabak-reg registrar master 948a90f other SHARD4/pubkeys registration of sean.palmer on SHARD4
21:53 🔗 Atluxity I am having a hard time doing the math behind this project
21:53 🔗 Atluxity do we intend this to be cold storage, or online?
22:13 🔗 kyan has joined #internetarchive.bak
22:21 🔗 Senji2 nearline/online
22:39 🔗 iabak-reg registrar master d682f2b other SHARD3/pubkeys registration of primus1024 on SHARD3
22:42 🔗 Kazzy lhobas: for changing your hostname, cd into the shard directory, then run 'git annex describe . <infohere>'
22:43 🔗 lhobas fixed it, thanks joeyh & Kazzy
22:44 🔗 Kazzy oh right, didn't see message from joeyh between commit messages, heh
22:44 🔗 primus102 has joined #internetarchive.bak
22:47 🔗 primus102 Hi, can someone help me with a problem running ./iabak? When it starts it keeps showing msg: dirname: invalid option -- 'z'
22:47 🔗 primus102 Try `dirname --help' for more information.
22:55 🔗 primus104 has joined #internetarchive.bak
22:55 🔗 primus102 has quit IRC (Remote host closed the connection)
22:56 🔗 Senji touching the NOSHUF file in your IA.BAK folder will stop that (but also stop it shuffling the order of downloads)
22:57 🔗 primus has joined #internetarchive.bak
22:58 🔗 primus thanks, so if i understand correctly it's not a serious error msg and it's ok to leave it working like that?
22:58 🔗 trs80 correct
22:59 🔗 primus thank you
23:11 🔗 iabak-reg registrar master 27f9cd3 other SHARD4/pubkeys registration of primus1024 on SHARD4
23:26 🔗 zottelbey has quit IRC (Remote host closed the connection)
