Time |
Nickname |
Message |
00:00
🔗
|
sep332 |
SketchCow: may I suggest linking to https://github.com/ArchiveTeam/IA.BAK/ or https://github.com/ArchiveTeam/IA.BAK/blob/master/README.md instead of that wiki page? |
00:12
🔗
|
|
trs80 has quit IRC (Ping timeout: 186 seconds) |
00:44
🔗
|
|
Quile has quit IRC (Ping timeout: 186 seconds) |
00:45
🔗
|
|
kyan has joined #internetarchive.bak |
00:50
🔗
|
|
Quile has joined #internetarchive.bak |
01:07
🔗
|
|
mhazinsk has quit IRC (Ping timeout: 186 seconds) |
01:08
🔗
|
|
mhazinsk has joined #internetarchive.bak |
01:08
🔗
|
|
svchfoo3 sets mode: +o mhazinsk |
01:19
🔗
|
|
trs80 has joined #internetarchive.bak |
01:28
🔗
|
closure |
SketchCow: nice! |
01:45
🔗
|
|
trs80 has quit IRC (Ping timeout: 186 seconds) |
01:46
🔗
|
|
Start-mob has joined #internetarchive.bak |
01:46
🔗
|
|
trs80 has joined #internetarchive.bak |
01:47
🔗
|
|
mattl____ has joined #internetarchive.bak |
01:48
🔗
|
mattl____ |
threw up a VM with 500gb disk space to test this :) |
01:48
🔗
|
|
logchfoo starts logging #internetarchive.bak at Thu Apr 23 01:48:24 2015 |
01:48
🔗
|
|
logchfoo has joined #internetarchive.bak |
01:49
🔗
|
|
mattl____ is now known as mattl |
01:50
🔗
|
closure |
hey mattl! |
01:50
🔗
|
mattl |
hey! |
01:51
🔗
|
|
closure is now known as joeyh |
01:51
🔗
|
joeyh |
just to avoid confusion |
01:51
🔗
|
mattl |
ahhh |
01:51
🔗
|
mattl |
i was looking for you. |
01:57
🔗
|
mattl |
joeyh: i like the message reminding me to make a cronjob and not just be lazy and run everything inside screen |
01:59
🔗
|
joeyh |
well, it's a start |
01:59
🔗
|
joeyh |
we need auto-cron, but it's hard to set that up in the right way for everyone |
02:00
🔗
|
mattl |
i have 4 screens with ./iabak running in each one, let's see how that works. Bytemark have decent bandwidth, shouldn't take too long |
02:00
🔗
|
pikhq |
I finally hit my space allocation so I've only got a cron job. |
02:01
🔗
|
|
trs80 has quit IRC (Ping timeout: 186 seconds) |
02:01
🔗
|
joeyh |
hmm, is BigV cheap enough to keep .5 tb spinning there? |
02:01
🔗
|
mattl |
yep. |
02:01
🔗
|
mattl |
20GBP a month or something like that. |
02:02
🔗
|
joeyh |
bytemark may not have the best bw to the IA, it seems it can be slow outside the US |
02:02
🔗
|
mattl |
well, this is a good test either way. CC is moving most things over to BigV. |
02:02
🔗
|
|
trs80 has joined #internetarchive.bak |
02:03
🔗
|
mattl |
and we need to start talking to IA |
02:03
🔗
|
SketchCow |
sep332: Fix the wiki |
02:11
🔗
|
sep332 |
i added minimal steps to get started and a link to the readme for the rest |
02:12
🔗
|
sep332 |
also the iabak script tells people to read the readme when you run it |
02:29
🔗
|
|
iabak-reg has joined #internetarchive.bak |
02:30
🔗
|
|
iabak-reg has quit IRC (Client Quit) |
02:31
🔗
|
|
iabak-reg has joined #internetarchive.bak |
02:32
🔗
|
SketchCow |
Nobody hopped on from this bit, I think |
02:32
🔗
|
SketchCow |
I will now have to push it |
02:46
🔗
|
iabak-reg |
master 8e314b5 other fast forward |
02:47
🔗
|
joeyh |
hmm, not quite iabak-reg |
02:48
🔗
|
iabak-reg |
registrar master a8adff8 other SHARD2/pubkeys registration of wdenton on SHARD2 |
02:48
🔗
|
joeyh |
that's more like it! |
02:57
🔗
|
iabak-reg |
registrar master 79de7f9 other SHARD2/pubkeys registration of justtesting on SHARD2 |
03:12
🔗
|
|
Start-mob has quit IRC (Remote host closed the connection) |
03:24
🔗
|
sep332 |
ooh |
04:06
🔗
|
iabak-reg |
registrar master 4a3edb9 other SHARD2/pubkeys registration of archiveteam on SHARD2 |
04:13
🔗
|
iabak-reg |
registrar master a62663c other SHARD3/pubkeys registration of archiveteam on SHARD3 |
04:15
🔗
|
|
kalleboo has joined #internetarchive.bak |
04:18
🔗
|
kalleboo |
hi. when I run iabak, my terminal fills up with "dirname: invalid option -- 'z'" |
04:19
🔗
|
kalleboo |
this is with GNU coreutils 8.4 |
04:20
🔗
|
joeyh |
and that is the problem. |
04:21
🔗
|
joeyh |
workaround: touch IA.BAK/NOSHUF and restart |
04:22
🔗
|
kalleboo |
ok cool |
04:23
🔗
|
kalleboo |
yeah this is one of those "lying around doing one old thing" servers which isn't really eligible for upgrading everything to the latest and greatest. it's on some quite-old distribution of centos |
04:24
🔗
|
joeyh |
we could fix it with a perl command that reads stdin, breaks on \0 , truncates to the directory name, and outputs back out with \0 |
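The filter joeyh describes could look roughly like the sketch below. It is written in Python rather than as the perl one-liner he has in mind, the function name `dirname_nul` is made up for illustration, and it only approximates what `dirname -z` does on newer coreutils.

```python
import posixpath


def dirname_nul(data: bytes) -> bytes:
    """Map NUL-separated paths to NUL-separated parent directories."""
    parents = []
    for record in data.split(b"\0"):
        if not record:
            continue  # skip the empty record after a trailing NUL
        # Drop trailing slashes so "a/b/" is treated like "a/b".
        stripped = record.rstrip(b"/") or b"/"
        # A bare file name has no directory part; report "." for it.
        parents.append(posixpath.dirname(stripped) or b".")
    # Re-emit every result terminated by NUL, as the next tool expects.
    return b"".join(parent + b"\0" for parent in parents)


print(dirname_nul(b"SHARD3/item/file.txt\0README\0"))
# b'SHARD3/item\x00.\x00'
```

Working on bytes rather than text avoids guessing at file name encodings, which is the same reason NUL is used as the separator in the first place.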
04:27
🔗
|
|
zottelbey has joined #internetarchive.bak |
04:52
🔗
|
garyrh |
https://www.reddit.com/r/DataHoarder/comments/33iz8b/that_time_archive_team_decided_to_back_up_the/ |
04:54
🔗
|
|
mhazinsk has quit IRC (Ping timeout: 186 seconds) |
04:56
🔗
|
yipdw |
what is with /r/DataHoarder and assholes |
04:56
🔗
|
yipdw |
the correlation coefficient is almost 1 |
04:57
🔗
|
garyrh |
1? That's not very high. |
04:58
🔗
|
yipdw |
it is for the correlation coefficient |
04:59
🔗
|
garyrh |
NOT HIGH. |
04:59
🔗
|
yipdw |
you're right, that was two days ago |
05:03
🔗
|
garyrh |
And the place where it's higher? HN. |
05:23
🔗
|
db48x |
how can the correlation be higher than 1? |
05:31
🔗
|
pikhq |
"They backed up 9.12 TB? I don't mean to be a party pooper but that doesn't seem impressive. |
05:31
🔗
|
pikhq |
" |
05:31
🔗
|
pikhq |
That... |
05:31
🔗
|
pikhq |
That sounds like someone who isn't all that cognizant of what all that involves. |
05:32
🔗
|
pikhq |
It's not like we had some guy with a small number of empty drives mash wget. |
06:02
🔗
|
joeyh |
also, 9.1 * 3 |
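joeyh's point is that the figure quoted on reddit counts each file once, while the project keeps several copies. A tiny sketch of the arithmetic, assuming the 9.12 TB figure from the quoted comment and three copies of everything (the function name is illustrative):

```python
def total_stored_tb(unique_tb: float, copies: int) -> float:
    """Total space used across all volunteers, in TB."""
    return round(unique_tb * copies, 2)


print(total_stored_tb(9.12, 3))  # 27.36
```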
06:04
🔗
|
pikhq |
*nod* |
06:21
🔗
|
DFJustin |
well it is /r/DataHoarder/ where there is a dude with a literal petabyte in his house |
06:21
🔗
|
DFJustin |
(hi ohhdemgir) |
06:22
🔗
|
DFJustin |
other forums might be more impressed |
06:44
🔗
|
SketchCow |
It won't be impressive until the number spikes up past a petabyte |
07:23
🔗
|
|
stapper has joined #internetarchive.bak |
07:31
🔗
|
espes___ |
I'm confused how the ia.bak git annex server is set up |
07:32
🔗
|
espes___ |
does it just have a local copy of all the shards or is stuff setup to use a remove backed by internetarchive s3 or something |
07:33
🔗
|
espes___ |
remote* |
07:36
🔗
|
iabak-reg |
registrar master 0ea43c4 other SHARD3/pubkeys registration of wild.dominic on SHARD3 |
07:36
🔗
|
espes___ |
or are the files themselves backed by urls or something |
07:38
🔗
|
db48x |
espes___: each shard is a git annex repository where each file is added using git annex addurl |
07:39
🔗
|
espes___ |
oh neat |
07:42
🔗
|
db48x |
if you do git annex whereis | less it'll show you where each file is located, including the url for the web remote |
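For anyone scripting this, the two commands db48x mentions can be wrapped as below. This is only a sketch: the helper names are invented, the URL and file name in the demo are placeholders rather than real items, and `run_in_shard` assumes git-annex is installed and that `shard_dir` is one of the shard checkouts.

```python
import subprocess
from typing import List, Optional


def addurl_command(url: str, dest: str) -> List[str]:
    """Command that records `url` as the web source for the file `dest`."""
    return ["git", "annex", "addurl", "--file=" + dest, url]


def whereis_command(path: Optional[str] = None) -> List[str]:
    """Command that lists which repositories hold a file (or all files)."""
    cmd = ["git", "annex", "whereis"]
    if path is not None:
        cmd.append(path)
    return cmd


def run_in_shard(shard_dir: str, cmd: List[str]) -> str:
    """Run a command inside a shard checkout and return its stdout."""
    result = subprocess.run(
        cmd, cwd=shard_dir, check=True, capture_output=True, text=True
    )
    return result.stdout


# Placeholder values, shown only to illustrate the command shape.
print(" ".join(addurl_command("https://example.org/some/file.txt", "file.txt")))
print(" ".join(whereis_command()))
```

Building the argument list separately from running it makes it possible to check what would be executed without touching a repository.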
08:19
🔗
|
|
cloudmons has joined #internetarchive.bak |
08:40
🔗
|
|
atomotic has joined #internetarchive.bak |
08:43
🔗
|
Senji |
Hmm, someone appears to have grabbed 2/3 of shard 2 overnight :-) |
08:43
🔗
|
Senji |
AKA yay shard2 finished |
08:44
🔗
|
|
cloudmons has quit IRC (ircd.choopa.net irc.mzima.net) |
09:09
🔗
|
SketchCow |
Hurrah |
09:30
🔗
|
|
marvinw has quit IRC (Read error: Operation timed out) |
09:32
🔗
|
|
logchfoo_ starts logging #internetarchive.bak at Thu Apr 23 09:32:52 2015 |
09:32
🔗
|
|
logchfoo_ has joined #internetarchive.bak |
09:34
🔗
|
|
GLaDOS has quit IRC (Read error: Operation timed out) |
09:36
🔗
|
|
GLaDOS has joined #internetarchive.bak |
09:37
🔗
|
|
svchfoo2 sets mode: +o GLaDOS |
09:52
🔗
|
|
Start has quit IRC (ircd.shaw.ca irc.shaw.ca) |
09:52
🔗
|
|
chfoo- has quit IRC (ircd.shaw.ca irc.shaw.ca) |
09:52
🔗
|
|
wp494 has quit IRC (ircd.shaw.ca irc.shaw.ca) |
09:52
🔗
|
|
garyrh has quit IRC (ircd.shaw.ca irc.shaw.ca) |
09:52
🔗
|
|
DFJustin has quit IRC (ircd.shaw.ca irc.shaw.ca) |
09:52
🔗
|
|
matthusby has quit IRC (ircd.shaw.ca irc.shaw.ca) |
09:52
🔗
|
|
Sanqui has quit IRC (ircd.shaw.ca irc.shaw.ca) |
09:52
🔗
|
|
underscor has quit IRC (ircd.shaw.ca irc.shaw.ca) |
09:52
🔗
|
|
csssuf has quit IRC (ircd.shaw.ca irc.shaw.ca) |
09:57
🔗
|
|
Start has joined #internetarchive.bak |
09:57
🔗
|
|
chfoo- has joined #internetarchive.bak |
09:57
🔗
|
|
wp494 has joined #internetarchive.bak |
09:57
🔗
|
|
garyrh has joined #internetarchive.bak |
09:57
🔗
|
|
matthusby has joined #internetarchive.bak |
09:57
🔗
|
|
DFJustin has joined #internetarchive.bak |
09:57
🔗
|
|
Sanqui has joined #internetarchive.bak |
09:57
🔗
|
|
underscor has joined #internetarchive.bak |
09:57
🔗
|
|
csssuf has joined #internetarchive.bak |
09:57
🔗
|
|
irc.shaw.ca sets mode: +o DFJustin |
10:02
🔗
|
|
marvinw has joined #internetarchive.bak |
10:22
🔗
|
|
marvinw has quit IRC (Ping timeout: 606 seconds) |
10:30
🔗
|
|
S[h]O[r]T has joined #internetarchive.bak |
10:30
🔗
|
|
sep332 has joined #internetarchive.bak |
10:31
🔗
|
|
svchfoo2 sets mode: +o sep332 |
10:32
🔗
|
|
marvinw has joined #internetarchive.bak |
10:33
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
11:11
🔗
|
|
kalleboo has quit IRC (Linkinus - http://linkinus.com) |
11:22
🔗
|
|
richo has joined #internetarchive.bak |
11:23
🔗
|
|
zottelbey has quit IRC (Remote host closed the connection) |
11:24
🔗
|
|
zottelbey has joined #internetarchive.bak |
11:24
🔗
|
joeyh |
woooo |
11:29
🔗
|
joeyh |
clients should be switching over to shard3 |
11:38
🔗
|
|
atomotic has joined #internetarchive.bak |
11:41
🔗
|
iabak-reg |
registrar master 9409bbf other SHARD1/pubkeys registration of id on SHARD1 |
11:41
🔗
|
iabak-reg |
registrar master 3fbf3d3 other SHARD2/pubkeys registration of id on SHARD2 |
11:41
🔗
|
iabak-reg |
registrar master 8fc1f58 other SHARD3/pubkeys registration of id on SHARD3 |
11:45
🔗
|
|
zottelbey has quit IRC (Remote host closed the connection) |
11:47
🔗
|
|
zottelbey has joined #internetarchive.bak |
11:48
🔗
|
joeyh |
so shard2 took 10 days |
11:51
🔗
|
SketchCow |
Right. Although we were quiet about it initially. |
11:51
🔗
|
SketchCow |
We should probably set up the next 10 shards. |
11:51
🔗
|
SketchCow |
or 5 at least. |
11:53
🔗
|
SketchCow |
And is there scripting or contingency yet for the script to go "oh, there's more shards and I have more space" |
11:54
🔗
|
joeyh |
there is, it needs a slight bit of dehardcoding to not just switch to shard3 though |
12:02
🔗
|
SketchCow |
I'd say work on that, next. |
12:02
🔗
|
SketchCow |
Then we can start making sure that people with space who show up aren't waiting for assignment. |
12:03
🔗
|
SketchCow |
Obviously, as time goes on, people with multi-terabyte sets are going to help us hit larger and larger collections. |
12:23
🔗
|
iabak-reg |
registrar master 669c3d1 other SHARD3/pubkeys registration of id on SHARD3 |
13:02
🔗
|
|
sankin has joined #internetarchive.bak |
13:57
🔗
|
|
Start has quit IRC (Disconnected.) |
13:57
🔗
|
joeyh |
SketchCow: any collection recs for new shards? |
14:16
🔗
|
Senji |
I note that Shard 1 still doesn't have 100% >=3 backups (indeed it appears to have one file not backed up at all); is there anything going on to deal with that? |
14:17
🔗
|
Senji |
Shard 2 currently shows 100% >=3 so is comparatively in better shape |
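The ">=3" percentage Senji refers to can be computed from per-file copy counts along these lines. This is a sketch with made-up toy data, not the code behind the real stats pages; the function name and the input shape (a mapping from file name to number of known copies) are assumptions.

```python
from typing import Dict


def percent_with_copies(copy_counts: Dict[str, int], minimum: int = 3) -> float:
    """Percentage of files that have at least `minimum` backup copies."""
    if not copy_counts:
        return 0.0
    enough = sum(1 for count in copy_counts.values() if count >= minimum)
    return 100.0 * enough / len(copy_counts)


# Toy input: one file has no backup at all, like the Shard 1 case above.
example = {"a": 3, "b": 4, "c": 0, "d": 3}
print(percent_with_copies(example))  # 75.0
```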
14:18
🔗
|
SketchCow |
joeyh: I'll ping you with them |
14:50
🔗
|
|
Start has joined #internetarchive.bak |
14:57
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
14:57
🔗
|
|
Start has quit IRC (Disconnected.) |
15:03
🔗
|
|
Start has joined #internetarchive.bak |
15:05
🔗
|
|
phuzion has joined #internetarchive.bak |
15:17
🔗
|
joeyh |
so, I think we'll soon be able to configure git-annex like so: (balanced_amoung(backup) and not (copies=backup:3)) or present |
15:18
🔗
|
joeyh |
and the files will be spread among the repos in a balanced way, w/o resorting to randomness like we do now |
15:18
🔗
|
joeyh |
and without the extra copies some files get now |
15:18
🔗
|
joeyh |
(well, with less of them anyhow) |
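To illustrate what "balanced without randomness" can mean, here is one well-known deterministic technique, rendezvous (highest-random-weight) hashing. It is not git-annex's implementation, which the log does not describe; the function name and the repository names in the demo are hypothetical.

```python
import hashlib
from typing import List


def pick_repos(key: str, repos: List[str], copies: int = 3) -> List[str]:
    """Deterministically choose `copies` repositories for a file key."""

    def score(repo: str) -> str:
        # Every (repo, key) pair gets a stable pseudo-random score.
        return hashlib.sha256((repo + "\0" + key).encode("utf-8")).hexdigest()

    # The highest-scoring repos win; the same inputs always give the same answer.
    return sorted(repos, key=score, reverse=True)[:copies]


repos = ["repo-a", "repo-b", "repo-c", "repo-d", "repo-e"]  # hypothetical names
print(pick_repos("some-file-key", repos))
```

Because every client computes the same answer from the same inputs, no two clients need to coordinate, and no file ends up with more copies than requested simply because several clients picked it at random.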
15:37
🔗
|
swebb |
So I'm still downloading files for SHARD2, but it looks to be 100% backed up now. Will the git-annex stuff be smart enough to start me downloading SHARD3? |
15:37
🔗
|
joeyh |
it should switch you over, yes |
15:37
🔗
|
swebb |
ok |
15:38
🔗
|
joeyh |
there might be a little period where your client hasn't heard from the others that shard2 is done and does a little extra downloading |
15:51
🔗
|
joeyh |
db48x: hey, I see you installed fail2ban on the server. any particular reason? |
15:51
🔗
|
|
Start has quit IRC (Disconnected.) |
15:51
🔗
|
joeyh |
I'm inclined to just disable all password auth, but let clients connect as often as they like |
15:51
🔗
|
joeyh |
heey, this might explain some of the spikes in the graph, if a client got banned for a while |
16:00
🔗
|
|
Start has joined #internetarchive.bak |
16:26
🔗
|
|
real_eyes is now known as realeyes |
16:32
🔗
|
iabak-reg |
registrar master d7b6ef4 other SHARD3/pubkeys registration of bas+at on SHARD3 |
16:35
🔗
|
lhobas |
seems to work fine on OS X :) |
16:45
🔗
|
|
Start has quit IRC (Disconnected.) |
17:15
🔗
|
|
VADemon has joined #internetarchive.bak |
17:20
🔗
|
|
db48x has quit IRC (Ping timeout: 258 seconds) |
17:28
🔗
|
phuzion |
iabak does not work on freenas, is this known? |
17:28
🔗
|
phuzion |
(freenas is BSD based, I know) |
17:31
🔗
|
joeyh |
there's no git-annex build for it, that'd be the first problem |
17:35
🔗
|
phuzion |
ok, just wanted to make sure I wasn't going crazy |
17:41
🔗
|
ersi |
well, you're using BSD so you can't be too sure |
17:41
🔗
|
* |
ersi hides |
17:42
🔗
|
phuzion |
Hah |
17:56
🔗
|
|
kyan has quit IRC (Quit: This computer has gone to sleep) |
18:11
🔗
|
|
garyrh has quit IRC (Remote host closed the connection) |
18:46
🔗
|
iabak-reg |
registrar master 761215d other SHARD3/pubkeys registration of chris on SHARD3 |
19:22
🔗
|
phuzion |
Hey guys, I'm getting the following error, any ideas? http://pastebin.com/Tv3n79Yh |
19:23
🔗
|
phuzion |
Should I just delete the files in question and rerun ./iabak? |
19:32
🔗
|
|
Senji2 has joined #internetarchive.bak |
19:33
🔗
|
Senji2 |
mmm, fscking shard2. Guess it's time to setup cronjob on cleopatra |
19:34
🔗
|
|
garyrh has joined #internetarchive.bak |
19:43
🔗
|
joeyh |
phuzion: hmm, I wonder if your repository is in direct mode? |
19:43
🔗
|
joeyh |
you could run git reset --hard in there |
19:43
🔗
|
phuzion |
I nuked it, I'm gonna try again |
19:43
🔗
|
joeyh |
don't know why a file would be changed though |
19:43
🔗
|
joeyh |
wait one sec |
19:44
🔗
|
joeyh |
you don't want to commit a deletion of that file |
19:44
🔗
|
joeyh |
so git reset --hard |
19:44
🔗
|
phuzion |
The original directory's already gone, sorry. |
19:44
🔗
|
phuzion |
I'm starting from scratch. |
19:44
🔗
|
joeyh |
oh, ok |
19:45
🔗
|
joeyh |
oh, this is on freebsd? |
19:45
🔗
|
phuzion |
nope |
19:45
🔗
|
phuzion |
centos 7 |
19:45
🔗
|
joeyh |
what filesystem? |
19:45
🔗
|
phuzion |
i'm saving to an NFS share if that makes any difference |
19:45
🔗
|
iabak-reg |
registrar master b729821 other SHARD3/pubkeys registration of chris on SHARD3 |
19:45
🔗
|
phuzion |
that's me re-registering |
19:45
🔗
|
joeyh |
nfs is gonna be flakey one way or another |
19:45
🔗
|
phuzion |
Bummer |
19:46
🔗
|
phuzion |
Flaky as in I shouldn't bother with it? |
19:47
🔗
|
joeyh |
depends, I've never seen it flake out this way before |
19:48
🔗
|
phuzion |
out of curiosity, what kind of network performance are you seeing when you're pulling objects? |
19:48
🔗
|
phuzion |
I'm getting about 200KiB/s |
19:48
🔗
|
Senji2 |
sounds about right |
19:49
🔗
|
joeyh |
that's on the low end. I'd say run concurrent iabak, but that is known to not be wise on nfs |
19:49
🔗
|
Senji2 |
I can max out my adsl with 10 copies |
19:50
🔗
|
phuzion |
Would iSCSI perform better? |
19:57
🔗
|
lhobas |
gut feeling: concurrent ./iabak on OS X performs poorly due to lack of shuf? (lots of "transfer already in progress") |
19:59
🔗
|
joeyh |
lhobas: it'll tend to contend with itself like that yes. |
19:59
🔗
|
Senji2 |
on linux with NOSHUF that doesn't slow things down much |
20:00
🔗
|
joeyh |
makes it do a bit more work to find each file |
20:02
🔗
|
Senji2 |
yeah, but most of the time you're downloading 300+MB files at 200kB/s rather than finding a new file |
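Senji2's argument is easier to see with numbers. A rough sketch, assuming decimal units (1 MB = 10^6 bytes, 1 kB = 10^3 bytes) and a steady transfer rate; the function name is illustrative:

```python
def transfer_seconds(size_bytes: int, rate_bytes_per_s: int) -> float:
    """Time to download `size_bytes` at a constant rate."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes / rate_bytes_per_s


seconds = transfer_seconds(300 * 10**6, 200 * 10**3)
print(seconds / 60)  # 25.0
```

At about 25 minutes per file, the extra work of locating the next file without shuffling is small by comparison.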
20:03
🔗
|
lhobas |
seeing lots of small files and contending atm with (beginning of) shard3 |
20:04
🔗
|
sep332 |
wow <10% of the files are IA-only now |
20:04
🔗
|
Senji2 |
if the files are primarily small then you might be better off running one copy until you get to some bigger ones |
20:05
🔗
|
Senji2 |
depending how big 'small' is |
20:06
🔗
|
lhobas |
Senji2: doing that for now |
20:06
🔗
|
lhobas |
any chance of changing the hostname that is synced through git-annex? (and showing up in http://iabak.archiveteam.org/stats/SHARD3.leaderboard etc) Did not consider how it might leak personal info |
20:09
🔗
|
Senji2 |
for shard2 I got a bit over a TB with concurrent NOSHUF, but my machine running shard3 has a working readlink so I don't know what 3 is like in that regard. |
20:12
🔗
|
joeyh |
lhobas: just cd SHARDn; git annex describe . whatever |
20:17
🔗
|
|
atomotic has joined #internetarchive.bak |
20:37
🔗
|
joeyh |
moar red :) |
20:39
🔗
|
phuzion |
joeyh: I might be able to throw like 2-3TB at this if I can figure out iSCSI on this NAS |
20:41
🔗
|
Senji2 |
job for next week is to see how much of the spare disk pile still works :) |
20:51
🔗
|
|
atomotic has quit IRC (Ping timeout: 260 seconds) |
20:58
🔗
|
|
sankin has quit IRC (Leaving.) |
21:39
🔗
|
iabak-reg |
registrar master 948a90f other SHARD4/pubkeys registration of sean.palmer on SHARD4 |
21:53
🔗
|
Atluxity |
I am having a hard time doing the math behind this project |
21:53
🔗
|
Atluxity |
do we intend this to be cold storage, or online? |
22:13
🔗
|
|
kyan has joined #internetarchive.bak |
22:21
🔗
|
Senji2 |
nearline/online |
22:39
🔗
|
iabak-reg |
registrar master d682f2b other SHARD3/pubkeys registration of primus1024 on SHARD3 |
22:42
🔗
|
Kazzy |
lhobas: for changing your hostname, cd into the shard directory, then run 'git annex describe . <infohere>' |
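The same step can be scripted. The sketch below just wraps the `git annex describe . <description>` command quoted above; the helper names are invented, and `set_description` assumes git-annex is installed and that `shard_dir` is a shard checkout such as SHARD3.

```python
import subprocess
from typing import List


def describe_command(description: str) -> List[str]:
    """Command that replaces this repository's public description."""
    if not description.strip():
        raise ValueError("description must not be empty")
    return ["git", "annex", "describe", ".", description]


def set_description(shard_dir: str, description: str) -> None:
    """Run the describe command inside one shard checkout."""
    subprocess.run(describe_command(description), cwd=shard_dir, check=True)


print(" ".join(describe_command("anonymous-backup-1")))
# git annex describe . anonymous-backup-1
```

Each shard is its own repository, so the description has to be changed once per shard directory.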
22:43
🔗
|
lhobas |
fixed it, thanks joeyh & Kazzy |
22:44
🔗
|
Kazzy |
oh right, didn't see message from joeyh between commit messages, heh |
22:44
🔗
|
|
primus102 has joined #internetarchive.bak |
22:47
🔗
|
primus102 |
Hi, can someone help me with a problem running ./iabak? When it starts it keeps showing msg: dirname: invalid option -- 'z' |
22:47
🔗
|
primus102 |
Try `dirname --help' for more information. |
22:55
🔗
|
|
primus104 has joined #internetarchive.bak |
22:55
🔗
|
|
primus102 has quit IRC (Remote host closed the connection) |
22:56
🔗
|
Senji |
touching the NOSHUF file in your IA.BAK folder will stop that (but also stop it shuffling the order of downloads) |
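The effect Senji describes can be pictured with the sketch below. It is a hypothetical model of the behaviour, not the actual iabak script: the function name is invented, and the only detail taken from the log is that a file named NOSHUF in the IA.BAK folder turns shuffling off.

```python
import random
import tempfile
from pathlib import Path
from typing import List


def download_order(files: List[str], iabak_dir: Path) -> List[str]:
    """Return files in download order, shuffled unless NOSHUF exists."""
    ordered = list(files)
    if not (iabak_dir / "NOSHUF").exists():
        # Shuffling keeps concurrent clients from all fetching the same file.
        random.shuffle(ordered)
    return ordered


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "NOSHUF").touch()  # same effect as `touch IA.BAK/NOSHUF`
    print(download_order(["a", "b", "c"], root))  # ['a', 'b', 'c']
```

This also matches the later observation that concurrent runs with NOSHUF tend to contend for the same files, since every run walks the list in the same order.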
22:57
🔗
|
|
primus has joined #internetarchive.bak |
22:58
🔗
|
primus |
thanks, so if i understand correctly it's not a serious error msg and it's ok to leave it working like that? |
22:58
🔗
|
trs80 |
correct |
22:59
🔗
|
primus |
thank you |
23:11
🔗
|
iabak-reg |
registrar master 27f9cd3 other SHARD4/pubkeys registration of primus1024 on SHARD4 |
23:26
🔗
|
|
zottelbey has quit IRC (Remote host closed the connection) |