Time |
Nickname |
Message |
00:13
🔗
|
|
protodev has quit IRC (Ping timeout: 606 seconds) |
00:14
🔗
|
|
protodev has joined #internetarchive.bak |
00:32
🔗
|
|
primus104 has quit IRC (Leaving.) |
00:40
🔗
|
|
protodev has quit IRC (Ping timeout: 606 seconds) |
00:41
🔗
|
|
protodev has joined #internetarchive.bak |
00:56
🔗
|
|
niyaje4 has joined #internetarchive.bak |
01:10
🔗
|
|
protodev has quit IRC (Ping timeout: 606 seconds) |
01:10
🔗
|
|
protodev has joined #internetarchive.bak |
01:21
🔗
|
|
protodev has quit IRC (Ping timeout: 606 seconds) |
01:22
🔗
|
|
protodev has joined #internetarchive.bak |
04:21
🔗
|
|
niyaje4 has quit IRC (Ping timeout: 600 seconds) |
05:03
🔗
|
|
Ctrl-S has quit IRC ( HydraIRC -> http://www.hydrairc.com <- In tests, 0x09 out of 0x0A l33t h4x0rz prefer it :)) |
05:10
🔗
|
|
Ctrl-S has joined #internetarchive.bak |
05:48
🔗
|
|
marvinw has quit IRC (Read error: Operation timed out) |
06:08
🔗
|
|
zottelbey has joined #internetarchive.bak |
06:13
🔗
|
|
marvinw has joined #internetarchive.bak |
06:15
🔗
|
|
Control-S has joined #internetarchive.bak |
06:19
🔗
|
|
Ctrl-S has quit IRC (Read error: Operation timed out) |
06:19
🔗
|
|
Control-S is now known as Ctrl-S |
06:20
🔗
|
|
primus104 has joined #internetarchive.bak |
06:22
🔗
|
|
niyaje4 has joined #internetarchive.bak |
07:19
🔗
|
|
atomotic has joined #internetarchive.bak |
07:36
🔗
|
lhobas |
my SHARD3 download finished, with 108 files that could not be downloaded - any similar experiences here? |
08:15
🔗
|
Senji |
There's a bug with URL parsing of files with # in the URLs |
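Note on the `#` report above: generic URL parsers treat `#` as the start of a fragment, so a file name containing it is cut short unless it is percent-encoded first. The sketch below only illustrates that general behaviour with Python's standard library. The item and file names are hypothetical, and this is not the code that iabak or git-annex actually runs.

```python
from urllib.parse import quote, urlsplit

# Hypothetical item and file name, for illustration only.
BASE = "https://archive.org/download/example-item/"
FILENAME = "track#1.mp3"


def naive_url(base: str, filename: str) -> str:
    """Join without escaping; the '#' will be read as a fragment marker."""
    return base + filename


def escaped_url(base: str, filename: str) -> str:
    """Percent-encode the file name so '#' is sent as %23."""
    return base + quote(filename)


if __name__ == "__main__":
    # The naive form loses everything after '#' from the request path.
    print(urlsplit(naive_url(BASE, FILENAME)).path)
    # The escaped form keeps the whole file name in the path.
    print(urlsplit(escaped_url(BASE, FILENAME)).path)
```

If the 108 failed files in SHARD3 do share this pattern, checking their names for `#` would be a quick way to confirm it.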
08:23
🔗
|
|
niyaje4 has quit IRC (Ping timeout: 600 seconds) |
09:07
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
09:28
🔗
|
Senji |
So, when's shard5 going to appear? :-) |
10:50
🔗
|
|
primus104 has quit IRC (Leaving.) |
11:00
🔗
|
db48x |
neat, one of my shards became corrupted |
11:00
🔗
|
db48x |
https://pastebin.mozilla.org/8832598 |
11:00
🔗
|
Senji |
neat? |
11:00
🔗
|
db48x |
git annex fsck fixed it though |
11:01
🔗
|
db48x |
Senji: yes |
11:01
🔗
|
db48x |
this is a test, we have to have bad things happen to it |
11:03
🔗
|
Senji |
Good point :) |
11:03
🔗
|
db48x |
I deliberately put it on a disk that I suspected of having trouble |
11:04
🔗
|
Senji |
Testing the disk too then? :) |
11:04
🔗
|
db48x |
yes |
11:05
🔗
|
Senji |
Any idea when shard5 comes out? :-) |
11:05
🔗
|
db48x |
gah, shard2 is broken the same way |
11:06
🔗
|
db48x |
weirdly it's exactly the same hash in the error messages... |
11:07
🔗
|
db48x |
oh man, git annex fsck is broken here too |
11:08
🔗
|
db48x |
it's just repeating this over and over: |
11:08
🔗
|
db48x |
git-annex: fd:12: commitBuffer: resource vanished (Broken pipe) |
11:08
🔗
|
db48x |
failed |
11:09
🔗
|
Senji |
Odd |
11:10
🔗
|
db48x |
indeed |
11:51
🔗
|
iabak-reg |
registrar master fc499f1 other SHARD4/pubkeys registration of db48x iabak on SHARD4 |
12:32
🔗
|
|
atomotic has joined #internetarchive.bak |
12:47
🔗
|
|
sankin has joined #internetarchive.bak |
13:16
🔗
|
|
primus104 has joined #internetarchive.bak |
13:40
🔗
|
protodev |
btw, i uploaded my current perl-script: https://github.com/cancerAlot/IA.BAK/tree/perl-master |
13:41
🔗
|
protodev |
it's still broken and not ready yet; if someone spots the bug in checkssh() which causes it to register with a wrong ssh-pub-key, tell me ;) |
13:51
🔗
|
|
garyrh has quit IRC (http://bnc4free.com/) |
13:58
🔗
|
|
Start has quit IRC (Disconnected.) |
14:35
🔗
|
|
Start has joined #internetarchive.bak |
14:39
🔗
|
db48x |
protodev: excellent, I'll take a look |
14:59
🔗
|
|
Start has quit IRC (Disconnected.) |
15:00
🔗
|
protodev |
db48x: there is no maintain-function, no cronjob, no proper osx-support (in the install-sub) and loads of other missing features |
15:01
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
15:03
🔗
|
protodev |
and the whole @cmd-thing will be replaced - ugly hack which is not needed |
15:03
🔗
|
|
Start has joined #internetarchive.bak |
15:16
🔗
|
|
Start has quit IRC (Disconnected.) |
15:31
🔗
|
iabak-reg |
registrar master b2119e2 other SHARD3/pubkeys registration of protodev on SHARD3 |
15:31
🔗
|
protodev |
YES! :D |
15:32
🔗
|
protodev |
it works !! |
15:32
🔗
|
sep332 |
woot |
15:33
🔗
|
protodev |
ok, not yet... |
15:40
🔗
|
iabak-reg |
registrar master 106f19e other SHARD3/pubkeys registration of db48x+iabak on SHARD3 |
15:41
🔗
|
Senji |
Hmm, the disk I have my shard4 repo in is nearly full; should I create a new shard4 repo on the next disk and worry about not having duplication; or just wait until shard5? |
15:42
🔗
|
db48x |
if you create another shard4, then you'll want to manually set them both up so that they don't want content that the other has |
15:43
🔗
|
Senji |
Yeah, exactly |
15:44
🔗
|
sep332 |
can I do that with iabak or do i have to run "git-annex get" directly with appropriate flags? |
15:44
🔗
|
db48x |
use git annex wanted |
15:44
🔗
|
iabak-reg |
registrar master ed8d47f other SHARD3/pubkeys registration of protodev on SHARD3 |
15:44
🔗
|
|
Start has joined #internetarchive.bak |
15:47
🔗
|
Senji |
The git annex wanted documentation says it doesn't have --in; which seems to rule out the obvious way of doing that (git annex wanted --not --in <uuid>) |
15:49
🔗
|
|
Start has quit IRC (Read error: Connection reset by peer) |
15:57
🔗
|
db48x |
yep, successfully registered |
15:57
🔗
|
db48x |
plenty of other problems after that though :) |
16:00
🔗
|
db48x |
protodev: I'll send you a pull request |
16:02
🔗
|
db48x |
it's probably similar to what you've already got, but since you said that yours didn't work... |
16:08
🔗
|
|
primus104 has quit IRC (Leaving.) |
16:16
🔗
|
db48x |
protodev: removed some duplicated bits |
16:18
🔗
|
sep332 |
are more shards planned? just curious |
16:21
🔗
|
|
Start has joined #internetarchive.bak |
16:24
🔗
|
|
garyrh has joined #internetarchive.bak |
16:43
🔗
|
|
Start has quit IRC (Disconnected.) |
16:56
🔗
|
protodev |
db48x: we wrote exactly the same piece of code ^ |
16:56
🔗
|
protodev |
i just named my variables a bit different |
17:23
🔗
|
|
primus104 has joined #internetarchive.bak |
17:26
🔗
|
yipdw |
the reincarnation of Thompson and Ritchie |
17:31
🔗
|
db48x |
:) |
17:31
🔗
|
db48x |
sep332: yes |
17:31
🔗
|
db48x |
sep332: the plan is to have enough shards to back up all of archive.org, of course |
17:32
🔗
|
sep332 |
are you going to add each as the previous one gets close to done? or all done? |
17:32
🔗
|
yipdw |
we should call them crystals in preparation for Final Fantasy XVI or whatever |
17:33
🔗
|
db48x |
no, I believe the plan is to add all of the rest of them all at once, and spread the word wide and far |
17:34
🔗
|
sep332 |
ok cool |
17:38
🔗
|
db48x |
I think there were a few more things we wanted to do before then, though |
17:44
🔗
|
|
primus104 has quit IRC (Leaving.) |
18:32
🔗
|
|
primus104 has joined #internetarchive.bak |
19:33
🔗
|
tpw_rules |
yipdw: you're only allowed to have 7 crystals |
19:33
🔗
|
tpw_rules |
perhaps make the number of total shards evenly divisible by 7? |
19:50
🔗
|
closure |
can make a couple more shards, just need to pick collections for them to get 100k-200k files |
19:51
🔗
|
sep332 |
if the same file (by MD5) gets uploaded multiple times, will it get backed up more? or only 4 copies per MD5? |
19:52
🔗
|
closure |
if it's in the same shard, only once |
19:52
🔗
|
closure |
well, 4 times |
19:52
🔗
|
Senji |
closure: are there likely to be more shards soon then? |
19:53
🔗
|
closure |
once someone picks some collections |
19:53
🔗
|
Senji |
my shard4 disk is full, so if I want to start another one off it'd currently have to be another shard4 with some faff |
19:53
🔗
|
closure |
or I can just randomly pick some, shrug |
19:53
🔗
|
Senji |
Well, nearly full :) |
19:54
🔗
|
sep332 |
hm. a file can be in multiple collections. If those collections are in different shards, the same file will count for both shards |
19:54
🔗
|
closure |
when a given filename is in multiple collections, it only gets into one shard |
19:54
🔗
|
closure |
but if the same file is in the IA under different names, it might end up redundantly in multiple shards |
19:55
🔗
|
sep332 |
ok thanks |
19:55
🔗
|
closure |
(could sort and uniq this away, but it's a 32gb file listing them all..) |
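Note on the "sort and uniq" idea above: the same de-duplication can be sketched as a single streaming pass that remembers which MD5 digests it has already seen. This is only an illustration under assumptions. The listing format (`md5<TAB>path`, one file per line) is hypothetical, and the set of digests must fit in memory, which may not hold for a 32 GB listing; an on-disk sort avoids that limit.

```python
from typing import Iterable, Iterator


def unique_by_md5(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the first line seen for each MD5 digest.

    Assumes each line looks like '<32 hex chars>\t<path>' (hypothetical format).
    """
    seen = set()
    for line in lines:
        digest_hex, _, _path = line.rstrip("\n").partition("\t")
        # Store the raw 16 bytes rather than the hex text to halve memory use.
        digest = bytes.fromhex(digest_hex)
        if digest in seen:
            continue
        seen.add(digest)
        yield line


if __name__ == "__main__":
    sample = [
        "d41d8cd98f00b204e9800998ecf8427e\ta/empty.txt\n",
        "d41d8cd98f00b204e9800998ecf8427e\tb/copy-of-empty.txt\n",
        "900150983cd24fb0d6963f7d28e17f72\tc/abc.txt\n",
    ]
    for kept in unique_by_md5(sample):
        print(kept, end="")
```

The generator form means the input can be an open file object, so the listing itself never has to be loaded whole.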
19:55
🔗
|
Senji |
There are enough files in the IA that I'm sure there are MD5 collisions |
19:55
🔗
|
closure |
that too |
19:56
🔗
|
closure |
although iirc we did the math and collisions seemed unlikely |
19:56
🔗
|
closure |
at least collisions within a shard seem very unlikely |
19:56
🔗
|
closure |
(non-intentional collisions) |
19:56
🔗
|
pikhq |
As I understand it the risk of *accidental* MD5 collisions is still astronomically low. |
19:56
🔗
|
sep332 |
MD5 collisions are extremely unlikely unless you control both files |
19:57
🔗
|
pikhq |
Or, "unless you are actually trying for a collision". |
19:57
🔗
|
sep332 |
if i give you a static MD5, like one from the census, you couldn't make a file that collides. |
19:58
🔗
|
sep332 |
there is a cryptanalytic weakness but it's still not good enough to be feasible |
20:02
🔗
|
Senji |
sep332: actually, people have created md5 collisions for existing other documents |
20:03
🔗
|
Senji |
sep332: but the thing here isn't malicious intent, it's the birthday 'paradox' |
20:07
🔗
|
sep332 |
hadn't thought of birthday paradox. But I can't find any instance of preimage attack on md5? |
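Note on the birthday 'paradox' point above: the chance of at least one accidental collision among n random values drawn from a space of size N is approximately 1 - exp(-n(n-1)/(2N)). The sketch below evaluates that standard approximation for a 128-bit hash. The 200,000 figure is the upper end of the per-shard file count mentioned earlier in the log; it assumes MD5 outputs behave like uniform random values, which covers accidental collisions only and not deliberate ones.

```python
import math


def collision_probability(n_items: int, space_size: int) -> float:
    """Birthday-bound approximation: 1 - exp(-n(n-1) / (2N)).

    expm1 keeps precision when the probability is extremely small.
    """
    exponent = n_items * (n_items - 1) / (2 * space_size)
    return -math.expm1(-exponent)


if __name__ == "__main__":
    md5_space = 2 ** 128
    # One shard at the upper end of the 100k-200k files mentioned in the log.
    print(collision_probability(200_000, md5_space))
    # Sanity check against the classic case: 23 people, 365 birthdays.
    print(collision_probability(23, 365))
```

Because the exponent grows with the square of n, the estimate can be rerun with any larger file count to see how quickly the margin shrinks.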
20:08
🔗
|
closure |
./mkSHARD 'wwIIarchive dnalounge jstor_polisciequar usda-commoditysituationreports jstor_jpoliecon' 5 |
20:08
🔗
|
closure |
fairly random, just a list an early shard packer found |
20:09
🔗
|
Senji |
sep332: the Flame malware forged a Windows code-signing cert |
20:18
🔗
|
sep332 |
Senji2: that was a chosen-prefix collision, not a preimage. the attackers controlled part of the "original" certificate |
20:19
🔗
|
Senji |
sep332: oh, OK, my apologies; I must have misunderstood |
20:19
🔗
|
Senji |
I've seen a random collision (in work's backups) before :) |
20:19
🔗
|
sep332 |
some questionable conclusions but a good summary https://randomoracle.wordpress.com/2012/06/13/unanswered-questions-about-the-flame-certificate-forgery-22/ |
20:19
🔗
|
sep332 |
oh, crazy |
20:29
🔗
|
closure |
./mkSHARD 'sports wikipediadumps Strangefolk canadianpamphlets statedocs_maine 1920_census' 6 |
20:39
🔗
|
|
zottelbey has quit IRC (Remote host closed the connection) |
20:47
🔗
|
closure |
wow, this would be a BIG shard |
20:47
🔗
|
closure |
6.19 terabytes |
20:47
🔗
|
closure |
I'll bet it's all the sports vids |
20:47
🔗
|
Senji |
Something to get teeth into :) |
20:47
🔗
|
closure |
maybe save that one for later |
20:48
🔗
|
sep332 |
similar number of files though? ~100,000? |
20:48
🔗
|
closure |
yes |
20:48
🔗
|
closure |
hmm, sports is only 1.25 tb |
20:49
🔗
|
closure |
must be the wikipedia |
20:49
🔗
|
closure |
1.44 tb |
20:50
🔗
|
closure |
everything in this shard is kinda large |
20:50
🔗
|
closure |
1920 census is 2.6 tb |
20:51
🔗
|
|
sankin has quit IRC (Leaving.) |
20:51
🔗
|
sep332 |
as in the year 1920? pretty sure they didn't have that many hard drives lo |
20:51
🔗
|
sep332 |
lol |
20:52
🔗
|
closure |
yeah, damnifiknow what's in tha |
20:52
🔗
|
sep332 |
high-res scans of every piece of paper? |
20:53
🔗
|
closure |
only 61 gb of canadian pamphlets |
20:54
🔗
|
db48x |
yes |
20:55
🔗
|
db48x |
scans of microfilm of photographs of the pieces of paper |
20:55
🔗
|
closure |
all the scratches lovingly preserved at 600dpi |
20:56
🔗
|
sep332 |
a video of someone reading all the papers out loud |
20:57
🔗
|
closure |
sep332: a scanning tunneling microscopy of your hard drive, 300 yrs after you back this up |
20:57
🔗
|
sep332 |
oh wait db48x are you serious? |
20:58
🔗
|
db48x |
yes |
20:59
🔗
|
sep332 |
then "called it" ;) |
21:00
🔗
|
db48x |
closure: did you see the stuff about my broken shards from this morning? |
21:01
🔗
|
closure |
yeah, when a git repo is that hosed, you can use git-annex repair, but it will be slow on the shards. Probably better to re-clone and move in .git/config and .git/annex/objects |
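Note on the re-clone suggestion above: the file-moving half of that advice can be sketched as below. This is a rough illustration under assumptions, not a tested recovery procedure. It assumes a fresh clone already exists, that both repositories sit on the same filesystem (so the move is a rename rather than a copy of read-only annex files), and the function name is made up for this example.

```python
import shutil
from pathlib import Path


def move_annex_state(old_repo: Path, new_clone: Path) -> None:
    """Carry .git/config and .git/annex/objects from a damaged repo to a fresh clone."""
    old_git = old_repo / ".git"
    new_git = new_clone / ".git"
    if not new_git.is_dir():
        raise FileNotFoundError(f"{new_clone} does not look like a git clone")

    target = new_git / "annex" / "objects"
    if target.exists():
        # Refuse to merge into an existing object store; that needs manual care.
        raise FileExistsError(f"{target} already exists")

    # The config carries the repository's settings, including its annex UUID.
    shutil.copy2(old_git / "config", new_git / "config")

    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(old_git / "annex" / "objects"), str(target))


if __name__ == "__main__":
    # Hypothetical paths; adjust to the real shard directories.
    move_annex_state(Path("shard2"), Path("shard2.new"))
```

Running `git annex fsck` in the new clone afterwards would let git-annex recheck the moved content.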
21:02
🔗
|
db48x |
how is that different from fsck? |
21:02
🔗
|
closure |
fsck just finds problems |
21:02
🔗
|
db48x |
it fixed the first shard |
21:03
🔗
|
closure |
it doesn't fix broken git repos, it only fixes up git-annex specific issues |
21:05
🔗
|
db48x |
what do you think is broken? |
21:05
🔗
|
closure |
you have a bunch of broken .git/objects/ files |
21:05
🔗
|
db48x |
well, one |
21:06
🔗
|
db48x |
same hash repeated a number of times |
21:06
🔗
|
db48x |
what about the other message? |
21:06
🔗
|
db48x |
git-annex: fd:12: commitBuffer: resource vanished (Broken pipe) |
21:06
🔗
|
db48x |
failed |
21:07
🔗
|
closure |
just a consequence of the actual broken git repo |
21:37
🔗
|
db48x |
it'd be nice if it were more informative, and less spammy |
22:04
🔗
|
iabak-reg |
registrar master ad0b477 other SHARD5/pubkeys registration of jdamery+iabak on SHARD5 |
22:08
🔗
|
iabak-reg |
registrar master 1308265 other SHARD3/pubkeys registration of protodev on SHARD3 |
22:16
🔗
|
|
DFJustin has quit IRC (IMHOSTFU) |
22:17
🔗
|
|
DFJustin has joined #internetarchive.bak |
22:17
🔗
|
|
svchfoo2 sets mode: +o DFJustin |
22:17
🔗
|
|
svchfoo1 sets mode: +o DFJustin |
22:17
🔗
|
|
Start has joined #internetarchive.bak |
22:21
🔗
|
|
toad2 has joined #internetarchive.bak |
22:22
🔗
|
iabak-reg |
registrar master 67e3630 other SHARD3/pubkeys registration of protodev on SHARD3 |
22:27
🔗
|
|
toad1 has quit IRC (Read error: Operation timed out) |
23:30
🔗
|
db48x |
oh yea, this drive is toast: |
23:30
🔗
|
db48x |
[db48x@celebdil shard2]$ ../git-annex.linux/git-annex fsck --quiet |
23:30
🔗
|
db48x |
git-annex: .git/annex/objects/Gk/8X/MD5-s1843643129--f8d132c8f834ebf8597856bcfc069f8f/MD5-s1843643129--f8d132c8f834ebf8597856bcfc069f8f: hGetBufSome: hardware fault (Input/output error) |
23:31
🔗
|
db48x |
and also: |
23:31
🔗
|
db48x |
May 06 16:21:30 celebdil smartd[1196]: Device: /dev/sdc [SAT], 1062 Currently unreadable (pending) sectors |
23:31
🔗
|
db48x |
May 06 16:21:30 celebdil smartd[1196]: Device: /dev/sdc [SAT], 51 Offline uncorrectable sectors |
23:38
🔗
|
Senji |
loverly |
23:40
🔗
|
db48x |
actually, I suppose the word "toast" applies better to a different drive from a different computer which got burnt a couple of weeks ago |
23:40
🔗
|
db48x |
power supply shorted and caused a small fire |
23:44
🔗
|
db48x |
oh, even better: |
23:44
🔗
|
db48x |
May 06 15:16:16 celebdil kernel: EXT4-fs (sdc1): Delayed block allocation failed for inode 28312757 at logical offset 1162316 with max |
23:44
🔗
|
db48x |
May 06 15:16:16 celebdil kernel: EXT4-fs (sdc1): This should not happen!! Data will be lost |
23:44
🔗
|
db48x |
May 06 15:16:16 celebdil kernel: EXT4-fs error (device sdc1) in ext4_writepages:2395: Journal has aborted |
23:44
🔗
|
db48x |
I'm so glad I use ZFS for most things |