Time |
Nickname |
Message |
00:09
🔗
|
|
Owen-x has quit (Owen-x) |
00:26
🔗
|
|
aschmitz (~aschmitz@[redacted]) has joined #internetarchive.bak |
00:44
🔗
|
|
tpw_rules (~tpw_rules@[redacted]) has joined #internetarchive.bak |
00:44
🔗
|
tpw_rules |
hey. i heard about you all from twitter. i've got lots of free space i can give |
00:47
🔗
|
tpw_rules |
how can i get my ssh public key added? closure, the wiki says to talk to you |
00:54
🔗
|
tpw_rules |
closure: i've got a key to give you |
01:00
🔗
|
SketchCow |
Closure is popping in and out. You might want to just message him and he'll see it when he unidles. |
01:00
🔗
|
SketchCow |
A volunteer stepped forward with 19tb, very kind. |
01:00
🔗
|
tpw_rules |
just put it on a locked pastie or something? i have about 3TB to spare |
01:00
🔗
|
SketchCow |
Of course, bigger numbers are more meaningful as we start considering how to do backups. |
01:00
🔗
|
SketchCow |
tpw_rules: I don't understand locked pastie in this |
01:01
🔗
|
SketchCow |
tpw_rules: Oh, I see. Well, however you'd like. |
01:01
🔗
|
tpw_rules |
it's a really ambitious project. have you talked to anybody in big business? google et al do this kind of thing |
01:08
🔗
|
closure |
tpw_rules: added your key |
01:08
🔗
|
tpw_rules |
cool. then just follow the wiki? |
01:08
🔗
|
tpw_rules |
it doesn't require any incoming ports open does it |
01:09
🔗
|
closure |
tpw_rules: yes, follow the wiki .. and keep in touch since this is just a test |
01:09
🔗
|
closure |
no incoming ports needed, no |
01:09
🔗
|
tpw_rules |
RSA key fingerprint is 79:ea:f9:7f:89:7e:29:27:4c:63:74:53:f9:1c:f3:d4. that you? |
01:10
🔗
|
tpw_rules |
i can do that |
01:10
🔗
|
tpw_rules |
i'll just idle here |
01:11
🔗
|
closure |
that's the right ssh host key, yes |
01:19
🔗
|
ersi |
tpw_rules: Hehe, welcome around. :) |
01:25
🔗
|
tpw_rules |
i'm getting a 403 forbidden |
01:25
🔗
|
tpw_rules |
Try making some of these repositories available: |
01:25
🔗
|
tpw_rules |
00000000-0000-0000-0000-000000000001 -- web |
01:25
🔗
|
tpw_rules |
is that me |
01:27
🔗
|
tpw_rules |
yeah i'm not able to download anything |
01:33
🔗
|
tpw_rules |
reset everything and still 403 |
01:35
🔗
|
tpw_rules |
did i break something or is it an issue with archive.org? |
01:38
🔗
|
closure |
must be archive.org (works for me tho) |
01:38
🔗
|
tpw_rules |
http://pastie.org/private/5jtfzil4whdcji0lnecag |
01:39
🔗
|
tpw_rules |
"The item is not available due to issues with the item's content. " so i guess i'm just downloading bad files |
01:39
🔗
|
tpw_rules |
i'll just let it go and get to some good files |
01:40
🔗
|
closure |
always possible they darked a few of the files but if it keeps failing, might be something else on your end |
01:40
🔗
|
tpw_rules |
oh, i changed the command a bit and it's downloading different files okay |
01:41
🔗
|
tpw_rules |
well i'll let that crunch overnight. full repo is ~3TB? i have that space |
01:42
🔗
|
closure |
https://archive.org/download/Ttscribe/Ttscribe_meta.xml is indeed darked |
01:42
🔗
|
closure |
a little less than 3tb I think |
01:43
🔗
|
sep332 |
there is a load-balancing issue which might slow your downloads, temporarily anyway |
01:45
🔗
|
tpw_rules |
why is a lot of this stuff tarred instead of compressed too? |
01:49
🔗
|
tpw_rules |
instead of tar.xz or something. ease of access? |
02:05
🔗
|
sep332 |
In general, the vast majority of archive items are compressed. not sure about these collections in particular though |
02:10
🔗
|
sep332 |
a quick glance shows these tar files are full of .jp2 (JPEG2000) files. |
02:11
🔗
|
tpw_rules |
ahhh. so no sense recompressing them |
02:25
🔗
|
balrog |
ohai tpw_rules |
02:25
🔗
|
tpw_rules |
it might be worth it to create a guide showing how to attach a bunch of spare disks to a raspberry pi or something and set it up to archive |
02:25
🔗
|
tpw_rules |
i'm doing research on using unionfs to tolerate disk failures and i'll see what i can come up with |
02:27
🔗
|
tpw_rules |
we all know everybody loves relatively meaningless raspberry pi projects :D |
02:35
🔗
|
SketchCow |
So, two things that are becoming obvious |
02:35
🔗
|
SketchCow |
One, the "backup drive" will have notable curation by a team of us, where we slowly add new items to the "drive", based on historical value and need |
02:35
🔗
|
SketchCow |
Because multiple petabytes are unlikely to fall out of the sky |
02:37
🔗
|
tpw_rules |
are there any sort of "responsible bandwidth limits" for doing something like this by archive.org itself? i can suck 12MB/s down and i don't want to break anything |
02:38
🔗
|
SketchCow |
No, absolutely not |
02:38
🔗
|
SketchCow |
Brewster wants the lines absolutely packed to insane levels all the time. |
02:38
🔗
|
SketchCow |
And then he'll buy more. |
02:38
🔗
|
SketchCow |
We went from 40GB/s to 80GB/s relatively recently |
02:38
🔗
|
tpw_rules |
if you say so :) |
02:39
🔗
|
SketchCow |
I'm trying to see if I can find a metric. |
02:39
🔗
|
SketchCow |
If https://monitor.archive.org/weathermap/weathermap.html is still public |
02:39
🔗
|
tpw_rules |
i have no idea what that means but it looks cool |
02:40
🔗
|
|
balrog wishes btrfs had per-subvolume RAID already |
02:40
🔗
|
SketchCow |
Well, we want yellow. Lots of yellow |
02:41
🔗
|
SketchCow |
And then turning it back to blue |
02:46
🔗
|
|
Owen-x (~owen@[redacted]) has joined #internetarchive.bak |
03:22
🔗
|
|
Owen-x has quit (Owen-x) |
03:25
🔗
|
|
Owen-x (~owen@[redacted]) has joined #internetarchive.bak |
03:40
🔗
|
|
svchfoo1 has quit (Read error: Operation timed out) |
03:42
🔗
|
|
svchfoo1 (~chfoo1@[redacted]) has joined #internetarchive.bak |
03:43
🔗
|
|
svchfoo2 gives channel operator status to svchfoo1 |
03:50
🔗
|
|
Owen-x has quit (Owen-x) |
03:55
🔗
|
DFJustin |
how reliable is an rpi in terms of ram corruption etc |
04:00
🔗
|
SketchCow |
I heard that |
04:29
🔗
|
|
bzc6p_ (~bzc6p@[redacted]) has joined #internetarchive.bak |
04:34
🔗
|
|
bzc6p has quit (Ping timeout: 600 seconds) |
04:34
🔗
|
|
zottelbey (~zottelbey@[redacted]) has joined #internetarchive.bak |
05:06
🔗
|
|
zottelbey has quit (Remote host closed the connection) |
08:09
🔗
|
midas |
SketchCow: speed is getting better, peaks at 1.5MB/s now |
09:03
🔗
|
|
Muad-Dib (~paul@[redacted]) has joined #internetarchive.bak |
10:39
🔗
|
|
svchfoo1 has quit (Remote host closed the connection) |
10:40
🔗
|
|
svchfoo1 (~chfoo1@[redacted]) has joined #internetarchive.bak |
10:43
🔗
|
|
svchfoo2 gives channel operator status to svchfoo1 |
10:57
🔗
|
|
csssuf has quit (Ping timeout: 370 seconds) |
10:58
🔗
|
|
csssuf (~csssuf@[redacted]) has joined #internetarchive.bak |
12:21
🔗
|
SketchCow |
Great |
13:03
🔗
|
|
csssuf has quit (Ping timeout: 370 seconds) |
13:04
🔗
|
|
csssuf (~csssuf@[redacted]) has joined #internetarchive.bak |
13:44
🔗
|
|
zottelbey (~zottelbey@[redacted]) has joined #internetarchive.bak |
13:59
🔗
|
|
bpye (~quassel@[redacted]) has joined #internetarchive.bak |
13:59
🔗
|
|
bpye has quit (Remote host closed the connection) |
14:00
🔗
|
|
bpye has quit (Remote host closed the connection) |
14:00
🔗
|
|
bpye (~quassel@[redacted]) has joined #internetarchive.bak |
14:32
🔗
|
midas |
and it slowed down again :p 92.0KB/s |
16:30
🔗
|
|
bzc6p__ (~bzc6p@[redacted]) has joined #internetarchive.bak |
16:35
🔗
|
|
bzc6p_ has quit (Read error: Operation timed out) |
17:14
🔗
|
|
patricko- is now known as patrickod |
17:26
🔗
|
|
patrickod is now known as patricko- |
17:50
🔗
|
SketchCow |
How much of it is out there! (Closure had a factoid) |
17:50
🔗
|
SketchCow |
The stats run - how long does it take? |
18:04
🔗
|
|
zottelbey has quit (Remote host closed the connection) |
18:06
🔗
|
|
patricko- is now known as patrickod |
18:08
🔗
|
closure |
hey so I'd like to set up a project on github for client-side scripts |
18:09
🔗
|
|
bzc6p_ (~bzc6p@[redacted]) has joined #internetarchive.bak |
18:11
🔗
|
|
bzc6p__ has quit (Read error: Operation timed out) |
18:15
🔗
|
closure |
numcopies +0: 72540 |
18:15
🔗
|
closure |
numcopies +1: 30779 |
18:15
🔗
|
closure |
numcopies +2: 21 |
18:15
🔗
|
closure |
numcopies +3: 3 |
18:15
🔗
|
closure |
these stats are probably out of date.. everyone: git-annex sync |
18:16
🔗
|
|
bzc6p__ (~bzc6p@[redacted]) has joined #internetarchive.bak |
18:18
🔗
|
|
bzc6p_ has quit (Read error: Operation timed out) |
18:28
🔗
|
db48x |
closure: do you have a few minutes to look at an error I'm getting when I build git-annex? |
18:29
🔗
|
|
patrickod is now known as patricko- |
18:32
🔗
|
|
patricko- is now known as patrickod |
18:37
🔗
|
|
patrickod is now known as patricko- |
18:49
🔗
|
SketchCow |
Hey, so who was it who reported the 1pb of duplicates by md5? |
18:53
🔗
|
|
bzc6p__ is now known as bzc6p |
18:54
🔗
|
SketchCow |
sep332: Yo |
18:54
🔗
|
sep332 |
oh hey |
18:55
🔗
|
sep332 |
yeah that was me |
18:56
🔗
|
SketchCow |
Is it an actual report or textfile? |
18:57
🔗
|
balrog |
"by md5" --- ehh... |
18:57
🔗
|
balrog |
suggestion, also compare filesizes |
18:57
🔗
|
sep332 |
well i have a list of: count, hash, size |
18:57
🔗
|
sep332 |
so i did (count-1) * size to get size of duplicates |
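[Editor's sketch] The (count-1) * size calculation sep332 describes could look roughly like the following. This is a minimal illustration, not sep332's actual script: the function name `duplicate_bytes` and the sample rows are hypothetical, and the input is assumed to be (count, hash, size in bytes) tuples.

```python
from typing import Iterable, Tuple


def duplicate_bytes(rows: Iterable[Tuple[int, str, int]]) -> int:
    """Sum the bytes held by redundant copies: (count - 1) * size per hash."""
    # One copy of each file is the "original"; only the extra copies count.
    return sum((count - 1) * size for count, _md5, size in rows if count > 1)


# Hypothetical sample rows: (count, md5 hex digest, size in bytes)
sample = [
    (3, "d41d8cd98f00b204e9800998ecf8427e", 1000),  # 2 extra copies
    (1, "0cc175b9c0f1b6a831c399e269772661", 500),   # unique, no duplicates
    (2, "900150983cd24fb0d6963f7d28e17f72", 250),   # 1 extra copy
]

print(duplicate_bytes(sample))  # 2 * 1000 + 0 + 1 * 250 = 2250
```

Subtracting one from each count keeps a single copy of every file out of the total, so the result is the space that deduplication could reclaim, not the total size of the duplicated files.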
18:58
🔗
|
SketchCow |
So, short form. |
18:58
🔗
|
SketchCow |
it is of interest to Brewster and IA if there is an assessment showing that there is 1pb of duplicate files. |
18:59
🔗
|
SketchCow |
And if it comes in the form of something we can look at. |
18:59
🔗
|
SketchCow |
So, if you have a file that can be "here are items that are the same" |
18:59
🔗
|
SketchCow |
That will be of use specifically. |
19:00
🔗
|
SketchCow |
I'd say a CSV of: |
19:00
🔗
|
SketchCow |
size,item1,item2,item.... |
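[Editor's sketch] One way to produce the CSV layout SketchCow suggests (size, then the items sharing a file). This assumes the input is a list of (md5, size, item identifier) records; that record shape, the function name `duplicate_items_csv`, and the sample identifiers are all hypothetical and not taken from the log.

```python
import csv
import io
from collections import defaultdict


def duplicate_items_csv(records):
    """Group records by (md5, size) and emit one CSV row per duplicated file."""
    groups = defaultdict(set)
    for md5, size, item in records:
        groups[(md5, size)].add(item)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Largest files first, so the biggest savings appear at the top.
    for (md5, size), items in sorted(groups.items(),
                                     key=lambda kv: (-kv[0][1], kv[0][0])):
        if len(items) > 1:  # only files that appear in more than one item
            writer.writerow([size, *sorted(items)])
    return out.getvalue()


# Hypothetical records: (md5, size in bytes, item identifier)
records = [
    ("aaa", 100, "item_a"),
    ("aaa", 100, "item_b"),
    ("bbb", 50, "item_c"),
    ("aaa", 100, "item_c"),
]

print(duplicate_items_csv(records), end="")
```

Grouping on both hash and size, not hash alone, already folds in the file size check raised earlier in the conversation.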
19:00
🔗
|
|
patricko- is now known as patrickod |
19:09
🔗
|
sep332 |
balrog: MD5 is still resistant to preimage attacks. checking the size is a good idea though |
19:10
🔗
|
balrog |
IMHO, I'd do more than md5+size to actually confirm duplicates |
19:10
🔗
|
balrog |
I'd probably do a compare of the entire data, if it was my drive |
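[Editor's sketch] The stricter check balrog describes, comparing the full contents and not just trusting md5 plus size, could be done on local files roughly like this. The helper name `same_content` is hypothetical; `filecmp.cmp` with `shallow=False` is the standard-library call that compares file contents byte by byte.

```python
import filecmp
import os


def same_content(path_a: str, path_b: str) -> bool:
    """Return True only if both files have identical bytes."""
    # Cheap check first: files of different sizes cannot be identical.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    # shallow=False forces a comparison of the actual file contents
    # and does not rely on os.stat() signatures.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

A full compare reads every byte of both files, so it makes most sense as a final confirmation step on candidates that already match by hash and size.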
19:10
🔗
|
sep332 |
SketchCow: I mostly have lists of individual files, but I can try extracting items if that's more useful |
19:11
🔗
|
SketchCow |
I think items are how we deal. |
19:21
🔗
|
|
Owen-x (~owen@[redacted]) has joined #internetarchive.bak |
19:23
🔗
|
|
patrickod is now known as patricko- |
19:43
🔗
|
|
Owen-x has quit (Owen-x) |
19:47
🔗
|
|
csssuf has quit (Ping timeout: 370 seconds) |
19:47
🔗
|
|
csssuf (~csssuf@[redacted]) has joined #internetarchive.bak |
21:03
🔗
|
|
Owen-x (~owen@[redacted]) has joined #internetarchive.bak |
21:15
🔗
|
|
patricko- is now known as patrickod |
21:19
🔗
|
|
Owen-x_ (~owen@[redacted]) has joined #internetarchive.bak |
21:21
🔗
|
|
patrickod is now known as patricko- |
21:21
🔗
|
|
Owen-x has quit (Ping timeout: 186 seconds) |
21:21
🔗
|
|
Owen-x_ is now known as Owen-x |
21:22
🔗
|
|
Owen-x has quit (Client Quit) |
22:26
🔗
|
|
swebb has quit (Quit: badcheese.com - where crap sometimes gets done) |
22:31
🔗
|
|
swebb (~swebb@[redacted]) has joined #internetarchive.bak |
22:32
🔗
|
|
Owen-x (~owen@[redacted]) has joined #internetarchive.bak |
22:35
🔗
|
|
patricko- is now known as patrickod |
23:02
🔗
|
|
patrickod is now known as patricko- |
23:02
🔗
|
|
patricko- is now known as patrickod |
23:08
🔗
|
|
Owen-x has quit (Owen-x) |
23:18
🔗
|
|
Owen-x (~owen@[redacted]) has joined #internetarchive.bak |
23:26
🔗
|
|
patrickod is now known as patricko- |
23:32
🔗
|
|
Owen-x has quit (Owen-x) |