#internetarchive.bak 2015-03-31,Tue


Time Nickname Message
00:09 🔗 Owen-x has quit (Owen-x)
00:26 🔗 aschmitz (~aschmitz@[redacted]) has joined #internetarchive.bak
00:44 🔗 tpw_rules (~tpw_rules@[redacted]) has joined #internetarchive.bak
00:44 🔗 tpw_rules hey. i heard about you all from twitter. i've got lots of free space i can give
00:47 🔗 tpw_rules how can i get my ssh public key added? closure: the wiki says to talk to you
00:54 🔗 tpw_rules closure: i've got a key to give you
01:00 🔗 SketchCow Closure is popping in and out. You might want to just message him and he'll see it when he unidles.
01:00 🔗 SketchCow A volunteer stepped forward with 19tb, very kind.
01:00 🔗 tpw_rules just put it on a locked pastie or something? i have about 3TB to spare
01:00 🔗 SketchCow Of course, bigger numbers are more meaningful as we start considering how to do backups.
01:00 🔗 SketchCow tpw_rules: I don't understand locked pastie in this
01:01 🔗 SketchCow tpw_rules: Oh, I see. Well, however you'd like.
01:01 🔗 tpw_rules it's a really ambitious project. have you talked to anybody in big business? google et al do this kind of thing
01:08 🔗 closure tpw_rules: added your key
01:08 🔗 tpw_rules cool. then just follow the wiki?
01:08 🔗 tpw_rules it doesn't require any incoming ports open does it
01:09 🔗 closure tpw_rules: yes, follow the wiki .. and keep in touch since this is just a test
01:09 🔗 closure no incoming ports needed, no
01:09 🔗 tpw_rules RSA key fingerprint is 79:ea:f9:7f:89:7e:29:27:4c:63:74:53:f9:1c:f3:d4. that you?
01:10 🔗 tpw_rules i can do that
01:10 🔗 tpw_rules i'll just idle here
01:11 🔗 closure that's the right ssh host key, yes
01:19 🔗 ersi tpw_rules: Hehe, welcome around. :)
01:25 🔗 tpw_rules i'm getting a 403 forbidden
01:25 🔗 tpw_rules Try making some of these repositories available:
01:25 🔗 tpw_rules 00000000-0000-0000-0000-000000000001 -- web
01:25 🔗 tpw_rules is that me
01:27 🔗 tpw_rules yeah i'm not able to download anything
01:33 🔗 tpw_rules reset everything and still 403
01:35 🔗 tpw_rules did i break something or is it an issue with archive.org?
01:38 🔗 closure must be archive.org (works for me tho)
01:38 🔗 tpw_rules http://pastie.org/private/5jtfzil4whdcji0lnecag
01:39 🔗 tpw_rules "The item is not available due to issues with the item's content. " so i guess i'm just downloading bad files
01:39 🔗 tpw_rules i'll just let it go and get to some good files
01:40 🔗 closure always possible they darked a few of the files but if it keeps failing, might be something else on your end
01:40 🔗 tpw_rules oh, i changed the command a bit and it's downloading different files okay
01:41 🔗 tpw_rules well i'll let that crunch overnight. full repo is ~3TB? i have that space
01:42 🔗 closure https://archive.org/download/Ttscribe/Ttscribe_meta.xml is indeed darked
01:42 🔗 closure a little less than 3tb I think
01:43 🔗 sep332 there is a load-balancing issue which might slow your downloads, temporarily anyway
01:45 🔗 tpw_rules why is a lot of this stuff tarred instead of compressed too?
01:49 🔗 tpw_rules instead of tar.xz or something. ease of access?
02:05 🔗 sep332 In general, the vast majority of archive items are compressed. not sure about these collections in particular though
02:10 🔗 sep332 a quick glance shows these tar files are full of .jp2 (JPEG2000) files.
02:11 🔗 tpw_rules ahhh. so no sense recompressing them
02:25 🔗 balrog ohai tpw_rules
02:25 🔗 tpw_rules it might be worth it to create a guide showing how to attach a bunch of spare disks to a raspberry pi or something and set it up to archive
02:25 🔗 tpw_rules i'm doing research on using unionfs to tolerate disk failures and i'll see what i can come up with
02:27 🔗 tpw_rules we all know everybody loves relatively meaningless raspberry pi projects :D
02:35 🔗 SketchCow So, two things that are becoming obvious
02:35 🔗 SketchCow One, the "backup drive" will have notable curation by a team of us, where we slowly add new items to the "drive", based on historical value and need
02:35 🔗 SketchCow Because multiple petabytes are unlikely to fall out of the sky
02:37 🔗 tpw_rules are there any sort of "responsible bandwidth limits" for doing something like this by archive.org itself? i can suck 12MB/s down and i don't want to break anything
02:38 🔗 SketchCow No, absolutely not
02:38 🔗 SketchCow Brewster wants the lines absolutely packed to insane levels all the time.
02:38 🔗 SketchCow And then he'll buy more.
02:38 🔗 SketchCow We went from 40GB/s to 80GB/s relatively recently
02:38 🔗 tpw_rules if you say so :)
02:39 🔗 SketchCow I'm trying to see if I can find a metric.
02:39 🔗 SketchCow If https://monitor.archive.org/weathermap/weathermap.html is still public
02:39 🔗 tpw_rules i have no idea what that means but it looks cool
02:40 🔗 balrog wishes btrfs had per-subvolume RAID already
02:40 🔗 SketchCow Well, we want yellow. Lots of yellow
02:41 🔗 SketchCow And then turning it back to blue
02:46 🔗 Owen-x (~owen@[redacted]) has joined #internetarchive.bak
03:22 🔗 Owen-x has quit (Owen-x)
03:25 🔗 Owen-x (~owen@[redacted]) has joined #internetarchive.bak
03:40 🔗 svchfoo1 has quit (Read error: Operation timed out)
03:42 🔗 svchfoo1 (~chfoo1@[redacted]) has joined #internetarchive.bak
03:43 🔗 svchfoo2 gives channel operator status to svchfoo1
03:50 🔗 Owen-x has quit (Owen-x)
03:55 🔗 DFJustin how reliable is an rpi in terms of ram corruption etc
04:00 🔗 SketchCow I heard that
04:29 🔗 bzc6p_ (~bzc6p@[redacted]) has joined #internetarchive.bak
04:34 🔗 bzc6p has quit (Ping timeout: 600 seconds)
04:34 🔗 zottelbey (~zottelbey@[redacted]) has joined #internetarchive.bak
05:06 🔗 zottelbey has quit (Remote host closed the connection)
08:09 🔗 midas SketchCow: speed is getting better, peaks at 1.5MB/s now
09:03 🔗 Muad-Dib (~paul@[redacted]) has joined #internetarchive.bak
10:39 🔗 svchfoo1 has quit (Remote host closed the connection)
10:40 🔗 svchfoo1 (~chfoo1@[redacted]) has joined #internetarchive.bak
10:43 🔗 svchfoo2 gives channel operator status to svchfoo1
10:57 🔗 csssuf has quit (Ping timeout: 370 seconds)
10:58 🔗 csssuf (~csssuf@[redacted]) has joined #internetarchive.bak
12:21 🔗 SketchCow Great
13:03 🔗 csssuf has quit (Ping timeout: 370 seconds)
13:04 🔗 csssuf (~csssuf@[redacted]) has joined #internetarchive.bak
13:44 🔗 zottelbey (~zottelbey@[redacted]) has joined #internetarchive.bak
13:59 🔗 bpye (~quassel@[redacted]) has joined #internetarchive.bak
13:59 🔗 bpye has quit (Remote host closed the connection)
14:00 🔗 bpye has quit (Remote host closed the connection)
14:00 🔗 bpye (~quassel@[redacted]) has joined #internetarchive.bak
14:32 🔗 midas and it slowed down again :p 92.0KB/s
16:30 🔗 bzc6p__ (~bzc6p@[redacted]) has joined #internetarchive.bak
16:35 🔗 bzc6p_ has quit (Read error: Operation timed out)
17:14 🔗 patricko- is now known as patrickod
17:26 🔗 patrickod is now known as patricko-
17:50 🔗 SketchCow How much of it is out there! (Closure had a factoid)
17:50 🔗 SketchCow The stats run - how long does it take?
18:04 🔗 zottelbey has quit (Remote host closed the connection)
18:06 🔗 patricko- is now known as patrickod
18:08 🔗 closure hey so I'd like to set up a project on github for client-side scripts
18:09 🔗 bzc6p_ (~bzc6p@[redacted]) has joined #internetarchive.bak
18:11 🔗 bzc6p__ has quit (Read error: Operation timed out)
18:15 🔗 closure numcopies +0: 72540
18:15 🔗 closure numcopies +1: 30779
18:15 🔗 closure numcopies +2: 21
18:15 🔗 closure numcopies +3: 3
18:15 🔗 closure these stats are probably out of date.. everyone: git-annex sync
18:16 🔗 bzc6p__ (~bzc6p@[redacted]) has joined #internetarchive.bak
18:18 🔗 bzc6p_ has quit (Read error: Operation timed out)
18:28 🔗 db48x closure: do you have a few minutes to look at an error I'm getting when I build git-annex?
18:29 🔗 patrickod is now known as patricko-
18:32 🔗 patricko- is now known as patrickod
18:37 🔗 patrickod is now known as patricko-
18:49 🔗 SketchCow Hey, so who was it who reported the 1pb of duplicates by md5?
18:53 🔗 bzc6p__ is now known as bzc6p
18:54 🔗 SketchCow sep332: Yo
18:54 🔗 sep332 oh hey
18:55 🔗 sep332 yeah that was me
18:56 🔗 SketchCow Is it an actual report or textfile?
18:57 🔗 balrog "by md5" --- ehh...
18:57 🔗 balrog suggestion, also compare filesizes
18:57 🔗 sep332 well i have a list of: count, hash, size
18:57 🔗 sep332 so i did (count-1) * size to get size of duplicates
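
The arithmetic sep332 describes can be sketched roughly as below. This is an illustration only, not sep332's actual script: the (count, hash, size) row layout comes from the chat, while the function name `duplicate_bytes` and the sample rows are hypothetical.

```python
from typing import Iterable, Tuple

# One row per distinct hash: (number of files with that hash, md5 hex, size in bytes).
Row = Tuple[int, str, int]


def duplicate_bytes(rows: Iterable[Row]) -> int:
    """Sum the space used by redundant copies: (count - 1) * size per hash."""
    total = 0
    for count, _md5, size in rows:
        if count > 1:
            # One copy is the "original"; every further copy is a duplicate.
            total += (count - 1) * size
    return total


if __name__ == "__main__":
    # Made-up sample rows, purely for illustration.
    sample = [
        (3, "hash-a", 100),  # two redundant copies of 100 bytes
        (1, "hash-b", 500),  # unique file, contributes nothing
        (2, "hash-c", 50),   # one redundant copy of 50 bytes
    ]
    print(duplicate_bytes(sample))
```

Keeping the size in each row also lets a later pass group on (hash, size) rather than on the hash alone.
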
18:58 🔗 SketchCow So, short form.
18:58 🔗 SketchCow it is of interest to Brewster and IA if there is an assessment showing that there is 1pb of duplicate files.
18:59 🔗 SketchCow And if it comes in the form of something we can look at.
18:59 🔗 SketchCow So, if you have a file that can be "here are items that are the same"
18:59 🔗 SketchCow That will be of use specifically.
19:00 🔗 SketchCow I'd say a CSV of:
19:00 🔗 SketchCow size,item1,item2,item....
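
A CSV in the shape SketchCow suggests (size,item1,item2,...) could be produced along these lines. This is a hedged sketch: it assumes each input record is an (item identifier, md5, size) triple, which is not stated in the chat, and the function name `duplicates_csv` is hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Iterable, Tuple

# Assumed input record: (item identifier, md5 hex digest, size in bytes).
Record = Tuple[str, str, int]


def duplicates_csv(records: Iterable[Record]) -> str:
    """Return CSV text with one row per duplicate group: size,item1,item2,..."""
    groups = defaultdict(set)
    for item, md5, size in records:
        # Group on size as well as hash, so equal hashes with different
        # sizes are never reported as the same content.
        groups[(size, md5)].add(item)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for (size, _md5), items in sorted(groups.items()):
        if len(items) > 1:  # only groups that actually contain duplicates
            writer.writerow([size, *sorted(items)])
    return out.getvalue()


if __name__ == "__main__":
    # Made-up item names and hashes, purely for illustration.
    demo = [
        ("itemA", "h1", 10),
        ("itemB", "h1", 10),
        ("itemC", "h2", 10),
    ]
    print(duplicates_csv(demo), end="")
```

Rows vary in length because a group can hold any number of items, so a reader should split each row on commas rather than expect fixed columns.
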
19:00 🔗 patricko- is now known as patrickod
19:09 🔗 sep332 balrog: MD5 is still resistant to preimage attacks. checking the size is a good idea though
19:10 🔗 balrog IMHO, I'd do more than md5+size to actually confirm duplicates
19:10 🔗 balrog I'd probably do compare of the entire data, if it was my drive
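
The full-data comparison balrog has in mind might look like the sketch below. It assumes both candidate files are available as local paths, which would not hold for items stored only on archive.org; the name `same_content` and the chunk size are illustrative choices.

```python
import os

CHUNK = 1024 * 1024  # read 1 MiB at a time so large files are never loaded whole


def same_content(path_a: str, path_b: str) -> bool:
    """Return True only if both files hold exactly the same bytes."""
    # Cheap check first: files of different sizes cannot be identical.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            block_a = fa.read(CHUNK)
            block_b = fb.read(CHUNK)
            if block_a != block_b:
                return False
            if not block_a:  # both files ended at the same point
                return True
```

Python's standard library offers the same check as `filecmp.cmp(path_a, path_b, shallow=False)`; the explicit loop is shown only to make the steps visible. A hash plus size match would serve as the cheap first filter, with this comparison run only on the surviving candidates.
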
19:10 🔗 sep332 SketchCow: I mostly have lists of individual files, but I can try extracting items if that's more useful
19:11 🔗 SketchCow I think items are how we deal.
19:21 🔗 Owen-x (~owen@[redacted]) has joined #internetarchive.bak
19:23 🔗 patrickod is now known as patricko-
19:43 🔗 Owen-x has quit (Owen-x)
19:47 🔗 csssuf has quit (Ping timeout: 370 seconds)
19:47 🔗 csssuf (~csssuf@[redacted]) has joined #internetarchive.bak
21:03 🔗 Owen-x (~owen@[redacted]) has joined #internetarchive.bak
21:15 🔗 patricko- is now known as patrickod
21:19 🔗 Owen-x_ (~owen@[redacted]) has joined #internetarchive.bak
21:21 🔗 patrickod is now known as patricko-
21:21 🔗 Owen-x has quit (Ping timeout: 186 seconds)
21:21 🔗 Owen-x_ is now known as Owen-x
21:22 🔗 Owen-x has quit (Client Quit)
22:26 🔗 swebb has quit (Quit: badcheese.com - where crap sometimes gets done)
22:31 🔗 swebb (~swebb@[redacted]) has joined #internetarchive.bak
22:32 🔗 Owen-x (~owen@[redacted]) has joined #internetarchive.bak
22:35 🔗 patricko- is now known as patrickod
23:02 🔗 patrickod is now known as patricko-
23:02 🔗 patricko- is now known as patrickod
23:08 🔗 Owen-x has quit (Owen-x)
23:18 🔗 Owen-x (~owen@[redacted]) has joined #internetarchive.bak
23:26 🔗 patrickod is now known as patricko-
23:32 🔗 Owen-x has quit (Owen-x)
