#internetarchive.bak 2015-05-26,Tue


Time Nickname Message
00:27 🔗 Start has joined #internetarchive.bak
00:28 🔗 iabak-reg registrar master cf23978 other SHARD6/pubkeys registration of infinity on SHARD6
00:52 🔗 iabak-reg registrar master 038c73d other SHARD7/pubkeys registration of kurtmclester on SHARD7
00:55 🔗 Kazzy shard5 done
00:59 🔗 SketchCow https://soundcloud.com/renjith-vijay/fast-and-furious-7-get-low-ringtone
00:59 🔗 SketchCow That plays every time we move shards.
01:00 🔗 tpw_rules i've got 6TB down so far in general. how can i check progress on a specific shard?
01:00 🔗 tpw_rules closure: ^
01:01 🔗 Kazzy tpw_rules: http://iabak.archiveteam.org/SHARD5.html
01:01 🔗 Kazzy can substitute SHARD% for the shard you want to view
01:01 🔗 tpw_rules i mean me specifically
01:01 🔗 Kazzy SHARD5*
01:15 🔗 closure yeah, we could make individual pages for each registered user. already have the graphs and data.
01:19 🔗 tpw_rules what about just have it as part of git annex?
01:20 🔗 closure tpw_rules: just run git-annex info
01:23 🔗 tpw_rules oh duh
01:24 🔗 tpw_rules why are some "unknown size"?
01:24 🔗 closure xml files with dummy size 0 in the IA survey
01:24 🔗 tpw_rules is the working tree the ones on your server or on disk size
01:25 🔗 closure that's the total files, local annex is on your disk
01:25 🔗 tpw_rules ok
01:26 🔗 tpw_rules okay so i have about a third of shard6
01:26 🔗 tpw_rules gonna need to add more disk soon
01:30 🔗 db48x tpw_rules: what's the UUID of your copy of the shard?
01:32 🔗 db48x cd shard5; git config --get annex.uuid
01:32 🔗 primus104 has quit IRC (Leaving.)
01:34 🔗 BotOfWar has quit IRC (Quit: left4dead)
01:35 🔗 tpw_rules shard5: ada4868b-f969-40ba-ae7e-d58a20f1c477 shard6: a0c7d62a-53ec-4641-8ecf-af122b2cd5da shard1: 6ca85911-2e94-49b6-bbdc-cb1be080e300
01:35 🔗 tpw_rules i'm talking about shard6
01:35 🔗 VADemon has joined #internetarchive.bak
01:35 🔗 tpw_rules can you remove all the ones that claim to be twatson52 that aren't those? i made a bunch of ones that got blown up in various ways
01:36 🔗 tpw_rules they all have the same email
01:36 🔗 db48x http://iabak.archiveteam.org:8080/render/?width=1060&height=733&_salt=1432604162.838&target=iabak.shardstats.leaderboard.a0c7d62a-53ec-4641-8ecf-af122b2cd5da.shard6
01:37 🔗 db48x http://iabak.archiveteam.org:8080/render/?width=1060&height=733&_salt=1432604200.538&target=iabak.shardstats.leaderboard.a0c7d62a-53ec-4641-8ecf-af122b2cd5da.shard6&from=-1weeks
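The two render links above differ only in the `from` parameter. Here is a minimal Python sketch that builds such a URL. The host, port, `width`/`height` and the `iabak.shardstats.leaderboard.<uuid>.<shard>` target pattern are copied from those links. Leaving out the `_salt` cache-buster rests on the assumption that Graphite does not require it, and the helper name is hypothetical.

```python
from urllib.parse import urlencode

# Host, port, size and metric path are copied from the links above.
GRAPHITE_RENDER = "http://iabak.archiveteam.org:8080/render/"


def leaderboard_graph_url(repo_uuid, shard, window=None, width=1060, height=733):
    """Build a Graphite render URL for one repository's leaderboard series.

    window: optional relative start time such as "-1weeks"; when omitted,
    Graphite falls back to its default time range.
    """
    params = {
        "width": width,
        "height": height,
        "target": f"iabak.shardstats.leaderboard.{repo_uuid}.{shard}",
    }
    if window is not None:
        params["from"] = window
    return GRAPHITE_RENDER + "?" + urlencode(params)


print(leaderboard_graph_url("a0c7d62a-53ec-4641-8ecf-af122b2cd5da", "shard6", "-1weeks"))
```

Substitute the UUID printed by `git config --get annex.uuid` in your own shard checkout.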
01:37 🔗 db48x pretty nice slope there
01:38 🔗 db48x you can use iabak.archiveteam.org:8080 to look at all the different stats we collect
01:38 🔗 db48x gives you a nice browser and graph editor
01:38 🔗 tpw_rules is there any reason the line is all dashy?
01:39 🔗 db48x we only check in once per hour
01:39 🔗 db48x you can use the graph editor to make it join up the dots
01:39 🔗 tpw_rules but shouldn't it connect the dots? i get the stairstep but it looks like it was done with a dashed line
01:39 🔗 tpw_rules ah
01:40 🔗 db48x it doesn't know a priori what a gap in the data means
01:40 🔗 db48x in this case it means a missed sample; it could just as easily mean a zero
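db48x's point about gaps can be made concrete with an hourly series that has missing samples. The same gap can be read as "carry the last value forward" (a missed check-in) or as a real zero. Graphite's `keepLastValue()` render function applies the first reading on the server side. The Python below is only a local illustration with a made-up series and a hypothetical helper name.

```python
def fill_gaps(samples, missing_means_zero=False):
    """Fill None entries in an hourly series.

    missing_means_zero=False: a gap is a missed sample, so repeat the last value
    missing_means_zero=True:  a gap is a real zero reading
    Leading gaps stay None when carrying forward, since there is nothing to repeat.
    """
    filled = []
    last = None
    for value in samples:
        if value is None:
            value = 0 if missing_means_zero else last
        else:
            last = value
        filled.append(value)
    return filled


series = [1.0, None, 3.0, None]
print(fill_gaps(series))
print(fill_gaps(series, missing_means_zero=True))
```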
02:08 🔗 tpw_rules has anybody else fiddled around with union filesystems for storing all the data?
02:09 🔗 db48x I use ZFS
02:09 🔗 trs80 what's the goal of using a union fs?
02:09 🔗 tpw_rules i'm coming at it from the point of having piles of hard drives that could be used
02:09 🔗 tpw_rules basically be able to have a failure of one drive not affect all the data stored
02:09 🔗 db48x yea
02:10 🔗 db48x ZFS allows striping/mirroring/raid across disks
02:10 🔗 tpw_rules and, more importantly, a failure of one drive not make the other data unusable like it would in a RAID 0
02:11 🔗 db48x true
02:11 🔗 db48x I prefer raid though, so that the occasional error can be corrected
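tpw_rules' comparison between RAID 0 and a union filesystem comes down to where each file's bytes land. The toy model below is a sketch with hypothetical helpers. Round-robin placement stands in for whatever policy a real union filesystem such as mhddfs actually uses, and striping is assumed to touch every disk for every file.

```python
def place_files(names, disks, layout):
    """Map each file name to the set of disks its bytes end up on."""
    if layout == "stripe":
        # RAID 0: each file is split across all disks
        return {name: set(disks) for name in names}
    if layout == "union":
        # union/pooled filesystem: each file is written whole to a single disk
        return {name: {disks[i % len(disks)]} for i, name in enumerate(names)}
    raise ValueError(f"unknown layout: {layout}")


def readable_after_failure(placement, failed_disk):
    """A file is still readable only if none of its bytes were on the failed disk."""
    return sorted(name for name, on in placement.items() if failed_disk not in on)


files = ["a", "b", "c", "d"]
disks = ["disk2", "disk3"]
print(readable_after_failure(place_files(files, disks, "stripe"), "disk2"))
print(readable_after_failure(place_files(files, disks, "union"), "disk2"))
```

With striping, one dead disk makes everything unreadable. With a union filesystem, only the files that lived on that disk are lost, and git-annex can fetch those again.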
02:11 🔗 tpw_rules but otoh i'm thinking about the idea of writing a disk manager that would create a new repo on each disk and manage filling them up and fscking as plugged in
02:11 🔗 tpw_rules so i don't need 45 drive enclosures and usb ports
02:11 🔗 db48x :)
02:12 🔗 db48x we should have a way for the iabak client to help that out
02:12 🔗 tpw_rules but then i have to actually swap drives around and i'm lazy
02:13 🔗 tpw_rules well how in the hands of the masses do you want this to be? i know several people who would be happy to help if it were just something they kept running on windows in the tray
02:14 🔗 tpw_rules and i really don't think making it all shell scripts is conducive to that ideal
02:15 🔗 db48x very true
02:16 🔗 sep332 With multiple disks you have to make sure that they're not all getting the same data
02:17 🔗 tpw_rules also note that if i say i have an idea, you can say "shut up and make it" because i could. but here goes. have a config file of archive destinations. for each, give it a name and directory. in each directory, put a file with the name and a uuid
02:17 🔗 sep332 No good having 3 copies of a file sitting next to each other in one drawer
02:17 🔗 tpw_rules this way you can have the directory be a mount point (and the same for multiple destinations) and the file can be used to determine if a disk is mounted/which one is there
02:17 🔗 tpw_rules sep332: oh yeah, that's important
02:18 🔗 db48x sep332: yea, git annex can handle that if we tell it to
02:18 🔗 tpw_rules though i get the approach of building this from common tools so it can be rebuilt in the event of thermonuclear meltdown, i think it limits options a lot
02:19 🔗 tpw_rules but this program could also read the config and determine which ones need to be checked and ask for the appropriate one to show up at the directory
02:19 🔗 tpw_rules so it could easily support multiple drives that are swapped out or multiple locations on one computer
02:20 🔗 tpw_rules also perhaps the option to download locally so it can fsck one drive and later copy files to another when it comes back
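tpw_rules' proposal uses a config of named destinations, each directory holding a marker file with a name and UUID. The marker tells which disk is currently mounted, even when several destinations share one mount point. The sketch below is a minimal version under assumptions: the marker file name, its JSON layout and the helper names are all hypothetical, and nothing like this exists in the iabak scripts.

```python
import json
from pathlib import Path

MARKER_NAME = ".iabak-destination"  # hypothetical marker file name


def write_marker(directory, name, uuid):
    """Label a disk so it can be recognised whenever it is mounted."""
    Path(directory, MARKER_NAME).write_text(json.dumps({"name": name, "uuid": uuid}))


def mounted_destinations(config):
    """config: destination name -> (directory, expected uuid).

    Return the destinations whose directory currently holds their own marker.
    """
    present = []
    for name, (directory, uuid) in config.items():
        try:
            data = json.loads(Path(directory, MARKER_NAME).read_text())
        except (OSError, ValueError):
            continue  # not mounted, or marker missing/corrupt
        if data.get("name") == name and data.get("uuid") == uuid:
            present.append(name)
    return present
```

A daemon could then fsck or fill whichever destinations `mounted_destinations` reports, and ask the user to plug in the ones that are overdue.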
02:22 🔗 db48x we can simplify that more, even
02:23 🔗 db48x git annex repositories already have their uuid
02:23 🔗 tpw_rules can you scan a directory for a git repo easily?
02:23 🔗 db48x yes
02:23 🔗 tpw_rules recursively?
02:23 🔗 tpw_rules but yeah. just read the .git/whatever in the pointed to directory to make sure
02:23 🔗 db48x so if the iabak script listened for devices being added (udev, on linux) and found the repositories on them, it could fsck them
02:24 🔗 tpw_rules ignore my previous two questions
02:24 🔗 db48x yep
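db48x's simpler variant relies on the fact that each git-annex repository already carries its UUID in `.git/config` (`[annex]` section, `uuid` key). Below is a sketch of the "find the repositories on a newly added device" half. The udev trigger is out of scope, the helper names are hypothetical, and the config reader is a deliberately small subset of git's config syntax.

```python
import os
import re


def annex_uuid(git_dir):
    """Return annex.uuid from <git_dir>/config, or None if it is not a git-annex repo."""
    section = None
    try:
        with open(os.path.join(git_dir, "config"), encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                header = re.match(r"\[([^\]]+)\]", line)
                if header:
                    section = header.group(1).strip().lower()
                    continue
                if section == "annex" and "=" in line:
                    key, value = (part.strip() for part in line.split("=", 1))
                    if key.lower() == "uuid":
                        return value
    except OSError:
        return None
    return None


def find_annex_repos(mount_point):
    """Walk a mounted filesystem and list (repo path, annex uuid) pairs."""
    found = []
    for root, dirs, _files in os.walk(mount_point):
        if ".git" in dirs:
            uuid = annex_uuid(os.path.join(root, ".git"))
            if uuid:
                found.append((root, uuid))
                dirs[:] = []  # don't descend into the repo's working tree
            else:
                dirs.remove(".git")  # plain git repo: skip its internals
    return sorted(found)
```

Each UUID found this way could then be handed to `git annex fsck` in that directory.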
02:24 🔗 tpw_rules does git exist for windows without cygwin?
02:24 🔗 tpw_rules and could git annex?
02:24 🔗 db48x no
02:25 🔗 tpw_rules is it possible to bundle cygwin into an installer?
02:25 🔗 db48x sure
02:25 🔗 db48x git already does that
02:25 🔗 tpw_rules i have some experience with windows gui programming (python/pyqt so it could even be cross-platform) and bundle all that stuff together
02:25 🔗 db48x excellent :)
02:27 🔗 tpw_rules now this is where it gets shaky: how do we tie multiple repos on multiple computers to one name? that would be necessary to prevent duplication. i have a friend with a stack of six or so laptops; they should all know the files the others have
02:27 🔗 db48x yes, each repository knows which other repositories have each file
02:28 🔗 SketchCow Yeah, this is git-annex's job
02:28 🔗 tpw_rules so we would have some association of the sets of repositories to an account and git annex can query those in the set?
02:28 🔗 db48x mostly
02:29 🔗 tpw_rules i mean we don't want two copies of a file three feet from each other, but having two copies three states from each other is good
02:29 🔗 db48x tpw_rules: yes
02:29 🔗 VADemon has quit IRC (Quit: left4dead)
02:29 🔗 SketchCow I suspect we're going to run into a LITTLE of that no matter what, because people will want to "help"
02:29 🔗 db48x theres a couple of ways to do it
02:30 🔗 db48x there's a git-annex feature coming down the line at some point which we could rely on
02:30 🔗 tpw_rules SketchCow: sure, but i wasn't sure how we automatically prevented the first and allowed the second
02:30 🔗 db48x or iabak could simply say git annex get --not --copies 4 --not --in otherrepo1 --not --in otherrepo2...
02:31 🔗 tpw_rules that list would only update when a sync happened though
02:31 🔗 tpw_rules might have to tune that in practice
02:31 🔗 db48x the git annex feature is called balanced preferred content: http://git-annex.branchable.com/design/balanced_preferred_content/
02:32 🔗 db48x if the disk is offline, then the last sync for it has the most up-to-date information already :)
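db48x's one-liner skips files that already have enough copies, plus any file already held by one of your own other repositories. Here is a sketch that assembles that argument list. The helper name and the sibling-repo list are hypothetical. `--not`, `--copies=N` and `--in=repo` are git-annex file matching options, and successive matching options are ANDed by default.

```python
def annex_get_args(min_copies, sibling_repos, path="."):
    """Build a `git annex get` command that skips files which already have
    `min_copies` copies or already sit in one of our own sibling repositories."""
    args = ["git", "annex", "get", "--not", f"--copies={min_copies}"]
    for repo in sibling_repos:
        # git-annex ANDs these matching options together by default
        args += ["--not", f"--in={repo}"]
    args.append(path)
    return args


print(annex_get_args(4, ["otherrepo1", "otherrepo2"]))
```

The list can be passed to `subprocess.run(..., check=True)` from inside the shard checkout. As tpw_rules notes, the location data it relies on is only as fresh as the last sync.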
02:32 🔗 tpw_rules well this could also support n drives simultaneously too
02:33 🔗 db48x simplest to put a single repository on each of them
02:33 🔗 tpw_rules i was thinking more across multiple computers. but the balanced thing would solve that problem
02:34 🔗 db48x both ways of doing it will work for n computers and n drives, and n drives on each of m computers ;)
02:34 🔗 tpw_rules but which will be the most perfect
02:35 🔗 db48x :)
02:36 🔗 tpw_rules i'd be happy to write a gui but i don't know enough about git annex and bash to do the script. i feel we should move it to python or something
02:36 🔗 tpw_rules or haskell :P
02:36 🔗 db48x haskell would be fun
02:36 🔗 db48x I don't know haskell very well yet; it'd be fun to learn
02:36 🔗 tpw_rules i need to learn haskell
02:37 🔗 tpw_rules i have worked with functional programming languages, but nothing truly functional
02:44 🔗 SketchCow closure: Seems like I need to start giving you more collections
02:44 🔗 SketchCow And we are getting to the point where we need to do a sanity check to make sure a collection isn't already being backed up
02:46 🔗 closure I have that sanity check in place actually
02:46 🔗 closure http://iabak.archiveteam.org/client/f9601d3062715f39f6290547fbaf14b3e6c2b4fb.html
02:47 🔗 SketchCow Great
02:50 🔗 iabak-reg registrar master da43505 other SHARD6/pubkeys registration of kurtmclester on SHARD6
02:56 🔗 SketchCow Shard6 is filling in nicely.
04:03 🔗 tpw_rules maybe we don't need all commits in the channel :P
04:53 🔗 SketchCow Yes. We. Do.
04:55 🔗 tpw_rules okay okay okay
04:57 🔗 tpw_rules /media/iabak/disk2/IA.BAK;/media/iabak/disk3/IA.BAK;/media/iabak/disk4/IA.BAK 8.1T 5.9T 1.8T 77% /home/thomas/iabak/IA.BAK
04:57 🔗 SketchCow This is what we PLAYYYY FORRRRR
04:57 🔗 tpw_rules fortunately i have 3x 3TB disks that i'm not using right now
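The df line above shows 77% in use even though 5.9T / 8.1T is only about 73%. GNU df computes Use% as used / (used + available) and rounds up, so space held back from ordinary users counts in neither column. The 0.4T gap between Size and used + avail matches ext4's default 5% root reservation (0.05 × 8.1T ≈ 0.4T), but that is an assumption, because the log doesn't say which filesystem the disks use. Also, `df -h` figures are themselves rounded.

```python
import math


def df_use_percent(used, avail):
    """GNU df's Use% column: used / (used + avail), rounded up to a whole percent."""
    return math.ceil(100 * used / (used + avail))


print(df_use_percent(5.9, 1.8))  # 77
print(math.ceil(100 * 5.9 / 8.1))  # naive used/size gives 73
```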
04:58 🔗 tpw_rules anyway, it is time to say good night and let the datums flow in
05:03 🔗 iabak-reg registrar master a2a9eb2 other SHARD6/pubkeys registration of kevin on SHARD6
06:00 🔗 zottelbey has joined #internetarchive.bak
06:37 🔗 iabak-reg registrar master 9e6e2ff other SHARD6/pubkeys registration of bas+at on SHARD6
07:48 🔗 Senji tpw: http://iabak.archiveteam.org/stats/SHARD5.leaderboard-raw seems to think you have 8T 8T 8T 1.5T
07:51 🔗 Senji Err, no, divide all those numbers by 10, I can't math :)
07:51 🔗 Senji I'll just go back to bed, clearly I'm not awake yet
07:53 🔗 lhobas_ new stats pages per user are really nice :)
07:55 🔗 lhobas_ just noticed the cleanup function in the iabak script doesn't work on OS X (more stupid Mac-only glitches I assume) - https://github.com/ArchiveTeam/IA.BAK/blob/a420ad/iabak-helper#L282 throws "No such file or directory" (pid does exist, statement should eval to true)
07:56 🔗 ivan` has joined #internetarchive.bak
07:56 🔗 lhobas_ #L285 in that file seems off to me, think the file is supposed to be rm'ed, not the pid-number right?
07:56 🔗 primus104 has joined #internetarchive.bak
08:03 🔗 garyrh https://news.ycombinator.com/item?id=9602868
08:16 🔗 ivan` I would like to nominate https://archive.org/details/archiveteam_greader for distributed archival because it's got 8TB of compressed text, a lot from dead blogs that are nowhere else
08:16 🔗 ivan` the Directory and Stats are unimportant and omitting them saves ~800GB
08:16 🔗 ivan` I was planning on dumping it into my Google Drive or onto external drives but never got around to either but will maybe try later
08:19 🔗 iabak-reg registrar master 37605fb other SHARD6/pubkeys registration of cyrus on SHARD6
08:21 🔗 iabak-reg registrar master 5463225 other SHARD6/pubkeys registration of peter on SHARD6
08:26 🔗 Start has quit IRC (Read error: Connection reset by peer)
08:26 🔗 Start_ has joined #internetarchive.bak
08:28 🔗 Cyrus has joined #internetarchive.bak
08:28 🔗 Senji Emcy: I'd say "2 sheets" rather than "4 or more" :)
08:28 🔗 Senji Bah, mischat :)
08:40 🔗 iabak-reg registrar master d422268 other SHARD6/pubkeys registration of antoine on SHARD6
09:17 🔗 SketchCow This project just got mentioned on hackernews.
09:17 🔗 SketchCow Might cause a run on clients.
09:17 🔗 SketchCow Or whiners.
09:17 🔗 SketchCow Or whiny clients
09:17 🔗 SketchCow Or client whiners
09:19 🔗 iabak-reg registrar master c394c48 other SHARD6/pubkeys registration of koos303 on SHARD6
09:21 🔗 atomotic has joined #internetarchive.bak
09:23 🔗 garyrh All four plus extra.
09:25 🔗 atomotic has quit IRC (Client Quit)
09:26 🔗 SketchCow We are probably going to start getting into the light realm of bad actors.
09:26 🔗 SketchCow We'll see how we handle it.
09:28 🔗 iabak-reg registrar master f586bb7 other SHARD6/pubkeys registration of info on SHARD6
09:36 🔗 Start_ has quit IRC (Read error: Connection reset by peer)
09:36 🔗 lufix has joined #internetarchive.bak
09:37 🔗 Start has joined #internetarchive.bak
10:05 🔗 iabak-reg registrar master 40f2c21 other SHARD6/pubkeys registration of bdupray on SHARD6
11:03 🔗 lufix has quit IRC (Ping timeout: 240 seconds)
11:03 🔗 iabak-reg registrar master 9ebe0e8 other SHARD6/pubkeys registration of alex_online78532 on SHARD6
11:10 🔗 lufix has joined #internetarchive.bak
11:10 🔗 hendi has joined #internetarchive.bak
11:27 🔗 hendi is there a way to set a nicer name for my account? currently I'm named "info"?
11:28 🔗 Senji I believe there is an intention to allow nicknames. Currently it's using the bit before the @ in your email address
11:28 🔗 hendi alright, thanks; I'll keep an eye out for that functionality then
11:30 🔗 iabak-reg registrar master 744b699 other SHARD6/pubkeys registration of andrei.zbikowski on SHARD6
11:57 🔗 atomotic has joined #internetarchive.bak
12:34 🔗 iabak-reg registrar master d785a5d other SHARD6/pubkeys registration of olliejudge on SHARD6
12:39 🔗 beardicus has joined #internetarchive.bak
12:40 🔗 beardicus hello, meat popsicles.
12:40 🔗 beardicus i think i need to register.
12:41 🔗 beardicus finally got a chance to update my scripts this morning... should be fscking shard1 right now.
12:41 🔗 beardicus but i am "out of touch"
12:41 🔗 ppiixx beardicus: i think you just need to run change-email in the iabak dir
12:42 🔗 beardicus hmm. i now see `register-helper.pl`... let's see what that does.
12:44 🔗 iabak-reg registrar master a378f40 other SHARD1/pubkeys registration of brian on SHARD1
12:44 🔗 iabak-reg registrar master dde25de other SHARD2/pubkeys registration of brian on SHARD2
12:45 🔗 Senji That seems to have worked
12:45 🔗 beardicus that was change-email that did it. thanks ppiixx
12:46 🔗 beardicus noting that prompt-email did nothing.
12:46 🔗 beardicus also noting that my system has neither systemd nor cron, so the script is a little complainy and i'll have to figure that out.
12:47 🔗 Senji No cron!
12:47 🔗 beardicus it's a synology nas.
12:47 🔗 beardicus there's a gui thingy for running tasks though, i think.
12:48 🔗 beardicus "Task Scheduler" yay.
12:49 🔗 beardicus assuming the correct periodic command to blip is iabak-cronjob? daily?
12:50 🔗 ppiixx yep
12:50 🔗 sankin has joined #internetarchive.bak
12:51 🔗 iabak-reg registrar master b95aab9 other SHARD6/pubkeys registration of olliejudge on SHARD6
12:56 🔗 lufix beardicus: My synology nas has cron, I believe?
12:57 🔗 beardicus hmm. crond does exist. no crontab though.
12:59 🔗 iabak-reg registrar master 69747f3 other SHARD6/pubkeys registration of atrus6 on SHARD6
13:30 🔗 lufix beardicus: Ah, I see :) http://www.multigesture.net/articles/how-to-use-cron-on-a-synology-nas/
13:30 🔗 lufix Might help
13:31 🔗 beardicus yeah. apparently you need to use tabs between fields too.
13:48 🔗 iabak-reg registrar master e1e0acf other SHARD6/pubkeys registration of jon archive.org on SHARD6
13:50 🔗 iabak-reg registrar master 629d09d other SHARD6/pubkeys registration of mail on SHARD6
13:55 🔗 Start has quit IRC (Disconnected.)
14:18 🔗 iabak-reg registrar master f8bf3c8 other SHARD6/pubkeys registration of chiploaded on SHARD6
14:34 🔗 zottelbey has quit IRC (Ping timeout: 512 seconds)
14:41 🔗 Start has joined #internetarchive.bak
14:46 🔗 ohhdemgir has joined #internetarchive.bak
14:52 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
14:55 🔗 iabak-reg registrar master 09b57ac other SHARD6/pubkeys registration of moritz.steiner on SHARD6
15:17 🔗 zottelbey has joined #internetarchive.bak
15:19 🔗 iabak-reg registrar master 3988cd2 other SHARD6/pubkeys registration of mariusz on SHARD6
15:20 🔗 beardicus has quit IRC (Sleep.)
15:20 🔗 primus104 has quit IRC (Leaving.)
15:28 🔗 iabak-reg registrar master e5f4b99 other SHARD6/pubkeys registration of iabackup on SHARD6
15:45 🔗 sep332 I second ivan`'s nomination of the google reader archive, even though the files are huge
15:50 🔗 Start has quit IRC (Disconnected.)
15:57 🔗 Start has joined #internetarchive.bak
16:00 🔗 scatman has joined #internetarchive.bak
16:01 🔗 Start has quit IRC (Client Quit)
16:12 🔗 mariusz has joined #internetarchive.bak
16:13 🔗 Zero_Dogg has joined #internetarchive.bak
16:14 🔗 iabak-reg registrar master 91c2b2a other SHARD6/pubkeys registration of archive.org on SHARD6
16:14 🔗 mariusz Hi. How can I register myself? :)
16:16 🔗 Zero_Dogg Your install-git-annex is a bit stupid, always defaulting to i386. There are standalone tarballs for arm too, that works on raspberry pi (which I'd be using if I set it up)
16:16 🔗 Start has joined #internetarchive.bak
16:18 🔗 Zero_Dogg https://github.com/zerodogg/scriptbucket/blob/master/gitannex-install#L52-L60 is an example of the logic needed
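Zero_Dogg's complaint, and Lord's later trouble on a Gentoo box without multilib, both come from always fetching the i386 build. The sketch below picks a tarball from the machine architecture instead. The tarball names and the machine-to-architecture table are assumptions, so check them against the git-annex standalone download directory. It is also not a port of the linked script.

```python
import platform

# Assumed standalone tarball architectures; verify against the download directory.
ARCH_BY_MACHINE = {
    "i386": "i386", "i486": "i386", "i586": "i386", "i686": "i386",
    "x86_64": "amd64", "amd64": "amd64",
    "armv6l": "armel", "armv7l": "armel",  # e.g. Raspberry Pi
}


def standalone_tarball(machine=None):
    """Pick a git-annex standalone tarball name for this (or the given) machine."""
    machine = (machine or platform.machine()).lower()
    try:
        arch = ARCH_BY_MACHINE[machine]
    except KeyError:
        raise RuntimeError(f"no standalone git-annex build known for {machine!r}") from None
    return f"git-annex-standalone-{arch}.tar.gz"
```

Failing loudly on an unknown architecture is safer than silently installing a binary that cannot run.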
16:23 🔗 yipdw do submit patches, we all run on i386/x86-64 and therefore haven't had a need to generalize
16:25 🔗 Zero_Dogg I will :)
16:26 🔗 Zero_Dogg Got some spare space that I might be able to use for this, but it's on a raspi server
16:26 🔗 SketchCow Zero_Dogg: So you have criticism and can't donate space!
16:26 🔗 SketchCow You... you came from Hackernews
16:27 🔗 Zero_Dogg does it require much cpu after the whole thing is downloaded (ie. does it git annex fsck, much)?
16:27 🔗 Zero_Dogg SketchCow: lol
16:28 🔗 Zero_Dogg SketchCow: I came from your blog, actually :p
16:28 🔗 SketchCow Oh, THAT dump
16:28 🔗 Zero_Dogg hah
16:30 🔗 yipdw http://upload.wikimedia.org/wikipedia/commons/5/55/Creature_from_the_Black_Lagoon_poster.jpg
16:39 🔗 SketchCow I will say, the Dogg is right in one regard - we should come up with some FAQ/information on how system intensive the ongoing holding of the data is.
16:43 🔗 Zero_Dogg See? All nice and constructive, complete with pull request. Not hackernewsy at all
16:45 🔗 Start has quit IRC (Disconnected.)
16:46 🔗 SketchCow That's the way we like it.
16:58 🔗 Lord has joined #internetarchive.bak
16:58 🔗 Lord hello
16:59 🔗 Lord i'm quite interested in this project (backing up the web backup :-) )
17:06 🔗 Lord i launched iabak, it downloaded some files and it failed
17:06 🔗 Lord i created a user without home so the script failed
17:06 🔗 Lord (maybe this info interest you)
17:08 🔗 Lord i think i'll face another problem : the script is downloading gitannex i386 but my gentoo doesn't have multilib support
17:08 🔗 Beardface has joined #internetarchive.bak
17:09 🔗 iabak-reg registrar master 256141c other SHARD6/pubkeys registration of lord-ia on SHARD6
17:10 🔗 Lord here i am :-)
17:10 🔗 Lord it works
17:10 🔗 Lord (with lots of setlocale errors)
17:14 🔗 sep332 can git-annex be configured to use a set amount of space? like 2TB?
17:19 🔗 mariusz sep332: git config annex.diskreserve 2000GB
17:20 🔗 sep332 that's not how much space it *won't* use?
17:23 🔗 primus104 has joined #internetarchive.bak
17:26 🔗 mariusz sep332: you're right:)
17:26 🔗 mariusz sep332: sorry
17:26 🔗 sep332 np. it's a cool idea just not what i'm looking for
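The mix-up above is between two different limits. `annex.diskreserve` keeps a minimum amount of free space on the disk, whereas sep332 wanted a cap on how much the annex itself may use. Below is a rough sketch of both checks, with hypothetical helper names and integer byte counts. The second function describes the wished-for behaviour, not a git-annex setting.

```python
def can_fetch(size, free, reserve):
    """Roughly annex.diskreserve: fetch only if `reserve` bytes remain free afterwards."""
    return free - size >= reserve


def within_budget(size, used_by_annex, budget):
    """What sep332 asked for instead: cap the annex's own usage at `budget` (hypothetical)."""
    return used_by_annex + size <= budget


GB = 10**9
print(can_fetch(size=500 * GB, free=3000 * GB, reserve=2000 * GB))
print(within_budget(500 * GB, 1800 * GB, 2000 * GB))
```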
17:29 🔗 SketchCow The new clients are causing some partying.
17:30 🔗 iabak-reg registrar master 3de6df9 other SHARD6/pubkeys registration of dylan.barlett on SHARD6
17:33 🔗 tpw_rules is that a problem? i'll bring pizza
17:33 🔗 tpw_rules i'm getting an error trying to sync to shard 6: error: Ref refs/heads/synced/git-annex is at 8c3a1a32ad19cc72f8429d7078dce8e9bc7e9e67 but expected e606f7f0f586c8f1504bd49cccb48d81dfa0a873
17:35 🔗 SketchCow No, it's not causing a problem at all. Just watching the activity.
17:35 🔗 SketchCow We also are getting dilettantes, which is good, because that's a worthwhile experiment.
17:36 🔗 SketchCow (People joining to fuck around and see what it does, then going "well that was fun" and disappearing, likely already, but certainly within the 2 week/4 week period)
17:38 🔗 SketchCow My theory is this will just cause a bunch of 0.00 clients, since people are unlikely to go "let's see what it does.... DOWNLOAD A TERABYTE"
17:39 🔗 iabak-reg registrar master b29843a other SHARD6/pubkeys registration of frozenbeardme on SHARD6
17:42 🔗 iabak-reg registrar master 5e39162 other SHARD6/pubkeys registration of ryan on SHARD6
17:46 🔗 Beardface it works!
17:48 🔗 SketchCow That's what we hope!
17:48 🔗 SketchCow How much space you got, Beardface!
17:48 🔗 * SketchCow rubs hands like Mr. Burns
17:48 🔗 Beardface ~1T so your last comment is kind of relevant, heh
17:49 🔗 hendi I currently have 1.5TB and think about adding some more
17:50 🔗 hendi should I do RAID1, or go without RAID, and just redownload when a drive fails?
17:50 🔗 Start has joined #internetarchive.bak
17:50 🔗 Beardface it checks for systemd to install a service, if not found it exits.. intentional? (when you start it again it installs a cron instead)
17:52 🔗 primus104 has quit IRC (Leaving.)
17:53 🔗 DFJustin hendi: there is already redundancy in the iabak software so that multiple people will get the same file, so local RAID is unnecessary
17:54 🔗 SketchCow agree with DFJustin - it's wasted space, unless you personally have an interest in a collection in a way you are making it available elsewhere.
17:58 🔗 hendi great, thank you
17:58 🔗 hendi expect at least 15TB from me, then
18:01 🔗 SketchCow Fantastic.
18:01 🔗 SketchCow That'll help a lot.
18:02 🔗 mariusz I plan on running iabak on more than one computer. Is the software smart enough to send me a different set of data to each one?
18:02 🔗 tpw_rules not yet
18:02 🔗 tpw_rules hendi: i've used mhddfs to attach a bunch of drives into one filesystem
18:02 🔗 tpw_rules the advantage being that if one drive breaks, it doesn't take everything
18:03 🔗 SketchCow mariusz: It's a worthwhile feature going forward for closure and db48x to consider, where machines are called buddies and they're treated as one machine.
18:04 🔗 mariusz Yeah, that would be great..
18:05 🔗 SketchCow This project keeps coming up with new feature-adds
18:06 🔗 mariusz another question. I have about 50 older drives ranging from 80GB to 1TB that I could manually plug-in once every few weeks. Is "cold storage" supported? If yes, any info how to go about this?
18:06 🔗 SketchCow Somebody in 4 years is going to go "Man, this closure guy thought of EVERYTHING...."
18:06 🔗 SketchCow mariusz: Not yet, in any meaningful way.
18:06 🔗 SketchCow I should say it's supported in git-annex, but we're being simple... for now.
18:06 🔗 tpw_rules mariusz: i solved that problem by getting a bunch of extremely cheap usb enclosures and attaching them as one
18:06 🔗 SketchCow Because of aforementioned discovery of "buddy" feature and similar features.
18:07 🔗 tpw_rules oh i had another idea: be able to set up an IPC socket that git annex will request downloads from so we can use something other than wget. i was thinking a pretty gui
18:08 🔗 tpw_rules or even just a download command
18:09 🔗 hendi tpw_rules, thanks for the hint, I'll have a look at mhddfs!
18:09 🔗 tpw_rules get 0.1.38 btw, the later version is a bit crashy. it's a union fs, but it supports writes too
18:13 🔗 DFJustin I think it does support cold storage as long as you check in at least once a month
18:13 🔗 mariusz another idea - sneakerneting the data. i.e get a beer and copy your shard :)
18:14 🔗 SketchCow It's a thought down the line.
18:15 🔗 SketchCow There's a second/third/fourth wave of approach as we hit the upper limits of just scooping people out and into the project.
18:15 🔗 SketchCow But it's still holding up for people going "Oh, yeah, got 10tb lying around."
18:15 🔗 SketchCow The main critical thing is to make sure we have chosen collections that aren't wasteful.
18:18 🔗 DFJustin I'm a little suspicious about some of the recent ones like wikipediadumps
18:20 🔗 tpw_rules (why is that plural? i thought one dump contained the entire history of everything)
18:21 🔗 DFJustin well there is more than one wikipedia (language)
18:25 🔗 tpw_rules oh, true
18:35 🔗 SketchCow I think wikipediadumps is on the edge.
18:35 🔗 SketchCow On the other hand, our collection of dumps goes WAY back farther than anyone.
18:35 🔗 SketchCow I did some of those, with lots of skeletons
18:36 🔗 SketchCow Like Erik Moller pro-childporn arguments that were quietly expunged when he became Wikipedia org dude
18:36 🔗 SketchCow Or Jimbo Wales getting into an argument with someone, and having a db admin remove the thing he said, and then going "I never said that."
18:36 🔗 SketchCow And who knows what else, down there.
18:36 🔗 SketchCow But year.
18:36 🔗 SketchCow Yeah.
18:36 🔗 SketchCow Maybe we need a nominations page.
18:37 🔗 SketchCow Yes.
18:37 🔗 SketchCow We do.
18:37 🔗 SketchCow OK, one moment.
18:37 🔗 Start has quit IRC (Read error: Connection reset by peer)
18:46 🔗 SketchCow http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK/nominations
18:49 🔗 Beardface are the shars a set size?
18:49 🔗 Beardface shards*
18:49 🔗 SketchCow The shards are a set number of files/items
18:50 🔗 SketchCow So imagine it's... 20
18:50 🔗 SketchCow 20 1m files mean tiny shard
18:50 🔗 SketchCow 20 1g files means fatty shard
18:50 🔗 SketchCow Once things are going into the upper reaches/echelons, we'll see cases where shards 100-1000 are all 1mb or some such chicanery
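SketchCow's explanation can be checked with a few lines of arithmetic. If every shard holds the same number of items, a shard's size in bytes depends entirely on which items land in it. The 20-item figure is his illustrative number, not the real shard size, and the helper is hypothetical.

```python
def shard_sizes(item_sizes, items_per_shard):
    """Split items into shards of a fixed item count; return each shard's total bytes."""
    return [
        sum(item_sizes[i:i + items_per_shard])
        for i in range(0, len(item_sizes), items_per_shard)
    ]


MB, GB = 10**6, 10**9
# 20 small items, then 20 large ones: same item count per shard, very different sizes
print(shard_sizes([1 * MB] * 20 + [1 * GB] * 20, 20))  # [20000000, 20000000000]
```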
18:51 🔗 Beardface ahh
18:53 🔗 iabak-reg registrar master a1cadb4 other SHARD6/pubkeys registration of mmein301+i on SHARD6
18:53 🔗 SketchCow Like, 80% of the work going on now is to find and deal with use cases and bugs/contingencies they reveal.
18:53 🔗 SketchCow 10% is improving the UI and interaction
18:54 🔗 SketchCow 10% is filler, primarily melted hooves and horns
19:00 🔗 iabak-reg registrar master 95ae19f other SHARD6/pubkeys registration of nico+iabak on SHARD6
19:01 🔗 mariusz btw. does any one knows why git config annex.web-options=--limit-rate=200k returns "invalid key" error?
19:01 🔗 hendi If I want to run iabak on multiple machines, should I copy the private key over for accounting and stuff, or use a new one on each machine?
19:08 🔗 iabak-reg registrar master b6d2ef7 other SHARD6/pubkeys registration of cyberjacob+IA on SHARD6
19:08 🔗 closure mariusz: I think git config doesn't want the = there
19:08 🔗 closure hendi: use a new one
19:12 🔗 mariusz closure: that worked. thanks. probably would be good
19:13 🔗 mariusz to change the README
19:13 🔗 mariusz ;)
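closure's diagnosis fits git's naming rules. `git config` takes the key and the value as separate arguments (`git config annex.web-options --limit-rate=200k`), so `annex.web-options=--limit-rate=200k` is read as one key. That key fails validation because variable names may only contain alphanumerics and `-` and must start with a letter. The checker below is a rough, hypothetical sketch of those rules, not git's own validation code.

```python
import re

# git config variable names: start with a letter, then alphanumerics or '-'
_NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*")
_SECTION = re.compile(r"[A-Za-z0-9-]+")


def valid_config_key(key):
    """Rough check of a `section[.subsection].name` key as git config sees it."""
    if "." not in key:
        return False
    section_part, name = key.rsplit(".", 1)
    section = section_part.split(".", 1)[0]  # anything after the first dot is a subsection
    return bool(_SECTION.fullmatch(section) and _NAME.fullmatch(name))


print(valid_config_key("annex.web-options"))                    # True
print(valid_config_key("annex.web-options=--limit-rate=200k"))  # False
```

The working call, as an argument list for `subprocess.run`, is `["git", "config", "annex.web-options", "--limit-rate=200k"]`.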
19:13 🔗 Senji Glad to see I'm not the only person using a foo+bar address :)
19:14 🔗 iabak-reg registrar master 1d18796 other SHARD7/pubkeys registration of mail on SHARD7
19:15 🔗 tpw_rules error: Ref refs/heads/synced/git-annex is at 38179c6b1a70682556e88bf6d5c94187cdaabaac but expected ffeac6bf96c796c6117981d2ee64fc642edbaa01
19:16 🔗 tpw_rules i am still getting sync problems like that
19:16 🔗 tpw_rules i don't have more than one iabak script running
19:18 🔗 closure well, that can happen if someone else pushed a change at the same time. It should normally clear up the next time, unless you're unlucky
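Because the "Ref ... is at X but expected Y" error is a race that usually clears on the next attempt, a periodic job should retry instead of giving up. As tpw_rules reports later in this log, one failed sync may have stopped iabak-hourly. The iabak scripts are shell, so the Python below is only a hypothetical analogue with made-up names and delays.

```python
import subprocess
import time


def sync_with_retry(sync, attempts=3, delay=60.0, sleep=time.sleep):
    """Call `sync` until it succeeds, waiting longer after each failure.

    `sync` is expected to raise subprocess.CalledProcessError on failure,
    as subprocess.run(..., check=True) does. The last failure is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return sync()
        except subprocess.CalledProcessError:
            if attempt == attempts:
                raise
            sleep(delay * attempt)  # back off a bit more after each failure
```

Usage from inside a shard checkout: `sync_with_retry(lambda: subprocess.run(["git", "annex", "sync"], check=True))`.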
19:24 🔗 iabak-reg registrar master f760a1f other SHARD7/pubkeys registration of cyberjacob+IA on SHARD7
19:28 🔗 Start has joined #internetarchive.bak
19:30 🔗 Start has quit IRC (Client Quit)
19:31 🔗 iabak-reg registrar master c7d5ea9 other SHARD6/pubkeys registration of eskild on SHARD6
19:48 🔗 beardicus has joined #internetarchive.bak
19:50 🔗 iabak-reg registrar master 40e077c other SHARD7/pubkeys registration of moritz.steiner on SHARD7
19:50 🔗 primus104 has joined #internetarchive.bak
19:51 🔗 Zero_Dogg /win 20
19:52 🔗 Zero_Dogg bah
19:53 🔗 sep332 where are the authorized_keys files again? I think I'm missing from shard1
19:54 🔗 atomotic has joined #internetarchive.bak
19:57 🔗 closure .git/annex/id_rsa and id_rsa.pub
19:58 🔗 CyberJaco has joined #internetarchive.bak
19:58 🔗 CyberJaco Hi
19:59 🔗 sep332 i have id_rsa but I'm getting Permission denied (publickey) when i sync
19:59 🔗 sep332 hi CyberJaco
19:59 🔗 iabak-reg registrar master dd3be51 other SHARD6/pubkeys registration of hannson on SHARD6
20:00 🔗 CyberJaco that's weird, why is the last letter of my name missing...
20:00 🔗 closure sep332: sounds like the wrong key, we have separate sets of keys for each shard, so shard1 may not have the pubkey you're using for other shards
20:00 🔗 sep332 looks like an 8-char limit?
20:01 🔗 sep332 closure: can i register a new one?
20:01 🔗 closure sep332: manually, yes..
20:01 🔗 closure ./register-helper.pl "$SHARD" "$uuid" "$registrationemail" "$(cat id_rsa.pub)"
20:01 🔗 closure fill in the bits, that will give an url you can hit to register
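To fill in closure's manual registration call, take the shard name, the repository's `annex.uuid` (which closure confirms below is the right UUID), the registration email, and the contents of `.git/annex/id_rsa.pub`. The helper below is a hypothetical sketch that only assembles the argument list. The exact shard spelling (for example `SHARD1`) is an assumption.

```python
from pathlib import Path


def register_command(shard, repo_uuid, email, pubkey_path):
    """Assemble the register-helper.pl call with the public key read from disk."""
    # rstrip("\n") mirrors how the shell's $(cat id_rsa.pub) drops trailing newlines
    pubkey = Path(pubkey_path).read_text().rstrip("\n")
    return ["./register-helper.pl", shard, repo_uuid, email, pubkey]
```

Run the resulting list with `subprocess.run` from the IA.BAK checkout, then open the URL it prints.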
20:05 🔗 mariusz has quit IRC (Read error: Operation timed out)
20:05 🔗 closure sep332: or, I can manually add it
20:07 🔗 iabak-reg registrar master f36f056 other SHARD1/pubkeys registration of sean.palmer on SHARD1
20:08 🔗 iabak-reg registrar master fd09178 other SHARD6/pubkeys registration of brian on SHARD6
20:09 🔗 sep332 closure: is it the [annex] uuid or the [remote "origin"] annex-uuid?
20:10 🔗 iabak-reg registrar master 84567e0 other SHARD6/pubkeys registration of steven.m.reed on SHARD6
20:11 🔗 SketchCow SHARD6 does the climb
20:11 🔗 closure sep332: the annex.uuid
20:17 🔗 sep332 closure: that's what i put in, same error
20:17 🔗 sep332 i'm using the same key for all shards
20:18 🔗 closure check perms of your id_rsa file
20:19 🔗 sep332 -rw-------
20:24 🔗 closure sep332: check if shard1's git config has remote.origin.annex-ssh-options set
20:27 🔗 sep332 nope. i'll just copy it from another shard
20:28 🔗 closure well, not the whole config, just that setting
20:28 🔗 sep332 yeah
20:29 🔗 sep332 alright it's working. thanks closure
20:42 🔗 CyberJaco is now known as zz_CyberJ
20:44 🔗 iabak-reg registrar master 4de5003 other SHARD6/pubkeys registration of mariusz on SHARD6
20:46 🔗 sankin has quit IRC (Leaving.)
20:50 🔗 SketchCow awww yes here comes mariusz
20:56 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
21:01 🔗 xhdr has joined #internetarchive.bak
21:10 🔗 tpw_rules closure: i think the sync error may have killed the iabak-hourly process. it's been a couple hours since that error and it hasn't tried again
21:21 🔗 closure indeed, that could happen
21:22 🔗 laxity has joined #internetarchive.bak
21:29 🔗 lhobas_ closure: noticed that the cleanup function in the iabak script doesn't work on OS X (more stupid Mac-only glitches I assume) - https://github.com/ArchiveTeam/IA.BAK/blob/a420ad/iabak-helper#L282 throws "No such file or directory" (pid does exist, statement should eval to true)
21:29 🔗 lhobas_ any clue what could cause that?
21:29 🔗 lhobas_ (#L285 in that file seems off to me, think the file is supposed to be rm'ed, not the pid-number right?)
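lhobas_'s reading of the cleanup bug (removing the PID number rather than the pidfile) can be shown with a small Python analogue. The actual iabak-helper code isn't reproduced in this log, so the function names, the pidfile format and the "remove only if ours or stale" policy are all assumptions. This does not explain the OS X "No such file or directory" error.

```python
import os


def read_pid(pidfile):
    """Return the positive PID stored in pidfile, or None if missing or unreadable."""
    try:
        with open(pidfile, encoding="ascii") as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return None
    return pid if pid > 0 else None


def pid_is_running(pid):
    """Signal 0 checks that the process exists without actually signalling it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def cleanup(pidfile):
    """Remove the pidfile itself (not the number inside it) if it is ours or stale."""
    pid = read_pid(pidfile)
    if pid is None or pid == os.getpid() or not pid_is_running(pid):
        try:
            os.remove(pidfile)
        except FileNotFoundError:
            pass
```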
21:34 🔗 iabak-reg registrar master e16b5ee other SHARD6/pubkeys registration of matt on SHARD6
22:12 🔗 db48x I should have gone ahead and fixed that last night when you mentioned it
22:12 🔗 db48x couldn't sleep anyway
22:15 🔗 tpw_rules do you not love me
22:16 🔗 closure SketchCow on npr, eep
22:23 🔗 iabak-reg registrar master 93b954b other SHARD6/pubkeys registration of carl.moden on SHARD6
22:39 🔗 mariusz has joined #internetarchive.bak
23:05 🔗 Atluxity has quit IRC (Ping timeout: 360 seconds)
23:10 🔗 Atluxity has joined #internetarchive.bak
23:12 🔗 iabak-reg registrar master 240d6ac other SHARD6/pubkeys registration of paul.chambers on SHARD6
23:18 🔗 beardicus has quit IRC (Quit: Sleep.)
23:19 🔗 Atluxity has quit IRC (Ping timeout: 360 seconds)
23:23 🔗 Start has joined #internetarchive.bak
23:26 🔗 Atluxity has joined #internetarchive.bak
23:27 🔗 beardicus has joined #internetarchive.bak
23:31 🔗 db48x closure: ah, an interesting clue
23:32 🔗 db48x so where is the ffi function that calls CreateProcess?
23:36 🔗 db48x I'm looking at http://hackage.haskell.org/package/process-1.2.3.0/docs/src/System-Process.html#createProcess, but I don't see where it actually calls the win32 api...
23:37 🔗 Atluxity has quit IRC (Ping timeout: 360 seconds)
23:38 🔗 Atluxity has joined #internetarchive.bak
23:45 🔗 closure db48x: oh, I just nailed that problem
23:45 🔗 zottelbey has quit IRC (Quit: Leaving)
23:46 🔗 closure writing the PR for the library that needs changes now..
23:46 🔗 closure you were in the right place, but it has a side of C files :)
23:49 🔗 Atluxity has quit IRC (Ping timeout: 360 seconds)
23:52 🔗 iabak-reg registrar master de4a487 other SHARD6/pubkeys registration of iabak on SHARD6
23:53 🔗 db48x ah, good
23:53 🔗 db48x is https://github.com/haskell/process/blob/master/System/Process/Internals.hs#L414 closer?
23:54 🔗 db48x ah, https://github.com/haskell/process/blob/master/cbits/runProcess.c#L557
23:56 🔗 db48x presumably you're adding a way to add that flag where that's called here: https://github.com/haskell/process/blob/master/System/Process/Internals.hs#L452
23:57 🔗 closure that's the plan, but I'm actually watching an australian cooking show :P
23:57 🔗 closure feel free to send patch to https://github.com/haskell/process/issues/32
23:58 🔗 Atluxity has joined #internetarchive.bak
