#internetarchive.bak 2016-11-09,Wed

Time Nickname Message
00:12 🔗 Start has joined #internetarchive.bak
00:57 🔗 Lord_Nigh has quit IRC (Read error: Operation timed out)
01:01 🔗 Lord_Nigh has joined #internetarchive.bak
01:26 🔗 Lord_Nigh has quit IRC (Ping timeout: 633 seconds)
01:49 🔗 Lord_Nigh has joined #internetarchive.bak
01:52 🔗 Lord_Nigh has quit IRC (Read error: Operation timed out)
02:00 🔗 Lord_Nigh has joined #internetarchive.bak
02:06 🔗 Lord_Nigh has quit IRC (Ping timeout: 250 seconds)
02:09 🔗 Lord_Nigh has joined #internetarchive.bak
02:14 🔗 Lord_Nigh has quit IRC (Ping timeout: 244 seconds)
02:33 🔗 Lord_Nigh has joined #internetarchive.bak
03:21 🔗 Lord_Nigh has quit IRC (Read error: Operation timed out)
03:30 🔗 Lord_Nigh has joined #internetarchive.bak
05:09 🔗 Blackout has joined #internetarchive.bak
05:35 🔗 kyan has quit IRC (Quit: Leaving)
07:21 🔗 SketchCow WHERE
07:21 🔗 SketchCow ARE
07:21 🔗 SketchCow THE
07:21 🔗 SketchCow SHARDMASTERS
07:22 🔗 SketchCow I need you to work with closure. I need you to start assigning items to shards
07:22 🔗 bwn would itemlists from census be helpful?
07:22 🔗 SketchCow Somewhat
07:22 🔗 bwn https://archive.org/download/archiveteam_census_2016
07:22 🔗 SketchCow We need to work on these tomorrow
07:26 🔗 vitzli has joined #internetarchive.bak
07:28 🔗 SketchCow Tomorrow
07:28 🔗 SketchCow We appointed three shardmasters. I expect to hear from them tomorrow or I will find replacements.
07:28 🔗 SketchCow The three shardmasters are HCross2 Kaz and Jess
07:28 🔗 SketchCow JesseW
07:29 🔗 SketchCow Tomorrow or I move faster
07:32 🔗 HCross2 Here I am
07:34 🔗 SketchCow Did you get credentials from Closure to begin assigning shard sets
07:35 🔗 HCross2 I haven't
07:35 🔗 SketchCow We need you to do that.
07:35 🔗 SketchCow And then, just start working on these. It's file based, not items based.
07:36 🔗 SketchCow Use the Wiki or Google Docs to make them, if you have to.
07:36 🔗 SketchCow I will contribute all the time needed to suggest collections of higher priority
07:36 🔗 SketchCow I will also begin talking behind the scenes about how to handle web grabs (likely by making encrypted/password protected chunks)
07:39 🔗 HCross2 Will do. I'll go over all the documents now
07:45 🔗 HCross2 closure: 15 mins from work now. When I get in, I'll send you an SSH key
07:53 🔗 SketchCow Good.
07:53 🔗 SketchCow I think I should start a slack too
08:05 🔗 SketchCow Didn't want to wait. Slack created.
08:05 🔗 SketchCow Slacks that are free are always a pain in the ass. I am inviting the shardmasters, closure and then in the future we will use it to reach out to people who have access to a lot of disk space but just don't deal with IRC as much as slack
08:17 🔗 atomotic has joined #internetarchive.bak
08:17 🔗 SketchCow So please coordinate with closure when he wakes (I can e-mail him if he's not checking IRC) and we can begin designing shards, and then I will make a call out to a set of people to help back things up
08:19 🔗 SketchCow But we have 12 petabytes to coordinate and we should get on that hardcore
08:19 🔗 Senji Ugg. Been away; and a number of my bits of shards have gone offline and expired. I'll get them back online over the rest of the week.
08:19 🔗 SketchCow Please do
08:19 🔗 Senji 12PB is going to take a lot of volunteers
08:19 🔗 SketchCow I also want us to please create documentation for people to read and re-read as needed to keep track.
08:19 🔗 SketchCow Perhaps a readthedocs
08:44 🔗 ivan has joined #internetarchive.bak
08:44 🔗 zhongfu has joined #internetarchive.bak
10:09 🔗 kurt has joined #internetarchive.bak
10:10 🔗 kurt Closure idle for 11 days, doesn't look promising
11:03 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
11:33 🔗 atomotic has joined #internetarchive.bak
11:48 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
12:42 🔗 atomotic has joined #internetarchive.bak
12:49 🔗 VADemon has joined #internetarchive.bak
14:13 🔗 VADemon has quit IRC (Read error: Operation timed out)
14:16 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
14:20 🔗 Deewiant has joined #internetarchive.bak
14:31 🔗 vitzli has quit IRC (Quit: Leaving)
14:56 🔗 atomotic has joined #internetarchive.bak
14:58 🔗 Atom has joined #internetarchive.bak
15:27 🔗 Start has quit IRC (Quit: Disconnected.)
16:08 🔗 SketchCow We'll deal.
16:09 🔗 SketchCow Kaz and I'll find JesseW
16:19 🔗 SketchCow ---------------------------------------------
16:19 🔗 SketchCow Who in this channel can step forward with help with client-coding or configuring
16:19 🔗 SketchCow ---------------------------------------------
16:31 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
16:40 🔗 computerf @SketchCow: I can help out after this weekend. Is there a list of things that need to be done? Not seeing one on the IA.BAK wiki page
16:42 🔗 MrRadar SketchCow: I could probably help (well, once I finish recovering from the shock of President Trump)
16:42 🔗 MrRadar What needs to be done?
16:45 🔗 SketchCow The whole IA.BAK project wasn't mothballed, but it was in "see it running" mode
16:45 🔗 SketchCow Now it is not.
16:46 🔗 SketchCow What I want is a group of people willing to step in and talk with people who have disk space, to help them get on the project
16:47 🔗 SketchCow Brewster and I chatted. He is tacitly fine with this.
16:52 🔗 computerf Ok, so recruiting people in. What is there to do software/infrastructure-wise? I'm not the best "people person"...
16:52 🔗 MrRadar Ditto
16:52 🔗 SketchCow I'm a people person.
16:53 🔗 SketchCow :)
16:53 🔗 Kaz I'm a human
16:53 🔗 SketchCow Kaz. Shardmastering. Need you on it stat.
16:53 🔗 SketchCow ping me an e-mail address
16:53 🔗 Kaz yes
16:53 🔗 Kaz but Closure
16:54 🔗 SketchCow I can reach Closure.
16:54 🔗 Kaz okay
16:54 🔗 SketchCow Ping me an e-mail.
16:54 🔗 SketchCow I need to start assembling people with disk space. 50tb folks.
16:54 🔗 Kaz just need my pubkey?
16:54 🔗 SketchCow No, I am not doing that. I need your e-mail so I can have you on the slack
16:55 🔗 Kaz iabak"kurtmclester.com
16:55 🔗 Kaz bloody keyboard layout
16:55 🔗 Kaz iabak@kurtmclester.com
16:55 🔗 SketchCow Invited
16:58 🔗 computerf SketchCow: do we aim for fewer people with lots of storage or more people with less though? We would need >600 people each with 50TB to back up the 30PB archive just once, so easily over 1500 people to give most items triple redundancy.
16:58 🔗 computerf All with 50TB
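The headcount estimate above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming decimal units (1 PB = 1000 TB) and that every volunteer contributes the same amount of space; the function name `volunteers_needed` is hypothetical and not part of the IA.BAK code.

```python
import math

TB_PER_PB = 1000  # assumption: decimal units, matching the round figures quoted in the channel


def volunteers_needed(archive_pb, per_volunteer_tb, copies=1):
    """Return how many equal-sized contributors are needed to hold `copies` full copies."""
    total_tb = archive_pb * TB_PER_PB * copies
    return math.ceil(total_tb / per_volunteer_tb)


print(volunteers_needed(30, 50))            # one copy of 30PB at 50TB each
print(volunteers_needed(30, 50, copies=3))  # triple redundancy
print(volunteers_needed(12, 50, copies=3))  # the 12PB figure mentioned earlier in the channel
```

Under these assumptions a single copy of 30PB needs 600 contributors and three copies need 1800, so "easily over 1500" holds.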
16:58 🔗 SketchCow Several things.
16:59 🔗 SketchCow First, it's not 30pb
16:59 🔗 SketchCow It's more like.... 12 public facing, 15 wayback
16:59 🔗 SketchCow We're going after public facing initially
16:59 🔗 SketchCow Second, I agree, this is relatively difficult to aim for
17:00 🔗 SketchCow Luckily, there's material that we can skip over
17:00 🔗 SketchCow Hence Shardmasters, and not just start at AAAAAAA.txt (0000000.txt depending on your system) and moving forward
17:02 🔗 SketchCow Examples of materials we can skip over: duplicates of television shows, spam
17:04 🔗 computerf Ok, fair points. Even with that in mind though, let's assume that there's 8PB of stuff we want to have triple redundancy on. That's still ~500 people with 50TB each. That's more in the realm of plausibility, but my main point is I think it would be more worthwhile to try and get a lot of people with just like a couple 2TB external HDDs or whatever rather than focus on people with huge disk arrays. More likely (IMHO) that we could get the
17:04 🔗 computerf amount of storage needed that way.
17:04 🔗 SketchCow Yes, but
17:05 🔗 SketchCow You do realize it's possible to SEEK OUT group A while ALSO SEEKING OUT group B
17:05 🔗 SketchCow Group A preferred, Group B nice
17:05 🔗 SketchCow Group B also comes with a lot more support needs
17:05 🔗 SketchCow Oh no my drive broke, oh no why does it not sync
17:05 🔗 SketchCow Hence I am trying to build an actual support structure this time.
17:06 🔗 antomatic But saying '50tb minimum' may be a useful "you must be this tall" measure - people who are more able to offer a tiny amount of space may also be more likely to churn out, disappear, get lost, etc. Whereas someone standing up 50tb is doing so for a reason and is (hopefully) less likely to disappear on a whim.
17:06 🔗 computerf ... which is something we're going to have to deal with in the long run anyways
17:06 🔗 computerf Yes I just don't see a way to even get close to enough storage if we set the min to like 50TB
17:06 🔗 SketchCow Please stand over here
17:06 🔗 SketchCow Next to the group of people who told me my projects seemed unrealistically attainable
17:08 🔗 computerf Look, I'm not saying that the whole thing is unrealistic at all. Just that setting the bar so high I think will diminish the likelihood of it happening.
17:08 🔗 computerf Please prove me wrong though
17:08 🔗 computerf I would love to see it
17:12 🔗 SketchCow On it
17:12 🔗 computerf Anyways, back to the original question: what needs to be done software-wise?
17:17 🔗 SketchCow Our client for IA.BAK can use refinement/flexibility for a download page.
17:17 🔗 SketchCow So the time from "find this" to "install" is as short as the Warrior.
17:18 🔗 SketchCow Docs writer coming in.
17:20 🔗 cmaldonad has joined #internetarchive.bak
17:20 🔗 SketchCow Hello, cmaldonad
17:20 🔗 SketchCow <--- Jason
17:20 🔗 cmaldonad hi SketchCow
17:20 🔗 SketchCow Website: http://iabak.archiveteam.org/
17:20 🔗 cmaldonad ---> kami here
17:20 🔗 cmaldonad reading that
17:21 🔗 SketchCow Wikipage: http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK
17:23 🔗 cmaldonad I see an additional concern that is not listed
17:24 🔗 SketchCow Always up for hearing it
17:24 🔗 cmaldonad Periodical Restore Rehearsal events
17:24 🔗 SketchCow That's actually in there, but not listed
17:24 🔗 cmaldonad otherwise there's no way that your restore plans are useful/adequate
17:24 🔗 cmaldonad ok
17:24 🔗 SketchCow Sorry, in there in a "built into the system"
17:24 🔗 cmaldonad I get the idea of the project
17:24 🔗 cmaldonad ok
17:25 🔗 SketchCow So, we have a client people install, and I'd like to begin a documentation set related to it, to help ramp.
17:25 🔗 cmaldonad ok
17:26 🔗 SketchCow Anything written, put in a public place (google docs, or the wiki) to begin to build that framework
17:26 🔗 SketchCow The people we'
17:26 🔗 cmaldonad it's important to have an idea of the audience
17:26 🔗 SketchCow The people we're dealing with initially will be comfortable but we quickly move to a situation where people who are more "just what do I type" come in
17:26 🔗 SketchCow Initially, Unix nerds, then ultimately, it's a client in windows and other systems that people are plugging removable hard drives into
17:27 🔗 cmaldonad I am a Sys Admin, but I've written docs for entry level staff, so I consider myself useful for either audience (high or low technical skill level)
17:27 🔗 SketchCow The beginning of a framework based on what's written should work. I can answer questions as can others.
17:27 🔗 SketchCow If you find gaps or want links, we can help
17:29 🔗 cmaldonad I am reading this http://tracker.archiveteam.org at the moment
17:30 🔗 SketchCow You got it
17:30 🔗 SketchCow Tracker is currently a separate project but worth seeing since it came from the same people.
17:32 🔗 cmaldonad and now I jumped to this http://git-annex.branchable.com
17:41 🔗 HCross2 Can I see an example shard please, so I can get an idea on what to do?
17:41 🔗 HCross2 I'm also happy if people have servers/space and want me to configure it all
17:48 🔗 cmaldonad "A script can do this using the git annex fromkey and git annex registerurl commands. Time to make such a repository with 100k files is in the 10 minute range (faster on SSD or ramdisk)."
17:48 🔗 cmaldonad example values for this section would help
17:59 🔗 db48x HCross2: an example shard is http://iabak.archiveteam.org/SHARD1.html
17:59 🔗 HCross2 I meant the actual contents of the shard file
18:01 🔗 db48x it's not a file, it's a git repository
18:01 🔗 db48x to create the repository we first make a list of collections, then use a script to enumerate their contents, adding each item in the collection to the repository
18:04 🔗 db48x https://github.com/ArchiveTeam/IA.BAK/blob/server/mkSHARD
18:05 🔗 yipdw at some point I will need to make sure that the ia.bak code will run on FreeBSD (since that's where all my storage is), so I will try to get some time in to look at the code
18:08 🔗 HCross2 Thanks db48x
18:09 🔗 db48x you're welcome
18:09 🔗 HCross2 I'll see about writing up a set of instructions on how to create shards
18:19 🔗 db48x we aimed to have about 100,000 files adding up to between 2 and 5 TB in each shard
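The sizing target above (about 100,000 files and at most 5 TB per shard) can be illustrated with a simple greedy grouping. This is only a sketch of the idea, not the mkSHARD script linked earlier: the names `Shard` and `pack_items`, the input tuple layout, and the choice to cut a new shard as soon as either limit would be exceeded are all assumptions.

```python
from dataclasses import dataclass, field

MAX_FILES = 100_000      # target file count per shard, from the discussion
MAX_BYTES = 5 * 10**12   # upper end of the 2-5 TB range, in decimal bytes (assumption)


@dataclass
class Shard:
    items: list = field(default_factory=list)
    files: int = 0
    size: int = 0


def pack_items(items, max_files=MAX_FILES, max_bytes=MAX_BYTES):
    """Group (identifier, file_count, total_bytes) tuples into shards, in input order."""
    shards = []
    for ident, nfiles, nbytes in items:
        cur = shards[-1] if shards else None
        # Start a new shard when there is none yet or when adding this item would exceed a limit.
        if cur is None or cur.files + nfiles > max_files or cur.size + nbytes > max_bytes:
            cur = Shard()
            shards.append(cur)
        cur.items.append(ident)
        cur.files += nfiles
        cur.size += nbytes
    return shards


# Made-up item identifiers and sizes, for illustration only.
example = [
    ("item-a", 60_000, 1 * 10**12),
    ("item-b", 50_000, 1 * 10**12),
    ("item-c", 10, 45 * 10**11),
]
for n, shard in enumerate(pack_items(example), start=1):
    print(n, shard.items, shard.files, shard.size)
```

An item that is larger than either limit on its own still gets a shard to itself here; a real tool would probably need to handle that case explicitly.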
18:25 🔗 SketchCow I need a secondary/majordomo/co-organizer for this project.
18:25 🔗 SketchCow Someone who is also on here a lot and can help answer so stuff doesn't linger.
18:35 🔗 yipdw I guess I can do that; I've run this code before
18:37 🔗 Meroje someone with write access to wiki, can you add CGI as perl dependency ?
18:38 🔗 SketchCow Mostly, I want, as a metric, for valid questions in this channel to be answered in 15 minutes if possible.
18:38 🔗 SketchCow If it takes this being a big priority, I get it. I just don't want things lingering.
18:38 🔗 SketchCow For example: Meroje: No.
18:38 🔗 SketchCow See? I got back to them in 60 seconds.
18:38 🔗 Meroje great
18:40 🔗 SketchCow cmaldonad: Please mail me at jason@textfiles.com if you run into issues with the framework
18:40 🔗 SketchCow My schedule: Broadway show tonight, travel to DC tomorrow, working in warehouses for 3 days, back up
18:44 🔗 db48x Meroje: why don't you have write access to the wiki?
18:45 🔗 cmaldonad SketchCow, will do
18:46 🔗 cmaldonad testing
18:46 🔗 cmaldonad ok, timestamps enabled here
18:47 🔗 SketchCow Yes.
18:52 🔗 dfboyd has joined #internetarchive.bak
18:58 🔗 SketchCow A contributor with TB is coming on here soon, we can work with him and see how the onboarding is
19:00 🔗 cmaldonad TB ?
19:02 🔗 SketchCow Terabytes
19:02 🔗 SketchCow And Tuberculosis
19:02 🔗 SketchCow A user with both disk space and a debilitating lung disease
19:03 🔗 sep332 Johnny Pneumonic
19:04 🔗 dfboyd In case it comes up: I ran the numbers on Amazon Glacier. It would take 430 Snowball servers to move 21P; stored in Amazon Glacier it would cost $154,140.67 a month in us-east-1 or their other less-expensive clusters.
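The Glacier figures above can be reproduced with a short calculation. This is a sketch under assumptions that are not stated in the channel: binary units (1 PB = 1024 TB, 1 TB = 1024 GB), 50 TB of capacity per Snowball, and a storage price of $0.007 per GB per month. Those values were chosen because they reproduce the quoted numbers; they are not taken from a price list here.

```python
TB_PER_PB = 1024            # assumption: binary units
GB_PER_TB = 1024            # assumption: binary units
SNOWBALL_TB = 50            # assumption: capacity of one Snowball device
PRICE_PER_GB_MONTH = 0.007  # assumption: USD per GB per month


def snowballs_for(archive_pb):
    """Return the (fractional) number of Snowball devices needed to carry the data."""
    return archive_pb * TB_PER_PB / SNOWBALL_TB


def monthly_storage_cost(archive_pb):
    """Return the monthly storage bill in USD at the assumed flat price."""
    return archive_pb * TB_PER_PB * GB_PER_TB * PRICE_PER_GB_MONTH


print(f"{snowballs_for(21):.2f} Snowballs")
print(f"${monthly_storage_cost(21):,.2f} per month")
```

Over a year, the same assumptions come to roughly $1.85 million for storage alone, before any transfer or retrieval charges.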
19:05 🔗 SketchCow Those numbers were run some time ago
19:05 🔗 SketchCow But agreed, we found it not workable
19:05 🔗 SketchCow Even with Glacier
19:05 🔗 SketchCow dfboyd and cmaldonad - Docs
19:05 🔗 SketchCow cmaldonad: dfboyd has stepped forward to run second if you need verbiage or research
19:06 🔗 cmaldonad thanks
19:06 🔗 cmaldonad but do we have a list of pending documents to write, or should I make a decision as we determine what is needed as we get new people contributing space and generating questions?
19:07 🔗 db48x cmaldonad: we're not so organized that we have a list of documents that are yet to be written
19:08 🔗 db48x I guess you could add it to the list of documents to write
19:08 🔗 Frogging I can give a few TB
19:09 🔗 SketchCow I think the priority is "Someone wanders in from the street with a pile of drive space and an existential fear for the archive's data"
19:17 🔗 cmaldonad We could start with a "Jumpstart to your own shard"
19:17 🔗 cmaldonad and there's also something else that comes to mind
19:18 🔗 cmaldonad like a Matrix that would allow people to know how they can best contribute with whatever space available they have
19:18 🔗 cmaldonad but I need to read more about tech details on that
19:18 🔗 cmaldonad the Jumpstart is a good way to start
19:22 🔗 db48x sounds good
19:22 🔗 db48x ask me questions and I'll answer them
19:25 🔗 Kksmkrn has joined #internetarchive.bak
19:29 🔗 kyan has joined #internetarchive.bak
19:33 🔗 cmaldonad ok db48x, thanks
19:33 🔗 cmaldonad I am afk to make lunch
19:37 🔗 dfboyd If you assume the average space volunteer has 1TB, then you need 21,000 of them just to have 1x coverage. You probably want 3x coverage: 60,000 people. Suppose the average contributor is able to drop $500 and get 10 x 1TB hard drives, then great, you only need 6000 people?
19:38 🔗 db48x yea, it's a problem
19:38 🔗 SketchCow This "I ran the numbers guyz" thing is adorable
19:38 🔗 SketchCow I'll work on having a cohesive response
19:39 🔗 dfboyd Which means not just a few dedicated volunteers, it means a mass volunteer thing; you need not just hackers and hobbyist engineers, you need retirees and moms and church groups or whatever?
19:39 🔗 db48x what I really "want" to do is write a nice windows desktop application, to make adoption easier
19:39 🔗 db48x but "want" and "windows desktop app" don't really go together
19:39 🔗 dfboyd As long as you're thinking about it already, I won't keep going on about it. You have fingers, you can do arithmetic.
19:39 🔗 cmaldonad this should be as easy as "Seti@Home" was
19:40 🔗 SketchCow A very cohesive response
19:40 🔗 db48x cmaldonad: agreed
19:40 🔗 cmaldonad I know it's not the current status, but that's one of the biggest distributed projects that had success
19:41 🔗 yipdw it would be nice too if it were somehow made clear that this isn't a theoretical thing; there's backups out there right now and the point now is to get more
19:42 🔗 cmaldonad uhm
19:43 🔗 cmaldonad dfboyd, is your programming background in Windows or plain C (Unix/Linux variants)?
19:43 🔗 dfboyd My only other idea that I want to ask about is: suppose the end-user just needs to do the following: 1. download a program of some kind and run it on their PC; they just have to tell it how much storage it's allowed to use. 2. What the program does is, behaves like an HDFS DataNode or a GFS chunkserver: it just checks in to the master and says, "I have XX GB available". 3. The master saves a collection of data blocks to that client; every so oft
19:43 🔗 yipdw that's what the current client does
19:43 🔗 dfboyd I am plain C (Unix/Linux), Python; not Windows-knowledgeable.
19:43 🔗 yipdw git-annex is a bit rough on Windows
19:44 🔗 yipdw there's path-length-limit issues
19:44 🔗 yipdw but it can work
19:44 🔗 dfboyd However these days one writes cross-platform apps using Electron, which is basically a menu-bar-less browser that runs JavaScript apps. That's how the Slack chat client is made. And I do know ClojureScript.
19:44 🔗 cmaldonad dfboyd, then it would be a lot more feasible to have a Raspbian based image that brings up a shard node by booting a Raspberry Pi and a wizard asking what drive to use
19:46 🔗 dfboyd Does that mean people have to buy a Raspberry Pi and a hard drive?
19:46 🔗 cmaldonad it would
19:46 🔗 cmaldonad but it would be a zero-config deplouyment
19:47 🔗 cmaldonad deployment*
19:47 🔗 dfboyd (i.e. one can't just participate by running some background program on one's ordinary desktop PC).
19:47 🔗 cmaldonad I don't know if you are looking to: - zero config or wide adoption through reutilization
19:47 🔗 cmaldonad it would be just one option to deploy
19:48 🔗 cmaldonad I just see Windows desktop set ups as very fragile
19:48 🔗 cmaldonad say, they would probably use space shared with the Windows installation; most people don't put the OS in a different partition than their data
19:49 🔗 SketchCow cmaldonad: It helps to understand the nature of git-annex and why I specifically chose that for this
19:50 🔗 yipdw fragility can be dealt with; the current system already accounts for that
19:50 🔗 SketchCow Example: Drives are able to be offlined
19:50 🔗 SketchCow And verified at times
19:50 🔗 cmaldonad ok
19:50 🔗 SketchCow (This is why there's an "aging" system already built in: notice how we classify people by last checkins)
19:50 🔗 SketchCow Idea being someone puts a drive into a bay once a month and it goes whiirrrr and spits them out saying 'thanks'
19:50 🔗 SketchCow And if it fails, it piles back into the red
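The "aging" idea described above, where each copy is classified by how recently it checked in and a failed check sends it back "into the red", could be sketched as follows. The status names other than "red", the 30-day and 90-day thresholds, and the function name `classify_copy` are hypothetical; the real system's categories and cutoffs may differ.

```python
from datetime import datetime, timedelta

FRESH_WINDOW = timedelta(days=30)  # assumption: roughly "once a month"
STALE_WINDOW = timedelta(days=90)  # assumption: grace period before a copy no longer counts


def classify_copy(last_checkin, now, last_check_ok=True):
    """Classify one volunteer's copy by the age and outcome of its last check-in."""
    if not last_check_ok:
        return "red"    # verification failed, so the copy cannot be counted
    age = now - last_checkin
    if age <= FRESH_WINDOW:
        return "fresh"  # checked in recently
    if age <= STALE_WINDOW:
        return "stale"  # overdue for a check-in but not yet written off
    return "red"        # too old to trust


now = datetime(2016, 11, 9)
print(classify_copy(datetime(2016, 10, 20), now))
print(classify_copy(datetime(2016, 9, 1), now))
print(classify_copy(datetime(2016, 5, 1), now))
print(classify_copy(datetime(2016, 11, 8), now, last_check_ok=False))
```

This is what lets a drive sit offline on a shelf most of the time: only the check-in date and its result matter, not continuous connectivity.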
19:51 🔗 cmaldonad ok
19:51 🔗 cmaldonad got it
19:51 🔗 db48x also, with git-annex the users aren't downloading random anonymous chunks, they're downloading a random selection of ordinary files that they can just use normally
19:51 🔗 db48x images, music, magazines, whatever
19:52 🔗 db48x in principle they can pick and choose which files they want at any time, if they can use the command line
19:52 🔗 cmaldonad that's a high motivation factor that should be highlighted
19:52 🔗 db48x the hypothetical gui app would make that nicer for most people
19:56 🔗 db48x and then there are the 50GB warc files that require specialized tools to use, so the HGA won't help much
19:56 🔗 db48x but we're not backing those up yet, so we can just not mention that in the press releases
20:01 🔗 db48x I too require lunch
20:01 🔗 db48x back soon (herbacious)
20:09 🔗 atomotic has joined #internetarchive.bak
20:11 🔗 Kksmkrn has quit IRC (Ping timeout: 250 seconds)
20:11 🔗 boyd has joined #internetarchive.bak
20:11 🔗 dfboyd has quit IRC (Quit: Page closed)
20:12 🔗 Kksmkrn has joined #internetarchive.bak
20:17 🔗 atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
20:33 🔗 VADemon has joined #internetarchive.bak
20:43 🔗 kyan has quit IRC (Quit: Leaving)
20:46 🔗 bwn HCross2: if it will help you on your quest, i still have collection total sizes and an item list with totalsize/files/collections from jan too (as well as the itemlists i mentioned earlier)
20:47 🔗 bwn i can slice and dice/sort if needed, if not, i will shut up :)
20:52 🔗 SketchPho has joined #internetarchive.bak
20:53 🔗 SketchPho Hey.
21:01 🔗 SketchPho I've added my phone client to this channel so that I can be more easily reached if needed
21:01 🔗 SketchPho I'm going to keep out of the other channels
21:17 🔗 SketchPho Shardmasters, please make the ArchiveBot collection, followed by the general Archive Team collection, the next shards
21:18 🔗 Kaz understood
21:27 🔗 Start has joined #internetarchive.bak
21:30 🔗 Kksmkrn has quit IRC (Ping timeout: 250 seconds)
21:31 🔗 Kksmkrn has joined #internetarchive.bak
21:32 🔗 Kksmkrn has left
23:06 🔗 cmaldonad has quit IRC (Quit: This computer has gone to sleep)
23:21 🔗 Lord_Nigh has quit IRC (Ping timeout: 250 seconds)
23:25 🔗 Lord_Nigh has joined #internetarchive.bak
23:27 🔗 Lord_Nigh has quit IRC (Excess Flood)
23:29 🔗 Lord_Nigh has joined #internetarchive.bak
23:37 🔗 bwn has quit IRC (Ping timeout: 244 seconds)
23:45 🔗 bwn has joined #internetarchive.bak
23:58 🔗 VADemon has quit IRC (Quit: left4dead)
