#internetarchive.bak 2015-03-04,Wed


01:02 🔗 garyrh_ gives channel operator status to closure garyrh ivan` Kenshin
01:03 🔗 SketchCow closure: Good job
01:04 🔗 SketchCow I'd like the default shard(?) to be 100gb, or a percentage of a typical small drive
01:04 🔗 SketchCow For the record, I think the amount of stuff will be far under 20 petabytes
01:04 🔗 SketchCow Our internal guy will have some good data on it this week.
01:11 🔗 SketchCow So, with this, the question becomes.... what's missing?
01:11 🔗 SketchCow I mean, a good leaderboard and view, of course.
01:11 🔗 SketchCow But I have some extra disk space, as I'm sure others do, to donate to the cause.
01:40 🔗 closure SketchCow: each shard is split further among clients, so it can be larger than a typical small drive. A client could store only a few gb out of an 8 tb shard
01:41 🔗 SketchCow I see.
01:41 🔗 SketchCow OK, withdrawn.
01:41 🔗 SketchCow I have a range of questions, if you want them.
01:41 🔗 closure it's basically first come, first served as to which clients get which items out of a shard
01:41 🔗 closure yeah, ask away
01:41 🔗 SketchCow (Also, I'll redo the talk page to reflect a pushing to git-annex)
01:43 🔗 SketchCow Goofy McAnderson has a drive dedicated to us. It's on his system, it's 500gb.
01:43 🔗 SketchCow If he was to look in that drive's directory, what would he see?
01:43 🔗 closure bunch of $itemname.tar
01:44 🔗 closure some random or not so random subset of the IA items
01:44 🔗 SketchCow Rounded to item?
01:44 🔗 SketchCow So $itemname.tar is the full originals set of $itemname?
01:44 🔗 closure yeah, presumably w/o the derives
01:53 🔗 BEGIN LOGGING AT Tue Mar 3 20:53:14 2015
01:53 🔗 Now talking on #internetarchive.bak
01:53 🔗 Topic for #internetarchive.bak is: http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK
01:53 🔗 Topic for #internetarchive.bak set by chfoo!~chris@[redacted] at Sun Mar 1 23:43:37 2015
01:53 🔗 svchfoo1 gives channel operator status to chfoo
01:54 🔗 pikhq That's a pretty neat approach to git annex for it, yeah.
01:54 🔗 pikhq (apologies for not having been around much otherwise: job interviewing. :))
01:54 🔗 closure kind of a cool effect of distributed, forked git repos
01:55 🔗 closure or, he could rename it to "awesomehot.tar", and the IA wouldn't care; it can still restore the file from him despite the name change
01:57 🔗 yipdw does that capability fall out from the usual way git handles renames?
01:57 🔗 yipdw or is there more on top from git-annex
01:57 🔗 pikhq That's more a function of git-annex's storage of data.
01:57 🔗 closure it's basically due to git renames, yes
01:58 🔗 yipdw hmm neat
01:58 🔗 pikhq All that's in the git repo *itself* is just a symlink to the git-annex data store, but git annex doesn't really look at the symlinks to determine what the file is.
01:58 🔗 pikhq Yaaay, nice properties falling out naturally.
01:58 🔗 closure pikhq is right
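The rename-independence pikhq and closure describe comes from git-annex's content addressing: the working tree holds only a symlink whose target encodes a key derived from the file's bytes, so the symlink's own name carries no meaning. A minimal Python sketch of the idea, assuming git-annex's documented SHA256E key format (the key-building code here is illustrative, not git-annex's own):

```python
import hashlib
import os

def annex_key(path):
    """Build a SHA256E-style key: content size and hash plus extension.
    Nothing about the file's *name* enters the key."""
    with open(path, "rb") as f:
        data = f.read()
    ext = os.path.splitext(path)[1]  # e.g. ".tar"
    return "SHA256E-s%d--%s%s" % (len(data), hashlib.sha256(data).hexdigest(), ext)

# "$itemname.tar" renamed to "awesomehot.tar" has the same bytes, so it
# maps to the same key, and the IA can still locate and restore the
# content from the client despite the rename.
```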
02:00 🔗 SketchCow I like that solution (nooo, hotbootydogporn)
02:04 🔗 SketchCow These are all good.
02:04 🔗 SketchCow I am running out of questions
02:04 🔗 SketchCow Oh, and this can go into the wiki or your document
02:04 🔗 SketchCow What is IA running?
02:05 🔗 SketchCow Like, what do we need to be running? Another machine with git, git-annex, or whatever?
02:05 🔗 SketchCow I mean, it sounds almost like we need to give you a box and let you start making it into god.
02:06 🔗 closure IA needs some kind of server, with git and git-annex. I'm assuming locked-down ssh keys for clients to access it (they can only run git-annex-shell to download data and do git pull/push), although that could be handled other ways.
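The usual mechanism for that lock-down is an OpenSSH forced command: each client's public key is pinned to git-annex-shell, which only understands the transfer and git operations closure lists. A hedged sketch generating such an authorized_keys line; the exact option set is an assumption modeled on git-annex's documented forced-command setup, not a tested IA config:

```python
def authorized_keys_line(pubkey, repo_dir):
    """Emit one authorized_keys entry that forces git-annex-shell.

    GIT_ANNEX_SHELL_DIRECTORY restricts the client to a single repo;
    the no-* options shut off tunnels and ttys.
    """
    opts = ",".join([
        'command="GIT_ANNEX_SHELL_DIRECTORY=%s git-annex-shell -c \\"$SSH_ORIGINAL_COMMAND\\""'
        % repo_dir,
        "no-agent-forwarding",
        "no-port-forwarding",
        "no-X11-forwarding",
        "no-pty",
    ])
    return "%s %s" % (opts, pubkey)

# Hypothetical key and repo path, for illustration only:
print(authorized_keys_line("ssh-ed25519 AAAA... client@example", "/srv/shard42.git"))
```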
02:06 🔗 SketchCow How much disk space
02:06 🔗 SketchCow And is it shoving out all the data?
02:06 🔗 SketchCow Is it pulling from the items and constructing the love?
02:07 🔗 closure needs enough disk to buffer outgoing transfers to clients, and probably several gb for the git repos
02:07 🔗 closure I assume it's doing the client-facing transfer, I don't know about how the $item.tar gets made
02:09 🔗 closure could be one client per shard too, or something like that, depending on how some of these things scale
02:09 🔗 closure sorry, 1 server per shard
02:09 🔗 closure or per 10 or whatever
02:10 🔗 SketchCow Yeah, might want to use AWS
02:10 🔗 closure hmm, here's one other thought.. the total number of files in all items in IA might be only say 10x the number of items. It might make sense to make the repos contain not $item.tar, but $item/$file
02:11 🔗 SketchCow It's a strong idea.
02:11 🔗 closure it lets you play mp3s and movies w/o this tar thing that is still so hard to use 35 years after being made ;)
02:12 🔗 closure (btw, I have a git-annex repo I made a while ago that contains the most popular 500 or so GD live recordings. Kind of amusing.)
02:13 🔗 closure I pulled out all the recordings of Dark Star. I think I could play them back to back for about 1 week..
02:14 🔗 closure 119 gb
02:14 🔗 closure is only a baby deadhead
02:18 🔗 SketchCow It might be worth doing for the initial.
02:18 🔗 SketchCow And it's sexy, it solves the problem of the IA completely cratering into the earth
02:18 🔗 SketchCow the drives just.... have files
02:18 🔗 SketchCow Nice
02:19 🔗 closure git-annex.branchable.com/future_proofing
02:27 🔗 pikhq Yeah, that's probably my favorite feature of git-annex.
02:28 🔗 pikhq If git annex bites the dust somehow, it's just files.
02:48 🔗 closure updated document with several items
02:57 🔗 Ctrl-S will the files be compressed?
02:58 🔗 closure I was thinking not, but *shrug* could be
02:59 🔗 closure (assuming it uses ssh they'd be compressed in transit)
02:59 🔗 fenn compression saves a lot on html, which is what most of the web archive would be
02:59 🔗 closure point
02:59 🔗 fenn someone said something about bzip
03:00 🔗 closure except, is warc compressed? :)
03:00 🔗 Ctrl-S with the amount of data you guys will be working with, you don't really have the option to not use compression
03:00 🔗 pikhq warc is commonly gzip compressed.
03:00 🔗 pikhq I don't know if the web archive is though.
03:01 🔗 closure if it's separate files, and not $item.tar, it could decide on a per-file basis when adding it whether to compress, or leave an already compressed file format as-is
03:01 🔗 fenn (nevermind, the bzip thing was something else)
03:02 🔗 ivan` how much low-entropy stuff would there be on IA, anyway? .warc.gz, audio, video are relatively incompressible
03:02 🔗 closure pdf, html, disk images, ..
03:03 🔗 fenn there are compression algorithms that uncompress .zip or whatever and then recompress it better, making a note of how to re-zip it exactly when you restore
03:03 🔗 Ctrl-S How expensive (computer time, coder time) would it be to try compressing everything, or to check whether compression would have a benefit?
03:04 🔗 Ctrl-S like if it's a warc, make sure it's a compressed warc, if it's a known video/image/audio file don't bother
03:04 🔗 pikhq Coder time, pretty easy. Compute time, :(
03:05 🔗 Ctrl-S could you just have a script look to see if compression has been done, and then, if it's a known compressible type, apply compression if needed?
03:06 🔗 fenn you could just try compressing the first 1kB and see if it helps or not
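A minimal Python sketch of fenn's sample-first heuristic: deflate the first 1 kB and only compress the whole file if the sample shrinks meaningfully. The extension skip-list and the 10% threshold are assumptions, not settled policy:

```python
import zlib

# Formats that are already compressed; recompressing just burns CPU.
SKIP = (".gz", ".bz2", ".zip", ".jpg", ".png", ".mp3", ".mp4", ".mkv")

def worth_compressing(path, sample_size=1024, min_saving=0.10):
    """Return True if compressing this file looks worthwhile."""
    if path.endswith(SKIP):
        return False
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    if not sample:
        return False
    compressed = zlib.compress(sample, 6)
    # Require the 1 kB sample to shrink by at least min_saving.
    return len(compressed) < len(sample) * (1.0 - min_saving)
```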
03:07 🔗 aschmitz My question is mostly how much git-annex would trust the clients. For example, if I claim I have the whole archive, does it have any realistic way of checking? Obviously I might need some metadata for that (hashes of each file, or whatever), but far less than actually having everything.
03:07 🔗 yipdw you could also run your shard on a compressed filesystem, take the complexity of compression entirely out of this system
03:08 🔗 aschmitz (See also Sybil attacks on multiple copies, etc.)
03:09 🔗 fenn aschmitz: you ask the client for a salted hash
03:09 🔗 Ctrl-S randomly request a 1M chunk?
03:09 🔗 closure aschmitz: that's a fun attack. :) The fire drill section has one way to detect such bad actors, but it seems hard to know for sure; you have to decide how much you trust people and the system, and hope for enough redundancy...
03:09 🔗 aschmitz fenn: Yeah, I had proposed that before, and it seems like the only realistic way I can think of. Not sure if git-annex does that now, but I have to bet that closure could make it. :)
03:10 🔗 fenn i have yet to hear any good solutions to sybil attacks (in general) and proof of storage
03:10 🔗 pikhq Salted hash of a random 1M chunk would suffice for detecting corruption, but yeah. That's not really a way of determining how much to trust a client but more whether or not a client has violated that trust.
03:10 🔗 closure git-annex doesn't have proof right now, other than trying to get the file that it claims to have
03:11 🔗 aschmitz pikhq: Could do salted hash of the whole N GB chunk, as all you have to transfer is the hash. Would have to read that from the archive too, but whatever disk scrubbing IA does probably has to read everything regularly anyway.
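What fenn, Ctrl-S, and aschmitz are converging on is a simple challenge-response for proof of storage: a fresh random salt each round means a client can't answer from a cached hash; it has to actually hold the chunk's bytes. A sketch, assuming the server still has the chunk (or a precomputed answer) to check against:

```python
import hashlib
import hmac
import os

def make_challenge():
    """Server: pick a fresh salt so old answers can't be replayed."""
    return os.urandom(32)

def respond(salt, chunk):
    """Client: prove possession by hashing salt || chunk."""
    return hashlib.sha256(salt + chunk).hexdigest()

def verify(salt, chunk, answer):
    """Server: recompute from its own copy and compare in constant time."""
    expected = hashlib.sha256(salt + chunk).hexdigest()
    return hmac.compare_digest(expected, answer)
```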
03:12 🔗 closure however, systems that have a tit-for-tat incentive system need proof more than this system, which has an incentive of helping the IA
03:12 🔗 closure (right?)
03:12 🔗 aschmitz fenn: That's a valid point. I suppose if you don't care that everyone is anonymous, you could register different people separately.
03:13 🔗 fenn forcing people to register doesn't solve the sybil attack problem
03:13 🔗 pikhq closure: Yeah, there's no incentives to game here which helps a lot in terms of the odds of being attacked.
03:13 🔗 aschmitz fenn: Depends on how thorough you are at identifying them. Having, say, a number of different universities register is something where you could at least verify that they're distinct. Individuals would be a lot harder.
03:13 🔗 closure well, no incentive other than some random 4chan thread "let's kill the IA today because it's a wednesday"
03:13 🔗 fenn but it may not matter anyway; bittorrent has various enemies and is also vulnerable to sybil attacks, but it still works fine
03:13 🔗 aschmitz closure: That was the concern, yeah.
03:14 🔗 Ctrl-S multiple tiers of trust
03:14 🔗 aschmitz closure: Alternatively, someone could just target a small section of the data (say, furry art or something), claim they had several copies, and if the IA ever does become a crater, nobody else will have bothered to keep copies.
03:15 🔗 fenn how would they "target" the data?
03:15 🔗 aschmitz I don't necessarily have solutions, and git-annex is awesome, just trying to throw out potential issues. They're not all necessarily valid attacks, or worth defending against.
03:16 🔗 closure aschmitz: yeah. It is possible to prevent such targeting, but it adds quite a lot of complexity, and possibly decreases incentives for some good actors
03:16 🔗 aschmitz fenn: Presumably they could "sample" a bunch of different chunks, then identify content they didn't like? WARCs would be pretty easy to identify, say, by domain name by just scanning a few kb of content.
03:16 🔗 closure ie, it could assign particular items at random to clients, and ignore clients who claim to have unassigned items
03:16 🔗 aschmitz Sure.
03:17 🔗 closure or encrypt items..
03:17 🔗 aschmitz Aside: What happened to http://archive.org/about/bibalex_p_r.php ?
03:18 🔗 aschmitz closure: Unfortunately, encryption is a pretty annoying single point of failure, and if the key(s) go away, all the data does.
03:18 🔗 fenn yeah, that
03:19 🔗 closure I actually shard (SSS) my gpg key among many git-annex repos. Saved me from losing it last month :)
03:19 🔗 closure N of M is the bomb
03:19 🔗 aschmitz Nice.
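The "SSS" closure sharded his gpg key with is Shamir secret sharing: a degree-(N-1) polynomial over a prime field whose constant term is the secret, with one point handed to each of M holders. Any N points interpolate the secret; fewer reveal nothing. A toy sketch (the prime and integer encoding are assumptions; a real deployment would use a vetted tool such as ssss):

```python
import random

P = 2**127 - 1  # a Mersenne prime, comfortably larger than a 16-byte secret

def split(secret, n, m):
    """Split an integer secret < P into m shares; any n recover it."""
    rng = random.SystemRandom()
    coeffs = [secret] + [rng.randrange(P) for _ in range(n - 1)]

    def evaluate(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P

    return [(x, evaluate(x)) for x in range(1, m + 1)]

def combine(shares):
    """Lagrange interpolation at x = 0 over GF(P)."""
    secret = 0
    for xi, yi in shares:
        num = den = 1
        for xj, _ in shares:
            if xj != xi:
                num = (num * -xj) % P
                den = (den * (xi - xj)) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = split(42424242, n=3, m=5)
assert combine(shares[:3]) == 42424242  # any 3 of the 5 shares suffice
assert combine(shares[2:]) == 42424242
```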
03:20 🔗 pikhq Frankly probably the best thing for preventing these sorts of attacks is just making sure that enough good actors participate that these won't work. :)
03:20 🔗 aschmitz Yeah, it's not like there's not precedent for doing that with keys (see: DNSSEC root), but I'm guessing we'd prefer to avoid having to deal with it.
03:21 🔗 fenn the most likely attacks are not cryptographic attacks or explosives, but legal actions
03:21 🔗 yipdw yeah, if there was a huge asshole contingent I'd guess we'd have seen it in the warrior projects
03:21 🔗 yipdw haven't seen that so far
03:21 🔗 fenn like "cease and desist at once!"
03:21 🔗 aschmitz yipdw: I'm still impressed you haven't.
03:21 🔗 aschmitz Which, y'know good.
03:22 🔗 yipdw I am impressed too
03:22 🔗 pikhq Not very interesting to assholes.
03:22 🔗 closure yipdw: you forget when we exploited the leaderboard with HTML injection? :)
03:22 🔗 pikhq It's like attacking an orphan's puppy. Just, why?
03:22 🔗 fenn unfortunately you need orders of magnitude more participation (and thus attention, and unwanted attention) than the warrior projects
03:22 🔗 yipdw closure: oh yeah, there was that
03:23 🔗 pikhq fenn: True.
03:23 🔗 closure <-- hey there's always 1 asshole
03:23 🔗 pikhq My warrior instance I haven't paid attention to in months.
03:23 🔗 pikhq Still see it show up on leaderboards though.
03:23 🔗 aschmitz pikhq: Might it be worth allowing some sort of automatic "I have these chunks" messages/something that can be shared among places that trust one another? I suspect many places that support LOCKSS would potentially dedicate some storage space, and be willing to trust one another and avoid duplicating effort unnecessarily.
03:23 🔗 fenn "trust, but verify"
03:24 🔗 fenn if there is a simple protocol to verify then there's no reason to blindly trust
03:24 🔗 aschmitz Well, sure.
03:25 🔗 fenn if N of M shards are needed to reassemble a decryption key, who owns the key, and how do they get it?
03:26 🔗 aschmitz Whoever can get N shards :)
03:26 🔗 fenn isn't this just DRM all over again?
03:26 🔗 fenn (wasn't it proven that DRM can't work?)
03:26 🔗 aschmitz Under such a scheme, you wouldn't actually let people decrypt the data unless the key were revealed (which would only happen when IA disappears), which technically works.
03:27 🔗 fenn how would the key be revealed "when IA disappears" (whatever that means)
03:27 🔗 aschmitz DRM relies on saying "you can see this data, but you have to stop when I tell you to". This would be "you can have this data, but can't decrypt it until I release the key".
03:27 🔗 aschmitz Presumably a number of semi-trusted people would be given shards of the key, and N of M of them would have to agree.
03:28 🔗 fenn also this sounds a lot like various video game quests :P
03:28 🔗 aschmitz Note: I don't particularly like this idea, but I'm explaining how it would work.
03:28 🔗 closure Kill Bills 0..M-N
03:28 🔗 closure me neither, for the record
03:28 🔗 fenn Three were intended for the Elves, Seven for Dwarves, Nine for Men, and one, the One Ring was given to 4chan
03:28 🔗 aschmitz Which is to say: I don't think the crypto is really necessary.
03:29 🔗 aschmitz On the other hand, Freenet seems to avoid some problems by not really letting anyone see what their computer is actually storing. Hopefully that wouldn't be an issue here, but I don't know how many threats IA gets, or how many individual participants would be likely to get.
03:29 🔗 Ctrl-S if you encrypt it you create a single point of failure
03:30 🔗 Ctrl-S if someone controls the keys they can control the whole array
03:30 🔗 aschmitz Ctrl-S: Not that I disagree, but we were at least discussing how to make it a N of M point of failure :)
03:30 🔗 aschmitz And to be fair, the keys would only be used to obscure the data that was being stored, not for commanding the clients or anything.
03:31 🔗 fenn i was thinking a different failure mode... the world blows up and nobody can read the ancient scrolls because they're encrypted
03:31 🔗 aschmitz Yeah, that would also suck.
03:31 🔗 Ctrl-S i was thinking of access to data, not C&C
03:32 🔗 Ctrl-S if the keys are lost, the data is lost
03:32 🔗 Ctrl-S so if you did have keys you'd need to spread them over the world
03:32 🔗 aschmitz Ah, I was confused by your "can control the whole array" comment. Anyway, it doesn't seem like anyone likes the idea, so it doesn't seem worth going over too much.
03:32 🔗 fenn i'm sure this conversation will come up again and again, with all the "dark" data in IA
03:32 🔗 Ctrl-S if you are the only one with the keys, no one can access it without you
03:32 🔗 aschmitz Actually, yeah, dark data might be interesting.
03:33 🔗 Ctrl-S only encrypt dark data?
03:34 🔗 aschmitz Ctrl-S: DNSSEC handled the "one key spread over the world" with their Trusted Community Representatives stuff: http://www.root-dnssec.org/index.html%3Fp=171.html . Ignoring everything else about DNSSEC, it seems like a reasonable proposal if you have to do that sort of thing.
03:34 🔗 fenn there's something called "time lock encryption puzzles" where you basically just square a number repeatedly, and it has to be done in serial fashion, and it takes a lot of processor cycles, but not an unfeasible number of cycles
03:35 🔗 fenn the idea is that someone can decrypt the data after crunching on it for an arbitrarily long time
03:35 🔗 Ctrl-S proposes the well-established and highly secure ROT-13 crypto algorithm
03:35 🔗 aschmitz fenn: I was actually looking into that for similar data, yeah. Unfortunately, you kind of have to leave something running doing the calculations to have the time lock expire at the right time, but I guess that's not a huge deal.
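The construction fenn and aschmitz describe is the Rivest-Shamir-Wagner time-lock puzzle: the key is masked with 2^(2^t) mod n, which the puzzle maker computes cheaply via phi(n), but anyone without the factors of n must grind through t inherently serial squarings. A toy sketch with deliberately tiny primes and a small t; a real puzzle would use RSA-sized primes and a far larger t:

```python
# Toy parameters: p and q are secret to the puzzle maker.
p, q = 999983, 1000003
n, t = p * q, 10**5  # t squarings ~ the enforced delay

# Puzzle maker's shortcut: reduce the exponent mod phi(n).
phi = (p - 1) * (q - 1)
mask = pow(2, pow(2, t, phi), n)  # would be combined with the real key

# Solver without p, q: no known way around t *sequential* squarings,
# so the work cannot be parallelized away.
x = 2
for _ in range(t):
    x = (x * x) % n
assert x == mask
```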
03:39 🔗 pikhq Ctrl-S: 3ROT-13, please.
04:31 🔗 bzc6p_ (~bzc6p@[redacted]) has joined #internetarchive.bak
04:38 🔗 SketchCow Boop
04:38 🔗 bzc6p has quit (Ping timeout: 600 seconds)
04:40 🔗 SketchCow Hi. So.
04:40 🔗 SketchCow 1. I really don't want to encrypt.
04:43 🔗 SketchCow 2. I am comfortable with, and happy with, git-annex's level of complexity and self-healing.
04:43 🔗 SketchCow 3. There comes a point in the project when bad actors have to just be tolerated.
04:44 🔗 SketchCow 4. There comes a point when you have to assume the bad actors making Sybil attacks against shards are not going to be able to touch the originals, which have torrents
04:44 🔗 SketchCow I think that we should move to a field test with closure and a selection of items.
04:45 🔗 SketchCow or collections, really.
04:45 🔗 aschmitz Works for me.
04:45 🔗 SketchCow I think perhaps an AWS system is the way to go.
04:45 🔗 SketchCow Repeatable, we can mess with them
04:45 🔗 SketchCow Use AWS bandwidth
04:45 🔗 SketchCow Unless we want to start with archive.org internally.
04:45 🔗 SketchCow I can get another server
05:08 🔗 DFJustin LOCKSS has been mentioned a couple times, is it feasible to actually just use LOCKSS
05:10 🔗 aschmitz My impression is that LOCKSS is basically just a caching proxy. I could be wrong, but if that's all it is, probably not.
05:15 🔗 aschmitz Apparently I'm somewhat wrong. You might be able to produce LOCKSS manifests for IA files, I guess, which might work.
05:15 🔗 aschmitz Slightly more information and useful links at http://www.lockss.org/about/how-it-works/
06:39 🔗 db48x` has quit (Read error: Operation timed out)
06:49 🔗 SketchCow no.
06:52 🔗 godane something unrelated
06:55 🔗 godane SketchCow: i posted on -bs
10:37 🔗 bzc6p_ is now known as bzc6p
13:32 🔗 closure SketchCow: if AWS is used, this would mean pumping the whole IA contents into AWS and back out eventually. that's some BW cost
13:33 🔗 closure some VM like AWS is probably ok for initial development
13:35 🔗 Kenshin sketch: i can kinda provide resources u know
13:56 🔗 SketchCow Kenshin: Appreciated. Yes, I forgot, the bandwidth
13:57 🔗 Kenshin there was the other interesting topic in #archiveteam as well, about .onion site. heh
14:12 🔗 SketchCow I saw.
14:22 🔗 trs80 has quit (Ping timeout: 186 seconds)
14:41 🔗 SketchCow Kenshin, how much can you throw somewhere near the US in disk space for this test backup?
14:44 🔗 Kenshin u'd probably prefer LAX, i have a 10TB node there
14:44 🔗 Kenshin it's 10ms from archive.org
15:05 🔗 SketchCow Yes.
15:05 🔗 SketchCow Well, for this test, assign 500gb to it initially.
15:05 🔗 SketchCow I want to see it overflow, hit issues, etc
15:06 🔗 SketchCow Otherwise, we're testing a butterfly against a tanker
15:22 🔗 Kenshin k. i'll arrange something for you guys while you carry on hashing it out
15:27 🔗 Start has quit (Disconnected.)
15:29 🔗 Ctrl-S cut the machine's power halfway through
16:02 🔗 Start (~Start@[redacted]) has joined #internetarchive.bak
16:51 🔗 Start has quit (Disconnected.)
16:58 🔗 Start (~Start@[redacted]) has joined #internetarchive.bak
17:21 🔗 bzc6p_ (~bzc6p@[redacted]) has joined #internetarchive.bak
17:21 🔗 SketchCow I'll be making another machine with 500gb. If people have 500gb networked drives, that would help
17:21 🔗 SketchCow Probably 5-10 would be a good number.
17:22 🔗 SketchCow As mentioned by closure, git and git-annex need to be on there. Maybe we need a wiki page with requirements.
17:26 🔗 bzc6p has quit (Ping timeout: 600 seconds)
17:35 🔗 SketchCow I have to focus on my GDC presentation today, but I like where this is going, a lot. closure, just let us know what technology you need, and if there's code beyond what you would write to make it go.
17:45 🔗 Start has quit (Disconnected.)
18:03 🔗 Start (~Start@[redacted]) has joined #internetarchive.bak
18:31 🔗 closure SketchCow: I have to work on git-annex development all day (what a fate), not this, and I'm doing 7drl 24x7 all next week. Some first steps others could do:
18:32 🔗 closure - pick a set of around 10 thousand items whose sizes sum to around 8 TB
18:33 🔗 closure - build a map from item to shard. Needs to scale well to 24+ million items. SQL?
18:35 🔗 closure - write an ingestion script that takes an item and generates a tarball of its non-derived files. Needs to be able to reproduce the same checksum each time it's run on an (unmodified) item. I know how to make tar and gz reproducible, BTW (see the sketch after this list)
18:36 🔗 closure - write client registration backend, which generates the client's ssh private key, git-annex UUID, and sends them to the client (somehow tied to IA library cards?)
18:37 🔗 closure - client runtime environment (docker image maybe?) with warrior-like interface
18:37 🔗 closure (all that needs to do is configure things and get git-annex running)
18:38 🔗 closure could someone wiki that? ta
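One plausible shape for the reproducible-tarball step in closure's list, in Python: fix the traversal order and pin every header field that would otherwise vary (mtimes, owners, modes, gzip's own timestamp). The normalizations are assumptions about what suffices; selecting only an item's non-derived files is left to the caller:

```python
import gzip
import os
import tarfile

def ingest_item(item_dir, out_path):
    """Tar and gzip item_dir so the output bytes are identical on every run."""
    def normalize(info):
        info.mtime = 0                 # wall-clock time must not leak in
        info.uid = info.gid = 0
        info.uname = info.gname = ""
        info.mode = 0o644 if info.isfile() else 0o755
        return info

    with open(out_path, "wb") as raw, \
         gzip.GzipFile(fileobj=raw, mode="wb", mtime=0) as gz, \
         tarfile.open(fileobj=gz, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for root, dirs, files in os.walk(item_dir):
            dirs.sort()                # deterministic traversal order
            for name in sorted(files):
                path = os.path.join(root, name)
                tar.add(path, arcname=os.path.relpath(path, item_dir),
                        filter=normalize)
```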
18:38 🔗 Start has quit (Disconnected.)
18:41 🔗 closure oh, getting a full item list with sizes and last modification time might be a good start too
18:42 🔗 yipdw closure: captured at http://www.archiveteam.org/index.php?title=INTERNETARCHIVE.BAK/git-annex_implementation
19:45 🔗 Start (~Start@[redacted]) has joined #internetarchive.bak
19:46 🔗 bzc6p_ is now known as bzc6p
19:46 🔗 Start has quit (Read error: Connection reset by peer)
19:48 🔗 closure oh and if someone can get a count of all files in all items in the IA, that would be very useful information. Seems like an IA admin is best positioned to do that..
20:35 🔗 Start (~Start@[redacted]) has joined #internetarchive.bak
21:23 🔗 Start has quit (Disconnected.)
22:31 🔗 sep332 (~sep332@[redacted]) has joined #internetarchive.bak
22:46 🔗 dirt (james@[redacted]) has joined #internetarchive.bak
22:58 🔗 garyrh_ has quit (Quit: Leaving)
23:01 🔗 jbenet (sid17552@[redacted]) has joined #internetarchive.bak
23:01 🔗 jbenet greetings-- saw the post on HN today.
23:02 🔗 jbenet i'm the author of ipfs.io -- i designed IPFS with the archive in mind. (see also end of https://www.youtube.com/watch?v=skMTdSEaCtA).
23:03 🔗 jbenet Our tech is very close to ready. you can read about the tech details here: http://static.benet.ai/t/ipfs.pdf
23:03 🔗 jbenet or watch the old talk here: https://www.youtube.com/watch?v=Fa4pckodM9g -- i will be doing another, updated tech dive into the protocol + details.
23:04 🔗 jbenet you can loosely think of ipfs as git + bittorrent + dht + web.
23:05 🔗 xmc hmmm
23:05 🔗 yipdw huh I didn't know someone posted this on HN
23:05 🔗 xmc my thoughts too
23:06 🔗 chfoo https://news.ycombinator.com/item?id=9147719
23:06 🔗 yipdw cool, nobody writing about how stupid we all are yet
23:06 🔗 yipdw i'll wait a few more hours
23:06 🔗 xmc hahahah
23:07 🔗 chfoo jbenet: feel free to add your solution in the wiki discussion page
23:07 🔗 jbenet i've been trying to get in touch with you about this-- i've been to a friday lunch (virgil griffith brought me) and recently reached out to brewster. i think you'll find that ipfs will very neatly plug into your arch, and does a ton of heavy lifting. it's not perfect yet -- keep in mind there was no code a few months ago -- but today we're at a point of
23:07 🔗 jbenet streaming video reliably and with no noticeable lag-- which is enough perf for replicating the archive.
23:08 🔗 jbenet --and before you use it, we have to put in the `commit` data structure (so you can have proper version control, like git)--
23:08 🔗 jbenet but basically, we're at a point where figuring out your exact constraints-- as they would look with ipfs-- would help us build the thing you need.
23:09 🔗 ivan` yipdw: that would be me... a month ago https://news.ycombinator.com/item?id=8980154
23:09 🔗 closure been meaning to look into ipfs..
23:09 🔗 yipdw ha
23:10 🔗 xmc jbenet: i should point out that archiveteam is not the internet archive, and only one or two people here are associated with them
23:10 🔗 xmc we just have a good working relationship with them
23:10 🔗 jbenet xmc: ah, thank you for pointing that out.
23:10 🔗 xmc :)
23:10 🔗 xmc sure thing
23:10 🔗 jbenet xmc: not hyper clear from looking at a page for 20s
23:10 🔗 jbenet :]
23:10 🔗 xmc no worries
23:11 🔗 xmc it's a common mistake
23:13 🔗 jbenet yeah, the single page http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK doesn't make it clear-- but then again it's a wiki and we should click home.
23:13 🔗 jbenet well
23:13 🔗 jbenet in any case-- now you know about ipfs :) look into it, i'm sure it'll be useful in this endeavor and we're happy to help. (#ipfs on freenode)
23:14 🔗 jbenet xmc: does the archive have an irc channel?
23:14 🔗 xmc not officially
23:14 🔗 X-Scale (~gbabios@[redacted]) has joined #internetarchive.bak
23:14 🔗 xmc there is #internetarchive on this network though
23:15 🔗 xmc it's most of the same people as in here
23:15 🔗 xmc #archiveteam is the main channel for archiveteam
23:15 🔗 xmc surprisingly enough
23:16 🔗 jbenet cool, thanks!
23:18 🔗 chfoo trying to put up the disclaimer, but the wiki is being hammered
23:19 🔗 mntasauri (~motesorri@[redacted]) has joined #internetarchive.bak
23:21 🔗 z0ner (0c118402@[redacted]) has joined #internetarchive.bak
23:23 🔗 z0ner has quit (Client Quit)
23:24 🔗 z0nenet (0c118402@[redacted]) has joined #internetarchive.bak
23:24 🔗 z0nenet has quit (Client Quit)
23:24 🔗 z0ned (webchat@[redacted]) has joined #internetarchive.bak
23:29 🔗 z0ned So, what's the plan!?
23:30 🔗 z0ned has quit (Quit: Page closed)
23:30 🔗 xmc uh
23:38 🔗 chfoo has changed the topic to: http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK | #archiveteam
23:41 🔗 yipdw so I threw a bit about IA's data model and browsing tools in http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK/git-annex_implementation#Browsing_the_Internet_Archive
23:41 🔗 yipdw I'm not sure if "ia search 'collection:*'" is a good idea, but it seems to work if you disregard that it might be killing a search server somewhere
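The same enumeration can be done from the internetarchive Python library that backs the `ia` CLI; a sketch that would also feed closure's earlier request for item sizes and file counts, scoped to one collection rather than 'collection:*'. The collection name is a stand-in, and the 'derivative' source filter follows IA's usual file metadata conventions, so treat the details as assumptions:

```python
from itertools import islice
from internetarchive import get_item, search_items

# Walk a single collection instead of hammering search with collection:*;
# islice keeps the sketch polite to the search servers.
for result in islice(search_items("collection:GratefulDead"), 100):
    item = get_item(result["identifier"])
    originals = [f for f in item.files if f.get("source") != "derivative"]
    total_bytes = sum(int(f.get("size", 0)) for f in originals)
    print(item.identifier, len(originals), total_bytes)
```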
23:45 🔗 jbenet is joey from git-annex in here?
23:46 🔗 xmc jbenet: yes, he goes by the name closure
23:46 🔗 jbenet closure: is it you? (guessing from the irc note)
23:46 🔗 jbenet great
23:48 🔗 chfoo zooko was here earlier too
23:51 🔗 jbenet chfoo: lol the post brought all the fs nuts out of the woodwork :)
23:52 🔗 jbenet i'll stick around if you dont mind. i can also leave, whatever.
23:52 🔗 yipdw jbenet: yeah, sticking around is totally cool
23:54 🔗 GauntletW (~ted@[redacted]) has joined #internetarchive.bak
23:57 🔗 Start (~Start@[redacted]) has joined #internetarchive.bak
23:57 🔗 svchfoo1 gives channel operator status to Start
23:58 🔗 rossdylan (~rossdylan@[redacted]) has joined #internetarchive.bak
23:59 🔗 ryang (uid10904@[redacted]) has joined #internetarchive.bak
23:59 🔗 mntasauri which fs does zooko work with
23:59 🔗 xmc tahoe-lafs
23:59 🔗 mntasauri tahoe ah
