#internetarchive.bak 2015-04-04,Sat


Time Nickname Message
00:31 🔗 garyrh has quit (Write error: Connection reset by peer)
01:00 🔗 garyrh (garyrh@[redacted]) has joined #internetarchive.bak
01:00 🔗 svchfoo3 gives channel operator status to garyrh
01:24 🔗 zottelbey has quit (Remote host closed the connection)
01:38 🔗 closure wow, 50% 4 copies, and 50% 3 copies!
01:39 🔗 trs80 my node has finished downloading, about 3 hours ago
01:40 🔗 trs80 so about 48h all up
01:41 🔗 closure oh, you have it all?
01:41 🔗 trs80 yep
01:41 🔗 closure there's always SHARD2 ;)
01:42 🔗 trs80 Wow! This shard of the IA is fully backed up now!
01:42 🔗 trs80 I know of 4 copies of every file (including one copy at the IA itself).
01:43 🔗 closure ah, that message is a little wrong isn't it
01:43 🔗 trs80 yeah, I was just thinking that
01:44 🔗 trs80 how do I start on SHARD2?
01:44 🔗 closure run ./checkoutshard shard2
01:45 🔗 GitHub122/#internetarchive.bak IA.BAK/master 65b9cc0 Joey Hess: fix message when done; a client may have downloaded everything but that does not mean NUMCOPIES was reached yet
01:45 🔗 GitHub122/#internetarchive.bak IA.BAK/master 691c95a Joey Hess: work on newest shards first, those are most likely to not be done yet
01:45 🔗 GitHub122/#internetarchive.bak [IA.BAK] joeyh pushed 2 new commits to master: http://git.io/ve3Um
01:46 🔗 db48x details, details
01:46 🔗 GitHub83/#internetarchive.bak IA.BAK/master e67be5b Joey Hess: accept uppercase SHARD names
01:46 🔗 GitHub83/#internetarchive.bak [IA.BAK] joeyh pushed 1 new commit to master: http://git.io/ve3U8
01:48 🔗 GitHub47/#internetarchive.bak IA.BAK/master 1ca901e Joey Hess: better, just grep -i for shard names
01:48 🔗 GitHub47/#internetarchive.bak [IA.BAK] joeyh pushed 1 new commit to master: http://git.io/ve3U7
01:49 🔗 trs80 joeyh: tac doesn't work (shard2 is returned first by find for me)
01:49 🔗 closure drat
01:50 🔗 db48x sort -nr
01:50 🔗 closure fails at 100
01:51 🔗 closure -n needs only numbers I think
01:52 🔗 closure I know, I'll add the ctime to the find output
01:55 🔗 GitHub185/#internetarchive.bak IA.BAK/master 0d43159 Joey Hess: sort by ctime
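A minimal sketch of the ordering problem discussed above, written in Python rather than the project's own shell scripts. It assumes shard directories named like shard1, shard2, shard10, each with a change time; the names, the timestamp values, and the function newest_first are illustrative only and are not taken from the IA.BAK code.

```python
def newest_first(shards):
    """Order (name, ctime) pairs so the most recently changed shard comes first."""
    return [name for name, ctime in sorted(shards, key=lambda s: s[1], reverse=True)]

# Hypothetical shard directories with made-up ctime values (seconds since epoch).
shards = [("shard1", 1427000000), ("shard2", 1428000000), ("shard10", 1429000000)]

# Sorting by name alone compares text, so "shard10" lands between "shard1" and "shard2".
by_name = sorted((name for name, _ in shards), reverse=True)

# Sorting by ctime does not depend on how the names are spelled.
by_ctime = newest_first(shards)
```

Under these assumptions the name-based ordering puts shard2 ahead of shard10, while the ctime-based ordering puts the newest shard first regardless of how many digits its name has.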
01:55 🔗 GitHub185/#internetarchive.bak [IA.BAK] joeyh pushed 1 new commit to master: http://git.io/ve3Tx
01:56 🔗 trs80 yep, running shard2 now
01:56 🔗 closure it's 3.2tb iirc
01:57 🔗 trs80 hmm, my diskreserve wasn't applied in the shard2/.git/config
01:58 🔗 closure I forget if I tested that part
01:58 🔗 closure code for it is in checkoutshard
01:58 🔗 trs80 git user.name and email were set
01:59 🔗 trs80 I suspect the find didn't work for the same reason
01:59 🔗 closure ah, I see
02:00 🔗 closure found shard2
02:01 🔗 GitHub80/#internetarchive.bak IA.BAK/master ab2c2c9 Joey Hess: find prevshard before we create a new shard
02:01 🔗 GitHub80/#internetarchive.bak [IA.BAK] joeyh pushed 1 new commit to master: http://git.io/ve3kd
02:02 🔗 trs80 cheers
02:12 🔗 trs80 why is 3 copies red now?
02:14 🔗 closure because the coloring is broken
02:14 🔗 closure it just averages between green and red, and with 2 current groups, one is green and one red
02:15 🔗 closure 3 copies was appearing grey earlier too
02:15 🔗 trs80 nods
02:16 🔗 closure I think SketchCow can fix it by filling in a 0.. if you view source, some lines just lack a number in them
02:19 🔗 closure hmm, no, doesn't help.
02:20 🔗 closure filling in a dummy value of 1 does seem to work
02:20 🔗 closure heh, 0.0001 works
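The stats page code was not in git at this point, so the following is only a hypothetical sketch of the kind of green-to-red blending closure describes, not the page's actual logic. The colour constants and the function group_colors are assumptions made for illustration.

```python
GREEN = (0, 255, 0)
RED = (255, 0, 0)

def group_colors(n_groups):
    """Blend from GREEN to RED across n_groups evenly spaced steps."""
    colors = []
    for i in range(n_groups):
        # Position of this group between the two endpoints, from 0.0 to 1.0.
        t = i / (n_groups - 1) if n_groups > 1 else 0.0
        colors.append(tuple(round(g + (r - g) * t) for g, r in zip(GREEN, RED)))
    return colors

two_groups = group_colors(2)
```

With only two groups present, a scheme like this has no intermediate steps: one group is pure green and the other pure red, which would be consistent with "3 copies" showing up red.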
02:20 🔗 beardicus where's the code for the stats page?
02:20 🔗 closure not in git wherever it is..
02:21 🔗 beardicus aha. i don't see a dot for me on the map :(
02:21 🔗 beardicus thought i'd poke around about that.
02:22 🔗 closure http://iabak.archiveteam.org/stats/SHARD1.geolist
02:24 🔗 closure oh, he seems to be generating it in my home directory
02:24 🔗 closure that's convenient for me finding it
02:26 🔗 pikhq Alas, the entry for me is slightly wrong.
02:27 🔗 pikhq ... admittedly only trivially so, I think that's the right ZIP code for the other side of the street. :)
02:27 🔗 db48x heh
02:29 🔗 closure fixed
02:32 🔗 svchfoo2 gives channel operator status to chfoo
02:33 🔗 closure it thinks I'm in milan and SC, and NJ. Wrong on 3 counts
02:33 🔗 closure the geolocations are pretty good though
02:34 🔗 db48x closure: while you're in there, I posted a change that would add the counts to the tooltips
02:34 🔗 closure we should probably pull it into git
02:34 🔗 db48x yesterday afternoon sometime
02:34 🔗 db48x agreed!
02:35 🔗 db48x dinner, bbl
02:40 🔗 beardicus hmm. not sure we can avoid using `find`s printf feature now that it's used to output the ctime for sorting.
02:42 🔗 GitHub102/#internetarchive.bak IA.BAK/master c35feeb Joey Hess: docs
02:42 🔗 GitHub134/#internetarchive.bak IA.BAK/server 2effa3d Joey Hess: add web page generator script
02:42 🔗 GitHub102/#internetarchive.bak [IA.BAK] joeyh pushed 1 new commit to master: http://git.io/ve3YT
02:42 🔗 GitHub134/#internetarchive.bak [IA.BAK] joeyh pushed 1 new commit to server: http://git.io/ve3Yk
02:42 🔗 closure beardicus: well, we can add a fallback that works less well
02:42 🔗 beardicus k. i was just using awk instead to strip the './'
02:44 🔗 beardicus though it looks like that's no longer happening and was completely unnecessary anyways :)
02:49 🔗 S[h]O[r]T (~Sh]Or]T@[redacted]) has joined #internetarchive.bak
02:49 🔗 S[h]O[r]T who can take my pub key? :)
02:50 🔗 closure dude
02:50 🔗 closure over here
02:52 🔗 GitHub30/#internetarchive.bak IA.BAK/pubkey 1754006 Joey Hess: add S[h]O[r]T
02:52 🔗 GitHub30/#internetarchive.bak [IA.BAK] joeyh pushed 1 new commit to pubkey: http://git.io/ve3OZ
02:59 🔗 GitHub138/#internetarchive.bak IA.BAK/master b134621 Joey Hess: hit webhook to pull new keys in registration process (github's web hook seems slow or not always arriving)
02:59 🔗 GitHub138/#internetarchive.bak [IA.BAK] joeyh pushed 1 new commit to master: http://git.io/ve33Y
03:08 🔗 S[h]O[r]T is the "get" part of git-annex or is that really just wget?
03:08 🔗 closure as in "git annex get" ?
03:09 🔗 S[h]O[r]T yeah, i assume that is what is downloading the files
03:09 🔗 closure it's 3 layers with wget underneath in this case
03:16 🔗 S[h]O[r]T ah ok.
03:16 🔗 S[h]O[r]T if i read this right i can run iabak multiple times without issue?
03:19 🔗 closure indeed
03:19 🔗 closure one of the things one of those layers allows :)
03:55 🔗 SketchCow Hey what wha
04:10 🔗 SketchCow Good cheap hack.
04:10 🔗 SketchCow Meanwhile, we're close to 100% 4
04:46 🔗 tpw_rules that's wicked progress from like yesterday
04:46 🔗 tpw_rules in fact all my downloads are done
04:50 🔗 pikhq Sweet, we've only got 3 copy stuff left.
04:51 🔗 tpw_rules closure: you need to add some sort of percentage to reading the size info for "git annex info"
05:52 🔗 niyaje3 (~niyaje@[redacted]) has joined #internetarchive.bak
07:24 🔗 niyaje3 has quit (Ping timeout: 600 seconds)
08:34 🔗 bzc6p_ (~bzc6p@[redacted]) has joined #internetarchive.bak
08:34 🔗 bzc6p has quit (Read error: Operation timed out)
08:38 🔗 bzc6p_ is now known as bzc6p
11:32 🔗 zottelbey (~zottelbey@[redacted]) has joined #internetarchive.bak
12:04 🔗 Senji (jdamery@[redacted]) has joined #internetarchive.bak
12:05 🔗 Senji Hello, can I sign up?
12:11 🔗 trs80 Senji: git clone https://github.com/ArchiveTeam/IA.BAK then have a look at readme.md
12:12 🔗 trs80 you may need to wait for someone (closure) who can add your ssh key
12:12 🔗 Senji Yes, I've done that :) iabak told me to come to irc and give someone an ssh key :)
12:12 🔗 Senji that!
12:15 🔗 midas Senji: dump it here, closure will add it
12:15 🔗 trs80 closure is in the US, so might still be sleeping. in the interim you could add it to https://github.com/ArchiveTeam/IA.BAK/blob/pubkey/SHARD1/pubkeys and send a pull request if your git-fu is up to it
12:16 🔗 bzc6p (~bzc6p@[redacted]) has left #internetarchive.bak
12:16 🔗 Senji ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCY455AWlXCJQjmZf6UZQHNlS0JdE7+24HLaNzZd2lWTDL9xHQkO/m2itYrhh3l5keT8Og3ggTyK+xWM7+1J7Vr7AtjPrM9fmwOwJ9YqFTy6T1ELHNriAl1KJOB75NDF2XKymp6v+T1MSTwPTDx/jmlh4g5C4TJjMNzX6KzKSQO4fNpPqAXZGANcTEWUyD58otq0ihCLfuG6ziyWIKZIoKSkGzf1RML1jsSnyuegbQ6bolMqct5YSt9yv21yoBYNabXGZKOlKyqA7h7Y12SlYKOok9fTmU51EKRJfLkuQETzIp9i0kvIvVWysWJJ6lX5FIo0PmLclq/Vxs+/3dfxlDD jdamery@cleopatra
12:17 🔗 Senji trs80: I'm not really up to handling github faff from my phone (in a break during a con at the moment :))
12:18 🔗 trs80 yeah, fair enough. eastercon?
12:18 🔗 Senji Yup
12:51 🔗 S[h]O[r]T wish i was getting better speeds from IA. running the script around 7x. was doing almost 1MB/s each during the night but slowed down to around 100KB/s-200KB/s this morning
12:55 🔗 zottelbey i feel like it hardly matters how many copies you run at a certain point. there is hardly any difference in speed for me whether i run 10 or 20 copies. 2MB/s is the maximum i will get. i guess the peering to germany is terrible
12:56 🔗 zottelbey sometimes i get peaks of 7MB/s on single files. but thats rare enough.
12:58 🔗 db48x I'll add your key, one sec
13:08 🔗 GitHub8/#internetarchive.bak IA.BAK/pubkey 0263695 Daniel Brooks: add a key for Senji
13:08 🔗 GitHub8/#internetarchive.bak [IA.BAK] db48x pushed 1 new commit to pubkey: http://git.io/vesbl
14:00 🔗 Senji thanks
14:02 🔗 db48x you're welcome
15:40 🔗 kyan has quit (Ping timeout: 258 seconds)
17:19 🔗 SketchCow Things look like we're getting stuff done, which is great.
18:21 🔗 kyan (~kyan@[redacted]) has joined #internetarchive.bak
18:26 🔗 kyan has quit (Client Quit)
19:10 🔗 Kenshin SketchCow: any idea if IA.BAK is putting extra pressure on IA's network? the ISC link has been in the red zone for the past 2 months
19:10 🔗 Kenshin that's probably partially the reason for fluctuating download speeds
19:14 🔗 Senji Hmm, iabak still can't get access to SHARD1@iabak.archiveteam.org:shard - do I need to wait for someone to pull the key to the shard server or something? :)
19:47 🔗 SketchCow It is NOT putting extra pressure.
19:47 🔗 SketchCow What it's doing is revealing slowdowns.
19:47 🔗 SketchCow Which is great
19:48 🔗 SketchCow We're going to fix these things soon
19:48 🔗 SN4T14_ (~SN4T14@[redacted]) has joined #internetarchive.bak
19:48 🔗 pikhq Yeah, they don't appear to have any instances of falling over or anything.
19:49 🔗 pikhq This certainly hasn't caused a DOS on the Archive. :)
19:50 🔗 kyan (~kyan@[redacted]) has joined #internetarchive.bak
19:56 🔗 SN4T14__ has quit (Ping timeout: 512 seconds)
20:03 🔗 closure Senji: your key is on the server afaics
20:04 🔗 Senji Aha it's working now. great!
20:05 🔗 GitHub158/#internetarchive.bak IA.BAK/master 8c57f7a Joey Hess: fix url to force immediate key deploy
20:05 🔗 GitHub158/#internetarchive.bak [IA.BAK] joeyh pushed 1 new commit to master: http://git.io/veZSa
20:07 🔗 kyan has quit (Quit: This computer has gone to sleep)
20:07 🔗 SketchCow We should figure out how long it's taken to initiate and complete this 2.91tb shard
20:08 🔗 closure http://iabak.archiveteam.org/stats/SHARD1.filestransferred
20:08 🔗 closure we have most of the history here
20:08 🔗 closure (still needs a graph of this made)
20:09 🔗 midas what kind of data is that closure ? total transferred?
20:10 🔗 closure number of downloads of files
20:10 🔗 closure sees SHARD2 already has 5 clients and 32k files transferred
20:11 🔗 midas oh shard2 is active?
20:11 🔗 closure ./checkoutshard SHARD2
20:11 🔗 midas ill start on that in a bit
20:15 🔗 closure is a little concerned by the 21486 files on SHARD1 that have more than 3 copies downloaded
20:15 🔗 closure that's a quarter of the whole thing that was downloaded more than necessary
20:16 🔗 pikhq Huh. Race condition?
20:16 🔗 closure yes, 2 clients decide to get the same file same time
20:16 🔗 pikhq Presumably there's no simple way to get two hosts to coordinate on that.
20:18 🔗 Senji What about the 15 files at +7? that's a lot of race
20:18 🔗 tpw_rules closure: are you forcing the ignore if downloaded more than once?
20:18 🔗 tpw_rules because i'm not running with any copies limitation
20:19 🔗 closure tpw_rules: well, if you're just running git annex get without --not --copies 4, it'll just get everything
20:19 🔗 tpw_rules exactly. i think people are doing that
20:19 🔗 closure maybe
20:19 🔗 closure we might get cleaner data from the next shard.
20:20 🔗 tpw_rules blah i don't have disk space for that
20:20 🔗 pikhq Though I'd highly recommend coordinating with others for it, it *might* be worthwhile for some people with a lot of redundant copies to drop them.
20:21 🔗 pikhq Particularly if it'd let you get more helpful data.
20:25 🔗 closure notices that the first several files have 6 and 7 copies
20:25 🔗 closure so probably those ones are just people running git annex get manually
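A minimal Python sketch of the selection that "--not --copies 4" expresses: fetch only files that do not yet have 4 known copies. The file names, copy counts, and helper names here are made up for illustration. git-annex tracks this information itself, and this is not how it is implemented.

```python
def wanted(copies_by_file, target):
    """Files with fewer than `target` known copies, i.e. still worth fetching."""
    return sorted(f for f, n in copies_by_file.items() if n < target)

def over_replicated(copies_by_file, target):
    """Files that already have more than `target` known copies."""
    return sorted(f for f, n in copies_by_file.items() if n > target)

# Hypothetical copy counts, including the copy held at the IA itself.
copies = {"a.warc": 2, "b.warc": 4, "c.warc": 7}

to_fetch = wanted(copies, 4)
extra = over_replicated(copies, 4)
```

A filter like this only reflects what a client knew when it checked, so two clients checking at the same moment can both decide to fetch the same file, which matches the race described above. A plain "git annex get" with no filter skips the check entirely.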
20:33 🔗 Senji hmm, i'm getting errors: dirname: invalid option -- 'z'
20:34 🔗 closure Senji: it's an old version of coreutils iirc
20:34 🔗 closure touch NOSHUF and it will avoid that codepath
20:36 🔗 Senji right that's happily downloading now
20:38 🔗 midas closure: thats probably me :p
20:38 🔗 closure actually, I forgot I did a manual git annex get in a lot of usenethistorical, so probably me too ;)
20:39 🔗 midas lets see if munin likes to plot some downloaded files in a neat way
20:41 🔗 closure in fact, you can see the culprits in git annex whereis --copies 5 .. the last listed client is the one that got the file last
20:42 🔗 midas closure: http://dit.serveert.me.uk:8081/munin/serveert.me.uk/dit.serveert.me.uk/iabak.html
20:43 🔗 closure graph seems empty?
20:43 🔗 midas it grabs your input file from http://iabak.archiveteam.org/stats/SHARD1.filestransferred and feeds it into munin
20:43 🔗 midas yeah only started like 5 minutes ago :p
20:44 🔗 closure ah, it's not getting old data
20:44 🔗 midas nope
20:44 🔗 midas i could add that
20:44 🔗 closure let me make a ALL.filestransferred across all shards and you could add that
20:44 🔗 midas that would be perfect
20:51 🔗 GitHub117/#internetarchive.bak IA.BAK/server 0debc26 Joey Hess: get stats for ALL
20:51 🔗 GitHub117/#internetarchive.bak IA.BAK/server 6ce9b36 Joey Hess: support ALL
20:51 🔗 GitHub117/#internetarchive.bak [IA.BAK] joeyh pushed 2 new commits to server: http://git.io/veZpe
20:53 🔗 closure there we go
20:54 🔗 midas http://iabak.archiveteam.org/stats/ALL.filestransferred is the output file?
20:55 🔗 closure SketchCow: BTW, I don't know what you're planning for the graphs as we get more shards, but you could use this to do a page overviewing them all
20:55 🔗 closure midas: yes
20:55 🔗 midas perfect. ill change the input file
21:01 🔗 closure midas: also, can munin make anything of http://iabak.archiveteam.org/stats/ALL.clientconnsperhour ?
21:02 🔗 midas sure, but i dont see any data in it
21:02 🔗 midas just timestamps
21:03 🔗 closure the data is the first number
21:04 🔗 S[h]O[r]T i prob should make similar graphs on my client. ive just been looking at my download speeds & disk utilization http://74.63.230.61/munin/localdomain/localhost.localdomain/if_eth0.html http://74.63.230.61/munin/localdomain/localhost.localdomain/diskstats_utilization/vdc.html
21:04 🔗 GitHub85/#internetarchive.bak [IA.BAK] joeyh pushed 1 new commit to server: http://git.io/veZjM
21:04 🔗 GitHub85/#internetarchive.bak IA.BAK/server 59eaeb4 Joey Hess: add ALL file that sums up the numcopies stats for all shards
21:04 🔗 closure yeah, that's a confusing format
21:04 🔗 midas ah right, yeah now i get it :p
21:04 🔗 midas shouldnt be a problem, ill build that in a minute
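A small Python sketch of reading a stats file where, as closure says, the data is the first number on each line. The exact layout of the real clientconnsperhour file is not shown in the log, so the sample lines and the function parse_counts are assumptions for illustration only.

```python
def parse_counts(text):
    """Return (count, rest_of_line) pairs; the count is the first field."""
    rows = []
    for line in text.splitlines():
        fields = line.split(maxsplit=1)
        if not fields:
            continue  # skip blank lines
        rest = fields[1] if len(fields) > 1 else ""
        rows.append((int(fields[0]), rest))
    return rows

# Made-up sample: a count followed by a timestamp-like field.
sample = "5 2015-04-04T20:00\n12 2015-04-04T21:00\n"
rows = parse_counts(sample)
```

Keeping the remainder of each line as an opaque string avoids guessing at the timestamp format, which the channel itself found confusing.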
21:06 🔗 GitHub102/#internetarchive.bak IA.BAK/server 96a29e6 Joey Hess: move file into place
21:06 🔗 GitHub102/#internetarchive.bak [IA.BAK] joeyh pushed 1 new commit to server: http://git.io/veneq
21:07 🔗 GitHub79/#internetarchive.bak [IA.BAK] joeyh pushed 1 new commit to server: http://git.io/veneR
21:08 🔗 GitHub79/#internetarchive.bak IA.BAK/server 212b7af Joey Hess: improve format
21:08 🔗 GitHub35/#internetarchive.bak IA.BAK/server a1dc565 Joey Hess: include 0
21:08 🔗 GitHub35/#internetarchive.bak [IA.BAK] joeyh pushed 1 new commit to server: http://git.io/vener
21:33 🔗 midas seems to be working closure, i've added the graphs per client/hour but that needs to update
21:42 🔗 midas http://dit.serveert.me.uk:8081/munin/serveert.me.uk/dit.serveert.me.uk/iaconnected-day.png closure
21:49 🔗 SketchCow closure: My plan for the graphs that I'm on is just to concatenate all the shards.
21:50 🔗 SketchCow I'm just waiting to hit the full SHARD1 and then we'll throw them together
21:55 🔗 SketchCow Since people are jumping the gun and grabbing shard2, when we add it in, it'll be well along.
22:22 🔗 beardicus has quit (Quit: My MacBook Pro has gone to sleep. ZZZzzz…)
22:44 🔗 beardicus (~beardicus@[redacted]) has joined #internetarchive.bak
22:50 🔗 niyaje3 (~niyaje@[redacted]) has joined #internetarchive.bak
22:56 🔗 beardicus has quit (Quit: My MacBook Pro has gone to sleep. ZZZzzz…)
22:56 🔗 beardicus (~beardicus@[redacted]) has joined #internetarchive.bak
23:00 🔗 beardicus has quit (Client Quit)
23:00 🔗 beardicus (~beardicus@[redacted]) has joined #internetarchive.bak
23:04 🔗 niyaje3 has quit (Read error: Connection reset by peer)
23:14 🔗 zottelbey has quit (Remote host closed the connection)
23:15 🔗 niyaje3 (~niyaje@[redacted]) has joined #internetarchive.bak
23:21 🔗 GitHub46/#internetarchive.bak IA.BAK/server d6512ee Joey Hess: add SHARDn.size file giving size in tb
23:21 🔗 GitHub46/#internetarchive.bak [IA.BAK] joeyh pushed 1 new commit to server: http://git.io/venRY
23:29 🔗 GitHub92/#internetarchive.bak IA.BAK/server 8ec147e Joey Hess: fix
23:29 🔗 GitHub92/#internetarchive.bak [IA.BAK] joeyh pushed 1 new commit to server: http://git.io/ven0Q
23:42 🔗 GitHub136/#internetarchive.bak IA.BAK/server 4e51e2c Joey Hess: fix
23:42 🔗 GitHub136/#internetarchive.bak IA.BAK/server 97896ea Joey Hess: prepare for SHARD2 and ALL; add munin graphs
23:42 🔗 GitHub136/#internetarchive.bak [IA.BAK] joeyh pushed 2 new commits to server: http://git.io/venuB
23:44 🔗 GitHub83/#internetarchive.bak IA.BAK/server 4adbce0 Joey Hess: redundant div
23:44 🔗 GitHub83/#internetarchive.bak [IA.BAK] joeyh pushed 1 new commit to server: http://git.io/venuM
23:44 🔗 closure midas: odd saw-toothing on the connected users grap
23:44 🔗 closure +h
23:46 🔗 yipdw closure: I see that git-annex is in FreeBSD's ports collection; would you recommend it as stable enough for this project?
23:46 🔗 yipdw my big storage thing is a ZFS pool that I'm looking to migrate off of Solaris
23:47 🔗 db48x what's the version number?
23:47 🔗 closure if the version is new enough
23:47 🔗 db48x needs 201502something
23:47 🔗 yipdw I don't know
23:47 🔗 yipdw oh wait, version of git-annex or the ZFS pool
23:47 🔗 closure the former
23:47 🔗 yipdw oh I plan to use whatever is newest and most appropriate
23:48 🔗 yipdw I haven't actually built the FreeBSD system yet, just want to know if I'll hit any known roadblocks doing so
23:50 🔗 yipdw accessing the pool via CIFS on an Ubuntu machine seemed(?) to work but I'd rather have one system powered instead of two
23:51 🔗 yipdw plus doing that I can avoid all the messages about crippled filesystems :P
23:52 🔗 GitHub76/#internetarchive.bak IA.BAK/server ac9ef18 Joey Hess: generate stats for all individual shards, and combined stats
23:52 🔗 GitHub76/#internetarchive.bak [IA.BAK] joeyh pushed 1 new commit to server: http://git.io/venzN
23:55 🔗 closure http://teamarchive1.fnf.archive.org/ia.bak/SHARD2.html
23:56 🔗 yipdw nice
23:57 🔗 tpw_rules oh dear
23:57 🔗 tpw_rules i don't have the disk for that :(
