#internetarchive.bak 2016-11-17,Thu

Time Nickname Message
00:00 πŸ”— cmaldonad has joined #internetarchive.bak
00:07 πŸ”— wp494_ has joined #internetarchive.bak
00:12 πŸ”— wp494 has quit IRC (Read error: Operation timed out)
00:39 πŸ”— iabak-reg registrar master d0b397e other SHARD16/pubkeys registration of deewiant+ia.bak on SHARD16
01:01 πŸ”— chfoo has quit IRC (Read error: Operation timed out)
01:02 πŸ”— chfoo has joined #internetarchive.bak
01:03 πŸ”— svchfoo1 sets mode: +o chfoo
01:57 πŸ”— iabak-reg registrar master 27aad98 other SHARD14/pubkeys registration of mr.business1148 on SHARD14
01:58 πŸ”— Whopper_ has joined #internetarchive.bak
02:01 πŸ”— whopper has quit IRC (Read error: Operation timed out)
02:04 πŸ”— cmaldonad has quit IRC (Quit: This computer has gone to sleep)
02:07 πŸ”— cmaldonad has joined #internetarchive.bak
02:10 πŸ”— bwn has quit IRC (Ping timeout: 244 seconds)
02:10 πŸ”— wp494_ is now known as wp494
02:15 πŸ”— balrog has quit IRC (Read error: Operation timed out)
02:17 πŸ”— bwn has joined #internetarchive.bak
02:19 πŸ”— balrog has joined #internetarchive.bak
02:21 πŸ”— db48x has joined #internetarchive.bak
02:43 πŸ”— db48x so, /usr/bin/env is the new thing to do?
02:43 πŸ”— db48x seems like a waste
02:43 πŸ”— db48x if we're not actually modifying the environment
02:44 πŸ”— yipdw it's probably fine to skip it for e.g. Ruby and Python scripts
02:45 πŸ”— yipdw I just got used to it because virtualenv, rvm, rbenv, chruby etc
02:46 πŸ”— db48x I guess it does let us run something out of the PATH, rather than requiring an absolute path
02:47 πŸ”— yipdw it does; however, if system configuration is going to be controlled via propellor, I don't think that's needed
02:48 πŸ”— db48x yipdw: https://github.com/ArchiveTeam/IA.BAK/issues/48
02:48 πŸ”— yipdw oh the client end
02:48 πŸ”— yipdw yeah, that's right, bash is not in /bin on FreeBSD
02:57 πŸ”— thelsdj same on my Drobo 5N, had to symlink it into /bin
03:02 πŸ”— balrog has quit IRC (Quit: Bye)
03:12 πŸ”— balrog has joined #internetarchive.bak
03:29 πŸ”— closure at some point it makes sense to ditch the bash scripts and rewrite in python/perl/go/whatever that has less runtime deps
03:32 πŸ”— db48x indeed
03:34 πŸ”— db48x $ ./iabak
03:34 πŸ”— db48x You asked to pull from the remote 'origin', but did not specify
03:34 πŸ”— db48x a branch. Because this is not the default configured remote
03:34 πŸ”— db48x for your current branch, you must specify a branch on the command line.
03:34 πŸ”— db48x Welcome to iabak version 0.1
03:43 πŸ”— db48x well, using env doesn't break anything obvious
03:46 πŸ”— closure using env is fine, except for those embedded systems that don't include env :P
03:49 πŸ”— db48x closure: hadn't thought of that
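
The trade-off discussed above (a fixed interpreter path versus a PATH lookup through env) can be checked on a given machine with a sketch like the following. The helper name `find_interp` is made up for illustration and is not part of iabak; it only reports where the current PATH resolves a command, which is what a `#!/usr/bin/env bash` line would end up running.

```sh
#!/bin/sh
# find_interp is a hypothetical helper, not part of iabak.
# It prints the path that PATH resolves for the given command, or fails.
find_interp() {
    command -v "$1" 2>/dev/null || {
        echo "$1: not found in PATH" >&2
        return 1
    }
}

# Where would "#!/usr/bin/env bash" find bash on this machine?
find_interp bash || echo "bash is not installed here"

# The env approach itself depends on /usr/bin/env existing.
[ -x /usr/bin/env ] || echo "/usr/bin/env is missing, so an env shebang would fail"
```

If the reported path differs from the one hardcoded in a script's shebang, that script needs either the env form or a symlink like the one thelsdj mentions.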
04:05 πŸ”— cmaldonad has quit IRC (Quit: This computer has gone to sleep)
04:36 πŸ”— cmaldonad has joined #internetarchive.bak
05:11 πŸ”— cmaldonad has quit IRC (Quit: This computer has gone to sleep)
05:52 πŸ”— thelsdj i had to limit bandwidth, i forget that Comcast now has a 1TB monthly cap, I downloaded ~130G before I realized I might want to use my internet for the rest of the month
05:53 πŸ”— db48x thelsdj: :)
07:29 πŸ”— db48x oops:
07:29 πŸ”— db48x On branch master
07:29 πŸ”— db48x Untracked files:
07:29 πŸ”— db48x () {
07:29 πŸ”— db48x filename && [[ ${spaceneeded} -lt ${spacelimit} ]]; do
07:53 πŸ”— iabak-reg registrar master 410bf88 other SHARD14/pubkeys registration of db48x+iabak on SHARD14
07:57 πŸ”— kyan has quit IRC (Quit: Leaving)
07:57 πŸ”— iabak-reg registrar master c4fba5f other SHARD15/pubkeys registration of octobyt3 on SHARD15
08:03 πŸ”— Whopper_ has quit IRC (Remote host closed the connection)
08:07 πŸ”— Whopper has joined #internetarchive.bak
08:20 πŸ”— Whopper has quit IRC (Remote host closed the connection)
08:32 πŸ”— atomotic has joined #internetarchive.bak
08:38 πŸ”— db48x I wish shuf were faster
08:39 πŸ”— db48x I think it's implemented in a really brain-dead way
08:49 πŸ”— db48x that said, I think I have a nice little improvement
08:56 πŸ”— db48x well, aside from one issue
08:56 πŸ”— db48x I need a combination of uniq and xargs
08:57 πŸ”— db48x something that calls a command on each group reported by uniq --group
09:14 πŸ”— * db48x thinks about awk
09:33 πŸ”— db48x sumofbytes () {
09:33 πŸ”— db48x awk 'BEGIN {RS=ORS="\000";FS=OFS=SUBSEP=" "} {arr[$2]+=$1} END {for (i in arr) print arr[i],i}'
09:33 πŸ”— db48x }
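
To make the grouping step easier to follow, here is the same per-key summation on newline-delimited input. This is a simplified illustration rather than db48x's function: the name `sum_by_key` and the sample data are invented, and it assumes that keys contain no spaces or newlines.

```sh
#!/bin/sh
# sum_by_key is a hypothetical, newline-delimited variant for illustration.
# Input lines look like "<bytes> <key>"; output is one "<total> <key>" per key.
sum_by_key() {
    awk '{ total[$2] += $1 } END { for (key in total) print total[key], key }'
}

# awk does not guarantee the order of "for (key in total)", so sort the result.
printf '%s\n' '10 dirA' '20 dirB' '5 dirA' | sum_by_key | sort
# prints:
# 15 dirA
# 20 dirB
```

The version pasted in the channel sets the record separators to NUL so that it can sit in a pipeline with `shuf -z`; that part is left out here to keep the example portable across awk implementations.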
10:37 πŸ”— iabak-reg registrar master 269f212 other SHARD15/pubkeys registration of yongbae2 on SHARD15
11:44 πŸ”— godane has quit IRC (Ping timeout: 250 seconds)
11:58 πŸ”— godane has joined #internetarchive.bak
12:47 πŸ”— godane has quit IRC (Quit: Leaving.)
13:18 πŸ”— atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
13:43 πŸ”— atomotic has joined #internetarchive.bak
14:31 πŸ”— cmaldonad has joined #internetarchive.bak
14:34 πŸ”— cmaldonad has quit IRC (Client Quit)
14:40 πŸ”— atomotic has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…)
14:51 πŸ”— atomotic has joined #internetarchive.bak
14:57 πŸ”— Start has quit IRC (Quit: Disconnected.)
15:37 πŸ”— godane has joined #internetarchive.bak
16:02 πŸ”— closure db48x: I suspect it would suffice to read N filenames, shuffle them, git annex get them, and repeat
16:03 πŸ”— closure unsure what the right value for N would be, probably something more than the expected number of concurrent downloaders of a shard
16:03 πŸ”— closure 1000 seems reasonable..
16:04 πŸ”— closure hmm.. except shards report back so rarely. Maybe they'd all tend to get the first 1000 files before noticing others have
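
The read-N, shuffle, fetch, repeat idea could look roughly like this. It is a sketch under assumptions, not code from iabak: `shuffle_in_batches` is a hypothetical name, names arrive one per line, and the fetch step is whatever command is passed in (the demo uses `sort` as a stand-in so that it runs without a repository).

```sh
#!/bin/sh
# shuffle_in_batches is a hypothetical helper, not code from iabak.
# Usage: shuffle_in_batches N command [args...]
# Reads names from stdin, one per line. Every N names it shuffles the
# batch and pipes it to the given command on stdin, then repeats.
shuffle_in_batches() {
    n=$1
    shift
    batch=''
    count=0
    while IFS= read -r name; do
        batch="${batch}${name}
"
        count=$((count + 1))
        if [ "$count" -ge "$n" ]; then
            printf '%s' "$batch" | shuf | "$@"
            batch=''
            count=0
        fi
    done
    # Flush the final, possibly short, batch.
    if [ "$count" -gt 0 ]; then
        printf '%s' "$batch" | shuf | "$@"
    fi
}

# Demo with a stand-in command: each batch of 2 is shuffled, then "sort"
# puts it back in order, so only the batch boundaries matter here.
printf '%s\n' 1 2 3 4 5 | shuffle_in_batches 2 sort
```

In real use the command would be something that feeds the batch to `git annex get`, for example through `xargs`; that wiring is an assumption and would need care with names containing whitespace. Because output starts after the first N names, nothing has to wait for the whole list, but it does not address closure's concern that every client would start with the same first batch.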
16:08 πŸ”— iabak-reg registrar master 6273979 other SHARD4/pubkeys registration of milenko on SHARD4
16:09 πŸ”— closure I should probably revisit https://git-annex.branchable.com/design/balanced_preferred_content/
16:10 πŸ”— closure but to use it for IA.BAK, clients would need to register interest in a shard, and then wait for N other clients to also be registered before all starting downloading
16:14 πŸ”— milenko when I installed my client, it didn't ask to set a limit on disk free space. I've run the git config command to set the limit on the shard that was currently downloading and the client stopped when it hit that limit
16:15 πŸ”— milenko But now it's gone on to another shard and is downloading away.
16:15 πŸ”— milenko Can I set that global limit for all shards after the fact?
16:24 πŸ”— closure milenko: set it on all your existing shards and it will be copied over to any new shards
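
closure's advice amounts to running the same git config command in every existing shard clone. A minimal sketch, with assumptions stated up front: `annex.diskreserve` is git-annex's setting for reserved free space, but the log does not say which command milenko ran, so treat the key as a guess; `set_reserve_all` and the directory names in the usage comment are hypothetical.

```sh
#!/bin/sh
# set_reserve_all is a hypothetical helper, not part of iabak.
# Usage: set_reserve_all SIZE repo_dir...
# Sets annex.diskreserve (assumed to be the relevant key) in each
# directory that is a git repository, and skips anything else.
set_reserve_all() {
    reserve=$1
    shift
    for repo in "$@"; do
        [ -d "$repo/.git" ] || continue
        git -C "$repo" config annex.diskreserve "$reserve"
    done
}

# Example (directory names are an assumption about the local layout):
#   set_reserve_all 200G ./shard*
```

Per closure, the setting is then copied to any new shards, so this should only need to be run once over the shards that already exist.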
16:26 πŸ”— atomotic has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…)
16:29 πŸ”— atomotic has joined #internetarchive.bak
16:39 πŸ”— atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
16:46 πŸ”— milenko awesome, ty
16:59 πŸ”— Start has joined #internetarchive.bak
17:00 πŸ”— zz_CyberJ is now known as CyberJaco
17:16 πŸ”— iabak-reg registrar master 121a6d1 other SHARD3/pubkeys registration of mitch on SHARD3
17:17 πŸ”— asktoomuc has joined #internetarchive.bak
17:17 πŸ”— asktoomuc Hi
17:18 πŸ”— asktoomuc is it ok to ask support questions here to get setup with the system?
17:30 πŸ”— asktoomuc well ok, I'll assume I can. I get a lot of "Cannot create symlink" "Operation not supported" errors when running ./iabak for the first time
17:31 πŸ”— asktoomuc I'm running it on Debian 8, on a VM, with the repo located on a SMB share
17:36 πŸ”— SketchCow You got it, you can ask here.
17:36 πŸ”— SketchCow Sorry, we have a lot of people with different schedules.
17:36 πŸ”— SketchCow Someone will have ideas when they get back.
17:37 πŸ”— SketchCow We're trying to collect examples of these to be able to make a useful faq
17:37 πŸ”— SketchCow http://iabak.archiveteam.org/SHARD4.html is getting the love
17:41 πŸ”— asktoomuc annoyingly, it doesn't seem to have these issues when I create the repo locally on the VM's virtual disk, which is tiny...
17:41 πŸ”— asktoomuc I wonder if it has to do with the way the SMB share is mounted using cifs utils
17:42 πŸ”— iabak-reg registrar master 300dbba other SHARD12/pubkeys registration of ronin_fight on SHARD12
17:47 πŸ”— godane SketchCow: I passed 1 million items
17:49 πŸ”— godane I also think when I'm done with the nasa docs we should see about adding it as its own shard
17:49 πŸ”— trs80 asktoomuc: smb share is not going to work well (or at all, really)
17:50 πŸ”— trs80 asktoomuc: can you do nfs?
17:51 πŸ”— SketchCow Why is NFS better than SMB
17:51 πŸ”— SketchCow What's holding back SMB
17:51 πŸ”— trs80 smb doesn't support standard n*x things like symlinks
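
Whether a particular mount accepts symlinks can be checked directly before running iabak on it. The following is a sketch, assuming a probe file may be created in the target directory; `supports_symlinks` is a made-up name and the mount point in the usage comment is a placeholder.

```sh
#!/bin/sh
# supports_symlinks is a hypothetical helper, not part of iabak.
# It tries to create a throwaway symlink in the given directory and
# returns 0 if that works, 1 otherwise (including a missing directory).
supports_symlinks() {
    probe="$1/.symlink-probe.$$"
    if ln -s /dev/null "$probe" 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}

# Example (the mount point is a placeholder):
#   supports_symlinks /mnt/share || echo "no symlink support here"
```

A failure here would be consistent with the "Cannot create symlink" / "Operation not supported" errors asktoomuc reported, though it does not show whether the cause is the server or the cifs mount options.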
17:52 πŸ”— asktoomuc I'm not fond of enabling nfs on my network
17:52 πŸ”— trs80 what nas are you using? you could mount an iscsi device
17:53 πŸ”— asktoomuc I'm using unRAID 6 on a custom box
17:54 πŸ”— trs80 hmm, no built-in iscsi support
17:55 πŸ”— asktoomuc no, not as far as I know. I think NFS is also buggy on that release
17:56 πŸ”— Kaz I think nfs is buggy
17:56 πŸ”— asktoomuc what's the problem exactly with SMB and symlink? There's no support at all for them?
17:56 πŸ”— Meroje smb is tricky with acls
17:56 πŸ”— trs80 I can't find a good reference, but closure will know what sort of funky filesystem features git-annex requires
17:59 πŸ”— asktoomuc so everyone here is running nfs?
18:00 πŸ”— trs80 or just local storage
18:00 πŸ”— trs80 can you present the storage directly as a virtual disk to the VM?
18:02 πŸ”— asktoomuc I am unsure how to do that to be honest
18:02 πŸ”— trs80 where is the VM running? on the unraid box?
18:06 πŸ”— asktoomuc for now yes. I'm planning to migrate it when my ESXi host is up and running, probably next week
18:06 πŸ”— SketchCow So, whatever we do, we should NEVER suggest NFS
18:06 πŸ”— SketchCow Ever
18:07 πŸ”— SketchCow NFS is last decade's dogshit and its functionality has been heavily replaced.
18:07 πŸ”— SketchCow There's a chance that we have filesystems that don't work, and we should find solutions where possible.
18:26 πŸ”— Frogging NFS is only really good for a tightly controlled LAN environment
18:27 πŸ”— Frogging it falls apart when you need authentication or adaptability
18:28 πŸ”— Frogging but either should work for a VM share :s
18:43 πŸ”— Start has quit IRC (Quit: Disconnected.)
18:46 πŸ”— yipdw asktoomuc: if you can, please post the full log somewhere; iabak is a lot of separate programs at the moment
18:48 πŸ”— yipdw it is possible there's something that is not git-annex proper that is operating with violated assumptions
18:48 πŸ”— asktoomuc Frogging: yeah, that's the reason why I don't want to use it on my lan. It's anything but tightly controlled. My fault really, but since SMB exists, I'd rather use that
18:50 πŸ”— asktoomuc yipdw: I'll run the process again and link a pastebin here if that's alright
18:50 πŸ”— yipdw that's fine
18:52 πŸ”— yipdw ping db48x and closure on it too; I'll be out and about soon
18:54 πŸ”— asktoomuc got it, thanks!
19:10 πŸ”— asktoomuc yipdw db48x closure : the stdout: http://pastebin.com/ju9sDWUh and the stderr: http://pastebin.com/T081xNkt
19:10 πŸ”— asktoomuc thanks for looking into this issue
19:44 πŸ”— iabak-reg registrar master ad686f0 other SHARD10/pubkeys registration of wo_2iabak on SHARD10
19:51 πŸ”— luckcolor hey guys, is it fine if I tinker a bit with the css of IA.bak? I remember someone mentioning this some time ago
19:52 πŸ”— SketchCow luckcolor: Yes please
19:52 πŸ”— SketchCow Make a prototype page
19:53 πŸ”— luckcolor Ok, will do. Is it fine if i use some css library like bootstrap?
19:53 πŸ”— SketchCow If it's not too complicated to implement - we like simple
19:54 πŸ”— luckcolor Sure
20:06 πŸ”— db48x luckcolor: yea, go for it
20:08 πŸ”— db48x asktoomuc: SMB is currently unsupported
20:08 πŸ”— asktoomuc yeah, that's the impression I got.
20:08 πŸ”— asktoomuc SketchCow seems to think it would be a good idea to support it
20:09 πŸ”— asktoomuc or something else
20:10 πŸ”— db48x SMB is a pretty annoying filesystem :)
20:11 πŸ”— db48x at the moment the main problem is installing git-annex; the current tarballs heavily use symlinks
20:11 πŸ”— asktoomuc do we have any alternative aside from local drives and NFS at the moment?
20:16 πŸ”— db48x I think I recall closure saying that he was going to make the release tarball not include any symlinks
20:17 πŸ”— db48x though I can't find it now
20:32 πŸ”— asktoomuc ok, guess I'll have to wait for now. Thanks
20:33 πŸ”— db48x asktoomuc: you could install git-annex locally, and then test it to see if the backup process will work
20:34 πŸ”— asktoomuc I have installed the git package
20:34 πŸ”— asktoomuc git-annex, sorry
20:34 πŸ”— asktoomuc not sure where to take it from there
20:35 πŸ”— db48x try creating a git repository, annexing some files into it, and then cloning that repository onto your smb mount
20:36 πŸ”— db48x (git init; git annex init; git annex add bigfile)
20:36 πŸ”— db48x - ${GITANNEX} find --print0 --not --copies "$NUMCOPIES" | xargs --no-run-if-empty -0 dirname -z -- | uniq -z | shuf -z | xargs --no-run-if-e
20:36 πŸ”— db48x + find_insufficient_copies | dirname_pipe | sumofbytes | shuf -z | rundownloads
20:36 πŸ”— db48x that is so much better
20:58 πŸ”— db48x closure: that would make shuf faster, but I was thinking that there might not be any need to read in the whole input stream before starting to print the shuffled output
20:59 πŸ”— db48x but it's only trivial if you know the length of the stream ahead of time
21:03 πŸ”— bwn has quit IRC (Ping timeout: 961 seconds)
21:22 πŸ”— db48x oooh, my new SSD has arrived
21:28 πŸ”— asktoomuc has quit IRC (Quit: Page closed)
23:09 πŸ”— db48x closure: what are the return values of git annex get?
23:09 πŸ”— sevs has joined #internetarchive.bak
23:20 πŸ”— bwn has joined #internetarchive.bak
23:51 πŸ”— db48x oops:
23:51 πŸ”— db48x db48x /media/stuff/IA.BAK $ git checkout -b wip-rundownloads-improvements
23:51 πŸ”— db48x fatal: unable to write new index file
23:52 πŸ”— kyan has joined #internetarchive.bak
