[00:00] *** cmaldonad has joined #internetarchive.bak
[00:07] *** wp494_ has joined #internetarchive.bak
[00:12] *** wp494 has quit IRC (Read error: Operation timed out)
[00:39] registrar master d0b397e other SHARD16/pubkeys registration of deewiant+ia.bak on SHARD16
[01:01] *** chfoo has quit IRC (Read error: Operation timed out)
[01:02] *** chfoo has joined #internetarchive.bak
[01:03] *** svchfoo1 sets mode: +o chfoo
[01:57] registrar master 27aad98 other SHARD14/pubkeys registration of mr.business1148 on SHARD14
[01:58] *** Whopper_ has joined #internetarchive.bak
[02:01] *** whopper has quit IRC (Read error: Operation timed out)
[02:04] *** cmaldonad has quit IRC (Quit: This computer has gone to sleep)
[02:07] *** cmaldonad has joined #internetarchive.bak
[02:10] *** bwn has quit IRC (Ping timeout: 244 seconds)
[02:10] *** wp494_ is now known as wp494
[02:15] *** balrog has quit IRC (Read error: Operation timed out)
[02:17] *** bwn has joined #internetarchive.bak
[02:19] *** balrog has joined #internetarchive.bak
[02:21] *** db48x has joined #internetarchive.bak
[02:43] so, /usr/bin/env is the new thing to do?
[02:43] seems like a waste
[02:43] if we're not actually modifying the environment
[02:44] it's probably fine to skip it for e.g.
Ruby and Python scripts
[02:45] I just got used to it because virtualenv, rvm, rbenv, chruby etc
[02:46] I guess it does let us run something out of the PATH, rather than requiring an absolute path
[02:47] it does; however, if system configuration is going to be controlled via propellor, I don't think that's needed
[02:48] yipdw: https://github.com/ArchiveTeam/IA.BAK/issues/48
[02:48] oh the client end
[02:48] yeah, that's right, bash is not in /bin on FreeBSD
[02:57] same on my Drobo 5N, had to symlink it into /bin
[03:02] *** balrog has quit IRC (Quit: Bye)
[03:12] *** balrog has joined #internetarchive.bak
[03:29] at some point it makes sense to ditch the bash scripts and rewrite in python/perl/go/whatever that has fewer runtime deps
[03:32] indeed
[03:34] ./iabak
[03:34] You asked to pull from the remote 'origin', but did not specify
[03:34] a branch. Because this is not the default configured remote
[03:34] for your current branch, you must specify a branch on the command line.
[03:34] Welcome to iabak version 0.1
[03:43] well, using env doesn't break anything obvious
[03:46] using env is fine, except for those embedded systems that don't include env :P
[03:49] closure: hadn't thought of that
[04:05] *** cmaldonad has quit IRC (Quit: This computer has gone to sleep)
[04:36] *** cmaldonad has joined #internetarchive.bak
[05:11] *** cmaldonad has quit IRC (Quit: This computer has gone to sleep)
[05:52] I had to limit bandwidth. I forgot that Comcast now has a 1TB monthly cap; I downloaded ~130G before I realized I might want to use my internet for the rest of the month
[05:53] thelsdj: :)
[07:29] oops:
[07:29] On branch master
[07:29] Untracked files:
[07:29] () {
[07:29] filename && [[ ${spaceneeded} -lt ${spacelimit} ]]; do
[07:53] registrar master 410bf88 other SHARD14/pubkeys registration of db48x+iabak on SHARD14
[07:57] *** kyan has quit IRC (Quit: Leaving)
[07:57] registrar master c4fba5f other SHARD15/pubkeys registration of octobyt3 on
SHARD15
[08:03] *** Whopper_ has quit IRC (Remote host closed the connection)
[08:07] *** Whopper has joined #internetarchive.bak
[08:20] *** Whopper has quit IRC (Remote host closed the connection)
[08:32] *** atomotic has joined #internetarchive.bak
[08:38] I wish shuf were faster
[08:39] I think it's implemented in a really brain-dead way
[08:49] that said, I think I have a nice little improvement
[08:56] well, aside from one issue
[08:56] I need a combination of uniq and xargs
[08:57] something that calls a command on each group reported by uniq --group
[09:14] * db48x thinks about awk
[09:33] sumofbytes () {
[09:33] awk 'BEGIN {RS=ORS="\000";FS=OFS=SUBSEP=" "} {arr[$2]+=$1} END {for (i in arr) print arr[i],i}'
[09:33] }
[10:37] registrar master 269f212 other SHARD15/pubkeys registration of yongbae2 on SHARD15
[11:44] *** godane has quit IRC (Ping timeout: 250 seconds)
[11:58] *** godane has joined #internetarchive.bak
[12:47] *** godane has quit IRC (Quit: Leaving.)
[13:18] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[13:43] *** atomotic has joined #internetarchive.bak
[14:31] *** cmaldonad has joined #internetarchive.bak
[14:34] *** cmaldonad has quit IRC (Client Quit)
[14:40] *** atomotic has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…)
[14:51] *** atomotic has joined #internetarchive.bak
[14:57] *** Start has quit IRC (Quit: Disconnected.)
[15:37] *** godane has joined #internetarchive.bak
[16:02] db48x: I suspect it would suffice to read N filenames, shuffle them, git annex get them, and repeat
[16:03] unsure what the right value for N would be, probably something more than the expected number of concurrent downloaders of a shard
[16:03] 1000 seems reasonable..
[16:04] hmm.. except shards report back so rarely.
Maybe they'd all tend to get the first 1000 files before noticing others have
[16:08] registrar master 6273979 other SHARD4/pubkeys registration of milenko on SHARD4
[16:09] I should probably revisit https://git-annex.branchable.com/design/balanced_preferred_content/
[16:10] but to use it for IA.BAK, clients would need to register interest in a shard, and then wait for N other clients to also be registered before all starting downloading
[16:14] when I installed my client, it didn't ask to set a limit on disk free space. I've run the git config command to set the limit on the shard that was currently downloading and the client stopped when it hit that limit
[16:15] But now it's gone on to another shard and is downloading away.
[16:15] Can I set that global limit for all shards after the fact?
[16:24] milenko: set it on all your existing shards and it will be copied over to any new shards
[16:26] *** atomotic has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…)
[16:29] *** atomotic has joined #internetarchive.bak
[16:39] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[16:46] awesome, ty
[16:59] *** Start has joined #internetarchive.bak
[17:00] *** zz_CyberJ is now known as CyberJaco
[17:16] registrar master 121a6d1 other SHARD3/pubkeys registration of mitch on SHARD3
[17:17] *** asktoomuc has joined #internetarchive.bak
[17:17] Hi
[17:18] is it ok to ask support questions here to get set up with the system?
[17:30] well ok, I'll assume I can. I get a lot of "Cannot create symlink" "Operation not supported" errors when running ./iabak for the first time
[17:31] I'm running it on Debian 8, on a VM, with the repo located on a SMB share
[17:36] You got it, you can ask here.
[17:36] Sorry, we have a lot of people with different schedules.
[17:36] Someone will have ideas when they get back.
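A minimal sketch of the batching idea floated at 16:02 (read N filenames, shuffle them, fetch them, repeat). This is an illustration only, not code from iabak: the name `shuffled_chunks` and its parameters are hypothetical, and 1000 is just the value suggested in the channel. It only needs one batch in memory at a time, so output can start before the whole file list has been read.

```python
import random
from itertools import islice
from typing import Iterable, Iterator, List, Optional


def shuffled_chunks(
    names: Iterable[str],
    chunk_size: int = 1000,
    rng: Optional[random.Random] = None,
) -> Iterator[List[str]]:
    """Yield successive batches of up to chunk_size names, each shuffled.

    Hypothetical helper: only the order *within* a batch is randomized,
    so this is not a uniform shuffle of the whole input.
    """
    rng = rng or random.Random()
    it = iter(names)
    while True:
        chunk = list(islice(it, chunk_size))  # read the next N names
        if not chunk:
            return  # input exhausted
        rng.shuffle(chunk)  # randomize order inside this batch only
        yield chunk


if __name__ == "__main__":
    # Each batch could then be handed to `git annex get`; here we just print.
    for batch in shuffled_chunks((f"file{i}" for i in range(10)), chunk_size=4):
        print(batch)
```

As noted at 16:04, this does not by itself stop several clients from all starting on the same first batch.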
[17:37] We're trying to collect examples of these to be able to make a useful faq
[17:37] http://iabak.archiveteam.org/SHARD4.html is getting the love
[17:41] annoyingly, it seems to not have these issues when I created the repo locally on the VM's virtual disk, which is tiny...
[17:41] I wonder if it has to do with the way the SMB share is mounted using cifs utils
[17:42] registrar master 300dbba other SHARD12/pubkeys registration of ronin_fight on SHARD12
[17:47] SketchCow: i passed 1 million items
[17:49] i also think when i'm done with the nasa docs we should see about adding it as its own shard
[17:49] asktoomuc: smb share is not going to work well (or at all, really)
[17:50] asktoomuc: can you do nfs?
[17:51] Why is NFS better than SMB
[17:51] What's holding back SMB
[17:51] smb doesn't support standard n*x things like symlinks
[17:52] I'm not fond of enabling nfs on my network
[17:52] what nas are you using? you could mount an iscsi device
[17:53] I'm using unRAID 6 on a custom box
[17:54] hmm, no built-in iscsi support
[17:55] no, not as far as I know. I think NFS is also buggy on that release
[17:56] I think nfs is buggy
[17:56] what's the problem exactly with SMB and symlink? There's no support at all for them?
[17:56] smb is tricky with acls
[17:56] I can't find a good reference, but closure will know what sort of funky filesystem features git-annex requires
[17:59] so everyone here is running nfs?
[18:00] or just local storage
[18:00] can you present the storage directly as a virtual disk to the VM?
[18:02] I am unsure how to do that to be honest
[18:02] where is the VM running? on the unraid box?
[18:06] for now yes. I'm planning to migrate it when my ESXi host is up and running, probably next week
[18:06] So, whatever we do, we should NEVER suggest NFS
[18:06] Ever
[18:07] NFS is last decade's dogshit and its functionality has been heavily replaced.
[18:07] There's a chance that we have filesystems that don't work, and we should find solutions where possible.
[18:26] NFS is only really good for a tightly controlled LAN environment
[18:27] it falls apart when you need authentication or adaptability
[18:28] but either should work for a VM share :s
[18:43] *** Start has quit IRC (Quit: Disconnected.)
[18:46] asktoomuc: if you can, please post the full log somewhere; iabak is a lot of separate programs at the moment
[18:48] it is possible there's something that is not git-annex proper that is operating with violated assumptions
[18:48] Frogging: yeah, that's the reason why I don't want to use it on my lan. It's anything but tightly controlled. My fault really, but since SMB exists, I'd rather use that
[18:50] yipdw: I'll run the process again and link a pastebin here if that's alright
[18:50] that's fine
[18:52] ping db48x and closure on it too; I'll be out and about soon
[18:54] got it, thanks!
[19:10] yipdw db48x closure : the stdout: http://pastebin.com/ju9sDWUh and the stderr: http://pastebin.com/T081xNkt
[19:10] thanks for looking into this issue
[19:44] registrar master ad686f0 other SHARD10/pubkeys registration of wo_2iabak on SHARD10
[19:51] hey guys, is it fine if i tinker a bit with the css of IA.bak? I remember someone mentioning this some time ago
[19:52] luckcolor: Yes please
[19:52] Make a prototype page
[19:53] Ok, will do. Is it fine if i use some css library like bootstrap?
[19:53] If it's not too complicated to implement - we like simple
[19:54] Sure
[20:06] luckcolor: yea, go for it
[20:08] asktoomuc: SMB is currently unsupported
[20:08] yeah, that's the impression I got.
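A minimal sketch for checking up front whether a directory can hold symlinks, which is the failure behind the "Cannot create symlink" / "Operation not supported" errors reported at 17:30. It is not part of iabak or git-annex: the function name `supports_symlinks` is hypothetical, and the mount path in the usage lines is only a placeholder.

```python
import os
import tempfile


def supports_symlinks(directory: str) -> bool:
    """Return True if a symlink can be created inside `directory`.

    Hypothetical helper: creates a throwaway probe directory, tries to
    make a (dangling) symlink in it, and cleans up afterwards.
    """
    with tempfile.TemporaryDirectory(dir=directory) as probe:
        link = os.path.join(probe, "link")
        try:
            # The target does not need to exist for the test to be valid.
            os.symlink("target", link)
        except (OSError, NotImplementedError):
            return False  # e.g. "Operation not supported" on the mount
        return os.path.islink(link)


if __name__ == "__main__":
    # Placeholder path: replace with the mount you intend to use.
    path = tempfile.gettempdir()
    print(path, "supports symlinks:", supports_symlinks(path))
```

Running this against the intended mount before installing would show whether the symlink-heavy tarballs mentioned at 20:11 have any chance of unpacking there.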
[20:08] SketchCow seems to think it would be a good idea to support it
[20:09] or something else
[20:10] SMB is a pretty annoying filesystem :)
[20:11] at the moment the main problem is installing git-annex; the current tarballs heavily use symlinks
[20:11] do we have any alternative aside from local drives and NFS at the moment?
[20:16] I think I recall closure saying that he was going to make the release tarball not include any symlinks
[20:17] though I can't find it now
[20:32] ok, guess I'll have to wait for now. Thanks
[20:33] asktoomuc: you could install git-annex locally, and then test it to see if the backup process will work
[20:34] I have installed the git package
[20:34] git-annex, sorry
[20:34] not sure where to take it from there
[20:35] try creating a git repository, annexing some files into it, and then cloning that repository onto your smb mount
[20:36] (git init; git annex init; git annex add bigfile)
[20:36] - ${GITANNEX} find --print0 --not --copies "$NUMCOPIES" | xargs --no-run-if-empty -0 dirname -z -- | uniq -z | shuf -z | xargs --no-run-if-e
[20:36] + find_insufficient_copies | dirname_pipe | sumofbytes | shuf -z | rundownloads
[20:36] that is so much better
[20:58] closure: that would make shuf faster, but I was thinking that there might not be any need to read in the whole input stream before starting to print the shuffled output
[20:59] but it's only trivial if you know the length of the stream ahead of time
[21:03] *** bwn has quit IRC (Ping timeout: 961 seconds)
[21:22] oooh, my new SSD has arrived
[21:28] *** asktoomuc has quit IRC (Quit: Page closed)
[23:09] closure: what are the return values of git annex get?
[23:09] *** sevs has joined #internetarchive.bak
[23:20] *** bwn has joined #internetarchive.bak
[23:51] oops:
[23:51]  db48x  /  media  stuff  IA.BAK  git checkout -b wip-rundownloads-improvements
[23:51] fatal: unable to write new index file
[23:52] *** kyan has joined #internetarchive.bak
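For readers who do not speak awk, here is a rough Python equivalent of the `sumofbytes` step pasted at 09:33 and used in the 20:36 pipeline: it reads NUL-separated "size name" records and totals the sizes per name. This is a sketch under assumptions, not the project's code. The function name `sum_of_bytes` is hypothetical, and one behavior differs on purpose: the awk version uses whitespace field splitting, so `$2` is only the first word of a name, while this sketch splits on the first space and keeps the rest of the name intact.

```python
from collections import defaultdict
from typing import Dict


def sum_of_bytes(data: bytes) -> Dict[bytes, int]:
    """Total the leading size field of NUL-separated 'size name' records.

    Hypothetical helper mirroring the awk one-liner's grouping logic.
    """
    totals: Dict[bytes, int] = defaultdict(int)
    for record in data.split(b"\0"):
        if not record:
            continue  # skip the empty piece after a trailing NUL
        size, _, name = record.partition(b" ")  # split on the first space only
        totals[name] += int(size)
    return dict(totals)


if __name__ == "__main__":
    sample = b"10 dir/a\x0020 dir/a\x005 dir/b\x00"
    for name, total in sum_of_bytes(sample).items():
        print(total, name.decode())
```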