[00:34] I see the drop on shard3
[00:39] bah i think running 8 iabaks is going to be faster than one with -J8
[00:43] i'm going to try re-encoding some of this .dv because it is scary big
[01:05] SketchCow: ahh, the expire *finally* happened
[01:07] mariusz: I find you need about 500MB free space to run sync
[01:09] tpw_rules: I run between about 5 and 20 iabaks in parallel depending on what I'm doing right now. But only for 4 hours a day :(
[01:41] *** primus104 has quit IRC (Leaving.)
[01:49] *** JesseW has joined #internetarchive.bak
[02:20] *** patrickod has quit IRC (Ping timeout: 258 seconds)
[02:20] *** patrickod has joined #internetarchive.bak
[02:22] *** tpw-rules has joined #internetarchive.bak
[02:23] *** db48x has quit IRC (hub.efnet.us irc.Prison.NET)
[02:23] *** dirt has quit IRC (hub.efnet.us irc.Prison.NET)
[02:23] *** midas has quit IRC (hub.efnet.us irc.Prison.NET)
[02:23] *** tpw_rules has quit IRC (hub.efnet.us irc.Prison.NET)
[02:24] *** dirt_ has joined #internetarchive.bak
[02:25] *** midas1 has joined #internetarchive.bak
[02:32] if i nuke all the symlinks, how can i get them back?
[02:39] *** dirt_ is now known as dirt
[02:39] *** tpw-rules is now known as tpw_rules
[02:43] basically, can i reconstruct .git/objects and .git/refs and all the symlinks without reiniting the repo
[03:06] *** jbenet__ has quit IRC (Remote host closed the connection)
[03:09] *** mattl has quit IRC (Quit: X_X)
[03:23] warning: it's about to look like i lost everything
[03:23] but it will come back shortly
[03:35] okay everything should be fixed
[03:37] Senji: have you experimented with running multiple iabaks that are each running in parallel?
[03:39] hm
[03:39] we also have files that appear to be binary images of CDs. why not turn them into FLAC?
[04:31] basically IA has a policy of never fucking with what people upload
[04:32] it generates compressed streamable versions but the original is always kept
[04:33] so, they could work with material donors to use more suitable formats but what's there is gonna stay
[04:33] unless the donor goes through and redoes it all
[04:34] ah. why? time/cpu?
[04:34] concern that it might get damaged?
[04:36] I don't work there so I can't really speak to why
[04:36] i don't profess to understand but 'never' seems a bit extreme to me
[04:36] ah well
[04:36] it's just against their whole way of doing things
[04:38] I mean they have an entire warehouse full of books that they've already scanned, just to have them still
[04:38] Sigh
[04:39] is $80 a good deal for a 3TB hard drive
[04:40] http://www.edwardbetts.com/price_per_tb/
[04:41] yes. i can get that top one $10 less per drive for a deal of 3
[04:41] i may do it because i like knowing i have ridiculous amounts of data
[04:42] although holy hell those samsungs
[04:42] i didn't expect an external to be cheaper
[04:43] although at some point i'll have to stop since i can't keep doing this for 1762 more shards
[04:46] they've been cheaper for quite some time, price discrimination for consumer vs enterprise buyers or some such
[04:46] IA buys shitloads of externals and then takes the covers off
[04:46] interesting
[04:47] which specifically?
[04:47] eh it's getting time for sleep. night
[04:47] currently running at about 10 floppies/second
[05:29] *** JesseW has quit IRC (Ping timeout: 265 seconds)
[05:30] *** JesseW has joined #internetarchive.bak
[06:13] *** zz_CyberJ is now known as CyberJaco
[06:16] *** jbenet__ has joined #internetarchive.bak
[06:27] *** mattl has joined #internetarchive.bak
[06:38] *** ryang has quit IRC (Remote host closed the connection)
[07:00] *** JesseW has quit IRC (Quit: Leaving.)
[07:07] registrar master 3aad84a other SHARD8/pubkeys registration of brian on SHARD8
[07:19] *** ryang has joined #internetarchive.bak
[07:24] *** CyberJaco is now known as zz_CyberJ
[07:54] tpw_rules: not significantly; I like my progress bars too much :)
[08:16] *** primus104 has joined #internetarchive.bak
[08:27] *** primus105 has joined #internetarchive.bak
[08:33] *** primus104 has quit IRC (Read error: Operation timed out)
[08:52] *** hendi__ has joined #internetarchive.bak
[08:54] *** primus105 has quit IRC (Leaving.)
[09:03] *** primus104 has joined #internetarchive.bak
[09:03] *** primus104 has quit IRC (Client Quit)
[10:50] *** midas1 is now known as midas
[11:42] *** db48x has joined #internetarchive.bak
[12:00] DFJustin: IA does not buy shitloads of externals anymore.
[12:01] They have deals with dealers - it's just when the crisis hit due to the flooding, they found this whack-ass channel for a while.
[12:01] Also, they've jumped to 6tb and 8tbs
[12:02] i think backblaze still strip down external drives
[12:06] *** mariusz has joined #internetarchive.bak
[12:45] *** sankin has joined #internetarchive.bak
[12:46] *** sankin has quit IRC (Client Quit)
[12:51] *** sankin has joined #internetarchive.bak
[13:30] tpw_rules: have you looked at the IA page for the huge dv file?
[13:31] wouldn't surprise me if there's already a re-encoded version there
[13:34] There is, he's just bothered that we are archiving the original instead of the re-encode.
[13:34] That's the breaks.
[13:34] yeah, ok
[13:49] *** primus104 has joined #internetarchive.bak
[13:50] *** hendi_ has joined #internetarchive.bak
[13:58] *** hendi__ has quit IRC (Ping timeout: 512 seconds)
[13:58] *** Start has quit IRC (Disconnected.)
[14:37] *** Start has joined #internetarchive.bak
[14:54] *** Start has quit IRC (Disconnected.)
[14:55] *** Start has joined #internetarchive.bak
[14:56] *** primus104 has quit IRC (Leaving.)
[15:02] yeah i'm whiny
[15:03] i'm getting lots of failures with shard 8
[15:04] Is your internet connection flaky?
[15:04] i don't think so
[15:05] it never has been
[15:05] I've not had any problems downloading files over 100GB; even though it's taken weeks
[15:05] me either, but my connection is much faster
[15:06] https://i.imgur.com/2fANPGC.png
[15:06] i'm not sure how it decides 'failed'
[15:06] i remember having issues with shard 1 when things got shut down and it couldn't find them. has the same happened to shard 8?
[15:06] I don't think your connection is *much* faster; but you're probably downloading more than 4 hours a day :-)
[15:07] oh yeah, that would put a damper on things
[15:07] how do you make it resume?
[15:08] I 'killall -STOP wget' at 6am every morning and 'killall -CONT wget' at 2am (via cronjob)
[15:09] why do you have to do that?
[15:09] shard3 had some files that can't download because the filename quoting is wrong
[15:10] Usage charging; it's essentially free during those four hours overnight.
[15:10] ohhhh :'(
[15:10] that mucho sucks
[15:16] how much storage do you have? i'm concerned about buying more before i look back and i've spent $7000
[15:16] Umm, at the moment I'm working on lying around bits of storage that's under my desk. Maybe 6-8TB in total?
[15:16] ahh
[15:17] that's what i've been doing but i'm running out
[15:17] http://www.newegg.com/Product/Product.aspx?Item=N82E16822152425 kinda considering a couple of those
[15:17] db48x: a passing systemd expert points at http://www.freedesktop.org/software/systemd/man/sd_booted.html as documenting /run/systemd/system/ as the canonical test for systemd
[15:19] it also doesn't look like syncing is happening
[15:20] I've had the occasional problem with hourlysync dying
[15:20] * tpw_rules starts in another terminal
[15:21] Those drives look very cheap; but maybe that's just the thing where US to UK pricing is a bit odd for computer hardware
[15:22] I have a http://www.amazon.co.uk/gp/product/B00JV1YQY0 that I use in my media-centre-pc (not for iabak :)) which cost me £119.99
[15:24] well in theory it's the same drive
[15:25] lol the guy who asked seagate for a new motherboard
[15:26] i have a stack of 5TB wd red NAS drives in my server but they're pretty expensive
[15:27] I think you'd want greens for this. Purples maybe if you think it's more of a 24/7 usage pattern
[15:27] well i'm also very cheap
[15:27] i can get good seagate enclosures for like $16 a pop on ebay
[15:28] it looks like the chipset is having problems with 5TB tho
[15:29] that box on newegg has a lot of the same complaints as the samsung
[15:29] i'll have to monitor spinning of my drives. not sure how good linux is at spinning them down over usb
[15:32] *** JesseW has joined #internetarchive.bak
[15:44] *** Start has quit IRC (Disconnected.)
[15:56] *** JesseW has quit IRC (Quit: Leaving.)
[16:03] *** Start has joined #internetarchive.bak
[16:21] registrar master 902ec21 other SHARD3/pubkeys registration of mail on SHARD3
[16:33] *** hendi_ has quit IRC (ircd.choopa.net irc.eversible.com)
[16:33] *** Lord_ has quit IRC (ircd.choopa.net irc.eversible.com)
[16:33] *** espes__ has quit IRC (ircd.choopa.net irc.eversible.com)
[16:33] *** svchfoo3 has quit IRC (ircd.choopa.net irc.eversible.com)
[16:33] *** bpye has quit IRC (ircd.choopa.net irc.eversible.com)
[16:38] *** espes___ has joined #internetarchive.bak
[16:42] *** bpye_ has joined #internetarchive.bak
[16:43] *** hendi has joined #internetarchive.bak
[16:43] *** Lord__ has joined #internetarchive.bak
[16:43] *** Start has quit IRC (Disconnected.)
[17:22] *** primus104 has joined #internetarchive.bak
[18:38] closure: when is a new build with --incomplete coming?
[18:54] *** Start has joined #internetarchive.bak
[18:58] i think the fsck is locking too much
[18:59] it seems to hold a lock on the archive while the checksum is happening so there's a lot of time that ./iabak is waiting to get a new file to download
[19:06] it looks like a lot of stuff is going from cdbbsarchive
[19:06] also closure i have even more crazy ideas
[19:24] *** Start has quit IRC (Disconnected.)
[20:08] *** mariusz has quit IRC (Read error: Operation timed out)
[20:51] *** hendi has quit IRC (Ping timeout: 259 seconds)
[20:59] *** sankin has quit IRC (Leaving.)
[22:09] *** garyrh has quit IRC (Read error: Connection reset by peer)
[22:27] *** Start has joined #internetarchive.bak
[22:33] *** garyrh has joined #internetarchive.bak
[22:54] *** Apathy_ has joined #internetarchive.bak
[23:14] tpw_rules: I've not noticed any locking between fscking and getting; and one of my shards takes more than 5 hours to fsck
[23:14] (cleopatra is Really Slow™ by modern computer standards)
[23:15] Various modes of failing to get a file take *forever* though; I think waiting for timeouts in wget
[23:41] *** mntasauri has quit IRC (Max SendQ exceeded)
[23:41] *** mntasauri has joined #internetarchive.bak
[23:43] *** mntasauri has quit IRC (Max SendQ exceeded)
[23:43] *** mntasauri has joined #internetarchive.bak
[23:46] *** primus104 has quit IRC (hub.se irc.efnet.pl)