#internetarchive.bak 2015-06-09,Tue


Time Nickname Message
00:34 🔗 SketchCow I see the drop on shard3
00:39 🔗 tpw_rules bah i think running 8 iabaks is going to be faster than one with -J8
00:43 🔗 tpw_rules i'm going to try re-encoding some of this .dv because it is scary big
01:05 🔗 Senji SketchCow: ahh, the expire *finally* happened
01:07 🔗 Senji mariusz: I find you need about 500MB free space to run sync
01:09 🔗 Senji tpw_rules: I run between about 5 and 20 iabaks in parallel depending on what I'm doing right now. But only for 4 hours a day :(
01:41 🔗 primus104 has quit IRC (Leaving.)
01:49 🔗 JesseW has joined #internetarchive.bak
02:20 🔗 patrickod has quit IRC (Ping timeout: 258 seconds)
02:20 🔗 patrickod has joined #internetarchive.bak
02:22 🔗 tpw-rules has joined #internetarchive.bak
02:23 🔗 db48x has quit IRC (hub.efnet.us irc.Prison.NET)
02:23 🔗 dirt has quit IRC (hub.efnet.us irc.Prison.NET)
02:23 🔗 midas has quit IRC (hub.efnet.us irc.Prison.NET)
02:23 🔗 tpw_rules has quit IRC (hub.efnet.us irc.Prison.NET)
02:24 🔗 dirt_ has joined #internetarchive.bak
02:25 🔗 midas1 has joined #internetarchive.bak
02:32 🔗 tpw-rules if i nuke all the symlinks, how can i get them back?
02:39 🔗 dirt_ is now known as dirt
02:39 🔗 tpw-rules is now known as tpw_rules
02:43 🔗 tpw_rules basically, can i reconstruct .git/objects and .git/refs and all the symlinks without reiniting the repo
03:06 🔗 jbenet__ has quit IRC (Remote host closed the connection)
03:09 🔗 mattl has quit IRC (Quit: X_X)
03:23 🔗 tpw_rules warning: it's about to look like i lost everything
03:23 🔗 tpw_rules but it will come back shortly
03:35 🔗 tpw_rules okay everything should be fixed
03:37 🔗 tpw_rules Senji: have you experimented with running multiple iabaks that are each running in parallel?
03:39 🔗 tpw_rules hm
03:39 🔗 tpw_rules we also have files that appear to be binary images of CDs. why not turn them into FLAC?
04:31 🔗 DFJustin basically IA has a policy of never fucking with what people upload
04:32 🔗 DFJustin it generates compressed streamable versions but the original is always kept
04:33 🔗 DFJustin so, they could work with material donors to use more suitable formats but what's there is gonna stay
04:33 🔗 DFJustin unless the donor goes through and redoes it all
04:34 🔗 tpw_rules ah. why? time/cpu?
04:34 🔗 tpw_rules concern that it might get damaged?
04:36 🔗 DFJustin I don't work there so I can't really speak to why
04:36 🔗 tpw_rules i don't profess to understand but 'never' seems a bit extreme to me
04:36 🔗 tpw_rules ah well
04:36 🔗 DFJustin it's just against their whole way of doing things
04:38 🔗 DFJustin I mean they have an entire warehouse full of books that they've already scanned, just to have them still
04:38 🔗 SketchCow Sigh
04:39 🔗 tpw_rules is $80 a good deal for a 3TB hard drive
04:40 🔗 DFJustin http://www.edwardbetts.com/price_per_tb/
04:41 🔗 tpw_rules yes. i can get that top one $10 less per drive for a deal of 3
04:41 🔗 tpw_rules i may do it because i like knowing i have ridiculous amounts of data
04:42 🔗 tpw_rules although holy hell those samsungs
04:42 🔗 tpw_rules i didn't expect an external to be cheaper
04:43 🔗 tpw_rules although at some point i'll have to stop since i can't keep doing this for 1762 more shards
04:46 🔗 DFJustin they've been cheaper for quite some time, price discrimination for consumer vs enterprise buyers or some such
04:46 🔗 DFJustin IA buys shitloads of externals and then takes the covers off
04:46 🔗 tpw_rules interesting
04:47 🔗 tpw_rules which specifically?
04:47 🔗 tpw_rules eh it's getting time for sleep. night
04:47 🔗 tpw_rules currently running at about 10 floppies/second
05:29 🔗 JesseW has quit IRC (Ping timeout: 265 seconds)
05:30 🔗 JesseW has joined #internetarchive.bak
06:13 🔗 zz_CyberJ is now known as CyberJaco
06:16 🔗 jbenet__ has joined #internetarchive.bak
06:27 🔗 mattl has joined #internetarchive.bak
06:38 🔗 ryang has quit IRC (Remote host closed the connection)
07:00 🔗 JesseW has quit IRC (Quit: Leaving.)
07:07 🔗 iabak-reg registrar master 3aad84a other SHARD8/pubkeys registration of brian on SHARD8
07:19 🔗 ryang has joined #internetarchive.bak
07:24 🔗 CyberJaco is now known as zz_CyberJ
07:54 🔗 Senji tpw_rules: not significantly; I like my progress bars too much :)
08:16 🔗 primus104 has joined #internetarchive.bak
08:27 🔗 primus105 has joined #internetarchive.bak
08:33 🔗 primus104 has quit IRC (Read error: Operation timed out)
08:52 🔗 hendi__ has joined #internetarchive.bak
08:54 🔗 primus105 has quit IRC (Leaving.)
09:03 🔗 primus104 has joined #internetarchive.bak
09:03 🔗 primus104 has quit IRC (Client Quit)
10:50 🔗 midas1 is now known as midas
11:42 🔗 db48x has joined #internetarchive.bak
12:00 🔗 SketchCow DFJustin: IA does not buy shitloads of externals anymore.
12:01 🔗 SketchCow They have deals with dealers - it's just when the crisis hit due to the flooding, they found this whack-ass channel for a while.
12:01 🔗 SketchCow Also, they've jumped to 6tb and 8tbs
12:02 🔗 ppiixx i think backblaze still strip down external drives
12:06 🔗 mariusz has joined #internetarchive.bak
12:45 🔗 sankin has joined #internetarchive.bak
12:46 🔗 sankin has quit IRC (Client Quit)
12:51 🔗 sankin has joined #internetarchive.bak
13:30 🔗 sep332 tpw_rules: have you looked at the IA page for the huge dv file?
13:31 🔗 sep332 wouldn't surprise me if there's already a re-encoded version there
13:34 🔗 SketchCow There is, he's just bothered that we are archiving the original instead of the re-encode.
13:34 🔗 SketchCow That's the breaks.
13:34 🔗 sep332 yeah, ok
13:49 🔗 primus104 has joined #internetarchive.bak
13:50 🔗 hendi_ has joined #internetarchive.bak
13:58 🔗 hendi__ has quit IRC (Ping timeout: 512 seconds)
13:58 🔗 Start has quit IRC (Disconnected.)
14:37 🔗 Start has joined #internetarchive.bak
14:54 🔗 Start has quit IRC (Disconnected.)
14:55 🔗 Start has joined #internetarchive.bak
14:56 🔗 primus104 has quit IRC (Leaving.)
15:02 🔗 tpw_rules yeah i'm whiny
15:03 🔗 tpw_rules i'm getting lots of failures with shard 8
15:04 🔗 Senji Is your internet connection flaky?
15:04 🔗 tpw_rules i don't think so
15:05 🔗 tpw_rules it never has been
15:05 🔗 Senji I've not had any problems downloading files over 100GB; even though it's taken weeks
15:05 🔗 tpw_rules me either, but my connection is much faster
15:06 🔗 tpw_rules https://i.imgur.com/2fANPGC.png
15:06 🔗 tpw_rules i'm not sure how it decides 'failed'
15:06 🔗 tpw_rules i remember having issues with shard 1 when things got shut down and it couldn't find them. has the same happened to shard 8?
15:06 🔗 Senji I don't think your connection is *much* faster; but you're probably downloading more than 4 hours a day :-)
15:07 🔗 tpw_rules oh yeah, that would put a damper on things
15:07 🔗 tpw_rules how do you make it resume?
15:08 🔗 Senji I 'killall -STOP wget' at 6am every morning and 'killall -CONT wget' at 2am (via cronjob)
15:09 🔗 tpw_rules why do you have to do that?
15:09 🔗 Senji shard3 had some files that can't download because the filename quoting is wrong
15:10 🔗 Senji Usage charging; it's essentially free during those four hours overnight.
15:10 🔗 tpw_rules ohhhh :'(
15:10 🔗 tpw_rules that mucho sucks
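[Note: the pause/resume schedule Senji describes (resume wget at 2am, pause it at 6am) can be sketched as below. This is only an illustration, not Senji's actual cron setup: the function names and the idea of driving it from a single hourly job are assumptions. It relies on SIGSTOP/SIGCONT, so it is Unix-only.]

```python
import signal
import subprocess

# Hours taken from the log: downloads resume at 02:00 and pause at 06:00.
RESUME_HOUR = 2
PAUSE_HOUR = 6


def in_download_window(hour: int, start: int = RESUME_HOUR, end: int = PAUSE_HOUR) -> bool:
    """Return True if `hour` (0-23) falls inside the half-open window [start, end)."""
    if start <= end:
        return start <= hour < end
    # Window wraps past midnight, e.g. start=22, end=2.
    return hour >= start or hour < end


def signal_for_hour(hour: int) -> signal.Signals:
    """SIGCONT inside the cheap window, SIGSTOP outside it."""
    return signal.SIGCONT if in_download_window(hour) else signal.SIGSTOP


def apply_schedule(hour: int) -> None:
    """Hypothetical helper: send the matching signal to every wget process.

    Equivalent to 'killall -CONT wget' or 'killall -STOP wget'.
    """
    sig = signal_for_hour(hour)
    # sig.name is "SIGCONT" or "SIGSTOP"; killall takes the name without "SIG".
    subprocess.run(["killall", f"-{sig.name[3:]}", "wget"], check=False)
```

[Calling apply_schedule(datetime.now().hour) from an hourly job would have the same effect as the two separate cron entries, at the cost of re-sending a signal that is already in effect.]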
15:16 🔗 tpw_rules how much storage do you have? i'm concerned about buying more before i look back and i've spent $7000
15:16 🔗 Senji Umm, at the moment I'm working on lying around bits of storage that's under my desk. Maybe 6-8TB in total?
15:16 🔗 tpw_rules ahh
15:17 🔗 tpw_rules that's what i've been doing but i'm running out
15:17 🔗 tpw_rules http://www.newegg.com/Product/Product.aspx?Item=N82E16822152425 kinda considering a couple of those
15:17 🔗 Senji db48x: a passing systemd expert points at http://www.freedesktop.org/software/systemd/man/sd_booted.html as documenting /run/systemd/system/ as the canonical test for systemd
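[Note: the test Senji relays, checking for the /run/systemd/system/ directory, could look roughly like this. The function name and the overridable path parameter are assumptions made for illustration; this is not the iabak code, and it only approximates what sd_booted() does internally.]

```python
import os


def booted_with_systemd(run_dir: str = "/run/systemd/system") -> bool:
    """Return True if the systemd runtime directory exists.

    The default path is the one named in the log; `run_dir` is a
    hypothetical parameter so the check can be exercised elsewhere.
    """
    return os.path.isdir(run_dir)
```

[A caller might use this to choose between installing a systemd unit and falling back to a cron job.]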
15:19 🔗 tpw_rules it also doesn't look like syncing is happening
15:20 🔗 Senji I've had the occasional problem with hourlysync dying
15:20 🔗 * tpw_rules starts in another terminal
15:21 🔗 Senji Those drives look very cheap; but maybe that's just the thing where US to UK pricing is a bit odd for computer hardware
15:22 🔗 Senji I have a http://www.amazon.co.uk/gp/product/B00JV1YQY0 that I use in my media-centre-pc (not for iabak :)) which cost me £119.99
15:24 🔗 tpw_rules well in theory it's the same drive
15:25 🔗 tpw_rules lol the guy who asked seagate for a new motherboard
15:26 🔗 tpw_rules i have a stack of 5TB wd red NAS drives in my server but they're pretty expensive
15:27 🔗 Senji I think you'd want greens for this. Purples maybe if you think it's more of a 24/7 usage pattern
15:27 🔗 tpw_rules well i'm also very cheap
15:27 🔗 tpw_rules i can get good seagate enclosures for like $16 a pop on ebay
15:28 🔗 tpw_rules it looks like the chipset is having problems with 5TB tho
15:29 🔗 tpw_rules that box on newegg has a lot of the same complaints as the samsung
15:29 🔗 tpw_rules i'll have to monitor spinning of my drives. not sure how good linux is at spinning them down over usb
15:32 🔗 JesseW has joined #internetarchive.bak
15:44 🔗 Start has quit IRC (Disconnected.)
15:56 🔗 JesseW has quit IRC (Quit: Leaving.)
16:03 🔗 Start has joined #internetarchive.bak
16:21 🔗 iabak-reg registrar master 902ec21 other SHARD3/pubkeys registration of mail on SHARD3
16:33 🔗 hendi_ has quit IRC (ircd.choopa.net irc.eversible.com)
16:33 🔗 Lord_ has quit IRC (ircd.choopa.net irc.eversible.com)
16:33 🔗 espes__ has quit IRC (ircd.choopa.net irc.eversible.com)
16:33 🔗 svchfoo3 has quit IRC (ircd.choopa.net irc.eversible.com)
16:33 🔗 bpye has quit IRC (ircd.choopa.net irc.eversible.com)
16:38 🔗 espes___ has joined #internetarchive.bak
16:42 🔗 bpye_ has joined #internetarchive.bak
16:43 🔗 hendi has joined #internetarchive.bak
16:43 🔗 Lord__ has joined #internetarchive.bak
16:43 🔗 Start has quit IRC (Disconnected.)
17:22 🔗 primus104 has joined #internetarchive.bak
18:38 🔗 tpw_rules closure: when is a new build with --incomplete coming?
18:54 🔗 Start has joined #internetarchive.bak
18:58 🔗 tpw_rules i think the fsck is locking too much
18:59 🔗 tpw_rules it seems to hold a lock on the archive while the checksum is happening so there's a lot of time that ./iabak is waiting to get a new file to download
19:06 🔗 tpw_rules it looks like a lot of stuff is going from cdbbsarchive
19:06 🔗 tpw_rules also closure i have even more crazy ideas
19:24 🔗 Start has quit IRC (Disconnected.)
20:08 🔗 mariusz has quit IRC (Read error: Operation timed out)
20:51 🔗 hendi has quit IRC (Ping timeout: 259 seconds)
20:59 🔗 sankin has quit IRC (Leaving.)
22:09 🔗 garyrh has quit IRC (Read error: Connection reset by peer)
22:27 🔗 Start has joined #internetarchive.bak
22:33 🔗 garyrh has joined #internetarchive.bak
22:54 🔗 Apathy_ has joined #internetarchive.bak
23:14 🔗 Senji tpw_rules: I've not noticed any locking between fscking and getting; and one of my shards takes more than 5 hours to fsck
23:14 🔗 Senji (cleopatra is Realyl Slow™ by modern computer standards)
23:15 🔗 Senji Various modes of failing to get a file take *forever* though; I think waiting for timeouts in wget
23:41 🔗 mntasauri has quit IRC (Max SendQ exceeded)
23:41 🔗 mntasauri has joined #internetarchive.bak
23:43 🔗 mntasauri has quit IRC (Max SendQ exceeded)
23:43 🔗 mntasauri has joined #internetarchive.bak
23:46 🔗 primus104 has quit IRC (hub.se irc.efnet.pl)
