#internetarchive.bak 2015-04-19,Sun


Time Nickname Message
00:04 🔗 tpw_rules what else should i do?
00:04 🔗 tpw_rules i did fsck --fast, sync
00:04 🔗 db48x git annex info will tell you the uuid of the repository
00:05 🔗 db48x tpw_rules: what does git annex version say?
00:05 🔗 tpw_rules 5.20150327-g19a1a35
00:06 🔗 db48x that's pretty new, but try upgrading to the same version that iabak uses
00:06 🔗 db48x 20150418
00:06 🔗 tpw_rules the 1d92 id says [here] next to it
00:07 🔗 db48x good, that means you're in the right place
00:08 🔗 tpw_rules does it know the repo id vs just the path?
00:08 🔗 db48x does what know?
00:08 🔗 tpw_rules git annex
00:08 🔗 tpw_rules ie is it saying [here] because the IDs match too or just the path
00:09 🔗 db48x oh, yes
00:09 🔗 db48x that's the uuid of the repository, which stays the same no matter where you move it
00:09 🔗 tpw_rules k. i had another that i deleted which must be the ff2f one
00:09 🔗 db48x helps greatly when dealing with removable media
00:09 🔗 db48x fair enough
00:10 🔗 db48x several of us have had to do that
00:10 🔗 tpw_rules give me a couple minutes and i'll try upgrading. just synced again just in case
00:12 🔗 db48x you can also check the activity.log on the git-annex branch after you do an fsck
00:13 🔗 db48x it should update it to include the timestamp of the last fsck you ran
00:13 🔗 db48x for example, 9de421fd65f290d7d15f56453e31e31bb3f447a8 is the commit created by my last fsck of shard1
00:15 🔗 tpw_rules how do i do that? i'm not a git master yet
00:15 🔗 tpw_rules actually when i last fscked it said something like 350 files failed
00:16 🔗 db48x git log -p <branch> will show you a log of the changes on the branch, along with what those changes were (which we call a "diff")
00:16 🔗 tpw_rules i know that much, just not the syntax
00:16 🔗 db48x a diff shows a - in front of lines that were removed, and a + in front of lines that were added
00:16 🔗 db48x lines with a space in front are called context, they're the lines around where the changes took place
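To make the diff conventions db48x describes concrete, here is a minimal Python sketch that classifies the lines of a unified diff (such as the output of `git log -p`) by their first character. It is an illustration only, not part of git or git-annex. It is simplified in one respect: a removed line whose own content begins with `--` would be mistaken for a `---` file header.

```python
def classify_diff_line(line: str) -> str:
    """Classify one line of a unified diff by its leading character."""
    # File headers start with '---' / '+++' and hunk headers with '@@'.
    if line.startswith(("---", "+++", "@@")):
        return "header"
    # A leading '-' marks a line that was removed.
    if line.startswith("-"):
        return "removed"
    # A leading '+' marks a line that was added.
    if line.startswith("+"):
        return "added"
    # A leading space marks a context line around the change.
    if line.startswith(" "):
        return "context"
    return "other"


# Hypothetical sample diff, for illustration only.
sample = [
    "--- a/example.txt",
    "+++ b/example.txt",
    "@@ -1,2 +1,2 @@",
    " unchanged line",
    "-old line",
    "+new line",
]
for line in sample:
    print(classify_diff_line(line), repr(line))
```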
00:17 🔗 tpw_rules i know what a diff is too :P
00:17 🔗 db48x oh, good :) what syntax don't you know then?
00:17 🔗 tpw_rules git log of a particular branch
00:18 🔗 db48x ah
00:18 🔗 db48x closure: is this the same bug as before: 3d968e2d417ddb798f7d849f46fa1c3f660e4a33?
00:18 🔗 tpw_rules i see me in there
00:18 🔗 tpw_rules with that id...
00:18 🔗 db48x good :)
00:19 🔗 tpw_rules so it's on your end then?
00:19 🔗 db48x let me sync again...
00:21 🔗 tpw_rules i wonder how hard making a fuse filesystem is. i've been thinking of one that can tolerate disappearance of file data
00:23 🔗 db48x hrm, I don't see 1d92 in the activity log
00:24 🔗 db48x closure: also a7cb94e6b32d
00:35 🔗 closure db48x: we'll keep hitting that bug until people upgrade already
00:35 🔗 closure it's only been what, 2 weeks?
00:38 🔗 db48x yea
00:38 🔗 db48x that last one is one of yours though
00:39 🔗 closure that last one is perfectly ok
00:40 🔗 aschmitz_ is now known as aschmitz
00:40 🔗 closure and actually, 3d968e2d417ddb798f7d849f46fa1c3f660e4a33? is perfectly ok too
00:45 🔗 db48x a7cb94e6b32d removes three other repositories from the activity log
00:46 🔗 closure no, it removes old entries where newer entries exist
00:46 🔗 db48x ahh
00:46 🔗 db48x confusing
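closure's point is that the commit compacts the log rather than dropping repositories: for each repository only the newest entry survives. The Python sketch below illustrates that "keep the newest entry per repository UUID" idea. The `(uuid, timestamp, activity)` tuple shape is a hypothetical simplification and not git-annex's actual activity.log format.

```python
from typing import Dict, List, Tuple

# Hypothetical entry shape: (repository uuid, timestamp, activity name).
Entry = Tuple[str, float, str]


def compact(entries: List[Entry]) -> List[Entry]:
    """Keep only the newest entry for each repository uuid."""
    newest: Dict[str, Entry] = {}
    for entry in entries:
        uuid, timestamp, _activity = entry
        # Replace the stored entry only if this one is strictly newer.
        if uuid not in newest or timestamp > newest[uuid][1]:
            newest[uuid] = entry
    # Sort by uuid so the result is deterministic.
    return sorted(newest.values(), key=lambda e: e[0])


log = [("a", 1.0, "fsck"), ("b", 2.0, "fsck"), ("a", 3.0, "fsck")]
print(compact(log))
```

Every repository still appears once after compaction; only its older, superseded entries are gone.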
00:52 🔗 tpw_rules http://svn.uvw.ru/mhddfs/trunk/README this looks like a pretty neat tool. ima try it next week and see how it does
00:52 🔗 tpw_rules like unionfs but properly writable
00:57 🔗 tpw_rules where's the "extract and run" git annex package for the latest version?
00:57 🔗 tpw_rules oh i see it
00:58 🔗 closure IA.BAK/install-git-annex
01:01 🔗 tpw_rules okay i just did the fsck
01:03 🔗 closure yay, worked
01:04 🔗 closure (would have been so much easier if you'd just use the script tho.. and seriously, I can't imagine we want people to not be using the script going forward..
01:04 🔗 tpw_rules hehe i'm about to redesign my setup. i'll do the script then :)
01:05 🔗 tpw_rules is it peer-to-peer yet? can i fetch data from other people's backups?
01:06 🔗 closure hmm, still 1.28 pb of storage in shard2 that is in repos that are due to expire
01:07 🔗 closure better than 3.99 pb tho
01:07 🔗 db48x uh, PB?
01:07 🔗 tpw_rules i think he's saying cumulatively
01:07 🔗 closure changes topic to: http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK | http://iabackup.archiveteam.org/ia.bak/ALL | #archiveteamtb
01:07 🔗 closure tb
01:07 🔗 closure wooop
01:07 🔗 tpw_rules but that also assumes that like 2000 people have backed up
01:07 🔗 db48x :)
01:07 🔗 closure changes topic to: http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK | http://iabackup.archiveteam.org/ia.bak/ALL | #archiveteam
01:07 🔗 closure I mean tb, not pb
01:08 🔗 closure changes topic to: http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK | http://iabackup.archiveteam.org/ia.bak/ALL | #archiveteam | repos listed in http://iabak.archivetea
01:08 🔗 closure changes topic to: http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK | http://iabackup.archiveteam.org/ia.bak/ALL | #archiveteam | repos listed in http://iabak.archivetea
01:09 🔗 closure wow, does efnet have super short topic sizes?
01:09 🔗 closure changes topic to: http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK | http://iabackup.archiveteam.org/ia.bak/ALL | #archiveteam
01:09 🔗 db48x yep
01:12 🔗 tpw_rules is there anything that's not super short?
01:17 🔗 closure changes topic to: http://iabackup.archiveteam.org/ia.bak/ | #archiveteam | 1 tb expiring from shard1 on MONDAY; check http://iabak.archiveteam.org/stats/SHARD1.expireleaderboard
01:18 🔗 tpw_rules has it not updated yet?
01:19 🔗 closure will on the hour, but you're ok now
01:19 🔗 db48x it last updated at 21:03
01:19 🔗 db48x (no idea what timezone, off-hand)
01:19 🔗 * closure wants root@katie and whoever it is who has root@ de and root@ lax. Kenshin maybe?
01:22 🔗 closure so, I'm thinking we could maybe prompt for a contact email and store it in the shard's own git repo
01:23 🔗 db48x weren't we going to tie it to a login at IA?
01:23 🔗 closure original plan, but seems we're not going to be tied to IA
01:23 🔗 db48x why is that?
01:24 🔗 closure ask sketch..
01:28 🔗 tpw_rules closure: does annex download stuff into .git/tmp first or only to the destination?
01:28 🔗 closure it goes to .git/annex/tmp first
01:28 🔗 tpw_rules hm
01:28 🔗 tpw_rules and then mv to the right place?
01:28 🔗 tpw_rules i was gonna symlink .git to another drive
01:29 🔗 closure might as well symlink shard1, 99.9% of the storage is in .git
01:29 🔗 tpw_rules then what's all the files that i can see?
01:29 🔗 db48x symlinks to things in .git
01:29 🔗 closure symlinks
01:30 🔗 tpw_rules oh
01:30 🔗 db48x .git/annex/objects
01:30 🔗 tpw_rules that puts a bit of a wrench in my plan
01:30 🔗 db48x they take up very little space, so you'd not gain much by having them separate
01:31 🔗 tpw_rules i was gonna store all the data on a jbod drive that i don't care if it fails and then the index and stuff on the raid. if the jbod fails, an fsck should be able to update the status and start redownloading what's left
01:31 🔗 tpw_rules cause i suspect git annex won't be happy if half the stuff in .git goes missing
01:31 🔗 db48x hmm
01:32 🔗 db48x if a git commit object disappears, it can download it again from the repository
01:32 🔗 db48x same if a whole pack file full of git commits disappears
01:32 🔗 tpw_rules is an 'object' the downloaded thing or the pointer to it?
01:32 🔗 tpw_rules i was more concerned about all the indexes and stuff
01:33 🔗 db48x .git/annex/objects holds the actual downloaded files
01:33 🔗 tpw_rules the plan is to have a dump place / quasi stress test for all the spare hard drives i have, so i'd like to only lose data rather than part of the repo meta stuff
01:33 🔗 tpw_rules can i just symlink that elsewhere instead?
01:34 🔗 db48x the repo metadata is the same across all repositories
01:34 🔗 db48x once you sync you've uploaded it to the central one, and downloaded anything you didn't have
01:34 🔗 db48x if that goes, you can just sync again
01:34 🔗 tpw_rules what defines the repo as existing?
01:34 🔗 db48x what do you mean?
01:35 🔗 tpw_rules ie the difference between it working and "not in a git repository"
01:36 🔗 db48x go to an empty directory and do a 'git init'
01:36 🔗 tpw_rules i was gonna try out this thing: http://svn.uvw.ru/mhddfs/trunk/README
01:36 🔗 db48x then look in .git
01:37 🔗 db48x that's fine
01:37 🔗 tpw_rules what is transfer/ used for?
01:38 🔗 db48x transfers between repositories would be my guess, though I don't know how it differs from tmp
01:38 🔗 tpw_rules i wanna symlink all the things that will contain data onto that drive and keep the meta safe
01:39 🔗 db48x there's not really any reason to
01:39 🔗 tpw_rules i'm not interested in having to reinit the repos if something dies, basically
01:39 🔗 db48x if it's damaged then it's either self repairing, or you can just check out the repository again
01:40 🔗 tpw_rules well i'll fiddle. i have like 8TB of drives i just happened to find and i may pick up one of those seagate archive ones
01:41 🔗 tpw_rules i also found an unopened box of 360K 5.25" floppies
01:41 🔗 db48x lol
01:42 🔗 tpw_rules though they are double sided
01:45 🔗 db48x closure: is there a way I can drop all unused items?
01:47 🔗 closure db48x: unused how?
01:49 🔗 tpw_rules oh yeah. rather than waiting for unused to process and dropping 1-<number of items>
01:49 🔗 tpw_rules just all
01:49 🔗 db48x closure: partially transferred, kept by fsck just in case
01:50 🔗 db48x oh, and one that's no longer used by any files
01:50 🔗 closure ah, you can remove files from .git/annex/tmp/ and .git/annex/bad/
01:50 🔗 tpw_rules i presume the list by `git annex unused`
01:50 🔗 closure we shouldn't have any ones not used by files
01:50 🔗 tpw_rules closure: what directories under .git/annex store data?
01:51 🔗 closure .git/annex/objects/
01:51 🔗 tpw_rules i mean transfer/ can have data too
01:51 🔗 tpw_rules at some points. and tmp/
01:52 🔗 tpw_rules and download/
01:53 🔗 closure .git/annex/objects is all that matters, the rest can be deleted
01:53 🔗 tpw_rules i just don't want it moving things between disks
01:54 🔗 closure then you want .git/annex/ on the same disk
01:55 🔗 tpw_rules oh ok. if that gets half-wiped, it will fix itself?
01:56 🔗 tpw_rules well i'll play around
01:57 🔗 tpw_rules prepare to get lots of bug reports on what happens when parts of it suddenly stop existing
01:57 🔗 tpw_rules (possibly while running too)
02:21 🔗 VADemon has quit IRC (Read error: Connection reset by peer)
02:27 🔗 chazchaz has quit IRC (Read error: Operation timed out)
02:30 🔗 chazchaz has joined #internetarchive.bak
02:55 🔗 niyaje4 has joined #internetarchive.bak
03:42 🔗 bpye_ has quit IRC (Read error: Connection reset by peer)
03:44 🔗 realeyes hi
03:45 🔗 bpye has joined #internetarchive.bak
03:53 🔗 niyaje4 has quit IRC (Nettalk6 - www.ntalk.de)
03:55 🔗 niyaje4 has joined #internetarchive.bak
04:04 🔗 Atluxity has joined #internetarchive.bak
04:07 🔗 niyaje4 has quit IRC (Read error: Operation timed out)
04:11 🔗 niyaje4 has joined #internetarchive.bak
04:33 🔗 SketchCow has joined #internetarchive.bak
04:33 🔗 svchfoo3 sets mode: +o SketchCow
05:31 🔗 niyaje4 has quit IRC (Ping timeout: 600 seconds)
08:31 🔗 S[h]O[r]T the download speeds are still terribly slow. ive been doing shard2 for like 2 weeks now
08:32 🔗 S[h]O[r]T if im lucky ill get 30-40Mbps over 4x threads
10:00 🔗 zottelbey has joined #internetarchive.bak
10:13 🔗 niyaje4 has joined #internetarchive.bak
11:04 🔗 niyaje4 has quit IRC (Ping timeout: 600 seconds)
15:26 🔗 ersi has quit IRC (Read error: Operation timed out)
15:40 🔗 ersi has joined #internetarchive.bak
15:40 🔗 svchfoo3 sets mode: +o ersi
15:53 🔗 tpw_rules does the site show how big each shard is?
15:55 🔗 Kazzy tpw_rules: yes
15:55 🔗 Kazzy http://iabackup.archiveteam.org/ia.bak/SHARD1
15:56 🔗 tpw_rules oh. didn't capitalize it
15:56 🔗 tpw_rules shouldn't "4 copies" be changed to "> 3 copies"?
15:57 🔗 Kazzy at the moment, it is just 4 copies, so that's fine for now (afaik)
15:59 🔗 tpw_rules i think there are more than four copies of some files
16:09 🔗 db48x https://github.com/ArchiveTeam/IA.BAK/blob/server/web/graph-gen.sh#L30
16:09 🔗 db48x yes, it counts 4 or more copies there
17:03 🔗 tpw_rules db48x: but the label is "4 copies" rather than "4 or more"
17:04 🔗 db48x yes, it is
17:05 🔗 db48x you can change it if you want
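The labelling issue tpw_rules raises can be illustrated with a small Python sketch that buckets per-file copy counts and folds everything at or above a cap into one "4+ copies" bucket. This is a hypothetical illustration of the labelling idea only; it is not the logic of `graph-gen.sh`, which is a shell script whose internals are not reproduced here.

```python
from collections import Counter
from typing import Iterable


def bucket_label(copies: int, cap: int = 4) -> str:
    """Label a file's copy count, folding everything >= cap into one bucket."""
    if copies >= cap:
        return f"{cap}+ copies"
    return f"{copies} copy" if copies == 1 else f"{copies} copies"


def histogram(copy_counts: Iterable[int], cap: int = 4) -> Counter:
    """Count how many files fall into each labelled bucket."""
    return Counter(bucket_label(c, cap) for c in copy_counts)


# Hypothetical copy counts for seven files.
print(histogram([0, 1, 1, 3, 4, 5, 9]))
```

With the cap at 4, files with 4, 5, and 9 copies all land under "4+ copies", which is what a label like "4 or more" would convey.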
17:07 🔗 db48x what's your github username?
17:07 🔗 tpw_rules tpwrules
17:09 🔗 db48x invited you
17:10 🔗 tpw_rules I see, sec
17:14 🔗 tpw_rules um i think i got expired
17:14 🔗 tpw_rules closure told me i was fine yesterday
17:16 🔗 tpw_rules my current repo is 1d92bde5-54d3-41bc-932e-d8e8e7bfff51 -- thomas@mom-server:/media/media_store/shared/ia.bak/shard1
17:16 🔗 tpw_rules FF2F is old
17:18 🔗 tpw_rules or does the fact that i'm not on that list mean i'm safe
17:18 🔗 tpw_rules i'm not sure if the number is time or total amount that will expire
17:32 🔗 atomotic has joined #internetarchive.bak
17:36 🔗 atomotic has quit IRC (Client Quit)
17:39 🔗 Senji So, I don't need to run iabak-cronjob while I'm still running iabak?
17:47 🔗 SN4T14_ has quit IRC (Read error: Connection reset by peer)
18:28 🔗 closure Senji: that's correct (won't hurt tho)
18:56 🔗 db48x tpw_rules: http://iabak.archiveteam.org/stats/SHARD1.expireleaderboard is all the ones that will be expired, once we turn on expiration
18:57 🔗 db48x the first number is the size in bytes of files in each repository
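Because the leaderboard reports raw byte counts, it is easy to slip by a factor of 1000, as happened earlier with "pb" versus "tb". The Python sketch below formats a byte count in decimal (SI) units. Whether the IA.BAK stats pages use decimal or binary (TiB) units is an assumption not settled in this log.

```python
def human_bytes(n: int) -> str:
    """Format a byte count using decimal (SI) units, up to PB."""
    units = ["B", "kB", "MB", "GB", "TB", "PB"]
    value = float(n)
    for unit in units:
        # Stop at the first unit where the value drops below 1000,
        # or at the largest unit available.
        if value < 1000 or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= 1000
    raise AssertionError("unreachable")


print(human_bytes(1_280_000_000_000))  # the 1.28 figure quoted for shard2
```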
18:58 🔗 Senji That must be most of shard1
18:58 🔗 db48x yes, those folks really need to upgrade!
18:59 🔗 db48x http://iabak.archiveteam.org/stats/SHARD2.expireleaderboard
19:00 🔗 pikhq Hooray, iabak-cronjob.
20:01 🔗 svchfoo2 has quit IRC (Remote host closed the connection)
20:05 🔗 Start has quit IRC (Disconnected.)
20:06 🔗 hatsefla1 has joined #internetarchive.bak
20:08 🔗 hatseflat has quit IRC (Write error: Broken pipe)
20:08 🔗 ppiixx has quit IRC (Write error: Broken pipe)
20:08 🔗 Start has joined #internetarchive.bak
20:22 🔗 Kazzy_ has joined #internetarchive.bak
20:23 🔗 LordNigh2 has joined #internetarchive.bak
20:29 🔗 jbenet_ has quit IRC (Ping timeout: 839 seconds)
20:29 🔗 Lord_Nigh has quit IRC (Remote host closed the connection)
20:29 🔗 LordNigh2 is now known as Lord_Nigh
20:29 🔗 Kazzy has quit IRC (Write error: Broken pipe)
20:29 🔗 Kazzy_ is now known as Kazzy
20:29 🔗 Muad-Dib has quit IRC (Remote host closed the connection)
20:29 🔗 marvinw has quit IRC (Remote host closed the connection)
20:29 🔗 lhobas_ has joined #internetarchive.bak
20:29 🔗 balrog has quit IRC (Remote host closed the connection)
20:29 🔗 Vito`_ has joined #internetarchive.bak
20:29 🔗 lhobas has quit IRC (Ping timeout: 412 seconds)
20:29 🔗 Vito` has quit IRC (Ping timeout: 412 seconds)
20:29 🔗 lhobas_ is now known as lhobas
20:29 🔗 Vito`_ is now known as Vito`
20:29 🔗 svchfoo2 has joined #internetarchive.bak
20:29 🔗 jbenet_ has joined #internetarchive.bak
20:29 🔗 svchfoo1 sets mode: +o Kazzy
20:31 🔗 balrog has joined #internetarchive.bak
20:32 🔗 Muad-Dib has joined #internetarchive.bak
20:33 🔗 ppiixx has joined #internetarchive.bak
20:34 🔗 svchfoo2 has quit IRC (Quit: Closing)
20:35 🔗 svchfoo2 has joined #internetarchive.bak
20:35 🔗 svchfoo3 sets mode: +o svchfoo2
20:46 🔗 marvinw has joined #internetarchive.bak
21:02 🔗 zottelbey has quit IRC (Remote host closed the connection)
22:24 🔗 pikhq has quit IRC (Remote host closed the connection)
22:47 🔗 pikhq has joined #internetarchive.bak
22:47 🔗 svchfoo3 sets mode: +o pikhq
23:54 🔗 svchfoo3 has quit IRC (Remote host closed the connection)
23:55 🔗 svchfoo3 has joined #internetarchive.bak
23:56 🔗 svchfoo2 sets mode: +o svchfoo3
