[00:04] what else should i do?
[00:04] i did fsck --fast, sync
[00:04] git annex info will tell you the uuid of the repository
[00:05] tpw_rules: what does git annex version say?
[00:05] 5.20150327-g19a1a35
[00:06] that's pretty new, but try upgrading to the same version that iabak uses
[00:06] 20150418
[00:06] the 1d92 id says [here] next to it
[00:07] good, that means you're in the right place
[00:08] does it know the repo id vs just the path?
[00:08] does what know?
[00:08] git annex
[00:08] ie is it saying [here] because the IDs match too or just the path
[00:09] oh, yes
[00:09] that's the uuid of the repository, which stays the same no matter where you move it
[00:09] k. i had another that i deleted which must be the ff2f one
[00:09] helps greatly when dealing with removable media
[00:09] fair enough
[00:10] several of us have had to do that
[00:10] give me a couple minutes and i'll try upgrading. just synced again just in case
[00:12] you can also check the activity.log on the git-annex branch after you do an fsck
[00:13] it should update it to include the timestamp of the last fsck you ran
[00:13] for example, 9de421fd65f290d7d15f56453e31e31bb3f447a8 is the commit created by my last fsck of shard1
[00:15] how do i do that? i'm not a git master yet
[00:15] actually when i last fscked it said something like 350 files failed
[00:16] git log -p will show you a log of the changes on the branch, along with what those changes were (which we call a "diff")
[00:16] i know that much, just not the syntax
[00:16] a diff shows a - in front of lines that were removed, and a + in front of lines that were added
[00:16] lines with a space in front are called context, they're the lines around where the changes took place
[00:17] i know what a diff is too :P
[00:17] oh, good :) what syntax don't you know then?
[00:17] git log of a particular branch
[00:18] ah
[00:18] closure: is this the same bug as before: 3d968e2d417ddb798f7d849f46fa1c3f660e4a33?
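(Editor's note: the "git log of a particular branch" question gets no answer in the log. In general git accepts a branch name and an optional path, as in `git log -p <branch> -- <path>`, so something like `git log -p git-annex -- activity.log` would be the likely form here. The sketch below only illustrates the diff markers described at [00:16], using Python's standard `difflib`. The log lines in it are made up for illustration and are not the real activity.log format.)

```python
import difflib


def unified(old_lines, new_lines, name="activity.log"):
    """Return the unified diff between two lists of lines as a list of strings."""
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile="a/" + name,
            tofile="b/" + name,
            lineterm="",  # inputs carry no trailing newlines
        )
    )


# Hypothetical contents before and after an fsck updates one entry.
old = ["repo-a fsck 100", "repo-b fsck 200", "repo-c fsck 300"]
new = ["repo-a fsck 100", "repo-b fsck 250", "repo-c fsck 300"]

diff = unified(old, new)
for line in diff:
    # '-' marks a removed line, '+' an added line, ' ' a context line.
    print(line)
```

The changed entry shows up as one `-` line and one `+` line, and the untouched neighbours appear with a leading space as context.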
[00:18] i see me in there
[00:18] with that id...
[00:18] good :)
[00:19] so it's on your end then?
[00:19] let me sync again...
[00:21] i wonder how hard making a fuse filesystem is. i've been thinking of one that can tolerate disappearance of file data
[00:23] hrm, I don't see 1d92 in the activity log
[00:24] closure: also a7cb94e6b32d
[00:35] db48x: we'll keep hitting that bug until people upgrade already
[00:35] it's only been what, 2 weeks?
[00:38] yea
[00:38] that last one is one of yours though
[00:39] that last one is perfectly ok
[00:40] *** aschmitz_ is now known as aschmitz
[00:40] and actually, 3d968e2d417ddb798f7d849f46fa1c3f660e4a33? is perfectly ok too
[00:45] a7cb94e6b32d removes three other repositories from the activity log
[00:46] no, it removes old entries where newer entries exist
[00:46] ahh
[00:46] confusing
[00:52] http://svn.uvw.ru/mhddfs/trunk/README this looks like a pretty neat tool. ima try it next week and see how it does
[00:52] like unionfs but properly writable
[00:57] where's the "extract and run" git annex package for the latest version?
[00:57] oh i see it
[00:58] IA.BAK/install-git-annex
[01:01] okay i just did the fsck
[01:03] yay, worked
[01:04] (would have been so much easier if you'd just use the script tho.. and seriously, I can't imagine we want people to not be using the script going forward..
[01:04] hehe i'm about to redesign my setup. i'll do the script then :)
[01:05] is it peer-to-peer yet? can i fetch data from other people's backups?
[01:06] hmm, still 1.28 pb of storage in shard2 that is in repos that are due to expire
[01:07] better than 3.99 pb tho
[01:07] uh, PB?
[01:07] i think he's saying cumulatively
[01:07] *** closure changes topic to: http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK | http://iabackup.archiveteam.org/ia.bak/ALL | #archiveteamtb
[01:07] tb
[01:07] wooop
[01:07] but that also assumes that like 2000 people have backed up
[01:07] :)
[01:07] *** closure changes topic to: http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK | http://iabackup.archiveteam.org/ia.bak/ALL | #archiveteam
[01:07] I mean tb, not pb
[01:08] *** closure changes topic to: http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK | http://iabackup.archiveteam.org/ia.bak/ALL | #archiveteam | repos listed in http://iabak.archivetea
[01:08] *** closure changes topic to: http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK | http://iabackup.archiveteam.org/ia.bak/ALL | #archiveteam | repos listed in http://iabak.archivetea
[01:09] wow, does efnet have super short topic sizes?
[01:09] *** closure changes topic to: http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK | http://iabackup.archiveteam.org/ia.bak/ALL | #archiveteam
[01:09] yep
[01:12] is there anything that's not super short?
[01:17] *** closure changes topic to: http://iabackup.archiveteam.org/ia.bak/ | #archiveteam | 1 tb expiring from shard1 on MONDAY; check http://iabak.archiveteam.org/stats/SHARD1.expireleaderboard
[01:18] has it not updated yet?
[01:19] will on the hour, but you're ok now
[01:19] it last updated at 21:03
[01:19] (no idea what timezone, off-hand)
[01:19] * closure wants root@katie and whoever it is who has root@ de and root@ lax. Kenshin maybe?
[01:22] so, I'm thinking we could maybe prompt for a contact email and store it in the shard's own git repo
[01:23] weren't we going to tie it to a login at IA?
[01:23] original plan, but seems we're not going to be tied to IA
[01:23] why is that?
[01:24] ask sketch..
[01:28] closure: does annex download stuff into .git/tmp first or only to the destination?
[01:28] it goes to .git/annex/tmp first
[01:28] hm
[01:28] and then mv to the right place?
[01:28] i was gonna symlink .git to another drive
[01:29] might as well symlink shard1, 99.9% of the storage is in .git
[01:29] then what's all the files that i can see?
[01:29] symlinks to things in .git
[01:29] symlinks
[01:30] oh
[01:30] .git/annex/objects
[01:30] that puts a bit of a wrench in my plan
[01:30] they take up very little space, so you'd not gain much by having them separate
[01:31] i was gonna store all the data on a jbod drive that i don't care if it fails and then the index and stuff on the raid. if the jbod fails, an fsck should be able to update the status and start redownloading what's left
[01:31] cause i suspect git annex won't be happy if half the stuff in .git goes missing
[01:31] hmm
[01:32] if a git commit object disappears, it can download it again from the repository
[01:32] same if a whole pack file full of git commits disappears
[01:32] is an 'object' the downloaded thing or the pointer to it?
[01:32] i was more concerned about all the indexes and stuff
[01:33] .git/annex/objects holds the actual downloaded files
[01:33] the plan is to have a dump place / quasi stress test for all the spare hard drives i have, so i'd like to only lose data rather than part of the repo meta stuff
[01:33] can i just symlink that elsewhere instead?
[01:34] the repo metadata is the same across all repositories
[01:34] once you sync you've uploaded it to the central one, and downloaded anything you didn't have
[01:34] if that goes, you can just sync again
[01:34] what defines the repo as existing?
[01:34] what do you mean?
[01:35] ie the difference between it working and "not in a git repository"
[01:36] go to an empty directory and do a 'git init'
[01:36] i was gonna try out this thing: http://svn.uvw.ru/mhddfs/trunk/README
[01:36] then look in .git
[01:37] that's fine
[01:37] what is transfer/ used for?
[01:38] transfers between repositories would be my guess, though I don't know how it differs from tmp
[01:38] i wanna symlink all the things that will contain data onto that drive and keep the meta safe
[01:39] there's not really any reason to
[01:39] i'm not interested in having to reinit the repos if something dies, basically
[01:39] if it's damaged then it's either self repairing, or you can just check out the repository again
[01:40] well i'll fiddle. i have like 8TB of drives i just happened to find and i may pick up one of those seagate archive ones
[01:41] i also found an unopened box of 360K 5.25" floppies
[01:41] lol
[01:42] though they are double sided
[01:45] closure: is there a way I can drop all unused items?
[01:47] db48x: unused how?
[01:49] oh yeah. rather than waiting for unused to process and dropping 1-
[01:49] just all
[01:49] closure: partially transferred, kept by fsck just in case
[01:50] oh, and one that's no longer used by any files
[01:50] ah, you can remove files from .git/annex/tmp/ and .git/annex/bad/
[01:50] i presume the list by `git annex unused`
[01:50] we shouldn't have any ones not used by files
[01:50] closure: what directories under .git/annex store data?
[01:51] .git/annex/objects/
[01:51] i mean transfer/ can have data too
[01:51] at some points. and tmp/
[01:52] and download/
[01:53] .git/annex/objects is all that matters, the rest can be deleted
[01:53] i just don't want it moving things between disks
[01:54] then you want .git/annex/ on the same disk
[01:55] oh ok. if that gets half-wiped, it will fix itself?
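(Editor's note: a minimal sketch of the layout discussed above, where the visible files are symlinks and the content sits under `.git/annex/objects`. It builds a toy directory tree only. The key name `MOCK-KEY` and the flat object directory are assumptions for illustration, not git-annex's real naming or directory hashing, and no git-annex command is run. It shows how a link whose content has vanished, for instance after a data disk dies, can be detected.)

```python
import os
import tempfile


def make_mock_repo(root):
    """Create one content file in a mock object store plus a symlink to it."""
    objects = os.path.join(root, ".git", "annex", "objects")
    os.makedirs(objects)
    content = os.path.join(objects, "MOCK-KEY")  # hypothetical key name
    with open(content, "wb") as f:
        f.write(b"x" * 4096)
    link = os.path.join(root, "item.bin")
    # Relative target, resolved from the directory that holds the link.
    os.symlink(os.path.join(".git", "annex", "objects", "MOCK-KEY"), link)
    return link, content


def is_dangling(link):
    """True if the path is a symlink whose target no longer exists."""
    return os.path.islink(link) and not os.path.exists(link)


with tempfile.TemporaryDirectory() as root:
    link, content = make_mock_repo(root)
    before = is_dangling(link)             # content present
    size_via_link = os.path.getsize(link)  # getsize follows the symlink
    os.remove(content)                     # simulate losing the data disk
    after = is_dangling(link)              # link survives, content is gone

print(before, size_via_link, after)
```

The link itself is tiny and survives the loss of the object store, which matches the point made in the log that the visible files take up very little space.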
[01:56] well i'll play around
[01:57] prepare to get lots of bug reports on what happens when parts of it suddenly stop existing
[01:57] (possibly while running too)
[02:21] *** VADemon has quit IRC (Read error: Connection reset by peer)
[02:27] *** chazchaz has quit IRC (Read error: Operation timed out)
[02:30] *** chazchaz has joined #internetarchive.bak
[02:55] *** niyaje4 has joined #internetarchive.bak
[03:42] *** bpye_ has quit IRC (Read error: Connection reset by peer)
[03:44] hi
[03:45] *** bpye has joined #internetarchive.bak
[03:53] *** niyaje4 has quit IRC (Nettalk6 - www.ntalk.de)
[03:55] *** niyaje4 has joined #internetarchive.bak
[04:04] *** Atluxity has joined #internetarchive.bak
[04:07] *** niyaje4 has quit IRC (Read error: Operation timed out)
[04:11] *** niyaje4 has joined #internetarchive.bak
[04:33] *** SketchCow has joined #internetarchive.bak
[04:33] *** svchfoo3 sets mode: +o SketchCow
[05:31] *** niyaje4 has quit IRC (Ping timeout: 600 seconds)
[08:31] the download speeds are still terribly slow. ive been doing shard2 for like 2 weeks now
[08:32] if im lucky ill get 30-40Mbps over 4x threads
[10:00] *** zottelbey has joined #internetarchive.bak
[10:13] *** niyaje4 has joined #internetarchive.bak
[11:04] *** niyaje4 has quit IRC (Ping timeout: 600 seconds)
[15:26] *** ersi has quit IRC (Read error: Operation timed out)
[15:40] *** ersi has joined #internetarchive.bak
[15:40] *** svchfoo3 sets mode: +o ersi
[15:53] does the site show how big each shard is?
[15:55] tpw_rules: yes
[15:55] http://iabackup.archiveteam.org/ia.bak/SHARD1
[15:56] oh. didn't capitalize it
[15:56] shouldn't "4 copies" be changed to "> 3 copies"?
[15:57] at the moment, it is just 4 copies, so that's fine for now (afaik)
[15:59] i think there are more than four copies of some files
[16:09] https://github.com/ArchiveTeam/IA.BAK/blob/server/web/graph-gen.sh#L30
[16:09] yes, it counts 4 or more copies there
[17:03] db48x: but the label is "4 copies" rather than "4 or more"
[17:04] yes, it is
[17:05] you can change it if you want
[17:07] what's your github username?
[17:07] tpwrules
[17:09] invited you
[17:10] I see, sec
[17:14] um i think i got expired
[17:14] closure told me i was fine yesterday
[17:16] my current repo is 1d92bde5-54d3-41bc-932e-d8e8e7bfff51 -- thomas@mom-server:/media/media_store/shared/ia.bak/shard1
[17:16] FF2F is old
[17:18] or does the fact that i'm not on that list mean i'm safe
[17:18] i'm not sure if the number is time or total amount that will expire
[17:32] *** atomotic has joined #internetarchive.bak
[17:36] *** atomotic has quit IRC (Client Quit)
[17:39] So, I don't need to run iabak-cronjob while I'm still running iabak?
[17:47] *** SN4T14_ has quit IRC (Read error: Connection reset by peer)
[18:28] Senji: that's correct (won't hurt tho)
[18:56] tpw_rules: http://iabak.archiveteam.org/stats/SHARD1.expireleaderboard is all the ones that will be expired, once we turn on expiration
[18:57] the first number is the size in bytes of files in each repository
[18:58] That must be most of shard1
[18:58] yes, those folks really need to upgrade!
[18:59] http://iabak.archiveteam.org/stats/SHARD2.expireleaderboard
[19:00] Hooray, iabak-cronjob.
[20:01] *** svchfoo2 has quit IRC (Remote host closed the connection)
[20:05] *** Start has quit IRC (Disconnected.)
[20:06] *** hatsefla1 has joined #internetarchive.bak
[20:08] *** hatseflat has quit IRC (Write error: Broken pipe)
[20:08] *** ppiixx has quit IRC (Write error: Broken pipe)
[20:08] *** Start has joined #internetarchive.bak
[20:22] *** Kazzy_ has joined #internetarchive.bak
[20:23] *** LordNigh2 has joined #internetarchive.bak
[20:29] *** jbenet_ has quit IRC (Ping timeout: 839 seconds)
[20:29] *** Lord_Nigh has quit IRC (Remote host closed the connection)
[20:29] *** LordNigh2 is now known as Lord_Nigh
[20:29] *** Kazzy has quit IRC (Write error: Broken pipe)
[20:29] *** Kazzy_ is now known as Kazzy
[20:29] *** Muad-Dib has quit IRC (Remote host closed the connection)
[20:29] *** marvinw has quit IRC (Remote host closed the connection)
[20:29] *** lhobas_ has joined #internetarchive.bak
[20:29] *** balrog has quit IRC (Remote host closed the connection)
[20:29] *** Vito`_ has joined #internetarchive.bak
[20:29] *** lhobas has quit IRC (Ping timeout: 412 seconds)
[20:29] *** Vito` has quit IRC (Ping timeout: 412 seconds)
[20:29] *** lhobas_ is now known as lhobas
[20:29] *** Vito`_ is now known as Vito`
[20:29] *** svchfoo2 has joined #internetarchive.bak
[20:29] *** jbenet_ has joined #internetarchive.bak
[20:29] *** svchfoo1 sets mode: +o Kazzy
[20:31] *** balrog has joined #internetarchive.bak
[20:32] *** Muad-Dib has joined #internetarchive.bak
[20:33] *** ppiixx has joined #internetarchive.bak
[20:34] *** svchfoo2 has quit IRC (Quit: Closing)
[20:35] *** svchfoo2 has joined #internetarchive.bak
[20:35] *** svchfoo3 sets mode: +o svchfoo2
[20:46] *** marvinw has joined #internetarchive.bak
[21:02] *** zottelbey has quit IRC (Remote host closed the connection)
[22:24] *** pikhq has quit IRC (Remote host closed the connection)
[22:47] *** pikhq has joined #internetarchive.bak
[22:47] *** svchfoo3 sets mode: +o pikhq
[23:54] *** svchfoo3 has quit IRC (Remote host closed the connection)
[23:55] *** svchfoo3 has joined #internetarchive.bak
[23:56] *** svchfoo2 sets mode: +o svchfoo3