Time |
Nickname |
Message |
00:27
🔗
|
|
Start has joined #internetarchive.bak |
00:28
🔗
|
iabak-reg |
registrar master cf23978 other SHARD6/pubkeys registration of infinity on SHARD6 |
00:52
🔗
|
iabak-reg |
registrar master 038c73d other SHARD7/pubkeys registration of kurtmclester on SHARD7 |
00:55
🔗
|
Kazzy |
shard5 done |
00:59
🔗
|
SketchCow |
https://soundcloud.com/renjith-vijay/fast-and-furious-7-get-low-ringtone |
00:59
🔗
|
SketchCow |
That plays every time we move shards. |
01:00
🔗
|
tpw_rules |
i've got 6TB down so far in general. how can i check progress on a specific shard? |
01:00
🔗
|
tpw_rules |
closure: ^ |
01:01
🔗
|
Kazzy |
tpw_rules: http://iabak.archiveteam.org/SHARD5.html |
01:01
🔗
|
Kazzy |
can substitute SHARD% for the shard you want to view |
01:01
🔗
|
tpw_rules |
i mean me specifically |
01:01
🔗
|
Kazzy |
SHARD5* |
01:15
🔗
|
closure |
yeah, we could make individual pages for each registered user. already have the graphs and data. |
01:19
🔗
|
tpw_rules |
what about just have it as part of git annex? |
01:20
🔗
|
closure |
tpw_rules: just run git-annex info |
01:23
🔗
|
tpw_rules |
oh duh |
01:24
🔗
|
tpw_rules |
why are some "unknown size"? |
01:24
🔗
|
closure |
xml files with dummy size 0 in the IA survey |
01:24
🔗
|
tpw_rules |
is the working tree the ones on your server or on disk size |
01:25
🔗
|
closure |
that's the total files, local annex is on your disk |
01:25
🔗
|
tpw_rules |
ok |
01:26
🔗
|
tpw_rules |
okay so i have about a third of shard6 |
01:26
🔗
|
tpw_rules |
gonna need to add more disk soon |
01:30
🔗
|
db48x |
tpw_rules: what's the UUID of your copy of the shard? |
01:32
🔗
|
db48x |
cd shard5; git config --get annex.uuid |
01:32
🔗
|
|
primus104 has quit IRC (Leaving.) |
01:34
🔗
|
|
BotOfWar has quit IRC (Quit: left4dead) |
01:35
🔗
|
tpw_rules |
shard5: ada4868b-f969-40ba-ae7e-d58a20f1c477 shard6: a0c7d62a-53ec-4641-8ecf-af122b2cd5da shard1: 6ca85911-2e94-49b6-bbdc-cb1be080e300 |
01:35
🔗
|
tpw_rules |
i'm talking about shard6 |
01:35
🔗
|
|
VADemon has joined #internetarchive.bak |
01:35
🔗
|
tpw_rules |
can you remove all the ones that claim to be twatson52 that aren't those? i made a bunch of ones that got blown up in various ways |
01:36
🔗
|
tpw_rules |
they all have the same email |
01:36
🔗
|
db48x |
http://iabak.archiveteam.org:8080/render/?width=1060&height=733&_salt=1432604162.838&target=iabak.shardstats.leaderboard.a0c7d62a-53ec-4641-8ecf-af122b2cd5da.shard6 |
01:37
🔗
|
db48x |
http://iabak.archiveteam.org:8080/render/?width=1060&height=733&_salt=1432604200.538&target=iabak.shardstats.leaderboard.a0c7d62a-53ec-4641-8ecf-af122b2cd5da.shard6&from=-1weeks |
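The two graph links above differ only in the trailing `from=-1weeks`. As a rough sketch, assuming the `_salt` parameter is only a cache-buster and can be left out, a per-repository graph URL could be built like this (the function name is made up for illustration; host, path and target pattern are copied from the links in the log):

```python
from urllib.parse import urlencode

# Host and path copied from the URLs pasted in the channel.
BASE = "http://iabak.archiveteam.org:8080/render/"


def leaderboard_graph_url(uuid: str, shard: str, since: str = "-1weeks",
                          width: int = 1060, height: int = 733) -> str:
    """Build a render URL for one repository's leaderboard series."""
    # Target pattern as seen in the log: ...leaderboard.<annex uuid>.<shard>
    target = f"iabak.shardstats.leaderboard.{uuid}.{shard}"
    query = urlencode({
        "width": width,
        "height": height,
        "target": target,
        "from": since,
    })
    return f"{BASE}?{query}"


print(leaderboard_graph_url("a0c7d62a-53ec-4641-8ecf-af122b2cd5da", "shard6"))
```

The uuid argument is the value printed by `git config --get annex.uuid` inside the shard directory.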
01:37
🔗
|
db48x |
pretty nice slope there |
01:38
🔗
|
db48x |
you can use iabak.archiveteam.org:8080 to look at all the different stats we collect |
01:38
🔗
|
db48x |
gives you a nice browser and graph editor |
01:38
🔗
|
tpw_rules |
is there any reason the line is all dashy? |
01:39
🔗
|
db48x |
we only check in once per hour |
01:39
🔗
|
db48x |
you can use the graph editor to make it join up the dots |
01:39
🔗
|
tpw_rules |
but shouldn't it connect the dots? i get the stairstep but it looks like it was done with a dashed line |
01:39
🔗
|
tpw_rules |
ah |
01:40
🔗
|
db48x |
it doesn't know a priori what a gap in the data means |
01:40
🔗
|
db48x |
in this case it means a missed sample; it could just as easily mean a zero |
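To make the point about gaps concrete, here is a small Python sketch (not the graphing tool's own code) of the two readings of a missing hourly sample. The function names and sample values are invented for illustration.

```python
from typing import List, Optional

Series = List[Optional[float]]


def fill_with_last(samples: Series) -> Series:
    """Read a gap as a missed check-in: carry the previous value forward."""
    filled: Series = []
    last: Optional[float] = None
    for value in samples:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def fill_with_zero(samples: Series) -> Series:
    """Read a gap as a genuine zero."""
    return [0.0 if value is None else value for value in samples]


# Made-up hourly sizes; None marks an hour with no sample.
hourly = [1.0, None, 1.4, None, None, 2.1]
print(fill_with_last(hourly))
print(fill_with_zero(hourly))
```

For a download total that only grows, carrying the last value forward gives the stairstep shape; filling with zero would draw false drops.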
02:08
🔗
|
tpw_rules |
has anybody else fiddled around with union filesystems for storing all the data? |
02:09
🔗
|
db48x |
I use ZFS |
02:09
🔗
|
trs80 |
what's the goal of using a union fs? |
02:09
🔗
|
tpw_rules |
i'm coming at it from the point of having piles of hard drives that could be used |
02:09
🔗
|
tpw_rules |
basically be able to have a failure of one drive not affect all the data stored |
02:09
🔗
|
db48x |
yea |
02:10
🔗
|
db48x |
ZFS allows striping/mirroring/raid across disks |
02:10
🔗
|
tpw_rules |
and, more importantly, a failure of one drive not make the other data unusable like it would in a RAID 0 |
02:11
🔗
|
db48x |
true |
02:11
🔗
|
db48x |
I prefer raid though, so that the occasional error can be corrected |
02:11
🔗
|
tpw_rules |
but otoh i'm thinking about the idea of writing a disk manager that would create a new repo on each disk and manage filling them up and fscking as plugged in |
02:11
🔗
|
tpw_rules |
so i don't need 45 drive enclosures and usb ports |
02:11
🔗
|
db48x |
:) |
02:12
🔗
|
db48x |
we should have a way for the iabak client to help that out |
02:12
🔗
|
tpw_rules |
but then i have to actually swap drives around and i'm lazy |
02:13
🔗
|
tpw_rules |
well how in the hands of the masses do you want this to be? i know several people who would be happy to help if it were just something they kept running on windows in the tray |
02:14
🔗
|
tpw_rules |
and i really don't think making it all shell scripts is conducive to that ideal |
02:15
🔗
|
db48x |
very true |
02:16
🔗
|
sep332 |
With multiple disks you have to make sure that they're not all getting the same data |
02:17
🔗
|
tpw_rules |
also note that if i say i have an idea, you can say "shut up and make it" because i could. but here goes. have a config file of archive destinations. for each, give it a name and directory. in each directory, put a file with the name and a uuid |
02:17
🔗
|
sep332 |
No good having 3 copies of a file sitting next to each other in one drawer |
02:17
🔗
|
tpw_rules |
this way you can have the directory be a mount point (and the same for multiple destinations) and the file can be used to determine if a disk is mounted/which one is there |
02:17
🔗
|
tpw_rules |
sep332: oh yeah, that's important |
02:18
🔗
|
db48x |
sep332: yea, git annex can handle that if we tell it to |
02:18
🔗
|
tpw_rules |
though i get the approach of building this from common tools so it can be rebuilt in the event of thermonuclear meltdown, i think it limits options a lot |
02:19
🔗
|
tpw_rules |
but this program could also read the config and determine which ones need to be checked and ask for the appropriate one to show up at the directory |
02:19
🔗
|
tpw_rules |
so it could easily support multiple drives that are swapped out or multiple locations on one computer |
02:20
🔗
|
tpw_rules |
also perhaps the option to download locally so it can fsck one drive and later copy files to another when it comes back |
02:22
🔗
|
db48x |
we can simplify that more, even |
02:23
🔗
|
db48x |
git annex repositories already have their uuid |
02:23
🔗
|
tpw_rules |
can you scan a directory for a git repo easily? |
02:23
🔗
|
db48x |
yes |
02:23
🔗
|
tpw_rules |
recursively? |
02:23
🔗
|
tpw_rules |
but yeah. just read the .git/whatever in the pointed to directory to make sure |
02:23
🔗
|
db48x |
so if the iabak script listened for devices being added (udev, on linux) and found the repositories on them, it could fsck them |
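A minimal sketch of the "find the repositories and read their uuid" step, assuming the uuid sits in the `[annex]` section of each repository's `.git/config` (the same value `git config --get annex.uuid` prints). The function names are hypothetical and the udev hook itself is not shown.

```python
import os
import re
from typing import Dict, Optional


def read_annex_uuid(config_path: str) -> Optional[str]:
    """Return the uuid from the [annex] section of a git config file, if any."""
    if not os.path.isfile(config_path):
        return None
    in_annex = False
    with open(config_path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if line.startswith("["):
                # A new section starts; remember whether it is [annex].
                in_annex = line.lower() == "[annex]"
            elif in_annex:
                match = re.match(r"uuid\s*=\s*(\S+)", line)
                if match:
                    return match.group(1)
    return None


def find_annex_repos(root: str) -> Dict[str, str]:
    """Walk `root` and map each git-annex repository path to its uuid."""
    found: Dict[str, str] = {}
    for dirpath, dirnames, _files in os.walk(root):
        if ".git" in dirnames:
            uuid = read_annex_uuid(os.path.join(dirpath, ".git", "config"))
            if uuid:
                found[dirpath] = uuid
            dirnames[:] = []  # do not descend into the repository itself
    return found
```

Calling `find_annex_repos` on a freshly mounted drive would give the list of repositories a client could then fsck.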
02:24
🔗
|
tpw_rules |
ignore my previous two questions |
02:24
🔗
|
db48x |
yep |
02:24
🔗
|
tpw_rules |
does git exist for windows without cygwin? |
02:24
🔗
|
tpw_rules |
and could git annex? |
02:24
🔗
|
db48x |
no |
02:25
🔗
|
tpw_rules |
is it possible to bundle cygwin into an installer? |
02:25
🔗
|
db48x |
sure |
02:25
🔗
|
db48x |
git already does that |
02:25
🔗
|
tpw_rules |
i have some experience with windows gui programming (python/pyqt so it could even be cross-platform) and bundle all that stuff together |
02:25
🔗
|
db48x |
excellent :) |
02:27
🔗
|
tpw_rules |
now this is where it gets shaky: how do we tie multiple repos on multiple computers to one name? that would be necessary to prevent duplication. i have a friend with a stack of six or so laptops; they should all know the files the others have |
02:27
🔗
|
db48x |
yes, each repository knows which other repositories have each file |
02:28
🔗
|
SketchCow |
Yeah, this is git-annex's job |
02:28
🔗
|
tpw_rules |
so we would have some association of the sets of repositories to an account and git annex can query those in the set? |
02:28
🔗
|
db48x |
mostly |
02:29
🔗
|
tpw_rules |
i mean we don't want two copies of a file three feet from each other, but having two copies three states from each other is good |
02:29
🔗
|
db48x |
tpw_rules: yes |
02:29
🔗
|
|
VADemon has quit IRC (Quit: left4dead) |
02:29
🔗
|
SketchCow |
I suspect we're going to run into a LITTLE of that no matter what, because people will want to "help" |
02:29
🔗
|
db48x |
there's a couple of ways to do it |
02:30
🔗
|
db48x |
there's a git-annex feature coming down the line at some point which we could rely on |
02:30
🔗
|
tpw_rules |
SketchCow: sure, but i wasn't sure how we automatically prevented the first and allowed the second |
02:30
🔗
|
db48x |
or iabak could simply say git annex get --not --copies 4 --not --in otherrepo1 --not --in otherrepo2... |
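As a sketch of how a client might assemble that command for one person's set of drives, assuming it already knows the names of the sibling repositories (the function name and repo names are placeholders, and this is not taken from the iabak script):

```python
from typing import List


def annex_get_command(min_copies: int, sibling_repos: List[str]) -> List[str]:
    """Build a `git annex get` call that skips files already covered.

    Files are skipped if they already have `min_copies` copies anywhere,
    or if any of `sibling_repos` (e.g. the owner's other drives) has them.
    """
    command = ["git", "annex", "get", "--not", f"--copies={min_copies}"]
    for repo in sibling_repos:
        command += ["--not", f"--in={repo}"]
    return command


print(" ".join(annex_get_command(4, ["otherrepo1", "otherrepo2"])))
```

As noted in the discussion, the answer is only as fresh as the last sync of each sibling repository.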
02:31
🔗
|
tpw_rules |
that list would only update when a sync happened though |
02:31
🔗
|
tpw_rules |
might have to tune that in practice |
02:31
🔗
|
db48x |
the git annex feature is called balanced preferred content: http://git-annex.branchable.com/design/balanced_preferred_content/ |
02:32
🔗
|
db48x |
if the disk is offline, then the last sync for it has the most up-to-date information already :) |
02:32
🔗
|
tpw_rules |
well this could also support n drives simultaneously too |
02:33
🔗
|
db48x |
simplest to put a single repository on each of them |
02:33
🔗
|
tpw_rules |
i was thinking more across multiple computers. but the balanced thing would solve that problem |
02:34
🔗
|
db48x |
both ways of doing it will work for n computers and n drives, and n drives on each of m computers ;) |
02:34
🔗
|
tpw_rules |
but which will be the most perfect |
02:35
🔗
|
db48x |
:) |
02:36
🔗
|
tpw_rules |
i'd be happy to write a gui but i don't know enough about git annex and bash to do the script. i feel we should move it to python or something |
02:36
🔗
|
tpw_rules |
or haskell :P |
02:36
🔗
|
db48x |
haskell would be fun |
02:36
🔗
|
db48x |
I don't know haskell very well yet; it'd be fun to learn |
02:36
🔗
|
tpw_rules |
i need to learn haskell |
02:37
🔗
|
tpw_rules |
i have worked with functional-y programming languages, but nothing truly functional |
02:44
🔗
|
SketchCow |
closure: Seems like I need to start giving you more collections |
02:44
🔗
|
SketchCow |
And we are getting to the point where we need to do a sanity check to make sure a collection isn't already being backed up |
02:46
🔗
|
closure |
I have that sanity check in place actually |
02:46
🔗
|
closure |
http://iabak.archiveteam.org/client/f9601d3062715f39f6290547fbaf14b3e6c2b4fb.html |
02:47
🔗
|
SketchCow |
Great |
02:50
🔗
|
iabak-reg |
registrar master da43505 other SHARD6/pubkeys registration of kurtmclester on SHARD6 |
02:56
🔗
|
SketchCow |
Shard6 is filling in nicely. |
04:03
🔗
|
tpw_rules |
maybe we don't need all commits in the channel :P |
04:53
🔗
|
SketchCow |
Yes. We. Do. |
04:55
🔗
|
tpw_rules |
okay okay okay |
04:57
🔗
|
tpw_rules |
/media/iabak/disk2/IA.BAK;/media/iabak/disk3/IA.BAK;/media/iabak/disk4/IA.BAK 8.1T 5.9T 1.8T 77% /home/thomas/iabak/IA.BAK |
04:57
🔗
|
SketchCow |
This is what we PLAYYYY FORRRRR |
04:57
🔗
|
tpw_rules |
fortunately i have 3x 3TB disks that i'm not using right now |
04:58
🔗
|
tpw_rules |
anyway, it is time to say good night and let the datums flow in |
05:03
🔗
|
iabak-reg |
registrar master a2a9eb2 other SHARD6/pubkeys registration of kevin on SHARD6 |
06:00
🔗
|
|
zottelbey has joined #internetarchive.bak |
06:37
🔗
|
iabak-reg |
registrar master 9e6e2ff other SHARD6/pubkeys registration of bas+at on SHARD6 |
07:48
🔗
|
Senji |
tpw: http://iabak.archiveteam.org/stats/SHARD5.leaderboard-raw seems to think you have 8T 8T 8T 1.5T |
07:51
🔗
|
Senji |
Err, no, divide all those numbers by 10, I can't math :) |
07:51
🔗
|
Senji |
I'll just go back to bed, clearly I'm not awake yet |
07:53
🔗
|
lhobas_ |
new stats pages per user are really nice :) |
07:55
🔗
|
lhobas_ |
just noticed the cleanup function in the iabak script doesn't work on OS X (more stupid Mac-only glitches I assume) - https://github.com/ArchiveTeam/IA.BAK/blob/a420ad/iabak-helper#L282 throws "No such file or directory" (pid does exist, statement should eval to true) |
07:56
🔗
|
|
ivan` has joined #internetarchive.bak |
07:56
🔗
|
lhobas_ |
#L285 in that file seems off to me, think the file is supposed to be rm'ed, not the pid-number right? |
07:56
🔗
|
|
primus104 has joined #internetarchive.bak |
08:03
🔗
|
garyrh |
https://news.ycombinator.com/item?id=9602868 |
08:16
🔗
|
ivan` |
I would like to nominate https://archive.org/details/archiveteam_greader for distributed archival because it's got 8TB of compressed text, a lot from dead blogs that are nowhere else |
08:16
🔗
|
ivan` |
the Directory and Stats are unimportant and omitting them saves ~800GB |
08:16
🔗
|
ivan` |
I was planning on dumping it into my Google Drive or onto external drives but never got around to either but will maybe try later |
08:19
🔗
|
iabak-reg |
registrar master 37605fb other SHARD6/pubkeys registration of cyrus on SHARD6 |
08:21
🔗
|
iabak-reg |
registrar master 5463225 other SHARD6/pubkeys registration of peter on SHARD6 |
08:26
🔗
|
|
Start has quit IRC (Read error: Connection reset by peer) |
08:26
🔗
|
|
Start_ has joined #internetarchive.bak |
08:28
🔗
|
|
Cyrus has joined #internetarchive.bak |
08:28
🔗
|
Senji |
Emcy: I'd say "2 sheets" rather than "4 or more" :) |
08:28
🔗
|
Senji |
Bah, mischat :) |
08:40
🔗
|
iabak-reg |
registrar master d422268 other SHARD6/pubkeys registration of antoine on SHARD6 |
09:17
🔗
|
SketchCow |
This project just got mentioned on hackernews. |
09:17
🔗
|
SketchCow |
Might cause a run on clients. |
09:17
🔗
|
SketchCow |
Or whiners. |
09:17
🔗
|
SketchCow |
Or whiny clients |
09:17
🔗
|
SketchCow |
Or client whiners |
09:19
🔗
|
iabak-reg |
registrar master c394c48 other SHARD6/pubkeys registration of koos303 on SHARD6 |
09:21
🔗
|
|
atomotic has joined #internetarchive.bak |
09:23
🔗
|
garyrh |
All four plus extra. |
09:25
🔗
|
|
atomotic has quit IRC (Client Quit) |
09:26
🔗
|
SketchCow |
We are probably going to start getting into the light realm of bad actors. |
09:26
🔗
|
SketchCow |
We'll see how we handle it. |
09:28
🔗
|
iabak-reg |
registrar master f586bb7 other SHARD6/pubkeys registration of info on SHARD6 |
09:36
🔗
|
|
Start_ has quit IRC (Read error: Connection reset by peer) |
09:36
🔗
|
|
lufix has joined #internetarchive.bak |
09:37
🔗
|
|
Start has joined #internetarchive.bak |
10:05
🔗
|
iabak-reg |
registrar master 40f2c21 other SHARD6/pubkeys registration of bdupray on SHARD6 |
11:03
🔗
|
|
lufix has quit IRC (Ping timeout: 240 seconds) |
11:03
🔗
|
iabak-reg |
registrar master 9ebe0e8 other SHARD6/pubkeys registration of alex_online78532 on SHARD6 |
11:10
🔗
|
|
lufix has joined #internetarchive.bak |
11:10
🔗
|
|
hendi has joined #internetarchive.bak |
11:27
🔗
|
hendi |
is there a way to set a nicer name for my account? currently I'm named "info"? |
11:28
🔗
|
Senji |
I believe there is an intention to allow nicknames. Currently it's using the bit before the @ in your email address |
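The naming rule Senji describes is simple enough to sketch in one function (the function name and example address are made up; this is not the registrar's actual code):

```python
def default_nickname(email: str) -> str:
    """Return the part of an email address before the first '@'."""
    return email.split("@", 1)[0]


print(default_nickname("info@example.org"))
```

That is why an address such as `info@...` shows up on the leaderboard as plain "info".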
11:28
🔗
|
hendi |
alright, thanks; I'll keep an eye out for that functionality then |
11:30
🔗
|
iabak-reg |
registrar master 744b699 other SHARD6/pubkeys registration of andrei.zbikowski on SHARD6 |
11:57
🔗
|
|
atomotic has joined #internetarchive.bak |
12:34
🔗
|
iabak-reg |
registrar master d785a5d other SHARD6/pubkeys registration of olliejudge on SHARD6 |
12:39
🔗
|
|
beardicus has joined #internetarchive.bak |
12:40
🔗
|
beardicus |
hello, meat popsicles. |
12:40
🔗
|
beardicus |
i think i need to register. |
12:41
🔗
|
beardicus |
finally got a chance to update my scripts this morning... should be fscking shard1 right now. |
12:41
🔗
|
beardicus |
but i am "out of touch" |
12:41
🔗
|
ppiixx |
beardicus: i think you just need to run change-email in the iabak dir |
12:42
🔗
|
beardicus |
hmm. i now see `register-helper.pl`... let's see what that does. |
12:44
🔗
|
iabak-reg |
registrar master a378f40 other SHARD1/pubkeys registration of brian on SHARD1 |
12:44
🔗
|
iabak-reg |
registrar master dde25de other SHARD2/pubkeys registration of brian on SHARD2 |
12:45
🔗
|
Senji |
That seems to have worked |
12:45
🔗
|
beardicus |
that was change-email that did it. thanks ppiixx |
12:46
🔗
|
beardicus |
noting that prompt-email did nothing. |
12:46
🔗
|
beardicus |
also noting that my system has neither systemd nor cron, so the script is a little complainy and i'll have to figure that out. |
12:47
🔗
|
Senji |
No cron! |
12:47
🔗
|
beardicus |
it's a synology nas. |
12:47
🔗
|
beardicus |
there's a gui thingy for running tasks though, i think. |
12:48
🔗
|
beardicus |
"Task Scheduler" yay. |
12:49
🔗
|
beardicus |
assuming the correct periodic command to blip is iabak-cronjob? daily? |
12:50
🔗
|
ppiixx |
yep |
12:50
🔗
|
|
sankin has joined #internetarchive.bak |
12:51
🔗
|
iabak-reg |
registrar master b95aab9 other SHARD6/pubkeys registration of olliejudge on SHARD6 |
12:56
🔗
|
lufix |
beardicus: My synology nas has cron, I believe? |
12:57
🔗
|
beardicus |
hmm. crond does exist. no crontab though. |
12:59
🔗
|
iabak-reg |
registrar master 69747f3 other SHARD6/pubkeys registration of atrus6 on SHARD6 |
13:30
🔗
|
lufix |
beardicus: Ah, I see :) http://www.multigesture.net/articles/how-to-use-cron-on-a-synology-nas/ |
13:30
🔗
|
lufix |
Might help |
13:31
🔗
|
beardicus |
yeah. apparently you need to use tabs between fields too. |
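A sketch of what a tab-separated daily entry might look like, assuming the system-crontab layout of minute, hour, day of month, month, day of week, user, command. The install path and run time below are placeholders, not taken from the log.

```python
def crontab_line(minute: int, hour: int, user: str, command: str) -> str:
    """Format a once-a-day crontab entry with tab-separated fields."""
    # minute, hour, then "*" for day-of-month, month and day-of-week
    fields = [str(minute), str(hour), "*", "*", "*", user, command]
    return "\t".join(fields)


# Hypothetical path; point it at wherever iabak-cronjob actually lives.
print(crontab_line(30, 3, "root", "/volume1/IA.BAK/iabak-cronjob"))
```

Generating the line from code avoids editors that silently turn tabs into spaces.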
13:48
🔗
|
iabak-reg |
registrar master e1e0acf other SHARD6/pubkeys registration of jon archive.org on SHARD6 |
13:50
🔗
|
iabak-reg |
registrar master 629d09d other SHARD6/pubkeys registration of mail on SHARD6 |
13:55
🔗
|
|
Start has quit IRC (Disconnected.) |
14:18
🔗
|
iabak-reg |
registrar master f8bf3c8 other SHARD6/pubkeys registration of chiploaded on SHARD6 |
14:34
🔗
|
|
zottelbey has quit IRC (Ping timeout: 512 seconds) |
14:41
🔗
|
|
Start has joined #internetarchive.bak |
14:46
🔗
|
|
ohhdemgir has joined #internetarchive.bak |
14:52
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
14:55
🔗
|
iabak-reg |
registrar master 09b57ac other SHARD6/pubkeys registration of moritz.steiner on SHARD6 |
15:17
🔗
|
|
zottelbey has joined #internetarchive.bak |
15:19
🔗
|
iabak-reg |
registrar master 3988cd2 other SHARD6/pubkeys registration of mariusz on SHARD6 |
15:20
🔗
|
|
beardicus has quit IRC (Sleep.) |
15:20
🔗
|
|
primus104 has quit IRC (Leaving.) |
15:28
🔗
|
iabak-reg |
registrar master e5f4b99 other SHARD6/pubkeys registration of iabackup on SHARD6 |
15:45
🔗
|
sep332 |
I second ivan`'s nomination of the google reader archive, even though the files are huge |
15:50
🔗
|
|
Start has quit IRC (Disconnected.) |
15:57
🔗
|
|
Start has joined #internetarchive.bak |
16:00
🔗
|
|
scatman has joined #internetarchive.bak |
16:01
🔗
|
|
Start has quit IRC (Client Quit) |
16:12
🔗
|
|
mariusz has joined #internetarchive.bak |
16:13
🔗
|
|
Zero_Dogg has joined #internetarchive.bak |
16:14
🔗
|
iabak-reg |
registrar master 91c2b2a other SHARD6/pubkeys registration of archive.org on SHARD6 |
16:14
🔗
|
mariusz |
Hi. How can I register myself? :) |
16:16
🔗
|
Zero_Dogg |
Your install-git-annex is a bit stupid, always defaulting to i386. There are standalone tarballs for arm too, that works on raspberry pi (which I'd be using if I set it up) |
16:16
🔗
|
|
Start has joined #internetarchive.bak |
16:18
🔗
|
Zero_Dogg |
https://github.com/zerodogg/scriptbucket/blob/master/gitannex-install#L52-L60 is an example of the logic needed |
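The kind of logic being asked for, sketched in Python rather than shell. The mapping from machine type to build name (`amd64`, `i386`, `armel`) and the tarball naming pattern are assumptions to be checked against the git-annex download page; the function names are invented.

```python
import platform

# Assumed names of the standalone builds, keyed by what `uname -m` reports.
ARCH_BY_MACHINE = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "i386": "i386",
    "i686": "i386",
    "armv6l": "armel",
    "armv7l": "armel",
}


def annex_arch(machine: str) -> str:
    """Map a machine type to a build name, failing loudly if unknown."""
    try:
        return ARCH_BY_MACHINE[machine.lower()]
    except KeyError:
        raise ValueError(f"no known standalone build for {machine!r}") from None


def tarball_name(machine: str) -> str:
    """Assumed file name of the standalone tarball for this machine type."""
    return f"git-annex-standalone-{annex_arch(machine)}.tar.gz"


def tarball_for_this_host() -> str:
    """Pick the tarball for the machine the script is running on."""
    return tarball_name(platform.machine())
```

Failing on an unknown machine type, instead of quietly falling back to i386, is the design choice that would have caught the raspberry pi case.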
16:23
🔗
|
yipdw |
do submit patches, we all run on i386/x86-64 and therefore haven't had a need to generalize |
16:25
🔗
|
Zero_Dogg |
I will :) |
16:26
🔗
|
Zero_Dogg |
Got some spare space that I might be able to use for this, but it's on a raspi server |
16:26
🔗
|
SketchCow |
Zero_Dogg: So you have criticism and can't donate space! |
16:26
🔗
|
SketchCow |
You... you came from Hackernews |
16:27
🔗
|
Zero_Dogg |
does it require much cpu after the whole thing is downloaded (ie. does it git annex fsck, much)? |
16:27
🔗
|
Zero_Dogg |
SketchCow: lol |
16:28
🔗
|
Zero_Dogg |
SketchCow: I came from your blog, actually :p |
16:28
🔗
|
SketchCow |
Oh, THAT dump |
16:28
🔗
|
Zero_Dogg |
hah |
16:30
🔗
|
yipdw |
http://upload.wikimedia.org/wikipedia/commons/5/55/Creature_from_the_Black_Lagoon_poster.jpg |
16:39
🔗
|
SketchCow |
I will say, the Dogg is right in one regard - we should come up with some FAQ/information on how system intensive the ongoing holding of the data is. |
16:43
🔗
|
Zero_Dogg |
See? All nice and constructive, complete with pull request. Not hackernewsy at all |
16:45
🔗
|
|
Start has quit IRC (Disconnected.) |
16:46
🔗
|
SketchCow |
That's the way we like it. |
16:58
🔗
|
|
Lord has joined #internetarchive.bak |
16:58
🔗
|
Lord |
hello |
16:59
🔗
|
Lord |
i'm quite interested in this project (backing up the web backup :-) ) |
17:06
🔗
|
Lord |
i launched iabak, it downloaded some files and it failed |
17:06
🔗
|
Lord |
i created a user without home so the script failed |
17:06
🔗
|
Lord |
(maybe this info interests you) |
17:08
🔗
|
Lord |
i think i'll face another problem : the script is downloading gitannex i386 but my gentoo doesn't have multilib support |
17:08
🔗
|
|
Beardface has joined #internetarchive.bak |
17:09
🔗
|
iabak-reg |
registrar master 256141c other SHARD6/pubkeys registration of lord-ia on SHARD6 |
17:10
🔗
|
Lord |
here i am :-) |
17:10
🔗
|
Lord |
it works |
17:10
🔗
|
Lord |
(with lots of setlocale errors) |
17:14
🔗
|
sep332 |
can git-annex be configured to use a set amount of space? like 2TB? |
17:19
🔗
|
mariusz |
sep332: git config annex.diskreserve 2000GB |
17:20
🔗
|
sep332 |
that's not how much space it *won't* use? |
17:23
🔗
|
|
primus104 has joined #internetarchive.bak |
17:26
🔗
|
mariusz |
sep332: you're right:) |
17:26
🔗
|
mariusz |
sep332: sorry |
17:26
🔗
|
sep332 |
np. it's a cool idea just not what i'm looking for |
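Since annex.diskreserve is the space to leave free rather than the space to use, a fixed budget can still be approximated by working backwards. A sketch with a hypothetical function name, which only holds while nothing else is writing to the same disk:

```python
import shutil


def reserve_for_budget(free_bytes: int, annex_bytes: int, budget_bytes: int) -> int:
    """Bytes to keep free so the annex stops growing at about `budget_bytes`."""
    # How much more the annex may still download before hitting the budget.
    room_to_grow = max(budget_bytes - annex_bytes, 0)
    # Downloads stop once free space falls to the reserve.
    return max(free_bytes - room_to_grow, 0)


def free_space(path: str) -> int:
    """Free bytes on the filesystem holding `path`."""
    return shutil.disk_usage(path).free


TB = 10 ** 12
# 3 TB free, annex already holds 0.5 TB, target of 2 TB in total:
print(reserve_for_budget(3 * TB, TB // 2, 2 * TB))
```

The result would then be passed to `git config annex.diskreserve` in whatever size notation git-annex accepts, and recomputed whenever other data on the disk changes.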
17:29
🔗
|
SketchCow |
The new clients are causing some partying. |
17:30
🔗
|
iabak-reg |
registrar master 3de6df9 other SHARD6/pubkeys registration of dylan.barlett on SHARD6 |
17:33
🔗
|
tpw_rules |
is that a problem? i'll bring pizza |
17:33
🔗
|
tpw_rules |
i'm getting an error trying to sync to shard 6: error: Ref refs/heads/synced/git-annex is at 8c3a1a32ad19cc72f8429d7078dce8e9bc7e9e67 but expected e606f7f0f586c8f1504bd49cccb48d81dfa0a873 |
17:35
🔗
|
SketchCow |
No, it's not causing a problem at all. Just watching the activity. |
17:35
🔗
|
SketchCow |
We also are getting dilettantes, which is good, because that's a worthwhile experiment. |
17:36
🔗
|
SketchCow |
(People joining to fuck around and see what it does, then going "well that was fun" and disappearing, likely already, but certainly within the 2 week/4 week period) |
17:38
🔗
|
SketchCow |
My theory is this will just cause a bunch of 0.00 clients, since people are unlikely to go "let's see what it does.... DOWNLOAD A TERABYTE" |
17:39
🔗
|
iabak-reg |
registrar master b29843a other SHARD6/pubkeys registration of frozenbeardme on SHARD6 |
17:42
🔗
|
iabak-reg |
registrar master 5e39162 other SHARD6/pubkeys registration of ryan on SHARD6 |
17:46
🔗
|
Beardface |
it works! |
17:48
🔗
|
SketchCow |
That's what we hope! |
17:48
🔗
|
SketchCow |
How much space you got, Beardface! |
17:48
🔗
|
* |
SketchCow rubs hands like Mr. Burns |
17:48
🔗
|
Beardface |
~1T so your last comment is kind of relevant, heh |
17:49
🔗
|
hendi |
I currently have 1.5TB and think about adding some more |
17:50
🔗
|
hendi |
should I do RAID1, or go without RAID, and just redownload when a drive fails? |
17:50
🔗
|
|
Start has joined #internetarchive.bak |
17:50
🔗
|
Beardface |
it checks for systemd to install a service, if not found it exits.. intentional? (when you start it again it installs a cron instead) |
17:52
🔗
|
|
primus104 has quit IRC (Leaving.) |
17:53
🔗
|
DFJustin |
hendi: there is already redundancy in the iabak software so that multiple people will get the same file, so local RAID is unnecessary |
17:54
🔗
|
SketchCow |
agree with DFJustin - it's wasted space, unless you personally have an interest in a collection in a way you are making it available elsewhere. |
17:58
🔗
|
hendi |
great, thank you |
17:58
🔗
|
hendi |
expect at least 15TB from me, then |
18:01
🔗
|
SketchCow |
Fantastic. |
18:01
🔗
|
SketchCow |
That'll help a lot. |
18:02
🔗
|
mariusz |
I plan on running iabak on more than one computer. Is the software smart enough to send me a different set of data to each one? |
18:02
🔗
|
tpw_rules |
not yet |
18:02
🔗
|
tpw_rules |
hendi: i've used mhddfs to attach a bunch of drives into one filesystem |
18:02
🔗
|
tpw_rules |
the advantage being that if one drive breaks, it doesn't take everything |
18:03
🔗
|
SketchCow |
mariusz: It's a worthwhile feature going forward for closure and db48x to consider, where machines are called buddies and they're treated as one machine. |
18:04
🔗
|
mariusz |
Yeah, that would be great. |
18:05
🔗
|
SketchCow |
This project keeps coming up with new feature-adds |
18:06
🔗
|
mariusz |
another question. I have about 50 older drives ranging from 80GB to 1TB that I could manually plug-in once every few weeks. Is "cold storage" supported? If yes, any info how to go about this? |
18:06
🔗
|
SketchCow |
Somebody in 4 years is going to go "Man, this closure guy thought of EVERYTHING...." |
18:06
🔗
|
SketchCow |
mariusz: Not yet, in any meaningful way. |
18:06
🔗
|
SketchCow |
I should say it's supported in git-annex, but we're being simple... for now. |
18:06
🔗
|
tpw_rules |
mariusz: i solved that problem by getting a bunch of extremely cheap usb enclosures and attaching them as one |
18:06
🔗
|
SketchCow |
Because of aforementioned discovery of "buddy" feature and similar features. |
18:07
🔗
|
tpw_rules |
oh i had another idea: be able to set up an IPC socket that git annex will request downloads from so we can use something other than wget. i was thinking a pretty gui |
18:08
🔗
|
tpw_rules |
or even just a download command |
18:09
🔗
|
hendi |
tpw_rules, thanks for the hint, I'll have a look at mhddfs! |
18:09
🔗
|
tpw_rules |
get 0.1.38 btw, the later version is a bit crashy. it's a union fs, but it supports writes too |
18:13
🔗
|
DFJustin |
I think it does support cold storage as long as you check in at least once a month |
18:13
🔗
|
mariusz |
another idea - sneakerneting the data. i.e get a beer and copy your shard :) |
18:14
🔗
|
SketchCow |
It's a thought down the line. |
18:15
🔗
|
SketchCow |
There's a second/third/fourth wave of approach as we hit the upper limits of just scooping people out and into the project. |
18:15
🔗
|
SketchCow |
But it's still holding up for people going "Oh, yeah, got 10tb lying around." |
18:15
🔗
|
SketchCow |
The main critical thing is to make sure we have chosen collections that aren't wasteful. |
18:18
🔗
|
DFJustin |
I'm a little suspicious about some of the recent ones like wikipediadumps |
18:20
🔗
|
tpw_rules |
(why is that plural? i thought one dump contained the entire history of everything) |
18:21
🔗
|
DFJustin |
well there is more than one wikipedia (language) |
18:25
🔗
|
tpw_rules |
oh, true |
18:35
🔗
|
SketchCow |
I think wikipediadumps is on the edge. |
18:35
🔗
|
SketchCow |
On the other hand, our collection of dumps goes WAY back farther than anyone. |
18:35
🔗
|
SketchCow |
I did some of those, with lots of skeletons |
18:36
🔗
|
SketchCow |
Like Erik Moller pro-childporn arguments that were quietly expunged when he became Wikipedia org dude |
18:36
🔗
|
SketchCow |
Or Jimbo Wales getting into an argument with someone, and having a db admin remove the thing he said, and then going "I never said that." |
18:36
🔗
|
SketchCow |
And who knows what else, down there. |
18:36
🔗
|
SketchCow |
But year. |
18:36
🔗
|
SketchCow |
Yeah. |
18:36
🔗
|
SketchCow |
Maybe we need a nominations page. |
18:37
🔗
|
SketchCow |
Yes. |
18:37
🔗
|
SketchCow |
We do. |
18:37
🔗
|
SketchCow |
OK, one moment. |
18:37
🔗
|
|
Start has quit IRC (Read error: Connection reset by peer) |
18:46
🔗
|
SketchCow |
http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK/nominations |
18:49
🔗
|
Beardface |
are the shars a set size? |
18:49
🔗
|
Beardface |
shards* |
18:49
🔗
|
SketchCow |
The shards are a set number of files/items |
18:50
🔗
|
SketchCow |
So imagine it's... 20 |
18:50
🔗
|
SketchCow |
20 1m files mean tiny shard |
18:50
🔗
|
SketchCow |
20 1g files means fatty shard |
18:50
🔗
|
SketchCow |
Once things are going into the upper reaches/echelons, we'll see cases where shards 100-1000 are all 1mb or some such chicanery |
18:51
🔗
|
Beardface |
ahh |
18:53
🔗
|
iabak-reg |
registrar master a1cadb4 other SHARD6/pubkeys registration of mmein301+i on SHARD6 |
18:53
🔗
|
SketchCow |
Like, 80% of the work going on now is to find and deal with use cases and bugs/contingencies they reveal. |
18:53
🔗
|
SketchCow |
10% is improving the UI and interaction |
18:54
🔗
|
SketchCow |
10% is filler, primarily melted hooves and horns |
19:00
🔗
|
iabak-reg |
registrar master 95ae19f other SHARD6/pubkeys registration of nico+iabak on SHARD6 |
19:01
🔗
|
mariusz |
btw. does anyone know why git config annex.web-options=--limit-rate=200k returns "invalid key" error? |
19:01
🔗
|
hendi |
If I want to run iabak on multiple machines, should I copy the private key over for accounting and stuff, or use a new one on each machine? |
19:08
🔗
|
iabak-reg |
registrar master b6d2ef7 other SHARD6/pubkeys registration of cyberjacob+IA on SHARD6 |
19:08
🔗
|
closure |
mariusz: I think git config doesn't want the = there |
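Spelled out: `git config` takes the key and the value as two separate arguments, so `annex.web-options=--limit-rate=200k` is read as one (invalid) key. A small sketch of building the working form; the helper name is made up.

```python
import shlex
from typing import List


def git_config_command(key: str, value: str) -> List[str]:
    """Build a `git config` call with key and value as separate arguments."""
    if "=" in key:
        raise ValueError("pass the value as its own argument, not after '=' in the key")
    return ["git", "config", key, value]


cmd = git_config_command("annex.web-options", "--limit-rate=200k")
print(shlex.join(cmd))  # the command line to run inside the shard directory
```

The `=` inside `--limit-rate=200k` is fine because it belongs to the value handed on to the downloader, not to the git config key.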
19:08
🔗
|
closure |
hendi: use a new one |
19:12
🔗
|
mariusz |
closure: that worked. thanks. probably would be good |
19:13
🔗
|
mariusz |
to change the README |
19:13
🔗
|
mariusz |
;) |
19:13
🔗
|
Senji |
Glad to see I'm not the only person using a foo+bar address :) |
19:14
🔗
|
iabak-reg |
registrar master 1d18796 other SHARD7/pubkeys registration of mail on SHARD7 |
19:15
🔗
|
tpw_rules |
error: Ref refs/heads/synced/git-annex is at 38179c6b1a70682556e88bf6d5c94187cdaabaac but expected ffeac6bf96c796c6117981d2ee64fc642edbaa01 |
19:16
🔗
|
tpw_rules |
i am still getting sync problems like that |
19:16
🔗
|
tpw_rules |
i don't have more than one iabak script running |
19:18
🔗
|
closure |
well, that can happen if someone else pushed a change at the same time. It should normally clear up the next time, unless you're unlucky |
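Because a clash with someone else's push usually clears on the next attempt, a wrapper could retry a few times instead of giving up on the first failure. This is a generic sketch, not how the iabak scripts currently behave; the function name, attempt count and delays are arbitrary.

```python
import time
from typing import Callable


def retry(action: Callable[[], bool], attempts: int = 5, delay: float = 60.0,
          sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run `action` until it reports success or `attempts` runs out."""
    for attempt in range(1, attempts + 1):
        if action():
            return True
        if attempt < attempts:
            sleep(delay * attempt)  # wait a little longer after each failure
    return False
```

A caller might pass something like `lambda: subprocess.call(["git", "annex", "sync"]) == 0` as the action, so that one lost race does not end the hourly loop.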
19:24
🔗
|
iabak-reg |
registrar master f760a1f other SHARD7/pubkeys registration of cyberjacob+IA on SHARD7 |
19:28
🔗
|
|
Start has joined #internetarchive.bak |
19:30
🔗
|
|
Start has quit IRC (Client Quit) |
19:31
🔗
|
iabak-reg |
registrar master c7d5ea9 other SHARD6/pubkeys registration of eskild on SHARD6 |
19:48
🔗
|
|
beardicus has joined #internetarchive.bak |
19:50
🔗
|
iabak-reg |
registrar master 40e077c other SHARD7/pubkeys registration of moritz.steiner on SHARD7 |
19:50
🔗
|
|
primus104 has joined #internetarchive.bak |
19:51
🔗
|
Zero_Dogg |
/win 20 |
19:52
🔗
|
Zero_Dogg |
bah |
19:53
🔗
|
sep332 |
where are the authorized_keys files again? I think I'm missing from shard1 |
19:54
🔗
|
|
atomotic has joined #internetarchive.bak |
19:57
🔗
|
closure |
.git/annex/id_rsa and id_rsa.pub |
19:58
🔗
|
|
CyberJaco has joined #internetarchive.bak |
19:58
🔗
|
CyberJaco |
Hi |
19:59
🔗
|
sep332 |
i have id_rsa but I'm getting Permission denied (publickey) when i sync |
19:59
🔗
|
sep332 |
hi CyberJaco |
19:59
🔗
|
iabak-reg |
registrar master dd3be51 other SHARD6/pubkeys registration of hannson on SHARD6 |
20:00
🔗
|
CyberJaco |
that's weird, why is the last letter of my name missing... |
20:00
🔗
|
closure |
sep332: sounds like the wrong key, we have separate sets of keys for each shard, so shard1 may not have the pubkey you're using for other shards |
20:00
🔗
|
sep332 |
looks like an 8-char limit? |
20:01
🔗
|
sep332 |
closure: can i register a new one? |
20:01
🔗
|
closure |
sep332: manually, yes.. |
20:01
🔗
|
closure |
./register-helper.pl "$SHARD" "$uuid" "$registrationemail" "$(cat id_rsa.pub)" |
20:02
🔗
|
closure |
fill in the bits, that will give a URL you can hit to register |
20:05
🔗
|
|
mariusz has quit IRC (Read error: Operation timed out) |
20:05
🔗
|
closure |
sep332: or, I can manually add it |
20:07
🔗
|
iabak-reg |
03registrar 05master f36f056 06other 10SHARD1/pubkeys registration of sean.palmer on SHARD1 |
20:08
🔗
|
iabak-reg |
03registrar 05master fd09178 06other 10SHARD6/pubkeys registration of brian on SHARD6 |
20:09
🔗
|
sep332 |
closure: is it the [annex] uuid or the [remote "origin"] annex-uuid? |
20:10
🔗
|
iabak-reg |
03registrar 05master 84567e0 06other 10SHARD6/pubkeys registration of steven.m.reed on SHARD6 |
20:11
🔗
|
SketchCow |
SHARD6 does the climb |
20:11
🔗
|
closure |
sep332: the annex.uuid |
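The two settings sep332 asks about live in the same git config but describe different repositories: `annex.uuid` identifies the local copy, while `remote.origin.annex-uuid` is the cached UUID of the origin remote. A minimal sketch for printing both side by side, assuming a reasonably recent git with `-C` support; the function name `show_uuids` is made up for illustration and is not part of the iabak scripts:

```shell
# Print the local annex UUID and the cached UUID of the "origin" remote
# for one shard checkout. show_uuids is a hypothetical helper name.
show_uuids() {
    repo=$1
    # annex.uuid: this repository's own UUID (the one to register)
    echo "local:  $(git -C "$repo" config --get annex.uuid)"
    # remote.origin.annex-uuid: the UUID git-annex recorded for origin
    echo "origin: $(git -C "$repo" config --get remote.origin.annex-uuid)"
}

# Example: show_uuids shard1
```

Per closure's answer, the value on the `local:` line is the one that goes into the registration.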
20:17
🔗
|
sep332 |
closure: that's what i put in, same error |
20:17
🔗
|
sep332 |
i'm using the same key for all shards |
20:18
🔗
|
closure |
check perms of your id_rsa file |
20:19
🔗
|
sep332 |
-rw------- |
20:24
🔗
|
closure |
sep332: check if shard1's git config has remote.origin.annex-ssh-options set |
20:27
🔗
|
sep332 |
nope. i'll just copy it from another shard |
20:28
🔗
|
closure |
well, not the whole config, just that setting |
20:28
🔗
|
sep332 |
yeah |
20:29
🔗
|
sep332 |
alright it's working. thanks closure |
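The fix here came down to one missing setting, `remote.origin.annex-ssh-options`, in shard1's git config. A rough sketch for spotting which shard checkouts lack it, assuming the shard directories are passed as arguments; `check_ssh_options` is a hypothetical name, and the sketch only reads config and changes nothing:

```shell
# Report, for each shard checkout given, whether
# remote.origin.annex-ssh-options is set in its git config.
# check_ssh_options is a hypothetical helper, not part of iabak.
check_ssh_options() {
    for repo in "$@"; do
        if git -C "$repo" config --get remote.origin.annex-ssh-options >/dev/null; then
            echo "$repo: set"
        else
            echo "$repo: missing"
        fi
    done
}

# Example: check_ssh_options shard1 shard5 shard6
```

Any shard reported as missing can then get just that one setting copied over from a working shard, as closure suggests, leaving the remaining config untouched.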
20:42
🔗
|
|
CyberJaco is now known as zz_CyberJ |
20:44
🔗
|
iabak-reg |
03registrar 05master 4de5003 06other 10SHARD6/pubkeys registration of mariusz on SHARD6 |
20:46
🔗
|
|
sankin has quit IRC (Leaving.) |
20:50
🔗
|
SketchCow |
awww yes here comes mariusz |
20:56
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
21:01
🔗
|
|
xhdr has joined #internetarchive.bak |
21:10
🔗
|
tpw_rules |
closure: i think the sync error may have killed the iabak-hourly process. it's been a couple hours since that error and it hasn't tried again |
21:21
🔗
|
closure |
indeed, that could happen |
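Since closure notes that this kind of sync error usually clears up on the next attempt, one way to keep a single failure from ending a long-running job is to retry a few times before giving up. This is only a sketch of that general idea, not what the iabak scripts do; the `retry` function, its argument order, and the example delay are all assumptions:

```shell
# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
# retry is a hypothetical helper, not part of the iabak scripts.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while :; do
        "$@" && return 0
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Example: retry 3 60 git annex sync
```

A caller can then decide what to do when all attempts fail, for example log the failure and wait for the next hourly run instead of exiting.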
21:22
🔗
|
|
laxity has joined #internetarchive.bak |
21:29
🔗
|
lhobas_ |
closure: noticed that the cleanup function in the iabak script doesn't work on OS X (more stupid Mac-only glitches I assume) - https://github.com/ArchiveTeam/IA.BAK/blob/a420ad/iabak-helper#L282 throws "No such file or directory" (pid does exist, statement should eval to true) |
21:29
🔗
|
lhobas_ |
any clue what could cause that? |
21:29
🔗
|
lhobas_ |
(#L285 in that file seems off to me, think the file is supposed to be rm'ed, not the pid-number right?) |
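To illustrate the distinction lhobas_ is drawing between removing the pid file and removing "the pid number", here is a minimal pid-file cleanup sketch. It is not the code from iabak-helper and does not explain the OS X error; `cleanup_pidfile` and its single argument are hypothetical:

```shell
# Stop the process recorded in a pid file (if it is still running)
# and then remove the pid file itself.
# cleanup_pidfile is a hypothetical helper, not the iabak-helper code.
cleanup_pidfile() {
    pidfile=$1
    [ -f "$pidfile" ] || return 0
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests whether the pid exists
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        kill "$pid" 2>/dev/null || true
    fi
    # remove the file that holds the pid, not a file named after the pid
    rm -f "$pidfile"
}

# Example: cleanup_pidfile /tmp/example.pid
```

The key point is the last line: `rm` is given the path of the pid file, whereas passing the pid value itself would make `rm` look for a file named after the number.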
21:34
🔗
|
iabak-reg |
03registrar 05master e16b5ee 06other 10SHARD6/pubkeys registration of matt on SHARD6 |
22:12
🔗
|
db48x |
I should have gone ahead and fixed that last night when you mentioned it |
22:12
🔗
|
db48x |
couldn't sleep anyway |
22:15
🔗
|
tpw_rules |
do you not love me |
22:16
🔗
|
closure |
SketchCow on npr, eep |
22:23
🔗
|
iabak-reg |
03registrar 05master 93b954b 06other 10SHARD6/pubkeys registration of carl.moden on SHARD6 |
22:39
🔗
|
|
mariusz has joined #internetarchive.bak |
23:05
🔗
|
|
Atluxity has quit IRC (Ping timeout: 360 seconds) |
23:10
🔗
|
|
Atluxity has joined #internetarchive.bak |
23:12
🔗
|
iabak-reg |
03registrar 05master 240d6ac 06other 10SHARD6/pubkeys registration of paul.chambers on SHARD6 |
23:18
🔗
|
|
beardicus has quit IRC (Quit: Sleep.) |
23:19
🔗
|
|
Atluxity has quit IRC (Ping timeout: 360 seconds) |
23:23
🔗
|
|
Start has joined #internetarchive.bak |
23:26
🔗
|
|
Atluxity has joined #internetarchive.bak |
23:27
🔗
|
|
beardicus has joined #internetarchive.bak |
23:31
🔗
|
db48x |
closure: ah, an interesting clue |
23:32
🔗
|
db48x |
so where is the ffi function that calls CreateProcess? |
23:36
🔗
|
db48x |
I'm looking at http://hackage.haskell.org/package/process-1.2.3.0/docs/src/System-Process.html#createProcess, but I don't see where it actually calls the win32 api... |
23:37
🔗
|
|
Atluxity has quit IRC (Ping timeout: 360 seconds) |
23:38
🔗
|
|
Atluxity has joined #internetarchive.bak |
23:45
🔗
|
closure |
db48x: oh, I just nailed that problem |
23:45
🔗
|
|
zottelbey has quit IRC (Quit: Leaving) |
23:46
🔗
|
closure |
writing the PR for the library that needs changes now.. |
23:46
🔗
|
closure |
you were in the right place, but it has a side of C files :) |
23:49
🔗
|
|
Atluxity has quit IRC (Ping timeout: 360 seconds) |
23:52
🔗
|
iabak-reg |
03registrar 05master de4a487 06other 10SHARD6/pubkeys registration of iabak on SHARD6 |
23:53
🔗
|
db48x |
ah, good |
23:53
🔗
|
db48x |
is https://github.com/haskell/process/blob/master/System/Process/Internals.hs#L414 closer? |
23:54
🔗
|
db48x |
ah, https://github.com/haskell/process/blob/master/cbits/runProcess.c#L557 |
23:56
🔗
|
db48x |
presumably you're adding a way to add that flag where that's called here: https://github.com/haskell/process/blob/master/System/Process/Internals.hs#L452 |
23:57
🔗
|
closure |
that's the plan, but I'm actually watching austraian cooking show :P |
23:57
🔗
|
closure |
feel free to send patch to https://github.com/haskell/process/issues/32 |
23:58
🔗
|
|
Atluxity has joined #internetarchive.bak |