[00:04] *** db48x has quit IRC (Quit: train)
[00:54] *** godane has joined #internetarchive.bak
[01:10] *** n00b428 has joined #internetarchive.bak
[01:12] Hello all. I'm having some problems with a new install on Ubuntu
[01:13] verification of content failed; Unable to access these remotes: web; Try making some of these repositories available:
[01:13] Any thoughts?
[01:21] registrar master cf24b43 other SHARD10/pubkeys registration of octobyt3 on SHARD10
[01:23] Alrighty then....
[01:23] *** n00b428 has quit IRC (Quit: Page closed)
[01:26] registrar master 563b210 other SHARD4/pubkeys registration of octobyt3 on SHARD4
[01:57] *** sevs has joined #internetarchive.bak
[03:09] thelsdj: build is ready: https://downloads.kitenet.net/git-annex/autobuild/armel/git-annex-standalone-armel.tar.gz
[03:12] closure: huh, still doesn't work, same error
[03:14] registrar master 871cec1 other SHARD3/pubkeys registration of mike on SHARD3
[03:21] *** cmaldonad has joined #internetarchive.bak
[03:23] ah, I see, the way I wrapped the linker didn't work
[03:24] *** cmaldonad has quit IRC (Client Quit)
[03:25] *** cmaldonad has joined #internetarchive.bak
[03:28] also interestingly, I notice that binaries such as git on arm all do use a 64k page size.
[03:28] except for haskell ones, which are linked with ld.gold.
[03:29] probably the bug is there
[03:29] yep, the others work
[03:36] registrar master 5a5cc9a other SHARD12/pubkeys registration of kyle on SHARD12
[03:45] *** cmaldonad has quit IRC (Quit: This computer has gone to sleep)
[03:58] gar, it's rebuilding from the top
[04:06] Signups are going gangbusters!
[04:24] SketchCow: if we can get the shard providing thing into a package format that will easily run on a nas using a settable amount of space, you'd have hundreds more volunteers
[05:08] Agreed
[05:08] Now who wants to do it?
[05:48] oh, interesting
[05:48] ia-mine's using IA's advanced search API, but uh
[05:48] https://archive.org/advancedsearch.php?q=collection:coverartarchive&page=202&rows=50&output=json
[05:48] I think this means we'll only ever get the first 10,000 results of a collection
[05:49] * yipdw checks out the scraping API
[05:52] btw i noticed something
[05:52] ...crap wrong channel
[06:36] *** kyan has quit IRC (Quit: Leaving)
[07:04] I'm kinda curious, had to restart the iabak process and now it's doing "get MD5- (from web...)" followed by "(checksum...) ok" over and over
[07:05] before it downloaded files but now it's using way less traffic
[07:05] is this expected/normal?
[07:24] *** Zippit has quit IRC (Ping timeout: 260 seconds)
[07:35] *** minus_ has joined #internetarchive.bak
[07:53] *** minus_ has quit IRC (Quit: Bye)
[07:54] *** minus_ has joined #internetarchive.bak
[08:33] registrar master f264d8d other SHARD14/pubkeys registration of octobyt3 on SHARD14
[08:36] registrar master b9fb989 other SHARD16/pubkeys registration of octobyt3 on SHARD16
[08:55] *** Zippit has joined #internetarchive.bak
[08:59] *** Zippit has quit IRC (Client Quit)
[09:00] *** sevs has quit IRC (Ping timeout: 268 seconds)
[11:43] *** CyberJaco is now known as zz_CyberJ
[11:46] registrar master 251c42c other SHARD9/pubkeys registration of stefan on SHARD9
[11:51] registrar master c1011a5 other SHARD10/pubkeys registration of mr.business1148 on SHARD10
[11:51] *** cmaldonad has joined #internetarchive.bak
[12:06] registrar master ab9a828 other SHARD10/pubkeys registration of mrote on SHARD10
[12:27] *** TGMMilenk is now known as milenko
[12:31] *** atomotic has joined #internetarchive.bak
[12:43] *** atomotic has quit IRC (Read error: Connection timed out)
[12:51] So my client is currently pulling down a ~150GB /.git/annex/MD5-* file
[12:51] *** markaro has joined #internetarchive.bak
[12:51] Client says it's coming in at ~2MB/s - but the host is only showing around 100kb/s
[12:52] That doesn't seem normal
[13:04] *** cmaldonad has quit IRC (Quit: This computer has gone to sleep)
[13:17] *** cmaldonad has joined #internetarchive.bak
[13:42] *** cmaldonad has quit IRC (Quit: This computer has gone to sleep)
[13:51] registrar master 95506be other SHARD12/pubkeys registration of mrote on SHARD12
[13:56] registrar master fb1daa3 other SHARD9/pubkeys registration of mrote on SHARD9
[14:02] registrar master 477ee58 other SHARD12/pubkeys registration of mariabak on SHARD12
[14:05] registrar master a0e9e40 other SHARD10/pubkeys registration of mariabak on SHARD10
[14:08] *** markaro has quit IRC ()
[14:25] registrar master e3a6430 other SHARD10/pubkeys registration of mariabak on SHARD10
[14:29] registrar master bafa0a5 other SHARD9/pubkeys registration of mariabak on SHARD9
[14:41] registrar master eb7d6cf other SHARD3/pubkeys registration of iabakmar on SHARD3
[14:47] registrar master f5e911c other SHARD16/pubkeys registration of iabakmar on SHARD16
[14:49] *** sep332_ has quit IRC (konversation out)
[14:51] *** sep332_ has joined #internetarchive.bak
[14:52] *** cmaldonad has joined #internetarchive.bak
[14:54] *** Start has quit IRC (Quit: Disconnected.)
[14:55] *** cmaldonad has quit IRC (Client Quit)
[14:58] *** markaro has joined #internetarchive.bak
[15:16] Excellent
[15:23] registrar master 4b08ea7 other SHARD3/pubkeys registration of mitch on SHARD3
[17:10] *** markaro has quit IRC ()
[17:24] registrar master dd40f18 other SHARD15/pubkeys registration of octobyt3 on SHARD15
[17:40] *** sevs has joined #internetarchive.bak
[17:44] *** computerf has quit IRC (Read error: Operation timed out)
[18:12] *** atomotic has joined #internetarchive.bak
[18:13] *** computerf has joined #internetarchive.bak
[18:20] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[18:51] *** kyan has joined #internetarchive.bak
[19:13] registrar master 6d7eea2 other SHARD10/pubkeys registration of iabakmar on SHARD10
[19:15] registrar master 299f6ad other SHARD3/pubkeys registration of iabakmar on SHARD3
[19:42] *** Start has joined #internetarchive.bak
[20:09] *** Start has quit IRC (Remote host closed the connection)
[20:10] So I think I saw this mentioned earlier but is an encrypted rclone mount on google drive or ACD potentially fair game for storage?
[20:13] If you can engineer storage that passes the fixity and interaction check it's up for grabs.
[20:13] So you guys are syncing data and periodically verifying it?
[20:13] Yes
[20:14] Ok because I have a 1g/1g unmetered line but I don't keep a ton of storage local
[20:14] I've done 22TB of transit this month and heard no complaints from my ISP lol
[20:14] registrar master d2fb17f other SHARD3/pubkeys registration of mike on SHARD3
[20:15] SketchCow, What kind of file sizes are we talking about?
[20:15] Individual files or chunks?
[20:16] Like am I syncing 50G megawarcs or 10 million tiny files
[20:16] Purely for the sake of filesystem requirements
[20:18] Could be either
[20:18] No way to tell
[20:19] Ok so they're not wrapped by you guys?
[20:23] Blackout: git-annex can chunk larger files to smaller chunks when using a special remote; there are some special remotes supporting google drive
[20:27] *** SketchPho has quit IRC (Quit: Connection closed for inactivity)
[20:49] closure, so I'm not familiar with git-annex. Would that negate the need to mount the drive via rclone?
[20:50] So I've been running iabak since yesterday on two machines, the folders are 302G and 318G, however the values on the stat page sit at 50 and 80G - am I doing something wrong?
[20:51] *** Start has joined #internetarchive.bak
[20:52] *** Start has quit IRC (Client Quit)
[20:52] sevs: Be patient as we work it out. Some of the devs here might ask you some questions so we can check the reporting mechanism.
[20:53] It might be as simple as closure has it running once a day.
[20:53] SketchCow: ahh, ok
[20:54] Was just confused since it apparently was updating every 10 minutes
[20:55] *** Start has joined #internetarchive.bak
[20:55] ^since the stats seemed to be updating every 10 minutes
[20:59] *** Start has quit IRC (Client Quit)
[21:00] *** voovik198 has joined #internetarchive.bak
[21:02] *** voovik198 has left
[21:03] *** Start has joined #internetarchive.bak
[21:10] JASON GO BACK TO FUCKING SLEEP
[21:16] So maaaany yyy thiiinggss to dooooo
[21:36] *** n00b473 has joined #internetarchive.bak
[21:41] thelsdj: build is ready (second try): https://downloads.kitenet.net/git-annex/autobuild/armel/git-annex-standalone-armel.tar.gz
[21:42] trying it out
[21:43] works! now to see if everything else works on drobo
[21:43] sevs: your progress is only synced back periodically, much less frequently than 10 minutes. Give it a couple of days
[21:43] thelsdj: damn, nice!
[21:43] so was that on the WD NAS?
[21:44] I was only able to get it to build with a 32kb page size, not 64k
[21:44] it's a Drobo 5N
[21:44] which has 16k page size
[21:44] i get this when running iabak a second time: fatal: unable to access 'https://github.com/ArchiveTeam/IA.BAK/': Problem with the SSL CA cert (path? access rights?)
[21:44] trying rest but not sure if that is fatal
[21:45] Blackout: http://git-annex.branchable.com/tips/using_Google_Cloud_Storage/
[21:45] Drobo also doesn't have 'tempfile' command by default
[21:46] probably the NAS doesn't have a ssl cert store. https is only used for cloning the IA.BAK repo
[21:46] IA.BAK probably will need some porting for such embedded systems.
[21:47] yeah it's close to being a functional linux but doesn't have a lot of expected helper apps and such
[21:48] not sure why the git clone fails since i'm already in the git directory i cloned?
[21:50] ah doesn't have a cron by default either but i think i can install one
[21:50] it does a git pull to update itself
[21:51] the git pull works, it says 'Already up-to-date.' at the top, but then i get that SSL CA cert error a few lines down
[21:54] *** Start has quit IRC (Remote host closed the connection)
[21:55] thelsdj: iirc you said some device needed 64kb page size?
[21:57] appears that WD My Cloud devices at least in newer firmware versions have 64k page size
[21:57] so, not something you can test?
[21:57] unfortunately 64kb page size causes ld.gold to fail with internal error. bugs filed etc
[21:58] nope, i don't have one, but that was what the original bug i saw had so might be worth reaching out to the person who reported it https://git-annex.branchable.com/bugs/git-annex_won__39__t_execute_on_WD_My_Cloud_NAS/
[21:59] so yeah looks like git pull fails with the git in git-annex download, but works fine with the git I have installed on my device
[22:00] does git annex require a minimum git version? i think the one i have on the device is kind of old
[22:01] registrar master 200ee77 other SHARD14/pubkeys registration of fusl on SHARD14
[22:01] looks like 2.5 is the latest available built for drobo, i have 2.2 installed right now
[22:03] registrar master 202a210 other SHARD16/pubkeys registration of thelsdj on SHARD16
[22:04] welp, it's doing something
[22:07] closure, google cloud storage != google drive
[22:08] But yeah I see the page for it so I'll probably look into it
[22:10] ugh, i forgot that Drobo doesn't report actual free space with 'df' so I have to like say 'save 8TB' when I only want it to save 2
[22:22] registrar master 2b7770d other SHARD4/pubkeys registration of mitch on SHARD4
[22:32] *** n00b473 has quit IRC (Quit: http://chat.efnet.org )
[23:13] here is a non-iabak option for people: http://pastebin.com/Hzz56QDe
[23:14] it's code to grab collection items
[23:14] also it only grabs the original files
[23:19] Blackout: each shard is many sets of collection/item/files, where the files are as-they-are on IA; so, no, they aren't aggregated or otherwise transformed
[23:19] that said, we do limit the size of each shard
[23:28] *** Start has joined #internetarchive.bak
[23:39] ack
[23:52] *** sevs has quit IRC (Quit: Page closed)