[00:02] *** mistym has joined #archiveteam
[00:03] *** Start_ is now known as Start
[00:24] *** aaaaaaaa_ has quit IRC (Read error: Connection reset by peer)
[00:24] *** aaaaaaaaa has joined #archiveteam
[00:40] *** BlueMaxim has joined #archiveteam
[01:05] *** mistym has quit IRC (Remote host closed the connection)
[01:18] *** dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
[01:20] *** dashcloud has joined #archiveteam
[01:22] Nemo_bis: Were you just trying to get the latest copy of the files out of them, or full history in a different format?
[01:39] *** primus104 has quit IRC (Leaving.)
[01:41] *** VonGuard_ has quit IRC (Ping timeout: 248 seconds)
[01:51] *** VonGuard_ has joined #archiveteam
[02:12] *** rejon has quit IRC (Read error: Operation timed out)
[02:34] full history, natch
[03:12] That was my guess, but then I guess I'm not sure what's wrong with the SVN format. Prefer git or something?
[04:37] a dump of the repo isn't optimal, you want something more compact and more portable
[04:37] svn has an export that packs it all up into a single file
[04:43] *** aaaaaaaaa has quit IRC (Leaving)
[05:28] heh
[05:28] xmc: that's quite an understatement
[05:28] I had to deal with a 50GB repo dump a while back
[05:48] *** achip has quit IRC (Remote host closed the connection)
[06:08] *** rejon has joined #archiveteam
[06:22] db48x: hmmm.
[06:49] *** achip has joined #archiveteam
[06:54] *** achip has quit IRC (Ping timeout: 248 seconds)
[07:04] *** mistym has joined #archiveteam
[07:05] *** rejon has quit IRC (Read error: Operation timed out)
[07:45] *** primus104 has joined #archiveteam
[07:48] *** tev|stdby has quit IRC (Read error: Operation timed out)
[07:50] *** tev|stdby has joined #archiveteam
[07:56] db48x: so did you use the 50 GB dump file?
[07:57] *** xk_id has joined #archiveteam
[07:57] Ideally I want to convert the hundreds of repos into a single repo and keep all history.
SVN or Git doesn't matter, but I expect SVN is easier given this is not a dump file
[08:17] speaking of repos, are there any backups of the big open source project hosting sites like github and sourceforge?
[08:38] *** achip has joined #archiveteam
[08:48] *** achip has quit IRC (Ping timeout: 600 seconds)
[08:55] *** schbirid has joined #archiveteam
[09:21] git pull > 7z > upload to IA > wait for pull request > repeat
[09:41] *** Ymgve has joined #archiveteam
[10:05] Would it be hard to write a script to do this for the whole site? I have a little python experience and might be able to try
[10:05] so basically maintain a local copy of each repo and then pull the changes on update
[10:06] Control-S: we did that only for BerliOS, IIRC
[10:07] There are dumps for GitHub and SourceForge somewhere™ but not in accessible places AFAIK
[10:08] any idea how much disk space and data throughput/time would be needed?
[10:16] Maybe https://code.openhub.net/ has some numbers, they have a dump of most stuff
[10:18] you could archive the GitHub firehose
[10:18] Nemo_bis: oh yes, I was converting it from SVN to Git using Reposurgeon
[10:21] it took ages, of course
[10:22] days for a single run of Reposurgeon, with quite a simple script
[10:22] spent a few weeks optimizing it, got the runs down to a few hours
[10:23] it's probably even faster now; I've read that they've kept optimizing it
[10:27] *** achip has joined #archiveteam
[10:28] Interesting, thanks
[10:28] Of course this repo is two orders of magnitude smaller
[10:31] *** achip has quit IRC (Read error: Operation timed out)
[10:52] *** ionpulse has quit IRC (Ping timeout: 265 seconds)
[10:55] Nemo_bis: what repositories are you looking to convert?
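The "git pull > 7z > upload to IA > repeat" loop, with a local copy of each repo that is refreshed on every pass, can be sketched in Python. This is a minimal sketch under stated assumptions: each repo is kept as a bare mirror (`git clone --mirror`, then `git remote update --prune` instead of `git pull`), packed with `7z a`, and pushed with the `ia upload <identifier> <file>` CLI. The helper names, the `<identifier>.git` directory layout, and the identifier-to-URL mapping are hypothetical, not an existing Archive Team tool.

```python
import subprocess
from pathlib import Path


def sync_command(url: str, dest: Path) -> list[str]:
    """Git command that creates or refreshes a bare mirror of `url` in `dest`."""
    if dest.exists():
        # Mirror already on disk: fetch new refs and prune deleted branches/tags.
        return ["git", "-C", str(dest), "remote", "update", "--prune"]
    # First run: clone every ref (branches, tags, notes) as a bare mirror.
    return ["git", "clone", "--mirror", url, str(dest)]


def pack_command(dest: Path) -> list[str]:
    """7-Zip command that packs the mirror directory into `<dest>.7z`."""
    return ["7z", "a", str(dest) + ".7z", str(dest)]


def upload_command(identifier: str, archive: str) -> list[str]:
    """Internet Archive CLI upload of one file to a (hypothetical) item identifier."""
    return ["ia", "upload", identifier, archive]


def sync_all(repos: dict[str, str], root: Path, run: bool = False) -> list[list[str]]:
    """One pass of the pull > 7z > upload loop for every repo.

    `repos` maps a hypothetical IA item identifier to a clone URL.
    With run=False the commands are only returned, not executed.
    """
    plan = []
    for identifier, url in repos.items():
        dest = root / (identifier + ".git")
        for cmd in (sync_command(url, dest),
                    pack_command(dest),
                    upload_command(identifier, str(dest) + ".7z")):
            plan.append(cmd)
            if run:
                # Stop at the first failing step so a broken clone is never uploaded.
                subprocess.run(cmd, check=True)
    return plan
```

Running `sync_all(repos, Path("/srv/mirrors"))` first prints nothing and just returns the plan, which is a cheap way to estimate how many repos (and, with `du`, how much disk) a whole-site run would touch before passing `run=True`.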
[10:56] db48x: https://archive.org/details/toolserver-svn
[10:56] Was hosted on fisheye.toolserver.org
[11:06] ah, I see
[11:07] don't convert them to a single repository
[11:07] or at least, not a single SVN repository
[11:08] converting them to Git would be easy
[11:09] dump each one, feed the dump to Reposurgeon, feed the fast-import stream to Git
[11:12] but, if you do want to make a giant repository out of them, then Reposurgeon will do it
[11:13] *** xk_id has quit IRC (Remote host closed the connection)
[11:13] *** khaoohs has quit IRC (Read error: Connection reset by peer)
[11:13] *** xk_id has joined #archiveteam
[11:13] *** khaoohs has joined #archiveteam
[11:13] you can splice all the histories together, put them on separate branches, etc
[11:18] *** the_fox has quit IRC (Ping timeout: 492 seconds)
[11:19] *** the_fox has joined #archiveteam
[11:23] *** xk_id has quit IRC (Read error: Operation timed out)
[11:29] Interesting, I will try
[11:56] *** xk_id has joined #archiveteam
[12:12] *** Ymgve has quit IRC ()
[12:24] *** SmileyG has joined #archiveteam
[12:33] *** Smiley has quit IRC (Ping timeout: 845 seconds)
[13:11] *** Morbus has quit IRC (Ping timeout: 370 seconds)
[13:16] *** achip has joined #archiveteam
[13:19] *** Morbus has joined #archiveteam
[13:24] *** achip has quit IRC (Read error: Operation timed out)
[13:27] *** kris33 has joined #archiveteam
[13:42] *** sankin has joined #archiveteam
[13:44] *** xk_id has quit IRC (Remote host closed the connection)
[13:44] *** xk_id has joined #archiveteam
[13:46] *** kris33 has quit IRC (Ping timeout: 506 seconds)
[13:47] *** BlueMaxim has quit IRC (Quit: Leaving)
[13:47] *** xk_id has quit IRC (Read error: Connection reset by peer)
[13:47] *** xk_id has joined #archiveteam
[14:58] *** Ymgve has joined #archiveteam
[15:05] *** Start has quit IRC (Disconnected.)
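The suggested pipeline (dump each SVN repo, feed the dump to Reposurgeon, feed the fast-import stream to Git) and the "separate branches" layout can be sketched as a Python helper that only builds shell commands. Assumptions, stated loudly: the SVN repositories are available on local disk (for a remote URL, `svnrdump dump URL` would replace `svnadmin dump`); the Reposurgeon command strings (`read <file`, `prefer git`, `write >file`) are assumed from its read/write syntax and should be checked against its manual; the splice step is a plain-git substitute (one branch namespace per repo via a fetch refspec), not Reposurgeon's own splicing. All function and path names are hypothetical.

```python
import shlex
from pathlib import Path


def convert_commands(svn_repo: Path, work: Path) -> list[str]:
    """Shell steps converting one local SVN repository into a Git repository."""
    name = svn_repo.name
    dump = work / f"{name}.svn"
    stream = work / f"{name}.fi"
    git_dir = work / f"{name}-git"
    q = lambda p: shlex.quote(str(p))
    return [
        # 1. Full-history dump of the SVN repository (needs local filesystem access).
        f"svnadmin dump {q(svn_repo)} > {q(dump)}",
        # 2. Reposurgeon reads the dump and writes a git fast-import stream
        #    (assumed command syntax; verify against the Reposurgeon manual).
        f"reposurgeon {shlex.quote('read <' + str(dump))} "
        f"{shlex.quote('prefer git')} {shlex.quote('write >' + str(stream))}",
        # 3. Replay the stream into a fresh Git repository.
        f"git init {q(git_dir)}",
        f"git -C {q(git_dir)} fast-import < {q(stream)}",
    ]


def splice_commands(git_dirs: list[Path], combined: Path) -> list[str]:
    """Plain-git splice: fetch each converted repo into its own branch namespace."""
    q = lambda p: shlex.quote(str(p))
    cmds = [f"git init --bare {q(combined)}"]
    for git_dir in git_dirs:
        # e.g. refs/heads/master of toolA-git becomes refs/heads/toolA-git/master
        refspec = f"refs/heads/*:refs/heads/{git_dir.name}/*"
        cmds.append(f"git -C {q(combined)} fetch {q(git_dir)} {shlex.quote(refspec)}")
    return cmds
```

Keeping each repo as its own Git repository first, as suggested in the log, means the combined repository is optional: the splice step can be rerun or skipped without redoing the slow Reposurgeon runs.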
[15:23] *** rejon has joined #archiveteam
[15:36] *** signius_ has quit IRC (Ping timeout: 512 seconds)
[15:45] *** signius_ has joined #archiveteam
[15:48] SketchCow: we'll have ~200.000 items for cobook
[15:51] *** Start has joined #archiveteam
[15:54] *** Start has quit IRC (Client Quit)
[16:01] *** Start has joined #archiveteam
[16:01] *** mistym has quit IRC (Remote host closed the connection)
[16:09] WHERE IS MY HUG
[16:09] So wait, cobook?
[16:09] I am not sure I know what this is.
[16:09] Found
[16:09] Thank goodness, http://www.archiveteam.org/index.php?title=Current_Projects is active
[16:15] We're the Hug Team.
[16:16] We're gonna hug yer shit up.
[16:23] 989 of the bitsavers items have no OCR!
[16:52] *** Start has quit IRC (Quit: Disconnected.)
[16:53] *** Start has joined #archiveteam
[16:54] *** Start has quit IRC (Read error: Connection reset by peer)
[16:54] *** Start has joined #archiveteam
[16:58] *** Start has quit IRC (Client Quit)
[16:58] *** Start has joined #archiveteam
[16:58] *** Start has quit IRC (Client Quit)
[17:00] *** Start has joined #archiveteam
[17:00] *** Start has quit IRC (Client Quit)
[17:00] *** Start has joined #archiveteam
[17:00] *** Start has quit IRC (Client Quit)
[17:04] *** mistym has joined #archiveteam
[17:04] ia in general seems to need more retrospective "did the derive fuck up" passes
[17:15] *** aaaaaaaaa has joined #archiveteam
[17:18] Yes, agreed
[17:18] I think derive is one of those things thrown over the side if there's an issue.
[17:21] *** K4k has joined #archiveteam
[17:24] *** Start has joined #archiveteam
[17:25] SketchCow: current projects should be in the topic
[17:45] *** Start has quit IRC (Disconnected.)
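A retrospective "did the derive fail" pass, like the one that found 989 bitsavers items without OCR, can start from each item's file listing. This is a minimal sketch that assumes the listing is already fetched as a list of dicts with `name` and `format` keys (the shape returned by, for example, `internetarchive.get_item(identifier).files`), and that a successful OCR derive leaves a `<identifier>_djvu.txt` file (format `DjVuTXT`); both the suffix and the format name are assumptions, and other OCR outputs are not checked.

```python
from typing import Iterable

# Assumed markers of a finished OCR derive; adjust if the derive outputs differ.
OCR_SUFFIXES = ("_djvu.txt",)
OCR_FORMATS = {"DjVuTXT"}


def has_ocr(files: Iterable[dict]) -> bool:
    """True if any file entry looks like an OCR plain-text derivative."""
    return any(
        f.get("format") in OCR_FORMATS or f.get("name", "").endswith(OCR_SUFFIXES)
        for f in files
    )


def missing_ocr(items: dict[str, list[dict]]) -> list[str]:
    """Sorted identifiers whose file listing contains no OCR derivative."""
    return sorted(ident for ident, files in items.items() if not has_ocr(files))
```

The resulting identifier list is what would then be queued for a re-derive; fetching the listings and requesting the re-derive are left out because they depend on IA credentials and tooling.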
[17:48] 65G ovi-store
[17:48] 25G ovi-store_attempt_2
[17:48] What's the story
[17:49] they blocked us, useragent bans iirc
[17:51] *** rejon has quit IRC (Ping timeout: 370 seconds)
[18:22] *** Morbus has quit IRC (Ping timeout: 248 seconds)
[18:29] The Cobook project has started: #cookbook
[18:31] *** Start has joined #archiveteam
[18:42] *** Start has quit IRC (Quit: Disconnected.)
[19:08] *** kyan_ has quit IRC (Quit: Leaving)
[19:18] *** phuzion has joined #archiveteam
[19:44] *** Start has joined #archiveteam
[19:45] *** Start_ has joined #archiveteam
[19:45] *** Start has quit IRC (Read error: Connection reset by peer)
[19:47] *** sankin has quit IRC (Leaving.)
[19:49] What does 'ia upload' do when it shows "identifier:" but does not yet show any upload progress bar?
[19:49] Also, why is disk usage through the roof during that process?
[19:51] I am already using 'ionice -c3' to prevent as many issues as I can, but it still prevents some other uses of the same machine while ia upload is running
[19:52] it's creating a hash of your data and making sure it has enough storage to upload it
[19:52] you mean, storage on the S3 side?
[19:52] yep
[19:53] also, the hashing should be cpu-bound, right?
[19:53] I see almost no cpu usage during that
[19:54] *** Start_ has quit IRC (Quit: Disconnected.)
[19:55] my current ia upload started at 18:48 and I paused it somewhere between 19:05 and 19:10 because we wanted to watch TV (the same machine runs kodi with the nederland24 plugin in order to watch tv), but we watched matthijs van nieuwkerk at basically 1fps without sound
[19:55] it was funny but pointless
[19:55] I had to pause ia upload and restart kodi
[19:56] but between 18:48 and 19:05 there was not much cpu usage
[19:56] well, there was, but most of it was iowait, not user or system
[19:57] so that should be managed by ionice -c3, right?
yet somehow, it does not
[19:57] I have graphs of cpu usage and disk usage and eth0 traffic and more :)
[20:02] *** Morbus has joined #archiveteam
[20:16] *** mistym has quit IRC (Remote host closed the connection)
[20:20] *** Start has joined #archiveteam
[20:23] *** Start has quit IRC (Client Quit)
[20:35] *** Start has joined #archiveteam
[20:41] Outrageous that ia wasn't optimised for the movie use case
[20:43] hashing is disk-bound for large files
[20:43] if you have a 50gb tar it has to read in the entire thing to hash it
[20:58] *** signius_ has quit IRC (ircd.choopa.net irc.eversible.com)
[20:58] *** the_fox has quit IRC (ircd.choopa.net irc.eversible.com)
[20:58] *** dashcloud has quit IRC (ircd.choopa.net irc.eversible.com)
[20:58] *** balrog has quit IRC (ircd.choopa.net irc.eversible.com)
[20:58] *** Cameron_D has quit IRC (ircd.choopa.net irc.eversible.com)
[20:58] *** Rickster has quit IRC (ircd.choopa.net irc.eversible.com)
[20:58] *** xmc has quit IRC (ircd.choopa.net irc.eversible.com)
[20:58] *** sivoais has quit IRC (ircd.choopa.net irc.eversible.com)
[20:58] *** slash` has quit IRC (ircd.choopa.net irc.eversible.com)
[20:58] *** Famicoman has quit IRC (ircd.choopa.net irc.eversible.com)
[20:58] *** superkuh has quit IRC (ircd.choopa.net irc.eversible.com)
[20:58] *** boozehoun has quit IRC (ircd.choopa.net irc.eversible.com)
[21:14] *** S[h]O[r]T has quit IRC (Read error: Operation timed out)
[21:23] *** Start has quit IRC (Quit: Disconnected.)
[21:25] Nemo_bis: yes it is :p (insert sarcasm tag here)
[21:25] DFJustin: I understand, but why does 'ionice -c3' not prevent my issue?
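Why a large upload spends its "identifier:" phase in iowait rather than CPU can be illustrated by hashing a file the way an uploader has to: every byte is read from disk, and MD5 itself is cheap. This sketch assumes MD5 as the checksum (the log does not say which hash `ia upload` computes) and times only the reads, so on a 50 GB tar the read time should dominate. The `low_priority` wrapper shows the usual way to run the upload under the idle I/O class plus lowest CPU priority; whether `ionice -c3` actually takes effect depends on the kernel's I/O scheduler honouring I/O priorities (historically CFQ, now BFQ), which is one plausible reason for the behaviour described above, not a confirmed diagnosis. Function names are hypothetical.

```python
import hashlib
import time
from pathlib import Path

CHUNK = 8 * 1024 * 1024  # 8 MiB reads keep memory flat even for 50 GB files


def md5_file(path: Path, chunk_size: int = CHUNK) -> tuple[str, float]:
    """Hash a file in fixed-size chunks; return (hex digest, seconds spent in reads)."""
    h = hashlib.md5()
    read_time = 0.0
    with open(path, "rb") as f:
        while True:
            t0 = time.perf_counter()
            block = f.read(chunk_size)  # disk-bound part: the whole file passes through here
            read_time += time.perf_counter() - t0
            if not block:
                break
            h.update(block)  # CPU-bound part: comparatively cheap for MD5
    return h.hexdigest(), read_time


def low_priority(cmd: list[str]) -> list[str]:
    """Prefix a command with idle I/O scheduling class and lowest CPU priority."""
    return ["ionice", "-c3", "nice", "-n", "19", *cmd]
```

For example, `subprocess.run(low_priority(["ia", "upload", "my-item", "big.tar"]))` would start the upload with both priorities lowered; `my-item` and `big.tar` are placeholders.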
[21:25] *** Start has joined #archiveteam
[21:37] SketchCow: looks like cobook is a lot smaller than I thought it was, will be done in a few hours
[21:38] *** signius_ has joined #archiveteam
[21:38] *** the_fox has joined #archiveteam
[21:38] *** dashcloud has joined #archiveteam
[21:38] *** balrog has joined #archiveteam
[21:38] *** Cameron_D has joined #archiveteam
[21:38] *** Rickster has joined #archiveteam
[21:38] *** xmc has joined #archiveteam
[21:38] *** sivoais has joined #archiveteam
[21:38] *** slash` has joined #archiveteam
[21:38] *** Famicoman has joined #archiveteam
[21:38] *** superkuh has joined #archiveteam
[21:38] *** boozehoun has joined #archiveteam
[21:38] *** irc.eversible.com sets mode: +oo balrog xmc
[21:38] *** swebb sets mode: +o balrog
[21:38] *** swebb sets mode: +o xmc
[21:38] *** schbirid has quit IRC (Quit: Leaving)
[21:38] *** balrog sets mode: +o Lord_Nigh
[21:41] *** mistym has joined #archiveteam
[21:44] *** S[h]O[r]T has joined #archiveteam
[21:47] *** balrog has quit IRC (Quit: Bye)
[21:48] *** balrog has joined #archiveteam
[21:48] *** swebb sets mode: +o balrog
[22:02] *** K4k has quit IRC (Read error: Operation timed out)
[22:04] arkiver: we might want to start www.odysee.com next
[22:04] it's a video and photo site shutting down on feb. 23
[22:20] *** Start has quit IRC (Quit: Disconnected.)
[22:24] *** Riviera has joined #archiveteam
[23:11] *** Start has joined #archiveteam
[23:25] *** xk_id has quit IRC (Remote host closed the connection)
[23:28] Start: yes, let's do that
[23:31] *** Ymgve has quit IRC ()
[23:46] *** Start_ has joined #archiveteam
[23:47] *** Start has quit IRC (Read error: Connection reset by peer)
[23:48] *** cadbury_ has quit IRC (Read error: Operation timed out)
[23:52] *** SN4T14_ is now known as SN4T14
[23:52] *** cadbury_ has joined #archiveteam