#archiveteam 2015-02-09,Mon

Time Nickname Message
00:02 mistym has joined #archiveteam
00:03 Start_ is now known as Start
00:24 aaaaaaaa_ has quit IRC (Read error: Connection reset by peer)
00:24 aaaaaaaaa has joined #archiveteam
00:40 BlueMaxim has joined #archiveteam
01:05 mistym has quit IRC (Remote host closed the connection)
01:18 dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
01:20 dashcloud has joined #archiveteam
01:22 aschmitz Nemo_bis: Were you just trying to get the latest copy of the files out of them, or full history in a different format?
01:39 primus104 has quit IRC (Leaving.)
01:41 VonGuard_ has quit IRC (Ping timeout: 248 seconds)
01:51 VonGuard_ has joined #archiveteam
02:12 rejon has quit IRC (Read error: Operation timed out)
02:34 xmc full history, natch
03:12 aschmitz That was my guess, but then I guess I'm not sure what's wrong with the SVN format. Prefer git or something?
04:37 xmc a dump of the repo isn't optimal, you want something more compact and more portable
04:37 xmc svn has an export that packs it all up into a single file
04:43 aaaaaaaaa has quit IRC (Leaving)
05:28 db48x heh
05:28 db48x xmc: that's quite an understatement
05:28 db48x I had to deal with a 50GB repo dump a while back
05:48 achip has quit IRC (Remote host closed the connection)
06:08 rejon has joined #archiveteam
06:22 xmc db48x: hmmm.
06:49 achip has joined #archiveteam
06:54 achip has quit IRC (Ping timeout: 248 seconds)
07:04 mistym has joined #archiveteam
07:05 rejon has quit IRC (Read error: Operation timed out)
07:45 primus104 has joined #archiveteam
07:48 tev|stdby has quit IRC (Read error: Operation timed out)
07:50 tev|stdby has joined #archiveteam
07:56 Nemo_bis db48x: so did you use the 50 GB dump file?
07:57 xk_id has joined #archiveteam
07:57 Nemo_bis Ideally I want to convert the hundreds of repos into a single repo and keep all history. SVN or Git doesn't matter, but I expect SVN is easier given this is not a dump file
08:17 Control-S speaking of repos, are there any backups of the big open source project hosting sites like github and sourceforge?
08:38 achip has joined #archiveteam
08:48 achip has quit IRC (Ping timeout: 600 seconds)
08:55 schbirid has joined #archiveteam
09:21 midas git pull > 7z > upload to IA > wait for pull request > repeat
09:41 Ymgve has joined #archiveteam
10:05 Control-S Would it be hard to write a script to do this for the whole site? I have a little python experience and might be able to try
10:05 Control-S so basically maintain a local copy of each repo and then pull the changes on update
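Editor's sketch of the "keep a local copy, then pull the changes" idea from the messages above. It only decides which git command to run for one repository; it does not run anything, compress, or upload. The function name, the one-mirror-per-repo layout, and naming mirrors after the last URL segment are assumptions made for illustration, not something the channel agreed on.

```python
from __future__ import annotations

from pathlib import Path


def sync_command(url: str, mirror_root: Path) -> list[str]:
    """Return the git command that brings the local mirror of `url` up to date."""
    # Hypothetical layout: one bare mirror per repo, named after the last
    # URL segment. Real use would need the owner in the path to avoid
    # collisions between repos that share a name.
    name = url.rstrip("/").split("/")[-1]
    if not name.endswith(".git"):
        name += ".git"
    target = mirror_root / name
    if target.exists():
        # Mirror already present: fetch only what changed since the last run.
        return ["git", "-C", str(target), "remote", "update", "--prune"]
    # First run: full bare clone that includes every ref.
    return ["git", "clone", "--mirror", url, str(target)]
```

A driver would loop over a list of repository URLs, pass each returned command to `subprocess.run(..., check=True)`, and archive the mirror directory afterwards.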
10:06 Nemo_bis Control-S: we did that only for BerliOS, IIRC
10:07 Nemo_bis There are dumps for GitHub and SourceForge somewhere™ but not in accessible places AFAIK
10:08 Control-S any idea how much disk space and data throughput/time would be needed?
10:16 Nemo_bis Maybe https://code.openhub.net/ has some numbers, they have a dump of most stuff
10:18 db48x you could archive the GitHub firehose
10:18 db48x Nemo_bis: oh yes, I was converting it from SVN to Git using Reposurgeon
10:21 db48x it took ages, of course
10:22 db48x days for a single run of Reposurgeon, with quite a simple script
10:22 db48x spent a few weeks optimizing it, got the runs down to a few hours
10:23 db48x it's probably even faster now; I've read that they've kept optimizing it
10:27 achip has joined #archiveteam
10:28 Nemo_bis Interesting, thanks
10:28 Nemo_bis Of course this repo is two orders of magnitude smaller
10:31 achip has quit IRC (Read error: Operation timed out)
10:52 ionpulse has quit IRC (Ping timeout: 265 seconds)
10:55 db48x Nemo_bis: what repositories are you looking to convert?
10:56 Nemo_bis db48x: https://archive.org/details/toolserver-svn
10:56 Nemo_bis Was hosted on fisheye.toolserver.org
11:06 db48x ah, I see
11:07 db48x don't convert them to a single repository
11:07 db48x or at least, not a single SVN repository
11:08 db48x converting them to Git would be easy
11:09 db48x dump each one, feed the dump to Reposurgeon, feed the fast-import stream to Git
11:12 db48x but, if you do want to make a giant repository out of them, then Reposurgeon will do it
11:13 xk_id has quit IRC (Remote host closed the connection)
11:13 khaoohs has quit IRC (Read error: Connection reset by peer)
11:13 xk_id has joined #archiveteam
11:13 khaoohs has joined #archiveteam
11:13 db48x you can splice all the histories together, put them on separate branches, etc
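Editor's sketch of the per-repository pipeline db48x outlines (dump each one, convert the dump, feed the fast-import stream to Git). It plans the steps as data so they can be inspected before anything runs. `svnadmin dump`, `git init --bare`, and `git fast-import` are real commands; the `Step` type, the file naming, and the `{dump}`/`{stream}` placeholders are hypothetical. The converter command line is deliberately left to the caller, because the exact Reposurgeon invocation is not given in the log.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Step:
    argv: list[str]
    stdin: Path | None = None   # file fed to the command's standard input
    stdout: Path | None = None  # file receiving the command's standard output


def plan_conversion(repo_dir: Path, work_dir: Path, converter: list[str]) -> list[Step]:
    """Plan dump -> convert -> import for one SVN repository.

    `converter` is a caller-supplied argv in which "{dump}" and "{stream}"
    are replaced with the dump file and fast-import stream paths
    (an assumption of this sketch, not a Reposurgeon convention).
    """
    name = repo_dir.name
    dump = work_dir / f"{name}.svndump"
    stream = work_dir / f"{name}.fi"
    git_dir = work_dir / f"{name}.git"
    convert_argv = [arg.format(dump=dump, stream=stream) for arg in converter]
    return [
        Step(["svnadmin", "dump", "--quiet", str(repo_dir)], stdout=dump),
        Step(convert_argv),
        Step(["git", "init", "--bare", str(git_dir)]),
        Step(["git", f"--git-dir={git_dir}", "fast-import"], stdin=stream),
    ]


def run(steps: list[Step]) -> None:
    """Execute the planned steps in order, stopping at the first failure."""
    for step in steps:
        stdin = step.stdin.open("rb") if step.stdin else None
        stdout = step.stdout.open("wb") if step.stdout else None
        try:
            subprocess.run(step.argv, stdin=stdin, stdout=stdout, check=True)
        finally:
            for handle in (stdin, stdout):
                if handle:
                    handle.close()
```

Each repository gets its own Git repository here, matching the advice not to merge everything into one SVN repository; splicing the histories together would be a separate Reposurgeon job.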
11:18 the_fox has quit IRC (Ping timeout: 492 seconds)
11:19 the_fox has joined #archiveteam
11:23 xk_id has quit IRC (Read error: Operation timed out)
11:29 Nemo_bis Interesting, I will try
11:56 xk_id has joined #archiveteam
12:12 Ymgve has quit IRC ()
12:24 SmileyG has joined #archiveteam
12:33 Smiley has quit IRC (Ping timeout: 845 seconds)
13:11 Morbus has quit IRC (Ping timeout: 370 seconds)
13:16 achip has joined #archiveteam
13:19 Morbus has joined #archiveteam
13:24 achip has quit IRC (Read error: Operation timed out)
13:27 kris33 has joined #archiveteam
13:42 sankin has joined #archiveteam
13:44 xk_id has quit IRC (Remote host closed the connection)
13:44 xk_id has joined #archiveteam
13:46 kris33 has quit IRC (Ping timeout: 506 seconds)
13:47 BlueMaxim has quit IRC (Quit: Leaving)
13:47 xk_id has quit IRC (Read error: Connection reset by peer)
13:47 xk_id has joined #archiveteam
14:58 Ymgve has joined #archiveteam
15:05 Start has quit IRC (Disconnected.)
15:23 rejon has joined #archiveteam
15:36 signius_ has quit IRC (Ping timeout: 512 seconds)
15:45 signius_ has joined #archiveteam
15:48 arkiver SketchCow: we'll have ~200.000 items for cobook
15:51 Start has joined #archiveteam
15:54 Start has quit IRC (Client Quit)
16:01 Start has joined #archiveteam
16:01 mistym has quit IRC (Remote host closed the connection)
16:09 SketchCow WHERE IS MY HUG
16:09 SketchCow So wait, cobook?
16:09 SketchCow I am not sure I know what this is.
16:09 SketchCow Found
16:09 SketchCow Thank goodness, http://www.archiveteam.org/index.php?title=Current_Projects is active
16:15 garyrh_ We're the Hug Team.
16:16 garyrh_ We're gonna hug yer shit up.
16:23 SketchCow 989 of the bitsavers items have no OCR!
16:52 Start has quit IRC (Quit: Disconnected.)
16:53 Start has joined #archiveteam
16:54 Start has quit IRC (Read error: Connection reset by peer)
16:54 Start has joined #archiveteam
16:58 Start has quit IRC (Client Quit)
16:58 Start has joined #archiveteam
16:58 Start has quit IRC (Client Quit)
17:00 Start has joined #archiveteam
17:00 Start has quit IRC (Client Quit)
17:00 Start has joined #archiveteam
17:00 Start has quit IRC (Client Quit)
17:04 mistym has joined #archiveteam
17:04 DFJustin ia in general seems to need more retrospective "did the derive fuck up" passes
17:15 aaaaaaaaa has joined #archiveteam
17:18 SketchCow Yes, agreed
17:18 SketchCow I think derive is one of those things thrown over the side if there's an issue.
17:21 K4k has joined #archiveteam
17:24 Start has joined #archiveteam
17:25 midas SketchCow: current projects should be in the topic
17:45 Start has quit IRC (Disconnected.)
17:48 SketchCow 65G ovi-store
17:48 SketchCow 25G ovi-store_attempt_2
17:48 SketchCow What's the story
17:49 Kazzy they blocked us, useragent bans iirc
17:51 rejon has quit IRC (Ping timeout: 370 seconds)
18:22 Morbus has quit IRC (Ping timeout: 248 seconds)
18:29 arkiver The Cobook project has started: #cookbook
18:31 Start has joined #archiveteam
18:42 Start has quit IRC (Quit: Disconnected.)
19:08 kyan_ has quit IRC (Quit: Leaving)
19:18 phuzion has joined #archiveteam
19:44 Start has joined #archiveteam
19:45 Start_ has joined #archiveteam
19:45 Start has quit IRC (Read error: Connection reset by peer)
19:47 sankin has quit IRC (Leaving.)
19:49 Peetz0r What does 'ia upload' do when it shows "identifier:" but does not yet show any upload progress bar?
19:49 Peetz0r Also, why is disk usage through the roof during that process?
19:51 Peetz0r I am already using 'ionice -c3' to prevent as many issues as I can, but it still prevents some other uses of the same machine while ia upload is running
19:52 midas it's creating a hash of your data and making sure it has enough storage to upload it
19:52 Peetz0r you mean, storage on the S3 side?
19:52 midas yep
19:53 Peetz0r also, the hashing should be cpu-bound, right?
19:53 Peetz0r I see almost no cpu usage during that
19:54 Start_ has quit IRC (Quit: Disconnected.)
19:55 Peetz0r my current ia upload started 18:48 and I paused it somewhere between 19:05 and 19:10 because we wanted to watch TV (the same machine runs kodi with the nederland24 plugin in order to watch tv), but we watched matthijs van nieuwkerk at basically 1fps without sound
19:55 Peetz0r it was funny but pointless
19:55 Peetz0r I had to pause ia upload and restart kodi
19:56 Peetz0r but between 18:48 and 19:05 there was not much cpu usage
19:56 Peetz0r well, there was, but most of it was iowait, not user or system
19:57 Peetz0r so that should be managed by ionice -c3, right? yet somehow, it does not
19:57 Peetz0r I have graphs of cpu usage and disk usage and eth0 traffic and more :)
20:02 Morbus has joined #archiveteam
20:16 mistym has quit IRC (Remote host closed the connection)
20:20 Start has joined #archiveteam
20:23 Start has quit IRC (Client Quit)
20:35 Start has joined #archiveteam
20:41 Nemo_bis Outrageous that ia wasn't optimised for the movie use case
20:43 DFJustin hashing is disk-bound for large files
20:43 DFJustin if you have a 50gb tar it has to read in the entire thing to hash it
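Editor's sketch of why hashing a large file shows up as disk activity rather than CPU load: every byte has to pass through the digest, so the whole file is read from disk once. MD5 is an assumption chosen for illustration (the log only says "a hash"), and `file_md5` is a hypothetical helper, not part of the `ia` tool.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks; return (hex digest, bytes read)."""
    digest = hashlib.md5()
    bytes_read = 0
    with open(path, "rb") as handle:
        # Read until EOF; memory use stays at one chunk, but the full file
        # still has to come off the disk before the digest is known.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            bytes_read += len(chunk)
    return digest.hexdigest(), bytes_read
```

The returned byte count always equals the file size, which is the point DFJustin makes: for a 50 GB tar, the hash costs 50 GB of reads regardless of how fast the CPU is.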
20:58 signius_ has quit IRC (ircd.choopa.net irc.eversible.com)
20:58 the_fox has quit IRC (ircd.choopa.net irc.eversible.com)
20:58 dashcloud has quit IRC (ircd.choopa.net irc.eversible.com)
20:58 balrog has quit IRC (ircd.choopa.net irc.eversible.com)
20:58 Cameron_D has quit IRC (ircd.choopa.net irc.eversible.com)
20:58 Rickster has quit IRC (ircd.choopa.net irc.eversible.com)
20:58 xmc has quit IRC (ircd.choopa.net irc.eversible.com)
20:58 sivoais has quit IRC (ircd.choopa.net irc.eversible.com)
20:58 slash` has quit IRC (ircd.choopa.net irc.eversible.com)
20:58 Famicoman has quit IRC (ircd.choopa.net irc.eversible.com)
20:58 superkuh has quit IRC (ircd.choopa.net irc.eversible.com)
20:58 boozehoun has quit IRC (ircd.choopa.net irc.eversible.com)
21:14 S[h]O[r]T has quit IRC (Read error: Operation timed out)
21:23 Start has quit IRC (Quit: Disconnected.)
21:25 Peetz0r Nemo_bis: yes it is :p (insert sarcasm tag here)
21:25 Peetz0r DFJustin: I understand, but why does 'ionice -c3' not prevent my issue?
21:25 Start has joined #archiveteam
21:37 arkiver SketchCow: looks like cobook is a lot smaller than I thought it was, will be done in a few hours
21:38 signius_ has joined #archiveteam
21:38 the_fox has joined #archiveteam
21:38 dashcloud has joined #archiveteam
21:38 balrog has joined #archiveteam
21:38 Cameron_D has joined #archiveteam
21:38 Rickster has joined #archiveteam
21:38 xmc has joined #archiveteam
21:38 sivoais has joined #archiveteam
21:38 slash` has joined #archiveteam
21:38 Famicoman has joined #archiveteam
21:38 superkuh has joined #archiveteam
21:38 boozehoun has joined #archiveteam
21:38 irc.eversible.com sets mode: +oo balrog xmc
21:38 swebb sets mode: +o balrog
21:38 swebb sets mode: +o xmc
21:38 schbirid has quit IRC (Quit: Leaving)
21:38 balrog sets mode: +o Lord_Nigh
21:41 mistym has joined #archiveteam
21:44 S[h]O[r]T has joined #archiveteam
21:47 balrog has quit IRC (Quit: Bye)
21:48 balrog has joined #archiveteam
21:48 swebb sets mode: +o balrog
22:02 K4k has quit IRC (Read error: Operation timed out)
22:04 Start arkiver: we might want to start www.odysee.com next
22:04 Start it's a video and photo site shutting down on feb. 23
22:20 Start has quit IRC (Quit: Disconnected.)
22:24 Riviera has joined #archiveteam
23:11 Start has joined #archiveteam
23:25 xk_id has quit IRC (Remote host closed the connection)
23:28 arkiver Start: yes, let's do that
23:31 Ymgve has quit IRC ()
23:46 Start_ has joined #archiveteam
23:47 Start has quit IRC (Read error: Connection reset by peer)
23:48 cadbury_ has quit IRC (Read error: Operation timed out)
23:52 SN4T14_ is now known as SN4T14
23:52 cadbury_ has joined #archiveteam
