[00:04] heh, oops https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/5866
[00:59] *** BlueMaxim has joined #archiveteam
[01:49] *** verizon has quit IRC (Quit: Page closed)
[02:05] *** zenguy_pc has joined #archiveteam
[02:29] are there any tools around that will let me deduplicate a very large WARC?
[02:32] zout: warcat, although you'll have to write some additional code yourself...
[02:33] thought so, no matter.
[02:34] it'd be nice to have a warcat mode to do that, though
[02:34] *** MMovie2 has quit IRC (Read error: Operation timed out)
[02:34] if it turns out clean I'll contrib.
[02:35] there's a PR asking for something similar https://github.com/chfoo/warcat/issues/13
[02:37] well, you aren't alone in wanting it :-)
[02:38] zout: arkiver has written the code for that -- it may already be in the flickr grab source
[02:42] *** MMovie has joined #archiveteam
[02:45] *** MMovie2 has joined #archiveteam
[02:47] *** maelstrom has joined #archiveteam
[02:49] *** MMovie has quit IRC (Read error: Operation timed out)
[02:54] *** maelstrom has quit IRC (Ping timeout: 250 seconds)
[02:59] *** maelstrom has joined #archiveteam
[03:44] *** ndiddy has quit IRC (Read error: Connection reset by peer)
[03:46] *** zenguy_pc has quit IRC (Read error: Operation timed out)
[03:51] *** zenguy_pc has joined #archiveteam
[04:04] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[04:11] *** Sk1d has joined #archiveteam
[05:02] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[05:08] *** Sk1d has joined #archiveteam
[05:15] *** maelstrom has quit IRC (Quit: Leaving)
[05:18] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[06:08] *** MMovie2 has quit IRC (Read error: Operation timed out)
[06:16] *** ravetcofx has joined #archiveteam
[06:48] *** MMovie has joined #archiveteam
[07:16] *** MMovie2 has joined #archiveteam
[07:18] *** MMovie has quit IRC (Read error: Operation timed out)
[07:23] *** vOYtEC has joined #archiveteam
[07:32] *** ravetcofx has quit IRC (Ping timeout: 370 seconds)
[07:41] *** mutoso has quit IRC (Quit: leaving)
[07:52] *** schbirid has joined #archiveteam
[07:52] *** Igloo^ has quit IRC (Read error: Operation timed out)
[08:01] *** mutoso has joined #archiveteam
[08:02] *** atomotic has joined #archiveteam
[08:12] *** ravetcofx has joined #archiveteam
[08:28] *** RoanKatto has joined #archiveteam
[08:30] *** ravetcofx has quit IRC (Ping timeout: 370 seconds)
[08:31] *** ravetcofx has joined #archiveteam
[08:31] *** Morbus has quit IRC (Read error: Operation timed out)
[08:44] *** Honno has joined #archiveteam
[09:06] *** WinterFox has joined #archiveteam
[09:24] *** ravetcofx has quit IRC (Read error: Operation timed out)
[09:29] *** Aranje has quit IRC (Quit: Three sheets to the wind)
[09:43] *** nwf_ has quit IRC (Read error: Operation timed out)
[09:52] *** nwf_ has joined #archiveteam
[10:49] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[11:16] zout: I can give you some code that will deduplicate a WARC
[11:17] I only didn't receive feedback confirmation on it yet from IA that it is deduplicated correctly
[11:17] Though, I'm pretty sure it's done ok (plays back ok, etc.)
[11:17] *** godane has quit IRC (Ping timeout: 633 seconds)
[11:35] arkiver: that'd be helpful
[12:09] *** tuankiet has quit IRC (Remote host closed the connection)
[12:09] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[12:39] *** artherw has joined #archiveteam
[12:58] *** atomotic has joined #archiveteam
[12:59] *** godane has joined #archiveteam
[13:06] *** zenguy_pc has quit IRC (Read error: Operation timed out)
[13:32] *** ndiddy has joined #archiveteam
[13:53] *** Stiletto has joined #archiveteam
[13:56] *** WinterFox has quit IRC (Read error: Operation timed out)
[13:59] *** zenguy_pc has joined #archiveteam
[14:48] *** arkiver2_ has joined #archiveteam
[14:49] also posted this in #googlecode.
[14:49] checking the CDX records for the WARCs for status code 403, since we had a problem with that some time ago with the google code grab
[14:50] Currently google gives us status code 503 for all of google code, but if that is fixed we'll regrab the project that had a 403
[14:50] * arkiver2_ is afk
[14:50] *** arkiver2_ has quit IRC (Quit: BitchX: for distribution only with a new PC)
[14:50] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[14:53] *** JesseW has joined #archiveteam
[15:19] *** MMovie2 has quit IRC (Read error: Operation timed out)
[15:28] *** MMovie has joined #archiveteam
[16:35] *** ravetcofx has joined #archiveteam
[16:36] *** redlob has quit IRC (Ping timeout: 260 seconds)
[16:41] *** redlob has joined #archiveteam
[16:45] *** MMovie2 has joined #archiveteam
[16:47] *** MMovie has quit IRC (Read error: Operation timed out)
[16:50] apparently there is some botched org.za domain migration with lots of suspensions
[16:50] thats all i know
[16:57] *** maelstrom has joined #archiveteam
[17:05] *** asdasds has joined #archiveteam
[17:05] whoa
[17:05] how did i get here
[17:05] *** asdasds has quit IRC (Client Quit)
[17:23] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[17:27] *** SirCmpwn has quit IRC (Read error: Operation timed out)
[17:27] *** fusl has quit IRC (Read error: Operation timed out)
[17:27] *** SirCmpwn has joined #archiveteam
[17:27] *** TC01 has quit IRC (Read error: Operation timed out)
[17:27] *** Zialus has quit IRC (Read error: Operation timed out)
[17:28] *** Petri152 has quit IRC (Read error: Operation timed out)
[17:28] *** lukeman_ has quit IRC (Read error: Operation timed out)
[17:28] *** Jonimus has quit IRC (Read error: Operation timed out)
[17:28] *** Lord_Nigh has quit IRC (Write error: Broken pipe)
[17:28] *** lukeman has joined #archiveteam
[17:28] *** aMunster has quit IRC (Read error: Operation timed out)
[17:28] *** phuzion has quit IRC (Read error: Operation timed out)
[17:28] *** superkuh has quit IRC (Read error: Operation timed out)
[17:28] *** rbraun has quit IRC (Write error: Broken pipe)
[17:28] *** mhazinsk has quit IRC (Read error: Operation timed out)
[17:28] *** MMovie2 has quit IRC (Read error: Operation timed out)
[17:28] *** bwn has quit IRC (Read error: Operation timed out)
[17:29] *** Jonimus has joined #archiveteam
[17:29] *** swebb sets mode: +o Jonimus
[17:29] *** Gfy has quit IRC (Read error: Operation timed out)
[17:29] *** coretx has quit IRC (Read error: Operation timed out)
[17:29] *** superkuh has joined #archiveteam
[17:29] *** metalcamp has joined #archiveteam
[17:30] *** remsen has quit IRC (Read error: Operation timed out)
[17:30] *** jk[SVP] has quit IRC (Read error: Operation timed out)
[17:30] *** remsen1 has joined #archiveteam
[17:30] *** dxrt has quit IRC (Write error: Broken pipe)
[17:30] *** TC01 has joined #archiveteam
[17:30] *** beardicus has quit IRC (Read error: Operation timed out)
[17:30] *** Petri152 has joined #archiveteam
[17:31] *** Lord_Nigh has joined #archiveteam
[17:31] *** dxrt has joined #archiveteam
[17:31] *** beardicus has joined #archiveteam
[17:31] *** swebb sets mode: +o beardicus
[17:31] *** mhazinsk has joined #archiveteam
[17:31] *** bwn has joined #archiveteam
[17:31] *** phuzion has joined #archiveteam
[17:31] *** Zialus has joined #archiveteam
[17:31] *** MMovie has joined #archiveteam
[17:32] *** fusl has joined #archiveteam
[17:33] *** aMunster has joined #archiveteam
[17:34] *** coretx has joined #archiveteam
[17:34] *** rbraun has joined #archiveteam
[17:35] *** Gfy has joined #archiveteam
[17:36] *** VADemon has joined #archiveteam
[17:38] *** jk[SVP] has joined #archiveteam
[18:45] *** HCross has quit IRC (Read error: Connection reset by peer)
[18:45] *** hawc145 has joined #archiveteam
[18:46] *** hawc145 is now known as HCross
[18:55] *** ravetcofx has quit IRC (Read error: Operation timed out)
[18:59] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[18:59] *** metalcamp has joined #archiveteam
[19:06] *** bRick5772 has joined #archiveteam
[19:24] *** dashcloud has quit IRC (Remote host closed the connection)
[19:26] *** dashcloud has joined #archiveteam
[19:27] *** zenguy_pc has quit IRC (Read error: Operation timed out)
[19:35] *** ravetcofx has joined #archiveteam
[19:37] *** zenguy_pc has joined #archiveteam
[19:47] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[20:06] *** dashcloud has quit IRC (Read error: Operation timed out)
[20:09] *** dashcloud has joined #archiveteam
[20:54] http://firefishy.com/orgza-domains-suspended-05092016-top.txt
[21:01] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[21:24] *** VADemon has quit IRC (Quit: left4dead)
[21:53] *** BartoCH has joined #archiveteam
[21:54] *** zenguy_pc has quit IRC (Read error: Operation timed out)
[22:12] *** nwf__ has joined #archiveteam
[22:16] *** nwf_ has quit IRC (Read error: Operation timed out)
[22:24] *** Froggypwn has joined #archiveteam
[22:49] *** JesseW has joined #archiveteam
[22:56] *** SilSte has quit IRC (Read error: Operation timed out)
[23:13] *** SilSte has joined #archiveteam
[23:25] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[23:48] *** schbirid has quit IRC (Quit: Leaving)