[00:05] *** maelstrom has quit IRC (Quit: Leaving)
[00:05] *** maelstrom has joined #archiveteam
[00:10] *** notafed has joined #archiveteam
[00:10] *** maelstrom has quit IRC (Read error: Connection reset by peer)
[00:20] *** RichardG has joined #archiveteam
[00:20] *** RichardG has quit IRC (Read error: Connection reset by peer)
[00:21] *** RichardG has joined #archiveteam
[00:32] *** ZeoNet has joined #archiveteam
[00:34] *** kyounko has joined #archiveteam
[00:42] *** BlueMaxim has quit IRC (Quit: Leaving)
[00:44] *** ZeoNet_ has joined #archiveteam
[00:47] *** ZeoNet has quit IRC (Ping timeout: 244 seconds)
[00:58] *** mafrasi2 has joined #archiveteam
[00:59] *** RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
[01:09] *** RichardG has joined #archiveteam
[01:11] *** RichardG has quit IRC (Client Quit)
[01:14] *** kristian_ has quit IRC (Quit: Leaving)
[01:15] *** RichardG has joined #archiveteam
[01:18] *** ZeoNet__ has joined #archiveteam
[01:20] *** ZeoNet_ has quit IRC (Ping timeout: 244 seconds)
[01:24] *** ndiddy has joined #archiveteam
[01:34] *** ZeoNet__ is now known as ZeoNet
[01:36] *** notafed has quit IRC (Read error: Operation timed out)
[01:45] *** maelstrom has joined #archiveteam
[02:46] *** Start has quit IRC (Read error: Connection reset by peer)
[02:46] *** Start_ has joined #archiveteam
[02:53] *** Start_ is now known as Start
[03:15] *** ndiddy has quit IRC (Read error: Connection reset by peer)
[03:41] *** RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
[04:10] *** RichardG has joined #archiveteam
[04:11] ok. wgetting.
[04:12] if anyone wants my secret sauce IP range, just ask.
[04:12] *** nwf has quit IRC (Read error: Connection reset by peer)
[04:13] *** nwf has joined #archiveteam
[04:14] on the assumption that I could be interrupted, I'm doing front pages first.
[04:20] current thread, "why woman don't need rights" ._.
[04:20] *** DFJustin has quit IRC (Remote host closed the connection)
[04:25] *** DFJustin has joined #archiveteam
[04:37] *** Atros has joined #archiveteam
[04:39] *** atrocity has quit IRC (Ping timeout: 260 seconds)
[04:40] *** atrocity has joined #archiveteam
[04:43] *** Atros has quit IRC (Read error: Operation timed out)
[04:44] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[04:51] *** Sk1d has joined #archiveteam
[05:11] *** dashcloud has quit IRC (Ping timeout: 250 seconds)
[05:20] *** RichardG has quit IRC (Read error: Operation timed out)
[05:20] *** RichardG has joined #archiveteam
[05:22] *** dashcloud has joined #archiveteam
[05:26] in general does much software care about the 'request' in a warc, or just the response?
[05:30] *** maelstrom has quit IRC (Quit: Leaving)
[05:33] *** RichardG has quit IRC (Ping timeout: 259 seconds)
[05:44] *** RichardG has joined #archiveteam
[06:03] *** BlueMaxim has joined #archiveteam
[06:09] *** RichardG has quit IRC (Read error: Operation timed out)
[06:09] zout: the request records are used by a lot of replay software, e.g. wayback and pywb
[06:09] so yes, a lot of software cares
[06:10] my stuff seems at least minimally compatible with pywb, so great.
[06:11] there's a few pretty good libraries out there that take care of the thorny parts
[06:12] yeah, I'm using `warc` in python but was a bit unsure about some of the content in the request. ended up just making a warc with wget and copying the structure.
[06:20] cloudflare surprisingly hasn't banned me for scraping through it yet.
[07:13] *** vOYtEC has joined #archiveteam
[07:35] *** vOYtEC has quit IRC (Ping timeout: 255 seconds)
[08:27] *** ravetcofx has quit IRC (Ping timeout: 506 seconds)
[08:37] *** kurt has joined #archiveteam
[08:56] *** ZeoNet has quit IRC (Ping timeout: 370 seconds)
[09:06] *** arkiver2 has joined #archiveteam
[09:06] zout: so you are custom creating WARC files?
[09:07] Can I please see the script you are creating the WARCs with and an example of a WARC file created with it?
[09:08] to check how you are handling the HTTP headers.
[09:08] and request/response/other records
[09:09] why are you not using wpull or wget? wpull has support for custom scripts for your crawl.
[09:10] Basically if WARC files miss information, or have wrong headers (also HTTP headers), they will not go into the wayback machine, even if they are supported by the wayback machine
[09:15] there is also wget-lua, which has support for lua scripts.
[09:54] *** yipdw has quit IRC (Read error: Operation timed out)
[09:56] *** arkiver2 has quit IRC (Read error: Connection reset by peer)
[09:58] *** Infreq has quit IRC (Read error: Operation timed out)
[09:58] *** brayden_ has joined #archiveteam
[09:58] *** swebb sets mode: +o brayden_
[09:58] *** brayden has quit IRC (Read error: Operation timed out)
[10:00] *** Infreq has joined #archiveteam
[10:00] *** yipdw has joined #archiveteam
[10:05] *** logchfoo3 has quit IRC (Ping timeout: 250 seconds)
[10:07] *** logchfoo0 starts logging #archiveteam at Tue Sep 27 10:07:22 2016
[10:07] *** logchfoo0 has joined #archiveteam
[10:08] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[10:08] *** BlueMaxim has joined #archiveteam
[10:16] *** godane has quit IRC (Read error: Operation timed out)
[10:17] arkiver2: do user submitted WARC ever make it into the wayback machine proper?
[10:18] under exceptional agreements
[10:21] *** hyperion_ has joined #archiveteam
[10:22] arkiver2: PM'd a sample from my WARC. let me know if I'm missing anything, I'm not very far through so altering the format now wouldn't be a problem.
[10:24] arkiver: ^
[10:27] I didn't think IA ever took input for the wayback machine from outside sources so that wasn't factored into my decision making at all.
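Editor's sketch, not from the channel: the request/response records and record headers discussed above can be illustrated with a small hand-rolled serializer. This is a minimal example assuming WARC/1.0 framing and only the Python standard library. The function name `warc_record`, the example URL, and the HTTP payloads are hypothetical, and the code is not zout's script or the API of the `warc` package. Linking the response to the request through `WARC-Concurrent-To` is one permitted arrangement, chosen here for illustration.

```python
import uuid
from datetime import datetime, timezone

CRLF = b"\r\n"


def warc_record(record_type, target_uri, http_bytes, concurrent_to=None):
    """Serialize one WARC/1.0 record wrapping a raw HTTP message (hypothetical helper)."""
    record_id = "<urn:uuid:%s>" % uuid.uuid4()
    headers = [
        ("WARC-Type", record_type),
        ("WARC-Record-ID", record_id),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "application/http; msgtype=%s" % record_type),
        # Content-Length counts the bytes of the record block only.
        ("Content-Length", str(len(http_bytes))),
    ]
    if concurrent_to:
        # Points at the record captured in the same HTTP exchange.
        headers.append(("WARC-Concurrent-To", concurrent_to))
    lines = [b"WARC/1.0"] + [("%s: %s" % kv).encode("utf-8") for kv in headers]
    # Header block, blank line, record block, then two CRLFs end the record.
    return record_id, CRLF.join(lines) + CRLF + CRLF + http_bytes + CRLF + CRLF


# Made-up exchange: the raw bytes sent and received, HTTP headers included.
url = "http://example.com/"
request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
response = b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 2\r\n\r\nok"

request_id, request_record = warc_record("request", url, request)
_, response_record = warc_record("response", url, response, concurrent_to=request_id)
warc_bytes = request_record + response_record
```

The point of keeping the raw HTTP bytes intact is that the block of each record holds the HTTP headers exactly as transferred, which is what the header checks mentioned above would look at. Real crawls would normally leave this to wget, wpull, or an established library.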
[10:28] *** godane has joined #archiveteam
[10:30] *** kyounko has quit IRC (KVIrc 4.2.0 Equilibrium http://www.kvirc.net/)
[10:43] *** hyperion_ has quit IRC (Ping timeout: 250 seconds)
[11:02] *** godane has quit IRC (Quit: Leaving.)
[11:02] *** godane has joined #archiveteam
[11:17] *** RichardG has joined #archiveteam
[11:59] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[12:23] *** BlueMaxim has quit IRC (Quit: Leaving)
[12:30] *** BartoCH has joined #archiveteam
[12:41] *** ZeoNet has joined #archiveteam
[12:48] *** RichardG has quit IRC (Ping timeout: 255 seconds)
[13:41] *** ZeoNet_ has joined #archiveteam
[13:54] *** ZeoNet has quit IRC (Read error: Operation timed out)
[13:54] *** ZeoNet_ is now known as ZeoNet
[14:24] *** Start has quit IRC (Quit: Disconnected.)
[15:06] *** RichardG has joined #archiveteam
[15:37] zout: if you are using some custom scripts, can you please send me that too?
[15:37] Not a lot of usermade WARCs go into the wayback machine
[15:37] but if the way these WARCs were made is good
[15:37] and the actual WARCs are good I don't see a reason to not put them in the wayback machine
[15:39] zout: and I think it would be good to have hackforums in the wayback machine
[15:39] :)
[15:44] *** VADemon has joined #archiveteam
[15:44] *** VADemon has quit IRC (Read error: Connection reset by peer)
[15:45] *** VADemon has joined #archiveteam
[15:45] *** VADemon has quit IRC (Read error: Connection reset by peer)
[15:47] *** VADemon has joined #archiveteam
[16:16] *** Atom-- has joined #archiveteam
[16:18] *** chiefyg has joined #archiveteam
[16:18] WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
[16:18] *** RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
[16:20] *** Atom has quit IRC (Read error: Operation timed out)
[16:20] anybody?
[16:21] yahoosucks
[16:21] chiefyg: ^
[16:21] thanks :)
[16:23] *** chiefyg has quit IRC (Quit: Page closed)
[16:23] *** ZeoNet has quit IRC (Ping timeout: 370 seconds)
[16:36] *** ZeoNet has joined #archiveteam
[16:48] *** dashcloud has quit IRC (Read error: Operation timed out)
[16:48] *** robink has quit IRC (Ping timeout: 633 seconds)
[16:49] *** AlexLehm has joined #archiveteam
[16:55] *** RichardG has joined #archiveteam
[16:57] *** robink has joined #archiveteam
[17:00] *** ZeoNet has quit IRC (Ping timeout: 244 seconds)
[17:03] *** ravetcofx has joined #archiveteam
[17:06] *** dashcloud has joined #archiveteam
[17:12] *** RoanKatto has quit IRC (Ping timeout: 506 seconds)
[18:03] *** bRick5772 has joined #archiveteam
[18:13] *** swebb3 has joined #archiveteam
[18:19] Make a difference
[18:20] ?
[18:20] That was my pep talk to him
[18:20] Hey, it's late here
[18:20] *** swebb3 has quit IRC (Remote host closed the connection)
[18:20] to whom?
[18:30] *** ndiddy has joined #archiveteam
[18:30] chiefyg
[18:32] oh
[18:32] Yeah, not exactly a murder mystery
[18:33] sorry, i just can't get it up
[18:34] Everything's a murder mystery if you try hard enough.
[18:34] OK, Gawker storm is over.
[18:36] *** SketchCow changes topic to: Archive Team: We're not archive.org | http://archiveteam.org/ | lengthy/off-topic in #archiveteam-bs | With AT you Save
[18:55] *** Morbus has joined #archiveteam
[19:16] *** ZeoNet has joined #archiveteam
[19:24] *** dashcloud has quit IRC (Read error: Operation timed out)
[19:26] over in a good way or a bad way?
[19:27] My old gawker crawl is still running: https://archive.org/details/gawkermedia-20160624190933
[19:28] *** dr3gs has joined #archiveteam
[19:28] *** dashcloud has joined #archiveteam
[19:36] *** VADemon has quit IRC (Quit: left4dead)
[19:41] *** dashcloud has quit IRC (Read error: Operation timed out)
[19:44] *** dashcloud has joined #archiveteam
[19:58] *** dr3gs has quit IRC (Leaving)
[20:23] *** maelstrom has joined #archiveteam
[20:36] *** bzc6p has joined #archiveteam
[20:36] *** swebb sets mode: +o bzc6p
[20:37] SketchCow: myVIP project is done. I've just sent you a mail with some more information, as I don't want to disturb your holiday with work now.
[20:37] Gentlemen: thank you everyone who helped saving myVIP.
[20:38] *** bzc6p has quit IRC (Client Quit)
[20:39] *** blacwtr has joined #archiveteam
[20:39] No problem
[20:39] * HCross bows
[20:41] *** ZeoNet has quit IRC (Ping timeout: 370 seconds)
[20:42] *** blacwtr has quit IRC (Client Quit)
[20:57] *** nickname_ has joined #archiveteam
[21:27] You got it.
[21:37] *** z00nx has quit IRC (Remote host closed the connection)
[21:49] *** nickname_ has quit IRC (Ping timeout: 492 seconds)
[22:15] *** bRick5772 has quit IRC (Quit: Leaving.)
[22:16] *** AlexLehm has quit IRC (Ping timeout: 260 seconds)
[22:41] *** dashcloud has quit IRC (Read error: Operation timed out)
[22:45] *** dashcloud has joined #archiveteam
[23:19] *** jspiros has quit IRC (leaving)
[23:22] *** Aranje has joined #archiveteam
[23:45] *** dashcloud has quit IRC (Remote host closed the connection)
[23:46] *** BlueMaxim has joined #archiveteam