#archiveteam 2016-09-27,Tue

Time Nickname Message
00:05 🔗 maelstrom has quit IRC (Quit: Leaving)
00:05 🔗 maelstrom has joined #archiveteam
00:10 🔗 notafed has joined #archiveteam
00:10 🔗 maelstrom has quit IRC (Read error: Connection reset by peer)
00:20 🔗 RichardG has joined #archiveteam
00:20 🔗 RichardG has quit IRC (Read error: Connection reset by peer)
00:21 🔗 RichardG has joined #archiveteam
00:32 🔗 ZeoNet has joined #archiveteam
00:34 🔗 kyounko has joined #archiveteam
00:42 🔗 BlueMaxim has quit IRC (Quit: Leaving)
00:44 🔗 ZeoNet_ has joined #archiveteam
00:47 🔗 ZeoNet has quit IRC (Ping timeout: 244 seconds)
00:58 🔗 mafrasi2 has joined #archiveteam
00:59 🔗 RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
01:09 🔗 RichardG has joined #archiveteam
01:11 🔗 RichardG has quit IRC (Client Quit)
01:14 🔗 kristian_ has quit IRC (Quit: Leaving)
01:15 🔗 RichardG has joined #archiveteam
01:18 🔗 ZeoNet__ has joined #archiveteam
01:20 🔗 ZeoNet_ has quit IRC (Ping timeout: 244 seconds)
01:24 🔗 ndiddy has joined #archiveteam
01:34 🔗 ZeoNet__ is now known as ZeoNet
01:36 🔗 notafed has quit IRC (Read error: Operation timed out)
01:45 🔗 maelstrom has joined #archiveteam
02:46 🔗 Start has quit IRC (Read error: Connection reset by peer)
02:46 🔗 Start_ has joined #archiveteam
02:53 🔗 Start_ is now known as Start
03:15 🔗 ndiddy has quit IRC (Read error: Connection reset by peer)
03:41 🔗 RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
04:10 🔗 RichardG has joined #archiveteam
04:11 🔗 zout ok. wgetting.
04:12 🔗 zout if anyone wants my secret sauce IP range, just ask.
04:12 🔗 nwf has quit IRC (Read error: Connection reset by peer)
04:13 🔗 nwf has joined #archiveteam
04:14 🔗 zout on the assumption that I could be interrupted, I'm doing front pages first.
04:20 🔗 zout current thread, "why woman don't need rights" ._.
04:20 🔗 DFJustin has quit IRC (Remote host closed the connection)
04:25 🔗 DFJustin has joined #archiveteam
04:37 🔗 Atros has joined #archiveteam
04:39 🔗 atrocity has quit IRC (Ping timeout: 260 seconds)
04:40 🔗 atrocity has joined #archiveteam
04:43 🔗 Atros has quit IRC (Read error: Operation timed out)
04:44 🔗 Sk1d has quit IRC (Ping timeout: 250 seconds)
04:51 🔗 Sk1d has joined #archiveteam
05:11 🔗 dashcloud has quit IRC (Ping timeout: 250 seconds)
05:20 🔗 RichardG has quit IRC (Read error: Operation timed out)
05:20 🔗 RichardG has joined #archiveteam
05:22 🔗 dashcloud has joined #archiveteam
05:26 🔗 zout in general does much software care about the 'request' in a warc, or just the response?
05:30 🔗 maelstrom has quit IRC (Quit: Leaving)
05:33 🔗 RichardG has quit IRC (Ping timeout: 259 seconds)
05:44 🔗 RichardG has joined #archiveteam
06:03 🔗 BlueMaxim has joined #archiveteam
06:09 🔗 RichardG has quit IRC (Read error: Operation timed out)
06:09 🔗 yipdw zout: the request records are used by a lot of replay software, e.g. wayback and pywb
06:09 🔗 yipdw so yes, a lot of software cares
06:10 🔗 zout my stuff seems at least minimally compatible with pywb, so great.
06:11 🔗 yipdw there's a few pretty good libraries out there that take care of the thorny parts
06:12 🔗 zout yeah, I'm using `warc` in python but was a bit unsure about some of the content in the request. ended up just making a warc with wget and copying the structure.
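For reference, here is a minimal stdlib-only sketch of what a WARC/1.0 "request" record looks like when serialized. It does not use the `warc` package zout mentions, and it is not wget's exact output. It assumes the request record carries the four mandatory WARC fields plus WARC-Target-URI, WARC-Concurrent-To (linking it to the matching response record) and Content-Type `application/http;msgtype=request`. Optional fields such as WARC-IP-Address are omitted. The helper names (`make_record_id`, `warc_date`, `warc_request_record`), the URL and the User-Agent are hypothetical placeholders.

```python
import uuid
from datetime import datetime, timezone


def make_record_id() -> str:
    # WARC-Record-ID must be a URI in angle brackets; urn:uuid is the usual choice.
    return "<urn:uuid:%s>" % uuid.uuid4()


def warc_date(now: datetime) -> str:
    # WARC-Date uses the W3C/ISO 8601 form YYYY-MM-DDThh:mm:ssZ in UTC.
    return now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def warc_request_record(target_uri: str, http_request: bytes,
                        concurrent_to: str, date: str) -> bytes:
    """Wrap the raw bytes of an HTTP request in a WARC/1.0 'request' record."""
    fields = [
        ("WARC-Type", "request"),
        ("WARC-Record-ID", make_record_id()),
        ("WARC-Date", date),
        ("WARC-Target-URI", target_uri),
        # Ties this request to the record ID of the matching response.
        ("WARC-Concurrent-To", concurrent_to),
        ("Content-Type", "application/http;msgtype=request"),
        # Length of the block (the HTTP request bytes), not of the whole record.
        ("Content-Length", str(len(http_request))),
    ]
    header = "WARC/1.0\r\n" + "".join("%s: %s\r\n" % f for f in fields) + "\r\n"
    # Every WARC record ends with two CRLFs after its block.
    return header.encode("utf-8") + http_request + b"\r\n\r\n"


if __name__ == "__main__":
    # Hypothetical request; host and User-Agent are placeholders.
    raw = (b"GET /index.php HTTP/1.1\r\n"
           b"Host: example.com\r\n"
           b"User-Agent: example-archiver/0.1\r\n"
           b"\r\n")
    response_id = make_record_id()  # stands in for the response record's ID
    record = warc_request_record("http://example.com/index.php", raw,
                                 response_id, warc_date(datetime.now(timezone.utc)))
    print(record.decode("utf-8"))
```

The block is the exact bytes sent on the wire, which is what replay tools like pywb and wayback read back when matching a capture to its request.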
06:20 🔗 zout cloudflare surprisingly hasn't banned me for scraping through it yet.
07:13 🔗 vOYtEC has joined #archiveteam
07:35 🔗 vOYtEC has quit IRC (Ping timeout: 255 seconds)
08:27 🔗 ravetcofx has quit IRC (Ping timeout: 506 seconds)
08:37 🔗 kurt has joined #archiveteam
08:56 🔗 ZeoNet has quit IRC (Ping timeout: 370 seconds)
09:06 🔗 arkiver2 has joined #archiveteam
09:06 🔗 arkiver2 zout: so you are custom creating WARC files?
09:07 🔗 arkiver2 Can I please see the script you are creating the WARCs with, and an example of a WARC file created with it?
09:08 🔗 arkiver2 to check how you are handling the HTTP headers.
09:08 🔗 arkiver2 and request/response/other records
09:09 🔗 arkiver2 why are you not using wpull or wget? wpull has support for custom scripts for your crawl.
09:10 🔗 arkiver2 Basically if WARC files miss information, or have wrong headers (also HTTP headers), they will not go into the wayback machine, even if they are supported by the wayback machine
09:15 🔗 arkiver2 there is also wget-lua, which has support for lua scripts.
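As a rough illustration of the "missing information or wrong headers" problem arkiver2 describes, here is a hypothetical `check_record` helper that sanity-checks one uncompressed WARC record. It is a sketch, not the Wayback Machine's actual ingest validation, which is not described in this log. It assumes the WARC spec's rules that every record has WARC-Record-ID, Content-Length, WARC-Date and WARC-Type, that request/response-style records name a WARC-Target-URI, that Content-Length matches the block, and that each record ends with two CRLFs. For `application/http` responses it also checks that the block looks like an HTTP message.

```python
MANDATORY = ("WARC-Record-ID", "Content-Length", "WARC-Date", "WARC-Type")
# Record types that are expected to name the captured URI.
NEEDS_TARGET_URI = {"request", "response", "resource", "revisit",
                    "conversion", "continuation"}


def parse_record(data: bytes):
    """Split one serialized record into (version line, fields, remainder)."""
    head, sep, rest = data.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line ending the WARC header")
    lines = head.decode("utf-8", errors="replace").split("\r\n")
    fields = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # WARC field names are case-insensitive, so key on lowercase.
        fields[name.strip().lower()] = value.strip()
    return lines[0], fields, rest


def check_record(data: bytes) -> list:
    """Return a list of problems found in one record (empty if none)."""
    version, fields, rest = parse_record(data)
    problems = []
    if not version.startswith("WARC/"):
        problems.append("bad version line %r" % version)
    for name in MANDATORY:
        if name.lower() not in fields:
            problems.append("missing " + name)
    rtype = fields.get("warc-type", "")
    if rtype in NEEDS_TARGET_URI and "warc-target-uri" not in fields:
        problems.append("missing WARC-Target-URI")
    length = fields.get("content-length", "")
    if not length.isdigit():
        if length:
            problems.append("Content-Length is not a number")
        return problems
    n = int(length)
    block, trailer = rest[:n], rest[n:]
    if len(block) != n:
        problems.append("block shorter than Content-Length")
    if trailer != b"\r\n\r\n":
        problems.append("record does not end with two CRLFs")
    # For captured HTTP responses, the block should itself be an HTTP message.
    if rtype == "response" and fields.get("content-type", "").startswith("application/http"):
        if not block.startswith(b"HTTP/"):
            problems.append("response block lacks an HTTP status line")
        if b"\r\n\r\n" not in block:
            problems.append("response block lacks an HTTP header terminator")
    return problems
```

Real `.warc.gz` files gzip each record separately, so a record would need to be decompressed before being passed to a check like this; existing libraries such as warcio already handle that reading step.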
09:54 🔗 yipdw has quit IRC (Read error: Operation timed out)
09:56 🔗 arkiver2 has quit IRC (Read error: Connection reset by peer)
09:58 🔗 Infreq has quit IRC (Read error: Operation timed out)
09:58 🔗 brayden_ has joined #archiveteam
09:58 🔗 swebb sets mode: +o brayden_
09:58 🔗 brayden has quit IRC (Read error: Operation timed out)
10:00 🔗 Infreq has joined #archiveteam
10:00 🔗 yipdw has joined #archiveteam
10:05 🔗 logchfoo3 has quit IRC (Ping timeout: 250 seconds)
10:07 🔗 logchfoo0 starts logging #archiveteam at Tue Sep 27 10:07:22 2016
10:07 🔗 logchfoo0 has joined #archiveteam
10:08 🔗 BlueMaxim has quit IRC (Read error: Operation timed out)
10:08 🔗 BlueMaxim has joined #archiveteam
10:16 🔗 godane has quit IRC (Read error: Operation timed out)
10:17 🔗 zout arkiver2: do user-submitted WARCs ever make it into the wayback machine proper?
10:18 🔗 Sanqui under exceptional agreements
10:21 🔗 hyperion_ has joined #archiveteam
10:22 🔗 zout arkiver2: PM'd a sample from my WARC. let me know if I'm missing anything, I'm not very far through so altering the format now wouldn't be a problem.
10:24 🔗 zout arkiver: ^
10:27 🔗 zout I didn't think IA ever took input for the wayback machine from outside sources so that wasn't factored into my decision making at all.
10:28 🔗 godane has joined #archiveteam
10:30 🔗 kyounko has quit IRC (KVIrc 4.2.0 Equilibrium http://www.kvirc.net/)
10:43 🔗 hyperion_ has quit IRC (Ping timeout: 250 seconds)
11:02 🔗 godane has quit IRC (Quit: Leaving.)
11:02 🔗 godane has joined #archiveteam
11:17 🔗 RichardG has joined #archiveteam
11:59 🔗 BartoCH has quit IRC (Ping timeout: 260 seconds)
12:23 🔗 BlueMaxim has quit IRC (Quit: Leaving)
12:30 🔗 BartoCH has joined #archiveteam
12:41 🔗 ZeoNet has joined #archiveteam
12:48 🔗 RichardG has quit IRC (Ping timeout: 255 seconds)
13:41 🔗 ZeoNet_ has joined #archiveteam
13:54 🔗 ZeoNet has quit IRC (Read error: Operation timed out)
13:54 🔗 ZeoNet_ is now known as ZeoNet
14:24 🔗 Start has quit IRC (Quit: Disconnected.)
15:06 🔗 RichardG has joined #archiveteam
15:37 🔗 arkiver zout: if you are using some custom scripts, can you please send me that too?
15:37 🔗 arkiver Not a lot of usermade WARCs go into the wayback machine
15:37 🔗 arkiver but if the way these WARCs were made is good
15:37 🔗 arkiver and the actual WARCs are good I don't see a reason to not put them in the wayback machine
15:39 🔗 arkiver zout: and I think it would be good to have hackforums in the wayback machine
15:39 🔗 arkiver :)
15:44 🔗 VADemon has joined #archiveteam
15:44 🔗 VADemon has quit IRC (Read error: Connection reset by peer)
15:45 🔗 VADemon has joined #archiveteam
15:45 🔗 VADemon has quit IRC (Read error: Connection reset by peer)
15:47 🔗 VADemon has joined #archiveteam
16:16 🔗 Atom-- has joined #archiveteam
16:18 🔗 chiefyg has joined #archiveteam
16:18 🔗 chiefyg WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
16:18 🔗 RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
16:20 🔗 Atom has quit IRC (Read error: Operation timed out)
16:20 🔗 chiefyg anybody?
16:21 🔗 xmc yahoosucks
16:21 🔗 xmc chiefyg: ^
16:21 🔗 chiefyg thanks :)
16:23 🔗 chiefyg has quit IRC (Quit: Page closed)
16:23 🔗 ZeoNet has quit IRC (Ping timeout: 370 seconds)
16:36 🔗 ZeoNet has joined #archiveteam
16:48 🔗 dashcloud has quit IRC (Read error: Operation timed out)
16:48 🔗 robink has quit IRC (Ping timeout: 633 seconds)
16:49 🔗 AlexLehm has joined #archiveteam
16:55 🔗 RichardG has joined #archiveteam
16:57 🔗 robink has joined #archiveteam
17:00 🔗 ZeoNet has quit IRC (Ping timeout: 244 seconds)
17:03 🔗 ravetcofx has joined #archiveteam
17:06 🔗 dashcloud has joined #archiveteam
17:12 🔗 RoanKatto has quit IRC (Ping timeout: 506 seconds)
18:03 🔗 bRick5772 has joined #archiveteam
18:13 🔗 swebb3 has joined #archiveteam
18:19 🔗 SketchCow Make a difference
18:20 🔗 xmc ?
18:20 🔗 SketchCow That was my pep talk to him
18:20 🔗 SketchCow Hey, it's late here
18:20 🔗 swebb3 has quit IRC (Remote host closed the connection)
18:20 🔗 xmc to whom?
18:30 🔗 ndiddy has joined #archiveteam
18:30 🔗 SketchCow chiefyg
18:32 🔗 xmc oh
18:32 🔗 SketchCow Yeah, not exactly a murder mystery
18:33 🔗 xmc sorry, i just can't get it up
18:34 🔗 SketchCow Everything's a murder mystery if you try hard enough.
18:34 🔗 SketchCow OK, Gawker storm is over.
18:36 🔗 SketchCow changes topic to: Archive Team: We're not archive.org | http://archiveteam.org/ | lengthy/off-topic in #archiveteam-bs | With AT you Save
18:55 🔗 Morbus has joined #archiveteam
19:16 🔗 ZeoNet has joined #archiveteam
19:24 🔗 dashcloud has quit IRC (Read error: Operation timed out)
19:26 🔗 swebb over in a good way or a bad way?
19:27 🔗 swebb My old gawker crawl is still running: https://archive.org/details/gawkermedia-20160624190933
19:28 🔗 dr3gs has joined #archiveteam
19:28 🔗 dashcloud has joined #archiveteam
19:36 🔗 VADemon has quit IRC (Quit: left4dead)
19:41 🔗 dashcloud has quit IRC (Read error: Operation timed out)
19:44 🔗 dashcloud has joined #archiveteam
19:58 🔗 dr3gs has quit IRC (Leaving)
20:23 🔗 maelstrom has joined #archiveteam
20:36 🔗 bzc6p has joined #archiveteam
20:36 🔗 swebb sets mode: +o bzc6p
20:37 🔗 bzc6p SketchCow: myVIP project is done. I've just sent you a mail with some more information, as I don't want to disturb your holiday with work now.
20:37 🔗 bzc6p Gentlemen: thank you everyone who helped saving myVIP.
20:38 🔗 bzc6p has quit IRC (Client Quit)
20:39 🔗 blacwtr has joined #archiveteam
20:39 🔗 HCross No problem
20:39 🔗 * HCross bows
20:41 🔗 ZeoNet has quit IRC (Ping timeout: 370 seconds)
20:42 🔗 blacwtr has quit IRC (Client Quit)
20:57 🔗 nickname_ has joined #archiveteam
21:27 🔗 SketchCow You got it.
21:37 🔗 z00nx has quit IRC (Remote host closed the connection)
21:49 🔗 nickname_ has quit IRC (Ping timeout: 492 seconds)
22:15 🔗 bRick5772 has quit IRC (Quit: Leaving.)
22:16 🔗 AlexLehm has quit IRC (Ping timeout: 260 seconds)
22:41 🔗 dashcloud has quit IRC (Read error: Operation timed out)
22:45 🔗 dashcloud has joined #archiveteam
23:19 🔗 jspiros has quit IRC (leaving)
23:22 🔗 Aranje has joined #archiveteam
23:45 🔗 dashcloud has quit IRC (Remote host closed the connection)
23:46 🔗 BlueMaxim has joined #archiveteam
