Time |
Nickname |
Message |
00:05
🔗
|
|
maelstrom has quit IRC (Quit: Leaving) |
00:05
🔗
|
|
maelstrom has joined #archiveteam |
00:10
🔗
|
|
notafed has joined #archiveteam |
00:10
🔗
|
|
maelstrom has quit IRC (Read error: Connection reset by peer) |
00:20
🔗
|
|
RichardG has joined #archiveteam |
00:20
🔗
|
|
RichardG has quit IRC (Read error: Connection reset by peer) |
00:21
🔗
|
|
RichardG has joined #archiveteam |
00:32
🔗
|
|
ZeoNet has joined #archiveteam |
00:34
🔗
|
|
kyounko has joined #archiveteam |
00:42
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
00:44
🔗
|
|
ZeoNet_ has joined #archiveteam |
00:47
🔗
|
|
ZeoNet has quit IRC (Ping timeout: 244 seconds) |
00:58
🔗
|
|
mafrasi2 has joined #archiveteam |
00:59
🔗
|
|
RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue) |
01:09
🔗
|
|
RichardG has joined #archiveteam |
01:11
🔗
|
|
RichardG has quit IRC (Client Quit) |
01:14
🔗
|
|
kristian_ has quit IRC (Quit: Leaving) |
01:15
🔗
|
|
RichardG has joined #archiveteam |
01:18
🔗
|
|
ZeoNet__ has joined #archiveteam |
01:20
🔗
|
|
ZeoNet_ has quit IRC (Ping timeout: 244 seconds) |
01:24
🔗
|
|
ndiddy has joined #archiveteam |
01:34
🔗
|
|
ZeoNet__ is now known as ZeoNet |
01:36
🔗
|
|
notafed has quit IRC (Read error: Operation timed out) |
01:45
🔗
|
|
maelstrom has joined #archiveteam |
02:46
🔗
|
|
Start has quit IRC (Read error: Connection reset by peer) |
02:46
🔗
|
|
Start_ has joined #archiveteam |
02:53
🔗
|
|
Start_ is now known as Start |
03:15
🔗
|
|
ndiddy has quit IRC (Read error: Connection reset by peer) |
03:41
🔗
|
|
RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue) |
04:10
🔗
|
|
RichardG has joined #archiveteam |
04:11
🔗
|
zout |
ok. wgetting. |
04:12
🔗
|
zout |
if anyone wants my secret sauce IP range, just ask. |
04:12
🔗
|
|
nwf has quit IRC (Read error: Connection reset by peer) |
04:13
🔗
|
|
nwf has joined #archiveteam |
04:14
🔗
|
zout |
on the assumption that I could be interrupted, I'm doing front pages first. |
04:20
🔗
|
zout |
current thread, "why woman don't need rights" ._. |
04:20
🔗
|
|
DFJustin has quit IRC (Remote host closed the connection) |
04:25
🔗
|
|
DFJustin has joined #archiveteam |
04:37
🔗
|
|
Atros has joined #archiveteam |
04:39
🔗
|
|
atrocity has quit IRC (Ping timeout: 260 seconds) |
04:40
🔗
|
|
atrocity has joined #archiveteam |
04:43
🔗
|
|
Atros has quit IRC (Read error: Operation timed out) |
04:44
🔗
|
|
Sk1d has quit IRC (Ping timeout: 250 seconds) |
04:51
🔗
|
|
Sk1d has joined #archiveteam |
05:11
🔗
|
|
dashcloud has quit IRC (Ping timeout: 250 seconds) |
05:20
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
05:20
🔗
|
|
RichardG has joined #archiveteam |
05:22
🔗
|
|
dashcloud has joined #archiveteam |
05:26
🔗
|
zout |
in general does much software care about the 'request' in a warc, or just the response? |
05:30
🔗
|
|
maelstrom has quit IRC (Quit: Leaving) |
05:33
🔗
|
|
RichardG has quit IRC (Ping timeout: 259 seconds) |
05:44
🔗
|
|
RichardG has joined #archiveteam |
06:03
🔗
|
|
BlueMaxim has joined #archiveteam |
06:09
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
06:09
🔗
|
yipdw |
zout: the request records are used by a lot of replay software, e.g. wayback and pywb |
06:09
🔗
|
yipdw |
so yes, a lot of software cares |
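A minimal sketch of what a linked request/response pair looks like at the byte level, using only the Python standard library rather than any particular WARC package. It assumes uncompressed WARC/1.0 output and writes only the core fields; optional fields that real crawlers also emit (for example `WARC-IP-Address`, `WARC-Block-Digest`, `WARC-Payload-Digest`) are left out, and `make_record` is a hypothetical helper name. Which record carries `WARC-Concurrent-To` varies between tools; here the response points back at its request, which is one common convention.

```python
import uuid
from datetime import datetime, timezone


def make_record(warc_type, target_uri, http_bytes, concurrent_to=None):
    """Return (record_id, record_bytes) for one uncompressed WARC/1.0 record.

    Hypothetical helper; only handles warc_type "request" or "response".
    http_bytes is the raw HTTP message exactly as sent or received
    (request line or status line, headers, blank line, body).
    """
    record_id = f"<urn:uuid:{uuid.uuid4()}>"
    fields = [
        ("WARC-Type", warc_type),
        ("WARC-Record-ID", record_id),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        # msgtype tells readers whether the block is an HTTP request or response
        ("Content-Type", f"application/http;msgtype={warc_type}"),
        ("Content-Length", str(len(http_bytes))),
    ]
    if concurrent_to is not None:
        # Links this record to the other half of the same capture event.
        fields.append(("WARC-Concurrent-To", concurrent_to))
    header = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in fields) + "\r\n"
    # Every record ends with two CRLFs after its content block.
    return record_id, header.encode("utf-8") + http_bytes + b"\r\n\r\n"


# Example capture of a single page (made-up bytes for illustration).
url = "http://example.com/"
request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
response = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 5\r\n\r\nhello"
req_id, req_record = make_record("request", url, request)
resp_id, resp_record = make_record("response", url, response, concurrent_to=req_id)
warc_bytes = req_record + resp_record  # write this to a .warc file
```

Building records from the raw HTTP bytes keeps the request and response headers exactly as they went over the wire, instead of a reconstructed copy.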
06:10
🔗
|
zout |
my stuff seems at least minimally compatible with pywb, so great. |
06:11
🔗
|
yipdw |
there's a few pretty good libraries out there that take care of the thorny parts |
06:12
🔗
|
zout |
yeah, I'm using `warc` in python but was a bit unsure about some of the content in the request. ended up just making a warc with wget and copying the structure. |
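Copying wget's structure, as described here, is easier with a quick dump of a wget-made WARC. A rough stdlib-only reader follows, assuming plain `.warc` input or per-record gzipped `.warc.gz` input (Python's `gzip` module reads concatenated gzip members as one stream). `iter_warc_records` is a hypothetical name; the reader ignores header continuation lines, treats field names case-sensitively, and expects every record to have `Content-Length`, so it is for eyeballing a file, not validating it.

```python
import gzip


def iter_warc_records(path):
    """Yield (headers_dict, content_block_bytes) for each record in a WARC file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        while True:
            line = f.readline()
            if not line:
                return  # end of file
            if not line.strip():
                continue  # blank CRLF lines separate records
            if not line.startswith(b"WARC/"):
                raise ValueError(f"expected a WARC version line, got {line!r}")
            headers = {}
            while True:
                line = f.readline()
                if line in (b"\r\n", b"\n", b""):
                    break  # blank line ends the WARC header block
                # Split on the first colon only, so values like dates and URIs survive.
                name, _, value = line.decode("utf-8").partition(":")
                headers[name.strip()] = value.strip()
            yield headers, f.read(int(headers["Content-Length"]))
```

For example, looping over `iter_warc_records("wget-sample.warc.gz")` and printing `headers["WARC-Type"]` and `headers.get("WARC-Target-URI")` shows the record order and fields wget writes, and decoding a request record's block with `latin-1` shows the exact request headers it sent.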
06:20
🔗
|
zout |
cloudflare surprisingly hasn't banned me for scraping through it yet. |
07:13
🔗
|
|
vOYtEC has joined #archiveteam |
07:35
🔗
|
|
vOYtEC has quit IRC (Ping timeout: 255 seconds) |
08:27
🔗
|
|
ravetcofx has quit IRC (Ping timeout: 506 seconds) |
08:37
🔗
|
|
kurt has joined #archiveteam |
08:56
🔗
|
|
ZeoNet has quit IRC (Ping timeout: 370 seconds) |
09:06
🔗
|
|
arkiver2 has joined #archiveteam |
09:06
🔗
|
arkiver2 |
zout: so you are custom creating WARC files? |
09:07
🔗
|
arkiver2 |
Can I please see the script you are creating the WARCs with and an example of a WARC file created with it? |
09:08
🔗
|
arkiver2 |
to check how you are handling the HTTP headers. |
09:08
🔗
|
arkiver2 |
and request/response/other records |
09:09
🔗
|
arkiver2 |
why are you not using wpull or wget? wpull has support for custom scripts for your crawl. |
09:10
🔗
|
arkiver2 |
Basically if WARC files are missing information, or have wrong headers (including HTTP headers), they will not go into the wayback machine, even if they are supported by the wayback machine |
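A hypothetical pre-upload self-check along these lines is sketched below. It is not the Internet Archive's ingest logic, which isn't described in this log. It only tests a few things the WARC 1.0 spec requires (`WARC-Type`, `WARC-Record-ID`, `WARC-Date` and `Content-Length` on every record, plus `WARC-Target-URI` on request/response records) and basic HTTP/1.x message framing. `check_record` is an illustrative name. `headers` is assumed to be a dict of WARC fields, and `payload` is assumed to be the raw content block.

```python
import re

# Fields WARC/1.0 requires on every record.
MANDATORY_FIELDS = ("WARC-Type", "WARC-Record-ID", "WARC-Date", "Content-Length")
# Loose shapes for the first line of an HTTP/1.x request or response.
REQUEST_LINE = re.compile(rb"[A-Z]+ \S+ HTTP/\d\.\d\r\n")
STATUS_LINE = re.compile(rb"HTTP/\d\.\d \d{3}")


def check_record(headers, payload):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = [f"missing mandatory field {name}"
                for name in MANDATORY_FIELDS if name not in headers]
    length = headers.get("Content-Length", "")
    if length.isdigit() and int(length) != len(payload):
        problems.append("Content-Length does not match the size of the content block")
    warc_type = headers.get("WARC-Type")
    if warc_type in ("request", "response"):
        if "WARC-Target-URI" not in headers:
            problems.append("missing WARC-Target-URI")
        if not headers.get("Content-Type", "").startswith("application/http"):
            problems.append("Content-Type should be application/http")
        first_line = REQUEST_LINE if warc_type == "request" else STATUS_LINE
        if not first_line.match(payload):
            problems.append(f"block does not start with an HTTP {warc_type} line")
        if b"\r\n\r\n" not in payload:
            problems.append("HTTP headers are not terminated by CRLF CRLF")
    return problems
```

Passing the check does not guarantee acceptance; it only catches the most obvious structural mistakes before a crawl gets far.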
09:15
🔗
|
arkiver2 |
there is also wget-lua, which has support for lua scripts. |
09:54
🔗
|
|
yipdw has quit IRC (Read error: Operation timed out) |
09:56
🔗
|
|
arkiver2 has quit IRC (Read error: Connection reset by peer) |
09:58
🔗
|
|
Infreq has quit IRC (Read error: Operation timed out) |
09:58
🔗
|
|
brayden_ has joined #archiveteam |
09:58
🔗
|
|
swebb sets mode: +o brayden_ |
09:58
🔗
|
|
brayden has quit IRC (Read error: Operation timed out) |
10:00
🔗
|
|
Infreq has joined #archiveteam |
10:00
🔗
|
|
yipdw has joined #archiveteam |
10:05
🔗
|
|
logchfoo3 has quit IRC (Ping timeout: 250 seconds) |
10:07
🔗
|
|
logchfoo0 starts logging #archiveteam at Tue Sep 27 10:07:22 2016 |
10:07
🔗
|
|
logchfoo0 has joined #archiveteam |
10:08
🔗
|
|
BlueMaxim has quit IRC (Read error: Operation timed out) |
10:08
🔗
|
|
BlueMaxim has joined #archiveteam |
10:16
🔗
|
|
godane has quit IRC (Read error: Operation timed out) |
10:17
🔗
|
zout |
arkiver2: do user-submitted WARCs ever make it into the wayback machine proper? |
10:18
🔗
|
Sanqui |
under exceptional agreements |
10:21
🔗
|
|
hyperion_ has joined #archiveteam |
10:22
🔗
|
zout |
arkiver2: PM'd a sample from my WARC. let me know if I'm missing anything, I'm not very far through so altering the format now wouldn't be a problem. |
10:24
🔗
|
zout |
arkiver: ^ |
10:27
🔗
|
zout |
I didn't think IA ever took input for the wayback machine from outside sources so that wasn't factored into my decision making at all. |
10:28
🔗
|
|
godane has joined #archiveteam |
10:30
🔗
|
|
kyounko has quit IRC (KVIrc 4.2.0 Equilibrium http://www.kvirc.net/) |
10:43
🔗
|
|
hyperion_ has quit IRC (Ping timeout: 250 seconds) |
11:02
🔗
|
|
godane has quit IRC (Quit: Leaving.) |
11:02
🔗
|
|
godane has joined #archiveteam |
11:17
🔗
|
|
RichardG has joined #archiveteam |
11:59
🔗
|
|
BartoCH has quit IRC (Ping timeout: 260 seconds) |
12:23
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
12:30
🔗
|
|
BartoCH has joined #archiveteam |
12:41
🔗
|
|
ZeoNet has joined #archiveteam |
12:48
🔗
|
|
RichardG has quit IRC (Ping timeout: 255 seconds) |
13:41
🔗
|
|
ZeoNet_ has joined #archiveteam |
13:54
🔗
|
|
ZeoNet has quit IRC (Read error: Operation timed out) |
13:54
🔗
|
|
ZeoNet_ is now known as ZeoNet |
14:24
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
15:06
🔗
|
|
RichardG has joined #archiveteam |
15:37
🔗
|
arkiver |
zout: if you are using some custom scripts, can you please send me that too? |
15:37
🔗
|
arkiver |
Not a lot of usermade WARCs go into the wayback machine |
15:37
🔗
|
arkiver |
but if the way these WARCs were made is good |
15:37
🔗
|
arkiver |
and the actual WARCs are good I don't see a reason to not put them in the wayback machine |
15:39
🔗
|
arkiver |
zout: and I think it would be good to have hackforums in the wayback machine |
15:39
🔗
|
arkiver |
:) |
15:44
🔗
|
|
VADemon has joined #archiveteam |
15:44
🔗
|
|
VADemon has quit IRC (Read error: Connection reset by peer) |
15:45
🔗
|
|
VADemon has joined #archiveteam |
15:45
🔗
|
|
VADemon has quit IRC (Read error: Connection reset by peer) |
15:47
🔗
|
|
VADemon has joined #archiveteam |
16:16
🔗
|
|
Atom-- has joined #archiveteam |
16:18
🔗
|
|
chiefyg has joined #archiveteam |
16:18
🔗
|
chiefyg |
WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD |
16:18
🔗
|
|
RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue) |
16:20
🔗
|
|
Atom has quit IRC (Read error: Operation timed out) |
16:20
🔗
|
chiefyg |
anybody? |
16:21
🔗
|
xmc |
yahoosucks |
16:21
🔗
|
xmc |
chiefyg: ^ |
16:21
🔗
|
chiefyg |
thanks :) |
16:23
🔗
|
|
chiefyg has quit IRC (Quit: Page closed) |
16:23
🔗
|
|
ZeoNet has quit IRC (Ping timeout: 370 seconds) |
16:36
🔗
|
|
ZeoNet has joined #archiveteam |
16:48
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
16:48
🔗
|
|
robink has quit IRC (Ping timeout: 633 seconds) |
16:49
🔗
|
|
AlexLehm has joined #archiveteam |
16:55
🔗
|
|
RichardG has joined #archiveteam |
16:57
🔗
|
|
robink has joined #archiveteam |
17:00
🔗
|
|
ZeoNet has quit IRC (Ping timeout: 244 seconds) |
17:03
🔗
|
|
ravetcofx has joined #archiveteam |
17:06
🔗
|
|
dashcloud has joined #archiveteam |
17:12
🔗
|
|
RoanKatto has quit IRC (Ping timeout: 506 seconds) |
18:03
🔗
|
|
bRick5772 has joined #archiveteam |
18:13
🔗
|
|
swebb3 has joined #archiveteam |
18:19
🔗
|
SketchCow |
Make a difference |
18:20
🔗
|
xmc |
? |
18:20
🔗
|
SketchCow |
That was my pep talk to him |
18:20
🔗
|
SketchCow |
Hey, it's late here |
18:20
🔗
|
|
swebb3 has quit IRC (Remote host closed the connection) |
18:20
🔗
|
xmc |
to whom? |
18:30
🔗
|
|
ndiddy has joined #archiveteam |
18:30
🔗
|
SketchCow |
chiefyg |
18:32
🔗
|
xmc |
oh |
18:32
🔗
|
SketchCow |
Yeah, not exactly a murder mystery |
18:33
🔗
|
xmc |
sorry, i just can't get it up |
18:34
🔗
|
SketchCow |
Everything's a murder mystery if you try hard enough. |
18:34
🔗
|
SketchCow |
OK, Gawker storm is over. |
18:36
🔗
|
|
SketchCow changes topic to: Archive Team: We're not archive.org | http://archiveteam.org/ | lengthy/off-topic in #archiveteam-bs | With AT you Save |
18:55
🔗
|
|
Morbus has joined #archiveteam |
19:16
🔗
|
|
ZeoNet has joined #archiveteam |
19:24
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
19:26
🔗
|
swebb |
over in a good way or a bad way? |
19:27
🔗
|
swebb |
My old gawker crawl is still running: https://archive.org/details/gawkermedia-20160624190933 |
19:28
🔗
|
|
dr3gs has joined #archiveteam |
19:28
🔗
|
|
dashcloud has joined #archiveteam |
19:36
🔗
|
|
VADemon has quit IRC (Quit: left4dead) |
19:41
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
19:44
🔗
|
|
dashcloud has joined #archiveteam |
19:58
🔗
|
|
dr3gs has quit IRC (Leaving) |
20:23
🔗
|
|
maelstrom has joined #archiveteam |
20:36
🔗
|
|
bzc6p has joined #archiveteam |
20:36
🔗
|
|
swebb sets mode: +o bzc6p |
20:37
🔗
|
bzc6p |
SketchCow: myVIP project is done. I've just sent you a mail with some more information, as I don't want to disturb your holiday with work now. |
20:37
🔗
|
bzc6p |
Gentlemen: thank you to everyone who helped save myVIP. |
20:38
🔗
|
|
bzc6p has quit IRC (Client Quit) |
20:39
🔗
|
|
blacwtr has joined #archiveteam |
20:39
🔗
|
HCross |
No problem |
20:39
🔗
|
* |
HCross bows |
20:41
🔗
|
|
ZeoNet has quit IRC (Ping timeout: 370 seconds) |
20:42
🔗
|
|
blacwtr has quit IRC (Client Quit) |
20:57
🔗
|
|
nickname_ has joined #archiveteam |
21:27
🔗
|
SketchCow |
You got it. |
21:37
🔗
|
|
z00nx has quit IRC (Remote host closed the connection) |
21:49
🔗
|
|
nickname_ has quit IRC (Ping timeout: 492 seconds) |
22:15
🔗
|
|
bRick5772 has quit IRC (Quit: Leaving.) |
22:16
🔗
|
|
AlexLehm has quit IRC (Ping timeout: 260 seconds) |
22:41
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
22:45
🔗
|
|
dashcloud has joined #archiveteam |
23:19
🔗
|
|
jspiros has quit IRC (leaving) |
23:22
🔗
|
|
Aranje has joined #archiveteam |
23:45
🔗
|
|
dashcloud has quit IRC (Remote host closed the connection) |
23:46
🔗
|
|
BlueMaxim has joined #archiveteam |