| Time |
Nickname |
Message |
|
00:05
🔗
|
|
maelstrom has quit IRC (Quit: Leaving) |
|
00:05
🔗
|
|
maelstrom has joined #archiveteam |
|
00:10
🔗
|
|
notafed has joined #archiveteam |
|
00:10
🔗
|
|
maelstrom has quit IRC (Read error: Connection reset by peer) |
|
00:20
🔗
|
|
RichardG has joined #archiveteam |
|
00:20
🔗
|
|
RichardG has quit IRC (Read error: Connection reset by peer) |
|
00:21
🔗
|
|
RichardG has joined #archiveteam |
|
00:32
🔗
|
|
ZeoNet has joined #archiveteam |
|
00:34
🔗
|
|
kyounko has joined #archiveteam |
|
00:42
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
|
00:44
🔗
|
|
ZeoNet_ has joined #archiveteam |
|
00:47
🔗
|
|
ZeoNet has quit IRC (Ping timeout: 244 seconds) |
|
00:58
🔗
|
|
mafrasi2 has joined #archiveteam |
|
00:59
🔗
|
|
RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue) |
|
01:09
🔗
|
|
RichardG has joined #archiveteam |
|
01:11
🔗
|
|
RichardG has quit IRC (Client Quit) |
|
01:14
🔗
|
|
kristian_ has quit IRC (Quit: Leaving) |
|
01:15
🔗
|
|
RichardG has joined #archiveteam |
|
01:18
🔗
|
|
ZeoNet__ has joined #archiveteam |
|
01:20
🔗
|
|
ZeoNet_ has quit IRC (Ping timeout: 244 seconds) |
|
01:24
🔗
|
|
ndiddy has joined #archiveteam |
|
01:34
🔗
|
|
ZeoNet__ is now known as ZeoNet |
|
01:36
🔗
|
|
notafed has quit IRC (Read error: Operation timed out) |
|
01:45
🔗
|
|
maelstrom has joined #archiveteam |
|
02:46
🔗
|
|
Start has quit IRC (Read error: Connection reset by peer) |
|
02:46
🔗
|
|
Start_ has joined #archiveteam |
|
02:53
🔗
|
|
Start_ is now known as Start |
|
03:15
🔗
|
|
ndiddy has quit IRC (Read error: Connection reset by peer) |
|
03:41
🔗
|
|
RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue) |
|
04:10
🔗
|
|
RichardG has joined #archiveteam |
|
04:11
🔗
|
zout |
ok. wgetting. |
|
04:12
🔗
|
zout |
if anyone wants my secret sauce IP range, just ask. |
|
04:12
🔗
|
|
nwf has quit IRC (Read error: Connection reset by peer) |
|
04:13
🔗
|
|
nwf has joined #archiveteam |
|
04:14
🔗
|
zout |
on the assumption that I could be interrupted, I'm doing front pages first. |
|
04:20
🔗
|
zout |
current thread, "why woman don't need rights" ._. |
|
04:20
🔗
|
|
DFJustin has quit IRC (Remote host closed the connection) |
|
04:25
🔗
|
|
DFJustin has joined #archiveteam |
|
04:37
🔗
|
|
Atros has joined #archiveteam |
|
04:39
🔗
|
|
atrocity has quit IRC (Ping timeout: 260 seconds) |
|
04:40
🔗
|
|
atrocity has joined #archiveteam |
|
04:43
🔗
|
|
Atros has quit IRC (Read error: Operation timed out) |
|
04:44
🔗
|
|
Sk1d has quit IRC (Ping timeout: 250 seconds) |
|
04:51
🔗
|
|
Sk1d has joined #archiveteam |
|
05:11
🔗
|
|
dashcloud has quit IRC (Ping timeout: 250 seconds) |
|
05:20
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
|
05:20
🔗
|
|
RichardG has joined #archiveteam |
|
05:22
🔗
|
|
dashcloud has joined #archiveteam |
|
05:26
🔗
|
zout |
in general does much software care about the 'request' in a warc, or just the response? |
|
05:30
🔗
|
|
maelstrom has quit IRC (Quit: Leaving) |
|
05:33
🔗
|
|
RichardG has quit IRC (Ping timeout: 259 seconds) |
|
05:44
🔗
|
|
RichardG has joined #archiveteam |
|
06:03
🔗
|
|
BlueMaxim has joined #archiveteam |
|
06:09
🔗
|
|
RichardG has quit IRC (Read error: Operation timed out) |
|
06:09
🔗
|
yipdw |
zout: the request records are used by a lot of replay software, e.g. wayback and pywb |
|
06:09
🔗
|
yipdw |
so yes, a lot of software cares |
|
06:10
🔗
|
zout |
my stuff seems at least minimally compatible with pywb, so great. |
|
06:11
🔗
|
yipdw |
there's a few pretty good libraries out there that take care of the thorny parts |
|
06:12
🔗
|
zout |
yeah, I'm using `warc` in python but was a bit unsure about some of the content in the request. ended up just making a warc with wget and copying the structure. |
|
06:20
🔗
|
zout |
cloudflare surprisingly hasn't banned me for scraping through it yet. |
|
07:13
🔗
|
|
vOYtEC has joined #archiveteam |
|
07:35
🔗
|
|
vOYtEC has quit IRC (Ping timeout: 255 seconds) |
|
08:27
🔗
|
|
ravetcofx has quit IRC (Ping timeout: 506 seconds) |
|
08:37
🔗
|
|
kurt has joined #archiveteam |
|
08:56
🔗
|
|
ZeoNet has quit IRC (Ping timeout: 370 seconds) |
|
09:06
🔗
|
|
arkiver2 has joined #archiveteam |
|
09:06
🔗
|
arkiver2 |
zout: so you are custom creating WARC files? |
|
09:07
🔗
|
arkiver2 |
Can I please see the script you are creating the WARCs with and an example of a WARC file created with it? |
|
09:08
🔗
|
arkiver2 |
to check how you are handling the HTTP headers. |
|
09:08
🔗
|
arkiver2 |
and request/response/other records |
|
09:09
🔗
|
arkiver2 |
why are you not using wpull or wget? wpull has support for custom scripts for your crawl. |
|
09:10
🔗
|
arkiver2 |
Basically if WARC files miss information, or have wrong headers (also HTTP headers), they will not go into the wayback machine, even if they are supported by the wayback machine |
|
09:15
🔗
|
arkiver2 |
there is also wget-lua, which has support for lua scripts. |
|
09:54
🔗
|
|
yipdw has quit IRC (Read error: Operation timed out) |
|
09:56
🔗
|
|
arkiver2 has quit IRC (Read error: Connection reset by peer) |
|
09:58
🔗
|
|
Infreq has quit IRC (Read error: Operation timed out) |
|
09:58
🔗
|
|
brayden_ has joined #archiveteam |
|
09:58
🔗
|
|
swebb sets mode: +o brayden_ |
|
09:58
🔗
|
|
brayden has quit IRC (Read error: Operation timed out) |
|
10:00
🔗
|
|
Infreq has joined #archiveteam |
|
10:00
🔗
|
|
yipdw has joined #archiveteam |
|
10:05
🔗
|
|
logchfoo3 has quit IRC (Ping timeout: 250 seconds) |
|
10:07
🔗
|
|
logchfoo0 starts logging #archiveteam at Tue Sep 27 10:07:22 2016 |
|
10:07
🔗
|
|
logchfoo0 has joined #archiveteam |
|
10:08
🔗
|
|
BlueMaxim has quit IRC (Read error: Operation timed out) |
|
10:08
🔗
|
|
BlueMaxim has joined #archiveteam |
|
10:16
🔗
|
|
godane has quit IRC (Read error: Operation timed out) |
|
10:17
🔗
|
zout |
arkiver2: do user-submitted WARCs ever make it into the wayback machine proper? |
|
10:18
🔗
|
Sanqui |
under exceptional agreements |
|
10:21
🔗
|
|
hyperion_ has joined #archiveteam |
|
10:22
🔗
|
zout |
arkiver2: PM'd a sample from my WARC. let me know if I'm missing anything, I'm not very far through so altering the format now wouldn't be a problem. |
|
10:24
🔗
|
zout |
arkiver: ^ |
|
10:27
🔗
|
zout |
I didn't think IA ever took input for the wayback machine from outside sources so that wasn't factored into my decision making at all. |
|
10:28
🔗
|
|
godane has joined #archiveteam |
|
10:30
🔗
|
|
kyounko has quit IRC (KVIrc 4.2.0 Equilibrium http://www.kvirc.net/) |
|
10:43
🔗
|
|
hyperion_ has quit IRC (Ping timeout: 250 seconds) |
|
11:02
🔗
|
|
godane has quit IRC (Quit: Leaving.) |
|
11:02
🔗
|
|
godane has joined #archiveteam |
|
11:17
🔗
|
|
RichardG has joined #archiveteam |
|
11:59
🔗
|
|
BartoCH has quit IRC (Ping timeout: 260 seconds) |
|
12:23
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
|
12:30
🔗
|
|
BartoCH has joined #archiveteam |
|
12:41
🔗
|
|
ZeoNet has joined #archiveteam |
|
12:48
🔗
|
|
RichardG has quit IRC (Ping timeout: 255 seconds) |
|
13:41
🔗
|
|
ZeoNet_ has joined #archiveteam |
|
13:54
🔗
|
|
ZeoNet has quit IRC (Read error: Operation timed out) |
|
13:54
🔗
|
|
ZeoNet_ is now known as ZeoNet |
|
14:24
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
|
15:06
🔗
|
|
RichardG has joined #archiveteam |
|
15:37
🔗
|
arkiver |
zout: if you are using some custom scripts, can you please send me that too? |
|
15:37
🔗
|
arkiver |
Not a lot of usermade WARCs go into the wayback machine |
|
15:37
🔗
|
arkiver |
but if the way these WARCs were made is good |
|
15:37
🔗
|
arkiver |
and the actual WARCs are good I don't see a reason to not put them in the wayback machine |
|
15:39
🔗
|
arkiver |
zout: and I think it would be good to have hackforums in the wayback machine |
|
15:39
🔗
|
arkiver |
:) |
|
15:44
🔗
|
|
VADemon has joined #archiveteam |
|
15:44
🔗
|
|
VADemon has quit IRC (Read error: Connection reset by peer) |
|
15:45
🔗
|
|
VADemon has joined #archiveteam |
|
15:45
🔗
|
|
VADemon has quit IRC (Read error: Connection reset by peer) |
|
15:47
🔗
|
|
VADemon has joined #archiveteam |
|
16:16
🔗
|
|
Atom-- has joined #archiveteam |
|
16:18
🔗
|
|
chiefyg has joined #archiveteam |
|
16:18
🔗
|
chiefyg |
WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD |
|
16:18
🔗
|
|
RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue) |
|
16:20
🔗
|
|
Atom has quit IRC (Read error: Operation timed out) |
|
16:20
🔗
|
chiefyg |
anybody? |
|
16:21
🔗
|
xmc |
yahoosucks |
|
16:21
🔗
|
xmc |
chiefyg: ^ |
|
16:21
🔗
|
chiefyg |
thanks :) |
|
16:23
🔗
|
|
chiefyg has quit IRC (Quit: Page closed) |
|
16:23
🔗
|
|
ZeoNet has quit IRC (Ping timeout: 370 seconds) |
|
16:36
🔗
|
|
ZeoNet has joined #archiveteam |
|
16:48
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
16:48
🔗
|
|
robink has quit IRC (Ping timeout: 633 seconds) |
|
16:49
🔗
|
|
AlexLehm has joined #archiveteam |
|
16:55
🔗
|
|
RichardG has joined #archiveteam |
|
16:57
🔗
|
|
robink has joined #archiveteam |
|
17:00
🔗
|
|
ZeoNet has quit IRC (Ping timeout: 244 seconds) |
|
17:03
🔗
|
|
ravetcofx has joined #archiveteam |
|
17:06
🔗
|
|
dashcloud has joined #archiveteam |
|
17:12
🔗
|
|
RoanKatto has quit IRC (Ping timeout: 506 seconds) |
|
18:03
🔗
|
|
bRick5772 has joined #archiveteam |
|
18:13
🔗
|
|
swebb3 has joined #archiveteam |
|
18:19
🔗
|
SketchCow |
Make a difference |
|
18:20
🔗
|
xmc |
? |
|
18:20
🔗
|
SketchCow |
That was my pep talk to him |
|
18:20
🔗
|
SketchCow |
Hey, it's late here |
|
18:20
🔗
|
|
swebb3 has quit IRC (Remote host closed the connection) |
|
18:20
🔗
|
xmc |
to whom? |
|
18:30
🔗
|
|
ndiddy has joined #archiveteam |
|
18:30
🔗
|
SketchCow |
chiefyg |
|
18:32
🔗
|
xmc |
oh |
|
18:32
🔗
|
SketchCow |
Yeah, not exactly a murder mystery |
|
18:33
🔗
|
xmc |
sorry, i just can't get it up |
|
18:34
🔗
|
SketchCow |
Everything's a murder mystery if you try hard enough. |
|
18:34
🔗
|
SketchCow |
OK, Gawker storm is over. |
|
18:36
🔗
|
|
SketchCow changes topic to: Archive Team: We're not archive.org | http://archiveteam.org/ | lengthy/off-topic in #archiveteam-bs | With AT you Save |
|
18:55
🔗
|
|
Morbus has joined #archiveteam |
|
19:16
🔗
|
|
ZeoNet has joined #archiveteam |
|
19:24
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
19:26
🔗
|
swebb |
over in a good way or a bad way? |
|
19:27
🔗
|
swebb |
My old gawker crawl is still running: https://archive.org/details/gawkermedia-20160624190933 |
|
19:28
🔗
|
|
dr3gs has joined #archiveteam |
|
19:28
🔗
|
|
dashcloud has joined #archiveteam |
|
19:36
🔗
|
|
VADemon has quit IRC (Quit: left4dead) |
|
19:41
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
19:44
🔗
|
|
dashcloud has joined #archiveteam |
|
19:58
🔗
|
|
dr3gs has quit IRC (Leaving) |
|
20:23
🔗
|
|
maelstrom has joined #archiveteam |
|
20:36
🔗
|
|
bzc6p has joined #archiveteam |
|
20:36
🔗
|
|
swebb sets mode: +o bzc6p |
|
20:37
🔗
|
bzc6p |
SketchCow: myVIP project is done. I've just sent you a mail with some more information, as I don't want to disturb your holiday with work now. |
|
20:37
🔗
|
bzc6p |
Gentlemen: thank you to everyone who helped save myVIP. |
|
20:38
🔗
|
|
bzc6p has quit IRC (Client Quit) |
|
20:39
🔗
|
|
blacwtr has joined #archiveteam |
|
20:39
🔗
|
HCross |
No problem |
|
20:39
🔗
|
* |
HCross bows |
|
20:41
🔗
|
|
ZeoNet has quit IRC (Ping timeout: 370 seconds) |
|
20:42
🔗
|
|
blacwtr has quit IRC (Client Quit) |
|
20:57
🔗
|
|
nickname_ has joined #archiveteam |
|
21:27
🔗
|
SketchCow |
You got it. |
|
21:37
🔗
|
|
z00nx has quit IRC (Remote host closed the connection) |
|
21:49
🔗
|
|
nickname_ has quit IRC (Ping timeout: 492 seconds) |
|
22:15
🔗
|
|
bRick5772 has quit IRC (Quit: Leaving.) |
|
22:16
🔗
|
|
AlexLehm has quit IRC (Ping timeout: 260 seconds) |
|
22:41
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
22:45
🔗
|
|
dashcloud has joined #archiveteam |
|
23:19
🔗
|
|
jspiros has quit IRC (leaving) |
|
23:22
🔗
|
|
Aranje has joined #archiveteam |
|
23:45
🔗
|
|
dashcloud has quit IRC (Remote host closed the connection) |
|
23:46
🔗
|
|
BlueMaxim has joined #archiveteam |
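
Note on the 05:26 to 09:10 exchange about request and response records: the sketch below is not zout's script (which is not in the log) and does not use the `warc` library's API. It is a minimal standard-library illustration, under stated assumptions, of how a `response` record and its matching `request` record are laid out in WARC/1.0 and linked through `WARC-Concurrent-To`. The helper name `make_record`, the example URL, and the HTTP payloads are hypothetical. A real crawl would normally also write a `warcinfo` record, add digests, and gzip each record, which is what wget and wpull take care of.

```python
import uuid
from datetime import datetime, timezone

CRLF = b"\r\n"


def make_record(warc_type, target_uri, payload, content_type, concurrent_to=None):
    """Serialize one WARC/1.0 record (hypothetical helper, illustration only).

    Returns (record_id, record_bytes).
    """
    record_id = "<urn:uuid:%s>" % uuid.uuid4()
    headers = [
        ("WARC-Type", warc_type),
        ("WARC-Record-ID", record_id),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        # Content-Length counts the whole block: HTTP headers plus body.
        ("Content-Length", str(len(payload))),
    ]
    if concurrent_to:
        # Ties a request record to the response captured in the same exchange.
        headers.append(("WARC-Concurrent-To", concurrent_to))
    head = b"WARC/1.0" + CRLF + b"".join(
        ("%s: %s" % (key, value)).encode("utf-8") + CRLF for key, value in headers
    )
    # Blank line after the headers, then the block, then two CRLFs end the record.
    return record_id, head + CRLF + payload + CRLF + CRLF


# Assumed example exchange; the raw HTTP bytes are stored exactly as sent/received.
http_request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
http_response = (
    b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 2\r\n\r\nok"
)

resp_id, resp = make_record(
    "response", "http://example.com/", http_response,
    "application/http; msgtype=response",
)
_, req = make_record(
    "request", "http://example.com/", http_request,
    "application/http; msgtype=request", concurrent_to=resp_id,
)

warc_bytes = resp + req
```

Keeping the HTTP status line and headers byte-for-byte inside the record block is the part arkiver2 asks to check, since replay tools read them back from there.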