Time |
Nickname |
Message |
00:01
🔗
|
|
Aranje has quit IRC (Read error: Operation timed out) |
00:02
🔗
|
|
Aranje has joined #archiveteam |
00:51
🔗
|
|
jacketcha has joined #archiveteam |
01:01
🔗
|
|
Pixi has quit IRC (Quit: Pixi) |
01:08
🔗
|
|
Pixi has joined #archiveteam |
01:28
🔗
|
|
pizzaiolo has quit IRC (Ping timeout: 246 seconds) |
01:29
🔗
|
|
pizzaiolo has joined #archiveteam |
01:30
🔗
|
|
Selanda_ has quit IRC (Ping timeout: 260 seconds) |
01:31
🔗
|
|
Selanda has joined #archiveteam |
01:33
🔗
|
|
pizzaiolo has quit IRC (Client Quit) |
01:33
🔗
|
|
pizzaiolo has joined #archiveteam |
01:42
🔗
|
|
pizzaiolo has quit IRC (Remote host closed the connection) |
01:43
🔗
|
|
BlueMaxim has quit IRC (Leaving) |
01:44
🔗
|
|
ranavalon has quit IRC (Read error: Connection reset by peer) |
01:46
🔗
|
|
ranavalon has joined #archiveteam |
01:47
🔗
|
|
ranavalon has quit IRC (Remote host closed the connection) |
01:47
🔗
|
|
ranavalon has joined #archiveteam |
01:59
🔗
|
|
kitties has joined #archiveteam |
02:02
🔗
|
|
jacketcha has quit IRC (Read error: Connection reset by peer) |
02:08
🔗
|
|
nertzy has quit IRC (Read error: Connection reset by peer) |
02:11
🔗
|
|
jacketcha has joined #archiveteam |
02:22
🔗
|
|
BlueMaxim has joined #archiveteam |
02:31
🔗
|
|
mistym has joined #archiveteam |
03:05
🔗
|
|
Aranje has quit IRC (Read error: Operation timed out) |
03:05
🔗
|
|
Aranje has joined #archiveteam |
03:10
🔗
|
|
Aranje has quit IRC (Read error: Operation timed out) |
03:10
🔗
|
|
dx has joined #archiveteam |
03:10
🔗
|
|
Aranje has joined #archiveteam |
03:13
🔗
|
|
parker has joined #archiveteam |
03:15
🔗
|
|
dx has left |
03:29
🔗
|
|
parker has quit IRC (Read error: Operation timed out) |
03:32
🔗
|
|
BlueMaxim has quit IRC (Leaving) |
03:34
🔗
|
|
parker has joined #archiveteam |
03:51
🔗
|
|
BlueMaxim has joined #archiveteam |
04:18
🔗
|
|
parker has quit IRC (Ping timeout: 360 seconds) |
04:45
🔗
|
|
qw3rty118 has joined #archiveteam |
04:47
🔗
|
|
jacketcha has quit IRC (Read error: Connection reset by peer) |
04:47
🔗
|
|
jacketcha has joined #archiveteam |
04:49
🔗
|
|
nwf__ has quit IRC (WeeChat 1.6) |
04:49
🔗
|
|
qw3rty117 has quit IRC (Read error: Operation timed out) |
04:51
🔗
|
|
Aranje has quit IRC (Quit: Three sheets to the wind) |
05:08
🔗
|
|
ranav has joined #archiveteam |
05:12
🔗
|
|
zhongfu has quit IRC (Remote host closed the connection) |
05:12
🔗
|
|
mona has quit IRC (Ping timeout: 260 seconds) |
05:15
🔗
|
|
ranavalon has quit IRC (Read error: Operation timed out) |
05:16
🔗
|
|
mona has joined #archiveteam |
05:22
🔗
|
|
Ctrl has quit IRC (Ping timeout: 506 seconds) |
05:24
🔗
|
|
nwf has joined #archiveteam |
05:47
🔗
|
|
antomatic has quit IRC (Ping timeout: 252 seconds) |
05:47
🔗
|
|
jacketcha has quit IRC (Read error: Connection reset by peer) |
05:48
🔗
|
|
jacketcha has joined #archiveteam |
06:06
🔗
|
|
Stilett0 is now known as Stiletto |
06:30
🔗
|
|
antomatic has joined #archiveteam |
06:40
🔗
|
|
antomatic has quit IRC (Read error: Operation timed out) |
06:40
🔗
|
|
antomatic has joined #archiveteam |
07:25
🔗
|
|
pikhq has quit IRC (Ping timeout: 250 seconds) |
07:26
🔗
|
|
BobJonkma has joined #archiveteam |
07:40
🔗
|
|
pikhq has joined #archiveteam |
07:45
🔗
|
|
kitties has quit IRC (Quit: Connection closed for inactivity) |
09:29
🔗
|
|
atomotic has joined #archiveteam |
10:13
🔗
|
|
schbirid has joined #archiveteam |
10:15
🔗
|
|
SilSte has quit IRC (Read error: Connection reset by peer) |
10:32
🔗
|
|
Mateon1 has quit IRC (Read error: Operation timed out) |
10:33
🔗
|
|
Mateon1 has joined #archiveteam |
12:21
🔗
|
|
BlueMaxim has quit IRC (Leaving) |
12:58
🔗
|
|
atomotic has quit IRC (Quit: atomotic) |
13:45
🔗
|
|
atomotic has joined #archiveteam |
13:55
🔗
|
|
Morbus has joined #archiveteam |
14:32
🔗
|
|
atomotic has quit IRC (Quit: atomotic) |
15:52
🔗
|
|
ld1 has quit IRC (Quit: ld1) |
15:55
🔗
|
|
ld1 has joined #archiveteam |
16:14
🔗
|
|
atomotic has joined #archiveteam |
16:26
🔗
|
|
parker has joined #archiveteam |
16:29
🔗
|
|
RichardG has quit IRC (Ping timeout: 506 seconds) |
16:42
🔗
|
|
parker has quit IRC (Read error: Operation timed out) |
16:48
🔗
|
|
parker has joined #archiveteam |
17:01
🔗
|
|
atomotic has quit IRC (Quit: atomotic) |
17:10
🔗
|
|
atrocity has quit IRC () |
17:15
🔗
|
|
RichardG has joined #archiveteam |
17:23
🔗
|
|
atomotic has joined #archiveteam |
17:51
🔗
|
|
atomotic has quit IRC (Quit: atomotic) |
18:12
🔗
|
|
jschwart has joined #archiveteam |
19:18
🔗
|
|
parker has quit IRC (Ping timeout: 360 seconds) |
19:46
🔗
|
|
octothorp has quit IRC (Read error: Connection reset by peer) |
19:47
🔗
|
|
octothorp has joined #archiveteam |
19:57
🔗
|
|
Soni has quit IRC (Ping timeout: 255 seconds) |
20:09
🔗
|
|
Morbus has quit IRC (Ping timeout: 255 seconds) |
20:09
🔗
|
|
Morbus has joined #archiveteam |
20:16
🔗
|
|
Soni has joined #archiveteam |
20:31
🔗
|
|
parker has joined #archiveteam |
20:33
🔗
|
|
Scippy has joined #archiveteam |
20:37
🔗
|
|
Scippy has quit IRC (Ping timeout: 260 seconds) |
20:38
🔗
|
|
RichardG has quit IRC (Read error: Connection reset by peer) |
20:45
🔗
|
|
n00b604 has joined #archiveteam |
20:47
🔗
|
n00b604 |
Hi. Is there any tool for crawling and creating a WARC of a website that requires JS? I think I already have a way of enumerating all the pages. |
20:48
🔗
|
WubTheCap |
ArchiveBot supports PhantomJS, if you mean infinite scrolling |
20:49
🔗
|
n00b604 |
WubTheCap: can it save a page like this <https://arbital.com/p/bayes_rule/>? |
20:50
🔗
|
n00b604 |
Wayback can't do it: http://web.archive.org/web/20171101121322/https://arbital.com/p/bayes_rule/ |
20:50
🔗
|
n00b604 |
Even archive.is can't: http://archive.is/6Kxrl |
20:51
🔗
|
n00b604 |
gotta go, might be back later (and will read the log to see if anyone has ideas) |
20:52
🔗
|
n00b604 |
btw the reason I want to archive this is because it apparently is being shut down. they say they'll maintain an archive, but I want to make sure |
20:52
🔗
|
|
n00b604 has quit IRC (Quit: Page closed) |
21:07
🔗
|
|
RichardG has joined #archiveteam |
21:23
🔗
|
JAA |
ArchiveBot *should* support PhantomJS for scrolling (e.g. Twitter), but that has been broken on most pipelines for many months. |
21:23
🔗
|
JAA |
You can use a browser with a WARC-writing MITM proxy such as warcprox. That would capture exactly what the browser requests and receives, including anything loaded after the initial page load etc. |
21:25
🔗
|
JAA |
But that's entirely manual. I don't have experience with working automatic tools; I know that brozzler exists though, which is warcprox + Chromium headless, I believe, so you could give that a try. |
21:25
🔗
|
JAA |
(NB, I didn't look at that page at all.) |
21:31
🔗
|
|
don is now known as don_ |
21:35
🔗
|
|
Atom-- has joined #archiveteam |
21:40
🔗
|
|
WubTheCap has quit IRC (Read error: Connection reset by peer) |
21:41
🔗
|
|
Atom has quit IRC (Read error: Operation timed out) |
21:44
🔗
|
|
WubTheCap has joined #archiveteam |
21:46
🔗
|
|
Atom-- has quit IRC (Read error: Operation timed out) |
22:01
🔗
|
|
pizzaiolo has joined #archiveteam |
22:15
🔗
|
|
pizzaiolo has quit IRC (Read error: Connection reset by peer) |
22:16
🔗
|
|
pizzaiolo has joined #archiveteam |
22:22
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
22:38
🔗
|
|
jschwart has quit IRC (Quit: Konversation terminated!) |
22:38
🔗
|
|
machina has joined #archiveteam |
22:46
🔗
|
|
pizzaiolo has quit IRC (Remote host closed the connection) |
22:52
🔗
|
|
soldaat has joined #archiveteam |
22:54
🔗
|
|
soldaat has quit IRC (Client Quit) |
23:20
🔗
|
|
MrDignity has joined #archiveteam |
23:28
🔗
|
|
BlueMaxim has joined #archiveteam |
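
Note on JAA's 21:23 suggestion (a browser behind a WARC-writing MITM proxy such as warcprox): a minimal sketch of how that setup could be wired together, for n00b604's case of an already enumerated list of pages. Nothing here comes from the log beyond the tool names and the example URL. The helper names, the `chromium` binary name, the port, and the output directory are all assumptions, and the warcprox flags (`-p`, `-d`) are written from memory, so check them against `warcprox --help` before relying on them. The code only builds and prints the command lines; it does not start anything.

```python
import shlex


def warcprox_command(port=8000, warc_dir="warcs"):
    """Build a command line that starts warcprox (hypothetical helper).

    Assumes -p sets the listening port and -d the WARC output directory.
    """
    return ["warcprox", "-p", str(port), "-d", warc_dir]


def browser_command(url, port=8000, browser="chromium"):
    """Build a headless Chromium command that loads `url` through the proxy.

    The binary name is an assumption; it differs between systems.
    --ignore-certificate-errors is used because a MITM proxy re-signs HTTPS
    traffic with its own CA. Importing that CA into the browser is the
    cleaner alternative.
    """
    return [
        browser,
        "--headless",
        "--dump-dom",  # load the page, run its JS, print the resulting DOM
        f"--proxy-server=http://localhost:{port}",
        "--ignore-certificate-errors",
        url,
    ]


def capture_plan(urls, port=8000, warc_dir="warcs"):
    """Return the shell lines for one proxy plus one browser run per URL."""
    lines = [shlex.join(warcprox_command(port, warc_dir)) + " &"]
    lines += [shlex.join(browser_command(u, port)) for u in urls]
    return lines


if __name__ == "__main__":
    for line in capture_plan(["https://arbital.com/p/bayes_rule/"]):
        print(line)
```

Whether a single headless page load captures everything such a page fetches is untested here. As JAA notes, brozzler combines the same two pieces and may be the better choice for an automated crawl.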