01:41 --- dashcloud has quit IRC (Read error: Operation timed out)
01:45 --- dashcloud has joined #urlteam
01:45 --- svchfoo3 sets mode: +o dashcloud
03:20 --- JesseW has quit IRC (Ping timeout: 370 seconds)
04:06 --- dashcloud has quit IRC (Read error: Operation timed out)
04:06 --- dashcloud has joined #urlteam
04:07 --- svchfoo3 sets mode: +o dashcloud
04:49 --- DiscantX has joined #urlteam
06:30 --- WinterFox has joined #urlteam
06:33 --- JesseW has joined #urlteam
07:09 --- JesseW has quit IRC (Read error: Operation timed out)
07:16 --- svchfoo3 has quit IRC (Read error: Operation timed out)
07:27 --- svchfoo3 has joined #urlteam
07:28 --- svchfoo1 sets mode: +o svchfoo3
11:18 --- dashcloud has quit IRC (Read error: Operation timed out)
11:22 --- dashcloud has joined #urlteam
11:22 --- svchfoo3 sets mode: +o dashcloud
12:13 --- dashcloud has quit IRC (Read error: Operation timed out)
12:17 --- dashcloud has joined #urlteam
12:17 --- svchfoo1 sets mode: +o dashcloud
12:56 --- dashcloud has quit IRC (Read error: Operation timed out)
13:00 --- dashcloud has joined #urlteam
13:00 --- svchfoo1 sets mode: +o dashcloud
13:54 --- jornbaer has quit IRC (Quit: Vedlikehold)
14:05 --- Start has quit IRC (Quit: Disconnected.)
14:09 --- WinterFox has quit IRC (Read error: Operation timed out)
14:38 --- VADemon has quit IRC (Read error: Connection reset by peer)
16:15 --- JesseW has joined #urlteam
17:02 --- JesseW has quit IRC (Ping timeout: 370 seconds)
17:14 <JW_work> I am happy for the scripts to be modified to save the redirect HTTP responses as WARCs (note that a lot of the redirects use HTTP HEAD requests rather than GET requests).
17:14 <JW_work> I think scraping all the target pages would be worth doing too, albeit not along with the main discovery scrape.
17:15 <JW_work> And if we don't want to scrape all of them, looking through the domains and scraping all but a selected blacklist might be feasible.
17:15 <JW_work> arkiver: ping
17:16 <JW_work> Medowar: thanks, will look into it
17:30 <luckcolor> JW_work: for most of them it wouldn't be necessary to regrab
17:31 <luckcolor> we could convert the beacon into WARC records
17:32 <JW_work> luckcolor: the beacon doesn't have enough information to make a proper WARC, although for the dead shorteners we may have to make do.
17:32 <luckcolor> yeah, that's what I was thinking
17:32 <luckcolor> we would have to fake WARC records for the dead ones
17:32 <JW_work> specifically, it doesn't have the exact datetime the query was made, and it doesn't have the additional headers (if any) that were returned.
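[A faked record of the kind discussed would have to fabricate exactly the fields the beacon lacks. A minimal sketch of synthesizing a WARC/1.0 response record in plain Python; the shortener URL, redirect target, and WARC-Date below are placeholders (the date would have to be approximated, e.g. from the scrape run, since the beacon does not store it):]

```python
import uuid

def fake_warc_response(target_uri, warc_date, status_line, http_headers):
    """Build a minimal WARC/1.0 response record from beacon-style data.

    warc_date must be approximated (the beacon lacks the exact query
    datetime); http_headers are the few headers the beacon preserves.
    """
    # HTTP response block: status line + headers, no body (HEAD-style).
    http_block = status_line + "\r\n"
    http_block += "".join("%s: %s\r\n" % (k, v) for k, v in http_headers)
    http_block += "\r\n"
    payload = http_block.encode("utf-8")

    warc_headers = [
        ("WARC-Type", "response"),
        ("WARC-Target-URI", target_uri),
        ("WARC-Date", warc_date),
        ("WARC-Record-ID", "<urn:uuid:%s>" % uuid.uuid4()),
        ("Content-Type", "application/http;msgtype=response"),
        ("Content-Length", str(len(payload))),
    ]
    head = "WARC/1.0\r\n"
    head += "".join("%s: %s\r\n" % (k, v) for k, v in warc_headers)
    # Record = headers, blank line, payload, then the two CRLFs that
    # terminate a WARC record.
    return head.encode("utf-8") + b"\r\n" + payload + b"\r\n\r\n"

# Hypothetical redirect reconstructed from beacon data:
rec = fake_warc_response(
    "http://tinyurl.example/abc123",            # placeholder shortener URL
    "2016-07-05T17:32:00Z",                     # approximated, not in beacon
    "HTTP/1.1 301 Moved Permanently",
    [("Location", "http://example.com/long-target")],
)
```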
17:33 <luckcolor> yeah
17:33 <luckcolor> well, I'm up for rerunning some URL lists
17:33 <luckcolor> using a separate tool
17:34 <luckcolor> it can easily be done by creating URL lists and then splitting them into smaller lists
17:34 <luckcolor> it could probably be made a warrior project
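[Splitting a flat URL list into warrior-sized work items, as suggested, is a simple chunking step; a sketch (chunk size and example URLs are made up):]

```python
def split_url_list(urls, chunk_size):
    """Split a flat URL list into fixed-size work items for distribution."""
    return [urls[i:i + chunk_size] for i in range(0, len(urls), chunk_size)]

# e.g. 10 URLs split into chunks of 4 -> work items of sizes 4, 4, 2
chunks = split_url_list(
    ["http://short.example/%d" % n for n in range(10)], 4)
```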
17:43  * JW_work nods
19:28 --- jornane has joined #urlteam
20:16 --- logchfoo1 starts logging #urlteam at Tue Jul 05 20:16:50 2016
20:16 --- logchfoo1 has joined #urlteam
20:35 --- logchfoo1 starts logging #urlteam at Tue Jul 05 20:35:24 2016
20:35 --- logchfoo1 has joined #urlteam
20:40 --- logchfoo0 starts logging #urlteam at Tue Jul 05 20:40:59 2016
20:40 --- logchfoo0 has joined #urlteam
20:58 --- Start has joined #urlteam
22:20 --- Start has quit IRC (Quit: Disconnected.)
22:21 --- Start has joined #urlteam