[01:41]  *** dashcloud has quit IRC (Read error: Operation timed out)
[01:45]  *** dashcloud has joined #urlteam
[01:45]  *** svchfoo3 sets mode: +o dashcloud
[03:20]  *** JesseW has quit IRC (Ping timeout: 370 seconds)
[04:06]  *** dashcloud has quit IRC (Read error: Operation timed out)
[04:06]  *** dashcloud has joined #urlteam
[04:07]  *** svchfoo3 sets mode: +o dashcloud
[04:49]  *** DiscantX has joined #urlteam
[06:30]  *** WinterFox has joined #urlteam
[06:33]  *** JesseW has joined #urlteam
[07:09]  *** JesseW has quit IRC (Read error: Operation timed out)
[07:16]  *** svchfoo3 has quit IRC (Read error: Operation timed out)
[07:27]  *** svchfoo3 has joined #urlteam
[07:28]  *** svchfoo1 sets mode: +o svchfoo3
[11:18]  *** dashcloud has quit IRC (Read error: Operation timed out)
[11:22]  *** dashcloud has joined #urlteam
[11:22]  *** svchfoo3 sets mode: +o dashcloud
[12:13]  *** dashcloud has quit IRC (Read error: Operation timed out)
[12:17]  *** dashcloud has joined #urlteam
[12:17]  *** svchfoo1 sets mode: +o dashcloud
[12:56]  *** dashcloud has quit IRC (Read error: Operation timed out)
[13:00]  *** dashcloud has joined #urlteam
[13:00]  *** svchfoo1 sets mode: +o dashcloud
[13:54]  *** jornbaer has quit IRC (Quit: Vedlikehold)
[14:05]  *** Start has quit IRC (Quit: Disconnected.)
[14:09]  *** WinterFox has quit IRC (Read error: Operation timed out)
[14:38]  *** VADemon has quit IRC (Read error: Connection reset by peer)
[16:15]  *** JesseW has joined #urlteam
[17:02]  *** JesseW has quit IRC (Ping timeout: 370 seconds)
[17:14]  <JW_work> I am happy for the scripts to be modified to save the redirect HTTP responses as WARCs (note that a lot of the redirects use HTTP HEAD requests, rather than GET requests).
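A minimal sketch of what saving those redirect responses as WARCs could look like, assuming the warcio and requests libraries; the shortener URL below is a placeholder, and nothing above says the scripts would actually use warcio.

```python
# Sketch only: warcio + requests are assumptions; the URL is a placeholder.
from warcio.capture_http import capture_http
import requests  # import after capture_http so warcio can patch http.client

# Record the shortener's redirect response into a WARC file, using a HEAD
# request (as noted above) and without following the redirect.
with capture_http('shortener-redirects.warc.gz'):
    requests.head('http://shortener.example/abc123', allow_redirects=False)
```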
[17:14]  <JW_work> I think scraping all the target pages would be worth doing, too, albeit not along with the main discovery scrape.
[17:15]  <JW_work> And if we don't want to scrape all of them, looking through the domains and scraping all but a selected blacklist might be feasible.
[17:15]  <JW_work> arkiver: ping
[17:16]  <JW_work> Medowar: thanks, will look into it
[17:30]  <luckcolor> JW_work: for most of them it wouldn't be required to regrab
[17:31]  <luckcolor> we could convert the beacon data into WARC records
[17:32]  <JW_work> luckcolor: the beacon doesn't have enough information to make a proper WARC, although for the dead shorteners, we may have to make do.
[17:32]  <luckcolor> yeah that's what i was thinking
[17:32]  <luckcolor> we must fake warc records for the dead ones
[17:32]  <JW_work> specifically, it doesn't have the exact datetime the query was made, and it doesn't have the additional headers (if any) that were returned.
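For the dead shorteners, a faked record built from the beacon data might look like the sketch below; warcio is an assumption, and the 301 status line, empty body, and auto-generated record date are placeholders precisely because, as noted above, the dumps preserve only the short-URL-to-target mapping.

```python
# Sketch only: assumes warcio; status line, empty payload and record date are
# guesses, since the beacon dump only keeps the short URL -> target mapping.
from io import BytesIO
from warcio.warcwriter import WARCWriter
from warcio.statusandheaders import StatusAndHeaders

def beacon_pair_to_warc(writer, short_url, target_url):
    # Fabricate a minimal HTTP redirect response for one beacon mapping.
    http_headers = StatusAndHeaders('301 Moved Permanently',
                                    [('Location', target_url)],
                                    protocol='HTTP/1.1')
    record = writer.create_warc_record(short_url, 'response',
                                       payload=BytesIO(b''),
                                       http_headers=http_headers)
    writer.write_record(record)

with open('dead-shortener.warc.gz', 'wb') as fh:
    writer = WARCWriter(fh, gzip=True)
    beacon_pair_to_warc(writer,
                        'http://shortener.example/abc123',   # placeholder short URL
                        'http://example.com/target-page')    # placeholder target
```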
[17:33]  <luckcolor> yeah
[17:33]  <luckcolor> well i'm up for rerunning some url lists
[17:33]  <luckcolor> using a separate tool
[17:34]  <luckcolor> it can be easily done by creating url lists and then splitting into smaller lists
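Splitting a big URL list into smaller, warrior-sized chunks is simple; a rough sketch, where the chunk size, file names, and input file are arbitrary examples rather than anything agreed on above:

```python
# Sketch only: chunk size and file naming are arbitrary choices.
import itertools

def split_url_list(path, chunk_size=1000, prefix='urls-chunk'):
    """Split a newline-delimited URL list into files of chunk_size URLs each."""
    with open(path) as src:
        urls = (line.strip() for line in src if line.strip())
        for n in itertools.count():
            chunk = list(itertools.islice(urls, chunk_size))
            if not chunk:
                break
            with open('{}-{:05d}.txt'.format(prefix, n), 'w') as out:
                out.write('\n'.join(chunk) + '\n')

split_url_list('rerun-urls.txt')  # hypothetical input file
```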
[17:34]  <luckcolor> can probably be made as a warrior project
[17:43]  * JW_work nods
[19:28]  *** jornane has joined #urlteam
[20:16]  *** logchfoo1 starts logging #urlteam at Tue Jul 05 20:16:50 2016
[20:16]  *** logchfoo1 has joined #urlteam
[20:35]  *** logchfoo1 starts logging #urlteam at Tue Jul 05 20:35:24 2016
[20:35]  *** logchfoo1 has joined #urlteam
[20:40]  *** logchfoo0 starts logging #urlteam at Tue Jul 05 20:40:59 2016
[20:40]  *** logchfoo0 has joined #urlteam
[20:58]  *** Start has joined #urlteam
[22:20]  *** Start has quit IRC (Quit: Disconnected.)
[22:21]  *** Start has joined #urlteam