#urlteam 2016-07-05,Tue


Time Nickname Message
01:41 🔗 dashcloud has quit IRC (Read error: Operation timed out)
01:45 🔗 dashcloud has joined #urlteam
01:45 🔗 svchfoo3 sets mode: +o dashcloud
03:20 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
04:06 🔗 dashcloud has quit IRC (Read error: Operation timed out)
04:06 🔗 dashcloud has joined #urlteam
04:07 🔗 svchfoo3 sets mode: +o dashcloud
04:49 🔗 DiscantX has joined #urlteam
06:30 🔗 WinterFox has joined #urlteam
06:33 🔗 JesseW has joined #urlteam
07:09 🔗 JesseW has quit IRC (Read error: Operation timed out)
07:16 🔗 svchfoo3 has quit IRC (Read error: Operation timed out)
07:27 🔗 svchfoo3 has joined #urlteam
07:28 🔗 svchfoo1 sets mode: +o svchfoo3
11:18 🔗 dashcloud has quit IRC (Read error: Operation timed out)
11:22 🔗 dashcloud has joined #urlteam
11:22 🔗 svchfoo3 sets mode: +o dashcloud
12:13 🔗 dashcloud has quit IRC (Read error: Operation timed out)
12:17 🔗 dashcloud has joined #urlteam
12:17 🔗 svchfoo1 sets mode: +o dashcloud
12:56 🔗 dashcloud has quit IRC (Read error: Operation timed out)
13:00 🔗 dashcloud has joined #urlteam
13:00 🔗 svchfoo1 sets mode: +o dashcloud
13:54 🔗 jornbaer has quit IRC (Quit: Vedlikehold)
14:05 🔗 Start has quit IRC (Quit: Disconnected.)
14:09 🔗 WinterFox has quit IRC (Read error: Operation timed out)
14:38 🔗 VADemon has quit IRC (Read error: Connection reset by peer)
16:15 🔗 JesseW has joined #urlteam
17:02 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
17:14 🔗 JW_work I am happy for the scripts to be modified to save the redirect HTTP responses as WARCs (note that a lot of the redirects use HTTP HEAD requests, rather than GET requests).
17:14 🔗 JW_work I think scraping all the target pages would be worth doing, too, albeit not along with the main discovery scrape.
17:15 🔗 JW_work And if we don't want to scrape all of them, looking through the domains and scraping all but a selected blacklist might be feasible.
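The blacklist idea above could be sketched roughly as follows. This is only an illustration, not the project's actual tooling: `filter_targets` and the example domains are hypothetical names, and the sketch assumes a blacklisted domain should also exclude its subdomains.

```python
from urllib.parse import urlsplit


def filter_targets(urls, blacklist):
    """Return the URLs whose host is not on the blacklist.

    Assumption: a blacklisted domain also blocks all of its subdomains.
    URLs without a parseable host are dropped.
    """
    blocked = {domain.lower() for domain in blacklist}
    kept = []
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        if not host:
            continue  # no host, so nothing to scrape
        parts = host.split(".")
        # Every suffix of the host: "a.b.c" -> {"a.b.c", "b.c", "c"}
        suffixes = {".".join(parts[i:]) for i in range(len(parts))}
        if not (suffixes & blocked):
            kept.append(url)
    return kept
```

For example, `filter_targets(["http://example.com/a", "http://sub.blocked.example/b"], ["blocked.example"])` keeps only the first URL.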
17:15 🔗 JW_work arkiver: ping
17:16 🔗 JW_work Medowar: thanks, will look into
17:30 🔗 luckcolor JW_work: for most of them a regrab wouldn't be required
17:31 🔗 luckcolor we could convert the beacon into WARC records
17:32 🔗 JW_work luckcolor: the beacon doesn't have enough information to make a proper WARC, although for the dead shorteners, we may have to make do.
17:32 🔗 luckcolor yeah that's what i was thinking
17:32 🔗 luckcolor we must fake WARC records for the dead ones
17:32 🔗 JW_work specifically, it doesn't have the exact datetime the query was made, and it doesn't have the additional headers (if any) returned.
17:33 🔗 luckcolor yeah
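To make the gap concrete, here is a minimal sketch of what a "faked" WARC response record built from one beacon line might look like. It assumes the beacon line has the form `shortcode|target-url`. The function name, the shortener prefix, the `301` status, and the date are all assumptions supplied by the caller, precisely because the beacon does not record them. The record is assembled by hand with the standard library only, and it is not a substitute for a real capture.

```python
import uuid
from datetime import datetime, timezone


def beacon_line_to_warc_record(line, shortener_prefix, assumed_date,
                               assumed_status="301 Moved Permanently"):
    """Build one synthetic WARC/1.0 response record from a beacon line.

    assumed_date and assumed_status are guesses: the beacon stores neither
    the time of the original query nor the real status line and headers.
    """
    shortcode, target = line.rstrip("\n").split("|", 1)

    # Reconstructed HTTP response: only the Location header is known.
    http_block = (
        f"HTTP/1.1 {assumed_status}\r\n"
        f"Location: {target}\r\n"
        "\r\n"
    ).encode("utf-8")

    warc_headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {assumed_date.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Target-URI: {shortener_prefix}{shortcode}",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_block)}",
    ]
    # Header block, blank line, content block, then the two CRLFs
    # that terminate every WARC record.
    return ("\r\n".join(warc_headers).encode("utf-8")
            + b"\r\n\r\n" + http_block + b"\r\n\r\n")
```

A call such as `beacon_line_to_warc_record("abc|http://example.com/page", "http://sho.rt/", datetime(2016, 7, 5, tzinfo=timezone.utc))` returns the record as bytes. Anything consuming such records would need to know that the date and headers are synthetic.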
17:33 🔗 luckcolor well i'm up for rerunning some url lists
17:33 🔗 luckcolor using a separate tool
17:34 🔗 luckcolor it can be easily done by creating url lists and then splitting into smaller lists
17:34 🔗 luckcolor can probably be made as a warrior project
17:43 🔗 JW_work nods
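The "split into smaller lists" step mentioned above might look something like this. It is a sketch only: `split_url_list` and the chunk size are hypothetical, and how the resulting chunks would be handed out as work items (for example in a warrior project) is not shown.

```python
from itertools import islice


def split_url_list(urls, chunk_size):
    """Yield successive lists of at most chunk_size URLs.

    Works on any iterable, so a large file can be streamed line by line
    instead of being loaded into memory at once.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(urls)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk
```

Each yielded chunk could then be written to its own file or queued as a separate item for a rerun.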
19:28 🔗 jornane has joined #urlteam
20:16 🔗 logchfoo1 starts logging #urlteam at Tue Jul 05 20:16:50 2016
20:16 🔗 logchfoo1 has joined #urlteam
20:35 🔗 logchfoo1 starts logging #urlteam at Tue Jul 05 20:35:24 2016
20:35 🔗 logchfoo1 has joined #urlteam
20:40 🔗 logchfoo0 starts logging #urlteam at Tue Jul 05 20:40:59 2016
20:40 🔗 logchfoo0 has joined #urlteam
20:58 🔗 Start has joined #urlteam
22:20 🔗 Start has quit IRC (Quit: Disconnected.)
22:21 🔗 Start has joined #urlteam
