[01:41] *** dashcloud has quit IRC (Read error: Operation timed out)
[01:45] *** dashcloud has joined #urlteam
[01:45] *** svchfoo3 sets mode: +o dashcloud
[03:20] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[04:06] *** dashcloud has quit IRC (Read error: Operation timed out)
[04:06] *** dashcloud has joined #urlteam
[04:07] *** svchfoo3 sets mode: +o dashcloud
[04:49] *** DiscantX has joined #urlteam
[06:30] *** WinterFox has joined #urlteam
[06:33] *** JesseW has joined #urlteam
[07:09] *** JesseW has quit IRC (Read error: Operation timed out)
[07:16] *** svchfoo3 has quit IRC (Read error: Operation timed out)
[07:27] *** svchfoo3 has joined #urlteam
[07:28] *** svchfoo1 sets mode: +o svchfoo3
[11:18] *** dashcloud has quit IRC (Read error: Operation timed out)
[11:22] *** dashcloud has joined #urlteam
[11:22] *** svchfoo3 sets mode: +o dashcloud
[12:13] *** dashcloud has quit IRC (Read error: Operation timed out)
[12:17] *** dashcloud has joined #urlteam
[12:17] *** svchfoo1 sets mode: +o dashcloud
[12:56] *** dashcloud has quit IRC (Read error: Operation timed out)
[13:00] *** dashcloud has joined #urlteam
[13:00] *** svchfoo1 sets mode: +o dashcloud
[13:54] *** jornbaer has quit IRC (Quit: Vedlikehold)
[14:05] *** Start has quit IRC (Quit: Disconnected.)
[14:09] *** WinterFox has quit IRC (Read error: Operation timed out)
[14:38] *** VADemon has quit IRC (Read error: Connection reset by peer)
[16:15] *** JesseW has joined #urlteam
[17:02] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[17:14] I am happy for the scripts to be modified to save the redirect HTTP responses as WARCs (note that a lot of the redirects use HTTP HEAD requests, rather than GET requests).
[17:14] I think scraping all the target pages would be worth doing, too, albeit not along with the main discovery scrape.
[17:15] And if we don't want to scrape all of them, looking through the domains and scraping all but a selected blacklist might be feasible.
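(Editorial sketch, not part of the log.) The 17:15 idea of scraping every target except those on a selected domain blacklist could look roughly like the following. The function name `filter_targets`, the choice to also block subdomains of a blacklisted domain, and the example hosts are all assumptions for illustration; nothing here comes from the actual urlteam scripts.

```python
from urllib.parse import urlsplit


def filter_targets(urls, blacklist):
    """Yield the URLs whose host is not blacklisted.

    Hypothetical helper: a host is treated as blocked if it, or any of
    its parent domains, appears in `blacklist`.
    """
    blocked = {domain.lower().rstrip(".") for domain in blacklist}
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        if not host:
            continue  # skip entries with no parseable host
        labels = host.split(".")
        # Build the host plus every parent domain, e.g. a.b.c -> a.b.c, b.c, c
        candidates = {".".join(labels[i:]) for i in range(len(labels))}
        if not (candidates & blocked):
            yield url
```

Blocking parent domains as well as exact hosts is a design choice made here so that one blacklist entry covers all subdomains; an exact-match blacklist would be an equally plausible reading of the message.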
[17:15] arkiver: ping
[17:16] Medowar: thanks, will look into
[17:30] JW_work: for most ones it wouldn't be required to regrab
[17:31] we could convert the beacon into warc records
[17:32] luckcolor: the beacon doesn't have enough information to make a proper WARC, although for the dead shorteners, we may have to make do.
[17:32] yeah that's what i was thinking
[17:32] we must fake warc records for the dead ones
[17:32] specifically, it doesn't have the exact datetime the query was made, and it doesn't have the additional headers (if any) returned.
[17:33] yeah
[17:33] well i'm up for rerunning some url lists
[17:33] using a separate tool
[17:34] it can be easily done by creating url lists and then splitting into smaller lists
[17:34] can probably be made as a warrior project
[17:43] nods
[19:28] *** jornane has joined #urlteam
[20:16] *** logchfoo1 starts logging #urlteam at Tue Jul 05 20:16:50 2016
[20:16] *** logchfoo1 has joined #urlteam
[20:35] *** logchfoo1 starts logging #urlteam at Tue Jul 05 20:35:24 2016
[20:35] *** logchfoo1 has joined #urlteam
[20:40] *** logchfoo0 starts logging #urlteam at Tue Jul 05 20:40:59 2016
[20:40] *** logchfoo0 has joined #urlteam
[20:58] *** Start has joined #urlteam
[22:20] *** Start has quit IRC (Quit: Disconnected.)
[22:21] *** Start has joined #urlteam
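(Editorial sketch, not part of the log.) The 17:34 suggestion of creating URL lists and then splitting them into smaller lists could be done along these lines. The name `split_url_list` and the `chunk_size` parameter are hypothetical, and how the resulting lists would be handed out as work items (for example in a warrior project) is left open, since the log does not say.

```python
from itertools import islice


def split_url_list(urls, chunk_size):
    """Yield successive lists of at most `chunk_size` URLs.

    Hypothetical helper: works on any iterable, so a large URL file can
    be streamed line by line instead of being loaded into memory.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(urls)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return  # input exhausted
        yield chunk
```

For instance, `list(split_url_list(["a", "b", "c", "d", "e"], 2))` gives three lists, the last of which holds the single leftover entry.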