01:41 --- dashcloud has quit IRC (Read error: Operation timed out)
01:45 --- dashcloud has joined #urlteam
01:45 --- svchfoo3 sets mode: +o dashcloud
03:20 --- JesseW has quit IRC (Ping timeout: 370 seconds)
04:06 --- dashcloud has quit IRC (Read error: Operation timed out)
04:06 --- dashcloud has joined #urlteam
04:07 --- svchfoo3 sets mode: +o dashcloud
04:49 --- DiscantX has joined #urlteam
06:30 --- WinterFox has joined #urlteam
06:33 --- JesseW has joined #urlteam
07:09 --- JesseW has quit IRC (Read error: Operation timed out)
07:16 --- svchfoo3 has quit IRC (Read error: Operation timed out)
07:27 --- svchfoo3 has joined #urlteam
07:28 --- svchfoo1 sets mode: +o svchfoo3
11:18 --- dashcloud has quit IRC (Read error: Operation timed out)
11:22 --- dashcloud has joined #urlteam
11:22 --- svchfoo3 sets mode: +o dashcloud
12:13 --- dashcloud has quit IRC (Read error: Operation timed out)
12:17 --- dashcloud has joined #urlteam
12:17 --- svchfoo1 sets mode: +o dashcloud
12:56 --- dashcloud has quit IRC (Read error: Operation timed out)
13:00 --- dashcloud has joined #urlteam
13:00 --- svchfoo1 sets mode: +o dashcloud
13:54 --- jornbaer has quit IRC (Quit: Vedlikehold)
14:05 --- Start has quit IRC (Quit: Disconnected.)
14:09 --- WinterFox has quit IRC (Read error: Operation timed out)
14:38 --- VADemon has quit IRC (Read error: Connection reset by peer)
16:15 --- JesseW has joined #urlteam
17:02 --- JesseW has quit IRC (Ping timeout: 370 seconds)
17:14 <JW_work> I am happy for the scripts to be modified to save the redirect HTTP responses as WARCs (note that a lot of the redirects use HTTP HEAD requests rather than GET requests).
17:14 <JW_work> I think scraping all the target pages would be worth doing too, albeit not along with the main discovery scrape.
17:15 <JW_work> And if we don't want to scrape all of them, looking through the domains and scraping all but a selected blacklist might be feasible.
17:15 <JW_work> arkiver: ping
17:16 <JW_work> Medowar: thanks, will look into it
17:30 <luckcolor> JW_work: for most of them it wouldn't be necessary to regrab
17:31 <luckcolor> we could convert the beacon into WARC records
17:32 <JW_work> luckcolor: the beacon doesn't have enough information to make a proper WARC, although for the dead shorteners we may have to make do.
17:32 <luckcolor> yeah, that's what I was thinking
17:32 <luckcolor> we would have to fake WARC records for the dead ones
17:32 <JW_work> specifically, it doesn't have the exact datetime the query was made, and it doesn't have the additional headers (if any) that were returned.
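[A faked record of the kind discussed would have to fabricate exactly the fields the beacon lacks. A minimal sketch of synthesizing a WARC/1.0 response record in plain Python; the shortener URL, redirect target, and WARC-Date below are placeholders (the date would have to be approximated, e.g. from the scrape run, since the beacon does not store it):]

```python
import uuid

def fake_warc_response(target_uri, warc_date, status_line, http_headers):
    """Build a minimal WARC/1.0 response record from beacon-style data.

    warc_date must be approximated (the beacon lacks the exact query
    datetime); http_headers are the few headers the beacon preserves.
    """
    # HTTP response block: status line + headers, no body (HEAD-style).
    http_block = status_line + "\r\n"
    http_block += "".join("%s: %s\r\n" % (k, v) for k, v in http_headers)
    http_block += "\r\n"
    payload = http_block.encode("utf-8")

    warc_headers = [
        ("WARC-Type", "response"),
        ("WARC-Target-URI", target_uri),
        ("WARC-Date", warc_date),
        ("WARC-Record-ID", "<urn:uuid:%s>" % uuid.uuid4()),
        ("Content-Type", "application/http;msgtype=response"),
        ("Content-Length", str(len(payload))),
    ]
    head = "WARC/1.0\r\n"
    head += "".join("%s: %s\r\n" % (k, v) for k, v in warc_headers)
    # Record = headers, blank line, payload, then the two CRLFs that
    # terminate a WARC record.
    return head.encode("utf-8") + b"\r\n" + payload + b"\r\n\r\n"

# Hypothetical redirect reconstructed from beacon data:
rec = fake_warc_response(
    "http://tinyurl.example/abc123",            # placeholder shortener URL
    "2016-07-05T17:32:00Z",                     # approximated, not in beacon
    "HTTP/1.1 301 Moved Permanently",
    [("Location", "http://example.com/long-target")],
)
```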
17:33 <luckcolor> yeah
17:33 <luckcolor> well, I'm up for rerunning some URL lists
17:33 <luckcolor> using a separate tool
17:34 <luckcolor> it can easily be done by creating URL lists and then splitting them into smaller lists
17:34 <luckcolor> it could probably be made a warrior project
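[Splitting a flat URL list into warrior-sized work items, as suggested, is a simple chunking step; a sketch (chunk size and example URLs are made up):]

```python
def split_url_list(urls, chunk_size):
    """Split a flat URL list into fixed-size work items for distribution."""
    return [urls[i:i + chunk_size] for i in range(0, len(urls), chunk_size)]

# e.g. 10 URLs split into chunks of 4 -> work items of sizes 4, 4, 2
chunks = split_url_list(
    ["http://short.example/%d" % n for n in range(10)], 4)
```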
17:43  * JW_work nods
19:28 --- jornane has joined #urlteam
20:16 --- logchfoo1 starts logging #urlteam at Tue Jul 05 20:16:50 2016
20:16 --- logchfoo1 has joined #urlteam
20:35 --- logchfoo1 starts logging #urlteam at Tue Jul 05 20:35:24 2016
20:35 --- logchfoo1 has joined #urlteam
20:40 --- logchfoo0 starts logging #urlteam at Tue Jul 05 20:40:59 2016
20:40 --- logchfoo0 has joined #urlteam
20:58 --- Start has joined #urlteam
22:20 --- Start has quit IRC (Quit: Disconnected.)
22:21 --- Start has joined #urlteam