#archiveteam 2014-05-29,Thu


Time	Nickname	Message
00:05	SketchCow	The thing that runs and creates metadata keywords on my internet archive items now runs 20 times more efficiently.
00:06	SketchCow	So THAT's good.
00:33	exmic	important, that
00:56	godane	SketchCow: looks like you beat me to this: https://archive.org/details/Operating_Instructions_for_the_MCA_DiscoVision_PR-7820-System_Side_1_Only
01:03	Warchell	Hello. Is there a way to convert warc file to html archive with relative link structure?
01:52	exmic	the first can be done with warctozip
01:55	ivan`	anyone know how many times you can hit google cache per minute before getting banned?
02:01	ivan`	wish we had a distributed google cache grabber :)
06:44	Cameron_D	I've used tor for that in the past
06:44	Cameron_D	if I got a banned message from Google it'd restart the tor service giving me a new IP
06:47	ivan`	I haven't been banned yet doing an average delay of 45 seconds
06:47	ivan`	too bad I probably won't get all of the site at that rate
06:47	ivan`	that's about 1800 pages per day, I wonder how long google keeps things in cache
07:49	SketchCow	Tomorrow's going to be awesome for EVR
07:49	SketchCow	https://archive.org/details/evr_1280-41011-20110925
07:51	SketchCow	We run it against all 5,300 items.
13:11	Warchell	Hello. Is there a way to convert warc file to html archive with relative link structure?
13:26	SketchCow	warctozip
19:52	Warchell	Is anyone alive?
19:55	*	exmic waves
21:32	SketchCow	burp
21:33	Warchell	any technical minded person alive?
21:39	dashcloud	Warchell: in general, if you have a question, just ask it- it's better than asking if someone is around, and waiting for a response to that, and then a response to your actual question
21:39	Warchell	I already did, seems this chat is dead most of the time
21:39	sep332	you asked the same question twice and got the same answer both times
21:41	Warchell	really? I must have skipped them both times. could you paste it?
21:41	Warchell	a lot of logging spam
21:42	sep332	check out warctozip
21:42	sep332	it can probably do what you're asking
21:43	dashcloud	also, if you're idling, configure your client to ignore joins/parts/name changes- it cuts down a lot of the text you have to scroll through after you're back
21:43	Warchell	yeah, i just did that
21:44	Warchell	you convert warc to view-able html mirror for timemachine with the same tool, or custom scripts?
22:46	ivan`	Warchell: maybe try https://github.com/alard/warc-proxy and using wget/wpull/httrack to crawl the site it serves
22:47	ivan`	all three of those can convert links
22:52	Warchell	i'm gonna try it, thanks
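
Note on the 06:47 throughput estimate: the arithmetic behind ivan`'s "about 1800 pages per day" at an average delay of 45 seconds can be checked with a short sketch. The function names `pages_per_day` and `days_to_fetch` are hypothetical, introduced only for illustration, and the calculation assumes each request takes no time beyond the delay itself. That assumption gives an upper bound, so real throughput would be somewhat lower and may land nearer the figure quoted in the log.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def pages_per_day(avg_delay_seconds: float) -> float:
    """Upper bound on pages fetched per day with one request per delay interval.

    Assumption: the time spent on the request itself is ignored.
    """
    if avg_delay_seconds <= 0:
        raise ValueError("avg_delay_seconds must be positive")
    return SECONDS_PER_DAY / avg_delay_seconds


def days_to_fetch(total_pages: int, avg_delay_seconds: float) -> float:
    """Days needed to fetch total_pages at the given average delay."""
    return total_pages / pages_per_day(avg_delay_seconds)


if __name__ == "__main__":
    # 45 seconds is the average delay mentioned in the log.
    print(pages_per_day(45))
    # 100,000 is a made-up site size, used only to show the scale.
    print(days_to_fetch(100_000, 45))
```

With a 45-second delay the bound works out to 86,400 / 45 = 1,920 pages per day, so a site with many tens of thousands of cached pages would take weeks at that rate.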
