[00:05] The thing that runs and creates metadata keywords on my internet archive items now runs 20 times more efficiently.
[00:06] So THAT's good.
[00:33] important, that
[00:56] SketchCow: looks like you beat me to this: https://archive.org/details/Operating_Instructions_for_the_MCA_DiscoVision_PR-7820-System_Side_1_Only
[01:03] Hello. Is there a way to convert warc file to html archive with relative link structure?
[01:52] the first can be done with warctozip
[01:55] anyone know how many times you can hit google cache per minute before getting banned?
[02:01] wish we had a distributed google cache grabber :)
[06:44] I've used tor for that in the past
[06:44] if I got a banned message from Google it'd restart the tor service giving me a new IP
[06:47] I haven't been banned yet doing an average delay of 45 seconds
[06:47] too bad I probably won't get all of the site at that rate
[06:47] that's about 1800 pages per day, I wonder how long google keeps things in cache
[07:49] Tomorrow's going to be awesome for EVR
[07:49] https://archive.org/details/evr_1280-41011-20110925
[07:51] We run it against all 5,300 items.
[13:11] Hello. Is there a way to convert warc file to html archive with relative link structure?
[13:26] warctozip
[19:52] Is anyone alive?
[19:55] * exmic waves
[21:32] burp
[21:33] any technical minded person alive?
[21:39] Warchell: in general, if you have a question, just ask it- it's better than asking if someone is around, and waiting for a response to that, and then a response to your actual question
[21:39] I already did, seems this chat is dead most of the time
[21:39] you asked the same question twice and got the same answer both times
[21:41] really? I must have skiped them both times. could you paste it?
[21:41] a lot of logging spam
[21:42] check out warctozip
[21:42] it can probably do what you're asking
[21:43] also, if you're idling, configure your client to ignore joins/parts/name changes- it cuts down a lot of the text you have to scroll through after you're back
[21:43] yeah, i just did that
[21:44] you convert warc to view-able html mirror for timemachine with the same tool, or custom scripts?
[22:46] Warchell: maybe try https://github.com/alard/warc-proxy and using wget/wpull/httrack to crawl the site it serves
[22:47] all three of those can convert links
[22:52] i'm gonna try it, thanks
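
Editor's note: the warc-proxy suggestion above could look roughly like the sketch below, which only builds a wget command line for crawling through a local proxy with link conversion turned on. The proxy address `http://localhost:8000/`, the `mirror` output directory, the example URL and the helper name `build_mirror_command` are all assumptions for illustration and are not from the log; check which port your warc-proxy instance actually listens on.

```python
import shlex


def build_mirror_command(start_url, proxy="http://localhost:8000/", out_dir="mirror"):
    """Return a wget argument list that crawls start_url through an HTTP proxy
    and rewrites links so the saved pages reference each other locally.

    The proxy address and output directory are assumed defaults, not values
    taken from the chat log.
    """
    return [
        "wget",
        "--recursive",            # follow links from the start page
        "--level=inf",            # no depth limit
        "--page-requisites",      # also fetch images, CSS and scripts
        "--convert-links",        # rewrite links to point at the local copies
        "--adjust-extension",     # save HTML pages with an .html suffix
        "--directory-prefix=" + out_dir,
        "-e", "use_proxy=yes",    # send requests through the proxy
        "-e", "http_proxy=" + proxy,
        start_url,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_mirror_command("http://example.com/")))
```

Printing rather than executing keeps the sketch safe to run without wget or a proxy present; the printed line can be pasted into a shell once warc-proxy is serving the WARC.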