#archiveteam 2014-05-29,Thu


Time Nickname Message
00:05 🔗 SketchCow The thing that runs and creates metadata keywords on my internet archive items now runs 20 times more efficiently.
00:06 🔗 SketchCow So THAT's good.
00:33 🔗 exmic important, that
00:56 🔗 godane SketchCow: looks like you beat me to this: https://archive.org/details/Operating_Instructions_for_the_MCA_DiscoVision_PR-7820-System_Side_1_Only
01:03 🔗 Warchell Hello. Is there a way to convert warc file to html archive with relative link structure?
01:52 🔗 exmic the first can be done with warctozip
01:55 🔗 ivan` anyone know how many times you can hit google cache per minute before getting banned?
02:01 🔗 ivan` wish we had a distributed google cache grabber :)
06:44 🔗 Cameron_D I've used tor for that in the past
06:44 🔗 Cameron_D if I got a banned message from Google it'd restart the tor service giving me a new IP
06:47 🔗 ivan` I haven't been banned yet doing an average delay of 45 seconds
06:47 🔗 ivan` too bad I probably won't get all of the site at that rate
06:47 🔗 ivan` that's about 1800 pages per day, I wonder how long google keeps things in cache
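
The rate arithmetic in the lines above can be checked with a short sketch. The 45-second delay comes from the log; the function names `pages_per_day` and `days_to_fetch` and the 50,000-page site size are hypothetical, chosen only for illustration.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def pages_per_day(delay_seconds: float) -> int:
    """Upper bound on pages fetched per day at one request per delay_seconds."""
    if delay_seconds <= 0:
        raise ValueError("delay_seconds must be positive")
    return int(SECONDS_PER_DAY // delay_seconds)


def days_to_fetch(total_pages: int, delay_seconds: float) -> int:
    """Whole days needed to fetch total_pages at a fixed delay (rounded up)."""
    return math.ceil(total_pages * delay_seconds / SECONDS_PER_DAY)


if __name__ == "__main__":
    print(pages_per_day(45))          # delay taken from the log
    print(days_to_fetch(50_000, 45))  # 50,000 pages is a made-up site size
```

A strict 45-second delay caps out at 1,920 requests per day, so the "about 1800" figure is plausible once fetch time and any jitter are counted. How long Google keeps a page in cache is not something this sketch can answer.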
07:49 🔗 SketchCow Tomorrow's going to be awesome for EVR
07:49 🔗 SketchCow https://archive.org/details/evr_1280-41011-20110925
07:51 🔗 SketchCow We run it against all 5,300 items.
13:11 🔗 Warchell Hello. Is there a way to convert warc file to html archive with relative link structure?
13:26 🔗 SketchCow warctozip
19:52 🔗 Warchell Is anyone alive?
19:55 🔗 * exmic waves
21:32 🔗 SketchCow burp
21:33 🔗 Warchell any technical minded person alive?
21:39 🔗 dashcloud Warchell: in general, if you have a question, just ask it- it's better than asking if someone is around, and waiting for a response to that, and then a response to your actual question
21:39 🔗 Warchell I already did, seems this chat is dead most of the time
21:39 🔗 sep332 you asked the same question twice and got the same answer both times
21:41 🔗 Warchell really? I must have skipped them both times. could you paste it?
21:41 🔗 Warchell a lot of logging spam
21:42 🔗 sep332 check out warctozip
21:42 🔗 sep332 it can probably do what you're asking
21:43 🔗 dashcloud also, if you're idling, configure your client to ignore joins/parts/name changes- it cuts down a lot of the text you have to scroll through after you're back
21:43 🔗 Warchell yeah, i just did that
21:44 🔗 Warchell you convert warc to view-able html mirror for timemachine with the same tool, or custom scripts?
22:46 🔗 ivan` Warchell: maybe try https://github.com/alard/warc-proxy and using wget/wpull/httrack to crawl the site it serves
22:47 🔗 ivan` all three of those can convert links
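
A rough sketch of the suggestion above: serve the WARC through a local proxy, then crawl through that proxy with wget and let it rewrite links for offline browsing. The proxy address `http://localhost:8000`, the example site URL, and the output directory name are assumptions, not taken from the log; check the warc-proxy README for the real port. The helper name `build_wget_command` is hypothetical. The wget options themselves are standard ones.

```python
def build_wget_command(site_url, proxy="http://localhost:8000", out_dir="mirror"):
    """Build a wget command that mirrors site_url through an HTTP proxy.

    proxy and out_dir are illustrative defaults (assumptions), not values
    confirmed by the log or by warc-proxy's documentation.
    """
    return [
        "wget",
        "--mirror",               # recursive download with timestamping
        "--page-requisites",      # also fetch images, CSS, and scripts
        "--convert-links",        # rewrite links so the copy works offline
        "--adjust-extension",     # save HTML pages with an .html suffix
        "--directory-prefix", out_dir,
        "-e", "use_proxy=yes",    # send requests through the proxy
        "-e", f"http_proxy={proxy}",
        site_url,
    ]


if __name__ == "__main__":
    cmd = build_wget_command("http://example.com/")
    print(" ".join(cmd))
    # To actually crawl, with the proxy running and the WARC loaded:
    #   import subprocess; subprocess.run(cmd, check=True)
```

The command is only printed here, so nothing is fetched until the proxy is confirmed to be serving the archive. wpull and httrack have their own link-conversion options and would need different arguments.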
22:52 🔗 Warchell i'm gonna try it, thanks
