Time |
Nickname |
Message |
00:05
|
SketchCow |
The thing that runs and creates metadata keywords on my Internet Archive items now runs 20 times more efficiently. |
00:06
|
SketchCow |
So THAT's good. |
00:33
|
exmic |
important, that |
00:56
|
godane |
SketchCow: looks like you beat me to this: https://archive.org/details/Operating_Instructions_for_the_MCA_DiscoVision_PR-7820-System_Side_1_Only |
01:03
|
Warchell |
Hello. Is there a way to convert a WARC file to an HTML archive with a relative link structure? |
01:52
|
exmic |
the first can be done with warctozip |
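warctozip's own command line isn't shown in this log, so here is only a rough illustration of the general idea (not warctozip's actual code): a minimal, stdlib-only Python sketch that reads an uncompressed WARC and writes the body of each `response` record into a ZIP, one file per URL. The function names are hypothetical, and the sketch assumes well-formed records, ignores query strings, and does not decode chunked HTTP bodies.

```python
import io  # io.BytesIO lets you run warc_to_zip entirely in memory
import zipfile
from urllib.parse import urlsplit


def iter_warc_records(stream):
    """Yield (headers, block) pairs from an *uncompressed* WARC byte stream.

    Sketch only: assumes well-formed records and no record segmentation.
    """
    while True:
        line = stream.readline()
        if not line:
            return                      # end of file
        if not line.strip():
            continue                    # blank separator lines between records
        headers = {}                    # `line` was the "WARC/1.x" version line
        while True:
            line = stream.readline()
            if line in (b"", b"\r\n", b"\n"):
                break                   # blank line ends the header section
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
        block = stream.read(int(headers["content-length"]))
        yield headers, block


def url_to_path(url):
    """Map a URL to a name inside the ZIP: host/path, index.html for directories.

    Query strings are dropped, so ?a=1 and ?a=2 would collide.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"
    return parts.netloc + path


def http_body(block):
    """Strip the HTTP status line and headers (chunked bodies are not decoded)."""
    return block.partition(b"\r\n\r\n")[2]


def warc_to_zip(warc_stream, zip_stream):
    """Write the body of every 'response' record into a ZIP archive."""
    with zipfile.ZipFile(zip_stream, "w") as zf:
        for headers, block in iter_warc_records(warc_stream):
            if headers.get("warc-type") == "response":
                zf.writestr(url_to_path(headers["warc-target-uri"]), http_body(block))
```

Usage would be something like `with open("site.warc", "rb") as f, open("site.zip", "wb") as out: warc_to_zip(f, out)`. For a `.warc.gz`, `gzip.open(path, "rb")` reads the per-record gzip members back to back, so it can be passed in place of the plain file. Note this only unpacks the files; the links inside them still point at the live site.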
01:55
|
ivan` |
anyone know how many times you can hit google cache per minute before getting banned? |
02:01
|
ivan` |
wish we had a distributed google cache grabber :) |
06:44
|
Cameron_D |
I've used tor for that in the past |
06:44
|
Cameron_D |
if I got a banned message from Google it'd restart the tor service giving me a new IP |
06:47
|
ivan` |
I haven't been banned yet doing an average delay of 45 seconds |
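A randomized pause that averages 45 seconds, as ivan` describes, can be sketched like this. The uniform 30 to 60 second window is an assumption (the log only gives the average), and `fetch` is a hypothetical stand-in for whatever HTTP client you use; `sleep` is injectable so the loop can be tested without actually waiting.

```python
import random
import time


def polite_fetch_all(urls, fetch, mean_delay=45.0, spread=15.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing a random interval between requests.

    Each pause is drawn uniformly from [mean_delay - spread, mean_delay + spread],
    so on average it equals mean_delay seconds.
    """
    results = {}
    for i, url in enumerate(urls):
        if i:  # no need to wait before the very first request
            sleep(random.uniform(mean_delay - spread, mean_delay + spread))
        results[url] = fetch(url)
    return results
```

Jitter makes the request pattern less regular than a fixed interval while keeping the same long-run rate.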
06:47
|
ivan` |
too bad I probably won't get all of the site at that rate |
06:47
|
ivan` |
that's about 1800 pages per day, I wonder how long google keeps things in cache |
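The arithmetic behind that estimate: a day has 86,400 seconds, so a pure 45-second delay allows 86,400 / 45 = 1,920 pages. Getting to "about 1800" implies roughly 3 seconds of extra time per request (86,400 / 48 = 1,800); that overhead figure is an inference, not something stated in the log. The 18,000-page site size in the usage note below is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def pages_per_day(avg_delay_s, request_time_s=0.0):
    """Pages fetched in 24 h when each page costs a delay plus the request itself."""
    return SECONDS_PER_DAY / (avg_delay_s + request_time_s)


def days_needed(total_pages, avg_delay_s, request_time_s=0.0):
    """Days required to fetch total_pages at that rate."""
    return total_pages / pages_per_day(avg_delay_s, request_time_s)
```

For example, `pages_per_day(45)` gives 1920.0, `pages_per_day(45, 3)` gives 1800.0, and a hypothetical 18,000-page site would take `days_needed(18000, 45, 3)` = 10.0 days.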
07:49
|
SketchCow |
Tomorrow's going to be awesome for EVR |
07:49
|
SketchCow |
https://archive.org/details/evr_1280-41011-20110925 |
07:51
|
SketchCow |
We run it against all 5,300 items. |
13:11
|
Warchell |
Hello. Is there a way to convert a WARC file to an HTML archive with a relative link structure? |
13:26
|
SketchCow |
warctozip |
19:52
|
Warchell |
Is anyone alive? |
19:55
|
* |
exmic waves |
21:32
|
SketchCow |
burp |
21:33
|
Warchell |
any technical minded person alive? |
21:39
|
dashcloud |
Warchell: in general, if you have a question, just ask it. That's better than asking whether someone is around, waiting for a reply to that, and then waiting again for an answer to your actual question |
21:39
|
Warchell |
I already did, seems this chat is dead most of the time |
21:39
|
sep332 |
you asked the same question twice and got the same answer both times |
21:41
|
Warchell |
Really? I must have skipped them both times. Could you paste it? |
21:41
|
Warchell |
a lot of logging spam |
21:42
|
sep332 |
check out warctozip |
21:42
|
sep332 |
it can probably do what you're asking |
21:43
|
dashcloud |
also, if you're idling, configure your client to ignore joins/parts/name changes; it cuts down a lot of the text you have to scroll through after you're back |
21:43
|
Warchell |
yeah, i just did that |
21:44
|
Warchell |
Do you convert WARC to a viewable HTML mirror for timemachine with the same tool, or with custom scripts? |
22:46
|
ivan` |
Warchell: maybe try https://github.com/alard/warc-proxy and use wget/wpull/httrack to crawl the site it serves |
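One way to wire up ivan`'s suggestion is to point wget at warc-proxy so that it crawls the archived copy instead of the live site. The Python sketch below only assembles a wget command line. It assumes warc-proxy is already running with your WARC loaded and listening on `127.0.0.1:8000`; that address is an assumption, so check the warc-proxy README for the real one. The wget options themselves (`--mirror`, `--convert-links`, `--adjust-extension`, `--page-requisites`, and `-e` wgetrc commands for the proxy) are standard wget flags, and the function names are hypothetical.

```python
import subprocess

# Assumption: adjust to wherever your warc-proxy instance is actually listening.
PROXY = "http://127.0.0.1:8000/"


def wget_mirror_args(start_url, proxy=PROXY):
    """Build a wget command that mirrors start_url through the given HTTP proxy."""
    return [
        "wget",
        "--mirror",            # recursive download with timestamping
        "--convert-links",     # rewrite links so the mirror works offline
        "--adjust-extension",  # add .html to HTML pages that lack it
        "--page-requisites",   # also fetch the images/CSS each page needs
        "-e", "use_proxy=yes",
        "-e", f"http_proxy={proxy}",
        start_url,
    ]


def run_mirror(start_url, proxy=PROXY):
    """Run wget; raises CalledProcessError if wget exits with an error."""
    subprocess.run(wget_mirror_args(start_url, proxy), check=True)
```

Calling `run_mirror("http://example.com/")` with the archived site's start URL would leave a browsable copy in an `example.com/` directory. Only URLs actually present in the WARC can be served, so expect some 404s for anything the original crawl missed.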
22:47
|
ivan` |
all three of those can convert links |
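"Converting links" here means rewriting each link that points at a page you saved into a path relative to the current file, so the mirror works from disk. The rough Python sketch below illustrates the idea; it is not how wget, wpull or httrack implement it. It uses a regex for brevity where a real converter should use an HTML parser, and its wget-style `host/path` layout and function names are assumptions.

```python
import posixpath
import re
from urllib.parse import urldefrag, urljoin, urlsplit

# Matches href="..." / src='...' attributes (sketch only; use an HTML parser for real work).
LINK_RE = re.compile(r"""(href|src)=(["'])(.*?)\2""", re.IGNORECASE)


def url_to_path(url):
    """Local path for a URL, wget-style: host/path, index.html for directories."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"
    return parts.netloc + path


def convert_links(html, page_url, saved_urls):
    """Rewrite links that point at saved URLs into paths relative to this page."""
    page_dir = posixpath.dirname(url_to_path(page_url))

    def repl(match):
        attr, quote, link = match.groups()
        # Resolve relative links against the page, then drop any #fragment.
        target, _fragment = urldefrag(urljoin(page_url, link))
        if target not in saved_urls:
            return match.group(0)  # not mirrored: leave the link untouched
        rel = posixpath.relpath(url_to_path(target), page_dir)
        return f"{attr}={quote}{rel}{quote}"

    return LINK_RE.sub(repl, html)
```

For a page saved at `example.com/blog/post.html`, a link to `/` becomes `../index.html`, while links to sites you didn't mirror stay absolute. Fragments are dropped in this sketch.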
22:52
|
Warchell |
i'm gonna try it, thanks |