Time | Nickname | Message |
00:18 | | adinbied has joined #warrior |
01:20 | | svchfoo1 has quit IRC (Read error: Operation timed out) |
01:21 | | svchfoo1 has joined #warrior |
01:23 | | svchfoo3 sets mode: +o svchfoo1 |
01:49 | | alex____ has quit IRC (Quit: ZZzzz) |
09:11 | | alex__ has joined #warrior |
09:34 | | nertzy has joined #warrior |
09:44 | | nertzy has quit IRC (Quit: This computer has gone to sleep) |
17:00 | | alex____ has joined #warrior |
17:01 | | alex__ has quit IRC (Ping timeout: 252 seconds) |
18:02 | adinbied | Alright, now I'm trying to figure out how to narrow down/optimize the Angelfire script - as this wget log shows: https://pastebin.com/raw/EHHCa6Uv , I think the wget recursiveness is going a little far - I have it set to no parent, but recursive set to inf - what number should I have it set to? |
18:07 | JAA | adinbied: I do use --level inf normally. But yes, this seems to follow links to external pages. I assume you have to filter those out in the Lua hook. |
18:09 | adinbied | Here's my Lua hook currently: https://github.com/adinbied/angelfire-grab/blob/master/angelfire.lua |
18:10 | adinbied | What would I change to get it to filter it so it grabs images and whatnot but doesn't keep going? |
18:14 | JAA | adinbied: That would be the 'html == 0' check in download_child_p, see e.g. https://github.com/ArchiveTeam/jamiiforums-grab/blob/d3b65c4fda98d752aa1944a73c7933968b2d3d88/jamiiforums.lua#L85 |
18:16 | JAA | I'd recommend using a script from a previous project since those already contain a bunch of things like that (e.g. also filtering out crap, see the get_urls function in that file). |
18:18 | adinbied | Whoops - realized my lua script on github is out of date -- just updated it to reflect the log I put earlier |
18:28 | adinbied | JAA, I'm a little lost in regards to how I would implement something like the jamii forums Lua into my existing code (Lua is one of the languages I don't really know that well) - would you be able to help me out and provide an example? |
18:28 | adinbied | The table.insert part of the wget.callbacks.get_urls in my Lua is what actually parses the sitemap and grabs everything from it -- I think |
19:09 | JAA | adinbied: You can probably just keep these lines https://github.com/adinbied/angelfire-grab/blob/00b2ae85c1fbd851cfeff4a24f5e7918d4e6a116/angelfire.lua#L6-L14 and perhaps add a 'return urls' at the end of that if block, and add that at the beginning of the get_urls function of whichever script you use as a template. And also copy over the httploop_result changes. Remember to remove the site-specific code |
19:09 | JAA | from the template you use. |
19:10 | JAA | (This last part is why it would be very useful to have a generic warrior-template repository, and arkiver was working on one a while ago IIRC.) |
19:11 | arkiver | yes |
19:18 | | sep332 has quit IRC (Read error: Connection reset by peer) |
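
The two hook changes JAA suggests (filtering child links in download_child_p, and returning early from get_urls once the sitemap has been handled) can be sketched outside wget. What follows is a rough Python sketch of the decision logic only, not the Lua hook code from either repository. Every name and signature in it (`SITE_HOST`, `allowed`, `should_download`, `get_urls`) is hypothetical, as are the `www.angelfire.com` host, the `sitemap.xml` suffix, and the `<loc>` sitemap layout. Reading the 'html == 0' check as "the link is not expected to be an HTML page" is also an assumption.

```python
import re
from urllib.parse import urlsplit

# Assumption: the host being archived. Adjust to the real target.
SITE_HOST = "www.angelfire.com"

# Assumption: the sitemap lists pages in <loc>...</loc> elements.
LOC_RE = re.compile(r"<loc>\s*(.*?)\s*</loc>", re.S)


def allowed(url):
    """Return True when the URL is on the site being archived."""
    return (urlsplit(url).hostname or "") == SITE_HOST


def should_download(url, expect_html, seen):
    """Decide whether a discovered link should be queued.

    On-site links are always followed. Off-site links are taken only
    when they are not expected to be HTML pages (images and other page
    requisites), so recursion does not continue into external sites.
    """
    if url in seen:
        return False
    if allowed(url) or not expect_html:
        seen.add(url)
        return True
    return False


def get_urls(url, body):
    """Return extra URLs to queue for a fetched document."""
    urls = []
    if url.endswith("sitemap.xml"):
        for loc in LOC_RE.findall(body):
            urls.append({"url": loc})
        # Early return: skip the template's generic extraction below.
        return urls
    # The template's own link extraction would go here.
    return urls
```

With a rule like this, an off-site image linked from an Angelfire page is still fetched, while an off-site HTML page is skipped, which is the behaviour adinbied asks about at 18:10.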