[00:18] *** adinbied has joined #warrior
[01:20] *** svchfoo1 has quit IRC (Read error: Operation timed out)
[01:21] *** svchfoo1 has joined #warrior
[01:23] *** svchfoo3 sets mode: +o svchfoo1
[01:49] *** alex____ has quit IRC (Quit: ZZzzz)
[09:11] *** alex__ has joined #warrior
[09:34] *** nertzy has joined #warrior
[09:44] *** nertzy has quit IRC (Quit: This computer has gone to sleep)
[17:00] *** alex____ has joined #warrior
[17:01] *** alex__ has quit IRC (Ping timeout: 252 seconds)
[18:02] Alright, now I'm trying to figure out how to narrow down/optimize the Angelfire script - as this wget log shows: https://pastebin.com/raw/EHHCa6Uv , I think the wget recursiveness is going a little far - I have it set to no parent, but recursive set to inf - what number should I have it set to?
[18:07] adinbied: I do use --level inf normally. But yes, this seems to follow links to external pages. I assume you have to filter those out in the Lua hook.
[18:09] Here's my Lua hook currently: https://github.com/adinbied/angelfire-grab/blob/master/angelfire.lua
[18:10] What would I change to get it to filter it so it grabs images and whatnot but doesn't keep going?
[18:14] adinbied: That would be the 'html == 0' check in download_child_p, see e.g. https://github.com/ArchiveTeam/jamiiforums-grab/blob/d3b65c4fda98d752aa1944a73c7933968b2d3d88/jamiiforums.lua#L85
[18:16] I'd recommend using a script from a previous project since those already contain a bunch of things like that (e.g. also filtering out crap, see the get_urls function in that file).
[18:18] Whoops - realized my lua script on github is out of date -- just updated it to reflect the log I put earlier
[18:28] JAA, I'm a little lost in regards to how I would implement something like the jamii forums Lua into my existing code (Lua is one of the languages I don't really know that well) - would you be able to help me out and provide an example?
[18:28] The table.insert part of the wget.callbacks.get_urls in my Lua is what actually parses the sitemap and grabs everything from it -- I think
[19:09] adinbied: You can probably just keep these lines https://github.com/adinbied/angelfire-grab/blob/00b2ae85c1fbd851cfeff4a24f5e7918d4e6a116/angelfire.lua#L6-L14 and perhaps add a 'return urls' at the end of that if block, and add that at the beginning of the get_urls function of whichever script you use as a template. And also copy over the httploop_result changes. Remember to remove the site-specific code
[19:09] from the template you use.
[19:10] (This last part is why it would be very useful to have a generic warrior-template repository, and arkiver was working on one a while ago IIRC.)
[19:11] yes
[19:18] *** sep332 has quit IRC (Read error: Connection reset by peer)
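
Editor's note: the 'html == 0' check mentioned at 18:14 can be sketched roughly as below. This is a minimal illustration, not the code from either linked repository. The allowed() helper, its www.angelfire.com pattern, and the downloaded/addedtolist tables are hypothetical names chosen for the example; the callback signature and the urlpos fields follow the pattern commonly seen in ArchiveTeam wget-lua scripts and should be checked against the template you actually use.

```lua
-- Stub so the sketch also loads under a plain Lua interpreter; inside
-- wget-lua the global `wget` table is expected to exist already.
wget = wget or {}
wget.callbacks = wget.callbacks or {}

-- Hypothetical bookkeeping tables, named for this example only.
local downloaded = {}   -- URLs that have already been fetched
local addedtolist = {}  -- URLs that have already been queued

-- Hypothetical scope rule (an assumption): only www.angelfire.com is in scope.
local function allowed(url)
  return string.match(url, "^https?://www%.angelfire%.com/") ~= nil
end

wget.callbacks.download_child_p = function(urlpos, parent, depth, start_url_parsed, iri, verdict, reason)
  local url = urlpos["url"]["url"]
  local html = urlpos["link_expect_html"]

  -- Never queue the same URL twice.
  if downloaded[url] or addedtolist[url] then
    return false
  end

  -- In-scope URLs are always followed. Out-of-scope URLs are followed only
  -- when wget does not expect HTML behind the link (html == 0), which in
  -- practice means page requisites such as images and stylesheets.
  if allowed(url) or html == 0 then
    addedtolist[url] = true
    return true
  end

  -- Everything else (external HTML pages) is skipped, which stops the
  -- recursion from wandering off-site.
  return false
end
```

With a rule like this, --level inf can stay as it is: external images embedded in a page are still grabbed, but links to external HTML pages are rejected, so wget has nothing further to recurse into on those hosts.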