#warrior 2018-11-03,Sat


Time Nickname Message
00:18 🔗 adinbied has joined #warrior
01:20 🔗 svchfoo1 has quit IRC (Read error: Operation timed out)
01:21 🔗 svchfoo1 has joined #warrior
01:23 🔗 svchfoo3 sets mode: +o svchfoo1
01:49 🔗 alex____ has quit IRC (Quit: ZZzzz)
09:11 🔗 alex__ has joined #warrior
09:34 🔗 nertzy has joined #warrior
09:44 🔗 nertzy has quit IRC (Quit: This computer has gone to sleep)
17:00 🔗 alex____ has joined #warrior
17:01 🔗 alex__ has quit IRC (Ping timeout: 252 seconds)
18:02 🔗 adinbied Alright, now I'm trying to figure out how to narrow down/optimize the Angelfire script - as this wget log shows: https://pastebin.com/raw/EHHCa6Uv , I think the wget recursiveness is going a little far - I have it set to no parent, but recursive set to inf - what number should I have it set to?
18:07 🔗 JAA adinbied: I do use --level inf normally. But yes, this seems to follow links to external pages. I assume you have to filter those out in the Lua hook.
18:09 🔗 adinbied Here's my Lua hook currently: https://github.com/adinbied/angelfire-grab/blob/master/angelfire.lua
18:10 🔗 adinbied What would I change to get it to filter it so it grabs images and whatnot but doesn't keep going?
18:14 🔗 JAA adinbied: That would be the 'html == 0' check in download_child_p, see e.g. https://github.com/ArchiveTeam/jamiiforums-grab/blob/d3b65c4fda98d752aa1944a73c7933968b2d3d88/jamiiforums.lua#L85
18:16 🔗 JAA I'd recommend using a script from a previous project since those already contain a bunch of things like that (e.g. also filtering out crap, see the get_urls function in that file).
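
[Editor's sketch] The 'html == 0' check JAA mentions can be illustrated roughly as follows. This is a minimal sketch, not the jamiiforums script itself: `allowed` and `should_download` are hypothetical helper names, and the rule that only angelfire.com hosts are in scope is an assumption. The idea is that links expected to lead to HTML pages are followed only when they are in scope, while page requisites such as images (where `link_expect_html` is 0) are always fetched.

```lua
-- Hypothetical helper: is this URL on an in-scope host?
-- Assumption: only angelfire.com and its subdomains should be crawled.
local function allowed(url)
  local host = string.match(url, "^https?://([^/]+)")
  if host == nil then
    return false
  end
  return host == "angelfire.com" or string.sub(host, -14) == ".angelfire.com"
end

-- Hypothetical helper: decide whether a discovered link should be queued.
-- expect_html mirrors urlpos["link_expect_html"]: 0 for page requisites
-- (images, stylesheets), non-zero for links expected to lead to HTML pages.
local function should_download(url, expect_html, seen)
  if seen[url] then
    return false -- already queued once
  end
  if allowed(url) or expect_html == 0 then
    seen[url] = true
    return true
  end
  return false -- external page: do not recurse into it
end

-- Register the hook only when running inside wget-lua,
-- where the global `wget` table exists.
if wget ~= nil then
  local seen = {}
  wget.callbacks.download_child_p = function(urlpos, parent, depth, start_url_parsed, iri, verdict, reason)
    return should_download(urlpos["url"]["url"], urlpos["link_expect_html"], seen)
  end
end
```

With this shape, `--level inf` can stay as it is: the depth limit no longer matters because external HTML pages are rejected before wget recurses into them.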
18:18 🔗 adinbied Whoops - realized my lua script on github is out of date -- just updated it to reflect the log I put earlier
18:28 🔗 adinbied JAA, I'm a little lost in regards to how I would implement something like the jamii forums Lua into my existing code (Lua is one of the languages I don't really know that well) - would you be able to help me out and provide an example?
18:28 🔗 adinbied The table.insert part of the wget.callbacks.get_urls in my Lua is what actually parses the sitemap and grabs everything from it -- I think
19:09 🔗 JAA adinbied: You can probably just keep these lines https://github.com/adinbied/angelfire-grab/blob/00b2ae85c1fbd851cfeff4a24f5e7918d4e6a116/angelfire.lua#L6-L14 and perhaps add a 'return urls' at the end of that if block, and add that at the beginning of the get_urls function of whichever script you use as a template. And also copy over the httploop_result changes. Remember to remove the site-specific code
19:09 🔗 JAA from the template you use.
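
[Editor's sketch] JAA's suggestion (handle the sitemap at the top of get_urls and return early) might look roughly like the following. This is a sketch under stated assumptions, not the contents of angelfire.lua: `extract_sitemap_urls` and `get_urls_sketch` are hypothetical names, and it is assumed that the sitemap URL ends in `sitemap.xml` and lists pages in `<loc>` elements.

```lua
-- Hypothetical helper: collect every <loc> entry from a sitemap body.
local function extract_sitemap_urls(body)
  local urls = {}
  for loc in string.gmatch(body, "<loc>%s*(.-)%s*</loc>") do
    -- wget-lua expects a list of tables with a `url` field
    table.insert(urls, { url = loc })
  end
  return urls
end

-- Hypothetical core of a get_urls callback, split out so it can be
-- exercised without wget.
local function get_urls_sketch(url, body)
  -- Assumption: sitemap URLs end in "sitemap.xml".
  if string.match(url, "sitemap%.xml$") then
    -- The early 'return urls': skip the template's generic extraction.
    return extract_sitemap_urls(body)
  end
  local urls = {}
  -- The template's own link extraction and filtering would go here.
  return urls
end

-- Register the hook only when running inside wget-lua.
if wget ~= nil then
  wget.callbacks.get_urls = function(file, url, is_css, iri)
    local f = io.open(file, "rb")
    if f == nil then
      return {}
    end
    local body = f:read("*a")
    f:close()
    return get_urls_sketch(url, body)
  end
end
```

Returning early keeps the sitemap handling separate from whatever site-specific extraction the template carries, which makes that template code easier to strip out.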
19:10 🔗 JAA (This last part is why it would be very useful to have a generic warrior-template repository, and arkiver was working on one a while ago IIRC.)
19:11 🔗 arkiver yes
19:18 🔗 sep332 has quit IRC (Read error: Connection reset by peer)
