#warrior 2014-08-26,Tue


Time Nickname Message
07:01 🔗 GLaDOS [universal-tracker] yipdw force-pushed weighted-upload-targets from 24d38db to 090f4a2: https://github.com/ArchiveTeam/universal-tracker/commits/weighted-upload-targets
07:01 🔗 GLaDOS universal-tracker/weighted-upload-targets 090f4a2 David Yip: Add score management to upload targets page. #20.
07:01 🔗 GLaDOS universal-tracker/weighted-upload-targets d01948e David Yip: Implement weighted random selection. #20.
20:28 🔗 aaaaaaaaa what do you guys think of using a locked page on the wiki to communicate messages and preset commands to the warriors?
20:32 🔗 yipdw I'd prefer the tracker
20:32 🔗 yipdw not sure why we would want two authorization systems
20:33 🔗 yipdw plus the overhead of remembering whether or not wikimarkup will render right etc
20:33 🔗 yipdw also, preset commands sounds dangerous
20:34 🔗 aaaaaaaaa I was thinking simple text messages and something like reboot the warrior in a half an hour.
20:34 🔗 aaaaaaaaa plus, I thought you said to not use the tracker.
20:34 🔗 aaaaaaaaa maybe I misunderstood
20:35 🔗 aaaaaaaaa plus, commands would be preset, rather than allowing arbitrary ones; the latter of which I agree is dangerous
20:36 🔗 yipdw oh, right, you could avoid shoving that in the tracker
20:36 🔗 yipdw that said maybe it's a better place to put it
20:38 🔗 yipdw not sure about giving the tracker that much more power over warrior nodes
20:39 🔗 yipdw keep in mind that there's not really any way to have a warrior authenticate the tracker and vice versa
20:39 🔗 yipdw the most damage that can be done now is someone feeding bogus items
20:40 🔗 aaaaaaaaa slight aside: The warrior already resets itself every so often, doesn't it? I think I saw that somewhere in there.
20:41 🔗 yipdw yeah
20:41 🔗 yipdw but that is not the same as responding to a reboot command from some remote thing that claims to be a tracker
20:42 🔗 phuzion yipdw: https handles that no problem
20:42 🔗 aaaaaaaaa I totally get that, just trying to feel out the terrain
20:42 🔗 yipdw assuming you get the certificates set up
20:43 🔗 aaaaaaaaa could pin them, plus they already have to trust the warrior download provided isn't malicious
20:44 🔗 phuzion setting up the infrastructure for warrior-side HTTPS client certificates would be a PITA, but sending the commands over HTTPS to control warriors shouldn't be a big deal.
20:45 🔗 yipdw I guess, I'm just not a big fan of remote control via tracker
20:46 🔗 yipdw or remote control period
20:46 🔗 yipdw it seems like a problem that's mostly solved via IRC
20:46 🔗 aaaaaaaaa If they are using IRC....
20:47 🔗 aaaaaaaaa but I get your point.
20:47 🔗 yipdw but I also have a very bad reaction to the sort of hyper-optimized products that are the thing in silicon valley these days so it is possible that I am just a luddite
20:49 🔗 aaaaaaaaa luddites have their place, regardless
20:49 🔗 yipdw at least the warrior is not a $200 internet-of-shit-enabled cup
20:50 🔗 aaaaaaaaa but hey through science, that can apparently identify what brand of pop you are drinking; despite the fact that in itself, such a sensor would be a breakthrough
20:52 🔗 yipdw but, on a somewhat related note, I do think a git-pull-and-restart wrapper is a nifty idea
20:53 🔗 aaaaaaaaa me too, my thought is just stuck jobs like the infinite loops we've been hitting lately. I don't know if a wrapper would fix that.
20:53 🔗 aaaaaaaaa but I am open to the idea I am attempting to over-engineer
20:54 🔗 yipdw aaaaaaaaa: a wrapper wouldn't fix that
20:54 🔗 yipdw it wouldn't be able to forcibly terminate a job
20:54 🔗 yipdw that one's a bit tricky
20:56 🔗 yipdw archivebot has a way to add URL ignore patterns on the fly, and that's how we deal with infinite loops there, but there's a lot of infrastructure needed for that and nobody wants that in the warrior
20:57 🔗 aaaaaaaaa Which is why I thought forcing reboots. For example, we have no idea how many of the old jobs are still going against swipnet.
20:57 🔗 aaaaaaaaa archivebot on warrior does seem overkill
20:57 🔗 yipdw yeah, forcing reboots would work
20:57 🔗 yipdw well hmm
20:57 🔗 yipdw actually...
20:58 🔗 yipdw you'd want to run a background task in the pipeline to check that
20:58 🔗 aaaaaaaaa but I also see your concerns about people losing control of their own machines or the DOS potential against warriors
20:58 🔗 yipdw having warriors pull commands over HTTPS would give you good enough protection against DDoS
20:58 🔗 yipdw er DOS
20:59 🔗 yipdw you would probably not want this auto-reboot thing on non-warrior pipelines
20:59 🔗 aaaaaaaaa oh no question
21:00 🔗 aaaaaaaaa manual runners are already their own thing
21:00 🔗 yipdw but that's okay -- it's possible to set up a task that only runs in a warrior context
21:00 🔗 yipdw yeah, actually remote reboot would be okay
21:00 🔗 aaaaaaaaa which is why a wrapper is already needed
21:00 🔗 yipdw I still find it squicky
21:00 🔗 yipdw but maybe it's not that bad
21:01 🔗 yipdw adding HTTPS with pinned certificates does mean that someone has to commit to keeping the Warrior image up-to-date
21:01 🔗 yipdw (hasn't had an update in two years :P)
21:01 🔗 yipdw also links to the Warrior need to point to the latest image
21:01 🔗 yipdw otherwise you run into the possibility of certificate check failure
21:02 🔗 yipdw that or we set up an ArchiveTeam CA or something (yikes)
21:02 🔗 yipdw HTTPS brings all sorts of unpleasantness
21:02 🔗 aaaaaaaaa just FYI, most of the technical details of ssl/tls are internet magic to me
21:03 🔗 yipdw if we really decide to do this we will need to work out some procedures to keep certificate verification on Warriors working when it should, and not working when it shouldn't
21:04 🔗 yipdw that or we say "meh, issue commands over HTTP" but I don't really want to go that route, mostly because it's embarrassing in this day and age
21:05 🔗 yipdw we can do certificate pinning but then you need to either commit to long-lived certificates and hope nothing bad happens
21:05 🔗 yipdw or require periodic certificate store refreshes
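The pinning option yipdw describes could look roughly like the sketch below: the warrior ships with a list of known certificate fingerprints and refuses commands from any server whose certificate is not on the list. The function names (`fingerprint`, `is_pinned`, `fetch_server_cert_der`) and the SHA-256 fingerprint format are assumptions made for illustration and are not taken from the warrior code. The sketch also shows why the image has to be kept current: once the server's certificate is replaced, the check fails until the pinned list is refreshed.

```python
import hashlib
import hmac
import ssl


def fingerprint(der_bytes):
    """Return the SHA-256 fingerprint of a DER-encoded certificate as lowercase hex."""
    return hashlib.sha256(der_bytes).hexdigest()


def is_pinned(der_bytes, pinned_fingerprints):
    """True if the certificate's fingerprint matches one of the pinned values.

    Pinned values may be written with or without colons, in either case.
    """
    actual = fingerprint(der_bytes)
    for pinned in pinned_fingerprints:
        normalized = pinned.replace(":", "").lower()
        if hmac.compare_digest(actual, normalized):
            return True
    return False


def fetch_server_cert_der(host, port=443):
    """Fetch the certificate a server presents and return it DER-encoded.

    Hypothetical helper: a real client would check the pin on the same
    connection it then uses to download commands.
    """
    pem = ssl.get_server_certificate((host, port))
    return ssl.PEM_cert_to_DER_cert(pem)
```

A caller would run `is_pinned(fetch_server_cert_der(host), PINS)` before trusting anything from `host`, where `PINS` is a hypothetical list baked into the warrior image.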
21:05 🔗 yipdw we can set up a CA chain for Archive Team if someone wants to maintain those keys
21:05 🔗 yipdw actually, the CA chain is irrelevant
21:05 🔗 chfoo why not commit commands into github?
21:06 🔗 yipdw you could do that too
21:06 🔗 aaaaaaaaa oh
21:06 🔗 aaaaaaaaa hardcode a repo.
21:06 🔗 aaaaaaaaa cause github will handle certificates for us
21:07 🔗 aaaaaaaaa wait, are github's code updates over ssl/tls?
21:07 🔗 yipdw yeah
21:08 🔗 yipdw yeah, doing that would save us from the https madness
21:08 🔗 yipdw we already trust github anyway
21:08 🔗 aaaaaaaaa brb, got a fire
21:28 🔗 aaaaaaaaa so to check my understanding. using github would fix the issue of securing commands to the warriors; and the task of looking for new restart commands can be limited to warriors directly, and not need signalling from the pipeline process to the warrior process.
21:28 🔗 aaaaaaaaa with messages, would you want those on github, or something else?
21:30 🔗 aaaaaaaaa plus there needs to be a git-pull restart-pipeline after so many tasks wrapper as well?
21:40 🔗 aaaaaaaaa so many items
22:40 🔗 yipdw aaaaaaaaa: would probably be easiest to have the pipeline periodically check for new commands
22:40 🔗 yipdw also I think this should be implemented in a project first and later extracted
22:40 🔗 yipdw going the other way smells like Java
22:41 🔗 yipdw also, since we're trusting github and repo committers anyway, "commands" could really just be a shell script
22:43 🔗 aaaaaaaaa ok, I'll focus my thinking that way
22:44 🔗 yipdw one nice thing about doing this all as git repos is that no new UI is needed
22:44 🔗 yipdw i.e. no new tracker admin screens, no new endpoints
22:45 🔗 yipdw just additional pipeline tasks and/or periodic tasks installed in the pipeline event loop
22:45 🔗 aaaaaaaaa plus we can update code without having to rebuild a brand new warrior
22:45 🔗 yipdw yeah
22:46 🔗 aaaaaaaaa I was thinking of like the current check IP, only every 10 minutes instead of 10 items
22:46 🔗 yipdw you may want to add it as a periodic task in the event loop
22:46 🔗 yipdw if you don't, there are situations where you will be unable to force-reboot
22:46 🔗 yipdw unless that's what you meant
22:46 🔗 aaaaaaaaa oh. ok.
22:47 🔗 yipdw on the other hand, adding something like that means that there is no synchronization between commands and pipeline items
22:48 🔗 yipdw so don't try doing file operations on WARCs in command scripts
22:48 🔗 yipdw depends on use case for commands I guess
22:48 🔗 aaaaaaaaa I'll have to look at the code some more. part of the problem is the language barrier and the other is not having a complete picture of the pipeline
22:48 🔗 aaaaaaaaa but I think the only time a forced reboot is necessary is on stuck items or code updates (until a wrapper is made)
22:49 🔗 aaaaaaaaa or DDoS of website
22:49 🔗 yipdw not sure how a wrapper would help
22:49 🔗 yipdw a wrapper could only pull once the pipeline stopped
22:49 🔗 yipdw if you had an infinite loop the pipeline wouldn't stop
22:50 🔗 yipdw you *can* git pull while a pipeline is running but active wget processes will not reload the script
22:50 🔗 aaaaaaaaa right, that is what I meant by stuck item, is basically any problem that keeps them from finishing
22:50 🔗 aaaaaaaaa so a wrapper would make restarting for regular code updates unnecessary.
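A minimal sketch of the git-pull-and-restart wrapper discussed here, assuming the pipeline command exits on its own after some number of items. The function name `run_forever`, its parameters, and the injectable `run` argument are hypothetical. As yipdw notes, the wrapper only pulls once the pipeline has stopped, so it handles routine code updates but does nothing for a stuck item.

```python
import subprocess


def run_forever(repo_dir, pipeline_cmd, max_rounds=None, run=subprocess.call):
    """Alternate between `git pull` and running the pipeline.

    repo_dir     -- checkout of the project code (assumed to be a git repo)
    pipeline_cmd -- argument list that starts the pipeline and blocks until it exits
    max_rounds   -- stop after this many rounds; None means loop indefinitely
    run          -- callable used to execute commands (replaceable for testing)
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        # Update the code first. The return code is ignored so that a failed
        # pull (for example, no network) does not stop the warrior from working.
        run(["git", "pull"], cwd=repo_dir)
        # Blocks until the pipeline exits. If an item never finishes, this
        # call never returns and no further pull happens.
        run(pipeline_cmd, cwd=repo_dir)
        rounds += 1
    return rounds
```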
22:50 🔗 aaaaaaaaa maybe we are just talking past each other
22:51 🔗 yipdw oh, I just meant the stuck item case
22:51 🔗 yipdw anyway, pipeline uses tornado's event loop, so may also want to check that out
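The periodic check suggested above might be sketched as follows: poll the head commit of a hardcoded commands repository and react when it changes. `CommandWatcher`, `parse_ls_remote`, `remote_head`, and `install` are hypothetical names, and the ten-minute interval comes from aaaaaaaaa's suggestion. `tornado.ioloop.PeriodicCallback` is Tornado's own class and takes its interval in milliseconds. How this would be wired into the actual pipeline code is not shown and is left as an assumption.

```python
import subprocess


def parse_ls_remote(output):
    """Extract the commit hash from the first line of `git ls-remote` output."""
    lines = output.strip().splitlines()
    return lines[0].split()[0] if lines else None


def remote_head(repo_url, branch="master"):
    """Ask the remote for the current commit of `branch` without cloning."""
    out = subprocess.check_output(
        ["git", "ls-remote", repo_url, "refs/heads/" + branch])
    return parse_ls_remote(out.decode())


class CommandWatcher:
    """Calls `on_change(new_head)` whenever the remote head differs from the last one seen."""

    def __init__(self, fetch_head, on_change):
        self.fetch_head = fetch_head
        self.on_change = on_change
        self.last_seen = None

    def check(self):
        head = self.fetch_head()
        if head is None:
            return False
        if self.last_seen is None:
            # First successful check: record the baseline, run nothing.
            self.last_seen = head
            return False
        if head == self.last_seen:
            return False
        self.last_seen = head
        self.on_change(head)
        return True


def install(watcher, interval_ms=10 * 60 * 1000):
    """Schedule watcher.check on the Tornado event loop every interval_ms."""
    from tornado.ioloop import PeriodicCallback  # imported here so the rest runs without Tornado
    callback = PeriodicCallback(watcher.check, interval_ms)
    callback.start()
    return callback
```

Because the callback runs on the event loop independently of items, it is not synchronized with the pipeline, which is why command scripts should not touch WARC files. `remote_head` also blocks while git runs, so a real version would need to keep that call short or move it off the loop.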
