[07:01] [universal-tracker] yipdw force-pushed weighted-upload-targets from 24d38db to 090f4a2: https://github.com/ArchiveTeam/universal-tracker/commits/weighted-upload-targets
[07:01] universal-tracker/weighted-upload-targets 090f4a2 David Yip: Add score management to upload targets page. #20.
[07:01] universal-tracker/weighted-upload-targets d01948e David Yip: Implement weighted random selection. #20.
[20:28] what do you guys think of using a locked page on the wiki to communicate messages and preset commands to the warriors?
[20:32] I'd prefer the tracker
[20:32] not sure why we would want two authorization systems
[20:33] plus the overhead of remembering whether or not wikimarkup will render right etc
[20:33] also, preset commands sounds dangerous
[20:34] I was thinking simple text messages and something like "reboot the warrior in half an hour".
[20:34] plus, I thought you said not to use the tracker.
[20:34] maybe I misunderstood
[20:35] plus, commands would be preset, rather than allowing arbitrary ones; I agree the latter is dangerous
[20:36] oh, right, you could avoid shoving that in the tracker
[20:36] that said, maybe it's a better place to put it
[20:38] not sure about giving the tracker that much more power over warrior nodes
[20:39] keep in mind that there's not really any way to have a warrior authenticate the tracker and vice versa
[20:39] the most damage that can be done now is someone feeding bogus items
[20:40] slight aside: the warrior already resets itself every so often, doesn't it? I think I saw that somewhere in there.
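The weighted-upload-targets branch pushed above implements weighted random selection for upload targets. The branch's actual code isn't shown here, but the standard cumulative-sum technique it names can be sketched like this (function and argument names are hypothetical):

```python
import random

def pick_weighted(weights, rng=random):
    """Pick a key at random with probability proportional to its weight.

    `weights` maps upload target -> positive weight, and is assumed
    non-empty. Walk the cumulative sum of weights and return the first
    target whose cumulative total reaches a uniform random threshold.
    """
    total = sum(weights.values())
    threshold = rng.uniform(0, total)
    cumulative = 0.0
    for target, weight in weights.items():
        cumulative += weight
        if threshold <= cumulative:
            return target
    return target  # guard against float round-off at the top end
```

A target with weight 3 is then chosen roughly three times as often as one with weight 1, which is the behavior a tracker wants when some upload targets have more capacity than others.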
[20:41] yeah
[20:41] but that is not the same as responding to a reboot command from some remote thing that claims to be a tracker
[20:42] yipdw: https handles that no problem
[20:42] I totally get that, just trying to feel out the terrain
[20:42] assuming you get the certificates set up
[20:43] could pin them, plus they already have to trust that the warrior download provided isn't malicious
[20:44] setting up the infrastructure for warrior-side HTTPS client certificates would be a PITA, but sending the commands over HTTPS to control warriors shouldn't be a big deal.
[20:45] I guess, I'm just not a big fan of remote control via tracker
[20:46] or remote control, period
[20:46] it seems like a problem that's mostly solved via IRC
[20:46] if they are using IRC...
[20:47] but I get your point.
[20:47] but I also have a very bad reaction to the sort of hyper-optimized products that are the thing in Silicon Valley these days, so it is possible that I am just a luddite
[20:49] luddites have their place, regardless
[20:49] at least the warrior is not a $200 internet-of-shit-enabled cup
[20:50] but hey, through science, one that can apparently identify what brand of pop you are drinking; despite the fact that such a sensor would in itself be a breakthrough
[20:52] but, on a somewhat related note, I do think a git-pull-and-restart wrapper is a nifty idea
[20:53] me too; my thought is just stuck jobs like the infinite loops we've been hitting lately. I don't know if a wrapper would fix that.
[20:53] but I am open to the idea that I am attempting to over-engineer
[20:54] aaaaaaaaa: a wrapper wouldn't fix that
[20:54] it wouldn't be able to forcibly terminate a job
[20:54] that one's a bit tricky
[20:56] archivebot has a way to add URL ignore patterns on the fly, and that's how we deal with infinite loops there, but there's a lot of infrastructure needed for that and nobody wants that in the warrior
[20:57] which is why I thought forcing reboots. For example, we have no idea how many of the old jobs are still going against swipnet.
[20:57] archivebot on warrior does seem overkill
[20:57] yeah, forcing reboots would work
[20:57] well hmm
[20:57] actually...
[20:58] you'd want to run a background task in the pipeline to check that
[20:58] but I also see your concerns about people losing control of their own machines, or the DoS potential against warriors
[20:58] having warriors pull commands over HTTPS would give you good enough protection against DDoS
[20:58] er, DoS
[20:59] you would probably not want this auto-reboot thing on non-warrior pipelines
[20:59] oh, no question
[21:00] manual runners are already their own thing
[21:00] but that's okay -- it's possible to set up a task that only runs in a warrior context
[21:00] yeah, actually remote reboot would be okay
[21:00] which is why a wrapper is already needed
[21:00] I still find it squicky
[21:00] but maybe it's not that bad
[21:01] adding HTTPS with pinned certificates does mean that someone has to commit to keeping the Warrior image up to date
[21:01] (hasn't had an update in two years :P)
[21:01] also, links to the Warrior need to point to the latest image
[21:01] otherwise you run into the possibility of certificate check failure
[21:02] that, or we set up an ArchiveTeam CA or something (yikes)
[21:02] HTTPS brings all sorts of unpleasantness
[21:02] just FYI, most of the technical details of SSL/TLS are internet magic to me
[21:03] if we really decide to do this we will need to work out some procedures to keep certificate verification on Warriors working when it should, and not working when it shouldn't
[21:04] that, or we say "meh, issue commands over HTTP", but I don't really want to go that route, mostly because it's embarrassing in this day and age
[21:05] we can do certificate pinning, but then you need to either commit to long-lived certificates and hope nothing bad happens
[21:05] or require periodic certificate store refreshes
[21:05] we can set up a CA chain for Archive Team if someone wants to maintain those keys
[21:05] actually, the CA chain is irrelevant
[21:05] why not commit commands into github?
[21:06] you could do that too
[21:06] oh
[21:06] hardcode a repo.
[21:06] 'cause github will handle certificates for us
[21:07] wait, are github's code updates over SSL/TLS?
[21:07] yeah
[21:08] yeah, doing that would save us from the HTTPS madness
[21:08] we already trust github anyway
[21:08] brb, got a fire
[21:28] so, to check my understanding: using github would fix the issue of securing commands to the warriors, and the task of looking for new restart commands can be limited to warriors directly, without needing signalling from the pipeline process to the warrior process.
[21:28] with messages, would you want those on github, or something else?
[21:30] plus there needs to be a git-pull, restart-pipeline-after-so-many-tasks wrapper as well?
[21:40] so many items
[22:40] aaaaaaaaa: it would probably be easiest to have the pipeline periodically check for new commands
[22:40] also, I think this should be implemented in a project first and later extracted
[22:40] going the other way smells like Java
[22:41] also, since we're trusting github and repo committers anyway, "commands" could really just be a shell script
[22:43] ok, I'll focus my thinking that way
[22:44] one nice thing about doing this all as git repos is that no new UI is needed
[22:44] i.e. no new tracker admin screens, no new endpoints
[22:45] just additional pipeline tasks and/or periodic tasks installed in the pipeline event loop
[22:45] plus we can update code without having to rebuild a brand-new warrior
[22:45] yeah
[22:46] I was thinking of something like the current IP check, only every 10 minutes instead of every 10 items
[22:46] you may want to add it as a periodic task in the event loop
[22:46] if you don't, there are situations where you will be unable to force-reboot
[22:46] unless that's what you meant
[22:46] oh. ok.
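The certificate pinning weighed in the log above (pin the tracker's certificate in the Warrior image, refuse commands on mismatch) reduces to a fingerprint comparison. A minimal sketch; the hostname and digest here are placeholders, and a real warrior would feed in the DER certificate obtained from `SSLSocket.getpeercert(binary_form=True)` during the handshake:

```python
import hashlib

# Hypothetical pin store shipped with the Warrior image:
# hostname -> SHA-256 hex digest of the expected DER-encoded certificate.
# The digest below is a placeholder computed from dummy bytes.
PINNED = {
    "tracker.example.org": hashlib.sha256(b"placeholder-der-cert").hexdigest(),
}

def cert_matches_pin(host, der_cert):
    """Return True only if der_cert hashes to the pin recorded for host.

    An unknown host or a digest mismatch both mean the same thing:
    refuse the command channel.
    """
    expected = PINNED.get(host)
    if expected is None:
        return False
    return hashlib.sha256(der_cert).hexdigest() == expected
```

This also makes the operational worry above concrete: when the pinned certificate rotates, every Warrior image still carrying the old digest starts failing this check, which is why long-lived certificates or periodic pin-store refreshes come up.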
[22:47] on the other hand, adding something like that means that there is no synchronization between commands and pipeline items
[22:48] so don't try doing file operations on WARCs in command scripts
[22:48] depends on the use case for commands, I guess
[22:48] I'll have to look at the code some more. part of the problem is the language barrier, and the other is not having a complete picture of the pipeline
[22:48] but I think the only time a forced reboot is necessary is on stuck items or code updates (until a wrapper is made)
[22:49] or a DDoS of a website
[22:49] not sure how a wrapper would help
[22:49] a wrapper could only pull once the pipeline stopped
[22:49] if you had an infinite loop, the pipeline wouldn't stop
[22:50] you *can* git pull while a pipeline is running, but active wget processes will not reload the script
[22:50] right, that is what I meant by stuck item: basically any problem that keeps them from finishing
[22:50] so a wrapper would make restarting for regular code updates unnecessary.
[22:50] maybe we are just talking past each other
[22:51] oh, I just meant the stuck item case
[22:51] anyway, the pipeline uses tornado's event loop, so you may also want to check that out
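The design the channel converges on, pull a hardcoded GitHub repo on a timer and, if HEAD moved, run a committed shell script, can be sketched as below. The repo path, script name, and interval are hypothetical, and a real pipeline would register the check as a periodic task on tornado's event loop (`ioloop.PeriodicCallback`) rather than use a timer thread, so it still fires while an item is stuck mid-download:

```python
import subprocess
import threading

REPO_DIR = "/home/warrior/commands"  # hypothetical clone of the hardcoded repo
CHECK_INTERVAL = 600  # seconds: "every 10 minutes instead of every 10 items"

def check_for_commands(run=subprocess.check_output):
    """Pull the command repo; if HEAD moved, run the committed shell script.

    Returns True if a new commit was found. Since we already trust GitHub
    and the repo's committers, a "command" is just a shell script in the
    repo. `run` is injected so the decision logic can be exercised
    without a real git checkout.
    """
    def git(*args):
        return run(("git", "-C", REPO_DIR) + args).strip()

    old = git("rev-parse", "HEAD")
    git("pull", "--ff-only")
    new = git("rev-parse", "HEAD")
    if new != old:
        run(("bash", "commands.sh"))  # hypothetical script name in the repo
    return new != old

def schedule_checks():
    """Stand-in for a tornado ioloop.PeriodicCallback in the real pipeline."""
    check_for_commands()
    timer = threading.Timer(CHECK_INTERVAL, schedule_checks)
    timer.daemon = True
    timer.start()
```

Because this runs independently of item processing (the [22:47] caveat), the script must not touch in-flight WARCs; rebooting, restarting the pipeline, or updating code are the safe kinds of command.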