Time |
Nickname |
Message |
07:01
🔗
|
GLaDOS |
[universal-tracker] yipdw force-pushed weighted-upload-targets from 24d38db to 090f4a2: https://github.com/ArchiveTeam/universal-tracker/commits/weighted-upload-targets |
07:01
🔗
|
GLaDOS |
universal-tracker/weighted-upload-targets 090f4a2 David Yip: Add score management to upload targets page. #20. |
07:01
🔗
|
GLaDOS |
universal-tracker/weighted-upload-targets d01948e David Yip: Implement weighted random selection. #20. |
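The commit above mentions weighted random selection of upload targets. A minimal Python sketch of the general technique follows; it is not the tracker's actual code, and the function name `pick_target`, the example URLs, and the score values are all hypothetical.

```python
import random


def pick_target(scores, rng=random):
    """Pick one key from `scores` with probability proportional to its score.

    `scores` maps an upload target (hypothetical URLs here) to a
    non-negative number. Targets with a score of 0 are never chosen.
    """
    targets = [t for t, s in scores.items() if s > 0]
    if not targets:
        raise ValueError("no upload target has a positive score")
    weights = [scores[t] for t in targets]
    # random.choices draws with replacement, weighted by `weights`.
    return rng.choices(targets, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Assumed example data: the second target should receive roughly
    # three times as many uploads as the first; the third is disabled.
    example = {
        "rsync://target-a.example/upload": 1,
        "rsync://target-b.example/upload": 3,
        "rsync://target-c.example/upload": 0,
    }
    print(pick_target(example))
```

Setting a target's score to 0 takes it out of rotation without deleting it, which is one plausible reason to manage scores from an admin page.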
20:28
🔗
|
aaaaaaaaa |
what do you guys think of using a locked page on the wiki to communicate messages and preset commands to the warriors? |
20:32
🔗
|
yipdw |
I'd prefer the tracker |
20:32
🔗
|
yipdw |
not sure why we would want two authorization systems |
20:33
🔗
|
yipdw |
plus the overhead of remembering whether or not wikimarkup will render right etc |
20:33
🔗
|
yipdw |
also, preset commands sounds dangerous |
20:34
🔗
|
aaaaaaaaa |
I was thinking simple text messages and something like reboot the warrior in half an hour. |
20:34
🔗
|
aaaaaaaaa |
plus, I thought you said to not use the tracker. |
20:34
🔗
|
aaaaaaaaa |
maybe I misunderstood |
20:35
🔗
|
aaaaaaaaa |
plus, commands would be preset, rather than allowing arbitrary ones; the latter of which I agree is dangerous |
20:36
🔗
|
yipdw |
oh, right, you could avoid shoving that in the tracker |
20:36
🔗
|
yipdw |
that said maybe it's a better place to put it |
20:38
🔗
|
yipdw |
not sure about giving the tracker that much more power over warrior nodes |
20:39
🔗
|
yipdw |
keep in mind that there's not really any way to have a warrior authenticate the tracker and vice versa |
20:39
🔗
|
yipdw |
the most damage that can be done now is someone feeding bogus items |
20:40
🔗
|
aaaaaaaaa |
slight aside: The warrior already resets itself every so often, doesn't it? I think I saw that somewhere in there. |
20:41
🔗
|
yipdw |
yeah |
20:41
🔗
|
yipdw |
but that is not the same as responding to a reboot command from some remote thing that claims to be a tracker |
20:42
🔗
|
phuzion |
yipdw: https handles that no problem |
20:42
🔗
|
aaaaaaaaa |
I totally get that, just trying to feel out the terrain |
20:42
🔗
|
yipdw |
assuming you get the certificates set up |
20:43
🔗
|
aaaaaaaaa |
could pin them, plus they already have to trust the warrior download provided isn't malicious |
20:44
🔗
|
phuzion |
setting up the infrastructure for warrior-side HTTPS client certificates would be a PITA, but sending the commands over HTTPS to control warriors shouldn't be a big deal. |
20:45
🔗
|
yipdw |
I guess, I'm just not a big fan of remote control via tracker |
20:46
🔗
|
yipdw |
or remote control period |
20:46
🔗
|
yipdw |
it seems like a problem that's mostly solved via IRC |
20:46
🔗
|
aaaaaaaaa |
If they are using IRC.... |
20:47
🔗
|
aaaaaaaaa |
but I get your point. |
20:47
🔗
|
yipdw |
but I also have a very bad reaction to the sort of hyper-optimized products that are the thing in silicon valley these days so it is possible that I am just a luddite |
20:49
🔗
|
aaaaaaaaa |
luddites have their place, regardless |
20:49
🔗
|
yipdw |
at least the warrior is not a $200 internet-of-shit-enabled cup |
20:50
🔗
|
aaaaaaaaa |
but hey through science, that can apparently identify what brand of pop you are drinking; despite the fact that in itself, such a sensor would be a breakthrough |
20:52
🔗
|
yipdw |
but, on a somewhat related note, I do think a git-pull-and-restart wrapper is a nifty idea |
20:53
🔗
|
aaaaaaaaa |
me too, my thought is just stuck jobs like the infinite loops we've been hitting lately. I don't know if a wrapper would fix that. |
20:53
🔗
|
aaaaaaaaa |
but I am open to the idea I am attempting to over-engineer |
20:54
🔗
|
yipdw |
aaaaaaaaa: a wrapper wouldn't fix that |
20:54
🔗
|
yipdw |
it wouldn't be able to forcibly terminate a job |
20:54
🔗
|
yipdw |
that one's a bit tricky |
20:56
🔗
|
yipdw |
archivebot has a way to add URL ignore patterns on the fly, and that's how we deal with infinite loops there, but there's a lot of infrastructure needed for that and nobody wants that in the warrior |
20:57
🔗
|
aaaaaaaaa |
Which is why I thought forcing reboots. For example, we have no idea how many of the old jobs are still going against swipnet. |
20:57
🔗
|
aaaaaaaaa |
archivebot on warrior does seem overkill |
20:57
🔗
|
yipdw |
yeah, forcing reboots would work |
20:57
🔗
|
yipdw |
well hmm |
20:57
🔗
|
yipdw |
actually... |
20:58
🔗
|
yipdw |
you'd want to run a background task in the pipeline to check that |
20:58
🔗
|
aaaaaaaaa |
but I also see your concerns about people losing control of their own machines or the DOS potential against warriors |
20:58
🔗
|
yipdw |
having warriors pull commands over HTTPS would give you good enough protection against DDoS |
20:58
🔗
|
yipdw |
er DOS |
20:59
🔗
|
yipdw |
you would probably not want this auto-reboot thing on non-warrior pipelines |
20:59
🔗
|
aaaaaaaaa |
oh no question |
21:00
🔗
|
aaaaaaaaa |
manual runners are already their own thing |
21:00
🔗
|
yipdw |
but that's okay -- it's possible to set up a task that only runs in a warrior context |
21:00
🔗
|
yipdw |
yeah, actually remote reboot would be okay |
21:00
🔗
|
aaaaaaaaa |
which is why a wrapper is already needed |
21:00
🔗
|
yipdw |
I still find it squicky |
21:00
🔗
|
yipdw |
but maybe it's not that bad |
21:01
🔗
|
yipdw |
adding HTTPS with pinned certificates does mean that someone has to commit to keeping the Warrior image up-to-date |
21:01
🔗
|
yipdw |
(hasn't had an update in two years :P) |
21:01
🔗
|
yipdw |
also links to the Warrior need to point to the latest image |
21:01
🔗
|
yipdw |
otherwise you run into the possibility of certificate check failure |
21:02
🔗
|
yipdw |
that or we set up an ArchiveTeam CA or something (yikes) |
21:02
🔗
|
yipdw |
HTTPS brings all sorts of unpleasantness |
21:02
🔗
|
aaaaaaaaa |
just FYI, most of the technical details of ssl/tls are internet magic to me |
21:03
🔗
|
yipdw |
if we really decide to do this we will need to work out some procedures to keep certificate verification on Warriors working when it should, and not working when it shouldn't |
21:04
🔗
|
yipdw |
that or we say "meh, issue commands over HTTP" but I don't really want to go that route, mostly because it's embarrassing in this day and age |
21:05
🔗
|
yipdw |
we can do certificate pinning but then you need to either commit to long-lived certificates and hope nothing bad happens |
21:05
🔗
|
yipdw |
or require periodic certificate store refreshes |
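The pinning trade-off described here can be made concrete with a small Python sketch. It assumes pinning is done by comparing the SHA-256 fingerprint of the server's DER-encoded certificate against a set of fingerprints shipped with the client; the function names and the idea of shipping several pins are assumptions for illustration, not anything the Warrior does today.

```python
import hashlib
import hmac


def fingerprint(der_cert: bytes) -> str:
    """Return the lowercase hex SHA-256 fingerprint of a DER certificate."""
    return hashlib.sha256(der_cert).hexdigest()


def is_pinned(der_cert: bytes, pinned_fingerprints) -> bool:
    """True if the certificate matches any fingerprint in the pinned set.

    Accepting more than one pin lets a replacement certificate be shipped
    ahead of time, so rotating certificates does not immediately break
    clients that have not refreshed their pin list.
    """
    actual = fingerprint(der_cert)
    return any(
        hmac.compare_digest(actual, pin.lower()) for pin in pinned_fingerprints
    )


if __name__ == "__main__":
    # Stand-in bytes; a real client could obtain the DER bytes from
    # ssl.SSLSocket.getpeercert(binary_form=True) after the handshake.
    fake_cert = b"not a real certificate"
    pins = {fingerprint(fake_cert)}
    print(is_pinned(fake_cert, pins))
    print(is_pinned(b"some other certificate", pins))
```

Either way, the pin list lives inside the client image, which is why the conversation keeps returning to who would keep that image up to date.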
21:05
🔗
|
yipdw |
we can set up a CA chain for Archive Team if someone wants to maintain those keys |
21:05
🔗
|
yipdw |
actually, the CA chain is irrelevant |
21:05
🔗
|
chfoo |
why not commit commands into github? |
21:06
🔗
|
yipdw |
you could do that too |
21:06
🔗
|
aaaaaaaaa |
oh |
21:06
🔗
|
aaaaaaaaa |
hardcode a repo. |
21:06
🔗
|
aaaaaaaaa |
cause github will handle certificates for us |
21:07
🔗
|
aaaaaaaaa |
wait, are github's code updates over ssl/tls? |
21:07
🔗
|
yipdw |
yeah |
21:08
🔗
|
yipdw |
yeah, doing that would save us from the https madness |
21:08
🔗
|
yipdw |
we already trust github anyway |
21:08
🔗
|
aaaaaaaaa |
brb, got a fire |
21:28
🔗
|
aaaaaaaaa |
so to check my understanding. using github would fix the issue of securing commands to the warriors; and the task of looking for new restart commands can be limited to warriors directly, and not need signalling from the pipeline process to the warrior process. |
21:28
🔗
|
aaaaaaaaa |
with messages, would you want those on github, or something else? |
21:30
🔗
|
aaaaaaaaa |
plus there needs to be a git-pull restart-pipeline after so many tasks wrapper as well? |
21:40
🔗
|
aaaaaaaaa |
so many items |
22:40
🔗
|
yipdw |
aaaaaaaaa: would probably be easiest to have the pipeline periodically check for new commands |
22:40
🔗
|
yipdw |
also I think this should be implemented in a project first and later extracted |
22:40
🔗
|
yipdw |
going the other way smells like Java |
22:41
🔗
|
yipdw |
also, since we're trusting github and repo committers anyway, "commands" could really just be a shell script |
22:43
🔗
|
aaaaaaaaa |
ok, I'll focus my thinking that way |
22:44
🔗
|
yipdw |
one nice thing about doing this all as git repos is that no new UI is needed |
22:44
🔗
|
yipdw |
i.e. no new tracker admin screens, no new endpoints |
22:45
🔗
|
yipdw |
just additional pipeline tasks and/or periodic tasks installed in the pipeline event loop |
22:45
🔗
|
aaaaaaaaa |
plus we can update code without having to rebuild a brand new warrior |
22:45
🔗
|
yipdw |
yeah |
22:46
🔗
|
aaaaaaaaa |
I was thinking of like the current check IP, only every 10 minutes instead of 10 items |
22:46
🔗
|
yipdw |
you may want to add it as a periodic task in the event loop |
22:46
🔗
|
yipdw |
if you don't, there are situations where you will be unable to force-reboot |
22:46
🔗
|
yipdw |
unless that's what you meant |
22:46
🔗
|
aaaaaaaaa |
oh. ok. |
22:47
🔗
|
yipdw |
on the other hand, adding something like that means that there is no synchronization between commands and pipeline items |
22:48
🔗
|
yipdw |
so don't try doing file operations on WARCs in command scripts |
22:48
🔗
|
yipdw |
depends on use case for commands I guess |
22:48
🔗
|
aaaaaaaaa |
I'll have to look at the code some more. part of the problem is the language barrier and the other is not having a complete picture of the pipeline |
22:48
🔗
|
aaaaaaaaa |
but I think the only time a forced reboot is necessary is on stuck items or code updates (until a wrapper is made) |
22:49
🔗
|
aaaaaaaaa |
or DDoS of website |
22:49
🔗
|
yipdw |
not sure how a wrapper would help |
22:49
🔗
|
yipdw |
a wrapper could only pull once the pipeline stopped |
22:49
🔗
|
yipdw |
if you had an infinite loop the pipeline wouldn't stop |
22:50
🔗
|
yipdw |
you *can* git pull while a pipeline is running but active wget processes will not reload the script |
22:50
🔗
|
aaaaaaaaa |
right, that is what I meant by stuck item, is basically any problem that keeps them from finishing |
22:50
🔗
|
aaaaaaaaa |
so a wrapper would make restarting for regular code updates unnecessary. |
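A git-pull-and-restart wrapper of the kind discussed here might look like the following Python sketch. The function name, the `max_runs` parameter, the repository path, and the pipeline command are all placeholders. As noted in the conversation, the wrapper only pulls once the pipeline process has exited, so it covers regular code updates but does nothing for a pipeline stuck in an infinite loop.

```python
import subprocess
import time


def run_forever(repo_dir, pipeline_cmd, max_runs=None,
                run=subprocess.call, sleep=time.sleep):
    """Repeatedly update the project checkout, then run the pipeline.

    `run` and `sleep` are injectable so the loop can be exercised
    without touching git or starting a real pipeline.
    Returns the number of completed pipeline runs.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        # Fast-forward only, so a diverged checkout is left untouched
        # instead of being merged silently.
        run(["git", "pull", "--ff-only"], cwd=repo_dir)
        # Blocks until the pipeline exits on its own.
        run(pipeline_cmd, cwd=repo_dir)
        runs += 1
        # Short pause so a pipeline that crashes instantly does not spin.
        sleep(5)
    return runs


if __name__ == "__main__":
    # Dry run with a stand-in runner that only prints each command.
    def show(cmd, cwd=None):
        print(cwd, cmd)
        return 0

    run_forever("/path/to/project", ["placeholder-pipeline-command"],
                max_runs=1, run=show, sleep=lambda seconds: None)
```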
22:50
🔗
|
aaaaaaaaa |
maybe we are just talking past each other |
22:51
🔗
|
yipdw |
oh, I just meant the stuck item case |
22:51
🔗
|
yipdw |
anyway, pipeline uses tornado's event loop, so may also want to check that out |
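The periodic command check discussed above could be sketched in Python as a small poller that remembers the last commit it saw in a commands repository. The class name and both callbacks are hypothetical; how the head commit is fetched (for example by parsing `git ls-remote` output) and what "running a command" means are left to the caller.

```python
class CommandPoller:
    """Detect new commits in a commands repository and react to them.

    `fetch_head` returns an identifier for the current head commit.
    `on_new_command` is called with that identifier when it changes.
    Both are supplied by the caller; nothing here talks to git directly.
    """

    def __init__(self, fetch_head, on_new_command):
        self._fetch_head = fetch_head
        self._on_new_command = on_new_command
        self._last_seen = None

    def poll(self):
        """Check once. Returns True if a new command was dispatched."""
        try:
            head = self._fetch_head()
        except Exception:
            # A failed fetch (network down, etc.) is not a command.
            return False
        if self._last_seen is None:
            # First successful poll only records a baseline, so a freshly
            # started warrior does not replay an old command.
            self._last_seen = head
            return False
        if head == self._last_seen:
            return False
        self._last_seen = head
        self._on_new_command(head)
        return True


if __name__ == "__main__":
    heads = iter(["abc123", "abc123", "def456"])
    poller = CommandPoller(lambda: next(heads),
                           lambda head: print("new command at", head))
    for _ in range(3):
        poller.poll()
```

Because `poll` takes no arguments, it could be scheduled on tornado's event loop with `tornado.ioloop.PeriodicCallback(poller.poll, 10 * 60 * 1000).start()`, using the 10-minute interval floated earlier. Scheduled that way it runs independently of item processing, which is the lack of synchronization yipdw warns about.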