#warrior 2014-08-26,Tue


Time	Nickname	Message
07:01 ^🔗	GLaDOS	[universal-tracker] yipdw force-pushed weighted-upload-targets from 24d38db to 090f4a2: https://github.com/ArchiveTeam/universal-tracker/commits/weighted-upload-targets
07:01 ^🔗	GLaDOS	universal-tracker/weighted-upload-targets 090f4a2 David Yip: Add score management to upload targets page. #20.
07:01 ^🔗	GLaDOS	universal-tracker/weighted-upload-targets d01948e David Yip: Implement weighted random selection. #20.
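
The commits above mention scores on upload targets and weighted random selection. The log does not show how universal-tracker implements it, so the following is only a minimal sketch of the general technique in Python. The function name, the dict-of-scores shape, and the example URLs are all assumptions made for illustration.

```python
import random


def pick_upload_target(targets, rng=random):
    """Pick one target with probability proportional to its score.

    `targets` maps a (hypothetical) upload URL to a non-negative score.
    Targets with a score of zero are never chosen.
    """
    candidates = [(url, score) for url, score in targets.items() if score > 0]
    if not candidates:
        raise ValueError("no upload target has a positive score")
    total = sum(score for _, score in candidates)
    threshold = rng.uniform(0, total)
    running = 0.0
    for url, score in candidates:
        running += score
        if threshold <= running:
            return url
    return candidates[-1][0]  # guards against floating-point rounding


if __name__ == "__main__":
    # Hypothetical targets: the second should be chosen about 9 times in 10.
    print(pick_upload_target({"rsync://a.example/": 1, "rsync://b.example/": 9}))
```

Setting a score to zero takes a target out of rotation without deleting it, which is one reason a score-based scheme can be convenient to manage from an admin page.
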
20:28 ^🔗	aaaaaaaaa	what do you guys think of using a locked page on the wiki to communicate messages and preset commands to the warriors?
20:32 ^🔗	yipdw	I'd prefer the tracker
20:32 ^🔗	yipdw	not sure why we would want two authorization systems
20:33 ^🔗	yipdw	plus the overhead of remembering whether or not wikimarkup will render right etc
20:33 ^🔗	yipdw	also, preset commands sounds dangerous
20:34 ^🔗	aaaaaaaaa	I was thinking simple text messages and something like reboot the warrior in half an hour.
20:34 ^🔗	aaaaaaaaa	plus, I thought you said to not use the tracker.
20:34 ^🔗	aaaaaaaaa	maybe I misunderstood
20:35 ^🔗	aaaaaaaaa	plus, commands would be preset, rather than allowing arbitrary ones; the latter of which I agree is dangerous
20:36 ^🔗	yipdw	oh, right, you could avoid shoving that in the tracker
20:36 ^🔗	yipdw	that said maybe it's a better place to put it
20:38 ^🔗	yipdw	not sure about giving the tracker that much more power over warrior nodes
20:39 ^🔗	yipdw	keep in mind that there's not really any way to have a warrior authenticate the tracker and vice versa
20:39 ^🔗	yipdw	the most damage that can be done now is someone feeding bogus items
20:40 ^🔗	aaaaaaaaa	slight aside: The warrior already resets itself every so often, doesn't it? I think I saw that somewhere in there.
20:41 ^🔗	yipdw	yeah
20:41 ^🔗	yipdw	but that is not the same as responding to a reboot command from some remote thing that claims to be a tracker
20:42 ^🔗	phuzion	yipdw: https handles that no problem
20:42 ^🔗	aaaaaaaaa	I totally get that, just trying to feel out the terrain
20:42 ^🔗	yipdw	assuming you get the certificates set up
20:43 ^🔗	aaaaaaaaa	could pin them, plus they already have to trust the warrior download provided isn't malicious
20:44 ^🔗	phuzion	setting up the infrastructure for warrior-side HTTPS client certificates would be a PITA, but sending the commands over HTTPS to control warriors shouldn't be a big deal.
20:45 ^🔗	yipdw	I guess, I'm just not a big fan of remote control via tracker
20:46 ^🔗	yipdw	or remote control period
20:46 ^🔗	yipdw	it seems like a problem that's mostly solved via IRC
20:46 ^🔗	aaaaaaaaa	If they are using IRC....
20:47 ^🔗	aaaaaaaaa	but I get your point.
20:47 ^🔗	yipdw	but I also have a very bad reaction to the sort of hyper-optimized products that are the thing in silicon valley these days so it is possible that I am just a luddite
20:49 ^🔗	aaaaaaaaa	luddites have their place, regardless
20:49 ^🔗	yipdw	at least the warrior is not a $200 internet-of-shit-enabled cup
20:50 ^🔗	aaaaaaaaa	but hey through science, that can apparently identify what brand of pop you are drinking; despite the fact that in itself, such a sensor would be a breakthrough
20:52 ^🔗	yipdw	but, on a somewhat related note, I do think a git-pull-and-restart wrapper is a nifty idea
20:53 ^🔗	aaaaaaaaa	me too, my thought is just stuck jobs like the infinite loops we've been hitting lately. I don't know if a wrapper would fix that.
20:53 ^🔗	aaaaaaaaa	but I am open to the idea I am attempting to over-engineer
20:54 ^🔗	yipdw	aaaaaaaaa: a wrapper wouldn't fix that
20:54 ^🔗	yipdw	it wouldn't be able to forcibly terminate a job
20:54 ^🔗	yipdw	that one's a bit tricky
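
A rough sketch of the git-pull-and-restart wrapper idea, and of why it cannot help with a stuck job. This is an illustration under assumptions, not an existing Warrior component: the function name, the injectable `run` parameter, and the pipeline command line are hypothetical.

```python
import subprocess


def run_forever(repo_dir, pipeline_cmd, run=subprocess.call, max_cycles=None):
    """Pull the latest project code, run the pipeline, repeat.

    `run` is injectable so the loop can be exercised without git.
    Returns the number of completed cycles (only reachable when
    max_cycles is set).
    """
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        # Update the checkout; if the pull fails we simply run the old code.
        run(["git", "pull", "--ff-only"], cwd=repo_dir)
        # Blocks until the pipeline exits on its own. If an item is stuck in
        # an infinite loop, this call never returns and no pull ever happens.
        run(pipeline_cmd, cwd=repo_dir)
        cycles += 1
    return cycles
```

The wrapper only regains control when the pipeline process ends, so it covers routine code updates but has no way to terminate a running item.
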
20:56 ^🔗	yipdw	archivebot has a way to add URL ignore patterns on the fly, and that's how we deal with infinite loops there, but there's a lot of infrastructure needed for that and nobody wants that in the warrior
20:57 ^🔗	aaaaaaaaa	Which is why I thought forcing reboots. For example, we have no idea how many of the old jobs are still going against swipnet.
20:57 ^🔗	aaaaaaaaa	archivebot on warrior does seem overkill
20:57 ^🔗	yipdw	yeah, forcing reboots would work
20:57 ^🔗	yipdw	well hmm
20:57 ^🔗	yipdw	actually...
20:58 ^🔗	yipdw	you'd want to run a background task in the pipeline to check that
20:58 ^🔗	aaaaaaaaa	but I also see your concerns about people losing control of their own machines or the DOS potential against warriors
20:58 ^🔗	yipdw	having warriors pull commands over HTTPS would give you good enough protection against DDoS
20:58 ^🔗	yipdw	er DOS
20:59 ^🔗	yipdw	you would probably not want this auto-reboot thing on non-warrior pipelines
20:59 ^🔗	aaaaaaaaa	oh no question
21:00 ^🔗	aaaaaaaaa	manual runners are already their own thing
21:00 ^🔗	yipdw	but that's okay -- it's possible to set up a task that only runs in a warrior context
21:00 ^🔗	yipdw	yeah, actually remote reboot would be okay
21:00 ^🔗	aaaaaaaaa	which is why a wrapper is already needed
21:00 ^🔗	yipdw	I still find it squicky
21:00 ^🔗	yipdw	but maybe it's not that bad
21:01 ^🔗	yipdw	adding HTTPS with pinned certificates does mean that someone has to commit to keeping the Warrior image up-to-date
21:01 ^🔗	yipdw	(hasn't had an update in two years :P)
21:01 ^🔗	yipdw	also links to the Warrior need to point to the latest image
21:01 ^🔗	yipdw	otherwise you run into the possibility of certificate check failure
21:02 ^🔗	yipdw	that or we set up an ArchiveTeam CA or something (yikes)
21:02 ^🔗	yipdw	HTTPS brings all sorts of unpleasantness
21:02 ^🔗	aaaaaaaaa	just FYI, most of the technical details of ssl/tls are internet magic to me
21:03 ^🔗	yipdw	if we really decide to do this we will need to work out some procedures to keep certificate verification on Warriors working when it should, and not working when it shouldn't
21:04 ^🔗	yipdw	that or we say "meh, issue commands over HTTP" but I don't really want to go that route, mostly because it's embarrassing in this day and age
21:05 ^🔗	yipdw	we can do certificate pinning but then you need to either commit to long-lived certificates and hope nothing bad happens
21:05 ^🔗	yipdw	or require periodic certificate store refreshes
21:05 ^🔗	yipdw	we can set up a CA chain for Archive Team if someone wants to maintain those keys
21:05 ^🔗	yipdw	actually, the CA chain is irrelevant
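
For readers unfamiliar with certificate pinning, a minimal sketch of the check being discussed: the client ships with a fixed set of certificate fingerprints and refuses a server whose certificate is not in that set. The pin value, the function names, and the choice to pin the whole leaf certificate (rather than its public key) are assumptions for illustration only.

```python
import hashlib
import socket
import ssl

# Hypothetical pin set baked into the image: hex SHA-256 of the server's
# DER-encoded certificate. The value below is a placeholder, not a real pin.
PINNED_SHA256 = {"0" * 64}


def fingerprint(der_cert):
    """Return the hex SHA-256 fingerprint of a DER-encoded certificate."""
    return hashlib.sha256(der_cert).hexdigest()


def is_pinned(der_cert, pins=PINNED_SHA256):
    """True if the certificate's fingerprint is one of the pinned values."""
    return fingerprint(der_cert) in pins


def fetch_peer_cert(host, port=443, timeout=10):
    """Connect over TLS with normal CA verification; return the peer cert as DER bytes."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert(binary_form=True)
```

Because the pin set lives inside the image, replacing the server certificate means every deployed Warrior needs a refreshed pin set, which is the maintenance burden raised above.
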
21:05 ^🔗	chfoo	why not commit commands into github?
21:06 ^🔗	yipdw	you could do that too
21:06 ^🔗	aaaaaaaaa	oh
21:06 ^🔗	aaaaaaaaa	hardcode a repo.
21:06 ^🔗	aaaaaaaaa	cause github will handle certificates for us
21:07 ^🔗	aaaaaaaaa	wait, are github's code updates over ssl/tls?
21:07 ^🔗	yipdw	yeah
21:08 ^🔗	yipdw	yeah, doing that would save us from the https madness
21:08 ^🔗	yipdw	we already trust github anyway
21:08 ^🔗	aaaaaaaaa	brb, got a fire
21:28 ^🔗	aaaaaaaaa	so to check my understanding. using github would fix the issue of securing commands to the warriors; and the task of looking for new restart commands can be limited to warriors directly, and not need signalling from the pipeline process to the warrior process.
21:28 ^🔗	aaaaaaaaa	with messages, would you want those on github, or something else?
21:30 ^🔗	aaaaaaaaa	plus there needs to be a git-pull restart-pipeline after so many tasks wrapper as well?
21:40 ^🔗	aaaaaaaaa	so many items
22:40 ^🔗	yipdw	aaaaaaaaa: would probably be easiest to have the pipeline periodically check for new commands
22:40 ^🔗	yipdw	also I think this should be implemented in a project first and later extracted
22:40 ^🔗	yipdw	going the other way smells like Java
22:41 ^🔗	yipdw	also, since we're trusting github and repo committers anyway, "commands" could really just be a shell script
22:43 ^🔗	aaaaaaaaa	ok, I'll focus my thinking that way
22:44 ^🔗	yipdw	one nice thing about doing this all as git repos is that no new UI is needed
22:44 ^🔗	yipdw	i.e. no new tracker admin screens, no new endpoints
22:45 ^🔗	yipdw	just additional pipeline tasks and/or periodic tasks installed in the pipeline event loop
22:45 ^🔗	aaaaaaaaa	plus we can update code without having to rebuild a brand new warrior
22:45 ^🔗	yipdw	yeah
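
One way the "commands live in a git repo" idea could look, as a sketch under assumptions. The remote branch name `origin/master`, the script name `commands.sh`, and both function names are hypothetical; nothing here is existing Warrior or seesaw code.

```python
import subprocess


def _git(repo_dir, *args, run=subprocess.check_output):
    """Run a git subcommand in repo_dir and return its trimmed text output."""
    return run(["git", *args], cwd=repo_dir).decode().strip()


def new_command_revision(repo_dir, last_seen, run=subprocess.check_output):
    """Fetch the commands repo; return the remote revision if it changed, else None."""
    _git(repo_dir, "fetch", "origin", run=run)
    remote = _git(repo_dir, "rev-parse", "origin/master", run=run)
    return remote if remote != last_seen else None


def apply_commands(repo_dir, revision, run=subprocess.check_call):
    """Check out exactly the fetched revision, then run the (hypothetical) script."""
    run(["git", "checkout", "--quiet", revision], cwd=repo_dir)
    run(["sh", "commands.sh"], cwd=repo_dir)
```

Transport security comes from fetching over HTTPS from GitHub, and authorization comes from who can commit to the repo, so no certificates or admin screens need to be maintained separately.
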
22:46 ^🔗	aaaaaaaaa	I was thinking of like the current check IP, only every 10 minutes instead of 10 items
22:46 ^🔗	yipdw	you may want to add it as a periodic task in the event loop
22:46 ^🔗	yipdw	if you don't, there are situations where you will be unable to force-reboot
22:46 ^🔗	yipdw	unless that's what you meant
22:46 ^🔗	aaaaaaaaa	oh. ok.
22:47 ^🔗	yipdw	on the other hand, adding something like that means that there is no synchronization between commands and pipeline items
22:48 ^🔗	yipdw	so don't try doing file operations on WARCs in command scripts
22:48 ^🔗	yipdw	depends on use case for commands I guess
22:48 ^🔗	aaaaaaaaa	I'll have to look at the code some more. part of the problem is the language barrier and the other is not having a complete picture of the pipeline
22:48 ^🔗	aaaaaaaaa	but I think the only time a forced reboot is necessary is on stuck items or code updates (until a wrapper is made)
22:49 ^🔗	aaaaaaaaa	or DDoS of website
22:49 ^🔗	yipdw	not sure how a wrapper would help
22:49 ^🔗	yipdw	a wrapper could only pull once the pipeline stopped
22:49 ^🔗	yipdw	if you had an infinite loop the pipeline wouldn't stop
22:50 ^🔗	yipdw	you can git pull while a pipeline is running but active wget processes will not reload the script
22:50 ^🔗	aaaaaaaaa	right, that is what I meant by stuck item, is basically any problem that keeps them from finishing
22:50 ^🔗	aaaaaaaaa	so a wrapper would make restarting for regular code updates unnecessary.
22:50 ^🔗	aaaaaaaaa	maybe we are just talking past each other
22:51 ^🔗	yipdw	oh, I just meant the stuck item case
22:51 ^🔗	yipdw	anyway, pipeline uses tornado's event loop, so may also want to check that out
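
A sketch of the "periodic task in the event loop" approach, as opposed to a check that runs once per item. The `CommandPoller` class and `install` helper are hypothetical names; only `tornado.ioloop.PeriodicCallback` is a real tornado API, and it is imported lazily so the polling logic can be tried without tornado installed.

```python
class CommandPoller:
    """Call on_command once for each new revision reported by poll()."""

    def __init__(self, poll, on_command):
        self.poll = poll              # returns a revision string, or None
        self.on_command = on_command  # e.g. request a warrior reboot
        self.last_seen = None

    def check(self):
        # Runs on a timer, so it still fires while an item is stuck.
        revision = self.poll()
        if revision is not None and revision != self.last_seen:
            self.last_seen = revision
            self.on_command(revision)


def install(poller, interval_ms=10 * 60 * 1000):
    """Schedule poller.check every interval_ms on the current tornado IOLoop."""
    from tornado.ioloop import PeriodicCallback  # third-party dependency
    timer = PeriodicCallback(poller.check, interval_ms)
    timer.start()
    return timer
```

Two caveats follow from the discussion: the callback is not synchronized with pipeline items, so it should not touch WARC files, and anything slow inside `poll` would block the event loop while it runs.
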
