On Nov 29, 2007 12:05 AM, Simon Horman <horms@xxxxxxxxxxxx> wrote:
> On Wed, Nov 28, 2007 at 01:56:18AM -0800, Ryan Castellucci wrote:
> > The attached patch modifies ldirectord to fork a process for each
> > virtual server to speed up response time with large numbers of virtual
> > servers. I am testing this against multiple instances of ldirectord
> > (two virtual servers, three real servers each); compared with that
> > setup it uses about 25MB less RAM and starts up a lot quicker.
> Hi Ryan,
> this patch seems very nice, thanks.
> > Other things to note:
> > $0 is set for the children so you can see from ps which virtual server
> > each one is managing and which real server it is checking.
> This will probably work on Linux, but it probably won't work on
> Solaris. They believe that changing $0 and having that reflected
> in ps is a security problem because it allows people to hide processes,
> e.g. I could hide "fork-bomb" as "/bin/sh". This isn't a big problem
> with regards to your patch, just something I thought might
> be interesting.
It worked for me on Debian. Thanks for the note; I wasn't aware that
it wasn't supported on all *nix systems.
> > All children are supervised by the parent, and restarted if they exit.
It seemed an almost required feature. I figure most people will be
running ldirectord itself under supervision, and it would just be
awful if one of the children died, and that virtual server had issues
despite the user's efforts.
> > Due to issues with state tracking, when a child starts, it forces all
> > of its real servers down until it rechecks them. This fixes issues
> > with the state of the real servers changing between when a child dies
> > and when it is restarted.
> I think it would be nice if it could be a bit more clever,
> leaving the real servers alone and only toggling them as necessary.
> Do you think this is at all possible?
Probably. It could parse the output of ipvsadm, for example, to get
the current state. I didn't bother because I wanted to start simple
for getting the fork model working, but I don't think it would be too
hard to add.
Any other suggestions or ideas on how to do it?
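The ipvsadm approach might look something like this Python sketch. It assumes the usual `ipvsadm -L -n` listing layout: one `TCP`/`UDP`/`FWM` line per virtual service, followed by `->` lines per real server with the weight in the fourth whitespace-separated field. `read_ipvsadm`, `parse_ipvsadm` and `plan_changes` are hypothetical helpers, and treating weight 0 as "down" is an assumption that matches a quiescent setup.

```python
import subprocess

def read_ipvsadm():
    """Return the numeric IPVS table listing (requires ipvsadm and root)."""
    return subprocess.run(["ipvsadm", "-L", "-n"], capture_output=True,
                          text=True, check=True).stdout

def parse_ipvsadm(text):
    """Map (protocol, virtual address) -> {real address: weight}."""
    services = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] in ("TCP", "UDP", "FWM"):
            current = (fields[0], fields[1])
            services[current] = {}
        elif fields[0] == "->" and current is not None:
            # ["->", "addr:port", forward, weight, activeconn, inactconn]
            # (the column-header "->" line appears before any service, so it is skipped)
            services[current][fields[1]] = int(fields[3])
    return services

def plan_changes(current, desired):
    """Return only the (real server, action) pairs whose state must change.

    current: {real: weight} for one virtual service; desired: {real: is_up}.
    """
    changes = []
    for real, up in sorted(desired.items()):
        is_up = current.get(real, 0) > 0
        if up and not is_up:
            changes.append((real, "enable"))
        elif not up and is_up:
            changes.append((real, "disable"))
    return changes
```

A restarted child would read the table once, run its checks, and apply only the returned changes instead of forcing every real server down first.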
> > Reloading the config kills all children due to not being able to muck
> > about with their state. For the reasons stated above, this may cause a
> > brief service interruption.
> Killing the children seems fine, but interrupting the service
> will likely annoy many people.
Yeah. It's somewhat mitigated by the fact that it will check and
re-enable the real servers pretty quickly. Fixing the above issue would
fix this one.
> > I'd like feedback on this patch, and any constructive criticism,
> > suggestions, bug fixes, etc. are welcome.
> > Do please note that I coded this after being sick and awake for about
> > 20 hours straight (cough was keeping me up), so it probably isn't my
> > best work.
> > Standard disclaimer: This is not well tested code. Don't run it in
> > your massive data center. If you do anyway, I'm not responsible for
> > any failures that result from its use.
> I wonder if you could make this a configuration option.
> We could initially set it to off to give people a chance to test it,
> then make it the default later if it is successful.
I'll give it a try.
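One way such an option could be read is sketched below in Python, assuming a hypothetical global directive spelled `fork = yes` in ldirectord's `key = value` config style. The option name and `fork_enabled` helper are illustrative only; the point is that anything other than an explicit "yes" leaves the old single-process behaviour in place.

```python
def fork_enabled(config_text):
    """Return True only if a global 'fork = yes' line is present.

    'fork' is a hypothetical option name. Indented lines are treated as
    per-virtual-service options and ignored; the default is off.
    """
    enabled = False
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments and trailing space
        if not line or line[0].isspace():
            continue  # blank line or an indented per-virtual option
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "fork":
            enabled = value.strip().lower() in ("yes", "on", "1", "true")
    return enabled
```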
> Another request: would it be possible to make the diff
> against ldirectord.in that is in the linux-ha Mercurial tree?
> Lastly, there is some (perhaps overly complex) logic in ldirectord to
> only test a real server once if it appears in multiple virtual services.
> I think that your changes will basically disable that code, so perhaps
> it could be removed if your code is successful. This would be nice as
> the complexity of the code has been a pain to maintain.
My changes do indeed remove that code, since sharing check results
between processes under a multiprocess model would require IPC, which
I didn't feel like implementing at the time.
I didn't think the existing result-caching code was particularly
complex, would you mind elaborating on what is annoying about it?
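For reference, the idea of that caching (not the actual ldirectord code) can be sketched in Python as a per-cycle cache keyed on the real server plus the check used, so a real server shared by several virtual services is probed once. The data layout and `run_cycle` name are assumptions; because the cache lives in one process's memory, it also shows why the forked model loses it without IPC.

```python
def run_cycle(virtual_services, checks):
    """One check pass in a single-process model.

    virtual_services: {vs: (check_name, [real, ...])}
    checks: {check_name: callable(real) -> bool}
    Returns {(vs, real): is_up}, calling each (real, check) pair only once.
    """
    cache = {}    # (real, check_name) -> result, valid for this cycle only
    results = {}
    for vs, (check_name, reals) in virtual_services.items():
        for real in reals:
            key = (real, check_name)
            if key not in cache:
                cache[key] = checks[check_name](real)
            results[(vs, real)] = cache[key]
    return results
```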
> On the other hand, that code was added for a real-life case of many, many
> virtual services all with the same real server, which seems like it
> would be a pathological case for your change. So perhaps we need to keep
> the old way too.
The forked model wouldn't make sense to use in that case anyway, so we
might as well split the system into a forked model without real-server
result caching and a single-process model with result caching.
Also, what are your thoughts on refactoring ldirectord somewhat to
modularize parts of it? For example, it seems to me like it would be
nice to split out the check routines into modules and handle them as
plugins, which would reduce the size of the main program and make it
much easier for people to have custom checks (just make a new module,
rather than having to touch ldirectord's code).
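As a rough illustration of the plugin idea, here is a Python sketch in which each check lives in its own module and exposes a `check()` function; in ldirectord the equivalent would be Perl modules. The package name `ldirectord_checks` and the `check()` convention are hypothetical choices for the sketch.

```python
import importlib

def load_check(name, package="ldirectord_checks"):
    """Import <package>.<name> and return its check() callable.

    Both the package name and the check() convention are hypothetical.
    A custom check is then just a new module in that package.
    """
    module = importlib.import_module(f"{package}.{name}")
    check = getattr(module, "check", None)
    if not callable(check):
        raise ImportError(f"{package}.{name} has no check() function")
    return check
```

The main program would only need to map each virtual service's configured check type to `load_check(type)`.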
Ryan Castellucci http://ryanc.org/