Re: [lvs-users] RFC: Forking ldirecterd [PATCH]

To: " users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [lvs-users] RFC: Forking ldirecterd [PATCH]
From: Simon Horman <horms@xxxxxxxxxxxx>
Date: Thu, 29 Nov 2007 17:05:39 +0900
On Wed, Nov 28, 2007 at 01:56:18AM -0800, Ryan Castellucci wrote:
> The attached patch modifies ldirectord to fork a process for each
> virtual server to speed up response time with large numbers of virtual
> servers. I am testing this vs multiple instances of ldirectord, two
> virtual servers, three real servers each, and it uses about 25MB less
> ram over that, and starts up a lot quicker.

Hi Ryan,

this patch seems very nice, thanks.

> Other things to note
> $0 is set for the children so you can see what virtual server each is
> managing, and what real server it's checking from ps

This will probably work on Linux, but it probably won't work on
Solaris - they believe that changing $0 and having that reflected
in ps is a security problem because it allows people to hide processes
- i.e. I can hide "fork-bomb" as "/bin/sh". This isn't a big problem
with regards to your patch, just something I thought might
be interesting.

> All children are supervised by the parent, and restarted if they exit.


> Due to issues with state tracking, when a child starts, it forces all
> of its real servers down until it rechecks them.  This fixed issues
> with the state of the real servers changing between when a child dies
> and when it is restarted.

I think it would be nice if it could be a bit more clever,
only toggling the real servers as necessary. Do you
think this is at all possible?
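
One way to picture "only toggling as necessary" is the minimal sketch
below. It is Python rather than ldirectord's Perl, and the function name
plan_toggles and the idea of a saved record of the previous states are
assumptions made for illustration; the patch as described keeps no such
record.

```python
def plan_toggles(previous, current):
    """Return only the real servers whose state changed, with the new state.

    previous: states recorded before the child exited (hypothetical record)
    current:  states from the restarted child's first round of checks
    """
    changes = {}
    for server, is_up in current.items():
        # A server with no recorded state is treated as down, so it is
        # only touched if the fresh check says it should be brought up.
        if previous.get(server, False) != is_up:
            changes[server] = is_up
    return changes


# Example addresses are made up.
previous = {"10.0.0.1:80": True, "10.0.0.2:80": True, "10.0.0.3:80": False}
current = {"10.0.0.1:80": True, "10.0.0.2:80": False, "10.0.0.3:80": False}
toggles = plan_toggles(previous, current)
```

Here only 10.0.0.2:80 would be touched; the two servers whose state did
not change are left alone rather than being forced down first.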

> Reloading the config kills all children due to not being able to muck
> about with their state.  Due to reasons stated above, this may cause a
> brief service interruption.

Killing the children seems fine, but interrupting the service
will likely annoy many people.

> I'd like feedback on this patch, and any constructive criticism,
> suggestions, bug fixes, etc are welcome.
> Do please note that I coded this after being sick and awake for about
> 20 hours straight (cough was keeping me up), so it probably isn't my
> best work.
> Standard disclaimer: This is not well tested code.  Don't run it in
> your massive data center.  If you do anyway, I'm not responsible for
> any failures that result from its use.

I wonder if you could make this a configuration option.
We could initially set it to off to give people a chance to test it.
Then make it the default later if it is successful.

Another request: would it be possible to make the diff
against what is in the linux-ha Mercurial tree?

Lastly, there is some (perhaps overly complex) logic in ldirectord to
only test a real-server once if it appears in multiple virtual services.
I think that your changes will basically disable that code - so perhaps
it could be removed if your code is successful. This would be nice as
the complexity of the code has been a pain to maintain.

On the other hand, that code was added for a real life case of many many
virtual services all with the same real server, which seems like it
would be a pathological case for your change. So perhaps we need to keep
the old way too.
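
The "test a real server once" idea can be sketched roughly as follows.
This is an illustration in Python, not the actual ldirectord code: the
names check_services and fake_check are made up, and results are keyed
by real-server address only, which is a simplification.

```python
def check_services(virtual_services, run_check):
    """Check every real server once, however many virtual services use it."""
    results = {}  # real server -> result of its single check
    status = {}   # virtual service -> {real server -> up/down}
    for virtual, real_servers in virtual_services.items():
        status[virtual] = {}
        for real in real_servers:
            if real not in results:
                # First time this real server is seen: run the check.
                results[real] = run_check(real)
            # Every later virtual service reuses the stored result.
            status[virtual][real] = results[real]
    return status


calls = []


def fake_check(real):
    """Stand-in health check that records each call; one server is 'down'."""
    calls.append(real)
    return real != "10.0.0.9:80"


# Example addresses are made up.
services = {
    "192.168.0.1:80": ["10.0.0.1:80", "10.0.0.9:80"],
    "192.168.0.2:80": ["10.0.0.1:80"],
    "192.168.0.3:80": ["10.0.0.1:80", "10.0.0.9:80"],
}
status = check_services(services, fake_check)
```

In this example two checks are run instead of five. With one process per
virtual service, each process would hold its own results, so a shared
real server would again be checked once per virtual service.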

