On Tue, Feb 10, 2009 at 01:28:36PM +0100, Sebastian Vieira wrote:
> On Tue, Feb 10, 2009 at 12:49 PM, Bruce Richardson
> <itsbruce@xxxxxxxxxxx>wrote:
>
> > If ldirectord is turned off on inactive directors then the LVS
> > configuration on those servers may not
> > reflect the current situation and on the restart of ldirectord there
> > will be a delay while this discrepancy is detected. This is the main
> > risk with managing ldirectord as a heartbeat resource and I see it as a
> > significant enough danger to avoid it altogether.
>
>
> Ah, i see your point. I never came across this issue before.
>
> But consider this: if a director does a failover it's because there's a
> problem with the network so any state change of a realserver gets 'lost'
> anyway during the timeout period you specified in heartbeat. If ldirectord
> starts i assume it immediately issues health-checks and thus sets the
> availability of the realservers accordingly. You can shorten the time
> between health-checks somewhat to minimize this period, but still it would
> be a period in which clients could be routed to unavailable realservers.
> Maybe it's an idea to have ldirector parse its configuration file and
> instead of first setting up the realserver entry in the LVS table, have it
> issue a health-check for each realserver it comes across. Then set up the
> entry according to its availability.
Just to clarify: when ldirectord starts up it clears the LVS table and then
adds real-servers as they are checked. There is no time between checks,
only a pause between each iteration of all checks (checkinterval). The
checks are run serially, and each check finishes when it either gets a
result from the real-server or times out.
In short, it works a bit like this:

    while 1
        foreach rs in real-server
            check_realserver rs || timeout
        sleep checkinterval
Typically a timeout takes longer than getting a result, and so it delays
the checking of subsequent real-servers. It isn't unreasonable to expect
some timeouts if the network is in a situation where fail-over has
occurred. So this could be a problem, as it may take a while for ldirectord
to check all real-servers.
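
To put rough numbers on that, here is a small Python sketch of the
arithmetic. The function name and the figures (10 real-servers, 0.1 second
per reply, 5 second timeout) are made up for illustration and are not taken
from ldirectord; the only assumption is that a serial pass costs the sum of
the individual check durations.

```python
def serial_pass_seconds(outcomes, reply_seconds, timeout_seconds):
    """Estimate how long one serial pass over all real-servers takes.

    outcomes: one bool per real-server, True if it replies,
              False if its check runs into the timeout.
    """
    return sum(reply_seconds if replied else timeout_seconds
               for replied in outcomes)


# Hypothetical figures, for illustration only.
all_up = serial_pass_seconds([True] * 10, 0.1, 5.0)           # roughly 1 second
half_down = serial_pass_seconds([True, False] * 5, 0.1, 5.0)  # roughly 25.5 seconds
```

With half of the real-servers timing out, the last one in the list is not
checked until most of those 25 seconds have passed.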
A way around this problem is to enable the recently added fork option to
ldirectord. When this is present it forks a process for each real-server to
be checked and the checks can run in parallel.
In "fork" mode, ldirectord behaves more like this:

    while 1
        foreach rs in real-server
            if child-process[rs] isn't present
                child-process[rs] = fork_child(rs)
        sleep 1

And in each child process:

    while 1
        check-realserver || timeout
        sleep checkinterval
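
A rough Python sketch of the same idea, for illustration only. It uses
threads where ldirectord forks child processes, and the names (check_loop,
start_checkers, the check callable) are hypothetical rather than anything
from ldirectord's code. The check callable is assumed to enforce its own
timeout and to return True or False.

```python
import threading


def check_loop(rs, check, checkinterval, status, stop):
    """Per-real-server loop: check, record the result, pause, repeat."""
    while not stop.is_set():
        status[rs] = check(rs)    # blocks until a result or the check's own timeout
        stop.wait(checkinterval)  # pause between checks; returns early on stop


def start_checkers(realservers, check, checkinterval):
    """Start one checker thread per real-server so a slow check delays nobody else."""
    status = {}
    stop = threading.Event()
    threads = [
        threading.Thread(target=check_loop,
                         args=(rs, check, checkinterval, status, stop),
                         daemon=True)
        for rs in realservers
    ]
    for t in threads:
        t.start()
    return status, stop, threads
```

The point of the design is that a real-server that times out only holds up
its own loop, so the status of the others is known almost immediately.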
> But yes, i agree, if you want to eliminate this 'lost' period the best way
> would be to have ldirectord running on both nodes at all times. An argument
> that i thought of "heartbeat makes sure ldirectord is running" is moot if
> you have puppet handle the service state.
For the record, my position on this is that it's better to have
ldirectord running on both hosts all the time.
--
Simon Horman
VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en