Re: [lvs-users] Ldirectord not working with heartbeat, works standalone

To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [lvs-users] Ldirectord not working with heartbeat, works standalone
From: Simon Horman <horms@xxxxxxxxxxxx>
Date: Thu, 12 Feb 2009 14:22:23 +1100
On Tue, Feb 10, 2009 at 01:28:36PM +0100, Sebastian Vieira wrote:
> On Tue, Feb 10, 2009 at 12:49 PM, Bruce Richardson
> <itsbruce@xxxxxxxxxxx> wrote:
> 
> > If ldirectord is turned off on inactive directors then the LVS
> > configuration on those servers may not
> > reflect the current situation and on the restart of ldirectord there
> > will be a delay while this discrepancy is detected.  This is the main
> > risk with managing ldirectord as a heartbeat resource and I see it as a
> > significant enough danger to avoid any danger of it.
> 
> 
> Ah, i see your point. I never came across this issue before.
> 
> But consider this: if a director does a failover it's because there's a
> problem with the network so any state change of a realserver gets 'lost'
> anyway during the timeout period you specified in heartbeat. If ldirectord
> starts i assume it immediately issues health-checks and thus sets the
> availability of the realservers accordingly. You can shorten the time
> between health-checks somewhat to minimize this period, but still it would
> be a period in which clients could be routed to unavailable realservers.
> Maybe it's an idea to have ldirector parse its configuration file and
> instead of first setting up the realserver entry in the LVS table, have it
> issue a health-check for each realserver it comes across. Then set up the
> entry according to its availability.

Just to clarify: when ldirectord starts up it clears the LVS table and then
adds real-servers as they are checked. There is no pause between individual
checks, only a pause between each iteration of all checks (checkinterval).
The checks are run serially, and each check finishes when it either gets a
result from the real-server or times out.

In short, it works a bit like this:

while 1
    foreach rs in real-server
        check_realserver rs || timeout
    sleep checkinterval

Timeouts typically take longer than getting a result, and can thus delay
the checking of subsequent real-servers.  It isn't unreasonable to expect
some timeouts if the network is in a state where fail-over has occurred.
So this could be a problem, as it may take a while for ldirectord to check
all real-servers.
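
To put a rough number on that delay, here is a small Python sketch of one
serial pass. It is only an illustration, not ldirectord's code: the
check_realserver stand-in, the server names and the timings are all made
up, and no real network checks are performed. The point is that each
real-server that times out adds the full timeout to the length of the pass.

```python
def check_realserver(rs, timeout):
    """Hypothetical stand-in for a health check (no real network I/O).

    Returns (is_up, seconds_spent). A real-server that never answers
    costs the full timeout; one that answers costs its response time.
    """
    response_time = rs["response_time"]  # None means "never answers"
    if response_time is None or response_time > timeout:
        return False, timeout
    return True, response_time


def serial_pass(real_servers, timeout):
    """One iteration over all real-servers, checked one after another."""
    elapsed = 0.0
    status = {}
    for rs in real_servers:
        up, spent = check_realserver(rs, timeout)
        status[rs["name"]] = up
        elapsed += spent  # serial: every check adds to the total
    return status, elapsed


# Assumed example values: two unreachable real-servers, one healthy.
servers = [
    {"name": "rs1", "response_time": None},
    {"name": "rs2", "response_time": None},
    {"name": "rs3", "response_time": 0.5},
]
status, elapsed = serial_pass(servers, timeout=5.0)
print(status, elapsed)
```

With these assumed values rs3 is not checked until both timeouts ahead of
it have expired, so the pass takes 5 + 5 + 0.5 seconds.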

A way around this problem is to enable the recently added fork option to
ldirectord. When it is enabled, ldirectord forks a process for each
real-server to be checked, so the checks can run in parallel.

In "fork" mode, ldirectord behaves more like this:

while 1
    foreach rs in real-server
        if child-process[rs] isn't present
            child-process[rs] = fork_child(rs)
    sleep 1

And in each child process:

while 1
    check_realserver rs || timeout
    sleep checkinterval
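
The same two loops can be sketched in runnable Python. This is a rough
illustration under stated assumptions, not how ldirectord is implemented:
threads stand in for the forked child processes, and supervise, child_loop,
fake_check and all timings are hypothetical names and values chosen for the
example.

```python
import threading
import time


def child_loop(rs, check, checkinterval, results, stop):
    """Per-real-server loop: check, record the result, sleep, repeat."""
    while not stop.is_set():
        results[rs] = check(rs)
        stop.wait(checkinterval)  # sleep, but wake early on shutdown


def supervise(real_servers, check, checkinterval, run_for, poll=0.05):
    """Parent loop: make sure every real-server has a live checker.

    Threads are used here in place of forked processes to keep the
    sketch short and portable.
    """
    results = {}
    children = {}
    stop = threading.Event()
    deadline = time.monotonic() + run_for
    while time.monotonic() < deadline:
        for rs in real_servers:
            child = children.get(rs)
            if child is None or not child.is_alive():
                # Checker missing or dead: start a new one.
                child = threading.Thread(
                    target=child_loop,
                    args=(rs, check, checkinterval, results, stop),
                    daemon=True,
                )
                child.start()
                children[rs] = child
        time.sleep(poll)
    stop.set()
    for child in children.values():
        child.join()
    return results


def fake_check(rs):
    """Made-up check: rs2 'times out' slowly, the others answer at once."""
    if rs == "rs2":
        time.sleep(0.3)  # simulated timeout
        return False
    return True


results = supervise(["rs1", "rs2", "rs3"], fake_check,
                    checkinterval=0.1, run_for=0.5)
print(results)
```

In this sketch the slow check of rs2 does not hold up rs1 or rs3, since
each real-server is checked in its own loop.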

> But yes, i agree, if you want to eliminate this 'lost' period the best way
> would be to have ldirectord running on both nodes at all times. An argument
> that i thought of "heartbeat makes sure ldirectord is running" is moot if
> you have puppet handle the service state.

For the record, my position on this is that it's better to have
ldirectord running on both hosts all the time.

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

