I have 2 LVS directors with failover using heartbeat (the real servers
are working fine). I only use a network UDP test, no serial connection,
because I want a failover to occur if the network connection fails on
the primary director. I use nice_failback so there is no fixed
master/slave.
I believe heartbeat only fails over when all of its heartbeat checks
fail, so with a serial connection in place it would not fail over if
just the UDP check failed.
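To make that reasoning concrete, here is a minimal sketch of the "all
links must go silent" rule as I understand it. It is not heartbeat's
own code; the function name and the link names are made up purely for
illustration.

```python
def peer_declared_dead(link_alive):
    """Return True only when no heartbeat link still hears the peer.

    link_alive is a hypothetical mapping of link name -> bool, where
    True means heartbeats are still arriving over that link.
    """
    return not any(link_alive.values())


# UDP is down but serial still works: the peer is not considered dead,
# so no failover would be triggered.
udp_only_failed = peer_declared_dead({"udp": False, "serial": True})

# Every configured link is silent: the peer is considered dead.
all_failed = peer_declared_dead({"udp": False, "serial": False})
```

With only a UDP link configured, pulling the ethernet cable makes each
director see the other as dead, which matches the behaviour described
further down.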
In ha.cf I have
<snip>
udpport 694
#
# What interfaces to broadcast heartbeats over?
#
bcast eth0 # Linux
node lvsrouter-1.domain
node lvsrouter-2.domain
</snip>
With this config, if lvsrouter-1 is master and you pull the ethernet
cable, lvsrouter-2 takes over correctly, but lvsrouter-1 stays up
because it thinks the slave has just failed. When you put the cable
back in, lvsrouter-1 comes up as master and lvsrouter-2 shuts down, but
then comes back up again. So you end up with both running as master.
The more I think about this, the more it seems incorrect. It appears
that the routers do not negotiate correctly, so maybe the config is
wrong in some way. If you do a 'service heartbeat stop' on lvsrouter-1,
the failover occurs and when you restart heartbeat it correctly takes up
the slave mode.
So it appears that heartbeat does not detect that it should be slave
when it just loses network connectivity and only has UDP checking. I'm
sure I read about this case somewhere, and someone had a different
network test module which also checked network connectivity to the
next-hop router.
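The idea of also checking the next-hop router can be sketched roughly
as below. This is only an illustration of the decision logic, not the
module I remember reading about; the function and argument names are
hypothetical, and how gateway reachability is measured (for example by
pinging it) is left out.

```python
def next_role(currently_active, peer_heard, gateway_reachable):
    """Decide which role a director should take.

    All three arguments are booleans supplied by the caller; the names
    are assumptions made for this sketch only.
    """
    if not gateway_reachable:
        # We cannot reach the next hop, so the fault is on our side:
        # give up the resources rather than assume the peer has died.
        return "standby"
    if not peer_heard:
        # We still have connectivity and the peer is silent: take over.
        return "active"
    # Both nodes are healthy: keep the current role (no forced failback).
    return "active" if currently_active else "standby"


# Master with its ethernet cable pulled: no peer, no gateway.
isolated_master = next_role(True, False, False)

# Slave that lost the master but can still reach the gateway.
surviving_slave = next_role(False, False, True)
```

With a rule like this, the director whose cable was pulled would step
down instead of staying master, so both nodes should not end up active
when the cable goes back in.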
So maybe I don't have the 2 directors configured correctly, or this is a
scenario that is not easily coped with.
Does that explain it more clearly?
Thanks
Graham
-----Original Message-----
From: lvs-users-bounces@xxxxxxxxxxxxxxxxxxxxxx
[mailto:lvs-users-bounces@xxxxxxxxxxxxxxxxxxxxxx] On Behalf Of Malcolm
Turnbull
Sent: 13 April 2005 13:56
To: LinuxVirtualServer.org users mailing list.
Subject: Re: Heartbeat LVS-DR config
Graham,
I wouldn't say gibberish, but you haven't said what you are doing.
I assume you mean LVS-DR on one load balancer, with two real servers
(one set up as failover),
using either mon, ldirectord or keepalived?
Or are you talking about some kind of heartbeat setup?
Or do you mean hosting the services (apache) on the LVS boxes with
heartbeat?
Purcocks, Graham wrote:
I have just built 2 new servers with LVS-DR and they fail over fine if
the machine dies, but if the network interface stops they both end up
being live when the interface is restored. This is with only IP
checking.
Sorry if it sounds like gibberish, but it's driving me nuts. I know
I've seen it while Googling but I just can't find it again.
Thanks
Graham