On Wed, Jun 04, 2003 at 11:42:41AM -0500, AJ Lemke wrote:
> Hello List,
>
> I am running a 2-node cluster with failover using Heartbeat. We have
> recently noticed that when the Primary Node (Director1) is taken down
> or fails, the Secondary Node (Director2) takes up to 6 minutes to
> assume the Virtual IPs. Sometimes Director2 doesn't take over at all.
> Heartbeat checks the servers every 2 seconds and the Deadtime is 10
> seconds. If I restart the heartbeat service on both Nodes, failover
> seems to work within 15 seconds the first couple of tries, but then
> the Nodes seem to get confused, as Director2 will not give up its
> resources when Director1 comes back online. This is tested by shutting
> off the port on the switch or by starting and stopping the Heartbeat
> service. Any ideas as to what could be causing this problem?

That is very strange. Which version of heartbeat are you using?

As always, heartbeat-related questions are best asked on the linux-ha
or linux-ha-dev mailing lists. Information on these can be found at
www.linux-ha.org.
--
Horms