Hello List,
I am running a two-node cluster with failover using Heartbeat. We
recently noticed that when the primary node (Director1) is taken down
or fails, the secondary node (Director2) takes up to 6 minutes to
assume the virtual IPs. Sometimes Director2 doesn't take over at all.
Heartbeat checks the servers every 2 seconds and the deadtime is 10
seconds. If I restart the heartbeat service on both nodes, failover
completes within 15 seconds for the first couple of tries, but then
the nodes seem to get confused: Director2 will not give up its
resources when Director1 comes back online. I test this by shutting
off the port on the switch or by stopping and starting the Heartbeat
service. Any ideas as to what could be causing this problem?
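
For reference, the timing described above corresponds to these two
ha.cf directives. This is only a sketch of the values mentioned in
this post; the rest of the file is omitted.

    # interval between heartbeat packets, in seconds
    keepalive 2
    # seconds of silence before the peer is declared dead
    deadtime 10
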
AJ