Hi,
This weekend we had a problem on our IPVS load-balancing cluster where
ldirectord didn't take a failing realserver out of the pool. On closer
inspection it turned out that ldirectord had got stuck executing ipvsadm:
it just hung, waiting for ipvsadm to finish. No health checks were
performed, changes in the config file were not picked up, and ldirectord
didn't respond to a stop command or even to SIGTERM. I had to kill -9 both
the ipvsadm and the ldirectord process. The ipvsadm process had been started
almost a week earlier, so the whole cluster had effectively not been doing
any health checks since then.
(Strangely, "ldirectord reload" to force it to pick up a config file change
*DID* work, but I suspect that the newly started ldirectord process handles
that itself instead of using some I/O to talk to the running daemon.)
We never encountered this problem on our 2.2 kernel-based setup, which ran
without problems for over two years, and this new 2.4-based setup had also
been running for about four months without problems.
Does anyone know what may have caused this? The relevant details of our
configuration:
* Debian Woody
* .debs for ldirectord and heartbeat from
http://www.ultramonkey.org/download/heartbeat/1.0.4/debian_woody/
* P4 2.4 GHz with Hyper-Threading and an SMP kernel
* Plenty of disk space and memory, no other processes and no CPU performance
problems.
Any suggestions are appreciated; I'm a bit stuck now.
--
Martijn