Hi there,
I'm using ldirectord with the heartbeat daemon on CentOS 5 (x86_64), and I've
noticed that when I use the "checkcommand" option to run a script or executable
that checks whether a service is alive, the process exits properly but stays in
the "defunct" (zombie) state. I'm testing services that are up as well as
services that are down, and I've also noticed that this happens only when the
"checkcommand" process takes more than "checktimeout" seconds to complete.
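A "defunct" process is a child that has already exited but whose parent has not
yet collected its exit status with wait()/waitpid(). One plausible explanation
for the symptom (I haven't confirmed it in the ldirectord source) is that on
timeout the checker kills the child but never reaps it. The Python sketch below
reproduces that general Unix pattern on Linux; run_check, wait_for_zombie and
proc_state are hypothetical helper names, and "sleep 5" stands in for a slow
checkcommand.

```python
import os
import signal
import time


def proc_state(pid: int) -> str:
    """Return the state letter from /proc/<pid>/stat (Linux only), e.g. 'S' or 'Z'."""
    with open(f"/proc/{pid}/stat") as f:
        # The command name is in parentheses and may contain spaces,
        # so take the first field after the last ')'.
        return f.read().rsplit(")", 1)[1].split()[0]


def run_check(argv: list[str], timeout: float, reap_after_kill: bool):
    """Run argv as a child process and kill it if it exceeds `timeout` seconds.

    Returns (pid, raw_wait_status), or (pid, None) if the child timed out.
    """
    pid = os.fork()
    if pid == 0:  # child: become the check command
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        done, status = os.waitpid(pid, os.WNOHANG)  # non-blocking reap
        if done == pid:
            return pid, status  # finished in time and already reaped
        time.sleep(0.05)
    os.kill(pid, signal.SIGKILL)  # timed out: terminate the check
    if reap_after_kill:
        os.waitpid(pid, 0)  # collect the exit status so no zombie remains
    # Without that waitpid() the dead child stays "defunct" until we exit.
    return pid, None


def wait_for_zombie(pid: int, limit: float = 2.0) -> bool:
    """Poll until pid shows up as a zombie ('Z'), giving up after `limit` seconds."""
    deadline = time.monotonic() + limit
    while time.monotonic() < deadline:
        if proc_state(pid) == "Z":
            return True
        time.sleep(0.02)
    return False
```

Calling run_check(["sleep", "5"], 0.2, reap_after_kill=False) leaves the child
in state Z (listed as <defunct> by ps) until the parent calls waitpid() or
exits; with reap_after_kill=True no zombie remains. Zombies left behind this way
disappear when the parent exits, because init adopts and reaps them, which would
be consistent with restarting ldirectord clearing them.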
I decided to check the availability of my UDP service by writing my own code to
check whether the server is up or down, so I had to use the "checkcommand"
option.
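For reference, an external UDP check of this kind might look like the Python
sketch below. It assumes the checker only looks at the exit status (0 = up,
non-zero = down) and that the real server's IP and port are the last two
command-line arguments; both are assumptions to verify against the ldirectord
documentation for your version. The probe payload is hypothetical and must be
replaced with a request your service actually answers. The socket timeout is
deliberately kept well below "checktimeout" so the script exits on its own
instead of being killed.

```python
#!/usr/bin/env python3
import socket
import sys

# HYPOTHETICAL probe; replace with a request your UDP service really answers.
PROBE = b"ping\n"
# Keep this well below ldirectord's "checktimeout" so the script exits by itself.
PROBE_TIMEOUT = 2.0


def udp_alive(host: str, port: int, timeout: float = PROBE_TIMEOUT) -> bool:
    """Return True if any datagram comes back from host:port after the probe."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.settimeout(timeout)
            # connect() on UDP sends nothing, but it lets the kernel report an
            # ICMP "port unreachable" as ConnectionRefusedError on recv().
            s.connect((host, port))
            s.send(PROBE)
            s.recv(4096)  # any reply counts as "alive"
            return True
    except OSError:  # timeout, refused, unreachable, bad address
        return False


def main(argv: list[str]) -> int:
    # ASSUMPTION: the real server IP and port are the last two arguments.
    host, port = argv[-2], int(argv[-1])
    return 0 if udp_alive(host, port) else 1


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Because a script like this bounds its own run time, the checker's timeout path
should not be reached, which is similar in effect to raising "checktimeout" but
without lengthening how long a dead server goes unnoticed.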
I've also seen the heartbeat restart hang occasionally, though it doesn't happen
often.
Has anyone already run into this problem? After a while I end up with thousands
of "defunct" processes, and I have to restart ldirectord to get rid of them. I'm
running the ldirectord script from the heartbeat-ldirectord-2.1.3-3.el5.centos
RPM package from the CentOS repository. As a temporary workaround, I increased
"checktimeout" so that the ldirectord timeout is never reached.
Any help would be welcome,
Thanks in advance,
Bruno