Hi all,
I'm seeing some strange behaviour with ldirectord when heartbeat launches the
services for a node: it sometimes launches them twice.
Here is my current heartbeat configuration:
node1 IPaddr::164.129.24.6/23/eth0 IPaddr::164.129.25.2/23/eth0 IPaddr::164.129.25.3/23/eth0 IPaddr::164.129.33.49/28/eth1 trap-snmp ldirectord::ldirectord.cf
I first hit the problem after adding the trap-snmp script, which just sends an
SNMP alert to the monitoring server. The whole node1 line was launched twice,
producing errors in the logs and sometimes even starting two ldirectord
daemons (they seem to be launched one right after the other, so the first one
does not have time to write its pid lock file before the second one is
started).
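For what it's worth, the race described above is the classic check-then-write pid file problem. A minimal sketch of how a daemon can claim its pid file atomically, assuming a POSIX filesystem: creating the file with O_CREAT|O_EXCL makes the second instance fail immediately instead of both starting. This is not ldirectord's actual code (ldirectord is written in Perl); the function name claim_pidfile is hypothetical.

```python
import os


def claim_pidfile(path):
    """Atomically create `path` holding our PID.

    Returns True if we created it, False if another instance already did.
    Hypothetical helper; it does not clean up stale files left by a crash.
    """
    try:
        # O_EXCL makes the create fail if the file already exists, so the
        # existence check and the creation are a single atomic step.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(f"{os.getpid()}\n")
    return True
```

A real daemon would also need to detect a stale pid file (for example by checking whether the recorded PID is still alive) before refusing to start.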
I found that the problem appeared because the trap-master script was
originally written without taking any argument (in particular start/stop/
status). Rewriting it to take an argument and return 0 when start or stop is
called, or 1 for any other value (including status), seems to fix the
problem. However, I have not found any documentation about how to write
additional scripts or what such a script must answer. I've logged the calls to
start, stop and status, but I can't say for sure when heartbeat uses each of
them or what answer it expects.
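A minimal sketch of such a resource script, assuming heartbeat treats it like an LSB init script: it is called with "start", "stop" or "status" as the last argument (any "::" parameters from haresources come first), "status" exits 0 and prints "running" when the resource is active and exits 3 otherwise, and start/stop are idempotent so a duplicate "start" does nothing. The state file path and the send_trap() placeholder are hypothetical; check the expected status output against the ResourceManager script shipped with your heartbeat version.

```python
#!/usr/bin/env python3
import os
import sys

# Hypothetical marker file meaning "resource is up"; any writable path works.
STATE_FILE = "/var/run/trap-snmp.state"


def send_trap(event):
    """Placeholder for the real SNMP trap command (hypothetical)."""
    pass


def handle(action, state_file):
    """Return (exit_code, text_to_print) for one resource action."""
    if action == "start":
        # Idempotent: a second "start" while already up does nothing.
        if not os.path.exists(state_file):
            send_trap("start")
            open(state_file, "w").close()
        return 0, ""
    if action == "stop":
        # Idempotent: stopping an already stopped resource still succeeds.
        if os.path.exists(state_file):
            send_trap("stop")
            os.remove(state_file)
        return 0, ""
    if action == "status":
        # LSB convention: 0 = running, 3 = not running.
        if os.path.exists(state_file):
            return 0, "running"
        return 3, "stopped"
    # LSB convention: 2 = invalid or unknown argument.
    return 2, "usage: trap-snmp {start|stop|status}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # heartbeat passes any "::" parameters first and the action last.
    code, message = handle(sys.argv[-1], STATE_FILE)
    if message:
        print(message)
    sys.exit(code)
```

Keeping the state in a file rather than in the process is what lets "status" answer correctly between separate invocations by heartbeat.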
I now have a similar problem where the services are launched twice. It only
appears when node1 is starting and node2 (the backup) is not up.
Have you already seen this sort of problem? I've read the changelogs but
didn't find anything similar, and even Google is not very helpful on the
subject.
Thanks folks,
Jean-Michel.
---------------------------------------------------
LINUX? You can find worse, but it costs more.