Hello happy list,
I have a heartbeat + ldirectord setup spanning several IPs. From
haresources:
cerberus ip1/24/eth0 ip2/24/eth0 ip3/24/eth0 ip4/24/eth0 ip5/24/eth0
ip6/24/eth1 ip7/24/eth1 ldirectord
That one line is all I have in haresources; it is slightly edited and
wrapped here.
ha.cf is pretty simple:
logfacility local0
bcast eth1
node hydra cerberus
ldirectord has a rather large ldirectord.cf in /etc/ha.d/, and it is all
working just fine on the main node, cerberus. The secondary node, hydra,
has undergone some software updates. Heartbeat failover works in that
hydra detects when cerberus dies and takes over the network interfaces.
However, it starts the interfaces and ldirectord, things work for a few
seconds, and then it tries to start everything again. ldirectord
complains that it is already running, and heartbeat bails.
So what do I have set up wrong?
This is logged in messages:
Jan 20 04:49:02 hydra heartbeat: [14686]: info: Status update for node
cerberus: status active
Jan 20 04:49:02 hydra heartbeat: [14696]: debug: notify_world: setting SIGCHLD
Handler to SIG_DFL
Jan 20 04:49:03 hydra harc[14696]: info: Running /etc/ha.d/rc.d/status status
Jan 20 04:49:03 hydra heartbeat: [14686]: info: Link hydra:eth1 up.
Jan 20 04:49:58 hydra heartbeat: [14686]: info: Received shutdown notice from
'cerberus'.
Jan 20 04:49:58 hydra heartbeat: [14686]: info: Resources being acquired from
cerberus.
Jan 20 04:49:58 hydra heartbeat: [14706]: debug: notify_world: setting SIGCHLD
Handler to SIG_DFL
Jan 20 04:49:59 hydra harc[14706]: info: Running /etc/ha.d/rc.d/status status
Jan 20 04:49:59 hydra heartbeat: [14707]: info: No local resources
[/usr/lib/heartbeat/ResourceManager listkeys hydra] to acquire.
Jan 20 04:49:59 hydra heartbeat: [14686]: debug: StartNextRemoteRscReq(): child
count 1
Jan 20 04:49:59 hydra mach_down[14719]: info: Taking over resource group
ip1/24/eth0
Jan 20 04:49:59 hydra ResourceManager[14746]: info: Acquiring resource
group:<snip>
[many lines cut concerning the start of IPaddr for each ip]
Jan 20 04:50:11 hydra ResourceManager[14746]: info: Running
/etc/ha.d/resource.d/ldirectord start
Jan 20 04:50:11 hydra ResourceManager[14746]: debug: Starting
/etc/ha.d/resource.d/ldirectord start
Jan 20 04:50:13 hydra ldirectord[16829]: Starting Linux Director v1.77.2.5 as
daemon
Jan 20 04:50:13 hydra ResourceManager[14746]: debug:
/etc/ha.d/resource.d/ldirectord start done. RC=0
Jan 20 04:50:13 hydra mach_down[14719]: info: mach_down takeover complete for
node cerberus.
[21 lines: ldirectord[16831]: Added virtual server: xxx]
[3 lines: ldirectord[16831]: Added fallback server: xxx]
[40 lines: ldirectord[16831]: Quiescent real server: xxx]
[20 lines: ldirectord[16831]: Restored real server: xxx]
Jan 20 04:50:29 hydra heartbeat: [14686]: WARN: node cerberus: is dead
Jan 20 04:50:29 hydra heartbeat: [14686]: info: Dead node cerberus gave up
resources.
Jan 20 04:50:29 hydra heartbeat: [14686]: info: Resources being acquired from
cerberus.
Jan 20 04:50:29 hydra heartbeat: [14686]: info: Link cerberus:eth1 dead.
Jan 20 04:50:29 hydra heartbeat: [17044]: debug: notify_world: setting SIGCHLD
Handler to SIG_DFL
Jan 20 04:50:29 hydra harc[17044]: info: Running /etc/ha.d/rc.d/status status
Jan 20 04:50:29 hydra heartbeat: [17045]: info: No local resources
[/usr/lib/heartbeat/ResourceManager listkeys hydra] to acquire.
Jan 20 04:50:29 hydra heartbeat: [14686]: debug: StartNextRemoteRscReq(): child
count 1
Jan 20 04:50:29 hydra mach_down[17064]: info: Taking over resource group
ip1/24/eth0
Jan 20 04:50:29 hydra ResourceManager[17084]: info: Acquiring resource group:
<snip>
Jan 20 04:50:29 hydra heartbeat: [14686]: info: Comm_now_up(): updating status
to active
Jan 20 04:50:29 hydra heartbeat: [14686]: info: Local status now set to:
'active'
Jan 20 04:50:30 hydra heartbeat: [17108]: info: No local resources
[/usr/lib/heartbeat/ResourceManager listkeys hydra] to acquire.
Jan 20 04:50:30 hydra heartbeat: [14686]: debug: StartNextRemoteRscReq(): child
count 1
Jan 20 04:50:30 hydra IPaddr[17112]: INFO: IPaddr Running OK
Jan 20 04:50:31 hydra IPaddr[17224]: INFO: IPaddr Running OK
Jan 20 04:50:31 hydra IPaddr[17330]: INFO: IPaddr Running OK
Jan 20 04:50:32 hydra IPaddr[17436]: INFO: IPaddr Running OK
Jan 20 04:50:32 hydra IPaddr[17542]: INFO: IPaddr Running OK
Jan 20 04:50:33 hydra IPaddr[17654]: INFO: IPaddr Running OK
Jan 20 04:50:34 hydra IPaddr[17760]: INFO: IPaddr Running OK
Jan 20 04:50:35 hydra ResourceManager[17084]: info: Running
/etc/ha.d/resource.d/ldirectord start
Jan 20 04:50:35 hydra ResourceManager[17084]: debug: Starting
/etc/ha.d/resource.d/ldirectord start
Jan 20 04:50:37 hydra ResourceManager[17084]: debug:
/etc/ha.d/resource.d/ldirectord start done. RC=1
Jan 20 04:50:37 hydra ResourceManager[17084]: ERROR: Return code 1 from
/etc/ha.d/resource.d/ldirectord
Jan 20 04:50:37 hydra ResourceManager[17084]: CRIT: Giving up resources due to
failure of ldirectord
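
To make the double takeover easier to spot, here is a minimal Python
sketch that pulls out every "Resources being acquired from" message. It
assumes the syslog line format shown above; the function name
find_takeovers and the SAMPLE list are my own, made up for illustration
and not part of heartbeat. It only matches the fixed part of the message,
since the node name is wrapped onto the next line here.

```python
import re

# Timestamp, host, then the fixed part of heartbeat's takeover message.
# The node being taken over is not captured because it may sit on the
# next (wrapped) line of the log.
ACQUIRE = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(\S+)\s+heartbeat:"
    r".*Resources being acquired from"
)


def find_takeovers(lines):
    """Return a (timestamp, host) tuple for every takeover start in lines."""
    hits = []
    for line in lines:
        match = ACQUIRE.match(line)
        if match:
            hits.append((match.group(1), match.group(2)))
    return hits


# A few lines copied from the log above, wrapped exactly as posted.
SAMPLE = [
    "Jan 20 04:49:58 hydra heartbeat: [14686]: info: Resources being acquired from",
    "cerberus.",
    "Jan 20 04:50:13 hydra ldirectord[16829]: Starting Linux Director v1.77.2.5 as",
    "daemon",
    "Jan 20 04:50:29 hydra heartbeat: [14686]: info: Resources being acquired from",
    "cerberus.",
]

for timestamp, host in find_takeovers(SAMPLE):
    print(timestamp, host)
```

On the log above it reports two takeovers on hydra, at 04:49:58 (after
the shutdown notice) and again at 04:50:29 (after cerberus is declared
dead). The second one is where ldirectord start returns 1.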
--
Kenny Dail <kend@xxxxxxxxx>