
Re: Ldirectord starting problems

To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: Ldirectord starting problems
From: Horms <horms@xxxxxxxxxxxx>
Date: Mon, 30 May 2005 11:32:16 +0900
On Fri, May 20, 2005 at 03:36:45PM +0100, Graham David Purcocks M.A.(Oxon.) 
wrote:
> It's starting up and immediately being shut down, which suggests your
> heartbeat config is incorrect and so it's releasing itself as master.
> 
> So I guess you need to show your heartbeat config next.
> 
> On Fri, 2005-05-20 at 14:39, Deputy Michael wrote:
> > I'm not actually getting errors from heartbeat.
> > oailxwntst's ha-log
> > ____________________________________________________________________________
> > __________________________________________
> > heartbeat: 2005/05/19_12:35:47 info: **************************
> > heartbeat: 2005/05/19_12:35:47 info: Configuration validated. Starting
> > heartbeat 1.2.3.cvs.20050404
> > heartbeat: 2005/05/19_12:35:47 info: heartbeat: version 1.2.3.cvs.20050404
> > heartbeat: 2005/05/19_12:35:48 info: Heartbeat generation: 2949
> > heartbeat: 2005/05/19_12:35:48 info: UDP Broadcast heartbeat started on port
> > 694 (694) interface eth1
> > heartbeat: 2005/05/19_12:35:48 info: ping heartbeat started.
> > heartbeat: 2005/05/19_12:35:48 info: pid 3013 locked in memory.
> > heartbeat: 2005/05/19_12:35:48 info: Local status now set to: 'up'
> > heartbeat: 2005/05/19_12:35:49 info: pid 3016 locked in memory.
> > heartbeat: 2005/05/19_12:35:49 info: pid 3019 locked in memory.
> > heartbeat: 2005/05/19_12:35:49 info: pid 3020 locked in memory.
> > heartbeat: 2005/05/19_12:35:49 info: pid 3018 locked in memory.
> > heartbeat: 2005/05/19_12:35:49 info: pid 3017 locked in memory.
> > heartbeat: 2005/05/19_12:35:49 ERROR: Exiting HBWRITE process 3019 killed by
> > signal 11.
> > heartbeat: 2005/05/19_12:35:49 ERROR: Core heartbeat process died!
> > Restarting.
> > heartbeat: 2005/05/19_12:35:49 WARN: Shutdown delayed until current resource
> > activity finishes.
> > heartbeat: 2005/05/19_12:35:49 info: Link oailxwbtst.ssfhs.org:eth1 up.
> > heartbeat: 2005/05/19_12:37:48 WARN: node oailxwbts2.ssfhs.org: is dead
> > heartbeat: 2005/05/19_12:37:48 WARN: No STONITH device configured.
> > heartbeat: 2005/05/19_12:37:48 WARN: Shared disks are not protected.
> > heartbeat: 2005/05/19_12:37:48 info: Resources being acquired from
> > oailxwbts2.ssfhs.org.
> > heartbeat: 2005/05/19_12:37:48 WARN: node 192.168.6.254: is dead
> > heartbeat: 2005/05/19_12:37:48 info: Local status now set to: 'active'
> > heartbeat: 2005/05/19_12:37:48 info: Starting child client
> > "/usr/lib/heartbeat/ipfail" (501,501)
> > heartbeat: 2005/05/19_12:37:48 info: Starting "/usr/lib/heartbeat/ipfail" as
> > uid 501  gid 501 (pid 3025)
> > heartbeat: 2005/05/19_12:37:48 info: Running /etc/ha.d/rc.d/status status
> > heartbeat: 2005/05/19_12:37:48 info: Local Resource acquisition completed.
> > heartbeat: 2005/05/19_12:37:48 info: /usr/lib/heartbeat/mach_down:
> > nice_failback: foreign resources acquired
> > heartbeat: 2005/05/19_12:37:48 info: Initial resource acquisition complete
> > (T_RESOURCES)
> > heartbeat: 2005/05/19_12:37:48 info: mach_down takeover complete for node
> > oailxwbts2.ssfhs.org.
> > heartbeat: 2005/05/19_12:37:48 info: mach_down takeover complete.
> > heartbeat: 2005/05/19_12:37:48 info: Running /etc/ha.d/rc.d/status status
> > heartbeat: 2005/05/19_12:37:48 info: Running /etc/ha.d/rc.d/ip-request-resp
> > ip-request-resp
> > heartbeat: 2005/05/19_12:37:48 received ip-request-resp 10.90.2.194 OK yes
> > heartbeat: 2005/05/19_12:37:48 info: Acquiring resource group:
> > oailxwbtst.ssfhs.org 10.90.2.194 ldirectord
> > heartbeat: 2005/05/19_12:37:48 info: Running /etc/ha.d/resource.d/IPaddr
> > 10.90.2.194 start
> > heartbeat: 2005/05/19_12:37:48 info: Removing conflicting loopback lo:0.
> > heartbeat: 2005/05/19_12:37:48 info: /sbin/ifconfig lo:0 down
> > heartbeat: 2005/05/19_12:37:48 info: /sbin/route -n del -host 10.90.2.194
> > heartbeat: 2005/05/19_12:37:48 info: /sbin/ifconfig eth0:0 10.90.2.194
> > netmask 255.255.255.0       broadcast 10.90.2.255
> > heartbeat: 2005/05/19_12:37:48 info: Sending Gratuitous Arp for 10.90.2.194
> > on eth0:0 [eth0]
> > heartbeat: 2005/05/19_12:37:48 /usr/lib/heartbeat/send_arp -i 1010 -r 5 -p
> > /var/lib/heartbeat/rsctmp/send_arp/send_arp-10.90.2.194 eth0 10.90.2.194
> > auto 10.90.2.194 ffffffffffff
> > heartbeat: 2005/05/19_12:37:49 info: Running /etc/ha.d/resource.d/ldirectord
> > start
> > heartbeat: 2005/05/19_12:38:00 info: Local Resource acquisition completed.
> > (none)
> > heartbeat: 2005/05/19_12:38:00 info: local resource transition completed.
> > heartbeat: 2005/05/19_12:38:00 info: Heartbeat shutdown in progress. (3013)
> > heartbeat: 2005/05/19_12:38:00 info: Giving up all HA resources.
> > heartbeat: 2005/05/19_12:38:00 info: Releasing resource group:
> > oailxwbtst.ssfhs.org 10.90.2.194 ldirectord
> > heartbeat: 2005/05/19_12:38:00 info: Running /etc/ha.d/resource.d/ldirectord
> > stop
> > heartbeat: 2005/05/19_12:38:00 info: Running /etc/ha.d/resource.d/IPaddr
> > 10.90.2.194 stop
> > heartbeat: 2005/05/19_12:38:00 info: /sbin/route -n del -host 10.90.2.194
> > heartbeat: 2005/05/19_12:38:00 info: /sbin/ifconfig eth0:0 down
> > heartbeat: 2005/05/19_12:38:00 info: Restoring loopback IP Address
> > 10.90.2.194 on lo:0.
> > heartbeat: 2005/05/19_12:38:00 info: /sbin/ifconfig lo:0 10.90.2.194 netmask
> > 255.255.255.255
> > heartbeat: 2005/05/19_12:38:00 info: IP Address 10.90.2.194 released
> > heartbeat: 2005/05/19_12:38:00 info: killing /usr/lib/heartbeat/ipfail
> > process group 3025 with signal 15
> > heartbeat: 2005/05/19_12:38:00 info: All HA resources relinquished.
> > heartbeat: 2005/05/19_12:38:00 info: killing /usr/lib/heartbeat/ipfail
> > process group 3025 with signal 15
> > heartbeat: 2005/05/19_12:38:01 info: killing HBFIFO process 3016 with signal
> > 15
> > heartbeat: 2005/05/19_12:38:01 info: killing HBWRITE process 3017 with
> > signal 15
> > heartbeat: 2005/05/19_12:38:01 info: killing HBREAD process 3018 with signal
> > 15
> > heartbeat: 2005/05/19_12:38:01 info: killing HBREAD process 3020 with signal
> > 15
> > heartbeat: 2005/05/19_12:38:01 info: Core process 3020 exited. 4 remaining
> > heartbeat: 2005/05/19_12:38:01 info: Core process 3018 exited. 3 remaining
> > heartbeat: 2005/05/19_12:38:01 info: Core process 3016 exited. 2 remaining
> > heartbeat: 2005/05/19_12:38:01 info: Core process 3017 exited. 1 remaining
> > heartbeat: 2005/05/19_12:38:01 info: Heartbeat shutdown complete.
> > heartbeat: 2005/05/19_12:38:01 info: Heartbeat restart triggered.
> > heartbeat: 2005/05/19_12:38:01 info: Restarting heartbeat.
> > heartbeat: 2005/05/19_12:38:01 info: Performing heartbeat restart exec.
> > heartbeat: 2005/05/19_12:38:32 info: **************************
> > heartbeat: 2005/05/19_12:38:32 info: Configuration validated. Starting
> > heartbeat 1.2.3.cvs.20050404
> > heartbeat: 2005/05/19_12:38:32 info: heartbeat: version 1.2.3.cvs.20050404
> > heartbeat: 2005/05/19_12:38:32 info: Heartbeat generation: 2950
> > heartbeat: 2005/05/19_12:38:32 info: UDP Broadcast heartbeat started on port
> > 694 (694) interface eth1
> > heartbeat: 2005/05/19_12:38:32 info: ping heartbeat started.
> > heartbeat: 2005/05/19_12:38:32 info: pid 3355 locked in memory.
> > heartbeat: 2005/05/19_12:38:32 info: Local status now set to: 'up'
> > heartbeat: 2005/05/19_12:38:33 info: pid 3357 locked in memory.
> > heartbeat: 2005/05/19_12:38:33 info: pid 3361 locked in memory.
> > heartbeat: 2005/05/19_12:38:33 info: pid 3358 locked in memory.
> > heartbeat: 2005/05/19_12:38:33 info: pid 3359 locked in memory.
> > heartbeat: 2005/05/19_12:38:33 info: pid 3360 locked in memory.
> > heartbeat: 2005/05/19_12:38:33 info: Link oailxwbts2.ssfhs.org:eth1 up.
> > heartbeat: 2005/05/19_12:38:33 info: Status update for node
> > oailxwbts2.ssfhs.org: status up
> > heartbeat: 2005/05/19_12:38:33 info: Link oailxwbtst.ssfhs.org:eth1 up.
> > heartbeat: 2005/05/19_12:38:33 ERROR: Exiting HBWRITE process 3360 killed by
> > signal 11.

This is bad. Signal 11 is a segmentation fault (SIGSEGV), which in a
very small nutshell means the program tried to access memory it doesn't
have access to. This should never happen, and it is likely a bug
or perhaps a runtime linkage problem.

Did you recompile heartbeat for CentOS? If not, could you please try
doing so?
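
As a rough illustration (not part of heartbeat itself), a small Python
sketch along these lines could scan an ha-log for processes killed by a
signal and report the signal's symbolic name. The names
find_fatal_signals and signal_name, and the regular expression, are
assumptions modeled only on the log lines quoted in this thread. The
pattern expects raw log text without the "> >" quote prefixes.

```python
import re
import signal

# Pattern modeled on the quoted log lines, e.g.
# "ERROR: Exiting HBWRITE process 3019 killed by signal 11."
# (assumption: other heartbeat versions may word this differently)
KILLED_RE = re.compile(r"Exiting (\w+) process (\d+) killed by\s+signal (\d+)")


def signal_name(number):
    """Return the symbolic name for a signal number, or 'unknown'."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return "unknown"


def find_fatal_signals(log_text):
    """Return (process, pid, signal number, signal name) for each match."""
    results = []
    for match in KILLED_RE.finditer(log_text):
        process = match.group(1)
        pid = int(match.group(2))
        signum = int(match.group(3))
        results.append((process, pid, signum, signal_name(signum)))
    return results


if __name__ == "__main__":
    sample = (
        "heartbeat: 2005/05/19_12:35:49 ERROR: Exiting HBWRITE process 3019 "
        "killed by signal 11.\n"
    )
    for process, pid, signum, name in find_fatal_signals(sample):
        print(f"{process} (pid {pid}) died from signal {signum} ({name})")
```

Run against the real log file, this would list each crashed heartbeat
child process, which makes it easier to see whether the same process
type (HBWRITE here) segfaults on every restart.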

> > heartbeat: 2005/05/19_12:38:33 ERROR: Core heartbeat process died!
> > Restarting.
> > heartbeat: 2005/05/19_12:38:33 WARN: Shutdown delayed until current resource
> > activity finishes.
> > heartbeat: 2005/05/19_12:38:33 info: Status update for node
> > oailxwbts2.ssfhs.org: status active
> > heartbeat: 2005/05/19_12:38:33 info: Running /etc/ha.d/rc.d/status status
> > heartbeat: 2005/05/19_12:38:33 info: Running /etc/ha.d/rc.d/status status

[snip]

-- 
Horms
