RE: Ldirectord starting problems

To: "'LinuxVirtualServer.org users mailing list.'" <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: RE: Ldirectord starting problems
From: Deputy Michael <Michael.Deputy@xxxxxxxxx>
Date: Fri, 20 May 2005 08:39:10 -0500
I'm not actually getting errors from heartbeat.
oailxwbtst's ha-log:
____________________________________________________________________________
__________________________________________
heartbeat: 2005/05/19_12:35:47 info: **************************
heartbeat: 2005/05/19_12:35:47 info: Configuration validated. Starting
heartbeat 1.2.3.cvs.20050404
heartbeat: 2005/05/19_12:35:47 info: heartbeat: version 1.2.3.cvs.20050404
heartbeat: 2005/05/19_12:35:48 info: Heartbeat generation: 2949
heartbeat: 2005/05/19_12:35:48 info: UDP Broadcast heartbeat started on port
694 (694) interface eth1
heartbeat: 2005/05/19_12:35:48 info: ping heartbeat started.
heartbeat: 2005/05/19_12:35:48 info: pid 3013 locked in memory.
heartbeat: 2005/05/19_12:35:48 info: Local status now set to: 'up'
heartbeat: 2005/05/19_12:35:49 info: pid 3016 locked in memory.
heartbeat: 2005/05/19_12:35:49 info: pid 3019 locked in memory.
heartbeat: 2005/05/19_12:35:49 info: pid 3020 locked in memory.
heartbeat: 2005/05/19_12:35:49 info: pid 3018 locked in memory.
heartbeat: 2005/05/19_12:35:49 info: pid 3017 locked in memory.
heartbeat: 2005/05/19_12:35:49 ERROR: Exiting HBWRITE process 3019 killed by
signal 11.
heartbeat: 2005/05/19_12:35:49 ERROR: Core heartbeat process died!
Restarting.
heartbeat: 2005/05/19_12:35:49 WARN: Shutdown delayed until current resource
activity finishes.
heartbeat: 2005/05/19_12:35:49 info: Link oailxwbtst.ssfhs.org:eth1 up.
heartbeat: 2005/05/19_12:37:48 WARN: node oailxwbts2.ssfhs.org: is dead
heartbeat: 2005/05/19_12:37:48 WARN: No STONITH device configured.
heartbeat: 2005/05/19_12:37:48 WARN: Shared disks are not protected.
heartbeat: 2005/05/19_12:37:48 info: Resources being acquired from
oailxwbts2.ssfhs.org.
heartbeat: 2005/05/19_12:37:48 WARN: node 192.168.6.254: is dead
heartbeat: 2005/05/19_12:37:48 info: Local status now set to: 'active'
heartbeat: 2005/05/19_12:37:48 info: Starting child client
"/usr/lib/heartbeat/ipfail" (501,501)
heartbeat: 2005/05/19_12:37:48 info: Starting "/usr/lib/heartbeat/ipfail" as
uid 501  gid 501 (pid 3025)
heartbeat: 2005/05/19_12:37:48 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2005/05/19_12:37:48 info: Local Resource acquisition completed.
heartbeat: 2005/05/19_12:37:48 info: /usr/lib/heartbeat/mach_down:
nice_failback: foreign resources acquired
heartbeat: 2005/05/19_12:37:48 info: Initial resource acquisition complete
(T_RESOURCES)
heartbeat: 2005/05/19_12:37:48 info: mach_down takeover complete for node
oailxwbts2.ssfhs.org.
heartbeat: 2005/05/19_12:37:48 info: mach_down takeover complete.
heartbeat: 2005/05/19_12:37:48 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2005/05/19_12:37:48 info: Running /etc/ha.d/rc.d/ip-request-resp
ip-request-resp
heartbeat: 2005/05/19_12:37:48 received ip-request-resp 10.90.2.194 OK yes
heartbeat: 2005/05/19_12:37:48 info: Acquiring resource group:
oailxwbtst.ssfhs.org 10.90.2.194 ldirectord
heartbeat: 2005/05/19_12:37:48 info: Running /etc/ha.d/resource.d/IPaddr
10.90.2.194 start
heartbeat: 2005/05/19_12:37:48 info: Removing conflicting loopback lo:0.
heartbeat: 2005/05/19_12:37:48 info: /sbin/ifconfig lo:0 down
heartbeat: 2005/05/19_12:37:48 info: /sbin/route -n del -host 10.90.2.194
heartbeat: 2005/05/19_12:37:48 info: /sbin/ifconfig eth0:0 10.90.2.194
netmask 255.255.255.0   broadcast 10.90.2.255
heartbeat: 2005/05/19_12:37:48 info: Sending Gratuitous Arp for 10.90.2.194
on eth0:0 [eth0]
heartbeat: 2005/05/19_12:37:48 /usr/lib/heartbeat/send_arp -i 1010 -r 5 -p
/var/lib/heartbeat/rsctmp/send_arp/send_arp-10.90.2.194 eth0 10.90.2.194
auto 10.90.2.194 ffffffffffff
heartbeat: 2005/05/19_12:37:49 info: Running /etc/ha.d/resource.d/ldirectord
start
heartbeat: 2005/05/19_12:38:00 info: Local Resource acquisition completed.
(none)
heartbeat: 2005/05/19_12:38:00 info: local resource transition completed.
heartbeat: 2005/05/19_12:38:00 info: Heartbeat shutdown in progress. (3013)
heartbeat: 2005/05/19_12:38:00 info: Giving up all HA resources.
heartbeat: 2005/05/19_12:38:00 info: Releasing resource group:
oailxwbtst.ssfhs.org 10.90.2.194 ldirectord
heartbeat: 2005/05/19_12:38:00 info: Running /etc/ha.d/resource.d/ldirectord
stop
heartbeat: 2005/05/19_12:38:00 info: Running /etc/ha.d/resource.d/IPaddr
10.90.2.194 stop
heartbeat: 2005/05/19_12:38:00 info: /sbin/route -n del -host 10.90.2.194
heartbeat: 2005/05/19_12:38:00 info: /sbin/ifconfig eth0:0 down
heartbeat: 2005/05/19_12:38:00 info: Restoring loopback IP Address
10.90.2.194 on lo:0.
heartbeat: 2005/05/19_12:38:00 info: /sbin/ifconfig lo:0 10.90.2.194 netmask
255.255.255.255
heartbeat: 2005/05/19_12:38:00 info: IP Address 10.90.2.194 released
heartbeat: 2005/05/19_12:38:00 info: killing /usr/lib/heartbeat/ipfail
process group 3025 with signal 15
heartbeat: 2005/05/19_12:38:00 info: All HA resources relinquished.
heartbeat: 2005/05/19_12:38:00 info: killing /usr/lib/heartbeat/ipfail
process group 3025 with signal 15
heartbeat: 2005/05/19_12:38:01 info: killing HBFIFO process 3016 with signal
15
heartbeat: 2005/05/19_12:38:01 info: killing HBWRITE process 3017 with
signal 15
heartbeat: 2005/05/19_12:38:01 info: killing HBREAD process 3018 with signal
15
heartbeat: 2005/05/19_12:38:01 info: killing HBREAD process 3020 with signal
15
heartbeat: 2005/05/19_12:38:01 info: Core process 3020 exited. 4 remaining
heartbeat: 2005/05/19_12:38:01 info: Core process 3018 exited. 3 remaining
heartbeat: 2005/05/19_12:38:01 info: Core process 3016 exited. 2 remaining
heartbeat: 2005/05/19_12:38:01 info: Core process 3017 exited. 1 remaining
heartbeat: 2005/05/19_12:38:01 info: Heartbeat shutdown complete.
heartbeat: 2005/05/19_12:38:01 info: Heartbeat restart triggered.
heartbeat: 2005/05/19_12:38:01 info: Restarting heartbeat.
heartbeat: 2005/05/19_12:38:01 info: Performing heartbeat restart exec.
heartbeat: 2005/05/19_12:38:32 info: **************************
heartbeat: 2005/05/19_12:38:32 info: Configuration validated. Starting
heartbeat 1.2.3.cvs.20050404
heartbeat: 2005/05/19_12:38:32 info: heartbeat: version 1.2.3.cvs.20050404
heartbeat: 2005/05/19_12:38:32 info: Heartbeat generation: 2950
heartbeat: 2005/05/19_12:38:32 info: UDP Broadcast heartbeat started on port
694 (694) interface eth1
heartbeat: 2005/05/19_12:38:32 info: ping heartbeat started.
heartbeat: 2005/05/19_12:38:32 info: pid 3355 locked in memory.
heartbeat: 2005/05/19_12:38:32 info: Local status now set to: 'up'
heartbeat: 2005/05/19_12:38:33 info: pid 3357 locked in memory.
heartbeat: 2005/05/19_12:38:33 info: pid 3361 locked in memory.
heartbeat: 2005/05/19_12:38:33 info: pid 3358 locked in memory.
heartbeat: 2005/05/19_12:38:33 info: pid 3359 locked in memory.
heartbeat: 2005/05/19_12:38:33 info: pid 3360 locked in memory.
heartbeat: 2005/05/19_12:38:33 info: Link oailxwbts2.ssfhs.org:eth1 up.
heartbeat: 2005/05/19_12:38:33 info: Status update for node
oailxwbts2.ssfhs.org: status up
heartbeat: 2005/05/19_12:38:33 info: Link oailxwbtst.ssfhs.org:eth1 up.
heartbeat: 2005/05/19_12:38:33 ERROR: Exiting HBWRITE process 3360 killed by
signal 11.
heartbeat: 2005/05/19_12:38:33 ERROR: Core heartbeat process died!
Restarting.
heartbeat: 2005/05/19_12:38:33 WARN: Shutdown delayed until current resource
activity finishes.
heartbeat: 2005/05/19_12:38:33 info: Status update for node
oailxwbts2.ssfhs.org: status active
heartbeat: 2005/05/19_12:38:33 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2005/05/19_12:38:33 info: Running /etc/ha.d/rc.d/status status
____________________________________________________________________________
__________________________________________

oailxwbts2's ha-log:
____________________________________________________________________________
__________________________________________
heartbeat: 2005/05/19_12:38:03 info: **************************
heartbeat: 2005/05/19_12:38:03 info: Configuration validated. Starting
heartbeat 1.2.3.cvs.20050404
heartbeat: 2005/05/19_12:38:03 info: heartbeat: version 1.2.3.cvs.20050404
heartbeat: 2005/05/19_12:38:03 info: Heartbeat generation: 15
heartbeat: 2005/05/19_12:38:03 info: UDP Broadcast heartbeat started on port
694 (694) interface eth1
heartbeat: 2005/05/19_12:38:03 info: pid 2994 locked in memory.
heartbeat: 2005/05/19_12:38:03 info: Local status now set to: 'up'
heartbeat: 2005/05/19_12:38:04 info: pid 2997 locked in memory.
heartbeat: 2005/05/19_12:38:04 info: pid 2999 locked in memory.
heartbeat: 2005/05/19_12:38:04 info: pid 2998 locked in memory.
heartbeat: 2005/05/19_12:38:04 info: Link oailxwbts2.ssfhs.org:eth1 up.
heartbeat: 2005/05/19_12:38:33 info: Link oailxwbtst.ssfhs.org:eth1 up.
heartbeat: 2005/05/19_12:38:33 info: Status update for node
oailxwbtst.ssfhs.org: status up
heartbeat: 2005/05/19_12:38:33 info: Local status now set to: 'active'
heartbeat: 2005/05/19_12:38:33 info: Running /etc/ha.d/rc.d/status status
____________________________________________________________________________
__________________________________________
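
A quick way to pick the ERROR and WARN entries out of logs like the two above is a small filter. This is only a minimal Python sketch: the function name is made up for illustration, and it assumes every entry starts with "heartbeat: <date>_<time>" and that any other non-blank line is a wrapped continuation of the previous entry, as in the paste above.

```python
import re

# Start of a heartbeat log entry, e.g.
# "heartbeat: 2005/05/19_12:35:49 ERROR: Core heartbeat process died!"
ENTRY_START = re.compile(
    r"^heartbeat: (\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2}) (.*)$"
)


def problem_entries(lines):
    """Return (timestamp, message) pairs for ERROR and WARN entries.

    Lines that do not start a new entry are treated as wrapped
    continuations of the previous entry and joined with a space.
    Blank lines and underscore separator lines are ignored.
    """
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = ENTRY_START.match(line)
        if match:
            entries.append([match.group(1), match.group(2)])
            continue
        text = line.strip()
        if entries and text and not set(text) <= {"_"}:
            entries[-1][1] += " " + text
    return [
        (ts, msg)
        for ts, msg in entries
        if msg.startswith(("ERROR:", "WARN:"))
    ]


# A few lines copied from the first log above.
sample = [
    "heartbeat: 2005/05/19_12:35:49 info: pid 3017 locked in memory.",
    "heartbeat: 2005/05/19_12:35:49 ERROR: Exiting HBWRITE process 3019 killed by",
    "signal 11.",
    "heartbeat: 2005/05/19_12:35:49 ERROR: Core heartbeat process died!",
    "Restarting.",
]

for timestamp, message in problem_entries(sample):
    print(timestamp, message)
```

To run it against a real file, pass the open file object instead of the sample list; the location of ha-log depends on the logfile setting in use.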

The ldirectord.log is blank on both servers.

Failover and failback seem to work correctly if I lose the whole server (or
down the heartbeat NIC on the active server). But if I stop httpd on the
active server, no failover happens; I just get a "site did not answer" error
when I try to browse to the virtual site. I also do not get any load
balancing: if I connect from five machines, all five go to the same
server.
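
To confirm from the director whether each web server is answering on port 80 at all, a plain TCP connect test is enough. The following is a minimal Python sketch of that idea, not ldirectord's own check; the function names are hypothetical, and the real-server addresses have to be supplied by the caller since none are listed here.

```python
import socket


def port_answers(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts and timeouts.
        return False


def report(hosts, port=80):
    """Return one 'host:port up/DOWN' line per host, in order."""
    lines = []
    for host in hosts:
        state = "up" if port_answers(host, port) else "DOWN"
        lines.append(f"{host}:{port} {state}")
    return lines
```

Calling report() with the list of real-server addresses and printing the result, once with httpd running and once with it stopped, shows whether the failure is visible at the TCP level from the director.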

This was the first time since I built the servers that I actually went to
the console.  I received this error on both server consoles:

IPVS: set_ctl: Len92 != 68
Module is wrong version.
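
The two numbers in that message can be pulled out with a small parser. This is a minimal Python sketch with a made-up function name. It rests on an assumption that is not confirmed here: that the first number is the size of the structure the userspace tool passed in and the second is the size the loaded IPVS module expected, which would point to ipvsadm and the kernel module being built from different IPVS versions.

```python
import re

# Matches the console message above, e.g. "IPVS: set_ctl: Len92 != 68".
SET_CTL_MISMATCH = re.compile(r"IPVS: set_ctl: [Ll]en\s*(\d+) != (\d+)")


def parse_set_ctl_mismatch(message):
    """Return (received, expected) sizes in bytes, or None if no match.

    Assumption: the first number is what userspace sent and the second
    is what the kernel module expected.
    """
    match = SET_CTL_MISMATCH.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sizes = parse_set_ctl_mismatch("IPVS: set_ctl: Len92 != 68")
if sizes is not None:
    received, expected = sizes
    print(f"userspace sent {received} bytes, module expected {expected}")
```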

I'm headed out to the web to try to troubleshoot that now.

Michael Deputy
Senior Systems Engineer
Alverno Information Services
317-532-7800 x6287
Michael.Deputy@xxxxxxxxx
http://www.alverno.org


