RE: Ldirectord starting problems

To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: RE: Ldirectord starting problems
From: "Graham David Purcocks M.A.(Oxon.)" <grahamp@xxxxxxxxxxxxx>
Date: Fri, 20 May 2005 15:36:45 +0100
It's starting up and immediately being shut down, which suggests your
heartbeat config is incorrect, so it's releasing itself as master.

So I guess you need to show your heartbeat config next.
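
Before digging into the config, it may also help to pull just the WARN and
ERROR entries out of ha-log so the sequence leading up to the shutdown is
easier to see. Here is a minimal Python sketch, assuming the line layout
seen in the log quoted below ("heartbeat: <timestamp> <LEVEL>: <message>",
possibly wrapped and prefixed with "> " by the mailer). The names unquote
and problems are hypothetical helpers of mine, not part of heartbeat.

```python
import re

# Assumed layout, taken from the quoted ha-log, e.g.:
#   heartbeat: 2005/05/19_12:35:49 ERROR: Core heartbeat process died!
# A few entries carry no level at all, so the level group is optional.
LINE_RE = re.compile(
    r"^heartbeat: (?P<ts>\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2}) "
    r"(?:(?P<level>info|WARN|ERROR): )?(?P<msg>.*)$"
)


def unquote(line):
    """Strip leading '> ' mail-quote prefixes, if any."""
    return re.sub(r"^(?:>\s?)+", "", line)


def problems(lines, levels=("ERROR", "WARN")):
    """Yield (timestamp, level, message) for entries at the given levels.

    Lines that do not start a new entry are treated as wrapped
    continuations of the previous entry and joined with a space.
    """
    current = None
    for raw in lines:
        line = unquote(raw.rstrip("\n"))
        m = LINE_RE.match(line)
        if m:
            if current and current[1] in levels:
                yield tuple(current)
            current = [m.group("ts"), m.group("level"), m.group("msg")]
        elif current and line.strip():
            current[2] += " " + line.strip()
    if current and current[1] in levels:
        yield tuple(current)


if __name__ == "__main__":
    import sys

    # Usage: python ha_problems.py /path/to/ha-log
    with open(sys.argv[1]) as fh:
        for ts, level, msg in problems(fh):
            print(ts, level, msg)
```

Run against a saved copy of the log, this prints only the warnings and
errors in order, which makes it easier to spot what happens just before
heartbeat gives up its resources.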

On Fri, 2005-05-20 at 14:39, Deputy Michael wrote:
> I'm not actually getting errors from heartbeat.
> oailxwbtst's ha-log
> ____________________________________________________________________________
> __________________________________________
> heartbeat: 2005/05/19_12:35:47 info: **************************
> heartbeat: 2005/05/19_12:35:47 info: Configuration validated. Starting
> heartbeat 1.2.3.cvs.20050404
> heartbeat: 2005/05/19_12:35:47 info: heartbeat: version 1.2.3.cvs.20050404
> heartbeat: 2005/05/19_12:35:48 info: Heartbeat generation: 2949
> heartbeat: 2005/05/19_12:35:48 info: UDP Broadcast heartbeat started on port
> 694 (694) interface eth1
> heartbeat: 2005/05/19_12:35:48 info: ping heartbeat started.
> heartbeat: 2005/05/19_12:35:48 info: pid 3013 locked in memory.
> heartbeat: 2005/05/19_12:35:48 info: Local status now set to: 'up'
> heartbeat: 2005/05/19_12:35:49 info: pid 3016 locked in memory.
> heartbeat: 2005/05/19_12:35:49 info: pid 3019 locked in memory.
> heartbeat: 2005/05/19_12:35:49 info: pid 3020 locked in memory.
> heartbeat: 2005/05/19_12:35:49 info: pid 3018 locked in memory.
> heartbeat: 2005/05/19_12:35:49 info: pid 3017 locked in memory.
> heartbeat: 2005/05/19_12:35:49 ERROR: Exiting HBWRITE process 3019 killed by
> signal 11.
> heartbeat: 2005/05/19_12:35:49 ERROR: Core heartbeat process died!
> Restarting.
> heartbeat: 2005/05/19_12:35:49 WARN: Shutdown delayed until current resource
> activity finishes.
> heartbeat: 2005/05/19_12:35:49 info: Link oailxwbtst.ssfhs.org:eth1 up.
> heartbeat: 2005/05/19_12:37:48 WARN: node oailxwbts2.ssfhs.org: is dead
> heartbeat: 2005/05/19_12:37:48 WARN: No STONITH device configured.
> heartbeat: 2005/05/19_12:37:48 WARN: Shared disks are not protected.
> heartbeat: 2005/05/19_12:37:48 info: Resources being acquired from
> oailxwbts2.ssfhs.org.
> heartbeat: 2005/05/19_12:37:48 WARN: node 192.168.6.254: is dead
> heartbeat: 2005/05/19_12:37:48 info: Local status now set to: 'active'
> heartbeat: 2005/05/19_12:37:48 info: Starting child client
> "/usr/lib/heartbeat/ipfail" (501,501)
> heartbeat: 2005/05/19_12:37:48 info: Starting "/usr/lib/heartbeat/ipfail" as
> uid 501  gid 501 (pid 3025)
> heartbeat: 2005/05/19_12:37:48 info: Running /etc/ha.d/rc.d/status status
> heartbeat: 2005/05/19_12:37:48 info: Local Resource acquisition completed.
> heartbeat: 2005/05/19_12:37:48 info: /usr/lib/heartbeat/mach_down:
> nice_failback: foreign resources acquired
> heartbeat: 2005/05/19_12:37:48 info: Initial resource acquisition complete
> (T_RESOURCES)
> heartbeat: 2005/05/19_12:37:48 info: mach_down takeover complete for node
> oailxwbts2.ssfhs.org.
> heartbeat: 2005/05/19_12:37:48 info: mach_down takeover complete.
> heartbeat: 2005/05/19_12:37:48 info: Running /etc/ha.d/rc.d/status status
> heartbeat: 2005/05/19_12:37:48 info: Running /etc/ha.d/rc.d/ip-request-resp
> ip-request-resp
> heartbeat: 2005/05/19_12:37:48 received ip-request-resp 10.90.2.194 OK yes
> heartbeat: 2005/05/19_12:37:48 info: Acquiring resource group:
> oailxwbtst.ssfhs.org 10.90.2.194 ldirectord
> heartbeat: 2005/05/19_12:37:48 info: Running /etc/ha.d/resource.d/IPaddr
> 10.90.2.194 start
> heartbeat: 2005/05/19_12:37:48 info: Removing conflicting loopback lo:0.
> heartbeat: 2005/05/19_12:37:48 info: /sbin/ifconfig lo:0 down
> heartbeat: 2005/05/19_12:37:48 info: /sbin/route -n del -host 10.90.2.194
> heartbeat: 2005/05/19_12:37:48 info: /sbin/ifconfig eth0:0 10.90.2.194
> netmask 255.255.255.0 broadcast 10.90.2.255
> heartbeat: 2005/05/19_12:37:48 info: Sending Gratuitous Arp for 10.90.2.194
> on eth0:0 [eth0]
> heartbeat: 2005/05/19_12:37:48 /usr/lib/heartbeat/send_arp -i 1010 -r 5 -p
> /var/lib/heartbeat/rsctmp/send_arp/send_arp-10.90.2.194 eth0 10.90.2.194
> auto 10.90.2.194 ffffffffffff
> heartbeat: 2005/05/19_12:37:49 info: Running /etc/ha.d/resource.d/ldirectord
> start
> heartbeat: 2005/05/19_12:38:00 info: Local Resource acquisition completed.
> (none)
> heartbeat: 2005/05/19_12:38:00 info: local resource transition completed.
> heartbeat: 2005/05/19_12:38:00 info: Heartbeat shutdown in progress. (3013)
> heartbeat: 2005/05/19_12:38:00 info: Giving up all HA resources.
> heartbeat: 2005/05/19_12:38:00 info: Releasing resource group:
> oailxwbtst.ssfhs.org 10.90.2.194 ldirectord
> heartbeat: 2005/05/19_12:38:00 info: Running /etc/ha.d/resource.d/ldirectord
> stop
> heartbeat: 2005/05/19_12:38:00 info: Running /etc/ha.d/resource.d/IPaddr
> 10.90.2.194 stop
> heartbeat: 2005/05/19_12:38:00 info: /sbin/route -n del -host 10.90.2.194
> heartbeat: 2005/05/19_12:38:00 info: /sbin/ifconfig eth0:0 down
> heartbeat: 2005/05/19_12:38:00 info: Restoring loopback IP Address
> 10.90.2.194 on lo:0.
> heartbeat: 2005/05/19_12:38:00 info: /sbin/ifconfig lo:0 10.90.2.194 netmask
> 255.255.255.255
> heartbeat: 2005/05/19_12:38:00 info: IP Address 10.90.2.194 released
> heartbeat: 2005/05/19_12:38:00 info: killing /usr/lib/heartbeat/ipfail
> process group 3025 with signal 15
> heartbeat: 2005/05/19_12:38:00 info: All HA resources relinquished.
> heartbeat: 2005/05/19_12:38:00 info: killing /usr/lib/heartbeat/ipfail
> process group 3025 with signal 15
> heartbeat: 2005/05/19_12:38:01 info: killing HBFIFO process 3016 with signal
> 15
> heartbeat: 2005/05/19_12:38:01 info: killing HBWRITE process 3017 with
> signal 15
> heartbeat: 2005/05/19_12:38:01 info: killing HBREAD process 3018 with signal
> 15
> heartbeat: 2005/05/19_12:38:01 info: killing HBREAD process 3020 with signal
> 15
> heartbeat: 2005/05/19_12:38:01 info: Core process 3020 exited. 4 remaining
> heartbeat: 2005/05/19_12:38:01 info: Core process 3018 exited. 3 remaining
> heartbeat: 2005/05/19_12:38:01 info: Core process 3016 exited. 2 remaining
> heartbeat: 2005/05/19_12:38:01 info: Core process 3017 exited. 1 remaining
> heartbeat: 2005/05/19_12:38:01 info: Heartbeat shutdown complete.
> heartbeat: 2005/05/19_12:38:01 info: Heartbeat restart triggered.
> heartbeat: 2005/05/19_12:38:01 info: Restarting heartbeat.
> heartbeat: 2005/05/19_12:38:01 info: Performing heartbeat restart exec.
> heartbeat: 2005/05/19_12:38:32 info: **************************
> heartbeat: 2005/05/19_12:38:32 info: Configuration validated. Starting
> heartbeat 1.2.3.cvs.20050404
> heartbeat: 2005/05/19_12:38:32 info: heartbeat: version 1.2.3.cvs.20050404
> heartbeat: 2005/05/19_12:38:32 info: Heartbeat generation: 2950
> heartbeat: 2005/05/19_12:38:32 info: UDP Broadcast heartbeat started on port
> 694 (694) interface eth1
> heartbeat: 2005/05/19_12:38:32 info: ping heartbeat started.
> heartbeat: 2005/05/19_12:38:32 info: pid 3355 locked in memory.
> heartbeat: 2005/05/19_12:38:32 info: Local status now set to: 'up'
> heartbeat: 2005/05/19_12:38:33 info: pid 3357 locked in memory.
> heartbeat: 2005/05/19_12:38:33 info: pid 3361 locked in memory.
> heartbeat: 2005/05/19_12:38:33 info: pid 3358 locked in memory.
> heartbeat: 2005/05/19_12:38:33 info: pid 3359 locked in memory.
> heartbeat: 2005/05/19_12:38:33 info: pid 3360 locked in memory.
> heartbeat: 2005/05/19_12:38:33 info: Link oailxwbts2.ssfhs.org:eth1 up.
> heartbeat: 2005/05/19_12:38:33 info: Status update for node
> oailxwbts2.ssfhs.org: status up
> heartbeat: 2005/05/19_12:38:33 info: Link oailxwbtst.ssfhs.org:eth1 up.
> heartbeat: 2005/05/19_12:38:33 ERROR: Exiting HBWRITE process 3360 killed by
> signal 11.
> heartbeat: 2005/05/19_12:38:33 ERROR: Core heartbeat process died!
> Restarting.
> heartbeat: 2005/05/19_12:38:33 WARN: Shutdown delayed until current resource
> activity finishes.
> heartbeat: 2005/05/19_12:38:33 info: Status update for node
> oailxwbts2.ssfhs.org: status active
> heartbeat: 2005/05/19_12:38:33 info: Running /etc/ha.d/rc.d/status status
> heartbeat: 2005/05/19_12:38:33 info: Running /etc/ha.d/rc.d/status status
> ____________________________________________________________________________
> __________________________________________
> 
> OAILXWBTS2's ha-log is
> ____________________________________________________________________________
> __________________________________________
> heartbeat: 2005/05/19_12:38:03 info: **************************
> heartbeat: 2005/05/19_12:38:03 info: Configuration validated. Starting
> heartbeat 1.2.3.cvs.20050404
> heartbeat: 2005/05/19_12:38:03 info: heartbeat: version 1.2.3.cvs.20050404
> heartbeat: 2005/05/19_12:38:03 info: Heartbeat generation: 15
> heartbeat: 2005/05/19_12:38:03 info: UDP Broadcast heartbeat started on port
> 694 (694) interface eth1
> heartbeat: 2005/05/19_12:38:03 info: pid 2994 locked in memory.
> heartbeat: 2005/05/19_12:38:03 info: Local status now set to: 'up'
> heartbeat: 2005/05/19_12:38:04 info: pid 2997 locked in memory.
> heartbeat: 2005/05/19_12:38:04 info: pid 2999 locked in memory.
> heartbeat: 2005/05/19_12:38:04 info: pid 2998 locked in memory.
> heartbeat: 2005/05/19_12:38:04 info: Link oailxwbts2.ssfhs.org:eth1 up.
> heartbeat: 2005/05/19_12:38:33 info: Link oailxwbtst.ssfhs.org:eth1 up.
> heartbeat: 2005/05/19_12:38:33 info: Status update for node
> oailxwbtst.ssfhs.org: status up
> heartbeat: 2005/05/19_12:38:33 info: Local status now set to: 'active'
> heartbeat: 2005/05/19_12:38:33 info: Running /etc/ha.d/rc.d/status status
> ____________________________________________________________________________
> __________________________________________
> 
> The ldirector.log is blank on both servers
> 
> Failover/failback seems to work correctly if I lose the whole server (or
> down the heartbeat NIC on the active server). But if I stop httpd on the
> active server, no failover happens. I just get a "site did not answer" error
> message when I try to browse to the virtual site. I also do not get any
> load balancing. If I connect from five machines, all five go to the same
> server.
> 
> This was the first time since I built the servers that I actually went to
> the console.  I received this error on both server consoles:
> 
> IPVS: set_ctl: Len92 != 68
> Module is wrong version.
> 
> I'm headed out to the web to try to troubleshoot that now.
> 
> Michael Deputy
> Senior Systems Engineer
> Alverno Information Services
> 317-532-7800 x6287
> Michael.Deputy@xxxxxxxxx
> http://www.alverno.org
> 
> 