RE: Ldirectord starting problems

To: "'LinuxVirtualServer.org users mailing list.'" <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: RE: Ldirectord starting problems
From: Deputy Michael <Michael.Deputy@xxxxxxxxx>
Date: Fri, 20 May 2005 15:33:30 -0500
I started following the RHEL-4 thread.  CentOS 4 is a clone of RHEL 4, so I
rebuilt my two machines with CentOS 3.4 (a clone of RHEL 3), installed the
software, copied my config files, and rebooted.  I'm still testing, but it
seems to be working.  My latest test was to ping the VIP while bouncing each
of the "real" servers; ping never dropped a packet.  I'm guessing the IPVS
errors on the 2.6 kernel were my problem.
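
For anyone repeating the ping test, here is a rough sketch of how the result
could be checked from a script instead of by eye.  It only parses the summary
line that Linux (iputils) ping prints at the end of a run.  The sample text is
made up for illustration and is not output captured from these machines, and
the function name is my own.

```python
import re

# Matches the summary line printed by iputils ping, for example:
# "10 packets transmitted, 10 received, 0% packet loss, time 9012ms"
SUMMARY = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received"
    r".*?(\d+(?:\.\d+)?)% packet loss"
)


def packet_loss(ping_output):
    """Return (sent, received, loss_percent), or None if no summary is found."""
    m = SUMMARY.search(ping_output)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), float(m.group(3))


# Illustrative sample only (assumed format, hypothetical numbers).
sample = """\
--- 10.90.2.194 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9012ms
"""

print(packet_loss(sample))
```

Feeding it the saved output of a ping against the VIP, taken while the real
servers are being restarted, gives a simple pass/fail number for the test.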

Thanks for all your assistance.

Dep

-----Original Message-----
From: Graham David Purcocks M.A.(Oxon.) [mailto:grahamp@xxxxxxxxxxxxx]
Sent: Friday, May 20, 2005 9:37 AM
To: LinuxVirtualServer.org users mailing list.
Subject: RE: Ldirectord starting problems


It's starting up and immediately being shut down, which suggests your
heartbeat config is incorrect, so it's releasing itself as master.

So I guess you need to show your heartbeat config next.
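
In the meantime, a quick way to pull the ERROR lines out of an ha-log so the
events just before the shutdown stand out.  This is only a sketch: the sample
lines are copied (with the mail wrapping undone) from the log quoted below,
and the regular expression assumes the "heartbeat: <timestamp> <level>:
<message>" layout seen there.  The function name is my own.

```python
import re

# Sample lines taken from the ha-log quoted in this thread.
LOG = """\
heartbeat: 2005/05/19_12:35:49 info: pid 3017 locked in memory.
heartbeat: 2005/05/19_12:35:49 ERROR: Exiting HBWRITE process 3019 killed by signal 11.
heartbeat: 2005/05/19_12:35:49 ERROR: Core heartbeat process died! Restarting.
heartbeat: 2005/05/19_12:38:00 info: Heartbeat shutdown in progress. (3013)
"""

# Assumed line layout: "heartbeat: <timestamp> <level>: <message>"
LINE = re.compile(r"^heartbeat: (\S+) (info|WARN|ERROR): (.*)$")


def errors(log_text):
    """Return a list of (timestamp, message) for every ERROR line."""
    found = []
    for line in log_text.splitlines():
        m = LINE.match(line)
        if m and m.group(2) == "ERROR":
            found.append((m.group(1), m.group(3)))
    return found


for stamp, msg in errors(LOG):
    print(stamp, msg)
```

On the sample this reports the HBWRITE process being killed by signal 11
(a segmentation fault) followed by the core process restart.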

On Fri, 2005-05-20 at 14:39, Deputy Michael wrote:
> I'm not actually getting errors from heartbeat.
> oailxwbtst's ha-log is
> ______________________________________________________________________
> heartbeat: 2005/05/19_12:35:47 info: **************************
> heartbeat: 2005/05/19_12:35:47 info: Configuration validated. Starting
> heartbeat 1.2.3.cvs.20050404
> heartbeat: 2005/05/19_12:35:47 info: heartbeat: version 1.2.3.cvs.20050404
> heartbeat: 2005/05/19_12:35:48 info: Heartbeat generation: 2949
> heartbeat: 2005/05/19_12:35:48 info: UDP Broadcast heartbeat started on
> port 694 (694) interface eth1
> heartbeat: 2005/05/19_12:35:48 info: ping heartbeat started.
> heartbeat: 2005/05/19_12:35:48 info: pid 3013 locked in memory.
> heartbeat: 2005/05/19_12:35:48 info: Local status now set to: 'up'
> heartbeat: 2005/05/19_12:35:49 info: pid 3016 locked in memory.
> heartbeat: 2005/05/19_12:35:49 info: pid 3019 locked in memory.
> heartbeat: 2005/05/19_12:35:49 info: pid 3020 locked in memory.
> heartbeat: 2005/05/19_12:35:49 info: pid 3018 locked in memory.
> heartbeat: 2005/05/19_12:35:49 info: pid 3017 locked in memory.
> heartbeat: 2005/05/19_12:35:49 ERROR: Exiting HBWRITE process 3019 killed
> by signal 11.
> heartbeat: 2005/05/19_12:35:49 ERROR: Core heartbeat process died!
> Restarting.
> heartbeat: 2005/05/19_12:35:49 WARN: Shutdown delayed until current
> resource activity finishes.
> heartbeat: 2005/05/19_12:35:49 info: Link oailxwbtst.ssfhs.org:eth1 up.
> heartbeat: 2005/05/19_12:37:48 WARN: node oailxwbts2.ssfhs.org: is dead
> heartbeat: 2005/05/19_12:37:48 WARN: No STONITH device configured.
> heartbeat: 2005/05/19_12:37:48 WARN: Shared disks are not protected.
> heartbeat: 2005/05/19_12:37:48 info: Resources being acquired from
> oailxwbts2.ssfhs.org.
> heartbeat: 2005/05/19_12:37:48 WARN: node 192.168.6.254: is dead
> heartbeat: 2005/05/19_12:37:48 info: Local status now set to: 'active'
> heartbeat: 2005/05/19_12:37:48 info: Starting child client
> "/usr/lib/heartbeat/ipfail" (501,501)
> heartbeat: 2005/05/19_12:37:48 info: Starting "/usr/lib/heartbeat/ipfail"
> as uid 501  gid 501 (pid 3025)
> heartbeat: 2005/05/19_12:37:48 info: Running /etc/ha.d/rc.d/status status
> heartbeat: 2005/05/19_12:37:48 info: Local Resource acquisition completed.
> heartbeat: 2005/05/19_12:37:48 info: /usr/lib/heartbeat/mach_down:
> nice_failback: foreign resources acquired
> heartbeat: 2005/05/19_12:37:48 info: Initial resource acquisition complete
> (T_RESOURCES)
> heartbeat: 2005/05/19_12:37:48 info: mach_down takeover complete for node
> oailxwbts2.ssfhs.org.
> heartbeat: 2005/05/19_12:37:48 info: mach_down takeover complete.
> heartbeat: 2005/05/19_12:37:48 info: Running /etc/ha.d/rc.d/status status
> heartbeat: 2005/05/19_12:37:48 info: Running
> /etc/ha.d/rc.d/ip-request-resp ip-request-resp
> heartbeat: 2005/05/19_12:37:48 received ip-request-resp 10.90.2.194 OK yes
> heartbeat: 2005/05/19_12:37:48 info: Acquiring resource group:
> oailxwbtst.ssfhs.org 10.90.2.194 ldirectord
> heartbeat: 2005/05/19_12:37:48 info: Running /etc/ha.d/resource.d/IPaddr
> 10.90.2.194 start
> heartbeat: 2005/05/19_12:37:48 info: Removing conflicting loopback lo:0.
> heartbeat: 2005/05/19_12:37:48 info: /sbin/ifconfig lo:0 down
> heartbeat: 2005/05/19_12:37:48 info: /sbin/route -n del -host 10.90.2.194
> heartbeat: 2005/05/19_12:37:48 info: /sbin/ifconfig eth0:0 10.90.2.194
> netmask 255.255.255.0 broadcast 10.90.2.255
> heartbeat: 2005/05/19_12:37:48 info: Sending Gratuitous Arp for
> 10.90.2.194 on eth0:0 [eth0]
> heartbeat: 2005/05/19_12:37:48 /usr/lib/heartbeat/send_arp -i 1010 -r 5 -p
> /var/lib/heartbeat/rsctmp/send_arp/send_arp-10.90.2.194 eth0 10.90.2.194
> auto 10.90.2.194 ffffffffffff
> heartbeat: 2005/05/19_12:37:49 info: Running
> /etc/ha.d/resource.d/ldirectord start
> heartbeat: 2005/05/19_12:38:00 info: Local Resource acquisition completed.
> (none)
> heartbeat: 2005/05/19_12:38:00 info: local resource transition completed.
> heartbeat: 2005/05/19_12:38:00 info: Heartbeat shutdown in progress.
> (3013)
> heartbeat: 2005/05/19_12:38:00 info: Giving up all HA resources.
> heartbeat: 2005/05/19_12:38:00 info: Releasing resource group:
> oailxwbtst.ssfhs.org 10.90.2.194 ldirectord
> heartbeat: 2005/05/19_12:38:00 info: Running
> /etc/ha.d/resource.d/ldirectord stop
> heartbeat: 2005/05/19_12:38:00 info: Running /etc/ha.d/resource.d/IPaddr
> 10.90.2.194 stop
> heartbeat: 2005/05/19_12:38:00 info: /sbin/route -n del -host 10.90.2.194
> heartbeat: 2005/05/19_12:38:00 info: /sbin/ifconfig eth0:0 down
> heartbeat: 2005/05/19_12:38:00 info: Restoring loopback IP Address
> 10.90.2.194 on lo:0.
> heartbeat: 2005/05/19_12:38:00 info: /sbin/ifconfig lo:0 10.90.2.194
> netmask 255.255.255.255
> heartbeat: 2005/05/19_12:38:00 info: IP Address 10.90.2.194 released
> heartbeat: 2005/05/19_12:38:00 info: killing /usr/lib/heartbeat/ipfail
> process group 3025 with signal 15
> heartbeat: 2005/05/19_12:38:00 info: All HA resources relinquished.
> heartbeat: 2005/05/19_12:38:00 info: killing /usr/lib/heartbeat/ipfail
> process group 3025 with signal 15
> heartbeat: 2005/05/19_12:38:01 info: killing HBFIFO process 3016 with
> signal 15
> heartbeat: 2005/05/19_12:38:01 info: killing HBWRITE process 3017 with
> signal 15
> heartbeat: 2005/05/19_12:38:01 info: killing HBREAD process 3018 with
> signal 15
> heartbeat: 2005/05/19_12:38:01 info: killing HBREAD process 3020 with
> signal 15
> heartbeat: 2005/05/19_12:38:01 info: Core process 3020 exited. 4 remaining
> heartbeat: 2005/05/19_12:38:01 info: Core process 3018 exited. 3 remaining
> heartbeat: 2005/05/19_12:38:01 info: Core process 3016 exited. 2 remaining
> heartbeat: 2005/05/19_12:38:01 info: Core process 3017 exited. 1 remaining
> heartbeat: 2005/05/19_12:38:01 info: Heartbeat shutdown complete.
> heartbeat: 2005/05/19_12:38:01 info: Heartbeat restart triggered.
> heartbeat: 2005/05/19_12:38:01 info: Restarting heartbeat.
> heartbeat: 2005/05/19_12:38:01 info: Performing heartbeat restart exec.
> heartbeat: 2005/05/19_12:38:32 info: **************************
> heartbeat: 2005/05/19_12:38:32 info: Configuration validated. Starting
> heartbeat 1.2.3.cvs.20050404
> heartbeat: 2005/05/19_12:38:32 info: heartbeat: version 1.2.3.cvs.20050404
> heartbeat: 2005/05/19_12:38:32 info: Heartbeat generation: 2950
> heartbeat: 2005/05/19_12:38:32 info: UDP Broadcast heartbeat started on
> port 694 (694) interface eth1
> heartbeat: 2005/05/19_12:38:32 info: ping heartbeat started.
> heartbeat: 2005/05/19_12:38:32 info: pid 3355 locked in memory.
> heartbeat: 2005/05/19_12:38:32 info: Local status now set to: 'up'
> heartbeat: 2005/05/19_12:38:33 info: pid 3357 locked in memory.
> heartbeat: 2005/05/19_12:38:33 info: pid 3361 locked in memory.
> heartbeat: 2005/05/19_12:38:33 info: pid 3358 locked in memory.
> heartbeat: 2005/05/19_12:38:33 info: pid 3359 locked in memory.
> heartbeat: 2005/05/19_12:38:33 info: pid 3360 locked in memory.
> heartbeat: 2005/05/19_12:38:33 info: Link oailxwbts2.ssfhs.org:eth1 up.
> heartbeat: 2005/05/19_12:38:33 info: Status update for node
> oailxwbts2.ssfhs.org: status up
> heartbeat: 2005/05/19_12:38:33 info: Link oailxwbtst.ssfhs.org:eth1 up.
> heartbeat: 2005/05/19_12:38:33 ERROR: Exiting HBWRITE process 3360 killed
> by signal 11.
> heartbeat: 2005/05/19_12:38:33 ERROR: Core heartbeat process died!
> Restarting.
> heartbeat: 2005/05/19_12:38:33 WARN: Shutdown delayed until current
> resource activity finishes.
> heartbeat: 2005/05/19_12:38:33 info: Status update for node
> oailxwbts2.ssfhs.org: status active
> heartbeat: 2005/05/19_12:38:33 info: Running /etc/ha.d/rc.d/status status
> heartbeat: 2005/05/19_12:38:33 info: Running /etc/ha.d/rc.d/status status
> ______________________________________________________________________
>
> OAILXWBTS2's ha-log is
> ______________________________________________________________________
> heartbeat: 2005/05/19_12:38:03 info: **************************
> heartbeat: 2005/05/19_12:38:03 info: Configuration validated. Starting
> heartbeat 1.2.3.cvs.20050404
> heartbeat: 2005/05/19_12:38:03 info: heartbeat: version 1.2.3.cvs.20050404
> heartbeat: 2005/05/19_12:38:03 info: Heartbeat generation: 15
> heartbeat: 2005/05/19_12:38:03 info: UDP Broadcast heartbeat started on
> port 694 (694) interface eth1
> heartbeat: 2005/05/19_12:38:03 info: pid 2994 locked in memory.
> heartbeat: 2005/05/19_12:38:03 info: Local status now set to: 'up'
> heartbeat: 2005/05/19_12:38:04 info: pid 2997 locked in memory.
> heartbeat: 2005/05/19_12:38:04 info: pid 2999 locked in memory.
> heartbeat: 2005/05/19_12:38:04 info: pid 2998 locked in memory.
> heartbeat: 2005/05/19_12:38:04 info: Link oailxwbts2.ssfhs.org:eth1 up.
> heartbeat: 2005/05/19_12:38:33 info: Link oailxwbtst.ssfhs.org:eth1 up.
> heartbeat: 2005/05/19_12:38:33 info: Status update for node
> oailxwbtst.ssfhs.org: status up
> heartbeat: 2005/05/19_12:38:33 info: Local status now set to: 'active'
> heartbeat: 2005/05/19_12:38:33 info: Running /etc/ha.d/rc.d/status status
> ______________________________________________________________________
> 
> The ldirector.log is blank on both servers
> 
> Failover/failback seems to work correctly if I lose the whole server (or
> down the heartbeat NIC on the active server).  But if I stop httpd on the
> active server, no failover happens; I just get the "site did not answer"
> error message when I try to browse to the virtual site.  I also do not get
> any load balancing: if I connect from five machines, all five go to the
> same server.
> 
> This was the first time since I built the servers that I actually went to
> the console.  I received this error on both server consoles:
> 
> IPVS: set_ctl: Len92 != 68
> Module is wrong version.
> 
> I'm headed out to the web to try to troubleshoot that now.
> 
> Michael Deputy
> Senior Systems Engineer
> Alverno Information Services
> 317-532-7800 x6287
> Michael.Deputy@xxxxxxxxx
> http://www.alverno.org
> 
> 
> ______________________________________________________________________
> The information contained in this email and any accompanying documents is
> intended for the sole use of the recipient to whom it is addressed, and
> may contain information that is privileged, confidential, and prohibited
> from disclosure under applicable law. If you are not the intended
> recipient, or authorized to receive this on behalf of the recipient, you
> are hereby notified that any review, use, disclosure, copying, or
> distribution is prohibited. If you are not the intended recipient(s),
> please contact the sender by e-mail and destroy all copies of the original
> message. Thank you.
> _______________________________________________
> LinuxVirtualServer.org mailing list - lvs-users@xxxxxxxxxxxxxxxxxxxxxx
> Send requests to lvs-users-request@xxxxxxxxxxxxxxxxxxxxxx
> or go to http://www.in-addr.de/mailman/listinfo/lvs-users
-- 

