[lvs-users] Heartbeat and ldirector taking a long time to change over.

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: [lvs-users] Heartbeat and ldirector taking a long time to change over.
From: Eric Renfro <erenfro@xxxxxxxxxxx>
Date: Mon, 21 Dec 2009 11:21:58 -0500
Hello,

I'm trying to resolve a problem I have setting up a pair of LVS
load-balancing servers using heartbeat and ldirectord under Gentoo.

I am using heartbeat 2.0.8 on both servers. The heartbeat and
ldirectord setup is not very extensive, but it should be working better
than it is. I will provide the complete configurations, minus the IPs
themselves, but to explain the problem up front: the issue I'm having
is rather strange.

Our servers are simply named network1 and network2, and I will use
those names to explain the issue.

I discover the issue when I shut down the heartbeat process on either
network1 or network2: it successfully releases the IP and passes it on
to the other server to take over. It does this rather quickly, as
expected; however, when ldirectord is brought up, the problems begin.
We have two clusters of three webservers each, on both the http and
https ports. On network1, it immediately brings up the first cluster
that was set up, with all three RIP nodes active but inaccessible. All
the others are weighted to 0 under a weight-based setup; otherwise they
are non-existent and traffic initially goes to the fallback server RIP.
For about 5-10 minutes, the replacement heartbeat+ldirectord server has
a heavy CPU load, with ksoftirqd/0 and ksoftirqd/1 being the culprits.
atop confirms this, showing 3 IRQs at 200%, 100%, and 100% for those
5-10 minutes.
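
One way to narrow down which kind of softirq is behind the ksoftirqd
load is to compare two snapshots of /proc/softirqs taken a few seconds
apart. The following is only a minimal Python sketch, assuming the
running kernel exposes /proc/softirqs (older kernels may not); the
function names are made up for illustration and nothing in it is
specific to heartbeat or ldirectord.

```python
import time


def parse_softirqs(text):
    """Parse /proc/softirqs content into {softirq name: total over all CPUs}."""
    totals = {}
    lines = text.strip().splitlines()
    for line in lines[1:]:  # the first line is the CPU0 CPU1 ... header row
        name, _, counts = line.partition(":")
        totals[name.strip()] = sum(int(c) for c in counts.split())
    return totals


def softirq_deltas(before, after):
    """Return (name, increase) pairs between two snapshots, largest first."""
    deltas = {name: after[name] - before.get(name, 0) for name in after}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


def sample(path="/proc/softirqs", interval=5.0):
    """Take two snapshots `interval` seconds apart and return the deltas."""
    with open(path) as f:
        before = parse_softirqs(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_softirqs(f.read())
    return softirq_deltas(before, after)
```

Calling sample() on the director during the slow period would list the
softirq types by how fast their counters grow, which may hint at
whether the load is network receive processing or something else.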

Once that all clears up and things go back to normal, the ipvs routes
show up almost instantaneously and, furthermore, actually work.

I do not know what is causing this and would appreciate some help
resolving it.

Following are the configuration files used. The virtual IPs are
replaced by xx.xx.101.13 and xx.xx.101.16, since there are two VIPs
involved. The related RIPs are masked similarly: Cluster 1
(xx.xx.101.227, xx.xx.101.226, xx.xx.101.224) and Cluster 2
(xx.xx.108.102, xx.xx.101.183, xx.xx.101.184), for 6 servers in total
across two different clusters. The IPs of the network servers
themselves are network1 (xx.xx.101.153) and network2 (xx.xx.108.203).

ha.d/haresources:

network1.ourserver.com   xx.xx.101.13/24/eth0 ldirectord
network1.ourserver.com    xx.xx.101.16/24/eth0 ldirectord


ha.cf:

logfacility     local0
keepalive 2
deadtime 30
warntime 10
bcast eth1
auto_failback on
node    network1.ourserver.com
node    network2.ourserver.com


ldirectord.cf:

checktimeout=3
checkinterval=5
#negotiatetimeout=5
autoreload=yes
logfile="local0"
quiescent=no

virtual = xx.xx.101.13:80
        fallback = xx.xx.101.13:80 gate
        real = xx.xx.101.227:80 gate
        real = xx.xx.101.226:80 gate
        real = xx.xx.101.224:80 gate
        scheduler = lc
        persistent = 7200
        protocol = tcp
        service = http
        httpmethod = HEAD
        request = "/"
        checktype = negotiate

virtual = xx.xx.101.13:443
        fallback = xx.xx.101.13:443 gate
        real = xx.xx.101.227:443 gate
        real = xx.xx.101.226:443 gate
        real = xx.xx.101.224:443 gate
        scheduler = lc
        persistent = 7200
        protocol = tcp
        service = https
        httpmethod = HEAD
        request = "/"
        checktype = negotiate

virtual = xx.xx.101.16:80
        fallback = xx.xx.101.16:80 gate
        real = xx.xx.108.102:80 gate
        real = xx.xx.101.183:80 gate
        real = xx.xx.101.184:80 gate
        scheduler = lc
        persistent = 7200
        protocol = tcp
        service = http
        httpmethod = HEAD
        request = "/"
        checktype = negotiate

virtual = xx.xx.101.16:443
        fallback = xx.xx.101.16:443 gate
        real = xx.xx.108.102:443 gate
        real = xx.xx.101.183:443 gate
        real = xx.xx.101.184:443 gate
        scheduler = lc
        persistent = 7200
        protocol = tcp
        service = https
        httpmethod = HEAD
        request = "/"
        checktype = negotiate
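
To see whether any single real server is slow to answer the kind of
check configured above, the checks can be roughly approximated by hand.
The sketch below is an assumption-laden stand-in, not ldirectord's own
logic: it reads the virtual/real entries from ldirectord.cf-style text
and times one HEAD request for "/" against each real server with a
3-second timeout (mirroring checktimeout=3). The helper names are
hypothetical, certificate verification is disabled for the https
checks, and the masked xx.xx addresses would need to be the real ones.

```python
import http.client
import ssl
import time


def parse_real_servers(conf_text):
    """Return [(virtual, real_host, real_port)] from ldirectord.cf-style text."""
    servers, virtual = [], None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "virtual":
            virtual = value
        elif key == "real" and virtual is not None:
            # value looks like "host:port gate"; keep only host and port
            host, port = value.split()[0].rsplit(":", 1)
            servers.append((virtual, host, int(port)))
    return servers


def timed_head(host, port, timeout=3.0, path="/"):
    """Time one HEAD request; return (seconds, HTTP status or error text)."""
    start = time.monotonic()
    try:
        if port == 443:
            ctx = ssl.create_default_context()
            ctx.check_hostname = False       # assumption: skip verification,
            ctx.verify_mode = ssl.CERT_NONE  # we only care about timing here
            conn = http.client.HTTPSConnection(host, port, timeout=timeout,
                                               context=ctx)
        else:
            conn = http.client.HTTPConnection(host, port, timeout=timeout)
        conn.request("HEAD", path)
        result = conn.getresponse().status
        conn.close()
    except (OSError, http.client.HTTPException) as exc:
        result = f"error: {exc}"
    return time.monotonic() - start, result
```

Feeding the contents of /etc/ha.d/ldirectord.cf to parse_real_servers()
and calling timed_head(host, port) for each entry would show which
checks, if any, run into the timeout instead of answering promptly.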


Here is a log of what happens when network2 is the active router and is
shut down while network1's heartbeat is in standby mode, waiting to
take over:

Dec 21 05:43:03 network1 heartbeat: [26940]: info: Received shutdown
notice from 'network2.ourserver.com'.
Dec 21 05:43:03 network1 heartbeat: [26940]: info: Resources being
acquired from network2.ourserver.com.
Dec 21 05:43:03 network1 heartbeat: [26959]: info: acquire all HA
resources (standby).
Dec 21 05:43:03 network1 ResourceManager[26973]: info: Acquiring
resource group: network1.ourserver.com xx.xx.101.13/24/eth0 ldirectord
Dec 21 05:43:03 network1 IPaddr[27020]: INFO:  Resource is stopped
Dec 21 05:43:03 network1 IPaddr[27021]: INFO:  Resource is stopped
Dec 21 05:43:03 network1 ResourceManager[26973]: info: Running
/etc/ha.d/resource.d/IPaddr xx.xx.101.13/24/eth0 start
Dec 21 05:43:03 network1 IPaddr[27163]: INFO: Using calculated netmask
for xx.xx.101.13: 255.255.255.0
Dec 21 05:43:03 network1 IPaddr[27147]: INFO:  Resource is stopped
Dec 21 05:43:03 network1 IPaddr[27163]: DEBUG: Using calculated
broadcast for xx.xx.101.13: xx.xx.101.255
Dec 21 05:43:03 network1 heartbeat: [26961]: info: Local Resource
acquisition completed.
Dec 21 05:43:03 network1 heartbeat: [26940]: info: Initial resource
acquisition complete (T_RESOURCES(us))
Dec 21 05:43:03 network1 IPaddr[27163]: INFO: eval /sbin/ifconfig eth0:0
xx.xx.101.13 netmask 255.255.255.0 broadcast xx.xx.101.255
Dec 21 05:43:03 network1 IPaddr[27163]: DEBUG: Sending Gratuitous Arp
for xx.xx.101.13 on eth0:0 [eth0]
Dec 21 05:43:03 network1 IPaddr[27120]: INFO:  Success
Dec 21 05:43:03 network1 ldirectord[27273]: Invoking ldirectord invoked
as: /etc/ha.d/resource.d/ldirectord status
Dec 21 05:43:03 network1 ldirectord[27273]: ldirectord is stopped for
/etc/ha.d/ldirectord.cf
Dec 21 05:43:03 network1 ldirectord[27273]: Exiting with exit_status 3:
Exiting from ldirectord status
Dec 21 05:43:03 network1 ResourceManager[26973]: info: Running
/etc/ha.d/resource.d/ldirectord  start
Dec 21 05:43:03 network1 ldirectord[27290]: Invoking ldirectord invoked
as: /etc/ha.d/resource.d/ldirectord start
Dec 21 05:43:03 network1 ldirectord[27290]: Starting Linux Director
v1.186 as daemon
Dec 21 05:43:03 network1 ldirectord[27292]: Added virtual server:
xx.xx.101.13:80
Dec 21 05:43:03 network1 ldirectord[27292]: Added virtual server:
xx.xx.101.13:443
Dec 21 05:43:03 network1 ldirectord[27292]: Added virtual server:
xx.xx.101.16:80
Dec 21 05:43:03 network1 ldirectord[27292]: Added virtual server:
xx.xx.101.16:443
Dec 21 05:43:03 network1 ldirectord[27292]: Added fallback server:
xx.xx.101.13:80 (xx.xx.101.13:80) (Weight set to 1)
Dec 21 05:43:03 network1 ldirectord[27292]: Added fallback server:
xx.xx.101.13:443 (xx.xx.101.13:443) (Weight set to 1)
Dec 21 05:43:03 network1 ResourceManager[27299]: info: Acquiring
resource group: network1.ourserver.com xx.xx.101.16/24/eth0 ldirectord
Dec 21 05:43:03 network1 ldirectord[27292]: Added fallback server:
xx.xx.101.16:80 (xx.xx.101.16:80) (Weight set to 1)
Dec 21 05:43:03 network1 ldirectord[27292]: Added fallback server:
xx.xx.101.16:443 (xx.xx.101.16:443) (Weight set to 1)
Dec 21 05:43:03 network1 IPaddr[27336]: INFO:  Resource is stopped
Dec 21 05:43:03 network1 ResourceManager[27299]: info: Running
/etc/ha.d/resource.d/IPaddr xx.xx.101.16/24/eth0 start
Dec 21 05:43:03 network1 IPaddr[27423]: INFO: Using calculated netmask
for xx.xx.101.16: 255.255.255.0
Dec 21 05:43:03 network1 IPaddr[27423]: DEBUG: Using calculated
broadcast for xx.xx.101.16: xx.xx.101.255
Dec 21 05:43:03 network1 IPaddr[27423]: INFO: eval /sbin/ifconfig eth0:1
xx.xx.101.16 netmask 255.255.255.0 broadcast xx.xx.101.255
Dec 21 05:43:03 network1 IPaddr[27423]: DEBUG: Sending Gratuitous Arp
for xx.xx.101.16 on eth0:1 [eth0]
Dec 21 05:43:03 network1 IPaddr[27402]: INFO:  Success
Dec 21 05:43:03 network1 ldirectord[27506]: Invoking ldirectord invoked
as: /etc/ha.d/resource.d/ldirectord status
Dec 21 05:43:03 network1 ldirectord[27506]: ldirectord for
/etc/ha.d/ldirectord.cf is running with pid: 27292
Dec 21 05:43:03 network1 ldirectord[27506]: Exiting from ldirectord status
Dec 21 05:43:03 network1 ResourceManager[27299]: info: Running
/etc/ha.d/resource.d/ldirectord  start
Dec 21 05:43:04 network1 ldirectord[27523]: Invoking ldirectord invoked
as: /etc/ha.d/resource.d/ldirectord start
Dec 21 05:43:04 network1 heartbeat: [26959]: info: all HA resource
acquisition completed (standby).
Dec 21 05:43:04 network1 heartbeat: [26940]: info: Standby resource
acquisition done [all].
Dec 21 05:43:04 network1 harc[27527]: info: Running
/etc/ha.d/rc.d/status status
Dec 21 05:43:04 network1 mach_down[27537]: info:
/usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
Dec 21 05:43:04 network1 mach_down[27537]: info: mach_down takeover
complete for node network2.ourserver.com.
Dec 21 05:43:04 network1 heartbeat: [26940]: info: mach_down takeover
complete.
Dec 21 05:43:04 network1 harc[27565]: info: Running
/etc/ha.d/rc.d/ip-request-resp ip-request-resp
Dec 21 05:43:04 network1 ip-request-resp[27565]: received
ip-request-resp xx.xx.101.13/24/eth0 OK yes
Dec 21 05:43:04 network1 ResourceManager[27580]: info: Acquiring
resource group: network1.ourserver.com xx.xx.101.13/24/eth0 ldirectord
Dec 21 05:43:04 network1 IPaddr[27604]: INFO:  Running OK
Dec 21 05:43:04 network1 ldirectord[27650]: Invoking ldirectord invoked
as: /etc/ha.d/resource.d/ldirectord status
Dec 21 05:43:04 network1 ldirectord[27650]: ldirectord for
/etc/ha.d/ldirectord.cf is running with pid: 27292
Dec 21 05:43:04 network1 ldirectord[27650]: Exiting from ldirectord status
Dec 21 05:43:04 network1 ResourceManager[27580]: info: Running
/etc/ha.d/resource.d/ldirectord  start
Dec 21 05:43:04 network1 ldirectord[27667]: Invoking ldirectord invoked
as: /etc/ha.d/resource.d/ldirectord start
Dec 21 05:43:04 network1 harc[27671]: info: Running
/etc/ha.d/rc.d/ip-request-resp ip-request-resp
Dec 21 05:43:04 network1 ip-request-resp[27671]: received
ip-request-resp xx.xx.101.16/24/eth0 OK yes
Dec 21 05:43:04 network1 ResourceManager[27686]: info: Acquiring
resource group: network1.ourserver.com xx.xx.101.16/24/eth0 ldirectord
Dec 21 05:43:04 network1 IPaddr[27710]: INFO:  Running OK
Dec 21 05:43:04 network1 ldirectord[27756]: Invoking ldirectord invoked
as: /etc/ha.d/resource.d/ldirectord status
Dec 21 05:43:04 network1 ldirectord[27756]: ldirectord for
/etc/ha.d/ldirectord.cf is running with pid: 27292
Dec 21 05:43:04 network1 ldirectord[27756]: Exiting from ldirectord status
Dec 21 05:43:04 network1 ResourceManager[27686]: info: Running
/etc/ha.d/resource.d/ldirectord  start
Dec 21 05:43:05 network1 ldirectord[27773]: Invoking ldirectord invoked
as: /etc/ha.d/resource.d/ldirectord start
Dec 21 05:43:10 network1 ldirectord[27292]: Added real server:
xx.xx.101.227:80 (xx.xx.101.13:80) (Weight set to 1)
Dec 21 05:43:10 network1 ldirectord[27292]: Deleted fallback server:
xx.xx.101.13:80 (xx.xx.101.13:80)
Dec 21 05:43:10 network1 ldirectord[27292]: Added real server:
xx.xx.101.226:80 (xx.xx.101.13:80) (Weight set to 1)
Dec 21 05:43:11 network1 ldirectord[27292]: Added real server:
xx.xx.101.224:80 (xx.xx.101.13:80) (Weight set to 1)
Dec 21 05:43:34 network1 heartbeat: [26940]: WARN: node
network2.ourserver.com: is dead
Dec 21 05:43:34 network1 heartbeat: [26940]: info: Dead node
network2.ourserver.com gave up resources.
Dec 21 05:43:34 network1 heartbeat: [26940]: info: Link
network2.ourserver.com:eth1 dead.

Dec 21 05:46:21 network1 ldirectord[27292]: Added real server:
xx.xx.101.226:443 (xx.xx.101.13:443) (Weight set to 1)
Dec 21 05:46:21 network1 ldirectord[27292]: Deleted fallback server:
xx.xx.101.13:443 (xx.xx.101.13:443)
Dec 21 05:46:21 network1 ldirectord[27292]: Added real server:
xx.xx.101.224:443 (xx.xx.101.13:443) (Weight set to 1)
Dec 21 05:46:21 network1 ldirectord[27292]: Added real server:
xx.xx.108.102:80 (xx.xx.101.16:80) (Weight set to 1)
Dec 21 05:46:21 network1 ldirectord[27292]: Deleted fallback server:
xx.xx.101.16:80 (xx.xx.101.16:80)
Dec 21 05:46:21 network1 ldirectord[27292]: Added real server:
xx.xx.101.183:80 (xx.xx.101.16:80) (Weight set to 1)
Dec 21 05:46:21 network1 ldirectord[27292]: Added real server:
xx.xx.101.184:80 (xx.xx.101.16:80) (Weight set to 1)
Dec 21 05:46:21 network1 ldirectord[27292]: Added real server:
xx.xx.108.102:443 (xx.xx.101.16:443) (Weight set to 1)
Dec 21 05:46:21 network1 ldirectord[27292]: Deleted fallback server:
xx.xx.101.16:443 (xx.xx.101.16:443)
Dec 21 05:46:21 network1 ldirectord[27292]: Added real server:
xx.xx.101.183:443 (xx.xx.101.16:443) (Weight set to 1)
Dec 21 05:46:21 network1 ldirectord[27292]: Added real server:
xx.xx.101.184:443 (xx.xx.101.16:443) (Weight set to 1)
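
In the log above, ldirectord adds the last port-80 real server of the
first cluster at 05:43:11 and logs nothing further until 05:46:21, a
silence of 190 seconds. A small Python sketch for finding such gaps in a
longer log follows; it assumes the syslog layout shown here (month,
day, time, host, tag), takes the year as a parameter because syslog
does not record it, and uses function names invented for this example.

```python
from datetime import datetime


def ldirectord_events(log_text, year=2009):
    """Return [(timestamp, message)] for syslog lines written by ldirectord."""
    events = []
    for line in log_text.splitlines():
        # Expected fields: Mon DD HH:MM:SS host tag: message
        parts = line.split(None, 5)
        if len(parts) < 6 or not parts[4].startswith("ldirectord["):
            continue  # wrapped continuation lines and other daemons
        try:
            stamp = datetime.strptime(
                f"{year} {parts[0]} {parts[1]} {parts[2]}",
                "%Y %b %d %H:%M:%S",
            )
        except ValueError:
            continue
        events.append((stamp, parts[5]))
    return events


def largest_gap(events):
    """Return (seconds, earlier_event, later_event) for the longest silence."""
    best = None
    for prev, cur in zip(events, events[1:]):
        gap = (cur[0] - prev[0]).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, prev, cur)
    return best
```

Running largest_gap(ldirectord_events(text)) over the full log of a
failover would point at the moment where ldirectord stops making
progress, which is where a closer look at the checks seems worthwhile.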

-- 
*Eric Renfro*
Software Developer

EZYield.com, Inc
125 Excelsior Pkwy
Winter Springs, FL 32708

407-629-0900 ext 832
