Hello,
I'm trying to resolve a problem I have setting up a pair of LVS
load-balancing servers using heartbeat and ldirectord under Gentoo.
I am using heartbeat 2.0.8 on two servers. The heartbeat and
ldirectord setup is not very extensive, but it should be working better
than it is. I will provide the complete configurations, minus the IPs
themselves, but to explain the problem up front: the issue I'm having is
rather strange.
Our servers are simply named network1 and network2, and I will use those
names to explain the issue.
I see the issue when I shut down the heartbeat process on either
network1 or network2. The node successfully releases the IP and passes
it on to the other node to take over. This happens quickly, as
expected; however, the problems begin when the other node brings up
ldirectord. We have two clusters of three webservers each, on both the
http and https ports. On network1, ldirectord immediately brings up the
first cluster that was set up, with all three RIP nodes active but
inaccessible. All the others are weighted to 0 under a weight-based
setup; otherwise they are non-existent, with traffic initially going to
the fallback server RIP.
For about 5-10 minutes, the server that took over heartbeat+ldirectord
has a heavy CPU load, with ksoftirqd/0 and ksoftirqd/1 being the
culprits. atop confirms this, showing 3 IRQs at 200%, 100%, and 100%
for those 5-10 minutes.
Once that all clears up and the load goes back to normal, the ipvs
routes show up almost instantaneously and, furthermore, actually work.
I do not know what is causing this, and I would like some help
resolving it.
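To narrow down which softirq class is behind the ksoftirqd load described above, something like the following could be run on the node that takes over. This is only a minimal Python sketch, assuming the kernel exposes /proc/softirqs (older kernels may not); the function names are made up for illustration and nothing here is part of the heartbeat or ldirectord setup.

```python
import time


def parse_softirqs(text):
    """Parse /proc/softirqs content into {name: [count_cpu0, count_cpu1, ...]}."""
    counts = {}
    for line in text.splitlines()[1:]:  # the first line is the CPU header
        name, _, rest = line.partition(":")
        if rest.strip():
            counts[name.strip()] = [int(n) for n in rest.split()]
    return counts


def softirq_deltas(before, after):
    """Per-class increase, summed over all CPUs, between two snapshots."""
    return {
        name: sum(after[name]) - sum(before.get(name, [0]))
        for name in after
    }


def sample(interval=5.0, path="/proc/softirqs"):
    """Take two snapshots `interval` seconds apart; busiest class first."""
    with open(path) as f:
        before = parse_softirqs(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_softirqs(f.read())
    deltas = softirq_deltas(before, after)
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
```

Calling sample() during the 5-10 minute window and again afterwards would show whether one class (for example NET_RX) accounts for most of the difference.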
The configuration files used follow. The virtual IPs are replaced by
xx.xx.101.13 and xx.xx.101.16, because there are two VIPs involved.
The related RIPs are masked similarly: Cluster 1 (xx.xx.101.227,
xx.xx.101.226, xx.xx.101.224) and Cluster 2 (xx.xx.108.102,
xx.xx.101.183, xx.xx.101.184), for 6 servers in total in two
different clusters. The actual IPs of the network servers are network1
(xx.xx.101.153) and network2 (xx.xx.108.203).
ha.d/haresources:
network1.ourserver.com xx.xx.101.13/24/eth0 ldirectord
network1.ourserver.com xx.xx.101.16/24/eth0 ldirectord
ha.cf:
logfacility local0
keepalive 2
deadtime 30
warntime 10
bcast eth1
auto_failback on
node network1.ourserver.com
node network2.ourserver.com
ldirectord.cf:
checktimeout=3
checkinterval=5
#negotiatetimeout=5
autoreload=yes
logfile="local0"
quiescent=no

virtual = xx.xx.101.13:80
        fallback = xx.xx.101.13:80 gate
        real = xx.xx.101.227:80 gate
        real = xx.xx.101.226:80 gate
        real = xx.xx.101.224:80 gate
        scheduler = lc
        persistent = 7200
        protocol = tcp
        service = http
        httpmethod = HEAD
        request = "/"
        checktype = negotiate

virtual = xx.xx.101.13:443
        fallback = xx.xx.101.13:443 gate
        real = xx.xx.101.227:443 gate
        real = xx.xx.101.226:443 gate
        real = xx.xx.101.224:443 gate
        scheduler = lc
        persistent = 7200
        protocol = tcp
        service = https
        httpmethod = HEAD
        request = "/"
        checktype = negotiate

virtual = xx.xx.101.16:80
        fallback = xx.xx.101.16:80 gate
        real = xx.xx.108.102:80 gate
        real = xx.xx.101.183:80 gate
        real = xx.xx.101.184:80 gate
        scheduler = lc
        persistent = 7200
        protocol = tcp
        service = http
        httpmethod = HEAD
        request = "/"
        checktype = negotiate

virtual = xx.xx.101.16:443
        fallback = xx.xx.101.16:443 gate
        real = xx.xx.108.102:443 gate
        real = xx.xx.101.183:443 gate
        real = xx.xx.101.184:443 gate
        scheduler = lc
        persistent = 7200
        protocol = tcp
        service = https
        httpmethod = HEAD
        request = "/"
        checktype = negotiate
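To test a real server by hand, independently of ldirectord, the negotiate check configured above (a HEAD request for "/" with a 3 second timeout) can be approximated as below. This is a rough Python sketch, not ldirectord's actual code: treating any 2xx status as "up" and skipping certificate verification for https are assumptions, and head_check is a made-up name.

```python
import http.client
import ssl


def head_check(host, port, path="/", timeout=3.0, use_tls=False):
    """Return True if HEAD `path` gets a 2xx response within `timeout` seconds."""
    if use_tls:
        # Assumption: the real servers are addressed by IP, so hostname and
        # certificate verification are turned off for this manual check.
        context = ssl.create_default_context()
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
        conn = http.client.HTTPSConnection(
            host, port, timeout=timeout, context=context
        )
    else:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        status = conn.getresponse().status
        return 200 <= status < 300
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts and TLS errors all count as "down".
        return False
    finally:
        conn.close()
```

Running head_check against each real server's address on port 80, and with use_tls=True on port 443, from the node that has just taken over would show whether the checks themselves are slow or failing during the high-load period.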
Here is a log of what happens when network2 is the active router and
gets shut down while network1's heartbeat is in standby mode, waiting
to take over:
Dec 21 05:43:03 network1 heartbeat: [26940]: info: Received shutdown
notice from 'network2.ourserver.com'.
Dec 21 05:43:03 network1 heartbeat: [26940]: info: Resources being
acquired from network2.ourserver.com.
Dec 21 05:43:03 network1 heartbeat: [26959]: info: acquire all HA
resources (standby).
Dec 21 05:43:03 network1 ResourceManager[26973]: info: Acquiring
resource group: network1.ourserver.com xx.xx.101.13/24/eth0 ldirectord
Dec 21 05:43:03 network1 IPaddr[27020]: INFO: Resource is stopped
Dec 21 05:43:03 network1 IPaddr[27021]: INFO: Resource is stopped
Dec 21 05:43:03 network1 ResourceManager[26973]: info: Running
/etc/ha.d/resource.d/IPaddr xx.xx.101.13/24/eth0 start
Dec 21 05:43:03 network1 IPaddr[27163]: INFO: Using calculated netmask
for xx.xx.101.13: 255.255.255.0
Dec 21 05:43:03 network1 IPaddr[27147]: INFO: Resource is stopped
Dec 21 05:43:03 network1 IPaddr[27163]: DEBUG: Using calculated
broadcast for xx.xx.101.13: xx.xx.101.255
Dec 21 05:43:03 network1 heartbeat: [26961]: info: Local Resource
acquisition completed.
Dec 21 05:43:03 network1 heartbeat: [26940]: info: Initial resource
acquisition complete (T_RESOURCES(us))
Dec 21 05:43:03 network1 IPaddr[27163]: INFO: eval /sbin/ifconfig eth0:0
xx.xx.101.13 netmask 255.255.255.0 broadcast xx.xx.101.255
Dec 21 05:43:03 network1 IPaddr[27163]: DEBUG: Sending Gratuitous Arp
for xx.xx.101.13 on eth0:0 [eth0]
Dec 21 05:43:03 network1 IPaddr[27120]: INFO: Success
Dec 21 05:43:03 network1 ldirectord[27273]: Invoking ldirectord invoked
as: /etc/ha.d/resource.d/ldirectord status
Dec 21 05:43:03 network1 ldirectord[27273]: ldirectord is stopped for
/etc/ha.d/ldirectord.cf
Dec 21 05:43:03 network1 ldirectord[27273]: Exiting with exit_status 3:
Exiting from ldirectord status
Dec 21 05:43:03 network1 ResourceManager[26973]: info: Running
/etc/ha.d/resource.d/ldirectord start
Dec 21 05:43:03 network1 ldirectord[27290]: Invoking ldirectord invoked
as: /etc/ha.d/resource.d/ldirectord start
Dec 21 05:43:03 network1 ldirectord[27290]: Starting Linux Director
v1.186 as daemon
Dec 21 05:43:03 network1 ldirectord[27292]: Added virtual server:
xx.xx.101.13:80
Dec 21 05:43:03 network1 ldirectord[27292]: Added virtual server:
xx.xx.101.13:443
Dec 21 05:43:03 network1 ldirectord[27292]: Added virtual server:
xx.xx.101.16:80
Dec 21 05:43:03 network1 ldirectord[27292]: Added virtual server:
xx.xx.101.16:443
Dec 21 05:43:03 network1 ldirectord[27292]: Added fallback server:
xx.xx.101.13:80 (xx.xx.101.13:80) (Weight set to 1)
Dec 21 05:43:03 network1 ldirectord[27292]: Added fallback server:
xx.xx.101.13:443 (xx.xx.101.13:443) (Weight set to 1)
Dec 21 05:43:03 network1 ResourceManager[27299]: info: Acquiring
resource group: network1.ourserver.com xx.xx.101.16/24/eth0 ldirectord
Dec 21 05:43:03 network1 ldirectord[27292]: Added fallback server:
xx.xx.101.16:80 (xx.xx.101.16:80) (Weight set to 1)
Dec 21 05:43:03 network1 ldirectord[27292]: Added fallback server:
xx.xx.101.16:443 (xx.xx.101.16:443) (Weight set to 1)
Dec 21 05:43:03 network1 IPaddr[27336]: INFO: Resource is stopped
Dec 21 05:43:03 network1 ResourceManager[27299]: info: Running
/etc/ha.d/resource.d/IPaddr xx.xx.101.16/24/eth0 start
Dec 21 05:43:03 network1 IPaddr[27423]: INFO: Using calculated netmask
for xx.xx.101.16: 255.255.255.0
Dec 21 05:43:03 network1 IPaddr[27423]: DEBUG: Using calculated
broadcast for xx.xx.101.16: xx.xx.101.255
Dec 21 05:43:03 network1 IPaddr[27423]: INFO: eval /sbin/ifconfig eth0:1
xx.xx.101.16 netmask 255.255.255.0 broadcast xx.xx.101.255
Dec 21 05:43:03 network1 IPaddr[27423]: DEBUG: Sending Gratuitous Arp
for xx.xx.101.16 on eth0:1 [eth0]
Dec 21 05:43:03 network1 IPaddr[27402]: INFO: Success
Dec 21 05:43:03 network1 ldirectord[27506]: Invoking ldirectord invoked
as: /etc/ha.d/resource.d/ldirectord status
Dec 21 05:43:03 network1 ldirectord[27506]: ldirectord for
/etc/ha.d/ldirectord.cf is running with pid: 27292
Dec 21 05:43:03 network1 ldirectord[27506]: Exiting from ldirectord status
Dec 21 05:43:03 network1 ResourceManager[27299]: info: Running
/etc/ha.d/resource.d/ldirectord start
Dec 21 05:43:04 network1 ldirectord[27523]: Invoking ldirectord invoked
as: /etc/ha.d/resource.d/ldirectord start
Dec 21 05:43:04 network1 heartbeat: [26959]: info: all HA resource
acquisition completed (standby).
Dec 21 05:43:04 network1 heartbeat: [26940]: info: Standby resource
acquisition done [all].
Dec 21 05:43:04 network1 harc[27527]: info: Running
/etc/ha.d/rc.d/status status
Dec 21 05:43:04 network1 mach_down[27537]: info:
/usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
Dec 21 05:43:04 network1 mach_down[27537]: info: mach_down takeover
complete for node network2.ourserver.com.
Dec 21 05:43:04 network1 heartbeat: [26940]: info: mach_down takeover
complete.
Dec 21 05:43:04 network1 harc[27565]: info: Running
/etc/ha.d/rc.d/ip-request-resp ip-request-resp
Dec 21 05:43:04 network1 ip-request-resp[27565]: received
ip-request-resp xx.xx.101.13/24/eth0 OK yes
Dec 21 05:43:04 network1 ResourceManager[27580]: info: Acquiring
resource group: network1.ourserver.com xx.xx.101.13/24/eth0 ldirectord
Dec 21 05:43:04 network1 IPaddr[27604]: INFO: Running OK
Dec 21 05:43:04 network1 ldirectord[27650]: Invoking ldirectord invoked
as: /etc/ha.d/resource.d/ldirectord status
Dec 21 05:43:04 network1 ldirectord[27650]: ldirectord for
/etc/ha.d/ldirectord.cf is running with pid: 27292
Dec 21 05:43:04 network1 ldirectord[27650]: Exiting from ldirectord status
Dec 21 05:43:04 network1 ResourceManager[27580]: info: Running
/etc/ha.d/resource.d/ldirectord start
Dec 21 05:43:04 network1 ldirectord[27667]: Invoking ldirectord invoked
as: /etc/ha.d/resource.d/ldirectord start
Dec 21 05:43:04 network1 harc[27671]: info: Running
/etc/ha.d/rc.d/ip-request-resp ip-request-resp
Dec 21 05:43:04 network1 ip-request-resp[27671]: received
ip-request-resp xx.xx.101.16/24/eth0 OK yes
Dec 21 05:43:04 network1 ResourceManager[27686]: info: Acquiring
resource group: network1.ourserver.com xx.xx.101.16/24/eth0 ldirectord
Dec 21 05:43:04 network1 IPaddr[27710]: INFO: Running OK
Dec 21 05:43:04 network1 ldirectord[27756]: Invoking ldirectord invoked
as: /etc/ha.d/resource.d/ldirectord status
Dec 21 05:43:04 network1 ldirectord[27756]: ldirectord for
/etc/ha.d/ldirectord.cf is running with pid: 27292
Dec 21 05:43:04 network1 ldirectord[27756]: Exiting from ldirectord status
Dec 21 05:43:04 network1 ResourceManager[27686]: info: Running
/etc/ha.d/resource.d/ldirectord start
Dec 21 05:43:05 network1 ldirectord[27773]: Invoking ldirectord invoked
as: /etc/ha.d/resource.d/ldirectord start
Dec 21 05:43:10 network1 ldirectord[27292]: Added real server:
xx.xx.101.227:80 (xx.xx.101.13:80) (Weight set to 1)
Dec 21 05:43:10 network1 ldirectord[27292]: Deleted fallback server:
xx.xx.101.13:80 (xx.xx.101.13:80)
Dec 21 05:43:10 network1 ldirectord[27292]: Added real server:
xx.xx.101.226:80 (xx.xx.101.13:80) (Weight set to 1)
Dec 21 05:43:11 network1 ldirectord[27292]: Added real server:
xx.xx.101.224:80 (xx.xx.101.13:80) (Weight set to 1)
Dec 21 05:43:34 network1 heartbeat: [26940]: WARN: node
network2.ourserver.com: is dead
Dec 21 05:43:34 network1 heartbeat: [26940]: info: Dead node
network2.ourserver.com gave up resources.
Dec 21 05:43:34 network1 heartbeat: [26940]: info: Link
network2.ourserver.com:eth1 dead.
Dec 21 05:46:21 network1 ldirectord[27292]: Added real server:
xx.xx.101.226:443 (xx.xx.101.13:443) (Weight set to 1)
Dec 21 05:46:21 network1 ldirectord[27292]: Deleted fallback server:
xx.xx.101.13:443 (xx.xx.101.13:443)
Dec 21 05:46:21 network1 ldirectord[27292]: Added real server:
xx.xx.101.224:443 (xx.xx.101.13:443) (Weight set to 1)
Dec 21 05:46:21 network1 ldirectord[27292]: Added real server:
xx.xx.108.102:80 (xx.xx.101.16:80) (Weight set to 1)
Dec 21 05:46:21 network1 ldirectord[27292]: Deleted fallback server:
xx.xx.101.16:80 (xx.xx.101.16:80)
Dec 21 05:46:21 network1 ldirectord[27292]: Added real server:
xx.xx.101.183:80 (xx.xx.101.16:80) (Weight set to 1)
Dec 21 05:46:21 network1 ldirectord[27292]: Added real server:
xx.xx.101.184:80 (xx.xx.101.16:80) (Weight set to 1)
Dec 21 05:46:21 network1 ldirectord[27292]: Added real server:
xx.xx.108.102:443 (xx.xx.101.16:443) (Weight set to 1)
Dec 21 05:46:21 network1 ldirectord[27292]: Deleted fallback server:
xx.xx.101.16:443 (xx.xx.101.16:443)
Dec 21 05:46:21 network1 ldirectord[27292]: Added real server:
xx.xx.101.183:443 (xx.xx.101.16:443) (Weight set to 1)
Dec 21 05:46:21 network1 ldirectord[27292]: Added real server:
xx.xx.101.184:443 (xx.xx.101.16:443) (Weight set to 1)
--
*Eric Renfro*
Software Developer
EZYield.com, Inc
125 Excelsior Pkwy
Winter Springs, FL 32708
407-629-0900 ext 832