
[lvs-users] problem with ldirectord- web server up/site down :(

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx, linux-ha@xxxxxxxxxxxxxxxxxx
Subject: [lvs-users] problem with ldirectord- web server up/site down :(
From: Dave Augustus <davea@xxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 03 May 2011 06:17:23 -0500
We had a failure yesterday, and one of our web sites was unavailable. 
(This has happened in the past about once a month; I am now taking the 
time to post the problem.) After a few minutes of investigation, I found 
that the load balancer did not have any hosts in the rotation for that 
site. All 3 web servers were up and working, so the ldirectord checks 
should have kept all 3 in the running ipvs configuration. A simple 
restart of ldirectord caused all 3 web servers to be added back into the 
rotation immediately, and the site was restored to service.

There is no clustering software used in this current configuration.

It seems that ldirectord forgets what it is supposed to do over time (a 
few weeks), and a simple restart makes it happy again, as it has in this 
case and in previous cases.
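
One way to catch the "no hosts in the rotation" state before users do 
would be a small external check on the ipvs table. The following is only 
a minimal sketch: it assumes the usual `ipvsadm -L -n` table layout 
(a "TCP"/"UDP"/"FWM" line per virtual service, followed by "->" lines for 
its real servers), and the function name is hypothetical, not part of 
ldirectord or ipvsadm.

```python
def empty_virtual_services(ipvsadm_output):
    """Return the virtual services that have no real servers under them.

    `ipvsadm_output` is the text printed by `ipvsadm -L -n` (assumed layout).
    """
    empty = []
    current = None      # virtual service whose real servers are being read
    has_real = False    # True once a "->" line was seen for `current`

    for line in ipvsadm_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] in ("TCP", "UDP", "FWM") and len(fields) >= 2:
            # A new virtual service starts; close out the previous one.
            if current is not None and not has_real:
                empty.append(current)
            current = fields[1]
            has_real = False
        elif fields[0] == "->" and current is not None:
            # The column-header "->" line comes before any virtual service,
            # so it is skipped because `current` is still None.
            has_real = True

    if current is not None and not has_real:
        empty.append(current)
    return empty
```

The output of `/sbin/ipvsadm -L -n` could be fed to this function from 
cron, with an alert raised whenever the returned list is not empty.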

Here are the software versions for the load balancer:
CentOS release 5.5 x86_64
ldirectord-1.0.4-1.1.el5
kernel 2.6.18-194.32.1.el5

Here are the important parts of the ldirectord.cf file (anonymized):
=============================
# Global Directives
checktimeout=20
checkinterval=30
autoreload=yes
logfile="local0"
quiescent=no
fork=yes

# http virtual service for redirecting port 80 to my.securesite.com
virtual=192.168.35.117:80
         real=192.168.35.43:80 gate 100
         real=192.168.35.44:80 gate 100
         real=192.168.35.45:80 gate 100
         service=http
         scheduler=rr
         netmask=255.255.255.255
         protocol=tcp

# http virtual service for my.securesite.com
virtual=192.168.35.117:443
         real=192.168.35.43:40117 gate 100
         real=192.168.35.44:40117 gate 100
         real=192.168.35.45:40117 gate 100
         service=https
         scheduler=wlc
         persistent=600
         netmask=255.255.255.255
         protocol=tcp
         virtualhost=my.securesite.com
=============================
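
To compare what the configuration asks for against what ipvs actually 
holds, the virtual/real pairs can be pulled out of the file. This is a 
rough sketch that only understands the `virtual=` and `real=` lines in 
the form shown above; `parse_virtuals` is a hypothetical helper, not 
something ldirectord provides.

```python
def parse_virtuals(cf_text):
    """Map each virtual= address to the list of real= addresses below it."""
    virtuals = {}
    current = None  # virtual service the following real= lines belong to

    for raw in cf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        key = key.strip()
        parts = value.split()
        if not parts:
            continue
        if key == "virtual":
            current = parts[0]
            virtuals[current] = []
        elif key == "real" and current is not None:
            # "192.168.35.43:80 gate 100" -> keep only the address:port
            virtuals[current].append(parts[0])
    return virtuals
```

For the file above, this would yield two virtual services with three 
real servers each, which is what the ipvs table should contain while all 
checks pass.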

/etc/ipvsadm.rules
=============================
(no entry for this host- let ldirectord figure it out)
(note: I have since ADDED the rules here for the 117 https host
but I don't see how not having it matters as ldirectord manages that.)
=============================

The logs contain no entry showing the site being removed from ipvs. They 
do contain some entries like the following, with "failed"; notice the 
timestamps:

May  1 21:10:56 lb71 ldirectord[7336]: system(/sbin/ipvsadm -a -t 63.251.35.117:80 -r 192.168.35.45:80 -g -w 100) failed:
May  1 21:10:56 lb71 ldirectord[7336]: Added real server: 192.168.35.45:80 (192.168.35.117:80) (Weight set to 100)
May  1 21:10:56 lb71 ldirectord[7343]: Resetting soft failure count: 192.168.35.45:40117 (tcp:192.168.35.117:443)
May  1 21:10:56 lb71 ldirectord[7343]: system(/sbin/ipvsadm -a -t 192.168.35.117:443 -r 192.168.35.45:40117 -g -w 100) failed:
May  1 21:10:56 lb71 ldirectord[7343]: Added real server: 192.168.35.45:40117 (192.168.35.117:443) (Weight set to 100)
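
Entries like these could be collected over several weeks to see whether 
the "failed" calls cluster around the outages. The sketch below assumes 
each syslog entry occupies a single line in the log file and follows the 
message format of the excerpt above; the pattern and function name are 
illustrative only.

```python
import re

# Matches ldirectord syslog entries of the form:
#   May  1 21:10:56 lb71 ldirectord[7336]: system(<command>) failed:
FAILED_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"ldirectord\[(?P<pid>\d+)\]:\s+system\((?P<command>.*)\)\s+failed:"
)

def failed_commands(lines):
    """Return (timestamp, pid, command) for every failed system() entry."""
    hits = []
    for line in lines:
        match = FAILED_RE.match(line)
        if match:
            hits.append(
                (match.group("stamp"), int(match.group("pid")), match.group("command"))
            )
    return hits
```

Passing an open log file to `failed_commands` gives one tuple per failed 
ipvsadm call, which makes it easy to see which child process (pid) and 
which real server were involved each time.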

Is this a bug in ldirectord? Something wrong in my config? Should I 
look to keepalived? mon?

Thanks,
Dave
