We had a failure yesterday (this has happened in the past roughly once a
month; I am finally taking the time to post about it) and one of our web
sites was unavailable. After a few minutes of investigation, I found
that the load balancer did not have any hosts in the rotation for that
site. All 3 web servers were up and working, so the ldirectord health
check should have kept all 3 in the running IPVS configuration. A simple
restart of ldirectord immediately added all 3 web servers back into the
rotation and the site was restored to service.
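For reference, this is roughly what I ran to confirm the empty rotation
and restore service (a sketch; output trimmed):
=============================
# list the current IPVS table (numeric); the virtual services on
# 192.168.35.117 ports 80 and 443 showed no real servers under them
/sbin/ipvsadm -L -n

# restarting ldirectord immediately re-added all 3 real servers
/sbin/service ldirectord restart
=============================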
There is no clustering software used in this current configuration.
It seems that ldirectord forgets what it is supposed to do over time (a
few weeks) and a simple restart makes it happy again, as it did in this
case and in the previous ones.
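As a stopgap I am considering a cron-driven check along these lines
(just a sketch - the VIP and /sbin paths are from this box, the rest is
assumption) that restarts ldirectord if the https virtual service ends
up with no real servers:
=============================
#!/bin/sh
# stopgap watchdog (sketch): restart ldirectord when the https virtual
# service has no real servers left in the IPVS table
VIP="192.168.35.117:443"
# count only real-server lines ("-> <ip>:<port> ..."), not the header
count=$(/sbin/ipvsadm -L -n -t "$VIP" | grep -c -- '-> [0-9]')
if [ "$count" -eq 0 ]; then
    logger -t lvs-watchdog "no real servers for $VIP, restarting ldirectord"
    /sbin/service ldirectord restart
fi
=============================
Obviously that only papers over the problem, which is why I am posting.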
Here are the software versions on the load balancer:
CentOS release 5.5 x86_64
ldirectord-1.0.4-1.1.el5
kernel 2.6.18-194.32.1.el5
Here are the important parts of the ldirectord.cf file (anonymized):
=============================
# Global Directives
checktimeout=20
checkinterval=30
autoreload=yes
logfile="local0"
quiescent=no
fork=yes
# http virtual service for redirecting port 80 to my.securesite.com
virtual=192.168.35.117:80
        real=192.168.35.43:80 gate 100
        real=192.168.35.44:80 gate 100
        real=192.168.35.45:80 gate 100
        service=http
        scheduler=rr
        netmask=255.255.255.255
        protocol=tcp

# https virtual service for my.securesite.com
virtual=192.168.35.117:443
        real=192.168.35.43:40117 gate 100
        real=192.168.35.44:40117 gate 100
        real=192.168.35.45:40117 gate 100
        service=https
        scheduler=wlc
        persistent=600
        netmask=255.255.255.255
        protocol=tcp
        virtualhost=my.securesite.com
=============================
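As I read that config, the rules ldirectord should be maintaining for
the 443 virtual service come out to roughly the following (same form as
the commands it logs below):
=============================
# wlc scheduler, 600s persistence, direct routing (gate), weight 100
/sbin/ipvsadm -A -t 192.168.35.117:443 -s wlc -p 600
/sbin/ipvsadm -a -t 192.168.35.117:443 -r 192.168.35.43:40117 -g -w 100
/sbin/ipvsadm -a -t 192.168.35.117:443 -r 192.168.35.44:40117 -g -w 100
/sbin/ipvsadm -a -t 192.168.35.117:443 -r 192.168.35.45:40117 -g -w 100
=============================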
/etc/ipvsadm.rules
=============================
(no entry for this host - let ldirectord figure it out)
(note: I have since ADDED the rules here for the .117 https host, but I
don't see how not having them matters, since ldirectord manages that.)
=============================
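What I added there amounts to the saved-rules form of the 443 service
above (ipvsadm-save format), loaded before ldirectord starts; a sketch
of the entries, not a verbatim copy:
=============================
# lines added to the rules file (ipvsadm-save format)
-A -t 192.168.35.117:443 -s wlc -p 600
-a -t 192.168.35.117:443 -r 192.168.35.43:40117 -g -w 100
-a -t 192.168.35.117:443 -r 192.168.35.44:40117 -g -w 100
-a -t 192.168.35.117:443 -r 192.168.35.45:40117 -g -w 100

# loaded at boot with:
/sbin/ipvsadm-restore < /etc/ipvsadm.rules
=============================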
The logs show no point at which the virtual service was actually
removed from IPVS. They do contain entries like the following with
"failed" - notice the timestamps:
May 1 21:10:56 lb71 ldirectord[7336]: system(/sbin/ipvsadm -a -t 192.168.35.117:80 -r 192.168.35.45:80 -g -w 100) failed:
May 1 21:10:56 lb71 ldirectord[7336]: Added real server: 192.168.35.45:80 (192.168.35.117:80) (Weight set to 100)
May 1 21:10:56 lb71 ldirectord[7343]: Resetting soft failure count: 192.168.35.45:40117 (tcp:192.168.35.117:443)
May 1 21:10:56 lb71 ldirectord[7343]: system(/sbin/ipvsadm -a -t 192.168.35.117:443 -r 192.168.35.45:40117 -g -w 100) failed:
May 1 21:10:56 lb71 ldirectord[7343]: Added real server: 192.168.35.45:40117 (192.168.35.117:443) (Weight set to 100)
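For what it's worth, I also grepped for any sign of ldirectord removing
or quiescing the real servers around the outage with something like the
following (the log path and message strings are my guesses at what
ldirectord emits); it turned up the "failed" lines above but nothing
showing the real servers being deleted:
=============================
# look for removal/quiescing/failure messages around the outage window
grep ldirectord /var/log/messages | egrep -i 'deleted|quiescent|failed'
=============================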
Is this a bug in ldirectord? Something wrong in my config? Should I
look at keepalived instead? Or mon?
Thanks,
Dave