[lvs-users] Problems with ldirectord: Doesn't check like advised in conf

To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: [lvs-users] Problems with ldirectord: Doesn't check like advised in config, real servers not dead but taken out of service
From: Timo Schoeler <timo.schoeler@xxxxxxxxxxxxx>
Date: Fri, 03 Apr 2009 15:18:52 +0200
Hello list,

I am seeing some weird phenomena when running ldirectord under heartbeat (v2).

Our load balancer provides some VIPs, which in turn point to the real
IPs of the real servers. The ports used are non-standard, as we deployed
some proprietary software, but the only place where this should matter
is how the real servers' vitality is tested. At the moment we check the
servers' vitality using

checktype = connect

with the following values

# Global Directives

checktimeout=2
checkinterval=60

# checkcount only works for ping checks!
checkcount=2

So, AFAICS ldirectord tests the (real) server on port 6789 (e.g.) and, 
if the port is open, it's 'okay' for the load balancer; if it cannot 
connect, the real server is taken out of service (-> quiescent = yes).
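
For illustration, the behaviour described above (open a TCP connection, give up after checktimeout seconds) can be approximated with the following minimal Python sketch. It is not ldirectord's own code (ldirectord is written in Perl); the function name connect_check is hypothetical, and the host and port are whatever real server is being tested.

```python
import socket


def connect_check(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration only; `timeout` plays the role
    of checktimeout in the configuration above.
    """
    try:
        # create_connection performs the TCP handshake; leaving the
        # `with` block closes the socket again straight away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts and timeouts.
        return False
```

Calling connect_check("4.3.2.1", 6789) from the load balancer would be expected to produce the same short handshake-then-close exchange that appears in the trace at the end of this mail.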

Furthermore, the load balancer should execute the connect check once 
every minute... but unfortunately, this doesn't seem to be true.

I ran tcpdump and checked for TCP connects between the load balancer and 
one of the real servers and saw that the tests did not occur in the 
interval configured in ldirectord's config.
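
One way to put numbers on the spacing between checks is to extract the timestamps of the SYN packets sent by the load balancer from the tcpdump text output and compute the gaps between them. The following is a rough Python sketch under some assumptions: the function names are hypothetical, the regular expression only targets the output format shown in the trace at the end of this mail, and captures that cross midnight are not handled.

```python
import re
from datetime import datetime

# Matches the first line of a packet whose flags field is "S". The SYN/ACK
# reply is printed with "S" as well, so it is filtered out by source address.
SYN_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{6}) IP (\S+) > (\S+): S\b")


def syn_times(lines, src_ip):
    """Collect timestamps of SYN packets whose source address is src_ip."""
    times = []
    for line in lines:
        m = SYN_LINE.match(line)
        # The source field looks like "1.2.3.4.58842": address, dot, port.
        if m and m.group(2).rsplit(".", 1)[0] == src_ip:
            times.append(datetime.strptime(m.group(1), "%H:%M:%S.%f"))
    return times


def gaps_seconds(times):
    """Return the gaps, in seconds, between consecutive timestamps."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

Feeding the lines of a saved capture to syn_times(lines, "1.2.3.4") and passing the result to gaps_seconds() yields the intervals that can then be compared against checkinterval.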

We usually have ldirectord configured to do the connect check every two
seconds (which it also doesn't do). However, we raised the value after
ldirectord.log was flooded with entries showing servers being taken out
of service and put back into service on the next check. With a value of
60 secs this became less of a problem, but it still exists.

I'd really appreciate any hint that could

i) make me understand why the (connect) check doesn't happen as expected 
(difference config file <-> real world)

ii) fix the problem of servers taken out of and back into service 
without being 'dead'

Thanks in advance & best,

Timo

-----

Software used:

Linux bla.blubb.org 2.6.18-92.1.22.el5xen #1 SMP Tue Dec 16 12:26:32 EST 
2008 x86_64 x86_64 x86_64 GNU/Linux

(CentOS 5.2/amd64)

heartbeat-2.1.3-3.el5.centos
heartbeat-ldirectord-2.1.3-3.el5.centos
heartbeat-pils-2.1.3-3.el5.centos
heartbeat-stonith-2.1.3-3.el5.centos

-----

15:06:37.126122 IP 1.2.3.4.58842 > 4.3.2.1.6789: S 
825392567:825392567(0) win 5840 <mss 1460,sackOK,timestamp 172223831 
0,nop,wscale 7>
15:06:37.141066 IP 4.3.2.1.6789 > 1.2.3.4.58842: S 
2149464780:2149464780(0) ack 825392568 win 5792 <mss 
1460,sackOK,timestamp 87243867 172223831,nop,wscale 5>
15:06:37.141085 IP 1.2.3.4.58842 > 4.3.2.1.6789: . ack 1 win 46 
<nop,nop,timestamp 172223834 87243867>
15:06:37.141137 IP 1.2.3.4.58842 > 4.3.2.1.6789: F 1:1(0) ack 1 win 46 
<nop,nop,timestamp 172223834 87243867>
15:06:37.155981 IP 4.3.2.1.6789 > 1.2.3.4.58842: . ack 2 win 181 
<nop,nop,timestamp 87243882 172223834>
15:06:37.170258 IP 4.3.2.1.6789 > 1.2.3.4.58842: F 1:1(0) ack 2 win 181 
<nop,nop,timestamp 87243896 172223834>
15:06:37.170270 IP 1.2.3.4.58842 > 4.3.2.1.6789: . ack 2 win 46 
<nop,nop,timestamp 172223842 87243896>
15:06:38.126156 IP 1.2.3.4.58846 > 4.3.2.1.6789: S 
833849460:833849460(0) win 5840 <mss 1460,sackOK,timestamp 172224081 
0,nop,wscale 7>
15:06:38.141115 IP 4.3.2.1.6789 > 1.2.3.4.58846: S 
2169286430:2169286430(0) ack 833849461 win 5792 <mss 
1460,sackOK,timestamp 87244867 172224081,nop,wscale 5>
15:06:38.141133 IP 1.2.3.4.58846 > 4.3.2.1.6789: . ack 1 win 46 
<nop,nop,timestamp 172224085 87244867>
15:06:38.141186 IP 1.2.3.4.58846 > 4.3.2.1.6789: F 1:1(0) ack 1 win 46 
<nop,nop,timestamp 172224085 87244867>
15:06:38.156983 IP 4.3.2.1.6789 > 1.2.3.4.58846: . ack 2 win 181 
<nop,nop,timestamp 87244883 172224085>
15:06:38.168275 IP 4.3.2.1.6789 > 1.2.3.4.58846: F 1:1(0) ack 2 win 181 
<nop,nop,timestamp 87244894 172224085>
15:06:38.168286 IP 1.2.3.4.58846 > 4.3.2.1.6789: . ack 2 win 46 
<nop,nop,timestamp 172224091 87244894>
--
...49 secs...
--
15:07:27.026332 IP 1.2.3.4.58920 > 4.3.2.1.6789: S 
887535625:887535625(0) win 5840 <mss 1460,sackOK,timestamp 172236306 
0,nop,wscale 7>
15:07:27.041035 IP 4.3.2.1.6789 > 1.2.3.4.58920: S 
2922998013:2922998013(0) ack 887535626 win 5792 <mss 
1460,sackOK,timestamp 87293774 172236306,nop,wscale 5>
15:07:27.041080 IP 1.2.3.4.58920 > 4.3.2.1.6789: . ack 1 win 46 
<nop,nop,timestamp 172236308 87293774>
15:07:27.041189 IP 1.2.3.4.58920 > 4.3.2.1.6789: F 1:1(0) ack 1 win 46 
<nop,nop,timestamp 172236308 87293774>
15:07:27.056877 IP 4.3.2.1.6789 > 1.2.3.4.58920: . ack 2 win 181 
<nop,nop,timestamp 87293790 172236308>
15:07:27.073315 IP 4.3.2.1.6789 > 1.2.3.4.58920: F 1:1(0) ack 2 win 181 
<nop,nop,timestamp 87293806 172236308>
15:07:27.073335 IP 1.2.3.4.58920 > 4.3.2.1.6789: . ack 2 win 46 
<nop,nop,timestamp 172236316 87293806>
15:07:28.026444 IP 1.2.3.4.58924 > 4.3.2.1.6789: S 
881660646:881660646(0) win 5840 <mss 1460,sackOK,timestamp 172236556 
0,nop,wscale 7>
15:07:28.041404 IP 4.3.2.1.6789 > 1.2.3.4.58924: S 
2937299048:2937299048(0) ack 881660647 win 5792 <mss 
1460,sackOK,timestamp 87294774 172236556,nop,wscale 5>
15:07:28.041436 IP 1.2.3.4.58924 > 4.3.2.1.6789: . ack 1 win 46 
<nop,nop,timestamp 172236558 87294774>
15:07:28.041616 IP 1.2.3.4.58924 > 4.3.2.1.6789: F 1:1(0) ack 1 win 46 
<nop,nop,timestamp 172236558 87294774>
15:07:28.056678 IP 4.3.2.1.6789 > 1.2.3.4.58924: . ack 2 win 181 
<nop,nop,timestamp 87294790 172236558>
15:07:28.069606 IP 4.3.2.1.6789 > 1.2.3.4.58924: F 1:1(0) ack 2 win 181 
<nop,nop,timestamp 87294802 172236558>
15:07:28.069629 IP 1.2.3.4.58924 > 4.3.2.1.6789: . ack 2 win 46 
<nop,nop,timestamp 172236566 87294802>
--
...108 secs...
--
15:09:16.474428 IP 1.2.3.4.58998 > 4.3.2.1.6789: S 
989321214:989321214(0) win 5840 <mss 1460,sackOK,timestamp 172263668 
0,nop,wscale 7>
15:09:16.489190 IP 4.3.2.1.6789 > 1.2.3.4.58998: S 
364077016:364077016(0) ack 989321215 win 5792 <mss 1460,sackOK,timestamp 
87403238 172263668,nop,wscale 5>
15:09:16.489207 IP 1.2.3.4.58998 > 4.3.2.1.6789: . ack 1 win 46 
<nop,nop,timestamp 172263671 87403238>
15:09:16.489261 IP 1.2.3.4.58998 > 4.3.2.1.6789: F 1:1(0) ack 1 win 46 
<nop,nop,timestamp 172263671 87403238>
15:09:16.504972 IP 4.3.2.1.6789 > 1.2.3.4.58998: . ack 2 win 181 
<nop,nop,timestamp 87403254 172263671>
15:09:16.522734 IP 4.3.2.1.6789 > 1.2.3.4.58998: F 1:1(0) ack 2 win 181 
<nop,nop,timestamp 87403271 172263671>
15:09:16.522746 IP 1.2.3.4.58998 > 4.3.2.1.6789: . ack 2 win 46 
<nop,nop,timestamp 172263680 87403271>
15:09:17.462438 IP 1.2.3.4.59002 > 4.3.2.1.6789: S 
993835442:993835442(0) win 5840 <mss 1460,sackOK,timestamp 172263915 
0,nop,wscale 7>
15:09:17.477212 IP 4.3.2.1.6789 > 1.2.3.4.59002: S 
370527242:370527242(0) ack 993835443 win 5792 <mss 1460,sackOK,timestamp 
87404226 172263915,nop,wscale 5>
15:09:17.477228 IP 1.2.3.4.59002 > 4.3.2.1.6789: . ack 1 win 46 
<nop,nop,timestamp 172263918 87404226>
15:09:17.477281 IP 1.2.3.4.59002 > 4.3.2.1.6789: F 1:1(0) ack 1 win 46 
<nop,nop,timestamp 172263918 87404226>
15:09:17.492636 IP 4.3.2.1.6789 > 1.2.3.4.59002: . ack 2 win 181 
<nop,nop,timestamp 87404242 172263918>
15:09:17.503821 IP 4.3.2.1.6789 > 1.2.3.4.59002: F 1:1(0) ack 2 win 181 
<nop,nop,timestamp 87404252 172263918>
15:09:17.503831 IP 1.2.3.4.59002 > 4.3.2.1.6789: . ack 2 win 46 
<nop,nop,timestamp 172263925 87404252>


