Re: LVS stops balancing after a while

To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: LVS stops balancing after a while
From: Roberto Nibali <ratz@xxxxxxxxxxxx>
Date: Sun, 05 Feb 2006 22:08:05 +0100
Hello,

Even though I have read a lot of mails from the list, I haven't found a case like ours.
We have some trouble with our LVS cluster. For a while now we have been
evaluating a 6-node cluster (3 realservers / 1 devel / 2 directors) as a mail gateway.

Ok.

The Setup is the following:

The 2 directors (failover) run direct routing.
The setup is done by ldirectord (1.77.2.41), which is
invoked by heartbeat.
Additionally there are two drbd devices for some
config files and data (no ldirectord data/config in there).

On the realservers some MTA software (postfix, amavisd-new etc.) is running.

When we start the setup everything runs fine: heartbeat comes up and starts ldirectord with the following config:

-------------------------------------------------
# Global Directives
checktimeout=10
checkinterval=2
autoreload=yes
quiescent=yes

#Sample configuration for an smtp virtual service.
#Fallback setting overrides global
virtual=10.10.x.60:25
        real=10.10.x.63:25 gate 100
        real=10.10.x.64:25 gate 100
        real=10.10.x.65:25 gate 100
        service=smtp
        scheduler=lc
        checkport=25
        checktype=connect
        protocol=tcp
-------------------------------------------------

so ipvsadm -Ln gives us the following output:

-------------------------------------------------
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.10.x.60:25 lc
  -> 10.10.x.63:25               Route   100    0          0
  -> 10.10.x.64:25               Route   100    0          0
  -> 10.10.x.65:25               Route   100    0          0
-------------------------------------------------

Nice description.

Everything works fine and the director balances the incoming connections across the 3 realservers. But after a while (sorry, I don't know a specific time) the LB stops scheduling and all incoming connections are sent to the last chosen realserver.

Is the timeframe within hours or days? Have you recorded a dmesg -s 1000000 during such an event?

What is really strange is that after this happens, you can unload the modules (ip_vs/ip_vs_lc), and after the ip_vs module has been reloaded the traffic continues to be sent to the last realserver.

Hmm, you might need to enable IPVS debugging during this time. Also check all your log files regarding your IPVS setup, the ldirectord output, the heartbeat, the kernel logs, the mta logs, ...

For testing I've written the output of ipvsadm -Ln (and c) with a timestamp to a file. The last entry was this:

IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Conns InPkts OutPkts InBytes OutBytes
  -> RemoteAddress:Port
TCP  10.10.x.60:25             17617  1872891        0    2541M        0
  -> 10.10.x.63:25              5927   614901        0  832431K        0
  -> 10.10.x.64:25              5964   663196        0  902503K        0
  -> 10.10.x.65:25              5726   594794        0  806841K        0

(No outgoing packets, because the mails are dropped into a sink)

The reason why you don't see the outgoing packet counter increasing is that in LVS-DR (route) the return packets do not pass through the director.

Restarting ldirectord has the same effect; everything continues to go wrong.

From the looks of it, it's almost perfectly balanced. I'm a bit astonished by your last statement, given the output above.
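
As a rough check of that impression, here is a minimal Python sketch that reads the Conns column of the statistics listing quoted above and prints each realserver's share of the connections. The function names are made up for this illustration, and the parser assumes the Conns values are plain integers, as they are in the quoted output.

```python
import re

# The statistics listing quoted above, copied verbatim.
STATS = """\
TCP  10.10.x.60:25             17617  1872891        0    2541M        0
  -> 10.10.x.63:25              5927   614901        0  832431K        0
  -> 10.10.x.64:25              5964   663196        0  902503K        0
  -> 10.10.x.65:25              5726   594794        0  806841K        0
"""


def realserver_conns(stats_text):
    """Return {realserver address: Conns} taken from the '->' lines."""
    conns = {}
    for line in stats_text.splitlines():
        # Assumption: the Conns column is a plain integer (no K/M suffix).
        m = re.match(r"\s*->\s+(\S+)\s+(\d+)\s", line)
        if m:
            conns[m.group(1)] = int(m.group(2))
    return conns


def shares(conns):
    """Return each realserver's percentage of all counted connections."""
    total = sum(conns.values())
    if total == 0:
        return {addr: 0.0 for addr in conns}
    return {addr: 100.0 * n / total for addr, n in conns.items()}


if __name__ == "__main__":
    for addr, pct in shares(realserver_conns(STATS)).items():
        print(f"{addr}  {pct:5.2f}%")
```

With the numbers above, all three realservers end up within about one and a half percentage points of each other, which is what one would expect from a working lc scheduler with equal weights.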

I can't draw a conclusion from the symptoms.

Could you tcpdump on the real servers that do not get any requests anymore along with the surveillance of the various log files?

Any help is appreciated, and thanks for your time reading this.

I'm afraid your output does not support your statement so far.
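
One way to make the symptom visible in the counters would be to compare two timestamped snapshots rather than look at a single cumulative one. The following is only a sketch: the helper names are hypothetical, the "before" values are the Conns counters from the listing above, and the "after" values are invented purely to illustrate what a stalled scheduler would look like. Note also that the counters start again from zero when the ip_vs module is reloaded, so snapshots taken across a reload are not comparable.

```python
def conn_deltas(before, after):
    """Per-realserver increase of the Conns counter between two snapshots."""
    return {addr: after[addr] - before.get(addr, 0) for addr in after}


def starved(before, after):
    """Realservers that got no new connections although others did."""
    deltas = conn_deltas(before, after)
    if sum(deltas.values()) == 0:
        # No new connections at all: nothing can be said about balancing.
        return []
    return sorted(addr for addr, d in deltas.items() if d == 0)


# Conns counters from the last snapshot in the original report.
before = {
    "10.10.x.63:25": 5927,
    "10.10.x.64:25": 5964,
    "10.10.x.65:25": 5726,
}

# HYPOTHETICAL later snapshot, invented for illustration only:
# every new connection went to a single realserver.
after_hypothetical = {
    "10.10.x.63:25": 5927,
    "10.10.x.64:25": 5964,
    "10.10.x.65:25": 5926,
}

if __name__ == "__main__":
    print(conn_deltas(before, after_hypothetical))
    print(starved(before, after_hypothetical))
```

If two real snapshots taken during the problem showed such a pattern, that would back up the description; the single snapshot posted so far cannot show it either way.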

Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
