Re: LVS stops balancing after a while

To: <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: LVS stops balancing after a while
From: Mathieu Massebœuf <mathieu.masseboeuf@xxxxxxxxxxxx>
Date: Mon, 6 Feb 2006 19:39:51 +0100
Hi,

I'm actually facing a similar issue to yours (direct routing and wlc):
after a while, connections stop being load balanced and all go to a
single server. A few months ago I upgraded our LVS infrastructure, which
consists of 2 LVS servers and 10 web servers. I had no issues before
apart from CPU/memory resources (the setup was 3 years old).

The load balancers now run Debian sarge, set up much like yours, with
the following:
  - Ldirectord 1.2.3-9sarge4
  - Heartbeat 1.2.3-9sarge4, checking via a serial cable plus broadcast
on the internal LAN
  - Ipvsadm 1.24+1.21-1 (for ipvs_syncmaster and ipvs_syncbackup cluster
synchronization)
  - Kernel 2.6.14 (non-Debian)

Heartbeat starts ldirectord, which uses the following configuration:
checktimeout=6
checkinterval=3
autoreload=yes
logfile="local3"
quiescent=yes

# HTTP Virtual Service
virtual=213.x.y.z:80
        real=172.16.x.41:80 gate 10
        real=172.16.x.42:80 gate 10
        real=172.16.x.43:80 gate 10
        real=172.16.x.44:80 gate 10
        real=172.16.x.45:80 gate 10
        real=172.16.x.46:80 gate 10
        real=172.16.x.47:80 gate 18
        real=172.16.x.48:80 gate 18
        real=172.16.x.49:80 gate 25
        real=172.16.x.50:80 gate 25
        service=http
        virtualhost="domain.com"
        request="/.testpage"
        receive="Test Page"
        scheduler=wlc
        #persistent=600
        protocol=tcp
        checktype=negotiate

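For reference, wlc sends each new connection to the real server with the
lowest connection overhead relative to its weight. Below is a minimal
Python sketch of that selection. It assumes the classic IPVS overhead
estimate of (ActiveConn << 8) + InActConn and treats weight-0 servers
(which is what quiescent=yes leaves behind for failed checks) as
unavailable. RealServer and wlc_pick are hypothetical names, not part of
ipvsadm or ldirectord, and the kernel's overload-flag handling is left out.

```python
from dataclasses import dataclass


@dataclass
class RealServer:
    addr: str
    weight: int
    active: int    # ActiveConn column of ipvsadm
    inactive: int  # InActConn column of ipvsadm


def overhead(rs):
    # Assumed IPVS estimate: one active connection counts as 256 inactive ones.
    return (rs.active << 8) + rs.inactive


def wlc_pick(servers):
    """Return the server with the lowest overhead/weight ratio, or None."""
    best = None
    for rs in servers:
        if rs.weight <= 0:  # quiescent (weight 0) servers get no new connections
            continue
        # Compare overhead/weight ratios without division:
        # rs wins if overhead(rs)/rs.weight < overhead(best)/best.weight.
        if best is None or overhead(rs) * best.weight < overhead(best) * rs.weight:
            best = rs
    return best


servers = [
    RealServer("172.16.x.41:80", 10, 10, 0),
    RealServer("172.16.x.50:80", 25, 20, 0),
    RealServer("172.16.x.42:80", 0, 0, 0),  # quiesced by ldirectord
]
print(wlc_pick(servers).addr)  # prints 172.16.x.50:80
```

Note that under this rule a server with a non-zero weight and no
connections at all always wins the next pick, so a healthy wlc setup
should not leave enabled servers sitting at 0/0 for long.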
When the issue occurs, ipvsadm -L -n reports 0 ActiveConn and 0
InActConn. When it doesn't, each server has a lot of connections and 0
is not plausible; for example, right now (low traffic):
  -> 172.16.x.50:www  Route   25     619    3462
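To spot this state quickly, the real-server lines of ipvsadm -L -n can be
parsed and any server with a non-zero weight but 0 ActiveConn and 0
InActConn flagged. This is a minimal sketch assuming the usual column
order (RemoteAddress:Port, Forward, Weight, ActiveConn, InActConn);
parse_real_servers and idle_servers are hypothetical helpers, and the
sample text is made up for illustration.

```python
def parse_real_servers(text):
    """Return (address, forward, weight, active, inactive) per real server."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Real-server lines start with "->". The column header line does too,
        # but its Weight field is not numeric, so it is skipped.
        if len(parts) == 6 and parts[0] == "->" and parts[3].isdigit():
            addr, fwd, weight, active, inactive = parts[1:]
            rows.append((addr, fwd, int(weight), int(active), int(inactive)))
    return rows


def idle_servers(rows):
    """Servers that are enabled (weight > 0) yet hold no connections at all."""
    return [addr for addr, _fwd, weight, active, inactive in rows
            if weight > 0 and active == 0 and inactive == 0]


# Made-up sample in the layout printed by ipvsadm -L -n.
sample = """\
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  213.x.y.z:80 wlc
  -> 172.16.x.49:80               Route   25     0          0
  -> 172.16.x.50:80               Route   25     619        3462
"""
print(idle_servers(parse_real_servers(sample)))  # prints ['172.16.x.49:80']
```

Servers quiesced to weight 0 are deliberately not flagged, since with
quiescent=yes an empty weight-0 server is expected after a failed check.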


I noticed that all the traffic was going to the same box because its
logs were filling up quickly, and stopping httpd on that box brought the
whole site down.

As this was an emergency, I didn't have time to investigate further. To
recover, I stopped and restarted heartbeat; in the meantime the second
load balancer should have taken over and then handed the service back.
After that, everything seemed normal.

A quick look at the logs didn't reveal anything strange (I saved
everything I could for further investigation), apart from the following
(one line only):
Redirect from 213.255.89.122 on eth0 about 213.255.89.128 ignored.
  Advised path = 213.x.y.k (load2) -> 213.255.89.128, tos 00
ttyS0: 1 input overrun(s) (more of those)

As Jan said, any help is appreciated, and thanks for reading this
boring mail :D
(Which will hopefully be less boring if we find the cause of the problem.)

--
Mathieu Massebœuf

