Re: Our LVS/DR backends freezes

To: Joseph Mack NA3T <jmack@xxxxxxxx>
Subject: Re: Our LVS/DR backends freezes
Cc: " users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
From: Olle Östlund <olle@xxxxxxxxxxx>
Date: Tue, 28 Nov 2006 09:30:09 +0100
> nothing strange with `ulimit -a` ?

This is our default ulimit -a:

core file size        (blocks, -c) 0
data seg size         (kbytes, -d) unlimited
file size             (blocks, -f) unlimited
max locked memory     (kbytes, -l) unlimited
max memory size       (kbytes, -m) unlimited
open files                    (-n) 1024
pipe size          (512 bytes, -p) 8
stack size            (kbytes, -s) unlimited
cpu time             (seconds, -t) unlimited
max user processes            (-u) 40959
virtual memory        (kbytes, -v) unlimited

The tomcat owner has "open files" raised to 4096.

> > We are running a "weighted round robin" load-balancing algorithm, and the
> > director will set weight 0 (= no traffic) on a backend once it has frozen
> > (not responding to the director's status requests). Then the number of
> > active connections sloooowly drops.
> They shouldn't slowly drop.
> All connections should be gone in the time the clients are 
> connected + FIN_WAIT. How long does a client hold a tomcat 
> connection? seconds, minutes, hours?

Hmmmm. This is an area I'm not very confident about. The fact is that
the ipvsadm-reported "active connections" do drop very slowly, and the
"inactive connections" seem never to drop. I thought this was a result
of having the ldirectord/ipvsadm "quiescent" attribute set to true.

       quiescent = [yes|no]

       If yes, then when real or failback servers are determined to be
       down, they are not actually removed from the kernel's LVS table.
       Rather, their weight is set to zero which means that no new
       connections will be accepted. This has the side effect, that if
       the real server has persistent connections, new connections from
       any existing clients will continue to be routed to the real
       server, until the persistent timeout can expire. See ipvsadm for
       more information on persistent connections.

       If no, then the real or failback servers will be removed from the
       kernel's LVS table. The default is yes.

Exactly what is meant by "the persistent timeout can expire" I don't
know. What persistence?

I have tried to find a correspondence between the ipvsadm "active
connections" and "inactive connections" numbers and what I see on the
realservers, but things do not match up. My conclusion is that the
ipvsadm figures are strictly a memory of figures for the load-balancing
algorithms to work with, and that they do not correspond to
realtime/real-world figures of actual connections. If ipvsadm reports
700 "active connections" to a realserver, netstat on the realserver
typically reports less than half that figure
(netstat -t -n | wc -l ==> 227 connections).
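For a closer comparison on the realserver, the netstat count can be broken
down by TCP state. A bare "netstat -t -n | wc -l" also counts the two header
lines and lumps every state together. The following is only a minimal sketch,
assuming the Linux net-tools netstat column layout (state in the sixth
column); the function name count_tcp_states is purely illustrative.

```shell
# Count TCP connections per state from `netstat -tn` output read on stdin.
# Assumption: Linux net-tools layout, where the 6th column is the state.
# Header lines are skipped because they do not start with "tcp".
count_tcp_states() {
    awk '$1 ~ /^tcp/ { n[$6]++ } END { for (s in n) print s, n[s] }' | sort
}

# Usage on a realserver (not executed here):
#   netstat -t -n | count_tcp_states
```

The ESTABLISHED line is probably the most useful one to hold against the
ipvsadm "active connections" figure.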

This is a typical output from ipvsadm:

IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP wrr
  ->             Route   10     722        32908
  ->             Route   10     793        34342
TCP wrr
  ->               Route   10     43         2660
  ->               Route   10     47         2510
TCP wrr
  ->              Route   10     0          4
  ->              Route   10     0          4
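To hold totals like these against counts taken on the realservers, the
per-realserver lines can be summed. This is a rough sketch, assuming the last
two columns of each "->" line are ActiveConn and InActConn as in the listing
above; sum_ipvs_conns is a hypothetical helper name, and on a live director
the input would come from "ipvsadm -L -n".

```shell
# Sum ActiveConn and InActConn over all realserver lines read on stdin.
# Assumption: realserver lines start with "->" and end with the two counters;
# the column header line is skipped because its last field is not numeric.
sum_ipvs_conns() {
    awk '$1 == "->" && $NF ~ /^[0-9]+$/ { a += $(NF-1); i += $NF }
         END { print "active=" (a+0), "inactive=" (i+0) }'
}

# Usage on the director (not executed here):
#   ipvsadm -L -n | sum_ipvs_conns
```

Fed the listing above, this should print active=1605 inactive=72428.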

> Where are you measuring the number of connections? with 
> ipvsadm on the director or with netstat on the realserver?

I was referring to the ipvsadm figures.
