Re: DR Load balancing active/inactive connections

To: Horms <horms@xxxxxxxxxxxx>
Subject: Re: DR Load balancing active/inactive connections
Cc: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
From: RU Admin <lvs-user@xxxxxxxxxxxxxxxxxx>
Date: Thu, 22 Feb 2007 12:49:27 -0500 (EST)


Horms:
Just wanted to say thank you for your suggestions back in November. I (finally) upgraded the kernel on the two directors to a custom 2.6.20 kernel about 2-3 weeks ago, and that seems to have done the trick with the connection-count problems. I am no longer seeing inflated numbers in my active or inactive connection counts; they are now timing out properly, which is great.

Thanks!!!

Craig


On Wed, 29 Nov 2006, Horms wrote:

On Tue, Nov 28, 2006 at 08:37:28AM -0500, RU Admin wrote:

[snip]

When running "ipvsadm -lcn", I can
see connections with the CLOSE state going from 00:59 to 00:01, and
then magically going back to 00:59 again for no reason.  The same
holds true for ESTABLISHED connections, I see them go from 29:59 to
00:01 and then back to 29:59, and I know for a fact that the
connection from the client has ended.
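The timer behaviour described above can be observed directly by refreshing the connection table and filtering on a single client. A minimal sketch (the client address is a placeholder, not one from this thread):

```shell
# Refresh the IPVS connection table every second and watch the
# state and expiry timer of one client's entries on the director.
# 203.0.113.5 is a placeholder client IP; requires root.
watch -n1 'ipvsadm -lcn | grep 203.0.113.5'
```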

I seem to recall a bug relating to connection entries having
the behaviour you describe above due to a race in reference counting.
Which version of the kernel do you have? Is there any chance of updating
it to something like 2.6.18?

I'm using a stock Debian Sarge kernel (2.6.8-2-686-smp), I can
definitely build the latest kernel, and if you feel that it will help
then I'll do that.  It's always risky making a major kernel change on
a production machine, which is why I wanted to hold off from making
that change until someone else familiar with IPVS felt that it might
help.

I think that it would be worth trying. Can you reproduce the problem
on a non-production machine?

[snip]

I am wondering if the problem is that for some reason the
linux-directors are not seeing the part of the close sequence
that is sent by the end-user (it won't see the portion sent by
the real-servers). Supposing for a minute that this is the case,
it would explain the strange numbers, and those strange numbers
would be affecting how wlc allocates connections.

But shouldn't IPVS time out the connection? I thought that was the purpose of
the timeouts: when the director doesn't see a close event within a specified
period of time, it simply expires the entry.
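For reference, the coarse TCP/UDP timeouts that govern this expiry can be inspected and set from the director with ipvsadm. A sketch; the values shown are illustrative, not the defaults of this particular setup:

```shell
# Show the current TCP, TCP-FIN and UDP timeouts on the director
# (requires root and the ip_vs module loaded)
ipvsadm -l --timeout

# Set them explicitly, in seconds: ESTABLISHED, FIN_WAIT, UDP.
# 900/120/300 are illustrative values, not a recommendation.
ipvsadm --set 900 120 300
```

Note that these only cover the main states; other state timers (such as the ~60s CLOSE timer visible in "ipvsadm -lcn") are fixed in the kernel on these versions.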

I actually think my close theory is wrong and that as you point out the
problem is timeouts. I think that you are correct in thinking that they
should time out. So that seems to leave us with two main possibilities:
1) there is a bug (which may have already been fixed), or 2) we are
reading the data wrong.

[snip]

How exactly did you deal with ARP, there are several methods.

On the real servers, I'm first bringing up the dummy0 interface with the VIP,
then I use "sysctl" and set the following:
  net.ipv4.conf.dummy0.rp_filter=0
  net.ipv4.conf.dummy0.arp_ignore=1
  net.ipv4.conf.dummy0.arp_announce=2
Then I bring up eth0 with the real server's regular IP address, and with
"sysctl", I set the following (includes a repeat of the above options):
  net.ipv4.conf.default.rp_filter=0
  net.ipv4.conf.all.rp_filter=0
  net.ipv4.conf.lo.rp_filter=0
  net.ipv4.conf.dummy0.rp_filter=0
  net.ipv4.conf.eth0.rp_filter=0

  net.ipv4.conf.default.arp_ignore=1
  net.ipv4.conf.all.arp_ignore=1
  net.ipv4.conf.lo.arp_ignore=1
  net.ipv4.conf.dummy0.arp_ignore=1
  net.ipv4.conf.eth0.arp_ignore=1

  net.ipv4.conf.default.arp_announce=2
  net.ipv4.conf.all.arp_announce=2
  net.ipv4.conf.lo.arp_announce=2
  net.ipv4.conf.dummy0.arp_announce=2
  net.ipv4.conf.eth0.arp_announce=2

The ARP problem was the one thing that kept me from moving to LVS-DR
for a long time.  I finally started playing with all of the
net.ipv4.conf options and bringing up the interfaces in a specific
order, and finally stumbled across a method that actually worked.  I'm
sure some of the above options don't need to be set, but it finally
works, and I'm a little afraid to touch it.
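The bring-up order described above could be scripted roughly as follows. This is a sketch, not the poster's actual script: the interface names match the thread, but the VIP is a placeholder, and it collapses the repeated sysctl settings into a loop:

```shell
#!/bin/sh
# Real-server bring-up for LVS-DR, following the order described above.
# VIP is a placeholder; substitute the actual virtual IP.
VIP=192.0.2.10

# 1. Bring up dummy0 carrying the VIP first, with its ARP behaviour
#    restricted before anything else happens.
ip link set dummy0 up
ip addr add "$VIP/32" dev dummy0
sysctl -w net.ipv4.conf.dummy0.rp_filter=0
sysctl -w net.ipv4.conf.dummy0.arp_ignore=1
sysctl -w net.ipv4.conf.dummy0.arp_announce=2

# 2. Bring up eth0 with the real server's regular address
#    (address assignment omitted here), then apply the same
#    settings across the remaining conf entries.
ip link set eth0 up
for i in default all lo dummy0 eth0; do
    sysctl -w "net.ipv4.conf.$i.rp_filter=0"
    sysctl -w "net.ipv4.conf.$i.arp_ignore=1"
    sysctl -w "net.ipv4.conf.$i.arp_announce=2"
done
```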

What you have above is the preferred method these days.

You shouldn't need to bother with lo and dummy0 as these are non-arping
interfaces (right?). Though setting them is harmless.

In any case, I agree with your analysis that ARP does not seem to be
a problem in your setup, as the connections are being forwarded by
the linux-director.

I'm going to try and build the latest 2.6.18 now, and hopefully
sometime later this week I can install the new kernel and reboot our
director. Unfortunately I've never been able to get keepalived to
handle a MASTER/SLAVE director pair properly, so I only have one
director in front of the real servers; if I make a mistake, our main
university email server will be down.

ew. Good luck :)

--
Horms
 H: http://www.vergenet.net/~horms/
 W: http://www.valinux.co.jp/en/


