
Re: DR Load balancing active/inactive connections

To: RU Admin <lvs-user@xxxxxxxxxxxxxxxxxx>
Subject: Re: DR Load balancing active/inactive connections
Cc: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
From: Horms <horms@xxxxxxxxxxxx>
Date: Wed, 29 Nov 2006 10:53:55 +0900

On Tue, Nov 28, 2006 at 08:37:28AM -0500, RU Admin wrote:

[snip]

> >>When running "ipvsadm -lcn", I can
> >>see connections with the CLOSE state going from 00:59 to 00:01, and
> >>then magically going back to 00:59 again for no reason.  The same
> >>holds true for ESTABLISHED connections, I see them go from 29:59 to
> >>00:01 and then back to 29:59, and I know for a fact that the
> >>connection from the client has ended.
> >
> >I seem to recall a bug relating to connection entries having
> >the behaviour you describe above due to a race in reference counting.
> >Which version of the kernel do you have? Is there any chance of updating
> >it to something like 2.6.18?
> 
> I'm using a stock Debian Sarge kernel (2.6.8-2-686-smp).  I can
> definitely build the latest kernel, and if you feel that it will help
> then I'll do that.  It's always risky making a major kernel change on
> a production machine, which is why I wanted to hold off from making
> that change until someone else familiar with IPVS felt that it might
> help.

I think that it would be worth trying. Can you reproduce the problem
on a non-production machine?

[snip]

> >I am wondering if the problem is that for some reason the
> >linux-directors are not seeing the part of the close sequence
> >that is sent by the end-user (it won't see the portion sent by
> >the real-servers). Supposing for a minute that this is the case,
> >it would explain the strange numbers, and those strange numbers
> >will be affecting how wlc allocates connections.
> 
> But shouldn't IPVS time out?  I thought that was the purpose of the
> timeouts...  So that when the director doesn't see a close event after
> a specified period of time, it simply times out.

I actually think my close theory is wrong and that, as you point out, the
problem is timeouts. I think that you are correct in thinking that they
should time out. So that seems to leave us with two main possibilities:
1) there is a bug (which may have already been fixed) or 2) we are
reading the data wrong.
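
To help tell those two apart, one option is to take two snapshots of
"ipvsadm -lcn" a few seconds apart and list the entries whose expire
timer went up while the state stayed the same. Below is a rough sketch
in Python. It assumes the usual six-column layout (pro, expire, state,
source, virtual, destination), and the function names are made up for
illustration. Note that a timer restarting is normal whenever a packet
arrives for that connection, so only entries for clients you know have
gone away are interesting.

```python
def parse_expire(text):
    """Convert an expire field such as '29:59' (MM:SS) to seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_connections(output):
    """Map (proto, source, virtual, destination) -> (state, expire seconds)."""
    table = {}
    for line in output.splitlines():
        fields = line.split()
        # Skip the two header lines and anything that is not a full entry.
        if len(fields) != 6 or ":" not in fields[1]:
            continue
        proto, expire, state, source, virtual, dest = fields
        table[(proto, source, virtual, dest)] = (state, parse_expire(expire))
    return table


def restarted_timers(before, after):
    """Return entries whose state is unchanged but whose timer went up."""
    restarted = []
    for key, (state, expire) in after.items():
        if key in before:
            old_state, old_expire = before[key]
            if state == old_state and expire > old_expire:
                restarted.append((key, state, old_expire, expire))
    return restarted
```

Feed parse_connections() the saved output of each snapshot and pass the
two resulting tables to restarted_timers().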

[snip]

> >How exactly did you deal with ARP? There are several methods.
> 
> On the real servers, I'm first bringing up the dummy0 interface with the VIP, 
> then I use "sysctl" and set the following:
>   net.ipv4.conf.dummy0.rp_filter=0
>   net.ipv4.conf.dummy0.arp_ignore=1
>   net.ipv4.conf.dummy0.arp_announce=2
> Then I bring up eth0 with the real server's regular IP address, and with 
> "sysctl", I set the following (includes a repeat of the above options):
>   net.ipv4.conf.default.rp_filter=0
>   net.ipv4.conf.all.rp_filter=0
>   net.ipv4.conf.lo.rp_filter=0
>   net.ipv4.conf.dummy0.rp_filter=0
>   net.ipv4.conf.eth0.rp_filter=0
> 
>   net.ipv4.conf.default.arp_ignore=1
>   net.ipv4.conf.all.arp_ignore=1
>   net.ipv4.conf.lo.arp_ignore=1
>   net.ipv4.conf.dummy0.arp_ignore=1
>   net.ipv4.conf.eth0.arp_ignore=1
> 
>   net.ipv4.conf.default.arp_announce=2
>   net.ipv4.conf.all.arp_announce=2
>   net.ipv4.conf.lo.arp_announce=2
>   net.ipv4.conf.dummy0.arp_announce=2
>   net.ipv4.conf.eth0.arp_announce=2
> 
> The ARP problem was the one thing that kept me from moving to LVS-DR
> for a long time.  I finally started playing with all of the
> net.ipv4.conf options and bringing up the interfaces in a specific
> order, and finally stumbled across a method that actually worked.  I'm
> sure some of the above options don't need to be set, but it finally
> works, and I'm a little afraid to touch it.

What you have above is the preferred method these days.

You shouldn't need to bother with lo and dummy0 as these are non-arping
interfaces (right?). Though setting them is harmless.
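
If you ever do want to trim the list, the ARP part would come down to
roughly the following. This is only a sketch in Python that prints the
sysctl lines; it assumes eth0 is the only arping interface on the
real servers, the function name is made up, and I have not tested the
reduced set on your setup. It leaves your rp_filter settings alone.

```python
def arp_sysctls(interfaces=("all", "eth0")):
    """Build the arp_ignore/arp_announce sysctl lines for the given interfaces."""
    # Values are the ones already used in the setup quoted above.
    settings = [("arp_ignore", 1), ("arp_announce", 2)]
    lines = []
    for key, value in settings:
        for iface in interfaces:
            lines.append("net.ipv4.conf.%s.%s=%d" % (iface, key, value))
    return lines


if __name__ == "__main__":
    for line in arp_sysctls():
        print(line)
```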

In any case, I agree with your analysis that ARP does not seem to be
a problem in your setup, as the connections are being forwarded by
the linux-director.

> I'm going to try and build the latest 2.6.18 now, and hopefully
> sometime later this week I can install the new kernel and reboot our
> director. Unfortunately I've never been able to get keepalived to
> handle a MASTER/SLAVE director properly, so I only have one director
> in front of the real servers, so if I make a mistake, our main
> university email server will be down.

ew. Good luck :)

-- 
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/

