On Tue, Nov 28, 2006 at 08:37:28AM -0500, RU Admin wrote:
[snip]
When running "ipvsadm -lcn", I can
see connections with the CLOSE state going from 00:59 to 00:01, and
then magically going back to 00:59 again for no reason. The same
holds true for ESTABLISHED connections: I see them go from 29:59 to
00:01 and then back to 29:59, even though I know for a fact that the
connection from the client has ended.
I seem to recall a bug relating to connection entries having
the behaviour you describe above due to a race in reference counting.
Which version of the kernel do you have? Is there any chance of updating
it to something like 2.6.18?
I'm using a stock Debian Sarge kernel (2.6.8-2-686-smp). I can
definitely build the latest kernel, and if you feel that it will help
then I'll do that. It's always risky making a major kernel change on
a production machine, which is why I wanted to hold off on making
that change until someone else familiar with IPVS felt that it might
help.
I think that it would be worth trying. Can you reproduce the problem
on a non-production machine?
[snip]
I am wondering if the problem is that for some reason the
linux-directors are not seeing the part of the close sequence
that is sent by the end-user (it won't see the portion sent by
the real-servers). Supposing for a minute that this is the case,
it would explain the strange numbers, and those strange numbers
will be affecting how wlc allocates connections.
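
To illustrate why stale entries would matter to wlc, here is a minimal
sketch of the selection rule. It assumes the overhead formula is
active * 256 + inactive, which is my understanding of the kernel's wlc
scheduler; the server names and counts are made up for illustration.

```python
# Sketch of the wlc (weighted least-connection) choice, not the kernel code.
# Assumption: overhead = active * 256 + inactive, and the server with the
# lowest overhead/weight ratio is chosen.

def wlc_overhead(active, inactive):
    """Cost of a real server as wlc is assumed to see it."""
    return active * 256 + inactive

def wlc_pick(servers):
    """servers maps name -> (weight, active, inactive).

    Returns the name with the lowest overhead/weight; weight 0 is skipped.
    """
    ratios = {
        name: wlc_overhead(active, inactive) / weight
        for name, (weight, active, inactive) in servers.items()
        if weight > 0
    }
    return min(ratios, key=ratios.get)

# Hypothetical real servers with accurate connection counts.
servers = {
    "rs1": (1, 10, 5),
    "rs2": (1, 8, 5),
}
print(wlc_pick(servers))

# Same real load, but four finished connections on rs2 are still
# counted as ESTABLISHED because their entries never expired.
stale = dict(servers, rs2=(1, 12, 5))
print(wlc_pick(stale))
```

With accurate counts the less loaded rs2 is picked; with the four stale
entries the choice flips to rs1, even though the real load is unchanged.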
But shouldn't IPVS time out? I thought that was the purpose of the timeouts:
when the director doesn't see a close event after a specified period of
time, the entry simply times out.
I actually think my close theory is wrong and that as you point out the
problem is timeouts. I think that you are correct in thinking that they
should time out. So that seems to leave us with two main possibilities:
1) there is a bug (which may have already been fixed) or 2) we are
reading the data wrong.
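
One way to check possibility 2 is to compare two "ipvsadm -lcn"
snapshots mechanically rather than by eye. The following is a rough
sketch, assuming the usual six-column output (pro, expire, state,
source, virtual, destination) with an MM:SS expire field. The function
names and the sample addresses are hypothetical.

```python
def parse_expire(text):
    """Convert an expire field such as '29:59' to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)

def parse_snapshot(output):
    """Map (proto, source, virtual, destination) -> (state, seconds left)."""
    entries = {}
    for line in output.splitlines():
        fields = line.split()
        # Connection lines have six fields and an MM:SS timer; headers do not.
        if len(fields) != 6 or ":" not in fields[1]:
            continue
        proto, expire, state, source, virtual, dest = fields
        entries[(proto, source, virtual, dest)] = (state, parse_expire(expire))
    return entries

def timers_reset(before, after):
    """Return entries that kept their state but gained time between snapshots."""
    suspects = []
    for key, (state, left) in after.items():
        if key in before:
            old_state, old_left = before[key]
            if state == old_state and left > old_left:
                suspects.append((key, state, old_left, left))
    return suspects

# Made-up sample data in the assumed output format.
before = parse_snapshot("""IPVS connection entries
pro expire state       source             virtual            destination
TCP 00:01  CLOSE       192.0.2.10:40001   198.51.100.1:25    10.0.0.11:25
TCP 14:30  ESTABLISHED 192.0.2.11:40002   198.51.100.1:25    10.0.0.12:25
""")
after = parse_snapshot("""IPVS connection entries
pro expire state       source             virtual            destination
TCP 00:59  CLOSE       192.0.2.10:40001   198.51.100.1:25    10.0.0.11:25
TCP 14:25  ESTABLISHED 192.0.2.11:40002   198.51.100.1:25    10.0.0.12:25
""")

for key, state, old_left, left in timers_reset(before, after):
    print(state, key[1], old_left, "->", left)
```

A timer that goes up is not proof of a bug by itself: a new packet on a
live connection legitimately refreshes it, and a client may reuse the
same source port. It only becomes suspicious for connections that are
known to have ended.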
[snip]
How exactly did you deal with ARP? There are several methods.
On the real servers, I'm first bringing up the dummy0 interface with the VIP,
then I use "sysctl" and set the following:
net.ipv4.conf.dummy0.rp_filter=0
net.ipv4.conf.dummy0.arp_ignore=1
net.ipv4.conf.dummy0.arp_announce=2
Then I bring up eth0 with the real server's regular IP address, and with
"sysctl", I set the following (includes a repeat of the above options):
net.ipv4.conf.default.rp_filter=0
net.ipv4.conf.all.rp_filter=0
net.ipv4.conf.lo.rp_filter=0
net.ipv4.conf.dummy0.rp_filter=0
net.ipv4.conf.eth0.rp_filter=0
net.ipv4.conf.default.arp_ignore=1
net.ipv4.conf.all.arp_ignore=1
net.ipv4.conf.lo.arp_ignore=1
net.ipv4.conf.dummy0.arp_ignore=1
net.ipv4.conf.eth0.arp_ignore=1
net.ipv4.conf.default.arp_announce=2
net.ipv4.conf.all.arp_announce=2
net.ipv4.conf.lo.arp_announce=2
net.ipv4.conf.dummy0.arp_announce=2
net.ipv4.conf.eth0.arp_announce=2
The ARP problem was the one thing that kept me from moving to LVS-DR
for a long time. I finally started playing with all of the
net.ipv4.conf options and bringing up the interfaces in a specific
order, and finally stumbled across a method that actually worked. I'm
sure some of the above options don't need to be set, but it finally
works, and I'm a little afraid to touch it.
What you have above is the preferred method these days.
You shouldn't need to bother with lo and dummy0 as these are non-arping
interfaces (right?). Though setting them is harmless.
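
If that holds, the list could be trimmed to the "all", "default" and
eth0 entries. The helper below is only a sketch that prints such a
reduced list; the function name is hypothetical, the values are the
ones quoted above, and the trimmed set has not been tested on your
setup, whereas your full list is known to work.

```python
def arp_sysctl_lines(interfaces=("all", "default", "eth0")):
    """Build sysctl lines for the given interfaces.

    Assumption: lo and dummy0 can be left out because they do not ARP.
    """
    settings = (("rp_filter", 0), ("arp_ignore", 1), ("arp_announce", 2))
    return [
        f"net.ipv4.conf.{iface}.{key}={value}"
        for key, value in settings
        for iface in interfaces
    ]

print("\n".join(arp_sysctl_lines()))
```

The printed lines can be saved to a file and loaded with "sysctl -p"
followed by the file name.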
In any case, I agree with your analysis that ARP does not seem to be
a problem in your setup, as the connections are being forwarded by
the linux-director.
I'm going to try and build the latest 2.6.18 now, and hopefully
sometime later this week I can install the new kernel and reboot our
director. Unfortunately I've never been able to get keepalived to
handle a MASTER/SLAVE director properly, so I only have one director
in front of the real servers. If I make a mistake, our main
university email server will be down.
ew. Good luck :)
--
Horms
H: http://www.vergenet.net/~horms/
W: http://www.valinux.co.jp/en/