Re: DR Load balancing active/inactive connections

To: " users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: DR Load balancing active/inactive connections
From: Horms <horms@xxxxxxxxxxxx>
Date: Tue, 28 Nov 2006 14:37:27 +0900

On Tue, Nov 21, 2006 at 08:57:59AM -0500, RU Admin wrote:
> I've been using IPVS for almost two years now; I started out with 6
> machines (1 director, 5 real servers) and was using LVS-NAT.  During
> the first year that I was running that email server everything worked
> perfectly with LVS-NAT.  About a year ago, I decided to setup another
> email server, this time with 5 machines (1 director, 4 real servers)
> and decided it was time to get LVS-DR working, which I successfully
> did.  I then decided to switch over my first email server (the one
> with 6 machines) to LVS-DR, since the other LVS-DR server was working
> great. Both of my email servers have been working great with LVS-DR
> for the past year, with one major exception (which has just recently
> started getting worse, because of the large volumes of connections
> coming into the servers).  The problem I am having is that my
> active/inactive connections are not being listed properly.  What I
> mean is that the counters for my active/inactive connections just keep
> going up and up, and are constantly being skewed.  I read through a
> good number of archived messages on this mailing list, and I keep
> seeing everyone saying "Those numbers ipvsadm are showing, are just
> for reference, they don't really mean anything, don't worry about
> them."   Well, I can tell you first hand, when you use wlc (weighted
> least connections), those numbers obviously DO mean something.  My
> machines are no longer being balanced equally because my connection
> counts are off, and this is really affecting the
> performance of my email servers.  When running "ipvsadm -lcn", I can
> see connections in the CLOSE state going from 00:59 to 00:01, and
> then magically going back to 00:59 again for no reason.  The same
> holds true for ESTABLISHED connections: I see them go from 29:59 to
> 00:01 and then back to 29:59, and I know for a fact that the
> connection from the client has ended.

I seem to recall a bug relating to connection entries having
the behaviour you describe above due to a race in reference counting.
Which version of the kernel do you have? Is there any chance of updating
it to something like 2.6.18?
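
If it helps, a quick way to confirm what the director is actually
running (the exact output format varies a little between versions,
so treat this as a rough sketch):

  # kernel version on the director
  uname -r

  # IPVS version as reported by the kernel (first line of output)
  /sbin/ipvsadm -Ln | head -1

  # or read it straight from /proc
  head -1 /proc/net/ip_vs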

> I'm currently using "IP Virtual Server version 1.2.0", and I know that
> there is a 1.2.1 version available, but my problem is that my email
> servers are in a production environment, and I really don't want to
> recompile a new kernel with the latest IPVS if that isn't going to
> solve the problem.  I'd hate to cause other problems with my system
> because of a major kernel upgrade.
> I can only hope that someone has some suggestions; I am a firm
> supporter of IPVS, and as I said I've been using it for 2 years now
> and one of my email servers handles over 30,000,000 emails in one
> month (or almost 1 million emails a day).  So we are relying heavily
> on IPVS.  There is another department in our organization that spent
> thousands of dollars on FoundryNet load balancing products, and
> I've been able to accomplish the same tasks (and handle a higher load)
> by using IPVS, so clearly IPVS is a solid product.  Unfortunately, I
> just really need to figure out what is going on with the connection
> count problems.
> I'm not sure what information you guys need, but here's some info about
> my setup.  If you need any more details, feel free to ask.
> 6 Dell PowerEdge SC1425
> Dual Xeon 3.06Ghz processors
> 160GB SATA
> Running Debian Sarge
> 1 machine is the director, the other 5 are the real servers.  All 6
> machines are on the same subnet (with public IPs), and the director is
> using LVS-DR for load balancing.  Just to give you an idea as to the
> types of connection numbers I'm getting:
>   Prot LocalAddress:Port Scheduler Flags
>     -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
>   TCP wlc
>     ->     Route   50     648        2357
>     ->     Route   50     650        2231
>     ->     Route   50     648        2209
> Whereas when using LVS-NAT (which was 100% perfect), my numbers would be 
> something like:
>     ->     Route   50     16        56
>     ->     Route   50     14        50
>     ->     Route   50     15        48

I assume that the dumps above are for similar traffic rates.

I am wondering if the problem is that for some reason the
linux-director is not seeing the part of the close sequence
that is sent by the end-user (it won't see the portion sent by
the real-servers). Supposing for a minute that this is the case,
it would explain the strange numbers, and those strange numbers
would be affecting how wlc allocates connections.
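
One way to check that theory would be to watch for the client's side of
the close on the director itself. A rough sketch (the VIP and the
interface below are placeholders, substitute your own values):

  # on the director: look for FIN/RST segments sent by clients to the VIP
  tcpdump -ni eth0 "dst host $VIP and (tcp[tcpflags] & (tcp-fin|tcp-rst) != 0)"

If the client FINs do show up here but the matching entries in
"ipvsadm -Lcn" stay in ESTABLISHED, that would point back towards the
reference counting problem rather than missing packets.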

> I use keepalived to manage the director and to monitor the real
> servers. The only "tweaking" that I've done to IPVS is that I have to run
> this:
>   /sbin/ipvsadm --set 1800 0 0
> before starting up keepalived, just so that the active connections
> will stay active for 30 minutes.  In other words, we allow our users
> to idle their connection for 30 minutes, and after that the
> connection should be terminated.  And I put "0 0" there, because from
> what I've read, that tells ipvsadm to not change those other two
> values (in other words, leave the defaults as is).
> That's about all I can think of; the only other weird thing that I had
> to do was to tweak some networking settings on the real servers to fix
> the pain-in-the-@$$ ARP issues that come with DR.  But I doubt those
> changes would have anything to do with the director's load balancing
> problems. Those tweaks were only done on the real servers, and they
> were to just silence the broadcasting of the MAC address for the VIP
> (dummy0) interfaces on the real servers.
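
On the timeout setting: as far as I can tell "--set 1800 0 0" does what
you expect; the two zeroes leave the FIN-wait and UDP timeouts at their
defaults. If you want to double-check what the director has actually
applied, something like this should show it:

  # display the current tcp / tcpfin / udp timeout values
  /sbin/ipvsadm -L --timeout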

How exactly did you deal with ARP? There are several methods.
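
For what it's worth, one common way of doing that on 2.6 real servers
is the arp_ignore/arp_announce sysctls, roughly like this ($VIP below
is a placeholder for your virtual IP):

  # on each real server: only answer ARP for addresses configured on the
  # interface the request arrived on, and prefer a real address (not the
  # VIP) as the source in outgoing ARP requests
  echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
  echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce

  # with the VIP itself on a non-ARPing local interface, e.g. dummy0
  ip addr add $VIP/32 dev dummy0

Other people use the "hidden" interface patch or arptables instead, so
knowing which approach you took would help.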

