DR Load balancing active/inactive connections

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: DR Load balancing active/inactive connections
From: RU Admin <lvs-user@xxxxxxxxxxxxxxxxxx>
Date: Tue, 21 Nov 2006 08:57:59 -0500 (EST)

I've been using IPVS for almost two years now. I started out with 6 machines (1 director, 5 real servers) using LVS-NAT, and during the first year of running that email server everything worked perfectly. About a year ago I set up another email server, this time with 5 machines (1 director, 4 real servers), and decided it was time to get LVS-DR working, which I did successfully. Since that LVS-DR server was working great, I then switched my first email server (the one with 6 machines) over to LVS-DR as well. Both of my email servers have been working great with LVS-DR for the past year, with one major exception, and it has recently been getting worse because of the large volume of connections coming into the servers.

The problem is that my active/inactive connections are not being counted properly: the counters just keep going up and up, and are constantly skewed. I've read through a good number of archived messages on this mailing list, and I keep seeing people say "Those numbers ipvsadm is showing are just for reference, they don't really mean anything, don't worry about them." Well, I can tell you first hand that when you use wlc (weighted least connections), those numbers obviously DO mean something. (As I understand it, wlc picks the real server with the lowest (ActiveConn*256 + InActConn)/Weight, so inflated counters skew the scheduling directly.) My machines are no longer being balanced equally because my connection counts are off, and this is really affecting the performance of my email servers. When running "ipvsadm -lcn", I can see connections in the CLOSE state count down from 00:59 to 00:01 and then magically jump back to 00:59 for no reason. The same holds true for ESTABLISHED connections: I see them go from 29:59 down to 00:01 and then back to 29:59, even though I know for a fact that the client's connection has ended.
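In case anyone wants to see exactly what I'm looking at, this is roughly how I watch those timers (nothing fancy, just the standard connection listing refreshed every second; swap the grep pattern for ESTABLISHED to watch those instead):
  # refresh the connection table every second and show only CLOSE entries
  watch -n 1 "/sbin/ipvsadm -Lcn | grep CLOSE"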

I'm currently using "IP Virtual Server version 1.2.0", and I know that there is a 1.2.1 version available, but my problem is that my email servers are in a production environment, and I really don't want to recompile a new kernel with the latest IPVS if that isn't going to solve the problem. I'd hate to cause other problems with my system because of a major kernel upgrade.

I can only hope that someone has some suggestions. I am a firm supporter of IPVS; as I said, I've been using it for 2 years now, and one of my email servers handles over 30,000,000 emails in a month (almost 1 million emails a day), so we rely heavily on IPVS. There is another department in our organization that spent thousands of dollars on FoundryNet load balancing products, and I've been able to accomplish the same tasks (and handle a higher load) using IPVS, so clearly IPVS is a solid product. Unfortunately, I just really need to figure out what is going on with these connection count problems.

I'm not sure what information you guys need, but here's some info about my setup. If you need any more details, feel free to ask.

6 Dell PowerEdge SC1425
Dual Xeon 3.06GHz processors
2GB DDR
160GB SATA
Running Debian Sarge

1 machine is the director, the other 5 are the real servers. All 6 machines are on the same subnet (with public IPs), and the director is using LVS-DR for load balancing. Just to give you an idea as to the types of connection numbers I'm getting:
  Prot LocalAddress:Port Scheduler Flags
    -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
  TCP  vip.address.here:smtp wlc
    -> realserver1.ip.here:smtp     Route   50     648        2357
    -> realserver2.ip.here:smtp     Route   50     650        2231
    -> realserver3.ip.here:smtp     Route   50     648        2209
Whereas when using LVS-NAT (which was 100% perfect), my numbers would be something like:
    -> realserver1.ip.here:smtp     Masq    50     16         56
    -> realserver2.ip.here:smtp     Masq    50     14         50
    -> realserver3.ip.here:smtp     Masq    50     15         48

I use keepalived to manage the director and to monitor the real servers. The only "tweaking" I've done to IPVS is that I have to run this:
  /sbin/ipvsadm --set 1800 0 0
before starting up keepalived, just so that active connections will stay active for 30 minutes. In other words, we allow our users to keep a connection idle for up to 30 minutes, and after that the connection should be terminated. I put "0 0" there because, from what I've read, that tells ipvsadm not to change the other two values (the tcpfin and udp timeouts), in other words, to leave those defaults as they are.
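For reference, this is roughly how I confirm the timeout actually took effect (I'm assuming here that the ipvsadm build shipped with Sarge supports the --timeout listing option):
  # set the TCP session timeout to 30 minutes; the two zeros leave the
  # tcpfin and udp timeouts at whatever they are already set to
  /sbin/ipvsadm --set 1800 0 0
  # list the current tcp/tcpfin/udp timeout values to confirm
  /sbin/ipvsadm -L --timeout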

That's about all I can think of. The only other weird thing I had to do was tweak some networking settings on the real servers to fix the pain-in-the-@$$ ARP issues that come with DR. But I doubt those changes have anything to do with the director's load balancing problems: the tweaks were only done on the real servers, and all they do is stop the real servers from answering ARP for the VIP that lives on their dummy0 interfaces.
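For anyone curious, the tweaks were along these lines (just a sketch of the usual arp_ignore/arp_announce approach, assuming a 2.6 kernel with the VIP on dummy0; the exact commands I ran may differ slightly):
  # on each real server: don't answer ARP requests for addresses (like
  # the VIP on dummy0) that aren't configured on the receiving interface
  echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
  # when sending ARP requests, always use the outgoing interface's own
  # address as the source, never the VIP
  echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
  # the VIP itself is configured on dummy0 as a /32 (placeholder address as above)
  ip addr add vip.address.here/32 dev dummy0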

And for those interested, I switched from LVS-NAT to LVS-DR because I really feel you can get much better network throughput with DR than with NAT. I know I've read a bunch of messages on the mailing list saying that NAT is just as good, but I think one major advantage of IPVS is that it supports DR, whereas almost every other load balancing product I've seen uses some type of NATing (in other words, all network traffic goes in and out of the director). Having a setup like mine, where only incoming traffic has to go through the director, is absolutely fantastic, because the cluster (for lack of a better word) can be easily expanded. With LVS-NAT, when you add more real servers all you get is more CPU power; you don't get any more network throughput. With LVS-DR, when you add a new real server you expand the whole cluster, not just one part of it.

Sorry for the long email. But I really would appreciate any help that can be provided.

Thanks!

Craig



