I've been using IPVS for almost two years now. I started out with 6
machines (1 director, 5 real servers) using LVS-NAT. During the first
year that I was running that email server, everything worked perfectly
with LVS-NAT. About a year ago, I decided to set up another email
server, this time with 5 machines (1 director, 4 real servers), and
decided it was time to get LVS-DR working, which I did successfully. I
then decided to switch my first email server (the one with 6 machines)
over to LVS-DR as well, since the other LVS-DR server was working great.
Both of my email servers have been working great with LVS-DR for the past
year, with one major exception (which has recently started getting worse
because of the large volume of connections coming into the servers). The
problem I am having is that my active/inactive connections are not being
reported properly. What I mean is that the counters for my active/inactive
connections just keep going up and up, and are constantly skewed. I read
through a good number of archived messages on this mailing list, and I
keep seeing everyone say "Those numbers ipvsadm is showing are just for
reference, they don't really mean anything, don't worry about them."
Well, I can tell you first hand that when you use wlc (weighted least
connections), those numbers obviously DO mean something. My machines are
no longer being balanced equally because my connection counts are off,
and this is really affecting the performance of my email servers. When
running "ipvsadm -lcn", I can see connections in the CLOSE state going
from 00:59 down to 00:01 and then magically jumping back to 00:59 for no
reason. The same holds true for ESTABLISHED connections: I see them go
from 29:59 to 00:01 and then back to 29:59, even though I know for a fact
that the connection from the client has ended.
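In case anyone wants to see this happen live, something along these lines
should show it (the client IP is just a placeholder; substitute the
address of a client you know has already disconnected):

watch -n 1 "ipvsadm -lcn | grep client.ip.here"

The expiration column for that client's entries is the value I see
jumping back up.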
I'm currently using "IP Virtual Server version 1.2.0", and I know that
there is a 1.2.1 version available, but my email servers are in a
production environment, and I really don't want to recompile a new kernel
with the latest IPVS if that isn't going to solve the problem. I'd hate
to cause other problems with my system because of a major kernel upgrade.
I can only hope that someone has some suggestions. I am a firm supporter
of IPVS, and as I said I've been using it for 2 years now; one of my
email servers handles over 30,000,000 emails a month (almost 1 million
emails a day), so we rely heavily on IPVS. There is another department in
our organization that spent thousands of dollars on FoundryNet load
balancing products, and I've been able to accomplish the same tasks (and
handle a higher load) using IPVS, so clearly IPVS is a solid product. I
just really need to figure out what is going on with these connection
count problems.
I'm not sure what information you guys need, but here's some info about my
setup. If you need any more details, feel free to ask.
6 Dell PowerEdge SC1425
Dual Xeon 3.06GHz processors
2GB DDR
160GB SATA
Running Debian Sarge
One machine is the director, and the other 5 are the real servers. All 6
machines are on the same subnet (with public IPs), and the director is
using LVS-DR for load balancing. Just to give you an idea of the kind of
connection numbers I'm getting:
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP vip.address.here:smtp wlc
-> realserver1.ip.here:smtp Route 50 648 2357
-> realserver2.ip.here:smtp Route 50 650 2231
-> realserver3.ip.here:smtp Route 50 648 2209
Whereas when using LVS-NAT (which was 100% perfect), my numbers would be
something like:
-> realserver1.ip.here:smtp Masq 50 16 56
-> realserver2.ip.here:smtp Masq 50 14 50
-> realserver3.ip.here:smtp Masq 50 15 48
I use keepalived to manage the director and to monitor the real servers.
The only "tweaking" I've done to IPVS is that I have to run this:
/sbin/ipvsadm --set 1800 0 0
before starting up keepalived, just so that active connections will stay
active for 30 minutes. In other words, we allow our users to idle their
connections for up to 30 minutes, and after that the connection should be
terminated. I put "0 0" there because, from what I've read, that tells
ipvsadm not to change the other two values (in other words, leave those
defaults as they are).
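As far as I can tell, you can double-check that the change took effect
with something like this (I believe ipvsadm can list the timeout values,
though the exact flag may vary with the version):

/sbin/ipvsadm -L --timeout

which should print the TCP, TCP FIN, and UDP timeouts in seconds. And in
case it's relevant, the virtual_server definition in my keepalived.conf
is along these lines (addresses changed and the health check simplified):

virtual_server vip.address.here 25 {
    delay_loop 10
    lb_algo wlc
    lb_kind DR
    protocol TCP

    real_server realserver1.ip.here 25 {
        weight 50
        TCP_CHECK {
            connect_port 25
            connect_timeout 5
        }
    }
    # ...one real_server block per real server
}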
That's about all I can think of. The only other weird thing I had to do
was tweak some networking settings on the real servers to work around the
pain-in-the-@$$ ARP issues that come with DR. But I doubt those changes
have anything to do with the director's load balancing problems. Those
tweaks were only done on the real servers, and all they do is keep the
real servers from advertising the MAC address for the VIP on their dummy0
interfaces.
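For anyone curious, the tweaks were along the lines of the standard 2.6
arp_ignore/arp_announce sysctls on each real server (eth0 here stands for
whatever NIC the ARP requests actually come in on):

# keep this real server from answering ARP for the VIP held on dummy0
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2
sysctl -w net.ipv4.conf.eth0.arp_ignore=1
sysctl -w net.ipv4.conf.eth0.arp_announce=2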
And for those interested, I switched from LVS-NAT to LVS-DR because I
really feel you can get much better network throughput with DR than with
NAT. I know I've read a bunch of messages on the mailing list saying that
NAT is just as good, but I think one major advantage of IPVS is that it
supports DR, whereas almost every other load balancing product I've seen
uses some type of NAT (in other words, all network traffic goes in and
out of the director). Having a setup like I do now, where only incoming
traffic has to go through the director, is absolutely fantastic, because
the cluster (for lack of a better word) can be easily expanded. With
LVS-NAT, when you add more real servers, all you get is more CPU power;
you don't get any more network throughput. With LVS-DR, when you add a
new real server, you expand the whole cluster, not just one part of it.
Sorry for the long email, but I really would appreciate any help that can
be provided.
Thanks!
Craig