On Tue, Nov 21, 2006 at 08:57:59AM -0500, RU Admin wrote:
>
> I've been using IPVS for almost two years now, I started out with 6
> machines (1 director, 5 real servers) and was using LVS-NAT. During
> the first year that I was running that email server everything worked
> perfectly with LVS-NAT. About a year ago, I decided to set up another
> email server, this time with 5 machines (1 director, 4 real servers)
> and decided it was time to get LVS-DR working, which I successfully
> did. I then decided to switch over my first email server (the one
> with 6 machines) to LVS-DR, since the other LVS-DR server was working
> great. Both of my email servers have been working great with LVS-DR
> for the past year, with one major exception (which has just recently
> started getting worse, because of the large volumes of connections
> coming into the servers). The problem I am having is that my
> active/inactive connections are not being listed properly. What I
> mean is that the counters for my active/inactive connections just keep
> going up and up, and are constantly skewed. I read through a
> good number of archived messages on this mailing list, and I keep
> seeing everyone saying "Those numbers ipvsadm is showing are just
> for reference, they don't really mean anything, don't worry about
> them." Well, I can tell you first hand, when you use wlc (weighted
> least connections), those numbers obviously DO mean something. My
> machines are no longer being balanced evenly because my
> connection counts are off, and this is really affecting the
> performance of my email servers. When running "ipvsadm -lcn", I can
> see connections with the CLOSE state going from 00:59 to 00:01, and
> then magically going back to 00:59 again for no reason. The same
> holds true for ESTABLISHED connections, I see them go from 29:59 to
> 00:01 and then back to 29:59, and I know for a fact that the
> connection from the client has ended.
I seem to recall a bug relating to connection entries having
the behaviour you describe above due to a race in reference counting.
Which version of the kernel do you have? Is there any chance of updating
it to something like 2.6.18?
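
In case it helps with answering the version question, here is a minimal sketch for collecting both numbers on the director. It assumes the usual /proc layout, where the first line of /proc/net/ip_vs reads something like "IP Virtual Server version 1.2.0 (size=4096)"; the helper name ipvs_version is just made up for this example.

```shell
#!/bin/sh
# Extract the IPVS version from a banner line read on stdin, e.g.
# "IP Virtual Server version 1.2.0 (size=4096)" -> "1.2.0".
# (ipvs_version is a hypothetical helper name, not part of ipvsadm.)
ipvs_version() {
    sed -n 's/^IP Virtual Server version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Report the running kernel release.
echo "kernel: $(uname -r)"

# Report the IPVS version if the ip_vs module exposes its /proc file.
if [ -r /proc/net/ip_vs ]; then
    echo "ipvs:   $(ipvs_version < /proc/net/ip_vs)"
else
    echo "ipvs:   /proc/net/ip_vs not readable (ip_vs module not loaded?)"
fi
```

The kernel release is the more important of the two, since the IPVS code ships with the kernel.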
> I'm currently using "IP Virtual Server version 1.2.0", and I know that
> there is a 1.2.1 version available, but my problem is that my email
> servers are in a production environment, and I really don't want to
> recompile a new kernel with the latest IPVS if that isn't going to
> solve the problem. I'd hate to cause other problems with my system
> because of a major kernel upgrade.
>
> I can only hope that someone has some suggestions, I am a firm
> supporter of IPVS, and as I said I've been using it for 2 years now
> and one of my email servers handles over 30,000,000 emails in one
> month (or almost 1 million emails a day). So we rely heavily on
> IPVS. There is another department in our organization that spent
> thousands of dollars on FoundryNet load balancing products, and
> I've been able to accomplish the same tasks (and handle a higher load)
> by using IPVS, so clearly IPVS is a solid product. Unfortunately, I
> just really need to figure out what is going on with the connection
> count problems.
>
> I'm not sure what information you guys need, but here's some info about
> my setup. If you need any more details, feel free to ask.
>
> 6 Dell PowerEdge SC1425
> Dual Xeon 3.06GHz processors
> 2GB DDR
> 160GB SATA
> Running Debian Sarge
>
> 1 machine is the director, the other 5 are the real servers. All 6
> machines are on the same subnet (with public IPs), and the director is
> using LVS-DR for load balancing. Just to give you an idea as to the
> types of connection numbers I'm getting:
>
> Prot LocalAddress:Port Scheduler Flags
>   -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
> TCP  vip.address.here:smtp wlc
>   -> realserver1.ip.here:smtp     Route   50     648        2357
>   -> realserver2.ip.here:smtp     Route   50     650        2231
>   -> realserver3.ip.here:smtp     Route   50     648        2209
>
> Whereas when using LVS-NAT (which was 100% perfect), my numbers would be
> something like:
>   -> realserver1.ip.here:smtp     Route   50     16         56
>   -> realserver2.ip.here:smtp     Route   50     14         50
>   -> realserver3.ip.here:smtp     Route   50     15         48
I assume that the dumps above are for similar traffic rates.
I am wondering if the problem is that, for some reason, the
linux-directors are not seeing the part of the close sequence
that is sent by the end-user (with LVS-DR they never see the
portion sent by the real-servers, as replies go straight back to
the client). Supposing for a minute that this is the case,
it would explain the strange numbers, and those strange numbers
will be affecting how wlc allocates connections.
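
To illustrate why inflated counters matter to wlc, here is a rough sketch of the cost it assigns to each real server. As far as I recall from ip_vs_wlc.c, the overhead is estimated as ActiveConn * 256 + InActConn, and the server with the lowest overhead relative to its weight gets the next connection. The helper name wlc_overhead is made up for this example, and it assumes the column layout of "ipvsadm -L -n" shown above.

```shell
#!/bin/sh
# Read "ipvsadm -L -n" style output on stdin and print, for each real
# server line ("-> addr:port Forward Weight ActiveConn InActConn"),
# the estimated wlc overhead and the overhead per unit of weight.
# (wlc_overhead is a hypothetical helper; the 256 factor is taken from
# my recollection of the kernel source, so treat it as an assumption.)
wlc_overhead() {
    awk '$1 == "->" && $4 ~ /^[0-9]+$/ && $4 > 0 {
        overhead = $5 * 256 + $6
        printf "%s overhead=%d per_weight=%.1f\n", $2, overhead, overhead / $4
    }'
}

# Usage on the director (needs root):
#   ipvsadm -L -n | wlc_overhead
```

Because active connections count 256 times as much as inactive ones, a handful of stale ESTABLISHED entries on one real server is enough to push new connections elsewhere.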
> I use keepalived to manage the director and to monitor the real
> servers. The only "tweaking" that I've done to IPVS, is I have to run
> this:
> /sbin/ipvsadm --set 1800 0 0
> before starting up keepalived, just so that the active connections
> will stay active for 30 minutes. In other words, we allow our users
> to idle their connection for 30 minutes, and after that, then the
> connection should be terminated. And I put "0 0" there, because from
> what I've read, that tells ipvsadm to not change those other two
> values (in other words, leave the defaults as is).
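
That matches my understanding of --set: a value of 0 leaves the corresponding timeout as it is. A small sketch for confirming what the director actually ended up with, assuming "ipvsadm -L --timeout" prints a line of the form "Timeout (tcp tcpfin udp): 1800 120 300" (the numbers here are only example values, and ipvs_timeouts is a made-up helper name):

```shell
#!/bin/sh
# Turn the timeout line printed by "ipvsadm -L --timeout" into
# labelled key=value pairs.
# (ipvs_timeouts is a hypothetical helper, not part of ipvsadm.)
ipvs_timeouts() {
    sed -n 's/^Timeout (tcp tcpfin udp): \([0-9]*\) \([0-9]*\) \([0-9]*\).*/tcp=\1 tcpfin=\2 udp=\3/p'
}

# Usage on the director (needs root):
#   ipvsadm -L --timeout | ipvs_timeouts
```

If tcp shows 1800 and the other two are unchanged, the --set call did what you intended.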
>
> That's about all I can think of, the only other weird thing that I had
> to do was to tweak some networking settings on the real servers to fix
> the pain-in-the-@$$ ARP issues that come with DR. But I doubt those
> changes would have anything to do with the director's load balancing
> problems. Those tweaks were only done on the real servers, and they
> were to just silence the broadcasting of the MAC address for the VIP
> (dummy0) interfaces on the real servers.
How exactly did you deal with ARP? There are several methods.
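
The ones I usually see are the arp_ignore/arp_announce sysctls on 2.6 kernels, the older "hidden" interface patch, and arptables rules. If you went the sysctl route, a sketch like the following would show the current values on a real server. It assumes the standard /proc/sys/net/ipv4/conf layout and your dummy0 interface; arp_settings is just a name made up for this example.

```shell
#!/bin/sh
# Print arp_ignore and arp_announce for each named interface.
# $1 is the sysctl directory, the remaining arguments are interfaces.
# (arp_settings is a hypothetical helper name.)
arp_settings() {
    base=$1
    shift
    for iface in "$@"; do
        for knob in arp_ignore arp_announce; do
            f="$base/$iface/$knob"
            if [ -r "$f" ]; then
                printf '%s.%s = %s\n' "$iface" "$knob" "$(cat "$f")"
            else
                printf '%s.%s = (not available)\n' "$iface" "$knob"
            fi
        done
    done
}

# Usage on a real server:
#   arp_settings /proc/sys/net/ipv4/conf all dummy0
```

A commonly used combination is arp_ignore=1 and arp_announce=2, but please post whatever you actually have.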
--
Horms
H: http://www.vergenet.net/~horms/
W: http://www.valinux.co.jp/en/