On Mon, May 04, 2009 at 10:31:59AM +0200, Christian Frost wrote:
> Hi,
> We have a setup including two real servers, each of which runs an
> instance of MySql with the max_connections option set to 1000. In this
> setup we have run some performance tests with mysqlslap to determine
> the throughput of the setup. These tests involve simulating many
> simultaneous users querying the database. Under these conditions we have
> encountered some problems with the load balancer. Specifically, using
> ipvsadm -L -n to monitor the connections during the performance test,
> there are initially many connections represented as inactive. After a
> few seconds the inactive connections are represented as active on the
> respective real server. This causes a problem when the Least-Connection
> scheduling algorithm is used because the connections are not distributed
> equally between the two real hosts. The two real hosts are almost equal
> in terms of processing capacity.
>
> The output of ipvsadm -L -n shown below probably explains the
> problem better.
>
> ipvsadm -L -n a few seconds into the test simulating 200 MySql clients
> connecting simultaneously.
>
> IP Virtual Server version 1.2.1 (size=4096)
> Prot LocalAddress:Port Scheduler Flags
> -> RemoteAddress:Port Forward Weight ActiveConn InActConn
> TCP 10.0.1.5:3306 lc
> -> 10.0.1.2:3306 Route 1 71 0
> -> 10.0.1.4:3306 Route 1 70 60
>
>
> ipvsadm -L -n 30 seconds into the test simulating 200 MySql clients
> connecting simultaneously. Note that the load balancer uses the
> Least-Connection scheduling algorithm.
>
> IP Virtual Server version 1.2.1 (size=4096)
> Prot LocalAddress:Port Scheduler Flags
> -> RemoteAddress:Port Forward Weight ActiveConn InActConn
> TCP 10.0.1.5:3306 lc
> -> 10.0.1.2:3306 Route 1 71 0
> -> 10.0.1.4:3306 Route 1 130 0
>
>
> The problem does not occur if the connections are made sequentially and
> if the total number of connections is below about 100.
>
> Is there anything we can do to avoid these problems?
Hi Christian,
I'm taking a bit of a stab in the dark, but I think that the problem that
you are seeing is with the lc (and wlc) algorithms' interaction with bursts
of connections.
I think that the core of the problem is the way that lc calculates the
overhead of a server. This is relevant because an incoming connection is
allocated to whichever real-server is deemed to have the lowest overhead
at that time.
In net/netfilter/ipvs/ip_vs_lc.c:ip_vs_lc_dest_overhead()
overhead is calculated as:
active_connections * 256 + inactive_connections
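
If I remember correctly, the function in question looks roughly like this
(a sketch from memory - check the source in your kernel tree for the
exact code):

    static inline unsigned int
    ip_vs_lc_dest_overhead(struct ip_vs_dest *dest)
    {
            /* Active connections are weighted 256 times (<< 8) more
             * heavily than inactive ones when estimating load. */
            return (atomic_read(&dest->activeconns) << 8) +
                    atomic_read(&dest->inactconns);
    }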
So suppose that things are in a more or less balanced state:
real-server A has 71 active connections and real-server B has 70.
Then a big burst of 60 new connections comes in. The first of these new
connections will go to real-server B, as expected. This connection will be
in the inactive state until the 3-way handshake is complete. So far so good.
Unfortunately, if the remaining 59 new connections come in before any of
these new connections completes the handshake and moves into the active
state, they will all be allocated to real-server B because:
71 * 256 + 0 > 70 * 256 + n
where: n < 256
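To make that concrete with the numbers above: real-server A's overhead
stays at 71 * 256 = 18176, while real-server B's overhead after accepting
all 60 inactive connections is only 70 * 256 + 60 = 17980. So B remains
the "least loaded" server for the entire burst.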
Assuming that I am correct, I can think of two methods of addressing this
problem:
1) Simply change 256 to a smaller value. In this case 256 basically
   ends up being the granularity of balancing for bursts of connections,
   and in the case at hand it is clearly too coarse. Perhaps 8, 2 or
   even 1 would be a better value (see the sketch after these two options).
   This should be a trivial change to the code, and if lc is a module
   you wouldn't even need to recompile the entire kernel - though you
   would need to track down the original kernel source and config.
   The main drawback of this is that if you have a lot of old, actually
   dead, connections in the inactive state, then it might cause imbalance.
   If that does help, it might be good to consider making this parameter
   configurable at run time, at least globally.
2) A more complex, though arguably better, approach would be to implement
   some kind of slow-start feature. That is, to assign some kind of weight
   to new connections. I had a stab at this in the past - it should
   be in the archives - though I think my solution only addressed the
   problem for active connections. But the idea seems reasonable
   to extend to this problem.
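
As an illustration of option 1, here is an untested sketch of what the
change might look like (IP_VS_LC_ACTIVE_WEIGHT is a name I have made up
for illustration; the current code effectively hard-codes 256 via a
shift):

    /* Assumed cost of an active connection relative to an inactive
     * one. Smaller values balance bursts of new connections more
     * evenly, at the risk of imbalance from stale inactive entries. */
    #define IP_VS_LC_ACTIVE_WEIGHT 8

    static inline unsigned int
    ip_vs_lc_dest_overhead(struct ip_vs_dest *dest)
    {
            return atomic_read(&dest->activeconns) * IP_VS_LC_ACTIVE_WEIGHT +
                   atomic_read(&dest->inactconns);
    }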
--
Simon Horman
VA Linux Systems Japan K.K. Satellite Lab in Sydney, Australia
H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en