Performance issues with apache/lvs

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Performance issues with apache/lvs
From: Kees Hoekzema <kees@xxxxxxxxxxxx>
Date: Sun, 6 Feb 2005 03:36:36 +0100
Hello list,

Unfortunately I have another problem with LVS. Somehow my requests are
slowed down by LVS. Sometimes it takes up to 10 seconds to get a reply
from the servers. My first thought was that the realservers were
misbehaving, but after some testing and benchmarking it appears that LVS
is the cause.

My testing network looks like this:

+-------------------------------------------+
| HP Procurve GiB Switch (external traffic) |
+-------------------------------------------+
          1 |                          2 |
+--------------+          +-----------------+
| Loadbalancer |          | Benching server |
+--------------+          +-----------------+
          3 |                          4 |
+-------------------------------------------+
| HP Procurve GiB Switch (internal traffic) |
+-------------------------------------------+
        5 |             6 |
+------------+ +------------+
| Realserver | | Realserver |
+------------+ +------------+

The loadbalancer is an Intel Celeron 2 GHz w/ 512 MB RAM and two Intel
100 Mbit/s interfaces. Realserver1 is a dual Opteron 244 w/ 1 GB RAM and a
Tigon3 1 Gbit/s interface. Realserver2 is a dual Xeon 3 GHz w/ 1 GB RAM and
an Intel 1 Gbit/s interface. The benching server is a dual Xeon 2.4 GHz w/
1 GB RAM and also a 1 Gbit/s interface. The switches are both HP ProCurve
10/100/1000 switches.

The problem is this: when I test from the benching server directly against
the realservers (path each packet takes: 4-5 5-4), the average request time
is around 21 ms per request, with a maximum of ~80 ms. But when I test
through the loadbalancer (path: 2-1-3-5 5-3-1-2) I see maximum request
times of almost 3 seconds, and an average request time of over 140 ms.

I have created some graphs where I let 20 threads do 50 requests each and
plotted the time each request took. They can be found at
http://arethusa.tweakers.net/~kees/lvs.html (don't mind the x-axis, I've
reused some code to create the graphs).
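
For anyone who wants to repeat this kind of measurement, here is a minimal
sketch of the idea in Python: 20 threads, 50 requests each, with the time of
every request recorded. It is not the code behind the graphs above. The
function names, the 30 second timeout and the address in the usage comment
are placeholders made up for this illustration.

```python
import threading
import time
import urllib.request


def fetch(url):
    """Fetch url once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=30) as response:
        response.read()
    return time.perf_counter() - start


def run_benchmark(url, threads=20, requests_per_thread=50, timed_fetch=fetch):
    """Return one list of request times (in seconds) per thread."""
    results = [[] for _ in range(threads)]

    def worker(times):
        # Each thread issues its requests one after another.
        for _ in range(requests_per_thread):
            times.append(timed_fetch(url))

    workers = [
        threading.Thread(target=worker, args=(results[i],))
        for i in range(threads)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


def summarize(results):
    """Return (average_ms, maximum_ms) over all recorded request times."""
    flat = [t for times in results for t in times]
    return 1000 * sum(flat) / len(flat), 1000 * max(flat)


# Hypothetical usage; replace the address with the VIP or a realserver:
# print(summarize(run_benchmark("http://192.0.2.10/")))
```

Running it once against a realserver and once against the virtual service
should give the two average/maximum pairs to compare.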

As you can see in those graphs, after the first couple of hundred requests
the time it takes to serve a request somehow jumps by 3 seconds.
I noticed this because with ab I would get something like 3,000 requests
per second from each realserver and a bit more from the loadbalancer, but
only if I did fewer than 100 requests (on the loadbalancer that is; the
realservers had no problem with more requests). As soon as I increased the
number of requests, ab showed that it sometimes took up to 8 seconds to
complete a request. My colleagues could reproduce the same results, also
using ab (from different servers of course).
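
To pin down where in a run the slow requests start, the recorded times can
be scanned for outliers. The following is only a sketch: the 1000 ms
threshold is an arbitrary assumption, and the sample values in the comment
are made up for illustration, not measurements from this setup.

```python
def slow_requests(times_ms, threshold_ms=1000.0):
    """Return (index, time_ms) pairs for requests slower than threshold_ms."""
    return [(i, t) for i, t in enumerate(times_ms) if t > threshold_ms]


def first_slow_index(times_ms, threshold_ms=1000.0):
    """Return the index of the first slow request, or None if there is none."""
    slow = slow_requests(times_ms, threshold_ms)
    return slow[0][0] if slow else None


# Made-up example values in milliseconds, in the order the requests were sent:
# first_slow_index([20.0, 25.0, 3020.0, 22.0, 8000.0]) gives 2
```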

The network seems okay: connections between the realserver and the
benchserver peak at 820 Mbit/s, and also at 820 Mbit/s between the two
realservers. Between the loadbalancer and the benchserver they are quite
okay too. I even made an LVS service 'loadbalancing' the NetPIPE port to
one server, with no big drops in speed (it peaks at 86 Mbit/s).

My questions regarding this:
- Can this be explained by the loadbalancers having only 100 Mbit/s
interfaces while the rest of the servers have Gbit?
- Are there network/kernel specific variables that can influence this sort
of thing?
- Where do I have to start looking if I want to debug this in a bit more
depth?

All I have been able to do so far is detect the problem, without knowing
how or where it can be solved.

-kees











