Hello List,
For some years now we have been using a couple of load balancers in front of our server park (usually 4-7 servers, at the moment 5 machines, a mix of dual dual-core and dual quad-core boxes). This was never a problem for the load balancer: with 30-40 Mbit/s of traffic it ran at around 10% CPU load.
However, today we changed the layout of our site, and that required us to install apache2 and php5 on the webservers next to apache1.3 and php4, which we still need for some older software that isn't php5 compatible yet.
We decided to leave the apache1.3 install on port 80 and give the apache2 install port 81; before today the complete site ran on apache1.3.
My ipvs config looks like this right now:
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  213.239.154.36:80 wrr
  -> 10.0.1.34:80                 Masq    32     0          422
  -> 10.0.1.33:80                 Masq    55     2          1156
  -> 10.0.1.36:80                 Masq    53     2          1222
  -> 10.0.1.37:80                 Masq    57     0          1217
  -> 10.0.1.38:80                 Masq    52     0          1167
TCP  213.239.154.35:80 wrr
  -> 10.0.1.34:81                 Masq    32     4          1664
  -> 10.0.1.33:81                 Masq    55     8          4832
  -> 10.0.1.36:81                 Masq    53     9          5118
  -> 10.0.1.37:81                 Masq    57     3          5144
  -> 10.0.1.38:81                 Masq    52     3          4928
(weights are dynamically changed every 10 seconds)
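For completeness: the weight changes are pushed from a small monitoring script using ipvsadm commands roughly like the one below (the address and weight here are just an example, not the actual script):

  # adjust the weight of one real server in the .35 virtual service
  ipvsadm -e -t 213.239.154.35:80 -r 10.0.1.33:81 -m -w 55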
Today we changed the real-server ports behind 213.239.154.35:80 to port 81 so our visitors were sent to the new site. However, this also increased the load on the load balancer dramatically. At one point I had to stop every additional service on the load balancer so it was only doing iptables and ipvs, and it was still using up to 100% system CPU time.
I noticed the InActConn count on the .35 service was quite high: when the site was doing 60 Mbit/s I saw over 120,000 inactive connections. Can this be a problem? I tried 'ipvsadm --set 30 30 30' to lower the timeouts, but 'ipvsadm --list -cn' still shows a lot of connections with a timeout > 30 seconds, basically all in the TIME_WAIT state.
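In case it helps to reproduce what I'm seeing, this is roughly how I checked the timeouts and the per-state connection counts (as far as I understand the man page, the three values to --set are the tcp, tcpfin and udp timeouts):

  # show the tcp / tcpfin / udp timeouts currently in effect
  ipvsadm -L --timeout

  # count connection entries per state (state is the 3rd column of -Lcn output)
  ipvsadm -L -c -n | awk 'NR>2 {print $3}' | sort | uniq -c | sort -rn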
At the moment I'm using the 2.6.20.4 kernel - are there any known bugs with
it?
As I write this mail I'm pushing 12 Mbit/s of traffic (in the middle of the night) and vmstat shows the CPU spending around 20% of its time in system mode (so 80% idle). From statistics collected before today I know it was usually around 5% at this amount of traffic, so something is going terribly wrong here.
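For reference, these are the numbers I'm looking at (the 'sy' column of vmstat is the system time I quoted; the connection count is just a rough line count, including the two header lines):

  # 5-second samples; the 'sy' column is the system-time percentage
  vmstat 5

  # rough size of the ipvs connection table
  ipvsadm -L -c -n | wc -l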
I fear for tomorrow, when traffic goes back up to 50-60 Mbit/s; my realservers can handle it, but can my ipvs config?
I hope someone has some tips, or can give hints on where to look in the system and which variables I can tweak.
-kees
P.S. The load balancers are Intel Celeron 2 GHz boxes with 512 MB RAM, which ran the site perfectly fine for over 4 years.