Was 7.3 with 2.4.x kernel?
Yes - 2.4.18-18.7.x
Ok, so my assumption holds as a base. Would you be able to check if
switching to the WLC scheduler changes your observed active connection rate?
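For a quick test, the scheduler could be changed on the running director with ipvsadm. The sketch below only prints the commands so they can be reviewed before being run as root. It assumes the virtual IP and the persistent=30 setting from the ldirectord configuration quoted in this thread. It also assumes that editing a service without -p clears its persistence, which is why -p is repeated for port 443.

```shell
#!/bin/sh
# Print (do not run) the ipvsadm commands that would switch both
# virtual services from wrr to wlc.
VIP=128.109.135.22   # virtual IP taken from the ldirectord configuration

wlc_commands() {
    # Port 80 has no persistence configured.
    echo "ipvsadm -E -t $VIP:80 -s wlc"
    # Port 443 uses persistent=30; repeat -p so the edit keeps it.
    echo "ipvsadm -E -t $VIP:443 -s wlc -p 30"
}

wlc_commands
```

For a lasting change, scheduler=wlc in the ldirectord configuration would be the place, since ldirectord rebuilds the table from its own file when it is restarted.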
Where before, we were seeing Active Connections in the 1-4 range even
during normal usage, we're now seeing them in the 12-16 range on
average. We've got the same weighting on the new server as we did on
the old.
Different server system and, most importantly, different software
configuration. IPVS between 2.4 and 2.6 (provided my assumption above
holds) has changed significantly with regard to the ratio of
active/inactive connections. We've seen that in our rrdtool/MRTG
graphs as well.
It's not so much that we've seen the numbers change, we've seen an
actual load impact on our application. Last night, for example, the
load-balanced Apache instances behind LVS were slowing to a crawl and
hitting MaxClients even under a fairly light load. So I think it's more
than just us having to readjust our expectations as to the number of
active connections.
What's your connections-per-second rate? It could be (theory) that while
netfilter in 2.4 maxed out regarding connection tracking and naturally
didn't allow too many connections through LVS, we now have a much
more powerful box, including improved TCP stack handling with regard to
netfilter in 2.6.
We'd need more information if you want to dig into this phenomenon.
Sure. This is a very vanilla FC5 system running the 2.6.17-1.2174_FC5
kernel. I'm certainly open to suggestions for tuning that we should do
in order to get decent performance - I'd hate to have to drop back to
the RH7.3 box. We pretty much ported the LVS configuration over
verbatim from the 7.3 to the FC5, so the only thing that's changed is
the OS and hardware.
Well, the performance bottleneck doesn't seem to be LVS, does it? Do you
collect statistics somewhere? I'd be interested in the server status of
the apache instances. Do you get more throughput through the LVS?
There's of course always the option to cap the number of connections
sent to the RS by using the threshold limitation feature of LVS. But I
reckon you'd rather find out what exactly is limiting your RS now.
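To illustrate the threshold feature: ipvsadm accepts an upper and a lower connection threshold per real server (-x and -y). The sketch below prints such commands for three of the port 80 real servers. The values 50 and 40 are purely hypothetical and would have to be derived from what the Apache instances can actually sustain. The addresses and weights are taken from the configuration in this thread.

```shell
#!/bin/sh
# Print (do not run) ipvsadm commands that cap connections per real server.
VIP=128.109.135.22
UPPER=50   # hypothetical upper threshold: no new connections above this
LOWER=40   # hypothetical lower threshold: server is used again below this

threshold_command() {
    # $1 = real server IP, $2 = weight from the ldirectord configuration
    echo "ipvsadm -e -t $VIP:80 -r $1:80 -m -w $2 -x $UPPER -y $LOWER"
}

threshold_command 192.168.0.13 2
threshold_command 192.168.0.21 3
threshold_command 192.168.0.27 4
```

Since ldirectord edits the real server entries itself (for instance when it quiesces one), thresholds set by hand may not survive, so this is best treated as a short experiment.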
One thing we did notice is that by removing the persistent= line in our
https balancer (we have two stanzas - one for port 80 and one for port
443), the active connection numbers for the port 80 stanza dropped
dramatically. Of course, that broke one part of our application, so we
had to reenable it (though at 30 seconds as opposed to 300 seconds), but
hopefully that's a clue.
At first thought not so much.
######
# Global Directives
checktimeout=10
checkinterval=5
This alone will generate a hell of a lot of checks IMHO, which could have
been too much for the old box to handle. This could mean that now all
the checks actually get done and put an increased load on the RS. The
server status page should give more input on this, as well as the apache
access log file.
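To put a rough number on the checks: the sketch below assumes ldirectord checks every real= entry once per checkinterval. In practice each cycle also takes as long as the checks themselves, so this is an upper bound. The counts come from the configuration in this thread (18 real= lines per virtual service, two virtual services).

```shell
#!/bin/sh
# Upper-bound estimate of negotiate checks per minute.
checks_per_minute() {
    # $1 = number of real= entries checked, $2 = checkinterval in seconds
    echo $(( $1 * 60 / $2 ))
}

checks_per_minute 36 5   # all real= lines across both virtual services
checks_per_minute 2 5    # one real server: its port 80 and port 443 check
```

Counting the requests for lvs/lvs_donotremove in an Apache access log over a known period would show how close the real rate comes to this estimate.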
autoreload=no
logfile="/var/log/ldirectord.log"
quiescent=yes
virtual=128.109.135.22:80
fallback=127.0.0.1:80
real=192.168.0.13:80 masq 2
real=192.168.0.15:80 masq 2
real=192.168.0.16:80 masq 2
real=192.168.0.17:80 masq 2
real=192.168.0.18:80 masq 2
real=192.168.0.19:80 masq 2
real=192.168.0.20:80 masq 2
real=192.168.0.21:80 masq 3
real=192.168.0.22:80 masq 3
real=192.168.0.23:80 masq 3
real=192.168.0.24:80 masq 3
real=192.168.0.25:80 masq 3
real=192.168.0.26:80 masq 3
real=192.168.0.27:80 masq 4
real=192.168.0.28:80 masq 4
real=192.168.0.29:80 masq 4
real=192.168.0.30:80 masq 4
real=192.168.0.31:80 masq 4
service=http
request="lvs/lvs_donotremove"
receive="lvs up"
scheduler=wrr
protocol=tcp
checktype=negotiate
virtual=128.109.135.22:443
fallback=127.0.0.1:443
real=192.168.0.13:443 masq 2
real=192.168.0.15:443 masq 2
real=192.168.0.16:443 masq 2
real=192.168.0.17:443 masq 2
real=192.168.0.18:443 masq 2
real=192.168.0.19:443 masq 2
real=192.168.0.20:443 masq 2
real=192.168.0.21:443 masq 3
real=192.168.0.22:443 masq 3
real=192.168.0.23:443 masq 3
real=192.168.0.24:443 masq 3
real=192.168.0.25:443 masq 3
real=192.168.0.26:443 masq 3
real=192.168.0.27:443 masq 4
real=192.168.0.28:443 masq 4
real=192.168.0.29:443 masq 4
real=192.168.0.30:443 masq 4
real=192.168.0.31:443 masq 4
service=https
request="lvs/slvs_donotremove"
receive="slvs up"
scheduler=wrr
persistent=30
protocol=tcp
checktype=negotiate
Could we get some numbers, please?
ipvsadm -L -n -c
ipvsadm -L -n --stats
ipvsadm -L -n --rate
ipvsadm -L -n --timeout
ipvsadm -L -n --persistent-conn
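The commands above could be collected in one go with something like the following sketch. The function name and the IPVSADM override are only illustrative. Setting IPVSADM=echo previews the calls without touching the director.

```shell
#!/bin/sh
# IPVSADM may be overridden (for example IPVSADM=echo) to preview the calls.
IPVSADM=${IPVSADM:-ipvsadm}

collect_lvs_numbers() {
    for opt in -c --stats --rate --timeout --persistent-conn; do
        # Header line so each section can be told apart in the output
        echo "### ipvsadm -L -n $opt"
        "$IPVSADM" -L -n $opt
    done
}

# Example (as root on the director):
#   collect_lvs_numbers > lvs-numbers.txt 2>&1
```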
Thanks and best regards,
Roberto Nibali, ratz
--
echo
'[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc