On Oct 15, 2006, at 9:14 AM, Roberto Nibali wrote:
>> We just moved our primary load balancer from an ancient Dell
>> PowerApp 120 running Red Hat 7.3 to a newer Dell PowerEdge 750
>> running Fedora Core 5. However, we're noticing something weird.
> Was that 7.3 with a 2.4.x kernel?
Yes - 2.4.18-18.7.x
>> Where before, we were seeing Active Connections in the 1-4 range
>> even during normal usage, we're now seeing them in the 12-16 range
>> on average. We've got the same weighting on the new server as we
>> did on the old.
> Different server hardware and, most importantly, a different software
> configuration. IPVS between 2.4 and 2.6 (provided my assumption
> above holds) has changed significantly with regard to the ratio of
> active/inactive connections. We've seen that in our rrdtool/MRTG
> graphs as well.
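The active/inactive counters Roberto is graphing come straight out of `ipvsadm -L -n`; as a minimal sketch, they can be summed with awk. The snapshot below is hypothetical (addresses and counts invented for illustration), standing in for the live output on the director:

```shell
#!/bin/sh
# Hypothetical snapshot of `ipvsadm -L -n` output; on the director the
# real data would come from running ipvsadm itself.
sample='Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  128.109.135.22:80 wrr
  -> 192.168.0.13:80              Masq    2      14         112
  -> 192.168.0.21:80              Masq    3      6          88'

# Sum ActiveConn (field 5) and InActConn (field 6) over all real-server
# lines; the numeric guard skips the column-header line.
active=$(echo "$sample" | awk '/->/ && $5 ~ /^[0-9]+$/ {a += $5} END {print a}')
inactive=$(echo "$sample" | awk '/->/ && $6 ~ /^[0-9]+$/ {i += $6} END {print i}')
echo "active=$active inactive=$inactive"
```

Feeding the real `ipvsadm -L -n` output in place of `sample` on both directors would make the 2.4-vs-2.6 ratio shift directly comparable.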
It's not so much that we've seen the numbers change, we've seen an
actual load impact on our application. Last night, for example, the
load-balanced Apache instances behind LVS were slowing to a crawl and
hitting MaxClients even under a fairly light load. So I think it's
more than just us having to readjust our expectations as to the
number of active connections.
>> Does anyone have any ideas why we might be seeing such a jump on
>> this newer system?
> Different kernel, where at least for the (w)LC scheduler the
> real-server load calculation is done differently. On top of that,
> the TCP stack has different tunables and your hardware also behaves
> differently. The LVS state transition timeouts differ between 2.4.x
> and 2.6.x kernels, IIRC, so if you're using LVS-DR, for example, the
> active-to-inactive connection transition takes more time, potentially
> leaving a higher number of sessions in the active state.
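The state-transition timeouts Roberto mentions can be compared directly on the two boxes via `ipvsadm -L --timeout`; a minimal sketch, parsing a hypothetical sample of that output (the 900/120/300 values are assumed for illustration, not verified defaults for either kernel):

```shell
#!/bin/sh
# On a live director the current values come from:  ipvsadm -L --timeout
# Here we parse a hypothetical sample of that one-line output so the
# two directors' settings can be compared in a script.
sample='Timeout (tcp tcpfin udp): 900 120 300'

# Split out the established-TCP, FIN-wait, and UDP timeouts (seconds)
set -- $(echo "$sample" | awk -F': ' '{print $2}')
tcp=$1; tcpfin=$2; udp=$3
echo "tcp=$tcp tcpfin=$tcpfin udp=$udp"
```

Since the FIN-wait window governs how quickly an active connection becomes inactive, shortening it (e.g. `ipvsadm --set 900 60 300`) is one knob to try if the active counts on 2.6 look inflated.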
> We'd need more information if you want to dig into this phenomenon.
Sure. This is a very vanilla FC5 system running the
2.6.17-1.2174_FC5 kernel. I'm certainly open to suggestions for
tuning that we should do in order to get decent performance - I'd
hate to have to drop back to the RH7.3 box. We pretty much ported
the LVS configuration over verbatim from the 7.3 to the FC5, so the
only thing that's changed is the OS and hardware.
One thing we did notice is that by removing the persistent= line in
our https balancer (we have two stanzas - one for port 80 and one for
port 443), the active connection numbers for the port 80 stanza
dropped dramatically. Of course, that broke one part of our
application, so we had to reenable it (though at 30 seconds as
opposed to 300 seconds), but hopefully that's a clue.
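For reference, ldirectord's `persistent=` directive corresponds to ipvsadm's `-p` option; a hand-built sketch of the https service (only two of the real servers shown, addresses taken from the attached config; ldirectord normally issues these calls itself, so this is illustration, not something to run alongside it):

```shell
# Sketch: manual equivalent of the https stanza in the attached config.
ipvsadm -A -t 128.109.135.22:443 -s wrr -p 30                  # 30 s client persistence
ipvsadm -a -t 128.109.135.22:443 -r 192.168.0.13:443 -m -w 2   # masq, weight 2
ipvsadm -a -t 128.109.135.22:443 -r 192.168.0.21:443 -m -w 3   # masq, weight 3
```

With `-p 30`, each client IP is pinned to one real server via a persistence template that expires 30 seconds after its last connection, which is why lowering it from 300 released connections so much faster.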
I'm attaching our LVS configuration for reference.
Thanks,
Wade
######
# Global Directives
checktimeout=10
checkinterval=5
autoreload=no
logfile="/var/log/ldirectord.log"
quiescent=yes
virtual=128.109.135.22:80
        fallback=127.0.0.1:80
        real=192.168.0.13:80 masq 2
        real=192.168.0.15:80 masq 2
        real=192.168.0.16:80 masq 2
        real=192.168.0.17:80 masq 2
        real=192.168.0.18:80 masq 2
        real=192.168.0.19:80 masq 2
        real=192.168.0.20:80 masq 2
        real=192.168.0.21:80 masq 3
        real=192.168.0.22:80 masq 3
        real=192.168.0.23:80 masq 3
        real=192.168.0.24:80 masq 3
        real=192.168.0.25:80 masq 3
        real=192.168.0.26:80 masq 3
        real=192.168.0.27:80 masq 4
        real=192.168.0.28:80 masq 4
        real=192.168.0.29:80 masq 4
        real=192.168.0.30:80 masq 4
        real=192.168.0.31:80 masq 4
        service=http
        request="lvs/lvs_donotremove"
        receive="lvs up"
        scheduler=wrr
        protocol=tcp
        checktype=negotiate
virtual=128.109.135.22:443
        fallback=127.0.0.1:443
        real=192.168.0.13:443 masq 2
        real=192.168.0.15:443 masq 2
        real=192.168.0.16:443 masq 2
        real=192.168.0.17:443 masq 2
        real=192.168.0.18:443 masq 2
        real=192.168.0.19:443 masq 2
        real=192.168.0.20:443 masq 2
        real=192.168.0.21:443 masq 3
        real=192.168.0.22:443 masq 3
        real=192.168.0.23:443 masq 3
        real=192.168.0.24:443 masq 3
        real=192.168.0.25:443 masq 3
        real=192.168.0.26:443 masq 3
        real=192.168.0.27:443 masq 4
        real=192.168.0.28:443 masq 4
        real=192.168.0.29:443 masq 4
        real=192.168.0.30:443 masq 4
        real=192.168.0.31:443 masq 4
        service=https
        request="lvs/slvs_donotremove"
        receive="slvs up"
        scheduler=wrr
        persistent=30
        protocol=tcp
        checktype=negotiate