To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: New system, higher active connections?
From: Roberto Nibali <ratz@xxxxxxxxxxxx>
Date: Tue, 17 Oct 2006 16:39:46 +0200

>> Was 7.3 with 2.4.x kernel?

> Yes - 2.4.18-18.7.x

OK, so my assumption holds as a basis. Would you be able to check whether switching to the WLC scheduler changes the active connection numbers you observe?
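
For a quick test, something along these lines should switch the scheduler on the fly (just a sketch, using the VIP from your config below; ldirectord may re-apply scheduler=wrr on its next reload, so for anything longer change the scheduler= lines in ldirectord.cf instead):

# switch both virtual services to weighted least-connection
ipvsadm -E -t 128.109.135.22:80 -s wlc
# re-specify the persistence timeout so -E does not drop it
ipvsadm -E -t 128.109.135.22:443 -s wlc -p 30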

> Where before, we were seeing Active Connections in the 1-4 range even during normal usage, we're now seeing them in the 12-16 range on average. We've got the same weighting on the new server as we did on the old.

Different server system and, most importantly, different software configuration. IPVS between 2.4 and 2.6 (provided my assumption above holds) has changed significantly with regard to the ratio of active/inactive connections. We've seen that in our rrdtool/MRTG graphs as well.
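
If you want to watch that ratio on your end, a rough sketch (the Masq match relies on all your RS using masquerading, as in the config below):

# sum ActiveConn/InActConn over all real servers every 10 seconds
while sleep 10; do
    ipvsadm -L -n | awk '/Masq/ { a += $5; i += $6 } END { print "active:", a, "inactive:", i }'
done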

> It's not so much that we've seen the numbers change, we've seen an actual load impact on our application. Last night, for example, the load-balanced Apache instances behind LVS were slowing to a crawl and hitting MaxClients even under a fairly light load. So I think it's more than just us having to readjust our expectations as to the number of active connections.

What's your connections-per-second rate? It could be (a theory) that while netfilter in 2.4 maxed out on connection tracking and so naturally didn't let too many connections through LVS, we now have a much more powerful box with improved TCP stack handling with regard to netfilter in 2.6.

We'd need more information if you want to dig into this phenomenon.
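
For a start, the rate output shows it directly; the CPS column is connections per second, per virtual service and per real server:

# refresh the current rates every 5 seconds
watch -n 5 'ipvsadm -L -n --rate'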

> Sure. This is a very vanilla FC5 system running the 2.6.17-1.2174_FC5 kernel. I'm certainly open to suggestions for tuning that we should do in order to get decent performance - I'd hate to have to drop back to

Well, the performance bottleneck doesn't seem to be LVS, does it? Do you collect statistics somewhere? I'd be interested in the server status of the Apache instances (see the mod_status stanza below if it isn't set up yet). Do you get more throughput through the LVS?

> the RH7.3 box. We pretty much ported the LVS configuration over verbatim from the 7.3 to the FC5, so the only thing that's changed is the OS and hardware.
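
Regarding the server status I asked about: in case mod_status isn't enabled on the RS yet, a minimal httpd.conf stanza would look something like this (the allowed network is a guess based on your RS addresses):

# full per-worker status, reachable from the RS network only
ExtendedStatus On
<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from 192.168.0.0/24
</Location>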

There's of course always the option to cap the number of connections sent to each RS by using the threshold limitation feature of LVS. But I reckon you'd rather find out what exactly is limiting your RS now.
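
For the record, such a cap would look something like this per real server (the numbers are made up; pick an upper threshold that keeps the load below your Apache MaxClients, and be aware that ldirectord may overwrite the entry when it adjusts weights):

# stop new connections to this RS above 500, resume below 400
ipvsadm -e -t 128.109.135.22:80 -r 192.168.0.13:80 -m -w 2 -x 500 -y 400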

> One thing we did notice is that by removing the persistent= line in our https balancer (we have two stanzas - one for port 80 and one for port 443), the active connection numbers for the port 80 stanza dropped dramatically. Of course, that broke one part of our application, so we had to reenable it (though at 30 seconds as opposed to 300 seconds), but hopefully that's a clue.

At first thought, not so much.
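
It is easy to verify, though: persistence templates show up in the connection table with state NONE, so a quick tally by state tells you how much of the table they make up:

# count connection entries per state; NONE entries are persistence templates
ipvsadm -L -c -n | awk 'NR > 2 { print $3 }' | sort | uniq -c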

> ######
> # Global Directives
> checktimeout=10
> checkinterval=5

This alone will generate a hell of a lot of checks IMHO, which could have been too much for the old box to handle. It could mean that now all the checks actually go through and put increased load on the RS. The server status page should give more insight here, as should the Apache access log files.
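
To put a number on it: 18 RS times 2 virtual services is 36 negotiate checks every 5 seconds, i.e. roughly 7 full requests per second from the director alone, each HTTPS one with a complete SSL handshake on the RS. The access logs will confirm this (the log path is a guess, and the HTTPS checks will be in the SSL log):

# count health checks in the access log; compare against real client traffic
grep -c donotremove /var/log/httpd/access_log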

> autoreload=no
> logfile="/var/log/ldirectord.log"
> quiescent=yes
>
> virtual=128.109.135.22:80
>         fallback=127.0.0.1:80
>                 real=192.168.0.13:80 masq 2
>                 real=192.168.0.15:80 masq 2
>                 real=192.168.0.16:80 masq 2
>                 real=192.168.0.17:80 masq 2
>                 real=192.168.0.18:80 masq 2
>                 real=192.168.0.19:80 masq 2
>                 real=192.168.0.20:80 masq 2
>                 real=192.168.0.21:80 masq 3
>                 real=192.168.0.22:80 masq 3
>                 real=192.168.0.23:80 masq 3
>                 real=192.168.0.24:80 masq 3
>                 real=192.168.0.25:80 masq 3
>                 real=192.168.0.26:80 masq 3
>                 real=192.168.0.27:80 masq 4
>                 real=192.168.0.28:80 masq 4
>                 real=192.168.0.29:80 masq 4
>                 real=192.168.0.30:80 masq 4
>                 real=192.168.0.31:80 masq 4
>         service=http
>         request="lvs/lvs_donotremove"
>         receive="lvs up"
>         scheduler=wrr
>         protocol=tcp
>         checktype=negotiate
>
> virtual=128.109.135.22:443
>         fallback=127.0.0.1:443
>                 real=192.168.0.13:443 masq 2
>                 real=192.168.0.15:443 masq 2
>                 real=192.168.0.16:443 masq 2
>                 real=192.168.0.17:443 masq 2
>                 real=192.168.0.18:443 masq 2
>                 real=192.168.0.19:443 masq 2
>                 real=192.168.0.20:443 masq 2
>                 real=192.168.0.21:443 masq 3
>                 real=192.168.0.22:443 masq 3
>                 real=192.168.0.23:443 masq 3
>                 real=192.168.0.24:443 masq 3
>                 real=192.168.0.25:443 masq 3
>                 real=192.168.0.26:443 masq 3
>                 real=192.168.0.27:443 masq 4
>                 real=192.168.0.28:443 masq 4
>                 real=192.168.0.29:443 masq 4
>                 real=192.168.0.30:443 masq 4
>                 real=192.168.0.31:443 masq 4
>         service=https
>         request="lvs/slvs_donotremove"
>         receive="slvs up"
>         scheduler=wrr
>         persistent=30
>         protocol=tcp
>         checktype=negotiate

Could we get some numbers, please?

ipvsadm -L -n -c                # connection table: per-connection state and expiry
ipvsadm -L -n --stats           # accumulated connection/packet/byte counters
ipvsadm -L -n --rate            # current conns/s, packets/s, bytes/s
ipvsadm -L -n --timeout         # tcp/tcpfin/udp timeout settings
ipvsadm -L -n --persistent-conn # persistent connection counts per RS

Thanks and best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
