Re: New system, higher active connections?

To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: New system, higher active connections?
From: "H. Wade Minter" <minter@xxxxxxxxxxxxx>
Date: Mon, 16 Oct 2006 12:02:49 -0400

On Oct 15, 2006, at 9:14 AM, Roberto Nibali wrote:

We just moved our primary load balancer from an ancient Dell PowerApp 120 running Red Hat 7.3 to a newer Dell PowerEdge 750 running Fedora Core 5. However, we're noticing something weird.

Was 7.3 with 2.4.x kernel?

Yes - 2.4.18-18.7.x


Where before, we were seeing Active Connections in the 1-4 range even during normal usage, we're now seeing them in the 12-16 range on average. We've got the same weighting on the new server as we did on the old.

Different server hardware and, most importantly, different software configuration. IPVS between 2.4 and 2.6 (provided my assumption above holds) has changed significantly with regard to the ratio of active/inactive connections. We've seen that in our rrdtool/MRTG graphs as well.
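For context, the ratio in question is the one ipvsadm reports in its ActiveConn and InActConn columns. A quick way to watch it on the director (a sketch; run as root where ipvsadm is installed):

```shell
# ActiveConn counts connections the kernel considers established;
# InActConn counts everything else (e.g. FIN_WAIT). Exactly which
# states land in which column differs between 2.4 and 2.6 kernels.
watch -n 5 'ipvsadm -L -n'
```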

It's not just that the numbers have changed; we've seen an actual load impact on our application. Last night, for example, the load-balanced Apache instances behind LVS were slowing to a crawl and hitting MaxClients even under a fairly light load. So I think it's more than just us having to readjust our expectations as to the number of active connections.


Does anyone have any ideas why we might be seeing such a jump on this newer system?

Different kernel, where at least for the (w)lc scheduler the RS calculation is done differently. On top of that, the TCP stack has changed tunables, and your hardware also behaves differently. The LVS state transition timeouts differ between 2.4.x and 2.6.x kernels, IIRC, so for example if you're using LVS-DR, the transition from active connection to inactive connection takes more time, yielding a potentially higher number of sessions in the active state.
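If the timeouts are the culprit, they can be inspected and adjusted from the director. A sketch (the numeric values here are illustrative, not a recommendation):

```shell
# Show the current IPVS connection timeouts in seconds
# (established TCP, TCP FIN_WAIT, UDP)
ipvsadm -L --timeout

# Shorten the established-TCP timeout so idle entries leave the
# "active" state sooner; arguments are: tcp tcpfin udp
ipvsadm --set 300 120 300

# Watch per-connection entries age through states in the table
ipvsadm -L -c -n
```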

We'd need more information if you want to dig into this phenomenon.

Sure. This is a very vanilla FC5 system running the 2.6.17-1.2174_FC5 kernel. I'm certainly open to suggestions for tuning that we should do in order to get decent performance - I'd hate to have to drop back to the RH7.3 box. We pretty much ported the LVS configuration over verbatim from the 7.3 to the FC5, so the only thing that's changed is the OS and hardware.

One thing we did notice is that removing the persistent= line from our HTTPS balancer (we have two stanzas - one for port 80 and one for port 443) made the active connection numbers for the port 80 stanza drop dramatically. Of course, that broke one part of our application, so we had to re-enable it (though at 30 seconds as opposed to 300 seconds), but hopefully that's a clue.
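That would fit: ldirectord's persistent= directive maps to ipvsadm's -p flag, and each client then gets a persistence template entry that pins it to a real server for the persistence window, which can inflate connection counts. A sketch of how to see this, using the VIP from the attached config:

```shell
# Persistence templates appear in the connection table as extra
# entries alongside the real connections; each lives for the
# persistence window (30s here) after the last client activity.
ipvsadm -L -c -n

# What ldirectord's "persistent=30" builds is roughly equivalent to
# creating the virtual service by hand with -p:
ipvsadm -A -t 128.109.135.22:443 -s wrr -p 30
```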

I'm attaching our LVS configuration for reference.

Thanks,
Wade


######
# Global Directives
checktimeout=10
checkinterval=5
autoreload=no
logfile="/var/log/ldirectord.log"
quiescent=yes

virtual=128.109.135.22:80
        fallback=127.0.0.1:80
                real=192.168.0.13:80 masq 2
                real=192.168.0.15:80 masq 2
                real=192.168.0.16:80 masq 2
                real=192.168.0.17:80 masq 2
                real=192.168.0.18:80 masq 2
                real=192.168.0.19:80 masq 2
                real=192.168.0.20:80 masq 2
                real=192.168.0.21:80 masq 3
                real=192.168.0.22:80 masq 3
                real=192.168.0.23:80 masq 3
                real=192.168.0.24:80 masq 3
                real=192.168.0.25:80 masq 3
                real=192.168.0.26:80 masq 3
                real=192.168.0.27:80 masq 4
                real=192.168.0.28:80 masq 4
                real=192.168.0.29:80 masq 4
                real=192.168.0.30:80 masq 4
                real=192.168.0.31:80 masq 4
        service=http
        request="lvs/lvs_donotremove"
        receive="lvs up"
        scheduler=wrr
        protocol=tcp
        checktype=negotiate

virtual=128.109.135.22:443
        fallback=127.0.0.1:443
                real=192.168.0.13:443 masq 2
                real=192.168.0.15:443 masq 2
                real=192.168.0.16:443 masq 2
                real=192.168.0.17:443 masq 2
                real=192.168.0.18:443 masq 2
                real=192.168.0.19:443 masq 2
                real=192.168.0.20:443 masq 2
                real=192.168.0.21:443 masq 3
                real=192.168.0.22:443 masq 3
                real=192.168.0.23:443 masq 3
                real=192.168.0.24:443 masq 3
                real=192.168.0.25:443 masq 3
                real=192.168.0.26:443 masq 3
                real=192.168.0.27:443 masq 4
                real=192.168.0.28:443 masq 4
                real=192.168.0.29:443 masq 4
                real=192.168.0.30:443 masq 4
                real=192.168.0.31:443 masq 4
        service=https
        request="lvs/slvs_donotremove"
        receive="slvs up"
        scheduler=wrr
        persistent=30
        protocol=tcp
        checktype=negotiate

