[lvs-users] Large clusters and slow realserver checking

To: <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: [lvs-users] Large clusters and slow realserver checking
From: "Anthony Sturchio" <asturchio@xxxxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 24 Sep 2009 09:20:43 -0400
My company runs a large number of secure web applications on an LVS
cluster.  There are about 200 real IPs (for 200 different domains / SSL
certificates) and 5 realservers in the mix.  Since we serve both http and
https (ports 80 and 443), this works out to 2,000 realserver entries that
ldirectord has to go through.  Obviously this takes some time.  We have seen
it take up to 15 minutes to expire a downed node, or to reinstate a
realserver once we bring it back up, depending on how far along the list
ldirectord is.  Using the forking option is not possible, since spawning
that many processes simultaneously brings the load balancer to its knees.


The top of our ldirectord configuration looks like this:

checktimeout = 2
negotiatetimeout = 2
checkinterval = 10
checkcount = 2
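
For what it's worth, the numbers above roughly add up if the checks run one
after another.  The following is only a back-of-the-envelope sketch, not a
description of how ldirectord actually schedules its checks: it assumes a
strictly sequential pass, that every check against a downed realserver
blocks for the full checktimeout, and that a healthy check takes about 0.05
seconds (an assumed figure, not something we measured).  The function names
are made up for illustration, and checkinterval and checkcount are ignored.

```python
# Back-of-the-envelope estimate of one sequential pass over all entries.
# Assumptions (not measurements): checks run strictly one at a time, a check
# against a downed realserver blocks for the full checktimeout, and a healthy
# check takes HEALTHY_CHECK_S seconds.

HEALTHY_CHECK_S = 0.05  # assumed duration of a successful check


def total_entries(domains: int, realservers: int, ports: int) -> int:
    """Number of realserver entries the checker has to walk."""
    return domains * realservers * ports


def pass_seconds(domains: int, realservers: int, ports: int,
                 down_servers: int, checktimeout: float,
                 healthy_check_s: float = HEALTHY_CHECK_S) -> float:
    """Estimated time for one full sequential pass, in seconds."""
    total = total_entries(domains, realservers, ports)
    # Each realserver carries an equal share of the entries.
    down = total * down_servers // realservers
    up = total - down
    return down * checktimeout + up * healthy_check_s


if __name__ == "__main__":
    n = total_entries(200, 5, 2)
    secs = pass_seconds(200, 5, 2, down_servers=1, checktimeout=2)
    print(f"{n} entries, about {secs / 60:.1f} minutes per pass")
```

With one of the five realservers down, 400 of the 2,000 entries time out at
2 seconds each, which alone is over 13 minutes.  Under these assumptions the
timeout on the dead node dominates the pass, which is in line with the
"up to 15 minutes" we are seeing.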


Does anyone have any suggestions on how we could improve the very poor
response time for expiring downed servers?  Throughput is very good;
however, potentially having 20% of our clients wait up to 15 minutes in
the event of a realserver failure is not something management wants to


Thank you very much, 

-Anthony Sturchio

