Re: [lvs-users] IPVS/NAT - no connection after real server down

To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [lvs-users] IPVS/NAT - no connection after real server down
From: Graeme Fowler <graeme@xxxxxxxxxxx>
Date: Fri, 05 Sep 2008 14:15:47 +0100
On Fri, 2008-09-05 at 14:47 +0200, Pitscheider, Oswald wrote:
> I’ve tried the LVS with these changes with a little success, but there is 
> still the problem that if I remove a real server, requests to the server are 
> answered very slowly.
> From the moment the real server is removed from the pool, some 
> requests have to wait seconds for an answer.
> After a minute, the LVS works as it should.

This is fairly predictable, from your configuration and from the way TCP
works.

Each realserver is checked every 20 seconds (delay_loop 20). If you stop
Apache just as the check is done successfully, requests will stall for
20 seconds until the next check (because the server isn't responding).

If a request arrives fractionally after the successful check and the
server isn't responding, the client will retry at the following intervals:

 -0.002  RS1 Check succeeds
 -0.001  RS1 Apache stopped
  0.000  Request arrives at RS1
  3.000  retry 1 to RS1
  9.000  retry 2 to RS1
 19.998  RS1 Check fails
         keepalived removes RS1 from pool
 21.000  retry 3 sent to RS2

Note however that it may take a short period for keepalived to do the
server removal, which may overlap with retry 3. If it does, retry 3 still
goes to RS1, and the next delay to retry 4 is another 24 seconds (the
delays double: 3, 6, 12, 24 and so on), which takes you towards 45
seconds altogether.
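
As a rough sketch of that arithmetic (assuming the client's first retry
comes after 3 seconds and each subsequent delay doubles, as in the
timeline above; the function name and defaults are illustrative only):

```python
def retry_times(initial_delay=3.0, retries=4):
    """Return the cumulative times (seconds) at which each retry is sent.

    Assumes the delay before each retry doubles: 3, 6, 12, 24, ...
    """
    times = []
    elapsed = 0.0
    delay = initial_delay
    for _ in range(retries):
        elapsed += delay      # wait out the current delay
        times.append(elapsed)
        delay *= 2            # next delay is twice as long
    return times


print(retry_times())  # [3.0, 9.0, 21.0, 45.0]
```

Retry 3 lands at 21 seconds and retry 4 at 45 seconds, which is where
the "towards 45 seconds" figure comes from.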

> I’ve tested the LVS using jmeter with 25 threads.

And depending on the way jmeter is configured, alongside your webserver
config, this will mean a minimum of 20 seconds (and likely much longer)
delay between you dropping the webserver and the clients recovering.

It is perfectly permissible to bring down the delay_loop as much as you
or your app server can tolerate. For fast failover you need a short
delay. I would argue that for most web clients, 20 seconds is perfectly
acceptable but that can depend entirely on what you're trying to
achieve.

Try "delay_loop 1" and see what you get. What you will get, possibly,
are a lot of log entries - but you should get very fast recovery.
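
To get a feel for what a shorter delay_loop buys you, here is a small
sketch. It assumes the worst case where the server dies just after a
successful check, that removal happens exactly delay_loop seconds later
(real removal takes a little longer, and check timeouts are ignored),
and that the client retries after 3, 6, 12, ... seconds. The function
name and parameters are made up for illustration:

```python
def recovery_time(delay_loop, initial_delay=3.0, max_retries=6):
    """Return the time (seconds) of the first client retry sent at or
    after the dead server is removed, or None if retries run out.

    Assumes removal happens exactly delay_loop seconds after the
    original request, and that retry delays double each time.
    """
    elapsed = 0.0
    delay = initial_delay
    for _ in range(max_retries):
        elapsed += delay
        if elapsed >= delay_loop:   # server already removed from pool
            return elapsed
        delay *= 2
    return None


for loop in (20, 5, 1):
    print(loop, recovery_time(loop))
# 20 21.0
# 5 9.0
# 1 3.0
```

Under these assumptions an already-stalled request cannot recover faster
than the client's first retry (3 seconds here), however short the
delay_loop is, but new requests stop going to the dead server almost
immediately.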

Graeme

