Re: Realserver failover problem using ssl and tomcat

To: " users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: Realserver failover problem using ssl and tomcat
From: Horms <horms@xxxxxxxxxxxx>
Date: Wed, 28 Jun 2006 12:50:45 +0900 (JST)
In article <017d01c69a52$7472e380$eba21805@tim> you wrote:
> Setup: 
>  a.. Single linux director with VIP of and RIP of
>  b.. Two realservers with RIP of and RIP of, called
>      realserver1 and realserver2 respectively
>  c.. Total of 3 computers, using LVS-DR
>  d.. Realservers running tomcat with ssl, each realserver has a copy of the
>      ssl certificate, and the director does not have a certificate. Sessions
>      are managed with a tomcat cluster.
>  e.. The director has ldirectord running, and no heartbeat/director failover
>      for purposes of this problem
>  f.. ldirectord uses a negotiate page from tomcat to monitor realserver
>      health
>  g.. Debian Sarge current version, 3.1, with 2.4.27 kernel.
>      uname -r output: 2.4.27-2-386. All 3 machines using same o/s
>  h.. If I connect 3 client computers to my server farm the system works well,
>      and the load is balanced.
>  i.. The load balancer can change the client from one server to another
>      mid-session and this works fine too.
> The problem:  If I disconnect realserver1 by pulling out its ethernet
> cable, clients connected to realserver2 are ok, but clients connected
> to realserver1 are not. When I say ok, I mean that if a client is
> logged into my site, (connected to a tomcat server) the client can
> click another link (which requires the client to remain logged into my
> site and the session valid) and another page loads up fine. When I say
> not ok, I mean that if a client connected to realserver1 clicks the
> same link the new page does not load. BUT: if the client clicks that
> same link again the page does load up fine.
> Important: A second click loads the page. If I remove realserver1 and
> then wait one minute, the failover is perfect, and the page loads from
> the new server on the first click. It's only a problem in the first 45
> seconds or so. I am running ldirectord in debug mode, and I can see on
> the screen that it detects the missing server within 4 seconds. It
> then issues the ipvsadm commands to remove it from the pool. I can run
> ipvsadm -L -n 10 seconds after removing realserver1 and realserver1 is
> gone from the server pool. So ipvs has been told that the server is
> down, but it still routes packets to it for another 30 seconds or so.
> I have used tcpdump on realserver2, and the first click does not
> arrive at it. The second click does. I think ipvs is routing the
> packet incorrectly, and it is taking some 30 seconds to implement the
> ipvsadm command to take realserver1 out of the pool.
> Is this normal? Is there any kind of setting I can change to make ipvs
> take notice of the ipvsadm commands more quickly?

I suspect that ldirectord is taking a while to time out while trying
to connect to the disconnected real
server. You should be able to specify a timeout within,
and I imagine that should alleviate the problem that you are seeing.
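
A minimal sketch of what such a setting might look like, assuming the
timeout meant here is one of ldirectord's check timeouts set in its
configuration file. The addresses, request page and receive string
below are placeholders and are not taken from the original mail.
Which of the two timeout directives governs negotiate checks may
depend on the ldirectord version, so check the ldirectord man page
for the version you have installed.

```
# ldirectord.cf sketch: all addresses and check strings are placeholders
# Seconds to wait for a health check before the real server is
# declared dead (assumed to be the timeout referred to above)
checktimeout=3
# Timeout for negotiate checks; may not exist in older versions
negotiatetimeout=3
# Seconds between successive health checks
checkinterval=2

virtual=192.0.2.10:443
	real=192.0.2.11:443 gate
	real=192.0.2.12:443 gate
	service=https
	checktype=negotiate
	request="index.html"
	receive="Test Page"
	scheduler=rr
	protocol=tcp
```

Lower values make ldirectord give up on an unreachable real server
sooner, at the cost of more false positives when a server is merely
slow to answer.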

P.S.: If you could keep your mail < 80 characters wide, that would be awesome.
