In article <017d01c69a52$7472e380$eba21805@tim> you wrote:
> Setup:
> a. Single Linux director with VIP of 192.168.0.240 and RIP of 192.168.0.16
> b. Two realservers with RIP of 192.168.0.14 and RIP of 192.168.0.15, called
>    realserver1 and realserver2 respectively
> c. Total of 3 computers, using LVS-DR
> d. Realservers running Tomcat with SSL, each realserver has a copy of the
>    SSL certificate, and the director does not have a certificate. Sessions
>    are managed with a Tomcat cluster.
> e. The director has ldirectord running, and no heartbeat/director failover
>    for purposes of this problem
> f. ldirectord uses a negotiate page from Tomcat to monitor realserver
>    health
> g. Debian Sarge current version, 3.1, with 2.4.27 kernel.
>    uname -r output: 2.4.27-2-386. All 3 machines use the same OS
> h. If I connect 3 client computers to my server farm the system works well,
>    and the load is balanced.
> i. The load balancer can change the client from one server to another
>    mid-session and this works fine too.
>
> The problem: If I disconnect realserver1 by pulling out its ethernet
> cable, clients connected to realserver2 are ok, but clients connected
> to realserver1 are not. When I say ok, I mean that if a client is
> logged into my site, (connected to a tomcat server) the client can
> click another link (which requires the client to remain logged into my
> site and the session valid) and another page loads up fine. When I say
> not ok, I mean that if a client connected to realserver1 clicks the
> same link the new page does not load. BUT: if the client clicks that
> same link again the page does load up fine.
>
> Important: A second click loads the page. If I remove realserver1 and
> then wait one minute, the failover is perfect, and the page loads from
> the new server on the first click. It's only a problem in the first 45
> seconds or so. I am running ldirectord in debug mode, and I can see on
> the screen that it detects the missing server within 4 seconds. It
> then issues the ipvsadm commands to remove it from the pool. I can run
> ipvsadm -L -n 10 seconds after removing realserver1 and realserver1 is
> gone from the server pool. So ipvs has been told that the server is
> down, but it still routes packets to it for another 30 seconds or so.
>
> I have used tcpdump on realserver2, and the first click does not
> arrive at it. The second click does. I think ipvs is routing the
> packet incorrectly, and it is taking some 30 seconds to implement the
> ipvsadm command to take realserver1 out of the pool.
>
> Is this normal? Is there any kind of setting I can change to make ipvs
> take notice of the ipvsadm commands more quickly?
I would say that ldirectord is taking a while to time out when it tries
to connect to the disconnected real server. You should be able to
specify a timeout in ldirectord.cf, and I imagine that should alleviate
the problem that you are seeing.
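
As a rough sketch only, something along these lines in ldirectord.cf
shortens the health-check timeout. checktimeout and checkinterval are
global ldirectord directives, both in seconds. The port (443), the
scheduler, the request page and the receive string below are
assumptions made for illustration, as they are not given in your mail,
so substitute your own values:

    # Global settings: give up on a health check after 3 seconds
    # and run checks every 2 seconds (illustrative values).
    checktimeout=3
    checkinterval=2

    # Port 443 is assumed; use whatever port Tomcat serves SSL on.
    virtual=192.168.0.240:443
            real=192.168.0.14:443 gate
            real=192.168.0.15:443 gate
            service=https
            checktype=negotiate
            # Hypothetical negotiate page and expected response text.
            request="index.html"
            receive="OK"
            scheduler=rr
            protocol=tcp

Lower values make ldirectord notice a dead real server sooner, at the
cost of more frequent checks and a greater chance of dropping a server
that is merely slow to answer.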
P.S: If you could make your mail < 80 characters wide, that would be awesome.
--
Horms
H: http://www.vergenet.net/~horms/ W: http://www.valinux.co.jp/en/