[lvs-users] problem after realserver failure/shutdown

To: users mailing list. <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: [lvs-users] problem after realserver failure/shutdown
From: John Lash <jlash@xxxxxxxxxxxxx>
Date: Wed, 15 Sep 2010 13:55:54 -0500
I'm running two systems with ipvs and keepalived. They are a localnode 
configuration, that is, the director is also a realserver.

I've found that when I have traffic up and running I can shut down the 
realserver on the director with only a brief burst of failures. My problem is 
when I shut down the realserver on the non-director system.

I have a high HTTP traffic load running from one single-threaded client. 
Life is good.

Then I shut down the server on the non-director and I get a burst of connection 
failures (Connection refused). That clears up quickly and connections start 
flowing again.

The problem is that I then see about 10 to 20 seconds of successful 
transactions, followed by a period of about a minute where every other 
connection times out (I'm using rr). After that, for the next fifteen minutes, 
there are several timeouts about every 20 seconds but otherwise normal 
traffic.
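
The every-other pattern is what I would expect if rr were still alternating 
between two destinations and one of them was not answering. Here is a toy 
model of that assumption. It is not ipvs code, the server names and the 
simulate_rr function are made up for illustration, and it says nothing about 
why the dead destination would still be in the rotation.

```python
from itertools import cycle


def simulate_rr(servers, alive, n_connections):
    """Toy round-robin scheduler: returns 'ok' or 'timeout' per connection.

    servers: ordered list of hypothetical destination names
    alive:   dict mapping each name to True (answers) or False (silent)
    """
    rotation = cycle(servers)  # plain round robin, no weights
    outcomes = []
    for _ in range(n_connections):
        target = next(rotation)
        outcomes.append("ok" if alive[target] else "timeout")
    return outcomes


# Hypothetical names: the realserver on the director and the remote one.
servers = ["director-local", "remote-rs"]
alive = {"director-local": True, "remote-rs": False}

print(simulate_rr(servers, alive, 6))
```

With one of two destinations silent, the model alternates between a 
successful connection and a timeout, which matches what I see during the 
one-minute period.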

The initial "Connection refused" failures happen until keepalived removes the 
downed realserver. The part I don't understand is why, after traffic has come 
back, I start seeing the timeouts. I've hooked up tcpdump on the director and 
it shows me that every other connection is not getting a response. I looked at 
tcpdump on the "downed" realserver and there are no odd packets arriving for 
the loadbalanced VIP and port and no evidence of "connection refused" back at 
my client.

The keepalived logs don't give any indication that its healthcheckers are 
bouncing around. ipvsadm -l --stats only shows the functioning realserver.

Does anybody have an idea what's going on here? This is completely 
reproducible and the timing of the connection errors is also consistent.


John Lash

