
Re: 2.4 Hidden Patch

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: 2.4 Hidden Patch
Cc: RParish@xxxxxxxxxxxxx ('Ray Parish')
From: Greg Woods <woods@xxxxxxxx>
Date: Tue, 13 Aug 2002 11:51:31 -0600 (MDT)
> So what's stopping you from modifying /etc/rc.d/init.d/network to suit your
> needs? it's only a shell script... :)

I ended up having to do this, but I don't like it. The reason I don't like
it is that the next RPM update, which may be six months from now when I've
long since forgotten about that mod I made, will wipe it out and suddenly
things break.

In fact, I had to do this because of a race condition. I'm setting up my
LVS mail system using an alias on lo:0 as the service address on the real
servers, so that I can set the hidden flag only for the lo device and still
make connections directly to individual servers when needed. The race 
condition occurs because /etc/rc.d/init.d/network runs before /etc/rc.d/rc.local
during the boot process. This brings up all of the net interfaces before
the hidden flag gets set, and if the router ARP cache entry happens to time
out while a real server is booting, the real server might answer the ARP
request and this screws up LVS until the router ARP cache can be cleared.
In our organization, the routers and the mail servers are managed by different
people, so getting the router ARP cache cleared is a pain since I cannot do it myself.
The only way I could find to remove the race condition was to have the hidden
flag on lo set in /etc/rc.d/init.d/network, after the lo interface is brought
up but before any of the eth interfaces come up.
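
For anyone wanting to try the same thing, here is a rough sketch of the kind of
fragment involved. It is not the exact change made to the network script. It
assumes the hidden patch exposes its flags under /proc/sys/net/ipv4/conf/ and
that both the "all" and the per-device entry need to be set; check this against
the patch documentation for your kernel. The function name set_lo_hidden and its
optional directory argument are made up for illustration, the latter so the
function can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch only: turn on the "hidden" flag for lo.
# Assumes the 2.4 hidden patch is applied and provides
# <conf_dir>/all/hidden and <conf_dir>/lo/hidden.
# The optional first argument (hypothetical, for testing) overrides conf_dir.
set_lo_hidden() {
    conf_dir="${1:-/proc/sys/net/ipv4/conf}"
    for dev in all lo; do
        f="$conf_dir/$dev/hidden"
        # Refuse to continue if the flag file is missing or not writable,
        # e.g. on a kernel without the patch.
        if [ ! -w "$f" ]; then
            echo "hidden flag not available: $f" >&2
            return 1
        fi
        echo 1 > "$f"
    done
}
```

The point is only the ordering: call set_lo_hidden after lo has been brought up
and before the loop that brings up the eth interfaces, so that no eth interface
is ever up while the flag is still clear.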

Anybody else seen this race condition before?

The next problem we have, which I've mentioned here before, is dynamic
ARP caching, which triggers another race condition. It really does look
as though, when a router's ARP cache entry for the VIP times out or is cleared
and the first packet the router then sees is an outbound packet from a real
server passing through it, the router caches the source MAC address of that
packet without ever issuing an explicit ARP request. I have verified this by
monitoring ARP requests while this is going on. I have also verified that
only the director will answer an explicit ARP, as it should be, yet somehow,
once in a while, the MAC address of one of the real servers gets stuck in
the router's ARP cache associated with the VIP address. This blows the load
balancing out of the water because now EVERY incoming connection goes directly
from the router to the MAC address of this real server.
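
As an illustration of the kind of ARP monitoring described above, here is a
minimal sketch using tcpdump. It is not necessarily the command that was used.
The function names, the interface and the VIP are placeholders. The -e option
prints the link-level header, so the MAC address of whoever sends each ARP
packet is visible.

```shell
#!/bin/sh
# Sketch only: watch ARP traffic that mentions the VIP.
# arp_filter and watch_vip_arp are hypothetical helper names.

# Build a pcap filter matching ARP packets whose sender or target
# protocol address is the given IP.
arp_filter() {
    printf 'arp host %s' "$1"
}

# Usage: watch_vip_arp <interface> <vip>   (needs root to capture)
# -n: no name resolution, -e: show source/destination MAC addresses.
watch_vip_arp() {
    tcpdump -n -e -i "$1" "$(arp_filter "$2")"
}
```

Running something like "watch_vip_arp eth0 <VIP>" on a machine on the same
segment as the router shows every ARP request and reply for the VIP, along with
the MAC address that sent it.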

I know I must somehow be doing something wrong, because Cisco routers are
quite common and I would have expected someone else to have run into this
by now and nobody seems to have done so. But I do know this: 1) No real
server is answering the router's ARP request; and 2) Somehow the MAC address
of a real server can sneak into the router's ARP cache anyway.

Oh, yes, and since these machines are all running 2.4.18 kernels and the
real servers all use the 2.4 hidden patch, this is at least sort of 
relevant to the original thread :-)

--Greg

