Re: Questions regarding LVS-DR

To: Dan Trainor <dan@xxxxxxxxxxxxxx>
Subject: Re: Questions regarding LVS-DR
Cc: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From: Horms <horms@xxxxxxxxxxxx>
Date: Fri, 21 Oct 2005 12:52:07 +0900
On Thu, Oct 20, 2005 at 09:00:41PM -0600, Dan Trainor wrote:
> Thanks for the info, Horms.
> 
> I think that my confusion lies on what the ideal way to manage 
> interfaces that do not broadcast ARP, is.  It made sense to me to have 
> two interfaces; this way I could limit which one broadcasted ARP, and 
> which did not.  Sure, "The ARP Problem" sounds like a big problem.  But 
> what is the preferred way of dealing with it, if not for two physical 
> interfaces, on two different physical network segments?
> 
> Before anyone tells me to go RTFM again, I'll remind you, that there 
> were several methods listed, all of which make sense.  My question is, 
> which method is preferred, and possibly, why?

Handling ARP correctly for LVS-DR is tricky, so I certainly won't be
giving RTFM responses, not until we finally have a really good solution
anyway.

My suggestion is to use arp_ignore=2 and arp_announce=1. Please
note that you will almost certainly need to set these on
_all_ interfaces that can respond to ARP. That is really important,
and I will try to explain why:

First, let's look at arp_ignore=2. arp_ignore has a few different
modes of operation; "2" is usually the applicable one.

The effect of setting it to 1 is that when an ARP request is received on
an interface, a response will only be sent if the address requested is
on that interface. The effect of setting it to 2 is the same as 1 with
the additional behaviour that it will only reply if the request came
from a host on a network configured on that interface (interfaces can
have multiple addresses, and thus be connected to multiple networks).

In LVS terms, if eth0 receives an ARP request for the VIP,
which is on lo, it would ordinarily respond, but with arp_ignore
set to 1 or 2, it will not.
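
As a rough illustration of the modes described above, here is a toy
model in shell. It is not what the kernel actually runs; the function
names and the addresses (192.168.6.x) are made up for the example, and
mode 0 is modelled as "reply for any address configured on the host".

```shell
#!/bin/sh
# Toy model of the arp_ignore decision; hypothetical helper names.

# ip_to_int A.B.C.D -> prints the address as an integer
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_subnet IP ADDR/PREFIX -> succeeds if IP is in the network of ADDR/PREFIX
in_subnet() {
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# should_reply MODE TARGET SENDER IFACE_ADDRS OTHER_ADDRS
#   IFACE_ADDRS: ADDR/PREFIX entries on the interface that got the request
#   OTHER_ADDRS: ADDR/PREFIX entries on other interfaces (e.g. the VIP on lo)
should_reply() {
    mode=$1; target=$2; sender=$3; iface_addrs=$4; other_addrs=$5
    for a in $iface_addrs; do
        if [ "${a%/*}" = "$target" ]; then
            # Target is on the receiving interface: modes 0 and 1 reply,
            # mode 2 also requires the sender to be on the same network.
            if [ "$mode" -eq 2 ]; then
                in_subnet "$sender" "$a"
                return $?
            fi
            return 0
        fi
    done
    # Target is not on the receiving interface: only mode 0 replies,
    # and only if the address is configured elsewhere on the host.
    if [ "$mode" -eq 0 ]; then
        for a in $other_addrs; do
            [ "${a%/*}" = "$target" ] && return 0
        done
    fi
    return 1
}

VIP=192.168.6.240        # hypothetical VIP, configured on lo
ETH0="192.168.6.5/24"    # hypothetical address on eth0

for m in 0 1 2; do
    if should_reply "$m" "$VIP" 192.168.6.1 "$ETH0" "$VIP/32"; then
        echo "arp_ignore=$m: eth0 replies for the VIP"
    else
        echo "arp_ignore=$m: eth0 stays silent for the VIP"
    fi
done
```

Only mode 0 answers for the VIP here, which is the behaviour that
causes trouble on a real server.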


Next, arp_announce=1. This is a little trickier to explain.  If you
have a connection, and an ARP request needs to be sent during the course
of that connection, then by default the source address of that ARP
request will be the local address of the connection.  If you enable
arp_announce=1, then this behaviour is changed so that only an address
on the interface from which the ARP request will be sent is used.

The reason the default behaviour is a problem is that when a host
receives an ARP request, it will update its arp-cache with the source
IP and MAC address of the ARP request.  So an ARP request can advertise
an address in the same way as an ARP reply.

In LVS terms, imagine that your real server has a connection for the
VIP, that is, it is receiving packets for the VIP that have been sent
via the linux-director. It's actually going to have a lot of these
connections, and eventually, in the course of one of them, its
arp-cache entry for the next hop in the return path (the router, or the
client if it is on the LAN) will expire, and it will send an ARP
request in order to be able to send the next return packet.
Unless you use arp_announce, this ARP request will have the VIP
as its source address, and the router (or client) will then send all
subsequent packets for the VIP directly to the real-server. Subsequent
connections will not be load-balanced until the arp-cache entry
on the router (or client) expires.
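
One way to apply the two settings discussed above is through sysctl.
The sketch below only prints the lines you would put in
/etc/sysctl.conf (loaded with "sysctl -p"); the helper name and the
interface list are assumptions for the example. The kernel uses the
larger of the conf/all and per-interface values for both settings, so
setting "all" covers every interface.

```shell
#!/bin/sh
# Hypothetical helper: print sysctl.conf lines that set
# arp_ignore=2 and arp_announce=1 for each name given.
arp_sysctl_lines() {
    for iface in "$@"; do
        printf 'net.ipv4.conf.%s.arp_ignore = 2\n' "$iface"
        printf 'net.ipv4.conf.%s.arp_announce = 1\n' "$iface"
    done
}

# "all" plus eth0 as an example; adjust to the interfaces on your host.
arp_sysctl_lines all eth0
```

The same values can be set on a running system with, for example,
"sysctl -w net.ipv4.conf.all.arp_ignore=2".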



You can also resolve this problem using arptables. But to be safe you
will need to both drop incoming ARP requests for the VIP (in the manner
of arp_ignore) and mangle the source address of outgoing ARP packets
(in the manner of arp_announce).

e.g.

/sbin/arptables -A IN -j DROP -d $VIP
/sbin/arptables -A OUT -j mangle -o eth0 -s $VIP --mangle-ip-s $LOCAL_IP


I have tried to explain both approaches at
http://www.ultramonkey.org/3/topologies/sl-ha-lb-eg.html#restricting_arp

I should probably make some diagrams to explain it a bit more.
I do confess that it took me several readings of the kernel-docs
to understand arp_announce.


-- 
Horms
