
ARP-Problem

To: LVS-users Mailing List <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: ARP-Problem
From: Stephan Wonczak <a0033@xxxxxxxxxxxxxxxx>
Date: Tue, 15 Jul 2003 10:05:49 +0200 (MET DST)
  Hi List!

  I have sent this before but it produced no response whatsoever.

  We have set up a 4-Node LVS cluster with 2 real Nodes and redundant
Director using RH AS 2.1 and Piranha (yeah, yeah, I know. Please read on
anyway, since we believe this is not really a problem with the RedHat
implementation). Everything is working fine, until we migrate the director
from the primary to the backup server.
  Each director essentially has only one network interface, with all
LVS-related addresses being aliased interfaces. Let's call the primary
machine A, the backup machine B. The network configuration looks something
like this:

eth0    - public address of machine
eth0:0  - public LVS address
eth0:1  - NAT router private address

  In a failover event (or during maintenance, when we migrate the service
from the command line), the :0 and :1 interfaces are shut down on machine
A and brought up on machine B, and vice versa. So far, this all works as
expected.
  Now, this is where the fun starts. The real nodes have absolutely no
problem with the failover event; everything just keeps working fine.
(OK, active connections are severed, but this is expected and we can
live with that.) The client machines are taken care of by the
gratuitous ARPs sent by the pulse daemon, so this keeps working, too.
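
  For readers unfamiliar with gratuitous ARP: it is an unsolicited ARP
packet in which the sender announces its own IP-to-MAC mapping to the whole
segment, so neighbours refresh their ARP caches. The sketch below only
builds such a frame in memory to show its layout. It is not how pulse is
implemented, and the function name, MAC and IP are made-up placeholders.

```python
import socket
import struct


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Return a 42-byte Ethernet frame carrying a gratuitous ARP request.

    mac must be written with two hex digits per octet, e.g. 02:00:00:00:00:01.
    """
    hw = bytes.fromhex(mac.replace(":", ""))
    addr = socket.inet_aton(ip)
    broadcast = b"\xff" * 6

    # Ethernet header: destination, source, EtherType 0x0806 (ARP)
    eth = broadcast + hw + struct.pack("!H", 0x0806)

    # ARP header: hardware type 1 (Ethernet), protocol 0x0800 (IPv4),
    # hardware length 6, protocol length 4, opcode 1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)

    # What makes it "gratuitous": sender IP and target IP are the same
    # address, and the target hardware address is left as zeros.
    arp += hw + addr + b"\x00" * 6 + addr

    return eth + arp


# Placeholder values: a locally administered MAC and a documentation IP.
frame = build_gratuitous_arp("02:00:00:00:00:01", "192.0.2.10")
```

  Actually putting this on the wire would need a raw packet socket and
root privileges, which is left out here on purpose.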

  *BUT*

  If some client machine has to do a new ARP request, sometimes the now
secondary machine answers it! Meaning: machine B is director, having
taken over service from machine A, but both are still running. This
happens e.g. during maintenance. Machine B has both the :0 and :1
addresses, machine A no longer does (verifiable by ifconfig).
  Using tcpdump we could see machine A still answering ARP requests for
the public LVS address, even though it is now assigned to machine B,
which *should* be answering. Huh? The funny thing is, this migration of
IP addresses on virtual interfaces works just fine without this problem
for numerous other services; only ipvs seems to produce the problem with
the ARPs.
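
  To make the tcpdump observation easier to reproduce, a small script along
the following lines could count which MAC addresses reply for a given IP in
saved tcpdump text output. This is a rough sketch: it assumes the reply
lines contain "reply <ip> is-at <mac>", which may differ between tcpdump
versions, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches e.g. "arp reply 192.0.2.10 is-at 2:0:0:0:0:a" as well as
# "Reply 192.0.2.10 is-at 02:00:00:00:00:0a, length 28".
# The exact wording is an assumption about the tcpdump version in use.
REPLY_RE = re.compile(
    r"reply\s+(\d{1,3}(?:\.\d{1,3}){3})\s+is-at\s+"
    r"([0-9a-f]{1,2}(?::[0-9a-f]{1,2}){5})",
    re.IGNORECASE,
)


def normalize_mac(mac: str) -> str:
    """Lower-case a MAC and pad every octet to two hex digits."""
    return ":".join(part.zfill(2) for part in mac.lower().split(":"))


def arp_responders(lines, vip: str) -> dict:
    """Map each MAC that answered for vip to the number of replies seen."""
    counts = defaultdict(int)
    for line in lines:
        match = REPLY_RE.search(line)
        if match and match.group(1) == vip:
            counts[normalize_mac(match.group(2))] += 1
    return dict(counts)
```

  Feed it the lines of a capture together with the public LVS address. If
the result holds more than one MAC, two machines are answering for the
same address, which is the symptom described above.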

  Any ideas, anyone? If there is any info missing or my explanations are
too garbled to understand, please don't hesitate to ask! We are quite
desperate to get this solved; this bug (?) is a showstopper for putting
our application into production.

        Dipl. Chem. Dr. Stephan Wonczak

        Institut fuer Angewandte Informatik (ZAIK)
        Regionales Rechenzentrum der Universitaet zu Koeln (RRZK)
        Universitaet zu Koeln, Robert-Koch-Strasse 10, 50931 Koeln
        Tel: ++49/(0)221/478-5577, Fax: ++49/(0)221/478-5590
