ARP-Problem

To:	lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject:	ARP-Problem
From:	Stephan Wonczak <a0033@xxxxxxxxxxxxxxxx>
Date:	Sat, 3 May 2003 22:38:08 +0200 (MET DST)

  Hi List!
  We have set up a 4-node LVS cluster with 2 real nodes and a redundant
director using RH AS 2.1 and Piranha (yeah, yeah, I know). Everything
works fine until we migrate the director from the primary to the backup
server.
  Each director essentially has only one network interface, with all
LVS-related addresses being aliased interfaces. Let's call the primary
machine A, the backup machine B. The network configuration looks something
like this:

eth0    - public Address of Machine
eth0:0  - Public LVS-Address
eth0:1  - NAT-Router private address

  In a failover event, the :0 and :1 interfaces migrate from machine A to
machine B and vice versa.
  Now, this is where the fun starts. The real nodes have absolutely no
problem with the failover event, everything just keeps working fine. The
client machines are taken care of by the gratuitous ARPs sent by the
pulse daemon, so this keeps working, too.
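
  For reference, a gratuitous ARP is just a broadcast ARP frame in which
the sender IP and the target IP are both the address being announced. A
minimal Python sketch of such a frame follows. It only builds the bytes
and is not how pulse itself does it; the function name, the example MAC
and the 192.0.2.10 address are made up for illustration.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806
ETHERTYPE_IPV4 = 0x0800


def build_gratuitous_arp(sender_mac: bytes, ip: str) -> bytes:
    """Return a 42-byte Ethernet frame announcing `ip` at `sender_mac`.

    Hypothetical helper: request-style announcement (opcode 1) with
    sender IP == target IP and an all-zero target hardware address.
    """
    if len(sender_mac) != 6:
        raise ValueError("MAC address must be exactly 6 bytes")
    ip_bytes = socket.inet_aton(ip)

    # Ethernet header: destination, source, EtherType.
    eth_header = BROADCAST_MAC + sender_mac + struct.pack("!H", ETHERTYPE_ARP)

    # ARP payload: htype=1 (Ethernet), ptype=IPv4, hlen=6, plen=4, oper=1.
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1, ETHERTYPE_IPV4, 6, 4, 1,
        sender_mac, ip_bytes,      # sender hardware / protocol address
        b"\x00" * 6, ip_bytes,     # target hardware / protocol address
    )
    return eth_header + arp_payload


if __name__ == "__main__":
    frame = build_gratuitous_arp(bytes.fromhex("02000000000b"), "192.0.2.10")
    print(frame.hex())
```

  Hosts that receive such a frame update an existing cache entry for the
address, which is why clients with a cached entry follow the failover.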

  *BUT*

  If some client machine has to do a new ARP request, sometimes the now
secondary machine answers it! Meaning: Machine B is director, having
taken over service from machine A, but both are still running. This
happens e.g. during maintenance. Machine B has both the :0 and :1
addresses, machine A no longer does (verifiable by ifconfig).
  Using tcpdump we could see machine A still answering ARP requests for
the public LVS address, even though it is now assigned to machine B,
which *should* be answering. Huh?
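
  One way to make this easier to spot in a capture is to tally which MAC
addresses answer for each IP. A rough Python sketch, assuming text lines
roughly in the form printed by "tcpdump -e -n arp" ("... Reply <ip> is-at
<mac> ..."). The exact output format varies between tcpdump versions, and
the function names and sample addresses are hypothetical.

```python
import re
from collections import defaultdict

# Matches e.g. "Reply 192.0.2.10 is-at 02:00:00:00:00:0b" (case-insensitive).
ARP_REPLY_RE = re.compile(
    r"reply\s+(\d{1,3}(?:\.\d{1,3}){3})\s+is-at\s+"
    r"([0-9a-f]{1,2}(?::[0-9a-f]{1,2}){5})",
    re.IGNORECASE,
)


def normalize_mac(mac: str) -> str:
    """Zero-pad each octet so '2:0:0:0:0:a' equals '02:00:00:00:00:0a'."""
    return ":".join(f"{int(part, 16):02x}" for part in mac.split(":"))


def macs_answering(lines):
    """Map each IP to the set of MACs seen claiming it in ARP replies."""
    seen = defaultdict(set)
    for line in lines:
        match = ARP_REPLY_RE.search(line)
        if match:
            ip, mac = match.groups()
            seen[ip].add(normalize_mac(mac))
    return dict(seen)


def conflicting_ips(answers):
    """Return the IPs that more than one MAC answered for, sorted."""
    return sorted(ip for ip, macs in answers.items() if len(macs) > 1)


if __name__ == "__main__":
    import sys
    # Usage sketch: tcpdump -e -n arp | python this_script.py
    answers = macs_answering(sys.stdin)
    for ip in conflicting_ips(answers):
        print(ip, "answered by", ", ".join(sorted(answers[ip])))
```

  If the public LVS address shows up with the MACs of both machine A and
machine B, that matches what we saw by eye in the tcpdump output.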

  Any ideas, anyone? If there is any info missing, please don't hesitate
to ask!

        Dipl. Chem. Dr. Stephan Wonczak

        Institut fuer Angewandte Informatik (ZAIK)
        Regionales Rechenzentrum der Universitaet zu Koeln (RRZK)
        Universitaet zu Koeln, Robert-Koch-Strasse 10, 50931 Koeln
        Tel: ++49/(0)221/478-5577, Fax: ++49/(0)221/478-5590
