Hello,
On Tue, 15 Jul 2003, Stephan Wonczak wrote:
> We have set up a 4-Node LVS cluster with 2 real Nodes and redundant
> Director using RH AS 2.1 and Piranha (yeah, yeah, I know. Please read on
> anyway, since we believe this is not really a problem with the RedHat
> implementation). Everything is working fine, until we migrate the director
> from the primary to the backup server.
> Each director essentially has only one network interface, with all
> LVS-related addresses being aliased interfaces. Let's call the primary
> machine A, the backup machine B. The network configuration looks something
> like this:
>
> eth0 - public Address of Machine
> eth0:0 - Public LVS-Address
> eth0:1 - NAT-Router private address
>
> In a failover event (or for maintenance, when we migrate the service from
> the command line), the :0 and :1 interfaces are shut down on machine A and
> brought up on machine B, and vice versa. So far, this all works as
> expected.
> Now, this is where the fun starts. The real nodes have absolutely no
> problem with the failover event, everything just keeps working fine.
> (OK, active connections are severed, but this is expected and we can
> live with that.) The client machines are taken care of by the
> gratuitous ARPs sent by the pulse daemon, so this keeps working, too.
>
> *BUT*
>
> If some client machine has to do a new ARP request, sometimes the now
> secondary machine answers it! Meaning: machine B is the director, having
> taken over service from machine A, but both are still running. This
> happens e.g. during maintenance. Machine B has both the :0 and :1 addresses,
> machine A no longer does (verifiable with ifconfig).
	Make sure you set ethX to 0.0.0.0 during "down": if you
leave the main IP address configured, the interface can continue to
answer ARP for it. Another possible problem is the routing cache
returning cached routes, but that should not be the case here.
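	For example, with iproute2 the failover scripts could remove the
service addresses entirely instead of just downing the aliases (the
interface labels and addresses below are only illustrative, not your
actual configuration):

```shell
# On the machine giving up the service: delete the aliased
# addresses completely so the kernel stops answering ARP for them.
# (Addresses shown are examples only.)
ip addr del 192.168.10.100/24 dev eth0 label eth0:0   # public LVS address
ip addr del 10.0.0.1/24 dev eth0 label eth0:1         # NAT-router address

# Flush the routing cache so no cached entries keep pointing
# at the old director.
ip route flush cache
```

	After this, `ip addr show dev eth0` should list only the
machine's own public address.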
> Using tcpdump we could see machine A still answering ARP requests for
> the public LVS address, even though it is now assigned to machine B, which
> *should* be answering. Huh? The funny thing is, this migration of IP
> addresses on virtual interfaces works just fine, without this problem, for
> numerous other services; only ipvs seems to produce the problem with the
> ARPs.
>
> Any ideas, anyone? If there is any info missing or my explanations are
> too garbled to understand, please don't hesitate to ask! We are quite
> desperate to get this solved; this bug (?) is a showstopper for
> putting our application into production.
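	One way to see exactly which box is answering is to watch ARP
on the wire from a third machine while forcing a fresh request (the VIP
below is just an example address):

```shell
# Watch all ARP traffic, printing the answering MAC address:
tcpdump -n -e arp

# From another host, trigger a fresh ARP request for the VIP
# (example address) and see which MAC replies:
arping -I eth0 192.168.10.100
```

	If the old director's MAC still shows up in the replies, the
address is still configured (or cached) somewhere on machine A.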
Regards
--
Julian Anastasov <ja@xxxxxx>