Re: [lvs-users] recommendations on stonith?

To: " users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [lvs-users] recommendations on stonith?
From: Dan Yocum <yocum@xxxxxxxx>
Date: Wed, 19 Dec 2007 15:39:34 -0600

Hi Graeme,

The description of your problem has prompted a follow-up discussion 
amongst my group members, which has led to the following question:

We're thinking of geographically separating the 2 directors and half of 
the real servers between 2 computing centers here at Fermi (one is about 
a mile down the road from the other).  If the network 
fails dramatically between these 2 sites, and both directors become 
MASTER, what would happen when the network is restored (recall we're 
using DR, not NAT)?  Would the backup director become the backup again, 
or would someone have to manually intervene?
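
For reference, here is a minimal sketch of the tie-break rule as the VRRP
spec (RFC 3768) describes it: a MASTER that hears an advertisement with a
higher priority, or with an equal priority and a higher sender address,
drops back to BACKUP. This models only that rule. The director names,
priorities and addresses are hypothetical, and it assumes keepalived
follows the spec here; it says nothing about connection state on the
directors or stale ARP entries on the router.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class Director:
    """One VRRP speaker. All field values used below are hypothetical."""
    name: str
    priority: int            # VRRP priority; higher wins
    primary_ip: IPv4Address  # address the advertisements are sent from


def stays_master(me: Director, peer: Director) -> bool:
    """Return True if `me` keeps MASTER after hearing `peer`'s advertisement.

    Rule modelled: yield if the peer's priority is higher, or if the
    priorities are equal and the peer's primary address is higher.
    """
    if peer.priority != me.priority:
        return me.priority > peer.priority
    return me.primary_ip > peer.primary_ip


def resolve_split_brain(a: Director, b: Director) -> Director:
    """Both directors are MASTER and the link returns: who remains MASTER?"""
    return a if stays_master(a, b) else b


# Hypothetical pair, one director per computing centre.
site_a = Director("director-a", priority=150, primary_ip=IPv4Address("192.0.2.11"))
site_b = Director("director-b", priority=100, primary_ip=IPv4Address("192.0.2.12"))

winner = resolve_split_brain(site_a, site_b)
print(f"{winner.name} remains MASTER; the other returns to BACKUP")
```

If that holds, the lower-priority director should return to BACKUP on its
own once advertisements get through again, but that is exactly the
behaviour I would like someone to confirm for keepalived.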


Graeme Fowler wrote:
> On Wed, 2007-12-12 at 10:52 -0600, Dan Yocum wrote:
>> But, let me ask this pointed question: has anyone ever experienced, or 
>> heard of an incident, where both the active and passive director went 
>> insane and each became active, bringing up the VIPs on their interfaces 
>> (i.e., they both respond to arp requests from the router)?
> Yes, I have.
> It was a complex network where the two keepalived directors were each
> connected to a different Cisco Cat6509 switch with a multi-port gig
> interconnect between the two carrying all the VLANs - essentially one
> big switch in two parts.
> In turn, the Cats were connected to different upstream routers (which in
> turn were cross-connected). This was designed to be a very robust
> network - bits could fail but the packets would route or switch around
> the failure...
> ...only on one occasion, the gig interconnect went bananas and
> segregated the two Cats. This meant the VRRP announcements went
> undetected, so both directors became MASTER - at this point very strange
> things happened, since as MASTER they both became the default gateway
> for traffic leaving the cluster (this was a NAT setup). The routers
> could see ARP flip-flops, but the Cats couldn't.
> All very messy. In order to fix it temporarily I had to do a STONITH of
> sorts, by stopping keepalived on one director.
> All that said, it wasn't the fault of either director - it was my design
> and reliance on a network with a level of complexity that meant the
> condition was possible. Since then I've tried to keep the announcement
> interfaces as close to each other as possible!
> Graeme

Dan Yocum
Fermilab  630.840.6509
Fermilab.  Just zeros and ones.
