Hi Joe,
I think you're reinforcing my back-of-the-envelope risk assessment - the
risk is low, but the impact would be high, in our environment. Also,
adding stonith would certainly add another layer of complexity, and
potentially more points of failure.
But, let me ask this pointed question: has anyone ever experienced, or
heard of, an incident where both the active and passive directors went
insane and each became active, bringing up the VIPs on their interfaces
(i.e., they both respond to ARP requests from the router)? This is my
"biggest" concern and it's not that big to begin with. This would be in
a direct routing configuration; I'm not concerned with NAT or TUN.
Thanks,
Dan
Joseph Mack NA3T wrote:
> On Mon, 10 Dec 2007, Dan Yocum wrote:
>
>> What are your recommendations on stonith and LVS director
>> failovers? Is it useful or not?
>
> people ran without it for years. But then people didn't have
> good backups back then either. How important is your setup:
> are you hosting a 1G$ or 1k$ business setup? Is it to run
> unattended or will people be looking at logs? Do you run
> smartmon on your disks and pre-emptively remove disks at
> 2yrs (even if they're working perfectly) or do you let them
> fail? Do you failout your fans after a year or so? Are you
> running 5 9's or 1 9?
>
> High end commodity hardware isn't too bad nowadays and
> pre-emptive removal of parts that spin/move helps a lot. It
> seems like many of the failures are stupidity (pulling
> plugs, the ISP replaces/reconfigures the router) and no
> amount of stonith will fix that.
>
> Do you trust stonith? Is it a factor of 10 more reliable
> than the failures you expect?
>
> Joe
--
Dan Yocum
Fermilab 630.840.6509
yocum@xxxxxxxx, http://fermigrid.fnal.gov
Fermilab. Just zeros and ones.