On Wed, 26 Apr 2006, Dana Price wrote:
> I've got an Ultramonkey 3.0 LB-DR setup, with two
> directors. I have heartbeat running over eth0 and a
> crossover on eth1. Since both heartbeat links have to
> fail for a failover to occur, I'm concerned that something
> like a bad NIC, cable, or switch will bring my web service
> down (say eth0 fails but the crossover eth1 is still up).
> Is there any way to define two heartbeat links in ha.cf
> but have it fail over if a designated one dies? That way
> the directors can still maintain state over the second
> link and I'd avoid the split-brain cluster that comes
> with only 1 HB link.

This may be possible, and someone else can give you the
answer, but I'll talk about something else...
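
For reference, the behaviour described in the question (the
peer is only declared dead when every heartbeat link has gone
quiet) can be sketched in a few lines of Python. This is only
an illustration under my own assumptions, not Heartbeat's
code: the names Link, peer_is_dead, should_fail_over and the
"critical" parameter are made up for the example, and
"deadtime" is borrowed from the ha.cf directive of the same
name.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str         # interface carrying heartbeats, e.g. "eth0"
    last_seen: float  # time (seconds) the last heartbeat arrived


def peer_is_dead(links, now, deadtime):
    """Behaviour described in the question: the peer counts as dead
    only when every link has been silent for longer than deadtime."""
    return all(now - link.last_seen > deadtime for link in links)


def should_fail_over(links, now, deadtime, critical=None):
    """Hypothetical policy the poster asks for: also fail over when
    one designated link (critical) has gone silent on its own."""
    if peer_is_dead(links, now, deadtime):
        return True
    return any(
        link.name == critical and now - link.last_seen > deadtime
        for link in links
    )


# eth0 has been silent for 100 s, the crossover on eth1 is healthy.
links = [Link("eth0", 0.0), Link("eth1", 95.0)]
print(peer_is_dead(links, now=100.0, deadtime=30.0))
print(should_fail_over(links, now=100.0, deadtime=30.0, critical="eth0"))
```

Note that silence on a link does not tell a director which
end of the link is broken, so a policy like should_fail_over
is only a starting point.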
There are only so many things you can worry about, so you
pick the ones that are most likely to fail.
The most likely problem is that your network connection will
go down. This is usually out of your control.
Next are mechanical things like disks and fans, or connectors
not making good contact. This is the problem you have to
deal with (see below). Make sure you have ready-to-go
copies of your disks, just sitting on the shelf next to the
machine. You can update them by putting them in an external
USB case and plugging them in somewhere, whenever you change
your machine. Disks are really cheap compared to the cost of
the labor of replacing them, or the cost of downtime. As
well, pre-emptively swap out disks at their warranty date.
Possibly you have unreliable power. Where I live in the US,
I get a 1-second power bump once a week, presumably when the
power company changes the power feed with a mechanical
switch. In that case you need a UPS. Such power bumps are
unheard of in more advanced parts of the world, like Europe,
where you can have a machine up for 400 days on the regular
power without any interruptions and a UPS is not needed at
all.
I've never had a NIC just fail. I (accidentally) kicked the
BNC connector on one and it died. I killed another with
electrostatic shock by _not_ touching the computer case
before putting my fingers near the empty RJ-45 socket.
That's it - NICs generally don't die and neither do
switches. The TCP/IP stack never locks up unless the whole
OS is hosed. That doesn't happen a whole lot with Linux,
and if it does, then heartbeat is gone too.
The connectors/cables to a NIC are another thing. Make sure
your cables are multistranded and not a single strand for
each wire. Flexing of single strand wire at the connector
leads to cracks that show up as intermittent connections.
Single strand has become the default since the .com boom,
but it's only tolerated in the commodity market where
people would rather save 1% cost than have a reliable
connection. Nowhere else in the electronics industry is it
used. There's probably not too much problem if the cables
are just laid out and plugged in and left there without
movement till the computer is junked, but if you're
rearranging your cables frequently, use multistranded cables.
Heartbeat has been used with LVS for years and we haven't
had anyone come up with a split brain yet. (Maybe it happens
and people don't think it's worth mentioning.)
I would say that a pair of NICs with a single crossover
cable is probably the most reliable part of your setup. I
wouldn't bother making it redundant.
Joseph Mack NA3T EME(B,D), FM05lw North Carolina
jmack (at) wm7d (dot) net - azimuthal equidistant map
generator at http://www.wm7d.net/azproj.shtml
Homepage http://www.austintek.com/ It's GNU/Linux!