Re: Ultramonkey heartbeat failover options

To: d.price@xxxxxxxxxxx, <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: Ultramonkey heartbeat failover options
From: Joseph Mack NA3T <jmack@xxxxxxxx>
Date: Wed, 26 Apr 2006 14:30:23 -0700 (PDT)
On Wed, 26 Apr 2006, Dana Price wrote:

I've got an Ultramonkey 3.0 LB-DR setup with two directors. I have heartbeat running over eth0 and a crossover on eth1. Since both heartbeat links have to fail for a failover to occur, I'm concerned that something like a bad NIC, cable, or switch will bring my web service down (say eth0 fails but the crossover eth1 is still up). Is there any way to define two heartbeat links but have it fail over if a designated one dies? That way the directors can still maintain state over the second link and I'd avoid the split-brain cluster that comes with only one HB link.

This may be possible and someone else can give you the answer, but I'll talk about something else...

There are only so many things you can worry about, so you pick the ones that are most likely to go.

The most likely problem is that your network connection will go down. This is usually out of your control.

Next are mechanical things like disks and fans, or connectors not making good contact. These are the problems you have to deal with (see below). Make sure you have ready-to-go copies of your disks, just sitting on the shelf next to the machine. You can update them by putting them in an external USB case and plugging them in somewhere, whenever you change something on your machine. Disks are really cheap compared to the cost of the labor of replacing them, or the cost of downtime. As well, pre-emptively swap out disks when their warranty expires.

Possibly you have unreliable power. Where I live in the US, I get a 1 sec power bump once a week, presumably when the power company changes the power feed with a mechanical switch. You need a UPS. Such things are unheard of in more advanced parts of the world, like Europe, where you can have a machine up for 400 days on the regular power without any interruptions and a UPS is not needed at all.

I've never had a NIC just fail. I (accidentally) kicked the BNC connector on one and it died. I killed another with electrostatic shock by _not_ touching the computer case before putting my fingers near the empty RJ-45 socket. That's it: NICs generally don't die and neither do switches. The TCP/IP stack never locks up unless the whole OS is hosed, and that doesn't happen a whole lot with Linux. If it does, then heartbeat is gone too.

The connectors/cables to a NIC are another thing. Make sure your cables are multistranded and not a single strand for each wire. Flexing of single-strand wire at the connector leads to cracks that show up as intermittent connections. Single-strand cable has become the default since the .com boom, but it's only tolerated in the commodity market where people would rather save 1% cost than have a reliable connection. Nowhere else in the electronics industry is it used. There's probably not too much problem if the cables are just laid out and plugged in and left there without movement till the computer is junked, but if you're rearranging your cables frequently, use multistranded cables.

Heartbeat has been used with LVS for years and we haven't had anyone report a split brain yet. (Maybe it happens and people don't think it worth mentioning.)

I would say that a pair of NICs with a single crossover cable is probably the most reliable part of your setup. I wouldn't bother making it redundant.


Joseph Mack NA3T EME(B,D), FM05lw North Carolina
jmack (at) wm7d (dot) net - azimuthal equidistant map
generator at Homepage It's GNU/Linux!
