On Thu, 31 Aug 2000, Dayton Turner wrote:
> Hi There,
>
> I'm having a hard time understanding how heartbeat could be a useful
> program in a large load balanced situation. It is my understanding that
> it establishes a serial connection, and a 'real' ethernet connection.
> When the 'real' ethernet connection dies, heartbeat then instructs the
> machine via the serial link to shut this or that down, then proceeds to
> take its ip over.
>
> A couple things bother me with this, and maybe i just havent read
> enough, but it doesnt make a whole lot of sense:
I'm not an expert on this, but this should tide you over (go to the
Linux-HA website http://www.linux-ha.org/ for more material or read
Pfister's book on clusters for a larger overview, or read the Linux HA
HOWTO
http://metalab.unc.edu/pub/Linux/ALPHA/linux-ha/High-Availability-HOWTO.html
)
> 1. Does this mean now that if I have 50 realservers that I need to buy a
> 50 port serial port device to connect cables to all of these? Sure, I
> could use two ethernet devices as well. That would solve that, I
> suppose.
Heartbeat is used between 2 directors, one of which is on standby, so
the 50 realservers don't need serial links at all.
The heartbeat protocol is designed to remove the failed director
so that no other machines can detect its presence (equivalent to
switching it off).
Heartbeat is not designed to remove real-servers from the LVS. Failures on
the real-servers are handled by mon or ldirectord which remove services
(and not real-servers) from the ipvsadm table. It is conceivable that
services will die independently on real-servers (it's also possible that
all services on a real-server will go down at the same time). There is no
need to remove/switch out a realserver even if all its services have
failed, as long as ipvsadm knows that the machine is no longer offering
its services.
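
Here's a rough sketch of the idea in Python. It is not how mon or
ldirectord are actually implemented; the function names, the VIP and the
realserver addresses are made up for illustration. It only builds the
ipvsadm commands (-d to delete a realserver entry from a virtual service,
-a to add one back, -g for direct routing) and leaves running them to you.

```python
import socket


def service_is_up(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def plan_ipvsadm(vip, port, realservers, in_table, check=service_is_up):
    """Work out which ipvsadm commands bring the table in line with reality.

    realservers: every realserver IP that could offer this service.
    in_table:    set of realserver IPs currently listed for vip:port.
    check:       hypothetical health check, called as check(host, port).
    Returns (commands, new_table); commands are argument lists, not executed.
    """
    commands = []
    new_table = set(in_table)
    for rip in realservers:
        up = check(rip, port)
        if not up and rip in new_table:
            # Service died: remove only this service's entry, not the machine.
            commands.append(["ipvsadm", "-d", "-t", f"{vip}:{port}",
                             "-r", f"{rip}:{port}"])
            new_table.discard(rip)
        elif up and rip not in new_table:
            # Service is back: put the entry in again (direct routing assumed).
            commands.append(["ipvsadm", "-a", "-t", f"{vip}:{port}",
                             "-r", f"{rip}:{port}", "-g"])
            new_table.add(rip)
    return commands, new_table
```

Note that the table is keyed per service (vip:port), so a realserver whose
httpd has died can keep serving ftp through another entry.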
Since heartbeat is only used between 2 machines, a serial line will do,
and it has the advantage that it is independent of the tcpip layer.
Clearly it would be better to have multiple directors and have a quorum
vote to kill a disabled machine, in which case serial is no longer
appropriate since it doesn't scale.
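
To show what I mean by a quorum vote, here's a small Python sketch. This
is not something heartbeat does; the function name and the shape of the
reports are my own invention. A machine is killed only if a strict
majority of the whole cluster says it is dead, and the suspect's own
opinion is ignored.

```python
def should_fence(suspect, reports, cluster_size):
    """Decide whether a suspect node should be killed.

    suspect:      name of the node under suspicion.
    reports:      maps each reporting node to the set of nodes it thinks
                  are dead (hypothetical format).
    cluster_size: total number of nodes, including the suspect.
    """
    votes = sum(
        1
        for node, dead in reports.items()
        if node != suspect and suspect in dead  # the suspect can't vote on itself
    )
    # Strict majority of the whole cluster, so a minority partition
    # can never kill the rest.
    return votes > cluster_size // 2
```

With 5 directors, 3 of them have to agree that Fred is dead before
anything is done to Fred.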
> 2. Why does heartbeat have the responsibility of starting and stopping
> the services? Does it really matter to stop apache if the machine is
> dead? I mean, nobody's accessing it anyways!
I don't think heartbeat has this responsibility directly, but it gives
the OK for the layer below (mon, ldirectord) to do this. When a machine
dies/hangs, its answers can't be trusted: it may happily reply to pings
while you can't telnet to it, etc. You have to have as many windows on
the machine as possible to make any reasonable judgement about its
condition.
Machine "Fred" may be saying "I'm OK guys really, there's nothing wrong
wrong wrong ..." and the other machine(s) have got to do something to
eliminate all signs of life. In your example the httpd might be able to
fool mon/ldirectord or someone coming in on a browser (who will be getting
a "loading page" notice), but if some agent with a better view of Fred's
health has decided that Fred is deranged (disk is full ...), then attempts
have to be made to kill everything on Fred.
> 3. When the director takes the IP over, what problem does this solve?
You probably don't need an explanation for this now, but just in case...
when the standby director takes over the IP, the router doesn't have to
be told that the director is now another machine.
> If
> i was using simply mon, or ldirectord, why not just remove the IP from
> the list of realservers and get on with life? Consider machine (X) dead,
> instead of making machine (Y) _look_ like machine (X).
>
> I guess my question is: I assemble a server farm of http servers. I use
> LVS to balance my load. Lets say I have 5 webservers and one director.
> When machine 2 goes down, I can remove it from the list of realservers,
> and the remaining 4 servers can deal with the load. When machine 2 comes
> back up, I can enter its IP address back into the realserver table.
That's what ldirectord/mon do for you.
Joe
--
Joseph Mack mack@xxxxxxxxxxx