failover results?

To: "'lvs-users@xxxxxxxxxxxxxxxxxxxxxx'" <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: failover results?
From: Peter Mueller <pmueller@xxxxxxxxxxxx>
Date: Thu, 3 May 2001 17:16:03 -0700
When I pull the ethernet plug on my primary node, stage-monitor, heartbeat
doesn't fail over the resource to the other director.  If I manually stop the
service it seems to work fine.  Is this expected?  Can someone elaborate?
ha.cf listed below.

Also, I've brainstormed a bit and thought of a few plausible scenarios for
failure.  Could someone go through them real quick?  

Thanks!

Peter

---------------------

Possible failures of LVS system to test (find out what happens) :

Director failure:
1.) network fails on NIC in some way - cable, switch, or card
test - disconnect network cable on ACTIVE and PASSIVE director while a load
is active.  Measure the time it takes for failover to occur and what happens
to current resources (ie active connections)
result - VIP dies.  secondary director doesn't take over!
2.) serial cable is disconnected or "goes bad"
test - disconnect serial cable.  does failover message switch to ETH/UDP?
is service interrupted?
3.) either director dies (nice recovery?)
test - shut off primary & failover directors and monitor the time it takes
for failover to occur and what happens to current resources.
4.) software stops running for some reason on either director
test - kill all relevant software (test for each program) and see what
happens.
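
One way to put a number on "the time it takes for failover to occur" in the
director tests above is to poll the VIP from a client machine while pulling
the cable and record the longest gap in service.  The following is only a
minimal sketch under stated assumptions: the function names (probe,
longest_outage, watch) are made up for illustration, the address 192.0.2.10
and port 80 are placeholders for the real VIP and service port, and a TCP
connect is assumed to be a good enough check for the service.

```python
import socket
import time


def probe(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def longest_outage(samples):
    """Summarise a time-ordered list of (timestamp, ok) probe results.

    Returns (longest, still_down): longest is the longest span in seconds
    from a first failed probe to the next successful one; still_down is True
    if the last outage never recovered (e.g. the secondary never took over).
    """
    longest = 0.0
    down_since = None
    for ts, ok in samples:
        if not ok and down_since is None:
            down_since = ts          # outage starts here
        elif ok and down_since is not None:
            longest = max(longest, ts - down_since)
            down_since = None        # service is back
    return longest, down_since is not None


def watch(host, port, duration=60.0, interval=0.2):
    """Probe host:port every `interval` seconds for `duration` seconds."""
    samples = []
    end = time.time() + duration
    while time.time() < end:
        samples.append((time.time(), probe(host, port)))
        time.sleep(interval)
    return samples


# Example use (placeholder VIP and port, not taken from the setup above):
#   samples = watch("192.0.2.10", 80)
#   print(longest_outage(samples))
```

The measured gap can only be as fine as the probe interval plus the connect
timeout, so it is an upper bound rather than an exact failover time.  A
still_down value of True would correspond to the result noted for test 1,
where the secondary never takes over the VIP.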

Real server failure:
1.) network fails on NIC in some way - cable, switch, or card
test - disconnect the network and see what the reaction of LVS is.  observe
how connections flow on LVS (mac address problem?).
2.) apache dies
test - stop or kill apache.  see how LVS reacts (mac address problem?)
3.) Tomcat dies
test - stop or kill Tomcat.  see how LVS reacts (mac address problem?)

<ha.cf>
[root@stage-monitor ha.d]# more ha.cf
#       File to write debug messages to
debugfile /var/log/ha-debug
#       File to write other messages to
logfile /var/log/ha-log
#       Facility to use for syslog()/logger 
logfacility     local0
#       keepalive: how many seconds between heartbeats
keepalive 1
#       deadtime: seconds-to-declare-host-dead
deadtime 3
#       initdead: added per mailing list archive
#initdead 40
#       hopfudge maximum hop count minus number of nodes in config
#hopfudge 1
#       serial  serialportname ...
serial  /dev/ttyS0
#       Only for serial ports.  It applies to both PPP/UDP and "raw" ports
#       This means run PPP over ports ttyS1 and ttyS2
#       Their respective IP addresses are as listed.
#       Note that I enforce that these are local addresses.  Other addresses
#       are almost certainly a mistake.
#ppp-udp        /dev/ttyS1 10.0.0.1 /dev/ttyS2 10.0.0.2
#       Baud rate for both serial and ppp-udp ports...
#baud   19200
#       What UDP port to use for udp or ppp-udp communication?
#udpport        1001
#       What interfaces to heartbeat over?
udp     eth0
#       Watchdog is the watchdog timer.  If our own heart doesn't beat for
#       a minute, then our machine will reboot.
#watchdog /dev/watchdog
#       Nice_failback sets the behavior when performing a failback:
#
#       - if it's on, when the primary node starts or comes back from any
#         failure and the cluster is already active, i.e. the secondary
#         server performed a failover, the primary stays quiet, acting as a
#         secondary.  This way some operations like syncing disks can be
#         easily done.
#       - if it's off (default), the primary node will always be the primary,
#         whenever it's powered on.
nice_failback off
#       Tell what machines are in the cluster
#       node    nodename ...    -- must match uname -n
node    stage-monitor
node    vs1.internal.smartbasket.com

