[lvs-users] Problem setting up 2-node UltraMonkey style HA cluster

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: [lvs-users] Problem setting up 2-node UltraMonkey style HA cluster
From: John Donath <john.donath@xxxxx>
Date: Wed, 26 Sep 2007 12:13:01 +0200
-------------------
Problem description
-------------------

I want to set up a two-node HA cluster based on the Streamline High
Availability and Load Balancing concept.

Unfortunately, after spending many hours on it I have not succeeded.

As far as I can tell, the problem is that the second real server never
gets added to LVS:

[root@grind11 ~]# ipvsadm
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  172.31.1.10:http rr persistent 600
  -> grind11.graddelt.com:http    Local   1      0          0

I would now have expected a second entry in the ipvsadm output, e.g.:
  -> grind12.graddelt.com:http    Route   1      0          0
But it never shows up ... :-(


--------
Topology
--------

The topology is based on the Streamline High Availability and Load 
Balancing concept.


                                      ROUTER (.1)
                    VIP = .10            |
        ------------------------------------------------ 172.31.1.0/24
         eth1 | .11                        eth1 | .12
           -------                           -------
          |grind11| (eth3)<--bcast-->(eth3) |grind12|
           -------                           -------
         eth0 | .11                        eth0 | .12
        ------------------------------------------------ 10.1.156.0/24


The eth3 interfaces (cross-linked) carry the heartbeat broadcast (bcast).
The 10.1.156.0/24 network is for management purposes only.

Both nodes (Grind11 and Grind12) run httpd, listening only on the
172.31.1.0/24 network.
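
The real= entries in ldirectord.cf below use "gate" (direct routing), so each
node doubles as director and real server: the active director answers ARP for
the VIP on eth1, while the other node must silently accept packets addressed
to the VIP. The usual arrangement for this, sketched here assuming the
standard LVS-DR loopback-alias approach (it matches the lo:0 address visible
in the `ip addr` output further down), is:

```shell
# On a DR real server: put the VIP on a loopback alias so locally
# delivered packets for 172.31.1.10 are accepted, while the
# arp_ignore/arp_announce sysctls keep the host from answering ARP
# for the VIP on the shared 172.31.1.0/24 segment.
ip addr add 172.31.1.10/32 brd 172.31.1.255 dev lo label lo:0
```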

-------------
Configuration
-------------

[root@grind11 ha.d]# sysctl -a | grep arp | egrep "(ignore|annou)" | sort
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.default.arp_announce = 0
net.ipv4.conf.default.arp_ignore = 0
net.ipv4.conf.eth0.arp_announce = 2
net.ipv4.conf.eth0.arp_ignore = 1
net.ipv4.conf.eth1.arp_announce = 2
net.ipv4.conf.eth1.arp_ignore = 1
net.ipv4.conf.eth3.arp_announce = 2
net.ipv4.conf.eth3.arp_ignore = 1
net.ipv4.conf.lo.arp_announce = 0
net.ipv4.conf.lo.arp_ignore = 0

net.ipv4.conf.all.forwarding = 1
net.ipv4.conf.default.forwarding = 1
net.ipv4.conf.eth0.forwarding = 1
net.ipv4.conf.eth1.forwarding = 1
net.ipv4.conf.eth3.forwarding = 1
net.ipv4.conf.lo.forwarding = 1
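
For reference, these two ARP settings are what make the DR setup work:
arp_ignore=1 makes an interface answer ARP requests only for addresses
configured on that interface itself (so eth1 will not answer for a VIP held
on lo), and arp_announce=2 makes the kernel always pick the best matching
local address as the ARP source. To survive a reboot, the runtime values
above would normally go into /etc/sysctl.conf, e.g. (a fragment mirroring
the output above, not taken from the actual hosts):

```shell
# /etc/sysctl.conf fragment matching the runtime values shown above
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.eth0.arp_ignore = 1
net.ipv4.conf.eth0.arp_announce = 2
net.ipv4.conf.eth1.arp_ignore = 1
net.ipv4.conf.eth1.arp_announce = 2
net.ipv4.conf.all.forwarding = 1
```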

[root@grind11 ha.d]# cat /etc/hosts
127.0.0.1       localhost.localdomain localhost
172.31.1.11     grind11.graddelt.com
172.31.1.12     grind12.graddelt.com

[root@grind11 ha.d]# cat ha.cf
logfacility   local0
debug         0
keepalive     1
deadtime      10
warntime      5
initdead      120
udpport       694
#ucast eth3 10.0.0.2
bcast eth3
auto_failback on
node          grind11.graddelt.com
node          grind12.graddelt.com
#ping          172.31.1.1
respawn hacluster /usr/lib/heartbeat/ipfail
crm off

[root@grind11 ha.d]# cat haresources
grind11.graddelt.com    \
        ldirectord::ldirectord.cf \
        LVSSyncDaemonSwap::master \
        IPaddr2::172.31.1.10/24/eth1/172.31.1.255

[root@grind11 ha.d]# cat ldirectord.cf
checktimeout=10
checkinterval=2
autoreload=no
logfile="/var/log/ldirectord.log"
#logfile="local0"
quiescent=no
virtual=172.31.1.10:80
        fallback=127.0.0.1:80
        real=172.31.1.11:80 gate
        real=172.31.1.12:80 gate
        service=http
        scheduler=rr
        persistent=600
        protocol=tcp
        checktype=negotiate
        request="ldtest.html"
        receive="GRIND11"

NOTE: Files are the same on both nodes!
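
With checktype=negotiate, ldirectord periodically fetches the request= page
from each real server and marks a server as up only if the receive= string
appears in the response body; a server whose test page never contains the
string is kept out of the LVS table. A minimal sketch of that matching step
(a hypothetical helper for illustration, not ldirectord's actual code):

```python
def negotiate_ok(body: bytes, receive: str) -> bool:
    """Mimic ldirectord's negotiate check: the fetched page must
    contain the configured receive= string for the server to pass."""
    return receive.encode() in body

# With receive="GRIND11", only a page containing that exact string passes:
print(negotiate_ok(b"<html>GRIND11</html>", "GRIND11"))  # True
print(negotiate_ok(b"<html>GRIND12</html>", "GRIND11"))  # False
```

Note that if the two nodes serve different ldtest.html contents, a single
shared receive= value can only ever match on one of them.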

--------
Software
--------

CentOS 4.5
httpd-2.0.52-32.3.ent.centos4
heartbeat-2.1.2-3.el4.centos
heartbeat-pils-2.1.2-3.el4.centos
heartbeat-stonith-2.1.2-3.el4.centos
heartbeat-ldirectord-2.1.2-3.el4.centos
ipvsadm-1.24-6

------------------------------------
Before starting heartbeat on Grind11
------------------------------------

NOTE: Heartbeat is not yet started on Grind12!

[root@grind11 ~]# ip ad
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
    inet 172.31.1.10/32 brd 172.31.1.255 scope global lo:0
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:18:71:e9:d0:d6 brd ff:ff:ff:ff:ff:ff
    inet 10.1.156.11/24 brd 10.1.156.255 scope global eth0
3: eth1: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:18:71:e9:d0:d5 brd ff:ff:ff:ff:ff:ff
    inet 172.31.1.11/24 brd 172.31.1.255 scope global eth1
4: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
    link/ether 00:0e:0c:c1:03:35 brd ff:ff:ff:ff:ff:ff
5: eth3: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:0e:0c:d7:de:ef brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.1/30 brd 10.0.0.3 scope global eth3

[root@grind11 ~]# ipvsadm
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn

-----------------------------------
After starting heartbeat on Grind11
-----------------------------------

[root@grind11 ~]# tail -f /var/log/ha_log
Sep 26 10:48:44 grind11 heartbeat: [11850]: info: Version 2 support: off
Sep 26 10:48:44 grind11 heartbeat: [11850]: WARN: Logging daemon is 
disabled --enabling logging daemon is recommended
Sep 26 10:48:44 grind11 heartbeat: [11850]: info: **************************
Sep 26 10:48:44 grind11 heartbeat: [11850]: info: Configuration 
validated. Starting heartbeat 2.1.2
Sep 26 10:48:44 grind11 heartbeat: [11851]: info: heartbeat: version 2.1.2
Sep 26 10:48:44 grind11 heartbeat: [11851]: info: Heartbeat generation: 
1190494128
Sep 26 10:48:44 grind11 heartbeat: [11851]: info: 
G_main_add_TriggerHandler: Added signal manual handler
Sep 26 10:48:44 grind11 heartbeat: [11851]: info: 
G_main_add_TriggerHandler: Added signal manual handler
Sep 26 10:48:44 grind11 heartbeat: [11851]: info: Removing 
/var/run/heartbeat/rsctmp failed, recreating.
Sep 26 10:48:44 grind11 heartbeat: [11851]: info: glib: UDP Broadcast 
heartbeat started on port 694 (694) interface eth3
Sep 26 10:48:44 grind11 heartbeat: [11851]: info: glib: UDP Broadcast 
heartbeat closed on port 694 interface eth3 - Status: 1
Sep 26 10:48:44 grind11 heartbeat: [11851]: info: G_main_add_SignalHandler: 
Added signal handler for signal 17
Sep 26 10:48:44 grind11 heartbeat: [11851]: info: Local status now set 
to: 'up'
Sep 26 10:48:45 grind11 heartbeat: [11851]: info: Link 
grind11.graddelt.com:eth3 up.
Sep 26 10:50:44 grind11 heartbeat: [11851]: WARN: node 
grind12.graddelt.com: is dead
Sep 26 10:50:44 grind11 heartbeat: [11851]: info: Comm_now_up(): 
updating status to active
Sep 26 10:50:44 grind11 heartbeat: [11851]: info: Local status now set 
to: 'active'
Sep 26 10:50:44 grind11 heartbeat: [11851]: info: Starting child client 
"/usr/lib/heartbeat/ipfail" (90,90)
Sep 26 10:50:44 grind11 heartbeat: [11851]: WARN: No STONITH device 
configured.
Sep 26 10:50:44 grind11 heartbeat: [11851]: WARN: Shared disks are not 
protected.
Sep 26 10:50:44 grind11 heartbeat: [11851]: info: Resources being 
acquired from grind12.graddelt.com.
Sep 26 10:50:44 grind11 heartbeat: [11860]: info: Starting 
"/usr/lib/heartbeat/ipfail" as uid 90  gid 90 (pid 11860)
Sep 26 10:50:44 grind11 heartbeat: [11861]: debug: notify_world: setting 
SIGCHLD Handler to SIG_DFL
Sep 26 10:50:44 grind11 harc[11861]: info: Running /etc/ha.d/rc.d/status 
status
Sep 26 10:50:44 grind11 ipfail: [11860]: debug: [We are 
grind11.graddelt.com]
Sep 26 10:50:44 grind11 mach_down[11891]: info: 
/usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
Sep 26 10:50:44 grind11 mach_down[11891]: info: mach_down takeover 
complete for node grind12.graddelt.com.
Sep 26 10:50:44 grind11 heartbeat: [11851]: info: mach_down takeover 
complete.
Sep 26 10:50:44 grind11 heartbeat: [11851]: info: Initial resource 
acquisition complete (mach_down)
Sep 26 10:50:44 grind11 heartbeat: [11851]: debug: 
StartNextRemoteRscReq(): child count 1
Sep 26 10:50:44 grind11 ipfail: [11860]: debug: auto_failback -> 1 (on)
Sep 26 10:50:44 grind11 ipfail: [11860]: debug: Setting message filter mode
Sep 26 10:50:44 grind11 heartbeat: [11862]: info: Local Resource 
acquisition completed.
Sep 26 10:50:44 grind11 heartbeat: [11851]: debug: 
StartNextRemoteRscReq(): child count 1
Sep 26 10:50:44 grind11 ipfail: [11860]: debug: Starting node walk
Sep 26 10:50:44 grind11 heartbeat: [11951]: debug: notify_world: setting 
SIGCHLD Handler to SIG_DFL
Sep 26 10:50:44 grind11 harc[11951]: info: Running 
/etc/ha.d/rc.d/ip-request-resp ip-request-resp
Sep 26 10:50:44 grind11 ip-request-resp[11951]: received ip-request-resp 
ldirectord::ldirectord.cf OK yes
Sep 26 10:50:44 grind11 ResourceManager[11972]: info: Acquiring resource 
group: grind11.graddelt.com ldirectord::ldirectord.cf 
LVSSyncDaemonSwap::master IPaddr2::172.31.1.10/24/eth1/172.31.1.255
Sep 26 10:50:45 grind11 ResourceManager[11972]: info: Running 
/etc/ha.d/resource.d/ldirectord ldirectord.cf start
Sep 26 10:50:45 grind11 ResourceManager[11972]: debug: Starting 
/etc/ha.d/resource.d/ldirectord ldirectord.cf start
Sep 26 10:50:45 grind11 ipfail: [11860]: debug: Cluster node: 
grind12.graddelt.com: status: dead
Sep 26 10:50:45 grind11 ipfail: [11860]: debug: [They are 
grind12.graddelt.com]
Sep 26 10:50:45 grind11 ipfail: [11860]: debug: Cluster node: 
grind11.graddelt.com: status: active
Sep 26 10:50:45 grind11 ResourceManager[11972]: debug: 
/etc/ha.d/resource.d/ldirectord ldirectord.cf start done. RC=0
Sep 26 10:50:45 grind11 ResourceManager[11972]: info: Running 
/etc/ha.d/resource.d/LVSSyncDaemonSwap master start
Sep 26 10:50:45 grind11 ResourceManager[11972]: debug: Starting 
/etc/ha.d/resource.d/LVSSyncDaemonSwap master start
Sep 26 10:50:45 grind11 LVSSyncDaemonSwap[12089]: info: ipvs_syncmaster up
Sep 26 10:50:45 grind11 LVSSyncDaemonSwap[12089]: info: ipvs_syncmaster 
obtained
Sep 26 10:50:45 grind11 ResourceManager[11972]: debug: 
/etc/ha.d/resource.d/LVSSyncDaemonSwap master start done. RC=0
Sep 26 10:50:46 grind11 ipfail: [11860]: debug: Setting message signal
Sep 26 10:50:46 grind11 IPaddr2[12135]: INFO:  Resource is stopped
Sep 26 10:50:46 grind11 ResourceManager[11972]: info: Running 
/etc/ha.d/resource.d/IPaddr2 172.31.1.10/24/eth1/172.31.1.255 start
Sep 26 10:50:46 grind11 ResourceManager[11972]: debug: Starting 
/etc/ha.d/resource.d/IPaddr2 172.31.1.10/24/eth1/172.31.1.255 start
Sep 26 10:50:46 grind11 IPaddr2[12251]: INFO: Removing conflicting 
loopback lo.
Sep 26 10:50:46 grind11 IPaddr2[12251]: INFO: ip -f inet addr delete 
172.31.1.10/32 dev lo
Sep 26 10:50:46 grind11 IPaddr2[12251]: INFO: ip -o -f inet addr show lo
Sep 26 10:50:46 grind11 IPaddr2[12251]: INFO: ip route delete 
172.31.1.10 dev lo
Sep 26 10:50:46 grind11 IPaddr2[12251]: INFO: ip -f 
inet addr add 172.31.1.10/24 brd 172.31.1.255 dev eth1
Sep 26 10:50:46 grind11 IPaddr2[12251]: INFO: ip link set eth1 up
Sep 26 10:50:46 grind11 IPaddr2[12251]: INFO: 
/usr/lib/heartbeat/send_arp -i 200 -r 5 -p 
/var/run/heartbeat/rsctmp/send_arp/send_arp-172.31.1.10 eth1 172.31.1.10 
auto not_used not_used
Sep 26 10:50:46 grind11 IPaddr2[12222]: INFO:  Success
Sep 26 10:50:46 grind11 ResourceManager[11972]: debug: 
/etc/ha.d/resource.d/IPaddr2 172.31.1.10/24/eth1/172.31.1.255 start 
done. RC=0
Sep 26 10:50:46 grind11 ipfail: [11860]: debug: Waiting for messages...
Sep 26 10:50:54 grind11 heartbeat: [11851]: info: Local Resource 
acquisition completed. (none)
Sep 26 10:50:54 grind11 heartbeat: [11851]: info: local resource 
transition completed.


[root@grind11 ~]# tail -f /var/log/ldirectord.log
[Wed Sep 26 10:50:44 2007|ldirectord.cf|11933] Invoking ldirectord 
invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf status
[Wed Sep 26 10:50:44 2007|ldirectord.cf|11933] Exiting with exit_status 
3: Exiting from ldirectord status
[Wed Sep 26 10:50:45 2007|ldirectord.cf|11999] Invoking ldirectord 
invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf status
[Wed Sep 26 10:50:45 2007|ldirectord.cf|11999] Exiting with exit_status 
3: Exiting from ldirectord status
[Wed Sep 26 10:50:45 2007|ldirectord.cf|12021] Invoking ldirectord 
invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf start
[Wed Sep 26 10:50:45 2007|ldirectord.cf|12021] Starting Linux Director 
v1.186-ha-2.1.2 as daemon
[Wed Sep 26 10:50:45 2007|ldirectord.cf|12023] Added virtual server: 
172.31.1.10:80
[Wed Sep 26 10:50:45 2007|ldirectord.cf|12023] Added fallback server: 
127.0.0.1:80 (172.31.1.10:80) (Weight set to 1)
[Wed Sep 26 10:50:46 2007|ldirectord.cf|12023] Added real server: 
172.31.1.11:80 (172.31.1.10:80) (Weight set to 1)
[Wed Sep 26 10:50:46 2007|ldirectord.cf|12023] Deleted fallback server: 
127.0.0.1:80 (172.31.1.10:80)

[root@grind11 ~]# ip addr
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:18:71:e9:d0:d6 brd ff:ff:ff:ff:ff:ff
    inet 10.1.156.11/24 brd 10.1.156.255 scope global eth0
3: eth1: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:18:71:e9:d0:d5 brd ff:ff:ff:ff:ff:ff
    inet 172.31.1.11/24 brd 172.31.1.255 scope global eth1
    inet 172.31.1.10/24 brd 172.31.1.255 scope global secondary eth1
4: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
    link/ether 00:0e:0c:c1:03:35 brd ff:ff:ff:ff:ff:ff
5: eth3: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:0e:0c:d7:de:ef brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.1/30 brd 10.0.0.3 scope global eth3

[root@grind11 ~]# ipvsadm
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  172.31.1.10:http rr persistent 600
  -> grind11.graddelt.com:http    Local   1      0          0

-------
COMMENT
-------

I would now have expected a second entry in the ipvsadm output, e.g.:
  -> grind12.graddelt.com:http    Route   1      0          0
But it never shows up ... :-(

I will now start heartbeat on the other node (grind12).


------------------------------------
Before starting heartbeat on Grind12
------------------------------------

[root@grind12 ~]# ip addr
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
    inet 172.31.1.10/32 brd 172.31.1.255 scope global lo:0
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:06:5b:8c:0b:3a brd ff:ff:ff:ff:ff:ff
    inet 10.1.156.12/24 brd 10.1.156.255 scope global eth0
3: eth1: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:06:5b:8c:0b:3b brd ff:ff:ff:ff:ff:ff
    inet 172.31.1.12/24 brd 172.31.1.255 scope global eth1
4: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
    link/ether 00:0e:0c:c5:ef:15 brd ff:ff:ff:ff:ff:ff
5: eth3: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:0e:0c:c1:00:fe brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.2/30 brd 10.0.0.3 scope global eth3

[root@grind12 ~]# tail -f /var/log/ha_log
Sep 26 11:09:44 grind12 heartbeat: [16610]: info: Version 2 support: off
Sep 26 11:09:44 grind12 heartbeat: [16610]: WARN: Logging daemon is 
disabled --enabling logging daemon is recommended
Sep 26 11:09:44 grind12 heartbeat: [16610]: info: **************************
Sep 26 11:09:44 grind12 heartbeat: [16610]: info: Configuration 
validated. Starting heartbeat 2.1.2
Sep 26 11:09:44 grind12 heartbeat: [16611]: info: heartbeat: version 2.1.2
Sep 26 11:09:44 grind12 heartbeat: [16611]: info: Heartbeat generation: 
1190494141
Sep 26 11:09:44 grind12 heartbeat: [16611]: info: 
G_main_add_TriggerHandler: Added signal manual handler
Sep 26 11:09:44 grind12 heartbeat: [16611]: info: 
G_main_add_TriggerHandler: Added signal manual handler
Sep 26 11:09:44 grind12 heartbeat: [16611]: info: Removing 
/var/run/heartbeat/rsctmp failed, recreating.
Sep 26 11:09:44 grind12 heartbeat: [16611]: info: glib: UDP Broadcast 
heartbeat started on port 694 (694) interface eth3
Sep 26 11:09:44 grind12 heartbeat: [16611]: info: glib: UDP Broadcast 
heartbeat closed on port 694 interface eth3 - Status: 1
Sep 26 11:09:44 grind12 heartbeat: [16611]: info: G_main_add_SignalHandler: 
Added signal handler for signal 17
Sep 26 11:09:44 grind12 heartbeat: [16611]: info: Local status now set 
to: 'up'
Sep 26 11:09:45 grind12 heartbeat: [16611]: info: Link 
grind11.graddelt.com:eth3 up.
Sep 26 11:09:45 grind12 heartbeat: [16611]: info: Status update for node 
grind11.graddelt.com: status active
Sep 26 11:09:45 grind12 heartbeat: [16611]: info: Link 
grind12.graddelt.com:eth3 up.
Sep 26 11:09:45 grind12 heartbeat: [16618]: debug: notify_world: setting 
SIGCHLD Handler to SIG_DFL
Sep 26 11:09:45 grind12 harc[16618]: info: Running /etc/ha.d/rc.d/status 
status
Sep 26 11:09:45 grind12 heartbeat: [16611]: info: Comm_now_up(): 
updating status to active
Sep 26 11:09:45 grind12 heartbeat: [16611]: info: Local status now set 
to: 'active'
Sep 26 11:09:45 grind12 heartbeat: [16611]: info: Starting child client 
"/usr/lib/heartbeat/ipfail" (90,90)
Sep 26 11:09:45 grind12 heartbeat: [16635]: info: Starting 
"/usr/lib/heartbeat/ipfail" as uid 90  gid 90 (pid 16635)
Sep 26 11:09:46 grind12 heartbeat: [16611]: info: remote resource 
transition completed.
Sep 26 11:09:46 grind12 heartbeat: [16611]: info: remote resource 
transition completed.
Sep 26 11:09:46 grind12 heartbeat: [16611]: info: Local Resource 
acquisition completed. (none)
Sep 26 11:09:46 grind12 ipfail: [16635]: debug: [We are 
grind12.graddelt.com]
Sep 26 11:09:46 grind12 ipfail: [16635]: debug: auto_failback -> 1 (on)
Sep 26 11:09:46 grind12 heartbeat: [16611]: info: grind11.graddelt.com 
wants to go standby [foreign]
Sep 26 11:09:46 grind12 ipfail: [16635]: debug: Setting message filter mode
Sep 26 11:09:47 grind12 heartbeat: [16611]: info: standby: acquire 
[foreign] resources from grind11.graddelt.com
Sep 26 11:09:47 grind12 heartbeat: [16636]: info: acquire local HA 
resources (standby).
Sep 26 11:09:47 grind12 ipfail: [16635]: debug: Starting node walk
Sep 26 11:09:47 grind12 ipfail: [16635]: debug: Cluster node: 
grind12.graddelt.com: status: active
Sep 26 11:09:47 grind12 heartbeat: [16636]: info: local HA resource 
acquisition completed (standby).
Sep 26 11:09:47 grind12 heartbeat: [16611]: info: Standby resource 
acquisition done [foreign].
Sep 26 11:09:47 grind12 heartbeat: [16611]: info: Initial resource 
acquisition complete (auto_failback)
Sep 26 11:09:48 grind12 heartbeat: [16611]: info: remote resource 
transition completed.
Sep 26 11:09:48 grind12 ipfail: [16635]: debug: Cluster node: 
grind11.graddelt.com: status: active
Sep 26 11:09:48 grind12 ipfail: [16635]: debug: [They are 
grind11.graddelt.com]
Sep 26 11:09:48 grind12 ipfail: [16635]: debug: Setting message signal
Sep 26 11:09:48 grind12 ipfail: [16635]: debug: Waiting for messages...
Sep 26 11:09:49 grind12 ipfail: [16635]: debug: Other side is now stable.
Sep 26 11:09:49 grind12 ipfail: [16635]: debug: Other side is now stable.
Sep 26 11:09:51 grind12 ipfail: [16635]: debug: Got asked for num_ping.
Sep 26 11:09:51 grind12 ipfail: [16635]: info: Ping node count is balanced.
Sep 26 11:09:51 grind12 ipfail: [16635]: debug: Abort message sent.
Sep 26 11:09:52 grind12 ipfail: [16635]: info: Giving up foreign 
resources (auto_failback).
Sep 26 11:09:52 grind12 ipfail: [16635]: info: Delayed giveup in 2 seconds.
Sep 26 11:09:52 grind12 ipfail: [16635]: debug: Other side is unstable.
Sep 26 11:09:53 grind12 ipfail: [16635]: debug: Other side is now stable.
Sep 26 11:09:53 grind12 ipfail: [16635]: debug: Other side is now stable.
Sep 26 11:09:54 grind12 ipfail: [16635]: info: giveup() called (timeout 
worked)
Sep 26 11:09:54 grind12 ipfail: [16635]: debug: Message [ask_resources] 
sent.
Sep 26 11:09:54 grind12 ipfail: [16635]: debug: giveup timeout has been 
destroyed.
Sep 26 11:09:54 grind12 heartbeat: [16611]: info: grind12.graddelt.com 
wants to go standby [foreign]
Sep 26 11:09:55 grind12 heartbeat: [16611]: info: standby: 
grind11.graddelt.com can take our foreign resources
Sep 26 11:09:55 grind12 heartbeat: [16649]: info: give up foreign HA 
resources (standby).
Sep 26 11:09:55 grind12 ResourceManager[16662]: info: Releasing resource 
group: grind11.graddelt.com ldirectord::ldirectord.cf 
LVSSyncDaemonSwap::master IPaddr2::172.31.1.10/24/eth1/172.31.1.255
Sep 26 11:09:55 grind12 ResourceManager[16662]: info: Running 
/etc/ha.d/resource.d/IPaddr2 172.31.1.10/24/eth1/172.31.1.255 stop
Sep 26 11:09:55 grind12 ResourceManager[16662]: debug: Starting 
/etc/ha.d/resource.d/IPaddr2 172.31.1.10/24/eth1/172.31.1.255 stop
Sep 26 11:09:55 grind12 IPaddr2[16700]: INFO:  Success
Sep 26 11:09:55 grind12 ResourceManager[16662]: debug: 
/etc/ha.d/resource.d/IPaddr2 172.31.1.10/24/eth1/172.31.1.255 stop done. 
RC=0
Sep 26 11:09:55 grind12 ResourceManager[16662]: info: Running 
/etc/ha.d/resource.d/LVSSyncDaemonSwap master stop
Sep 26 11:09:55 grind12 ResourceManager[16662]: debug: Starting 
/etc/ha.d/resource.d/LVSSyncDaemonSwap master stop
Sep 26 11:09:55 grind12 LVSSyncDaemonSwap[16787]: info: ipvs_syncbackup up
Sep 26 11:09:55 grind12 LVSSyncDaemonSwap[16787]: info: ipvs_syncmaster 
released
Sep 26 11:09:55 grind12 ResourceManager[16662]: debug: 
/etc/ha.d/resource.d/LVSSyncDaemonSwap master stop done. RC=0
Sep 26 11:09:55 grind12 ResourceManager[16662]: info: Running 
/etc/ha.d/resource.d/ldirectord ldirectord.cf stop
Sep 26 11:09:55 grind12 ResourceManager[16662]: debug: Starting 
/etc/ha.d/resource.d/ldirectord ldirectord.cf stop
Sep 26 11:09:56 grind12 ResourceManager[16662]: debug: 
/etc/ha.d/resource.d/ldirectord ldirectord.cf stop done. RC=0
Sep 26 11:09:56 grind12 heartbeat: [16649]: info: foreign HA resource 
release completed (standby).
Sep 26 11:09:56 grind12 heartbeat: [16611]: info: Local standby process 
completed [foreign].
Sep 26 11:09:58 grind12 heartbeat: [16611]: WARN: 1 lost packet(s) for 
[grind11.graddelt.com] [1296:1298]
Sep 26 11:09:58 grind12 heartbeat: [16611]: info: remote resource 
transition completed.
Sep 26 11:09:58 grind12 heartbeat: [16611]: info: No pkts missing from 
grind11.graddelt.com!
Sep 26 11:09:58 grind12 heartbeat: [16611]: info: Other node completed 
standby takeover of foreign resources.
Sep 26 11:09:58 grind12 ipfail: [16635]: debug: Other side is now stable.

[root@grind12 ~]# tail -f /var/log/ldirectord.log
[Wed Sep 26 11:09:44 2007|ldirectord.cf|16597] Invoking ldirectord 
invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf status
[Wed Sep 26 11:09:44 2007|ldirectord.cf|16597] Exiting with exit_status 
3: Exiting from ldirectord status
[Wed Sep 26 11:09:56 2007|ldirectord.cf|16846] Invoking ldirectord 
invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf stop

[root@grind11 ~]# tail -f /var/log/ha_log
Sep 26 11:09:45 grind11 heartbeat: [11851]: info: Link 
grind12.graddelt.com:eth3 up.
Sep 26 11:09:45 grind11 heartbeat: [11851]: info: Status update for node 
grind12.graddelt.com: status init
Sep 26 11:09:45 grind11 heartbeat: [11851]: info: Status update for node 
grind12.graddelt.com: status up
Sep 26 11:09:45 grind11 heartbeat: [11851]: debug: 
StartNextRemoteRscReq(): child count 1
Sep 26 11:09:45 grind11 heartbeat: [11851]: debug: get_delnodelist: 
delnodelist=
Sep 26 11:09:45 grind11 ipfail: [11860]: info: Link Status update: Link 
grind12.graddelt.com/eth3 now has status up
Sep 26 11:09:45 grind11 ipfail: [11860]: info: Status update: Node 
grind12.graddelt.com now has status init
Sep 26 11:09:45 grind11 heartbeat: [15508]: debug: notify_world: setting 
SIGCHLD Handler to SIG_DFL
Sep 26 11:09:45 grind11 ipfail: [11860]: info: Status update: Node 
grind12.graddelt.com now has status up
Sep 26 11:09:45 grind11 harc[15508]: info: Running /etc/ha.d/rc.d/status 
status
Sep 26 11:09:45 grind11 heartbeat: [15525]: debug: notify_world: setting 
SIGCHLD Handler to SIG_DFL
Sep 26 11:09:45 grind11 harc[15525]: info: Running /etc/ha.d/rc.d/status 
status
Sep 26 11:09:46 grind11 heartbeat: [11851]: info: Status update for node 
grind12.graddelt.com: status active
Sep 26 11:09:46 grind11 ipfail: [11860]: info: Status update: Node 
grind12.graddelt.com now has status active
Sep 26 11:09:46 grind11 heartbeat: [15541]: debug: notify_world: setting 
SIGCHLD Handler to SIG_DFL
Sep 26 11:09:46 grind11 ipfail: [11860]: debug: Got join message from 
another ipfail client. (grind12.graddelt.com)
Sep 26 11:09:46 grind11 harc[15541]: info: Running /etc/ha.d/rc.d/status 
status
Sep 26 11:09:46 grind11 heartbeat: [11851]: info: remote resource 
transition completed.
Sep 26 11:09:46 grind11 heartbeat: [11851]: info: grind11.graddelt.com 
wants to go standby [foreign]
Sep 26 11:09:46 grind11 ipfail: [11860]: info: Asking other side for 
ping node count.
Sep 26 11:09:46 grind11 ipfail: [11860]: debug: Message [num_ping] sent.
Sep 26 11:09:46 grind11 ipfail: [11860]: debug: Other side is unstable.
Sep 26 11:09:47 grind11 ipfail: [11860]: debug: Other side is now stable.
Sep 26 11:09:47 grind11 heartbeat: [11851]: info: standby: 
grind12.graddelt.com can take our foreign resources
Sep 26 11:09:47 grind11 heartbeat: [15557]: info: give up foreign HA 
resources (standby).
Sep 26 11:09:47 grind11 heartbeat: [15557]: info: foreign HA resource 
release completed (standby).
Sep 26 11:09:47 grind11 heartbeat: [11851]: info: Local standby process 
completed [foreign].
Sep 26 11:09:47 grind11 heartbeat: [11851]: WARN: 1 lost packet(s) for 
[grind12.graddelt.com] [13:15]
Sep 26 11:09:47 grind11 heartbeat: [11851]: info: remote resource 
transition completed.
Sep 26 11:09:47 grind11 heartbeat: [11851]: info: No pkts missing from 
grind12.graddelt.com!
Sep 26 11:09:47 grind11 heartbeat: [11851]: info: Other node completed 
standby takeover of foreign resources.
Sep 26 11:09:47 grind11 ipfail: [11860]: debug: Other side is now stable.
Sep 26 11:09:48 grind11 ipfail: [11860]: debug: Other side is now stable.
Sep 26 11:09:52 grind11 ipfail: [11860]: info: No giveup timer to abort.
Sep 26 11:09:54 grind11 heartbeat: [11851]: info: grind12.graddelt.com 
wants to go standby [foreign]
Sep 26 11:09:55 grind11 ipfail: [11860]: debug: Other side is unstable.
Sep 26 11:09:56 grind11 heartbeat: [11851]: info: standby: acquire 
[foreign] resources from grind12.graddelt.com
Sep 26 11:09:56 grind11 heartbeat: [15570]: info: acquire local HA 
resources (standby).
Sep 26 11:09:56 grind11 ResourceManager[15583]: info: Acquiring resource 
group: grind11.graddelt.com ldirectord::ldirectord.cf 
LVSSyncDaemonSwap::master IPaddr2::172.31.1.10/24/eth1/172.31.1.255
Sep 26 11:09:57 grind11 ResourceManager[15583]: info: Running 
/etc/ha.d/resource.d/ldirectord ldirectord.cf start
Sep 26 11:09:57 grind11 ResourceManager[15583]: debug: Starting 
/etc/ha.d/resource.d/ldirectord ldirectord.cf start
Sep 26 11:09:57 grind11 ResourceManager[15583]: debug: 
/etc/ha.d/resource.d/ldirectord ldirectord.cf start done. RC=0
Sep 26 11:09:57 grind11 IPaddr2[15678]: INFO:  Running OK
Sep 26 11:09:57 grind11 heartbeat: [15570]: info: local HA resource 
acquisition completed (standby).
Sep 26 11:09:57 grind11 heartbeat: [11851]: info: Standby resource 
acquisition done [foreign].
Sep 26 11:09:58 grind11 heartbeat: [11851]: info: remote resource 
transition completed.
Sep 26 11:09:58 grind11 ipfail: [11860]: debug: Other side is now stable.

[root@grind11 ~]# ipvsadm
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  172.31.1.10:http rr persistent 600
  -> grind11.graddelt.com:http    Local   1      0          0

-------
COMMENT
-------

ipvsadm still doesn't show the second real server ... :-(


