
UM Streamline HA (2-host LVS) not routing

To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: UM Streamline HA (2-host LVS) not routing
From: "Ben Hollingsworth" <ben.hollingsworth@xxxxxxxxxxxx>
Date: Thu, 22 Feb 2007 11:09:17 -0600
Hey all,

I'm trying to set up a 2-node LVS cluster on fully patched Red Hat Enterprise Linux 4 (I also tried CentOS 4), using UltraMonkey's Streamline HA topology as described at http://www.ultramonkey.org/3/topologies/sl-ha-lb-eg.html. I have two pairs of test boxes: one pair running RHEL4 and one pair running CentOS 4. Production will be a single pair of RHEL4 boxes. For now, I'm only trying to load-balance a single VIP in front of a simple Apache web server. Apache works fine on all boxen, including serving the ldirectord.html file. The only noteworthy difference between my setup and the one on that page is that RHEL4 now includes the arp_ignore settings, so I followed the Debian instructions under "Restricting ARP Advertisements."
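A quick way to confirm that the ARP restrictions actually took effect (for example after a reboot) is to read them back from /proc. The sketch below is only an illustration and is not part of the UltraMonkey instructions. It assumes the commonly used values arp_ignore=1 and arp_announce=2 on `all` and `eth0`, and the helper names are hypothetical.

```python
from pathlib import Path

# Assumed target values (the ones commonly used for LVS-DR hosts that keep
# the VIP on lo); adjust if your instructions differ.
EXPECTED = {"arp_ignore": "1", "arp_announce": "2"}
PROC_ROOT = "/proc/sys/net/ipv4/conf"


def read_arp_settings(iface: str, root: str = PROC_ROOT) -> dict:
    """Return the current arp_ignore/arp_announce values for one interface."""
    base = Path(root) / iface
    return {name: (base / name).read_text().strip() for name in EXPECTED}


def mismatches(ifaces=("all", "eth0"), root: str = PROC_ROOT) -> list:
    """Return (iface, setting, actual) for every value that differs from EXPECTED."""
    problems = []
    for iface in ifaces:
        for name, actual in read_arp_settings(iface, root).items():
            if actual != EXPECTED[name]:
                problems.append((iface, name, actual))
    return problems


if __name__ == "__main__":
    try:
        for iface, name, actual in mismatches():
            print(f"net.ipv4.conf.{iface}.{name} = {actual} (expected {EXPECTED[name]})")
    except OSError as exc:  # e.g. this machine has no eth0
        print(f"could not read sysctl: {exc}")
```

An empty result means both interfaces carry the assumed values. Any line that is printed points at the sysctl that did not stick.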

When I start everything up, it all appears to be running, but I can't ping the VIP from any other box, whether on the same subnet or off it. Issuing "service heartbeat standby" properly fails everything over to the second box, according to ipvsadm, but I still can't ping the VIP from off site. Communication with the real IPs works fine. My guess is that the problem lies in either ldirectord or my routing configuration, but I'm pretty much at a loss by now. Any insight would be greatly appreciated. The weird thing is that I had this working on these very CentOS boxes a few months ago. After a few months of neglect (apart from OS patches) it's now broken, and I can't get it working on RHEL4 either.

In this setup, the real nodes are lvs2 and lvs3, with IPs 172.22.65.33 and 172.22.65.34. The VIP is 172.22.65.36 and the gateway is 172.22.127.254. Heartbeat runs via ucast over two interfaces, but I also tried mcast over eth0 with identical results. eth0 (172.22.*) is the public interface; eth1 (192.168.250.*) is a private subnet visible only to these boxes. Below are my config files from lvs2, followed by the syslog entries.
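Everything on the public side is supposed to share one /18, so the address arithmetic can be sanity-checked with Python's standard `ipaddress` module. The minimal sketch below uses the addresses above. The /18 prefix is taken from eth0's 255.255.192.0 mask in the ifconfig output, and the `off_subnet` helper is a hypothetical name.

```python
import ipaddress

# /18 derived from eth0's "Mask:255.255.192.0"; strict=False lets us pass a host address.
PUBLIC_NET = ipaddress.ip_network("172.22.65.33/18", strict=False)

# Addresses quoted in the post.
HOSTS = {
    "lvs2": "172.22.65.33",
    "lvs3": "172.22.65.34",
    "VIP": "172.22.65.36",
    "gateway": "172.22.127.254",
}


def off_subnet(hosts: dict, net) -> list:
    """Return the names of hosts whose address is NOT inside `net`."""
    return [name for name, ip in hosts.items() if ipaddress.ip_address(ip) not in net]


if __name__ == "__main__":
    print(PUBLIC_NET, "broadcast", PUBLIC_NET.broadcast_address)
    print("off-subnet:", off_subnet(HOSTS, PUBLIC_NET) or "none")
```

With these values the network works out to 172.22.64.0/18 with broadcast 172.22.127.255. That agrees with the Bcast field on eth0 and puts the gateway on-link.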

RPM versions (all pulled from the official CentOS repositories):

heartbeat-stonith-2.0.7-1.c4
heartbeat-pils-2.0.7-1.c4
heartbeat-2.0.7-1.c4
heartbeat-ldirectord-2.0.7-1.c4
ipvsadm-1.24-6

ha.cf:

logfacility     local0
ucast eth0 172.22.65.34         # lvs3 lists .33
ucast eth1 192.168.250.34       # lvs3 lists .33
auto_failback off
node    lvs2.bryanlgh.org
node    lvs3.bryanlgh.org
ping 172.22.127.254
respawn hacluster /usr/lib/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster
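The two ha.cf files differ only in their ucast targets, so both can be generated from one table of addresses, which keeps the mirrored lines from drifting apart. The sketch below is hypothetical: the `NODES` table and `ucast_lines` helper are illustrations and not part of heartbeat. It assumes each node's eth0/eth1 addresses are as given in this post.

```python
# Per-link addresses of each node, as given in this post.
NODES = {
    "lvs2.bryanlgh.org": {"eth0": "172.22.65.33", "eth1": "192.168.250.33"},
    "lvs3.bryanlgh.org": {"eth0": "172.22.65.34", "eth1": "192.168.250.34"},
}


def ucast_lines(me: str, nodes: dict) -> list:
    """Return ha.cf ucast directives for `me`: on each link, target the peer."""
    peers = [name for name in nodes if name != me]
    if me not in nodes or len(peers) != 1:
        raise ValueError("this sketch only handles a two-node cluster")
    peer_addrs = nodes[peers[0]]
    return [f"ucast {iface} {peer_addrs[iface]}" for iface in sorted(nodes[me])]


if __name__ == "__main__":
    for node in NODES:
        print(f"# {node}")
        print("\n".join(ucast_lines(node, NODES)))
```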

haresources:

lvs2.bryanlgh.org       \
       ldirectord::ldirectord.cf \
       LVSSyncDaemonSwap::master \
       IPaddr2::172.22.65.36/18/eth0/172.22.127.255
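The IPaddr2 argument packs several fields into one string. The sketch below splits it and checks that the broadcast address agrees with the prefix length. The field order (address/prefix/interface/broadcast) is assumed from the haresources line above, and the function names are hypothetical.

```python
import ipaddress
from typing import NamedTuple


class IPaddr2Args(NamedTuple):
    ip: ipaddress.IPv4Address
    prefix: int
    nic: str
    broadcast: ipaddress.IPv4Address


def parse_ipaddr2(arg: str) -> IPaddr2Args:
    """Split an 'address/prefix/interface/broadcast' resource argument."""
    ip, prefix, nic, bcast = arg.split("/")
    return IPaddr2Args(ipaddress.IPv4Address(ip), int(prefix), nic,
                       ipaddress.IPv4Address(bcast))


def broadcast_consistent(args: IPaddr2Args) -> bool:
    """True if the broadcast field matches the one implied by address/prefix."""
    net = ipaddress.IPv4Network(f"{args.ip}/{args.prefix}", strict=False)
    return net.broadcast_address == args.broadcast


if __name__ == "__main__":
    vip = parse_ipaddr2("172.22.65.36/18/eth0/172.22.127.255")
    print(vip, "consistent:", broadcast_consistent(vip))
```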

ldirectord.cf:

checktimeout=10
checkinterval=2
autoreload=no
logfile="/var/log/ldirectord.log"
quiescent=yes
virtual=172.22.65.36:80
       fallback=127.0.0.1:80
       real=172.22.65.33:80 gate
       real=172.22.65.34:80 gate
       service=http
       request="ldirectord.html"
       receive="It worked"
       scheduler=rr
       persistent=600
       protocol=tcp
       checktype=negotiate
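The sketch below is a rough Python approximation of what `checktype=negotiate` with `service=http` asks of each real server. It is not ldirectord's code. It assumes the check fetches `http://<real>:<port>/<request>` and treats `receive` as a regular expression searched for in the response body, and it borrows `checktimeout=10` as the default timeout. All function names are hypothetical.

```python
import http.client
import re
import urllib.request


def build_check_url(real_ip: str, port: int, request: str) -> str:
    """URL the sketch fetches; assumes http://<real>:<port>/<request>."""
    return f"http://{real_ip}:{port}/{request.lstrip('/')}"


def response_matches(body: str, receive: str) -> bool:
    """Assumption: `receive` is a regex searched for anywhere in the body."""
    return re.search(receive, body) is not None


def negotiate_check(real_ip: str, port: int, request: str, receive: str,
                    timeout: float = 10.0) -> bool:
    """Fetch the check page from one real server; any failure counts as down."""
    url = build_check_url(real_ip, port, request)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP 4xx/5xx (HTTPError) or a malformed reply.
        return False
    return response_matches(body, receive)
```

Under those assumptions, running `negotiate_check("172.22.65.33", 80, "ldirectord.html", "It worked")` on the active director should return True while Apache on lvs2 is serving the page. That corresponds to the "Restored real server" entries in ldirectord.log.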

[root@lvs2 ha.d]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:50:56:8A:01:10
         inet addr:172.22.65.33  Bcast:172.22.127.255  Mask:255.255.192.0
         inet6 addr: fe80::250:56ff:fe8a:110/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:434881 errors:1 dropped:0 overruns:0 frame:0
         TX packets:330938 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:73445909 (70.0 MiB)  TX bytes:47478824 (45.2 MiB)
         Interrupt:10 Base address:0x1400

eth1      Link encap:Ethernet  HWaddr 00:50:56:8A:54:25
         inet addr:192.168.250.33  Bcast:192.168.250.255  Mask:255.255.255.0
         inet6 addr: fe80::250:56ff:fe8a:5425/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:82990 errors:0 dropped:0 overruns:0 frame:0
         TX packets:82911 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:20602447 (19.6 MiB)  TX bytes:20516532 (19.5 MiB)
         Interrupt:11 Base address:0x1480

lo        Link encap:Local Loopback
         inet addr:127.0.0.1  Mask:255.0.0.0
         inet6 addr: ::1/128 Scope:Host
         UP LOOPBACK RUNNING  MTU:16436  Metric:1
         RX packets:341219 errors:0 dropped:0 overruns:0 frame:0
         TX packets:341219 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:0
         RX bytes:31020352 (29.5 MiB)  TX bytes:31020352 (29.5 MiB)

lo:0      Link encap:Local Loopback
         inet addr:172.22.65.36  Mask:255.255.255.255
         UP LOOPBACK RUNNING  MTU:16436  Metric:1

# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.250.0   0.0.0.0         255.255.255.0   U         0 0          0 eth1
172.22.64.0     0.0.0.0         255.255.192.0   U         0 0          0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth1
0.0.0.0         172.22.127.254  0.0.0.0         UG        0 0          0 eth0
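The kernel picks the most specific (longest-prefix) route that matches a destination. The small Python sketch below performs that lookup over the table above and shows which interface lvs2 would use for a given address. It is a simplification that ignores metrics and policy routing, and `lookup` is a hypothetical name.

```python
import ipaddress

# Routing table from `netstat -rn` on lvs2: (destination, genmask, gateway, iface)
ROUTES = [
    ("192.168.250.0", "255.255.255.0", "0.0.0.0", "eth1"),
    ("172.22.64.0", "255.255.192.0", "0.0.0.0", "eth0"),
    ("169.254.0.0", "255.255.0.0", "0.0.0.0", "eth1"),
    ("0.0.0.0", "0.0.0.0", "172.22.127.254", "eth0"),
]


def lookup(dst: str, routes=ROUTES):
    """Return (gateway, iface) of the most specific route covering dst."""
    addr = ipaddress.ip_address(dst)
    best = None
    for dest, mask, gw, iface in routes:
        net = ipaddress.ip_network(f"{dest}/{mask}")  # netmask form is accepted
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gw, iface)
    if best is None:
        raise LookupError(f"no route to {dst}")
    return best[1], best[2]  # gateway "0.0.0.0" means directly connected
```

For instance, replies to 172.22.65.2 (the box doing the ping below) would leave directly via eth0 with no gateway, while off-site traffic goes via 172.22.127.254 on eth0.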

# service heartbeat start
Starting High-Availability services:
ldirectord is stopped for /etc/ha.d/ldirectord.cf
                                                          [  OK  ]

From ldirectord.log:

[Thu Feb 22 10:51:05 2007|ldirectord.cf|18767] Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf status
[Thu Feb 22 10:51:05 2007|ldirectord.cf|18767] ldirectord is stopped for /etc/ha.d/ldirectord.cf
[Thu Feb 22 10:51:05 2007|ldirectord.cf|18767] Exiting with exit_status 3: Exiting from ldirectord status
[Thu Feb 22 10:51:30 2007|ldirectord.cf|18841] Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf status
[Thu Feb 22 10:51:30 2007|ldirectord.cf|18841] ldirectord is stopped for /etc/ha.d/ldirectord.cf
[Thu Feb 22 10:51:30 2007|ldirectord.cf|18841] Exiting with exit_status 3: Exiting from ldirectord status
[Thu Feb 22 10:51:30 2007|ldirectord.cf|18883] Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf status
[Thu Feb 22 10:51:30 2007|ldirectord.cf|18883] ldirectord is stopped for /etc/ha.d/ldirectord.cf
[Thu Feb 22 10:51:30 2007|ldirectord.cf|18883] Exiting with exit_status 3: Exiting from ldirectord status
[Thu Feb 22 10:51:30 2007|ldirectord.cf|18905] Invoking ldirectord invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf start
[Thu Feb 22 10:51:30 2007|ldirectord.cf|18905] Starting Linux Director v1.143 as daemon
[Thu Feb 22 10:51:30 2007|ldirectord.cf|18907] Added virtual server: 172.22.65.36:80
[Thu Feb 22 10:51:30 2007|ldirectord.cf|18907] Added fallback server: 127.0.0.1:80 ( x 172.22.65.36:80) (Weight set to 1)
[Thu Feb 22 10:51:30 2007|ldirectord.cf|18907] Quiescent real server: 172.22.65.33:80 ( x 172.22.65.36:80) (Weight set to 0)
[Thu Feb 22 10:51:30 2007|ldirectord.cf|18907] Quiescent real server: 172.22.65.34:80 ( x 172.22.65.36:80) (Weight set to 0)
[Thu Feb 22 10:51:31 2007|ldirectord.cf|18907] Restored real server: 172.22.65.33:80 ( x 172.22.65.36:80) (Weight set to 1)
[Thu Feb 22 10:51:31 2007|ldirectord.cf|18907] Deleted fallback server: 127.0.0.1:80 ( x 172.22.65.36:80)
[Thu Feb 22 10:51:31 2007|ldirectord.cf|18907] Restored real server: 172.22.65.34:80 ( x 172.22.65.36:80) (Weight set to 1)

heartbeat.log:

Feb 22 10:51:05 lvs2 heartbeat: [18780]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
Feb 22 10:51:05 lvs2 heartbeat: [18780]: info: **************************
Feb 22 10:51:05 lvs2 heartbeat: [18780]: info: Configuration validated. Starting heartbeat 2.0.7
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: heartbeat: version 2.0.7
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: Heartbeat generation: 13
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: G_main_add_TriggerHandler: Added signal manual handler
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: G_main_add_TriggerHandler: Added signal manual handler
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: Removing /var/run/heartbeat/rsctmp failed, recreating.
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: glib: ucast: bound send socket to device: eth0
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: glib: ucast: bound receive socket to device: eth0
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: glib: ucast: started on port 694 interface eth0 to 172.22.65.34
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: glib: ucast: bound send socket to device: eth1
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: glib: ucast: bound receive socket to device: eth1
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: glib: ucast: started on port 694 interface eth1 to 192.168.250.34
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: glib: ping heartbeat started.
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: Local status now set to: 'up'
Feb 22 10:51:07 lvs2 heartbeat: [18781]: info: Link 172.22.127.254:172.22.127.254 up.
Feb 22 10:51:07 lvs2 heartbeat: [18781]: info: Status update for node 172.22.127.254: status ping
Feb 22 10:51:18 lvs2 heartbeat: [18781]: info: Link lvs3.bryanlgh.org:eth0 up.
Feb 22 10:51:18 lvs2 heartbeat: [18781]: info: Link lvs3.bryanlgh.org:eth1 up.
Feb 22 10:51:18 lvs2 heartbeat: [18781]: info: Status update for node lvs3.bryanlgh.org: status up
Feb 22 10:51:18 lvs2 heartbeat: [18792]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Feb 22 10:51:18 lvs2 harc[18792]: info: Running /etc/ha.d/rc.d/status status
Feb 22 10:51:19 lvs2 heartbeat: [18781]: debug: get_delnodelist: delnodelist=
Feb 22 10:51:19 lvs2 heartbeat: [18781]: info: Comm_now_up(): updating status to active
Feb 22 10:51:19 lvs2 heartbeat: [18781]: info: Local status now set to: 'active'
Feb 22 10:51:19 lvs2 heartbeat: [18781]: info: Starting child client "/usr/lib/heartbeat/ipfail" (1001,104)
Feb 22 10:51:19 lvs2 heartbeat: [18803]: info: Starting "/usr/lib/heartbeat/ipfail" as uid 1001 gid 104 (pid 18803)
Feb 22 10:51:19 lvs2 heartbeat: [18781]: info: Status update for node lvs3.bryanlgh.org: status active
Feb 22 10:51:19 lvs2 heartbeat: [18804]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Feb 22 10:51:19 lvs2 harc[18804]: info: Running /etc/ha.d/rc.d/status status
Feb 22 10:51:19 lvs2 ipfail: [18803]: debug: [We are lvs2.bryanlgh.org]
Feb 22 10:51:19 lvs2 ipfail: [18803]: debug: auto_failback -> 0 (off)
Feb 22 10:51:19 lvs2 ipfail: [18803]: debug: Setting message filter mode
Feb 22 10:51:20 lvs2 ipfail: [18803]: debug: Starting node walk
Feb 22 10:51:20 lvs2 ipfail: [18803]: debug: Cluster node: 172.22.127.254: status: ping
Feb 22 10:51:21 lvs2 ipfail: [18803]: debug: Cluster node: lvs3.bryanlgh.org: status: active
Feb 22 10:51:21 lvs2 ipfail: [18803]: debug: [They are lvs3.bryanlgh.org]
Feb 22 10:51:22 lvs2 ipfail: [18803]: debug: Cluster node: lvs2.bryanlgh.org: status: active
Feb 22 10:51:22 lvs2 ipfail: [18803]: debug: Setting message signal
Feb 22 10:51:22 lvs2 ipfail: [18803]: debug: Waiting for messages...
Feb 22 10:51:23 lvs2 ipfail: [18803]: debug: Got join message from another ipfail client. (lvs3.bryanlgh.org)
Feb 22 10:51:24 lvs2 ipfail: [18803]: debug: Found ping node 172.22.127.254!
Feb 22 10:51:25 lvs2 ipfail: [18803]: info: Asking other side for ping node count.
Feb 22 10:51:25 lvs2 ipfail: [18803]: debug: Message [num_ping] sent.
Feb 22 10:51:27 lvs2 ipfail: [18803]: info: No giveup timer to abort.
Feb 22 10:51:29 lvs2 heartbeat: [18781]: info: remote resource transition completed.
Feb 22 10:51:29 lvs2 heartbeat: [18781]: info: remote resource transition completed.
Feb 22 10:51:29 lvs2 heartbeat: [18781]: info: Initial resource acquisition complete (T_RESOURCES(us))
Feb 22 10:51:29 lvs2 ipfail: [18803]: debug: Other side is now stable.
Feb 22 10:51:30 lvs2 heartbeat: [18814]: info: Local Resource acquisition completed.
Feb 22 10:51:30 lvs2 heartbeat: [18781]: debug: StartNextRemoteRscReq(): child count 1
Feb 22 10:51:30 lvs2 ipfail: [18803]: debug: Other side is now stable.
Feb 22 10:51:30 lvs2 heartbeat: [18844]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Feb 22 10:51:30 lvs2 harc[18844]: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
Feb 22 10:51:30 lvs2 ip-request-resp[18844]: received ip-request-resp ldirectord::ldirectord.cf OK yes
Feb 22 10:51:30 lvs2 ResourceManager[18859]: info: Acquiring resource group: lvs2.bryanlgh.org ldirectord::ldirectord.cf LVSSyncDaemonSwap::master IPaddr2::172.22.65.36/18/eth0/172.22.127.255
Feb 22 10:51:30 lvs2 ResourceManager[18859]: info: Running /etc/ha.d/resource.d/ldirectord ldirectord.cf start
Feb 22 10:51:30 lvs2 ResourceManager[18859]: debug: Starting /etc/ha.d/resource.d/ldirectord ldirectord.cf start
Feb 22 10:51:30 lvs2 ResourceManager[18859]: debug: /etc/ha.d/resource.d/ldirectord ldirectord.cf start done. RC=0
Feb 22 10:51:31 lvs2 ResourceManager[18859]: info: Running /etc/ha.d/resource.d/LVSSyncDaemonSwap master start
Feb 22 10:51:31 lvs2 ResourceManager[18859]: debug: Starting /etc/ha.d/resource.d/LVSSyncDaemonSwap master start
Feb 22 10:51:31 lvs2 LVSSyncDaemonSwap[18968]: info: ipvs_syncbackup down
Feb 22 10:51:31 lvs2 LVSSyncDaemonSwap[18968]: info: ipvs_syncmaster up
Feb 22 10:51:31 lvs2 LVSSyncDaemonSwap[18968]: info: ipvs_syncmaster obtained
Feb 22 10:51:31 lvs2 ResourceManager[18859]: debug: /etc/ha.d/resource.d/LVSSyncDaemonSwap master start done. RC=0
Feb 22 10:51:31 lvs2 IPaddr2[19017]: INFO: IPaddr2 Resource is stopped
Feb 22 10:51:31 lvs2 ResourceManager[18859]: info: Running /etc/ha.d/resource.d/IPaddr2 172.22.65.36/18/eth0/172.22.127.255 start
Feb 22 10:51:31 lvs2 ResourceManager[18859]: debug: Starting /etc/ha.d/resource.d/IPaddr2 172.22.65.36/18/eth0/172.22.127.255 start
Feb 22 10:51:31 lvs2 IPaddr2[19238]: INFO: /usr/lib/heartbeat/send_arp -i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-172.22.65.36 eth0 172.22.65.36 auto 172.22.65.36 ffffffffffff
Feb 22 10:51:31 lvs2 IPaddr2[19156]: INFO: IPaddr2 Success
Feb 22 10:51:31 lvs2 ResourceManager[18859]: debug: /etc/ha.d/resource.d/IPaddr2 172.22.65.36/18/eth0/172.22.127.255 start done. RC=0

[root@lvs2 log]# ipvsadm -L -n
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
 -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  172.22.65.36:80 rr persistent 600
 -> 172.22.65.34:80              Route   1      0          0
 -> 172.22.65.33:80              Local   1      0          0
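When comparing the two directors before and after a failover, it can help to parse `ipvsadm -L -n` output rather than eyeball it. The rough Python sketch below assumes the column layout shown above (ipvsadm 1.24). The parser name and the shape of the returned dictionary are hypothetical.

```python
import re

SAMPLE = """\
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
 -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  172.22.65.36:80 rr persistent 600
 -> 172.22.65.34:80              Route   1      0          0
 -> 172.22.65.33:80              Local   1      0          0
"""

SERVICE_RE = re.compile(r"^(TCP|UDP|FWM)\s+(\S+)\s+(\S+)")
# Requires a dotted IP, so the "-> RemoteAddress:Port" header line is skipped.
REAL_RE = re.compile(r"^\s*->\s+(\d+\.\d+\.\d+\.\d+:\d+)\s+(\w+)\s+(\d+)\s+(\d+)\s+(\d+)")


def parse_ipvsadm(text: str) -> dict:
    """Map each virtual service to its protocol, scheduler and real servers."""
    services = {}
    current = None
    for line in text.splitlines():
        svc = SERVICE_RE.match(line)
        if svc:
            current = svc.group(2)
            services[current] = {"proto": svc.group(1), "scheduler": svc.group(3), "reals": []}
            continue
        real = REAL_RE.match(line)
        if real and current is not None:
            addr, fwd, weight, active, inactive = real.groups()
            services[current]["reals"].append({
                "addr": addr, "forward": fwd, "weight": int(weight),
                "active": int(active), "inactive": int(inactive),
            })
    return services
```

You can feed it the output captured on each director, for example via `subprocess.run(["ipvsadm", "-L", "-n"], capture_output=True, text=True).stdout`, and then diff the two tables.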

From a different box on the same subnet:

# ping 172.22.65.36
PING 172.22.65.36 (172.22.65.36) 56(84) bytes of data.
From 172.22.65.2 icmp_seq=1 Destination Host Unreachable
From 172.22.65.2 icmp_seq=2 Destination Host Unreachable
From 172.22.65.2 icmp_seq=3 Destination Host Unreachable

If you need any more info, just holler.  Thanks.
