Hey all,
I'm trying to set up a 2-node LVS cluster on fully patched Red Hat
Enterprise Linux 4 (also tried CentOS 4) using UltraMonkey's Streamline
HA setup as described at
http://www.ultramonkey.org/3/topologies/sl-ha-lb-eg.html. I have two
pairs of test boxes I'm playing with, one running RHEL4 and one running
CentOS 4, but the production setup will be one pair of RHEL4 boxes. For
now, I'm trying to balance a single IP in front of a simple Apache web
server. Apache works fine on all boxen, including serving the
ldirectord.html file. The only noteworthy difference between my setup
and the one on the web page is that RHEL4 now includes the arp_ignore
settings, so I followed the Debian setup under "Restricting Arp
Advertisements."
When I start everything up, it all appears to be running, but I can't
ping the VIP from any other box (on or off the same subnet). Issuing a
"service heartbeat standby" properly fails everything over to the
second box, according to ipvsadm, but I still can't ping the VIP from
off site. Communication with the real IPs works fine. My guess is
that the problem lies in either ldirectord or my routing configuration,
but I'm pretty much at a loss by now. Any insight would be greatly
appreciated. The weird thing is that I had this working on these very
CentOS boxes a few months ago, but after a few months of neglect
(except for OS patches) it's now broken, and I can't get it working
on RHEL4, either.
In this setup, the real nodes are lvs2 and lvs3, with IPs 172.22.65.33
and 172.22.65.34. The VIP is 172.22.65.36. The gateway is
172.22.127.254. Heartbeat runs via ucast over two interfaces, but I
tried mcast over eth0 with identical results. Eth0 (172*) is the public
interface; eth1 (192*) is a private subnet visible only to these boxes.
Below are my config files and syslog entries from lvs2.
RPM versions (all pulled from the official CentOS repositories):
heartbeat-stonith-2.0.7-1.c4
heartbeat-pils-2.0.7-1.c4
heartbeat-2.0.7-1.c4
heartbeat-ldirectord-2.0.7-1.c4
ipvsadm-1.24-6
ha.cf:
logfacility local0
ucast eth0 172.22.65.34 (lvs3 lists .33)
ucast eth1 192.168.250.34 (lvs3 lists .33)
auto_failback off
node lvs2.bryanlgh.org
node lvs3.bryanlgh.org
ping 172.22.127.254
respawn hacluster /usr/lib/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster
haresources:
lvs2.bryanlgh.org \
ldirectord::ldirectord.cf \
LVSSyncDaemonSwap::master \
IPaddr2::172.22.65.36/18/eth0/172.22.127.255
ldirectord.cf:
checktimeout=10
checkinterval=2
autoreload=no
logfile="/var/log/ldirectord.log"
quiescent=yes
virtual=172.22.65.36:80
fallback=127.0.0.1:80
real=172.22.65.33:80 gate
real=172.22.65.34:80 gate
service=http
request="ldirectord.html"
receive="It worked"
scheduler=rr
persistent=600
protocol=tcp
checktype=negotiate
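For anyone wanting to reproduce the health check by hand, here is a rough Python sketch of what the negotiate check configured above amounts to: request ldirectord.html from each real server and look for the receive string in the body. It is only an approximation written for illustration, not ldirectord's actual code; the function names are made up, and treating the receive value as a regular expression is an assumption.

```python
import http.client
import re
import urllib.request

# Values copied from ldirectord.cf above.
REAL_SERVERS = ["172.22.65.33", "172.22.65.34"]
REQUEST = "ldirectord.html"
RECEIVE = "It worked"
CHECK_TIMEOUT = 10.0  # mirrors checktimeout=10


def body_matches(body: str, receive: str = RECEIVE) -> bool:
    """Return True if the expected pattern occurs anywhere in the body."""
    return re.search(receive, body) is not None


def check_real_server(host: str, port: int = 80) -> bool:
    """Fetch the check page from one real server and test its body."""
    url = f"http://{host}:{port}/{REQUEST}"
    try:
        with urllib.request.urlopen(url, timeout=CHECK_TIMEOUT) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, malformed reply.
        return False
    return body_matches(body)


def report() -> None:
    """Print an up/down line for every configured real server."""
    for host in REAL_SERVERS:
        print(host, "up" if check_real_server(host) else "down")
```

Calling report() from a box on the same subnet should print "up" for both real servers if Apache is serving the check page, which would match the "Restored real server" lines in ldirectord.log.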
[root@lvs2 ha.d]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:50:56:8A:01:10
inet addr:172.22.65.33 Bcast:172.22.127.255 Mask:255.255.192.0
inet6 addr: fe80::250:56ff:fe8a:110/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:434881 errors:1 dropped:0 overruns:0 frame:0
TX packets:330938 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:73445909 (70.0 MiB) TX bytes:47478824 (45.2 MiB)
Interrupt:10 Base address:0x1400
eth1 Link encap:Ethernet HWaddr 00:50:56:8A:54:25
inet addr:192.168.250.33 Bcast:192.168.250.255 Mask:255.255.255.0
inet6 addr: fe80::250:56ff:fe8a:5425/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:82990 errors:0 dropped:0 overruns:0 frame:0
TX packets:82911 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:20602447 (19.6 MiB) TX bytes:20516532 (19.5 MiB)
Interrupt:11 Base address:0x1480
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:341219 errors:0 dropped:0 overruns:0 frame:0
TX packets:341219 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:31020352 (29.5 MiB) TX bytes:31020352 (29.5 MiB)
lo:0 Link encap:Local Loopback
inet addr:172.22.65.36 Mask:255.255.255.255
UP LOOPBACK RUNNING MTU:16436 Metric:1
# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.250.0   0.0.0.0         255.255.255.0   U         0 0          0 eth1
172.22.64.0     0.0.0.0         255.255.192.0   U         0 0          0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth1
0.0.0.0         172.22.127.254  0.0.0.0         UG        0 0          0 eth0
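As a sanity check on the addressing, the /18 prefix and broadcast address given to IPaddr2 in haresources can be compared against the ifconfig and routing output above. The following is a small Python sketch using the standard ipaddress module; the variable names are illustrative only.

```python
import ipaddress

# Values copied from haresources, ifconfig, and the routing table above.
vip = ipaddress.ip_interface("172.22.65.36/18")
declared_broadcast = ipaddress.ip_address("172.22.127.255")
gateway = ipaddress.ip_address("172.22.127.254")
real_servers = [
    ipaddress.ip_address("172.22.65.33"),
    ipaddress.ip_address("172.22.65.34"),
]

network = vip.network  # the subnet implied by the VIP and its /18 prefix

print("network:  ", network)
print("netmask:  ", network.netmask)
print("broadcast:", network.broadcast_address)
print("broadcast matches haresources:",
      network.broadcast_address == declared_broadcast)
print("gateway on-link:", gateway in network)
print("real servers on-link:", all(rs in network for rs in real_servers))
```

The VIP, both real servers, and the gateway all fall inside 172.22.64.0/18, and the computed netmask and broadcast agree with what ifconfig reports for eth0, so the addressing in the configs is at least internally consistent.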
# service heartbeat start
Starting High-Availability services:
ldirectord is stopped for /etc/ha.d/ldirectord.cf
[ OK ]
From ldirectord.log:
[Thu Feb 22 10:51:05 2007|ldirectord.cf|18767] Invoking ldirectord
invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf status
[Thu Feb 22 10:51:05 2007|ldirectord.cf|18767] ldirectord is stopped for
/etc/ha.d/ldirectord.cf
[Thu Feb 22 10:51:05 2007|ldirectord.cf|18767] Exiting with exit_status
3: Exiting from ldirectord status
[Thu Feb 22 10:51:30 2007|ldirectord.cf|18841] Invoking ldirectord
invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf status
[Thu Feb 22 10:51:30 2007|ldirectord.cf|18841] ldirectord is stopped for
/etc/ha.d/ldirectord.cf
[Thu Feb 22 10:51:30 2007|ldirectord.cf|18841] Exiting with exit_status
3: Exiting from ldirectord status
[Thu Feb 22 10:51:30 2007|ldirectord.cf|18883] Invoking ldirectord
invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf status
[Thu Feb 22 10:51:30 2007|ldirectord.cf|18883] ldirectord is stopped for
/etc/ha.d/ldirectord.cf
[Thu Feb 22 10:51:30 2007|ldirectord.cf|18883] Exiting with exit_status
3: Exiting from ldirectord status
[Thu Feb 22 10:51:30 2007|ldirectord.cf|18905] Invoking ldirectord
invoked as: /etc/ha.d/resource.d/ldirectord ldirectord.cf start
[Thu Feb 22 10:51:30 2007|ldirectord.cf|18905] Starting Linux Director
v1.143 as daemon
[Thu Feb 22 10:51:30 2007|ldirectord.cf|18907] Added virtual server:
172.22.65.36:80
[Thu Feb 22 10:51:30 2007|ldirectord.cf|18907] Added fallback server:
127.0.0.1:80 ( x 172.22.65.36:80) (Weight set to 1)
[Thu Feb 22 10:51:30 2007|ldirectord.cf|18907] Quiescent real server:
172.22.65.33:80 ( x 172.22.65.36:80) (Weight set to 0)
[Thu Feb 22 10:51:30 2007|ldirectord.cf|18907] Quiescent real server:
172.22.65.34:80 ( x 172.22.65.36:80) (Weight set to 0)
[Thu Feb 22 10:51:31 2007|ldirectord.cf|18907] Restored real server:
172.22.65.33:80 ( x 172.22.65.36:80) (Weight set to 1)
[Thu Feb 22 10:51:31 2007|ldirectord.cf|18907] Deleted fallback server:
127.0.0.1:80 ( x 172.22.65.36:80)
[Thu Feb 22 10:51:31 2007|ldirectord.cf|18907] Restored real server:
172.22.65.34:80 ( x 172.22.65.36:80) (Weight set to 1)
heartbeat.log:
Feb 22 10:51:05 lvs2 heartbeat: [18780]: WARN: Logging daemon is
disabled --enabling logging daemon is recommended
Feb 22 10:51:05 lvs2 heartbeat: [18780]: info: **************************
Feb 22 10:51:05 lvs2 heartbeat: [18780]: info: Configuration validated.
Starting heartbeat 2.0.7
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: heartbeat: version 2.0.7
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: Heartbeat generation: 13
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info:
G_main_add_TriggerHandler: Addedsignal manual handler
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info:
G_main_add_TriggerHandler: Addedsignal manual handler
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: Removing
/var/run/heartbeat/rsctmp failed, recreating.
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: glib: ucast: write socket
priority set to IPTOS_LOWDELAY on eth0
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: glib: ucast: bound send
socket to device: eth0
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: glib: ucast: bound
receive socket to device: eth0
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: glib: ucast: started on
port 694interface eth0 to 172.22.65.34
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: glib: ucast: write socket
priority set to IPTOS_LOWDELAY on eth1
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: glib: ucast: bound send
socket to device: eth1
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: glib: ucast: bound
receive socket to device: eth1
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: glib: ucast: started on
port 694interface eth1 to 192.168.250.34
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: glib: ping heartbeat started.
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: G_main_add_SignalHandler:
Added signal handler for signal 17
Feb 22 10:51:05 lvs2 heartbeat: [18781]: info: Local status now set to: 'up'
Feb 22 10:51:07 lvs2 heartbeat: [18781]: info: Link
172.22.127.254:172.22.127.254 up.
Feb 22 10:51:07 lvs2 heartbeat: [18781]: info: Status update for node
172.22.127.254: status ping
Feb 22 10:51:18 lvs2 heartbeat: [18781]: info: Link
lvs3.bryanlgh.org:eth0 up.
Feb 22 10:51:18 lvs2 heartbeat: [18781]: info: Link
lvs3.bryanlgh.org:eth1 up.
Feb 22 10:51:18 lvs2 heartbeat: [18781]: info: Status update for node
lvs3.bryanlgh.org: status up
Feb 22 10:51:18 lvs2 heartbeat: [18792]: debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
Feb 22 10:51:18 lvs2 harc[18792]: info: Running /etc/ha.d/rc.d/status status
Feb 22 10:51:19 lvs2 heartbeat: [18781]: debug: get_delnodelist:
delnodelist=
Feb 22 10:51:19 lvs2 heartbeat: [18781]: info: Comm_now_up(): updating
status to active
Feb 22 10:51:19 lvs2 heartbeat: [18781]: info: Local status now set to:
'active'
Feb 22 10:51:19 lvs2 heartbeat: [18781]: info: Starting child client
"/usr/lib/heartbeat/ipfail" (1001,104)
Feb 22 10:51:19 lvs2 heartbeat: [18803]: info: Starting
"/usr/lib/heartbeat/ipfail" as uid 1001 gid 104 (pid 18803)
Feb 22 10:51:19 lvs2 heartbeat: [18781]: info: Status update for node
lvs3.bryanlgh.org: status active
Feb 22 10:51:19 lvs2 heartbeat: [18804]: debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
Feb 22 10:51:19 lvs2 harc[18804]: info: Running /etc/ha.d/rc.d/status status
Feb 22 10:51:19 lvs2 ipfail: [18803]: debug: [We are lvs2.bryanlgh.org]
Feb 22 10:51:19 lvs2 ipfail: [18803]: debug: auto_failback -> 0 (off)
Feb 22 10:51:19 lvs2 ipfail: [18803]: debug: Setting message filter mode
Feb 22 10:51:20 lvs2 ipfail: [18803]: debug: Starting node walk
Feb 22 10:51:20 lvs2 ipfail: [18803]: debug: Cluster node:
172.22.127.254: status: ping
Feb 22 10:51:21 lvs2 ipfail: [18803]: debug: Cluster node:
lvs3.bryanlgh.org: status: active
Feb 22 10:51:21 lvs2 ipfail: [18803]: debug: [They are lvs3.bryanlgh.org]
Feb 22 10:51:22 lvs2 ipfail: [18803]: debug: Cluster node:
lvs2.bryanlgh.org: status: active
Feb 22 10:51:22 lvs2 ipfail: [18803]: debug: Setting message signal
Feb 22 10:51:22 lvs2 ipfail: [18803]: debug: Waiting for messages...
Feb 22 10:51:23 lvs2 ipfail: [18803]: debug: Got join message from
another ipfail client. (lvs3.bryanlgh.org)
Feb 22 10:51:24 lvs2 ipfail: [18803]: debug: Found ping node 172.22.127.254!
Feb 22 10:51:25 lvs2 ipfail: [18803]: info: Asking other side for ping
node count.
Feb 22 10:51:25 lvs2 ipfail: [18803]: debug: Message [num_ping] sent.
Feb 22 10:51:27 lvs2 ipfail: [18803]: info: No giveup timer to abort.
Feb 22 10:51:29 lvs2 heartbeat: [18781]: info: remote resource
transition completed.
Feb 22 10:51:29 lvs2 heartbeat: [18781]: info: remote resource
transition completed.
Feb 22 10:51:29 lvs2 heartbeat: [18781]: info: Initial resource
acquisition complete (T_RESOURCES(us))
Feb 22 10:51:29 lvs2 ipfail: [18803]: debug: Other side is now stable.
Feb 22 10:51:30 lvs2 heartbeat: [18814]: info: Local Resource
acquisition completed.
Feb 22 10:51:30 lvs2 heartbeat: [18781]: debug: StartNextRemoteRscReq():
child count 1
Feb 22 10:51:30 lvs2 ipfail: [18803]: debug: Other side is now stable.
Feb 22 10:51:30 lvs2 heartbeat: [18844]: debug: notify_world: setting
SIGCHLD Handler to SIG_DFL
Feb 22 10:51:30 lvs2 harc[18844]: info: Running
/etc/ha.d/rc.d/ip-request-resp ip-request-resp
Feb 22 10:51:30 lvs2 ip-request-resp[18844]: received ip-request-resp
ldirectord::ldirectord.cf OK yes
Feb 22 10:51:30 lvs2 ResourceManager[18859]: info: Acquiring resource
group: lvs2.bryanlgh.org ldirectord::ldirectord.cf
LVSSyncDaemonSwap::master IPaddr2::172.22.65.36/18/eth0/172.22.127.255
Feb 22 10:51:30 lvs2 ResourceManager[18859]: info: Running
/etc/ha.d/resource.d/ldirectord ldirectord.cf start
Feb 22 10:51:30 lvs2 ResourceManager[18859]: debug: Starting
/etc/ha.d/resource.d/ldirectord ldirectord.cf start
Feb 22 10:51:30 lvs2 ResourceManager[18859]: debug:
/etc/ha.d/resource.d/ldirectord ldirectord.cf start done. RC=0
Feb 22 10:51:31 lvs2 ResourceManager[18859]: info: Running
/etc/ha.d/resource.d/LVSSyncDaemonSwap master start
Feb 22 10:51:31 lvs2 ResourceManager[18859]: debug: Starting
/etc/ha.d/resource.d/LVSSyncDaemonSwap master start
Feb 22 10:51:31 lvs2 LVSSyncDaemonSwap[18968]: info: ipvs_syncbackup down
Feb 22 10:51:31 lvs2 LVSSyncDaemonSwap[18968]: info: ipvs_syncmaster up
Feb 22 10:51:31 lvs2 LVSSyncDaemonSwap[18968]: info: ipvs_syncmaster
obtained
Feb 22 10:51:31 lvs2 ResourceManager[18859]: debug:
/etc/ha.d/resource.d/LVSSyncDaemonSwap master start done. RC=0
Feb 22 10:51:31 lvs2 IPaddr2[19017]: INFO: IPaddr2 Resource is stopped
Feb 22 10:51:31 lvs2 ResourceManager[18859]: info: Running
/etc/ha.d/resource.d/IPaddr2 172.22.65.36/18/eth0/172.22.127.255 start
Feb 22 10:51:31 lvs2 ResourceManager[18859]: debug: Starting
/etc/ha.d/resource.d/IPaddr2 172.22.65.36/18/eth0/172.22.127.255 start
Feb 22 10:51:31 lvs2 IPaddr2[19238]: INFO: /usr/lib/heartbeat/send_arp
-i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-172.22.65.36
eth0 172.22.65.36 auto 172.22.65.36 ffffffffffff
Feb 22 10:51:31 lvs2 IPaddr2[19156]: INFO: IPaddr2 Success
Feb 22 10:51:31 lvs2 ResourceManager[18859]: debug:
/etc/ha.d/resource.d/IPaddr2 172.22.65.36/18/eth0/172.22.127.255 start
done. RC=0
[root@lvs2 log]# ipvsadm -L -n
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 172.22.65.36:80 rr persistent 600
-> 172.22.65.34:80 Route 1 0 0
-> 172.22.65.33:80 Local 1 0 0
From A Different Box On Same Subnet# ping 172.22.65.36
PING 172.22.65.36 (172.22.65.36) 56(84) bytes of data.
From 172.22.65.2 icmp_seq=1 Destination Host Unreachable
From 172.22.65.2 icmp_seq=2 Destination Host Unreachable
From 172.22.65.2 icmp_seq=3 Destination Host Unreachable
If you need any more info, just holler. Thanks.