I have an LVS-NAT implementation in the lab that sort of works. I have
a primary and a hot-backup LVS node, and two web servers behind them, all
running RHEL/CentOS 5.2. I can happily point my web browser at the
virtual IP and I get the Apache test page just fine. I check the httpd
access logs on the two real web servers and see that the load is being
distributed.
The problem appears when I test failover of the LVS nodes. I shut the
primary node down, and I see that the backup (lb2) at least attempts to
take over, and seems to do so successfully:
Aug 25 18:21:44 lb2 pulse[5064]: partner dead: activating lvs
Aug 25 18:21:44 lb2 lvs[5083]: starting virtual service glassfish active: 80
Aug 25 18:21:44 lb2 avahi-daemon[3136]: Registering new address record for 10.11.12.10 on eth1.
Aug 25 18:21:44 lb2 avahi-daemon[3136]: Withdrawing address record for 10.11.12.10 on eth1.
Aug 25 18:21:44 lb2 avahi-daemon[3136]: Registering new address record for 10.11.12.10 on eth1.
Aug 25 18:21:44 lb2 avahi-daemon[3136]: Registering new address record for 10.100.13.220 on eth0.
Aug 25 18:21:44 lb2 avahi-daemon[3136]: Withdrawing address record for 10.100.13.220 on eth0.
Aug 25 18:21:44 lb2 avahi-daemon[3136]: Registering new address record for 10.100.13.220 on eth0.
Aug 25 18:21:44 lb2 lvs[5083]: create_monitor for glassfish/gf1 running as pid 5094
Aug 25 18:21:44 lb2 nanny[5094]: starting LVS client monitor for 10.100.13.220:80
Aug 25 18:21:44 lb2 nanny[5095]: starting LVS client monitor for 10.100.13.220:80
Aug 25 18:21:44 lb2 lvs[5083]: create_monitor for glassfish/gf2 running as pid 5095
Aug 25 18:21:44 lb2 nanny[5094]: making 10.11.12.1:80 available
Aug 25 18:21:44 lb2 nanny[5095]: making 10.11.12.2:80 available
Aug 25 18:21:49 lb2 pulse[5085]: gratuitous lvs arps finished
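The takeover timing in the log above can be read off mechanically. What
follows is only a minimal sketch in Python, not something that is part of
the setup described here: the function names (parse, seconds_between) and
the placeholder year are made up for illustration, and the sample lines
are a subset of the log above with the mail wrapping undone.

```python
import re
from datetime import datetime

# A few lines copied from the failover log (unwrapped).
LOG = """\
Aug 25 18:21:44 lb2 pulse[5064]: partner dead: activating lvs
Aug 25 18:21:44 lb2 nanny[5094]: making 10.11.12.1:80 available
Aug 25 18:21:44 lb2 nanny[5095]: making 10.11.12.2:80 available
Aug 25 18:21:49 lb2 pulse[5085]: gratuitous lvs arps finished
"""

# timestamp, host, daemon[pid]: message
LINE_RE = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) (\S+) ([\w-]+)\[(\d+)\]: (.*)$"
)

# syslog timestamps carry no year; this value is an arbitrary placeholder
# and only matters for computing differences within one log.
PLACEHOLDER_YEAR = "2000"


def parse(log_text):
    """Return a list of (datetime, daemon, message) tuples."""
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip anything that is not a syslog-style line
        stamp, _host, daemon, _pid, message = match.groups()
        when = datetime.strptime(
            PLACEHOLDER_YEAR + " " + stamp, "%Y %b %d %H:%M:%S"
        )
        events.append((when, daemon, message))
    return events


def seconds_between(events, start_text, end_text):
    """Seconds from the first message containing start_text to the
    first message containing end_text."""
    start = next(t for t, _, msg in events if start_text in msg)
    end = next(t for t, _, msg in events if end_text in msg)
    return (end - start).total_seconds()


if __name__ == "__main__":
    events = parse(LOG)
    print(seconds_between(events, "partner dead",
                          "gratuitous lvs arps finished"))
```

For the lines above this reports the gap between pulse declaring the
partner dead and pulse reporting that the gratuitous ARPs finished. It
says nothing about whether those ARPs were actually seen by the switch or
the client.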
The problem is that, after the failover, attempts from my web browser to
refresh the page are unsuccessful. The lvs.cf is synchronized between the
LVS nodes. Here's a copy of the config:
serial_no = 49
primary = 10.100.13.96
primary_private = 10.11.12.8
service = lvs
backup_active = 1
backup = 10.100.13.87
backup_private = 10.11.12.9
heartbeat = 1
heartbeat_port = 539
keepalive = 6
deadtime = 10
network = nat
nat_router = 10.11.12.10 eth1:1
nat_nmask = 255.255.255.0
debug_level = NONE
monitor_links = 1
virtual glassfish {
    active = 1
    address = 10.100.13.220 eth0:1
    vip_nmask = 255.255.255.0
    port = 80
    send = "GET / HTTP/1.0\r\n\r\n"
    expect = "HTTP"
    use_regex = 0
    load_monitor = none
    scheduler = wlc
    protocol = tcp
    timeout = 6
    reentry = 15
    quiesce_server = 0
    server gf1 {
        address = 10.11.12.1
        active = 1
        weight = 1
    }
    server gf2 {
        address = 10.11.12.2
        active = 1
        weight = 1
    }
}
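The send/expect pair in the config above amounts to a very simple health
check, which can be reproduced by hand to see whether a given address
answers at all. The following is a rough sketch in Python and not nanny's
actual implementation: the function names (response_ok, probe) are made up
for illustration, matching "expect" against the start of the reply is an
assumption, and reusing the config's timeout value as a socket timeout is
also an assumption.

```python
import socket

SEND = b"GET / HTTP/1.0\r\n\r\n"  # the `send` string from lvs.cf
EXPECT = b"HTTP"                   # the `expect` string from lvs.cf


def response_ok(response, expect=EXPECT):
    """use_regex = 0, so treat `expect` as plain text.

    Assumption: the reply must begin with the expected string.
    """
    return response.startswith(expect)


def probe(host, port=80, timeout=6.0):
    """Connect, send the request, and check the start of the reply.

    Returns False on any connection error or timeout.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(SEND)
            data = b""
            # Read until there are enough bytes to compare, or the peer closes.
            while len(data) < len(EXPECT):
                chunk = sock.recv(4096)
                if not chunk:
                    break
                data += chunk
    except OSError:
        return False
    return response_ok(data)


if __name__ == "__main__":
    # Addresses taken from the config: the two real servers and the VIP.
    for address in ("10.11.12.1", "10.11.12.2", "10.100.13.220"):
        print(address, probe(address))
```

Run from the active LVS node against 10.11.12.1 and 10.11.12.2, and from
the client machine against 10.100.13.220, this would at least separate
"real servers unreachable from lb2" from "VIP unreachable from the
client".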
I believe the problem lies in ARP (the gratuitous ARPs sent on failover),
but I'm not sure how to diagnose this. There are no firewalls between my
browser and the LVS nodes, and I'm using a fairly dumb 100 Mb/s switch
(I also tried a smarter switch, with the same result).
Any help would be greatly appreciated.
Thanks,
James