Hello,
First, the praise. I just built a two-node cluster using heartbeat,
ldirectord, and apache, using this document as a guideline (although I
am _not_ using ultramonkey):
http://www.ultramonkey.org/2.0.1/topologies/sl-ha-lb-eg.html
When all is well, the first node is assigned the virtual IP address and
is running ldirectord. Both of these are configured through heartbeat to
move to the second node if there is a problem.
ldirectord is configured to use both nodes for web traffic, and most of
the time, the http traffic is balanced between the two servers.
This setup rocks, as do ldirectord and the other LVS components. With only
two servers, I get redundancy _and_ load balancing. So, thanks for
great, free software that is saving me $$$ on hardware.
Here's my problem:
After some period of inactivity, I'll try to connect via a web browser
to the virtual IP address, and one of two things will happen:
1. I'll get an error (in Mozilla, anyway) saying 'The document contains
no data.' and the request will fail. Subsequent connection attempts
will work fine, but only appear to connect to node 2.
2. I won't get an error, but I'll only get the second node in the
cluster, and all subsequent requests will only go to node 2.
Here's my only clue:
If I run the ipvsadm command when load balancing is working correctly
and I'm running apache bench from a computer outside the cluster, I'll
see something like this:
----------------------------------------------------------------------
las1:~ # ipvsadm
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  las-cluster.mos.org:http wlc
  -> las1.mos.org:http            Local   1      10         2700
  -> las2.mos.org:http            Route   1      9          2616
----------------------------------------------------------------------
When load balancing is not working and I run the same apache bench
command again, I'll see this:
----------------------------------------------------------------------
las1:~ # ipvsadm
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  las-cluster.mos.org:http wlc
  -> las1.mos.org:http            Local   1      0          0
  -> las2.mos.org:http            Route   1      0          0
----------------------------------------------------------------------
There'll be no active connections, even to node 2! But apache bench will
be running and returning data just fine (but only from node 2).
If I restart heartbeat (thus restarting ldirectord), load balancing will
work properly again, but only for a while.
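In case it helps anyone reproduce or watch for this state, here is a rough
Python sketch that parses ipvsadm output like the listings above and flags
the case where every real server shows zero active and inactive
connections. The function names (parse_real_servers, all_idle) are made up
for illustration, and the row pattern is only assumed from the column layout
shown in this message, so treat it as a starting point rather than a tested
monitoring tool.

```python
import re
from typing import Dict, List

# Matches real-server rows such as:
#   -> las1.mos.org:http Local 1 10 2700
# The column header row is skipped because "Weight" is not numeric.
REAL_ROW = re.compile(
    r"^\s*->\s+(?P<address>\S+)\s+(?P<forward>\S+)\s+(?P<weight>\d+)"
    r"\s+(?P<active>\d+)\s+(?P<inactive>\d+)\s*$"
)


def parse_real_servers(text: str) -> List[Dict[str, object]]:
    """Return one dict per real-server row found in ipvsadm output."""
    rows: List[Dict[str, object]] = []
    for line in text.splitlines():
        match = REAL_ROW.match(line)
        if match:
            rows.append({
                "address": match.group("address"),
                "forward": match.group("forward"),
                "weight": int(match.group("weight")),
                "active": int(match.group("active")),
                "inactive": int(match.group("inactive")),
            })
    return rows


def all_idle(rows: List[Dict[str, object]]) -> bool:
    """True when at least one real server is listed and every one of them
    shows zero active and zero inactive connections."""
    return bool(rows) and all(
        row["active"] == 0 and row["inactive"] == 0 for row in rows
    )
```

The idea would be to capture the text of ipvsadm while apache bench is
running and pass it to parse_real_servers; if all_idle returns True while
traffic is clearly flowing, the director is in the state described above.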
Any ideas?
I'm running this on SUSE Enterprise 9 with the latest versions of stock
SUSE packages.
heartbeat 1.2.3
ldirectord 1.2.3
ipvsadm 1.24
Thanks for any help anyone might have,
Jeff
--
Jeff Amaral
MoS Web Team
jamaral@xxxxxxx
617-589-0427