

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Praise and Help Request
From: Jeff Amaral <jamaral@xxxxxxx>
Date: Fri, 11 Feb 2005 13:26:08 -0500

First, the praise. I just built a two node cluster using heartbeat, ldirectord, and apache, using this document as a guideline (although I am _not_ using ultramonkey):

When all is well, the first node is assigned the virtual IP address and is running ldirectord. Both of these are configured through heartbeat to move to the second node if there is a problem.

ldirectord is configured to use both nodes for web traffic, and most of the time, the http traffic is balanced between the two servers.

This setup rocks, as do ldirectord and the other LVS components. With only two servers, I get redundancy _and_ load balancing. So, thanks for great, free software that is saving me $$$ on hardware.

Here's my problem:

After some period of inactivity, I'll try to connect via a web browser to the virtual IP address, and one of two things will happen:
1. I'll get an error (in Mozilla, anyway) saying 'The document contains
   no data.' and the request will fail. Subsequent connection attempts
   will work fine, but only appear to connect to node 2.
2. I won't get an error, but I'll only get the second node in the
   cluster, and all subsequent requests will only go to node 2.

Here's my only clue:

If I run the ipvsadm command while load balancing is working correctly and I'm running apache bench from a computer outside the cluster, I'll see something like this:
las1:~ # ipvsadm
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP wlc
  ->            Local   1      10         2700
  ->            Route   1      9          2616

When load balancing is not working and I run the same apache bench command again, I'll see this:
las1:~ # ipvsadm
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP wlc
  ->            Local   1      0          0
  ->            Route   1      0          0

There'll be no active connections, even to node 2! But apache bench will be running and returning data just fine (but only from node 2).
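
The difference between the two listings above can be checked mechanically. What follows is a minimal Python sketch, not part of the original setup: it parses ipvsadm-style text and flags the state where real servers are listed but none shows any active or inactive connections. The names parse_real_servers and table_looks_idle are hypothetical, and the addresses in SAMPLE are placeholders from the documentation range, since the real ones are not shown in the listings.

```python
import re

# Matches a real-server line such as:
#   -> 192.0.2.11:80   Route   1   9   2616
# The column header line does not match, because "Forward" is not
# one of the forwarding-method keywords.
REAL_SERVER = re.compile(
    r"^\s*->\s+(?P<addr>\S+)\s+(?P<fwd>Local|Route|Masq|Tunnel)\s+"
    r"(?P<weight>\d+)\s+(?P<active>\d+)\s+(?P<inactive>\d+)\s*$"
)


def parse_real_servers(ipvsadm_output):
    """Return one dict per real-server line found in the given text."""
    servers = []
    for line in ipvsadm_output.splitlines():
        m = REAL_SERVER.match(line)
        if m:
            servers.append({
                "addr": m.group("addr"),
                "forward": m.group("fwd"),
                "weight": int(m.group("weight")),
                "active": int(m.group("active")),
                "inactive": int(m.group("inactive")),
            })
    return servers


def table_looks_idle(servers):
    """True when real servers are listed but none shows any connection."""
    return bool(servers) and all(
        s["active"] == 0 and s["inactive"] == 0 for s in servers
    )


# Hypothetical sample: placeholder addresses, counters as in the
# "not working" listing.
SAMPLE = """\
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.0.2.100:80 wlc
  -> 192.0.2.10:80                Local   1      0          0
  -> 192.0.2.11:80                Route   1      0          0
"""

if __name__ == "__main__":
    servers = parse_real_servers(SAMPLE)
    for s in servers:
        print(s["addr"], s["forward"], s["active"], s["inactive"])
    print("idle table:", table_looks_idle(servers))
```

Feeding this the captured output of ipvsadm while apache bench is running would show whether the director's table is counting the traffic at all. An idle table during a running benchmark suggests the requests are reaching a node without passing through IPVS, though the sketch itself cannot say why.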

If I restart heartbeat (thus restarting ldirectord), load balancing will work properly again, but only for a while.

Any ideas?

I'm running this on SUSE Enterprise 9 with the latest versions of stock SUSE packages.
heartbeat 1.2.3
ldirectord 1.2.3
ipvsadm 1.24

Thanks for any help anyone might have,

Jeff Amaral
MoS Web Team
