Heya,
I am using an LVS-DR setup with a load-balancer (plus one backup) and 6
realservers. All realservers are running a 2.4.16-SUSE kernel.
The setup is as follows:
All realservers have one internal IP (192.168.100.x) on eth0 and various
real IPs on eth1, eth1:1, etc. (in the range 212.12.37.70-76).
The load-balancer's MAC address is statically set in the Cisco router in
order to avoid the ARP problem.
All realservers and load-balancers are connected to one switch, which is
connected to the gateway.
So far so good.
4 out of 6 realservers behave normally. They learn the MAC address of the
gateway and happily route the packets to it. 2 realservers, the 2 SMP
servers in my setup, behave rather strangely. After a while they
"forget" the MAC address of the gateway and cannot recover it, even
though all the other realservers still have the right MAC address. A
tcpdump on the interface always shows the same thing: the server keeps
trying to send a single ACK packet to a client and repeats it forever
(I can't post the line at the moment because right now everything is
fine). After a reboot everything is fine again. The problem is that
ldirectord doesn't notice that the realserver is hanging, since httpd is
still accepting connections on the internal IP address. This causes the
whole cluster to hang and makes me (and my boss) extremely unhappy. :(
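Since ldirectord's check only sees that httpd still answers, one way to at least detect the stuck state is to look at the realserver's own ARP table. Below is a minimal sketch, assuming a Linux realserver that exposes /proc/net/arp and that the gateway's IP is passed in by hand (it isn't given above). The script and function names (check_gw_arp.py, parse_proc_net_arp, gateway_resolved, check_gateway) are hypothetical, made up for illustration; they are not part of LVS or ldirectord. It only reports the symptom (gateway entry missing or incomplete); it does not explain or fix why the SMP boxes lose the entry.

```python
import sys

# ARP entry flag bit from the kernel's include/linux/if_arp.h:
ATF_COM = 0x02  # entry is complete (hardware address has been resolved)


def parse_proc_net_arp(text):
    """Parse /proc/net/arp contents into {ip: (flags, hw_addr, device)}."""
    entries = {}
    lines = text.strip().splitlines()
    for line in lines[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 6:
            continue  # skip malformed or truncated lines
        ip, _hw_type, flags, hw_addr, _mask, device = fields[:6]
        entries[ip] = (int(flags, 16), hw_addr, device)  # flags look like "0x2"
    return entries


def gateway_resolved(text, gateway_ip):
    """True if gateway_ip has a complete, non-zero ARP entry in text."""
    entry = parse_proc_net_arp(text).get(gateway_ip)
    if entry is None:
        return False  # no entry at all
    flags, hw_addr, _device = entry
    # Incomplete entries show flags 0x0 and an all-zero hardware address.
    return bool(flags & ATF_COM) and hw_addr != "00:00:00:00:00:00"


def check_gateway(gateway_ip, path="/proc/net/arp"):
    """Read the live ARP table from path and check the gateway entry."""
    with open(path) as f:
        return gateway_resolved(f.read(), gateway_ip)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: check_gw_arp.py GATEWAY_IP
    # Exit status 0 = gateway entry resolved, 1 = missing or incomplete.
    sys.exit(0 if check_gateway(sys.argv[1]) else 1)
```

The exit status is meant to be easy to hook into whatever monitoring is at hand (for example a cron job that alerts, or that takes the realserver out of rotation); how that is wired up is an assumption, not something ldirectord does on its own here.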
Do you guys have any idea what's going on? I had the same setup before,
but with the internal and external interfaces separated onto two switches
(or rather switching hubs), and everything worked just fine. To get better
performance I decided to go with the bigger switch and figured that the
packets would still be routed okay. Well, 4 out of 6 ain't bad, but I'd
rather have all 6 servers working...
Thanks in advance,
Nico
--
Nico Lumma nl@xxxxxxxxxxxxxx
orangemedia.de GmbH www.orangemedia.de
Borsteler Chaussee 55 Tel: +49 40 46 85 67 - 20
D-22453 Hamburg Fax: +49 40 46 85 67 - 39