Re: weird problems with my cluster.

To: Nico Lumma <nl@xxxxxxxxxxxxxx>
Subject: Re: weird problems with my cluster.
Cc: <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
From: Julian Anastasov <ja@xxxxxx>
Date: Mon, 28 Jan 2002 12:38:17 +0200 (EET)

On Mon, 28 Jan 2002, Nico Lumma wrote:

> Heya,
> I am using a LVS-DR setup with a load-balancer (and one backup) and 6
> realservers. All realservers are running a 2.4.16-SUSE kernel.
> The setup is as follows:
> All realservers have one internal IP (192.168.100.x) on eth0 and various
> real IPs on eth1, eth1:1, etc. (range from
> The loadbalancer's MAC-address is statically set in the cisco router in
> order to avoid the arp-problem.

        Well, I assume you don't use any solutions to fix the ARP
problem in the real servers.

> All realservers and loadbalancers are connected to one switch, which is
> connected to the gateway.
> So far so good.
> 4 out of 6 realservers behave normally. They get the MAC-address of the
> gateway and happily route the packets to the gateway. 2 realservers, the
> 2 SMP-servers in my setup, behave rather strange. After a while they
> "forget" the MAC-address of the gateway and cannot recover it, even
> though all other realservers still have the right MAC-address. A tcpdump
> on the interface always leads to the same: the server is trying to send
> one ACK-packet to a client and is repeating this forever (I can't post

        What about ARP? Do you see unanswered ARP probes that look
like this:

who-has GW tell VIP (MAC of RS)
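
A rough sketch of how such unanswered probes could be picked out of a saved capture. It assumes the text lines come from something like "tcpdump -n arp" and contain the usual "who-has X tell Y" and "X is-at MAC" phrases. The function name and the sample addresses (documentation range 192.0.2.0/24) are only illustrative, and the exact line layout differs between tcpdump versions.

```python
import re
from collections import Counter

# "who-has TARGET tell SENDER", optionally with a "(mac)" after TARGET.
WHO_HAS = re.compile(r"who-has\s+(\S+)(?:\s+\(\S+\))?\s+tell\s+([^\s,]+)")
# "TARGET is-at MAC" as printed for ARP replies.
IS_AT = re.compile(r"(\S+)\s+is-at\s+(\S+)")


def unanswered_probes(lines):
    """Count who-has probes whose target never shows up in an is-at reply.

    Returns a dict mapping (target_ip, sender_ip) to the number of probes.
    """
    probes = Counter()
    answered = set()
    for line in lines:
        match = WHO_HAS.search(line)
        if match:
            probes[(match.group(1), match.group(2))] += 1
            continue
        match = IS_AT.search(line)
        if match:
            answered.add(match.group(1))
    return {key: n for key, n in probes.items() if key[0] not in answered}


if __name__ == "__main__":
    # Hypothetical capture: GW = 192.0.2.1, VIP = 192.0.2.100.
    sample = [
        "12:00:01.0 ARP, Request who-has 192.0.2.1 tell 192.0.2.100, length 28",
        "12:00:02.0 ARP, Request who-has 192.0.2.1 tell 192.0.2.100, length 28",
    ]
    for (target, sender), count in unanswered_probes(sample).items():
        print(f"{count} unanswered: who-has {target} tell {sender}")
```

If the sender in the unanswered probes is the VIP and the target is the GW, that matches the pattern above.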

> the line at the moment because right now everything is fine). After a
> reboot everything is fine. The problem with this is, that ldirectord
> doesn't recognize that the realserver is hanging, since the httpd is
> still accepting connections on the internal IP-address. This leads the
> whole cluster to hang and makes me (and my boss) extremely unhappy. :(
> Do you guys have any ideas what's going on? I had the same setup before
> but separated both the internal and external interfaces on two switches,
> or rather switching hubs, and everything worked just fine. To get better
> performance I decided to go with the bigger switch and figured that the
> packets should be routed okay. Well, 4 out of 6 ain't bad, but I'd
> rather have all 6 servers working...

        You can check whether it makes a difference by applying one
of the patches:

or of course, the hidden flag:

        My explanation of a possible problem in your setup:

        By default, Linux resolves other hosts by announcing
any local IP in its ARP probes. When bidirectional communication
takes place, it is possible for the real server to use the VIP as
the source in its probe (resolving the GW). Maybe the GW does not
like this probe because it comes from the VIP (as source IP). The
normal way to reply to an ARP probe is to reply directly to the
sender's hwaddr, but possibly only after making some checks on the
sender's IP (which is what Linux does). If arp_prefsrc solves your
problem, then this is the case with your GW: it does not like
broadcast ARP probes with the VIP as source coming from real servers.
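
To make the explanation above concrete, here is a toy model of how the sender IP of an ARP probe might be chosen. It is not kernel code. The function name, the prefer_primary switch (standing in for what a fix such as arp_prefsrc is meant to achieve) and the addresses (documentation range 192.0.2.0/24) are all assumptions made for illustration.

```python
from ipaddress import ip_address


def arp_probe_source(packet_src, local_addrs, iface_primary, prefer_primary=False):
    """Pick the sender IP placed in an ARP probe (simplified model).

    packet_src     source IP of the packet waiting for resolution
    local_addrs    all IPs configured on this host, VIP included
    iface_primary  primary (non-VIP) address of the outgoing interface
    prefer_primary model of a fix that never announces the VIP
    """
    src = ip_address(packet_src)
    local = {ip_address(addr) for addr in local_addrs}
    if not prefer_primary and src in local:
        # Default case described above: any local IP may be announced,
        # so the source of the outgoing packet (here the VIP) is reused.
        return src
    # With the fix modeled here, the interface's own address is announced.
    return ip_address(iface_primary)


if __name__ == "__main__":
    vip, rip = "192.0.2.100", "192.0.2.11"  # hypothetical VIP and real server IP
    # A reply to a client leaves the real server with the VIP as source.
    print(arp_probe_source(vip, [vip, rip], rip))
    print(arp_probe_source(vip, [vip, rip], rip, prefer_primary=True))
```

In the first call the probe for the GW carries the VIP as sender, which is the case a strict GW may refuse to answer. In the second call it carries the real server's own address.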

> Thanks in advance,
>       Nico


Julian Anastasov <ja@xxxxxx>
