
Re: Network funkyness

To: Daniel Burke <smstnitc@xxxxxxxxx>
Subject: Re: Network funkyness
Cc: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From: Julian Anastasov <ja@xxxxxx>
Date: Sun, 30 Jun 2002 13:53:44 +0000 (GMT)
        Hello,

On Sat, 29 Jun 2002, Daniel Burke wrote:

> > I'm coming in on this thread a bit late and I seem
> > to be missing some
> > information here.
> >
>
> In a nutshell, odd things were happening.  A little
> more verbose: over the course of 4 hours (almost to
> the minute) the VIPs would become unreachable.  It

        Could this interval be related to an ARP cache timeout,
perhaps in the uplink router? Maybe this router ARP-resolves the
VIP only once every 4 hours.

> would start with machines on the same segment not
> being able to reach the VIPs, and eventually nobody
> could get to them.  The temp solution was to

        It seems you are not switching to real server mode
correctly; see below.

> force-fail over to the secondary director, and wait
> for it to happen again and force-fail back to the
> primary, rinse, repeat. (not the best of situations in
> a production environment)
>
> It took me almost 20 hours of research (B.O.S.S said I
> wasn't allowed to leave until I had a solution or we
> were scrapping LVS for some other solution), getting
> help from the LAN/WAN team, experimenting with dev
> servers, etc, to find out exactly what was happening
> and fix it.
>
> I started seeing weird things, like "arp -a" was
> showing (incomplete) for the mac address of the VIPs
> when that machine was no longer able to contact it.
> And it didn't happen to all 4 VIPs at once, it was one
> at a time, each almost an hour apart before the rest
> of the network suddenly couldn't get to any of them.

        Where do you see this arp -a output? On some "client"
on the LAN?
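
        To compare what different hosts see, a small filter
like the one below can list the addresses whose ARP entry is
unresolved. It is only a sketch: the function name is made up,
and it assumes the net-tools arp output marks such entries with
the word "incomplete", as in the report above.

```shell
#!/bin/sh
# Sketch: read "arp -an" (or "arp -a") output on stdin and print the
# IPv4 addresses of entries that are still unresolved.
# Assumption: unresolved entries contain the word "incomplete".
incomplete_arp_ips() {
    grep -i 'incomplete' | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}'
}
```

Run it as "arp -an | incomplete_arp_ips" on a client, on a real
server and, if possible, compare with the ARP table of the router.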

> > > I tried several methods to
> > > get to the bottom of the problem, and in the end,
> > > noticed I was setting the hidden flag on the dummy
> > > interface AFTER the 4 ip's were assigned...
> >
> > you have 4 VIPs on each realserver?
>
> The 4 ip's are on the dummy0 interface, using the
> hidden patch to stop ARP.  I can't dial into work
> right now to get the exact lines, but this is
> something like what I had before:
>
> modprobe dummy
> echo 1 > /proc/???/all/hidden
> ifconfig dummy0:0 xxx.xxx.xxx.xxx netmask 255.255.255.255
> ifconfig dummy0:1 xxx.xxx.xxx.xxx netmask 255.255.255.255
> ifconfig dummy0:2 xxx.xxx.xxx.xxx netmask 255.255.255.255
> ifconfig dummy0:3 xxx.xxx.xxx.xxx netmask 255.255.255.255
> echo 1 > /proc/???/dummy/hidden
>
> I saw in some mailing list archives that people were
> adding a line "ifconfig dummy0 0.0.0.0 up" and doing
> the second hidden before assigning any ip's.  That

        Yes, this is the only safe way to bring up hidden IPs.
"0.0.0.0 up" (enable the IP protocol for this device) simply makes
/proc/sys/net/ipv4/conf/dummy0 appear as a directory. The flag is
then set, and only after that are the IPs added, so nobody can ask
for them before the flag is set.
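
        The order can be sketched as a small helper that prints
the commands rather than running them. The function name and the
192.0.2.x addresses used with it are placeholders, and the "hidden"
files exist only on kernels that carry the hidden patch.

```shell
#!/bin/sh
# Sketch only: prints the bring-up commands in the safe order.
# CONF is the sysctl tree mentioned in this mail; the "hidden" entries
# are an assumption about the host (kernel with the hidden patch).
CONF=/proc/sys/net/ipv4/conf

plan_hidden_vips() {
    dev=$1
    shift
    echo "modprobe dummy"
    # Enable IP on the device first so that $CONF/$dev appears.
    echo "ifconfig $dev 0.0.0.0 up"
    # Set the flags while the device still has no VIP.
    echo "echo 1 > $CONF/all/hidden"
    echo "echo 1 > $CONF/$dev/hidden"
    # Only now add the VIPs, one alias per address.
    n=0
    for vip in "$@"; do
        echo "ifconfig $dev:$n $vip netmask 255.255.255.255"
        n=$((n + 1))
    done
}
```

Calling "plan_hidden_vips dummy0 192.0.2.10 192.0.2.11" prints the
sequence for review; after checking it, the output can be run as
root.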

> seemed to do the trick, but made the primary director
> unable to be a real server because heartbeat won't
> startup the ip's (that's minor at this point though).

        If a director changes to real server mode, the VIP
must be moved between dummy0 and ethX. Note that bringing
dummy0 down does not help. First, it appears to stop replies
for all VIPs, not for one VIP. Second, if the last "down"-ed
VIP (in fact, a VIP removed from dummy0) remains configured,
the hidden flag continues to function. This happens because
ifconfig cannot remove the last IP address and because the
kernel continues to consider this VIP to be on dummy0 even
while dummy0 is down. OTOH, ip addr del for the last address
can stop the IP protocol for dummy0 and reset the
dummy0/hidden value to 0.
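
        A possible way to script such a move, again only
printing the commands: the function name, eth0 as the "ethX"
device and the /32 prefix on both devices are assumptions made
for illustration. Because deleting the last address may reset
the flag, the flag is set again before a VIP is added to a
dummy device.

```shell
#!/bin/sh
# Sketch only: prints the commands that move one VIP between devices.
# Assumptions: the VIP is configured as /32 on both devices, the target
# device is up, and the kernel carries the hidden patch.
CONF=/proc/sys/net/ipv4/conf

plan_move_vip() {
    vip=$1
    from=$2
    to=$3
    # Remove the VIP explicitly instead of bringing the device down.
    echo "ip addr del $vip/32 dev $from"
    case $to in
        dummy*)
            # Recreate the conf directory if IP was stopped on the
            # device, then set the flag before the VIP is added.
            echo "[ -d $CONF/$to ] || ifconfig $to 0.0.0.0 up"
            echo "echo 1 > $CONF/$to/hidden"
            ;;
    esac
    echo "ip addr add $vip/32 dev $to"
}
```

For example, "plan_move_vip 192.0.2.10 dummy0 eth0" prints the
steps for taking over as director, and swapping the two device
names prints the steps for going back to real server mode.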

> > your directors are realservers too? Your backup
> > director is a realserver
> > until it has to become the master director?
>
> Until yesterday, for the last 3 months on these
> servers, the master and backup directors were real
> servers also.  We did this for a year with some older
> servers we replaced these with.  The difference is we
> used forwarding with ipchains on the real servers, and
> that worked like a charm.  With the new servers apache
> just didn't like that setup so I was forced to put the
> ip's on dummy0:X.

        Yes, the ipchains redirect method is not suitable
for some servers, but the tproxy support usually allows
binding to any IP (in Linux 2.2). The problem appears when
these servers walk the list of IP addresses.

> Dan.

Regards

--
Julian Anastasov <ja@xxxxxx>


