Re: Our LVS/DR backends freezes

To: Joseph Mack NA3T <jmack@xxxxxxxx>
Subject: Re: Our LVS/DR backends freezes
Cc: " users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
From: Olle Östlund <olle@xxxxxxxxxxx>
Date: Wed, 29 Nov 2006 16:02:43 +0100
On Tue, 2006-11-28 at 10:40 -0800, Joseph Mack NA3T wrote:
> On Tue, 28 Nov 2006, Olle Östlund wrote:
> > 1) LVS/DR is an exotic technique ...
> it's been very well tested and is the most often implemented 
> type of LVS setup. Admittedly having two machines handling 
> one connection requires some thinking about tcpip, but when 
> you consider the two machines as a black box, it functions 
> as a single box.

OK, so it's the most common LVS setup. Well, we also picked it as our
first choice. Its architecture is attractive...

> > 2) The director's ipvs connection-table gives a very bad picture of the
> > real world's connection-situation when using LVS/DR.
> It's an estimate, usually within a factor of 2. Your setup 
> has problems, but we don't know what yet.


I'm afraid the focus has shifted to the ipvs connection table. As far as
I know it's not causing us any real trouble; it merely shows very rough
figures. Unless it is related to our hanging realservers?

If I have understood things correctly, the director's figures for
connections in FIN_WAIT or CLOSE do not correspond to real connections
at the realserver (I have not been able to see them using netstat, as
far as I know). Thus, they are not a count of the connections consumed
at the realserver. If all this is true, I see no relation between the
odd figures in the director's connection table and the hanging
realservers.
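
One way to make the realserver side of this comparison less dependent on
eyeballing netstat is to tally sockets by TCP state directly. The
following is only a minimal sketch, assuming a Linux realserver that
exposes /proc/net/tcp; the function name count_tcp_states is made up
for illustration.

```python
from collections import Counter

# TCP state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Tally sockets by TCP state from the text of /proc/net/tcp.

    Hypothetical helper: takes the file contents as a string so it can
    be fed saved snapshots as well as the live file.
    """
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        code = fields[3].upper()
        counts[TCP_STATES.get(code, code)] += 1
    return counts
```

On the realserver, passing in the contents of /proc/net/tcp gives
per-state totals that can be set beside the director's figures. If the
realserver shows few or no sockets in the FIN_WAIT or CLOSE states while
the director lists many, that would support the reading above.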

On the other hand, if the FIN_WAIT/CLOSE connections in the director's
connection table do exist in some form at the realserver and consume
resources in the realserver's kernel, I guess they could cause a hanging
realserver once all the resources have been consumed. If this is
possible, it would explain our problems.
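
To test this idea, the director's entries could be tallied per
realserver and state and then compared with what each realserver itself
reports. The sketch below is an illustration only. It assumes the text
comes from "ipvsadm -L -n -c" and that each data row has six
whitespace-separated columns (pro, expire, state, source, virtual,
destination); the function name tally_ipvs_states is hypothetical.

```python
from collections import Counter


def tally_ipvs_states(ipvsadm_text):
    """Count director-side connection entries per (destination, state).

    Assumes rows shaped like:
        TCP 01:54 FIN_WAIT 192.168.1.10:3456 10.0.0.1:80 10.0.0.2:80
    Header lines and anything else that does not fit are skipped.
    """
    counts = Counter()
    for line in ipvsadm_text.splitlines():
        fields = line.split()
        if len(fields) != 6 or fields[0] not in ("TCP", "UDP"):
            continue  # not a connection row under the assumed layout
        _pro, _expire, state, _source, _virtual, destination = fields
        counts[(destination, state)] += 1
    return counts
```

A large FIN_WAIT or CLOSE count for a realserver on the director, with
no matching sockets on that realserver, would argue against the
resource-consumption explanation; matching counts would argue for it.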

What are your thoughts about this?

> > We will most likely look into switching to LVS/NAT, which adhere better
> > to network protocols and which kernel-logic may be less exotic. Whether
> > it is more used/better tested I don't know? What do you think?
> you want something that works. You don't care what the 
> problem is. If LVS-NAT works for you and LVS-DR doesn't, 
> then that's all that counts.

Well, our number one priority is still to identify and solve the
problem. It's the only way of making sure it won't pop up again. LVS/NAT
is our escape-route, but I feel we are not ready for escape yet.

Tonight I will reboot one of our realservers. This will reset the
leaking resources of that server, while the other server has been
leaking for days and will keep on leaking. If the next freeze hits
just one server (the one that was not rebooted), it confirms our
"leaking resources" theory. If both freeze, I'm out of

Finally, I'm very impressed by the way you read and answer postings to
this list. By far the best list I've participated in. Keep up the good