On Tue, Oct 30, 2007 at 02:22:10AM +0200, Rumen Bogdanovski wrote:
> Hi all,
> I just saw a patch for the proper timeout set at the backup for the
> received connections, well I want to share my opinion, since I have been
> messing with the very same code last several days.
> I think setting the timeout to 3 minutes for all received connections
> has a very good reason.
> IMHO setting a timeout of [IP_VS_TCP_S_ESTABLISHED] = 15*60*HZ is wrong
> since AFAIK there is no way for the master to inform the backup if a
> connection is closed or fin_wait or whatever. Connection sending is
> based on packet count, isn't it? So imagine A TCP connection lasting 3
> seconds which is going to hang on the backup for 15 more minutes. Now
> imagine 1000 connections lasting several seconds on the master hanging
> for 15 minutes on the backup. I think this timeout should be kept
> reasonably low to keep minimal number of hanging connections and
> reasonably high not to timeout until next update.
> However if the backup takes over it will set the proper timeouts as
> defined in "static int xxx_timeouts[IP_VS_XXX_S_LAST+1]" for all the
> Well I might be wrong, but I just wanted point the attention of the
> people who know how everything works to this potential problem :)
You are right that increasing the timeout will likely result in an
increased number of connections on the backup linux-director. But I'm
not entirely convinced that this is a problem as such. Not from a
memory point of view anyway. The connection entries themselves are very
small (~116 bytes on i386) and even if you have millions of them it's
still not a lot of memory.
In any case, this is just about changing the default value to something
that I believe is a bit more sane, as people have found problems
with the current default. I'm still open to the idea of the default
being configurable on the backup linux-director and/or transmitted
via the synchronisation protocol.
If however there are more serious problems created by having connection
entries lying around on the standby server that have been closed on
the master server, then that's probably not a problem that can be solved
by twiddling the timeout. It's probably a more fundamental problem.