Hi Horms,
On Wed, 26 Jun 2002, Horms wrote:
>
> I believe that there is a minor bug in LVS 1.0.3 such that if stale
> information is received by the synchronisation thread, the
> inactive and active connection counters may become inaccurate.
>
> More specifically, a connection's entry in the hash table
> may change from being marked inactive to active. However, the
> active and inactive connection counters for the connection's
> destination are not incremented and decremented accordingly.
>
> Later, when the connection's entry is removed from the hash table, the
> active connection counter will be decremented and the inactive
> connection counter will be left unchanged. Thus the former becomes one
> lower than it should be, and the latter remains one higher than it
> should be.
>
The connection entries created by the synchronization mechanism always
have their dest server pointer set to NULL (i.e. cp->dest is NULL). When
cp->dest is NULL, the connection does not participate in the server's
active/inactive connection counting.
I just checked the ip_vs_sync.c code and found that it doesn't check
cp->dest (a connection is a normal one if cp->dest is not NULL) before
updating the state, which may cause the problem. For example, suppose there
are two primary/backup load balancers (lb1 and lb2) and lb1 is active at
first. A connection is created and pointed to the selected server, and the
connection is synchronized to lb2. Then lb1 fails and lb2 takes over, so
the connection can continue through lb2; lb1 comes back and works as the
backup. Just after the connection changes its state (for example, from
ESTABLISHED/ACTIVE to INACTIVE), it is synchronized from lb2 to lb1. The
connection on lb1 still points to the selected server, so changing the
state of this connection directly will make the server's active/inactive
connection counts incorrect.
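To make the counting mismatch concrete, here is a minimal C sketch under stated assumptions. It is not LVS code: `struct dest_model`, `struct conn_model`, `conn_bind()`, `sync_update_buggy()` and `conn_expire()` are hypothetical stand-ins for a real server's counters, a connection entry and the sync path. It models the direction Horms describes (inactive to active), which matches the ipvsadm output he quoted; the ACTIVE to INACTIVE case in the lb1/lb2 example is symmetric, with the inactive counter wrapping instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one real server's counters (not the kernel's
 * struct). uint32_t mimics the 32-bit values that ipvsadm prints. */
struct dest_model {
    uint32_t activeconns;
    uint32_t inactconns;
};

/* Hypothetical connection entry. dest is NULL for entries created by
 * the sync daemon and non-NULL for normally scheduled connections. */
struct conn_model {
    struct dest_model *dest;
    int inactive;               /* 1 = counted as inactive, 0 = active */
};

/* Attach a normally scheduled connection to a server and count it. */
static void conn_bind(struct conn_model *cp, struct dest_model *d,
                      int inactive)
{
    assert(d != NULL);
    cp->dest = d;
    cp->inactive = inactive;
    if (inactive)
        d->inactconns++;
    else
        d->activeconns++;
}

/* Buggy sync path: overwrites the state flag without checking
 * cp->dest, so the server's counters are not moved accordingly. */
static void sync_update_buggy(struct conn_model *cp, int inactive)
{
    cp->inactive = inactive;
}

/* Removal decrements whichever counter matches the *current* flag.
 * Entries with a NULL dest were never counted, so they are skipped. */
static void conn_expire(struct conn_model *cp)
{
    if (cp->dest == NULL)
        return;
    if (cp->inactive)
        cp->dest->inactconns--;
    else
        cp->dest->activeconns--;   /* wraps to UINT32_MAX if already 0 */
    cp->dest = NULL;
}
```

Binding a connection as inactive, then calling `sync_update_buggy(cp, 0)` and `conn_expire(cp)`, leaves `activeconns` at 4294967295 (`UINT32_MAX`, the 32-bit wrap of 0 - 1) and `inactconns` at 1, the same pattern as the ipvsadm listing.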
I haven't set up an environment to reproduce this problem. Horms, have you
experienced the problem in this way?
> This may lead to something along the lines of:
>
> Prot LocalAddress:Port Scheduler Flags
>   -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
> TCP  vs0:http wlc
>   -> b1:http                      Route   1      4294967296 1
>
> When in fact the last line should be:
>
>   -> b1:http                      Route   1      0          0
>
>
> Though this bug is non-fatal and unlikely to occur, I think that
> it is still worth applying a fix. I have attached a patch which
> should resolve this problem.
>
>
Your fix is probably not correct. It updates cp->dest's active/inactive
connection counters directly, but cp->dest may be NULL, which would lead
to a NULL pointer dereference.
I prefer that, if cp->dest is not NULL (meaning the connection was not
created by the sync daemon), we do not update its state. Please see/test
the attached fix.
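The rule proposed above can be sketched with a small hypothetical model. The names `struct dest_model`, `struct conn_model`, `conn_bind()`, `conn_expire()` and `sync_update_fixed()` are illustrative only, and this is not the attached ip_vs_sync.diff: a state update received from the sync daemon is applied only when `cp->dest` is NULL, so a normally scheduled connection keeps the state its counters were charged against.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-server counters (not the kernel's struct). */
struct dest_model {
    uint32_t activeconns;
    uint32_t inactconns;
};

/* Hypothetical connection entry; dest is NULL when the sync daemon
 * created it, non-NULL for a normally scheduled connection. */
struct conn_model {
    struct dest_model *dest;
    int inactive;               /* 1 = counted as inactive, 0 = active */
};

/* Attach a normally scheduled connection to a server and count it. */
static void conn_bind(struct conn_model *cp, struct dest_model *d,
                      int inactive)
{
    assert(d != NULL);
    cp->dest = d;
    cp->inactive = inactive;
    if (inactive)
        d->inactconns++;
    else
        d->activeconns++;
}

/* Removal decrements the counter matching the current flag; entries
 * with a NULL dest were never counted, so they are skipped. */
static void conn_expire(struct conn_model *cp)
{
    if (cp->dest == NULL)
        return;
    if (cp->inactive)
        cp->dest->inactconns--;
    else
        cp->dest->activeconns--;
    cp->dest = NULL;
}

/* Proposed rule: apply a synced state change only to entries created
 * by the sync daemon (dest == NULL). A normally scheduled connection
 * keeps its locally tracked state. Returns 1 if the update was applied. */
static int sync_update_fixed(struct conn_model *cp, int inactive)
{
    if (cp->dest != NULL)
        return 0;
    cp->inactive = inactive;
    return 1;
}
```

With this rule, a connection bound as inactive that then receives a stale "active" update expires from the inactive counter, leaving both counters at 0, which is what the corrected ipvsadm line shows. Entries with a NULL dest still accept updates, since they never touched the counters.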
Thanks,
Wensong
ip_vs_sync.diff