Hi Horms,
On Tue, 20 May 2003, Horms wrote:
>
> Hi,
>
> I believe that I have found the cause of your problem.
> The culprit is the following line which was added to
> ip_vs_sltimer_init() in ip_vs_timer.c
>
> sltimer_jiffies = jiffies;
>
Yes, that terrible line caused the problem. I added it in version
1.0.8 to save a few CPU cycles that would otherwise be spent iterating
sltimer_jiffies up to the system jiffies when the ipvs module is reloaded.
I didn't realize this simple assignment would leave the indexes of the
slow timer vectors inconsistent with sltimer_jiffies. It introduces a
completely wrong clock system. This extra line is absolutely not
necessary. *blush*

> Elsewhere in the code sltimer_jiffies is initialised to zero.
> This new initialisation overrides that. This however, creates a problem.
> The timers are implemented by inserting them into an array.
> (Actually several arrays but that is not relevant).
> Which array element to insert a timer into is calculated based on
> sltimer_jiffies.
>
> Periodically (once per second) run_sltimer_list() runs and works
> its way through the array, executing the timers. How far
> it works through the array is based on making sltimer_jiffies catch
> up to jiffies; the former is incremented for each iteration of the
> loop.
>
> Unfortunately, which slot to inspect and thus which timers to
> execute is determined by an index. It too is incremented
> for each iteration of the loop. But it is not initialised in
> ip_vs_sltimer_init() when sltimer_jiffies is initialised.
> Thus unless ( ( jiffies >> 6 ) & ( (1 << 8) - 1) ) == 0
> at the time that LVS is initialised then the index will
> not be correctly positioned. Observing that this has a probability
> of 1/2^8 this isn't so hot. And the more non-zero that value is,
> the further out the index will be.
>
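To make the lag described above concrete, here is a minimal single-level sketch in C. It assumes the shift of 6 and the 8-bit index implied by the quoted expression; the names SHIFT_BITS, SLOT_BITS, expected_index() and index_lag_jiffies() are made up for illustration and are not taken from ip_vs_timer.c.

```c
#include <assert.h>

#define SHIFT_BITS 6UL                      /* assumed: shift from the quoted expression */
#define SLOT_BITS  8UL                      /* assumed: index width from the quoted expression */
#define SLOT_MASK  ((1UL << SLOT_BITS) - 1)

/* Slot a consistent wheel would be inspecting when its clock reads `now`. */
static unsigned long expected_index(unsigned long now)
{
    return (now >> SHIFT_BITS) & SLOT_MASK;
}

/*
 * Lag, in jiffies, of an index that was left at zero while the wheel
 * clock was seeded with `load_jiffies`. Clock and index advance in
 * lockstep afterwards, so in this model the gap never closes.
 */
static unsigned long index_lag_jiffies(unsigned long load_jiffies)
{
    unsigned long index = 0;                /* not touched by the init routine */
    unsigned long lag_slots = (expected_index(load_jiffies) - index) & SLOT_MASK;

    assert(lag_slots <= SLOT_MASK);
    return lag_slots << SHIFT_BITS;
}
```

In this model index_lag_jiffies(0) is 0, while the worst case is 255 << 6 = 16320 jiffies, just under 2^14, or about 163 seconds at 100 jiffies per second.
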
> The result is twofold.
>
> Firstly, as the index ends up lagging sltimer_jiffies, timers
> are executed up to 2^14 jiffies late. On an intel system
> there are 100 jiffies/second, so this means timers
> can be executed up to 163 seconds late. This isn't particularly
> important. Except that it means that entries linger in the
> connection table and show up with a really large (actually negative)
> timeout. It also isn't so good if the machine is busy as
> it uses up unnecessary resources, particularly memory.
>
The sltv2.index and sltv3.index are not consistent with sltimer_jiffies
either. So the actual timer expiration delay, in seconds, is the value of
jiffies/100 at the time the ipvs module is loaded.
It would not be a big problem if the ipvs module were loaded just after
the system boots up. Otherwise, the longer the system has been running
before the ipvs module is loaded, the longer the timer expiration delay.
Terrible mistake.

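
As a rough worked example of that arithmetic, here is a small sketch in C. It assumes the caller supplies the tick rate (100 jiffies per second in the case discussed here) and ignores jiffies wrap-around; expiry_delay_seconds() is a hypothetical name for illustration, not part of the ipvs code.

```c
#include <assert.h>

/*
 * Approximate timer expiration delay, in seconds, when the module is
 * loaded at `load_jiffies` and the slow timer clock is seeded from that
 * value while every vector index stays at zero.
 * `hz` is the number of jiffies per second (assumed 100 in this thread).
 */
static unsigned long expiry_delay_seconds(unsigned long load_jiffies,
                                          unsigned long hz)
{
    assert(hz > 0);
    return load_jiffies / hz;
}
```

For a module loaded one hour after boot, expiry_delay_seconds(360000, 100) gives 3600, that is, timers would expire about an hour late.
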
> It also appears to have the more severe side effect that
> entries are not correctly cleared when an attempt
> is made to remove the lvs module from the kernel. Leading
> to such an operation hanging if there are entries
> in the timer array that should have been expired.
>
> I have attached a patch that should resolve this problem
> by initialising the index correctly. After all that,
> it is one whole new line :)
>
> You can also resolve it by removing the offending line from
> ip_vs_sltimer_init().
>
> Note that you should use one of these solutions, not both!
>
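The two alternatives can be compared in a small single-level model. This is only a sketch under the same assumptions as the quoted expression (shift of 6, 8-bit index); struct wheel and the init_* functions are hypothetical and do not reproduce the actual patch or ip_vs_sltimer_init().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct wheel {
    unsigned long clock;   /* stands in for sltimer_jiffies */
    unsigned long index;   /* stands in for the slot index */
};

/* Invariant the run loop relies on: the index matches the clock. */
static bool wheel_consistent(const struct wheel *w)
{
    assert(w != NULL);
    return w->index == ((w->clock >> 6) & ((1UL << 8) - 1));
}

/* Behaviour being fixed: clock seeded from jiffies, index left at zero. */
static void init_buggy(struct wheel *w, unsigned long now)
{
    w->clock = now;
    w->index = 0;
}

/* Alternative 1: also initialise the index from the seeded clock. */
static void init_with_index(struct wheel *w, unsigned long now)
{
    w->clock = now;
    w->index = (now >> 6) & ((1UL << 8) - 1);
}

/* Alternative 2: drop the assignment, so the clock starts at zero. */
static void init_without_assignment(struct wheel *w)
{
    w->clock = 0;
    w->index = 0;
}
```

In this model either alternative leaves wheel_consistent() true after initialisation, whereas init_buggy() does so only when the shifted and masked value of `now` happens to be zero.
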
The elegant solution is to remove that terrible extra line.
Thanks a lot for finding this. I will package version 1.0.9 soon and maybe
backport the SED and NQ schedulers from 1.1.5 too.

Regards,
Wensong