Re: [lvs-users] Kernel 2.6.35 and 100% S.I. CPU Time

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [lvs-users] Kernel 2.6.35 and 100% S.I. CPU Time
From: JL <lvs@xxxxxxxx>
Date: Mon, 13 Sep 2010 16:47:08 +0100
On 13 September 2010 16:25, Simon Horman <horms@xxxxxxxxxxxx> wrote:
> On Mon, Sep 13, 2010 at 03:52:29PM +0100, JL wrote:
>> OK, More information:
>>
>> I hope someone who is up on ipvs kernel side is listening!
>
> I am listening, sorry for not responding earlier.
Thanks. I was getting nervous :)

>> If a backup machine receives an IPVS state update packet (the ones
>> sent to 224.0.0.81) with a certain number of connections in it
>> (somewhere between two and eight, inclusive, will trigger it) then SI
>> goes to 100% on the backup immediately.
>>
>> Firewalling 224.0.0.81 insulates you from the problem (although, of
>> course, that is unsuitable for a live deployment).
>
> Presumably turning off connection synchronisation
> has the same effect.


>> Feeding in only one connection at a time (slowly enough that they each
>> have their own IPVS packet) doesn't trigger the problem.
>
> So it occurs if the number of synchronised connections in
> a single packet is between 2 and 8. So 1 is ok, and so is 9?
No, one is ok, but somewhere between 2 and 8 this problem begins, and
anything higher has the problem. I just haven't been able to narrow
down the number any tighter than that.
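
One way to narrow the number down would be to watch the sync packets
themselves and log how many connections each one carries. The following
is only a rough sketch under stated assumptions: it assumes the sync
daemon uses UDP port 8848, and that each message starts with a 4-byte
header of one byte for the connection count, one byte for the sync ID,
and a 16-bit size in network byte order. Neither assumption is confirmed
in this thread, so check them against the kernel source for the version
under test. The names parse_sync_header and watch are made up for this
example.

```python
import socket
import struct

SYNC_GROUP = "224.0.0.81"  # multicast group mentioned in this thread
SYNC_PORT = 8848           # ASSUMPTION: default IPVS sync daemon UDP port


def parse_sync_header(payload):
    """Return (nr_conns, syncid, size) from the start of a sync message.

    ASSUMPTION: header is u8 nr_conns, u8 syncid, u16 size (network order).
    """
    if len(payload) < 4:
        raise ValueError("payload shorter than the assumed 4-byte header")
    return struct.unpack("!BBH", payload[:4])


def watch(max_packets=20):
    """Join the sync multicast group and print the per-packet connection count."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", SYNC_PORT))
    # Ask the kernel to deliver traffic for the group on the default interface.
    mreq = struct.pack("4s4s", socket.inet_aton(SYNC_GROUP),
                       socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    try:
        for _ in range(max_packets):
            data, (src, _port) = sock.recvfrom(65535)
            nr_conns, syncid, size = parse_sync_header(data)
            print(f"{src}: nr_conns={nr_conns} syncid={syncid} size={size}")
    finally:
        sock.close()
```

Calling watch() on a third host on the same network segment (rather
than on the backup itself, where the kernel sync daemon is already
listening) keeps the observer out of the way of the machines under
test. Correlating the logged nr_conns with the moment SI jumps on the
backup should show which counts are actually involved.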

However, some more testing indicates that it is not that straightforward.

If I trigger it by pressing reload in the browser (which kicks off
about 9 HTTPS connections) I get the problem. If I put those same
GETs into a bash script and fetch them all at once, then I don't.

I'm still trying to simplify the problem down to a simple script I can run.
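
A reproduction script could try to imitate the browser more closely by
opening all the connections at the same instant rather than one after
another. The sketch below is one possible approach, not a confirmed
trigger: it starts one thread per URL and releases them together with a
barrier. The function names burst and fetch are made up for this
example, and whether simultaneous requests end up in the same sync
packet the way a browser reload does is exactly the open question.

```python
import threading
import urllib.request


def fetch(url, timeout=10):
    """Fetch one URL and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def burst(urls, fetch_one=fetch):
    """Request every URL at once; return results in the same order as urls.

    A failed request is reported as the exception it raised.
    """
    if not urls:
        return []
    barrier = threading.Barrier(len(urls))
    results = [None] * len(urls)

    def worker(index, url):
        barrier.wait()  # hold every thread until all of them are ready
        try:
            results[index] = fetch_one(url)
        except Exception as exc:
            results[index] = exc

    threads = [threading.Thread(target=worker, args=(i, u))
               for i, u in enumerate(urls)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, burst(["https://vip.example/"] * 9) would fire nine
requests together, where vip.example is a placeholder for the virtual
service address. If the service uses a certificate that the client does
not trust, the default fetch will report an error for each request, so
a custom fetch_one may be needed.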

This is a two-node LVS/RS system. I have found that if none of the
connections in the state packet are to the backup machine, then it
doesn't trigger this problem.

>> This happens with linux 2.6.35.4, but not 2.6.27.45.
>
> That is a fairly wide range of kernel versions.
> But if it is easy to reproduce then it should be fairly easy to track down.
That was the intent of coming up with a simple test - that I could try
a number of different kernel versions, and see where the problem
appears.
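
With a reliable test in hand, the search across kernel versions can be
done as a bisection instead of trying every release. The sketch below
assumes the versions are listed oldest to newest, that the first one is
known good and the last one known bad (as with 2.6.27.45 and 2.6.35.4
here), and that once the problem appears it stays in every later
version. The function first_bad and the is_bad callback are
hypothetical names; is_bad stands for "boot this kernel, run the test,
report whether SI went to 100%".

```python
def first_bad(versions, is_bad):
    """Return the earliest version for which is_bad(version) is true.

    ASSUMPTIONS: versions is ordered oldest to newest, versions[0] is
    known good, versions[-1] is known bad, and every version after the
    first bad one is also bad.
    """
    if len(versions) < 2:
        raise ValueError("need at least one good and one bad version")
    lo, hi = 0, len(versions) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # problem already present: look earlier
        else:
            lo = mid  # still fine: look later
    return versions[hi]
```

Each round halves the remaining range, so the eight releases between
2.6.27 and 2.6.35 need about three test boots. This is the same idea
that git bisect automates at the level of individual commits.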


Investigation is ongoing...

Thanks,
-- 
Jarrod Lowe
