Re: [lvs-users] Kernel 2.6.35 and 100% S.I. CPU Time

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [lvs-users] Kernel 2.6.35 and 100% S.I. CPU Time
From: JL <lvs@xxxxxxxx>
Date: Mon, 13 Sep 2010 17:42:54 +0100
OK, I have a tcpdump of some LVS packets. Immediately after receiving
these, the backup goes to 100% S.I. (softirq) CPU time.

A few notes that may help:
  [address] is the external machine I am testing from
  [address] is the VIP
  [address] is the private-side address of the master
  [address] is the private-side address of the backup

I notice that the packet dump contains multiple references to the
*same* connection. Is that normal?
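One way to check whether the repeats are real is to decode the entries of a single sync datagram and group them by connection tuple. The following is a minimal Python sketch, assuming the version-0 sync message layout used by 2.6.x kernels: a 4-byte header (nr_conns, syncid, size) followed by 24-byte entries, each optionally followed by 24 bytes of sequence-number options when the 0x0600 flag bits are set. The struct formats, the mask value and the helper names (parse_sync_message, duplicate_entries) are assumptions to verify against net/netfilter/ipvs/ip_vs_sync.c for the kernel in use. The function expects the raw UDP payload of one datagram, for example copied out of a capture printed with tcpdump -X; that conversion step is not shown.

```python
import socket
import struct
from collections import Counter
from typing import NamedTuple

# Assumed IPVS sync "version 0" layout (2.6.x); check ip_vs_sync.c before relying on it.
HEADER = struct.Struct("!BBH")             # nr_conns, syncid, size (whole payload)
CONN = struct.Struct("!BBHHH4s4s4sHH")     # 24-byte entry: reserved, protocol,
                                           # cport, vport, dport, caddr, vaddr,
                                           # daddr, flags, state
OPTIONS_LEN = 24                           # in_seq + out_seq, three u32 each
SEQ_MASK = 0x0600                          # assumed IP_VS_CONN_F_SEQ_MASK


class SyncConn(NamedTuple):
    protocol: int
    cport: int
    vport: int
    dport: int
    caddr: str
    vaddr: str
    daddr: str
    flags: int
    state: int


def parse_sync_message(buf: bytes):
    """Decode one sync payload into (nr_conns, syncid, list of SyncConn)."""
    nr_conns, syncid, size = HEADER.unpack_from(buf, 0)
    if size != len(buf):
        raise ValueError(f"size field {size} != payload length {len(buf)}")
    entries, off = [], HEADER.size
    for _ in range(nr_conns):
        (_reserved, proto, cport, vport, dport,
         caddr, vaddr, daddr, flags, state) = CONN.unpack_from(buf, off)
        off += CONN.size
        if flags & SEQ_MASK:
            off += OPTIONS_LEN             # skip the sequence-number options
        entries.append(SyncConn(proto, cport, vport, dport,
                                socket.inet_ntoa(caddr), socket.inet_ntoa(vaddr),
                                socket.inet_ntoa(daddr), flags, state))
    return nr_conns, syncid, entries


def duplicate_entries(entries):
    """Map each connection tuple seen more than once to its repeat count."""
    counts = Counter((e.protocol, e.caddr, e.cport, e.vaddr, e.vport,
                      e.daddr, e.dport) for e in entries)
    return {key: n for key, n in counts.items() if n > 1}
```

A tuple repeated inside one message would show up with a count above 1. The same tuple appearing across successive messages is presumably less surprising, since the master is expected to re-send a connection as its state changes or as more packets pass through it (2.6.x kernels tune this with the sync_threshold sysctl), so comparing the state field between repeats is probably more informative than the raw count.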

This problem doesn't happen with HTTP at this small number of
connections. I suspect that is because my HTTP tests have far
fewer packets per connection.

I triggered this with four simultaneous connections - but only on my
third attempt.
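To narrow down the "between two and eight" range described in the thread, and to separate it from the observation that the problem only appears when some entries point at the backup itself, one option in a lab (not on a live pair) is to craft sync payloads with a chosen number of entries and a chosen real-server address. The following is a minimal Python sketch, assuming the version-0 sync layout of 2.6.x kernels (4-byte header, 24-byte entries without sequence options). The helper names build_sync_message and send_sync_message are hypothetical, the multicast group 224.0.0.81 and UDP port 8848 are assumed defaults, and the flags/state defaults are placeholders rather than values taken from the attached dump. The backup presumably also needs a matching virtual service and syncid for the entries to be processed.

```python
import socket
import struct

# Assumed IPVS sync "version 0" wire layout (2.6.x kernels); verify against
# net/netfilter/ipvs/ip_vs_sync.c before trusting it.
HEADER = struct.Struct("!BBH")             # nr_conns, syncid, size (whole payload)
CONN = struct.Struct("!BBHHH4s4s4sHH")     # one 24-byte entry, no seq options
IPPROTO_TCP = 6


def build_sync_message(n_conns, vip, vport, rip, rport, client="192.0.2.10",
                       syncid=0, flags=0, state=1):
    """Build one sync payload with n_conns TCP entries to the real server rip.

    Every entry uses the same client address with a different client port,
    roughly like a browser opening several HTTPS connections at once. flags
    and state are copied into each entry unchanged; the defaults (0, 1) are
    assumptions, not values taken from a real capture.
    """
    if not 1 <= n_conns <= 255:
        raise ValueError("nr_conns is a one-byte field")
    entries = b"".join(
        CONN.pack(0, IPPROTO_TCP, 40000 + i, vport, rport,
                  socket.inet_aton(client), socket.inet_aton(vip),
                  socket.inet_aton(rip), flags, state)
        for i in range(n_conns)
    )
    return HEADER.pack(n_conns, syncid, HEADER.size + len(entries)) + entries


def send_sync_message(payload, group="224.0.0.81", port=8848, ttl=1):
    """Send one payload to the sync multicast group (group/port are assumed defaults)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sock.sendto(payload, (group, port))
```

For example, sending build_sync_message(n, vip, 443, rip, 443) for n from 2 to 8, once with rip set to the backup's private-side address and once with another real server's address, would test the two conditions independently.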

On 13 September 2010 16:47, JL <lvs@xxxxxxxx> wrote:
> On 13 September 2010 16:25, Simon Horman <horms@xxxxxxxxxxxx> wrote:
>> On Mon, Sep 13, 2010 at 03:52:29PM +0100, JL wrote:
>>> OK, More information:
>>> I hope someone who is up on the IPVS kernel side is listening!
>> I am listening, sorry for not responding earlier.
> Thanks. I was getting nervous :)
>>> If a backup machine receives an IPVS state update packet (the ones
>>> sent to [address]) with a certain number of connections in it
>>> (somewhere between two and eight, inclusive, will trigger it), then SI
>>> goes to 100% on the backup immediately.
>>> Firewalling insulates you from the problem (although, of
>>> course, that is unsuitable for a live deployment).
>> Presumably turning off connection synchronisation
>> has the same effect.
>>> Feeding in only one connection at a time (slowly enough that each
>>> has its own IPVS packet) doesn't trigger the problem.
>> So it occurs if the number of synchronised connections in
>> a single packet is between 2 and 8. So 1 is ok, and so is 9?
> No, one is ok, but somewhere between 2 and 8 this problem begins, and
> anything higher has the problem. I just haven't been able to narrow
> down the number any tighter than that.
> However, some more testing indicates that it is not that straightforward.
> If I trigger it by pressing reload in the browser (which kicks off
> about 9 HTTPS connections), I get the problem; if I put those same
> GETs into a bash script and fetch them all at once, it doesn't.
> I'm still trying to simplify the problem down to a simple script I can run.
> This is a two-node LVS/RS system. I have found that if none of the
> connections in the state packet are to the backup machine, the
> problem isn't triggered.
>>> This happens with linux, but not
>> That is a fairly wide number of kernel versions.
>> But if it is easy to reproduce then it should be fairly easy to track down.
> That was the intent of coming up with a simple test - that I could try
> a number of different kernel versions, and see where the problem
> appears.
> Investigation is ongoing...
> Thanks,
> --
> Jarrod Lowe

Jarrod Lowe

Attachment: dump.txt
Description: Text document
