Re: [lvs-users] Kernel 2.6.35 and 100% S.I. CPU Time

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [lvs-users] Kernel 2.6.35 and 100% S.I. CPU Time
From: JL <lvs@xxxxxxxx>
Date: Mon, 13 Sep 2010 17:42:54 +0100
OK, I have a tcpdump of some LVS packets. Immediately after receiving
this, the backup goes to 100% S.I.

A few notes that may help:
  172.26.64.76 is the external machine I am testing from
  192.168.148.2 is the VIP
  10.17.192.19 is the private-side address of master
  10.17.192.20 is the private-side address of backup

I notice that the packet dump contains multiple references to the
*same* connection. Is that normal?
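
One way to see how many connection entries each sync packet carries is to
decode the fixed header at the start of the UDP payload. The following is
a minimal Python sketch, assuming the original (version 0) sync message
layout: one byte for the connection count, one byte for the sync ID, and a
two-byte total size in network byte order. The names SyncHeader and
parse_sync_header and the sample bytes are illustrative only; nothing here
is taken from dump.txt.

```python
import struct
from typing import NamedTuple


class SyncHeader(NamedTuple):
    nr_conns: int  # number of connection entries in this packet
    syncid: int    # sync ID configured on the sending daemon
    size: int      # total message size in bytes, as stated by the sender


def parse_sync_header(payload: bytes) -> SyncHeader:
    """Decode the first 4 bytes of a sync packet's UDP payload.

    ASSUMPTION: version 0 layout (u8 nr_conns, u8 syncid, u16 size,
    with size in network byte order). Check this against the kernel
    source for the version under test before relying on it.
    """
    if len(payload) < 4:
        raise ValueError("payload shorter than the 4-byte sync header")
    nr_conns, syncid, size = struct.unpack("!BBH", payload[:4])
    return SyncHeader(nr_conns, syncid, size)


# Hypothetical payload: 4 connections, sync ID 0, 100 bytes in total.
sample = bytes([4, 0]) + (100).to_bytes(2, "big") + bytes(96)
print(parse_sync_header(sample))
```

Comparing nr_conns across the packets in a capture would show whether the
trigger really depends on the number of entries per packet.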

This problem doesn't happen with HTTP with this small number of
connections. I suspect that may be because my HTTP tests have far
fewer packets per connection.

I triggered this with four simultaneous connections - but only on my
third attempt.
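
A repeatable way to open several connections at the same instant could
look like the sketch below. It is only a sketch under assumptions: the
function names are made up for illustration, and the URL in the usage note
assumes the service answers HTTPS on the VIP at the root path. A barrier
holds every worker thread until all of them are ready, so the requests
start as close together as possible.

```python
import threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, timeout=10.0):
    """Open one connection, read the whole body, return its length."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return len(resp.read())


def fetch_simultaneously(urls, fetch_one=fetch):
    """Run fetch_one on every URL, releasing all workers together.

    Returns one result per URL, in order. A failed request yields the
    exception object instead of a result, so one failure does not hide
    the others.
    """
    urls = list(urls)
    if not urls:
        return []
    barrier = threading.Barrier(len(urls))

    def worker(url):
        barrier.wait()  # block until every worker has reached this point
        try:
            return fetch_one(url)
        except Exception as exc:
            return exc

    # One thread per URL, otherwise the barrier would never be released.
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        return list(pool.map(worker, urls))
```

For four connections to the VIP, this would be called as
fetch_simultaneously(["https://192.168.148.2/"] * 4). If the certificate
does not match the VIP address, each entry in the result will be an
exception, although the TCP connections are still made.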



On 13 September 2010 16:47, JL <lvs@xxxxxxxx> wrote:
> On 13 September 2010 16:25, Simon Horman <horms@xxxxxxxxxxxx> wrote:
>> On Mon, Sep 13, 2010 at 03:52:29PM +0100, JL wrote:
>>> OK, More information:
>>>
>>> I hope someone who is up on ipvs kernel side is listening!
>>
>> I am listening, sorry for not responding earlier.
> Thanks. I was getting nervous :)
>
>>> If a backup machine receives an IPVS state update packet (the ones
>>> sent to 224.0.0.81) with a certain number of connections in it
>>> (somewhere between two and eight, inclusive, will trigger it) then SI
>>> goes to 100% on the backup immediately.
>>>
>>> Firewalling 224.0.0.81 insulates you from the problem (although, of
>>> course, that is unsuitable for a live deployment).
>>
>> Presumably turning off connection synchronisation
>> has the same effect.
>
>
>>> Feeding in only one connection at a time (slowly enough that they each
>>> have their own IPVS packet) doesn't trigger the problem.
>>
>> So it occurs if the number of synchronised connections in
>> a single packet is between 2 and 8. So 1 is ok, and so is 9?
> No, one is ok, but somewhere between 2 and 8 this problem begins, and
> anything higher has the problem. I just haven't been able to narrow
> down the number any tighter than that.
>
> However, some more testing indicates that it is not that straightforward.
>
> If I trigger it by pressing reload in the browser (which kicks off
> about 9 HTTPS connections) I get the problem. If I put those same
> GETs into a bash script, and fetch them all at once, then I don't.
>
> I'm still trying to simplify the problem down to a simple script I can run.
>
> This is a two-node LVS/RS system. I have found that if none of the
> connections in the state packet are to the backup machine, then it
> doesn't trigger this problem.
>
>>> This happens with linux 2.6.35.4, but not 2.6.27.45.
>>
>> That is a fairly wide number of kernel versions.
>> But if it is easy to reproduce then it should be fairly easy to track down.
> That was the intent of coming up with a simple test - that I could try
> a number of different kernel versions, and see where the problem
> appears.
>
>
> Investigation is ongoing...
>
> Thanks,
> --
> Jarrod Lowe
>



-- 
Jarrod Lowe

Attachment: dump.txt
Description: Text document
