OK, more information:
I hope someone who is up on ipvs kernel side is listening!
If a backup machine receives an IPVS state update packet (one of those
sent to 224.0.0.81) with a certain number of connections in it
(somewhere between two and eight, inclusive, will trigger it), then SI
(soft interrupt CPU time) goes to 100% on the backup immediately.
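For anyone who wants to put a number on the soft-interrupt load rather
than reading it off top, a minimal sketch along these lines could be
used. It assumes the usual /proc/stat layout, where the seventh value on
the aggregate "cpu" line is softirq time. The function names and the
one-second sampling interval are illustrative choices, not anything from
the setup described here.

```python
import time


def parse_cpu_line(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        # The aggregate line is labelled exactly "cpu"; per-core lines are cpu0, cpu1, ...
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate cpu line found")


def softirq_percent(before, after):
    """Percentage of CPU time spent in softirq between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    # Sum only user..steal (first eight fields); guest time is already
    # counted inside user/nice, so adding it again would double count.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[6] / total  # index 6 is softirq


if __name__ == "__main__":
    with open("/proc/stat") as f:
        first = parse_cpu_line(f.read())
    time.sleep(1.0)  # sampling interval, chosen arbitrarily
    with open("/proc/stat") as f:
        second = parse_cpu_line(f.read())
    print("softirq: %.1f%%" % softirq_percent(first, second))
```

Running it once before and once after the sync traffic starts should
show the jump, if the behaviour is as described above.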
Firewalling 224.0.0.81 insulates you from the problem (although, of
course, that is unsuitable for a live deployment).
Feeding in only one connection at a time (slowly enough that each
connection gets its own IPVS packet) doesn't trigger the problem.
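To check how many connections each sync packet actually carries,
something like the following could be run on the same network segment.
This is only a sketch, and it rests on assumptions that do not come from
this thread: that the sync daemon sends to UDP port 8848, and that each
message starts with a four-byte header of connection count (1 byte),
sync ID (1 byte) and total size (2 bytes, network byte order), which is
how I understand the older sync format. Please check ip_vs_sync.c for
your kernel before trusting the output.

```python
import socket
import struct

SYNC_GROUP = "224.0.0.81"  # multicast group named in this thread
SYNC_PORT = 8848           # assumption: default IPVS sync port


def parse_sync_header(payload):
    """Decode the assumed 4-byte header: nr_conns, syncid, size."""
    if len(payload) < 4:
        raise ValueError("payload too short for a sync header")
    # "!BBH" = network byte order: two unsigned bytes, one unsigned 16-bit
    nr_conns, syncid, size = struct.unpack("!BBH", payload[:4])
    return nr_conns, syncid, size


def listen(group=SYNC_GROUP, port=SYNC_PORT):
    """Join the multicast group and print one line per sync packet."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Membership request: group address plus "any" local interface
    mreq = struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    while True:
        payload, sender = sock.recvfrom(65535)
        nr_conns, syncid, size = parse_sync_header(payload)
        print("%s: conns=%d syncid=%d size=%d"
              % (sender[0], nr_conns, syncid, size))


if __name__ == "__main__":
    listen()
```

It would probably need to run on a machine where the backup daemon is
not running (or where the packets are not firewalled), and a packet
capture would serve equally well.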
This happens with Linux 2.6.35.4, but not with 2.6.27.45.
On 13 September 2010 11:40, JL <lvs@xxxxxxxx> wrote:
> On 13 September 2010 03:43, 楷子狐 <higkoo@xxxxxxx> wrote:
>> I have seen this problem before:
>>
>> http://hi.baidu.com/higkoo/blog/item/f8943c60d16843d28cb10d17.html
>> ------------------
> Looks like the same thing.
>
> I suspect that the LVS service receives updates from the master, and
> then sticks them in some netfilter table, but with some error that
> makes the table huge. Maybe multiple entries appear?
>
> 楷子狐, Are you using MARK firewall rules, or a different method to
> select packets for LVS?
>
> If I change /proc/sys/net/ipv4/vs/sync_threshold to "3 100000", it
> does *not* fix the problem. Which kind of throws any theory I have had
> out the window.
>
> "ipvsadm -l -c" gives a lot of kernel messages: "Detected stall on CPU
> x". Eventually, however, we get the list (which is currently only about
> a dozen entries).
>
> It was fine at linux 2.6.27.45.
>
> # /proc/sys/net/ipv4/vs# grep -H "" *
> am_droprate:10
> amemthresh:1024
> cache_bypass:0
> drop_entry:0
> drop_packet:0
> expire_nodest_conn:0
> expire_quiescent_template:0
> nat_icmp_send:0
> secure_tcp:0
> sync_threshold:3 50
>
> Does anyone have an idea what might be happening here?
>
>> ------------------ Original ------------------
>> From: "JL"<lvs@xxxxxxxx>;
>> Date: Sun, Sep 12, 2010 06:29 PM
>> To: "LinuxVirtualServer.org users mailing
>> list."<lvs-users@xxxxxxxxxxxxxxxxxxxxxx>;
>>
>> Subject: [lvs-users] Kernel 2.6.35 and 100% S.I. CPU Time
>>
>>
>> Hi,
>>
>> I have recently upgraded from kernel 2.6.27.45 to 2.6.35.4.
>>
>> Now, any machine which is a backup (that is, receiving connection
>> updates from another machine) goes to nearly 100% CPU time in Soft
>> Interrupt.
>>
>> Profiling the kernel shows the largest portion of time is spent in
>> nf_iterate.
>>
>> We are using FWMARK rules to specify traffic for LVS.
>>
>> Is this problem something people are aware of? Does anyone know of a
>> fix or workaround?
>>
>> Thanks,
>> --
>> Jarrod Lowe
>>
>> _______________________________________________
>> Please read the documentation before posting - it's available at:
>> http://www.linuxvirtualserver.org/
>>
>> LinuxVirtualServer.org mailing list - lvs-users@xxxxxxxxxxxxxxxxxxxxxx
>> Send requests to lvs-users-request@xxxxxxxxxxxxxxxxxxxxxx
>> or go to http://lists.graemef.net/mailman/listinfo/lvs-users
>>
>
>
>
> --
> Jarrod Lowe
>
--
Jarrod Lowe