
To: "'Julian Anastasov'" <ja@xxxxxx>
Subject: Re: [lvs-users] ipvs connections sync and CPU usage
Cc: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From: "Aleksey Chudov" <aleksey.chudov@xxxxxxxxx>
Date: Fri, 23 Dec 2011 17:24:52 +0200
Hello,

Thanks for the answer.

>> Linux Kernel 2.6.39.4 was patched to prevent "ip_vs_send_async error" 
>> as previously discussed in

>2.6.39.4 with sync_version=1 ?

Yes. net.ipv4.vs.sync_version = 1 on both nodes.


>> http://archive.linuxvirtualserver.org/html/lvs-users/2009-12/msg00058.html

> I have an idea how to avoid delays/drops in master when sending the sync
> packets. May be we can use counter of enqueued packets and when it reaches
> 10 (some fixed value) we can call wake_up_process(), so that we can wakeup
> the sending process which sleeps 1 second after every send. By this way we
> will prevent overflow of the socket's sending buffer (the ip_vs_send_async
> error message). I can prepare patch in the following days.

There are no performance problems on the Master node with
schedule_timeout_interruptible(HZ/10); %sys CPU utilization is 2 - 5%.
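
For context, here is roughly where that sleep and the proposed wakeup would
sit in the master sync thread. This is only a sketch of the idea as I
understand it, with made-up names (master_thread, sync_queue_len,
struct sync_msg, send_queued_sync_messages, add_to_queue); it is not the real
ip_vs_sync.c code and not Julian's patch:

#include <linux/kthread.h>
#include <linux/sched.h>

/* hypothetical state, for illustration only */
static struct task_struct *master_thread;
static unsigned int sync_queue_len;

/* master sync thread: drain the queue, then sleep */
static int sync_master_loop(void *data)
{
	while (!kthread_should_stop()) {
		/* send everything currently queued to the backup */
		send_queued_sync_messages();

		/* stock kernel sleeps HZ; my build uses HZ/10 */
		schedule_timeout_interruptible(HZ / 10);
	}
	return 0;
}

/* enqueue path: wake the sender early once enough messages pile up */
static void enqueue_sync_message(struct sync_msg *m)
{
	add_to_queue(m);

	/* Julian's idea: after ~10 queued packets, wake the sender so
	 * the socket send buffer cannot overflow between sleeps */
	if (++sync_queue_len >= 10) {
		sync_queue_len = 0;
		wake_up_process(master_thread);
	}
}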


> Any progress to decrease the load with tuning sync period?
> What is the packet rate and MBytes of the sync traffic to backup?

Tried the following:

1. Set schedule_timeout_interruptible(HZ/10) and sync_threshold = "3 50" on both nodes.
   Results: sync traffic 40 Mbit/s, 4000 packets/sec, 35 %sys CPU on Backup node,
   60% difference in persistent connections between Master and Backup nodes,
   netstat -s on Master: SndbufErrors: 0

2. Set schedule_timeout_interruptible(HZ/10) and sync_threshold = "3 10" on both nodes.
   Results: sync traffic 60 Mbit/s, 6000 packets/sec, 50 %sys CPU on Backup node,
   6% difference in persistent connections between Master and Backup nodes,
   netstat -s on Master: SndbufErrors: 0

3. Set schedule_timeout_interruptible(HZ/10) and sync_threshold = "3 5" on both nodes.
   Results: sync traffic 110 Mbit/s, 12000 packets/sec, 80 %sys CPU on Backup node,
   3% difference in persistent connections between Master and Backup nodes,
   netstat -s on Master: SndbufErrors: 0

4. Set schedule_timeout_interruptible(HZ/10) and sync_threshold = "3 100" on both nodes.
   Results: sync traffic 30 Mbit/s, 3000 packets/sec, 25 %sys CPU on Backup node,
   70% difference in persistent connections between Master and Backup nodes,
   netstat -s on Master: SndbufErrors: 0

5. Set schedule_timeout_interruptible(HZ) and sync_threshold = "3 10" on both nodes.
   Results: sync traffic 40 Mbit/s, 4000 packets/sec, 35 %sys CPU on Backup node,
   60% difference in persistent connections between Master and Backup nodes,
   netstat -s on Master: SndbufErrors: 3208239

As can be seen above, the lowest difference in persistent connections between
Master and Backup is with HZ/10 and sync_threshold = "3 5", but 80 %sys CPU on
the Backup node is critical, so "3 10" is more appropriate.

Is it possible to make the schedule_timeout_interruptible interval
configurable via sysctl?
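
Something like this is what I have in mind. It is purely hypothetical; the
sysctl name and default are made up and nothing like it exists in 2.6.39:

#include <linux/sysctl.h>

static int sysctl_sync_send_delay_ms = 100;	/* 100 ms == HZ/10 */

static struct ctl_table vs_sync_vars[] = {
	{
		.procname	= "sync_send_delay_ms",	/* made-up name */
		.data		= &sysctl_sync_send_delay_ms,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= proc_dointvec,
	},
	{ }
};

/* and in the master sync loop, instead of the hard-coded HZ or HZ/10: */
/* schedule_timeout_interruptible(msecs_to_jiffies(sysctl_sync_send_delay_ms)); */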

As mentioned in another report,
http://www.gossamer-threads.com/lists/lvs/users/24331, after switching from a
TCP VIP to Fwmark, %sys CPU rose from 40 - 50% (TCP VIP) to 80 - 100%
(Fwmark) with no difference in sync traffic.


> May be the key here is to use some large value for the sysctl_sync_period
> (the 2nd of the values). Keep first value 2 or 3 and try different values
> for the period. For example, 100, 1000. It depends on how many packets
> have the connections.

As can be seen above, a large 2nd value leads to a larger difference in
persistent connections between the Master and Backup nodes. In my tests a
difference over 30% is critical during IP failover, because the Backup node
is overwhelmed by a thundering herd of reconnections.
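
For anyone following along, my reading of how the two sync_threshold values
are applied on the master is roughly the following: with "T P" a connection
is synced around its T-th packet and then about once every P packets, so a
larger P means fewer sync messages per connection but a backup view that lags
further behind. A simplified sketch of that logic (based on my reading of
ip_vs_in() in 2.6.39, not the exact code; struct ip_vs_conn fields assumed
from ip_vs.h):

#include <net/ip_vs.h>

static int sync_threshold = 3;	/* 1st value: T */
static int sync_period = 10;	/* 2nd value: P */

static void maybe_sync_conn(struct ip_vs_conn *cp)
{
	int pkts = atomic_add_return(1, &cp->in_pkts);

	/* sync once every sync_period packets per connection; with
	 * period 100 that is ~10x less sync traffic than period 10 */
	if (pkts % sync_period == sync_threshold)
		ip_vs_sync_conn(cp);	/* simplified call, real prototype may differ */
}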

>> Is it possible to lower cpu usage of ipvs_backup?
>> Is it possible to distribute cpu usage of ipvs_backup on multiple CPU cores?

> It was designed as single thread and it is expected that the sync traffic
> should be lower than traffic in master. Only if many backup threads are
> started we can utilize more cores but such change will lead to changes in
> user interface, there is also small risk due to possible packet reordering.

It is sad to see 100% CPU utilization on a single CPU core while the other
23 cores are not busy :)
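
If multiple backup threads ever become an option, I guess connections would
have to be distributed deterministically so that per-connection ordering is
preserved, something along the lines of the following (purely illustrative,
no such code exists):

#include <linux/types.h>
#include <asm/byteorder.h>

/* hash each connection to a fixed backup thread by client addr/port */
static inline unsigned int backup_thread_for(__be32 caddr, __be16 cport,
					     unsigned int nthreads)
{
	return (ntohl(caddr) ^ ntohs(cport)) % nthreads;
}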

Regards,
Aleksey


_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/

LinuxVirtualServer.org mailing list - lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Send requests to lvs-users-request@xxxxxxxxxxxxxxxxxxxxxx
or go to http://lists.graemef.net/mailman/listinfo/lvs-users