Re: [lvs-users] ipvs connections sync and CPU usage

To: Aleksey Chudov <aleksey.chudov@xxxxxxxxx>
Subject: Re: [lvs-users] ipvs connections sync and CPU usage
Cc: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From: Julian Anastasov <ja@xxxxxx>
Date: Sat, 24 Dec 2011 01:42:21 +0200 (EET)
        Hello,

On Fri, 23 Dec 2011, Aleksey Chudov wrote:

> >> http://archive.linuxvirtualserver.org/html/lvs-users/2009-12/msg00058.html
> 
> > I have an idea for how to avoid delays/drops in the master when sending
> > the sync packets. Maybe we can use a counter of enqueued packets, and when
> > it reaches 10 (some fixed value) we can call wake_up_process() to wake up
> > the sending process, which sleeps 1 second after every send. This way we
> > will prevent overflow of the socket's send buffer (the ip_vs_send_async
> > error message). I can prepare a patch in the following days.
> 
> There are no performance problems on the Master node with
> schedule_timeout_interruptible(HZ/10).
> %sys CPU utilization is 2 - 5%.

        The problem is that sb_queue_tail does not know
when to wake up the master thread because it does not know
the socket's send space, so there is always a risk of
dropping a sync message on sending. Another option is for
the master thread to adjust its sleep value according to
the traffic, i.e. it can reduce the sleep timer below the
currently fixed value of HZ.
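
        As a rough sketch of the counter-based wakeup mentioned
above (not a patch; the queue counter and thread pointer names
here are hypothetical and only for illustration), it could look
something like this:

        /* Sketch: count queued sync buffers and kick the master
         * thread once a threshold is reached, instead of relying
         * only on its periodic sleep.  ipvs->sync_queue_len and
         * ipvs->master_thread are illustrative field names.
         */
        static void sb_queue_tail(struct netns_ipvs *ipvs,
                                  struct ip_vs_sync_buff *sb)
        {
                spin_lock(&ipvs->sync_lock);
                list_add_tail(&sb->list, &ipvs->sync_queue);
                if (++ipvs->sync_queue_len >= 10 && ipvs->master_thread)
                        wake_up_process(ipvs->master_thread);
                spin_unlock(&ipvs->sync_lock);
        }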

> > Any progress on decreasing the load by tuning the sync period?
> > What are the packet rate and MBytes of the sync traffic to the backup?
> 
> Tried the following:
> 
> 1. schedule_timeout_interruptible(HZ/10) and sync_threshold = "3 50"
>    on both nodes.
>    Results: sync traffic 40 Mbit/s, 4000 packets/sec, 35 %sys CPU on the
>    Backup node, 60% difference in persistent connections between Master
>    and Backup nodes, netstat -s on Master: SndbufErrors: 0
> 
> 2. Set schedule_timeout_interruptible(HZ/10) and sync_threshold = "3 10"
>    on both nodes.
>    Results: sync traffic 60 Mbit/s, 6000 packets/sec, 50 %sys CPU on the
>    Backup node, 6% difference in persistent connections between Master
>    and Backup nodes, netstat -s on Master: SndbufErrors: 0
> 
> 3. Set schedule_timeout_interruptible(HZ/10) and sync_threshold = "3 5"
>    on both nodes.
>    Results: sync traffic 110 Mbit/s, 12000 packets/sec, 80 %sys CPU on the
>    Backup node, 3% difference in persistent connections between Master
>    and Backup nodes, netstat -s on Master: SndbufErrors: 0
> 
> 4. Set schedule_timeout_interruptible(HZ/10) and sync_threshold = "3 100"
>    on both nodes.
>    Results: sync traffic 30 Mbit/s, 3000 packets/sec, 25 %sys CPU on the
>    Backup node, 70% difference in persistent connections between Master
>    and Backup nodes, netstat -s on Master: SndbufErrors: 0
> 
> 5. Set schedule_timeout_interruptible(HZ) and sync_threshold = "3 10"
>    on both nodes.
>    Results: sync traffic 40 Mbit/s, 4000 packets/sec, 35 %sys CPU on the
>    Backup node, 60% difference in persistent connections between Master
>    and Backup nodes, netstat -s on Master: SndbufErrors: 3208239
> 
> As can be seen above, the lowest difference in persistent connections between
> Master and Backup is with HZ/10 and sync_threshold = "3 5", but 80 %sys CPU on
> the Backup node is critical, so "3 10" is more appropriate.
> 
> Is it possible to make the schedule_timeout_interruptible value configurable
> via sysctl?

        Maybe it is better to implement auto-adjustment logic instead.
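
        For illustration only (the helper and the queue length
field are hypothetical names, not a real patch), such
auto-adjustment could look roughly like this in the master
thread loop:

        /* Sketch: shrink the sleep while the sync queue keeps
         * growing, relax back toward the current default of HZ
         * when it drains.  ipvs->sync_queue_len is illustrative.
         */
        static long master_sleep_ticks(struct netns_ipvs *ipvs, long cur)
        {
                if (ipvs->sync_queue_len > 10) {
                        /* queue is growing: poll faster */
                        cur /= 2;
                        if (cur < 1)
                                cur = 1;
                } else if (!ipvs->sync_queue_len) {
                        /* queue drained: back off again */
                        cur *= 2;
                        if (cur > HZ)
                                cur = HZ;
                }
                return cur;
        }

        ...
        sleep = master_sleep_ticks(ipvs, sleep);
        schedule_timeout_interruptible(sleep);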

> As mentioned in another report,
> http://www.gossamer-threads.com/lists/lvs/users/24331
> after switching from a TCP VIP to fwmark the %sys CPU rises from 40 - 50%
> (TCP VIP) to 80 - 100% (fwmark), with no difference in sync traffic.
> 
> 
> >> Maybe the key here is to use some large value for sysctl_sync_period
> >> (the 2nd of the values). Keep the first value at 2 or 3 and try different
> >> values for the period, for example 100 or 1000. It depends on how many
> >> packets the connections have.
> 
> As can be seen above, a large 2nd value leads to an increase in the
> persistent connection difference between the Master and Backup nodes.
> In my tests a difference over 30% is critical during IP failover, because
> the Backup node is overwhelmed by reconnections.

        Maybe the difference can be explained this way:

- on the master, persistence templates start with a timeout
equal to the persistence timeout of the service. When more
connections hit a template they lock it in memory, and its
timer is extended by 60 seconds every time while there are
controlled connections. This is the mechanism behind
ip_vs_control_add/ip_vs_control_del and ip_vs_conn_expire
(see the simplified sketch after this list). As a result,
such persistence templates live longer than the configured
timeout because the user session lives longer.

- on the slave, the control/n_control mechanism is not
implemented, which allows all persistence templates to expire
at the configured time. They do not know that there are
controlled connections because nobody calls ip_vs_control_add
in the backup thread. As the lifetime of the templates is not
extended, their count should be lower.
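
        In rough, simplified form (this is a sketch of the idea,
not the exact kernel code), the master-side mechanism is:

        /* A new connection is bound to its persistence template,
         * which bumps the template's controlled-connection counter.
         */
        static inline void ip_vs_control_add(struct ip_vs_conn *cp,
                                             struct ip_vs_conn *ctl_cp)
        {
                cp->control = ctl_cp;
                atomic_inc(&ctl_cp->n_control);
        }

        /* When the template's timer fires, the template is kept
         * for another 60 seconds as long as controlled connections
         * still reference it (simplified).
         */
        static void ip_vs_conn_expire(unsigned long data)
        {
                struct ip_vs_conn *cp = (struct ip_vs_conn *)data;

                if (atomic_read(&cp->n_control)) {
                        mod_timer(&cp->timer, jiffies + 60 * HZ);
                        return;
                }
                /* otherwise unhash and free the connection */
        }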

        So why does the period lead to a difference? Maybe
because with a large value we do not refresh the templates
very often, and the backup server does not see that their
life is extended on the master server.

        Using in_pkts for templates is not a good idea.
As drops are possible, it can be done more often, but not
every time as in sync version 0. Also, before
ip_vs_conn_expire() we do not know whether the template's
life will be extended. Maybe the backup server should use
a longer timeout for templates, so that it does not miss
the sync packets during the extended period.

        So now the question is how to properly reduce the
rate of sync packets for templates, and maybe for other
connections whose state is not changed but whose life is
extended. I have to think about such changes for some time.

        Can you try the following change: in ip_vs_sync_conn(),
comment out the two lines under "Reduce sync rate for
templates":

        if (pkts % sysctl_sync_period(ipvs) != 1)
                return;

        This way we will sync templates every time a normal
connection is synced, as in version 0. That is still too often
for templates, but now you can try again with "3 100" so we
can see whether the difference is reduced.
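
        For reference, the tested change could look like this in
ip_vs_sync_conn() (only the two disabled lines are taken verbatim
from above; the surrounding context is approximate):

        /* Reduce sync rate for templates */
#if 0   /* test: sync templates every time, as in version 0 */
        if (pkts % sysctl_sync_period(ipvs) != 1)
                return;
#endif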

        BTW, what is the persistence timeout value?

> >> Is it possible to lower the CPU usage of ipvs_backup?
> >> Is it possible to distribute the CPU usage of ipvs_backup across multiple
> >> CPU cores?
> 
> > It was designed as a single thread, and it is expected that the sync
> > traffic will be lower than the traffic on the master. Only if many backup
> > threads are started can we utilize more cores, but such a change would
> > require changes to the user interface, and there is also a small risk due
> > to possible packet reordering.
> 
> Sad to see 100% CPU utilization on a single CPU core while the other 23
> cores are not busy )
> 
> Regards,
> Aleksey

Regards

--
Julian Anastasov <ja@xxxxxx>
_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/

LinuxVirtualServer.org mailing list - lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Send requests to lvs-users-request@xxxxxxxxxxxxxxxxxxxxxx
or go to http://lists.graemef.net/mailman/listinfo/lvs-users