Re: [lvs-users] ipvs connections sync and CPU usage

To: Aleksey Chudov <aleksey.chudov@xxxxxxxxx>
Subject: Re: [lvs-users] ipvs connections sync and CPU usage
Cc: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From: Julian Anastasov <ja@xxxxxx>
Date: Wed, 28 Dec 2011 03:13:54 +0200 (EET)
        Hello,

On Tue, 27 Dec 2011, Aleksey Chudov wrote:

> After applying the "port 0 patch", ipvsadm displays Active and InAct
> connections on the Backup node for the Fwmark virtual service.
> 
> Tried the following:
> 
> Linux Kernel 2.6.39.4 + LVS Fwmark (configured as previously)
> 
> 1. ip_vs_sync_conn original, (HZ/10) on both, sync_threshold "3 10" on both,
> "port 0 patch"
> Results: sync traffic 50 Mbit/s, 4000 packets/sec, 30 %sys CPU on Backup, 8%
> diff in Persistent, 2% diff in Active, SndbufErrors: 0
> 
> 2. ip_vs_sync_conn patched, (HZ/10) on both, sync_threshold "3 100" on
> both, "port 0 patch"
> Results: sync traffic 60 Mbit/s, 5000 packets/sec, 40 %sys CPU on Backup, 8%
> diff in Persistent, 8% diff in Active, SndbufErrors: 0

        I'm not sure why the difference in Active conns depends
on the sync period. Maybe the master should have more inactive
conns, because SYN states are not synced; sync starts in the EST
state. Exit from the EST state should be reported regardless of
the sync period. Who has more Active conns, the Master or the Backup?

> 3. ip_vs_sync_conn patched, (HZ/10) on both, sync_threshold "3 10" on both,
> "port 0 patch"
> Results: sync traffic 90 Mbit/s, 8000 packets/sec, 60 %sys CPU on Backup, 2%
> diff in Persistent, 2% diff in Active, SndbufErrors: 0
> 
> So, it looks like the "port 0 patch" fixes the Fwmark connection and CPU
> usage issues.
> 
> To reduce the difference in persistent and active connections we should use
> ip_vs_sync_conn patched + "3 10", but %sys CPU is still too high.

        Yes, "ip_vs_sync_conn patched" adds extra overhead by
repeating the sync messages for templates, which costs more with
a shorter period such as 10.

> Are there any advantages in reducing the persistent connection timeouts
> and the tcp, tcpfin timeouts?

        I'm not sure the timeout values matter much for the
sync traffic. Currently, the sync traffic depends on the packet
rates and on the threshold+period configuration. Another option
is to add a time-based alternative.

        What we know about states and timeouts is:

- a connection can change state on a packet; we sync only changes
to some non-initial states

- packets can extend (restart) the timer in the same state. It
seems we currently do not report such timer restarts to the
backup server for any state; only the established state uses the
threshold+period algorithm to reduce the sync messages (see the
sketch after this list)

- templates are synced depending on their controlled connections,
i.e. on packets and not on timer restarts
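
        For reference, the established-state reduction amounts to
something like this (a condensed sketch, not the exact kernel
code; IPVS reads the two values from the sync_threshold sysctl
and also reports a few non-EST state changes):

enum { TCP_S_ESTABLISHED = 1 };	/* stand-in for IP_VS_TCP_S_ESTABLISHED */

struct conn_sketch {
	int state;		/* current TCP state */
	int old_state;		/* state at the previous check */
};

/* sync_threshold "T P": sync on the T-th packet, then once every
 * P packets while the state stays established; a state transition
 * is reported immediately */
static int threshold_period_sync_needed(const struct conn_sketch *cp,
					unsigned int pkts,
					unsigned int threshold,
					unsigned int period)
{
	if (cp->state != cp->old_state)
		return 1;	/* state change: always sync */
	return cp->state == TCP_S_ESTABLISHED &&
	       pkts % period == threshold;
}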

        Maybe what we need is something like this:

- add additional configuration that replaces the threshold+period
values when sync_retry_period is set:

        - sync_retry_period: example value: 10 seconds. The
backup will set cp->timeout = timeout + sync_retry_period. The
master will send a sync message for a received packet if we
enter a state that should be reported, or when the connection
timer is restarted in such a way that the last sync message was
sent more than sync_retry_period ago. The idea is to avoid sync
messages for sync_retry_period seconds if the state is the same,
nothing has changed and only the timeout is restarted, i.e. to
send one sync message every 10 seconds in established state.
Template conns will be updated in the same way, once per 10s.
When sync_retry_period is 0 we will use the threshold+period
parameters as before. (Both new knobs are sketched in code below.)

        - sync_retries: this parameter defines how many retries
should be sent. As adding an additional timer to the conn is not
cheap, maybe we can add just one unsigned long "sync_time" to
hold the jiffies when the sync message was generated, with the
lower 2 bits used to encode the retry count (0..3). For example,
when the state changes or sync_retry_period has passed, we should
send a sync message and set sync_time = (jiffies & ~3UL) | 0;

        If sync_retries is set to 1..3 we can update
sync_time on the following 1..3 packets. For example,
if a connection has 10 packets per second in established
state, we will send (1 + sync_retries) sync messages every
sync_retry_period seconds. Of course, such reliability
is guaranteed only if there are many packets within the
sync_retry_period. No retries will happen if we see only
one packet per sync_retry_period, because we don't use
a timer to retransmit sync messages.
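
        In code, the two knobs could look roughly like this. This
is only a sketch: sync_time, sync_retry_period and sync_retries do
not exist yet, and all names here are made up for illustration.

#define SYNC_RETRY_MASK	3UL	/* low 2 bits: retry count 0..3 */

/* jiffies comparison, same definition as in <linux/jiffies.h> */
#define time_after_eq(a, b)	((long)((a) - (b)) >= 0)

struct conn_sketch {
	int state;
	int old_state;
	unsigned long sync_time;	/* last sync jiffies + retry bits */
};

static unsigned long pack_sync_time(unsigned long stamp,
				    unsigned long retries)
{
	return (stamp & ~SYNC_RETRY_MASK) | (retries & SYNC_RETRY_MASK);
}

/* called on the master for every packet that hits the connection */
static int retry_period_sync_needed(struct conn_sketch *cp,
				    unsigned long now,		/* jiffies */
				    unsigned long retry_period,	/* jiffies */
				    unsigned long retries)	/* 0..3 */
{
	unsigned long stamp = cp->sync_time & ~SYNC_RETRY_MASK;
	unsigned long done = cp->sync_time & SYNC_RETRY_MASK;

	/* entering a reportable state: sync and restart the period */
	if (cp->state != cp->old_state) {
		cp->sync_time = pack_sync_time(now, 0);
		return 1;
	}
	/* same state, only the timer was restarted: at most one
	 * "main" message per retry_period */
	if (time_after_eq(now, stamp + retry_period)) {
		cp->sync_time = pack_sync_time(now, 0);
		return 1;
	}
	/* the next 1..3 packets repeat the message; the stamp is
	 * kept, so the period still counts from the first message */
	if (done < retries) {
		cp->sync_time = pack_sync_time(stamp, done + 1);
		return 1;
	}
	return 0;			/* stay quiet until the period ends */
}

        With retry_period = 10*HZ and retries = 2, a busy
established connection produces 3 sync messages every 10 seconds,
matching the (1 + sync_retries) count above.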

        This is what I'm going to try. Now the problem is
how to implement such an algorithm so that sync_time access
is atomic and no problem happens when many CPUs deliver
packets for the same connection.
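
        One lockless possibility for that (again just a sketch,
showing only the period check; state changes and retries would be
claimed the same way) is a single cmpxchg() on the unsigned long:
the CPU that wins the exchange sends the sync message, and a CPU
that loses sees the new stamp and backs off:

/* cmpxchg() is the kernel primitive; conn_sketch, pack_sync_time()
 * and SYNC_RETRY_MASK are from the previous sketch */
static int claim_sync(struct conn_sketch *cp, unsigned long now,
		      unsigned long retry_period)
{
	unsigned long old = cp->sync_time;
	unsigned long stamp = old & ~SYNC_RETRY_MASK;

	if (!time_after_eq(now, stamp + retry_period))
		return 0;
	/* cmpxchg() returns the previous value; anything else means
	 * another CPU updated sync_time first, so we send nothing */
	return cmpxchg(&cp->sync_time, old,
		       pack_sync_time(now, 0)) == old;
}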

> Regards,
> Aleksey

Regards

--
Julian Anastasov <ja@xxxxxx>
