Re: [lvs-users] ipvs connections sync and CPU usage

To: Aleksey Chudov <aleksey.chudov@xxxxxxxxx>
Subject: Re: [lvs-users] ipvs connections sync and CPU usage
Cc: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From: Julian Anastasov <ja@xxxxxx>
Date: Sun, 15 Jan 2012 18:48:34 +0200 (EET)
        Hello,

On Thu, 12 Jan 2012, Aleksey Chudov wrote:

> Hello Julian,
> 
> I successfully patched Linux Kernel 2.6.39.4 with the "port 0", "HZ/10" and
> "sync" patches.
> After rebooting and transitioning the Backup server to the Master state I see
> an increase in sync traffic and CPU load respectively.
> 
> 
> 1. "port 0", "HZ/10" patches on Master, "port 0", "HZ/10" patches on Backup
> 
> Master # sysctl -a | grep net.ipv4.vs.sync
> net.ipv4.vs.sync_version = 1
> net.ipv4.vs.sync_threshold = 3  10
> 
> Results: sync traffic 60 Mbit/s, 4000 packets/sec, 40 %sys CPU on Backup, 40
> %soft CPU on Master
> 
> PersistConn: 93.5305%
> ActiveConn: 98.1211%
> InActConn: 99.4691%
> (less on Backup server)

        For test 1 I think the traffic is low because the master
syncs conn templates only once per sysctl_sync_period. The problem
here is that the difference for PersistConn becomes high if
sync_period is large.
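
        Just to make the numbers easier to read: with
sync_threshold = "3 10" the master syncs a connection on its 3rd
packet and then on every 10th packet after that (the real check
also syncs on some TCP state changes). A trivial userspace toy,
not kernel code:

/* toy simulation of sync_threshold "3 10": sync on the 3rd packet,
 * then every 10th packet (13, 23, 33, ...) */
#include <stdio.h>

int main(void)
{
	int sync_threshold = 3, sync_period = 10;
	int pkts;

	for (pkts = 1; pkts <= 40; pkts++)
		if (pkts % sync_period == sync_threshold)
			printf("sync at packet %d\n", pkts);
	return 0;
}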

> 2. "port 0", "HZ/10" and "sync" patches on Master, "port 0", "HZ/10" patches
> on Backup
> 
> Master # sysctl -a | grep net.ipv4.vs.sync
> net.ipv4.vs.sync_version = 1
> net.ipv4.vs.sync_threshold = 3  10
> net.ipv4.vs.sync_refresh_period = 0
> net.ipv4.vs.sync_retries = 0
> 
> Results: sync traffic 300 Mbit/s (rose from 200 to 300 during the first 10
> minutes after start), 25000 packets/sec,
> 98 %sys CPU on Backup, 70 - 90 %soft CPU on Master (all cores)

        If sync_refresh_period is used we follow specific goals
when syncing conn templates, but maybe we need a better policy
for the case where only the sync_threshold values are in effect.
With the latest patch I restore the rate for templates to what
version 0 did: sync templates every time the normal connections
are synced based on sync_threshold. I don't know if you have stats
for old tests where version 0 was used, maybe on older kernels.
Maybe the rate will be the same because we sync templates too often.
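
        To sketch that idea in one place (only a toy userspace
illustration of the concept, not the real code or the patch): when
a normal connection is queued for sync, its controlling template
is followed and synced too:

/* toy illustration: syncing a normal connection also refreshes its
 * controlling persistence template on the backup */
#include <stdio.h>

struct conn {
	const char *name;
	struct conn *control;	/* persistence template, if any */
};

static void sync_conn(struct conn *cp)
{
	printf("sync %s\n", cp->name);
	if (cp->control)
		sync_conn(cp->control);	/* follow the template too */
}

int main(void)
{
	struct conn tmpl = { "template CIP:0 -> fwmark", NULL };
	struct conn c = { "conn CIP:50610 -> RIP:80", &tmpl };

	sync_conn(&c);	/* prints the conn and then its template */
	return 0;
}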

> PersistConn: 99.6622%
> ActiveConn: 250.664%

        But this ActiveConn value is strange; are some sync
messages dropped or lost here due to the high sync traffic?
Does even HZ/10 not help here?

> InActConn: 35.802%
> 
> Yes, the number of ActiveConn on the Backup server is more than twice as high!
> Also, memory usage on the Backup server increased.
> And this is the only test in which the load on the Master server increased.
> Thus the impression is that something went wrong.

        If it happens in the first 10 minutes and the persistence
is 30 minutes, it means the problem is not that we try to sync
conns on timer expiration. It should come from the fact that
we sync conn templates every time the normal conns are
synced. I guess a persistence session has ~5 connections
on average, so the rate is increased from 5000 to 25000.

> 3. "port 0", "HZ/10" and "sync" patches on Master, "port 0", "HZ/10" patches
> on Backup
> 
> Master # sysctl -a | grep net.ipv4.vs.sync
> net.ipv4.vs.sync_version = 1
> net.ipv4.vs.sync_threshold = 0  0
> net.ipv4.vs.sync_refresh_period = 10
> net.ipv4.vs.sync_retries = 0
> 
> Results: sync traffic 90 Mbit/s, 8000 packets/sec, 70 %sys CPU on Backup, 40
> %soft CPU on Master
> 
>        PersistConn ActiveConn  InActConn
> Master:    4897491    7073690    7663812
> Backup:    5057332    7073285    7625001
>            103.26%     99.99%     99.49%

        Yes, sync_refresh_period=10 is maybe too strict, but
it gives good numbers. Maybe it should be increased for loaded
sites.

> 4. "port 0", "HZ/10" and "sync" patches on Master, "port 0", "HZ/10" patches
> on Backup
> 
> Master # sysctl -a | grep net.ipv4.vs.sync
> net.ipv4.vs.sync_version = 1
> net.ipv4.vs.sync_threshold = 0  0
> net.ipv4.vs.sync_refresh_period = 100
> net.ipv4.vs.sync_retries = 0
> 
> Results: sync traffic 60 Mbit/s, 5000 packets/sec, 50 %sys CPU on Backup, 40
> %soft CPU on Master
> 
>        PersistConn ActiveConn  InActConn
> Master:    5170205    7270767    7808097
> Backup:    5036484    7244686    7716304
>             97.41%     99.64%     98.82%

        It looks ok for your load. Note that any difference
in PersistConn can be ok as long as the persistence
timeout is above the longest session time. Then we are
ok even if templates are synced only once.

> 5. "port 0", "HZ/10" and "sync" patches on Master, "port 0", "HZ/10" patches
> on Backup
> 
> Master # sysctl -a | grep net.ipv4.vs.sync
> net.ipv4.vs.sync_version = 1
> net.ipv4.vs.sync_threshold = 0  0
> net.ipv4.vs.sync_refresh_period = 1000
> net.ipv4.vs.sync_retries = 0
> 
> Results: sync traffic 45 Mbit/s, 4000 packets/sec, 40 %sys CPU on Backup, 40
> %soft CPU on Master
> 
>        PersistConn ActiveConn  InActConn
> Master:    5226648    7691901    8251039
> Backup:    5100281    7576195    8159248
>             97.58%     98.50%     98.89%

        A large value for sync_refresh_period is still ok
for the default 15-minute timeout of the ESTABLISHED state,
but maybe it hurts the synchronization of conn templates.
A value of 1000 can give a big difference for the 1800-second
persistence timeout, but somehow it works here. Even your
latest test during the peak hour shows a good result: 96.06%.
OTOH, we use the persistence timeout as the minimum time
during which conns are scheduled to the same real server, and
we extend it when a new normal connection is created. If for
your setup you see 96% with sync_refresh_period=1000, it means
the normal connections from a client are created early, i.e.
the sessions are short (e.g. the normal conns are created in
the first 74 seconds) but you are using a too large value for
the persistence timeout. Because 96.06% means that even if
templates are refreshed once per 900 seconds, they expire
nearly at the same time on master and backup, with ~74 seconds
difference. Maybe they are synced only once, when they are
created, and are never extended beyond the 1800 seconds
after the last normal connection is created. For example:

+0 First conn in session, restart timer, sync (only once in 900s)
+74 Last conn in session, restart timer, no sync until +900
+1800 Expire in backup
+1874 Expire in master

1800 is 96.06% of 1874; of course, this 74 is an average value.
In such a case I don't think you need more than one sync message
for templates, and any value in the 128-200 range should give the
same results as 1000.

> Of course these are quick results. More time is needed to get reliable
> counters.
>
> Do I understand correctly that the maximum safe sync_refresh_period depends
> on the persistence timeout?

        Yes, but we effectively clamp it to half of the timeout.
That means for TCP-EST with the default of 15 minutes we will use
at most 450, while for your 1800-second persistence timeout the
value used is 900 even if you provide 1000. Users with low traffic
can use sync_refresh_period=10 for more accurate results.
Note that if a conn template is still referenced by its normal
connections after the 1800-second period expires, its timeout
is changed from 1800 to 60 seconds, and this can increase the
sync traffic.
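
        As a small sketch of that clamping (only an illustration
of the rule above, not the exact kernel code):

/* toy illustration: the refresh period actually used is clamped to
 * half of the connection timeout */
#include <stdio.h>

static long effective_refresh(long refresh_period, long conn_timeout)
{
	long limit = conn_timeout / 2;	/* clamp to half of the timeout */

	return refresh_period > limit ? limit : refresh_period;
}

int main(void)
{
	/* 1000 requested, TCP EST timeout 900s -> 450 used */
	printf("%ld\n", effective_refresh(1000, 900));
	/* 1000 requested, persistence 1800s    -> 900 used */
	printf("%ld\n", effective_refresh(1000, 1800));
	return 0;
}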

> Have you thought about the possibility of distributing the sync load across
> multiple processors?

        May be it is possible to use additional netlink
parameter to provide the desired number of threads.

> Looked closely at ipvsadm -lnc output and found that we have the following
> connections:
> 
> pro expire state       source             virtual            destination
> IP  05:02  NONE        Client IP:0        0.0.0.1:0          Real IP:0
> TCP 14:10  ESTABLISHED Client IP:50610    Virtual IP:80      Real IP:80
> TCP 14:10  ESTABLISHED Client IP:50619    Virtual IP:443     Real IP:443
> ...
> 
> Do we really need "Virtual IP:Port" information for Fwmark? Can we use
> "0.0.0.1:80" or better "0.0.0.1:0"?
> 1. With "0.0.0.1:80" we can sync connections to LVS servers with different
> VIPs (in different data centers for example) - very useful for scalability

        You mean, to avoid syncing normal conns when the conn
templates are synced anyway? The problem is that on a role switch
we should have a correct cp->state. We do not create conns for
packets without the SYN bit. Of course, it can work for UDP, and
when no real servers are failing and being replaced.

> 2. With "0.0.0.1:0" we can reduce the number of connection entries by
> aggregating them

        This is what we use: dport=0, a single conn template
for all normal connections from a client that have the same mark.

> Are there any potential problems?

        Without syncing the normal connections to the backup
we cannot be sure that they are really using the real server
that is currently assigned to the connection templates. And
the stateful method that is currently used does not allow us
to avoid such syncing. The new persistence engines will
require syncing. But for some common cases it may work to
reduce the sync traffic when persistence is used without
stateful inspection.

Regards

--
Julian Anastasov <ja@xxxxxx>

_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/

LinuxVirtualServer.org mailing list - lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Send requests to lvs-users-request@xxxxxxxxxxxxxxxxxxxxxx
or go to http://lists.graemef.net/mailman/listinfo/lvs-users
