Re: [lvs-users] ipvs connections sync and CPU usage

To: Aleksey Chudov <aleksey.chudov@xxxxxxxxx>
Subject: Re: [lvs-users] ipvs connections sync and CPU usage
Cc: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From: Julian Anastasov <ja@xxxxxx>
Date: Tue, 27 Dec 2011 00:03:45 +0200 (EET)
        Hello,

On Mon, 26 Dec 2011, Aleksey Chudov wrote:

> Hello,
> 
> Thanks for the answer.
> 
> >> Is it possible to implement change schedule_timeout_interruptible via 
> >> sysctl?
> > May be better to implement logic with auto-adjustment.
> 
> Auto-adjustment looks much better.
> 
> > Using in_pkts for templates is not a good idea.
> > As drops are possible, it can be done more often but not every time as for 
> > sync version 0.
> > Also, before ip_vs_conn_expire() we do not know if template life will be 
> > extended.
> > May be backup server should use longer timeout for templates, so that it 
> > can not miss
> > the sync packets during the extended period.
> 
> > So, now the question is how to properly reduce the rate of sync packets for 
> > templates
> > and may be for other conns when state is not changed but its life is 
> > extended. I have
> > to think for some time about such changes.
> 
> > Can you try such change: in ip_vs_sync_conn() comment the following two 
> > lines under
> > "Reduce sync rate for templates":
> >     if (pkts % sysctl_sync_period(ipvs) != 1)
> >             return;
> 
> > By this way we will sync templates every time a normal connection is 
> > synced, as for
> > version 0. It is still too often for templates but now you can try again 
> > with "3 100",
> > so that we can see if the difference is reduced.
> 
> > BTW, what is the persistence timeout value?
> 
> Persistence timeout is 1800 (30 min). It is application specific.
> 
> Tried the following:
> 
> Linux Kernel 2.6.39.4 + LVS Fwmark
> 
> iptables -t mangle -A PREROUTING -d VIP -i bond0 -p tcp -m multiport --dports 
> 80,443 -j MARK --set-mark 1
> 
> ipvsadm -A -f 1 -s wlc -p 1800
> -a -f 1 -r 1.1.1.1:0 -i -w 100
> -a -f 1 -r 1.1.1.2:0 -i -w 100
> ...
> -a -f 1 -r 1.1.X.X:0 -i -w 100
> (320 servers total)

        OK, this is the IPIP method. It seems we have a problem
with rport 0. See the appended patch. It should allow
synced conns in the backup to find their real server. As a result,
the inact/act counters should work again and CPU usage should
be lower, because until now we failed to bind to the real server
for every sync message for the connection.

> # ipvsadm -l --daemon
> master sync daemon (mcast=eth3, syncid=1)
> backup sync daemon (mcast=eth3,syncid=1)
> 
> 1. ip_vs_sync_conn original,  schedule_timeout_interruptible(HZ/10) and 
> sync_threshold = "3  10"
> Results: sync traffic 60 Mbit/s, 6000 packets/sec, 60 %sys CPU on Backup node,
> 8% difference in persistent connections between Master and Backup nodes,
> netstat -s on Master SndbufErrors: 0
> 
> 2. ip_vs_sync_conn patched,  schedule_timeout_interruptible(HZ/10) and 
> sync_threshold = "3  10"
> Results: sync traffic 100 Mbit/s, 8500 packets/sec, 93 %sys CPU on Backup 
> node,
> <1% difference in persistent connections between Master and Backup nodes,
> netstat -s on Master SndbufErrors: 0
> 
> 3. ip_vs_sync_conn patched,  schedule_timeout_interruptible(HZ/10) and 
> sync_threshold = "3  100"
> Results: sync traffic 70 Mbit/s, 6000 packets/sec, 70 %sys CPU on Backup node,
> ~2% difference in persistent connections between Master and Backup nodes,
> netstat -s on Master SndbufErrors: 0
> 
> 4. ip_vs_sync_conn patched,  schedule_timeout_interruptible(HZ/10) and 
> sync_threshold = "3  200"
> Results: sync traffic 66 Mbit/s, 5800 packets/sec, 66 %sys CPU on Backup node,
> ~3% difference in persistent connections between Master and Backup nodes,
> netstat -s on Master SndbufErrors: 0
> 
> 5. ip_vs_sync_conn patched,  schedule_timeout_interruptible(HZ/10) and 
> sync_threshold = "3  1000"
> Results: sync traffic 64 Mbit/s, 5600 packets/sec, 64 %sys CPU on Backup node,
> ~3% difference in persistent connections between Master and Backup nodes,
> netstat -s on Master SndbufErrors: 0
> 
> In all test I can't check difference in Active and InAct connections because 
> ipvsadm does not show
> Active and InAct connections on Backup node for Fwmark virtual, only Persist 
> connections.
> 
> There is no significant differences in sync traffic after "3  100".

        Yes, it depends on how many packets we see for conns.
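
        As a rough illustration of why the traffic stops dropping
after "3 100": assume a simplified model in which a conn is synced
whenever pkts % period == threshold, ignoring syncs caused by state
changes and by templates. The helper name sync_msgs is hypothetical
and not part of IPVS; this is only a sketch of the arithmetic:

```c
/*
 * Simplified model (an assumption, not the kernel code): a conn is
 * synced each time its packet counter satisfies
 *     pkts % period == threshold
 * Syncs triggered by state changes or by templates are ignored.
 */
static unsigned int sync_msgs(unsigned int npkts, unsigned int threshold,
                              unsigned int period)
{
        unsigned int pkts, n = 0;

        if (period == 0)        /* avoid division by zero */
                return 0;

        /* Walk the packet counter as it would grow for one conn */
        for (pkts = 1; pkts <= npkts; pkts++)
                if (pkts % period == threshold)
                        n++;
        return n;
}
```

Under this model a conn of 25 packets produces 3 messages with "3 10"
(at packets 3, 13 and 23) but only 1 message with "3 100", "3 200" or
"3 1000", so raising the period further changes nothing for conns
that carry fewer packets than the period.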

> >> As mentioned in another report 
> >> http://www.gossamer-threads.com/lists/lvs/users/24331
> >> after switching from TCP VIP to Fwmark %sys CPU is raised from 40 - 50 
> >> % (TCP VIP) to 80 - 100 % (Fwmark) with no difference in sync traffic.
> 
> Could you explain why %sys CPU is raised with Fwmark? 
> Could you explain why ipvsadm does not show Active and InAct connections on 
> Backup node for Fwmark virtual?

        Yes, we try to bind to dest for every sync message
without success because conns come with dport=80/443 while
the real server port is 0. Only the template conns find the
server because they have rport 0. I hope the appended patch
fixes it. How much better is the CPU usage then?
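
        The lookup order in the appended patch can be modelled in
user space roughly as follows. This is only a sketch: struct
real_server, lookup_dest and find_dest are hypothetical stand-ins for
the kernel structures and functions, and the masq argument stands in
for the forwarding-method check on the conn flags:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a real server entry (address + port) */
struct real_server {
        uint32_t addr;
        uint16_t port;
};

/* Exact match on address and port, like a plain dest lookup */
static const struct real_server *
lookup_dest(const struct real_server *tbl, size_t n,
            uint32_t addr, uint16_t port)
{
        size_t i;

        for (i = 0; i < n; i++)
                if (tbl[i].addr == addr && tbl[i].port == port)
                        return &tbl[i];
        return NULL;
}

/*
 * Try the most likely port first, then the other one.
 * For fwmark services that do not use masquerading the real server
 * is expected to have port 0, so start with 0 in that case.
 */
static const struct real_server *
find_dest(const struct real_server *tbl, size_t n, uint32_t addr,
          uint16_t dport, uint32_t fwmark, int masq)
{
        const struct real_server *dest;
        uint16_t port = dport;

        if (fwmark && !masq)
                port = 0;
        dest = lookup_dest(tbl, n, addr, port);
        if (!dest)
                /* port ^ dport is dport if port was 0, and 0 if
                 * port was dport, i.e. the value not yet tried */
                dest = lookup_dest(tbl, n, addr, port ^ dport);
        return dest;
}
```

With a real server registered on port 0, a synced conn arriving with
dport 80 fails the exact match but is found by find_dest, either on
the first try (fwmark, non-masq) or on the fallback.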

> Regards,
> Aleksey

Subject: [PATCH] ipvs: try also real server with port 0 in backup server

        We should not forget to try for a real server with port 0
in the backup server when processing the sync message. We should
do it in all cases because the backup server can use a different
forwarding method.

Signed-off-by: Julian Anastasov <ja@xxxxxx>
---
 include/net/ip_vs.h             |    2 +-
 net/netfilter/ipvs/ip_vs_conn.c |    2 +-
 net/netfilter/ipvs/ip_vs_ctl.c  |   10 ++++++++--
 net/netfilter/ipvs/ip_vs_sync.c |    2 +-
 4 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 48fd12e..ebe517f 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1207,7 +1207,7 @@ extern void ip_vs_control_cleanup(void);
 extern struct ip_vs_dest *
 ip_vs_find_dest(struct net *net, int af, const union nf_inet_addr *daddr,
                __be16 dport, const union nf_inet_addr *vaddr, __be16 vport,
-               __u16 protocol, __u32 fwmark);
+               __u16 protocol, __u32 fwmark, __u32 flags);
 extern struct ip_vs_dest *ip_vs_try_bind_dest(struct ip_vs_conn *cp);
 
 
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index 12571fb..29fa5ba 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -616,7 +616,7 @@ struct ip_vs_dest *ip_vs_try_bind_dest(struct ip_vs_conn *cp)
        if ((cp) && (!cp->dest)) {
                dest = ip_vs_find_dest(ip_vs_conn_net(cp), cp->af, &cp->daddr,
                                       cp->dport, &cp->vaddr, cp->vport,
-                                      cp->protocol, cp->fwmark);
+                                      cp->protocol, cp->fwmark, cp->flags);
                ip_vs_bind_dest(cp, dest);
                return dest;
        } else
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 008bf97..e1a66cf 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -619,15 +619,21 @@ struct ip_vs_dest *ip_vs_find_dest(struct net  *net, int af,
                                   const union nf_inet_addr *daddr,
                                   __be16 dport,
                                   const union nf_inet_addr *vaddr,
-                                  __be16 vport, __u16 protocol, __u32 fwmark)
+                                  __be16 vport, __u16 protocol, __u32 fwmark,
+                                  __u32 flags)
 {
        struct ip_vs_dest *dest;
        struct ip_vs_service *svc;
+       __be16 port = dport;
 
        svc = ip_vs_service_get(net, af, fwmark, protocol, vaddr, vport);
        if (!svc)
                return NULL;
-       dest = ip_vs_lookup_dest(svc, daddr, dport);
+       if (fwmark && (flags & IP_VS_CONN_F_FWD_MASK) != IP_VS_CONN_F_MASQ)
+               port = 0;
+       dest = ip_vs_lookup_dest(svc, daddr, port);
+       if (!dest)
+               dest = ip_vs_lookup_dest(svc, daddr, port ^ dport);
        if (dest)
                atomic_inc(&dest->refcnt);
        ip_vs_service_put(svc);
diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index bcf5563..8a0d6d6 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -740,7 +740,7 @@ static void ip_vs_proc_conn(struct net *net, struct ip_vs_conn_param *param,
                 * but still handled.
                 */
                dest = ip_vs_find_dest(net, type, daddr, dport, param->vaddr,
-                                      param->vport, protocol, fwmark);
+                                      param->vport, protocol, fwmark, flags);
 
                /*  Set the approprite ativity flag */
                if (protocol == IPPROTO_TCP) {
-- 
1.7.3.4

