[PATCH 1/3] IPVS: add wlib & wlip schedulers

To: Julian Anastasov <ja@xxxxxx>
Subject: [PATCH 1/3] IPVS: add wlib & wlip schedulers
Cc: Wensong Zhang <wensong@xxxxxxxxxxxx>, Simon Horman <horms@xxxxxxxxxxxx>, lvs-devel@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx
From: Chris Caputo <ccaputo@xxxxxxx>
Date: Tue, 20 Jan 2015 23:21:18 +0000 (UTC)
On Tue, 20 Jan 2015, Julian Anastasov wrote:
> On Sat, 17 Jan 2015, Chris Caputo wrote:
> > From: Chris Caputo <ccaputo@xxxxxxx> 
> > 
> > IPVS wlib (Weighted Least Incoming Byterate) and wlip (Weighted Least
> > Incoming Packetrate) schedulers, updated for 3.19-rc4.

Hi Julian,

Thanks for the review.

>       The IPVS estimator uses a 2-second timer to update
> the stats; isn't that a problem for such schedulers?
> Also, you schedule by incoming traffic rate, which is
> OK when clients mostly upload. But in the common case
> clients mostly download, and IPVS processes download
> traffic only for the NAT method.

My application consists of incoming TCP streams being load-balanced to
servers that receive the feeds. These are long-lived multi-gigabyte
streams, so I believe the estimator's 2-second timer is fine. As an
example:

# cat /proc/net/ip_vs_stats
   Total Incoming Outgoing         Incoming         Outgoing
   Conns  Packets  Packets            Bytes            Bytes
     9AB  58B7C17        0      1237CA2C325                0

 Conns/s   Pkts/s   Pkts/s          Bytes/s          Bytes/s
       1     387C        0          B16C4AE                0
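
The counters in /proc/net/ip_vs_stats are printed in hexadecimal, so the
sample above is easier to judge once decoded. A minimal C sketch follows; the
helper names are illustrative and not part of IPVS.

```c
#include <stdint.h>
#include <stdlib.h>

/* Parse one hexadecimal field as printed by /proc/net/ip_vs_stats.
 * Hypothetical helper; returns 0 for an empty or non-hex field. */
static uint64_t ipvs_hex_field(const char *field)
{
	char *end;
	unsigned long long v = strtoull(field, &end, 16);

	if (end == field)
		return 0;
	return (uint64_t)v;
}

/* Convert a bytes/s rate into gigabits per second (1 Gbit = 1e9 bits). */
static double bytes_per_sec_to_gbit(uint64_t bytes_per_sec)
{
	return (double)bytes_per_sec * 8.0 / 1e9;
}
```

With the sample above, ipvs_hex_field("B16C4AE") gives 186041518 bytes/s,
about 1.49 Gbit/s of incoming traffic, and ipvs_hex_field("387C") gives
14460 packets/s.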

>       Maybe a not-so-useful idea: use the sum of both directions,
> or control it with svc->flags & IP_VS_SVC_F_SCHED_WLIB_xxx
> flags (see how the "sh" scheduler supports flags), i.e.
> inbps + outbps.

I see a user-mode option as adding complexity. For example,
keepalived users would need a keepalived patched to set the new
flags, rather than just configuring "wlib" or "wlip" and having it
just work.

I think I'd rather see a wlob/wlop version for users that want to 
load-balance based on outgoing bytes/packets, and a wlb/wlp version for 
users that want them summed.
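
To make these variants concrete, here is a minimal user-space sketch of a
weighted-least-rate pick. It assumes the cross-multiplication comparison that
the IPVS wlc scheduler uses for connection counts (minimise rate/weight
without dividing); the struct, enum and function names are hypothetical, and
this is not the code from this patch series. The metric is inbps for wlib,
outbps for the proposed wlob, and their sum for the proposed wlb.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-destination view of the estimator's byte rates. */
struct dest_rate {
	uint64_t inbps;   /* incoming bytes/s */
	uint64_t outbps;  /* outgoing bytes/s */
	uint32_t weight;  /* configured weight; 0 means quiesced */
};

/* Which rate each scheduler variant minimises (names from this thread). */
enum rate_metric { METRIC_WLIB, METRIC_WLOB, METRIC_WLB };

static uint64_t dest_metric(const struct dest_rate *d, enum rate_metric m)
{
	switch (m) {
	case METRIC_WLIB:
		return d->inbps;
	case METRIC_WLOB:
		return d->outbps;
	default:
		return d->inbps + d->outbps; /* wlb: both directions summed */
	}
}

/* Return the index of the destination with the lowest metric/weight,
 * or -1 if every weight is zero. a/wa < b/wb is tested as a*wb < b*wa.
 * Assumes metric * weight fits in 64 bits (e.g. metrics below 2^32). */
static int pick_least(const struct dest_rate *dests, size_t n,
		      enum rate_metric m)
{
	int least = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (dests[i].weight == 0)
			continue;
		if (least < 0 ||
		    dest_metric(&dests[i], m) * dests[least].weight <
		    dest_metric(&dests[least], m) * dests[i].weight)
			least = (int)i;
	}
	return least;
}
```

Because the comparison is strict, ties keep the earlier destination; a real
scheduler would also need to decide how to treat destinations whose estimator
has not produced a rate yet.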

>       Another problem: pps and bps are shifted values;
> see how ip_vs_read_estimator() reads them. ip_vs_est.c
> contains comments saying this code handles a couple of
> gigabits. Maybe inbps and outbps in struct ip_vs_estimator
> should be changed to u64 to support more gigabits, in a
> separate patch.
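
The overflow follows from the scaling: bytes/s is stored multiplied by 2^5,
so a u32 field holds the scaled value only while bytes/s stays below
2^32 / 2^5 = 2^27 (134,217,728) by this arithmetic. A minimal sketch
comparing 32-bit and 64-bit storage; the function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Scale shift for byte rates, as described in the ip_vs_est.c comments. */
#define BPS_SHIFT 5

/* True if bytes_per_sec, once scaled by 2^5, still fits in a u32 field.
 * Assumes bytes_per_sec < 2^59 so the 64-bit shift itself cannot wrap. */
static bool scaled_bps_fits_u32(uint64_t bytes_per_sec)
{
	return (bytes_per_sec << BPS_SHIFT) <= UINT32_MAX;
}

/* What a u32 field would hold: the scaled value truncated modulo 2^32,
 * read back by undoing the scale with a plain right shift. */
static uint64_t roundtrip_u32(uint64_t bytes_per_sec)
{
	uint32_t stored = (uint32_t)(bytes_per_sec << BPS_SHIFT);

	return stored >> BPS_SHIFT;
}

/* The same round trip with a u64 field, as proposed in the patch below. */
static uint64_t roundtrip_u64(uint64_t bytes_per_sec)
{
	uint64_t stored = bytes_per_sec << BPS_SHIFT;

	return stored >> BPS_SHIFT;
}
```

For the B16C4AE (186,041,518) bytes/s sample above, the u32 round trip
returns 51,823,790, which is the rate minus 2^27, while the u64 version
returns it unchanged.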

See the patch below, which converts the bps fields in ip_vs_estimator to 64 bits.

Other patches, based on your feedback, to follow.

Thanks,
Chris

From: Chris Caputo <ccaputo@xxxxxxx> 

IPVS: Change inbps and outbps to 64 bits so that the estimator handles
faster flows. This also increases the maximum rate viewable at user level
from ~2.15Gbits/s to ~34.35Gbits/s.

Signed-off-by: Chris Caputo <ccaputo@xxxxxxx>
---
diff -uprN linux-3.19-rc5-stock/include/net/ip_vs.h linux-3.19-rc5/include/net/ip_vs.h
--- linux-3.19-rc5-stock/include/net/ip_vs.h    2015-01-18 06:02:20.000000000 +0000
+++ linux-3.19-rc5/include/net/ip_vs.h  2015-01-20 08:01:15.548177969 +0000
@@ -390,8 +390,8 @@ struct ip_vs_estimator {
        u32                     cps;
        u32                     inpps;
        u32                     outpps;
-       u32                     inbps;
-       u32                     outbps;
+       u64                     inbps;
+       u64                     outbps;
 };
 
 struct ip_vs_stats {
diff -uprN linux-3.19-rc5-stock/net/netfilter/ipvs/ip_vs_est.c linux-3.19-rc5/net/netfilter/ipvs/ip_vs_est.c
--- linux-3.19-rc5-stock/net/netfilter/ipvs/ip_vs_est.c 2015-01-18 06:02:20.000000000 +0000
+++ linux-3.19-rc5/net/netfilter/ipvs/ip_vs_est.c       2015-01-20 08:01:34.369840704 +0000
@@ -45,10 +45,12 @@
 
   NOTES.
 
-  * The stored value for average bps is scaled by 2^5, so that maximal
-    rate is ~2.15Gbits/s, average pps and cps are scaled by 2^10.
+  * Average bps is scaled by 2^5, while average pps and cps are scaled by 2^10.
 
-  * A lot code is taken from net/sched/estimator.c
+  * All are reported to user level as 32-bit unsigned values. Bps can
+    overflow for fast links: max speed being ~34.35Gbits/s.
+
+  * A lot of code is taken from net/core/gen_estimator.c
  */
 
 
@@ -98,7 +100,7 @@ static void estimation_timer(unsigned lo
        u32 n_conns;
        u32 n_inpkts, n_outpkts;
        u64 n_inbytes, n_outbytes;
-       u32 rate;
+       u64 rate;
        struct net *net = (struct net *)arg;
        struct netns_ipvs *ipvs;
 
@@ -118,23 +120,24 @@ static void estimation_timer(unsigned lo
                /* scaled by 2^10, but divided 2 seconds */
                rate = (n_conns - e->last_conns) << 9;
                e->last_conns = n_conns;
-               e->cps += ((long)rate - (long)e->cps) >> 2;
+               e->cps += ((s64)rate - (s64)e->cps) >> 2;
 
                rate = (n_inpkts - e->last_inpkts) << 9;
                e->last_inpkts = n_inpkts;
-               e->inpps += ((long)rate - (long)e->inpps) >> 2;
+               e->inpps += ((s64)rate - (s64)e->inpps) >> 2;
 
                rate = (n_outpkts - e->last_outpkts) << 9;
                e->last_outpkts = n_outpkts;
-               e->outpps += ((long)rate - (long)e->outpps) >> 2;
+               e->outpps += ((s64)rate - (s64)e->outpps) >> 2;
 
+               /* scaled by 2^5, but divided 2 seconds */
                rate = (n_inbytes - e->last_inbytes) << 4;
                e->last_inbytes = n_inbytes;
-               e->inbps += ((long)rate - (long)e->inbps) >> 2;
+               e->inbps += ((s64)rate - (s64)e->inbps) >> 2;
 
                rate = (n_outbytes - e->last_outbytes) << 4;
                e->last_outbytes = n_outbytes;
-               e->outbps += ((long)rate - (long)e->outbps) >> 2;
+               e->outbps += ((s64)rate - (s64)e->outbps) >> 2;
                spin_unlock(&s->lock);
        }
        spin_unlock(&ipvs->est_lock);
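
The per-tick update in estimation_timer() is an exponentially weighted moving
average: every 2 seconds the new scaled sample pulls the stored value a
quarter of the way toward itself. The following user-space sketch replays
that update for one byte counter with the patch's u64/s64 types. The names
are hypothetical, the timer and locking are omitted, and the read-back uses a
plain >> 5 (the kernel's reader may round differently).

```c
#include <stdint.h>

/* Estimator state for one byte counter, mirroring the patched fields. */
struct byte_est {
	uint64_t last_bytes; /* counter value at the previous tick */
	uint64_t bps;        /* average bytes/s, scaled by 2^5 */
};

/* One 2-second tick: delta << 4 is bytes/s scaled by 2^5, since the
 * delta covers 2 seconds; the average then moves 1/4 of the way. */
static void est_tick(struct byte_est *e, uint64_t total_bytes)
{
	uint64_t rate = (total_bytes - e->last_bytes) << 4;

	e->last_bytes = total_bytes;
	/* Signed difference so a falling rate shrinks the average; relies on
	 * arithmetic right shift of negative values, as GCC provides. */
	e->bps += ((int64_t)rate - (int64_t)e->bps) >> 2;
}

/* Undo the 2^5 scaling (plain shift, no rounding). */
static uint64_t est_read_bps(const struct byte_est *e)
{
	return e->bps >> 5;
}

/* Feed 'ticks' intervals of a constant rate; return the reported bytes/s. */
static uint64_t simulate_constant_rate(uint64_t bytes_per_sec, int ticks)
{
	struct byte_est e = { 0, 0 };
	uint64_t total = 0;
	int i;

	for (i = 0; i < ticks; i++) {
		total += 2 * bytes_per_sec; /* bytes seen during one 2 s tick */
		est_tick(&e, total);
	}
	return est_read_bps(&e);
}
```

For a constant 186,041,518 bytes/s, the reported value after one tick is
only 46,510,379 (a quarter of the true rate), and it needs roughly half a
minute of ticks to come within about 1%. That lag is what the 2-second timer
question is about; for long-lived streams it settles and then tracks the rate.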