To: Julian Anastasov <ja@xxxxxx>
Subject: Re: [RFC PATCHv6 3/7] ipvs: use u64_stats_t for the per-cpu counters
Cc: Simon Horman <horms@xxxxxxxxxxxx>, lvs-devel@xxxxxxxxxxxxxxx, yunhong-cgl jiang <xintian1976@xxxxxxxxx>, dust.li@xxxxxxxxxxxxxxxxx
From: Jiri Wiesner <jwiesner@xxxxxxx>
Date: Sat, 12 Nov 2022 10:09:10 +0100
On Sat, Nov 12, 2022 at 10:00:01AM +0100, Jiri Wiesner wrote:
> On Mon, Oct 31, 2022 at 04:56:43PM +0200, Julian Anastasov wrote:
> > Use the provided u64_stats_t type to avoid
> > load/store tearing.

I think only 32-bit machines have a problem loading and storing a
64-bit counter atomically, and there u64_stats_fetch_begin() and
u64_stats_fetch_retry() should take care of that on the reader side.
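
For reference, the read side then follows the usual u64_stats pattern.
A sketch from memory, assuming the cnt/syncp layout of struct
ip_vs_cpu_stats, with cpustats standing in for the per-cpu stats
pointer (not the exact patch code):

	struct ip_vs_cpu_stats *s = per_cpu_ptr(cpustats, cpu);
	unsigned int start;
	u64 conns;

	do {
		start = u64_stats_fetch_begin(&s->syncp);
		conns = u64_stats_read(&s->cnt.conns);
	} while (u64_stats_fetch_retry(&s->syncp, start));

On 64-bit, u64_stats_fetch_begin() and u64_stats_fetch_retry() compile
away and only the READ_ONCE() inside u64_stats_read() remains.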

> > 
> > Fixes: 316580b69d0a ("u64_stats: provide u64_stats_t type")
> > Signed-off-by: Julian Anastasov <ja@xxxxxx>
> > ---
> >  include/net/ip_vs.h             | 10 +++++-----
> >  net/netfilter/ipvs/ip_vs_core.c | 30 +++++++++++++++---------------
> >  net/netfilter/ipvs/ip_vs_ctl.c  | 10 +++++-----
> >  net/netfilter/ipvs/ip_vs_est.c  | 20 ++++++++++----------
> >  4 files changed, 35 insertions(+), 35 deletions(-)
> > 
> > diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
> > index e5582c01a4a3..a4d44138c2a8 100644
> > --- a/include/net/ip_vs.h
> > +++ b/include/net/ip_vs.h
> > @@ -351,11 +351,11 @@ struct ip_vs_seq {
> >  
> >  /* counters per cpu */
> >  struct ip_vs_counters {
> > -   __u64           conns;          /* connections scheduled */
> > -   __u64           inpkts;         /* incoming packets */
> > -   __u64           outpkts;        /* outgoing packets */
> > -   __u64           inbytes;        /* incoming bytes */
> > -   __u64           outbytes;       /* outgoing bytes */
> > +   u64_stats_t     conns;          /* connections scheduled */
> > +   u64_stats_t     inpkts;         /* incoming packets */
> > +   u64_stats_t     outpkts;        /* outgoing packets */
> > +   u64_stats_t     inbytes;        /* incoming bytes */
> > +   u64_stats_t     outbytes;       /* outgoing bytes */
> >  };
> >  /* Stats per cpu */
> >  struct ip_vs_cpu_stats {
> 
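On the write side, the conversion means updating the counters through
the u64_stats helpers. My sketch of the usual pattern, again with
cpustats as a stand-in name; ip_vs_core.c in the patch has the real
code:

	struct ip_vs_cpu_stats *s = this_cpu_ptr(cpustats);

	u64_stats_update_begin(&s->syncp);
	u64_stats_inc(&s->cnt.inpkts);
	u64_stats_add(&s->cnt.inbytes, skb->len);
	u64_stats_update_end(&s->syncp);
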
> Converting the per-cpu stats to u64_stats_t means that the compiler
> can no longer fold the load and the addition into a single
> instruction on x86_64. Previously, the summation of per-cpu counters
> in ip_vs_chain_estimation() looked like:
> 15b65:  add    (%rsi),%r14
> 15b68:  add    0x8(%rsi),%r15
> 15b6c:  add    0x10(%rsi),%r13
> 15b70:  add    0x18(%rsi),%r12
> 15b74:  add    0x20(%rsi),%rbp
> The u64_stats_read() calls in ip_vs_chain_estimation() turned it into:
> 159d5:  mov    (%rcx),%r11
> 159d8:  mov    0x8(%rcx),%r10
> 159dc:  mov    0x10(%rcx),%r9
> 159e0:  mov    0x18(%rcx),%rdi
> 159e4:  mov    0x20(%rcx),%rdx
> 159e8:  add    %r11,%r14
> 159eb:  add    %r10,%r13
> 159ee:  add    %r9,%r12
> 159f1:  add    %rdi,%rbp
> 159f4:  add    %rdx,%rbx
> I guess that is not a big deal because the mov should be the
> instruction taking the most time, on account of accessing the per-CPU
> regions of other CPUs. The adds themselves are fast.

Another concern is the number of registers needed for the summation:
previously 5, now 10. This would matter mostly if it changed the size
of the stack frame of ip_vs_chain_estimation() or the number of
callee-saved registers spilled to the stack, but introducing
u64_stats_read() in ip_vs_chain_estimation() changed neither.
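
For completeness, the codegen difference comes from the volatile
access in u64_stats_read(). On 64-bit the type and the reader are
roughly as follows (quoting include/linux/u64_stats_sync.h from
memory, so check the header for the exact definitions):

	typedef struct {
		u64 v;
	} u64_stats_t;

	static inline u64 u64_stats_read(const u64_stats_t *p)
	{
		/* volatile load: not foldable into an add with a
		 * memory operand, hence the separate mov/add pairs
		 * in the disassembly above */
		return READ_ONCE(p->v);
	}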

-- 
Jiri Wiesner
SUSE Labs
