Re: [PATCH 03/18] ipvs: zero percpu stats

To: Julian Anastasov <ja@xxxxxx>
Subject: Re: [PATCH 03/18] ipvs: zero percpu stats
Cc: Simon Horman <horms@xxxxxxxxxxxx>, netdev@xxxxxxxxxxxxxxx, netfilter-devel@xxxxxxxxxxxxxxx, netfilter@xxxxxxxxxxxxxxx, lvs-devel@xxxxxxxxxxxxxxx, Hans Schillstrom <hans@xxxxxxxxxxxxxxx>
From: Eric Dumazet <eric.dumazet@xxxxxxxxx>
Date: Sun, 13 Mar 2011 11:57:32 +0100
On Sunday, 6 March 2011 at 14:18 +0200, Julian Anastasov wrote:
>       Hello,
> 
> On Sun, 6 Mar 2011, Eric Dumazet wrote:
> 
> >>    Zero the new percpu stats because we copy from there.
> >>
> >> Signed-off-by: Julian Anastasov <ja@xxxxxx>
> >> Signed-off-by: Simon Horman <horms@xxxxxxxxxxxx>
> >> ---
> >>  net/netfilter/ipvs/ip_vs_ctl.c |   17 +++++++++++++++++
> >>  1 files changed, 17 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
> >> index a2a67ad..fd74527 100644
> >> --- a/net/netfilter/ipvs/ip_vs_ctl.c
> >> +++ b/net/netfilter/ipvs/ip_vs_ctl.c
> >> @@ -715,8 +715,25 @@ static void ip_vs_trash_cleanup(struct net *net)
> >>  static void
> >>  ip_vs_zero_stats(struct ip_vs_stats *stats)
> >>  {
> >> +  struct ip_vs_cpu_stats *cpustats = stats->cpustats;
> >> +  int i;
> >> +
> >>    spin_lock_bh(&stats->lock);
> >>
> >> +  for_each_possible_cpu(i) {
> >> +          struct ip_vs_cpu_stats *u = per_cpu_ptr(cpustats, i);
> >> +          unsigned int start;
> >> +
> >> +          /* Do not pretend to be writer, it is enough to
> >> +           * sync with writers that modify the u64 counters
> >> +           * because under stats->lock we are the only reader.
> >> +           */
> >> +          do {
> >> +                  start = u64_stats_fetch_begin(&u->syncp);
> >> +                  memset(&u->ustats, 0, sizeof(u->ustats));
> >> +          } while (u64_stats_fetch_retry(&u->syncp, start));
> >
> >
> > Sorry this makes no sense to me.
> 
>       Hm, yes, the comment is a little bit misleading.
> I fixed it below...
> 
> > This code _is_ a writer, and hardly a hot path.
> 
>       Yes, the picture is as follows:
> 
> - in 2.6.38-rc we remove the global spin lock (stats->lock)
> from packet processing which is a hot path, adding percpu
> counters instead
> 
> - we need protection for percpu counters and for the sum
> 
> - the chain is: interrupts increment percpu counters, the
> estimation timer reads them and creates sum every 2 seconds,
> then user context can read the sum or even show the percpu
> counters, not to forget the zeroing of the sum and counters
> 
> The players in detail:
> 
> - packet processing:
>       - softirq context, hot path
>       - increments counters by using u64_stats_update_begin and
>       u64_stats_update_end, does not wait for readers or zeroing
>       - sum not touched, stats->lock usage removed in 2.6.38-rc
> 
> - 2-second estimation timer:
>       - funcs: estimation_timer()
>       - timer context, softirq
>       - reads percpu counters with u64_stats_fetch_begin and
>       u64_stats_fetch_retry to sync with counter incrementing
>       - uses spin_lock (stats->lock) to protect the written sum
>       which is later read by user context: provides
>       at least u64 atomicity but additionally the relation
>       between packets and bytes
> 
> - sum readers:
>       - funcs: ip_vs_stats_show(), ip_vs_stats_percpu_show(),
>       ip_vs_copy_stats(), ip_vs_genl_fill_stats()
>       - user context, not a hot path
>       - uses spin_lock_bh (stats->lock) for atomic reading of
>       the sum created by estimation_timer()
> 
> - show percpu counters:
>       - funcs: ip_vs_stats_percpu_show()
>       - user context, not a hot path
>       - uses u64_stats_fetch_begin_bh and u64_stats_fetch_retry_bh
>       to synchronize with counter incrementing
>       - still missing: should use spin_lock_bh (stats->lock)
>       to synchronize with ip_vs_zero_stats() that modifies
>       percpu counters.
> 
> - zero stats and percpu counters
>       - funcs: ip_vs_zero_stats()
>       - user context, not a hot path
>       - uses spin_lock_bh (stats->lock) while modifying
>       sum but also while zeroing percpu counters because
>       we are a hidden writer which does not allow other
>       percpu counter readers at the same time but we are
>       still synchronized with percpu counter incrementing
>       without delaying it
> 
> To summarize, I see 2 solutions, in order of preference:
> 
> 1. all players except packet processing should use stats->lock
> when reading/writing sum or when reading/zeroing percpu
> counters. Use u64_stats to avoid delays in incrementing.
> 
> 2. Use seqlock instead of u64_stats if we want to treat the
> percpu counters zeroing as a writer. This returns us to the state
> before 2.6.38-rc, where we used the global stats->lock even for
> counter incrementing, except that now we can use a percpu seqlock
> just to register the zeroing as a writer.
> 
> > Why try to pretend it's a reader and confuse people?
> >
> > Either :
> >
> > - Another writer can modify the counters at the same time, and we must
> > synchronize with them (we are a writer after all)
> 
>       Global mutex allows only one zeroing at a time.
> But zeroing runs in parallel with incrementing, so we
> have 2 writers for a per-CPU state. This sounds like
> above solution 2 with percpu seqlock? But it adds extra
> spin_lock in hot path, even if it is percpu. It only
> saves the spin_lock_bh while reading percpu counters in
> ip_vs_stats_percpu_show(). That is why I prefer solution 1.
> 
> > - Another reader can read the counters at the same time, and we must let
> > them detect that we might have cleared half of their values.
> 
>       Yes, zeroing can run in parallel with /proc reading,
> that is why I now try to serialize all readers with the
> stats spin lock to guarantee u64 atomicity.
> 
> > - No reader or writer can access data, no synch is needed, a pure
> > memset() is OK.
> 
>       Packet processing can damage the counters while we
> do memset, so we need at least u64_stats_fetch_* to sync
> with incrementing.
> 

OK, I now understand what you wanted to do.

The problem is that you synchronize your memset() with a concurrent writer
in one direction only. You detect that a writer changed the counters while
you memset() them, but a writer has no way to detect your writes (which
could be only partially committed to main memory): it could read a
corrupted value.
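
To make the asymmetry concrete, here is a minimal single-threaded
userspace sketch. It is a simplified model, not the kernel code: the
model_* names and both structs are hypothetical stand-ins for a sequence
counter of the kind u64_stats uses on 32-bit SMP, and all memory-ordering
concerns are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace model of a sequence counter; not the kernel API. */
struct model_syncp { unsigned int seq; };
struct model_ustats { uint64_t packets; uint64_t bytes; };

/* Writer side: bump the sequence around the update (odd = in progress). */
static void model_update_begin(struct model_syncp *s) { s->seq++; }

static void model_update_end(struct model_syncp *s)
{
	assert(s->seq & 1);	/* must pair with model_update_begin() */
	s->seq++;
}

/* Reader side: sample the sequence, then check whether it moved. */
static unsigned int model_fetch_begin(const struct model_syncp *s)
{
	return s->seq;
}

static int model_fetch_retry(const struct model_syncp *s, unsigned int start)
{
	return (start & 1) || s->seq != start;
}

/* The zeroing side notices a writer that ran during its memset(). */
static int zeroer_detects_writer(void)
{
	struct model_syncp syncp = { 0 };
	struct model_ustats u = { 1000, 64000 };
	unsigned int start = model_fetch_begin(&syncp);

	memset(&u, 0, sizeof(u));

	/* A complete counter update sneaks in before the retry check. */
	model_update_begin(&syncp);
	u.packets++;
	u.bytes += 64;
	model_update_end(&syncp);

	return model_fetch_retry(&syncp, start);	/* nonzero: retry */
}

/* The memset() never touches the sequence, so a writer sees no trace. */
static int writer_detects_zeroer(void)
{
	struct model_syncp syncp = { 0 };
	struct model_ustats u = { 1000, 64000 };
	unsigned int before = syncp.seq;

	memset(&u, 0, sizeof(u));	/* the "hidden writer" */

	return syncp.seq != before;	/* always 0: nothing to detect */
}
```

In this model the zeroing side is told to retry, while the incrementing
side has nothing it could check, which is the one-way synchronization
described above.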

I feel the memory barriers are wrong and not really fixable without slowing
down the hot path.

As implied in the include/linux/u64_stats_sync.h file, a "writer" should be
alone :)

Another way to handle this (and keep hot-path packet processing free of
extra locking) would be to never memset() this data, but to keep a separate
"summed" value as a reference point and subtract it from the current sum
(all of this happens in the slow path, so it is not a problem).
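
The relative-point idea could look roughly like the sketch below. This is
a userspace model under stated assumptions, not ipvs code: stats_model,
cpu_counters, NR_CPUS_MODEL and the function names are all hypothetical,
and the slow-path locking (for example stats->lock) and the per-CPU
fetch/retry loop a real reader would need are omitted.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_MODEL 4		/* hypothetical CPU count for this sketch */

struct cpu_counters { uint64_t packets; uint64_t bytes; };

struct stats_model {
	struct cpu_counters percpu[NR_CPUS_MODEL]; /* only ever incremented */
	struct cpu_counters base;	/* sum recorded at the last "zero" */
};

/* Slow path: add up all per-CPU counters. */
static struct cpu_counters sum_percpu(const struct stats_model *s)
{
	struct cpu_counters sum = { 0, 0 };

	for (int i = 0; i < NR_CPUS_MODEL; i++) {
		sum.packets += s->percpu[i].packets;
		sum.bytes += s->percpu[i].bytes;
	}
	return sum;
}

/* Hot path: the only code that writes the per-CPU counters. */
static void count_packet(struct stats_model *s, int cpu, uint64_t len)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	s->percpu[cpu].packets++;
	s->percpu[cpu].bytes += len;
}

/* Slow path "zero": remember the current sum instead of memset(). */
static void zero_stats(struct stats_model *s)
{
	s->base = sum_percpu(s);
}

/* Slow path read: report counts relative to the last "zero". */
static struct cpu_counters read_stats(const struct stats_model *s)
{
	struct cpu_counters now = sum_percpu(s);

	/* Unsigned subtraction stays correct even if the sums wrap. */
	now.packets -= s->base.packets;
	now.bytes -= s->base.bytes;
	return now;
}
```

With this layout, zeroing only writes the base value, so the per-CPU
counters keep a single writer and the hot path needs no extra
synchronization.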


