Hello,
This patchset implements stats estimation in
kthread context. Simple tests do not show any problem.
Please review, comment, test, etc.
Below is an overview of the basic concepts. More details are
in the commit messages...
RCU Locking:
- when RCU preemption is enabled, the kthreads use only the RCU
read lock for walking the chains and we do not need to reschedule.
Maybe this is the common case for distribution kernels.
In this case ip_vs_stop_estimator() is completely lockless.
- when RCU preemption is not enabled, we reschedule and use a
refcnt in every estimator to track whether the estimator being
removed is in use by the kthread for estimation at the same time.
As the RCU lock is dropped during rescheduling, the deletion
should wait on kd->mutex, so that a new RCU lock is taken
before the estimator is freed from the RCU callback.
- as stats are now RCU-protected, tot_stats, svc and dest, which
hold estimator structures, are now always freed from an RCU
callback. This ensures an RCU grace period after the
ip_vs_stop_estimator() call.
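The non-preemptible case can be illustrated with a small userspace
model. This is only a sketch under assumptions, not the code from the
patches: est_model, kt_model, est_hold(), est_put(), est_stop() and
worker_resched_point() are hypothetical names, a pthread mutex stands
in for kd->mutex, and the RCU calls appear only as comments. It shows
the one idea described above: the worker pins the estimator with a
refcnt before it drops the read-side lock, and the deleter waits on
the mutex only when it sees the estimator pinned.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins, not the kernel structures. */
struct est_model {
	atomic_int refcnt;	/* > 0 while the worker keeps it across a reschedule */
	bool linked;		/* still linked on a chain? */
};

struct kt_model {
	pthread_mutex_t mutex;	/* plays the role of kd->mutex */
};

/* Worker side: pin the estimator before dropping the read-side lock. */
static void est_hold(struct est_model *e)
{
	atomic_fetch_add(&e->refcnt, 1);
}

/* Worker side: unpin once the read-side lock is held again. */
static void est_put(struct est_model *e)
{
	atomic_fetch_sub(&e->refcnt, 1);
}

/* Worker side: one reschedule point while walking a chain. */
static void worker_resched_point(struct kt_model *kd, struct est_model *e)
{
	pthread_mutex_lock(&kd->mutex);
	est_hold(e);
	/* rcu_read_unlock() would go here in the kernel */
	sched_yield();
	/* rcu_read_lock() would go here in the kernel */
	est_put(e);
	pthread_mutex_unlock(&kd->mutex);
}

/*
 * Deleter side: unlink first, then wait on the mutex only if the
 * worker has the estimator pinned. Returns true if it had to wait.
 * Freeing (from an RCU callback in the kernel) is left out.
 */
static bool est_stop(struct kt_model *kd, struct est_model *e)
{
	bool waited = false;

	e->linked = false;
	if (atomic_load(&e->refcnt) > 0) {
		pthread_mutex_lock(&kd->mutex);
		pthread_mutex_unlock(&kd->mutex);
		waited = true;
	}
	return waited;
}
```

In this model an unpinned estimator is stopped without touching the
mutex, which mirrors the lockless path described for the preemptible
case.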
Kthread data:
- every kthread works over its own data structure and all
such structures are attached to an array
- even when a kthread structure exists, its task
may not be running, e.g. before the first service is added,
while the sysctl var is set to an empty cpulist, or
when run_estimation is 0.
- a task and its structure may be released if all
estimators are unlinked from its chains, leaving the
slot in the array empty
- to add new estimators we use the last added kthread
context (est_add_ktid). The new estimators are linked to
the chain just before the estimated one, based on add_row.
This ensures their estimation will start after 2 seconds.
If estimators are added in bursts, a common case when all
services and dests are configured initially, we may
spread the estimators to more chains. This will reduce
the chain imbalance.
- the chain imbalance is not so fatal when we use
kthreads. We design each kthread to use only part of the
possible CPU time, so even if some chain exceeds its
time slot, whether all the time or sporadically
depending on the scheduling, the 2-second interval is
still kept. The cpulist isolation can make things more
stable, keeping closer to a 2-second time interval
per estimator.
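The add_row placement can be checked with a little arithmetic. The
sketch below is an illustration under assumptions, not the code from
the patches: it assumes the kthread walks one row (chain) per tick and
covers all rows once per 2-second period, and EST_NROWS = 50 is a
made-up value. est_pick_add_row() and est_first_delay_ms() are
hypothetical helpers.

```c
#define EST_PERIOD_MS	2000	/* estimation interval from the text above */
#define EST_NROWS	50	/* hypothetical number of rows per kthread */
#define EST_TICK_MS	(EST_PERIOD_MS / EST_NROWS)

/* Row just before the one currently being estimated, with wrap-around. */
static int est_pick_add_row(int est_row)
{
	return (est_row + EST_NROWS - 1) % EST_NROWS;
}

/* Time until the worker, currently at est_row, reaches add_row. */
static int est_first_delay_ms(int est_row, int add_row)
{
	int rows_ahead = (add_row - est_row + EST_NROWS) % EST_NROWS;

	return rows_ahead * EST_TICK_MS;
}
```

With these assumed values, an estimator added while row 10 is being
estimated goes to row 9 and is first reached 49 ticks later, i.e.
after 1960 ms, which is close to the full 2-second interval.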
Julian Anastasov (4):
ipvs: add rcu protection to stats
ipvs: use kthreads for stats estimation
ipvs: add est_cpulist and est_nice sysctl vars
ipvs: run_estimation should control the kthread tasks
Documentation/networking/ipvs-sysctl.rst | 24 +-
include/net/ip_vs.h | 144 +++++++-
net/netfilter/ipvs/ip_vs_core.c | 10 +-
net/netfilter/ipvs/ip_vs_ctl.c | 287 ++++++++++++++--
net/netfilter/ipvs/ip_vs_est.c | 408 +++++++++++++++++++----
5 files changed, 771 insertions(+), 102 deletions(-)
--
2.37.2