Hello,
On Mon, 5 Sep 2022, dust.li wrote:
> >+static inline void ip_vs_est_wait_resched(struct netns_ipvs *ipvs,
> >+					   struct ip_vs_estimator *est)
> >+{
> >+#ifdef IPVS_EST_RESCHED_RCU
> >+	/* Estimator kthread is rescheduling on deleted est? Wait it! */
> >+	if (!refcount_dec_and_test(&est->refcnt)) {
> >+		struct ip_vs_est_kt_data *kd = ipvs->est_kt_arr[est->ktid];
> >+
> >+		mutex_lock(&kd->mutex);
> >+		mutex_unlock(&kd->mutex);
>
> IIUC, this mutex_lock/unlock() is just used for waiting for the ipvs-e
> thread schedule back if it had been scheduled out in cond_resched_rcu() ?
> But not to protect data ?
Yes
> If so, I am wondering if we can remove the mutex_trylock/unlock() in
> ip_vs_est_cond_resched_rcu, and use some wait/wakeup mechanism to do
> this ? Because when I run perf on 'ipvs-e' kthreads, I saw lots of CPU
> cycles are on the mutex_trylock.
Yes, wait/wakeup would be a better alternative; it would
also avoid the possible priority inversion that Jiri noticed.
Probably the rate of rescheduling can be reduced, so that
we can achieve sub-millisecond latency. But the time we
spend in the for_each_possible_cpu() loop depends on the
number of possible CPUs, and they can have different speeds.
I thought of initially checking with need_resched() before
the mutex_trylock(), but when RCU preemption is disabled the
RCU read side is a region with disabled preemption, so we
should call cond_resched() often to report a quiescent state
via rcu_all_qs().
For example:
- goal: a sub-100-microsecond cond_resched rate, to report a
quiescent state and to be nice to other kthreads

	if (!(++resched_counter & 15) || need_resched()) ...
Thank you for the review and testing!
Regards
--
Julian Anastasov <ja@xxxxxx>