Re: [RFC PATCHv5 3/6] ipvs: use kthreads for stats estimation

To: Julian Anastasov <ja@xxxxxx>
Cc: Simon Horman <horms@xxxxxxxxxxxx>, lvs-devel@xxxxxxxxxxxxxxx, yunhong-cgl jiang <xintian1976@xxxxxxxxx>,
From: Jiri Wiesner <jwiesner@xxxxxxx>
Date: Wed, 16 Nov 2022 17:41:19 +0100
On Sat, Oct 29, 2022 at 05:12:28PM +0300, Julian Anastasov wrote:
> On Thu, 27 Oct 2022, Jiri Wiesner wrote:
> > On Mon, Oct 24, 2022 at 06:01:32PM +0300, Julian Anastasov wrote:
> > > - fast and safe way to apply a new chain_max or similar
> > > parameter for cond_resched rate. If possible, without
> > > relinking. stop+start can be slow too.
> > 
> > I am still wondering where the requirement for 100 us latency in 
> > non-preemptive kernels comes from. Typical time slices assigned by a 
> > time-sharing scheduler are measured in milliseconds. A kernel with voluntary 
> > preemption does not need any cond_resched statements in 
> > ip_vs_tick_estimation() because every spin_unlock() in 
> > ip_vs_chain_estimation() is a preemption point, which actually puts the 
> > accuracy of the computed estimates at risk but nothing can be done about 
> > that, I guess.
>       I'm not sure about the 100us requirements for non-RT
> kernels, this document covers only RT requirements, I think:
> Documentation/RCU/Design/Requirements/Requirements.rst
>       In fact, I don't worry for the RCU-preemptible
> case where we can be rescheduled at any time. In this
> case cond_resched_rcu() is NOP and chain_max has only
> one purpose of limiting ests in kthread, i.e. not to
> determine period between cond_resched calls which is
> its 2nd purpose for the non-preemptible case.
>       As for the non-preemptible case,
> rcu_read_lock/rcu_read_unlock are just preempt_disable/preempt_enable 
> which means the spin locking cannot preempt us; the only way is
> for us to call rcu_read_unlock, which is just preempt_count_dec()
> or a simple barrier() but __preempt_schedule() is not
> called as it happens on CONFIG_PREEMPTION. So, only
> cond_resched() can allow rescheduling.
>       Also, there are some configurations like nohz_full
> that expect cond_resched() to check for any pending
> rcu_urgent_qs condition via rcu_all_qs(). I'm not an
> expert in areas such as RCU and scheduling, so I'm
> not sure about the 100us latency budget for the
> non-preemptible cases we cover:
> 1. PREEMPT_NONE "No Forced Preemption (Server)"
> 2. PREEMPT_VOLUNTARY "Voluntary Kernel Preemption (Desktop)"
>       Where the latency can matter is setups where the
> IPVS kthreads are set to some low priority, as a
> way to work in idle times and to allow app servers
> to react to clients' requests faster. Once a request
> is served with a short delay, the app blocks somewhere and
> our kthreads run again in idle times.
>       In short, the IPVS kthreads do not have
> urgent work, they should do their 4.8ms work in 40ms
> or even more but it is preferred not to delay other
> higher-priority tasks such as applications or even other
> kthreads. That is why I think we should stick to some low
> period between cond_resched calls without causing
> it to take large part of our CPU usage.

OK, I agree that voluntary preemption without CONFIG_PREEMPT_RCU will need a 
preemption point in ip_vs_tick_estimation().

>       If we want to reduce its rate, it can be
> in this way, for example:
>       int n = 0;
>       /* 400us for forced cond_resched() but reschedule on demand */
>       if (!(++n & 3) || need_resched()) {
>               cond_resched_rcu();
>               n = 0;
>       }
>       This controls both the RCU requirements and
> reacts faster to the scheduler's indication. There will be
> a useless need_resched() call for the RCU-preemptible
> case, though, where cond_resched_rcu is NOP.

I do not see that as an improvement either.
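
For reference, the counter logic in the quoted snippet can be modelled in
userspace. This is only a minimal sketch: struct resched_model,
model_chain_done() and model_run() are hypothetical names that exist in
neither the kernel nor the patch, the need_resched() result is passed in as a
plain flag, and the cond_resched_rcu() call is replaced by a counter. It shows
how often the loop would yield for a given sequence of need_resched() results.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct resched_model {
	unsigned int n;       /* chains processed since the last yield */
	unsigned int yields;  /* simulated cond_resched_rcu() calls */
};

/*
 * Model one processed chain. The need_resched argument stands in for the
 * kernel's need_resched(). Returns true if the loop would yield here.
 */
static bool model_chain_done(struct resched_model *m, bool need_resched)
{
	/* Same condition as the quoted snippet: every 4th chain or on demand. */
	if (!(++m->n & 3) || need_resched) {
		m->yields++;  /* stands in for cond_resched_rcu() */
		m->n = 0;
		return true;
	}
	return false;
}

/* Feed a trace of need_resched() results and return the number of yields. */
static unsigned int model_run(const bool *trace, size_t len)
{
	struct resched_model m = { 0, 0 };

	assert(trace != NULL || len == 0);
	for (size_t i = 0; i < len; i++)
		model_chain_done(&m, trace[i]);
	return m.yields;
}
```

With need_resched never set, the model yields once per four chains, which
matches the 400us in the quoted comment only if one chain takes about 100us
(an assumption here). A set flag triggers an immediate yield and restarts the
count.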

Jiri Wiesner
