Message ID | 491d3af6c7d66dfb3b60b2f210f38e843dfe6ed2.1710525524.git.yan@cloudflare.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Report RCU QS for busy network kthreads | expand |
On Fri, Mar 15, 2024 at 12:55:03PM -0700, Yan Zhai wrote: > There are several scenario in network processing that can run > extensively under heavy traffic. In such situation, RCU synchronization > might not observe desired quiescent states for indefinitely long period. > Create a helper to safely raise the desired RCU quiescent states for > such scenario. > > Currently the frequency is locked at HZ/10, i.e. 100ms, which is > sufficient to address existing problems around RCU tasks. It's unclear > yet if there is any future scenario for it to be further tuned down. I suggest something like the following for the commit log: ------------------------------------------------------------------------ When under heavy load, network processing can run CPU-bound for many tens of seconds. Even in preemptible kernels, this can block RCU Tasks grace periods, which can cause trace-event removal to take more than a minute, which is unacceptably long. This commit therefore creates a new helper function that passes through both RCU and RCU-Tasks quiescent states every 100 milliseconds. This hard-coded value suffices for current workloads. ------------------------------------------------------------------------ > Suggested-by: Paul E. McKenney <paulmck@kernel.org> > Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org> > Signed-off-by: Yan Zhai <yan@cloudflare.com> > --- > v3->v4: comment fixup > > --- > include/linux/rcupdate.h | 24 ++++++++++++++++++++++++ > 1 file changed, 24 insertions(+) > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > index 0746b1b0b663..da224706323e 100644 > --- a/include/linux/rcupdate.h > +++ b/include/linux/rcupdate.h > @@ -247,6 +247,30 @@ do { \ > cond_resched(); \ > } while (0) > > +/** > + * rcu_softirq_qs_periodic - Periodically report consolidated quiescent states > + * @old_ts: last jiffies when QS was reported. Might be modified in the macro. > + * > + * This helper is for network processing in non-RT kernels, where there could > + * be busy polling threads that block RCU synchronization indefinitely. In > + * such context, simply calling cond_resched is insufficient, so give it a > + * stronger push to eliminate all potential blockage of all RCU types. > + * > + * NOTE: unless absolutely sure, this helper should in general be called > + * outside of bh lock section to avoid reporting a surprising QS to updaters, > + * who could be expecting RCU read critical section to end at local_bh_enable(). > + */ How about something like this for the kernel-doc comment? /** * rcu_softirq_qs_periodic - Report RCU and RCU-Tasks quiescent states * @old_ts: jiffies at start of processing. * * This helper is for long-running softirq handlers, such as those * in networking. The caller should initialize the variable passed in * as @old_ts at the beginning of the softirq handler. When invoked * frequently, this macro will invoke rcu_softirq_qs() every 100 * milliseconds thereafter, which will provide both RCU and RCU-Tasks * quiescent states. Note that this macro modifies its old_ts argument. * * Note that although cond_resched() provides RCU quiescent states, * it does not provide RCU-Tasks quiescent states. * * Because regions of code that have disabled softirq act as RCU * read-side critical sections, this macro should be invoked with softirq * (and preemption) enabled. * * This macro has no effect in CONFIG_PREEMPT_RT kernels. */ Thanx, Paul > +#define rcu_softirq_qs_periodic(old_ts) \ > +do { \ > + if (!IS_ENABLED(CONFIG_PREEMPT_RT) && \ > + time_after(jiffies, (old_ts) + HZ / 10)) { \ > + preempt_disable(); \ > + rcu_softirq_qs(); \ > + preempt_enable(); \ > + (old_ts) = jiffies; \ > + } \ > +} while (0) > + > /* > * Infrastructure to implement the synchronize_() primitives in > * TREE_RCU and rcu_barrier_() primitives in TINY_RCU. > -- > 2.30.2 > >
On Fri, Mar 15, 2024 at 10:40:56PM -0700, Paul E. McKenney wrote: > On Fri, Mar 15, 2024 at 12:55:03PM -0700, Yan Zhai wrote: > > There are several scenario in network processing that can run > > extensively under heavy traffic. In such situation, RCU synchronization > > might not observe desired quiescent states for indefinitely long period. > > Create a helper to safely raise the desired RCU quiescent states for > > such scenario. > > > > Currently the frequency is locked at HZ/10, i.e. 100ms, which is > > sufficient to address existing problems around RCU tasks. It's unclear > > yet if there is any future scenario for it to be further tuned down. > > I suggest something like the following for the commit log: > > ------------------------------------------------------------------------ > > When under heavy load, network processing can run CPU-bound for many tens > of seconds. Even in preemptible kernels, this can block RCU Tasks grace > periods, which can cause trace-event removal to take more than a minute, > which is unacceptably long. > > This commit therefore creates a new helper function that passes > through both RCU and RCU-Tasks quiescent states every 100 milliseconds. > This hard-coded value suffices for current workloads. FWIW, this sounds good to me. > > ------------------------------------------------------------------------ > > > Suggested-by: Paul E. McKenney <paulmck@kernel.org> > > Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org> > > Signed-off-by: Yan Zhai <yan@cloudflare.com> > > --- > > v3->v4: comment fixup > > > > --- > > include/linux/rcupdate.h | 24 ++++++++++++++++++++++++ > > 1 file changed, 24 insertions(+) > > > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > > index 0746b1b0b663..da224706323e 100644 > > --- a/include/linux/rcupdate.h > > +++ b/include/linux/rcupdate.h > > @@ -247,6 +247,30 @@ do { \ > > cond_resched(); \ > > } while (0) > > > > +/** > > + * rcu_softirq_qs_periodic - Periodically report consolidated quiescent states > > + * @old_ts: last jiffies when QS was reported. Might be modified in the macro. > > + * > > + * This helper is for network processing in non-RT kernels, where there could > > + * be busy polling threads that block RCU synchronization indefinitely. In > > + * such context, simply calling cond_resched is insufficient, so give it a > > + * stronger push to eliminate all potential blockage of all RCU types. > > + * > > + * NOTE: unless absolutely sure, this helper should in general be called > > + * outside of bh lock section to avoid reporting a surprising QS to updaters, > > + * who could be expecting RCU read critical section to end at local_bh_enable(). > > + */ > > How about something like this for the kernel-doc comment? > > /** > * rcu_softirq_qs_periodic - Report RCU and RCU-Tasks quiescent states > * @old_ts: jiffies at start of processing. > * > * This helper is for long-running softirq handlers, such as those > * in networking. The caller should initialize the variable passed in > * as @old_ts at the beginning of the softirq handler. When invoked > * frequently, this macro will invoke rcu_softirq_qs() every 100 > * milliseconds thereafter, which will provide both RCU and RCU-Tasks > * quiescent states. Note that this macro modifies its old_ts argument. > * > * Note that although cond_resched() provides RCU quiescent states, > * it does not provide RCU-Tasks quiescent states. > * > * Because regions of code that have disabled softirq act as RCU > * read-side critical sections, this macro should be invoked with softirq > * (and preemption) enabled. > * > * This macro has no effect in CONFIG_PREEMPT_RT kernels. > */ Considering the note about cond_resched(), does does cond_resched() actually provide an RCU quiescent state for fully-preemptible kernels? IIUC for those cond_resched() expands to: __might_resched(); klp_sched_try_switch() ... and AFAICT neither reports an RCU quiescent state. So maybe it's worth dropping the note? Seperately, what's the rationale for not doing this on PREEMPT_RT? Does that avoid the problem through other means, or are people just not running effected workloads on that? Mark. > > Thanx, Paul > > > +#define rcu_softirq_qs_periodic(old_ts) \ > > +do { \ > > + if (!IS_ENABLED(CONFIG_PREEMPT_RT) && \ > > + time_after(jiffies, (old_ts) + HZ / 10)) { \ > > + preempt_disable(); \ > > + rcu_softirq_qs(); \ > > + preempt_enable(); \ > > + (old_ts) = jiffies; \ > > + } \ > > +} while (0) > > + > > /* > > * Infrastructure to implement the synchronize_() primitives in > > * TREE_RCU and rcu_barrier_() primitives in TINY_RCU. > > -- > > 2.30.2 > > > >
On Sat, Mar 16, 2024 at 12:41 AM Paul E. McKenney <paulmck@kernel.org> wrote: > > On Fri, Mar 15, 2024 at 12:55:03PM -0700, Yan Zhai wrote: > > There are several scenario in network processing that can run > > extensively under heavy traffic. In such situation, RCU synchronization > > might not observe desired quiescent states for indefinitely long period. > > Create a helper to safely raise the desired RCU quiescent states for > > such scenario. > > > > Currently the frequency is locked at HZ/10, i.e. 100ms, which is > > sufficient to address existing problems around RCU tasks. It's unclear > > yet if there is any future scenario for it to be further tuned down. > > I suggest something like the following for the commit log: > > ------------------------------------------------------------------------ > > When under heavy load, network processing can run CPU-bound for many tens > of seconds. Even in preemptible kernels, this can block RCU Tasks grace > periods, which can cause trace-event removal to take more than a minute, > which is unacceptably long. > > This commit therefore creates a new helper function that passes > through both RCU and RCU-Tasks quiescent states every 100 milliseconds. > This hard-coded value suffices for current workloads. > > ------------------------------------------------------------------------ > > > Suggested-by: Paul E. McKenney <paulmck@kernel.org> > > Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org> > > Signed-off-by: Yan Zhai <yan@cloudflare.com> > > --- > > v3->v4: comment fixup > > > > --- > > include/linux/rcupdate.h | 24 ++++++++++++++++++++++++ > > 1 file changed, 24 insertions(+) > > > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > > index 0746b1b0b663..da224706323e 100644 > > --- a/include/linux/rcupdate.h > > +++ b/include/linux/rcupdate.h > > @@ -247,6 +247,30 @@ do { \ > > cond_resched(); \ > > } while (0) > > > > +/** > > + * rcu_softirq_qs_periodic - Periodically report consolidated quiescent states > > + * @old_ts: last jiffies when QS was reported. Might be modified in the macro. > > + * > > + * This helper is for network processing in non-RT kernels, where there could > > + * be busy polling threads that block RCU synchronization indefinitely. In > > + * such context, simply calling cond_resched is insufficient, so give it a > > + * stronger push to eliminate all potential blockage of all RCU types. > > + * > > + * NOTE: unless absolutely sure, this helper should in general be called > > + * outside of bh lock section to avoid reporting a surprising QS to updaters, > > + * who could be expecting RCU read critical section to end at local_bh_enable(). > > + */ > > How about something like this for the kernel-doc comment? > > /** > * rcu_softirq_qs_periodic - Report RCU and RCU-Tasks quiescent states > * @old_ts: jiffies at start of processing. > * > * This helper is for long-running softirq handlers, such as those > * in networking. The caller should initialize the variable passed in > * as @old_ts at the beginning of the softirq handler. When invoked > * frequently, this macro will invoke rcu_softirq_qs() every 100 > * milliseconds thereafter, which will provide both RCU and RCU-Tasks > * quiescent states. Note that this macro modifies its old_ts argument. > * > * Note that although cond_resched() provides RCU quiescent states, > * it does not provide RCU-Tasks quiescent states. > * > * Because regions of code that have disabled softirq act as RCU > * read-side critical sections, this macro should be invoked with softirq > * (and preemption) enabled. > * > * This macro has no effect in CONFIG_PREEMPT_RT kernels. > */ > It would be more accurate this way, I like it. Thanks! Yan > Thanx, Paul > > > +#define rcu_softirq_qs_periodic(old_ts) \ > > +do { \ > > + if (!IS_ENABLED(CONFIG_PREEMPT_RT) && \ > > + time_after(jiffies, (old_ts) + HZ / 10)) { \ > > + preempt_disable(); \ > > + rcu_softirq_qs(); \ > > + preempt_enable(); \ > > + (old_ts) = jiffies; \ > > + } \ > > +} while (0) > > + > > /* > > * Infrastructure to implement the synchronize_() primitives in > > * TREE_RCU and rcu_barrier_() primitives in TINY_RCU. > > -- > > 2.30.2 > > > >
On Mon, Mar 18, 2024 at 5:59 AM Mark Rutland <mark.rutland@arm.com> wrote: > > On Fri, Mar 15, 2024 at 10:40:56PM -0700, Paul E. McKenney wrote: > > On Fri, Mar 15, 2024 at 12:55:03PM -0700, Yan Zhai wrote: > > > There are several scenario in network processing that can run > > > extensively under heavy traffic. In such situation, RCU synchronization > > > might not observe desired quiescent states for indefinitely long period. > > > Create a helper to safely raise the desired RCU quiescent states for > > > such scenario. > > > > > > Currently the frequency is locked at HZ/10, i.e. 100ms, which is > > > sufficient to address existing problems around RCU tasks. It's unclear > > > yet if there is any future scenario for it to be further tuned down. > > > > I suggest something like the following for the commit log: > > > > ------------------------------------------------------------------------ > > > > When under heavy load, network processing can run CPU-bound for many tens > > of seconds. Even in preemptible kernels, this can block RCU Tasks grace > > periods, which can cause trace-event removal to take more than a minute, > > which is unacceptably long. > > > > This commit therefore creates a new helper function that passes > > through both RCU and RCU-Tasks quiescent states every 100 milliseconds. > > This hard-coded value suffices for current workloads. > > FWIW, this sounds good to me. > > > > > ------------------------------------------------------------------------ > > > > > Suggested-by: Paul E. McKenney <paulmck@kernel.org> > > > Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org> > > > Signed-off-by: Yan Zhai <yan@cloudflare.com> > > > --- > > > v3->v4: comment fixup > > > > > > --- > > > include/linux/rcupdate.h | 24 ++++++++++++++++++++++++ > > > 1 file changed, 24 insertions(+) > > > > > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > > > index 0746b1b0b663..da224706323e 100644 > > > --- a/include/linux/rcupdate.h > > > +++ b/include/linux/rcupdate.h > > > @@ -247,6 +247,30 @@ do { \ > > > cond_resched(); \ > > > } while (0) > > > > > > +/** > > > + * rcu_softirq_qs_periodic - Periodically report consolidated quiescent states > > > + * @old_ts: last jiffies when QS was reported. Might be modified in the macro. > > > + * > > > + * This helper is for network processing in non-RT kernels, where there could > > > + * be busy polling threads that block RCU synchronization indefinitely. In > > > + * such context, simply calling cond_resched is insufficient, so give it a > > > + * stronger push to eliminate all potential blockage of all RCU types. > > > + * > > > + * NOTE: unless absolutely sure, this helper should in general be called > > > + * outside of bh lock section to avoid reporting a surprising QS to updaters, > > > + * who could be expecting RCU read critical section to end at local_bh_enable(). > > > + */ > > > > How about something like this for the kernel-doc comment? > > > > /** > > * rcu_softirq_qs_periodic - Report RCU and RCU-Tasks quiescent states > > * @old_ts: jiffies at start of processing. > > * > > * This helper is for long-running softirq handlers, such as those > > * in networking. The caller should initialize the variable passed in > > * as @old_ts at the beginning of the softirq handler. When invoked > > * frequently, this macro will invoke rcu_softirq_qs() every 100 > > * milliseconds thereafter, which will provide both RCU and RCU-Tasks > > * quiescent states. Note that this macro modifies its old_ts argument. > > * > > * Note that although cond_resched() provides RCU quiescent states, > > * it does not provide RCU-Tasks quiescent states. > > * > > * Because regions of code that have disabled softirq act as RCU > > * read-side critical sections, this macro should be invoked with softirq > > * (and preemption) enabled. > > * > > * This macro has no effect in CONFIG_PREEMPT_RT kernels. > > */ > > Considering the note about cond_resched(), does does cond_resched() actually > provide an RCU quiescent state for fully-preemptible kernels? IIUC for those > cond_resched() expands to: > > __might_resched(); > klp_sched_try_switch() > > ... and AFAICT neither reports an RCU quiescent state. > > So maybe it's worth dropping the note? > > Seperately, what's the rationale for not doing this on PREEMPT_RT? Does that > avoid the problem through other means, or are people just not running effected > workloads on that? > It's a bit anti-intuition but yes the RT kernel avoids the problem. This is because "schedule()" reports task RCU QS actually, and on RT kernel cond_resched() call won't call "__cond_resched()" or "__schedule(PREEMPT)" as you already pointed out, which would clear need-resched flag. This then allows "schedule()" to be called on hard IRQ exit time by time. Yan > Mark. > > > > > Thanx, Paul > > > > > +#define rcu_softirq_qs_periodic(old_ts) \ > > > +do { \ > > > + if (!IS_ENABLED(CONFIG_PREEMPT_RT) && \ > > > + time_after(jiffies, (old_ts) + HZ / 10)) { \ > > > + preempt_disable(); \ > > > + rcu_softirq_qs(); \ > > > + preempt_enable(); \ > > > + (old_ts) = jiffies; \ > > > + } \ > > > +} while (0) > > > + > > > /* > > > * Infrastructure to implement the synchronize_() primitives in > > > * TREE_RCU and rcu_barrier_() primitives in TINY_RCU. > > > -- > > > 2.30.2 > > > > > >
On Mon, Mar 18, 2024 at 9:32 PM Yan Zhai <yan@cloudflare.com> wrote: > > On Mon, Mar 18, 2024 at 5:59 AM Mark Rutland <mark.rutland@arm.com> wrote: > > > > On Fri, Mar 15, 2024 at 10:40:56PM -0700, Paul E. McKenney wrote: > > > On Fri, Mar 15, 2024 at 12:55:03PM -0700, Yan Zhai wrote: > > > > There are several scenario in network processing that can run > > > > extensively under heavy traffic. In such situation, RCU synchronization > > > > might not observe desired quiescent states for indefinitely long period. > > > > Create a helper to safely raise the desired RCU quiescent states for > > > > such scenario. > > > > > > > > Currently the frequency is locked at HZ/10, i.e. 100ms, which is > > > > sufficient to address existing problems around RCU tasks. It's unclear > > > > yet if there is any future scenario for it to be further tuned down. > > > > > > I suggest something like the following for the commit log: > > > > > > ------------------------------------------------------------------------ > > > > > > When under heavy load, network processing can run CPU-bound for many tens > > > of seconds. Even in preemptible kernels, this can block RCU Tasks grace > > > periods, which can cause trace-event removal to take more than a minute, > > > which is unacceptably long. > > > > > > This commit therefore creates a new helper function that passes > > > through both RCU and RCU-Tasks quiescent states every 100 milliseconds. > > > This hard-coded value suffices for current workloads. > > > > FWIW, this sounds good to me. > > > > > > > > ------------------------------------------------------------------------ > > > > > > > Suggested-by: Paul E. McKenney <paulmck@kernel.org> > > > > Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org> > > > > Signed-off-by: Yan Zhai <yan@cloudflare.com> > > > > --- > > > > v3->v4: comment fixup > > > > > > > > --- > > > > include/linux/rcupdate.h | 24 ++++++++++++++++++++++++ > > > > 1 file changed, 24 insertions(+) > > > > > > > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h > > > > index 0746b1b0b663..da224706323e 100644 > > > > --- a/include/linux/rcupdate.h > > > > +++ b/include/linux/rcupdate.h > > > > @@ -247,6 +247,30 @@ do { \ > > > > cond_resched(); \ > > > > } while (0) > > > > > > > > +/** > > > > + * rcu_softirq_qs_periodic - Periodically report consolidated quiescent states > > > > + * @old_ts: last jiffies when QS was reported. Might be modified in the macro. > > > > + * > > > > + * This helper is for network processing in non-RT kernels, where there could > > > > + * be busy polling threads that block RCU synchronization indefinitely. In > > > > + * such context, simply calling cond_resched is insufficient, so give it a > > > > + * stronger push to eliminate all potential blockage of all RCU types. > > > > + * > > > > + * NOTE: unless absolutely sure, this helper should in general be called > > > > + * outside of bh lock section to avoid reporting a surprising QS to updaters, > > > > + * who could be expecting RCU read critical section to end at local_bh_enable(). > > > > + */ > > > > > > How about something like this for the kernel-doc comment? > > > > > > /** > > > * rcu_softirq_qs_periodic - Report RCU and RCU-Tasks quiescent states > > > * @old_ts: jiffies at start of processing. > > > * > > > * This helper is for long-running softirq handlers, such as those > > > * in networking. The caller should initialize the variable passed in > > > * as @old_ts at the beginning of the softirq handler. When invoked > > > * frequently, this macro will invoke rcu_softirq_qs() every 100 > > > * milliseconds thereafter, which will provide both RCU and RCU-Tasks > > > * quiescent states. Note that this macro modifies its old_ts argument. > > > * > > > * Note that although cond_resched() provides RCU quiescent states, > > > * it does not provide RCU-Tasks quiescent states. > > > * > > > * Because regions of code that have disabled softirq act as RCU > > > * read-side critical sections, this macro should be invoked with softirq > > > * (and preemption) enabled. > > > * > > > * This macro has no effect in CONFIG_PREEMPT_RT kernels. > > > */ > > > > Considering the note about cond_resched(), does does cond_resched() actually > > provide an RCU quiescent state for fully-preemptible kernels? IIUC for those > > cond_resched() expands to: > > > > __might_resched(); > > klp_sched_try_switch() > > > > ... and AFAICT neither reports an RCU quiescent state. > > > > So maybe it's worth dropping the note? > > > > Seperately, what's the rationale for not doing this on PREEMPT_RT? Does that > > avoid the problem through other means, or are people just not running effected > > workloads on that? > > > It's a bit anti-intuition but yes the RT kernel avoids the problem. > This is because "schedule()" reports task RCU QS actually, and on RT > kernel cond_resched() call won't call "__cond_resched()" or > "__schedule(PREEMPT)" as you already pointed out, which would clear > need-resched flag. This then allows "schedule()" to be called on hard > IRQ exit time by time. > And these are excellent questions that I should originally include in the comment. Thanks for bringing it up. Let me send another version tomorrow, allowing more thoughts on this if any. thanks Yan > Yan > > > Mark. > > > > > > > > Thanx, Paul > > > > > > > +#define rcu_softirq_qs_periodic(old_ts) \ > > > > +do { \ > > > > + if (!IS_ENABLED(CONFIG_PREEMPT_RT) && \ > > > > + time_after(jiffies, (old_ts) + HZ / 10)) { \ > > > > + preempt_disable(); \ > > > > + rcu_softirq_qs(); \ > > > > + preempt_enable(); \ > > > > + (old_ts) = jiffies; \ > > > > + } \ > > > > +} while (0) > > > > + > > > > /* > > > > * Infrastructure to implement the synchronize_() primitives in > > > > * TREE_RCU and rcu_barrier_() primitives in TINY_RCU. > > > > -- > > > > 2.30.2 > > > > > > > >
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 0746b1b0b663..da224706323e 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -247,6 +247,30 @@ do { \ cond_resched(); \ } while (0) +/** + * rcu_softirq_qs_periodic - Periodically report consolidated quiescent states + * @old_ts: last jiffies when QS was reported. Might be modified in the macro. + * + * This helper is for network processing in non-RT kernels, where there could + * be busy polling threads that block RCU synchronization indefinitely. In + * such context, simply calling cond_resched is insufficient, so give it a + * stronger push to eliminate all potential blockage of all RCU types. + * + * NOTE: unless absolutely sure, this helper should in general be called + * outside of bh lock section to avoid reporting a surprising QS to updaters, + * who could be expecting RCU read critical section to end at local_bh_enable(). + */ +#define rcu_softirq_qs_periodic(old_ts) \ +do { \ + if (!IS_ENABLED(CONFIG_PREEMPT_RT) && \ + time_after(jiffies, (old_ts) + HZ / 10)) { \ + preempt_disable(); \ + rcu_softirq_qs(); \ + preempt_enable(); \ + (old_ts) = jiffies; \ + } \ +} while (0) + /* * Infrastructure to implement the synchronize_() primitives in * TREE_RCU and rcu_barrier_() primitives in TINY_RCU.