Message ID | 20231208220545.7452-1-frederic@kernel.org (mailing list archive) |
---|---|
Headers | show |
Series | rcu: Fix expedited GP deadlock (and cleanup some nocb stuff) | expand |
On Fri, Dec 08, 2023 at 11:05:37PM +0100, Frederic Weisbecker wrote: > TREE04 can trigger a writer stall if run with memory pressure. This > is due to a circular dependency between waiting for expedited grace > period and polling on expedited grace period when workqueues go back > to mayday serialization. > > Here is a proposal fix. The torture.sh "acceptance test" with KCSAN and --duration 30 ran fine except for this in TREE09: kernel/rcu/tree_nocb.h:1785:13: error: unused function '__call_rcu_nocb_wake' [-Werror,-Wunused-function] My guess is that the declaration of __call_rcu_nocb_wake() in kernel/rcu/tree.h needs an "#ifdef CONFIG_SMP", but you might have a better fix. Thanx, Paul > Frederic Weisbecker (8): > rcu/nocb: Make IRQs disablement symetric > rcu/nocb: Re-arrange call_rcu() NOCB specific code > rcu/exp: Fix RCU expedited parallel grace period kworker allocation > failure recovery > rcu/exp: Handle RCU expedited grace period kworker allocation failure > rcu: s/boost_kthread_mutex/kthread_mutex > rcu/exp: Make parallel exp gp kworker per rcu node > rcu/exp: Handle parallel exp gp kworkers affinity > rcu/exp: Remove rcu_par_gp_wq > > kernel/rcu/rcu.h | 5 - > kernel/rcu/tree.c | 222 +++++++++++++++++++++++++-------------- > kernel/rcu/tree.h | 12 +-- > kernel/rcu/tree_exp.h | 81 +++----------- > kernel/rcu/tree_nocb.h | 38 ++++--- > kernel/rcu/tree_plugin.h | 52 ++------- > 6 files changed, 191 insertions(+), 219 deletions(-) > > -- > 2.42.1 >
Le Mon, Dec 11, 2023 at 08:38:59AM -0800, Paul E. McKenney a écrit : > On Fri, Dec 08, 2023 at 11:05:37PM +0100, Frederic Weisbecker wrote: > > TREE04 can trigger a writer stall if run with memory pressure. This > > is due to a circular dependency between waiting for expedited grace > > period and polling on expedited grace period when workqueues go back > > to mayday serialization. > > > > Here is a proposal fix. > > The torture.sh "acceptance test" with KCSAN and --duration 30 ran > fine except for this in TREE09: > > kernel/rcu/tree_nocb.h:1785:13: error: unused function '__call_rcu_nocb_wake' [-Werror,-Wunused-function] > > My guess is that the declaration of __call_rcu_nocb_wake() in > kernel/rcu/tree.h needs an "#ifdef CONFIG_SMP", but you might have a > better fix. Could be because if CONFIG_RCU_NO_CB_CPU=n, the function is only called (though as dead code) from rcutree_migrate_callbacks() which in turn only exists if CONFIG_HOTPLUG_CPU=y. Something like that then: diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h index 35f7af331e6c..e1ff53d5084c 100644 --- a/kernel/rcu/tree.h +++ b/kernel/rcu/tree.h @@ -445,6 +445,8 @@ static void rcu_qs(void); static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp); #ifdef CONFIG_HOTPLUG_CPU static bool rcu_preempt_has_tasks(struct rcu_node *rnp); +static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty, + unsigned long flags); #endif /* #ifdef CONFIG_HOTPLUG_CPU */ static int rcu_print_task_exp_stall(struct rcu_node *rnp); static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp); @@ -466,8 +468,6 @@ static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp, unsigned long j, bool lazy); static void call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *head, rcu_callback_t func, unsigned long flags, bool lazy); -static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty, - unsigned long flags); static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp, int level); static bool do_nocb_deferred_wakeup(struct rcu_data *rdp); static void rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp);
On Mon, Dec 11, 2023 at 09:04:04PM +0100, Frederic Weisbecker wrote: > Le Mon, Dec 11, 2023 at 08:38:59AM -0800, Paul E. McKenney a écrit : > > On Fri, Dec 08, 2023 at 11:05:37PM +0100, Frederic Weisbecker wrote: > > > TREE04 can trigger a writer stall if run with memory pressure. This > > > is due to a circular dependency between waiting for expedited grace > > > period and polling on expedited grace period when workqueues go back > > > to mayday serialization. > > > > > > Here is a proposal fix. > > > > The torture.sh "acceptance test" with KCSAN and --duration 30 ran > > fine except for this in TREE09: > > > > kernel/rcu/tree_nocb.h:1785:13: error: unused function '__call_rcu_nocb_wake' [-Werror,-Wunused-function] > > > > My guess is that the declaration of __call_rcu_nocb_wake() in > > kernel/rcu/tree.h needs an "#ifdef CONFIG_SMP", but you might have a > > better fix. > > Could be because if CONFIG_RCU_NO_CB_CPU=n, the function is only called > (though as dead code) from rcutree_migrate_callbacks() which in turn only > exists if CONFIG_HOTPLUG_CPU=y. > > Something like that then: > > diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h > index 35f7af331e6c..e1ff53d5084c 100644 > --- a/kernel/rcu/tree.h > +++ b/kernel/rcu/tree.h > @@ -445,6 +445,8 @@ static void rcu_qs(void); > static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp); > #ifdef CONFIG_HOTPLUG_CPU > static bool rcu_preempt_has_tasks(struct rcu_node *rnp); > +static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty, > + unsigned long flags); > #endif /* #ifdef CONFIG_HOTPLUG_CPU */ > static int rcu_print_task_exp_stall(struct rcu_node *rnp); > static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp); > @@ -466,8 +468,6 @@ static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp, > unsigned long j, bool lazy); > static void call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *head, > rcu_callback_t func, unsigned long flags, bool lazy); > -static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty, > - unsigned long flags); > static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp, int level); > static bool do_nocb_deferred_wakeup(struct rcu_data *rdp); > static void rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp); This one passes TREE01 and TINY01, but on TREE09 still gets this: kernel/rcu/tree_nocb.h:1785:13: error: ‘__call_rcu_nocb_wake’ defined but not used [-Werror=unused-function] Huh. I suppose that there is always __maybe_unused? Thanx, Paul
On Mon, Dec 11, 2023 at 01:39:40PM -0800, Paul E. McKenney wrote: > This one passes TREE01 and TINY01, but on TREE09 still gets this: > > kernel/rcu/tree_nocb.h:1785:13: error: ‘__call_rcu_nocb_wake’ defined but not used [-Werror=unused-function] > > Huh. I suppose that there is always __maybe_unused? Looks like a good fit indeed! Thanks! > Thanx, Paul