Message ID | 20250227131613.52683-3-urezki@gmail.com (mailing list archive)
---|---
State | New
Series | [v4,1/3] rcutorture: Allow a negative value for nfakewriters
Hi Ulad,

I put these three patches into next (and misc.2025.02.27a) for some
testing; hopefully it all goes well and they can make it into v6.15.

A few tags changed below:

On Thu, Feb 27, 2025 at 02:16:13PM +0100, Uladzislau Rezki (Sony) wrote:
> Switch to using the get_state_synchronize_rcu_full() and
> poll_state_synchronize_rcu_full() pair to debug a normal
> synchronize_rcu() call.
>
> Using the non-full APIs to identify whether a grace period has
> passed might lead to a false-positive kernel splat.
>
> It can happen because get_state_synchronize_rcu() compresses
> both normal and expedited states into one single unsigned long
> value, so poll_state_synchronize_rcu() can miss GP completion
> when synchronize_rcu() and synchronize_rcu_expedited() run
> concurrently.
>
> To address this, switch to the poll_state_synchronize_rcu_full() and
> get_state_synchronize_rcu_full() APIs, which use separate variables
> for expedited and normal states.
>
> Link: https://lore.kernel.org/lkml/Z5ikQeVmVdsWQrdD@pc636/T/

I switched this into "Closes:" per checkpatch.

> Fixes: 988f569ae041 ("rcu: Reduce synchronize_rcu() latency")
> Reported-by: cheung wall <zzqq0103.hey@gmail.com>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

You seem to have forgotten to add Paul's Reviewed-by, so I added it in
rcu/next. Would you or Paul double-check that the Reviewed-by should be
here?

Regards,
Boqun

> ---
>  include/linux/rcupdate_wait.h | 3 +++
>  kernel/rcu/tree.c             | 8 +++-----
>  2 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/rcupdate_wait.h b/include/linux/rcupdate_wait.h
> index f9bed3d3f78d..4c92d4291cce 100644
> --- a/include/linux/rcupdate_wait.h
> +++ b/include/linux/rcupdate_wait.h
> @@ -16,6 +16,9 @@
>  struct rcu_synchronize {
>  	struct rcu_head head;
>  	struct completion completion;
> +
> +	/* This is for debugging. */
> +	struct rcu_gp_oldstate oldstate;
>  };
>  void wakeme_after_rcu(struct rcu_head *head);
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8625f616c65a..48384fa2eaeb 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1632,12 +1632,10 @@ static void rcu_sr_normal_complete(struct llist_node *node)
>  {
>  	struct rcu_synchronize *rs = container_of(
>  		(struct rcu_head *) node, struct rcu_synchronize, head);
> -	unsigned long oldstate = (unsigned long) rs->head.func;
>
>  	WARN_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) &&
> -		  !poll_state_synchronize_rcu(oldstate),
> -		  "A full grace period is not passed yet: %lu",
> -		  rcu_seq_diff(get_state_synchronize_rcu(), oldstate));
> +		  !poll_state_synchronize_rcu_full(&rs->oldstate),
> +		  "A full grace period is not passed yet!\n");
>
>  	/* Finally. */
>  	complete(&rs->completion);
> @@ -3247,7 +3245,7 @@ static void synchronize_rcu_normal(void)
>  	 * snapshot before adding a request.
>  	 */
>  	if (IS_ENABLED(CONFIG_PROVE_RCU))
> -		rs.head.func = (void *) get_state_synchronize_rcu();
> +		get_state_synchronize_rcu_full(&rs.oldstate);
>
>  	rcu_sr_normal_add_req(&rs);
>
> --
> 2.39.5
On Thu, Feb 27, 2025 at 09:12:39AM -0800, Boqun Feng wrote:
> Hi Ulad,
>
> I put these three patches into next (and misc.2025.02.27a) for some
> testing; hopefully it all goes well and they can make it into v6.15.
>
> A few tags changed below:
>
[...]
>
> You seem to have forgotten to add Paul's Reviewed-by, so I added it in
> rcu/next. Would you or Paul double-check that the Reviewed-by should be
> here?

I am good with keeping my Reviewed-by tags.

							Thanx, Paul

> Regards,
> Boqun
[...]
On Thu, Feb 27, 2025 at 09:26:40AM -0800, Paul E. McKenney wrote:
> On Thu, Feb 27, 2025 at 09:12:39AM -0800, Boqun Feng wrote:
> > Hi Ulad,
> >
[...]
> > You seem to have forgotten to add Paul's Reviewed-by, so I added it in
> > rcu/next. Would you or Paul double-check that the Reviewed-by should be
> > here?
>
> I am good with keeping my Reviewed-by tags.
>
Thank you!

Regards,
Boqun

> 							Thanx, Paul
[...]
Hello, Boqun!

> Hi Ulad,
>
> I put these three patches into next (and misc.2025.02.27a) for some
> testing; hopefully it all goes well and they can make it into v6.15.
>
> A few tags changed below:
>
[...]
>
> I switched this into "Closes:" per checkpatch.
>
Thanks!

> > Fixes: 988f569ae041 ("rcu: Reduce synchronize_rcu() latency")
> > Reported-by: cheung wall <zzqq0103.hey@gmail.com>
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
>
> You seem to have forgotten to add Paul's Reviewed-by, so I added it in
> rcu/next. Would you or Paul double-check that the Reviewed-by should be
> here?
>
Probably. But I do not remember getting a review from Paul. I might be
wrong and have missed it :) Anyway, thank you for adding it.

--
Uladzislau Rezki
On Thu, Feb 27, 2025 at 09:26:40AM -0800, Paul E. McKenney wrote:
> On Thu, Feb 27, 2025 at 09:12:39AM -0800, Boqun Feng wrote:
[...]
> > You seem to have forgotten to add Paul's Reviewed-by, so I added it in
> > rcu/next. Would you or Paul double-check that the Reviewed-by should be
> > here?
>
> I am good with keeping my Reviewed-by tags.
>
Thanks Paul!

--
Uladzislau Rezki
On Thu, Feb 27, 2025 at 06:44:15PM +0100, Uladzislau Rezki wrote:
> On Thu, Feb 27, 2025 at 09:26:40AM -0800, Paul E. McKenney wrote:
[...]
> > I am good with keeping my Reviewed-by tags.
> >
> Thanks Paul!

Except that I got this from overnight testing of rcu/dev on the shared
RCU tree:

WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636 rcu_sr_normal_complete+0x5c/0x80

I see this only on TREE05. Which should not be too surprising, given
that this is the scenario that tests it. It happened within five minutes
on all 14 of the TREE05 runs.

This commit, just to avoid any ambiguity:

7cb48b64a563 ("MAINTAINERS: Update Joel's email address")

							Thanx, Paul
On Fri, Feb 28, 2025 at 07:41:40AM -0800, Paul E. McKenney wrote:
> On Thu, Feb 27, 2025 at 06:44:15PM +0100, Uladzislau Rezki wrote:
[...]
> Except that I got this from overnight testing of rcu/dev on the shared
> RCU tree:
>
> WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636 rcu_sr_normal_complete+0x5c/0x80
>
> I see this only on TREE05. Which should not be too surprising, given
> that this is the scenario that tests it. It happened within five minutes
> on all 14 of the TREE05 runs.
>
Hm.. This is not fun. I tested this on my system and I did not manage to
trigger it, whereas you do. Something is wrong.

--
Uladzislau Rezki
On Fri, Feb 28, 2025 at 05:36:47PM +0100, Uladzislau Rezki wrote:
> On Fri, Feb 28, 2025 at 07:41:40AM -0800, Paul E. McKenney wrote:
[...]
> > I see this only on TREE05. Which should not be too surprising, given
> > that this is the scenario that tests it. It happened within five minutes
> > on all 14 of the TREE05 runs.
> >
> Hm.. This is not fun. I tested this on my system and I did not manage to
> trigger it, whereas you do. Something is wrong.
>
We have the below code to start a new GP if we detect that processing is
starved:

<snip>
	/*
	 * The "start_new_poll" is set to true, only when this GP is not able
	 * to handle anything and there are outstanding users. It happens when
	 * the rcu_sr_normal_gp_init() function was not able to insert a dummy
	 * separator to the llist, because there were no dummy-nodes left.
	 *
	 * The number of dummy-nodes is fixed, and it could be that we have run
	 * out of them; if so, we start a new poll request to repeat the try.
	 * It is rare and it means that a system is doing a slow processing of
	 * callbacks.
	 */
	if (start_new_poll)
		(void) start_poll_synchronize_rcu();
<snip>

We do not use a _full() version, since we need to inform the
rcu-gp-kthread to initiate a new GP.

Any thoughts?

--
Uladzislau Rezki
On Fri, Feb 28, 2025 at 05:36:47PM +0100, Uladzislau Rezki wrote:
> On Fri, Feb 28, 2025 at 07:41:40AM -0800, Paul E. McKenney wrote:
[...]
> > I see this only on TREE05. Which should not be too surprising, given
> > that this is the scenario that tests it. It happened within five minutes
> > on all 14 of the TREE05 runs.
> >
> Hm.. This is not fun. I tested this on my system and I did not manage to
> trigger it, whereas you do. Something is wrong.

If you have a debug patch, I would be happy to give it a go.

							Thanx, Paul
On Fri, Feb 28, 2025 at 10:21:57AM -0800, Paul E. McKenney wrote:
> On Fri, Feb 28, 2025 at 05:36:47PM +0100, Uladzislau Rezki wrote:
[...]
> > Hm.. This is not fun. I tested this on my system and I did not manage to
> > trigger it, whereas you do. Something is wrong.
>
> If you have a debug patch, I would be happy to give it a go.
>
I can trigger it. But first, some background. I tested those patches for
many hours on the stable kernel, which is 6.13. On that kernel I was not
able to trigger it. Running rcutorture on our shared "dev" tree, which I
did now, triggers this right away.

--
Uladzislau Rezki
On Fri, Feb 28, 2025 at 06:08:19PM +0100, Uladzislau Rezki wrote:
> On Fri, Feb 28, 2025 at 05:36:47PM +0100, Uladzislau Rezki wrote:
[...]
> We have the below code to start a new GP if we detect that processing is
> starved:
>
> <snip>
> 	/*
> 	 * The "start_new_poll" is set to true, only when this GP is not able
> 	 * to handle anything and there are outstanding users. It happens when
> 	 * the rcu_sr_normal_gp_init() function was not able to insert a dummy
> 	 * separator to the llist, because there were no dummy-nodes left.
> 	 *
> 	 * The number of dummy-nodes is fixed, and it could be that we have run
> 	 * out of them; if so, we start a new poll request to repeat the try.
> 	 * It is rare and it means that a system is doing a slow processing of
> 	 * callbacks.
> 	 */
> 	if (start_new_poll)
> 		(void) start_poll_synchronize_rcu();
> <snip>
>
> We do not use a _full() version, since we need to inform the
> rcu-gp-kthread to initiate a new GP.
>
> Any thoughts?

My kneejerk not-to-be-trusted take is that it does not matter which type
of grace period gets started so long as a grace period does get started.

Presumably you have done the get_state_synchronize_rcu_full() before
this point?

							Thanx, Paul
On Fri, Feb 28, 2025 at 10:25:58AM -0800, Paul E. McKenney wrote:
> On Fri, Feb 28, 2025 at 06:08:19PM +0100, Uladzislau Rezki wrote:
[...]
> > We do not use a _full() version, since we need to inform the
> > rcu-gp-kthread to initiate a new GP.
> >
> > Any thoughts?
>
> My kneejerk not-to-be-trusted take is that it does not matter which type
> of grace period gets started so long as a grace period does get started.
> Presumably you have done the get_state_synchronize_rcu_full() before
> this point?
>
Yes, of course. That code is located in the gp-kthread.
get_state_synchronize_rcu_full() is done before.

--
Uladzislau Rezki
On Fri, Feb 28, 2025 at 07:24:41PM +0100, Uladzislau Rezki wrote: > On Fri, Feb 28, 2025 at 10:21:57AM -0800, Paul E. McKenney wrote: > > On Fri, Feb 28, 2025 at 05:36:47PM +0100, Uladzislau Rezki wrote: > > > On Fri, Feb 28, 2025 at 07:41:40AM -0800, Paul E. McKenney wrote: > > > > On Thu, Feb 27, 2025 at 06:44:15PM +0100, Uladzislau Rezki wrote: > > > > > On Thu, Feb 27, 2025 at 09:26:40AM -0800, Paul E. McKenney wrote: > > > > > > On Thu, Feb 27, 2025 at 09:12:39AM -0800, Boqun Feng wrote: > > > > > > > Hi Ulad, > > > > > > > > > > > > > > I put these three patches into next (and misc.2025.02.27a) for some > > > > > > > testing, hopefully it all goes well and they can make it v6.15. > > > > > > > > > > > > > > A few tag changed below: > > > > > > > > > > > > > > On Thu, Feb 27, 2025 at 02:16:13PM +0100, Uladzislau Rezki (Sony) wrote: > > > > > > > > Switch for using of get_state_synchronize_rcu_full() and > > > > > > > > poll_state_synchronize_rcu_full() pair to debug a normal > > > > > > > > synchronize_rcu() call. > > > > > > > > > > > > > > > > Just using "not" full APIs to identify if a grace period is > > > > > > > > passed or not might lead to a false-positive kernel splat. > > > > > > > > > > > > > > > > It can happen, because get_state_synchronize_rcu() compresses > > > > > > > > both normal and expedited states into one single unsigned long > > > > > > > > value, so a poll_state_synchronize_rcu() can miss GP-completion > > > > > > > > when synchronize_rcu()/synchronize_rcu_expedited() concurrently > > > > > > > > run. > > > > > > > > > > > > > > > > To address this, switch to poll_state_synchronize_rcu_full() and > > > > > > > > get_state_synchronize_rcu_full() APIs, which use separate variables > > > > > > > > for expedited and normal states. > > > > > > > > > > > > > > > > Link: https://lore.kernel.org/lkml/Z5ikQeVmVdsWQrdD@pc636/T/ > > > > > > > > > > > > > > I switch this into "Closes:" per checkpatch. 
> > > > > > > > > > > > > > > Fixes: 988f569ae041 ("rcu: Reduce synchronize_rcu() latency") > > > > > > > > Reported-by: cheung wall <zzqq0103.hey@gmail.com> > > > > > > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> > > > > > > > > > > > > > > You seem to forget add Paul's Reviewed-by, so I add it in rcu/next. > > > > > > > Would you or Paul double-check the Reviewed-by should be here? > > > > > > > > > > > > I am good with keeping my Reviewed-by tags. > > > > > > > > > > > Thanks Paul! > > > > > > > > Except that I got this from overnight testing of rcu/dev on the shared > > > > RCU tree: > > > > > > > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636 rcu_sr_normal_complete+0x5c/0x80 > > > > > > > > I see this only on TREE05. Which should not be too surprising, given > > > > that this is the scenario that tests it. It happened within five minutes > > > > on all 14 of the TREE05 runs. > > > > > > > Hm.. This is not fun. I tested this on my system and i did not manage to > > > trigger this whereas you do. Something is wrong. > > > > If you have a debug patch, I would be happy to give it a go. > > > I can trigger it. But. > > Some background. I tested those patches during many hours on the stable > kernel which is 6.13. On that kernel i was not able to trigger it. Running > the rcutorture on the our shared "dev" tree, which i did now, triggers this > right away. Bisection? (Hey, you knew that was coming!) Thanx, Paul
Hello, Paul!

> > > > > > > > > > Except that I got this from overnight testing of rcu/dev on the shared > > > > > RCU tree: > > > > > > > > > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636 rcu_sr_normal_complete+0x5c/0x80 > > > > > > > > > > I see this only on TREE05. Which should not be too surprising, given > > > > > that this is the scenario that tests it. It happened within five minutes > > > > > on all 14 of the TREE05 runs. > > > > > > > > > Hm.. This is not fun. I tested this on my system and i did not manage to > > > > trigger this whereas you do. Something is wrong. > > > > > > If you have a debug patch, I would be happy to give it a go. > > > > > I can trigger it. But. > > > > Some background. I tested those patches during many hours on the stable > > kernel which is 6.13. On that kernel i was not able to trigger it. Running > > the rcutorture on the our shared "dev" tree, which i did now, triggers this > > right away. > > Bisection? (Hey, you knew that was coming!)

Looks like this:

rcu: Fix get_state_synchronize_rcu_full() GP-start detection

After reverting it in the dev tree, rcutorture passes TREE05, 16 instances.

--
Uladzislau Rezki
On Fri, Feb 28, 2025 at 08:12:51PM +0100, Uladzislau Rezki wrote: > Hello, Paul! > > > > > > > > > > > > > Except that I got this from overnight testing of rcu/dev on the shared > > > > > > RCU tree: > > > > > > > > > > > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636 rcu_sr_normal_complete+0x5c/0x80 > > > > > > > > > > > > I see this only on TREE05. Which should not be too surprising, given > > > > > > that this is the scenario that tests it. It happened within five minutes > > > > > > on all 14 of the TREE05 runs. > > > > > > > > > > > Hm.. This is not fun. I tested this on my system and i did not manage to > > > > > trigger this whereas you do. Something is wrong. > > > > > > > > If you have a debug patch, I would be happy to give it a go. > > > > > > > I can trigger it. But. > > > > > > Some background. I tested those patches during many hours on the stable > > > kernel which is 6.13. On that kernel i was not able to trigger it. Running > > > the rcutorture on the our shared "dev" tree, which i did now, triggers this > > > right away. > > > > Bisection? (Hey, you knew that was coming!) > > > Looks like this: rcu: Fix get_state_synchronize_rcu_full() GP-start detection > > After revert in the dev, rcutorture passes TREE05, 16 instances.

Huh. We sure don't get to revert that one...

Do we have a problem with the ordering in rcu_gp_init() between the calls to rcu_seq_start() and portions of rcu_sr_normal_gp_init()? For example, do we need to capture the relevant portion of the list before the call to rcu_seq_start(), and do the grace-period-start work afterwards?

My kneejerk (and possibly completely wrong) guess is that rcu_gp_init() calls rcu_gp_start(), then there is a call to synchronize_rcu() whose cookie says wait for the end of the next grace period, then we capture the lists including this one that needs to wait longer. Then when we look at the end of the grace period, boom!
This would be a real bug due to some CPU coming online between the time of the call to rcu_gp_start() and synchronize_rcu(). Or is there some other way that this can happen? Thanx, Paul
On Fri, Feb 28, 2025 at 11:59:55AM -0800, Paul E. McKenney wrote: > On Fri, Feb 28, 2025 at 08:12:51PM +0100, Uladzislau Rezki wrote: > > Hello, Paul! > > > > > > > > > > > > > > > > Except that I got this from overnight testing of rcu/dev on the shared > > > > > > > RCU tree: > > > > > > > > > > > > > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636 rcu_sr_normal_complete+0x5c/0x80 > > > > > > > > > > > > > > I see this only on TREE05. Which should not be too surprising, given > > > > > > > that this is the scenario that tests it. It happened within five minutes > > > > > > > on all 14 of the TREE05 runs. > > > > > > > > > > > > > Hm.. This is not fun. I tested this on my system and i did not manage to > > > > > > trigger this whereas you do. Something is wrong. > > > > > > > > > > If you have a debug patch, I would be happy to give it a go. > > > > > > > > > I can trigger it. But. > > > > > > > > Some background. I tested those patches during many hours on the stable > > > > kernel which is 6.13. On that kernel i was not able to trigger it. Running > > > > the rcutorture on the our shared "dev" tree, which i did now, triggers this > > > > right away. > > > > > > Bisection? (Hey, you knew that was coming!) > > > > > Looks like this: rcu: Fix get_state_synchronize_rcu_full() GP-start detection > > > > After revert in the dev, rcutorture passes TREE05, 16 instances. > > Huh. We sure don't get to revert that one... > > Do we have a problem with the ordering in rcu_gp_init() between the calls > to rcu_seq_start() and portions of rcu_sr_normal_gp_init()? For example, > do we need to capture the relevant portion of the list before the call > to rcu_seq_start(), and do the grace-period-start work afterwards? I tried moving the call to rcu_sr_normal_gp_init() before the call to rcu_seq_start() and got no failures in a one-hour run of 200*TREE05. Which does not necessarily mean that this is the correct fix, but I figured that it might at least provide food for thought. 
Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 48384fa2eaeb8..d3efeff7740e7 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1819,10 +1819,10 @@ static noinline_for_stack bool rcu_gp_init(void)
 
 	/* Advance to a new grace period and initialize state. */
 	record_gp_stall_check_time();
+	start_new_poll = rcu_sr_normal_gp_init();
 	/* Record GP times before starting GP, hence rcu_seq_start(). */
 	rcu_seq_start(&rcu_state.gp_seq);
 	ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
-	start_new_poll = rcu_sr_normal_gp_init();
 	trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start"));
 	rcu_poll_gp_seq_start(&rcu_state.gp_seq_polled_snap);
 	raw_spin_unlock_irq_rcu_node(rnp);
On Fri, Feb 28, 2025 at 05:08:49PM -0800, Paul E. McKenney wrote: > On Fri, Feb 28, 2025 at 11:59:55AM -0800, Paul E. McKenney wrote: > > On Fri, Feb 28, 2025 at 08:12:51PM +0100, Uladzislau Rezki wrote: > > > Hello, Paul! > > > > > > > > > > > > > > > > > > > Except that I got this from overnight testing of rcu/dev on the shared > > > > > > > > RCU tree: > > > > > > > > > > > > > > > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636 rcu_sr_normal_complete+0x5c/0x80 > > > > > > > > > > > > > > > > I see this only on TREE05. Which should not be too surprising, given > > > > > > > > that this is the scenario that tests it. It happened within five minutes > > > > > > > > on all 14 of the TREE05 runs. > > > > > > > > > > > > > > > Hm.. This is not fun. I tested this on my system and i did not manage to > > > > > > > trigger this whereas you do. Something is wrong. > > > > > > > > > > > > If you have a debug patch, I would be happy to give it a go. > > > > > > > > > > > I can trigger it. But. > > > > > > > > > > Some background. I tested those patches during many hours on the stable > > > > > kernel which is 6.13. On that kernel i was not able to trigger it. Running > > > > > the rcutorture on the our shared "dev" tree, which i did now, triggers this > > > > > right away. > > > > > > > > Bisection? (Hey, you knew that was coming!) > > > > > > > Looks like this: rcu: Fix get_state_synchronize_rcu_full() GP-start detection > > > > > > After revert in the dev, rcutorture passes TREE05, 16 instances. > > > > Huh. We sure don't get to revert that one... > > > > Do we have a problem with the ordering in rcu_gp_init() between the calls > > to rcu_seq_start() and portions of rcu_sr_normal_gp_init()? For example, > > do we need to capture the relevant portion of the list before the call > > to rcu_seq_start(), and do the grace-period-start work afterwards? 
>
> I tried moving the call to rcu_sr_normal_gp_init() before the call to
> rcu_seq_start() and got no failures in a one-hour run of 200*TREE05.
> Which does not necessarily mean that this is the correct fix, but I
> figured that it might at least provide food for thought.
>
> 	Thanx, Paul
>
> ------------------------------------------------------------------------
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 48384fa2eaeb8..d3efeff7740e7 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1819,10 +1819,10 @@ static noinline_for_stack bool rcu_gp_init(void)
>
>  	/* Advance to a new grace period and initialize state. */
>  	record_gp_stall_check_time();
> +	start_new_poll = rcu_sr_normal_gp_init();
>  	/* Record GP times before starting GP, hence rcu_seq_start(). */
>  	rcu_seq_start(&rcu_state.gp_seq);
>  	ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> -	start_new_poll = rcu_sr_normal_gp_init();
>  	trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start"));
>  	rcu_poll_gp_seq_start(&rcu_state.gp_seq_polled_snap);
>  	raw_spin_unlock_irq_rcu_node(rnp);

I have been running this for 24 hours already, TREE05 * 16 scenario, and I do not see any warnings yet. There is a race, indeed: gp_seq is moved forward, whereas clients can still come in until rcu_sr_normal_gp_init() places a dummy wait-head for this GP.

Thank you for testing, Paul, and for looking into this :)

--
Uladzislau Rezki
On Sun, Mar 02, 2025 at 11:19:44AM +0100, Uladzislau Rezki wrote: > On Fri, Feb 28, 2025 at 05:08:49PM -0800, Paul E. McKenney wrote: > > On Fri, Feb 28, 2025 at 11:59:55AM -0800, Paul E. McKenney wrote: > > > On Fri, Feb 28, 2025 at 08:12:51PM +0100, Uladzislau Rezki wrote: > > > > Hello, Paul! > > > > > > > > > > > > > > > > > > > > > > Except that I got this from overnight testing of rcu/dev on the shared > > > > > > > > > RCU tree: > > > > > > > > > > > > > > > > > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636 rcu_sr_normal_complete+0x5c/0x80 > > > > > > > > > > > > > > > > > > I see this only on TREE05. Which should not be too surprising, given > > > > > > > > > that this is the scenario that tests it. It happened within five minutes > > > > > > > > > on all 14 of the TREE05 runs. > > > > > > > > > > > > > > > > > Hm.. This is not fun. I tested this on my system and i did not manage to > > > > > > > > trigger this whereas you do. Something is wrong. > > > > > > > > > > > > > > If you have a debug patch, I would be happy to give it a go. > > > > > > > > > > > > > I can trigger it. But. > > > > > > > > > > > > Some background. I tested those patches during many hours on the stable > > > > > > kernel which is 6.13. On that kernel i was not able to trigger it. Running > > > > > > the rcutorture on the our shared "dev" tree, which i did now, triggers this > > > > > > right away. > > > > > > > > > > Bisection? (Hey, you knew that was coming!) > > > > > > > > > Looks like this: rcu: Fix get_state_synchronize_rcu_full() GP-start detection > > > > > > > > After revert in the dev, rcutorture passes TREE05, 16 instances. > > > > > > Huh. We sure don't get to revert that one... > > > > > > Do we have a problem with the ordering in rcu_gp_init() between the calls > > > to rcu_seq_start() and portions of rcu_sr_normal_gp_init()? 
For example, > > > do we need to capture the relevant portion of the list before the call > > > to rcu_seq_start(), and do the grace-period-start work afterwards? > > > > I tried moving the call to rcu_sr_normal_gp_init() before the call to > > rcu_seq_start() and got no failures in a one-hour run of 200*TREE05. > > Which does not necessarily mean that this is the correct fix, but I > > figured that it might at least provide food for thought. > > > > Thanx, Paul > > > > ------------------------------------------------------------------------ > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > index 48384fa2eaeb8..d3efeff7740e7 100644 > > --- a/kernel/rcu/tree.c > > +++ b/kernel/rcu/tree.c > > @@ -1819,10 +1819,10 @@ static noinline_for_stack bool rcu_gp_init(void) > > > > /* Advance to a new grace period and initialize state. */ > > record_gp_stall_check_time(); > > + start_new_poll = rcu_sr_normal_gp_init(); > > /* Record GP times before starting GP, hence rcu_seq_start(). */ > > rcu_seq_start(&rcu_state.gp_seq); > > ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq); > > - start_new_poll = rcu_sr_normal_gp_init(); > > trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start")); > > rcu_poll_gp_seq_start(&rcu_state.gp_seq_polled_snap); > > raw_spin_unlock_irq_rcu_node(rnp); > > > Running this 24 hours already. TREE05 * 16 scenario. I do not see any > warnings yet. There is a race, indeed. The gp_seq is moved forward, > wheres clients can still come until rcu_sr_normal_gp_init() places a > dummy-wait-head for this GP. > > Thank you for testing Paul and looking to this :) Very good! This is a bug in this commit of mine: 012f47f0f806 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection") Boqun, could you please fold this into that commit with something like this added to the commit log just before the paragraph starting with "Although this fixes 91a967fd6934"? 
However, simply changing the get_state_synchronize_rcu_full() function to use rcu_state.gp_seq instead of the root rcu_node structure's ->gp_seq field results in a theoretical bug in kernels booted with rcutree.rcu_normal_wake_from_gp=1 due to the following sequence of events:

o	The rcu_gp_init() function invokes rcu_seq_start() to officially start a new grace period.

o	A new RCU reader begins, referencing X from some RCU-protected list. The new grace period is not obligated to wait for this reader.

o	An updater removes X, then calls synchronize_rcu(), which queues a wait element.

o	The grace period ends, awakening the updater, which frees X while the reader is still referencing it.

The reason that this is theoretical is that although the grace period has officially started, none of the CPUs are officially aware of this, and thus will have to assume that the RCU reader pre-dated the start of the grace period.

Except for kernels built with CONFIG_PROVE_RCU=y, which use the polled grace-period APIs, which can and do complain bitterly when this sequence of events occurs. Not only that, there might be some future RCU grace-period mechanism that pulls this sequence of events from theory into practice. This commit therefore also pulls the call to rcu_sr_normal_gp_init() to precede that to rcu_seq_start().

I will let you guys decide whether the call to rcu_sr_normal_gp_init() needs a comment, and, if so, what that comment should say. ;-)

	Thanx, Paul
On Sun, Mar 02, 2025 at 09:39:44AM -0800, Paul E. McKenney wrote: > On Sun, Mar 02, 2025 at 11:19:44AM +0100, Uladzislau Rezki wrote: > > On Fri, Feb 28, 2025 at 05:08:49PM -0800, Paul E. McKenney wrote: > > > On Fri, Feb 28, 2025 at 11:59:55AM -0800, Paul E. McKenney wrote: > > > > On Fri, Feb 28, 2025 at 08:12:51PM +0100, Uladzislau Rezki wrote: > > > > > Hello, Paul! > > > > > > > > > > > > > > > > > > > > > > > > > Except that I got this from overnight testing of rcu/dev on the shared > > > > > > > > > > RCU tree: > > > > > > > > > > > > > > > > > > > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636 rcu_sr_normal_complete+0x5c/0x80 > > > > > > > > > > > > > > > > > > > > I see this only on TREE05. Which should not be too surprising, given > > > > > > > > > > that this is the scenario that tests it. It happened within five minutes > > > > > > > > > > on all 14 of the TREE05 runs. > > > > > > > > > > > > > > > > > > > Hm.. This is not fun. I tested this on my system and i did not manage to > > > > > > > > > trigger this whereas you do. Something is wrong. > > > > > > > > > > > > > > > > If you have a debug patch, I would be happy to give it a go. > > > > > > > > > > > > > > > I can trigger it. But. > > > > > > > > > > > > > > Some background. I tested those patches during many hours on the stable > > > > > > > kernel which is 6.13. On that kernel i was not able to trigger it. Running > > > > > > > the rcutorture on the our shared "dev" tree, which i did now, triggers this > > > > > > > right away. > > > > > > > > > > > > Bisection? (Hey, you knew that was coming!) > > > > > > > > > > > Looks like this: rcu: Fix get_state_synchronize_rcu_full() GP-start detection > > > > > > > > > > After revert in the dev, rcutorture passes TREE05, 16 instances. > > > > > > > > Huh. We sure don't get to revert that one... > > > > > > > > Do we have a problem with the ordering in rcu_gp_init() between the calls > > > > to rcu_seq_start() and portions of rcu_sr_normal_gp_init()? 
For example, > > > > do we need to capture the relevant portion of the list before the call > > > > to rcu_seq_start(), and do the grace-period-start work afterwards? > > > > > > I tried moving the call to rcu_sr_normal_gp_init() before the call to > > > rcu_seq_start() and got no failures in a one-hour run of 200*TREE05. > > > Which does not necessarily mean that this is the correct fix, but I > > > figured that it might at least provide food for thought. > > > > > > Thanx, Paul > > > > > > ------------------------------------------------------------------------ > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > index 48384fa2eaeb8..d3efeff7740e7 100644 > > > --- a/kernel/rcu/tree.c > > > +++ b/kernel/rcu/tree.c > > > @@ -1819,10 +1819,10 @@ static noinline_for_stack bool rcu_gp_init(void) > > > > > > /* Advance to a new grace period and initialize state. */ > > > record_gp_stall_check_time(); > > > + start_new_poll = rcu_sr_normal_gp_init(); > > > /* Record GP times before starting GP, hence rcu_seq_start(). */ > > > rcu_seq_start(&rcu_state.gp_seq); > > > ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq); > > > - start_new_poll = rcu_sr_normal_gp_init(); > > > trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start")); > > > rcu_poll_gp_seq_start(&rcu_state.gp_seq_polled_snap); > > > raw_spin_unlock_irq_rcu_node(rnp); > > > > > Running this 24 hours already. TREE05 * 16 scenario. I do not see any > > warnings yet. There is a race, indeed. The gp_seq is moved forward, > > wheres clients can still come until rcu_sr_normal_gp_init() places a > > dummy-wait-head for this GP. > > > > Thank you for testing Paul and looking to this :) > > Very good! This is a bug in this commit of mine: > > 012f47f0f806 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection") > > Boqun, could you please fold this into that commit with something like > this added to the commit log just before the paragraph starting with > "Although this fixes 91a967fd6934"? 
> > However, simply changing get_state_synchronize_rcu_full() function > to use rcu_state.gp_seq instead of the root rcu_node structure's > ->gp_seq field results in a theoretical bug in kernels booted > with rcutree.rcu_normal_wake_from_gp=1 due to the following > sequence of events: > > o The rcu_gp_init() function invokes rcu_seq_start() > to officially start a new grace period. > > o A new RCU reader begins, referencing X from some > RCU-protected list. The new grace period is not > obligated to wait for this reader. > > o An updater removes X, then calls synchronize_rcu(), > which queues a wait element. > > o The grace period ends, awakening the updater, which > frees X while the reader is still referencing it. > > The reason that this is theoretical is that although the > grace period has officially started, none of the CPUs are > officially aware of this, and thus will have to assume that > the RCU reader pre-dated the start of the grace period. > > Except for kernels built with CONFIG_PROVE_RCU=y, which use the > polled grace-period APIs, which can and do complain bitterly when > this sequence of events occurs. Not only that, there might be > some future RCU grace-period mechanism that pulls this sequence > of events from theory into practice. This commit therefore > also pulls the call to rcu_sr_normal_gp_init() to precede that > to rcu_seq_start(). > > I will let you guys decide whether the call to rcu_sr_normal_gp_init() > needs a comment, and, if so, what that comment should say. ;-) > Please see the updated patch below (next and rcu/dev branches are updated too). For the comment, I think we can add something like /* * A new wait segment must be started before gp_seq advanced, so * that previous gp waiters won't observe the new gp_seq. 
 */

but I will let Ulad decide ;-)

Regards,
Boqun

------------------------>8

Subject: [PATCH] rcu: Fix get_state_synchronize_rcu_full() GP-start detection

The get_state_synchronize_rcu_full() and poll_state_synchronize_rcu_full() functions use the root rcu_node structure's ->gp_seq field to detect the beginnings and ends of grace periods, respectively. This choice is necessary for the poll_state_synchronize_rcu_full() function because (give or take counter wrap), the following sequence is guaranteed not to trigger:

	get_state_synchronize_rcu_full(&rgos);
	synchronize_rcu();
	WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&rgos));

The RCU callbacks that awaken synchronize_rcu() instances are guaranteed not to be invoked before the root rcu_node structure's ->gp_seq field is updated to indicate the end of the grace period. However, these callbacks might start being invoked immediately thereafter, in particular, before rcu_state.gp_seq has been updated. Therefore, poll_state_synchronize_rcu_full() must refer to the root rcu_node structure's ->gp_seq field. Because this field is updated under this structure's ->lock, any code following a call to poll_state_synchronize_rcu_full() will be fully ordered after the full grace-period computation, as is required by RCU's memory-ordering semantics.

By symmetry, the get_state_synchronize_rcu_full() function should also use this same root rcu_node structure's ->gp_seq field. But it turns out that symmetry is profoundly (though extremely infrequently) destructive in this case. To see this, consider the following sequence of events:

1.	CPU 0 starts a new grace period, and updates rcu_state.gp_seq accordingly.

2.	As its first step of grace-period initialization, CPU 0 examines the current CPU hotplug state and decides that it need not wait for CPU 1, which is currently offline.

3.	CPU 1 comes online, and updates its state. But this does not affect the current grace period, but rather the one after that. After all, CPU 1 was offline when the current grace period started, so all pre-existing RCU readers on CPU 1 must have completed or been preempted before it last went offline. The current grace period therefore has nothing it needs to wait for on CPU 1.

4.	CPU 1 switches to an rcutorture kthread which is running rcutorture's rcu_torture_reader() function, which starts a new RCU reader.

5.	CPU 2 is running rcutorture's rcu_torture_writer() function and collects a new polled grace-period "cookie" using get_state_synchronize_rcu_full(). Because the newly started grace period has not completed initialization, the root rcu_node structure's ->gp_seq field has not yet been updated to indicate that this new grace period has already started. This cookie is therefore set up for the end of the current grace period (rather than the end of the following grace period).

6.	CPU 0 finishes grace-period initialization.

7.	If CPU 1's rcutorture reader is preempted, it will be added to the ->blkd_tasks list, but because CPU 1's ->qsmask bit is not set in CPU 1's leaf rcu_node structure, the ->gp_tasks pointer will not be updated. Thus, this grace period will not wait on it. Which is only fair, given that the CPU did not come online until after the grace period officially started.

8.	CPUs 0 and 2 then detect the new grace period and then report a quiescent state to the RCU core.

9.	Because CPU 1 was offline at the start of the current grace period, CPUs 0 and 2 are the only CPUs that this grace period needs to wait on. So the grace period ends and post-grace-period cleanup starts. In particular, the root rcu_node structure's ->gp_seq field is updated to indicate that this grace period has now ended.

10.	CPU 2 continues running rcu_torture_writer() and sees that, from the viewpoint of the root rcu_node structure consulted by the poll_state_synchronize_rcu_full() function, the grace period has ended. It therefore updates state accordingly.

11.	CPU 1 is still running the same RCU reader, which notices this update and thus complains about the too-short grace period.

The fix is for the get_state_synchronize_rcu_full() function to use rcu_state.gp_seq instead of the root rcu_node structure's ->gp_seq field. With this change in place, if step 5's cookie indicates that the grace period has not yet started, then any prior code executed by CPU 2 must have happened before CPU 1 came online. This will in turn prevent CPU 1's code in steps 3 and 11 from spanning CPU 2's grace-period wait, thus preventing CPU 1 from being subjected to a too-short grace period. This commit therefore makes this change.

Note that there is no change to the poll_state_synchronize_rcu_full() function, which, as noted above, must continue to use the root rcu_node structure's ->gp_seq field. This is of course an asymmetry between these two functions, but it is an asymmetry that is absolutely required for correct operation. It is a common human tendency to greatly value symmetry, and sometimes symmetry is a wonderful thing. Other times, symmetry results in poor performance. But in this case, symmetry is just plain wrong.

Nevertheless, the asymmetry does require an additional adjustment. It is possible for get_state_synchronize_rcu_full() to see a given grace period as having started, but for an immediately following poll_state_synchronize_rcu_full() to see it as having not yet started. Given the current rcu_seq_done_exact() implementation, this will result in a false-positive indication that the grace period is done from poll_state_synchronize_rcu_full(). This is dealt with by making rcu_seq_done_exact() reach back three grace periods rather than just two of them.
However, simply changing the get_state_synchronize_rcu_full() function to use rcu_state.gp_seq instead of the root rcu_node structure's ->gp_seq field results in a theoretical bug in kernels booted with rcutree.rcu_normal_wake_from_gp=1 due to the following sequence of events:

o	The rcu_gp_init() function invokes rcu_seq_start() to officially start a new grace period.

o	A new RCU reader begins, referencing X from some RCU-protected list. The new grace period is not obligated to wait for this reader.

o	An updater removes X, then calls synchronize_rcu(), which queues a wait element.

o	The grace period ends, awakening the updater, which frees X while the reader is still referencing it.

The reason that this is theoretical is that although the grace period has officially started, none of the CPUs are officially aware of this, and thus will have to assume that the RCU reader pre-dated the start of the grace period.

Except for kernels built with CONFIG_PROVE_RCU=y, which use the polled grace-period APIs, which can and do complain bitterly when this sequence of events occurs. Not only that, there might be some future RCU grace-period mechanism that pulls this sequence of events from theory into practice. This commit therefore also pulls the call to rcu_sr_normal_gp_init() to precede that to rcu_seq_start().

Although this fixes 91a967fd6934 ("rcu: Add full-sized polling for get_completed*() and poll_state*()"), it is not clear that it is worth backporting this commit. First, it took me many weeks to convince rcutorture to reproduce this more frequently than once per year. Second, this cannot be reproduced at all without frequent CPU-hotplug operations, as in waiting all of 50 milliseconds from the end of the previous operation until starting the next one. Third, the TREE03.boot settings cause multi-millisecond delays during RCU grace-period initialization, which greatly increase the probability of the above sequence of events. (Don't do this in production workloads!)
Fourth, the TREE03 rcutorture scenario was modified to use four-CPU guest OSes, to have a single-rcu_node combining tree, no testing of RCU priority boosting, and no random preemption, and these modifications were necessary to reproduce this issue in a reasonable timeframe. Fifth, extremely heavy use of get_state_synchronize_rcu_full() and/or poll_state_synchronize_rcu_full() is required to reproduce this, and as of v6.12, only kfree_rcu() uses it, and even then not particularly heavily.

[boqun: Apply the fix [1]]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Tested-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Link: https://lore.kernel.org/rcu/d90bd6d9-d15c-4b9b-8a69-95336e74e8f4@paulmck-laptop/ [1]
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/rcu/rcu.h  |  2 +-
 kernel/rcu/tree.c | 11 +++++++----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index feb3ac1dc5d5..f87c9d6d36fc 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -162,7 +162,7 @@ static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s)
 {
 	unsigned long cur_s = READ_ONCE(*sp);
 
-	return ULONG_CMP_GE(cur_s, s) || ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_STATE_MASK + 1));
+	return ULONG_CMP_GE(cur_s, s) || ULONG_CMP_LT(cur_s, s - (3 * RCU_SEQ_STATE_MASK + 1));
 }
 
 /*
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index e4c0ce600b2b..f80cfdc3ee3e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1801,10 +1801,10 @@ static noinline_for_stack bool rcu_gp_init(void)
 
 	/* Advance to a new grace period and initialize state. */
 	record_gp_stall_check_time();
+	start_new_poll = rcu_sr_normal_gp_init();
 	/* Record GP times before starting GP, hence rcu_seq_start(). */
 	rcu_seq_start(&rcu_state.gp_seq);
 	ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
-	start_new_poll = rcu_sr_normal_gp_init();
 	trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start"));
 	rcu_poll_gp_seq_start(&rcu_state.gp_seq_polled_snap);
 	raw_spin_unlock_irq_rcu_node(rnp);
@@ -3357,14 +3357,17 @@ EXPORT_SYMBOL_GPL(get_state_synchronize_rcu);
  */
 void get_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp)
 {
-	struct rcu_node *rnp = rcu_get_root();
-
 	/*
 	 * Any prior manipulation of RCU-protected data must happen
 	 * before the loads from ->gp_seq and ->expedited_sequence.
 	 */
 	smp_mb(); /* ^^^ */
-	rgosp->rgos_norm = rcu_seq_snap(&rnp->gp_seq);
+
+	// Yes, rcu_state.gp_seq, not rnp_root->gp_seq, the latter's use
+	// in poll_state_synchronize_rcu_full() notwithstanding.  Use of
+	// the latter here would result in too-short grace periods due to
+	// interactions with newly onlined CPUs.
+	rgosp->rgos_norm = rcu_seq_snap(&rcu_state.gp_seq);
 	rgosp->rgos_exp = rcu_seq_snap(&rcu_state.expedited_sequence);
 }
 EXPORT_SYMBOL_GPL(get_state_synchronize_rcu_full);
On Sun, Mar 02, 2025 at 10:46:29AM -0800, Boqun Feng wrote: > On Sun, Mar 02, 2025 at 09:39:44AM -0800, Paul E. McKenney wrote: > > On Sun, Mar 02, 2025 at 11:19:44AM +0100, Uladzislau Rezki wrote: > > > On Fri, Feb 28, 2025 at 05:08:49PM -0800, Paul E. McKenney wrote: > > > > On Fri, Feb 28, 2025 at 11:59:55AM -0800, Paul E. McKenney wrote: > > > > > On Fri, Feb 28, 2025 at 08:12:51PM +0100, Uladzislau Rezki wrote: > > > > > > Hello, Paul! > > > > > > > > > > > > > > > > > > > > > > > > > > > > Except that I got this from overnight testing of rcu/dev on the shared > > > > > > > > > > > RCU tree: > > > > > > > > > > > > > > > > > > > > > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636 rcu_sr_normal_complete+0x5c/0x80 > > > > > > > > > > > > > > > > > > > > > > I see this only on TREE05. Which should not be too surprising, given > > > > > > > > > > > that this is the scenario that tests it. It happened within five minutes > > > > > > > > > > > on all 14 of the TREE05 runs. > > > > > > > > > > > > > > > > > > > > > Hm.. This is not fun. I tested this on my system and i did not manage to > > > > > > > > > > trigger this whereas you do. Something is wrong. > > > > > > > > > > > > > > > > > > If you have a debug patch, I would be happy to give it a go. > > > > > > > > > > > > > > > > > I can trigger it. But. > > > > > > > > > > > > > > > > Some background. I tested those patches during many hours on the stable > > > > > > > > kernel which is 6.13. On that kernel i was not able to trigger it. Running > > > > > > > > the rcutorture on the our shared "dev" tree, which i did now, triggers this > > > > > > > > right away. > > > > > > > > > > > > > > Bisection? (Hey, you knew that was coming!) > > > > > > > > > > > > > Looks like this: rcu: Fix get_state_synchronize_rcu_full() GP-start detection > > > > > > > > > > > > After revert in the dev, rcutorture passes TREE05, 16 instances. > > > > > > > > > > Huh. We sure don't get to revert that one... 
> > > > > > > > > > Do we have a problem with the ordering in rcu_gp_init() between the calls > > > > > to rcu_seq_start() and portions of rcu_sr_normal_gp_init()? For example, > > > > > do we need to capture the relevant portion of the list before the call > > > > > to rcu_seq_start(), and do the grace-period-start work afterwards? > > > > > > > > I tried moving the call to rcu_sr_normal_gp_init() before the call to > > > > rcu_seq_start() and got no failures in a one-hour run of 200*TREE05. > > > > Which does not necessarily mean that this is the correct fix, but I > > > > figured that it might at least provide food for thought. > > > > > > > > Thanx, Paul > > > > > > > > ------------------------------------------------------------------------ > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > index 48384fa2eaeb8..d3efeff7740e7 100644 > > > > --- a/kernel/rcu/tree.c > > > > +++ b/kernel/rcu/tree.c > > > > @@ -1819,10 +1819,10 @@ static noinline_for_stack bool rcu_gp_init(void) > > > > > > > > /* Advance to a new grace period and initialize state. */ > > > > record_gp_stall_check_time(); > > > > + start_new_poll = rcu_sr_normal_gp_init(); > > > > /* Record GP times before starting GP, hence rcu_seq_start(). */ > > > > rcu_seq_start(&rcu_state.gp_seq); > > > > ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq); > > > > - start_new_poll = rcu_sr_normal_gp_init(); > > > > trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start")); > > > > rcu_poll_gp_seq_start(&rcu_state.gp_seq_polled_snap); > > > > raw_spin_unlock_irq_rcu_node(rnp); > > > > > > > Running this 24 hours already. TREE05 * 16 scenario. I do not see any > > > warnings yet. There is a race, indeed. The gp_seq is moved forward, > > > wheres clients can still come until rcu_sr_normal_gp_init() places a > > > dummy-wait-head for this GP. > > > > > > Thank you for testing Paul and looking to this :) > > > > Very good! 
This is a bug in this commit of mine: > > > > 012f47f0f806 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection") > > > > Boqun, could you please fold this into that commit with something like > > this added to the commit log just before the paragraph starting with > > "Although this fixes 91a967fd6934"? > > > > However, simply changing get_state_synchronize_rcu_full() function > > to use rcu_state.gp_seq instead of the root rcu_node structure's > > ->gp_seq field results in a theoretical bug in kernels booted > > with rcutree.rcu_normal_wake_from_gp=1 due to the following > > sequence of events: > > > > o The rcu_gp_init() function invokes rcu_seq_start() > > to officially start a new grace period. > > > > o A new RCU reader begins, referencing X from some > > RCU-protected list. The new grace period is not > > obligated to wait for this reader. > > > > o An updater removes X, then calls synchronize_rcu(), > > which queues a wait element. > > > > o The grace period ends, awakening the updater, which > > frees X while the reader is still referencing it. > > > > The reason that this is theoretical is that although the > > grace period has officially started, none of the CPUs are > > officially aware of this, and thus will have to assume that > > the RCU reader pre-dated the start of the grace period. > > > > Except for kernels built with CONFIG_PROVE_RCU=y, which use the > > polled grace-period APIs, which can and do complain bitterly when > > this sequence of events occurs. Not only that, there might be > > some future RCU grace-period mechanism that pulls this sequence > > of events from theory into practice. This commit therefore > > also pulls the call to rcu_sr_normal_gp_init() to precede that > > to rcu_seq_start(). > > > > I will let you guys decide whether the call to rcu_sr_normal_gp_init() > > needs a comment, and, if so, what that comment should say. ;-) > > > > Please see the updated patch below (next and rcu/dev branches are > updated too). 
Works for me! > For the comment, I think we can add something like > > /* > * A new wait segment must be started before gp_seq advanced, so > * that previous gp waiters won't observe the new gp_seq. > */ > > but I will let Ulad to decide ;-) Over to you, Uladzislau! ;-) Thanx, Paul > Regards, > Boqun > > ------------------------>8 > Subject: [PATCH] rcu: Fix get_state_synchronize_rcu_full() GP-start detection > > The get_state_synchronize_rcu_full() and poll_state_synchronize_rcu_full() > functions use the root rcu_node structure's ->gp_seq field to detect > the beginnings and ends of grace periods, respectively. This choice is > necessary for the poll_state_synchronize_rcu_full() function because > (give or take counter wrap), the following sequence is guaranteed not > to trigger: > > get_state_synchronize_rcu_full(&rgos); > synchronize_rcu(); > WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&rgos)); > > The RCU callbacks that awaken synchronize_rcu() instances are > guaranteed not to be invoked before the root rcu_node structure's > ->gp_seq field is updated to indicate the end of the grace period. > However, these callbacks might start being invoked immediately > thereafter, in particular, before rcu_state.gp_seq has been updated. > Therefore, poll_state_synchronize_rcu_full() must refer to the > root rcu_node structure's ->gp_seq field. Because this field is > updated under this structure's ->lock, any code following a call to > poll_state_synchronize_rcu_full() will be fully ordered after the > full grace-period computation, as is required by RCU's memory-ordering > semantics. > > By symmetry, the get_state_synchronize_rcu_full() function should also > use this same root rcu_node structure's ->gp_seq field. But it turns out > that symmetry is profoundly (though extremely infrequently) destructive > in this case. To see this, consider the following sequence of events: > > 1. CPU 0 starts a new grace period, and updates rcu_state.gp_seq > accordingly. > > 2. 
As its first step of grace-period initialization, CPU 0 examines > the current CPU hotplug state and decides that it need not wait > for CPU 1, which is currently offline. > > 3. CPU 1 comes online, and updates its state. But this does not > affect the current grace period, but rather the one after that. > After all, CPU 1 was offline when the current grace period > started, so all pre-existing RCU readers on CPU 1 must have > completed or been preempted before it last went offline. > The current grace period therefore has nothing it needs to wait > for on CPU 1. > > 4. CPU 1 switches to an rcutorture kthread which is running > rcutorture's rcu_torture_reader() function, which starts a new > RCU reader. > > 5. CPU 2 is running rcutorture's rcu_torture_writer() function > and collects a new polled grace-period "cookie" using > get_state_synchronize_rcu_full(). Because the newly started > grace period has not completed initialization, the root rcu_node > structure's ->gp_seq field has not yet been updated to indicate > that this new grace period has already started. > > This cookie is therefore set up for the end of the current grace > period (rather than the end of the following grace period). > > 6. CPU 0 finishes grace-period initialization. > > 7. If CPU 1’s rcutorture reader is preempted, it will be added to > the ->blkd_tasks list, but because CPU 1’s ->qsmask bit is not > set in CPU 1's leaf rcu_node structure, the ->gp_tasks pointer > will not be updated. Thus, this grace period will not wait on > it. Which is only fair, given that the CPU did not come online > until after the grace period officially started. > > 8. CPUs 0 and 2 then detect the new grace period and then report > a quiescent state to the RCU core. > > 9. Because CPU 1 was offline at the start of the current grace > period, CPUs 0 and 2 are the only CPUs that this grace period > needs to wait on. So the grace period ends and post-grace-period > cleanup starts. 
In particular, the root rcu_node structure's > ->gp_seq field is updated to indicate that this grace period > has now ended. > > 10. CPU 2 continues running rcu_torture_writer() and sees that, > from the viewpoint of the root rcu_node structure consulted by > the poll_state_synchronize_rcu_full() function, the grace period > has ended. It therefore updates state accordingly. > > 11. CPU 1 is still running the same RCU reader, which notices this > update and thus complains about the too-short grace period. > > The fix is for the get_state_synchronize_rcu_full() function to use > rcu_state.gp_seq instead of the root rcu_node structure's ->gp_seq field. > With this change in place, if step 5's cookie indicates that the grace > period has not yet started, then any prior code executed by CPU 2 must > have happened before CPU 1 came online. This will in turn prevent CPU > 1's code in steps 3 and 11 from spanning CPU 2's grace-period wait, > thus preventing CPU 1 from being subjected to a too-short grace period. > > This commit therefore makes this change. Note that there is no change to > the poll_state_synchronize_rcu_full() function, which as noted above, > must continue to use the root rcu_node structure's ->gp_seq field. > This is of course an asymmetry between these two functions, but is an > asymmetry that is absolutely required for correct operation. It is a > common human tendency to greatly value symmetry, and sometimes symmetry > is a wonderful thing. Other times, symmetry results in poor performance. > But in this case, symmetry is just plain wrong. > > Nevertheless, the asymmetry does require an additional adjustment. > It is possible for get_state_synchronize_rcu_full() to see a given > grace period as having started, but for an immediately following > poll_state_synchronize_rcu_full() to see it as having not yet started. 
> Given the current rcu_seq_done_exact() implementation, this will > result in a false-positive indication that the grace period is done > from poll_state_synchronize_rcu_full(). This is dealt with by making > rcu_seq_done_exact() reach back three grace periods rather than just > two of them. > > However, simply changing get_state_synchronize_rcu_full() function to > use rcu_state.gp_seq instead of the root rcu_node structure's ->gp_seq > field results in a theoretical bug in kernels booted with > rcutree.rcu_normal_wake_from_gp=1 due to the following sequence of > events: > > o The rcu_gp_init() function invokes rcu_seq_start() to officially > start a new grace period. > > o A new RCU reader begins, referencing X from some RCU-protected > list. The new grace period is not obligated to wait for this > reader. > > o An updater removes X, then calls synchronize_rcu(), which queues > a wait element. > > o The grace period ends, awakening the updater, which frees X > while the reader is still referencing it. > > The reason that this is theoretical is that although the grace period > has officially started, none of the CPUs are officially aware of this, > and thus will have to assume that the RCU reader pre-dated the start of > the grace period. > > Except for kernels built with CONFIG_PROVE_RCU=y, which use the polled > grace-period APIs, which can and do complain bitterly when this sequence > of events occurs. Not only that, there might be some future RCU > grace-period mechanism that pulls this sequence of events from theory > into practice. This commit therefore also pulls the call to > rcu_sr_normal_gp_init() to precede that to rcu_seq_start(). > > Although this fixes 91a967fd6934 ("rcu: Add full-sized polling for > get_completed*() and poll_state*()"), it is not clear that it is > worth backporting this commit. First, it took me many weeks to > convince rcutorture to reproduce this more frequently than once > per year. 
Second, this cannot be reproduced at all without frequent > CPU-hotplug operations, as in waiting all of 50 milliseconds from the > end of the previous operation until starting the next one. Third, > the TREE03.boot settings cause multi-millisecond delays during RCU > grace-period initialization, which greatly increase the probability of > the above sequence of events. (Don't do this in production workloads!) > Fourth, the TREE03 rcutorture scenario was modified to use four-CPU > guest OSes, to have a single-rcu_node combining tree, no testing of RCU > priority boosting, and no random preemption, and these modifications > were necessary to reproduce this issue in a reasonable timeframe. > Fifth, extremely heavy use of get_state_synchronize_rcu_full() and/or > poll_state_synchronize_rcu_full() is required to reproduce this, and as of > v6.12, only kfree_rcu() uses it, and even then not particularly heavily. > > [boqun: Apply the fix [1]] > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org> > Reviewed-by: Frederic Weisbecker <frederic@kernel.org> > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> > Tested-by: Uladzislau Rezki (Sony) <urezki@gmail.com> > Link: https://lore.kernel.org/rcu/d90bd6d9-d15c-4b9b-8a69-95336e74e8f4@paulmck-laptop/ [1] > Signed-off-by: Boqun Feng <boqun.feng@gmail.com> > --- > kernel/rcu/rcu.h | 2 +- > kernel/rcu/tree.c | 11 +++++++---- > 2 files changed, 8 insertions(+), 5 deletions(-) > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > index feb3ac1dc5d5..f87c9d6d36fc 100644 > --- a/kernel/rcu/rcu.h > +++ b/kernel/rcu/rcu.h > @@ -162,7 +162,7 @@ static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) > { > unsigned long cur_s = READ_ONCE(*sp); > > - return ULONG_CMP_GE(cur_s, s) || ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_STATE_MASK + 1)); > + return ULONG_CMP_GE(cur_s, s) || ULONG_CMP_LT(cur_s, s - (3 * RCU_SEQ_STATE_MASK + 1)); > } > > /* > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index 
e4c0ce600b2b..f80cfdc3ee3e 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -1801,10 +1801,10 @@ static noinline_for_stack bool rcu_gp_init(void) > > /* Advance to a new grace period and initialize state. */ > record_gp_stall_check_time(); > + start_new_poll = rcu_sr_normal_gp_init(); > /* Record GP times before starting GP, hence rcu_seq_start(). */ > rcu_seq_start(&rcu_state.gp_seq); > ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq); > - start_new_poll = rcu_sr_normal_gp_init(); > trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start")); > rcu_poll_gp_seq_start(&rcu_state.gp_seq_polled_snap); > raw_spin_unlock_irq_rcu_node(rnp); > @@ -3357,14 +3357,17 @@ EXPORT_SYMBOL_GPL(get_state_synchronize_rcu); > */ > void get_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) > { > - struct rcu_node *rnp = rcu_get_root(); > - > /* > * Any prior manipulation of RCU-protected data must happen > * before the loads from ->gp_seq and ->expedited_sequence. > */ > smp_mb(); /* ^^^ */ > - rgosp->rgos_norm = rcu_seq_snap(&rnp->gp_seq); > + > + // Yes, rcu_state.gp_seq, not rnp_root->gp_seq, the latter's use > + // in poll_state_synchronize_rcu_full() notwithstanding. Use of > + // the latter here would result in too-short grace periods due to > + // interactions with newly onlined CPUs. > + rgosp->rgos_norm = rcu_seq_snap(&rcu_state.gp_seq); > rgosp->rgos_exp = rcu_seq_snap(&rcu_state.expedited_sequence); > } > EXPORT_SYMBOL_GPL(get_state_synchronize_rcu_full); > -- > 2.45.1 >
On Fri, Feb 28, 2025 at 05:08:49PM -0800, Paul E. McKenney wrote: > On Fri, Feb 28, 2025 at 11:59:55AM -0800, Paul E. McKenney wrote: > > On Fri, Feb 28, 2025 at 08:12:51PM +0100, Uladzislau Rezki wrote: > > > Hello, Paul! > > > > > > > > > > > > > > > > > > > Except that I got this from overnight testing of rcu/dev on the shared > > > > > > > > RCU tree: > > > > > > > > > > > > > > > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636 rcu_sr_normal_complete+0x5c/0x80 > > > > > > > > > > > > > > > > I see this only on TREE05. Which should not be too surprising, given > > > > > > > > that this is the scenario that tests it. It happened within five minutes > > > > > > > > on all 14 of the TREE05 runs. > > > > > > > > > > > > > > > Hm.. This is not fun. I tested this on my system and i did not manage to > > > > > > > trigger this whereas you do. Something is wrong. > > > > > > > > > > > > If you have a debug patch, I would be happy to give it a go. > > > > > > > > > > > I can trigger it. But. > > > > > > > > > > Some background. I tested those patches during many hours on the stable > > > > > kernel which is 6.13. On that kernel i was not able to trigger it. Running > > > > > the rcutorture on the our shared "dev" tree, which i did now, triggers this > > > > > right away. > > > > > > > > Bisection? (Hey, you knew that was coming!) > > > > > > > Looks like this: rcu: Fix get_state_synchronize_rcu_full() GP-start detection > > > > > > After revert in the dev, rcutorture passes TREE05, 16 instances. > > > > Huh. We sure don't get to revert that one... > > > > Do we have a problem with the ordering in rcu_gp_init() between the calls > > to rcu_seq_start() and portions of rcu_sr_normal_gp_init()? For example, > > do we need to capture the relevant portion of the list before the call > > to rcu_seq_start(), and do the grace-period-start work afterwards? 
> > I tried moving the call to rcu_sr_normal_gp_init() before the call to > rcu_seq_start() and got no failures in a one-hour run of 200*TREE05. > Which does not necessarily mean that this is the correct fix, but I > figured that it might at least provide food for thought. > > Thanx, Paul > > ------------------------------------------------------------------------ > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index 48384fa2eaeb8..d3efeff7740e7 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -1819,10 +1819,10 @@ static noinline_for_stack bool rcu_gp_init(void) > > /* Advance to a new grace period and initialize state. */ > record_gp_stall_check_time(); > + start_new_poll = rcu_sr_normal_gp_init(); > /* Record GP times before starting GP, hence rcu_seq_start(). */ > rcu_seq_start(&rcu_state.gp_seq); > ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq); > - start_new_poll = rcu_sr_normal_gp_init(); > trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start")); Oh... so the bug is this? Good catch... 
CPU 0                                  CPU 1

rcu_gp_init()
  rcu_seq_start(rcu_state.gp_seq)
                                       synchronize_rcu_normal()
                                         rs.head.func = (void *)
                                             get_state_synchronize_rcu();
                                         // save rcu_state.gp_seq
                                         rcu_sr_normal_add_req() ->
                                           llist_add(rcu_state.srs_next)
                                         (void) start_poll_synchronize_rcu();

  sr_normal_gp_init()
    llist_add(wait_head, &rcu_state.srs_next);
    // pick up the injected WH
    rcu_state.srs_wait_tail = wait_head;

rcu_gp_cleanup()
  rcu_seq_end(&rcu_state.gp_seq);
  sr_normal_complete()
    WARN_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) &&
              !poll_state_synchronize_rcu(oldstate),

Whereas reordering sr_normal_gp_init() prevents this:

rcu_gp_init()

  sr_normal_gp_init()
    // WH has not been injected,
    // so nothing to wait on

  rcu_seq_start(rcu_state.gp_seq)
                                       synchronize_rcu_normal()
                                         rs.head.func = (void *)
                                             get_state_synchronize_rcu();
                                         // save rcu_state.gp_seq
                                         rcu_sr_normal_add_req() ->
                                           llist_add(rcu_state.srs_next)
                                         (void) start_poll_synchronize_rcu();

rcu_gp_cleanup()
  rcu_seq_end(&rcu_state.gp_seq);
  // sr_normal_complete() won't do
  // anything, so no warning

Did I get that right?

I think this is a real bug AFAICS, hoping all the memory barriers are in
place to make sure the code reordering also correctly orders the accesses.
I'll double-check that.

I also feel it is 'theoretical', because as long as rcu_gp_init() and
rcu_gp_cleanup() are properly ordered WRT pre-existing readers, then
synchronize_rcu_normal() still waits for pre-existing readers even though
it is a bit confused about the value of the cookies.

For the fix,

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

(If possible, include a Link: to my (this) post so that the sequence of
events is further clarified.)

thanks,

- Joel
On Sun, Mar 02, 2025 at 07:15:07PM -0500, Joel Fernandes wrote: > On Fri, Feb 28, 2025 at 05:08:49PM -0800, Paul E. McKenney wrote: > > On Fri, Feb 28, 2025 at 11:59:55AM -0800, Paul E. McKenney wrote: > > > On Fri, Feb 28, 2025 at 08:12:51PM +0100, Uladzislau Rezki wrote: > > > > Hello, Paul! > > > > > > > > > > > > > > > > > > > > > > Except that I got this from overnight testing of rcu/dev on the shared > > > > > > > > > RCU tree: > > > > > > > > > > > > > > > > > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636 rcu_sr_normal_complete+0x5c/0x80 > > > > > > > > > > > > > > > > > > I see this only on TREE05. Which should not be too surprising, given > > > > > > > > > that this is the scenario that tests it. It happened within five minutes > > > > > > > > > on all 14 of the TREE05 runs. > > > > > > > > > > > > > > > > > Hm.. This is not fun. I tested this on my system and i did not manage to > > > > > > > > trigger this whereas you do. Something is wrong. > > > > > > > > > > > > > > If you have a debug patch, I would be happy to give it a go. > > > > > > > > > > > > > I can trigger it. But. > > > > > > > > > > > > Some background. I tested those patches during many hours on the stable > > > > > > kernel which is 6.13. On that kernel i was not able to trigger it. Running > > > > > > the rcutorture on the our shared "dev" tree, which i did now, triggers this > > > > > > right away. > > > > > > > > > > Bisection? (Hey, you knew that was coming!) > > > > > > > > > Looks like this: rcu: Fix get_state_synchronize_rcu_full() GP-start detection > > > > > > > > After revert in the dev, rcutorture passes TREE05, 16 instances. > > > > > > Huh. We sure don't get to revert that one... > > > > > > Do we have a problem with the ordering in rcu_gp_init() between the calls > > > to rcu_seq_start() and portions of rcu_sr_normal_gp_init()? 
For example, > > > do we need to capture the relevant portion of the list before the call > > > to rcu_seq_start(), and do the grace-period-start work afterwards? > > > > I tried moving the call to rcu_sr_normal_gp_init() before the call to > > rcu_seq_start() and got no failures in a one-hour run of 200*TREE05. > > Which does not necessarily mean that this is the correct fix, but I > > figured that it might at least provide food for thought. > > > > Thanx, Paul > > > > ------------------------------------------------------------------------ > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > index 48384fa2eaeb8..d3efeff7740e7 100644 > > --- a/kernel/rcu/tree.c > > +++ b/kernel/rcu/tree.c > > @@ -1819,10 +1819,10 @@ static noinline_for_stack bool rcu_gp_init(void) > > > > /* Advance to a new grace period and initialize state. */ > > record_gp_stall_check_time(); > > + start_new_poll = rcu_sr_normal_gp_init(); > > /* Record GP times before starting GP, hence rcu_seq_start(). */ > > rcu_seq_start(&rcu_state.gp_seq); > > ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq); > > - start_new_poll = rcu_sr_normal_gp_init(); > > trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start")); > > Oh... so the bug is this? Good catch... 
> 
> CPU 0                                  CPU 1
> 
> rcu_gp_init()
>   rcu_seq_start(rcu_state.gp_seq)
>                                        sychronize_rcu_normal()
>                                          rs.head.func = (void *)
>                                              get_state_synchronize_rcu();
>                                          // save rcu_state.gp_seq
>                                          rcu_sr_normal_add_req() ->
>                                            llist_add(rcu_state.srs_next)
>                                          (void) start_poll_synchronize_rcu();
> 
>   sr_normal_gp_init()
>     llist_add(wait_head, &rcu_state.srs_next);
>     // pick up the injected WH
>     rcu_state.srs_wait_tail = wait_head;
> 
> rcu_gp_cleanup()
>   rcu_seq_end(&rcu_state.gp_seq);
>   sr_normal_complete()
>     WARN_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) &&
>               !poll_state_synchronize_rcu(oldstate),
> 
> Where as reordering sr_normal_gp_init() prevents this:
> 
> rcu_gp_init()
> 
>   sr_normal_gp_init()
>     // WH has not been injected
>     // so nothing to wait on
> 
>   rcu_seq_start(rcu_state.gp_seq)
>                                        sychronize_rcu_normal()
>                                          rs.head.func = (void *)
>                                              get_state_synchronize_rcu();
>                                          // save rcu_state.gp_seq
>                                          rcu_sr_normal_add_req() ->
>                                            llist_add(rcu_state.srs_next)
>                                          (void) start_poll_synchronize_rcu();
> 
> rcu_gp_cleanup()
>   rcu_seq_end(&rcu_state.gp_seq);
>   // sr_normal_complete()
>   // wont do anything so
>   // no warning
> 
> Did I get that right?
> 
> I think this is a real bug AFAICS, hoping all the memory barriers are in
> place to make sure the code reordering also correctly orders the accesses.
> I'll double check that.
> 
> I also feel its 'theoretical', because as long as rcu_gp_init() and
> rcu_gp_cleanup() are properly ordered WRT pre-existing readers, then
> synchronize_rcu_normal() still waits for pre-existing readers even though its
> a bit confused about the value of the cookies.
> 
> For the fix,
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Oops, this should be:

Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>

;-)

thanks,

- Joel

> 
> (If possible, include a Link: to my (this) post so that the sequence of
> events is further clarified.)
> 
> thanks,
> 
> - Joel
> 
On Sun, Mar 02, 2025 at 12:36:51PM -0800, Paul E. McKenney wrote: > On Sun, Mar 02, 2025 at 10:46:29AM -0800, Boqun Feng wrote: > > On Sun, Mar 02, 2025 at 09:39:44AM -0800, Paul E. McKenney wrote: > > > On Sun, Mar 02, 2025 at 11:19:44AM +0100, Uladzislau Rezki wrote: > > > > On Fri, Feb 28, 2025 at 05:08:49PM -0800, Paul E. McKenney wrote: > > > > > On Fri, Feb 28, 2025 at 11:59:55AM -0800, Paul E. McKenney wrote: > > > > > > On Fri, Feb 28, 2025 at 08:12:51PM +0100, Uladzislau Rezki wrote: > > > > > > > Hello, Paul! > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Except that I got this from overnight testing of rcu/dev on the shared > > > > > > > > > > > > RCU tree: > > > > > > > > > > > > > > > > > > > > > > > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636 rcu_sr_normal_complete+0x5c/0x80 > > > > > > > > > > > > > > > > > > > > > > > > I see this only on TREE05. Which should not be too surprising, given > > > > > > > > > > > > that this is the scenario that tests it. It happened within five minutes > > > > > > > > > > > > on all 14 of the TREE05 runs. > > > > > > > > > > > > > > > > > > > > > > > Hm.. This is not fun. I tested this on my system and i did not manage to > > > > > > > > > > > trigger this whereas you do. Something is wrong. > > > > > > > > > > > > > > > > > > > > If you have a debug patch, I would be happy to give it a go. > > > > > > > > > > > > > > > > > > > I can trigger it. But. > > > > > > > > > > > > > > > > > > Some background. I tested those patches during many hours on the stable > > > > > > > > > kernel which is 6.13. On that kernel i was not able to trigger it. Running > > > > > > > > > the rcutorture on the our shared "dev" tree, which i did now, triggers this > > > > > > > > > right away. > > > > > > > > > > > > > > > > Bisection? (Hey, you knew that was coming!) 
> > > > > > > > > > > > > > > Looks like this: rcu: Fix get_state_synchronize_rcu_full() GP-start detection > > > > > > > > > > > > > > After revert in the dev, rcutorture passes TREE05, 16 instances. > > > > > > > > > > > > Huh. We sure don't get to revert that one... > > > > > > > > > > > > Do we have a problem with the ordering in rcu_gp_init() between the calls > > > > > > to rcu_seq_start() and portions of rcu_sr_normal_gp_init()? For example, > > > > > > do we need to capture the relevant portion of the list before the call > > > > > > to rcu_seq_start(), and do the grace-period-start work afterwards? > > > > > > > > > > I tried moving the call to rcu_sr_normal_gp_init() before the call to > > > > > rcu_seq_start() and got no failures in a one-hour run of 200*TREE05. > > > > > Which does not necessarily mean that this is the correct fix, but I > > > > > figured that it might at least provide food for thought. > > > > > > > > > > Thanx, Paul > > > > > > > > > > ------------------------------------------------------------------------ > > > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > > index 48384fa2eaeb8..d3efeff7740e7 100644 > > > > > --- a/kernel/rcu/tree.c > > > > > +++ b/kernel/rcu/tree.c > > > > > @@ -1819,10 +1819,10 @@ static noinline_for_stack bool rcu_gp_init(void) > > > > > > > > > > /* Advance to a new grace period and initialize state. */ > > > > > record_gp_stall_check_time(); > > > > > + start_new_poll = rcu_sr_normal_gp_init(); > > > > > /* Record GP times before starting GP, hence rcu_seq_start(). */ > > > > > rcu_seq_start(&rcu_state.gp_seq); > > > > > ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq); > > > > > - start_new_poll = rcu_sr_normal_gp_init(); > > > > > trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start")); > > > > > rcu_poll_gp_seq_start(&rcu_state.gp_seq_polled_snap); > > > > > raw_spin_unlock_irq_rcu_node(rnp); > > > > > > > > > Running this 24 hours already. TREE05 * 16 scenario. 
I do not see any > > > > warnings yet. There is a race, indeed. The gp_seq is moved forward, > > > > wheres clients can still come until rcu_sr_normal_gp_init() places a > > > > dummy-wait-head for this GP. > > > > > > > > Thank you for testing Paul and looking to this :) > > > > > > Very good! This is a bug in this commit of mine: > > > > > > 012f47f0f806 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection") > > > > > > Boqun, could you please fold this into that commit with something like > > > this added to the commit log just before the paragraph starting with > > > "Although this fixes 91a967fd6934"? > > > > > > However, simply changing get_state_synchronize_rcu_full() function > > > to use rcu_state.gp_seq instead of the root rcu_node structure's > > > ->gp_seq field results in a theoretical bug in kernels booted > > > with rcutree.rcu_normal_wake_from_gp=1 due to the following > > > sequence of events: > > > > > > o The rcu_gp_init() function invokes rcu_seq_start() > > > to officially start a new grace period. > > > > > > o A new RCU reader begins, referencing X from some > > > RCU-protected list. The new grace period is not > > > obligated to wait for this reader. > > > > > > o An updater removes X, then calls synchronize_rcu(), > > > which queues a wait element. > > > > > > o The grace period ends, awakening the updater, which > > > frees X while the reader is still referencing it. > > > > > > The reason that this is theoretical is that although the > > > grace period has officially started, none of the CPUs are > > > officially aware of this, and thus will have to assume that > > > the RCU reader pre-dated the start of the grace period. > > > > > > Except for kernels built with CONFIG_PROVE_RCU=y, which use the > > > polled grace-period APIs, which can and do complain bitterly when > > > this sequence of events occurs. 
Not only that, there might be > > > some future RCU grace-period mechanism that pulls this sequence > > > of events from theory into practice. This commit therefore > > > also pulls the call to rcu_sr_normal_gp_init() to precede that > > > to rcu_seq_start(). > > > > > > I will let you guys decide whether the call to rcu_sr_normal_gp_init() > > > needs a comment, and, if so, what that comment should say. ;-) > > > > > > > Please see the updated patch below (next and rcu/dev branches are > > updated too). > > Works for me! > > > For the comment, I think we can add something like > > > > /* > > * A new wait segment must be started before gp_seq is advanced, so > > * that previous GP waiters won't observe the new gp_seq. > > */ > > > > but I will let Ulad decide ;-) > > Over to you, Uladzislau! ;-) > Works for me! Sorry for the late answer. I had a fever, so my reply was delayed. -- Uladzislau Rezki
On Sun, Mar 02, 2025 at 07:17:10PM -0500, Joel Fernandes wrote: > On Sun, Mar 02, 2025 at 07:15:07PM -0500, Joel Fernandes wrote: > > On Fri, Feb 28, 2025 at 05:08:49PM -0800, Paul E. McKenney wrote: > > > On Fri, Feb 28, 2025 at 11:59:55AM -0800, Paul E. McKenney wrote: > > > > On Fri, Feb 28, 2025 at 08:12:51PM +0100, Uladzislau Rezki wrote: > > > > > Hello, Paul! > > > > > > > > > > > > > > > > > > > > > > > > > Except that I got this from overnight testing of rcu/dev on the shared > > > > > > > > > > RCU tree: > > > > > > > > > > > > > > > > > > > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636 rcu_sr_normal_complete+0x5c/0x80 > > > > > > > > > > > > > > > > > > > > I see this only on TREE05. Which should not be too surprising, given > > > > > > > > > > that this is the scenario that tests it. It happened within five minutes > > > > > > > > > > on all 14 of the TREE05 runs. > > > > > > > > > > > Hm.. This is not fun. I tested this on my system and I did not manage to > > > > > > > > > trigger this whereas you do. Something is wrong. > > > > > > > > If you have a debug patch, I would be happy to give it a go. > > > > > > > > > I can trigger it. But. > > > > > > > Some background. I tested those patches for many hours on the stable > > > > > > > kernel, which is 6.13. On that kernel I was not able to trigger it. Running > > > > > > > the rcutorture on our shared "dev" tree, which I did now, triggers this > > > > > > > right away. > > > > > > Bisection? (Hey, you knew that was coming!) > > > > > > Looks like this: rcu: Fix get_state_synchronize_rcu_full() GP-start detection > > > > > After revert in the dev, rcutorture passes TREE05, 16 instances. > > > > Huh. We sure don't get to revert that one... > > > > Do we have a problem with the ordering in rcu_gp_init() between the calls > > > > to rcu_seq_start() and portions of rcu_sr_normal_gp_init()? 
For example, > > > > do we need to capture the relevant portion of the list before the call > > > > to rcu_seq_start(), and do the grace-period-start work afterwards? > > > > > > I tried moving the call to rcu_sr_normal_gp_init() before the call to > > > rcu_seq_start() and got no failures in a one-hour run of 200*TREE05. > > > Which does not necessarily mean that this is the correct fix, but I > > > figured that it might at least provide food for thought. > > > > > > Thanx, Paul > > > > > > ------------------------------------------------------------------------ > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > index 48384fa2eaeb8..d3efeff7740e7 100644 > > > --- a/kernel/rcu/tree.c > > > +++ b/kernel/rcu/tree.c > > > @@ -1819,10 +1819,10 @@ static noinline_for_stack bool rcu_gp_init(void) > > > > > > /* Advance to a new grace period and initialize state. */ > > > record_gp_stall_check_time(); > > > + start_new_poll = rcu_sr_normal_gp_init(); > > > /* Record GP times before starting GP, hence rcu_seq_start(). */ > > > rcu_seq_start(&rcu_state.gp_seq); > > > ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq); > > > - start_new_poll = rcu_sr_normal_gp_init(); > > > trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start")); > > > > Oh... so the bug is this? Good catch... 
> > > > > > CPU 0 CPU 1 > > > > rcu_gp_init() > > rcu_seq_start(rcu_state.gp_seq) > > synchronize_rcu_normal() > > rs.head.func > > = (void *) get_state_synchronize_rcu(); > > // save rcu_state.gp_seq > > rcu_sr_normal_add_req() -> > > llist_add(rcu_state.srs_next) > > (void) start_poll_synchronize_rcu(); > > > > > > sr_normal_gp_init() > > llist_add(wait_head, &rcu_state.srs_next); > > // pick up the > > // injected WH > > rcu_state.srs_wait_tail = wait_head; > > > > rcu_gp_cleanup() > > rcu_seq_end(&rcu_state.gp_seq); > > sr_normal_complete() > > WARN_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) && > > !poll_state_synchronize_rcu(oldstate), > > > > Whereas reordering sr_normal_gp_init() prevents this: > > > > rcu_gp_init() > > > > sr_normal_gp_init() > > // WH has not > > // been injected > > // so nothing to > > // wait on > > > > rcu_seq_start(rcu_state.gp_seq) > > synchronize_rcu_normal() > > rs.head.func > > = (void *) get_state_synchronize_rcu(); > > // save rcu_state.gp_seq > > rcu_sr_normal_add_req() -> > > llist_add(rcu_state.srs_next) > > (void) start_poll_synchronize_rcu(); > > > > rcu_gp_cleanup() > > rcu_seq_end(&rcu_state.gp_seq); > > // sr_normal_complete() > > // won't do anything so > > // no warning > > > > Did I get that right? > > > > I think this is a real bug AFAICS, hoping all the memory barriers are in > > place to make sure the code reordering also correctly orders the accesses. > > I'll double check that. > > > > I also feel it's 'theoretical', because as long as rcu_gp_init() and > > rcu_gp_cleanup() are properly ordered WRT pre-existing readers, then > > synchronize_rcu_normal() still waits for pre-existing readers even though it's > > a bit confused about the value of the cookies. 
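The interleaving Joel sketches can be checked against the rcu_seq cookie arithmetic on its own. Below is a minimal, single-threaded Python model: the helper functions mirror the rcu_seq_start()/rcu_seq_end()/rcu_seq_snap()/rcu_seq_done() arithmetic from kernel/rcu/rcu.h, while the bare gp_seq variable and the step ordering are a simplified stand-in for the GP kthread and the late synchronize_rcu() caller, not the actual kernel code:

```python
# Low two bits of the sequence counter encode "grace period in progress",
# as in kernel/rcu/rcu.h.
RCU_SEQ_STATE_MASK = 0x3

def rcu_seq_start(s):
    return s + 1                            # mark a GP as in progress

def rcu_seq_end(s):
    return (s | RCU_SEQ_STATE_MASK) + 1     # round up to the next idle value

def rcu_seq_snap(s):
    # Cookie that only a GP starting *after* this snapshot can satisfy.
    return (s + 2 * RCU_SEQ_STATE_MASK + 1) & ~RCU_SEQ_STATE_MASK

def rcu_seq_done(s, cookie):
    return s >= cookie

# Buggy order: rcu_seq_start() runs before rcu_sr_normal_gp_init().
gp_seq = 0
gp_seq = rcu_seq_start(gp_seq)          # GP officially starts: gp_seq == 1
cookie = rcu_seq_snap(gp_seq)           # late synchronize_rcu() caller: cookie == 8
# rcu_sr_normal_gp_init() now captures that caller into the *current*
# GP's wait list, but that GP ends at...
gp_seq = rcu_seq_end(gp_seq)            # gp_seq == 4
assert not rcu_seq_done(gp_seq, cookie) # ...so the debug poll fails: the splat

# Fixed order: waiters are captured before rcu_seq_start(), so the same
# caller is only picked up by the next GP, whose end satisfies the cookie.
gp_seq = rcu_seq_start(gp_seq)          # gp_seq == 5
gp_seq = rcu_seq_end(gp_seq)            # gp_seq == 8
assert rcu_seq_done(gp_seq, cookie)
```

In the buggy ordering the waiter's completion is driven by the wait list while its cookie still points one grace period into the future, which is exactly the WARN_ONCE() condition in rcu_sr_normal_complete().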
> > > > For the fix, > > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > Oops, this should be: > > Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> I see the original patch "rcu: Fix get_state_synchronize_rcu_full() GP-start detection" is not yet on -next. Once we are convinced about the fix, do we want to squash the fix into this patch and have Boqun take it? Yet another idea is to apply it for 6.15-rc cycle if more time is needed. Alternatively, we could squash it and I queue it for 6.16 instead of 6.15. And I'm guessing Vlad's series is also for 6.16. thanks, - Joel
On Mon, Mar 03, 2025 at 12:00:40PM -0500, Joel Fernandes wrote: [...] > > I see the original patch "rcu: Fix get_state_synchronize_rcu_full() GP-start > detection" is not yet on -next. Once we are convinced about the fix, do we > want to squash the fix into this patch and have Boqun take it? > Which "-next" are you talking about? The original patch and the fix is already in next-20250303 of linux-next: https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/log/?h=next-20250303&qt=range&q=153fc45000e0058435ec0609258fb16e7ea257d2 Regards, Boqun > Yet another idea is to apply it for 6.15-rc cycle if more time is needed. > > Alternatively, we could squash it and I queue it for 6.16 instead of 6.15. > > And I'm guessing Vlad's series is also for 6.16. > > thanks, > > - Joel >
On 3/3/2025 12:07 PM, Boqun Feng wrote: > On Mon, Mar 03, 2025 at 12:00:40PM -0500, Joel Fernandes wrote: > [...] >> >> I see the original patch "rcu: Fix get_state_synchronize_rcu_full() GP-start >> detection" is not yet on -next. Once we are convinced about the fix, do we >> want to squash the fix into this patch and have Boqun take it? >> > > Which "-next" are you talking about? The original patch and the fix is > already in next-20250303 of linux-next: > > https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/log/?h=next-20250303&qt=range&q=153fc45000e0058435ec0609258fb16e7ea257d2 I see it now during manual inspection, but I'm confused why git cherry tells me otherwise. I tried the following command and it shows the patch in question in the first line of output. Basically the question that the command asks is "What is in Paul's dev branch that is not in RCU tree's -next branch". This question is asked for the obvious raisins. So I am obviously missing something in the command. Thoughts? (rcugit is the RCU tree, and paul/dev is Paul's dev branch) git cherry --abbrev -v rcugit/next paul/dev | grep "^+" | cut -d' ' -f2,3- 012f47f0f806 rcu: Fix get_state_synchronize_rcu_full() GP-start detection 2ada0addbdb6 tools/memory-model: Add atomic_and()/or()/xor() and add_negative e176ebffc3f4 tools/memory-model: Add atomic_andnot() with its variants de6f99723392 tools/memory-model: Legitimize current use of tags in LKMM macros 723177d71224 tools/memory-model: Define applicable tags on operation in tools/... 29279349a566 tools/memory-model: Define effect of Mb tags on RMWs in tools/... 
d80a8e016433 MAINTAINERS: Update Joel's email address fafa18068359 tools/memory-model: Switch to softcoded herd7 tags dcc5197839f2 tools/memory-model: Distinguish between syntactic and semantic tags fa9e35a0772a tools/memory-model/README: Fix typo a2bfbf847c96 tools/memory-model: glossary.txt: Fix indents 3839dbb05869 rcutorture: Make srcu_lockdep.sh check kernel Kconfig b5aa1c489085 rcutorture: Make srcu_lockdep.sh check reader-conflict handling 9a5720bad9ed rcu: Remove swake_up_one_online() bandaid 04159042a62b Revert "rcu/nocb: Fix rcuog wake-up from offline softirq" fdc37fed1c81 rcutorture: Split out beginning and end from rcu_torture_one_read() 3c6b1925361e rcutorture: Make torture.sh --do-rt use CONFIG_PREEMPT_RT fadc715785cc rcutorture: Add tests for SRCU up/down reader primitives 90a8f490324c rcutorture: Pull rcu_torture_updown() loop body into new function 5fbaa5179f6a rcutorture: Comment invocations of tick_dep_set_task() 461810471faa rcutorture: Complain if an ->up_read() is delayed more than 10 seconds 35e1a180319d rcutorture: Check for ->up_read() without matching ->down_read() 0676ba797dfa EXP srcu: Enable Tiny SRCU On all CONFIG_SMP=n kernels c8fff13fd2fd EXP rcutorture: Add SRCU-V scenario for preemptible Tiny SRCU 910a5f9ebf5f EXP rcutorture: Limit callback flooding for Tiny SRCU in preemptible kernels 8979a891a365 EXP hrtimers: Force migrate away hrtimers queued after CPUHP_AP_HRTIMERS_DYING
On 3/3/2025 12:30 PM, Joel Fernandes wrote: > > > On 3/3/2025 12:07 PM, Boqun Feng wrote: >> On Mon, Mar 03, 2025 at 12:00:40PM -0500, Joel Fernandes wrote: >> [...] >>> >>> I see the original patch "rcu: Fix get_state_synchronize_rcu_full() GP-start >>> detection" is not yet on -next. Once we are convinced about the fix, do we >>> want to squash the fix into this patch and have Boqun take it? >>> >> >> Which "-next" are you talking about? The original patch and the fix is >> already in next-20250303 of linux-next: >> >> https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/log/?h=next-20250303&qt=range&q=153fc45000e0058435ec0609258fb16e7ea257d2 > I see it now during manual inspection, but I'm confused why git cherry tells me > otherwise. > > I tried the following command and it shows the patch in question in the first > line of output. Basically the question that the command asks is "What is in > Paul's dev branch that is not in RCU tree's -next branch". This question is > asked for the obvious raisins. > So I am obviously missing something in the command. Thoughts? > > (rcugit is the RCU tree, and paul/dev is Paul's dev branch) > > git cherry --abbrev -v rcugit/next paul/dev | grep "^+" | cut -d' ' -f2,3- > > 012f47f0f806 rcu: Fix get_state_synchronize_rcu_full() GP-start detection Ah, it's because Paul's dev branch has an older version of the patch I think (without the fix being discussed here) :-D So that explains it... thanks, - Joel
On Mon, Mar 03, 2025 at 12:30:51PM -0500, Joel Fernandes wrote: > > > On 3/3/2025 12:07 PM, Boqun Feng wrote: > > On Mon, Mar 03, 2025 at 12:00:40PM -0500, Joel Fernandes wrote: > > [...] > >> > >> I see the original patch "rcu: Fix get_state_synchronize_rcu_full() GP-start > >> detection" is not yet on -next. Once we are convinced about the fix, do we > >> want to squash the fix into this patch and have Boqun take it? > >> > > > > Which "-next" are you talking about? The original patch and the fix is > > already in next-20250303 of linux-next: > > > > https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/log/?h=next-20250303&qt=range&q=153fc45000e0058435ec0609258fb16e7ea257d2 > I see it now during manual inspection, but I'm confused why git cherry tells me > otherwise. The gitk command for the win! ;-) Or, in non-GUI environments, tig. > I tried the following command and it shows the patch in question in the first > line of output. Basically the question that the command asks is "What is in > Paul's dev branch that is not in RCU tree's -next branch". This question is > asked for the obvious raisins. > So I am obviously missing something in the command. Thoughts? > > (rcugit is the RCU tree, and paul/dev is Paul's dev branch) > > git cherry --abbrev -v rcugit/next paul/dev | grep "^+" | cut -d' ' -f2,3- I must defer to others on this one. I must confess that I have not yet found a good use case for "git cherry". Thanx, Paul > 012f47f0f806 rcu: Fix get_state_synchronize_rcu_full() GP-start detection > 2ada0addbdb6 tools/memory-model: Add atomic_and()/or()/xor() and add_negative > e176ebffc3f4 tools/memory-model: Add atomic_andnot() with its variants > de6f99723392 tools/memory-model: Legitimize current use of tags in LKMM macros > 723177d71224 tools/memory-model: Define applicable tags on operation in tools/... > 29279349a566 tools/memory-model: Define effect of Mb tags on RMWs in tools/... 
> d80a8e016433 MAINTAINERS: Update Joel's email address > fafa18068359 tools/memory-model: Switch to softcoded herd7 tags > dcc5197839f2 tools/memory-model: Distinguish between syntactic and semantic tags > fa9e35a0772a tools/memory-model/README: Fix typo > a2bfbf847c96 tools/memory-model: glossary.txt: Fix indents > 3839dbb05869 rcutorture: Make srcu_lockdep.sh check kernel Kconfig > b5aa1c489085 rcutorture: Make srcu_lockdep.sh check reader-conflict handling > 9a5720bad9ed rcu: Remove swake_up_one_online() bandaid > 04159042a62b Revert "rcu/nocb: Fix rcuog wake-up from offline softirq" > fdc37fed1c81 rcutorture: Split out beginning and end from rcu_torture_one_read() > 3c6b1925361e rcutorture: Make torture.sh --do-rt use CONFIG_PREEMPT_RT > fadc715785cc rcutorture: Add tests for SRCU up/down reader primitives > 90a8f490324c rcutorture: Pull rcu_torture_updown() loop body into new function > 5fbaa5179f6a rcutorture: Comment invocations of tick_dep_set_task() > 461810471faa rcutorture: Complain if an ->up_read() is delayed more than 10 seconds > 35e1a180319d rcutorture: Check for ->up_read() without matching ->down_read() > 0676ba797dfa EXP srcu: Enable Tiny SRCU On all CONFIG_SMP=n kernels > c8fff13fd2fd EXP rcutorture: Add SRCU-V scenario for preemptible Tiny SRCU > 910a5f9ebf5f EXP rcutorture: Limit callback flooding for Tiny SRCU in > preemptible kernels > 8979a891a365 EXP hrtimers: Force migrate away hrtimers queued after > CPUHP_AP_HRTIMERS_DYING > >
On 3/3/2025 1:55 PM, Paul E. McKenney wrote: >> I tried the following command and it shows the patch in question in the first >> line of output. Basically the question that the command asks is "What is in >> Paul's dev branch that is not in RCU tree's -next branch". This question is >> asked for the obvious raisins. >> So I am obviously missing something in the command. Thoughts? >> >> (rcugit is the RCU tree, and paul/dev is Paul's dev branch) >> >> git cherry --abbrev -v rcugit/next paul/dev | grep "^+" | cut -d' ' -f2,3- > I must defer to others on this one. I must confess that I have not yet > found a good use case for "git cherry". In this case git cherry was seemingly correct because it detected a difference between the same patch in your dev branch versus the next branch [1]. It seems the next branch has the fix with the reordering of the statements. So it thought that your branch had a patch that wasn't there in next, where in reality the one in next was "newer". [1] https://web.git.kernel.org/pub/scm/linux/kernel/git/rcu/linux.git/commit/?h=misc.2025.03.02a&id=153fc45000e0058435ec0609258fb16e7ea257d2 thanks, - Joel
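The behavior Joel describes is easy to reproduce in a throwaway repository. Every name below (the `next` branch, `fix.c`, the commit subject) is a hypothetical stand-in, not taken from the actual trees: git cherry matches commits by patch-id rather than by subject, so a branch carrying an amended copy of a commit still reports the original as not upstream:

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email demo@example.com
git config user.name demo
git commit -q --allow-empty -m base
git branch next                                  # stand-in for rcugit/next
echo "old version" > fix.c                       # stand-in for paul/dev's copy
git add fix.c
git commit -q -m "rcu: Fix GP-start detection"
git checkout -q next
echo "new version, follow-up folded in" > fix.c  # "newer" copy of the same patch
git add fix.c
git commit -q -m "rcu: Fix GP-start detection"
git checkout -q -                                # back to the dev stand-in
# Identical subjects, different patch-ids: git cherry flags the commit
# as "+" (not upstream) even though next carries a newer version of it.
git cherry next
```

gitk, tig, or `git log --cherry-mark` make the same patch-id comparison visible interactively.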
diff --git a/include/linux/rcupdate_wait.h b/include/linux/rcupdate_wait.h index f9bed3d3f78d..4c92d4291cce 100644 --- a/include/linux/rcupdate_wait.h +++ b/include/linux/rcupdate_wait.h @@ -16,6 +16,9 @@ struct rcu_synchronize { struct rcu_head head; struct completion completion; + + /* This is for debugging. */ + struct rcu_gp_oldstate oldstate; }; void wakeme_after_rcu(struct rcu_head *head); diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 8625f616c65a..48384fa2eaeb 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -1632,12 +1632,10 @@ static void rcu_sr_normal_complete(struct llist_node *node) { struct rcu_synchronize *rs = container_of( (struct rcu_head *) node, struct rcu_synchronize, head); - unsigned long oldstate = (unsigned long) rs->head.func; WARN_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) && - !poll_state_synchronize_rcu(oldstate), - "A full grace period is not passed yet: %lu", - rcu_seq_diff(get_state_synchronize_rcu(), oldstate)); + !poll_state_synchronize_rcu_full(&rs->oldstate), + "A full grace period is not passed yet!\n"); /* Finally. */ complete(&rs->completion); @@ -3247,7 +3245,7 @@ static void synchronize_rcu_normal(void) * snapshot before adding a request. */ if (IS_ENABLED(CONFIG_PROVE_RCU)) - rs.head.func = (void *) get_state_synchronize_rcu(); + get_state_synchronize_rcu_full(&rs.oldstate); rcu_sr_normal_add_req(&rs);
Switch to using the get_state_synchronize_rcu_full() and poll_state_synchronize_rcu_full() pair to debug a normal synchronize_rcu() call. Just using the non-full APIs to identify whether a grace period has passed or not might lead to a false-positive kernel splat. It can happen because get_state_synchronize_rcu() compresses both normal and expedited states into one single unsigned long value, so poll_state_synchronize_rcu() can miss GP-completion when synchronize_rcu() and synchronize_rcu_expedited() run concurrently. To address this, switch to the poll_state_synchronize_rcu_full() and get_state_synchronize_rcu_full() APIs, which use separate variables for expedited and normal states. Link: https://lore.kernel.org/lkml/Z5ikQeVmVdsWQrdD@pc636/T/ Fixes: 988f569ae041 ("rcu: Reduce synchronize_rcu() latency") Reported-by: cheung wall <zzqq0103.hey@gmail.com> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> --- include/linux/rcupdate_wait.h | 3 +++ kernel/rcu/tree.c | 8 +++----- 2 files changed, 6 insertions(+), 5 deletions(-)
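As a loose illustration of the compression the commit message describes (the arithmetic mirrors the rcu_seq helpers in kernel/rcu/rcu.h, but the interleaving is a simplified hypothetical, not the exact kernel sequence): a cookie snapped from a single combined counter while some grace period is in flight demands a completion that finishing that one grace period cannot deliver, whereas a cookie taken against a dedicated normal-GP counter, as the _full APIs keep, is satisfied by a single normal grace period regardless of expedited activity:

```python
RCU_SEQ_STATE_MASK = 0x3   # low bits: "GP in progress", as in kernel/rcu/rcu.h

def seq_start(s):
    return s + 1

def seq_end(s):
    return (s | RCU_SEQ_STATE_MASK) + 1

def seq_snap(s):
    return (s + 2 * RCU_SEQ_STATE_MASK + 1) & ~RCU_SEQ_STATE_MASK

def seq_done(s, cookie):
    return s >= cookie

# One combined counter (non-full API): a GP is already in flight when the
# normal caller snapshots its cookie, so the snap points past what that
# in-flight GP's completion can provide.
combined = seq_start(0)                 # some GP in progress: 1
cookie = seq_snap(combined)             # normal caller's cookie: 8
combined = seq_end(combined)            # in-flight GP completes: 4
assert not seq_done(combined, cookie)   # debug poll still says "not done"

# Separate counters (the _full APIs): the normal counter is idle at
# snapshot time, and one full normal GP satisfies the cookie no matter
# what the expedited counter is doing.
norm = 0
cookie_norm = seq_snap(norm)            # 4
norm = seq_end(seq_start(norm))         # one full normal GP: 4
assert seq_done(norm, cookie_norm)
```

This is only the arithmetic half of the story; the real _full APIs also carry ordering guarantees that a toy model cannot capture.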