Message ID | 20240604222355.2370768-3-paulmck@kernel.org (mailing list archive)
State      | New, archived
Series     | Miscellaneous fixes for v6.11
On Tue, Jun 04, 2024 at 03:23:49PM -0700, Paul E. McKenney wrote:
> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
>
> In the synchronize_rcu() common case, we will have less than
> SR_MAX_USERS_WAKE_FROM_GP number of users per GP. Waking up the kworker
> is pointless just to free the last injected wait head since at that point,
> all the users have already been awakened.
>
> Introduce a new counter to track this and prevent the wakeup in the
> common case.
>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> ---
>  kernel/rcu/tree.c | 35 ++++++++++++++++++++++++++++++-----
>  kernel/rcu/tree.h |  1 +
>  2 files changed, 31 insertions(+), 5 deletions(-)

[ . . . ]

> @@ -1633,8 +1634,11 @@ static void rcu_sr_normal_gp_cleanup_work(struct work_struct *work)
>  	 * the done tail list manipulations are protected here.
>  	 */
>  	done = smp_load_acquire(&rcu_state.srs_done_tail);
> -	if (!done)
> +	if (!done) {
> +		/* See comments below. */
> +		atomic_dec_return_release(&rcu_state.srs_cleanups_pending);

This condition is not supposed to happen. If the work is scheduled,
there has to be a wait_queue in rcu_state.srs_done_tail. And decrementing
may make things worse.

So this should be:

	if (WARN_ON_ONCE(!done))
		return;

Thanks.
On Wed, Jun 05, 2024 at 06:35:08PM +0200, Frederic Weisbecker wrote:
> On Tue, Jun 04, 2024 at 03:23:49PM -0700, Paul E. McKenney wrote:
> > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

[ . . . ]

> >  	done = smp_load_acquire(&rcu_state.srs_done_tail);
> > -	if (!done)
> > +	if (!done) {
> > +		/* See comments below. */
> > +		atomic_dec_return_release(&rcu_state.srs_cleanups_pending);
>
> This condition is not supposed to happen. If the work is scheduled,
> there has to be a wait_queue in rcu_state.srs_done_tail. And decrementing
> may make things worse.
>
> So this should be:
>
> 	if (WARN_ON_ONCE(!done))
> 		return;

Or just this:

	WARN_ON_ONCE(!smp_load_acquire(&rcu_state.srs_done_tail));

Uladzislau, thoughts?  Is there some corner case where we really can see
smp_load_acquire() hand back a NULL pointer, perhaps during boot?

						Thanx, Paul
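For readers outside the kernel tree: WARN_ON_ONCE() evaluates its
condition, emits a one-time warning if the condition is true, and yields
the condition's value, which is why Frederic's form can gate an early
return while Paul's form only records the anomaly. A rough userspace
mimic, as a sketch assuming GCC statement expressions; warn_on_once()
below is a hypothetical stand-in, not the kernel macro itself:

	#include <stdbool.h>
	#include <stdio.h>

	/* Warn at most once per call site, then hand the condition back. */
	#define warn_on_once(cond) ({					\
		static bool __warned;					\
		bool __c = (cond);					\
		if (__c && !__warned) {					\
			__warned = true;				\
			fprintf(stderr, "warning: %s at %s:%d\n",	\
				#cond, __FILE__, __LINE__);		\
		}							\
		__c;							\
	})

With this shape, "if (warn_on_once(!done)) return;" both records the
supposedly impossible state and bails out without touching the counter.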
On Wed, Jun 5, 2024 at 10:05 PM Frederic Weisbecker <frederic@kernel.org> wrote:
> On Tue, Jun 04, 2024 at 03:23:49PM -0700, Paul E. McKenney wrote:

[ . . . ]

> >  	done = smp_load_acquire(&rcu_state.srs_done_tail);
> > -	if (!done)
> > +	if (!done) {
> > +		/* See comments below. */
> > +		atomic_dec_return_release(&rcu_state.srs_cleanups_pending);
>
> This condition is not supposed to happen. If the work is scheduled,
> there has to be a wait_queue in rcu_state.srs_done_tail. And decrementing
> may make things worse.

I also don't see a scenario where this can happen. However, if we are
returning from here, given that for every queued work we do an increment
of rcu_state.srs_cleanups_pending, I think it's safer to decrement in
this case, as that counter tracks only the work queuing and execution
counts.

	atomic_inc(&rcu_state.srs_cleanups_pending);
	if (!queue_work(sync_wq, &rcu_state.srs_cleanup_work))
		atomic_dec(&rcu_state.srs_cleanups_pending);

Thanks
Neeraj

> So this should be:
>
> 	if (WARN_ON_ONCE(!done))
> 		return;
>
> Thanks.
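Neeraj's snippet is the queue-side half of an owed-work counter: bump
before queuing, roll back if queue_work() reports that the work was
already pending. A minimal self-contained model of that pattern in C11
atomics; my_queue_work() and work_is_queued are hypothetical toy
stand-ins for queue_work() and the workqueue's pending state:

	#include <stdatomic.h>
	#include <stdbool.h>

	static atomic_int cleanups_pending;
	static bool work_is_queued;

	/* Toy stand-in for queue_work(): refuse if already pending. */
	static bool my_queue_work(void)
	{
		if (work_is_queued)
			return false;
		work_is_queued = true;
		return true;
	}

	/*
	 * Count the execution we expect up front, then uncount it if
	 * no new execution was actually queued. The counter therefore
	 * equals the number of worker runs still owed.
	 */
	static void schedule_cleanup(void)
	{
		atomic_fetch_add(&cleanups_pending, 1);
		if (!my_queue_work())
			atomic_fetch_sub(&cleanups_pending, 1);
	}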
On Thu, Jun 06, 2024 at 11:28:07AM +0530, Neeraj upadhyay wrote:
> On Wed, Jun 5, 2024 at 10:05 PM Frederic Weisbecker <frederic@kernel.org> wrote:

[ . . . ]

> > This condition is not supposed to happen. If the work is scheduled,
> > there has to be a wait_queue in rcu_state.srs_done_tail. And decrementing
> > may make things worse.
>
> I also don't see a scenario where this can happen. However, if we are
> returning from here, given that for every queued work we do an increment
> of rcu_state.srs_cleanups_pending, I think it's safer to decrement in
> this case, as that counter tracks only the work queuing and execution
> counts.
>
> 	atomic_inc(&rcu_state.srs_cleanups_pending);
> 	if (!queue_work(sync_wq, &rcu_state.srs_cleanup_work))
> 		atomic_dec(&rcu_state.srs_cleanups_pending);

Linus Torvalds's general rule is that if you cannot imagine how a bug
can happen, don't attempt to clean up after it. His rationale (which
is *almost* always a good one) is that not knowing how the bug happens
means that attempts to clean up will usually just make matters worse.
And all too often, the clean-up code makes debugging more difficult.

One example exception to this rule is when debug-objects detects a
duplicate call_rcu(). In that case, we ignore that second call_rcu().
But the reason is that experience has shown that the usual cause really
is someone doing a duplicate call_rcu(), and also that ignoring the
second call_rcu() makes debugging easier.

So what is it that Frederic and I are missing here?

						Thanx, Paul

> Thanks
> Neeraj
>
> > So this should be:
> >
> > 	if (WARN_ON_ONCE(!done))
> > 		return;
On Thu, Jun 6, 2024 at 11:42 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> On Thu, Jun 06, 2024 at 11:28:07AM +0530, Neeraj upadhyay wrote:

[ . . . ]

> > 	atomic_inc(&rcu_state.srs_cleanups_pending);
> > 	if (!queue_work(sync_wq, &rcu_state.srs_cleanup_work))
> > 		atomic_dec(&rcu_state.srs_cleanups_pending);
>
> Linus Torvalds's general rule is that if you cannot imagine how a bug
> can happen, don't attempt to clean up after it. His rationale (which
> is *almost* always a good one) is that not knowing how the bug happens
> means that attempts to clean up will usually just make matters worse.
> And all too often, the clean-up code makes debugging more difficult.

Ok. Thanks for sharing this info!

> One example exception to this rule is when debug-objects detects a
> duplicate call_rcu(). In that case, we ignore that second call_rcu().
> But the reason is that experience has shown that the usual cause really
> is someone doing a duplicate call_rcu(), and also that ignoring the
> second call_rcu() makes debugging easier.
>
> So what is it that Frederic and I are missing here?

Maybe nothing. As kworker context does not modify srs_done_tail, and
invalid values of srs_done_tail can only be caused by the GP kthread's
manipulations of srs_done_tail, my thought here was that we could keep
the pending rcu_sr_normal_gp_cleanup_work count consistent with the
number of queue_work() calls and kworker executions, even when we see
unexpected srs_done_tail values like these. However, as you described,
the general rule is to not attempt any cleanup for such scenarios.

Thanks
Neeraj
On Fri, Jun 07, 2024 at 07:21:55AM +0530, Neeraj upadhyay wrote:
> On Thu, Jun 6, 2024 at 11:42 PM Paul E. McKenney <paulmck@kernel.org> wrote:

[ . . . ]

> > So what is it that Frederic and I are missing here?
>
> Maybe nothing. As kworker context does not modify srs_done_tail, and
> invalid values of srs_done_tail can only be caused by the GP kthread's
> manipulations of srs_done_tail, my thought here was that we could keep
> the pending rcu_sr_normal_gp_cleanup_work count consistent with the
> number of queue_work() calls and kworker executions, even when we see
> unexpected srs_done_tail values like these. However, as you described,
> the general rule is to not attempt any cleanup for such scenarios.

So "if (WARN_ON_ONCE(!done) return;"?

Or is there a better way?

						Thanx, Paul
On Mon, Jun 10, 2024 at 8:42 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> On Fri, Jun 07, 2024 at 07:21:55AM +0530, Neeraj upadhyay wrote:

[ . . . ]

> > Maybe nothing. As kworker context does not modify srs_done_tail, and
> > invalid values of srs_done_tail can only be caused by the GP kthread's
> > manipulations of srs_done_tail, my thought here was that we could keep
> > the pending rcu_sr_normal_gp_cleanup_work count consistent with the
> > number of queue_work() calls and kworker executions, even when we see
> > unexpected srs_done_tail values like these. However, as you described,
> > the general rule is to not attempt any cleanup for such scenarios.
>
> So "if (WARN_ON_ONCE(!done) return;"?

Looks good. Nit: one more closing parenthesis is needed:
"if (WARN_ON_ONCE(!done)) return;"

> Or is there a better way?

I think the above check suffices.

Thanks
Neeraj
On Tue, Jun 11, 2024 at 07:16:37PM +0530, Neeraj upadhyay wrote:
> On Mon, Jun 10, 2024 at 8:42 PM Paul E. McKenney <paulmck@kernel.org> wrote:

[ . . . ]

> > So "if (WARN_ON_ONCE(!done) return;"?
>
> Looks good. Nit: one more closing parenthesis is needed:
> "if (WARN_ON_ONCE(!done)) return;"
>
> > Or is there a better way?
>
> I think the above check suffices.

Very well, I will make this change on the next rebase.

						Thanx, Paul
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 6ba36d9c09bde..2fe08e6186b4d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -96,6 +96,7 @@ static struct rcu_state rcu_state = {
 	.ofl_lock = __ARCH_SPIN_LOCK_UNLOCKED,
 	.srs_cleanup_work = __WORK_INITIALIZER(rcu_state.srs_cleanup_work,
 		rcu_sr_normal_gp_cleanup_work),
+	.srs_cleanups_pending = ATOMIC_INIT(0),
 };
 
 /* Dump rcu_node combining tree at boot to verify correct setup. */
@@ -1633,8 +1634,11 @@ static void rcu_sr_normal_gp_cleanup_work(struct work_struct *work)
 	 * the done tail list manipulations are protected here.
 	 */
 	done = smp_load_acquire(&rcu_state.srs_done_tail);
-	if (!done)
+	if (!done) {
+		/* See comments below. */
+		atomic_dec_return_release(&rcu_state.srs_cleanups_pending);
 		return;
+	}
 
 	head = done->next;
 	done->next = NULL;
@@ -1656,6 +1660,9 @@ static void rcu_sr_normal_gp_cleanup_work(struct work_struct *work)
 
 		rcu_sr_put_wait_head(rcu);
 	}
+
+	/* Order list manipulations with atomic access. */
+	atomic_dec_return_release(&rcu_state.srs_cleanups_pending);
 }
 
 /*
@@ -1663,7 +1670,7 @@ static void rcu_sr_normal_gp_cleanup_work(struct work_struct *work)
  */
 static void rcu_sr_normal_gp_cleanup(void)
 {
-	struct llist_node *wait_tail, *next, *rcu;
+	struct llist_node *wait_tail, *next = NULL, *rcu = NULL;
 	int done = 0;
 
 	wait_tail = rcu_state.srs_wait_tail;
@@ -1697,16 +1704,34 @@ static void rcu_sr_normal_gp_cleanup(void)
 			break;
 	}
 
-	// concurrent sr_normal_gp_cleanup work might observe this update.
-	smp_store_release(&rcu_state.srs_done_tail, wait_tail);
+	/*
+	 * Fast path, no more users to process except putting the second last
+	 * wait head if no inflight-workers. If there are in-flight workers,
+	 * they will remove the last wait head.
+	 *
+	 * Note that the ACQUIRE orders atomic access with list manipulation.
+	 */
+	if (wait_tail->next && wait_tail->next->next == NULL &&
+	    rcu_sr_is_wait_head(wait_tail->next) &&
+	    !atomic_read_acquire(&rcu_state.srs_cleanups_pending)) {
+		rcu_sr_put_wait_head(wait_tail->next);
+		wait_tail->next = NULL;
+	}
+
+	/* Concurrent sr_normal_gp_cleanup work might observe this update. */
 	ASSERT_EXCLUSIVE_WRITER(rcu_state.srs_done_tail);
+	smp_store_release(&rcu_state.srs_done_tail, wait_tail);
 
 	/*
 	 * We schedule a work in order to perform a final processing
 	 * of outstanding users(if still left) and releasing wait-heads
 	 * added by rcu_sr_normal_gp_init() call.
 	 */
-	queue_work(sync_wq, &rcu_state.srs_cleanup_work);
+	if (wait_tail->next) {
+		atomic_inc(&rcu_state.srs_cleanups_pending);
+		if (!queue_work(sync_wq, &rcu_state.srs_cleanup_work))
+			atomic_dec(&rcu_state.srs_cleanups_pending);
+	}
 }
 
 /*
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index bae7925c497fe..affcb92a358c3 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -420,6 +420,7 @@ struct rcu_state {
 	struct llist_node *srs_done_tail; /* ready for GP users. */
 	struct sr_wait_node srs_wait_nodes[SR_NORMAL_GP_WAIT_HEAD_MAX];
 	struct work_struct srs_cleanup_work;
+	atomic_t srs_cleanups_pending; /* srs inflight worker cleanups. */
 };
 
 /* Values for rcu_state structure's gp_flags field. */
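The ordering the patch's fast path relies on can be seen in isolation:
the GP-side code may free the lone trailing wait head itself only when
an acquire read of the pending counter returns zero, pairing with the
worker's release decrement so that the worker's list manipulations are
guaranteed visible first. A stand-alone C11 model as a sketch; struct
node, is_wait_head(), and put_wait_head() are hypothetical stand-ins
for llist_node, rcu_sr_is_wait_head(), and rcu_sr_put_wait_head():

	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stddef.h>
	#include <stdlib.h>

	struct node { struct node *next; };

	static atomic_int cleanups_pending;

	static bool is_wait_head(struct node *n) { (void)n; return true; }
	static void put_wait_head(struct node *n) { free(n); }

	/* Worker side: release-decrement after finishing the list work. */
	static void cleanup_worker_done(void)
	{
		atomic_fetch_sub_explicit(&cleanups_pending, 1,
					  memory_order_release);
	}

	/* GP side: free the single trailing wait head only if no worker runs. */
	static void maybe_put_last_wait_head(struct node *wait_tail)
	{
		if (wait_tail->next && wait_tail->next->next == NULL &&
		    is_wait_head(wait_tail->next) &&
		    atomic_load_explicit(&cleanups_pending,
					 memory_order_acquire) == 0) {
			put_wait_head(wait_tail->next);
			wait_tail->next = NULL;
		}
	}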