
srcu: Add notes for not-effective mutex

Message ID 20221214142446.56543-1-kernelfans@gmail.com (mailing list archive)
State New, archived
Series srcu: Add notes for not-effective mutex

Commit Message

Pingfan Liu Dec. 14, 2022, 2:24 p.m. UTC
By the design and implementation of the workqueue, the same work item
can never run re-entrantly.

SRCU has only a single work item, ssp->work, so concurrency can never
arise in its state machine. Hence the mutexes serializing the state
machine are unnecessary. On the other hand, using two work items to
exploit the window from the release of srcu_gp_mutex to the end of
srcu_gp_end() to improve parallelism has limited effect. [1]

To spare any future attempt at improving parallelism with two
work_structs, put some notes around srcu_gp_mutex and srcu_cb_mutex.
Keep the mutexes instead of removing them, in case the state machine is
some day moved out of the workqueue's non-reentrancy protection.

Some test data showing how a dual work_struct setup affects SRCU
performance:

-1. on 144 cpus hpe-dl560gen10
  modprobe rcutorture   torture_type=srcud  fwd_progress=144 fwd_progress_holdoff=1 shutdown_secs=36000 stat_interval=60 verbose=1
  # baseline
  [37587.632155] srcud:  End-test grace-period state: g28287244 f0x0 total-gps=28287244
  # dual work_struct
  [36443.468017] srcud:  End-test grace-period state: g29026056 f0x0 total-gps=29026056

-2. on 256 cpus amd-milan
  modprobe rcutorture   torture_type=srcud  fwd_progress=256 fwd_progress_holdoff=1 shutdown_secs=36000 stat_interval=60 verbose=1
  # baseline
  [36093.605732] srcud:  End-test grace-period state: g10850284 f0x0 total-gps=10850284
  # dual work_struct
  [36093.856713] srcud:  End-test grace-period state: g10672632 f0x0 total-gps=10672632

The first test shows an improvement of about 2.6%, while the second
shows no significant effect.

[1]: https://lore.kernel.org/rcu/Y5nC2yBhaUYirByo@piliu.users.ipa.redhat.com/T/#maf99cf1d92077a1b0b8e5c411c79fa4444576624

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Zhang, Qiang1" <qiang1.zhang@intel.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
To: rcu@vger.kernel.org
---
 include/linux/srcutree.h | 4 ++++
 kernel/rcu/srcutree.c    | 5 ++++-
 2 files changed, 8 insertions(+), 1 deletion(-)

Comments

Paul E. McKenney Dec. 18, 2022, 5:53 p.m. UTC | #1
On Wed, Dec 14, 2022 at 10:24:46PM +0800, Pingfan Liu wrote:
> By the design and implementation of the workqueue, the same work item
> can never run re-entrantly.
>
> SRCU has only a single work item, ssp->work, so concurrency can never
> arise in its state machine. Hence the mutexes serializing the state
> machine are unnecessary. On the other hand, using two work items to
> exploit the window from the release of srcu_gp_mutex to the end of
> srcu_gp_end() to improve parallelism has limited effect. [1]
>
> To spare any future attempt at improving parallelism with two
> work_structs, put some notes around srcu_gp_mutex and srcu_cb_mutex.
> Keep the mutexes instead of removing them, in case the state machine is
> some day moved out of the workqueue's non-reentrancy protection.

Thank you for continuing to push on this!

But let's step back and take a wider view, plus summarize a bit.

First, as we have discussed, it is necessary to make RCU work not just
right now, but also to keep it from becoming too fragile.  This notion of
"fragile" isn't yet something that can be reasonably measured, but one
could unreasonably measure it by either comparing the number of words
of English required to explain why it works on the one hand and making
lots of random changes and seeing what fraction of those changes break
something on the other.  These unreasonable measures are, just like the
name says, unreasonable.  But I hope that they give an intuitive notion
of what the goal must be.  Again, I don't know of a reasonable scientific
(let alone mathematical) way of expressing it.

As you noted, the current state is unsatisfying.  There are two different
mechanisms providing mutual exclusion.  This code is nowhere near a
fastpath, so the added overhead is absolutely *not* a concern (not yet,
that we know of, anyway), but there are still risks, for example:

1.	Latent bugs could be introduced that suddenly make their presence
	known should the code ever be relying solely on the mutexes for
	mutual exclusion.

2.	People imagine other purposes for either the single workqueue
	or the mutexes, and inject bugs based on the fact that their
	imagination of the design has diverged from the actual design.

These risks are not huge, especially compared to other risks in the
code base.  For but one example, the code that provides the memory-order
guarantees for synchronize_rcu() is intricate and spread widely over
the code base, as it must be in order to make important fastpaths fast,
a benefit important enough to make taking that risk worthwhile.

So what are the options here?

1.	Leave it as it is, with the non-huge risks as called out above.

2.	Add comments, as in this commit.  One risk here is that someone
	adds the concurrency, but fails to remove the comments, which
	are now obsolete and confusing.

3.	Add words describing the current double-provided mutual exclusion
	to one of the RCU design documents.

4.	Leave mainline and -rcu as they are, but add your
	multiple-workqueue patch to one of the RCU design documents
	covering SRCU so that it doesn't get lost should it ever become
	necessary to speed up SRCU grace periods.

5.	Leave mainline as it is, but add your multiple-workqueue patch to
	-rcu with a not-for-mainline branch marking its presence so that
	this change can be easily applied later, if needed.  (For another
	example of this, see the sysidle.2017.05.11a branch, which was
	actually in mainline for some time and then later removed because
	it was not being used.)

6.	Make the workqueue supply concurrency, as in your earlier patch.
	As noted in the email threads, this adds risk for no known
	user-visible benefit.

7.	Remove the mutexes, and rely on the workqueue for the needed
	mutual exclusion.  This is unlikely to make the code more
	readable, and makes it more difficult to add concurrency if it
	is ever needed.

8.	Make the workqueue supply concurrency, but only when a given
	kernel-boot parameter or Kconfig option is supplied.  Enable
	this concurrency only during selected rcutorture tests.
	This has the advantage of actually testing the placement and
	usage of the mutexes, but on the other hand it increases test
	effort for no known end-user-visible benefit.

Are there other options?

							Thanx, Paul

Pingfan Liu Dec. 23, 2022, 12:58 p.m. UTC | #2
On Sun, Dec 18, 2022 at 09:53:13AM -0800, Paul E. McKenney wrote:
> On Wed, Dec 14, 2022 at 10:24:46PM +0800, Pingfan Liu wrote:
> > By the design and implementation of the workqueue, the same work item
> > can never run re-entrantly.
> >
> > SRCU has only a single work item, ssp->work, so concurrency can never
> > arise in its state machine. Hence the mutexes serializing the state
> > machine are unnecessary. On the other hand, using two work items to
> > exploit the window from the release of srcu_gp_mutex to the end of
> > srcu_gp_end() to improve parallelism has limited effect. [1]
> >
> > To spare any future attempt at improving parallelism with two
> > work_structs, put some notes around srcu_gp_mutex and srcu_cb_mutex.
> > Keep the mutexes instead of removing them, in case the state machine is
> > some day moved out of the workqueue's non-reentrancy protection.
> 
> Thank you for continuing to push on this!
> 
> But let's step back and take a wider view, plus summarize a bit.
> 
> First, as we have discussed, it is necessary to make RCU work not just
> right now, but also to keep it from becoming too fragile.  This notion of
> "fragile" isn't yet something that can be reasonably measured, but one
> could unreasonably measure it by either comparing the number of words
> of English required to explain why it works on the one hand and making
> lots of random changes and seeing what fraction of those changes break
> something on the other.  These unreasonable measures are, just like the
> name says, unreasonable.  But I hope that they give an intuitive notion
> of what the goal must be.  Again, I don't know of a reasonable scientific
> (let alone mathematical) way of expressing it.
> 

Yes, I get the point of srcu_gp_mutex, which keeps the code from
becoming too fragile. But the carefully placed srcu_cb_mutex may leave
room for people to imagine purposes it does not serve. My purpose is
just to remind anyone in the future that this has already been
considered and tried.

> As you noted, the current state is unsatisfying.  There are two different
> mechanisms providing mutual exclusion.  This code is nowhere near a
> fastpath, so the added overhead is absolutely *not* a concern (not yet,
> that we know of, anyway), but there are still risks, for example:
> 
> 1.	Latent bugs could be introduced that suddenly make their presence
> 	known should the code ever be relying solely on the mutexes for
> 	mutual exclusion.
> 
> 2.	People imagine other purposes for either the single workqueue
> 	or the mutexes, and inject bugs based on the fact that their
> 	imagination of the design has diverged from the actual design.
> 
> These risks are not huge, especially compared to other risks in the
> code base.  For but one example, the code that provides the memory-order
> guarantees for synchronize_rcu() is intricate and spread widely over
> the code base, as it must be in order to make important fastpaths fast,
> a benefit important enough to make taking that risk worthwhile.
> 

As said above, I understand this, and srcu_gp_mutex is not my target.
(I am sorry that my comment may have given the wrong impression.)

> So what are the options here?
> 
> 1.	Leave it as it is, with the non-huge risks as called out above.
> 

Given the test data, the outlook for this patch is dim.

I am fine with option one. But if necessary, I am also OK with adding
some comments around srcu_cb_mutex, which might save someone time in
the future.


Thanks,

	Pingfan


Patch

diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
index 558057b517b7..f199c31493ab 100644
--- a/include/linux/srcutree.h
+++ b/include/linux/srcutree.h
@@ -66,8 +66,12 @@  struct srcu_struct {
 						/* First node at each level. */
 	int srcu_size_state;			/* Small-to-big transition state. */
 	struct mutex srcu_cb_mutex;		/* Serialize CB preparation. */
+						/* No effect at present: the single ssp->work */
+						/* meets the workqueue non-reentrancy condition. */
 	spinlock_t __private lock;		/* Protect counters and size state. */
 	struct mutex srcu_gp_mutex;		/* Serialize GP work. */
+						/* No effect at present: the single ssp->work */
+						/* meets the workqueue non-reentrancy condition. */
 	unsigned int srcu_idx;			/* Current rdr array element. */
 	unsigned long srcu_gp_seq;		/* Grace-period seq #. */
 	unsigned long srcu_gp_seq_needed;	/* Latest gp_seq needed. */
diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index ce39907fa381..8ebb1a240219 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -815,7 +815,10 @@  static void srcu_gp_end(struct srcu_struct *ssp)
 	struct srcu_node *snp;
 	int ss_state;
 
-	/* Prevent more than one additional grace period. */
+	/*
+	 * Prevent more than one additional grace period.  At present this
+	 * has no effect; see the note at the mutex's definition.
+	 */
 	mutex_lock(&ssp->srcu_cb_mutex);
 
 	/* End the current grace period. */