diff mbox series

mm/mempolicy: Fix redundant check and refine lock protection scope in init_nodemask_of_mempolicy

Message ID 20241121122409.277927-1-zhen.ni@easystack.cn (mailing list archive)
State New
Headers show
Series mm/mempolicy: Fix redundant check and refine lock protection scope in init_nodemask_of_mempolicy | expand

Commit Message

zhen.ni Nov. 21, 2024, 12:24 p.m. UTC
1. Removing redundant checks for current->mempolicy, with a more concise
check order.

2. Using READ_ONCE(current->mempolicy) for safe, single access to
current->mempolicy to prevent potential race conditions.

3. Optimizing the scope of task_lock(current). The lock now only protects
the critical section where mempolicy is accessed, reducing the duration
the lock is held. This enhances performance by limiting the scope of the
lock to only what is necessary.

Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
---
 mm/mempolicy.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

David Hildenbrand Nov. 21, 2024, 1:40 p.m. UTC | #1
On 21.11.24 13:24, Zhen Ni wrote:
> 1.Removing redundant checks for current->mempolicy, with a more concise
> check order.
> 
> 2.Using READ_ONCE(current->mempolicy) for safe, single access to
> current->mempolicy to prevent potential race conditions.
> 
> 3.Optimizing the scope of task_lock(current). The lock now only protects
> the critical section where mempolicy is accessed, reducing the duration
> the lock is held. This enhances performance by limiting the scope of the
> lock to only what is necessary.
> 
> Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
> ---
>   mm/mempolicy.c | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index b646fab3e45e..8bff8830b7e6 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -2132,11 +2132,14 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask)
>   {
>   	struct mempolicy *mempolicy;
>   
> -	if (!(mask && current->mempolicy))
> +	if (!mask)
> +		return false;
> +
> +	mempolicy = READ_ONCE(current->mempolicy);
> +	if (!mempolicy)
>   		return false;
>   
>   	task_lock(current);
> -	mempolicy = current->mempolicy;
>   	switch (mempolicy->mode) {
>   	case MPOL_PREFERRED:
>   	case MPOL_PREFERRED_MANY:

(a) current->mempolicy can only change under the task_lock(), see 
do_set_mempolicy().

(b) current->mempolicy cannot usually become NULL once it was non-NULL.
There are two exceptions:

copy_process() might set it to NULL when creating the process, before 
the task gets scheduled.

do_exit() might set it to NULL via mpol_put_task_policy().

I don't think init_nodemask_of_mempolicy() can be called while our task 
is exiting ...


Now, if you read current->mempolicy outside of task_lock() and then 
*dereference* it when it might already have changed + been freed, that's a 
BUG you're introducing.


So, regarding your (1): which redundant "check"? There is a single "not 
NULL" check. Regarding your (2), I think you are introducing a BUG, and 
regarding your (3), please share performance numbers, or realize that you 
are making something up ;)


The only improvement we could make here is converting:
	if (!(mask && current->mempolicy))
into a more readable
	if (!mask || !current->mempolicy)

And maybe adding a comment that while we can race with someone changing 
current->mempolicy, we cannot race with someone changing it to NULL.
Gregory Price Nov. 21, 2024, 2:41 p.m. UTC | #2
On Thu, Nov 21, 2024 at 02:40:44PM +0100, David Hildenbrand wrote:
> On 21.11.24 13:24, Zhen Ni wrote:
> > 1.Removing redundant checks for current->mempolicy, with a more concise
> > check order.
> > 
> > 2.Using READ_ONCE(current->mempolicy) for safe, single access to
> > current->mempolicy to prevent potential race conditions.
> > 
> > 3.Optimizing the scope of task_lock(current). The lock now only protects
> > the critical section where mempolicy is accessed, reducing the duration
> > the lock is held. This enhances performance by limiting the scope of the
> > lock to only what is necessary.
> >

While optimizing task lock scopes is admirable, we're talking about a scale
of maybe a microsecond length access on a lock that's typically held quite
a bit longer - if it's ever really held given the context of this function.

What you'd actually want to do is the opposite:

task_lock(current);
mempolicy = get_mempolicy();
task_unlock(current);
/* do work */
put_mempolicy();

but even this is only optimizing a small amount of work.

And if you look at what init_nodemask_of_mempolicy is actually used for,
there is no need for this optimization - the task is almost certainly
short lived and intended to just adjust a system control value.

> > Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
> > ---
> >   mm/mempolicy.c | 7 +++++--
> >   1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index b646fab3e45e..8bff8830b7e6 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -2132,11 +2132,14 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask)
> >   {
> >   	struct mempolicy *mempolicy;
> > -	if (!(mask && current->mempolicy))
> > +	if (!mask)
> > +		return false;
> > +
> > +	mempolicy = READ_ONCE(current->mempolicy);
> > +	if (!mempolicy)
> >   		return false;
> >   	task_lock(current);
> > -	mempolicy = current->mempolicy;
> >   	switch (mempolicy->mode) {
> >   	case MPOL_PREFERRED:
> >   	case MPOL_PREFERRED_MANY:
> 
> (a) current->mempolicy can only change under the task_lock(), see
> do_set_mempolicy().
> 
> (b) current->mempolicy cannot usually become NULL once it was non-NULL.
> There are two exceptions:
> 
> copy_process() might set it to NULL when creating the process, before the
> task gets scheduled.
> 
> do_exit() might set it to NULL via mpol_put_task_policy().
> 
> I don't think init_nodemask_of_mempolicy() can be called while our task is
> exiting ...
>

Unless someone put an access to nr_hugepages_mempolicy in the exit path, I
doubt it (also that would probably deadlock :])
 
> 
> Now, if you read current->mempolicy outside of task_lock() and then
> *dereference* it when it might already have changed + been freed, that's a
> BUG you're introducing.
> 

100% a bug, although since the task mempolicy can't currently change outside
the context of the task - I can't think of a way to poke this particular bug.

~Gregory
zhen.ni Nov. 22, 2024, 3:54 a.m. UTC | #3
Hi David and Gregory,

Thank you for your detailed feedback and insights. I deeply appreciate 
the time you’ve taken to point out potential issues and to clarify the 
behavior of current->mempolicy under task_lock().

Regarding the "redundant check" in my patch, my intention was to 
simplify the readability of the check for mask and current->mempolicy. 
Your suggested alternative:
	if (!mask || !current->mempolicy)
is indeed more concise and clear. I fully agree with this change and 
will incorporate it.

On the issue of reading current->mempolicy outside the task_lock() 
context, I acknowledge the potential for introducing a race condition 
when current->mempolicy could be dereferenced after being freed. This 
was an oversight on my part, as I was primarily focused on reducing the 
lock scope.

Regarding the potential performance improvement, I recognize that the 
optimization is minor, and in this specific context, the lock is held 
for such a short duration that it is unlikely to provide any meaningful 
benefit.

I’ve revised the patch to simplify the conditional check and added a 
comment to clarify the behavior of current->mempolicy based on your 
input. Additionally, I’ve removed the lock scope optimization attempt to 
avoid introducing any potential race conditions.

 From 073e4ac5ee6a3f2b45804492f3865cf9157155e2 Mon Sep 17 00:00:00 2001
From: Zhen Ni <zhen.ni@easystack.cn>
Date: Fri, 22 Nov 2024 11:48:05 +0800
Subject: [PATCH] mm/mempolicy: Improve readability of NULL check in
  init_nodemask_of_mempolicy

Refines the readability of the NULL check in init_nodemask_of_mempolicy.
Additionally, a comment is added to clarify current->mempolicy.

Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
---
  mm/mempolicy.c | 8 +++++++-
  1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index b646fab3e45e..0f0dd33d20d4 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2132,7 +2132,13 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask)
  {
         struct mempolicy *mempolicy;

-       if (!(mask && current->mempolicy))
+       /*
+        * While current->mempolicy can race with someone changing
+        * it, it cannot race with changes that set it
+        * to NULL. Such changes are restricted to specific contexts
+        * (e.g., process initialization or exit).
+        */
+       if (!mask || !current->mempolicy)
                 return false;

         task_lock(current);
diff mbox series

Patch

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index b646fab3e45e..8bff8830b7e6 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2132,11 +2132,14 @@  bool init_nodemask_of_mempolicy(nodemask_t *mask)
 {
 	struct mempolicy *mempolicy;
 
-	if (!(mask && current->mempolicy))
+	if (!mask)
+		return false;
+
+	mempolicy = READ_ONCE(current->mempolicy);
+	if (!mempolicy)
 		return false;
 
 	task_lock(current);
-	mempolicy = current->mempolicy;
 	switch (mempolicy->mode) {
 	case MPOL_PREFERRED:
 	case MPOL_PREFERRED_MANY: