diff mbox series

[1/3] mm/mempolicy: Use the already fetched local variable

Message ID 9c3f7b743477560d1c5b12b8c111a584a2cc92ee.1708097962.git.donettom@linux.ibm.com (mailing list archive)
State New
Headers show
Series [1/3] mm/mempolicy: Use the already fetched local variable | expand

Commit Message

Donet Tom Feb. 17, 2024, 7:31 a.m. UTC
Avoid doing a per cpu read and use the local variable thisnid. IMHO
this also makes the code more readable.

Signed-off-by: Aneesh Kumar K.V (IBM) <aneesh.kumar@kernel.org>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
---
 mm/mempolicy.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Andrew Morton Feb. 18, 2024, 9:38 p.m. UTC | #1
On Sat, 17 Feb 2024 01:31:33 -0600 Donet Tom <donettom@linux.ibm.com> wrote:

> Avoid doing a per cpu read and use the local variable thisnid. IMHO
> this also makes the code more readable.
> 
> ...
>
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -2526,7 +2526,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
>  		if (node_isset(curnid, pol->nodes))
>  			goto out;
>  		z = first_zones_zonelist(
> -				node_zonelist(numa_node_id(), GFP_HIGHUSER),
> +				node_zonelist(thisnid, GFP_HIGHUSER),
>  				gfp_zone(GFP_HIGHUSER),
>  				&pol->nodes);
>  		polnid = zone_to_nid(z->zone);

	int thisnid = cpu_to_node(thiscpu);

Is there any dofference between numa_node_id() and
cpu_to_node(raw_smp_processor_id())?  And it it explicable that we're
using one here and not the other?
Donet Tom Feb. 19, 2024, 8:34 a.m. UTC | #2
On 2/19/24 03:08, Andrew Morton wrote:
> On Sat, 17 Feb 2024 01:31:33 -0600 Donet Tom <donettom@linux.ibm.com> wrote:
>
>> Avoid doing a per cpu read and use the local variable thisnid. IMHO
>> this also makes the code more readable.
>>
>> ...
>>
>> --- a/mm/mempolicy.c
>> +++ b/mm/mempolicy.c
>> @@ -2526,7 +2526,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
>>   		if (node_isset(curnid, pol->nodes))
>>   			goto out;
>>   		z = first_zones_zonelist(
>> -				node_zonelist(numa_node_id(), GFP_HIGHUSER),
>> +				node_zonelist(thisnid, GFP_HIGHUSER),
>>   				gfp_zone(GFP_HIGHUSER),
>>   				&pol->nodes);
>>   		polnid = zone_to_nid(z->zone);
> 	int thisnid = cpu_to_node(thiscpu);
>
> Is there any dofference between numa_node_id() and
> cpu_to_node(raw_smp_processor_id())?  And it it explicable that we're
> using one here and not the other?

Hi Andrew

Both numa_node_id() and cpu_to_node(raw_smp_processor_id()) return the current execution node id,
Since the current execution node is already fetched at the beginning (thisnid) we can reuse it instead of getting it again.

Thanks
Donet Tom

>
Michal Hocko Feb. 19, 2024, 12:02 p.m. UTC | #3
On Sat 17-02-24 01:31:34, Donet Tom wrote:
> We will update MPOL_PREFERRED_MANY in the follow up patch. This change
> is required for that.

Why is it a separate patch then? Does it make review of the next patch
easier? If so make it explicit in the changelog.

> 
> Signed-off-by: Aneesh Kumar K.V (IBM) <aneesh.kumar@kernel.org>
> Signed-off-by: Donet Tom <donettom@linux.ibm.com>
> ---
>  mm/mempolicy.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 8478574c000c..73d698e21dae 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -2515,7 +2515,15 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
>  				break;
>  			goto out;
>  		}
> -		fallthrough;
> +
> +		if (node_isset(curnid, pol->nodes))
> +			goto out;
> +		z = first_zones_zonelist(
> +				node_zonelist(thisnid, GFP_HIGHUSER),
> +				gfp_zone(GFP_HIGHUSER),
> +				&pol->nodes);
> +		polnid = zone_to_nid(z->zone);
> +		break;
>  
>  	case MPOL_PREFERRED_MANY:
>  		/*
> -- 
> 2.39.3
Donet Tom Feb. 19, 2024, 3:18 p.m. UTC | #4
On 2/19/24 17:32, Michal Hocko wrote:
> On Sat 17-02-24 01:31:34, Donet Tom wrote:
>> We will update MPOL_PREFERRED_MANY in the follow up patch. This change
>> is required for that.
> Why is it a separate patch then? Does it make review of the next patch
> easier? If so make it explicit in the changelog.

Hi Michal

In this patch there is no functional changes. This is just re-arrangement of code. Patch 3 is the actual fix .It will not look nice if we mix these patches. As you said it is easy for reviewing also. That's why we kept it as a separate patch.

Thanks
Donet Tom

>
>> Signed-off-by: Aneesh Kumar K.V (IBM) <aneesh.kumar@kernel.org>
>> Signed-off-by: Donet Tom <donettom@linux.ibm.com>
>> ---
>>   mm/mempolicy.c | 10 +++++++++-
>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>> index 8478574c000c..73d698e21dae 100644
>> --- a/mm/mempolicy.c
>> +++ b/mm/mempolicy.c
>> @@ -2515,7 +2515,15 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
>>   				break;
>>   			goto out;
>>   		}
>> -		fallthrough;
>> +
>> +		if (node_isset(curnid, pol->nodes))
>> +			goto out;
>> +		z = first_zones_zonelist(
>> +				node_zonelist(thisnid, GFP_HIGHUSER),
>> +				gfp_zone(GFP_HIGHUSER),
>> +				&pol->nodes);
>> +		polnid = zone_to_nid(z->zone);
>> +		break;
>>   
>>   	case MPOL_PREFERRED_MANY:
>>   		/*
>> -- 
>> 2.39.3
Andrew Morton Feb. 20, 2024, 1:21 a.m. UTC | #5
On Mon, 19 Feb 2024 14:04:23 +0530 Donet Tom <donettom@linux.ibm.com> wrote:

> >> --- a/mm/mempolicy.c
> >> +++ b/mm/mempolicy.c
> >> @@ -2526,7 +2526,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
> >>   		if (node_isset(curnid, pol->nodes))
> >>   			goto out;
> >>   		z = first_zones_zonelist(
> >> -				node_zonelist(numa_node_id(), GFP_HIGHUSER),
> >> +				node_zonelist(thisnid, GFP_HIGHUSER),
> >>   				gfp_zone(GFP_HIGHUSER),
> >>   				&pol->nodes);
> >>   		polnid = zone_to_nid(z->zone);
> > 	int thisnid = cpu_to_node(thiscpu);
> >
> > Is there any dofference between numa_node_id() and
> > cpu_to_node(raw_smp_processor_id())?  And it it explicable that we're
> > using one here and not the other?
> 
> Hi Andrew
> 
> Both numa_node_id() and cpu_to_node(raw_smp_processor_id()) return the current execution node id,
> Since the current execution node is already fetched at the beginning (thisnid) we can reuse it instead of getting it again.

Sure, but mine was a broader thought: why do we have both?  Is one
preferable and if so why?
Aneesh Kumar K.V Feb. 20, 2024, 4:10 a.m. UTC | #6
On 2/20/24 6:51 AM, Andrew Morton wrote:
> On Mon, 19 Feb 2024 14:04:23 +0530 Donet Tom <donettom@linux.ibm.com> wrote:
> 
>>>> --- a/mm/mempolicy.c
>>>> +++ b/mm/mempolicy.c
>>>> @@ -2526,7 +2526,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
>>>>   		if (node_isset(curnid, pol->nodes))
>>>>   			goto out;
>>>>   		z = first_zones_zonelist(
>>>> -				node_zonelist(numa_node_id(), GFP_HIGHUSER),
>>>> +				node_zonelist(thisnid, GFP_HIGHUSER),
>>>>   				gfp_zone(GFP_HIGHUSER),
>>>>   				&pol->nodes);
>>>>   		polnid = zone_to_nid(z->zone);
>>> 	int thisnid = cpu_to_node(thiscpu);
>>>
>>> Is there any dofference between numa_node_id() and
>>> cpu_to_node(raw_smp_processor_id())?  And it it explicable that we're
>>> using one here and not the other?
>>
>> Hi Andrew
>>
>> Both numa_node_id() and cpu_to_node(raw_smp_processor_id()) return the current execution node id,
>> Since the current execution node is already fetched at the beginning (thisnid) we can reuse it instead of getting it again.
> 
> Sure, but mine was a broader thought: why do we have both?  Is one
> preferable and if so why?

IIUC these are two helpers to fetch current numa node id. and either of them can be used based on need. The default implementation shows the details.
(One small difference is numa_node_id() can use optimized per cpu reader because it is fetching the per cpu variable of the currently running cpu.)

#ifndef numa_node_id
/* Returns the number of the current Node. */
static inline int numa_node_id(void)
{
	return raw_cpu_read(numa_node);
}
#endif

#ifndef cpu_to_node
static inline int cpu_to_node(int cpu)
{
	return per_cpu(numa_node, cpu);
}
#endif

In mpol_misplaced function, we need the cpu details because we are using that in other place (should_numa_migreate_memory()). So it makes it easy
to use cpu_to_node(thiscpu) instead of numa_node_id(). 

-aneesh
Huang, Ying Feb. 20, 2024, 6:25 a.m. UTC | #7
"Aneesh Kumar K.V" <aneesh.kumar@kernel.org> writes:

> On 2/20/24 6:51 AM, Andrew Morton wrote:
>> On Mon, 19 Feb 2024 14:04:23 +0530 Donet Tom <donettom@linux.ibm.com> wrote:
>> 
>>>>> --- a/mm/mempolicy.c
>>>>> +++ b/mm/mempolicy.c
>>>>> @@ -2526,7 +2526,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
>>>>>   		if (node_isset(curnid, pol->nodes))
>>>>>   			goto out;
>>>>>   		z = first_zones_zonelist(
>>>>> -				node_zonelist(numa_node_id(), GFP_HIGHUSER),
>>>>> +				node_zonelist(thisnid, GFP_HIGHUSER),
>>>>>   				gfp_zone(GFP_HIGHUSER),
>>>>>   				&pol->nodes);
>>>>>   		polnid = zone_to_nid(z->zone);
>>>> 	int thisnid = cpu_to_node(thiscpu);
>>>>
>>>> Is there any dofference between numa_node_id() and
>>>> cpu_to_node(raw_smp_processor_id())?  And it it explicable that we're
>>>> using one here and not the other?
>>>
>>> Hi Andrew
>>>
>>> Both numa_node_id() and cpu_to_node(raw_smp_processor_id()) return the current execution node id,
>>> Since the current execution node is already fetched at the beginning (thisnid) we can reuse it instead of getting it again.
>> 
>> Sure, but mine was a broader thought: why do we have both?  Is one
>> preferable and if so why?
>
> IIUC these are two helpers to fetch current numa node id. and either of them can be used based on need. The default implementation shows the details.
> (One small difference is numa_node_id() can use optimized per cpu reader because it is fetching the per cpu variable of the currently running cpu.)
>
> #ifndef numa_node_id
> /* Returns the number of the current Node. */
> static inline int numa_node_id(void)
> {
> 	return raw_cpu_read(numa_node);
> }
> #endif
>
> #ifndef cpu_to_node
> static inline int cpu_to_node(int cpu)
> {
> 	return per_cpu(numa_node, cpu);
> }
> #endif
>
> In mpol_misplaced function, we need the cpu details because we are using that in other place (should_numa_migreate_memory()). So it makes it easy
> to use cpu_to_node(thiscpu) instead of numa_node_id(). 

IIUC, numa_node_id() is faster than cpu_to_node(thiscpu), even if we
have thiscpu already.  cpu_to_node() is mainly used to get the node of
NOT current CPU.  So, IMHO, we should only use numa_node_id() in this
function.

--
Best Regards,
Huang, Ying
Aneesh Kumar K.V Feb. 20, 2024, 6:32 a.m. UTC | #8
On 2/20/24 11:55 AM, Huang, Ying wrote:
> "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> writes:
> 
>> On 2/20/24 6:51 AM, Andrew Morton wrote:
>>> On Mon, 19 Feb 2024 14:04:23 +0530 Donet Tom <donettom@linux.ibm.com> wrote:
>>>
>>>>>> --- a/mm/mempolicy.c
>>>>>> +++ b/mm/mempolicy.c
>>>>>> @@ -2526,7 +2526,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
>>>>>>   		if (node_isset(curnid, pol->nodes))
>>>>>>   			goto out;
>>>>>>   		z = first_zones_zonelist(
>>>>>> -				node_zonelist(numa_node_id(), GFP_HIGHUSER),
>>>>>> +				node_zonelist(thisnid, GFP_HIGHUSER),
>>>>>>   				gfp_zone(GFP_HIGHUSER),
>>>>>>   				&pol->nodes);
>>>>>>   		polnid = zone_to_nid(z->zone);
>>>>> 	int thisnid = cpu_to_node(thiscpu);
>>>>>
>>>>> Is there any dofference between numa_node_id() and
>>>>> cpu_to_node(raw_smp_processor_id())?  And it it explicable that we're
>>>>> using one here and not the other?
>>>>
>>>> Hi Andrew
>>>>
>>>> Both numa_node_id() and cpu_to_node(raw_smp_processor_id()) return the current execution node id,
>>>> Since the current execution node is already fetched at the beginning (thisnid) we can reuse it instead of getting it again.
>>>
>>> Sure, but mine was a broader thought: why do we have both?  Is one
>>> preferable and if so why?
>>
>> IIUC these are two helpers to fetch current numa node id. and either of them can be used based on need. The default implementation shows the details.
>> (One small difference is numa_node_id() can use optimized per cpu reader because it is fetching the per cpu variable of the currently running cpu.)
>>
>> #ifndef numa_node_id
>> /* Returns the number of the current Node. */
>> static inline int numa_node_id(void)
>> {
>> 	return raw_cpu_read(numa_node);
>> }
>> #endif
>>
>> #ifndef cpu_to_node
>> static inline int cpu_to_node(int cpu)
>> {
>> 	return per_cpu(numa_node, cpu);
>> }
>> #endif
>>
>> In mpol_misplaced function, we need the cpu details because we are using that in other place (should_numa_migreate_memory()). So it makes it easy
>> to use cpu_to_node(thiscpu) instead of numa_node_id(). 
> 
> IIUC, numa_node_id() is faster than cpu_to_node(thiscpu), even if we
> have thiscpu already.  cpu_to_node() is mainly used to get the node of
> NOT current CPU.  So, IMHO, we should only use numa_node_id() in this
> function.
> 

This change?

modified   mm/mempolicy.c
@@ -2502,8 +2502,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
 	pgoff_t ilx;
 	struct zoneref *z;
 	int curnid = folio_nid(folio);
-	int thiscpu = raw_smp_processor_id();
-	int thisnid = cpu_to_node(thiscpu);
+	int thisnid = numa_node_id();
 	int polnid = NUMA_NO_NODE;
 	int ret = NUMA_NO_NODE;
 
@@ -2573,7 +2572,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
 		polnid = thisnid;
 
 		if (!should_numa_migrate_memory(current, folio, curnid,
-						thiscpu))
+						raw_smp_processor_id()))
 			goto out;
 	}
 



-aneesh
Aneesh Kumar K.V Feb. 20, 2024, 7:03 a.m. UTC | #9
On 2/20/24 12:02 PM, Aneesh Kumar K.V wrote:
> On 2/20/24 11:55 AM, Huang, Ying wrote:
>> "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> writes:
>>
>>> On 2/20/24 6:51 AM, Andrew Morton wrote:
>>>> On Mon, 19 Feb 2024 14:04:23 +0530 Donet Tom <donettom@linux.ibm.com> wrote:
>>>>
>>>>>>> --- a/mm/mempolicy.c
>>>>>>> +++ b/mm/mempolicy.c
>>>>>>> @@ -2526,7 +2526,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
>>>>>>>   		if (node_isset(curnid, pol->nodes))
>>>>>>>   			goto out;
>>>>>>>   		z = first_zones_zonelist(
>>>>>>> -				node_zonelist(numa_node_id(), GFP_HIGHUSER),
>>>>>>> +				node_zonelist(thisnid, GFP_HIGHUSER),
>>>>>>>   				gfp_zone(GFP_HIGHUSER),
>>>>>>>   				&pol->nodes);
>>>>>>>   		polnid = zone_to_nid(z->zone);
>>>>>> 	int thisnid = cpu_to_node(thiscpu);
>>>>>>
>>>>>> Is there any dofference between numa_node_id() and
>>>>>> cpu_to_node(raw_smp_processor_id())?  And it it explicable that we're
>>>>>> using one here and not the other?
>>>>>
>>>>> Hi Andrew
>>>>>
>>>>> Both numa_node_id() and cpu_to_node(raw_smp_processor_id()) return the current execution node id,
>>>>> Since the current execution node is already fetched at the beginning (thisnid) we can reuse it instead of getting it again.
>>>>
>>>> Sure, but mine was a broader thought: why do we have both?  Is one
>>>> preferable and if so why?
>>>
>>> IIUC these are two helpers to fetch current numa node id. and either of them can be used based on need. The default implementation shows the details.
>>> (One small difference is numa_node_id() can use optimized per cpu reader because it is fetching the per cpu variable of the currently running cpu.)
>>>
>>> #ifndef numa_node_id
>>> /* Returns the number of the current Node. */
>>> static inline int numa_node_id(void)
>>> {
>>> 	return raw_cpu_read(numa_node);
>>> }
>>> #endif
>>>
>>> #ifndef cpu_to_node
>>> static inline int cpu_to_node(int cpu)
>>> {
>>> 	return per_cpu(numa_node, cpu);
>>> }
>>> #endif
>>>
>>> In mpol_misplaced function, we need the cpu details because we are using that in other place (should_numa_migreate_memory()). So it makes it easy
>>> to use cpu_to_node(thiscpu) instead of numa_node_id(). 
>>
>> IIUC, numa_node_id() is faster than cpu_to_node(thiscpu), even if we
>> have thiscpu already.  cpu_to_node() is mainly used to get the node of
>> NOT current CPU.  So, IMHO, we should only use numa_node_id() in this
>> function.
>>
> 
> This change?
> 
> modified   mm/mempolicy.c
> @@ -2502,8 +2502,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
>  	pgoff_t ilx;
>  	struct zoneref *z;
>  	int curnid = folio_nid(folio);
> -	int thiscpu = raw_smp_processor_id();
> -	int thisnid = cpu_to_node(thiscpu);
> +	int thisnid = numa_node_id();
>  	int polnid = NUMA_NO_NODE;
>  	int ret = NUMA_NO_NODE;
>  
> @@ -2573,7 +2572,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
>  		polnid = thisnid;
>  
>  		if (!should_numa_migrate_memory(current, folio, curnid,
> -						thiscpu))
> +						raw_smp_processor_id()))
>  			goto out;
>  	}
>  
> 

One of the problem with the above change will be the need to make sure smp processor id remaining stable, which
I am not sure we want in this function. With that we can end up with processor id not related to the numa node id
we are using. 

-aneesh
Huang, Ying Feb. 20, 2024, 7:22 a.m. UTC | #10
"Aneesh Kumar K.V" <aneesh.kumar@kernel.org> writes:

> On 2/20/24 12:02 PM, Aneesh Kumar K.V wrote:
>> On 2/20/24 11:55 AM, Huang, Ying wrote:
>>> "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> writes:
>>>
>>>> On 2/20/24 6:51 AM, Andrew Morton wrote:
>>>>> On Mon, 19 Feb 2024 14:04:23 +0530 Donet Tom <donettom@linux.ibm.com> wrote:
>>>>>
>>>>>>>> --- a/mm/mempolicy.c
>>>>>>>> +++ b/mm/mempolicy.c
>>>>>>>> @@ -2526,7 +2526,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
>>>>>>>>   		if (node_isset(curnid, pol->nodes))
>>>>>>>>   			goto out;
>>>>>>>>   		z = first_zones_zonelist(
>>>>>>>> -				node_zonelist(numa_node_id(), GFP_HIGHUSER),
>>>>>>>> +				node_zonelist(thisnid, GFP_HIGHUSER),
>>>>>>>>   				gfp_zone(GFP_HIGHUSER),
>>>>>>>>   				&pol->nodes);
>>>>>>>>   		polnid = zone_to_nid(z->zone);
>>>>>>> 	int thisnid = cpu_to_node(thiscpu);
>>>>>>>
>>>>>>> Is there any dofference between numa_node_id() and
>>>>>>> cpu_to_node(raw_smp_processor_id())?  And it it explicable that we're
>>>>>>> using one here and not the other?
>>>>>>
>>>>>> Hi Andrew
>>>>>>
>>>>>> Both numa_node_id() and cpu_to_node(raw_smp_processor_id()) return the current execution node id,
>>>>>> Since the current execution node is already fetched at the beginning (thisnid) we can reuse it instead of getting it again.
>>>>>
>>>>> Sure, but mine was a broader thought: why do we have both?  Is one
>>>>> preferable and if so why?
>>>>
>>>> IIUC these are two helpers to fetch current numa node id. and either of them can be used based on need. The default implementation shows the details.
>>>> (One small difference is numa_node_id() can use optimized per cpu reader because it is fetching the per cpu variable of the currently running cpu.)
>>>>
>>>> #ifndef numa_node_id
>>>> /* Returns the number of the current Node. */
>>>> static inline int numa_node_id(void)
>>>> {
>>>> 	return raw_cpu_read(numa_node);
>>>> }
>>>> #endif
>>>>
>>>> #ifndef cpu_to_node
>>>> static inline int cpu_to_node(int cpu)
>>>> {
>>>> 	return per_cpu(numa_node, cpu);
>>>> }
>>>> #endif
>>>>
>>>> In mpol_misplaced function, we need the cpu details because we are using that in other place (should_numa_migreate_memory()). So it makes it easy
>>>> to use cpu_to_node(thiscpu) instead of numa_node_id(). 
>>>
>>> IIUC, numa_node_id() is faster than cpu_to_node(thiscpu), even if we
>>> have thiscpu already.  cpu_to_node() is mainly used to get the node of
>>> NOT current CPU.  So, IMHO, we should only use numa_node_id() in this
>>> function.
>>>
>> 
>> This change?
>> 
>> modified   mm/mempolicy.c
>> @@ -2502,8 +2502,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
>>  	pgoff_t ilx;
>>  	struct zoneref *z;
>>  	int curnid = folio_nid(folio);
>> -	int thiscpu = raw_smp_processor_id();
>> -	int thisnid = cpu_to_node(thiscpu);
>> +	int thisnid = numa_node_id();
>>  	int polnid = NUMA_NO_NODE;
>>  	int ret = NUMA_NO_NODE;
>>  
>> @@ -2573,7 +2572,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
>>  		polnid = thisnid;
>>  
>>  		if (!should_numa_migrate_memory(current, folio, curnid,
>> -						thiscpu))
>> +						raw_smp_processor_id()))
>>  			goto out;
>>  	}
>>  
>> 
>
> One of the problem with the above change will be the need to make sure smp processor id remaining stable, which
> I am not sure we want in this function. With that we can end up with processor id not related to the numa node id
> we are using. 

This isn't an issue now, because mpol_misplaced() are always called with
PTL held.  And, we can still keep thiscpu local variable.

--
Best Regards,
Huang, Ying
Michal Hocko Feb. 20, 2024, 9:03 a.m. UTC | #11
On Tue 20-02-24 15:22:07, Huang, Ying wrote:
[...]
> This isn't an issue now, because mpol_misplaced() are always called with
> PTL held.  And, we can still keep thiscpu local variable.

yes, this is the case but it would be better if we made that assumption
official by lockdep_assert_held
Aneesh Kumar K.V March 3, 2024, 6:17 a.m. UTC | #12
Michal Hocko <mhocko@suse.com> writes:

> On Tue 20-02-24 15:22:07, Huang, Ying wrote:
> [...]
>> This isn't an issue now, because mpol_misplaced() are always called with
>> PTL held.  And, we can still keep thiscpu local variable.
>
> yes, this is the case but it would be better if we made that assumption
> official by lockdep_assert_held
>

How about this folded into this patch?

2 files changed, 12 insertions(+), 4 deletions(-)
mm/memory.c    |  6 ++++--
mm/mempolicy.c | 10 ++++++++--

modified   mm/memory.c
@@ -4879,9 +4879,11 @@ static vm_fault_t do_fault(struct vm_fault *vmf)
 	return ret;
 }
 
-int numa_migrate_prep(struct folio *folio, struct vm_area_struct *vma,
+int numa_migrate_prep(struct folio *folio, struct vm_fault *vmf,
 		      unsigned long addr, int page_nid, int *flags)
 {
+	struct vm_area_struct *vma = vmf->vma;
+
 	folio_get(folio);
 
 	/* Record the current PID acceesing VMA */
@@ -4893,7 +4895,7 @@ int numa_migrate_prep(struct folio *folio, struct vm_area_struct *vma,
 		*flags |= TNF_FAULT_LOCAL;
 	}
 
-	return mpol_misplaced(folio, vma, addr);
+	return mpol_misplaced(folio, vmf, addr);
 }
 
 static vm_fault_t do_numa_page(struct vm_fault *vmf)
modified   mm/mempolicy.c
@@ -2495,18 +2495,24 @@ static inline bool mpol_preferred_should_numa_migrate(int exec_node, int folio_n
  * Return: NUMA_NO_NODE if the page is in a node that is valid for this
  * policy, or a suitable node ID to allocate a replacement folio from.
  */
-int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
+int mpol_misplaced(struct folio *folio, struct vm_fault *vmf,
 		   unsigned long addr)
 {
 	struct mempolicy *pol;
 	pgoff_t ilx;
 	struct zoneref *z;
 	int curnid = folio_nid(folio);
+	struct vm_area_struct *vma = vmf->vma;
 	int thiscpu = raw_smp_processor_id();
-	int thisnid = cpu_to_node(thiscpu);
+	int thisnid = numa_node_id();
 	int polnid = NUMA_NO_NODE;
 	int ret = NUMA_NO_NODE;
 
+	/*
+	 * Make sure ptl is held so that we don't preempt and we
+	 * have a stable smp processor id
+	 */
+	lockdep_assert_held(vmf->ptl);
 	pol = get_vma_policy(vma, addr, folio_order(folio), &ilx);
 	if (!(pol->flags & MPOL_F_MOF))
 		goto out;

[back]
Huang, Ying March 4, 2024, 1:49 a.m. UTC | #13
Aneesh Kumar K.V <aneesh.kumar@kernel.org> writes:

> Michal Hocko <mhocko@suse.com> writes:
>
>> On Tue 20-02-24 15:22:07, Huang, Ying wrote:
>> [...]
>>> This isn't an issue now, because mpol_misplaced() are always called with
>>> PTL held.  And, we can still keep thiscpu local variable.
>>
>> yes, this is the case but it would be better if we made that assumption
>> official by lockdep_assert_held
>>
>
> How about this folded into this patch?
>
> 2 files changed, 12 insertions(+), 4 deletions(-)
> mm/memory.c    |  6 ++++--
> mm/mempolicy.c | 10 ++++++++--
>
> modified   mm/memory.c
> @@ -4879,9 +4879,11 @@ static vm_fault_t do_fault(struct vm_fault *vmf)
>  	return ret;
>  }
>  
> -int numa_migrate_prep(struct folio *folio, struct vm_area_struct *vma,
> +int numa_migrate_prep(struct folio *folio, struct vm_fault *vmf,
>  		      unsigned long addr, int page_nid, int *flags)
>  {
> +	struct vm_area_struct *vma = vmf->vma;
> +
>  	folio_get(folio);
>  
>  	/* Record the current PID acceesing VMA */
> @@ -4893,7 +4895,7 @@ int numa_migrate_prep(struct folio *folio, struct vm_area_struct *vma,
>  		*flags |= TNF_FAULT_LOCAL;
>  	}
>  
> -	return mpol_misplaced(folio, vma, addr);
> +	return mpol_misplaced(folio, vmf, addr);
>  }
>  
>  static vm_fault_t do_numa_page(struct vm_fault *vmf)
> modified   mm/mempolicy.c
> @@ -2495,18 +2495,24 @@ static inline bool mpol_preferred_should_numa_migrate(int exec_node, int folio_n
>   * Return: NUMA_NO_NODE if the page is in a node that is valid for this
>   * policy, or a suitable node ID to allocate a replacement folio from.
>   */
> -int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
> +int mpol_misplaced(struct folio *folio, struct vm_fault *vmf,
>  		   unsigned long addr)
>  {
>  	struct mempolicy *pol;
>  	pgoff_t ilx;
>  	struct zoneref *z;
>  	int curnid = folio_nid(folio);
> +	struct vm_area_struct *vma = vmf->vma;
>  	int thiscpu = raw_smp_processor_id();
> -	int thisnid = cpu_to_node(thiscpu);
> +	int thisnid = numa_node_id();
>  	int polnid = NUMA_NO_NODE;
>  	int ret = NUMA_NO_NODE;
>  
> +	/*
> +	 * Make sure ptl is held so that we don't preempt and we
> +	 * have a stable smp processor id
> +	 */
> +	lockdep_assert_held(vmf->ptl);
>  	pol = get_vma_policy(vma, addr, folio_order(folio), &ilx);
>  	if (!(pol->flags & MPOL_F_MOF))
>  		goto out;
>
> [back]
>  

LGTM, Thanks!

--
Best Regards,
Huang, Ying
diff mbox series

Patch

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 10a590ee1c89..8478574c000c 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2526,7 +2526,7 @@  int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
 		if (node_isset(curnid, pol->nodes))
 			goto out;
 		z = first_zones_zonelist(
-				node_zonelist(numa_node_id(), GFP_HIGHUSER),
+				node_zonelist(thisnid, GFP_HIGHUSER),
 				gfp_zone(GFP_HIGHUSER),
 				&pol->nodes);
 		polnid = zone_to_nid(z->zone);