diff mbox series

[RFC] mm: mempolicy: remove MPOL_MF_LAZY

Message ID 1553041659-46787-1-git-send-email-yang.shi@linux.alibaba.com (mailing list archive)
State New, archived
Headers show
Series [RFC] mm: mempolicy: remove MPOL_MF_LAZY | expand

Commit Message

Yang Shi March 20, 2019, 12:27 a.m. UTC
MPOL_MF_LAZY was added by commit b24f53a0bea3 ("mm: mempolicy: Add
MPOL_MF_LAZY"), then it was disabled by commit a720094ded8c ("mm:
mempolicy: Hide MPOL_NOOP and MPOL_MF_LAZY from userspace for now")
right away in 2012.  So, it is never ever exported to userspace.

And, it looks nobody is interested in revisiting it since it was
disabled 7 years ago.  So, it sounds pointless to still keep it around.

Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
Hi folks,
I'm not sure if you still would like to revisit it later. And, I may be
not the first one to try to remvoe it. IMHO, it sounds pointless to still
keep it around if nobody is interested in it.

 include/uapi/linux/mempolicy.h |  3 +--
 mm/mempolicy.c                 | 13 -------------
 2 files changed, 1 insertion(+), 15 deletions(-)

Comments

Michal Hocko March 21, 2019, 2:57 p.m. UTC | #1
On Wed 20-03-19 08:27:39, Yang Shi wrote:
> MPOL_MF_LAZY was added by commit b24f53a0bea3 ("mm: mempolicy: Add
> MPOL_MF_LAZY"), then it was disabled by commit a720094ded8c ("mm:
> mempolicy: Hide MPOL_NOOP and MPOL_MF_LAZY from userspace for now")
> right away in 2012.  So, it is never ever exported to userspace.
> 
> And, it looks nobody is interested in revisiting it since it was
> disabled 7 years ago.  So, it sounds pointless to still keep it around.

The above changelog owes us a lot of explanation about why this is
safe and backward compatible. I am also not sure you can change
MPOL_MF_INTERNAL because somebody still might use the flag from
userspace and we want to guarantee it will have the exact same semantic.

> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
> ---
> Hi folks,
> I'm not sure if you still would like to revisit it later. And, I may be
> not the first one to try to remvoe it. IMHO, it sounds pointless to still
> keep it around if nobody is interested in it.
> 
>  include/uapi/linux/mempolicy.h |  3 +--
>  mm/mempolicy.c                 | 13 -------------
>  2 files changed, 1 insertion(+), 15 deletions(-)
> 
> diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
> index 3354774..eb52a7a 100644
> --- a/include/uapi/linux/mempolicy.h
> +++ b/include/uapi/linux/mempolicy.h
> @@ -45,8 +45,7 @@ enum {
>  #define MPOL_MF_MOVE	 (1<<1)	/* Move pages owned by this process to conform
>  				   to policy */
>  #define MPOL_MF_MOVE_ALL (1<<2)	/* Move every page to conform to policy */
> -#define MPOL_MF_LAZY	 (1<<3)	/* Modifies '_MOVE:  lazy migrate on fault */
> -#define MPOL_MF_INTERNAL (1<<4)	/* Internal flags start here */
> +#define MPOL_MF_INTERNAL (1<<3)	/* Internal flags start here */
>  
>  #define MPOL_MF_VALID	(MPOL_MF_STRICT   | 	\
>  			 MPOL_MF_MOVE     | 	\
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index af171cc..67886f4 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -593,15 +593,6 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
>  
>  	qp->prev = vma;
>  
> -	if (flags & MPOL_MF_LAZY) {
> -		/* Similar to task_numa_work, skip inaccessible VMAs */
> -		if (!is_vm_hugetlb_page(vma) &&
> -			(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)) &&
> -			!(vma->vm_flags & VM_MIXEDMAP))
> -			change_prot_numa(vma, start, endvma);
> -		return 1;
> -	}
> -
>  	/* queue pages from current vma */
>  	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
>  		return 0;
> @@ -1181,9 +1172,6 @@ static long do_mbind(unsigned long start, unsigned long len,
>  	if (IS_ERR(new))
>  		return PTR_ERR(new);
>  
> -	if (flags & MPOL_MF_LAZY)
> -		new->flags |= MPOL_F_MOF;
> -
>  	/*
>  	 * If we are using the default policy then operation
>  	 * on discontinuous address spaces is okay after all
> @@ -1226,7 +1214,6 @@ static long do_mbind(unsigned long start, unsigned long len,
>  		int nr_failed = 0;
>  
>  		if (!list_empty(&pagelist)) {
> -			WARN_ON_ONCE(flags & MPOL_MF_LAZY);
>  			nr_failed = migrate_pages(&pagelist, new_page, NULL,
>  				start, MIGRATE_SYNC, MR_MEMPOLICY_MBIND);
>  			if (nr_failed)
> -- 
> 1.8.3.1
>
Yang Shi March 21, 2019, 4:21 p.m. UTC | #2
On 3/21/19 7:57 AM, Michal Hocko wrote:
> On Wed 20-03-19 08:27:39, Yang Shi wrote:
>> MPOL_MF_LAZY was added by commit b24f53a0bea3 ("mm: mempolicy: Add
>> MPOL_MF_LAZY"), then it was disabled by commit a720094ded8c ("mm:
>> mempolicy: Hide MPOL_NOOP and MPOL_MF_LAZY from userspace for now")
>> right away in 2012.  So, it is never ever exported to userspace.
>>
>> And, it looks nobody is interested in revisiting it since it was
>> disabled 7 years ago.  So, it sounds pointless to still keep it around.
> The above changelog owes us a lot of explanation about why this is
> safe and backward compatible. I am also not sure you can change
> MPOL_MF_INTERNAL because somebody still might use the flag from
> userspace and we want to guarantee it will have the exact same semantic.

Since MPOL_MF_LAZY is never exported to userspace (Mel helped to confirm 
this in the other thread), so I'm supposed it should be safe and 
backward compatible to userspace.

I'm also not sure if anyone use MPOL_MF_INTERNAL or not and how they use 
it in their applications, but how about keeping it unchanged?

Thanks,
Yang

>
>> Cc: Mel Gorman <mgorman@techsingularity.net>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
>> ---
>> Hi folks,
>> I'm not sure if you still would like to revisit it later. And, I may be
>> not the first one to try to remvoe it. IMHO, it sounds pointless to still
>> keep it around if nobody is interested in it.
>>
>>   include/uapi/linux/mempolicy.h |  3 +--
>>   mm/mempolicy.c                 | 13 -------------
>>   2 files changed, 1 insertion(+), 15 deletions(-)
>>
>> diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
>> index 3354774..eb52a7a 100644
>> --- a/include/uapi/linux/mempolicy.h
>> +++ b/include/uapi/linux/mempolicy.h
>> @@ -45,8 +45,7 @@ enum {
>>   #define MPOL_MF_MOVE	 (1<<1)	/* Move pages owned by this process to conform
>>   				   to policy */
>>   #define MPOL_MF_MOVE_ALL (1<<2)	/* Move every page to conform to policy */
>> -#define MPOL_MF_LAZY	 (1<<3)	/* Modifies '_MOVE:  lazy migrate on fault */
>> -#define MPOL_MF_INTERNAL (1<<4)	/* Internal flags start here */
>> +#define MPOL_MF_INTERNAL (1<<3)	/* Internal flags start here */
>>   
>>   #define MPOL_MF_VALID	(MPOL_MF_STRICT   | 	\
>>   			 MPOL_MF_MOVE     | 	\
>> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>> index af171cc..67886f4 100644
>> --- a/mm/mempolicy.c
>> +++ b/mm/mempolicy.c
>> @@ -593,15 +593,6 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
>>   
>>   	qp->prev = vma;
>>   
>> -	if (flags & MPOL_MF_LAZY) {
>> -		/* Similar to task_numa_work, skip inaccessible VMAs */
>> -		if (!is_vm_hugetlb_page(vma) &&
>> -			(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)) &&
>> -			!(vma->vm_flags & VM_MIXEDMAP))
>> -			change_prot_numa(vma, start, endvma);
>> -		return 1;
>> -	}
>> -
>>   	/* queue pages from current vma */
>>   	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
>>   		return 0;
>> @@ -1181,9 +1172,6 @@ static long do_mbind(unsigned long start, unsigned long len,
>>   	if (IS_ERR(new))
>>   		return PTR_ERR(new);
>>   
>> -	if (flags & MPOL_MF_LAZY)
>> -		new->flags |= MPOL_F_MOF;
>> -
>>   	/*
>>   	 * If we are using the default policy then operation
>>   	 * on discontinuous address spaces is okay after all
>> @@ -1226,7 +1214,6 @@ static long do_mbind(unsigned long start, unsigned long len,
>>   		int nr_failed = 0;
>>   
>>   		if (!list_empty(&pagelist)) {
>> -			WARN_ON_ONCE(flags & MPOL_MF_LAZY);
>>   			nr_failed = migrate_pages(&pagelist, new_page, NULL,
>>   				start, MIGRATE_SYNC, MR_MEMPOLICY_MBIND);
>>   			if (nr_failed)
>> -- 
>> 1.8.3.1
>>
Michal Hocko March 21, 2019, 4:51 p.m. UTC | #3
On Thu 21-03-19 09:21:39, Yang Shi wrote:
> 
> 
> On 3/21/19 7:57 AM, Michal Hocko wrote:
> > On Wed 20-03-19 08:27:39, Yang Shi wrote:
> > > MPOL_MF_LAZY was added by commit b24f53a0bea3 ("mm: mempolicy: Add
> > > MPOL_MF_LAZY"), then it was disabled by commit a720094ded8c ("mm:
> > > mempolicy: Hide MPOL_NOOP and MPOL_MF_LAZY from userspace for now")
> > > right away in 2012.  So, it is never ever exported to userspace.
> > > 
> > > And, it looks nobody is interested in revisiting it since it was
> > > disabled 7 years ago.  So, it sounds pointless to still keep it around.
> > The above changelog owes us a lot of explanation about why this is
> > safe and backward compatible. I am also not sure you can change
> > MPOL_MF_INTERNAL because somebody still might use the flag from
> > userspace and we want to guarantee it will have the exact same semantic.
> 
> Since MPOL_MF_LAZY is never exported to userspace (Mel helped to confirm
> this in the other thread), so I'm supposed it should be safe and backward
> compatible to userspace.

You didn't get my point. The flag is exported to the userspace and
nothing in the syscall entry path checks and masks it. So we really have
to preserve the semantic of the flag bit for ever.

> I'm also not sure if anyone use MPOL_MF_INTERNAL or not and how they use it
> in their applications, but how about keeping it unchanged?

You really have to. Because it is an offset of other MPLO flags for
internal usage.

That being said. Considering that we really have to preserve
MPOL_MF_LAZY value (we cannot even rename it because it is in uapi
headers and we do not want to break compilation). What is the point of
this change? Why is it an improvement? Yes, nobody is probably using
this because this is not respected in anything but the preferred mem
policy. At least that is the case from my quick glance. I might be still
wrong as it is quite easy to overlook all the consequences. So the risk
is non trivial while the benefit is not really clear to me. If you see
one, _document_ it. "Mel said it is not in use" is not a justification,
with all due respect.

> Thanks,
> Yang
> 
> > 
> > > Cc: Mel Gorman <mgorman@techsingularity.net>
> > > Cc: Michal Hocko <mhocko@suse.com>
> > > Cc: Vlastimil Babka <vbabka@suse.cz>
> > > Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
> > > ---
> > > Hi folks,
> > > I'm not sure if you still would like to revisit it later. And, I may be
> > > not the first one to try to remvoe it. IMHO, it sounds pointless to still
> > > keep it around if nobody is interested in it.
> > > 
> > >   include/uapi/linux/mempolicy.h |  3 +--
> > >   mm/mempolicy.c                 | 13 -------------
> > >   2 files changed, 1 insertion(+), 15 deletions(-)
> > > 
> > > diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
> > > index 3354774..eb52a7a 100644
> > > --- a/include/uapi/linux/mempolicy.h
> > > +++ b/include/uapi/linux/mempolicy.h
> > > @@ -45,8 +45,7 @@ enum {
> > >   #define MPOL_MF_MOVE	 (1<<1)	/* Move pages owned by this process to conform
> > >   				   to policy */
> > >   #define MPOL_MF_MOVE_ALL (1<<2)	/* Move every page to conform to policy */
> > > -#define MPOL_MF_LAZY	 (1<<3)	/* Modifies '_MOVE:  lazy migrate on fault */
> > > -#define MPOL_MF_INTERNAL (1<<4)	/* Internal flags start here */
> > > +#define MPOL_MF_INTERNAL (1<<3)	/* Internal flags start here */
> > >   #define MPOL_MF_VALID	(MPOL_MF_STRICT   | 	\
> > >   			 MPOL_MF_MOVE     | 	\
> > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > > index af171cc..67886f4 100644
> > > --- a/mm/mempolicy.c
> > > +++ b/mm/mempolicy.c
> > > @@ -593,15 +593,6 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
> > >   	qp->prev = vma;
> > > -	if (flags & MPOL_MF_LAZY) {
> > > -		/* Similar to task_numa_work, skip inaccessible VMAs */
> > > -		if (!is_vm_hugetlb_page(vma) &&
> > > -			(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)) &&
> > > -			!(vma->vm_flags & VM_MIXEDMAP))
> > > -			change_prot_numa(vma, start, endvma);
> > > -		return 1;
> > > -	}
> > > -
> > >   	/* queue pages from current vma */
> > >   	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
> > >   		return 0;
> > > @@ -1181,9 +1172,6 @@ static long do_mbind(unsigned long start, unsigned long len,
> > >   	if (IS_ERR(new))
> > >   		return PTR_ERR(new);
> > > -	if (flags & MPOL_MF_LAZY)
> > > -		new->flags |= MPOL_F_MOF;
> > > -
> > >   	/*
> > >   	 * If we are using the default policy then operation
> > >   	 * on discontinuous address spaces is okay after all
> > > @@ -1226,7 +1214,6 @@ static long do_mbind(unsigned long start, unsigned long len,
> > >   		int nr_failed = 0;
> > >   		if (!list_empty(&pagelist)) {
> > > -			WARN_ON_ONCE(flags & MPOL_MF_LAZY);
> > >   			nr_failed = migrate_pages(&pagelist, new_page, NULL,
> > >   				start, MIGRATE_SYNC, MR_MEMPOLICY_MBIND);
> > >   			if (nr_failed)
> > > -- 
> > > 1.8.3.1
> > >
Yang Shi March 21, 2019, 5:25 p.m. UTC | #4
On 3/21/19 9:51 AM, Michal Hocko wrote:
> On Thu 21-03-19 09:21:39, Yang Shi wrote:
>>
>> On 3/21/19 7:57 AM, Michal Hocko wrote:
>>> On Wed 20-03-19 08:27:39, Yang Shi wrote:
>>>> MPOL_MF_LAZY was added by commit b24f53a0bea3 ("mm: mempolicy: Add
>>>> MPOL_MF_LAZY"), then it was disabled by commit a720094ded8c ("mm:
>>>> mempolicy: Hide MPOL_NOOP and MPOL_MF_LAZY from userspace for now")
>>>> right away in 2012.  So, it is never ever exported to userspace.
>>>>
>>>> And, it looks nobody is interested in revisiting it since it was
>>>> disabled 7 years ago.  So, it sounds pointless to still keep it around.
>>> The above changelog owes us a lot of explanation about why this is
>>> safe and backward compatible. I am also not sure you can change
>>> MPOL_MF_INTERNAL because somebody still might use the flag from
>>> userspace and we want to guarantee it will have the exact same semantic.
>> Since MPOL_MF_LAZY is never exported to userspace (Mel helped to confirm
>> this in the other thread), so I'm supposed it should be safe and backward
>> compatible to userspace.
> You didn't get my point. The flag is exported to the userspace and
> nothing in the syscall entry path checks and masks it. So we really have
> to preserve the semantic of the flag bit for ever.

Thanks, I see you point. Yes, it is exported to userspace in some sense 
since it is in uapi header. But, it is never documented and 
MPOL_MF_VALID excludes it. mbind() does check and mask it. It would 
return -EINVAL if MPOL_MF_LAZY or any other undefined/invalid flag is 
set. See the below code snippet from do_mbind():

...
#define MPOL_MF_VALID    (MPOL_MF_STRICT   |     \
              MPOL_MF_MOVE     |     \
              MPOL_MF_MOVE_ALL)

if (flags & ~(unsigned long)MPOL_MF_VALID)
         return -EINVAL;

So, I don't think any application would really use the flag for mbind() 
unless it is aimed to test the -EINVAL. If just test program, it should 
be not considered as a regression.

>
>> I'm also not sure if anyone use MPOL_MF_INTERNAL or not and how they use it
>> in their applications, but how about keeping it unchanged?
> You really have to. Because it is an offset of other MPLO flags for
> internal usage.
>
> That being said. Considering that we really have to preserve
> MPOL_MF_LAZY value (we cannot even rename it because it is in uapi
> headers and we do not want to break compilation). What is the point of
> this change? Why is it an improvement? Yes, nobody is probably using
> this because this is not respected in anything but the preferred mem
> policy. At least that is the case from my quick glance. I might be still
> wrong as it is quite easy to overlook all the consequences. So the risk
> is non trivial while the benefit is not really clear to me. If you see
> one, _document_ it. "Mel said it is not in use" is not a justification,
> with all due respect.

As I elaborated above, mbind() syscall does check it and treat it as an 
invalid flag. MPOL_PREFERRED doesn't use it either, but just use 
MPOL_F_MOF directly.

Thanks,
Yang

>
>> Thanks,
>> Yang
>>
>>>> Cc: Mel Gorman <mgorman@techsingularity.net>
>>>> Cc: Michal Hocko <mhocko@suse.com>
>>>> Cc: Vlastimil Babka <vbabka@suse.cz>
>>>> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
>>>> ---
>>>> Hi folks,
>>>> I'm not sure if you still would like to revisit it later. And, I may be
>>>> not the first one to try to remvoe it. IMHO, it sounds pointless to still
>>>> keep it around if nobody is interested in it.
>>>>
>>>>    include/uapi/linux/mempolicy.h |  3 +--
>>>>    mm/mempolicy.c                 | 13 -------------
>>>>    2 files changed, 1 insertion(+), 15 deletions(-)
>>>>
>>>> diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
>>>> index 3354774..eb52a7a 100644
>>>> --- a/include/uapi/linux/mempolicy.h
>>>> +++ b/include/uapi/linux/mempolicy.h
>>>> @@ -45,8 +45,7 @@ enum {
>>>>    #define MPOL_MF_MOVE	 (1<<1)	/* Move pages owned by this process to conform
>>>>    				   to policy */
>>>>    #define MPOL_MF_MOVE_ALL (1<<2)	/* Move every page to conform to policy */
>>>> -#define MPOL_MF_LAZY	 (1<<3)	/* Modifies '_MOVE:  lazy migrate on fault */
>>>> -#define MPOL_MF_INTERNAL (1<<4)	/* Internal flags start here */
>>>> +#define MPOL_MF_INTERNAL (1<<3)	/* Internal flags start here */
>>>>    #define MPOL_MF_VALID	(MPOL_MF_STRICT   | 	\
>>>>    			 MPOL_MF_MOVE     | 	\
>>>> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>>>> index af171cc..67886f4 100644
>>>> --- a/mm/mempolicy.c
>>>> +++ b/mm/mempolicy.c
>>>> @@ -593,15 +593,6 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
>>>>    	qp->prev = vma;
>>>> -	if (flags & MPOL_MF_LAZY) {
>>>> -		/* Similar to task_numa_work, skip inaccessible VMAs */
>>>> -		if (!is_vm_hugetlb_page(vma) &&
>>>> -			(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)) &&
>>>> -			!(vma->vm_flags & VM_MIXEDMAP))
>>>> -			change_prot_numa(vma, start, endvma);
>>>> -		return 1;
>>>> -	}
>>>> -
>>>>    	/* queue pages from current vma */
>>>>    	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
>>>>    		return 0;
>>>> @@ -1181,9 +1172,6 @@ static long do_mbind(unsigned long start, unsigned long len,
>>>>    	if (IS_ERR(new))
>>>>    		return PTR_ERR(new);
>>>> -	if (flags & MPOL_MF_LAZY)
>>>> -		new->flags |= MPOL_F_MOF;
>>>> -
>>>>    	/*
>>>>    	 * If we are using the default policy then operation
>>>>    	 * on discontinuous address spaces is okay after all
>>>> @@ -1226,7 +1214,6 @@ static long do_mbind(unsigned long start, unsigned long len,
>>>>    		int nr_failed = 0;
>>>>    		if (!list_empty(&pagelist)) {
>>>> -			WARN_ON_ONCE(flags & MPOL_MF_LAZY);
>>>>    			nr_failed = migrate_pages(&pagelist, new_page, NULL,
>>>>    				start, MIGRATE_SYNC, MR_MEMPOLICY_MBIND);
>>>>    			if (nr_failed)
>>>> -- 
>>>> 1.8.3.1
>>>>
Mel Gorman March 21, 2019, 7:24 p.m. UTC | #5
On Thu, Mar 21, 2019 at 10:25:08AM -0700, Yang Shi wrote:
> 
> 
> On 3/21/19 9:51 AM, Michal Hocko wrote:
> > On Thu 21-03-19 09:21:39, Yang Shi wrote:
> > > 
> > > On 3/21/19 7:57 AM, Michal Hocko wrote:
> > > > On Wed 20-03-19 08:27:39, Yang Shi wrote:
> > > > > MPOL_MF_LAZY was added by commit b24f53a0bea3 ("mm: mempolicy: Add
> > > > > MPOL_MF_LAZY"), then it was disabled by commit a720094ded8c ("mm:
> > > > > mempolicy: Hide MPOL_NOOP and MPOL_MF_LAZY from userspace for now")
> > > > > right away in 2012.  So, it is never ever exported to userspace.
> > > > > 
> > > > > And, it looks nobody is interested in revisiting it since it was
> > > > > disabled 7 years ago.  So, it sounds pointless to still keep it around.
> > > > The above changelog owes us a lot of explanation about why this is
> > > > safe and backward compatible. I am also not sure you can change
> > > > MPOL_MF_INTERNAL because somebody still might use the flag from
> > > > userspace and we want to guarantee it will have the exact same semantic.
> > > Since MPOL_MF_LAZY is never exported to userspace (Mel helped to confirm
> > > this in the other thread), so I'm supposed it should be safe and backward
> > > compatible to userspace.
> > You didn't get my point. The flag is exported to the userspace and
> > nothing in the syscall entry path checks and masks it. So we really have
> > to preserve the semantic of the flag bit for ever.
> 
> Thanks, I see you point. Yes, it is exported to userspace in some sense
> since it is in uapi header. But, it is never documented and MPOL_MF_VALID
> excludes it. mbind() does check and mask it. It would return -EINVAL if
> MPOL_MF_LAZY or any other undefined/invalid flag is set. See the below code
> snippet from do_mbind():
> 

That does not explain the motivation behind removing it or what we gain.
Yes, it's undocumented and it's unlikely that anyone will. Any potential
semantics are almost meaningless with mbind but there are two
possibilities. One, mbind is relaxed to allow migration within allowed
nodes and two, interleave could initially interleave but allow migration
to local node to get a mix of average performance at init and local
performance over time. No one tried taking that option so far but it
appears harmless to leave it alone too.
Michal Hocko March 21, 2019, 7:45 p.m. UTC | #6
On Thu 21-03-19 10:25:08, Yang Shi wrote:
> 
> 
> On 3/21/19 9:51 AM, Michal Hocko wrote:
> > On Thu 21-03-19 09:21:39, Yang Shi wrote:
> > > 
> > > On 3/21/19 7:57 AM, Michal Hocko wrote:
> > > > On Wed 20-03-19 08:27:39, Yang Shi wrote:
> > > > > MPOL_MF_LAZY was added by commit b24f53a0bea3 ("mm: mempolicy: Add
> > > > > MPOL_MF_LAZY"), then it was disabled by commit a720094ded8c ("mm:
> > > > > mempolicy: Hide MPOL_NOOP and MPOL_MF_LAZY from userspace for now")
> > > > > right away in 2012.  So, it is never ever exported to userspace.
> > > > > 
> > > > > And, it looks nobody is interested in revisiting it since it was
> > > > > disabled 7 years ago.  So, it sounds pointless to still keep it around.
> > > > The above changelog owes us a lot of explanation about why this is
> > > > safe and backward compatible. I am also not sure you can change
> > > > MPOL_MF_INTERNAL because somebody still might use the flag from
> > > > userspace and we want to guarantee it will have the exact same semantic.
> > > Since MPOL_MF_LAZY is never exported to userspace (Mel helped to confirm
> > > this in the other thread), so I'm supposed it should be safe and backward
> > > compatible to userspace.
> > You didn't get my point. The flag is exported to the userspace and
> > nothing in the syscall entry path checks and masks it. So we really have
> > to preserve the semantic of the flag bit for ever.
> 
> Thanks, I see you point. Yes, it is exported to userspace in some sense
> since it is in uapi header. But, it is never documented and MPOL_MF_VALID
> excludes it. mbind() does check and mask it. It would return -EINVAL if
> MPOL_MF_LAZY or any other undefined/invalid flag is set. See the below code
> snippet from do_mbind():
> 
> ...
> #define MPOL_MF_VALID    (MPOL_MF_STRICT   |     \
>              MPOL_MF_MOVE     |     \
>              MPOL_MF_MOVE_ALL)
> 
> if (flags & ~(unsigned long)MPOL_MF_VALID)
>         return -EINVAL;
> 
> So, I don't think any application would really use the flag for mbind()
> unless it is aimed to test the -EINVAL. If just test program, it should be
> not considered as a regression.

I have overlook that MPOL_MF_VALID doesn't include MPOL_MF_LAZY. Anyway,
my argument still holds that the bit has to be reserved for ever because
it used to be valid at some point of time and not returning EINVAL could
imply you are running on the kernel which supports the flag.
 
> > > I'm also not sure if anyone use MPOL_MF_INTERNAL or not and how they use it
> > > in their applications, but how about keeping it unchanged?
> > You really have to. Because it is an offset of other MPLO flags for
> > internal usage.
> > 
> > That being said. Considering that we really have to preserve
> > MPOL_MF_LAZY value (we cannot even rename it because it is in uapi
> > headers and we do not want to break compilation). What is the point of
> > this change? Why is it an improvement? Yes, nobody is probably using
> > this because this is not respected in anything but the preferred mem
> > policy. At least that is the case from my quick glance. I might be still
> > wrong as it is quite easy to overlook all the consequences. So the risk
> > is non trivial while the benefit is not really clear to me. If you see
> > one, _document_ it. "Mel said it is not in use" is not a justification,
> > with all due respect.
> 
> As I elaborated above, mbind() syscall does check it and treat it as an
> invalid flag. MPOL_PREFERRED doesn't use it either, but just use MPOL_F_MOF
> directly.

As Mel already pointed out. This doesn't really sound like a sound
argument. Say we would remove those few lines of code and preserve the
flag for future reservation of the flag bit. I would bet my head that it
will not be long before somebody just goes and clean it up and remove
because the flag is unused. So you would have to put a note explaining
why this has to be preserved. Maybe the current code is better to
document that. It would be much more sound to remove the code if it was
causing a measurable overhead or a maintenance burden. Is any of that
the case?
Yang Shi March 21, 2019, 11:25 p.m. UTC | #7
On 3/21/19 12:45 PM, Michal Hocko wrote:
> On Thu 21-03-19 10:25:08, Yang Shi wrote:
>>
>> On 3/21/19 9:51 AM, Michal Hocko wrote:
>>> On Thu 21-03-19 09:21:39, Yang Shi wrote:
>>>> On 3/21/19 7:57 AM, Michal Hocko wrote:
>>>>> On Wed 20-03-19 08:27:39, Yang Shi wrote:
>>>>>> MPOL_MF_LAZY was added by commit b24f53a0bea3 ("mm: mempolicy: Add
>>>>>> MPOL_MF_LAZY"), then it was disabled by commit a720094ded8c ("mm:
>>>>>> mempolicy: Hide MPOL_NOOP and MPOL_MF_LAZY from userspace for now")
>>>>>> right away in 2012.  So, it is never ever exported to userspace.
>>>>>>
>>>>>> And, it looks nobody is interested in revisiting it since it was
>>>>>> disabled 7 years ago.  So, it sounds pointless to still keep it around.
>>>>> The above changelog owes us a lot of explanation about why this is
>>>>> safe and backward compatible. I am also not sure you can change
>>>>> MPOL_MF_INTERNAL because somebody still might use the flag from
>>>>> userspace and we want to guarantee it will have the exact same semantic.
>>>> Since MPOL_MF_LAZY is never exported to userspace (Mel helped to confirm
>>>> this in the other thread), so I'm supposed it should be safe and backward
>>>> compatible to userspace.
>>> You didn't get my point. The flag is exported to the userspace and
>>> nothing in the syscall entry path checks and masks it. So we really have
>>> to preserve the semantic of the flag bit for ever.
>> Thanks, I see you point. Yes, it is exported to userspace in some sense
>> since it is in uapi header. But, it is never documented and MPOL_MF_VALID
>> excludes it. mbind() does check and mask it. It would return -EINVAL if
>> MPOL_MF_LAZY or any other undefined/invalid flag is set. See the below code
>> snippet from do_mbind():
>>
>> ...
>> #define MPOL_MF_VALID    (MPOL_MF_STRICT   |     \
>>               MPOL_MF_MOVE     |     \
>>               MPOL_MF_MOVE_ALL)
>>
>> if (flags & ~(unsigned long)MPOL_MF_VALID)
>>          return -EINVAL;
>>
>> So, I don't think any application would really use the flag for mbind()
>> unless it is aimed to test the -EINVAL. If just test program, it should be
>> not considered as a regression.
> I have overlook that MPOL_MF_VALID doesn't include MPOL_MF_LAZY. Anyway,
> my argument still holds that the bit has to be reserved for ever because
> it used to be valid at some point of time and not returning EINVAL could
> imply you are running on the kernel which supports the flag.

I'd say it is not valid since very beginning. MPOL_MF_LAZY was added by 
commit b24f53a0bea3 ("mm: mempolicy: Add
MPOL_MF_LAZY"), then it was hidden by commit a720094ded8c ("mm:
mempolicy: Hide MPOL_NOOP and MPOL_MF_LAZY from userspace for now"). 
And, git describe --contains shows:

US-143344MP:linux yang.s$ git describe --contains b24f53a0bea3
v3.8-rc1~92^2~27
US-143344MP:linux yang.s$ git describe --contains a720094ded8c
v3.8-rc1~92^2~25

This is why I thought it is never ever exported to userspace.

>   
>>>> I'm also not sure if anyone use MPOL_MF_INTERNAL or not and how they use it
>>>> in their applications, but how about keeping it unchanged?
>>> You really have to. Because it is an offset of other MPLO flags for
>>> internal usage.
>>>
>>> That being said. Considering that we really have to preserve
>>> MPOL_MF_LAZY value (we cannot even rename it because it is in uapi
>>> headers and we do not want to break compilation). What is the point of
>>> this change? Why is it an improvement? Yes, nobody is probably using
>>> this because this is not respected in anything but the preferred mem
>>> policy. At least that is the case from my quick glance. I might be still
>>> wrong as it is quite easy to overlook all the consequences. So the risk
>>> is non trivial while the benefit is not really clear to me. If you see
>>> one, _document_ it. "Mel said it is not in use" is not a justification,
>>> with all due respect.
>> As I elaborated above, mbind() syscall does check it and treat it as an
>> invalid flag. MPOL_PREFERRED doesn't use it either, but just use MPOL_F_MOF
>> directly.
> As Mel already pointed out. This doesn't really sound like a sound
> argument. Say we would remove those few lines of code and preserve the
> flag for future reservation of the flag bit. I would bet my head that it
> will not be long before somebody just goes and clean it up and remove
> because the flag is unused. So you would have to put a note explaining
> why this has to be preserved. Maybe the current code is better to
> document that. It would be much more sound to remove the code if it was
> causing a measurable overhead or a maintenance burden. Is any of that
> the case?

As what I found out, I just thought it may be dead code, if so why not 
remove it otherwise we may have to keep maintaining the unused code.

Thanks,
Yang

>
Yang Shi March 21, 2019, 11:29 p.m. UTC | #8
On 3/21/19 12:24 PM, Mel Gorman wrote:
> On Thu, Mar 21, 2019 at 10:25:08AM -0700, Yang Shi wrote:
>>
>> On 3/21/19 9:51 AM, Michal Hocko wrote:
>>> On Thu 21-03-19 09:21:39, Yang Shi wrote:
>>>> On 3/21/19 7:57 AM, Michal Hocko wrote:
>>>>> On Wed 20-03-19 08:27:39, Yang Shi wrote:
>>>>>> MPOL_MF_LAZY was added by commit b24f53a0bea3 ("mm: mempolicy: Add
>>>>>> MPOL_MF_LAZY"), then it was disabled by commit a720094ded8c ("mm:
>>>>>> mempolicy: Hide MPOL_NOOP and MPOL_MF_LAZY from userspace for now")
>>>>>> right away in 2012.  So, it is never ever exported to userspace.
>>>>>>
>>>>>> And, it looks nobody is interested in revisiting it since it was
>>>>>> disabled 7 years ago.  So, it sounds pointless to still keep it around.
>>>>> The above changelog owes us a lot of explanation about why this is
>>>>> safe and backward compatible. I am also not sure you can change
>>>>> MPOL_MF_INTERNAL because somebody still might use the flag from
>>>>> userspace and we want to guarantee it will have the exact same semantic.
>>>> Since MPOL_MF_LAZY is never exported to userspace (Mel helped to confirm
>>>> this in the other thread), so I'm supposed it should be safe and backward
>>>> compatible to userspace.
>>> You didn't get my point. The flag is exported to the userspace and
>>> nothing in the syscall entry path checks and masks it. So we really have
>>> to preserve the semantic of the flag bit for ever.
>> Thanks, I see you point. Yes, it is exported to userspace in some sense
>> since it is in uapi header. But, it is never documented and MPOL_MF_VALID
>> excludes it. mbind() does check and mask it. It would return -EINVAL if
>> MPOL_MF_LAZY or any other undefined/invalid flag is set. See the below code
>> snippet from do_mbind():
>>
> That does not explain the motivation behind removing it or what we gain.
> Yes, it's undocumented and it's unlikely that anyone will. Any potential
> semantics are almost meaningless with mbind but there are two
> possibilities. One, mbind is relaxed to allow migration within allowed
> nodes and two, interleave could initially interleave but allow migration
> to local node to get a mix of average performance at init and local
> performance over time. No one tried taking that option so far but it
> appears harmless to leave it alone too.

Yes, actually this is what I argued, no one tried taking the flag for 
long time. I also agree it sounds harmless to leave it. I just thought 
it may be dead code, if so why not just remove it.

Thanks,
Yang

>
diff mbox series

Patch

diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
index 3354774..eb52a7a 100644
--- a/include/uapi/linux/mempolicy.h
+++ b/include/uapi/linux/mempolicy.h
@@ -45,8 +45,7 @@  enum {
 #define MPOL_MF_MOVE	 (1<<1)	/* Move pages owned by this process to conform
 				   to policy */
 #define MPOL_MF_MOVE_ALL (1<<2)	/* Move every page to conform to policy */
-#define MPOL_MF_LAZY	 (1<<3)	/* Modifies '_MOVE:  lazy migrate on fault */
-#define MPOL_MF_INTERNAL (1<<4)	/* Internal flags start here */
+#define MPOL_MF_INTERNAL (1<<3)	/* Internal flags start here */
 
 #define MPOL_MF_VALID	(MPOL_MF_STRICT   | 	\
 			 MPOL_MF_MOVE     | 	\
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index af171cc..67886f4 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -593,15 +593,6 @@  static int queue_pages_test_walk(unsigned long start, unsigned long end,
 
 	qp->prev = vma;
 
-	if (flags & MPOL_MF_LAZY) {
-		/* Similar to task_numa_work, skip inaccessible VMAs */
-		if (!is_vm_hugetlb_page(vma) &&
-			(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)) &&
-			!(vma->vm_flags & VM_MIXEDMAP))
-			change_prot_numa(vma, start, endvma);
-		return 1;
-	}
-
 	/* queue pages from current vma */
 	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
 		return 0;
@@ -1181,9 +1172,6 @@  static long do_mbind(unsigned long start, unsigned long len,
 	if (IS_ERR(new))
 		return PTR_ERR(new);
 
-	if (flags & MPOL_MF_LAZY)
-		new->flags |= MPOL_F_MOF;
-
 	/*
 	 * If we are using the default policy then operation
 	 * on discontinuous address spaces is okay after all
@@ -1226,7 +1214,6 @@  static long do_mbind(unsigned long start, unsigned long len,
 		int nr_failed = 0;
 
 		if (!list_empty(&pagelist)) {
-			WARN_ON_ONCE(flags & MPOL_MF_LAZY);
 			nr_failed = migrate_pages(&pagelist, new_page, NULL,
 				start, MIGRATE_SYNC, MR_MEMPOLICY_MBIND);
 			if (nr_failed)