diff mbox series

mm: introduce MAP_FIXED_HUGETLB_LEN to mmap()

Message ID 1585313944-8627-1-git-send-email-lixinhai.lxh@gmail.com (mailing list archive)
State New, archived
Headers show
Series mm: introduce MAP_FIXED_HUGETLB_LEN to mmap() | expand

Commit Message

Li Xinhai March 27, 2020, 12:59 p.m. UTC
The purpose of MAP_FIXED_HUGETLB_LEN is to check whether the parameter
length is valid or not according to the target file's huge page size.
When it is used, if length is not aligned to underlying huge page size,
mmap() is failed with errno set to EINVAL. When it is not used, the
current semantic is maintained, i.e., length is round up to underlying
huge page size.

In current code, the vma related call, except mmap, are all consider
not correctly aligned length as invalid parameter, including mprotect,
munmap, mlock, etc., by checking through hugetlb_vm_op_split. So, user
will see failure, after successfully call mmap, although using same
length parameter to other mapping syscall.

With MAP_FIXED_HUGETLB_LEN, user can choose to check if length is
correctly aligned at first place when call mmap, instead of failure after
mapping has been created.

Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 include/uapi/asm-generic/mman-common.h |  1 +
 mm/mmap.c                              | 17 +++++++++++++++--
 2 files changed, 16 insertions(+), 2 deletions(-)

Comments

John Hubbard March 27, 2020, 7:12 p.m. UTC | #1
On 3/27/20 5:59 AM, Li Xinhai wrote:
> The purpose of MAP_FIXED_HUGETLB_LEN is to check whether the parameter
> length is valid or not according to the target file's huge page size.
> When it is used, if length is not aligned to underlying huge page size,
> mmap() is failed with errno set to EINVAL. When it is not used, the
> current semantic is maintained, i.e., length is round up to underlying
> huge page size.
> 
> In current code, the vma related call, except mmap, are all consider
> not correctly aligned length as invalid parameter, including mprotect,
> munmap, mlock, etc., by checking through hugetlb_vm_op_split. So, user
> will see failure, after successfully call mmap, although using same
> length parameter to other mapping syscall.
> 
> With MAP_FIXED_HUGETLB_LEN, user can choose to check if length is
> correctly aligned at first place when call mmap, instead of failure after
> mapping has been created.

Hi Li,

This is not worth creating a new MAP_ flag. If you look at the existing flags
you will see that they are both limited and carefully chosen, so as to cover
a reasonable chunk of functionality per flag. We don't just drop in a flag
for tiny corner cases like this one.

btw, remember that user API changes require man pages updates as well. And
that the API has to be supported forever. And that if we use up valuable
flag slots on trivia then we'll run out of flags quite soon, and won't be
able to do broader, more important upgrades.

Also, we need to include a user space API mailing list for things that
affect that. Adding them now: Linux API <linux-api@vger.kernel.org>
The man pages mailing list will also be needed if we go there.

Let's take a closer look at your problem and see what it takes to solve it.
If we need some sort of flag to mmap() or other routines, fine. But so far,
I can see at least two solutions that are much easier:

> 
> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> ---
>   include/uapi/asm-generic/mman-common.h |  1 +
>   mm/mmap.c                              | 17 +++++++++++++++--
>   2 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
> index f94f65d..1c9ba97 100644
> --- a/include/uapi/asm-generic/mman-common.h
> +++ b/include/uapi/asm-generic/mman-common.h
> @@ -21,6 +21,7 @@
>   #define MAP_TYPE	0x0f		/* Mask for type of mapping */
>   #define MAP_FIXED	0x10		/* Interpret addr exactly */
>   #define MAP_ANONYMOUS	0x20		/* don't use a file */
> +#define MAP_FIXED_HUGETLB_LEN	0x40	/* check alignment of addr, length, offset */
>   
>   /* 0x0100 - 0x4000 flags are defined in asm-generic/mman.h */
>   #define MAP_POPULATE		0x008000	/* populate (prefault) pagetables */
> diff --git a/mm/mmap.c b/mm/mmap.c
> index d681a20..50a12e0 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1560,9 +1560,18 @@ unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
>   		file = fget(fd);
>   		if (!file)
>   			return -EBADF;
> -		if (is_file_hugepages(file))
> -			len = ALIGN(len, huge_page_size(hstate_file(file)));

Solution idea #1: because your proposal here requires changing the calling
(user space) code by adding the new flag to the mmap() call, it's therefore
clear that other changes to the calling code are also possible. So what
about simply doing the length check first, before calling mmap()? In other
words, do the user space equivalent of the above two lines that you're deleting?
That avoids your stated problem of calling mmap twice.


> +
>   		retval = -EINVAL;
> +		if (is_file_hugepages(file)) {
> +			struct hstate *hs = hstate_file(file);
> +
> +			if (flags & MAP_FIXED_HUGETLB_LEN &&
> +				len & ~(huge_page_mask(hs)))
> +				goto out_fput;
> +
> +			len = ALIGN(len, huge_page_size(hs));


Solution idea #2: just do the length check unconditionally here (without looking
at a new flag), and return an error if it is not aligned. And same thing for the
MAP_HUGETLB case below. And delete the "len = ALIGN(len, huge_page_size(hs));" in
both cases.

That would still require a man page update, and consensus that it won't Break
The World, but it's possible (I really don't know) that this is a more common
and desirable behavior.

Let's see if anyone else weighs in about this.


> +		}
> +
>   		if (unlikely(flags & MAP_HUGETLB && !is_file_hugepages(file)))
>   			goto out_fput;
>   	} else if (flags & MAP_HUGETLB) {
> @@ -1573,6 +1582,10 @@ unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
>   		if (!hs)
>   			return -EINVAL;
>   
> +		if (flags & MAP_FIXED_HUGETLB_LEN &&
> +			len & ~(huge_page_mask(hs)))
> +			return -EINVAL;
> +
>   		len = ALIGN(len, huge_page_size(hs));
>   		/*
>   		 * VM_NORESERVE is used because the reservations will be
> 


thanks,
Mike Kravetz March 28, 2020, 1:31 a.m. UTC | #2
On 3/27/20 12:12 PM, John Hubbard wrote:
> On 3/27/20 5:59 AM, Li Xinhai wrote:
>> The purpose of MAP_FIXED_HUGETLB_LEN is to check whether the parameter
>> length is valid or not according to the target file's huge page size.
>> When it is used, if length is not aligned to underlying huge page size,
>> mmap() is failed with errno set to EINVAL. When it is not used, the
>> current semantic is maintained, i.e., length is round up to underlying
>> huge page size.
>>
>> In current code, the vma related call, except mmap, are all consider
>> not correctly aligned length as invalid parameter, including mprotect,
>> munmap, mlock, etc., by checking through hugetlb_vm_op_split. So, user
>> will see failure, after successfully call mmap, although using same
>> length parameter to other mapping syscall.
>>
>> With MAP_FIXED_HUGETLB_LEN, user can choose to check if length is
>> correctly aligned at first place when call mmap, instead of failure after
>> mapping has been created.
> 
> Hi Li,
> 
> This is not worth creating a new MAP_ flag. If you look at the existing flags
> you will see that they are both limited and carefully chosen, so as to cover
> a reasonable chunk of functionality per flag. We don't just drop in a flag
> for tiny corner cases like this one.
> 
> btw, remember that user API changes require man pages updates as well. And
> that the API has to be supported forever. And that if we use up valuable
> flag slots on trivia then we'll run out of flags quite soon, and won't be
> able to do broader, more important upgrades.
> 
> Also, we need to include a user space API mailing list for things that
> affect that. Adding them now: Linux API <linux-api@vger.kernel.org>
> The man pages mailing list will also be needed if we go there.
> 
> Let's take a closer look at your problem and see what it takes to solve it.
> If we need some sort of flag to mmap() or other routines, fine. But so far,
> I can see at least two solutions that are much easier:

I too question the motivation for this patch.  Is it simply to eliminate some
of the hugetlb special behavior and make it behave more like the rest of mm?

> Solution idea #2: just do the length check unconditionally here (without looking
> at a new flag), and return an error if it is not aligned. And same thing for the
> MAP_HUGETLB case below. And delete the "len = ALIGN(len, huge_page_size(hs));" in
> both cases.
> 
> That would still require a man page update, and consensus that it won't Break
> The World, but it's possible (I really don't know) that this is a more common
> and desirable behavior.
> 
> Let's see if anyone else weighs in about this.

That certainly would be the easiest thing to do.  However, I'm guessing
the current behavior was added when hugetlb mmap support was added.  There
is no telling how many applications might break if we change the behavior.
I'm guessing this is the reason Li chose to only change the behavior if
a new flag was specified.
Li Xinhai March 28, 2020, 2:14 a.m. UTC | #3
On 2020-03-28 at 03:12 John Hubbard wrote:
>On 3/27/20 5:59 AM, Li Xinhai wrote:
>> The purpose of MAP_FIXED_HUGETLB_LEN is to check whether the parameter
>> length is valid or not according to the target file's huge page size.
>> When it is used, if length is not aligned to underlying huge page size,
>> mmap() is failed with errno set to EINVAL. When it is not used, the
>> current semantic is maintained, i.e., length is round up to underlying
>> huge page size.
>>
>> In current code, the vma related call, except mmap, are all consider
>> not correctly aligned length as invalid parameter, including mprotect,
>> munmap, mlock, etc., by checking through hugetlb_vm_op_split. So, user
>> will see failure, after successfully call mmap, although using same
>> length parameter to other mapping syscall.
>>
>> With MAP_FIXED_HUGETLB_LEN, user can choose to check if length is
>> correctly aligned at first place when call mmap, instead of failure after
>> mapping has been created.
>
>Hi Li,
>
>This is not worth creating a new MAP_ flag. If you look at the existing flags
>you will see that they are both limited and carefully chosen, so as to cover
>a reasonable chunk of functionality per flag. We don't just drop in a flag
>for tiny corner cases like this one.
>
>btw, remember that user API changes require man pages updates as well. And
>that the API has to be supported forever. And that if we use up valuable
>flag slots on trivia then we'll run out of flags quite soon, and won't be
>able to do broader, more important upgrades.
>
>Also, we need to include a user space API mailing list for things that
>affect that. Adding them now: Linux API <linux-api@vger.kernel.org>
>The man pages mailing list will also be needed if we go there.
>
>Let's take a closer look at your problem and see what it takes to solve it.
>If we need some sort of flag to mmap() or other routines, fine. But so far,
>I can see at least two solutions that are much easier:
>
>>
>> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> ---
>>   include/uapi/asm-generic/mman-common.h |  1 +
>>   mm/mmap.c                              | 17 +++++++++++++++--
>>   2 files changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
>> index f94f65d..1c9ba97 100644
>> --- a/include/uapi/asm-generic/mman-common.h
>> +++ b/include/uapi/asm-generic/mman-common.h
>> @@ -21,6 +21,7 @@
>>   #define MAP_TYPE	0x0f	/* Mask for type of mapping */
>>   #define MAP_FIXED	0x10	/* Interpret addr exactly */
>>   #define MAP_ANONYMOUS	0x20	/* don't use a file */
>> +#define MAP_FIXED_HUGETLB_LEN	0x40	/* check alignment of addr, length, offset */
>>  
>>   /* 0x0100 - 0x4000 flags are defined in asm-generic/mman.h */
>>   #define MAP_POPULATE	0x008000	/* populate (prefault) pagetables */
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index d681a20..50a12e0 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -1560,9 +1560,18 @@ unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
>>   file = fget(fd);
>>   if (!file)
>>   return -EBADF;
>> -	if (is_file_hugepages(file))
>> -	len = ALIGN(len, huge_page_size(hstate_file(file)));
>
>Solution idea #1: because your proposal here requires changing the calling
>(user space) code by adding the new flag to the mmap() call, it's therefore
>clear that other changes to the calling code are also possible. So what
>about simply doing the length check first, before calling mmap()? In other
>words, do the user space equivalent of the above two lines that you're deleting? 

Yes, agree, and I am using this check after encounted unexpected munmap failure.

>That avoids your stated problem of calling mmap twice.
>
>
>> +
>>   retval = -EINVAL;
>> +	if (is_file_hugepages(file)) {
>> +	struct hstate *hs = hstate_file(file);
>> +
>> +	if (flags & MAP_FIXED_HUGETLB_LEN &&
>> +	len & ~(huge_page_mask(hs)))
>> +	goto out_fput;
>> +
>> +	len = ALIGN(len, huge_page_size(hs));
>
>
>Solution idea #2: just do the length check unconditionally here (without looking
>at a new flag), and return an error if it is not aligned. And same thing for the
>MAP_HUGETLB case below. And delete the "len = ALIGN(len, huge_page_size(hs));" in
>both cases. 

Same thoughts as you. I was planed to post patch in this way(prefer not
inventing new flag), and we wil have consistent behavior that already provided
by hugetlbfs, the checking by get_unmapped_area() from mmap() path and split()
from other syscall have same logic for lentgh(i.e., report EINVAL if not aligned).

>
>That would still require a man page update, and consensus that it won't Break
>The World, but it's possible (I really don't know) that this is a more common
>and desirable behavior. 

Yes, consistent behavior of hugetlb mapping is desirable.
For mapping of normal 4K pages, we see consistent behavior among relevant syscall,
they all round up 'length' to page size, although this is different from hugetlb
mapping.

>
>Let's see if anyone else weighs in about this. 
>  
>
>> +	}
>> +
>>   if (unlikely(flags & MAP_HUGETLB && !is_file_hugepages(file)))
>>   goto out_fput;
>>   } else if (flags & MAP_HUGETLB) {
>> @@ -1573,6 +1582,10 @@ unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
>>   if (!hs)
>>   return -EINVAL;
>>  
>> +	if (flags & MAP_FIXED_HUGETLB_LEN &&
>> +	len & ~(huge_page_mask(hs)))
>> +	return -EINVAL;
>> +
>>   len = ALIGN(len, huge_page_size(hs));
>>   /*
>>   * VM_NORESERVE is used because the reservations will be
>>
>
>
>thanks,
>--
>John Hubbard
>NVIDIA
Li Xinhai March 28, 2020, 2:19 a.m. UTC | #4
On 2020-03-28 at 09:31 Mike Kravetz wrote:
>On 3/27/20 12:12 PM, John Hubbard wrote:
>> On 3/27/20 5:59 AM, Li Xinhai wrote:
>>> The purpose of MAP_FIXED_HUGETLB_LEN is to check whether the parameter
>>> length is valid or not according to the target file's huge page size.
>>> When it is used, if length is not aligned to underlying huge page size,
>>> mmap() is failed with errno set to EINVAL. When it is not used, the
>>> current semantic is maintained, i.e., length is round up to underlying
>>> huge page size.
>>>
>>> In current code, the vma related call, except mmap, are all consider
>>> not correctly aligned length as invalid parameter, including mprotect,
>>> munmap, mlock, etc., by checking through hugetlb_vm_op_split. So, user
>>> will see failure, after successfully call mmap, although using same
>>> length parameter to other mapping syscall.
>>>
>>> With MAP_FIXED_HUGETLB_LEN, user can choose to check if length is
>>> correctly aligned at first place when call mmap, instead of failure after
>>> mapping has been created.
>>
>> Hi Li,
>>
>> This is not worth creating a new MAP_ flag. If you look at the existing flags
>> you will see that they are both limited and carefully chosen, so as to cover
>> a reasonable chunk of functionality per flag. We don't just drop in a flag
>> for tiny corner cases like this one.
>>
>> btw, remember that user API changes require man pages updates as well. And
>> that the API has to be supported forever. And that if we use up valuable
>> flag slots on trivia then we'll run out of flags quite soon, and won't be
>> able to do broader, more important upgrades.
>>
>> Also, we need to include a user space API mailing list for things that
>> affect that. Adding them now: Linux API <linux-api@vger.kernel.org>
>> The man pages mailing list will also be needed if we go there.
>>
>> Let's take a closer look at your problem and see what it takes to solve it.
>> If we need some sort of flag to mmap() or other routines, fine. But so far,
>> I can see at least two solutions that are much easier:
>
>I too question the motivation for this patch.  Is it simply to eliminate some
>of the hugetlb special behavior and make it behave more like the rest of mm?
> 
>> Solution idea #2: just do the length check unconditionally here (without looking
>> at a new flag), and return an error if it is not aligned. And same thing for the
>> MAP_HUGETLB case below. And delete the "len = ALIGN(len, huge_page_size(hs));" in
>> both cases.
>>
>> That would still require a man page update, and consensus that it won't Break
>> The World, but it's possible (I really don't know) that this is a more common
>> and desirable behavior.
>>
>> Let's see if anyone else weighs in about this.
>
>That certainly would be the easiest thing to do.  However, I'm guessing
>the current behavior was added when hugetlb mmap support was added.  

>There is no telling how many applications might break if we change the behavior.
>I'm guessing this is the reason Li chose to only change the behavior if
>a new flag was specified. 
Yes, I was considering this change would break something.

>--
>Mike Kravetz
Li Xinhai March 29, 2020, 3:20 a.m. UTC | #5
On 2020-03-28 at 10:19 Li Xinhai wrote:
>On 2020-03-28 at 09:31 Mike Kravetz wrote:
>>On 3/27/20 12:12 PM, John Hubbard wrote:
>>> On 3/27/20 5:59 AM, Li Xinhai wrote:
>>>> The purpose of MAP_FIXED_HUGETLB_LEN is to check whether the parameter
>>>> length is valid or not according to the target file's huge page size.
>>>> When it is used, if length is not aligned to underlying huge page size,
>>>> mmap() is failed with errno set to EINVAL. When it is not used, the
>>>> current semantic is maintained, i.e., length is round up to underlying
>>>> huge page size.
>>>>
>>>> In current code, the vma related call, except mmap, are all consider
>>>> not correctly aligned length as invalid parameter, including mprotect,
>>>> munmap, mlock, etc., by checking through hugetlb_vm_op_split. So, user
>>>> will see failure, after successfully call mmap, although using same
>>>> length parameter to other mapping syscall.
>>>>
>>>> With MAP_FIXED_HUGETLB_LEN, user can choose to check if length is
>>>> correctly aligned at first place when call mmap, instead of failure after
>>>> mapping has been created.
>>>
>>> Hi Li,
>>>
>>> This is not worth creating a new MAP_ flag. If you look at the existing flags
>>> you will see that they are both limited and carefully chosen, so as to cover
>>> a reasonable chunk of functionality per flag. We don't just drop in a flag
>>> for tiny corner cases like this one.
>>>
>>> btw, remember that user API changes require man pages updates as well. And
>>> that the API has to be supported forever. And that if we use up valuable
>>> flag slots on trivia then we'll run out of flags quite soon, and won't be
>>> able to do broader, more important upgrades.
>>>
>>> Also, we need to include a user space API mailing list for things that
>>> affect that. Adding them now: Linux API <linux-api@vger.kernel.org>
>>> The man pages mailing list will also be needed if we go there.
>>>
>>> Let's take a closer look at your problem and see what it takes to solve it.
>>> If we need some sort of flag to mmap() or other routines, fine. But so far,
>>> I can see at least two solutions that are much easier:
>>
>>I too question the motivation for this patch.  Is it simply to eliminate some
>>of the hugetlb special behavior and make it behave more like the rest of mm?
>>
>>> Solution idea #2: just do the length check unconditionally here (without looking
>>> at a new flag), and return an error if it is not aligned. And same thing for the
>>> MAP_HUGETLB case below. And delete the "len = ALIGN(len, huge_page_size(hs));" in
>>> both cases.
>>>
>>> That would still require a man page update, and consensus that it won't Break
>>> The World, but it's possible (I really don't know) that this is a more common
>>> and desirable behavior.
>>>
>>> Let's see if anyone else weighs in about this.
>>
>>That certainly would be the easiest thing to do.  However, I'm guessing
>>the current behavior was added when hugetlb mmap support was added. 
>
>>There is no telling how many applications might break if we change the behavior.
>>I'm guessing this is the reason Li chose to only change the behavior if
>>a new flag was specified.
>Yes, I was considering this change would break something.
> 
It's better to have a new patch which don't introduce new flag, and I attached
statements from the kernel document.
https://lore.kernel.org/linux-api/1585451295-22302-1-git-send-email-lixinhai.lxh@gmail.com/

Let's have further check, thanks!

>>--
>>Mike Kravetz
diff mbox series

Patch

diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index f94f65d..1c9ba97 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -21,6 +21,7 @@ 
 #define MAP_TYPE	0x0f		/* Mask for type of mapping */
 #define MAP_FIXED	0x10		/* Interpret addr exactly */
 #define MAP_ANONYMOUS	0x20		/* don't use a file */
+#define MAP_FIXED_HUGETLB_LEN	0x40	/* check alignment of addr, length, offset */
 
 /* 0x0100 - 0x4000 flags are defined in asm-generic/mman.h */
 #define MAP_POPULATE		0x008000	/* populate (prefault) pagetables */
diff --git a/mm/mmap.c b/mm/mmap.c
index d681a20..50a12e0 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1560,9 +1560,18 @@  unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
 		file = fget(fd);
 		if (!file)
 			return -EBADF;
-		if (is_file_hugepages(file))
-			len = ALIGN(len, huge_page_size(hstate_file(file)));
+
 		retval = -EINVAL;
+		if (is_file_hugepages(file)) {
+			struct hstate *hs = hstate_file(file);
+
+			if (flags & MAP_FIXED_HUGETLB_LEN &&
+				len & ~(huge_page_mask(hs)))
+				goto out_fput;
+
+			len = ALIGN(len, huge_page_size(hs));
+		}
+
 		if (unlikely(flags & MAP_HUGETLB && !is_file_hugepages(file)))
 			goto out_fput;
 	} else if (flags & MAP_HUGETLB) {
@@ -1573,6 +1582,10 @@  unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
 		if (!hs)
 			return -EINVAL;
 
+		if (flags & MAP_FIXED_HUGETLB_LEN &&
+			len & ~(huge_page_mask(hs)))
+			return -EINVAL;
+
 		len = ALIGN(len, huge_page_size(hs));
 		/*
 		 * VM_NORESERVE is used because the reservations will be