Message ID | 20220524232844.169332-1-mike.kravetz@oracle.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | madvise.2: Clarify addr/length and update hugetlb support | expand |
Hi, Mike, Some minor nitpicks below. On Tue, May 24, 2022 at 04:28:44PM -0700, Mike Kravetz wrote: > Clarify that madvise only works on full pages, and remove references > to 'bytes'. > > Update MADV_DONTNEED and MADV_REMOVE sections to remove notes that > HugeTLB mappings are not supported. They now are supported. > > Under 'Linux notes' describe addr requirements and length handling > for ranges in HugeTLB mappings. > > Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> > --- > man2/madvise.2 | 36 ++++++++++++++++++++++++++---------- > 1 file changed, 26 insertions(+), 10 deletions(-) > > diff --git a/man2/madvise.2 b/man2/madvise.2 > index f1f384c0c..c3b0615cb 100644 > --- a/man2/madvise.2 > +++ b/man2/madvise.2 > @@ -61,9 +61,13 @@ system call is used to give advice or directions to the kernel > about the address range beginning at address > .I addr > and with size > +.IR length . > +.BR madvise () > +only operates on whole pages, therefore > +.I addr > +must be page-aligned. The value of > .I length > -bytes > -In most cases, > +is rounded up to a multiple of page size. In most cases, > the goal of such advice is to improve system or application performance. > .PP > Initially, the system call supported a set of "conventional" > @@ -143,7 +147,7 @@ The resident set size (RSS) of the calling process will be immediately > reduced however. > .IP > .B MADV_DONTNEED > -cannot be applied to locked pages, Huge TLB pages, or > +cannot be applied to locked pages, or > .BR VM_PFNMAP > pages. This looks good, but since this will be a behavior change and we won't be able to change the old kernels, I saw the man page normally does this with things like: Since Linux 5.18, this madvise supports hugetlbfs pages. Majorly it states starting from which version it'll work, and when it'll not. > (Pages marked with the kernel-internal > @@ -170,24 +174,24 @@ Note that some of these operations change the semantics of memory accesses. > .\" commit f6b3ec238d12c8cc6cc71490c6e3127988460349 > Free up a given range of pages > and its associated backing store. > -This is equivalent to punching a hole in the corresponding byte > +This is equivalent to punching a hole in the corresponding > range of the backing store (see > .BR fallocate (2)). > Subsequent accesses in the specified address range will see > -bytes containing zero. > +pages containing zero. > .\" Databases want to use this feature to drop a section of their > .\" bufferpool (shared memory segments) - without writing back to > .\" disk/swap space. This feature is also useful for supporting > .\" hot-plug memory on UML. > .IP > The specified address range must be mapped shared and writable. > -This flag cannot be applied to locked pages, Huge TLB pages, or > +This flag cannot be applied to locked pages, or > .BR VM_PFNMAP > pages. > .IP > In the initial implementation, only > .BR tmpfs (5) > -was supported > +supported > .BR MADV_REMOVE ; > but since Linux 3.5, > .\" commit 3f31d07571eeea18a7d34db9af21d2285b807a17 > @@ -196,9 +200,9 @@ any filesystem which supports the > .BR FALLOC_FL_PUNCH_HOLE > mode also supports > .BR MADV_REMOVE . > -Hugetlbfs fails with the error > -.BR EINVAL > -and other filesystems fail with the error > +Filesystems which do not support > +.BR MADV_REMOVE > +fail with the error > .BR EOPNOTSUPP . > .TP > .BR MADV_DONTFORK " (since Linux 2.6.16)" > @@ -596,6 +600,18 @@ that are not mapped, the Linux version of > ignores them and applies the call to the rest (but returns > .B ENOMEM > from the system call, as it should). > +.PP > +If the specified address > +.I addr > +is within a mapping backed by Huge TLB pages, then > +.I addr > +must be aligned to the underlying Huge TLB page size. If the range > +specified by > +.I addr > +and > +.I length > +ends in a mapping backed by Huge TLB pages, then the end of the range > +will be rounded up to a multiple of the underlying Huge TLB page size. I'm slightly worried this could be hidden too deep, meanwhile it duplicates part of the sentence of how start/end will be treated. How about adding a short paragraph into each of MADV_DONTNEED and MADV_REMOVE section (right after the new sentences upon hugetlbfs), with: For hugetlbfs, the start/end alignments on page sizes will be based on huge page size. No strong opinions on any of these. Anyway: Acked-by: Peter Xu <peterx@redhat.com> Thanks,
On 5/25/22 11:38, Peter Xu wrote: > Hi, Mike, > > Some minor nitpicks below. > > On Tue, May 24, 2022 at 04:28:44PM -0700, Mike Kravetz wrote: >> .B MADV_DONTNEED >> -cannot be applied to locked pages, Huge TLB pages, or >> +cannot be applied to locked pages, or >> .BR VM_PFNMAP >> pages. > > This looks good, but since this will be a behavior change and we won't be > able to change the old kernels, I saw the man page normally does this with > things like: > > Since Linux 5.18, this madvise supports hugetlbfs page> > Majorly it states starting from which version it'll work, and when it'll > not. You are right. I will add this. > >> (Pages marked with the kernel-internal >> @@ -170,24 +174,24 @@ Note that some of these operations change the semantics of memory accesses. >> .\" commit f6b3ec238d12c8cc6cc71490c6e3127988460349 >> Free up a given range of pages >> and its associated backing store. >> -This is equivalent to punching a hole in the corresponding byte >> +This is equivalent to punching a hole in the corresponding >> range of the backing store (see >> .BR fallocate (2)). >> Subsequent accesses in the specified address range will see >> -bytes containing zero. >> +pages containing zero. >> .\" Databases want to use this feature to drop a section of their >> .\" bufferpool (shared memory segments) - without writing back to >> .\" disk/swap space. This feature is also useful for supporting >> .\" hot-plug memory on UML. >> .IP >> The specified address range must be mapped shared and writable. >> -This flag cannot be applied to locked pages, Huge TLB pages, or >> +This flag cannot be applied to locked pages, or >> .BR VM_PFNMAP >> pages. >> .IP >> In the initial implementation, only >> .BR tmpfs (5) >> -was supported >> +supported >> .BR MADV_REMOVE ; >> but since Linux 3.5, >> .\" commit 3f31d07571eeea18a7d34db9af21d2285b807a17 >> @@ -196,9 +200,9 @@ any filesystem which supports the >> .BR FALLOC_FL_PUNCH_HOLE >> mode also supports >> .BR MADV_REMOVE . >> -Hugetlbfs fails with the error >> -.BR EINVAL >> -and other filesystems fail with the error >> +Filesystems which do not support >> +.BR MADV_REMOVE >> +fail with the error >> .BR EOPNOTSUPP . >> .TP >> .BR MADV_DONTFORK " (since Linux 2.6.16)" >> @@ -596,6 +600,18 @@ that are not mapped, the Linux version of >> ignores them and applies the call to the rest (but returns >> .B ENOMEM >> from the system call, as it should). >> +.PP >> +If the specified address >> +.I addr >> +is within a mapping backed by Huge TLB pages, then >> +.I addr >> +must be aligned to the underlying Huge TLB page size. If the range >> +specified by >> +.I addr >> +and >> +.I length >> +ends in a mapping backed by Huge TLB pages, then the end of the range >> +will be rounded up to a multiple of the underlying Huge TLB page size. > > I'm slightly worried this could be hidden too deep, meanwhile it duplicates > part of the sentence of how start/end will be treated. Yes, I just dumped more stuff into the NOTES section. Will rearrange as you suggested. > > How about adding a short paragraph into each of MADV_DONTNEED and > MADV_REMOVE section (right after the new sentences upon hugetlbfs), with: > > For hugetlbfs, the start/end alignments on page sizes will be based on > huge page size. > > No strong opinions on any of these. Anyway: > > Acked-by: Peter Xu <peterx@redhat.com> Thanks Peter
diff --git a/man2/madvise.2 b/man2/madvise.2 index f1f384c0c..c3b0615cb 100644 --- a/man2/madvise.2 +++ b/man2/madvise.2 @@ -61,9 +61,13 @@ system call is used to give advice or directions to the kernel about the address range beginning at address .I addr and with size +.IR length . +.BR madvise () +only operates on whole pages, therefore +.I addr +must be page-aligned. The value of .I length -bytes -In most cases, +is rounded up to a multiple of page size. In most cases, the goal of such advice is to improve system or application performance. .PP Initially, the system call supported a set of "conventional" @@ -143,7 +147,7 @@ The resident set size (RSS) of the calling process will be immediately reduced however. .IP .B MADV_DONTNEED -cannot be applied to locked pages, Huge TLB pages, or +cannot be applied to locked pages, or .BR VM_PFNMAP pages. (Pages marked with the kernel-internal @@ -170,24 +174,24 @@ Note that some of these operations change the semantics of memory accesses. .\" commit f6b3ec238d12c8cc6cc71490c6e3127988460349 Free up a given range of pages and its associated backing store. -This is equivalent to punching a hole in the corresponding byte +This is equivalent to punching a hole in the corresponding range of the backing store (see .BR fallocate (2)). Subsequent accesses in the specified address range will see -bytes containing zero. +pages containing zero. .\" Databases want to use this feature to drop a section of their .\" bufferpool (shared memory segments) - without writing back to .\" disk/swap space. This feature is also useful for supporting .\" hot-plug memory on UML. .IP The specified address range must be mapped shared and writable. -This flag cannot be applied to locked pages, Huge TLB pages, or +This flag cannot be applied to locked pages, or .BR VM_PFNMAP pages. .IP In the initial implementation, only .BR tmpfs (5) -was supported +supported .BR MADV_REMOVE ; but since Linux 3.5, .\" commit 3f31d07571eeea18a7d34db9af21d2285b807a17 @@ -196,9 +200,9 @@ any filesystem which supports the .BR FALLOC_FL_PUNCH_HOLE mode also supports .BR MADV_REMOVE . -Hugetlbfs fails with the error -.BR EINVAL -and other filesystems fail with the error +Filesystems which do not support +.BR MADV_REMOVE +fail with the error .BR EOPNOTSUPP . .TP .BR MADV_DONTFORK " (since Linux 2.6.16)" @@ -596,6 +600,18 @@ that are not mapped, the Linux version of ignores them and applies the call to the rest (but returns .B ENOMEM from the system call, as it should). +.PP +If the specified address +.I addr +is within a mapping backed by Huge TLB pages, then +.I addr +must be aligned to the underlying Huge TLB page size. If the range +specified by +.I addr +and +.I length +ends in a mapping backed by Huge TLB pages, then the end of the range +will be rounded up to a multiple of the underlying Huge TLB page size. .\" .SH HISTORY .\" The .\" .BR madvise ()
Clarify that madvise only works on full pages, and remove references to 'bytes'. Update MADV_DONTNEED and MADV_REMOVE sections to remove notes that HugeTLB mappings are not supported. They now are supported. Under 'Linux notes' describe addr requirements and length handling for ranges in HugeTLB mappings. Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> --- man2/madvise.2 | 36 ++++++++++++++++++++++++++---------- 1 file changed, 26 insertions(+), 10 deletions(-)