Message ID | 20221021223300.3675201-5-zokeefe@google.com (mailing list archive) |
---|---|
State | New |
Series | Add MADV_COLLAPSE documentation |

Hi Zach!

On 10/22/22 00:33, Zach OKeefe wrote:
> From: Zach O'Keefe <zokeefe@google.com>
>
> Linux 6.1 introduced MADV_COLLAPSE in upstream commit 7d8faaf15545
> ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") and
> upstream commit 34488399fa08 ("mm/madvise: add file and shmem support to
> MADV_COLLAPSE"). Update the man-pages for madvise(2) and
> process_madvise(2).
>
> Link: https://lore.kernel.org/linux-mm/20220922224046.1143204-1-zokeefe@google.com/
> Link: https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>

There are a few issues with this patch:

alx@asus5775:~/src/linux/man-pages/man-pages$ make lint-man-groff
LINT (groff)    tmp/lint/man2/madvise.2.lint-man.groff.touch
eqn:man2/madvise.2:473: error: invalid input character code '128'
eqn:man2/madvise.2:473: error: invalid input character code '153'
an.tmac:man2/madvise.2:445: style: .BR expects at least 2 arguments, got 1
an.tmac:man2/madvise.2:456: style: .BR expects at least 2 arguments, got 1
an.tmac:man2/madvise.2:463: style: .BR expects at least 2 arguments, got 1
found style problems; aborting
make: *** [lib/lint-man.mk:77: tmp/lint/man2/madvise.2.lint-man.groff.touch] Error 1

Let's investigate them:

alx@asus5775:~/src/linux/man-pages/man-pages$ sed -n 473p man2/madvise.2
this operation will be deemed successful.

This one was a bit difficult to track, since the line count seems to be off by one:

alx@asus5775:~/src/linux/man-pages/man-pages$ tbl man2/madvise.2 | hd | grep -C1 ' 80 '
00003d40  63 65 73 73 66 75 6c 2e  0a 4e 6f 74 65 20 74 68  |cessful..Note th|
00003d50  61 74 20 74 68 69 73 20  64 6f 65 73 6e e2 80 99  |at this doesn...|
00003d60  74 20 67 75 61 72 61 6e  74 65 65 20 61 6e 79 74  |t guarantee anyt|
alx@asus5775:~/src/linux/man-pages/man-pages$ sed -n 474p man2/madvise.2
Note that this doesn’t guarantee anything about other possible mappings of

The issue was in line 474, and the issue is that it uses a weird single quote.
Please use the following ASCII character for the single quote (see ascii(7)):

    047   39    27    '

The rest of the issues seem trivial:
Use .B instead of .BR because there's no "roman" (i.e., non-bold) part.

alx@asus5775:~/src/linux/man-pages/man-pages$ sed -n 445p man2/madvise.2
.BR MADV_COLLAPSE
alx@asus5775:~/src/linux/man-pages/man-pages$ sed -n 456p man2/madvise.2
.BR MADV_COLLAPSE
alx@asus5775:~/src/linux/man-pages/man-pages$ sed -n 463p man2/madvise.2
.BR VM_NOHUGEPAGE

I'll report a bug to groff(1) about the issue with the line count.

Cheers,

Alex

> ---
>  man2/madvise.2         | 90 +++++++++++++++++++++++++++++++++++++++++-
>  man2/process_madvise.2 | 10 +++++
>  2 files changed, 98 insertions(+), 2 deletions(-)
>
> diff --git a/man2/madvise.2 b/man2/madvise.2
> index df3413cc8..b03fc731d 100644
> --- a/man2/madvise.2
> +++ b/man2/madvise.2
> @@ -385,9 +385,10 @@ set (see
>  .BR prctl (2) ).
>  .IP
>  The
> -.B MADV_HUGEPAGE
> +.BR MADV_HUGEPAGE ,
> +.BR MADV_NOHUGEPAGE ,
>  and
> -.B MADV_NOHUGEPAGE
> +.B MADV_COLLAPSE
>  operations are available only if the kernel was configured with
>  .B CONFIG_TRANSPARENT_HUGEPAGE
>  and file/shmem memory is only supported if the kernel was configured with
> @@ -400,6 +401,81 @@ and
>  .I length
>  will not be backed by transparent hugepages.
>  .TP
> +.BR MADV_COLLAPSE " (since Linux 6.1)"
> +.\" commit 7d8faaf155454f8798ec56404faca29a82689c77
> +.\" commit 34488399fa08faaf664743fa54b271eb6f9e1321
> +Perform a best-effort synchronous collapse of the native pages mapped by the
> +memory range into Transparent Huge Pages (THPs).
> +.B MADV_COLLAPSE
> +operates on the current state of memory of the calling process and makes no
> +persistent changes or guarantees on how pages will be mapped,
> +constructed,
> +or faulted in the future.
> +.IP
> +.B MADV_COLLAPSE
> +supports private anonymous pages (see
> +.BR mmap (2)),
> +shmem pages,
> +and file-backed pages.
> +See
> +.B MADV_HUGEPAGE
> +for general information on memory requirements for THP.
> +If the range provided spans multiple VMAs,
> +the semantics of the collapse over each VMA is independent from the others.
> +If collapse of a given huge page-aligned/sized region fails,
> +the operation may continue to attempt collapsing the remainder of the
> +specified memory.
> +.B MADV_COLLAPSE
> +will automatically clamp the provided range to be hugepage-aligned.
> +.IP
> +All non-resident pages covered by the range will first be
> +swapped/faulted-in,
> +before being copied onto a freshly allocated hugepage.
> +If the native pages compose the same PTE-mapped hugepage,
> +and are suitably aligned,
> +allocation of a new hugepage may be elided and collapse may happen
> +in-place.
> +Unmapped pages will have their data directly initialized to 0 in the new
> +hugepage.
> +However,
> +for every eligible hugepage-aligned/sized region to be collapsed,
> +at least one page must currently be backed by physical memory.
> +.IP
> +.BR MADV_COLLAPSE
> +is independent of any sysfs
> +(see
> +.BR sysfs (5))
> +setting under
> +.IR /sys/kernel/mm/transparent_hugepage ,
> +both in terms of determining THP eligibility,
> +and allocation semantics.
> +See Linux kernel source file
> +.I Documentation/admin\-guide/mm/transhuge.rst
> +for more information.
> +.BR MADV_COLLAPSE
> +also ignores
> +.B huge=
> +tmpfs mount when operating on tmpfs files.
> +Allocation for the new hugepage may enter direct reclaim and/or compaction,
> +regardless of VMA flags
> +(though
> +.BR VM_NOHUGEPAGE
> +is still respected).
> +.IP
> +When the system has multiple NUMA nodes,
> +the hugepage will be allocated from the node providing the most native
> +pages.
> +.IP
> +If all hugepage-sized/aligned regions covered by the provided range were
> +either successfully collapsed,
> +or were already PMD-mapped THPs,
> +this operation will be deemed successful.
> +Note that this doesn’t guarantee anything about other possible mappings of
> +the memory.
> +Also note that many failures might have occurred since the operation may
> +continue to collapse in the event collapse of a single hugepage-sized/aligned
> +region fails.
> +.TP
>  .BR MADV_DONTDUMP " (since Linux 3.4)"
>  .\" commit 909af768e88867016f427264ae39d27a57b6a8ed
>  .\" commit accb61fe7bb0f5c2a4102239e4981650f9048519
> @@ -619,6 +695,11 @@ A kernel resource was temporarily unavailable.
>  .B EBADF
>  The map exists, but the area maps something that isn't a file.
>  .TP
> +.B EBUSY
> +(for
> +.BR MADV_COLLAPSE )
> +Could not charge hugepage to cgroup: cgroup limit exceeded.
> +.TP
>  .B EFAULT
>  .I advice
>  is
> @@ -716,6 +797,11 @@ maximum resident set size.
>  Not enough memory: paging in failed.
>  .TP
>  .B ENOMEM
> +(for
> +.BR MADV_COLLAPSE )
> +Not enough memory: could not allocate hugepage.
> +.TP
> +.B ENOMEM
>  Addresses in the specified range are not currently
>  mapped, or are outside the address space of the process.
>  .TP
> diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
> index 44d3b94e8..8b0ddccdd 100644
> --- a/man2/process_madvise.2
> +++ b/man2/process_madvise.2
> @@ -73,6 +73,10 @@ argument is one of the following values:
>  See
>  .BR madvise (2).
>  .TP
> +.B MADV_COLLAPSE
> +See
> +.BR madvise (2).
> +.TP
>  .B MADV_PAGEOUT
>  See
>  .BR madvise (2).
> @@ -173,6 +177,12 @@ The caller does not have permission to access the address space of the process
>  .TP
>  .B ESRCH
>  The target process does not exist (i.e., it has terminated and been waited on).
> +.PP
> +See
> +.BR madvise (2)
> +for
> +.IR advice -specific
> +errors.
>  .SH VERSIONS
>  This system call first appeared in Linux 5.10.
>  .\" commit ecb8ac8b1f146915aa6b96449b66dd48984caacc
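
For readers following the hexdump step in the review above, here is a minimal sketch of the same search done directly on the page source. It is an illustration only, not part of the man-pages lint tooling: the function name `find_non_ascii` and the sample bytes are assumptions made for this example. It reports every byte above 0x7F with its 1-based line and column, which avoids depending on the line numbers printed by eqn. The bytes 0x80 and 0x99 from the hexdump are 128 and 153 in decimal, the two codes eqn complained about.

```python
def find_non_ascii(data: bytes):
    """Return (line, column, byte_value) for every non-ASCII byte in data.

    Line and column numbers are 1-based, so the line number can be fed
    straight to `sed -n Np`.
    """
    hits = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte > 0x7F:  # anything outside 7-bit ASCII
                hits.append((lineno, col, byte))
    return hits


# Hypothetical two-line sample modelled on the lines shown in the review;
# \xe2\x80\x99 is the UTF-8 encoding of the typographic apostrophe.
sample = (
    b"this operation will be deemed successful.\n"
    b"Note that this doesn\xe2\x80\x99t guarantee anything\n"
)

for lineno, col, byte in find_non_ascii(sample):
    print(f"line {lineno}, column {col}: byte {byte} (0x{byte:02x})")
```

On a real checkout one would pass the contents of the file, for example `find_non_ascii(open("man2/madvise.2", "rb").read())`, assuming that path exists locally.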

Hey Alex!

On Mon, Oct 31, 2022 at 2:15 PM Alejandro Colomar <alx.manpages@gmail.com> wrote:
>
> Hi Zach!
>
> On 10/22/22 00:33, Zach OKeefe wrote:
> > From: Zach O'Keefe <zokeefe@google.com>
> >
> > Linux 6.1 introduced MADV_COLLAPSE in upstream commit 7d8faaf15545
> > ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") and
> > upstream commit 34488399fa08 ("mm/madvise: add file and shmem support to
> > MADV_COLLAPSE"). Update the man-pages for madvise(2) and
> > process_madvise(2).
> >
> > Link: https://lore.kernel.org/linux-mm/20220922224046.1143204-1-zokeefe@google.com/
> > Link: https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/
> > Signed-off-by: Zach O'Keefe <zokeefe@google.com>
>
> There are a few issues with this patch:
>
> alx@asus5775:~/src/linux/man-pages/man-pages$ make lint-man-groff
> LINT (groff)    tmp/lint/man2/madvise.2.lint-man.groff.touch
> eqn:man2/madvise.2:473: error: invalid input character code '128'
> eqn:man2/madvise.2:473: error: invalid input character code '153'
> an.tmac:man2/madvise.2:445: style: .BR expects at least 2 arguments, got 1
> an.tmac:man2/madvise.2:456: style: .BR expects at least 2 arguments, got 1
> an.tmac:man2/madvise.2:463: style: .BR expects at least 2 arguments, got 1
> found style problems; aborting
> make: *** [lib/lint-man.mk:77: tmp/lint/man2/madvise.2.lint-man.groff.touch] Error 1
>
>
> Let's investigate them:
>

Thank you :)

> alx@asus5775:~/src/linux/man-pages/man-pages$ sed -n 473p man2/madvise.2
> this operation will be deemed successful.
>
> This one was a bit difficult to track, since the line count seems to be off by one:
>
> alx@asus5775:~/src/linux/man-pages/man-pages$ tbl man2/madvise.2 | hd | grep -C1 ' 80 '
> 00003d40  63 65 73 73 66 75 6c 2e  0a 4e 6f 74 65 20 74 68  |cessful..Note th|
> 00003d50  61 74 20 74 68 69 73 20  64 6f 65 73 6e e2 80 99  |at this doesn...|
> 00003d60  74 20 67 75 61 72 61 6e  74 65 65 20 61 6e 79 74  |t guarantee anyt|
> alx@asus5775:~/src/linux/man-pages/man-pages$ sed -n 474p man2/madvise.2
> Note that this doesn’t guarantee anything about other possible mappings of
>
> The issue was in line 474, and the issue is that it uses a weird single quote.
> Please use the following ASCII character for the single quote (see ascii(7)):
>
>     047   39    27    '
>

Very weird and good find! Honestly, I had prototyped this in Google Docs and
copy-pasta'd this over as the basis. I tried testing this again - and same
thing - Google Docs uses some other character. Anyways - glad you caught this.

> The rest of the issues seem trivial:
> Use .B instead of .BR because there's no "roman" (i.e., non-bold) part.
>

This was the first time it clicked what ".BR" meant: "bold followed by roman".

> alx@asus5775:~/src/linux/man-pages/man-pages$ sed -n 445p man2/madvise.2
> .BR MADV_COLLAPSE
> alx@asus5775:~/src/linux/man-pages/man-pages$ sed -n 456p man2/madvise.2
> .BR MADV_COLLAPSE
> alx@asus5775:~/src/linux/man-pages/man-pages$ sed -n 463p man2/madvise.2
> .BR VM_NOHUGEPAGE
>

These didn't show up with my version of groff (as in 1/2), but I've applied the
fixes and sent out a v4 for this patch. Again, thank you for all your help here!

Best,
Zach

>
> I'll report a bug to groff(1) about the issue with the line count.
>

Ya that's an odd one. Sorry for having to encounter this - must have been quite
confusing. Thank you!

At 2022-10-31T22:15:09+0100, Alejandro Colomar wrote:
> The issue was in line 474, and the issue is that it uses a weird single
> quote. Please use the following ASCII character for the single quote (see
> ascii(7)):
>
>     047   39    27    '
[...]
> I'll report a bug to groff(1) about the issue with the line count.

Thanks, Alex. There appear to be some very old bugs around input line
number tracking in GNU eqn, possibly going back 30+ years.

I've committed a regression test[1] and fix.[2] The fix can be expected
(along with literally hundreds of others) in groff 1.23.

And now I see I managed to sneak a cosmetic indentation error into the
commit message for the second (but not the ChangeLog), where I'm stuck
with it forever. Oh well.

Regards,
Branden

[1] https://git.savannah.gnu.org/cgit/groff.git/commit/?id=7e23e1342077a6d7c0b02c3d666f131d95f2b510
[2] https://git.savannah.gnu.org/cgit/groff.git/commit/?id=dc98a8b09e7f3dcfe968b978eb210f468db78cc9

Hey Branden!

On 11/1/22 02:51, G. Branden Robinson wrote:
> At 2022-10-31T22:15:09+0100, Alejandro Colomar wrote:
>> The issue was in line 474, and the issue is that it uses a weird single
>> quote. Please use the following ASCII character for the single quote (see
>> ascii(7)):
>>
>>     047   39    27    '
> [...]
>> I'll report a bug to groff(1) about the issue with the line count.
>
> Thanks, Alex. There appear to be some very old bugs around input line
> number tracking in GNU eqn, possibly going back 30+ years.

Heh! It feels good finding 30-yr-old bugs :)

>
> I've committed a regression test[1] and fix.[2] The fix can be expected
> (along with literally hundreds of others) in groff 1.23.

Yeah, I'm waiting[1] for that release to have 'make lint' be usable by
contributors. But it has already been proven in this patch set that it can be
useful for catching things that I miss, even if I have to run it.

[1]: <https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/tree/INSTALL#n98>

>
> And now I see I managed to sneak a cosmetic indentation error into the
> commit message for the second (but not the ChangeLog), where I'm stuck
> with it forever. Oh well.

:)

Cheers,

Alex

>
> Regards,
> Branden
>
> [1] https://git.savannah.gnu.org/cgit/groff.git/commit/?id=7e23e1342077a6d7c0b02c3d666f131d95f2b510
> [2] https://git.savannah.gnu.org/cgit/groff.git/commit/?id=dc98a8b09e7f3dcfe968b978eb210f468db78cc9

Hi Zach,

On 10/22/22 00:33, Zach OKeefe wrote:
> From: Zach O'Keefe <zokeefe@google.com>
>
> Linux 6.1 introduced MADV_COLLAPSE in upstream commit 7d8faaf15545
> ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") and
> upstream commit 34488399fa08 ("mm/madvise: add file and shmem support to
> MADV_COLLAPSE"). Update the man-pages for madvise(2) and
> process_madvise(2).
>
> Link: https://lore.kernel.org/linux-mm/20220922224046.1143204-1-zokeefe@google.com/
> Link: https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>

Please see a few comments below.

Cheers,

Alex

> ---
>  man2/madvise.2         | 90 +++++++++++++++++++++++++++++++++++++++++-
>  man2/process_madvise.2 | 10 +++++
>  2 files changed, 98 insertions(+), 2 deletions(-)
>
> diff --git a/man2/madvise.2 b/man2/madvise.2
> index df3413cc8..b03fc731d 100644
> --- a/man2/madvise.2
> +++ b/man2/madvise.2
> @@ -385,9 +385,10 @@ set (see
>  .BR prctl (2) ).
>  .IP
>  The
> -.B MADV_HUGEPAGE
> +.BR MADV_HUGEPAGE ,
> +.BR MADV_NOHUGEPAGE ,
>  and
> -.B MADV_NOHUGEPAGE
> +.B MADV_COLLAPSE
>  operations are available only if the kernel was configured with
>  .B CONFIG_TRANSPARENT_HUGEPAGE
>  and file/shmem memory is only supported if the kernel was configured with
> @@ -400,6 +401,81 @@ and
>  .I length
>  will not be backed by transparent hugepages.
>  .TP
> +.BR MADV_COLLAPSE " (since Linux 6.1)"
> +.\" commit 7d8faaf155454f8798ec56404faca29a82689c77
> +.\" commit 34488399fa08faaf664743fa54b271eb6f9e1321
> +Perform a best-effort synchronous collapse of the native pages mapped by the

Please use semantic line breaks. In this case, I'd break after "pages".

man-pages(7):
   Use semantic newlines
       In the source of a manual page, new sentences should be started on new
       lines, long sentences should be split into lines at clause breaks
       (commas, semicolons, colons, and so on), and long clauses should be split
       at phrase boundaries. This convention, sometimes known as "semantic
       newlines", makes it easier to see the effect of patches, which often
       operate at the level of individual sentences, clauses, or phrases.

> +memory range into Transparent Huge Pages (THPs).
> +.B MADV_COLLAPSE
> +operates on the current state of memory of the calling process and makes no

Here I'd break after "and".

> +persistent changes or guarantees on how pages will be mapped,
> +constructed,
> +or faulted in the future.
> +.IP
> +.B MADV_COLLAPSE
> +supports private anonymous pages (see
> +.BR mmap (2)),
> +shmem pages,
> +and file-backed pages.
> +See
> +.B MADV_HUGEPAGE
> +for general information on memory requirements for THP.
> +If the range provided spans multiple VMAs,
> +the semantics of the collapse over each VMA is independent from the others.
> +If collapse of a given huge page-aligned/sized region fails,
> +the operation may continue to attempt collapsing the remainder of the

Break after "collapsing".

> +specified memory.
> +.B MADV_COLLAPSE
> +will automatically clamp the provided range to be hugepage-aligned.
> +.IP
> +All non-resident pages covered by the range will first be

Break after "range".

> +swapped/faulted-in,
> +before being copied onto a freshly allocated hugepage.
> +If the native pages compose the same PTE-mapped hugepage,
> +and are suitably aligned,
> +allocation of a new hugepage may be elided and collapse may happen

Break before or after "and".

> +in-place.
> +Unmapped pages will have their data directly initialized to 0 in the new

Break after "0".

> +hugepage.
> +However,
> +for every eligible hugepage-aligned/sized region to be collapsed,
> +at least one page must currently be backed by physical memory.
> +.IP
> +.BR MADV_COLLAPSE

s/BR/B/

> +is independent of any sysfs
> +(see
> +.BR sysfs (5))
> +setting under
> +.IR /sys/kernel/mm/transparent_hugepage ,
> +both in terms of determining THP eligibility,
> +and allocation semantics.
> +See Linux kernel source file
> +.I Documentation/admin\-guide/mm/transhuge.rst
> +for more information.
> +.BR MADV_COLLAPSE

s/BR/B/

> +also ignores
> +.B huge=
> +tmpfs mount when operating on tmpfs files.
> +Allocation for the new hugepage may enter direct reclaim and/or compaction,
> +regardless of VMA flags
> +(though
> +.BR VM_NOHUGEPAGE

s/BR/B/

> +is still respected).
> +.IP
> +When the system has multiple NUMA nodes,
> +the hugepage will be allocated from the node providing the most native

Break after "from".

> +pages.
> +.IP
> +If all hugepage-sized/aligned regions covered by the provided range were

Prefer English rather than "/".

> +either successfully collapsed,
> +or were already PMD-mapped THPs,
> +this operation will be deemed successful.
> +Note that this doesn’t guarantee anything about other possible mappings of

Break after "about".

> +the memory.
> +Also note that many failures might have occurred since the operation may
> +continue to collapse in the event collapse of a single hugepage-sized/aligned

Add some omitted "that" or something that will help readability to
non-native-English readers.

And break at a better place.

> +region fails.
> +.TP
>  .BR MADV_DONTDUMP " (since Linux 3.4)"
>  .\" commit 909af768e88867016f427264ae39d27a57b6a8ed
>  .\" commit accb61fe7bb0f5c2a4102239e4981650f9048519
> @@ -619,6 +695,11 @@ A kernel resource was temporarily unavailable.
>  .B EBADF
>  The map exists, but the area maps something that isn't a file.
>  .TP
> +.B EBUSY
> +(for
> +.BR MADV_COLLAPSE )
> +Could not charge hugepage to cgroup: cgroup limit exceeded.
> +.TP
>  .B EFAULT
>  .I advice
>  is
> @@ -716,6 +797,11 @@ maximum resident set size.
>  Not enough memory: paging in failed.
>  .TP
>  .B ENOMEM
> +(for
> +.BR MADV_COLLAPSE )
> +Not enough memory: could not allocate hugepage.
> +.TP
> +.B ENOMEM
>  Addresses in the specified range are not currently
>  mapped, or are outside the address space of the process.
>  .TP
> diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
> index 44d3b94e8..8b0ddccdd 100644
> --- a/man2/process_madvise.2
> +++ b/man2/process_madvise.2
> @@ -73,6 +73,10 @@ argument is one of the following values:
>  See
>  .BR madvise (2).
>  .TP
> +.B MADV_COLLAPSE
> +See
> +.BR madvise (2).
> +.TP
>  .B MADV_PAGEOUT
>  See
>  .BR madvise (2).
> @@ -173,6 +177,12 @@ The caller does not have permission to access the address space of the process
>  .TP
>  .B ESRCH
>  The target process does not exist (i.e., it has terminated and been waited on).
> +.PP
> +See
> +.BR madvise (2)
> +for
> +.IR advice -specific
> +errors.
>  .SH VERSIONS
>  This system call first appeared in Linux 5.10.
>  .\" commit ecb8ac8b1f146915aa6b96449b66dd48984caacc
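
The three s/BR/B/ comments in this review apply one rule: .BR alternates bold and roman, so a request with a single argument has no roman part and should be .B. A rough sketch of a checker for that rule follows. It is only an illustration under stated assumptions: the name `single_arg_br` is made up for this example, the argument splitting (whitespace-separated words, with a double-quoted string counted as one argument) is a simplification of real roff parsing, and it is not how the an.tmac style check is implemented.

```python
import re


def single_arg_br(source: str):
    """Return (line_number, line) for each .BR request with fewer than two arguments."""
    flagged = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if not line.startswith(".BR"):
            continue
        rest = line[3:]
        if rest and not rest[0].isspace():
            continue  # a different request whose name merely starts with ".BR"
        # Simplified splitting: a "quoted string" is one argument,
        # otherwise arguments are separated by whitespace.
        args = re.findall(r'"[^"]*"|\S+', rest)
        if len(args) < 2:
            flagged.append((lineno, line))
    return flagged


# Sample lines taken from the patch under review.
page = "\n".join([
    '.BR MADV_COLLAPSE " (since Linux 6.1)"',
    ".BR MADV_COLLAPSE",
    ".B MADV_HUGEPAGE",
    ".BR mmap (2)),",
    ".BR VM_NOHUGEPAGE",
])

for lineno, line in single_arg_br(page):
    print(f"{lineno}: {line}  -> use .B")
```

The quoted-string case matters because `.BR MADV_COLLAPSE " (since Linux 6.1)"` is a correct two-argument use and must not be flagged.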

On Sun, Dec 11, 2022 at 9:59 AM Alejandro Colomar <alx.manpages@gmail.com> wrote:
>
> Hi Zach,

Hey Alex,

> On 10/22/22 00:33, Zach OKeefe wrote:
> > From: Zach O'Keefe <zokeefe@google.com>
> >
> > Linux 6.1 introduced MADV_COLLAPSE in upstream commit 7d8faaf15545
> > ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") and
> > upstream commit 34488399fa08 ("mm/madvise: add file and shmem support to
> > MADV_COLLAPSE"). Update the man-pages for madvise(2) and
> > process_madvise(2).
> >
> > Link: https://lore.kernel.org/linux-mm/20220922224046.1143204-1-zokeefe@google.com/
> > Link: https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/
> > Signed-off-by: Zach O'Keefe <zokeefe@google.com>
>
> Please see a few comments below.
>

Thanks for the mail. So, this patch was taken as commit b106cd5bf
("madvise.2: add documentation for MADV_COLLAPSE"). Some of your
comments below were applied (I think, by you) as fixes pre-commit.
However, there are some new comments (or ones that address the same
lines, but in different ways). Is this mail to log ~ what changes were
done, or is there anything actionable here on my side?

Best,
Zach

Thanks for this.

> Cheers,
>
> Alex
>
> > ---
> >  man2/madvise.2         | 90 +++++++++++++++++++++++++++++++++++++++++-
> >  man2/process_madvise.2 | 10 +++++
> >  2 files changed, 98 insertions(+), 2 deletions(-)
> >
> > diff --git a/man2/madvise.2 b/man2/madvise.2
> > index df3413cc8..b03fc731d 100644
> > --- a/man2/madvise.2
> > +++ b/man2/madvise.2
> > @@ -385,9 +385,10 @@ set (see
> >  .BR prctl (2) ).
> >  .IP
> >  The
> > -.B MADV_HUGEPAGE
> > +.BR MADV_HUGEPAGE ,
> > +.BR MADV_NOHUGEPAGE ,
> >  and
> > -.B MADV_NOHUGEPAGE
> > +.B MADV_COLLAPSE
> >  operations are available only if the kernel was configured with
> >  .B CONFIG_TRANSPARENT_HUGEPAGE
> >  and file/shmem memory is only supported if the kernel was configured with
> > @@ -400,6 +401,81 @@ and
> >  .I length
> >  will not be backed by transparent hugepages.
Hey Zach,

On 12/11/22 22:51, Zach O'Keefe wrote:
> On Sun, Dec 11, 2022 at 9:59 AM Alejandro Colomar
> <alx.manpages@gmail.com> wrote:
>>
>> Hi Zach,
>
> Hey Alex,
>
>> On 10/22/22 00:33, Zach OKeefe wrote:
>>> From: Zach O'Keefe <zokeefe@google.com>
>>>
>>> Linux 6.1 introduced MADV_COLLAPSE in upstream commit 7d8faaf15545
>>> ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") and
>>> upstream commit 34488399fa08 ("mm/madvise: add file and shmem support to
>>> MADV_COLLAPSE"). Update the man-pages for madvise(2) and
>>> process_madvise(2).
>>>
>>> Link: https://lore.kernel.org/linux-mm/20220922224046.1143204-1-zokeefe@google.com/
>>> Link: https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/
>>> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
>>
>> Please see a few comments below.
>>
>
> Thanks for the mail. So, this patch was taken as commit b106cd5bf
> ("madvise.2: add documentation for MADV_COLLAPSE"). Some of your
> comments below were applied (I think, by you) as fixes pre-commit.
> However, there are some new comments (or ones that address the same
> lines, but in different ways). Is this mail to log ~ what changes
> were done, or is there anything actionable here on my side?

Ah no, it's just that I had it marked as unread for some reason, so I thought I
had forgotten to respond (and I forgot that I had applied it). :-)

So, no action required.

Regarding different suggestions, heh, it demonstrates that it's not exactly
deterministic :P

Cheers,

Alex

P.S.: Do you know if I have anything missing from you or any of your colleagues?

>
> Best,
> Zach
>
> Thanks for this.
On Sun, Dec 11, 2022 at 1:55 PM Alejandro Colomar
<alx.manpages@gmail.com> wrote:
>
> Hey Zach,
>
> On 12/11/22 22:51, Zach O'Keefe wrote:
> > On Sun, Dec 11, 2022 at 9:59 AM Alejandro Colomar
> > <alx.manpages@gmail.com> wrote:
> >>
> >> Hi Zach,
> >
> > Hey Alex,
> >
> >> On 10/22/22 00:33, Zach OKeefe wrote:
> >>> From: Zach O'Keefe <zokeefe@google.com>
> >>>
> >>> Linux 6.1 introduced MADV_COLLAPSE in upstream commit 7d8faaf15545
> >>> ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") and
> >>> upstream commit 34488399fa08 ("mm/madvise: add file and shmem support to
> >>> MADV_COLLAPSE"). Update the man-pages for madvise(2) and
> >>> process_madvise(2).
> >>>
> >>> Link: https://lore.kernel.org/linux-mm/20220922224046.1143204-1-zokeefe@google.com/
> >>> Link: https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/
> >>> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> >>
> >> Please see a few comments below.
> >>
> >
> > Thanks for the mail. So, this patch was taken as commit b106cd5bf
> > ("madvise.2: add documentation for MADV_COLLAPSE"). Some of your
> > comments below were applied (I think, by you) as fixes pre-commit.
> > However, there are some new comments (or ones that address the same
> > lines, but in different ways). Is this mail to log ~ what changes
> > were done, or is there anything actionable here on my side?
>
> Ah no, it's just that I had it marked as unread for some reason, so I thought I
> had forgotten to respond (and I forgot that I had applied it). :-)
>
> So, no action required.
>
> Regarding different suggestions, heh, it demonstrates that it's not exactly
> deterministic :P
>

Heh -- no worries :) Thanks for following up!

> Cheers,
>
> Alex
>
> P.S.: Do you know if I have anything missing from you or any of your colleagues?

At least on my part, I think you've taken all my patches (with help &
edits -- thank you!).
I can't speak for anyone else at Google, however
(though, just a very hasty cross reference between git log and
lore.kernel.org/linux-man seems to indicate patches sent from
*@google.com since man-pages-6.00 have previously made it into
man-pages-6.01, and nothing afterwards).

Have a great rest of your weekend,

Best,
Zach
Hi Zach!

On 12/11/22 23:37, Zach O'Keefe wrote:
> Heh -- no worries :) Thanks for following up!

:)

>
>> Cheers,
>>
>> Alex
>>
>> P.S.: Do you know if I have anything missing from you or any of your colleagues?
>
> At least on my part, I think you've taken all my patches (with help &
> edits -- thank you!). I can't speak for anyone else at Google, however
> (though, just a very hasty cross reference between git log and
> lore.kernel.org/linux-man seems to indicate patches sent from
> *@google.com since man-pages-6.00 have previously made it into
> man-pages-6.01, and nothing afterwards).

Makes sense. 6.01 is very recent, and I don't remember any patches since then.

>
> Have a great rest of your weekend,

Have a nice weekend!

Cheers,

Alex

>
> Best,
> Zach
>
diff --git a/man2/madvise.2 b/man2/madvise.2
index df3413cc8..b03fc731d 100644
--- a/man2/madvise.2
+++ b/man2/madvise.2
@@ -385,9 +385,10 @@ set (see
 .BR prctl (2) ).
 .IP
 The
-.B MADV_HUGEPAGE
+.BR MADV_HUGEPAGE ,
+.BR MADV_NOHUGEPAGE ,
 and
-.B MADV_NOHUGEPAGE
+.B MADV_COLLAPSE
 operations are available only if the kernel was configured with
 .B CONFIG_TRANSPARENT_HUGEPAGE
 and file/shmem memory is only supported if the kernel was configured with
@@ -400,6 +401,81 @@ and
 .I length
 will not be backed by transparent hugepages.
 .TP
+.BR MADV_COLLAPSE " (since Linux 6.1)"
+.\" commit 7d8faaf155454f8798ec56404faca29a82689c77
+.\" commit 34488399fa08faaf664743fa54b271eb6f9e1321
+Perform a best-effort synchronous collapse of the native pages mapped by the
+memory range into Transparent Huge Pages (THPs).
+.B MADV_COLLAPSE
+operates on the current state of memory of the calling process and makes no
+persistent changes or guarantees on how pages will be mapped,
+constructed,
+or faulted in the future.
+.IP
+.B MADV_COLLAPSE
+supports private anonymous pages (see
+.BR mmap (2)),
+shmem pages,
+and file-backed pages.
+See
+.B MADV_HUGEPAGE
+for general information on memory requirements for THP.
+If the range provided spans multiple VMAs,
+the semantics of the collapse over each VMA is independent from the others.
+If collapse of a given huge page-aligned/sized region fails,
+the operation may continue to attempt collapsing the remainder of the
+specified memory.
+.B MADV_COLLAPSE
+will automatically clamp the provided range to be hugepage-aligned.
+.IP
+All non-resident pages covered by the range will first be
+swapped/faulted-in,
+before being copied onto a freshly allocated hugepage.
+If the native pages compose the same PTE-mapped hugepage,
+and are suitably aligned,
+allocation of a new hugepage may be elided and collapse may happen
+in-place.
+Unmapped pages will have their data directly initialized to 0 in the new
+hugepage.
+However,
+for every eligible hugepage-aligned/sized region to be collapsed,
+at least one page must currently be backed by physical memory.
+.IP
+.BR MADV_COLLAPSE
+is independent of any sysfs
+(see
+.BR sysfs (5))
+setting under
+.IR /sys/kernel/mm/transparent_hugepage ,
+both in terms of determining THP eligibility,
+and allocation semantics.
+See Linux kernel source file
+.I Documentation/admin\-guide/mm/transhuge.rst
+for more information.
+.BR MADV_COLLAPSE
+also ignores
+.B huge=
+tmpfs mount when operating on tmpfs files.
+Allocation for the new hugepage may enter direct reclaim and/or compaction,
+regardless of VMA flags
+(though
+.BR VM_NOHUGEPAGE
+is still respected).
+.IP
+When the system has multiple NUMA nodes,
+the hugepage will be allocated from the node providing the most native
+pages.
+.IP
+If all hugepage-sized/aligned regions covered by the provided range were
+either successfully collapsed,
+or were already PMD-mapped THPs,
+this operation will be deemed successful.
+Note that this doesn’t guarantee anything about other possible mappings of
+the memory.
+Also note that many failures might have occurred since the operation may
+continue to collapse in the event collapse of a single hugepage-sized/aligned
+region fails.
+.TP
 .BR MADV_DONTDUMP " (since Linux 3.4)"
 .\" commit 909af768e88867016f427264ae39d27a57b6a8ed
 .\" commit accb61fe7bb0f5c2a4102239e4981650f9048519
@@ -619,6 +695,11 @@ A kernel resource was temporarily unavailable.
 .B EBADF
 The map exists, but the area maps something that isn't a file.
 .TP
+.B EBUSY
+(for
+.BR MADV_COLLAPSE )
+Could not charge hugepage to cgroup: cgroup limit exceeded.
+.TP
 .B EFAULT
 .I advice
 is
@@ -716,6 +797,11 @@ maximum resident set size.
 Not enough memory: paging in failed.
 .TP
 .B ENOMEM
+(for
+.BR MADV_COLLAPSE )
+Not enough memory: could not allocate hugepage.
+.TP
+.B ENOMEM
 Addresses in the specified range are not currently
 mapped, or are outside the address space of the process.
 .TP
diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
index 44d3b94e8..8b0ddccdd 100644
--- a/man2/process_madvise.2
+++ b/man2/process_madvise.2
@@ -73,6 +73,10 @@ argument is one of the following values:
 See
 .BR madvise (2).
 .TP
+.B MADV_COLLAPSE
+See
+.BR madvise (2).
+.TP
 .B MADV_PAGEOUT
 See
 .BR madvise (2).
@@ -173,6 +177,12 @@ The caller does not have permission to access the address space of the process
 .TP
 .B ESRCH
 The target process does not exist (i.e., it has terminated and been waited on).
+.PP
+See
+.BR madvise (2)
+for
+.IR advice -specific
+errors.
 .SH VERSIONS
 This system call first appeared in Linux 5.10.
 .\" commit ecb8ac8b1f146915aa6b96449b66dd48984caacc
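
For readers following the patch, here is a minimal sketch in C of how the call documented above might be used. It is not part of the patch and not taken from the kernel sources. The helper names (thp_align_up, thp_align_down, collapse_range, collapse_demo) are hypothetical, the 2 MiB hugepage size is an assumption that holds on typical x86-64 configurations only, and the fallback value 25 for MADV_COLLAPSE is what the generic Linux 6.1 UAPI headers use, provided here only for libc headers that predate it. The demo touches one page first because, per the text above, at least one page of each region must be backed by physical memory.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Assumption: fallback for older libc headers; 25 is the value in the
 * generic Linux 6.1 UAPI headers. */
#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25
#endif

/* Hypothetical helper: round addr up to a multiple of align,
 * where align must be a power of two. */
uintptr_t thp_align_up(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~((uintptr_t)align - 1);
}

/* Hypothetical helper: round addr down to a multiple of align,
 * where align must be a power of two. */
uintptr_t thp_align_down(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & ~((uintptr_t)align - 1);
}

/* Ask the kernel to collapse [addr, addr + len) into THPs.
 * Returns 0 on success, or a negative errno value on failure. */
int collapse_range(void *addr, size_t len)
{
    if (madvise(addr, len, MADV_COLLAPSE) == 0)
        return 0;
    return -errno;
}

/* Map anonymous memory, touch one page, and try a best-effort collapse
 * of one hugepage-aligned region. Returns 0 or a negative errno value. */
int collapse_demo(void)
{
    const size_t hpage = (size_t)2 << 20; /* assumption: 2 MiB PMD size */
    const size_t len = 2 * hpage;         /* always contains one aligned region */

    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (map == MAP_FAILED)
        return -errno;

    char *aligned = (char *)thp_align_up((uintptr_t)map, hpage);
    aligned[0] = 1; /* back at least one page with physical memory */

    int rc = collapse_range(aligned, hpage);
    if (rc != 0)
        fprintf(stderr, "MADV_COLLAPSE failed: %s\n", strerror(-rc));

    munmap(map, len);
    return rc;
}
```

A caller would typically treat a nonzero return from collapse_demo() as advisory, since the operation is best-effort: -EBUSY and -ENOMEM correspond to the cgroup-charge and allocation failures listed in the ERRORS hunks above, and kernels without MADV_COLLAPSE support are expected to reject the unknown advice value with -EINVAL.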