diff mbox series

mm: Respect mmap hint address when aligning for THP

Message ID 20241115215256.578125-1-kaleshsingh@google.com (mailing list archive)
State New
Headers show
Series mm: Respect mmap hint address when aligning for THP | expand

Commit Message

Kalesh Singh Nov. 15, 2024, 9:52 p.m. UTC
Commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
boundaries") updated __get_unmapped_area() to align the start address
for the VMA to a PMD boundary if CONFIG_TRANSPARENT_HUGEPAGE=y.

It does this by effectively looking up a region that is of size,
request_size + PMD_SIZE, and aligning up the start to a PMD boundary.

Commit 4ef9ad19e176 ("mm: huge_memory: don't force huge page alignment
on 32 bit") opted out of this for 32bit due to regressions in mmap base
randomization.

Commit d4148aeab412 ("mm, mmap: limit THP alignment of anonymous
mappings to PMD-aligned sizes") restricted this to only mmap sizes that
are multiples of the PMD_SIZE due to reported regressions in some
performance benchmarks -- which seemed mostly due to the reduced spatial
locality of related mappings due to the forced PMD-alignment.

Another unintended side effect has emerged: When a user specifies an mmap
hint address, the THP alignment logic modifies the behavior, potentially
ignoring the hint even if a sufficiently large gap exists at the requested
hint location.

Example Scenario:

Consider the following simplified virtual address (VA) space:

    ...

    0x200000-0x400000 --- VMA A
    0x400000-0x600000 --- Hole
    0x600000-0x800000 --- VMA B

    ...

A call to mmap() with hint=0x400000 and len=0x200000 behaves differently:

  - Before THP alignment: The requested region (size 0x200000) fits into
    the gap at 0x400000, so the hint is respected.

  - After alignment: The logic searches for a region of size
    0x400000 (len + PMD_SIZE) starting at 0x400000.
    This search fails due to the mapping at 0x600000 (VMA B), and the hint
    is ignored, falling back to arch_get_unmapped_area[_topdown]().

In general the hint is effectively ignored, if there is any
existing mapping in the below range:

     [mmap_hint + mmap_size, mmap_hint + mmap_size + PMD_SIZE)

This changes the semantics of mmap hint; from ""Respect the hint if a
sufficiently large gap exists at the requested location" to "Respect the
hint only if an additional PMD-sized gap exists beyond the requested size".

This has performance implications for allocators that allocate their heap
using mmap but try to keep it "as contiguous as possible" by using the
end of the exisiting heap as the address hint. With the new behavior
it's more likely to get a much less contiguous heap, adding extra
fragmentation and performance overhead.

To restore the expected behavior; don't use thp_get_unmapped_area_vmflags()
when the user provided a hint address.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hans Boehm <hboehm@google.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: <stable@vger.kernel.org>
Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries")
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---
 mm/mmap.c | 1 +
 1 file changed, 1 insertion(+)


base-commit: 2d5404caa8c7bb5c4e0435f94b28834ae5456623

Comments

Rik van Riel Nov. 15, 2024, 10:06 p.m. UTC | #1
On Fri, 2024-11-15 at 13:52 -0800, Kalesh Singh wrote:
> 
> To restore the expected behavior; don't use
> thp_get_unmapped_area_vmflags()
> when the user provided a hint address.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Yang Shi <yang@os.amperecomputing.com>
> Cc: Rik van Riel <riel@surriel.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Hans Boehm <hboehm@google.com>
> Cc: Lokesh Gidra <lokeshgidra@google.com>
> Cc: <stable@vger.kernel.org>
> Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
> boundaries")
> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> 

Reviewed-by: Rik van Riel <riel@surriel.com>
Yang Shi Nov. 15, 2024, 10:15 p.m. UTC | #2
On Fri, Nov 15, 2024 at 1:52 PM Kalesh Singh <kaleshsingh@google.com> wrote:
>
> Commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
> boundaries") updated __get_unmapped_area() to align the start address
> for the VMA to a PMD boundary if CONFIG_TRANSPARENT_HUGEPAGE=y.
>
> It does this by effectively looking up a region that is of size,
> request_size + PMD_SIZE, and aligning up the start to a PMD boundary.
>
> Commit 4ef9ad19e176 ("mm: huge_memory: don't force huge page alignment
> on 32 bit") opted out of this for 32bit due to regressions in mmap base
> randomization.
>
> Commit d4148aeab412 ("mm, mmap: limit THP alignment of anonymous
> mappings to PMD-aligned sizes") restricted this to only mmap sizes that
> are multiples of the PMD_SIZE due to reported regressions in some
> performance benchmarks -- which seemed mostly due to the reduced spatial
> locality of related mappings due to the forced PMD-alignment.
>
> Another unintended side effect has emerged: When a user specifies an mmap
> hint address, the THP alignment logic modifies the behavior, potentially
> ignoring the hint even if a sufficiently large gap exists at the requested
> hint location.
>
> Example Scenario:
>
> Consider the following simplified virtual address (VA) space:
>
>     ...
>
>     0x200000-0x400000 --- VMA A
>     0x400000-0x600000 --- Hole
>     0x600000-0x800000 --- VMA B
>
>     ...
>
> A call to mmap() with hint=0x400000 and len=0x200000 behaves differently:
>
>   - Before THP alignment: The requested region (size 0x200000) fits into
>     the gap at 0x400000, so the hint is respected.
>
>   - After alignment: The logic searches for a region of size
>     0x400000 (len + PMD_SIZE) starting at 0x400000.
>     This search fails due to the mapping at 0x600000 (VMA B), and the hint
>     is ignored, falling back to arch_get_unmapped_area[_topdown]().
>
> In general the hint is effectively ignored, if there is any
> existing mapping in the below range:
>
>      [mmap_hint + mmap_size, mmap_hint + mmap_size + PMD_SIZE)
>
> This changes the semantics of mmap hint; from ""Respect the hint if a
> sufficiently large gap exists at the requested location" to "Respect the
> hint only if an additional PMD-sized gap exists beyond the requested size".
>
> This has performance implications for allocators that allocate their heap
> using mmap but try to keep it "as contiguous as possible" by using the
> end of the exisiting heap as the address hint. With the new behavior
> it's more likely to get a much less contiguous heap, adding extra
> fragmentation and performance overhead.
>
> To restore the expected behavior; don't use thp_get_unmapped_area_vmflags()
> when the user provided a hint address.

Thanks for fixing it. I agree we should respect the hint address. But
this patch actually just fixed anonymous mapping and the file mappings
which don't support thp_get_unmapped_area(). So I think you should
move the hint check to __thp_get_unmapped_area().

And Vlastimil's fix d4148aeab412 ("mm, mmap: limit THP alignment of
anonymous mappings to PMD-aligned sizes") should be moved to there too
IMHO.

> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Yang Shi <yang@os.amperecomputing.com>
> Cc: Rik van Riel <riel@surriel.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Hans Boehm <hboehm@google.com>
> Cc: Lokesh Gidra <lokeshgidra@google.com>
> Cc: <stable@vger.kernel.org>
> Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries")
> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> ---
>  mm/mmap.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 79d541f1502b..2f01f1a8e304 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -901,6 +901,7 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
>         if (get_area) {
>                 addr = get_area(file, addr, len, pgoff, flags);
>         } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)
> +                  && !addr /* no hint */
>                    && IS_ALIGNED(len, PMD_SIZE)) {
>                 /* Ensures that larger anonymous mappings are THP aligned. */
>                 addr = thp_get_unmapped_area_vmflags(file, addr, len,
>
> base-commit: 2d5404caa8c7bb5c4e0435f94b28834ae5456623
> --
> 2.47.0.338.g60cca15819-goog
>
Kalesh Singh Nov. 15, 2024, 10:44 p.m. UTC | #3
On Fri, Nov 15, 2024 at 2:15 PM Yang Shi <shy828301@gmail.com> wrote:
>
> On Fri, Nov 15, 2024 at 1:52 PM Kalesh Singh <kaleshsingh@google.com> wrote:
> >
> > Commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
> > boundaries") updated __get_unmapped_area() to align the start address
> > for the VMA to a PMD boundary if CONFIG_TRANSPARENT_HUGEPAGE=y.
> >
> > It does this by effectively looking up a region that is of size,
> > request_size + PMD_SIZE, and aligning up the start to a PMD boundary.
> >
> > Commit 4ef9ad19e176 ("mm: huge_memory: don't force huge page alignment
> > on 32 bit") opted out of this for 32bit due to regressions in mmap base
> > randomization.
> >
> > Commit d4148aeab412 ("mm, mmap: limit THP alignment of anonymous
> > mappings to PMD-aligned sizes") restricted this to only mmap sizes that
> > are multiples of the PMD_SIZE due to reported regressions in some
> > performance benchmarks -- which seemed mostly due to the reduced spatial
> > locality of related mappings due to the forced PMD-alignment.
> >
> > Another unintended side effect has emerged: When a user specifies an mmap
> > hint address, the THP alignment logic modifies the behavior, potentially
> > ignoring the hint even if a sufficiently large gap exists at the requested
> > hint location.
> >
> > Example Scenario:
> >
> > Consider the following simplified virtual address (VA) space:
> >
> >     ...
> >
> >     0x200000-0x400000 --- VMA A
> >     0x400000-0x600000 --- Hole
> >     0x600000-0x800000 --- VMA B
> >
> >     ...
> >
> > A call to mmap() with hint=0x400000 and len=0x200000 behaves differently:
> >
> >   - Before THP alignment: The requested region (size 0x200000) fits into
> >     the gap at 0x400000, so the hint is respected.
> >
> >   - After alignment: The logic searches for a region of size
> >     0x400000 (len + PMD_SIZE) starting at 0x400000.
> >     This search fails due to the mapping at 0x600000 (VMA B), and the hint
> >     is ignored, falling back to arch_get_unmapped_area[_topdown]().
> >
> > In general the hint is effectively ignored, if there is any
> > existing mapping in the below range:
> >
> >      [mmap_hint + mmap_size, mmap_hint + mmap_size + PMD_SIZE)
> >
> > This changes the semantics of mmap hint; from ""Respect the hint if a
> > sufficiently large gap exists at the requested location" to "Respect the
> > hint only if an additional PMD-sized gap exists beyond the requested size".
> >
> > This has performance implications for allocators that allocate their heap
> > using mmap but try to keep it "as contiguous as possible" by using the
> > end of the exisiting heap as the address hint. With the new behavior
> > it's more likely to get a much less contiguous heap, adding extra
> > fragmentation and performance overhead.
> >
> > To restore the expected behavior; don't use thp_get_unmapped_area_vmflags()
> > when the user provided a hint address.
>
> Thanks for fixing it. I agree we should respect the hint address. But
> this patch actually just fixed anonymous mapping and the file mappings
> which don't support thp_get_unmapped_area(). So I think you should
> move the hint check to __thp_get_unmapped_area().
>
> And Vlastimil's fix d4148aeab412 ("mm, mmap: limit THP alignment of
> anonymous mappings to PMD-aligned sizes") should be moved to there too
> IMHO.

Thanks Yang, you are right, to cover the file systems that are using
this for their .get_unmapped_area(). I'll move this to where the 64
bit checks are done when posting v2.

Thanks,
Kalesh

>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Yang Shi <yang@os.amperecomputing.com>
> > Cc: Rik van Riel <riel@surriel.com>
> > Cc: Ryan Roberts <ryan.roberts@arm.com>
> > Cc: Suren Baghdasaryan <surenb@google.com>
> > Cc: Minchan Kim <minchan@kernel.org>
> > Cc: Hans Boehm <hboehm@google.com>
> > Cc: Lokesh Gidra <lokeshgidra@google.com>
> > Cc: <stable@vger.kernel.org>
> > Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries")
> > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > ---
> >  mm/mmap.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 79d541f1502b..2f01f1a8e304 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -901,6 +901,7 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> >         if (get_area) {
> >                 addr = get_area(file, addr, len, pgoff, flags);
> >         } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)
> > +                  && !addr /* no hint */
> >                    && IS_ALIGNED(len, PMD_SIZE)) {
> >                 /* Ensures that larger anonymous mappings are THP aligned. */
> >                 addr = thp_get_unmapped_area_vmflags(file, addr, len,
> >
> > base-commit: 2d5404caa8c7bb5c4e0435f94b28834ae5456623
> > --
> > 2.47.0.338.g60cca15819-goog
> >
Vlastimil Babka Nov. 17, 2024, 11:12 a.m. UTC | #4
On 11/15/24 23:15, Yang Shi wrote:
> On Fri, Nov 15, 2024 at 1:52 PM Kalesh Singh <kaleshsingh@google.com> wrote:
>>
>> Commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
>> boundaries") updated __get_unmapped_area() to align the start address
>> for the VMA to a PMD boundary if CONFIG_TRANSPARENT_HUGEPAGE=y.
>>
>> It does this by effectively looking up a region that is of size,
>> request_size + PMD_SIZE, and aligning up the start to a PMD boundary.
>>
>> Commit 4ef9ad19e176 ("mm: huge_memory: don't force huge page alignment
>> on 32 bit") opted out of this for 32bit due to regressions in mmap base
>> randomization.
>>
>> Commit d4148aeab412 ("mm, mmap: limit THP alignment of anonymous
>> mappings to PMD-aligned sizes") restricted this to only mmap sizes that
>> are multiples of the PMD_SIZE due to reported regressions in some
>> performance benchmarks -- which seemed mostly due to the reduced spatial
>> locality of related mappings due to the forced PMD-alignment.
>>
>> Another unintended side effect has emerged: When a user specifies an mmap
>> hint address, the THP alignment logic modifies the behavior, potentially
>> ignoring the hint even if a sufficiently large gap exists at the requested
>> hint location.
>>
>> Example Scenario:
>>
>> Consider the following simplified virtual address (VA) space:
>>
>>     ...
>>
>>     0x200000-0x400000 --- VMA A
>>     0x400000-0x600000 --- Hole
>>     0x600000-0x800000 --- VMA B
>>
>>     ...
>>
>> A call to mmap() with hint=0x400000 and len=0x200000 behaves differently:
>>
>>   - Before THP alignment: The requested region (size 0x200000) fits into
>>     the gap at 0x400000, so the hint is respected.
>>
>>   - After alignment: The logic searches for a region of size
>>     0x400000 (len + PMD_SIZE) starting at 0x400000.
>>     This search fails due to the mapping at 0x600000 (VMA B), and the hint
>>     is ignored, falling back to arch_get_unmapped_area[_topdown]().

Hmm looks like the search is not done in the optimal way regardless of
whether or not it ignores a hint - it should be able to find the hole, no?

>> In general the hint is effectively ignored, if there is any
>> existing mapping in the below range:
>>
>>      [mmap_hint + mmap_size, mmap_hint + mmap_size + PMD_SIZE)
>>
>> This changes the semantics of mmap hint; from ""Respect the hint if a
>> sufficiently large gap exists at the requested location" to "Respect the
>> hint only if an additional PMD-sized gap exists beyond the requested size".
>>
>> This has performance implications for allocators that allocate their heap
>> using mmap but try to keep it "as contiguous as possible" by using the
>> end of the exisiting heap as the address hint. With the new behavior
>> it's more likely to get a much less contiguous heap, adding extra
>> fragmentation and performance overhead.
>>
>> To restore the expected behavior; don't use thp_get_unmapped_area_vmflags()
>> when the user provided a hint address.

Agreed, the hint should take precendence.

> Thanks for fixing it. I agree we should respect the hint address. But
> this patch actually just fixed anonymous mapping and the file mappings
> which don't support thp_get_unmapped_area(). So I think you should
> move the hint check to __thp_get_unmapped_area().
> 
> And Vlastimil's fix d4148aeab412 ("mm, mmap: limit THP alignment of
> anonymous mappings to PMD-aligned sizes") should be moved to there too
> IMHO.

This was brought up, but I didn't want to do it as part of the stable fix as
that would change even situations that Rik's change didn't.
If the mmap hint change is another stable hotfix, I wouldn't conflate it
either. But we can try it for further development. But careful about just
moving the code as-is, the file-based mappings are different than anonymous
memory and I believe file offsets matter:

https://lore.kernel.org/all/9d7c73f6-1e1a-458b-93c6-3b44959022e0@suse.cz/

https://lore.kernel.org/all/5f7a49e8-0416-4648-a704-a7a67e8cd894@suse.cz/

>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Cc: Yang Shi <yang@os.amperecomputing.com>
>> Cc: Rik van Riel <riel@surriel.com>
>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>> Cc: Suren Baghdasaryan <surenb@google.com>
>> Cc: Minchan Kim <minchan@kernel.org>
>> Cc: Hans Boehm <hboehm@google.com>
>> Cc: Lokesh Gidra <lokeshgidra@google.com>
>> Cc: <stable@vger.kernel.org>
>> Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries")
>> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

>> ---
>>  mm/mmap.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index 79d541f1502b..2f01f1a8e304 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -901,6 +901,7 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
>>         if (get_area) {
>>                 addr = get_area(file, addr, len, pgoff, flags);
>>         } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)
>> +                  && !addr /* no hint */
>>                    && IS_ALIGNED(len, PMD_SIZE)) {
>>                 /* Ensures that larger anonymous mappings are THP aligned. */
>>                 addr = thp_get_unmapped_area_vmflags(file, addr, len,
>>
>> base-commit: 2d5404caa8c7bb5c4e0435f94b28834ae5456623
>> --
>> 2.47.0.338.g60cca15819-goog
>>
Yang Shi Nov. 18, 2024, 5:05 p.m. UTC | #5
On Sun, Nov 17, 2024 at 3:12 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 11/15/24 23:15, Yang Shi wrote:
> > On Fri, Nov 15, 2024 at 1:52 PM Kalesh Singh <kaleshsingh@google.com> wrote:
> >>
> >> Commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
> >> boundaries") updated __get_unmapped_area() to align the start address
> >> for the VMA to a PMD boundary if CONFIG_TRANSPARENT_HUGEPAGE=y.
> >>
> >> It does this by effectively looking up a region that is of size,
> >> request_size + PMD_SIZE, and aligning up the start to a PMD boundary.
> >>
> >> Commit 4ef9ad19e176 ("mm: huge_memory: don't force huge page alignment
> >> on 32 bit") opted out of this for 32bit due to regressions in mmap base
> >> randomization.
> >>
> >> Commit d4148aeab412 ("mm, mmap: limit THP alignment of anonymous
> >> mappings to PMD-aligned sizes") restricted this to only mmap sizes that
> >> are multiples of the PMD_SIZE due to reported regressions in some
> >> performance benchmarks -- which seemed mostly due to the reduced spatial
> >> locality of related mappings due to the forced PMD-alignment.
> >>
> >> Another unintended side effect has emerged: When a user specifies an mmap
> >> hint address, the THP alignment logic modifies the behavior, potentially
> >> ignoring the hint even if a sufficiently large gap exists at the requested
> >> hint location.
> >>
> >> Example Scenario:
> >>
> >> Consider the following simplified virtual address (VA) space:
> >>
> >>     ...
> >>
> >>     0x200000-0x400000 --- VMA A
> >>     0x400000-0x600000 --- Hole
> >>     0x600000-0x800000 --- VMA B
> >>
> >>     ...
> >>
> >> A call to mmap() with hint=0x400000 and len=0x200000 behaves differently:
> >>
> >>   - Before THP alignment: The requested region (size 0x200000) fits into
> >>     the gap at 0x400000, so the hint is respected.
> >>
> >>   - After alignment: The logic searches for a region of size
> >>     0x400000 (len + PMD_SIZE) starting at 0x400000.
> >>     This search fails due to the mapping at 0x600000 (VMA B), and the hint
> >>     is ignored, falling back to arch_get_unmapped_area[_topdown]().
>
> Hmm looks like the search is not done in the optimal way regardless of
> whether or not it ignores a hint - it should be able to find the hole, no?
>
> >> In general the hint is effectively ignored, if there is any
> >> existing mapping in the below range:
> >>
> >>      [mmap_hint + mmap_size, mmap_hint + mmap_size + PMD_SIZE)
> >>
> >> This changes the semantics of mmap hint; from ""Respect the hint if a
> >> sufficiently large gap exists at the requested location" to "Respect the
> >> hint only if an additional PMD-sized gap exists beyond the requested size".
> >>
> >> This has performance implications for allocators that allocate their heap
> >> using mmap but try to keep it "as contiguous as possible" by using the
> >> end of the exisiting heap as the address hint. With the new behavior
> >> it's more likely to get a much less contiguous heap, adding extra
> >> fragmentation and performance overhead.
> >>
> >> To restore the expected behavior; don't use thp_get_unmapped_area_vmflags()
> >> when the user provided a hint address.
>
> Agreed, the hint should take precendence.
>
> > Thanks for fixing it. I agree we should respect the hint address. But
> > this patch actually just fixed anonymous mapping and the file mappings
> > which don't support thp_get_unmapped_area(). So I think you should
> > move the hint check to __thp_get_unmapped_area().
> >
> > And Vlastimil's fix d4148aeab412 ("mm, mmap: limit THP alignment of
> > anonymous mappings to PMD-aligned sizes") should be moved to there too
> > IMHO.
>
> This was brought up, but I didn't want to do it as part of the stable fix as
> that would change even situations that Rik's change didn't.
> If the mmap hint change is another stable hotfix, I wouldn't conflate it
> either. But we can try it for further development. But careful about just
> moving the code as-is, the file-based mappings are different than anonymous
> memory and I believe file offsets matter:
>
> https://lore.kernel.org/all/9d7c73f6-1e1a-458b-93c6-3b44959022e0@suse.cz/
>
> https://lore.kernel.org/all/5f7a49e8-0416-4648-a704-a7a67e8cd894@suse.cz/

Did some research about the history of the code, I found this commit:

97d3d0f9a1cf ("mm/huge_memory.c: thp: fix conflict of above-47bit hint
address and PMD alignment"), it tried to fix "the function would not
try to allocate PMD-aligned area if *any* hint address specified."
It was for file mapping back then since anonymous mapping THP
alignment was not supported yet.

But it seems like this patch somehow tried to do something reverse. It
may not be correct either.

With Vlastimis's fix, we just try to make the address THP aligned for
anonymous mapping when the size is PMD aligned. So we don't need to
take into account the padding for anonymous mapping anymore.

So IIUC we should do something like:

@@ -1085,7 +1085,11 @@ static unsigned long
__thp_get_unmapped_area(struct file *filp,
        if (off_end <= off_align || (off_end - off_align) < size)
                return 0;

-       len_pad = len + size;
+       if (filp)
+               len_pad = len + size;
+       else
+               len_pad = len;
+
        if (len_pad < len || (off + len_pad) < off)
                return 0;

>
> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> Cc: Vlastimil Babka <vbabka@suse.cz>
> >> Cc: Yang Shi <yang@os.amperecomputing.com>
> >> Cc: Rik van Riel <riel@surriel.com>
> >> Cc: Ryan Roberts <ryan.roberts@arm.com>
> >> Cc: Suren Baghdasaryan <surenb@google.com>
> >> Cc: Minchan Kim <minchan@kernel.org>
> >> Cc: Hans Boehm <hboehm@google.com>
> >> Cc: Lokesh Gidra <lokeshgidra@google.com>
> >> Cc: <stable@vger.kernel.org>
> >> Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries")
> >> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>
> >> ---
> >>  mm/mmap.c | 1 +
> >>  1 file changed, 1 insertion(+)
> >>
> >> diff --git a/mm/mmap.c b/mm/mmap.c
> >> index 79d541f1502b..2f01f1a8e304 100644
> >> --- a/mm/mmap.c
> >> +++ b/mm/mmap.c
> >> @@ -901,6 +901,7 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> >>         if (get_area) {
> >>                 addr = get_area(file, addr, len, pgoff, flags);
> >>         } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)
> >> +                  && !addr /* no hint */
> >>                    && IS_ALIGNED(len, PMD_SIZE)) {
> >>                 /* Ensures that larger anonymous mappings are THP aligned. */
> >>                 addr = thp_get_unmapped_area_vmflags(file, addr, len,
> >>
> >> base-commit: 2d5404caa8c7bb5c4e0435f94b28834ae5456623
> >> --
> >> 2.47.0.338.g60cca15819-goog
> >>
>
Kalesh Singh Nov. 18, 2024, 5:52 p.m. UTC | #6
On Mon, Nov 18, 2024 at 9:05 AM Yang Shi <shy828301@gmail.com> wrote:
>
> On Sun, Nov 17, 2024 at 3:12 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> > On 11/15/24 23:15, Yang Shi wrote:
> > > On Fri, Nov 15, 2024 at 1:52 PM Kalesh Singh <kaleshsingh@google.com> wrote:
> > >>
> > >> Commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
> > >> boundaries") updated __get_unmapped_area() to align the start address
> > >> for the VMA to a PMD boundary if CONFIG_TRANSPARENT_HUGEPAGE=y.
> > >>
> > >> It does this by effectively looking up a region that is of size,
> > >> request_size + PMD_SIZE, and aligning up the start to a PMD boundary.
> > >>
> > >> Commit 4ef9ad19e176 ("mm: huge_memory: don't force huge page alignment
> > >> on 32 bit") opted out of this for 32bit due to regressions in mmap base
> > >> randomization.
> > >>
> > >> Commit d4148aeab412 ("mm, mmap: limit THP alignment of anonymous
> > >> mappings to PMD-aligned sizes") restricted this to only mmap sizes that
> > >> are multiples of the PMD_SIZE due to reported regressions in some
> > >> performance benchmarks -- which seemed mostly due to the reduced spatial
> > >> locality of related mappings due to the forced PMD-alignment.
> > >>
> > >> Another unintended side effect has emerged: When a user specifies an mmap
> > >> hint address, the THP alignment logic modifies the behavior, potentially
> > >> ignoring the hint even if a sufficiently large gap exists at the requested
> > >> hint location.
> > >>
> > >> Example Scenario:
> > >>
> > >> Consider the following simplified virtual address (VA) space:
> > >>
> > >>     ...
> > >>
> > >>     0x200000-0x400000 --- VMA A
> > >>     0x400000-0x600000 --- Hole
> > >>     0x600000-0x800000 --- VMA B
> > >>
> > >>     ...
> > >>
> > >> A call to mmap() with hint=0x400000 and len=0x200000 behaves differently:
> > >>
> > >>   - Before THP alignment: The requested region (size 0x200000) fits into
> > >>     the gap at 0x400000, so the hint is respected.
> > >>
> > >>   - After alignment: The logic searches for a region of size
> > >>     0x400000 (len + PMD_SIZE) starting at 0x400000.
> > >>     This search fails due to the mapping at 0x600000 (VMA B), and the hint
> > >>     is ignored, falling back to arch_get_unmapped_area[_topdown]().
> >

Hi all, Thanks for the reviews.

> > Hmm looks like the search is not done in the optimal way regardless of
> > whether or not it ignores a hint - it should be able to find the hole, no?

It's not able to find the hole in the example case because the size we
are looking for is now
(requested size + padding len) which is larger than the hole we have
at the hint address.

> >
> > >> In general the hint is effectively ignored, if there is any
> > >> existing mapping in the below range:
> > >>
> > >>      [mmap_hint + mmap_size, mmap_hint + mmap_size + PMD_SIZE)
> > >>
> > >> This changes the semantics of mmap hint; from ""Respect the hint if a
> > >> sufficiently large gap exists at the requested location" to "Respect the
> > >> hint only if an additional PMD-sized gap exists beyond the requested size".
> > >>
> > >> This has performance implications for allocators that allocate their heap
> > >> using mmap but try to keep it "as contiguous as possible" by using the
> > >> end of the exisiting heap as the address hint. With the new behavior
> > >> it's more likely to get a much less contiguous heap, adding extra
> > >> fragmentation and performance overhead.
> > >>
> > >> To restore the expected behavior; don't use thp_get_unmapped_area_vmflags()
> > >> when the user provided a hint address.
> >
> > Agreed, the hint should take precendence.
> >
> > > Thanks for fixing it. I agree we should respect the hint address. But
> > > this patch actually just fixed anonymous mapping and the file mappings
> > > which don't support thp_get_unmapped_area(). So I think you should
> > > move the hint check to __thp_get_unmapped_area().
> > >
> > > And Vlastimil's fix d4148aeab412 ("mm, mmap: limit THP alignment of
> > > anonymous mappings to PMD-aligned sizes") should be moved to there too
> > > IMHO.
> >
> > This was brought up, but I didn't want to do it as part of the stable fix as
> > that would change even situations that Rik's change didn't.
> > If the mmap hint change is another stable hotfix, I wouldn't conflate it
> > either. But we can try it for further development. But careful about just
> > moving the code as-is, the file-based mappings are different than anonymous
> > memory and I believe file offsets matter:
> >
> > https://lore.kernel.org/all/9d7c73f6-1e1a-458b-93c6-3b44959022e0@suse.cz/
> >
> > https://lore.kernel.org/all/5f7a49e8-0416-4648-a704-a7a67e8cd894@suse.cz/
>

I see, so I think we should keep the check here to address the
immediate regression for anonymous mappings.

I believe what we would need to address this longer term is to have
vma_iter_lowest() [1] vma_iter_highest() [2] take into account the
alignment when doing the search. That way we don't need to inflate the
search size to facilitate the manual alignment after the fact. I
haven't looked too too deeply into this, maybe Liam has some ideas
about that?

[1] https://github.com/torvalds/linux/blob/v6.12-rc7/mm/vma.h#L420

[2] https://github.com/torvalds/linux/blob/v6.12-rc7/mm/vma.h#L426

> Did some research about the history of the code, I found this commit:
>
> 97d3d0f9a1cf ("mm/huge_memory.c: thp: fix conflict of above-47bit hint
> address and PMD alignment"), it tried to fix "the function would not
> try to allocate PMD-aligned area if *any* hint address specified."
> It was for file mapping back then since anonymous mapping THP
> alignment was not supported yet.
>
> But it seems like this patch somehow tried to do something reverse. It
> may not be correct either.
>

IIUC Kirill's patch is doing the right thing (mostly), i.e. it will
return the hint address (without any rounding to PMD alignment). The
case it doesn't handle is what I mentioned above, where we aren't able
to find the hole at the hint address in the first place because the
hole is smaller than (size + padding len)

> With Vlastimis's fix, we just try to make the address THP aligned for
> anonymous mapping when the size is PMD aligned. So we don't need to
> take into account the padding for anonymous mapping anymore.
>

We still need to have padding length because PMD alignment of the size
doesn't mean that the start address returned by the search will be PMD
aligned. Inherently those are only PAGE aligned.

Thanks,
Kalesh

> So IIUC we should do something like:
>
> @@ -1085,7 +1085,11 @@ static unsigned long
> __thp_get_unmapped_area(struct file *filp,
>         if (off_end <= off_align || (off_end - off_align) < size)
>                 return 0;
>
> -       len_pad = len + size;
> +       if (filp)
> +               len_pad = len + size;
> +       else
> +               len_pad = len;
> +
>         if (len_pad < len || (off + len_pad) < off)
>                 return 0;
>
> >
> > >> Cc: Andrew Morton <akpm@linux-foundation.org>
> > >> Cc: Vlastimil Babka <vbabka@suse.cz>
> > >> Cc: Yang Shi <yang@os.amperecomputing.com>
> > >> Cc: Rik van Riel <riel@surriel.com>
> > >> Cc: Ryan Roberts <ryan.roberts@arm.com>
> > >> Cc: Suren Baghdasaryan <surenb@google.com>
> > >> Cc: Minchan Kim <minchan@kernel.org>
> > >> Cc: Hans Boehm <hboehm@google.com>
> > >> Cc: Lokesh Gidra <lokeshgidra@google.com>
> > >> Cc: <stable@vger.kernel.org>
> > >> Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries")
> > >> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> >
> > Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> >
> > >> ---
> > >>  mm/mmap.c | 1 +
> > >>  1 file changed, 1 insertion(+)
> > >>
> > >> diff --git a/mm/mmap.c b/mm/mmap.c
> > >> index 79d541f1502b..2f01f1a8e304 100644
> > >> --- a/mm/mmap.c
> > >> +++ b/mm/mmap.c
> > >> @@ -901,6 +901,7 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> > >>         if (get_area) {
> > >>                 addr = get_area(file, addr, len, pgoff, flags);
> > >>         } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)
> > >> +                  && !addr /* no hint */
> > >>                    && IS_ALIGNED(len, PMD_SIZE)) {
> > >>                 /* Ensures that larger anonymous mappings are THP aligned. */
> > >>                 addr = thp_get_unmapped_area_vmflags(file, addr, len,
> > >>
> > >> base-commit: 2d5404caa8c7bb5c4e0435f94b28834ae5456623
> > >> --
> > >> 2.47.0.338.g60cca15819-goog
> > >>
> >
Yang Shi Nov. 18, 2024, 9:44 p.m. UTC | #7
On Mon, Nov 18, 2024 at 9:52 AM Kalesh Singh <kaleshsingh@google.com> wrote:
>
> On Mon, Nov 18, 2024 at 9:05 AM Yang Shi <shy828301@gmail.com> wrote:
> >
> > On Sun, Nov 17, 2024 at 3:12 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> > >
> > > On 11/15/24 23:15, Yang Shi wrote:
> > > > On Fri, Nov 15, 2024 at 1:52 PM Kalesh Singh <kaleshsingh@google.com> wrote:
> > > >>
> > > >> Commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
> > > >> boundaries") updated __get_unmapped_area() to align the start address
> > > >> for the VMA to a PMD boundary if CONFIG_TRANSPARENT_HUGEPAGE=y.
> > > >>
> > > >> It does this by effectively looking up a region that is of size,
> > > >> request_size + PMD_SIZE, and aligning up the start to a PMD boundary.
> > > >>
> > > >> Commit 4ef9ad19e176 ("mm: huge_memory: don't force huge page alignment
> > > >> on 32 bit") opted out of this for 32bit due to regressions in mmap base
> > > >> randomization.
> > > >>
> > > >> Commit d4148aeab412 ("mm, mmap: limit THP alignment of anonymous
> > > >> mappings to PMD-aligned sizes") restricted this to only mmap sizes that
> > > >> are multiples of the PMD_SIZE due to reported regressions in some
> > > >> performance benchmarks -- which seemed mostly due to the reduced spatial
> > > >> locality of related mappings due to the forced PMD-alignment.
> > > >>
> > > >> Another unintended side effect has emerged: When a user specifies an mmap
> > > >> hint address, the THP alignment logic modifies the behavior, potentially
> > > >> ignoring the hint even if a sufficiently large gap exists at the requested
> > > >> hint location.
> > > >>
> > > >> Example Scenario:
> > > >>
> > > >> Consider the following simplified virtual address (VA) space:
> > > >>
> > > >>     ...
> > > >>
> > > >>     0x200000-0x400000 --- VMA A
> > > >>     0x400000-0x600000 --- Hole
> > > >>     0x600000-0x800000 --- VMA B
> > > >>
> > > >>     ...
> > > >>
> > > >> A call to mmap() with hint=0x400000 and len=0x200000 behaves differently:
> > > >>
> > > >>   - Before THP alignment: The requested region (size 0x200000) fits into
> > > >>     the gap at 0x400000, so the hint is respected.
> > > >>
> > > >>   - After alignment: The logic searches for a region of size
> > > >>     0x400000 (len + PMD_SIZE) starting at 0x400000.
> > > >>     This search fails due to the mapping at 0x600000 (VMA B), and the hint
> > > >>     is ignored, falling back to arch_get_unmapped_area[_topdown]().
> > >
>
> Hi all, Thanks for the reviews.
>
> > > Hmm looks like the search is not done in the optimal way regardless of
> > > whether or not it ignores a hint - it should be able to find the hole, no?
>
> It's not able to find the hole in the example case because the size we
> are looking for is now
> (requested size + padding len) which is larger than the hole we have
> at the hint address.
>
> > >
> > > >> In general the hint is effectively ignored, if there is any
> > > >> existing mapping in the below range:
> > > >>
> > > >>      [mmap_hint + mmap_size, mmap_hint + mmap_size + PMD_SIZE)
> > > >>
> > > >> This changes the semantics of mmap hint; from ""Respect the hint if a
> > > >> sufficiently large gap exists at the requested location" to "Respect the
> > > >> hint only if an additional PMD-sized gap exists beyond the requested size".
> > > >>
> > > >> This has performance implications for allocators that allocate their heap
> > > >> using mmap but try to keep it "as contiguous as possible" by using the
> > > >> end of the exisiting heap as the address hint. With the new behavior
> > > >> it's more likely to get a much less contiguous heap, adding extra
> > > >> fragmentation and performance overhead.
> > > >>
> > > >> To restore the expected behavior; don't use thp_get_unmapped_area_vmflags()
> > > >> when the user provided a hint address.
> > >
> > > Agreed, the hint should take precendence.
> > >
> > > > Thanks for fixing it. I agree we should respect the hint address. But
> > > > this patch actually just fixed anonymous mapping and the file mappings
> > > > which don't support thp_get_unmapped_area(). So I think you should
> > > > move the hint check to __thp_get_unmapped_area().
> > > >
> > > > And Vlastimil's fix d4148aeab412 ("mm, mmap: limit THP alignment of
> > > > anonymous mappings to PMD-aligned sizes") should be moved to there too
> > > > IMHO.
> > >
> > > This was brought up, but I didn't want to do it as part of the stable fix as
> > > that would change even situations that Rik's change didn't.
> > > If the mmap hint change is another stable hotfix, I wouldn't conflate it
> > > either. But we can try it for further development. But careful about just
> > > moving the code as-is, the file-based mappings are different than anonymous
> > > memory and I believe file offsets matter:
> > >
> > > https://lore.kernel.org/all/9d7c73f6-1e1a-458b-93c6-3b44959022e0@suse.cz/
> > >
> > > https://lore.kernel.org/all/5f7a49e8-0416-4648-a704-a7a67e8cd894@suse.cz/
> >
>
> I see, so I think we should keep the check here to address the
> immediate regression for anonymous mappings.
>
> I believe what we would need to address this longer term is to have
> vma_iter_lowest() [1] vma_iter_highest() [2] take into account the
> alignment when doing the search. That way we don't need to inflate the
> search size to facilitate the manual alignment after the fact. I
> haven't looked too too deeply into this, maybe Liam has some ideas
> about that?
>
> [1] https://github.com/torvalds/linux/blob/v6.12-rc7/mm/vma.h#L420
>
> [2] https://github.com/torvalds/linux/blob/v6.12-rc7/mm/vma.h#L426
>
> > Did some research about the history of the code, I found this commit:
> >
> > 97d3d0f9a1cf ("mm/huge_memory.c: thp: fix conflict of above-47bit hint
> > address and PMD alignment"), it tried to fix "the function would not
> > try to allocate PMD-aligned area if *any* hint address specified."
> > It was for file mapping back then since anonymous mapping THP
> > alignment was not supported yet.
> >
> > But it seems like this patch somehow tried to do something reverse. It
> > may not be correct either.
> >
>
> IIUC Kirill's patch is doing the right thing (mostly), i.e. it will
> return the hint address (without any rounding to PMD alignment). The
> case it doesn't handle is what I mentioned above, where we aren't able
> to find the hole at the hint address in the first place because the
> hole is smaller than (size + padding len)

Yes. But my point is your fix (just simply skip PMD alignment when
hint is specified) actually broke what Kirill fixed IIUC.

>
> > With Vlastimis's fix, we just try to make the address THP aligned for
> > anonymous mapping when the size is PMD aligned. So we don't need to
> > take into account the padding for anonymous mapping anymore.
> >
>
> We still need to have padding length because PMD alignment of the size
> doesn't mean that the start address returned by the search will be PMD
> aligned. Inherently those are only PAGE aligned.

Yes, you are right, I overlooked this. I think we can do this in two
passes. Use len w/o padding in the first pass. If the returned address
equals the hint or it is already PMD aligned, just return it.
Otherwise it means there is no hole with suitable size and alignment.
In the second pass, we redo it with padding. Just off the top of my
head, others may have better ideas.

>
> Thanks,
> Kalesh
>
> > So IIUC we should do something like:
> >
> > @@ -1085,7 +1085,11 @@ static unsigned long
> > __thp_get_unmapped_area(struct file *filp,
> >         if (off_end <= off_align || (off_end - off_align) < size)
> >                 return 0;
> >
> > -       len_pad = len + size;
> > +       if (filp)
> > +               len_pad = len + size;
> > +       else
> > +               len_pad = len;
> > +
> >         if (len_pad < len || (off + len_pad) < off)
> >                 return 0;
> >
> > >
> > > >> Cc: Andrew Morton <akpm@linux-foundation.org>
> > > >> Cc: Vlastimil Babka <vbabka@suse.cz>
> > > >> Cc: Yang Shi <yang@os.amperecomputing.com>
> > > >> Cc: Rik van Riel <riel@surriel.com>
> > > >> Cc: Ryan Roberts <ryan.roberts@arm.com>
> > > >> Cc: Suren Baghdasaryan <surenb@google.com>
> > > >> Cc: Minchan Kim <minchan@kernel.org>
> > > >> Cc: Hans Boehm <hboehm@google.com>
> > > >> Cc: Lokesh Gidra <lokeshgidra@google.com>
> > > >> Cc: <stable@vger.kernel.org>
> > > >> Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries")
> > > >> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > >
> > > Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> > >
> > > >> ---
> > > >>  mm/mmap.c | 1 +
> > > >>  1 file changed, 1 insertion(+)
> > > >>
> > > >> diff --git a/mm/mmap.c b/mm/mmap.c
> > > >> index 79d541f1502b..2f01f1a8e304 100644
> > > >> --- a/mm/mmap.c
> > > >> +++ b/mm/mmap.c
> > > >> @@ -901,6 +901,7 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> > > >>         if (get_area) {
> > > >>                 addr = get_area(file, addr, len, pgoff, flags);
> > > >>         } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)
> > > >> +                  && !addr /* no hint */
> > > >>                    && IS_ALIGNED(len, PMD_SIZE)) {
> > > >>                 /* Ensures that larger anonymous mappings are THP aligned. */
> > > >>                 addr = thp_get_unmapped_area_vmflags(file, addr, len,
> > > >>
> > > >> base-commit: 2d5404caa8c7bb5c4e0435f94b28834ae5456623
> > > >> --
> > > >> 2.47.0.338.g60cca15819-goog
> > > >>
> > >
Kalesh Singh Nov. 18, 2024, 10:04 p.m. UTC | #8
On Mon, Nov 18, 2024 at 1:44 PM Yang Shi <shy828301@gmail.com> wrote:
>
> On Mon, Nov 18, 2024 at 9:52 AM Kalesh Singh <kaleshsingh@google.com> wrote:
> >
> > On Mon, Nov 18, 2024 at 9:05 AM Yang Shi <shy828301@gmail.com> wrote:
> > >
> > > On Sun, Nov 17, 2024 at 3:12 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> > > >
> > > > On 11/15/24 23:15, Yang Shi wrote:
> > > > > On Fri, Nov 15, 2024 at 1:52 PM Kalesh Singh <kaleshsingh@google.com> wrote:
> > > > >>
> > > > >> Commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
> > > > >> boundaries") updated __get_unmapped_area() to align the start address
> > > > >> for the VMA to a PMD boundary if CONFIG_TRANSPARENT_HUGEPAGE=y.
> > > > >>
> > > > >> It does this by effectively looking up a region that is of size,
> > > > >> request_size + PMD_SIZE, and aligning up the start to a PMD boundary.
> > > > >>
> > > > >> Commit 4ef9ad19e176 ("mm: huge_memory: don't force huge page alignment
> > > > >> on 32 bit") opted out of this for 32bit due to regressions in mmap base
> > > > >> randomization.
> > > > >>
> > > > >> Commit d4148aeab412 ("mm, mmap: limit THP alignment of anonymous
> > > > >> mappings to PMD-aligned sizes") restricted this to only mmap sizes that
> > > > >> are multiples of the PMD_SIZE due to reported regressions in some
> > > > >> performance benchmarks -- which seemed mostly due to the reduced spatial
> > > > >> locality of related mappings due to the forced PMD-alignment.
> > > > >>
> > > > >> Another unintended side effect has emerged: When a user specifies an mmap
> > > > >> hint address, the THP alignment logic modifies the behavior, potentially
> > > > >> ignoring the hint even if a sufficiently large gap exists at the requested
> > > > >> hint location.
> > > > >>
> > > > >> Example Scenario:
> > > > >>
> > > > >> Consider the following simplified virtual address (VA) space:
> > > > >>
> > > > >>     ...
> > > > >>
> > > > >>     0x200000-0x400000 --- VMA A
> > > > >>     0x400000-0x600000 --- Hole
> > > > >>     0x600000-0x800000 --- VMA B
> > > > >>
> > > > >>     ...
> > > > >>
> > > > >> A call to mmap() with hint=0x400000 and len=0x200000 behaves differently:
> > > > >>
> > > > >>   - Before THP alignment: The requested region (size 0x200000) fits into
> > > > >>     the gap at 0x400000, so the hint is respected.
> > > > >>
> > > > >>   - After alignment: The logic searches for a region of size
> > > > >>     0x400000 (len + PMD_SIZE) starting at 0x400000.
> > > > >>     This search fails due to the mapping at 0x600000 (VMA B), and the hint
> > > > >>     is ignored, falling back to arch_get_unmapped_area[_topdown]().
> > > >
> >
> > Hi all, Thanks for the reviews.
> >
> > > > Hmm looks like the search is not done in the optimal way regardless of
> > > > whether or not it ignores a hint - it should be able to find the hole, no?
> >
> > It's not able to find the hole in the example case because the size we
> > are looking for is now
> > (requested size + padding len) which is larger than the hole we have
> > at the hint address.
> >
> > > >
> > > > >> In general the hint is effectively ignored, if there is any
> > > > >> existing mapping in the below range:
> > > > >>
> > > > >>      [mmap_hint + mmap_size, mmap_hint + mmap_size + PMD_SIZE)
> > > > >>
> > > > >> This changes the semantics of mmap hint; from ""Respect the hint if a
> > > > >> sufficiently large gap exists at the requested location" to "Respect the
> > > > >> hint only if an additional PMD-sized gap exists beyond the requested size".
> > > > >>
> > > > >> This has performance implications for allocators that allocate their heap
> > > > >> using mmap but try to keep it "as contiguous as possible" by using the
> > > > >> end of the exisiting heap as the address hint. With the new behavior
> > > > >> it's more likely to get a much less contiguous heap, adding extra
> > > > >> fragmentation and performance overhead.
> > > > >>
> > > > >> To restore the expected behavior; don't use thp_get_unmapped_area_vmflags()
> > > > >> when the user provided a hint address.
> > > >
> > > > Agreed, the hint should take precendence.
> > > >
> > > > > Thanks for fixing it. I agree we should respect the hint address. But
> > > > > this patch actually just fixed anonymous mapping and the file mappings
> > > > > which don't support thp_get_unmapped_area(). So I think you should
> > > > > move the hint check to __thp_get_unmapped_area().
> > > > >
> > > > > And Vlastimil's fix d4148aeab412 ("mm, mmap: limit THP alignment of
> > > > > anonymous mappings to PMD-aligned sizes") should be moved to there too
> > > > > IMHO.
> > > >
> > > > This was brought up, but I didn't want to do it as part of the stable fix as
> > > > that would change even situations that Rik's change didn't.
> > > > If the mmap hint change is another stable hotfix, I wouldn't conflate it
> > > > either. But we can try it for further development. But careful about just
> > > > moving the code as-is, the file-based mappings are different than anonymous
> > > > memory and I believe file offsets matter:
> > > >
> > > > https://lore.kernel.org/all/9d7c73f6-1e1a-458b-93c6-3b44959022e0@suse.cz/
> > > >
> > > > https://lore.kernel.org/all/5f7a49e8-0416-4648-a704-a7a67e8cd894@suse.cz/
> > >
> >
> > I see, so I think we should keep the check here to address the
> > immediate regression for anonymous mappings.
> >
> > I believe what we would need to address this longer term is to have
> > vma_iter_lowest() [1] vma_iter_highest() [2] take into account the
> > alignment when doing the search. That way we don't need to inflate the
> > search size to facilitate the manual alignment after the fact. I
> > haven't looked too too deeply into this, maybe Liam has some ideas
> > about that?
> >
> > [1] https://github.com/torvalds/linux/blob/v6.12-rc7/mm/vma.h#L420
> >
> > [2] https://github.com/torvalds/linux/blob/v6.12-rc7/mm/vma.h#L426
> >
> > > Did some research about the history of the code, I found this commit:
> > >
> > > 97d3d0f9a1cf ("mm/huge_memory.c: thp: fix conflict of above-47bit hint
> > > address and PMD alignment"), it tried to fix "the function would not
> > > try to allocate PMD-aligned area if *any* hint address specified."
> > > It was for file mapping back then since anonymous mapping THP
> > > alignment was not supported yet.
> > >
> > > But it seems like this patch somehow tried to do something reverse. It
> > > may not be correct either.
> > >
> >
> > IIUC Kirill's patch is doing the right thing (mostly), i.e. it will
> > return the hint address (without any rounding to PMD alignment). The
> > case it doesn't handle is what I mentioned above, where we aren't able
> > to find the hole at the hint address in the first place because the
> > hole is smaller than (size + padding len)
>
> Yes. But my point is your fix (just simply skip PMD alignment when
> hint is specified) actually broke what Kirill fixed IIUC.

Hi Yang,

It's true the PMD alignment is skipped in that case for the anon
mappings. Though I believe that's still what we want to have here
initially as we shouldn't regress the hint behaviour.

I've posted the v2 here:
https://lore.kernel.org/lkml/20241118214650.3667577-1-kaleshsingh@google.com/

>
> >
> > > With Vlastimis's fix, we just try to make the address THP aligned for
> > > anonymous mapping when the size is PMD aligned. So we don't need to
> > > take into account the padding for anonymous mapping anymore.
> > >
> >
> > We still need to have padding length because PMD alignment of the size
> > doesn't mean that the start address returned by the search will be PMD
> > aligned. Inherently those are only PAGE aligned.
>
> Yes, you are right, I overlooked this. I think we can do this in two
> passes. Use len w/o padding in the first pass. If the returned address
> equals the hint or it is already PMD aligned, just return it.
> Otherwise it means there is no hole with suitable size and alignment.
> In the second pass, we redo it with padding. Just off the top of my
> head, others may have better ideas.
>

You are right, it's one way we can do it. Though, I am concerned that
the 2 passes will add overhead on mmap() performance. One idea I have
is to move the alignment handling lower down to
vma_iter_highest()/lowest(). Interested to hear your thoughts on that.
We can do this in a follow up patch, which should fix file mappings as
well.

Thanks,
Kalesh

> >
> > Thanks,
> > Kalesh
> >
> > > So IIUC we should do something like:
> > >
> > > @@ -1085,7 +1085,11 @@ static unsigned long
> > > __thp_get_unmapped_area(struct file *filp,
> > >         if (off_end <= off_align || (off_end - off_align) < size)
> > >                 return 0;
> > >
> > > -       len_pad = len + size;
> > > +       if (filp)
> > > +               len_pad = len + size;
> > > +       else
> > > +               len_pad = len;
> > > +
> > >         if (len_pad < len || (off + len_pad) < off)
> > >                 return 0;
> > >
> > > >
> > > > >> Cc: Andrew Morton <akpm@linux-foundation.org>
> > > > >> Cc: Vlastimil Babka <vbabka@suse.cz>
> > > > >> Cc: Yang Shi <yang@os.amperecomputing.com>
> > > > >> Cc: Rik van Riel <riel@surriel.com>
> > > > >> Cc: Ryan Roberts <ryan.roberts@arm.com>
> > > > >> Cc: Suren Baghdasaryan <surenb@google.com>
> > > > >> Cc: Minchan Kim <minchan@kernel.org>
> > > > >> Cc: Hans Boehm <hboehm@google.com>
> > > > >> Cc: Lokesh Gidra <lokeshgidra@google.com>
> > > > >> Cc: <stable@vger.kernel.org>
> > > > >> Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries")
> > > > >> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > > >
> > > > Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> > > >
> > > > >> ---
> > > > >>  mm/mmap.c | 1 +
> > > > >>  1 file changed, 1 insertion(+)
> > > > >>
> > > > >> diff --git a/mm/mmap.c b/mm/mmap.c
> > > > >> index 79d541f1502b..2f01f1a8e304 100644
> > > > >> --- a/mm/mmap.c
> > > > >> +++ b/mm/mmap.c
> > > > >> @@ -901,6 +901,7 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> > > > >>         if (get_area) {
> > > > >>                 addr = get_area(file, addr, len, pgoff, flags);
> > > > >>         } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)
> > > > >> +                  && !addr /* no hint */
> > > > >>                    && IS_ALIGNED(len, PMD_SIZE)) {
> > > > >>                 /* Ensures that larger anonymous mappings are THP aligned. */
> > > > >>                 addr = thp_get_unmapped_area_vmflags(file, addr, len,
> > > > >>
> > > > >> base-commit: 2d5404caa8c7bb5c4e0435f94b28834ae5456623
> > > > >> --
> > > > >> 2.47.0.338.g60cca15819-goog
> > > > >>
> > > >
diff mbox series

Patch

diff --git a/mm/mmap.c b/mm/mmap.c
index 79d541f1502b..2f01f1a8e304 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -901,6 +901,7 @@  __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
 	if (get_area) {
 		addr = get_area(file, addr, len, pgoff, flags);
 	} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)
+		   && !addr /* no hint */
 		   && IS_ALIGNED(len, PMD_SIZE)) {
 		/* Ensures that larger anonymous mappings are THP aligned. */
 		addr = thp_get_unmapped_area_vmflags(file, addr, len,