
[RESEND] mm: align larger anonymous mappings on THP boundaries

Message ID 20231214223423.1133074-1-yang@os.amperecomputing.com (mailing list archive)
State New
Series [RESEND] mm: align larger anonymous mappings on THP boundaries

Commit Message

Yang Shi Dec. 14, 2023, 10:34 p.m. UTC
From: Rik van Riel <riel@surriel.com>

Align larger anonymous memory mappings on THP boundaries by going through
thp_get_unmapped_area if THPs are enabled for the current process.

With this patch, larger anonymous mappings are now THP aligned.  When a
malloc library allocates a 2MB or larger arena, that arena can now be
mapped with THPs right from the start, which can result in better TLB hit
rates and execution time.

Link: https://lkml.kernel.org/r/20220809142457.4751229f@imladris.surriel.com
Signed-off-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christopher Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
This patch was applied to v6.1, but was reverted due to a regression report.
However, it turned out the regression was not caused by this patch.  I pinged
Andrew to reapply it, but it may have been forgotten.  This patch helps THP
usage, so I rebased it onto the latest mm-unstable.


 mm/mmap.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Ryan Roberts Jan. 20, 2024, 12:04 p.m. UTC | #1
On 14/12/2023 22:34, Yang Shi wrote:
> From: Rik van Riel <riel@surriel.com>
> 
> Align larger anonymous memory mappings on THP boundaries by going through
> thp_get_unmapped_area if THPs are enabled for the current process.
> 
> With this patch, larger anonymous mappings are now THP aligned.  When a
> malloc library allocates a 2MB or larger arena, that arena can now be
> mapped with THPs right from the start, which can result in better TLB hit
> rates and execution time.
> 
> Link: https://lkml.kernel.org/r/20220809142457.4751229f@imladris.surriel.com
> Signed-off-by: Rik van Riel <riel@surriel.com>
> Reviewed-by: Yang Shi <shy828301@gmail.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Christopher Lameter <cl@linux.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> This patch was applied to v6.1, but was reverted due to a regression
> report.  However it turned out the regression was not due to this patch.
> I ping'ed Andrew to reapply this patch, Andrew may forget it.  This
> patch helps promote THP, so I rebased it onto the latest mm-unstable.

Hi Yang,

I'm not sure what regression you are referring to above, but I'm seeing a
performance regression in the virtual_address_range mm selftest on arm64, caused
by this patch (which is now in v6.7).

I see 2 problems when running the test: 1) it takes much longer to execute, and
2) the test fails. Both are related:

The (first part of the) test allocates as many 1GB anonymous blocks as it can in
the low 256TB of address space, passing NULL as the addr hint to mmap. Before
this patch, all allocations were abutted and contained in a single, merged VMA.
However, after this patch, each allocation is in its own VMA, and there is a 2M
gap between each VMA. This causes 2 problems: 1) mmap becomes MUCH slower
because there are so many VMAs to check to find a new 1G gap. 2) It fails once
it hits the VMA limit (/proc/sys/vm/max_map_count). Hitting this limit then
causes a subsequent calloc() to fail, which causes the test to fail.
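
For reference, the first part of the test boils down to roughly this (a
simplified sketch, not the actual selftest source; the names are illustrative):

#include <stdio.h>
#include <sys/mman.h>

#define SZ_1G	(1024UL * 1024 * 1024)

int main(void)
{
	unsigned long count = 0;
	void *ptr;

	/* Map 1GB anonymous blocks with a NULL hint until mmap() fails. */
	for (;;) {
		ptr = mmap(NULL, SZ_1G, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (ptr == MAP_FAILED)
			break;
		count++;
	}

	printf("mapped %lu 1GB blocks\n", count);
	return 0;
}

Filling the low 256TB at 1GB per mapping takes roughly 256K iterations, so once
every iteration costs its own VMA, the default max_map_count (typically 65530)
is exhausted long before the address space is.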

Looking at the code, I think the problem is that arm64 selects
ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. But __thp_get_unmapped_area() allocates
len+2M then always aligns to the bottom of the discovered gap. That causes the
2M hole. As far as I can see, x86 allocates bottom up, so you don't get a hole.
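
To make that concrete (the addresses are made up for illustration): say the
next existing VMA above starts at a 2M-aligned address A, and we mmap 1G
anonymously. __thp_get_unmapped_area() asks the arch code for a gap of
len_pad = 1G + 2M. Top-down, that gap is placed so it ends at A, so
ret = A - 1G - 2M, which is already 2M-aligned; the alignment step then adds
nothing, the mapping occupies [A - 1G - 2M, A - 2M), and the 2M of padding just
below A is left as a hole. Bottom-up, ret lands at the (2M-aligned) end B of
the previous VMA, the mapping occupies [B, B + 1G), and the next search starts
again from B + 1G, so successive mappings abut.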

I'm not quite sure what the fix is - perhaps __thp_get_unmapped_area() should be
implemented around vm_unmapped_area(), which can manage the alignment more
intelligently?
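
Something vaguely like the below is what I have in mind (completely untested
sketch; the limits are illustrative only, and the top-down case would need its
own limits/flag):

	/*
	 * Untested sketch: let the gap search handle the PMD alignment via
	 * align_mask/align_offset instead of over-allocating by 2M and
	 * aligning afterwards.
	 */
	struct vm_unmapped_area_info info = {
		.flags		= 0,	/* or VM_UNMAPPED_AREA_TOPDOWN */
		.length		= len,
		.low_limit	= current->mm->mmap_base,
		.high_limit	= TASK_SIZE,
		.align_mask	= PMD_SIZE - 1,
		.align_offset	= off & (PMD_SIZE - 1),
	};

	return vm_unmapped_area(&info);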

But until/unless someone comes along with a fix, I think this patch should be
reverted.

Thanks,
Ryan


> 
> 
>  mm/mmap.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 9d780f415be3..dd25a2aa94f7 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2232,6 +2232,9 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
>  		 */
>  		pgoff = 0;
>  		get_area = shmem_get_unmapped_area;
> +	} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> +		/* Ensures that larger anonymous mappings are THP aligned. */
> +		get_area = thp_get_unmapped_area;
>  	}
>  
>  	addr = get_area(file, addr, len, pgoff, flags);
Ryan Roberts Jan. 20, 2024, 12:13 p.m. UTC | #2
On 20/01/2024 12:04, Ryan Roberts wrote:
> On 14/12/2023 22:34, Yang Shi wrote:
>> From: Rik van Riel <riel@surriel.com>
>>
>> Align larger anonymous memory mappings on THP boundaries by going through
>> thp_get_unmapped_area if THPs are enabled for the current process.
>>
>> With this patch, larger anonymous mappings are now THP aligned.  When a
>> malloc library allocates a 2MB or larger arena, that arena can now be
>> mapped with THPs right from the start, which can result in better TLB hit
>> rates and execution time.
>>
>> Link: https://lkml.kernel.org/r/20220809142457.4751229f@imladris.surriel.com
>> Signed-off-by: Rik van Riel <riel@surriel.com>
>> Reviewed-by: Yang Shi <shy828301@gmail.com>
>> Cc: Matthew Wilcox <willy@infradead.org>
>> Cc: Christopher Lameter <cl@linux.com>
>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>> ---
>> This patch was applied to v6.1, but was reverted due to a regression
>> report.  However it turned out the regression was not due to this patch.
>> I ping'ed Andrew to reapply this patch, Andrew may forget it.  This
>> patch helps promote THP, so I rebased it onto the latest mm-unstable.
> 
> Hi Yang,
> 
> I'm not sure what regression you are referring to above, but I'm seeing a
> performance regression in the virtual_address_range mm selftest on arm64, caused
> by this patch (which is now in v6.7).
> 
> I see 2 problems when running the test; 1) it takes much longer to execute, and
> 2) the test fails. Both are related:
> 
> The (first part of the) test allocates as many 1GB anonymous blocks as it can in
> the low 256TB of address space, passing NULL as the addr hint to mmap. Before
> this patch, all allocations were abutted and contained in a single, merged VMA.
> However, after this patch, each allocation is in its own VMA, and there is a 2M
> gap between each VMA. This causes 2 problems: 1) mmap becomes MUCH slower
> because there are so many VMAs to check to find a new 1G gap. 2) It fails once
> it hits the VMA limit (/proc/sys/vm/max_map_count). Hitting this limit then
> causes a subsequent calloc() to fail, which causes the test to fail.
> 
> Looking at the code, I think the problem is that arm64 selects
> ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. But __thp_get_unmapped_area() allocates
> len+2M then always aligns to the bottom of the discovered gap. That causes the
> 2M hole. As far as I can see, x86 allocates bottom up, so you don't get a hole.
> 
> I'm not quite sure what the fix is - perhaps __thp_get_unmapped_area() should be
> implemented around vm_unmapped_area(), which can manage the alignment more
> intelligently?
> 
> But until/unless someone comes along with a fix, I think this patch should be
> reverted.

Looks like this patch is also the cause of `ksm_tests -H -s 100` starting to
fail on arm64. I haven't looked in detail, but it passes without the change and
fails with it. So this should definitely be reverted, I think.


> 
> Thanks,
> Ryan
> 
> 
>>
>>
>>  mm/mmap.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index 9d780f415be3..dd25a2aa94f7 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -2232,6 +2232,9 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
>>  		 */
>>  		pgoff = 0;
>>  		get_area = shmem_get_unmapped_area;
>> +	} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
>> +		/* Ensures that larger anonymous mappings are THP aligned. */
>> +		get_area = thp_get_unmapped_area;
>>  	}
>>  
>>  	addr = get_area(file, addr, len, pgoff, flags);
>
Matthew Wilcox Jan. 20, 2024, 4:39 p.m. UTC | #3
On Sat, Jan 20, 2024 at 12:04:27PM +0000, Ryan Roberts wrote:
> However, after this patch, each allocation is in its own VMA, and there is a 2M
> gap between each VMA. This causes 2 problems: 1) mmap becomes MUCH slower
> because there are so many VMAs to check to find a new 1G gap. 2) It fails once
> it hits the VMA limit (/proc/sys/vm/max_map_count). Hitting this limit then
> causes a subsequent calloc() to fail, which causes the test to fail.
> 
> Looking at the code, I think the problem is that arm64 selects
> ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. But __thp_get_unmapped_area() allocates
> len+2M then always aligns to the bottom of the discovered gap. That causes the
> 2M hole. As far as I can see, x86 allocates bottom up, so you don't get a hole.

As a quick hack, perhaps
#ifdef ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
take-the-top-half
#else
current-take-bottom-half-code
#endif

?
Ryan Roberts Jan. 22, 2024, 11:37 a.m. UTC | #4
On 20/01/2024 16:39, Matthew Wilcox wrote:
> On Sat, Jan 20, 2024 at 12:04:27PM +0000, Ryan Roberts wrote:
>> However, after this patch, each allocation is in its own VMA, and there is a 2M
>> gap between each VMA. This causes 2 problems: 1) mmap becomes MUCH slower
>> because there are so many VMAs to check to find a new 1G gap. 2) It fails once
>> it hits the VMA limit (/proc/sys/vm/max_map_count). Hitting this limit then
>> causes a subsequent calloc() to fail, which causes the test to fail.
>>
>> Looking at the code, I think the problem is that arm64 selects
>> ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. But __thp_get_unmapped_area() allocates
>> len+2M then always aligns to the bottom of the discovered gap. That causes the
>> 2M hole. As far as I can see, x86 allocates bottom up, so you don't get a hole.
> 
> As a quick hack, perhaps
> #ifdef ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
> take-the-top-half
> #else
> current-take-bottom-half-code
> #endif
> 
> ?

There is a general problem, though, in that there is a trade-off between abutting
VMAs and aligning them to PMD boundaries. This patch has decided that in
general the latter is preferable. The case I'm hitting is special though, in
that both requirements could be achieved but currently are not.

The below fixes it, but I feel like there should be some bitwise magic that
would give the correct answer without the conditional - but my head is gone and
I can't see it. Any thoughts?

Beyond this, though, there is also a latent bug where the offset provided to
mmap() is carried all the way through to the get_unmapped_area()
implementation, even for MAP_ANONYMOUS - I'm pretty sure we should be
force-zeroing it for MAP_ANONYMOUS? Certainly before this change, for arches
that use the default get_unmapped_area(), any non-zero offset would not have
been used. But this change starts using it, which is incorrect. That said, there
are some arches that override the default get_unmapped_area() and do use the
offset. So I'm not sure if this is a bug or a feature that user space can pass
an arbitrary value to the implementation for anon memory??

Finally, the second test failure I reported (ksm_tests) is actually caused by a
bug in the test code, but provoked by this change. So I'll send out a fix for
the test code separately.


diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4f542444a91f..68ac54117c77 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -632,7 +632,7 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
 {
        loff_t off_end = off + len;
        loff_t off_align = round_up(off, size);
-       unsigned long len_pad, ret;
+       unsigned long len_pad, ret, off_sub;

        if (off_end <= off_align || (off_end - off_align) < size)
                return 0;
@@ -658,7 +658,13 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
        if (ret == addr)
                return addr;

-       ret += (off - ret) & (size - 1);
+       off_sub = (off - ret) & (size - 1);
+
+       if (current->mm->get_unmapped_area == arch_get_unmapped_area_topdown &&
+           !off_sub)
+               return ret + size;
+
+       ret += off_sub;
        return ret;
 }
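
(The idea being: in the top-down case, when the padded gap is already
PMD-aligned (off_sub == 0), return ret + size so that the mapping is pushed up
to the top of the padded gap and abuts whatever bounded the gap above, instead
of leaving the 2M pad as a hole above it; every other case keeps the existing
round-up behaviour.)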
Yang Shi Jan. 22, 2024, 7:43 p.m. UTC | #5
On Mon, Jan 22, 2024 at 3:37 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 20/01/2024 16:39, Matthew Wilcox wrote:
> > On Sat, Jan 20, 2024 at 12:04:27PM +0000, Ryan Roberts wrote:
> >> However, after this patch, each allocation is in its own VMA, and there is a 2M
> >> gap between each VMA. This causes 2 problems: 1) mmap becomes MUCH slower
> >> because there are so many VMAs to check to find a new 1G gap. 2) It fails once
> >> it hits the VMA limit (/proc/sys/vm/max_map_count). Hitting this limit then
> >> causes a subsequent calloc() to fail, which causes the test to fail.
> >>
> >> Looking at the code, I think the problem is that arm64 selects
> >> ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. But __thp_get_unmapped_area() allocates
> >> len+2M then always aligns to the bottom of the discovered gap. That causes the
> >> 2M hole. As far as I can see, x86 allocates bottom up, so you don't get a hole.
> >
> > As a quick hack, perhaps
> > #ifdef ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
> > take-the-top-half
> > #else
> > current-take-bottom-half-code
> > #endif
> >
> > ?

Thanks for the suggestion. It makes sense to me. Doing the alignment
needs to take this into account.

>
> There is a general problem though that there is a trade-off between abutting
> VMAs, and aligning them to PMD boundaries. This patch has decided that in
> general the latter is preferable. The case I'm hitting is special though, in
> that both requirements could be achieved but currently are not.
>
> The below fixes it, but I feel like there should be some bitwise magic that
> would give the correct answer without the conditional - but my head is gone and
> I can't see it. Any thoughts?

Thanks Ryan for the patch. TBH I don't see any bitwise magic that avoids
the conditional either.

>
> Beyond this, though, there is also a latent bug where the offset provided to
> mmap() is carried all the way through to the get_unmapped_area()
> impelementation, even for MAP_ANONYMOUS - I'm pretty sure we should be
> force-zeroing it for MAP_ANONYMOUS? Certainly before this change, for arches
> that use the default get_unmapped_area(), any non-zero offset would not have
> been used. But this change starts using it, which is incorrect. That said, there
> are some arches that override the default get_unmapped_area() and do use the
> offset. So I'm not sure if this is a bug or a feature that user space can pass
> an arbitrary value to the implementation for anon memory??

Thanks for noticing this. If I read the code correctly, the pgoff is used
by some arches to work around VIPT caches, and it looks like it is for
shared mappings only (I just checked arm and mips). And I believe
everybody assumes 0 should be used when doing an anonymous mapping. The
offset should have nothing to do with seeking a proper unmapped virtual
area. But the pgoff does make sense for file THP due to the alignment
requirements. I think it should be zeroed for anonymous mappings,
like:

diff --git a/mm/mmap.c b/mm/mmap.c
index 2ff79b1d1564..a9ed353ce627 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1830,6 +1830,7 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
                pgoff = 0;
                get_area = shmem_get_unmapped_area;
        } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+               pgoff = 0;
                /* Ensures that larger anonymous mappings are THP aligned. */
                get_area = thp_get_unmapped_area;
        }

>
> Finally, the second test failure I reported (ksm_tests) is actually caused by a
> bug in the test code, but provoked by this change. So I'll send out a fix for
> the test code separately.
>
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 4f542444a91f..68ac54117c77 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -632,7 +632,7 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
>  {
>         loff_t off_end = off + len;
>         loff_t off_align = round_up(off, size);
> -       unsigned long len_pad, ret;
> +       unsigned long len_pad, ret, off_sub;
>
>         if (off_end <= off_align || (off_end - off_align) < size)
>                 return 0;
> @@ -658,7 +658,13 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
>         if (ret == addr)
>                 return addr;
>
> -       ret += (off - ret) & (size - 1);
> +       off_sub = (off - ret) & (size - 1);
> +
> +       if (current->mm->get_unmapped_area == arch_get_unmapped_area_topdown &&
> +           !off_sub)
> +               return ret + size;
> +
> +       ret += off_sub;
>         return ret;
>  }

I didn't spot any problem, would you please come up with a formal patch?
Yang Shi Jan. 22, 2024, 8:20 p.m. UTC | #6
On Mon, Jan 22, 2024 at 3:37 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 20/01/2024 16:39, Matthew Wilcox wrote:
> > On Sat, Jan 20, 2024 at 12:04:27PM +0000, Ryan Roberts wrote:
> >> However, after this patch, each allocation is in its own VMA, and there is a 2M
> >> gap between each VMA. This causes 2 problems: 1) mmap becomes MUCH slower
> >> because there are so many VMAs to check to find a new 1G gap. 2) It fails once
> >> it hits the VMA limit (/proc/sys/vm/max_map_count). Hitting this limit then
> >> causes a subsequent calloc() to fail, which causes the test to fail.
> >>
> >> Looking at the code, I think the problem is that arm64 selects
> >> ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. But __thp_get_unmapped_area() allocates
> >> len+2M then always aligns to the bottom of the discovered gap. That causes the
> >> 2M hole. As far as I can see, x86 allocates bottom up, so you don't get a hole.
> >
> > As a quick hack, perhaps
> > #ifdef ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
> > take-the-top-half
> > #else
> > current-take-bottom-half-code
> > #endif
> >
> > ?
>
> There is a general problem though that there is a trade-off between abutting
> VMAs, and aligning them to PMD boundaries. This patch has decided that in
> general the latter is preferable. The case I'm hitting is special though, in
> that both requirements could be achieved but currently are not.
>
> The below fixes it, but I feel like there should be some bitwise magic that
> would give the correct answer without the conditional - but my head is gone and
> I can't see it. Any thoughts?
>
> Beyond this, though, there is also a latent bug where the offset provided to
> mmap() is carried all the way through to the get_unmapped_area()
> impelementation, even for MAP_ANONYMOUS - I'm pretty sure we should be
> force-zeroing it for MAP_ANONYMOUS? Certainly before this change, for arches
> that use the default get_unmapped_area(), any non-zero offset would not have
> been used. But this change starts using it, which is incorrect. That said, there
> are some arches that override the default get_unmapped_area() and do use the
> offset. So I'm not sure if this is a bug or a feature that user space can pass
> an arbitrary value to the implementation for anon memory??
>
> Finally, the second test failure I reported (ksm_tests) is actually caused by a
> bug in the test code, but provoked by this change. So I'll send out a fix for
> the test code separately.

Thanks for figuring this out.

>
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 4f542444a91f..68ac54117c77 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -632,7 +632,7 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
>  {
>         loff_t off_end = off + len;
>         loff_t off_align = round_up(off, size);
> -       unsigned long len_pad, ret;
> +       unsigned long len_pad, ret, off_sub;
>
>         if (off_end <= off_align || (off_end - off_align) < size)
>                 return 0;
> @@ -658,7 +658,13 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
>         if (ret == addr)
>                 return addr;
>
> -       ret += (off - ret) & (size - 1);
> +       off_sub = (off - ret) & (size - 1);
> +
> +       if (current->mm->get_unmapped_area == arch_get_unmapped_area_topdown &&
> +           !off_sub)
> +               return ret + size;
> +
> +       ret += off_sub;
>         return ret;
>  }
Ryan Roberts Jan. 23, 2024, 9:41 a.m. UTC | #7
On 22/01/2024 19:43, Yang Shi wrote:
> On Mon, Jan 22, 2024 at 3:37 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> On 20/01/2024 16:39, Matthew Wilcox wrote:
>>> On Sat, Jan 20, 2024 at 12:04:27PM +0000, Ryan Roberts wrote:
>>>> However, after this patch, each allocation is in its own VMA, and there is a 2M
>>>> gap between each VMA. This causes 2 problems: 1) mmap becomes MUCH slower
>>>> because there are so many VMAs to check to find a new 1G gap. 2) It fails once
>>>> it hits the VMA limit (/proc/sys/vm/max_map_count). Hitting this limit then
>>>> causes a subsequent calloc() to fail, which causes the test to fail.
>>>>
>>>> Looking at the code, I think the problem is that arm64 selects
>>>> ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. But __thp_get_unmapped_area() allocates
>>>> len+2M then always aligns to the bottom of the discovered gap. That causes the
>>>> 2M hole. As far as I can see, x86 allocates bottom up, so you don't get a hole.
>>>
>>> As a quick hack, perhaps
>>> #ifdef ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
>>> take-the-top-half
>>> #else
>>> current-take-bottom-half-code
>>> #endif
>>>
>>> ?
> 
> Thanks for the suggestion. It makes sense to me. Doing the alignment
> needs to take into account this.
> 
>>
>> There is a general problem though that there is a trade-off between abutting
>> VMAs, and aligning them to PMD boundaries. This patch has decided that in
>> general the latter is preferable. The case I'm hitting is special though, in
>> that both requirements could be achieved but currently are not.
>>
>> The below fixes it, but I feel like there should be some bitwise magic that
>> would give the correct answer without the conditional - but my head is gone and
>> I can't see it. Any thoughts?
> 
> Thanks Ryan for the patch. TBH I didn't see a bitwise magic without
> the conditional either.
> 
>>
>> Beyond this, though, there is also a latent bug where the offset provided to
>> mmap() is carried all the way through to the get_unmapped_area()
>> impelementation, even for MAP_ANONYMOUS - I'm pretty sure we should be
>> force-zeroing it for MAP_ANONYMOUS? Certainly before this change, for arches
>> that use the default get_unmapped_area(), any non-zero offset would not have
>> been used. But this change starts using it, which is incorrect. That said, there
>> are some arches that override the default get_unmapped_area() and do use the
>> offset. So I'm not sure if this is a bug or a feature that user space can pass
>> an arbitrary value to the implementation for anon memory??
> 
> Thanks for noticing this. If I read the code correctly, the pgoff used
> by some arches to workaround VIPT caches, and it looks like it is for
> shared mapping only (just checked arm and mips). And I believe
> everybody assumes 0 should be used when doing anonymous mapping. The
> offset should have nothing to do with seeking proper unmapped virtual
> area. But the pgoff does make sense for file THP due to the alignment
> requirements. I think it should be zero'ed for anonymous mappings,
> like:
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 2ff79b1d1564..a9ed353ce627 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1830,6 +1830,7 @@ get_unmapped_area(struct file *file, unsigned
> long addr, unsigned long len,
>                 pgoff = 0;
>                 get_area = shmem_get_unmapped_area;
>         } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> +               pgoff = 0;
>                 /* Ensures that larger anonymous mappings are THP aligned. */
>                 get_area = thp_get_unmapped_area;
>         }

I think it would be cleaner to just zero pgoff if file==NULL, then it covers the
shared case, the THP case, and the non-THP case properly. I'll prepare a
separate patch for this.


> 
>>
>> Finally, the second test failure I reported (ksm_tests) is actually caused by a
>> bug in the test code, but provoked by this change. So I'll send out a fix for
>> the test code separately.
>>
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 4f542444a91f..68ac54117c77 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -632,7 +632,7 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
>>  {
>>         loff_t off_end = off + len;
>>         loff_t off_align = round_up(off, size);
>> -       unsigned long len_pad, ret;
>> +       unsigned long len_pad, ret, off_sub;
>>
>>         if (off_end <= off_align || (off_end - off_align) < size)
>>                 return 0;
>> @@ -658,7 +658,13 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
>>         if (ret == addr)
>>                 return addr;
>>
>> -       ret += (off - ret) & (size - 1);
>> +       off_sub = (off - ret) & (size - 1);
>> +
>> +       if (current->mm->get_unmapped_area == arch_get_unmapped_area_topdown &&
>> +           !off_sub)
>> +               return ret + size;
>> +
>> +       ret += off_sub;
>>         return ret;
>>  }
> 
> I didn't spot any problem, would you please come up with a formal patch?

Yeah, I'll aim to post today.
Yang Shi Jan. 23, 2024, 5:14 p.m. UTC | #8
On Tue, Jan 23, 2024 at 1:41 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 22/01/2024 19:43, Yang Shi wrote:
> > On Mon, Jan 22, 2024 at 3:37 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>
> >> On 20/01/2024 16:39, Matthew Wilcox wrote:
> >>> On Sat, Jan 20, 2024 at 12:04:27PM +0000, Ryan Roberts wrote:
> >>>> However, after this patch, each allocation is in its own VMA, and there is a 2M
> >>>> gap between each VMA. This causes 2 problems: 1) mmap becomes MUCH slower
> >>>> because there are so many VMAs to check to find a new 1G gap. 2) It fails once
> >>>> it hits the VMA limit (/proc/sys/vm/max_map_count). Hitting this limit then
> >>>> causes a subsequent calloc() to fail, which causes the test to fail.
> >>>>
> >>>> Looking at the code, I think the problem is that arm64 selects
> >>>> ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. But __thp_get_unmapped_area() allocates
> >>>> len+2M then always aligns to the bottom of the discovered gap. That causes the
> >>>> 2M hole. As far as I can see, x86 allocates bottom up, so you don't get a hole.
> >>>
> >>> As a quick hack, perhaps
> >>> #ifdef ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
> >>> take-the-top-half
> >>> #else
> >>> current-take-bottom-half-code
> >>> #endif
> >>>
> >>> ?
> >
> > Thanks for the suggestion. It makes sense to me. Doing the alignment
> > needs to take into account this.
> >
> >>
> >> There is a general problem though that there is a trade-off between abutting
> >> VMAs, and aligning them to PMD boundaries. This patch has decided that in
> >> general the latter is preferable. The case I'm hitting is special though, in
> >> that both requirements could be achieved but currently are not.
> >>
> >> The below fixes it, but I feel like there should be some bitwise magic that
> >> would give the correct answer without the conditional - but my head is gone and
> >> I can't see it. Any thoughts?
> >
> > Thanks Ryan for the patch. TBH I didn't see a bitwise magic without
> > the conditional either.
> >
> >>
> >> Beyond this, though, there is also a latent bug where the offset provided to
> >> mmap() is carried all the way through to the get_unmapped_area()
> >> impelementation, even for MAP_ANONYMOUS - I'm pretty sure we should be
> >> force-zeroing it for MAP_ANONYMOUS? Certainly before this change, for arches
> >> that use the default get_unmapped_area(), any non-zero offset would not have
> >> been used. But this change starts using it, which is incorrect. That said, there
> >> are some arches that override the default get_unmapped_area() and do use the
> >> offset. So I'm not sure if this is a bug or a feature that user space can pass
> >> an arbitrary value to the implementation for anon memory??
> >
> > Thanks for noticing this. If I read the code correctly, the pgoff used
> > by some arches to workaround VIPT caches, and it looks like it is for
> > shared mapping only (just checked arm and mips). And I believe
> > everybody assumes 0 should be used when doing anonymous mapping. The
> > offset should have nothing to do with seeking proper unmapped virtual
> > area. But the pgoff does make sense for file THP due to the alignment
> > requirements. I think it should be zero'ed for anonymous mappings,
> > like:
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 2ff79b1d1564..a9ed353ce627 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1830,6 +1830,7 @@ get_unmapped_area(struct file *file, unsigned
> > long addr, unsigned long len,
> >                 pgoff = 0;
> >                 get_area = shmem_get_unmapped_area;
> >         } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> > +               pgoff = 0;
> >                 /* Ensures that larger anonymous mappings are THP aligned. */
> >                 get_area = thp_get_unmapped_area;
> >         }
>
> I think it would be cleaner to just zero pgoff if file==NULL, then it covers the
> shared case, the THP case, and the non-THP case properly. I'll prepare a
> separate patch for this.

IIUC I don't think this is ok for those arches which have to
work around VIPT caches, since MAP_ANONYMOUS | MAP_SHARED with a NULL file
pointer is a common case for creating a tmpfs mapping. For example,
arm's arch_get_unmapped_area() has:

if (aliasing)
        do_align = filp || (flags & MAP_SHARED);

The pgoff is needed if do_align is true. So we should only zero pgoff
if !file && !(flags & MAP_SHARED), like what my patch does; we can move
the zeroing to a better place.

>
>
> >
> >>
> >> Finally, the second test failure I reported (ksm_tests) is actually caused by a
> >> bug in the test code, but provoked by this change. So I'll send out a fix for
> >> the test code separately.
> >>
> >>
> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >> index 4f542444a91f..68ac54117c77 100644
> >> --- a/mm/huge_memory.c
> >> +++ b/mm/huge_memory.c
> >> @@ -632,7 +632,7 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
> >>  {
> >>         loff_t off_end = off + len;
> >>         loff_t off_align = round_up(off, size);
> >> -       unsigned long len_pad, ret;
> >> +       unsigned long len_pad, ret, off_sub;
> >>
> >>         if (off_end <= off_align || (off_end - off_align) < size)
> >>                 return 0;
> >> @@ -658,7 +658,13 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
> >>         if (ret == addr)
> >>                 return addr;
> >>
> >> -       ret += (off - ret) & (size - 1);
> >> +       off_sub = (off - ret) & (size - 1);
> >> +
> >> +       if (current->mm->get_unmapped_area == arch_get_unmapped_area_topdown &&
> >> +           !off_sub)
> >> +               return ret + size;
> >> +
> >> +       ret += off_sub;
> >>         return ret;
> >>  }
> >
> > I didn't spot any problem, would you please come up with a formal patch?
>
> Yeah, I'll aim to post today.

Thanks!

>
>
Yang Shi Jan. 23, 2024, 5:26 p.m. UTC | #9
On Tue, Jan 23, 2024 at 9:14 AM Yang Shi <shy828301@gmail.com> wrote:
>
> On Tue, Jan 23, 2024 at 1:41 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
> >
> > On 22/01/2024 19:43, Yang Shi wrote:
> > > On Mon, Jan 22, 2024 at 3:37 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
> > >>
> > >> On 20/01/2024 16:39, Matthew Wilcox wrote:
> > >>> On Sat, Jan 20, 2024 at 12:04:27PM +0000, Ryan Roberts wrote:
> > >>>> However, after this patch, each allocation is in its own VMA, and there is a 2M
> > >>>> gap between each VMA. This causes 2 problems: 1) mmap becomes MUCH slower
> > >>>> because there are so many VMAs to check to find a new 1G gap. 2) It fails once
> > >>>> it hits the VMA limit (/proc/sys/vm/max_map_count). Hitting this limit then
> > >>>> causes a subsequent calloc() to fail, which causes the test to fail.
> > >>>>
> > >>>> Looking at the code, I think the problem is that arm64 selects
> > >>>> ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. But __thp_get_unmapped_area() allocates
> > >>>> len+2M then always aligns to the bottom of the discovered gap. That causes the
> > >>>> 2M hole. As far as I can see, x86 allocates bottom up, so you don't get a hole.
> > >>>
> > >>> As a quick hack, perhaps
> > >>> #ifdef ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
> > >>> take-the-top-half
> > >>> #else
> > >>> current-take-bottom-half-code
> > >>> #endif
> > >>>
> > >>> ?
> > >
> > > Thanks for the suggestion. It makes sense to me. Doing the alignment
> > > needs to take into account this.
> > >
> > >>
> > >> There is a general problem though that there is a trade-off between abutting
> > >> VMAs, and aligning them to PMD boundaries. This patch has decided that in
> > >> general the latter is preferable. The case I'm hitting is special though, in
> > >> that both requirements could be achieved but currently are not.
> > >>
> > >> The below fixes it, but I feel like there should be some bitwise magic that
> > >> would give the correct answer without the conditional - but my head is gone and
> > >> I can't see it. Any thoughts?
> > >
> > > Thanks Ryan for the patch. TBH I didn't see a bitwise magic without
> > > the conditional either.
> > >
> > >>
> > >> Beyond this, though, there is also a latent bug where the offset provided to
> > >> mmap() is carried all the way through to the get_unmapped_area()
> > >> impelementation, even for MAP_ANONYMOUS - I'm pretty sure we should be
> > >> force-zeroing it for MAP_ANONYMOUS? Certainly before this change, for arches
> > >> that use the default get_unmapped_area(), any non-zero offset would not have
> > >> been used. But this change starts using it, which is incorrect. That said, there
> > >> are some arches that override the default get_unmapped_area() and do use the
> > >> offset. So I'm not sure if this is a bug or a feature that user space can pass
> > >> an arbitrary value to the implementation for anon memory??
> > >
> > > Thanks for noticing this. If I read the code correctly, the pgoff used
> > > by some arches to workaround VIPT caches, and it looks like it is for
> > > shared mapping only (just checked arm and mips). And I believe
> > > everybody assumes 0 should be used when doing anonymous mapping. The
> > > offset should have nothing to do with seeking proper unmapped virtual
> > > area. But the pgoff does make sense for file THP due to the alignment
> > > requirements. I think it should be zero'ed for anonymous mappings,
> > > like:
> > >
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index 2ff79b1d1564..a9ed353ce627 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -1830,6 +1830,7 @@ get_unmapped_area(struct file *file, unsigned
> > > long addr, unsigned long len,
> > >                 pgoff = 0;
> > >                 get_area = shmem_get_unmapped_area;
> > >         } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> > > +               pgoff = 0;
> > >                 /* Ensures that larger anonymous mappings are THP aligned. */
> > >                 get_area = thp_get_unmapped_area;
> > >         }
> >
> > I think it would be cleaner to just zero pgoff if file==NULL, then it covers the
> > shared case, the THP case, and the non-THP case properly. I'll prepare a
> > separate patch for this.
>
> IIUC I don't think this is ok for those arches which have to
> workaround VIPT cache since MAP_ANONYMOUS | MAP_SHARED with NULL file
> pointer is a common case for creating tmpfs mapping. For example,
> arm's arch_get_unmapped_area() has:
>
> if (aliasing)
>         do_align = filp || (flags & MAP_SHARED);
>
> The pgoff is needed if do_align is true. So we should just zero pgoff
> iff !file && !MAP_SHARED like what my patch does, we can move the
> zeroing to a better place.

Rethinking this... zeroing pgoff when file is NULL should be ok since a
MAP_ANONYMOUS | MAP_SHARED mapping should typically have a zero offset.
I'm not aware of any use case with a non-zero offset, or at least any sane one...

>
> >
> >
> > >
> > >>
> > >> Finally, the second test failure I reported (ksm_tests) is actually caused by a
> > >> bug in the test code, but provoked by this change. So I'll send out a fix for
> > >> the test code separately.
> > >>
> > >>
> > >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > >> index 4f542444a91f..68ac54117c77 100644
> > >> --- a/mm/huge_memory.c
> > >> +++ b/mm/huge_memory.c
> > >> @@ -632,7 +632,7 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
> > >>  {
> > >>         loff_t off_end = off + len;
> > >>         loff_t off_align = round_up(off, size);
> > >> -       unsigned long len_pad, ret;
> > >> +       unsigned long len_pad, ret, off_sub;
> > >>
> > >>         if (off_end <= off_align || (off_end - off_align) < size)
> > >>                 return 0;
> > >> @@ -658,7 +658,13 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
> > >>         if (ret == addr)
> > >>                 return addr;
> > >>
> > >> -       ret += (off - ret) & (size - 1);
> > >> +       off_sub = (off - ret) & (size - 1);
> > >> +
> > >> +       if (current->mm->get_unmapped_area == arch_get_unmapped_area_topdown &&
> > >> +           !off_sub)
> > >> +               return ret + size;
> > >> +
> > >> +       ret += off_sub;
> > >>         return ret;
> > >>  }
> > >
> > > I didn't spot any problem, would you please come up with a formal patch?
> >
> > Yeah, I'll aim to post today.
>
> Thanks!
>
> >
> >
Ryan Roberts Jan. 23, 2024, 5:26 p.m. UTC | #10
On 23/01/2024 17:14, Yang Shi wrote:
> On Tue, Jan 23, 2024 at 1:41 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> On 22/01/2024 19:43, Yang Shi wrote:
>>> On Mon, Jan 22, 2024 at 3:37 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>
>>>> On 20/01/2024 16:39, Matthew Wilcox wrote:
>>>>> On Sat, Jan 20, 2024 at 12:04:27PM +0000, Ryan Roberts wrote:
>>>>>> However, after this patch, each allocation is in its own VMA, and there is a 2M
>>>>>> gap between each VMA. This causes 2 problems: 1) mmap becomes MUCH slower
>>>>>> because there are so many VMAs to check to find a new 1G gap. 2) It fails once
>>>>>> it hits the VMA limit (/proc/sys/vm/max_map_count). Hitting this limit then
>>>>>> causes a subsequent calloc() to fail, which causes the test to fail.
>>>>>>
>>>>>> Looking at the code, I think the problem is that arm64 selects
>>>>>> ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. But __thp_get_unmapped_area() allocates
>>>>>> len+2M then always aligns to the bottom of the discovered gap. That causes the
>>>>>> 2M hole. As far as I can see, x86 allocates bottom up, so you don't get a hole.
>>>>>
>>>>> As a quick hack, perhaps
>>>>> #ifdef ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
>>>>> take-the-top-half
>>>>> #else
>>>>> current-take-bottom-half-code
>>>>> #endif
>>>>>
>>>>> ?
>>>
>>> Thanks for the suggestion. It makes sense to me. Doing the alignment
>>> needs to take into account this.
>>>
>>>>
>>>> There is a general problem though that there is a trade-off between abutting
>>>> VMAs, and aligning them to PMD boundaries. This patch has decided that in
>>>> general the latter is preferable. The case I'm hitting is special though, in
>>>> that both requirements could be achieved but currently are not.
>>>>
>>>> The below fixes it, but I feel like there should be some bitwise magic that
>>>> would give the correct answer without the conditional - but my head is gone and
>>>> I can't see it. Any thoughts?
>>>
>>> Thanks Ryan for the patch. TBH I didn't see a bitwise magic without
>>> the conditional either.
>>>
>>>>
>>>> Beyond this, though, there is also a latent bug where the offset provided to
>>>> mmap() is carried all the way through to the get_unmapped_area()
>>>> impelementation, even for MAP_ANONYMOUS - I'm pretty sure we should be
>>>> force-zeroing it for MAP_ANONYMOUS? Certainly before this change, for arches
>>>> that use the default get_unmapped_area(), any non-zero offset would not have
>>>> been used. But this change starts using it, which is incorrect. That said, there
>>>> are some arches that override the default get_unmapped_area() and do use the
>>>> offset. So I'm not sure if this is a bug or a feature that user space can pass
>>>> an arbitrary value to the implementation for anon memory??
>>>
>>> Thanks for noticing this. If I read the code correctly, the pgoff used
>>> by some arches to workaround VIPT caches, and it looks like it is for
>>> shared mapping only (just checked arm and mips). And I believe
>>> everybody assumes 0 should be used when doing anonymous mapping. The
>>> offset should have nothing to do with seeking proper unmapped virtual
>>> area. But the pgoff does make sense for file THP due to the alignment
>>> requirements. I think it should be zero'ed for anonymous mappings,
>>> like:
>>>
>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>> index 2ff79b1d1564..a9ed353ce627 100644
>>> --- a/mm/mmap.c
>>> +++ b/mm/mmap.c
>>> @@ -1830,6 +1830,7 @@ get_unmapped_area(struct file *file, unsigned
>>> long addr, unsigned long len,
>>>                 pgoff = 0;
>>>                 get_area = shmem_get_unmapped_area;
>>>         } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
>>> +               pgoff = 0;
>>>                 /* Ensures that larger anonymous mappings are THP aligned. */
>>>                 get_area = thp_get_unmapped_area;
>>>         }
>>
>> I think it would be cleaner to just zero pgoff if file==NULL, then it covers the
>> shared case, the THP case, and the non-THP case properly. I'll prepare a
>> separate patch for this.
> 
> IIUC I don't think this is ok for those arches which have to
> workaround VIPT cache since MAP_ANONYMOUS | MAP_SHARED with NULL file
> pointer is a common case for creating tmpfs mapping. For example,
> arm's arch_get_unmapped_area() has:
> 
> if (aliasing)
>         do_align = filp || (flags & MAP_SHARED);
> 
> The pgoff is needed if do_align is true. So we should just zero pgoff
> iff !file && !MAP_SHARED like what my patch does, we can move the
> zeroing to a better place.

We crossed streams - I sent out the patch just as you sent this. My patch is
implemented as I proposed.

I'm not sure I agree with what you are saying. The mmap man page says this:

  The  contents  of  a file mapping (as opposed to an anonymous mapping; see
  MAP_ANONYMOUS below), are initialized using length bytes starting at offset
  offset in the file (or other object) referred to by the file descriptor fd.

So that implies offset is only relevant when a file is provided. It then goes on
to say:

  MAP_ANONYMOUS
  The mapping is not backed by any file; its contents are initialized to zero.
  The fd argument is ignored; however, some implementations require fd to be -1
  if MAP_ANONYMOUS (or MAP_ANON) is specified, and portable applications should
  ensure this. The offset argument should be zero.

So users are expected to pass offset=0 when mapping anon memory, for both shared
and private cases.

In fact, in the line above where you made your proposed change, pgoff is also
being zeroed for the (!file && (flags & MAP_SHARED)) case.


> 
>>
>>
>>>
>>>>
>>>> Finally, the second test failure I reported (ksm_tests) is actually caused by a
>>>> bug in the test code, but provoked by this change. So I'll send out a fix for
>>>> the test code separately.
>>>>
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 4f542444a91f..68ac54117c77 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -632,7 +632,7 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
>>>>  {
>>>>         loff_t off_end = off + len;
>>>>         loff_t off_align = round_up(off, size);
>>>> -       unsigned long len_pad, ret;
>>>> +       unsigned long len_pad, ret, off_sub;
>>>>
>>>>         if (off_end <= off_align || (off_end - off_align) < size)
>>>>                 return 0;
>>>> @@ -658,7 +658,13 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
>>>>         if (ret == addr)
>>>>                 return addr;
>>>>
>>>> -       ret += (off - ret) & (size - 1);
>>>> +       off_sub = (off - ret) & (size - 1);
>>>> +
>>>> +       if (current->mm->get_unmapped_area == arch_get_unmapped_area_topdown &&
>>>> +           !off_sub)
>>>> +               return ret + size;
>>>> +
>>>> +       ret += off_sub;
>>>>         return ret;
>>>>  }
>>>
>>> I didn't spot any problem, would you please come up with a formal patch?
>>
>> Yeah, I'll aim to post today.
> 
> Thanks!
> 
>>
>>
Yang Shi Jan. 23, 2024, 5:33 p.m. UTC | #11
On Tue, Jan 23, 2024 at 9:26 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 23/01/2024 17:14, Yang Shi wrote:
> > On Tue, Jan 23, 2024 at 1:41 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>
> >> On 22/01/2024 19:43, Yang Shi wrote:
> >>> On Mon, Jan 22, 2024 at 3:37 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>>>
> >>>> On 20/01/2024 16:39, Matthew Wilcox wrote:
> >>>>> On Sat, Jan 20, 2024 at 12:04:27PM +0000, Ryan Roberts wrote:
> >>>>>> However, after this patch, each allocation is in its own VMA, and there is a 2M
> >>>>>> gap between each VMA. This causes 2 problems: 1) mmap becomes MUCH slower
> >>>>>> because there are so many VMAs to check to find a new 1G gap. 2) It fails once
> >>>>>> it hits the VMA limit (/proc/sys/vm/max_map_count). Hitting this limit then
> >>>>>> causes a subsequent calloc() to fail, which causes the test to fail.
> >>>>>>
> >>>>>> Looking at the code, I think the problem is that arm64 selects
> >>>>>> ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. But __thp_get_unmapped_area() allocates
> >>>>>> len+2M then always aligns to the bottom of the discovered gap. That causes the
> >>>>>> 2M hole. As far as I can see, x86 allocates bottom up, so you don't get a hole.
> >>>>>
> >>>>> As a quick hack, perhaps
> >>>>> #ifdef ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
> >>>>> take-the-top-half
> >>>>> #else
> >>>>> current-take-bottom-half-code
> >>>>> #endif
> >>>>>
> >>>>> ?
> >>>
> >>> Thanks for the suggestion. It makes sense to me. Doing the alignment
> >>> needs to take into account this.
> >>>
> >>>>
> >>>> There is a general problem though that there is a trade-off between abutting
> >>>> VMAs, and aligning them to PMD boundaries. This patch has decided that in
> >>>> general the latter is preferable. The case I'm hitting is special though, in
> >>>> that both requirements could be achieved but currently are not.
> >>>>
> >>>> The below fixes it, but I feel like there should be some bitwise magic that
> >>>> would give the correct answer without the conditional - but my head is gone and
> >>>> I can't see it. Any thoughts?
> >>>
> >>> Thanks Ryan for the patch. TBH I didn't see a bitwise magic without
> >>> the conditional either.
> >>>
> >>>>
> >>>> Beyond this, though, there is also a latent bug where the offset provided to
> >>>> mmap() is carried all the way through to the get_unmapped_area()
> >>>> impelementation, even for MAP_ANONYMOUS - I'm pretty sure we should be
> >>>> force-zeroing it for MAP_ANONYMOUS? Certainly before this change, for arches
> >>>> that use the default get_unmapped_area(), any non-zero offset would not have
> >>>> been used. But this change starts using it, which is incorrect. That said, there
> >>>> are some arches that override the default get_unmapped_area() and do use the
> >>>> offset. So I'm not sure if this is a bug or a feature that user space can pass
> >>>> an arbitrary value to the implementation for anon memory??
> >>>
> >>> Thanks for noticing this. If I read the code correctly, the pgoff used
> >>> by some arches to workaround VIPT caches, and it looks like it is for
> >>> shared mapping only (just checked arm and mips). And I believe
> >>> everybody assumes 0 should be used when doing anonymous mapping. The
> >>> offset should have nothing to do with seeking proper unmapped virtual
> >>> area. But the pgoff does make sense for file THP due to the alignment
> >>> requirements. I think it should be zero'ed for anonymous mappings,
> >>> like:
> >>>
> >>> diff --git a/mm/mmap.c b/mm/mmap.c
> >>> index 2ff79b1d1564..a9ed353ce627 100644
> >>> --- a/mm/mmap.c
> >>> +++ b/mm/mmap.c
> >>> @@ -1830,6 +1830,7 @@ get_unmapped_area(struct file *file, unsigned
> >>> long addr, unsigned long len,
> >>>                 pgoff = 0;
> >>>                 get_area = shmem_get_unmapped_area;
> >>>         } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> >>> +               pgoff = 0;
> >>>                 /* Ensures that larger anonymous mappings are THP aligned. */
> >>>                 get_area = thp_get_unmapped_area;
> >>>         }
> >>
> >> I think it would be cleaner to just zero pgoff if file==NULL, then it covers the
> >> shared case, the THP case, and the non-THP case properly. I'll prepare a
> >> separate patch for this.
> >
> > IIUC I don't think this is ok for those arches which have to
> > workaround VIPT cache since MAP_ANONYMOUS | MAP_SHARED with NULL file
> > pointer is a common case for creating tmpfs mapping. For example,
> > arm's arch_get_unmapped_area() has:
> >
> > if (aliasing)
> >         do_align = filp || (flags & MAP_SHARED);
> >
> > The pgoff is needed if do_align is true. So we should just zero pgoff
> > iff !file && !MAP_SHARED like what my patch does, we can move the
> > zeroing to a better place.
>
> We crossed streams - I sent out the patch just as you sent this. My patch is
> implemented as I proposed.

We crossed again :-)

>
> I'm not sure I agree with what you are saying. The mmap man page says this:
>
>   The  contents  of  a file mapping (as opposed to an anonymous mapping; see
>   MAP_ANONYMOUS below), are initialized using length bytes starting at offset
>   offset in the file (or other object) referred to by the file descriptor fd.
>
> So that implies offset is only relavent when a file is provided. It then goes on
> to say:
>
>   MAP_ANONYMOUS
>   The mapping is not backed by any file; its contents are initialized to zero.
>   The fd argument is ignored; however, some implementations require fd to be -1
>   if MAP_ANONYMOUS (or MAP_ANON) is specified, and portable applications should
>   ensure this. The offset argument should be zero.
>
> So users are expected to pass offset=0 when mapping anon memory, for both shared
> and private cases.
>
> Infact, in the line above where you made your proposed change, pgoff is also
> being zeroed for the (!file && (flags & MAP_SHARED)) case.

Yeah, rethinking led me to the same conclusion.

>
>
> >
> >>
> >>
> >>>
> >>>>
> >>>> Finally, the second test failure I reported (ksm_tests) is actually caused by a
> >>>> bug in the test code, but provoked by this change. So I'll send out a fix for
> >>>> the test code separately.
> >>>>
> >>>>
> >>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>>> index 4f542444a91f..68ac54117c77 100644
> >>>> --- a/mm/huge_memory.c
> >>>> +++ b/mm/huge_memory.c
> >>>> @@ -632,7 +632,7 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
> >>>>  {
> >>>>         loff_t off_end = off + len;
> >>>>         loff_t off_align = round_up(off, size);
> >>>> -       unsigned long len_pad, ret;
> >>>> +       unsigned long len_pad, ret, off_sub;
> >>>>
> >>>>         if (off_end <= off_align || (off_end - off_align) < size)
> >>>>                 return 0;
> >>>> @@ -658,7 +658,13 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
> >>>>         if (ret == addr)
> >>>>                 return addr;
> >>>>
> >>>> -       ret += (off - ret) & (size - 1);
> >>>> +       off_sub = (off - ret) & (size - 1);
> >>>> +
> >>>> +       if (current->mm->get_unmapped_area == arch_get_unmapped_area_topdown &&
> >>>> +           !off_sub)
> >>>> +               return ret + size;
> >>>> +
> >>>> +       ret += off_sub;
> >>>>         return ret;
> >>>>  }
> >>>
> >>> I didn't spot any problem, would you please come up with a formal patch?
> >>
> >> Yeah, I'll aim to post today.
> >
> > Thanks!
> >
> >>
> >>
>
Kefeng Wang May 7, 2024, 8:25 a.m. UTC | #12
Hi Ryan, Yang and all,

We see another regression on arm64 (no issue on x86) when testing memory
latency with lmbench:

./lat_mem_rd -P 1 512M 128

Memory latency (smaller is better):

MiB     6.9-rc7	6.9-rc7+revert
0.00049	1.539 	1.539
0.00098	1.539 	1.539
0.00195	1.539 	1.539
0.00293	1.539 	1.539
0.00391	1.539 	1.539
0.00586	1.539 	1.539
0.00781	1.539 	1.539
0.01172	1.539 	1.539
0.01562	1.539 	1.539
0.02344	1.539 	1.539
0.03125	1.539 	1.539
0.04688	1.539 	1.539
0.0625	1.540 	1.540
0.09375	3.634 	3.086
0.125   3.874 	3.175
0.1875  3.544 	3.288
0.25    3.556 	3.461
0.375   3.641 	3.644
0.5     4.125 	3.851
0.75    4.968 	4.323
1       5.143 	4.686
1.5     5.309 	4.957
2       5.370 	5.116
3       5.430 	5.471
4       5.457 	5.671
6       6.100 	6.170
8       6.496 	6.468

-----------------------
* L1 cache = 8M; there are no big changes below 8M *
* but the latency is reduced a lot from L2 onward when this patch is reverted *

12      6.917 	6.840
16      7.268 	7.077
24      7.536 	7.345
32      10.723 	9.421
48      14.220 	11.350
64      16.253 	12.189
96      14.494 	12.507
128     14.630 	12.560
192     15.402 	12.967
256     16.178 	12.957
384     15.177 	13.346
512     15.235 	13.233

After a quick check of the smaps I didn't find any clues. Any suggestions?

Thanks.

On 2024/1/24 1:26, Ryan Roberts wrote:
> On 23/01/2024 17:14, Yang Shi wrote:
>> On Tue, Jan 23, 2024 at 1:41 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>
>>> On 22/01/2024 19:43, Yang Shi wrote:
>>>> On Mon, Jan 22, 2024 at 3:37 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>>
>>>>> On 20/01/2024 16:39, Matthew Wilcox wrote:
>>>>>> On Sat, Jan 20, 2024 at 12:04:27PM +0000, Ryan Roberts wrote:
>>>>>>> However, after this patch, each allocation is in its own VMA, and there is a 2M
>>>>>>> gap between each VMA. This causes 2 problems: 1) mmap becomes MUCH slower
>>>>>>> because there are so many VMAs to check to find a new 1G gap. 2) It fails once
>>>>>>> it hits the VMA limit (/proc/sys/vm/max_map_count). Hitting this limit then
>>>>>>> causes a subsequent calloc() to fail, which causes the test to fail.
>>>>>>>
>>>>>>> Looking at the code, I think the problem is that arm64 selects
>>>>>>> ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. But __thp_get_unmapped_area() allocates
>>>>>>> len+2M then always aligns to the bottom of the discovered gap. That causes the
>>>>>>> 2M hole. As far as I can see, x86 allocates bottom up, so you don't get a hole.
>>>>>>
>>>>>> As a quick hack, perhaps
>>>>>> #ifdef ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
>>>>>> take-the-top-half
>>>>>> #else
>>>>>> current-take-bottom-half-code
>>>>>> #endif
>>>>>>
>>>>>> ?
>>>>
>>>> Thanks for the suggestion. It makes sense to me. Doing the alignment
>>>> needs to take into account this.
>>>>
>>>>>
>>>>> There is a general problem though that there is a trade-off between abutting
>>>>> VMAs, and aligning them to PMD boundaries. This patch has decided that in
>>>>> general the latter is preferable. The case I'm hitting is special though, in
>>>>> that both requirements could be achieved but currently are not.
>>>>>
>>>>> The below fixes it, but I feel like there should be some bitwise magic that
>>>>> would give the correct answer without the conditional - but my head is gone and
>>>>> I can't see it. Any thoughts?
>>>>
>>>> Thanks Ryan for the patch. TBH I didn't see a bitwise magic without
>>>> the conditional either.
>>>>
>>>>>
>>>>> Beyond this, though, there is also a latent bug where the offset provided to
>>>>> mmap() is carried all the way through to the get_unmapped_area()
>>>>> impelementation, even for MAP_ANONYMOUS - I'm pretty sure we should be
>>>>> force-zeroing it for MAP_ANONYMOUS? Certainly before this change, for arches
>>>>> that use the default get_unmapped_area(), any non-zero offset would not have
>>>>> been used. But this change starts using it, which is incorrect. That said, there
>>>>> are some arches that override the default get_unmapped_area() and do use the
>>>>> offset. So I'm not sure if this is a bug or a feature that user space can pass
>>>>> an arbitrary value to the implementation for anon memory??
>>>>
>>>> Thanks for noticing this. If I read the code correctly, the pgoff used
>>>> by some arches to workaround VIPT caches, and it looks like it is for
>>>> shared mapping only (just checked arm and mips). And I believe
>>>> everybody assumes 0 should be used when doing anonymous mapping. The
>>>> offset should have nothing to do with seeking proper unmapped virtual
>>>> area. But the pgoff does make sense for file THP due to the alignment
>>>> requirements. I think it should be zero'ed for anonymous mappings,
>>>> like:
>>>>
>>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>>> index 2ff79b1d1564..a9ed353ce627 100644
>>>> --- a/mm/mmap.c
>>>> +++ b/mm/mmap.c
>>>> @@ -1830,6 +1830,7 @@ get_unmapped_area(struct file *file, unsigned
>>>> long addr, unsigned long len,
>>>>                  pgoff = 0;
>>>>                  get_area = shmem_get_unmapped_area;
>>>>          } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
>>>> +               pgoff = 0;
>>>>                  /* Ensures that larger anonymous mappings are THP aligned. */
>>>>                  get_area = thp_get_unmapped_area;
>>>>          }
>>>
>>> I think it would be cleaner to just zero pgoff if file==NULL, then it covers the
>>> shared case, the THP case, and the non-THP case properly. I'll prepare a
>>> separate patch for this.
>>
>> IIUC I don't think this is ok for those arches which have to
>> workaround VIPT cache since MAP_ANONYMOUS | MAP_SHARED with NULL file
>> pointer is a common case for creating tmpfs mapping. For example,
>> arm's arch_get_unmapped_area() has:
>>
>> if (aliasing)
>>          do_align = filp || (flags & MAP_SHARED);
>>
>> The pgoff is needed if do_align is true. So we should just zero pgoff
>> iff !file && !MAP_SHARED like what my patch does, we can move the
>> zeroing to a better place.
> 
> We crossed streams - I sent out the patch just as you sent this. My patch is
> implemented as I proposed.
> 
> I'm not sure I agree with what you are saying. The mmap man page says this:
> 
>    The  contents  of  a file mapping (as opposed to an anonymous mapping; see
>    MAP_ANONYMOUS below), are initialized using length bytes starting at offset
>    offset in the file (or other object) referred to by the file descriptor fd.
> 
> So that implies offset is only relavent when a file is provided. It then goes on
> to say:
> 
>    MAP_ANONYMOUS
>    The mapping is not backed by any file; its contents are initialized to zero.
>    The fd argument is ignored; however, some implementations require fd to be -1
>    if MAP_ANONYMOUS (or MAP_ANON) is specified, and portable applications should
>    ensure this. The offset argument should be zero.
> 
> So users are expected to pass offset=0 when mapping anon memory, for both shared
> and private cases.
> 
> Infact, in the line above where you made your proposed change, pgoff is also
> being zeroed for the (!file && (flags & MAP_SHARED)) case.
> 
> 
>>
>>>
>>>
>>>>
>>>>>
>>>>> Finally, the second test failure I reported (ksm_tests) is actually caused by a
>>>>> bug in the test code, but provoked by this change. So I'll send out a fix for
>>>>> the test code separately.
>>>>>
>>>>>
>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>> index 4f542444a91f..68ac54117c77 100644
>>>>> --- a/mm/huge_memory.c
>>>>> +++ b/mm/huge_memory.c
>>>>> @@ -632,7 +632,7 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
>>>>>   {
>>>>>          loff_t off_end = off + len;
>>>>>          loff_t off_align = round_up(off, size);
>>>>> -       unsigned long len_pad, ret;
>>>>> +       unsigned long len_pad, ret, off_sub;
>>>>>
>>>>>          if (off_end <= off_align || (off_end - off_align) < size)
>>>>>                  return 0;
>>>>> @@ -658,7 +658,13 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
>>>>>          if (ret == addr)
>>>>>                  return addr;
>>>>>
>>>>> -       ret += (off - ret) & (size - 1);
>>>>> +       off_sub = (off - ret) & (size - 1);
>>>>> +
>>>>> +       if (current->mm->get_unmapped_area == arch_get_unmapped_area_topdown &&
>>>>> +           !off_sub)
>>>>> +               return ret + size;
>>>>> +
>>>>> +       ret += off_sub;
>>>>>          return ret;
>>>>>   }
>>>>
>>>> I didn't spot any problem, would you please come up with a formal patch?
>>>
>>> Yeah, I'll aim to post today.
>>
>> Thanks!
>>
>>>
>>>
> 
>
Ryan Roberts May 7, 2024, 10:08 a.m. UTC | #13
On 07/05/2024 09:25, Kefeng Wang wrote:
> Hi Ryan, Yang and all,
> 
> We see another regression on arm64(no issue on x86) when test memory
> latency from lmbench,
> 
> ./lat_mem_rd -P 1 512M 128

Do you know exactly what this test is doing?

> 
> memory latency(smaller is better)
> 
> MiB     6.9-rc7    6.9-rc7+revert

And what exactly have you reverted? I'm guessing just commit efa7df3e3bb5 ("mm:
align larger anonymous mappings on THP boundaries")?

> 0.00049    1.539     1.539
> 0.00098    1.539     1.539
> 0.00195    1.539     1.539
> 0.00293    1.539     1.539
> 0.00391    1.539     1.539
> 0.00586    1.539     1.539
> 0.00781    1.539     1.539
> 0.01172    1.539     1.539
> 0.01562    1.539     1.539
> 0.02344    1.539     1.539
> 0.03125    1.539     1.539
> 0.04688    1.539     1.539
> 0.0625    1.540     1.540
> 0.09375    3.634     3.086

So the first regression is for 96K - I'm guessing that's the mmap size? That
size shouldn't even be affected by this patch, apart from a few adds and a
compare which determines the size is too small to do PMD alignment for.
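For reference, the size check at the top of __thp_get_unmapped_area() (the same
lines are visible in the diff quoted elsewhere in this thread) is roughly:

	loff_t off_end = off + len;
	loff_t off_align = round_up(off, size);

	/* too small to contain an aligned PMD-sized (2M) window: fall back */
	if (off_end <= off_align || (off_end - off_align) < size)
		return 0;

A 96K request can never contain a full 2M-aligned 2M window, so it should take
this early exit and fall through to the regular get_unmapped_area() path.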

> 0.125   3.874     3.175
> 0.1875  3.544     3.288
> 0.25    3.556     3.461
> 0.375   3.641     3.644
> 0.5     4.125     3.851
> 0.75    4.968     4.323
> 1       5.143     4.686
> 1.5     5.309     4.957
> 2       5.370     5.116
> 3       5.430     5.471
> 4       5.457     5.671
> 6       6.100     6.170
> 8       6.496     6.468
> 
> -----------------------s
> * L1 cache = 8M, it is no big changes below 8M *
> * but the latency reduce a lot when revert this patch from L2 *
> 
> 12      6.917     6.840
> 16      7.268     7.077
> 24      7.536     7.345
> 32      10.723     9.421
> 48      14.220     11.350
> 64      16.253     12.189
> 96      14.494     12.507
> 128     14.630     12.560
> 192     15.402     12.967
> 256     16.178     12.957
> 384     15.177     13.346
> 512     15.235     13.233
> 
> After quickly check the smaps, but don't find any clues, any suggestion?

Without knowing exactly what the test does, it's difficult to know what to
suggest. If you want to try something semi-randomly; it might be useful to rule
out the arm64 contpte feature. I don't see how that would be interacting here if
mTHP is disabled (is it?). But its new for 6.9 and arm64 only. Disable with
ARM64_CONTPTE (needs EXPERT) at compile time.

> 
> Thanks.
> 
> On 2024/1/24 1:26, Ryan Roberts wrote:
>> On 23/01/2024 17:14, Yang Shi wrote:
>>> On Tue, Jan 23, 2024 at 1:41 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>
>>>> On 22/01/2024 19:43, Yang Shi wrote:
>>>>> On Mon, Jan 22, 2024 at 3:37 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>>>
>>>>>> On 20/01/2024 16:39, Matthew Wilcox wrote:
>>>>>>> On Sat, Jan 20, 2024 at 12:04:27PM +0000, Ryan Roberts wrote:
>>>>>>>> However, after this patch, each allocation is in its own VMA, and there
>>>>>>>> is a 2M
>>>>>>>> gap between each VMA. This causes 2 problems: 1) mmap becomes MUCH slower
>>>>>>>> because there are so many VMAs to check to find a new 1G gap. 2) It
>>>>>>>> fails once
>>>>>>>> it hits the VMA limit (/proc/sys/vm/max_map_count). Hitting this limit then
>>>>>>>> causes a subsequent calloc() to fail, which causes the test to fail.
>>>>>>>>
>>>>>>>> Looking at the code, I think the problem is that arm64 selects
>>>>>>>> ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. But __thp_get_unmapped_area()
>>>>>>>> allocates
>>>>>>>> len+2M then always aligns to the bottom of the discovered gap. That
>>>>>>>> causes the
>>>>>>>> 2M hole. As far as I can see, x86 allocates bottom up, so you don't get
>>>>>>>> a hole.
>>>>>>>
>>>>>>> As a quick hack, perhaps
>>>>>>> #ifdef ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
>>>>>>> take-the-top-half
>>>>>>> #else
>>>>>>> current-take-bottom-half-code
>>>>>>> #endif
>>>>>>>
>>>>>>> ?
>>>>>
>>>>> Thanks for the suggestion. It makes sense to me. Doing the alignment
>>>>> needs to take into account this.
>>>>>
>>>>>>
>>>>>> There is a general problem though that there is a trade-off between abutting
>>>>>> VMAs, and aligning them to PMD boundaries. This patch has decided that in
>>>>>> general the latter is preferable. The case I'm hitting is special though, in
>>>>>> that both requirements could be achieved but currently are not.
>>>>>>
>>>>>> The below fixes it, but I feel like there should be some bitwise magic that
>>>>>> would give the correct answer without the conditional - but my head is
>>>>>> gone and
>>>>>> I can't see it. Any thoughts?
>>>>>
>>>>> Thanks Ryan for the patch. TBH I didn't see a bitwise magic without
>>>>> the conditional either.
>>>>>
>>>>>>
>>>>>> Beyond this, though, there is also a latent bug where the offset provided to
>>>>>> mmap() is carried all the way through to the get_unmapped_area()
>>>>>> impelementation, even for MAP_ANONYMOUS - I'm pretty sure we should be
>>>>>> force-zeroing it for MAP_ANONYMOUS? Certainly before this change, for arches
>>>>>> that use the default get_unmapped_area(), any non-zero offset would not have
>>>>>> been used. But this change starts using it, which is incorrect. That said,
>>>>>> there
>>>>>> are some arches that override the default get_unmapped_area() and do use the
>>>>>> offset. So I'm not sure if this is a bug or a feature that user space can
>>>>>> pass
>>>>>> an arbitrary value to the implementation for anon memory??
>>>>>
>>>>> Thanks for noticing this. If I read the code correctly, the pgoff used
>>>>> by some arches to workaround VIPT caches, and it looks like it is for
>>>>> shared mapping only (just checked arm and mips). And I believe
>>>>> everybody assumes 0 should be used when doing anonymous mapping. The
>>>>> offset should have nothing to do with seeking proper unmapped virtual
>>>>> area. But the pgoff does make sense for file THP due to the alignment
>>>>> requirements. I think it should be zero'ed for anonymous mappings,
>>>>> like:
>>>>>
>>>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>>>> index 2ff79b1d1564..a9ed353ce627 100644
>>>>> --- a/mm/mmap.c
>>>>> +++ b/mm/mmap.c
>>>>> @@ -1830,6 +1830,7 @@ get_unmapped_area(struct file *file, unsigned
>>>>> long addr, unsigned long len,
>>>>>                  pgoff = 0;
>>>>>                  get_area = shmem_get_unmapped_area;
>>>>>          } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
>>>>> +               pgoff = 0;
>>>>>                  /* Ensures that larger anonymous mappings are THP aligned. */
>>>>>                  get_area = thp_get_unmapped_area;
>>>>>          }
>>>>
>>>> I think it would be cleaner to just zero pgoff if file==NULL, then it covers
>>>> the
>>>> shared case, the THP case, and the non-THP case properly. I'll prepare a
>>>> separate patch for this.
>>>
>>> IIUC I don't think this is ok for those arches which have to
>>> workaround VIPT cache since MAP_ANONYMOUS | MAP_SHARED with NULL file
>>> pointer is a common case for creating tmpfs mapping. For example,
>>> arm's arch_get_unmapped_area() has:
>>>
>>> if (aliasing)
>>>          do_align = filp || (flags & MAP_SHARED);
>>>
>>> The pgoff is needed if do_align is true. So we should just zero pgoff
>>> iff !file && !MAP_SHARED like what my patch does, we can move the
>>> zeroing to a better place.
>>
>> We crossed streams - I sent out the patch just as you sent this. My patch is
>> implemented as I proposed.
>>
>> I'm not sure I agree with what you are saying. The mmap man page says this:
>>
>>    The  contents  of  a file mapping (as opposed to an anonymous mapping; see
>>    MAP_ANONYMOUS below), are initialized using length bytes starting at offset
>>    offset in the file (or other object) referred to by the file descriptor fd.
>>
>> So that implies offset is only relavent when a file is provided. It then goes on
>> to say:
>>
>>    MAP_ANONYMOUS
>>    The mapping is not backed by any file; its contents are initialized to zero.
>>    The fd argument is ignored; however, some implementations require fd to be -1
>>    if MAP_ANONYMOUS (or MAP_ANON) is specified, and portable applications should
>>    ensure this. The offset argument should be zero.
>>
>> So users are expected to pass offset=0 when mapping anon memory, for both shared
>> and private cases.
>>
>> Infact, in the line above where you made your proposed change, pgoff is also
>> being zeroed for the (!file && (flags & MAP_SHARED)) case.
>>
>>
>>>
>>>>
>>>>
>>>>>
>>>>>>
>>>>>> Finally, the second test failure I reported (ksm_tests) is actually caused
>>>>>> by a
>>>>>> bug in the test code, but provoked by this change. So I'll send out a fix for
>>>>>> the test code separately.
>>>>>>
>>>>>>
>>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>>> index 4f542444a91f..68ac54117c77 100644
>>>>>> --- a/mm/huge_memory.c
>>>>>> +++ b/mm/huge_memory.c
>>>>>> @@ -632,7 +632,7 @@ static unsigned long __thp_get_unmapped_area(struct
>>>>>> file *filp,
>>>>>>   {
>>>>>>          loff_t off_end = off + len;
>>>>>>          loff_t off_align = round_up(off, size);
>>>>>> -       unsigned long len_pad, ret;
>>>>>> +       unsigned long len_pad, ret, off_sub;
>>>>>>
>>>>>>          if (off_end <= off_align || (off_end - off_align) < size)
>>>>>>                  return 0;
>>>>>> @@ -658,7 +658,13 @@ static unsigned long __thp_get_unmapped_area(struct
>>>>>> file *filp,
>>>>>>          if (ret == addr)
>>>>>>                  return addr;
>>>>>>
>>>>>> -       ret += (off - ret) & (size - 1);
>>>>>> +       off_sub = (off - ret) & (size - 1);
>>>>>> +
>>>>>> +       if (current->mm->get_unmapped_area ==
>>>>>> arch_get_unmapped_area_topdown &&
>>>>>> +           !off_sub)
>>>>>> +               return ret + size;
>>>>>> +
>>>>>> +       ret += off_sub;
>>>>>>          return ret;
>>>>>>   }
>>>>>
>>>>> I didn't spot any problem, would you please come up with a formal patch?
>>>>
>>>> Yeah, I'll aim to post today.
>>>
>>> Thanks!
>>>
>>>>
>>>>
>>
>>
Kefeng Wang May 7, 2024, 10:59 a.m. UTC | #14
On 2024/5/7 18:08, Ryan Roberts wrote:
> On 07/05/2024 09:25, Kefeng Wang wrote:
>> Hi Ryan, Yang and all,
>>
>> We see another regression on arm64(no issue on x86) when test memory
>> latency from lmbench,
>>
>> ./lat_mem_rd -P 1 512M 128
> 
> Do you know exectly what this test is doing?

lat_mem_rd measures memory read latency for varying memory sizes and
strides, see https://lmbench.sourceforge.net/man/lat_mem_rd.8.html
> 
>>
>> memory latency(smaller is better)
>>
>> MiB     6.9-rc7    6.9-rc7+revert
> 
> And what exactly have you reverted? I'm guessing just commit efa7df3e3bb5 ("mm:
> align larger anonymous mappings on THP boundaries")?

Yes, just revert efa7df3e3bb5.
> 
>> 0.00049    1.539     1.539
>> 0.00098    1.539     1.539
>> 0.00195    1.539     1.539
>> 0.00293    1.539     1.539
>> 0.00391    1.539     1.539
>> 0.00586    1.539     1.539
>> 0.00781    1.539     1.539
>> 0.01172    1.539     1.539
>> 0.01562    1.539     1.539
>> 0.02344    1.539     1.539
>> 0.03125    1.539     1.539
>> 0.04688    1.539     1.539
>> 0.0625    1.540     1.540
>> 0.09375    3.634     3.086
> 
> So the first regression is for 96K - I'm guessing that's the mmap size? That
> size shouldn't even be affected by this patch, apart from a few adds and a
> compare which determines the size is too small to do PMD alignment for.

Yes, no anon thp.
> 
>> 0.125   3.874     3.175
>> 0.1875  3.544     3.288
>> 0.25    3.556     3.461
>> 0.375   3.641     3.644
>> 0.5     4.125     3.851
>> 0.75    4.968     4.323
>> 1       5.143     4.686
>> 1.5     5.309     4.957
>> 2       5.370     5.116
>> 3       5.430     5.471
>> 4       5.457     5.671
>> 6       6.100     6.170
>> 8       6.496     6.468
>>
>> -----------------------s
>> * L1 cache = 8M, it is no big changes below 8M *
>> * but the latency reduce a lot when revert this patch from L2 *
>>
>> 12      6.917     6.840
>> 16      7.268     7.077
>> 24      7.536     7.345
>> 32      10.723     9.421
>> 48      14.220     11.350
>> 64      16.253     12.189
>> 96      14.494     12.507
>> 128     14.630     12.560
>> 192     15.402     12.967
>> 256     16.178     12.957
>> 384     15.177     13.346
>> 512     15.235     13.233
>>
>> After quickly check the smaps, but don't find any clues, any suggestion?
> 
> Without knowing exactly what the test does, it's difficult to know what to


The major operation (memory read) is shown below:

#define    ONE      p = (char **)*p;
#define    FIVE     ONE ONE ONE ONE ONE
#define    TEN      FIVE FIVE
#define    FIFTY    TEN TEN TEN TEN TEN
#define    HUNDRED  FIFTY FIFTY

     while (iterations-- > 0) {
         for (i = 0; i < count; ++i) {
             HUNDRED;
         }
     }

https://github.com/intel/lmbench/blob/master/src/lat_mem_rd.c#L95
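For context, before that loop runs the buffer is turned into a circular,
stride-based pointer chain and then chased; a minimal sketch of the idea
(simplified, not the exact lmbench code at the link above) looks like:

	#include <stdlib.h>

	/*
	 * Build a circular pointer chain through the buffer with the given
	 * stride, then chase it. Every load depends on the previous one, so
	 * the loop time is dominated by load-to-use latency, not bandwidth.
	 */
	static void chase(size_t len, size_t stride, long iters)
	{
		char *buf = malloc(len);
		char **p;
		size_t i;

		if (!buf)
			return;
		for (i = stride; i < len; i += stride)
			*(char **)&buf[i - stride] = &buf[i];
		*(char **)&buf[i - stride] = &buf[0];	/* wrap around */

		p = (char **)buf;
		while (iters-- > 0)
			p = (char **)*p;		/* the ONE macro above */
		asm volatile("" :: "r"(p));		/* keep p live */

		free(buf);
	}

That is why the result is so sensitive to how the buffer ends up mapped (TLB
reach, page table walks, cache behaviour) rather than to raw bandwidth.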

> suggest. If you want to try something semi-randomly; it might be useful to rule
> out the arm64 contpte feature. I don't see how that would be interacting here if
> mTHP is disabled (is it?). But its new for 6.9 and arm64 only. Disable with
> ARM64_CONTPTE (needs EXPERT) at compile time.
I didn't enable mTHP, so it should not be related to ARM64_CONTPTE,
but I will give it a try.
David Hildenbrand May 7, 2024, 11:13 a.m. UTC | #15
> https://github.com/intel/lmbench/blob/master/src/lat_mem_rd.c#L95
> 
>> suggest. If you want to try something semi-randomly; it might be useful to rule
>> out the arm64 contpte feature. I don't see how that would be interacting here if
>> mTHP is disabled (is it?). But its new for 6.9 and arm64 only. Disable with
>> ARM64_CONTPTE (needs EXPERT) at compile time.
> I don't enabled mTHP, so it should be not related about ARM64_CONTPTE,
> but will have a try.

cont-pte can get active if we're just lucky when allocating pages in the 
right order, correct Ryan?
Ryan Roberts May 7, 2024, 11:14 a.m. UTC | #16
On 07/05/2024 12:13, David Hildenbrand wrote:
> 
>> https://github.com/intel/lmbench/blob/master/src/lat_mem_rd.c#L95
>>
>>> suggest. If you want to try something semi-randomly; it might be useful to rule
>>> out the arm64 contpte feature. I don't see how that would be interacting here if
>>> mTHP is disabled (is it?). But its new for 6.9 and arm64 only. Disable with
>>> ARM64_CONTPTE (needs EXPERT) at compile time.
>> I don't enabled mTHP, so it should be not related about ARM64_CONTPTE,
>> but will have a try.
> 
> cont-pte can get active if we're just lucky when allocating pages in the right
> order, correct Ryan?

No it shouldn't do; it requires the pages to be in the same folio.
Ryan Roberts May 7, 2024, 11:26 a.m. UTC | #17
On 07/05/2024 12:14, Ryan Roberts wrote:
> On 07/05/2024 12:13, David Hildenbrand wrote:
>>
>>> https://github.com/intel/lmbench/blob/master/src/lat_mem_rd.c#L95
>>>
>>>> suggest. If you want to try something semi-randomly; it might be useful to rule
>>>> out the arm64 contpte feature. I don't see how that would be interacting here if
>>>> mTHP is disabled (is it?). But its new for 6.9 and arm64 only. Disable with
>>>> ARM64_CONTPTE (needs EXPERT) at compile time.
>>> I don't enabled mTHP, so it should be not related about ARM64_CONTPTE,
>>> but will have a try.
>>
>> cont-pte can get active if we're just lucky when allocating pages in the right
>> order, correct Ryan?
> 
> No it shouldn't do; it requires the pages to be in the same folio.
> 

That said, if we got lucky in allocating the "right" pages, then we will end up
doing an extra function call and a bit of maths per every 16 PTEs in order to
figure out that the span is not contained by a single folio, before backing out
of an attempt to fold. That would probably be just about measurable.
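For illustration only, the kind of same-folio check being described boils down
to something like this (a hypothetical helper, not the actual contpte_try_fold()
code, and it assumes the 16 PTEs map physically contiguous pages starting at
'first'):

	static inline bool span_within_one_folio(struct page *first, unsigned int nr)
	{
		struct folio *folio = page_folio(first);

		/* folding is only legal if all nr pages belong to this folio */
		return folio_page_idx(folio, first) + nr <= folio_nr_pages(folio);
	}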

But the regression doesn't kick in until 96K, which is the step after 64K. I'd
expect to see the regression on 64K too if that was the issue. The cacheline is
64K so I suspect it could be something related to the cache?
David Hildenbrand May 7, 2024, 11:34 a.m. UTC | #18
On 07.05.24 13:26, Ryan Roberts wrote:
> On 07/05/2024 12:14, Ryan Roberts wrote:
>> On 07/05/2024 12:13, David Hildenbrand wrote:
>>>
>>>> https://github.com/intel/lmbench/blob/master/src/lat_mem_rd.c#L95
>>>>
>>>>> suggest. If you want to try something semi-randomly; it might be useful to rule
>>>>> out the arm64 contpte feature. I don't see how that would be interacting here if
>>>>> mTHP is disabled (is it?). But its new for 6.9 and arm64 only. Disable with
>>>>> ARM64_CONTPTE (needs EXPERT) at compile time.
>>>> I don't enabled mTHP, so it should be not related about ARM64_CONTPTE,
>>>> but will have a try.
>>>
>>> cont-pte can get active if we're just lucky when allocating pages in the right
>>> order, correct Ryan?
>>
>> No it shouldn't do; it requires the pages to be in the same folio.

Ah, my memory comes back. That's also important for folio_pte_batch() to 
currently work as expected I think. We could change that, though, and 
let cont-pte batch across folios.

>>
> 
> That said, if we got lucky in allocating the "right" pages, then we will end up
> doing an extra function call and a bit of maths per every 16 PTEs in order to
> figure out that the span is not contained by a single folio, before backing out
> of an attempt to fold. That would probably be just about measurable.
> 
> But the regression doesn't kick in until 96K, which is the step after 64K. I'd
> expect to see the regression on 64K too if that was the issue. The cacheline is
> 64K so I suspect it could be something related to the cache?
>
David Hildenbrand May 7, 2024, 11:42 a.m. UTC | #19
On 07.05.24 13:34, David Hildenbrand wrote:
> On 07.05.24 13:26, Ryan Roberts wrote:
>> On 07/05/2024 12:14, Ryan Roberts wrote:
>>> On 07/05/2024 12:13, David Hildenbrand wrote:
>>>>
>>>>> https://github.com/intel/lmbench/blob/master/src/lat_mem_rd.c#L95
>>>>>
>>>>>> suggest. If you want to try something semi-randomly; it might be useful to rule
>>>>>> out the arm64 contpte feature. I don't see how that would be interacting here if
>>>>>> mTHP is disabled (is it?). But its new for 6.9 and arm64 only. Disable with
>>>>>> ARM64_CONTPTE (needs EXPERT) at compile time.
>>>>> I don't enabled mTHP, so it should be not related about ARM64_CONTPTE,
>>>>> but will have a try.
>>>>
>>>> cont-pte can get active if we're just lucky when allocating pages in the right
>>>> order, correct Ryan?
>>>
>>> No it shouldn't do; it requires the pages to be in the same folio.
> 
> Ah, my memory comes back. That's also important for folio_pte_batch() to
> currently work as expected I think. We could change that, though, and
> let cont-pte batch across folios.

Thinking about it (and trying to refresh my memories), access/dirty bits 
might be why we don't want to do that.
Ryan Roberts May 7, 2024, 12:36 p.m. UTC | #20
On 07/05/2024 12:42, David Hildenbrand wrote:
> On 07.05.24 13:34, David Hildenbrand wrote:
>> On 07.05.24 13:26, Ryan Roberts wrote:
>>> On 07/05/2024 12:14, Ryan Roberts wrote:
>>>> On 07/05/2024 12:13, David Hildenbrand wrote:
>>>>>
>>>>>> https://github.com/intel/lmbench/blob/master/src/lat_mem_rd.c#L95
>>>>>>
>>>>>>> suggest. If you want to try something semi-randomly; it might be useful
>>>>>>> to rule
>>>>>>> out the arm64 contpte feature. I don't see how that would be interacting
>>>>>>> here if
>>>>>>> mTHP is disabled (is it?). But its new for 6.9 and arm64 only. Disable with
>>>>>>> ARM64_CONTPTE (needs EXPERT) at compile time.
>>>>>> I don't enabled mTHP, so it should be not related about ARM64_CONTPTE,
>>>>>> but will have a try.
>>>>>
>>>>> cont-pte can get active if we're just lucky when allocating pages in the right
>>>>> order, correct Ryan?
>>>>
>>>> No it shouldn't do; it requires the pages to be in the same folio.
>>
>> Ah, my memory comes back. That's also important for folio_pte_batch() to
>> currently work as expected I think. We could change that, though, and
>> let cont-pte batch across folios.
> 
> Thinking about it (and trying to refresh my memories), access/dirty bits might
> be why we don't want to do that.

Yes correct; we only get a single access/dirty bit for the whole contpte block.
So can't honour the core kernel's tracking requirements when the pages are not
part of a single folio.
Kefeng Wang May 7, 2024, 1:53 p.m. UTC | #21
On 2024/5/7 19:13, David Hildenbrand wrote:
> 
>> https://github.com/intel/lmbench/blob/master/src/lat_mem_rd.c#L95
>>
>>> suggest. If you want to try something semi-randomly; it might be 
>>> useful to rule
>>> out the arm64 contpte feature. I don't see how that would be 
>>> interacting here if
>>> mTHP is disabled (is it?). But its new for 6.9 and arm64 only. 
>>> Disable with
>>> ARM64_CONTPTE (needs EXPERT) at compile time.
>> I don't enabled mTHP, so it should be not related about ARM64_CONTPTE,
>> but will have a try.

With ARM64_CONTPTE disabled, memory read latency is similar to the
ARM64_CONTPTE-enabled case (the 6.9-rc7 default), and still larger than
with the anon alignment patch reverted.

> 
> cont-pte can get active if we're just lucky when allocating pages in the 
> right order, correct Ryan?
>
Ryan Roberts May 7, 2024, 3:53 p.m. UTC | #22
On 07/05/2024 14:53, Kefeng Wang wrote:
> 
> 
> On 2024/5/7 19:13, David Hildenbrand wrote:
>>
>>> https://github.com/intel/lmbench/blob/master/src/lat_mem_rd.c#L95
>>>
>>>> suggest. If you want to try something semi-randomly; it might be useful to rule
>>>> out the arm64 contpte feature. I don't see how that would be interacting
>>>> here if
>>>> mTHP is disabled (is it?). But its new for 6.9 and arm64 only. Disable with
>>>> ARM64_CONTPTE (needs EXPERT) at compile time.
>>> I don't enabled mTHP, so it should be not related about ARM64_CONTPTE,
>>> but will have a try.
> 
> After ARM64_CONTPTE disabled, memory read latency is similar with ARM64_CONTPTE
> enabled(default 6.9-rc7), still larger than align anon reverted.

OK thanks for trying.

Looking at the source for lmbench, it's malloc'ing (512M + 8K) up front and using
that for all sizes. That will presumably be considered "large" by malloc and
will be allocated using mmap. So with the patch, it will be 2M aligned. Without
it, it probably won't. I'm still struggling to understand why not aligning it in
virtual space would make it more performant though...

Is it possible to provide the smaps output for at least that 512M+8K block for
both cases? It might give a bit of a clue.

Do you have traditional (PMD-sized) THP enabled? If it's enabled and unaligned
then the front of the buffer wouldn't be mapped with THP, but if it is aligned,
it will. That could affect it.
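If it helps, a throwaway test program along these lines (assuming the 512M+8K
malloc is served directly by mmap, which it should be given glibc's default
128 KiB mmap threshold) would show the pointer's offset within 2M and any
THP-mapped ranges, without needing the full smaps dump:

	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>

	int main(void)
	{
		size_t len = 512UL * 1024 * 1024 + 8192; /* 512M + 8K, as lat_mem_rd allocates */
		unsigned long kb;
		char line[256];
		char *buf = malloc(len);
		FILE *f;

		if (!buf)
			return 1;
		memset(buf, 1, len);                     /* fault the whole buffer in */

		/*
		 * glibc keeps a small chunk header in front of mmap'ed allocations,
		 * so even when the underlying VMA is 2M-aligned the returned pointer
		 * sits a few bytes past the 2M boundary.
		 */
		printf("buf=%p (offset into 2M: 0x%lx)\n", (void *)buf,
		       (unsigned long)buf & ((2UL << 20) - 1));

		/* Print any non-zero AnonHugePages entries from our own smaps. */
		f = fopen("/proc/self/smaps", "r");
		while (f && fgets(line, sizeof(line), f))
			if (sscanf(line, "AnonHugePages: %lu kB", &kb) == 1 && kb)
				printf("%s", line);
		if (f)
			fclose(f);
		return 0;
	}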

> 
>>
>> cont-pte can get active if we're just lucky when allocating pages in the right
>> order, correct Ryan?
>>
Yang Shi May 7, 2024, 5:17 p.m. UTC | #23
On Tue, May 7, 2024 at 8:53 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 07/05/2024 14:53, Kefeng Wang wrote:
> >
> >
> > On 2024/5/7 19:13, David Hildenbrand wrote:
> >>
> >>> https://github.com/intel/lmbench/blob/master/src/lat_mem_rd.c#L95
> >>>
> >>>> suggest. If you want to try something semi-randomly; it might be useful to rule
> >>>> out the arm64 contpte feature. I don't see how that would be interacting
> >>>> here if
> >>>> mTHP is disabled (is it?). But its new for 6.9 and arm64 only. Disable with
> >>>> ARM64_CONTPTE (needs EXPERT) at compile time.
> >>> I don't enabled mTHP, so it should be not related about ARM64_CONTPTE,
> >>> but will have a try.
> >
> > After ARM64_CONTPTE disabled, memory read latency is similar with ARM64_CONTPTE
> > enabled(default 6.9-rc7), still larger than align anon reverted.
>
> OK thanks for trying.
>
> Looking at the source for lmbench, its malloc'ing (512M + 8K) up front and using
> that for all sizes. That will presumably be considered "large" by malloc and
> will be allocated using mmap. So with the patch, it will be 2M aligned. Without
> it, it probably won't. I'm still struggling to understand why not aligning it in
> virtual space would make it more performant though...

Yeah, I'm confused too.

I just ran the same command on 6.6.13 (w/o the thp alignment patch and
mTHP stuff) and 6.9-rc4 (w/ the thp alignment patch and all mTHP
stuff) on my arm64 machine, but I didn't see such a pattern.

The result has a little bit fluctuation, for example, 6.6.13 has
better result with 4M/6M/8M, but 6.9-rc4 has better result for
12M/16M/32M/48M/64M, and the difference may be quite noticeable. But
anyway I didn't see such a regression pattern.

The benchmark is supposed to measure cache and memory latency; its
result strongly depends on the cache and memory subsystem, for example
the HW prefetcher, etc.

>
> Is it possible to provide the smaps output for at least that 512M+8K block for
> both cases? It might give a bit of a clue.
>
> Do you have traditional (PMD-sized) THP enabled? If its enabled and unaligned
> then the front of the buffer wouldn't be mapped with THP, but if it is aligned,
> it will. That could affect it.
>
> >
> >>
> >> cont-pte can get active if we're just lucky when allocating pages in the right
> >> order, correct Ryan?
> >>
>
Kefeng Wang May 8, 2024, 7:48 a.m. UTC | #24
On 2024/5/8 1:17, Yang Shi wrote:
> On Tue, May 7, 2024 at 8:53 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> On 07/05/2024 14:53, Kefeng Wang wrote:
>>>
>>>
>>> On 2024/5/7 19:13, David Hildenbrand wrote:
>>>>
>>>>> https://github.com/intel/lmbench/blob/master/src/lat_mem_rd.c#L95
>>>>>
>>>>>> suggest. If you want to try something semi-randomly; it might be useful to rule
>>>>>> out the arm64 contpte feature. I don't see how that would be interacting
>>>>>> here if
>>>>>> mTHP is disabled (is it?). But its new for 6.9 and arm64 only. Disable with
>>>>>> ARM64_CONTPTE (needs EXPERT) at compile time.
>>>>> I don't enabled mTHP, so it should be not related about ARM64_CONTPTE,
>>>>> but will have a try.
>>>
>>> After ARM64_CONTPTE disabled, memory read latency is similar with ARM64_CONTPTE
>>> enabled(default 6.9-rc7), still larger than align anon reverted.
>>
>> OK thanks for trying.
>>
>> Looking at the source for lmbench, its malloc'ing (512M + 8K) up front and using
>> that for all sizes. That will presumably be considered "large" by malloc and
>> will be allocated using mmap. So with the patch, it will be 2M aligned. Without
>> it, it probably won't. I'm still struggling to understand why not aligning it in
>> virtual space would make it more performant though...
> 
> Yeah, I'm confused too.
Me too. I grabbed smaps[_rollup] for the 0.09375M size; the biggest difference
for the anon mappings is shown below, and the full output is attached.

1) with efa7df3e3bb5 smaps

ffff68e00000-ffff88e03000 rw-p 00000000 00:00 0
Size:             524300 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                2048 kB
Pss:                2048 kB
Pss_Dirty:          2048 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:      2048 kB
Referenced:         2048 kB
Anonymous:          2048 kB // we have 1 anon thp
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:      2048 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           1
VmFlags: rd wr mr mw me ac
ffff88eff000-ffff89000000 rw-p 00000000 00:00 0
Size:               1028 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                1028 kB
Pss:                1028 kB
Pss_Dirty:          1028 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:      1028 kB
Referenced:         1028 kB
Anonymous:          1028 kB // another large anon
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac

and the smap_rollup

00400000-fffff56bd000 ---p 00000000 00:00 0 
[rollup]
Rss:                4724 kB
Pss:                3408 kB
Pss_Dirty:          3338 kB
Pss_Anon:           3338 kB
Pss_File:             70 kB
Pss_Shmem:             0 kB
Shared_Clean:       1176 kB
Shared_Dirty:        420 kB
Private_Clean:         0 kB
Private_Dirty:      3128 kB
Referenced:         4344 kB
Anonymous:          3548 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:      2048 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB

2) without efa7df3e3bb5 smaps

ffff9845b000-ffffb855f000 rw-p 00000000 00:00 0
Size:             525328 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                1128 kB
Pss:                1128 kB
Pss_Dirty:          1128 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:      1128 kB
Referenced:         1128 kB
Anonymous:          1128 kB // only large anon
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           1
VmFlags: rd wr mr mw me ac

and the smap_rollup,

00400000-ffffca5dc000 ---p 00000000 00:00 0 
[rollup]
Rss:                2600 kB
Pss:                1472 kB
Pss_Dirty:          1388 kB
Pss_Anon:           1388 kB
Pss_File:             84 kB
Pss_Shmem:             0 kB
Shared_Clean:       1000 kB
Shared_Dirty:        424 kB
Private_Clean:         0 kB
Private_Dirty:      1176 kB
Referenced:         2220 kB
Anonymous:          1600 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB

> 
> I just ran the same command on 6.6.13 (w/o the thp alignment patch and
> mTHP stuff) and 6.9-rc4 (w/ the thp alignment patch and all mTHP
> stuff) on my arm64 machine, but I didn't see such a pattern.
> 
> The result has a little bit fluctuation, for example, 6.6.13 has
> better result with 4M/6M/8M, but 6.9-rc4 has better result for
> 12M/16M/32M/48M/64M, and the difference may be quite noticeable. But
> anyway I didn't see such a regression pattern.

It is not fluctuation, though; on our arm64 it is very noticeable.

> 
> The benchmark is supposed to measure cache and memory latency, its
> result strongly relies on the cache and memory subsystem, for example,
> hw prefetcher, etc.

Yes, I will try another type of arm64 machine if possible; none is available right now.


> 
>>
>> Is it possible to provide the smaps output for at least that 512M+8K block for
>> both cases? It might give a bit of a clue.

Will collect more smaps.

>>
>> Do you have traditional (PMD-sized) THP enabled? If its enabled and unaligned
>> then the front of the buffer wouldn't be mapped with THP, but if it is aligned,
>> it will. That could affect it.

Yes, PMD-sized THP is enabled. At least for the above smaps, without
efa7df3e3bb5 the anon memory is not mapped with THP.
00400000-fffff56bd000 ---p 00000000 00:00 0                              [rollup]
Rss:                4724 kB
Pss:                3408 kB
Pss_Dirty:          3338 kB
Pss_Anon:           3338 kB
Pss_File:             70 kB
Pss_Shmem:             0 kB
Shared_Clean:       1176 kB
Shared_Dirty:        420 kB
Private_Clean:         0 kB
Private_Dirty:      3128 kB
Referenced:         4344 kB
Anonymous:          3548 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:      2048 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
00400000-00418000 r-xp 00000000 fd:05 25219030                           /home/zz/lmbench-3.0-a9/bin/aarch64-linux-gnu/lat_mem_rd
Size:                 96 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  96 kB
Pss:                  48 kB
Pss_Dirty:             0 kB
Shared_Clean:         96 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:           96 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
0042f000-00430000 r--p 0001f000 fd:05 25219030                           /home/zz/lmbench-3.0-a9/bin/aarch64-linux-gnu/lat_mem_rd
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            4 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
00430000-00431000 rw-p 00020000 fd:05 25219030                           /home/zz/lmbench-3.0-a9/bin/aarch64-linux-gnu/lat_mem_rd
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   4 kB
Pss_Dirty:             4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
3d0b5000-3d0d6000 rw-p 00000000 00:00 0                                  [heap]
Size:                132 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   8 kB
Pss:                   8 kB
Pss_Dirty:             8 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         8 kB
Referenced:            8 kB
Anonymous:             8 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffff68e00000-ffff88e03000 rw-p 00000000 00:00 0 
Size:             524300 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                2048 kB
Pss:                2048 kB
Pss_Dirty:          2048 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:      2048 kB
Referenced:         2048 kB
Anonymous:          2048 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:      2048 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           1
VmFlags: rd wr mr mw me ac 
ffff88eff000-ffff89000000 rw-p 00000000 00:00 0 
Size:               1028 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                1028 kB
Pss:                1028 kB
Pss_Dirty:          1028 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:      1028 kB
Referenced:         1028 kB
Anonymous:          1028 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffff89000000-ffff89275000 r-xp 00000000 fd:00 270258                     /usr/lib64/libcrypto.so.1.1.1m
Size:               2516 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           1
VmFlags: rd ex mr mw me 
ffff89275000-ffff89287000 ---p 00275000 fd:00 270258                     /usr/lib64/libcrypto.so.1.1.1m
Size:                 72 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: mr mw me 
ffff89287000-ffff892b0000 r--p 00277000 fd:00 270258                     /usr/lib64/libcrypto.so.1.1.1m
Size:                164 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                 164 kB
Pss:                  82 kB
Pss_Dirty:            82 kB
Shared_Clean:          0 kB
Shared_Dirty:        164 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:           164 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffff892b0000-ffff892b4000 rw-p 002a0000 fd:00 270258                     /usr/lib64/libcrypto.so.1.1.1m
Size:                 16 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  16 kB
Pss:                   8 kB
Pss_Dirty:             8 kB
Shared_Clean:          0 kB
Shared_Dirty:         16 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:            16 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffff892b4000-ffff892b8000 rw-p 00000000 00:00 0 
Size:                 16 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   8 kB
Pss:                   4 kB
Pss_Dirty:             4 kB
Shared_Clean:          0 kB
Shared_Dirty:          8 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             8 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffff89350000-ffff89352000 rw-p 00000000 00:00 0 
Size:                  8 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   8 kB
Pss:                   6 kB
Pss_Dirty:             6 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            8 kB
Anonymous:             8 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffff89352000-ffff893d2000 r-xp 00000000 fd:00 264838                     /usr/lib64/libpcre2-8.so.0.10.4
Size:                512 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
ffff893d2000-ffff893f1000 ---p 00080000 fd:00 264838                     /usr/lib64/libpcre2-8.so.0.10.4
Size:                124 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: mr mw me 
ffff893f1000-ffff893f2000 r--p 0008f000 fd:00 264838                     /usr/lib64/libpcre2-8.so.0.10.4
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffff893f2000-ffff893f3000 rw-p 00090000 fd:00 264838                     /usr/lib64/libpcre2-8.so.0.10.4
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffff893f3000-ffff8940b000 r-xp 00000000 fd:00 264844                     /usr/lib64/libz.so.1.2.11
Size:                 96 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
ffff8940b000-ffff89422000 ---p 00018000 fd:00 264844                     /usr/lib64/libz.so.1.2.11
Size:                 92 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: mr mw me 
ffff89422000-ffff89423000 r--p 0001f000 fd:00 264844                     /usr/lib64/libz.so.1.2.11
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffff89423000-ffff89424000 rw-p 00020000 fd:00 264844                     /usr/lib64/libz.so.1.2.11
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffff89424000-ffff8944c000 r-xp 00000000 fd:00 264087                     /usr/lib64/libselinux.so.1
Size:                160 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
ffff8944c000-ffff89463000 ---p 00028000 fd:00 264087                     /usr/lib64/libselinux.so.1
Size:                 92 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: mr mw me 
ffff89463000-ffff89464000 r--p 0002f000 fd:00 264087                     /usr/lib64/libselinux.so.1
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffff89464000-ffff89465000 rw-p 00030000 fd:00 264087                     /usr/lib64/libselinux.so.1
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffff89465000-ffff89469000 rw-p 00000000 00:00 0 
Size:                 16 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  12 kB
Pss:                   6 kB
Pss_Dirty:             6 kB
Shared_Clean:          0 kB
Shared_Dirty:         12 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            4 kB
Anonymous:            12 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffff89469000-ffff89476000 r-xp 00000000 fd:00 264465                     /usr/lib64/libresolv.so.2
Size:                 52 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
ffff89476000-ffff89488000 ---p 0000d000 fd:00 264465                     /usr/lib64/libresolv.so.2
Size:                 72 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: mr mw me 
ffff89488000-ffff89489000 r--p 0000f000 fd:00 264465                     /usr/lib64/libresolv.so.2
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffff89489000-ffff8948a000 rw-p 00010000 fd:00 264465                     /usr/lib64/libresolv.so.2
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffff8948a000-ffff8948c000 rw-p 00000000 00:00 0 
Size:                  8 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffff8948c000-ffff89490000 r-xp 00000000 fd:00 265039                     /usr/lib64/libkeyutils.so.1.10
Size:                 16 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
ffff89490000-ffff894ab000 ---p 00004000 fd:00 265039                     /usr/lib64/libkeyutils.so.1.10
Size:                108 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: mr mw me 
ffff894ab000-ffff894ac000 r--p 0000f000 fd:00 265039                     /usr/lib64/libkeyutils.so.1.10
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffff894ac000-ffff894ad000 rw-p 00010000 fd:00 265039                     /usr/lib64/libkeyutils.so.1.10
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffff894ad000-ffff894bc000 r-xp 00000000 fd:00 268503                     /usr/lib64/libkrb5support.so.0.1
Size:                 60 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
ffff894bc000-ffff894cc000 ---p 0000f000 fd:00 268503                     /usr/lib64/libkrb5support.so.0.1
Size:                 64 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: mr mw me 
ffff894cc000-ffff894cd000 r--p 0000f000 fd:00 268503                     /usr/lib64/libkrb5support.so.0.1
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffff894cd000-ffff894ce000 rw-p 00010000 fd:00 268503                     /usr/lib64/libkrb5support.so.0.1
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffff894ce000-ffff894d1000 r-xp 00000000 fd:00 268636                     /usr/lib64/libcom_err.so.2.1
Size:                 12 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
ffff894d1000-ffff894ed000 ---p 00003000 fd:00 268636                     /usr/lib64/libcom_err.so.2.1
Size:                112 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: mr mw me 
ffff894ed000-ffff894ee000 r--p 0000f000 fd:00 268636                     /usr/lib64/libcom_err.so.2.1
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffff894ee000-ffff894ef000 rw-p 00010000 fd:00 268636                     /usr/lib64/libcom_err.so.2.1
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffff894ef000-ffff89504000 r-xp 00000000 fd:00 268495                     /usr/lib64/libk5crypto.so.3.1
Size:                 84 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
ffff89504000-ffff8951d000 ---p 00015000 fd:00 268495                     /usr/lib64/libk5crypto.so.3.1
Size:                100 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: mr mw me 
ffff8951d000-ffff8951f000 r--p 0001e000 fd:00 268495                     /usr/lib64/libk5crypto.so.3.1
Size:                  8 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   8 kB
Pss:                   4 kB
Pss_Dirty:             4 kB
Shared_Clean:          0 kB
Shared_Dirty:          8 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             8 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffff8951f000-ffff89520000 rw-p 00020000 fd:00 268495                     /usr/lib64/libk5crypto.so.3.1
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffff89520000-ffff895f9000 r-xp 00000000 fd:00 268501                     /usr/lib64/libkrb5.so.3.3
Size:                868 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
ffff895f9000-ffff89613000 ---p 000d9000 fd:00 268501                     /usr/lib64/libkrb5.so.3.3
Size:                104 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: mr mw me 
ffff89613000-ffff89620000 r--p 000e3000 fd:00 268501                     /usr/lib64/libkrb5.so.3.3
Size:                 52 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  52 kB
Pss:                  26 kB
Pss_Dirty:            26 kB
Shared_Clean:          0 kB
Shared_Dirty:         52 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:            52 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffff89620000-ffff89621000 rw-p 000f0000 fd:00 268501                     /usr/lib64/libkrb5.so.3.3
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffff89621000-ffff89622000 rw-p 00000000 00:00 0 
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffff89622000-ffff89670000 r-xp 00000000 fd:00 268491                     /usr/lib64/libgssapi_krb5.so.2.2
Size:                312 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
ffff89670000-ffff89680000 ---p 0004e000 fd:00 268491                     /usr/lib64/libgssapi_krb5.so.2.2
Size:                 64 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: mr mw me 
ffff89680000-ffff89682000 r--p 0004e000 fd:00 268491                     /usr/lib64/libgssapi_krb5.so.2.2
Size:                  8 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   8 kB
Pss:                   4 kB
Pss_Dirty:             4 kB
Shared_Clean:          0 kB
Shared_Dirty:          8 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             8 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffff89682000-ffff89683000 rw-p 00050000 fd:00 268491                     /usr/lib64/libgssapi_krb5.so.2.2
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffff89683000-ffff897fc000 r-xp 00000000 fd:00 264457                     /usr/lib64/libc.so.6
Size:               1508 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                 820 kB
Pss:                  12 kB
Pss_Dirty:             0 kB
Shared_Clean:        820 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:          820 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
ffff897fc000-ffff89810000 ---p 00179000 fd:00 264457                     /usr/lib64/libc.so.6
Size:                 80 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: mr mw me 
ffff89810000-ffff89813000 r--p 0017d000 fd:00 264457                     /usr/lib64/libc.so.6
Size:                 12 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  12 kB
Pss:                   6 kB
Pss_Dirty:             6 kB
Shared_Clean:          0 kB
Shared_Dirty:         12 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            4 kB
Anonymous:            12 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffff89813000-ffff89816000 rw-p 00180000 fd:00 264457                     /usr/lib64/libc.so.6
Size:                 12 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  12 kB
Pss:                  12 kB
Pss_Dirty:            12 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        12 kB
Referenced:           12 kB
Anonymous:            12 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffff89816000-ffff8981d000 rw-p 00000000 00:00 0 
Size:                 28 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  16 kB
Pss:                  12 kB
Pss_Dirty:            12 kB
Shared_Clean:          0 kB
Shared_Dirty:          8 kB
Private_Clean:         0 kB
Private_Dirty:         8 kB
Referenced:           12 kB
Anonymous:            16 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffff8981d000-ffff8984a000 r-xp 00000000 fd:00 268681                     /usr/lib64/libtirpc.so.3.0.0
Size:                180 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  64 kB
Pss:                   4 kB
Pss_Dirty:             0 kB
Shared_Clean:         64 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:           64 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
ffff8984a000-ffff8985b000 ---p 0002d000 fd:00 268681                     /usr/lib64/libtirpc.so.3.0.0
Size:                 68 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: mr mw me 
ffff8985b000-ffff8985d000 r--p 0002e000 fd:00 268681                     /usr/lib64/libtirpc.so.3.0.0
Size:                  8 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   8 kB
Pss:                   4 kB
Pss_Dirty:             4 kB
Shared_Clean:          0 kB
Shared_Dirty:          8 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            4 kB
Anonymous:             8 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffff8985d000-ffff8985e000 rw-p 00030000 fd:00 268681                     /usr/lib64/libtirpc.so.3.0.0
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffff8985e000-ffff898e0000 r-xp 00000000 fd:00 264460                     /usr/lib64/libm.so.6
Size:                520 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  64 kB
Pss:                   2 kB
Pss_Dirty:             0 kB
Shared_Clean:         64 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:           64 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
ffff898e0000-ffff898fd000 ---p 00082000 fd:00 264460                     /usr/lib64/libm.so.6
Size:                116 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: mr mw me 
ffff898fd000-ffff898fe000 r--p 0008f000 fd:00 264460                     /usr/lib64/libm.so.6
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            4 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffff898fe000-ffff898ff000 rw-p 00090000 fd:00 264460                     /usr/lib64/libm.so.6
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffff898ff000-ffff89922000 r-xp 00000000 fd:00 264453                     /usr/lib/ld-linux-aarch64.so.1
Size:                140 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                 128 kB
Pss:                   2 kB
Pss_Dirty:             0 kB
Shared_Clean:        128 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:          128 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
ffff89923000-ffff89927000 rw-p 00000000 00:00 0 
Size:                 16 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  16 kB
Pss:                   8 kB
Pss_Dirty:             8 kB
Shared_Clean:          0 kB
Shared_Dirty:         16 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            8 kB
Anonymous:            16 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffff8993a000-ffff8993c000 r--p 00000000 00:00 0                          [vvar]
Size:                  8 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr pf de 
ffff8993c000-ffff8993d000 r-xp 00000000 00:00 0                          [vdso]
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          4 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            4 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me de 
ffff8993d000-ffff8993f000 r--p 0002e000 fd:00 264453                     /usr/lib/ld-linux-aarch64.so.1
Size:                  8 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   8 kB
Pss:                   4 kB
Pss_Dirty:             4 kB
Shared_Clean:          0 kB
Shared_Dirty:          8 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            4 kB
Anonymous:             8 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffff8993f000-ffff89941000 rw-p 00030000 fd:00 264453                     /usr/lib/ld-linux-aarch64.so.1
Size:                  8 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   8 kB
Pss:                   8 kB
Pss_Dirty:             8 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         8 kB
Referenced:            8 kB
Anonymous:             8 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
fffff569c000-fffff56bd000 rw-p 00000000 00:00 0                          [stack]
Size:                132 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  20 kB
Pss:                  14 kB
Pss_Dirty:            14 kB
Shared_Clean:          0 kB
Shared_Dirty:         12 kB
Private_Clean:         0 kB
Private_Dirty:         8 kB
Referenced:            8 kB
Anonymous:            20 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me gd ac
00400000-00418000 r-xp 00000000 fd:04 25219030                           /home/zz/lmbench-3.0-a9/bin/aarch64-linux-gnu/lat_mem_rd
Size:                 96 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  96 kB
Pss:                  48 kB
Pss_Dirty:             0 kB
Shared_Clean:         96 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:           96 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
0042f000-00430000 r--p 0001f000 fd:04 25219030                           /home/zz/lmbench-3.0-a9/bin/aarch64-linux-gnu/lat_mem_rd
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            4 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
00430000-00431000 rw-p 00020000 fd:04 25219030                           /home/zz/lmbench-3.0-a9/bin/aarch64-linux-gnu/lat_mem_rd
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   4 kB
Pss_Dirty:             4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
0e9ab000-0e9cc000 rw-p 00000000 00:00 0                                  [heap]
Size:                132 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   8 kB
Pss:                   8 kB
Pss_Dirty:             8 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         8 kB
Referenced:            8 kB
Anonymous:             8 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffff9845b000-ffffb855f000 rw-p 00000000 00:00 0 
Size:             525328 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                1128 kB
Pss:                1128 kB
Pss_Dirty:          1128 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:      1128 kB
Referenced:         1128 kB
Anonymous:          1128 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           1
VmFlags: rd wr mr mw me ac 
ffffb855f000-ffffb85df000 r-xp 00000000 fd:00 264838                     /usr/lib64/libpcre2-8.so.0.10.4
Size:                512 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
ffffb85df000-ffffb85fe000 ---p 00080000 fd:00 264838                     /usr/lib64/libpcre2-8.so.0.10.4
Size:                124 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: mr mw me 
ffffb85fe000-ffffb85ff000 r--p 0008f000 fd:00 264838                     /usr/lib64/libpcre2-8.so.0.10.4
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffffb85ff000-ffffb8600000 rw-p 00090000 fd:00 264838                     /usr/lib64/libpcre2-8.so.0.10.4
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffffb8600000-ffffb8875000 r-xp 00000000 fd:00 270258                     /usr/lib64/libcrypto.so.1.1.1m
Size:               2516 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           1
VmFlags: rd ex mr mw me 
ffffb8875000-ffffb8887000 ---p 00275000 fd:00 270258                     /usr/lib64/libcrypto.so.1.1.1m
Size:                 72 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: mr mw me 
ffffb8887000-ffffb88b0000 r--p 00277000 fd:00 270258                     /usr/lib64/libcrypto.so.1.1.1m
Size:                164 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                 164 kB
Pss:                  82 kB
Pss_Dirty:            82 kB
Shared_Clean:          0 kB
Shared_Dirty:        164 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:           164 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffffb88b0000-ffffb88b4000 rw-p 002a0000 fd:00 270258                     /usr/lib64/libcrypto.so.1.1.1m
Size:                 16 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  16 kB
Pss:                   8 kB
Pss_Dirty:             8 kB
Shared_Clean:          0 kB
Shared_Dirty:         16 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:            16 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffffb88b4000-ffffb88b8000 rw-p 00000000 00:00 0 
Size:                 16 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   8 kB
Pss:                   4 kB
Pss_Dirty:             4 kB
Shared_Clean:          0 kB
Shared_Dirty:          8 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             8 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffffb88c8000-ffffb88ca000 rw-p 00000000 00:00 0 
Size:                  8 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   8 kB
Pss:                   6 kB
Pss_Dirty:             6 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            8 kB
Anonymous:             8 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffffb88ca000-ffffb88e2000 r-xp 00000000 fd:00 264844                     /usr/lib64/libz.so.1.2.11
Size:                 96 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
ffffb88e2000-ffffb88f9000 ---p 00018000 fd:00 264844                     /usr/lib64/libz.so.1.2.11
Size:                 92 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: mr mw me 
ffffb88f9000-ffffb88fa000 r--p 0001f000 fd:00 264844                     /usr/lib64/libz.so.1.2.11
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffffb88fa000-ffffb88fb000 rw-p 00020000 fd:00 264844                     /usr/lib64/libz.so.1.2.11
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffffb88fb000-ffffb8923000 r-xp 00000000 fd:00 264087                     /usr/lib64/libselinux.so.1
Size:                160 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
ffffb8923000-ffffb893a000 ---p 00028000 fd:00 264087                     /usr/lib64/libselinux.so.1
Size:                 92 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: mr mw me 
ffffb893a000-ffffb893b000 r--p 0002f000 fd:00 264087                     /usr/lib64/libselinux.so.1
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffffb893b000-ffffb893c000 rw-p 00030000 fd:00 264087                     /usr/lib64/libselinux.so.1
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffffb893c000-ffffb8940000 rw-p 00000000 00:00 0 
Size:                 16 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  12 kB
Pss:                   6 kB
Pss_Dirty:             6 kB
Shared_Clean:          0 kB
Shared_Dirty:         12 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            4 kB
Anonymous:            12 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffffb8940000-ffffb894d000 r-xp 00000000 fd:00 264465                     /usr/lib64/libresolv.so.2
Size:                 52 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
ffffb894d000-ffffb895f000 ---p 0000d000 fd:00 264465                     /usr/lib64/libresolv.so.2
Size:                 72 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: mr mw me 
ffffb895f000-ffffb8960000 r--p 0000f000 fd:00 264465                     /usr/lib64/libresolv.so.2
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffffb8960000-ffffb8961000 rw-p 00010000 fd:00 264465                     /usr/lib64/libresolv.so.2
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffffb8961000-ffffb8963000 rw-p 00000000 00:00 0 
Size:                  8 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffffb8963000-ffffb8967000 r-xp 00000000 fd:00 265039                     /usr/lib64/libkeyutils.so.1.10
Size:                 16 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
ffffb8967000-ffffb8982000 ---p 00004000 fd:00 265039                     /usr/lib64/libkeyutils.so.1.10
Size:                108 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: mr mw me 
ffffb8982000-ffffb8983000 r--p 0000f000 fd:00 265039                     /usr/lib64/libkeyutils.so.1.10
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffffb8983000-ffffb8984000 rw-p 00010000 fd:00 265039                     /usr/lib64/libkeyutils.so.1.10
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffffb8984000-ffffb8993000 r-xp 00000000 fd:00 268503                     /usr/lib64/libkrb5support.so.0.1
Size:                 60 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
ffffb8993000-ffffb89a3000 ---p 0000f000 fd:00 268503                     /usr/lib64/libkrb5support.so.0.1
Size:                 64 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: mr mw me 
ffffb89a3000-ffffb89a4000 r--p 0000f000 fd:00 268503                     /usr/lib64/libkrb5support.so.0.1
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffffb89a4000-ffffb89a5000 rw-p 00010000 fd:00 268503                     /usr/lib64/libkrb5support.so.0.1
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffffb89a5000-ffffb89a8000 r-xp 00000000 fd:00 268636                     /usr/lib64/libcom_err.so.2.1
Size:                 12 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
ffffb89a8000-ffffb89c4000 ---p 00003000 fd:00 268636                     /usr/lib64/libcom_err.so.2.1
Size:                112 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: mr mw me 
ffffb89c4000-ffffb89c5000 r--p 0000f000 fd:00 268636                     /usr/lib64/libcom_err.so.2.1
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffffb89c5000-ffffb89c6000 rw-p 00010000 fd:00 268636                     /usr/lib64/libcom_err.so.2.1
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffffb89c6000-ffffb89db000 r-xp 00000000 fd:00 268495                     /usr/lib64/libk5crypto.so.3.1
Size:                 84 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
ffffb89db000-ffffb89f4000 ---p 00015000 fd:00 268495                     /usr/lib64/libk5crypto.so.3.1
Size:                100 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: mr mw me 
ffffb89f4000-ffffb89f6000 r--p 0001e000 fd:00 268495                     /usr/lib64/libk5crypto.so.3.1
Size:                  8 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   8 kB
Pss:                   4 kB
Pss_Dirty:             4 kB
Shared_Clean:          0 kB
Shared_Dirty:          8 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             8 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffffb89f6000-ffffb89f7000 rw-p 00020000 fd:00 268495                     /usr/lib64/libk5crypto.so.3.1
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffffb89f7000-ffffb8ad0000 r-xp 00000000 fd:00 268501                     /usr/lib64/libkrb5.so.3.3
Size:                868 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
ffffb8ad0000-ffffb8aea000 ---p 000d9000 fd:00 268501                     /usr/lib64/libkrb5.so.3.3
Size:                104 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: mr mw me 
ffffb8aea000-ffffb8af7000 r--p 000e3000 fd:00 268501                     /usr/lib64/libkrb5.so.3.3
Size:                 52 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  52 kB
Pss:                  26 kB
Pss_Dirty:            26 kB
Shared_Clean:          0 kB
Shared_Dirty:         52 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:            52 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffffb8af7000-ffffb8af8000 rw-p 000f0000 fd:00 268501                     /usr/lib64/libkrb5.so.3.3
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffffb8af8000-ffffb8af9000 rw-p 00000000 00:00 0 
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffffb8af9000-ffffb8b47000 r-xp 00000000 fd:00 268491                     /usr/lib64/libgssapi_krb5.so.2.2
Size:                312 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
ffffb8b47000-ffffb8b57000 ---p 0004e000 fd:00 268491                     /usr/lib64/libgssapi_krb5.so.2.2
Size:                 64 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: mr mw me 
ffffb8b57000-ffffb8b59000 r--p 0004e000 fd:00 268491                     /usr/lib64/libgssapi_krb5.so.2.2
Size:                  8 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   8 kB
Pss:                   4 kB
Pss_Dirty:             4 kB
Shared_Clean:          0 kB
Shared_Dirty:          8 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             8 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffffb8b59000-ffffb8b5a000 rw-p 00050000 fd:00 268491                     /usr/lib64/libgssapi_krb5.so.2.2
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffffb8b5a000-ffffb8cd3000 r-xp 00000000 fd:00 264457                     /usr/lib64/libc.so.6
Size:               1508 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                 636 kB
Pss:                  18 kB
Pss_Dirty:             0 kB
Shared_Clean:        636 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:          636 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
ffffb8cd3000-ffffb8ce7000 ---p 00179000 fd:00 264457                     /usr/lib64/libc.so.6
Size:                 80 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: mr mw me 
ffffb8ce7000-ffffb8cea000 r--p 0017d000 fd:00 264457                     /usr/lib64/libc.so.6
Size:                 12 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  12 kB
Pss:                   6 kB
Pss_Dirty:             6 kB
Shared_Clean:          0 kB
Shared_Dirty:         12 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            4 kB
Anonymous:            12 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffffb8cea000-ffffb8ced000 rw-p 00180000 fd:00 264457                     /usr/lib64/libc.so.6
Size:                 12 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  12 kB
Pss:                  12 kB
Pss_Dirty:            12 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        12 kB
Referenced:           12 kB
Anonymous:            12 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffffb8ced000-ffffb8cf4000 rw-p 00000000 00:00 0 
Size:                 28 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  16 kB
Pss:                  12 kB
Pss_Dirty:            12 kB
Shared_Clean:          0 kB
Shared_Dirty:          8 kB
Private_Clean:         0 kB
Private_Dirty:         8 kB
Referenced:           12 kB
Anonymous:            16 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffffb8cf4000-ffffb8d21000 r-xp 00000000 fd:00 268681                     /usr/lib64/libtirpc.so.3.0.0
Size:                180 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  64 kB
Pss:                  10 kB
Pss_Dirty:             0 kB
Shared_Clean:         64 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:           64 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
ffffb8d21000-ffffb8d32000 ---p 0002d000 fd:00 268681                     /usr/lib64/libtirpc.so.3.0.0
Size:                 68 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: mr mw me 
ffffb8d32000-ffffb8d34000 r--p 0002e000 fd:00 268681                     /usr/lib64/libtirpc.so.3.0.0
Size:                  8 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   8 kB
Pss:                   4 kB
Pss_Dirty:             4 kB
Shared_Clean:          0 kB
Shared_Dirty:          8 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            4 kB
Anonymous:             8 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffffb8d34000-ffffb8d35000 rw-p 00030000 fd:00 268681                     /usr/lib64/libtirpc.so.3.0.0
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffffb8d35000-ffffb8db7000 r-xp 00000000 fd:00 264460                     /usr/lib64/libm.so.6
Size:                520 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  60 kB
Pss:                   2 kB
Pss_Dirty:             0 kB
Shared_Clean:         60 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:           60 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
ffffb8db7000-ffffb8dd4000 ---p 00082000 fd:00 264460                     /usr/lib64/libm.so.6
Size:                116 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: mr mw me 
ffffb8dd4000-ffffb8dd5000 r--p 0008f000 fd:00 264460                     /usr/lib64/libm.so.6
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            4 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffffb8dd5000-ffffb8dd6000 rw-p 00090000 fd:00 264460                     /usr/lib64/libm.so.6
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   2 kB
Pss_Dirty:             2 kB
Shared_Clean:          0 kB
Shared_Dirty:          4 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             4 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffffb8dd6000-ffffb8df9000 r-xp 00000000 fd:00 264453                     /usr/lib/ld-linux-aarch64.so.1
Size:                140 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                 140 kB
Pss:                   4 kB
Pss_Dirty:             0 kB
Shared_Clean:        140 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:          140 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me 
ffffb8dfa000-ffffb8dfe000 rw-p 00000000 00:00 0 
Size:                 16 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  16 kB
Pss:                   8 kB
Pss_Dirty:             8 kB
Shared_Clean:          0 kB
Shared_Dirty:         16 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            8 kB
Anonymous:            16 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffffb8e11000-ffffb8e13000 r--p 00000000 00:00 0                          [vvar]
Size:                  8 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   0 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr pf de 
ffffb8e13000-ffffb8e14000 r-xp 00000000 00:00 0                          [vdso]
Size:                  4 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   4 kB
Pss:                   0 kB
Pss_Dirty:             0 kB
Shared_Clean:          4 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            4 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd ex mr mw me de 
ffffb8e14000-ffffb8e16000 r--p 0002e000 fd:00 264453                     /usr/lib/ld-linux-aarch64.so.1
Size:                  8 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   8 kB
Pss:                   4 kB
Pss_Dirty:             4 kB
Shared_Clean:          0 kB
Shared_Dirty:          8 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            4 kB
Anonymous:             8 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd mr mw me ac 
ffffb8e16000-ffffb8e18000 rw-p 00030000 fd:00 264453                     /usr/lib/ld-linux-aarch64.so.1
Size:                  8 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                   8 kB
Pss:                   8 kB
Pss_Dirty:             8 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         8 kB
Referenced:            8 kB
Anonymous:             8 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac 
ffffca5bb000-ffffca5dc000 rw-p 00000000 00:00 0                          [stack]
Size:                132 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  20 kB
Pss:                  12 kB
Pss_Dirty:            12 kB
Shared_Clean:          0 kB
Shared_Dirty:         16 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            8 kB
Anonymous:            20 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me gd ac
00400000-ffffca5dc000 ---p 00000000 00:00 0                              [rollup]
Rss:                2600 kB
Pss:                1472 kB
Pss_Dirty:          1388 kB
Pss_Anon:           1388 kB
Pss_File:             84 kB
Pss_Shmem:             0 kB
Shared_Clean:       1000 kB
Shared_Dirty:        424 kB
Private_Clean:         0 kB
Private_Dirty:      1176 kB
Referenced:         2220 kB
Anonymous:          1600 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
Ryan Roberts May 8, 2024, 8:36 a.m. UTC | #25
On 08/05/2024 08:48, Kefeng Wang wrote:
> 
> 
> On 2024/5/8 1:17, Yang Shi wrote:
>> On Tue, May 7, 2024 at 8:53 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>
>>> On 07/05/2024 14:53, Kefeng Wang wrote:
>>>>
>>>>
>>>> On 2024/5/7 19:13, David Hildenbrand wrote:
>>>>>
>>>>>> https://github.com/intel/lmbench/blob/master/src/lat_mem_rd.c#L95
>>>>>>
>>>>>>> suggest. If you want to try something semi-randomly; it might be useful
>>>>>>> to rule
>>>>>>> out the arm64 contpte feature. I don't see how that would be interacting
>>>>>>> here if
>>>>>>> mTHP is disabled (is it?). But its new for 6.9 and arm64 only. Disable with
>>>>>>> ARM64_CONTPTE (needs EXPERT) at compile time.
>>>>>> I don't enabled mTHP, so it should be not related about ARM64_CONTPTE,
>>>>>> but will have a try.
>>>>
>>>> After ARM64_CONTPTE disabled, memory read latency is similar with ARM64_CONTPTE
>>>> enabled(default 6.9-rc7), still larger than align anon reverted.
>>>
>>> OK thanks for trying.
>>>
>>> Looking at the source for lmbench, its malloc'ing (512M + 8K) up front and using
>>> that for all sizes. That will presumably be considered "large" by malloc and
>>> will be allocated using mmap. So with the patch, it will be 2M aligned. Without
>>> it, it probably won't. I'm still struggling to understand why not aligning it in
>>> virtual space would make it more performant though...
>>
>> Yeah, I'm confused too.
> Me too. I got a smaps[_rollup] for the 0.09375M size; the biggest difference
> for the anon regions is shown below, and everything is attached.

OK, a bit more insight; during initialization, the test makes 2 big malloc
calls; the first is 1M and the second is 512M+8K. I think those 2 are the 2 vmas
below (malloc is adding an extra page to the allocation, presumably for
management structures).

With efa7df3e3bb5 applied, the 1M allocation is allocated at a non-THP-aligned
address. All of its pages are populated (see permutation() which allocates and
writes it) but none of them are THP (obviously - it's only 1M and THP is only
enabled for 2M). But the 512M region is allocated at a THP-aligned address. And
the first page is populated with a THP (presumably faulted when malloc writes to
its control structure page before the application even sees the allocated buffer).

In contrast, when efa7df3e3bb5 is reverted, neither of the vmas are THP-aligned,
and therefore the 512M region abuts the 1M region and the vmas are merged in
the kernel. So we end up with the single 525328 kB region. There are no THPs
allocated here (due to alignment constraints) so we end up with the 1M region
fully populated with 4K pages as before, and only the malloc control page plus
the parts of the buffer that the application actually touches being populated in
the 512M region.
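
For illustration, here is a minimal user-space sketch of the "pad by one PMD and
round the start up" placement idea. It is only meant to make the alignment effect
above concrete - it is not the kernel's actual implementation - and the 2M PMD
size is an assumption for this configuration.

#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

#define PMD_SIZE (2UL << 20)	/* assumed 2M PMD size */

/* Map len bytes so that the returned start sits on a 2M boundary. */
static void *map_pmd_aligned(size_t len)
{
	/* Over-allocate by one PMD so a 2M-aligned start always fits inside. */
	void *raw = mmap(NULL, len + PMD_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return NULL;
	return (void *)(((uintptr_t)raw + PMD_SIZE - 1) & ~(PMD_SIZE - 1));
}

int main(void)
{
	void *p = map_pmd_aligned((512UL << 20) + 8192);

	printf("start %p, offset within 2M block: %#lx\n",
	       p, (uintptr_t)p & (PMD_SIZE - 1));
	return 0;
}

With a start like that, the first anonymous fault in the region can be served by a
PMD-sized THP, which is consistent with the 2048 kB AnonHugePages entry in the
aligned smaps; with an unaligned start, the page malloc touches first cannot be
backed by a THP, matching AnonHugePages: 0 in the unaligned case.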

As far as I can tell, the application never touches the 1M region during the
test so it should be cache-cold. It only touches the first part of the 512M
buffer it needs for the size of the test (96K here?). The latency of allocating
the THP will have been consumed during test setup so I doubt we are seeing that
in the test results and I don't see why having a single TLB entry vs 96K/4K=24
entries would make it slower.

It would be interesting to know the address that gets returned from malloc for
the 512M region if that's possible to get (in both cases)? I guess it is offset
into the first page. Perhaps it is offset such that with the THP alignment case
the 96K of interest ends up straddling 3 cache lines (cache line is 64K I
assume?), but for the unaligned case, it ends up nicely packed in 2?
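
If it is easier, something along these lines would print what I'm after. This is
purely a hypothetical diagnostic, not part of lmbench; the sizes just mirror the
two setup allocations described above.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	void *perm = malloc(1UL << 20);			/* 1M setup buffer */
	void *buf  = malloc((512UL << 20) + 8192);	/* 512M + 8K test buffer */

	/* Print each pointer and its offset within a 2M block. */
	printf("1M buffer:   %p (2M offset %#lx)\n",
	       perm, (uintptr_t)perm & ((2UL << 20) - 1));
	printf("512M buffer: %p (2M offset %#lx)\n",
	       buf, (uintptr_t)buf & ((2UL << 20) - 1));

	free(buf);
	free(perm);
	return 0;
}

Running that on a kernel with and without efa7df3e3bb5 should show whether the
512M buffer really starts just past a 2M boundary in one case and at an arbitrary
offset in the other.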

Thanks,
Ryan

> 
> 1) with efa7df3e3bb5 smaps
> 
> ffff68e00000-ffff88e03000 rw-p 00000000 00:00 0
> Size:             524300 kB
> KernelPageSize:        4 kB
> MMUPageSize:           4 kB
> Rss:                2048 kB
> Pss:                2048 kB
> Pss_Dirty:          2048 kB
> Shared_Clean:          0 kB
> Shared_Dirty:          0 kB
> Private_Clean:         0 kB
> Private_Dirty:      2048 kB
> Referenced:         2048 kB
> Anonymous:          2048 kB // we have 1 anon thp
> KSM:                   0 kB
> LazyFree:              0 kB
> AnonHugePages:      2048 kB

Yes one 2M THP shown here.

> ShmemPmdMapped:        0 kB
> FilePmdMapped:         0 kB
> Shared_Hugetlb:        0 kB
> Private_Hugetlb:       0 kB
> Swap:                  0 kB
> SwapPss:               0 kB
> Locked:                0 kB
> THPeligible:           1
> VmFlags: rd wr mr mw me ac
> ffff88eff000-ffff89000000 rw-p 00000000 00:00 0
> Size:               1028 kB
> KernelPageSize:        4 kB
> MMUPageSize:           4 kB
> Rss:                1028 kB
> Pss:                1028 kB
> Pss_Dirty:          1028 kB
> Shared_Clean:          0 kB
> Shared_Dirty:          0 kB
> Private_Clean:         0 kB
> Private_Dirty:      1028 kB
> Referenced:         1028 kB
> Anonymous:          1028 kB // another large anon

This is not THP, since you only have 2M THP enabled. This will be 1M of 4K page
allocations + 1 4K page malloc control structure, allocated and accessed by
permutation() during test setup.
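
Just to illustrate what I mean by "allocated and accessed": one write per 4K page
during setup is enough to populate the whole range. A trivial sketch, not
lmbench's actual permutation() code:

#include <stdlib.h>

int main(void)
{
	size_t len = 1UL << 20;		/* the 1M setup buffer */
	char *buf = malloc(len);

	if (!buf)
		return 1;
	/* Touching each 4K page once faults the whole buffer in, which is
	 * why Rss equals Size for that vma even though the timed part of
	 * the test never reads it again. */
	for (size_t off = 0; off < len; off += 4096)
		buf[off] = 1;

	free(buf);
	return 0;
}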

> KSM:                   0 kB
> LazyFree:              0 kB
> AnonHugePages:         0 kB
> ShmemPmdMapped:        0 kB
> FilePmdMapped:         0 kB
> Shared_Hugetlb:        0 kB
> Private_Hugetlb:       0 kB
> Swap:                  0 kB
> SwapPss:               0 kB
> Locked:                0 kB
> THPeligible:           0
> VmFlags: rd wr mr mw me ac
> 
> and the smap_rollup
> 
> 00400000-fffff56bd000 ---p 00000000 00:00 0 [rollup]
> Rss:                4724 kB
> Pss:                3408 kB
> Pss_Dirty:          3338 kB
> Pss_Anon:           3338 kB
> Pss_File:             70 kB
> Pss_Shmem:             0 kB
> Shared_Clean:       1176 kB
> Shared_Dirty:        420 kB
> Private_Clean:         0 kB
> Private_Dirty:      3128 kB
> Referenced:         4344 kB
> Anonymous:          3548 kB
> KSM:                   0 kB
> LazyFree:              0 kB
> AnonHugePages:      2048 kB
> ShmemPmdMapped:        0 kB
> FilePmdMapped:         0 kB
> Shared_Hugetlb:        0 kB
> Private_Hugetlb:       0 kB
> Swap:                  0 kB
> SwapPss:               0 kB
> Locked:                0 kB
> 
> 2) without efa7df3e3bb5 smaps
> 
> ffff9845b000-ffffb855f000 rw-p 00000000 00:00 0
> Size:             525328 kB

This is a merged-vma version of the above 2 regions.

> KernelPageSize:        4 kB
> MMUPageSize:           4 kB
> Rss:                1128 kB
> Pss:                1128 kB
> Pss_Dirty:          1128 kB
> Shared_Clean:          0 kB
> Shared_Dirty:          0 kB
> Private_Clean:         0 kB
> Private_Dirty:      1128 kB
> Referenced:         1128 kB
> Anonymous:          1128 kB // only large anon
> KSM:                   0 kB
> LazyFree:              0 kB
> AnonHugePages:         0 kB
> ShmemPmdMapped:        0 kB
> FilePmdMapped:         0 kB
> Shared_Hugetlb:        0 kB
> Private_Hugetlb:       0 kB
> Swap:                  0 kB
> SwapPss:               0 kB
> Locked:                0 kB
> THPeligible:           1
> VmFlags: rd wr mr mw me ac
> 
> and the smap_rollup,
> 
> 00400000-ffffca5dc000 ---p 00000000 00:00 0 [rollup]
> Rss:                2600 kB
> Pss:                1472 kB
> Pss_Dirty:          1388 kB
> Pss_Anon:           1388 kB
> Pss_File:             84 kB
> Pss_Shmem:             0 kB
> Shared_Clean:       1000 kB
> Shared_Dirty:        424 kB
> Private_Clean:         0 kB
> Private_Dirty:      1176 kB
> Referenced:         2220 kB
> Anonymous:          1600 kB
> KSM:                   0 kB
> LazyFree:              0 kB
> AnonHugePages:         0 kB
> ShmemPmdMapped:        0 kB
> FilePmdMapped:         0 kB
> Shared_Hugetlb:        0 kB
> Private_Hugetlb:       0 kB
> Swap:                  0 kB
> SwapPss:               0 kB
> Locked:                0 kB
> 
>>
>> I just ran the same command on 6.6.13 (w/o the thp alignment patch and
>> mTHP stuff) and 6.9-rc4 (w/ the thp alignment patch and all mTHP
>> stuff) on my arm64 machine, but I didn't see such a pattern.
>>
>> The result has a little bit fluctuation, for example, 6.6.13 has
>> better result with 4M/6M/8M, but 6.9-rc4 has better result for
>> 12M/16M/32M/48M/64M, and the difference may be quite noticeable. But
>> anyway I didn't see such a regression pattern.
> 
> It is not just fluctuation, though; on our arm64 it is very noticeable.
> 
>>
>> The benchmark is supposed to measure cache and memory latency, its
>> result strongly relies on the cache and memory subsystem, for example,
>> hw prefetcher, etc.
> 
> Yes, I will try another type of arm64 machine if possible; none is available right now.
> 
> 
>>
>>>
>>> Is it possible to provide the smaps output for at least that 512M+8K block for
>>> both cases? It might give a bit of a clue.
> 
> Will collect more smaps.
> 
>>>
>>> Do you have traditional (PMD-sized) THP enabled? If its enabled and unaligned
>>> then the front of the buffer wouldn't be mapped with THP, but if it is aligned,
>>> it will. That could affect it.
> 
> Yes, PMD-sized THP is enabled. At least in the smaps above, without efa7df3e3bb5 the
> anon memory is not mapped with THP.
Kefeng Wang May 8, 2024, 1:37 p.m. UTC | #26
On 2024/5/8 16:36, Ryan Roberts wrote:
> On 08/05/2024 08:48, Kefeng Wang wrote:
>>
>>
>> On 2024/5/8 1:17, Yang Shi wrote:
>>> On Tue, May 7, 2024 at 8:53 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>
>>>> On 07/05/2024 14:53, Kefeng Wang wrote:
>>>>>
>>>>>
>>>>> On 2024/5/7 19:13, David Hildenbrand wrote:
>>>>>>
>>>>>>> https://github.com/intel/lmbench/blob/master/src/lat_mem_rd.c#L95
>>>>>>>
>>>>>>>> suggest. If you want to try something semi-randomly; it might be useful
>>>>>>>> to rule
>>>>>>>> out the arm64 contpte feature. I don't see how that would be interacting
>>>>>>>> here if
>>>>>>>> mTHP is disabled (is it?). But its new for 6.9 and arm64 only. Disable with
>>>>>>>> ARM64_CONTPTE (needs EXPERT) at compile time.
>>>>>>> I don't enabled mTHP, so it should be not related about ARM64_CONTPTE,
>>>>>>> but will have a try.
>>>>>
>>>>> After ARM64_CONTPTE disabled, memory read latency is similar with ARM64_CONTPTE
>>>>> enabled(default 6.9-rc7), still larger than align anon reverted.
>>>>
>>>> OK thanks for trying.
>>>>
>>>> Looking at the source for lmbench, its malloc'ing (512M + 8K) up front and using
>>>> that for all sizes. That will presumably be considered "large" by malloc and
>>>> will be allocated using mmap. So with the patch, it will be 2M aligned. Without
>>>> it, it probably won't. I'm still struggling to understand why not aligning it in
>>>> virtual space would make it more performant though...
>>>
>>> Yeah, I'm confused too.
>> Me too. I got a smaps[_rollup] for the 0.09375M size; the biggest difference
>> for the anon regions is shown below, and everything is attached.
> 
> OK, a bit more insight; during initialization, the test makes 2 big malloc
> calls; the first is 1M and the second is 512M+8K. I think those 2 are the 2 vmas
> below (malloc is adding an extra page to the allocation, presumably for
> management structures).
> 
> With efa7df3e3bb5 applied, the 1M allocation is allocated at a non-THP-aligned
> address. All of its pages are populated (see permutation() which allocates and
> writes it) but none of them are THP (obviously - it's only 1M and THP is only
> enabled for 2M). But the 512M region is allocated at a THP-aligned address. And
> the first page is populated with a THP (presumably faulted when malloc writes to
> its control structure page before the application even sees the allocated buffer).
> 
> In contrast, when efa7df3e3bb5 is reverted, neither of the vmas are THP-aligned,
> and therefore the 512M region abuts the 1M region and the vmas are merged in
> the kernel. So we end up with the single 525328 kB region. There are no THPs
> allocated here (due to alignment constraints) so we end up with the 1M region
> fully populated with 4K pages as before, and only the malloc control page plus
> the parts of the buffer that the application actually touches being populated in
> the 512M region.
> 
> As far as I can tell, the application never touches the 1M region during the
> test so it should be cache-cold. It only touches the first part of the 512M
> buffer it needs for the size of the test (96K here?). The latency of allocating
> the THP will have been consumed during test setup so I doubt we are seeing that
> in the test results and I don't see why having a single TLB entry vs 96K/4K=24
> entries would make it slower.

It is strange, and even stranger, I tried another machine (the old machine
has 128 cores and the new one 96, but with the same L1/L2 cache size
per core). The new machine does not show this issue, so I will contact our
hardware team; maybe there is some difference in configuration (prefetch or
some other similar hardware setting). Thanks for all the suggestions
and analysis!


> 
> It would be interesting to know the address that gets returned from malloc for
> the 512M region if that's possible to get (in both cases)? I guess it is offset
> into the first page. Perhaps it is offset such that with the THP alignment case
> the 96K of interest ends up straddling 3 cache lines (cache line is 64K I
> assume?), but for the unaligned case, it ends up nicely packed in 2?

CC zuoze, please help to check this.

Thanks again.
> 
> Thanks,
> Ryan
> 
>>
>> 1) with efa7df3e3bb5 smaps
>>
>> ffff68e00000-ffff88e03000 rw-p 00000000 00:00 0
>> Size:             524300 kB
>> KernelPageSize:        4 kB
>> MMUPageSize:           4 kB
>> Rss:                2048 kB
>> Pss:                2048 kB
>> Pss_Dirty:          2048 kB
>> Shared_Clean:          0 kB
>> Shared_Dirty:          0 kB
>> Private_Clean:         0 kB
>> Private_Dirty:      2048 kB
>> Referenced:         2048 kB
>> Anonymous:          2048 kB // we have 1 anon thp
>> KSM:                   0 kB
>> LazyFree:              0 kB
>> AnonHugePages:      2048 kB
> 
> Yes one 2M THP shown here.
> 
>> ShmemPmdMapped:        0 kB
>> FilePmdMapped:         0 kB
>> Shared_Hugetlb:        0 kB
>> Private_Hugetlb:       0 kB
>> Swap:                  0 kB
>> SwapPss:               0 kB
>> Locked:                0 kB
>> THPeligible:           1
>> VmFlags: rd wr mr mw me ac
>> ffff88eff000-ffff89000000 rw-p 00000000 00:00 0
>> Size:               1028 kB
>> KernelPageSize:        4 kB
>> MMUPageSize:           4 kB
>> Rss:                1028 kB
>> Pss:                1028 kB
>> Pss_Dirty:          1028 kB
>> Shared_Clean:          0 kB
>> Shared_Dirty:          0 kB
>> Private_Clean:         0 kB
>> Private_Dirty:      1028 kB
>> Referenced:         1028 kB
>> Anonymous:          1028 kB // another large anon
> 
> This is not THP, since you only have 2M THP enabled. This will be 1M of 4K page
> allocations + 1 4K page malloc control structure, allocated and accessed by
> permutation() during test setup.
> 
>> KSM:                   0 kB
>> LazyFree:              0 kB
>> AnonHugePages:         0 kB
>> ShmemPmdMapped:        0 kB
>> FilePmdMapped:         0 kB
>> Shared_Hugetlb:        0 kB
>> Private_Hugetlb:       0 kB
>> Swap:                  0 kB
>> SwapPss:               0 kB
>> Locked:                0 kB
>> THPeligible:           0
>> VmFlags: rd wr mr mw me ac
>>
>> and the smap_rollup
>>
>> 00400000-fffff56bd000 ---p 00000000 00:00 0 [rollup]
>> Rss:                4724 kB
>> Pss:                3408 kB
>> Pss_Dirty:          3338 kB
>> Pss_Anon:           3338 kB
>> Pss_File:             70 kB
>> Pss_Shmem:             0 kB
>> Shared_Clean:       1176 kB
>> Shared_Dirty:        420 kB
>> Private_Clean:         0 kB
>> Private_Dirty:      3128 kB
>> Referenced:         4344 kB
>> Anonymous:          3548 kB
>> KSM:                   0 kB
>> LazyFree:              0 kB
>> AnonHugePages:      2048 kB
>> ShmemPmdMapped:        0 kB
>> FilePmdMapped:         0 kB
>> Shared_Hugetlb:        0 kB
>> Private_Hugetlb:       0 kB
>> Swap:                  0 kB
>> SwapPss:               0 kB
>> Locked:                0 kB
>>
>> 2) without efa7df3e3bb5 smaps
>>
>> ffff9845b000-ffffb855f000 rw-p 00000000 00:00 0
>> Size:             525328 kB
> 
> This is a merged-vma version of the above 2 regions.
> 
>> KernelPageSize:        4 kB
>> MMUPageSize:           4 kB
>> Rss:                1128 kB
>> Pss:                1128 kB
>> Pss_Dirty:          1128 kB
>> Shared_Clean:          0 kB
>> Shared_Dirty:          0 kB
>> Private_Clean:         0 kB
>> Private_Dirty:      1128 kB
>> Referenced:         1128 kB
>> Anonymous:          1128 kB // only large anon
>> KSM:                   0 kB
>> LazyFree:              0 kB
>> AnonHugePages:         0 kB
>> ShmemPmdMapped:        0 kB
>> FilePmdMapped:         0 kB
>> Shared_Hugetlb:        0 kB
>> Private_Hugetlb:       0 kB
>> Swap:                  0 kB
>> SwapPss:               0 kB
>> Locked:                0 kB
>> THPeligible:           1
>> VmFlags: rd wr mr mw me ac
>>
>> and the smap_rollup,
>>
>> 00400000-ffffca5dc000 ---p 00000000 00:00 0 [rollup]
>> Rss:                2600 kB
>> Pss:                1472 kB
>> Pss_Dirty:          1388 kB
>> Pss_Anon:           1388 kB
>> Pss_File:             84 kB
>> Pss_Shmem:             0 kB
>> Shared_Clean:       1000 kB
>> Shared_Dirty:        424 kB
>> Private_Clean:         0 kB
>> Private_Dirty:      1176 kB
>> Referenced:         2220 kB
>> Anonymous:          1600 kB
>> KSM:                   0 kB
>> LazyFree:              0 kB
>> AnonHugePages:         0 kB
>> ShmemPmdMapped:        0 kB
>> FilePmdMapped:         0 kB
>> Shared_Hugetlb:        0 kB
>> Private_Hugetlb:       0 kB
>> Swap:                  0 kB
>> SwapPss:               0 kB
>> Locked:                0 kB
>>
Ryan Roberts May 8, 2024, 1:41 p.m. UTC | #27
On 08/05/2024 14:37, Kefeng Wang wrote:
> 
> 
> On 2024/5/8 16:36, Ryan Roberts wrote:
>> On 08/05/2024 08:48, Kefeng Wang wrote:
>>>
>>>
>>> On 2024/5/8 1:17, Yang Shi wrote:
>>>> On Tue, May 7, 2024 at 8:53 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>>
>>>>> On 07/05/2024 14:53, Kefeng Wang wrote:
>>>>>>
>>>>>>
>>>>>> On 2024/5/7 19:13, David Hildenbrand wrote:
>>>>>>>
>>>>>>>> https://github.com/intel/lmbench/blob/master/src/lat_mem_rd.c#L95
>>>>>>>>
>>>>>>>>> suggest. If you want to try something semi-randomly; it might be useful
>>>>>>>>> to rule
>>>>>>>>> out the arm64 contpte feature. I don't see how that would be interacting
>>>>>>>>> here if
>>>>>>>>> mTHP is disabled (is it?). But its new for 6.9 and arm64 only. Disable
>>>>>>>>> with
>>>>>>>>> ARM64_CONTPTE (needs EXPERT) at compile time.
>>>>>>>> I don't enabled mTHP, so it should be not related about ARM64_CONTPTE,
>>>>>>>> but will have a try.
>>>>>>
>>>>>> After ARM64_CONTPTE disabled, memory read latency is similar with
>>>>>> ARM64_CONTPTE
>>>>>> enabled(default 6.9-rc7), still larger than align anon reverted.
>>>>>
>>>>> OK thanks for trying.
>>>>>
>>>>> Looking at the source for lmbench, its malloc'ing (512M + 8K) up front and
>>>>> using
>>>>> that for all sizes. That will presumably be considered "large" by malloc and
>>>>> will be allocated using mmap. So with the patch, it will be 2M aligned.
>>>>> Without
>>>>> it, it probably won't. I'm still struggling to understand why not aligning
>>>>> it in
>>>>> virtual space would make it more performant though...
>>>>
>>>> Yeah, I'm confused too.
>>> Me too. I got a smaps[_rollup] for the 0.09375M size; the biggest difference
>>> for the anon regions is shown below, and everything is attached.
>>
>> OK, a bit more insight; during initialization, the test makes 2 big malloc
>> calls; the first is 1M and the second is 512M+8K. I think those 2 are the 2 vmas
>> below (malloc is adding an extra page to the allocation, presumably for
>> management structures).
>>
>> With efa7df3e3bb5 applied, the 1M allocation is allocated at a non-THP-aligned
>> address. All of its pages are populated (see permutation() which allocates and
>> writes it) but none of them are THP (obviously - it's only 1M and THP is only
>> enabled for 2M). But the 512M region is allocated at a THP-aligned address. And
>> the first page is populated with a THP (presumably faulted when malloc writes to
>> its control structure page before the application even sees the allocated buffer).
>>
>> In contrast, when efa7df3e3bb5 is reverted, neither of the vmas are THP-aligned,
>> and therefore the 512M region abuts the 1M region and the vmas are merged in
>> the kernel. So we end up with the single 525328 kB region. There are no THPs
>> allocated here (due to alignment constraints) so we end up with the 1M region
>> fully populated with 4K pages as before, and only the malloc control page plus
>> the parts of the buffer that the application actually touches being populated in
>> the 512M region.
>>
>> As far as I can tell, the application never touches the 1M region during the
>> test so it should be cache-cold. It only touches the first part of the 512M
>> buffer it needs for the size of the test (96K here?). The latency of allocating
>> the THP will have been consumed during test setup so I doubt we are seeing that
>> in the test results and I don't see why having a single TLB entry vs 96K/4K=24
>> entries would make it slower.
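As an aside, the per-VMA numbers being compared in this thread come straight
from /proc/<pid>/smaps. A minimal sketch of collecting them from inside the
process under test (the helper name and the choice of fields are illustrative,
not part of lmbench):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Print the header, Rss and AnonHugePages lines of the smaps entry for
 * the VMA that contains `ptr` in the current process.
 */
static void dump_vma_smaps(const void *ptr)
{
	unsigned long start, end, target = (unsigned long)ptr;
	char line[256];
	int in_vma = 0;
	FILE *f = fopen("/proc/self/smaps", "r");

	if (!f)
		return;
	while (fgets(line, sizeof(line), f)) {
		/* VMA headers look like "ffff68e00000-ffff88e03000 rw-p ..." */
		if (sscanf(line, "%lx-%lx ", &start, &end) == 2) {
			in_vma = target >= start && target < end;
			if (in_vma)
				fputs(line, stdout);
		} else if (in_vma && (!strncmp(line, "Rss:", 4) ||
				      !strncmp(line, "AnonHugePages:", 14))) {
			fputs(line, stdout);
		}
	}
	fclose(f);
}

int main(void)
{
	/* Roughly what the lmbench setup does: one large buffer. */
	char *buf = malloc(512UL * 1024 * 1024 + 8192);

	buf[0] = 1;	/* fault in the first page; may become a THP */
	dump_vma_smaps(buf);
	free(buf);
	return 0;
}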
> 
> It is strange, and even stranger: I tested another machine (the old machine
> has 128 cores and the new one 96 cores, but with the same per-core L1/L2
> cache sizes), and the new machine does not show this issue. We will contact
> our hardware team; perhaps some hardware configuration differs (prefetch or
> something similar). Thanks for all the suggestions and analysis!

No problem, you're welcome!

> 
> 
>>
>> It would be interesting to know the address that gets returned from malloc for
>> the 512M region if that's possible to get (in both cases)? I guess it is offset
>> into the first page. Perhaps it is offset such that with the THP alignment case
>> the 96K of interest ends up straddling 3 cache lines (cache line is 64K I
>> assume?), but for the unaligned case, it ends up nicely packed in 2?
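A tiny sketch for getting those addresses (the 2M PMD size and 64-byte cache
line below are assumptions; adjust for the actual hardware): print what malloc
returns for the two setup allocations and their offsets within each boundary.

#include <stdio.h>
#include <stdlib.h>

#define PMD_SIZE	(2UL * 1024 * 1024)
#define CACHELINE	64UL

static void report(const char *name, void *p)
{
	unsigned long a = (unsigned long)p;

	printf("%-12s %p  offset in 2M: %#lx  offset in cacheline: %lu\n",
	       name, p, a & (PMD_SIZE - 1), a & (CACHELINE - 1));
}

int main(void)
{
	/* Same order as the lmbench setup: 1M first, then 512M + 8K. */
	void *small = malloc(1024 * 1024);
	void *big   = malloc(512UL * 1024 * 1024 + 8192);

	report("1M buffer:", small);
	report("512M buffer:", big);
	return 0;
}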
> 
> CC zuoze, please help to check this.
> 
> Thanks again.
>>
>> Thanks,
>> Ryan
>>
>>>
>>> 1) with efa7df3e3bb5 smaps
>>>
>>> ffff68e00000-ffff88e03000 rw-p 00000000 00:00 0
>>> Size:             524300 kB
>>> KernelPageSize:        4 kB
>>> MMUPageSize:           4 kB
>>> Rss:                2048 kB
>>> Pss:                2048 kB
>>> Pss_Dirty:          2048 kB
>>> Shared_Clean:          0 kB
>>> Shared_Dirty:          0 kB
>>> Private_Clean:         0 kB
>>> Private_Dirty:      2048 kB
>>> Referenced:         2048 kB
>>> Anonymous:          2048 kB // we have 1 anon thp
>>> KSM:                   0 kB
>>> LazyFree:              0 kB
>>> AnonHugePages:      2048 kB
>>
>> Yes one 2M THP shown here.
>>
>>> ShmemPmdMapped:        0 kB
>>> FilePmdMapped:         0 kB
>>> Shared_Hugetlb:        0 kB
>>> Private_Hugetlb:       0 kB
>>> Swap:                  0 kB
>>> SwapPss:               0 kB
>>> Locked:                0 kB
>>> THPeligible:           1
>>> VmFlags: rd wr mr mw me ac
>>> ffff88eff000-ffff89000000 rw-p 00000000 00:00 0
>>> Size:               1028 kB
>>> KernelPageSize:        4 kB
>>> MMUPageSize:           4 kB
>>> Rss:                1028 kB
>>> Pss:                1028 kB
>>> Pss_Dirty:          1028 kB
>>> Shared_Clean:          0 kB
>>> Shared_Dirty:          0 kB
>>> Private_Clean:         0 kB
>>> Private_Dirty:      1028 kB
>>> Referenced:         1028 kB
>>> Anonymous:          1028 kB // another large anon
>>
>> This is not THP, since you only have 2M THP enabled. This will be 1M of 4K page
>> allocations + 1 4K page malloc control structure, allocated and accessed by
>> permutation() during test setup.
>>
>>> KSM:                   0 kB
>>> LazyFree:              0 kB
>>> AnonHugePages:         0 kB
>>> ShmemPmdMapped:        0 kB
>>> FilePmdMapped:         0 kB
>>> Shared_Hugetlb:        0 kB
>>> Private_Hugetlb:       0 kB
>>> Swap:                  0 kB
>>> SwapPss:               0 kB
>>> Locked:                0 kB
>>> THPeligible:           0
>>> VmFlags: rd wr mr mw me ac
>>>
>>> and the smap_rollup
>>>
>>> 00400000-fffff56bd000 ---p 00000000 00:00 0 [rollup]
>>> Rss:                4724 kB
>>> Pss:                3408 kB
>>> Pss_Dirty:          3338 kB
>>> Pss_Anon:           3338 kB
>>> Pss_File:             70 kB
>>> Pss_Shmem:             0 kB
>>> Shared_Clean:       1176 kB
>>> Shared_Dirty:        420 kB
>>> Private_Clean:         0 kB
>>> Private_Dirty:      3128 kB
>>> Referenced:         4344 kB
>>> Anonymous:          3548 kB
>>> KSM:                   0 kB
>>> LazyFree:              0 kB
>>> AnonHugePages:      2048 kB
>>> ShmemPmdMapped:        0 kB
>>> FilePmdMapped:         0 kB
>>> Shared_Hugetlb:        0 kB
>>> Private_Hugetlb:       0 kB
>>> Swap:                  0 kB
>>> SwapPss:               0 kB
>>> Locked:                0 kB
>>>
>>> 2) without efa7df3e3bb5 smaps
>>>
>>> ffff9845b000-ffffb855f000 rw-p 00000000 00:00 0
>>> Size:             525328 kB
>>
>> This is a merged-vma version of the above 2 regions.
>>
>>> KernelPageSize:        4 kB
>>> MMUPageSize:           4 kB
>>> Rss:                1128 kB
>>> Pss:                1128 kB
>>> Pss_Dirty:          1128 kB
>>> Shared_Clean:          0 kB
>>> Shared_Dirty:          0 kB
>>> Private_Clean:         0 kB
>>> Private_Dirty:      1128 kB
>>> Referenced:         1128 kB
>>> Anonymous:          1128 kB // only large anon
>>> KSM:                   0 kB
>>> LazyFree:              0 kB
>>> AnonHugePages:         0 kB
>>> ShmemPmdMapped:        0 kB
>>> FilePmdMapped:         0 kB
>>> Shared_Hugetlb:        0 kB
>>> Private_Hugetlb:       0 kB
>>> Swap:                  0 kB
>>> SwapPss:               0 kB
>>> Locked:                0 kB
>>> THPeligible:           1
>>> VmFlags: rd wr mr mw me ac
>>>
>>> and the smap_rollup,
>>>
>>> 00400000-ffffca5dc000 ---p 00000000 00:00 0 [rollup]
>>> Rss:                2600 kB
>>> Pss:                1472 kB
>>> Pss_Dirty:          1388 kB
>>> Pss_Anon:           1388 kB
>>> Pss_File:             84 kB
>>> Pss_Shmem:             0 kB
>>> Shared_Clean:       1000 kB
>>> Shared_Dirty:        424 kB
>>> Private_Clean:         0 kB
>>> Private_Dirty:      1176 kB
>>> Referenced:         2220 kB
>>> Anonymous:          1600 kB
>>> KSM:                   0 kB
>>> LazyFree:              0 kB
>>> AnonHugePages:         0 kB
>>> ShmemPmdMapped:        0 kB
>>> FilePmdMapped:         0 kB
>>> Shared_Hugetlb:        0 kB
>>> Private_Hugetlb:       0 kB
>>> Swap:                  0 kB
>>> SwapPss:               0 kB
>>> Locked:                0 kB
>>>
Yang Shi May 8, 2024, 3:25 p.m. UTC | #28
On Wed, May 8, 2024 at 6:37 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>
>
>
> On 2024/5/8 16:36, Ryan Roberts wrote:
> > On 08/05/2024 08:48, Kefeng Wang wrote:
> >>
> >>
> >> On 2024/5/8 1:17, Yang Shi wrote:
> >>> On Tue, May 7, 2024 at 8:53 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>>>
> >>>> On 07/05/2024 14:53, Kefeng Wang wrote:
> >>>>>
> >>>>>
> >>>>> On 2024/5/7 19:13, David Hildenbrand wrote:
> >>>>>>
> >>>>>>> https://github.com/intel/lmbench/blob/master/src/lat_mem_rd.c#L95
> >>>>>>>
> >>>>>>>> suggest. If you want to try something semi-randomly; it might be useful
> >>>>>>>> to rule
> >>>>>>>> out the arm64 contpte feature. I don't see how that would be interacting
> >>>>>>>> here if
> >>>>>>>> mTHP is disabled (is it?). But its new for 6.9 and arm64 only. Disable with
> >>>>>>>> ARM64_CONTPTE (needs EXPERT) at compile time.
> >>>>>>> I didn't enable mTHP, so it should not be related to ARM64_CONTPTE,
> >>>>>>> but I will give it a try.
> >>>>>
> >>>>> With ARM64_CONTPTE disabled, memory read latency is similar to having
> >>>>> ARM64_CONTPTE enabled (the 6.9-rc7 default), and still larger than with
> >>>>> the anon alignment patch reverted.
> >>>>
> >>>> OK thanks for trying.
> >>>>
> >>>> Looking at the source for lmbench, it's malloc'ing (512M + 8K) up front and using
> >>>> that for all sizes. That will presumably be considered "large" by malloc and
> >>>> will be allocated using mmap. So with the patch, it will be 2M aligned. Without
> >>>> it, it probably won't. I'm still struggling to understand why not aligning it in
> >>>> virtual space would make it more performant though...
> >>>
> >>> Yeah, I'm confused too.
> >> Me too, I get a smaps[_rollup] for 0.09375M size, the biggest difference
> >> for anon shows below, and all attached.
> >
> > OK, a bit more insight; during initialization, the test makes 2 big malloc
> > calls; the first is 1M and the second is 512M+8K. I think those 2 are the 2 vmas
> > below (malloc is adding an extra page to the allocation, presumably for
> > management structures).
> >
> > With efa7df3e3bb5 applied, the 1M allocation is allocated at a non-THP-aligned
> > address. All of its pages are populated (see permutation() which allocates and
> > writes it) but none of them are THP (obviously - it's only 1M and THP is only
> > enabled for 2M). But the 512M region is allocated at a THP-aligned address. And
> > the first page is populated with a THP (presumably faulted when malloc writes to
> > its control structure page before the application even sees the allocated buffer).
> >
> > In contrast, when efa7df3e3bb5 is reverted, neither of the vmas are THP-aligned,
> > and therefore the 512M region abuts the 1M region and the vmas are merged in
> > the kernel. So we end up with the single 525328 kB region. There are no THPs
> > allocated here (due to alignment constraints) so we end up with the 1M region
> > fully populated with 4K pages as before, and only the malloc control page plus
> > the parts of the buffer that the application actually touches being populated in
> > the 512M region.
> >
> > As far as I can tell, the application never touches the 1M region during the
> > test so it should be cache-cold. It only touches the first part of the 512M
> > buffer it needs for the size of the test (96K here?). The latency of allocating
> > the THP will have been consumed during test setup so I doubt we are seeing that
> > in the test results and I don't see why having a single TLB entry vs 96K/4K=24
> > entries would make it slower.
>
> It is strange, and even stranger: I tested another machine (the old machine
> has 128 cores and the new one 96 cores, but with the same per-core L1/L2
> cache sizes), and the new machine does not show this issue. We will contact
> our hardware team; perhaps some hardware configuration differs (prefetch or
> something similar). Thanks for all the suggestions and analysis!

Yes, the benchmark result strongly depends on the cache and memory
subsystem. See the analysis below.

>
>
> >
> > It would be interesting to know the address that gets returned from malloc for
> > the 512M region if that's possible to get (in both cases)? I guess it is offset
> > into the first page. Perhaps it is offset such that with the THP alignment case
> > the 96K of interest ends up straddling 3 cache lines (cache line is 64K I
> > assume?), but for the unaligned case, it ends up nicely packed in 2?
>
> CC zuoze, please help to check this.
>
> Thanks again.
> >
> > Thanks,
> > Ryan
> >
> >>
> >> 1) with efa7df3e3bb5 smaps
> >>
> >> ffff68e00000-ffff88e03000 rw-p 00000000 00:00 0
> >> Size:             524300 kB
> >> KernelPageSize:        4 kB
> >> MMUPageSize:           4 kB
> >> Rss:                2048 kB
> >> Pss:                2048 kB
> >> Pss_Dirty:          2048 kB
> >> Shared_Clean:          0 kB
> >> Shared_Dirty:          0 kB
> >> Private_Clean:         0 kB
> >> Private_Dirty:      2048 kB
> >> Referenced:         2048 kB
> >> Anonymous:          2048 kB // we have 1 anon thp
> >> KSM:                   0 kB
> >> LazyFree:              0 kB
> >> AnonHugePages:      2048 kB
> >
> > Yes one 2M THP shown here.

You have THP allocated. Without commit efa7df3e3bb5 the address may not be
PMD-aligned (it still could be, just not as likely), so base pages were
allocated. To get an apples-to-apples comparison, you need to disable THP
by setting /sys/kernel/mm/transparent_hugepage/enabled to madvise or never;
then you will get base pages in both cases (IIRC lmbench doesn't call
MADV_HUGEPAGE).

The address alignment or page size may have a negative impact on your
CPU's cache and memory subsystem, for example on the hardware prefetcher.
But I saw a slight improvement with THP on my machine, so the behavior
strongly depends on the hardware.
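For the record, a minimal sketch (standard THP sysfs paths, trivial error
handling) that prints the THP policy and PMD size, so both kernels are
compared under identical settings:

#include <stdio.h>

/* Dump a sysfs file so the THP policy and PMD size can be recorded
 * alongside each benchmark run.
 */
static void show(const char *path)
{
	char buf[128];
	FILE *f = fopen(path, "r");

	if (!f) {
		printf("%s: <unavailable>\n", path);
		return;
	}
	if (fgets(buf, sizeof(buf), f))
		printf("%s: %s", path, buf);
	fclose(f);
}

int main(void)
{
	show("/sys/kernel/mm/transparent_hugepage/enabled");
	show("/sys/kernel/mm/transparent_hugepage/hpage_pmd_size");
	return 0;
}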

> >
> >> ShmemPmdMapped:        0 kB
> >> FilePmdMapped:         0 kB
> >> Shared_Hugetlb:        0 kB
> >> Private_Hugetlb:       0 kB
> >> Swap:                  0 kB
> >> SwapPss:               0 kB
> >> Locked:                0 kB
> >> THPeligible:           1
> >> VmFlags: rd wr mr mw me ac
> >> ffff88eff000-ffff89000000 rw-p 00000000 00:00 0
> >> Size:               1028 kB
> >> KernelPageSize:        4 kB
> >> MMUPageSize:           4 kB
> >> Rss:                1028 kB
> >> Pss:                1028 kB
> >> Pss_Dirty:          1028 kB
> >> Shared_Clean:          0 kB
> >> Shared_Dirty:          0 kB
> >> Private_Clean:         0 kB
> >> Private_Dirty:      1028 kB
> >> Referenced:         1028 kB
> >> Anonymous:          1028 kB // another large anon
> >
> > This is not THP, since you only have 2M THP enabled. This will be 1M of 4K page
> > allocations + 1 4K page malloc control structure, allocated and accessed by
> > permutation() during test setup.
> >
> >> KSM:                   0 kB
> >> LazyFree:              0 kB
> >> AnonHugePages:         0 kB
> >> ShmemPmdMapped:        0 kB
> >> FilePmdMapped:         0 kB
> >> Shared_Hugetlb:        0 kB
> >> Private_Hugetlb:       0 kB
> >> Swap:                  0 kB
> >> SwapPss:               0 kB
> >> Locked:                0 kB
> >> THPeligible:           0
> >> VmFlags: rd wr mr mw me ac
> >>
> >> and the smap_rollup
> >>
> >> 00400000-fffff56bd000 ---p 00000000 00:00 0 [rollup]
> >> Rss:                4724 kB
> >> Pss:                3408 kB
> >> Pss_Dirty:          3338 kB
> >> Pss_Anon:           3338 kB
> >> Pss_File:             70 kB
> >> Pss_Shmem:             0 kB
> >> Shared_Clean:       1176 kB
> >> Shared_Dirty:        420 kB
> >> Private_Clean:         0 kB
> >> Private_Dirty:      3128 kB
> >> Referenced:         4344 kB
> >> Anonymous:          3548 kB
> >> KSM:                   0 kB
> >> LazyFree:              0 kB
> >> AnonHugePages:      2048 kB
> >> ShmemPmdMapped:        0 kB
> >> FilePmdMapped:         0 kB
> >> Shared_Hugetlb:        0 kB
> >> Private_Hugetlb:       0 kB
> >> Swap:                  0 kB
> >> SwapPss:               0 kB
> >> Locked:                0 kB
> >>
> >> 2) without efa7df3e3bb5 smaps
> >>
> >> ffff9845b000-ffffb855f000 rw-p 00000000 00:00 0
> >> Size:             525328 kB
> >
> > This is a merged-vma version of the above 2 regions.
> >
> >> KernelPageSize:        4 kB
> >> MMUPageSize:           4 kB
> >> Rss:                1128 kB
> >> Pss:                1128 kB
> >> Pss_Dirty:          1128 kB
> >> Shared_Clean:          0 kB
> >> Shared_Dirty:          0 kB
> >> Private_Clean:         0 kB
> >> Private_Dirty:      1128 kB
> >> Referenced:         1128 kB
> >> Anonymous:          1128 kB // only large anon
> >> KSM:                   0 kB
> >> LazyFree:              0 kB
> >> AnonHugePages:         0 kB
> >> ShmemPmdMapped:        0 kB
> >> FilePmdMapped:         0 kB
> >> Shared_Hugetlb:        0 kB
> >> Private_Hugetlb:       0 kB
> >> Swap:                  0 kB
> >> SwapPss:               0 kB
> >> Locked:                0 kB
> >> THPeligible:           1
> >> VmFlags: rd wr mr mw me ac
> >>
> >> and the smap_rollup,
> >>
> >> 00400000-ffffca5dc000 ---p 00000000 00:00 0 [rollup]
> >> Rss:                2600 kB
> >> Pss:                1472 kB
> >> Pss_Dirty:          1388 kB
> >> Pss_Anon:           1388 kB
> >> Pss_File:             84 kB
> >> Pss_Shmem:             0 kB
> >> Shared_Clean:       1000 kB
> >> Shared_Dirty:        424 kB
> >> Private_Clean:         0 kB
> >> Private_Dirty:      1176 kB
> >> Referenced:         2220 kB
> >> Anonymous:          1600 kB
> >> KSM:                   0 kB
> >> LazyFree:              0 kB
> >> AnonHugePages:         0 kB
> >> ShmemPmdMapped:        0 kB
> >> FilePmdMapped:         0 kB
> >> Shared_Hugetlb:        0 kB
> >> Private_Hugetlb:       0 kB
> >> Swap:                  0 kB
> >> SwapPss:               0 kB
> >> Locked:                0 kB
> >>
Kefeng Wang May 9, 2024, 1:47 a.m. UTC | #29
On 2024/5/8 23:25, Yang Shi wrote:
> On Wed, May 8, 2024 at 6:37 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>>
>>
>>
>> On 2024/5/8 16:36, Ryan Roberts wrote:
>>> On 08/05/2024 08:48, Kefeng Wang wrote:
>>>>
>>>>
>>>> On 2024/5/8 1:17, Yang Shi wrote:
>>>>> On Tue, May 7, 2024 at 8:53 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>>>
>>>>>> On 07/05/2024 14:53, Kefeng Wang wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2024/5/7 19:13, David Hildenbrand wrote:
>>>>>>>>
>>>>>>>>> https://github.com/intel/lmbench/blob/master/src/lat_mem_rd.c#L95
>>>>>>>>>
>>>>>>>>>> suggest. If you want to try something semi-randomly; it might be useful
>>>>>>>>>> to rule
>>>>>>>>>> out the arm64 contpte feature. I don't see how that would be interacting
>>>>>>>>>> here if
>>>>>>>>>> mTHP is disabled (is it?). But its new for 6.9 and arm64 only. Disable with
>>>>>>>>>> ARM64_CONTPTE (needs EXPERT) at compile time.
>>>>>>>>> I didn't enable mTHP, so it should not be related to ARM64_CONTPTE,
>>>>>>>>> but I will give it a try.
>>>>>>>
>>>>>>> With ARM64_CONTPTE disabled, memory read latency is similar to having
>>>>>>> ARM64_CONTPTE enabled (the 6.9-rc7 default), and still larger than with
>>>>>>> the anon alignment patch reverted.
>>>>>>
>>>>>> OK thanks for trying.
>>>>>>
>>>>>> Looking at the source for lmbench, it's malloc'ing (512M + 8K) up front and using
>>>>>> that for all sizes. That will presumably be considered "large" by malloc and
>>>>>> will be allocated using mmap. So with the patch, it will be 2M aligned. Without
>>>>>> it, it probably won't. I'm still struggling to understand why not aligning it in
>>>>>> virtual space would make it more performant though...
>>>>>
>>>>> Yeah, I'm confused too.
>>>> Me too, I get a smaps[_rollup] for 0.09375M size, the biggest difference
>>>> for anon shows below, and all attached.
>>>
>>> OK, a bit more insight; during initialization, the test makes 2 big malloc
>>> calls; the first is 1M and the second is 512M+8K. I think those 2 are the 2 vmas
>>> below (malloc is adding an extra page to the allocation, presumably for
>>> management structures).
>>>
>>> With efa7df3e3bb5 applied, the 1M allocation is allocated at a non-THP-aligned
>>> address. All of its pages are populated (see permutation() which allocates and
>>> writes it) but none of them are THP (obviously - it's only 1M and THP is only
>>> enabled for 2M). But the 512M region is allocated at a THP-aligned address. And
>>> the first page is populated with a THP (presumably faulted when malloc writes to
>>> its control structure page before the application even sees the allocated buffer).
>>>
>>> In contrast, when efa7df3e3bb5 is reverted, neither of the vmas are THP-aligned,
>>> and therefore the 512M region abuts the 1M region and the vmas are merged in
>>> the kernel. So we end up with the single 525328 kB region. There are no THPs
>>> allocated here (due to alignment constraints) so we end up with the 1M region
>>> fully populated with 4K pages as before, and only the malloc control page plus
>>> the parts of the buffer that the application actually touches being populated in
>>> the 512M region.
>>>
>>> As far as I can tell, the application never touches the 1M region during the
>>> test so it should be cache-cold. It only touches the first part of the 512M
>>> buffer it needs for the size of the test (96K here?). The latency of allocating
>>> the THP will have been consumed during test setup so I doubt we are seeing that
>>> in the test results and I don't see why having a single TLB entry vs 96K/4K=24
>>> entries would make it slower.
>>
>> It is strange, and even stranger: I tested another machine (the old machine
>> has 128 cores and the new one 96 cores, but with the same per-core L1/L2
>> cache sizes), and the new machine does not show this issue. We will contact
>> our hardware team; perhaps some hardware configuration differs (prefetch or
>> something similar). Thanks for all the suggestions and analysis!
> 
> Yes, the benchmark result strongly depends on the cache and memory
> subsystem. See the analysis below.
> 
>>
>>
>>>
>>> It would be interesting to know the address that gets returned from malloc for
>>> the 512M region if that's possible to get (in both cases)? I guess it is offset
>>> into the first page. Perhaps it is offset such that with the THP alignment case
>>> the 96K of interest ends up straddling 3 cache lines (cache line is 64K I
>>> assume?), but for the unaligned case, it ends up nicely packed in 2?
>>
>> CC zuoze, please help to check this.
>>
>> Thanks again.
>>>
>>> Thanks,
>>> Ryan
>>>
>>>>
>>>> 1) with efa7df3e3bb5 smaps
>>>>
>>>> ffff68e00000-ffff88e03000 rw-p 00000000 00:00 0
>>>> Size:             524300 kB
>>>> KernelPageSize:        4 kB
>>>> MMUPageSize:           4 kB
>>>> Rss:                2048 kB
>>>> Pss:                2048 kB
>>>> Pss_Dirty:          2048 kB
>>>> Shared_Clean:          0 kB
>>>> Shared_Dirty:          0 kB
>>>> Private_Clean:         0 kB
>>>> Private_Dirty:      2048 kB
>>>> Referenced:         2048 kB
>>>> Anonymous:          2048 kB // we have 1 anon thp
>>>> KSM:                   0 kB
>>>> LazyFree:              0 kB
>>>> AnonHugePages:      2048 kB
>>>
>>> Yes one 2M THP shown here.
> 
> You have THP allocated. Without commit efa7df3e3bb5 the address may not be
> PMD-aligned (it still could be, just not as likely), so base pages were
> allocated. To get an apples-to-apples comparison, you need to disable THP
> by setting /sys/kernel/mm/transparent_hugepage/enabled to madvise or never;
> then you will get base pages in both cases (IIRC lmbench doesn't call
> MADV_HUGEPAGE).

Yes, we tested with THP disabled (via sysfs) before, and there was no
difference with or without efa7df3e3bb5.

> 
> The address alignment or page size may have a negative impact on your
> CPU's cache and memory subsystem, for example on the hardware prefetcher.
> But I saw a slight improvement with THP on my machine, so the behavior
> strongly depends on the hardware.
> 
I hoped efa7df3e3bb5 would improve performance, so I backported it into our
kernel, but I found the above issue and saw the same result when retesting
with 6.9-rc7. Since different hardware shows different results, we will test
more hardware and try to contact our hardware team. Thanks for your help.
diff mbox series

Patch

diff --git a/mm/mmap.c b/mm/mmap.c
index 9d780f415be3..dd25a2aa94f7 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2232,6 +2232,9 @@  get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
 		 */
 		pgoff = 0;
 		get_area = shmem_get_unmapped_area;
+	} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+		/* Ensures that larger anonymous mappings are THP aligned. */
+		get_area = thp_get_unmapped_area;
 	}
 
 	addr = get_area(file, addr, len, pgoff, flags);
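
A quick user-space check of the effect of this hunk (a sketch, assuming THP is
enabled for the process and a 2M PMD size): with the change applied, a
NULL-hint anonymous mapping of at least 2M should come back PMD-aligned,
whereas before it typically would not.

#include <stdio.h>
#include <sys/mman.h>

#define MAP_SIZE	(4UL * 1024 * 1024)	/* >= 2M, so THP alignment can apply */
#define PMD_SIZE	(2UL * 1024 * 1024)

int main(void)
{
	void *p = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	printf("addr=%p  2M-aligned: %s\n", p,
	       ((unsigned long)p & (PMD_SIZE - 1)) ? "no" : "yes");
	munmap(p, MAP_SIZE);
	return 0;
}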