diff mbox series

[mm-unstable,v7,13/18] proc/smaps: add PMDMappable field to smaps

Message ID 20220706235936.2197195-14-zokeefe@google.com (mailing list archive)
State New
Headers show
Series mm: userspace hugepage collapse | expand

Commit Message

Zach O'Keefe July 6, 2022, 11:59 p.m. UTC
Add PMDMappable field to smaps output which informs the user if memory
in the VMA can be PMD-mapped by MADV_COLLAPSE.

The distinction from THPeligible is needed for two reasons:

1) For THP, MADV_COLLAPSE is not coupled to THP sysfs controls, which
   THPeligible reports.

2) PMDMappable can also be used in HugeTLB fine-granularity mappings,
   which are independent from THP.

Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
 Documentation/filesystems/proc.rst | 10 ++++++++--
 fs/proc/task_mmu.c                 |  2 ++
 2 files changed, 10 insertions(+), 2 deletions(-)

Comments

Yang Shi July 11, 2022, 9:37 p.m. UTC | #1
On Wed, Jul 6, 2022 at 5:06 PM Zach O'Keefe <zokeefe@google.com> wrote:
>
> Add PMDMappable field to smaps output which informs the user if memory
> in the VMA can be PMD-mapped by MADV_COLLAPSE.
>
> The distinction from THPeligible is needed for two reasons:
>
> 1) For THP, MADV_COLLAPSE is not coupled to THP sysfs controls, which
>    THPeligible reports.
>
> 2) PMDMappable can also be used in HugeTLB fine-granularity mappings,
>    which are independent from THP.

Could you please elaborate the usecase? The user checks this hint
before calling MADV_COLLAPSE? Is it really necessary?

And, TBH it sounds confusing and we don't have to maintain both
THPeligible and PMDMappable. We could just relax THPeligible to make
it return 1 even though THP is disabled by sysfs but MADV_COLLAPSE
could collapse it if such hint is useful.


>
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> ---
>  Documentation/filesystems/proc.rst | 10 ++++++++--
>  fs/proc/task_mmu.c                 |  2 ++
>  2 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index 47e95dbc820d..f207903a57a5 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -466,6 +466,7 @@ Memory Area, or VMA) there is a series of lines such as the following::
>      MMUPageSize:           4 kB
>      Locked:                0 kB
>      THPeligible:           0
> +    PMDMappable:           0
>      VmFlags: rd ex mr mw me dw
>
>  The first of these lines shows the same information as is displayed for the
> @@ -518,9 +519,14 @@ replaced by copy-on-write) part of the underlying shmem object out on swap.
>  does not take into account swapped out page of underlying shmem objects.
>  "Locked" indicates whether the mapping is locked in memory or not.
>
> +"PMDMappable" indicates if the memory can be mapped by PMDs - 1 if true, 0
> +otherwise.  It just shows the current status. Note that this is memory
> +operable on explicitly by MADV_COLLAPSE.
> +
>  "THPeligible" indicates whether the mapping is eligible for allocating THP
> -pages as well as the THP is PMD mappable or not - 1 if true, 0 otherwise.
> -It just shows the current status.
> +pages by the kernel, as well as the THP is PMD mappable or not - 1 if true, 0
> +otherwise. It just shows the current status.  Note this is memory the kernel can
> +transparently provide as THPs.
>
>  "VmFlags" field deserves a separate description. This member represents the
>  kernel flags associated with the particular virtual memory area in two letter
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index f8cd58846a28..29f2089456ba 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -867,6 +867,8 @@ static int show_smap(struct seq_file *m, void *v)
>
>         seq_printf(m, "THPeligible:    %d\n",
>                    hugepage_vma_check(vma, vma->vm_flags, true, false, true));
> +       seq_printf(m, "PMDMappable:    %d\n",
> +                  hugepage_vma_check(vma, vma->vm_flags, true, false, false));
>
>         if (arch_pkeys_enabled())
>                 seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
> --
> 2.37.0.rc0.161.g10f37bed90-goog
>
Zach O'Keefe July 12, 2022, 4:31 p.m. UTC | #2
On Jul 11 14:37, Yang Shi wrote:
> On Wed, Jul 6, 2022 at 5:06 PM Zach O'Keefe <zokeefe@google.com> wrote:
> >
> > Add PMDMappable field to smaps output which informs the user if memory
> > in the VMA can be PMD-mapped by MADV_COLLAPSE.
> >
> > The distinction from THPeligible is needed for two reasons:
> >
> > 1) For THP, MADV_COLLAPSE is not coupled to THP sysfs controls, which
> >    THPeligible reports.
> >
> > 2) PMDMappable can also be used in HugeTLB fine-granularity mappings,
> >    which are independent from THP.
> 
> Could you please elaborate the usecase? The user checks this hint
> before calling MADV_COLLAPSE? Is it really necessary?
> 
> And, TBH it sounds confusing and we don't have to maintain both
> THPeligible and PMDMappable. We could just relax THPeligible to make
> it return 1 even though THP is disabled by sysfs but MADV_COLLAPSE
> could collapse it if such hint is useful.
> 

Hey Yang,

Thanks for taking the time to review this series again, and thanks for
challenging this.

TLDR: "Is it really necessary" - at the moment, no, probably not .. but I think
it's "useful".

Rationale:

1. IMO, I thought was was confusing seeing:

	...
	AnonHugePages:      2048 kB
	ShmemPmdMapped:        0 kB
	FilePmdMapped:         0 kB
	Shared_Hugetlb:        0 kB
	Private_Hugetlb:       0 kB
	Swap:                  0 kB
	SwapPss:               0 kB
	Locked:                0 kB
	THPeligible:    0
	...

Maybe this could simply be clarified in the docs though.  I guess we can already
get:

	...
	AnonHugePages:         0 kB
	ShmemPmdMapped:        0 kB
	FilePmdMapped:      2048 kB
	Shared_Hugetlb:        0 kB
	Private_Hugetlb:       0 kB
	Swap:                  0 kB
	SwapPss:               0 kB
	Locked:                0 kB
	THPeligible:    0
	...

today[1], so perhaps it's not a big deal.


2. It was useful for debugging - similar to rationale for including
THPeligible1[2], the logic for determining if a VMA is eligible is pretty
complicated. I.e. is this file mapped suitably? Unlike THPeligible, however,
madvise(2) has the ability to set errno on failure to help* diagnose why some
memory isn't being backed.

3. For the immediately-envisioned usecases, the user "knows" about what memory
they are acting on. However, eventually we'd like to experiment with moving THP
utilization policy to userspace. Here, it would be useful if the userspace agent
managing was made aware of what memory it should be managing. I don't have a
working prototype of what this would like yet, however.

4. I thought it was neat that this field could be reused for HugeTLB
fine-granularity mappings - but TBH I'm not sure how useful it'd be there.

I figured relaxing existing THPeligible could break existing users / tests, and
it'd be likewise confusing for them to see THPeligible:	1, but then have faults
fail and they'd then have to go check sysfs settings and vma flags ; we'd be
back in pre-commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each
vma").

Thanks,
Zach

[1] https://lore.kernel.org/linux-mm/YrxbQGiwml24APCx@google.com/


> 
> >
> > Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> > ---
> >  Documentation/filesystems/proc.rst | 10 ++++++++--
> >  fs/proc/task_mmu.c                 |  2 ++
> >  2 files changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> > index 47e95dbc820d..f207903a57a5 100644
> > --- a/Documentation/filesystems/proc.rst
> > +++ b/Documentation/filesystems/proc.rst
> > @@ -466,6 +466,7 @@ Memory Area, or VMA) there is a series of lines such as the following::
> >      MMUPageSize:           4 kB
> >      Locked:                0 kB
> >      THPeligible:           0
> > +    PMDMappable:           0
> >      VmFlags: rd ex mr mw me dw
> >
> >  The first of these lines shows the same information as is displayed for the
> > @@ -518,9 +519,14 @@ replaced by copy-on-write) part of the underlying shmem object out on swap.
> >  does not take into account swapped out page of underlying shmem objects.
> >  "Locked" indicates whether the mapping is locked in memory or not.
> >
> > +"PMDMappable" indicates if the memory can be mapped by PMDs - 1 if true, 0
> > +otherwise.  It just shows the current status. Note that this is memory
> > +operable on explicitly by MADV_COLLAPSE.
> > +
> >  "THPeligible" indicates whether the mapping is eligible for allocating THP
> > -pages as well as the THP is PMD mappable or not - 1 if true, 0 otherwise.
> > -It just shows the current status.
> > +pages by the kernel, as well as the THP is PMD mappable or not - 1 if true, 0
> > +otherwise. It just shows the current status.  Note this is memory the kernel can
> > +transparently provide as THPs.
> >
> >  "VmFlags" field deserves a separate description. This member represents the
> >  kernel flags associated with the particular virtual memory area in two letter
> > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > index f8cd58846a28..29f2089456ba 100644
> > --- a/fs/proc/task_mmu.c
> > +++ b/fs/proc/task_mmu.c
> > @@ -867,6 +867,8 @@ static int show_smap(struct seq_file *m, void *v)
> >
> >         seq_printf(m, "THPeligible:    %d\n",
> >                    hugepage_vma_check(vma, vma->vm_flags, true, false, true));
> > +       seq_printf(m, "PMDMappable:    %d\n",
> > +                  hugepage_vma_check(vma, vma->vm_flags, true, false, false));
> >
> >         if (arch_pkeys_enabled())
> >                 seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
> > --
> > 2.37.0.rc0.161.g10f37bed90-goog
> >
Yang Shi July 12, 2022, 5:27 p.m. UTC | #3
On Tue, Jul 12, 2022 at 9:31 AM Zach O'Keefe <zokeefe@google.com> wrote:
>
> On Jul 11 14:37, Yang Shi wrote:
> > On Wed, Jul 6, 2022 at 5:06 PM Zach O'Keefe <zokeefe@google.com> wrote:
> > >
> > > Add PMDMappable field to smaps output which informs the user if memory
> > > in the VMA can be PMD-mapped by MADV_COLLAPSE.
> > >
> > > The distinction from THPeligible is needed for two reasons:
> > >
> > > 1) For THP, MADV_COLLAPSE is not coupled to THP sysfs controls, which
> > >    THPeligible reports.
> > >
> > > 2) PMDMappable can also be used in HugeTLB fine-granularity mappings,
> > >    which are independent from THP.
> >
> > Could you please elaborate the usecase? The user checks this hint
> > before calling MADV_COLLAPSE? Is it really necessary?
> >
> > And, TBH it sounds confusing and we don't have to maintain both
> > THPeligible and PMDMappable. We could just relax THPeligible to make
> > it return 1 even though THP is disabled by sysfs but MADV_COLLAPSE
> > could collapse it if such hint is useful.
> >
>
> Hey Yang,
>
> Thanks for taking the time to review this series again, and thanks for
> challenging this.
>
> TLDR: "Is it really necessary" - at the moment, no, probably not .. but I think
> it's "useful".
>
> Rationale:
>
> 1. IMO, I thought was was confusing seeing:
>
>         ...
>         AnonHugePages:      2048 kB
>         ShmemPmdMapped:        0 kB
>         FilePmdMapped:         0 kB
>         Shared_Hugetlb:        0 kB
>         Private_Hugetlb:       0 kB
>         Swap:                  0 kB
>         SwapPss:               0 kB
>         Locked:                0 kB
>         THPeligible:    0
>         ...
>
> Maybe this could simply be clarified in the docs though.  I guess we can already
> get:
>
>         ...
>         AnonHugePages:         0 kB
>         ShmemPmdMapped:        0 kB
>         FilePmdMapped:      2048 kB
>         Shared_Hugetlb:        0 kB
>         Private_Hugetlb:       0 kB
>         Swap:                  0 kB
>         SwapPss:               0 kB
>         Locked:                0 kB
>         THPeligible:    0
>         ...
>
> today[1], so perhaps it's not a big deal.

Not only that, if you have file PMD mapped then turn the THP sysfs
flag off, you get the same result. It is just a hint and just shows
the status at that moment when reading smaps.

>
>
> 2. It was useful for debugging - similar to rationale for including
> THPeligible1[2], the logic for determining if a VMA is eligible is pretty
> complicated. I.e. is this file mapped suitably? Unlike THPeligible, however,
> madvise(2) has the ability to set errno on failure to help* diagnose why some
> memory isn't being backed.

I don't disagree it would help for debugging. But as a user who
doesn't know too much about kernel internals, when I see THPeligible
and PMDmappable, I would get confused TBH. And do we have to maintain
another similar hint? Maybe not.

>
> 3. For the immediately-envisioned usecases, the user "knows" about what memory
> they are acting on. However, eventually we'd like to experiment with moving THP
> utilization policy to userspace. Here, it would be useful if the userspace agent
> managing was made aware of what memory it should be managing. I don't have a
> working prototype of what this would like yet, however.

It is not a strong justification to add some user visible stuff for a
future feature (not even prototyped) since things may change, it
sounds safer to add such things once the usecase is solid TBH.

>
> 4. I thought it was neat that this field could be reused for HugeTLB
> fine-granularity mappings - but TBH I'm not sure how useful it'd be there.
>
> I figured relaxing existing THPeligible could break existing users / tests, and
> it'd be likewise confusing for them to see THPeligible: 1, but then have faults
> fail and they'd then have to go check sysfs settings and vma flags ; we'd be
> back in pre-commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each
> vma").

I'm not sure what applications rely on this hint, but if they are just
some test scripts, I think it should be fine. I don't think we
guarantee the test scripts won't get broken. AFAIK some test scripts
rely on the kernel dmesg text, for example, OOMs. And the meaning of
the fields do change, for example, inactive anon of /proc/meminfo,
which was changed by the patchset which put anon pages on inactive
list first instead of active list. We already noticed the abnormal
value from our monitoring tool when we adopted 5.10+ kernel. And
/proc/vmstat also had some fields renamed, for example,
workingset_refault of /proc/vmstat, it was split to
workseting_refault_anon and workingset_refault_file, so we had to
update our monitoring scripts accordingly. I think /proc/meminfo and
/proc/vmstat are more heavily used than smaps.

>
> Thanks,
> Zach
>
> [1] https://lore.kernel.org/linux-mm/YrxbQGiwml24APCx@google.com/
>
>
> >
> > >
> > > Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> > > ---
> > >  Documentation/filesystems/proc.rst | 10 ++++++++--
> > >  fs/proc/task_mmu.c                 |  2 ++
> > >  2 files changed, 10 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> > > index 47e95dbc820d..f207903a57a5 100644
> > > --- a/Documentation/filesystems/proc.rst
> > > +++ b/Documentation/filesystems/proc.rst
> > > @@ -466,6 +466,7 @@ Memory Area, or VMA) there is a series of lines such as the following::
> > >      MMUPageSize:           4 kB
> > >      Locked:                0 kB
> > >      THPeligible:           0
> > > +    PMDMappable:           0
> > >      VmFlags: rd ex mr mw me dw
> > >
> > >  The first of these lines shows the same information as is displayed for the
> > > @@ -518,9 +519,14 @@ replaced by copy-on-write) part of the underlying shmem object out on swap.
> > >  does not take into account swapped out page of underlying shmem objects.
> > >  "Locked" indicates whether the mapping is locked in memory or not.
> > >
> > > +"PMDMappable" indicates if the memory can be mapped by PMDs - 1 if true, 0
> > > +otherwise.  It just shows the current status. Note that this is memory
> > > +operable on explicitly by MADV_COLLAPSE.
> > > +
> > >  "THPeligible" indicates whether the mapping is eligible for allocating THP
> > > -pages as well as the THP is PMD mappable or not - 1 if true, 0 otherwise.
> > > -It just shows the current status.
> > > +pages by the kernel, as well as the THP is PMD mappable or not - 1 if true, 0
> > > +otherwise. It just shows the current status.  Note this is memory the kernel can
> > > +transparently provide as THPs.
> > >
> > >  "VmFlags" field deserves a separate description. This member represents the
> > >  kernel flags associated with the particular virtual memory area in two letter
> > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > > index f8cd58846a28..29f2089456ba 100644
> > > --- a/fs/proc/task_mmu.c
> > > +++ b/fs/proc/task_mmu.c
> > > @@ -867,6 +867,8 @@ static int show_smap(struct seq_file *m, void *v)
> > >
> > >         seq_printf(m, "THPeligible:    %d\n",
> > >                    hugepage_vma_check(vma, vma->vm_flags, true, false, true));
> > > +       seq_printf(m, "PMDMappable:    %d\n",
> > > +                  hugepage_vma_check(vma, vma->vm_flags, true, false, false));
> > >
> > >         if (arch_pkeys_enabled())
> > >                 seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
> > > --
> > > 2.37.0.rc0.161.g10f37bed90-goog
> > >
Zach O'Keefe July 12, 2022, 5:57 p.m. UTC | #4
On Jul 12 10:27, Yang Shi wrote:
> On Tue, Jul 12, 2022 at 9:31 AM Zach O'Keefe <zokeefe@google.com> wrote:
> >
> > On Jul 11 14:37, Yang Shi wrote:
> > > On Wed, Jul 6, 2022 at 5:06 PM Zach O'Keefe <zokeefe@google.com> wrote:
> > > >
> > > > Add PMDMappable field to smaps output which informs the user if memory
> > > > in the VMA can be PMD-mapped by MADV_COLLAPSE.
> > > >
> > > > The distinction from THPeligible is needed for two reasons:
> > > >
> > > > 1) For THP, MADV_COLLAPSE is not coupled to THP sysfs controls, which
> > > >    THPeligible reports.
> > > >
> > > > 2) PMDMappable can also be used in HugeTLB fine-granularity mappings,
> > > >    which are independent from THP.
> > >
> > > Could you please elaborate the usecase? The user checks this hint
> > > before calling MADV_COLLAPSE? Is it really necessary?
> > >
> > > And, TBH it sounds confusing and we don't have to maintain both
> > > THPeligible and PMDMappable. We could just relax THPeligible to make
> > > it return 1 even though THP is disabled by sysfs but MADV_COLLAPSE
> > > could collapse it if such hint is useful.
> > >
> >
> > Hey Yang,
> >
> > Thanks for taking the time to review this series again, and thanks for
> > challenging this.
> >
> > TLDR: "Is it really necessary" - at the moment, no, probably not .. but I think
> > it's "useful".
> >
> > Rationale:
> >
> > 1. IMO, I thought was was confusing seeing:
> >
> >         ...
> >         AnonHugePages:      2048 kB
> >         ShmemPmdMapped:        0 kB
> >         FilePmdMapped:         0 kB
> >         Shared_Hugetlb:        0 kB
> >         Private_Hugetlb:       0 kB
> >         Swap:                  0 kB
> >         SwapPss:               0 kB
> >         Locked:                0 kB
> >         THPeligible:    0
> >         ...
> >
> > Maybe this could simply be clarified in the docs though.  I guess we can already
> > get:
> >
> >         ...
> >         AnonHugePages:         0 kB
> >         ShmemPmdMapped:        0 kB
> >         FilePmdMapped:      2048 kB
> >         Shared_Hugetlb:        0 kB
> >         Private_Hugetlb:       0 kB
> >         Swap:                  0 kB
> >         SwapPss:               0 kB
> >         Locked:                0 kB
> >         THPeligible:    0
> >         ...
> >
> > today[1], so perhaps it's not a big deal.
> 
> Not only that, if you have file PMD mapped then turn the THP sysfs
> flag off, you get the same result. It is just a hint and just shows
> the status at that moment when reading smaps.
>

Very good point.

> >
> >
> > 2. It was useful for debugging - similar to rationale for including
> > THPeligible1[2], the logic for determining if a VMA is eligible is pretty
> > complicated. I.e. is this file mapped suitably? Unlike THPeligible, however,
> > madvise(2) has the ability to set errno on failure to help* diagnose why some
> > memory isn't being backed.
> 
> I don't disagree it would help for debugging. But as a user who
> doesn't know too much about kernel internals, when I see THPeligible
> and PMDmappable, I would get confused TBH. And do we have to maintain
> another similar hint? Maybe not.
>

> >
> > 3. For the immediately-envisioned usecases, the user "knows" about what memory
> > they are acting on. However, eventually we'd like to experiment with moving THP
> > utilization policy to userspace. Here, it would be useful if the userspace agent
> > managing was made aware of what memory it should be managing. I don't have a
> > working prototype of what this would like yet, however.
> 
> It is not a strong justification to add some user visible stuff for a
> future feature (not even prototyped) since things may change, it
> sounds safer to add such things once the usecase is solid TBH.
>

Ya, this was a weaker point for inclusion *now* TBH.

> >
> > 4. I thought it was neat that this field could be reused for HugeTLB
> > fine-granularity mappings - but TBH I'm not sure how useful it'd be there.
> >
> > I figured relaxing existing THPeligible could break existing users / tests, and
> > it'd be likewise confusing for them to see THPeligible: 1, but then have faults
> > fail and they'd then have to go check sysfs settings and vma flags ; we'd be
> > back in pre-commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each
> > vma").
> 
> I'm not sure what applications rely on this hint, but if they are just
> some test scripts, I think it should be fine. I don't think we
> guarantee the test scripts won't get broken. AFAIK some test scripts
> rely on the kernel dmesg text, for example, OOMs. And the meaning of
> the fields do change, for example, inactive anon of /proc/meminfo,
> which was changed by the patchset which put anon pages on inactive
> list first instead of active list. We already noticed the abnormal
> value from our monitoring tool when we adopted 5.10+ kernel. And
> /proc/vmstat also had some fields renamed, for example,
> workingset_refault of /proc/vmstat, it was split to
> workseting_refault_anon and workingset_refault_file, so we had to
> update our monitoring scripts accordingly. I think /proc/meminfo and
> /proc/vmstat are more heavily used than smaps.
>

Thanks for the great context. My guess is, right now, THPelligible is more
useful as-is than if we were to relax it to MADV_COLLAPSE eligibility. As such,
I'm fine dropping this until a stronger and more immediate usecase presents
itself. Thanks for checking my rationale here.

Best,
Zach


> >
> > Thanks,
> > Zach
> >
> > [1] https://lore.kernel.org/linux-mm/YrxbQGiwml24APCx@google.com/
> >
> >
> > >
> > > >
> > > > Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> > > > ---
> > > >  Documentation/filesystems/proc.rst | 10 ++++++++--
> > > >  fs/proc/task_mmu.c                 |  2 ++
> > > >  2 files changed, 10 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> > > > index 47e95dbc820d..f207903a57a5 100644
> > > > --- a/Documentation/filesystems/proc.rst
> > > > +++ b/Documentation/filesystems/proc.rst
> > > > @@ -466,6 +466,7 @@ Memory Area, or VMA) there is a series of lines such as the following::
> > > >      MMUPageSize:           4 kB
> > > >      Locked:                0 kB
> > > >      THPeligible:           0
> > > > +    PMDMappable:           0
> > > >      VmFlags: rd ex mr mw me dw
> > > >
> > > >  The first of these lines shows the same information as is displayed for the
> > > > @@ -518,9 +519,14 @@ replaced by copy-on-write) part of the underlying shmem object out on swap.
> > > >  does not take into account swapped out page of underlying shmem objects.
> > > >  "Locked" indicates whether the mapping is locked in memory or not.
> > > >
> > > > +"PMDMappable" indicates if the memory can be mapped by PMDs - 1 if true, 0
> > > > +otherwise.  It just shows the current status. Note that this is memory
> > > > +operable on explicitly by MADV_COLLAPSE.
> > > > +
> > > >  "THPeligible" indicates whether the mapping is eligible for allocating THP
> > > > -pages as well as the THP is PMD mappable or not - 1 if true, 0 otherwise.
> > > > -It just shows the current status.
> > > > +pages by the kernel, as well as the THP is PMD mappable or not - 1 if true, 0
> > > > +otherwise. It just shows the current status.  Note this is memory the kernel can
> > > > +transparently provide as THPs.
> > > >
> > > >  "VmFlags" field deserves a separate description. This member represents the
> > > >  kernel flags associated with the particular virtual memory area in two letter
> > > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > > > index f8cd58846a28..29f2089456ba 100644
> > > > --- a/fs/proc/task_mmu.c
> > > > +++ b/fs/proc/task_mmu.c
> > > > @@ -867,6 +867,8 @@ static int show_smap(struct seq_file *m, void *v)
> > > >
> > > >         seq_printf(m, "THPeligible:    %d\n",
> > > >                    hugepage_vma_check(vma, vma->vm_flags, true, false, true));
> > > > +       seq_printf(m, "PMDMappable:    %d\n",
> > > > +                  hugepage_vma_check(vma, vma->vm_flags, true, false, false));
> > > >
> > > >         if (arch_pkeys_enabled())
> > > >                 seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
> > > > --
> > > > 2.37.0.rc0.161.g10f37bed90-goog
> > > >
Andrew Morton July 13, 2022, 6:02 p.m. UTC | #5
On Tue, 12 Jul 2022 10:57:07 -0700 "Zach O'Keefe" <zokeefe@google.com> wrote:

> > I'm not sure what applications rely on this hint, but if they are just
> > some test scripts, I think it should be fine. I don't think we
> > guarantee the test scripts won't get broken. AFAIK some test scripts
> > rely on the kernel dmesg text, for example, OOMs. And the meaning of
> > the fields do change, for example, inactive anon of /proc/meminfo,
> > which was changed by the patchset which put anon pages on inactive
> > list first instead of active list. We already noticed the abnormal
> > value from our monitoring tool when we adopted 5.10+ kernel. And
> > /proc/vmstat also had some fields renamed, for example,
> > workingset_refault of /proc/vmstat, it was split to
> > workseting_refault_anon and workingset_refault_file, so we had to
> > update our monitoring scripts accordingly. I think /proc/meminfo and
> > /proc/vmstat are more heavily used than smaps.
> >
> 
> Thanks for the great context. My guess is, right now, THPelligible is more
> useful as-is than if we were to relax it to MADV_COLLAPSE eligibility. As such,
> I'm fine dropping this until a stronger and more immediate usecase presents
> itself. Thanks for checking my rationale here.

So... should I drop this patch?
Zach O'Keefe July 13, 2022, 6:40 p.m. UTC | #6
On Wed, Jul 13, 2022 at 11:02 AM Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> On Tue, 12 Jul 2022 10:57:07 -0700 "Zach O'Keefe" <zokeefe@google.com> wrote:
>
> > > I'm not sure what applications rely on this hint, but if they are just
> > > some test scripts, I think it should be fine. I don't think we
> > > guarantee the test scripts won't get broken. AFAIK some test scripts
> > > rely on the kernel dmesg text, for example, OOMs. And the meaning of
> > > the fields do change, for example, inactive anon of /proc/meminfo,
> > > which was changed by the patchset which put anon pages on inactive
> > > list first instead of active list. We already noticed the abnormal
> > > value from our monitoring tool when we adopted 5.10+ kernel. And
> > > /proc/vmstat also had some fields renamed, for example,
> > > workingset_refault of /proc/vmstat, it was split to
> > > workseting_refault_anon and workingset_refault_file, so we had to
> > > update our monitoring scripts accordingly. I think /proc/meminfo and
> > > /proc/vmstat are more heavily used than smaps.
> > >
> >
> > Thanks for the great context. My guess is, right now, THPelligible is more
> > useful as-is than if we were to relax it to MADV_COLLAPSE eligibility. As such,
> > I'm fine dropping this until a stronger and more immediate usecase presents
> > itself. Thanks for checking my rationale here.
>
> So... should I drop this patch?

Ya, I don't think I have a solid argument for inclusion right now.

Thanks Andrew,

Zach
diff mbox series

Patch

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 47e95dbc820d..f207903a57a5 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -466,6 +466,7 @@  Memory Area, or VMA) there is a series of lines such as the following::
     MMUPageSize:           4 kB
     Locked:                0 kB
     THPeligible:           0
+    PMDMappable:           0
     VmFlags: rd ex mr mw me dw
 
 The first of these lines shows the same information as is displayed for the
@@ -518,9 +519,14 @@  replaced by copy-on-write) part of the underlying shmem object out on swap.
 does not take into account swapped out page of underlying shmem objects.
 "Locked" indicates whether the mapping is locked in memory or not.
 
+"PMDMappable" indicates if the memory can be mapped by PMDs - 1 if true, 0
+otherwise.  It just shows the current status. Note that this is memory
+operable on explicitly by MADV_COLLAPSE.
+
 "THPeligible" indicates whether the mapping is eligible for allocating THP
-pages as well as the THP is PMD mappable or not - 1 if true, 0 otherwise.
-It just shows the current status.
+pages by the kernel, as well as the THP is PMD mappable or not - 1 if true, 0
+otherwise. It just shows the current status.  Note this is memory the kernel can
+transparently provide as THPs.
 
 "VmFlags" field deserves a separate description. This member represents the
 kernel flags associated with the particular virtual memory area in two letter
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f8cd58846a28..29f2089456ba 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -867,6 +867,8 @@  static int show_smap(struct seq_file *m, void *v)
 
 	seq_printf(m, "THPeligible:    %d\n",
 		   hugepage_vma_check(vma, vma->vm_flags, true, false, true));
+	seq_printf(m, "PMDMappable:    %d\n",
+		   hugepage_vma_check(vma, vma->vm_flags, true, false, false));
 
 	if (arch_pkeys_enabled())
 		seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));