[RFC,v3,1/8] mm: multi-gen LRU: ignore non-leaf pmd_young for force_scan=true

Message ID 20240327213108.2384666-2-yuanchu@google.com
State New
Series mm: workingset reporting

Commit Message

Yuanchu Xie March 27, 2024, 9:31 p.m. UTC
When non-leaf pmd accessed bits are available, MGLRU page table walks
can clear the accessed bit and promptly ignore the accessed bit on the
pte because it's on a different node, so the walk does not update the
generation of said page. When the next scan comes around on the right
node, the non-leaf pmd accessed bit might remain cleared and the pte
accessed bits won't be checked. While this is sufficient for
reclaim-driven aging, where the goal is to select a reasonably cold
page, the access can be missed when aging proactively for measuring the
working set size of a node/memcg.

Since force_scan disables various other optimizations, we check
force_scan to ignore the non-leaf pmd accessed bit.

Signed-off-by: Yuanchu Xie <yuanchu@google.com>
---
 mm/vmscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Huang, Ying April 9, 2024, 6:50 a.m. UTC | #1
Yuanchu Xie <yuanchu@google.com> writes:

> When non-leaf pmd accessed bits are available, MGLRU page table walks
> can clear the accessed bit and promptly ignore the accessed bit on the
> pte because it's on a different node, so the walk does not update the
> generation of said page. When the next scan comes around on the right
> node, the non-leaf pmd accessed bit might remain cleared and the pte
> accessed bits won't be checked. While this is sufficient for
> reclaim-driven aging, where the goal is to select a reasonably cold
> page, the access can be missed when aging proactively for measuring the
> working set size of a node/memcg.
>
> Since force_scan disables various other optimizations, we check
> force_scan to ignore the non-leaf pmd accessed bit.
>
> Signed-off-by: Yuanchu Xie <yuanchu@google.com>
> ---
>  mm/vmscan.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 4f9c854ce6cc..1a7c7d537db6 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -3522,7 +3522,7 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
>  
>  		walk->mm_stats[MM_NONLEAF_TOTAL]++;
>  
> -		if (should_clear_pmd_young()) {
> +		if (!walk->force_scan && should_clear_pmd_young()) {
>  			if (!pmd_young(val))
>  				continue;

Sorry, I don't understand why we need this.  If !pmd_young(val), we
don't need to update the generation.  If pmd_young(val), the bloom
filter will be ignored if force_scan == true.  Or am I missing something?

--
Best Regards,
Huang, Ying
Yuanchu Xie April 9, 2024, 10:36 p.m. UTC | #2
On Mon, Apr 8, 2024 at 11:52 PM Huang, Ying <ying.huang@intel.com> wrote:
>
> Yuanchu Xie <yuanchu@google.com> writes:
>
> Sorry, I don't understand why we need this.  If !pmd_young(val), we
> don't need to update the generation.  If pmd_young(val), the bloom
> filter will be ignored if force_scan == true.  Or am I missing something?
If !pmd_young(val), we still might need to update the generation.

The get_pfn_folio function returns NULL if the folio's nid != node under
scanning, so the pte accessed bit does not get cleared and the generation
is not updated. Now the pmd_young flag of this pmd is cleared, and if none
of the pte's are accessed before another round of scanning occurs on the
folio's node, the pmd_young check fails and the pte accessed bit is
skipped.

This is fine for kswapd but can introduce inaccuracies when scanning
proactively for workingset estimation.

Thanks,
Yuanchu
Huang, Ying April 10, 2024, 6:15 a.m. UTC | #3
Yuanchu Xie <yuanchu@google.com> writes:

> On Mon, Apr 8, 2024 at 11:52 PM Huang, Ying <ying.huang@intel.com> wrote:
>>
>> Yuanchu Xie <yuanchu@google.com> writes:
>>
>> Sorry, I don't understand why we need this.  If !pmd_young(val), we
>> don't need to update the generation.  If pmd_young(val), the bloom
>> filter will be ignored if force_scan == true.  Or am I missing something?
> If !pmd_young(val), we still might need to update the generation.
>
> The get_pfn_folio function returns NULL if the folio's nid != node under
> scanning, so the pte accessed bit does not get cleared and the generation
> is not updated. Now the pmd_young flag of this pmd is cleared, and if none
> of the pte's are accessed before another round of scanning occurs on the
> folio's node, the pmd_young check fails and the pte accessed bit is
> skipped.
>
> This is fine for kswapd but can introduce inaccuracies when scanning
> proactively for workingset estimation.

Got it!  Thanks for the detailed explanation.  Can you add more details to
the patch description too?

It's unfortunate, because the PMD young check helps scanning performance
a lot.  It doesn't need to be done in this patchset, but I hope we can
find some way to get it back at some point.

--
Best Regards,
Huang, Ying

Patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4f9c854ce6cc..1a7c7d537db6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3522,7 +3522,7 @@  static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
 
 		walk->mm_stats[MM_NONLEAF_TOTAL]++;
 
-		if (should_clear_pmd_young()) {
+		if (!walk->force_scan && should_clear_pmd_young()) {
 			if (!pmd_young(val))
 				continue;
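
The missed-access scenario discussed in this thread can be illustrated with
a small user-space toy model. This is only a sketch under stated
assumptions, not kernel code: the toy_pte and toy_pmd structs and the
toy_access(), toy_walk() and toy_access_seen() helpers are hypothetical
names invented for illustration, and the model collapses walk_pmd_range(),
get_pfn_folio() and the generation update into a few lines. It assumes one
pmd whose ptes map folios on two nodes, a first reclaim-driven walk on
node 0, and a second walk on node 1 that either honors or ignores the
non-leaf pmd accessed bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PTES_PER_PMD 4	/* toy size, chosen only to keep the model small */

/* Toy stand-ins for illustration; these are NOT the kernel's types. */
struct toy_pte {
	bool young;	/* models the pte accessed bit */
	int nid;	/* node of the folio this pte maps */
	int gen;	/* models the folio generation, bumped when an access is seen */
};

struct toy_pmd {
	bool young;	/* models the non-leaf pmd accessed bit */
	struct toy_pte pte[PTES_PER_PMD];
};

/* Model a memory access: both the pte and the non-leaf pmd bit get set. */
static void toy_access(struct toy_pmd *pmd, size_t i)
{
	assert(i < PTES_PER_PMD);
	pmd->pte[i].young = true;
	pmd->young = true;
}

/*
 * Model one aging walk over a single pmd on behalf of node `nid`.
 * Returns the number of folios whose generation was updated.
 */
static int toy_walk(struct toy_pmd *pmd, int nid, bool force_scan)
{
	int updated = 0;

	/* Assumed simplification of the check changed by the patch. */
	if (!force_scan) {
		if (!pmd->young)
			return 0;	/* pmd looks idle: skip all of its ptes */
		pmd->young = false;	/* cleared regardless of which node the folios are on */
	}

	for (size_t i = 0; i < PTES_PER_PMD; i++) {
		struct toy_pte *pte = &pmd->pte[i];

		if (pte->nid != nid)
			continue;	/* folio on another node: pte bit left untouched */
		if (!pte->young)
			continue;
		pte->young = false;
		pte->gen++;
		updated++;
	}
	return updated;
}

/*
 * Scenario from the thread: a folio on node 1 is accessed, a reclaim-driven
 * walk runs for node 0, then a walk runs for node 1 with the given
 * force_scan setting. Returns true if the node 1 walk sees the access.
 */
bool toy_access_seen(bool force_scan)
{
	struct toy_pmd pmd = { .young = false };	/* remaining fields zeroed */

	pmd.pte[2].nid = 1;		/* one folio lives on node 1 */
	toy_access(&pmd, 2);

	toy_walk(&pmd, 0, false);	/* node 0 walk clears the pmd bit only */
	return toy_walk(&pmd, 1, force_scan) == 1;
}
```

In this model, toy_access_seen(false) reports the access as missed because
the node 0 walk already cleared the pmd bit, while toy_access_seen(true)
finds it because the pte bits are checked unconditionally. The cost Huang,
Ying points out is also visible: with force_scan set, every pte under the
pmd is examined even when the pmd was never touched.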