[3/3] mm, lru_gen: try to prefetch next page when scanning LRU

Message ID 20231222102255.56993-4-ryncsn@gmail.com
State New
Series mm, lru_gen: batch update pages when aging

Commit Message

Kairui Song Dec. 22, 2023, 10:22 a.m. UTC
From: Kairui Song <kasong@tencent.com>

Prefetch for the inactive/active LRU has long existed; apply the same
optimization to MGLRU.

Tested in a 4G memcg on an EPYC 7K62 with:

  memcached -u nobody -m 16384 -s /tmp/memcached.socket \
    -a 0766 -t 16 -B binary &

  memtier_benchmark -S /tmp/memcached.socket \
    -P memcache_binary -n allkeys \
    --key-minimum=1 --key-maximum=16000000 -d 1024 \
    --ratio=1:0 --key-pattern=P:P -c 2 -t 16 --pipeline 8 -x 6

Average result of 18 test runs:

Before:           44017.78 Ops/sec
After patch 1-3:  44890.50 Ops/sec (+2.0%)

Signed-off-by: Kairui Song <kasong@tencent.com>
---
 mm/vmscan.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)
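
For context, the pattern applied here is the classic next-node
write-prefetch already used by the active/inactive LRU scan loops.
A minimal sketch of the idea in plain C (names are illustrative, not
kernel code; the kernel's prefetchw() falls back to the same
__builtin_prefetch(x, 1) on most architectures):

struct node {
	struct node *next;
	unsigned long flags;		/* written by process_node() */
};

/* Hypothetical stand-in for the per-folio work done in the loop. */
static void process_node(struct node *n)
{
	n->flags |= 1UL;
}

static void walk(struct node *head)
{
	for (struct node *n = head; n; n = n->next) {
		/*
		 * Write-prefetch one hop ahead: by the time the loop
		 * body is done with n, the next node's cache line
		 * should already be present in exclusive state.
		 */
		if (n->next)
			__builtin_prefetch(&n->next->flags, 1, 3);
		process_node(n);
	}
}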

Comments

Yu Zhao Dec. 25, 2023, 6:41 a.m. UTC | #1
On Fri, Dec 22, 2023 at 3:24 AM Kairui Song <ryncsn@gmail.com> wrote:
>
> From: Kairui Song <kasong@tencent.com>
>
> Prefetch for the inactive/active LRU has long existed; apply the same
> optimization to MGLRU.

I seriously doubt that prefetch helps in this case.

Willy, any thoughts on this? Thanks.

> Tested in a 4G memcg on an EPYC 7K62 with:
>
>   memcached -u nobody -m 16384 -s /tmp/memcached.socket \
>     -a 0766 -t 16 -B binary &
>
>   memtier_benchmark -S /tmp/memcached.socket \
>     -P memcache_binary -n allkeys \
>     --key-minimum=1 --key-maximum=16000000 -d 1024 \
>     --ratio=1:0 --key-pattern=P:P -c 2 -t 16 --pipeline 8 -x 6
>
> Average result of 18 test runs:
>
> Before:           44017.78 Ops/sec
> After patch 1-3:  44890.50 Ops/sec (+2.0%)
>
> Signed-off-by: Kairui Song <kasong@tencent.com>
> ---
>  mm/vmscan.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index af1266129c1b..1e9d69e18443 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -3764,10 +3764,12 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
>                         VM_WARN_ON_ONCE_FOLIO(folio_is_file_lru(folio) != type, folio);
>                         VM_WARN_ON_ONCE_FOLIO(folio_zonenum(folio) != zone, folio);
>
> -                       if (unlikely(list_is_first(&folio->lru, head)))
> +                       if (unlikely(list_is_first(&folio->lru, head))) {
>                                 prev = NULL;
> -                       else
> +                       } else {
>                                 prev = lru_to_folio(&folio->lru);
> +                               prefetchw(&prev->flags);
> +                       }
>
>                         new_gen = folio_inc_gen(lruvec, folio, false, &batch);
>                         lru_gen_try_inc_bulk(lrugen, folio, bulk_gen, new_gen, type, zone, &batch);
> @@ -4434,10 +4436,12 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
>                         VM_WARN_ON_ONCE_FOLIO(folio_zonenum(folio) != zone, folio);
>
>                         scanned += delta;
> -                       if (unlikely(list_is_first(&folio->lru, head)))
> +                       if (unlikely(list_is_first(&folio->lru, head))) {
>                                 prev = NULL;
> -                       else
> +                       } else {
>                                 prev = lru_to_folio(&folio->lru);
> +                               prefetchw(&prev->flags);
> +                       }
>
>                         if (sort_folio(lruvec, folio, sc, tier, bulk_gen, &batch))
>                                 sorted += delta;
> --
> 2.43.0
>
Yu Zhao Dec. 25, 2023, 6:54 a.m. UTC | #2
On Sun, Dec 24, 2023 at 11:41 PM Yu Zhao <yuzhao@google.com> wrote:
>
> On Fri, Dec 22, 2023 at 3:24 AM Kairui Song <ryncsn@gmail.com> wrote:
> >
> > From: Kairui Song <kasong@tencent.com>
> >
> > Prefetch for the inactive/active LRU has long existed; apply the same
> > optimization to MGLRU.
>
> I seriously doubt that prefetch helps in this case.
>
> Willy, any thoughts on this? Thanks.
>
> > Tested in a 4G memcg on an EPYC 7K62 with:
> >
> >   memcached -u nobody -m 16384 -s /tmp/memcached.socket \
> >     -a 0766 -t 16 -B binary &
> >
> >   memtier_benchmark -S /tmp/memcached.socket \
> >     -P memcache_binary -n allkeys \
> >     --key-minimum=1 --key-maximum=16000000 -d 1024 \
> >     --ratio=1:0 --key-pattern=P:P -c 2 -t 16 --pipeline 8 -x 6
> >
> > Average result of 18 test runs:
> >
> > Before:           44017.78 Ops/sec
> > After patch 1-3:  44890.50 Ops/sec (+2.0%)

This patch itself only brought a 0.17% "improvement", which I'm
99.999% sure is just noise.
Matthew Wilcox Dec. 25, 2023, 3:42 p.m. UTC | #3
On Sun, Dec 24, 2023 at 11:41:31PM -0700, Yu Zhao wrote:
> On Fri, Dec 22, 2023 at 3:24 AM Kairui Song <ryncsn@gmail.com> wrote:
> >
> > From: Kairui Song <kasong@tencent.com>
> >
> > Prefetch for the inactive/active LRU has long existed; apply the same
> > optimization to MGLRU.
> 
> I seriously doubt that prefetch helps in this case.
> 
> Willy, any thoughts on this? Thanks.

It _might_ ... highly depends on microarchitecture.  My experience is
that it offers more benefit on AMD than on Intel, but that experience
is several generations out of date and it may just not be applicable to
modern AMD.

It's probably more effective on ARM Cortex A cores than on ARM Cortex X
cores ... maybe we can get someone from Android (Suren?) to do some
testing?
Suren Baghdasaryan Dec. 26, 2023, 10:12 p.m. UTC | #4
On Mon, Dec 25, 2023 at 7:42 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Sun, Dec 24, 2023 at 11:41:31PM -0700, Yu Zhao wrote:
> > On Fri, Dec 22, 2023 at 3:24 AM Kairui Song <ryncsn@gmail.com> wrote:
> > >
> > > From: Kairui Song <kasong@tencent.com>
> > >
> > > Prefetch for the inactive/active LRU has long existed; apply the same
> > > optimization to MGLRU.
> >
> > I seriously doubt that prefetch helps in this case.
> >
> > Willy, any thoughts on this? Thanks.
>
> It _might_ ... highly depends on microarchitecture.  My experience is
> that it offers more benefit on AMD than on Intel, but that experience
> is several generations out of date and it may just not be applicable to
> modern AMD.
>
> It's probably more effective on ARM Cortex A cores than on ARM Cortex X
> cores ... maybe we can get someone from Android (Suren?) to do some
> testing?

Android is quite noisy and I'm afraid a small improvement like this
would not be distinguishable from noise unless it's much more
pronounced. I'll take a stab but don't hold your breath.

Patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index af1266129c1b..1e9d69e18443 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3764,10 +3764,12 @@  static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
 			VM_WARN_ON_ONCE_FOLIO(folio_is_file_lru(folio) != type, folio);
 			VM_WARN_ON_ONCE_FOLIO(folio_zonenum(folio) != zone, folio);
 
-			if (unlikely(list_is_first(&folio->lru, head)))
+			if (unlikely(list_is_first(&folio->lru, head))) {
 				prev = NULL;
-			else
+			} else {
 				prev = lru_to_folio(&folio->lru);
+				prefetchw(&prev->flags);
+			}
 
 			new_gen = folio_inc_gen(lruvec, folio, false, &batch);
 			lru_gen_try_inc_bulk(lrugen, folio, bulk_gen, new_gen, type, zone, &batch);
@@ -4434,10 +4436,12 @@  static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 			VM_WARN_ON_ONCE_FOLIO(folio_zonenum(folio) != zone, folio);
 
 			scanned += delta;
-			if (unlikely(list_is_first(&folio->lru, head)))
+			if (unlikely(list_is_first(&folio->lru, head))) {
 				prev = NULL;
-			else
+			} else {
 				prev = lru_to_folio(&folio->lru);
+				prefetchw(&prev->flags);
+			}
 
 			if (sort_folio(lruvec, folio, sc, tier, bulk_gen, &batch))
 				sorted += delta;