| Message ID | 20231222102255.56993-4-ryncsn@gmail.com (mailing list archive) |
| --- | --- |
| State | New |
| Series | mm, lru_gen: batch update pages when aging |
On Fri, Dec 22, 2023 at 3:24 AM Kairui Song <ryncsn@gmail.com> wrote:
>
> From: Kairui Song <kasong@tencent.com>
>
> Prefetch for inactive/active LRU have been long exiting, apply the same
> optimization for MGLRU.

I seriously doubt that prefetch helps in this case.

Willy, any thoughts on this? Thanks.

> Tested in a 4G memcg on a EPYC 7K62 with:
>
>   memcached -u nobody -m 16384 -s /tmp/memcached.socket \
>     -a 0766 -t 16 -B binary &
>
>   memtier_benchmark -S /tmp/memcached.socket \
>     -P memcache_binary -n allkeys \
>     --key-minimum=1 --key-maximum=16000000 -d 1024 \
>     --ratio=1:0 --key-pattern=P:P -c 2 -t 16 --pipeline 8 -x 6
>
> Average result of 18 test runs:
>
> Before: 44017.78 Ops/sec
> After patch 1-3: 44890.50 Ops/sec (+1.8%)
>
> Signed-off-by: Kairui Song <kasong@tencent.com>
> ---
>  mm/vmscan.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index af1266129c1b..1e9d69e18443 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -3764,10 +3764,12 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
>                  VM_WARN_ON_ONCE_FOLIO(folio_is_file_lru(folio) != type, folio);
>                  VM_WARN_ON_ONCE_FOLIO(folio_zonenum(folio) != zone, folio);
>
> -                if (unlikely(list_is_first(&folio->lru, head)))
> +                if (unlikely(list_is_first(&folio->lru, head))) {
>                          prev = NULL;
> -                else
> +                } else {
>                          prev = lru_to_folio(&folio->lru);
> +                        prefetchw(&prev->flags);
> +                }
>
>                  new_gen = folio_inc_gen(lruvec, folio, false, &batch);
>                  lru_gen_try_inc_bulk(lrugen, folio, bulk_gen, new_gen, type, zone, &batch);
> @@ -4434,10 +4436,12 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
>                  VM_WARN_ON_ONCE_FOLIO(folio_zonenum(folio) != zone, folio);
>
>                  scanned += delta;
> -                if (unlikely(list_is_first(&folio->lru, head)))
> +                if (unlikely(list_is_first(&folio->lru, head))) {
>                          prev = NULL;
> -                else
> +                } else {
>                          prev = lru_to_folio(&folio->lru);
> +                        prefetchw(&prev->flags);
> +                }
>
>                  if (sort_folio(lruvec, folio, sc, tier, bulk_gen, &batch))
>                          sorted += delta;
> --
> 2.43.0
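For context on the primitive being debated: `prefetchw()` hints that the given address is about to be written, so the cache line can be pulled in exclusive state ahead of the store. Below is a simplified sketch of the generic fallback and of the long-standing active/inactive LRU helper the commit message alludes to, adapted from include/linux/prefetch.h and mm/vmscan.c; exact definitions vary by kernel version:

```c
/* Generic fallback, simplified from include/linux/prefetch.h:
 * hint that *x will be written soon ("1" = write intent). */
#ifndef ARCH_HAS_PREFETCHW
#define prefetchw(x) __builtin_prefetch(x, 1)
#endif

/*
 * The long-existing active/inactive LRU helper, simplified from
 * mm/vmscan.c: the scan walks the list tail-to-head, so before
 * processing the current folio, prefetch the flags word of the
 * folio that will be visited next.
 */
#define prefetchw_prev_lru_folio(_folio, _base, _field)         \
        do {                                                    \
                if ((_folio)->lru.prev != (_base)) {            \
                        struct folio *prev;                     \
                                                                \
                        prev = lru_to_folio(&(_folio)->lru);    \
                        prefetchw(&prev->_field);               \
                }                                               \
        } while (0)
```

The patch in this thread open-codes the same prev-entry pattern in the two MGLRU walks rather than reusing the macro.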
On Sun, Dec 24, 2023 at 11:41 PM Yu Zhao <yuzhao@google.com> wrote:
>
> On Fri, Dec 22, 2023 at 3:24 AM Kairui Song <ryncsn@gmail.com> wrote:
> >
> > From: Kairui Song <kasong@tencent.com>
> >
> > Prefetch for inactive/active LRU have been long exiting, apply the same
> > optimization for MGLRU.
>
> I seriously doubt that prefetch helps in this case.
>
> Willy, any thoughts on this? Thanks.
>
> > Tested in a 4G memcg on a EPYC 7K62 with:
> >
> >   memcached -u nobody -m 16384 -s /tmp/memcached.socket \
> >     -a 0766 -t 16 -B binary &
> >
> >   memtier_benchmark -S /tmp/memcached.socket \
> >     -P memcache_binary -n allkeys \
> >     --key-minimum=1 --key-maximum=16000000 -d 1024 \
> >     --ratio=1:0 --key-pattern=P:P -c 2 -t 16 --pipeline 8 -x 6
> >
> > Average result of 18 test runs:
> >
> > Before: 44017.78 Ops/sec
> > After patch 1-3: 44890.50 Ops/sec (+1.8%)

This patch itself only brought a 0.17% "improvement", which I'm 99.999%
sure is just noise.
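Whether a 0.17% delta clears the noise floor is checkable rather than arguable, given the per-run numbers (which the thread does not include): a Welch t-statistic over the two sets of runs gives a quick answer. A minimal userspace sketch follows; the sample arrays are synthetic placeholders, not real measurements:

```c
/*
 * noise_check.c: Welch's t-statistic for two sets of benchmark runs.
 * Build: cc -O2 noise_check.c -lm -o noise_check
 *
 * The arrays below are SYNTHETIC placeholders (the thread reports only
 * the means); substitute the real per-run Ops/sec values.
 */
#include <math.h>
#include <stdio.h>

static void mean_var(const double *x, int n, double *mean, double *var)
{
        double m = 0.0, s = 0.0;

        for (int i = 0; i < n; i++)
                m += x[i];
        m /= n;
        for (int i = 0; i < n; i++)
                s += (x[i] - m) * (x[i] - m);
        *mean = m;
        *var = s / (n - 1);     /* unbiased sample variance */
}

int main(void)
{
        /* Hypothetical per-run throughputs (Ops/sec). */
        double base[]  = { 44010, 43950, 44120, 44080, 43890, 44055 };
        double patch[] = { 44100, 43980, 44060, 44200, 43920, 44190 };
        int n = 6;
        double m1, v1, m2, v2;

        mean_var(base, n, &m1, &v1);
        mean_var(patch, n, &m2, &v2);

        double t = (m2 - m1) / sqrt(v1 / n + v2 / n);

        printf("delta = %+.3f%%, t = %.2f\n", 100.0 * (m2 - m1) / m1, t);
        /* Rule of thumb: |t| below roughly 2 means the delta is not
         * distinguishable from run-to-run noise at the 95% level. */
        return 0;
}
```

On the placeholder data this prints a delta near +0.13% with t around 1.0, i.e. indistinguishable from noise, which illustrates the objection above.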
On Sun, Dec 24, 2023 at 11:41:31PM -0700, Yu Zhao wrote:
> On Fri, Dec 22, 2023 at 3:24 AM Kairui Song <ryncsn@gmail.com> wrote:
> >
> > From: Kairui Song <kasong@tencent.com>
> >
> > Prefetch for inactive/active LRU have been long exiting, apply the same
> > optimization for MGLRU.
>
> I seriously doubt that prefetch helps in this case.
>
> Willy, any thoughts on this? Thanks.

It _might_ ... highly depends on microarchitecture. My experience is
that it offers more benefit on AMD than on Intel, but that experience
is several generations out of date and it may just not be applicable to
modern AMD.

It's probably more effective on ARM Cortex A cores than on ARM Cortex X
cores ... maybe we can get someone from Android (Suren?) to do some
testing?
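Short of a full Android run, one way to probe the microarchitecture question is a userspace pointer-chasing microbenchmark: walk a randomly shuffled linked list and optionally issue a write-intent prefetch for the next node before touching the current one. A rough sketch using the portable `__builtin_prefetch`; node layout, list size, and the shuffle are arbitrary assumptions, and results will differ widely across CPUs, which is exactly the point being made:

```c
/*
 * prefetch_walk.c: crude pointer-chasing microbenchmark.
 * Build: cc -O2 prefetch_walk.c -o prefetch_walk
 *
 * Run each variant several times and interleave them; back-to-back
 * runs share cache and TLB state, so a single pair of numbers is
 * only suggestive.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct node {
        struct node *prev;
        unsigned long flags;
        char pad[48];           /* pad to roughly one cache line */
};

#define N (1 << 20)

static double walk_ns(struct node *tail, int use_prefetch)
{
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (struct node *n = tail; n;) {
                struct node *prev = n->prev;

                if (use_prefetch && prev)
                        __builtin_prefetch(&prev->flags, 1);    /* write intent */
                n->flags |= 1;          /* stand-in for the aging write */
                n = prev;
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        return (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
}

int main(void)
{
        struct node *nodes = calloc(N, sizeof(*nodes));
        size_t *order = malloc(N * sizeof(*order));

        if (!nodes || !order)
                return 1;

        /* Shuffle the link order so the walk defeats the hardware
         * prefetcher, mimicking an LRU list scattered across memory. */
        for (size_t i = 0; i < N; i++)
                order[i] = i;
        srand(42);
        for (size_t i = N - 1; i > 0; i--) {
                size_t j = rand() % (i + 1);
                size_t tmp = order[i];

                order[i] = order[j];
                order[j] = tmp;
        }
        for (size_t i = 1; i < N; i++)
                nodes[order[i]].prev = &nodes[order[i - 1]];

        struct node *tail = &nodes[order[N - 1]];

        printf("plain:    %.1f ms\n", walk_ns(tail, 0) / 1e6);
        printf("prefetch: %.1f ms\n", walk_ns(tail, 1) / 1e6);
        return 0;
}
```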
On Mon, Dec 25, 2023 at 7:42 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Sun, Dec 24, 2023 at 11:41:31PM -0700, Yu Zhao wrote:
> > On Fri, Dec 22, 2023 at 3:24 AM Kairui Song <ryncsn@gmail.com> wrote:
> > >
> > > From: Kairui Song <kasong@tencent.com>
> > >
> > > Prefetch for inactive/active LRU have been long exiting, apply the same
> > > optimization for MGLRU.
> >
> > I seriously doubt that prefetch helps in this case.
> >
> > Willy, any thoughts on this? Thanks.
>
> It _might_ ... highly depends on microarchitecture. My experience is
> that it offers more benefit on AMD than on Intel, but that experience
> is several generations out of date and it may just not be applicable to
> modern AMD.
>
> It's probably more effective on ARM Cortex A cores than on ARM Cortex X
> cores ... maybe we can get someone from Android (Suren?) to do some
> testing?

Android is quite noisy and I'm afraid a small improvement like this
would not be distinguishable from noise unless it's much more
pronounced. I'll take a stab but don't hold your breath.
```diff
diff --git a/mm/vmscan.c b/mm/vmscan.c
index af1266129c1b..1e9d69e18443 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3764,10 +3764,12 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
                 VM_WARN_ON_ONCE_FOLIO(folio_is_file_lru(folio) != type, folio);
                 VM_WARN_ON_ONCE_FOLIO(folio_zonenum(folio) != zone, folio);
 
-                if (unlikely(list_is_first(&folio->lru, head)))
+                if (unlikely(list_is_first(&folio->lru, head))) {
                         prev = NULL;
-                else
+                } else {
                         prev = lru_to_folio(&folio->lru);
+                        prefetchw(&prev->flags);
+                }
 
                 new_gen = folio_inc_gen(lruvec, folio, false, &batch);
                 lru_gen_try_inc_bulk(lrugen, folio, bulk_gen, new_gen, type, zone, &batch);
@@ -4434,10 +4436,12 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
                 VM_WARN_ON_ONCE_FOLIO(folio_zonenum(folio) != zone, folio);
 
                 scanned += delta;
-                if (unlikely(list_is_first(&folio->lru, head)))
+                if (unlikely(list_is_first(&folio->lru, head))) {
                         prev = NULL;
-                else
+                } else {
                         prev = lru_to_folio(&folio->lru);
+                        prefetchw(&prev->flags);
+                }
 
                 if (sort_folio(lruvec, folio, sc, tier, bulk_gen, &batch))
                         sorted += delta;
```
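Structurally, both hunks exploit the fact that the loop already has to capture the previous (next-to-be-visited) folio before processing the current one, because `folio_inc_gen()` or `sort_folio()` may move the current folio to another list; the prefetch piggybacks on that load. A distilled, runnable sketch of the loop shape, in plain C with hypothetical `entry`/`process()` names, not the kernel code itself:

```c
/*
 * pattern.c: distilled shape of the two loops patched above.
 * Build: cc -O2 pattern.c -o pattern
 */
#include <stdio.h>

struct entry {
        struct entry *prev, *next;
        unsigned long flags;
};

/* Stand-in for folio_inc_gen()/sort_folio(): writes e->flags and, in
 * the kernel, may also relink e onto another list, which is why the
 * next victim must already have been captured. */
static void process(struct entry *e)
{
        e->flags |= 1;
}

static void walk_tail_to_head(struct entry *head)
{
        struct entry *e = head->prev;           /* start at the tail */

        while (e != head) {
                struct entry *prev;

                if (e->prev == head) {          /* list_is_first() */
                        prev = NULL;
                } else {
                        prev = e->prev;
                        /* Write-intent prefetch of the flags word the
                         * next iteration will modify. */
                        __builtin_prefetch(&prev->flags, 1);
                }

                process(e);     /* may relink e; prev stays valid */

                if (!prev)
                        break;
                e = prev;
        }
}

int main(void)
{
        struct entry head, a, b, c;

        /* Tiny circular list: head <-> a <-> b <-> c <-> head */
        head.next = &a; a.prev = &head;
        a.next = &b;    b.prev = &a;
        b.next = &c;    c.prev = &b;
        c.next = &head; head.prev = &c;
        a.flags = b.flags = c.flags = 0;

        walk_tail_to_head(&head);
        printf("%lu %lu %lu\n", a.flags, b.flags, c.flags);     /* 1 1 1 */
        return 0;
}
```

The `prev = NULL` sentinel for the first entry mirrors the kernel loops: it lets the walk terminate without re-reading a pointer that `process()` may have invalidated, so the only added work in the patch is the prefetch itself.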