[RFC,v1,0/7] Support arch-specific page aging mechanism

Message ID: 20230402104240.1734931-1-aneesh.kumar@linux.ibm.com

Aneesh Kumar K.V, April 2, 2023, 10:42 a.m. UTC

Architectures like powerpc support a page access count mechanism that can be
used to better identify hot and cold pages in the system. POWER10 supports a
32-bit page access count that is incremented on page access and decremented
based on time decay. Because the count is incremented based on physical address
filtering, it should capture accesses made through page tables (mmap) as well
as through the read/write syscalls.
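
The increment-and-decay behaviour described above can be modelled in a few
lines. This is only a software sketch for intuition, not the POWER10 hardware
interface: the struct, the function names, the saturating increment and the
fixed-step decay are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical software model of a 32-bit per-page access counter. */
struct page_counter {
	uint32_t count;
};

/* Record one access; saturate at UINT32_MAX instead of wrapping (assumed). */
static void counter_access(struct page_counter *pc)
{
	if (pc->count != UINT32_MAX)
		pc->count++;
}

/*
 * Apply one decay tick. The real decay policy is not described in the
 * cover letter; a fixed-step decrement floored at zero is assumed here.
 */
static void counter_decay(struct page_counter *pc, uint32_t step)
{
	if (pc->count > step)
		pc->count -= step;
	else
		pc->count = 0;
}
```

With this model, a page that stops being accessed drifts back to a count of
zero after enough decay ticks, which is what lets the count separate currently
hot pages from pages that were hot only in the past.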

This patch series updates multi-gen LRU to use this page access count, instead
of the page table reference bit, to classify a page into a generation. Pages are
classified into generations during the sorting phase of reclaim. Currently the
sorting phase uses generation details stored in page flags; with this change we
can avoid using page flags for storing the generation, which frees the 3 bits in
page flags used for that purpose. Since the page access counting mechanism can
also count accesses via read/write, we can look at dropping the tier index from
page flags as well. That should free the 2 bits in page flags used for REFS
(this is not done in this series).
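
To make the classification step concrete, here is a minimal sketch of mapping
an access count to a generation index at sort time, so that nothing needs to be
stored in page flags. The function name, the threshold table and the choice of
four generations are assumptions for illustration; the series itself may
classify pages differently.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_GENS 4	/* assumption: four generations, 0 = coldest */

/*
 * Map an access count to a generation index. thresholds[] must be in
 * ascending order; thresholds[i] is the minimum count needed to be
 * placed in generation i + 1. Hypothetical helper, not from the series.
 */
static unsigned int count_to_gen(uint32_t count,
				 const uint32_t thresholds[NR_GENS - 1])
{
	unsigned int gen = 0;

	for (size_t i = 0; i < NR_GENS - 1; i++) {
		if (count >= thresholds[i])
			gen = (unsigned int)i + 1;
	}
	return gen;
}
```

For example, with thresholds of 4, 16 and 64, a page with a count of 15 lands
in generation 1 and a page with a count of 1000 lands in generation 3.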

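I also added a patch that makes the change below, so that shrink_folio_list()
skips the page table reference check when the architecture supports page access
counting: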
@@ -5243,7 +5243,8 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 	if (list_empty(&list))
 		return scanned;
 retry:
-	reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false);
+	reclaimed = shrink_folio_list(&list, pgdat, sc,
+				      &stat, arch_supports_page_access_count());
 	sc->nr_reclaimed += reclaimed;

Performance did improve, but this resulted in a large increase in
workingset_refault_anon. I think this is because it takes some minimum number of
accesses before a page is classified into a younger generation, and we can see
high page refaults during that window.

PATCH 2 did result in some improvement on powerpc because it removes all the
additional code that is not used in page classification.

memcached:
Patch details                 Total Ops/sec
mglru                         160821
PATCH 2                       164572

mongodb:
Patch details                 Throughput (Ops/sec)
mglru                         92987
PATCH 2                       93740

Enabling the architecture-supported page access count does impact workload
performance, since updating the access count involves some memory access
overhead. Another challenge with the page access count is determining the
relative hotness of pages. I tried two methods, density-based clustering and
k-means clustering, to classify pages into LRU generations based on sampled
hotness. Doing more work during page classification increases lock contention
on lru_lock and hence hurts performance.
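
As an illustration of the k-means approach mentioned above, the following is a
minimal one-dimensional k-means (Lloyd's algorithm) over sampled access counts.
It is a sketch under stated assumptions, not the code from this series: the
function names, the choice of four clusters, the evenly spread initial
centroids and the use of floating point (which kernel code would normally
avoid) are all illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_CLUSTERS 4	/* assumption: one cluster per generation */

/* Return the index of the centroid nearest to v. */
static unsigned int nearest(double v, const double c[NR_CLUSTERS])
{
	unsigned int best = 0;
	double best_d = v > c[0] ? v - c[0] : c[0] - v;

	for (unsigned int k = 1; k < NR_CLUSTERS; k++) {
		double d = v > c[k] ? v - c[k] : c[k] - v;

		if (d < best_d) {
			best_d = d;
			best = k;
		}
	}
	return best;
}

/*
 * Cluster n sampled counts into NR_CLUSTERS groups. On return, label[i]
 * holds the cluster index of samples[i] and c[] holds the centroids.
 * Centroids start in ascending order, so index 0 is the coldest cluster.
 */
static void kmeans_1d(const uint32_t *samples, size_t n,
		      unsigned int *label, double c[NR_CLUSTERS], int max_iter)
{
	uint32_t lo, hi;

	if (n == 0)
		return;

	lo = hi = samples[0];
	for (size_t i = 1; i < n; i++) {
		if (samples[i] < lo)
			lo = samples[i];
		if (samples[i] > hi)
			hi = samples[i];
	}

	/* Spread the initial centroids evenly across [lo, hi]. */
	for (unsigned int k = 0; k < NR_CLUSTERS; k++)
		c[k] = lo + (double)(hi - lo) * (k + 0.5) / NR_CLUSTERS;

	for (int iter = 0; iter < max_iter; iter++) {
		double sum[NR_CLUSTERS] = { 0 };
		size_t cnt[NR_CLUSTERS] = { 0 };
		int changed = 0;

		/* Assignment step: attach each sample to its nearest centroid. */
		for (size_t i = 0; i < n; i++) {
			unsigned int k = nearest(samples[i], c);

			if (iter == 0 || label[i] != k)
				changed = 1;
			label[i] = k;
			sum[k] += samples[i];
			cnt[k]++;
		}

		/* Update step: move each non-empty centroid to its mean. */
		for (unsigned int k = 0; k < NR_CLUSTERS; k++) {
			if (cnt[k])
				c[k] = sum[k] / cnt[k];
		}

		if (!changed)
			break;
	}
}
```

Each iteration walks every sample, which gives a sense of why running this kind
of classification while holding lru_lock adds to lock hold time.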


memcached:
Patch details                       Total Ops/sec
arch page access count              161940
avoid folio_check_reference         171631 (refault count increased: 2606765 -> 7793482)

mongodb:
Patch details                       Throughput (Ops/sec)
arch page access count              92533
avoid folio_check_reference         91105 (refault count increased: 828951 -> 4592539)

The patch series shows that using the page access count does not result in any
regression and can keep the code simpler with respect to the different feedback
loops used during multi-gen LRU reclaim. It also saves some bits in
page->flags. The overhead of counting page accesses was observed to be modest,
and it can be mitigated by further tuning of the page generation classification
logic. This also lets us start looking at using the page access count in other
parts of the Linux kernel, such as page promotion. I haven't been able to
measure the impact on page promotion yet because of limited hardware
availability.


Aneesh Kumar K.V (7):
  mm: Move some code around so that next patch is simpler
  mm: Don't build multi-gen LRU page table walk code on architecture not
    supported
  mm: multi-gen LRU: avoid using generation stored in page flags for
    generation
  mm: multi-gen LRU: support different page aging mechanism
  powerpc/mm: Add page access count support
  powerpc/mm: Clear page access count on allocation
  mm: multi-gen LRU: Shrink folio list without checking for page table
    reference

 arch/Kconfig                          |   3 +
 arch/arm64/Kconfig                    |   1 +
 arch/powerpc/Kconfig                  |  10 +
 arch/powerpc/include/asm/hca.h        |  49 ++++
 arch/powerpc/include/asm/page.h       |   5 +
 arch/powerpc/include/asm/page_aging.h |  35 +++
 arch/powerpc/mm/Makefile              |   1 +
 arch/powerpc/mm/hca.c                 | 288 ++++++++++++++++++++
 arch/x86/Kconfig                      |   1 +
 include/linux/memcontrol.h            |   2 +-
 include/linux/mm_inline.h             |  47 +---
 include/linux/mm_types.h              |   8 +-
 include/linux/mmzone.h                |  15 +-
 include/linux/page_aging.h            |  43 +++
 include/linux/swap.h                  |   2 +-
 kernel/fork.c                         |   2 +-
 mm/Kconfig                            |   4 +
 mm/memcontrol.c                       |   2 +-
 mm/rmap.c                             |   4 +-
 mm/vmscan.c                           | 372 ++++++++++++++++++++++----
 20 files changed, 780 insertions(+), 114 deletions(-)
 create mode 100644 arch/powerpc/include/asm/hca.h
 create mode 100644 arch/powerpc/include/asm/page_aging.h
 create mode 100644 arch/powerpc/mm/hca.c
 create mode 100644 include/linux/page_aging.h