[v2,0/9] mm: introduce Designated Movable Blocks

Message ID 20220928223301.375229-1-opendmb@gmail.com (mailing list archive)

Doug Berger Sept. 28, 2022, 10:32 p.m. UTC
MOTIVATION:
Some Broadcom devices (e.g. 7445, 7278) contain multiple memory
controllers with each mapped in a different address range within
a Uniform Memory Architecture. Some users of these systems have
expressed the desire to locate ZONE_MOVABLE memory on each
memory controller to allow user space intensive processing to
make better use of the additional memory bandwidth.
Unfortunately, the historical monotonic layout of zones would
mean that if the lowest addressed memory controller contains
ZONE_MOVABLE memory then all of the memory available from
memory controllers at higher addresses must also be in the
ZONE_MOVABLE zone. This would force all kernel memory accesses
onto the lowest addressed memory controller and significantly
reduce the amount of memory available for non-movable
allocations.

The main objective of this patch set is therefore to allow a
block of memory to be designated as part of the ZONE_MOVABLE
zone, where the kernel page allocator will only ever use it to
satisfy requests for movable pages. The term Designated Movable
Block is introduced here to represent such a block. The favored
implementation extends the 'movablecore' kernel parameter to
accept a base address and to support multiple blocks. The
existing 'movablecore' mechanisms are retained.
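
As a rough illustration of what "a size plus a base address" could
look like to a parser, here is a small user-space sketch. It is
not taken from the patches: the 'size@base' spelling, the K/M/G
suffix handling, struct block_spec and both function names are
assumptions made for this example only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical description of one block; not a structure from the series. */
struct block_spec {
	uint64_t size;     /* bytes */
	uint64_t base;     /* bytes; meaningful only when has_base is true */
	bool     has_base;
};

/*
 * Parse a number with an optional K/M/G suffix, similar in spirit to
 * the kernel's memparse(). *end is left equal to s if no digits were
 * consumed.
 */
static uint64_t parse_size(const char *s, char **end)
{
	uint64_t v = strtoull(s, end, 0);

	if (*end == s)
		return 0;

	switch (**end) {
	case 'G':
	case 'g':
		v <<= 10;
		/* fall through */
	case 'M':
	case 'm':
		v <<= 10;
		/* fall through */
	case 'K':
	case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/* Accept "size" or "size@base" (assumed syntax); reject anything else. */
static bool parse_block_spec(const char *spec, struct block_spec *out)
{
	char *end;

	assert(spec != NULL && out != NULL);

	out->size = parse_size(spec, &end);
	if (end == spec || out->size == 0)
		return false;

	out->has_base = (*end == '@');
	out->base = 0;
	if (out->has_base) {
		const char *base_str = end + 1;

		out->base = parse_size(base_str, &end);
		if (end == base_str)
			return false;
	}
	return *end == '\0';
}
```

Under these assumptions a string without '@' keeps the existing
size-only meaning, while one with '@' pins the block to a base
address. Multiple blocks would simply be multiple such strings.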

BACKGROUND:
NUMA architectures support distributing movablecore memory
across each node, but it is undesirable to introduce the
overhead and complexities of NUMA on systems that don't have a
Non-Uniform Memory Architecture.

Commit 342332e6a925 ("mm/page_alloc.c: introduce kernelcore=mirror option")
also depends on zone overlap to support systems with multiple
mirrored ranges.

Commit c6f03e2903c9 ("mm, memory_hotplug: remove zone restrictions")
embraced overlapped zones for memory hotplug.

This commit set follows their lead to allow the ZONE_MOVABLE
zone to overlap other zones while spanning the pages from the
lowest Designated Movable Block to the end of the node.
Designated Movable Blocks are made absent from overlapping zones
and present within the ZONE_MOVABLE zone.
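
The accounting described above can be modelled in a few lines of
user-space C. This is only a sketch of the arithmetic, not code
from the series: struct pfn_range and the three helpers are
hypothetical, the blocks are assumed not to overlap each other,
the zones are assumed to have no other holes, and ZONE_MOVABLE is
assumed to contain nothing but Designated Movable Blocks.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open page frame number range [start, end); hypothetical type. */
struct pfn_range {
	unsigned long start;
	unsigned long end;
};

/*
 * Span of ZONE_MOVABLE: from the lowest block to the end of the node.
 * Requires at least one block.
 */
static struct pfn_range movable_span(const struct pfn_range *blocks,
				     size_t n, unsigned long node_end)
{
	struct pfn_range span;
	size_t i;

	assert(blocks != NULL && n > 0);

	span.start = blocks[0].start;
	span.end = node_end;
	for (i = 1; i < n; i++)
		if (blocks[i].start < span.start)
			span.start = blocks[i].start;
	return span;
}

/* Number of pages of 'zone' that are covered by the blocks. */
static unsigned long pages_in_blocks(struct pfn_range zone,
				     const struct pfn_range *blocks, size_t n)
{
	unsigned long pages = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long lo = blocks[i].start > zone.start ?
				   blocks[i].start : zone.start;
		unsigned long hi = blocks[i].end < zone.end ?
				   blocks[i].end : zone.end;

		if (lo < hi)
			pages += hi - lo;
	}
	return pages;
}

/* Present pages of an overlapped zone: its span minus the block pages. */
static unsigned long present_outside_blocks(struct pfn_range zone,
					    const struct pfn_range *blocks,
					    size_t n)
{
	return (zone.end - zone.start) - pages_in_blocks(zone, blocks, n);
}
```

With two blocks at pfns [0x40000, 0x50000) and [0xc0000, 0xd0000)
in a node ending at pfn 0x100000, the movable span in this model
starts at 0x40000, 0x20000 pages are present in ZONE_MOVABLE, and
a zone spanning the whole node keeps 0xe0000 present pages.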

I initially investigated an implementation using a Designated
Movable migrate type in line with comments[1] made by Mel Gorman
regarding a "sticky" MIGRATE_MOVABLE type to avoid using
ZONE_MOVABLE. However, this approach was riskier since it was
much more intrusive in the allocation paths. Ultimately, the
progress made by the memory hotplug folks to expand the
ZONE_MOVABLE functionality convinced me to follow this approach.

OTHER OPPORTUNITIES:
CMA introduced a paradigm where multiple allocators could
operate on the same region of memory, and that paradigm can be
extended to Designated Movable Blocks as well. I was interested
in using kernel resource management as a mechanism for exposing
Designated Movable Block resources (e.g. /proc/iomem) that would
be used by the kernel page allocator like any other ZONE_MOVABLE
memory, but could be claimed by an alternative allocator (e.g.
CMA). Unfortunately, this becomes complicated because the kernel
resource implementation varies materially across different
architectures, and since I do not require this capability I
have deferred it.

The Devicetree Specification includes support for specifying
reserved memory regions with a 'reusable' property to allow
sharing of the reserved memory between device drivers and the
OS. This is in line with the paradigm introduced by CMA, but is
currently only used by 'shared-dma-pool' compatible reserved
memory regions. Linux could choose to use Designated Movable
Blocks as the default mechanism for other 'reusable' reserved
memory. Device drivers that own 'reusable' reserved memory could
use the dmb_intersects() function introduced here to determine
whether memory requires reclamation from the OS before use and
could use the alloc/free_contig_range() functions to perform the
reclamation and release of memory needed by the device. The CMA
allocator API could be another candidate for device driver
reclamation, but it is not currently exposed for use by device
drivers in modules.
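
To make the driver-side check concrete, the sketch below models
the kind of question an intersection test answers. It is a
user-space illustration only: this letter does not describe the
signature or return values of dmb_intersects(), so the enum, the
struct and classify_range() are hypothetical stand-ins, and the
blocks are assumed to be non-overlapping and non-adjacent.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open page frame number range [start, end); hypothetical type. */
struct pfn_range {
	unsigned long start;
	unsigned long end;
};

/* Hypothetical result codes, not the ones used by the series. */
enum range_overlap {
	RANGE_DISJOINT,	/* touches no block: nothing to reclaim */
	RANGE_INSIDE,	/* lies entirely within one block */
	RANGE_MIXED	/* partly inside and partly outside a block */
};

/* Classify [start, end) against a list of blocks. */
static enum range_overlap classify_range(const struct pfn_range *blocks,
					 size_t n, unsigned long start,
					 unsigned long end)
{
	size_t i;

	assert(start < end);

	for (i = 0; i < n; i++) {
		const struct pfn_range *b = &blocks[i];

		/* No pages in common with this block. */
		if (end <= b->start || start >= b->end)
			continue;

		/* Fully contained in this block. */
		if (start >= b->start && end <= b->end)
			return RANGE_INSIDE;

		/* Straddles a block boundary. */
		return RANGE_MIXED;
	}
	return RANGE_DISJOINT;
}
```

In this model a driver that owns a 'reusable' region would only
need to reclaim pages from the OS when the result is RANGE_INSIDE,
and would treat RANGE_MIXED as a configuration error.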

There have been attempts to modify the behavior of the kernel
page allocator's use of CMA regions (e.g. [1] & [2]). This
implementation of Designated Movable Blocks creates an
opportunity to allow the CMA allocator to operate on
ZONE_MOVABLE memory that the kernel page allocator can use more
aggressively, without affecting users of the existing CMA
implementation. This would be beneficial when memory reuse is
more valuable than the cost of increased latency of CMA
allocations (e.g. hugetlb_cma).

These other opportunities are dependent on the Designated
Movable Block concept introduced here, so I will hold off
submitting any such follow-on proposals until there is movement
on this commit set.

NOTES:
The MEMBLOCK_MOVABLE and MEMBLOCK_HOTPLUG flags have a lot in
common and could potentially be consolidated, but I chose to
avoid that here to reduce controversy.

The CMA and DMB alignment constraints are currently the same so
the logic could be simplified, but this implementation keeps
them distinct to facilitate independent evolution of the
implementations if necessary.

Changes in v2:
  - first three commits upstreamed separately [3], [4], and [5].
  - commits 04-06 submitted separately [6].
  - Corrected errors "Reported-by: kernel test robot <lkp@intel.com>"
  - Deferred commits after 15 to simplify review of the base
    functionality.
  - minor reorganization of commit 13.

v1: https://lore.kernel.org/linux-mm/20220913195508.3511038-1-opendmb@gmail.com/

[1] https://lore.kernel.org/lkml/20160428103927.GM2858@techsingularity.net/
[2] https://lore.kernel.org/lkml/1401260672-28339-1-git-send-email-iamjoonsoo.kim@lge.com
[3] https://lore.kernel.org/linux-mm/20220914023913.1855924-1-zi.yan@sent.com
[4] https://lore.kernel.org/linux-mm/20220823030209.57434-2-linmiaohe@huawei.com
[5] https://lore.kernel.org/linux-mm/20220914190917.3517663-1-opendmb@gmail.com
[6] https://lore.kernel.org/linux-mm/20220921223639.1152392-1-opendmb@gmail.com/

Doug Berger (9):
  lib/show_mem.c: display MovableOnly
  mm/vmstat: show start_pfn when zone spans pages
  mm/page_alloc: calculate node_spanned_pages from pfns
  mm/page_alloc.c: allow oversized movablecore
  mm/page_alloc: introduce init_reserved_pageblock()
  memblock: introduce MEMBLOCK_MOVABLE flag
  mm/dmb: Introduce Designated Movable Blocks
  mm/page_alloc: make alloc_contig_pages DMB aware
  mm/page_alloc: allow base for movablecore

 .../admin-guide/kernel-parameters.txt         |  14 +-
 include/linux/dmb.h                           |  29 ++++
 include/linux/gfp.h                           |   5 +-
 include/linux/memblock.h                      |   8 +
 lib/show_mem.c                                |   2 +-
 mm/Kconfig                                    |  12 ++
 mm/Makefile                                   |   1 +
 mm/cma.c                                      |  15 +-
 mm/dmb.c                                      |  91 ++++++++++
 mm/memblock.c                                 |  30 +++-
 mm/page_alloc.c                               | 155 ++++++++++++++----
 mm/vmstat.c                                   |   5 +
 12 files changed, 321 insertions(+), 46 deletions(-)
 create mode 100644 include/linux/dmb.h
 create mode 100644 mm/dmb.c