[v4,0/6] Memory Hierarchy: Enable target node lookups for reserved memory

Message ID 157966227494.2508551.7206194169374588977.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive)

Message

Dan Williams Jan. 22, 2020, 3:04 a.m. UTC
Changes since v3 [1]:
- Cleanup numa_map_to_online_node() to remove redundant "if
  (!node_online(node))" (Aneesh)

[1]: http://lore.kernel.org/r/157954696789.2239526.17707265517154476652.stgit@dwillia2-desk3.amr.corp.intel.com

---

Merge notes:

x86 folks: This has an ack from Rafael for ACPI, and Michael for Power.
With an x86 ack I plan to take this through the libnvdimm tree provided
the x86 touches look ok to you.

---

Cover:

Arrange for platform NUMA info to be preserved for determining
'target_node' data, where a 'target_node' is the node that a reserved
memory range will belong to when it is onlined.

This new infrastructure is expected to be more valuable over time for
Memory Tiers / Hierarchy management as more platforms (via the ACPI HMAT
and EFI Specific Purpose Memory) publish reserved or "soft-reserved"
ranges to Linux. Linux system administrators will expect to be able to
interact with those ranges with a unique numa node number when/if that
memory is onlined via the dax_kmem driver [2].

One configuration that currently fails to properly convey the target
node for the resulting memory hotplug operation is persistent memory
defined by the memmap=nn!ss parameter. For example, today if node1 is a
memory only node, and all the memory from node1 is specified to
memmap=nn!ss and subsequently onlined, it will end up being onlined as
node0 memory. As it stands, memory_add_physaddr_to_nid() can only
identify online nodes, and since node1 in this example has no online
CPUs or memory, the target node is initialized to node0.

The fix is to preserve rather than discard the numa_meminfo entries that
are relevant for reserved memory ranges, and to uplevel the node
distance helper for determining the "local" (closest) node relative to
an initiator node.
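
As a rough illustration of the "preserve rather than discard" idea, here
is a minimal userspace sketch. It is not the code from the patches: the
struct layout, the example tables, and the names meminfo_to_nid() and
model_phys_to_target_node() are assumptions made up for this example.
It models node1 as a memory-only node whose whole range was handed to
memmap=nn!ss, so that range appears only in the reserved table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical stand-in for one meminfo entry: [start, end) -> nid. */
struct memblk {
	uint64_t start;
	uint64_t end;
	int nid;
};

struct meminfo {
	size_t nr_blks;
	const struct memblk *blk;
};

/* Example layout (assumed): node0 is online RAM, node1 is all reserved. */
static const struct memblk example_online[] = {
	{ 0x0ULL, 0x100000000ULL, 0 },
};
static const struct memblk example_reserved[] = {
	{ 0x100000000ULL, 0x200000000ULL, 1 },
};

/* Return the node of the block containing addr, or NUMA_NO_NODE. */
static int meminfo_to_nid(const struct meminfo *mi, uint64_t addr)
{
	size_t i;

	assert(mi != NULL);
	for (i = 0; i < mi->nr_blks; i++)
		if (mi->blk[i].start <= addr && addr < mi->blk[i].end)
			return mi->blk[i].nid;
	return NUMA_NO_NODE;
}

/*
 * Model of a range-to-target_node lookup: try the table of online
 * memory first, then fall back to the preserved reserved-range table.
 * Passing reserved == NULL models discarding those entries.
 */
static int model_phys_to_target_node(const struct meminfo *online,
				     const struct meminfo *reserved,
				     uint64_t addr)
{
	int nid = meminfo_to_nid(online, addr);

	if (nid != NUMA_NO_NODE || reserved == NULL)
		return nid;
	return meminfo_to_nid(reserved, addr);
}
```

With the reserved table discarded (NULL), an address inside node1's
range finds no match and the caller is left to pick a default, which is
how the range ends up attributed to node0. With the table preserved,
the same address resolves to node1.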

[2]: https://pmem.io/ndctl/daxctl-reconfigure-device.html

---

Dan Williams (6):
      ACPI: NUMA: Up-level "map to online node" functionality
      mm/numa: Skip NUMA_NO_NODE and online nodes in numa_map_to_online_node()
      powerpc/papr_scm: Switch to numa_map_to_online_node()
      x86/mm: Introduce CONFIG_KEEP_NUMA
      x86/numa: Provide a range-to-target_node lookup facility
      libnvdimm/e820: Retrieve and populate correct 'target_node' info


 arch/powerpc/platforms/pseries/papr_scm.c |   21 --------
 arch/x86/Kconfig                          |    1 
 arch/x86/mm/numa.c                        |   74 +++++++++++++++++++++++------
 drivers/acpi/numa/srat.c                  |   41 ----------------
 drivers/nvdimm/e820.c                     |   18 ++-----
 include/linux/acpi.h                      |   23 +++++++++
 include/linux/numa.h                      |   23 +++++++++
 mm/Kconfig                                |    5 ++
 mm/mempolicy.c                            |   31 ++++++++++++
 9 files changed, 145 insertions(+), 92 deletions(-)
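
For readers unfamiliar with the "map to online node" helper named in
the first three patches, the idea can be sketched in userspace as
follows. This is a model under stated assumptions, not the kernel
implementation: struct model_topology, MODEL_MAX_NODES, and
model_map_to_online_node() are hypothetical names, and the distance
table stands in for whatever the platform firmware reports.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)
#define MODEL_MAX_NODES 4	/* hypothetical small, fixed topology */

struct model_topology {
	bool online[MODEL_MAX_NODES];
	int distance[MODEL_MAX_NODES][MODEL_MAX_NODES];
};

/*
 * Return node unchanged if it is NUMA_NO_NODE or already online,
 * otherwise return the online node with the smallest distance to it.
 * If no node is online at all, node is returned as-is.
 */
static int model_map_to_online_node(const struct model_topology *t, int node)
{
	int min_dist = INT_MAX, min_node = node, n;

	assert(t != NULL);
	assert(node == NUMA_NO_NODE || (node >= 0 && node < MODEL_MAX_NODES));

	/* One early return covers both "no node" and "already online". */
	if (node == NUMA_NO_NODE || t->online[node])
		return node;

	for (n = 0; n < MODEL_MAX_NODES; n++) {
		if (!t->online[n])
			continue;
		if (t->distance[node][n] < min_dist) {
			min_dist = t->distance[node][n];
			min_node = n;
		}
	}
	return min_node;
}
```

The single early return reflects the v4 changelog item: once offline
nodes are the only ones that reach the loop, a separate
"if (!node_online(node))" test around it would be redundant.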

Comments

Dan Williams Feb. 13, 2020, 2:28 a.m. UTC | #1
On Tue, Jan 21, 2020 at 7:20 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> Changes since v3 [1]:
> - Cleanup numa_map_to_online_node() to remove redundant "if
>   (!node_online(node))" (Aneesh)
>
> [1]: http://lore.kernel.org/r/157954696789.2239526.17707265517154476652.stgit@dwillia2-desk3.amr.corp.intel.com
>
> ---
>
> Merge notes:
>
> x86 folks: This has an ack from Rafael for ACPI, and Michael for Power.
> With an x86 ack I plan to take this through the libnvdimm tree provided
> the x86 touches look ok to you.

Ping x86 folks. There are no additional changes identified for this
series. Can I request an ack to take it through libnvdimm.git? Do you
need a resend?

    x86/mm: Introduce CONFIG_KEEP_NUMA
    x86/numa: Provide a range-to-target_node lookup facility


Ingo Molnar Feb. 13, 2020, 9:37 a.m. UTC | #2
* Dan Williams <dan.j.williams@intel.com> wrote:

> On Tue, Jan 21, 2020 at 7:20 PM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > Changes since v3 [1]:
> > - Cleanup numa_map_to_online_node() to remove redundant "if
> >   (!node_online(node))" (Aneesh)
> >
> > [1]: http://lore.kernel.org/r/157954696789.2239526.17707265517154476652.stgit@dwillia2-desk3.amr.corp.intel.com
> >
> > ---
> >
> > Merge notes:
> >
> > x86 folks: This has an ack from Rafael for ACPI, and Michael for Power.
> > With an x86 ack I plan to take this through the libnvdimm tree provided
> > the x86 touches look ok to you.
> 
> Ping x86 folks. There are no additional changes identified for this
> series. Can I request an ack to take it through libnvdimm.git? Do you
> need a resend?
> 
>     x86/mm: Introduce CONFIG_KEEP_NUMA
>     x86/numa: Provide a range-to-target_node lookup facility

If the minor complaints I outlined are addressed:

Reviewed-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo