Message ID | 158318759687.2216124.4684754859068906007.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive) |
---|---|
Series | Manual definition of Soft Reserved memory devices |
Dan Williams <dan.j.williams@intel.com> writes:

> Given the current dearth of systems that supply an ACPI HMAT table, and
> the utility of being able to manually define device-dax "hmem" instances
> via the efi_fake_mem= option, relax the requirements for creating these
> devices. Specifically, add an option (numa=nohmat) to optionally disable
> consideration of the HMAT and update efi_fake_mem= to behave like
> memmap=nn!ss in terms of delimiting device boundaries.

So, am I correct in deducing that your primary motivation is testing
without hardware/firmware support? This looks like a bit of a hack to
me, and I think maybe it would be better to just emulate the HMAT using
qemu. I don't have a strong objection, though.

-Jeff

>
> All review welcome of course, but the E820 changes want an x86
> maintainer ack, the efi_fake_mem update needs Ard, and Rafael has
> previously shepherded the HMAT changes. For the changes to
> kernel/resource.c, where there is no clear maintainer, I just copied the
> last few people to make thoughtful changes in that area. I am happy to
> take these through the nvdimm tree along with these prerequisites
> already in -next:
>
> b2ca916ce392 ACPI: NUMA: Up-level "map to online node" functionality
> 4fcbe96e4d0b mm/numa: Skip NUMA_NO_NODE and online nodes in numa_map_to_online_node()
> 575e23b6e13c powerpc/papr_scm: Switch to numa_map_to_online_node()
> 1e5d8e1e47af x86/mm: Introduce CONFIG_NUMA_KEEP_MEMINFO
> 5d30f92e7631 x86/NUMA: Provide a range-to-target_node lookup facility
> 7b27a8622f80 libnvdimm/e820: Retrieve and populate correct 'target_node' info
>
> Tested with:
>
>     numa=nohmat efi_fake_mem=4G@9G:0x40000,4G@13G:0x40000
>
> ...to create to device-dax instances:
>
> # daxctl list -RDu
> [
>   {
>     "path":"\/platform\/hmem.1",
>     "id":1,
>     "size":"4.00 GiB (4.29 GB)",
>     "align":2097152,
>     "devices":[
>       {
>         "chardev":"dax1.0",
>         "size":"4.00 GiB (4.29 GB)",
>         "target_node":3,
>         "mode":"devdax"
>       }
>     ]
>   },
>   {
>     "path":"\/platform\/hmem.0",
>     "id":0,
>     "size":"4.00 GiB (4.29 GB)",
>     "align":2097152,
>     "devices":[
>       {
>         "chardev":"dax0.0",
>         "size":"4.00 GiB (4.29 GB)",
>         "target_node":2,
>         "mode":"devdax"
>       }
>     ]
>   }
> ]
>
> ---
>
> Dan Williams (5):
>       ACPI: NUMA: Add 'nohmat' option
>       efi/fake_mem: Arrange for a resource entry per efi_fake_mem instance
>       ACPI: HMAT: Refactor hmat_register_target_device to hmem_register_device
>       resource: Report parent to walk_iomem_res_desc() callback
>       ACPI: HMAT: Attach a device for each soft-reserved range
>
>
>  arch/x86/kernel/e820.c              | 16 +++++-
>  arch/x86/mm/numa.c                  | 4 +
>  drivers/acpi/numa/hmat.c            | 71 +++-----------------------
>  drivers/dax/Kconfig                 | 5 ++
>  drivers/dax/Makefile                | 3 -
>  drivers/dax/hmem/Makefile           | 6 ++
>  drivers/dax/hmem/device.c           | 97 +++++++++++++++++++++++++++++++++++
>  drivers/dax/hmem/hmem.c             | 2 -
>  drivers/firmware/efi/x86_fake_mem.c | 12 +++-
>  include/acpi/acpi_numa.h            | 1
>  include/linux/dax.h                 | 8 +++
>  kernel/resource.c                   | 1
>  12 files changed, 156 insertions(+), 70 deletions(-)
>  create mode 100644 drivers/dax/hmem/Makefile
>  create mode 100644 drivers/dax/hmem/device.c
>  rename drivers/dax/{hmem.c => hmem/hmem.c} (98%)
>
> base-commit: 7b27a8622f802761d5c6abd6c37b22312a35343c
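
Editorial illustration (not part of the thread): the "Tested with" command line in the quoted cover letter passes two `size@start:attribute` entries to efi_fake_mem=. The Python sketch below decodes such a string into physical address ranges so the resulting layout is easier to see. It is not the kernel's parser: the function name `parse_fake_mem` is hypothetical, only the K/M/G suffixes that appear in the example are handled, and the end address is computed as inclusive. The value 0x40000 is the EFI_MEMORY_SP ("specific purpose") attribute bit from the UEFI specification.

```python
import re

# Multipliers for the size suffixes used in the example (simplified assumption:
# only K, M and G are handled here).
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

# EFI_MEMORY_SP ("specific purpose") attribute bit from the UEFI specification.
EFI_MEMORY_SP = 0x40000

# One entry has the form size@start:attribute, e.g. 4G@9G:0x40000.
_ENTRY = re.compile(
    r"^(\d+)([KMG]?)@(\d+)([KMG]?):(0[xX][0-9a-fA-F]+|[1-9]\d*|0)$"
)


def parse_fake_mem(spec):
    """Hypothetical helper: return (start, end_inclusive, attribute) tuples."""
    ranges = []
    for entry in spec.split(","):
        match = _ENTRY.match(entry.strip())
        if match is None:
            raise ValueError(f"malformed entry: {entry!r}")
        size = int(match.group(1)) * _UNITS[match.group(2)]
        start = int(match.group(3)) * _UNITS[match.group(4)]
        attribute = int(match.group(5), 0)  # base 0 accepts hex or decimal
        ranges.append((start, start + size - 1, attribute))
    return ranges


if __name__ == "__main__":
    for start, end, attribute in parse_fake_mem("4G@9G:0x40000,4G@13G:0x40000"):
        soft_reserved = bool(attribute & EFI_MEMORY_SP)
        print(f"{start:#x}-{end:#x} soft-reserved={soft_reserved}")
```

Decoded this way, the two entries describe back-to-back 4 GiB ranges, which appears to be why the cover letter stresses delimiting device boundaries per entry: the daxctl listing shows two separate hmem devices rather than one merged 8 GiB device.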
On Fri, Mar 6, 2020 at 12:07 PM Jeff Moyer <jmoyer@redhat.com> wrote:
>
> Dan Williams <dan.j.williams@intel.com> writes:
>
> > Given the current dearth of systems that supply an ACPI HMAT table, and
> > the utility of being able to manually define device-dax "hmem" instances
> > via the efi_fake_mem= option, relax the requirements for creating these
> > devices. Specifically, add an option (numa=nohmat) to optionally disable
> > consideration of the HMAT and update efi_fake_mem= to behave like
> > memmap=nn!ss in terms of delimiting device boundaries.
>
> So, am I correct in deducing that your primary motivation is testing
> without hardware/firmware support?

My primary motivation is making the dax_kmem facility useful to
shipping platforms that have performance differentiated memory, but
may not have EFI-defined soft-reservations / HMAT (or
non-EFI-ACPI-platform equivalent). I'm anticipating HMAT enabled
platforms where the platform firmware policy for what is
soft-reserved, or not, is not the policy the system owner would pick.
I'd also highlight Joao's work [1] (see the TODO section) as an
indication of the demand for custom carving memory resources and
applying the device-dax memory management interface.

> This looks like a bit of a hack to
> me, and I think maybe it would be better to just emulate the HMAT using
> qemu. I don't have a strong objection, though.

Yeah, qemu emulation does not help when you, the system owner, have a
different use case than what the bare-metal platform-firmware
envisioned for "specific-purpose memory".

[1]: https://lore.kernel.org/lkml/20200110190313.17144-1-joao.m.martins@oracle.com/