Message ID | 20210910124628.6261-1-justin.he@arm.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v2] device-dax: use fallback nid when numa node is invalid |
On Fri, Sep 10, 2021 at 5:46 AM Jia He <justin.he@arm.com> wrote:
>
> Previously, numa_off was set unconditionally in dummy_numa_init()
> even with a fake numa node. Then ACPI sets node id as NUMA_NO_NODE(-1)
> after acpi_map_pxm_to_node() because it regards numa_off as turning
> off the numa node. Hence dev_dax->target_node is NUMA_NO_NODE on
> arm64 with fake numa case.
>
> Without this patch, pmem can't be probed as RAM devices on arm64 if
> SRAT table isn't present:
> $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -a 64K
> kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with invalid node: -1
> kmem: probe of dax0.0 failed with error -22
>
> This fixes it by using fallback memory_add_physaddr_to_nid() as nid.
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Jia He <justin.he@arm.com>
> ---
> v2: - rebase it based on David's "memory group" patch.
>     - drop the changes in dev_dax_kmem_remove() since nid had been
>       removed in remove_memory().
>  drivers/dax/kmem.c | 31 +++++++++++++++++--------------
>  1 file changed, 17 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index a37622060fff..e4836eb7539e 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -47,20 +47,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>  	unsigned long total_len = 0;
>  	struct dax_kmem_data *data;
>  	int i, rc, mapped = 0;
> -	int numa_node;
> -
> -	/*
> -	 * Ensure good NUMA information for the persistent memory.
> -	 * Without this check, there is a risk that slow memory
> -	 * could be mixed in a node with faster memory, causing
> -	 * unavoidable performance issues.
> -	 */
> -	numa_node = dev_dax->target_node;
> -	if (numa_node < 0) {
> -		dev_warn(dev, "rejecting DAX region with invalid node: %d\n",
> -				numa_node);
> -		return -EINVAL;
> -	}
> +	int numa_node = dev_dax->target_node;
>
>  	for (i = 0; i < dev_dax->nr_range; i++) {
>  		struct range range;
> @@ -71,6 +58,22 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>  					i, range.start, range.end);
>  			continue;
>  		}
> +
> +		/*
> +		 * Ensure good NUMA information for the persistent memory.
> +		 * Without this check, there is a risk but not fatal that slow
> +		 * memory could be mixed in a node with faster memory, causing
> +		 * unavoidable performance issues. Warn this and use fallback
> +		 * node id.
> +		 */
> +		if (numa_node < 0) {
> +			int new_node = memory_add_physaddr_to_nid(range.start);
> +
> +			dev_info(dev, "changing nid from %d to %d for DAX region [%#llx-%#llx]\n",
> +				 numa_node, new_node, range.start, range.end);
> +			numa_node = new_node;
> +		}
> +
>  		total_len += range_len(&range);

This fallback change belongs where the parent region for the namespace
adopts its target_node, because it's not clear
memory_add_physaddr_to_nid() is the right fallback in all situations.
Here is where this setting is happening currently:

drivers/acpi/nfit/core.c:3004: ndr_desc->target_node = pxm_to_node(spa->proximity_domain);
drivers/acpi/nfit/core.c:3007: ndr_desc->target_node = NUMA_NO_NODE;
drivers/nvdimm/e820.c:29: ndr_desc.target_node = nid;
drivers/nvdimm/of_pmem.c:58: ndr_desc.target_node = ndr_desc.numa_node;
drivers/nvdimm/region_devs.c:1127: nd_region->target_node = ndr_desc->target_node;

...where is this pmem region originating on this arm64 platform?
Hi Dan,

> -----Original Message-----
> From: Dan Williams <dan.j.williams@intel.com>
> Sent: Friday, September 10, 2021 11:42 PM
> To: Justin He <Justin.He@arm.com>
> Cc: Vishal Verma <vishal.l.verma@intel.com>; Dave Jiang
> <dave.jiang@intel.com>; David Hildenbrand <david@redhat.com>; Linux NVDIMM
> <nvdimm@lists.linux.dev>; Linux Kernel Mailing List
> <linux-kernel@vger.kernel.org>
> Subject: Re: [PATCH v2] device-dax: use fallback nid when numa node is
> invalid
>
> On Fri, Sep 10, 2021 at 5:46 AM Jia He <justin.he@arm.com> wrote:
[..]
> This fallback change belongs where the parent region for the namespace
> adopts its target_node, because it's not clear
> memory_add_physaddr_to_nid() is the right fallback in all situations.
> Here is where this setting is happening currently:
>
> drivers/acpi/nfit/core.c:3004: ndr_desc->target_node =
> pxm_to_node(spa->proximity_domain);

On my local arm64 guest ('virt' machine type), the target_node is
set to -1 at this line.
That is:
The condition "spa->flags & ACPI_NFIT_PROXIMITY_VALID" is hit.

> drivers/acpi/nfit/core.c:3007: ndr_desc->target_node =
> NUMA_NO_NODE;
> drivers/nvdimm/e820.c:29: ndr_desc.target_node = nid;
> drivers/nvdimm/of_pmem.c:58: ndr_desc.target_node =
> ndr_desc.numa_node;
> drivers/nvdimm/region_devs.c:1127: nd_region->target_node =
> ndr_desc->target_node;

Sorry, Dan. I thought I missed your previous mail:

=========================================
Looks like it is the NFIT driver, thanks.

If you're getting NUMA_NO_NODE in dax_kmem from the NFIT driver it
means your ACPI NFIT table is failing to populate correct numa
information. You could try the following to fix it up, but I think the
real problem is that your platform BIOS needs to add the proper numa
data.

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index fb775b967c52..d3a0cec635b1 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -3005,15 +3005,8 @@ static int acpi_nfit_register_region(struct acpi_nfit_desc *acpi_desc,
 	ndr_desc->res = &res;
 	ndr_desc->provider_data = nfit_spa;
 	ndr_desc->attr_groups = acpi_nfit_region_attribute_groups;
-	if (spa->flags & ACPI_NFIT_PROXIMITY_VALID) {
-		ndr_desc->numa_node = acpi_map_pxm_to_online_node(
-				spa->proximity_domain);
-		ndr_desc->target_node = acpi_map_pxm_to_node(
-				spa->proximity_domain);
-	} else {
-		ndr_desc->numa_node = NUMA_NO_NODE;
-		ndr_desc->target_node = NUMA_NO_NODE;
-	}
+	ndr_desc->numa_node = memory_add_physaddr_to_nid(spa->address);
+	ndr_desc->target_node = phys_to_target_node(spa->address);

 	/*
 	 * Persistence domain bits are hierarchical, if
===================================================

Do you still suggest fixing like this?

--
Cheers,
Justin (Jia He)
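The case reported above (proximity flag valid, node still -1) can be modelled in a few lines. This is a minimal user-space sketch in C, not kernel code. It assumes, as the patch description states, that the PXM-to-node lookup returns NUMA_NO_NODE whenever NUMA is treated as off. Every demo_* name and the 4-entry map are hypothetical stand-ins chosen for illustration.

```c
#include <stdbool.h>

#define NUMA_NO_NODE (-1)
#define DEMO_MAX_PXM 4

/* Hypothetical identity map from proximity domain to node id. */
static const int demo_pxm_map[DEMO_MAX_PXM] = { 0, 1, 2, 3 };

/*
 * Stand-in for the PXM lookup: an out-of-range domain or numa_off
 * both yield NUMA_NO_NODE (assumption taken from the thread).
 */
static int demo_map_pxm_to_node(int pxm, bool numa_off)
{
	if (pxm < 0 || pxm >= DEMO_MAX_PXM || numa_off)
		return NUMA_NO_NODE;
	return demo_pxm_map[pxm];
}

/*
 * Stand-in for the region setup: a valid proximity flag is not
 * enough, because the lookup itself can still fail.
 */
static int demo_region_target_node(bool proximity_valid, int pxm,
				   bool numa_off)
{
	if (proximity_valid)
		return demo_map_pxm_to_node(pxm, numa_off);
	return NUMA_NO_NODE;
}
```

With numa_off true, demo_region_target_node(true, 0, true) yields NUMA_NO_NODE even though the flag is valid, which matches the situation described for the arm64 guest.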
On Mon, Sep 13, 2021 at 7:06 PM Justin He <Justin.He@arm.com> wrote:
[..]
> > drivers/acpi/nfit/core.c:3004: ndr_desc->target_node =
> > pxm_to_node(spa->proximity_domain);
> On my local arm64 guest ('virt' machine type), the target_node is
> set to -1 at this line.
> That is:
> The condition "spa->flags & ACPI_NFIT_PROXIMITY_VALID" is hit.
[..]
> Do you still suggest fixing like this?

Are you saying that ACPI_NFIT_PROXIMITY_VALID is not set on your
platform, or that pxm_to_node() returns NUMA_NO_NODE?

I would expect something like this:

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index a3ef6cce644c..95de7dc18ed8 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -3007,6 +3007,15 @@ static int acpi_nfit_register_region(struct acpi_nfit_desc *acpi_desc,
 		ndr_desc->target_node = NUMA_NO_NODE;
 	}

+	/*
+	 * Fallback to address based numa information if node lookup
+	 * failed
+	 */
+	if (ndr_desc->numa_node == NUMA_NO_NODE)
+		ndr_desc->numa_node = memory_add_physaddr_to_nid(spa->address);
+	if (ndr_desc->target_node == NUMA_NO_NODE)
+		ndr_desc->target_node = phys_to_target_node(spa->address);
+
 	/*
 	 * Persistence domain bits are hierarchical, if
 	 * ACPI_NFIT_CAPABILITY_CACHE_FLUSH is set then
> -----Original Message-----
> From: Dan Williams <dan.j.williams@intel.com>
> Sent: Wednesday, September 15, 2021 1:16 PM
> To: Justin He <Justin.He@arm.com>
> Cc: Vishal Verma <vishal.l.verma@intel.com>; Dave Jiang
> <dave.jiang@intel.com>; David Hildenbrand <david@redhat.com>; Linux NVDIMM
> <nvdimm@lists.linux.dev>; Linux Kernel Mailing List
> <linux-kernel@vger.kernel.org>; nd <nd@arm.com>
> Subject: Re: [PATCH v2] device-dax: use fallback nid when numa node is
> invalid
>
> On Mon, Sep 13, 2021 at 7:06 PM Justin He <Justin.He@arm.com> wrote:
[..]
> Are you saying that ACPI_NFIT_PROXIMITY_VALID is not set on your
> platform, or that pxm_to_node() returns NUMA_NO_NODE?
>

The latter. ACPI_NFIT_PROXIMITY_VALID is *set* in my case.

> I would expect something like this:
>
> diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
> index a3ef6cce644c..95de7dc18ed8 100644
> --- a/drivers/acpi/nfit/core.c
> +++ b/drivers/acpi/nfit/core.c
> @@ -3007,6 +3007,15 @@ static int acpi_nfit_register_region(struct acpi_nfit_desc *acpi_desc,
>  		ndr_desc->target_node = NUMA_NO_NODE;
>  	}
>
> +	/*
> +	 * Fallback to address based numa information if node lookup
> +	 * failed
> +	 */
> +	if (ndr_desc->numa_node == NUMA_NO_NODE)
> +		ndr_desc->numa_node = memory_add_physaddr_to_nid(spa->address);
> +	if (ndr_desc->target_node == NUMA_NO_NODE)
> +		ndr_desc->target_node = phys_to_target_node(spa->address);
> +

Would it be better to add a dev_info() here to report this node id change?

--
Cheers,
Justin (Jia He)
On Tue, Sep 14, 2021 at 11:51 PM Justin He <Justin.He@arm.com> wrote:
[..]
> > Are you saying that ACPI_NFIT_PROXIMITY_VALID is not set on your
> > platform, or that pxm_to_node() returns NUMA_NO_NODE?
> >
> The latter. ACPI_NFIT_PROXIMITY_VALID is *set* in my case.
>
> > I would expect something like this:
> >
> > diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
> > index a3ef6cce644c..95de7dc18ed8 100644
> > --- a/drivers/acpi/nfit/core.c
> > +++ b/drivers/acpi/nfit/core.c
> > @@ -3007,6 +3007,15 @@ static int acpi_nfit_register_region(struct acpi_nfit_desc *acpi_desc,
> >  		ndr_desc->target_node = NUMA_NO_NODE;
> >  	}
> >
> > +	/*
> > +	 * Fallback to address based numa information if node lookup
> > +	 * failed
> > +	 */
> > +	if (ndr_desc->numa_node == NUMA_NO_NODE)
> > +		ndr_desc->numa_node = memory_add_physaddr_to_nid(spa->address);
> > +	if (ndr_desc->target_node == NUMA_NO_NODE)
> > +		ndr_desc->target_node = phys_to_target_node(spa->address);
> > +
>
> Would it be better to add a dev_info() here to report this node id change?

Yes, given all the possibilities here, a dev_info() reporting the
final result of the node mapping is justifiable.
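The outcome agreed on here (fall back to address-based lookup at region setup, then report the final mapping) could look roughly like the following user-space sketch in C. It is not the code that was eventually merged. The demo_* helpers are hypothetical stand-ins for memory_add_physaddr_to_nid() and phys_to_target_node(), hard-wired to node 0 as one might assume on a single-node guest, and the message text is invented for illustration.

```c
#include <stdio.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical, trimmed-down view of a region descriptor. */
struct demo_region_desc {
	int numa_node;
	int target_node;
};

/* Stand-in for memory_add_physaddr_to_nid(); assumes one node. */
static int demo_physaddr_to_nid(unsigned long long addr)
{
	(void)addr;
	return 0;
}

/* Stand-in for phys_to_target_node(); assumes one node. */
static int demo_phys_to_target_node(unsigned long long addr)
{
	(void)addr;
	return 0;
}

/*
 * Fill in only the ids whose lookup failed, then format the final
 * mapping the way a dev_info() call might report it. Returns how
 * many ids were replaced.
 */
static int demo_fixup_nodes(struct demo_region_desc *desc,
			    unsigned long long addr,
			    char *msg, size_t len)
{
	int changed = 0;

	if (desc->numa_node == NUMA_NO_NODE) {
		desc->numa_node = demo_physaddr_to_nid(addr);
		changed++;
	}
	if (desc->target_node == NUMA_NO_NODE) {
		desc->target_node = demo_phys_to_target_node(addr);
		changed++;
	}
	snprintf(msg, len, "numa_node: %d target_node: %d",
		 desc->numa_node, desc->target_node);
	return changed;
}
```

Reporting the final pair rather than each individual change keeps the log to one line per region regardless of which lookup failed.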
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index a37622060fff..e4836eb7539e 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -47,20 +47,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
 	unsigned long total_len = 0;
 	struct dax_kmem_data *data;
 	int i, rc, mapped = 0;
-	int numa_node;
-
-	/*
-	 * Ensure good NUMA information for the persistent memory.
-	 * Without this check, there is a risk that slow memory
-	 * could be mixed in a node with faster memory, causing
-	 * unavoidable performance issues.
-	 */
-	numa_node = dev_dax->target_node;
-	if (numa_node < 0) {
-		dev_warn(dev, "rejecting DAX region with invalid node: %d\n",
-				numa_node);
-		return -EINVAL;
-	}
+	int numa_node = dev_dax->target_node;

 	for (i = 0; i < dev_dax->nr_range; i++) {
 		struct range range;
@@ -71,6 +58,22 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
 					i, range.start, range.end);
 			continue;
 		}
+
+		/*
+		 * Ensure good NUMA information for the persistent memory.
+		 * Without this check, there is a risk but not fatal that slow
+		 * memory could be mixed in a node with faster memory, causing
+		 * unavoidable performance issues. Warn this and use fallback
+		 * node id.
+		 */
+		if (numa_node < 0) {
+			int new_node = memory_add_physaddr_to_nid(range.start);
+
+			dev_info(dev, "changing nid from %d to %d for DAX region [%#llx-%#llx]\n",
+				 numa_node, new_node, range.start, range.end);
+			numa_node = new_node;
+		}
+
 		total_len += range_len(&range);
 	}
Previously, numa_off was set unconditionally in dummy_numa_init()
even with a fake numa node. Then ACPI sets node id as NUMA_NO_NODE(-1)
after acpi_map_pxm_to_node() because it regards numa_off as turning
off the numa node. Hence dev_dax->target_node is NUMA_NO_NODE on
arm64 with fake numa case.

Without this patch, pmem can't be probed as RAM devices on arm64 if
SRAT table isn't present:
$ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -a 64K
kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with invalid node: -1
kmem: probe of dax0.0 failed with error -22

This fixes it by using fallback memory_add_physaddr_to_nid() as nid.

Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Jia He <justin.he@arm.com>
---
v2: - rebase it based on David's "memory group" patch.
    - drop the changes in dev_dax_kmem_remove() since nid had been
      removed in remove_memory().
 drivers/dax/kmem.c | 31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)
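
For readers who want the probe-time behaviour of this v2 patch in isolation, here is a minimal user-space sketch in C. It is a simplification, not the driver code: it assumes every range has already passed validation, and demo_physaddr_to_nid() is a hypothetical stand-in for memory_add_physaddr_to_nid() that splits a made-up two-node layout at 4 GiB.

```c
#include <assert.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical equivalent of the kernel's inclusive address range. */
struct demo_range {
	unsigned long long start;
	unsigned long long end;
};

/* Stand-in lookup: addresses below 4 GiB are node 0, the rest node 1. */
static int demo_physaddr_to_nid(unsigned long long addr)
{
	return addr < 0x100000000ULL ? 0 : 1;
}

/*
 * Keep a valid target node as-is. Otherwise derive the node from the
 * start of the first range; later ranges then reuse that node, as in
 * the patch, because the variable is overwritten once.
 */
static int demo_pick_node(int target_node,
			  const struct demo_range *ranges, size_t nr)
{
	int node = target_node;
	size_t i;

	assert(ranges != NULL || nr == 0);
	for (i = 0; i < nr; i++) {
		if (node < 0)
			node = demo_physaddr_to_nid(ranges[i].start);
	}
	return node;
}
```

One consequence visible in the sketch: a device whose ranges straddle the hypothetical node boundary still ends up on a single node, chosen by its first range. That is part of why the review asks whether this is the right fallback in all situations.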