Message ID | 158155490897.3343782.14216276134794923581.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | libnvdimm: Cross-arch compatible namespace alignment | expand |
Dan Williams <dan.j.williams@intel.com> writes: > The pmem driver on PowerPC crashes with the following signature when > instantiating misaligned namespaces that map their capacity via > memremap_pages(). > > BUG: Unable to handle kernel data access at 0xc001000406000000 > Faulting instruction address: 0xc000000000090790 > NIP [c000000000090790] arch_add_memory+0xc0/0x130 > LR [c000000000090744] arch_add_memory+0x74/0x130 > Call Trace: > arch_add_memory+0x74/0x130 (unreliable) > memremap_pages+0x74c/0xa30 > devm_memremap_pages+0x3c/0xa0 > pmem_attach_disk+0x188/0x770 > nvdimm_bus_probe+0xd8/0x470 > > With the assumption that only memremap_pages() has alignment > constraints, enforce memremap_compat_align() for > pmem_should_map_pages(), nd_pfn, or nd_dax cases. > > Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> > Cc: Jeff Moyer <jmoyer@redhat.com> > Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> > Link: https://lore.kernel.org/r/158041477336.3889308.4581652885008605170.stgit@dwillia2-desk3.amr.corp.intel.com > Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Dan Williams <dan.j.williams@intel.com> writes: > The pmem driver on PowerPC crashes with the following signature when > instantiating misaligned namespaces that map their capacity via > memremap_pages(). > > BUG: Unable to handle kernel data access at 0xc001000406000000 > Faulting instruction address: 0xc000000000090790 > NIP [c000000000090790] arch_add_memory+0xc0/0x130 > LR [c000000000090744] arch_add_memory+0x74/0x130 > Call Trace: > arch_add_memory+0x74/0x130 (unreliable) > memremap_pages+0x74c/0xa30 > devm_memremap_pages+0x3c/0xa0 > pmem_attach_disk+0x188/0x770 > nvdimm_bus_probe+0xd8/0x470 > > With the assumption that only memremap_pages() has alignment > constraints, enforce memremap_compat_align() for > pmem_should_map_pages(), nd_pfn, or nd_dax cases. > > Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> > Cc: Jeff Moyer <jmoyer@redhat.com> > Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> > Link: https://lore.kernel.org/r/158041477336.3889308.4581652885008605170.stgit@dwillia2-desk3.amr.corp.intel.com > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > --- > drivers/nvdimm/namespace_devs.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c > index 032dc61725ff..aff1f32fdb4f 100644 > --- a/drivers/nvdimm/namespace_devs.c > +++ b/drivers/nvdimm/namespace_devs.c > @@ -1739,6 +1739,16 @@ struct nd_namespace_common *nvdimm_namespace_common_probe(struct device *dev) > return ERR_PTR(-ENODEV); > } > > + if (pmem_should_map_pages(dev) || nd_pfn || nd_dax) { > + struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev); > + resource_size_t start = nsio->res.start; > + > + if (!IS_ALIGNED(start | size, memremap_compat_align())) { > + dev_dbg(&ndns->dev, "misaligned, unable to map\n"); > + return ERR_PTR(-EOPNOTSUPP); > + } > + } > + > if (is_namespace_pmem(&ndns->dev)) { > struct nd_namespace_pmem *nspm; > Actually, I take back my ack. :) This prevents a previously working namespace from being successfully probed/setup. I thought we were only going to enforce the alignment for a newly created namespace? This should only check whether the alignment works for the current platform. -Jeff
On Thu, Feb 13, 2020 at 1:55 PM Jeff Moyer <jmoyer@redhat.com> wrote: > > Dan Williams <dan.j.williams@intel.com> writes: > > > The pmem driver on PowerPC crashes with the following signature when > > instantiating misaligned namespaces that map their capacity via > > memremap_pages(). > > > > BUG: Unable to handle kernel data access at 0xc001000406000000 > > Faulting instruction address: 0xc000000000090790 > > NIP [c000000000090790] arch_add_memory+0xc0/0x130 > > LR [c000000000090744] arch_add_memory+0x74/0x130 > > Call Trace: > > arch_add_memory+0x74/0x130 (unreliable) > > memremap_pages+0x74c/0xa30 > > devm_memremap_pages+0x3c/0xa0 > > pmem_attach_disk+0x188/0x770 > > nvdimm_bus_probe+0xd8/0x470 > > > > With the assumption that only memremap_pages() has alignment > > constraints, enforce memremap_compat_align() for > > pmem_should_map_pages(), nd_pfn, or nd_dax cases. > > > > Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> > > Cc: Jeff Moyer <jmoyer@redhat.com> > > Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> > > Link: https://lore.kernel.org/r/158041477336.3889308.4581652885008605170.stgit@dwillia2-desk3.amr.corp.intel.com > > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > > --- > > drivers/nvdimm/namespace_devs.c | 10 ++++++++++ > > 1 file changed, 10 insertions(+) > > > > diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c > > index 032dc61725ff..aff1f32fdb4f 100644 > > --- a/drivers/nvdimm/namespace_devs.c > > +++ b/drivers/nvdimm/namespace_devs.c > > @@ -1739,6 +1739,16 @@ struct nd_namespace_common *nvdimm_namespace_common_probe(struct device *dev) > > return ERR_PTR(-ENODEV); > > } > > > > + if (pmem_should_map_pages(dev) || nd_pfn || nd_dax) { > > + struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev); > > + resource_size_t start = nsio->res.start; > > + > > + if (!IS_ALIGNED(start | size, memremap_compat_align())) { > > + dev_dbg(&ndns->dev, "misaligned, unable to map\n"); > > + return ERR_PTR(-EOPNOTSUPP); > > + } > > + } > > + > > if (is_namespace_pmem(&ndns->dev)) { > > struct nd_namespace_pmem *nspm; > > > > Actually, I take back my ack. :) This prevents a previously working > namespace from being successfully probed/setup. Do you have a test case handy? I can see a potential gap with a namespace that used internal padding to fix up the alignment. The goal of this check is to catch cases that are just going to fail devm_memremap_pages(), and the expectation is that it could not have worked before unless it was ported from another platform, or someone flipped the page-size switch on PowerPC. > I thought we were only > going to enforce the alignment for a newly created namespace? This should > only check whether the alignment works for the current platform. The model is a new default 16MB alignment is enforced at creation time, but if you need to support previously created namespaces then you can manually trim that alignment requirement to no less than memremap_compat_align() because that's the point at which devm_memremap_pages() will start failing or crashing.
Dan Williams <dan.j.williams@intel.com> writes: > On Thu, Feb 13, 2020 at 1:55 PM Jeff Moyer <jmoyer@redhat.com> wrote: >> >> Dan Williams <dan.j.williams@intel.com> writes: >> >> > The pmem driver on PowerPC crashes with the following signature when >> > instantiating misaligned namespaces that map their capacity via >> > memremap_pages(). >> > >> > BUG: Unable to handle kernel data access at 0xc001000406000000 >> > Faulting instruction address: 0xc000000000090790 >> > NIP [c000000000090790] arch_add_memory+0xc0/0x130 >> > LR [c000000000090744] arch_add_memory+0x74/0x130 >> > Call Trace: >> > arch_add_memory+0x74/0x130 (unreliable) >> > memremap_pages+0x74c/0xa30 >> > devm_memremap_pages+0x3c/0xa0 >> > pmem_attach_disk+0x188/0x770 >> > nvdimm_bus_probe+0xd8/0x470 >> > >> > With the assumption that only memremap_pages() has alignment >> > constraints, enforce memremap_compat_align() for >> > pmem_should_map_pages(), nd_pfn, or nd_dax cases. >> > >> > Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> >> > Cc: Jeff Moyer <jmoyer@redhat.com> >> > Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> >> > Link: https://lore.kernel.org/r/158041477336.3889308.4581652885008605170.stgit@dwillia2-desk3.amr.corp.intel.com >> > Signed-off-by: Dan Williams <dan.j.williams@intel.com> >> > --- >> > drivers/nvdimm/namespace_devs.c | 10 ++++++++++ >> > 1 file changed, 10 insertions(+) >> > >> > diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c >> > index 032dc61725ff..aff1f32fdb4f 100644 >> > --- a/drivers/nvdimm/namespace_devs.c >> > +++ b/drivers/nvdimm/namespace_devs.c >> > @@ -1739,6 +1739,16 @@ struct nd_namespace_common *nvdimm_namespace_common_probe(struct device *dev) >> > return ERR_PTR(-ENODEV); >> > } >> > >> > + if (pmem_should_map_pages(dev) || nd_pfn || nd_dax) { >> > + struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev); >> > + resource_size_t start = nsio->res.start; >> > + >> > + if (!IS_ALIGNED(start | size, memremap_compat_align())) { >> > + dev_dbg(&ndns->dev, "misaligned, unable to map\n"); >> > + return ERR_PTR(-EOPNOTSUPP); >> > + } >> > + } >> > + >> > if (is_namespace_pmem(&ndns->dev)) { >> > struct nd_namespace_pmem *nspm; >> > >> >> Actually, I take back my ack. :) This prevents a previously working >> namespace from being successfully probed/setup. > > Do you have a test case handy? I can see a potential gap with a > namespace that used internal padding to fix up the alignment. # ndctl list -v -n namespace0.0 [ { "dev":"namespace0.0", "mode":"fsdax", "map":"dev", "size":52846133248, "uuid":"b99f6f6a-2909-4189-9bfa-6eeebd95d40e", "raw_uuid":"aff43777-015b-493f-bbf9-7c7b0fe33519", "sector_size":512, "align":4096, "blockdev":"pmem0", "numa_node":0 } ] # cat /sys/bus/nd/devices/region0/mappings 6 # grep namespace0.0 /proc/iomem 1860000000-24e0003fff : namespace0.0 > The goal of this check is to catch cases that are just going to fail > devm_memremap_pages(), and the expectation is that it could not have > worked before unless it was ported from another platform, or someone > flipped the page-size switch on PowerPC. On x86, creation and probing of the namespace worked fine before this patch. What *doesn't* work is creating another fsdax namespace after this one. sector mode namespaces can still be created, though: [ { "dev":"namespace0.1", "mode":"sector", "size":53270768640, "uuid":"67ea2c74-d4b1-4fc9-9c1a-a7d2a6c2a4a7", "sector_size":512, "blockdev":"pmem0.1s" }, # grep namespace0.1 /proc/iomem 24e0004000-3160007fff : namespace0.1 >> I thought we were only going to enforce the alignment for a newly >> created namespace? This should only check whether the alignment >> works for the current platform. > > The model is a new default 16MB alignment is enforced at creation > time, but if you need to support previously created namespaces then > you can manually trim that alignment requirement to no less than > memremap_compat_align() because that's the point at which > devm_memremap_pages() will start failing or crashing. The problem is that older kernels did not enforce alignment to SUBSECTION_SIZE. We shouldn't prevent those namespaces from being accessed. The probe itself will not cause the WARN_ON to trigger. Creating new namespaces at misaligned addresses could, but you've altered the free space allocation such that we won't hit that anymore. If I drop this patch, the probe will still work, and allocating new namespaces will also work: # ndctl list [ { "dev":"namespace0.1", "mode":"sector", "size":53270768640, "uuid":"67ea2c74-d4b1-4fc9-9c1a-a7d2a6c2a4a7", "sector_size":512, "blockdev":"pmem0.1s" }, { "dev":"namespace0.0", "mode":"fsdax", "map":"dev", "size":52846133248, "uuid":"b99f6f6a-2909-4189-9bfa-6eeebd95d40e", "sector_size":512, "align":4096, "blockdev":"pmem0" } ] ndctl create-namespace -m fsdax -s 36g -r 0 { "dev":"namespace0.2", "mode":"fsdax", "map":"dev", "size":"35.44 GiB (38.05 GB)", "uuid":"7893264c-c7ef-4cbe-95e1-ccf2aff041fb", "sector_size":512, "align":2097152, "blockdev":"pmem0.2" } proc/iomem: 1860000000-d55fffffff : Persistent Memory 1860000000-24e0003fff : namespace0.0 24e0004000-3160007fff : namespace0.1 3162000000-3a61ffffff : namespace0.2 So, maybe the right thing is to make memremap_compat_align return PAGE_SIZE for x86 instead of SUBSECTION_SIZE? -Jeff
On 2/14/20 10:14 PM, Jeff Moyer wrote: > Dan Williams <dan.j.williams@intel.com> writes: > >> On Thu, Feb 13, 2020 at 1:55 PM Jeff Moyer <jmoyer@redhat.com> wrote: >>> >>> Dan Williams <dan.j.williams@intel.com> writes: >>> >>>> The pmem driver on PowerPC crashes with the following signature when >>>> instantiating misaligned namespaces that map their capacity via >>>> memremap_pages(). >>>> >>>> BUG: Unable to handle kernel data access at 0xc001000406000000 >>>> Faulting instruction address: 0xc000000000090790 >>>> NIP [c000000000090790] arch_add_memory+0xc0/0x130 >>>> LR [c000000000090744] arch_add_memory+0x74/0x130 >>>> Call Trace: >>>> arch_add_memory+0x74/0x130 (unreliable) >>>> memremap_pages+0x74c/0xa30 >>>> devm_memremap_pages+0x3c/0xa0 >>>> pmem_attach_disk+0x188/0x770 >>>> nvdimm_bus_probe+0xd8/0x470 >>>> >>>> With the assumption that only memremap_pages() has alignment >>>> constraints, enforce memremap_compat_align() for >>>> pmem_should_map_pages(), nd_pfn, or nd_dax cases. >>>> >>>> Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> >>>> Cc: Jeff Moyer <jmoyer@redhat.com> >>>> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> >>>> Link: https://lore.kernel.org/r/158041477336.3889308.4581652885008605170.stgit@dwillia2-desk3.amr.corp.intel.com >>>> Signed-off-by: Dan Williams <dan.j.williams@intel.com> >>>> --- >>>> drivers/nvdimm/namespace_devs.c | 10 ++++++++++ >>>> 1 file changed, 10 insertions(+) >>>> >>>> diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c >>>> index 032dc61725ff..aff1f32fdb4f 100644 >>>> --- a/drivers/nvdimm/namespace_devs.c >>>> +++ b/drivers/nvdimm/namespace_devs.c >>>> @@ -1739,6 +1739,16 @@ struct nd_namespace_common *nvdimm_namespace_common_probe(struct device *dev) >>>> return ERR_PTR(-ENODEV); >>>> } >>>> >>>> + if (pmem_should_map_pages(dev) || nd_pfn || nd_dax) { >>>> + struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev); >>>> + resource_size_t start = nsio->res.start; >>>> + >>>> + if (!IS_ALIGNED(start | size, memremap_compat_align())) { >>>> + dev_dbg(&ndns->dev, "misaligned, unable to map\n"); >>>> + return ERR_PTR(-EOPNOTSUPP); >>>> + } >>>> + } >>>> + >>>> if (is_namespace_pmem(&ndns->dev)) { >>>> struct nd_namespace_pmem *nspm; >>>> >>> >>> Actually, I take back my ack. :) This prevents a previously working >>> namespace from being successfully probed/setup. >> >> Do you have a test case handy? I can see a potential gap with a >> namespace that used internal padding to fix up the alignment. > > # ndctl list -v -n namespace0.0 > [ > { > "dev":"namespace0.0", > "mode":"fsdax", > "map":"dev", > "size":52846133248, > "uuid":"b99f6f6a-2909-4189-9bfa-6eeebd95d40e", > "raw_uuid":"aff43777-015b-493f-bbf9-7c7b0fe33519", > "sector_size":512, > "align":4096, > "blockdev":"pmem0", > "numa_node":0 > } > ] > > # cat /sys/bus/nd/devices/region0/mappings > 6 > > # grep namespace0.0 /proc/iomem > 1860000000-24e0003fff : namespace0.0 > >> The goal of this check is to catch cases that are just going to fail >> devm_memremap_pages(), and the expectation is that it could not have >> worked before unless it was ported from another platform, or someone >> flipped the page-size switch on PowerPC. > > On x86, creation and probing of the namespace worked fine before this > patch. What *doesn't* work is creating another fsdax namespace after > this one. sector mode namespaces can still be created, though: > > [ > { > "dev":"namespace0.1", > "mode":"sector", > "size":53270768640, > "uuid":"67ea2c74-d4b1-4fc9-9c1a-a7d2a6c2a4a7", > "sector_size":512, > "blockdev":"pmem0.1s" > }, > > # grep namespace0.1 /proc/iomem > 24e0004000-3160007fff : namespace0.1 > >>> I thought we were only going to enforce the alignment for a newly >>> created namespace? This should only check whether the alignment >>> works for the current platform. >> >> The model is a new default 16MB alignment is enforced at creation >> time, but if you need to support previously created namespaces then >> you can manually trim that alignment requirement to no less than >> memremap_compat_align() because that's the point at which >> devm_memremap_pages() will start failing or crashing. > > The problem is that older kernels did not enforce alignment to > SUBSECTION_SIZE. We shouldn't prevent those namespaces from being > accessed. The probe itself will not cause the WARN_ON to trigger. > Creating new namespaces at misaligned addresses could, but you've > altered the free space allocation such that we won't hit that anymore. > > If I drop this patch, the probe will still work, and allocating new > namespaces will also work: > > # ndctl list > [ > { > "dev":"namespace0.1", > "mode":"sector", > "size":53270768640, > "uuid":"67ea2c74-d4b1-4fc9-9c1a-a7d2a6c2a4a7", > "sector_size":512, > "blockdev":"pmem0.1s" > }, > { > "dev":"namespace0.0", > "mode":"fsdax", > "map":"dev", > "size":52846133248, > "uuid":"b99f6f6a-2909-4189-9bfa-6eeebd95d40e", > "sector_size":512, > "align":4096, > "blockdev":"pmem0" > } > ] > ndctl create-namespace -m fsdax -s 36g -r 0 > { > "dev":"namespace0.2", > "mode":"fsdax", > "map":"dev", > "size":"35.44 GiB (38.05 GB)", > "uuid":"7893264c-c7ef-4cbe-95e1-ccf2aff041fb", > "sector_size":512, > "align":2097152, > "blockdev":"pmem0.2" > } > > proc/iomem: > > 1860000000-d55fffffff : Persistent Memory > 1860000000-24e0003fff : namespace0.0 > 24e0004000-3160007fff : namespace0.1 > 3162000000-3a61ffffff : namespace0.2 > > So, maybe the right thing is to make memremap_compat_align return > PAGE_SIZE for x86 instead of SUBSECTION_SIZE? > I did that as part of https://lore.kernel.org/linux-nvdimm/20200120140749.69549-2-aneesh.kumar@linux.ibm.com and applied the subsection details only when creating new namespace https://lore.kernel.org/linux-nvdimm/20200120140749.69549-4-aneesh.kumar@linux.ibm.com But I do agree with the approach that in-order to create a compatible namespace we need enforce max possible align value across all supported architectures. On POWER we should still be able to enforce SUBSECTION_SIZE restrictions. We did put that as document w.r.t. distributions like Suse https://www.suse.com/support/kb/doc/?id=7024300 -aneesh
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c index 032dc61725ff..aff1f32fdb4f 100644 --- a/drivers/nvdimm/namespace_devs.c +++ b/drivers/nvdimm/namespace_devs.c @@ -1739,6 +1739,16 @@ struct nd_namespace_common *nvdimm_namespace_common_probe(struct device *dev) return ERR_PTR(-ENODEV); } + if (pmem_should_map_pages(dev) || nd_pfn || nd_dax) { + struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev); + resource_size_t start = nsio->res.start; + + if (!IS_ALIGNED(start | size, memremap_compat_align())) { + dev_dbg(&ndns->dev, "misaligned, unable to map\n"); + return ERR_PTR(-EOPNOTSUPP); + } + } + if (is_namespace_pmem(&ndns->dev)) { struct nd_namespace_pmem *nspm;