Message ID | 158155490379.3343782.10305190793306743949.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive) |
---|---|
State | Superseded |
Series | libnvdimm: Cross-arch compatible namespace alignment |
Dan Williams <dan.j.williams@intel.com> writes:

> The "sub-section memory hotplug" facility allows memremap_pages() users
> like libnvdimm to compensate for hardware platforms like x86 that have a
> section size larger than their hardware memory mapping granularity. The
> compensation that sub-section support affords is being tolerant of
> physical memory resources shifting by units smaller (64MiB on x86) than
> the memory-hotplug section size (128 MiB). Where the platform
> physical-memory mapping granularity is limited by the number and
> capability of address-decode-registers in the memory controller.
>
> While the sub-section support allows memremap_pages() to operate on
> sub-section (2MiB) granularity, the Power architecture may still
> require 16MiB alignment on "!radix_enabled()" platforms.
>
> In order for libnvdimm to be able to detect and manage this per-arch
> limitation, introduce memremap_compat_align() as a common minimum
> alignment across all driver-facing memory-mapping interfaces, and let
> Power override it to 16MiB in the "!radix_enabled()" case.
>
> The assumption / requirement for 16MiB to be a viable
> memremap_compat_align() value is that Power does not have platforms
> where its equivalent of address-decode-registers never hardware remaps a
> persistent memory resource on smaller than 16MiB boundaries. Note that I
> tried my best to not add a new Kconfig symbol, but header include
> entanglements defeated the #ifndef memremap_compat_align design pattern
> and the need to export it defeats the __weak design pattern for arch
> overrides.
>
> Based on an initial patch by Aneesh.

I have just a couple of questions.

First, can you please add a comment above the generic implementation of
memremap_compat_align describing its purpose, and why a platform might
want to override it?

Second, I will take it at face value that the power architecture
requires a 16MB alignment, but it's not clear to me why mmu_linear_psize
was chosen to represent that. What's the relationship, there, and can
we please have a comment explaining it?

Thanks!
Jeff

>
> Link: http://lore.kernel.org/r/CAPcyv4gBGNP95APYaBcsocEa50tQj9b5h__83vgngjq3ouGX_Q@mail.gmail.com
> Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Reported-by: Jeff Moyer <jmoyer@redhat.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  arch/powerpc/Kconfig      |  1 +
>  arch/powerpc/mm/ioremap.c | 12 ++++++++++++
>  drivers/nvdimm/pfn_devs.c |  2 +-
>  include/linux/memremap.h  |  8 ++++++++
>  include/linux/mmzone.h    |  1 +
>  lib/Kconfig               |  3 +++
>  mm/memremap.c             | 13 +++++++++++++
>  7 files changed, 39 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 497b7d0b2d7e..e6ffe905e2b9 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -122,6 +122,7 @@ config PPC
>  	select ARCH_HAS_GCOV_PROFILE_ALL
>  	select ARCH_HAS_KCOV
>  	select ARCH_HAS_HUGEPD if HUGETLB_PAGE
> +	select ARCH_HAS_MEMREMAP_COMPAT_ALIGN
>  	select ARCH_HAS_MMIOWB if PPC64
>  	select ARCH_HAS_PHYS_TO_DMA
>  	select ARCH_HAS_PMEM_API
> diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c
> index fc669643ce6a..38b5ba7d3e2d 100644
> --- a/arch/powerpc/mm/ioremap.c
> +++ b/arch/powerpc/mm/ioremap.c
> @@ -2,6 +2,7 @@
>
>  #include <linux/io.h>
>  #include <linux/slab.h>
> +#include <linux/mmzone.h>
>  #include <linux/vmalloc.h>
>  #include <asm/io-workarounds.h>
>
> @@ -97,3 +98,14 @@ void __iomem *do_ioremap(phys_addr_t pa, phys_addr_t offset, unsigned long size,
>
>  	return NULL;
>  }
> +
> +#ifdef CONFIG_ZONE_DEVICE
> +/* override of the generic version in mm/memremap.c */
> +unsigned long memremap_compat_align(void)
> +{
> +	if (radix_enabled())
> +		return SUBSECTION_SIZE;
> +	return (1UL << mmu_psize_defs[mmu_linear_psize].shift);
> +}
> +EXPORT_SYMBOL_GPL(memremap_compat_align);
> +#endif
> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> index b94f7a7e94b8..a5c25cb87116 100644
> --- a/drivers/nvdimm/pfn_devs.c
> +++ b/drivers/nvdimm/pfn_devs.c
> @@ -750,7 +750,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
>  	start = nsio->res.start;
>  	size = resource_size(&nsio->res);
>  	npfns = PHYS_PFN(size - SZ_8K);
> -	align = max(nd_pfn->align, (1UL << SUBSECTION_SHIFT));
> +	align = max(nd_pfn->align, SUBSECTION_SIZE);
>  	end_trunc = start + size - ALIGN_DOWN(start + size, align);
>  	if (nd_pfn->mode == PFN_MODE_PMEM) {
>  		/*
> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> index 6fefb09af7c3..8af1cbd8f293 100644
> --- a/include/linux/memremap.h
> +++ b/include/linux/memremap.h
> @@ -132,6 +132,7 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
>
>  unsigned long vmem_altmap_offset(struct vmem_altmap *altmap);
>  void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns);
> +unsigned long memremap_compat_align(void);
>  #else
>  static inline void *devm_memremap_pages(struct device *dev,
>  		struct dev_pagemap *pgmap)
> @@ -165,6 +166,12 @@ static inline void vmem_altmap_free(struct vmem_altmap *altmap,
>  		unsigned long nr_pfns)
>  {
>  }
> +
> +/* when memremap_pages() is disabled all archs can remap a single page */
> +static inline unsigned long memremap_compat_align(void)
> +{
> +	return PAGE_SIZE;
> +}
>  #endif /* CONFIG_ZONE_DEVICE */
>
>  static inline void put_dev_pagemap(struct dev_pagemap *pgmap)
> @@ -172,4 +179,5 @@ static inline void put_dev_pagemap(struct dev_pagemap *pgmap)
>  	if (pgmap)
>  		percpu_ref_put(pgmap->ref);
>  }
> +
>  #endif /* _LINUX_MEMREMAP_H_ */
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 462f6873905a..6b77f7239af5 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1170,6 +1170,7 @@ static inline unsigned long section_nr_to_pfn(unsigned long sec)
>  #define SECTION_ALIGN_DOWN(pfn) ((pfn) & PAGE_SECTION_MASK)
>
>  #define SUBSECTION_SHIFT 21
> +#define SUBSECTION_SIZE (1UL << SUBSECTION_SHIFT)
>
>  #define PFN_SUBSECTION_SHIFT (SUBSECTION_SHIFT - PAGE_SHIFT)
>  #define PAGES_PER_SUBSECTION (1UL << PFN_SUBSECTION_SHIFT)
> diff --git a/lib/Kconfig b/lib/Kconfig
> index 0cf875fd627c..17dbc7bd3895 100644
> --- a/lib/Kconfig
> +++ b/lib/Kconfig
> @@ -618,6 +618,9 @@ config ARCH_HAS_PMEM_API
>  config MEMREGION
>  	bool
>
> +config ARCH_HAS_MEMREMAP_COMPAT_ALIGN
> +	bool
> +
>  # use memcpy to implement user copies for nommu architectures
>  config UACCESS_MEMCPY
>  	bool
> diff --git a/mm/memremap.c b/mm/memremap.c
> index 09b5b7adc773..a6905d28fe91 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -7,6 +7,7 @@
>  #include <linux/mm.h>
>  #include <linux/pfn_t.h>
>  #include <linux/swap.h>
> +#include <linux/mmzone.h>
>  #include <linux/swapops.h>
>  #include <linux/types.h>
>  #include <linux/wait_bit.h>
> @@ -14,6 +15,18 @@
>
>  static DEFINE_XARRAY(pgmap_array);
>
> +/*
> + * Minimum compatible alignment of the resource (start, end) across
> + * memremap interfaces (i.e. memremap + memremap_pages)
> + */
> +#ifndef CONFIG_ARCH_HAS_MEMREMAP_COMPAT_ALIGN
> +unsigned long memremap_compat_align(void)
> +{
> +	return SUBSECTION_SIZE;
> +}
> +EXPORT_SYMBOL_GPL(memremap_compat_align);
> +#endif
> +
>  #ifdef CONFIG_DEV_PAGEMAP_OPS
>  DEFINE_STATIC_KEY_FALSE(devmap_managed_key);
>  EXPORT_SYMBOL(devmap_managed_key);
On Thu, Feb 13, 2020 at 8:58 AM Jeff Moyer <jmoyer@redhat.com> wrote:
>
> Dan Williams <dan.j.williams@intel.com> writes:
>
> > The "sub-section memory hotplug" facility allows memremap_pages() users
> > like libnvdimm to compensate for hardware platforms like x86 that have a
> > section size larger than their hardware memory mapping granularity. The
> > compensation that sub-section support affords is being tolerant of
> > physical memory resources shifting by units smaller (64MiB on x86) than
> > the memory-hotplug section size (128 MiB). Where the platform
> > physical-memory mapping granularity is limited by the number and
> > capability of address-decode-registers in the memory controller.
> >
> > While the sub-section support allows memremap_pages() to operate on
> > sub-section (2MiB) granularity, the Power architecture may still
> > require 16MiB alignment on "!radix_enabled()" platforms.
> >
> > In order for libnvdimm to be able to detect and manage this per-arch
> > limitation, introduce memremap_compat_align() as a common minimum
> > alignment across all driver-facing memory-mapping interfaces, and let
> > Power override it to 16MiB in the "!radix_enabled()" case.
> >
> > The assumption / requirement for 16MiB to be a viable
> > memremap_compat_align() value is that Power does not have platforms
> > where its equivalent of address-decode-registers never hardware remaps a
> > persistent memory resource on smaller than 16MiB boundaries. Note that I
> > tried my best to not add a new Kconfig symbol, but header include
> > entanglements defeated the #ifndef memremap_compat_align design pattern
> > and the need to export it defeats the __weak design pattern for arch
> > overrides.
> >
> > Based on an initial patch by Aneesh.
>
> I have just a couple of questions.
>
> First, can you please add a comment above the generic implementation of
> memremap_compat_align describing its purpose, and why a platform might
> want to override it?

Sure, how about:

/*
 * The memremap() and memremap_pages() interfaces are alternately used
 * to map persistent memory namespaces. These interfaces place different
 * constraints on the alignment and size of the mapping (namespace).
 * memremap() can map individual PAGE_SIZE pages. memremap_pages() can
 * only map subsections (2MB), and on at least one architecture (PowerPC)
 * the minimum mapping granularity of memremap_pages() is 16MB.
 *
 * The role of memremap_compat_align() is to communicate the minimum
 * arch supported alignment of a namespace such that it can freely
 * switch modes without violating the arch constraint. Namely, do not
 * allow a namespace to be PAGE_SIZE aligned since that namespace may be
 * reconfigured into a mode that requires SUBSECTION_SIZE alignment.
 */

> Second, I will take it at face value that the power architecture
> requires a 16MB alignment, but it's not clear to me why mmu_linear_psize
> was chosen to represent that. What's the relationship, there, and can
> we please have a comment explaining it?

Aneesh, can you help here?
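To make the role described in that comment concrete, here is a minimal user-space sketch, not kernel code. It assumes a 4KiB PAGE_SIZE and the 2MiB subsection size from the patch; the `SKETCH_*` constants and `sketch_range_is_aligned()` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants; the kernel derives the real values per arch. */
#define SKETCH_PAGE_SIZE       (1ULL << 12) /* 4KiB, an assumption */
#define SKETCH_SUBSECTION_SIZE (1ULL << 21) /* 2MiB, as SUBSECTION_SHIFT 21 */

/*
 * Hypothetical helper: true when both the start and the size of a
 * namespace are multiples of 'align' (which must be a power of two).
 */
static bool sketch_range_is_aligned(uint64_t start, uint64_t size,
				    uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((start | size) & (align - 1)) == 0;
}
```

A namespace that is only PAGE_SIZE aligned passes the check for `SKETCH_PAGE_SIZE` but fails it for `SKETCH_SUBSECTION_SIZE`, which is the mode-switch hazard the proposed comment describes.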
Dan Williams <dan.j.williams@intel.com> writes:

> On Thu, Feb 13, 2020 at 8:58 AM Jeff Moyer <jmoyer@redhat.com> wrote:
>>
>> Dan Williams <dan.j.williams@intel.com> writes:
>>
>> > The "sub-section memory hotplug" facility allows memremap_pages() users
>> > like libnvdimm to compensate for hardware platforms like x86 that have a
>> > section size larger than their hardware memory mapping granularity. The
>> > compensation that sub-section support affords is being tolerant of
>> > physical memory resources shifting by units smaller (64MiB on x86) than
>> > the memory-hotplug section size (128 MiB). Where the platform
>> > physical-memory mapping granularity is limited by the number and
>> > capability of address-decode-registers in the memory controller.
>> >
>> > While the sub-section support allows memremap_pages() to operate on
>> > sub-section (2MiB) granularity, the Power architecture may still
>> > require 16MiB alignment on "!radix_enabled()" platforms.
>> >
>> > In order for libnvdimm to be able to detect and manage this per-arch
>> > limitation, introduce memremap_compat_align() as a common minimum
>> > alignment across all driver-facing memory-mapping interfaces, and let
>> > Power override it to 16MiB in the "!radix_enabled()" case.
>> >
>> > The assumption / requirement for 16MiB to be a viable
>> > memremap_compat_align() value is that Power does not have platforms
>> > where its equivalent of address-decode-registers never hardware remaps a
>> > persistent memory resource on smaller than 16MiB boundaries. Note that I
>> > tried my best to not add a new Kconfig symbol, but header include
>> > entanglements defeated the #ifndef memremap_compat_align design pattern
>> > and the need to export it defeats the __weak design pattern for arch
>> > overrides.
>> >
>> > Based on an initial patch by Aneesh.
>>
>> I have just a couple of questions.
>>
>> First, can you please add a comment above the generic implementation of
>> memremap_compat_align describing its purpose, and why a platform might
>> want to override it?
>
> Sure, how about:
>
> /*
>  * The memremap() and memremap_pages() interfaces are alternately used
>  * to map persistent memory namespaces. These interfaces place different
>  * constraints on the alignment and size of the mapping (namespace).
>  * memremap() can map individual PAGE_SIZE pages. memremap_pages() can
>  * only map subsections (2MB), and at least one architecture (PowerPC)
>  * the minimum mapping granularity of memremap_pages() is 16MB.
>  *
>  * The role of memremap_compat_align() is to communicate the minimum
>  * arch supported alignment of a namespace such that it can freely
>  * switch modes without violating the arch constraint. Namely, do not
>  * allow a namespace to be PAGE_SIZE aligned since that namespace may be
>  * reconfigured into a mode that requires SUBSECTION_SIZE alignment.
>  */
>
>> Second, I will take it at face value that the power architecture
>> requires a 16MB alignment, but it's not clear to me why mmu_linear_psize
>> was chosen to represent that. What's the relationship, there, and can
>> we please have a comment explaining it?
>
> Aneesh, can you help here?

With hash translation, we map the direct-map range with just one page
size. Based on the different restrictions described in
htab_init_page_sizes(), we can end up choosing 16M, 64K or even 4K. We
use the variable mmu_linear_psize to indicate which page size we used
for the direct-map range. I.e., we should do:

+unsigned long arch_namespace_align_size(void)
+{
+	unsigned long sub_section_size = (1UL << SUBSECTION_SHIFT);
+
+	if (radix_enabled())
+		return sub_section_size;
+	return max(sub_section_size, (1UL << mmu_psize_defs[mmu_linear_psize].shift));
+
+}
+EXPORT_SYMBOL_GPL(arch_namespace_align_size);

as done here
https://lore.kernel.org/linux-nvdimm/20200120140749.69549-4-aneesh.kumar@linux.ibm.com/

Dan can you update the powerpc definition?

-aneesh
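The reason for the max() in the snippet above can be seen with a small user-space sketch. It models the linear-mapping page size as a plain shift argument instead of reading mmu_psize_defs[], so `sketch_arch_align()` and its parameters are hypothetical stand-ins rather than the powerpc implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_SUBSECTION_SHIFT 21 /* 2MiB, as in the patch */

/*
 * Hypothetical model of the proposed powerpc override: 'radix' stands
 * in for radix_enabled() and 'linear_shift' for the shift of the page
 * size chosen for the direct map (24 = 16M, 16 = 64K, 12 = 4K).
 */
static unsigned long sketch_arch_align(bool radix, unsigned int linear_shift)
{
	unsigned long sub_section_size = 1UL << SKETCH_SUBSECTION_SHIFT;
	unsigned long linear_size = 1UL << linear_shift;

	if (radix)
		return sub_section_size;
	/* Never report less than what memremap_pages() itself requires. */
	return linear_size > sub_section_size ? linear_size : sub_section_size;
}
```

With a 16M linear mapping the result is 16MiB, while 64K and 4K linear mappings fall back to the 2MiB subsection size instead of returning a value smaller than memremap_pages() can handle.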
Dan Williams <dan.j.williams@intel.com> writes:

> On Thu, Feb 13, 2020 at 8:58 AM Jeff Moyer <jmoyer@redhat.com> wrote:
>> I have just a couple of questions.
>>
>> First, can you please add a comment above the generic implementation of
>> memremap_compat_align describing its purpose, and why a platform might
>> want to override it?
>
> Sure, how about:
>
> /*
>  * The memremap() and memremap_pages() interfaces are alternately used
>  * to map persistent memory namespaces. These interfaces place different
>  * constraints on the alignment and size of the mapping (namespace).
>  * memremap() can map individual PAGE_SIZE pages. memremap_pages() can
>  * only map subsections (2MB), and at least one architecture (PowerPC)
>  * the minimum mapping granularity of memremap_pages() is 16MB.
>  *
>  * The role of memremap_compat_align() is to communicate the minimum
>  * arch supported alignment of a namespace such that it can freely
>  * switch modes without violating the arch constraint. Namely, do not
>  * allow a namespace to be PAGE_SIZE aligned since that namespace may be
>  * reconfigured into a mode that requires SUBSECTION_SIZE alignment.
>  */

Well, if we modify the x86 variant to be PAGE_SIZE, I think that text
won't work. How about:

/*
 * memremap_compat_align should return the minimum alignment for
 * mapping memory via memremap() and memremap_pages(). For x86, this
 * is the system PAGE_SIZE. Other architectures may impose different
 * restrictions, as is seen on powerpc where the minimum alignment is
 * tied to the linear mapping page size.
 *
 * When creating persistent memory namespaces, the alignment is forced
 * to the least common denominator (MEMREMAP_COMPAT_ALIGN_MAX,
 * currently 16MB). However, older kernels did not enforce this
 * behavior, so we allow mapping namespaces with smaller alignments,
 * so long as the platform supports it. See nvdimm_namespace_common_probe.
 */

-Jeff
On Fri, Feb 14, 2020 at 12:59 PM Jeff Moyer <jmoyer@redhat.com> wrote:
>
> Dan Williams <dan.j.williams@intel.com> writes:
>
> > On Thu, Feb 13, 2020 at 8:58 AM Jeff Moyer <jmoyer@redhat.com> wrote:
> >> I have just a couple of questions.
> >>
> >> First, can you please add a comment above the generic implementation of
> >> memremap_compat_align describing its purpose, and why a platform might
> >> want to override it?
> >
> > Sure, how about:
> >
> > /*
> >  * The memremap() and memremap_pages() interfaces are alternately used
> >  * to map persistent memory namespaces. These interfaces place different
> >  * constraints on the alignment and size of the mapping (namespace).
> >  * memremap() can map individual PAGE_SIZE pages. memremap_pages() can
> >  * only map subsections (2MB), and at least one architecture (PowerPC)
> >  * the minimum mapping granularity of memremap_pages() is 16MB.
> >  *
> >  * The role of memremap_compat_align() is to communicate the minimum
> >  * arch supported alignment of a namespace such that it can freely
> >  * switch modes without violating the arch constraint. Namely, do not
> >  * allow a namespace to be PAGE_SIZE aligned since that namespace may be
> >  * reconfigured into a mode that requires SUBSECTION_SIZE alignment.
> >  */
>
> Well, if we modify the x86 variant to be PAGE_SIZE, I think that text
> won't work. How about:

...but I'm not looking to change it to PAGE_SIZE, I'm going to fix the
alignment check to skip if the namespace has "inner" alignment
padding, i.e. "start_pad" and/or "end_trunc" are non-zero.
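One possible reading of that plan, sketched in user-space C. This is a guess at the shape of the check, not the code that was eventually merged; `struct sketch_pfn_info` and `sketch_align_check_passes()` are hypothetical, and only the field names start_pad and end_trunc come from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant fields; not the kernel layout. */
struct sketch_pfn_info {
	uint64_t start;     /* physical start of the namespace */
	uint64_t size;      /* size of the namespace in bytes */
	uint64_t start_pad; /* bytes skipped at the front for alignment */
	uint64_t end_trunc; /* bytes dropped at the end for alignment */
};

/*
 * Namespaces that already carry "inner" padding are exempt from the
 * compat alignment check; all others must be aligned at both ends.
 */
static bool sketch_align_check_passes(const struct sketch_pfn_info *info,
				      uint64_t compat_align)
{
	assert(compat_align != 0 && (compat_align & (compat_align - 1)) == 0);
	if (info->start_pad || info->end_trunc)
		return true;
	return ((info->start | info->size) & (compat_align - 1)) == 0;
}
```

Under this reading, a namespace created by an older kernel with non-zero padding keeps working, while a freshly created, unpadded namespace is still held to the compat alignment.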
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 497b7d0b2d7e..e6ffe905e2b9 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -122,6 +122,7 @@ config PPC
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_KCOV
 	select ARCH_HAS_HUGEPD if HUGETLB_PAGE
+	select ARCH_HAS_MEMREMAP_COMPAT_ALIGN
 	select ARCH_HAS_MMIOWB if PPC64
 	select ARCH_HAS_PHYS_TO_DMA
 	select ARCH_HAS_PMEM_API
diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c
index fc669643ce6a..38b5ba7d3e2d 100644
--- a/arch/powerpc/mm/ioremap.c
+++ b/arch/powerpc/mm/ioremap.c
@@ -2,6 +2,7 @@
 
 #include <linux/io.h>
 #include <linux/slab.h>
+#include <linux/mmzone.h>
 #include <linux/vmalloc.h>
 #include <asm/io-workarounds.h>
 
@@ -97,3 +98,14 @@ void __iomem *do_ioremap(phys_addr_t pa, phys_addr_t offset, unsigned long size,
 
 	return NULL;
 }
+
+#ifdef CONFIG_ZONE_DEVICE
+/* override of the generic version in mm/memremap.c */
+unsigned long memremap_compat_align(void)
+{
+	if (radix_enabled())
+		return SUBSECTION_SIZE;
+	return (1UL << mmu_psize_defs[mmu_linear_psize].shift);
+}
+EXPORT_SYMBOL_GPL(memremap_compat_align);
+#endif
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index b94f7a7e94b8..a5c25cb87116 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -750,7 +750,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 	start = nsio->res.start;
 	size = resource_size(&nsio->res);
 	npfns = PHYS_PFN(size - SZ_8K);
-	align = max(nd_pfn->align, (1UL << SUBSECTION_SHIFT));
+	align = max(nd_pfn->align, SUBSECTION_SIZE);
 	end_trunc = start + size - ALIGN_DOWN(start + size, align);
 	if (nd_pfn->mode == PFN_MODE_PMEM) {
 		/*
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 6fefb09af7c3..8af1cbd8f293 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -132,6 +132,7 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
 
 unsigned long vmem_altmap_offset(struct vmem_altmap *altmap);
 void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns);
+unsigned long memremap_compat_align(void);
 #else
 static inline void *devm_memremap_pages(struct device *dev,
 		struct dev_pagemap *pgmap)
@@ -165,6 +166,12 @@ static inline void vmem_altmap_free(struct vmem_altmap *altmap,
 		unsigned long nr_pfns)
 {
 }
+
+/* when memremap_pages() is disabled all archs can remap a single page */
+static inline unsigned long memremap_compat_align(void)
+{
+	return PAGE_SIZE;
+}
 #endif /* CONFIG_ZONE_DEVICE */
 
 static inline void put_dev_pagemap(struct dev_pagemap *pgmap)
@@ -172,4 +179,5 @@ static inline void put_dev_pagemap(struct dev_pagemap *pgmap)
 	if (pgmap)
 		percpu_ref_put(pgmap->ref);
 }
+
 #endif /* _LINUX_MEMREMAP_H_ */
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 462f6873905a..6b77f7239af5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1170,6 +1170,7 @@ static inline unsigned long section_nr_to_pfn(unsigned long sec)
 #define SECTION_ALIGN_DOWN(pfn) ((pfn) & PAGE_SECTION_MASK)
 
 #define SUBSECTION_SHIFT 21
+#define SUBSECTION_SIZE (1UL << SUBSECTION_SHIFT)
 
 #define PFN_SUBSECTION_SHIFT (SUBSECTION_SHIFT - PAGE_SHIFT)
 #define PAGES_PER_SUBSECTION (1UL << PFN_SUBSECTION_SHIFT)
diff --git a/lib/Kconfig b/lib/Kconfig
index 0cf875fd627c..17dbc7bd3895 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -618,6 +618,9 @@ config ARCH_HAS_PMEM_API
 config MEMREGION
 	bool
 
+config ARCH_HAS_MEMREMAP_COMPAT_ALIGN
+	bool
+
 # use memcpy to implement user copies for nommu architectures
 config UACCESS_MEMCPY
 	bool
diff --git a/mm/memremap.c b/mm/memremap.c
index 09b5b7adc773..a6905d28fe91 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -7,6 +7,7 @@
 #include <linux/mm.h>
 #include <linux/pfn_t.h>
 #include <linux/swap.h>
+#include <linux/mmzone.h>
 #include <linux/swapops.h>
 #include <linux/types.h>
 #include <linux/wait_bit.h>
@@ -14,6 +15,18 @@
 
 static DEFINE_XARRAY(pgmap_array);
 
+/*
+ * Minimum compatible alignment of the resource (start, end) across
+ * memremap interfaces (i.e. memremap + memremap_pages)
+ */
+#ifndef CONFIG_ARCH_HAS_MEMREMAP_COMPAT_ALIGN
+unsigned long memremap_compat_align(void)
+{
+	return SUBSECTION_SIZE;
+}
+EXPORT_SYMBOL_GPL(memremap_compat_align);
+#endif
+
 #ifdef CONFIG_DEV_PAGEMAP_OPS
 DEFINE_STATIC_KEY_FALSE(devmap_managed_key);
 EXPORT_SYMBOL(devmap_managed_key);
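
The nd_pfn_init() hunk in the diff computes end_trunc from the chosen alignment. The arithmetic can be sketched in user space as follows; `sketch_align_down()` mirrors what the kernel's ALIGN_DOWN() does for power-of-two alignments, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round x down to a multiple of 'align' (a non-zero power of two). */
static uint64_t sketch_align_down(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

/*
 * Bytes that must be dropped from the end of [start, start + size)
 * so that the end of the range lands on an 'align' boundary.
 */
static uint64_t sketch_end_trunc(uint64_t start, uint64_t size,
				 uint64_t align)
{
	return start + size - sketch_align_down(start + size, align);
}
```

For a namespace starting at 4GiB with a size of 1GiB + 3MiB, this gives a 1MiB truncation at 2MiB alignment and a 3MiB truncation at 16MiB alignment, which illustrates why the alignment an architecture reports changes how much of the namespace is usable.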

The "sub-section memory hotplug" facility allows memremap_pages() users
like libnvdimm to compensate for hardware platforms like x86 that have a
section size larger than their hardware memory mapping granularity. The
compensation that sub-section support affords is being tolerant of
physical memory resources shifting by units smaller (64MiB on x86) than
the memory-hotplug section size (128MiB), where the platform
physical-memory mapping granularity is limited by the number and
capability of address-decode-registers in the memory controller.

While the sub-section support allows memremap_pages() to operate on
sub-section (2MiB) granularity, the Power architecture may still
require 16MiB alignment on "!radix_enabled()" platforms.

In order for libnvdimm to be able to detect and manage this per-arch
limitation, introduce memremap_compat_align() as a common minimum
alignment across all driver-facing memory-mapping interfaces, and let
Power override it to 16MiB in the "!radix_enabled()" case.

The assumption / requirement for 16MiB to be a viable
memremap_compat_align() value is that Power does not have platforms
where its equivalent of address-decode-registers ever remaps a
persistent memory resource in hardware on boundaries smaller than
16MiB. Note that I tried my best to not add a new Kconfig symbol, but
header include entanglements defeated the #ifndef memremap_compat_align
design pattern and the need to export it defeats the __weak design
pattern for arch overrides.

Based on an initial patch by Aneesh.

Link: http://lore.kernel.org/r/CAPcyv4gBGNP95APYaBcsocEa50tQj9b5h__83vgngjq3ouGX_Q@mail.gmail.com
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/powerpc/Kconfig      |  1 +
 arch/powerpc/mm/ioremap.c | 12 ++++++++++++
 drivers/nvdimm/pfn_devs.c |  2 +-
 include/linux/memremap.h  |  8 ++++++++
 include/linux/mmzone.h    |  1 +
 lib/Kconfig               |  3 +++
 mm/memremap.c             | 13 +++++++++++++
 7 files changed, 39 insertions(+), 1 deletion(-)