| Message ID | 20230801044116.10674-1-aneesh.kumar@linux.ibm.com (mailing list archive) |
|---|---|
| Series | Add support for memmap on memory feature on ppc64 |
On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote:
> Allow updating memmap_on_memory mode after the kernel boot. Memory
> hotplug done after the mode update will use the new mmemap_on_memory
> value.

Well, this is a user space kABI extension and as such you should spend
more words about the usecase. Why we could live with this static and now
need dynamic?
On Tue 01-08-23 10:11:13, Aneesh Kumar K.V wrote:
[...]
> +	if (mode == MEMMAP_ON_MEMORY_FORCE) {
> +		unsigned long memmap_pages = memory_block_memmap_on_memory_pages();

		unsigned long wastage = memmap_pages - PFN_UP(memory_block_memmap_size());

		if (wastage)
			pr_info_once("memmap_on_memory=force will waste %ld pages in each memory block %ld\n",
				     wastage, memory_block_size_bytes() >> PAGE_SHIFT);

would be more useful IMHO.

> +
> +		pr_info_once("Memory hotplug will waste %ld pages in each memory block\n",
> +			     memmap_pages - PFN_UP(memory_block_memmap_size()));
I cannot really judge the ppc specific part but other patches seem
reasonable. Patch 4 could print a more useful information about the
wastage but this is nothing really earth shattering. I am not sure about
the last patch which makes the on-memory property dynamic. This needs
much more justification and use case description IMHO.

That being said for patches 1 - 4 and 6 feel free to add
Acked-by: Michal Hocko <mhocko@suse.com>

On Tue 01-08-23 10:11:09, Aneesh Kumar K.V wrote:
> This patch series update memmap on memory feature to fall back to
> memmap allocation outside the memory block if the alignment rules are
> not met. This makes the feature more useful on architectures like
> ppc64 where alignment rules are different with 64K page size.
>
> This patch series is dependent on dax vmemmap optimization series
> posted here
> https://lore.kernel.org/linux-mm/20230718022934.90447-1-aneesh.kumar@linux.ibm.com/
>
> Changes from v6:
> * Update comments in the code
> * Update commit message for patch 7
>
> Changes from v5:
> * Update commit message
> * Move memory alloc/free to the callers in patch 6
> * Address review feedback w.r.t patch 4
>
> Changes from v4:
> * Use altmap.free instead of altmap.reserve
> * Address review feedback
>
> Changes from v3:
> * Extend the module parameter memmap_on_memory to force allocation even
>   though we can waste hotplug memory.
>
> Changes from v2:
> * Rebase to latest linus tree
> * Redo the series based on review feedback. Multiple changes to the patchset.
>
> Changes from v1:
> * update the memblock to store vmemmap_altmap details. This is required
>   so that when we remove the memory we can find the altmap details which
>   is needed on some architectures.
> * rebase to latest linus tree
>
> Aneesh Kumar K.V (7):
>   mm/memory_hotplug: Simplify ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE kconfig
>   mm/memory_hotplug: Allow memmap on memory hotplug request to fallback
>   mm/memory_hotplug: Allow architecture to override memmap on memory
>     support check
>   mm/memory_hotplug: Support memmap_on_memory when memmap is not aligned
>     to pageblocks
>   powerpc/book3s64/memhotplug: Enable memmap on memory for radix
>   mm/memory_hotplug: Embed vmem_altmap details in memory block
>   mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
>
>  .../admin-guide/mm/memory-hotplug.rst          |  12 +
>  arch/arm64/Kconfig                             |   4 +-
>  arch/powerpc/Kconfig                           |   1 +
>  arch/powerpc/include/asm/pgtable.h             |  21 ++
>  .../platforms/pseries/hotplug-memory.c         |   2 +-
>  arch/x86/Kconfig                               |   4 +-
>  drivers/acpi/acpi_memhotplug.c                 |   3 +-
>  drivers/base/memory.c                          |  27 ++-
>  include/linux/memory.h                         |   8 +-
>  include/linux/memory_hotplug.h                 |   3 +-
>  mm/Kconfig                                     |   3 +
>  mm/memory_hotplug.c                            | 205 ++++++++++++++----
>  12 files changed, 220 insertions(+), 73 deletions(-)
>
> --
> 2.41.0
On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote:
> On 8/1/23 2:28 PM, Michal Hocko wrote:
> > On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote:
> >> Allow updating memmap_on_memory mode after the kernel boot. Memory
> >> hotplug done after the mode update will use the new mmemap_on_memory
> >> value.
> >
> > Well, this is a user space kABI extension and as such you should spend
> > more words about the usecase. Why we could live with this static and now
> > need dynamic?
>
> This enables easy testing of memmap_on_memory feature without a kernel reboot.

Testing alone is rather weak argument to be honest.

> I also expect people wanting to use that when they find dax kmem memory online
> failing because of struct page allocation failures[1]. User could reboot back with
> memmap_on_memory=y kernel parameter. But being able to enable it via sysfs makes
> the feature much more useful.

Sure it can be useful but that holds for any feature, right. The main
question is whether this is worth maintaing. The current implementation
seems rather trivial which is an argument to have it but are there any
risks long term? Have you evaluated a potential long term maintenance
cost? There is no easy way to go back and disable it later on without
breaking some userspace.

All that should be in the changelog!
On Tue, 2023-08-01 at 10:11 +0530, Aneesh Kumar K.V wrote:
> With memmap on memory, some architecture needs more details w.r.t altmap
> such as base_pfn, end_pfn, etc to unmap vmemmap memory. Instead of
> computing them again when we remove a memory block, embed vmem_altmap
> details in struct memory_block if we are using memmap on memory block
> feature.
>
> Acked-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  drivers/base/memory.c  | 27 +++++++++++++--------
>  include/linux/memory.h |  8 ++----
>  mm/memory_hotplug.c    | 55 ++++++++++++++++++++++++++----------------
>  3 files changed, 53 insertions(+), 37 deletions(-)
>
<snip>

> @@ -2136,10 +2148,10 @@ EXPORT_SYMBOL(try_offline_node);
>
>  static int __ref try_remove_memory(u64 start, u64 size)
>  {
> -	struct vmem_altmap mhp_altmap = {};
> -	struct vmem_altmap *altmap = NULL;
> -	unsigned long nr_vmemmap_pages;
> +	int ret;

Minor nit - there is already an 'int rc' below - just use that, or
rename it to 'ret' if that's better for consistency.

> +	struct memory_block *mem;
>  	int rc = 0, nid = NUMA_NO_NODE;
> +	struct vmem_altmap *altmap = NULL;
>
>  	BUG_ON(check_hotplug_memory_range(start, size));
>
> @@ -2161,25 +2173,20 @@ static int __ref try_remove_memory(u64 start, u64 size)
>  	 * the same granularity it was added - a single memory block.
>  	 */
>  	if (mhp_memmap_on_memory()) {
> -		nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
> -						      get_nr_vmemmap_pages_cb);
> -		if (nr_vmemmap_pages) {
> +		ret = walk_memory_blocks(start, size, &mem, test_has_altmap_cb);
> +		if (ret) {
>  			if (size != memory_block_size_bytes()) {
>  				pr_warn("Refuse to remove %#llx - %#llx,"
>  					"wrong granularity\n",
>  					start, start + size);
>  				return -EINVAL;
>  			}
> -
> +			altmap = mem->altmap;
>  			/*
> -			 * Let remove_pmd_table->free_hugepage_table do the
> -			 * right thing if we used vmem_altmap when hot-adding
> -			 * the range.
> +			 * Mark altmap NULL so that we can add a debug
> +			 * check on memblock free.
>  			 */
> -			mhp_altmap.base_pfn = PHYS_PFN(start);
> -			mhp_altmap.free = nr_vmemmap_pages;
> -			mhp_altmap.alloc = nr_vmemmap_pages;
> -			altmap = &mhp_altmap;
> +			mem->altmap = NULL;
>  		}
>  	}
>
On 8/1/23 4:20 PM, Michal Hocko wrote:
> On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote:
>> On 8/1/23 2:28 PM, Michal Hocko wrote:
>>> On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote:
>>>> Allow updating memmap_on_memory mode after the kernel boot. Memory
>>>> hotplug done after the mode update will use the new mmemap_on_memory
>>>> value.
>>>
>>> Well, this is a user space kABI extension and as such you should spend
>>> more words about the usecase. Why we could live with this static and now
>>> need dynamic?
>>
>> This enables easy testing of memmap_on_memory feature without a kernel reboot.
>
> Testing alone is rather weak argument to be honest.
>
>> I also expect people wanting to use that when they find dax kmem memory online
>> failing because of struct page allocation failures[1]. User could reboot back with
>> memmap_on_memory=y kernel parameter. But being able to enable it via sysfs makes
>> the feature much more useful.
>
> Sure it can be useful but that holds for any feature, right. The main
> question is whether this is worth maintaing. The current implementation
> seems rather trivial which is an argument to have it but are there any
> risks long term? Have you evaluated a potential long term maintenance
> cost? There is no easy way to go back and disable it later on without
> breaking some userspace.
>
> All that should be in the changelog!

I updated it as below.

mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter

Allow updating memmap_on_memory mode after the kernel boot. Memory
hotplug done after the mode update will use the new mmemap_on_memory
value.

It is now possible to test the memmap_on_memory feature easily without
the need for a kernel reboot. Additionally, for those encountering
struct page allocation failures while using dax kmem memory online, this
feature may prove useful. Instead of rebooting with the
memmap_on_memory=y kernel parameter, users can now enable it via sysfs,
which greatly enhances its usefulness.

Making the necessary changes to support runtime updates is a simple
change that should not affect the addition of new features or result in
long-term maintenance problems.
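For context, the mechanism patch 7 builds on is the kernel's writable module
parameter support: newly hotplugged memory simply reads the current value of
the knob. A minimal sketch of such a parameter is below -- illustrative only,
not the code from the patch, whose custom set handler additionally has to
validate the requested mode (e.g. the "force" variant discussed earlier in
the thread):

```c
/*
 * Illustrative sketch of a runtime-writable module parameter; NOT the
 * actual patch 7. The real handler also validates/normalizes the mode.
 */
#include <linux/module.h>
#include <linux/moduleparam.h>

static bool memmap_on_memory;

static int set_memmap_on_memory(const char *val, const struct kernel_param *kp)
{
	/*
	 * Only memory hotplugged after this write sees the new value;
	 * blocks that are already plugged keep the layout they were
	 * added with.
	 */
	return param_set_bool(val, kp);
}

static const struct kernel_param_ops memmap_on_memory_ops = {
	.flags	= KERNEL_PARAM_OPS_FL_NOARG,
	.set	= set_memmap_on_memory,
	.get	= param_get_bool,
};
module_param_cb(memmap_on_memory, &memmap_on_memory_ops,
		&memmap_on_memory, 0644);
MODULE_PARM_DESC(memmap_on_memory,
		 "Enable memmap on memory for memory hotplug (runtime writable)");
```

With a writable permission such as 0644, the knob shows up under
/sys/module/memory_hotplug/parameters/memmap_on_memory and can be flipped
with a plain write, which is the sysfs path the changelog above refers to.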
On 8/2/23 4:40 AM, Verma, Vishal L wrote: > On Tue, 2023-08-01 at 10:11 +0530, Aneesh Kumar K.V wrote: >> With memmap on memory, some architecture needs more details w.r.t altmap >> such as base_pfn, end_pfn, etc to unmap vmemmap memory. Instead of >> computing them again when we remove a memory block, embed vmem_altmap >> details in struct memory_block if we are using memmap on memory block >> feature. >> >> Acked-by: David Hildenbrand <david@redhat.com> >> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> >> --- >> drivers/base/memory.c | 27 +++++++++++++-------- >> include/linux/memory.h | 8 ++---- >> mm/memory_hotplug.c | 55 ++++++++++++++++++++++++++---------------- >> 3 files changed, 53 insertions(+), 37 deletions(-) >> > <snip> > >> @@ -2136,10 +2148,10 @@ EXPORT_SYMBOL(try_offline_node); >> >> static int __ref try_remove_memory(u64 start, u64 size) >> { >> - struct vmem_altmap mhp_altmap = {}; >> - struct vmem_altmap *altmap = NULL; >> - unsigned long nr_vmemmap_pages; >> + int ret; > > Minor nit - there is already an 'int rc' below - just use that, or > rename it to 'ret' if that's better for consistency. > I reused the existing rc variable. >> + struct memory_block *mem; >> int rc = 0, nid = NUMA_NO_NODE; >> + struct vmem_altmap *altmap = NULL; >> >> BUG_ON(check_hotplug_memory_range(start, size)); >> >> @@ -2161,25 +2173,20 @@ static int __ref try_remove_memory(u64 start, u64 size) >> * the same granularity it was added - a single memory block. >> */ >> if (mhp_memmap_on_memory()) { >> - nr_vmemmap_pages = walk_memory_blocks(start, size, NULL, >> - get_nr_vmemmap_pages_cb); >> - if (nr_vmemmap_pages) { >> + ret = walk_memory_blocks(start, size, &mem, test_has_altmap_cb); >> + if (ret) { >> if (size != memory_block_size_bytes()) { >> pr_warn("Refuse to remove %#llx - %#llx," >> "wrong granularity\n", >> start, start + size); >> return -EINVAL; >> } >> - >> + altmap = mem->altmap; >> /* >> - * Let remove_pmd_table->free_hugepage_table do the >> - * right thing if we used vmem_altmap when hot-adding >> - * the range. >> + * Mark altmap NULL so that we can add a debug >> + * check on memblock free. >> */ >> - mhp_altmap.base_pfn = PHYS_PFN(start); >> - mhp_altmap.free = nr_vmemmap_pages; >> - mhp_altmap.alloc = nr_vmemmap_pages; >> - altmap = &mhp_altmap; >> + mem->altmap = NULL; >> } >> } >> Thank you for taking the time to review the patch. -aneesh
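The hunk discussed above calls test_has_altmap_cb() without quoting it. Based
on how walk_memory_blocks() callbacks behave (the walk stops and propagates
the return value as soon as a callback returns non-zero), the helper plausibly
looks like the sketch below; treat it as illustrative, the real one lives in
the part of the patch that was snipped:

```c
/*
 * Plausible shape of the callback used by the hunk above (illustrative;
 * the actual helper is in the unquoted part of the patch).
 * walk_memory_blocks() stops walking once a callback returns non-zero,
 * so this hands the first memory block carrying an altmap back to the
 * caller through @arg.
 */
static int test_has_altmap_cb(struct memory_block *mem, void *arg)
{
	struct memory_block **mem_ptr = (struct memory_block **)arg;

	if (mem->altmap) {
		*mem_ptr = mem;
		return 1;
	}
	return 0;
}
```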
On Wed 02-08-23 10:15:04, Aneesh Kumar K V wrote: > On 8/1/23 4:20 PM, Michal Hocko wrote: > > On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote: > >> On 8/1/23 2:28 PM, Michal Hocko wrote: > >>> On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote: > >>>> Allow updating memmap_on_memory mode after the kernel boot. Memory > >>>> hotplug done after the mode update will use the new mmemap_on_memory > >>>> value. > >>> > >>> Well, this is a user space kABI extension and as such you should spend > >>> more words about the usecase. Why we could live with this static and now > >>> need dynamic? > >>> > >> > >> This enables easy testing of memmap_on_memory feature without a kernel reboot. > > > > Testing alone is rather weak argument to be honest. > > > >> I also expect people wanting to use that when they find dax kmem memory online > >> failing because of struct page allocation failures[1]. User could reboot back with > >> memmap_on_memory=y kernel parameter. But being able to enable it via sysfs makes > >> the feature much more useful. > > > > Sure it can be useful but that holds for any feature, right. The main > > question is whether this is worth maintaing. The current implementation > > seems rather trivial which is an argument to have it but are there any > > risks long term? Have you evaluated a potential long term maintenance > > cost? There is no easy way to go back and disable it later on without > > breaking some userspace. > > > > All that should be in the changelog! > > I updated it as below. > > mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter > > Allow updating memmap_on_memory mode after the kernel boot. Memory > hotplug done after the mode update will use the new mmemap_on_memory > value. > > It is now possible to test the memmap_on_memory feature easily without > the need for a kernel reboot. Additionally, for those encountering > struct page allocation failures while using dax kmem memory online, this > feature may prove useful. Instead of rebooting with the > memmap_on_memory=y kernel parameter, users can now enable it via sysfs, > which greatly enhances its usefulness. I do not really see a solid argument why rebooting is really a problem TBH. Also is the global policy knob really the right fit for existing hotplug usecases? In other words, if we really want to make memmap_on_memory more flexible would it make more sense to have it per memory block property instead (the global knob being the default if admin doesn't specify it differently).
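To make the per-memory-block idea a bit more concrete: such a property would
presumably be a sysfs attribute on the memory block device, roughly like the
hypothetical fragment below. Nothing of the sort exists in this series, and
the store side deliberately stops where the hard problem starts -- relocating
the vmemmap -- which is exactly what the rest of the thread debates:

```c
/*
 * Hypothetical sketch of a per-memory-block memmap_on_memory attribute,
 * as floated above; it does not exist in the series. It would sit next
 * to the other attributes in drivers/base/memory.c and only accept
 * writes while the block is offline.
 */
static ssize_t memmap_on_memory_show(struct device *dev,
				     struct device_attribute *attr, char *buf)
{
	struct memory_block *mem = to_memory_block(dev);

	/* mem->altmap is introduced by patch 6 of this series. */
	return sysfs_emit(buf, "%d\n", mem->altmap ? 1 : 0);
}

static ssize_t memmap_on_memory_store(struct device *dev,
				      struct device_attribute *attr,
				      const char *buf, size_t count)
{
	struct memory_block *mem = to_memory_block(dev);

	if (mem->state != MEM_OFFLINE)
		return -EBUSY;
	/*
	 * Moving the memmap into/out of the block would have to happen
	 * here; that is the open question discussed below.
	 */
	return count;
}
static DEVICE_ATTR_RW(memmap_on_memory);
```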
On 02.08.23 17:50, Michal Hocko wrote: > On Wed 02-08-23 10:15:04, Aneesh Kumar K V wrote: >> On 8/1/23 4:20 PM, Michal Hocko wrote: >>> On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote: >>>> On 8/1/23 2:28 PM, Michal Hocko wrote: >>>>> On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote: >>>>>> Allow updating memmap_on_memory mode after the kernel boot. Memory >>>>>> hotplug done after the mode update will use the new mmemap_on_memory >>>>>> value. >>>>> >>>>> Well, this is a user space kABI extension and as such you should spend >>>>> more words about the usecase. Why we could live with this static and now >>>>> need dynamic? >>>>> >>>> >>>> This enables easy testing of memmap_on_memory feature without a kernel reboot. >>> >>> Testing alone is rather weak argument to be honest. >>> >>>> I also expect people wanting to use that when they find dax kmem memory online >>>> failing because of struct page allocation failures[1]. User could reboot back with >>>> memmap_on_memory=y kernel parameter. But being able to enable it via sysfs makes >>>> the feature much more useful. >>> >>> Sure it can be useful but that holds for any feature, right. The main >>> question is whether this is worth maintaing. The current implementation >>> seems rather trivial which is an argument to have it but are there any >>> risks long term? Have you evaluated a potential long term maintenance >>> cost? There is no easy way to go back and disable it later on without >>> breaking some userspace. >>> >>> All that should be in the changelog! >> >> I updated it as below. >> >> mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter >> >> Allow updating memmap_on_memory mode after the kernel boot. Memory >> hotplug done after the mode update will use the new mmemap_on_memory >> value. >> >> It is now possible to test the memmap_on_memory feature easily without >> the need for a kernel reboot. Additionally, for those encountering >> struct page allocation failures while using dax kmem memory online, this >> feature may prove useful. Instead of rebooting with the >> memmap_on_memory=y kernel parameter, users can now enable it via sysfs, >> which greatly enhances its usefulness. > > > I do not really see a solid argument why rebooting is really a problem > TBH. Also is the global policy knob really the right fit for existing > hotplug usecases? In other words, if we really want to make > memmap_on_memory more flexible would it make more sense to have it per > memory block property instead (the global knob being the default if > admin doesn't specify it differently). Per memory block isn't possible, due to the creation order. Also, I think it's not the right approach. I thought about driver toggles. At least for dax/kmem people are looking into that: https://lkml.kernel.org/r/20230801-vv-kmem_memmap-v3-2-406e9aaf5689@intel.com Where that can also be toggled per device.
On 8/2/23 9:24 PM, David Hildenbrand wrote: > On 02.08.23 17:50, Michal Hocko wrote: >> On Wed 02-08-23 10:15:04, Aneesh Kumar K V wrote: >>> On 8/1/23 4:20 PM, Michal Hocko wrote: >>>> On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote: >>>>> On 8/1/23 2:28 PM, Michal Hocko wrote: >>>>>> On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote: >>>>>>> Allow updating memmap_on_memory mode after the kernel boot. Memory >>>>>>> hotplug done after the mode update will use the new mmemap_on_memory >>>>>>> value. >>>>>> >>>>>> Well, this is a user space kABI extension and as such you should spend >>>>>> more words about the usecase. Why we could live with this static and now >>>>>> need dynamic? >>>>>> >>>>> >>>>> This enables easy testing of memmap_on_memory feature without a kernel reboot. >>>> >>>> Testing alone is rather weak argument to be honest. >>>> >>>>> I also expect people wanting to use that when they find dax kmem memory online >>>>> failing because of struct page allocation failures[1]. User could reboot back with >>>>> memmap_on_memory=y kernel parameter. But being able to enable it via sysfs makes >>>>> the feature much more useful. >>>> >>>> Sure it can be useful but that holds for any feature, right. The main >>>> question is whether this is worth maintaing. The current implementation >>>> seems rather trivial which is an argument to have it but are there any >>>> risks long term? Have you evaluated a potential long term maintenance >>>> cost? There is no easy way to go back and disable it later on without >>>> breaking some userspace. >>>> >>>> All that should be in the changelog! >>> >>> I updated it as below. >>> >>> mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter >>> >>> Allow updating memmap_on_memory mode after the kernel boot. Memory >>> hotplug done after the mode update will use the new mmemap_on_memory >>> value. >>> >>> It is now possible to test the memmap_on_memory feature easily without >>> the need for a kernel reboot. Additionally, for those encountering >>> struct page allocation failures while using dax kmem memory online, this >>> feature may prove useful. Instead of rebooting with the >>> memmap_on_memory=y kernel parameter, users can now enable it via sysfs, >>> which greatly enhances its usefulness. >> >> >> I do not really see a solid argument why rebooting is really a problem >> TBH. Also is the global policy knob really the right fit for existing >> hotplug usecases? In other words, if we really want to make >> memmap_on_memory more flexible would it make more sense to have it per >> memory block property instead (the global knob being the default if >> admin doesn't specify it differently). > > Per memory block isn't possible, due to the creation order. Also, I think it's not the right approach. > > I thought about driver toggles. At least for dax/kmem people are looking into that: > > https://lkml.kernel.org/r/20230801-vv-kmem_memmap-v3-2-406e9aaf5689@intel.com > > Where that can also be toggled per device. > That still is dependent on the global memmap_on_memory enabled right? We could make the dax facility independent of the global feature toggle? -aneesh
On Wed, 2023-08-02 at 21:27 +0530, Aneesh Kumar K V wrote: > On 8/2/23 9:24 PM, David Hildenbrand wrote: > > On 02.08.23 17:50, Michal Hocko wrote: > > > On Wed 02-08-23 10:15:04, Aneesh Kumar K V wrote: > > > > On 8/1/23 4:20 PM, Michal Hocko wrote: > > > > > On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote: > > > > > > On 8/1/23 2:28 PM, Michal Hocko wrote: > > > > > > > On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote: > > > > > > > > Allow updating memmap_on_memory mode after the kernel boot. Memory > > > > > > > > hotplug done after the mode update will use the new mmemap_on_memory > > > > > > > > value. > > > > > > > > > > > > > > Well, this is a user space kABI extension and as such you should spend > > > > > > > more words about the usecase. Why we could live with this static and now > > > > > > > need dynamic? > > > > > > > > > > > > > > > > > > > This enables easy testing of memmap_on_memory feature without a kernel reboot. > > > > > > > > > > Testing alone is rather weak argument to be honest. > > > > > > > > > > > I also expect people wanting to use that when they find dax kmem memory online > > > > > > failing because of struct page allocation failures[1]. User could reboot back with > > > > > > memmap_on_memory=y kernel parameter. But being able to enable it via sysfs makes > > > > > > the feature much more useful. > > > > > > > > > > Sure it can be useful but that holds for any feature, right. The main > > > > > question is whether this is worth maintaing. The current implementation > > > > > seems rather trivial which is an argument to have it but are there any > > > > > risks long term? Have you evaluated a potential long term maintenance > > > > > cost? There is no easy way to go back and disable it later on without > > > > > breaking some userspace. > > > > > > > > > > All that should be in the changelog! > > > > > > > > I updated it as below. > > > > > > > > mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter > > > > > > > > Allow updating memmap_on_memory mode after the kernel boot. Memory > > > > hotplug done after the mode update will use the new mmemap_on_memory > > > > value. > > > > > > > > It is now possible to test the memmap_on_memory feature easily without > > > > the need for a kernel reboot. Additionally, for those encountering > > > > struct page allocation failures while using dax kmem memory online, this > > > > feature may prove useful. Instead of rebooting with the > > > > memmap_on_memory=y kernel parameter, users can now enable it via sysfs, > > > > which greatly enhances its usefulness. > > > > > > > > > I do not really see a solid argument why rebooting is really a problem > > > TBH. Also is the global policy knob really the right fit for existing > > > hotplug usecases? In other words, if we really want to make > > > memmap_on_memory more flexible would it make more sense to have it per > > > memory block property instead (the global knob being the default if > > > admin doesn't specify it differently). > > > > Per memory block isn't possible, due to the creation order. Also, I think it's not the right approach. > > > > I thought about driver toggles. At least for dax/kmem people are looking into that: > > > > https://lkml.kernel.org/r/20230801-vv-kmem_memmap-v3-2-406e9aaf5689@intel.com > > > > Where that can also be toggled per device. > > > > That still is dependent on the global memmap_on_memory enabled right? We could make the dax facility independent of the > global feature toggle? 
Correct - the latest version of those David linked does depend on the global memmap_on_memory param. Since kmem's behavior for dax devices is set up to be turned on / off dynamically via sysfs, it would be nice to have a similar facility for this flag. I did try the making dax independent of memmap_on_memory approach in my first posting: https://lore.kernel.org/nvdimm/20230613-vv-kmem_memmap-v1-1-f6de9c6af2c6@intel.com/ We can certainly revisit that if it looks more suitable.
On Wed 02-08-23 17:54:04, David Hildenbrand wrote: > On 02.08.23 17:50, Michal Hocko wrote: > > On Wed 02-08-23 10:15:04, Aneesh Kumar K V wrote: > > > On 8/1/23 4:20 PM, Michal Hocko wrote: > > > > On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote: > > > > > On 8/1/23 2:28 PM, Michal Hocko wrote: > > > > > > On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote: > > > > > > > Allow updating memmap_on_memory mode after the kernel boot. Memory > > > > > > > hotplug done after the mode update will use the new mmemap_on_memory > > > > > > > value. > > > > > > > > > > > > Well, this is a user space kABI extension and as such you should spend > > > > > > more words about the usecase. Why we could live with this static and now > > > > > > need dynamic? > > > > > > > > > > > > > > > > This enables easy testing of memmap_on_memory feature without a kernel reboot. > > > > > > > > Testing alone is rather weak argument to be honest. > > > > > > > > > I also expect people wanting to use that when they find dax kmem memory online > > > > > failing because of struct page allocation failures[1]. User could reboot back with > > > > > memmap_on_memory=y kernel parameter. But being able to enable it via sysfs makes > > > > > the feature much more useful. > > > > > > > > Sure it can be useful but that holds for any feature, right. The main > > > > question is whether this is worth maintaing. The current implementation > > > > seems rather trivial which is an argument to have it but are there any > > > > risks long term? Have you evaluated a potential long term maintenance > > > > cost? There is no easy way to go back and disable it later on without > > > > breaking some userspace. > > > > > > > > All that should be in the changelog! > > > > > > I updated it as below. > > > > > > mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter > > > > > > Allow updating memmap_on_memory mode after the kernel boot. Memory > > > hotplug done after the mode update will use the new mmemap_on_memory > > > value. > > > > > > It is now possible to test the memmap_on_memory feature easily without > > > the need for a kernel reboot. Additionally, for those encountering > > > struct page allocation failures while using dax kmem memory online, this > > > feature may prove useful. Instead of rebooting with the > > > memmap_on_memory=y kernel parameter, users can now enable it via sysfs, > > > which greatly enhances its usefulness. > > > > > > I do not really see a solid argument why rebooting is really a problem > > TBH. Also is the global policy knob really the right fit for existing > > hotplug usecases? In other words, if we really want to make > > memmap_on_memory more flexible would it make more sense to have it per > > memory block property instead (the global knob being the default if > > admin doesn't specify it differently). > > Per memory block isn't possible, due to the creation order. I am not sure I follow. Could you elaborate more? > Also, I think it's not the right approach. > > I thought about driver toggles. At least for dax/kmem people are looking > into that: > > https://lkml.kernel.org/r/20230801-vv-kmem_memmap-v3-2-406e9aaf5689@intel.com > > Where that can also be toggled per device. Per device flag makes sense to me as well. But what we should do about hotplugged memory with a backing device (like regular RAM). I can see some reason to have large hotplugged memory ranges to be self vmemap hosted while smaller blocks to be backed by external vmemmaps.
On Wed 02-08-23 18:59:04, Michal Hocko wrote: > On Wed 02-08-23 17:54:04, David Hildenbrand wrote: > > On 02.08.23 17:50, Michal Hocko wrote: > > > On Wed 02-08-23 10:15:04, Aneesh Kumar K V wrote: > > > > On 8/1/23 4:20 PM, Michal Hocko wrote: > > > > > On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote: > > > > > > On 8/1/23 2:28 PM, Michal Hocko wrote: > > > > > > > On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote: > > > > > > > > Allow updating memmap_on_memory mode after the kernel boot. Memory > > > > > > > > hotplug done after the mode update will use the new mmemap_on_memory > > > > > > > > value. > > > > > > > > > > > > > > Well, this is a user space kABI extension and as such you should spend > > > > > > > more words about the usecase. Why we could live with this static and now > > > > > > > need dynamic? > > > > > > > > > > > > > > > > > > > This enables easy testing of memmap_on_memory feature without a kernel reboot. > > > > > > > > > > Testing alone is rather weak argument to be honest. > > > > > > > > > > > I also expect people wanting to use that when they find dax kmem memory online > > > > > > failing because of struct page allocation failures[1]. User could reboot back with > > > > > > memmap_on_memory=y kernel parameter. But being able to enable it via sysfs makes > > > > > > the feature much more useful. > > > > > > > > > > Sure it can be useful but that holds for any feature, right. The main > > > > > question is whether this is worth maintaing. The current implementation > > > > > seems rather trivial which is an argument to have it but are there any > > > > > risks long term? Have you evaluated a potential long term maintenance > > > > > cost? There is no easy way to go back and disable it later on without > > > > > breaking some userspace. > > > > > > > > > > All that should be in the changelog! > > > > > > > > I updated it as below. > > > > > > > > mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter > > > > > > > > Allow updating memmap_on_memory mode after the kernel boot. Memory > > > > hotplug done after the mode update will use the new mmemap_on_memory > > > > value. > > > > > > > > It is now possible to test the memmap_on_memory feature easily without > > > > the need for a kernel reboot. Additionally, for those encountering > > > > struct page allocation failures while using dax kmem memory online, this > > > > feature may prove useful. Instead of rebooting with the > > > > memmap_on_memory=y kernel parameter, users can now enable it via sysfs, > > > > which greatly enhances its usefulness. > > > > > > > > > I do not really see a solid argument why rebooting is really a problem > > > TBH. Also is the global policy knob really the right fit for existing > > > hotplug usecases? In other words, if we really want to make > > > memmap_on_memory more flexible would it make more sense to have it per > > > memory block property instead (the global knob being the default if > > > admin doesn't specify it differently). > > > > Per memory block isn't possible, due to the creation order. > > I am not sure I follow. Could you elaborate more? Must have been a tired brain. Now I see what you mean of course. vmemmap is allocated at the time the range is registered and therefore memory block sysfs created. So you are right that it is too late in some sense but I still believe that this would be doable. The memmap_on_memory file would be readable only when the block is offline and it would reallocate vmemmap on the change. Makes sense? Are there any risks? 
Maybe pfn walkers?
On 03.08.23 10:52, Michal Hocko wrote: > On Wed 02-08-23 18:59:04, Michal Hocko wrote: >> On Wed 02-08-23 17:54:04, David Hildenbrand wrote: >>> On 02.08.23 17:50, Michal Hocko wrote: >>>> On Wed 02-08-23 10:15:04, Aneesh Kumar K V wrote: >>>>> On 8/1/23 4:20 PM, Michal Hocko wrote: >>>>>> On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote: >>>>>>> On 8/1/23 2:28 PM, Michal Hocko wrote: >>>>>>>> On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote: >>>>>>>>> Allow updating memmap_on_memory mode after the kernel boot. Memory >>>>>>>>> hotplug done after the mode update will use the new mmemap_on_memory >>>>>>>>> value. >>>>>>>> >>>>>>>> Well, this is a user space kABI extension and as such you should spend >>>>>>>> more words about the usecase. Why we could live with this static and now >>>>>>>> need dynamic? >>>>>>>> >>>>>>> >>>>>>> This enables easy testing of memmap_on_memory feature without a kernel reboot. >>>>>> >>>>>> Testing alone is rather weak argument to be honest. >>>>>> >>>>>>> I also expect people wanting to use that when they find dax kmem memory online >>>>>>> failing because of struct page allocation failures[1]. User could reboot back with >>>>>>> memmap_on_memory=y kernel parameter. But being able to enable it via sysfs makes >>>>>>> the feature much more useful. >>>>>> >>>>>> Sure it can be useful but that holds for any feature, right. The main >>>>>> question is whether this is worth maintaing. The current implementation >>>>>> seems rather trivial which is an argument to have it but are there any >>>>>> risks long term? Have you evaluated a potential long term maintenance >>>>>> cost? There is no easy way to go back and disable it later on without >>>>>> breaking some userspace. >>>>>> >>>>>> All that should be in the changelog! >>>>> >>>>> I updated it as below. >>>>> >>>>> mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter >>>>> >>>>> Allow updating memmap_on_memory mode after the kernel boot. Memory >>>>> hotplug done after the mode update will use the new mmemap_on_memory >>>>> value. >>>>> >>>>> It is now possible to test the memmap_on_memory feature easily without >>>>> the need for a kernel reboot. Additionally, for those encountering >>>>> struct page allocation failures while using dax kmem memory online, this >>>>> feature may prove useful. Instead of rebooting with the >>>>> memmap_on_memory=y kernel parameter, users can now enable it via sysfs, >>>>> which greatly enhances its usefulness. >>>> >>>> >>>> I do not really see a solid argument why rebooting is really a problem >>>> TBH. Also is the global policy knob really the right fit for existing >>>> hotplug usecases? In other words, if we really want to make >>>> memmap_on_memory more flexible would it make more sense to have it per >>>> memory block property instead (the global knob being the default if >>>> admin doesn't specify it differently). >>> >>> Per memory block isn't possible, due to the creation order. >> >> I am not sure I follow. Could you elaborate more? > > Must have been a tired brain. Now I see what you mean of course. vmemmap > is allocated at the time the range is registered and therefore memory > block sysfs created. So you are right that it is too late in some sense > but I still believe that this would be doable. The memmap_on_memory file Exactly. > would be readable only when the block is offline and it would reallocate > vmemmap on the change. Makes sense? Are there any risks? Maybe pfn > walkers? 
The question is: is it of any real value such that it would be worth the
cost and risk?

One of the primary reasons for memmap_on_memory is that *memory hotplug*
succeeds even in low-memory situations (including, low on ZONE_NORMAL
situations). So you want that behavior already when hotplugging such
devices. While there might be value to relocate it later, I'm not sure if
that is really worth it, and it does not solve the main use case.

For dax/kmem memory hotplug this is controlled by user space, so user
space can easily configure it. For DIMMs it's just a hotplug event that
triggers all that in the kernel, and one has to plan ahead (e.g., set the
global kernel parameter).

Besides, devices like virtio-mem really cannot universally support
memmap_on_memory.
On Thu 03-08-23 11:24:08, David Hildenbrand wrote:
[...]
> > would be readable only when the block is offline and it would reallocate
> > vmemmap on the change. Makes sense? Are there any risks? Maybe pfn
> > walkers?
>
> The question is: is it of any real value such that it would be worth the
> cost and risk?
>
> One of the primary reasons for memmap_on_memory is that *memory hotplug*
> succeeds even in low-memory situations (including, low on ZONE_NORMAL
> situations).

One usecase I would have in mind is a mix of smaller and larger memory
blocks. For larger ones you want to have memmap_on_memory in general
because they do not eat memory from outside but small(er) ones might be
more tricky because now you can add a lot of blocks that would be
internally fragmented to prevent larger allocations to form.

> So you want that behavior already when hotplugging such
> devices. While there might be value to relocate it later, I'm not sure if
> that is really worth it, and it does not solve the main use case.

Is it worth it? TBH I am not sure same as I am not sure the global
default should be writable after boot. If we want to make it more
dynamic we should however talk about the proper layer this is
implemented on.
On Sat 05-08-23 19:54:23, Aneesh Kumar K V wrote:
[...]
> Do you see a need for firmware-managed memory to be hotplugged in with
> different memory block sizes?

In short. Yes. Slightly longer, a fixed size memory block semantic is
just standing in the way and I would even argue it is actively harmful.
Just have a look at ridicously small memory blocks on ppc. I do
understand that it makes some sense to be aligned to the memory model
(so sparsmem section aligned). In an ideal world, memory hotplug v2
interface (if we ever go that path) should be physical memory range based.
On 07.08.23 14:27, Michal Hocko wrote: > On Sat 05-08-23 19:54:23, Aneesh Kumar K V wrote: > [...] >> Do you see a need for firmware-managed memory to be hotplugged in with >> different memory block sizes? > > In short. Yes. Slightly longer, a fixed size memory block semantic is > just standing in the way and I would even argue it is actively harmful. > Just have a look at ridicously small memory blocks on ppc. I do > understand that it makes some sense to be aligned to the memory model > (so sparsmem section aligned). In an ideal world, memory hotplug v2 > interface (if we ever go that path) should be physical memory range based. Yes, we discussed that a couple of times already (and so far nobody cared to implement any of that). Small memory block sizes are very beneficial for use cases like PPC dlar, virtio-mem, hyperv-balloon, ... essentially in most virtual environments where you might want to add/remove memory in very small granularity. I don't see that changing any time soon. Rather the opposite. Small memory block sizes are suboptimal for large machines where you might never end up removing such memory (boot memory), or when dealing with devices that can only be removed in one piece (DIMM/kmem). We already have memory groups in place to model that. For the latter it might be beneficial to have memory blocks of larger size that correspond to the physical memory ranges. That might also make a memmap (re-)configuration easier. Not sure if that is standing in any way or is harmful, though.
On 03.08.23 13:30, Michal Hocko wrote:
> On Thu 03-08-23 11:24:08, David Hildenbrand wrote:
> [...]
>>> would be readable only when the block is offline and it would reallocate
>>> vmemmap on the change. Makes sense? Are there any risks? Maybe pfn
>>> walkers?
>>
>> The question is: is it of any real value such that it would be worth the
>> cost and risk?
>>
>> One of the primary reasons for memmap_on_memory is that *memory hotplug*
>> succeeds even in low-memory situations (including, low on ZONE_NORMAL
>> situations).

Sorry for the late reply, I'm busy with 100 other things.

> One usecase I would have in mind is a mix of smaller and larger memory
> blocks. For larger ones you want to have memmap_on_memory in general
> because they do not eat memory from outside but small(er) ones might be
> more tricky because now you can add a lot of blocks that would be
> internally fragmented to prevent larger allocations to form.

Okay, I see what you mean.

The internal fragmentation might become an issue at some point: for x86-64
with 128 MiB blocks / 2 MiB THP it's not a real issue right now. For an
arm64 64k with 512 MiB blocks and 512 MiB THP / hugetlb it could be one.

I recall discussing that with Oscar back when he added memmap_on_memory,
where we also discussed the variable-sized memory blocks to avoid such
internal fragmentation.

For small ones you probably want to only use memmap_on_memory when
unavoidable: for example, when adding without memmap_on_memory would
fail / already failed. Possibly some later memmap relocation might make
sense in some scenarios.

>> So you want that behavior already when hotplugging such
>> devices. While there might be value to relocate it later, I'm not sure if
>> that is really worth it, and it does not solve the main use case.
>
> Is it worth it? TBH I am not sure same as I am not sure the global
> default should be writable after boot. If we want to make it more
> dynamic we should however talk about the proper layer this is
> implemented on.

Agreed.
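For readers who want the numbers behind that assessment, a back-of-the-envelope
calculation, assuming sizeof(struct page) == 64 bytes (typical, but
configuration dependent):

```c
/*
 * Illustrative userspace calculation of the per-block memmap overhead for
 * the two configurations mentioned above; assumes a 64-byte struct page.
 */
#include <stdio.h>

int main(void)
{
	/* x86-64: 4 KiB base pages, 128 MiB memory blocks, 2 MiB THP */
	unsigned long long x86_pages  = (128ULL << 20) / (4ULL << 10);	/* 32768 */
	unsigned long long x86_memmap = x86_pages * 64;			/* 2 MiB */

	/* arm64/64K: 64 KiB base pages, 512 MiB blocks, 512 MiB THP/hugetlb */
	unsigned long long arm_pages  = (512ULL << 20) / (64ULL << 10);	/* 8192 */
	unsigned long long arm_memmap = arm_pages * 64;			/* 512 KiB */

	printf("x86-64:    %llu KiB memmap per 128 MiB block\n", x86_memmap >> 10);
	printf("arm64/64K: %llu KiB memmap per 512 MiB block\n", arm_memmap >> 10);
	return 0;
}
```

On x86-64 the memmap consumes exactly one of the 64 possible 2 MiB THPs in the
block, whereas on arm64 with 64K pages even a 512 KiB memmap placed inside the
block prevents that block from ever backing a single 512 MiB huge page -- the
internal fragmentation described above.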
On 07.08.23 14:41, David Hildenbrand wrote:
> On 07.08.23 14:27, Michal Hocko wrote:
>> On Sat 05-08-23 19:54:23, Aneesh Kumar K V wrote:
>> [...]
>>> Do you see a need for firmware-managed memory to be hotplugged in with
>>> different memory block sizes?
>>
>> In short. Yes. Slightly longer, a fixed size memory block semantic is
>> just standing in the way and I would even argue it is actively harmful.
>> Just have a look at ridicously small memory blocks on ppc. I do
>> understand that it makes some sense to be aligned to the memory model
>> (so sparsmem section aligned). In an ideal world, memory hotplug v2
>> interface (if we ever go that path) should be physical memory range based.
>
> Yes, we discussed that a couple of times already (and so far nobody
> cared to implement any of that).
>
> Small memory block sizes are very beneficial for use cases like PPC
> dlar, virtio-mem, hyperv-balloon, ... essentially in most virtual
> environments where you might want to add/remove memory in very small
> granularity. I don't see that changing any time soon. Rather the opposite.
>
> Small memory block sizes are suboptimal for large machines where you
> might never end up removing such memory (boot memory), or when dealing
> with devices that can only be removed in one piece (DIMM/kmem). We
> already have memory groups in place to model that.
>
> For the latter it might be beneficial to have memory blocks of larger
> size that correspond to the physical memory ranges. That might also make
> a memmap (re-)configuration easier.
>
> Not sure if that is standing in any way or is harmful, though.

Just because I thought of something right now, I'll share it, maybe it
makes sense.

Assume when we get add_memory*(MHP_MEMMAP_ON_MEMORY) and it is enabled
by the admin:

1) We create a single altmap at the beginning of the memory

2) We create the existing fixed-size memory block devices, but flag them
   to be part of a single "altmap" unit.

3) Whenever we trigger offlining of a single such memory block, we
   offline *all* memory blocks belonging to that altmap, essentially
   using a single offline_pages() call and updating all memory block
   states accordingly.

4) Whenever we trigger onlining of a single such memory block, we
   online *all* memory blocks belonging to that altmap, using a single
   online_pages() call.

5) We fail remove_memory() if it doesn't cover the same (altmap) range.

So we can avoid having a memory block v2 (and all that comes with that
...) for now and still get that altmap stuff sorted out. As that altmap
behavior can be controlled by the admin, we should be fine for now.

I think all memory notifiers should already be able to handle bigger
granularity, but it would be easy to check. Some internal things might
require a bit of tweaking.

Just a thought.
On 8/8/23 12:05 AM, David Hildenbrand wrote: > On 07.08.23 14:41, David Hildenbrand wrote: >> On 07.08.23 14:27, Michal Hocko wrote: >>> On Sat 05-08-23 19:54:23, Aneesh Kumar K V wrote: >>> [...] >>>> Do you see a need for firmware-managed memory to be hotplugged in with >>>> different memory block sizes? >>> >>> In short. Yes. Slightly longer, a fixed size memory block semantic is >>> just standing in the way and I would even argue it is actively harmful. >>> Just have a look at ridicously small memory blocks on ppc. I do >>> understand that it makes some sense to be aligned to the memory model >>> (so sparsmem section aligned). In an ideal world, memory hotplug v2 >>> interface (if we ever go that path) should be physical memory range based. >> >> Yes, we discussed that a couple of times already (and so far nobody >> cared to implement any of that). >> >> Small memory block sizes are very beneficial for use cases like PPC >> dlar, virtio-mem, hyperv-balloon, ... essentially in most virtual >> environments where you might want to add/remove memory in very small >> granularity. I don't see that changing any time soon. Rather the opposite. >> >> Small memory block sizes are suboptimal for large machines where you >> might never end up removing such memory (boot memory), or when dealing >> with devices that can only be removed in one piece (DIMM/kmem). We >> already have memory groups in place to model that. >> >> For the latter it might be beneficial to have memory blocks of larger >> size that correspond to the physical memory ranges. That might also make >> a memmap (re-)configuration easier. >> >> Not sure if that is standing in any way or is harmful, though. >> > > Just because I thought of something right now, I'll share it, maybe it makes sense. > > Assume when we get add_memory*(MHP_MEMMAP_ON_MEMORY) and it is enabled by the admin: > > 1) We create a single altmap at the beginning of the memory > > 2) We create the existing fixed-size memory block devices, but flag them > to be part of a single "altmap" unit. > > 3) Whenever we trigger offlining of a single such memory block, we > offline *all* memory blocks belonging to that altmap, essentially > using a single offline_pages() call and updating all memory block > states accordingly. > > 4) Whenever we trigger onlining of a single such memory block, we > online *all* memory blocks belonging to that altmap, using a single > online_pages() call. > > 5) We fail remove_memory() if it doesn't cover the same (altmap) range. > > So we can avoid having a memory block v2 (and all that comes with that ...) for now and still get that altmap stuff sorted out. As that altmap behavior can be controlled by the admin, we should be fine for now. > > I think all memory notifiers should already be able to handle bigger granularity, but it would be easy to check. Some internal things might require a bit of tweaking. > > Just a thought. > W.r.t enabling memmap_on_memory for ppc64, I guess I will drop patch 7 and resend the series so that Andrew can pick rest of the patches? -aneesh
On 8/8/23 12:05 AM, David Hildenbrand wrote: > On 07.08.23 14:41, David Hildenbrand wrote: >> On 07.08.23 14:27, Michal Hocko wrote: >>> On Sat 05-08-23 19:54:23, Aneesh Kumar K V wrote: >>> [...] >>>> Do you see a need for firmware-managed memory to be hotplugged in with >>>> different memory block sizes? >>> >>> In short. Yes. Slightly longer, a fixed size memory block semantic is >>> just standing in the way and I would even argue it is actively harmful. >>> Just have a look at ridicously small memory blocks on ppc. I do >>> understand that it makes some sense to be aligned to the memory model >>> (so sparsmem section aligned). In an ideal world, memory hotplug v2 >>> interface (if we ever go that path) should be physical memory range based. >> >> Yes, we discussed that a couple of times already (and so far nobody >> cared to implement any of that). >> >> Small memory block sizes are very beneficial for use cases like PPC >> dlar, virtio-mem, hyperv-balloon, ... essentially in most virtual >> environments where you might want to add/remove memory in very small >> granularity. I don't see that changing any time soon. Rather the opposite. >> >> Small memory block sizes are suboptimal for large machines where you >> might never end up removing such memory (boot memory), or when dealing >> with devices that can only be removed in one piece (DIMM/kmem). We >> already have memory groups in place to model that. >> >> For the latter it might be beneficial to have memory blocks of larger >> size that correspond to the physical memory ranges. That might also make >> a memmap (re-)configuration easier. >> >> Not sure if that is standing in any way or is harmful, though. >> > > Just because I thought of something right now, I'll share it, maybe it makes sense. > > Assume when we get add_memory*(MHP_MEMMAP_ON_MEMORY) and it is enabled by the admin: > > 1) We create a single altmap at the beginning of the memory > > 2) We create the existing fixed-size memory block devices, but flag them > to be part of a single "altmap" unit. > > 3) Whenever we trigger offlining of a single such memory block, we > offline *all* memory blocks belonging to that altmap, essentially > using a single offline_pages() call and updating all memory block > states accordingly. > > 4) Whenever we trigger onlining of a single such memory block, we > online *all* memory blocks belonging to that altmap, using a single > online_pages() call. > > 5) We fail remove_memory() if it doesn't cover the same (altmap) range. > > So we can avoid having a memory block v2 (and all that comes with that ...) for now and still get that altmap stuff sorted out. As that altmap behavior can be controlled by the admin, we should be fine for now. > > I think all memory notifiers should already be able to handle bigger granularity, but it would be easy to check. Some internal things might require a bit of tweaking. > We can look at the possibility of using the altmap space reserved for a namespace (via option -M dev) for allocating struct page memory even with dax/kmem. -aneesh
On 08.08.23 08:29, Aneesh Kumar K V wrote: > On 8/8/23 12:05 AM, David Hildenbrand wrote: >> On 07.08.23 14:41, David Hildenbrand wrote: >>> On 07.08.23 14:27, Michal Hocko wrote: >>>> On Sat 05-08-23 19:54:23, Aneesh Kumar K V wrote: >>>> [...] >>>>> Do you see a need for firmware-managed memory to be hotplugged in with >>>>> different memory block sizes? >>>> >>>> In short. Yes. Slightly longer, a fixed size memory block semantic is >>>> just standing in the way and I would even argue it is actively harmful. >>>> Just have a look at ridicously small memory blocks on ppc. I do >>>> understand that it makes some sense to be aligned to the memory model >>>> (so sparsmem section aligned). In an ideal world, memory hotplug v2 >>>> interface (if we ever go that path) should be physical memory range based. >>> >>> Yes, we discussed that a couple of times already (and so far nobody >>> cared to implement any of that). >>> >>> Small memory block sizes are very beneficial for use cases like PPC >>> dlar, virtio-mem, hyperv-balloon, ... essentially in most virtual >>> environments where you might want to add/remove memory in very small >>> granularity. I don't see that changing any time soon. Rather the opposite. >>> >>> Small memory block sizes are suboptimal for large machines where you >>> might never end up removing such memory (boot memory), or when dealing >>> with devices that can only be removed in one piece (DIMM/kmem). We >>> already have memory groups in place to model that. >>> >>> For the latter it might be beneficial to have memory blocks of larger >>> size that correspond to the physical memory ranges. That might also make >>> a memmap (re-)configuration easier. >>> >>> Not sure if that is standing in any way or is harmful, though. >>> >> >> Just because I thought of something right now, I'll share it, maybe it makes sense. >> >> Assume when we get add_memory*(MHP_MEMMAP_ON_MEMORY) and it is enabled by the admin: >> >> 1) We create a single altmap at the beginning of the memory >> >> 2) We create the existing fixed-size memory block devices, but flag them >> to be part of a single "altmap" unit. >> >> 3) Whenever we trigger offlining of a single such memory block, we >> offline *all* memory blocks belonging to that altmap, essentially >> using a single offline_pages() call and updating all memory block >> states accordingly. >> >> 4) Whenever we trigger onlining of a single such memory block, we >> online *all* memory blocks belonging to that altmap, using a single >> online_pages() call. >> >> 5) We fail remove_memory() if it doesn't cover the same (altmap) range. >> >> So we can avoid having a memory block v2 (and all that comes with that ...) for now and still get that altmap stuff sorted out. As that altmap behavior can be controlled by the admin, we should be fine for now. >> >> I think all memory notifiers should already be able to handle bigger granularity, but it would be easy to check. Some internal things might require a bit of tweaking. >> > > We can look at the possibility of using the altmap space reserved for a namespace (via option -M dev) for allocating struct page memory even with dax/kmem. Right, an alternative would also be for the caller to pass in the altmap. Individual memory blocks can then get onlined/offlined as is. One issue might be, how to get that altmap considered online memory (e.g., pfn_to_online_page(), kdump, ...). 
Right now, the altmap always falls into an online section once the memory block is online, so pfn_to_online_page() and especially online_section() holds as soon as the altmap has reasonable content -- when the memory block is online and initialized the memmap. Maybe that can be worked around, just pointing it out.
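The pfn-walker concern boils down to the usual pattern where walkers only
dereference a memmap that pfn_to_online_page() vouches for. A simplified
sketch (the walker function itself is made up for illustration; the
pfn_to_online_page() semantics are the point):

```c
#include <linux/memory_hotplug.h>	/* pfn_to_online_page() */
#include <linux/mm.h>

/*
 * Simplified shape of the pfn walkers referred to above: the memmap of a
 * pfn may only be dereferenced when pfn_to_online_page() returns a page,
 * i.e. the pfn falls into an online section with an initialized memmap.
 */
static unsigned long count_reserved_pages(unsigned long start_pfn,
					  unsigned long end_pfn)
{
	unsigned long pfn, reserved = 0;

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		struct page *page = pfn_to_online_page(pfn);

		if (!page)
			continue;	/* hole, offline, or no memmap */
		if (PageReserved(page))
			reserved++;
	}
	return reserved;
}
```

As long as the altmap pages land in a section that is marked online once the
memory block is online -- which is what the paragraph above says currently
holds -- such walkers keep working unchanged.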