Message ID | 20190507183804.5512-1-david@redhat.com (mailing list archive) |
---|---|
Headers | show |
Series | mm/memory_hotplug: Factor out memory block device handling | expand |
On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote: > > We only want memory block devices for memory to be onlined/offlined > (add/remove from the buddy). This is required so user space can > online/offline memory and kdump gets notified about newly onlined memory. > > Only such memory has the requirement of having to span whole memory blocks. > Let's factor out creation/removal of memory block devices. This helps > to further cleanup arch_add_memory/arch_remove_memory() and to make > implementation of new features easier. E.g. supplying a driver for > memory block devices becomes way easier (so user space is able to > distinguish different types of added memory to properly online it). > > Patch 1 makes sure the memory block size granularity is always respected. > Patch 2 implements arch_remove_memory() on s390x. Patch 3 prepares > arch_remove_memory() to be also called without CONFIG_MEMORY_HOTREMOVE. > Patch 4,5 and 6 factor out creation/removal of memory block devices. > Patch 7 gets rid of some unlikely errors that could have happened, not > removing links between memory block devices and nodes, previously brought > up by Oscar. > > Did a quick sanity test with DIMM plug/unplug, making sure all devices > and sysfs links properly get added/removed. Compile tested on s390x and > x86-64. > > Based on git://git.cmpxchg.org/linux-mmots.git > > Next refactoring on my list will be making sure that remove_memory() > will never deal with zones / access "struct pages". Any kind of zone > handling will have to be done when offlining system memory / before > removing device memory. I am thinking about remove_pfn_range_from_zone()", > du undo everything "move_pfn_range_to_zone()" did. > > v1 -> v2: > - s390x/mm: Implement arch_remove_memory() > -- remove mapping after "__remove_pages" > > > David Hildenbrand (8): > mm/memory_hotplug: Simplify and fix check_hotplug_memory_range() > s390x/mm: Implement arch_remove_memory() > mm/memory_hotplug: arch_remove_memory() and __remove_pages() with > CONFIG_MEMORY_HOTPLUG > mm/memory_hotplug: Create memory block devices after arch_add_memory() > mm/memory_hotplug: Drop MHP_MEMBLOCK_API So at a minimum we need a bit of patch staging guidance because this obviously collides with the subsection bits that are built on top of the existence of MHP_MEMBLOCK_API. What trigger do you envision as a replacement that arch_add_memory() use to determine that subsection operations should be disallowed?
On 07.05.19 21:04, Dan Williams wrote: > On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote: >> >> We only want memory block devices for memory to be onlined/offlined >> (add/remove from the buddy). This is required so user space can >> online/offline memory and kdump gets notified about newly onlined memory. >> >> Only such memory has the requirement of having to span whole memory blocks. >> Let's factor out creation/removal of memory block devices. This helps >> to further cleanup arch_add_memory/arch_remove_memory() and to make >> implementation of new features easier. E.g. supplying a driver for >> memory block devices becomes way easier (so user space is able to >> distinguish different types of added memory to properly online it). >> >> Patch 1 makes sure the memory block size granularity is always respected. >> Patch 2 implements arch_remove_memory() on s390x. Patch 3 prepares >> arch_remove_memory() to be also called without CONFIG_MEMORY_HOTREMOVE. >> Patch 4,5 and 6 factor out creation/removal of memory block devices. >> Patch 7 gets rid of some unlikely errors that could have happened, not >> removing links between memory block devices and nodes, previously brought >> up by Oscar. >> >> Did a quick sanity test with DIMM plug/unplug, making sure all devices >> and sysfs links properly get added/removed. Compile tested on s390x and >> x86-64. >> >> Based on git://git.cmpxchg.org/linux-mmots.git >> >> Next refactoring on my list will be making sure that remove_memory() >> will never deal with zones / access "struct pages". Any kind of zone >> handling will have to be done when offlining system memory / before >> removing device memory. I am thinking about remove_pfn_range_from_zone()", >> du undo everything "move_pfn_range_to_zone()" did. >> >> v1 -> v2: >> - s390x/mm: Implement arch_remove_memory() >> -- remove mapping after "__remove_pages" >> >> >> David Hildenbrand (8): >> mm/memory_hotplug: Simplify and fix check_hotplug_memory_range() >> s390x/mm: Implement arch_remove_memory() >> mm/memory_hotplug: arch_remove_memory() and __remove_pages() with >> CONFIG_MEMORY_HOTPLUG >> mm/memory_hotplug: Create memory block devices after arch_add_memory() >> mm/memory_hotplug: Drop MHP_MEMBLOCK_API > > So at a minimum we need a bit of patch staging guidance because this > obviously collides with the subsection bits that are built on top of > the existence of MHP_MEMBLOCK_API. What trigger do you envision as a > replacement that arch_add_memory() use to determine that subsection > operations should be disallowed? > Looks like we now have time to sort it out :) Looking at your series [PATCH v8 08/12] mm/sparsemem: Prepare for sub-section ranges is the "single" effectively place using MHP_MEMBLOCK_API, namely "subsection_check()". Used when adding/removing memory. +static int subsection_check(unsigned long pfn, unsigned long nr_pages, + unsigned long flags, const char *reason) +{ + /* + * Only allow partial section hotplug for !memblock ranges, + * since register_new_memory() requires section alignment, and + * CONFIG_SPARSEMEM_VMEMMAP=n requires sections to be fully + * populated. + */ + if ((!IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) + || (flags & MHP_MEMBLOCK_API)) + && ((pfn & ~PAGE_SECTION_MASK) + || (nr_pages & ~PAGE_SECTION_MASK))) { + WARN(1, "Sub-section hot-%s incompatible with %s\n", reason, + (flags & MHP_MEMBLOCK_API) + ? "memblock api" : "!CONFIG_SPARSEMEM_VMEMMAP"); + return -EINVAL; + } + return 0; } (flags & MHP_MEMBLOCK_API)) && ((pfn & ~PAGE_SECTION_MASK) || (nr_pages & ~PAGE_SECTION_MASK))) sounds like something the caller (add_memory()) always has to take care of. No need to check. The one imposing this restriction is the only caller. In my opinion, that check/function can go completely. Am I missing something / missing another user?
On 07.05.19 21:21, David Hildenbrand wrote: > On 07.05.19 21:04, Dan Williams wrote: >> On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote: >>> >>> We only want memory block devices for memory to be onlined/offlined >>> (add/remove from the buddy). This is required so user space can >>> online/offline memory and kdump gets notified about newly onlined memory. >>> >>> Only such memory has the requirement of having to span whole memory blocks. >>> Let's factor out creation/removal of memory block devices. This helps >>> to further cleanup arch_add_memory/arch_remove_memory() and to make >>> implementation of new features easier. E.g. supplying a driver for >>> memory block devices becomes way easier (so user space is able to >>> distinguish different types of added memory to properly online it). >>> >>> Patch 1 makes sure the memory block size granularity is always respected. >>> Patch 2 implements arch_remove_memory() on s390x. Patch 3 prepares >>> arch_remove_memory() to be also called without CONFIG_MEMORY_HOTREMOVE. >>> Patch 4,5 and 6 factor out creation/removal of memory block devices. >>> Patch 7 gets rid of some unlikely errors that could have happened, not >>> removing links between memory block devices and nodes, previously brought >>> up by Oscar. >>> >>> Did a quick sanity test with DIMM plug/unplug, making sure all devices >>> and sysfs links properly get added/removed. Compile tested on s390x and >>> x86-64. >>> >>> Based on git://git.cmpxchg.org/linux-mmots.git >>> >>> Next refactoring on my list will be making sure that remove_memory() >>> will never deal with zones / access "struct pages". Any kind of zone >>> handling will have to be done when offlining system memory / before >>> removing device memory. I am thinking about remove_pfn_range_from_zone()", >>> du undo everything "move_pfn_range_to_zone()" did. >>> >>> v1 -> v2: >>> - s390x/mm: Implement arch_remove_memory() >>> -- remove mapping after "__remove_pages" >>> >>> >>> David Hildenbrand (8): >>> mm/memory_hotplug: Simplify and fix check_hotplug_memory_range() >>> s390x/mm: Implement arch_remove_memory() >>> mm/memory_hotplug: arch_remove_memory() and __remove_pages() with >>> CONFIG_MEMORY_HOTPLUG >>> mm/memory_hotplug: Create memory block devices after arch_add_memory() >>> mm/memory_hotplug: Drop MHP_MEMBLOCK_API >> >> So at a minimum we need a bit of patch staging guidance because this >> obviously collides with the subsection bits that are built on top of >> the existence of MHP_MEMBLOCK_API. What trigger do you envision as a >> replacement that arch_add_memory() use to determine that subsection >> operations should be disallowed? >> > > Looks like we now have time to sort it out :) > > > Looking at your series > > [PATCH v8 08/12] mm/sparsemem: Prepare for sub-section ranges > > is the "single" effectively place using MHP_MEMBLOCK_API, namely > "subsection_check()". Used when adding/removing memory. > > > +static int subsection_check(unsigned long pfn, unsigned long nr_pages, > + unsigned long flags, const char *reason) > +{ > + /* > + * Only allow partial section hotplug for !memblock ranges, > + * since register_new_memory() requires section alignment, and > + * CONFIG_SPARSEMEM_VMEMMAP=n requires sections to be fully > + * populated. > + */ > + if ((!IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) > + || (flags & MHP_MEMBLOCK_API)) > + && ((pfn & ~PAGE_SECTION_MASK) > + || (nr_pages & ~PAGE_SECTION_MASK))) { > + WARN(1, "Sub-section hot-%s incompatible with %s\n", reason, > + (flags & MHP_MEMBLOCK_API) > + ? "memblock api" : "!CONFIG_SPARSEMEM_VMEMMAP"); > + return -EINVAL; > + } > + return 0; > } > > > (flags & MHP_MEMBLOCK_API)) && ((pfn & ~PAGE_SECTION_MASK) || (nr_pages > & ~PAGE_SECTION_MASK))) > > sounds like something the caller (add_memory()) always has to take care > of. No need to check. The one imposing this restriction is the only caller. > > In my opinion, that check/function can go completely. > > Am I missing something / missing another user? > In other word, this series moves the restriction out of arch_add_memory() and therefore you don't need subsection_check() at all anymore. At least if I am not missing something :)
On Tue, May 7, 2019 at 12:38 PM David Hildenbrand <david@redhat.com> wrote: > > On 07.05.19 21:21, David Hildenbrand wrote: > > On 07.05.19 21:04, Dan Williams wrote: > >> On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote: > >>> > >>> We only want memory block devices for memory to be onlined/offlined > >>> (add/remove from the buddy). This is required so user space can > >>> online/offline memory and kdump gets notified about newly onlined memory. > >>> > >>> Only such memory has the requirement of having to span whole memory blocks. > >>> Let's factor out creation/removal of memory block devices. This helps > >>> to further cleanup arch_add_memory/arch_remove_memory() and to make > >>> implementation of new features easier. E.g. supplying a driver for > >>> memory block devices becomes way easier (so user space is able to > >>> distinguish different types of added memory to properly online it). > >>> > >>> Patch 1 makes sure the memory block size granularity is always respected. > >>> Patch 2 implements arch_remove_memory() on s390x. Patch 3 prepares > >>> arch_remove_memory() to be also called without CONFIG_MEMORY_HOTREMOVE. > >>> Patch 4,5 and 6 factor out creation/removal of memory block devices. > >>> Patch 7 gets rid of some unlikely errors that could have happened, not > >>> removing links between memory block devices and nodes, previously brought > >>> up by Oscar. > >>> > >>> Did a quick sanity test with DIMM plug/unplug, making sure all devices > >>> and sysfs links properly get added/removed. Compile tested on s390x and > >>> x86-64. > >>> > >>> Based on git://git.cmpxchg.org/linux-mmots.git > >>> > >>> Next refactoring on my list will be making sure that remove_memory() > >>> will never deal with zones / access "struct pages". Any kind of zone > >>> handling will have to be done when offlining system memory / before > >>> removing device memory. I am thinking about remove_pfn_range_from_zone()", > >>> du undo everything "move_pfn_range_to_zone()" did. > >>> > >>> v1 -> v2: > >>> - s390x/mm: Implement arch_remove_memory() > >>> -- remove mapping after "__remove_pages" > >>> > >>> > >>> David Hildenbrand (8): > >>> mm/memory_hotplug: Simplify and fix check_hotplug_memory_range() > >>> s390x/mm: Implement arch_remove_memory() > >>> mm/memory_hotplug: arch_remove_memory() and __remove_pages() with > >>> CONFIG_MEMORY_HOTPLUG > >>> mm/memory_hotplug: Create memory block devices after arch_add_memory() > >>> mm/memory_hotplug: Drop MHP_MEMBLOCK_API > >> > >> So at a minimum we need a bit of patch staging guidance because this > >> obviously collides with the subsection bits that are built on top of > >> the existence of MHP_MEMBLOCK_API. What trigger do you envision as a > >> replacement that arch_add_memory() use to determine that subsection > >> operations should be disallowed? > >> > > > > Looks like we now have time to sort it out :) > > > > > > Looking at your series > > > > [PATCH v8 08/12] mm/sparsemem: Prepare for sub-section ranges > > > > is the "single" effectively place using MHP_MEMBLOCK_API, namely > > "subsection_check()". Used when adding/removing memory. > > > > > > +static int subsection_check(unsigned long pfn, unsigned long nr_pages, > > + unsigned long flags, const char *reason) > > +{ > > + /* > > + * Only allow partial section hotplug for !memblock ranges, > > + * since register_new_memory() requires section alignment, and > > + * CONFIG_SPARSEMEM_VMEMMAP=n requires sections to be fully > > + * populated. > > + */ > > + if ((!IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) > > + || (flags & MHP_MEMBLOCK_API)) > > + && ((pfn & ~PAGE_SECTION_MASK) > > + || (nr_pages & ~PAGE_SECTION_MASK))) { > > + WARN(1, "Sub-section hot-%s incompatible with %s\n", reason, > > + (flags & MHP_MEMBLOCK_API) > > + ? "memblock api" : "!CONFIG_SPARSEMEM_VMEMMAP"); > > + return -EINVAL; > > + } > > + return 0; > > } > > > > > > (flags & MHP_MEMBLOCK_API)) && ((pfn & ~PAGE_SECTION_MASK) || (nr_pages > > & ~PAGE_SECTION_MASK))) > > > > sounds like something the caller (add_memory()) always has to take care > > of. No need to check. The one imposing this restriction is the only caller. > > > > In my opinion, that check/function can go completely. > > > > Am I missing something / missing another user? > > > > In other word, this series moves the restriction out of > arch_add_memory() and therefore you don't need subsection_check() at all > anymore. At least if I am not missing something :) Ah, ok. Only direct arch_add_memory() users need to be worried about subsection hotplug and the add_memory_resource() + __remove_memory() paths are already protected by check_hotplug_memory_range(). Ok, I can get on board with the removal. Let me go ahead and review this series so Andrew can get it pulled in and I can rebase.