Message ID | 20210607195430.48228-6-david@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm/memory_hotplug: "auto-movable" online policy and memory groups | expand |
David Hildenbrand <david@redhat.com> writes: > There is only a single user remaining. We can simply try to offline all > online nodes - which is fast, because we usually span pages and can skip > such nodes right away. That makes me slightly nervous, because our big powerpc boxes tend to trip on these scaling issues before others. But the spanned pages check is just: void try_offline_node(int nid) { pg_data_t *pgdat = NODE_DATA(nid); ... if (pgdat->node_spanned_pages) return; So I guess that's pretty cheap, and it's only O(nodes), which should never get that big. > Cc: Michael Ellerman <mpe@ellerman.id.au> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> > Cc: Paul Mackerras <paulus@samba.org> > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> > Cc: Len Brown <lenb@kernel.org> > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: Vishal Verma <vishal.l.verma@intel.com> > Cc: Dave Jiang <dave.jiang@intel.com> > Cc: "Michael S. Tsirkin" <mst@redhat.com> > Cc: Jason Wang <jasowang@redhat.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Nathan Lynch <nathanl@linux.ibm.com> > Cc: Laurent Dufour <ldufour@linux.ibm.com> > Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> > Cc: Scott Cheloha <cheloha@linux.ibm.com> > Cc: Anton Blanchard <anton@ozlabs.org> > Cc: linuxppc-dev@lists.ozlabs.org > Cc: linux-acpi@vger.kernel.org > Cc: nvdimm@lists.linux.dev > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > .../platforms/pseries/hotplug-memory.c | 9 ++++----- > drivers/acpi/acpi_memhotplug.c | 7 +------ > drivers/dax/kmem.c | 3 +-- > drivers/virtio/virtio_mem.c | 4 ++-- > include/linux/memory_hotplug.h | 10 +++++----- > mm/memory_hotplug.c | 20 +++++++++---------- > 6 files changed, 23 insertions(+), 30 deletions(-) > > diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c > index 8377f1f7c78e..4a9232ddbefe 100644 > --- a/arch/powerpc/platforms/pseries/hotplug-memory.c > +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c > @@ -286,7 +286,7 @@ static int pseries_remove_memblock(unsigned long base, unsigned long memblock_si > { > unsigned long block_sz, start_pfn; > int sections_per_block; > - int i, nid; > + int i; > > start_pfn = base >> PAGE_SHIFT; > > @@ -297,10 +297,9 @@ static int pseries_remove_memblock(unsigned long base, unsigned long memblock_si > > block_sz = pseries_memory_block_size(); > sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE; > - nid = memory_add_physaddr_to_nid(base); > > for (i = 0; i < sections_per_block; i++) { > - __remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE); > + __remove_memory(base, MIN_MEMORY_BLOCK_SIZE); > base += MIN_MEMORY_BLOCK_SIZE; > } > > @@ -386,7 +385,7 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb) > > block_sz = pseries_memory_block_size(); > > - __remove_memory(mem_block->nid, lmb->base_addr, block_sz); > + __remove_memory(lmb->base_addr, block_sz); > put_device(&mem_block->dev); > > /* Update memory regions for memory remove */ > @@ -638,7 +637,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb) > > rc = dlpar_online_lmb(lmb); > if (rc) { > - __remove_memory(nid, lmb->base_addr, block_sz); > + __remove_memory(lmb->base_addr, block_sz); > invalidate_lmb_associativity_index(lmb); > } else { > lmb->flags |= DRCONF_MEM_ASSIGNED; Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) cheers
On 08.06.21 13:11, Michael Ellerman wrote: > David Hildenbrand <david@redhat.com> writes: >> There is only a single user remaining. We can simply try to offline all >> online nodes - which is fast, because we usually span pages and can skip >> such nodes right away. > > That makes me slightly nervous, because our big powerpc boxes tend to > trip on these scaling issues before others. > > But the spanned pages check is just: > > void try_offline_node(int nid) > { > pg_data_t *pgdat = NODE_DATA(nid); > ... > if (pgdat->node_spanned_pages) > return; > > So I guess that's pretty cheap, and it's only O(nodes), which should > never get that big. Exactly. And if it does turn out to be a problem, we can walk all memory blocks before removing them, collecting the nid(s).
On 08.06.21 13:18, David Hildenbrand wrote: > On 08.06.21 13:11, Michael Ellerman wrote: >> David Hildenbrand <david@redhat.com> writes: >>> There is only a single user remaining. We can simply try to offline all >>> online nodes - which is fast, because we usually span pages and can skip >>> such nodes right away. >> >> That makes me slightly nervous, because our big powerpc boxes tend to >> trip on these scaling issues before others. >> >> But the spanned pages check is just: >> >> void try_offline_node(int nid) >> { >> pg_data_t *pgdat = NODE_DATA(nid); >> ... >> if (pgdat->node_spanned_pages) >> return; >> >> So I guess that's pretty cheap, and it's only O(nodes), which should >> never get that big. > > Exactly. And if it does turn out to be a problem, we can walk all memory > blocks before removing them, collecting the nid(s). > I might just do the following on top: diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 61bff8f3bfb1..bbc26fdac364 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -2176,7 +2176,9 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages, static int check_memblock_offlined_cb(struct memory_block *mem, void *arg) { int ret = !is_memblock_offlined(mem); + int *nid = arg; + *nid = mem->nid; if (unlikely(ret)) { phys_addr_t beginpa, endpa; @@ -2271,10 +2273,10 @@ EXPORT_SYMBOL(try_offline_node); static int __ref try_remove_memory(u64 start, u64 size) { - int rc = 0, nid; struct vmem_altmap mhp_altmap = {}; struct vmem_altmap *altmap = NULL; unsigned long nr_vmemmap_pages; + int rc = 0, nid = NUMA_NO_NODE; BUG_ON(check_hotplug_memory_range(start, size)); @@ -2282,8 +2284,12 @@ static int __ref try_remove_memory(u64 start, u64 size) * All memory blocks must be offlined before removing memory. Check * whether all memory blocks in question are offline and return error * if this is not the case. + * + * While at it, determine the nid. Note that if we'd have mixed nodes, + * we'd only try to offline the last determined one -- which is good + * enough for the cases we care about. */ - rc = walk_memory_blocks(start, size, NULL, check_memblock_offlined_cb); + rc = walk_memory_blocks(start, size, &nid, check_memblock_offlined_cb); if (rc) return rc; @@ -2332,7 +2338,7 @@ static int __ref try_remove_memory(u64 start, u64 size) release_mem_region_adjustable(start, size); - for_each_online_node(nid) + if (nid != NUMA_NO_NODE) try_offline_node(nid); mem_hotplug_done();
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c index 8377f1f7c78e..4a9232ddbefe 100644 --- a/arch/powerpc/platforms/pseries/hotplug-memory.c +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c @@ -286,7 +286,7 @@ static int pseries_remove_memblock(unsigned long base, unsigned long memblock_si { unsigned long block_sz, start_pfn; int sections_per_block; - int i, nid; + int i; start_pfn = base >> PAGE_SHIFT; @@ -297,10 +297,9 @@ static int pseries_remove_memblock(unsigned long base, unsigned long memblock_si block_sz = pseries_memory_block_size(); sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE; - nid = memory_add_physaddr_to_nid(base); for (i = 0; i < sections_per_block; i++) { - __remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE); + __remove_memory(base, MIN_MEMORY_BLOCK_SIZE); base += MIN_MEMORY_BLOCK_SIZE; } @@ -386,7 +385,7 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb) block_sz = pseries_memory_block_size(); - __remove_memory(mem_block->nid, lmb->base_addr, block_sz); + __remove_memory(lmb->base_addr, block_sz); put_device(&mem_block->dev); /* Update memory regions for memory remove */ @@ -638,7 +637,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb) rc = dlpar_online_lmb(lmb); if (rc) { - __remove_memory(nid, lmb->base_addr, block_sz); + __remove_memory(lmb->base_addr, block_sz); invalidate_lmb_associativity_index(lmb); } else { lmb->flags |= DRCONF_MEM_ASSIGNED; diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c index 8cc195c4c861..1d01d9414c40 100644 --- a/drivers/acpi/acpi_memhotplug.c +++ b/drivers/acpi/acpi_memhotplug.c @@ -239,19 +239,14 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device) static void acpi_memory_remove_memory(struct acpi_memory_device *mem_device) { - acpi_handle handle = mem_device->device->handle; struct acpi_memory_info *info, *n; - int nid = acpi_get_node(handle); list_for_each_entry_safe(info, n, &mem_device->res_list, list) { if (!info->enabled) continue; - if (nid == NUMA_NO_NODE) - nid = memory_add_physaddr_to_nid(info->start_addr); - acpi_unbind_memory_blocks(info); - __remove_memory(nid, info->start_addr, info->length); + __remove_memory(info->start_addr, info->length); list_del(&info->list); kfree(info); } diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c index ac231cc36359..99e0f60c4c26 100644 --- a/drivers/dax/kmem.c +++ b/drivers/dax/kmem.c @@ -156,8 +156,7 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax) if (rc) continue; - rc = remove_memory(dev_dax->target_node, range.start, - range_len(&range)); + rc = remove_memory(range.start, range_len(&range)); if (rc == 0) { release_resource(data->res[i]); kfree(data->res[i]); diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c index 10ec60d81e84..e327fb878143 100644 --- a/drivers/virtio/virtio_mem.c +++ b/drivers/virtio/virtio_mem.c @@ -673,7 +673,7 @@ static int virtio_mem_remove_memory(struct virtio_mem *vm, uint64_t addr, dev_dbg(&vm->vdev->dev, "removing memory: 0x%llx - 0x%llx\n", addr, addr + size - 1); - rc = remove_memory(vm->nid, addr, size); + rc = remove_memory(addr, size); if (!rc) { atomic64_sub(size, &vm->offline_size); /* @@ -728,7 +728,7 @@ static int virtio_mem_offline_and_remove_memory(struct virtio_mem *vm, "offlining and removing memory: 0x%llx - 0x%llx\n", addr, addr + size - 1); - rc = offline_and_remove_memory(vm->nid, addr, size); + rc = offline_and_remove_memory(addr, size); if (!rc) { atomic64_sub(size, &vm->offline_size); /* diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 1d8d09c029c9..84f05435e2ae 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -319,9 +319,9 @@ static inline void pgdat_resize_init(struct pglist_data *pgdat) {} extern void try_offline_node(int nid); extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages); -extern int remove_memory(int nid, u64 start, u64 size); -extern void __remove_memory(int nid, u64 start, u64 size); -extern int offline_and_remove_memory(int nid, u64 start, u64 size); +extern int remove_memory(u64 start, u64 size); +extern void __remove_memory(u64 start, u64 size); +extern int offline_and_remove_memory(u64 start, u64 size); #else static inline void try_offline_node(int nid) {} @@ -331,12 +331,12 @@ static inline int offline_pages(unsigned long start_pfn, unsigned long nr_pages) return -EINVAL; } -static inline int remove_memory(int nid, u64 start, u64 size) +static inline int remove_memory(u64 start, u64 size) { return -EBUSY; } -static inline void __remove_memory(int nid, u64 start, u64 size) {} +static inline void __remove_memory(u64 start, u64 size) {} #endif /* CONFIG_MEMORY_HOTREMOVE */ extern void set_zone_contiguous(struct zone *zone); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index f9be66bbd847..9cae42636f3e 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -2157,9 +2157,9 @@ void try_offline_node(int nid) } EXPORT_SYMBOL(try_offline_node); -static int __ref try_remove_memory(int nid, u64 start, u64 size) +static int __ref try_remove_memory(u64 start, u64 size) { - int rc = 0; + int rc = 0, nid; struct vmem_altmap mhp_altmap = {}; struct vmem_altmap *altmap = NULL; unsigned long nr_vmemmap_pages; @@ -2220,7 +2220,8 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size) release_mem_region_adjustable(start, size); - try_offline_node(nid); + for_each_online_node(nid) + try_offline_node(nid); mem_hotplug_done(); return 0; @@ -2228,7 +2229,6 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size) /** * remove_memory - * @nid: the node ID * @start: physical address of the region to remove * @size: size of the region to remove * @@ -2236,14 +2236,14 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size) * and online/offline operations before this call, as required by * try_offline_node(). */ -void __remove_memory(int nid, u64 start, u64 size) +void __remove_memory(u64 start, u64 size) { /* * trigger BUG() if some memory is not offlined prior to calling this * function */ - if (try_remove_memory(nid, start, size)) + if (try_remove_memory(start, size)) BUG(); } @@ -2251,12 +2251,12 @@ void __remove_memory(int nid, u64 start, u64 size) * Remove memory if every memory block is offline, otherwise return -EBUSY is * some memory is not offline */ -int remove_memory(int nid, u64 start, u64 size) +int remove_memory(u64 start, u64 size) { int rc; lock_device_hotplug(); - rc = try_remove_memory(nid, start, size); + rc = try_remove_memory(start, size); unlock_device_hotplug(); return rc; @@ -2316,7 +2316,7 @@ static int try_reonline_memory_block(struct memory_block *mem, void *arg) * unplugged all memory (so it's no longer in use) and want to offline + remove * that memory. */ -int offline_and_remove_memory(int nid, u64 start, u64 size) +int offline_and_remove_memory(u64 start, u64 size) { const unsigned long mb_count = size / memory_block_size_bytes(); uint8_t *online_types, *tmp; @@ -2352,7 +2352,7 @@ int offline_and_remove_memory(int nid, u64 start, u64 size) * This cannot fail as it cannot get onlined in the meantime. */ if (!rc) { - rc = try_remove_memory(nid, start, size); + rc = try_remove_memory(start, size); if (rc) pr_err("%s: Failed to remove memory: %d", __func__, rc); }
There is only a single user remaining. We can simply try to offline all online nodes - which is fast, because we usually span pages and can skip such nodes right away. Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Nathan Lynch <nathanl@linux.ibm.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Scott Cheloha <cheloha@linux.ibm.com> Cc: Anton Blanchard <anton@ozlabs.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-acpi@vger.kernel.org Cc: nvdimm@lists.linux.dev Signed-off-by: David Hildenbrand <david@redhat.com> --- .../platforms/pseries/hotplug-memory.c | 9 ++++----- drivers/acpi/acpi_memhotplug.c | 7 +------ drivers/dax/kmem.c | 3 +-- drivers/virtio/virtio_mem.c | 4 ++-- include/linux/memory_hotplug.h | 10 +++++----- mm/memory_hotplug.c | 20 +++++++++---------- 6 files changed, 23 insertions(+), 30 deletions(-)