From patchwork Tue Sep 18 11:48:17 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 10604215 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E84C76CB for ; Tue, 18 Sep 2018 11:48:48 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D8CDB2A6C5 for ; Tue, 18 Sep 2018 11:48:48 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id CCCF52A6FA; Tue, 18 Sep 2018 11:48:48 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 334472A6C5 for ; Tue, 18 Sep 2018 11:48:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728912AbeIRRUz (ORCPT ); Tue, 18 Sep 2018 13:20:55 -0400 Received: from mx1.redhat.com ([209.132.183.28]:58136 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726768AbeIRRUy (ORCPT ); Tue, 18 Sep 2018 13:20:54 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id D0CE781DF1; Tue, 18 Sep 2018 11:48:40 +0000 (UTC) Received: from t460s.redhat.com (ovpn-116-248.ams2.redhat.com [10.36.116.248]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7F9785C1B2; Tue, 18 Sep 2018 11:48:34 +0000 (UTC) From: David Hildenbrand To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-acpi@vger.kernel.org, xen-devel@lists.xenproject.org, devel@linuxdriverproject.org, David Hildenbrand , Benjamin Herrenschmidt , Paul Mackerras , Michael Ellerman , "Rafael J. Wysocki" , Len Brown , Rashmica Gupta , Michael Neuling , Balbir Singh , Nathan Fontenot , John Allen , Andrew Morton , Michal Hocko , Dan Williams , Joonsoo Kim , Vlastimil Babka , Pavel Tatashin , Greg Kroah-Hartman , Oscar Salvador , YASUAKI ISHIMATSU , Mathieu Malaterre Subject: [PATCH v1 1/6] mm/memory_hotplug: make remove_memory() take the device_hotplug_lock Date: Tue, 18 Sep 2018 13:48:17 +0200 Message-Id: <20180918114822.21926-2-david@redhat.com> In-Reply-To: <20180918114822.21926-1-david@redhat.com> References: <20180918114822.21926-1-david@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Tue, 18 Sep 2018 11:48:41 +0000 (UTC) Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP remove_memory() is exported right now but requires the device_hotplug_lock, which is not exported. So let's provide a variant that takes the lock and only export that one. The lock is already held in arch/powerpc/platforms/pseries/hotplug-memory.c drivers/acpi/acpi_memhotplug.c So, let's use the locked variant. The lock is not held (but taken in) arch/powerpc/platforms/powernv/memtrace.c So let's keep using the (now) locked variant. Apart from that, there are not other users in the tree. Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: "Rafael J. Wysocki" Cc: Len Brown Cc: Rashmica Gupta Cc: Michael Neuling Cc: Balbir Singh Cc: Nathan Fontenot Cc: John Allen Cc: Andrew Morton Cc: Michal Hocko Cc: Dan Williams Cc: Joonsoo Kim Cc: Vlastimil Babka Cc: Pavel Tatashin Cc: Greg Kroah-Hartman Cc: Oscar Salvador Cc: YASUAKI ISHIMATSU Cc: Mathieu Malaterre Reviewed-by: Pavel Tatashin Signed-off-by: David Hildenbrand Reviewed-by: Rafael J. Wysocki --- arch/powerpc/platforms/powernv/memtrace.c | 2 -- arch/powerpc/platforms/pseries/hotplug-memory.c | 6 +++--- drivers/acpi/acpi_memhotplug.c | 2 +- include/linux/memory_hotplug.h | 3 ++- mm/memory_hotplug.c | 9 ++++++++- 5 files changed, 14 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c index 51dc398ae3f7..8f1cd4f3bfd5 100644 --- a/arch/powerpc/platforms/powernv/memtrace.c +++ b/arch/powerpc/platforms/powernv/memtrace.c @@ -90,9 +90,7 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages) walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE, change_memblock_state); - lock_device_hotplug(); remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT); - unlock_device_hotplug(); return true; } diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c index c1578f54c626..b3f54466e25f 100644 --- a/arch/powerpc/platforms/pseries/hotplug-memory.c +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c @@ -334,7 +334,7 @@ static int pseries_remove_memblock(unsigned long base, unsigned int memblock_siz nid = memory_add_physaddr_to_nid(base); for (i = 0; i < sections_per_block; i++) { - remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE); + __remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE); base += MIN_MEMORY_BLOCK_SIZE; } @@ -423,7 +423,7 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb) block_sz = pseries_memory_block_size(); nid = memory_add_physaddr_to_nid(lmb->base_addr); - remove_memory(nid, lmb->base_addr, block_sz); + __remove_memory(nid, lmb->base_addr, block_sz); /* Update memory regions for memory remove */ memblock_remove(lmb->base_addr, block_sz); @@ -710,7 +710,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb) rc = dlpar_online_lmb(lmb); if (rc) { - remove_memory(nid, lmb->base_addr, block_sz); + __remove_memory(nid, lmb->base_addr, block_sz); dlpar_remove_device_tree_lmb(lmb); } else { lmb->flags |= DRCONF_MEM_ASSIGNED; diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c index 6b0d3ef7309c..811148415993 100644 --- a/drivers/acpi/acpi_memhotplug.c +++ b/drivers/acpi/acpi_memhotplug.c @@ -282,7 +282,7 @@ static void acpi_memory_remove_memory(struct acpi_memory_device *mem_device) nid = memory_add_physaddr_to_nid(info->start_addr); acpi_unbind_memory_blocks(info); - remove_memory(nid, info->start_addr, info->length); + __remove_memory(nid, info->start_addr, info->length); list_del(&info->list); kfree(info); } diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 34a28227068d..1f096852f479 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -301,6 +301,7 @@ extern bool is_mem_section_removable(unsigned long pfn, unsigned long nr_pages); extern void try_offline_node(int nid); extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages); extern void remove_memory(int nid, u64 start, u64 size); +extern void __remove_memory(int nid, u64 start, u64 size); #else static inline bool is_mem_section_removable(unsigned long pfn, @@ -317,6 +318,7 @@ static inline int offline_pages(unsigned long start_pfn, unsigned long nr_pages) } static inline void remove_memory(int nid, u64 start, u64 size) {} +static inline void __remove_memory(int nid, u64 start, u64 size) {} #endif /* CONFIG_MEMORY_HOTREMOVE */ extern void __ref free_area_init_core_hotplug(int nid); @@ -330,7 +332,6 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, unsigned long nr_pages, struct vmem_altmap *altmap); extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages); extern bool is_memblock_offlined(struct memory_block *mem); -extern void remove_memory(int nid, u64 start, u64 size); extern int sparse_add_one_section(struct pglist_data *pgdat, unsigned long start_pfn, struct vmem_altmap *altmap); extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms, diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 38d94b703e9d..b8b1bd970322 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1873,7 +1873,7 @@ EXPORT_SYMBOL(try_offline_node); * and online/offline operations before this call, as required by * try_offline_node(). */ -void __ref remove_memory(int nid, u64 start, u64 size) +void __ref __remove_memory(int nid, u64 start, u64 size) { int ret; @@ -1902,5 +1902,12 @@ void __ref remove_memory(int nid, u64 start, u64 size) mem_hotplug_done(); } + +void remove_memory(int nid, u64 start, u64 size) +{ + lock_device_hotplug(); + __remove_memory(nid, start, size); + unlock_device_hotplug(); +} EXPORT_SYMBOL_GPL(remove_memory); #endif /* CONFIG_MEMORY_HOTREMOVE */ From patchwork Tue Sep 18 11:48:18 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 10604217 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2D113161F for ; Tue, 18 Sep 2018 11:48:51 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1A2222A6C5 for ; Tue, 18 Sep 2018 11:48:51 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 0E7D62A704; Tue, 18 Sep 2018 11:48:51 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4EA522A6C5 for ; Tue, 18 Sep 2018 11:48:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729197AbeIRRVC (ORCPT ); Tue, 18 Sep 2018 13:21:02 -0400 Received: from mx1.redhat.com ([209.132.183.28]:42092 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729049AbeIRRVC (ORCPT ); Tue, 18 Sep 2018 13:21:02 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 75470308427E; Tue, 18 Sep 2018 11:48:48 +0000 (UTC) Received: from t460s.redhat.com (ovpn-116-248.ams2.redhat.com [10.36.116.248]) by smtp.corp.redhat.com (Postfix) with ESMTP id 358D717101; Tue, 18 Sep 2018 11:48:41 +0000 (UTC) From: David Hildenbrand To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-acpi@vger.kernel.org, xen-devel@lists.xenproject.org, devel@linuxdriverproject.org, David Hildenbrand , Benjamin Herrenschmidt , Paul Mackerras , Michael Ellerman , "Rafael J. Wysocki" , Len Brown , Greg Kroah-Hartman , Boris Ostrovsky , Juergen Gross , Nathan Fontenot , John Allen , Andrew Morton , Michal Hocko , Dan Williams , Joonsoo Kim , Vlastimil Babka , Oscar Salvador , Mathieu Malaterre , Pavel Tatashin , YASUAKI ISHIMATSU Subject: [PATCH v1 2/6] mm/memory_hotplug: make add_memory() take the device_hotplug_lock Date: Tue, 18 Sep 2018 13:48:18 +0200 Message-Id: <20180918114822.21926-3-david@redhat.com> In-Reply-To: <20180918114822.21926-1-david@redhat.com> References: <20180918114822.21926-1-david@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.40]); Tue, 18 Sep 2018 11:48:48 +0000 (UTC) Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP add_memory() currently does not take the device_hotplug_lock, however is aleady called under the lock from arch/powerpc/platforms/pseries/hotplug-memory.c drivers/acpi/acpi_memhotplug.c to synchronize against CPU hot-remove and similar. In general, we should hold the device_hotplug_lock when adding memory to synchronize against online/offline request (e.g. from user space) - which already resulted in lock inversions due to device_lock() and mem_hotplug_lock - see 30467e0b3be ("mm, hotplug: fix concurrent memory hot-add deadlock"). add_memory()/add_memory_resource() will create memory block devices, so this really feels like the right thing to do. Holding the device_hotplug_lock makes sure that a memory block device can really only be accessed (e.g. via .online/.state) from user space, once the memory has been fully added to the system. The lock is not held yet in drivers/xen/balloon.c arch/powerpc/platforms/powernv/memtrace.c drivers/s390/char/sclp_cmd.c drivers/hv/hv_balloon.c So, let's either use the locked variants or take the lock. Don't export add_memory_resource(), as it once was exported to be used by XEN, which is never built as a module. If somebody requires it, we also have to export a locked variant (as device_hotplug_lock is never exported). Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: "Rafael J. Wysocki" Cc: Len Brown Cc: Greg Kroah-Hartman Cc: Boris Ostrovsky Cc: Juergen Gross Cc: Nathan Fontenot Cc: John Allen Cc: Andrew Morton Cc: Michal Hocko Cc: Dan Williams Cc: Joonsoo Kim Cc: Vlastimil Babka Cc: Oscar Salvador Cc: Mathieu Malaterre Cc: Pavel Tatashin Cc: YASUAKI ISHIMATSU Reviewed-by: Pavel Tatashin Signed-off-by: David Hildenbrand Reviewed-by: Rafael J. Wysocki --- .../platforms/pseries/hotplug-memory.c | 2 +- drivers/acpi/acpi_memhotplug.c | 2 +- drivers/base/memory.c | 9 ++++++-- drivers/xen/balloon.c | 3 +++ include/linux/memory_hotplug.h | 1 + mm/memory_hotplug.c | 22 ++++++++++++++++--- 6 files changed, 32 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c index b3f54466e25f..2e6f41dc103a 100644 --- a/arch/powerpc/platforms/pseries/hotplug-memory.c +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c @@ -702,7 +702,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb) nid = memory_add_physaddr_to_nid(lmb->base_addr); /* Add the memory */ - rc = add_memory(nid, lmb->base_addr, block_sz); + rc = __add_memory(nid, lmb->base_addr, block_sz); if (rc) { dlpar_remove_device_tree_lmb(lmb); return rc; diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c index 811148415993..8fe0960ea572 100644 --- a/drivers/acpi/acpi_memhotplug.c +++ b/drivers/acpi/acpi_memhotplug.c @@ -228,7 +228,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device) if (node < 0) node = memory_add_physaddr_to_nid(info->start_addr); - result = add_memory(node, info->start_addr, info->length); + result = __add_memory(node, info->start_addr, info->length); /* * If the memory block has been used by the kernel, add_memory() diff --git a/drivers/base/memory.c b/drivers/base/memory.c index 817320c7c4c1..40cac122ec73 100644 --- a/drivers/base/memory.c +++ b/drivers/base/memory.c @@ -519,15 +519,20 @@ memory_probe_store(struct device *dev, struct device_attribute *attr, if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1)) return -EINVAL; + ret = lock_device_hotplug_sysfs(); + if (ret) + goto out; + nid = memory_add_physaddr_to_nid(phys_addr); - ret = add_memory(nid, phys_addr, - MIN_MEMORY_BLOCK_SIZE * sections_per_block); + ret = __add_memory(nid, phys_addr, + MIN_MEMORY_BLOCK_SIZE * sections_per_block); if (ret) goto out; ret = count; out: + unlock_device_hotplug(); return ret; } diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c index e12bb256036f..6bab019a82b1 100644 --- a/drivers/xen/balloon.c +++ b/drivers/xen/balloon.c @@ -395,7 +395,10 @@ static enum bp_state reserve_additional_memory(void) * callers drop the mutex before trying again. */ mutex_unlock(&balloon_mutex); + /* add_memory_resource() requires the device_hotplug lock */ + lock_device_hotplug(); rc = add_memory_resource(nid, resource, memhp_auto_online); + unlock_device_hotplug(); mutex_lock(&balloon_mutex); if (rc) { diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 1f096852f479..ffd9cd10fcf3 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -324,6 +324,7 @@ static inline void __remove_memory(int nid, u64 start, u64 size) {} extern void __ref free_area_init_core_hotplug(int nid); extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn, void *arg, int (*func)(struct memory_block *, void *)); +extern int __add_memory(int nid, u64 start, u64 size); extern int add_memory(int nid, u64 start, u64 size); extern int add_memory_resource(int nid, struct resource *resource, bool online); extern int arch_add_memory(int nid, u64 start, u64 size, diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index b8b1bd970322..ef5444145c88 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1111,7 +1111,12 @@ static int online_memory_block(struct memory_block *mem, void *arg) return device_online(&mem->dev); } -/* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */ +/* + * NOTE: The caller must call lock_device_hotplug() to serialize hotplug + * and online/offline operations (triggered e.g. by sysfs). + * + * we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG + */ int __ref add_memory_resource(int nid, struct resource *res, bool online) { u64 start, size; @@ -1180,9 +1185,9 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online) mem_hotplug_done(); return ret; } -EXPORT_SYMBOL_GPL(add_memory_resource); -int __ref add_memory(int nid, u64 start, u64 size) +/* requires device_hotplug_lock, see add_memory_resource() */ +int __ref __add_memory(int nid, u64 start, u64 size) { struct resource *res; int ret; @@ -1196,6 +1201,17 @@ int __ref add_memory(int nid, u64 start, u64 size) release_memory_resource(res); return ret; } + +int add_memory(int nid, u64 start, u64 size) +{ + int rc; + + lock_device_hotplug(); + rc = __add_memory(nid, start, size); + unlock_device_hotplug(); + + return rc; +} EXPORT_SYMBOL_GPL(add_memory); #ifdef CONFIG_MEMORY_HOTREMOVE From patchwork Tue Sep 18 11:48:19 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 10604235 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1DEF46CB for ; Tue, 18 Sep 2018 11:49:21 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0DD1F2A6FA for ; Tue, 18 Sep 2018 11:49:21 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 00F362A70B; Tue, 18 Sep 2018 11:49:20 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 336282A6FA for ; Tue, 18 Sep 2018 11:49:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729110AbeIRRVK (ORCPT ); Tue, 18 Sep 2018 13:21:10 -0400 Received: from mx1.redhat.com ([209.132.183.28]:58260 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726768AbeIRRVJ (ORCPT ); Tue, 18 Sep 2018 13:21:09 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 6A78F7F6BB; Tue, 18 Sep 2018 11:48:55 +0000 (UTC) Received: from t460s.redhat.com (ovpn-116-248.ams2.redhat.com [10.36.116.248]) by smtp.corp.redhat.com (Postfix) with ESMTP id CEE437DEEB; Tue, 18 Sep 2018 11:48:48 +0000 (UTC) From: David Hildenbrand To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-acpi@vger.kernel.org, xen-devel@lists.xenproject.org, devel@linuxdriverproject.org, David Hildenbrand , Benjamin Herrenschmidt , Paul Mackerras , Michael Ellerman , "Rafael J. Wysocki" , Len Brown , Greg Kroah-Hartman , "K. Y. Srinivasan" , Haiyang Zhang , Stephen Hemminger , Martin Schwidefsky , Heiko Carstens , Boris Ostrovsky , Juergen Gross , Rashmica Gupta , Michael Neuling , Balbir Singh , Kate Stewart , Thomas Gleixner , Philippe Ombredanne , Andrew Morton , Michal Hocko , Pavel Tatashin , Vlastimil Babka , Dan Williams , Oscar Salvador , YASUAKI ISHIMATSU , Mathieu Malaterre Subject: [PATCH v1 3/6] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock Date: Tue, 18 Sep 2018 13:48:19 +0200 Message-Id: <20180918114822.21926-4-david@redhat.com> In-Reply-To: <20180918114822.21926-1-david@redhat.com> References: <20180918114822.21926-1-david@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Tue, 18 Sep 2018 11:48:55 +0000 (UTC) Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP There seem to be some problems as result of 30467e0b3be ("mm, hotplug: fix concurrent memory hot-add deadlock"), which tried to fix a possible lock inversion reported and discussed in [1] due to the two locks a) device_lock() b) mem_hotplug_lock While add_memory() first takes b), followed by a) during bus_probe_device(), onlining of memory from user space first took a), followed by b), exposing a possible deadlock. In [1], and it was decided to not make use of device_hotplug_lock, but rather to enforce a locking order. The problems I spotted related to this: 1. Memory block device attributes: While .state first calls mem_hotplug_begin() and the calls device_online() - which takes device_lock() - .online does no longer call mem_hotplug_begin(), so effectively calls online_pages() without mem_hotplug_lock. 2. device_online() should be called under device_hotplug_lock, however onlining memory during add_memory() does not take care of that. In addition, I think there is also something wrong about the locking in 3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages() without locks. This was introduced after 30467e0b3be. And skimming over the code, I assume it could need some more care in regards to locking (e.g. device_online() called without device_hotplug_lock. This will be addressed in the following patches. Now that we hold the device_hotplug_lock when - adding memory (e.g. via add_memory()/add_memory_resource()) - removing memory (e.g. via remove_memory()) - device_online()/device_offline() We can move mem_hotplug_lock usage back into online_pages()/offline_pages(). Why is mem_hotplug_lock still needed? Essentially to make get_online_mems()/put_online_mems() be very fast (relying on device_hotplug_lock would be very slow), and to serialize against addition of memory that does not create memory block devices (hmm). [1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/ 2015-February/065324.html This patch is partly based on a patch by Vitaly Kuznetsov. Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: "Rafael J. Wysocki" Cc: Len Brown Cc: Greg Kroah-Hartman Cc: "K. Y. Srinivasan" Cc: Haiyang Zhang Cc: Stephen Hemminger Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: Boris Ostrovsky Cc: Juergen Gross Cc: Rashmica Gupta Cc: Michael Neuling Cc: Balbir Singh Cc: Kate Stewart Cc: Thomas Gleixner Cc: Philippe Ombredanne Cc: Andrew Morton Cc: Michal Hocko Cc: Pavel Tatashin Cc: Vlastimil Babka Cc: Dan Williams Cc: Oscar Salvador Cc: YASUAKI ISHIMATSU Cc: Mathieu Malaterre Reviewed-by: Pavel Tatashin Reviewed-by: Rashmica Gupta Signed-off-by: David Hildenbrand --- drivers/base/memory.c | 13 +------------ mm/memory_hotplug.c | 28 ++++++++++++++++++++-------- 2 files changed, 21 insertions(+), 20 deletions(-) diff --git a/drivers/base/memory.c b/drivers/base/memory.c index 40cac122ec73..0e5985682642 100644 --- a/drivers/base/memory.c +++ b/drivers/base/memory.c @@ -228,7 +228,6 @@ static bool pages_correctly_probed(unsigned long start_pfn) /* * MEMORY_HOTPLUG depends on SPARSEMEM in mm/Kconfig, so it is * OK to have direct references to sparsemem variables in here. - * Must already be protected by mem_hotplug_begin(). */ static int memory_block_action(unsigned long phys_index, unsigned long action, int online_type) @@ -294,7 +293,6 @@ static int memory_subsys_online(struct device *dev) if (mem->online_type < 0) mem->online_type = MMOP_ONLINE_KEEP; - /* Already under protection of mem_hotplug_begin() */ ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE); /* clear online_type */ @@ -341,19 +339,11 @@ store_mem_state(struct device *dev, goto err; } - /* - * Memory hotplug needs to hold mem_hotplug_begin() for probe to find - * the correct memory block to online before doing device_online(dev), - * which will take dev->mutex. Take the lock early to prevent an - * inversion, memory_subsys_online() callbacks will be implemented by - * assuming it's already protected. - */ - mem_hotplug_begin(); - switch (online_type) { case MMOP_ONLINE_KERNEL: case MMOP_ONLINE_MOVABLE: case MMOP_ONLINE_KEEP: + /* mem->online_type is protected by device_hotplug_lock */ mem->online_type = online_type; ret = device_online(&mem->dev); break; @@ -364,7 +354,6 @@ store_mem_state(struct device *dev, ret = -EINVAL; /* should never happen */ } - mem_hotplug_done(); err: unlock_device_hotplug(); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index ef5444145c88..497e9315ca6f 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -881,7 +881,6 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid, return zone; } -/* Must be protected by mem_hotplug_begin() or a device_lock */ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_type) { unsigned long flags; @@ -893,6 +892,8 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ struct memory_notify arg; struct memory_block *mem; + mem_hotplug_begin(); + /* * We can't use pfn_to_nid() because nid might be stored in struct page * which is not yet initialized. Instead, we find nid from memory block. @@ -957,6 +958,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ if (onlined_pages) memory_notify(MEM_ONLINE, &arg); + mem_hotplug_done(); return 0; failed_addition: @@ -964,6 +966,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ (unsigned long long) pfn << PAGE_SHIFT, (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1); memory_notify(MEM_CANCEL_ONLINE, &arg); + mem_hotplug_done(); return ret; } #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */ @@ -1168,20 +1171,20 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online) /* create new memmap entry */ firmware_map_add_hotplug(start, start + size, "System RAM"); + /* device_online() will take the lock when calling online_pages() */ + mem_hotplug_done(); + /* online pages if requested */ if (online) walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL, online_memory_block); - goto out; - + return ret; error: /* rollback pgdat allocation and others */ if (new_node) rollback_node_hotadd(nid); memblock_remove(start, size); - -out: mem_hotplug_done(); return ret; } @@ -1622,10 +1625,16 @@ static int __ref __offline_pages(unsigned long start_pfn, return -EINVAL; if (!IS_ALIGNED(end_pfn, pageblock_nr_pages)) return -EINVAL; + + mem_hotplug_begin(); + /* This makes hotplug much easier...and readable. we assume this for now. .*/ - if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start, &valid_end)) + if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start, + &valid_end)) { + mem_hotplug_done(); return -EINVAL; + } zone = page_zone(pfn_to_page(valid_start)); node = zone_to_nid(zone); @@ -1634,8 +1643,10 @@ static int __ref __offline_pages(unsigned long start_pfn, /* set above range as isolated */ ret = start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE, true); - if (ret) + if (ret) { + mem_hotplug_done(); return ret; + } arg.start_pfn = start_pfn; arg.nr_pages = nr_pages; @@ -1706,6 +1717,7 @@ static int __ref __offline_pages(unsigned long start_pfn, writeback_set_ratelimit(); memory_notify(MEM_OFFLINE, &arg); + mem_hotplug_done(); return 0; failed_removal: @@ -1715,10 +1727,10 @@ static int __ref __offline_pages(unsigned long start_pfn, memory_notify(MEM_CANCEL_OFFLINE, &arg); /* pushback to free area */ undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE); + mem_hotplug_done(); return ret; } -/* Must be protected by mem_hotplug_begin() or a device_lock */ int offline_pages(unsigned long start_pfn, unsigned long nr_pages) { return __offline_pages(start_pfn, start_pfn + nr_pages); From patchwork Tue Sep 18 11:48:20 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 10604233 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 63BA96CB for ; Tue, 18 Sep 2018 11:49:17 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 557082A6F6 for ; Tue, 18 Sep 2018 11:49:17 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 49E922A704; Tue, 18 Sep 2018 11:49:17 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id F08072A6F6 for ; Tue, 18 Sep 2018 11:49:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726768AbeIRRVN (ORCPT ); Tue, 18 Sep 2018 13:21:13 -0400 Received: from mx1.redhat.com ([209.132.183.28]:45822 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728657AbeIRRVM (ORCPT ); Tue, 18 Sep 2018 13:21:12 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id A2498356E5; Tue, 18 Sep 2018 11:48:58 +0000 (UTC) Received: from t460s.redhat.com (ovpn-116-248.ams2.redhat.com [10.36.116.248]) by smtp.corp.redhat.com (Postfix) with ESMTP id C688217101; Tue, 18 Sep 2018 11:48:55 +0000 (UTC) From: David Hildenbrand To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-acpi@vger.kernel.org, xen-devel@lists.xenproject.org, devel@linuxdriverproject.org, David Hildenbrand , Benjamin Herrenschmidt , Paul Mackerras , Michael Ellerman , Rashmica Gupta , Balbir Singh , Michael Neuling Subject: [PATCH v1 4/6] powerpc/powernv: hold device_hotplug_lock when calling device_online() Date: Tue, 18 Sep 2018 13:48:20 +0200 Message-Id: <20180918114822.21926-5-david@redhat.com> In-Reply-To: <20180918114822.21926-1-david@redhat.com> References: <20180918114822.21926-1-david@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Tue, 18 Sep 2018 11:48:58 +0000 (UTC) Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP device_online() should be called with device_hotplug_lock() held. Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Rashmica Gupta Cc: Balbir Singh Cc: Michael Neuling Reviewed-by: Pavel Tatashin Signed-off-by: David Hildenbrand --- arch/powerpc/platforms/powernv/memtrace.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c index 8f1cd4f3bfd5..ef7181d4fe68 100644 --- a/arch/powerpc/platforms/powernv/memtrace.c +++ b/arch/powerpc/platforms/powernv/memtrace.c @@ -229,9 +229,11 @@ static int memtrace_online(void) * we need to online the memory ourselves. */ if (!memhp_auto_online) { + lock_device_hotplug(); walk_memory_range(PFN_DOWN(ent->start), PFN_UP(ent->start + ent->size - 1), NULL, online_mem_block); + unlock_device_hotplug(); } /* From patchwork Tue Sep 18 11:48:21 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 10604231 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ECBA615A6 for ; Tue, 18 Sep 2018 11:49:16 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DDEDF2A6F6 for ; Tue, 18 Sep 2018 11:49:16 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D1EC72A704; Tue, 18 Sep 2018 11:49:16 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 76F612A6F6 for ; Tue, 18 Sep 2018 11:49:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729529AbeIRRVS (ORCPT ); Tue, 18 Sep 2018 13:21:18 -0400 Received: from mx1.redhat.com ([209.132.183.28]:43674 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727667AbeIRRVR (ORCPT ); Tue, 18 Sep 2018 13:21:17 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 5749286669; Tue, 18 Sep 2018 11:49:04 +0000 (UTC) Received: from t460s.redhat.com (ovpn-116-248.ams2.redhat.com [10.36.116.248]) by smtp.corp.redhat.com (Postfix) with ESMTP id 082DA5C1B2; Tue, 18 Sep 2018 11:48:58 +0000 (UTC) From: David Hildenbrand To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-acpi@vger.kernel.org, xen-devel@lists.xenproject.org, devel@linuxdriverproject.org, David Hildenbrand , Benjamin Herrenschmidt , Paul Mackerras , Michael Ellerman , Rashmica Gupta , Balbir Singh , Michael Neuling Subject: [PATCH v1 5/6] powerpc/powernv: hold device_hotplug_lock in memtrace_offline_pages() Date: Tue, 18 Sep 2018 13:48:21 +0200 Message-Id: <20180918114822.21926-6-david@redhat.com> In-Reply-To: <20180918114822.21926-1-david@redhat.com> References: <20180918114822.21926-1-david@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.26]); Tue, 18 Sep 2018 11:49:04 +0000 (UTC) Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Let's perform all checking + offlining + removing under device_hotplug_lock, so nobody can mess with these devices via sysfs concurrently. Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Rashmica Gupta Cc: Balbir Singh Cc: Michael Neuling Reviewed-by: Pavel Tatashin Signed-off-by: David Hildenbrand --- arch/powerpc/platforms/powernv/memtrace.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c index ef7181d4fe68..473e59842ec5 100644 --- a/arch/powerpc/platforms/powernv/memtrace.c +++ b/arch/powerpc/platforms/powernv/memtrace.c @@ -74,9 +74,13 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages) { u64 end_pfn = start_pfn + nr_pages - 1; + lock_device_hotplug(); + if (walk_memory_range(start_pfn, end_pfn, NULL, - check_memblock_online)) + check_memblock_online)) { + unlock_device_hotplug(); return false; + } walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE, change_memblock_state); @@ -84,14 +88,16 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages) if (offline_pages(start_pfn, nr_pages)) { walk_memory_range(start_pfn, end_pfn, (void *)MEM_ONLINE, change_memblock_state); + unlock_device_hotplug(); return false; } walk_memory_range(start_pfn, end_pfn, (void *)MEM_OFFLINE, change_memblock_state); - remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT); + __remove_memory(nid, start_pfn << PAGE_SHIFT, nr_pages << PAGE_SHIFT); + unlock_device_hotplug(); return true; } From patchwork Tue Sep 18 11:48:22 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 10604229 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 171A615A6 for ; Tue, 18 Sep 2018 11:49:15 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 078EB2A6F6 for ; Tue, 18 Sep 2018 11:49:15 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id EFB9E2A704; Tue, 18 Sep 2018 11:49:14 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 966282A6F6 for ; Tue, 18 Sep 2018 11:49:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729740AbeIRRVV (ORCPT ); Tue, 18 Sep 2018 13:21:21 -0400 Received: from mx1.redhat.com ([209.132.183.28]:58356 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727667AbeIRRVU (ORCPT ); Tue, 18 Sep 2018 13:21:20 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 21F7B80E7E; Tue, 18 Sep 2018 11:49:07 +0000 (UTC) Received: from t460s.redhat.com (ovpn-116-248.ams2.redhat.com [10.36.116.248]) by smtp.corp.redhat.com (Postfix) with ESMTP id B43B417101; Tue, 18 Sep 2018 11:49:04 +0000 (UTC) From: David Hildenbrand To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-acpi@vger.kernel.org, xen-devel@lists.xenproject.org, devel@linuxdriverproject.org, David Hildenbrand , Jonathan Corbet , Michal Hocko , Andrew Morton Subject: [PATCH v1 6/6] memory-hotplug.txt: Add some details about locking internals Date: Tue, 18 Sep 2018 13:48:22 +0200 Message-Id: <20180918114822.21926-7-david@redhat.com> In-Reply-To: <20180918114822.21926-1-david@redhat.com> References: <20180918114822.21926-1-david@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Tue, 18 Sep 2018 11:49:07 +0000 (UTC) Sender: linux-acpi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-acpi@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Let's document the magic a bit, especially why device_hotplug_lock is required when adding/removing memory and how it all play together with requests to online/offline memory from user space. Cc: Jonathan Corbet Cc: Michal Hocko Cc: Andrew Morton Reviewed-by: Pavel Tatashin Signed-off-by: David Hildenbrand --- Documentation/memory-hotplug.txt | 39 +++++++++++++++++++++++++++++++- 1 file changed, 38 insertions(+), 1 deletion(-) diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt index 7f49ebf3ddb2..03aaad7d7373 100644 --- a/Documentation/memory-hotplug.txt +++ b/Documentation/memory-hotplug.txt @@ -3,7 +3,7 @@ Memory Hotplug ============== :Created: Jul 28 2007 -:Updated: Add description of notifier of memory hotplug: Oct 11 2007 +:Updated: Add some details about locking internals: Aug 20 2018 This document is about memory hotplug including how-to-use and current status. Because Memory Hotplug is still under development, contents of this text will @@ -495,6 +495,43 @@ further processing of the notification queue. NOTIFY_STOP stops further processing of the notification queue. + +Locking Internals +================= + +When adding/removing memory that uses memory block devices (i.e. ordinary RAM), +the device_hotplug_lock should be held to: + +- synchronize against online/offline requests (e.g. via sysfs). This way, memory + block devices can only be accessed (.online/.state attributes) by user + space once memory has been fully added. And when removing memory, we + know nobody is in critical sections. +- synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC) + +Especially, there is a possible lock inversion that is avoided using +device_hotplug_lock when adding memory and user space tries to online that +memory faster than expected: + +- device_online() will first take the device_lock(), followed by + mem_hotplug_lock +- add_memory_resource() will first take the mem_hotplug_lock, followed by + the device_lock() (while creating the devices, during bus_add_device()). + +As the device is visible to user space before taking the device_lock(), this +can result in a lock inversion. + +onlining/offlining of memory should be done via device_online()/ +device_offline() - to make sure it is properly synchronized to actions +via sysfs. Holding device_hotplug_lock is advised (to e.g. protect online_type) + +When adding/removing/onlining/offlining memory or adding/removing +heterogeneous/device memory, we should always hold the mem_hotplug_lock to +serialise memory hotplug (e.g. access to global/zone variables). + +In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) allows +for a quite efficient get_online_mems/put_online_mems implementation. + + Future Work ===========