From patchwork Tue Sep 25 09:14:54 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 10613765 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4FA836CB for ; Tue, 25 Sep 2018 09:15:43 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 45D50299D5 for ; Tue, 25 Sep 2018 09:15:43 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 39E3529B3C; Tue, 25 Sep 2018 09:15:43 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=unavailable version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id F321F29B39 for ; Tue, 25 Sep 2018 09:15:41 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CC6B18E0085; Tue, 25 Sep 2018 05:15:40 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id C73278E0072; Tue, 25 Sep 2018 05:15:40 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AF03C8E0085; Tue, 25 Sep 2018 05:15:40 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-qt1-f199.google.com (mail-qt1-f199.google.com [209.85.160.199]) by kanga.kvack.org (Postfix) with ESMTP id 7F5918E0072 for ; Tue, 25 Sep 2018 05:15:40 -0400 (EDT) Received: by mail-qt1-f199.google.com with SMTP id h26-v6so6767990qtp.18 for ; Tue, 25 Sep 2018 02:15:40 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:date:message-id:in-reply-to:references; bh=M8OnhsX7AqKY0AfhmWqiVkUqDUfxMPNR7Ro4YkEuXx0=; b=s8HlgyC/xBjBR19+PKLT8uKjcF7OdItlTXf2mIZo6F285RrHlS/cBtUX5WYMFKbYz3 zItUaSJmDrf5MznKJ0PQG+2OPrrl/pzd9X2zBsYAERs7iOmElFFlEfAQwN1mJa11j3vL UAmgDwhmfFgfFvoSnoKUyF8P2hPpc3NFwQgyRdJ85BXrq7h3kRbSTokzsG2U7X2UBWlk THPG10C6kunvtwp6gpDCB5KoisKeRzmE1ZdVFBRz33hYkkjAg4LRVVfEOgwZRVcUKcav liTIle7nF1pvd2KeQNlniYhh7aLfX9rY8VXQKuXeyWNKd/6oHWYOF2K8Dli+1lN8UVt0 wpew== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of david@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=david@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com X-Gm-Message-State: ABuFfojd3KAVp/3Zlf8q1VfhSCe0ceQ2HqZ4ucVOkuxO72Iiz08wrNgk tt3QnZ2nZDsKKIkBXBGVsuu/MlwU3lC1+97FqIoMrNBk0b6uFFXT4fPiLiflZEq/ZpERWaSbYR/ iXKc+IFpHIcBEzv5F/ZzrIB07B4bZUJlt6gNFRvF4g6p2TDPXk95gq7TUv6DUOcLZRA== X-Received: by 2002:a37:a753:: with SMTP id q80-v6mr74145qke.92.1537866940250; Tue, 25 Sep 2018 02:15:40 -0700 (PDT) X-Google-Smtp-Source: ACcGV62Gciyqaj6T7LGiaoWPS/aWPxXgg+5c07IR+o46Nfw3Y/upwcDvrszspt5u6Ev1w69jSZBt X-Received: by 2002:a37:a753:: with SMTP id q80-v6mr74107qke.92.1537866939449; Tue, 25 Sep 2018 02:15:39 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1537866939; cv=none; d=google.com; s=arc-20160816; b=iMWGj/fdBpotpxbvCMhk/z/eiRthjoUYZFib3qrXXBqEwLTYC6JRpsbmHSdHSGfrrY 438DSWeATCup5IDlc0WLpIjf7VKhtcwEx/xT7FTqPUKIz93zJ8llxNBlDn6tKaG5Etd+ 5/MKwt9Hyq2X8OaQYK4ems/ZQoxIbI2w7vt1P4gyCJ4WnjYnTgE/elRtWRauTtVsZKeq NoC4bBToFtMF60aZ4P34oVSpK9Gw1X66kTJtlk9gp7GIXQs6f2QgQF1hQZaIyGwqiJRk F1sreDDL3TxElbG2k0KGQk2l/DOKF5ndZ3bpvKYNscSlxex+GBzYJm5Kg3UYz9eiuOrH Tu9A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=references:in-reply-to:message-id:date:subject:cc:to:from; bh=M8OnhsX7AqKY0AfhmWqiVkUqDUfxMPNR7Ro4YkEuXx0=; b=vdSkoo/mdUQWWPbQdO9vNcoHEYdkQwgVsBwVTpXYX+ySXAUCXhZiGuYTheazTgICVr HSnFihAD3+fPVKW5Ar1ZInH3ImvpR2TrOyvvtRNEpGBqMozTLaf5odrAQelQCWBd3blP hTis4/mEeb0AifOWYp3a9uesINu4e+HJJtV0Dc6r/H/jCqmQYOd6s+ty4kPQFHawbihO TWBZNZhxY1wlDF6pjyS/JIyV/e4rVkSmbfjOXHFNddhgUzgNSFKPYJZd6+GkDtnPz2N6 j8o/cULNvEGS7yvM8ryzPzOuta0pK9/UJRlWuOyZYpMEZ2LYWj/Pi0v5Oy96QIec2frc ogAA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of david@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=david@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from mx1.redhat.com (mx1.redhat.com. [209.132.183.28]) by mx.google.com with ESMTPS id f25-v6si169393qvh.45.2018.09.25.02.15.39 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 25 Sep 2018 02:15:39 -0700 (PDT) Received-SPF: pass (google.com: domain of david@redhat.com designates 209.132.183.28 as permitted sender) client-ip=209.132.183.28; Authentication-Results: mx.google.com; spf=pass (google.com: domain of david@redhat.com designates 209.132.183.28 as permitted sender) smtp.mailfrom=david@redhat.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.24]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 34ABBC057E09; Tue, 25 Sep 2018 09:15:38 +0000 (UTC) Received: from t460s.redhat.com (ovpn-117-161.ams2.redhat.com [10.36.117.161]) by smtp.corp.redhat.com (Postfix) with ESMTP id 06857308BDAA; Tue, 25 Sep 2018 09:15:30 +0000 (UTC) From: David Hildenbrand To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-acpi@vger.kernel.org, xen-devel@lists.xenproject.org, devel@linuxdriverproject.org, David Hildenbrand , Benjamin Herrenschmidt , Paul Mackerras , Michael Ellerman , "Rafael J. Wysocki" , Len Brown , Greg Kroah-Hartman , "K. Y. Srinivasan" , Haiyang Zhang , Stephen Hemminger , Martin Schwidefsky , Heiko Carstens , Boris Ostrovsky , Juergen Gross , Rashmica Gupta , Michael Neuling , Balbir Singh , Kate Stewart , Thomas Gleixner , Philippe Ombredanne , Andrew Morton , Michal Hocko , Pavel Tatashin , Vlastimil Babka , Dan Williams , Oscar Salvador , YASUAKI ISHIMATSU , Mathieu Malaterre Subject: [PATCH v2 3/6] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock Date: Tue, 25 Sep 2018 11:14:54 +0200 Message-Id: <20180925091457.28651-4-david@redhat.com> In-Reply-To: <20180925091457.28651-1-david@redhat.com> References: <20180925091457.28651-1-david@redhat.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.24 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Tue, 25 Sep 2018 09:15:38 +0000 (UTC) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP There seem to be some problems as result of 30467e0b3be ("mm, hotplug: fix concurrent memory hot-add deadlock"), which tried to fix a possible lock inversion reported and discussed in [1] due to the two locks a) device_lock() b) mem_hotplug_lock While add_memory() first takes b), followed by a) during bus_probe_device(), onlining of memory from user space first took a), followed by b), exposing a possible deadlock. In [1], and it was decided to not make use of device_hotplug_lock, but rather to enforce a locking order. The problems I spotted related to this: 1. Memory block device attributes: While .state first calls mem_hotplug_begin() and the calls device_online() - which takes device_lock() - .online does no longer call mem_hotplug_begin(), so effectively calls online_pages() without mem_hotplug_lock. 2. device_online() should be called under device_hotplug_lock, however onlining memory during add_memory() does not take care of that. In addition, I think there is also something wrong about the locking in 3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages() without locks. This was introduced after 30467e0b3be. And skimming over the code, I assume it could need some more care in regards to locking (e.g. device_online() called without device_hotplug_lock. This will be addressed in the following patches. Now that we hold the device_hotplug_lock when - adding memory (e.g. via add_memory()/add_memory_resource()) - removing memory (e.g. via remove_memory()) - device_online()/device_offline() We can move mem_hotplug_lock usage back into online_pages()/offline_pages(). Why is mem_hotplug_lock still needed? Essentially to make get_online_mems()/put_online_mems() be very fast (relying on device_hotplug_lock would be very slow), and to serialize against addition of memory that does not create memory block devices (hmm). [1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/ 2015-February/065324.html This patch is partly based on a patch by Vitaly Kuznetsov. Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: "Rafael J. Wysocki" Cc: Len Brown Cc: Greg Kroah-Hartman Cc: "K. Y. Srinivasan" Cc: Haiyang Zhang Cc: Stephen Hemminger Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: Boris Ostrovsky Cc: Juergen Gross Cc: Rashmica Gupta Cc: Michael Neuling Cc: Balbir Singh Cc: Kate Stewart Cc: Thomas Gleixner Cc: Philippe Ombredanne Cc: Andrew Morton Cc: Michal Hocko Cc: Pavel Tatashin Cc: Vlastimil Babka Cc: Dan Williams Cc: Oscar Salvador Cc: YASUAKI ISHIMATSU Cc: Mathieu Malaterre Reviewed-by: Pavel Tatashin Reviewed-by: Rashmica Gupta Signed-off-by: David Hildenbrand --- drivers/base/memory.c | 13 +------------ mm/memory_hotplug.c | 28 ++++++++++++++++++++-------- 2 files changed, 21 insertions(+), 20 deletions(-) diff --git a/drivers/base/memory.c b/drivers/base/memory.c index 40cac122ec73..0e5985682642 100644 --- a/drivers/base/memory.c +++ b/drivers/base/memory.c @@ -228,7 +228,6 @@ static bool pages_correctly_probed(unsigned long start_pfn) /* * MEMORY_HOTPLUG depends on SPARSEMEM in mm/Kconfig, so it is * OK to have direct references to sparsemem variables in here. - * Must already be protected by mem_hotplug_begin(). */ static int memory_block_action(unsigned long phys_index, unsigned long action, int online_type) @@ -294,7 +293,6 @@ static int memory_subsys_online(struct device *dev) if (mem->online_type < 0) mem->online_type = MMOP_ONLINE_KEEP; - /* Already under protection of mem_hotplug_begin() */ ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE); /* clear online_type */ @@ -341,19 +339,11 @@ store_mem_state(struct device *dev, goto err; } - /* - * Memory hotplug needs to hold mem_hotplug_begin() for probe to find - * the correct memory block to online before doing device_online(dev), - * which will take dev->mutex. Take the lock early to prevent an - * inversion, memory_subsys_online() callbacks will be implemented by - * assuming it's already protected. - */ - mem_hotplug_begin(); - switch (online_type) { case MMOP_ONLINE_KERNEL: case MMOP_ONLINE_MOVABLE: case MMOP_ONLINE_KEEP: + /* mem->online_type is protected by device_hotplug_lock */ mem->online_type = online_type; ret = device_online(&mem->dev); break; @@ -364,7 +354,6 @@ store_mem_state(struct device *dev, ret = -EINVAL; /* should never happen */ } - mem_hotplug_done(); err: unlock_device_hotplug(); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index affb03e0dfef..d4c7e42e46f3 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -860,7 +860,6 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid, return zone; } -/* Must be protected by mem_hotplug_begin() or a device_lock */ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_type) { unsigned long flags; @@ -872,6 +871,8 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ struct memory_notify arg; struct memory_block *mem; + mem_hotplug_begin(); + /* * We can't use pfn_to_nid() because nid might be stored in struct page * which is not yet initialized. Instead, we find nid from memory block. @@ -936,6 +937,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ if (onlined_pages) memory_notify(MEM_ONLINE, &arg); + mem_hotplug_done(); return 0; failed_addition: @@ -943,6 +945,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ (unsigned long long) pfn << PAGE_SHIFT, (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1); memory_notify(MEM_CANCEL_ONLINE, &arg); + mem_hotplug_done(); return ret; } #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */ @@ -1147,20 +1150,20 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online) /* create new memmap entry */ firmware_map_add_hotplug(start, start + size, "System RAM"); + /* device_online() will take the lock when calling online_pages() */ + mem_hotplug_done(); + /* online pages if requested */ if (online) walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL, online_memory_block); - goto out; - + return ret; error: /* rollback pgdat allocation and others */ if (new_node) rollback_node_hotadd(nid); memblock_remove(start, size); - -out: mem_hotplug_done(); return ret; } @@ -1588,10 +1591,16 @@ static int __ref __offline_pages(unsigned long start_pfn, return -EINVAL; if (!IS_ALIGNED(end_pfn, pageblock_nr_pages)) return -EINVAL; + + mem_hotplug_begin(); + /* This makes hotplug much easier...and readable. we assume this for now. .*/ - if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start, &valid_end)) + if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start, + &valid_end)) { + mem_hotplug_done(); return -EINVAL; + } zone = page_zone(pfn_to_page(valid_start)); node = zone_to_nid(zone); @@ -1600,8 +1609,10 @@ static int __ref __offline_pages(unsigned long start_pfn, /* set above range as isolated */ ret = start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE, true); - if (ret) + if (ret) { + mem_hotplug_done(); return ret; + } arg.start_pfn = start_pfn; arg.nr_pages = nr_pages; @@ -1672,6 +1683,7 @@ static int __ref __offline_pages(unsigned long start_pfn, writeback_set_ratelimit(); memory_notify(MEM_OFFLINE, &arg); + mem_hotplug_done(); return 0; failed_removal: @@ -1681,10 +1693,10 @@ static int __ref __offline_pages(unsigned long start_pfn, memory_notify(MEM_CANCEL_OFFLINE, &arg); /* pushback to free area */ undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE); + mem_hotplug_done(); return ret; } -/* Must be protected by mem_hotplug_begin() or a device_lock */ int offline_pages(unsigned long start_pfn, unsigned long nr_pages) { return __offline_pages(start_pfn, start_pfn + nr_pages);