From patchwork Tue Sep 25 09:14:51 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 10613751
From: David Hildenbrand
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linuxppc-dev@lists.ozlabs.org, linux-acpi@vger.kernel.org,
 xen-devel@lists.xenproject.org, devel@linuxdriverproject.org,
 David Hildenbrand, Andrew Morton, Balbir Singh,
 Benjamin Herrenschmidt, Boris Ostrovsky, Dan Williams, Greg
 Kroah-Hartman, Haiyang Zhang, Heiko Carstens, John Allen,
 Jonathan Corbet, Joonsoo Kim, Juergen Gross, Kate Stewart,
 "K. Y. Srinivasan", Len Brown, Martin Schwidefsky, Mathieu Malaterre,
 Michael Ellerman, Michael Neuling, Michal Hocko, Nathan Fontenot,
 Oscar Salvador, Paul Mackerras, Pavel Tatashin, Pavel Tatashin,
 Philippe Ombredanne, "Rafael J. Wysocki", "Rafael J. Wysocki",
 Rashmica Gupta, Stephen Hemminger, Thomas Gleixner, Vlastimil Babka,
 YASUAKI ISHIMATSU
Subject: [PATCH v2 0/6] mm: online/offline_pages called w.o. mem_hotplug_lock
Date: Tue, 25 Sep 2018 11:14:51 +0200
Message-Id: <20180925091457.28651-1-david@redhat.com>

Reading through the code and studying how mem_hotplug_lock is to be used,
I noticed that there are two places where we can end up calling
device_online()/device_offline() - online_pages()/offline_pages() without
the mem_hotplug_lock. And there are other places where we call
device_online()/device_offline() without the device_hotplug_lock.

While e.g.
	echo "online" > /sys/devices/system/memory/memory9/state
is fine, e.g.
	echo 1 > /sys/devices/system/memory/memory9/online
will not take the mem_hotplug_lock. It does, however, take the
device_lock() and the device_hotplug_lock.

E.g. via memory_probe_store(), we can end up calling
add_memory()->online_pages() without the device_hotplug_lock. So we can
have concurrent callers in online_pages(). In that case, online_pages()
would e.g. modify zone->present_pages basically unprotected.

Looks like there is a longer history to that (see Patch #2 for details),
and fixing it to work the way it was intended is not really possible. We
would e.g. have to take the mem_hotplug_lock in drivers/base/core.c, which
sounds wrong.

Summary: We had a lock inversion on mem_hotplug_lock and device_lock().
More details can be found in patch 3 and patch 6.

I propose the general rules (documentation added in patch 6):

1. add_memory/add_memory_resource() must only be called with
   device_hotplug_lock.
2. remove_memory() must only be called with device_hotplug_lock. This is
   already documented and holds for all callers.
3. device_online()/device_offline() must only be called with
   device_hotplug_lock. This is already documented and true for now in core
   code. Other callers (related to memory hotplug) have to be fixed up.
4. mem_hotplug_lock is taken inside of add_memory/remove_memory/
   online_pages/offline_pages.

To me, this looks way cleaner than what we have right now (and easier to
verify). And looking at the documentation of remove_memory, using
lock_device_hotplug also for add_memory() feels natural.

v1 -> v2:
- Upstream changes in powerpc/powernv code required modifications to
  patch #1, #4 and #5.
- Minor patch description changes.
- Added more locking details in patch #6.
- Added rb's

RFCv2 -> v1:
- Dropped an unnecessary _ref from remove_memory() in patch #1
- Minor patch description fixes.
- Added rb's

RFC -> RFCv2:
- Don't export device_hotplug_lock, provide proper remove_memory/add_memory
  wrappers.
- Split up the patches a bit.
- Try to improve powernv memtrace locking
- Add some documentation for locking that matches my knowledge

David Hildenbrand (6):
  mm/memory_hotplug: make remove_memory() take the device_hotplug_lock
  mm/memory_hotplug: make add_memory() take the device_hotplug_lock
  mm/memory_hotplug: fix online/offline_pages called w.o.
    mem_hotplug_lock
  powerpc/powernv: hold device_hotplug_lock when calling device_online()
  powerpc/powernv: hold device_hotplug_lock when calling
    memtrace_offline_pages()
  memory-hotplug.txt: Add some details about locking internals

 Documentation/memory-hotplug.txt              | 42 ++++++++++++-
 arch/powerpc/platforms/powernv/memtrace.c     |  8 ++-
 .../platforms/pseries/hotplug-memory.c        |  8 +--
 drivers/acpi/acpi_memhotplug.c                |  4 +-
 drivers/base/memory.c                         | 22 +++----
 drivers/xen/balloon.c                         |  3 +
 include/linux/memory_hotplug.h                |  4 +-
 mm/memory_hotplug.c                           | 59 +++++++++++++++----
 8 files changed, 114 insertions(+), 36 deletions(-)
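
For readers who want the proposed nesting spelled out, here is a rough user-space sketch of rules 3 and 4 from the cover letter. It is not kernel code and not taken from the patches: the locks are plain flags, everything is single-threaded, and all *_model names as well as rule_violations and the two caller functions are made up for illustration. Only the lock names and the kernel functions mentioned in the comments come from the kernel.

```c
#include <stdbool.h>

/* Toy stand-ins for the two locks; plain flags, single-threaded on purpose. */
static bool device_hotplug_lock_held;
static bool mem_hotplug_lock_held;

static unsigned long present_pages;	/* stands in for zone->present_pages */
static int rule_violations;		/* hypothetical counter, illustration only */

/* Model of lock_device_hotplug()/unlock_device_hotplug(). */
static void lock_device_hotplug_model(void)   { device_hotplug_lock_held = true; }
static void unlock_device_hotplug_model(void) { device_hotplug_lock_held = false; }

/* The counter update is only safe while mem_hotplug_lock is held. */
static void adjust_present_pages_model(unsigned long nr_pages)
{
	if (!mem_hotplug_lock_held)
		rule_violations++;
	present_pages += nr_pages;
}

/* Rule 4: mem_hotplug_lock is taken inside online_pages() itself. */
static void online_pages_model(unsigned long nr_pages)
{
	mem_hotplug_lock_held = true;	/* mem_hotplug_begin() in the kernel */
	adjust_present_pages_model(nr_pages);
	mem_hotplug_lock_held = false;	/* mem_hotplug_done() in the kernel */
}

/* Rule 3: device_online() must only be called with device_hotplug_lock. */
static void device_online_model(unsigned long nr_pages)
{
	if (!device_hotplug_lock_held)
		rule_violations++;
	online_pages_model(nr_pages);
}

/* A caller that follows the rules: outer lock first, then device_online(). */
static void online_following_rules(unsigned long nr_pages)
{
	lock_device_hotplug_model();
	device_online_model(nr_pages);
	unlock_device_hotplug_model();
}

/* A caller that forgets the outer lock, like the ones the series fixes up. */
static void online_without_outer_lock(unsigned long nr_pages)
{
	device_online_model(nr_pages);
}
```

Calling online_following_rules() leaves rule_violations untouched, while online_without_outer_lock() bumps it once. The point of the nesting is that callers only have to care about device_hotplug_lock; mem_hotplug_lock stays an implementation detail of the mm functions.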