From patchwork Sun Mar 2 14:55:51 2025
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 13997839
From: Ryan Roberts
To: Andrew Morton, "David S. Miller", Andreas Larsson, Juergen Gross,
    Boris Ostrovsky, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
Peter Anvin" , "Matthew Wilcox (Oracle)" , Catalin Marinas Cc: Ryan Roberts , linux-mm@kvack.org, sparclinux@vger.kernel.org, xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org Subject: [PATCH v1 1/4] mm: Fix lazy mmu docs and usage Date: Sun, 2 Mar 2025 14:55:51 +0000 Message-ID: <20250302145555.3236789-2-ryan.roberts@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250302145555.3236789-1-ryan.roberts@arm.com> References: <20250302145555.3236789-1-ryan.roberts@arm.com> MIME-Version: 1.0 X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: D8B8D1C000B X-Rspam-User: X-Stat-Signature: pit1kysdoiwmw4zpmbc36c8neawfte4h X-HE-Tag: 1740927371-428275 X-HE-Meta: U2FsdGVkX19CTKMPHlwymWM6+xji+Oz26OMAANtcvsz4Cc916EMaF4epkxkG/3ENXZpsb8mbW16IgPPOYSQ+fkwCq8qg51EtJ6xvuHka+ct16rXHENmizMw2oZ4PpdV8UtPMD6EFy7R8ZKzBSQK5H7YtyjOuSIaH5O0yowzNom/h8NNCMw2dTHhLmbqFPtlGLEeY2v9Rz8ZHA77CVPI5m2oQHmKtnZhbv8VZg92AKZI5mVuBGVSDNoUD+3WlOlOkvferot1/AbJltKYL/Ck7DO++Rh5raJg6k6FgNPbc7fO81St+2o7mwgiOHqmhLm/Exadce/GkCMGlpRBuUEABNJHgod8/yt2ajBYSEXIGjnH+ZBBkL+/UpIUur3ex20nF1uGzhW88XOU7CcdfoQFVrldDsQvqlf1zfKTRqTeqjbo48vAjbPtWcFemrvcC9K5xgd7HbqqcJaEwsiHTVvaxQsPvh3N4Qka1x7C6cKtEzWseo8iHzSt3V/yrxG7/u5LjtoenVQDoFdGWsUyzScIbqyJV+FcGIDm3F6N1P1gXoDbcxh1IX0/VcpkzQlRGjpRiOeiNDDwnVYTbHac8jHa8o65sv48wsTO+i1Px7I4g1QmHks+qh1WeHm+jgZntD/4tYW8VpzxVC5M4ZAxWglzZKdXktcr4M+HHaJkKiURkcQtKjy6MeUqCFE+9CzEO7CUIy1eUL6i7ucYNraqMwcFSMkawn9Y9kl+bNMKw1HYLyfYQOQ+v7eEGykKdQyFTzZOl5jKeVebXQhX7emW/+VB1W8Nz16IQHqxhnIbyJPqSyjkECPPKqL9FdzEdxBGFWvzPtYrfwSrVTCSKKEy21IzGuA0SKmroC/Obti/dGVigK0ag0RoDKdC67TjAPJVRCPveP0pVFQqJuVv3W4Ql3BrDnSYTF7v5sGelseSqnMzY6dViqVudSg6C2XpfT81xFEnQBi8Cql4J7Z9cYbg6eFF /4v5JmZY mOd5s46aCAvlWhOkH/e2V/4SoXfhFQ9M9DwNNtZ8Wp4bJaAH5ar7YCjbuJH5EUIIG3Oq+cEx1DTBsak1OgO9ngneWeAkAIi45qaOfNIYQj4z3Hc8fSPZDB3cjiJdxX9NymeGGCjg39mVqcbR9V84WKOhBSJUZfksXnLPO X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: The docs, implementations and use of arch_[enter|leave]_lazy_mmu_mode() is a bit of a mess (to put it politely). There are a number of issues related to nesting of lazy mmu regions and confusion over whether the task, when in a lazy mmu region, is preemptible or not. Fix all the issues relating to the core-mm. Follow up commits will fix the arch-specific implementations. 3 arches implement lazy mmu; powerpc, sparc and x86. When arch_[enter|leave]_lazy_mmu_mode() was first introduced by commit 6606c3e0da53 ("[PATCH] paravirt: lazy mmu mode hooks.patch"), it was expected that lazy mmu regions would never nest and that the appropriate page table lock(s) would be held while in the region, thus ensuring the region is non-preemptible. Additionally lazy mmu regions were only used during manipulation of user mappings. Commit 38e0edb15bd0 ("mm/apply_to_range: call pte function with lazy updates") started invoking the lazy mmu mode in apply_to_pte_range(), which is used for both user and kernel mappings. For kernel mappings the region is no longer protected by any lock so there is no longer any guarantee about non-preemptibility. Additionally, for RT configs, the holding the PTL only implies no CPU migration, it doesn't prevent preemption. Commit bcc6cc832573 ("mm: add default definition of set_ptes()") added arch_[enter|leave]_lazy_mmu_mode() to the default implementation of set_ptes(), used by x86. So after this commit, lazy mmu regions can be nested. 
Additionally, commit 1a10a44dfc1d ("sparc64: implement the new page
table range API") and commit 9fee28baa601 ("powerpc: implement the new
page table range API") did the same for the sparc and powerpc
set_ptes() overrides.

powerpc couldn't deal with preemption so it avoided it in commit
b9ef323ea168 ("powerpc/64s: Disable preemption in hash lazy mmu mode"),
which explicitly disables preemption for the whole region in its
implementation. x86 can support preemption (or at least it could until
it tried to add support for nesting; more on this below). sparc looks
to be totally broken in the face of preemption, as far as I can tell.

powerpc can't deal with nesting, so it avoided it in commit
47b8def9358c ("powerpc/mm: Avoid calling arch_enter/leave_lazy_mmu() in
set_ptes"), which removes the lazy mmu calls from its implementation of
set_ptes(). x86 attempted to support nesting in commit 49147beb0ccb
("x86/xen: allow nesting of same lazy mode") but, as far as I can tell,
this breaks its support for preemption.

In short, it's all a mess; the semantics for
arch_[enter|leave]_lazy_mmu_mode() are not clearly defined and as a
result the implementations all have different expectations, sticking
plasters and bugs.

arm64 is aiming to start using these hooks, so let's clean everything
up before adding an arm64 implementation. Update the documentation to
state that lazy mmu regions can never be nested, must not be used in
interrupt context, and that preemption may or may not be enabled for
the duration of the region. Additionally, update the way
arch_[enter|leave]_lazy_mmu_mode() is called in
pagemap_scan_pmd_entry() to follow the normal pattern of holding the
ptl for user space mappings. As a result the scope is reduced to only
the pte table, but that's where most of the performance win is. While I
believe there wasn't technically a bug here, the original scope made it
easier to accidentally nest or, worse, to accidentally call something
like kmap() which would expect an immediate mode pte modification but
would end up with it deferred.

arch-specific fixes to conform to the new spec will follow this one.

These issues were spotted by code review and I have no evidence of
issues being reported in the wild.
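For illustration, the pattern the updated documentation requires looks
like the sketch below. This is not part of the patch; the walker
function and its loop body are hypothetical, but the locking calls
mirror the pagemap_scan_pmd_entry() change in the diff:

	/*
	 * Sketch only: map and lock the pte table first, enter lazy mmu
	 * mode for the duration of the batch, then leave the mode again
	 * before dropping the lock. Regions never nest.
	 */
	static int walk_user_ptes(struct vm_area_struct *vma, pmd_t *pmd,
				  unsigned long addr, unsigned long end)
	{
		pte_t *start_pte, *pte;
		spinlock_t *ptl;

		start_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
		if (!pte)
			return -EAGAIN;

		arch_enter_lazy_mmu_mode();

		for (; addr != end; pte++, addr += PAGE_SIZE) {
			/* batched user pte updates go here */
		}

		arch_leave_lazy_mmu_mode();
		pte_unmap_unlock(start_pte, ptl);
		return 0;
	}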
Signed-off-by: Ryan Roberts
Acked-by: David Hildenbrand
---
 fs/proc/task_mmu.c      | 11 ++++-------
 include/linux/pgtable.h | 14 ++++++++------
 2 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index c17615e21a5d..b0f189815512 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -2459,22 +2459,19 @@ static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigned long start,
 	spinlock_t *ptl;
 	int ret;
 
-	arch_enter_lazy_mmu_mode();
-
 	ret = pagemap_scan_thp_entry(pmd, start, end, walk);
-	if (ret != -ENOENT) {
-		arch_leave_lazy_mmu_mode();
+	if (ret != -ENOENT)
 		return ret;
-	}
 
 	ret = 0;
 	start_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, start, &ptl);
 	if (!pte) {
-		arch_leave_lazy_mmu_mode();
 		walk->action = ACTION_AGAIN;
 		return 0;
 	}
 
+	arch_enter_lazy_mmu_mode();
+
 	if ((p->arg.flags & PM_SCAN_WP_MATCHING) && !p->vec_out) {
 		/* Fast path for performing exclusive WP */
 		for (addr = start; addr != end; pte++, addr += PAGE_SIZE) {
@@ -2543,8 +2540,8 @@ static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigned long start,
 	if (flush_end)
 		flush_tlb_range(vma, start, addr);
 
-	pte_unmap_unlock(start_pte, ptl);
 	arch_leave_lazy_mmu_mode();
+	pte_unmap_unlock(start_pte, ptl);
 
 	cond_resched();
 	return ret;
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 94d267d02372..787c632ee2c9 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -222,10 +222,14 @@ static inline int pmd_dirty(pmd_t pmd)
  * hazard could result in the direct mode hypervisor case, since the actual
  * write to the page tables may not yet have taken place, so reads though
  * a raw PTE pointer after it has been modified are not guaranteed to be
- * up to date. This mode can only be entered and left under the protection of
- * the page table locks for all page tables which may be modified. In the UP
- * case, this is required so that preemption is disabled, and in the SMP case,
- * it must synchronize the delayed page table writes properly on other CPUs.
+ * up to date.
+ *
+ * In the general case, no lock is guaranteed to be held between entry and exit
+ * of the lazy mode. So the implementation must assume preemption may be enabled
+ * and cpu migration is possible; it must take steps to be robust against this.
+ * (In practice, for user PTE updates, the appropriate page table lock(s) are
+ * held, but for kernel PTE updates, no lock is held). Nesting is not permitted
+ * and the mode cannot be used in interrupt context.
  */
 #ifndef __HAVE_ARCH_ENTER_LAZY_MMU_MODE
 #define arch_enter_lazy_mmu_mode()	do {} while (0)
@@ -287,7 +291,6 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 {
 	page_table_check_ptes_set(mm, ptep, pte, nr);
 
-	arch_enter_lazy_mmu_mode();
 	for (;;) {
 		set_pte(ptep, pte);
 		if (--nr == 0)
@@ -295,7 +298,6 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 		ptep++;
 		pte = pte_next_pfn(pte);
 	}
-	arch_leave_lazy_mmu_mode();
 }
 #endif
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)

From patchwork Sun Mar 2 14:55:52 2025
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 13997840
From: Ryan Roberts
To: Andrew Morton, "David S. Miller", Andreas Larsson, Juergen Gross,
    Boris Ostrovsky, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    Dave Hansen, "H. Peter Anvin", "Matthew Wilcox (Oracle)", Catalin Marinas
Cc: Ryan Roberts, linux-mm@kvack.org, sparclinux@vger.kernel.org,
    xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org
Subject: [PATCH v1 2/4] sparc/mm: Disable preemption in lazy mmu mode
Date: Sun, 2 Mar 2025 14:55:52 +0000
Message-ID: <20250302145555.3236789-3-ryan.roberts@arm.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20250302145555.3236789-1-ryan.roberts@arm.com>
References: <20250302145555.3236789-1-ryan.roberts@arm.com>

Since commit 38e0edb15bd0 ("mm/apply_to_range: call pte function with
lazy updates") it has been possible for arch_[enter|leave]_lazy_mmu_mode()
to be called without holding a page table lock (in the kernel mappings
case), so preemption can occur while in the lazy mmu mode. The sparc
lazy mmu implementation is not robust to preemption since it stores the
lazy mode state in a per-cpu structure and does not attempt to manage
that state on task switch.

powerpc had the same issue and fixed it by explicitly disabling
preemption in arch_enter_lazy_mmu_mode() and re-enabling it in
arch_leave_lazy_mmu_mode(). See commit b9ef323ea168 ("powerpc/64s:
Disable preemption in hash lazy mmu mode").

Given sparc's lazy mmu mode is based on powerpc's, let's fix it in the
same way here.
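To illustrate the hazard, here is the current sparc implementation
annotated with one possible preemption sequence (annotations only; the
code is unchanged from arch/sparc/mm/tlb.c):

	void arch_enter_lazy_mmu_mode(void)
	{
		/* this_cpu_ptr() resolves to, say, CPU 0's batch */
		struct tlb_batch *tb = this_cpu_ptr(&tlb_batch);

		tb->active = 1;
	}

	/* ...the task may now be preempted and migrated to CPU 1... */

	void arch_leave_lazy_mmu_mode(void)
	{
		/*
		 * ...so this resolves to CPU 1's batch: CPU 0 is left
		 * with active == 1 and its pending flushes are never
		 * issued.
		 */
		struct tlb_batch *tb = this_cpu_ptr(&tlb_batch);

		if (tb->tlb_nr)
			flush_tlb_pending();
		tb->active = 0;
	}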
Fixes: 38e0edb15bd0 ("mm/apply_to_range: call pte function with lazy updates")
Signed-off-by: Ryan Roberts
Acked-by: David Hildenbrand
Acked-by: Andreas Larsson
---
 arch/sparc/mm/tlb.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c
index 8648a50afe88..a35ddcca5e76 100644
--- a/arch/sparc/mm/tlb.c
+++ b/arch/sparc/mm/tlb.c
@@ -52,8 +52,10 @@ void flush_tlb_pending(void)
 
 void arch_enter_lazy_mmu_mode(void)
 {
-	struct tlb_batch *tb = this_cpu_ptr(&tlb_batch);
+	struct tlb_batch *tb;
 
+	preempt_disable();
+	tb = this_cpu_ptr(&tlb_batch);
 	tb->active = 1;
 }
 
@@ -64,6 +66,7 @@ void arch_leave_lazy_mmu_mode(void)
 	if (tb->tlb_nr)
 		flush_tlb_pending();
 	tb->active = 0;
+	preempt_enable();
 }
 
 static void tlb_batch_add_one(struct mm_struct *mm, unsigned long vaddr,

From patchwork Sun Mar 2 14:55:53 2025
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 13997841
From: Ryan Roberts
To: Andrew Morton, "David S. Miller", Andreas Larsson, Juergen Gross,
    Boris Ostrovsky, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    Dave Hansen, "H. Peter Anvin", "Matthew Wilcox (Oracle)", Catalin Marinas
Cc: Ryan Roberts, linux-mm@kvack.org, sparclinux@vger.kernel.org,
    xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org
Subject: [PATCH v1 3/4] sparc/mm: Avoid calling arch_enter/leave_lazy_mmu() in set_ptes
Date: Sun, 2 Mar 2025 14:55:53 +0000
Message-ID: <20250302145555.3236789-4-ryan.roberts@arm.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20250302145555.3236789-1-ryan.roberts@arm.com>
References: <20250302145555.3236789-1-ryan.roberts@arm.com>

Commit 1a10a44dfc1d ("sparc64: implement the new page table range API")
added set_ptes() to the sparc architecture, and its implementation
calls arch_enter_lazy_mmu_mode() and arch_leave_lazy_mmu_mode(). This
patch removes those calls, since they imply nesting of lazy mmu
regions, which is not supported. Without this fix, lazy mmu mode is
effectively disabled because we exit the mode after the first
set_ptes():

remap_pte_range()
  -> arch_enter_lazy_mmu_mode()
  -> set_ptes()
       -> arch_enter_lazy_mmu_mode()
       -> arch_leave_lazy_mmu_mode()
  -> arch_leave_lazy_mmu_mode()

powerpc suffered the same problem and fixed it in a corresponding way
with commit 47b8def9358c ("powerpc/mm: Avoid calling
arch_enter/leave_lazy_mmu() in set_ptes").
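Since sparc tracks the mode with a simple boolean (tb->active, see the
previous patch), the inner leave turns batching off for the remainder
of the outer region. Annotated for illustration only:

	arch_enter_lazy_mmu_mode();	/* tb->active = 1 */
	set_ptes(mm, addr, ptep, pte, nr);
		/* nested enter: tb->active = 1 (no-op)          */
		/* nested leave: flushes, then tb->active = 0    */
	/* further pte updates here are applied immediately, not */
	/* batched, even though we are still "in" the region     */
	arch_leave_lazy_mmu_mode();	/* nothing left to flush  */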
Fixes: 1a10a44dfc1d ("sparc64: implement the new page table range API")
Signed-off-by: Ryan Roberts
Acked-by: David Hildenbrand
Acked-by: Andreas Larsson
---
 arch/sparc/include/asm/pgtable_64.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 2b7f358762c1..dc28f2c4eee3 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -936,7 +936,6 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
 static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 			    pte_t *ptep, pte_t pte, unsigned int nr)
 {
-	arch_enter_lazy_mmu_mode();
 	for (;;) {
 		__set_pte_at(mm, addr, ptep, pte, 0);
 		if (--nr == 0)
@@ -945,7 +944,6 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 		pte_val(pte) += PAGE_SIZE;
 		addr += PAGE_SIZE;
 	}
-	arch_leave_lazy_mmu_mode();
 }
 #define set_ptes set_ptes

From patchwork Sun Mar 2 14:55:54 2025
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 13997842
From: Ryan Roberts
To: Andrew Morton, "David S. Miller", Andreas Larsson, Juergen Gross,
    Boris Ostrovsky, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    Dave Hansen, "H. Peter Anvin", "Matthew Wilcox (Oracle)", Catalin Marinas
Cc: Ryan Roberts, linux-mm@kvack.org, sparclinux@vger.kernel.org,
    xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org
Subject: [PATCH v1 4/4] Revert "x86/xen: allow nesting of same lazy mode"
Date: Sun, 2 Mar 2025 14:55:54 +0000
Message-ID: <20250302145555.3236789-5-ryan.roberts@arm.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20250302145555.3236789-1-ryan.roberts@arm.com>
References: <20250302145555.3236789-1-ryan.roberts@arm.com>

Commit 49147beb0ccb ("x86/xen: allow nesting of same lazy mode") was
added as a solution for a core-mm code change where
arch_[enter|leave]_lazy_mmu_mode() started to be called in a nested
manner; see commit bcc6cc832573 ("mm: add default definition of
set_ptes()").

However, now that we have fixed the API to avoid nesting, we no longer
need this capability in the x86 implementation.

Additionally, from code review, I don't believe the fix was ever robust
in the case of preemption occurring while in the nested lazy mode. The
implementation usually deals with preemption by calling
arch_leave_lazy_mmu_mode() from xen_start_context_switch() for the
outgoing task, if we are in the lazy mmu mode. Then in
xen_end_context_switch(), it restarts the lazy mode by calling
arch_enter_lazy_mmu_mode() for an incoming task that was in the lazy
mode when it was switched out. But arch_leave_lazy_mmu_mode() will only
unwind a single level of nesting. If we are double-nested, the mode is
not fully unwound and the per-cpu variables are left in a bad state.

So the correct solution is to remove the possibility of nesting from
the higher level (which has now been done) and remove this x86-specific
workaround.
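For illustration, a hypothetical sequence showing the unwind going
wrong with the nesting counter that this revert removes:

	enter_lazy(XEN_LAZY_MMU);	/* xen_lazy_mode = XEN_LAZY_MMU */
	enter_lazy(XEN_LAZY_MMU);	/* xen_lazy_nesting = 1         */
	enter_lazy(XEN_LAZY_MMU);	/* xen_lazy_nesting = 2         */
	/*
	 * Context switch: xen_start_context_switch() calls
	 * arch_leave_lazy_mmu_mode() once, which only decrements
	 * xen_lazy_nesting to 1; xen_lazy_mode stays XEN_LAZY_MMU, so
	 * the per-cpu state is wrong for whatever runs next on this cpu.
	 */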
Fixes: 49147beb0ccb ("x86/xen: allow nesting of same lazy mode")
Signed-off-by: Ryan Roberts
Acked-by: David Hildenbrand
---
 arch/x86/include/asm/xen/hypervisor.h | 15 ++-------------
 arch/x86/xen/enlighten_pv.c           |  1 -
 2 files changed, 2 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/xen/hypervisor.h b/arch/x86/include/asm/xen/hypervisor.h
index a9088250770f..bd0fc69a10a7 100644
--- a/arch/x86/include/asm/xen/hypervisor.h
+++ b/arch/x86/include/asm/xen/hypervisor.h
@@ -72,18 +72,10 @@ enum xen_lazy_mode {
 };
 
 DECLARE_PER_CPU(enum xen_lazy_mode, xen_lazy_mode);
-DECLARE_PER_CPU(unsigned int, xen_lazy_nesting);
 
 static inline void enter_lazy(enum xen_lazy_mode mode)
 {
-	enum xen_lazy_mode old_mode = this_cpu_read(xen_lazy_mode);
-
-	if (mode == old_mode) {
-		this_cpu_inc(xen_lazy_nesting);
-		return;
-	}
-
-	BUG_ON(old_mode != XEN_LAZY_NONE);
+	BUG_ON(this_cpu_read(xen_lazy_mode) != XEN_LAZY_NONE);
 
 	this_cpu_write(xen_lazy_mode, mode);
 }
@@ -92,10 +84,7 @@ static inline void leave_lazy(enum xen_lazy_mode mode)
 {
 	BUG_ON(this_cpu_read(xen_lazy_mode) != mode);
 
-	if (this_cpu_read(xen_lazy_nesting) == 0)
-		this_cpu_write(xen_lazy_mode, XEN_LAZY_NONE);
-	else
-		this_cpu_dec(xen_lazy_nesting);
+	this_cpu_write(xen_lazy_mode, XEN_LAZY_NONE);
 }
 
 enum xen_lazy_mode xen_get_lazy_mode(void);
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 5e57835e999d..919e4df9380b 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -99,7 +99,6 @@ struct tls_descs {
 };
 
 DEFINE_PER_CPU(enum xen_lazy_mode, xen_lazy_mode) = XEN_LAZY_NONE;
-DEFINE_PER_CPU(unsigned int, xen_lazy_nesting);
 
 enum xen_lazy_mode xen_get_lazy_mode(void)
 {