From patchwork Fri Mar 11 19:07:45 2022
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12778535
From: Nadav Amit
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Andrew Morton, Nadav Amit,
	Andi Kleen, Andrea Arcangeli, Andrew Cooper, Andy Lutomirski,
	Dave Hansen, Peter Xu, Peter Zijlstra, Thomas Gleixner,
	Will Deacon, Yu Zhao, Nick Piggin, x86@kernel.org
Subject: [RESEND PATCH v3 1/5] x86: Detection of Knights Landing A/D leak
Date: Fri, 11 Mar 2022 11:07:45 -0800
Message-Id: <20220311190749.338281-2-namit@vmware.com>
In-Reply-To: <20220311190749.338281-1-namit@vmware.com>
References: <20220311190749.338281-1-namit@vmware.com>

From: Nadav Amit

Knights Landing has an issue: a thread setting the A or D bits may not do
so atomically with respect to checking the present bit. A thread that is
about to page fault may still set those bits, even though the present bit
was already atomically cleared.

This implies that after the kernel clears the present bit atomically, the
supposedly zero entry can later be corrupted by stray A or D bits. Since
the PTE may already be reused to store a swap index or a NUMA migration
index, this cannot be tolerated. Most of the time the kernel detects the
problem, but in some rare cases it may not.

This patch adds an interface to detect the bug, which will be used in a
following patch.
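
The bug flag added here is consumed like any other X86_BUG_* flag. Below is
a minimal sketch of a consumer; the helper name is hypothetical, but the
cpu_feature_enabled() check is the same one that patch 5/5 performs in
pmdp_invalidate_ad():

/*
 * Illustrative only: gate a workaround on the new bug flag.  Patch 5/5
 * uses this exact check to decide whether pmdp_invalidate_ad() must
 * still flush the TLB.
 */
static inline bool pte_clear_may_leak_ad(void)
{
	/* Knights Landing may write stale A/D bits into a cleared PTE. */
	return cpu_feature_enabled(X86_BUG_PTE_LEAK);
}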
Cc: Andi Kleen
Cc: Andrea Arcangeli
Cc: Andrew Cooper
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Dave Hansen
Cc: Peter Xu
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Will Deacon
Cc: Yu Zhao
Cc: Nick Piggin
Cc: x86@kernel.org
Link: https://lore.kernel.org/lkml/1465919919-2093-1-git-send-email-lukasz.anaczkowski@intel.com/
Signed-off-by: Nadav Amit
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/cpu/intel.c        | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 67ef0e81c7dc..184b299dbf12 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -442,5 +442,6 @@
 #define X86_BUG_TAA			X86_BUG(22) /* CPU is affected by TSX Async Abort(TAA) */
 #define X86_BUG_ITLB_MULTIHIT		X86_BUG(23) /* CPU may incur MCE during certain page attribute changes */
 #define X86_BUG_SRBDS			X86_BUG(24) /* CPU may leak RNG bits if not mitigated */
+#define X86_BUG_PTE_LEAK		X86_BUG(25) /* PTE may leak A/D bits after clear */
 
 #endif /* _ASM_X86_CPUFEATURES_H */
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 8321c43554a1..74780fef3f12 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -296,6 +296,11 @@ static void early_init_intel(struct cpuinfo_x86 *c)
 		}
 	}
 
+	if (c->x86_model == INTEL_FAM6_XEON_PHI_KNL) {
+		pr_info_once("Enabling PTE leaking workaround\n");
+		set_cpu_bug(c, X86_BUG_PTE_LEAK);
+	}
+
 	/*
 	 * Intel Quark Core DevMan_001.pdf section 6.4.11
 	 * "The operating system also is required to invalidate (i.e., flush)

From patchwork Fri Mar 11 19:07:48 2022
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12778536
From: Nadav Amit
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Andrew Morton, Nadav Amit,
	Andrea Arcangeli, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, Yu Zhao, Nick Piggin, x86@kernel.org
Subject: [RESEND PATCH v3 4/5] mm/mprotect: do not flush on permission promotion
Date: Fri, 11 Mar 2022 11:07:48 -0800
Message-Id: <20220311190749.338281-5-namit@vmware.com>
In-Reply-To: <20220311190749.338281-1-namit@vmware.com>
References: <20220311190749.338281-1-namit@vmware.com>

From: Nadav Amit

Currently, using mprotect() or uffd to unprotect a memory region causes a
TLB flush. At least on x86, no TLB flush is needed when protection is
promoted.

Add an arch-specific pte_needs_flush(), which tells whether a TLB flush is
needed based on the old PTE and the new one, and implement it for x86.

For x86, besides the simple rule that protection promotion or changes of
software bits do not require a flush, also add logic that considers the
dirty bit: if the dirty bit is clear and write-protect is set, no TLB
flush is needed, since x86 updates the dirty bit atomically on write and
rereads the PTE if the bit is clear.
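
To make the promotion/demotion rule concrete, here is a stand-alone model
of the decision (a simplified sketch only: it keeps just the present,
write and dirty bits of the enable-mask logic from pte_flags_need_flush()
in the diff below, and omits the software bits, NX and the global bit):

/* Simplified user-space model of the x86 flush rule (illustrative only). */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define P_PRESENT	(1u << 0)
#define P_RW		(1u << 1)
#define P_DIRTY		(1u << 6)

static bool flags_need_flush(uint32_t oldflags, uint32_t newflags)
{
	const uint32_t enable_mask = P_RW | P_DIRTY | P_PRESENT;
	uint32_t diff = oldflags ^ newflags;

	/* Only bits that were set and have changed force a flush. */
	return diff & (oldflags & enable_mask);
}

int main(void)
{
	/* Promotion: read-only -> writable; no flush needed. */
	printf("%d\n", flags_need_flush(P_PRESENT, P_PRESENT | P_RW));	/* 0 */
	/* Demotion: writable -> read-only; flush needed. */
	printf("%d\n", flags_need_flush(P_PRESENT | P_RW, P_PRESENT));	/* 1 */
	return 0;
}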
Cc: Andrea Arcangeli
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Dave Hansen
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Will Deacon
Cc: Yu Zhao
Cc: Nick Piggin
Cc: x86@kernel.org
Signed-off-by: Nadav Amit
---
 arch/x86/include/asm/pgtable_types.h |  2 +
 arch/x86/include/asm/tlbflush.h      | 82 ++++++++++++++++++++++++++++
 include/asm-generic/tlb.h            | 14 +++++
 mm/huge_memory.c                     |  9 +--
 mm/mprotect.c                        |  3 +-
 5 files changed, 105 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 40497a9020c6..8668bc661026 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -110,9 +110,11 @@
 #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
 #define _PAGE_NX	(_AT(pteval_t, 1) << _PAGE_BIT_NX)
 #define _PAGE_DEVMAP	(_AT(u64, 1) << _PAGE_BIT_DEVMAP)
+#define _PAGE_SOFTW4	(_AT(pteval_t, 1) << _PAGE_BIT_SOFTW4)
 #else
 #define _PAGE_NX	(_AT(pteval_t, 0))
 #define _PAGE_DEVMAP	(_AT(pteval_t, 0))
+#define _PAGE_SOFTW4	(_AT(pteval_t, 0))
 #endif
 
 #define _PAGE_PROTNONE	(_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 98fa0a114074..ec01e6cff137 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -259,6 +259,88 @@ static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch,
 
 extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
 
+/*
+ * The enable_mask tells which bits require a flush when they were present
+ * and get cleared.
+ *
+ * The disable_mask tells which bits require a flush when they were missing
+ * and get set.
+ *
+ * All other bits, except the ignored bits, require a flush no matter
+ * whether they get set or cleared.
+ *
+ * This function ignores the global bit, as it is used for protnone. It
+ * should therefore not be used if the global bit might really be cleared.
+ *
+ * The function allows the access bit to be ignored (provide _PAGE_ACCESSED
+ * as the ignore_access argument to do so). The expected use is that the
+ * access bit is ignored for PTEs but not for PMDs, which is the way the
+ * kernel performs TLB flushes after access-bit updates in other situations.
+ */
+static inline bool pte_flags_need_flush(unsigned long oldflags,
+					unsigned long newflags,
+					pteval_t ignore_access)
+{
+	const pteval_t ignore_mask = _PAGE_SOFTW1 | _PAGE_SOFTW2 |
+		_PAGE_SOFTW3 | _PAGE_SOFTW4 | _PAGE_GLOBAL | ignore_access;
+	const pteval_t enable_mask = _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT |
+		(_PAGE_ACCESSED & ~ignore_access);
+	const pteval_t disable_mask = _PAGE_NX;
+	unsigned long diff = oldflags ^ newflags;
+
+	VM_BUG_ON(ignore_access != 0 && ignore_access != _PAGE_ACCESSED);
+
+	return diff & ((oldflags & enable_mask) |
+		       (newflags & disable_mask) |
+		       ~(enable_mask | disable_mask | ignore_mask));
+}
+
+/*
+ * pte_needs_flush() checks whether permissions were demoted and require a
+ * flush. It should only be used for userspace PTEs.
+ */
+static inline bool pte_needs_flush(pte_t oldpte, pte_t newpte)
+{
+	/* !PRESENT -> * ; no need for flush */
+	if (!pte_present(oldpte))
+		return false;
+
+	/* PRESENT -> !PRESENT ; needs flush */
+	if (!pte_present(newpte))
+		return true;
+
+	/* PFN changed ; needs flush */
+	if (pte_pfn(oldpte) != pte_pfn(newpte))
+		return true;
+
+	return pte_flags_need_flush(pte_flags(oldpte), pte_flags(newpte),
+				    _PAGE_ACCESSED);
+}
+#define pte_needs_flush pte_needs_flush
+
+/*
+ * huge_pmd_needs_flush() checks whether permissions were demoted and
+ * require a flush. It should only be used for userspace huge PMDs.
+ */
+static inline bool huge_pmd_needs_flush(pmd_t oldpmd, pmd_t newpmd)
+{
+	/* !PRESENT -> * ; no need for flush */
+	if (!pmd_present(oldpmd))
+		return false;
+
+	/* PRESENT -> !PRESENT ; needs flush */
+	if (!pmd_present(newpmd))
+		return true;
+
+	/* PFN changed ; needs flush */
+	if (pmd_pfn(oldpmd) != pmd_pfn(newpmd))
+		return true;
+
+	return pte_flags_need_flush(pmd_flags(oldpmd), pmd_flags(newpmd), 0);
+}
+#define huge_pmd_needs_flush huge_pmd_needs_flush
+
 #endif /* !MODULE */
 
 static inline void __native_tlb_flush_global(unsigned long cr4)
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index fd7feb5c7894..3a30e23fa35d 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -654,6 +654,20 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb,
 	} while (0)
 #endif
 
+#ifndef pte_needs_flush
+static inline bool pte_needs_flush(pte_t oldpte, pte_t newpte)
+{
+	return true;
+}
+#endif
+
+#ifndef huge_pmd_needs_flush
+static inline bool huge_pmd_needs_flush(pmd_t oldpmd, pmd_t newpmd)
+{
+	return true;
+}
+#endif
+
 #endif /* CONFIG_MMU */
 
 #endif /* _ASM_GENERIC__TLB_H */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d58a5b498011..51b0f3cb1ba0 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1698,7 +1698,7 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 {
 	struct mm_struct *mm = vma->vm_mm;
 	spinlock_t *ptl;
-	pmd_t entry;
+	pmd_t oldpmd, entry;
 	bool preserve_write;
 	int ret;
 	bool prot_numa = cp_flags & MM_CP_PROT_NUMA;
@@ -1784,9 +1784,9 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	 * pmdp_invalidate() is required to make sure we don't miss
 	 * dirty/young flags set by hardware.
 	 */
-	entry = pmdp_invalidate(vma, addr, pmd);
+	oldpmd = pmdp_invalidate(vma, addr, pmd);
 
-	entry = pmd_modify(entry, newprot);
+	entry = pmd_modify(oldpmd, newprot);
 	if (preserve_write)
 		entry = pmd_mk_savedwrite(entry);
 	if (uffd_wp) {
@@ -1803,7 +1803,8 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	ret = HPAGE_PMD_NR;
 	set_pmd_at(mm, addr, pmd, entry);
 
-	tlb_flush_pmd_range(tlb, addr, HPAGE_PMD_SIZE);
+	if (huge_pmd_needs_flush(oldpmd, entry))
+		tlb_flush_pmd_range(tlb, addr, HPAGE_PMD_SIZE);
 
 	BUG_ON(vma_is_anonymous(vma) && !preserve_write && pmd_write(entry));
 unlock:
diff --git a/mm/mprotect.c b/mm/mprotect.c
index f9730bac2d78..97967d589ddb 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -152,7 +152,8 @@ static unsigned long change_pte_range(struct mmu_gather *tlb,
 				ptent = pte_mkwrite(ptent);
 			}
 			ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
-			tlb_flush_pte_range(tlb, addr, PAGE_SIZE);
+			if (pte_needs_flush(oldpte, ptent))
+				tlb_flush_pte_range(tlb, addr, PAGE_SIZE);
 			pages++;
 		} else if (is_swap_pte(oldpte)) {
 			swp_entry_t entry = pte_to_swp_entry(oldpte);

From patchwork Fri Mar 11 19:07:49 2022
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12778537
From: Nadav Amit
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Andrew Morton, Nadav Amit,
	Andrea Arcangeli, Andrew Cooper, Andy Lutomirski, Dave Hansen,
	Peter Xu, Peter Zijlstra, Thomas Gleixner, Will Deacon, Yu Zhao,
	Nick Piggin, x86@kernel.org
Subject: [RESEND PATCH v3 5/5] mm: avoid unnecessary flush on change_huge_pmd()
Date: Fri, 11 Mar 2022 11:07:49 -0800
Message-Id: <20220311190749.338281-6-namit@vmware.com>
In-Reply-To: <20220311190749.338281-1-namit@vmware.com>
References: <20220311190749.338281-1-namit@vmware.com>

From: Nadav Amit

Calls to change_protection_range() on THP can trigger, at least on x86,
two TLB flushes for a single page: one immediately, when pmdp_invalidate()
is called by change_huge_pmd(), and a second one later (which can be
batched) when change_protection_range() finishes.

The first TLB flush is only needed to prevent the dirty bit (and, with
lesser importance, the access bit) from changing while the PTE is
modified. However, this is not really necessary, since x86 CPUs set the
dirty bit atomically with an additional check that the PTE is (still)
present. One caveat is Intel's Knights Landing, which has a bug and does
not do so.

Leverage this behavior to eliminate the unnecessary TLB flush in
change_huge_pmd(). Introduce a new arch-specific pmdp_invalidate_ad(),
which only protects the entry against further changes of the access and
dirty bits.
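
To see why dropping the per-PMD flush is safe on CPUs without the bug,
here is a rough sketch of the interaction this relies on (illustrative
only, derived from the description above):

/*
 * Sketch of the race pmdp_invalidate_ad() relies on (non-KNL x86).
 * CPU B still holds a stale, present+writable TLB entry:
 *
 *   CPU A (change_huge_pmd)              CPU B (writer)
 *   -----------------------              ------------------------------
 *   old = pmdp_establish(.., invalid)
 *                                        write needs to set the dirty
 *                                        bit; the CPU atomically
 *                                        rechecks the PMD, sees it is
 *                                        not present, and does not
 *                                        update the stale entry
 *   entry = pmd_modify(old, newprot)
 *   set_pmd_at(mm, addr, pmd, entry)     access retries against new PMD
 *
 * No A/D update is lost, so no flush is needed here.  Knights Landing
 * may set A/D bits without rechecking the present bit, which is why
 * pmdp_invalidate_ad() still flushes when X86_BUG_PTE_LEAK is set.
 */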
Cc: Andrea Arcangeli
Cc: Andrew Cooper
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Dave Hansen
Cc: Peter Xu
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Will Deacon
Cc: Yu Zhao
Cc: Nick Piggin
Cc: x86@kernel.org
Signed-off-by: Nadav Amit
---
 arch/x86/include/asm/pgtable.h |  5 +++++
 arch/x86/mm/pgtable.c          | 10 ++++++++++
 include/linux/pgtable.h        | 20 ++++++++++++++++++++
 mm/huge_memory.c               |  4 ++--
 mm/pgtable-generic.c           |  8 ++++++++
 5 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 62ab07e24aef..23ad34edcc4b 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1173,6 +1173,11 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 	}
 }
 #endif
+
+#define __HAVE_ARCH_PMDP_INVALIDATE_AD
+extern pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma,
+				unsigned long address, pmd_t *pmdp);
+
 /*
  * Page table pages are page-aligned. The lower half of the top
  * level is used for userspace and the top half for the kernel.
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 3481b35cb4ec..b2fcb2c749ce 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -608,6 +608,16 @@ int pmdp_clear_flush_young(struct vm_area_struct *vma,
 
 	return young;
 }
+
+pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
+			 pmd_t *pmdp)
+{
+	pmd_t old = pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp));
+
+	if (cpu_feature_enabled(X86_BUG_PTE_LEAK))
+		flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
+	return old;
+}
 #endif
 
 /**
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index f4f4077b97aa..5826e8e52619 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -570,6 +570,26 @@ extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 			     pmd_t *pmdp);
 #endif
 
+#ifndef __HAVE_ARCH_PMDP_INVALIDATE_AD
+
+/*
+ * pmdp_invalidate_ad() invalidates the PMD while changing a transparent
+ * hugepage mapping in the page tables. This function is similar to
+ * pmdp_invalidate(), but should only be used if the access and dirty bits
+ * would not be cleared by the software in the new PMD value. The function
+ * ensures that hardware changes to the access and dirty bits are not lost.
+ *
+ * Doing so allows certain architectures to avoid a TLB flush in most cases.
+ * Another TLB flush might still be necessary later if the PMD update itself
+ * requires one (e.g., if protection was made stricter); even then, the
+ * caller may be able to batch those flushes, so fewer flush operations are
+ * needed overall.
+ */
+extern pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma,
+				unsigned long address, pmd_t *pmdp);
+#endif
+
 #ifndef __HAVE_ARCH_PTE_SAME
 static inline int pte_same(pte_t pte_a, pte_t pte_b)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 51b0f3cb1ba0..691d80edcfd7 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1781,10 +1781,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	 * The race makes MADV_DONTNEED miss the huge pmd and don't clear it
 	 * which may break userspace.
 	 *
-	 * pmdp_invalidate() is required to make sure we don't miss
+	 * pmdp_invalidate_ad() is required to make sure we don't miss
 	 * dirty/young flags set by hardware.
 	 */
-	oldpmd = pmdp_invalidate(vma, addr, pmd);
+	oldpmd = pmdp_invalidate_ad(vma, addr, pmd);
 
 	entry = pmd_modify(oldpmd, newprot);
 	if (preserve_write)
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 6523fda274e5..90ab721a12a8 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -201,6 +201,14 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 }
 #endif
 
+#ifndef __HAVE_ARCH_PMDP_INVALIDATE_AD
+pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
+			 pmd_t *pmdp)
+{
+	return pmdp_invalidate(vma, address, pmdp);
+}
+#endif
+
 #ifndef pmdp_collapse_flush
 pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
 			  pmd_t *pmdp)