From patchwork Wed Mar 9 04:10:43 2022
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12774736
From: Nadav Amit
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Andrew Morton, Nadav Amit,
 Andrea Arcangeli, Andrew Cooper, Andy Lutomirski, Dave Hansen,
 Peter Xu, Peter Zijlstra, Thomas Gleixner, Will Deacon, Yu Zhao,
 Nick Piggin, x86@kernel.org
Subject: [PATCH v3 5/5] mm: avoid unnecessary flush on change_huge_pmd()
Date: Tue, 8 Mar 2022 20:10:43 -0800
Message-Id: <20220309041043.302261-6-namit@vmware.com>
In-Reply-To: <20220309041043.302261-1-namit@vmware.com>
References: <20220309041043.302261-1-namit@vmware.com>

From: Nadav Amit

Calls to change_protection_range() on THP can trigger, at least on x86,
two TLB flushes for one page: one immediately, when pmdp_invalidate() is
called by change_huge_pmd(), and another one later (which can be batched)
when change_protection_range() finishes.

The first TLB flush is only necessary to prevent the dirty bit (and, less
importantly, the access bit) from changing while the PTE is modified.
However, this is not necessary on x86, as the CPU sets the dirty bit
atomically, with an additional check that the PTE is (still) present. One
caveat is Intel's Knights Landing, which has a bug and does not do so.

Leverage this behavior to eliminate the unnecessary TLB flush in
change_huge_pmd(). Introduce a new arch-specific pmdp_invalidate_ad() that
only guarantees the access and dirty bits cannot change any further.
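
For illustration only (not part of the patch): a minimal userspace sketch of
the ordering argument above, assuming a made-up entry layout. The MODEL_* bit
positions, model_pmd_t, model_hw_set_dirty() and model_invalidate_ad() are
hypothetical names invented for this model; they are not kernel APIs and do
not reflect the real x86 PMD format. The sketch only shows that if the
"hardware" sets the dirty bit atomically and only while the entry is present,
then an atomic exchange to a non-present entry returns every dirty bit that
was ever set, and none can be set afterwards.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for this model only; not the real x86 PMD format. */
#define MODEL_PRESENT  (UINT64_C(1) << 0)
#define MODEL_ACCESSED (UINT64_C(1) << 5)
#define MODEL_DIRTY    (UINT64_C(1) << 6)

typedef _Atomic uint64_t model_pmd_t;

/*
 * Models the hardware side: set accessed/dirty with an atomic
 * read-modify-write that succeeds only while the entry is still present.
 * Returns false when the entry is not present (a real CPU would fault).
 */
static bool model_hw_set_dirty(model_pmd_t *pmdp)
{
	uint64_t old = atomic_load(pmdp);

	do {
		if (!(old & MODEL_PRESENT))
			return false;
	} while (!atomic_compare_exchange_weak(pmdp, &old,
					       old | MODEL_ACCESSED | MODEL_DIRTY));
	return true;
}

/*
 * Models the exchange step: compute a non-present value from a snapshot,
 * atomically swap it in, and return whatever was there at swap time.
 * Dirty bits set between the snapshot and the swap show up in the
 * returned value, so the caller does not lose them.
 */
static uint64_t model_invalidate_ad(model_pmd_t *pmdp)
{
	uint64_t snapshot = atomic_load(pmdp);

	return atomic_exchange(pmdp, snapshot & ~MODEL_PRESENT);
}
```

In this model a caller would start from an entry with MODEL_PRESENT set, let
model_hw_set_dirty() run, and then call model_invalidate_ad(): the returned
value carries MODEL_DIRTY, and any later model_hw_set_dirty() call is refused.
The model says nothing about stale TLB entries for protection bits, which is
why the later (batched) flush is still needed.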

Cc: Andrea Arcangeli
Cc: Andrew Cooper
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Dave Hansen
Cc: Peter Xu
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Will Deacon
Cc: Yu Zhao
Cc: Nick Piggin
Cc: x86@kernel.org
Signed-off-by: Nadav Amit
---
 arch/x86/include/asm/pgtable.h |  5 +++++
 arch/x86/mm/pgtable.c          | 10 ++++++++++
 include/linux/pgtable.h        | 20 ++++++++++++++++++++
 mm/huge_memory.c               |  4 ++--
 mm/pgtable-generic.c           |  8 ++++++++
 5 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 62ab07e24aef..23ad34edcc4b 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1173,6 +1173,11 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 	}
 }
 #endif
+
+#define __HAVE_ARCH_PMDP_INVALIDATE_AD
+extern pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma,
+				unsigned long address, pmd_t *pmdp);
+
 /*
  * Page table pages are page-aligned. The lower half of the top
  * level is used for userspace and the top half for the kernel.
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 3481b35cb4ec..b2fcb2c749ce 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -608,6 +608,16 @@ int pmdp_clear_flush_young(struct vm_area_struct *vma,
 
 	return young;
 }
+
+pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
+			 pmd_t *pmdp)
+{
+	pmd_t old = pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp));
+
+	if (cpu_feature_enabled(X86_BUG_PTE_LEAK))
+		flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
+	return old;
+}
 #endif
 
 /**
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index f4f4077b97aa..5826e8e52619 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -570,6 +570,26 @@ extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 			    pmd_t *pmdp);
 #endif
 
+#ifndef __HAVE_ARCH_PMDP_INVALIDATE_AD
+
+/*
+ * pmdp_invalidate_ad() invalidates the PMD while changing a transparent
+ * hugepage mapping in the page tables. This function is similar to
+ * pmdp_invalidate(), but should only be used if the access and dirty bits would
+ * not be cleared by the software in the new PMD value. The function ensures
+ * that hardware changes of the access and dirty bits updates would not be lost.
+ *
+ * Doing so can allow in certain architectures to avoid a TLB flush in most
+ * cases. Yet, another TLB flush might be necessary later if the PMD update
+ * itself requires such flush (e.g., if protection was set to be stricter). Yet,
+ * even when a TLB flush is needed because of the update, the caller may be able
+ * to batch these TLB flushing operations, so fewer TLB flush operations are
+ * needed.
+ */
+extern pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma,
+				unsigned long address, pmd_t *pmdp);
+#endif
+
 #ifndef __HAVE_ARCH_PTE_SAME
 static inline int pte_same(pte_t pte_a, pte_t pte_b)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 51b0f3cb1ba0..691d80edcfd7 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1781,10 +1781,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	 * The race makes MADV_DONTNEED miss the huge pmd and don't clear it
 	 * which may break userspace.
 	 *
-	 * pmdp_invalidate() is required to make sure we don't miss
+	 * pmdp_invalidate_ad() is required to make sure we don't miss
 	 * dirty/young flags set by hardware.
 	 */
-	oldpmd = pmdp_invalidate(vma, addr, pmd);
+	oldpmd = pmdp_invalidate_ad(vma, addr, pmd);
 
 	entry = pmd_modify(oldpmd, newprot);
 	if (preserve_write)
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 6523fda274e5..90ab721a12a8 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -201,6 +201,14 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 }
 #endif
 
+#ifndef __HAVE_ARCH_PMDP_INVALIDATE_AD
+pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
+			 pmd_t *pmdp)
+{
+	return pmdp_invalidate(vma, address, pmdp);
+}
+#endif
+
 #ifndef pmdp_collapse_flush
 pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
 			  pmd_t *pmdp)
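
For illustration only (not part of the patch): the generic fallback in
mm/pgtable-generic.c relies on a guard macro that an architecture defines
when it supplies its own implementation. The userspace sketch below shows
that pattern in isolation. All names in it (DEMO_HAVE_ARCH_INVALIDATE_AD,
demo_invalidate(), demo_invalidate_ad(), demo_flush_count) are hypothetical
stand-ins, and the int "entry" is a placeholder for a PMD value.

```c
#include <assert.h>

/* Counts how often the flushing path runs in this model. */
static int demo_flush_count;

/* Stand-in for the generic helper that always flushes. */
static int demo_invalidate(int entry)
{
	demo_flush_count++;	/* the generic path always pays for a flush */
	return entry;
}

/*
 * An architecture that can skip the flush would define the guard macro
 * and provide its own demo_invalidate_ad() before this point. Nothing
 * defines the macro in this file, so the fallback below is compiled in
 * and simply forwards to the flushing helper.
 */
#ifndef DEMO_HAVE_ARCH_INVALIDATE_AD
static int demo_invalidate_ad(int entry)
{
	return demo_invalidate(entry);
}
#endif
```

With the macro undefined, every demo_invalidate_ad() call increments
demo_flush_count, mirroring how architectures other than x86 keep their
existing behavior under this patch.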