From patchwork Thu Oct 21 12:21:08 2021
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12576361
From: Nadav Amit
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Nadav Amit, Andi Kleen, Andrea Arcangeli,
 Andrew Cooper, Andrew Morton, Andy Lutomirski, Dave Hansen, Peter Xu,
 Peter Zijlstra, Thomas Gleixner, Will Deacon, Yu Zhao, Nick Piggin,
 x86@kernel.org
Subject: [PATCH v2 1/5] x86: Detection of Knights Landing A/D leak
Date: Thu, 21 Oct 2021 05:21:08 -0700
Message-Id: <20211021122112.592634-2-namit@vmware.com>
In-Reply-To: <20211021122112.592634-1-namit@vmware.com>
References: <20211021122112.592634-1-namit@vmware.com>

From: Nadav Amit

Knights Landing has an issue: a thread setting the A or D bit may not do
so atomically with respect to checking the present bit. A thread that is
about to take a page fault may still set those bits even though the
present bit was already atomically cleared. This means that after the
kernel clears the present bit atomically, the supposedly zero entry can
later be corrupted by stray A or D bits.

Since the PTE may already be reused to store a swap index or a NUMA
migration index, this cannot be tolerated. Most of the time the kernel
detects the problem, but in some rare cases it may not.

Add an interface to detect the bug; it will be used in the following
patch.
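As a point of reference, this is roughly how a later user of the flag
might consult it (a hedged sketch, not part of this patch; the helper
name is hypothetical, while X86_BUG_PTE_LEAK and static_cpu_has_bug()
are the real identifiers involved):

	/*
	 * Hypothetical sketch: gate an optimization on the Knights Landing
	 * A/D-leak erratum detected above.
	 */
	static inline bool cpu_clears_ad_atomically(void)
	{
		/* Affected parts may set stray A/D bits after the clear. */
		return !static_cpu_has_bug(X86_BUG_PTE_LEAK);
	}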
[ Based on a patch by Andi Kleen ]
Cc: Andi Kleen
Cc: Andrea Arcangeli
Cc: Andrew Cooper
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Dave Hansen
Cc: Peter Xu
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Will Deacon
Cc: Yu Zhao
Cc: Nick Piggin
Cc: x86@kernel.org
Link: https://lore.kernel.org/lkml/1465919919-2093-1-git-send-email-lukasz.anaczkowski@intel.com/
Signed-off-by: Nadav Amit
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/cpu/intel.c        | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index d0ce5cfd3ac1..32d0aabd788d 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -436,5 +436,6 @@
 #define X86_BUG_TAA			X86_BUG(22) /* CPU is affected by TSX Async Abort(TAA) */
 #define X86_BUG_ITLB_MULTIHIT		X86_BUG(23) /* CPU may incur MCE during certain page attribute changes */
 #define X86_BUG_SRBDS			X86_BUG(24) /* CPU may leak RNG bits if not mitigated */
+#define X86_BUG_PTE_LEAK		X86_BUG(25) /* PTE may leak A/D bits after clear */

 #endif /* _ASM_X86_CPUFEATURES_H */
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 8321c43554a1..40bcba6e3641 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -296,6 +296,11 @@ static void early_init_intel(struct cpuinfo_x86 *c)
 		}
 	}

+	if (c->x86_model == 87) {
+		pr_info_once("Enabling PTE leaking workaround\n");
+		set_cpu_bug(c, X86_BUG_PTE_LEAK);
+	}
+
 	/*
 	 * Intel Quark Core DevMan_001.pdf section 6.4.11
 	 * "The operating system also is required to invalidate (i.e., flush)

From patchwork Thu Oct 21 12:21:09 2021
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12576363
From: Nadav Amit
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Nadav Amit, Andrea Arcangeli, Andrew Cooper,
 Andrew Morton, Andy Lutomirski, Dave Hansen, Peter Xu, Peter Zijlstra,
 Thomas Gleixner, Will Deacon, Yu Zhao, Nick Piggin, x86@kernel.org
Subject: [PATCH v2 2/5] mm: avoid unnecessary flush on change_huge_pmd()
Date: Thu, 21 Oct 2021 05:21:09 -0700
Message-Id: <20211021122112.592634-3-namit@vmware.com>
In-Reply-To: <20211021122112.592634-1-namit@vmware.com>
References: <20211021122112.592634-1-namit@vmware.com>

From: Nadav Amit

Calls to change_protection_range() on THP can trigger, at least on x86,
two TLB flushes for one page: one immediately, when pmdp_invalidate() is
called by change_huge_pmd(), and then another one later (that can be
batched) when change_protection_range() finishes.

The first TLB flush is only needed to prevent the dirty bit (and, with
lesser importance, the access bit) from changing while the PTE is
modified. However, x86 CPUs set the dirty bit atomically, with an
additional check that the PTE is (still) present, so the flush is not
needed there. One caveat is Intel's Knights Landing, which has a bug and
does not do so.

Leverage this behavior to eliminate the unnecessary TLB flush in
change_huge_pmd(). Introduce a new arch-specific pmdp_invalidate_ad(),
which only needs to prevent further hardware updates of the access and
dirty bits.
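The intended calling pattern, as a simplified sketch of the
change_huge_pmd() hunk below (uffd-wp, savedwrite and error handling
omitted):

	/* Freeze further hardware A/D updates, then rebuild the PMD. */
	entry = pmdp_invalidate_ad(vma, addr, pmd);	/* returns old PMD */
	entry = pmd_modify(entry, newprot);		/* apply new protection */
	set_pmd_at(mm, addr, pmd, entry);		/* no immediate TLB flush */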
Cc: Andrea Arcangeli
Cc: Andrew Cooper
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Dave Hansen
Cc: Peter Xu
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Will Deacon
Cc: Yu Zhao
Cc: Nick Piggin
Cc: x86@kernel.org
Signed-off-by: Nadav Amit
---
 arch/x86/include/asm/pgtable.h | 8 ++++++++
 include/linux/pgtable.h        | 5 +++++
 mm/huge_memory.c               | 7 ++++---
 mm/pgtable-generic.c           | 8 ++++++++
 4 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 448cd01eb3ec..18c3366f8f4d 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1146,6 +1146,14 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 	}
 }
 #endif
+
+#define __HAVE_ARCH_PMDP_INVALIDATE_AD
+static inline pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma,
+		unsigned long address, pmd_t *pmdp)
+{
+	return pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp));
+}
+
 /*
  * Page table pages are page-aligned.  The lower half of the top
  * level is used for userspace and the top half for the kernel.
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index e24d2c992b11..622efe0a9ef0 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -561,6 +561,11 @@
 extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 			    pmd_t *pmdp);
 #endif

+#ifndef __HAVE_ARCH_PMDP_INVALIDATE_AD
+extern pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma,
+			    unsigned long address, pmd_t *pmdp);
+#endif
+
 #ifndef __HAVE_ARCH_PTE_SAME
 static inline int pte_same(pte_t pte_a, pte_t pte_b)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e5ea5f775d5c..435da011b1a2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1795,10 +1795,11 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	 * The race makes MADV_DONTNEED miss the huge pmd and don't clear it
 	 * which may break userspace.
 	 *
-	 * pmdp_invalidate() is required to make sure we don't miss
-	 * dirty/young flags set by hardware.
+	 * pmdp_invalidate_ad() is required to make sure that the
+	 * dirty/young flags (which are also known as access/dirty) cannot be
+	 * further modified by the hardware.
 	 */
-	entry = pmdp_invalidate(vma, addr, pmd);
+	entry = pmdp_invalidate_ad(vma, addr, pmd);

 	entry = pmd_modify(entry, newprot);
 	if (preserve_write)
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 4e640baf9794..b0ce6c7391bf 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -200,6 +200,14 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 }
 #endif

+#ifndef __HAVE_ARCH_PMDP_INVALIDATE_AD
+pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
+			 pmd_t *pmdp)
+{
+	return pmdp_invalidate(vma, address, pmdp);
+}
+#endif
+
 #ifndef pmdp_collapse_flush
 pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
 			  pmd_t *pmdp)

From patchwork Thu Oct 21 12:21:10 2021
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12576365
From: Nadav Amit
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Nadav Amit, Andrea Arcangeli, Andrew Cooper,
 Andrew Morton, Andy Lutomirski, Dave Hansen, Peter Xu, Peter Zijlstra,
 Thomas Gleixner, Will Deacon, Yu Zhao, Nick Piggin, x86@kernel.org
Subject: [PATCH v2 3/5] x86/mm: check exec permissions on fault
Date: Thu, 21 Oct 2021 05:21:10 -0700
Message-Id: <20211021122112.592634-4-namit@vmware.com>
In-Reply-To: <20211021122112.592634-1-namit@vmware.com>
References: <20211021122112.592634-1-namit@vmware.com>

From: Nadav Amit

access_error() currently does not check for execution permission
violations. As a result, spurious page faults caused by an execution
permission violation result in SIGSEGV. This has not appeared to be a
problem so far, but the next patches avoid TLB flushes on permission
promotion, which can lead to this scenario; nodejs, for instance, crashes
when the TLB flush is avoided on permission promotion.

Add a check to prevent access_error() from mistakenly reporting that page
faults caused by an instruction fetch are not allowed. The Intel SDM does
not indicate whether the "instruction fetch" and "write" bits in the
hardware error code are mutually exclusive, so check both before deciding
whether the access is allowed.
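To illustrate the rule the fault handler needs (a hedged sketch, not the
patch itself; the helper name is hypothetical, while X86_PF_INSTR and
VM_EXEC are the real flags used in the hunk below):

	/* Hypothetical illustration of the new check in access_error(). */
	static bool exec_fault_allowed(unsigned long error_code,
				       struct vm_area_struct *vma)
	{
		if (!(error_code & X86_PF_INSTR))
			return true;		/* not an instruction fetch */

		/* allowed only if the VMA is executable */
		return !!(vma->vm_flags & VM_EXEC);
	}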
Cc: Andrea Arcangeli
Cc: Andrew Cooper
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Dave Hansen
Cc: Peter Xu
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Will Deacon
Cc: Yu Zhao
Cc: Nick Piggin
Cc: x86@kernel.org
Signed-off-by: Nadav Amit
---
 arch/x86/mm/fault.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index b2eefdefc108..e776130473ce 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1100,10 +1100,17 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
 		       (error_code & X86_PF_INSTR), foreign))
 		return 1;

-	if (error_code & X86_PF_WRITE) {
+	if (error_code & (X86_PF_WRITE | X86_PF_INSTR)) {
 		/* write, present and write, not present: */
-		if (unlikely(!(vma->vm_flags & VM_WRITE)))
+		if ((error_code & X86_PF_WRITE) &&
+		    unlikely(!(vma->vm_flags & VM_WRITE)))
 			return 1;
+
+		/* exec, present and exec, not present: */
+		if ((error_code & X86_PF_INSTR) &&
+		    unlikely(!(vma->vm_flags & VM_EXEC)))
+			return 1;
+
 		return 0;
 	}

From patchwork Thu Oct 21 12:21:11 2021
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12576367
From: Nadav Amit
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Nadav Amit, Andrea Arcangeli, Andrew Cooper,
 Andrew Morton, Andy Lutomirski, Dave Hansen, Peter Xu, Peter Zijlstra,
 Thomas Gleixner, Will Deacon, Yu Zhao, Nick Piggin, x86@kernel.org
Subject: [PATCH v2 4/5] mm/mprotect: use mmu_gather
Date: Thu, 21 Oct 2021 05:21:11 -0700
Message-Id: <20211021122112.592634-5-namit@vmware.com>
In-Reply-To: <20211021122112.592634-1-namit@vmware.com>
References: <20211021122112.592634-1-namit@vmware.com>

From: Nadav Amit

change_pXX_range() currently does not use mmu_gather, but instead
implements its own deferred TLB flush scheme. This both complicates the
code, as developers need to be aware of different invalidation schemes,
and prevents opportunities to avoid TLB flushes or to perform them at a
finer granularity.

Using mmu_gather for modified PTEs has benefits in various scenarios
even if pages are not released. For instance, if only a single page out
of a large range needs to be flushed, only that page is flushed. If a
THP page is flushed, on x86 a single TLB invlpg instruction can be used
instead of 512 instructions (or a full TLB flush, which Linux would
actually use by default). mprotect() over multiple VMAs then requires
only a single flush.

Use mmu_gather in change_pXX_range(). Since the pages are not released,
only record the flushed range using tlb_flush_pXX_range(). Handle THP
similarly, and get rid of flush_cache_range(), which becomes redundant
because tlb_start_vma() calls it when needed.
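The resulting calling convention, as a simplified sketch of what the
mprotect path does after this patch (error handling omitted; the
identifiers are the ones used in the diff below):

	struct mmu_gather tlb;

	tlb_gather_mmu(&tlb, mm);		/* start batching */
	change_protection(&tlb, vma, start, end, newprot, cp_flags);
	/* ranges were recorded via tlb_flush_p??_range() during the walk */
	tlb_finish_mmu(&tlb);			/* single deferred flush */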
Cc: Andrea Arcangeli
Cc: Andrew Cooper
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Dave Hansen
Cc: Peter Xu
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Will Deacon
Cc: Yu Zhao
Cc: Nick Piggin
Cc: x86@kernel.org
Signed-off-by: Nadav Amit
---
 fs/exec.c               |  6 ++-
 include/linux/huge_mm.h |  5 ++-
 include/linux/mm.h      |  5 ++-
 mm/huge_memory.c        | 10 ++++-
 mm/mprotect.c           | 93 ++++++++++++++++++++++------------------
 mm/userfaultfd.c        |  6 ++-
 6 files changed, 75 insertions(+), 50 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 5a7a07dfdc81..7f8609bbc6b3 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -752,6 +752,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
 	unsigned long stack_size;
 	unsigned long stack_expand;
 	unsigned long rlim_stack;
+	struct mmu_gather tlb;

 #ifdef CONFIG_STACK_GROWSUP
 	/* Limit stack size */
@@ -806,8 +807,11 @@ int setup_arg_pages(struct linux_binprm *bprm,
 	vm_flags |= mm->def_flags;
 	vm_flags |= VM_STACK_INCOMPLETE_SETUP;

-	ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end,
+	tlb_gather_mmu(&tlb, mm);
+	ret = mprotect_fixup(&tlb, vma, &prev, vma->vm_start, vma->vm_end,
 			vm_flags);
+	tlb_finish_mmu(&tlb);
+
 	if (ret)
 		goto out_unlock;
 	BUG_ON(prev != vma);
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index f280f33ff223..a9b6e03e9c4c 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -36,8 +36,9 @@ int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		 pud_t *pud, unsigned long addr);
 bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 		   unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd);
-int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr,
-		    pgprot_t newprot, unsigned long cp_flags);
+int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
+		    pmd_t *pmd, unsigned long addr, pgprot_t newprot,
+		    unsigned long cp_flags);
 vm_fault_t vmf_insert_pfn_pmd_prot(struct vm_fault *vmf, pfn_t pfn,
 				   pgprot_t pgprot, bool write);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 00bb2d938df4..f46bab158560 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2001,10 +2001,11 @@ extern unsigned long move_page_tables(struct vm_area_struct *vma,
 #define  MM_CP_UFFD_WP_ALL                 (MM_CP_UFFD_WP | \
 					    MM_CP_UFFD_WP_RESOLVE)

-extern unsigned long change_protection(struct vm_area_struct *vma, unsigned long start,
+extern unsigned long change_protection(struct mmu_gather *tlb,
+			      struct vm_area_struct *vma, unsigned long start,
 			      unsigned long end, pgprot_t newprot,
 			      unsigned long cp_flags);
-extern int mprotect_fixup(struct vm_area_struct *vma,
+extern int mprotect_fixup(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			  struct vm_area_struct **pprev, unsigned long start,
 			  unsigned long end, unsigned long newflags);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 435da011b1a2..f5d0357a25ce 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1720,8 +1720,9 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
  *		or if prot_numa but THP migration is not supported
  *	- HPAGE_PMD_NR if protections changed and TLB flush necessary
  */
-int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
-		unsigned long addr, pgprot_t newprot, unsigned long cp_flags)
+int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
+		pmd_t *pmd, unsigned long addr, pgprot_t newprot,
+		unsigned long cp_flags)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	spinlock_t *ptl;
@@ -1732,6 +1733,8 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
 	bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;

+	tlb_change_page_size(tlb, HPAGE_PMD_SIZE);
+
 	if (prot_numa && !thp_migration_supported())
 		return 1;

@@ -1817,6 +1820,9 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	}
 	ret = HPAGE_PMD_NR;
 	set_pmd_at(mm, addr, pmd, entry);
+
+	tlb_flush_pmd_range(tlb, addr, HPAGE_PMD_SIZE);
+
 	BUG_ON(vma_is_anonymous(vma) && !preserve_write && pmd_write(entry));
 unlock:
 	spin_unlock(ptl);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 883e2cc85cad..0f5c87af5c60 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -32,12 +32,13 @@
 #include
 #include
 #include
+#include

 #include "internal.h"

-static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
-		unsigned long addr, unsigned long end, pgprot_t newprot,
-		unsigned long cp_flags)
+static unsigned long change_pte_range(struct mmu_gather *tlb,
+		struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr,
+		unsigned long end, pgprot_t newprot, unsigned long cp_flags)
 {
 	pte_t *pte, oldpte;
 	spinlock_t *ptl;
@@ -48,6 +49,8 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 	bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
 	bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;

+	tlb_change_page_size(tlb, PAGE_SIZE);
+
 	/*
 	 * Can be called with only the mmap_lock for reading by
 	 * prot_numa so we must check the pmd isn't constantly
@@ -138,6 +141,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 				ptent = pte_mkwrite(ptent);
 			}
 			ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
+			tlb_flush_pte_range(tlb, addr, PAGE_SIZE);
 			pages++;
 		} else if (is_swap_pte(oldpte)) {
 			swp_entry_t entry = pte_to_swp_entry(oldpte);
@@ -219,9 +223,9 @@ static inline int pmd_none_or_clear_bad_unless_trans_huge(pmd_t *pmd)
 	return 0;
 }

-static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
-		pud_t *pud, unsigned long addr, unsigned long end,
-		pgprot_t newprot, unsigned long cp_flags)
+static inline unsigned long change_pmd_range(struct mmu_gather *tlb,
+		struct vm_area_struct *vma, pud_t *pud, unsigned long addr,
+		unsigned long end, pgprot_t newprot, unsigned long cp_flags)
 {
 	pmd_t *pmd;
 	unsigned long next;
@@ -261,8 +265,12 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 		if (next - addr != HPAGE_PMD_SIZE) {
 			__split_huge_pmd(vma, pmd, addr, false, NULL);
 		} else {
-			int nr_ptes = change_huge_pmd(vma, pmd, addr,
-						      newprot, cp_flags);
+			/*
+			 * change_huge_pmd() does not defer TLB flushes,
+			 * so no need to propagate the tlb argument.
+			 */
+			int nr_ptes = change_huge_pmd(tlb, vma, pmd,
+					addr, newprot, cp_flags);

 			if (nr_ptes) {
 				if (nr_ptes == HPAGE_PMD_NR) {
@@ -276,8 +284,8 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 			}
 			/* fall through, the trans huge pmd just split */
 		}
-		this_pages = change_pte_range(vma, pmd, addr, next, newprot,
-					      cp_flags);
+		this_pages = change_pte_range(tlb, vma, pmd, addr, next,
+					      newprot, cp_flags);
 		pages += this_pages;
 next:
 		cond_resched();
@@ -291,9 +299,9 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 	return pages;
 }

-static inline unsigned long change_pud_range(struct vm_area_struct *vma,
-		p4d_t *p4d, unsigned long addr, unsigned long end,
-		pgprot_t newprot, unsigned long cp_flags)
+static inline unsigned long change_pud_range(struct mmu_gather *tlb,
+		struct vm_area_struct *vma, p4d_t *p4d, unsigned long addr,
+		unsigned long end, pgprot_t newprot, unsigned long cp_flags)
 {
 	pud_t *pud;
 	unsigned long next;
@@ -304,16 +312,16 @@ static inline unsigned long change_pud_range(struct vm_area_struct *vma,
 		next = pud_addr_end(addr, end);
 		if (pud_none_or_clear_bad(pud))
 			continue;
-		pages += change_pmd_range(vma, pud, addr, next, newprot,
+		pages += change_pmd_range(tlb, vma, pud, addr, next, newprot,
 					  cp_flags);
 	} while (pud++, addr = next, addr != end);

 	return pages;
 }

-static inline unsigned long change_p4d_range(struct vm_area_struct *vma,
-		pgd_t *pgd, unsigned long addr, unsigned long end,
-		pgprot_t newprot, unsigned long cp_flags)
+static inline unsigned long change_p4d_range(struct mmu_gather *tlb,
+		struct vm_area_struct *vma, pgd_t *pgd, unsigned long addr,
+		unsigned long end, pgprot_t newprot, unsigned long cp_flags)
 {
 	p4d_t *p4d;
 	unsigned long next;
@@ -324,44 +332,40 @@ static inline unsigned long change_p4d_range(struct vm_area_struct *vma,
 		next = p4d_addr_end(addr, end);
 		if (p4d_none_or_clear_bad(p4d))
 			continue;
-		pages += change_pud_range(vma, p4d, addr, next, newprot,
+		pages += change_pud_range(tlb, vma, p4d, addr, next, newprot,
 					  cp_flags);
 	} while (p4d++, addr = next, addr != end);

 	return pages;
 }

-static unsigned long change_protection_range(struct vm_area_struct *vma,
-		unsigned long addr, unsigned long end, pgprot_t newprot,
-		unsigned long cp_flags)
+static unsigned long change_protection_range(struct mmu_gather *tlb,
+		struct vm_area_struct *vma, unsigned long addr,
+		unsigned long end, pgprot_t newprot, unsigned long cp_flags)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	pgd_t *pgd;
 	unsigned long next;
-	unsigned long start = addr;
 	unsigned long pages = 0;

 	BUG_ON(addr >= end);
 	pgd = pgd_offset(mm, addr);
-	flush_cache_range(vma, addr, end);
-	inc_tlb_flush_pending(mm);
+	tlb_start_vma(tlb, vma);
 	do {
 		next = pgd_addr_end(addr, end);
 		if (pgd_none_or_clear_bad(pgd))
 			continue;
-		pages += change_p4d_range(vma, pgd, addr, next, newprot,
+		pages += change_p4d_range(tlb, vma, pgd, addr, next, newprot,
 					  cp_flags);
 	} while (pgd++, addr = next, addr != end);

-	/* Only flush the TLB if we actually modified any entries: */
-	if (pages)
-		flush_tlb_range(vma, start, end);
-	dec_tlb_flush_pending(mm);
+	tlb_end_vma(tlb, vma);

 	return pages;
 }

-unsigned long change_protection(struct vm_area_struct *vma, unsigned long start,
+unsigned long change_protection(struct mmu_gather *tlb,
+		struct vm_area_struct *vma, unsigned long start,
 		unsigned long end, pgprot_t newprot,
 		unsigned long cp_flags)
 {
@@ -372,7 +376,7 @@ unsigned long change_protection(struct vm_area_struct *vma, unsigned long start,
 	if (is_vm_hugetlb_page(vma))
 		pages = hugetlb_change_protection(vma, start, end, newprot);
 	else
-		pages = change_protection_range(vma, start, end, newprot,
+		pages = change_protection_range(tlb, vma, start, end, newprot,
 						cp_flags);

 	return pages;
@@ -406,8 +410,9 @@ static const struct mm_walk_ops prot_none_walk_ops = {
 };

 int
-mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
-	       unsigned long start, unsigned long end, unsigned long newflags)
+mprotect_fixup(struct mmu_gather *tlb, struct vm_area_struct *vma,
+	       struct vm_area_struct **pprev, unsigned long start,
+	       unsigned long end, unsigned long newflags)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long oldflags = vma->vm_flags;
@@ -494,7 +499,7 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 	dirty_accountable = vma_wants_writenotify(vma, vma->vm_page_prot);
 	vma_set_page_prot(vma);

-	change_protection(vma, start, end, vma->vm_page_prot,
+	change_protection(tlb, vma, start, end, vma->vm_page_prot,
 			  dirty_accountable ? MM_CP_DIRTY_ACCT : 0);

 	/*
@@ -528,6 +533,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 	const int grows = prot & (PROT_GROWSDOWN|PROT_GROWSUP);
 	const bool rier = (current->personality & READ_IMPLIES_EXEC) &&
 				(prot & PROT_READ);
+	struct mmu_gather tlb;

 	start = untagged_addr(start);

@@ -584,6 +590,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 	if (start > vma->vm_start)
 		prev = vma;

+	tlb_gather_mmu(&tlb, current->mm);
 	for (nstart = start ; ; ) {
 		unsigned long mask_off_old_flags;
 		unsigned long newflags;
@@ -610,18 +617,18 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 		/* newflags >> 4 shift VM_MAY% in place of VM_% */
 		if ((newflags & ~(newflags >> 4)) & VM_ACCESS_FLAGS) {
 			error = -EACCES;
-			goto out;
+			goto out_tlb;
 		}

 		/* Allow architectures to sanity-check the new flags */
 		if (!arch_validate_flags(newflags)) {
 			error = -EINVAL;
-			goto out;
+			goto out_tlb;
 		}

 		error = security_file_mprotect(vma, reqprot, prot);
 		if (error)
-			goto out;
+			goto out_tlb;

 		tmp = vma->vm_end;
 		if (tmp > end)
@@ -630,27 +637,29 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 		if (vma->vm_ops && vma->vm_ops->mprotect) {
 			error = vma->vm_ops->mprotect(vma, nstart, tmp, newflags);
 			if (error)
-				goto out;
+				goto out_tlb;
 		}

-		error = mprotect_fixup(vma, &prev, nstart, tmp, newflags);
+		error = mprotect_fixup(&tlb, vma, &prev, nstart, tmp, newflags);
 		if (error)
-			goto out;
+			goto out_tlb;

 		nstart = tmp;

 		if (nstart < prev->vm_end)
 			nstart = prev->vm_end;
 		if (nstart >= end)
-			goto out;
+			goto out_tlb;

 		vma = prev->vm_next;
 		if (!vma || vma->vm_start != nstart) {
 			error = -ENOMEM;
-			goto out;
+			goto out_tlb;
 		}
 		prot = reqprot;
 	}
+out_tlb:
+	tlb_finish_mmu(&tlb);
 out:
 	mmap_write_unlock(current->mm);
 	return error;
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index ac6f036298cd..15a20bb35868 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 #include "internal.h"

 static __always_inline
@@ -674,6 +675,7 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
 			atomic_t *mmap_changing)
 {
 	struct vm_area_struct *dst_vma;
+	struct mmu_gather tlb;
 	pgprot_t newprot;
 	int err;

@@ -715,8 +717,10 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
 	else
 		newprot = vm_get_page_prot(dst_vma->vm_flags);

-	change_protection(dst_vma, start, start + len, newprot,
+	tlb_gather_mmu(&tlb, dst_mm);
+	change_protection(&tlb, dst_vma, start, start + len, newprot,
 			  enable_wp ? MM_CP_UFFD_WP : MM_CP_UFFD_WP_RESOLVE);
+	tlb_finish_mmu(&tlb);

 	err = 0;
 out_unlock:

From patchwork Thu Oct 21 12:21:12 2021
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12576369
From: Nadav Amit
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Nadav Amit, Andrea Arcangeli, Andrew Morton,
 Andy Lutomirski, Dave Hansen, Peter Xu, Peter Zijlstra, Thomas Gleixner,
 Will Deacon, Yu Zhao, Nick Piggin, x86@kernel.org
Subject: [PATCH v2 5/5] mm/mprotect: do not flush on permission promotion
Date: Thu, 21 Oct 2021 05:21:12 -0700
Message-Id: <20211021122112.592634-6-namit@vmware.com>
In-Reply-To: <20211021122112.592634-1-namit@vmware.com>
References: <20211021122112.592634-1-namit@vmware.com>

From: Nadav Amit

Currently, unprotecting a memory region with either mprotect() or
userfaultfd causes a TLB flush. At least on x86, no TLB flush is needed
when protection is promoted.

Add an arch-specific pte_may_need_flush(), which tells whether a TLB
flush is needed based on the old PTE and the new one, and implement it
for x86. Besides the basic logic that decides whether a protection
change or a change of software bits requires a flush, also add logic
that considers the dirty bit: if the dirty bit is clear and
write-protect is set, no TLB flush is needed, since x86 updates the
dirty bit atomically on write and rereads the PTE if the bit is clear.
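To make the effect concrete, a hedged sketch of how the helper is meant
to be combined with the mmu_gather ranges from the previous patch (the
wrapper below is hypothetical; pte_may_need_flush() and
tlb_flush_pte_range() are the real interfaces used in the diff):

	/* Hypothetical wrapper: only record the range if a flush is needed. */
	static inline void flush_if_demoted(struct mmu_gather *tlb,
					    unsigned long addr,
					    pte_t oldpte, pte_t newpte)
	{
		/*
		 * Examples on x86: setting RW (promotion) or toggling a
		 * software bit skips the flush; setting _PAGE_NX or clearing
		 * _PAGE_PRESENT still flushes.
		 */
		if (pte_may_need_flush(oldpte, newpte))
			tlb_flush_pte_range(tlb, addr, PAGE_SIZE);
	}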
Signed-off-by: Nadav Amit
Cc: Andrea Arcangeli
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Dave Hansen
Cc: Peter Xu
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Will Deacon
Cc: Yu Zhao
Cc: Nick Piggin
Cc: x86@kernel.org
---
 arch/x86/include/asm/pgtable_types.h |  2 +
 arch/x86/include/asm/tlbflush.h      | 80 ++++++++++++++++++++++++++++
 include/asm-generic/tlb.h            | 14 +++++
 mm/huge_memory.c                     |  9 ++--
 mm/mprotect.c                        |  3 +-
 5 files changed, 103 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 40497a9020c6..8668bc661026 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -110,9 +110,11 @@
 #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
 #define _PAGE_NX	(_AT(pteval_t, 1) << _PAGE_BIT_NX)
 #define _PAGE_DEVMAP	(_AT(u64, 1) << _PAGE_BIT_DEVMAP)
+#define _PAGE_SOFTW4	(_AT(pteval_t, 1) << _PAGE_BIT_SOFTW4)
 #else
 #define _PAGE_NX	(_AT(pteval_t, 0))
 #define _PAGE_DEVMAP	(_AT(pteval_t, 0))
+#define _PAGE_SOFTW4	(_AT(pteval_t, 0))
 #endif

 #define _PAGE_PROTNONE	(_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index b587a9ee9cb2..a782adde3d62 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -259,6 +259,86 @@ static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch,

 extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);

+/*
+ * The enable_mask tells which bits require a flush if they were set and
+ * get cleared.
+ *
+ * The disable_mask tells which bits require a flush if they were clear and
+ * get set.
+ *
+ * All the other bits except the ignored bits will require a flush no matter
+ * if they get set or cleared.
+ *
+ * Note that we ignore the accessed bit, since anyhow the kernel does not flush
+ * after clearing it in other situations. We also ignore the global bit, as it
+ * is used for protnone.
+ */
+static inline bool pte_flags_may_need_flush(unsigned long oldflags,
+					    unsigned long newflags)
+{
+	const pteval_t ignore_mask = _PAGE_SOFTW1 | _PAGE_SOFTW2 |
+		_PAGE_SOFTW3 | _PAGE_SOFTW4 | _PAGE_ACCESSED | _PAGE_GLOBAL;
+	const pteval_t enable_mask = _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT;
+	const pteval_t disable_mask = _PAGE_NX;
+	unsigned long diff = oldflags ^ newflags;
+
+	return diff & ((oldflags & enable_mask) |
+		       (newflags & disable_mask) |
+		       ~(enable_mask | disable_mask | ignore_mask));
+}
+
+/*
+ * pte_may_need_flush() checks whether permissions were demoted and require a
+ * flush. It should only be used for userspace PTEs.
+ */
+static inline bool pte_may_need_flush(pte_t oldpte, pte_t newpte)
+{
+	/* new is non-present: need only if old is present */
+	if (!pte_present(newpte))
+		return pte_present(oldpte);
+
+	/* old is not present: no need for flush */
+	if (!pte_present(oldpte))
+		return false;
+
+	/*
+	 * Avoid open-coding to account for protnone_mask() and perform
+	 * comparison of the PTEs.
+	 */
+	if (pte_pfn(oldpte) != pte_pfn(newpte))
+		return true;
+
+	return pte_flags_may_need_flush(pte_flags(oldpte),
+					pte_flags(newpte));
+}
+#define pte_may_need_flush pte_may_need_flush
+
+/*
+ * huge_pmd_may_need_flush() checks whether permissions were demoted and
+ * require a flush. It should only be used for userspace huge PMDs.
+ */
+static inline bool huge_pmd_may_need_flush(pmd_t oldpmd, pmd_t newpmd)
+{
+	/* new is non-present: need only if old is present */
+	if (!pmd_present(newpmd))
+		return pmd_present(oldpmd);
+
+	/* old is not present: no need for flush */
+	if (!pmd_present(oldpmd))
+		return false;
+
+	/*
+	 * Avoid open-coding to account for protnone_mask() and perform
+	 * comparison of the PTEs.
+	 */
+	if (pmd_pfn(oldpmd) != pmd_pfn(newpmd))
+		return true;
+
+	return pte_flags_may_need_flush(pmd_flags(oldpmd),
+					pmd_flags(newpmd));
+}
+#define huge_pmd_may_need_flush huge_pmd_may_need_flush
+
 #endif /* !MODULE */

 #endif /* _ASM_X86_TLBFLUSH_H */
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 2c68a545ffa7..2d3736c62602 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -654,6 +654,20 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb,
 	} while (0)
 #endif

+#ifndef pte_may_need_flush
+static inline bool pte_may_need_flush(pte_t oldpte, pte_t newpte)
+{
+	return true;
+}
+#endif
+
+#ifndef huge_pmd_may_need_flush
+static inline bool huge_pmd_may_need_flush(pmd_t oldpmd, pmd_t newpmd)
+{
+	return true;
+}
+#endif
+
 #endif /* CONFIG_MMU */

 #endif /* _ASM_GENERIC__TLB_H */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f5d0357a25ce..f80936324e6a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1726,7 +1726,7 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 {
 	struct mm_struct *mm = vma->vm_mm;
 	spinlock_t *ptl;
-	pmd_t entry;
+	pmd_t oldpmd, entry;
 	bool preserve_write;
 	int ret;
 	bool prot_numa = cp_flags & MM_CP_PROT_NUMA;
@@ -1802,9 +1802,9 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	 * dirty/young flags (which are also known as access/dirty) cannot be
 	 * further modified by the hardware.
 	 */
-	entry = pmdp_invalidate_ad(vma, addr, pmd);
+	oldpmd = pmdp_invalidate_ad(vma, addr, pmd);

-	entry = pmd_modify(entry, newprot);
+	entry = pmd_modify(oldpmd, newprot);
 	if (preserve_write)
 		entry = pmd_mk_savedwrite(entry);
 	if (uffd_wp) {
@@ -1821,7 +1821,8 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	ret = HPAGE_PMD_NR;
 	set_pmd_at(mm, addr, pmd, entry);

-	tlb_flush_pmd_range(tlb, addr, HPAGE_PMD_SIZE);
+	if (huge_pmd_may_need_flush(oldpmd, entry))
+		tlb_flush_pmd_range(tlb, addr, HPAGE_PMD_SIZE);

 	BUG_ON(vma_is_anonymous(vma) && !preserve_write && pmd_write(entry));
 unlock:
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 0f5c87af5c60..6179c82ea72d 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -141,7 +141,8 @@ static unsigned long change_pte_range(struct mmu_gather *tlb,
 				ptent = pte_mkwrite(ptent);
 			}
 			ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
-			tlb_flush_pte_range(tlb, addr, PAGE_SIZE);
+			if (pte_may_need_flush(oldpte, ptent))
+				tlb_flush_pte_range(tlb, addr, PAGE_SIZE);
 			pages++;
 		} else if (is_swap_pte(oldpte)) {
 			swp_entry_t entry = pte_to_swp_entry(oldpte);