From patchwork Thu Feb 15 10:32:00 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 13557863 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C4F01C4829E for ; Thu, 15 Feb 2024 10:33:08 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 528578D001F; Thu, 15 Feb 2024 05:33:08 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 4B15B8D000E; Thu, 15 Feb 2024 05:33:08 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2B74F8D001F; Thu, 15 Feb 2024 05:33:08 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 1178C8D000E for ; Thu, 15 Feb 2024 05:33:08 -0500 (EST) Received: from smtpin19.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id DC1BD1A067B for ; Thu, 15 Feb 2024 10:33:07 +0000 (UTC) X-FDA: 81793675614.19.A780688 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf19.hostedemail.com (Postfix) with ESMTP id 3CF6A1A0018 for ; Thu, 15 Feb 2024 10:33:05 +0000 (UTC) Authentication-Results: imf19.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf19.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1707993185; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=CrpX24hJ21lNTKd4YCqgFb/8kYDx8R2PLZ109arTQ9Q=; b=aqLBzknrrG3HVKLe3KP/auye+o0tEFVVtmR6jNMAneJN5QbK9NseSJQeFjoBOsx0SPMrnl Jh41Kv3omRaIMub3Yshq0AUNT4VUqNe/em0VTY5ary4P5mCV+sDl8o53iB7GsNKl3sb0ql 6dbl9q/41mwkFDt2WmN7GAcgtEgRens= ARC-Authentication-Results: i=1; imf19.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf19.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1707993185; a=rsa-sha256; cv=none; b=olU5GrHZJJktx/AIwlA3scl1dE2YjAuT8Em+qaYKiq+u4cCrlsARpcz0rF+4vHC4Hxq/XF w9/Hzb0PJITgoHaN1SqWAVBdJ15qmQZ2Mb8UiuWJrvBwB7Aluxz1Tcs9Q5Z/29VOy4OGFB u+tHl58gViM7dcsu9lFIUBv6/AKGqAU= Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 4D79D1595; Thu, 15 Feb 2024 02:33:45 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 44B7C3F7B4; Thu, 15 Feb 2024 02:33:01 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Ard Biesheuvel , Marc Zyngier , James Morse , Andrey Ryabinin , Andrew Morton , Matthew Wilcox , Mark Rutland , David Hildenbrand , Kefeng Wang , John Hubbard , Zi Yan , Barry Song <21cnbao@gmail.com>, Alistair Popple , Yang Shi , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, x86@kernel.org, linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v6 13/18] arm64/mm: Implement new wrprotect_ptes() batch API Date: Thu, 15 Feb 2024 10:32:00 +0000 Message-Id: <20240215103205.2607016-14-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20240215103205.2607016-1-ryan.roberts@arm.com> References: <20240215103205.2607016-1-ryan.roberts@arm.com> MIME-Version: 1.0 X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 3CF6A1A0018 X-Stat-Signature: w95fieawtomq3uk6totw48yifjajn6ow X-Rspam-User: X-HE-Tag: 1707993185-439266 X-HE-Meta: U2FsdGVkX1/xdbkfuljv9P+P7Ab8rJoFf9uYsOii39BriR7dJNehV+OAsoS0yX68G5vFgbVRu2NCI7zt9Fl+KryDzF2yYlVi7P56PY3ewz3p3WBEZAI6Stm+R87Wjj0IYJw214JyJ/uA7y5dRkp/4MJQPba/ZbKVBleStNfhya9LHwI3xOUiwZANzpUw2aGYuYrQYHcYyOGMV77+whjjMjIP4KyrJhyxASpmRCRSb+Di49gYSiM2DnzM4q58tg6KPTFwW9E+k30vakQ96UI09KJWKhCbRHvThZZdTpBFiJTzpMTQwOKeIPzetLNM9QHjkQrWQvU82PyMEoPiJKB482uxva+9nwqoRts9cncAvzJfhuhyD1blZ7mPBaVl0Z1FgW5Xoh+dSQwPLux+WaKwT+8NSTbm3Vys+QuNsVwk25s1esOkwD+IfKHMEVLGgIP6dq0keMHekBNxhuW/ay8gSnETvBdwEwLvb0QKh3emDrxHY/JtbHwcdvnHq5pIpXjnFv+ecdibK6rOLf9rkpY8gAy5JuwbtOsjC4cxk+Aezf+H5k0Crs/YILm97N3v0RMskPIqsBQMwlmo4v9thB/Ih1oXys2tn19YJIIAhrfYR3Cml6ugkFkW3Em6/H6aO06b3T5F+AxlTcvJIDcMGAKvS08CqsyFM2CmbE0kFr/qD9Ik1JjmEDc9F2qOsGdNHx4+grFOBIuRazNO5Gy2fCCgKDZrKAZkOwaVH28OEz0DB770/qelle7PuTnrzHcqsiu5O02gmUiGicpZq7PdO6MNA8OgtFxW2+RmSVTwqKfeG00Q9I5IocpPJGVsfV0AB28XoQTg6Wigm0WPjwcVPULKKTeMvL5aqaKkmoOmAAVkDUyE5CC5zxtRnbNYWN5Rq2hDATvPMfo5T2x7CqNZXAt8V0aDtCXklC5ijQCR5woP+NufLXbUNygAwHgG/UlOXkGe+Vvd+SMnaDTm/Uiph2O BG6A074m 1L7dzL1hhfK6+dt/TDK0DMEzihMJWJfgmSZ709QyxgYrfnNg0babRD9070I9rgbgep2Stc//bmoq6HAhva2OycZEsfo3peHx0k2OI5d2AX+EImI/9sLjh+T7u82JzkbwuKghW9PPZ7tASg4mks4gpcSFQGwRIL7shMmuVI2WdKlpNrIkTZfleVH2+YdIECX5NsEcly8BDrLGir7Wn2NddTfcpWFGWbXDMJEQBg8PM6A6012m5e8S4CDYWiGk+7h3vI2BxM+6u+fAWcbgONv2h+GwhF5iu/Pj0pnXLOJJVs8rYC3c= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Optimize the contpte implementation to fix some of the fork performance regression introduced by the initial contpte commit. Subsequent patches will solve it entirely. During fork(), any private memory in the parent must be write-protected. Previously this was done 1 PTE at a time. But the core-mm supports batched wrprotect via the new wrprotect_ptes() API. So let's implement that API and for fully covered contpte mappings, we no longer need to unfold the contpte. This has 2 benefits: - reduced unfolding, reduces the number of tlbis that must be issued. - The memory remains contpte-mapped ("folded") in the parent, so it continues to benefit from the more efficient use of the TLB after the fork. The optimization to wrprotect a whole contpte block without unfolding is possible thanks to the tightening of the Arm ARM in respect to the definition and behaviour when 'Misprogramming the Contiguous bit'. See section D21194 at https://developer.arm.com/documentation/102105/ja-07/ Tested-by: John Hubbard Signed-off-by: Ryan Roberts Acked-by: Mark Rutland Acked-by: Catalin Marinas --- arch/arm64/include/asm/pgtable.h | 61 ++++++++++++++++++++++++++------ arch/arm64/mm/contpte.c | 38 ++++++++++++++++++++ 2 files changed, 89 insertions(+), 10 deletions(-) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 831099cfc96b..8643227c318b 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -978,16 +978,12 @@ static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm, } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ -/* - * __ptep_set_wrprotect - mark read-only while trasferring potential hardware - * dirty status (PTE_DBM && !PTE_RDONLY) to the software PTE_DIRTY bit. - */ -static inline void __ptep_set_wrprotect(struct mm_struct *mm, - unsigned long address, pte_t *ptep) +static inline void ___ptep_set_wrprotect(struct mm_struct *mm, + unsigned long address, pte_t *ptep, + pte_t pte) { - pte_t old_pte, pte; + pte_t old_pte; - pte = __ptep_get(ptep); do { old_pte = pte; pte = pte_wrprotect(pte); @@ -996,6 +992,25 @@ static inline void __ptep_set_wrprotect(struct mm_struct *mm, } while (pte_val(pte) != pte_val(old_pte)); } +/* + * __ptep_set_wrprotect - mark read-only while trasferring potential hardware + * dirty status (PTE_DBM && !PTE_RDONLY) to the software PTE_DIRTY bit. + */ +static inline void __ptep_set_wrprotect(struct mm_struct *mm, + unsigned long address, pte_t *ptep) +{ + ___ptep_set_wrprotect(mm, address, ptep, __ptep_get(ptep)); +} + +static inline void __wrprotect_ptes(struct mm_struct *mm, unsigned long address, + pte_t *ptep, unsigned int nr) +{ + unsigned int i; + + for (i = 0; i < nr; i++, address += PAGE_SIZE, ptep++) + __ptep_set_wrprotect(mm, address, ptep); +} + #ifdef CONFIG_TRANSPARENT_HUGEPAGE #define __HAVE_ARCH_PMDP_SET_WRPROTECT static inline void pmdp_set_wrprotect(struct mm_struct *mm, @@ -1149,6 +1164,8 @@ extern int contpte_ptep_test_and_clear_young(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep); extern int contpte_ptep_clear_flush_young(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep); +extern void contpte_wrprotect_ptes(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, unsigned int nr); extern int contpte_ptep_set_access_flags(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep, pte_t entry, int dirty); @@ -1268,12 +1285,35 @@ static inline int ptep_clear_flush_young(struct vm_area_struct *vma, return contpte_ptep_clear_flush_young(vma, addr, ptep); } +#define wrprotect_ptes wrprotect_ptes +static inline void wrprotect_ptes(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, unsigned int nr) +{ + if (likely(nr == 1)) { + /* + * Optimization: wrprotect_ptes() can only be called for present + * ptes so we only need to check contig bit as condition for + * unfold, and we can remove the contig bit from the pte we read + * to avoid re-reading. This speeds up fork() which is sensitive + * for order-0 folios. Equivalent to contpte_try_unfold(). + */ + pte_t orig_pte = __ptep_get(ptep); + + if (unlikely(pte_cont(orig_pte))) { + __contpte_try_unfold(mm, addr, ptep, orig_pte); + orig_pte = pte_mknoncont(orig_pte); + } + ___ptep_set_wrprotect(mm, addr, ptep, orig_pte); + } else { + contpte_wrprotect_ptes(mm, addr, ptep, nr); + } +} + #define __HAVE_ARCH_PTEP_SET_WRPROTECT static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { - contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep)); - __ptep_set_wrprotect(mm, addr, ptep); + wrprotect_ptes(mm, addr, ptep, 1); } #define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS @@ -1305,6 +1345,7 @@ static inline int ptep_set_access_flags(struct vm_area_struct *vma, #define ptep_clear_flush_young __ptep_clear_flush_young #define __HAVE_ARCH_PTEP_SET_WRPROTECT #define ptep_set_wrprotect __ptep_set_wrprotect +#define wrprotect_ptes __wrprotect_ptes #define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS #define ptep_set_access_flags __ptep_set_access_flags diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c index 6d7f40667fa2..bedb58524535 100644 --- a/arch/arm64/mm/contpte.c +++ b/arch/arm64/mm/contpte.c @@ -26,6 +26,26 @@ static inline pte_t *contpte_align_down(pte_t *ptep) return PTR_ALIGN_DOWN(ptep, sizeof(*ptep) * CONT_PTES); } +static void contpte_try_unfold_partial(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, unsigned int nr) +{ + /* + * Unfold any partially covered contpte block at the beginning and end + * of the range. + */ + + if (ptep != contpte_align_down(ptep) || nr < CONT_PTES) + contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep)); + + if (ptep + nr != contpte_align_down(ptep + nr)) { + unsigned long last_addr = addr + PAGE_SIZE * (nr - 1); + pte_t *last_ptep = ptep + nr - 1; + + contpte_try_unfold(mm, last_addr, last_ptep, + __ptep_get(last_ptep)); + } +} + static void contpte_convert(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte) { @@ -238,6 +258,24 @@ int contpte_ptep_clear_flush_young(struct vm_area_struct *vma, } EXPORT_SYMBOL(contpte_ptep_clear_flush_young); +void contpte_wrprotect_ptes(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, unsigned int nr) +{ + /* + * If wrprotecting an entire contig range, we can avoid unfolding. Just + * set wrprotect and wait for the later mmu_gather flush to invalidate + * the tlb. Until the flush, the page may or may not be wrprotected. + * After the flush, it is guaranteed wrprotected. If it's a partial + * range though, we must unfold, because we can't have a case where + * CONT_PTE is set but wrprotect applies to a subset of the PTEs; this + * would cause it to continue to be unpredictable after the flush. + */ + + contpte_try_unfold_partial(mm, addr, ptep, nr); + __wrprotect_ptes(mm, addr, ptep, nr); +} +EXPORT_SYMBOL(contpte_wrprotect_ptes); + int contpte_ptep_set_access_flags(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep, pte_t entry, int dirty)