From patchwork Mon Dec 4 10:54:39 2023
X-Patchwork-Submitter: Ryan Roberts <ryan.roberts@arm.com>
X-Patchwork-Id: 13478198
From: Ryan Roberts <ryan.roberts@arm.com>
To: Catalin Marinas, Will Deacon, Ard Biesheuvel, Marc Zyngier,
    Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
    Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
    Dmitry Vyukov, Vincenzo Frascino, Andrew Morton,
    Anshuman Khandual, Matthew Wilcox, Yu Zhao, Mark Rutland,
    David Hildenbrand, Kefeng Wang, John Hubbard, Zi Yan,
    Barry Song <21cnbao@gmail.com>, Alistair Popple, Yang Shi
Cc: Ryan Roberts <ryan.roberts@arm.com>, linux-arm-kernel@lists.infradead.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 14/15] arm64/mm: Implement ptep_set_wrprotects() to optimize fork()
Date: Mon, 4 Dec 2023 10:54:39 +0000
Message-Id: <20231204105440.61448-15-ryan.roberts@arm.com>
In-Reply-To: <20231204105440.61448-1-ryan.roberts@arm.com>
References: <20231204105440.61448-1-ryan.roberts@arm.com>

With the core-mm changes in place to batch-copy ptes during fork, we can
take advantage of this in arm64 to greatly reduce the number of tlbis we
have to issue, and to recover the fork performance lost when adding
support for transparent contiguous ptes.
If we are write-protecting a whole contig range, we can apply the
write-protection to the whole range and know that it won't change
whether the range should have the contiguous bit set or not. For ranges
smaller than the contig range, we will still have to unfold, apply the
write-protection, then fold if the change now means the range is
foldable.

This optimization is possible thanks to the tightening of the Arm ARM
with respect to the definition and behaviour of 'Misprogramming the
Contiguous bit'. See section D21194 at
https://developer.arm.com/documentation/102105/latest/

Performance tested with the following test written for the
will-it-scale framework:

-------

char *testcase_description = "fork and exit";

void testcase(unsigned long long *iterations, unsigned long nr)
{
	int pid;
	char *mem;

	mem = malloc(SZ_128M);
	assert(mem);
	memset(mem, 1, SZ_128M);

	while (1) {
		pid = fork();
		assert(pid >= 0);

		if (!pid)
			exit(0);

		waitpid(pid, NULL, 0);
		(*iterations)++;
	}
}

-------

I saw a huge performance regression when PTE_CONT support was added;
that regression is mostly fixed by this change.
The following shows the regression relative to before PTE_CONT was
enabled (a bigger negative value is a bigger regression):

| cpus   |   before opt |   after opt |
|-------:|-------------:|------------:|
|      1 |       -10.4% |       -5.2% |
|      8 |       -15.4% |       -3.5% |
|     16 |       -38.7% |       -3.7% |
|     24 |       -57.0% |       -4.4% |
|     32 |       -65.8% |       -5.4% |

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/include/asm/pgtable.h | 30 ++++++++++++++++++++---
 arch/arm64/mm/contpte.c          | 42 ++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 15bc9cf1eef4..9bd2f57a9e11 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -984,6 +984,16 @@ static inline void __ptep_set_wrprotect(struct mm_struct *mm,
 	} while (pte_val(pte) != pte_val(old_pte));
 }
 
+static inline void __ptep_set_wrprotects(struct mm_struct *mm,
+				unsigned long address, pte_t *ptep,
+				unsigned int nr)
+{
+	unsigned int i;
+
+	for (i = 0; i < nr; i++, address += PAGE_SIZE, ptep++)
+		__ptep_set_wrprotect(mm, address, ptep);
+}
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #define __HAVE_ARCH_PMDP_SET_WRPROTECT
 static inline void pmdp_set_wrprotect(struct mm_struct *mm,
@@ -1139,6 +1149,8 @@ extern int contpte_ptep_test_and_clear_young(struct vm_area_struct *vma,
 				unsigned long addr, pte_t *ptep);
 extern int contpte_ptep_clear_flush_young(struct vm_area_struct *vma,
 				unsigned long addr, pte_t *ptep);
+extern void contpte_set_wrprotects(struct mm_struct *mm, unsigned long addr,
+				pte_t *ptep, unsigned int nr);
 extern int contpte_ptep_set_access_flags(struct vm_area_struct *vma,
 				unsigned long addr, pte_t *ptep,
 				pte_t entry, int dirty);
@@ -1290,13 +1302,25 @@ static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
 	return contpte_ptep_clear_flush_young(vma, addr, ptep);
 }
 
+#define ptep_set_wrprotects ptep_set_wrprotects
+static inline void ptep_set_wrprotects(struct mm_struct *mm, unsigned long addr,
+				pte_t *ptep, unsigned int nr)
+{
+	if (!contpte_is_enabled(mm))
+		__ptep_set_wrprotects(mm, addr, ptep, nr);
+	else if (nr == 1) {
+		contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep));
+		__ptep_set_wrprotects(mm, addr, ptep, 1);
+		contpte_try_fold(mm, addr, ptep, __ptep_get(ptep));
+	} else
+		contpte_set_wrprotects(mm, addr, ptep, nr);
+}
+
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
 static inline void ptep_set_wrprotect(struct mm_struct *mm,
 				unsigned long addr, pte_t *ptep)
 {
-	contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep));
-	__ptep_set_wrprotect(mm, addr, ptep);
-	contpte_try_fold(mm, addr, ptep, __ptep_get(ptep));
+	ptep_set_wrprotects(mm, addr, ptep, 1);
 }
 
 #define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
index e079ec61d7d1..2a57df16bf58 100644
--- a/arch/arm64/mm/contpte.c
+++ b/arch/arm64/mm/contpte.c
@@ -303,6 +303,48 @@ int contpte_ptep_clear_flush_young(struct vm_area_struct *vma,
 }
 EXPORT_SYMBOL(contpte_ptep_clear_flush_young);
 
+void contpte_set_wrprotects(struct mm_struct *mm, unsigned long addr,
+				pte_t *ptep, unsigned int nr)
+{
+	unsigned long next;
+	unsigned long end = addr + (nr << PAGE_SHIFT);
+
+	do {
+		next = pte_cont_addr_end(addr, end);
+		nr = (next - addr) >> PAGE_SHIFT;
+
+		/*
+		 * If wrprotecting an entire contig range, we can avoid
+		 * unfolding. Just set wrprotect and wait for the later
+		 * mmu_gather flush to invalidate the tlb. Until the flush, the
+		 * page may or may not be wrprotected. After the flush, it is
+		 * guaranteed wrprotected. If it's a partial range though, we
+		 * must unfold, because we can't have a case where CONT_PTE is
+		 * set but wrprotect applies to a subset of the PTEs; this would
+		 * cause it to continue to be unpredictable after the flush.
+		 */
+		if (nr != CONT_PTES)
+			contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep));
+
+		__ptep_set_wrprotects(mm, addr, ptep, nr);
+
+		addr = next;
+		ptep += nr;
+
+		/*
+		 * If applying to a partial contig range, the change could have
+		 * made the range foldable. Use the last pte in the range we
+		 * just set for comparison, since contpte_try_fold() only
+		 * triggers when acting on the last pte in the contig range.
+		 */
+		if (nr != CONT_PTES)
+			contpte_try_fold(mm, addr - PAGE_SIZE, ptep - 1,
+					 __ptep_get(ptep - 1));
+
+	} while (addr != end);
+}
+EXPORT_SYMBOL(contpte_set_wrprotects);
+
 int contpte_ptep_set_access_flags(struct vm_area_struct *vma,
 				unsigned long addr, pte_t *ptep,
 				pte_t entry, int dirty)