From patchwork Fri Apr 14 13:02:47 2023
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 13211585
From: Ryan Roberts
To: Andrew Morton, "Matthew Wilcox (Oracle)", Yu Zhao, "Yin, Fengwei"
Cc: Ryan Roberts, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org
Subject: [RFC v2 PATCH 01/17] mm: Expose clear_huge_page() unconditionally
Date: Fri, 14 Apr 2023 14:02:47 +0100
Message-Id: <20230414130303.2345383-2-ryan.roberts@arm.com>
In-Reply-To: <20230414130303.2345383-1-ryan.roberts@arm.com>
References: <20230414130303.2345383-1-ryan.roberts@arm.com>

In preparation for extending vma_alloc_zeroed_movable_folio() to allocate an arbitrary-order folio, expose clear_huge_page() unconditionally, so that it can be used to zero the allocated folio.
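For illustration only (not part of the patch): once clear_huge_page() is visible regardless of CONFIG_TRANSPARENT_HUGEPAGE/CONFIG_HUGETLBFS, a generic implementation can zero a freshly allocated folio of any order along the following lines; the helper name is made up for this sketch.

	/* Sketch only: zero every page of a newly allocated folio. */
	static void zero_new_folio(struct folio *folio, unsigned long vaddr)
	{
		/*
		 * clear_huge_page() takes the number of pages to clear and
		 * clears the page containing vaddr last, keeping its
		 * cachelines hot for the faulting access.
		 */
		clear_huge_page(&folio->page, vaddr, folio_nr_pages(folio));
	}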
Signed-off-by: Ryan Roberts --- include/linux/mm.h | 3 ++- mm/memory.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) -- 2.25.1 diff --git a/include/linux/mm.h b/include/linux/mm.h index 1f79667824eb..cdb8c6031d0f 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3538,10 +3538,11 @@ enum mf_action_page_type { */ extern const struct attribute_group memory_failure_attr_group; -#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS) extern void clear_huge_page(struct page *page, unsigned long addr_hint, unsigned int pages_per_huge_page); + +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS) extern void copy_user_huge_page(struct page *dst, struct page *src, unsigned long addr_hint, struct vm_area_struct *vma, diff --git a/mm/memory.c b/mm/memory.c index 01a23ad48a04..3e2eee8c66a7 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -5642,7 +5642,6 @@ void __might_fault(const char *file, int line) EXPORT_SYMBOL(__might_fault); #endif -#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS) /* * Process all subpages of the specified huge page with the specified * operation. The target subpage will be processed last to keep its @@ -5730,6 +5729,8 @@ void clear_huge_page(struct page *page, process_huge_page(addr_hint, pages_per_huge_page, clear_subpage, page); } +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS) + static void copy_user_gigantic_page(struct page *dst, struct page *src, unsigned long addr, struct vm_area_struct *vma, From patchwork Fri Apr 14 13:02:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 13211583 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 740B5C77B6E for ; Fri, 14 Apr 2023 14:16:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:Cc:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=czl5yqHeq5ucV/QLiqVtj8wxO7p9gcXYCTVqdowqmuI=; b=U4LoGSSuottgE9 ysbFmTJ4vz8flOdXSm8IKYPA+8dlb8E5VMnOa4zUwlyhzEVyAS4YY7Wemz1saMihqn2xW9yHCqecs uh12tl3FrYrb2e4HhrLlO3NNEaepy5cDyXaoQnAlACEp5XdwgC0lpSHV4lZkqp1vhjNd9PpIvpC7U PL36FrKwjfI+WvbVSOjOxTfsYJDu0MAZmvvKO6P3/EiKIXeFhkouThPkMV+ESKCs8vkh3wXF6GYF5 5e0SLElDVSguvHbuMmf86lQekqRMN/t9VmGA5KnAns0L89UYAe83yjlV2CSrK5JZQ37yppWJiEVtY sV+i/O38EYZFZQWmvOLg==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pnKDi-009n3P-02; Fri, 14 Apr 2023 14:16:10 +0000 Received: from foss.arm.com ([217.140.110.172]) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pnJ5I-009bMf-2b for linux-arm-kernel@lists.infradead.org; Fri, 14 Apr 2023 13:03:27 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 0F8A316F8; Fri, 14 Apr 2023 06:04:06 -0700 (PDT) Received: from 
e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 8AF9E3F6C4; Fri, 14 Apr 2023 06:03:20 -0700 (PDT) From: Ryan Roberts To: Andrew Morton , "Matthew Wilcox (Oracle)" , Yu Zhao , "Yin, Fengwei" Cc: Ryan Roberts , linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org Subject: [RFC v2 PATCH 02/17] mm: pass gfp flags and order to vma_alloc_zeroed_movable_folio() Date: Fri, 14 Apr 2023 14:02:48 +0100 Message-Id: <20230414130303.2345383-3-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230414130303.2345383-1-ryan.roberts@arm.com> References: <20230414130303.2345383-1-ryan.roberts@arm.com> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20230414_060324_989775_3E216EA2 X-CRM114-Status: GOOD ( 18.70 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Allow allocation of large folios with vma_alloc_zeroed_movable_folio(). This prepares the ground for large anonymous folios. The generic implementation of vma_alloc_zeroed_movable_folio() now uses clear_huge_page() to zero the allocated folio since it may now be a non-0 order. Currently the function is always called with order 0 and no extra gfp flags, so no functional change intended. Signed-off-by: Ryan Roberts --- arch/alpha/include/asm/page.h | 5 +++-- arch/arm64/include/asm/page.h | 3 ++- arch/arm64/mm/fault.c | 7 ++++--- arch/ia64/include/asm/page.h | 5 +++-- arch/m68k/include/asm/page_no.h | 7 ++++--- arch/s390/include/asm/page.h | 5 +++-- arch/x86/include/asm/page.h | 5 +++-- include/linux/highmem.h | 23 +++++++++++++---------- mm/memory.c | 5 +++-- 9 files changed, 38 insertions(+), 27 deletions(-) -- 2.25.1 diff --git a/arch/alpha/include/asm/page.h b/arch/alpha/include/asm/page.h index 4db1ebc0ed99..6fc7fe91b6cb 100644 --- a/arch/alpha/include/asm/page.h +++ b/arch/alpha/include/asm/page.h @@ -17,8 +17,9 @@ extern void clear_page(void *page); #define clear_user_page(page, vaddr, pg) clear_page(page) -#define vma_alloc_zeroed_movable_folio(vma, vaddr) \ - vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr, false) +#define vma_alloc_zeroed_movable_folio(vma, vaddr, gfp, order) \ + vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO | (gfp), \ + order, vma, vaddr, false) extern void copy_page(void * _to, void * _from); #define copy_user_page(to, from, vaddr, pg) copy_page(to, from) diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h index 2312e6ee595f..47710852f872 100644 --- a/arch/arm64/include/asm/page.h +++ b/arch/arm64/include/asm/page.h @@ -30,7 +30,8 @@ void copy_highpage(struct page *to, struct page *from); #define __HAVE_ARCH_COPY_HIGHPAGE struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma, - unsigned long vaddr); + unsigned long vaddr, + gfp_t gfp, int order); #define vma_alloc_zeroed_movable_folio vma_alloc_zeroed_movable_folio void tag_clear_highpage(struct page *to); diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c index f4cb0f85ccf4..3b4cc04f7a23 100644 --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -926,9 +926,10 @@ NOKPROBE_SYMBOL(do_debug_exception); * Used during anonymous page fault handling. 
*/ struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma, - unsigned long vaddr) + unsigned long vaddr, + gfp_t gfp, int order) { - gfp_t flags = GFP_HIGHUSER_MOVABLE | __GFP_ZERO; + gfp_t flags = GFP_HIGHUSER_MOVABLE | __GFP_ZERO | gfp; /* * If the page is mapped with PROT_MTE, initialise the tags at the @@ -938,7 +939,7 @@ struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma, if (vma->vm_flags & VM_MTE) flags |= __GFP_ZEROTAGS; - return vma_alloc_folio(flags, 0, vma, vaddr, false); + return vma_alloc_folio(flags, order, vma, vaddr, false); } void tag_clear_highpage(struct page *page) diff --git a/arch/ia64/include/asm/page.h b/arch/ia64/include/asm/page.h index 310b09c3342d..ebdf04274023 100644 --- a/arch/ia64/include/asm/page.h +++ b/arch/ia64/include/asm/page.h @@ -82,10 +82,11 @@ do { \ } while (0) -#define vma_alloc_zeroed_movable_folio(vma, vaddr) \ +#define vma_alloc_zeroed_movable_folio(vma, vaddr, gfp, order) \ ({ \ struct folio *folio = vma_alloc_folio( \ - GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr, false); \ + GFP_HIGHUSER_MOVABLE | __GFP_ZERO | (gfp), \ + order, vma, vaddr, false); \ if (folio) \ flush_dcache_folio(folio); \ folio; \ diff --git a/arch/m68k/include/asm/page_no.h b/arch/m68k/include/asm/page_no.h index 060e4c0e7605..4a2fe57fef5e 100644 --- a/arch/m68k/include/asm/page_no.h +++ b/arch/m68k/include/asm/page_no.h @@ -3,7 +3,7 @@ #define _M68K_PAGE_NO_H #ifndef __ASSEMBLY__ - + extern unsigned long memory_start; extern unsigned long memory_end; @@ -13,8 +13,9 @@ extern unsigned long memory_end; #define clear_user_page(page, vaddr, pg) clear_page(page) #define copy_user_page(to, from, vaddr, pg) copy_page(to, from) -#define vma_alloc_zeroed_movable_folio(vma, vaddr) \ - vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr, false) +#define vma_alloc_zeroed_movable_folio(vma, vaddr, gfp, order) \ + vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO | (gfp), \ + order, vma, vaddr, false) #define __pa(vaddr) ((unsigned long)(vaddr)) #define __va(paddr) ((void *)((unsigned long)(paddr))) diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h index 8a2a3b5d1e29..b749564140f1 100644 --- a/arch/s390/include/asm/page.h +++ b/arch/s390/include/asm/page.h @@ -73,8 +73,9 @@ static inline void copy_page(void *to, void *from) #define clear_user_page(page, vaddr, pg) clear_page(page) #define copy_user_page(to, from, vaddr, pg) copy_page(to, from) -#define vma_alloc_zeroed_movable_folio(vma, vaddr) \ - vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr, false) +#define vma_alloc_zeroed_movable_folio(vma, vaddr, gfp, order) \ + vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO | (gfp), \ + order, vma, vaddr, false) /* * These are used to make use of C type-checking.. 
diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h index d18e5c332cb9..34deab1a8dae 100644 --- a/arch/x86/include/asm/page.h +++ b/arch/x86/include/asm/page.h @@ -34,8 +34,9 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr, copy_page(to, from); } -#define vma_alloc_zeroed_movable_folio(vma, vaddr) \ - vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr, false) +#define vma_alloc_zeroed_movable_folio(vma, vaddr, gfp, order) \ + vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO | (gfp), \ + order, vma, vaddr, false) #ifndef __pa #define __pa(x) __phys_addr((unsigned long)(x)) diff --git a/include/linux/highmem.h b/include/linux/highmem.h index 8fc10089e19e..54e68deae5ef 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -209,26 +209,29 @@ static inline void clear_user_highpage(struct page *page, unsigned long vaddr) #ifndef vma_alloc_zeroed_movable_folio /** - * vma_alloc_zeroed_movable_folio - Allocate a zeroed page for a VMA. - * @vma: The VMA the page is to be allocated for. - * @vaddr: The virtual address the page will be inserted into. - * - * This function will allocate a page suitable for inserting into this - * VMA at this virtual address. It may be allocated from highmem or + * vma_alloc_zeroed_movable_folio - Allocate a zeroed folio for a VMA. + * @vma: The start VMA the folio is to be allocated for. + * @vaddr: The virtual address the folio will be inserted into. + * @gfp: Additional gfp falgs to mix in or 0. + * @order: The order of the folio (2^order pages). + * + * This function will allocate a folio suitable for inserting into this + * VMA starting at this virtual address. It may be allocated from highmem or * the movable zone. An architecture may provide its own implementation. * - * Return: A folio containing one allocated and zeroed page or NULL if + * Return: A folio containing 2^order allocated and zeroed pages or NULL if * we are out of memory. */ static inline struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma, - unsigned long vaddr) + unsigned long vaddr, gfp_t gfp, int order) { struct folio *folio; - folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr, false); + folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE | gfp, + order, vma, vaddr, false); if (folio) - clear_user_highpage(&folio->page, vaddr); + clear_huge_page(&folio->page, vaddr, 1U << order); return folio; } diff --git a/mm/memory.c b/mm/memory.c index 3e2eee8c66a7..9d5e8be49f3b 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3061,7 +3061,8 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) goto oom; if (is_zero_pfn(pte_pfn(vmf->orig_pte))) { - new_folio = vma_alloc_zeroed_movable_folio(vma, vmf->address); + new_folio = vma_alloc_zeroed_movable_folio(vma, vmf->address, + 0, 0); if (!new_folio) goto oom; } else { @@ -4063,7 +4064,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) /* Allocate our own private page. 
*/ if (unlikely(anon_vma_prepare(vma))) goto oom; - folio = vma_alloc_zeroed_movable_folio(vma, vmf->address); + folio = vma_alloc_zeroed_movable_folio(vma, vmf->address, 0, 0); if (!folio) goto oom; From patchwork Fri Apr 14 13:02:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 13211584 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 44CCCC77B72 for ; Fri, 14 Apr 2023 14:16:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:Cc:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=/O6d7/3hQfYF0YEdcxjTlOXYpTTb9Oh2eP4SqOzsKss=; b=ReU4q6LO9Gz2Hu F2N1JQAbNF13ujvzifyWzQwzhwJezbgl4MQFJQdUzBl6AapdxCFrJVy1/0JGlLmMBAqI+4tzW2V4r nYVwM2Pt6owCuE7VkgPFWVzxN6HU1/i4SG6LWho+ELzELuCzwWboBmS/bkkCHfED/oG+iMLXcmYpG pYIH9nodH4WB6d93HWaS9QqWK8aW/D6J7v4EfK/YxEgWpeHvvpDj0J0rGWTpUwmFzdhKsRX9inKmP Mu8yKjFT3EAlmXrfutHLa6DOGUCSTMioarmZ4lZSQq3tzr7alEgaIfH3U71nqPnu9RxBToHrF48bz vqnYBby3bGrF9bl3Ie0g==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pnKDi-009n3T-1J; Fri, 14 Apr 2023 14:16:10 +0000 Received: from foss.arm.com ([217.140.110.172]) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pnJ5I-009bMl-2m for linux-arm-kernel@lists.infradead.org; Fri, 14 Apr 2023 13:03:30 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 417941713; Fri, 14 Apr 2023 06:04:07 -0700 (PDT) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id BD1443F6C4; Fri, 14 Apr 2023 06:03:21 -0700 (PDT) From: Ryan Roberts To: Andrew Morton , "Matthew Wilcox (Oracle)" , Yu Zhao , "Yin, Fengwei" Cc: Ryan Roberts , linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org Subject: [RFC v2 PATCH 03/17] mm: Introduce try_vma_alloc_movable_folio() Date: Fri, 14 Apr 2023 14:02:49 +0100 Message-Id: <20230414130303.2345383-4-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230414130303.2345383-1-ryan.roberts@arm.com> References: <20230414130303.2345383-1-ryan.roberts@arm.com> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20230414_060325_236056_20499166 X-CRM114-Status: GOOD ( 12.63 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Opportunistically attempt to allocate high-order folios in highmem, optionally zeroed. Retry with lower orders all the way to order-0, until success. 
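Before the diff below, a caller-side sketch of how this helper is meant to be used (the function and local names other than try_vma_alloc_movable_folio() are assumptions for the example):

	/*
	 * Sketch only: try for 2^order pages, transparently falling back
	 * towards order 0 under memory pressure.
	 */
	static struct folio *alloc_anon_folio_sketch(struct vm_area_struct *vma,
						     unsigned long addr, int order)
	{
		struct folio *folio;

		folio = try_vma_alloc_movable_folio(vma, addr, order, true);

		/*
		 * The caller must not assume it got 2^order pages: re-read
		 * the actual size with folio_order()/folio_nr_pages() before
		 * mapping it.
		 */
		return folio;
	}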
Although, of note, order-1 allocations are skipped since a large folio must be at least order-2 to work with the THP machinery. The user must check what they got with folio_order(). This will be used to oportunistically allocate large folios for anonymous memory with a sensible fallback under memory pressure. For attempts to allocate non-0 orders, we set __GFP_NORETRY to prevent high latency due to reclaim, instead preferring to just try for a lower order. The same approach is used by the readahead code when allocating large folios. Signed-off-by: Ryan Roberts --- mm/memory.c | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) -- 2.25.1 diff --git a/mm/memory.c b/mm/memory.c index 9d5e8be49f3b..ca32f59acef2 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2989,6 +2989,39 @@ static vm_fault_t fault_dirty_shared_page(struct vm_fault *vmf) return 0; } +static inline struct folio *vma_alloc_movable_folio(struct vm_area_struct *vma, + unsigned long vaddr, int order, bool zeroed) +{ + gfp_t gfp = order > 0 ? __GFP_NORETRY | __GFP_NOWARN : 0; + + if (zeroed) + return vma_alloc_zeroed_movable_folio(vma, vaddr, gfp, order); + else + return vma_alloc_folio(GFP_HIGHUSER_MOVABLE | gfp, order, vma, + vaddr, false); +} + +/* + * Opportunistically attempt to allocate high-order folios, retrying with lower + * orders all the way to order-0, until success. order-1 allocations are skipped + * since a folio must be at least order-2 to work with the THP machinery. The + * user must check what they got with folio_order(). vaddr can be any virtual + * address that will be mapped by the allocated folio. + */ +static struct folio *try_vma_alloc_movable_folio(struct vm_area_struct *vma, + unsigned long vaddr, int order, bool zeroed) +{ + struct folio *folio; + + for (; order > 1; order--) { + folio = vma_alloc_movable_folio(vma, vaddr, order, zeroed); + if (folio) + return folio; + } + + return vma_alloc_movable_folio(vma, vaddr, 0, zeroed); +} + /* * Handle write page faults for pages that can be reused in the current vma * From patchwork Fri Apr 14 13:02:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 13211586 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 52BC8C77B72 for ; Fri, 14 Apr 2023 14:16:55 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:Cc:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=7okxUHoKL6m7+fXJw8CH//uhZsjik+IRi0cZ3fj6Z8s=; b=R5h7Nz7NnNxwHq Uta7ITIslZ/N777174b5iRio0E/vFj4ccCxzD8hHKxED9wfZTjEQ+NVZOWOnqvEk/30LsexNb2Fbr W6gAo95LwEVgB+EMevjBFuMRDxix5L75EthkPMpDLy8FDjklbWodozqunNLsD10VwKJNtQOvJxran qb5ZIlLyTQEW8OPx63rXyAWV22Ko4FnUrGLyo1W0d7WEvY6WPhYPVzWI2TIuQYCr+oY4o5PjHN7tZ 9vFt09ZxmMEo3A/t2pPn9RdZAEt0nYNXbfkIW0onqgxoBdebQXC2uYyvXDHBwJ4BK07KgJeZ47zx5 Ufmr1Na1cmTOXS23vM/w==; Received: from localhost ([::1] helo=bombadil.infradead.org) by 
bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pnKDk-009n4M-1f; Fri, 14 Apr 2023 14:16:12 +0000 Received: from foss.arm.com ([217.140.110.172]) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pnJ5I-009bMt-2f for linux-arm-kernel@lists.infradead.org; Fri, 14 Apr 2023 13:03:30 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 73A851756; Fri, 14 Apr 2023 06:04:08 -0700 (PDT) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id EF0BF3F6C4; Fri, 14 Apr 2023 06:03:22 -0700 (PDT) From: Ryan Roberts To: Andrew Morton , "Matthew Wilcox (Oracle)" , Yu Zhao , "Yin, Fengwei" Cc: Ryan Roberts , linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org Subject: [RFC v2 PATCH 04/17] mm: Implement folio_add_new_anon_rmap_range() Date: Fri, 14 Apr 2023 14:02:50 +0100 Message-Id: <20230414130303.2345383-5-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230414130303.2345383-1-ryan.roberts@arm.com> References: <20230414130303.2345383-1-ryan.roberts@arm.com> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20230414_060324_996231_CF2EB268 X-CRM114-Status: GOOD ( 14.31 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Like folio_add_new_anon_rmap() but batch-rmaps a range of pages belonging to a folio, for effciency savings. All pages are accounted as small pages. Signed-off-by: Ryan Roberts --- include/linux/rmap.h | 2 ++ mm/rmap.c | 43 +++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 45 insertions(+) -- 2.25.1 diff --git a/include/linux/rmap.h b/include/linux/rmap.h index b87d01660412..5c707f53d7b5 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -196,6 +196,8 @@ void page_add_new_anon_rmap(struct page *, struct vm_area_struct *, unsigned long address); void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *, unsigned long address); +void folio_add_new_anon_rmap_range(struct folio *folio, struct page *page, + int nr, struct vm_area_struct *vma, unsigned long address); void page_add_file_rmap(struct page *, struct vm_area_struct *, bool compound); void page_remove_rmap(struct page *, struct vm_area_struct *, diff --git a/mm/rmap.c b/mm/rmap.c index 8632e02661ac..d563d979c005 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1302,6 +1302,49 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma, __page_set_anon_rmap(folio, &folio->page, vma, address, 1); } +/** + * folio_add_new_anon_rmap_range - Add mapping to a set of pages within a new + * anonymous potentially large folio. + * @folio: The folio containing the pages to be mapped + * @page: First page in the folio to be mapped + * @nr: Number of pages to be mapped + * @vma: the vm area in which the mapping is added + * @address: the user virtual address of the first page to be mapped + * + * Like folio_add_new_anon_rmap() but batch-maps a range of pages within a folio + * using non-THP accounting. Like folio_add_new_anon_rmap(), the inc-and-test is + * bypassed and the folio does not have to be locked. All pages in the folio are + * individually accounted. 
+ * + * As the folio is new, it's assumed to be mapped exclusively by a single + * process. + */ +void folio_add_new_anon_rmap_range(struct folio *folio, struct page *page, + int nr, struct vm_area_struct *vma, unsigned long address) +{ + int i; + + VM_BUG_ON_VMA(address < vma->vm_start || + address + (nr << PAGE_SHIFT) > vma->vm_end, vma); + __folio_set_swapbacked(folio); + + if (folio_test_large(folio)) { + /* increment count (starts at 0) */ + atomic_set(&folio->_nr_pages_mapped, nr); + } + + for (i = 0; i < nr; i++) { + /* increment count (starts at -1) */ + atomic_set(&page->_mapcount, 0); + __page_set_anon_rmap(folio, page, vma, address, 1); + page++; + address += PAGE_SIZE; + } + + __lruvec_stat_mod_folio(folio, NR_ANON_MAPPED, nr); + +} + /** * page_add_file_rmap - add pte mapping to a file page * @page: the page to add the mapping to From patchwork Fri Apr 14 13:02:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 13211474 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 96137C77B76 for ; Fri, 14 Apr 2023 13:04:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:Cc:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=ExKz1vGRnVpwK+YfqpvJTBpUG9qyIcnnhoB8pCUzkvA=; b=JbKQueoq7f33Kk HBYO811Iwk2+JlurWMqZ5lIozEpGzRhwuG5DKjXgenpG9ER4GTpYV8A8an/hxL2lFkL/XB/09xMNw jsxT0aI8pDvFLhM3wjylLIMG65osOBAD2FwqQ+Tp1vkEw1ZJPrXvN6M93oKpG6OW9k5TybKXrJf9Y ozAC2mK75kpx/5nuhtTYLGmrUErzO7UXrQIOuGx0IFn7tHeANdKc0J46ISJ2N9vySbKdFtlGgLIyu Y8Yqvif+p9kPN4k7myFMcG3evhB+jmiv+AbCo/ko+aqY2ssd91RLS0hIZvAelTIMGscLOODfsa49H S5mVU2MW5FAE47A4CyaA==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pnJ5l-009bco-0q; Fri, 14 Apr 2023 13:03:53 +0000 Received: from foss.arm.com ([217.140.110.172]) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pnJ5M-009bQ5-2U for linux-arm-kernel@lists.infradead.org; Fri, 14 Apr 2023 13:03:37 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id A585D1758; Fri, 14 Apr 2023 06:04:09 -0700 (PDT) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 2CD9F3F6C4; Fri, 14 Apr 2023 06:03:24 -0700 (PDT) From: Ryan Roberts To: Andrew Morton , "Matthew Wilcox (Oracle)" , Yu Zhao , "Yin, Fengwei" Cc: Ryan Roberts , linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org Subject: [RFC v2 PATCH 05/17] mm: Routines to determine max anon folio allocation order Date: Fri, 14 Apr 2023 14:02:51 +0100 Message-Id: <20230414130303.2345383-6-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230414130303.2345383-1-ryan.roberts@arm.com> References: 
<20230414130303.2345383-1-ryan.roberts@arm.com> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20230414_060328_932211_7EBA77CC X-CRM114-Status: GOOD ( 10.07 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org For variable-order anonymous folios, we want to tune the order that we prefer to allocate based on the vma. Add the routines to manage that heuristic. TODO: Currently we always use the global maximum. Add per-vma logic! Signed-off-by: Ryan Roberts --- include/linux/mm.h | 5 +++++ mm/memory.c | 8 ++++++++ 2 files changed, 13 insertions(+) -- 2.25.1 diff --git a/include/linux/mm.h b/include/linux/mm.h index cdb8c6031d0f..cc8d0b239116 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3674,4 +3674,9 @@ madvise_set_anon_name(struct mm_struct *mm, unsigned long start, } #endif +/* + * TODO: Should this be set per-architecture? + */ +#define ANON_FOLIO_ORDER_MAX 4 + #endif /* _LINUX_MM_H */ diff --git a/mm/memory.c b/mm/memory.c index ca32f59acef2..d7e34a8c46aa 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3022,6 +3022,14 @@ static struct folio *try_vma_alloc_movable_folio(struct vm_area_struct *vma, return vma_alloc_movable_folio(vma, vaddr, 0, zeroed); } +static inline int max_anon_folio_order(struct vm_area_struct *vma) +{ + /* + * TODO: Policy for maximum folio order should likely be per-vma. + */ + return ANON_FOLIO_ORDER_MAX; +} + /* * Handle write page faults for pages that can be reused in the current vma * From patchwork Fri Apr 14 13:02:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 13211480 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 7B3CBC77B76 for ; Fri, 14 Apr 2023 13:05:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:Cc:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=0vGimOWDgtE7DWQSIToOOnorsImDW4PmoLGXjlyGRng=; b=1ie6QzOCM6n8qf Vujufqd/ezx2qxMCuElbseKZzIzgfWVAzm6HSgvF9mpnzTsM/1YHg7Q5F2f6SSdsletVzBE5bLtRS zSuDBUpo2ag1azRk2a46c4gPDOBbk8Oey5M0GXZzdylovyzPAJT8+OEp81hAMG1YI62laz42v0ILX ZBRMUo/aXfrtZjPGewNP5fW3KJ4lVsf6BjRkApU/Gdj0uXKbsRD5GWtWhTHRtZuThRuYkyEO7L/E2 Js1LySooNYwF1wd7h8Ek1qSIWoM+cDD4qrlWUCISz6CSk+g1DPiln3xsHIzNF3DFtfH1N0fraFWeu eIV3iMfqNocnwAsnlsdw==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pnJ5o-009bed-26; Fri, 14 Apr 2023 13:03:57 +0000 Received: from desiato.infradead.org ([2001:8b0:10b:1:d65d:64ff:fe57:4e05]) by bombadil.infradead.org with esmtps (Exim 4.96 #2 (Red Hat Linux)) id 1pnJ5W-009bVm-0l for 
linux-arm-kernel@bombadil.infradead.org; Fri, 14 Apr 2023 13:03:38 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Transfer-Encoding:MIME-Version :References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender:Reply-To: Content-Type:Content-ID:Content-Description; bh=umfZXG2tnEOVqPFoBPLwRJ4GEhHWiFP35tKb/83gX6k=; b=FlWrBjCIoqrB7SRFQaWNoZLU9w oiYjLOEinAX3qeBRz4hSshVR6I/GXA/YzG2yESSVlLXudmqk6BgdeR8LpWCiuBqi8HUVWuvf5Oqk6 KhN90LuftojxqloI8suuscaDlZU188oXyd9hXeCrk4XgBTp2OJdV6aiM+VJSQkK1N783wh6xsAzAu eYkfpX0/ogVUqNsHj6Gg/ZeHy2Ys89xFP5tL8Evexxa7U98i+UQedZkQGx96myNDnaySklP+w2sF1 FbZ1ZPd77s7C+4lxAa2xrZpdwG/iEvLcA8S6v3wJk3qdWRQIV1iFfs049SN7W0GdA5RF7ap7W15CJ bPvqV6Xw==; Received: from foss.arm.com ([217.140.110.172]) by desiato.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pnJ5R-00Fa15-0E for linux-arm-kernel@lists.infradead.org; Fri, 14 Apr 2023 13:03:36 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D7B5F175A; Fri, 14 Apr 2023 06:04:10 -0700 (PDT) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 5F32E3F6C4; Fri, 14 Apr 2023 06:03:25 -0700 (PDT) From: Ryan Roberts To: Andrew Morton , "Matthew Wilcox (Oracle)" , Yu Zhao , "Yin, Fengwei" Cc: Ryan Roberts , linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org Subject: [RFC v2 PATCH 06/17] mm: Allocate large folios for anonymous memory Date: Fri, 14 Apr 2023 14:02:52 +0100 Message-Id: <20230414130303.2345383-7-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230414130303.2345383-1-ryan.roberts@arm.com> References: <20230414130303.2345383-1-ryan.roberts@arm.com> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20230414_140333_503237_176A1B7B X-CRM114-Status: GOOD ( 25.83 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Add the machinery to determine what order of folio to allocate within do_anonymous_page() and deal with racing faults to the same region. For now, the maximum order is set to 4. This should probably be set per-vma based on factors, and adjusted dynamically. Signed-off-by: Ryan Roberts --- mm/memory.c | 154 ++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 138 insertions(+), 16 deletions(-) -- 2.25.1 diff --git a/mm/memory.c b/mm/memory.c index d7e34a8c46aa..f92a28064596 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3030,6 +3030,90 @@ static inline int max_anon_folio_order(struct vm_area_struct *vma) return ANON_FOLIO_ORDER_MAX; } +/* + * Returns index of first pte that is not none, or nr if all are none. + */ +static inline int check_ptes_none(pte_t *pte, int nr) +{ + int i; + + for (i = 0; i < nr; i++) { + if (!pte_none(*pte++)) + return i; + } + + return nr; +} + +static int calc_anon_folio_order_alloc(struct vm_fault *vmf, int order) +{ + /* + * The aim here is to determine what size of folio we should allocate + * for this fault. 
Factors include: + * - Order must not be higher than `order` upon entry + * - Folio must be naturally aligned within VA space + * - Folio must not breach boundaries of vma + * - Folio must be fully contained inside one pmd entry + * - Folio must not overlap any non-none ptes + * + * Additionally, we do not allow order-1 since this breaks assumptions + * elsewhere in the mm; THP pages must be at least order-2 (since they + * store state up to the 3rd struct page subpage), and these pages must + * be THP in order to correctly use pre-existing THP infrastructure such + * as folio_split(). + * + * As a consequence of relying on the THP infrastructure, if the system + * does not support THP, we always fallback to order-0. + * + * Note that the caller may or may not choose to lock the pte. If + * unlocked, the calculation should be considered an estimate that will + * need to be validated under the lock. + */ + + struct vm_area_struct *vma = vmf->vma; + int nr; + unsigned long addr; + pte_t *pte; + pte_t *first_set = NULL; + int ret; + + if (has_transparent_hugepage()) { + order = min(order, PMD_SHIFT - PAGE_SHIFT); + + for (; order > 1; order--) { + nr = 1 << order; + addr = ALIGN_DOWN(vmf->address, nr << PAGE_SHIFT); + pte = vmf->pte - ((vmf->address - addr) >> PAGE_SHIFT); + + /* Check vma bounds. */ + if (addr < vma->vm_start || + addr + (nr << PAGE_SHIFT) > vma->vm_end) + continue; + + /* Ptes covered by order already known to be none. */ + if (pte + nr <= first_set) + break; + + /* Already found set pte in range covered by order. */ + if (pte <= first_set) + continue; + + /* Need to check if all the ptes are none. */ + ret = check_ptes_none(pte, nr); + if (ret == nr) + break; + + first_set = pte + ret; + } + + if (order == 1) + order = 0; + } else + order = 0; + + return order; +} + /* * Handle write page faults for pages that can be reused in the current vma * @@ -4058,6 +4142,9 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) struct folio *folio; vm_fault_t ret = 0; pte_t entry; + unsigned long addr; + int order = max_anon_folio_order(vma); + int pgcount = BIT(order); /* File mapping without ->vm_ops ? */ if (vma->vm_flags & VM_SHARED) @@ -4099,24 +4186,42 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) pte_unmap_unlock(vmf->pte, vmf->ptl); return handle_userfault(vmf, VM_UFFD_MISSING); } - goto setpte; + set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry); + + /* No need to invalidate - it was non-present before */ + update_mmu_cache(vma, vmf->address, vmf->pte); + goto unlock; } - /* Allocate our own private page. */ +retry: + /* + * Estimate the folio order to allocate. We are not under the ptl here + * so this estiamte needs to be re-checked later once we have the lock. + */ + vmf->pte = pte_offset_map(vmf->pmd, vmf->address); + order = calc_anon_folio_order_alloc(vmf, order); + pte_unmap(vmf->pte); + + /* Allocate our own private folio. */ if (unlikely(anon_vma_prepare(vma))) goto oom; - folio = vma_alloc_zeroed_movable_folio(vma, vmf->address, 0, 0); + folio = try_vma_alloc_movable_folio(vma, vmf->address, order, true); if (!folio) goto oom; + /* We may have been granted less than we asked for. 
*/ + order = folio_order(folio); + pgcount = BIT(order); + addr = ALIGN_DOWN(vmf->address, pgcount << PAGE_SHIFT); + if (mem_cgroup_charge(folio, vma->vm_mm, GFP_KERNEL)) goto oom_free_page; - cgroup_throttle_swaprate(&folio->page, GFP_KERNEL); + folio_throttle_swaprate(folio, GFP_KERNEL); /* * The memory barrier inside __folio_mark_uptodate makes sure that - * preceding stores to the page contents become visible before - * the set_pte_at() write. + * preceding stores to the folio contents become visible before + * the set_ptes() write. */ __folio_mark_uptodate(folio); @@ -4125,11 +4230,26 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) if (vma->vm_flags & VM_WRITE) entry = pte_mkwrite(pte_mkdirty(entry)); - vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, - &vmf->ptl); - if (!pte_none(*vmf->pte)) { - update_mmu_tlb(vma, vmf->address, vmf->pte); - goto release; + vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl); + + /* + * Ensure our estimate above is still correct; we could have raced with + * another thread to service a fault in the region. + */ + if (unlikely(check_ptes_none(vmf->pte, pgcount) != pgcount)) { + pte_t *pte = vmf->pte + ((vmf->address - addr) >> PAGE_SHIFT); + + /* If faulting pte was allocated by another, exit early. */ + if (order == 0 || !pte_none(*pte)) { + update_mmu_tlb(vma, vmf->address, pte); + goto release; + } + + /* Else try again, with a lower order. */ + pte_unmap_unlock(vmf->pte, vmf->ptl); + folio_put(folio); + order--; + goto retry; } ret = check_stable_address_space(vma->vm_mm); @@ -4143,14 +4263,16 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) return handle_userfault(vmf, VM_UFFD_MISSING); } - inc_mm_counter(vma->vm_mm, MM_ANONPAGES); - folio_add_new_anon_rmap(folio, vma, vmf->address); + folio_ref_add(folio, pgcount - 1); + + add_mm_counter(vma->vm_mm, MM_ANONPAGES, pgcount); + folio_add_new_anon_rmap_range(folio, &folio->page, pgcount, vma, addr); folio_add_lru_vma(folio, vma); -setpte: - set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry); + + set_ptes(vma->vm_mm, addr, vmf->pte, entry, pgcount); /* No need to invalidate - it was non-present before */ - update_mmu_cache(vma, vmf->address, vmf->pte); + update_mmu_cache_range(vma, addr, vmf->pte, pgcount); unlock: pte_unmap_unlock(vmf->pte, vmf->ptl); return ret; From patchwork Fri Apr 14 13:02:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 13211475 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id DAB9DC77B72 for ; Fri, 14 Apr 2023 13:04:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:Cc:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=/AbvSSOuUteGB2AC8v00mDgcRxz1SMK9C+LcC69HXK0=; b=sU3ufPRFx5UMBM eq75A2vru/MLaEL88URqfLbcIg36dm0XXb+I88AtGZJKVXwIenyoKVEGxz59s8FDtdvXHBcDqE79I 
1IHjgrRKTXuxM3lZKWvGAVESIKEXtHvQhzjZYsd5lbfjgxV0iBhw1MC/dvW8Wd3z92orosiJjCf1Y 9lFOi+RxkFHZTsFVH1yu5eSMs6xiag+EmQNnesBjsla2dfXCn9pl+EN6IXwgTDTR5zpcFQqzEUGsQ 2Hlctx0HCgafvs+IWGzHLxPDzMt+bmOl5ASPHJC4mK2dbWM5W+Wd8SaadjEFmyKxYQJzWeRCFv8CN w7eJuq07ZucydvK6WzRQ==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pnJ5k-009bcE-07; Fri, 14 Apr 2023 13:03:52 +0000 Received: from foss.arm.com ([217.140.110.172]) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pnJ5M-009bQI-2U for linux-arm-kernel@lists.infradead.org; Fri, 14 Apr 2023 13:03:37 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 15934175D; Fri, 14 Apr 2023 06:04:12 -0700 (PDT) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 911753F6C4; Fri, 14 Apr 2023 06:03:26 -0700 (PDT) From: Ryan Roberts To: Andrew Morton , "Matthew Wilcox (Oracle)" , Yu Zhao , "Yin, Fengwei" Cc: Ryan Roberts , linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org Subject: [RFC v2 PATCH 07/17] mm: Allow deferred splitting of arbitrary large anon folios Date: Fri, 14 Apr 2023 14:02:53 +0100 Message-Id: <20230414130303.2345383-8-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230414130303.2345383-1-ryan.roberts@arm.com> References: <20230414130303.2345383-1-ryan.roberts@arm.com> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20230414_060328_931954_7E3E42BB X-CRM114-Status: GOOD ( 12.42 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org With the introduction of large folios for anonymous memory, we would like to be able to split them when they have unmapped subpages, in order to free those unused pages under memory pressure. So remove the artificial requirement that the large folio needed to be at least PMD-sized. Signed-off-by: Ryan Roberts --- mm/rmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- 2.25.1 diff --git a/mm/rmap.c b/mm/rmap.c index d563d979c005..5148a484f915 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1470,7 +1470,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma, * page of the folio is unmapped and at least one page * is still mapped. 
*/ - if (folio_test_pmd_mappable(folio) && folio_test_anon(folio)) + if (folio_test_large(folio) && folio_test_anon(folio)) if (!compound || nr < nr_pmdmapped) deferred_split_folio(folio); } From patchwork Fri Apr 14 13:02:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 13211476 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 7CC6BC77B6E for ; Fri, 14 Apr 2023 13:04:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:Cc:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=uHEgDlSvnvxHmHw+YNwR035Flem0G1VEGsx3SCxIsk8=; b=xUVv/N3urNLKqT Cgftmzj+CGQ4Chcg0NUl8PfBa+CHfd7N3d0VCAt75mlbHb+cBRxBwCvD8ckJkmUnqrgeL2q/xMZbN LZtuwNNXGVIulQDqffArBAHDDO8FtjK4I0LyXBazB6cy7do4HctuD95XNJe4TdzB1RzrdRd8xdiG6 hYv26OgyELf72KAXM1ANJicPVsmnxph+vfBmWkNgYZ6j2+kwVpCMk4i0kqIpP3zd8skJPVbIVvIIc bSxPRGGom3EXP6dhM7PZoeGN3WmbZUxDV0SJkFr8LAElN1o3IxZdT4t0oS8JHllzVHlU4ElwQNuXi Egjniuyo9h+0e6ZNF5dw==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pnJ5m-009bdW-32; Fri, 14 Apr 2023 13:03:55 +0000 Received: from foss.arm.com ([217.140.110.172]) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pnJ5N-009bQw-2q for linux-arm-kernel@lists.infradead.org; Fri, 14 Apr 2023 13:03:37 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 47F6A1762; Fri, 14 Apr 2023 06:04:13 -0700 (PDT) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id C33FF3F6C4; Fri, 14 Apr 2023 06:03:27 -0700 (PDT) From: Ryan Roberts To: Andrew Morton , "Matthew Wilcox (Oracle)" , Yu Zhao , "Yin, Fengwei" Cc: Ryan Roberts , linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org Subject: [RFC v2 PATCH 08/17] mm: Implement folio_move_anon_rmap_range() Date: Fri, 14 Apr 2023 14:02:54 +0100 Message-Id: <20230414130303.2345383-9-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230414130303.2345383-1-ryan.roberts@arm.com> References: <20230414130303.2345383-1-ryan.roberts@arm.com> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20230414_060330_065736_8C849BCB X-CRM114-Status: GOOD ( 14.34 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Similar to page_move_anon_rmap() except it can batch-move a range of pages within a folio for increased efficiency. 
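As a quick sketch of the intended batch use during a CoW write fault (locals are assumed; the folio must be locked, as for page_move_anon_rmap()):

	/* Before: one call per reused page of the exclusive folio. */
	for (i = 0; i < nr; i++)
		page_move_anon_rmap(first_page + i, vma);

	/* After: one batched call covering the whole reused range. */
	folio_move_anon_rmap_range(folio, first_page, nr, vma);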
Will be used to enable reusing multiple pages from a large anonymous folio in one go. Signed-off-by: Ryan Roberts --- include/linux/rmap.h | 2 ++ mm/rmap.c | 40 ++++++++++++++++++++++++++++++---------- 2 files changed, 32 insertions(+), 10 deletions(-) -- 2.25.1 diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 5c707f53d7b5..8cb0ba48d58f 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -190,6 +190,8 @@ typedef int __bitwise rmap_t; * rmap interfaces called when adding or removing pte of page */ void page_move_anon_rmap(struct page *, struct vm_area_struct *); +void folio_move_anon_rmap_range(struct folio *folio, struct page *page, + int nr, struct vm_area_struct *vma); void page_add_anon_rmap(struct page *, struct vm_area_struct *, unsigned long address, rmap_t flags); void page_add_new_anon_rmap(struct page *, struct vm_area_struct *, diff --git a/mm/rmap.c b/mm/rmap.c index 5148a484f915..1cd8fb0b929f 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1103,19 +1103,22 @@ int folio_total_mapcount(struct folio *folio) } /** - * page_move_anon_rmap - move a page to our anon_vma - * @page: the page to move to our anon_vma - * @vma: the vma the page belongs to + * folio_move_anon_rmap_range - batch-move a range of pages within a folio to + * our anon_vma; a more efficient version of page_move_anon_rmap(). + * @folio: folio that owns the range of pages + * @page: the first page to move to our anon_vma + * @nr: number of pages to move to our anon_vma + * @vma: the vma the page belongs to * - * When a page belongs exclusively to one process after a COW event, - * that page can be moved into the anon_vma that belongs to just that - * process, so the rmap code will not search the parent or sibling - * processes. + * When a range of pages belongs exclusively to one process after a COW event, + * those pages can be moved into the anon_vma that belongs to just that process, + * so the rmap code will not search the parent or sibling processes. */ -void page_move_anon_rmap(struct page *page, struct vm_area_struct *vma) +void folio_move_anon_rmap_range(struct folio *folio, struct page *page, + int nr, struct vm_area_struct *vma) { void *anon_vma = vma->anon_vma; - struct folio *folio = page_folio(page); + int i; VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); VM_BUG_ON_VMA(!anon_vma, vma); @@ -1127,7 +1130,24 @@ void page_move_anon_rmap(struct page *page, struct vm_area_struct *vma) * folio_test_anon()) will not see one without the other. */ WRITE_ONCE(folio->mapping, anon_vma); - SetPageAnonExclusive(page); + + for (i = 0; i < nr; i++) + SetPageAnonExclusive(page++); +} + +/** + * page_move_anon_rmap - move a page to our anon_vma + * @page: the page to move to our anon_vma + * @vma: the vma the page belongs to + * + * When a page belongs exclusively to one process after a COW event, + * that page can be moved into the anon_vma that belongs to just that + * process, so the rmap code will not search the parent or sibling + * processes. 
+ */ +void page_move_anon_rmap(struct page *page, struct vm_area_struct *vma) +{ + folio_move_anon_rmap_range(page_folio(page), page, 1, vma); } /** From patchwork Fri Apr 14 13:02:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 13211477 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 981F4C77B6E for ; Fri, 14 Apr 2023 13:04:54 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:Cc:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=+GWRZuV5HqowrpHnoaXNXdv2JXxy1sD+AnIVAy1GkXs=; b=0ITDDvla+6kwME jL6ECcvOBQ9C/E93kAH1B4U1RKLfi/ITwn9Ay1S2670O83HAy/fj9sXjpYyKWnwuWhogYMMP4GBEf DsrbayeDsADZdYiLftdi9rxMJlMSLoxGvMn00SjeClQ/XtewqM8ARkatWtI68tXsqZjgSi3oklF/U 7+vUDRa8Pj1AVJdWEu2qfnNR26P62RNmUWhxFr5n999TjMYveLE6XJGdbV6eklGc3XYWdF9w5D9t+ vf/8WAEkb9sBBsuXKf2DtI1u0VFyeejxNAo6rBdpcgZ7BwlIv4DiHHpu79GK0moqeKFGICK7d3tld cLdKGXN2X1+1+h+KoJow==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pnJ5r-009bfo-1C; Fri, 14 Apr 2023 13:03:59 +0000 Received: from foss.arm.com ([217.140.110.172]) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pnJ5P-009bRi-1u for linux-arm-kernel@lists.infradead.org; Fri, 14 Apr 2023 13:03:38 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 7A1EA1763; Fri, 14 Apr 2023 06:04:14 -0700 (PDT) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 0171B3F6C4; Fri, 14 Apr 2023 06:03:28 -0700 (PDT) From: Ryan Roberts To: Andrew Morton , "Matthew Wilcox (Oracle)" , Yu Zhao , "Yin, Fengwei" Cc: Ryan Roberts , linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org Subject: [RFC v2 PATCH 09/17] mm: Update wp_page_reuse() to operate on range of pages Date: Fri, 14 Apr 2023 14:02:55 +0100 Message-Id: <20230414130303.2345383-10-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230414130303.2345383-1-ryan.roberts@arm.com> References: <20230414130303.2345383-1-ryan.roberts@arm.com> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20230414_060331_737686_702A6038 X-CRM114-Status: GOOD ( 18.03 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org We will shortly be updating do_wp_page() to be able to reuse a range of pages from a large anon folio. As an enabling step, modify wp_page_reuse() to operate on a range of pages, if a struct anon_folio_range is passed in. 
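For orientation, a sketch of how a batch caller later in the series would describe the reused range (locals are assumed); passing NULL keeps the single-page behaviour, which is what every existing callsite does:

	struct anon_folio_range range = {
		.va_start  = addr,	/* VA mapping the first reused page    */
		.pte_start = pte,	/* pte entry for the first reused page */
		.pg_start  = page,	/* first struct page of the range      */
		.nr        = nr,	/* number of pages being reused        */
		.exclusive = true,	/* range is exclusive to this process  */
	};

	wp_page_reuse(vmf, &range);	/* batch path; wp_page_reuse(vmf, NULL) is the old path */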
Batching in this way allows us to batch up the cache maintenance and event counting for small performance improvements. Currently all callsites pass range=NULL, so no functional changes intended. Signed-off-by: Ryan Roberts --- mm/memory.c | 80 +++++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 60 insertions(+), 20 deletions(-) -- 2.25.1 diff --git a/mm/memory.c b/mm/memory.c index f92a28064596..83835ff5a818 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3030,6 +3030,14 @@ static inline int max_anon_folio_order(struct vm_area_struct *vma) return ANON_FOLIO_ORDER_MAX; } +struct anon_folio_range { + unsigned long va_start; + pte_t *pte_start; + struct page *pg_start; + int nr; + bool exclusive; +}; + /* * Returns index of first pte that is not none, or nr if all are none. */ @@ -3122,31 +3130,63 @@ static int calc_anon_folio_order_alloc(struct vm_fault *vmf, int order) * case, all we need to do here is to mark the page as writable and update * any related book-keeping. */ -static inline void wp_page_reuse(struct vm_fault *vmf) +static inline void wp_page_reuse(struct vm_fault *vmf, + struct anon_folio_range *range) __releases(vmf->ptl) { struct vm_area_struct *vma = vmf->vma; - struct page *page = vmf->page; + unsigned long addr; + pte_t *pte; + struct page *page; + int nr; pte_t entry; + int change = 0; + int i; VM_BUG_ON(!(vmf->flags & FAULT_FLAG_WRITE)); - VM_BUG_ON(page && PageAnon(page) && !PageAnonExclusive(page)); - /* - * Clear the pages cpupid information as the existing - * information potentially belongs to a now completely - * unrelated process. - */ - if (page) - page_cpupid_xchg_last(page, (1 << LAST_CPUPID_SHIFT) - 1); + if (range) { + addr = range->va_start; + pte = range->pte_start; + page = range->pg_start; + nr = range->nr; + } else { + addr = vmf->address; + pte = vmf->pte; + page = vmf->page; + nr = 1; + } + + if (page) { + for (i = 0; i < nr; i++, page++) { + VM_BUG_ON(PageAnon(page) && !PageAnonExclusive(page)); + + /* + * Clear the pages cpupid information as the existing + * information potentially belongs to a now completely + * unrelated process. 
+ */ + page_cpupid_xchg_last(page, + (1 << LAST_CPUPID_SHIFT) - 1); + } + } + + flush_cache_range(vma, addr, addr + (nr << PAGE_SHIFT)); + + for (i = 0; i < nr; i++) { + entry = pte_mkyoung(pte[i]); + entry = maybe_mkwrite(pte_mkdirty(entry), vma); + change |= ptep_set_access_flags(vma, + addr + (i << PAGE_SHIFT), + pte + i, + entry, 1); + } + + if (change) + update_mmu_cache_range(vma, addr, pte, nr); - flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte)); - entry = pte_mkyoung(vmf->orig_pte); - entry = maybe_mkwrite(pte_mkdirty(entry), vma); - if (ptep_set_access_flags(vma, vmf->address, vmf->pte, entry, 1)) - update_mmu_cache(vma, vmf->address, vmf->pte); pte_unmap_unlock(vmf->pte, vmf->ptl); - count_vm_event(PGREUSE); + count_vm_events(PGREUSE, nr); } /* @@ -3359,7 +3399,7 @@ vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf) pte_unmap_unlock(vmf->pte, vmf->ptl); return VM_FAULT_NOPAGE; } - wp_page_reuse(vmf); + wp_page_reuse(vmf, NULL); return 0; } @@ -3381,7 +3421,7 @@ static vm_fault_t wp_pfn_shared(struct vm_fault *vmf) return ret; return finish_mkwrite_fault(vmf); } - wp_page_reuse(vmf); + wp_page_reuse(vmf, NULL); return 0; } @@ -3410,7 +3450,7 @@ static vm_fault_t wp_page_shared(struct vm_fault *vmf) return tmp; } } else { - wp_page_reuse(vmf); + wp_page_reuse(vmf, NULL); lock_page(vmf->page); } ret |= fault_dirty_shared_page(vmf); @@ -3534,7 +3574,7 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf) pte_unmap_unlock(vmf->pte, vmf->ptl); return 0; } - wp_page_reuse(vmf); + wp_page_reuse(vmf, NULL); return 0; } copy: From patchwork Fri Apr 14 13:02:56 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 13211481 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 240CEC77B72 for ; Fri, 14 Apr 2023 13:05:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:Cc:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=X4Ytz+QO9m4CD09+6tof4sGWQJOoN/24lW2lHD9B6ew=; b=wR7zNKJ83e3mIL JM1N8oHOIAe150+clc47yIkzKhpSZUa9t8Zp3FqVIDlIaBqhvznq+GRbvo2eCikiZ+oo+o58fTSta 8tGC57ry53cO0Fvhdk7+vUP4fdR5xG7XB+1y9tOTImIZVkJRzMIZrYBMxGVbP+ihXSErg+MtuHGBH zjsfKMn5uKdFcIQnVupdYi1LqPaKrk1WuA/ckhETejU5Haz+RXqCUKSNBKUpoBcWqNoN65N1/5EWs Fzf/AT3QvFqKJFGi8VW6QEM8FE4hqTavETfXDKZO0KVdbCBQB2U0y9K9iqa+qJIjqI8FrZAtxACCv c4rc508xrxe5PPvNN6WQ==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pnJ5t-009bh6-0r; Fri, 14 Apr 2023 13:04:01 +0000 Received: from foss.arm.com ([217.140.110.172]) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pnJ5P-009bS7-2d for linux-arm-kernel@lists.infradead.org; Fri, 14 Apr 2023 13:03:38 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id AC7E21764; Fri, 14 Apr 2023 
06:04:15 -0700 (PDT) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 33CFE3F6C4; Fri, 14 Apr 2023 06:03:30 -0700 (PDT) From: Ryan Roberts To: Andrew Morton , "Matthew Wilcox (Oracle)" , Yu Zhao , "Yin, Fengwei" Cc: Ryan Roberts , linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org Subject: [RFC v2 PATCH 10/17] mm: Reuse large folios for anonymous memory Date: Fri, 14 Apr 2023 14:02:56 +0100 Message-Id: <20230414130303.2345383-11-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230414130303.2345383-1-ryan.roberts@arm.com> References: <20230414130303.2345383-1-ryan.roberts@arm.com> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20230414_060332_062681_C1A53A79 X-CRM114-Status: GOOD ( 28.88 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org When taking a write fault on an anonymous page, attempt to reuse as much of the folio as possible if it is exclusive to the process. This avoids a problem where an exclusive, PTE-mapped THP would previously have all of its pages except the last one CoWed, then the last page would be reused, causing the whole original folio to hang around as well as all the CoWed pages. This problem is exaserbated now that we are allocating variable-order folios for anonymous memory. The reason for this behaviour is that a PTE-mapped THP has a reference for each PTE and the old code thought that meant it was not exclusively mapped, and therefore could not be reused. We now take care to find the region that intersects the underlying folio, the VMA and the PMD entry and for the presence of that number of references as indicating exclusivity. Note that we are not guarranteed that this region will cover the whole folio due to munmap and mremap. The aim is to reuse as much as possible in one go in order to: - reduce memory consumption - reduce number of CoWs - reduce time spent in fault handler Signed-off-by: Ryan Roberts --- mm/memory.c | 169 +++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 160 insertions(+), 9 deletions(-) -- 2.25.1 diff --git a/mm/memory.c b/mm/memory.c index 83835ff5a818..7e2af54fe2e0 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3038,6 +3038,26 @@ struct anon_folio_range { bool exclusive; }; +static inline unsigned long page_addr(struct page *page, + struct page *anchor, unsigned long anchor_addr) +{ + unsigned long offset; + unsigned long addr; + + offset = (page_to_pfn(page) - page_to_pfn(anchor)) << PAGE_SHIFT; + addr = anchor_addr + offset; + + if (anchor > page) { + if (addr > anchor_addr) + return 0; + } else { + if (addr < anchor_addr) + return ULONG_MAX; + } + + return addr; +} + /* * Returns index of first pte that is not none, or nr if all are none. */ @@ -3122,6 +3142,122 @@ static int calc_anon_folio_order_alloc(struct vm_fault *vmf, int order) return order; } +static void calc_anon_folio_range_reuse(struct vm_fault *vmf, + struct folio *folio, + struct anon_folio_range *range_out) +{ + /* + * The aim here is to determine the biggest range of pages that can be + * reused for this CoW fault if the identified range is responsible for + * all the references on the folio (i.e. 
it is exclusive) such that: + * - All pages are contained within folio + * - All pages are within VMA + * - All pages are within the same pmd entry as vmf->address + * - vmf->page is contained within the range + * - All covered ptes must be present, physically contiguous and RO + * + * Note that the folio itself may not be naturally aligned in VA space + * due to mremap. We take the largest range we can in order to increase + * our chances of being the exclusive user of the folio, therefore + * meaning we can reuse. Its possible that the folio crosses a pmd + * boundary, in which case we don't follow it into the next pte because + * this complicates the locking. + * + * Note that the caller may or may not choose to lock the pte. If + * unlocked, the calculation should be considered an estimate that will + * need to be validated under the lock. + */ + + struct vm_area_struct *vma = vmf->vma; + struct page *page; + pte_t *ptep; + pte_t pte; + bool excl = true; + unsigned long start, end; + int bloops, floops; + int i; + unsigned long pfn; + + /* + * Iterate backwards, starting with the page immediately before the + * anchor page. On exit from the loop, start is the inclusive start + * virtual address of the range. + */ + + start = page_addr(&folio->page, vmf->page, vmf->address); + start = max(start, vma->vm_start); + start = max(start, ALIGN_DOWN(vmf->address, PMD_SIZE)); + bloops = (vmf->address - start) >> PAGE_SHIFT; + + page = vmf->page - 1; + ptep = vmf->pte - 1; + pfn = page_to_pfn(vmf->page) - 1; + + for (i = 0; i < bloops; i++) { + pte = *ptep; + + if (!pte_present(pte) || + pte_write(pte) || + pte_protnone(pte) || + pte_pfn(pte) != pfn) { + start = vmf->address - (i << PAGE_SHIFT); + break; + } + + if (excl && !PageAnonExclusive(page)) + excl = false; + + pfn--; + ptep--; + page--; + } + + /* + * Iterate forward, starting with the anchor page. On exit from the + * loop, end is the exclusive end virtual address of the range. + */ + + end = page_addr(&folio->page + folio_nr_pages(folio), + vmf->page, vmf->address); + end = min(end, vma->vm_end); + end = min(end, ALIGN_DOWN(vmf->address, PMD_SIZE) + PMD_SIZE); + floops = (end - vmf->address) >> PAGE_SHIFT; + + page = vmf->page; + ptep = vmf->pte; + pfn = page_to_pfn(vmf->page); + + for (i = 0; i < floops; i++) { + pte = *ptep; + + if (!pte_present(pte) || + pte_write(pte) || + pte_protnone(pte) || + pte_pfn(pte) != pfn) { + end = vmf->address + (i << PAGE_SHIFT); + break; + } + + if (excl && !PageAnonExclusive(page)) + excl = false; + + pfn++; + ptep++; + page++; + } + + /* + * Fixup vmf to point to the start of the range, and return number of + * pages in range. + */ + + range_out->va_start = start; + range_out->pg_start = vmf->page - ((vmf->address - start) >> PAGE_SHIFT); + range_out->pte_start = vmf->pte - ((vmf->address - start) >> PAGE_SHIFT); + range_out->nr = (end - start) >> PAGE_SHIFT; + range_out->exclusive = excl; +} + /* * Handle write page faults for pages that can be reused in the current vma * @@ -3528,13 +3664,23 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf) /* * Private mapping: create an exclusive anonymous page copy if reuse * is impossible. We might miss VM_WRITE for FOLL_FORCE handling. + * For anonymous memory, we attempt to copy/reuse in folios rather than + * page-by-page. We always prefer reuse above copy, even if we can only + * reuse a subset of the folio. 
Note that when reusing pages in a folio, + * due to munmap, mremap and friends, the folio isn't guarranteed to be + * naturally aligned in virtual memory space. */ if (folio && folio_test_anon(folio)) { + struct anon_folio_range range; + int swaprefs; + + calc_anon_folio_range_reuse(vmf, folio, &range); + /* - * If the page is exclusive to this process we must reuse the - * page without further checks. + * If the pages have already been proven to be exclusive to this + * process we must reuse the pages without further checks. */ - if (PageAnonExclusive(vmf->page)) + if (range.exclusive) goto reuse; /* @@ -3544,7 +3690,10 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf) * * KSM doesn't necessarily raise the folio refcount. */ - if (folio_test_ksm(folio) || folio_ref_count(folio) > 3) + swaprefs = folio_test_swapcache(folio) ? + folio_nr_pages(folio) : 0; + if (folio_test_ksm(folio) || + folio_ref_count(folio) > range.nr + swaprefs + 1) goto copy; if (!folio_test_lru(folio)) /* @@ -3552,29 +3701,31 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf) * remote LRU pagevecs or references to LRU folios. */ lru_add_drain(); - if (folio_ref_count(folio) > 1 + folio_test_swapcache(folio)) + if (folio_ref_count(folio) > range.nr + swaprefs) goto copy; if (!folio_trylock(folio)) goto copy; if (folio_test_swapcache(folio)) folio_free_swap(folio); - if (folio_test_ksm(folio) || folio_ref_count(folio) != 1) { + if (folio_test_ksm(folio) || + folio_ref_count(folio) != range.nr) { folio_unlock(folio); goto copy; } /* - * Ok, we've got the only folio reference from our mapping + * Ok, we've got the only folio references from our mapping * and the folio is locked, it's dark out, and we're wearing * sunglasses. Hit it. */ - page_move_anon_rmap(vmf->page, vma); + folio_move_anon_rmap_range(folio, range.pg_start, + range.nr, vma); folio_unlock(folio); reuse: if (unlikely(unshare)) { pte_unmap_unlock(vmf->pte, vmf->ptl); return 0; } - wp_page_reuse(vmf, NULL); + wp_page_reuse(vmf, &range); return 0; } copy: From patchwork Fri Apr 14 13:02:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 13211478 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D91DEC77B76 for ; Fri, 14 Apr 2023 13:04:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:Cc:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=a4AYw8QOLLD4dhUrLmIyKk6IEM161rMTZLOInlHNLW8=; b=nIzC1UAQ09EFkc OW8xnc0JdcwFVSq3zkIv7LupHA4SkS8zIQDIZDA5ZC4qcx4L5mLlawmB1RBkk7YCkXGjN8cZXkUBz Vn6jmgJY5MjCE7Wu3Sg/nBdqRokj+U/1hxf7CaZAiHaY154+kcqD+PH+Mzt9WHsrXhbn+lvTqN05n DkoHdh14gcFtabnjxS6F0JMI/DdANM3z9QPicwBzUj/324lR8PkRpMaBLIENRgnySNUzTECqSP9Aq YD1cWjeEWIurCLbOZShicjmqdd2S69mdq5KUyYVy1cTAbMfy+P7lhexovLTGex31p8vA8FSWR3DR3 zMyqrXvFQbVFI41/z92w==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org 
with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pnJ5u-009biC-2n; Fri, 14 Apr 2023 13:04:02 +0000 Received: from foss.arm.com ([217.140.110.172]) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pnJ5R-009bSz-24 for linux-arm-kernel@lists.infradead.org; Fri, 14 Apr 2023 13:03:39 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id DE8F04B3; Fri, 14 Apr 2023 06:04:16 -0700 (PDT) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 661D03F6C4; Fri, 14 Apr 2023 06:03:31 -0700 (PDT) From: Ryan Roberts To: Andrew Morton , "Matthew Wilcox (Oracle)" , Yu Zhao , "Yin, Fengwei" Cc: Ryan Roberts , linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org Subject: [RFC v2 PATCH 11/17] mm: Split __wp_page_copy_user() into 2 variants Date: Fri, 14 Apr 2023 14:02:57 +0100 Message-Id: <20230414130303.2345383-12-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230414130303.2345383-1-ryan.roberts@arm.com> References: <20230414130303.2345383-1-ryan.roberts@arm.com> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20230414_060333_799686_F26B835F X-CRM114-Status: GOOD ( 17.42 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org We will soon support CoWing large folios, so will need support for copying a contiguous range of pages in the case where there is a source folio. Therefore, split __wp_page_copy_user() into 2 variants: __wp_page_copy_user_pfn() copies a single pfn to a destination page. This is used when CoWing from a source without a folio, and is always only a single page copy. __wp_page_copy_user_range() copies a range of pages from source to destination and is used when the source has an underlying folio. For now it is only used to copy a single page, but this will change in a future commit. In both cases, kmsan_copy_page_meta() is moved into these helper functions so that the caller does not need to be concerned with calling it multiple times for the range case. No functional changes intended. 
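To make the split concrete, here is a hedged userspace model (plain memcpy standing in for the kernel's copy_mc_user_highpage() and kmap-based best-effort copy; this is not the actual implementation): the range variant is a loop over page-sized copies, while the pfn variant handles exactly one page.

#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Model of __wp_page_copy_user_range(): copy nr contiguous pages. */
static int copy_user_range(char *dst, const char *src, int nr)
{
	for (; nr != 0; nr--, dst += PAGE_SIZE, src += PAGE_SIZE) {
		/* Kernel: copy_mc_user_highpage() + kmsan_copy_page_meta(). */
		memcpy(dst, src, PAGE_SIZE);
	}
	return 0;	/* kernel returns -EHWPOISON if a source page is poisoned */
}

/* Model of __wp_page_copy_user_pfn(): single best-effort page copy. */
static int copy_user_pfn(char *dst, const char *src)
{
	memcpy(dst, src, PAGE_SIZE);
	return 0;	/* kernel may return -EAGAIN if the copy races with unmap */
}

int main(void)
{
	static char src[4 * PAGE_SIZE] = "hello";
	static char dst[4 * PAGE_SIZE];

	copy_user_range(dst, src, 4);	/* source folio present: copy the range */
	copy_user_pfn(dst, src);	/* no source folio: one page only */
	printf("%s\n", dst);
	return 0;
}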
Signed-off-by: Ryan Roberts --- mm/memory.c | 41 +++++++++++++++++++++++++++++------------ 1 file changed, 29 insertions(+), 12 deletions(-) -- 2.25.1 diff --git a/mm/memory.c b/mm/memory.c index 7e2af54fe2e0..f2b7cfb2efc0 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2786,14 +2786,34 @@ static inline int pte_unmap_same(struct vm_fault *vmf) return same; } +/* + * Return: + * 0: copied succeeded + * -EHWPOISON: copy failed due to hwpoison in source page + */ +static inline int __wp_page_copy_user_range(struct page *dst, struct page *src, + int nr, unsigned long addr, + struct vm_area_struct *vma) +{ + for (; nr != 0; nr--, dst++, src++, addr += PAGE_SIZE) { + if (copy_mc_user_highpage(dst, src, addr, vma)) { + memory_failure_queue(page_to_pfn(src), 0); + return -EHWPOISON; + } + kmsan_copy_page_meta(dst, src); + } + + return 0; +} + /* * Return: * 0: copied succeeded * -EHWPOISON: copy failed due to hwpoison in source page * -EAGAIN: copied failed (some other reason) */ -static inline int __wp_page_copy_user(struct page *dst, struct page *src, - struct vm_fault *vmf) +static inline int __wp_page_copy_user_pfn(struct page *dst, + struct vm_fault *vmf) { int ret; void *kaddr; @@ -2803,14 +2823,6 @@ static inline int __wp_page_copy_user(struct page *dst, struct page *src, struct mm_struct *mm = vma->vm_mm; unsigned long addr = vmf->address; - if (likely(src)) { - if (copy_mc_user_highpage(dst, src, addr, vma)) { - memory_failure_queue(page_to_pfn(src), 0); - return -EHWPOISON; - } - return 0; - } - /* * If the source page was a PFN mapping, we don't have * a "struct page" for it. We do a best-effort copy by @@ -2879,6 +2891,7 @@ static inline int __wp_page_copy_user(struct page *dst, struct page *src, } } + kmsan_copy_page_meta(dst, NULL); ret = 0; pte_unlock: @@ -3372,7 +3385,12 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) if (!new_folio) goto oom; - ret = __wp_page_copy_user(&new_folio->page, vmf->page, vmf); + if (likely(old_folio)) + ret = __wp_page_copy_user_range(&new_folio->page, + vmf->page, + 1, vmf->address, vma); + else + ret = __wp_page_copy_user_pfn(&new_folio->page, vmf); if (ret) { /* * COW failed, if the fault was solved by other, @@ -3388,7 +3406,6 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) delayacct_wpcopy_end(); return ret == -EHWPOISON ? 
VM_FAULT_HWPOISON : 0; } - kmsan_copy_page_meta(&new_folio->page, vmf->page); } if (mem_cgroup_charge(new_folio, mm, GFP_KERNEL)) From patchwork Fri Apr 14 13:02:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 13211485 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 7E72FC77B72 for ; Fri, 14 Apr 2023 13:05:23 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:Cc:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=u+yH6a/FR9Poa8Cgqz5v/OKOSsi6/QykmrPQjrRmsFY=; b=Ks10ySrabiZuLN +uOKER8E6IXFBY81+uLZ1ZflU6wtYqmBV7RUqlzKRDx5vknWHvgraxYw7nt/ZPEkd39I9UTSQeYRK mNMRxyklunoTaPjNfDLJoFBOZzVzmKyrWDFNUnoU8ODB4BSw3pmiKb9U6+MHPY4SWtaI4xoQ8RKto brhKVimEe98jG3yVQjzRV2j03J8NpDh+NUBqb4OEfAq6EHwQ/+OwuswkPUERDGKmCb6f8CaJXVRJ5 ERG8FZMkLKzYvS2yoiBMu6Vlcz7rYR228E81BTcWQE01yhIE/fZ5UNlAR9oDxZSPfZgYvMfXP/drf rWjre2QIJnEKa4UnLL+w==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pnJ6H-009bw3-0w; Fri, 14 Apr 2023 13:04:25 +0000 Received: from foss.arm.com ([217.140.110.172]) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pnJ5R-009bTl-38 for linux-arm-kernel@lists.infradead.org; Fri, 14 Apr 2023 13:03:43 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 1C74816F8; Fri, 14 Apr 2023 06:04:18 -0700 (PDT) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 980903F6C4; Fri, 14 Apr 2023 06:03:32 -0700 (PDT) From: Ryan Roberts To: Andrew Morton , "Matthew Wilcox (Oracle)" , Yu Zhao , "Yin, Fengwei" Cc: Ryan Roberts , linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org Subject: [RFC v2 PATCH 12/17] mm: ptep_clear_flush_range_notify() macro for batch operation Date: Fri, 14 Apr 2023 14:02:58 +0100 Message-Id: <20230414130303.2345383-13-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230414130303.2345383-1-ryan.roberts@arm.com> References: <20230414130303.2345383-1-ryan.roberts@arm.com> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20230414_060334_227792_D1469A07 X-CRM114-Status: GOOD ( 10.17 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org We will soon add support for CoWing large anonymous folios, so create a ranged version of the ptep_clear_flush_notify() macro in preparation for that. 
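Roughly, the idea behind the macro is per-page clear-and-flush with a single notifier invalidation covering the whole span. A hedged userspace model of that control flow follows (plain counters instead of real TLB and MMU-notifier work; an illustration of the batching pattern, not the macro itself):

#include <stdio.h>

#define PAGE_SHIFT 12

static int flushes, notifications;

static void clear_flush_one(unsigned long addr)
{
	/* Models ptep_clear_flush(): per-page TLB flush. */
	flushes++;
	(void)addr;
}

static void notify_range(unsigned long start, unsigned long end)
{
	/* Models mmu_notifier_invalidate_range(): once for the whole span. */
	notifications++;
	(void)start; (void)end;
}

static void clear_flush_range_notify(unsigned long addr, int nr)
{
	int i;

	for (i = 0; i < nr; i++)
		clear_flush_one(addr + ((unsigned long)i << PAGE_SHIFT));

	notify_range(addr, addr + ((unsigned long)nr << PAGE_SHIFT));
}

int main(void)
{
	clear_flush_range_notify(0x10000, 8);
	printf("%d per-page flushes, %d notifier call(s)\n",
	       flushes, notifications);
	return 0;
}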
It is able call mmu_notifier_invalidate_range() once for the entire range, but still calls ptep_clear_flush() per page since there is no arch support for a batched version of this API yet. No functional change intended. Signed-off-by: Ryan Roberts --- include/linux/mmu_notifier.h | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) -- 2.25.1 diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h index 64a3e051c3c4..527aa89959b4 100644 --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -595,6 +595,24 @@ static inline void mmu_notifier_range_init_owner( ___pte; \ }) +#define ptep_clear_flush_range_notify(__vma, __address, __ptep, __nr) \ +({ \ + struct vm_area_struct *___vma = (__vma); \ + unsigned long ___addr = (__address) & PAGE_MASK; \ + pte_t *___ptep = (__ptep); \ + int ___nr = (__nr); \ + struct mm_struct *___mm = ___vma->vm_mm; \ + int ___i; \ + \ + for (___i = 0; ___i < ___nr; ___i++) \ + ptep_clear_flush(___vma, \ + ___addr + (___i << PAGE_SHIFT), \ + ___ptep + ___i); \ + \ + mmu_notifier_invalidate_range(___mm, ___addr, \ + ___addr + (___nr << PAGE_SHIFT)); \ +}) + #define pmdp_huge_clear_flush_notify(__vma, __haddr, __pmd) \ ({ \ unsigned long ___haddr = __haddr & HPAGE_PMD_MASK; \ @@ -736,6 +754,19 @@ static inline void mmu_notifier_subscriptions_destroy(struct mm_struct *mm) #define ptep_clear_young_notify ptep_test_and_clear_young #define pmdp_clear_young_notify pmdp_test_and_clear_young #define ptep_clear_flush_notify ptep_clear_flush +#define ptep_clear_flush_range_notify(__vma, __address, __ptep, __nr) \ +({ \ + struct vm_area_struct *___vma = (__vma); \ + unsigned long ___addr = (__address) & PAGE_MASK; \ + pte_t *___ptep = (__ptep); \ + int ___nr = (__nr); \ + int ___i; \ + \ + for (___i = 0; ___i < ___nr; ___i++) \ + ptep_clear_flush(___vma, \ + ___addr + (___i << PAGE_SHIFT), \ + ___ptep + ___i); \ +}) #define pmdp_huge_clear_flush_notify pmdp_huge_clear_flush #define pudp_huge_clear_flush_notify pudp_huge_clear_flush #define set_pte_at_notify set_pte_at From patchwork Fri Apr 14 13:02:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 13211479 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 0BFAFC77B6E for ; Fri, 14 Apr 2023 13:05:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:Cc:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=huSzatDScDBljxPgiVstFN2THK1Wxa4o3Y1qza1CJas=; b=WLCfPRoOYy9UrO 8/ZBhZQWmjr83GvQFnCcth4k9qwMIWCNn0AVeY/1Z8d5oRXcK2kYwTrcDZf2nsPg1LQjb7HmxHQLS G5Az5cj6ItQtE+zE/oHMyddSk2LXrNNUIJKwbZ315rr0FdxEyEyfJL2MYC+nAI2AOnDr8PUSGKw0n uhf/CLTjZPuEliKcg9220fjj4nS9Q/bbwZ9lvIFidiicio/gQEY+EAGyq6LLY47IWjOopaN6M0+Dh gOdLRwQNj3x8SpxR2Zp5Qc126x7nod7hkTxAx74dIK6dLE4tKKFqZKNH6D1FAB+OX5paVsqMbaGXI /v7y4Au0+UV/miUVD/Yg==; Received: from localhost ([::1] 
helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pnJ5w-009bj7-25; Fri, 14 Apr 2023 13:04:04 +0000 Received: from desiato.infradead.org ([2001:8b0:10b:1:d65d:64ff:fe57:4e05]) by bombadil.infradead.org with esmtps (Exim 4.96 #2 (Red Hat Linux)) id 1pnJ5Y-009bXw-2K for linux-arm-kernel@bombadil.infradead.org; Fri, 14 Apr 2023 13:03:40 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Transfer-Encoding:MIME-Version :References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender:Reply-To: Content-Type:Content-ID:Content-Description; bh=7ZDJ9cjTlxL0uCmbHQ5z7qLc8C1ClBv5+RC+K8b5xO4=; b=DckEsbN69vMtshsD7xV+oawpsf qcPufyqKxPa8F+l0FdN5XVbghxoUuMUDzCrDV1zlEN75tmW5JkUmFsFX9QW989bZvgWvqN07uzlNR CFJdVpzkTMwCRoRVC9EwujyW4h7G51wvMpIb882k1OFKQ+rg0SPJPY8wtItvHQt0197NG8tLH9A8V JVzhxvQoG/vP4q8Z3qHsVmE3HzGYr16mHPGxCv9JTCp4By4o3YczExu86ysdllbuGMxWuM+bqG9iW 6i+7h7W2YfApjZy+q0oKf6ih358YPm3R9iRFfN2KJo7UA0fxt4Y1IN/XJUyVzKwnrdQ1ErA8vGxjT w0skkApQ==; Received: from foss.arm.com ([217.140.110.172]) by desiato.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pnJ5T-00Fa1T-3D for linux-arm-kernel@lists.infradead.org; Fri, 14 Apr 2023 13:03:39 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 4E2F02F4; Fri, 14 Apr 2023 06:04:19 -0700 (PDT) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id C9E343F6C4; Fri, 14 Apr 2023 06:03:33 -0700 (PDT) From: Ryan Roberts To: Andrew Morton , "Matthew Wilcox (Oracle)" , Yu Zhao , "Yin, Fengwei" Cc: Ryan Roberts , linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org Subject: [RFC v2 PATCH 13/17] mm: Implement folio_remove_rmap_range() Date: Fri, 14 Apr 2023 14:02:59 +0100 Message-Id: <20230414130303.2345383-14-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230414130303.2345383-1-ryan.roberts@arm.com> References: <20230414130303.2345383-1-ryan.roberts@arm.com> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20230414_140336_405895_800CFDB2 X-CRM114-Status: GOOD ( 15.92 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Like page_remove_rmap() but batch-removes the rmap for a range of pages belonging to a folio, for effciency savings. All pages are accounted as small pages. 
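The interesting part is the accounting: a per-page mapcount decrement feeding a single folio-level counter, with the stat update issued once for the whole batch. A hypothetical userspace model of that bookkeeping (no locking, simplified counters; not the kernel code) is sketched below:

#include <stdio.h>

#define COMPOUND_MAPPED 0x800000	/* models the bias meaning "mapped as one compound unit" */

struct folio_model {
	int nr_pages_mapped;		/* models folio->_nr_pages_mapped */
	int page_mapcount[16];		/* models per-page _mapcount (biased: -1 == unmapped) */
};

/* Unmap nr pages starting at index first; return how many became unmapped. */
static int remove_rmap_range(struct folio_model *f, int first, int nr)
{
	int nr_unmapped = 0;

	for (; nr != 0; nr--, first++) {
		/* Was this the page's last mapping? (_mapcount drops below zero) */
		if (--f->page_mapcount[first] < 0) {
			/* Page still counts as mapped if the folio is mapped entirely. */
			if (--f->nr_pages_mapped < COMPOUND_MAPPED)
				nr_unmapped++;
		}
	}
	return nr_unmapped;	/* caller adjusts NR_ANON_MAPPED by -nr_unmapped once */
}

int main(void)
{
	struct folio_model f = { .nr_pages_mapped = 16 };
	int i;

	for (i = 0; i < 16; i++)
		f.page_mapcount[i] = 0;	/* each page mapped exactly once */

	printf("unmapped %d pages\n", remove_rmap_range(&f, 4, 8));
	return 0;
}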
Signed-off-by: Ryan Roberts --- include/linux/rmap.h | 2 ++ mm/rmap.c | 62 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 64 insertions(+) -- 2.25.1 diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 8cb0ba48d58f..7daf25887049 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -204,6 +204,8 @@ void page_add_file_rmap(struct page *, struct vm_area_struct *, bool compound); void page_remove_rmap(struct page *, struct vm_area_struct *, bool compound); +void folio_remove_rmap_range(struct folio *folio, struct page *page, + int nr, struct vm_area_struct *vma); void hugepage_add_anon_rmap(struct page *, struct vm_area_struct *, unsigned long address, rmap_t flags); diff --git a/mm/rmap.c b/mm/rmap.c index 1cd8fb0b929f..954e44054d5c 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1419,6 +1419,68 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma, mlock_vma_folio(folio, vma, compound); } +/** + * folio_remove_rmap_range - take down pte mappings from a range of pages + * belonging to a folio. All pages are accounted as small pages. + * @folio: folio that all pages belong to + * @page: first page in range to remove mapping from + * @nr: number of pages in range to remove mapping from + * @vma: the vm area from which the mapping is removed + * + * The caller needs to hold the pte lock. + */ +void folio_remove_rmap_range(struct folio *folio, struct page *page, + int nr, struct vm_area_struct *vma) +{ + atomic_t *mapped = &folio->_nr_pages_mapped; + int nr_unmapped = 0; + int nr_mapped; + bool last; + enum node_stat_item idx; + + VM_BUG_ON_FOLIO(folio_test_hugetlb(folio), folio); + + if (!folio_test_large(folio)) { + /* Is this the page's last map to be removed? */ + last = atomic_add_negative(-1, &page->_mapcount); + nr_unmapped = last; + } else { + for (; nr != 0; nr--, page++) { + /* Is this the page's last map to be removed? */ + last = atomic_add_negative(-1, &page->_mapcount); + if (last) { + /* Page still mapped if folio mapped entirely */ + nr_mapped = atomic_dec_return_relaxed(mapped); + if (nr_mapped < COMPOUND_MAPPED) + nr_unmapped++; + } + } + } + + if (nr_unmapped) { + idx = folio_test_anon(folio) ? NR_ANON_MAPPED : NR_FILE_MAPPED; + __lruvec_stat_mod_folio(folio, idx, -nr_unmapped); + + /* + * Queue anon THP for deferred split if we have just unmapped at + * least 1 page, while at least 1 page remains mapped. + */ + if (folio_test_large(folio) && folio_test_anon(folio)) + if (nr_mapped) + deferred_split_folio(folio); + } + + /* + * It would be tidy to reset folio_test_anon mapping when fully + * unmapped, but that might overwrite a racing page_add_anon_rmap + * which increments mapcount after us but sets mapping before us: + * so leave the reset to free_pages_prepare, and remember that + * it's only reliable while mapped. 
+ */ + + munlock_vma_folio(folio, vma, false); +} + /** * page_remove_rmap - take down pte mapping from a page * @page: page to remove mapping from From patchwork Fri Apr 14 13:03:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 13211482 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 21542C77B76 for ; Fri, 14 Apr 2023 13:05:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:Cc:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=DsT1GWqNILc2X1NiTCL/EiMgAJkv2F46ocZU/iOcJoc=; b=ZUZIW3jlW65v9F fXN+Av/kPoDmrs9wHY7amdwzxZyI16ScFZP2YSUbgqkwYwPQcZ9dEv0b0Bz2uG7MnO2gtF84LGlfj PTnDwd+aJdsKNsHoTuCi4M53fnOKhSsvhdKnnd6vprUumekUBGwRNFBlk5ZknGASkQy1jRz6LRXOx kVvORa5wlTzQI4iRZfT9OKxUIQIHSXUlrQszPdHY66arFtnL2rMPXnzWQoE3h4Ne65Q+gVoQIj2x7 P//WxkqeBzXMlZ7xnhgYGaQz6tVkUqR4nN3L7QqnxQRLy7/qmcyeRQa1YDBKfsDmahdzQBSKjyjPp RTjpFbftQk/LByk09OPA==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pnJ5y-009bke-2e; Fri, 14 Apr 2023 13:04:06 +0000 Received: from foss.arm.com ([217.140.110.172]) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pnJ5U-009bVe-39 for linux-arm-kernel@lists.infradead.org; Fri, 14 Apr 2023 13:03:43 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 9B3761713; Fri, 14 Apr 2023 06:04:20 -0700 (PDT) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 082AF3F6C4; Fri, 14 Apr 2023 06:03:34 -0700 (PDT) From: Ryan Roberts To: Andrew Morton , "Matthew Wilcox (Oracle)" , Yu Zhao , "Yin, Fengwei" Cc: Ryan Roberts , linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org Subject: [RFC v2 PATCH 14/17] mm: Copy large folios for anonymous memory Date: Fri, 14 Apr 2023 14:03:00 +0100 Message-Id: <20230414130303.2345383-15-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230414130303.2345383-1-ryan.roberts@arm.com> References: <20230414130303.2345383-1-ryan.roberts@arm.com> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20230414_060337_182373_A5925DBA X-CRM114-Status: GOOD ( 36.41 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org When taking a write fault on an anonymous page, if we are unable to reuse the folio (due to it being mapped by others), do CoW for the entire folio instead of just a single page. 
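Since the replacement folio is allocated naturally aligned (the unaligned-source and allocation-failure cases are covered below), the core of the change is address arithmetic: choose an order, align the fault address down to that order, note where the faulting page lands inside the new folio, and retry with a smaller order if allocation fails. A small hypothetical userspace sketch of that calculation, assuming 4K pages (not the kernel code):

#include <stdio.h>

#define PAGE_SHIFT 12
#define ALIGN_DOWN(x, a) ((x) & ~((unsigned long)(a) - 1))

/*
 * Given a fault address and a candidate order, compute the naturally
 * aligned start of the would-be destination folio and the index of the
 * faulting page inside it.  On allocation failure the caller would retry
 * with order - 1 until it reaches order 0.
 */
static void place_folio(unsigned long fault_addr, int order)
{
	unsigned long pgcount = 1UL << order;
	unsigned long addr = ALIGN_DOWN(fault_addr, pgcount << PAGE_SHIFT);
	unsigned long offset = (fault_addr - addr) >> PAGE_SHIFT;

	printf("order %d: folio at %#lx, fault page is index %lu of %lu\n",
	       order, addr, offset, pgcount);
}

int main(void)
{
	unsigned long fault_addr = 0x7f1234567000UL;
	int order;

	/* Model the fallback loop: try large orders first, degrade to 0. */
	for (order = 4; order >= 0; order--)
		place_folio(fault_addr, order);
	return 0;
}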
We assume that the size of the anonymous folio chosen at allocation time is still a good choice and therefore it is better to copy the entire folio rather than a single page. It does not seem wise to do this for file-backed folios, since the folio size chosen there is related to the system-wide usage of the file. So we continue to CoW a single page for file-backed mappings. There are edge cases where the original mapping has been mremapped or partially munmapped. In this case the source folio may not be naturally aligned in the virtual address space. In this case, we CoW a power-of-2 portion of the source folio which is aligned. A similar effect happens when allocation of a high order destination folio fails. In this case, we reduce the order to 0 until we are successful. Signed-off-by: Ryan Roberts --- mm/memory.c | 242 ++++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 207 insertions(+), 35 deletions(-) -- 2.25.1 diff --git a/mm/memory.c b/mm/memory.c index f2b7cfb2efc0..61cec97a57f3 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3086,6 +3086,30 @@ static inline int check_ptes_none(pte_t *pte, int nr) return nr; } +/* + * Returns index of first pte that is not mapped RO and physically contiguously + * starting at pfn, or nr if all are correct. + */ +static inline int check_ptes_contig_ro(pte_t *pte, int nr, unsigned long pfn) +{ + int i; + pte_t entry; + + for (i = 0; i < nr; i++) { + entry = *pte++; + + if (!pte_present(entry) || + pte_write(entry) || + pte_protnone(entry) || + pte_pfn(entry) != pfn) + return i; + + pfn++; + } + + return nr; +} + static int calc_anon_folio_order_alloc(struct vm_fault *vmf, int order) { /* @@ -3155,6 +3179,94 @@ static int calc_anon_folio_order_alloc(struct vm_fault *vmf, int order) return order; } +static int calc_anon_folio_order_copy(struct vm_fault *vmf, + struct folio *old_folio, int order) +{ + /* + * The aim here is to determine what size of folio we should allocate as + * the destination for this CoW fault. Factors include: + * - Order must not be higher than `order` upon entry + * - Folio must be naturally aligned within VA space + * - Folio must not breach boundaries of vma + * - Folio must be fully contained inside one pmd entry + * - All covered ptes must be present, physically contiguous and RO + * - All covered ptes must be mapped to old_folio + * + * Additionally, we do not allow order-1 since this breaks assumptions + * elsewhere in the mm; THP pages must be at least order-2 (since they + * store state up to the 3rd struct page subpage), and these pages must + * be THP in order to correctly use pre-existing THP infrastructure such + * as folio_split(). + * + * As a consequence of relying on the THP infrastructure, if the system + * does not support THP, we always fallback to order-0. + * + * Note that old_folio may not be naturally aligned in VA space due to + * mremap. We deliberately force alignment of the new folio to simplify + * fallback, so in this unaligned case we will end up only copying a + * portion of old_folio. + * + * Note that the caller may or may not choose to lock the pte. If + * unlocked, the calculation should be considered an estimate that will + * need to be validated under the lock. 
+ */ + + struct vm_area_struct *vma = vmf->vma; + int nr; + unsigned long addr; + pte_t *pte; + pte_t *first_bad = NULL; + int ret; + unsigned long start, end; + unsigned long offset; + unsigned long pfn; + + if (has_transparent_hugepage()) { + order = min(order, PMD_SHIFT - PAGE_SHIFT); + + start = page_addr(&old_folio->page, vmf->page, vmf->address); + start = max(start, vma->vm_start); + + end = page_addr(&old_folio->page + folio_nr_pages(old_folio), + vmf->page, vmf->address); + end = min(end, vma->vm_end); + + for (; order > 1; order--) { + nr = 1 << order; + addr = ALIGN_DOWN(vmf->address, nr << PAGE_SHIFT); + offset = ((vmf->address - addr) >> PAGE_SHIFT); + pfn = page_to_pfn(vmf->page) - offset; + pte = vmf->pte - offset; + + /* Check vma and folio bounds. */ + if (addr < start || + addr + (nr << PAGE_SHIFT) > end) + continue; + + /* Ptes covered by order already known to be good. */ + if (pte + nr <= first_bad) + break; + + /* Already found bad pte in range covered by order. */ + if (pte <= first_bad) + continue; + + /* Need to check if all the ptes are good. */ + ret = check_ptes_contig_ro(pte, nr, pfn); + if (ret == nr) + break; + + first_bad = pte + ret; + } + + if (order == 1) + order = 0; + } else + order = 0; + + return order; +} + static void calc_anon_folio_range_reuse(struct vm_fault *vmf, struct folio *folio, struct anon_folio_range *range_out) @@ -3366,6 +3478,14 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) int page_copied = 0; struct mmu_notifier_range range; int ret; + pte_t orig_pte; + unsigned long addr = vmf->address; + int order = 0; + int pgcount = BIT(order); + unsigned long offset = 0; + unsigned long pfn; + struct page *page; + int i; delayacct_wpcopy_start(); @@ -3375,20 +3495,39 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) goto oom; if (is_zero_pfn(pte_pfn(vmf->orig_pte))) { - new_folio = vma_alloc_zeroed_movable_folio(vma, vmf->address, - 0, 0); + new_folio = vma_alloc_movable_folio(vma, vmf->address, 0, true); if (!new_folio) goto oom; } else { - new_folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, - vmf->address, false); + if (old_folio && folio_test_anon(old_folio)) { + order = min_t(int, folio_order(old_folio), + max_anon_folio_order(vma)); +retry: + /* + * Estimate the folio order to allocate. We are not + * under the ptl here so this estimate needs to be + * re-checked later once we have the lock. + */ + vmf->pte = pte_offset_map(vmf->pmd, vmf->address); + order = calc_anon_folio_order_copy(vmf, old_folio, order); + pte_unmap(vmf->pte); + } + + new_folio = try_vma_alloc_movable_folio(vma, vmf->address, + order, false); if (!new_folio) goto oom; + /* We may have been granted less than we asked for. 
*/ + order = folio_order(new_folio); + pgcount = BIT(order); + addr = ALIGN_DOWN(vmf->address, pgcount << PAGE_SHIFT); + offset = ((vmf->address - addr) >> PAGE_SHIFT); + if (likely(old_folio)) ret = __wp_page_copy_user_range(&new_folio->page, - vmf->page, - 1, vmf->address, vma); + vmf->page - offset, + pgcount, addr, vma); else ret = __wp_page_copy_user_pfn(&new_folio->page, vmf); if (ret) { @@ -3410,39 +3549,31 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) if (mem_cgroup_charge(new_folio, mm, GFP_KERNEL)) goto oom_free_new; - cgroup_throttle_swaprate(&new_folio->page, GFP_KERNEL); + folio_throttle_swaprate(new_folio, GFP_KERNEL); __folio_mark_uptodate(new_folio); mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, - vmf->address & PAGE_MASK, - (vmf->address & PAGE_MASK) + PAGE_SIZE); + addr, addr + (pgcount << PAGE_SHIFT)); mmu_notifier_invalidate_range_start(&range); /* - * Re-check the pte - we dropped the lock + * Re-check the pte(s) - we dropped the lock */ - vmf->pte = pte_offset_map_lock(mm, vmf->pmd, vmf->address, &vmf->ptl); - if (likely(pte_same(*vmf->pte, vmf->orig_pte))) { + vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl); + pfn = pte_pfn(vmf->orig_pte) - offset; + if (likely(check_ptes_contig_ro(vmf->pte, pgcount, pfn) == pgcount)) { if (old_folio) { if (!folio_test_anon(old_folio)) { + VM_BUG_ON(order != 0); dec_mm_counter(mm, mm_counter_file(&old_folio->page)); inc_mm_counter(mm, MM_ANONPAGES); } } else { + VM_BUG_ON(order != 0); inc_mm_counter(mm, MM_ANONPAGES); } - flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte)); - entry = mk_pte(&new_folio->page, vma->vm_page_prot); - entry = pte_sw_mkyoung(entry); - if (unlikely(unshare)) { - if (pte_soft_dirty(vmf->orig_pte)) - entry = pte_mksoft_dirty(entry); - if (pte_uffd_wp(vmf->orig_pte)) - entry = pte_mkuffd_wp(entry); - } else { - entry = maybe_mkwrite(pte_mkdirty(entry), vma); - } + flush_cache_range(vma, addr, addr + (pgcount << PAGE_SHIFT)); /* * Clear the pte entry and flush it first, before updating the @@ -3451,17 +3582,40 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) * that left a window where the new PTE could be loaded into * some TLBs while the old PTE remains in others. */ - ptep_clear_flush_notify(vma, vmf->address, vmf->pte); - folio_add_new_anon_rmap(new_folio, vma, vmf->address); + ptep_clear_flush_range_notify(vma, addr, vmf->pte, pgcount); + folio_ref_add(new_folio, pgcount - 1); + folio_add_new_anon_rmap_range(new_folio, &new_folio->page, + pgcount, vma, addr); folio_add_lru_vma(new_folio, vma); /* * We call the notify macro here because, when using secondary * mmu page tables (such as kvm shadow page tables), we want the * new page to be mapped directly into the secondary page table. */ - BUG_ON(unshare && pte_write(entry)); - set_pte_at_notify(mm, vmf->address, vmf->pte, entry); - update_mmu_cache(vma, vmf->address, vmf->pte); + page = &new_folio->page; + for (i = 0; i < pgcount; i++, page++) { + entry = mk_pte(page, vma->vm_page_prot); + entry = pte_sw_mkyoung(entry); + if (unlikely(unshare)) { + orig_pte = vmf->pte[i]; + if (pte_soft_dirty(orig_pte)) + entry = pte_mksoft_dirty(entry); + if (pte_uffd_wp(orig_pte)) + entry = pte_mkuffd_wp(entry); + } else { + entry = maybe_mkwrite(pte_mkdirty(entry), vma); + } + /* + * TODO: Batch for !unshare case. Could use set_ptes(), + * but currently there is no arch-agnostic way to + * increment pte values by pfn so can't do the notify + * part. So currently stuck creating the pte from + * scratch every iteration. 
+ */ + set_pte_at_notify(mm, addr + (i << PAGE_SHIFT), + vmf->pte + i, entry); + } + update_mmu_cache_range(vma, addr, vmf->pte, pgcount); if (old_folio) { /* * Only after switching the pte to the new page may @@ -3473,10 +3627,10 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) * threads. * * The critical issue is to order this - * page_remove_rmap with the ptp_clear_flush above. - * Those stores are ordered by (if nothing else,) + * folio_remove_rmap_range with the ptp_clear_flush + * above. Those stores are ordered by (if nothing else,) * the barrier present in the atomic_add_negative - * in page_remove_rmap. + * in folio_remove_rmap_range. * * Then the TLB flush in ptep_clear_flush ensures that * no process can access the old page before the @@ -3485,14 +3639,30 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) * mapcount is visible. So transitively, TLBs to * old page will be flushed before it can be reused. */ - page_remove_rmap(vmf->page, vma, false); + folio_remove_rmap_range(old_folio, + vmf->page - offset, + pgcount, vma); } /* Free the old page.. */ new_folio = old_folio; page_copied = 1; } else { - update_mmu_tlb(vma, vmf->address, vmf->pte); + pte_t *pte = vmf->pte + ((vmf->address - addr) >> PAGE_SHIFT); + + /* + * If faulting pte was serviced by another, exit early. Else try + * again, with a lower order. + */ + if (order > 0 && pte_same(*pte, vmf->orig_pte)) { + pte_unmap_unlock(vmf->pte, vmf->ptl); + mmu_notifier_invalidate_range_only_end(&range); + folio_put(new_folio); + order--; + goto retry; + } + + update_mmu_tlb(vma, vmf->address, pte); } if (new_folio) @@ -3505,9 +3675,11 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) */ mmu_notifier_invalidate_range_only_end(&range); if (old_folio) { - if (page_copied) + if (page_copied) { free_swap_cache(&old_folio->page); - folio_put(old_folio); + folio_put_refs(old_folio, pgcount); + } else + folio_put(old_folio); } delayacct_wpcopy_end(); From patchwork Fri Apr 14 13:03:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 13211483 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 7699AC77B72 for ; Fri, 14 Apr 2023 13:05:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:Cc:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=QUZzzcjCySA/OKGwSJ++meR6Rk0DYq9AwDhKSuFk+Dg=; b=1FPGwPB4QIIYHC TeRqajjHZaCeybMyJDS/U5dF1DHkHOaERDHYAAqtMFj+wVsAyIFSVuRriZsHvgenU2cf47x4wt+MP idwevr5iFaACqgDFvuexEgLejSC6w8/HHa7Z33Ca+AZpkjBI+ANIq4bs9Awuov7bQYlbkH3eMHywz /c3m3byLeinwJ4qUs3+zAwKfBK+5c16Z+KriCXlbwrY3pDO4fwgq/JbnIPRWxhZ5U5xZymMZRcr/H zfSRWCQmwqRR344a9g+sDGXmQJ8eNh/+eP5gm5ekmH3zcSsQrg38+2vzJiaGrtZZEAex3HL9VqzWJ alWx37dS/rahfTmCpeOg==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pnJ60-009bmF-1u; Fri, 14 Apr 
2023 13:04:08 +0000 Received: from desiato.infradead.org ([2001:8b0:10b:1:d65d:64ff:fe57:4e05]) by bombadil.infradead.org with esmtps (Exim 4.96 #2 (Red Hat Linux)) id 1pnJ5b-009bZA-06 for linux-arm-kernel@bombadil.infradead.org; Fri, 14 Apr 2023 13:03:43 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Transfer-Encoding:MIME-Version :References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender:Reply-To: Content-Type:Content-ID:Content-Description; bh=ybFxUPyHWDzwds2QZnZm/D00r0MGjfNOCyQiLQp+jsg=; b=BbtJ+AzHg1/1WayY9c9REVINsQ /W6L1n141yzuvMs8Zx4PYqBEPntlSKny45QxPZp828+oYC3CZzCQdWxCkFgcELE3umLEHyE8EUWBt BfVCIOxluAMAQoRva7Y4KBsvgNNHoRMVW3d8VKzB2hey2zO54r1Fg5wdMJ3jfj1i46XAZsT2xtRtj OeUWUFwt1N1vT1uCZwK+EnWQhHvcqKl+cRKlqFom34zC8fm9Fe6IsFMqu00/wyG3U/slcuwNvhCRD fiP+PgszaHbMum5Y43xpqbqFgszNUAOXtWoP4Wou4hLcrC4hqocLGixWW/hyVPTdx4pokGn71273l vJv5TxgA==; Received: from foss.arm.com ([217.140.110.172]) by desiato.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1pnJ5V-00Fa15-1q for linux-arm-kernel@lists.infradead.org; Fri, 14 Apr 2023 13:03:41 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id CD8031756; Fri, 14 Apr 2023 06:04:21 -0700 (PDT) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 550D53F6C4; Fri, 14 Apr 2023 06:03:36 -0700 (PDT) From: Ryan Roberts To: Andrew Morton , "Matthew Wilcox (Oracle)" , Yu Zhao , "Yin, Fengwei" Cc: Ryan Roberts , linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org Subject: [RFC v2 PATCH 15/17] mm: Convert zero page to large folios on write Date: Fri, 14 Apr 2023 14:03:01 +0100 Message-Id: <20230414130303.2345383-16-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230414130303.2345383-1-ryan.roberts@arm.com> References: <20230414130303.2345383-1-ryan.roberts@arm.com> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20230414_140337_916500_4AEA54A3 X-CRM114-Status: GOOD ( 22.07 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org A read fault causes the zero page to be mapped read-only. A subsequent write fault causes the zero page to be replaced with a zero-filled private anonymous page. Change the write fault behaviour to replace the zero page with a large anonymous folio, allocated using the same policy as if the write fault had happened without the previous read fault. Experimentation shows that reading multiple contiguous pages is extremely rare without interleved writes, so we don't bother to map a large zero page. We just use the small zero page as a marker and expand the allocation at the write fault. 
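The safety check at the heart of this is: within the candidate range, every PTE must still be none except the faulting one, which must still hold the zero-page entry. A hedged userspace model of that check, using a plain int array to stand in for the PTE batch (not the kernel's pte_t handling):

#include <stdio.h>

/* 0 models pte_none(); any other value models a populated PTE. */
static int first_unexpected(const int *pte, int nr, int entry, int offset)
{
	int i;

	for (i = 0; i < nr; i++) {
		if (i == offset) {
			/* The faulting slot must still hold the zero-page entry. */
			if (pte[i] != entry)
				return i;
		} else if (pte[i] != 0) {
			/* Every other slot must still be unpopulated. */
			return i;
		}
	}
	return nr;	/* all nr slots are as expected: safe to use this order */
}

int main(void)
{
	int ptes[8] = { 0, 0, 0, 42, 0, 0, 0, 0 };	/* zero-page PTE at index 3 */

	printf("ok up to %d of 8\n", first_unexpected(ptes, 8, 42, 3));

	ptes[6] = 7;	/* another thread populated a neighbouring PTE */
	printf("ok up to %d of 8\n", first_unexpected(ptes, 8, 42, 3));
	return 0;
}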
Signed-off-by: Ryan Roberts --- mm/memory.c | 115 ++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 80 insertions(+), 35 deletions(-) -- 2.25.1 diff --git a/mm/memory.c b/mm/memory.c index 61cec97a57f3..fac686e9f895 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3110,6 +3110,23 @@ static inline int check_ptes_contig_ro(pte_t *pte, int nr, unsigned long pfn) return nr; } +/* + * Checks that all ptes are none except for the pte at offset, which should be + * entry. Returns index of first pte that does not meet expectations, or nr if + * all are correct. + */ +static inline int check_ptes_none_or_entry(pte_t *pte, int nr, + pte_t entry, unsigned long offset) +{ + int ret; + + ret = check_ptes_none(pte, offset); + if (ret == offset && pte_same(pte[offset], entry)) + ret += 1 + check_ptes_none(pte + offset + 1, nr - offset - 1); + + return ret; +} + static int calc_anon_folio_order_alloc(struct vm_fault *vmf, int order) { /* @@ -3141,6 +3158,7 @@ static int calc_anon_folio_order_alloc(struct vm_fault *vmf, int order) pte_t *pte; pte_t *first_set = NULL; int ret; + unsigned long offset; if (has_transparent_hugepage()) { order = min(order, PMD_SHIFT - PAGE_SHIFT); @@ -3148,7 +3166,8 @@ static int calc_anon_folio_order_alloc(struct vm_fault *vmf, int order) for (; order > 1; order--) { nr = 1 << order; addr = ALIGN_DOWN(vmf->address, nr << PAGE_SHIFT); - pte = vmf->pte - ((vmf->address - addr) >> PAGE_SHIFT); + offset = ((vmf->address - addr) >> PAGE_SHIFT); + pte = vmf->pte - offset; /* Check vma bounds. */ if (addr < vma->vm_start || @@ -3163,8 +3182,9 @@ static int calc_anon_folio_order_alloc(struct vm_fault *vmf, int order) if (pte <= first_set) continue; - /* Need to check if all the ptes are none. */ - ret = check_ptes_none(pte, nr); + /* Need to check if all the ptes are none or entry. */ + ret = check_ptes_none_or_entry(pte, nr, + vmf->orig_pte, offset); if (ret == nr) break; @@ -3479,13 +3499,15 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) struct mmu_notifier_range range; int ret; pte_t orig_pte; - unsigned long addr = vmf->address; - int order = 0; - int pgcount = BIT(order); - unsigned long offset = 0; + unsigned long addr; + int order; + int pgcount; + unsigned long offset; unsigned long pfn; struct page *page; int i; + bool zero; + bool anon; delayacct_wpcopy_start(); @@ -3494,36 +3516,54 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) if (unlikely(anon_vma_prepare(vma))) goto oom; + /* + * Set the upper bound of the folio allocation order. If we hit a zero + * page, we allocate a folio with the same policy as allocation upon + * write fault. If we are copying an anon folio, then limit ourself to + * its order as we don't want to copy from multiple folios. For all + * other cases (e.g. file-mapped) CoW a single page. + */ if (is_zero_pfn(pte_pfn(vmf->orig_pte))) { - new_folio = vma_alloc_movable_folio(vma, vmf->address, 0, true); - if (!new_folio) - goto oom; - } else { - if (old_folio && folio_test_anon(old_folio)) { - order = min_t(int, folio_order(old_folio), + zero = true; + anon = false; + order = max_anon_folio_order(vma); + } else if (old_folio && folio_test_anon(old_folio)) { + zero = false; + anon = true; + order = min_t(int, folio_order(old_folio), max_anon_folio_order(vma)); + } else { + zero = false; + anon = false; + order = 0; + } + retry: - /* - * Estimate the folio order to allocate. We are not - * under the ptl here so this estimate needs to be - * re-checked later once we have the lock. 
- */ - vmf->pte = pte_offset_map(vmf->pmd, vmf->address); - order = calc_anon_folio_order_copy(vmf, old_folio, order); - pte_unmap(vmf->pte); - } + /* + * Estimate the folio order to allocate. We are not under the ptl here + * so this estimate needs to be re-checked later once we have the lock. + */ + if (zero || anon) { + vmf->pte = pte_offset_map(vmf->pmd, vmf->address); + order = zero ? calc_anon_folio_order_alloc(vmf, order) : + calc_anon_folio_order_copy(vmf, old_folio, order); + pte_unmap(vmf->pte); + } - new_folio = try_vma_alloc_movable_folio(vma, vmf->address, - order, false); - if (!new_folio) - goto oom; + /* Allocate the new folio. */ + new_folio = try_vma_alloc_movable_folio(vma, vmf->address, order, zero); + if (!new_folio) + goto oom; - /* We may have been granted less than we asked for. */ - order = folio_order(new_folio); - pgcount = BIT(order); - addr = ALIGN_DOWN(vmf->address, pgcount << PAGE_SHIFT); - offset = ((vmf->address - addr) >> PAGE_SHIFT); + /* We may have been granted less than we asked for. */ + order = folio_order(new_folio); + pgcount = BIT(order); + addr = ALIGN_DOWN(vmf->address, pgcount << PAGE_SHIFT); + offset = ((vmf->address - addr) >> PAGE_SHIFT); + pfn = pte_pfn(vmf->orig_pte) - offset; + /* Copy contents. */ + if (!zero) { if (likely(old_folio)) ret = __wp_page_copy_user_range(&new_folio->page, vmf->page - offset, @@ -3561,8 +3601,14 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) * Re-check the pte(s) - we dropped the lock */ vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl); - pfn = pte_pfn(vmf->orig_pte) - offset; - if (likely(check_ptes_contig_ro(vmf->pte, pgcount, pfn) == pgcount)) { + + if (zero) + ret = check_ptes_none_or_entry(vmf->pte, pgcount, + vmf->orig_pte, offset); + else + ret = check_ptes_contig_ro(vmf->pte, pgcount, pfn); + + if (likely(ret == pgcount)) { if (old_folio) { if (!folio_test_anon(old_folio)) { VM_BUG_ON(order != 0); @@ -3570,8 +3616,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) inc_mm_counter(mm, MM_ANONPAGES); } } else { - VM_BUG_ON(order != 0); - inc_mm_counter(mm, MM_ANONPAGES); + add_mm_counter(mm, MM_ANONPAGES, pgcount); } flush_cache_range(vma, addr, addr + (pgcount << PAGE_SHIFT)); From patchwork Fri Apr 14 13:03:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 13211484 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 4A84AC77B6E for ; Fri, 14 Apr 2023 13:05:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:Cc:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=9LNsaLx/NxrS4Sr0uxWgBpRLXdlaoBHHAPiaU9pRtZM=; b=ig7l1pnXjsxPK4 f7lHsJgzwMcn0neul57U7QIZA7EMsfnXpAYs3dOeqqjWurAG4oFul7zS5+BKiR9jeLnz/F7VoAtDX E2By/V9VBdv1AvRoe2EJTL4A5MB0MYZjfN+SbOJuYpNMy6Dq4jYmYXJQAnX6bmrLsm84lfOmPf1wC 
From patchwork Fri Apr 14 13:03:02 2023
X-Patchwork-Submitter: Ryan Roberts 
X-Patchwork-Id: 13211484
From: Ryan Roberts 
To: Andrew Morton , "Matthew Wilcox (Oracle)" , Yu Zhao , "Yin, Fengwei" 
Cc: Ryan Roberts , linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org
Subject: [RFC v2 PATCH 16/17] mm: mmap: Align unhinted maps to highest anon folio order
Date: Fri, 14 Apr 2023 14:03:02 +0100
Message-Id: <20230414130303.2345383-17-ryan.roberts@arm.com>
In-Reply-To: <20230414130303.2345383-1-ryan.roberts@arm.com>
References: <20230414130303.2345383-1-ryan.roberts@arm.com>

When allocating large anonymous folios, we want to maximize our chances
of being able to use the highest order we support. Since one of the
constraints is that a folio has to be mapped naturally aligned, let's
have mmap default to that alignment when user space does not provide a
hint. With this in place, an extra 2% of all allocated anonymous memory
belongs to a folio of the highest order, when compiling the kernel.

Signed-off-by: Ryan Roberts 
---
 mm/mmap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--
2.25.1

diff --git a/mm/mmap.c b/mm/mmap.c
index ff68a67a2a7c..e7652001a32e 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1627,7 +1627,7 @@ generic_get_unmapped_area(struct file *filp, unsigned long addr,
 	info.length = len;
 	info.low_limit = mm->mmap_base;
 	info.high_limit = mmap_end;
-	info.align_mask = 0;
+	info.align_mask = BIT(PAGE_SHIFT + ANON_FOLIO_ORDER_MAX) - 1;
 	info.align_offset = 0;
 	return vm_unmapped_area(&info);
 }
@@ -1677,7 +1677,7 @@ generic_get_unmapped_area_topdown(struct file *filp, unsigned long addr,
 	info.length = len;
 	info.low_limit = max(PAGE_SIZE, mmap_min_addr);
 	info.high_limit = arch_get_mmap_base(addr, mm->mmap_base);
-	info.align_mask = 0;
+	info.align_mask = BIT(PAGE_SHIFT + ANON_FOLIO_ORDER_MAX) - 1;
 	info.align_offset = 0;
 
 	addr = vm_unmapped_area(&info);
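For intuition about the new align_mask value: vm_unmapped_area() picks a base address whose low bits match info.align_offset under info.align_mask, so a mask of BIT(PAGE_SHIFT + ANON_FOLIO_ORDER_MAX) - 1 with a zero offset yields bases that are multiples of the largest anon folio size. Below is a rough userspace sketch of the arithmetic only; the PAGE_SHIFT and ANON_FOLIO_ORDER_MAX values are assumptions for illustration, the real constant comes from earlier in the series:

/* Userspace sketch of the alignment constraint requested above.
 * PAGE_SHIFT and ANON_FOLIO_ORDER_MAX are assumed values.
 */
#include <stdio.h>

#define PAGE_SHIFT		12
#define ANON_FOLIO_ORDER_MAX	4
#define BIT(n)			(1UL << (n))

int main(void)
{
	unsigned long align_mask = BIT(PAGE_SHIFT + ANON_FOLIO_ORDER_MAX) - 1;
	unsigned long gap = 0x7f0000012000UL;	/* hypothetical free gap base */

	/* The bottom-up search effectively rounds the candidate up until
	 * (addr & align_mask) == 0, i.e. addr is a multiple of the largest
	 * anon folio size (64 KiB with these numbers).
	 */
	unsigned long addr = (gap + align_mask) & ~align_mask;

	printf("align_mask=%#lx: gap %#lx -> mmap base %#lx\n",
	       align_mask, gap, addr);
	return 0;
}

Starting every unhinted mapping on such a boundary means the first faults in the region are not clipped by an unaligned VMA start, which is what lets more anonymous memory end up in highest-order folios as described in the commit message.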
From patchwork Fri Apr 14 13:03:03 2023
X-Patchwork-Submitter: Ryan Roberts 
X-Patchwork-Id: 13211486
From: Ryan Roberts 
To: Andrew Morton , "Matthew Wilcox (Oracle)" , Yu Zhao , "Yin, Fengwei" 
Cc: Ryan Roberts , linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org
Subject: [RFC v2 PATCH 17/17] mm: Batch-zap large anonymous folio PTE mappings
Date: Fri, 14 Apr 2023 14:03:03 +0100
Message-Id: <20230414130303.2345383-18-ryan.roberts@arm.com>
In-Reply-To: <20230414130303.2345383-1-ryan.roberts@arm.com>
References: <20230414130303.2345383-1-ryan.roberts@arm.com>

This allows batching the rmap removal with folio_remove_rmap_range(),
which means we avoid spuriously adding a partially unmapped folio to the
deferred split queue in the common case, which reduces split queue lock
contention.

Previously each page was removed from the rmap individually with
page_remove_rmap(). If the first page belonged to a large folio, this
would cause page_remove_rmap() to conclude that the folio was now
partially mapped and add the folio to the deferred split queue. But
subsequent calls would cause the folio to become fully unmapped, meaning
there is no value in adding it to the split queue.

Signed-off-by: Ryan Roberts 
---
 mm/memory.c | 139 ++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 119 insertions(+), 20 deletions(-)

--
2.25.1

diff --git a/mm/memory.c b/mm/memory.c
index fac686e9f895..e1cb4bf6fd5d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1351,6 +1351,95 @@ zap_install_uffd_wp_if_needed(struct vm_area_struct *vma,
 		pte_install_uffd_wp_if_needed(vma, addr, pte, pteval);
 }
 
+static inline unsigned long page_addr(struct page *page,
+				struct page *anchor, unsigned long anchor_addr)
+{
+	unsigned long offset;
+	unsigned long addr;
+
+	offset = (page_to_pfn(page) - page_to_pfn(anchor)) << PAGE_SHIFT;
+	addr = anchor_addr + offset;
+
+	if (anchor > page) {
+		if (addr > anchor_addr)
+			return 0;
+	} else {
+		if (addr < anchor_addr)
+			return ULONG_MAX;
+	}
+
+	return addr;
+}
+
+static int calc_anon_folio_map_pgcount(struct folio *folio,
+				struct page *page, pte_t *pte,
+				unsigned long addr, unsigned long end)
+{
+	pte_t ptent;
+	int floops;
+	int i;
+	unsigned long pfn;
+
+	end = min(page_addr(&folio->page + folio_nr_pages(folio), page, addr),
+		  end);
+	floops = (end - addr) >> PAGE_SHIFT;
+	pfn = page_to_pfn(page);
+	pfn++;
+	pte++;
+
+	for (i = 1; i < floops; i++) {
+		ptent = *pte;
+
+		if (!pte_present(ptent) ||
+		    pte_pfn(ptent) != pfn) {
+			return i;
+		}
+
+		pfn++;
+		pte++;
+	}
+
+	return floops;
+}
+
+static unsigned long zap_anon_pte_range(struct mmu_gather *tlb,
+				struct vm_area_struct *vma,
+				struct page *page, pte_t *pte,
+				unsigned long addr, unsigned long end,
+				bool *full_out)
+{
+	struct folio *folio = page_folio(page);
+	struct mm_struct *mm = tlb->mm;
+	pte_t ptent;
+	int pgcount;
+	int i;
+	bool full;
+
+	pgcount = calc_anon_folio_map_pgcount(folio, page, pte, addr, end);
+
+	for (i = 0; i < pgcount;) {
+		ptent = ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm);
+		tlb_remove_tlb_entry(tlb, pte, addr);
+		full = __tlb_remove_page(tlb, page, 0);
+
+		if (unlikely(page_mapcount(page) < 1))
+			print_bad_pte(vma, addr, ptent, page);
+
+		i++;
+		page++;
+		pte++;
+		addr += PAGE_SIZE;
+
+		if (unlikely(full))
+			break;
+	}
+
+	folio_remove_rmap_range(folio, page - i, i, vma);
+
+	*full_out = full;
+	return i;
+}
+
 static unsigned long zap_pte_range(struct mmu_gather *tlb,
 				struct vm_area_struct *vma, pmd_t *pmd,
 				unsigned long addr, unsigned long end,
@@ -1387,6 +1476,36 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			page = vm_normal_page(vma, addr, ptent);
 			if (unlikely(!should_zap_page(details, page)))
 				continue;
+
+			/*
+			 * Batch zap large anonymous folio mappings. This allows
+			 * batching the rmap removal, which means we avoid
+			 * spuriously adding a partially unmapped folio to the
+			 * deferred split queue in the common case, which
+			 * reduces split queue lock contention. Require the VMA
+			 * to be anonymous to ensure that none of the PTEs in
+			 * the range require zap_install_uffd_wp_if_needed().
+			 */
+			if (page && PageAnon(page) && vma_is_anonymous(vma)) {
+				bool full;
+				int pgcount;
+
+				pgcount = zap_anon_pte_range(tlb, vma,
+						page, pte, addr, end, &full);
+
+				rss[mm_counter(page)] -= pgcount;
+				pgcount--;
+				pte += pgcount;
+				addr += pgcount << PAGE_SHIFT;
+
+				if (unlikely(full)) {
+					force_flush = 1;
+					addr += PAGE_SIZE;
+					break;
+				}
+				continue;
+			}
+
 			ptent = ptep_get_and_clear_full(mm, addr, pte,
 							tlb->fullmm);
 			tlb_remove_tlb_entry(tlb, pte, addr);
@@ -3051,26 +3170,6 @@ struct anon_folio_range {
 	bool exclusive;
 };
 
-static inline unsigned long page_addr(struct page *page,
-			struct page *anchor, unsigned long anchor_addr)
-{
-	unsigned long offset;
-	unsigned long addr;
-
-	offset = (page_to_pfn(page) - page_to_pfn(anchor)) << PAGE_SHIFT;
-	addr = anchor_addr + offset;
-
-	if (anchor > page) {
-		if (addr > anchor_addr)
-			return 0;
-	} else {
-		if (addr < anchor_addr)
-			return ULONG_MAX;
-	}
-
-	return addr;
-}
-
 /*
  * Returns index of first pte that is not none, or nr if all are none.
  */
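To make the batching boundary in calc_anon_folio_map_pgcount() concrete: the run is capped by whichever comes first of the end of the folio and the end of the zap range, and it is cut short at the first PTE that is either not present or does not map the next consecutive pfn. A stripped-down, self-contained sketch of that scan follows; the fake_pte type and count_contig_run() are simplified stand-ins for illustration, not kernel API:

/* Standalone sketch of the contiguity scan performed by
 * calc_anon_folio_map_pgcount(). The types are simplified stand-ins,
 * not the kernel's pte_t.
 */
#include <stdbool.h>

struct fake_pte {
	bool present;
	unsigned long pfn;	/* page frame number this entry maps */
};

/*
 * Count how many consecutive entries continue the run that starts at
 * index 0 (the anchor, assumed to map first_pfn). The cap 'max' plays
 * the role of min(end of folio, end of zap range) in the patch; the
 * first non-present or non-contiguous entry ends the run early.
 */
int count_contig_run(const struct fake_pte *pte, int max,
		     unsigned long first_pfn)
{
	int i;

	for (i = 1; i < max; i++) {
		if (!pte[i].present || pte[i].pfn != first_pfn + i)
			return i;
	}

	return max;
}

For example, with entries mapping pfns {100, 101, hole, 103} and max = 4, count_contig_run() returns 2, so only the first two PTEs would be cleared and rmap-removed as one batch before the zap_pte_range() loop falls back to handling the next entry on its normal path.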