From patchwork Fri Apr 14 13:03:01 2023
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 13211467
From: Ryan Roberts
To: Andrew Morton, "Matthew Wilcox (Oracle)", Yu Zhao, "Yin, Fengwei"
Cc: Ryan Roberts, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org
Subject: [RFC v2 PATCH 15/17] mm: Convert zero page to large folios on write
Date: Fri, 14 Apr 2023 14:03:01 +0100
Message-Id: <20230414130303.2345383-16-ryan.roberts@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20230414130303.2345383-1-ryan.roberts@arm.com>
References: <20230414130303.2345383-1-ryan.roberts@arm.com>

A read fault causes the zero page to be mapped read-only. A subsequent
write fault causes the zero page to be replaced with a zero-filled
private anonymous page. Change the write fault behaviour to replace the
zero page with a large anonymous folio, allocated using the same policy
as if the write fault had happened without the previous read fault.

Experimentation shows that reading multiple contiguous pages is
extremely rare without interleaved writes, so we don't bother to map a
large zero page. We just use the small zero page as a marker and expand
the allocation at the write fault.

Signed-off-by: Ryan Roberts
---
 mm/memory.c | 115 ++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 80 insertions(+), 35 deletions(-)

--
2.25.1
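For reviewers who want to see the scenario outside the kernel: the snippet
below is a minimal user-space illustration (not part of the patch) of the
access pattern the commit message describes. The read loop populates the
mapping with read-only zero-page PTEs; the single write then takes the wp
fault that wp_page_copy() handles. Whether that fault is actually satisfied
with a large folio depends on the allocation policy added earlier in this
series and on the PTE/VMA bounds checks below.

/*
 * Illustration only, not part of the patch: reading a fresh private
 * anonymous mapping maps every page to the shared zero page; the first
 * write then takes the wp fault changed by this patch.
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 2UL * 1024 * 1024;
	long pgsz = sysconf(_SC_PAGESIZE);
	unsigned char *p, sink = 0;
	size_t i;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Read faults: each PTE is populated with the read-only zero page. */
	for (i = 0; i < len; i += pgsz)
		sink += p[i];

	/* Write fault: the zero page here is replaced; with this patch,
	 * potentially by a large anonymous folio rather than one 4K page. */
	p[0] = sink;

	munmap(p, len);
	return 0;
}

Previously the write to p[0] would CoW a single zero-filled 4K page; with
this patch the fault may be backed by a folio of up to
max_anon_folio_order(vma) pages when the neighbouring PTEs are still none or
the same zero-page entry.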
diff --git a/mm/memory.c b/mm/memory.c
index 61cec97a57f3..fac686e9f895 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3110,6 +3110,23 @@ static inline int check_ptes_contig_ro(pte_t *pte, int nr, unsigned long pfn)
 	return nr;
 }
 
+/*
+ * Checks that all ptes are none except for the pte at offset, which should be
+ * entry. Returns index of first pte that does not meet expectations, or nr if
+ * all are correct.
+ */
+static inline int check_ptes_none_or_entry(pte_t *pte, int nr,
+					pte_t entry, unsigned long offset)
+{
+	int ret;
+
+	ret = check_ptes_none(pte, offset);
+	if (ret == offset && pte_same(pte[offset], entry))
+		ret += 1 + check_ptes_none(pte + offset + 1, nr - offset - 1);
+
+	return ret;
+}
+
 static int calc_anon_folio_order_alloc(struct vm_fault *vmf, int order)
 {
 	/*
@@ -3141,6 +3158,7 @@ static int calc_anon_folio_order_alloc(struct vm_fault *vmf, int order)
 	pte_t *pte;
 	pte_t *first_set = NULL;
 	int ret;
+	unsigned long offset;
 
 	if (has_transparent_hugepage()) {
 		order = min(order, PMD_SHIFT - PAGE_SHIFT);
@@ -3148,7 +3166,8 @@ static int calc_anon_folio_order_alloc(struct vm_fault *vmf, int order)
 		for (; order > 1; order--) {
 			nr = 1 << order;
 			addr = ALIGN_DOWN(vmf->address, nr << PAGE_SHIFT);
-			pte = vmf->pte - ((vmf->address - addr) >> PAGE_SHIFT);
+			offset = ((vmf->address - addr) >> PAGE_SHIFT);
+			pte = vmf->pte - offset;
 
 			/* Check vma bounds. */
 			if (addr < vma->vm_start ||
@@ -3163,8 +3182,9 @@ static int calc_anon_folio_order_alloc(struct vm_fault *vmf, int order)
 			if (pte <= first_set)
 				continue;
 
-			/* Need to check if all the ptes are none. */
-			ret = check_ptes_none(pte, nr);
+			/* Need to check if all the ptes are none or entry. */
+			ret = check_ptes_none_or_entry(pte, nr,
+							vmf->orig_pte, offset);
 
 			if (ret == nr)
 				break;
@@ -3479,13 +3499,15 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 	struct mmu_notifier_range range;
 	int ret;
 	pte_t orig_pte;
-	unsigned long addr = vmf->address;
-	int order = 0;
-	int pgcount = BIT(order);
-	unsigned long offset = 0;
+	unsigned long addr;
+	int order;
+	int pgcount;
+	unsigned long offset;
 	unsigned long pfn;
 	struct page *page;
 	int i;
+	bool zero;
+	bool anon;
 
 	delayacct_wpcopy_start();
 
@@ -3494,36 +3516,54 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 	if (unlikely(anon_vma_prepare(vma)))
 		goto oom;
 
+	/*
+	 * Set the upper bound of the folio allocation order. If we hit a zero
+	 * page, we allocate a folio with the same policy as allocation upon
+	 * write fault. If we are copying an anon folio, then limit ourself to
+	 * its order as we don't want to copy from multiple folios. For all
+	 * other cases (e.g. file-mapped) CoW a single page.
+	 */
 	if (is_zero_pfn(pte_pfn(vmf->orig_pte))) {
-		new_folio = vma_alloc_movable_folio(vma, vmf->address, 0, true);
-		if (!new_folio)
-			goto oom;
-	} else {
-		if (old_folio && folio_test_anon(old_folio)) {
-			order = min_t(int, folio_order(old_folio),
+		zero = true;
+		anon = false;
+		order = max_anon_folio_order(vma);
+	} else if (old_folio && folio_test_anon(old_folio)) {
+		zero = false;
+		anon = true;
+		order = min_t(int, folio_order(old_folio),
 						max_anon_folio_order(vma));
+	} else {
+		zero = false;
+		anon = false;
+		order = 0;
+	}
+
 retry:
-			/*
-			 * Estimate the folio order to allocate. We are not
-			 * under the ptl here so this estimate needs to be
-			 * re-checked later once we have the lock.
-			 */
-			vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
-			order = calc_anon_folio_order_copy(vmf, old_folio, order);
-			pte_unmap(vmf->pte);
-		}
+	/*
+	 * Estimate the folio order to allocate. We are not under the ptl here
+	 * so this estimate needs to be re-checked later once we have the lock.
+	 */
+	if (zero || anon) {
+		vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
+		order = zero ? calc_anon_folio_order_alloc(vmf, order) :
+				calc_anon_folio_order_copy(vmf, old_folio, order);
+		pte_unmap(vmf->pte);
+	}
 
-		new_folio = try_vma_alloc_movable_folio(vma, vmf->address,
-							order, false);
-		if (!new_folio)
-			goto oom;
+	/* Allocate the new folio. */
+	new_folio = try_vma_alloc_movable_folio(vma, vmf->address, order, zero);
+	if (!new_folio)
+		goto oom;
 
-		/* We may have been granted less than we asked for. */
-		order = folio_order(new_folio);
-		pgcount = BIT(order);
-		addr = ALIGN_DOWN(vmf->address, pgcount << PAGE_SHIFT);
-		offset = ((vmf->address - addr) >> PAGE_SHIFT);
+	/* We may have been granted less than we asked for. */
+	order = folio_order(new_folio);
+	pgcount = BIT(order);
+	addr = ALIGN_DOWN(vmf->address, pgcount << PAGE_SHIFT);
+	offset = ((vmf->address - addr) >> PAGE_SHIFT);
+	pfn = pte_pfn(vmf->orig_pte) - offset;
 
+	/* Copy contents. */
+	if (!zero) {
 		if (likely(old_folio))
 			ret = __wp_page_copy_user_range(&new_folio->page,
 							vmf->page - offset,
@@ -3561,8 +3601,14 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 	 * Re-check the pte(s) - we dropped the lock
 	 */
 	vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl);
-	pfn = pte_pfn(vmf->orig_pte) - offset;
-	if (likely(check_ptes_contig_ro(vmf->pte, pgcount, pfn) == pgcount)) {
+
+	if (zero)
+		ret = check_ptes_none_or_entry(vmf->pte, pgcount,
+						vmf->orig_pte, offset);
+	else
+		ret = check_ptes_contig_ro(vmf->pte, pgcount, pfn);
+
+	if (likely(ret == pgcount)) {
 		if (old_folio) {
 			if (!folio_test_anon(old_folio)) {
 				VM_BUG_ON(order != 0);
@@ -3570,8 +3616,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 				inc_mm_counter(mm, MM_ANONPAGES);
 			}
 		} else {
-			VM_BUG_ON(order != 0);
-			inc_mm_counter(mm, MM_ANONPAGES);
+			add_mm_counter(mm, MM_ANONPAGES, pgcount);
 		}
 
 		flush_cache_range(vma, addr, addr + (pgcount << PAGE_SHIFT));
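
Not part of the patch, but perhaps useful while reviewing the re-check logic
above: a standalone, simplified model of the check_ptes_none_or_entry()
expectation, with plain ints standing in for pte_t and 0 standing in for
pte_none(). The slot at 'offset' must still hold the original entry (the
zero-page PTE in this path) and every other slot in the range must be none;
the helper returns the index of the first slot that breaks that expectation,
or nr when the whole range qualifies.

/* Simplified user-space model of check_ptes_none_or_entry(); not kernel code. */
#include <stdio.h>

/* Index of first non-zero slot, or nr if all nr slots are zero ("none"). */
static int check_none(const int *pte, int nr)
{
	int i;

	for (i = 0; i < nr; i++)
		if (pte[i] != 0)
			return i;
	return nr;
}

/*
 * All slots must be none except pte[offset], which must equal entry.
 * Returns the index of the first unexpected slot, or nr on success.
 */
static int check_none_or_entry(const int *pte, int nr, int entry, int offset)
{
	int ret = check_none(pte, offset);

	if (ret == offset && pte[offset] == entry)
		ret += 1 + check_none(pte + offset + 1, nr - offset - 1);
	return ret;
}

int main(void)
{
	int ptes[4] = { 0, 42, 0, 0 };	/* "zero-page" entry at offset 1 */

	printf("%d\n", check_none_or_entry(ptes, 4, 42, 1));	/* 4: whole range OK */

	ptes[3] = 7;	/* a racing fault populated slot 3 */
	printf("%d\n", check_none_or_entry(ptes, 4, 42, 1));	/* 3: stop there */
	return 0;
}

The patch applies this expectation twice: once (unlocked) via
calc_anon_folio_order_alloc() to estimate the folio order, and again under
the ptl before deciding to proceed with the large folio.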