From patchwork Tue Feb 11 11:13:18 2025
X-Patchwork-Submitter: Dev Jain <dev.jain@arm.com>
X-Patchwork-Id: 13969528
From: Dev Jain <dev.jain@arm.com>
To: akpm@linux-foundation.org, david@redhat.com, willy@infradead.org,
	kirill.shutemov@linux.intel.com
Cc: npache@redhat.com, ryan.roberts@arm.com, anshuman.khandual@arm.com,
	catalin.marinas@arm.com, cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com,
	apopple@nvidia.com, dave.hansen@linux.intel.com, will@kernel.org,
	baohua@kernel.org, jack@suse.cz, srivatsa@csail.mit.edu,
	haowenchao22@gmail.com, hughd@google.com,
	aneesh.kumar@kernel.org, yang@os.amperecomputing.com, peterx@redhat.com,
	ioworker0@gmail.com, wangkefeng.wang@huawei.com, ziy@nvidia.com,
	jglisse@google.com, surenb@google.com, vishal.moola@gmail.com,
	zokeefe@google.com, zhengqi.arch@bytedance.com, jhubbard@nvidia.com,
	21cnbao@gmail.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Dev Jain <dev.jain@arm.com>
Subject: [PATCH v2 09/17] khugepaged: Define collapse policy if a larger folio is already mapped
Date: Tue, 11 Feb 2025 16:43:18 +0530
Message-Id: <20250211111326.14295-10-dev.jain@arm.com>
X-Mailer: git-send-email 2.39.3 (Apple Git-146)
In-Reply-To: <20250211111326.14295-1-dev.jain@arm.com>
References: <20250211111326.14295-1-dev.jain@arm.com>
MIME-Version: 1.0

As noted in [1], khugepaged's goal must be to collapse memory to the
highest aligned order possible. Suppose khugepaged is scanning for 64K
and encounters a 128K folio whose first 64K half is VA-PA aligned and
fully mapped. In that case it does not make sense to break the folio
down into two 64K folios. On the other hand, if the first half is not
aligned, or is only partially mapped, it does make sense for khugepaged
to collapse that portion into a VA-PA aligned, fully mapped 64K folio.

[1] https://lore.kernel.org/all/aa647830-cf55-48f0-98c2-8230796e35b3@arm.com/
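To make the policy concrete, the decision reduces to roughly the check
below (an illustrative userspace sketch, not the kernel code;
should_skip_collapse() and the pfns[] array are hypothetical stand-ins
for the PTE walk the patch performs):

	#include <stdbool.h>
	#include <stdio.h>

	/*
	 * A PTE range is left alone (SCAN_PTE_MAPPED_THP) only if every
	 * PTE is present, the PFNs are consecutive, the first PFN is
	 * aligned to the scan order, and all pages sit in one folio.
	 */
	static bool should_skip_collapse(const unsigned long *pfns, int nptes,
					 int order, bool same_folio)
	{
		int i;

		/* first PFN not PA-aligned to the scan order */
		if (pfns[0] & ((1UL << order) - 1))
			return false;
		/* a hole or a non-contiguous PFN forces a collapse */
		for (i = 1; i < nptes; i++)
			if (pfns[i] != pfns[i - 1] + 1)
				return false;
		return same_folio;
	}

	int main(void)
	{
		/* 64K scan with 4K pages: order 4, 16 PTEs */
		unsigned long aligned[16], unaligned[16];
		int i;

		for (i = 0; i < 16; i++) {
			aligned[i] = 0x1000 + i;   /* 0x1000 is 16-aligned */
			unaligned[i] = 0x1001 + i; /* first PFN unaligned */
		}
		printf("aligned+contig:   skip=%d\n",
		       should_skip_collapse(aligned, 16, 4, true));
		printf("unaligned+contig: skip=%d\n",
		       should_skip_collapse(unaligned, 16, 4, true));
		return 0;
	}

For a 64K scan on 4K pages, the first range is skipped because it is
already an aligned, fully mapped piece of one folio, while the second
still gets collapsed into a fresh aligned 64K folio.
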
Signed-off-by: Dev Jain <dev.jain@arm.com>
---
 mm/khugepaged.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 65 insertions(+), 1 deletion(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index a674014b6563..0d0d8f415a2e 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -34,6 +34,7 @@ enum scan_result {
 	SCAN_PMD_NULL,
 	SCAN_PMD_NONE,
 	SCAN_PMD_MAPPED,
+	SCAN_PTE_MAPPED_THP,
 	SCAN_EXCEED_NONE_PTE,
 	SCAN_EXCEED_SWAP_PTE,
 	SCAN_EXCEED_SHARED_PTE,
@@ -562,6 +563,14 @@ static bool is_refcount_suitable(struct folio *folio)
 	return folio_ref_count(folio) == expected_refcount;
 }
 
+/* Assumes an embedded PFN */
+static bool is_same_folio(pte_t *first_pte, pte_t *last_pte)
+{
+	struct folio *folio1 = page_folio(pte_page(ptep_get(first_pte)));
+	struct folio *folio2 = page_folio(pte_page(ptep_get(last_pte)));
+	return folio1 == folio2;
+}
+
 static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 					unsigned long address,
 					pte_t *pte,
@@ -575,13 +584,22 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 	bool writable = false;
 	unsigned int max_ptes_shared = khugepaged_max_ptes_shared >> (HPAGE_PMD_ORDER - order);
 	unsigned int max_ptes_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order);
+	bool all_pfns_present = true;
+	bool all_pfns_contig = true;
+	bool first_pfn_aligned = true;
+	pte_t prev_pteval;
 
 	for (_pte = pte; _pte < pte + (1UL << order);
 	     _pte++, address += PAGE_SIZE) {
 		pte_t pteval = ptep_get(_pte);
+		if (_pte == pte) {
+			if (!IS_ALIGNED(pte_pfn(pteval), (1UL << order)))
+				first_pfn_aligned = false;
+		}
 		if (pte_none(pteval) || (pte_present(pteval) &&
 				is_zero_pfn(pte_pfn(pteval)))) {
 			++none_or_zero;
+			all_pfns_present = false;
 			if (!userfaultfd_armed(vma) &&
 			    (!cc->is_khugepaged ||
 			     none_or_zero <= max_ptes_none)) {
@@ -660,6 +678,12 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 			goto out;
 		}
 
+		if (all_pfns_contig && (pte != _pte) && !(all_pfns_present &&
+		    (pte_pfn(pteval) == pte_pfn(prev_pteval) + 1)))
+			all_pfns_contig = false;
+
+		prev_pteval = pteval;
+
 		/*
 		 * Isolate the page to avoid collapsing an hugepage
 		 * currently in use by the VM.
@@ -696,6 +720,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		result = SCAN_PAGE_RO;
 	} else if (unlikely(cc->is_khugepaged && !referenced)) {
 		result = SCAN_LACK_REFERENCED_PAGE;
+	} else if ((result == SCAN_SUCCEED) && (order != HPAGE_PMD_ORDER) && all_pfns_present &&
+		   all_pfns_contig && first_pfn_aligned &&
+		   is_same_folio(pte, pte + (1UL << order) - 1)) {
+		result = SCAN_PTE_MAPPED_THP;
 	} else {
 		result = SCAN_SUCCEED;
 		trace_mm_collapse_huge_page_isolate(&folio->page, none_or_zero,
@@ -1398,6 +1426,8 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 	bool writable = false;
 	unsigned long orders, orig_orders;
 	int order, prev_order;
+	bool all_pfns_present, all_pfns_contig, first_pfn_aligned;
+	pte_t prev_pteval;
 
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
@@ -1417,6 +1447,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 	max_ptes_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order);
 	max_ptes_swap = khugepaged_max_ptes_swap >> (HPAGE_PMD_ORDER - order);
 	referenced = 0, shared = 0, none_or_zero = 0, unmapped = 0;
+	all_pfns_present = true, all_pfns_contig = true, first_pfn_aligned = true;
 
 	/* Check pmd after taking mmap lock */
 	result = find_pmd_or_thp_or_none(mm, address, &pmd);
@@ -1435,8 +1466,14 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 	for (_address = address, _pte = pte; _pte < pte + (1UL << order);
 	     _pte++, _address += PAGE_SIZE) {
 		pte_t pteval = ptep_get(_pte);
+		if (_pte == pte) {
+			if (!IS_ALIGNED(pte_pfn(pteval), (1UL << order)))
+				first_pfn_aligned = false;
+		}
+
 		if (is_swap_pte(pteval)) {
 			++unmapped;
+			all_pfns_present = false;
 			if (!cc->is_khugepaged ||
 			    unmapped <= max_ptes_swap) {
 				/*
@@ -1457,6 +1494,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 		}
 		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
 			++none_or_zero;
+			all_pfns_present = false;
 			if (!userfaultfd_armed(vma) &&
 			    (!cc->is_khugepaged ||
 			     none_or_zero <= max_ptes_none)) {
@@ -1546,6 +1584,17 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 			goto out_unmap;
 		}
 
+		/*
+		 * The PFNs are not contiguous if at least one PFN is not
+		 * present, or the previous and current PFNs are not consecutive.
+		 */
+		if (all_pfns_contig && (pte != _pte) && !(all_pfns_present &&
+		    (pte_pfn(pteval) == pte_pfn(prev_pteval) + 1)))
+			all_pfns_contig = false;
+
+		prev_pteval = pteval;
+
 		/*
 		 * If collapse was initiated by khugepaged, check that there is
 		 * enough young pte to justify collapsing the page
@@ -1567,15 +1616,30 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 	}
 out_unmap:
 	pte_unmap_unlock(pte, ptl);
+
+	/*
+	 * Skip the collapse if all of the following hold:
+	 * 1) All PTEs point to consecutive PFNs
+	 * 2) All PFNs belong to the same folio
+	 * 3) The PFNs are PA-aligned to the order we are scanning for
+	 */
+	if ((result == SCAN_SUCCEED) && (order != HPAGE_PMD_ORDER) && all_pfns_present &&
+	    all_pfns_contig && first_pfn_aligned &&
+	    is_same_folio(pte, pte + (1UL << order) - 1)) {
+		result = SCAN_PTE_MAPPED_THP;
+		goto decide_order;
+	}
+
 	if (result == SCAN_SUCCEED) {
 		result = collapse_huge_page(mm, address, referenced,
 					    unmapped, order, cc);
 		/* collapse_huge_page will return with the mmap_lock released */
 		*mmap_locked = false;
 		/* Skip over this range and decide order */
-		if (result == SCAN_SUCCEED)
+		if (result == SCAN_SUCCEED || result == SCAN_PTE_MAPPED_THP)
 			goto decide_order;
 	}
+
 	if (result != SCAN_SUCCEED) {
 		/* Go to the next order */