From patchwork Tue Feb 11 11:13:16 2025
X-Patchwork-Submitter: Dev Jain
X-Patchwork-Id: 13969522
From: Dev Jain
To: akpm@linux-foundation.org, david@redhat.com, willy@infradead.org,
	kirill.shutemov@linux.intel.com
Cc: npache@redhat.com, ryan.roberts@arm.com, anshuman.khandual@arm.com,
	catalin.marinas@arm.com, cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com,
	apopple@nvidia.com, dave.hansen@linux.intel.com, will@kernel.org,
	baohua@kernel.org, jack@suse.cz, srivatsa@csail.mit.edu,
	haowenchao22@gmail.com, hughd@google.com, aneesh.kumar@kernel.org,
	yang@os.amperecomputing.com, peterx@redhat.com, ioworker0@gmail.com,
	wangkefeng.wang@huawei.com, ziy@nvidia.com, jglisse@google.com,
	surenb@google.com, vishal.moola@gmail.com, zokeefe@google.com,
	zhengqi.arch@bytedance.com, jhubbard@nvidia.com, 21cnbao@gmail.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, Dev Jain
Subject: [PATCH v2 07/17] khugepaged: Scan PTEs order-wise
Date: Tue, 11 Feb 2025 16:43:16 +0530
Message-Id: <20250211111326.14295-8-dev.jain@arm.com>
In-Reply-To: <20250211111326.14295-1-dev.jain@arm.com>
References: <20250211111326.14295-1-dev.jain@arm.com>

Scan the PTEs order-wise, using the mask of suitable orders for this VMA
derived in conjunction with sysfs THP settings.
Scale down the tunables (to be changed in subsequent patches); in case of
collapse failure, we drop down to the next order. Otherwise, we try to jump
to the highest possible order and then start a fresh scan. Note that
madvise(MADV_COLLAPSE) has not been generalized.

Signed-off-by: Dev Jain
---
 mm/khugepaged.c | 97 ++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 83 insertions(+), 14 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 498cb5ad9ff1..fbfd8a78ef51 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -21,6 +21,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -1295,36 +1296,57 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 {
 	pmd_t *pmd;
 	pte_t *pte, *_pte;
-	int result = SCAN_FAIL, referenced = 0;
-	int none_or_zero = 0, shared = 0;
-	struct page *page = NULL;
 	struct folio *folio = NULL;
-	unsigned long _address;
+	int result = SCAN_FAIL;
 	spinlock_t *ptl;
-	int node = NUMA_NO_NODE, unmapped = 0;
+	unsigned int max_ptes_shared, max_ptes_none, max_ptes_swap;
+	int referenced, shared, none_or_zero, unmapped;
+	unsigned long _address, orig_address = address;
+	int node = NUMA_NO_NODE;
 	bool writable = false;
+	unsigned long orders, orig_orders;
+	int order, prev_order;
 
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
+	orders = thp_vma_allowable_orders(vma, vma->vm_flags,
+			TVA_IN_PF | TVA_ENFORCE_SYSFS, THP_ORDERS_ALL_ANON);
+	orders = thp_vma_suitable_orders(vma, address, orders);
+	orig_orders = orders;
+	order = highest_order(orders);
+
+	/* MADV_COLLAPSE needs to work irrespective of sysfs setting */
+	if (!cc->is_khugepaged)
+		order = HPAGE_PMD_ORDER;
+
+scan_pte_range:
+
+	max_ptes_shared = khugepaged_max_ptes_shared >> (HPAGE_PMD_ORDER - order);
+	max_ptes_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order);
+	max_ptes_swap = khugepaged_max_ptes_swap >> (HPAGE_PMD_ORDER - order);
+	referenced = 0, shared = 0, none_or_zero = 0, unmapped = 0;
+
+	/* Check pmd after taking mmap lock */
 	result = find_pmd_or_thp_or_none(mm, address, &pmd);
 	if (result != SCAN_SUCCEED)
 		goto out;
 
 	memset(cc->node_load, 0, sizeof(cc->node_load));
 	nodes_clear(cc->alloc_nmask);
+
 	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
 	if (!pte) {
 		result = SCAN_PMD_NULL;
 		goto out;
 	}
 
-	for (_address = address, _pte = pte; _pte < pte + HPAGE_PMD_NR;
+	for (_address = address, _pte = pte; _pte < pte + (1UL << order);
 	     _pte++, _address += PAGE_SIZE) {
 		pte_t pteval = ptep_get(_pte);
 		if (is_swap_pte(pteval)) {
 			++unmapped;
 			if (!cc->is_khugepaged ||
-			    unmapped <= khugepaged_max_ptes_swap) {
+			    unmapped <= max_ptes_swap) {
 				/*
 				 * Always be strict with uffd-wp
 				 * enabled swap entries. Please see
@@ -1345,7 +1367,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 			++none_or_zero;
 			if (!userfaultfd_armed(vma) &&
 			    (!cc->is_khugepaged ||
-			     none_or_zero <= khugepaged_max_ptes_none)) {
+			     none_or_zero <= max_ptes_none)) {
 				continue;
 			} else {
 				result = SCAN_EXCEED_NONE_PTE;
@@ -1369,12 +1391,11 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 		if (pte_write(pteval))
 			writable = true;
 
-		page = vm_normal_page(vma, _address, pteval);
-		if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
+		folio = vm_normal_folio(vma, _address, pteval);
+		if (unlikely(!folio) || unlikely(folio_is_zone_device(folio))) {
 			result = SCAN_PAGE_NULL;
 			goto out_unmap;
 		}
-		folio = page_folio(page);
 
 		if (!folio_test_anon(folio)) {
 			result = SCAN_PAGE_ANON;
@@ -1390,7 +1411,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 		if (folio_likely_mapped_shared(folio)) {
 			++shared;
 			if (cc->is_khugepaged &&
-			    shared > khugepaged_max_ptes_shared) {
+			    shared > max_ptes_shared) {
 				result = SCAN_EXCEED_SHARED_PTE;
 				count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
 				goto out_unmap;
@@ -1447,7 +1468,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 		result = SCAN_PAGE_RO;
 	} else if (cc->is_khugepaged &&
 		   (!referenced ||
-		    (unmapped && referenced < HPAGE_PMD_NR / 2))) {
+		    (unmapped && referenced < (1UL << order) / 2))) {
 		result = SCAN_LACK_REFERENCED_PAGE;
 	} else {
 		result = SCAN_SUCCEED;
@@ -1456,10 +1477,58 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 	pte_unmap_unlock(pte, ptl);
 	if (result == SCAN_SUCCEED) {
 		result = collapse_huge_page(mm, address, referenced,
-					    unmapped, HPAGE_PMD_ORDER, cc);
+					    unmapped, order, cc);
 		/* collapse_huge_page will return with the mmap_lock released */
 		*mmap_locked = false;
+		/* Skip over this range and decide order */
+		if (result == SCAN_SUCCEED)
+			goto decide_order;
+	}
+	if (result != SCAN_SUCCEED) {
+
+		/* Go to the next order */
+		prev_order = order;
+		order = next_order(&orders, order);
+		if (order < 2) {
+			/* Skip over this range, and decide order */
+			_address = address + (PAGE_SIZE << prev_order);
+			_pte = pte + (1UL << prev_order);
+			goto decide_order;
+		}
+		goto maybe_mmap_lock;
 	}
+
+decide_order:
+	/* Immediately exit on exhaustion of range */
+	if (_address == orig_address + (PAGE_SIZE << HPAGE_PMD_ORDER))
+		goto out;
+
+	/* Get highest order possible starting from address */
+	order = count_trailing_zeros(_address >> PAGE_SHIFT);
+
+	orders = orig_orders & ((1UL << (order + 1)) - 1);
+	if (!(orders & (1UL << order)))
+		order = next_order(&orders, order);
+
+	/* This should never happen, since we are on an aligned address */
+	BUG_ON(cc->is_khugepaged && order < 2);
+
+	address = _address;
+	pte = _pte;
+
+maybe_mmap_lock:
+	if (!(*mmap_locked)) {
+		mmap_read_lock(mm);
+		*mmap_locked = true;
+		/* Validate VMA after retaking mmap_lock */
+		result = hugepage_vma_revalidate(mm, address, true, &vma,
+						 order, cc);
+		if (result != SCAN_SUCCEED) {
+			mmap_read_unlock(mm);
+			goto out;
+		}
+	}
+	goto scan_pte_range;
 out:
 	trace_mm_khugepaged_scan_pmd(mm, &folio->page, writable, referenced,
 				     none_or_zero, result, unmapped);
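
Illustrative note (not part of the patch): the two calculations introduced
above can be tried out in user space. The sketch below is a rough
approximation, not kernel code. It shows the right shift by
(HPAGE_PMD_ORDER - order) used to scale a PMD-level tunable, and the choice
of the next order from the alignment of the page index and the mask of
enabled orders. SKETCH_PMD_ORDER (9, assuming 4K pages and a 2M PMD) and the
helpers scale_tunable(), trailing_zeros(), highest_order_at_or_below() and
pick_order() are hypothetical names chosen for illustration; the patch itself
uses count_trailing_zeros() and next_order().

```c
#include <assert.h>

/* Assumption: 4K base pages with a 2M PMD, so the PMD order is 9. */
#define SKETCH_PMD_ORDER 9

/* Scale a PMD-level limit down to 'order' with the same right shift as the patch. */
static unsigned int scale_tunable(unsigned int pmd_limit, int order)
{
	assert(order >= 0 && order <= SKETCH_PMD_ORDER);
	return pmd_limit >> (SKETCH_PMD_ORDER - order);
}

/*
 * Number of trailing zero bits in x. Capping the result at SKETCH_PMD_ORDER
 * (which also covers x == 0) is a choice made for this sketch only.
 */
static int trailing_zeros(unsigned long x)
{
	int n = 0;

	while (n < SKETCH_PMD_ORDER && !(x & 1UL)) {
		x >>= 1;
		n++;
	}
	return n;
}

/* Highest order <= 'order' whose bit is set in 'orders', or -1 if there is none. */
static int highest_order_at_or_below(unsigned long orders, int order)
{
	for (; order >= 0; order--)
		if (orders & (1UL << order))
			return order;
	return -1;
}

/* Largest enabled order that the page index is naturally aligned to. */
static int pick_order(unsigned long page_index, unsigned long enabled_orders)
{
	return highest_order_at_or_below(enabled_orders,
					 trailing_zeros(page_index));
}
```

With orders 4 and 9 enabled, pick_order(0x210, (1UL << 4) | (1UL << 9))
yields 4, and a PMD-level limit of 511 scales to 511 >> 5 = 15 at order 4.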