From patchwork Mon Dec 16 16:51:01 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dev Jain
X-Patchwork-Id: 13910082
From: Dev Jain
To: akpm@linux-foundation.org, david@redhat.com, willy@infradead.org,
 kirill.shutemov@linux.intel.com
Cc: ryan.roberts@arm.com, anshuman.khandual@arm.com,
 catalin.marinas@arm.com, cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com,
 apopple@nvidia.com, dave.hansen@linux.intel.com, will@kernel.org,
 baohua@kernel.org, jack@suse.cz, srivatsa@csail.mit.edu,
 haowenchao22@gmail.com, hughd@google.com, aneesh.kumar@kernel.org,
 yang@os.amperecomputing.com, peterx@redhat.com, ioworker0@gmail.com,
 wangkefeng.wang@huawei.com, ziy@nvidia.com, jglisse@google.com,
 surenb@google.com, vishal.moola@gmail.com, zokeefe@google.com,
 zhengqi.arch@bytedance.com, jhubbard@nvidia.com, 21cnbao@gmail.com,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, Dev Jain
Subject: [RFC PATCH 08/12] khugepaged: Abstract PMD-THP collapse
Date: Mon, 16 Dec 2024 22:21:01 +0530
Message-Id: <20241216165105.56185-9-dev.jain@arm.com>
X-Mailer: git-send-email 2.39.3 (Apple Git-146)
In-Reply-To: <20241216165105.56185-1-dev.jain@arm.com>
References: <20241216165105.56185-1-dev.jain@arm.com>
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Abstract away taking the mmap_lock exclusively, copying page contents, and
setting the PMD, into
vma_collapse_anon_folio_pmd().

Signed-off-by: Dev Jain
---
 mm/khugepaged.c | 119 +++++++++++++++++++++++++++---------------------
 1 file changed, 66 insertions(+), 53 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 078794aa3335..88beebef773e 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1111,58 +1111,17 @@ static int alloc_charge_folio(struct folio **foliop, struct mm_struct *mm,
 	return SCAN_SUCCEED;
 }
 
-static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
-			      int referenced, int unmapped, int order,
-			      struct collapse_control *cc)
+static int vma_collapse_anon_folio_pmd(struct mm_struct *mm, unsigned long address,
+		struct vm_area_struct *vma, struct collapse_control *cc, pmd_t *pmd,
+		struct folio *folio)
 {
+	struct mmu_notifier_range range;
+	spinlock_t *pmd_ptl, *pte_ptl;
 	LIST_HEAD(compound_pagelist);
-	pmd_t *pmd, _pmd;
-	pte_t *pte;
 	pgtable_t pgtable;
-	struct folio *folio;
-	spinlock_t *pmd_ptl, *pte_ptl;
-	int result = SCAN_FAIL;
-	struct vm_area_struct *vma;
-	struct mmu_notifier_range range;
-
-	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
-
-	/*
-	 * Before allocating the hugepage, release the mmap_lock read lock.
-	 * The allocation can take potentially a long time if it involves
-	 * sync compaction, and we do not need to hold the mmap_lock during
-	 * that. We will recheck the vma after taking it again in write mode.
-	 */
-	mmap_read_unlock(mm);
-
-	result = alloc_charge_folio(&folio, mm, order, cc);
-	if (result != SCAN_SUCCEED)
-		goto out_nolock;
-
-	mmap_read_lock(mm);
-	result = hugepage_vma_revalidate(mm, address, true, &vma, order, cc);
-	if (result != SCAN_SUCCEED) {
-		mmap_read_unlock(mm);
-		goto out_nolock;
-	}
-
-	result = find_pmd_or_thp_or_none(mm, address, &pmd);
-	if (result != SCAN_SUCCEED) {
-		mmap_read_unlock(mm);
-		goto out_nolock;
-	}
-
-	if (unmapped) {
-		/*
-		 * __collapse_huge_page_swapin will return with mmap_lock
-		 * released when it fails. So we jump out_nolock directly in
-		 * that case. Continuing to collapse causes inconsistency.
-		 */
-		result = __collapse_huge_page_swapin(mm, vma, address, pmd,
-						     referenced, order);
-		if (result != SCAN_SUCCEED)
-			goto out_nolock;
-	}
+	int result;
+	pmd_t _pmd;
+	pte_t *pte;
 
 	mmap_read_unlock(mm);
 	/*
@@ -1174,7 +1133,8 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	 * mmap_lock.
 	 */
 	mmap_write_lock(mm);
-	result = hugepage_vma_revalidate(mm, address, true, &vma, order, cc);
+
+	result = hugepage_vma_revalidate(mm, address, true, &vma, HPAGE_PMD_ORDER, cc);
 	if (result != SCAN_SUCCEED)
 		goto out_up_write;
 	/* check if the pmd is still valid */
@@ -1206,7 +1166,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	pte = pte_offset_map_lock(mm, &_pmd, address, &pte_ptl);
 	if (pte) {
 		result = __collapse_huge_page_isolate(vma, address, pte, cc,
-						      &compound_pagelist, order);
+						      &compound_pagelist, HPAGE_PMD_ORDER);
 		spin_unlock(pte_ptl);
 	} else {
 		result = SCAN_PMD_NULL;
@@ -1262,11 +1222,64 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	deferred_split_folio(folio, false);
 	spin_unlock(pmd_ptl);
 
-	folio = NULL;
-
 	result = SCAN_SUCCEED;
 out_up_write:
 	mmap_write_unlock(mm);
+	return result;
+}
+
+static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
+			      int referenced, int unmapped, int order,
+			      struct collapse_control *cc)
+{
+	struct vm_area_struct *vma;
+	int result = SCAN_FAIL;
+	struct folio *folio;
+	pmd_t *pmd;
+
+	/*
+	 * Before allocating the hugepage, release the mmap_lock read lock.
+	 * The allocation can take potentially a long time if it involves
+	 * sync compaction, and we do not need to hold the mmap_lock during
+	 * that. We will recheck the vma after taking it again in write mode.
+	 */
+	mmap_read_unlock(mm);
+
+	result = alloc_charge_folio(&folio, mm, order, cc);
+	if (result != SCAN_SUCCEED)
+		goto out_nolock;
+
+	mmap_read_lock(mm);
+	result = hugepage_vma_revalidate(mm, address, true, &vma, order, cc);
+	if (result != SCAN_SUCCEED) {
+		mmap_read_unlock(mm);
+		goto out_nolock;
+	}
+
+	result = find_pmd_or_thp_or_none(mm, address, &pmd);
+	if (result != SCAN_SUCCEED) {
+		mmap_read_unlock(mm);
+		goto out_nolock;
+	}
+
+	if (unmapped) {
+		/*
+		 * __collapse_huge_page_swapin will return with mmap_lock
+		 * released when it fails. So we jump out_nolock directly in
+		 * that case. Continuing to collapse causes inconsistency.
+		 */
+		result = __collapse_huge_page_swapin(mm, vma, address, pmd,
+						     referenced, order);
+		if (result != SCAN_SUCCEED)
+			goto out_nolock;
+	}
+
+	if (order == HPAGE_PMD_ORDER)
+		result = vma_collapse_anon_folio_pmd(mm, address, vma, cc, pmd, folio);
+
+	if (result == SCAN_SUCCEED)
+		folio = NULL;
+
 out_nolock:
 	if (folio)
 		folio_put(folio);
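
The folio ownership handoff in this split can be illustrated outside the kernel. The following is a minimal userspace sketch, not kernel code: the caller allocates an object, the helper keeps it only on success, and the caller drops its reference on every other path. All names in it (struct obj, obj_alloc, obj_put, helper_collapse, dispatch, installed, live_objects) are hypothetical stand-ins and do not exist in mm/khugepaged.c.

```c
#include <assert.h>
#include <stdlib.h>

enum scan_result { SCAN_FAIL, SCAN_SUCCEED };

/* Hypothetical stand-in for a refcounted folio. */
struct obj {
	int refcount;
};

/* Number of objects currently allocated, for checking leaks. */
static int live_objects;

/* Hypothetical stand-in for "the page table now points at the folio". */
static struct obj *installed;

static struct obj *obj_alloc(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	o->refcount = 1;
	live_objects++;
	return o;
}

static void obj_put(struct obj *o)
{
	assert(o->refcount > 0);
	if (--o->refcount == 0) {
		free(o);
		live_objects--;
	}
}

/*
 * Helper in the role of the PMD-specific function: on success it keeps
 * the caller's reference; on failure it leaves the object untouched.
 */
static enum scan_result helper_collapse(struct obj *o, int should_succeed)
{
	if (!should_succeed)
		return SCAN_FAIL;
	installed = o;
	return SCAN_SUCCEED;
}

/*
 * Dispatcher in the role of the generic function: it owns the object
 * until the helper reports success, then forgets its pointer so the
 * common exit path does not drop the reference.
 */
static enum scan_result dispatch(int should_succeed)
{
	enum scan_result result = SCAN_FAIL;
	struct obj *o = obj_alloc();

	if (!o)
		goto out;

	result = helper_collapse(o, should_succeed);
	if (result == SCAN_SUCCEED)
		o = NULL;	/* ownership moved to the helper's target */
out:
	if (o)
		obj_put(o);
	return result;
}
```

Calling dispatch(0) frees the object on the failure path, while dispatch(1) leaves exactly one live object reachable through installed. In this sketch the reference is dropped in a single place, the dispatcher's exit label, so the helper needs no cleanup path of its own.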