From patchwork Sat Jan 11 07:58:20 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ge Yang
X-Patchwork-Id: 13935935
From: yangge1116@126.com
To: akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, 21cnbao@gmail.com,
 david@redhat.com, baolin.wang@linux.alibaba.com, sj@kernel.org,
 liuzixing@hygon.cn, yangge
Subject: [PATCH V5] mm: replace free hugepage folios after migration
Date: Sat, 11 Jan 2025 15:58:20 +0800
Message-Id: <1736582300-11364-1-git-send-email-yangge1116@126.com>
X-Mailer: git-send-email 2.7.4

From: yangge

My machine has 4 NUMA nodes, each equipped with 32GB of memory.
I have configured each NUMA node with 16GB of CMA and 16GB of in-use
hugetlb pages. The allocation of contiguous memory via cma_alloc() can
fail probabilistically.

When there are free hugetlb folios in the hugetlb pool, during the
migration of in-use hugetlb folios, new folios are allocated from the
free hugetlb pool. After the migration is completed, the old folios are
released back to the free hugetlb pool instead of being returned to the
buddy system. This can cause test_pages_isolated() check to fail,
ultimately leading to the failure of cma_alloc().

Call trace:
cma_alloc()
    __alloc_contig_migrate_range() // migrate in-use hugepage
    test_pages_isolated()
        __test_page_isolated_in_pageblock()
            PageBuddy(page) // check if the page is in buddy

To address this issue, we introduce a function named
replace_free_hugepage_folios(). This function will replace the hugepage
in the free hugepage pool with a new one and release the old one to the
buddy system. After the migration of in-use hugetlb pages is completed,
we will invoke replace_free_hugepage_folios() to ensure that these
hugepages are properly released to the buddy system. Following this
step, when test_pages_isolated() is executed for inspection, it will
successfully pass.

Additionally, when alloc_contig_range() is used to migrate multiple
in-use hugetlb pages, it can result in some in-use hugetlb pages being
released back to the free hugetlb pool and subsequently being
reallocated and used again. For example:

[huge 0] [huge 1]

To migrate huge 0, we obtain huge x from the pool. After the migration
is completed, we return the now-freed huge 0 back to the pool. When
it's time to migrate huge 1, we can simply reuse the now-freed huge 0
from the pool. As a result, when replace_free_hugepage_folios() is
executed, it cannot release huge 0 back to the buddy system. To address
this issue, we should prevent the reuse of isolated free hugepages
during the migration process.

Link: https://lkml.kernel.org/r/1734503588-16254-1-git-send-email-yangge1116@126.com
Signed-off-by: yangge
---

V5:
- squash V1 ~ V4 into one fix

V4:
- mm/hugetlb: prevent reuse of isolated free hugepages

V3:
- mm/hugetlb: define replace_free_hugepage_folios() on CONFIG_HUGETLB_PAGE=n as static inline

V2:
- fix comments, 80-column tweak

 include/linux/hugetlb.h |  7 +++++++
 mm/hugetlb.c            | 42 ++++++++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c         | 12 +++++++++++-
 3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ae4fe86..10faf42 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -681,6 +681,7 @@ struct huge_bootmem_page {
 };
 
 int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
+int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
 struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 				unsigned long addr, int avoid_reserve);
 struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
@@ -1059,6 +1060,12 @@ static inline int isolate_or_dissolve_huge_page(struct page *page,
 	return -ENOMEM;
 }
 
+static inline int replace_free_hugepage_folios(unsigned long start_pfn,
+		unsigned long end_pfn)
+{
+	return 0;
+}
+
 static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 					   unsigned long addr,
 					   int avoid_reserve)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1672bfd..312ed27 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -48,6 +48,7 @@
 #include
 #include "internal.h"
 #include "hugetlb_vmemmap.h"
+#include
 
 int hugetlb_max_hstate __read_mostly;
 unsigned int default_hstate_idx;
@@ -1336,6 +1337,9 @@ static struct folio *dequeue_hugetlb_folio_node_exact(struct hstate *h,
 		if (folio_test_hwpoison(folio))
 			continue;
 
+		if (is_migrate_isolate_page(&folio->page))
+			continue;
+
 		list_move(&folio->lru, &h->hugepage_activelist);
 		folio_ref_unfreeze(folio, 1);
 		folio_clear_hugetlb_freed(folio);
@@ -2975,6 +2979,44 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list)
 	return ret;
 }
 
+/*
+ * replace_free_hugepage_folios - Replace free hugepage folios in a given pfn
+ * range with new folios.
+ * @start_pfn: start pfn of the given pfn range
+ * @end_pfn: end pfn of the given pfn range
+ * Returns 0 on success, otherwise negated error.
+ */
+int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
+{
+	struct hstate *h;
+	struct folio *folio;
+	int ret = 0;
+
+	LIST_HEAD(isolate_list);
+
+	while (start_pfn < end_pfn) {
+		folio = pfn_folio(start_pfn);
+		if (folio_test_hugetlb(folio)) {
+			h = folio_hstate(folio);
+		} else {
+			start_pfn++;
+			continue;
+		}
+
+		if (!folio_ref_count(folio)) {
+			ret = alloc_and_dissolve_hugetlb_folio(h, folio,
+							       &isolate_list);
+			if (ret)
+				break;
+
+			putback_movable_pages(&isolate_list);
+		}
+		start_pfn++;
+	}
+
+	return ret;
+}
+
 struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 				    unsigned long addr, int avoid_reserve)
 {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 681a6fa..aa70d0e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6507,7 +6507,17 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
 	ret = __alloc_contig_migrate_range(&cc, start, end, migratetype);
 	if (ret && ret != -EBUSY)
 		goto done;
-	ret = 0;
+
+	/*
+	 * When in-use hugetlb pages are migrated, they may simply be released
+	 * back into the free hugepage pool instead of being returned to the
+	 * buddy system. After the migration of in-use huge pages is completed,
+	 * we will invoke replace_free_hugepage_folios() to ensure that these
+	 * hugepages are properly released to the buddy system.
+	 */
+	ret = replace_free_hugepage_folios(start, end);
+	if (ret)
+		goto done;
 
 	/*
 	 * Pages from [start, end) are within a pageblock_nr_pages