From patchwork Tue Apr 1 08:23:39 2025
X-Patchwork-Submitter: Jinjiang Tu
X-Patchwork-Id: 14034475
From: Jinjiang Tu <tujinjiang@huawei.com>
Subject: [PATCH v2] mm/hugetlb: fix set_max_huge_pages() when there are surplus pages
Date: Tue, 1 Apr 2025 16:23:39 +0800
Message-ID: <20250401082339.676723-1-tujinjiang@huawei.com>
X-Mailer: git-send-email 2.43.0
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
In set_max_huge_pages(), min_count is supposed to be the number of
persistent huge pages that must be retained, but it also counts surplus
huge pages. This leads to a failure to free free huge pages for a node.

Steps to reproduce:
1) create 5 hugetlb folios in Node0
2) run a program to use all the hugetlb folios
3) echo 0 > nr_hugepages for Node0 to free the hugetlb folios. Thus the
   5 hugetlb folios in Node0 are accounted as surplus.
4) create 5 hugetlb folios in Node1
5) echo 0 > nr_hugepages for Node1 to free the hugetlb folios

The result:
        Node0    Node1
Total       5        5
Free        0        5
Surp        5        5

We can't simply subtract surplus_huge_pages from min_count, since free
hugetlb folios may themselves be surplus due to HVO. In
__update_and_free_hugetlb_folio(), hugetlb_vmemmap_restore_folio() may
fail, in which case the folio is added back to the pool and treated as
surplus. If we directly subtracted surplus_huge_pages from min_count,
some free folios would be subtracted twice.

To fix it, check whether count is less than the number of free huge pages
that could be destroyed (i.e., available_huge_pages(h)), and remove
hugetlb folios if so. Since there may be free surplus hugetlb folios,
remove surplus folios first so that the surplus count stays correct.

The result with this patch:
        Node0    Node1
Total       5        0
Free        0        0
Surp        5        0

Fixes: 9a30523066cd ("hugetlb: add per node hstate attributes")
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
Changelog since v1:
 * Handle free surplus pages due to HVO

 mm/hugetlb.c | 47 ++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 40 insertions(+), 7 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 318624c96584..57319ff7856b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3577,7 +3577,7 @@ static void try_to_free_low(struct hstate *h, unsigned long count,
 		struct folio *folio, *next;
 		struct list_head *freel = &h->hugepage_freelists[i];
 		list_for_each_entry_safe(folio, next, freel, lru) {
-			if (count >= h->nr_huge_pages)
+			if (count >= available_huge_pages(h))
 				goto out;
 			if (folio_test_highmem(folio))
 				continue;
@@ -3631,11 +3631,30 @@ static int adjust_pool_surplus(struct hstate *h, nodemask_t *nodes_allowed,
 	return 1;
 }
 
+static struct folio *remove_surplus_pool_hugetlb_folio(struct hstate *h,
+		nodemask_t *nodes_allowed)
+{
+	int nr_nodes, node;
+	struct folio *folio = NULL;
+
+	lockdep_assert_held(&hugetlb_lock);
+	for_each_node_mask_to_free(h, nr_nodes, node, nodes_allowed) {
+		if (h->surplus_huge_pages_node[node] &&
+		    !list_empty(&h->hugepage_freelists[node])) {
+			folio = list_entry(h->hugepage_freelists[node].next,
+					struct folio, lru);
+			remove_hugetlb_folio(h, folio, true);
+			break;
+		}
+	}
+
+	return folio;
+}
+
 #define persistent_huge_pages(h) (h->nr_huge_pages - h->surplus_huge_pages)
 static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 			      nodemask_t *nodes_allowed)
 {
-	unsigned long min_count;
 	unsigned long allocated;
 	struct folio *folio;
 	LIST_HEAD(page_list);
@@ -3770,15 +3789,29 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 	 * and won't grow the pool anywhere else. Not until one of the
 	 * sysctls are changed, or the surplus pages go out of use.
 	 */
-	min_count = h->resv_huge_pages + h->nr_huge_pages - h->free_huge_pages;
-	min_count = max(count, min_count);
-	try_to_free_low(h, min_count, nodes_allowed);
+	try_to_free_low(h, count, nodes_allowed);
 
 	/*
 	 * Collect pages to be removed on list without dropping lock
+	 *
+	 * There may be free surplus huge pages due to HVO, see comments
+	 * in __update_and_free_hugetlb_folio() when calling
+	 * hugetlb_vmemmap_restore_folio(). Collect surplus pages first.
 	 */
-	while (min_count < persistent_huge_pages(h)) {
-		folio = remove_pool_hugetlb_folio(h, nodes_allowed, 0);
+	while (count < available_huge_pages(h)) {
+		if (h->surplus_huge_pages && h->free_huge_pages) {
+			folio = remove_surplus_pool_hugetlb_folio(h, nodes_allowed);
+			if (!folio)
+				break;
+
+			list_add(&folio->lru, &page_list);
+		} else {
+			break;
+		}
+	}
+
+	while (count < available_huge_pages(h)) {
+		folio = remove_pool_hugetlb_folio(h, nodes_allowed, false);
 		if (!folio)
 			break;
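
For intuition, the accounting in the reproducer above can be modeled as a
back-of-the-envelope sketch. This is plain Python, not kernel code: the
dict keys mirror the struct hstate field names from the patch, the helper
functions restate persistent_huge_pages()/available_huge_pages(), and the
numbers are the Node0/Node1 scenario from the commit message; everything
else is invented for illustration.

```python
# Toy model of hstate accounting in the reproducer (illustrative only;
# field names mirror struct hstate, the rest is made up for this sketch).

def persistent_huge_pages(h):
    return h["nr_huge_pages"] - h["surplus_huge_pages"]

def available_huge_pages(h):
    return h["free_huge_pages"] - h["resv_huge_pages"]

# State after steps 1-4: Node0 holds 5 in-use surplus folios,
# Node1 holds 5 free persistent folios.
h = {"nr_huge_pages": 10, "free_huge_pages": 5,
     "surplus_huge_pages": 5, "resv_huge_pages": 0}

# Step 5: echo 0 > nr_hugepages for Node1, so the target is 0.
count = 0

# Old logic: min_count wrongly includes the 5 surplus pages, so the
# shrink loop condition is 5 < 5 and nothing gets freed.
min_count = h["resv_huge_pages"] + h["nr_huge_pages"] - h["free_huge_pages"]
min_count = max(count, min_count)
old_loop_runs = min_count < persistent_huge_pages(h)

# New logic: compare the target against the pages that can actually be
# destroyed (0 < 5), so the loop frees Node1's folios.
new_loop_runs = count < available_huge_pages(h)

print(old_loop_runs, new_loop_runs)  # False True
```

Under this model the old condition never fires even though Node1 has 5
free folios, which matches the "Free 5 / Surp 5" leftover shown in the
before-table; the available_huge_pages() check fires as expected.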