From patchwork Tue Mar 25 06:16:34 2025
X-Patchwork-Submitter: Wupeng Ma
X-Patchwork-Id: 14028105
From: Wupeng Ma <mawupeng1@huawei.com>
Subject: [RFC PATCH] mm: hugetlb: Fix incorrect fallback for subpool
Date: Tue, 25 Mar 2025 14:16:34 +0800
Message-ID: <20250325061634.2118202-1-mawupeng1@huawei.com>

During our testing with the hugetlb subpool enabled, we observed that
hstate->resv_huge_pages can underflow into negative values. Root cause
analysis reveals a race condition in the subpool reservation fallback
handling, as follows:

hugetlb_reserve_pages()
    /* Attempt subpool reservation */
    gbl_reserve = hugepage_subpool_get_pages(spool, chg);

    /* Global reservation may fail after subpool allocation */
    if (hugetlb_acct_memory(h, gbl_reserve) < 0)
        goto out_put_pages;

out_put_pages:
    /* This incorrectly restores the reservation to the subpool */
    hugepage_subpool_put_pages(spool, chg);

When hugetlb_acct_memory() fails after the subpool allocation, the current
implementation over-commits subpool reservations by returning the full
'chg' value instead of the actually allocated 'gbl_reserve' amount. This
discrepancy propagates to the global reservations during subsequent
releases, eventually causing resv_huge_pages to underflow.

This problem can be triggered easily with the following steps (a
reproducer sketch follows the patch below):

1. reserve huge pages for hugetlb allocation
2. mount hugetlbfs with min_size to enable the hugetlb subpool
3. allocate huge pages from two tasks (make sure the second one fails
   due to an insufficient number of huge pages)
4. wait a few seconds and repeat step 3, which will drive
   hstate->resv_huge_pages below zero

To fix this problem, return the correct number of pages to the subpool
during the fallback after hugepage_subpool_get_pages() is called.
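To make the double accounting concrete, below is a minimal userspace model
of the scenario above. It is a sketch, not kernel code, and not part of the
patch: acct_memory() stands in for hugetlb_acct_memory() as a plain counter
with a capacity limit, and the subpool carries only the minimum-size
accounting of hugepage_subpool_get_pages()/hugepage_subpool_put_pages() in
simplified form (no max-size accounting, no locking); all names and numbers
are illustrative. Task A drains the mount-time reserve, task B's global
charge fails, and the buggy fallback refills rsv_hpages with pages that
have no global backing, so the later release and unmount uncharge more
than was ever charged.

/*
 * Minimal userspace model of the accounting above. A sketch, not kernel
 * code: acct_memory() models hugetlb_acct_memory() as a plain counter with
 * a capacity limit, and the subpool keeps only minimum-size accounting.
 */
#include <stdio.h>

static long resv_huge_pages;	/* models hstate->resv_huge_pages */
static long capacity = 8;	/* huge pages available for reservation */

/* Charge (or uncharge, for negative delta) global reservations. */
static int acct_memory(long delta)
{
	if (resv_huge_pages + delta > capacity)
		return -1;	/* all or nothing, like hugetlb_acct_memory() */
	resv_huge_pages += delta;
	return 0;
}

struct subpool {
	long min_hpages;	/* reserve guaranteed at mount (min_size=) */
	long rsv_hpages;	/* globally backed pages still held */
};

/* Take delta pages; return how many must still be charged globally. */
static long subpool_get_pages(struct subpool *s, long delta)
{
	long ret;

	if (delta > s->rsv_hpages) {
		ret = delta - s->rsv_hpages;
		s->rsv_hpages = 0;
	} else {
		ret = 0;
		s->rsv_hpages -= delta;
	}
	return ret;
}

/* Return delta pages; return how many must be uncharged globally. */
static long subpool_put_pages(struct subpool *s, long delta)
{
	long ret = 0;

	if (s->rsv_hpages + delta > s->min_hpages)
		ret = s->rsv_hpages + delta - s->min_hpages;
	s->rsv_hpages += delta;
	if (s->rsv_hpages > s->min_hpages)
		s->rsv_hpages = s->min_hpages;
	return ret;
}

static long run_scenario(int fixed)
{
	struct subpool spool = { .min_hpages = 2, .rsv_hpages = 2 };
	long chg = 4, gbl;

	resv_huge_pages = 0;
	acct_memory(spool.min_hpages);		/* mount-time backing: resv = 2 */

	/* Task A: chg = 6 drains the subpool reserve and succeeds. */
	gbl = subpool_get_pages(&spool, 6);	/* gbl = 4, rsv_hpages -> 0 */
	acct_memory(gbl);			/* resv = 6 */

	/* Task B: chg = 4; the global charge fails (6 + 4 > 8). */
	gbl = subpool_get_pages(&spool, chg);	/* gbl = 4 */
	if (acct_memory(gbl) < 0) {
		if (fixed) {
			/* Put back only what came from the subpool reserve. */
			long spool_resv = chg - gbl;	/* 0 here */
			if (spool_resv)
				acct_memory(-subpool_put_pages(&spool, spool_resv));
		} else {
			/* Buggy fallback: puts back the full chg, refilling
			 * rsv_hpages with pages that have no global backing. */
			subpool_put_pages(&spool, chg);
		}
	}

	/* Task A releases its reservation, then the fs is unmounted. */
	acct_memory(-subpool_put_pages(&spool, 6));
	acct_memory(-spool.rsv_hpages);
	return resv_huge_pages;
}

int main(void)
{
	printf("buggy fallback: resv_huge_pages = %ld\n", run_scenario(0));
	printf("fixed fallback: resv_huge_pages = %ld\n", run_scenario(1));
	return 0;	/* prints -2 (underflow) vs. 0 */
}

Compiled with any C compiler, this prints resv_huge_pages = -2 for the
buggy fallback and 0 for the fixed one.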
Fixes: 1c5ecae3a93f ("hugetlbfs: add minimum size accounting to subpools")
Signed-off-by: Wupeng Ma
Reviewed-by: Joshua Hahn
Tested-by: Joshua Hahn
---
 mm/hugetlb.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 318624c96584..dc4449592d00 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2987,7 +2987,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	struct hugepage_subpool *spool = subpool_vma(vma);
 	struct hstate *h = hstate_vma(vma);
 	struct folio *folio;
-	long retval, gbl_chg;
+	long retval, gbl_chg, gbl_reserve;
 	map_chg_state map_chg;
 	int ret, idx;
 	struct hugetlb_cgroup *h_cg = NULL;
@@ -3140,8 +3140,16 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	hugetlb_cgroup_uncharge_cgroup_rsvd(idx, pages_per_huge_page(h),
 					    h_cg);
 out_subpool_put:
-	if (map_chg)
-		hugepage_subpool_put_pages(spool, 1);
+	/*
+	 * put page to subpool iff the quota of subpool's rsv_hpages is used
+	 * during hugepage_subpool_get_pages.
+	 */
+	if (map_chg && !gbl_chg) {
+		gbl_reserve = hugepage_subpool_put_pages(spool, 1);
+		hugetlb_acct_memory(h, -gbl_reserve);
+	}
+
+
 out_end_reservation:
 	if (map_chg != MAP_CHG_ENFORCED)
 		vma_end_reservation(h, vma, addr);
@@ -6949,7 +6957,7 @@ bool hugetlb_reserve_pages(struct inode *inode,
 		struct vm_area_struct *vma,
 		vm_flags_t vm_flags)
 {
-	long chg = -1, add = -1;
+	long chg = -1, add = -1, spool_resv, gbl_resv;
 	struct hstate *h = hstate_inode(inode);
 	struct hugepage_subpool *spool = subpool_inode(inode);
 	struct resv_map *resv_map;
@@ -7084,8 +7092,16 @@ bool hugetlb_reserve_pages(struct inode *inode,
 	return true;
 
 out_put_pages:
-	/* put back original number of pages, chg */
-	(void)hugepage_subpool_put_pages(spool, chg);
+	spool_resv = chg - gbl_reserve;
+	if (spool_resv) {
+		/* put sub pool's reservation back, chg - gbl_reserve */
+		gbl_resv = hugepage_subpool_put_pages(spool, spool_resv);
+		/*
+		 * subpool's reserved pages can not be put back due to race,
+		 * return to hstate.
+		 */
+		hugetlb_acct_memory(h, -gbl_resv);
+	}
 out_uncharge_cgroup:
 	hugetlb_cgroup_uncharge_cgroup_rsvd(hstate_index(h),
 					    chg * pages_per_huge_page(h), h_cg);
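For reference, here is one way the reproduction steps above can be
scripted. This is a sketch under assumptions (2 MiB huge pages, an existing
hugetlbfs mount, illustrative paths and sizes) and is not part of the
patch: an mmap() of a hugetlbfs file with MAP_SHARED reserves huge pages up
front, which is where the second task exercises the
hugetlb_reserve_pages() fallback.

/* reserve.c: map <npages> huge pages from a file on a hugetlbfs mount. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define HPAGE_SIZE (2UL * 1024 * 1024)	/* assumes 2 MiB huge pages */

int main(int argc, char **argv)
{
	if (argc != 3) {
		fprintf(stderr, "usage: %s <file-on-hugetlbfs> <npages>\n",
			argv[0]);
		return 1;
	}
	size_t len = strtoul(argv[2], NULL, 0) * HPAGE_SIZE;
	int fd = open(argv[1], O_CREAT | O_RDWR, 0600);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/*
	 * Reservation happens here, at mmap() time; with an exhausted pool
	 * this fails with ENOMEM for the second instance.
	 */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	pause();	/* hold the reservation until killed */
	return 0;
}

With, for example, 'echo 8 > /proc/sys/vm/nr_hugepages' (step 1) and
'mount -t hugetlbfs -o min_size=8M none /mnt/huge' (step 2), run two
instances whose combined page count exceeds the pool so the second mmap()
fails (step 3), then kill them and repeat (step 4) while watching
HugePages_Rsvd in /proc/meminfo.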