From patchwork Tue Oct 3 00:18:27 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nhat Pham
X-Patchwork-Id: 13406631
From: Nhat Pham
To: akpm@linux-foundation.org
Cc: riel@surriel.com, hannes@cmpxchg.org, mhocko@kernel.org,
 roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev,
 tj@kernel.org, lizefan.x@bytedance.com, shuah@kernel.org,
 mike.kravetz@oracle.com, yosryahmed@google.com, fvdl@google.com,
 linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org,
 cgroups@vger.kernel.org
Subject: [PATCH v3 2/3] hugetlb: memcg: account hugetlb-backed memory in
 memory controller
Date: Mon, 2 Oct 2023 17:18:27 -0700
Message-Id: <20231003001828.2554080-3-nphamcs@gmail.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20231003001828.2554080-1-nphamcs@gmail.com>
References: <20231003001828.2554080-1-nphamcs@gmail.com>

Currently, hugetlb memory usage is not accounted for in the memory
controller, which could lead to memory overprotection for cgroups with
hugetlb-backed memory. This has been observed in our production system.

For instance, here is one of our use cases: suppose there are two 32G
containers.
The machine is booted with hugetlb_cma=6G, and each container may
or may not use up to 3 gigantic pages, depending on the workload within
it. The rest is anon, cache, slab, etc. We can set the hugetlb cgroup
limit of each cgroup to 3G to enforce hugetlb fairness. But it is very
difficult to configure memory.max to keep overall consumption, including
anon, cache, slab, etc., fair.

What we have had to resort to is to constantly poll hugetlb usage and
readjust memory.max. A similar procedure is applied to other memory
limits (e.g. memory.low). However, this is rather cumbersome and buggy.
Furthermore, when there is a delay in correcting the memory limits (e.g.
when hugetlb usage changes between consecutive runs of the userspace
agent), the system could be in an over- or underprotected state.

This patch rectifies this issue by charging the memcg when the hugetlb
folio is utilized, and uncharging when the folio is freed (analogous to
the hugetlb controller). Note that we do not charge when the folio is
allocated to the hugetlb pool, because at this point it is not owned by
any memcg.

Some caveats to consider:
  * This feature is only available on cgroup v2.
  * There is no hugetlb pool management involved in the memory
    controller. As stated above, hugetlb folios are only charged towards
    the memory controller when they are used. Host overcommit management
    has to consider this when configuring hard limits.
  * Failure to charge towards the memcg results in SIGBUS. This could
    happen even if the hugetlb pool still has pages (but the cgroup
    limit is hit and the reclaim attempt fails).
  * When this feature is enabled, hugetlb pages contribute to memory
    reclaim protection. Tuning of the low and min limits must take
    hugetlb memory into account.
  * Hugetlb pages utilized while this option is not selected will not
    be tracked by the memory controller (even if cgroup v2 is remounted
    later on).

Signed-off-by: Nhat Pham
Acked-by: Johannes Weiner
---
 Documentation/admin-guide/cgroup-v2.rst | 29 ++++++++++++++++++++
 include/linux/cgroup-defs.h             |  5 ++++
 include/linux/memcontrol.h              |  9 +++++++
 kernel/cgroup/cgroup.c                  | 15 ++++++++++-
 mm/hugetlb.c                            | 35 ++++++++++++++++++++-----
 mm/memcontrol.c                         | 35 +++++++++++++++++++++++++
 6 files changed, 120 insertions(+), 8 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 622a7f28db1f..606b2e0eac4b 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -210,6 +210,35 @@ cgroup v2 currently supports the following mount options.
 	relying on the original semantics (e.g. specifying bogusly
 	high 'bypass' protection values at higher tree levels).
 
+  memory_hugetlb_accounting
+	Count HugeTLB memory usage towards the cgroup's overall
+	memory usage for the memory controller (for the purpose of
+	statistics reporting and memory protection). This is a new
+	behavior that could regress existing setups, so it must be
+	explicitly opted in with this mount option.
+
+	A few caveats to keep in mind:
+
+	* There is no HugeTLB pool management involved in the memory
+	  controller. The pre-allocated pool does not belong to anyone.
+	  Specifically, when a new HugeTLB folio is allocated to
+	  the pool, it is not accounted for from the perspective of the
+	  memory controller. It is only charged to a cgroup when it is
+	  actually used (for e.g at page fault time). Host memory
+	  overcommit management has to consider this when configuring
+	  hard limits. In general, HugeTLB pool management should be
+	  done via other mechanisms (such as the HugeTLB controller).
+	* Failure to charge a HugeTLB folio to the memory controller
+	  results in SIGBUS. This could happen even if the HugeTLB pool
+	  still has pages available (but the cgroup limit is hit and
+	  reclaim attempt fails).
+	* Charging HugeTLB memory towards the memory controller affects
+	  memory protection and reclaim dynamics. Any userspace tuning
+	  (of low, min limits for e.g) needs to take this into account.
+	* HugeTLB pages utilized while this option is not selected
+	  will not be tracked by the memory controller (even if cgroup
+	  v2 is remounted later on).
+
 
 Organizing Processes and Threads
 --------------------------------
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index f1b3151ac30b..8641f4320c98 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -115,6 +115,11 @@ enum {
 	 * Enable recursive subtree protection
 	 */
 	CGRP_ROOT_MEMORY_RECURSIVE_PROT = (1 << 18),
+
+	/*
+	 * Enable hugetlb accounting for the memory controller.
+	 */
+	CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING = (1 << 19),
 };
 
 /* cftype->flags */
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 42bf7e9b1a2f..a827e2129790 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -679,6 +679,9 @@ static inline int mem_cgroup_charge(struct folio *folio, struct mm_struct *mm,
 	return __mem_cgroup_charge(folio, mm, gfp);
 }
 
+int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
+		long nr_pages);
+
 int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
 				  gfp_t gfp, swp_entry_t entry);
 void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry);
@@ -1262,6 +1265,12 @@ static inline int mem_cgroup_charge(struct folio *folio,
 	return 0;
 }
 
+static inline int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg,
+		gfp_t gfp, long nr_pages)
+{
+	return 0;
+}
+
 static inline int mem_cgroup_swapin_charge_folio(struct folio *folio,
 			struct mm_struct *mm, gfp_t gfp, swp_entry_t entry)
 {
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 1fb7f562289d..f11488b18ceb 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1902,6 +1902,7 @@ enum cgroup2_param {
 	Opt_favordynmods,
 	Opt_memory_localevents,
 	Opt_memory_recursiveprot,
+	Opt_memory_hugetlb_accounting,
 	nr__cgroup2_params
 };
 
@@ -1910,6 +1911,7 @@ static const struct fs_parameter_spec cgroup2_fs_parameters[] = {
 	fsparam_flag("favordynmods",		Opt_favordynmods),
 	fsparam_flag("memory_localevents",	Opt_memory_localevents),
 	fsparam_flag("memory_recursiveprot",	Opt_memory_recursiveprot),
+	fsparam_flag("memory_hugetlb_accounting", Opt_memory_hugetlb_accounting),
 	{}
 };
 
@@ -1936,6 +1938,9 @@ static int cgroup2_parse_param(struct fs_context *fc, struct fs_parameter *param
 	case Opt_memory_recursiveprot:
 		ctx->flags |= CGRP_ROOT_MEMORY_RECURSIVE_PROT;
 		return 0;
+	case Opt_memory_hugetlb_accounting:
+		ctx->flags |= CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING;
+		return 0;
 	}
 	return -EINVAL;
 }
@@ -1960,6 +1965,11 @@ static void apply_cgroup_root_flags(unsigned int root_flags)
 			cgrp_dfl_root.flags |= CGRP_ROOT_MEMORY_RECURSIVE_PROT;
 		else
 			cgrp_dfl_root.flags &= ~CGRP_ROOT_MEMORY_RECURSIVE_PROT;
+
+		if (root_flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING)
+			cgrp_dfl_root.flags |= CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING;
+		else
+			cgrp_dfl_root.flags &= ~CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING;
 	}
 }
 
@@ -1973,6 +1983,8 @@ static int cgroup_show_options(struct seq_file *seq, struct kernfs_root *kf_root
 		seq_puts(seq, ",memory_localevents");
 	if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_RECURSIVE_PROT)
 		seq_puts(seq, ",memory_recursiveprot");
+	if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING)
+		seq_puts(seq, ",memory_hugetlb_accounting");
 	return 0;
 }
 
@@ -7050,7 +7062,8 @@ static ssize_t features_show(struct kobject *kobj, struct kobj_attribute *attr,
 			"nsdelegate\n"
 			"favordynmods\n"
 			"memory_localevents\n"
-			"memory_recursiveprot\n");
+			"memory_recursiveprot\n"
+			"memory_hugetlb_accounting\n");
 }
 
 static struct kobj_attribute cgroup_features_attr = __ATTR_RO(features);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index de220e3ff8be..74472e911b0a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1902,6 +1902,7 @@ void free_huge_folio(struct folio *folio)
 				     pages_per_huge_page(h), folio);
 	hugetlb_cgroup_uncharge_folio_rsvd(hstate_index(h),
 					  pages_per_huge_page(h), folio);
+	mem_cgroup_uncharge(folio);
 	if (restore_reserve)
 		h->resv_huge_pages++;
 
@@ -3009,11 +3010,20 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	struct hugepage_subpool *spool = subpool_vma(vma);
 	struct hstate *h = hstate_vma(vma);
 	struct folio *folio;
-	long map_chg, map_commit;
+	long map_chg, map_commit, nr_pages = pages_per_huge_page(h);
 	long gbl_chg;
-	int ret, idx;
+	int memcg_charge_ret, ret, idx;
 	struct hugetlb_cgroup *h_cg = NULL;
+	struct mem_cgroup *memcg;
 	bool deferred_reserve;
+	gfp_t gfp = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
+
+	memcg = get_mem_cgroup_from_current();
+	memcg_charge_ret = mem_cgroup_hugetlb_try_charge(memcg, gfp, nr_pages);
+	if (memcg_charge_ret == -ENOMEM) {
+		mem_cgroup_put(memcg);
+		return ERR_PTR(-ENOMEM);
+	}
 
 	idx = hstate_index(h);
 	/*
@@ -3022,8 +3032,12 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	 * code of zero indicates a reservation exists (no change).
 	 */
 	map_chg = gbl_chg = vma_needs_reservation(h, vma, addr);
-	if (map_chg < 0)
+	if (map_chg < 0) {
+		if (!memcg_charge_ret)
+			mem_cgroup_cancel_charge(memcg, nr_pages);
+		mem_cgroup_put(memcg);
 		return ERR_PTR(-ENOMEM);
+	}
 
 	/*
 	 * Processes that did not create the mapping will have no
@@ -3034,10 +3048,8 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	 */
 	if (map_chg || avoid_reserve) {
 		gbl_chg = hugepage_subpool_get_pages(spool, 1);
-		if (gbl_chg < 0) {
-			vma_end_reservation(h, vma, addr);
-			return ERR_PTR(-ENOSPC);
-		}
+		if (gbl_chg < 0)
+			goto out_end_reservation;
 
 		/*
 		 * Even though there was no reservation in the region/reserve
@@ -3119,6 +3131,11 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		hugetlb_cgroup_uncharge_folio_rsvd(hstate_index(h),
 					pages_per_huge_page(h), folio);
 	}
+
+	if (!memcg_charge_ret)
+		mem_cgroup_commit_charge(folio, memcg);
+	mem_cgroup_put(memcg);
+
 	return folio;
 
 out_uncharge_cgroup:
@@ -3130,7 +3147,11 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 out_subpool_put:
 	if (map_chg || avoid_reserve)
 		hugepage_subpool_put_pages(spool, 1);
+out_end_reservation:
 	vma_end_reservation(h, vma, addr);
+	if (!memcg_charge_ret)
+		mem_cgroup_cancel_charge(memcg, nr_pages);
+	mem_cgroup_put(memcg);
 	return ERR_PTR(-ENOSPC);
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0219befeae38..6660684f6f97 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7085,6 +7085,41 @@ int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp)
 	return ret;
 }
 
+/**
+ * mem_cgroup_hugetlb_try_charge - try to charge the memcg for a hugetlb folio
+ * @memcg: memcg to charge.
+ * @gfp: reclaim mode.
+ * @nr_pages: number of pages to charge.
+ *
+ * This function is called when allocating a huge page folio to determine if
+ * the memcg has the capacity for it. It does not commit the charge yet,
+ * as the hugetlb folio itself has not been obtained from the hugetlb pool.
+ *
+ * Once we have obtained the hugetlb folio, we can call
+ * mem_cgroup_commit_charge() to commit the charge. If we fail to obtain the
+ * folio, we should instead call mem_cgroup_cancel_charge() to undo the effect
+ * of try_charge().
+ *
+ * Returns 0 on success. Otherwise, an error code is returned.
+ */
+int mem_cgroup_hugetlb_try_charge(struct mem_cgroup *memcg, gfp_t gfp,
+			long nr_pages)
+{
+	/*
+	 * If hugetlb memcg charging is not enabled, do not fail hugetlb allocation,
+	 * but do not attempt to commit charge later (or cancel on error) either.
+	 */
+	if (mem_cgroup_disabled() || !memcg ||
+		!cgroup_subsys_on_dfl(memory_cgrp_subsys) ||
+		!(cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING))
+		return -EOPNOTSUPP;
+
+	if (try_charge(memcg, gfp, nr_pages))
+		return -ENOMEM;
+
+	return 0;
+}
+
 /**
  * mem_cgroup_swapin_charge_folio - Charge a newly allocated folio for swapin.
  * @folio: folio to charge.
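
The try-charge, then commit-or-cancel ordering that this patch adds to alloc_hugetlb_folio() may be easier to follow in a small userspace model. The sketch below is illustrative only and is not kernel code: struct toy_memcg, struct toy_pool and the toy_*() helpers are hypothetical names invented for this example, reclaim is not modelled, and "commit" is reduced to leaving the reserved pages charged.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these names exist in the kernel. */
struct toy_memcg {
	long usage;		/* pages currently charged */
	long limit;		/* hard limit, in pages */
	bool accounting;	/* models the memory_hugetlb_accounting option */
};

struct toy_pool {
	long free_folios;	/* pre-allocated folios, charged to nobody */
};

/* Reserve nr_pages against the limit before any folio has been obtained. */
static int toy_try_charge(struct toy_memcg *cg, long nr_pages)
{
	if (!cg->accounting)
		return -EOPNOTSUPP;	/* skip accounting, do not fail the allocation */
	if (cg->usage + nr_pages > cg->limit)
		return -ENOMEM;		/* the real code would attempt reclaim first */
	cg->usage += nr_pages;
	return 0;
}

/* Undo a successful toy_try_charge() when no folio was obtained. */
static void toy_cancel_charge(struct toy_memcg *cg, long nr_pages)
{
	assert(cg->usage >= nr_pages);
	cg->usage -= nr_pages;
}

/*
 * Same ordering as the patch: charge first, then take a folio from the
 * pool, then either keep the charge (commit) or cancel it.
 * Returns 0 on success, -ENOMEM if the limit is hit, -ENOSPC if the pool
 * is empty.
 */
static int toy_alloc_huge(struct toy_memcg *cg, struct toy_pool *pool,
			  long nr_pages)
{
	int charge_ret = toy_try_charge(cg, nr_pages);

	if (charge_ret == -ENOMEM)
		return -ENOMEM;

	if (pool->free_folios == 0) {
		/* Only a charge that actually succeeded may be cancelled. */
		if (!charge_ret)
			toy_cancel_charge(cg, nr_pages);
		return -ENOSPC;
	}

	pool->free_folios--;
	/* Commit: in this toy model the reserved pages simply stay charged. */
	return 0;
}
```

Calling toy_alloc_huge() on a cgroup that is already at its limit returns -ENOMEM while the pool still holds free folios, which corresponds to the SIGBUS caveat in the commit message. With accounting switched off, the allocation succeeds and usage is left untouched, matching the -EOPNOTSUPP path.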