From patchwork Thu Sep 28 00:57:22 2023
X-Patchwork-Submitter: Nhat Pham
X-Patchwork-Id: 13401844
From: Nhat Pham
To: akpm@linux-foundation.org
Cc: riel@surriel.com, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev, tj@kernel.org, lizefan.x@bytedance.com, shuah@kernel.org, mike.kravetz@oracle.com, yosryahmed@google.com, linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: [PATCH v2 1/2] hugetlb: memcg: account hugetlb-backed memory in memory controller
Date: Wed, 27 Sep 2023 17:57:22 -0700
Message-Id: <20230928005723.1709119-2-nphamcs@gmail.com>
In-Reply-To: <20230928005723.1709119-1-nphamcs@gmail.com>
References: <20230928005723.1709119-1-nphamcs@gmail.com>

Currently, hugetlb memory usage is not accounted for in the memory
controller, which could lead to memory overprotection for cgroups with
hugetlb-backed memory. This has been observed in our production system.

This patch rectifies this issue by charging the memcg when the hugetlb
folio is allocated, and uncharging it when the folio is freed (analogous
to the hugetlb controller).
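
The charge-on-allocate / uncharge-on-free lifecycle can be modeled in a few lines of user-space C. This is a hedged sketch, not kernel code. The names `struct counter`, `struct model_folio`, `try_charge()`, `model_alloc()` and `model_free()` are hypothetical. They stand in for:

- the memcg page counter,
- a hugetlb folio,
- `mem_cgroup_hugetlb_charge_folio()`,
- `alloc_hugetlb_folio()`,
- `free_huge_folio()`.

The model assumes a single cgroup and ignores hugetlb reservations (`restore_reserve_on_error()`) and the hugetlb controller.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a memcg page counter (values in base pages). */
struct counter {
	long usage;
	long max;
};

/* Hypothetical stand-in for a hugetlb folio and the memcg it is charged to. */
struct model_folio {
	long nr_pages;
	struct counter *memcg; /* NULL if the folio was never charged */
};

/* Charge nr_pages, refusing with -ENOMEM if the limit would be exceeded. */
static int try_charge(struct counter *cg, long nr_pages)
{
	if (cg->usage + nr_pages > cg->max)
		return -ENOMEM;
	cg->usage += nr_pages;
	return 0;
}

/*
 * Allocation path: allocate first, then charge (only when accounting is
 * enabled). If the charge is refused, free the folio again and return
 * NULL with *err set, mirroring the "undo allocation" branch of the patch.
 */
static struct model_folio *model_alloc(struct counter *cg, long nr_pages,
				       bool accounting_enabled, int *err)
{
	struct model_folio *folio = malloc(sizeof(*folio));

	*err = 0;
	if (!folio) {
		*err = -ENOMEM;
		return NULL;
	}
	folio->nr_pages = nr_pages;
	folio->memcg = NULL;

	if (accounting_enabled) {
		*err = try_charge(cg, nr_pages);
		if (*err) {
			free(folio);
			return NULL;
		}
		folio->memcg = cg;
	}
	return folio;
}

/* Free path: uncharge only if the folio was charged at allocation time. */
static void model_free(struct model_folio *folio)
{
	if (folio->memcg)
		folio->memcg->usage -= folio->nr_pages;
	free(folio);
}
```

The `accounting_enabled` argument stands in for the `memory_hugetlb_accounting` mount option. A folio allocated while the option is off is never charged, so freeing it later leaves the counter untouched. This matches the documentation's note that such pages stay untracked even after a remount.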
Signed-off-by: Nhat Pham
---
 Documentation/admin-guide/cgroup-v2.rst |  9 ++++++
 fs/hugetlbfs/inode.c                    |  2 +-
 include/linux/cgroup-defs.h             |  5 +++
 include/linux/hugetlb.h                 |  6 ++--
 include/linux/memcontrol.h              |  8 +++++
 kernel/cgroup/cgroup.c                  | 15 ++++++++-
 mm/hugetlb.c                            | 23 ++++++++++----
 mm/memcontrol.c                         | 41 +++++++++++++++++++++++++
 8 files changed, 99 insertions(+), 10 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 622a7f28db1f..e6267b8cbd1d 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -210,6 +210,15 @@ cgroup v2 currently supports the following mount options.
 	relying on the original semantics (e.g. specifying bogusly high
 	'bypass' protection values at higher tree levels).
 
+  memory_hugetlb_accounting
+	Count hugetlb memory usage towards the cgroup's overall
+	memory usage for the memory controller. This is a new behavior
+	that could regress existing setups, so it must be explicitly
+	opted in with this mount option. Note that hugetlb pages
+	allocated while this option is not selected will not be
+	tracked by the memory controller (even if cgroup v2 is
+	remounted later on).
+
 
 Organizing Processes and Threads
 --------------------------------
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 60fce26ff937..034967319955 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -902,7 +902,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 		 * to keep reservation accounting consistent.
 		 */
 		hugetlb_set_vma_policy(&pseudo_vma, inode, index);
-		folio = alloc_hugetlb_folio(&pseudo_vma, addr, 0);
+		folio = alloc_hugetlb_folio(&pseudo_vma, addr, 0, true);
 		hugetlb_drop_vma_policy(&pseudo_vma);
 		if (IS_ERR(folio)) {
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index f1b3151ac30b..8641f4320c98 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -115,6 +115,11 @@ enum {
 	 * Enable recursive subtree protection
 	 */
 	CGRP_ROOT_MEMORY_RECURSIVE_PROT = (1 << 18),
+
+	/*
+	 * Enable hugetlb accounting for the memory controller.
+	 */
+	CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING = (1 << 19),
 };
 
 /* cftype->flags */
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index a30686e649f7..9b73db1605a2 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -713,7 +713,8 @@ struct huge_bootmem_page {
 
 int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
 struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
-				unsigned long addr, int avoid_reserve);
+				unsigned long addr, int avoid_reserve,
+				bool restore_reserve_on_memcg_failure);
 struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
 				nodemask_t *nmask, gfp_t gfp_mask);
 struct folio *alloc_hugetlb_folio_vma(struct hstate *h, struct vm_area_struct *vma,
@@ -1016,7 +1017,8 @@ static inline int isolate_or_dissolve_huge_page(struct page *page,
 
 static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 					   unsigned long addr,
-					   int avoid_reserve)
+					   int avoid_reserve,
+					   bool restore_reserve_on_memcg_failure)
 {
 	return NULL;
 }
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e0cfab58ab71..8094679c99dd 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -677,6 +677,8 @@ static inline int mem_cgroup_charge(struct folio *folio, struct mm_struct *mm,
 	return __mem_cgroup_charge(folio, mm, gfp);
 }
 
+int mem_cgroup_hugetlb_charge_folio(struct folio *folio, gfp_t gfp);
+
 int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm,
 				  gfp_t gfp, swp_entry_t entry);
 void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry);
@@ -1251,6 +1253,12 @@ static inline int mem_cgroup_charge(struct folio *folio,
 	return 0;
 }
 
+static inline int mem_cgroup_hugetlb_charge_folio(struct folio *folio,
+		gfp_t gfp)
+{
+	return 0;
+}
+
 static inline int mem_cgroup_swapin_charge_folio(struct folio *folio,
 			struct mm_struct *mm, gfp_t gfp, swp_entry_t entry)
 {
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 1fb7f562289d..f11488b18ceb 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1902,6 +1902,7 @@ enum cgroup2_param {
 	Opt_favordynmods,
 	Opt_memory_localevents,
 	Opt_memory_recursiveprot,
+	Opt_memory_hugetlb_accounting,
 	nr__cgroup2_params
 };
 
@@ -1910,6 +1911,7 @@ static const struct fs_parameter_spec cgroup2_fs_parameters[] = {
 	fsparam_flag("favordynmods",		Opt_favordynmods),
 	fsparam_flag("memory_localevents",	Opt_memory_localevents),
 	fsparam_flag("memory_recursiveprot",	Opt_memory_recursiveprot),
+	fsparam_flag("memory_hugetlb_accounting", Opt_memory_hugetlb_accounting),
 	{}
 };
 
@@ -1936,6 +1938,9 @@ static int cgroup2_parse_param(struct fs_context *fc, struct fs_parameter *param
 	case Opt_memory_recursiveprot:
 		ctx->flags |= CGRP_ROOT_MEMORY_RECURSIVE_PROT;
 		return 0;
+	case Opt_memory_hugetlb_accounting:
+		ctx->flags |= CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING;
+		return 0;
 	}
 	return -EINVAL;
 }
@@ -1960,6 +1965,11 @@ static void apply_cgroup_root_flags(unsigned int root_flags)
 			cgrp_dfl_root.flags |= CGRP_ROOT_MEMORY_RECURSIVE_PROT;
 		else
 			cgrp_dfl_root.flags &= ~CGRP_ROOT_MEMORY_RECURSIVE_PROT;
+
+		if (root_flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING)
+			cgrp_dfl_root.flags |= CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING;
+		else
+			cgrp_dfl_root.flags &= ~CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING;
 	}
 }
 
@@ -1973,6 +1983,8 @@ static int cgroup_show_options(struct seq_file *seq, struct kernfs_root *kf_root
 		seq_puts(seq, ",memory_localevents");
 	if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_RECURSIVE_PROT)
 		seq_puts(seq, ",memory_recursiveprot");
+	if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING)
+		seq_puts(seq, ",memory_hugetlb_accounting");
 	return 0;
 }
 
@@ -7050,7 +7062,8 @@ static ssize_t features_show(struct kobject *kobj, struct kobj_attribute *attr,
 			"nsdelegate\n"
 			"favordynmods\n"
 			"memory_localevents\n"
-			"memory_recursiveprot\n");
+			"memory_recursiveprot\n"
+			"memory_hugetlb_accounting\n");
 }
 static struct kobj_attribute cgroup_features_attr = __ATTR_RO(features);
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index de220e3ff8be..ff88ea4df11a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1902,6 +1902,7 @@ void free_huge_folio(struct folio *folio)
 				     pages_per_huge_page(h), folio);
 	hugetlb_cgroup_uncharge_folio_rsvd(hstate_index(h),
 					  pages_per_huge_page(h), folio);
+	mem_cgroup_uncharge(folio);
 	if (restore_reserve)
 		h->resv_huge_pages++;
 
@@ -3004,7 +3005,8 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list)
 }
 
 struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
-				    unsigned long addr, int avoid_reserve)
+				    unsigned long addr, int avoid_reserve,
+				    bool restore_reserve_on_memcg_failure)
 {
 	struct hugepage_subpool *spool = subpool_vma(vma);
 	struct hstate *h = hstate_vma(vma);
@@ -3119,6 +3121,15 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		hugetlb_cgroup_uncharge_folio_rsvd(hstate_index(h),
 				pages_per_huge_page(h), folio);
 	}
+
+	/* undo allocation if memory controller disallows it. */
+	if (mem_cgroup_hugetlb_charge_folio(folio, GFP_KERNEL)) {
+		if (restore_reserve_on_memcg_failure)
+			restore_reserve_on_error(h, vma, addr, folio);
+		folio_put(folio);
+		return ERR_PTR(-ENOMEM);
+	}
+
 	return folio;
 
 out_uncharge_cgroup:
@@ -5179,7 +5190,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				spin_unlock(src_ptl);
 				spin_unlock(dst_ptl);
 				/* Do not use reserve as it's private owned */
-				new_folio = alloc_hugetlb_folio(dst_vma, addr, 1);
+				new_folio = alloc_hugetlb_folio(dst_vma, addr, 1, false);
 				if (IS_ERR(new_folio)) {
 					folio_put(pte_folio);
 					ret = PTR_ERR(new_folio);
@@ -5656,7 +5667,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * be acquired again before returning to the caller, as expected.
 	 */
 	spin_unlock(ptl);
-	new_folio = alloc_hugetlb_folio(vma, haddr, outside_reserve);
+	new_folio = alloc_hugetlb_folio(vma, haddr, outside_reserve, true);
 
 	if (IS_ERR(new_folio)) {
 		/*
@@ -5930,7 +5941,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 						       VM_UFFD_MISSING);
 		}
 
-		folio = alloc_hugetlb_folio(vma, haddr, 0);
+		folio = alloc_hugetlb_folio(vma, haddr, 0, true);
 		if (IS_ERR(folio)) {
 			/*
 			 * Returning error will result in faulting task being
@@ -6352,7 +6363,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
 			goto out;
 		}
 
-		folio = alloc_hugetlb_folio(dst_vma, dst_addr, 0);
+		folio = alloc_hugetlb_folio(dst_vma, dst_addr, 0, true);
 		if (IS_ERR(folio)) {
 			ret = -ENOMEM;
 			goto out;
@@ -6394,7 +6405,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
 			goto out;
 		}
 
-		folio = alloc_hugetlb_folio(dst_vma, dst_addr, 0);
+		folio = alloc_hugetlb_folio(dst_vma, dst_addr, 0, false);
 		if (IS_ERR(folio)) {
 			folio_put(*foliop);
 			ret = -ENOMEM;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d1a322a75172..d5dfc9b36acb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7050,6 +7050,47 @@ int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp)
 	return ret;
 }
 
+static struct mem_cgroup *get_mem_cgroup_from_current(void)
+{
+	struct mem_cgroup *memcg;
+
+again:
+	rcu_read_lock();
+	memcg = mem_cgroup_from_task(current);
+	if (!css_tryget(&memcg->css)) {
+		rcu_read_unlock();
+		goto again;
+	}
+	rcu_read_unlock();
+	return memcg;
+}
+
+/**
+ * mem_cgroup_hugetlb_charge_folio - Charge a newly allocated hugetlb folio.
+ * @folio: folio to charge.
+ * @gfp: reclaim mode
+ *
+ * This function charges an allocated hugetlb folio to the memcg of the
+ * current task.
+ *
+ * Returns 0 on success. Otherwise, an error code is returned.
+ */
+int mem_cgroup_hugetlb_charge_folio(struct folio *folio, gfp_t gfp)
+{
+	struct mem_cgroup *memcg;
+	int ret;
+
+	if (mem_cgroup_disabled() ||
+		!(cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING))
+		return 0;
+
+	memcg = get_mem_cgroup_from_current();
+	ret = charge_memcg(folio, memcg, gfp);
+	mem_cgroup_put(memcg);
+
+	return ret;
+}
+
 /**
  * mem_cgroup_swapin_charge_folio - Charge a newly allocated folio for swapin.
  * @folio: folio to charge.
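
The new behavior is opt-in, and the patch wires the option up in three places:

- `memory_hugetlb_accounting` is parsed as a cgroup2 mount flag.
- `cgroup_show_options()` echoes it.
- `features_show()` lists it (the cgroup features file, `/sys/kernel/cgroup/features`).

A tool that wants to know whether accounting is active could look for the option as a whole comma-separated token in the cgroup2 mount options, such as the options field of its `/proc/mounts` entry. The sketch below is a hypothetical helper, `has_mount_option()`, that does only the string matching. Reading `/proc/mounts` is left out so the logic stays testable. The selftest in patch 2/2 relies on `proc_mount_contains()` from `cgroup_util` for a similar check.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if `name` appears as a complete
 * comma-separated token in `opts`, e.g.
 * "rw,nosuid,nsdelegate,memory_hugetlb_accounting".
 * A bare strstr() would also accept a longer option that merely contains
 * `name`, so the token boundaries on both sides are checked explicitly.
 */
static bool has_mount_option(const char *opts, const char *name)
{
	size_t len = strlen(name);
	const char *p = opts;

	if (len == 0)
		return false;
	while ((p = strstr(p, name)) != NULL) {
		bool starts = (p == opts) || (p[-1] == ',');
		bool ends = (p[len] == '\0') || (p[len] == ',');

		if (starts && ends)
			return true;
		p += 1; /* partial match: keep searching further along */
	}
	return false;
}
```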
From patchwork Thu Sep 28 00:57:23 2023
X-Patchwork-Submitter: Nhat Pham
X-Patchwork-Id: 13401845
From: Nhat Pham
To: akpm@linux-foundation.org
Cc: riel@surriel.com, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev, tj@kernel.org, lizefan.x@bytedance.com, shuah@kernel.org, mike.kravetz@oracle.com, yosryahmed@google.com, linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: [PATCH v2 2/2] selftests: add a selftest to verify hugetlb usage in memcg
Date: Wed, 27 Sep 2023 17:57:23 -0700
Message-Id: <20230928005723.1709119-3-nphamcs@gmail.com>
In-Reply-To: <20230928005723.1709119-1-nphamcs@gmail.com>
References: <20230928005723.1709119-1-nphamcs@gmail.com>

This patch adds a new kselftest to demonstrate and verify the new
hugetlb memcg accounting behavior.
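
The test's expectations follow from simple arithmetic. `LENGTH` is 8 MB and the test requires 2 MB huge pages, so the mapping spans 4 huge pages:

- Reading the first bytes should fault in one page, adding 2 MB to `memory.current`.
- Writing the whole range should fault in all four pages, adding 8 MB.

Readings are compared with `values_close(a, b, 5)` from `cgroup_util.c`. The sketch below restates both pieces, with these assumptions:

- `MIB()`, `expected_delta()` and `values_close_sketch()` are hypothetical names.
- The tolerance formula (difference within `err` percent of `a + b`) is written from my understanding of that helper, not copied from it.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical size helper: x mebibytes in bytes. */
#define MIB(x) ((long)(x) << 20)

/*
 * Bytes expected to be charged after touching `touched` bytes of a hugetlb
 * mapping: every huge page that is touched at all is charged in full.
 */
static long expected_delta(long touched, long hugepage_size)
{
	long pages = (touched + hugepage_size - 1) / hugepage_size;

	return pages * hugepage_size;
}

/* Assumed tolerance check: |a - b| must be within err percent of (a + b). */
static bool values_close_sketch(long a, long b, int err)
{
	return labs(a - b) <= (a + b) / 100 * err;
}
```

Comparing against a tolerance rather than exact values leaves room for small unrelated charges (for example, page tables or slab objects) that land in the same cgroup while the test runs.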
Signed-off-by: Nhat Pham
---
 MAINTAINERS                                   |   2 +
 tools/testing/selftests/cgroup/.gitignore     |   1 +
 tools/testing/selftests/cgroup/Makefile       |   2 +
 .../selftests/cgroup/test_hugetlb_memcg.c     | 234 ++++++++++++++++++
 4 files changed, 239 insertions(+)
 create mode 100644 tools/testing/selftests/cgroup/test_hugetlb_memcg.c

diff --git a/MAINTAINERS b/MAINTAINERS
index bf0f54c24f81..ce9f40bcc2ba 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5269,6 +5269,7 @@ S:	Maintained
 F:	mm/memcontrol.c
 F:	mm/swap_cgroup.c
 F:	tools/testing/selftests/cgroup/memcg_protection.m
+F:	tools/testing/selftests/cgroup/test_hugetlb_memcg.c
 F:	tools/testing/selftests/cgroup/test_kmem.c
 F:	tools/testing/selftests/cgroup/test_memcontrol.c
 
@@ -9652,6 +9653,7 @@ F:	include/linux/hugetlb.h
 F:	mm/hugetlb.c
 F:	mm/hugetlb_vmemmap.c
 F:	mm/hugetlb_vmemmap.h
+F:	tools/testing/selftests/cgroup/test_hugetlb_memcg.c
 
 HVA ST MEDIA DRIVER
 M:	Jean-Christophe Trotin
diff --git a/tools/testing/selftests/cgroup/.gitignore b/tools/testing/selftests/cgroup/.gitignore
index af8c3f30b9c1..2732e0b29271 100644
--- a/tools/testing/selftests/cgroup/.gitignore
+++ b/tools/testing/selftests/cgroup/.gitignore
@@ -7,4 +7,5 @@ test_kill
 test_cpu
 test_cpuset
 test_zswap
+test_hugetlb_memcg
 wait_inotify
diff --git a/tools/testing/selftests/cgroup/Makefile b/tools/testing/selftests/cgroup/Makefile
index c27f05f6ce9b..00b441928909 100644
--- a/tools/testing/selftests/cgroup/Makefile
+++ b/tools/testing/selftests/cgroup/Makefile
@@ -14,6 +14,7 @@ TEST_GEN_PROGS += test_kill
 TEST_GEN_PROGS += test_cpu
 TEST_GEN_PROGS += test_cpuset
 TEST_GEN_PROGS += test_zswap
+TEST_GEN_PROGS += test_hugetlb_memcg
 
 LOCAL_HDRS += $(selfdir)/clone3/clone3_selftests.h $(selfdir)/pidfd/pidfd.h
 
@@ -27,3 +28,4 @@ $(OUTPUT)/test_kill: cgroup_util.c
 $(OUTPUT)/test_cpu: cgroup_util.c
 $(OUTPUT)/test_cpuset: cgroup_util.c
 $(OUTPUT)/test_zswap: cgroup_util.c
+$(OUTPUT)/test_hugetlb_memcg: cgroup_util.c
diff --git a/tools/testing/selftests/cgroup/test_hugetlb_memcg.c b/tools/testing/selftests/cgroup/test_hugetlb_memcg.c
new file mode 100644
index 000000000000..f0fefeb4cc24
--- /dev/null
+++ b/tools/testing/selftests/cgroup/test_hugetlb_memcg.c
@@ -0,0 +1,234 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include "../kselftest.h"
+#include "cgroup_util.h"
+
+#define ADDR ((void *)(0x0UL))
+#define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB)
+/* mapping 8 MBs == 4 hugepages */
+#define LENGTH (8UL*1024*1024)
+#define PROTECTION (PROT_READ | PROT_WRITE)
+
+/* borrowed from mm/hmm-tests.c */
+static long get_hugepage_size(void)
+{
+	int fd;
+	char buf[2048];
+	int len;
+	char *p, *q, *path = "/proc/meminfo", *tag = "Hugepagesize:";
+	long val;
+
+	fd = open(path, O_RDONLY);
+	if (fd < 0) {
+		/* Error opening the file */
+		return -1;
+	}
+
+	len = read(fd, buf, sizeof(buf));
+	close(fd);
+	if (len < 0) {
+		/* Error in reading the file */
+		return -1;
+	}
+	if (len == sizeof(buf)) {
+		/* Error file is too large */
+		return -1;
+	}
+	buf[len] = '\0';
+
+	/* Search for a tag if provided */
+	if (tag) {
+		p = strstr(buf, tag);
+		if (!p)
+			return -1; /* looks like the line we want isn't there */
+		p += strlen(tag);
+	} else
+		p = buf;
+
+	val = strtol(p, &q, 0);
+	if (*q != ' ') {
+		/* Error parsing the file */
+		return -1;
+	}
+
+	return val;
+}
+
+static int set_file(const char *path, long value)
+{
+	FILE *file;
+	int ret;
+
+	file = fopen(path, "w");
+	if (!file)
+		return -1;
+	ret = fprintf(file, "%ld\n", value);
+	fclose(file);
+	return ret;
+}
+
+static int set_nr_hugepages(long value)
+{
+	return set_file("/proc/sys/vm/nr_hugepages", value);
+}
+
+static unsigned int check_first(char *addr)
+{
+	return *(unsigned int *)addr;
+}
+
+static void write_data(char *addr)
+{
+	unsigned long i;
+
+	for (i = 0; i < LENGTH; i++)
+		*(addr + i) = (char)i;
+}
+
+static int hugetlb_test_program(const char *cgroup, void *arg)
+{
+	char *test_group = (char *)arg;
+	void *addr;
+	long old_current, expected_current, current;
+	int ret = EXIT_FAILURE;
+
+	old_current = cg_read_long(test_group, "memory.current");
+	set_nr_hugepages(20);
+	current = cg_read_long(test_group, "memory.current");
+	if (current - old_current >= MB(2)) {
+		ksft_print_msg(
+			"setting nr_hugepages should not increase hugepage usage.\n");
+		ksft_print_msg("before: %ld, after: %ld\n", old_current, current);
+		return EXIT_FAILURE;
+	}
+
+	addr = mmap(ADDR, LENGTH, PROTECTION, FLAGS, 0, 0);
+	if (addr == MAP_FAILED) {
+		ksft_print_msg("fail to mmap.\n");
+		return EXIT_FAILURE;
+	}
+	current = cg_read_long(test_group, "memory.current");
+	if (current - old_current >= MB(2)) {
+		ksft_print_msg("mmap should not increase hugepage usage.\n");
+		ksft_print_msg("before: %ld, after: %ld\n", old_current, current);
+		goto out_failed_munmap;
+	}
+	old_current = current;
+
+	/* read the first page */
+	check_first(addr);
+	expected_current = old_current + MB(2);
+	current = cg_read_long(test_group, "memory.current");
+	if (!values_close(expected_current, current, 5)) {
+		ksft_print_msg("memory usage should increase by around 2MB.\n");
+		ksft_print_msg(
+			"expected memory: %ld, actual memory: %ld\n",
+			expected_current, current);
+		goto out_failed_munmap;
+	}
+
+	/* write to the whole range */
+	write_data(addr);
+	current = cg_read_long(test_group, "memory.current");
+	expected_current = old_current + MB(8);
+	if (!values_close(expected_current, current, 5)) {
+		ksft_print_msg("memory usage should increase by around 8MB.\n");
+		ksft_print_msg(
+			"expected memory: %ld, actual memory: %ld\n",
+			expected_current, current);
+		goto out_failed_munmap;
+	}
+
+	/* unmap the whole range */
+	munmap(addr, LENGTH);
+	current = cg_read_long(test_group, "memory.current");
+	expected_current = old_current;
+	if (!values_close(expected_current, current, 5)) {
+		ksft_print_msg("memory usage should go back down.\n");
+		ksft_print_msg(
+			"expected memory: %ld, actual memory: %ld\n",
+			expected_current, current);
+		return ret;
+	}
+
+	ret = EXIT_SUCCESS;
+	return ret;
+
+out_failed_munmap:
+	munmap(addr, LENGTH);
+	return ret;
+}
+
+static int test_hugetlb_memcg(char *root)
+{
+	int ret = KSFT_FAIL;
+	char *test_group;
+
+	test_group = cg_name(root, "hugetlb_memcg_test");
+	if (!test_group || cg_create(test_group)) {
+		ksft_print_msg("fail to create cgroup.\n");
+		goto out;
+	}
+
+	if (cg_write(test_group, "memory.max", "100M")) {
+		ksft_print_msg("fail to set cgroup memory limit.\n");
+		goto out;
+	}
+
+	/* disable swap */
+	if (cg_write(test_group, "memory.swap.max", "0")) {
+		ksft_print_msg("fail to disable swap.\n");
+		goto out;
+	}
+
+	if (!cg_run(test_group, hugetlb_test_program, (void *)test_group))
+		ret = KSFT_PASS;
+out:
+	cg_destroy(test_group);
+	free(test_group);
+	return ret;
+}
+
+int main(int argc, char **argv)
+{
+	char root[PATH_MAX];
+	int ret = EXIT_SUCCESS, has_memory_hugetlb_acc;
+
+	has_memory_hugetlb_acc = proc_mount_contains("memory_hugetlb_accounting");
+	if (has_memory_hugetlb_acc < 0)
+		ksft_exit_skip("Failed to query cgroup mount option\n");
+	else if (!has_memory_hugetlb_acc)
+		ksft_exit_skip("memory hugetlb accounting is disabled\n");
+
+	/* Unit is kB! */
+	if (get_hugepage_size() != 2048) {
+		ksft_print_msg("test_hugetlb_memcg requires 2MB hugepages\n");
+		ksft_test_result_skip("test_hugetlb_memcg\n");
+		return ret;
+	}
+
+	if (cg_find_unified_root(root, sizeof(root)))
+		ksft_exit_skip("cgroup v2 isn't mounted\n");
+
+	switch (test_hugetlb_memcg(root)) {
+	case KSFT_PASS:
+		ksft_test_result_pass("test_hugetlb_memcg\n");
+		break;
+	case KSFT_SKIP:
+		ksft_test_result_skip("test_hugetlb_memcg\n");
+		break;
+	default:
+		ret = EXIT_FAILURE;
+		ksft_test_result_fail("test_hugetlb_memcg\n");
+		break;
+	}
+
+	return ret;
+}
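
`get_hugepage_size()` reads `/proc/meminfo` and returns the number after the `Hugepagesize:` tag. That value is in kB, which is why it is compared against 2048. The parsing step can be separated from the file I/O so it can be exercised on a fixed string. `parse_hugepagesize_kb()` below is a hypothetical helper that mirrors the test's `strstr()`/`strtol()` logic, including the requirement that the number be followed by a space (as in `Hugepagesize:    2048 kB`). Unlike the original, it also rejects the case where no digits were parsed at all.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return the value after "Hugepagesize:" in a
 * NUL-terminated meminfo buffer, or -1 if the tag is missing, no digits
 * follow it, or the number is not followed by a space (the " kB" suffix).
 */
static long parse_hugepagesize_kb(const char *buf)
{
	static const char tag[] = "Hugepagesize:";
	const char *p = strstr(buf, tag);
	char *end;
	long val;

	if (!p)
		return -1;
	p += strlen(tag);
	val = strtol(p, &end, 0); /* strtol() skips the leading blanks */
	if (end == p || *end != ' ')
		return -1;
	return val;
}
```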