From patchwork Mon Oct 28 21:05:05 2024
X-Patchwork-Submitter: Joshua Hahn
X-Patchwork-Id: 13854151
From: Joshua Hahn <joshua.hahnjy@gmail.com>
To: hannes@cmpxchg.org
Cc: nphamcs@gmail.com, shakeel.butt@linux.dev, mhocko@kernel.org,
 roman.gushchin@linux.dev, muchun.song@linux.dev, tj@kernel.org,
 lizefan.x@bytedance.com, mkoutny@suse.com, corbet@lwn.net, lnyng@meta.com,
 akpm@linux-foundation.org, cgroups@vger.kernel.org, linux-mm@kvack.org,
 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 kernel-team@meta.com
Subject: [PATCH v3 1/1] memcg/hugetlb: Adding hugeTLB counters to memcg
Date: Mon, 28 Oct 2024 14:05:05 -0700
Message-ID: <20241028210505.1950884-1-joshua.hahnjy@gmail.com>
X-Mailer: git-send-email 2.43.5
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

This patch introduces a new counter to memory.stat that tracks hugeTLB
usage, only if hugeTLB accounting is done to memory.current. This
feature is enabled the same way hugeTLB accounting is enabled, via
the memory_hugetlb_accounting mount flag for cgroupsv2.

1. Why is this patch necessary?
Currently, memcg hugeTLB accounting is an opt-in feature [1] that adds
hugeTLB usage to memory.current. However, the metric is not reported in
memory.stat. Given that users often interpret memory.stat as a breakdown
of the value reported in memory.current, the disparity between the two
reports can be confusing. This patch solves this problem by including
the metric in memory.stat as well, but only if it is also reported in
memory.current (it would also be confusing if the value was reported in
memory.stat, but not in memory.current).

Aside from the consistency between the two files, we also see benefits
in observability. Userspace might be interested in the hugeTLB footprint
of cgroups for many reasons. For instance, system admins might want to
verify that hugeTLB usage is distributed as expected across tasks: i.e.
memory-intensive tasks are using more hugeTLB pages than tasks that
don't consume a lot of memory, or are seen to fault frequently. Note that
this is separate from wanting to inspect the distribution for limiting
purposes (in which case, the hugeTLB controller makes more sense).

2. We already have a hugeTLB controller. Why not use that?
It is true that the hugeTLB controller tracks the exact value that we
want. In fact, by enabling the hugeTLB controller, we get all of the
observability benefits that I mentioned above, and users can check the
total hugeTLB usage, verify if it is distributed as expected, etc.

With this said, there are 2 problems:
  (a) hugeTLB usage is still not reported in memory.stat, which means
      the disparity between the memcg reports is still there.
  (b) We cannot reasonably expect users to enable the hugeTLB controller
      just for the sake of hugeTLB usage reporting, especially when they
      have no use for hugeTLB usage enforcement [2].

[1] https://lore.kernel.org/all/20231006184629.155543-1-nphamcs@gmail.com/
[2] Of course, we can't make a new patch for every feature that can be
duplicated.
However, since the existing solution of enabling the hugeTLB
controller is an imperfect solution that still leaves a discrepancy
between memory.stat and memory.current, I think that it is reasonable to
isolate the feature in this case.

Suggested-by: Nhat Pham
Suggested-by: Shakeel Butt
Signed-off-by: Joshua Hahn
Acked-by: Shakeel Butt
Reviewed-by: Roman Gushchin
Acked-by: Johannes Weiner
Reviewed-by: Nhat Pham
Acked-by: Chris Down
Acked-by: Michal Hocko

---
Changelog
v3:
  * Removed check for whether CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING is on,
    since this check is already handled by lruvec_stat_mod (and doing the
    check in hugetlb.c actually breaks the build if MEMCG is not enabled).
  * Because there is now only one check for the flags, I've opted to
    do all of the cleanup in a separate patch series.
  * Added hugetlb information in cgroup-v2.rst
  * Added Suggested-by: Shakeel Butt
v2:
  * Enables the feature only if memcg accounts for hugeTLB usage
  * Moves the counter from memcg_stat_item to node_stat_item
  * Expands on motivation & justification in commitlog
  * Added Suggested-by: Nhat Pham

 Documentation/admin-guide/cgroup-v2.rst |  5 +++++
 include/linux/mmzone.h                  |  3 +++
 mm/hugetlb.c                            |  2 ++
 mm/memcontrol.c                         | 11 +++++++++++
 mm/vmstat.c                             |  3 +++
 5 files changed, 24 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 69af2173555f..bd7e81c2aa2b 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1646,6 +1646,11 @@ The following nested keys are defined.
 	  pgdemote_khugepaged
 		Number of pages demoted by khugepaged.
 
+	  hugetlb
+		Amount of memory used by hugetlb pages. This metric only shows
+		up if hugetlb usage is accounted for in memory.current (i.e.
+		cgroup is mounted with the memory_hugetlb_accounting option).
+
   memory.numa_stat
 	A read-only nested-keyed file which exists on non-root cgroups.
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 17506e4a2835..972795ae5946 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -220,6 +220,9 @@ enum node_stat_item {
 	PGDEMOTE_KSWAPD,
 	PGDEMOTE_DIRECT,
 	PGDEMOTE_KHUGEPAGED,
+#ifdef CONFIG_HUGETLB_PAGE
+	NR_HUGETLB,
+#endif
 	NR_VM_NODE_STAT_ITEMS
 };
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 190fa05635f4..fbb10e52d7ea 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1925,6 +1925,7 @@ void free_huge_folio(struct folio *folio)
 				     pages_per_huge_page(h), folio);
 	hugetlb_cgroup_uncharge_folio_rsvd(hstate_index(h),
 					  pages_per_huge_page(h), folio);
+	lruvec_stat_mod_folio(folio, NR_HUGETLB, -pages_per_huge_page(h));
 	mem_cgroup_uncharge(folio);
 	if (restore_reserve)
 		h->resv_huge_pages++;
@@ -3093,6 +3094,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 
 	if (!memcg_charge_ret)
 		mem_cgroup_commit_charge(folio, memcg);
+	lruvec_stat_mod_folio(folio, NR_HUGETLB, pages_per_huge_page(h));
 	mem_cgroup_put(memcg);
 
 	return folio;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7845c64a2c57..5444d0e7bb64 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -310,6 +310,9 @@ static const unsigned int memcg_node_stat_items[] = {
 	PGDEMOTE_KSWAPD,
 	PGDEMOTE_DIRECT,
 	PGDEMOTE_KHUGEPAGED,
+#ifdef CONFIG_HUGETLB_PAGE
+	NR_HUGETLB,
+#endif
 };
 
 static const unsigned int memcg_stat_items[] = {
@@ -1346,6 +1349,9 @@ static const struct memory_stat memory_stats[] = {
 	{ "unevictable",		NR_UNEVICTABLE			},
 	{ "slab_reclaimable",		NR_SLAB_RECLAIMABLE_B		},
 	{ "slab_unreclaimable",		NR_SLAB_UNRECLAIMABLE_B		},
+#ifdef CONFIG_HUGETLB_PAGE
+	{ "hugetlb",			NR_HUGETLB			},
+#endif
 
 	/* The memory events */
 	{ "workingset_refault_anon",	WORKINGSET_REFAULT_ANON		},
@@ -1441,6 +1447,11 @@ static void memcg_stat_format(struct mem_cgroup *memcg, struct seq_buf *s)
 	for (i = 0; i < ARRAY_SIZE(memory_stats); i++) {
 		u64 size;
 
+#ifdef CONFIG_HUGETLB_PAGE
+		if (unlikely(memory_stats[i].idx == NR_HUGETLB) &&
+		    !(cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING))
+			continue;
+#endif
 		size = memcg_page_state_output(memcg, memory_stats[i].idx);
 		seq_buf_printf(s, "%s %llu\n", memory_stats[i].name, size);
 
diff --git a/mm/vmstat.c b/mm/vmstat.c
index b5a4cea423e1..871566b04b79 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1273,6 +1273,9 @@ const char * const vmstat_text[] = {
 	"pgdemote_kswapd",
 	"pgdemote_direct",
 	"pgdemote_khugepaged",
+#ifdef CONFIG_HUGETLB_PAGE
+	"nr_hugetlb",
+#endif
 	/* system-wide enum vm_stat_item counters */
 	"nr_dirty_threshold",
 	"nr_dirty_background_threshold",
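
For readers who want to see the reporting rule in isolation, the gating
done in the memcg_stat_format() hunk can be sketched in plain userspace C.
This is only an illustrative model, not kernel code: struct stat_entry,
IDX_*, ROOT_FLAG_HUGETLB_ACCOUNTING and format_stats() are hypothetical
stand-ins for memory_stats[], the node_stat_item indices,
CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING and the seq_buf-based formatter, and
the values are assumed to be already converted to bytes.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the kernel stat indices used in the patch. */
enum stat_idx {
	IDX_UNEVICTABLE,
	IDX_HUGETLB,
	IDX_WORKINGSET_REFAULT_ANON,
};

/* Hypothetical stand-in for CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING. */
#define ROOT_FLAG_HUGETLB_ACCOUNTING (1u << 0)

struct stat_entry {
	const char *name;
	enum stat_idx idx;
	unsigned long long bytes;	/* assumed to be in bytes already */
};

/*
 * Write each entry as "name value\n" into buf. The hugetlb entry is
 * skipped unless the accounting flag is set, mirroring the "continue"
 * in the patch. Returns the number of bytes written (without the
 * terminating NUL), or -1 if buf is too small.
 */
static int format_stats(const struct stat_entry *stats, size_t n,
			unsigned int root_flags, char *buf, size_t len)
{
	size_t used = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	for (size_t i = 0; i < n; i++) {
		int ret;

		/* Hide the counter when hugeTLB is not charged to memcg. */
		if (stats[i].idx == IDX_HUGETLB &&
		    !(root_flags & ROOT_FLAG_HUGETLB_ACCOUNTING))
			continue;

		ret = snprintf(buf + used, len - used, "%s %llu\n",
			       stats[i].name, stats[i].bytes);
		if (ret < 0 || (size_t)ret >= len - used)
			return -1;
		used += (size_t)ret;
	}

	return (int)used;
}
```

Calling format_stats() with root_flags set to 0 leaves the "hugetlb" line
out of the buffer entirely, while passing ROOT_FLAG_HUGETLB_ACCOUNTING
emits it between its neighbours, which is the behaviour the commit message
describes for memory.stat.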