From patchwork Tue Feb 18 03:14:40 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: JP Kobryn
X-Patchwork-Id: 13978852
From: JP Kobryn
To: shakeel.butt@linux.dev, tj@kernel.org, mhocko@kernel.org,
 hannes@cmpxchg.org, yosryahmed@google.com, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 03/11] cgroup: move cgroup_rstat from cgroup to cgroup_subsys_state
Date: Mon, 17 Feb 2025 19:14:40 -0800
Message-ID: <20250218031448.46951-4-inwardvessel@gmail.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250218031448.46951-1-inwardvessel@gmail.com>
References: <20250218031448.46951-1-inwardvessel@gmail.com>

Each cgroup owns a single cgroup_rstat instance. This means that a tree
of pending rstat updates can contain changes from different subsystems.
Because of this arrangement, when one subsystem is flushed via the
public API cgroup_rstat_flush(), all other subsystems with pending
updates are flushed as well.

Remove the cgroup_rstat instance from the cgroup and instead give one
instance to each cgroup_subsys_state. A separate rstat tree now exists
for each subsystem. This separation allows a subsystem to make updates
and flushes without side effects on other subsystems. For example,
flushing the cpu stats does not cause the memory stats to be flushed,
and vice versa.

Moving cgroup_rstat ownership from the cgroup to the
cgroup_subsys_state allows the css to be flushed directly. The
RCU-protected list that the cgroup used to track subsystem states with
pending flushes (rstat_css_list and rstat_css_node), along with the
operations on it, is therefore removed.

In terms of client code, the public API calls now accept a
cgroup_subsys_state so that every flush or update is associated with a
specific subsystem.

Signed-off-by: JP Kobryn
---
 block/blk-cgroup.c                            |   4 +-
 include/linux/cgroup-defs.h                   |  10 +-
 include/linux/cgroup.h                        |   8 +-
 kernel/cgroup/cgroup-internal.h               |   4 +-
 kernel/cgroup/cgroup.c                        |  64 ++++++----
 kernel/cgroup/rstat.c                         | 117 ++++++++++--------
 mm/memcontrol.c                               |   4 +-
 .../selftests/bpf/progs/btf_type_tag_percpu.c |   5 +-
 .../bpf/progs/cgroup_hierarchical_stats.c     |   8 +-
 9 files changed, 123 insertions(+), 101 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 9ed93d91d754..6a0680d8ce6a 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1201,7 +1201,7 @@ static int blkcg_print_stat(struct seq_file *sf, void *v)
 	if (!seq_css(sf)->parent)
 		blkcg_fill_root_iostats();
 	else
-		cgroup_rstat_flush(blkcg->css.cgroup);
+		cgroup_rstat_flush(&blkcg->css);
 
 	rcu_read_lock();
 	hlist_for_each_entry_rcu(blkg, &blkcg->blkg_list, blkcg_node) {
@@ -2186,7 +2186,7 @@ void blk_cgroup_bio_start(struct bio *bio)
 	}
 
 	u64_stats_update_end_irqrestore(&bis->sync, flags);
-	cgroup_rstat_updated(blkcg->css.cgroup, cpu);
+	cgroup_rstat_updated(&blkcg->css, cpu);
 	put_cpu();
 }
 
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 6b6cc027fe70..81ec56860ee5 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -180,9 +180,6 @@ struct cgroup_subsys_state {
 	struct list_head sibling;
 	struct list_head children;
 
-	/* flush target list anchored at cgrp->rstat_css_list */
-	struct list_head rstat_css_node;
-
 	/*
 	 * PI: Subsys-unique ID. 0 is unused and root is always 1. The
 	 * matching css can be looked up using css_from_id().
@@ -222,6 +219,9 @@ struct cgroup_subsys_state {
 	 * Protected by cgroup_mutex.
 	 */
 	int nr_descendants;
+
+	/* per-cpu recursive resource statistics */
+	struct cgroup_rstat rstat;
 };
 
 /*
@@ -444,10 +444,6 @@ struct cgroup {
 	struct cgroup *dom_cgrp;
 	struct cgroup *old_dom_cgrp;		/* used while enabling threaded */
 
-	/* per-cpu recursive resource statistics */
-	struct cgroup_rstat rstat;
-	struct list_head rstat_css_list;
-
 	/* cgroup basic resource statistics */
 	struct cgroup_base_stat last_bstat;
 	struct cgroup_base_stat bstat;
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index f8ef47f8a634..eec970622419 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -687,10 +687,10 @@ static inline void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen)
 /*
  * cgroup scalable recursive statistics.
  */
-void cgroup_rstat_updated(struct cgroup *cgrp, int cpu);
-void cgroup_rstat_flush(struct cgroup *cgrp);
-void cgroup_rstat_flush_hold(struct cgroup *cgrp);
-void cgroup_rstat_flush_release(struct cgroup *cgrp);
+void cgroup_rstat_updated(struct cgroup_subsys_state *css, int cpu);
+void cgroup_rstat_flush(struct cgroup_subsys_state *css);
+void cgroup_rstat_flush_hold(struct cgroup_subsys_state *css);
+void cgroup_rstat_flush_release(struct cgroup_subsys_state *css);
 
 /*
  * Basic resource stats.
diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
index 03139018da43..87d062baff90 100644
--- a/kernel/cgroup/cgroup-internal.h
+++ b/kernel/cgroup/cgroup-internal.h
@@ -269,8 +269,8 @@ int cgroup_task_count(const struct cgroup *cgrp);
 /*
  * rstat.c
  */
-int cgroup_rstat_init(struct cgroup_rstat *rstat);
-void cgroup_rstat_exit(struct cgroup_rstat *rstat);
+int cgroup_rstat_init(struct cgroup_subsys_state *css);
+void cgroup_rstat_exit(struct cgroup_subsys_state *css);
 void cgroup_rstat_boot(void);
 void cgroup_base_stat_cputime_show(struct seq_file *seq);
 
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 02d6c11ccccb..916e9c5a1fd7 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -165,7 +165,9 @@ static struct static_key_true *cgroup_subsys_on_dfl_key[] = {
 static DEFINE_PER_CPU(struct cgroup_rstat_cpu, cgrp_dfl_root_rstat_cpu);
 
 /* the default hierarchy */
-struct cgroup_root cgrp_dfl_root = { .cgrp.rstat.rstat_cpu = &cgrp_dfl_root_rstat_cpu };
+struct cgroup_root cgrp_dfl_root = {
+	.cgrp.self.rstat.rstat_cpu = &cgrp_dfl_root_rstat_cpu
+};
 EXPORT_SYMBOL_GPL(cgrp_dfl_root);
 
 /*
@@ -1359,7 +1361,7 @@ static void cgroup_destroy_root(struct cgroup_root *root)
 
 	cgroup_unlock();
 
-	cgroup_rstat_exit(&cgrp->rstat);
+	cgroup_rstat_exit(&cgrp->self);
 	kernfs_destroy_root(root->kf_root);
 	cgroup_free_root(root);
 }
@@ -1864,13 +1866,6 @@ int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask)
 		}
 		spin_unlock_irq(&css_set_lock);
 
-		if (ss->css_rstat_flush) {
-			list_del_rcu(&css->rstat_css_node);
-			synchronize_rcu();
-			list_add_rcu(&css->rstat_css_node,
-				     &dcgrp->rstat_css_list);
-		}
-
 		/* default hierarchy doesn't enable controllers by default */
 		dst_root->subsys_mask |= 1 << ssid;
 		if (dst_root == &cgrp_dfl_root) {
@@ -2053,7 +2048,6 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp)
 	cgrp->dom_cgrp = cgrp;
 	cgrp->max_descendants = INT_MAX;
 	cgrp->max_depth = INT_MAX;
-	INIT_LIST_HEAD(&cgrp->rstat_css_list);
 	prev_cputime_init(&cgrp->prev_cputime);
 
 	for_each_subsys(ss, ssid)
@@ -2133,7 +2127,7 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
 	if (ret)
 		goto destroy_root;
 
-	ret = cgroup_rstat_init(&root_cgrp->rstat);
+	ret = cgroup_rstat_init(&root_cgrp->self);
 	if (ret)
 		goto destroy_root;
 
@@ -2175,7 +2169,7 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
 	goto out;
 
 exit_stats:
-	cgroup_rstat_exit(&root_cgrp->rstat);
+	cgroup_rstat_exit(&root_cgrp->self);
 destroy_root:
 	kernfs_destroy_root(root->kf_root);
 	root->kf_root = NULL;
@@ -3240,6 +3234,12 @@ static int cgroup_apply_control_enable(struct cgroup *cgrp)
 				css = css_create(dsct, ss);
 				if (IS_ERR(css))
 					return PTR_ERR(css);
+
+				if (css->ss && css->ss->css_rstat_flush) {
+					ret = cgroup_rstat_init(css);
+					if (ret)
+						goto err_out;
+				}
 			}
 
 			WARN_ON_ONCE(percpu_ref_is_dying(&css->refcnt));
@@ -3253,6 +3253,21 @@ static int cgroup_apply_control_enable(struct cgroup *cgrp)
 	}
 
 	return 0;
+
+err_out:
+	cgroup_for_each_live_descendant_pre(dsct, d_css, cgrp) {
+		for_each_subsys(ss, ssid) {
+			struct cgroup_subsys_state *css = cgroup_css(dsct, ss);
+
+			if (!(cgroup_ss_mask(dsct) & (1 << ss->id)))
+				continue;
+
+			if (css && css->ss && css->ss->css_rstat_flush)
+				cgroup_rstat_exit(css);
+		}
+	}
+
+	return ret;
 }
 
 /**
@@ -5436,7 +5451,7 @@ static void css_free_rwork_fn(struct work_struct *work)
 			cgroup_put(cgroup_parent(cgrp));
 			kernfs_put(cgrp->kn);
 			psi_cgroup_free(cgrp);
-			cgroup_rstat_exit(&cgrp->rstat);
+			cgroup_rstat_exit(&cgrp->self);
 			kfree(cgrp);
 		} else {
 			/*
@@ -5464,11 +5479,7 @@ static void css_release_work_fn(struct work_struct *work)
 	if (ss) {
 		struct cgroup *parent_cgrp;
 
-		/* css release path */
-		if (!list_empty(&css->rstat_css_node)) {
-			cgroup_rstat_flush(cgrp);
-			list_del_rcu(&css->rstat_css_node);
-		}
+		cgroup_rstat_flush(css);
 
 		cgroup_idr_replace(&ss->css_idr, NULL, css->id);
 		if (ss->css_released)
@@ -5494,7 +5505,7 @@ static void css_release_work_fn(struct work_struct *work)
 		/* cgroup release path */
 		TRACE_CGROUP_PATH(release, cgrp);
 
-		cgroup_rstat_flush(cgrp);
+		cgroup_rstat_flush(&cgrp->self);
 
 		spin_lock_irq(&css_set_lock);
 		for (tcgrp = cgroup_parent(cgrp); tcgrp;
@@ -5542,7 +5553,6 @@ static void init_and_link_css(struct cgroup_subsys_state *css,
 	css->id = -1;
 	INIT_LIST_HEAD(&css->sibling);
 	INIT_LIST_HEAD(&css->children);
-	INIT_LIST_HEAD(&css->rstat_css_node);
 	css->serial_nr = css_serial_nr_next++;
 	atomic_set(&css->online_cnt, 0);
 
@@ -5551,9 +5561,6 @@ static void init_and_link_css(struct cgroup_subsys_state *css,
 		css_get(css->parent);
 	}
 
-	if (ss->css_rstat_flush)
-		list_add_rcu(&css->rstat_css_node, &cgrp->rstat_css_list);
-
 	BUG_ON(cgroup_css(cgrp, ss));
 }
 
@@ -5659,7 +5666,6 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
 err_list_del:
 	list_del_rcu(&css->sibling);
 err_free_css:
-	list_del_rcu(&css->rstat_css_node);
 	INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn);
 	queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork);
 	return ERR_PTR(err);
@@ -5687,7 +5693,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name,
 	if (ret)
 		goto out_free_cgrp;
 
-	ret = cgroup_rstat_init(&cgrp->rstat);
+	ret = cgroup_rstat_init(&cgrp->self);
 	if (ret)
 		goto out_cancel_ref;
 
@@ -5780,7 +5786,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name,
 out_kernfs_remove:
 	kernfs_remove(cgrp->kn);
 out_stat_exit:
-	cgroup_rstat_exit(&cgrp->rstat);
+	cgroup_rstat_exit(&cgrp->self);
 out_cancel_ref:
 	percpu_ref_exit(&cgrp->self.refcnt);
 out_free_cgrp:
@@ -6092,6 +6098,9 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
 	} else {
 		css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2, GFP_KERNEL);
 		BUG_ON(css->id < 0);
+
+		if (css->ss && css->ss->css_rstat_flush)
+			BUG_ON(cgroup_rstat_init(css));
 	}
 
 	/* Update the init_css_set to contain a subsys
@@ -6193,6 +6202,9 @@ int __init cgroup_init(void)
 			css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2,
 						   GFP_KERNEL);
 			BUG_ON(css->id < 0);
+
+			if (css->ss && css->ss->css_rstat_flush)
+				BUG_ON(cgroup_rstat_init(css));
 		} else {
 			cgroup_init_subsys(ss, false);
 		}
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 13090dda56aa..a32bcd7942a5 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -21,13 +21,13 @@ static struct cgroup_rstat_cpu *rstat_cpu(struct cgroup_rstat *rstat, int cpu)
 
 static struct cgroup_rstat *rstat_parent(struct cgroup_rstat *rstat)
 {
-	struct cgroup *cgrp = container_of(rstat, typeof(*cgrp), rstat);
-	struct cgroup *parent = cgroup_parent(cgrp);
+	struct cgroup_subsys_state *css = container_of(
+			rstat, typeof(*css), rstat);
 
-	if (!parent)
+	if (!css->parent)
 		return NULL;
 
-	return &parent->rstat;
+	return &(css->parent->rstat);
 }
 
 /*
@@ -86,7 +86,9 @@ void _cgroup_rstat_cpu_unlock(raw_spinlock_t *cpu_lock, int cpu,
 
 static void __cgroup_rstat_updated(struct cgroup_rstat *rstat, int cpu)
 {
-	struct cgroup *cgrp = container_of(rstat, typeof(*cgrp), rstat);
+	struct cgroup_subsys_state *css = container_of(
+			rstat, typeof(*css), rstat);
+	struct cgroup *cgrp = css->cgroup;
 	raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
 	unsigned long flags;
 
@@ -95,7 +97,7 @@ static void __cgroup_rstat_updated(struct cgroup_rstat *rstat, int cpu)
 	 * temporary inaccuracies, which is fine.
 	 *
 	 * Because @parent's updated_children is terminated with @parent
-	 * instead of NULL, we can tell whether @cgrp is on the list by
+	 * instead of NULL, we can tell whether @rstat is on the list by
 	 * testing the next pointer for NULL.
 	 */
 	if (data_race(rstat_cpu(rstat, cpu)->updated_next))
@@ -103,7 +105,7 @@ static void __cgroup_rstat_updated(struct cgroup_rstat *rstat, int cpu)
 
 	flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, cgrp, true);
 
-	/* put @cgrp and all ancestors on the corresponding updated lists */
+	/* put @rstat and all ancestors on the corresponding updated lists */
 	while (true) {
 		struct cgroup_rstat_cpu *rstatc = rstat_cpu(rstat, cpu);
 		struct cgroup_rstat *parent = rstat_parent(rstat);
@@ -134,16 +136,16 @@ static void __cgroup_rstat_updated(struct cgroup_rstat *rstat, int cpu)
 
 /**
  * cgroup_rstat_updated - keep track of updated rstat_cpu
- * @cgrp: target cgroup
+ * @css: target cgroup subsystem state
  * @cpu: cpu on which rstat_cpu was updated
  *
- * @cgrp's rstat_cpu on @cpu was updated. Put it on the parent's matching
+ * @css's rstat_cpu on @cpu was updated. Put it on the parent's matching
  * rstat_cpu->updated_children list. See the comment on top of
  * cgroup_rstat_cpu definition for details.
  */
-__bpf_kfunc void cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
+__bpf_kfunc void cgroup_rstat_updated(struct cgroup_subsys_state *css, int cpu)
 {
-	__cgroup_rstat_updated(&cgrp->rstat, cpu);
+	__cgroup_rstat_updated(&css->rstat, cpu);
 }
 
 /**
@@ -220,7 +222,9 @@ static struct cgroup_rstat *cgroup_rstat_push_children(
 static struct cgroup_rstat *cgroup_rstat_updated_list(
 				struct cgroup_rstat *root, int cpu)
 {
-	struct cgroup *cgrp = container_of(root, typeof(*cgrp), rstat);
+	struct cgroup_subsys_state *css = container_of(
+			root, typeof(*css), rstat);
+	struct cgroup *cgrp = css->cgroup;
 	raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
 	struct cgroup_rstat_cpu *rstatc = rstat_cpu(root, cpu);
 	struct cgroup_rstat *head = NULL, *parent, *child;
@@ -322,7 +326,9 @@ static inline void __cgroup_rstat_unlock(struct cgroup *cgrp, int cpu_in_loop)
 static void cgroup_rstat_flush_locked(struct cgroup_rstat *rstat)
 	__releases(&cgroup_rstat_lock) __acquires(&cgroup_rstat_lock)
 {
-	struct cgroup *cgrp = container_of(rstat, typeof(*cgrp), rstat);
+	struct cgroup_subsys_state *css = container_of(
+			rstat, typeof(*css), rstat);
+	struct cgroup *cgrp = css->cgroup;
 	int cpu;
 
 	lockdep_assert_held(&cgroup_rstat_lock);
@@ -331,17 +337,16 @@ static void cgroup_rstat_flush_locked(struct cgroup_rstat *rstat)
 		struct cgroup_rstat *pos = cgroup_rstat_updated_list(rstat, cpu);
 
 		for (; pos; pos = pos->rstat_flush_next) {
-			struct cgroup *pos_cgroup = container_of(pos, struct cgroup, rstat);
-			struct cgroup_subsys_state *css;
+			struct cgroup_subsys_state *pos_css = container_of(
+					pos, typeof(*pos_css), rstat);
+			struct cgroup *pos_cgroup = pos_css->cgroup;
 
-			cgroup_base_stat_flush(pos_cgroup, cpu);
-			bpf_rstat_flush(pos_cgroup, cgroup_parent(pos_cgroup), cpu);
+			if (!pos_css->ss)
+				cgroup_base_stat_flush(pos_cgroup, cpu);
+			else
+				pos_css->ss->css_rstat_flush(pos_css, cpu);
 
-			rcu_read_lock();
-			list_for_each_entry_rcu(css, &pos_cgroup->rstat_css_list,
-						rstat_css_node)
-				css->ss->css_rstat_flush(css, cpu);
-			rcu_read_unlock();
+			bpf_rstat_flush(pos_cgroup, cgroup_parent(pos_cgroup), cpu);
 		}
 
 		/* play nice and yield if necessary */
@@ -356,7 +361,9 @@ static void cgroup_rstat_flush_locked(struct cgroup_rstat *rstat)
 
 static void __cgroup_rstat_flush(struct cgroup_rstat *rstat)
 {
-	struct cgroup *cgrp = container_of(rstat, typeof(*cgrp), rstat);
+	struct cgroup_subsys_state *css = container_of(
+			rstat, typeof(*css), rstat);
+	struct cgroup *cgrp = css->cgroup;
 
 	might_sleep();
 
@@ -366,27 +373,29 @@ static void __cgroup_rstat_flush(struct cgroup_rstat *rstat)
 }
 
 /**
- * cgroup_rstat_flush - flush stats in @cgrp's subtree
- * @cgrp: target cgroup
+ * cgroup_rstat_flush - flush stats in @css's rstat subtree
+ * @css: target cgroup subsystem state
  *
- * Collect all per-cpu stats in @cgrp's subtree into the global counters
- * and propagate them upwards. After this function returns, all cgroups in
- * the subtree have up-to-date ->stat.
+ * Collect all per-cpu stats in @css's subtree into the global counters
+ * and propagate them upwards. After this function returns, all rstat
+ * nodes in the subtree have up-to-date ->stat.
  *
- * This also gets all cgroups in the subtree including @cgrp off the
+ * This also gets all rstat nodes in the subtree including @css off the
  * ->updated_children lists.
  *
  * This function may block.
  */
-__bpf_kfunc void cgroup_rstat_flush(struct cgroup *cgrp)
+__bpf_kfunc void cgroup_rstat_flush(struct cgroup_subsys_state *css)
 {
-	__cgroup_rstat_flush(&cgrp->rstat);
+	__cgroup_rstat_flush(&css->rstat);
 }
 
 static void __cgroup_rstat_flush_hold(struct cgroup_rstat *rstat)
 	__acquires(&cgroup_rstat_lock)
 {
-	struct cgroup *cgrp = container_of(rstat, typeof(*cgrp), rstat);
+	struct cgroup_subsys_state *css = container_of(
+			rstat, typeof(*css), rstat);
+	struct cgroup *cgrp = css->cgroup;
 
 	might_sleep();
 	__cgroup_rstat_lock(cgrp, -1);
@@ -394,38 +403,40 @@ static void __cgroup_rstat_flush_hold(struct cgroup_rstat *rstat)
 }
 
 /**
- * cgroup_rstat_flush_hold - flush stats in @cgrp's subtree and hold
- * @cgrp: target cgroup
+ * cgroup_rstat_flush_hold - flush stats in @css's rstat subtree and hold
+ * @css: target subsystem state
  *
- * Flush stats in @cgrp's subtree and prevent further flushes. Must be
+ * Flush stats in @css's rstat subtree and prevent further flushes. Must be
  * paired with cgroup_rstat_flush_release().
  *
  * This function may block.
  */
-void cgroup_rstat_flush_hold(struct cgroup *cgrp)
+void cgroup_rstat_flush_hold(struct cgroup_subsys_state *css)
 {
-	__cgroup_rstat_flush_hold(&cgrp->rstat);
+	__cgroup_rstat_flush_hold(&css->rstat);
 }
 
 /**
  * cgroup_rstat_flush_release - release cgroup_rstat_flush_hold()
- * @cgrp: cgroup used by tracepoint
+ * @rstat: rstat node used to find associated cgroup used by tracepoint
  */
 static void __cgroup_rstat_flush_release(struct cgroup_rstat *rstat)
 	__releases(&cgroup_rstat_lock)
 {
-	struct cgroup *cgrp = container_of(rstat, typeof(*cgrp), rstat);
+	struct cgroup_subsys_state *css = container_of(
+			rstat, typeof(*css), rstat);
+	struct cgroup *cgrp = css->cgroup;
 
 	__cgroup_rstat_unlock(cgrp, -1);
 }
 
 /**
  * cgroup_rstat_flush_release - release cgroup_rstat_flush_hold()
- * @cgrp: cgroup used by tracepoint
+ * @css: css that was previously used for the call to flush hold
  */
-void cgroup_rstat_flush_release(struct cgroup *cgrp)
+void cgroup_rstat_flush_release(struct cgroup_subsys_state *css)
 {
-	__cgroup_rstat_flush_release(&cgrp->rstat);
+	__cgroup_rstat_flush_release(&css->rstat);
 }
 
 static void __cgroup_rstat_init(struct cgroup_rstat *rstat)
@@ -441,8 +452,10 @@ static void __cgroup_rstat_init(struct cgroup_rstat *rstat)
 	}
 }
 
-int cgroup_rstat_init(struct cgroup_rstat *rstat)
+int cgroup_rstat_init(struct cgroup_subsys_state *css)
 {
+	struct cgroup_rstat *rstat = &css->rstat;
+
 	/* the root cgrp has rstat_cpu preallocated */
 	if (!rstat->rstat_cpu) {
 		rstat->rstat_cpu = alloc_percpu(struct cgroup_rstat_cpu);
@@ -472,11 +485,11 @@ static void __cgroup_rstat_exit(struct cgroup_rstat *rstat)
 	rstat->rstat_cpu = NULL;
 }
 
-void cgroup_rstat_exit(struct cgroup_rstat *rstat)
+void cgroup_rstat_exit(struct cgroup_subsys_state *css)
 {
-	struct cgroup *cgrp = container_of(rstat, typeof(*cgrp), rstat);
+	struct cgroup_rstat *rstat = &css->rstat;
 
-	cgroup_rstat_flush(cgrp);
+	cgroup_rstat_flush(css);
 	__cgroup_rstat_exit(rstat);
 }
 
@@ -518,7 +531,7 @@ static void cgroup_base_stat_sub(struct cgroup_base_stat *dst_bstat,
 
 static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
 {
-	struct cgroup_rstat_cpu *rstatc = rstat_cpu(&cgrp->rstat, cpu);
+	struct cgroup_rstat_cpu *rstatc = rstat_cpu(&(cgrp->self.rstat), cpu);
 	struct cgroup *parent = cgroup_parent(cgrp);
 	struct cgroup_rstat_cpu *prstatc;
 	struct cgroup_base_stat delta;
@@ -548,7 +561,7 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
 		cgroup_base_stat_add(&cgrp->last_bstat, &delta);
 
 		delta = rstatc->subtree_bstat;
-		prstatc = rstat_cpu(&parent->rstat, cpu);
+		prstatc = rstat_cpu(&(parent->self.rstat), cpu);
 		cgroup_base_stat_sub(&delta, &rstatc->last_subtree_bstat);
 		cgroup_base_stat_add(&prstatc->subtree_bstat, &delta);
 		cgroup_base_stat_add(&rstatc->last_subtree_bstat, &delta);
@@ -560,7 +573,7 @@ cgroup_base_stat_cputime_account_begin(struct cgroup *cgrp, unsigned long *flags
 {
 	struct cgroup_rstat_cpu *rstatc;
 
-	rstatc = get_cpu_ptr(cgrp->rstat.rstat_cpu);
+	rstatc = get_cpu_ptr(cgrp->self.rstat.rstat_cpu);
 	*flags = u64_stats_update_begin_irqsave(&rstatc->bsync);
 	return rstatc;
 }
@@ -570,7 +583,7 @@ static void cgroup_base_stat_cputime_account_end(struct cgroup *cgrp,
 						 unsigned long flags)
 {
 	u64_stats_update_end_irqrestore(&rstatc->bsync, flags);
-	cgroup_rstat_updated(cgrp, smp_processor_id());
+	cgroup_rstat_updated(&cgrp->self, smp_processor_id());
 	put_cpu_ptr(rstatc);
 }
 
@@ -673,12 +686,12 @@ void cgroup_base_stat_cputime_show(struct seq_file *seq)
 	u64 usage, utime, stime, ntime;
 
 	if (cgroup_parent(cgrp)) {
-		cgroup_rstat_flush_hold(cgrp);
+		cgroup_rstat_flush_hold(&cgrp->self);
 		usage = cgrp->bstat.cputime.sum_exec_runtime;
 		cputime_adjust(&cgrp->bstat.cputime, &cgrp->prev_cputime,
 			       &utime, &stime);
 		ntime = cgrp->bstat.ntime;
-		cgroup_rstat_flush_release(cgrp);
+		cgroup_rstat_flush_release(&cgrp->self);
 	} else {
 		/* cgrp->bstat of root is not actually used, reuse it */
 		root_cgroup_cputime(&cgrp->bstat);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 46f8b372d212..88c2c8e610b1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -579,7 +579,7 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 	if (!val)
 		return;
 
-	cgroup_rstat_updated(memcg->css.cgroup, cpu);
+	cgroup_rstat_updated(&memcg->css, cpu);
 	statc = this_cpu_ptr(memcg->vmstats_percpu);
 	for (; statc; statc = statc->parent) {
 		stats_updates = READ_ONCE(statc->stats_updates) + abs(val);
@@ -611,7 +611,7 @@ static void __mem_cgroup_flush_stats(struct mem_cgroup *memcg, bool force)
 	if (mem_cgroup_is_root(memcg))
 		WRITE_ONCE(flush_last_time, jiffies_64);
 
-	cgroup_rstat_flush(memcg->css.cgroup);
+	cgroup_rstat_flush(&memcg->css);
 }
 
 /*
diff --git a/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c b/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c
index 035412265c3c..310cd51e12e8 100644
--- a/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c
+++ b/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c
@@ -45,7 +45,7 @@ int BPF_PROG(test_percpu2, struct bpf_testmod_btf_type_tag_2 *arg)
 SEC("tp_btf/cgroup_mkdir")
 int BPF_PROG(test_percpu_load, struct cgroup *cgrp, const char *path)
 {
-	g = (__u64)cgrp->rstat.rstat_cpu->updated_children;
+	g = (__u64)cgrp->self.rstat.rstat_cpu->updated_children;
 	return 0;
 }
 
@@ -56,7 +56,8 @@ int BPF_PROG(test_percpu_helper, struct cgroup *cgrp, const char *path)
 	__u32 cpu;
 
 	cpu = bpf_get_smp_processor_id();
-	rstat = (struct cgroup_rstat_cpu *)bpf_per_cpu_ptr(cgrp->rstat.rstat_cpu, cpu);
+	rstat = (struct cgroup_rstat_cpu *)bpf_per_cpu_ptr(
+			cgrp->self.rstat.rstat_cpu, cpu);
 	if (rstat) {
 		/* READ_ONCE */
 		*(volatile int *)rstat;
diff --git a/tools/testing/selftests/bpf/progs/cgroup_hierarchical_stats.c b/tools/testing/selftests/bpf/progs/cgroup_hierarchical_stats.c
index c74362854948..10c803c8dc70 100644
--- a/tools/testing/selftests/bpf/progs/cgroup_hierarchical_stats.c
+++ b/tools/testing/selftests/bpf/progs/cgroup_hierarchical_stats.c
@@ -37,8 +37,8 @@ struct {
 	__type(value, struct attach_counter);
 } attach_counters SEC(".maps");
 
-extern void cgroup_rstat_updated(struct cgroup *cgrp, int cpu) __ksym;
-extern void cgroup_rstat_flush(struct cgroup *cgrp) __ksym;
+extern void cgroup_rstat_updated(struct cgroup_subsys_state *css, int cpu) __ksym;
+extern void cgroup_rstat_flush(struct cgroup_subsys_state *css) __ksym;
 
 static uint64_t cgroup_id(struct cgroup *cgrp)
 {
@@ -75,7 +75,7 @@ int BPF_PROG(counter, struct cgroup *dst_cgrp, struct task_struct *leader,
 	else if (create_percpu_attach_counter(cg_id, 1))
 		return 0;
 
-	cgroup_rstat_updated(dst_cgrp, bpf_get_smp_processor_id());
+	cgroup_rstat_updated(&dst_cgrp->self, bpf_get_smp_processor_id());
 	return 0;
 }
 
@@ -141,7 +141,7 @@ int BPF_PROG(dumper, struct bpf_iter_meta *meta, struct cgroup *cgrp)
 		return 1;
 
 	/* Flush the stats to make sure we get the most updated numbers */
-	cgroup_rstat_flush(cgrp);
+	cgroup_rstat_flush(&cgrp->self);
 
 	total_counter = bpf_map_lookup_elem(&attach_counters, &cg_id);
 	if (!total_counter) {
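
Illustrative note, not part of the patch: the ownership change described in the commit message can be modelled in user space. The sketch below is a simplification under stated assumptions. `struct rstat_node`, `struct css`, `rstat_updated()`, `rstat_flush()` and `sketch_demo()` are hypothetical stand-ins, there is no per-cpu state or locking, and the flush only folds one node's pending count into itself and its ancestors, whereas the kernel flush walks the per-cpu updated subtree. What it does mirror is the patch's `rstat_parent()`: the rstat node is embedded in the css, `container_of()` recovers the css, and the parent is found through `css->parent`, so each subsystem gets its own tree.

```c
#include <assert.h>
#include <stddef.h>

/* User-space model only; names echo the patch, behavior is simplified. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct rstat_node {
	long pending;	/* updates recorded but not yet flushed */
	long total;	/* flushed total, including descendants */
};

struct css {
	struct css *parent;		/* parent css of the same subsystem, or NULL */
	struct rstat_node rstat;	/* one rstat instance per css */
};

/* Like rstat_parent() in the patch: recover the css, then follow css->parent. */
static struct rstat_node *rstat_parent(struct rstat_node *rstat)
{
	struct css *css = container_of(rstat, struct css, rstat);

	return css->parent ? &css->parent->rstat : NULL;
}

/* Record an update against one subsystem's node. */
static void rstat_updated(struct css *css, long delta)
{
	css->rstat.pending += delta;
}

/* Simplified flush: fold this node's pending count into it and its ancestors. */
static void rstat_flush(struct css *css)
{
	long delta = css->rstat.pending;
	struct rstat_node *pos;

	css->rstat.pending = 0;
	for (pos = &css->rstat; pos; pos = rstat_parent(pos))
		pos->total += delta;
}

/* Two independent trees ("memory" and "cpu") for the same parent/child pair. */
static void sketch_demo(void)
{
	struct css mem_root = { 0 }, cpu_root = { 0 };
	struct css mem_child = { .parent = &mem_root };
	struct css cpu_child = { .parent = &cpu_root };

	rstat_updated(&mem_child, 5);
	rstat_updated(&cpu_child, 7);
	rstat_flush(&mem_child);

	assert(mem_root.rstat.total == 5);	/* memory tree was flushed */
	assert(cpu_child.rstat.pending == 7);	/* cpu tree was left alone */
	assert(cpu_root.rstat.total == 0);
}
```

Calling `sketch_demo()` from a `main()` exercises the model: flushing the "memory" css leaves the "cpu" css's pending update untouched, which is the isolation the commit message describes.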