From patchwork Wed Mar 19 22:16:31 2025
From: JP Kobryn <inwardvessel@gmail.com>
To: tj@kernel.org, shakeel.butt@linux.dev, yosryahmed@google.com,
 mkoutny@suse.com, hannes@cmpxchg.org, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 1/4] cgroup: use separate rstat api for bpf programs
Date: Wed, 19 Mar 2025 15:16:31 -0700
Message-ID: <20250319221634.71128-2-inwardvessel@gmail.com>
In-Reply-To: <20250319221634.71128-1-inwardvessel@gmail.com>
References: <20250319221634.71128-1-inwardvessel@gmail.com>
The rstat updated/flush API functions are exported as kfuncs so that bpf
programs can make the same calls that in-kernel code can. Split these API
functions into separate in-kernel and bpf versions, keeping the function
signatures unchanged. The kfuncs are named with the prefix "bpf_".

This non-functional change allows future commits to modify the signature
of the in-kernel API without impacting bpf call sites. The kfunc
implementations serve as adapters to the in-kernel API.

Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
---
 include/linux/cgroup.h                      |  3 +++
 kernel/cgroup/rstat.c                       | 19 ++++++++++++++-----
 .../bpf/progs/cgroup_hierarchical_stats.c   |  8 ++++----
 3 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index f8ef47f8a634..13fd82a4336d 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -692,6 +692,9 @@ void cgroup_rstat_flush(struct cgroup *cgrp);
 void cgroup_rstat_flush_hold(struct cgroup *cgrp);
 void cgroup_rstat_flush_release(struct cgroup *cgrp);
 
+void bpf_cgroup_rstat_updated(struct cgroup *cgrp, int cpu);
+void bpf_cgroup_rstat_flush(struct cgroup *cgrp);
+
 /*
  * Basic resource stats.
  */
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index aac91466279f..0d66cfc53061 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -82,7 +82,7 @@ void _cgroup_rstat_cpu_unlock(raw_spinlock_t *cpu_lock, int cpu,
  * rstat_cpu->updated_children list. See the comment on top of
  * cgroup_rstat_cpu definition for details.
  */
-__bpf_kfunc void cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
+void cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
 {
 	raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
 	unsigned long flags;
@@ -129,6 +129,11 @@ __bpf_kfunc void cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
 	_cgroup_rstat_cpu_unlock(cpu_lock, cpu, cgrp, flags, true);
 }
 
+__bpf_kfunc void bpf_cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
+{
+	cgroup_rstat_updated(cgrp, cpu);
+}
+
 /**
  * cgroup_rstat_push_children - push children cgroups into the given list
  * @head: current head of the list (= subtree root)
@@ -346,7 +351,7 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp)
  *
  * This function may block.
  */
-__bpf_kfunc void cgroup_rstat_flush(struct cgroup *cgrp)
+void cgroup_rstat_flush(struct cgroup *cgrp)
 {
 	might_sleep();
 
@@ -355,6 +360,11 @@ __bpf_kfunc void cgroup_rstat_flush(struct cgroup *cgrp)
 	__cgroup_rstat_unlock(cgrp, -1);
 }
 
+__bpf_kfunc void bpf_cgroup_rstat_flush(struct cgroup *cgrp)
+{
+	cgroup_rstat_flush(cgrp);
+}
+
 /**
  * cgroup_rstat_flush_hold - flush stats in @cgrp's subtree and hold
  * @cgrp: target cgroup
@@ -644,10 +654,9 @@ void cgroup_base_stat_cputime_show(struct seq_file *seq)
 		cgroup_force_idle_show(seq, &cgrp->bstat);
 }
 
-/* Add bpf kfuncs for cgroup_rstat_updated() and cgroup_rstat_flush() */
 BTF_KFUNCS_START(bpf_rstat_kfunc_ids)
-BTF_ID_FLAGS(func, cgroup_rstat_updated)
-BTF_ID_FLAGS(func, cgroup_rstat_flush, KF_SLEEPABLE)
+BTF_ID_FLAGS(func, bpf_cgroup_rstat_updated)
+BTF_ID_FLAGS(func, bpf_cgroup_rstat_flush, KF_SLEEPABLE)
BTF_KFUNCS_END(bpf_rstat_kfunc_ids)
 
 static const struct btf_kfunc_id_set bpf_rstat_kfunc_set = {
diff --git a/tools/testing/selftests/bpf/progs/cgroup_hierarchical_stats.c b/tools/testing/selftests/bpf/progs/cgroup_hierarchical_stats.c
index c74362854948..24450dd4d3f3 100644
--- a/tools/testing/selftests/bpf/progs/cgroup_hierarchical_stats.c
+++ b/tools/testing/selftests/bpf/progs/cgroup_hierarchical_stats.c
@@ -37,8 +37,8 @@ struct {
 	__type(value, struct attach_counter);
 } attach_counters SEC(".maps");
 
-extern void cgroup_rstat_updated(struct cgroup *cgrp, int cpu) __ksym;
-extern void cgroup_rstat_flush(struct cgroup *cgrp) __ksym;
+extern void bpf_cgroup_rstat_updated(struct cgroup *cgrp, int cpu) __ksym;
+extern void bpf_cgroup_rstat_flush(struct cgroup *cgrp) __ksym;
 
 static uint64_t cgroup_id(struct cgroup *cgrp)
 {
@@ -75,7 +75,7 @@ int BPF_PROG(counter, struct cgroup *dst_cgrp, struct task_struct *leader,
 	else if (create_percpu_attach_counter(cg_id, 1))
 		return 0;
 
-	cgroup_rstat_updated(dst_cgrp, bpf_get_smp_processor_id());
+	bpf_cgroup_rstat_updated(dst_cgrp, bpf_get_smp_processor_id());
 	return 0;
 }
 
@@ -141,7 +141,7 @@ int BPF_PROG(dumper, struct bpf_iter_meta *meta, struct cgroup *cgrp)
 		return 1;
 
 	/* Flush the stats to make sure we get the most updated numbers */
-	cgroup_rstat_flush(cgrp);
+	bpf_cgroup_rstat_flush(cgrp);
 
 	total_counter = bpf_map_lookup_elem(&attach_counters, &cg_id);
 	if (!total_counter) {
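For orientation, a minimal sketch of the bpf side after this change; the
program names and attach points below are illustrative (modeled on the
selftest above), not part of the patch:

// SPDX-License-Identifier: GPL-2.0
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

/* after this patch, the bpf_-prefixed kfuncs are the rstat symbols
 * registered with BTF; the unprefixed in-kernel functions are no longer
 * callable from bpf programs
 */
extern void bpf_cgroup_rstat_updated(struct cgroup *cgrp, int cpu) __ksym;
extern void bpf_cgroup_rstat_flush(struct cgroup *cgrp) __ksym;

SEC("fentry/cgroup_attach_task")
int BPF_PROG(on_attach, struct cgroup *dst_cgrp)
{
	/* mark @dst_cgrp as having pending per-cpu updates */
	bpf_cgroup_rstat_updated(dst_cgrp, bpf_get_smp_processor_id());
	return 0;
}

SEC("iter.s/cgroup")
int BPF_PROG(dump_stats, struct bpf_iter_meta *meta, struct cgroup *cgrp)
{
	if (!cgrp)
		return 1;
	/* sleepable context, so the KF_SLEEPABLE flush kfunc is usable */
	bpf_cgroup_rstat_flush(cgrp);
	return 0;
}

char _license[] SEC("license") = "GPL";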
From patchwork Wed Mar 19 22:16:32 2025
From: JP Kobryn <inwardvessel@gmail.com>
To: tj@kernel.org, shakeel.butt@linux.dev, yosryahmed@google.com,
 mkoutny@suse.com, hannes@cmpxchg.org, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 2/4] cgroup: use separate rstat trees for each subsystem
Date: Wed, 19 Mar 2025 15:16:32 -0700
Message-ID: <20250319221634.71128-3-inwardvessel@gmail.com>
In-Reply-To: <20250319221634.71128-1-inwardvessel@gmail.com>
References: <20250319221634.71128-1-inwardvessel@gmail.com>
Different subsystems may call cgroup_rstat_updated() within the same
cgroup, resulting in a tree of pending updates from multiple subsystems.
When one of these subsystems is flushed via cgroup_rstat_flush(), all
other subsystems with pending updates on the tree will also be flushed.

Change the paradigm of having a single rstat tree for all subsystems to
having separate trees for each subsystem. This separation allows
subsystems to perform flushes without the side effects of other
subsystems. As an example, flushing the cpu stats will no longer cause
the memory stats to be flushed and vice versa.

In order to achieve subsystem-specific trees, change the tree node type
from a cgroup pointer to a cgroup_subsys_state pointer. Then remove those
pointers from the cgroup and instead place them on the css. Finally,
change the updated/flush APIs to accept a reference to a css instead of a
cgroup. This associates a specific subsystem with an update or flush, so
a separate rstat tree now exists for each unique subsystem.

Since updating/flushing is now done at the subsystem level, there is no
longer a need to keep track of updated css nodes at the cgroup level; the
list management of these nodes within the cgroup (rstat_css_list and
related) has been removed accordingly.

There was also padding in the cgroup to keep the read-mostly
rstat_css_list on a different cacheline from rstat_flush_next and the
base stats. This padding has also been removed.

Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
---
 block/blk-cgroup.c                            |   4 +-
 include/linux/cgroup-defs.h                   |  41 ++--
 include/linux/cgroup.h                        |  13 +-
 kernel/cgroup/cgroup-internal.h               |   4 +-
 kernel/cgroup/cgroup.c                        |  63 +++---
 kernel/cgroup/rstat.c                         | 212 +++++++++---------
 mm/memcontrol.c                               |   4 +-
 .../selftests/bpf/progs/btf_type_tag_percpu.c |   5 +-
 8 files changed, 177 insertions(+), 169 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 9ed93d91d754..cd9521f4f607 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1201,7 +1201,7 @@ static int blkcg_print_stat(struct seq_file *sf, void *v)
 	if (!seq_css(sf)->parent)
 		blkcg_fill_root_iostats();
 	else
-		cgroup_rstat_flush(blkcg->css.cgroup);
+		css_rstat_flush(&blkcg->css);
 
 	rcu_read_lock();
 	hlist_for_each_entry_rcu(blkg, &blkcg->blkg_list, blkcg_node) {
@@ -2186,7 +2186,7 @@ void blk_cgroup_bio_start(struct bio *bio)
 	}
 	u64_stats_update_end_irqrestore(&bis->sync, flags);
 
-	cgroup_rstat_updated(blkcg->css.cgroup, cpu);
+	css_rstat_updated(&blkcg->css, cpu);
 	put_cpu();
 }
 
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 17960a1e858d..031f55a9ac49 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -169,6 +169,9 @@ struct cgroup_subsys_state {
 	/* reference count - access via css_[try]get() and css_put() */
 	struct percpu_ref refcnt;
 
+	/* per-cpu recursive resource statistics */
+	struct css_rstat_cpu __percpu *rstat_cpu;
+
 	/*
 	 * siblings list anchored at the parent's ->children
 	 *
@@ -177,9 +180,6 @@ struct cgroup_subsys_state {
 	struct list_head sibling;
 	struct list_head children;
 
-	/* flush target list anchored at cgrp->rstat_css_list */
-	struct list_head rstat_css_node;
-
 	/*
 	 * PI: Subsys-unique ID. 0 is unused and root is always 1. The
 	 * matching css can be looked up using css_from_id().
@@ -219,6 +219,13 @@ struct cgroup_subsys_state {
 	 * Protected by cgroup_mutex.
 	 */
 	int nr_descendants;
+
+	/*
+	 * A singly-linked list of css structures to be rstat flushed.
+	 * This is a scratch field to be used exclusively by
+	 * cgroup_rstat_flush_locked() and protected by cgroup_rstat_lock.
+	 */
+	struct cgroup_subsys_state *rstat_flush_next;
 };
 
 /*
@@ -329,10 +336,10 @@ struct cgroup_base_stat {
 
 /*
  * rstat - cgroup scalable recursive statistics. Accounting is done
- * per-cpu in cgroup_rstat_cpu which is then lazily propagated up the
+ * per-cpu in css_rstat_cpu which is then lazily propagated up the
  * hierarchy on reads.
  *
- * When a stat gets updated, the cgroup_rstat_cpu and its ancestors are
+ * When a stat gets updated, the css_rstat_cpu and its ancestors are
 * linked into the updated tree. On the following read, propagation only
 * considers and consumes the updated tree. This makes reading O(the
 * number of descendants which have been active since last read) instead of
@@ -347,7 +354,7 @@ struct cgroup_base_stat {
 * updated_children and updated_next - and the fields which track basic
 * resource statistics on top of it - bsync, bstat and last_bstat.
 */
-struct cgroup_rstat_cpu {
+struct css_rstat_cpu {
 	/*
 	 * ->bsync protects ->bstat. These are the only fields which get
 	 * updated in the hot path.
@@ -386,8 +393,8 @@ struct cgroup_rstat_cpu {
 	 *
 	 * Protected by per-cpu cgroup_rstat_cpu_lock.
 	 */
-	struct cgroup *updated_children;	/* terminated by self cgroup */
-	struct cgroup *updated_next;		/* NULL iff not on the list */
+	struct cgroup_subsys_state *updated_children;	/* terminated by self */
+	struct cgroup_subsys_state *updated_next;	/* NULL if not on list */
 };
 
 struct cgroup_freezer_state {
@@ -516,24 +523,6 @@ struct cgroup {
 	struct cgroup *dom_cgrp;
 	struct cgroup *old_dom_cgrp;	/* used while enabling threaded */
 
-	/* per-cpu recursive resource statistics */
-	struct cgroup_rstat_cpu __percpu *rstat_cpu;
-	struct list_head rstat_css_list;
-
-	/*
-	 * Add padding to separate the read mostly rstat_cpu and
-	 * rstat_css_list into a different cacheline from the following
-	 * rstat_flush_next and *bstat fields which can have frequent updates.
-	 */
-	CACHELINE_PADDING(_pad_);
-
-	/*
-	 * A singly-linked list of cgroup structures to be rstat flushed.
-	 * This is a scratch field to be used exclusively by
-	 * cgroup_rstat_flush_locked() and protected by cgroup_rstat_lock.
-	 */
-	struct cgroup *rstat_flush_next;
-
 	/* cgroup basic resource statistics */
 	struct cgroup_base_stat last_bstat;
 	struct cgroup_base_stat bstat;
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 13fd82a4336d..4e71ae9858d3 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -346,6 +346,11 @@ static inline bool css_is_dying(struct cgroup_subsys_state *css)
 	return !(css->flags & CSS_NO_REF) && percpu_ref_is_dying(&css->refcnt);
 }
 
+static inline bool css_is_cgroup(struct cgroup_subsys_state *css)
+{
+	return css->ss == NULL;
+}
+
 static inline void cgroup_get(struct cgroup *cgrp)
 {
 	css_get(&cgrp->self);
@@ -687,10 +692,10 @@ static inline void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen)
 
 /*
  * cgroup scalable recursive statistics.
  */
-void cgroup_rstat_updated(struct cgroup *cgrp, int cpu);
-void cgroup_rstat_flush(struct cgroup *cgrp);
-void cgroup_rstat_flush_hold(struct cgroup *cgrp);
-void cgroup_rstat_flush_release(struct cgroup *cgrp);
+void css_rstat_updated(struct cgroup_subsys_state *css, int cpu);
+void css_rstat_flush(struct cgroup_subsys_state *css);
+void css_rstat_flush_hold(struct cgroup_subsys_state *css);
+void css_rstat_flush_release(struct cgroup_subsys_state *css);
 
 void bpf_cgroup_rstat_updated(struct cgroup *cgrp, int cpu);
 void bpf_cgroup_rstat_flush(struct cgroup *cgrp);
diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
index c964dd7ff967..d4b75fba9a54 100644
--- a/kernel/cgroup/cgroup-internal.h
+++ b/kernel/cgroup/cgroup-internal.h
@@ -269,8 +269,8 @@ int cgroup_task_count(const struct cgroup *cgrp);
 /*
  * rstat.c
  */
-int cgroup_rstat_init(struct cgroup *cgrp);
-void cgroup_rstat_exit(struct cgroup *cgrp);
+int css_rstat_init(struct cgroup_subsys_state *css);
+void css_rstat_exit(struct cgroup_subsys_state *css);
 void cgroup_rstat_boot(void);
 void cgroup_base_stat_cputime_show(struct seq_file *seq);
 
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index afc665b7b1fe..1e21065dec0e 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -161,10 +161,12 @@ static struct static_key_true *cgroup_subsys_on_dfl_key[] = {
 };
 #undef SUBSYS
 
-static DEFINE_PER_CPU(struct cgroup_rstat_cpu, cgrp_dfl_root_rstat_cpu);
+static DEFINE_PER_CPU(struct css_rstat_cpu, root_self_rstat_cpu);
 
 /* the default hierarchy */
-struct cgroup_root cgrp_dfl_root = { .cgrp.rstat_cpu = &cgrp_dfl_root_rstat_cpu };
+struct cgroup_root cgrp_dfl_root = {
+	.cgrp.self.rstat_cpu = &root_self_rstat_cpu
+};
 EXPORT_SYMBOL_GPL(cgrp_dfl_root);
 
 /*
@@ -1358,7 +1360,7 @@ static void cgroup_destroy_root(struct cgroup_root *root)
 
 	cgroup_unlock();
 
-	cgroup_rstat_exit(cgrp);
+	css_rstat_exit(&cgrp->self);
 	kernfs_destroy_root(root->kf_root);
 	cgroup_free_root(root);
 }
@@ -1863,13 +1865,6 @@ int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask)
 		}
 		spin_unlock_irq(&css_set_lock);
 
-		if (ss->css_rstat_flush) {
-			list_del_rcu(&css->rstat_css_node);
-			synchronize_rcu();
-			list_add_rcu(&css->rstat_css_node,
-				     &dcgrp->rstat_css_list);
-		}
-
 		/* default hierarchy doesn't enable controllers by default */
 		dst_root->subsys_mask |= 1 << ssid;
 		if (dst_root == &cgrp_dfl_root) {
@@ -2052,7 +2047,6 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp)
 	cgrp->dom_cgrp = cgrp;
 	cgrp->max_descendants = INT_MAX;
 	cgrp->max_depth = INT_MAX;
-	INIT_LIST_HEAD(&cgrp->rstat_css_list);
 	prev_cputime_init(&cgrp->prev_cputime);
 
 	for_each_subsys(ss, ssid)
@@ -2132,7 +2126,7 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
 	if (ret)
 		goto destroy_root;
 
-	ret = cgroup_rstat_init(root_cgrp);
+	ret = css_rstat_init(&root_cgrp->self);
 	if (ret)
 		goto destroy_root;
 
@@ -2174,7 +2168,7 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
 	goto out;
 
 exit_stats:
-	cgroup_rstat_exit(root_cgrp);
+	css_rstat_exit(&root_cgrp->self);
 destroy_root:
 	kernfs_destroy_root(root->kf_root);
 	root->kf_root = NULL;
@@ -5407,6 +5401,9 @@ static void css_free_rwork_fn(struct work_struct *work)
 		struct cgroup_subsys_state *parent = css->parent;
 		int id = css->id;
 
+		if (ss->css_rstat_flush)
+			css_rstat_exit(css);
+
 		ss->css_free(css);
 		cgroup_idr_remove(&ss->css_idr, id);
 		cgroup_put(cgrp);
@@ -5431,7 +5428,7 @@ static void css_free_rwork_fn(struct work_struct *work)
 			cgroup_put(cgroup_parent(cgrp));
 			kernfs_put(cgrp->kn);
 			psi_cgroup_free(cgrp);
-			cgroup_rstat_exit(cgrp);
+			css_rstat_exit(&cgrp->self);
 			kfree(cgrp);
 		} else {
 			/*
@@ -5459,11 +5456,8 @@ static void css_release_work_fn(struct work_struct *work)
 	if (ss) {
 		struct cgroup *parent_cgrp;
 
-		/* css release path */
-		if (!list_empty(&css->rstat_css_node)) {
-			cgroup_rstat_flush(cgrp);
-			list_del_rcu(&css->rstat_css_node);
-		}
+		if (ss->css_rstat_flush)
+			css_rstat_flush(css);
 
 		cgroup_idr_replace(&ss->css_idr, NULL, css->id);
 		if (ss->css_released)
@@ -5489,7 +5483,7 @@ static void css_release_work_fn(struct work_struct *work)
 		/* cgroup release path */
 		TRACE_CGROUP_PATH(release, cgrp);
 
-		cgroup_rstat_flush(cgrp);
+		css_rstat_flush(&cgrp->self);
 
 		spin_lock_irq(&css_set_lock);
 		for (tcgrp = cgroup_parent(cgrp); tcgrp;
@@ -5537,7 +5531,6 @@ static void init_and_link_css(struct cgroup_subsys_state *css,
 	css->id = -1;
 	INIT_LIST_HEAD(&css->sibling);
 	INIT_LIST_HEAD(&css->children);
-	INIT_LIST_HEAD(&css->rstat_css_node);
 	css->serial_nr = css_serial_nr_next++;
 	atomic_set(&css->online_cnt, 0);
 
@@ -5546,9 +5539,6 @@ static void init_and_link_css(struct cgroup_subsys_state *css,
 		css_get(css->parent);
 	}
 
-	if (ss->css_rstat_flush)
-		list_add_rcu(&css->rstat_css_node, &cgrp->rstat_css_list);
-
 	BUG_ON(cgroup_css(cgrp, ss));
 }
 
@@ -5641,6 +5631,12 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
 		goto err_free_css;
 	css->id = err;
 
+	if (ss->css_rstat_flush) {
+		err = css_rstat_init(css);
+		if (err)
+			goto err_free_css;
+	}
+
 	/* @css is ready to be brought online now, make it visible */
 	list_add_tail_rcu(&css->sibling, &parent_css->children);
 	cgroup_idr_replace(&ss->css_idr, css, css->id);
@@ -5654,7 +5650,6 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
 err_list_del:
 	list_del_rcu(&css->sibling);
 err_free_css:
-	list_del_rcu(&css->rstat_css_node);
 	INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn);
 	queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork);
 	return ERR_PTR(err);
@@ -5682,7 +5677,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name,
 	if (ret)
 		goto out_free_cgrp;
 
-	ret = cgroup_rstat_init(cgrp);
+	ret = css_rstat_init(&cgrp->self);
 	if (ret)
 		goto out_cancel_ref;
 
@@ -5775,7 +5770,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name,
 out_kernfs_remove:
 	kernfs_remove(cgrp->kn);
 out_stat_exit:
-	cgroup_rstat_exit(cgrp);
+	css_rstat_exit(&cgrp->self);
 out_cancel_ref:
 	percpu_ref_exit(&cgrp->self.refcnt);
 out_free_cgrp:
@@ -6082,11 +6077,16 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
 	css->flags |= CSS_NO_REF;
 
 	if (early) {
-		/* allocation can't be done safely during early init */
+		/* allocation can't be done safely during early init.
+		 * defer idr and rstat allocations until cgroup_init().
+		 */
 		css->id = 1;
 	} else {
 		css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2, GFP_KERNEL);
 		BUG_ON(css->id < 0);
+
+		if (ss->css_rstat_flush)
+			BUG_ON(css_rstat_init(css));
 	}
 
 	/* Update the init_css_set to contain a subsys
@@ -6185,9 +6185,16 @@ int __init cgroup_init(void)
 			struct cgroup_subsys_state *css =
 				init_css_set.subsys[ss->id];
 
+			/* it is now safe to perform allocations.
+			 * finish setting up subsystems that previously
+			 * deferred idr and rstat allocations.
+			 */
 			css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2,
 						   GFP_KERNEL);
 			BUG_ON(css->id < 0);
+
+			if (ss->css_rstat_flush)
+				BUG_ON(css_rstat_init(css));
 		} else {
 			cgroup_init_subsys(ss, false);
 		}
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 0d66cfc53061..a28c00b11736 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -14,9 +14,10 @@ static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_cpu_lock);
 
 static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu);
 
-static struct cgroup_rstat_cpu *cgroup_rstat_cpu(struct cgroup *cgrp, int cpu)
+static struct css_rstat_cpu *css_rstat_cpu(
+		struct cgroup_subsys_state *css, int cpu)
 {
-	return per_cpu_ptr(cgrp->rstat_cpu, cpu);
+	return per_cpu_ptr(css->rstat_cpu, cpu);
 }
 
 /*
@@ -74,16 +75,17 @@ void _cgroup_rstat_cpu_unlock(raw_spinlock_t *cpu_lock, int cpu,
 }
 
 /**
- * cgroup_rstat_updated - keep track of updated rstat_cpu
- * @cgrp: target cgroup
+ * css_rstat_updated - keep track of updated rstat_cpu
+ * @css: target cgroup subsystem state
  * @cpu: cpu on which rstat_cpu was updated
  *
- * @cgrp's rstat_cpu on @cpu was updated. Put it on the parent's matching
+ * @css's rstat_cpu on @cpu was updated. Put it on the parent's matching
  * rstat_cpu->updated_children list. See the comment on top of
- * cgroup_rstat_cpu definition for details.
+ * css_rstat_cpu definition for details.
  */
-void cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
+void css_rstat_updated(struct cgroup_subsys_state *css, int cpu)
 {
+	struct cgroup *cgrp = css->cgroup;
 	raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
 	unsigned long flags;
 
@@ -92,19 +94,19 @@ void cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
 	 * temporary inaccuracies, which is fine.
 	 *
 	 * Because @parent's updated_children is terminated with @parent
-	 * instead of NULL, we can tell whether @cgrp is on the list by
+	 * instead of NULL, we can tell whether @css is on the list by
 	 * testing the next pointer for NULL.
 	 */
-	if (data_race(cgroup_rstat_cpu(cgrp, cpu)->updated_next))
+	if (data_race(css_rstat_cpu(css, cpu)->updated_next))
 		return;
 
 	flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, cgrp, true);
 
-	/* put @cgrp and all ancestors on the corresponding updated lists */
+	/* put @css and all ancestors on the corresponding updated lists */
 	while (true) {
-		struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
-		struct cgroup *parent = cgroup_parent(cgrp);
-		struct cgroup_rstat_cpu *prstatc;
+		struct css_rstat_cpu *rstatc = css_rstat_cpu(css, cpu);
+		struct cgroup_subsys_state *parent = css->parent;
+		struct css_rstat_cpu *prstatc;
 
 		/*
 		 * Both additions and removals are bottom-up.  If a cgroup
@@ -115,15 +117,15 @@ void cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
 
 		/* Root has no parent to link it to, but mark it busy */
 		if (!parent) {
-			rstatc->updated_next = cgrp;
+			rstatc->updated_next = css;
 			break;
 		}
 
-		prstatc = cgroup_rstat_cpu(parent, cpu);
+		prstatc = css_rstat_cpu(parent, cpu);
 		rstatc->updated_next = prstatc->updated_children;
-		prstatc->updated_children = cgrp;
+		prstatc->updated_children = css;
 
-		cgrp = parent;
+		css = parent;
 	}
 
 	_cgroup_rstat_cpu_unlock(cpu_lock, cpu, cgrp, flags, true);
@@ -131,7 +133,7 @@ void cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
 
 __bpf_kfunc void bpf_cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
 {
-	cgroup_rstat_updated(cgrp, cpu);
+	css_rstat_updated(&cgrp->self, cpu);
 }
 
 /**
@@ -141,18 +143,19 @@ __bpf_kfunc void bpf_cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
  * @cpu: target cpu
  * Return: A new singly linked list of cgroups to be flush
  *
- * Iteratively traverse down the cgroup_rstat_cpu updated tree level by
+ * Iteratively traverse down the css_rstat_cpu updated tree level by
  * level and push all the parents first before their next level children
 * into a singly linked list built from the tail backward like "pushing"
 * cgroups into a stack. The root is pushed by the caller.
 */
-static struct cgroup *cgroup_rstat_push_children(struct cgroup *head,
-						 struct cgroup *child, int cpu)
+static struct cgroup_subsys_state *cgroup_rstat_push_children(
+		struct cgroup_subsys_state *head,
+		struct cgroup_subsys_state *child, int cpu)
 {
-	struct cgroup *chead = child;	/* Head of child cgroup level */
-	struct cgroup *ghead = NULL;	/* Head of grandchild cgroup level */
-	struct cgroup *parent, *grandchild;
-	struct cgroup_rstat_cpu *crstatc;
+	struct cgroup_subsys_state *chead = child;	/* Head of child css level */
+	struct cgroup_subsys_state *ghead = NULL;	/* Head of grandchild css level */
+	struct cgroup_subsys_state *parent, *grandchild;
+	struct css_rstat_cpu *crstatc;
 
 	child->rstat_flush_next = NULL;
 
@@ -160,13 +163,13 @@ static struct cgroup *cgroup_rstat_push_children(struct cgroup *head,
 	while (chead) {
 		child = chead;
 		chead = child->rstat_flush_next;
-		parent = cgroup_parent(child);
+		parent = child->parent;
 
 		/* updated_next is parent cgroup terminated */
 		while (child != parent) {
 			child->rstat_flush_next = head;
 			head = child;
-			crstatc = cgroup_rstat_cpu(child, cpu);
+			crstatc = css_rstat_cpu(child, cpu);
 			grandchild = crstatc->updated_children;
 			if (grandchild != child) {
 				/* Push the grand child to the next level */
@@ -188,31 +191,33 @@ static struct cgroup *cgroup_rstat_push_children(struct cgroup *head,
 }
 
 /**
- * cgroup_rstat_updated_list - return a list of updated cgroups to be flushed
- * @root: root of the cgroup subtree to traverse
+ * css_rstat_updated_list - return a list of updated cgroups to be flushed
+ * @root: root of the css subtree to traverse
  * @cpu: target cpu
  * Return: A singly linked list of cgroups to be flushed
  *
  * Walks the updated rstat_cpu tree on @cpu from @root. During traversal,
- * each returned cgroup is unlinked from the updated tree.
+ * each returned css is unlinked from the updated tree.
 *
 * The only ordering guarantee is that, for a parent and a child pair
 * covered by a given traversal, the child is before its parent in
 * the list.
 *
 * Note that updated_children is self terminated and points to a list of
- * child cgroups if not empty. Whereas updated_next is like a sibling link
- * within the children list and terminated by the parent cgroup. An exception
+ * child css's if not empty. Whereas updated_next is like a sibling link
+ * within the children list and terminated by the parent css. An exception
 * here is the cgroup root whose updated_next can be self terminated.
 */
-static struct cgroup *cgroup_rstat_updated_list(struct cgroup *root, int cpu)
+static struct cgroup_subsys_state *css_rstat_updated_list(
+		struct cgroup_subsys_state *root, int cpu)
 {
+	struct cgroup *cgrp = root->cgroup;
 	raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
-	struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(root, cpu);
-	struct cgroup *head = NULL, *parent, *child;
+	struct css_rstat_cpu *rstatc = css_rstat_cpu(root, cpu);
+	struct cgroup_subsys_state *head = NULL, *parent, *child;
 	unsigned long flags;
 
-	flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, root, false);
+	flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, cgrp, false);
 
 	/* Return NULL if this subtree is not on-list */
 	if (!rstatc->updated_next)
@@ -222,17 +227,17 @@ static struct cgroup *cgroup_rstat_updated_list(struct cgroup *root, int cpu)
 	 * Unlink @root from its parent. As the updated_children list is
 	 * singly linked, we have to walk it to find the removal point.
 	 */
-	parent = cgroup_parent(root);
+	parent = root->parent;
 	if (parent) {
-		struct cgroup_rstat_cpu *prstatc;
-		struct cgroup **nextp;
+		struct css_rstat_cpu *prstatc;
+		struct cgroup_subsys_state **nextp;
 
-		prstatc = cgroup_rstat_cpu(parent, cpu);
+		prstatc = css_rstat_cpu(parent, cpu);
 		nextp = &prstatc->updated_children;
 		while (*nextp != root) {
-			struct cgroup_rstat_cpu *nrstatc;
+			struct css_rstat_cpu *nrstatc;
 
-			nrstatc = cgroup_rstat_cpu(*nextp, cpu);
+			nrstatc = css_rstat_cpu(*nextp, cpu);
 			WARN_ON_ONCE(*nextp == parent);
 			nextp = &nrstatc->updated_next;
 		}
@@ -249,14 +254,14 @@ static struct cgroup *cgroup_rstat_updated_list(struct cgroup *root, int cpu)
 	if (child != root)
 		head = cgroup_rstat_push_children(head, child, cpu);
 unlock_ret:
-	_cgroup_rstat_cpu_unlock(cpu_lock, cpu, root, flags, false);
+	_cgroup_rstat_cpu_unlock(cpu_lock, cpu, cgrp, flags, false);
 	return head;
 }
 
 /*
 * A hook for bpf stat collectors to attach to and flush their stats.
- * Together with providing bpf kfuncs for cgroup_rstat_updated() and
- * cgroup_rstat_flush(), this enables a complete workflow where bpf progs that
+ * Together with providing bpf kfuncs for css_rstat_updated() and
+ * css_rstat_flush(), this enables a complete workflow where bpf progs that
 * collect cgroup stats can integrate with rstat for efficient flushing.
 *
 * A static noinline declaration here could cause the compiler to optimize away
@@ -304,28 +309,26 @@ static inline void __cgroup_rstat_unlock(struct cgroup *cgrp, int cpu_in_loop)
 	spin_unlock_irq(&cgroup_rstat_lock);
 }
 
-/* see cgroup_rstat_flush() */
-static void cgroup_rstat_flush_locked(struct cgroup *cgrp)
+/* see css_rstat_flush() */
+static void css_rstat_flush_locked(struct cgroup_subsys_state *css)
 	__releases(&cgroup_rstat_lock) __acquires(&cgroup_rstat_lock)
 {
+	struct cgroup *cgrp = css->cgroup;
 	int cpu;
 
 	lockdep_assert_held(&cgroup_rstat_lock);
 
 	for_each_possible_cpu(cpu) {
-		struct cgroup *pos = cgroup_rstat_updated_list(cgrp, cpu);
+		struct cgroup_subsys_state *pos;
 
+		pos = css_rstat_updated_list(css, cpu);
 		for (; pos; pos = pos->rstat_flush_next) {
-			struct cgroup_subsys_state *css;
-
-			cgroup_base_stat_flush(pos, cpu);
-			bpf_rstat_flush(pos, cgroup_parent(pos), cpu);
-
-			rcu_read_lock();
-			list_for_each_entry_rcu(css, &pos->rstat_css_list,
-						rstat_css_node)
-				css->ss->css_rstat_flush(css, cpu);
-			rcu_read_unlock();
+			if (css_is_cgroup(pos)) {
+				cgroup_base_stat_flush(pos->cgroup, cpu);
+				bpf_rstat_flush(pos->cgroup,
+						cgroup_parent(pos->cgroup), cpu);
+			} else if (pos->ss->css_rstat_flush)
+				pos->ss->css_rstat_flush(pos, cpu);
 		}
 
 		/* play nice and yield if necessary */
@@ -339,98 +342,101 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp)
 }
 
 /**
- * cgroup_rstat_flush - flush stats in @cgrp's subtree
- * @cgrp: target cgroup
+ * css_rstat_flush - flush stats in @css's rstat subtree
+ * @css: target cgroup subsystem state
 *
- * Collect all per-cpu stats in @cgrp's subtree into the global counters
- * and propagate them upwards. After this function returns, all cgroups in
- * the subtree have up-to-date ->stat.
+ * Collect all per-cpu stats in @css's subtree into the global counters
+ * and propagate them upwards. After this function returns, all rstat
+ * nodes in the subtree have up-to-date ->stat.
 *
- * This also gets all cgroups in the subtree including @cgrp off the
+ * This also gets all rstat nodes in the subtree including @css off the
 * ->updated_children lists.
 *
 * This function may block.
 */
-void cgroup_rstat_flush(struct cgroup *cgrp)
+void css_rstat_flush(struct cgroup_subsys_state *css)
 {
+	struct cgroup *cgrp = css->cgroup;
+
 	might_sleep();
 
 	__cgroup_rstat_lock(cgrp, -1);
-	cgroup_rstat_flush_locked(cgrp);
+	css_rstat_flush_locked(css);
 	__cgroup_rstat_unlock(cgrp, -1);
 }
 
 __bpf_kfunc void bpf_cgroup_rstat_flush(struct cgroup *cgrp)
 {
-	cgroup_rstat_flush(cgrp);
+	css_rstat_flush(&cgrp->self);
 }
 
 /**
- * cgroup_rstat_flush_hold - flush stats in @cgrp's subtree and hold
- * @cgrp: target cgroup
+ * css_rstat_flush_hold - flush stats in @css's rstat subtree and hold
+ * @css: target subsystem state
 *
- * Flush stats in @cgrp's subtree and prevent further flushes. Must be
- * paired with cgroup_rstat_flush_release().
+ * Flush stats in @css's rstat subtree and prevent further flushes. Must be
+ * paired with css_rstat_flush_release().
 *
 * This function may block.
 */
-void cgroup_rstat_flush_hold(struct cgroup *cgrp)
-	__acquires(&cgroup_rstat_lock)
+void css_rstat_flush_hold(struct cgroup_subsys_state *css)
 {
+	struct cgroup *cgrp = css->cgroup;
+
 	might_sleep();
 	__cgroup_rstat_lock(cgrp, -1);
-	cgroup_rstat_flush_locked(cgrp);
+	css_rstat_flush_locked(css);
 }
 
 /**
- * cgroup_rstat_flush_release - release cgroup_rstat_flush_hold()
- * @cgrp: cgroup used by tracepoint
+ * css_rstat_flush_release - release css_rstat_flush_hold()
+ * @css: css that was previously used for the call to flush hold
 */
-void cgroup_rstat_flush_release(struct cgroup *cgrp)
-	__releases(&cgroup_rstat_lock)
+void css_rstat_flush_release(struct cgroup_subsys_state *css)
 {
+	struct cgroup *cgrp = css->cgroup;
+
 	__cgroup_rstat_unlock(cgrp, -1);
 }
 
-int cgroup_rstat_init(struct cgroup *cgrp)
+int css_rstat_init(struct cgroup_subsys_state *css)
 {
 	int cpu;
 
-	/* the root cgrp has rstat_cpu preallocated */
-	if (!cgrp->rstat_cpu) {
-		cgrp->rstat_cpu = alloc_percpu(struct cgroup_rstat_cpu);
-		if (!cgrp->rstat_cpu)
+	/* the root cgrp's self css has rstat_cpu preallocated */
+	if (!css->rstat_cpu) {
+		css->rstat_cpu = alloc_percpu(struct css_rstat_cpu);
+		if (!css->rstat_cpu)
 			return -ENOMEM;
 	}
 
 	/* ->updated_children list is self terminated */
 	for_each_possible_cpu(cpu) {
-		struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
+		struct css_rstat_cpu *rstatc = css_rstat_cpu(css, cpu);
 
-		rstatc->updated_children = cgrp;
+		rstatc->updated_children = css;
 		u64_stats_init(&rstatc->bsync);
 	}
 
 	return 0;
 }
 
-void cgroup_rstat_exit(struct cgroup *cgrp)
+void css_rstat_exit(struct cgroup_subsys_state *css)
 {
 	int cpu;
 
-	cgroup_rstat_flush(cgrp);
+	css_rstat_flush(css);
 
 	/* sanity check */
 	for_each_possible_cpu(cpu) {
-		struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
+		struct css_rstat_cpu *rstatc = css_rstat_cpu(css, cpu);
 
-		if (WARN_ON_ONCE(rstatc->updated_children != cgrp) ||
+		if (WARN_ON_ONCE(rstatc->updated_children != css) ||
 		    WARN_ON_ONCE(rstatc->updated_next))
 			return;
 	}
 
-	free_percpu(cgrp->rstat_cpu);
-	cgrp->rstat_cpu = NULL;
+	free_percpu(css->rstat_cpu);
+	css->rstat_cpu = NULL;
 }
 
 void __init cgroup_rstat_boot(void)
@@ -471,9 +477,9 @@ static void cgroup_base_stat_sub(struct cgroup_base_stat *dst_bstat,
 
 static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
 {
-	struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
+	struct css_rstat_cpu *rstatc = css_rstat_cpu(&cgrp->self, cpu);
 	struct cgroup *parent = cgroup_parent(cgrp);
-	struct cgroup_rstat_cpu *prstatc;
+	struct css_rstat_cpu *prstatc;
 	struct cgroup_base_stat delta;
 	unsigned seq;
 
@@ -501,35 +507,35 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
 		cgroup_base_stat_add(&cgrp->last_bstat, &delta);
 
 		delta = rstatc->subtree_bstat;
-		prstatc = cgroup_rstat_cpu(parent, cpu);
+		prstatc = css_rstat_cpu(&parent->self, cpu);
 		cgroup_base_stat_sub(&delta, &rstatc->last_subtree_bstat);
 		cgroup_base_stat_add(&prstatc->subtree_bstat, &delta);
 		cgroup_base_stat_add(&rstatc->last_subtree_bstat, &delta);
 	}
 }
 
-static struct cgroup_rstat_cpu *
+static struct css_rstat_cpu *
 cgroup_base_stat_cputime_account_begin(struct cgroup *cgrp, unsigned long *flags)
 {
-	struct cgroup_rstat_cpu *rstatc;
+	struct css_rstat_cpu *rstatc;
 
-	rstatc = get_cpu_ptr(cgrp->rstat_cpu);
+	rstatc = get_cpu_ptr(cgrp->self.rstat_cpu);
 	*flags = u64_stats_update_begin_irqsave(&rstatc->bsync);
 	return rstatc;
 }
 
 static void cgroup_base_stat_cputime_account_end(struct cgroup *cgrp,
-						 struct cgroup_rstat_cpu *rstatc,
+						 struct css_rstat_cpu *rstatc,
 						 unsigned long flags)
 {
 	u64_stats_update_end_irqrestore(&rstatc->bsync, flags);
-	cgroup_rstat_updated(cgrp, smp_processor_id());
+	css_rstat_updated(&cgrp->self, smp_processor_id());
 	put_cpu_ptr(rstatc);
 }
 
 void __cgroup_account_cputime(struct cgroup *cgrp, u64 delta_exec)
 {
-	struct cgroup_rstat_cpu *rstatc;
+	struct css_rstat_cpu *rstatc;
 	unsigned long flags;
 
 	rstatc = cgroup_base_stat_cputime_account_begin(cgrp, &flags);
@@ -540,7 +546,7 @@ void __cgroup_account_cputime(struct cgroup *cgrp, u64 delta_exec)
 void __cgroup_account_cputime_field(struct cgroup *cgrp,
 				    enum cpu_usage_stat index, u64 delta_exec)
 {
-	struct cgroup_rstat_cpu *rstatc;
+	struct css_rstat_cpu *rstatc;
 	unsigned long flags;
 
 	rstatc = cgroup_base_stat_cputime_account_begin(cgrp, &flags);
@@ -625,12 +631,12 @@ void cgroup_base_stat_cputime_show(struct seq_file *seq)
 	u64 usage, utime, stime, ntime;
 
 	if (cgroup_parent(cgrp)) {
-		cgroup_rstat_flush_hold(cgrp);
+		css_rstat_flush_hold(&cgrp->self);
 		usage = cgrp->bstat.cputime.sum_exec_runtime;
 		cputime_adjust(&cgrp->bstat.cputime, &cgrp->prev_cputime,
 			       &utime, &stime);
 		ntime = cgrp->bstat.ntime;
-		cgroup_rstat_flush_release(cgrp);
+		css_rstat_flush_release(&cgrp->self);
 	} else {
 		/* cgrp->bstat of root is not actually used, reuse it */
 		root_cgroup_cputime(&cgrp->bstat);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4de6acb9b8ec..fe86d7efe372 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -579,7 +579,7 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 	if (!val)
 		return;
 
-	cgroup_rstat_updated(memcg->css.cgroup, cpu);
+	css_rstat_updated(&memcg->css, cpu);
 	statc = this_cpu_ptr(memcg->vmstats_percpu);
 	for (; statc; statc = statc->parent) {
 		stats_updates = READ_ONCE(statc->stats_updates) + abs(val);
@@ -611,7 +611,7 @@ static void __mem_cgroup_flush_stats(struct mem_cgroup *memcg, bool force)
 	if (mem_cgroup_is_root(memcg))
 		WRITE_ONCE(flush_last_time, jiffies_64);
 
-	cgroup_rstat_flush(memcg->css.cgroup);
+	css_rstat_flush(&memcg->css);
 }
 
 /*
diff --git a/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c b/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c
index 38f78d9345de..f362f7d41b9e 100644
--- a/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c
+++ b/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c
@@ -45,7 +45,7 @@ int BPF_PROG(test_percpu2, struct bpf_testmod_btf_type_tag_2 *arg)
 SEC("tp_btf/cgroup_mkdir")
 int BPF_PROG(test_percpu_load, struct cgroup *cgrp, const char *path)
 {
-	g = (__u64)cgrp->rstat_cpu->updated_children;
+	g = (__u64)cgrp->self.rstat_cpu->updated_children;
 	return 0;
 }
 
@@ -56,7 +56,8 @@ int BPF_PROG(test_percpu_helper, struct cgroup *cgrp, const char *path)
 	__u32 cpu;
 
 	cpu = bpf_get_smp_processor_id();
-	rstat = (struct cgroup_rstat_cpu *)bpf_per_cpu_ptr(cgrp->rstat_cpu, cpu);
+	rstat = (struct cgroup_rstat_cpu *)bpf_per_cpu_ptr(
+			cgrp->self.rstat_cpu, cpu);
 	if (rstat) {
 		/* READ_ONCE */
 		*(volatile int *)rstat;
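For orientation, a sketch (not from the patch) of what a controller's hot
path looks like once trees are keyed by css; the "foo" controller, its
struct, and its fields are invented for illustration:

#include <linux/cgroup.h>
#include <linux/smp.h>

/* hypothetical controller state embedding a css, as controllers do */
struct foo_css {
	struct cgroup_subsys_state css;
	u64 events;	/* would be per-cpu in real code */
};

static void foo_account_event(struct foo_css *foo)
{
	foo->events++;
	/* dirty only foo's rstat tree on this cpu; the memory and io
	 * trees are untouched and will not be flushed on foo's behalf
	 */
	css_rstat_updated(&foo->css, smp_processor_id());
}

static u64 foo_read_events(struct foo_css *foo)
{
	/* flush only foo's tree before reporting */
	css_rstat_flush(&foo->css);
	return foo->events;
}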
From patchwork Wed Mar 19 22:16:33 2025
From: JP Kobryn <inwardvessel@gmail.com>
To: tj@kernel.org, shakeel.butt@linux.dev, yosryahmed@google.com,
 mkoutny@suse.com, hannes@cmpxchg.org, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 3/4] cgroup: use subsystem-specific rstat locks to avoid contention
Date: Wed, 19 Mar 2025 15:16:33 -0700
Message-ID: <20250319221634.71128-4-inwardvessel@gmail.com>
In-Reply-To: <20250319221634.71128-1-inwardvessel@gmail.com>
References: <20250319221634.71128-1-inwardvessel@gmail.com>

It is possible to eliminate contention between subsystems when
updating/flushing stats by using subsystem-specific locks. Let the
existing rstat locks be dedicated to the base stats and rename them to
reflect that.
Add similar locks to the cgroup_subsys struct for use with individual
subsystems. To make use of the new locks, change the existing lock
helper functions to accept a reference to a css; use css->ss to reach
the subsystem-specific locks, falling back to the static base-stat
locks when css->ss is NULL.

Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
---
 block/blk-cgroup.c              |   2 +-
 include/linux/cgroup-defs.h     |  12 ++-
 include/trace/events/cgroup.h   |  10 ++-
 kernel/cgroup/cgroup-internal.h |   2 +-
 kernel/cgroup/cgroup.c          |  10 ++-
 kernel/cgroup/rstat.c           | 145 +++++++++++++++++++++-----------
 6 files changed, 122 insertions(+), 59 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index cd9521f4f607..34d72bbdd5e5 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1022,7 +1022,7 @@ static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu)
 	/*
 	 * For covering concurrent parent blkg update from blkg_release().
 	 *
-	 * When flushing from cgroup, cgroup_rstat_lock is always held, so
+	 * When flushing from cgroup, the subsystem lock is always held, so
 	 * this lock won't cause contention most of time.
 	 */
 	raw_spin_lock_irqsave(&blkg_stat_lock, flags);
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 031f55a9ac49..0ffc8438c6d9 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -223,7 +223,10 @@ struct cgroup_subsys_state {
 	/*
 	 * A singly-linked list of css structures to be rstat flushed.
 	 * This is a scratch field to be used exclusively by
-	 * cgroup_rstat_flush_locked() and protected by cgroup_rstat_lock.
+	 * cgroup_rstat_flush_locked().
+	 *
+	 * protected by rstat_base_lock when css is cgroup::self
+	 * protected by css->ss->lock otherwise
 	 */
 	struct cgroup_subsys_state *rstat_flush_next;
 };
@@ -391,7 +394,9 @@ struct css_rstat_cpu {
 	 * to the cgroup makes it unnecessary for each per-cpu struct to
 	 * point back to the associated cgroup.
 	 *
-	 * Protected by per-cpu cgroup_rstat_cpu_lock.
+	 * Protected by per-cpu rstat_base_cpu_lock when css->ss == NULL
+	 * otherwise,
+	 * Protected by per-cpu css->ss->rstat_cpu_lock
 	 */
 	struct cgroup_subsys_state *updated_children;	/* terminated by self */
 	struct cgroup_subsys_state *updated_next;	/* NULL if not on list */
@@ -779,6 +784,9 @@ struct cgroup_subsys {
 	 * specifies the mask of subsystems that this one depends on.
 	 */
 	unsigned int depends_on;
+
+	spinlock_t lock;
+	raw_spinlock_t __percpu *percpu_lock;
 };

 extern struct percpu_rw_semaphore cgroup_threadgroup_rwsem;
diff --git a/include/trace/events/cgroup.h b/include/trace/events/cgroup.h
index af2755bda6eb..ec3a95bf4981 100644
--- a/include/trace/events/cgroup.h
+++ b/include/trace/events/cgroup.h
@@ -231,7 +231,10 @@ DECLARE_EVENT_CLASS(cgroup_rstat,
 		  __entry->cpu, __entry->contended)
 );

-/* Related to global: cgroup_rstat_lock */
+/* Related to locks:
+ * rstat_base_lock when handling cgroup::self
+ * css->ss->lock otherwise
+ */
 DEFINE_EVENT(cgroup_rstat, cgroup_rstat_lock_contended,

 	TP_PROTO(struct cgroup *cgrp, int cpu, bool contended),

@@ -253,7 +256,10 @@ DEFINE_EVENT(cgroup_rstat, cgroup_rstat_unlock,
 	TP_ARGS(cgrp, cpu, contended)
 );

-/* Related to per CPU: cgroup_rstat_cpu_lock */
+/* Related to per CPU locks:
+ * rstat_base_cpu_lock when handling cgroup::self
+ * css->ss->cpu_lock otherwise
+ */
 DEFINE_EVENT(cgroup_rstat, cgroup_rstat_cpu_lock_contended,

 	TP_PROTO(struct cgroup *cgrp, int cpu, bool contended),
diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
index d4b75fba9a54..513bfce3bc23 100644
--- a/kernel/cgroup/cgroup-internal.h
+++ b/kernel/cgroup/cgroup-internal.h
@@ -271,7 +271,7 @@ int cgroup_task_count(const struct cgroup *cgrp);
  */
 int css_rstat_init(struct cgroup_subsys_state *css);
 void css_rstat_exit(struct cgroup_subsys_state *css);
-void cgroup_rstat_boot(void);
+int ss_rstat_init(struct cgroup_subsys *ss);
 void cgroup_base_stat_cputime_show(struct seq_file *seq);

 /*
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 1e21065dec0e..3e8948805f67 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6085,8 +6085,10 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
 	css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2, GFP_KERNEL);
 	BUG_ON(css->id < 0);

-	if (ss->css_rstat_flush)
+	if (ss->css_rstat_flush) {
+		BUG_ON(ss_rstat_init(ss));
 		BUG_ON(css_rstat_init(css));
+	}
 }

 /* Update the init_css_set to contain a subsys
@@ -6163,7 +6165,7 @@ int __init cgroup_init(void)
 	BUG_ON(cgroup_init_cftypes(NULL, cgroup_psi_files));
 	BUG_ON(cgroup_init_cftypes(NULL, cgroup1_base_files));

-	cgroup_rstat_boot();
+	BUG_ON(ss_rstat_init(NULL));

 	get_user_ns(init_cgroup_ns.user_ns);

@@ -6193,8 +6195,10 @@ int __init cgroup_init(void)
 					 GFP_KERNEL);
 			BUG_ON(css->id < 0);

-			if (ss->css_rstat_flush)
+			if (ss->css_rstat_flush) {
+				BUG_ON(ss_rstat_init(ss));
 				BUG_ON(css_rstat_init(css));
+			}
 		} else {
 			cgroup_init_subsys(ss, false);
 		}
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index a28c00b11736..ffd7ac6bcefc 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -9,8 +9,8 @@
 #include <trace/events/cgroup.h>

-static DEFINE_SPINLOCK(cgroup_rstat_lock);
-static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_cpu_lock);
+static DEFINE_SPINLOCK(rstat_base_lock);
+static DEFINE_PER_CPU(raw_spinlock_t, rstat_base_cpu_lock);

 static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu);

@@ -20,8 +20,24 @@ static struct css_rstat_cpu *css_rstat_cpu(
 	return per_cpu_ptr(css->rstat_cpu, cpu);
 }

+static spinlock_t *ss_rstat_lock(struct cgroup_subsys *ss)
+{
+	if (ss)
+		return &ss->lock;
+
+	return &rstat_base_lock;
+}
+
+static raw_spinlock_t *ss_rstat_cpu_lock(struct cgroup_subsys *ss, int cpu)
+{
+	if (ss)
+		return per_cpu_ptr(ss->percpu_lock, cpu);
+
+	return per_cpu_ptr(&rstat_base_cpu_lock, cpu);
+}
+
 /*
- * Helper functions for rstat per CPU lock (cgroup_rstat_cpu_lock).
+ * Helper functions for rstat per CPU locks.
  *
  * This makes it easier to diagnose locking issues and contention in
  * production environments. The parameter @fast_path determine the
@@ -29,20 +45,23 @@ static struct css_rstat_cpu *css_rstat_cpu(
  * operations without handling high-frequency fast-path "update" events.
  */
 static __always_inline
-unsigned long _cgroup_rstat_cpu_lock(raw_spinlock_t *cpu_lock, int cpu,
-				     struct cgroup *cgrp, const bool fast_path)
+unsigned long _css_rstat_cpu_lock(struct cgroup_subsys_state *css, int cpu,
+		const bool fast_path)
 {
+	struct cgroup *cgrp = css->cgroup;
+	raw_spinlock_t *cpu_lock;
 	unsigned long flags;
 	bool contended;

 	/*
-	 * The _irqsave() is needed because cgroup_rstat_lock is
-	 * spinlock_t which is a sleeping lock on PREEMPT_RT. Acquiring
-	 * this lock with the _irq() suffix only disables interrupts on
-	 * a non-PREEMPT_RT kernel. The raw_spinlock_t below disables
-	 * interrupts on both configurations. The _irqsave() ensures
-	 * that interrupts are always disabled and later restored.
+	 * The _irqsave() is needed because the locks used for flushing are
+	 * spinlock_t which is a sleeping lock on PREEMPT_RT. Acquiring this lock
+	 * with the _irq() suffix only disables interrupts on a non-PREEMPT_RT
+	 * kernel. The raw_spinlock_t below disables interrupts on both
+	 * configurations. The _irqsave() ensures that interrupts are always
+	 * disabled and later restored.
 	 */
+	cpu_lock = ss_rstat_cpu_lock(css->ss, cpu);
 	contended = !raw_spin_trylock_irqsave(cpu_lock, flags);
 	if (contended) {
 		if (fast_path)
@@ -62,15 +81,18 @@ unsigned long _cgroup_rstat_cpu_lock(raw_spinlock_t *cpu_lock, int cpu,
 }

 static __always_inline
-void _cgroup_rstat_cpu_unlock(raw_spinlock_t *cpu_lock, int cpu,
-			      struct cgroup *cgrp, unsigned long flags,
-			      const bool fast_path)
+void _css_rstat_cpu_unlock(struct cgroup_subsys_state *css, int cpu,
+		unsigned long flags, const bool fast_path)
 {
+	struct cgroup *cgrp = css->cgroup;
+	raw_spinlock_t *cpu_lock;
+
 	if (fast_path)
 		trace_cgroup_rstat_cpu_unlock_fastpath(cgrp, cpu, false);
 	else
 		trace_cgroup_rstat_cpu_unlock(cgrp, cpu, false);

+	cpu_lock = ss_rstat_cpu_lock(css->ss, cpu);
 	raw_spin_unlock_irqrestore(cpu_lock, flags);
 }

@@ -85,8 +107,6 @@ void _cgroup_rstat_cpu_unlock(raw_spinlock_t *cpu_lock, int cpu,
  */
 void css_rstat_updated(struct cgroup_subsys_state *css, int cpu)
 {
-	struct cgroup *cgrp = css->cgroup;
-	raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
 	unsigned long flags;

 	/*
@@ -100,7 +120,7 @@ void css_rstat_updated(struct cgroup_subsys_state *css, int cpu)
 	if (data_race(css_rstat_cpu(css, cpu)->updated_next))
 		return;

-	flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, cgrp, true);
+	flags = _css_rstat_cpu_lock(css, cpu, true);

 	/* put @css and all ancestors on the corresponding updated lists */
 	while (true) {
@@ -128,7 +148,7 @@ void css_rstat_updated(struct cgroup_subsys_state *css, int cpu)
 		css = parent;
 	}

-	_cgroup_rstat_cpu_unlock(cpu_lock, cpu, cgrp, flags, true);
+	_css_rstat_cpu_unlock(css, cpu, flags, true);
 }

 __bpf_kfunc void bpf_cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
@@ -211,13 +231,11 @@ static struct cgroup_subsys_state *cgroup_rstat_push_children(
 static struct cgroup_subsys_state *css_rstat_updated_list(
 		struct cgroup_subsys_state *root, int cpu)
 {
-	struct cgroup *cgrp = root->cgroup;
-	raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
 	struct css_rstat_cpu *rstatc = css_rstat_cpu(root, cpu);
 	struct cgroup_subsys_state *head = NULL, *parent, *child;
 	unsigned long flags;

-	flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, cgrp, false);
+	flags = _css_rstat_cpu_lock(root, cpu, false);

 	/* Return NULL if this subtree is not on-list */
 	if (!rstatc->updated_next)
@@ -254,7 +272,7 @@ static struct cgroup_subsys_state *css_rstat_updated_list(
 	if (child != root)
 		head = cgroup_rstat_push_children(head, child, cpu);
 unlock_ret:
-	_cgroup_rstat_cpu_unlock(cpu_lock, cpu, cgrp, flags, false);
+	_css_rstat_cpu_unlock(root, cpu, flags, false);

 	return head;
 }
@@ -281,7 +299,7 @@ __weak noinline void bpf_rstat_flush(struct cgroup *cgrp,
 __bpf_hook_end();

 /*
- * Helper functions for locking cgroup_rstat_lock.
+ * Helper functions for locking.
  *
  * This makes it easier to diagnose locking issues and contention in
  * production environments. The parameter @cpu_in_loop indicate lock
@@ -289,35 +307,44 @@ __bpf_hook_end();
 * value -1 is used when obtaining the main lock else this is the CPU
 * number processed last.
 */
-static inline void __cgroup_rstat_lock(struct cgroup *cgrp, int cpu_in_loop)
-	__acquires(&cgroup_rstat_lock)
+static inline void __css_rstat_lock(struct cgroup_subsys_state *css,
+		int cpu_in_loop)
+	__acquires(lock)
 {
+	struct cgroup *cgrp = css->cgroup;
+	spinlock_t *lock;
 	bool contended;

-	contended = !spin_trylock_irq(&cgroup_rstat_lock);
+	lock = ss_rstat_lock(css->ss);
+	contended = !spin_trylock_irq(lock);
 	if (contended) {
 		trace_cgroup_rstat_lock_contended(cgrp, cpu_in_loop, contended);
-		spin_lock_irq(&cgroup_rstat_lock);
+		spin_lock_irq(lock);
 	}
 	trace_cgroup_rstat_locked(cgrp, cpu_in_loop, contended);
 }

-static inline void __cgroup_rstat_unlock(struct cgroup *cgrp, int cpu_in_loop)
-	__releases(&cgroup_rstat_lock)
+static inline void __css_rstat_unlock(struct cgroup_subsys_state *css,
+		int cpu_in_loop)
+	__releases(lock)
 {
+	struct cgroup *cgrp = css->cgroup;
+	spinlock_t *lock;
+
+	lock = ss_rstat_lock(css->ss);
 	trace_cgroup_rstat_unlock(cgrp, cpu_in_loop, false);
-	spin_unlock_irq(&cgroup_rstat_lock);
+	spin_unlock_irq(lock);
 }

-/* see css_rstat_flush() */
+/* see css_rstat_flush()
+ *
+ * it is required that callers have previously acquired a lock via
+ * __css_rstat_lock(css)
+ */
 static void css_rstat_flush_locked(struct cgroup_subsys_state *css)
-	__releases(&cgroup_rstat_lock) __acquires(&cgroup_rstat_lock)
 {
-	struct cgroup *cgrp = css->cgroup;
 	int cpu;

-	lockdep_assert_held(&cgroup_rstat_lock);
-
 	for_each_possible_cpu(cpu) {
 		struct cgroup_subsys_state *pos;

@@ -332,11 +359,11 @@ static void css_rstat_flush_locked(struct cgroup_subsys_state *css)
 		}

 		/* play nice and yield if necessary */
-		if (need_resched() || spin_needbreak(&cgroup_rstat_lock)) {
-			__cgroup_rstat_unlock(cgrp, cpu);
+		if (need_resched() || spin_needbreak(ss_rstat_lock(css->ss))) {
+			__css_rstat_unlock(css, cpu);
 			if (!cond_resched())
 				cpu_relax();
-			__cgroup_rstat_lock(cgrp, cpu);
+			__css_rstat_lock(css, cpu);
 		}
 	}
 }
@@ -356,13 +383,10 @@ static void css_rstat_flush_locked(struct cgroup_subsys_state *css)
 */
 void css_rstat_flush(struct cgroup_subsys_state *css)
 {
-	struct cgroup *cgrp = css->cgroup;
-
 	might_sleep();
-
-	__cgroup_rstat_lock(cgrp, -1);
+	__css_rstat_lock(css, -1);
 	css_rstat_flush_locked(css);
-	__cgroup_rstat_unlock(cgrp, -1);
+	__css_rstat_unlock(css, -1);
 }

 __bpf_kfunc void bpf_cgroup_rstat_flush(struct cgroup *cgrp)
@@ -381,10 +405,8 @@ __bpf_kfunc void bpf_cgroup_rstat_flush(struct cgroup *cgrp)
 */
 void css_rstat_flush_hold(struct cgroup_subsys_state *css)
 {
-	struct cgroup *cgrp = css->cgroup;
-
 	might_sleep();
-	__cgroup_rstat_lock(cgrp, -1);
+	__css_rstat_lock(css, -1);
 	css_rstat_flush_locked(css);
 }

@@ -394,8 +416,7 @@ void css_rstat_flush_hold(struct cgroup_subsys_state *css)
 */
 void css_rstat_flush_release(struct cgroup_subsys_state *css)
 {
-	struct cgroup *cgrp = css->cgroup;
-	__cgroup_rstat_unlock(cgrp, -1);
+	__css_rstat_unlock(css, -1);
 }

 int css_rstat_init(struct cgroup_subsys_state *css)
@@ -439,12 +460,36 @@ void css_rstat_exit(struct cgroup_subsys_state *css)
 	css->rstat_cpu = NULL;
 }

-void __init cgroup_rstat_boot(void)
+/**
+ * ss_rstat_init - subsystem-specific rstat initialization
+ * @ss: target subsystem
+ *
+ * If @ss is NULL, the static locks associated with the base stats
+ * are initialized. If @ss is non-NULL, the subsystem-specific locks
+ * are initialized.
+ */
+int __init ss_rstat_init(struct cgroup_subsys *ss)
 {
 	int cpu;

+	if (!ss) {
+		spin_lock_init(&rstat_base_lock);
+
+		for_each_possible_cpu(cpu)
+			raw_spin_lock_init(per_cpu_ptr(&rstat_base_cpu_lock, cpu));
+
+		return 0;
+	}
+
+	spin_lock_init(&ss->lock);
+	ss->percpu_lock = alloc_percpu(raw_spinlock_t);
+	if (!ss->percpu_lock)
+		return -ENOMEM;
+
 	for_each_possible_cpu(cpu)
-		raw_spin_lock_init(per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu));
+		raw_spin_lock_init(per_cpu_ptr(ss->percpu_lock, cpu));
+
+	return 0;
 }

 /*
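
For readers who want to exercise the locking idea outside the kernel, the
selection pattern above can be modeled in ordinary user-space C. The sketch
below is only an illustration: pthread mutexes stand in for the kernel's
spinlock_t, and every name in it is invented for the example, not part of
the kernel API.

/* lock_split.c - toy model of per-subsystem rstat locks.
 * Compile with: cc -pthread lock_split.c
 */
#include <pthread.h>
#include <stdio.h>

struct subsys {
	const char *name;
	pthread_mutex_t lock;		/* per-subsystem flush lock */
};

/* dedicated lock for the base stats, used when no subsystem is given */
static pthread_mutex_t base_lock = PTHREAD_MUTEX_INITIALIZER;

static struct subsys subsystems[] = {
	{ "memory", PTHREAD_MUTEX_INITIALIZER },
	{ "io",     PTHREAD_MUTEX_INITIALIZER },
};

/* same shape as the ss_rstat_lock() helper above: pick the subsystem
 * lock when one is given, otherwise fall back to the base-stat lock */
static pthread_mutex_t *pick_lock(struct subsys *ss)
{
	if (ss)
		return &ss->lock;

	return &base_lock;
}

static void flush(struct subsys *ss)
{
	pthread_mutex_t *lock = pick_lock(ss);

	pthread_mutex_lock(lock);
	printf("flushing %s stats\n", ss ? ss->name : "base");
	pthread_mutex_unlock(lock);
}

int main(void)
{
	flush(NULL);		/* base stats: serializes on base_lock only */
	flush(&subsystems[0]);	/* memory: takes its own lock */
	flush(&subsystems[1]);	/* io: never contends with memory */
	return 0;
}

Two threads flushing different subsystems take different mutexes and never
serialize against each other; only flushers of the same subsystem (or of
the base stats) contend, which is the effect the patch is after.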

From patchwork Wed Mar 19 22:16:34 2025
From: JP Kobryn <inwardvessel@gmail.com>
To: tj@kernel.org, shakeel.butt@linux.dev, yosryahmed@google.com,
	mkoutny@suse.com, hannes@cmpxchg.org, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 4/4] cgroup: save memory by splitting cgroup_rstat_cpu into
	compact and full versions
Date: Wed, 19 Mar 2025 15:16:34 -0700
Message-ID: <20250319221634.71128-5-inwardvessel@gmail.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250319221634.71128-1-inwardvessel@gmail.com>
References: <20250319221634.71128-1-inwardvessel@gmail.com>
MIME-Version: 1.0

The cgroup_rstat_cpu struct contains the rstat node pointers and also the
base stat objects. Since ownership of cgroup_rstat_cpu has shifted from
cgroup to cgroup_subsys_state, css's other than cgroup::self now carry
these base stat objects along even though they go unused. Eliminate this
wasted memory by splitting cgroup_rstat_cpu into two separate structs.

The cgroup_rstat_cpu struct is reduced so that it contains only the rstat
node pointers. css's that are associated with a subsystem (memory, io) use
this compact struct to participate in rstat without the memory overhead of
the base stat objects.

As for css's represented by cgroup::self, a new cgroup_rstat_base_cpu
struct is introduced. It contains the compact cgroup_rstat_cpu struct as
its first field, followed by the base stat objects. Because the rstat
pointers exist at the same offset (the beginning) in both structs,
cgroup_subsys_state is modified to contain a union of the two.

Where css initialization is done, the compact struct is allocated when the
css is associated with a subsystem; otherwise, the full struct is
allocated. The union allows the existing rstat updated/flush routines to
work with any css regardless of subsystem association. The base stat
routines, however, were modified to access the full struct field of the
union.

The change in memory on a per-cpu basis is shown below.
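
The layout trick is easiest to see outside the kernel. In the stand-alone
sketch below (all names invented for illustration; these are not the
kernel definitions), the compact struct is the first member of the full
one, so a union of pointers lets the same code walk the node pointers
regardless of which variant was allocated.

/* split_layout.c - toy model of the compact/full struct split */
#include <stdio.h>
#include <stdlib.h>

struct node_only {			/* "compact": node pointers only */
	struct node_only *updated_children;
	struct node_only *updated_next;
};

struct node_with_base {			/* "full": compact part comes first */
	struct node_only node;
	long base_stats[8];		/* stand-in for the base stat objects */
};

struct state {				/* stand-in for cgroup_subsys_state */
	union {				/* both members alias offset zero */
		struct node_only *compact;
		struct node_with_base *full;
	};
};

int main(void)
{
	struct state subsys_css = { 0 }, self_css = { 0 };

	/* allocate only as much as each css actually needs */
	subsys_css.compact = calloc(1, sizeof(*subsys_css.compact));
	self_css.full = calloc(1, sizeof(*self_css.full));

	/* generic node-walking code can use the compact view for both,
	 * because the compact struct sits at the start of the full one */
	subsys_css.compact->updated_next = NULL;
	self_css.compact->updated_next = NULL;

	printf("compact: %zu bytes, full: %zu bytes\n",
	       sizeof(struct node_only), sizeof(struct node_with_base));

	free(subsys_css.compact);
	free(self_css.full);
	return 0;
}

Only the allocation site needs to know which variant to create; everything
that touches just the list pointers keeps using the compact view, which is
why the existing updated/flush routines work unchanged.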
before:
  struct size
    sizeof(cgroup_rstat_cpu) =~ 144 bytes /* can vary based on config */

  per-cpu overhead
    nr_cgroups * (sizeof(cgroup_rstat_cpu) * (1 + nr_rstat_subsystems))
    nr_cgroups * (144 * (1 + 2))
    nr_cgroups * 432
    432 bytes per cgroup per cpu

after:
  struct sizes
    sizeof(cgroup_rstat_base_cpu) =~ 144 bytes
    sizeof(cgroup_rstat_cpu) = 16 bytes

  per-cpu overhead
    nr_cgroups * (sizeof(cgroup_rstat_base_cpu) +
                  sizeof(cgroup_rstat_cpu) * nr_rstat_subsystems)
    nr_cgroups * (144 + 16 * 2)
    nr_cgroups * 176
    176 bytes per cgroup per cpu

savings: 256 bytes per cgroup per cpu

Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
---
 include/linux/cgroup-defs.h |  41 +++++++------
 kernel/cgroup/rstat.c       | 100 ++++++++++++++++++++------------
 2 files changed, 86 insertions(+), 55 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 0ffc8438c6d9..f9b84e7f718d 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -170,7 +170,10 @@ struct cgroup_subsys_state {
 	struct percpu_ref refcnt;

 	/* per-cpu recursive resource statistics */
-	struct css_rstat_cpu __percpu *rstat_cpu;
+	union {
+		struct css_rstat_cpu __percpu *rstat_cpu;
+		struct css_rstat_base_cpu __percpu *rstat_base_cpu;
+	};

 	/*
 	 * siblings list anchored at the parent's ->children
@@ -358,6 +361,26 @@ struct cgroup_base_stat {
 * resource statistics on top of it - bsync, bstat and last_bstat.
 */
 struct css_rstat_cpu {
+	/*
+	 * Child cgroups with stat updates on this cpu since the last read
+	 * are linked on the parent's ->updated_children through
+	 * ->updated_next.
+	 *
+	 * In addition to being more compact, singly-linked list pointing
+	 * to the cgroup makes it unnecessary for each per-cpu struct to
+	 * point back to the associated cgroup.
+	 *
+	 * Protected by per-cpu rstat_base_cpu_lock when css->ss == NULL
+	 * otherwise,
+	 * Protected by per-cpu css->ss->rstat_cpu_lock
+	 */
+	struct cgroup_subsys_state *updated_children;	/* terminated by self */
+	struct cgroup_subsys_state *updated_next;	/* NULL if not on list */
+};
+
+struct css_rstat_base_cpu {
+	struct css_rstat_cpu rstat_cpu;
+
 	/*
 	 * ->bsync protects ->bstat. These are the only fields which get
 	 * updated in the hot path.
@@ -384,22 +407,6 @@ struct css_rstat_cpu {
 	 * deltas to propagate to the per-cpu subtree_bstat.
 	 */
 	struct cgroup_base_stat last_subtree_bstat;
-
-	/*
-	 * Child cgroups with stat updates on this cpu since the last read
-	 * are linked on the parent's ->updated_children through
-	 * ->updated_next.
-	 *
-	 * In addition to being more compact, singly-linked list pointing
-	 * to the cgroup makes it unnecessary for each per-cpu struct to
-	 * point back to the associated cgroup.
-	 *
-	 * Protected by per-cpu rstat_base_cpu_lock when css->ss == NULL
-	 * otherwise,
-	 * Protected by per-cpu css->ss->rstat_cpu_lock
-	 */
-	struct cgroup_subsys_state *updated_children;	/* terminated by self */
-	struct cgroup_subsys_state *updated_next;	/* NULL if not on list */
 };

 struct cgroup_freezer_state {
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index ffd7ac6bcefc..250f0987407e 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -20,6 +20,12 @@ static struct css_rstat_cpu *css_rstat_cpu(
 	return per_cpu_ptr(css->rstat_cpu, cpu);
 }

+static struct css_rstat_base_cpu *css_rstat_base_cpu(
+		struct cgroup_subsys_state *css, int cpu)
+{
+	return per_cpu_ptr(css->rstat_base_cpu, cpu);
+}
+
 static spinlock_t *ss_rstat_lock(struct cgroup_subsys *ss)
 {
 	if (ss)
@@ -425,17 +431,35 @@ int css_rstat_init(struct cgroup_subsys_state *css)

 	/* the root cgrp's self css has rstat_cpu preallocated */
 	if (!css->rstat_cpu) {
-		css->rstat_cpu = alloc_percpu(struct css_rstat_cpu);
-		if (!css->rstat_cpu)
-			return -ENOMEM;
+		/* One of the union fields must be initialized.
+		 * Allocate the larger rstat struct for base stats when css is
+		 * cgroup::self.
+		 * Otherwise, allocate the compact rstat struct since the css is
+		 * associated with a subsystem.
+		 */
+		if (css_is_cgroup(css)) {
+			css->rstat_base_cpu = alloc_percpu(struct css_rstat_base_cpu);
+			if (!css->rstat_base_cpu)
+				return -ENOMEM;
+		} else {
+			css->rstat_cpu = alloc_percpu(struct css_rstat_cpu);
+			if (!css->rstat_cpu)
+				return -ENOMEM;
+		}
 	}

-	/* ->updated_children list is self terminated */
 	for_each_possible_cpu(cpu) {
-		struct css_rstat_cpu *rstatc = css_rstat_cpu(css, cpu);
+		struct css_rstat_cpu *rstatc;

+		rstatc = css_rstat_cpu(css, cpu);
 		rstatc->updated_children = css;
-		u64_stats_init(&rstatc->bsync);
+
+		if (css_is_cgroup(css)) {
+			struct css_rstat_base_cpu *rstatbc;
+
+			rstatbc = css_rstat_base_cpu(css, cpu);
+			u64_stats_init(&rstatbc->bsync);
+		}
 	}

 	return 0;
@@ -522,9 +546,9 @@ static void cgroup_base_stat_sub(struct cgroup_base_stat *dst_bstat,

 static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
 {
-	struct css_rstat_cpu *rstatc = css_rstat_cpu(&cgrp->self, cpu);
+	struct css_rstat_base_cpu *rstatbc = css_rstat_base_cpu(&cgrp->self, cpu);
 	struct cgroup *parent = cgroup_parent(cgrp);
-	struct css_rstat_cpu *prstatc;
+	struct css_rstat_base_cpu *prstatbc;
 	struct cgroup_base_stat delta;
 	unsigned seq;

@@ -534,15 +558,15 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)

 	/* fetch the current per-cpu values */
 	do {
-		seq = __u64_stats_fetch_begin(&rstatc->bsync);
-		delta = rstatc->bstat;
-	} while (__u64_stats_fetch_retry(&rstatc->bsync, seq));
+		seq = __u64_stats_fetch_begin(&rstatbc->bsync);
+		delta = rstatbc->bstat;
+	} while (__u64_stats_fetch_retry(&rstatbc->bsync, seq));

 	/* propagate per-cpu delta to cgroup and per-cpu global statistics */
-	cgroup_base_stat_sub(&delta, &rstatc->last_bstat);
+	cgroup_base_stat_sub(&delta, &rstatbc->last_bstat);
 	cgroup_base_stat_add(&cgrp->bstat, &delta);
-	cgroup_base_stat_add(&rstatc->last_bstat, &delta);
-	cgroup_base_stat_add(&rstatc->subtree_bstat, &delta);
+	cgroup_base_stat_add(&rstatbc->last_bstat, &delta);
+	cgroup_base_stat_add(&rstatbc->subtree_bstat, &delta);

 	/* propagate cgroup and per-cpu global delta to parent (unless that's root) */
 	if (cgroup_parent(parent)) {
@@ -551,73 +575,73 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
 		cgroup_base_stat_add(&parent->bstat, &delta);
 		cgroup_base_stat_add(&cgrp->last_bstat, &delta);

-		delta = rstatc->subtree_bstat;
-		prstatc = css_rstat_cpu(&parent->self, cpu);
-		cgroup_base_stat_sub(&delta, &rstatc->last_subtree_bstat);
-		cgroup_base_stat_add(&prstatc->subtree_bstat, &delta);
-		cgroup_base_stat_add(&rstatc->last_subtree_bstat, &delta);
+		delta = rstatbc->subtree_bstat;
+		prstatbc = css_rstat_base_cpu(&parent->self, cpu);
+		cgroup_base_stat_sub(&delta, &rstatbc->last_subtree_bstat);
+		cgroup_base_stat_add(&prstatbc->subtree_bstat, &delta);
+		cgroup_base_stat_add(&rstatbc->last_subtree_bstat, &delta);
 	}
 }

-static struct css_rstat_cpu *
+static struct css_rstat_base_cpu *
 cgroup_base_stat_cputime_account_begin(struct cgroup *cgrp,
 				       unsigned long *flags)
 {
-	struct css_rstat_cpu *rstatc;
+	struct css_rstat_base_cpu *rstatbc;

-	rstatc = get_cpu_ptr(cgrp->self.rstat_cpu);
-	*flags = u64_stats_update_begin_irqsave(&rstatc->bsync);
-	return rstatc;
+	rstatbc = get_cpu_ptr(cgrp->self.rstat_base_cpu);
+	*flags = u64_stats_update_begin_irqsave(&rstatbc->bsync);
+	return rstatbc;
 }

 static void cgroup_base_stat_cputime_account_end(struct cgroup *cgrp,
-						 struct css_rstat_cpu *rstatc,
+						 struct css_rstat_base_cpu *rstatbc,
						 unsigned long flags)
 {
-	u64_stats_update_end_irqrestore(&rstatc->bsync, flags);
+	u64_stats_update_end_irqrestore(&rstatbc->bsync, flags);
 	css_rstat_updated(&cgrp->self, smp_processor_id());
-	put_cpu_ptr(rstatc);
+	put_cpu_ptr(rstatbc);
 }

 void __cgroup_account_cputime(struct cgroup *cgrp, u64 delta_exec)
 {
-	struct css_rstat_cpu *rstatc;
+	struct css_rstat_base_cpu *rstatbc;
 	unsigned long flags;

-	rstatc = cgroup_base_stat_cputime_account_begin(cgrp, &flags);
-	rstatc->bstat.cputime.sum_exec_runtime += delta_exec;
-	cgroup_base_stat_cputime_account_end(cgrp, rstatc, flags);
+	rstatbc = cgroup_base_stat_cputime_account_begin(cgrp, &flags);
+	rstatbc->bstat.cputime.sum_exec_runtime += delta_exec;
+	cgroup_base_stat_cputime_account_end(cgrp, rstatbc, flags);
 }

 void __cgroup_account_cputime_field(struct cgroup *cgrp,
				    enum cpu_usage_stat index, u64 delta_exec)
 {
-	struct css_rstat_cpu *rstatc;
+	struct css_rstat_base_cpu *rstatbc;
 	unsigned long flags;

-	rstatc = cgroup_base_stat_cputime_account_begin(cgrp, &flags);
+	rstatbc = cgroup_base_stat_cputime_account_begin(cgrp, &flags);

 	switch (index) {
 	case CPUTIME_NICE:
-		rstatc->bstat.ntime += delta_exec;
+		rstatbc->bstat.ntime += delta_exec;
 		fallthrough;
 	case CPUTIME_USER:
-		rstatc->bstat.cputime.utime += delta_exec;
+		rstatbc->bstat.cputime.utime += delta_exec;
 		break;
 	case CPUTIME_SYSTEM:
 	case CPUTIME_IRQ:
 	case CPUTIME_SOFTIRQ:
-		rstatc->bstat.cputime.stime += delta_exec;
+		rstatbc->bstat.cputime.stime += delta_exec;
 		break;
#ifdef CONFIG_SCHED_CORE
 	case CPUTIME_FORCEIDLE:
-		rstatc->bstat.forceidle_sum += delta_exec;
+		rstatbc->bstat.forceidle_sum += delta_exec;
 		break;
#endif
 	default:
 		break;
 	}

-	cgroup_base_stat_cputime_account_end(cgrp, rstatc, flags);
+	cgroup_base_stat_cputime_account_end(cgrp, rstatbc, flags);
 }

 /*