From patchwork Wed Mar 19 22:21:47 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: JP Kobryn <inwardvessel@gmail.com>
X-Patchwork-Id: 14023233
From: JP Kobryn <inwardvessel@gmail.com>
To: tj@kernel.org, shakeel.butt@linux.dev, yosryahmed@google.com,
	mkoutny@suse.com, hannes@cmpxchg.org, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 1/4 v3] cgroup: use separate rstat api for bpf programs
Date: Wed, 19 Mar 2025 15:21:47 -0700
Message-ID: <20250319222150.71813-2-inwardvessel@gmail.com>
In-Reply-To: <20250319222150.71813-1-inwardvessel@gmail.com>
References: <20250319222150.71813-1-inwardvessel@gmail.com>
The rstat updated/flush API functions are exported as kfuncs so that bpf
programs can make the same calls that in-kernel code can. Split these API
functions into separate in-kernel and bpf versions. Function signatures
remain unchanged; the kfuncs are named with the prefix "bpf_". This
non-functional change lets future commits modify the signature of the
in-kernel API without impacting bpf call sites. The kfunc implementations
serve as thin adapters to the in-kernel API.

Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
---
 include/linux/cgroup.h                        |  3 +++
 kernel/cgroup/rstat.c                         | 19 ++++++++++++++-----
 .../bpf/progs/cgroup_hierarchical_stats.c     |  8 ++++----
 3 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index f8ef47f8a634..13fd82a4336d 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -692,6 +692,9 @@ void cgroup_rstat_flush(struct cgroup *cgrp);
 void cgroup_rstat_flush_hold(struct cgroup *cgrp);
 void cgroup_rstat_flush_release(struct cgroup *cgrp);
 
+void bpf_cgroup_rstat_updated(struct cgroup *cgrp, int cpu);
+void bpf_cgroup_rstat_flush(struct cgroup *cgrp);
+
 /*
  * Basic resource stats.
  */
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index aac91466279f..0d66cfc53061 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -82,7 +82,7 @@ void _cgroup_rstat_cpu_unlock(raw_spinlock_t *cpu_lock, int cpu,
  * rstat_cpu->updated_children list. See the comment on top of
  * cgroup_rstat_cpu definition for details.
  */
-__bpf_kfunc void cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
+void cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
 {
 	raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
 	unsigned long flags;
@@ -129,6 +129,11 @@ __bpf_kfunc void cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
 	_cgroup_rstat_cpu_unlock(cpu_lock, cpu, cgrp, flags, true);
 }
 
+__bpf_kfunc void bpf_cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
+{
+	cgroup_rstat_updated(cgrp, cpu);
+}
+
 /**
  * cgroup_rstat_push_children - push children cgroups into the given list
  * @head: current head of the list (= subtree root)
@@ -346,7 +351,7 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp)
  *
  * This function may block.
  */
-__bpf_kfunc void cgroup_rstat_flush(struct cgroup *cgrp)
+void cgroup_rstat_flush(struct cgroup *cgrp)
 {
 	might_sleep();
 
@@ -355,6 +360,11 @@ __bpf_kfunc void cgroup_rstat_flush(struct cgroup *cgrp)
 	__cgroup_rstat_unlock(cgrp, -1);
 }
 
+__bpf_kfunc void bpf_cgroup_rstat_flush(struct cgroup *cgrp)
+{
+	cgroup_rstat_flush(cgrp);
+}
+
 /**
  * cgroup_rstat_flush_hold - flush stats in @cgrp's subtree and hold
  * @cgrp: target cgroup
@@ -644,10 +654,9 @@ void cgroup_base_stat_cputime_show(struct seq_file *seq)
 		cgroup_force_idle_show(seq, &cgrp->bstat);
 }
 
-/* Add bpf kfuncs for cgroup_rstat_updated() and cgroup_rstat_flush() */
 BTF_KFUNCS_START(bpf_rstat_kfunc_ids)
-BTF_ID_FLAGS(func, cgroup_rstat_updated)
-BTF_ID_FLAGS(func, cgroup_rstat_flush, KF_SLEEPABLE)
+BTF_ID_FLAGS(func, bpf_cgroup_rstat_updated)
+BTF_ID_FLAGS(func, bpf_cgroup_rstat_flush, KF_SLEEPABLE)
 BTF_KFUNCS_END(bpf_rstat_kfunc_ids)
 
 static const struct btf_kfunc_id_set bpf_rstat_kfunc_set = {
diff --git a/tools/testing/selftests/bpf/progs/cgroup_hierarchical_stats.c b/tools/testing/selftests/bpf/progs/cgroup_hierarchical_stats.c
index c74362854948..24450dd4d3f3 100644
--- a/tools/testing/selftests/bpf/progs/cgroup_hierarchical_stats.c
+++ b/tools/testing/selftests/bpf/progs/cgroup_hierarchical_stats.c
@@ -37,8 +37,8 @@ struct {
 	__type(value, struct attach_counter);
 } attach_counters SEC(".maps");
 
-extern void cgroup_rstat_updated(struct cgroup *cgrp, int cpu) __ksym;
-extern void cgroup_rstat_flush(struct cgroup *cgrp) __ksym;
+extern void bpf_cgroup_rstat_updated(struct cgroup *cgrp, int cpu) __ksym;
+extern void bpf_cgroup_rstat_flush(struct cgroup *cgrp) __ksym;
 
 static uint64_t cgroup_id(struct cgroup *cgrp)
 {
@@ -75,7 +75,7 @@ int BPF_PROG(counter, struct cgroup *dst_cgrp, struct task_struct *leader,
 	else if (create_percpu_attach_counter(cg_id, 1))
 		return 0;
 
-	cgroup_rstat_updated(dst_cgrp, bpf_get_smp_processor_id());
+	bpf_cgroup_rstat_updated(dst_cgrp, bpf_get_smp_processor_id());
 	return 0;
 }
 
@@ -141,7 +141,7 @@ int BPF_PROG(dumper, struct bpf_iter_meta *meta, struct cgroup *cgrp)
 		return 1;
 
 	/* Flush the stats to make sure we get the most updated numbers */
-	cgroup_rstat_flush(cgrp);
+	bpf_cgroup_rstat_flush(cgrp);
 
 	total_counter = bpf_map_lookup_elem(&attach_counters, &cg_id);
 	if (!total_counter) {
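For illustration, here is a minimal sketch of what a bpf-side caller looks
like once the kfuncs carry the "bpf_" prefix. It is modeled on the
cgroup_hierarchical_stats.c selftest updated above; the program names and
the attach-point argument layouts are assumptions for the sketch, not part
of this patch.

	/* SPDX-License-Identifier: GPL-2.0 */
	#include <vmlinux.h>
	#include <bpf/bpf_helpers.h>
	#include <bpf/bpf_tracing.h>

	/* kfuncs exported by kernel/cgroup/rstat.c as of this patch */
	extern void bpf_cgroup_rstat_updated(struct cgroup *cgrp, int cpu) __ksym;
	extern void bpf_cgroup_rstat_flush(struct cgroup *cgrp) __ksym;

	/* assumed attach point; arg layout follows the cgroup_attach_task
	 * tracepoint as used by the selftest */
	SEC("tp_btf/cgroup_attach_task")
	int BPF_PROG(note_attach, struct cgroup *dst_cgrp, const char *path,
		     struct task_struct *task, bool threadgroup)
	{
		/* mark dst_cgrp as having pending stats on this cpu */
		bpf_cgroup_rstat_updated(dst_cgrp, bpf_get_smp_processor_id());
		return 0;
	}

	/* sleepable iterator, since the flush kfunc is KF_SLEEPABLE */
	SEC("iter.s/cgroup")
	int BPF_PROG(dump_stats, struct bpf_iter_meta *meta, struct cgroup *cgrp)
	{
		if (!cgrp)
			return 1;

		/* propagate pending per-cpu updates before reading */
		bpf_cgroup_rstat_flush(cgrp);
		return 0;
	}

	char _license[] SEC("license") = "GPL";

Because the kfuncs are thin adapters, the in-kernel signatures can change in
later patches while these bpf call sites keep compiling against the stable
cgroup-based prototypes.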
From patchwork Wed Mar 19 22:21:48 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: JP Kobryn <inwardvessel@gmail.com>
X-Patchwork-Id: 14023234
From: JP Kobryn <inwardvessel@gmail.com>
To: tj@kernel.org, shakeel.butt@linux.dev, yosryahmed@google.com,
	mkoutny@suse.com, hannes@cmpxchg.org, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 2/4 v3] cgroup: use separate rstat trees for each subsystem
Date: Wed, 19 Mar 2025 15:21:48 -0700
Message-ID: <20250319222150.71813-3-inwardvessel@gmail.com>
In-Reply-To: <20250319222150.71813-1-inwardvessel@gmail.com>
References: <20250319222150.71813-1-inwardvessel@gmail.com>
Different subsystems may call cgroup_rstat_updated() within the same
cgroup, resulting in a tree of pending updates from multiple subsystems.
When one of these subsystems is flushed via cgroup_rstat_flush(), all
other subsystems with pending updates on the tree will also be flushed.

Change the paradigm of having a single rstat tree for all subsystems to
having separate trees for each subsystem. This separation allows a
subsystem to flush without the side effects of the others: flushing the
cpu stats, for example, will no longer cause the memory stats to be
flushed, and vice versa.

In order to achieve subsystem-specific trees, change the tree node type
from cgroup to cgroup_subsys_state pointer. Then remove those pointers
from the cgroup and instead place them on the css. Finally, change the
updated/flush APIs to accept a reference to a css instead of a cgroup,
which associates a specific subsystem with an update or flush. Separate
rstat trees will now exist for each unique subsystem.

Since updating/flushing is now done at the subsystem level, there is no
longer a need to keep track of updated css nodes at the cgroup level, so
the list management done within the cgroup (rstat_css_list and related)
has been removed. The padding that kept rstat_css_list on a cacheline
different from rstat_flush_next and the base stats has also been removed.

Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
---
 block/blk-cgroup.c                            |   4 +-
 include/linux/cgroup-defs.h                   |  41 ++--
 include/linux/cgroup.h                        |  13 +-
 kernel/cgroup/cgroup-internal.h               |   4 +-
 kernel/cgroup/cgroup.c                        |  63 +++---
 kernel/cgroup/rstat.c                         | 212 +++++++++---------
 mm/memcontrol.c                               |   4 +-
 .../selftests/bpf/progs/btf_type_tag_percpu.c |   5 +-
 8 files changed, 177 insertions(+), 169 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 9ed93d91d754..cd9521f4f607 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1201,7 +1201,7 @@ static int blkcg_print_stat(struct seq_file *sf, void *v)
 	if (!seq_css(sf)->parent)
 		blkcg_fill_root_iostats();
 	else
-		cgroup_rstat_flush(blkcg->css.cgroup);
+		css_rstat_flush(&blkcg->css);
 
 	rcu_read_lock();
 	hlist_for_each_entry_rcu(blkg, &blkcg->blkg_list, blkcg_node) {
@@ -2186,7 +2186,7 @@ void blk_cgroup_bio_start(struct bio *bio)
 	}
 	u64_stats_update_end_irqrestore(&bis->sync, flags);
 
-	cgroup_rstat_updated(blkcg->css.cgroup, cpu);
+	css_rstat_updated(&blkcg->css, cpu);
 	put_cpu();
 }
 
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 17960a1e858d..031f55a9ac49 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -169,6 +169,9 @@ struct cgroup_subsys_state {
 	/* reference count - access via css_[try]get() and css_put() */
 	struct percpu_ref refcnt;
 
+	/* per-cpu recursive resource statistics */
+	struct css_rstat_cpu __percpu *rstat_cpu;
+
 	/*
 	 * siblings list anchored at the parent's ->children
 	 *
@@ -177,9 +180,6 @@ struct cgroup_subsys_state {
 	struct list_head sibling;
 	struct list_head children;
 
-	/* flush target list anchored at cgrp->rstat_css_list */
-	struct list_head rstat_css_node;
-
 	/*
 	 * PI: Subsys-unique ID. 0 is unused and root is always 1. The
 	 * matching css can be looked up using css_from_id().
@@ -219,6 +219,13 @@ struct cgroup_subsys_state {
 	 * Protected by cgroup_mutex.
 	 */
 	int nr_descendants;
+
+	/*
+	 * A singly-linked list of css structures to be rstat flushed.
+	 * This is a scratch field to be used exclusively by
+	 * cgroup_rstat_flush_locked() and protected by cgroup_rstat_lock.
+	 */
+	struct cgroup_subsys_state *rstat_flush_next;
 };
 
 /*
@@ -329,10 +336,10 @@ struct cgroup_base_stat {
 
 /*
  * rstat - cgroup scalable recursive statistics. Accounting is done
- * per-cpu in cgroup_rstat_cpu which is then lazily propagated up the
+ * per-cpu in css_rstat_cpu which is then lazily propagated up the
  * hierarchy on reads.
  *
- * When a stat gets updated, the cgroup_rstat_cpu and its ancestors are
+ * When a stat gets updated, the css_rstat_cpu and its ancestors are
  * linked into the updated tree. On the following read, propagation only
  * considers and consumes the updated tree. This makes reading O(the
  * number of descendants which have been active since last read) instead of
@@ -347,7 +354,7 @@ struct cgroup_base_stat {
  * updated_children and updated_next - and the fields which track basic
  * resource statistics on top of it - bsync, bstat and last_bstat.
  */
-struct cgroup_rstat_cpu {
+struct css_rstat_cpu {
 	/*
 	 * ->bsync protects ->bstat. These are the only fields which get
 	 * updated in the hot path.
@@ -386,8 +393,8 @@ struct cgroup_rstat_cpu {
 	 *
 	 * Protected by per-cpu cgroup_rstat_cpu_lock.
 	 */
-	struct cgroup *updated_children;	/* terminated by self cgroup */
-	struct cgroup *updated_next;		/* NULL iff not on the list */
+	struct cgroup_subsys_state *updated_children;	/* terminated by self */
+	struct cgroup_subsys_state *updated_next;	/* NULL if not on list */
 };
 
 struct cgroup_freezer_state {
@@ -516,24 +523,6 @@ struct cgroup {
 	struct cgroup *dom_cgrp;
 	struct cgroup *old_dom_cgrp;		/* used while enabling threaded */
 
-	/* per-cpu recursive resource statistics */
-	struct cgroup_rstat_cpu __percpu *rstat_cpu;
-	struct list_head rstat_css_list;
-
-	/*
-	 * Add padding to separate the read mostly rstat_cpu and
-	 * rstat_css_list into a different cacheline from the following
-	 * rstat_flush_next and *bstat fields which can have frequent updates.
-	 */
-	CACHELINE_PADDING(_pad_);
-
-	/*
-	 * A singly-linked list of cgroup structures to be rstat flushed.
-	 * This is a scratch field to be used exclusively by
-	 * cgroup_rstat_flush_locked() and protected by cgroup_rstat_lock.
-	 */
-	struct cgroup *rstat_flush_next;
-
 	/* cgroup basic resource statistics */
 	struct cgroup_base_stat last_bstat;
 	struct cgroup_base_stat bstat;
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 13fd82a4336d..4e71ae9858d3 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -346,6 +346,11 @@ static inline bool css_is_dying(struct cgroup_subsys_state *css)
 	return !(css->flags & CSS_NO_REF) && percpu_ref_is_dying(&css->refcnt);
 }
 
+static inline bool css_is_cgroup(struct cgroup_subsys_state *css)
+{
+	return css->ss == NULL;
+}
+
 static inline void cgroup_get(struct cgroup *cgrp)
 {
 	css_get(&cgrp->self);
@@ -687,10 +692,10 @@ static inline void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen)
 /*
  * cgroup scalable recursive statistics.
  */
-void cgroup_rstat_updated(struct cgroup *cgrp, int cpu);
-void cgroup_rstat_flush(struct cgroup *cgrp);
-void cgroup_rstat_flush_hold(struct cgroup *cgrp);
-void cgroup_rstat_flush_release(struct cgroup *cgrp);
+void css_rstat_updated(struct cgroup_subsys_state *css, int cpu);
+void css_rstat_flush(struct cgroup_subsys_state *css);
+void css_rstat_flush_hold(struct cgroup_subsys_state *css);
+void css_rstat_flush_release(struct cgroup_subsys_state *css);
 
 void bpf_cgroup_rstat_updated(struct cgroup *cgrp, int cpu);
 void bpf_cgroup_rstat_flush(struct cgroup *cgrp);
diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
index c964dd7ff967..d4b75fba9a54 100644
--- a/kernel/cgroup/cgroup-internal.h
+++ b/kernel/cgroup/cgroup-internal.h
@@ -269,8 +269,8 @@ int cgroup_task_count(const struct cgroup *cgrp);
 /*
  * rstat.c
  */
-int cgroup_rstat_init(struct cgroup *cgrp);
-void cgroup_rstat_exit(struct cgroup *cgrp);
+int css_rstat_init(struct cgroup_subsys_state *css);
+void css_rstat_exit(struct cgroup_subsys_state *css);
 void cgroup_rstat_boot(void);
 void cgroup_base_stat_cputime_show(struct seq_file *seq);
 
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index afc665b7b1fe..1e21065dec0e 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -161,10 +161,12 @@ static struct static_key_true *cgroup_subsys_on_dfl_key[] = {
 };
 #undef SUBSYS
 
-static DEFINE_PER_CPU(struct cgroup_rstat_cpu, cgrp_dfl_root_rstat_cpu);
+static DEFINE_PER_CPU(struct css_rstat_cpu, root_self_rstat_cpu);
 
 /* the default hierarchy */
-struct cgroup_root cgrp_dfl_root = { .cgrp.rstat_cpu = &cgrp_dfl_root_rstat_cpu };
+struct cgroup_root cgrp_dfl_root = {
+	.cgrp.self.rstat_cpu = &root_self_rstat_cpu
+};
 EXPORT_SYMBOL_GPL(cgrp_dfl_root);
 
 /*
@@ -1358,7 +1360,7 @@ static void cgroup_destroy_root(struct cgroup_root *root)
 
 	cgroup_unlock();
 
-	cgroup_rstat_exit(cgrp);
+	css_rstat_exit(&cgrp->self);
 	kernfs_destroy_root(root->kf_root);
 	cgroup_free_root(root);
 }
@@ -1863,13 +1865,6 @@ int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask)
 		}
 		spin_unlock_irq(&css_set_lock);
 
-		if (ss->css_rstat_flush) {
-			list_del_rcu(&css->rstat_css_node);
-			synchronize_rcu();
-			list_add_rcu(&css->rstat_css_node,
-				     &dcgrp->rstat_css_list);
-		}
-
 		/* default hierarchy doesn't enable controllers by default */
 		dst_root->subsys_mask |= 1 << ssid;
 		if (dst_root == &cgrp_dfl_root) {
@@ -2052,7 +2047,6 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp)
 	cgrp->dom_cgrp = cgrp;
 	cgrp->max_descendants = INT_MAX;
 	cgrp->max_depth = INT_MAX;
-	INIT_LIST_HEAD(&cgrp->rstat_css_list);
 	prev_cputime_init(&cgrp->prev_cputime);
 
 	for_each_subsys(ss, ssid)
@@ -2132,7 +2126,7 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
 	if (ret)
 		goto destroy_root;
 
-	ret = cgroup_rstat_init(root_cgrp);
+	ret = css_rstat_init(&root_cgrp->self);
 	if (ret)
 		goto destroy_root;
 
@@ -2174,7 +2168,7 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
 	goto out;
 
 exit_stats:
-	cgroup_rstat_exit(root_cgrp);
+	css_rstat_exit(&root_cgrp->self);
 destroy_root:
 	kernfs_destroy_root(root->kf_root);
 	root->kf_root = NULL;
@@ -5407,6 +5401,9 @@ static void css_free_rwork_fn(struct work_struct *work)
 		struct cgroup_subsys_state *parent = css->parent;
 		int id = css->id;
 
+		if (ss->css_rstat_flush)
+			css_rstat_exit(css);
+
 		ss->css_free(css);
 		cgroup_idr_remove(&ss->css_idr, id);
 		cgroup_put(cgrp);
@@ -5431,7 +5428,7 @@ static void css_free_rwork_fn(struct work_struct *work)
 			cgroup_put(cgroup_parent(cgrp));
 			kernfs_put(cgrp->kn);
 			psi_cgroup_free(cgrp);
-			cgroup_rstat_exit(cgrp);
+			css_rstat_exit(&cgrp->self);
 			kfree(cgrp);
 		} else {
 			/*
@@ -5459,11 +5456,8 @@ static void css_release_work_fn(struct work_struct *work)
 	if (ss) {
 		struct cgroup *parent_cgrp;
 
-		/* css release path */
-		if (!list_empty(&css->rstat_css_node)) {
-			cgroup_rstat_flush(cgrp);
-			list_del_rcu(&css->rstat_css_node);
-		}
+		if (ss->css_rstat_flush)
+			css_rstat_flush(css);
 
 		cgroup_idr_replace(&ss->css_idr, NULL, css->id);
 		if (ss->css_released)
@@ -5489,7 +5483,7 @@ static void css_release_work_fn(struct work_struct *work)
 		/* cgroup release path */
 		TRACE_CGROUP_PATH(release, cgrp);
 
-		cgroup_rstat_flush(cgrp);
+		css_rstat_flush(&cgrp->self);
 
 		spin_lock_irq(&css_set_lock);
 		for (tcgrp = cgroup_parent(cgrp); tcgrp;
@@ -5537,7 +5531,6 @@ static void init_and_link_css(struct cgroup_subsys_state *css,
 	css->id = -1;
 	INIT_LIST_HEAD(&css->sibling);
 	INIT_LIST_HEAD(&css->children);
-	INIT_LIST_HEAD(&css->rstat_css_node);
 	css->serial_nr = css_serial_nr_next++;
 	atomic_set(&css->online_cnt, 0);
 
@@ -5546,9 +5539,6 @@ static void init_and_link_css(struct cgroup_subsys_state *css,
 		css_get(css->parent);
 	}
 
-	if (ss->css_rstat_flush)
-		list_add_rcu(&css->rstat_css_node, &cgrp->rstat_css_list);
-
 	BUG_ON(cgroup_css(cgrp, ss));
 }
 
@@ -5641,6 +5631,12 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
 		goto err_free_css;
 	css->id = err;
 
+	if (ss->css_rstat_flush) {
+		err = css_rstat_init(css);
+		if (err)
+			goto err_free_css;
+	}
+
 	/* @css is ready to be brought online now, make it visible */
 	list_add_tail_rcu(&css->sibling, &parent_css->children);
 	cgroup_idr_replace(&ss->css_idr, css, css->id);
@@ -5654,7 +5650,6 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
 err_list_del:
 	list_del_rcu(&css->sibling);
 err_free_css:
-	list_del_rcu(&css->rstat_css_node);
 	INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn);
 	queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork);
 	return ERR_PTR(err);
@@ -5682,7 +5677,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name,
 	if (ret)
 		goto out_free_cgrp;
 
-	ret = cgroup_rstat_init(cgrp);
+	ret = css_rstat_init(&cgrp->self);
 	if (ret)
 		goto out_cancel_ref;
 
@@ -5775,7 +5770,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name,
 out_kernfs_remove:
 	kernfs_remove(cgrp->kn);
 out_stat_exit:
-	cgroup_rstat_exit(cgrp);
+	css_rstat_exit(&cgrp->self);
 out_cancel_ref:
 	percpu_ref_exit(&cgrp->self.refcnt);
 out_free_cgrp:
@@ -6082,11 +6077,16 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
 	css->flags |= CSS_NO_REF;
 
 	if (early) {
-		/* allocation can't be done safely during early init */
+		/* allocation can't be done safely during early init.
+		 * defer idr and rstat allocations until cgroup_init().
+		 */
 		css->id = 1;
 	} else {
 		css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2, GFP_KERNEL);
 		BUG_ON(css->id < 0);
+
+		if (ss->css_rstat_flush)
+			BUG_ON(css_rstat_init(css));
 	}
 
 	/* Update the init_css_set to contain a subsys
@@ -6185,9 +6185,16 @@ int __init cgroup_init(void)
 			struct cgroup_subsys_state *css =
 				init_css_set.subsys[ss->id];
 
+			/* it is now safe to perform allocations.
+			 * finish setting up subsystems that previously
+			 * deferred idr and rstat allocations.
+			 */
 			css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2, GFP_KERNEL);
 			BUG_ON(css->id < 0);
+
+			if (ss->css_rstat_flush)
+				BUG_ON(css_rstat_init(css));
 		} else {
 			cgroup_init_subsys(ss, false);
 		}
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 0d66cfc53061..a28c00b11736 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -14,9 +14,10 @@ static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_cpu_lock);
 
 static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu);
 
-static struct cgroup_rstat_cpu *cgroup_rstat_cpu(struct cgroup *cgrp, int cpu)
+static struct css_rstat_cpu *css_rstat_cpu(
+		struct cgroup_subsys_state *css, int cpu)
 {
-	return per_cpu_ptr(cgrp->rstat_cpu, cpu);
+	return per_cpu_ptr(css->rstat_cpu, cpu);
 }
 
 /*
@@ -74,16 +75,17 @@ void _cgroup_rstat_cpu_unlock(raw_spinlock_t *cpu_lock, int cpu,
 }
 
 /**
- * cgroup_rstat_updated - keep track of updated rstat_cpu
- * @cgrp: target cgroup
+ * css_rstat_updated - keep track of updated rstat_cpu
+ * @css: target cgroup subsystem state
  * @cpu: cpu on which rstat_cpu was updated
  *
- * @cgrp's rstat_cpu on @cpu was updated. Put it on the parent's matching
+ * @css's rstat_cpu on @cpu was updated. Put it on the parent's matching
  * rstat_cpu->updated_children list. See the comment on top of
- * cgroup_rstat_cpu definition for details.
+ * css_rstat_cpu definition for details.
  */
-void cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
+void css_rstat_updated(struct cgroup_subsys_state *css, int cpu)
 {
+	struct cgroup *cgrp = css->cgroup;
 	raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
 	unsigned long flags;
 
@@ -92,19 +94,19 @@ void cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
 	 * temporary inaccuracies, which is fine.
 	 *
 	 * Because @parent's updated_children is terminated with @parent
-	 * instead of NULL, we can tell whether @cgrp is on the list by
+	 * instead of NULL, we can tell whether @css is on the list by
 	 * testing the next pointer for NULL.
 	 */
-	if (data_race(cgroup_rstat_cpu(cgrp, cpu)->updated_next))
+	if (data_race(css_rstat_cpu(css, cpu)->updated_next))
 		return;
 
 	flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, cgrp, true);
 
-	/* put @cgrp and all ancestors on the corresponding updated lists */
+	/* put @css and all ancestors on the corresponding updated lists */
 	while (true) {
-		struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
-		struct cgroup *parent = cgroup_parent(cgrp);
-		struct cgroup_rstat_cpu *prstatc;
+		struct css_rstat_cpu *rstatc = css_rstat_cpu(css, cpu);
+		struct cgroup_subsys_state *parent = css->parent;
+		struct css_rstat_cpu *prstatc;
 
 		/*
 		 * Both additions and removals are bottom-up. If a cgroup
@@ -115,15 +117,15 @@ void cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
 
 		/* Root has no parent to link it to, but mark it busy */
 		if (!parent) {
-			rstatc->updated_next = cgrp;
+			rstatc->updated_next = css;
 			break;
 		}
 
-		prstatc = cgroup_rstat_cpu(parent, cpu);
+		prstatc = css_rstat_cpu(parent, cpu);
 		rstatc->updated_next = prstatc->updated_children;
-		prstatc->updated_children = cgrp;
+		prstatc->updated_children = css;
 
-		cgrp = parent;
+		css = parent;
 	}
 
 	_cgroup_rstat_cpu_unlock(cpu_lock, cpu, cgrp, flags, true);
@@ -131,7 +133,7 @@ void cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
 
 __bpf_kfunc void bpf_cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
 {
-	cgroup_rstat_updated(cgrp, cpu);
+	css_rstat_updated(&cgrp->self, cpu);
 }
 
 /**
@@ -141,18 +143,19 @@ __bpf_kfunc void bpf_cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
  * @cpu: target cpu
  * Return: A new singly linked list of cgroups to be flush
  *
- * Iteratively traverse down the cgroup_rstat_cpu updated tree level by
+ * Iteratively traverse down the css_rstat_cpu updated tree level by
  * level and push all the parents first before their next level children
  * into a singly linked list built from the tail backward like "pushing"
  * cgroups into a stack. The root is pushed by the caller.
  */
-static struct cgroup *cgroup_rstat_push_children(struct cgroup *head,
-						 struct cgroup *child, int cpu)
+static struct cgroup_subsys_state *cgroup_rstat_push_children(
+		struct cgroup_subsys_state *head,
+		struct cgroup_subsys_state *child, int cpu)
 {
-	struct cgroup *chead = child;	/* Head of child cgroup level */
-	struct cgroup *ghead = NULL;	/* Head of grandchild cgroup level */
-	struct cgroup *parent, *grandchild;
-	struct cgroup_rstat_cpu *crstatc;
+	struct cgroup_subsys_state *chead = child;	/* Head of child css level */
+	struct cgroup_subsys_state *ghead = NULL;	/* Head of grandchild css level */
+	struct cgroup_subsys_state *parent, *grandchild;
+	struct css_rstat_cpu *crstatc;
 
 	child->rstat_flush_next = NULL;
 
@@ -160,13 +163,13 @@ static struct cgroup *cgroup_rstat_push_children(struct cgroup *head,
 	while (chead) {
 		child = chead;
 		chead = child->rstat_flush_next;
-		parent = cgroup_parent(child);
+		parent = child->parent;
 
 		/* updated_next is parent cgroup terminated */
 		while (child != parent) {
 			child->rstat_flush_next = head;
 			head = child;
-			crstatc = cgroup_rstat_cpu(child, cpu);
+			crstatc = css_rstat_cpu(child, cpu);
 			grandchild = crstatc->updated_children;
 			if (grandchild != child) {
 				/* Push the grand child to the next level */
@@ -188,31 +191,33 @@ static struct cgroup *cgroup_rstat_push_children(struct cgroup *head,
 }
 
 /**
- * cgroup_rstat_updated_list - return a list of updated cgroups to be flushed
- * @root: root of the cgroup subtree to traverse
+ * css_rstat_updated_list - return a list of updated cgroups to be flushed
+ * @root: root of the css subtree to traverse
  * @cpu: target cpu
  * Return: A singly linked list of cgroups to be flushed
  *
 * Walks the updated rstat_cpu tree on @cpu from @root. During traversal,
- * each returned cgroup is unlinked from the updated tree.
+ * each returned css is unlinked from the updated tree.
 *
 * The only ordering guarantee is that, for a parent and a child pair
 * covered by a given traversal, the child is before its parent in
 * the list.
 *
 * Note that updated_children is self terminated and points to a list of
- * child cgroups if not empty. Whereas updated_next is like a sibling link
An exception + * child css's if not empty. Whereas updated_next is like a sibling link + * within the children list and terminated by the parent css. An exception * here is the cgroup root whose updated_next can be self terminated. */ -static struct cgroup *cgroup_rstat_updated_list(struct cgroup *root, int cpu) +static struct cgroup_subsys_state *css_rstat_updated_list( + struct cgroup_subsys_state *root, int cpu) { + struct cgroup *cgrp = root->cgroup; raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu); - struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(root, cpu); - struct cgroup *head = NULL, *parent, *child; + struct css_rstat_cpu *rstatc = css_rstat_cpu(root, cpu); + struct cgroup_subsys_state *head = NULL, *parent, *child; unsigned long flags; - flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, root, false); + flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, cgrp, false); /* Return NULL if this subtree is not on-list */ if (!rstatc->updated_next) @@ -222,17 +227,17 @@ static struct cgroup *cgroup_rstat_updated_list(struct cgroup *root, int cpu) * Unlink @root from its parent. As the updated_children list is * singly linked, we have to walk it to find the removal point. */ - parent = cgroup_parent(root); + parent = root->parent; if (parent) { - struct cgroup_rstat_cpu *prstatc; - struct cgroup **nextp; + struct css_rstat_cpu *prstatc; + struct cgroup_subsys_state **nextp; - prstatc = cgroup_rstat_cpu(parent, cpu); + prstatc = css_rstat_cpu(parent, cpu); nextp = &prstatc->updated_children; while (*nextp != root) { - struct cgroup_rstat_cpu *nrstatc; + struct css_rstat_cpu *nrstatc; - nrstatc = cgroup_rstat_cpu(*nextp, cpu); + nrstatc = css_rstat_cpu(*nextp, cpu); WARN_ON_ONCE(*nextp == parent); nextp = &nrstatc->updated_next; } @@ -249,14 +254,14 @@ static struct cgroup *cgroup_rstat_updated_list(struct cgroup *root, int cpu) if (child != root) head = cgroup_rstat_push_children(head, child, cpu); unlock_ret: - _cgroup_rstat_cpu_unlock(cpu_lock, cpu, root, flags, false); + _cgroup_rstat_cpu_unlock(cpu_lock, cpu, cgrp, flags, false); return head; } /* * A hook for bpf stat collectors to attach to and flush their stats. - * Together with providing bpf kfuncs for cgroup_rstat_updated() and - * cgroup_rstat_flush(), this enables a complete workflow where bpf progs that + * Together with providing bpf kfuncs for css_rstat_updated() and + * css_rstat_flush(), this enables a complete workflow where bpf progs that * collect cgroup stats can integrate with rstat for efficient flushing. 
 *
 * A static noinline declaration here could cause the compiler to optimize away
@@ -304,28 +309,26 @@ static inline void __cgroup_rstat_unlock(struct cgroup *cgrp, int cpu_in_loop)
 	spin_unlock_irq(&cgroup_rstat_lock);
 }
 
-/* see cgroup_rstat_flush() */
-static void cgroup_rstat_flush_locked(struct cgroup *cgrp)
+/* see css_rstat_flush() */
+static void css_rstat_flush_locked(struct cgroup_subsys_state *css)
 	__releases(&cgroup_rstat_lock) __acquires(&cgroup_rstat_lock)
 {
+	struct cgroup *cgrp = css->cgroup;
 	int cpu;
 
 	lockdep_assert_held(&cgroup_rstat_lock);
 
 	for_each_possible_cpu(cpu) {
-		struct cgroup *pos = cgroup_rstat_updated_list(cgrp, cpu);
+		struct cgroup_subsys_state *pos;
 
+		pos = css_rstat_updated_list(css, cpu);
 		for (; pos; pos = pos->rstat_flush_next) {
-			struct cgroup_subsys_state *css;
-
-			cgroup_base_stat_flush(pos, cpu);
-			bpf_rstat_flush(pos, cgroup_parent(pos), cpu);
-
-			rcu_read_lock();
-			list_for_each_entry_rcu(css, &pos->rstat_css_list,
-						rstat_css_node)
-				css->ss->css_rstat_flush(css, cpu);
-			rcu_read_unlock();
+			if (css_is_cgroup(pos)) {
+				cgroup_base_stat_flush(pos->cgroup, cpu);
+				bpf_rstat_flush(pos->cgroup,
+						cgroup_parent(pos->cgroup), cpu);
+			} else if (pos->ss->css_rstat_flush)
+				pos->ss->css_rstat_flush(pos, cpu);
 		}
 
 		/* play nice and yield if necessary */
@@ -339,98 +342,101 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp)
 }
 
 /**
- * cgroup_rstat_flush - flush stats in @cgrp's subtree
- * @cgrp: target cgroup
+ * css_rstat_flush - flush stats in @css's rstat subtree
+ * @css: target cgroup subsystem state
 *
- * Collect all per-cpu stats in @cgrp's subtree into the global counters
- * and propagate them upwards. After this function returns, all cgroups in
- * the subtree have up-to-date ->stat.
+ * Collect all per-cpu stats in @css's subtree into the global counters
+ * and propagate them upwards. After this function returns, all rstat
+ * nodes in the subtree have up-to-date ->stat.
 *
- * This also gets all cgroups in the subtree including @cgrp off the
+ * This also gets all rstat nodes in the subtree including @css off the
 * ->updated_children lists.
 *
 * This function may block.
 */
-void cgroup_rstat_flush(struct cgroup *cgrp)
+void css_rstat_flush(struct cgroup_subsys_state *css)
 {
+	struct cgroup *cgrp = css->cgroup;
+
 	might_sleep();
 
 	__cgroup_rstat_lock(cgrp, -1);
-	cgroup_rstat_flush_locked(cgrp);
+	css_rstat_flush_locked(css);
 	__cgroup_rstat_unlock(cgrp, -1);
 }
 
 __bpf_kfunc void bpf_cgroup_rstat_flush(struct cgroup *cgrp)
 {
-	cgroup_rstat_flush(cgrp);
+	css_rstat_flush(&cgrp->self);
 }
 
 /**
- * cgroup_rstat_flush_hold - flush stats in @cgrp's subtree and hold
- * @cgrp: target cgroup
+ * css_rstat_flush_hold - flush stats in @css's rstat subtree and hold
+ * @css: target subsystem state
 *
- * Flush stats in @cgrp's subtree and prevent further flushes. Must be
- * paired with cgroup_rstat_flush_release().
+ * Flush stats in @css's rstat subtree and prevent further flushes. Must be
+ * paired with css_rstat_flush_release().
 *
 * This function may block.
 */
-void cgroup_rstat_flush_hold(struct cgroup *cgrp)
-	__acquires(&cgroup_rstat_lock)
+void css_rstat_flush_hold(struct cgroup_subsys_state *css)
 {
+	struct cgroup *cgrp = css->cgroup;
+
 	might_sleep();
 	__cgroup_rstat_lock(cgrp, -1);
-	cgroup_rstat_flush_locked(cgrp);
+	css_rstat_flush_locked(css);
 }
 
 /**
- * cgroup_rstat_flush_release - release cgroup_rstat_flush_hold()
- * @cgrp: cgroup used by tracepoint
+ * css_rstat_flush_release - release css_rstat_flush_hold()
+ * @css: css that was previously used for the call to flush hold
 */
-void cgroup_rstat_flush_release(struct cgroup *cgrp)
-	__releases(&cgroup_rstat_lock)
+void css_rstat_flush_release(struct cgroup_subsys_state *css)
 {
+	struct cgroup *cgrp = css->cgroup;
+
 	__cgroup_rstat_unlock(cgrp, -1);
 }
 
-int cgroup_rstat_init(struct cgroup *cgrp)
+int css_rstat_init(struct cgroup_subsys_state *css)
 {
 	int cpu;
 
-	/* the root cgrp has rstat_cpu preallocated */
-	if (!cgrp->rstat_cpu) {
-		cgrp->rstat_cpu = alloc_percpu(struct cgroup_rstat_cpu);
-		if (!cgrp->rstat_cpu)
+	/* the root cgrp's self css has rstat_cpu preallocated */
+	if (!css->rstat_cpu) {
+		css->rstat_cpu = alloc_percpu(struct css_rstat_cpu);
+		if (!css->rstat_cpu)
 			return -ENOMEM;
 	}
 
 	/* ->updated_children list is self terminated */
 	for_each_possible_cpu(cpu) {
-		struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
+		struct css_rstat_cpu *rstatc = css_rstat_cpu(css, cpu);
 
-		rstatc->updated_children = cgrp;
+		rstatc->updated_children = css;
 		u64_stats_init(&rstatc->bsync);
 	}
 
 	return 0;
 }
 
-void cgroup_rstat_exit(struct cgroup *cgrp)
+void css_rstat_exit(struct cgroup_subsys_state *css)
 {
 	int cpu;
 
-	cgroup_rstat_flush(cgrp);
+	css_rstat_flush(css);
 
 	/* sanity check */
 	for_each_possible_cpu(cpu) {
-		struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
+		struct css_rstat_cpu *rstatc = css_rstat_cpu(css, cpu);
 
-		if (WARN_ON_ONCE(rstatc->updated_children != cgrp) ||
+		if (WARN_ON_ONCE(rstatc->updated_children != css) ||
 		    WARN_ON_ONCE(rstatc->updated_next))
 			return;
 	}
 
-	free_percpu(cgrp->rstat_cpu);
-	cgrp->rstat_cpu = NULL;
+	free_percpu(css->rstat_cpu);
+	css->rstat_cpu = NULL;
 }
 
 void __init cgroup_rstat_boot(void)
@@ -471,9 +477,9 @@ static void cgroup_base_stat_sub(struct cgroup_base_stat *dst_bstat,
 
 static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
 {
-	struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
+	struct css_rstat_cpu *rstatc = css_rstat_cpu(&cgrp->self, cpu);
 	struct cgroup *parent = cgroup_parent(cgrp);
-	struct cgroup_rstat_cpu *prstatc;
+	struct css_rstat_cpu *prstatc;
 	struct cgroup_base_stat delta;
 	unsigned seq;
 
@@ -501,35 +507,35 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
 		cgroup_base_stat_add(&cgrp->last_bstat, &delta);
 
 		delta = rstatc->subtree_bstat;
-		prstatc = cgroup_rstat_cpu(parent, cpu);
+		prstatc = css_rstat_cpu(&parent->self, cpu);
 		cgroup_base_stat_sub(&delta, &rstatc->last_subtree_bstat);
 		cgroup_base_stat_add(&prstatc->subtree_bstat, &delta);
 		cgroup_base_stat_add(&rstatc->last_subtree_bstat, &delta);
 	}
 }
 
-static struct cgroup_rstat_cpu *
+static struct css_rstat_cpu *
 cgroup_base_stat_cputime_account_begin(struct cgroup *cgrp, unsigned long *flags)
 {
-	struct cgroup_rstat_cpu *rstatc;
+	struct css_rstat_cpu *rstatc;
 
-	rstatc = get_cpu_ptr(cgrp->rstat_cpu);
+	rstatc = get_cpu_ptr(cgrp->self.rstat_cpu);
 	*flags = u64_stats_update_begin_irqsave(&rstatc->bsync);
 	return rstatc;
 }
 
 static void cgroup_base_stat_cputime_account_end(struct cgroup *cgrp,
-						 struct cgroup_rstat_cpu *rstatc,
+						 struct css_rstat_cpu *rstatc,
 						 unsigned long flags)
 {
 	u64_stats_update_end_irqrestore(&rstatc->bsync, flags);
-	cgroup_rstat_updated(cgrp, smp_processor_id());
+	css_rstat_updated(&cgrp->self, smp_processor_id());
 	put_cpu_ptr(rstatc);
 }
 
 void __cgroup_account_cputime(struct cgroup *cgrp, u64 delta_exec)
 {
-	struct cgroup_rstat_cpu *rstatc;
+	struct css_rstat_cpu *rstatc;
 	unsigned long flags;
 
 	rstatc = cgroup_base_stat_cputime_account_begin(cgrp, &flags);
@@ -540,7 +546,7 @@ void __cgroup_account_cputime(struct cgroup *cgrp, u64 delta_exec)
 void __cgroup_account_cputime_field(struct cgroup *cgrp,
 				    enum cpu_usage_stat index, u64 delta_exec)
 {
-	struct cgroup_rstat_cpu *rstatc;
+	struct css_rstat_cpu *rstatc;
 	unsigned long flags;
 
 	rstatc = cgroup_base_stat_cputime_account_begin(cgrp, &flags);
@@ -625,12 +631,12 @@ void cgroup_base_stat_cputime_show(struct seq_file *seq)
 	u64 usage, utime, stime, ntime;
 
 	if (cgroup_parent(cgrp)) {
-		cgroup_rstat_flush_hold(cgrp);
+		css_rstat_flush_hold(&cgrp->self);
 		usage = cgrp->bstat.cputime.sum_exec_runtime;
 		cputime_adjust(&cgrp->bstat.cputime, &cgrp->prev_cputime,
 			       &utime, &stime);
 		ntime = cgrp->bstat.ntime;
-		cgroup_rstat_flush_release(cgrp);
+		css_rstat_flush_release(&cgrp->self);
 	} else {
 		/* cgrp->bstat of root is not actually used, reuse it */
 		root_cgroup_cputime(&cgrp->bstat);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4de6acb9b8ec..fe86d7efe372 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -579,7 +579,7 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 	if (!val)
 		return;
 
-	cgroup_rstat_updated(memcg->css.cgroup, cpu);
+	css_rstat_updated(&memcg->css, cpu);
 	statc = this_cpu_ptr(memcg->vmstats_percpu);
 	for (; statc; statc = statc->parent) {
 		stats_updates = READ_ONCE(statc->stats_updates) + abs(val);
@@ -611,7 +611,7 @@ static void __mem_cgroup_flush_stats(struct mem_cgroup *memcg, bool force)
 	if (mem_cgroup_is_root(memcg))
 		WRITE_ONCE(flush_last_time, jiffies_64);
 
-	cgroup_rstat_flush(memcg->css.cgroup);
+	css_rstat_flush(&memcg->css);
 }
 
 /*
diff --git a/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c b/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c
index 38f78d9345de..f362f7d41b9e 100644
--- a/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c
+++ b/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c
@@ -45,7 +45,7 @@ int BPF_PROG(test_percpu2, struct bpf_testmod_btf_type_tag_2 *arg)
 SEC("tp_btf/cgroup_mkdir")
 int BPF_PROG(test_percpu_load, struct cgroup *cgrp, const char *path)
 {
-	g = (__u64)cgrp->rstat_cpu->updated_children;
+	g = (__u64)cgrp->self.rstat_cpu->updated_children;
 	return 0;
 }
 
@@ -56,7 +56,8 @@ int BPF_PROG(test_percpu_helper, struct cgroup *cgrp, const char *path)
 	__u32 cpu;
 
 	cpu = bpf_get_smp_processor_id();
-	rstat = (struct cgroup_rstat_cpu *)bpf_per_cpu_ptr(cgrp->rstat_cpu, cpu);
+	rstat = (struct cgroup_rstat_cpu *)bpf_per_cpu_ptr(
+			cgrp->self.rstat_cpu, cpu);
 	if (rstat) {
 		/* READ_ONCE */
 		*(volatile int *)rstat;
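To make the direction of the API change concrete, here is a condensed
sketch of a controller-side caller after this patch. It mirrors the
mm/memcontrol.c hunks above; the two wrapper functions are invented for
illustration and do not exist in the patch.

	#include <linux/cgroup.h>
	#include <linux/memcontrol.h>

	/* illustrative wrappers only -- the real call sites live inside
	 * memcg_rstat_updated() and __mem_cgroup_flush_stats() above */
	static void memcg_example_note_update(struct mem_cgroup *memcg, int cpu)
	{
		/* enqueue only on the memory controller's per-subsystem
		 * rstat tree; the cpu, io, etc. trees stay untouched */
		css_rstat_updated(&memcg->css, cpu);
	}

	static void memcg_example_read(struct mem_cgroup *memcg)
	{
		/* propagate pending per-cpu deltas for the memory
		 * controller's subtree only; no other subsystem gets
		 * flushed as a side effect */
		css_rstat_flush(&memcg->css);
	}

The base stats keep working through the same API via the cgroup's self css
(css->ss == NULL), which is what the bpf kfunc adapters pass.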
From patchwork Wed Mar 19 22:21:49 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: JP Kobryn <inwardvessel@gmail.com>
X-Patchwork-Id: 14023235
From: JP Kobryn <inwardvessel@gmail.com>
To: tj@kernel.org, shakeel.butt@linux.dev, yosryahmed@google.com,
	mkoutny@suse.com, hannes@cmpxchg.org, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 3/4 v3] cgroup: use subsystem-specific rstat locks to avoid
 contention
Date: Wed, 19 Mar 2025 15:21:49 -0700
Message-ID: <20250319222150.71813-4-inwardvessel@gmail.com>
In-Reply-To: <20250319222150.71813-1-inwardvessel@gmail.com>
References: <20250319222150.71813-1-inwardvessel@gmail.com>
It is possible to eliminate contention between subsystems when
updating/flushing stats by using subsystem-specific locks. Let the existing
rstat locks be dedicated to the base stats and rename them to reflect it.
Add similar locks to the cgroup_subsys struct for use with individual
subsystems.

To make use of the new locks, change the existing lock helper functions to
accept a reference to a css and use css->ss to access the locks or use the
static locks for base stats when css->ss is NULL.

Signed-off-by: JP Kobryn
---
 block/blk-cgroup.c              |   2 +-
 include/linux/cgroup-defs.h     |  12 ++-
 include/trace/events/cgroup.h   |  10 ++-
 kernel/cgroup/cgroup-internal.h |   2 +-
 kernel/cgroup/cgroup.c          |  10 ++-
 kernel/cgroup/rstat.c           | 145 +++++++++++++++++++++-----------
 6 files changed, 122 insertions(+), 59 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index cd9521f4f607..34d72bbdd5e5 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1022,7 +1022,7 @@ static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu)
 	/*
 	 * For covering concurrent parent blkg update from blkg_release().
 	 *
-	 * When flushing from cgroup, cgroup_rstat_lock is always held, so
+	 * When flushing from cgroup, the subsystem lock is always held, so
 	 * this lock won't cause contention most of time.
 	 */
 	raw_spin_lock_irqsave(&blkg_stat_lock, flags);
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 031f55a9ac49..0ffc8438c6d9 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -223,7 +223,10 @@ struct cgroup_subsys_state {
 	/*
 	 * A singly-linked list of css structures to be rstat flushed.
 	 * This is a scratch field to be used exclusively by
-	 * cgroup_rstat_flush_locked() and protected by cgroup_rstat_lock.
+	 * cgroup_rstat_flush_locked().
+	 *
+	 * protected by rstat_base_lock when css is cgroup::self
+	 * protected by css->ss->lock otherwise
 	 */
 	struct cgroup_subsys_state *rstat_flush_next;
 };
@@ -391,7 +394,9 @@ struct css_rstat_cpu {
 	 * to the cgroup makes it unnecessary for each per-cpu struct to
 	 * point back to the associated cgroup.
 	 *
-	 * Protected by per-cpu cgroup_rstat_cpu_lock.
+	 * Protected by per-cpu rstat_base_cpu_lock when css->ss == NULL
+	 * otherwise,
+	 * Protected by per-cpu css->ss->rstat_cpu_lock
 	 */
 	struct cgroup_subsys_state *updated_children;	/* terminated by self */
 	struct cgroup_subsys_state *updated_next;	/* NULL if not on list */
@@ -779,6 +784,9 @@ struct cgroup_subsys {
 	 * specifies the mask of subsystems that this one depends on.
 	 */
 	unsigned int depends_on;
+
+	spinlock_t lock;
+	raw_spinlock_t __percpu *percpu_lock;
 };
 
 extern struct percpu_rw_semaphore cgroup_threadgroup_rwsem;
diff --git a/include/trace/events/cgroup.h b/include/trace/events/cgroup.h
index af2755bda6eb..ec3a95bf4981 100644
--- a/include/trace/events/cgroup.h
+++ b/include/trace/events/cgroup.h
@@ -231,7 +231,10 @@ DECLARE_EVENT_CLASS(cgroup_rstat,
 		  __entry->cpu, __entry->contended)
 );
 
-/* Related to global: cgroup_rstat_lock */
+/* Related to locks:
+ * rstat_base_lock when handling cgroup::self
+ * css->ss->lock otherwise
+ */
 DEFINE_EVENT(cgroup_rstat, cgroup_rstat_lock_contended,
 
 	TP_PROTO(struct cgroup *cgrp, int cpu, bool contended),
@@ -253,7 +256,10 @@ DEFINE_EVENT(cgroup_rstat, cgroup_rstat_unlock,
 	TP_ARGS(cgrp, cpu, contended)
 );
 
-/* Related to per CPU: cgroup_rstat_cpu_lock */
+/* Related to per CPU locks:
+ * rstat_base_cpu_lock when handling cgroup::self
+ * css->ss->cpu_lock otherwise
+ */
 DEFINE_EVENT(cgroup_rstat, cgroup_rstat_cpu_lock_contended,
 
 	TP_PROTO(struct cgroup *cgrp, int cpu, bool contended),
diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
index d4b75fba9a54..513bfce3bc23 100644
--- a/kernel/cgroup/cgroup-internal.h
+++ b/kernel/cgroup/cgroup-internal.h
@@ -271,7 +271,7 @@ int cgroup_task_count(const struct cgroup *cgrp);
  */
 int css_rstat_init(struct cgroup_subsys_state *css);
 void css_rstat_exit(struct cgroup_subsys_state *css);
-void cgroup_rstat_boot(void);
+int ss_rstat_init(struct cgroup_subsys *ss);
 void cgroup_base_stat_cputime_show(struct seq_file *seq);
 
 /*
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 1e21065dec0e..3e8948805f67 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6085,8 +6085,10 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
 	css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2, GFP_KERNEL);
 	BUG_ON(css->id < 0);
 
-	if (ss->css_rstat_flush)
+	if (ss->css_rstat_flush) {
+		BUG_ON(ss_rstat_init(ss));
 		BUG_ON(css_rstat_init(css));
+	}
 }
 
 /* Update the init_css_set to contain a subsys
@@ -6163,7 +6165,7 @@ int __init cgroup_init(void)
 	BUG_ON(cgroup_init_cftypes(NULL, cgroup_psi_files));
 	BUG_ON(cgroup_init_cftypes(NULL, cgroup1_base_files));
 
-	cgroup_rstat_boot();
+	BUG_ON(ss_rstat_init(NULL));
 
 	get_user_ns(init_cgroup_ns.user_ns);
@@ -6193,8 +6195,10 @@ int __init cgroup_init(void)
 						   GFP_KERNEL);
 			BUG_ON(css->id < 0);
 
-			if (ss->css_rstat_flush)
+			if (ss->css_rstat_flush) {
+				BUG_ON(ss_rstat_init(ss));
 				BUG_ON(css_rstat_init(css));
+			}
 		} else {
 			cgroup_init_subsys(ss, false);
 		}
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index a28c00b11736..ffd7ac6bcefc 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -9,8 +9,8 @@
 #include 
 
-static DEFINE_SPINLOCK(cgroup_rstat_lock);
-static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_cpu_lock);
+static DEFINE_SPINLOCK(rstat_base_lock);
+static DEFINE_PER_CPU(raw_spinlock_t, rstat_base_cpu_lock);
 
 static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu);
 
@@ -20,8 +20,24 @@ static struct css_rstat_cpu *css_rstat_cpu(
 	return per_cpu_ptr(css->rstat_cpu, cpu);
 }
 
+static spinlock_t *ss_rstat_lock(struct cgroup_subsys *ss)
+{
+	if (ss)
+		return &ss->lock;
+
+	return &rstat_base_lock;
+}
+
+static raw_spinlock_t *ss_rstat_cpu_lock(struct cgroup_subsys *ss, int cpu)
+{
+	if (ss)
+		return per_cpu_ptr(ss->percpu_lock, cpu);
+
+	return per_cpu_ptr(&rstat_base_cpu_lock, cpu);
+}
+
 /*
- * Helper functions for rstat per CPU lock (cgroup_rstat_cpu_lock).
+ * Helper functions for rstat per CPU locks.
  *
  * This makes it easier to diagnose locking issues and contention in
  * production environments. The parameter @fast_path determine the
@@ -29,20 +45,23 @@ static struct css_rstat_cpu *css_rstat_cpu(
  * operations without handling high-frequency fast-path "update" events.
  */
 static __always_inline
-unsigned long _cgroup_rstat_cpu_lock(raw_spinlock_t *cpu_lock, int cpu,
-				     struct cgroup *cgrp, const bool fast_path)
+unsigned long _css_rstat_cpu_lock(struct cgroup_subsys_state *css, int cpu,
+		const bool fast_path)
 {
+	struct cgroup *cgrp = css->cgroup;
+	raw_spinlock_t *cpu_lock;
 	unsigned long flags;
 	bool contended;
 
 	/*
-	 * The _irqsave() is needed because cgroup_rstat_lock is
-	 * spinlock_t which is a sleeping lock on PREEMPT_RT. Acquiring
-	 * this lock with the _irq() suffix only disables interrupts on
-	 * a non-PREEMPT_RT kernel. The raw_spinlock_t below disables
-	 * interrupts on both configurations. The _irqsave() ensures
-	 * that interrupts are always disabled and later restored.
+	 * The _irqsave() is needed because the locks used for flushing are
+	 * spinlock_t which is a sleeping lock on PREEMPT_RT. Acquiring this lock
+	 * with the _irq() suffix only disables interrupts on a non-PREEMPT_RT
+	 * kernel. The raw_spinlock_t below disables interrupts on both
+	 * configurations. The _irqsave() ensures that interrupts are always
+	 * disabled and later restored.
 	 */
+	cpu_lock = ss_rstat_cpu_lock(css->ss, cpu);
 	contended = !raw_spin_trylock_irqsave(cpu_lock, flags);
 	if (contended) {
 		if (fast_path)
@@ -62,15 +81,18 @@ unsigned long _cgroup_rstat_cpu_lock(raw_spinlock_t *cpu_lock, int cpu,
 }
 
 static __always_inline
-void _cgroup_rstat_cpu_unlock(raw_spinlock_t *cpu_lock, int cpu,
-			      struct cgroup *cgrp, unsigned long flags,
-			      const bool fast_path)
+void _css_rstat_cpu_unlock(struct cgroup_subsys_state *css, int cpu,
+		unsigned long flags, const bool fast_path)
 {
+	struct cgroup *cgrp = css->cgroup;
+	raw_spinlock_t *cpu_lock;
+
 	if (fast_path)
 		trace_cgroup_rstat_cpu_unlock_fastpath(cgrp, cpu, false);
 	else
 		trace_cgroup_rstat_cpu_unlock(cgrp, cpu, false);
 
+	cpu_lock = ss_rstat_cpu_lock(css->ss, cpu);
 	raw_spin_unlock_irqrestore(cpu_lock, flags);
 }
 
@@ -85,8 +107,6 @@ void _cgroup_rstat_cpu_unlock(raw_spinlock_t *cpu_lock, int cpu,
  */
 void css_rstat_updated(struct cgroup_subsys_state *css, int cpu)
 {
-	struct cgroup *cgrp = css->cgroup;
-	raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
 	unsigned long flags;
 
 	/*
@@ -100,7 +120,7 @@ void css_rstat_updated(struct cgroup_subsys_state *css, int cpu)
 	if (data_race(css_rstat_cpu(css, cpu)->updated_next))
 		return;
 
-	flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, cgrp, true);
+	flags = _css_rstat_cpu_lock(css, cpu, true);
 
 	/* put @css and all ancestors on the corresponding updated lists */
 	while (true) {
@@ -128,7 +148,7 @@ void css_rstat_updated(struct cgroup_subsys_state *css, int cpu)
 		css = parent;
 	}
 
-	_cgroup_rstat_cpu_unlock(cpu_lock, cpu, cgrp, flags, true);
+	_css_rstat_cpu_unlock(css, cpu, flags, true);
 }
 
 __bpf_kfunc void bpf_cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
@@ -211,13 +231,11 @@ static struct cgroup_subsys_state *cgroup_rstat_push_children(
 static struct cgroup_subsys_state *css_rstat_updated_list(
 		struct cgroup_subsys_state *root, int cpu)
 {
-	struct cgroup *cgrp = root->cgroup;
-	raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
 	struct css_rstat_cpu *rstatc = css_rstat_cpu(root, cpu);
 	struct cgroup_subsys_state *head = NULL, *parent, *child;
 	unsigned long flags;
 
-	flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, cgrp, false);
+	flags = _css_rstat_cpu_lock(root, cpu, false);
 
 	/* Return NULL if this subtree is not on-list */
 	if (!rstatc->updated_next)
@@ -254,7 +272,7 @@ static struct cgroup_subsys_state *css_rstat_updated_list(
 	if (child != root)
 		head = cgroup_rstat_push_children(head, child, cpu);
 unlock_ret:
-	_cgroup_rstat_cpu_unlock(cpu_lock, cpu, cgrp, flags, false);
+	_css_rstat_cpu_unlock(root, cpu, flags, false);
 
 	return head;
 }
@@ -281,7 +299,7 @@ __weak noinline void bpf_rstat_flush(struct cgroup *cgrp,
 __bpf_hook_end();
 
 /*
- * Helper functions for locking cgroup_rstat_lock.
+ * Helper functions for locking.
  *
  * This makes it easier to diagnose locking issues and contention in
 * production environments. The parameter @cpu_in_loop indicate lock
 * was released and re-taken when collection data from the CPUs. The
 * value -1 is used when obtaining the main lock else this is the CPU
 * number processed last.
 */
-static inline void __cgroup_rstat_lock(struct cgroup *cgrp, int cpu_in_loop)
-	__acquires(&cgroup_rstat_lock)
+static inline void __css_rstat_lock(struct cgroup_subsys_state *css,
+		int cpu_in_loop)
+	__acquires(lock)
 {
+	struct cgroup *cgrp = css->cgroup;
+	spinlock_t *lock;
 	bool contended;
 
-	contended = !spin_trylock_irq(&cgroup_rstat_lock);
+	lock = ss_rstat_lock(css->ss);
+	contended = !spin_trylock_irq(lock);
 	if (contended) {
 		trace_cgroup_rstat_lock_contended(cgrp, cpu_in_loop, contended);
-		spin_lock_irq(&cgroup_rstat_lock);
+		spin_lock_irq(lock);
 	}
 	trace_cgroup_rstat_locked(cgrp, cpu_in_loop, contended);
 }
 
-static inline void __cgroup_rstat_unlock(struct cgroup *cgrp, int cpu_in_loop)
-	__releases(&cgroup_rstat_lock)
+static inline void __css_rstat_unlock(struct cgroup_subsys_state *css,
+		int cpu_in_loop)
+	__releases(lock)
 {
+	struct cgroup *cgrp = css->cgroup;
+	spinlock_t *lock;
+
+	lock = ss_rstat_lock(css->ss);
 	trace_cgroup_rstat_unlock(cgrp, cpu_in_loop, false);
-	spin_unlock_irq(&cgroup_rstat_lock);
+	spin_unlock_irq(lock);
 }
 
-/* see css_rstat_flush() */
+/* see css_rstat_flush()
+ *
+ * it is required that callers have previously acquired a lock via
+ * __css_rstat_lock(css)
+ */
 static void css_rstat_flush_locked(struct cgroup_subsys_state *css)
-	__releases(&cgroup_rstat_lock) __acquires(&cgroup_rstat_lock)
 {
-	struct cgroup *cgrp = css->cgroup;
 	int cpu;
 
-	lockdep_assert_held(&cgroup_rstat_lock);
-
 	for_each_possible_cpu(cpu) {
 		struct cgroup_subsys_state *pos;
@@ -332,11 +359,11 @@ static void css_rstat_flush_locked(struct cgroup_subsys_state *css)
 		}
 
 		/* play nice and yield if necessary */
-		if (need_resched() || spin_needbreak(&cgroup_rstat_lock)) {
-			__cgroup_rstat_unlock(cgrp, cpu);
+		if (need_resched() || spin_needbreak(ss_rstat_lock(css->ss))) {
+			__css_rstat_unlock(css, cpu);
 			if (!cond_resched())
 				cpu_relax();
-			__cgroup_rstat_lock(cgrp, cpu);
+			__css_rstat_lock(css, cpu);
 		}
 	}
 }
@@ -356,13 +383,10 @@ static void css_rstat_flush_locked(struct cgroup_subsys_state *css)
  */
 void css_rstat_flush(struct cgroup_subsys_state *css)
 {
-	struct cgroup *cgrp = css->cgroup;
-
 	might_sleep();
-
-	__cgroup_rstat_lock(cgrp, -1);
+	__css_rstat_lock(css, -1);
 	css_rstat_flush_locked(css);
-	__cgroup_rstat_unlock(cgrp, -1);
+	__css_rstat_unlock(css, -1);
 }
 
 __bpf_kfunc void bpf_cgroup_rstat_flush(struct cgroup *cgrp)
@@ -381,10 +405,8 @@ __bpf_kfunc void bpf_cgroup_rstat_flush(struct cgroup *cgrp)
  */
 void css_rstat_flush_hold(struct cgroup_subsys_state *css)
 {
-	struct cgroup *cgrp = css->cgroup;
-
 	might_sleep();
-	__cgroup_rstat_lock(cgrp, -1);
+	__css_rstat_lock(css, -1);
 	css_rstat_flush_locked(css);
 }
 
@@ -394,8 +416,7 @@ void css_rstat_flush_hold(struct cgroup_subsys_state *css)
  */
 void css_rstat_flush_release(struct cgroup_subsys_state *css)
 {
-	struct cgroup *cgrp = css->cgroup;
-	__cgroup_rstat_unlock(cgrp, -1);
+	__css_rstat_unlock(css, -1);
 }
 
 int css_rstat_init(struct cgroup_subsys_state *css)
@@ -439,12 +460,36 @@ void css_rstat_exit(struct cgroup_subsys_state *css)
 	css->rstat_cpu = NULL;
 }
 
-void __init cgroup_rstat_boot(void)
+/**
+ * ss_rstat_init - subsystem-specific rstat initialization
+ * @ss: target subsystem
+ *
+ * If @ss is NULL, the static locks associated with the base stats
+ * are initialized. If @ss is non-NULL, the subsystem-specific locks
+ * are initialized.
+ */
+int __init ss_rstat_init(struct cgroup_subsys *ss)
 {
 	int cpu;
 
+	if (!ss) {
+		spin_lock_init(&rstat_base_lock);
+
+		for_each_possible_cpu(cpu)
+			raw_spin_lock_init(per_cpu_ptr(&rstat_base_cpu_lock, cpu));
+
+		return 0;
+	}
+
+	spin_lock_init(&ss->lock);
+	ss->percpu_lock = alloc_percpu(raw_spinlock_t);
+	if (!ss->percpu_lock)
+		return -ENOMEM;
+
 	for_each_possible_cpu(cpu)
-		raw_spin_lock_init(per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu));
+		raw_spin_lock_init(per_cpu_ptr(ss->percpu_lock, cpu));
+
+	return 0;
 }
 
 /*
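The locking change in this patch can be illustrated outside the kernel. What
follows is a minimal user-space C sketch, not kernel code: pthread mutexes
stand in for the kernel's spinlock_t, and the names (struct subsys,
pick_lock, flush) are invented for illustration. As in ss_rstat_lock() in
the patch, a NULL subsystem selects the dedicated base-stat lock; otherwise
each subsystem supplies its own lock, so flushers of different subsystems
no longer contend.

#include <pthread.h>
#include <stdio.h>

struct subsys {
	const char *name;
	pthread_mutex_t lock;	/* stands in for cgroup_subsys.lock */
	long stat;
};

static pthread_mutex_t base_lock = PTHREAD_MUTEX_INITIALIZER;

/* NULL means "base stats", mirroring the css->ss == NULL convention */
static pthread_mutex_t *pick_lock(struct subsys *ss)
{
	return ss ? &ss->lock : &base_lock;
}

static void flush(struct subsys *ss)
{
	pthread_mutex_t *lock = pick_lock(ss);

	pthread_mutex_lock(lock);
	if (ss)
		ss->stat = 0;	/* stand-in for the real flush work */
	pthread_mutex_unlock(lock);
}

int main(void)
{
	struct subsys memory = { "memory", PTHREAD_MUTEX_INITIALIZER, 42 };
	struct subsys io = { "io", PTHREAD_MUTEX_INITIALIZER, 7 };

	flush(&memory);	/* takes memory.lock */
	flush(&io);	/* takes io.lock; independent of the flush above */
	flush(NULL);	/* base stats take the dedicated base lock */
	printf("flushed %s and %s without sharing a lock\n",
	       memory.name, io.name);
	return 0;
}

With the single global lock this series replaces, all three flush() calls
would serialize; after this patch only concurrent flushes of the same
subsystem (or of the base stats) contend with each other.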
From patchwork Wed Mar 19 22:21:50 2025
From: JP Kobryn
To: tj@kernel.org, shakeel.butt@linux.dev, yosryahmed@google.com, mkoutny@suse.com, hannes@cmpxchg.org, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 4/4 v3] cgroup: save memory by splitting cgroup_rstat_cpu into compact and full versions
Date: Wed, 19 Mar 2025 15:21:50 -0700
Message-ID: <20250319222150.71813-5-inwardvessel@gmail.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250319222150.71813-1-inwardvessel@gmail.com>
References: <20250319222150.71813-1-inwardvessel@gmail.com>
MIME-Version: 1.0

The cgroup_rstat_cpu struct contains rstat node pointers and also the base
stat objects. Since ownership of cgroup_rstat_cpu has shifted from cgroup
to cgroup_subsys_state, css's other than cgroup::self now carry along these
base stat objects, which go unused. Eliminate this wasted memory by
splitting cgroup_rstat_cpu into two separate structs.

The cgroup_rstat_cpu struct is modified so that it contains only the rstat
node pointers. css's that are associated with a subsystem (memory, io) use
this compact struct to participate in rstat without the memory overhead of
the base stat objects.

As for css's represented by cgroup::self, a new cgroup_rstat_base_cpu
struct is introduced. It contains the new compact cgroup_rstat_cpu struct
as its first field, followed by the base stat objects. Because the rstat
pointers exist at the same offset (the beginning) in both structs,
cgroup_subsys_state is modified to contain a union of the two structs.

Where css initialization is done, the compact struct is allocated when the
css is associated with a subsystem; when it is not, the full struct is
allocated. The union allows the existing rstat updated/flush routines to
work with any css regardless of subsystem association. The base stats
routines, however, were modified to access the full struct field in the
union. The change in memory on a per-cpu basis is shown below.
before:
struct size
	sizeof(cgroup_rstat_cpu) =~ 144 bytes /* can vary based on config */

per-cpu overhead
	nr_cgroups * (
		sizeof(cgroup_rstat_cpu) * (1 + nr_rstat_subsystems)
	)
	nr_cgroups * (144 * (1 + 2))
	nr_cgroups * 432
	432 bytes per cgroup per cpu

after:
struct sizes
	sizeof(cgroup_rstat_base_cpu) =~ 144 bytes
	sizeof(cgroup_rstat_cpu) = 16 bytes

per-cpu overhead
	nr_cgroups * (
		sizeof(cgroup_rstat_base_cpu) +
		sizeof(cgroup_rstat_cpu) * (nr_rstat_subsystems)
	)
	nr_cgroups * (144 + 16 * 2)
	nr_cgroups * 176
	176 bytes per cgroup per cpu

savings: 256 bytes per cgroup per cpu

Reviewed-by: Shakeel Butt
Signed-off-by: JP Kobryn
---
 include/linux/cgroup-defs.h |  41 +++++++++------
 kernel/cgroup/rstat.c       | 100 ++++++++++++++++++++++--------------
 2 files changed, 86 insertions(+), 55 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 0ffc8438c6d9..f9b84e7f718d 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -170,7 +170,10 @@ struct cgroup_subsys_state {
 	struct percpu_ref refcnt;
 
 	/* per-cpu recursive resource statistics */
-	struct css_rstat_cpu __percpu *rstat_cpu;
+	union {
+		struct css_rstat_cpu __percpu *rstat_cpu;
+		struct css_rstat_base_cpu __percpu *rstat_base_cpu;
+	};
 
 	/*
 	 * siblings list anchored at the parent's ->children
@@ -358,6 +361,26 @@ struct cgroup_base_stat {
  * resource statistics on top of it - bsync, bstat and last_bstat.
  */
 struct css_rstat_cpu {
+	/*
+	 * Child cgroups with stat updates on this cpu since the last read
+	 * are linked on the parent's ->updated_children through
+	 * ->updated_next.
+	 *
+	 * In addition to being more compact, singly-linked list pointing
+	 * to the cgroup makes it unnecessary for each per-cpu struct to
+	 * point back to the associated cgroup.
+	 *
+	 * Protected by per-cpu rstat_base_cpu_lock when css->ss == NULL
+	 * otherwise,
+	 * Protected by per-cpu css->ss->rstat_cpu_lock
+	 */
+	struct cgroup_subsys_state *updated_children;	/* terminated by self */
+	struct cgroup_subsys_state *updated_next;	/* NULL if not on list */
+};
+
+struct css_rstat_base_cpu {
+	struct css_rstat_cpu rstat_cpu;
+
 	/*
 	 * ->bsync protects ->bstat. These are the only fields which get
 	 * updated in the hot path.
@@ -384,22 +407,6 @@
 	 * deltas to propagate to the per-cpu subtree_bstat.
 	 */
 	struct cgroup_base_stat last_subtree_bstat;
-
-	/*
-	 * Child cgroups with stat updates on this cpu since the last read
-	 * are linked on the parent's ->updated_children through
-	 * ->updated_next.
-	 *
-	 * In addition to being more compact, singly-linked list pointing
-	 * to the cgroup makes it unnecessary for each per-cpu struct to
-	 * point back to the associated cgroup.
-	 *
-	 * Protected by per-cpu rstat_base_cpu_lock when css->ss == NULL
-	 * otherwise,
-	 * Protected by per-cpu css->ss->rstat_cpu_lock
-	 */
-	struct cgroup_subsys_state *updated_children;	/* terminated by self */
-	struct cgroup_subsys_state *updated_next;	/* NULL if not on list */
 };
 
 struct cgroup_freezer_state {
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index ffd7ac6bcefc..250f0987407e 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -20,6 +20,12 @@ static struct css_rstat_cpu *css_rstat_cpu(
 	return per_cpu_ptr(css->rstat_cpu, cpu);
 }
 
+static struct css_rstat_base_cpu *css_rstat_base_cpu(
+		struct cgroup_subsys_state *css, int cpu)
+{
+	return per_cpu_ptr(css->rstat_base_cpu, cpu);
+}
+
 static spinlock_t *ss_rstat_lock(struct cgroup_subsys *ss)
 {
 	if (ss)
@@ -425,17 +431,35 @@ int css_rstat_init(struct cgroup_subsys_state *css)
 
 	/* the root cgrp's self css has rstat_cpu preallocated */
 	if (!css->rstat_cpu) {
-		css->rstat_cpu = alloc_percpu(struct css_rstat_cpu);
-		if (!css->rstat_cpu)
-			return -ENOMEM;
+		/* One of the union fields must be initialized.
+		 * Allocate the larger rstat struct for base stats when css is
+		 * cgroup::self.
+		 * Otherwise, allocate the compact rstat struct since the css is
+		 * associated with a subsystem.
+		 */
+		if (css_is_cgroup(css)) {
+			css->rstat_base_cpu = alloc_percpu(struct css_rstat_base_cpu);
+			if (!css->rstat_base_cpu)
+				return -ENOMEM;
+		} else {
+			css->rstat_cpu = alloc_percpu(struct css_rstat_cpu);
+			if (!css->rstat_cpu)
+				return -ENOMEM;
+		}
 	}
 
-	/* ->updated_children list is self terminated */
 	for_each_possible_cpu(cpu) {
-		struct css_rstat_cpu *rstatc = css_rstat_cpu(css, cpu);
+		struct css_rstat_cpu *rstatc;
 
+		rstatc = css_rstat_cpu(css, cpu);
 		rstatc->updated_children = css;
-		u64_stats_init(&rstatc->bsync);
+
+		if (css_is_cgroup(css)) {
+			struct css_rstat_base_cpu *rstatbc;
+
+			rstatbc = css_rstat_base_cpu(css, cpu);
+			u64_stats_init(&rstatbc->bsync);
+		}
 	}
 
 	return 0;
@@ -522,9 +546,9 @@ static void cgroup_base_stat_sub(struct cgroup_base_stat *dst_bstat,
 
 static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
 {
-	struct css_rstat_cpu *rstatc = css_rstat_cpu(&cgrp->self, cpu);
+	struct css_rstat_base_cpu *rstatbc = css_rstat_base_cpu(&cgrp->self, cpu);
 	struct cgroup *parent = cgroup_parent(cgrp);
-	struct css_rstat_cpu *prstatc;
+	struct css_rstat_base_cpu *prstatbc;
 	struct cgroup_base_stat delta;
 	unsigned seq;
@@ -534,15 +558,15 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
 
 	/* fetch the current per-cpu values */
 	do {
-		seq = __u64_stats_fetch_begin(&rstatc->bsync);
-		delta = rstatc->bstat;
-	} while (__u64_stats_fetch_retry(&rstatc->bsync, seq));
+		seq = __u64_stats_fetch_begin(&rstatbc->bsync);
+		delta = rstatbc->bstat;
+	} while (__u64_stats_fetch_retry(&rstatbc->bsync, seq));
 
 	/* propagate per-cpu delta to cgroup and per-cpu global statistics */
-	cgroup_base_stat_sub(&delta, &rstatc->last_bstat);
+	cgroup_base_stat_sub(&delta, &rstatbc->last_bstat);
 	cgroup_base_stat_add(&cgrp->bstat, &delta);
-	cgroup_base_stat_add(&rstatc->last_bstat, &delta);
-	cgroup_base_stat_add(&rstatc->subtree_bstat, &delta);
+	cgroup_base_stat_add(&rstatbc->last_bstat, &delta);
+	cgroup_base_stat_add(&rstatbc->subtree_bstat, &delta);
 
 	/* propagate cgroup and per-cpu global delta to parent (unless that's root) */
 	if (cgroup_parent(parent)) {
@@ -551,73 +575,73 @@
 		cgroup_base_stat_add(&parent->bstat, &delta);
 		cgroup_base_stat_add(&cgrp->last_bstat, &delta);
 
-		delta = rstatc->subtree_bstat;
-		prstatc = css_rstat_cpu(&parent->self, cpu);
-		cgroup_base_stat_sub(&delta, &rstatc->last_subtree_bstat);
-		cgroup_base_stat_add(&prstatc->subtree_bstat, &delta);
-		cgroup_base_stat_add(&rstatc->last_subtree_bstat, &delta);
+		delta = rstatbc->subtree_bstat;
+		prstatbc = css_rstat_base_cpu(&parent->self, cpu);
+		cgroup_base_stat_sub(&delta, &rstatbc->last_subtree_bstat);
+		cgroup_base_stat_add(&prstatbc->subtree_bstat, &delta);
+		cgroup_base_stat_add(&rstatbc->last_subtree_bstat, &delta);
 	}
 }
 
-static struct css_rstat_cpu *
+static struct css_rstat_base_cpu *
 cgroup_base_stat_cputime_account_begin(struct cgroup *cgrp, unsigned long *flags)
 {
-	struct css_rstat_cpu *rstatc;
+	struct css_rstat_base_cpu *rstatbc;
 
-	rstatc = get_cpu_ptr(cgrp->self.rstat_cpu);
-	*flags = u64_stats_update_begin_irqsave(&rstatc->bsync);
-	return rstatc;
+	rstatbc = get_cpu_ptr(cgrp->self.rstat_base_cpu);
+	*flags = u64_stats_update_begin_irqsave(&rstatbc->bsync);
+	return rstatbc;
 }
 
 static void cgroup_base_stat_cputime_account_end(struct cgroup *cgrp,
-						 struct css_rstat_cpu *rstatc,
+						 struct css_rstat_base_cpu *rstatbc,
 						 unsigned long flags)
 {
-	u64_stats_update_end_irqrestore(&rstatc->bsync, flags);
+	u64_stats_update_end_irqrestore(&rstatbc->bsync, flags);
 	css_rstat_updated(&cgrp->self, smp_processor_id());
-	put_cpu_ptr(rstatc);
+	put_cpu_ptr(rstatbc);
 }
 
 void __cgroup_account_cputime(struct cgroup *cgrp, u64 delta_exec)
 {
-	struct css_rstat_cpu *rstatc;
+	struct css_rstat_base_cpu *rstatbc;
 	unsigned long flags;
 
-	rstatc = cgroup_base_stat_cputime_account_begin(cgrp, &flags);
-	rstatc->bstat.cputime.sum_exec_runtime += delta_exec;
-	cgroup_base_stat_cputime_account_end(cgrp, rstatc, flags);
+	rstatbc = cgroup_base_stat_cputime_account_begin(cgrp, &flags);
+	rstatbc->bstat.cputime.sum_exec_runtime += delta_exec;
+	cgroup_base_stat_cputime_account_end(cgrp, rstatbc, flags);
 }
 
 void __cgroup_account_cputime_field(struct cgroup *cgrp,
 				    enum cpu_usage_stat index, u64 delta_exec)
 {
-	struct css_rstat_cpu *rstatc;
+	struct css_rstat_base_cpu *rstatbc;
 	unsigned long flags;
 
-	rstatc = cgroup_base_stat_cputime_account_begin(cgrp, &flags);
+	rstatbc = cgroup_base_stat_cputime_account_begin(cgrp, &flags);
 
 	switch (index) {
 	case CPUTIME_NICE:
-		rstatc->bstat.ntime += delta_exec;
+		rstatbc->bstat.ntime += delta_exec;
 		fallthrough;
 	case CPUTIME_USER:
-		rstatc->bstat.cputime.utime += delta_exec;
+		rstatbc->bstat.cputime.utime += delta_exec;
 		break;
 	case CPUTIME_SYSTEM:
 	case CPUTIME_IRQ:
 	case CPUTIME_SOFTIRQ:
-		rstatc->bstat.cputime.stime += delta_exec;
+		rstatbc->bstat.cputime.stime += delta_exec;
 		break;
 #ifdef CONFIG_SCHED_CORE
 	case CPUTIME_FORCEIDLE:
-		rstatc->bstat.forceidle_sum += delta_exec;
+		rstatbc->bstat.forceidle_sum += delta_exec;
 		break;
 #endif
 	default:
 		break;
 	}
 
-	cgroup_base_stat_cputime_account_end(cgrp, rstatc, flags);
+	cgroup_base_stat_cputime_account_end(cgrp, rstatbc, flags);
 }
 
 /*
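The struct split in this patch can likewise be modeled in user-space C.
This is a minimal sketch under stated assumptions, not kernel code: the
names (rstat_node, rstat_node_full, struct owner) are invented, and the
array standing in for the base stat objects only approximates their real
size. The property it demonstrates mirrors the patch: the compact struct
is the first field of the full struct, so a union of the two pointers lets
shared code use the compact view while base-stat code uses the full layout.

#include <stdio.h>
#include <stdlib.h>

struct rstat_node {			/* analogue of css_rstat_cpu */
	struct rstat_node *updated_children;
	struct rstat_node *updated_next;
};

struct rstat_node_full {		/* analogue of css_rstat_base_cpu */
	struct rstat_node node;		/* must remain the first field */
	long base_stats[16];		/* stand-in for bsync/bstat/... */
};

struct owner {				/* analogue of cgroup_subsys_state */
	union {
		struct rstat_node *compact;
		struct rstat_node_full *full;
	};
};

int main(void)
{
	struct owner self = { 0 }, subsys = { 0 };

	self.full = calloc(1, sizeof(*self.full));	/* needs base stats */
	subsys.compact = calloc(1, sizeof(*subsys.compact));	/* does not */

	/* the shared update/flush paths can always use the compact view */
	self.compact->updated_children = self.compact;
	subsys.compact->updated_children = subsys.compact;

	printf("compact: %zu bytes, full: %zu bytes\n",
	       sizeof(struct rstat_node), sizeof(struct rstat_node_full));

	free(self.compact);
	free(subsys.compact);
	return 0;
}

The printed sizes vary by platform, but the gap is what the changelog's
arithmetic captures: only cgroup::self pays for the full layout, while
subsystem css's pay only for the two node pointers.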