From patchwork Tue Aug 1 14:29:10 2023
X-Patchwork-Submitter: Yafang Shao
X-Patchwork-Id: 13336803
From: Yafang Shao
To: ast@kernel.org, daniel@iogearbox.net, john.fastabend@gmail.com,
    andrii@kernel.org, martin.lau@linux.dev, song@kernel.org, yhs@fb.com,
    kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org
Cc: bpf@vger.kernel.org, Yafang Shao
Subject: [RFC PATCH bpf-next 1/3] bpf: Add bpf_for_each_cpu helper
Date: Tue, 1 Aug 2023 14:29:10 +0000
Message-Id: <20230801142912.55078-2-laoar.shao@gmail.com>
In-Reply-To: <20230801142912.55078-1-laoar.shao@gmail.com>
References: <20230801142912.55078-1-laoar.shao@gmail.com>
X-Patchwork-State: RFC

Some statistical data is stored in percpu pointers, but the
kernel does not consolidate them into a single value, such as the
data stored within struct psi_group_cpu. To facilitate obtaining
the sum of this data, a new bpf helper called bpf_for_each_cpu is
introduced. This new helper implements for_each_{possible, present,
online}_cpu, allowing the user to traverse CPUs conveniently. For
instance, it enables walking through the CPUs of a cpuset cgroup
when the task is within that cgroup.

Signed-off-by: Yafang Shao
---
 include/linux/bpf.h            |  1 +
 include/uapi/linux/bpf.h       | 32 +++++++++++++++++++
 kernel/bpf/bpf_iter.c          | 72 ++++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/helpers.c           |  2 ++
 kernel/bpf/verifier.c          | 29 ++++++++++++++++-
 tools/include/uapi/linux/bpf.h | 32 +++++++++++++++++++
 6 files changed, 167 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index ceaa8c2..3e63607 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2951,6 +2951,7 @@ static inline int bpf_fd_reuseport_array_update_elem(struct bpf_map *map,
 extern const struct bpf_func_proto bpf_user_ringbuf_drain_proto;
 extern const struct bpf_func_proto bpf_cgrp_storage_get_proto;
 extern const struct bpf_func_proto bpf_cgrp_storage_delete_proto;
+extern const struct bpf_func_proto bpf_for_each_cpu_proto;

 const struct bpf_func_proto *tracing_prog_func_proto(
        enum bpf_func_id func_id, const struct bpf_prog *prog);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 7fc98f4..e8a0ac7 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1663,6 +1663,14 @@ struct bpf_stack_build_id {
 } __attribute__((aligned(8)));

+enum bpf_cpu_mask_type {
+        CPU_MASK_UNSPEC = 0,
+        CPU_MASK_POSSIBLE = 1,
+        CPU_MASK_ONLINE = 2,
+        CPU_MASK_PRESENT = 3,
+        CPU_MASK_TASK = 4,        /* cpu mask of a task */
+};
+
 /* The description below is an attempt at providing documentation to eBPF
  * developers about the multiple available eBPF helper functions. It can be
  * parsed and used to produce a manual page.
 * The workflow is the following,
@@ -5609,6 +5617,29 @@ struct bpf_stack_build_id {
 *              0 on success.
 *
 *              **-ENOENT** if the bpf_local_storage cannot be found.
+ *
+ * long bpf_for_each_cpu(void *callback_fn, void *callback_ctx, const void *pcpu_ptr, u32 type, u32 target)
+ *      Description
+ *              Walk the percpu pointer **pcpu_ptr** with the callback function
+ *              **callback_fn**. The **callback_fn** should be a static function
+ *              and the **callback_ctx** should be a pointer to the stack.
+ *              The **callback_ctx** is the context parameter.
+ *              The **type** and **target** specify which CPUs to walk. If
+ *              **target** is specified, the cpumask is taken from the
+ *              associated target.
+ *
+ *              long (\*callback_fn)(u32 cpu, void \*ctx, const void \*ptr);
+ *
+ *              where **cpu** is the current CPU in the walk, **ctx** is the
+ *              **callback_ctx**, and **ptr** is the address of **pcpu_ptr**
+ *              on the current CPU.
+ *
+ *              If **callback_fn** returns 0, the helper continues with the
+ *              next CPU. If the return value is 1, the helper skips the
+ *              remaining CPUs and returns. Other return values are not used
+ *              now and are rejected by the verifier.
+ *      Return
+ *              The number of CPUs walked, or **-EINVAL** for an invalid
+ *              **type**, **target**, or **pcpu_ptr**.
 */
#define ___BPF_FUNC_MAPPER(FN, ctx...)
\
        FN(unspec, 0, ##ctx)                    \
@@ -5823,6 +5854,7 @@ struct bpf_stack_build_id {
        FN(user_ringbuf_drain, 209, ##ctx)      \
        FN(cgrp_storage_get, 210, ##ctx)        \
        FN(cgrp_storage_delete, 211, ##ctx)     \
+       FN(for_each_cpu, 212, ##ctx)            \
        /* */

/* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c
index 96856f1..e588a14 100644
--- a/kernel/bpf/bpf_iter.c
+++ b/kernel/bpf/bpf_iter.c
@@ -6,6 +6,8 @@
 #include
 #include
 #include
+#include
+#include

 struct bpf_iter_target_info {
        struct list_head list;
@@ -777,6 +779,76 @@ int bpf_iter_run_prog(struct bpf_prog *prog, void *ctx)
        .arg4_type      = ARG_ANYTHING,
 };

+BPF_CALL_5(bpf_for_each_cpu, void *, callback_fn, void *, callback_ctx,
+           const void *, pcpu_ptr, u32, type, u32, target)
+{
+        bpf_callback_t callback = (bpf_callback_t)callback_fn;
+        struct task_struct *task = NULL;
+        const cpumask_t *mask;
+        const void *ptr;
+        u64 ret;
+        u32 cpu;
+
+        if (!pcpu_ptr)
+                return -EINVAL;
+
+        if ((type != CPU_MASK_TASK && target) || (type == CPU_MASK_TASK && !target))
+                return -EINVAL;
+
+        switch (type) {
+        case CPU_MASK_POSSIBLE:
+                mask = cpu_possible_mask;
+                break;
+        case CPU_MASK_ONLINE:
+                mask = cpu_online_mask;
+                break;
+        case CPU_MASK_PRESENT:
+                mask = cpu_present_mask;
+                break;
+        case CPU_MASK_TASK:
+                rcu_read_lock();
+                task = get_pid_task(find_vpid(target), PIDTYPE_PID);
+                rcu_read_unlock();
+                if (!task)
+                        return -EINVAL;
+                mask = task->cpus_ptr;
+                break;
+        default:
+                return -EINVAL;
+        }
+
+        for_each_cpu(cpu, mask) {
+                ptr = per_cpu_ptr((const void __percpu *)pcpu_ptr, cpu);
+                if (!ptr) {
+                        if (task)
+                                put_task_struct(task);
+                        return cpu + 1;
+                }
+
+                ret = callback((u64)cpu, (u64)(long)callback_ctx, (u64)(long)ptr, 0, 0);
+                if (ret) {
+                        if (task)
+                                put_task_struct(task);
+                        return cpu + 1;
+                }
+        }
+
+        if (task)
+                put_task_struct(task);
+        return cpu;
+}
+
+const struct bpf_func_proto bpf_for_each_cpu_proto = {
+        .func           = bpf_for_each_cpu,
+        .gpl_only       = false,
+        .ret_type       = RET_INTEGER,
+        .arg1_type      = ARG_PTR_TO_FUNC,
+        .arg2_type      = ARG_PTR_TO_STACK_OR_NULL,
+        .arg3_type      = ARG_PTR_TO_PERCPU_BTF_ID,
+        .arg4_type      = ARG_ANYTHING,
+        .arg5_type      = ARG_ANYTHING,
+};
+
 struct bpf_iter_num_kern {
        int cur; /* current value, inclusive */
        int end; /* final value, exclusive */
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 56ce500..06f624e 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1768,6 +1768,8 @@ static int bpf_dynptr_check_off_len(const struct bpf_dynptr_kern *ptr, u32 offse
        case BPF_FUNC_get_current_ancestor_cgroup_id:
                return &bpf_get_current_ancestor_cgroup_id_proto;
 #endif
+        case BPF_FUNC_for_each_cpu:
+                return &bpf_for_each_cpu_proto;
        default:
                break;
        }
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 71473c1..cd6d0a4 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -549,7 +549,8 @@ static bool is_callback_calling_function(enum bpf_func_id func_id)
               func_id == BPF_FUNC_timer_set_callback ||
               func_id == BPF_FUNC_find_vma ||
               func_id == BPF_FUNC_loop ||
-              func_id == BPF_FUNC_user_ringbuf_drain;
+              func_id == BPF_FUNC_user_ringbuf_drain ||
+              func_id == BPF_FUNC_for_each_cpu;
 }

 static bool is_async_callback_calling_function(enum bpf_func_id func_id)
@@ -9028,6 +9029,28 @@ static int set_user_ringbuf_callback_state(struct bpf_verifier_env *env,
        return 0;
 }

+static int set_for_each_cpu_callback_state(struct bpf_verifier_env *env,
+                                           struct bpf_func_state *caller,
+                                           struct bpf_func_state *callee,
+                                           int insn_idx)
+{
+        /* long bpf_for_each_cpu(bpf_callback_t callback_fn, void *callback_ctx,
+         *                       const void *pc, u32 type, u64 flags)
+         * callback_fn(u64 cpu, void *callback_ctx, const void *pc);
+         */
+        callee->regs[BPF_REG_1].type = SCALAR_VALUE;
+        callee->regs[BPF_REG_2] = caller->regs[BPF_REG_2];
+        callee->regs[BPF_REG_3] = caller->regs[BPF_REG_3];
+
+        /* unused */
+        __mark_reg_not_init(env, &callee->regs[BPF_REG_4]);
+        __mark_reg_not_init(env, &callee->regs[BPF_REG_5]);
+
+        callee->in_callback_fn = true;
+        callee->callback_ret_range = tnum_range(0, 1);
+        return 0;
+}
+
 static int set_rbtree_add_callback_state(struct bpf_verifier_env *env,
                                         struct bpf_func_state *caller,
                                         struct bpf_func_state *callee,
@@ -9625,6 +9648,10 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
                err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
                                        set_user_ringbuf_callback_state);
                break;
+        case BPF_FUNC_for_each_cpu:
+                err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
+                                        set_for_each_cpu_callback_state);
+                break;
        }

        if (err)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 7fc98f4..e8a0ac7 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1663,6 +1663,14 @@ struct bpf_stack_build_id {
 } __attribute__((aligned(8)));

+enum bpf_cpu_mask_type {
+        CPU_MASK_UNSPEC = 0,
+        CPU_MASK_POSSIBLE = 1,
+        CPU_MASK_ONLINE = 2,
+        CPU_MASK_PRESENT = 3,
+        CPU_MASK_TASK = 4,        /* cpu mask of a task */
+};
+
 /* The description below is an attempt at providing documentation to eBPF
  * developers about the multiple available eBPF helper functions. It can be
  * parsed and used to produce a manual page. The workflow is the following,
@@ -5609,6 +5617,29 @@ struct bpf_stack_build_id {
 *              0 on success.
 *
 *              **-ENOENT** if the bpf_local_storage cannot be found.
+ *
+ * long bpf_for_each_cpu(void *callback_fn, void *callback_ctx, const void *pcpu_ptr, u32 type, u32 target)
+ *      Description
+ *              Walk the percpu pointer **pcpu_ptr** with the callback function
+ *              **callback_fn**. The **callback_fn** should be a static function
+ *              and the **callback_ctx** should be a pointer to the stack.
+ *              The **callback_ctx** is the context parameter.
+ *              The **type** and **target** specify which CPUs to walk. If
+ *              **target** is specified, the cpumask is taken from the
+ *              associated target.
+ *
+ *              long (\*callback_fn)(u32 cpu, void \*ctx, const void \*ptr);
+ *
+ *              where **cpu** is the current CPU in the walk, **ctx** is the
+ *              **callback_ctx**, and **ptr** is the address of **pcpu_ptr**
+ *              on the current CPU.
+ *
+ *              If **callback_fn** returns 0, the helper continues with the
+ *              next CPU. If the return value is 1, the helper skips the
+ *              remaining CPUs and returns. Other return values are not used
+ *              now and are rejected by the verifier.
+ *      Return
+ *              The number of CPUs walked, or **-EINVAL** for an invalid
+ *              **type**, **target**, or **pcpu_ptr**.
 */
#define ___BPF_FUNC_MAPPER(FN, ctx...)          \
        FN(unspec, 0, ##ctx)                    \
@@ -5823,6 +5854,7 @@ struct bpf_stack_build_id {
        FN(user_ringbuf_drain, 209, ##ctx)      \
        FN(cgrp_storage_get, 210, ##ctx)        \
        FN(cgrp_storage_delete, 211, ##ctx)     \
+       FN(for_each_cpu, 212, ##ctx)            \
        /* */

/* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't

From patchwork Tue Aug 1 14:29:11 2023
X-Patchwork-Submitter: Yafang Shao
X-Patchwork-Id: 13336804
From: Yafang Shao
To: ast@kernel.org, daniel@iogearbox.net, john.fastabend@gmail.com,
    andrii@kernel.org, martin.lau@linux.dev, song@kernel.org, yhs@fb.com,
    kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org
Cc: bpf@vger.kernel.org, Yafang Shao
Subject: [RFC PATCH bpf-next 2/3] cgroup, psi: Init root cgroup psi to psi_system
Date: Tue, 1 Aug 2023 14:29:11 +0000
Message-Id: <20230801142912.55078-3-laoar.shao@gmail.com>
In-Reply-To: <20230801142912.55078-1-laoar.shao@gmail.com>
References: <20230801142912.55078-1-laoar.shao@gmail.com>
X-Patchwork-State: RFC

By initializing the root cgroup's psi field to psi_system, we can
consistently obtain the psi information for all cgroups from the
struct cgroup.

Signed-off-by: Yafang Shao
---
 include/linux/psi.h    | 2 +-
 kernel/cgroup/cgroup.c | 5 ++++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/linux/psi.h b/include/linux/psi.h
index e074587..8f2db51 100644
--- a/include/linux/psi.h
+++ b/include/linux/psi.h
@@ -34,7 +34,7 @@ __poll_t psi_trigger_poll(void **trigger_ptr, struct file *file,
 #ifdef CONFIG_CGROUPS
 static inline struct psi_group *cgroup_psi(struct cgroup *cgrp)
 {
-        return cgroup_ino(cgrp) == 1 ?
&psi_system : cgrp->psi;
+        return cgrp->psi;
 }

 int psi_cgroup_alloc(struct cgroup *cgrp);
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index f55a40d..d7ba5fa 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -164,7 +164,10 @@ struct cgroup_subsys *cgroup_subsys[] = {
 static DEFINE_PER_CPU(struct cgroup_rstat_cpu, cgrp_dfl_root_rstat_cpu);

 /* the default hierarchy */
-struct cgroup_root cgrp_dfl_root = { .cgrp.rstat_cpu = &cgrp_dfl_root_rstat_cpu };
+struct cgroup_root cgrp_dfl_root = {
+        .cgrp.rstat_cpu = &cgrp_dfl_root_rstat_cpu,
+        .cgrp.psi = &psi_system,
+};
 EXPORT_SYMBOL_GPL(cgrp_dfl_root);

 /*

From patchwork Tue Aug 1 14:29:12 2023
X-Patchwork-Submitter: Yafang Shao
X-Patchwork-Id: 13336805
From: Yafang Shao
To: ast@kernel.org, daniel@iogearbox.net, john.fastabend@gmail.com,
    andrii@kernel.org, martin.lau@linux.dev, song@kernel.org, yhs@fb.com,
    kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org
Cc: bpf@vger.kernel.org, Yafang Shao
Subject: [RFC PATCH bpf-next 3/3] selftests/bpf: Add selftest for for_each_cpu
Date: Tue, 1 Aug 2023 14:29:12 +0000
Message-Id: <20230801142912.55078-4-laoar.shao@gmail.com>
In-Reply-To: <20230801142912.55078-1-laoar.shao@gmail.com>
References: <20230801142912.55078-1-laoar.shao@gmail.com>
X-Patchwork-State: RFC

Add a selftest for the new for_each_cpu helper. The result:

$ tools/testing/selftests/bpf/test_progs --name=for_each_cpu
#84/1    for_each_cpu/psi_system:OK
#84/2    for_each_cpu/psi_cgroup:OK
#84/3    for_each_cpu/invalid_cpumask:OK
#84      for_each_cpu:OK
Summary: 1/3 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Yafang Shao
---
 .../selftests/bpf/prog_tests/for_each_cpu.c | 137 +++++++++++++++++++++
 .../selftests/bpf/progs/test_for_each_cpu.c |  63 ++++++++++
 2 files changed, 200 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/for_each_cpu.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_for_each_cpu.c

diff --git a/tools/testing/selftests/bpf/prog_tests/for_each_cpu.c b/tools/testing/selftests/bpf/prog_tests/for_each_cpu.c
new file mode 100644
index 0000000..b0eaaec
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/for_each_cpu.c
@@ -0,0 +1,137 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Yafang Shao */
+
+#include
+#include
+#include "cgroup_helpers.h"
+#include "test_for_each_cpu.skel.h"
+
+static void verify_percpu_psi_value(struct test_for_each_cpu *skel, int fd, __u32 running, int res)
+{
+        DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+        union bpf_iter_link_info linfo;
+        int len, iter_fd, result;
+        struct bpf_link *link;
+        static char buf[128];
+        __u32 nr_running;
+        size_t left;
+        char *p;
+
+        memset(&linfo, 0, sizeof(linfo));
+        linfo.cgroup.cgroup_fd = fd;
+        linfo.cgroup.order = BPF_CGROUP_ITER_SELF_ONLY;
+        opts.link_info = &linfo;
+        opts.link_info_len = sizeof(linfo);
+
+        link = bpf_program__attach_iter(skel->progs.psi_cgroup, &opts);
+        if (!ASSERT_OK_PTR(link, "attach_iter"))
+                return;
+
+        iter_fd = bpf_iter_create(bpf_link__fd(link));
+        if (!ASSERT_GE(iter_fd, 0, "iter_fd"))
+                goto free_link;
+
+        memset(buf, 0, sizeof(buf));
+        left = ARRAY_SIZE(buf);
+        p = buf;
+        while ((len = read(iter_fd, p, left)) > 0) {
+                p += len;
+                left -= len;
+        }
+
+        ASSERT_EQ(sscanf(buf, "nr_running %u ret %d\n", &nr_running, &result), 2, "seq_format");
+        ASSERT_EQ(result, res, "for_each_cpu_result");
+        if (running)
+                ASSERT_GE(nr_running, running, "nr_running");
+        else
+                ASSERT_EQ(nr_running, running, "nr_running");
+
+        /* read() after iter finishes should be ok. */
+        if (len == 0)
+                ASSERT_OK(read(iter_fd, buf, sizeof(buf)), "second_read");
+        close(iter_fd);
+free_link:
+        bpf_link__destroy(link);
+}
+
+void test_root_cgroup(struct test_for_each_cpu *skel)
+{
+        int cgrp_fd, nr_cpus;
+
+        cgrp_fd = get_root_cgroup();
+        if (!ASSERT_GE(cgrp_fd, 0, "create cgrp"))
+                return;
+
+        skel->bss->cpu_mask = CPU_MASK_POSSIBLE;
+        skel->bss->pid = 0;
+        nr_cpus = bpf_num_possible_cpus();
+        /* At least current is running */
+        verify_percpu_psi_value(skel, cgrp_fd, 1, nr_cpus);
+        close(cgrp_fd);
+}
+
+void test_child_cgroup(struct test_for_each_cpu *skel)
+{
+        int cgrp_fd, nr_cpus;
+
+        cgrp_fd = create_and_get_cgroup("for_each_cpu");
+        if (!ASSERT_GE(cgrp_fd, 0, "create cgrp"))
+                return;
+
+        skel->bss->cpu_mask = CPU_MASK_POSSIBLE;
+        skel->bss->pid = 0;
+        nr_cpus = bpf_num_possible_cpus();
+        /* No tasks in the cgroup */
+        verify_percpu_psi_value(skel, cgrp_fd, 0, nr_cpus);
+        close(cgrp_fd);
+        remove_cgroup("for_each_cpu");
+}
+
+void verify_invalid_cpumask(struct test_for_each_cpu *skel, int fd, __u32 cpumask, __u32 pid)
+{
+        DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+
+        skel->bss->cpu_mask = cpumask;
+        skel->bss->pid = pid;
+        verify_percpu_psi_value(skel, fd, 0, -EINVAL);
+}
+
+void test_invalid_cpumask(struct test_for_each_cpu *skel)
+{
+        int cgrp_fd;
+
+        cgrp_fd = create_and_get_cgroup("for_each_cpu");
+        if (!ASSERT_GE(cgrp_fd, 0, "create cgrp"))
+                return;
+
+        verify_invalid_cpumask(skel, cgrp_fd, CPU_MASK_POSSIBLE, 1);
+        verify_invalid_cpumask(skel, cgrp_fd, CPU_MASK_PRESENT, 1);
+        verify_invalid_cpumask(skel, cgrp_fd, CPU_MASK_ONLINE, 1);
+        verify_invalid_cpumask(skel, cgrp_fd, CPU_MASK_TASK, 0);
+        verify_invalid_cpumask(skel, cgrp_fd, -1, 0);
+        verify_invalid_cpumask(skel, cgrp_fd, -1, 1);
+        close(cgrp_fd);
+        remove_cgroup("for_each_cpu");
+}
+
+void test_for_each_cpu(void)
+{
+        struct test_for_each_cpu *skel = NULL;
+
+        skel = test_for_each_cpu__open_and_load();
+        if (!ASSERT_OK_PTR(skel, "test_for_each_cpu__open_and_load"))
+                return;
+
+        if (setup_cgroup_environment())
+                return;
+
+        if (test__start_subtest("psi_system"))
+                test_root_cgroup(skel);
+        if (test__start_subtest("psi_cgroup"))
+                test_child_cgroup(skel);
+        if (test__start_subtest("invalid_cpumask"))
+                test_invalid_cpumask(skel);
+
+        test_for_each_cpu__destroy(skel);
+        cleanup_cgroup_environment();
+}
diff --git a/tools/testing/selftests/bpf/progs/test_for_each_cpu.c b/tools/testing/selftests/bpf/progs/test_for_each_cpu.c
new file mode 100644
index 0000000..1554895
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_for_each_cpu.c
@@ -0,0 +1,63 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2023 Yafang Shao */
+#include "vmlinux.h"
+#include
+#include
+
+#define __percpu __attribute__((btf_type_tag("percpu")))
+
+enum bpf_cpu_mask_type cpu_mask;
+__u32 pid;
+
+struct callback_ctx {
+        __u32 nr_running;
+        __u32 id;
+};
+
+static uint64_t cgroup_id(struct cgroup *cgrp)
+{
+        return cgrp->kn->id;
+}
+
+static int callback(__u32 cpu, void *ctx, const void *ptr)
+{
+        unsigned int tasks[NR_PSI_TASK_COUNTS];
+        const struct psi_group_cpu *groupc = ptr;
+        struct callback_ctx *data = ctx;
+
+        bpf_probe_read_kernel(&tasks, sizeof(tasks), &groupc->tasks);
+        data->nr_running += tasks[NR_RUNNING];
+        return 0;
+}
+
+SEC("iter.s/cgroup")
+int BPF_PROG(psi_cgroup, struct bpf_iter_meta *meta, struct cgroup *cgrp)
+{
+        struct seq_file *seq = (struct seq_file *)meta->seq;
+        struct psi_group_cpu __percpu *pcpu_ptr;
+        struct callback_ctx data;
+        struct psi_group *psi;
+        __u64 cg_id;
+        int ret;
+
+        cg_id = cgrp ? cgroup_id(cgrp) : 0;
+        if (!cg_id)
+                return 1;
+
+        psi = cgrp->psi;
+        if (!psi)
+                return 1;
+
+        pcpu_ptr = psi->pcpu;
+        if (!pcpu_ptr)
+                return 1;
+
+        data.nr_running = 0;
+        data.id = cg_id;
+        ret = bpf_for_each_cpu(callback, &data, pcpu_ptr, cpu_mask, pid);
+        BPF_SEQ_PRINTF(seq, "nr_running %d ret %d\n", data.nr_running, ret);
+
+        return ret ? 1 : 0;
+}
+
+char _license[] SEC("license") = "GPL";