From patchwork Thu Sep 16 16:24:46 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 12499633
X-Patchwork-Delegate: bpf@iogearbox.net
From: Roman Gushchin
To: Peter Zijlstra, Ingo Molnar
CC: Mel Gorman, Roman Gushchin
Subject: [PATCH rfc 1/6] bpf: sched: basic infrastructure for scheduler bpf
Date: Thu, 16 Sep 2021 09:24:46 -0700
Message-ID: <20210916162451.709260-2-guro@fb.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20210916162451.709260-1-guro@fb.com>
References: <20210915213550.3696532-1-guro@fb.com> <20210916162451.709260-1-guro@fb.com>
X-Mailing-List: bpf@vger.kernel.org
X-Patchwork-State: RFC

This commit introduces basic definitions and infrastructure for
scheduler bpf programs. It defines the BPF_PROG_TYPE_SCHED program type
and the BPF_SCHED attachment type.

The implementation is inspired by lsm bpf programs and is based on
kretprobes. This makes it possible to add new hooks with minimal changes
to the kernel code and without any changes to libbpf/bpftool. That is
very convenient, as I anticipate a large number of private patches being
used for a long time before reaching upstream (if they reach it at all).

Sched programs are expected to return an int, whose meaning is defined
by the context.

This patch doesn't add any real scheduler hooks (only a stub); they will
be added by the following patches in the series.

Scheduler bpf programs are currently very restricted in what they can
do: only the bpf_printk() helper is available. The scheduler context can
impose significant restrictions on what's safe and what's not, so let's
extend their abilities on a case-by-case basis when a need arises.

Signed-off-by: Roman Gushchin
---
 include/linux/bpf_sched.h       | 28 +++++++++++++++
 include/linux/bpf_types.h       |  3 ++
 include/linux/sched_hook_defs.h |  2 ++
 include/uapi/linux/bpf.h        |  2 ++
 kernel/bpf/btf.c                |  1 +
 kernel/bpf/syscall.c            | 14 ++++++--
 kernel/bpf/trampoline.c         |  1 +
 kernel/bpf/verifier.c           |  9 ++++-
 kernel/sched/Makefile           |  1 +
 kernel/sched/bpf_sched.c        | 62 +++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h  |  2 ++
 11 files changed, 122 insertions(+), 3 deletions(-)
 create mode 100644 include/linux/bpf_sched.h
 create mode 100644 include/linux/sched_hook_defs.h
 create mode 100644 kernel/sched/bpf_sched.c

diff --git a/include/linux/bpf_sched.h b/include/linux/bpf_sched.h
new file mode 100644
index 000000000000..0f8d3dae53df
--- /dev/null
+++ b/include/linux/bpf_sched.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _BPF_SCHED_H
+#define _BPF_SCHED_H
+
+#include
+
+#ifdef CONFIG_BPF_SYSCALL
+
+#define BPF_SCHED_HOOK(RET, DEFAULT, NAME, ...) \
+	RET bpf_sched_##NAME(__VA_ARGS__);
+#include
+#undef BPF_SCHED_HOOK
+
+int bpf_sched_verify_prog(struct bpf_verifier_log *vlog,
+			  const struct bpf_prog *prog);
+
+#else /* CONFIG_BPF_SYSCALL */
+
+#define BPF_SCHED_HOOK(RET, DEFAULT, NAME, ...) \
+static inline RET bpf_sched_##NAME(__VA_ARGS__) \
+{ \
+	return DEFAULT; \
+}
+#undef BPF_SCHED_HOOK
+
+#endif /* CONFIG_BPF_SYSCALL */
+
+#endif /* _BPF_CGROUP_H */
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 9c81724e4b98..ed6aac4368c0 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -79,6 +79,9 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_LSM, lsm,
 #endif
 BPF_PROG_TYPE(BPF_PROG_TYPE_SYSCALL, bpf_syscall,
 	      void *, void *)
 
+BPF_PROG_TYPE(BPF_PROG_TYPE_SCHED, bpf_sched,
+	      void *, void *)
+
 BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_ARRAY, percpu_array_map_ops)
diff --git a/include/linux/sched_hook_defs.h b/include/linux/sched_hook_defs.h
new file mode 100644
index 000000000000..14344004e335
--- /dev/null
+++ b/include/linux/sched_hook_defs.h
@@ -0,0 +1,2 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+BPF_SCHED_HOOK(int, 0, dummy, void)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d21326558d42..6dfbebb8fc8f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -949,6 +949,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_LSM,
 	BPF_PROG_TYPE_SK_LOOKUP,
 	BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
+	BPF_PROG_TYPE_SCHED,
 };
 
 enum bpf_attach_type {
@@ -994,6 +995,7 @@ enum bpf_attach_type {
 	BPF_SK_REUSEPORT_SELECT,
 	BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
 	BPF_PERF_EVENT,
+	BPF_SCHED,
 	__MAX_BPF_ATTACH_TYPE
 };
 
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index c3d605b22473..fb46e447a062 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -4884,6 +4884,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 				return true;
 			t = btf_type_by_id(btf, t->type);
 			break;
+		case BPF_SCHED:
 		case BPF_MODIFY_RETURN:
 			/* For now the BPF_MODIFY_RETURN can only be attached to
 			 * functions that return an int.
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4e50c0bfdb7d..67e062376f22 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2026,6 +2026,7 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type,
 		case BPF_PROG_TYPE_LSM:
 		case BPF_PROG_TYPE_STRUCT_OPS:
 		case BPF_PROG_TYPE_EXT:
+		case BPF_PROG_TYPE_SCHED:
 			break;
 		default:
 			return -EINVAL;
@@ -2149,6 +2150,7 @@ static bool is_perfmon_prog_type(enum bpf_prog_type prog_type)
 	case BPF_PROG_TYPE_LSM:
 	case BPF_PROG_TYPE_STRUCT_OPS: /* has access to struct sock */
 	case BPF_PROG_TYPE_EXT: /* extends any prog */
+	case BPF_PROG_TYPE_SCHED:
 		return true;
 	default:
 		return false;
@@ -2682,6 +2684,12 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
 			goto out_put_prog;
 		}
 		break;
+	case BPF_PROG_TYPE_SCHED:
+		if (prog->expected_attach_type != BPF_SCHED) {
+			err = -EINVAL;
+			goto out_put_prog;
+		}
+		break;
 	default:
 		err = -EINVAL;
 		goto out_put_prog;
@@ -2740,13 +2748,14 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
 	 */
 	if (!prog->aux->dst_trampoline && !tgt_prog) {
 		/*
-		 * Allow re-attach for TRACING and LSM programs. If it's
+		 * Allow re-attach for TRACING, LSM and SCHED programs. If it's
 		 * currently linked, bpf_trampoline_link_prog will fail.
 		 * EXT programs need to specify tgt_prog_fd, so they
 		 * re-attach in separate code path.
 		 */
 		if (prog->type != BPF_PROG_TYPE_TRACING &&
-		    prog->type != BPF_PROG_TYPE_LSM) {
+		    prog->type != BPF_PROG_TYPE_LSM &&
+		    prog->type != BPF_PROG_TYPE_SCHED) {
 			err = -EINVAL;
 			goto out_unlock;
 		}
@@ -2996,6 +3005,7 @@ static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
 	case BPF_PROG_TYPE_TRACING:
 	case BPF_PROG_TYPE_EXT:
 	case BPF_PROG_TYPE_LSM:
+	case BPF_PROG_TYPE_SCHED:
 		if (attr->raw_tracepoint.name) {
 			/* The attach point for this category of programs
 			 * should be specified via btf_id during program load.
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 39eaaff81953..980b878892a4 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -394,6 +394,7 @@ static enum bpf_tramp_prog_type bpf_attach_type_to_tramp(struct bpf_prog *prog)
 	switch (prog->expected_attach_type) {
 	case BPF_TRACE_FENTRY:
 		return BPF_TRAMP_FENTRY;
+	case BPF_SCHED:
 	case BPF_MODIFY_RETURN:
 		return BPF_TRAMP_MODIFY_RETURN;
 	case BPF_TRACE_FEXIT:
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 047ac4b4703b..233445619084 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -22,6 +22,7 @@
 #include
 #include
 #include
+#include
 
 #include "disasm.h"
 
@@ -13477,6 +13478,7 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
 	case BPF_LSM_MAC:
 	case BPF_TRACE_FENTRY:
 	case BPF_TRACE_FEXIT:
+	case BPF_SCHED:
 		if (!btf_type_is_func(t)) {
 			bpf_log(log, "attach_btf_id %u is not a function\n",
 				btf_id);
@@ -13601,7 +13603,8 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 
 	if (prog->type != BPF_PROG_TYPE_TRACING &&
 	    prog->type != BPF_PROG_TYPE_LSM &&
-	    prog->type != BPF_PROG_TYPE_EXT)
+	    prog->type != BPF_PROG_TYPE_EXT &&
+	    prog->type != BPF_PROG_TYPE_SCHED)
 		return 0;
 
 	ret = bpf_check_attach_target(&env->log, prog, tgt_prog, btf_id, &tgt_info);
@@ -13642,6 +13645,10 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 	} else if (prog->type == BPF_PROG_TYPE_TRACING &&
 		   btf_id_set_contains(&btf_id_deny, btf_id)) {
 		return -EINVAL;
+	} else if (prog->type == BPF_PROG_TYPE_SCHED) {
+		ret = bpf_sched_verify_prog(&env->log, prog);
+		if (ret < 0)
+			return ret;
 	}
 
 	key = bpf_trampoline_compute_key(tgt_prog, prog->aux->attach_btf, btf_id);
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 978fcfca5871..efb2cad4651b 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -37,3 +37,4 @@ obj-$(CONFIG_MEMBARRIER) += membarrier.o
 obj-$(CONFIG_CPU_ISOLATION) += isolation.o
 obj-$(CONFIG_PSI) += psi.o
 obj-$(CONFIG_SCHED_CORE) += core_sched.o
+obj-$(CONFIG_BPF_SYSCALL) += bpf_sched.o
diff --git a/kernel/sched/bpf_sched.c b/kernel/sched/bpf_sched.c
new file mode 100644
index 000000000000..2f05c186cfd0
--- /dev/null
+++ b/kernel/sched/bpf_sched.c
@@ -0,0 +1,62 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+#include
+#include
+#include "sched.h"
+
+/*
+ * For every hook declare a nop function where a BPF program can be attached.
+ */
+#define BPF_SCHED_HOOK(RET, DEFAULT, NAME, ...) \
+noinline RET bpf_sched_##NAME(__VA_ARGS__) \
+{ \
+	return DEFAULT; \
+}
+
+#include
+#undef BPF_SCHED_HOOK
+
+#define BPF_SCHED_HOOK(RET, DEFAULT, NAME, ...) BTF_ID(func, bpf_sched_##NAME)
+BTF_SET_START(bpf_sched_hooks)
+#include
+#undef BPF_SCHED_HOOK
+BTF_SET_END(bpf_sched_hooks)
+
+int bpf_sched_verify_prog(struct bpf_verifier_log *vlog,
+			  const struct bpf_prog *prog)
+{
+	if (!prog->gpl_compatible) {
+		bpf_log(vlog,
+			"sched programs must have a GPL compatible license\n");
+		return -EINVAL;
+	}
+
+	if (!btf_id_set_contains(&bpf_sched_hooks, prog->aux->attach_btf_id)) {
+		bpf_log(vlog, "attach_btf_id %u points to wrong type name %s\n",
+			prog->aux->attach_btf_id, prog->aux->attach_func_name);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static const struct bpf_func_proto *
+bpf_sched_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+	switch (func_id) {
+	case BPF_FUNC_trace_printk:
+		return bpf_get_trace_printk_proto();
+	default:
+		return NULL;
+	}
+}
+
+const struct bpf_prog_ops bpf_sched_prog_ops = {
+};
+
+const struct bpf_verifier_ops bpf_sched_verifier_ops = {
+	.get_func_proto = bpf_sched_func_proto,
+	.is_valid_access = btf_ctx_access,
+};
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d21326558d42..6dfbebb8fc8f 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -949,6 +949,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_LSM,
 	BPF_PROG_TYPE_SK_LOOKUP,
 	BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
+	BPF_PROG_TYPE_SCHED,
 };
 
 enum bpf_attach_type {
@@ -994,6 +995,7 @@ enum bpf_attach_type {
 	BPF_SK_REUSEPORT_SELECT,
 	BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
 	BPF_PERF_EVENT,
+	BPF_SCHED,
 	__MAX_BPF_ATTACH_TYPE
 };
 

From patchwork Thu Sep 16 16:24:47 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 12499625
X-Patchwork-Delegate: bpf@iogearbox.net
From: Roman Gushchin
To: Peter Zijlstra, Ingo Molnar
CC: Mel Gorman, Roman Gushchin
Subject: [PATCH rfc 2/6] bpf: sched: add convenient helpers to identify sched entities
Date: Thu, 16 Sep 2021 09:24:47 -0700
Message-ID: <20210916162451.709260-3-guro@fb.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20210916162451.709260-1-guro@fb.com>
References: <20210915213550.3696532-1-guro@fb.com> <20210916162451.709260-1-guro@fb.com>
X-Mailing-List: bpf@vger.kernel.org
X-Patchwork-State: RFC

This patch adds 3 helpers useful for dealing with sched entities:
  u64 bpf_sched_entity_to_tgidpid(struct sched_entity *se);
  u64 bpf_sched_entity_to_cgrpid(struct sched_entity *se);
  long bpf_sched_entity_belongs_to_cgrp(struct sched_entity *se, u64 cgrpid);

A sched entity is the basic structure used by the scheduler to represent
schedulable objects: tasks and cgroups (if CONFIG_FAIR_GROUP_SCHED is
enabled). It will be passed as an argument to many bpf hooks, so
scheduler bpf programs need a convenient way to deal with it.

bpf_sched_entity_to_tgidpid() and bpf_sched_entity_to_cgrpid() are
useful for identifying a sched entity in userspace terms (pid, tgid and
cgroup id). bpf_sched_entity_belongs_to_cgrp() checks whether a sched
entity belongs to the sub-tree of a cgroup. It makes it possible to
write cgroup-specific scheduler policies even without enabling the
cgroup cpu controller.

Signed-off-by: Roman Gushchin
---
 include/uapi/linux/bpf.h       | 23 +++++++++++
 kernel/sched/bpf_sched.c       | 74 ++++++++++++++++++++++++++++++++++
 scripts/bpf_doc.py             |  2 +
 tools/include/uapi/linux/bpf.h | 23 +++++++++++
 4 files changed, 122 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6dfbebb8fc8f..199e4a92820d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4900,6 +4900,26 @@ union bpf_attr {
  *		**-EINVAL** if *flags* is not zero.
  *
  *		**-ENOENT** if architecture does not support branch records.
+ *
+ * u64 bpf_sched_entity_to_tgidpid(struct sched_entity *se)
+ *	Description
+ *		Return task's encoded tgid and pid if the sched entity is a task.
+ *	Return
+ *		Tgid and pid encoded as tgid << 32 \| pid, if *se* is a task. (u64)-1 otherwise.
+ *
+ * u64 bpf_sched_entity_to_cgrpid(struct sched_entity *se)
+ *	Description
+ *		Return cgroup id if the given sched entity is a cgroup.
+ *	Return
+ *		Cgroup id, if *se* is a cgroup. (u64)-1 otherwise.
+ *
+ * long bpf_sched_entity_belongs_to_cgrp(struct sched_entity *se, u64 cgrpid)
+ *	Description
+ *		Checks whether the sched entity belongs to a cgroup or
+ *		its sub-tree. It doesn't require a cgroup CPU controller
+ *		to be enabled.
+ *	Return
+ *		1 if the sched entity belongs to a cgroup, 0 otherwise.
  */
 #define __BPF_FUNC_MAPPER(FN) \
 	FN(unspec), \
@@ -5079,6 +5099,9 @@ union bpf_attr {
 	FN(get_attach_cookie), \
 	FN(task_pt_regs), \
 	FN(get_branch_snapshot), \
+	FN(sched_entity_to_tgidpid), \
+	FN(sched_entity_to_cgrpid), \
+	FN(sched_entity_belongs_to_cgrp), \
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/sched/bpf_sched.c b/kernel/sched/bpf_sched.c
index 2f05c186cfd0..ead691dc6e85 100644
--- a/kernel/sched/bpf_sched.c
+++ b/kernel/sched/bpf_sched.c
@@ -42,12 +42,86 @@ int bpf_sched_verify_prog(struct bpf_verifier_log *vlog,
 	return 0;
 }
 
+BPF_CALL_1(bpf_sched_entity_to_tgidpid, struct sched_entity *, se)
+{
+	if (entity_is_task(se)) {
+		struct task_struct *task = task_of(se);
+
+		return (u64) task->tgid << 32 | task->pid;
+	} else {
+		return (u64) -1;
+	}
+}
+
+BPF_CALL_1(bpf_sched_entity_to_cgrpid, struct sched_entity *, se)
+{
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	if (!entity_is_task(se))
+		return cgroup_id(se->cfs_rq->tg->css.cgroup);
+#endif
+	return (u64) -1;
+}
+
+BPF_CALL_2(bpf_sched_entity_belongs_to_cgrp, struct sched_entity *, se,
+	   u64, cgrpid)
+{
+#ifdef CONFIG_CGROUPS
+	struct cgroup *cgrp;
+	int level;
+
+	if (entity_is_task(se))
+		cgrp = task_dfl_cgroup(task_of(se));
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	else
+		cgrp = se->cfs_rq->tg->css.cgroup;
+#endif
+
+	for (level = cgrp->level; level; level--)
+		if (cgrp->ancestor_ids[level] == cgrpid)
+			return 1;
+#endif
+	return 0;
+}
+
+BTF_ID_LIST_SINGLE(btf_sched_entity_ids, struct, sched_entity)
+
+static const struct bpf_func_proto bpf_sched_entity_to_tgidpid_proto = {
+	.func = bpf_sched_entity_to_tgidpid,
+	.gpl_only = false,
+	.ret_type = RET_INTEGER,
+	.arg1_type = ARG_PTR_TO_BTF_ID,
+	.arg1_btf_id = &btf_sched_entity_ids[0],
+};
+
+static const struct bpf_func_proto bpf_sched_entity_to_cgrpid_proto = {
+	.func = bpf_sched_entity_to_cgrpid,
+	.gpl_only = false,
+	.ret_type = RET_INTEGER,
+	.arg1_type = ARG_PTR_TO_BTF_ID,
+	.arg1_btf_id = &btf_sched_entity_ids[0],
+};
+
+static const struct bpf_func_proto bpf_sched_entity_belongs_to_cgrp_proto = {
+	.func = bpf_sched_entity_belongs_to_cgrp,
+	.gpl_only = false,
+	.ret_type = RET_INTEGER,
+	.arg1_type = ARG_PTR_TO_BTF_ID,
+	.arg1_btf_id = &btf_sched_entity_ids[0],
+	.arg2_type = ARG_ANYTHING,
+};
+
 static const struct bpf_func_proto *
 bpf_sched_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_trace_printk:
 		return bpf_get_trace_printk_proto();
+	case BPF_FUNC_sched_entity_to_tgidpid:
+		return &bpf_sched_entity_to_tgidpid_proto;
+	case BPF_FUNC_sched_entity_to_cgrpid:
+		return &bpf_sched_entity_to_cgrpid_proto;
+	case BPF_FUNC_sched_entity_belongs_to_cgrp:
+		return &bpf_sched_entity_belongs_to_cgrp_proto;
 	default:
 		return NULL;
 	}
diff --git a/scripts/bpf_doc.py b/scripts/bpf_doc.py
index 00ac7b79cddb..84019ba5b67b 100755
--- a/scripts/bpf_doc.py
+++ b/scripts/bpf_doc.py
@@ -548,6 +548,7 @@ class PrinterHelpers(Printer):
             'struct socket',
             'struct file',
             'struct bpf_timer',
+            'struct sched_entity',
     ]
     known_types = {
             '...',
@@ -596,6 +597,7 @@ class PrinterHelpers(Printer):
             'struct socket',
             'struct file',
             'struct bpf_timer',
+            'struct sched_entity',
     }
     mapped_types = {
             'u8': '__u8',
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 6dfbebb8fc8f..199e4a92820d 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4900,6 +4900,26 @@ union bpf_attr {
  *		**-EINVAL** if *flags* is not zero.
  *
  *		**-ENOENT** if architecture does not support branch records.
+ *
+ * u64 bpf_sched_entity_to_tgidpid(struct sched_entity *se)
+ *	Description
+ *		Return task's encoded tgid and pid if the sched entity is a task.
+ *	Return
+ *		Tgid and pid encoded as tgid << 32 \| pid, if *se* is a task. (u64)-1 otherwise.
+ *
+ * u64 bpf_sched_entity_to_cgrpid(struct sched_entity *se)
+ *	Description
+ *		Return cgroup id if the given sched entity is a cgroup.
+ *	Return
+ *		Cgroup id, if *se* is a cgroup. (u64)-1 otherwise.
+ *
+ * long bpf_sched_entity_belongs_to_cgrp(struct sched_entity *se, u64 cgrpid)
+ *	Description
+ *		Checks whether the sched entity belongs to a cgroup or
+ *		its sub-tree. It doesn't require a cgroup CPU controller
+ *		to be enabled.
+ *	Return
+ *		1 if the sched entity belongs to a cgroup, 0 otherwise.
  */
 #define __BPF_FUNC_MAPPER(FN) \
 	FN(unspec), \
@@ -5079,6 +5099,9 @@ union bpf_attr {
 	FN(get_attach_cookie), \
 	FN(task_pt_regs), \
 	FN(get_branch_snapshot), \
+	FN(sched_entity_to_tgidpid), \
+	FN(sched_entity_to_cgrpid), \
+	FN(sched_entity_belongs_to_cgrp), \
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper

From patchwork Thu Sep 16 16:24:48 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Roman Gushchin
X-Patchwork-Id: 12499627
X-Patchwork-Delegate: bpf@iogearbox.net
From: Roman Gushchin
To: Peter Zijlstra, Ingo Molnar
CC: Mel Gorman, Roman Gushchin
Subject: [PATCH rfc 3/6] bpf: sched: introduce bpf_sched_enable()
Date: Thu, 16 Sep 2021 09:24:48 -0700
Message-ID: <20210916162451.709260-4-guro@fb.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20210916162451.709260-1-guro@fb.com>
References: <20210915213550.3696532-1-guro@fb.com> <20210916162451.709260-1-guro@fb.com>
X-Mailing-List: bpf@vger.kernel.org
X-Patchwork-State: RFC

Introduce a dedicated static key and the bpf_sched_enabled() wrapper to
guard all invocations of bpf programs in the scheduler code. This helps
to avoid any potential performance regression when no scheduler bpf
programs are attached.

Signed-off-by: Roman Gushchin
---
 include/linux/bpf_sched.h | 24 ++++++++++++++++++++++++
 kernel/bpf/syscall.c      |  7 +++++++
 kernel/sched/bpf_sched.c  |  2 ++
 3 files changed, 33 insertions(+)

diff --git a/include/linux/bpf_sched.h b/include/linux/bpf_sched.h
index 0f8d3dae53df..6e773aecdff7 100644
--- a/include/linux/bpf_sched.h
+++ b/include/linux/bpf_sched.h
@@ -6,6 +6,8 @@
 
 #ifdef CONFIG_BPF_SYSCALL
 
+#include
+
 #define BPF_SCHED_HOOK(RET, DEFAULT, NAME, ...) \
 	RET bpf_sched_##NAME(__VA_ARGS__);
 #include
@@ -14,6 +16,23 @@
 int bpf_sched_verify_prog(struct bpf_verifier_log *vlog,
 			  const struct bpf_prog *prog);
 
+DECLARE_STATIC_KEY_FALSE(bpf_sched_enabled_key);
+
+static inline bool bpf_sched_enabled(void)
+{
+	return static_branch_unlikely(&bpf_sched_enabled_key);
+}
+
+static inline void bpf_sched_inc(void)
+{
+	static_branch_inc(&bpf_sched_enabled_key);
+}
+
+static inline void bpf_sched_dec(void)
+{
+	static_branch_dec(&bpf_sched_enabled_key);
+}
+
 #else /* CONFIG_BPF_SYSCALL */
 
 #define BPF_SCHED_HOOK(RET, DEFAULT, NAME, ...) \
@@ -23,6 +42,11 @@ static inline RET bpf_sched_##NAME(__VA_ARGS__) \
 }
 #undef BPF_SCHED_HOOK
 
+static inline bool bpf_sched_enabled(void)
+{
+	return false;
+}
+
 #endif /* CONFIG_BPF_SYSCALL */
 
 #endif /* _BPF_CGROUP_H */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 67e062376f22..aa5565110498 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -31,6 +31,7 @@
 #include
 #include
 #include
+#include
 
 #define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
 			  (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
@@ -2602,6 +2603,9 @@ static void bpf_tracing_link_release(struct bpf_link *link)
 	struct bpf_tracing_link *tr_link =
 		container_of(link, struct bpf_tracing_link, link);
 
+	if (link->prog->type == BPF_PROG_TYPE_SCHED)
+		bpf_sched_dec();
+
 	WARN_ON_ONCE(bpf_trampoline_unlink_prog(link->prog,
 						tr_link->trampoline));
 
@@ -2804,6 +2808,9 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
 		goto out_unlock;
 	}
 
+	if (prog->type == BPF_PROG_TYPE_SCHED)
+		bpf_sched_inc();
+
 	link->tgt_prog = tgt_prog;
 	link->trampoline = tr;
 
diff --git a/kernel/sched/bpf_sched.c b/kernel/sched/bpf_sched.c
index ead691dc6e85..bf92cfb5ecf4 100644
--- a/kernel/sched/bpf_sched.c
+++ b/kernel/sched/bpf_sched.c
@@ -6,6 +6,8 @@
 #include
 #include "sched.h"
 
+DEFINE_STATIC_KEY_FALSE(bpf_sched_enabled_key);
+
 /*
  * For every hook declare a nop function where a BPF program can be attached.
*/ From patchwork Thu Sep 16 16:24:49 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roman Gushchin X-Patchwork-Id: 12499629 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.1 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 96A64C433F5 for ; Thu, 16 Sep 2021 16:27:32 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8038461A78 for ; Thu, 16 Sep 2021 16:27:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242113AbhIPQ2v (ORCPT ); Thu, 16 Sep 2021 12:28:51 -0400 Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:11886 "EHLO mx0b-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241243AbhIPQ0y (ORCPT ); Thu, 16 Sep 2021 12:26:54 -0400 Received: from pps.filterd (m0109331.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 18GFgwhA000568 for ; Thu, 16 Sep 2021 09:25:33 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=iau4hHOFchq8z6EJX9Kwz613Q1mY9uyx7wtHal15ZB0=; b=iy5pGpvpSJlSThE8pIndCuEzZrunpurp2puh669JaVbQvyqSSP8rMSjm7W7B5dYEHI5Z tzX4uOj9S2EBUym+HBHQe9eIwoW2bzqFvyc1yigXDfvHeWTTVt+oGf3hZ0WWQwf80cGu 8KybFGsOYAWnB0F/7k5mcPLrC+3wi04XaJs= Received: from mail.thefacebook.com ([163.114.132.120]) 
by mx0a-00082601.pphosted.com with ESMTP id 3b40fb3fes-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Thu, 16 Sep 2021 09:25:33 -0700 Received: from intmgw002.46.prn1.facebook.com (2620:10d:c085:108::4) by mail.thefacebook.com (2620:10d:c085:11d::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2308.14; Thu, 16 Sep 2021 09:25:04 -0700 Received: by devvm3388.prn0.facebook.com (Postfix, from userid 111017) id 3CC76BE68AB2; Thu, 16 Sep 2021 09:25:02 -0700 (PDT) From: Roman Gushchin To: Peter Zijlstra , Ingo Molnar CC: Mel Gorman , , , Roman Gushchin Subject: [PATCH rfc 4/6] sched: cfs: add bpf hooks to control wakeup and tick preemption Date: Thu, 16 Sep 2021 09:24:49 -0700 Message-ID: <20210916162451.709260-5-guro@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210916162451.709260-1-guro@fb.com> References: <20210915213550.3696532-1-guro@fb.com> <20210916162451.709260-1-guro@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-FB-Source: Intern X-Proofpoint-GUID: -XznsGAKI8nZxTxgB_2aureNOrMCkajs X-Proofpoint-ORIG-GUID: -XznsGAKI8nZxTxgB_2aureNOrMCkajs X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.182.1,Aquarius:18.0.790,Hydra:6.0.391,FMLib:17.0.607.475 definitions=2021-09-16_04,2021-09-16_01,2020-04-07_01 X-Proofpoint-Spam-Details: rule=fb_default_notspam policy=fb_default score=0 priorityscore=1501 lowpriorityscore=0 clxscore=1015 adultscore=0 mlxlogscore=999 malwarescore=0 suspectscore=0 impostorscore=0 spamscore=0 phishscore=0 mlxscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2109030001 definitions=main-2109160098 X-FB-Internal: deliver Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC

This patch adds 3 hooks to control wakeup and tick preemption:
  cfs_check_preempt_tick
  cfs_check_preempt_wakeup
  cfs_wakeup_preempt_entity

The first one makes it possible to force or suppress preemption from the tick context. An obvious usage example is to minimize the number of involuntary context switches, and the associated latency penalty, by (conditionally) giving tasks or task groups an extended execution slice. It can be used instead of tweaking sysctl_sched_min_granularity.

The second one is called from the wakeup preemption code and makes it possible to redefine whether a newly woken task should preempt the currently running task. This is useful for minimizing the number of preemptions of latency-sensitive tasks. To some extent it is a more flexible analog of sysctl_sched_wakeup_granularity.

The third one is similar, but it tweaks the wakeup_preempt_entity() function, which is called not only from the wakeup context but also from pick_next_task(), so it can influence the decision about which task runs next.

It is open for discussion whether we need both of these hooks or only one of them: the second is more powerful, but depends more on the current implementation. In any case, bpf hooks are not an ABI, so it's not a deal breaker.

The idea of the wakeup_preempt_entity hook belongs to Rik van Riel. He also contributed a lot to the whole patchset by providing his ideas, recommendations and feedback on earlier (non-public) versions.
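
For illustration, the return-value convention used by the tick hook in the fair.c hunk of this patch (a negative value suppresses preemption, a positive value forces it, zero falls through to the default CFS logic) can be modeled in user space. The following is only a sketch under stated assumptions, not kernel or BPF code: model_check_preempt_tick(), example_hook(), tick_hook_fn and the 3 ms / 20 ms thresholds are hypothetical names and values chosen for the example, and only the first default check (delta_exec > ideal_runtime) is modeled.

```c
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
enum decision { KEEP_RUNNING = 0, RESCHED = 1 };

/*
 * Stand-in for an attached scheduler BPF program.
 * Return value: <0 suppress preemption, >0 force it, 0 no opinion.
 */
typedef int (*tick_hook_fn)(unsigned long delta_exec_ns);

/*
 * Example policy (an assumption made for illustration): never preempt a
 * task from the tick before it has run 3 ms, always preempt after 20 ms.
 */
static int example_hook(unsigned long delta_exec_ns)
{
	if (delta_exec_ns < 3000000UL)
		return -1;
	if (delta_exec_ns > 20000000UL)
		return 1;
	return 0;
}

/*
 * Mirrors the control flow added to check_preempt_tick(): consult the
 * hook first, then fall back to the default comparison. A NULL hook
 * plays the role of bpf_sched_enabled() returning false.
 */
static enum decision model_check_preempt_tick(tick_hook_fn hook,
					      unsigned long delta_exec_ns,
					      unsigned long ideal_runtime_ns)
{
	if (hook) {
		int ret = hook(delta_exec_ns);

		if (ret < 0)
			return KEEP_RUNNING;
		/*
		 * In the patch, ret > 0 calls resched_curr() and then
		 * continues into the default checks; the outcome is
		 * still a reschedule, so the model returns right away.
		 */
		if (ret > 0)
			return RESCHED;
	}

	return delta_exec_ns > ideal_runtime_ns ? RESCHED : KEEP_RUNNING;
}
```

With this model, a task that has run for 1 ms keeps running under example_hook() even if its ideal slice is only 0.5 ms, while the same inputs with no hook attached lead to a reschedule.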
Signed-off-by: Roman Gushchin --- include/linux/bpf_sched.h | 1 + include/linux/sched_hook_defs.h | 4 +++- kernel/sched/fair.c | 27 +++++++++++++++++++++++++++ 3 files changed, 31 insertions(+), 1 deletion(-) diff --git a/include/linux/bpf_sched.h b/include/linux/bpf_sched.h index 6e773aecdff7..5c238aeb853c 100644 --- a/include/linux/bpf_sched.h +++ b/include/linux/bpf_sched.h @@ -40,6 +40,7 @@ static inline RET bpf_sched_##NAME(__VA_ARGS__) \ { \ return DEFAULT; \ } +#include #undef BPF_SCHED_HOOK static inline bool bpf_sched_enabled(void) diff --git a/include/linux/sched_hook_defs.h b/include/linux/sched_hook_defs.h index 14344004e335..f075b32698cd 100644 --- a/include/linux/sched_hook_defs.h +++ b/include/linux/sched_hook_defs.h @@ -1,2 +1,4 @@ /* SPDX-License-Identifier: GPL-2.0 */ -BPF_SCHED_HOOK(int, 0, dummy, void) +BPF_SCHED_HOOK(int, 0, cfs_check_preempt_tick, struct sched_entity *curr, unsigned long delta_exec) +BPF_SCHED_HOOK(int, 0, cfs_check_preempt_wakeup, struct task_struct *curr, struct task_struct *p) +BPF_SCHED_HOOK(int, 0, cfs_wakeup_preempt_entity, struct sched_entity *curr, struct sched_entity *se) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index ff69f245b939..35ea8911b25c 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -21,6 +21,7 @@ * Copyright (C) 2007 Red Hat, Inc., Peter Zijlstra */ #include "sched.h" +#include /* * Targeted preemption latency for CPU-bound tasks: @@ -4447,6 +4448,16 @@ check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr) ideal_runtime = sched_slice(cfs_rq, curr); delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime; + + if (bpf_sched_enabled()) { + int ret = bpf_sched_cfs_check_preempt_tick(curr, delta_exec); + + if (ret < 0) + return; + else if (ret > 0) + resched_curr(rq_of(cfs_rq)); + } + if (delta_exec > ideal_runtime) { resched_curr(rq_of(cfs_rq)); /* @@ -7083,6 +7094,13 @@ wakeup_preempt_entity(struct sched_entity *curr, struct sched_entity *se) { s64 gran, 
vdiff = curr->vruntime - se->vruntime; + if (bpf_sched_enabled()) { + int ret = bpf_sched_cfs_wakeup_preempt_entity(curr, se); + + if (ret) + return ret; + } + if (vdiff <= 0) return -1; @@ -7168,6 +7186,15 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_ likely(!task_has_idle_policy(p))) goto preempt; + if (bpf_sched_enabled()) { + int ret = bpf_sched_cfs_check_preempt_wakeup(current, p); + + if (ret < 0) + return; + else if (ret > 0) + goto preempt; + } + /* * Batch and idle tasks do not preempt non-idle tasks (their preemption * is driven by the tick): From patchwork Thu Sep 16 16:24:50 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roman Gushchin X-Patchwork-Id: 12499635 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.1 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B4972C433EF for ; Thu, 16 Sep 2021 16:27:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9DA5D61A81 for ; Thu, 16 Sep 2021 16:27:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241260AbhIPQ3B (ORCPT ); Thu, 16 Sep 2021 12:29:01 -0400 Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:43128 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241990AbhIPQ0b (ORCPT ); Thu, 16 Sep 2021 12:26:31 -0400 Received: from pps.filterd (m0109333.ppops.net [127.0.0.1]) by 
mx0a-00082601.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 18GFgtmp005023 for ; Thu, 16 Sep 2021 09:25:10 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=FEngnceCHNRk7KyGdfiXouOzgaCwaRfn74nt54cqj9c=; b=R2FkPkU1U0cINYp3qHN+FvlK6++yo6b1tp9TDrhJTBuDM4iUDST3vzKn4+oIdKha00xy V4zBtj+gytyT+z7oyraZaapOi44uONzNwRmnclkTqR3Fz9+oghLcy2v0UlMQFhn3U59N u6UeP3zEEdohUNW2G2osY7g/0ctVAk23oBE= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com with ESMTP id 3b3dkwjsbt-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Thu, 16 Sep 2021 09:25:10 -0700 Received: from intmgw003.48.prn1.facebook.com (2620:10d:c085:208::f) by mail.thefacebook.com (2620:10d:c085:21d::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2308.14; Thu, 16 Sep 2021 09:25:09 -0700 Received: by devvm3388.prn0.facebook.com (Postfix, from userid 111017) id 4305CBE68AB4; Thu, 16 Sep 2021 09:25:02 -0700 (PDT) From: Roman Gushchin To: Peter Zijlstra , Ingo Molnar CC: Mel Gorman , , , Roman Gushchin Subject: [PATCH rfc 5/6] libbpf: add support for scheduler bpf programs Date: Thu, 16 Sep 2021 09:24:50 -0700 Message-ID: <20210916162451.709260-6-guro@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210916162451.709260-1-guro@fb.com> References: <20210915213550.3696532-1-guro@fb.com> <20210916162451.709260-1-guro@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-FB-Source: Intern X-Proofpoint-GUID: PheQcomK4p2WdO4EtqnTKaGPa5wq9CVP X-Proofpoint-ORIG-GUID: PheQcomK4p2WdO4EtqnTKaGPa5wq9CVP X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.182.1,Aquarius:18.0.790,Hydra:6.0.391,FMLib:17.0.607.475 definitions=2021-09-16_04,2021-09-16_01,2020-04-07_01 X-Proofpoint-Spam-Details: rule=fb_default_notspam policy=fb_default score=0 
malwarescore=0 clxscore=1015 suspectscore=0 mlxlogscore=999 impostorscore=0 mlxscore=0 priorityscore=1501 lowpriorityscore=0 bulkscore=0 adultscore=0 phishscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2109030001 definitions=main-2109160098 X-FB-Internal: deliver Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC This patch adds support for loading and attaching scheduler bpf programs. Signed-off-by: Roman Gushchin --- tools/lib/bpf/libbpf.c | 27 +++++++++++++++++++++++++-- tools/lib/bpf/libbpf.h | 4 ++++ tools/lib/bpf/libbpf.map | 3 +++ 3 files changed, 32 insertions(+), 2 deletions(-) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 62a43c408d73..8374a8d4aafe 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -2633,7 +2633,8 @@ static int bpf_object__finalize_btf(struct bpf_object *obj) static bool prog_needs_vmlinux_btf(struct bpf_program *prog) { if (prog->type == BPF_PROG_TYPE_STRUCT_OPS || - prog->type == BPF_PROG_TYPE_LSM) + prog->type == BPF_PROG_TYPE_LSM || + prog->type == BPF_PROG_TYPE_SCHED) return true; /* BPF_PROG_TYPE_TRACING programs which do not attach to other programs @@ -6280,7 +6281,8 @@ int bpf_program__load(struct bpf_program *prog, char *license, __u32 kern_ver) if ((prog->type == BPF_PROG_TYPE_TRACING || prog->type == BPF_PROG_TYPE_LSM || - prog->type == BPF_PROG_TYPE_EXT) && !prog->attach_btf_id) { + prog->type == BPF_PROG_TYPE_EXT || + prog->type == BPF_PROG_TYPE_SCHED) && !prog->attach_btf_id) { int btf_obj_fd = 0, btf_type_id = 0; err = libbpf_find_attach_btf_id(prog, &btf_obj_fd, &btf_type_id); @@ -7892,6 +7894,7 @@ BPF_PROG_TYPE_FNS(tracing, BPF_PROG_TYPE_TRACING); BPF_PROG_TYPE_FNS(struct_ops, BPF_PROG_TYPE_STRUCT_OPS); BPF_PROG_TYPE_FNS(extension, BPF_PROG_TYPE_EXT); BPF_PROG_TYPE_FNS(sk_lookup, BPF_PROG_TYPE_SK_LOOKUP); +BPF_PROG_TYPE_FNS(sched, BPF_PROG_TYPE_SCHED); enum bpf_attach_type
bpf_program__get_expected_attach_type(const struct bpf_program *prog) @@ -7950,6 +7953,7 @@ static struct bpf_link *attach_raw_tp(struct bpf_program *prog); static struct bpf_link *attach_trace(struct bpf_program *prog); static struct bpf_link *attach_lsm(struct bpf_program *prog); static struct bpf_link *attach_iter(struct bpf_program *prog); +static struct bpf_link *attach_sched(struct bpf_program *prog); static const struct bpf_sec_def section_defs[] = { BPF_PROG_SEC("socket", BPF_PROG_TYPE_SOCKET_FILTER), @@ -8022,6 +8026,10 @@ static const struct bpf_sec_def section_defs[] = { .attach_fn = attach_iter), SEC_DEF("syscall", SYSCALL, .is_sleepable = true), + SEC_DEF("sched/", SCHED, + .is_attach_btf = true, + .expected_attach_type = BPF_SCHED, + .attach_fn = attach_sched), BPF_EAPROG_SEC("xdp_devmap/", BPF_PROG_TYPE_XDP, BPF_XDP_DEVMAP), BPF_EAPROG_SEC("xdp_cpumap/", BPF_PROG_TYPE_XDP, @@ -8311,6 +8319,7 @@ static int bpf_object__collect_st_ops_relos(struct bpf_object *obj, #define BTF_TRACE_PREFIX "btf_trace_" #define BTF_LSM_PREFIX "bpf_lsm_" #define BTF_ITER_PREFIX "bpf_iter_" +#define BTF_SCHED_PREFIX "bpf_sched_" #define BTF_MAX_NAME_SIZE 128 void btf_get_kernel_prefix_kind(enum bpf_attach_type attach_type, @@ -8329,6 +8338,10 @@ void btf_get_kernel_prefix_kind(enum bpf_attach_type attach_type, *prefix = BTF_ITER_PREFIX; *kind = BTF_KIND_FUNC; break; + case BPF_SCHED: + *prefix = BTF_SCHED_PREFIX; + *kind = BTF_KIND_FUNC; + break; default: *prefix = ""; *kind = BTF_KIND_FUNC; @@ -9675,6 +9688,11 @@ struct bpf_link *bpf_program__attach_lsm(struct bpf_program *prog) return bpf_program__attach_btf_id(prog); } +struct bpf_link *bpf_program__attach_sched(struct bpf_program *prog) +{ + return bpf_program__attach_btf_id(prog); +} + static struct bpf_link *attach_trace(struct bpf_program *prog) { return bpf_program__attach_trace(prog); @@ -9685,6 +9703,11 @@ static struct bpf_link *attach_lsm(struct bpf_program *prog) return bpf_program__attach_lsm(prog); } +static 
struct bpf_link *attach_sched(struct bpf_program *prog) +{ + return bpf_program__attach_sched(prog); +} + static struct bpf_link * bpf_program__attach_fd(struct bpf_program *prog, int target_fd, int btf_id, const char *target_name) diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h index 2f6f0e15d1e7..42a3dfcca778 100644 --- a/tools/lib/bpf/libbpf.h +++ b/tools/lib/bpf/libbpf.h @@ -339,6 +339,8 @@ bpf_program__attach_xdp(struct bpf_program *prog, int ifindex); LIBBPF_API struct bpf_link * bpf_program__attach_freplace(struct bpf_program *prog, int target_fd, const char *attach_func_name); +LIBBPF_API struct bpf_link * +bpf_program__attach_sched(struct bpf_program *prog); struct bpf_map; @@ -435,6 +437,7 @@ LIBBPF_API int bpf_program__set_tracing(struct bpf_program *prog); LIBBPF_API int bpf_program__set_struct_ops(struct bpf_program *prog); LIBBPF_API int bpf_program__set_extension(struct bpf_program *prog); LIBBPF_API int bpf_program__set_sk_lookup(struct bpf_program *prog); +LIBBPF_API int bpf_program__set_sched(struct bpf_program *prog); LIBBPF_API enum bpf_prog_type bpf_program__get_type(const struct bpf_program *prog); LIBBPF_API void bpf_program__set_type(struct bpf_program *prog, @@ -463,6 +466,7 @@ LIBBPF_API bool bpf_program__is_tracing(const struct bpf_program *prog); LIBBPF_API bool bpf_program__is_struct_ops(const struct bpf_program *prog); LIBBPF_API bool bpf_program__is_extension(const struct bpf_program *prog); LIBBPF_API bool bpf_program__is_sk_lookup(const struct bpf_program *prog); +LIBBPF_API bool bpf_program__is_sched(const struct bpf_program *prog); /* * No need for __attribute__((packed)), all members of 'bpf_map_def' diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map index 9e649cf9e771..02f149aced5a 100644 --- a/tools/lib/bpf/libbpf.map +++ b/tools/lib/bpf/libbpf.map @@ -390,4 +390,7 @@ LIBBPF_0.5.0 { LIBBPF_0.6.0 { global: btf__add_tag; + bpf_program__attach_sched; + bpf_program__is_sched; + bpf_program__set_sched; } 
LIBBPF_0.5.0; From patchwork Thu Sep 16 16:24:51 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roman Gushchin X-Patchwork-Id: 12499631 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.1 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 31CECC433F5 for ; Thu, 16 Sep 2021 16:27:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1B23D61A80 for ; Thu, 16 Sep 2021 16:27:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241786AbhIPQ2z (ORCPT ); Thu, 16 Sep 2021 12:28:55 -0400 Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:55700 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242040AbhIPQ0d (ORCPT ); Thu, 16 Sep 2021 12:26:33 -0400 Received: from pps.filterd (m0109333.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 18GFgs8s005005 for ; Thu, 16 Sep 2021 09:25:12 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=jLNFft97hv/+lzB1+kKBtzjwdHEKLK12ir3d65yRKbs=; b=BdE8vgfuIBiqSt7JKmZakqlS4vbYR7D0qzVC1UTsbGgAFri1TttE1Zqg2mdtAaNRWuAa UrDz10ftOJTe2Br6skpcjBhHRf4J+6rb3H+smx7SgZzqPAXFH/+qTY75EF1sRH++iVFC c5RY1avm01b2awBOJUo6LA6a9DPfPwQVMDo= Received: from mail.thefacebook.com 
([163.114.132.120]) by mx0a-00082601.pphosted.com with ESMTP id 3b3dkwjsb3-7 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Thu, 16 Sep 2021 09:25:12 -0700 Received: from intmgw002.46.prn1.facebook.com (2620:10d:c085:108::4) by mail.thefacebook.com (2620:10d:c085:21d::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2308.14; Thu, 16 Sep 2021 09:25:04 -0700 Received: by devvm3388.prn0.facebook.com (Postfix, from userid 111017) id 495CFBE68AB6; Thu, 16 Sep 2021 09:25:02 -0700 (PDT) From: Roman Gushchin To: Peter Zijlstra , Ingo Molnar CC: Mel Gorman , , , Roman Gushchin Subject: [PATCH rfc 6/6] bpftool: recognize scheduler programs Date: Thu, 16 Sep 2021 09:24:51 -0700 Message-ID: <20210916162451.709260-7-guro@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210916162451.709260-1-guro@fb.com> References: <20210915213550.3696532-1-guro@fb.com> <20210916162451.709260-1-guro@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-FB-Source: Intern X-Proofpoint-GUID: 8HvMUC_xPgX95nOH5jqKUhGlVBP27p6z X-Proofpoint-ORIG-GUID: 8HvMUC_xPgX95nOH5jqKUhGlVBP27p6z X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.182.1,Aquarius:18.0.790,Hydra:6.0.391,FMLib:17.0.607.475 definitions=2021-09-16_04,2021-09-16_01,2020-04-07_01 X-Proofpoint-Spam-Details: rule=fb_default_notspam policy=fb_default score=0 malwarescore=0 clxscore=1015 suspectscore=0 mlxlogscore=595 impostorscore=0 mlxscore=0 priorityscore=1501 lowpriorityscore=0 bulkscore=0 adultscore=0 phishscore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2109030001 definitions=main-2109160098 X-FB-Internal: deliver Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC Teach bpftool to recognize scheduler bpf programs. 
Signed-off-by: Roman Gushchin --- tools/bpf/bpftool/common.c | 1 + tools/bpf/bpftool/prog.c | 1 + 2 files changed, 2 insertions(+) diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c index d42d930a3ec4..c73d634f4e82 100644 --- a/tools/bpf/bpftool/common.c +++ b/tools/bpf/bpftool/common.c @@ -73,6 +73,7 @@ const char * const attach_type_name[__MAX_BPF_ATTACH_TYPE] = { [BPF_XDP] = "xdp", [BPF_SK_REUSEPORT_SELECT] = "sk_skb_reuseport_select", [BPF_SK_REUSEPORT_SELECT_OR_MIGRATE] = "sk_skb_reuseport_select_or_migrate", + [BPF_SCHED] = "sched", }; void p_err(const char *fmt, ...) diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c index 9c3e343b7d87..78eb4e807a6b 100644 --- a/tools/bpf/bpftool/prog.c +++ b/tools/bpf/bpftool/prog.c @@ -67,6 +67,7 @@ const char * const prog_type_name[] = { [BPF_PROG_TYPE_EXT] = "ext", [BPF_PROG_TYPE_LSM] = "lsm", [BPF_PROG_TYPE_SK_LOOKUP] = "sk_lookup", + [BPF_PROG_TYPE_SCHED] = "sched", }; const size_t prog_type_name_size = ARRAY_SIZE(prog_type_name);