From patchwork Thu Sep 29 07:04:03 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12993571 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 27E6DC6FA82 for ; Thu, 29 Sep 2022 07:04:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235083AbiI2HE4 (ORCPT ); Thu, 29 Sep 2022 03:04:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35654 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235042AbiI2HEi (ORCPT ); Thu, 29 Sep 2022 03:04:38 -0400 Received: from out2.migadu.com (out2.migadu.com [IPv6:2001:41d0:2:aacc::]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5D585131991; Thu, 29 Sep 2022 00:04:26 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1664435064; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0UfA0PJAoP2y++0RBxRan7HEgfBgWML+qPMCq4fnM70=; b=mRk14q4vNluBZXNfspg7aaGv2Ssf8VeGa2c+QtWLH0sBTVfnGCTDLsqXFMabz/XOnn7Zm7 ih90pDyo9Vazg8Ley+J0F62obm2JDthX07ULLtJ7Qej1qoyhyhg0cmnk09izNxhgPjZb+L Bz0ZGvAxA7J8HSmMw55BrPDr7flS4BY= From: Martin KaFai Lau To: ' ' , ' ' Cc: 'Alexei Starovoitov ' , 'Andrii Nakryiko ' , 'Daniel Borkmann ' , 'David Miller ' , 'Jakub Kicinski ' , 'Eric Dumazet ' , 'Paolo Abeni ' , ' ' Subject: [PATCH v3 bpf-next 1/5] bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline Date: Thu, 29 Sep 2022 00:04:03 -0700 Message-Id: <20220929070407.965581-2-martin.lau@linux.dev> In-Reply-To: <20220929070407.965581-1-martin.lau@linux.dev> References: <20220929070407.965581-1-martin.lau@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Martin KaFai Lau The struct_ops prog is to allow using bpf to implement the functions in a struct (eg. kernel module). The current usage is to implement the tcp_congestion. The kernel does not call the tcp-cc's ops (ie. the bpf prog) in a recursive way. The struct_ops is sharing the tracing-trampoline's enter/exit function which tracks prog->active to avoid recursion. It is needed for tracing prog. However, it turns out the struct_ops bpf prog will hit this prog->active and unnecessarily skipped running the struct_ops prog. eg. The '.ssthresh' may run in_task() and then interrupted by softirq that runs the same '.ssthresh'. Skip running the '.ssthresh' will end up returning random value to the caller. The patch adds __bpf_prog_{enter,exit}_struct_ops for the struct_ops trampoline. They do not track the prog->active to detect recursion. One exception is when the tcp_congestion's '.init' ops is doing bpf_setsockopt(TCP_CONGESTION) and then recurs to the same '.init' ops. This will be addressed in the following patches. Fixes: ca06f55b9002 ("bpf: Add per-program recursion prevention mechanism") Signed-off-by: Martin KaFai Lau --- arch/x86/net/bpf_jit_comp.c | 3 +++ include/linux/bpf.h | 4 ++++ kernel/bpf/trampoline.c | 23 +++++++++++++++++++++++ 3 files changed, 30 insertions(+) diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index 35796db58116..5b6230779cf3 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -1836,6 +1836,9 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog, if (p->aux->sleepable) { enter = __bpf_prog_enter_sleepable; exit = __bpf_prog_exit_sleepable; + } else if (p->type == BPF_PROG_TYPE_STRUCT_OPS) { + enter = __bpf_prog_enter_struct_ops; + exit = __bpf_prog_exit_struct_ops; } else if (p->expected_attach_type == BPF_LSM_CGROUP) { enter = __bpf_prog_enter_lsm_cgroup; exit = __bpf_prog_exit_lsm_cgroup; diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 5161fac0513f..f98ab36c09d3 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -864,6 +864,10 @@ u64 notrace __bpf_prog_enter_lsm_cgroup(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx); void notrace __bpf_prog_exit_lsm_cgroup(struct bpf_prog *prog, u64 start, struct bpf_tramp_run_ctx *run_ctx); +u64 notrace __bpf_prog_enter_struct_ops(struct bpf_prog *prog, + struct bpf_tramp_run_ctx *run_ctx); +void notrace __bpf_prog_exit_struct_ops(struct bpf_prog *prog, u64 start, + struct bpf_tramp_run_ctx *run_ctx); void notrace __bpf_tramp_enter(struct bpf_tramp_image *tr); void notrace __bpf_tramp_exit(struct bpf_tramp_image *tr); diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c index 6f7b939321d6..bf0906e1e2b9 100644 --- a/kernel/bpf/trampoline.c +++ b/kernel/bpf/trampoline.c @@ -964,6 +964,29 @@ void notrace __bpf_prog_exit_sleepable(struct bpf_prog *prog, u64 start, rcu_read_unlock_trace(); } +u64 notrace __bpf_prog_enter_struct_ops(struct bpf_prog *prog, + struct bpf_tramp_run_ctx *run_ctx) + __acquires(RCU) +{ + rcu_read_lock(); + migrate_disable(); + + run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx); + + return bpf_prog_start_time(); +} + +void notrace __bpf_prog_exit_struct_ops(struct bpf_prog *prog, u64 start, + struct bpf_tramp_run_ctx *run_ctx) + __releases(RCU) +{ + bpf_reset_run_ctx(run_ctx->saved_run_ctx); + + update_prog_stats(prog, start); + migrate_enable(); + rcu_read_unlock(); +} + void notrace __bpf_tramp_enter(struct bpf_tramp_image *tr) { percpu_ref_get(&tr->pcref); From patchwork Thu Sep 29 07:04:04 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12993573 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6F532C07E9D for ; Thu, 29 Sep 2022 07:05:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235120AbiI2HFJ (ORCPT ); Thu, 29 Sep 2022 03:05:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36106 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235072AbiI2HEm (ORCPT ); Thu, 29 Sep 2022 03:04:42 -0400 Received: from out2.migadu.com (out2.migadu.com [IPv6:2001:41d0:2:aacc::]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C128613070A; Thu, 29 Sep 2022 00:04:29 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1664435068; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Uh+ejWrx6vnFNK0wy56yr0aj1vBg58V3Md/AW4RsBQQ=; b=TQWjUUP7upW2dOzPQicUtj7yvX9ICrPTkCgil0Emkcmlz3DAr0nIZy3Ws7WBvzGO2TczBk u0LlCR/+4msrBpT+u+QVCO6mVgQV9VmrhObPHHl+ezqbaysQIBekfJ+bz73Ud3w1jOXkr4 vgVerwWaYCdfCvqzh54R5bKa2MTCys4= From: Martin KaFai Lau To: ' ' , ' ' Cc: 'Alexei Starovoitov ' , 'Andrii Nakryiko ' , 'Daniel Borkmann ' , 'David Miller ' , 'Jakub Kicinski ' , 'Eric Dumazet ' , 'Paolo Abeni ' , ' ' Subject: [PATCH v3 bpf-next 2/5] bpf: Move the "cdg" tcp-cc check to the common sol_tcp_sockopt() Date: Thu, 29 Sep 2022 00:04:04 -0700 Message-Id: <20220929070407.965581-3-martin.lau@linux.dev> In-Reply-To: <20220929070407.965581-1-martin.lau@linux.dev> References: <20220929070407.965581-1-martin.lau@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Martin KaFai Lau The check on the tcp-cc, "cdg", is done in the bpf_sk_setsockopt which is used by the bpf_tcp_ca, bpf_lsm, cg_sockopt, and tcp_iter hooks. However, it is not done for cg sock_ddr, cg sockops, and some of the bpf_lsm_cgroup hooks. The tcp-cc "cdg" should have very limited usage. This patch is to move the "cdg" check to the common sol_tcp_sockopt() so that all hooks have a consistent behavior. The motivation to make this check consistent now is because the latter patch will refactor the bpf_setsockopt(TCP_CONGESTION) into another function, so it is better to take this chance to refactor this piece also. Signed-off-by: Martin KaFai Lau --- net/core/filter.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/net/core/filter.c b/net/core/filter.c index 2fd9449026aa..f4cea3ff994a 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5127,6 +5127,13 @@ static int sol_tcp_sockopt(struct sock *sk, int optname, case TCP_CONGESTION: if (*optlen < 2) return -EINVAL; + /* "cdg" is the only cc that alloc a ptr + * in inet_csk_ca area. The bpf-tcp-cc may + * overwrite this ptr after switching to cdg. + */ + if (!getopt && *optlen >= sizeof("cdg") - 1 && + !strncmp("cdg", optval, *optlen)) + return -ENOTSUPP; break; case TCP_SAVED_SYN: if (*optlen < 1) @@ -5285,12 +5292,6 @@ static int _bpf_getsockopt(struct sock *sk, int level, int optname, BPF_CALL_5(bpf_sk_setsockopt, struct sock *, sk, int, level, int, optname, char *, optval, int, optlen) { - if (level == SOL_TCP && optname == TCP_CONGESTION) { - if (optlen >= sizeof("cdg") - 1 && - !strncmp("cdg", optval, optlen)) - return -ENOTSUPP; - } - return _bpf_setsockopt(sk, level, optname, optval, optlen); } From patchwork Thu Sep 29 07:04:05 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12993572 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 40DCEC07E9D for ; Thu, 29 Sep 2022 07:05:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235085AbiI2HE7 (ORCPT ); Thu, 29 Sep 2022 03:04:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36040 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235063AbiI2HEm (ORCPT ); Thu, 29 Sep 2022 03:04:42 -0400 Received: from out2.migadu.com (out2.migadu.com [188.165.223.204]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 80C261003; Thu, 29 Sep 2022 00:04:33 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1664435071; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=lv9oaVEkuaN/4tfopReITc+74lJ/gN9iWS/b7MxMp88=; b=qa09C+/dyrU+b9Bc9OYMG1QSvp806rPIhhOT8aP145G+JozG1cTTazgRSEeZAKki0/68wX BoFJPeo4dn4EaplAWHrwmJUrb0/edIu4c4x2r6l38fWObfCNaumofCVxGIZ/QrVXOtOA3z 4f9ybsg+n5icV+eXw+s7VFSY+AHXgc8= From: Martin KaFai Lau To: ' ' , ' ' Cc: 'Alexei Starovoitov ' , 'Andrii Nakryiko ' , 'Daniel Borkmann ' , 'David Miller ' , 'Jakub Kicinski ' , 'Eric Dumazet ' , 'Paolo Abeni ' , ' ' Subject: [PATCH v3 bpf-next 3/5] bpf: Refactor bpf_setsockopt(TCP_CONGESTION) handling into another function Date: Thu, 29 Sep 2022 00:04:05 -0700 Message-Id: <20220929070407.965581-4-martin.lau@linux.dev> In-Reply-To: <20220929070407.965581-1-martin.lau@linux.dev> References: <20220929070407.965581-1-martin.lau@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Martin KaFai Lau This patch moves the bpf_setsockopt(TCP_CONGESTION) logic into another function. The next patch will add extra logic to avoid recursion and this will make the latter patch easier to follow. Signed-off-by: Martin KaFai Lau --- net/core/filter.c | 45 ++++++++++++++++++++++++++++----------------- 1 file changed, 28 insertions(+), 17 deletions(-) diff --git a/net/core/filter.c b/net/core/filter.c index f4cea3ff994a..96f2f7a65e65 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5102,6 +5102,33 @@ static int bpf_sol_tcp_setsockopt(struct sock *sk, int optname, return 0; } +static int sol_tcp_sockopt_congestion(struct sock *sk, char *optval, + int *optlen, bool getopt) +{ + if (*optlen < 2) + return -EINVAL; + + if (getopt) { + if (!inet_csk(sk)->icsk_ca_ops) + return -EINVAL; + /* BPF expects NULL-terminated tcp-cc string */ + optval[--(*optlen)] = '\0'; + return do_tcp_getsockopt(sk, SOL_TCP, TCP_CONGESTION, + KERNEL_SOCKPTR(optval), + KERNEL_SOCKPTR(optlen)); + } + + /* "cdg" is the only cc that alloc a ptr + * in inet_csk_ca area. The bpf-tcp-cc may + * overwrite this ptr after switching to cdg. + */ + if (*optlen >= sizeof("cdg") - 1 && !strncmp("cdg", optval, *optlen)) + return -ENOTSUPP; + + return do_tcp_setsockopt(sk, SOL_TCP, TCP_CONGESTION, + KERNEL_SOCKPTR(optval), *optlen); +} + static int sol_tcp_sockopt(struct sock *sk, int optname, char *optval, int *optlen, bool getopt) @@ -5125,16 +5152,7 @@ static int sol_tcp_sockopt(struct sock *sk, int optname, return -EINVAL; break; case TCP_CONGESTION: - if (*optlen < 2) - return -EINVAL; - /* "cdg" is the only cc that alloc a ptr - * in inet_csk_ca area. The bpf-tcp-cc may - * overwrite this ptr after switching to cdg. - */ - if (!getopt && *optlen >= sizeof("cdg") - 1 && - !strncmp("cdg", optval, *optlen)) - return -ENOTSUPP; - break; + return sol_tcp_sockopt_congestion(sk, optval, optlen, getopt); case TCP_SAVED_SYN: if (*optlen < 1) return -EINVAL; @@ -5159,13 +5177,6 @@ static int sol_tcp_sockopt(struct sock *sk, int optname, return 0; } - if (optname == TCP_CONGESTION) { - if (!inet_csk(sk)->icsk_ca_ops) - return -EINVAL; - /* BPF expects NULL-terminated tcp-cc string */ - optval[--(*optlen)] = '\0'; - } - return do_tcp_getsockopt(sk, SOL_TCP, optname, KERNEL_SOCKPTR(optval), KERNEL_SOCKPTR(optlen)); From patchwork Thu Sep 29 07:04:06 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12993574 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A4594C6FA90 for ; Thu, 29 Sep 2022 07:05:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234820AbiI2HFK (ORCPT ); Thu, 29 Sep 2022 03:05:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58382 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235075AbiI2HEn (ORCPT ); Thu, 29 Sep 2022 03:04:43 -0400 Received: from out2.migadu.com (out2.migadu.com [188.165.223.204]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ABABF5F8C; Thu, 29 Sep 2022 00:04:35 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1664435074; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mrtWExq0wBIpyOPOl96p4X+vkDCpxYHVu/CvZF0R/5Y=; b=uEamq2QYjtyqsfp4pDJigvVn1gC3erCNRlj4CjW3pNjI9vZb5z1o3tUeygE6FcMTKRoYhL nWMjh3PONjzLnHBBJrZ7jGBKC6BLEnG45CNuiFpiJDvSNj87bbo6pTkTGrcCoEsUrizPyb 2hCX2RZanCnFocQTtgQaSdojjhpzvTk= From: Martin KaFai Lau To: ' ' , ' ' Cc: 'Alexei Starovoitov ' , 'Andrii Nakryiko ' , 'Daniel Borkmann ' , 'David Miller ' , 'Jakub Kicinski ' , 'Eric Dumazet ' , 'Paolo Abeni ' , ' ' Subject: [PATCH v3 bpf-next 4/5] bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself Date: Thu, 29 Sep 2022 00:04:06 -0700 Message-Id: <20220929070407.965581-5-martin.lau@linux.dev> In-Reply-To: <20220929070407.965581-1-martin.lau@linux.dev> References: <20220929070407.965581-1-martin.lau@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Martin KaFai Lau When a bad bpf prog '.init' calls bpf_setsockopt(TCP_CONGESTION, "itself"), it will trigger this loop: .init => bpf_setsockopt(tcp_cc) => .init => bpf_setsockopt(tcp_cc) ... ... => .init => bpf_setsockopt(tcp_cc). It was prevented by the prog->active counter before but the prog->active detection cannot be used in struct_ops as explained in the earlier patch of the set. In this patch, the second bpf_setsockopt(tcp_cc) is not allowed in order to break the loop. This is done by using a bit of an existing 1 byte hole in tcp_sock to check if there is on-going bpf_setsockopt(TCP_CONGESTION) in this tcp_sock. Note that this essentially limits only the first '.init' can call bpf_setsockopt(TCP_CONGESTION) to pick a fallback cc (eg. peer does not support ECN) and the second '.init' cannot fallback to another cc. This applies even the second bpf_setsockopt(TCP_CONGESTION) will not cause a loop. Signed-off-by: Martin KaFai Lau Reviewed-by: Eric Dumazet --- include/linux/tcp.h | 6 ++++++ net/core/filter.c | 28 +++++++++++++++++++++++++++- net/ipv4/tcp_minisocks.c | 1 + 3 files changed, 34 insertions(+), 1 deletion(-) diff --git a/include/linux/tcp.h b/include/linux/tcp.h index a9fbe22732c3..3bdf687e2fb3 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -388,6 +388,12 @@ struct tcp_sock { u8 bpf_sock_ops_cb_flags; /* Control calling BPF programs * values defined in uapi/linux/tcp.h */ + u8 bpf_chg_cc_inprogress:1; /* In the middle of + * bpf_setsockopt(TCP_CONGESTION), + * it is to avoid the bpf_tcp_cc->init() + * to recur itself by calling + * bpf_setsockopt(TCP_CONGESTION, "itself"). + */ #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) (TP->bpf_sock_ops_cb_flags & ARG) #else #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) 0 diff --git a/net/core/filter.c b/net/core/filter.c index 96f2f7a65e65..ac4c45c02da5 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5105,6 +5105,9 @@ static int bpf_sol_tcp_setsockopt(struct sock *sk, int optname, static int sol_tcp_sockopt_congestion(struct sock *sk, char *optval, int *optlen, bool getopt) { + struct tcp_sock *tp; + int ret; + if (*optlen < 2) return -EINVAL; @@ -5125,8 +5128,31 @@ static int sol_tcp_sockopt_congestion(struct sock *sk, char *optval, if (*optlen >= sizeof("cdg") - 1 && !strncmp("cdg", optval, *optlen)) return -ENOTSUPP; - return do_tcp_setsockopt(sk, SOL_TCP, TCP_CONGESTION, + /* It stops this looping + * + * .init => bpf_setsockopt(tcp_cc) => .init => + * bpf_setsockopt(tcp_cc)" => .init => .... + * + * The second bpf_setsockopt(tcp_cc) is not allowed + * in order to break the loop when both .init + * are the same bpf prog. + * + * This applies even the second bpf_setsockopt(tcp_cc) + * does not cause a loop. This limits only the first + * '.init' can call bpf_setsockopt(TCP_CONGESTION) to + * pick a fallback cc (eg. peer does not support ECN) + * and the second '.init' cannot fallback to + * another. + */ + tp = tcp_sk(sk); + if (tp->bpf_chg_cc_inprogress) + return -EBUSY; + + tp->bpf_chg_cc_inprogress = 1; + ret = do_tcp_setsockopt(sk, SOL_TCP, TCP_CONGESTION, KERNEL_SOCKPTR(optval), *optlen); + tp->bpf_chg_cc_inprogress = 0; + return ret; } static int sol_tcp_sockopt(struct sock *sk, int optname, diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index cb95d88497ae..ddcdc2bc4c04 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -541,6 +541,7 @@ struct sock *tcp_create_openreq_child(const struct sock *sk, newtp->fastopen_req = NULL; RCU_INIT_POINTER(newtp->fastopen_rsk, NULL); + newtp->bpf_chg_cc_inprogress = 0; tcp_bpf_clone(sk, newsk); __TCP_INC_STATS(sock_net(sk), TCP_MIB_PASSIVEOPENS); From patchwork Thu Sep 29 07:04:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Martin KaFai Lau X-Patchwork-Id: 12993575 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EEF23C04A95 for ; Thu, 29 Sep 2022 07:05:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234542AbiI2HFK (ORCPT ); Thu, 29 Sep 2022 03:05:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34750 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235025AbiI2HEq (ORCPT ); Thu, 29 Sep 2022 03:04:46 -0400 Received: from out2.migadu.com (out2.migadu.com [188.165.223.204]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C2289EE16; Thu, 29 Sep 2022 00:04:39 -0700 (PDT) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1664435077; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=VvIKaWyrD++47cob00FLRKjCXfixbyYJJB5mAod9z80=; b=kvMJ5uJ+vqDnrPMsi6q9nrtcFEEMXbNUE8hOsYzvwoksSgl9RC/087RhIyhbtIb7GPqeCY +a+RQsVAKWdoMB9V0tmvbRXMt2w/P+nMtI6QumgALGxH+FOW1626DNjkr2RNzAQ00UP5h3 weTATx8C7OWYYYVEi64kTpNcs1gl9b4= From: Martin KaFai Lau To: ' ' , ' ' Cc: 'Alexei Starovoitov ' , 'Andrii Nakryiko ' , 'Daniel Borkmann ' , 'David Miller ' , 'Jakub Kicinski ' , 'Eric Dumazet ' , 'Paolo Abeni ' , ' ' Subject: [PATCH v3 bpf-next 5/5] selftests/bpf: Check -EBUSY for the recurred bpf_setsockopt(TCP_CONGESTION) Date: Thu, 29 Sep 2022 00:04:07 -0700 Message-Id: <20220929070407.965581-6-martin.lau@linux.dev> In-Reply-To: <20220929070407.965581-1-martin.lau@linux.dev> References: <20220929070407.965581-1-martin.lau@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Martin KaFai Lau This patch changes the bpf_dctcp test to ensure the recurred bpf_setsockopt(TCP_CONGESTION) returns -EBUSY. Signed-off-by: Martin KaFai Lau --- .../selftests/bpf/prog_tests/bpf_tcp_ca.c | 4 +++ tools/testing/selftests/bpf/progs/bpf_dctcp.c | 25 +++++++++++++------ 2 files changed, 21 insertions(+), 8 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c index 2959a52ced06..e980188d4124 100644 --- a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c +++ b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c @@ -290,6 +290,10 @@ static void test_dctcp_fallback(void) goto done; ASSERT_STREQ(dctcp_skel->bss->cc_res, "cubic", "cc_res"); ASSERT_EQ(dctcp_skel->bss->tcp_cdg_res, -ENOTSUPP, "tcp_cdg_res"); + /* All setsockopt(TCP_CONGESTION) in the recurred + * bpf_dctcp->init() should fail with -EBUSY. + */ + ASSERT_EQ(dctcp_skel->bss->ebusy_cnt, 3, "ebusy_cnt"); err = getsockopt(srv_fd, SOL_TCP, TCP_CONGESTION, srv_cc, &cc_len); if (!ASSERT_OK(err, "getsockopt(srv_fd, TCP_CONGESTION)")) diff --git a/tools/testing/selftests/bpf/progs/bpf_dctcp.c b/tools/testing/selftests/bpf/progs/bpf_dctcp.c index 9573be6122be..460682759aed 100644 --- a/tools/testing/selftests/bpf/progs/bpf_dctcp.c +++ b/tools/testing/selftests/bpf/progs/bpf_dctcp.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include "bpf_tcp_helpers.h" @@ -23,6 +24,7 @@ const char tcp_cdg[] = "cdg"; char cc_res[TCP_CA_NAME_MAX]; int tcp_cdg_res = 0; int stg_result = 0; +int ebusy_cnt = 0; struct { __uint(type, BPF_MAP_TYPE_SK_STORAGE); @@ -64,16 +66,23 @@ void BPF_PROG(dctcp_init, struct sock *sk) if (!(tp->ecn_flags & TCP_ECN_OK) && fallback[0]) { /* Switch to fallback */ - bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION, - (void *)fallback, sizeof(fallback)); - /* Switch back to myself which the bpf trampoline - * stopped calling dctcp_init recursively. + if (bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION, + (void *)fallback, sizeof(fallback)) == -EBUSY) + ebusy_cnt++; + + /* Switch back to myself and the recurred dctcp_init() + * will get -EBUSY for all bpf_setsockopt(TCP_CONGESTION), + * except the last "cdg" one. */ - bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION, - (void *)bpf_dctcp, sizeof(bpf_dctcp)); + if (bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION, + (void *)bpf_dctcp, sizeof(bpf_dctcp)) == -EBUSY) + ebusy_cnt++; + /* Switch back to fallback */ - bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION, - (void *)fallback, sizeof(fallback)); + if (bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION, + (void *)fallback, sizeof(fallback)) == -EBUSY) + ebusy_cnt++; + /* Expecting -ENOTSUPP for tcp_cdg_res */ tcp_cdg_res = bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION, (void *)tcp_cdg, sizeof(tcp_cdg));