Message ID | 20240507-upstream-bpf-next-20240506-mptcp-subflow-test-v1-3-e2bcbdf49857@kernel.org (mailing list archive) |
---|---|
State | New |
Series | selftests/bpf: new MPTCP subflow subtest & improvements |
On Tue, May 7, 2024 at 3:53 AM Matthieu Baerts (NGI0) <matttbe@kernel.org> wrote:
>
> From: Nicolas Rybowski <nicolas.rybowski@tessares.net>
>
> Move Nicolas's patch into bpf selftests directory. This example added a
> test that was adding a different mark (SO_MARK) on each subflow, and
> changing the TCP CC only on the first subflow.
>
> This example shows how it is possible to:
>
>   - Identify the parent msk of an MPTCP subflow.
>   - Put different sockopt for each subflow of a same MPTCP connection.
>
> Here especially, we implemented two different behaviours:
>
>   - A socket mark (SOL_SOCKET SO_MARK) is put on each subflow of a same
>     MPTCP connection. The order of creation of the current subflow defines
>     its mark.
>   - The TCP CC algorithm of the very first subflow of an MPTCP
>     connection is set to "reno".

why?
What does it test?
That bpf_setsockopt() can actually do it?

But the next patch doesn't check that it's reno.

It looks to me that dropping this "set to reno" part
won't change the purpose of the rest of selftest.

pw-bot: cr
Hi Alexei,

Thank you for the review!

On 07/05/2024 16:49, Alexei Starovoitov wrote:
> On Tue, May 7, 2024 at 3:53 AM Matthieu Baerts (NGI0)
> <matttbe@kernel.org> wrote:
>>
>> From: Nicolas Rybowski <nicolas.rybowski@tessares.net>
>>
>> Move Nicolas's patch into bpf selftests directory. This example added a
>> test that was adding a different mark (SO_MARK) on each subflow, and
>> changing the TCP CC only on the first subflow.
>>
>> This example shows how it is possible to:
>>
>>   - Identify the parent msk of an MPTCP subflow.
>>   - Put different sockopt for each subflow of a same MPTCP connection.
>>
>> Here especially, we implemented two different behaviours:
>>
>>   - A socket mark (SOL_SOCKET SO_MARK) is put on each subflow of a same
>>     MPTCP connection. The order of creation of the current subflow defines
>>     its mark.
>
>>   - The TCP CC algorithm of the very first subflow of an MPTCP
>>     connection is set to "reno".
>
> why?
> What does it test?
> That bpf_setsockopt() can actually do it?

Correct.

Here is a bit of context: from userspace, an application can do a
setsockopt() on an MPTCP socket, and typically the same value will be
set on all subflows (paths). If someone wants to have different values
per subflow, the recommended way is to use BPF.

We can indeed restrict this test to changing the MARK only. I think the
CC has been modified not just to check one more thing, but also to change
something at the TCP level, because it is managed differently on the MPTCP
side -- but only when userspace sets something, or when new subflows
are created. The result of this operation is easy to check with 'ss',
and it was meant to show an example where this is set only on one subflow.

> But the next patch doesn't check that it's reno.

No, I think it is checked: 'reno' is not hardcoded, but 'skel->data->cc'
is used instead:

  run_subflow(skel->data->cc);

> It looks to me that dropping this "set to reno" part
> won't change the purpose of the rest of selftest.

Yes, up to you. If you still think it is better without it, we can
remove the modification of the CC in patch 3/4, and the validation in
patch 4/4.

> pw-bot: cr

Cheers,
Matt
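The validation discussed above boils down to reading the congestion control name back from a socket. A minimal userspace sketch of that round trip follows, assuming a Linux host and a plain TCP socket rather than an MPTCP subflow. `set_cc` and `get_cc` are hypothetical helper names used for illustration only; they are not functions from the selftest or from patch 4/4.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>	/* TCP_CONGESTION */
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: select a congestion control algorithm by name.
 * Returns 0 on success, -1 on error (errno is set by setsockopt()).
 */
static int set_cc(int fd, const char *name)
{
	assert(name != NULL);
	return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name, strlen(name));
}

/* Hypothetical helper: read the algorithm currently attached to the socket.
 * The buffer is zeroed first so the result is always NUL-terminated when
 * it is at least 16 bytes long (TCP_CA_NAME_MAX in the kernel).
 */
static int get_cc(int fd, char *buf, socklen_t len)
{
	assert(buf != NULL && len > 0);
	memset(buf, 0, len);
	return getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, buf, &len);
}
```

A caller would create a socket, call `set_cc(fd, "reno")`, then compare the string returned by `get_cc()` with the expected name. This is only a stand-in for the real check, which has to inspect each subflow separately.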
On Tue, May 7, 2024 at 9:03 AM Matthieu Baerts <matttbe@kernel.org> wrote:
>
> Hi Alexei,
>
> Thank you for the review!
>
> On 07/05/2024 16:49, Alexei Starovoitov wrote:
> > On Tue, May 7, 2024 at 3:53 AM Matthieu Baerts (NGI0)
> > <matttbe@kernel.org> wrote:
> >>
> >> From: Nicolas Rybowski <nicolas.rybowski@tessares.net>
> >>
> >> Move Nicolas's patch into bpf selftests directory. This example added a
> >> test that was adding a different mark (SO_MARK) on each subflow, and
> >> changing the TCP CC only on the first subflow.
> >>
> >> This example shows how it is possible to:
> >>
> >>   - Identify the parent msk of an MPTCP subflow.
> >>   - Put different sockopt for each subflow of a same MPTCP connection.
> >>
> >> Here especially, we implemented two different behaviours:
> >>
> >>   - A socket mark (SOL_SOCKET SO_MARK) is put on each subflow of a same
> >>     MPTCP connection. The order of creation of the current subflow defines
> >>     its mark.
> >
> >>   - The TCP CC algorithm of the very first subflow of an MPTCP
> >>     connection is set to "reno".
> >
> > why?
> > What does it test?
> > That bpf_setsockopt() can actually do it?
>
> Correct.
>
> Here is a bit of context: from userspace, an application can do a
> setsockopt() on an MPTCP socket, and typically the same value will be
> set on all subflows (paths). If someone wants to have different values
> per subflow, the recommended way is to use BPF.
>
> We can indeed restrict this test to changing the MARK only. I think the
> CC has been modified not just to check one more thing, but also to change
> something at the TCP level, because it is managed differently on the MPTCP
> side -- but only when userspace sets something, or when new subflows
> are created. The result of this operation is easy to check with 'ss',
> and it was meant to show an example where this is set only on one subflow.
>
> > But the next patch doesn't check that it's reno.
>
> No, I think it is checked: 'reno' is not hardcoded, but 'skel->data->cc'
> is used instead:
>
>   run_subflow(skel->data->cc);
>
> > It looks to me that dropping this "set to reno" part
> > won't change the purpose of the rest of selftest.
>
> Yes, up to you. If you still think it is better without it, we can
> remove the modification of the CC in patch 3/4, and the validation in
> patch 4/4.

The concern with picking reno is extra deps to CI and every developer.
Currently in selftests/bpf/config we do:
CONFIG_TCP_CONG_DCTCP=y
CONFIG_TCP_CONG_BBR=y

I'd like to avoid adding reno there as well.
Will bpf_setsockopt("dctcp") work?
Hi Alexei,

Thank you for your reply!

On 07/05/2024 22:54, Alexei Starovoitov wrote:
> On Tue, May 7, 2024 at 9:03 AM Matthieu Baerts <matttbe@kernel.org> wrote:
>>
>> Hi Alexei,
>>
>> Thank you for the review!
>>
>> On 07/05/2024 16:49, Alexei Starovoitov wrote:
>>> On Tue, May 7, 2024 at 3:53 AM Matthieu Baerts (NGI0)
>>> <matttbe@kernel.org> wrote:
>>>>
>>>> From: Nicolas Rybowski <nicolas.rybowski@tessares.net>
>>>>
>>>> Move Nicolas's patch into bpf selftests directory. This example added a
>>>> test that was adding a different mark (SO_MARK) on each subflow, and
>>>> changing the TCP CC only on the first subflow.
>>>>
>>>> This example shows how it is possible to:
>>>>
>>>>   - Identify the parent msk of an MPTCP subflow.
>>>>   - Put different sockopt for each subflow of a same MPTCP connection.
>>>>
>>>> Here especially, we implemented two different behaviours:
>>>>
>>>>   - A socket mark (SOL_SOCKET SO_MARK) is put on each subflow of a same
>>>>     MPTCP connection. The order of creation of the current subflow defines
>>>>     its mark.
>>>
>>>>   - The TCP CC algorithm of the very first subflow of an MPTCP
>>>>     connection is set to "reno".
>>>
>>> why?
>>> What does it test?
>>> That bpf_setsockopt() can actually do it?
>>
>> Correct.
>>
>> Here is a bit of context: from userspace, an application can do a
>> setsockopt() on an MPTCP socket, and typically the same value will be
>> set on all subflows (paths). If someone wants to have different values
>> per subflow, the recommended way is to use BPF.
>>
>> We can indeed restrict this test to changing the MARK only. I think the
>> CC has been modified not just to check one more thing, but also to change
>> something at the TCP level, because it is managed differently on the MPTCP
>> side -- but only when userspace sets something, or when new subflows
>> are created. The result of this operation is easy to check with 'ss',
>> and it was meant to show an example where this is set only on one subflow.
>>
>>> But the next patch doesn't check that it's reno.
>>
>> No, I think it is checked: 'reno' is not hardcoded, but 'skel->data->cc'
>> is used instead:
>>
>>   run_subflow(skel->data->cc);
>>
>>> It looks to me that dropping this "set to reno" part
>>> won't change the purpose of the rest of selftest.
>>
>> Yes, up to you. If you still think it is better without it, we can
>> remove the modification of the CC in patch 3/4, and the validation in
>> patch 4/4.
>
> The concern with picking reno is extra deps to CI and every developer.
> Currently in selftests/bpf/config we do:
> CONFIG_TCP_CONG_DCTCP=y
> CONFIG_TCP_CONG_BBR=y
>
> I'd like to avoid adding reno there as well.
> Will bpf_setsockopt("dctcp") work?

We picked Reno because it is built into the kernel and always
available: there is no kernel config to set, no extra deps. Also, it is
usually not used as the default, mostly as a fallback, so the
verification should not be an issue.

We can switch to DCTCP or BBR if you prefer, but I think it is "safer"
with Reno, no?

Cheers,
Matt
On Wed, May 8, 2024 at 12:36 AM Matthieu Baerts <matttbe@kernel.org> wrote:
>
> >
> > The concern with picking reno is extra deps to CI and every developer.
> > Currently in selftests/bpf/config we do:
> > CONFIG_TCP_CONG_DCTCP=y
> > CONFIG_TCP_CONG_BBR=y
> >
> > I'd like to avoid adding reno there as well.
> > Will bpf_setsockopt("dctcp") work?
>
> We picked Reno because this is an inlined kernel module that is always
> built: there is no kernel config to set, no extra deps. Also, it is
> usually not used as default, mostly used as fallback, so the
> verification should not be an issue.

Ahh. didn't realize that it's builtin.
Then sure. keep it as reno.
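The point that Reno needs no extra kernel config can be cross-checked on a running system by looking at the list of registered algorithms. Below is a minimal sketch, assuming the list is exposed as a space-separated line in /proc/sys/net/ipv4/tcp_available_congestion_control. `cc_listed` and `cc_available` are hypothetical helper names and are not part of the selftests.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `name` appears as a whole word in the whitespace-separated
 * `list`, 0 otherwise. A prefix match such as "renox" does not count.
 */
static int cc_listed(const char *list, const char *name)
{
	size_t n;

	assert(list != NULL && name != NULL);
	n = strlen(name);
	while (*list) {
		/* Length of the token starting at `list` (may be 0). */
		size_t len = strcspn(list, " \t\n");

		if (n > 0 && len == n && strncmp(list, name, n) == 0)
			return 1;
		list += len;
		/* Skip the separators before the next token. */
		list += strspn(list, " \t\n");
	}
	return 0;
}

/* Return 1 if `name` is listed in the file at `path`, 0 if it is not,
 * and -1 if the file cannot be opened.
 */
static int cc_available(const char *path, const char *name)
{
	char line[256];
	FILE *f = fopen(path, "r");
	int found;

	if (!f)
		return -1;
	found = fgets(line, sizeof(line), f) ? cc_listed(line, name) : 0;
	fclose(f);
	return found;
}
```

For example, `cc_available("/proc/sys/net/ipv4/tcp_available_congestion_control", "reno")` is one way to ask the question; whether DCTCP or BBR also show up depends on the kernel config and on which modules are loaded.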
diff --git a/tools/testing/selftests/bpf/progs/mptcp_subflow.c b/tools/testing/selftests/bpf/progs/mptcp_subflow.c
new file mode 100644
index 000000000000..de9dbba37133
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/mptcp_subflow.c
@@ -0,0 +1,70 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020, Tessares SA. */
+/* Copyright (c) 2024, Kylin Software */
+
+#include <sys/socket.h> // SOL_SOCKET, SO_MARK, ...
+#include <linux/tcp.h> // TCP_CONGESTION
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_tcp_helpers.h"
+
+char _license[] SEC("license") = "GPL";
+
+#ifndef SOL_TCP
+#define SOL_TCP 6
+#endif
+
+#ifndef TCP_CA_NAME_MAX
+#define TCP_CA_NAME_MAX 16
+#endif
+
+char cc[TCP_CA_NAME_MAX] = "reno";
+
+/* Associate a subflow counter to each token */
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(__u32));
+	__uint(max_entries, 100);
+} mptcp_sf SEC(".maps");
+
+SEC("sockops")
+int mptcp_subflow(struct bpf_sock_ops *skops)
+{
+	__u32 init = 1, key, mark, *cnt;
+	struct mptcp_sock *msk;
+	struct bpf_sock *sk;
+	int err;
+
+	if (skops->op != BPF_SOCK_OPS_TCP_CONNECT_CB)
+		return 1;
+
+	sk = skops->sk;
+	if (!sk)
+		return 1;
+
+	msk = bpf_skc_to_mptcp_sock(sk);
+	if (!msk)
+		return 1;
+
+	key = msk->token;
+	cnt = bpf_map_lookup_elem(&mptcp_sf, &key);
+	if (cnt) {
+		/* A new subflow is added to an existing MPTCP connection */
+		__sync_fetch_and_add(cnt, 1);
+		mark = *cnt;
+	} else {
+		/* A new MPTCP connection is just initiated and this is its primary subflow */
+		bpf_map_update_elem(&mptcp_sf, &key, &init, BPF_ANY);
+		mark = init;
+	}
+
+	/* Set the mark of the subflow's socket based on appearance order */
+	err = bpf_setsockopt(skops, SOL_SOCKET, SO_MARK, &mark, sizeof(mark));
+	if (err < 0)
+		return 1;
+	if (mark == 1)
+		err = bpf_setsockopt(skops, SOL_TCP, TCP_CONGESTION, cc, TCP_CA_NAME_MAX);
+
+	return 1;
+}
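
The per-token counter in the program above determines each subflow's mark from its order of creation. As an illustration only, here is a plain userspace model of that bookkeeping, with a fixed array standing in for the BPF hash map. `sf_table` and `next_mark` are hypothetical names, and unlike the BPF program this model returns 0 when the table is full instead of falling back to mark 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CONNS 100	/* mirrors max_entries of the mptcp_sf map */

/* One slot per MPTCP connection, keyed by the connection token. */
struct sf_entry {
	uint32_t token;
	uint32_t count;
	int used;
};

struct sf_table {
	struct sf_entry e[MAX_CONNS];
};

/* Return the mark for a new subflow of the connection `token`:
 * 1 for the first subflow, 2 for the second, and so on.
 * Returns 0 if the connection is new and no free slot is left.
 * The table must be zero-initialized before first use.
 */
static uint32_t next_mark(struct sf_table *t, uint32_t token)
{
	struct sf_entry *free_slot = NULL;
	size_t i;

	assert(t != NULL);
	for (i = 0; i < MAX_CONNS; i++) {
		if (t->e[i].used && t->e[i].token == token)
			return ++t->e[i].count;	/* additional subflow */
		if (!t->e[i].used && !free_slot)
			free_slot = &t->e[i];	/* remember first free slot */
	}
	if (!free_slot)
		return 0;

	/* First subflow of a new connection. */
	free_slot->used = 1;
	free_slot->token = token;
	free_slot->count = 1;
	return 1;
}
```

In this model only the call that returns 1 corresponds to the subflow whose congestion control would be switched to the value of `cc`.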