
[bpf-next,3/4] selftests/bpf: Add mptcp subflow example

Message ID 20240507-upstream-bpf-next-20240506-mptcp-subflow-test-v1-3-e2bcbdf49857@kernel.org (mailing list archive)
State New
Series selftests/bpf: new MPTCP subflow subtest & improvements

Commit Message

Matthieu Baerts (NGI0) May 7, 2024, 10:53 a.m. UTC
From: Nicolas Rybowski <nicolas.rybowski@tessares.net>

Move Nicolas's patch into the bpf selftests directory. This example adds
a test that sets a different mark (SO_MARK) on each subflow and changes
the TCP CC only on the first subflow.

This example shows how it is possible to:

 - Identify the parent msk of an MPTCP subflow.
 - Set different socket options on each subflow of the same MPTCP
   connection.

Here, in particular, two different behaviours are implemented:

 - A socket mark (SOL_SOCKET SO_MARK) is put on each subflow of the same
   MPTCP connection. The order of creation of the current subflow defines
   its mark.

 - The TCP CC algorithm of the very first subflow of an MPTCP
   connection is set to "reno".

The code comes from

    commit 4d120186e4d6 ("bpf:examples: update mptcp_set_mark_kern.c")

in the MPTCP repo https://github.com/multipath-tcp/mptcp_net-next (the
"scripts" branch).

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/76
Co-developed-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Nicolas Rybowski <nicolas.rybowski@tessares.net>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 tools/testing/selftests/bpf/progs/mptcp_subflow.c | 70 +++++++++++++++++++++++
 1 file changed, 70 insertions(+)

Comments

Alexei Starovoitov May 7, 2024, 2:49 p.m. UTC | #1
On Tue, May 7, 2024 at 3:53 AM Matthieu Baerts (NGI0)
<matttbe@kernel.org> wrote:
>
> From: Nicolas Rybowski <nicolas.rybowski@tessares.net>
>
> Move Nicolas's patch into bpf selftests directory. This example added a
> test that was adding a different mark (SO_MARK) on each subflow, and
> changing the TCP CC only on the first subflow.
>
> This example shows how it is possible to:
>
>     Identify the parent msk of an MPTCP subflow.
>     Put different sockopt for each subflow of a same MPTCP connection.
>
> Here especially, we implemented two different behaviours:
>
>     A socket mark (SOL_SOCKET SO_MARK) is put on each subflow of a same
>     MPTCP connection. The order of creation of the current subflow defines
>     its mark.

> The TCP CC algorithm of the very first subflow of an MPTCP
>     connection is set to "reno".

why?
What does it test?
That bpf_setsockopt() can actually do it?
But the next patch doesn't check that it's reno.

It looks to me that dropping this "set to reno" part
won't change the purpose of the rest of selftest.

pw-bot: cr
Matthieu Baerts (NGI0) May 7, 2024, 4:03 p.m. UTC | #2
Hi Alexei,

Thank you for the review!

On 07/05/2024 16:49, Alexei Starovoitov wrote:
> On Tue, May 7, 2024 at 3:53 AM Matthieu Baerts (NGI0)
> <matttbe@kernel.org> wrote:
>>
>> From: Nicolas Rybowski <nicolas.rybowski@tessares.net>
>>
>> Move Nicolas's patch into bpf selftests directory. This example added a
>> test that was adding a different mark (SO_MARK) on each subflow, and
>> changing the TCP CC only on the first subflow.
>>
>> This example shows how it is possible to:
>>
>>     Identify the parent msk of an MPTCP subflow.
>>     Put different sockopt for each subflow of a same MPTCP connection.
>>
>> Here especially, we implemented two different behaviours:
>>
>>     A socket mark (SOL_SOCKET SO_MARK) is put on each subflow of a same
>>     MPTCP connection. The order of creation of the current subflow defines
>>     its mark.
> 
>> The TCP CC algorithm of the very first subflow of an MPTCP
>>     connection is set to "reno".
> 
> why?
> What does it test?
> That bpf_setsockopt() can actually do it?

Correct.

Here is a bit of context: from userspace, an application can do a
setsockopt() on an MPTCP socket, and typically the same value will be
set on all subflows (paths). If someone wants to have different values
per subflow, the recommended way is to use BPF.

We can indeed restrict this test to changing the MARK only. I think the
CC has been modified not just to check one thing, but also to change
something at the TCP level, because it is managed differently on the
MPTCP side -- but only when userspace sets something, or when new
subflows are created. The result of this operation is easy to check with
'ss', and it was meant to show an example where this is set only on one
subflow.

> But the next patch doesn't check that it's reno.

No, I think it is checked: 'reno' is not hardcoded, but 'skel->data->cc'
is used instead:

  run_subflow(skel->data->cc);

> It looks to me that dropping this "set to reno" part
> won't change the purpose of the rest of selftest.

Yes, up to you. If you still think it is better without it, we can
remove the modification of the CC in patch 3/4, and the validation in
patch 4/4.

> pw-bot: cr

Cheers,
Matt
Alexei Starovoitov May 7, 2024, 8:54 p.m. UTC | #3
On Tue, May 7, 2024 at 9:03 AM Matthieu Baerts <matttbe@kernel.org> wrote:
>
> Hi Alexei,
>
> Thank you for the review!
>
> On 07/05/2024 16:49, Alexei Starovoitov wrote:
> > On Tue, May 7, 2024 at 3:53 AM Matthieu Baerts (NGI0)
> > <matttbe@kernel.org> wrote:
> >>
> >> From: Nicolas Rybowski <nicolas.rybowski@tessares.net>
> >>
> >> Move Nicolas's patch into bpf selftests directory. This example added a
> >> test that was adding a different mark (SO_MARK) on each subflow, and
> >> changing the TCP CC only on the first subflow.
> >>
> >> This example shows how it is possible to:
> >>
> >>     Identify the parent msk of an MPTCP subflow.
> >>     Put different sockopt for each subflow of a same MPTCP connection.
> >>
> >> Here especially, we implemented two different behaviours:
> >>
> >>     A socket mark (SOL_SOCKET SO_MARK) is put on each subflow of a same
> >>     MPTCP connection. The order of creation of the current subflow defines
> >>     its mark.
> >
> >> The TCP CC algorithm of the very first subflow of an MPTCP
> >>     connection is set to "reno".
> >
> > why?
> > What does it test?
> > That bpf_setsockopt() can actually do it?
>
> Correct.
>
> Here is a bit of context: from userspace, an application can do a
> setsockopt() on an MPTCP socket, and typically the same value will be
> set on all subflows (paths). If someone wants to have different values
> per subflow, the recommended way is to use BPF.
>
> We can indeed restrict this test to changing the MARK only. I think the
> CC has been modified not just to check one thing, but also to change
> something at the TCP level, because it is managed differently on the
> MPTCP side -- but only when userspace sets something, or when new
> subflows are created. The result of this operation is easy to check with
> 'ss', and it was meant to show an example where this is set only on one
> subflow.
>
> > But the next patch doesn't check that it's reno.
>
> No, I think it is checked: 'reno' is not hardcoded, but 'skel->data->cc'
> is used instead:
>
>   run_subflow(skel->data->cc);
>
> > It looks to me that dropping this "set to reno" part
> > won't change the purpose of the rest of selftest.
>
> Yes, up to you. If you still think it is better without it, we can
> remove the modification of the CC in patch 3/4, and the validation in
> patch 4/4.

The concern with picking reno is extra deps to CI and every developer.
Currently in selftests/bpf/config we do:
CONFIG_TCP_CONG_DCTCP=y
CONFIG_TCP_CONG_BBR=y

I'd like to avoid adding reno there as well.
Will bpf_setsockopt("dctcp") work?
Matthieu Baerts (NGI0) May 8, 2024, 7:36 a.m. UTC | #4
Hi Alexei,

Thank you for your reply!

On 07/05/2024 22:54, Alexei Starovoitov wrote:
> On Tue, May 7, 2024 at 9:03 AM Matthieu Baerts <matttbe@kernel.org> wrote:
>>
>> Hi Alexei,
>>
>> Thank you for the review!
>>
>> On 07/05/2024 16:49, Alexei Starovoitov wrote:
>>> On Tue, May 7, 2024 at 3:53 AM Matthieu Baerts (NGI0)
>>> <matttbe@kernel.org> wrote:
>>>>
>>>> From: Nicolas Rybowski <nicolas.rybowski@tessares.net>
>>>>
>>>> Move Nicolas's patch into bpf selftests directory. This example added a
>>>> test that was adding a different mark (SO_MARK) on each subflow, and
>>>> changing the TCP CC only on the first subflow.
>>>>
>>>> This example shows how it is possible to:
>>>>
>>>>     Identify the parent msk of an MPTCP subflow.
>>>>     Put different sockopt for each subflow of a same MPTCP connection.
>>>>
>>>> Here especially, we implemented two different behaviours:
>>>>
>>>>     A socket mark (SOL_SOCKET SO_MARK) is put on each subflow of a same
>>>>     MPTCP connection. The order of creation of the current subflow defines
>>>>     its mark.
>>>
>>>> The TCP CC algorithm of the very first subflow of an MPTCP
>>>>     connection is set to "reno".
>>>
>>> why?
>>> What does it test?
>>> That bpf_setsockopt() can actually do it?
>>
>> Correct.
>>
>> Here is a bit of context: from userspace, an application can do a
>> setsockopt() on an MPTCP socket, and typically the same value will be
>> set on all subflows (paths). If someone wants to have different values
>> per subflow, the recommended way is to use BPF.
>>
>> We can indeed restrict this test to changing the MARK only. I think the
>> CC has been modified not just to check one thing, but also to change
>> something at the TCP level, because it is managed differently on the
>> MPTCP side -- but only when userspace sets something, or when new
>> subflows are created. The result of this operation is easy to check with
>> 'ss', and it was meant to show an example where this is set only on one
>> subflow.
>>
>>> But the next patch doesn't check that it's reno.
>>
>> No, I think it is checked: 'reno' is not hardcoded, but 'skel->data->cc'
>> is used instead:
>>
>>   run_subflow(skel->data->cc);
>>
>>> It looks to me that dropping this "set to reno" part
>>> won't change the purpose of the rest of selftest.
>>
>> Yes, up to you. If you still think it is better without it, we can
>> remove the modification of the CC in patch 3/4, and the validation in
>> patch 4/4.
> 
> The concern with picking reno is extra deps to CI and every developer.
> Currently in selftests/bpf/config we do:
> CONFIG_TCP_CONG_DCTCP=y
> CONFIG_TCP_CONG_BBR=y
> 
> I'd like to avoid adding reno there as well.
> Will bpf_setsockopt("dctcp") work?

We picked Reno because it is built into the kernel and always available:
there is no kernel config to set and no extra deps. Also, it is usually
not used as the default, mostly as a fallback, so the verification
should not be an issue.

We can switch to DCTCP or BBR if you prefer, but I think it is "safer"
with Reno, no?

Cheers,
Matt
Alexei Starovoitov May 8, 2024, 2:32 p.m. UTC | #5
On Wed, May 8, 2024 at 12:36 AM Matthieu Baerts <matttbe@kernel.org> wrote:
>
> >
> > The concern with picking reno is extra deps to CI and every developer.
> > Currently in selftests/bpf/config we do:
> > CONFIG_TCP_CONG_DCTCP=y
> > CONFIG_TCP_CONG_BBR=y
> >
> > I'd like to avoid adding reno there as well.
> > Will bpf_setsockopt("dctcp") work?
>
> We picked Reno because it is built into the kernel and always available:
> there is no kernel config to set and no extra deps. Also, it is usually
> not used as the default, mostly as a fallback, so the verification
> should not be an issue.

Ahh, I didn't realize that it's built in. Then sure, keep it as reno.

Patch

diff --git a/tools/testing/selftests/bpf/progs/mptcp_subflow.c b/tools/testing/selftests/bpf/progs/mptcp_subflow.c
new file mode 100644
index 000000000000..de9dbba37133
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/mptcp_subflow.c
@@ -0,0 +1,70 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020, Tessares SA. */
+/* Copyright (c) 2024, Kylin Software */
+
+#include <sys/socket.h> // SOL_SOCKET, SO_MARK, ...
+#include <linux/tcp.h>  // TCP_CONGESTION
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_tcp_helpers.h"
+
+char _license[] SEC("license") = "GPL";
+
+#ifndef SOL_TCP
+#define SOL_TCP 6
+#endif
+
+#ifndef TCP_CA_NAME_MAX
+#define TCP_CA_NAME_MAX 16
+#endif
+
+char cc[TCP_CA_NAME_MAX] = "reno";
+
+/* Associate a subflow counter to each token */
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(__u32));
+	__uint(max_entries, 100);
+} mptcp_sf SEC(".maps");
+
+SEC("sockops")
+int mptcp_subflow(struct bpf_sock_ops *skops)
+{
+	__u32 init = 1, key, mark, *cnt;
+	struct mptcp_sock *msk;
+	struct bpf_sock *sk;
+	int err;
+
+	if (skops->op != BPF_SOCK_OPS_TCP_CONNECT_CB)
+		return 1;
+
+	sk = skops->sk;
+	if (!sk)
+		return 1;
+
+	msk = bpf_skc_to_mptcp_sock(sk);
+	if (!msk)
+		return 1;
+
+	key = msk->token;
+	cnt = bpf_map_lookup_elem(&mptcp_sf, &key);
+	if (cnt) {
+		/* A new subflow is added to an existing MPTCP connection */
+		__sync_fetch_and_add(cnt, 1);
+		mark = *cnt;
+	} else {
+		/* A new MPTCP connection is just initiated and this is its primary subflow */
+		bpf_map_update_elem(&mptcp_sf, &key, &init, BPF_ANY);
+		mark = init;
+	}
+
+	/* Set the mark of the subflow's socket based on appearance order */
+	err = bpf_setsockopt(skops, SOL_SOCKET, SO_MARK, &mark, sizeof(mark));
+	if (err < 0)
+		return 1;
+	if (mark == 1)
+		err = bpf_setsockopt(skops, SOL_TCP, TCP_CONGESTION, cc, TCP_CA_NAME_MAX);
+
+	return 1;
+}