Message ID | 0ccc1c26d27d6ee7be22806a97983d37c6ca548c.1715053270.git.tanggeliang@kylinos.cn (mailing list archive) |
---|---|
State | Changes Requested, archived |
Delegated to: | Matthieu Baerts |
Headers | show |
Series | [mptcp-net,v2] mptcp: fix the default value of scaling_ratio | expand |
Context | Check | Description |
---|---|---|
matttbe/build | success | Build and static analysis OK |
matttbe/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 10 lines checked |
matttbe/shellcheck | success | MPTCP selftests files have not been modified |
matttbe/KVM_Validation__normal | success | Success! ✅ |
matttbe/KVM_Validation__debug | success | Success! ✅ |
matttbe/KVM_Validation__btf__only_bpftest_all_ | success | Success! ✅ |
Hi Geliang, Thank you for your modifications, that's great! Our CI did some validations and here is its report: - KVM Validation: normal: Success! ✅ - KVM Validation: debug: Success! ✅ - KVM Validation: btf (only bpftest_all): Success! ✅ - Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/8979422287 Initiator: Patchew Applier Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/1c37730c0e5f Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=851005 If there are some issues, you can reproduce them using the same environment as the one used by the CI thanks to a docker image, e.g.: $ cd [kernel source code] $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \ --pull always mptcp/mptcp-upstream-virtme-docker:latest \ auto-normal For more details: https://github.com/multipath-tcp/mptcp-upstream-virtme-docker Please note that despite all the efforts that have been already done to have a stable tests suite when executed on a public CI like here, it is possible some reported issues are not due to your modifications. Still, do not hesitate to help us improve that ;-) Cheers, MPTCP GH Action bot Bot operated by Matthieu Baerts (NGI0 Core)
On Tue, May 07, 2024 at 11:46:31AM +0800, Geliang Tang wrote: > From: Geliang Tang <tanggeliang@kylinos.cn> > > BPF tests fail sometimes with "bytes != total_bytes" errors: > > test_default:PASS:sched_init:default 0 nsec > send_data:PASS:pthread_create 0 nsec > send_data:FAIL:recv 936000 != 10485760 nr_recv:-1 errno:11 > default: 3041 ms > server:FAIL:send 7579500 != 10485760 nr_sent:-1 errno:11 > send_data:FAIL:pthread_join thread_ret:-11 test_default:PASS: \ > has_bytes_sent addr_1 0 nsec > test_default:PASS:has_bytes_sent addr_2 0 nsec > close_netns:PASS:setns 0 nsec > > In this case mptcp_recvmsg() gets EAGAIN errors. This issue introduces > by commit b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning"). The default > value of scaling_ratio should be TCP_DEFAULT_SCALING_RATIO, not U8_MAX. Please update the last line of the commit log as: ''' value of scaling_ratio should be TCP_DEFAULT_SCALING_RATIO (128), not U8_MAX (255). ''' Thanks, -Geliang > > Fixes: b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning") > Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/487 > Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> > --- > v2: > - I finally found the root cause of this issue. > - cc Martin too. > --- > net/mptcp/protocol.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c > index 579031c60937..d00cd21e8d3f 100644 > --- a/net/mptcp/protocol.c > +++ b/net/mptcp/protocol.c > @@ -1981,9 +1981,9 @@ static int __mptcp_recvmsg_mskq(struct mptcp_sock *msk, > */ > static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied) > { > + u8 scaling_ratio = TCP_DEFAULT_SCALING_RATIO; > struct mptcp_subflow_context *subflow; > struct sock *sk = (struct sock *)msk; > - u8 scaling_ratio = U8_MAX; > u32 time, advmss = 1; > u64 rtt_us, mstamp; > > -- > 2.43.0 >
Hi Geliang, (+cc Paolo who wrote the commit mentioned by the "Fixes" tag) On 07/05/2024 05:46, Geliang Tang wrote: > From: Geliang Tang <tanggeliang@kylinos.cn> > > BPF tests fail sometimes with "bytes != total_bytes" errors: > > test_default:PASS:sched_init:default 0 nsec > send_data:PASS:pthread_create 0 nsec > send_data:FAIL:recv 936000 != 10485760 nr_recv:-1 errno:11 > default: 3041 ms > server:FAIL:send 7579500 != 10485760 nr_sent:-1 errno:11 > send_data:FAIL:pthread_join thread_ret:-11 test_default:PASS: \ > has_bytes_sent addr_1 0 nsec > test_default:PASS:has_bytes_sent addr_2 0 nsec > close_netns:PASS:setns 0 nsec > > In this case mptcp_recvmsg() gets EAGAIN errors. This issue introduces > by commit b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning"). The default > value of scaling_ratio should be TCP_DEFAULT_SCALING_RATIO, not U8_MAX. It looks like you are modifying the max value, not the default one. Also, please always explain the reason: why should it be X, not Y? > > Fixes: b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning") > Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/487 > Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> > --- > v2: > - I finally found the root cause of this issue. > - cc Martin too. > --- > net/mptcp/protocol.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c > index 579031c60937..d00cd21e8d3f 100644 > --- a/net/mptcp/protocol.c > +++ b/net/mptcp/protocol.c > @@ -1981,9 +1981,9 @@ static int __mptcp_recvmsg_mskq(struct mptcp_sock *msk, > */ > static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied) > { > + u8 scaling_ratio = TCP_DEFAULT_SCALING_RATIO; With this modification, you are limiting msk->scaling_ratio to TCP_DEFAULT_SCALING_RATIO. That's probably not what you want. See below: (...) mptcp_for_each_subflow(msk, subflow) (...) scaling_ratio = min(tp->scaling_ratio, scaling_ratio); (...) msk->scaling_ratio = scaling_ratio; Do you mean that in your case, msk->scaling_ratio was assigned to U8_MAX? Or it was assigned to a too high value (>TCP_DEFAULT_SCALING_RATIO)? Please add such info in the commit message as well. > struct mptcp_subflow_context *subflow; > struct sock *sk = (struct sock *)msk; > - u8 scaling_ratio = U8_MAX; > u32 time, advmss = 1; > u64 rtt_us, mstamp; > Cheers, Matt
Hi Matt, On Tue, May 07, 2024 at 10:12:41AM +0200, Matthieu Baerts wrote: > Hi Geliang, > > (+cc Paolo who wrote the commit mentioned by the "Fixes" tag) > > On 07/05/2024 05:46, Geliang Tang wrote: > > From: Geliang Tang <tanggeliang@kylinos.cn> > > > > BPF tests fail sometimes with "bytes != total_bytes" errors: > > > > test_default:PASS:sched_init:default 0 nsec > > send_data:PASS:pthread_create 0 nsec > > send_data:FAIL:recv 936000 != 10485760 nr_recv:-1 errno:11 > > default: 3041 ms > > server:FAIL:send 7579500 != 10485760 nr_sent:-1 errno:11 > > send_data:FAIL:pthread_join thread_ret:-11 test_default:PASS: \ > > has_bytes_sent addr_1 0 nsec > > test_default:PASS:has_bytes_sent addr_2 0 nsec > > close_netns:PASS:setns 0 nsec > > > > In this case mptcp_recvmsg() gets EAGAIN errors. This issue introduces > > by commit b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning"). The default > > value of scaling_ratio should be TCP_DEFAULT_SCALING_RATIO, not U8_MAX. > > It looks like you are modifying the max value, not the default one. > > Also, please always explain the reason: why should it be X, not Y? > > > > > Fixes: b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning") > > Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/487 > > Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> > > --- > > v2: > > - I finally found the root cause of this issue. > > - cc Martin too. > > --- > > net/mptcp/protocol.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c > > index 579031c60937..d00cd21e8d3f 100644 > > --- a/net/mptcp/protocol.c > > +++ b/net/mptcp/protocol.c > > @@ -1981,9 +1981,9 @@ static int __mptcp_recvmsg_mskq(struct mptcp_sock *msk, > > */ > > static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied) > > { > > + u8 scaling_ratio = TCP_DEFAULT_SCALING_RATIO; > > With this modification, you are limiting msk->scaling_ratio to > TCP_DEFAULT_SCALING_RATIO. That's probably not what you want. See below: > > (...) > mptcp_for_each_subflow(msk, subflow) > (...) > scaling_ratio = min(tp->scaling_ratio, scaling_ratio); > > (...) > msk->scaling_ratio = scaling_ratio; > > > Do you mean that in your case, msk->scaling_ratio was assigned to > U8_MAX? Or it was assigned to a too high value (>TCP_DEFAULT_SCALING_RATIO)? > > Please add such info in the commit message as well. Sure. Here's the new subject and commit log: ''' Subject: mptcp: fix the max value of scaling_ratio BPF tests fail sometimes (a probability of approximately 1%) with "bytes != total_bytes" errors: test_burst:PASS:open_and_load:burst 0 nsec test_bpf_sched:PASS:Scheduler name too long 0 nsec test_bpf_sched:PASS:burst 0 nsec create_netns:PASS:ip netns add mptcp_ns 0 nsec create_netns:PASS:ip -net mptcp_ns link set dev lo up 0 nsec sched_init:PASS:create_netns 0 nsec endpoint_init:PASS:ip -net mptcp_ns link add veth1 type veth peer name endpoint_init:PASS:ip -net mptcp_ns addr add 10.0.1.1/24 dev veth1 0 nsec endpoint_init:PASS:ip -net mptcp_ns link set dev veth1 up 0 nsec endpoint_init:PASS:ip -net mptcp_ns addr add 10.0.1.2/24 dev veth2 0 nsec endpoint_init:PASS:ip -net mptcp_ns link set dev veth2 up 0 nsec endpoint_init:PASS:ip -net mptcp_ns mptcp endpoint add 10.0.1.2 subflow sched_init:PASS:endpoint_init 0 nsec test_bpf_sched:PASS:burst 0 nsec send_data_and_verify:PASS:burst 0 nsec send_data_and_verify:PASS:burst 0 nsec (network_helpers.c:613: errno: Resource temporarily unavailable) \ send 5608500 expected 10485760 (network_helpers.c:661: errno: None) recv 2755984 expected 10485760 (network_helpers.c:669: errno: None) Failed in thread_ret -11 send_data_and_verify:FAIL:send_recv_data unexpected error: -4 (errno 0) #162/9 mptcp/burst:FAIL #162 mptcp:FAIL In this case, mptcp_recvmsg() gets EAGAIN errors. This issue introduces by commit b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning"). The max value of scaling_ratio should be TCP_DEFAULT_SCALING_RATIO (128), not U8_MAX (255). Otherwise, scaling_ratio is assigned to a too high value. ''' Thanks, -Geliang > > > struct mptcp_subflow_context *subflow; > > struct sock *sk = (struct sock *)msk; > > - u8 scaling_ratio = U8_MAX; > > u32 time, advmss = 1; > > u64 rtt_us, mstamp; > > > > Cheers, > Matt > -- > Sponsored by the NGI0 Core fund. >
Hi Geliang, On 07/05/2024 11:34, Geliang Tang wrote: > Hi Matt, > > On Tue, May 07, 2024 at 10:12:41AM +0200, Matthieu Baerts wrote: >> Hi Geliang, >> >> (+cc Paolo who wrote the commit mentioned by the "Fixes" tag) >> >> On 07/05/2024 05:46, Geliang Tang wrote: >>> From: Geliang Tang <tanggeliang@kylinos.cn> >>> >>> BPF tests fail sometimes with "bytes != total_bytes" errors: >>> >>> test_default:PASS:sched_init:default 0 nsec >>> send_data:PASS:pthread_create 0 nsec >>> send_data:FAIL:recv 936000 != 10485760 nr_recv:-1 errno:11 >>> default: 3041 ms >>> server:FAIL:send 7579500 != 10485760 nr_sent:-1 errno:11 >>> send_data:FAIL:pthread_join thread_ret:-11 test_default:PASS: \ >>> has_bytes_sent addr_1 0 nsec >>> test_default:PASS:has_bytes_sent addr_2 0 nsec >>> close_netns:PASS:setns 0 nsec >>> >>> In this case mptcp_recvmsg() gets EAGAIN errors. This issue introduces >>> by commit b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning"). The default >>> value of scaling_ratio should be TCP_DEFAULT_SCALING_RATIO, not U8_MAX. >> >> It looks like you are modifying the max value, not the default one. >> >> Also, please always explain the reason: why should it be X, not Y? >> >>> >>> Fixes: b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning") >>> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/487 >>> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> >>> --- >>> v2: >>> - I finally found the root cause of this issue. >>> - cc Martin too. >>> --- >>> net/mptcp/protocol.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c >>> index 579031c60937..d00cd21e8d3f 100644 >>> --- a/net/mptcp/protocol.c >>> +++ b/net/mptcp/protocol.c >>> @@ -1981,9 +1981,9 @@ static int __mptcp_recvmsg_mskq(struct mptcp_sock *msk, >>> */ >>> static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied) >>> { >>> + u8 scaling_ratio = TCP_DEFAULT_SCALING_RATIO; >> >> With this modification, you are limiting msk->scaling_ratio to >> TCP_DEFAULT_SCALING_RATIO. That's probably not what you want. See below: >> >> (...) >> mptcp_for_each_subflow(msk, subflow) >> (...) >> scaling_ratio = min(tp->scaling_ratio, scaling_ratio); >> >> (...) >> msk->scaling_ratio = scaling_ratio; >> >> >> Do you mean that in your case, msk->scaling_ratio was assigned to >> U8_MAX? Or it was assigned to a too high value (>TCP_DEFAULT_SCALING_RATIO)? >> >> Please add such info in the commit message as well. > > Sure. Here's the new subject and commit log: > > ''' > Subject: mptcp: fix the max value of scaling_ratio > > BPF tests fail sometimes (a probability of approximately 1%) with > "bytes != total_bytes" errors: > > test_burst:PASS:open_and_load:burst 0 nsec > test_bpf_sched:PASS:Scheduler name too long 0 nsec > test_bpf_sched:PASS:burst 0 nsec > create_netns:PASS:ip netns add mptcp_ns 0 nsec > create_netns:PASS:ip -net mptcp_ns link set dev lo up 0 nsec > sched_init:PASS:create_netns 0 nsec > endpoint_init:PASS:ip -net mptcp_ns link add veth1 type veth peer name > endpoint_init:PASS:ip -net mptcp_ns addr add 10.0.1.1/24 dev veth1 0 nsec > endpoint_init:PASS:ip -net mptcp_ns link set dev veth1 up 0 nsec > endpoint_init:PASS:ip -net mptcp_ns addr add 10.0.1.2/24 dev veth2 0 nsec > endpoint_init:PASS:ip -net mptcp_ns link set dev veth2 up 0 nsec > endpoint_init:PASS:ip -net mptcp_ns mptcp endpoint add 10.0.1.2 subflow > sched_init:PASS:endpoint_init 0 nsec > test_bpf_sched:PASS:burst 0 nsec > send_data_and_verify:PASS:burst 0 nsec > send_data_and_verify:PASS:burst 0 nsec > (network_helpers.c:613: errno: Resource temporarily unavailable) \ > send 5608500 expected 10485760 > (network_helpers.c:661: errno: None) recv 2755984 expected 10485760 > (network_helpers.c:669: errno: None) Failed in thread_ret -11 > send_data_and_verify:FAIL:send_recv_data unexpected error: -4 (errno 0) > #162/9 mptcp/burst:FAIL > #162 mptcp:FAIL > > In this case, mptcp_recvmsg() gets EAGAIN errors. This issue introduces > by commit b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning"). The max > value of scaling_ratio should be TCP_DEFAULT_SCALING_RATIO (128), not > U8_MAX (255). Otherwise, scaling_ratio is assigned to a too high value. > ''' Sorry, I'm still not sure whether I understand the issue: - Why limiting to the default value? - Here, we want the higher scaling_ratio from all the available subflows: is this value too high? What "too high value" you got that was causing the issue you mentioned? - What is a "too high value" for you and why? - Or is it because scaling_ratio was assigned to U8_MAX and not changed to the max value of the subflows by accident because there was a bypass somehow? Cheers, Matt
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 579031c60937..d00cd21e8d3f 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -1981,9 +1981,9 @@ static int __mptcp_recvmsg_mskq(struct mptcp_sock *msk, */ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied) { + u8 scaling_ratio = TCP_DEFAULT_SCALING_RATIO; struct mptcp_subflow_context *subflow; struct sock *sk = (struct sock *)msk; - u8 scaling_ratio = U8_MAX; u32 time, advmss = 1; u64 rtt_us, mstamp;