diff mbox series

[mptcp-net,v2] mptcp: fix the default value of scaling_ratio

Message ID 0ccc1c26d27d6ee7be22806a97983d37c6ca548c.1715053270.git.tanggeliang@kylinos.cn (mailing list archive)
State Changes Requested, archived
Delegated to: Matthieu Baerts
Headers show
Series [mptcp-net,v2] mptcp: fix the default value of scaling_ratio | expand

Checks

Context Check Description
matttbe/build success Build and static analysis OK
matttbe/checkpatch success total: 0 errors, 0 warnings, 0 checks, 10 lines checked
matttbe/shellcheck success MPTCP selftests files have not been modified
matttbe/KVM_Validation__normal success Success! ✅
matttbe/KVM_Validation__debug success Success! ✅
matttbe/KVM_Validation__btf__only_bpftest_all_ success Success! ✅

Commit Message

Geliang Tang May 7, 2024, 3:46 a.m. UTC
From: Geliang Tang <tanggeliang@kylinos.cn>

BPF tests fail sometimes with "bytes != total_bytes" errors:

 test_default:PASS:sched_init:default 0 nsec
 send_data:PASS:pthread_create 0 nsec
 send_data:FAIL:recv 936000 != 10485760 nr_recv:-1 errno:11
 default: 3041 ms
 server:FAIL:send 7579500 != 10485760 nr_sent:-1 errno:11
 send_data:FAIL:pthread_join thread_ret:-11 test_default:PASS: \
				has_bytes_sent addr_1 0 nsec
 test_default:PASS:has_bytes_sent addr_2 0 nsec
 close_netns:PASS:setns 0 nsec

In this case mptcp_recvmsg() gets EAGAIN errors. This issue introduces
by commit b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning"). The default
value of scaling_ratio should be TCP_DEFAULT_SCALING_RATIO, not U8_MAX.

Fixes: b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/487
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
v2:
 - I finally found the root cause of this issue.
 - cc Martin too.
---
 net/mptcp/protocol.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

MPTCP CI May 7, 2024, 4:33 a.m. UTC | #1
Hi Geliang,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal: Success! ✅
- KVM Validation: debug: Success! ✅
- KVM Validation: btf (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/8979422287

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/1c37730c0e5f
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=851005


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
Geliang Tang May 7, 2024, 5:37 a.m. UTC | #2
On Tue, May 07, 2024 at 11:46:31AM +0800, Geliang Tang wrote:
> From: Geliang Tang <tanggeliang@kylinos.cn>
> 
> BPF tests fail sometimes with "bytes != total_bytes" errors:
> 
>  test_default:PASS:sched_init:default 0 nsec
>  send_data:PASS:pthread_create 0 nsec
>  send_data:FAIL:recv 936000 != 10485760 nr_recv:-1 errno:11
>  default: 3041 ms
>  server:FAIL:send 7579500 != 10485760 nr_sent:-1 errno:11
>  send_data:FAIL:pthread_join thread_ret:-11 test_default:PASS: \
> 				has_bytes_sent addr_1 0 nsec
>  test_default:PASS:has_bytes_sent addr_2 0 nsec
>  close_netns:PASS:setns 0 nsec
> 
> In this case mptcp_recvmsg() gets EAGAIN errors. This issue introduces
> by commit b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning"). The default
> value of scaling_ratio should be TCP_DEFAULT_SCALING_RATIO, not U8_MAX.

Please update the last line of the commit log as:

'''
  value of scaling_ratio should be TCP_DEFAULT_SCALING_RATIO (128), not
  U8_MAX (255).
'''

Thanks,
-Geliang

> 
> Fixes: b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning")
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/487
> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> ---
> v2:
>  - I finally found the root cause of this issue.
>  - cc Martin too.
> ---
>  net/mptcp/protocol.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 579031c60937..d00cd21e8d3f 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -1981,9 +1981,9 @@ static int __mptcp_recvmsg_mskq(struct mptcp_sock *msk,
>   */
>  static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
>  {
> +	u8 scaling_ratio = TCP_DEFAULT_SCALING_RATIO;
>  	struct mptcp_subflow_context *subflow;
>  	struct sock *sk = (struct sock *)msk;
> -	u8 scaling_ratio = U8_MAX;
>  	u32 time, advmss = 1;
>  	u64 rtt_us, mstamp;
>  
> -- 
> 2.43.0
>
Matthieu Baerts (NGI0) May 7, 2024, 8:12 a.m. UTC | #3
Hi Geliang,

(+cc Paolo who wrote the commit mentioned by the "Fixes" tag)

On 07/05/2024 05:46, Geliang Tang wrote:
> From: Geliang Tang <tanggeliang@kylinos.cn>
> 
> BPF tests fail sometimes with "bytes != total_bytes" errors:
> 
>  test_default:PASS:sched_init:default 0 nsec
>  send_data:PASS:pthread_create 0 nsec
>  send_data:FAIL:recv 936000 != 10485760 nr_recv:-1 errno:11
>  default: 3041 ms
>  server:FAIL:send 7579500 != 10485760 nr_sent:-1 errno:11
>  send_data:FAIL:pthread_join thread_ret:-11 test_default:PASS: \
> 				has_bytes_sent addr_1 0 nsec
>  test_default:PASS:has_bytes_sent addr_2 0 nsec
>  close_netns:PASS:setns 0 nsec
> 
> In this case mptcp_recvmsg() gets EAGAIN errors. This issue introduces
> by commit b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning"). The default
> value of scaling_ratio should be TCP_DEFAULT_SCALING_RATIO, not U8_MAX.

It looks like you are modifying the max value, not the default one.

Also, please always explain the reason: why should it be X, not Y?

> 
> Fixes: b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning")
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/487
> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> ---
> v2:
>  - I finally found the root cause of this issue.
>  - cc Martin too.
> ---
>  net/mptcp/protocol.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 579031c60937..d00cd21e8d3f 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -1981,9 +1981,9 @@ static int __mptcp_recvmsg_mskq(struct mptcp_sock *msk,
>   */
>  static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
>  {
> +	u8 scaling_ratio = TCP_DEFAULT_SCALING_RATIO;

With this modification, you are limiting msk->scaling_ratio to
TCP_DEFAULT_SCALING_RATIO. That's probably not what you want. See below:

  (...)
  mptcp_for_each_subflow(msk, subflow)
      (...)
      scaling_ratio = min(tp->scaling_ratio, scaling_ratio);

  (...)
  msk->scaling_ratio = scaling_ratio;


Do you mean that in your case, msk->scaling_ratio was assigned to
U8_MAX? Or it was assigned to a too high value (>TCP_DEFAULT_SCALING_RATIO)?

Please add such info in the commit message as well.

>  	struct mptcp_subflow_context *subflow;
>  	struct sock *sk = (struct sock *)msk;
> -	u8 scaling_ratio = U8_MAX;
>  	u32 time, advmss = 1;
>  	u64 rtt_us, mstamp;
>  

Cheers,
Matt
Geliang Tang May 7, 2024, 9:34 a.m. UTC | #4
Hi Matt,

On Tue, May 07, 2024 at 10:12:41AM +0200, Matthieu Baerts wrote:
> Hi Geliang,
> 
> (+cc Paolo who wrote the commit mentioned by the "Fixes" tag)
> 
> On 07/05/2024 05:46, Geliang Tang wrote:
> > From: Geliang Tang <tanggeliang@kylinos.cn>
> > 
> > BPF tests fail sometimes with "bytes != total_bytes" errors:
> > 
> >  test_default:PASS:sched_init:default 0 nsec
> >  send_data:PASS:pthread_create 0 nsec
> >  send_data:FAIL:recv 936000 != 10485760 nr_recv:-1 errno:11
> >  default: 3041 ms
> >  server:FAIL:send 7579500 != 10485760 nr_sent:-1 errno:11
> >  send_data:FAIL:pthread_join thread_ret:-11 test_default:PASS: \
> > 				has_bytes_sent addr_1 0 nsec
> >  test_default:PASS:has_bytes_sent addr_2 0 nsec
> >  close_netns:PASS:setns 0 nsec
> > 
> > In this case mptcp_recvmsg() gets EAGAIN errors. This issue introduces
> > by commit b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning"). The default
> > value of scaling_ratio should be TCP_DEFAULT_SCALING_RATIO, not U8_MAX.
> 
> It looks like you are modifying the max value, not the default one.
> 
> Also, please always explain the reason: why should it be X, not Y?
> 
> > 
> > Fixes: b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning")
> > Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/487
> > Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> > ---
> > v2:
> >  - I finally found the root cause of this issue.
> >  - cc Martin too.
> > ---
> >  net/mptcp/protocol.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > index 579031c60937..d00cd21e8d3f 100644
> > --- a/net/mptcp/protocol.c
> > +++ b/net/mptcp/protocol.c
> > @@ -1981,9 +1981,9 @@ static int __mptcp_recvmsg_mskq(struct mptcp_sock *msk,
> >   */
> >  static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
> >  {
> > +	u8 scaling_ratio = TCP_DEFAULT_SCALING_RATIO;
> 
> With this modification, you are limiting msk->scaling_ratio to
> TCP_DEFAULT_SCALING_RATIO. That's probably not what you want. See below:
> 
>   (...)
>   mptcp_for_each_subflow(msk, subflow)
>       (...)
>       scaling_ratio = min(tp->scaling_ratio, scaling_ratio);
> 
>   (...)
>   msk->scaling_ratio = scaling_ratio;
> 
> 
> Do you mean that in your case, msk->scaling_ratio was assigned to
> U8_MAX? Or it was assigned to a too high value (>TCP_DEFAULT_SCALING_RATIO)?
> 
> Please add such info in the commit message as well.

Sure. Here's the new subject and commit log:

'''
Subject: mptcp: fix the max value of scaling_ratio

BPF tests fail sometimes (a probability of approximately 1%) with
"bytes != total_bytes" errors:

 test_burst:PASS:open_and_load:burst 0 nsec
 test_bpf_sched:PASS:Scheduler name too long 0 nsec
 test_bpf_sched:PASS:burst 0 nsec
 create_netns:PASS:ip netns add mptcp_ns 0 nsec
 create_netns:PASS:ip -net mptcp_ns link set dev lo up 0 nsec
 sched_init:PASS:create_netns 0 nsec
 endpoint_init:PASS:ip -net mptcp_ns link add veth1 type veth peer name
 endpoint_init:PASS:ip -net mptcp_ns addr add 10.0.1.1/24 dev veth1 0 nsec
 endpoint_init:PASS:ip -net mptcp_ns link set dev veth1 up 0 nsec
 endpoint_init:PASS:ip -net mptcp_ns addr add 10.0.1.2/24 dev veth2 0 nsec
 endpoint_init:PASS:ip -net mptcp_ns link set dev veth2 up 0 nsec
 endpoint_init:PASS:ip -net mptcp_ns mptcp endpoint add 10.0.1.2 subflow
 sched_init:PASS:endpoint_init 0 nsec
 test_bpf_sched:PASS:burst 0 nsec
 send_data_and_verify:PASS:burst 0 nsec
 send_data_and_verify:PASS:burst 0 nsec
 (network_helpers.c:613: errno: Resource temporarily unavailable) \
                                        send 5608500 expected 10485760
 (network_helpers.c:661: errno: None) recv 2755984 expected 10485760
 (network_helpers.c:669: errno: None) Failed in thread_ret -11
 send_data_and_verify:FAIL:send_recv_data unexpected error: -4 (errno 0)
 #162/9   mptcp/burst:FAIL
 #162     mptcp:FAIL

In this case, mptcp_recvmsg() gets EAGAIN errors. This issue introduces
by commit b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning"). The max
value of scaling_ratio should be TCP_DEFAULT_SCALING_RATIO (128), not
U8_MAX (255). Otherwise, scaling_ratio is assigned to a too high value.
'''

Thanks,
-Geliang

> 
> >  	struct mptcp_subflow_context *subflow;
> >  	struct sock *sk = (struct sock *)msk;
> > -	u8 scaling_ratio = U8_MAX;
> >  	u32 time, advmss = 1;
> >  	u64 rtt_us, mstamp;
> >  
> 
> Cheers,
> Matt
> -- 
> Sponsored by the NGI0 Core fund.
>
Matthieu Baerts (NGI0) May 7, 2024, 10 a.m. UTC | #5
Hi Geliang,

On 07/05/2024 11:34, Geliang Tang wrote:
> Hi Matt,
> 
> On Tue, May 07, 2024 at 10:12:41AM +0200, Matthieu Baerts wrote:
>> Hi Geliang,
>>
>> (+cc Paolo who wrote the commit mentioned by the "Fixes" tag)
>>
>> On 07/05/2024 05:46, Geliang Tang wrote:
>>> From: Geliang Tang <tanggeliang@kylinos.cn>
>>>
>>> BPF tests fail sometimes with "bytes != total_bytes" errors:
>>>
>>>  test_default:PASS:sched_init:default 0 nsec
>>>  send_data:PASS:pthread_create 0 nsec
>>>  send_data:FAIL:recv 936000 != 10485760 nr_recv:-1 errno:11
>>>  default: 3041 ms
>>>  server:FAIL:send 7579500 != 10485760 nr_sent:-1 errno:11
>>>  send_data:FAIL:pthread_join thread_ret:-11 test_default:PASS: \
>>> 				has_bytes_sent addr_1 0 nsec
>>>  test_default:PASS:has_bytes_sent addr_2 0 nsec
>>>  close_netns:PASS:setns 0 nsec
>>>
>>> In this case mptcp_recvmsg() gets EAGAIN errors. This issue introduces
>>> by commit b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning"). The default
>>> value of scaling_ratio should be TCP_DEFAULT_SCALING_RATIO, not U8_MAX.
>>
>> It looks like you are modifying the max value, not the default one.
>>
>> Also, please always explain the reason: why should it be X, not Y?
>>
>>>
>>> Fixes: b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning")
>>> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/487
>>> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
>>> ---
>>> v2:
>>>  - I finally found the root cause of this issue.
>>>  - cc Martin too.
>>> ---
>>>  net/mptcp/protocol.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>>> index 579031c60937..d00cd21e8d3f 100644
>>> --- a/net/mptcp/protocol.c
>>> +++ b/net/mptcp/protocol.c
>>> @@ -1981,9 +1981,9 @@ static int __mptcp_recvmsg_mskq(struct mptcp_sock *msk,
>>>   */
>>>  static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
>>>  {
>>> +	u8 scaling_ratio = TCP_DEFAULT_SCALING_RATIO;
>>
>> With this modification, you are limiting msk->scaling_ratio to
>> TCP_DEFAULT_SCALING_RATIO. That's probably not what you want. See below:
>>
>>   (...)
>>   mptcp_for_each_subflow(msk, subflow)
>>       (...)
>>       scaling_ratio = min(tp->scaling_ratio, scaling_ratio);
>>
>>   (...)
>>   msk->scaling_ratio = scaling_ratio;
>>
>>
>> Do you mean that in your case, msk->scaling_ratio was assigned to
>> U8_MAX? Or it was assigned to a too high value (>TCP_DEFAULT_SCALING_RATIO)?
>>
>> Please add such info in the commit message as well.
> 
> Sure. Here's the new subject and commit log:
> 
> '''
> Subject: mptcp: fix the max value of scaling_ratio
> 
> BPF tests fail sometimes (a probability of approximately 1%) with
> "bytes != total_bytes" errors:
> 
>  test_burst:PASS:open_and_load:burst 0 nsec
>  test_bpf_sched:PASS:Scheduler name too long 0 nsec
>  test_bpf_sched:PASS:burst 0 nsec
>  create_netns:PASS:ip netns add mptcp_ns 0 nsec
>  create_netns:PASS:ip -net mptcp_ns link set dev lo up 0 nsec
>  sched_init:PASS:create_netns 0 nsec
>  endpoint_init:PASS:ip -net mptcp_ns link add veth1 type veth peer name
>  endpoint_init:PASS:ip -net mptcp_ns addr add 10.0.1.1/24 dev veth1 0 nsec
>  endpoint_init:PASS:ip -net mptcp_ns link set dev veth1 up 0 nsec
>  endpoint_init:PASS:ip -net mptcp_ns addr add 10.0.1.2/24 dev veth2 0 nsec
>  endpoint_init:PASS:ip -net mptcp_ns link set dev veth2 up 0 nsec
>  endpoint_init:PASS:ip -net mptcp_ns mptcp endpoint add 10.0.1.2 subflow
>  sched_init:PASS:endpoint_init 0 nsec
>  test_bpf_sched:PASS:burst 0 nsec
>  send_data_and_verify:PASS:burst 0 nsec
>  send_data_and_verify:PASS:burst 0 nsec
>  (network_helpers.c:613: errno: Resource temporarily unavailable) \
>                                         send 5608500 expected 10485760
>  (network_helpers.c:661: errno: None) recv 2755984 expected 10485760
>  (network_helpers.c:669: errno: None) Failed in thread_ret -11
>  send_data_and_verify:FAIL:send_recv_data unexpected error: -4 (errno 0)
>  #162/9   mptcp/burst:FAIL
>  #162     mptcp:FAIL
> 
> In this case, mptcp_recvmsg() gets EAGAIN errors. This issue introduces
> by commit b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning"). The max
> value of scaling_ratio should be TCP_DEFAULT_SCALING_RATIO (128), not
> U8_MAX (255). Otherwise, scaling_ratio is assigned to a too high value.
> '''

Sorry, I'm still not sure whether I understand the issue:

- Why limiting to the default value?

- Here, we want the higher scaling_ratio from all the available
subflows: is this value too high? What "too high value" you got that was
causing the issue you mentioned?

- What is a "too high value" for you and why?

- Or is it because scaling_ratio was assigned to U8_MAX and not changed
to the max value of the subflows by accident because there was a bypass
somehow?

Cheers,
Matt
diff mbox series

Patch

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 579031c60937..d00cd21e8d3f 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1981,9 +1981,9 @@  static int __mptcp_recvmsg_mskq(struct mptcp_sock *msk,
  */
 static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
 {
+	u8 scaling_ratio = TCP_DEFAULT_SCALING_RATIO;
 	struct mptcp_subflow_context *subflow;
 	struct sock *sk = (struct sock *)msk;
-	u8 scaling_ratio = U8_MAX;
 	u32 time, advmss = 1;
 	u64 rtt_us, mstamp;