diff mbox series

[mptcp-next] Squash-to: "mptcp: give rcvlowat some love"

Message ID 56f04f69f7e7c37e0a1820787f7e9063a50bf937.1695911979.git.pabeni@redhat.com (mailing list archive)
State Superseded, archived
Delegated to: Matthieu Baerts
Headers show
Series [mptcp-next] Squash-to: "mptcp: give rcvlowat some love" | expand

Checks

Context Check Description
matttbe/build success Build and static analysis OK
matttbe/checkpatch success total: 0 errors, 0 warnings, 0 checks, 37 lines checked
matttbe/KVM_Validation__normal__except_selftest_mptcp_join_ warning Unstable: 1 failed test(s): selftest_userspace_pm
matttbe/KVM_Validation__normal__only_selftest_mptcp_join_ success Success! ✅
matttbe/KVM_Validation__debug__only_selftest_mptcp_join_ success Success! ✅
matttbe/KVM_Validation__debug__except_selftest_mptcp_join_ warning Unstable: 1 failed test(s): packetdrill_sockopts

Commit Message

Paolo Abeni Sept. 28, 2023, 3:52 p.m. UTC
Christoph reported a couple of serious splat caused by
the mentioned patch.

mptcp_set_rcvlowat() can use msk->scaling_ratio, before
such field is initialized, causing a divide by zero: we
need to init it in the sock constructor.

Additionally the same function bogusly cast an msk to a
tcp_sock, causing memory corruption. The reproducer likely
clears the sk refcount for the next msk allocated into the
same slab.

The intent was to properly propagate the rcvbuf changes to
the subflows. Let's do that explicitly.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
--
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/442
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/443

since the above issues are introduced by the squash-to patch, I think
we can't have the tag in the final patch.
---
 net/mptcp/protocol.c |  2 ++
 net/mptcp/sockopt.c  | 17 ++++++++++++++---
 2 files changed, 16 insertions(+), 3 deletions(-)

Comments

Matthieu Baerts Sept. 28, 2023, 5:11 p.m. UTC | #1
Hi Paolo,

On 28/09/2023 17:52, Paolo Abeni wrote:
> Christoph reported a couple of serious splat caused by
> the mentioned patch.
> 
> mptcp_set_rcvlowat() can use msk->scaling_ratio, before
> such field is initialized, causing a divide by zero: we
> need to init it in the sock constructor.
> 
> Additionally the same function bogusly cast an msk to a
> tcp_sock, causing memory corruption. The reproducer likely
> clears the sk refcount for the next msk allocated into the
> same slab.
> 
> The intent was to properly propagate the rcvbuf changes to
> the subflows. Let's do that explicitly.

Thank you for fixing all that!

> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> --
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/442
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/443
> 
> since the above issues are introduced by the squash-to patch, I think
> we can't have the tag in the final patch.
> ---
>  net/mptcp/protocol.c |  2 ++
>  net/mptcp/sockopt.c  | 17 ++++++++++++++---
>  2 files changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 6f9e116598ed..3ef6368e26f6 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -2758,6 +2758,8 @@ static void __mptcp_init_sock(struct sock *sk)
>  	msk->rmem_fwd_alloc = 0;
>  	WRITE_ONCE(msk->rmem_released, 0);
>  	msk->timer_ival = TCP_RTO_MIN;
> +	msk->scaling_ratio = (1200 << TCP_RMEM_TO_WIN_SCALE) /
> +			     SKB_TRUESIZE(4096);

I see that the code in TCP is quite recent: dfa2f0483360 ("tcp: get rid
of sysctl_tcp_adv_win_scale").

Do we want to add a new function in include/net/tcp.h which returns:

  (1200 << TCP_RMEM_TO_WIN_SCALE) / SKB_TRUESIZE(4096);

and use it here and in tcp_scaling_ratio_init()?

>  
>  	WRITE_ONCE(msk->first, NULL);
>  	inet_csk(sk)->icsk_sync_mss = mptcp_sync_mss;

Cheers,
Matt
MPTCP CI Sept. 28, 2023, 5:26 p.m. UTC | #2
Hi Paolo,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal (except selftest_mptcp_join):
  - Unstable: 1 failed test(s): selftest_userspace_pm 
MPTCP CI Sept. 28, 2023, 8:10 p.m. UTC | #3
Hi Paolo,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal (except selftest_mptcp_join):
  - Unstable: 1 failed test(s): selftest_userspace_pm 
Paolo Abeni Sept. 29, 2023, 3:50 p.m. UTC | #4
On Thu, 2023-09-28 at 19:11 +0200, Matthieu Baerts wrote:
> Hi Paolo,
> 
> On 28/09/2023 17:52, Paolo Abeni wrote:
> > Christoph reported a couple of serious splat caused by
> > the mentioned patch.
> > 
> > mptcp_set_rcvlowat() can use msk->scaling_ratio, before
> > such field is initialized, causing a divide by zero: we
> > need to init it in the sock constructor.
> > 
> > Additionally the same function bogusly cast an msk to a
> > tcp_sock, causing memory corruption. The reproducer likely
> > clears the sk refcount for the next msk allocated into the
> > same slab.
> > 
> > The intent was to properly propagate the rcvbuf changes to
> > the subflows. Let's do that explicitly.
> 
> Thank you for fixing all that!
> 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > --
> > Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/442
> > Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/443
> > 
> > since the above issues are introduced by the squash-to patch, I think
> > we can't have the tag in the final patch.
> > ---
> >  net/mptcp/protocol.c |  2 ++
> >  net/mptcp/sockopt.c  | 17 ++++++++++++++---
> >  2 files changed, 16 insertions(+), 3 deletions(-)
> > 
> > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > index 6f9e116598ed..3ef6368e26f6 100644
> > --- a/net/mptcp/protocol.c
> > +++ b/net/mptcp/protocol.c
> > @@ -2758,6 +2758,8 @@ static void __mptcp_init_sock(struct sock *sk)
> >  	msk->rmem_fwd_alloc = 0;
> >  	WRITE_ONCE(msk->rmem_released, 0);
> >  	msk->timer_ival = TCP_RTO_MIN;
> > +	msk->scaling_ratio = (1200 << TCP_RMEM_TO_WIN_SCALE) /
> > +			     SKB_TRUESIZE(4096);
> 
> I see that the code in TCP is quite recent: dfa2f0483360 ("tcp: get rid
> of sysctl_tcp_adv_win_scale").
> 
> Do we want to add a new function in include/net/tcp.h which returns:
> 
>   (1200 << TCP_RMEM_TO_WIN_SCALE) / SKB_TRUESIZE(4096);
> 
> and use it here and in tcp_scaling_ratio_init()?

Perhaps just a define? 

#define TCP_INITIAL_SCALING_RATIO ((1200 << TCP_RMEM_TO_WIN_SCALE) / SKB_TRUESIZE(4096))

Note: we want to avoid any argument, to avoid 'different sock' struct
issues.

/P
diff mbox series

Patch

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 6f9e116598ed..3ef6368e26f6 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2758,6 +2758,8 @@  static void __mptcp_init_sock(struct sock *sk)
 	msk->rmem_fwd_alloc = 0;
 	WRITE_ONCE(msk->rmem_released, 0);
 	msk->timer_ival = TCP_RTO_MIN;
+	msk->scaling_ratio = (1200 << TCP_RMEM_TO_WIN_SCALE) /
+			     SKB_TRUESIZE(4096);
 
 	WRITE_ONCE(msk->first, NULL);
 	inet_csk(sk)->icsk_sync_mss = mptcp_sync_mss;
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 340e87195a27..6b37946b5c52 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -1473,6 +1473,7 @@  void mptcp_sockopt_sync_locked(struct mptcp_sock *msk, struct sock *ssk)
  */
 int mptcp_set_rcvlowat(struct sock *sk, int val)
 {
+	struct mptcp_subflow_context *subflow;
 	int space, cap;
 
 	if (sk->sk_userlocks & SOCK_RCVBUF_LOCK)
@@ -1490,9 +1491,19 @@  int mptcp_set_rcvlowat(struct sock *sk, int val)
 		return 0;
 
 	space = __tcp_space_from_win(mptcp_sk(sk)->scaling_ratio, val);
-	if (space > sk->sk_rcvbuf) {
-		WRITE_ONCE(sk->sk_rcvbuf, space);
-		tcp_sk(sk)->window_clamp = val;
+	if (space <= sk->sk_rcvbuf)
+		return 0;
+
+	/* propagate the rcvbuf changes to all the subflows */
+	WRITE_ONCE(sk->sk_rcvbuf, space);
+	mptcp_for_each_subflow(mptcp_sk(sk), subflow) {
+		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+		bool slow;
+
+		slow = lock_sock_fast(ssk);
+		WRITE_ONCE(ssk->sk_rcvbuf, space);
+		tcp_sk(ssk)->window_clamp = val;
+		unlock_sock_fast(ssk, slow);
 	}
 	return 0;
 }