Message ID | 20240725215542.894348-1-quic_subashab@quicinc.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] tcp: Adjust clamping window for applications specifying SO_RCVBUF | expand |
On Thu, Jul 25, 2024 at 11:55 PM Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com> wrote: > > tp->scaling_ratio is not updated based on skb->len/skb->truesize once > SO_RCVBUF is set leading to the maximum window scaling to be 25% of > rcvbuf after > commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale") > and 50% of rcvbuf after > commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio"). > 50% tries to emulate the behavior of older kernels using > sysctl_tcp_adv_win_scale with default value. > > Systems which were using a different values of sysctl_tcp_adv_win_scale > in older kernels ended up seeing reduced download speeds in certain > cases as covered in https://lists.openwall.net/netdev/2024/05/15/13 > While the sysctl scheme is no longer acceptable, the value of 50% is > a bit conservative when the skb->len/skb->truesize ratio is later > determined to be ~0.66. > > Applications not specifying SO_RCVBUF update the window scaling and > the receiver buffer every time data is copied to userspace. This > computation is now used for applications setting SO_RCVBUF to update > the maximum window scaling while ensuring that the receive buffer > is within the application specified limit. > > Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale") > Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com> > Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com> > --- > net/ipv4/tcp_input.c | 25 ++++++++++++++++++------- > 1 file changed, 18 insertions(+), 7 deletions(-) > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > index 454362e359da..c8fb029a15a4 100644 > --- a/net/ipv4/tcp_input.c > +++ b/net/ipv4/tcp_input.c > @@ -754,8 +754,7 @@ void tcp_rcv_space_adjust(struct sock *sk) > * <prev RTT . ><current RTT .. ><next RTT .... > > */ > > - if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) && > - !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) { > + if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf)) { > u64 rcvwin, grow; > int rcvbuf; > > @@ -771,12 +770,24 @@ void tcp_rcv_space_adjust(struct sock *sk) > > rcvbuf = min_t(u64, tcp_space_from_win(sk, rcvwin), > READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2])); > - if (rcvbuf > sk->sk_rcvbuf) { > - WRITE_ONCE(sk->sk_rcvbuf, rcvbuf); > + if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) { > + if (rcvbuf > sk->sk_rcvbuf) { > + WRITE_ONCE(sk->sk_rcvbuf, rcvbuf); > > - /* Make the window clamp follow along. */ > - WRITE_ONCE(tp->window_clamp, > - tcp_win_from_space(sk, rcvbuf)); > + /* Make the window clamp follow along. */ > + WRITE_ONCE(tp->window_clamp, > + tcp_win_from_space(sk, rcvbuf)); > + } > + } else { > + /* Make the window clamp follow along while being bounded > + * by SO_RCVBUF. > + */ > + if (rcvbuf <= sk->sk_rcvbuf) { I do not really understand this part. I am guessing this test will often be false and your problem won't be fixed. You do not handle all sysctl_tcp_adv_win_scale values (positive and negative) I would instead not use "if (rcvbuf <= sk->sk_rcvbuf) {" and instead : else { int clamp = tcp_win_from_space(sk, min(rcvbuf, sk->sk_rcvbuf)); if (clamp > tp->window_clamp) WRITE_ONCE(tp->window_clamp, clamp); }
On 7/26/2024 3:40 AM, Eric Dumazet wrote: > On Thu, Jul 25, 2024 at 11:55 PM Subash Abhinov Kasiviswanathan > <quic_subashab@quicinc.com> wrote: >> >> */ >> >> - if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) && >> - !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) { >> + if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf)) { >> u64 rcvwin, grow; >> int rcvbuf; >> >> @@ -771,12 +770,24 @@ void tcp_rcv_space_adjust(struct sock *sk) >> >> rcvbuf = min_t(u64, tcp_space_from_win(sk, rcvwin), >> READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2])); >> - if (rcvbuf > sk->sk_rcvbuf) { >> - WRITE_ONCE(sk->sk_rcvbuf, rcvbuf); >> + if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) { >> + if (rcvbuf > sk->sk_rcvbuf) { >> + WRITE_ONCE(sk->sk_rcvbuf, rcvbuf); >> >> - /* Make the window clamp follow along. */ >> - WRITE_ONCE(tp->window_clamp, >> - tcp_win_from_space(sk, rcvbuf)); >> + /* Make the window clamp follow along. */ >> + WRITE_ONCE(tp->window_clamp, >> + tcp_win_from_space(sk, rcvbuf)); >> + } >> + } else { >> + /* Make the window clamp follow along while being bounded >> + * by SO_RCVBUF. >> + */ >> + if (rcvbuf <= sk->sk_rcvbuf) { > > I do not really understand this part. > I am guessing this test will often be false and your problem won't be fixed. > You do not handle all sysctl_tcp_adv_win_scale values (positive and negative) > > I would instead not use "if (rcvbuf <= sk->sk_rcvbuf) {" > > and instead : > > else { > int clamp = tcp_win_from_space(sk, min(rcvbuf, sk->sk_rcvbuf)); > > if (clamp > tp->window_clamp) > WRITE_ONCE(tp->window_clamp, clamp); > } Thanks Eric, I've updated this in v2.
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 454362e359da..c8fb029a15a4 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -754,8 +754,7 @@ void tcp_rcv_space_adjust(struct sock *sk) * <prev RTT . ><current RTT .. ><next RTT .... > */ - if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) && - !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) { + if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf)) { u64 rcvwin, grow; int rcvbuf; @@ -771,12 +770,24 @@ void tcp_rcv_space_adjust(struct sock *sk) rcvbuf = min_t(u64, tcp_space_from_win(sk, rcvwin), READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2])); - if (rcvbuf > sk->sk_rcvbuf) { - WRITE_ONCE(sk->sk_rcvbuf, rcvbuf); + if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) { + if (rcvbuf > sk->sk_rcvbuf) { + WRITE_ONCE(sk->sk_rcvbuf, rcvbuf); - /* Make the window clamp follow along. */ - WRITE_ONCE(tp->window_clamp, - tcp_win_from_space(sk, rcvbuf)); + /* Make the window clamp follow along. */ + WRITE_ONCE(tp->window_clamp, + tcp_win_from_space(sk, rcvbuf)); + } + } else { + /* Make the window clamp follow along while being bounded + * by SO_RCVBUF. + */ + if (rcvbuf <= sk->sk_rcvbuf) { + int clamp = tcp_win_from_space(sk, rcvbuf); + + if (clamp > tp->window_clamp) + WRITE_ONCE(tp->window_clamp, clamp); + } } } tp->rcvq_space.space = copied;