Message ID | 20250217232905.3162187-1-kuba@kernel.org (mailing list archive)
---|---
State | Accepted |
Commit | f5da7c45188eea71394bf445655cae2df88a7788 |
Delegated to: | Netdev Maintainers |
Series | [net] tcp: adjust rcvq_space after updating scaling ratio
On Tue, Feb 18, 2025 at 12:29 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Since commit under Fixes we set the window clamp in accordance
> to newly measured rcvbuf scaling_ratio. If the scaling_ratio
> decreased significantly we may put ourselves in a situation
> where windows become smaller than rcvq_space, preventing
> tcp_rcv_space_adjust() from increasing rcvbuf.
>
> The significant decrease of scaling_ratio is far more likely
> since commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio"),
> which increased the "default" scaling ratio from ~30% to 50%.
>
> Hitting the bad condition depends a lot on TCP tuning, and
> drivers at play. One of Meta's workloads hits it reliably
> under following conditions:
> - default rcvbuf of 125k
> - sender MTU 1500, receiver MTU 5000
> - driver settles on scaling_ratio of 78 for the config above.
> Initial rcvq_space gets calculated as TCP_INIT_CWND * tp->advmss
> (10 * 5k = 50k). Once we find out the true scaling ratio and
> MSS we clamp the windows to 38k. Triggering the condition also
> depends on the message sequence of this workload. I can't repro
> the problem with simple iperf or TCP_RR-style tests.
>
> Fixes: a2cbb1603943 ("tcp: Update window clamping condition")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Reviewed-by: Eric Dumazet <edumazet@google.com>

Thanks !
On Mon, Feb 17, 2025 at 6:29 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Since commit under Fixes we set the window clamp in accordance
> to newly measured rcvbuf scaling_ratio. If the scaling_ratio
> decreased significantly we may put ourselves in a situation
> where windows become smaller than rcvq_space, preventing
> tcp_rcv_space_adjust() from increasing rcvbuf.
>
> The significant decrease of scaling_ratio is far more likely
> since commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio"),
> which increased the "default" scaling ratio from ~30% to 50%.
>
> Hitting the bad condition depends a lot on TCP tuning, and
> drivers at play. One of Meta's workloads hits it reliably
> under following conditions:
> - default rcvbuf of 125k
> - sender MTU 1500, receiver MTU 5000
> - driver settles on scaling_ratio of 78 for the config above.
> Initial rcvq_space gets calculated as TCP_INIT_CWND * tp->advmss
> (10 * 5k = 50k). Once we find out the true scaling ratio and
> MSS we clamp the windows to 38k. Triggering the condition also
> depends on the message sequence of this workload. I can't repro
> the problem with simple iperf or TCP_RR-style tests.
>
> Fixes: a2cbb1603943 ("tcp: Update window clamping condition")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Reviewed-by: Neal Cardwell <ncardwell@google.com>

Thanks, Jakub!

neal
Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 17 Feb 2025 15:29:05 -0800 you wrote:
> Since commit under Fixes we set the window clamp in accordance
> to newly measured rcvbuf scaling_ratio. If the scaling_ratio
> decreased significantly we may put ourselves in a situation
> where windows become smaller than rcvq_space, preventing
> tcp_rcv_space_adjust() from increasing rcvbuf.
>
> The significant decrease of scaling_ratio is far more likely
> since commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio"),
> which increased the "default" scaling ratio from ~30% to 50%.
>
> [...]

Here is the summary with links:
  - [net] tcp: adjust rcvq_space after updating scaling ratio
    https://git.kernel.org/netdev/net/c/f5da7c45188e

You are awesome, thank you!
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 074406890552..6467716820ff 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -243,9 +243,15 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
 		do_div(val, skb->truesize);
 		tcp_sk(sk)->scaling_ratio = val ? val : 1;
 
-		if (old_ratio != tcp_sk(sk)->scaling_ratio)
-			WRITE_ONCE(tcp_sk(sk)->window_clamp,
-				   tcp_win_from_space(sk, sk->sk_rcvbuf));
+		if (old_ratio != tcp_sk(sk)->scaling_ratio) {
+			struct tcp_sock *tp = tcp_sk(sk);
+
+			val = tcp_win_from_space(sk, sk->sk_rcvbuf);
+			tcp_set_window_clamp(sk, val);
+
+			if (tp->window_clamp < tp->rcvq_space.space)
+				tp->rcvq_space.space = tp->window_clamp;
+		}
 	}
 	icsk->icsk_ack.rcv_mss = min_t(unsigned int, len,
 				       tcp_sk(sk)->advmss);
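A note for readers following the diff: tcp_win_from_space() converts a buffer size into a usable window by applying scaling_ratio, which is kept as a fraction of 256 (TCP_RMEM_TO_WIN_SCALE is 8). The sketch below is a standalone userspace illustration of the before/after behaviour, not kernel code; the rcvbuf, advmss and ratio values are simply the ones quoted in the commit message.

#include <stdio.h>
#include <stdint.h>

#define TCP_RMEM_TO_WIN_SCALE	8	/* scaling_ratio is a fraction of 256 */

/* Userspace stand-in for tcp_win_from_space(): usable window for a
 * given buffer size at a given scaling_ratio. Illustration only. */
static long win_from_space(uint8_t scaling_ratio, long space)
{
	return ((int64_t)space * scaling_ratio) >> TCP_RMEM_TO_WIN_SCALE;
}

int main(void)
{
	long rcvbuf = 125000;			 /* "default rcvbuf of 125k" */
	long rcvq_space = 10 * 5000;		 /* TCP_INIT_CWND * advmss = 50k */
	long clamp = win_from_space(78, rcvbuf); /* driver settled on ratio 78 */

	printf("window_clamp after remeasure: %ld\n", clamp);	/* ~38k */

	/* The two lines the patch adds, in miniature: without them,
	 * rcvq_space (50k) stays above the clamp (~38k) and
	 * tcp_rcv_space_adjust() never grows rcvbuf again. */
	if (clamp < rcvq_space)
		rcvq_space = clamp;

	printf("rcvq_space after fix: %ld\n", rcvq_space);
	return 0;
}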
Since commit under Fixes we set the window clamp in accordance
to newly measured rcvbuf scaling_ratio. If the scaling_ratio
decreased significantly we may put ourselves in a situation
where windows become smaller than rcvq_space, preventing
tcp_rcv_space_adjust() from increasing rcvbuf.

The significant decrease of scaling_ratio is far more likely
since commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio"),
which increased the "default" scaling ratio from ~30% to 50%.

Hitting the bad condition depends a lot on TCP tuning, and
drivers at play. One of Meta's workloads hits it reliably
under following conditions:
 - default rcvbuf of 125k
 - sender MTU 1500, receiver MTU 5000
 - driver settles on scaling_ratio of 78 for the config above.
Initial rcvq_space gets calculated as TCP_INIT_CWND * tp->advmss
(10 * 5k = 50k). Once we find out the true scaling ratio and
MSS we clamp the windows to 38k. Triggering the condition also
depends on the message sequence of this workload. I can't repro
the problem with simple iperf or TCP_RR-style tests.

Fixes: a2cbb1603943 ("tcp: Update window clamping condition")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: ncardwell@google.com
CC: kuniyu@amazon.com
CC: hli@netflix.com
CC: quic_stranche@quicinc.com
CC: quic_subashab@quicinc.com
---
 net/ipv4/tcp_input.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)
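To connect the numbers in the description: scaling_ratio is the measured payload-to-truesize fraction of received skbs, scaled by 256. Below is a hedged sketch of how a receiver could end up at 78; the 16 KiB of truesize per 5000-byte frame is a hypothetical figure chosen only to reproduce the reported ratio, the patch does not state the driver's buffer layout.

#include <stdio.h>
#include <stdint.h>

/* Rough userspace mirror of the kernel's measurement: payload bytes
 * versus allocated buffer memory, scaled by 256. Illustration only. */
static uint8_t measure_scaling_ratio(uint64_t skb_len, uint64_t truesize)
{
	uint64_t val = (skb_len << 8) / truesize;

	return val ? val : 1;	/* the kernel never lets the ratio reach 0 */
}

int main(void)
{
	/* Hypothetical MTU-5000 receiver whose driver consumes ~16 KiB of
	 * memory per frame: 5000 * 256 / 16384 = 78, the reported ratio. */
	printf("measured ratio: %d\n", measure_scaling_ratio(5000, 16384));

	/* At that ratio the 125k rcvbuf clamps the window to
	 * 125000 * 78 / 256 = 38085 (~38k), below the initial rcvq_space
	 * of TCP_INIT_CWND * advmss = 10 * 5000 = 50k. */
	printf("clamp: %d\n", 125000 * 78 / 256);
	return 0;
}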