diff mbox series

[net] tcp: adjust rcvq_space after updating scaling ratio

Message ID 20250217232905.3162187-1-kuba@kernel.org (mailing list archive)
State Accepted
Commit f5da7c45188eea71394bf445655cae2df88a7788
Delegated to: Netdev Maintainers
Headers show
Series [net] tcp: adjust rcvq_space after updating scaling ratio | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 1 maintainers not CCed: dsahern@kernel.org
netdev/build_clang success Errors and warnings before: 1 this patch: 1
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1 this patch: 1
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 18 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 1 this patch: 1
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2025-02-18--00-00 (tests: 891)

Commit Message

Jakub Kicinski Feb. 17, 2025, 11:29 p.m. UTC
Since commit under Fixes we set the window clamp in accordance
to newly measured rcvbuf scaling_ratio. If the scaling_ratio
decreased significantly we may put ourselves in a situation
where windows become smaller than rcvq_space, preventing
tcp_rcv_space_adjust() from increasing rcvbuf.

The significant decrease of scaling_ratio is far more likely
since commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio"),
which increased the "default" scaling ratio from ~30% to 50%.

Hitting the bad condition depends a lot on TCP tuning, and
drivers at play. One of Meta's workloads hits it reliably
under following conditions:
 - default rcvbuf of 125k
 - sender MTU 1500, receiver MTU 5000
 - driver settles on scaling_ratio of 78 for the config above.
Initial rcvq_space gets calculated as TCP_INIT_CWND * tp->advmss
(10 * 5k = 50k). Once we find out the true scaling ratio and
MSS we clamp the windows to 38k. Triggering the condition also
depends on the message sequence of this workload. I can't repro
the problem with simple iperf or TCP_RR-style tests.

Fixes: a2cbb1603943 ("tcp: Update window clamping condition")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: ncardwell@google.com
CC: kuniyu@amazon.com
CC: hli@netflix.com
CC: quic_stranche@quicinc.com
CC: quic_subashab@quicinc.com
---
 net/ipv4/tcp_input.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

Comments

Eric Dumazet Feb. 18, 2025, 2:10 p.m. UTC | #1
On Tue, Feb 18, 2025 at 12:29 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Since commit under Fixes we set the window clamp in accordance
> to newly measured rcvbuf scaling_ratio. If the scaling_ratio
> decreased significantly we may put ourselves in a situation
> where windows become smaller than rcvq_space, preventing
> tcp_rcv_space_adjust() from increasing rcvbuf.
>
> The significant decrease of scaling_ratio is far more likely
> since commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio"),
> which increased the "default" scaling ratio from ~30% to 50%.
>
> Hitting the bad condition depends a lot on TCP tuning, and
> drivers at play. One of Meta's workloads hits it reliably
> under following conditions:
>  - default rcvbuf of 125k
>  - sender MTU 1500, receiver MTU 5000
>  - driver settles on scaling_ratio of 78 for the config above.
> Initial rcvq_space gets calculated as TCP_INIT_CWND * tp->advmss
> (10 * 5k = 50k). Once we find out the true scaling ratio and
> MSS we clamp the windows to 38k. Triggering the condition also
> depends on the message sequence of this workload. I can't repro
> the problem with simple iperf or TCP_RR-style tests.
>
> Fixes: a2cbb1603943 ("tcp: Update window clamping condition")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Reviewed-by: Eric Dumazet <edumazet@google.com>

Thanks !
Neal Cardwell Feb. 18, 2025, 2:14 p.m. UTC | #2
On Mon, Feb 17, 2025 at 6:29 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Since commit under Fixes we set the window clamp in accordance
> to newly measured rcvbuf scaling_ratio. If the scaling_ratio
> decreased significantly we may put ourselves in a situation
> where windows become smaller than rcvq_space, preventing
> tcp_rcv_space_adjust() from increasing rcvbuf.
>
> The significant decrease of scaling_ratio is far more likely
> since commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio"),
> which increased the "default" scaling ratio from ~30% to 50%.
>
> Hitting the bad condition depends a lot on TCP tuning, and
> drivers at play. One of Meta's workloads hits it reliably
> under following conditions:
>  - default rcvbuf of 125k
>  - sender MTU 1500, receiver MTU 5000
>  - driver settles on scaling_ratio of 78 for the config above.
> Initial rcvq_space gets calculated as TCP_INIT_CWND * tp->advmss
> (10 * 5k = 50k). Once we find out the true scaling ratio and
> MSS we clamp the windows to 38k. Triggering the condition also
> depends on the message sequence of this workload. I can't repro
> the problem with simple iperf or TCP_RR-style tests.
>
> Fixes: a2cbb1603943 ("tcp: Update window clamping condition")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Reviewed-by: Neal Cardwell <ncardwell@google.com>

Thanks, Jakub!

neal
patchwork-bot+netdevbpf@kernel.org Feb. 19, 2025, 12:10 a.m. UTC | #3
Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 17 Feb 2025 15:29:05 -0800 you wrote:
> Since commit under Fixes we set the window clamp in accordance
> to newly measured rcvbuf scaling_ratio. If the scaling_ratio
> decreased significantly we may put ourselves in a situation
> where windows become smaller than rcvq_space, preventing
> tcp_rcv_space_adjust() from increasing rcvbuf.
> 
> The significant decrease of scaling_ratio is far more likely
> since commit 697a6c8cec03 ("tcp: increase the default TCP scaling ratio"),
> which increased the "default" scaling ratio from ~30% to 50%.
> 
> [...]

Here is the summary with links:
  - [net] tcp: adjust rcvq_space after updating scaling ratio
    https://git.kernel.org/netdev/net/c/f5da7c45188e

You are awesome, thank you!
diff mbox series

Patch

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 074406890552..6467716820ff 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -243,9 +243,15 @@  static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
 			do_div(val, skb->truesize);
 			tcp_sk(sk)->scaling_ratio = val ? val : 1;
 
-			if (old_ratio != tcp_sk(sk)->scaling_ratio)
-				WRITE_ONCE(tcp_sk(sk)->window_clamp,
-					   tcp_win_from_space(sk, sk->sk_rcvbuf));
+			if (old_ratio != tcp_sk(sk)->scaling_ratio) {
+				struct tcp_sock *tp = tcp_sk(sk);
+
+				val = tcp_win_from_space(sk, sk->sk_rcvbuf);
+				tcp_set_window_clamp(sk, val);
+
+				if (tp->window_clamp < tp->rcvq_space.space)
+					tp->rcvq_space.space = tp->window_clamp;
+			}
 		}
 		icsk->icsk_ack.rcv_mss = min_t(unsigned int, len,
 					       tcp_sk(sk)->advmss);