diff mbox series

[net] tcp: avoid too many retransmit packets

Message ID 20240710001402.2758273-1-edumazet@google.com (mailing list archive)
State Accepted
Commit 97a9063518f198ec0adb2ecb89789de342bb8283
Delegated to: Netdev Maintainers
Headers show
Series [net] tcp: avoid too many retransmit packets | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 833 this patch: 833
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 1 maintainers not CCed: dsahern@kernel.org
netdev/build_clang success Errors and warnings before: 835 this patch: 835
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 835 this patch: 835
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 28 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 3 this patch: 3
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-07-10--21-00 (tests: 693)

Commit Message

Eric Dumazet July 10, 2024, 12:14 a.m. UTC
If a TCP socket is using TCP_USER_TIMEOUT, and the other peer
retracted its window to zero, tcp_retransmit_timer() can
retransmit a packet every two jiffies (2 ms for HZ=1000),
for about 4 minutes after TCP_USER_TIMEOUT has 'expired'.

The fix is to make sure tcp_rtx_probe0_timed_out() takes
icsk->icsk_user_timeout into account.

Before blamed commit, the socket would not timeout after
icsk->icsk_user_timeout, but would use standard exponential
backoff for the retransmits.

Also worth noting that before commit e89688e3e978 ("net: tcp:
fix unexcepted socket die when snd_wnd is 0"), the issue
would last 2 minutes instead of 4.

Fixes: b701a99e431d ("tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jon Maxwell <jmaxwell37@gmail.com>
Cc: Neal Cardwell <ncardwell@google.com>
---
 net/ipv4/tcp_timer.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

Comments

Jason Xing July 10, 2024, 1:27 a.m. UTC | #1
On Wed, Jul 10, 2024 at 8:14 AM Eric Dumazet <edumazet@google.com> wrote:
>
> If a TCP socket is using TCP_USER_TIMEOUT, and the other peer
> retracted its window to zero, tcp_retransmit_timer() can
> retransmit a packet every two jiffies (2 ms for HZ=1000),
> for about 4 minutes after TCP_USER_TIMEOUT has 'expired'.
>
> The fix is to make sure tcp_rtx_probe0_timed_out() takes
> icsk->icsk_user_timeout into account.
>
> Before blamed commit, the socket would not timeout after
> icsk->icsk_user_timeout, but would use standard exponential
> backoff for the retransmits.
>
> Also worth noting that before commit e89688e3e978 ("net: tcp:
> fix unexcepted socket die when snd_wnd is 0"), the issue
> would last 2 minutes instead of 4.
>
> Fixes: b701a99e431d ("tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Jon Maxwell <jmaxwell37@gmail.com>
> Cc: Neal Cardwell <ncardwell@google.com>

Interesting case. Thanks!

Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>

> ---
>  net/ipv4/tcp_timer.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
> index db9d826560e57caf8274d1b7253c7af4dd7821a0..892c86657fbc243ce53a939157b77f1fe0410097 100644
> --- a/net/ipv4/tcp_timer.c
> +++ b/net/ipv4/tcp_timer.c
> @@ -483,15 +483,26 @@ static bool tcp_rtx_probe0_timed_out(const struct sock *sk,
>                                      const struct sk_buff *skb,
>                                      u32 rtx_delta)
>  {
> +       const struct inet_connection_sock *icsk = inet_csk(sk);
> +       u32 user_timeout = READ_ONCE(icsk->icsk_user_timeout);
>         const struct tcp_sock *tp = tcp_sk(sk);
> -       const int timeout = TCP_RTO_MAX * 2;
> +       int timeout = TCP_RTO_MAX * 2;
>         s32 rcv_delta;
>
> +       if (user_timeout) {
> +               /* If user application specified a TCP_USER_TIMEOUT,
> +                * it does not want win 0 packets to 'reset the timer'
> +                * while retransmits are not making progress.
> +                */
> +               if (rtx_delta > user_timeout)
> +                       return true;
> +               timeout = min_t(u32, timeout, msecs_to_jiffies(user_timeout));
> +       }
>         /* Note: timer interrupt might have been delayed by at least one jiffy,
>          * and tp->rcv_tstamp might very well have been written recently.
>          * rcv_delta can thus be negative.
>          */
> -       rcv_delta = inet_csk(sk)->icsk_timeout - tp->rcv_tstamp;
> +       rcv_delta = icsk->icsk_timeout - tp->rcv_tstamp;
>         if (rcv_delta <= timeout)
>                 return false;
>
> --
> 2.45.2.993.g49e7a77208-goog
>
>
Jonathan Maxwell July 10, 2024, 1:43 a.m. UTC | #2
On Wed, Jul 10, 2024 at 10:14 AM Eric Dumazet <edumazet@google.com> wrote:
>
> If a TCP socket is using TCP_USER_TIMEOUT, and the other peer
> retracted its window to zero, tcp_retransmit_timer() can
> retransmit a packet every two jiffies (2 ms for HZ=1000),
> for about 4 minutes after TCP_USER_TIMEOUT has 'expired'.
>
> The fix is to make sure tcp_rtx_probe0_timed_out() takes
> icsk->icsk_user_timeout into account.
>
> Before blamed commit, the socket would not timeout after
> icsk->icsk_user_timeout, but would use standard exponential
> backoff for the retransmits.
>
> Also worth noting that before commit e89688e3e978 ("net: tcp:
> fix unexcepted socket die when snd_wnd is 0"), the issue
> would last 2 minutes instead of 4.
>
> Fixes: b701a99e431d ("tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Jon Maxwell <jmaxwell37@gmail.com>
> Cc: Neal Cardwell <ncardwell@google.com>

Nice catch Eric. Thanks for fixing LGTM.

Reviewed-by: Jon Maxwell <jmaxwell37@gmail.com>



> ---
>  net/ipv4/tcp_timer.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
> index db9d826560e57caf8274d1b7253c7af4dd7821a0..892c86657fbc243ce53a939157b77f1fe0410097 100644
> --- a/net/ipv4/tcp_timer.c
> +++ b/net/ipv4/tcp_timer.c
> @@ -483,15 +483,26 @@ static bool tcp_rtx_probe0_timed_out(const struct sock *sk,
>                                      const struct sk_buff *skb,
>                                      u32 rtx_delta)
>  {
> +       const struct inet_connection_sock *icsk = inet_csk(sk);
> +       u32 user_timeout = READ_ONCE(icsk->icsk_user_timeout);
>         const struct tcp_sock *tp = tcp_sk(sk);
> -       const int timeout = TCP_RTO_MAX * 2;
> +       int timeout = TCP_RTO_MAX * 2;
>         s32 rcv_delta;
>
> +       if (user_timeout) {
> +               /* If user application specified a TCP_USER_TIMEOUT,
> +                * it does not want win 0 packets to 'reset the timer'
> +                * while retransmits are not making progress.
> +                */
> +               if (rtx_delta > user_timeout)
> +                       return true;
> +               timeout = min_t(u32, timeout, msecs_to_jiffies(user_timeout));
> +       }
>         /* Note: timer interrupt might have been delayed by at least one jiffy,
>          * and tp->rcv_tstamp might very well have been written recently.
>          * rcv_delta can thus be negative.
>          */
> -       rcv_delta = inet_csk(sk)->icsk_timeout - tp->rcv_tstamp;
> +       rcv_delta = icsk->icsk_timeout - tp->rcv_tstamp;
>         if (rcv_delta <= timeout)
>                 return false;
>
> --
> 2.45.2.993.g49e7a77208-goog
>
Kuniyuki Iwashima July 10, 2024, 10:18 p.m. UTC | #3
From: Eric Dumazet <edumazet@google.com>
Date: Wed, 10 Jul 2024 00:14:01 +0000
> If a TCP socket is using TCP_USER_TIMEOUT, and the other peer
> retracted its window to zero, tcp_retransmit_timer() can
> retransmit a packet every two jiffies (2 ms for HZ=1000),
> for about 4 minutes after TCP_USER_TIMEOUT has 'expired'.
> 
> The fix is to make sure tcp_rtx_probe0_timed_out() takes
> icsk->icsk_user_timeout into account.
> 
> Before blamed commit, the socket would not timeout after
> icsk->icsk_user_timeout, but would use standard exponential
> backoff for the retransmits.
> 
> Also worth noting that before commit e89688e3e978 ("net: tcp:
> fix unexcepted socket die when snd_wnd is 0"), the issue
> would last 2 minutes instead of 4.
> 
> Fixes: b701a99e431d ("tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy")
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
patchwork-bot+netdevbpf@kernel.org July 11, 2024, 2:20 a.m. UTC | #4
Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 10 Jul 2024 00:14:01 +0000 you wrote:
> If a TCP socket is using TCP_USER_TIMEOUT, and the other peer
> retracted its window to zero, tcp_retransmit_timer() can
> retransmit a packet every two jiffies (2 ms for HZ=1000),
> for about 4 minutes after TCP_USER_TIMEOUT has 'expired'.
> 
> The fix is to make sure tcp_rtx_probe0_timed_out() takes
> icsk->icsk_user_timeout into account.
> 
> [...]

Here is the summary with links:
  - [net] tcp: avoid too many retransmit packets
    https://git.kernel.org/netdev/net/c/97a9063518f1

You are awesome, thank you!
diff mbox series

Patch

diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index db9d826560e57caf8274d1b7253c7af4dd7821a0..892c86657fbc243ce53a939157b77f1fe0410097 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -483,15 +483,26 @@  static bool tcp_rtx_probe0_timed_out(const struct sock *sk,
 				     const struct sk_buff *skb,
 				     u32 rtx_delta)
 {
+	const struct inet_connection_sock *icsk = inet_csk(sk);
+	u32 user_timeout = READ_ONCE(icsk->icsk_user_timeout);
 	const struct tcp_sock *tp = tcp_sk(sk);
-	const int timeout = TCP_RTO_MAX * 2;
+	int timeout = TCP_RTO_MAX * 2;
 	s32 rcv_delta;
 
+	if (user_timeout) {
+		/* If user application specified a TCP_USER_TIMEOUT,
+		 * it does not want win 0 packets to 'reset the timer'
+		 * while retransmits are not making progress.
+		 */
+		if (rtx_delta > user_timeout)
+			return true;
+		timeout = min_t(u32, timeout, msecs_to_jiffies(user_timeout));
+	}
 	/* Note: timer interrupt might have been delayed by at least one jiffy,
 	 * and tp->rcv_tstamp might very well have been written recently.
 	 * rcv_delta can thus be negative.
 	 */
-	rcv_delta = inet_csk(sk)->icsk_timeout - tp->rcv_tstamp;
+	rcv_delta = icsk->icsk_timeout - tp->rcv_tstamp;
 	if (rcv_delta <= timeout)
 		return false;