diff mbox series

[net-next,1/2] tcp: fix SO_RCVLOWAT related hangs under mem pressure

Message ID 20210212232214.2869897-2-eric.dumazet@gmail.com (mailing list archive)
State Accepted
Delegated to: Netdev Maintainers
Headers show
Series tcp: mem pressure vs SO_RCVLOWAT | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net-next
netdev/subject_prefix success Link
netdev/cc_maintainers warning 1 maintainers not CCed: soheil@google.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 1496 this patch: 1496
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 15 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 1498 this patch: 1498
netdev/header_inline success Link
netdev/stable success Stable not CCed

Commit Message

Eric Dumazet Feb. 12, 2021, 11:22 p.m. UTC
From: Eric Dumazet <edumazet@google.com>

While commit 24adbc1676af ("tcp: fix SO_RCVLOWAT hangs with fat skbs")
fixed an issue with sk_rcvbuf being too small for a given sk_rcvlowat
constraint, it did not address the issue caused by memory pressure.

1) If we are under memory pressure and the socket receive queue is empty,
the first incoming packet is allowed to be queued, after commit
76dfa6082032 ("tcp: allow one skb to be received per socket under memory pressure")

But we do not send EPOLLIN yet, because tcp_data_ready() sees that
sk_rcvlowat is bigger than the skb length.

2) Then, when the next packet comes, it is dropped, and we directly
call sk->sk_data_ready().

3) If the application is using poll(), tcp_poll() will then use
tcp_stream_is_readable() and decide the socket receive queue is
not yet filled, so nothing will happen.

Even when the sender retransmits packets, phases 2) and 3) repeat
and the flow is effectively frozen until memory pressure is off.

The fix is to make tcp_rmem_pressure() consider tcp_under_memory_pressure(),
which takes care of both global memory pressure and memcg pressure.

Fixes: 24adbc1676af ("tcp: fix SO_RCVLOWAT hangs with fat skbs")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Arjun Roy <arjunroy@google.com>
Suggested-by: Wei Wang <weiwan@google.com>
---
 include/net/tcp.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Wei Wang Feb. 13, 2021, 12:23 a.m. UTC | #1
On Fri, Feb 12, 2021 at 3:22 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> From: Eric Dumazet <edumazet@google.com>
>
> While commit 24adbc1676af ("tcp: fix SO_RCVLOWAT hangs with fat skbs")
> fixed an issue vs too small sk_rcvbuf for given sk_rcvlowat constraint,
> it missed to address issue caused by memory pressure.
>
> 1) If we are under memory pressure and socket receive queue is empty.
> First incoming packet is allowed to be queued, after commit
> 76dfa6082032 ("tcp: allow one skb to be received per socket under memory pressure")
>
> But we do not send EPOLLIN yet, in case tcp_data_ready() sees sk_rcvlowat
> is bigger than skb length.
>
> 2) Then, when next packet comes, it is dropped, and we directly
> call sk->sk_data_ready().
>
> 3) If application is using poll(), tcp_poll() will then use
> tcp_stream_is_readable() and decide the socket receive queue is
> not yet filled, so nothing will happen.
>
> Even when sender retransmits packets, phases 2) & 3) repeat
> and flow is effectively frozen, until memory pressure is off.
>
> Fix is to consider tcp_under_memory_pressure() to take care
> of global memory pressure or memcg pressure.
>
> Fixes: 24adbc1676af ("tcp: fix SO_RCVLOWAT hangs with fat skbs")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Arjun Roy <arjunroy@google.com>
> Suggested-by: Wei Wang <weiwan@google.com>
> ---
Nice description in the commit msg!

Reviewed-by: Wei Wang <weiwan@google.com>


Patch

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 25bbada379c46add16fb7239733bd6571f10f680..244208f6f6c2ace87920b633e469421f557427a6 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1431,8 +1431,13 @@  void tcp_cleanup_rbuf(struct sock *sk, int copied);
  */
 static inline bool tcp_rmem_pressure(const struct sock *sk)
 {
-	int rcvbuf = READ_ONCE(sk->sk_rcvbuf);
-	int threshold = rcvbuf - (rcvbuf >> 3);
+	int rcvbuf, threshold;
+
+	if (tcp_under_memory_pressure(sk))
+		return true;
+
+	rcvbuf = READ_ONCE(sk->sk_rcvbuf);
+	threshold = rcvbuf - (rcvbuf >> 3);
 
 	return atomic_read(&sk->sk_rmem_alloc) > threshold;
 }