[2/2] tcp: fix forever orphan socket caused by tcp_abort

Message ID	20250314092446.852230-2-youngmin.nam@samsung.com (mailing list archive)
State	New
Delegated to:	Netdev Maintainers
Headers	Received: from mailout4.samsung.com (mailout4.samsung.com [203.254.224.34])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 608591E633C
	for <netdev@vger.kernel.org>; Fri, 14 Mar 2025 09:21:40 +0000 (UTC)
	From: Youngmin Nam <youngmin.nam@samsung.com>
	To: stable@vger.kernel.org
	Cc: ncardwell@google.com, edumazet@google.com, kuba@kernel.org,
	davem@davemloft.net, dsahern@kernel.org, pabeni@redhat.com,
	horms@kernel.org, guo88.liu@samsung.com, yiwang.cai@samsung.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	joonki.min@samsung.com, hajun.sung@samsung.com, d7271.choe@samsung.com,
	sw.ju@samsung.com, dujeong.lee@samsung.com, ycheng@google.com,
	yyd@google.com, kuro@kuroa.me, youngmin.nam@samsung.com,
	cmllamas@google.com, willdeacon@google.com, maennich@google.com,
	gregkh@google.com, Lorenzo Colitti <lorenzo@google.com>,
	Jason Xing <kerneljasonxing@gmail.com>
	Subject: [PATCH 2/2] tcp: fix forever orphan socket caused by tcp_abort
	Date: Fri, 14 Mar 2025 18:24:46 +0900
	Message-Id: <20250314092446.852230-2-youngmin.nam@samsung.com>
	In-Reply-To: <20250314092446.852230-1-youngmin.nam@samsung.com>
	Precedence: bulk
	MIME-Version: 1.0
	Content-Transfer-Encoding: 8bit
	Content-Type: text/plain; charset="utf-8"
	CMS-TYPE: 102P
	DLP-Filter: Pass
	References: <20250314092446.852230-1-youngmin.nam@samsung.com>
	<CGME20250314092130epcas2p34e60b23ff983fe03195820a38fb376c5@epcas2p3.samsung.com>
Series	[1/2] tcp: fix races in tcp_abort()
	[2/2] tcp: fix forever orphan socket caused by tcp_abort

Context	Check	Description
netdev/tree_selection	success	Guessing tree name failed - patch did not apply

Commit Message

Youngmin Nam March 14, 2025, 9:24 a.m. UTC

From: Xueming Feng <kuro@kuroa.me>

commit bac76cf89816bff06c4ec2f3df97dc34e150a1c4 upstream.

We have had problems closing zero-window FIN-WAIT-1 TCP sockets in our
environment. This patch comes from the investigation.

Previously, tcp_abort() only sent a reset and called tcp_done() when the
socket was not SOCK_DEAD, i.e. not an orphan. For an orphan socket, it
only purged the write queue; it did not close the socket and left that
to the timer.

While purging the write queue, tp->packets_out and sk->sk_write_queue
are cleared along the way. However, tcp_retransmit_timer() returns
early based on !tp->packets_out, and tcp_probe_timer() returns early
based on !sk->sk_write_queue.

As a result, ICSK_TIME_RETRANS and ICSK_TIME_PROBE0 are never
rescheduled and the socket is never killed by the timers, which turns a
zero-window orphan into a forever orphan.
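
The effect of the two early returns can be sketched with a small
userspace toy model. This is not kernel code: every toy_* name, the
retry limit, and the shared expiry counter are assumptions made purely
for illustration. It only shows that once the queue state is purged,
neither timer re-arms, so nothing is left to close the socket.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model, not kernel code. Every toy_* name is hypothetical. */
struct toy_orphan {
	int packets_out;	/* stands in for tp->packets_out */
	int write_queue_len;	/* stands in for the length of sk->sk_write_queue */
	int retries;		/* timer expiries counted so far */
	bool closed;		/* set once a timer gives up and kills the socket */
};

enum { TOY_MAX_RETRIES = 3 };	/* arbitrary limit chosen for the toy */

/* Shared tail of both toy timers: count an expiry and kill the socket
 * once the limit is exceeded. Returns true if the timer re-arms. */
static bool toy_timer_expire(struct toy_orphan *sk)
{
	if (++sk->retries > TOY_MAX_RETRIES)
		sk->closed = true;
	return !sk->closed;
}

/* Mirrors the early return on !tp->packets_out described above. */
static bool toy_retransmit_timer(struct toy_orphan *sk)
{
	if (!sk->packets_out)
		return false;	/* nothing in flight: timer is not re-armed */
	return toy_timer_expire(sk);
}

/* Mirrors the early return on an empty write queue described above. */
static bool toy_probe_timer(struct toy_orphan *sk)
{
	if (!sk->write_queue_len)
		return false;	/* nothing queued: timer is not re-armed */
	return toy_timer_expire(sk);
}

/* What the old tcp_abort() path did to an orphan: purge, nothing else. */
static void toy_purge_write_queue(struct toy_orphan *sk)
{
	sk->packets_out = 0;
	sk->write_queue_len = 0;
}

/* Fire both timers until neither re-arms; report whether the socket died. */
static bool toy_run_timers(struct toy_orphan *sk)
{
	bool armed = true;

	while (armed && !sk->closed) {
		bool rto = toy_retransmit_timer(sk);
		bool probe = toy_probe_timer(sk);

		armed = rto || probe;
	}
	return sk->closed;
}

/* Usage sketch: a zero-window orphan with and without the purge. */
void toy_demo(void)
{
	struct toy_orphan left_alone = { .packets_out = 0, .write_queue_len = 2 };
	struct toy_orphan purged = left_alone;

	assert(toy_run_timers(&left_alone));	/* probe timer eventually kills it */
	toy_purge_write_queue(&purged);
	assert(!toy_run_timers(&purged));	/* no timer re-arms: forever orphan */
}
```

In the model, the orphan that keeps its queue is eventually closed by
the probe timer, while the purged one is never closed.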

This patch removes the SOCK_DEAD check in tcp_abort(), so that it sends
a reset to the peer and closes the socket accordingly, preventing the
timer-less orphan from occurring.

According to Lorenzo's email in the v1 thread, the check was there to
prevent force-closing the same socket twice. That situation is now
handled by testing for TCP_CLOSE under the socket lock and returning
-ENOENT if the socket is already closed.
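
As a rough illustration of that guard, here is a userspace sketch under
stated assumptions: the mini_* names and the reduced state enum are
hypothetical, locking is only indicated by comments, and the toy sends a
reset for every non-closed state, whereas the kernel decides that with
tcp_need_reset(). It shows a second abort on the same socket returning
-ENOENT without sending another reset.

```c
#include <assert.h>
#include <errno.h>

/* Toy userspace model, not kernel code. Every mini_* name is hypothetical. */
enum mini_state { MINI_ESTABLISHED, MINI_FIN_WAIT1, MINI_CLOSE };

struct mini_sock {
	enum mini_state state;
	int resets_sent;	/* how many resets this socket has emitted */
};

/* Models the patched control flow: the closed-state test comes first,
 * so orphans are no longer skipped and double closes are still refused. */
int mini_abort(struct mini_sock *sk)
{
	/* lock_sock(sk) would be taken here in the kernel */
	if (sk->state == MINI_CLOSE)
		return -ENOENT;	/* already closed: nothing to do */

	sk->resets_sent++;	/* toy: always reset a non-closed socket */
	sk->state = MINI_CLOSE;
	/* release_sock(sk) would happen here in the kernel */
	return 0;
}

/* Usage sketch: abort the same FIN-WAIT-1 socket twice. */
void mini_demo(void)
{
	struct mini_sock sk = { .state = MINI_FIN_WAIT1, .resets_sent = 0 };

	assert(mini_abort(&sk) == 0);		/* first abort closes it */
	assert(mini_abort(&sk) == -ENOENT);	/* second abort is refused */
	assert(sk.resets_sent == 1);		/* only one reset went out */
}
```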

The -ENOENT code comes from the associated patch Lorenzo made for
iproute2-ss (link attached below), which also conforms to RFC 9293.

At the end of the patch, tcp_write_queue_purge(sk) is removed because it
is already called in tcp_done_with_error().

p.s. This is the same patch as v2. Resent because it was mislabeled
"changes requested" on patchwork.kernel.org.

Link: https://patchwork.ozlabs.org/project/netdev/patch/1450773094-7978-3-git-send-email-lorenzo@google.com/
Fixes: c1e64e298b8c ("net: diag: Support destroying TCP sockets.")
Signed-off-by: Xueming Feng <kuro@kuroa.me>
Tested-by: Lorenzo Colitti <lorenzo@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240826102327.1461482-1-kuro@kuroa.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cc: <stable@vger.kernel.org> # v5.10+
Link: https://lore.kernel.org/lkml/Z9OZS%2Fhc+v5og6%2FU@perf/
[youngmin: Resolved minor conflict in net/ipv4/tcp.c]
Signed-off-by: Youngmin Nam <youngmin.nam@samsung.com>
---
 net/ipv4/tcp.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

Comments

Greg KH March 14, 2025, 12:24 p.m. UTC | #1

On Fri, Mar 14, 2025 at 06:24:46PM +0900, Youngmin Nam wrote:
> Cc: <stable@vger.kernel.org> # v5.10+

Does not apply to 6.1.y or older, what did you want this applied to?

thanks,

greg k-h

Youngmin Nam March 17, 2025, 4:32 a.m. UTC | #2

On Fri, Mar 14, 2025 at 01:24:26PM +0100, Greg KH wrote:
> On Fri, Mar 14, 2025 at 06:24:46PM +0900, Youngmin Nam wrote:
> > Cc: <stable@vger.kernel.org> # v5.10+
> 
> Does not apply to 6.1.y or older, what did you want this applied to?
> 
> thanks,
> 
> greg k-h
> 
Hi Greg,

Sorry about that. Let me resend these patches for 6.1 and 5.15.

As for 5.10, it seems to have more dependencies for the backport.
I think the maintainer should handle it to ensure a safe backport.

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 9fe164aa185c..ff22060f9145 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4620,6 +4620,13 @@  int tcp_abort(struct sock *sk, int err)
 		/* Don't race with userspace socket closes such as tcp_close. */
 		lock_sock(sk);
 
+	/* Avoid closing the same socket twice. */
+	if (sk->sk_state == TCP_CLOSE) {
+		if (!has_current_bpf_ctx())
+			release_sock(sk);
+		return -ENOENT;
+	}
+
 	if (sk->sk_state == TCP_LISTEN) {
 		tcp_set_state(sk, TCP_CLOSE);
 		inet_csk_listen_stop(sk);
@@ -4629,15 +4636,12 @@  int tcp_abort(struct sock *sk, int err)
 	local_bh_disable();
 	bh_lock_sock(sk);
 
-	if (!sock_flag(sk, SOCK_DEAD)) {
-		if (tcp_need_reset(sk->sk_state))
-			tcp_send_active_reset(sk, GFP_ATOMIC);
-		tcp_done_with_error(sk, err);
-	}
+	if (tcp_need_reset(sk->sk_state))
+		tcp_send_active_reset(sk, GFP_ATOMIC);
+	tcp_done_with_error(sk, err);
 
 	bh_unlock_sock(sk);
 	local_bh_enable();
-	tcp_write_queue_purge(sk);
 	if (!has_current_bpf_ctx())
 		release_sock(sk);
 	return 0;
