Message ID | 20240823001152.31004-1-kerneljasonxing@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 0d9e5df4a257afc3a471a82961ace9a22b88295a |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [v4,net-next] tcp: avoid reusing FIN_WAIT2 when trying to find port in connect() process | expand |
On Fri, Aug 23, 2024 at 2:12 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > From: Jason Xing <kernelxing@tencent.com> > > We found that one close-wait socket was reset by the other side > due to a new connection reusing the same port which is beyond our > expectation, so we have to investigate the underlying reason. > > The following experiment is conducted in the test environment. We > limit the port range from 40000 to 40010 and delay the time to close() > after receiving a fin from the active close side, which can help us > easily reproduce like what happened in production. > > Here are three connections captured by tcpdump: > 127.0.0.1.40002 > 127.0.0.1.9999: Flags [S], seq 2965525191 > 127.0.0.1.9999 > 127.0.0.1.40002: Flags [S.], seq 2769915070 > 127.0.0.1.40002 > 127.0.0.1.9999: Flags [.], ack 1 > 127.0.0.1.40002 > 127.0.0.1.9999: Flags [F.], seq 1, ack 1 > // a few seconds later, within 60 seconds > 127.0.0.1.40002 > 127.0.0.1.9999: Flags [S], seq 2965590730 > 127.0.0.1.9999 > 127.0.0.1.40002: Flags [.], ack 2 > 127.0.0.1.40002 > 127.0.0.1.9999: Flags [R], seq 2965525193 > // later, very quickly > 127.0.0.1.40002 > 127.0.0.1.9999: Flags [S], seq 2965590730 > 127.0.0.1.9999 > 127.0.0.1.40002: Flags [S.], seq 3120990805 > 127.0.0.1.40002 > 127.0.0.1.9999: Flags [.], ack 1 > > As we can see, the first flow is reset because: > 1) client starts a new connection, I mean, the second one > 2) client tries to find a suitable port which is a timewait socket > (its state is timewait, substate is fin_wait2) > 3) client occupies that timewait port to send a SYN > 4) server finds a corresponding close-wait socket in ehash table, > then replies with a challenge ack > 5) client sends an RST to terminate this old close-wait socket. > > I don't think the port selection algo can choose a FIN_WAIT2 socket > when we turn on tcp_tw_reuse because on the server side there > remain unread data. In some cases, if one side haven't call close() yet, > we should not consider it as expendable and treat it at will. > > Even though, sometimes, the server isn't able to call close() as soon > as possible like what we expect, it can not be terminated easily, > especially due to a second unrelated connection happening. > > After this patch, we can see the expected failure if we start a > connection when all the ports are occupied in fin_wait2 state: > "Ncat: Cannot assign requested address." > > Reported-by: Jade Dong <jadedong@tencent.com> > Signed-off-by: Jason Xing <kernelxing@tencent.com> > --- Reviewed-by: Eric Dumazet <edumazet@google.com>
Hello: This patch was applied to netdev/net-next.git (main) by Jakub Kicinski <kuba@kernel.org>: On Fri, 23 Aug 2024 08:11:52 +0800 you wrote: > From: Jason Xing <kernelxing@tencent.com> > > We found that one close-wait socket was reset by the other side > due to a new connection reusing the same port which is beyond our > expectation, so we have to investigate the underlying reason. > > The following experiment is conducted in the test environment. We > limit the port range from 40000 to 40010 and delay the time to close() > after receiving a fin from the active close side, which can help us > easily reproduce like what happened in production. > > [...] Here is the summary with links: - [v4,net-next] tcp: avoid reusing FIN_WAIT2 when trying to find port in connect() process https://git.kernel.org/netdev/net-next/c/0d9e5df4a257 You are awesome, thank you!
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index fd17f25ff288..9cdf6e7c44d9 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -118,6 +118,9 @@ int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp) struct tcp_sock *tp = tcp_sk(sk); int ts_recent_stamp; + if (tw->tw_substate == TCP_FIN_WAIT2) + reuse = 0; + if (reuse == 2) { /* Still does not detect *everything* that goes through * lo, since we require a loopback src or dst address