Message ID | 20240528125253.1966136-3-edumazet@google.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 853c3bd7b7917670224c9fe5245bd045cac411dd |
Delegated to: | Netdev Maintainers |
Series | tcp: fix tcp_poll() races |
On Tue, May 28, 2024 at 8:53 AM Eric Dumazet <edumazet@google.com> wrote:
>
> I noticed flakes in a packetdrill test, expecting an epoll_wait()
> to return EPOLLERR | EPOLLHUP on a failed connect() attempt,
> after multiple SYN retransmits. It sometimes returns EPOLLERR only.
>
> The issue is that tcp_write_err():
> 1) writes an error in sk->sk_err,
> 2) calls sk_error_report(),
> 3) then calls tcp_done().
>
> tcp_done() is writing SHUTDOWN_MASK into sk->sk_shutdown,
> among other things.
>
> The problem is that the awakened user thread (from 2) sk_error_report())
> might call tcp_poll() before tcp_done() has written sk->sk_shutdown.
>
> tcp_poll() only sees a non-zero sk->sk_err and returns EPOLLERR.
>
> This patch fixes the issue by making sure to call sk_error_report()
> after tcp_done().
>
> tcp_write_err() also lacks an smp_wmb().
>
> We can reuse tcp_done_with_error() to factor out the details,
> as Neal suggested.
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---

Acked-by: Neal Cardwell <ncardwell@google.com>

Thanks, Eric!

neal
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 83fe7f62f7f10ab111512a3ef15a97a04c79cb4a..3e8604ae7d06c5b010a2034e3295675a7d358f13 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -74,11 +74,7 @@ u32 tcp_clamp_probe0_to_user_timeout(const struct sock *sk, u32 when)
 
 static void tcp_write_err(struct sock *sk)
 {
-	WRITE_ONCE(sk->sk_err, READ_ONCE(sk->sk_err_soft) ? : ETIMEDOUT);
-	sk_error_report(sk);
-
-	tcp_write_queue_purge(sk);
-	tcp_done(sk);
+	tcp_done_with_error(sk, READ_ONCE(sk->sk_err_soft) ? : ETIMEDOUT);
 	__NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPABORTONTIMEOUT);
 }
 
I noticed flakes in a packetdrill test, expecting an epoll_wait()
to return EPOLLERR | EPOLLHUP on a failed connect() attempt,
after multiple SYN retransmits. It sometimes returns EPOLLERR only.

The issue is that tcp_write_err():
 1) writes an error in sk->sk_err,
 2) calls sk_error_report(),
 3) then calls tcp_done().

tcp_done() is writing SHUTDOWN_MASK into sk->sk_shutdown,
among other things.

The problem is that the awakened user thread (from 2) sk_error_report())
might call tcp_poll() before tcp_done() has written sk->sk_shutdown.

tcp_poll() only sees a non-zero sk->sk_err and returns EPOLLERR.

This patch fixes the issue by making sure to call sk_error_report()
after tcp_done().

tcp_write_err() also lacks an smp_wmb().

We can reuse tcp_done_with_error() to factor out the details,
as Neal suggested.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_timer.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)
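
The ordering problem described in the commit message can be illustrated with a small userspace model. This is only a sketch, not kernel code: `struct model_sock`, `model_poll()`, `model_error_report()`, `model_done()`, `old_order()`, `new_order()` and the `MODEL_*` constants are hypothetical names invented for this illustration. The model assumes the worst case, where the woken thread polls the socket at the exact moment of the wakeup, and it reduces tcp_poll() to the two checks mentioned above. It is single-threaded, so it shows only the call ordering and says nothing about the missing smp_wmb().

```c
#include <errno.h>

/* Hypothetical stand-ins for the poll bits and shutdown mask. */
#define MODEL_EPOLLERR      0x008u
#define MODEL_EPOLLHUP      0x010u
#define MODEL_SHUTDOWN_MASK 3

struct model_sock {
	int err;            /* models sk->sk_err */
	int shutdown;       /* models sk->sk_shutdown */
	unsigned seen_mask; /* mask observed by the woken poller */
};

/* Reduced model of tcp_poll(): only the two checks discussed above. */
static unsigned model_poll(const struct model_sock *sk)
{
	unsigned mask = 0;

	if (sk->shutdown == MODEL_SHUTDOWN_MASK)
		mask |= MODEL_EPOLLHUP;
	if (sk->err)
		mask |= MODEL_EPOLLERR;
	return mask;
}

/* Models sk_error_report(): assume the waiter polls right at wakeup. */
static void model_error_report(struct model_sock *sk)
{
	sk->seen_mask = model_poll(sk);
}

/* Models the part of tcp_done() that matters here. */
static void model_done(struct model_sock *sk)
{
	sk->shutdown = MODEL_SHUTDOWN_MASK;
}

/* Old tcp_write_err() ordering: report the error, then finish. */
static unsigned old_order(int err)
{
	struct model_sock sk = { 0, 0, 0 };

	sk.err = err;
	model_error_report(&sk); /* poller runs before shutdown is set */
	model_done(&sk);
	return sk.seen_mask;
}

/* Ordering after the patch: finish first, then report the error. */
static unsigned new_order(int err)
{
	struct model_sock sk = { 0, 0, 0 };

	sk.err = err;
	model_done(&sk);
	model_error_report(&sk); /* poller sees both err and shutdown */
	return sk.seen_mask;
}
```

In this model, `old_order(ETIMEDOUT)` yields only the error bit, while `new_order(ETIMEDOUT)` yields both the error and hang-up bits, which matches the EPOLLERR versus EPOLLERR | EPOLLHUP difference seen in the flaky test.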