Message ID | 20240731120955.23542-5-kerneljasonxing@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | tcp: completely support active reset | expand |
On Wed, Jul 31, 2024 at 2:10 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > From: Jason Xing <kernelxing@tencent.com> > > Introducing a new type TCP_STATE to handle some reset conditions > appearing in RFC 793 due to its socket state. Actually, we can look > into RFC 9293 which has no discrepancy about this part. > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > --- > V2 > Link: https://lore.kernel.org/all/20240730200633.93761-1-kuniyu@amazon.com/ > 1. use RFC 9293 instead of RFC 793 which is too old (Kuniyuki) > --- > include/net/rstreason.h | 6 ++++++ > net/ipv4/tcp.c | 4 ++-- > net/ipv4/tcp_timer.c | 2 +- > 3 files changed, 9 insertions(+), 3 deletions(-) > > diff --git a/include/net/rstreason.h b/include/net/rstreason.h > index eef658da8952..bbf20d0bbde7 100644 > --- a/include/net/rstreason.h > +++ b/include/net/rstreason.h > @@ -20,6 +20,7 @@ > FN(TCP_ABORT_ON_CLOSE) \ > FN(TCP_ABORT_ON_LINGER) \ > FN(TCP_ABORT_ON_MEMORY) \ > + FN(TCP_STATE) \ > FN(MPTCP_RST_EUNSPEC) \ > FN(MPTCP_RST_EMPTCP) \ > FN(MPTCP_RST_ERESOURCE) \ > @@ -102,6 +103,11 @@ enum sk_rst_reason { > * corresponding to LINUX_MIB_TCPABORTONMEMORY > */ > SK_RST_REASON_TCP_ABORT_ON_MEMORY, > + /** > + * @SK_RST_REASON_TCP_STATE: abort on tcp state > + * Please see RFC 9293 for all possible reset conditions > + */ > + SK_RST_REASON_TCP_STATE, > > /* Copy from include/uapi/linux/mptcp.h. > * These reset fields will not be changed since they adhere to > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > index fd928c447ce8..64a49cb714e1 100644 > --- a/net/ipv4/tcp.c > +++ b/net/ipv4/tcp.c > @@ -3031,7 +3031,7 @@ int tcp_disconnect(struct sock *sk, int flags) > /* The last check adjusts for discrepancy of Linux wrt. RFC > * states > */ > - tcp_send_active_reset(sk, gfp_any(), SK_RST_REASON_NOT_SPECIFIED); > + tcp_send_active_reset(sk, gfp_any(), SK_RST_REASON_TCP_STATE); I disagree with this. tcp_disconnect() is initiated by the user. You are conflating two possible conditions : 1) tcp_need_reset(old_state) 2) (tp->snd_nxt != tp->write_seq && (1 << old_state) & (TCPF_CLOSING | TCPF_LAST_ACK))) > WRITE_ONCE(sk->sk_err, ECONNRESET); > } else if (old_state == TCP_SYN_SENT) > WRITE_ONCE(sk->sk_err, ECONNRESET); > @@ -4649,7 +4649,7 @@ int tcp_abort(struct sock *sk, int err) > if (!sock_flag(sk, SOCK_DEAD)) { > if (tcp_need_reset(sk->sk_state)) > tcp_send_active_reset(sk, GFP_ATOMIC, > - SK_RST_REASON_NOT_SPECIFIED); > + SK_RST_REASON_TCP_STATE); > tcp_done_with_error(sk, err); > } > > diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c > index 0fba4a4fb988..3910f6d8614e 100644 > --- a/net/ipv4/tcp_timer.c > +++ b/net/ipv4/tcp_timer.c > @@ -779,7 +779,7 @@ static void tcp_keepalive_timer (struct timer_list *t) > goto out; > } > } > - tcp_send_active_reset(sk, GFP_ATOMIC, SK_RST_REASON_NOT_SPECIFIED); > + tcp_send_active_reset(sk, GFP_ATOMIC, SK_RST_REASON_TCP_STATE); > goto death; > } > > -- > 2.37.3 >
Hello Eric, On Thu, Aug 1, 2024 at 2:56 PM Eric Dumazet <edumazet@google.com> wrote: > > On Wed, Jul 31, 2024 at 2:10 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > From: Jason Xing <kernelxing@tencent.com> > > > > Introducing a new type TCP_STATE to handle some reset conditions > > appearing in RFC 793 due to its socket state. Actually, we can look > > into RFC 9293 which has no discrepancy about this part. > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > --- > > V2 > > Link: https://lore.kernel.org/all/20240730200633.93761-1-kuniyu@amazon.com/ > > 1. use RFC 9293 instead of RFC 793 which is too old (Kuniyuki) > > --- > > include/net/rstreason.h | 6 ++++++ > > net/ipv4/tcp.c | 4 ++-- > > net/ipv4/tcp_timer.c | 2 +- > > 3 files changed, 9 insertions(+), 3 deletions(-) > > > > diff --git a/include/net/rstreason.h b/include/net/rstreason.h > > index eef658da8952..bbf20d0bbde7 100644 > > --- a/include/net/rstreason.h > > +++ b/include/net/rstreason.h > > @@ -20,6 +20,7 @@ > > FN(TCP_ABORT_ON_CLOSE) \ > > FN(TCP_ABORT_ON_LINGER) \ > > FN(TCP_ABORT_ON_MEMORY) \ > > + FN(TCP_STATE) \ > > FN(MPTCP_RST_EUNSPEC) \ > > FN(MPTCP_RST_EMPTCP) \ > > FN(MPTCP_RST_ERESOURCE) \ > > @@ -102,6 +103,11 @@ enum sk_rst_reason { > > * corresponding to LINUX_MIB_TCPABORTONMEMORY > > */ > > SK_RST_REASON_TCP_ABORT_ON_MEMORY, > > + /** > > + * @SK_RST_REASON_TCP_STATE: abort on tcp state > > + * Please see RFC 9293 for all possible reset conditions > > + */ > > + SK_RST_REASON_TCP_STATE, > > > > /* Copy from include/uapi/linux/mptcp.h. > > * These reset fields will not be changed since they adhere to > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > > index fd928c447ce8..64a49cb714e1 100644 > > --- a/net/ipv4/tcp.c > > +++ b/net/ipv4/tcp.c > > @@ -3031,7 +3031,7 @@ int tcp_disconnect(struct sock *sk, int flags) > > /* The last check adjusts for discrepancy of Linux wrt. RFC > > * states > > */ > > - tcp_send_active_reset(sk, gfp_any(), SK_RST_REASON_NOT_SPECIFIED); > > + tcp_send_active_reset(sk, gfp_any(), SK_RST_REASON_TCP_STATE); > > I disagree with this. tcp_disconnect() is initiated by the user. > > You are conflating two possible conditions : > > 1) tcp_need_reset(old_state) For this one, I can keep the TCP_STATE reason, right? > 2) (tp->snd_nxt != tp->write_seq && (1 << old_state) & (TCPF_CLOSING | > TCPF_LAST_ACK))) > For this one, I wonder if I need to separate this condition with 'tcp_need_reset()' and put it into another 'else-if' branch? I decided to name it as 'CLOSE_WITH_DATA' because it can reflect that the write queue of the socket is not empty (at this time the user may think he has more data to send) but it stays in the active close state. How about it? Thanks, Jason
On Thu, Aug 1, 2024 at 11:51 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > Hello Eric, > > On Thu, Aug 1, 2024 at 2:56 PM Eric Dumazet <edumazet@google.com> wrote: > > > > On Wed, Jul 31, 2024 at 2:10 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > Introducing a new type TCP_STATE to handle some reset conditions > > > appearing in RFC 793 due to its socket state. Actually, we can look > > > into RFC 9293 which has no discrepancy about this part. > > > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > --- > > > V2 > > > Link: https://lore.kernel.org/all/20240730200633.93761-1-kuniyu@amazon.com/ > > > 1. use RFC 9293 instead of RFC 793 which is too old (Kuniyuki) > > > --- > > > include/net/rstreason.h | 6 ++++++ > > > net/ipv4/tcp.c | 4 ++-- > > > net/ipv4/tcp_timer.c | 2 +- > > > 3 files changed, 9 insertions(+), 3 deletions(-) > > > > > > diff --git a/include/net/rstreason.h b/include/net/rstreason.h > > > index eef658da8952..bbf20d0bbde7 100644 > > > --- a/include/net/rstreason.h > > > +++ b/include/net/rstreason.h > > > @@ -20,6 +20,7 @@ > > > FN(TCP_ABORT_ON_CLOSE) \ > > > FN(TCP_ABORT_ON_LINGER) \ > > > FN(TCP_ABORT_ON_MEMORY) \ > > > + FN(TCP_STATE) \ > > > FN(MPTCP_RST_EUNSPEC) \ > > > FN(MPTCP_RST_EMPTCP) \ > > > FN(MPTCP_RST_ERESOURCE) \ > > > @@ -102,6 +103,11 @@ enum sk_rst_reason { > > > * corresponding to LINUX_MIB_TCPABORTONMEMORY > > > */ > > > SK_RST_REASON_TCP_ABORT_ON_MEMORY, > > > + /** > > > + * @SK_RST_REASON_TCP_STATE: abort on tcp state > > > + * Please see RFC 9293 for all possible reset conditions > > > + */ > > > + SK_RST_REASON_TCP_STATE, > > > > > > /* Copy from include/uapi/linux/mptcp.h. > > > * These reset fields will not be changed since they adhere to > > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > > > index fd928c447ce8..64a49cb714e1 100644 > > > --- a/net/ipv4/tcp.c > > > +++ b/net/ipv4/tcp.c > > > @@ -3031,7 +3031,7 @@ int tcp_disconnect(struct sock *sk, int flags) > > > /* The last check adjusts for discrepancy of Linux wrt. RFC > > > * states > > > */ > > > - tcp_send_active_reset(sk, gfp_any(), SK_RST_REASON_NOT_SPECIFIED); > > > + tcp_send_active_reset(sk, gfp_any(), SK_RST_REASON_TCP_STATE); > > > > I disagree with this. tcp_disconnect() is initiated by the user. > > > > You are conflating two possible conditions : > > > > 1) tcp_need_reset(old_state) > > For this one, I can keep the TCP_STATE reason, right? What does it mean ? > > > 2) (tp->snd_nxt != tp->write_seq && (1 << old_state) & (TCPF_CLOSING | > > TCPF_LAST_ACK))) > > > > For this one, I wonder if I need to separate this condition with > 'tcp_need_reset()' and put it into another 'else-if' branch? > I decided to name it as 'CLOSE_WITH_DATA' because it can reflect that > the write queue of the socket is not empty (at this time the user may > think he has more data to send) but it stays in the active close > state. > How about it? This is not CLOSE_WITH_DATA, but a disconnect() operation, initiated by user space. If we add RST reasons, can we please be careful about the chosen names ? man connect <quote> Some protocol sockets (e.g., TCP sockets as well as datagram sockets in the UNIX and Internet domains) may dissolve the association by connecting to an address with the sa_family member of sockaddr set to AF_UNSPEC; thereafter, the socket can be connected to another ad‐ dress. (AF_UNSPEC is supported since Linux 2.2.) </quote> Very different from close()...
Hello Eric, On Thu, Aug 1, 2024 at 6:06 PM Eric Dumazet <edumazet@google.com> wrote: > > On Thu, Aug 1, 2024 at 11:51 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > Hello Eric, > > > > On Thu, Aug 1, 2024 at 2:56 PM Eric Dumazet <edumazet@google.com> wrote: > > > > > > On Wed, Jul 31, 2024 at 2:10 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > Introducing a new type TCP_STATE to handle some reset conditions > > > > appearing in RFC 793 due to its socket state. Actually, we can look > > > > into RFC 9293 which has no discrepancy about this part. > > > > > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > > --- > > > > V2 > > > > Link: https://lore.kernel.org/all/20240730200633.93761-1-kuniyu@amazon.com/ > > > > 1. use RFC 9293 instead of RFC 793 which is too old (Kuniyuki) > > > > --- > > > > include/net/rstreason.h | 6 ++++++ > > > > net/ipv4/tcp.c | 4 ++-- > > > > net/ipv4/tcp_timer.c | 2 +- > > > > 3 files changed, 9 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/include/net/rstreason.h b/include/net/rstreason.h > > > > index eef658da8952..bbf20d0bbde7 100644 > > > > --- a/include/net/rstreason.h > > > > +++ b/include/net/rstreason.h > > > > @@ -20,6 +20,7 @@ > > > > FN(TCP_ABORT_ON_CLOSE) \ > > > > FN(TCP_ABORT_ON_LINGER) \ > > > > FN(TCP_ABORT_ON_MEMORY) \ > > > > + FN(TCP_STATE) \ > > > > FN(MPTCP_RST_EUNSPEC) \ > > > > FN(MPTCP_RST_EMPTCP) \ > > > > FN(MPTCP_RST_ERESOURCE) \ > > > > @@ -102,6 +103,11 @@ enum sk_rst_reason { > > > > * corresponding to LINUX_MIB_TCPABORTONMEMORY > > > > */ > > > > SK_RST_REASON_TCP_ABORT_ON_MEMORY, > > > > + /** > > > > + * @SK_RST_REASON_TCP_STATE: abort on tcp state > > > > + * Please see RFC 9293 for all possible reset conditions > > > > + */ > > > > + SK_RST_REASON_TCP_STATE, > > > > > > > > /* Copy from include/uapi/linux/mptcp.h. > > > > * These reset fields will not be changed since they adhere to > > > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > > > > index fd928c447ce8..64a49cb714e1 100644 > > > > --- a/net/ipv4/tcp.c > > > > +++ b/net/ipv4/tcp.c > > > > @@ -3031,7 +3031,7 @@ int tcp_disconnect(struct sock *sk, int flags) > > > > /* The last check adjusts for discrepancy of Linux wrt. RFC > > > > * states > > > > */ > > > > - tcp_send_active_reset(sk, gfp_any(), SK_RST_REASON_NOT_SPECIFIED); > > > > + tcp_send_active_reset(sk, gfp_any(), SK_RST_REASON_TCP_STATE); > > > > > > I disagree with this. tcp_disconnect() is initiated by the user. > > > > > > You are conflating two possible conditions : > > > > > > 1) tcp_need_reset(old_state) > > > > For this one, I can keep the TCP_STATE reason, right? > > What does it mean ? I mean I wonder if I can use the TCP_STATE reason in tcp_abort() and tcp_disconnect() when tcp_need_reset() returns true? > > > > > > 2) (tp->snd_nxt != tp->write_seq && (1 << old_state) & (TCPF_CLOSING | > > > TCPF_LAST_ACK))) > > > > > > > For this one, I wonder if I need to separate this condition with > > 'tcp_need_reset()' and put it into another 'else-if' branch? > > I decided to name it as 'CLOSE_WITH_DATA' because it can reflect that > > the write queue of the socket is not empty (at this time the user may > > think he has more data to send) but it stays in the active close > > state. > > How about it? > > This is not CLOSE_WITH_DATA, but a disconnect() operation, initiated > by user space. > If we add RST reasons, can we please be careful about the chosen names ? Yes, I know, but like old days, I'm struggling with the English name. Sorry. > > man connect > > <quote> > Some protocol sockets (e.g., TCP sockets as well as datagram > sockets in the UNIX and Internet domains) may dissolve the association > by connecting to an address with the sa_family member of sockaddr set > to AF_UNSPEC; thereafter, the socket can be connected to another ad‐ > dress. (AF_UNSPEC is supported since Linux 2.2.) > </quote> > > Very different from close()... Oh, I see. What I was talking about 'CLOSE' is the socket state, but you are right: the name will finally be displayed to users, which must clearly reflect the real meaning of the underlying behavior. I will use "TCP_DISCONNECT_WITH_DATA" instead under this condition. And then, I will put it into a new patch since it's a different reason name. Thanks for your help! Jason
diff --git a/include/net/rstreason.h b/include/net/rstreason.h index eef658da8952..bbf20d0bbde7 100644 --- a/include/net/rstreason.h +++ b/include/net/rstreason.h @@ -20,6 +20,7 @@ FN(TCP_ABORT_ON_CLOSE) \ FN(TCP_ABORT_ON_LINGER) \ FN(TCP_ABORT_ON_MEMORY) \ + FN(TCP_STATE) \ FN(MPTCP_RST_EUNSPEC) \ FN(MPTCP_RST_EMPTCP) \ FN(MPTCP_RST_ERESOURCE) \ @@ -102,6 +103,11 @@ enum sk_rst_reason { * corresponding to LINUX_MIB_TCPABORTONMEMORY */ SK_RST_REASON_TCP_ABORT_ON_MEMORY, + /** + * @SK_RST_REASON_TCP_STATE: abort on tcp state + * Please see RFC 9293 for all possible reset conditions + */ + SK_RST_REASON_TCP_STATE, /* Copy from include/uapi/linux/mptcp.h. * These reset fields will not be changed since they adhere to diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index fd928c447ce8..64a49cb714e1 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3031,7 +3031,7 @@ int tcp_disconnect(struct sock *sk, int flags) /* The last check adjusts for discrepancy of Linux wrt. RFC * states */ - tcp_send_active_reset(sk, gfp_any(), SK_RST_REASON_NOT_SPECIFIED); + tcp_send_active_reset(sk, gfp_any(), SK_RST_REASON_TCP_STATE); WRITE_ONCE(sk->sk_err, ECONNRESET); } else if (old_state == TCP_SYN_SENT) WRITE_ONCE(sk->sk_err, ECONNRESET); @@ -4649,7 +4649,7 @@ int tcp_abort(struct sock *sk, int err) if (!sock_flag(sk, SOCK_DEAD)) { if (tcp_need_reset(sk->sk_state)) tcp_send_active_reset(sk, GFP_ATOMIC, - SK_RST_REASON_NOT_SPECIFIED); + SK_RST_REASON_TCP_STATE); tcp_done_with_error(sk, err); } diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index 0fba4a4fb988..3910f6d8614e 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -779,7 +779,7 @@ static void tcp_keepalive_timer (struct timer_list *t) goto out; } } - tcp_send_active_reset(sk, GFP_ATOMIC, SK_RST_REASON_NOT_SPECIFIED); + tcp_send_active_reset(sk, GFP_ATOMIC, SK_RST_REASON_TCP_STATE); goto death; }