Message ID | 20241105025511.42652-1-kerneljasonxing@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process | expand |
From: Jason Xing <kerneljasonxing@gmail.com> Date: Tue, 5 Nov 2024 10:55:11 +0800 > From: Jason Xing <kernelxing@tencent.com> > > We found there are rare chances that some RST packets appear during > the shakehands because the timewait socket cannot accept the SYN and s/shakehands/handshake/ same in the subject. > doesn't return TCP_TW_SYN in tcp_timewait_state_process(). > > Here is how things happen in production: > Time Client(A) Server(B) > 0s SYN--> > ... > 132s <-- FIN > ... > 169s FIN--> > 169s <-- ACK > 169s SYN--> > 169s <-- ACK > 169s RST--> > As above picture shows, the two flows have a start time difference > of 169 seconds. B starts to send FIN so it will finally enter into > TIMEWAIT state. Nearly at the same time A launches a new connection > that soon is reset by itself due to receiving a ACK. > > There are two key checks in tcp_timewait_state_process() when timewait > socket in B receives the SYN packet: > 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt) > 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0) > > Regarding the first rule, it fails as expected because in the first > connection the seq of SYN sent from A is 1892994276, then 169s have > passed, the second SYN has 239034613 (caused by overflow of s32). > > Then how about the second rule? > It fails again! > Let's take a look at how the tsval comes out: > __tcp_transmit_skb() > -> tcp_syn_options() > -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset; > The timestamp depends on two things, one is skb->skb_mstamp_ns, the > other is tp->tsoffset. The latter value is fixed, so we don't need > to care about it. If both operations (sending FIN and then starting > sending SYN) from A happen in 1ms, then the tsval would be the same. > It can be clearly seen in the tcpdump log. Notice that the tsval is > with millisecond precision. > > Based on the above analysis, I decided to make a small change to > the check in tcp_timewait_state_process() so that the second flow > would not fail. > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > --- > net/ipv4/tcp_minisocks.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c > index bb1fe1ba867a..2b29d1bf5ca0 100644 > --- a/net/ipv4/tcp_minisocks.c > +++ b/net/ipv4/tcp_minisocks.c > @@ -234,7 +234,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb, > if (th->syn && !th->rst && !th->ack && !paws_reject && > (after(TCP_SKB_CB(skb)->seq, rcv_nxt) || > (tmp_opt.saw_tstamp && > - (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0))) { I think this follows RFC 6191 and such a change requires a formal discussion at IETF. https://datatracker.ietf.org/doc/html/rfc6191#section-2 ---8<--- * If TCP Timestamps would be enabled for the new incarnation of the connection, and the timestamp contained in the incoming SYN segment is greater than the last timestamp seen on the previous incarnation of the connection (for that direction of the data transfer), honor the connection request (creating a connection in the SYN-RECEIVED state). * If TCP Timestamps would be enabled for the new incarnation of the connection, the timestamp contained in the incoming SYN segment is equal to the last timestamp seen on the previous incarnation of the connection (for that direction of the data transfer), and the Sequence Number of the incoming SYN segment is greater than the last sequence number seen on the previous incarnation of the connection (for that direction of the data transfer), honor the connection request (creating a connection in the SYN-RECEIVED state). ---8<--- > + (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) <= 0))) { > u32 isn = tcptw->tw_snd_nxt + 65535 + 2; > if (isn == 0) > isn++; > -- > 2.37.3
On Tue, Nov 5, 2024 at 3:49 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > From: Jason Xing <kerneljasonxing@gmail.com> > Date: Tue, 5 Nov 2024 10:55:11 +0800 > > From: Jason Xing <kernelxing@tencent.com> > > > > We found there are rare chances that some RST packets appear during > > the shakehands because the timewait socket cannot accept the SYN and > > s/shakehands/handshake/ > > same in the subject. > > > doesn't return TCP_TW_SYN in tcp_timewait_state_process(). > > > > Here is how things happen in production: > > Time Client(A) Server(B) > > 0s SYN--> > > ... > > 132s <-- FIN > > ... > > 169s FIN--> > > 169s <-- ACK > > 169s SYN--> > > 169s <-- ACK > > 169s RST--> > > As above picture shows, the two flows have a start time difference > > of 169 seconds. B starts to send FIN so it will finally enter into > > TIMEWAIT state. Nearly at the same time A launches a new connection > > that soon is reset by itself due to receiving a ACK. > > > > There are two key checks in tcp_timewait_state_process() when timewait > > socket in B receives the SYN packet: > > 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt) > > 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0) > > > > Regarding the first rule, it fails as expected because in the first > > connection the seq of SYN sent from A is 1892994276, then 169s have > > passed, the second SYN has 239034613 (caused by overflow of s32). > > > > Then how about the second rule? > > It fails again! > > Let's take a look at how the tsval comes out: > > __tcp_transmit_skb() > > -> tcp_syn_options() > > -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset; > > The timestamp depends on two things, one is skb->skb_mstamp_ns, the > > other is tp->tsoffset. The latter value is fixed, so we don't need > > to care about it. If both operations (sending FIN and then starting > > sending SYN) from A happen in 1ms, then the tsval would be the same. > > It can be clearly seen in the tcpdump log. Notice that the tsval is > > with millisecond precision. > > > > Based on the above analysis, I decided to make a small change to > > the check in tcp_timewait_state_process() so that the second flow > > would not fail. > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > --- > > net/ipv4/tcp_minisocks.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c > > index bb1fe1ba867a..2b29d1bf5ca0 100644 > > --- a/net/ipv4/tcp_minisocks.c > > +++ b/net/ipv4/tcp_minisocks.c > > @@ -234,7 +234,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb, > > if (th->syn && !th->rst && !th->ack && !paws_reject && > > (after(TCP_SKB_CB(skb)->seq, rcv_nxt) || > > (tmp_opt.saw_tstamp && > > - (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0))) { > > I think this follows RFC 6191 and such a change requires a formal > discussion at IETF. > > https://datatracker.ietf.org/doc/html/rfc6191#section-2 > > ---8<--- > * If TCP Timestamps would be enabled for the new incarnation of > the connection, and the timestamp contained in the incoming SYN > segment is greater than the last timestamp seen on the previous The true thing is that the timestamp of the SYN packet is greater than that of the last packet, but the kernel implementation uses ms precision (please see tcp_skb_timestamp_ts()). That function makes those two timestamps the same. This case happens as expected, so the second connection should be established. My confusion just popped out of my mind: what rules should we follow to stop the second flow? Thanks, Jason
On Tue, Nov 5, 2024 at 10:08 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > On Tue, Nov 5, 2024 at 3:49 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > > > From: Jason Xing <kerneljasonxing@gmail.com> > > Date: Tue, 5 Nov 2024 10:55:11 +0800 > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > We found there are rare chances that some RST packets appear during > > > the shakehands because the timewait socket cannot accept the SYN and > > > > s/shakehands/handshake/ > > > > same in the subject. > > > > > doesn't return TCP_TW_SYN in tcp_timewait_state_process(). > > > > > > Here is how things happen in production: > > > Time Client(A) Server(B) > > > 0s SYN--> > > > ... > > > 132s <-- FIN > > > ... > > > 169s FIN--> > > > 169s <-- ACK > > > 169s SYN--> > > > 169s <-- ACK > > > 169s RST--> > > > As above picture shows, the two flows have a start time difference > > > of 169 seconds. B starts to send FIN so it will finally enter into > > > TIMEWAIT state. Nearly at the same time A launches a new connection > > > that soon is reset by itself due to receiving a ACK. > > > > > > There are two key checks in tcp_timewait_state_process() when timewait > > > socket in B receives the SYN packet: > > > 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt) > > > 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0) > > > > > > Regarding the first rule, it fails as expected because in the first > > > connection the seq of SYN sent from A is 1892994276, then 169s have > > > passed, the second SYN has 239034613 (caused by overflow of s32). > > > > > > Then how about the second rule? > > > It fails again! > > > Let's take a look at how the tsval comes out: > > > __tcp_transmit_skb() > > > -> tcp_syn_options() > > > -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset; > > > The timestamp depends on two things, one is skb->skb_mstamp_ns, the > > > other is tp->tsoffset. The latter value is fixed, so we don't need > > > to care about it. If both operations (sending FIN and then starting > > > sending SYN) from A happen in 1ms, then the tsval would be the same. > > > It can be clearly seen in the tcpdump log. Notice that the tsval is > > > with millisecond precision. > > > > > > Based on the above analysis, I decided to make a small change to > > > the check in tcp_timewait_state_process() so that the second flow > > > would not fail. > > > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > --- > > > net/ipv4/tcp_minisocks.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c > > > index bb1fe1ba867a..2b29d1bf5ca0 100644 > > > --- a/net/ipv4/tcp_minisocks.c > > > +++ b/net/ipv4/tcp_minisocks.c > > > @@ -234,7 +234,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb, > > > if (th->syn && !th->rst && !th->ack && !paws_reject && > > > (after(TCP_SKB_CB(skb)->seq, rcv_nxt) || > > > (tmp_opt.saw_tstamp && > > > - (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0))) { > > > > I think this follows RFC 6191 and such a change requires a formal > > discussion at IETF. > > > > https://datatracker.ietf.org/doc/html/rfc6191#section-2 > > > > ---8<--- > > * If TCP Timestamps would be enabled for the new incarnation of > > the connection, and the timestamp contained in the incoming SYN > > segment is greater than the last timestamp seen on the previous > > The true thing is that the timestamp of the SYN packet is greater than > that of the last packet, but the kernel implementation uses ms > precision (please see tcp_skb_timestamp_ts()). That function makes > those two timestamps the same. > > This case happens as expected, so the second connection should be > established. My confusion just popped out of my mind: what rules > should we follow to stop the second flow? Note that linux TCP stack can use usec resolution for TCP TS values. You might adopt it, and no longer care about this ms granularity.
On Tue, Nov 5, 2024 at 5:35 PM Eric Dumazet <edumazet@google.com> wrote: > > On Tue, Nov 5, 2024 at 10:08 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > On Tue, Nov 5, 2024 at 3:49 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > > > > > From: Jason Xing <kerneljasonxing@gmail.com> > > > Date: Tue, 5 Nov 2024 10:55:11 +0800 > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > We found there are rare chances that some RST packets appear during > > > > the shakehands because the timewait socket cannot accept the SYN and > > > > > > s/shakehands/handshake/ > > > > > > same in the subject. > > > > > > > doesn't return TCP_TW_SYN in tcp_timewait_state_process(). > > > > > > > > Here is how things happen in production: > > > > Time Client(A) Server(B) > > > > 0s SYN--> > > > > ... > > > > 132s <-- FIN > > > > ... > > > > 169s FIN--> > > > > 169s <-- ACK > > > > 169s SYN--> > > > > 169s <-- ACK > > > > 169s RST--> > > > > As above picture shows, the two flows have a start time difference > > > > of 169 seconds. B starts to send FIN so it will finally enter into > > > > TIMEWAIT state. Nearly at the same time A launches a new connection > > > > that soon is reset by itself due to receiving a ACK. > > > > > > > > There are two key checks in tcp_timewait_state_process() when timewait > > > > socket in B receives the SYN packet: > > > > 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt) > > > > 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0) > > > > > > > > Regarding the first rule, it fails as expected because in the first > > > > connection the seq of SYN sent from A is 1892994276, then 169s have > > > > passed, the second SYN has 239034613 (caused by overflow of s32). > > > > > > > > Then how about the second rule? > > > > It fails again! > > > > Let's take a look at how the tsval comes out: > > > > __tcp_transmit_skb() > > > > -> tcp_syn_options() > > > > -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset; > > > > The timestamp depends on two things, one is skb->skb_mstamp_ns, the > > > > other is tp->tsoffset. The latter value is fixed, so we don't need > > > > to care about it. If both operations (sending FIN and then starting > > > > sending SYN) from A happen in 1ms, then the tsval would be the same. > > > > It can be clearly seen in the tcpdump log. Notice that the tsval is > > > > with millisecond precision. > > > > > > > > Based on the above analysis, I decided to make a small change to > > > > the check in tcp_timewait_state_process() so that the second flow > > > > would not fail. > > > > > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > > --- > > > > net/ipv4/tcp_minisocks.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c > > > > index bb1fe1ba867a..2b29d1bf5ca0 100644 > > > > --- a/net/ipv4/tcp_minisocks.c > > > > +++ b/net/ipv4/tcp_minisocks.c > > > > @@ -234,7 +234,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb, > > > > if (th->syn && !th->rst && !th->ack && !paws_reject && > > > > (after(TCP_SKB_CB(skb)->seq, rcv_nxt) || > > > > (tmp_opt.saw_tstamp && > > > > - (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0))) { > > > > > > I think this follows RFC 6191 and such a change requires a formal > > > discussion at IETF. > > > > > > https://datatracker.ietf.org/doc/html/rfc6191#section-2 > > > > > > ---8<--- > > > * If TCP Timestamps would be enabled for the new incarnation of > > > the connection, and the timestamp contained in the incoming SYN > > > segment is greater than the last timestamp seen on the previous > > > > The true thing is that the timestamp of the SYN packet is greater than > > that of the last packet, but the kernel implementation uses ms > > precision (please see tcp_skb_timestamp_ts()). That function makes > > those two timestamps the same. > > > > This case happens as expected, so the second connection should be > > established. My confusion just popped out of my mind: what rules > > should we follow to stop the second flow? > > Note that linux TCP stack can use usec resolution for TCP TS values. > > You might adopt it, and no longer care about this ms granularity. Right, I noticed this feature. I wonder if we can change the check in tcp_timewait_state_process() like this patch if it has no side effects? I'm worried that some programs don't use this feature. It's the reason why I try to propose this patch to you. Thanks, Jason
On Tue, Nov 5, 2024 at 10:42 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > On Tue, Nov 5, 2024 at 5:35 PM Eric Dumazet <edumazet@google.com> wrote: > > > > On Tue, Nov 5, 2024 at 10:08 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > On Tue, Nov 5, 2024 at 3:49 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > > > > > > > From: Jason Xing <kerneljasonxing@gmail.com> > > > > Date: Tue, 5 Nov 2024 10:55:11 +0800 > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > > > We found there are rare chances that some RST packets appear during > > > > > the shakehands because the timewait socket cannot accept the SYN and > > > > > > > > s/shakehands/handshake/ > > > > > > > > same in the subject. > > > > > > > > > doesn't return TCP_TW_SYN in tcp_timewait_state_process(). > > > > > > > > > > Here is how things happen in production: > > > > > Time Client(A) Server(B) > > > > > 0s SYN--> > > > > > ... > > > > > 132s <-- FIN > > > > > ... > > > > > 169s FIN--> > > > > > 169s <-- ACK > > > > > 169s SYN--> > > > > > 169s <-- ACK > > > > > 169s RST--> > > > > > As above picture shows, the two flows have a start time difference > > > > > of 169 seconds. B starts to send FIN so it will finally enter into > > > > > TIMEWAIT state. Nearly at the same time A launches a new connection > > > > > that soon is reset by itself due to receiving a ACK. > > > > > > > > > > There are two key checks in tcp_timewait_state_process() when timewait > > > > > socket in B receives the SYN packet: > > > > > 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt) > > > > > 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0) > > > > > > > > > > Regarding the first rule, it fails as expected because in the first > > > > > connection the seq of SYN sent from A is 1892994276, then 169s have > > > > > passed, the second SYN has 239034613 (caused by overflow of s32). > > > > > > > > > > Then how about the second rule? > > > > > It fails again! > > > > > Let's take a look at how the tsval comes out: > > > > > __tcp_transmit_skb() > > > > > -> tcp_syn_options() > > > > > -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset; > > > > > The timestamp depends on two things, one is skb->skb_mstamp_ns, the > > > > > other is tp->tsoffset. The latter value is fixed, so we don't need > > > > > to care about it. If both operations (sending FIN and then starting > > > > > sending SYN) from A happen in 1ms, then the tsval would be the same. > > > > > It can be clearly seen in the tcpdump log. Notice that the tsval is > > > > > with millisecond precision. > > > > > > > > > > Based on the above analysis, I decided to make a small change to > > > > > the check in tcp_timewait_state_process() so that the second flow > > > > > would not fail. > > > > > > > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > > > --- > > > > > net/ipv4/tcp_minisocks.c | 2 +- > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c > > > > > index bb1fe1ba867a..2b29d1bf5ca0 100644 > > > > > --- a/net/ipv4/tcp_minisocks.c > > > > > +++ b/net/ipv4/tcp_minisocks.c > > > > > @@ -234,7 +234,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb, > > > > > if (th->syn && !th->rst && !th->ack && !paws_reject && > > > > > (after(TCP_SKB_CB(skb)->seq, rcv_nxt) || > > > > > (tmp_opt.saw_tstamp && > > > > > - (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0))) { > > > > > > > > I think this follows RFC 6191 and such a change requires a formal > > > > discussion at IETF. > > > > > > > > https://datatracker.ietf.org/doc/html/rfc6191#section-2 > > > > > > > > ---8<--- > > > > * If TCP Timestamps would be enabled for the new incarnation of > > > > the connection, and the timestamp contained in the incoming SYN > > > > segment is greater than the last timestamp seen on the previous > > > > > > The true thing is that the timestamp of the SYN packet is greater than > > > that of the last packet, but the kernel implementation uses ms > > > precision (please see tcp_skb_timestamp_ts()). That function makes > > > those two timestamps the same. > > > > > > This case happens as expected, so the second connection should be > > > established. My confusion just popped out of my mind: what rules > > > should we follow to stop the second flow? > > > > Note that linux TCP stack can use usec resolution for TCP TS values. > > > > You might adopt it, and no longer care about this ms granularity. > > Right, I noticed this feature. I wonder if we can change the check in > tcp_timewait_state_process() like this patch if it has no side > effects? I'm worried that some programs don't use this feature. It's > the reason why I try to propose this patch to you. Breaking RFC ? I do not think so. Instead, use a usec clock and voila, the problem is solved.
On Tue, Nov 5, 2024 at 5:50 PM Eric Dumazet <edumazet@google.com> wrote: > > On Tue, Nov 5, 2024 at 10:42 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > On Tue, Nov 5, 2024 at 5:35 PM Eric Dumazet <edumazet@google.com> wrote: > > > > > > On Tue, Nov 5, 2024 at 10:08 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > > > On Tue, Nov 5, 2024 at 3:49 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > > > > > > > > > From: Jason Xing <kerneljasonxing@gmail.com> > > > > > Date: Tue, 5 Nov 2024 10:55:11 +0800 > > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > > > > > We found there are rare chances that some RST packets appear during > > > > > > the shakehands because the timewait socket cannot accept the SYN and > > > > > > > > > > s/shakehands/handshake/ > > > > > > > > > > same in the subject. > > > > > > > > > > > doesn't return TCP_TW_SYN in tcp_timewait_state_process(). > > > > > > > > > > > > Here is how things happen in production: > > > > > > Time Client(A) Server(B) > > > > > > 0s SYN--> > > > > > > ... > > > > > > 132s <-- FIN > > > > > > ... > > > > > > 169s FIN--> > > > > > > 169s <-- ACK > > > > > > 169s SYN--> > > > > > > 169s <-- ACK > > > > > > 169s RST--> > > > > > > As above picture shows, the two flows have a start time difference > > > > > > of 169 seconds. B starts to send FIN so it will finally enter into > > > > > > TIMEWAIT state. Nearly at the same time A launches a new connection > > > > > > that soon is reset by itself due to receiving a ACK. > > > > > > > > > > > > There are two key checks in tcp_timewait_state_process() when timewait > > > > > > socket in B receives the SYN packet: > > > > > > 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt) > > > > > > 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0) > > > > > > > > > > > > Regarding the first rule, it fails as expected because in the first > > > > > > connection the seq of SYN sent from A is 1892994276, then 169s have > > > > > > passed, the second SYN has 239034613 (caused by overflow of s32). > > > > > > > > > > > > Then how about the second rule? > > > > > > It fails again! > > > > > > Let's take a look at how the tsval comes out: > > > > > > __tcp_transmit_skb() > > > > > > -> tcp_syn_options() > > > > > > -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset; > > > > > > The timestamp depends on two things, one is skb->skb_mstamp_ns, the > > > > > > other is tp->tsoffset. The latter value is fixed, so we don't need > > > > > > to care about it. If both operations (sending FIN and then starting > > > > > > sending SYN) from A happen in 1ms, then the tsval would be the same. > > > > > > It can be clearly seen in the tcpdump log. Notice that the tsval is > > > > > > with millisecond precision. > > > > > > > > > > > > Based on the above analysis, I decided to make a small change to > > > > > > the check in tcp_timewait_state_process() so that the second flow > > > > > > would not fail. > > > > > > > > > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > > > > --- > > > > > > net/ipv4/tcp_minisocks.c | 2 +- > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c > > > > > > index bb1fe1ba867a..2b29d1bf5ca0 100644 > > > > > > --- a/net/ipv4/tcp_minisocks.c > > > > > > +++ b/net/ipv4/tcp_minisocks.c > > > > > > @@ -234,7 +234,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb, > > > > > > if (th->syn && !th->rst && !th->ack && !paws_reject && > > > > > > (after(TCP_SKB_CB(skb)->seq, rcv_nxt) || > > > > > > (tmp_opt.saw_tstamp && > > > > > > - (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0))) { > > > > > > > > > > I think this follows RFC 6191 and such a change requires a formal > > > > > discussion at IETF. > > > > > > > > > > https://datatracker.ietf.org/doc/html/rfc6191#section-2 > > > > > > > > > > ---8<--- > > > > > * If TCP Timestamps would be enabled for the new incarnation of > > > > > the connection, and the timestamp contained in the incoming SYN > > > > > segment is greater than the last timestamp seen on the previous > > > > > > > > The true thing is that the timestamp of the SYN packet is greater than > > > > that of the last packet, but the kernel implementation uses ms > > > > precision (please see tcp_skb_timestamp_ts()). That function makes > > > > those two timestamps the same. > > > > > > > > This case happens as expected, so the second connection should be > > > > established. My confusion just popped out of my mind: what rules > > > > should we follow to stop the second flow? > > > > > > Note that linux TCP stack can use usec resolution for TCP TS values. > > > > > > You might adopt it, and no longer care about this ms granularity. > > > > Right, I noticed this feature. I wonder if we can change the check in > > tcp_timewait_state_process() like this patch if it has no side > > effects? I'm worried that some programs don't use this feature. It's > > the reason why I try to propose this patch to you. > > Breaking RFC ? I do not think so. Oh right, I just can't figure it out why since we've already lost the fine-grained timestamp in each skb. I spent a few days investigating the bad cases this patch may bring, but I failed. > Instead, use a usec clock and voila, the problem is solved. Sure, it can work :) Thanks, Jason
Hello Eric, Martin On Tue, Nov 5, 2024 at 5:50 PM Eric Dumazet <edumazet@google.com> wrote: > > On Tue, Nov 5, 2024 at 10:42 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > On Tue, Nov 5, 2024 at 5:35 PM Eric Dumazet <edumazet@google.com> wrote: > > > > > > On Tue, Nov 5, 2024 at 10:08 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > > > On Tue, Nov 5, 2024 at 3:49 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > > > > > > > > > From: Jason Xing <kerneljasonxing@gmail.com> > > > > > Date: Tue, 5 Nov 2024 10:55:11 +0800 > > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > > > > > We found there are rare chances that some RST packets appear during > > > > > > the shakehands because the timewait socket cannot accept the SYN and > > > > > > > > > > s/shakehands/handshake/ > > > > > > > > > > same in the subject. > > > > > > > > > > > doesn't return TCP_TW_SYN in tcp_timewait_state_process(). > > > > > > > > > > > > Here is how things happen in production: > > > > > > Time Client(A) Server(B) > > > > > > 0s SYN--> > > > > > > ... > > > > > > 132s <-- FIN > > > > > > ... > > > > > > 169s FIN--> > > > > > > 169s <-- ACK > > > > > > 169s SYN--> > > > > > > 169s <-- ACK > > > > > > 169s RST--> > > > > > > As above picture shows, the two flows have a start time difference > > > > > > of 169 seconds. B starts to send FIN so it will finally enter into > > > > > > TIMEWAIT state. Nearly at the same time A launches a new connection > > > > > > that soon is reset by itself due to receiving a ACK. > > > > > > > > > > > > There are two key checks in tcp_timewait_state_process() when timewait > > > > > > socket in B receives the SYN packet: > > > > > > 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt) > > > > > > 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0) > > > > > > > > > > > > Regarding the first rule, it fails as expected because in the first > > > > > > connection the seq of SYN sent from A is 1892994276, then 169s have > > > > > > passed, the second SYN has 239034613 (caused by overflow of s32). > > > > > > > > > > > > Then how about the second rule? > > > > > > It fails again! > > > > > > Let's take a look at how the tsval comes out: > > > > > > __tcp_transmit_skb() > > > > > > -> tcp_syn_options() > > > > > > -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset; > > > > > > The timestamp depends on two things, one is skb->skb_mstamp_ns, the > > > > > > other is tp->tsoffset. The latter value is fixed, so we don't need > > > > > > to care about it. If both operations (sending FIN and then starting > > > > > > sending SYN) from A happen in 1ms, then the tsval would be the same. > > > > > > It can be clearly seen in the tcpdump log. Notice that the tsval is > > > > > > with millisecond precision. > > > > > > > > > > > > Based on the above analysis, I decided to make a small change to > > > > > > the check in tcp_timewait_state_process() so that the second flow > > > > > > would not fail. > > > > > > > > > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > > > > --- > > > > > > net/ipv4/tcp_minisocks.c | 2 +- > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c > > > > > > index bb1fe1ba867a..2b29d1bf5ca0 100644 > > > > > > --- a/net/ipv4/tcp_minisocks.c > > > > > > +++ b/net/ipv4/tcp_minisocks.c > > > > > > @@ -234,7 +234,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb, > > > > > > if (th->syn && !th->rst && !th->ack && !paws_reject && > > > > > > (after(TCP_SKB_CB(skb)->seq, rcv_nxt) || > > > > > > (tmp_opt.saw_tstamp && > > > > > > - (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0))) { > > > > > > > > > > I think this follows RFC 6191 and such a change requires a formal > > > > > discussion at IETF. > > > > > > > > > > https://datatracker.ietf.org/doc/html/rfc6191#section-2 > > > > > > > > > > ---8<--- > > > > > * If TCP Timestamps would be enabled for the new incarnation of > > > > > the connection, and the timestamp contained in the incoming SYN > > > > > segment is greater than the last timestamp seen on the previous > > > > > > > > The true thing is that the timestamp of the SYN packet is greater than > > > > that of the last packet, but the kernel implementation uses ms > > > > precision (please see tcp_skb_timestamp_ts()). That function makes > > > > those two timestamps the same. > > > > > > > > This case happens as expected, so the second connection should be > > > > established. My confusion just popped out of my mind: what rules > > > > should we follow to stop the second flow? > > > > > > Note that linux TCP stack can use usec resolution for TCP TS values. > > > > > > You might adopt it, and no longer care about this ms granularity. > > > > Right, I noticed this feature. I wonder if we can change the check in > > tcp_timewait_state_process() like this patch if it has no side > > effects? I'm worried that some programs don't use this feature. It's > > the reason why I try to propose this patch to you. > > Breaking RFC ? I do not think so. > > Instead, use a usec clock and voila, the problem is solved. I wonder if it is necessary to use BPF to extend the usec clock. Since I started investigating the BPF part, I found it can help us extend many useful features in TCP to the most extent. Using it with the BPF program can make the feature take effect and widely used in future. I'm asking because I somehow have a feeling that someday the majority of traditional sockopts could be replaced by BPF. After all, requiring application modification for those kernel features is a bit heavy. I'm not sure if I'm right about it :S Thanks, Jason
Hello Eric, On Tue, Nov 5, 2024 at 10:55 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > From: Jason Xing <kernelxing@tencent.com> > > We found there are rare chances that some RST packets appear during > the shakehands because the timewait socket cannot accept the SYN and > doesn't return TCP_TW_SYN in tcp_timewait_state_process(). > > Here is how things happen in production: > Time Client(A) Server(B) > 0s SYN--> > ... > 132s <-- FIN > ... > 169s FIN--> > 169s <-- ACK > 169s SYN--> > 169s <-- ACK I noticed the above ACK doesn't adhere to RFC 6191. It says: "If the previous incarnation of the connection used Timestamps, then: if ... ... * Otherwise, silently drop the incoming SYN segment, thus leaving the previous incarnation of the connection in the TIME-WAIT state. " But the timewait socket sends an ACK because of this code snippet: tcp_timewait_state_process() -> // the checks of SYN packet failed. -> if (!th->rst) { -> return TCP_TW_ACK; // this line can be traced back to 2005 I think the following patch follows the RFC: diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index bb1fe1ba867a..cc22f0412f98 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -231,15 +231,17 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb, but not fatal yet. */ - if (th->syn && !th->rst && !th->ack && !paws_reject && - (after(TCP_SKB_CB(skb)->seq, rcv_nxt) || - (tmp_opt.saw_tstamp && - (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0))) { - u32 isn = tcptw->tw_snd_nxt + 65535 + 2; - if (isn == 0) - isn++; - *tw_isn = isn; - return TCP_TW_SYN; + if (th->syn && !th->rst && !th->ack && !paws_reject) { + if (after(TCP_SKB_CB(skb)->seq, rcv_nxt) || + (tmp_opt.saw_tstamp && + (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)) { + u32 isn = tcptw->tw_snd_nxt + 65535 + 2; + if (isn == 0) + isn++; + *tw_isn = isn; + return TCP_TW_SYN; + } + return TCP_TW_SUCCESS; } if (paws_reject) Could you help me review this, Eric? Thanks in advance! Thanks, Jason > 169s RST--> > As above picture shows, the two flows have a start time difference > of 169 seconds. B starts to send FIN so it will finally enter into > TIMEWAIT state. Nearly at the same time A launches a new connection > that soon is reset by itself due to receiving a ACK. > > There are two key checks in tcp_timewait_state_process() when timewait > socket in B receives the SYN packet: > 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt) > 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0) > > Regarding the first rule, it fails as expected because in the first > connection the seq of SYN sent from A is 1892994276, then 169s have > passed, the second SYN has 239034613 (caused by overflow of s32). > > Then how about the second rule? > It fails again! > Let's take a look at how the tsval comes out: > __tcp_transmit_skb() > -> tcp_syn_options() > -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset; > The timestamp depends on two things, one is skb->skb_mstamp_ns, the > other is tp->tsoffset. The latter value is fixed, so we don't need > to care about it. If both operations (sending FIN and then starting > sending SYN) from A happen in 1ms, then the tsval would be the same. > It can be clearly seen in the tcpdump log. Notice that the tsval is > with millisecond precision. > > Based on the above analysis, I decided to make a small change to > the check in tcp_timewait_state_process() so that the second flow > would not fail. > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > --- > net/ipv4/tcp_minisocks.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c > index bb1fe1ba867a..2b29d1bf5ca0 100644 > --- a/net/ipv4/tcp_minisocks.c > +++ b/net/ipv4/tcp_minisocks.c > @@ -234,7 +234,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb, > if (th->syn && !th->rst && !th->ack && !paws_reject && > (after(TCP_SKB_CB(skb)->seq, rcv_nxt) || > (tmp_opt.saw_tstamp && > - (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0))) { > + (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) <= 0))) { > u32 isn = tcptw->tw_snd_nxt + 65535 + 2; > if (isn == 0) > isn++; > -- > 2.37.3 >
From: Jason Xing <kerneljasonxing@gmail.com> Date: Thu, 7 Nov 2024 11:16:04 +0800 > > Here is how things happen in production: > > Time Client(A) Server(B) > > 0s SYN--> > > ... > > 132s <-- FIN > > ... > > 169s FIN--> > > 169s <-- ACK > > 169s SYN--> > > 169s <-- ACK > > I noticed the above ACK doesn't adhere to RFC 6191. It says: > "If the previous incarnation of the connection used Timestamps, then: > if ... > ... > * Otherwise, silently drop the incoming SYN segment, thus leaving > the previous incarnation of the connection in the TIME-WAIT > state. > " > But the timewait socket sends an ACK because of this code snippet: > tcp_timewait_state_process() > -> // the checks of SYN packet failed. > -> if (!th->rst) { > -> return TCP_TW_ACK; // this line can be traced back to 2005 This is a challenge ACK following RFC 5961. If SYN is returned here, the client may lose the chance to RST the previous connection in TIME_WAIT. https://www.rfc-editor.org/rfc/rfc9293.html#section-3.10.7.4-2.4.1 ---8<--- - TIME-WAIT STATE o If the SYN bit is set in these synchronized states, it may be either a legitimate new connection attempt (e.g., in the case of TIME-WAIT), an error where the connection should be reset, or the result of an attack attempt, as described in RFC 5961 [9]. For the TIME-WAIT state, new connections can be accepted if the Timestamp Option is used and meets expectations (per [40]). For all other cases, RFC 5961 provides a mitigation with applicability to some situations, though there are also alternatives that offer cryptographic protection (see Section 7). RFC 5961 recommends that in these synchronized states, if the SYN bit is set, irrespective of the sequence number, TCP endpoints MUST send a "challenge ACK" to the remote peer: <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> ---8<--- https://datatracker.ietf.org/doc/html/rfc5961#section-4 ---8<--- 1) If the SYN bit is set, irrespective of the sequence number, TCP MUST send an ACK (also referred to as challenge ACK) to the remote peer: <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> After sending the acknowledgment, TCP MUST drop the unacceptable segment and stop processing further. ---8<---
From: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Wed, 6 Nov 2024 20:15:06 -0800 > From: Jason Xing <kerneljasonxing@gmail.com> > Date: Thu, 7 Nov 2024 11:16:04 +0800 > > > Here is how things happen in production: > > > Time Client(A) Server(B) > > > 0s SYN--> > > > ... > > > 132s <-- FIN > > > ... > > > 169s FIN--> > > > 169s <-- ACK > > > 169s SYN--> > > > 169s <-- ACK > > > > I noticed the above ACK doesn't adhere to RFC 6191. It says: > > "If the previous incarnation of the connection used Timestamps, then: > > if ... > > ... > > * Otherwise, silently drop the incoming SYN segment, thus leaving > > the previous incarnation of the connection in the TIME-WAIT > > state. > > " > > But the timewait socket sends an ACK because of this code snippet: > > tcp_timewait_state_process() > > -> // the checks of SYN packet failed. > > -> if (!th->rst) { > > -> return TCP_TW_ACK; // this line can be traced back to 2005 > > This is a challenge ACK following RFC 5961. > > If SYN is returned here, the client may lose the chance to RST the > previous connection in TIME_WAIT. s/returned/silently dropped/ :/
On Thu, Nov 7, 2024 at 12:15 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > From: Jason Xing <kerneljasonxing@gmail.com> > Date: Thu, 7 Nov 2024 11:16:04 +0800 > > > Here is how things happen in production: > > > Time Client(A) Server(B) > > > 0s SYN--> > > > ... > > > 132s <-- FIN > > > ... > > > 169s FIN--> > > > 169s <-- ACK > > > 169s SYN--> > > > 169s <-- ACK > > > > I noticed the above ACK doesn't adhere to RFC 6191. It says: > > "If the previous incarnation of the connection used Timestamps, then: > > if ... > > ... > > * Otherwise, silently drop the incoming SYN segment, thus leaving > > the previous incarnation of the connection in the TIME-WAIT > > state. > > " > > But the timewait socket sends an ACK because of this code snippet: > > tcp_timewait_state_process() > > -> // the checks of SYN packet failed. > > -> if (!th->rst) { > > -> return TCP_TW_ACK; // this line can be traced back to 2005 > > This is a challenge ACK following RFC 5961. Please note the idea of challenge ack was proposed in 2010. But this code snippet has already existed before 2005. If it is a challenge ack, then at least we need to count it (by using NET_INC_STATS(net, LINUX_MIB_TCPCHALLENGEACK);). > > If SYN is returned here, the client may lose the chance to RST the > previous connection in TIME_WAIT. > > https://www.rfc-editor.org/rfc/rfc9293.html#section-3.10.7.4-2.4.1 > ---8<--- > - TIME-WAIT STATE > > o If the SYN bit is set in these synchronized states, it may > be either a legitimate new connection attempt (e.g., in the > case of TIME-WAIT), an error where the connection should be > reset, or the result of an attack attempt, as described in > RFC 5961 [9]. For the TIME-WAIT state, new connections can > be accepted if the Timestamp Option is used and meets > expectations (per [40]). For all other cases, RFC 5961 > provides a mitigation with applicability to some situations, > though there are also alternatives that offer cryptographic > protection (see Section 7). RFC 5961 recommends that in > these synchronized states, if the SYN bit is set, > irrespective of the sequence number, TCP endpoints MUST send > a "challenge ACK" to the remote peer: > > <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> > ---8<--- > > https://datatracker.ietf.org/doc/html/rfc5961#section-4 > ---8<--- > 1) If the SYN bit is set, irrespective of the sequence number, TCP > MUST send an ACK (also referred to as challenge ACK) to the remote > peer: > > <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> > > After sending the acknowledgment, TCP MUST drop the unacceptable > segment and stop processing further. > ---8<--- The RFC 5961 4.2 was implemented in tcp_validate_incoming(): /* step 4: Check for a SYN * RFC 5961 4.2 : Send a challenge ack */ if (th->syn) { if (sk->sk_state == TCP_SYN_RECV && sk->sk_socket && th->ack && TCP_SKB_CB(skb)->seq + 1 == TCP_SKB_CB(skb)->end_seq && TCP_SKB_CB(skb)->seq + 1 == tp->rcv_nxt && TCP_SKB_CB(skb)->ack_seq == tp->snd_nxt) goto pass; syn_challenge: if (syn_inerr) TCP_INC_STATS(sock_net(sk), TCP_MIB_INERRS); NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPSYNCHALLENGE); tcp_send_challenge_ack(sk); SKB_DR_SET(reason, TCP_INVALID_SYN); goto discard; } Also, this quotation you mentioned obviously doesn't match the kernel implementation: "If the SYN bit is set, irrespective of the sequence number, TCP MUST send an ACK" The tcp_timewait_state_process() does care about the seq number, or else timewait socket would refuse every SYN packet. Thanks, Jason
From: Jason Xing <kerneljasonxing@gmail.com> Date: Thu, 7 Nov 2024 13:23:50 +0800 > On Thu, Nov 7, 2024 at 12:15 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > > > From: Jason Xing <kerneljasonxing@gmail.com> > > Date: Thu, 7 Nov 2024 11:16:04 +0800 > > > > Here is how things happen in production: > > > > Time Client(A) Server(B) > > > > 0s SYN--> > > > > ... > > > > 132s <-- FIN > > > > ... > > > > 169s FIN--> > > > > 169s <-- ACK > > > > 169s SYN--> > > > > 169s <-- ACK > > > > > > I noticed the above ACK doesn't adhere to RFC 6191. It says: > > > "If the previous incarnation of the connection used Timestamps, then: > > > if ... > > > ... > > > * Otherwise, silently drop the incoming SYN segment, thus leaving > > > the previous incarnation of the connection in the TIME-WAIT > > > state. > > > " > > > But the timewait socket sends an ACK because of this code snippet: > > > tcp_timewait_state_process() > > > -> // the checks of SYN packet failed. > > > -> if (!th->rst) { > > > -> return TCP_TW_ACK; // this line can be traced back to 2005 > > > > This is a challenge ACK following RFC 5961. > > Please note the idea of challenge ack was proposed in 2010. But this > code snippet has already existed before 2005. If it is a challenge > ack, then at least we need to count it (by using NET_INC_STATS(net, > LINUX_MIB_TCPCHALLENGEACK);). The word was not accurate, the behaviour is compliant with RFC 5961. RFC is often formalised based on real implementations. Incrementing the count makes sense to me. > > > > > If SYN is returned here, the client may lose the chance to RST the > > previous connection in TIME_WAIT. > > > > https://www.rfc-editor.org/rfc/rfc9293.html#section-3.10.7.4-2.4.1 > > ---8<--- > > - TIME-WAIT STATE > > > > o If the SYN bit is set in these synchronized states, it may > > be either a legitimate new connection attempt (e.g., in the > > case of TIME-WAIT), an error where the connection should be > > reset, or the result of an attack attempt, as described in > > RFC 5961 [9]. For the TIME-WAIT state, new connections can > > be accepted if the Timestamp Option is used and meets > > expectations (per [40]). For all other cases, RFC 5961 > > provides a mitigation with applicability to some situations, > > though there are also alternatives that offer cryptographic > > protection (see Section 7). RFC 5961 recommends that in > > these synchronized states, if the SYN bit is set, > > irrespective of the sequence number, TCP endpoints MUST send > > a "challenge ACK" to the remote peer: > > > > <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> > > ---8<--- > > > > https://datatracker.ietf.org/doc/html/rfc5961#section-4 > > ---8<--- > > 1) If the SYN bit is set, irrespective of the sequence number, TCP > > MUST send an ACK (also referred to as challenge ACK) to the remote > > peer: > > > > <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> > > > > After sending the acknowledgment, TCP MUST drop the unacceptable > > segment and stop processing further. > > ---8<--- > > The RFC 5961 4.2 was implemented in tcp_validate_incoming(): > /* step 4: Check for a SYN > * RFC 5961 4.2 : Send a challenge ack > */ > if (th->syn) { > if (sk->sk_state == TCP_SYN_RECV && sk->sk_socket && th->ack && > TCP_SKB_CB(skb)->seq + 1 == TCP_SKB_CB(skb)->end_seq && > TCP_SKB_CB(skb)->seq + 1 == tp->rcv_nxt && > TCP_SKB_CB(skb)->ack_seq == tp->snd_nxt) > goto pass; > syn_challenge: > if (syn_inerr) > TCP_INC_STATS(sock_net(sk), TCP_MIB_INERRS); > NET_INC_STATS(sock_net(sk), > LINUX_MIB_TCPSYNCHALLENGE); > tcp_send_challenge_ack(sk); > SKB_DR_SET(reason, TCP_INVALID_SYN); > goto discard; > } > > Also, this quotation you mentioned obviously doesn't match the kernel > implementation: > "If the SYN bit is set, irrespective of the sequence number, TCP MUST > send an ACK" > The tcp_timewait_state_process() does care about the seq number, or > else timewait socket would refuse every SYN packet. That's why I pasted RFC 9293 first that clearly states that we should check seq number and then return ACK for all other cases.
On Thu, Nov 7, 2024 at 1:43 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > From: Jason Xing <kerneljasonxing@gmail.com> > Date: Thu, 7 Nov 2024 13:23:50 +0800 > > On Thu, Nov 7, 2024 at 12:15 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > > > > > From: Jason Xing <kerneljasonxing@gmail.com> > > > Date: Thu, 7 Nov 2024 11:16:04 +0800 > > > > > Here is how things happen in production: > > > > > Time Client(A) Server(B) > > > > > 0s SYN--> > > > > > ... > > > > > 132s <-- FIN > > > > > ... > > > > > 169s FIN--> > > > > > 169s <-- ACK > > > > > 169s SYN--> > > > > > 169s <-- ACK > > > > > > > > I noticed the above ACK doesn't adhere to RFC 6191. It says: > > > > "If the previous incarnation of the connection used Timestamps, then: > > > > if ... > > > > ... > > > > * Otherwise, silently drop the incoming SYN segment, thus leaving > > > > the previous incarnation of the connection in the TIME-WAIT > > > > state. > > > > " > > > > But the timewait socket sends an ACK because of this code snippet: > > > > tcp_timewait_state_process() > > > > -> // the checks of SYN packet failed. > > > > -> if (!th->rst) { > > > > -> return TCP_TW_ACK; // this line can be traced back to 2005 > > > > > > This is a challenge ACK following RFC 5961. > > > > Please note the idea of challenge ack was proposed in 2010. But this > > code snippet has already existed before 2005. If it is a challenge > > ack, then at least we need to count it (by using NET_INC_STATS(net, > > LINUX_MIB_TCPCHALLENGEACK);). > > The word was not accurate, the behaviour is compliant with RFC 5961. > RFC is often formalised based on real implementations. > > Incrementing the count makes sense to me. > > > > > > > > > If SYN is returned here, the client may lose the chance to RST the > > > previous connection in TIME_WAIT. > > > > > > https://www.rfc-editor.org/rfc/rfc9293.html#section-3.10.7.4-2.4.1 > > > ---8<--- > > > - TIME-WAIT STATE > > > > > > o If the SYN bit is set in these synchronized states, it may > > > be either a legitimate new connection attempt (e.g., in the > > > case of TIME-WAIT), an error where the connection should be > > > reset, or the result of an attack attempt, as described in > > > RFC 5961 [9]. For the TIME-WAIT state, new connections can > > > be accepted if the Timestamp Option is used and meets > > > expectations (per [40]). For all other cases, RFC 5961 > > > provides a mitigation with applicability to some situations, > > > though there are also alternatives that offer cryptographic > > > protection (see Section 7). RFC 5961 recommends that in > > > these synchronized states, if the SYN bit is set, > > > irrespective of the sequence number, TCP endpoints MUST send > > > a "challenge ACK" to the remote peer: > > > > > > <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> > > > ---8<--- > > > > > > https://datatracker.ietf.org/doc/html/rfc5961#section-4 > > > ---8<--- > > > 1) If the SYN bit is set, irrespective of the sequence number, TCP > > > MUST send an ACK (also referred to as challenge ACK) to the remote > > > peer: > > > > > > <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> > > > > > > After sending the acknowledgment, TCP MUST drop the unacceptable > > > segment and stop processing further. > > > ---8<--- > > > > The RFC 5961 4.2 was implemented in tcp_validate_incoming(): > > /* step 4: Check for a SYN > > * RFC 5961 4.2 : Send a challenge ack > > */ > > if (th->syn) { > > if (sk->sk_state == TCP_SYN_RECV && sk->sk_socket && th->ack && > > TCP_SKB_CB(skb)->seq + 1 == TCP_SKB_CB(skb)->end_seq && > > TCP_SKB_CB(skb)->seq + 1 == tp->rcv_nxt && > > TCP_SKB_CB(skb)->ack_seq == tp->snd_nxt) > > goto pass; > > syn_challenge: > > if (syn_inerr) > > TCP_INC_STATS(sock_net(sk), TCP_MIB_INERRS); > > NET_INC_STATS(sock_net(sk), > > LINUX_MIB_TCPSYNCHALLENGE); > > tcp_send_challenge_ack(sk); > > SKB_DR_SET(reason, TCP_INVALID_SYN); > > goto discard; > > } > > > > Also, this quotation you mentioned obviously doesn't match the kernel > > implementation: > > "If the SYN bit is set, irrespective of the sequence number, TCP MUST > > send an ACK" > > The tcp_timewait_state_process() does care about the seq number, or > > else timewait socket would refuse every SYN packet. > > That's why I pasted RFC 9293 first that clearly states that we > should check seq number and then return ACK for all other cases. I don't think so. RFC 9293 only states that RFC 5691 provides an approach that mitigates the risk by rejecting all the SYN packets if the socket stays in synchronized state. It's "For all other cases" in RFC 9293. Please loot at "irrespective of the sequence number" in RFC 5691 4.2 [1]. It means no matter what the seq is we MUST send back an ACK instead of establishing a new connection. Actually the tcp_timewait_state_process() checks the seq or timestamp in the SYN packet. [1]: https://datatracker.ietf.org/doc/html/rfc5961#section-4.2 Thanks, Jason
From: Jason Xing <kerneljasonxing@gmail.com> Date: Thu, 7 Nov 2024 14:51:35 +0800 > On Thu, Nov 7, 2024 at 1:43 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > > > From: Jason Xing <kerneljasonxing@gmail.com> > > Date: Thu, 7 Nov 2024 13:23:50 +0800 > > > On Thu, Nov 7, 2024 at 12:15 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > > > > > > > From: Jason Xing <kerneljasonxing@gmail.com> > > > > Date: Thu, 7 Nov 2024 11:16:04 +0800 > > > > > > Here is how things happen in production: > > > > > > Time Client(A) Server(B) > > > > > > 0s SYN--> > > > > > > ... > > > > > > 132s <-- FIN > > > > > > ... > > > > > > 169s FIN--> > > > > > > 169s <-- ACK > > > > > > 169s SYN--> > > > > > > 169s <-- ACK > > > > > > > > > > I noticed the above ACK doesn't adhere to RFC 6191. It says: > > > > > "If the previous incarnation of the connection used Timestamps, then: > > > > > if ... > > > > > ... > > > > > * Otherwise, silently drop the incoming SYN segment, thus leaving > > > > > the previous incarnation of the connection in the TIME-WAIT > > > > > state. > > > > > " > > > > > But the timewait socket sends an ACK because of this code snippet: > > > > > tcp_timewait_state_process() > > > > > -> // the checks of SYN packet failed. > > > > > -> if (!th->rst) { > > > > > -> return TCP_TW_ACK; // this line can be traced back to 2005 > > > > > > > > This is a challenge ACK following RFC 5961. > > > > > > Please note the idea of challenge ack was proposed in 2010. But this > > > code snippet has already existed before 2005. If it is a challenge > > > ack, then at least we need to count it (by using NET_INC_STATS(net, > > > LINUX_MIB_TCPCHALLENGEACK);). > > > > The word was not accurate, the behaviour is compliant with RFC 5961. > > RFC is often formalised based on real implementations. > > > > Incrementing the count makes sense to me. > > > > > > > > > > > > > If SYN is returned here, the client may lose the chance to RST the > > > > previous connection in TIME_WAIT. > > > > > > > > https://www.rfc-editor.org/rfc/rfc9293.html#section-3.10.7.4-2.4.1 > > > > ---8<--- > > > > - TIME-WAIT STATE > > > > > > > > o If the SYN bit is set in these synchronized states, it may > > > > be either a legitimate new connection attempt (e.g., in the > > > > case of TIME-WAIT), an error where the connection should be > > > > reset, or the result of an attack attempt, as described in > > > > RFC 5961 [9]. For the TIME-WAIT state, new connections can > > > > be accepted if the Timestamp Option is used and meets > > > > expectations (per [40]). For all other cases, RFC 5961 > > > > provides a mitigation with applicability to some situations, > > > > though there are also alternatives that offer cryptographic > > > > protection (see Section 7). RFC 5961 recommends that in > > > > these synchronized states, if the SYN bit is set, > > > > irrespective of the sequence number, TCP endpoints MUST send > > > > a "challenge ACK" to the remote peer: > > > > > > > > <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> > > > > ---8<--- > > > > > > > > https://datatracker.ietf.org/doc/html/rfc5961#section-4 > > > > ---8<--- > > > > 1) If the SYN bit is set, irrespective of the sequence number, TCP > > > > MUST send an ACK (also referred to as challenge ACK) to the remote > > > > peer: > > > > > > > > <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> > > > > > > > > After sending the acknowledgment, TCP MUST drop the unacceptable > > > > segment and stop processing further. > > > > ---8<--- > > > > > > The RFC 5961 4.2 was implemented in tcp_validate_incoming(): > > > /* step 4: Check for a SYN > > > * RFC 5961 4.2 : Send a challenge ack > > > */ > > > if (th->syn) { > > > if (sk->sk_state == TCP_SYN_RECV && sk->sk_socket && th->ack && > > > TCP_SKB_CB(skb)->seq + 1 == TCP_SKB_CB(skb)->end_seq && > > > TCP_SKB_CB(skb)->seq + 1 == tp->rcv_nxt && > > > TCP_SKB_CB(skb)->ack_seq == tp->snd_nxt) > > > goto pass; > > > syn_challenge: > > > if (syn_inerr) > > > TCP_INC_STATS(sock_net(sk), TCP_MIB_INERRS); > > > NET_INC_STATS(sock_net(sk), > > > LINUX_MIB_TCPSYNCHALLENGE); > > > tcp_send_challenge_ack(sk); > > > SKB_DR_SET(reason, TCP_INVALID_SYN); > > > goto discard; > > > } > > > > > > Also, this quotation you mentioned obviously doesn't match the kernel > > > implementation: > > > "If the SYN bit is set, irrespective of the sequence number, TCP MUST > > > send an ACK" > > > The tcp_timewait_state_process() does care about the seq number, or > > > else timewait socket would refuse every SYN packet. > > > > That's why I pasted RFC 9293 first that clearly states that we > > should check seq number and then return ACK for all other cases. > > I don't think so. > > RFC 9293 only states that RFC 5691 provides an approach that mitigates > the risk by rejecting all the SYN packets if the socket stays in > synchronized state. It's "For all other cases" in RFC 9293. RFC 9293 states which RFC to prioritise. You will find the link [40] is RFC 6191. ---8<--- For the TIME-WAIT state, new connections can be accepted if the Timestamp Option is used and meets expectations (per [40]). For all other cases, RFC 5961 ... ---8<--- > Please loot at "irrespective of the sequence number" in RFC 5691 4.2 > [1]. It means no matter what the seq is we MUST send back an ACK > instead of establishing a new connection. RFC 9293 mentions accepatble cases first, so this is only applied to "all other cases" > Actually the tcp_timewait_state_process() checks the seq or timestamp > in the SYN packet. and this part takes precedence than "all other cases". Also, you missed that the pasted part is the 4th step of incoming segment processing. https://www.rfc-editor.org/rfc/rfc9293.html#section-3.10.7.4 ---8<--- First, check sequence number: ... Second, check the RST bit: ... Third, check security: ... Fourth, check the SYN bit: ... TIME-WAIT STATE If the SYN bit is set in these synchronized states... ---8<--- So, RFC 9293 says "check seq number, RST, security, then if the connection is still accepatable for TIME_WAIT based on RFC 6191, accept it, otherwise, return ACK based on RFC 5691".
On Thu, Nov 7, 2024 at 3:11 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > From: Jason Xing <kerneljasonxing@gmail.com> > Date: Thu, 7 Nov 2024 14:51:35 +0800 > > On Thu, Nov 7, 2024 at 1:43 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > > > > > From: Jason Xing <kerneljasonxing@gmail.com> > > > Date: Thu, 7 Nov 2024 13:23:50 +0800 > > > > On Thu, Nov 7, 2024 at 12:15 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > > > > > > > > > From: Jason Xing <kerneljasonxing@gmail.com> > > > > > Date: Thu, 7 Nov 2024 11:16:04 +0800 > > > > > > > Here is how things happen in production: > > > > > > > Time Client(A) Server(B) > > > > > > > 0s SYN--> > > > > > > > ... > > > > > > > 132s <-- FIN > > > > > > > ... > > > > > > > 169s FIN--> > > > > > > > 169s <-- ACK > > > > > > > 169s SYN--> > > > > > > > 169s <-- ACK > > > > > > > > > > > > I noticed the above ACK doesn't adhere to RFC 6191. It says: > > > > > > "If the previous incarnation of the connection used Timestamps, then: > > > > > > if ... > > > > > > ... > > > > > > * Otherwise, silently drop the incoming SYN segment, thus leaving > > > > > > the previous incarnation of the connection in the TIME-WAIT > > > > > > state. > > > > > > " > > > > > > But the timewait socket sends an ACK because of this code snippet: > > > > > > tcp_timewait_state_process() > > > > > > -> // the checks of SYN packet failed. > > > > > > -> if (!th->rst) { > > > > > > -> return TCP_TW_ACK; // this line can be traced back to 2005 > > > > > > > > > > This is a challenge ACK following RFC 5961. > > > > > > > > Please note the idea of challenge ack was proposed in 2010. But this > > > > code snippet has already existed before 2005. If it is a challenge > > > > ack, then at least we need to count it (by using NET_INC_STATS(net, > > > > LINUX_MIB_TCPCHALLENGEACK);). > > > > > > The word was not accurate, the behaviour is compliant with RFC 5961. > > > RFC is often formalised based on real implementations. > > > > > > Incrementing the count makes sense to me. > > > > > > > > > > > > > > > > > If SYN is returned here, the client may lose the chance to RST the > > > > > previous connection in TIME_WAIT. > > > > > > > > > > https://www.rfc-editor.org/rfc/rfc9293.html#section-3.10.7.4-2.4.1 > > > > > ---8<--- > > > > > - TIME-WAIT STATE > > > > > > > > > > o If the SYN bit is set in these synchronized states, it may > > > > > be either a legitimate new connection attempt (e.g., in the > > > > > case of TIME-WAIT), an error where the connection should be > > > > > reset, or the result of an attack attempt, as described in > > > > > RFC 5961 [9]. For the TIME-WAIT state, new connections can > > > > > be accepted if the Timestamp Option is used and meets > > > > > expectations (per [40]). For all other cases, RFC 5961 > > > > > provides a mitigation with applicability to some situations, > > > > > though there are also alternatives that offer cryptographic > > > > > protection (see Section 7). RFC 5961 recommends that in > > > > > these synchronized states, if the SYN bit is set, > > > > > irrespective of the sequence number, TCP endpoints MUST send > > > > > a "challenge ACK" to the remote peer: > > > > > > > > > > <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> > > > > > ---8<--- > > > > > > > > > > https://datatracker.ietf.org/doc/html/rfc5961#section-4 > > > > > ---8<--- > > > > > 1) If the SYN bit is set, irrespective of the sequence number, TCP > > > > > MUST send an ACK (also referred to as challenge ACK) to the remote > > > > > peer: > > > > > > > > > > <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK> > > > > > > > > > > After sending the acknowledgment, TCP MUST drop the unacceptable > > > > > segment and stop processing further. > > > > > ---8<--- > > > > > > > > The RFC 5961 4.2 was implemented in tcp_validate_incoming(): > > > > /* step 4: Check for a SYN > > > > * RFC 5961 4.2 : Send a challenge ack > > > > */ > > > > if (th->syn) { > > > > if (sk->sk_state == TCP_SYN_RECV && sk->sk_socket && th->ack && > > > > TCP_SKB_CB(skb)->seq + 1 == TCP_SKB_CB(skb)->end_seq && > > > > TCP_SKB_CB(skb)->seq + 1 == tp->rcv_nxt && > > > > TCP_SKB_CB(skb)->ack_seq == tp->snd_nxt) > > > > goto pass; > > > > syn_challenge: > > > > if (syn_inerr) > > > > TCP_INC_STATS(sock_net(sk), TCP_MIB_INERRS); > > > > NET_INC_STATS(sock_net(sk), > > > > LINUX_MIB_TCPSYNCHALLENGE); > > > > tcp_send_challenge_ack(sk); > > > > SKB_DR_SET(reason, TCP_INVALID_SYN); > > > > goto discard; > > > > } > > > > > > > > Also, this quotation you mentioned obviously doesn't match the kernel > > > > implementation: > > > > "If the SYN bit is set, irrespective of the sequence number, TCP MUST > > > > send an ACK" > > > > The tcp_timewait_state_process() does care about the seq number, or > > > > else timewait socket would refuse every SYN packet. > > > > > > That's why I pasted RFC 9293 first that clearly states that we > > > should check seq number and then return ACK for all other cases. > > > > I don't think so. > > > > RFC 9293 only states that RFC 5691 provides an approach that mitigates > > the risk by rejecting all the SYN packets if the socket stays in > > synchronized state. It's "For all other cases" in RFC 9293. > > RFC 9293 states which RFC to prioritise. You will find the > link [40] is RFC 6191. > > ---8<--- > For the TIME-WAIT state, new connections can > be accepted if the Timestamp Option is used and meets > expectations (per [40]). For all other cases, RFC 5961 > ... > ---8<--- > > > Please loot at "irrespective of the sequence number" in RFC 5691 4.2 > > [1]. It means no matter what the seq is we MUST send back an ACK > > instead of establishing a new connection. > > RFC 9293 mentions accepatble cases first, so this is only applied > to "all other cases" > > > > Actually the tcp_timewait_state_process() checks the seq or timestamp > > in the SYN packet. > > and this part takes precedence than "all other cases". > > Also, you missed that the pasted part is the 4th step of incoming > segment processing. > > https://www.rfc-editor.org/rfc/rfc9293.html#section-3.10.7.4 > ---8<--- > First, check sequence number: ... > Second, check the RST bit: ... > Third, check security: ... > Fourth, check the SYN bit: > ... > TIME-WAIT STATE > If the SYN bit is set in these synchronized states... > ---8<--- > > So, RFC 9293 says "check seq number, RST, security, then > if the connection is still accepatable for TIME_WAIT based on > RFC 6191, accept it, otherwise, return ACK based on RFC 5691". I see what you mean here. If that is so, tcp_timewait_state_process() has already implemented the challenge ack even before the challenge ack showed up in 2010. Interesting. If this part refers to RFC 5691, then we need to copy part of tcp_send_challenge_ack() in this case to send the challenge ack, like testing sysctl_tcp_challenge_ack_limit, etc. Thanks, Jason
Hi Jason, On 2024/11/5 10:55, Jason Xing wrote: > From: Jason Xing <kernelxing@tencent.com> > > We found there are rare chances that some RST packets appear during > the shakehands because the timewait socket cannot accept the SYN and > doesn't return TCP_TW_SYN in tcp_timewait_state_process(). > > Here is how things happen in production: > Time Client(A) Server(B) > 0s SYN--> > ... > 132s <-- FIN > ... > 169s FIN--> > 169s <-- ACK > 169s SYN--> > 169s <-- ACK > 169s RST--> > As above picture shows, the two flows have a start time difference > of 169 seconds. B starts to send FIN so it will finally enter into > TIMEWAIT state. Nearly at the same time A launches a new connection > that soon is reset by itself due to receiving a ACK. > > There are two key checks in tcp_timewait_state_process() when timewait > socket in B receives the SYN packet: > 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt) > 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0) > > Regarding the first rule, it fails as expected because in the first > connection the seq of SYN sent from A is 1892994276, then 169s have > passed, the second SYN has 239034613 (caused by overflow of s32). > > Then how about the second rule? > It fails again! > Let's take a look at how the tsval comes out: > __tcp_transmit_skb() > -> tcp_syn_options() > -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset; > The timestamp depends on two things, one is skb->skb_mstamp_ns, the > other is tp->tsoffset. The latter value is fixed, so we don't need > to care about it. If both operations (sending FIN and then starting > sending SYN) from A happen in 1ms, then the tsval would be the same. > It can be clearly seen in the tcpdump log. Notice that the tsval is > with millisecond precision. > > Based on the above analysis, I decided to make a small change to > the check in tcp_timewait_state_process() so that the second flow > would not fail. > I wonder what a bad result the RST causes. As far as I know, the client will not close the connect and return. Instead, it re-sends an SYN in TCP_TIMEOUT_MIN(2) jiffies (implemented in tcp_rcv_synsent_state_process). So the second connection could still be established successfully, at the cost of a bit more delay. Like: Time Client(A) Server(B) 0s SYN--> ... 132s <-- FIN ... 169s FIN--> 169s <-- ACK 169s SYN--> 169s <-- ACK 169s RST--> ~2jiffies SYN--> <-- SYN,ACK Thanks.
On Thu, Nov 7, 2024 at 3:51 PM Philo Lu <lulie@linux.alibaba.com> wrote: > > Hi Jason, > > On 2024/11/5 10:55, Jason Xing wrote: > > From: Jason Xing <kernelxing@tencent.com> > > > > We found there are rare chances that some RST packets appear during > > the shakehands because the timewait socket cannot accept the SYN and > > doesn't return TCP_TW_SYN in tcp_timewait_state_process(). > > > > Here is how things happen in production: > > Time Client(A) Server(B) > > 0s SYN--> > > ... > > 132s <-- FIN > > ... > > 169s FIN--> > > 169s <-- ACK > > 169s SYN--> > > 169s <-- ACK > > 169s RST--> > > As above picture shows, the two flows have a start time difference > > of 169 seconds. B starts to send FIN so it will finally enter into > > TIMEWAIT state. Nearly at the same time A launches a new connection > > that soon is reset by itself due to receiving a ACK. > > > > There are two key checks in tcp_timewait_state_process() when timewait > > socket in B receives the SYN packet: > > 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt) > > 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0) > > > > Regarding the first rule, it fails as expected because in the first > > connection the seq of SYN sent from A is 1892994276, then 169s have > > passed, the second SYN has 239034613 (caused by overflow of s32). > > > > Then how about the second rule? > > It fails again! > > Let's take a look at how the tsval comes out: > > __tcp_transmit_skb() > > -> tcp_syn_options() > > -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset; > > The timestamp depends on two things, one is skb->skb_mstamp_ns, the > > other is tp->tsoffset. The latter value is fixed, so we don't need > > to care about it. If both operations (sending FIN and then starting > > sending SYN) from A happen in 1ms, then the tsval would be the same. > > It can be clearly seen in the tcpdump log. Notice that the tsval is > > with millisecond precision. > > > > Based on the above analysis, I decided to make a small change to > > the check in tcp_timewait_state_process() so that the second flow > > would not fail. > > > > I wonder what a bad result the RST causes. As far as I know, the client > will not close the connect and return. Instead, it re-sends an SYN in > TCP_TIMEOUT_MIN(2) jiffies (implemented in > tcp_rcv_synsent_state_process). So the second connection could still be > established successfully, at the cost of a bit more delay. Like: > > Time Client(A) Server(B) > 0s SYN--> > ... > 132s <-- FIN > ... > 169s FIN--> > 169s <-- ACK > 169s SYN--> > 169s <-- ACK > 169s RST--> > ~2jiffies SYN--> > <-- SYN,ACK That's exactly what I meant here :) Originally I didn't expect the application to relaunch a connection in this case.
On 2024/11/7 16:01, Jason Xing wrote: > On Thu, Nov 7, 2024 at 3:51 PM Philo Lu <lulie@linux.alibaba.com> wrote: >> >> Hi Jason, >> >> On 2024/11/5 10:55, Jason Xing wrote: >>> From: Jason Xing <kernelxing@tencent.com> >>> >>> We found there are rare chances that some RST packets appear during >>> the shakehands because the timewait socket cannot accept the SYN and >>> doesn't return TCP_TW_SYN in tcp_timewait_state_process(). >>> >>> Here is how things happen in production: >>> Time Client(A) Server(B) >>> 0s SYN--> >>> ... >>> 132s <-- FIN >>> ... >>> 169s FIN--> >>> 169s <-- ACK >>> 169s SYN--> >>> 169s <-- ACK >>> 169s RST--> >>> As above picture shows, the two flows have a start time difference >>> of 169 seconds. B starts to send FIN so it will finally enter into >>> TIMEWAIT state. Nearly at the same time A launches a new connection >>> that soon is reset by itself due to receiving a ACK. >>> >>> There are two key checks in tcp_timewait_state_process() when timewait >>> socket in B receives the SYN packet: >>> 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt) >>> 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0) >>> >>> Regarding the first rule, it fails as expected because in the first >>> connection the seq of SYN sent from A is 1892994276, then 169s have >>> passed, the second SYN has 239034613 (caused by overflow of s32). >>> >>> Then how about the second rule? >>> It fails again! >>> Let's take a look at how the tsval comes out: >>> __tcp_transmit_skb() >>> -> tcp_syn_options() >>> -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset; >>> The timestamp depends on two things, one is skb->skb_mstamp_ns, the >>> other is tp->tsoffset. The latter value is fixed, so we don't need >>> to care about it. If both operations (sending FIN and then starting >>> sending SYN) from A happen in 1ms, then the tsval would be the same. >>> It can be clearly seen in the tcpdump log. Notice that the tsval is >>> with millisecond precision. >>> >>> Based on the above analysis, I decided to make a small change to >>> the check in tcp_timewait_state_process() so that the second flow >>> would not fail. >>> >> >> I wonder what a bad result the RST causes. As far as I know, the client >> will not close the connect and return. Instead, it re-sends an SYN in >> TCP_TIMEOUT_MIN(2) jiffies (implemented in >> tcp_rcv_synsent_state_process). So the second connection could still be >> established successfully, at the cost of a bit more delay. Like: >> >> Time Client(A) Server(B) >> 0s SYN--> >> ... >> 132s <-- FIN >> ... >> 169s FIN--> >> 169s <-- ACK >> 169s SYN--> >> 169s <-- ACK >> 169s RST--> >> ~2jiffies SYN--> >> <-- SYN,ACK > > That's exactly what I meant here :) Originally I didn't expect the > application to relaunch a connection in this case. s/application/kernel/, right? Because the retry is transparent to user applications except the additional latency. I think all of these are finished in a single connect() :)
On Thu, Nov 7, 2024 at 4:22 PM Philo Lu <lulie@linux.alibaba.com> wrote: > > > > On 2024/11/7 16:01, Jason Xing wrote: > > On Thu, Nov 7, 2024 at 3:51 PM Philo Lu <lulie@linux.alibaba.com> wrote: > >> > >> Hi Jason, > >> > >> On 2024/11/5 10:55, Jason Xing wrote: > >>> From: Jason Xing <kernelxing@tencent.com> > >>> > >>> We found there are rare chances that some RST packets appear during > >>> the shakehands because the timewait socket cannot accept the SYN and > >>> doesn't return TCP_TW_SYN in tcp_timewait_state_process(). > >>> > >>> Here is how things happen in production: > >>> Time Client(A) Server(B) > >>> 0s SYN--> > >>> ... > >>> 132s <-- FIN > >>> ... > >>> 169s FIN--> > >>> 169s <-- ACK > >>> 169s SYN--> > >>> 169s <-- ACK > >>> 169s RST--> > >>> As above picture shows, the two flows have a start time difference > >>> of 169 seconds. B starts to send FIN so it will finally enter into > >>> TIMEWAIT state. Nearly at the same time A launches a new connection > >>> that soon is reset by itself due to receiving a ACK. > >>> > >>> There are two key checks in tcp_timewait_state_process() when timewait > >>> socket in B receives the SYN packet: > >>> 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt) > >>> 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0) > >>> > >>> Regarding the first rule, it fails as expected because in the first > >>> connection the seq of SYN sent from A is 1892994276, then 169s have > >>> passed, the second SYN has 239034613 (caused by overflow of s32). > >>> > >>> Then how about the second rule? > >>> It fails again! > >>> Let's take a look at how the tsval comes out: > >>> __tcp_transmit_skb() > >>> -> tcp_syn_options() > >>> -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset; > >>> The timestamp depends on two things, one is skb->skb_mstamp_ns, the > >>> other is tp->tsoffset. The latter value is fixed, so we don't need > >>> to care about it. If both operations (sending FIN and then starting > >>> sending SYN) from A happen in 1ms, then the tsval would be the same. > >>> It can be clearly seen in the tcpdump log. Notice that the tsval is > >>> with millisecond precision. > >>> > >>> Based on the above analysis, I decided to make a small change to > >>> the check in tcp_timewait_state_process() so that the second flow > >>> would not fail. > >>> > >> > >> I wonder what a bad result the RST causes. As far as I know, the client > >> will not close the connect and return. Instead, it re-sends an SYN in > >> TCP_TIMEOUT_MIN(2) jiffies (implemented in > >> tcp_rcv_synsent_state_process). So the second connection could still be > >> established successfully, at the cost of a bit more delay. Like: > >> > >> Time Client(A) Server(B) > >> 0s SYN--> > >> ... > >> 132s <-- FIN > >> ... > >> 169s FIN--> > >> 169s <-- ACK > >> 169s SYN--> > >> 169s <-- ACK > >> 169s RST--> > >> ~2jiffies SYN--> > >> <-- SYN,ACK > > > > That's exactly what I meant here :) Originally I didn't expect the > > application to relaunch a connection in this case. > > s/application/kernel/, right? No. Perhaps I didn't make myself clear. If the kernel doesn't silently drop the SYN and then send back an ACK, the application has to call the connect() syscall again. > Because the retry is transparent to user > applications except the additional latency. I think all of these are > finished in a single connect() :) Right.
On Thu, Nov 7, 2024 at 9:27 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > On Thu, Nov 7, 2024 at 4:22 PM Philo Lu <lulie@linux.alibaba.com> wrote: > > > > > > > > On 2024/11/7 16:01, Jason Xing wrote: > > > On Thu, Nov 7, 2024 at 3:51 PM Philo Lu <lulie@linux.alibaba.com> wrote: > > >> > > >> Hi Jason, > > >> > > >> On 2024/11/5 10:55, Jason Xing wrote: > > >>> From: Jason Xing <kernelxing@tencent.com> > > >>> > > >>> We found there are rare chances that some RST packets appear during > > >>> the shakehands because the timewait socket cannot accept the SYN and > > >>> doesn't return TCP_TW_SYN in tcp_timewait_state_process(). > > >>> > > >>> Here is how things happen in production: > > >>> Time Client(A) Server(B) > > >>> 0s SYN--> > > >>> ... > > >>> 132s <-- FIN > > >>> ... > > >>> 169s FIN--> > > >>> 169s <-- ACK > > >>> 169s SYN--> > > >>> 169s <-- ACK > > >>> 169s RST--> > > >>> As above picture shows, the two flows have a start time difference > > >>> of 169 seconds. B starts to send FIN so it will finally enter into > > >>> TIMEWAIT state. Nearly at the same time A launches a new connection > > >>> that soon is reset by itself due to receiving a ACK. > > >>> > > >>> There are two key checks in tcp_timewait_state_process() when timewait > > >>> socket in B receives the SYN packet: > > >>> 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt) > > >>> 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0) > > >>> > > >>> Regarding the first rule, it fails as expected because in the first > > >>> connection the seq of SYN sent from A is 1892994276, then 169s have > > >>> passed, the second SYN has 239034613 (caused by overflow of s32). > > >>> > > >>> Then how about the second rule? > > >>> It fails again! > > >>> Let's take a look at how the tsval comes out: > > >>> __tcp_transmit_skb() > > >>> -> tcp_syn_options() > > >>> -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset; > > >>> The timestamp depends on two things, one is skb->skb_mstamp_ns, the > > >>> other is tp->tsoffset. The latter value is fixed, so we don't need > > >>> to care about it. If both operations (sending FIN and then starting > > >>> sending SYN) from A happen in 1ms, then the tsval would be the same. > > >>> It can be clearly seen in the tcpdump log. Notice that the tsval is > > >>> with millisecond precision. > > >>> > > >>> Based on the above analysis, I decided to make a small change to > > >>> the check in tcp_timewait_state_process() so that the second flow > > >>> would not fail. > > >>> > > >> > > >> I wonder what a bad result the RST causes. As far as I know, the client > > >> will not close the connect and return. Instead, it re-sends an SYN in > > >> TCP_TIMEOUT_MIN(2) jiffies (implemented in > > >> tcp_rcv_synsent_state_process). So the second connection could still be > > >> established successfully, at the cost of a bit more delay. Like: > > >> > > >> Time Client(A) Server(B) > > >> 0s SYN--> > > >> ... > > >> 132s <-- FIN > > >> ... > > >> 169s FIN--> > > >> 169s <-- ACK > > >> 169s SYN--> > > >> 169s <-- ACK > > >> 169s RST--> > > >> ~2jiffies SYN--> > > >> <-- SYN,ACK > > > > > > That's exactly what I meant here :) Originally I didn't expect the > > > application to relaunch a connection in this case. > > > > s/application/kernel/, right? > > No. Perhaps I didn't make myself clear. If the kernel doesn't silently > drop the SYN and then send back an ACK, the application has to call > the connect() syscall again. My suggestion to stop the confusion: Provide a packetdrill test.
On Thu, Nov 7, 2024 at 4:37 PM Eric Dumazet <edumazet@google.com> wrote: > > On Thu, Nov 7, 2024 at 9:27 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > On Thu, Nov 7, 2024 at 4:22 PM Philo Lu <lulie@linux.alibaba.com> wrote: > > > > > > > > > > > > On 2024/11/7 16:01, Jason Xing wrote: > > > > On Thu, Nov 7, 2024 at 3:51 PM Philo Lu <lulie@linux.alibaba.com> wrote: > > > >> > > > >> Hi Jason, > > > >> > > > >> On 2024/11/5 10:55, Jason Xing wrote: > > > >>> From: Jason Xing <kernelxing@tencent.com> > > > >>> > > > >>> We found there are rare chances that some RST packets appear during > > > >>> the shakehands because the timewait socket cannot accept the SYN and > > > >>> doesn't return TCP_TW_SYN in tcp_timewait_state_process(). > > > >>> > > > >>> Here is how things happen in production: > > > >>> Time Client(A) Server(B) > > > >>> 0s SYN--> > > > >>> ... > > > >>> 132s <-- FIN > > > >>> ... > > > >>> 169s FIN--> > > > >>> 169s <-- ACK > > > >>> 169s SYN--> > > > >>> 169s <-- ACK > > > >>> 169s RST--> > > > >>> As above picture shows, the two flows have a start time difference > > > >>> of 169 seconds. B starts to send FIN so it will finally enter into > > > >>> TIMEWAIT state. Nearly at the same time A launches a new connection > > > >>> that soon is reset by itself due to receiving a ACK. > > > >>> > > > >>> There are two key checks in tcp_timewait_state_process() when timewait > > > >>> socket in B receives the SYN packet: > > > >>> 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt) > > > >>> 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0) > > > >>> > > > >>> Regarding the first rule, it fails as expected because in the first > > > >>> connection the seq of SYN sent from A is 1892994276, then 169s have > > > >>> passed, the second SYN has 239034613 (caused by overflow of s32). > > > >>> > > > >>> Then how about the second rule? > > > >>> It fails again! > > > >>> Let's take a look at how the tsval comes out: > > > >>> __tcp_transmit_skb() > > > >>> -> tcp_syn_options() > > > >>> -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset; > > > >>> The timestamp depends on two things, one is skb->skb_mstamp_ns, the > > > >>> other is tp->tsoffset. The latter value is fixed, so we don't need > > > >>> to care about it. If both operations (sending FIN and then starting > > > >>> sending SYN) from A happen in 1ms, then the tsval would be the same. > > > >>> It can be clearly seen in the tcpdump log. Notice that the tsval is > > > >>> with millisecond precision. > > > >>> > > > >>> Based on the above analysis, I decided to make a small change to > > > >>> the check in tcp_timewait_state_process() so that the second flow > > > >>> would not fail. > > > >>> > > > >> > > > >> I wonder what a bad result the RST causes. As far as I know, the client > > > >> will not close the connect and return. Instead, it re-sends an SYN in > > > >> TCP_TIMEOUT_MIN(2) jiffies (implemented in > > > >> tcp_rcv_synsent_state_process). So the second connection could still be > > > >> established successfully, at the cost of a bit more delay. Like: > > > >> > > > >> Time Client(A) Server(B) > > > >> 0s SYN--> > > > >> ... > > > >> 132s <-- FIN > > > >> ... > > > >> 169s FIN--> > > > >> 169s <-- ACK > > > >> 169s SYN--> > > > >> 169s <-- ACK > > > >> 169s RST--> > > > >> ~2jiffies SYN--> It doesn't happen. I dare to say the SYN will not be retransmitted soon which I can tell from many tcpdump logs I captured. > > > >> <-- SYN,ACK > > > > > > > > That's exactly what I meant here :) Originally I didn't expect the > > > > application to relaunch a connection in this case. > > > > > > s/application/kernel/, right? > > > > No. Perhaps I didn't make myself clear. If the kernel doesn't silently > > drop the SYN and then send back an ACK, the application has to call > > the connect() syscall again. > > My suggestion to stop the confusion: Oh, right now, I realized that Philo and I are not on the same page :( Please forget that conversation. My points are: 1) If B silently drops the SYN packet, A will retransmit an SYN packet and then the connection will be established. It's what I tried to propose and would like to see. It also adheres to the RFC 6191. 2) As kuniyuki pointed out on the contrary, sending an ACK (like the current implementation) instead of silently dropping the SYN packet is actually a challenge ack. If so, I think we need to consider this ACK as a challenge ack like what tcp_send_challenge_ack() does. I'd like to hear more about your opinions on the above conclusion. Thanks!
On Thu, Nov 7, 2024 at 10:01 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > On Thu, Nov 7, 2024 at 4:37 PM Eric Dumazet <edumazet@google.com> wrote: > > > > On Thu, Nov 7, 2024 at 9:27 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > On Thu, Nov 7, 2024 at 4:22 PM Philo Lu <lulie@linux.alibaba.com> wrote: > > > > > > > > > > > > > > > > On 2024/11/7 16:01, Jason Xing wrote: > > > > > On Thu, Nov 7, 2024 at 3:51 PM Philo Lu <lulie@linux.alibaba.com> wrote: > > > > >> > > > > >> Hi Jason, > > > > >> > > > > >> On 2024/11/5 10:55, Jason Xing wrote: > > > > >>> From: Jason Xing <kernelxing@tencent.com> > > > > >>> > > > > >>> We found there are rare chances that some RST packets appear during > > > > >>> the shakehands because the timewait socket cannot accept the SYN and > > > > >>> doesn't return TCP_TW_SYN in tcp_timewait_state_process(). > > > > >>> > > > > >>> Here is how things happen in production: > > > > >>> Time Client(A) Server(B) > > > > >>> 0s SYN--> > > > > >>> ... > > > > >>> 132s <-- FIN > > > > >>> ... > > > > >>> 169s FIN--> > > > > >>> 169s <-- ACK > > > > >>> 169s SYN--> > > > > >>> 169s <-- ACK > > > > >>> 169s RST--> > > > > >>> As above picture shows, the two flows have a start time difference > > > > >>> of 169 seconds. B starts to send FIN so it will finally enter into > > > > >>> TIMEWAIT state. Nearly at the same time A launches a new connection > > > > >>> that soon is reset by itself due to receiving a ACK. > > > > >>> > > > > >>> There are two key checks in tcp_timewait_state_process() when timewait > > > > >>> socket in B receives the SYN packet: > > > > >>> 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt) > > > > >>> 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0) > > > > >>> > > > > >>> Regarding the first rule, it fails as expected because in the first > > > > >>> connection the seq of SYN sent from A is 1892994276, then 169s have > > > > >>> passed, the second SYN has 239034613 (caused by overflow of s32). > > > > >>> > > > > >>> Then how about the second rule? > > > > >>> It fails again! > > > > >>> Let's take a look at how the tsval comes out: > > > > >>> __tcp_transmit_skb() > > > > >>> -> tcp_syn_options() > > > > >>> -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset; > > > > >>> The timestamp depends on two things, one is skb->skb_mstamp_ns, the > > > > >>> other is tp->tsoffset. The latter value is fixed, so we don't need > > > > >>> to care about it. If both operations (sending FIN and then starting > > > > >>> sending SYN) from A happen in 1ms, then the tsval would be the same. > > > > >>> It can be clearly seen in the tcpdump log. Notice that the tsval is > > > > >>> with millisecond precision. > > > > >>> > > > > >>> Based on the above analysis, I decided to make a small change to > > > > >>> the check in tcp_timewait_state_process() so that the second flow > > > > >>> would not fail. > > > > >>> > > > > >> > > > > >> I wonder what a bad result the RST causes. As far as I know, the client > > > > >> will not close the connect and return. Instead, it re-sends an SYN in > > > > >> TCP_TIMEOUT_MIN(2) jiffies (implemented in > > > > >> tcp_rcv_synsent_state_process). So the second connection could still be > > > > >> established successfully, at the cost of a bit more delay. Like: > > > > >> > > > > >> Time Client(A) Server(B) > > > > >> 0s SYN--> > > > > >> ... > > > > >> 132s <-- FIN > > > > >> ... > > > > >> 169s FIN--> > > > > >> 169s <-- ACK > > > > >> 169s SYN--> > > > > >> 169s <-- ACK > > > > >> 169s RST--> > > > > >> ~2jiffies SYN--> > > It doesn't happen. I dare to say the SYN will not be retransmitted > soon which I can tell from many tcpdump logs I captured. > > > > > >> <-- SYN,ACK > > > > > > > > > > That's exactly what I meant here :) Originally I didn't expect the > > > > > application to relaunch a connection in this case. > > > > > > > > s/application/kernel/, right? > > > > > > No. Perhaps I didn't make myself clear. If the kernel doesn't silently > > > drop the SYN and then send back an ACK, the application has to call > > > the connect() syscall again. > > > > My suggestion to stop the confusion: > > Oh, right now, I realized that Philo and I are not on the same page :( > Please forget that conversation. > > My points are: > 1) If B silently drops the SYN packet, A will retransmit an SYN packet > and then the connection will be established. It's what I tried to > propose and would like to see. It also adheres to the RFC 6191. > 2) As kuniyuki pointed out on the contrary, sending an ACK (like the > current implementation) instead of silently dropping the SYN packet is > actually a challenge ack. If so, I think we need to consider this ACK > as a challenge ack like what tcp_send_challenge_ack() does. Like where ? Again, a packetdrill test will clarify all of this. > > I'd like to hear more about your opinions on the above conclusion.
On Thu, Nov 7, 2024 at 5:17 PM Eric Dumazet <edumazet@google.com> wrote: > > On Thu, Nov 7, 2024 at 10:01 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > On Thu, Nov 7, 2024 at 4:37 PM Eric Dumazet <edumazet@google.com> wrote: > > > > > > On Thu, Nov 7, 2024 at 9:27 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > > > On Thu, Nov 7, 2024 at 4:22 PM Philo Lu <lulie@linux.alibaba.com> wrote: > > > > > > > > > > > > > > > > > > > > On 2024/11/7 16:01, Jason Xing wrote: > > > > > > On Thu, Nov 7, 2024 at 3:51 PM Philo Lu <lulie@linux.alibaba.com> wrote: > > > > > >> > > > > > >> Hi Jason, > > > > > >> > > > > > >> On 2024/11/5 10:55, Jason Xing wrote: > > > > > >>> From: Jason Xing <kernelxing@tencent.com> > > > > > >>> > > > > > >>> We found there are rare chances that some RST packets appear during > > > > > >>> the shakehands because the timewait socket cannot accept the SYN and > > > > > >>> doesn't return TCP_TW_SYN in tcp_timewait_state_process(). > > > > > >>> > > > > > >>> Here is how things happen in production: > > > > > >>> Time Client(A) Server(B) > > > > > >>> 0s SYN--> > > > > > >>> ... > > > > > >>> 132s <-- FIN > > > > > >>> ... > > > > > >>> 169s FIN--> > > > > > >>> 169s <-- ACK > > > > > >>> 169s SYN--> > > > > > >>> 169s <-- ACK > > > > > >>> 169s RST--> > > > > > >>> As above picture shows, the two flows have a start time difference > > > > > >>> of 169 seconds. B starts to send FIN so it will finally enter into > > > > > >>> TIMEWAIT state. Nearly at the same time A launches a new connection > > > > > >>> that soon is reset by itself due to receiving a ACK. > > > > > >>> > > > > > >>> There are two key checks in tcp_timewait_state_process() when timewait > > > > > >>> socket in B receives the SYN packet: > > > > > >>> 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt) > > > > > >>> 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0) > > > > > >>> > > > > > >>> Regarding the first rule, it fails as expected because in the first > > > > > >>> connection the seq of SYN sent from A is 1892994276, then 169s have > > > > > >>> passed, the second SYN has 239034613 (caused by overflow of s32). > > > > > >>> > > > > > >>> Then how about the second rule? > > > > > >>> It fails again! > > > > > >>> Let's take a look at how the tsval comes out: > > > > > >>> __tcp_transmit_skb() > > > > > >>> -> tcp_syn_options() > > > > > >>> -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset; > > > > > >>> The timestamp depends on two things, one is skb->skb_mstamp_ns, the > > > > > >>> other is tp->tsoffset. The latter value is fixed, so we don't need > > > > > >>> to care about it. If both operations (sending FIN and then starting > > > > > >>> sending SYN) from A happen in 1ms, then the tsval would be the same. > > > > > >>> It can be clearly seen in the tcpdump log. Notice that the tsval is > > > > > >>> with millisecond precision. > > > > > >>> > > > > > >>> Based on the above analysis, I decided to make a small change to > > > > > >>> the check in tcp_timewait_state_process() so that the second flow > > > > > >>> would not fail. > > > > > >>> > > > > > >> > > > > > >> I wonder what a bad result the RST causes. As far as I know, the client > > > > > >> will not close the connect and return. Instead, it re-sends an SYN in > > > > > >> TCP_TIMEOUT_MIN(2) jiffies (implemented in > > > > > >> tcp_rcv_synsent_state_process). So the second connection could still be > > > > > >> established successfully, at the cost of a bit more delay. Like: > > > > > >> > > > > > >> Time Client(A) Server(B) > > > > > >> 0s SYN--> > > > > > >> ... > > > > > >> 132s <-- FIN > > > > > >> ... > > > > > >> 169s FIN--> > > > > > >> 169s <-- ACK > > > > > >> 169s SYN--> > > > > > >> 169s <-- ACK > > > > > >> 169s RST--> > > > > > >> ~2jiffies SYN--> > > > > It doesn't happen. I dare to say the SYN will not be retransmitted > > soon which I can tell from many tcpdump logs I captured. > > > > > > > >> <-- SYN,ACK > > > > > > > > > > > > That's exactly what I meant here :) Originally I didn't expect the > > > > > > application to relaunch a connection in this case. > > > > > > > > > > s/application/kernel/, right? > > > > > > > > No. Perhaps I didn't make myself clear. If the kernel doesn't silently > > > > drop the SYN and then send back an ACK, the application has to call > > > > the connect() syscall again. > > > > > > My suggestion to stop the confusion: > > > > Oh, right now, I realized that Philo and I are not on the same page :( > > Please forget that conversation. > > > > My points are: > > 1) If B silently drops the SYN packet, A will retransmit an SYN packet > > and then the connection will be established. It's what I tried to > > propose and would like to see. It also adheres to the RFC 6191. > > 2) As kuniyuki pointed out on the contrary, sending an ACK (like the > > current implementation) instead of silently dropping the SYN packet is > > actually a challenge ack. If so, I think we need to consider this ACK > > as a challenge ack like what tcp_send_challenge_ack() does. > > Like where ? Again, a packetdrill test will clarify all of this. Well, okay, Let me try.
On Thu, Nov 7, 2024 at 5:00 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > On Thu, Nov 7, 2024 at 4:37 PM Eric Dumazet <edumazet@google.com> wrote: > > > > On Thu, Nov 7, 2024 at 9:27 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > On Thu, Nov 7, 2024 at 4:22 PM Philo Lu <lulie@linux.alibaba.com> wrote: > > > > > > > > > > > > > > > > On 2024/11/7 16:01, Jason Xing wrote: > > > > > On Thu, Nov 7, 2024 at 3:51 PM Philo Lu <lulie@linux.alibaba.com> wrote: > > > > >> > > > > >> Hi Jason, > > > > >> > > > > >> On 2024/11/5 10:55, Jason Xing wrote: > > > > >>> From: Jason Xing <kernelxing@tencent.com> > > > > >>> > > > > >>> We found there are rare chances that some RST packets appear during > > > > >>> the shakehands because the timewait socket cannot accept the SYN and > > > > >>> doesn't return TCP_TW_SYN in tcp_timewait_state_process(). > > > > >>> > > > > >>> Here is how things happen in production: > > > > >>> Time Client(A) Server(B) > > > > >>> 0s SYN--> > > > > >>> ... > > > > >>> 132s <-- FIN > > > > >>> ... > > > > >>> 169s FIN--> > > > > >>> 169s <-- ACK > > > > >>> 169s SYN--> > > > > >>> 169s <-- ACK > > > > >>> 169s RST--> > > > > >>> As above picture shows, the two flows have a start time difference > > > > >>> of 169 seconds. B starts to send FIN so it will finally enter into > > > > >>> TIMEWAIT state. Nearly at the same time A launches a new connection > > > > >>> that soon is reset by itself due to receiving a ACK. > > > > >>> > > > > >>> There are two key checks in tcp_timewait_state_process() when timewait > > > > >>> socket in B receives the SYN packet: > > > > >>> 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt) > > > > >>> 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0) > > > > >>> > > > > >>> Regarding the first rule, it fails as expected because in the first > > > > >>> connection the seq of SYN sent from A is 1892994276, then 169s have > > > > >>> passed, the second SYN has 239034613 (caused by overflow of s32). > > > > >>> > > > > >>> Then how about the second rule? > > > > >>> It fails again! > > > > >>> Let's take a look at how the tsval comes out: > > > > >>> __tcp_transmit_skb() > > > > >>> -> tcp_syn_options() > > > > >>> -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset; > > > > >>> The timestamp depends on two things, one is skb->skb_mstamp_ns, the > > > > >>> other is tp->tsoffset. The latter value is fixed, so we don't need > > > > >>> to care about it. If both operations (sending FIN and then starting > > > > >>> sending SYN) from A happen in 1ms, then the tsval would be the same. > > > > >>> It can be clearly seen in the tcpdump log. Notice that the tsval is > > > > >>> with millisecond precision. > > > > >>> > > > > >>> Based on the above analysis, I decided to make a small change to > > > > >>> the check in tcp_timewait_state_process() so that the second flow > > > > >>> would not fail. > > > > >>> > > > > >> > > > > >> I wonder what a bad result the RST causes. As far as I know, the client > > > > >> will not close the connect and return. Instead, it re-sends an SYN in > > > > >> TCP_TIMEOUT_MIN(2) jiffies (implemented in > > > > >> tcp_rcv_synsent_state_process). So the second connection could still be > > > > >> established successfully, at the cost of a bit more delay. Like: > > > > >> > > > > >> Time Client(A) Server(B) > > > > >> 0s SYN--> > > > > >> ... > > > > >> 132s <-- FIN > > > > >> ... > > > > >> 169s FIN--> > > > > >> 169s <-- ACK > > > > >> 169s SYN--> > > > > >> 169s <-- ACK > > > > >> 169s RST--> > > > > >> ~2jiffies SYN--> > > It doesn't happen. I dare to say the SYN will not be retransmitted > soon which I can tell from many tcpdump logs I captured. My mind went blank probably because of getting sick. Sorry. Why the tcpdump doesn't show the retransmitted SYN packet is our private kernel missed this following commit: commit 9603d47bad4642118fa19fd1562569663d9235f6 Author: SeongJae Park <sjpark@amazon.de> Date: Sun Feb 2 03:38:26 2020 +0000 tcp: Reduce SYN resend delay if a suspicous ACK is received When closing a connection, the two acks that required to change closing socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in reverse order. This is possible in RSS disabled environments such as a connection inside a host.
On Thu, Nov 7, 2024 at 10:30 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > On Thu, Nov 7, 2024 at 5:00 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > On Thu, Nov 7, 2024 at 4:37 PM Eric Dumazet <edumazet@google.com> wrote: > > > > > > On Thu, Nov 7, 2024 at 9:27 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > > > On Thu, Nov 7, 2024 at 4:22 PM Philo Lu <lulie@linux.alibaba.com> wrote: > > > > > > > > > > > > > > > > > > > > On 2024/11/7 16:01, Jason Xing wrote: > > > > > > On Thu, Nov 7, 2024 at 3:51 PM Philo Lu <lulie@linux.alibaba.com> wrote: > > > > > >> > > > > > >> Hi Jason, > > > > > >> > > > > > >> On 2024/11/5 10:55, Jason Xing wrote: > > > > > >>> From: Jason Xing <kernelxing@tencent.com> > > > > > >>> > > > > > >>> We found there are rare chances that some RST packets appear during > > > > > >>> the shakehands because the timewait socket cannot accept the SYN and > > > > > >>> doesn't return TCP_TW_SYN in tcp_timewait_state_process(). > > > > > >>> > > > > > >>> Here is how things happen in production: > > > > > >>> Time Client(A) Server(B) > > > > > >>> 0s SYN--> > > > > > >>> ... > > > > > >>> 132s <-- FIN > > > > > >>> ... > > > > > >>> 169s FIN--> > > > > > >>> 169s <-- ACK > > > > > >>> 169s SYN--> > > > > > >>> 169s <-- ACK > > > > > >>> 169s RST--> > > > > > >>> As above picture shows, the two flows have a start time difference > > > > > >>> of 169 seconds. B starts to send FIN so it will finally enter into > > > > > >>> TIMEWAIT state. Nearly at the same time A launches a new connection > > > > > >>> that soon is reset by itself due to receiving a ACK. > > > > > >>> > > > > > >>> There are two key checks in tcp_timewait_state_process() when timewait > > > > > >>> socket in B receives the SYN packet: > > > > > >>> 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt) > > > > > >>> 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0) > > > > > >>> > > > > > >>> Regarding the first rule, it fails as expected because in the first > > > > > >>> connection the seq of SYN sent from A is 1892994276, then 169s have > > > > > >>> passed, the second SYN has 239034613 (caused by overflow of s32). > > > > > >>> > > > > > >>> Then how about the second rule? > > > > > >>> It fails again! > > > > > >>> Let's take a look at how the tsval comes out: > > > > > >>> __tcp_transmit_skb() > > > > > >>> -> tcp_syn_options() > > > > > >>> -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset; > > > > > >>> The timestamp depends on two things, one is skb->skb_mstamp_ns, the > > > > > >>> other is tp->tsoffset. The latter value is fixed, so we don't need > > > > > >>> to care about it. If both operations (sending FIN and then starting > > > > > >>> sending SYN) from A happen in 1ms, then the tsval would be the same. > > > > > >>> It can be clearly seen in the tcpdump log. Notice that the tsval is > > > > > >>> with millisecond precision. > > > > > >>> > > > > > >>> Based on the above analysis, I decided to make a small change to > > > > > >>> the check in tcp_timewait_state_process() so that the second flow > > > > > >>> would not fail. > > > > > >>> > > > > > >> > > > > > >> I wonder what a bad result the RST causes. As far as I know, the client > > > > > >> will not close the connect and return. Instead, it re-sends an SYN in > > > > > >> TCP_TIMEOUT_MIN(2) jiffies (implemented in > > > > > >> tcp_rcv_synsent_state_process). So the second connection could still be > > > > > >> established successfully, at the cost of a bit more delay. Like: > > > > > >> > > > > > >> Time Client(A) Server(B) > > > > > >> 0s SYN--> > > > > > >> ... > > > > > >> 132s <-- FIN > > > > > >> ... > > > > > >> 169s FIN--> > > > > > >> 169s <-- ACK > > > > > >> 169s SYN--> > > > > > >> 169s <-- ACK > > > > > >> 169s RST--> > > > > > >> ~2jiffies SYN--> > > > > It doesn't happen. I dare to say the SYN will not be retransmitted > > soon which I can tell from many tcpdump logs I captured. > > My mind went blank probably because of getting sick. Sorry. Why the > tcpdump doesn't show the retransmitted SYN packet is our private > kernel missed this following commit: > > commit 9603d47bad4642118fa19fd1562569663d9235f6 > Author: SeongJae Park <sjpark@amazon.de> > Date: Sun Feb 2 03:38:26 2020 +0000 > > tcp: Reduce SYN resend delay if a suspicous ACK is received > > When closing a connection, the two acks that required to change closing > socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in > reverse order. This is possible in RSS disabled environments such as a > connection inside a host. Not having a patch from linux-5.6 does not really give a good impression, please next time make sure your patches are needed for upstream trees, net or net-next. This also means you can write a packetdrill test to show the benefit of this patch, and send it as a new selftests ;) Back to packetdrill, and everyone will be happy.
On Thu, Nov 7, 2024 at 10:45 AM Eric Dumazet <edumazet@google.com> wrote: > > On Thu, Nov 7, 2024 at 10:30 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > On Thu, Nov 7, 2024 at 5:00 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > On Thu, Nov 7, 2024 at 4:37 PM Eric Dumazet <edumazet@google.com> wrote: > > > > > > > > On Thu, Nov 7, 2024 at 9:27 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > > > > > On Thu, Nov 7, 2024 at 4:22 PM Philo Lu <lulie@linux.alibaba.com> wrote: > > > > > > > > > > > > > > > > > > > > > > > > On 2024/11/7 16:01, Jason Xing wrote: > > > > > > > On Thu, Nov 7, 2024 at 3:51 PM Philo Lu <lulie@linux.alibaba.com> wrote: > > > > > > >> > > > > > > >> Hi Jason, > > > > > > >> > > > > > > >> On 2024/11/5 10:55, Jason Xing wrote: > > > > > > >>> From: Jason Xing <kernelxing@tencent.com> > > > > > > >>> > > > > > > >>> We found there are rare chances that some RST packets appear during > > > > > > >>> the shakehands because the timewait socket cannot accept the SYN and > > > > > > >>> doesn't return TCP_TW_SYN in tcp_timewait_state_process(). > > > > > > >>> > > > > > > >>> Here is how things happen in production: > > > > > > >>> Time Client(A) Server(B) > > > > > > >>> 0s SYN--> > > > > > > >>> ... > > > > > > >>> 132s <-- FIN > > > > > > >>> ... > > > > > > >>> 169s FIN--> > > > > > > >>> 169s <-- ACK > > > > > > >>> 169s SYN--> > > > > > > >>> 169s <-- ACK > > > > > > >>> 169s RST--> > > > > > > >>> As above picture shows, the two flows have a start time difference > > > > > > >>> of 169 seconds. B starts to send FIN so it will finally enter into > > > > > > >>> TIMEWAIT state. Nearly at the same time A launches a new connection > > > > > > >>> that soon is reset by itself due to receiving a ACK. > > > > > > >>> > > > > > > >>> There are two key checks in tcp_timewait_state_process() when timewait > > > > > > >>> socket in B receives the SYN packet: > > > > > > >>> 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt) > > > > > > >>> 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0) > > > > > > >>> > > > > > > >>> Regarding the first rule, it fails as expected because in the first > > > > > > >>> connection the seq of SYN sent from A is 1892994276, then 169s have > > > > > > >>> passed, the second SYN has 239034613 (caused by overflow of s32). > > > > > > >>> > > > > > > >>> Then how about the second rule? > > > > > > >>> It fails again! > > > > > > >>> Let's take a look at how the tsval comes out: > > > > > > >>> __tcp_transmit_skb() > > > > > > >>> -> tcp_syn_options() > > > > > > >>> -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset; > > > > > > >>> The timestamp depends on two things, one is skb->skb_mstamp_ns, the > > > > > > >>> other is tp->tsoffset. The latter value is fixed, so we don't need > > > > > > >>> to care about it. If both operations (sending FIN and then starting > > > > > > >>> sending SYN) from A happen in 1ms, then the tsval would be the same. > > > > > > >>> It can be clearly seen in the tcpdump log. Notice that the tsval is > > > > > > >>> with millisecond precision. > > > > > > >>> > > > > > > >>> Based on the above analysis, I decided to make a small change to > > > > > > >>> the check in tcp_timewait_state_process() so that the second flow > > > > > > >>> would not fail. > > > > > > >>> > > > > > > >> > > > > > > >> I wonder what a bad result the RST causes. As far as I know, the client > > > > > > >> will not close the connect and return. Instead, it re-sends an SYN in > > > > > > >> TCP_TIMEOUT_MIN(2) jiffies (implemented in > > > > > > >> tcp_rcv_synsent_state_process). So the second connection could still be > > > > > > >> established successfully, at the cost of a bit more delay. Like: > > > > > > >> > > > > > > >> Time Client(A) Server(B) > > > > > > >> 0s SYN--> > > > > > > >> ... > > > > > > >> 132s <-- FIN > > > > > > >> ... > > > > > > >> 169s FIN--> > > > > > > >> 169s <-- ACK > > > > > > >> 169s SYN--> > > > > > > >> 169s <-- ACK > > > > > > >> 169s RST--> > > > > > > >> ~2jiffies SYN--> > > > > > > It doesn't happen. I dare to say the SYN will not be retransmitted > > > soon which I can tell from many tcpdump logs I captured. > > > > My mind went blank probably because of getting sick. Sorry. Why the > > tcpdump doesn't show the retransmitted SYN packet is our private > > kernel missed this following commit: > > > > commit 9603d47bad4642118fa19fd1562569663d9235f6 > > Author: SeongJae Park <sjpark@amazon.de> > > Date: Sun Feb 2 03:38:26 2020 +0000 > > > > tcp: Reduce SYN resend delay if a suspicous ACK is received > > > > When closing a connection, the two acks that required to change closing > > socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in > > reverse order. This is possible in RSS disabled environments such as a > > connection inside a host. > > Not having a patch from linux-5.6 does not really give a good impression, > please next time make sure your patches are needed for upstream trees, > net or net-next. > > This also means you can write a packetdrill test to show the benefit > of this patch, > and send it as a new selftests ;) > > Back to packetdrill, and everyone will be happy. Well, there is selftest already, not based on packetdrill because we had no packetdrill support yet in selftests commit af8c8a450bf4698a8a6a7c68956ea5ccafe4cc88 Author: SeongJae Park <sjpark@amazon.de> Date: Sun Feb 2 03:38:27 2020 +0000 selftests: net: Add FIN_ACK processing order related latency spike test
On Thu, Nov 7, 2024 at 5:45 PM Eric Dumazet <edumazet@google.com> wrote: > > On Thu, Nov 7, 2024 at 10:30 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > On Thu, Nov 7, 2024 at 5:00 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > On Thu, Nov 7, 2024 at 4:37 PM Eric Dumazet <edumazet@google.com> wrote: > > > > > > > > On Thu, Nov 7, 2024 at 9:27 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > > > > > > > On Thu, Nov 7, 2024 at 4:22 PM Philo Lu <lulie@linux.alibaba.com> wrote: > > > > > > > > > > > > > > > > > > > > > > > > On 2024/11/7 16:01, Jason Xing wrote: > > > > > > > On Thu, Nov 7, 2024 at 3:51 PM Philo Lu <lulie@linux.alibaba.com> wrote: > > > > > > >> > > > > > > >> Hi Jason, > > > > > > >> > > > > > > >> On 2024/11/5 10:55, Jason Xing wrote: > > > > > > >>> From: Jason Xing <kernelxing@tencent.com> > > > > > > >>> > > > > > > >>> We found there are rare chances that some RST packets appear during > > > > > > >>> the shakehands because the timewait socket cannot accept the SYN and > > > > > > >>> doesn't return TCP_TW_SYN in tcp_timewait_state_process(). > > > > > > >>> > > > > > > >>> Here is how things happen in production: > > > > > > >>> Time Client(A) Server(B) > > > > > > >>> 0s SYN--> > > > > > > >>> ... > > > > > > >>> 132s <-- FIN > > > > > > >>> ... > > > > > > >>> 169s FIN--> > > > > > > >>> 169s <-- ACK > > > > > > >>> 169s SYN--> > > > > > > >>> 169s <-- ACK > > > > > > >>> 169s RST--> > > > > > > >>> As above picture shows, the two flows have a start time difference > > > > > > >>> of 169 seconds. B starts to send FIN so it will finally enter into > > > > > > >>> TIMEWAIT state. Nearly at the same time A launches a new connection > > > > > > >>> that soon is reset by itself due to receiving a ACK. > > > > > > >>> > > > > > > >>> There are two key checks in tcp_timewait_state_process() when timewait > > > > > > >>> socket in B receives the SYN packet: > > > > > > >>> 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt) > > > > > > >>> 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0) > > > > > > >>> > > > > > > >>> Regarding the first rule, it fails as expected because in the first > > > > > > >>> connection the seq of SYN sent from A is 1892994276, then 169s have > > > > > > >>> passed, the second SYN has 239034613 (caused by overflow of s32). > > > > > > >>> > > > > > > >>> Then how about the second rule? > > > > > > >>> It fails again! > > > > > > >>> Let's take a look at how the tsval comes out: > > > > > > >>> __tcp_transmit_skb() > > > > > > >>> -> tcp_syn_options() > > > > > > >>> -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset; > > > > > > >>> The timestamp depends on two things, one is skb->skb_mstamp_ns, the > > > > > > >>> other is tp->tsoffset. The latter value is fixed, so we don't need > > > > > > >>> to care about it. If both operations (sending FIN and then starting > > > > > > >>> sending SYN) from A happen in 1ms, then the tsval would be the same. > > > > > > >>> It can be clearly seen in the tcpdump log. Notice that the tsval is > > > > > > >>> with millisecond precision. > > > > > > >>> > > > > > > >>> Based on the above analysis, I decided to make a small change to > > > > > > >>> the check in tcp_timewait_state_process() so that the second flow > > > > > > >>> would not fail. > > > > > > >>> > > > > > > >> > > > > > > >> I wonder what a bad result the RST causes. As far as I know, the client > > > > > > >> will not close the connect and return. Instead, it re-sends an SYN in > > > > > > >> TCP_TIMEOUT_MIN(2) jiffies (implemented in > > > > > > >> tcp_rcv_synsent_state_process). So the second connection could still be > > > > > > >> established successfully, at the cost of a bit more delay. Like: > > > > > > >> > > > > > > >> Time Client(A) Server(B) > > > > > > >> 0s SYN--> > > > > > > >> ... > > > > > > >> 132s <-- FIN > > > > > > >> ... > > > > > > >> 169s FIN--> > > > > > > >> 169s <-- ACK > > > > > > >> 169s SYN--> > > > > > > >> 169s <-- ACK > > > > > > >> 169s RST--> > > > > > > >> ~2jiffies SYN--> > > > > > > It doesn't happen. I dare to say the SYN will not be retransmitted > > > soon which I can tell from many tcpdump logs I captured. > > > > My mind went blank probably because of getting sick. Sorry. Why the > > tcpdump doesn't show the retransmitted SYN packet is our private > > kernel missed this following commit: > > > > commit 9603d47bad4642118fa19fd1562569663d9235f6 > > Author: SeongJae Park <sjpark@amazon.de> > > Date: Sun Feb 2 03:38:26 2020 +0000 > > > > tcp: Reduce SYN resend delay if a suspicous ACK is received > > > > When closing a connection, the two acks that required to change closing > > socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in > > reverse order. This is possible in RSS disabled environments such as a > > connection inside a host. > > Not having a patch from linux-5.6 does not really give a good impression, > please next time make sure your patches are needed for upstream trees, > net or net-next. Sorry about this. Normally, I found an issue on those older kernels and then I will test it on the net-next tree. The issue this patch tried to fix can be reproduced, as we all see. But today regarding the topic about re-sending an SYN quickly, I didn't conduct the test, only analyzing the code :( Sorry for the noise. > > This also means you can write a packetdrill test to show the benefit > of this patch, > and send it as a new selftests ;) > > Back to packetdrill, and everyone will be happy. Thanks for reminding me. It's really necessary for me to get started to learn how to write various packetdrill tests.
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index bb1fe1ba867a..2b29d1bf5ca0 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -234,7 +234,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb, if (th->syn && !th->rst && !th->ack && !paws_reject && (after(TCP_SKB_CB(skb)->seq, rcv_nxt) || (tmp_opt.saw_tstamp && - (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0))) { + (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) <= 0))) { u32 isn = tcptw->tw_snd_nxt + 65535 + 2; if (isn == 0) isn++;