Message ID | 1611311242-6675-1-git-send-email-yangpc@wangsu.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] tcp: fix TLP timer not set when CA_STATE changes from DISORDER to OPEN | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | fail | Series targets non-next tree, but doesn't contain any Fixes tags |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | warning | 2 maintainers not CCed: yoshfuji@linux-ipv6.org kuba@kernel.org |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 1498 this patch: 1498 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | warning | CHECK: extern prototypes should be avoided in .h files |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 1505 this patch: 1505 |
netdev/header_inline | success | Link |
netdev/stable | success | Stable not CCed |
On Fri, Jan 22, 2021 at 11:28 AM Pengcheng Yang <yangpc@wangsu.com> wrote: > > When CA_STATE is in DISORDER, the TLP timer is not set when receiving > an ACK (a cumulative ACK covered out-of-order data) causes CA_STATE to > change from DISORDER to OPEN. If the sender is app-limited, it can only > wait for the RTO timer to expire and retransmit. > > The reason for this is that the TLP timer is set before CA_STATE changes > in tcp_ack(), so we delay the time point of calling tcp_set_xmit_timer() > until after tcp_fastretrans_alert() returns and remove the > FLAG_SET_XMIT_TIMER from ack_flag when the RACK reorder timer is set. > > This commit has two additional benefits: > 1) Make sure to reset RTO according to RFC6298 when receiving ACK, to > avoid spurious RTO caused by RTO timer early expires. > 2) Reduce the xmit timer reschedule once per ACK when the RACK reorder > timer is set. > > Link: https://lore.kernel.org/netdev/1611139794-11254-1-git-send-email-yangpc@wangsu.com > Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> > Cc: Neal Cardwell <ncardwell@google.com> > --- This looks like a very nice patch, let me run packetdrill tests on it. By any chance, have you cooked a packetdrill test showing the issue (failing on unpatched kernel) ? Thanks. > include/net/tcp.h | 2 +- > net/ipv4/tcp_input.c | 10 ++++++---- > net/ipv4/tcp_recovery.c | 5 +++-- > 3 files changed, 10 insertions(+), 7 deletions(-) > > diff --git a/include/net/tcp.h b/include/net/tcp.h > index 78d13c8..67f7e52 100644 > --- a/include/net/tcp.h > +++ b/include/net/tcp.h > @@ -2060,7 +2060,7 @@ static inline __u32 cookie_init_sequence(const struct tcp_request_sock_ops *ops, > void tcp_newreno_mark_lost(struct sock *sk, bool snd_una_advanced); > extern s32 tcp_rack_skb_timeout(struct tcp_sock *tp, struct sk_buff *skb, > u32 reo_wnd); > -extern void tcp_rack_mark_lost(struct sock *sk); > +extern bool tcp_rack_mark_lost(struct sock *sk); > extern void tcp_rack_advance(struct tcp_sock *tp, u8 sacked, u32 end_seq, > u64 xmit_time); > extern void tcp_rack_reo_timeout(struct sock *sk); > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > index c7e16b0..d0a9588 100644 > --- a/net/ipv4/tcp_input.c > +++ b/net/ipv4/tcp_input.c > @@ -2859,7 +2859,8 @@ static void tcp_identify_packet_loss(struct sock *sk, int *ack_flag) > } else if (tcp_is_rack(sk)) { > u32 prior_retrans = tp->retrans_out; > > - tcp_rack_mark_lost(sk); > + if (tcp_rack_mark_lost(sk)) > + *ack_flag &= ~FLAG_SET_XMIT_TIMER; > if (prior_retrans > tp->retrans_out) > *ack_flag |= FLAG_LOST_RETRANS; > } > @@ -3815,9 +3816,6 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag) > > if (tp->tlp_high_seq) > tcp_process_tlp_ack(sk, ack, flag); > - /* If needed, reset TLP/RTO timer; RACK may later override this. */ > - if (flag & FLAG_SET_XMIT_TIMER) > - tcp_set_xmit_timer(sk); > > if (tcp_ack_is_dubious(sk, flag)) { > if (!(flag & (FLAG_SND_UNA_ADVANCED | FLAG_NOT_DUP))) { > @@ -3830,6 +3828,10 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag) > &rexmit); > } > > + /* If needed, reset TLP/RTO timer when RACK doesn't set. */ > + if (flag & FLAG_SET_XMIT_TIMER) > + tcp_set_xmit_timer(sk); > + > if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP)) > sk_dst_confirm(sk); > > diff --git a/net/ipv4/tcp_recovery.c b/net/ipv4/tcp_recovery.c > index 177307a..6f1b4ac 100644 > --- a/net/ipv4/tcp_recovery.c > +++ b/net/ipv4/tcp_recovery.c > @@ -96,13 +96,13 @@ static void tcp_rack_detect_loss(struct sock *sk, u32 *reo_timeout) > } > } > > -void tcp_rack_mark_lost(struct sock *sk) > +bool tcp_rack_mark_lost(struct sock *sk) > { > struct tcp_sock *tp = tcp_sk(sk); > u32 timeout; > > if (!tp->rack.advanced) > - return; > + return false; > > /* Reset the advanced flag to avoid unnecessary queue scanning */ > tp->rack.advanced = 0; > @@ -112,6 +112,7 @@ void tcp_rack_mark_lost(struct sock *sk) > inet_csk_reset_xmit_timer(sk, ICSK_TIME_REO_TIMEOUT, > timeout, inet_csk(sk)->icsk_rto); > } > + return !!timeout; > } > > /* Record the most recently (re)sent time among the (s)acked packets > -- > 1.8.3.1 >
On Fri, Jan 22, 2021 at 5:53 AM Eric Dumazet <edumazet@google.com> wrote: > > On Fri, Jan 22, 2021 at 11:28 AM Pengcheng Yang <yangpc@wangsu.com> wrote: > > > > When CA_STATE is in DISORDER, the TLP timer is not set when receiving > > an ACK (a cumulative ACK covered out-of-order data) causes CA_STATE to > > change from DISORDER to OPEN. If the sender is app-limited, it can only > > wait for the RTO timer to expire and retransmit. > > > > The reason for this is that the TLP timer is set before CA_STATE changes > > in tcp_ack(), so we delay the time point of calling tcp_set_xmit_timer() > > until after tcp_fastretrans_alert() returns and remove the > > FLAG_SET_XMIT_TIMER from ack_flag when the RACK reorder timer is set. > > > > This commit has two additional benefits: > > 1) Make sure to reset RTO according to RFC6298 when receiving ACK, to > > avoid spurious RTO caused by RTO timer early expires. > > 2) Reduce the xmit timer reschedule once per ACK when the RACK reorder > > timer is set. > > > > Link: https://lore.kernel.org/netdev/1611139794-11254-1-git-send-email-yangpc@wangsu.com > > Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> > > Cc: Neal Cardwell <ncardwell@google.com> > > --- > > This looks like a very nice patch, let me run packetdrill tests on it. > > By any chance, have you cooked a packetdrill test showing the issue > (failing on unpatched kernel) ? Thanks, Pengcheng. This patch looks good to me as well, assuming it passes our packetdrill tests. I agree with Eric that it would be good to have an explicit packetdrill test for this case. neal
On Fri, Jan 22, 2021 at 6:37 AM Neal Cardwell <ncardwell@google.com> wrote: > > On Fri, Jan 22, 2021 at 5:53 AM Eric Dumazet <edumazet@google.com> wrote: > > > > On Fri, Jan 22, 2021 at 11:28 AM Pengcheng Yang <yangpc@wangsu.com> wrote: > > > > > > When CA_STATE is in DISORDER, the TLP timer is not set when receiving > > > an ACK (a cumulative ACK covered out-of-order data) causes CA_STATE to > > > change from DISORDER to OPEN. If the sender is app-limited, it can only Could you point which line of code causes the state to flip incorrectly due to the TLP timer setting? > > > wait for the RTO timer to expire and retransmit. > > > > > > The reason for this is that the TLP timer is set before CA_STATE changes > > > in tcp_ack(), so we delay the time point of calling tcp_set_xmit_timer() > > > until after tcp_fastretrans_alert() returns and remove the > > > FLAG_SET_XMIT_TIMER from ack_flag when the RACK reorder timer is set. > > > > > > This commit has two additional benefits: > > > 1) Make sure to reset RTO according to RFC6298 when receiving ACK, to > > > avoid spurious RTO caused by RTO timer early expires. > > > 2) Reduce the xmit timer reschedule once per ACK when the RACK reorder > > > timer is set. > > > > > > Link: https://lore.kernel.org/netdev/1611139794-11254-1-git-send-email-yangpc@wangsu.com > > > Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> > > > Cc: Neal Cardwell <ncardwell@google.com> > > > --- > > > > This looks like a very nice patch, let me run packetdrill tests on it. > > > > By any chance, have you cooked a packetdrill test showing the issue > > (failing on unpatched kernel) ? > > Thanks, Pengcheng. This patch looks good to me as well, assuming it > passes our packetdrill tests. I agree with Eric that it would be good > to have an explicit packetdrill test for this case. > > neal
On Fri, 22 Jan 2021 11:53:46 +0100 Eric Dumazet wrote: > On Fri, Jan 22, 2021 at 11:28 AM Pengcheng Yang <yangpc@wangsu.com> wrote: > > > > When CA_STATE is in DISORDER, the TLP timer is not set when receiving > > an ACK (a cumulative ACK covered out-of-order data) causes CA_STATE to > > change from DISORDER to OPEN. If the sender is app-limited, it can only > > wait for the RTO timer to expire and retransmit. > > > > The reason for this is that the TLP timer is set before CA_STATE changes > > in tcp_ack(), so we delay the time point of calling tcp_set_xmit_timer() > > until after tcp_fastretrans_alert() returns and remove the > > FLAG_SET_XMIT_TIMER from ack_flag when the RACK reorder timer is set. > > > > This commit has two additional benefits: > > 1) Make sure to reset RTO according to RFC6298 when receiving ACK, to > > avoid spurious RTO caused by RTO timer early expires. > > 2) Reduce the xmit timer reschedule once per ACK when the RACK reorder > > timer is set. > > > > Link: https://lore.kernel.org/netdev/1611139794-11254-1-git-send-email-yangpc@wangsu.com > > Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> > > Cc: Neal Cardwell <ncardwell@google.com> > > This looks like a very nice patch, let me run packetdrill tests on it. > > By any chance, have you cooked a packetdrill test showing the issue > (failing on unpatched kernel) ? Any guidance on backporting / fixes tag? (once the packetdrill questions are satisfied)
On Sat, Jan 23, 2021 at 5:02 AM "Yuchung Cheng" <ycheng@google.com> wrote: > > On Fri, Jan 22, 2021 at 6:37 AM Neal Cardwell <ncardwell@google.com> wrote: > > > > On Fri, Jan 22, 2021 at 5:53 AM Eric Dumazet <edumazet@google.com> wrote: > > > > > > On Fri, Jan 22, 2021 at 11:28 AM Pengcheng Yang <yangpc@wangsu.com> wrote: > > > > > > > > When CA_STATE is in DISORDER, the TLP timer is not set when receiving > > > > an ACK (a cumulative ACK covered out-of-order data) causes CA_STATE to > > > > change from DISORDER to OPEN. If the sender is app-limited, it can only > > Could you point which line of code causes the state to flip > incorrectly due to the TLP timer setting? > I mean TLP timer is not set due to receiving an ACK that changes CA_STATE from DISORDER to OPEN. Receive an ACK covered out-of-order data in disorder state: tcp_ack() |-tcp_set_xmit_timer() // RTO timer is set instead of TLP timer | ... |-tcp_fastretrans_alert() // change from disorder to open > > > > wait for the RTO timer to expire and retransmit. > > > > > > > > The reason for this is that the TLP timer is set before CA_STATE changes > > > > in tcp_ack(), so we delay the time point of calling tcp_set_xmit_timer() > > > > until after tcp_fastretrans_alert() returns and remove the > > > > FLAG_SET_XMIT_TIMER from ack_flag when the RACK reorder timer is set. > > > > > > > > This commit has two additional benefits: > > > > 1) Make sure to reset RTO according to RFC6298 when receiving ACK, to > > > > avoid spurious RTO caused by RTO timer early expires. > > > > 2) Reduce the xmit timer reschedule once per ACK when the RACK reorder > > > > timer is set. > > > > > > > > Link: https://lore.kernel.org/netdev/1611139794-11254-1-git-send-email-yangpc@wangsu.com > > > > Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> > > > > Cc: Neal Cardwell <ncardwell@google.com> > > > > --- > > > > > > This looks like a very nice patch, let me run packetdrill tests on it. > > > > > > By any chance, have you cooked a packetdrill test showing the issue > > > (failing on unpatched kernel) ? > > > > Thanks, Pengcheng. This patch looks good to me as well, assuming it > > passes our packetdrill tests. I agree with Eric that it would be good > > to have an explicit packetdrill test for this case. > > > > neal Here is a packetdrill test case: // Enable TLP 0 `sysctl -q net.ipv4.tcp_early_retrans=3` // Establish a connection +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 // RTT 100ms, RTO 300ms +.1 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> +0 > S. 0:0(0) ack 1 <...> +.1 < . 1:1(0) ack 1 win 257 +0 accept(3, ..., ...) = 4 // Send 4 data segments +0 write(4, ..., 4000) = 4000 +0 > P. 1:4001(4000) ack 1 // out-of-order: ca_state turns to disorder +.1 < . 1:1(0) ack 1 win 257 <sack 1001:2001,nop,nop> // ACK covered out-of-order data: ca_state turns to open, // but RTO timer is set instead of TLP timer and the RTO // timer will expire at rtx_head_time+RTO (in 200ms). +0 < . 1:1(0) ack 2001 win 257 // Expect to send TLP packet in 2*rtt (200ms) +.2~+.25 > P. 3001:4001(1000) ack 1 I ran this packetdrill test case on the kernel without the patch applied: tlp_timer_unset.pkt:31: error handling packet: live packet field tcp_seq: expected: 3001 (0xbb9) vs actual: 2001 (0x7d1) script packet: 0.644587 P. 3001:4001(1000) ack 1 actual packet: 0.646197 . 2001:3001(1000) ack 1 win 502
On Sat, Jan 23, 2021 at 9:27 AM "Jakub Kicinski" <kuba@kernel.org> wrote: > > On Fri, 22 Jan 2021 11:53:46 +0100 Eric Dumazet wrote: > > On Fri, Jan 22, 2021 at 11:28 AM Pengcheng Yang <yangpc@wangsu.com> wrote: > > > > > > When CA_STATE is in DISORDER, the TLP timer is not set when receiving > > > an ACK (a cumulative ACK covered out-of-order data) causes CA_STATE to > > > change from DISORDER to OPEN. If the sender is app-limited, it can only > > > wait for the RTO timer to expire and retransmit. > > > > > > The reason for this is that the TLP timer is set before CA_STATE changes > > > in tcp_ack(), so we delay the time point of calling tcp_set_xmit_timer() > > > until after tcp_fastretrans_alert() returns and remove the > > > FLAG_SET_XMIT_TIMER from ack_flag when the RACK reorder timer is set. > > > > > > This commit has two additional benefits: > > > 1) Make sure to reset RTO according to RFC6298 when receiving ACK, to > > > avoid spurious RTO caused by RTO timer early expires. > > > 2) Reduce the xmit timer reschedule once per ACK when the RACK reorder > > > timer is set. > > > > > > Link: https://lore.kernel.org/netdev/1611139794-11254-1-git-send-email-yangpc@wangsu.com > > > Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> > > > Cc: Neal Cardwell <ncardwell@google.com> > > > > This looks like a very nice patch, let me run packetdrill tests on it. > > > > By any chance, have you cooked a packetdrill test showing the issue > > (failing on unpatched kernel) ? > > Any guidance on backporting / fixes tag? (once the packetdrill > questions are satisfied) By reading the commits, we can add: Fixes: df92c8394e6e ("tcp: fix xmit timer to only be reset if data ACKed/SACKed")
On Sat, Jan 23, 2021 at 9:15 AM Pengcheng Yang <yangpc@wangsu.com> wrote: > > On Sat, Jan 23, 2021 at 9:27 AM "Jakub Kicinski" <kuba@kernel.org> wrote: > > > > On Fri, 22 Jan 2021 11:53:46 +0100 Eric Dumazet wrote: > > > On Fri, Jan 22, 2021 at 11:28 AM Pengcheng Yang <yangpc@wangsu.com> wrote: > > > > > > > > When CA_STATE is in DISORDER, the TLP timer is not set when receiving > > > > an ACK (a cumulative ACK covered out-of-order data) causes CA_STATE to > > > > change from DISORDER to OPEN. If the sender is app-limited, it can only > > > > wait for the RTO timer to expire and retransmit. > > > > > > > > The reason for this is that the TLP timer is set before CA_STATE changes > > > > in tcp_ack(), so we delay the time point of calling tcp_set_xmit_timer() > > > > until after tcp_fastretrans_alert() returns and remove the > > > > FLAG_SET_XMIT_TIMER from ack_flag when the RACK reorder timer is set. > > > > > > > > This commit has two additional benefits: > > > > 1) Make sure to reset RTO according to RFC6298 when receiving ACK, to > > > > avoid spurious RTO caused by RTO timer early expires. > > > > 2) Reduce the xmit timer reschedule once per ACK when the RACK reorder > > > > timer is set. > > > > > > > > Link: https://lore.kernel.org/netdev/1611139794-11254-1-git-send-email-yangpc@wangsu.com > > > > Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> > > > > Cc: Neal Cardwell <ncardwell@google.com> > > > > > > This looks like a very nice patch, let me run packetdrill tests on it. > > > > > > By any chance, have you cooked a packetdrill test showing the issue > > > (failing on unpatched kernel) ? > > > > Any guidance on backporting / fixes tag? (once the packetdrill > > questions are satisfied) > > By reading the commits, we can add: > Fixes: df92c8394e6e ("tcp: fix xmit timer to only be reset if data ACKed/SACKed") Thanks for the fix and the packetdrill test! Eric ran our internal packetdrill test suite and all the changes in behavior with your patch all look like fixes to me. Acked-by: Neal Cardwell <ncardwell@google.com> The Fixes: tag you suggest also looks good to me. (It seems like patchwork did not pick it up from your email and add it to the commit message automatically, BTW?) neal
On Sat, Jan 23, 2021 at 5:27 AM Pengcheng Yang <yangpc@wangsu.com> wrote: > > On Sat, Jan 23, 2021 at 5:02 AM "Yuchung Cheng" <ycheng@google.com> wrote: > > > > On Fri, Jan 22, 2021 at 6:37 AM Neal Cardwell <ncardwell@google.com> wrote: > > > > > > On Fri, Jan 22, 2021 at 5:53 AM Eric Dumazet <edumazet@google.com> wrote: > > > > > > > > On Fri, Jan 22, 2021 at 11:28 AM Pengcheng Yang <yangpc@wangsu.com> wrote: > > > > > > > > > > When CA_STATE is in DISORDER, the TLP timer is not set when receiving > > > > > an ACK (a cumulative ACK covered out-of-order data) causes CA_STATE to > > > > > change from DISORDER to OPEN. If the sender is app-limited, it can only > > > > Could you point which line of code causes the state to flip > > incorrectly due to the TLP timer setting? > > > > I mean TLP timer is not set due to receiving an ACK that changes CA_STATE > from DISORDER to OPEN. Acked-by: Yuchung Cheng <ycheng@google.com> Thanks for the clarification and the fix. The initial cmsg reads like the timer setting is causing state flips. I'd suggest this commit message in your next rev. "Upon receiving a cumulative ACK that changes the congestion state from Disorder to Open, the TLP timer is not set. If the sender is app-limited, it can only wait for the RTO timer to expire and retransmit. The reason for this ... " > > Receive an ACK covered out-of-order data in disorder state: > > tcp_ack() > |-tcp_set_xmit_timer() // RTO timer is set instead of TLP timer > | ... > |-tcp_fastretrans_alert() // change from disorder to open > > > > > > wait for the RTO timer to expire and retransmit. > > > > > > > > > > The reason for this is that the TLP timer is set before CA_STATE changes > > > > > in tcp_ack(), so we delay the time point of calling tcp_set_xmit_timer() > > > > > until after tcp_fastretrans_alert() returns and remove the > > > > > FLAG_SET_XMIT_TIMER from ack_flag when the RACK reorder timer is set. > > > > > > > > > > This commit has two additional benefits: > > > > > 1) Make sure to reset RTO according to RFC6298 when receiving ACK, to > > > > > avoid spurious RTO caused by RTO timer early expires. > > > > > 2) Reduce the xmit timer reschedule once per ACK when the RACK reorder > > > > > timer is set. > > > > > > > > > > Link: https://lore.kernel.org/netdev/1611139794-11254-1-git-send-email-yangpc@wangsu.com > > > > > Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> > > > > > Cc: Neal Cardwell <ncardwell@google.com> > > > > > --- > > > > > > > > This looks like a very nice patch, let me run packetdrill tests on it. > > > > > > > > By any chance, have you cooked a packetdrill test showing the issue > > > > (failing on unpatched kernel) ? > > > > > > Thanks, Pengcheng. This patch looks good to me as well, assuming it > > > passes our packetdrill tests. I agree with Eric that it would be good > > > to have an explicit packetdrill test for this case. > > > > > > neal > > Here is a packetdrill test case: > > // Enable TLP > 0 `sysctl -q net.ipv4.tcp_early_retrans=3` > > // Establish a connection > +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 > +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 > +0 bind(3, ..., ...) = 0 > +0 listen(3, 1) = 0 > > // RTT 100ms, RTO 300ms > +.1 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> > +0 > S. 0:0(0) ack 1 <...> > +.1 < . 1:1(0) ack 1 win 257 > +0 accept(3, ..., ...) = 4 > > // Send 4 data segments > +0 write(4, ..., 4000) = 4000 > +0 > P. 1:4001(4000) ack 1 > > // out-of-order: ca_state turns to disorder > +.1 < . 1:1(0) ack 1 win 257 <sack 1001:2001,nop,nop> > > // ACK covered out-of-order data: ca_state turns to open, > // but RTO timer is set instead of TLP timer and the RTO > // timer will expire at rtx_head_time+RTO (in 200ms). > +0 < . 1:1(0) ack 2001 win 257 > > // Expect to send TLP packet in 2*rtt (200ms) > +.2~+.25 > P. 3001:4001(1000) ack 1 > > > I ran this packetdrill test case on the kernel without > the patch applied: > > tlp_timer_unset.pkt:31: error handling packet: live packet > field tcp_seq: expected: 3001 (0xbb9) vs actual: 2001 (0x7d1) > script packet: 0.644587 P. 3001:4001(1000) ack 1 > actual packet: 0.646197 . 2001:3001(1000) ack 1 win 502 >
On Sat, 23 Jan 2021 13:25:23 -0500 Neal Cardwell wrote: > On Sat, Jan 23, 2021 at 9:15 AM Pengcheng Yang <yangpc@wangsu.com> wrote: > > > > On Sat, Jan 23, 2021 at 9:27 AM "Jakub Kicinski" <kuba@kernel.org> wrote: > > > > > > On Fri, 22 Jan 2021 11:53:46 +0100 Eric Dumazet wrote: > > > > On Fri, Jan 22, 2021 at 11:28 AM Pengcheng Yang <yangpc@wangsu.com> wrote: > > > > > > > > > > When CA_STATE is in DISORDER, the TLP timer is not set when receiving > > > > > an ACK (a cumulative ACK covered out-of-order data) causes CA_STATE to > > > > > change from DISORDER to OPEN. If the sender is app-limited, it can only > > > > > wait for the RTO timer to expire and retransmit. > > > > > > > > > > The reason for this is that the TLP timer is set before CA_STATE changes > > > > > in tcp_ack(), so we delay the time point of calling tcp_set_xmit_timer() > > > > > until after tcp_fastretrans_alert() returns and remove the > > > > > FLAG_SET_XMIT_TIMER from ack_flag when the RACK reorder timer is set. > > > > > > > > > > This commit has two additional benefits: > > > > > 1) Make sure to reset RTO according to RFC6298 when receiving ACK, to > > > > > avoid spurious RTO caused by RTO timer early expires. > > > > > 2) Reduce the xmit timer reschedule once per ACK when the RACK reorder > > > > > timer is set. > > > > > > > > > > Link: https://lore.kernel.org/netdev/1611139794-11254-1-git-send-email-yangpc@wangsu.com > > > > > Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> > > > > > Cc: Neal Cardwell <ncardwell@google.com> > > > > > > > > This looks like a very nice patch, let me run packetdrill tests on it. > > > > > > > > By any chance, have you cooked a packetdrill test showing the issue > > > > (failing on unpatched kernel) ? > > > > > > Any guidance on backporting / fixes tag? (once the packetdrill > > > questions are satisfied) > > > > By reading the commits, we can add: > > Fixes: df92c8394e6e ("tcp: fix xmit timer to only be reset if data ACKed/SACKed") > > Thanks for the fix and the packetdrill test! > > Eric ran our internal packetdrill test suite and all the changes in > behavior with your patch all look like fixes to me. > > Acked-by: Neal Cardwell <ncardwell@google.com> Thanks! > The Fixes: tag you suggest also looks good to me. (It seems like > patchwork did not pick it up from your email and add it to the commit > message automatically, BTW?) Indeed, we haven't worked that out yet in patchwork.kernel.org, but it's no big deal, normally we can add manually. In this case seems like Yuchung asked for a small rewrite to the commit message so I'd appreciate a full repost.
diff --git a/include/net/tcp.h b/include/net/tcp.h index 78d13c8..67f7e52 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -2060,7 +2060,7 @@ static inline __u32 cookie_init_sequence(const struct tcp_request_sock_ops *ops, void tcp_newreno_mark_lost(struct sock *sk, bool snd_una_advanced); extern s32 tcp_rack_skb_timeout(struct tcp_sock *tp, struct sk_buff *skb, u32 reo_wnd); -extern void tcp_rack_mark_lost(struct sock *sk); +extern bool tcp_rack_mark_lost(struct sock *sk); extern void tcp_rack_advance(struct tcp_sock *tp, u8 sacked, u32 end_seq, u64 xmit_time); extern void tcp_rack_reo_timeout(struct sock *sk); diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index c7e16b0..d0a9588 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2859,7 +2859,8 @@ static void tcp_identify_packet_loss(struct sock *sk, int *ack_flag) } else if (tcp_is_rack(sk)) { u32 prior_retrans = tp->retrans_out; - tcp_rack_mark_lost(sk); + if (tcp_rack_mark_lost(sk)) + *ack_flag &= ~FLAG_SET_XMIT_TIMER; if (prior_retrans > tp->retrans_out) *ack_flag |= FLAG_LOST_RETRANS; } @@ -3815,9 +3816,6 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag) if (tp->tlp_high_seq) tcp_process_tlp_ack(sk, ack, flag); - /* If needed, reset TLP/RTO timer; RACK may later override this. */ - if (flag & FLAG_SET_XMIT_TIMER) - tcp_set_xmit_timer(sk); if (tcp_ack_is_dubious(sk, flag)) { if (!(flag & (FLAG_SND_UNA_ADVANCED | FLAG_NOT_DUP))) { @@ -3830,6 +3828,10 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag) &rexmit); } + /* If needed, reset TLP/RTO timer when RACK doesn't set. */ + if (flag & FLAG_SET_XMIT_TIMER) + tcp_set_xmit_timer(sk); + if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP)) sk_dst_confirm(sk); diff --git a/net/ipv4/tcp_recovery.c b/net/ipv4/tcp_recovery.c index 177307a..6f1b4ac 100644 --- a/net/ipv4/tcp_recovery.c +++ b/net/ipv4/tcp_recovery.c @@ -96,13 +96,13 @@ static void tcp_rack_detect_loss(struct sock *sk, u32 *reo_timeout) } } -void tcp_rack_mark_lost(struct sock *sk) +bool tcp_rack_mark_lost(struct sock *sk) { struct tcp_sock *tp = tcp_sk(sk); u32 timeout; if (!tp->rack.advanced) - return; + return false; /* Reset the advanced flag to avoid unnecessary queue scanning */ tp->rack.advanced = 0; @@ -112,6 +112,7 @@ void tcp_rack_mark_lost(struct sock *sk) inet_csk_reset_xmit_timer(sk, ICSK_TIME_REO_TIMEOUT, timeout, inet_csk(sk)->icsk_rto); } + return !!timeout; } /* Record the most recently (re)sent time among the (s)acked packets
When CA_STATE is in DISORDER, the TLP timer is not set when receiving an ACK (a cumulative ACK covered out-of-order data) causes CA_STATE to change from DISORDER to OPEN. If the sender is app-limited, it can only wait for the RTO timer to expire and retransmit. The reason for this is that the TLP timer is set before CA_STATE changes in tcp_ack(), so we delay the time point of calling tcp_set_xmit_timer() until after tcp_fastretrans_alert() returns and remove the FLAG_SET_XMIT_TIMER from ack_flag when the RACK reorder timer is set. This commit has two additional benefits: 1) Make sure to reset RTO according to RFC6298 when receiving ACK, to avoid spurious RTO caused by RTO timer early expires. 2) Reduce the xmit timer reschedule once per ACK when the RACK reorder timer is set. Link: https://lore.kernel.org/netdev/1611139794-11254-1-git-send-email-yangpc@wangsu.com Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Cc: Neal Cardwell <ncardwell@google.com> --- include/net/tcp.h | 2 +- net/ipv4/tcp_input.c | 10 ++++++---- net/ipv4/tcp_recovery.c | 5 +++-- 3 files changed, 10 insertions(+), 7 deletions(-)