Message ID | 20240716-upstream-net-next-20240716-tcp-3rd-ack-consume-sk_socket-v1-1-4e61d0b79233@kernel.org (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] tcp: process the 3rd ACK with sk_socket for for TFO/MPTCP | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Clearly marked for net |
netdev/apply | fail | Patch does not apply to net-0 |
On Tue, Jul 16, 2024 at 12:43 PM Matthieu Baerts (NGI0) <matttbe@kernel.org> wrote: > > The 'Fixes' commit recently changed the behaviour of TCP by skipping the > processing of the 3rd ACK when a sk->sk_socket is set. The goal was to > skip tcp_ack_snd_check() in tcp_rcv_state_process() not to send an > unnecessary ACK in case of simultaneous connect(). Unfortunately, that > had an impact on TFO and MPTCP. > > I started to look at the impact on MPTCP, because the MPTCP CI found > some issues with the MPTCP Packetdrill tests [1]. Then Paolo suggested > me to look at the impact on TFO with "plain" TCP. > > For MPTCP, when receiving the 3rd ACK of a request adding a new path > (MP_JOIN), sk->sk_socket will be set, and point to the MPTCP sock that > has been created when the MPTCP connection got established before with > the first path. The newly added 'goto' will then skip the processing of > the segment text (step 7) and not go through tcp_data_queue() where the > MPTCP options are validated, and some actions are triggered, e.g. > sending the MPJ 4th ACK [2] as demonstrated by the new errors when > running a packetdrill test [3] establishing a second subflow. > > This doesn't fully break MPTCP, mainly the 4th MPJ ACK that will be > delayed. Still, we don't want to have this behaviour as it delays the > switch to the fully established mode, and invalid MPTCP options in this > 3rd ACK will not be caught any more. This modification also affects the > MPTCP + TFO feature as well, and being the reason why the selftests > started to be unstable the last few days [4]. > > For TFO, the existing 'basic-cookie-not-reqd' test [5] was no longer > passing: if the 3rd ACK contains data, these data would no longer be > processed, and thus not ACKed. > > Note that for MPTCP, in case of simultaneous connect(), a fallback to > TCP will be done, which seems fine: > > `../common/defaults.sh` > > 0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_MPTCP) = 3 > +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress) > > +0 > S 0:0(0) <mss 1460, sackOK, TS val 100 ecr 0, nop, wscale 8, mpcapable v1 flags[flag_h] nokey> > +0 < S 0:0(0) win 1000 <mss 1460, sackOK, TS val 407 ecr 0, nop, wscale 8, mpcapable v1 flags[flag_h] nokey> > +0 > S. 0:0(0) ack 1 <mss 1460, sackOK, TS val 330 ecr 0, nop, wscale 8, mpcapable v1 flags[flag_h] nokey> > +0 < S. 0:0(0) ack 1 win 65535 <mss 1460, sackOK, TS val 700 ecr 100, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey=2]> > > +0 write(3, ..., 100) = 100 > +0 > . 1:1(0) ack 1 <nop, nop, TS val 845707014 ecr 700, nop, nop, sack 0:1> > +0 > P. 1:101(100) ack 1 <nop, nop, TS val 845958933 ecr 700> > > Link: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/9936227696 [1] > Link: https://datatracker.ietf.org/doc/html/rfc8684#fig_tokens [2] > Link: https://github.com/multipath-tcp/packetdrill/blob/mptcp-net-next/gtests/net/mptcp/syscalls/accept.pkt#L28 [3] > Link: https://netdev.bots.linux.dev/contest.html?executor=vmksft-mptcp-dbg&test=mptcp-connect-sh [4] > Link: https://github.com/google/packetdrill/blob/master/gtests/net/tcp/fastopen/server/basic-cookie-not-reqd.pkt#L21 [5] > Fixes: 23e89e8ee7be ("tcp: Don't drop SYN+ACK for simultaneous connect().") > Suggested-by: Paolo Abeni <pabeni@redhat.com> > Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> > --- > Notes: > - We could also drop this 'goto consume', and send the unnecessary ACK > in this simultaneous connect case, which doesn't seem to be a "real" > case, more something for fuzzers. > - When sending this patch, the 'Fixes' commit is only in net-next, this > patch is then on top of net-next. But because net-next will be merged > into -net soon -- judging by the PR that has been sent to Linus a few > hours ago -- the 'net' prefix is then used. > --- > net/ipv4/tcp_input.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > index ff9ab3d01ced..a89b3ee57d8c 100644 > --- a/net/ipv4/tcp_input.c > +++ b/net/ipv4/tcp_input.c > @@ -6820,7 +6820,13 @@ tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb) > if (sk->sk_shutdown & SEND_SHUTDOWN) > tcp_shutdown(sk, SEND_SHUTDOWN); > > - if (sk->sk_socket) > + /* In simult-connect cases, sk_socket will be assigned. But also > + * with TFO and MPTCP (MPJ) while they required further > + * processing later in tcp_data_queue(). > + */ > + if (sk->sk_socket && > + TCP_SKB_CB(skb)->seq == TCP_SKB_CB(skb)->end_seq && > + !sk_is_mptcp(sk)) > goto consume; > break; > Hi Matthieu I had no time yet to run all our packetdrill tests with Kuniyuki patch because of the ongoing netdev conference. Is it ok for you if we hold your patch for about 5 days ? I would like to make sure we did not miss anything else. I am CCing Neal, perhaps he can help to expedite the testing part while I am busy. Thanks !
Hi Eric, On 17/07/2024 16:57, Eric Dumazet wrote: > On Tue, Jul 16, 2024 at 12:43 PM Matthieu Baerts (NGI0) > <matttbe@kernel.org> wrote: >> >> The 'Fixes' commit recently changed the behaviour of TCP by skipping the >> processing of the 3rd ACK when a sk->sk_socket is set. The goal was to >> skip tcp_ack_snd_check() in tcp_rcv_state_process() not to send an >> unnecessary ACK in case of simultaneous connect(). Unfortunately, that >> had an impact on TFO and MPTCP. >> >> I started to look at the impact on MPTCP, because the MPTCP CI found >> some issues with the MPTCP Packetdrill tests [1]. Then Paolo suggested >> me to look at the impact on TFO with "plain" TCP. >> >> For MPTCP, when receiving the 3rd ACK of a request adding a new path >> (MP_JOIN), sk->sk_socket will be set, and point to the MPTCP sock that >> has been created when the MPTCP connection got established before with >> the first path. The newly added 'goto' will then skip the processing of >> the segment text (step 7) and not go through tcp_data_queue() where the >> MPTCP options are validated, and some actions are triggered, e.g. >> sending the MPJ 4th ACK [2] as demonstrated by the new errors when >> running a packetdrill test [3] establishing a second subflow. >> >> This doesn't fully break MPTCP, mainly the 4th MPJ ACK that will be >> delayed. Still, we don't want to have this behaviour as it delays the >> switch to the fully established mode, and invalid MPTCP options in this >> 3rd ACK will not be caught any more. This modification also affects the >> MPTCP + TFO feature as well, and being the reason why the selftests >> started to be unstable the last few days [4]. >> >> For TFO, the existing 'basic-cookie-not-reqd' test [5] was no longer >> passing: if the 3rd ACK contains data, these data would no longer be >> processed, and thus not ACKed. >> >> Note that for MPTCP, in case of simultaneous connect(), a fallback to >> TCP will be done, which seems fine: >> >> `../common/defaults.sh` >> >> 0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_MPTCP) = 3 >> +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress) >> >> +0 > S 0:0(0) <mss 1460, sackOK, TS val 100 ecr 0, nop, wscale 8, mpcapable v1 flags[flag_h] nokey> >> +0 < S 0:0(0) win 1000 <mss 1460, sackOK, TS val 407 ecr 0, nop, wscale 8, mpcapable v1 flags[flag_h] nokey> >> +0 > S. 0:0(0) ack 1 <mss 1460, sackOK, TS val 330 ecr 0, nop, wscale 8, mpcapable v1 flags[flag_h] nokey> >> +0 < S. 0:0(0) ack 1 win 65535 <mss 1460, sackOK, TS val 700 ecr 100, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey=2]> >> >> +0 write(3, ..., 100) = 100 >> +0 > . 1:1(0) ack 1 <nop, nop, TS val 845707014 ecr 700, nop, nop, sack 0:1> >> +0 > P. 1:101(100) ack 1 <nop, nop, TS val 845958933 ecr 700> >> >> Link: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/9936227696 [1] >> Link: https://datatracker.ietf.org/doc/html/rfc8684#fig_tokens [2] >> Link: https://github.com/multipath-tcp/packetdrill/blob/mptcp-net-next/gtests/net/mptcp/syscalls/accept.pkt#L28 [3] >> Link: https://netdev.bots.linux.dev/contest.html?executor=vmksft-mptcp-dbg&test=mptcp-connect-sh [4] >> Link: https://github.com/google/packetdrill/blob/master/gtests/net/tcp/fastopen/server/basic-cookie-not-reqd.pkt#L21 [5] >> Fixes: 23e89e8ee7be ("tcp: Don't drop SYN+ACK for simultaneous connect().") >> Suggested-by: Paolo Abeni <pabeni@redhat.com> >> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> >> --- >> Notes: >> - We could also drop this 'goto consume', and send the unnecessary ACK >> in this simultaneous connect case, which doesn't seem to be a "real" >> case, more something for fuzzers. >> - When sending this patch, the 'Fixes' commit is only in net-next, this >> patch is then on top of net-next. But because net-next will be merged >> into -net soon -- judging by the PR that has been sent to Linus a few >> hours ago -- the 'net' prefix is then used. >> --- >> net/ipv4/tcp_input.c | 8 +++++++- >> 1 file changed, 7 insertions(+), 1 deletion(-) >> >> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c >> index ff9ab3d01ced..a89b3ee57d8c 100644 >> --- a/net/ipv4/tcp_input.c >> +++ b/net/ipv4/tcp_input.c >> @@ -6820,7 +6820,13 @@ tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb) >> if (sk->sk_shutdown & SEND_SHUTDOWN) >> tcp_shutdown(sk, SEND_SHUTDOWN); >> >> - if (sk->sk_socket) >> + /* In simult-connect cases, sk_socket will be assigned. But also >> + * with TFO and MPTCP (MPJ) while they required further >> + * processing later in tcp_data_queue(). >> + */ >> + if (sk->sk_socket && >> + TCP_SKB_CB(skb)->seq == TCP_SKB_CB(skb)->end_seq && >> + !sk_is_mptcp(sk)) >> goto consume; >> break; >> > > Hi Matthieu > > I had no time yet to run all our packetdrill tests with Kuniyuki patch > because of the ongoing netdev conference. > > Is it ok for you if we hold your patch for about 5 days ? Sure, no problem, take your time! > I would like to make sure we did not miss anything else. I understand! > I am CCing Neal, perhaps he can help to expedite the testing part > while I am busy. Thank you, no urgency here. If it's OK with you, I can send a v2 using Kuniyuki's suggestion -- simply limiting the bypass to SYN+ACK only -- because it is simpler and ready to be sent, but also to please the CI because my v1 was rejected by the CI because I sent it just before the sync with Linus tree. We can choose later to pick the v2, the previous one, or a future one. Cheers, Matt
On 7/17/24 17:09, Matthieu Baerts wrote: > On 17/07/2024 16:57, Eric Dumazet wrote: >> I had no time yet to run all our packetdrill tests with Kuniyuki patch >> because of the ongoing netdev conference. >> >> Is it ok for you if we hold your patch for about 5 days ? > > Sure, no problem, take your time! > >> I would like to make sure we did not miss anything else. > > I understand! > >> I am CCing Neal, perhaps he can help to expedite the testing part >> while I am busy. > > Thank you, no urgency here. > > If it's OK with you, I can send a v2 using Kuniyuki's suggestion -- > simply limiting the bypass to SYN+ACK only -- because it is simpler and > ready to be sent, but also to please the CI because my v1 was rejected > by the CI because I sent it just before the sync with Linus tree. We can > choose later to pick the v2, the previous one, or a future one. I think it would be better to have this patch going through the netdev CI, so a repost would be appreciated. I also thing Kuniyuki's suggestion should be preferred, so I would say go for it :) @Neal could you please run the pktdrill tests on the new, upcoming version, instead? Thanks, Paolo
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index ff9ab3d01ced..a89b3ee57d8c 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -6820,7 +6820,13 @@ tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb) if (sk->sk_shutdown & SEND_SHUTDOWN) tcp_shutdown(sk, SEND_SHUTDOWN); - if (sk->sk_socket) + /* In simult-connect cases, sk_socket will be assigned. But also + * with TFO and MPTCP (MPJ) while they required further + * processing later in tcp_data_queue(). + */ + if (sk->sk_socket && + TCP_SKB_CB(skb)->seq == TCP_SKB_CB(skb)->end_seq && + !sk_is_mptcp(sk)) goto consume; break;
The 'Fixes' commit recently changed the behaviour of TCP by skipping the processing of the 3rd ACK when a sk->sk_socket is set. The goal was to skip tcp_ack_snd_check() in tcp_rcv_state_process() not to send an unnecessary ACK in case of simultaneous connect(). Unfortunately, that had an impact on TFO and MPTCP. I started to look at the impact on MPTCP, because the MPTCP CI found some issues with the MPTCP Packetdrill tests [1]. Then Paolo suggested me to look at the impact on TFO with "plain" TCP. For MPTCP, when receiving the 3rd ACK of a request adding a new path (MP_JOIN), sk->sk_socket will be set, and point to the MPTCP sock that has been created when the MPTCP connection got established before with the first path. The newly added 'goto' will then skip the processing of the segment text (step 7) and not go through tcp_data_queue() where the MPTCP options are validated, and some actions are triggered, e.g. sending the MPJ 4th ACK [2] as demonstrated by the new errors when running a packetdrill test [3] establishing a second subflow. This doesn't fully break MPTCP, mainly the 4th MPJ ACK that will be delayed. Still, we don't want to have this behaviour as it delays the switch to the fully established mode, and invalid MPTCP options in this 3rd ACK will not be caught any more. This modification also affects the MPTCP + TFO feature as well, and being the reason why the selftests started to be unstable the last few days [4]. For TFO, the existing 'basic-cookie-not-reqd' test [5] was no longer passing: if the 3rd ACK contains data, these data would no longer be processed, and thus not ACKed. Note that for MPTCP, in case of simultaneous connect(), a fallback to TCP will be done, which seems fine: `../common/defaults.sh` 0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_MPTCP) = 3 +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress) +0 > S 0:0(0) <mss 1460, sackOK, TS val 100 ecr 0, nop, wscale 8, mpcapable v1 flags[flag_h] nokey> +0 < S 0:0(0) win 1000 <mss 1460, sackOK, TS val 407 ecr 0, nop, wscale 8, mpcapable v1 flags[flag_h] nokey> +0 > S. 0:0(0) ack 1 <mss 1460, sackOK, TS val 330 ecr 0, nop, wscale 8, mpcapable v1 flags[flag_h] nokey> +0 < S. 0:0(0) ack 1 win 65535 <mss 1460, sackOK, TS val 700 ecr 100, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey=2]> +0 write(3, ..., 100) = 100 +0 > . 1:1(0) ack 1 <nop, nop, TS val 845707014 ecr 700, nop, nop, sack 0:1> +0 > P. 1:101(100) ack 1 <nop, nop, TS val 845958933 ecr 700> Link: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/9936227696 [1] Link: https://datatracker.ietf.org/doc/html/rfc8684#fig_tokens [2] Link: https://github.com/multipath-tcp/packetdrill/blob/mptcp-net-next/gtests/net/mptcp/syscalls/accept.pkt#L28 [3] Link: https://netdev.bots.linux.dev/contest.html?executor=vmksft-mptcp-dbg&test=mptcp-connect-sh [4] Link: https://github.com/google/packetdrill/blob/master/gtests/net/tcp/fastopen/server/basic-cookie-not-reqd.pkt#L21 [5] Fixes: 23e89e8ee7be ("tcp: Don't drop SYN+ACK for simultaneous connect().") Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- Notes: - We could also drop this 'goto consume', and send the unnecessary ACK in this simultaneous connect case, which doesn't seem to be a "real" case, more something for fuzzers. - When sending this patch, the 'Fixes' commit is only in net-next, this patch is then on top of net-next. But because net-next will be merged into -net soon -- judging by the PR that has been sent to Linus a few hours ago -- the 'net' prefix is then used. --- net/ipv4/tcp_input.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) --- base-commit: 77ae5e5b00720372af2860efdc4bc652ac682696 change-id: 20240716-upstream-net-next-20240716-tcp-3rd-ack-consume-sk_socket-0cbd159fc5b0 Best regards,