Message ID | 20240417165756.2531620-2-edumazet@google.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | tcp: tcp_v4_err() changes | expand |
On Wed, Apr 17, 2024 at 9:58 AM Eric Dumazet <edumazet@google.com> wrote: > > Blamed commit claimed in its changelog that the new functionality > was guarded by IP_RECVERR/IPV6_RECVERR : > > Note that applications need to set IP_RECVERR/IPV6_RECVERR option to > enable this feature, and that the error message is only queued > while in SYN_SNT state. > > This was true only for IPv6, because ipv6_icmp_error() has > the following check: > > if (!inet6_test_bit(RECVERR6, sk)) > return; > > Other callers check IP_RECVERR by themselves, it is unclear > if we could factorize these checks in ip_icmp_error() > > For stable backports, I chose to add the missing check in tcp_v4_err() > > We think this missing check was the root cause for commit > 0a8de364ff7a ("tcp: no longer abort SYN_SENT when receiving > some ICMP") breakage, leading to a revert. > > Many thanks to Dragos Tatulea for conducting the investigations. > > As Jakub said : > > The suspicion is that SSH sees the ICMP report on the socket error queue > and tries to connect() again, but due to the patch the socket isn't > disconnected, so it gets EALREADY, and throws its hands up... > > The error bubbles up to Vagrant which also becomes unhappy. > > Can we skip the call to ip_icmp_error() for non-fatal ICMP errors? > > Fixes: 45af29ca761c ("tcp: allow traceroute -Mtcp for unpriv users") > Signed-off-by: Eric Dumazet <edumazet@google.com> > Tested-by: Dragos Tatulea <dtatulea@nvidia.com> > Cc: Dragos Tatulea <dtatulea@nvidia.com> > Cc: Maciej Żenczykowski <maze@google.com> > Cc: Willem de Bruijn <willemb@google.com> > Cc: Neal Cardwell <ncardwell@google.com> > Cc: Shachar Kagan <skagan@nvidia.com> > --- > net/ipv4/tcp_ipv4.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > index 88c83ac4212957f19efad0f967952d2502bdbc7f..a717db99972d977a64178d7ed1109325d64a6d51 100644 > --- a/net/ipv4/tcp_ipv4.c > +++ b/net/ipv4/tcp_ipv4.c > @@ -602,7 +602,8 @@ int tcp_v4_err(struct sk_buff *skb, u32 info) > if (fastopen && !fastopen->sk) > break; > > - ip_icmp_error(sk, skb, err, th->dest, info, (u8 *)th); > + if (inet_test_bit(RECVERR, sk)) > + ip_icmp_error(sk, skb, err, th->dest, info, (u8 *)th); > > if (!sock_owned_by_user(sk)) { > WRITE_ONCE(sk->sk_err, err); > -- > 2.44.0.683.g7961c838ac-goog > Reviewed-by: Maciej Żenczykowski <maze@google.com> Makes sense to me.
On Thu, Apr 18, 2024 at 12:59 AM Eric Dumazet <edumazet@google.com> wrote: > > Blamed commit claimed in its changelog that the new functionality > was guarded by IP_RECVERR/IPV6_RECVERR : > > Note that applications need to set IP_RECVERR/IPV6_RECVERR option to > enable this feature, and that the error message is only queued > while in SYN_SNT state. > > This was true only for IPv6, because ipv6_icmp_error() has > the following check: > > if (!inet6_test_bit(RECVERR6, sk)) > return; > > Other callers check IP_RECVERR by themselves, it is unclear > if we could factorize these checks in ip_icmp_error() > > For stable backports, I chose to add the missing check in tcp_v4_err() > > We think this missing check was the root cause for commit > 0a8de364ff7a ("tcp: no longer abort SYN_SENT when receiving > some ICMP") breakage, leading to a revert. > > Many thanks to Dragos Tatulea for conducting the investigations. > > As Jakub said : > > The suspicion is that SSH sees the ICMP report on the socket error queue > and tries to connect() again, but due to the patch the socket isn't > disconnected, so it gets EALREADY, and throws its hands up... > > The error bubbles up to Vagrant which also becomes unhappy. > > Can we skip the call to ip_icmp_error() for non-fatal ICMP errors? > > Fixes: 45af29ca761c ("tcp: allow traceroute -Mtcp for unpriv users") > Signed-off-by: Eric Dumazet <edumazet@google.com> > Tested-by: Dragos Tatulea <dtatulea@nvidia.com> > Cc: Dragos Tatulea <dtatulea@nvidia.com> > Cc: Maciej Żenczykowski <maze@google.com> > Cc: Willem de Bruijn <willemb@google.com> > Cc: Neal Cardwell <ncardwell@google.com> > Cc: Shachar Kagan <skagan@nvidia.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> I wonder if we're supposed to move this check into ip_icmp_error() like ipv6_icmp_error() does, because I notice one caller rxrpc_encap_err_rcv() without checking RECVERR bit reuses the ICMP error logic which is introduced in commit b6c66c4324e7 ("rxrpc: Use the core ICMP/ICMP6 parsers'')? Or should it be a follow-up patch (moving it inside of ip_icmp_error()) to handle the rxrpc case and also prevent future misuse for other people? Thanks, Jason
On Thu, Apr 18, 2024 at 5:23 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > On Thu, Apr 18, 2024 at 12:59 AM Eric Dumazet <edumazet@google.com> wrote: > > > > Blamed commit claimed in its changelog that the new functionality > > was guarded by IP_RECVERR/IPV6_RECVERR : > > > > Note that applications need to set IP_RECVERR/IPV6_RECVERR option to > > enable this feature, and that the error message is only queued > > while in SYN_SNT state. > > > > This was true only for IPv6, because ipv6_icmp_error() has > > the following check: > > > > if (!inet6_test_bit(RECVERR6, sk)) > > return; > > > > Other callers check IP_RECVERR by themselves, it is unclear > > if we could factorize these checks in ip_icmp_error() > > > > For stable backports, I chose to add the missing check in tcp_v4_err() > > > > We think this missing check was the root cause for commit > > 0a8de364ff7a ("tcp: no longer abort SYN_SENT when receiving > > some ICMP") breakage, leading to a revert. > > > > Many thanks to Dragos Tatulea for conducting the investigations. > > > > As Jakub said : > > > > The suspicion is that SSH sees the ICMP report on the socket error queue > > and tries to connect() again, but due to the patch the socket isn't > > disconnected, so it gets EALREADY, and throws its hands up... > > > > The error bubbles up to Vagrant which also becomes unhappy. > > > > Can we skip the call to ip_icmp_error() for non-fatal ICMP errors? > > > > Fixes: 45af29ca761c ("tcp: allow traceroute -Mtcp for unpriv users") > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > Tested-by: Dragos Tatulea <dtatulea@nvidia.com> > > Cc: Dragos Tatulea <dtatulea@nvidia.com> > > Cc: Maciej Żenczykowski <maze@google.com> > > Cc: Willem de Bruijn <willemb@google.com> > > Cc: Neal Cardwell <ncardwell@google.com> > > Cc: Shachar Kagan <skagan@nvidia.com> > > Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> > > I wonder if we're supposed to move this check into ip_icmp_error() > like ipv6_icmp_error() does, because I notice one caller > rxrpc_encap_err_rcv() without checking RECVERR bit reuses the ICMP > error logic which is introduced in commit b6c66c4324e7 ("rxrpc: Use > the core ICMP/ICMP6 parsers'')? I tried to focus on the TCP issues, and to have a stable candidate for patch #1. The refactoring can wait. > > Or should it be a follow-up patch (moving it inside of > ip_icmp_error()) to handle the rxrpc case and also prevent future > misuse for other people?
On Thu, Apr 18, 2024 at 2:45 PM Eric Dumazet <edumazet@google.com> wrote: > > On Thu, Apr 18, 2024 at 5:23 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > On Thu, Apr 18, 2024 at 12:59 AM Eric Dumazet <edumazet@google.com> wrote: > > > > > > Blamed commit claimed in its changelog that the new functionality > > > was guarded by IP_RECVERR/IPV6_RECVERR : > > > > > > Note that applications need to set IP_RECVERR/IPV6_RECVERR option to > > > enable this feature, and that the error message is only queued > > > while in SYN_SNT state. > > > > > > This was true only for IPv6, because ipv6_icmp_error() has > > > the following check: > > > > > > if (!inet6_test_bit(RECVERR6, sk)) > > > return; > > > > > > Other callers check IP_RECVERR by themselves, it is unclear > > > if we could factorize these checks in ip_icmp_error() > > > > > > For stable backports, I chose to add the missing check in tcp_v4_err() > > > > > > We think this missing check was the root cause for commit > > > 0a8de364ff7a ("tcp: no longer abort SYN_SENT when receiving > > > some ICMP") breakage, leading to a revert. > > > > > > Many thanks to Dragos Tatulea for conducting the investigations. > > > > > > As Jakub said : > > > > > > The suspicion is that SSH sees the ICMP report on the socket error queue > > > and tries to connect() again, but due to the patch the socket isn't > > > disconnected, so it gets EALREADY, and throws its hands up... > > > > > > The error bubbles up to Vagrant which also becomes unhappy. > > > > > > Can we skip the call to ip_icmp_error() for non-fatal ICMP errors? > > > > > > Fixes: 45af29ca761c ("tcp: allow traceroute -Mtcp for unpriv users") > > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > > Tested-by: Dragos Tatulea <dtatulea@nvidia.com> > > > Cc: Dragos Tatulea <dtatulea@nvidia.com> > > > Cc: Maciej Żenczykowski <maze@google.com> > > > Cc: Willem de Bruijn <willemb@google.com> > > > Cc: Neal Cardwell <ncardwell@google.com> > > > Cc: Shachar Kagan <skagan@nvidia.com> > > > > Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> > > > > I wonder if we're supposed to move this check into ip_icmp_error() > > like ipv6_icmp_error() does, because I notice one caller > > rxrpc_encap_err_rcv() without checking RECVERR bit reuses the ICMP > > error logic which is introduced in commit b6c66c4324e7 ("rxrpc: Use > > the core ICMP/ICMP6 parsers'')? > > I tried to focus on the TCP issues, and to have a stable candidate for patch #1. > > The refactoring can wait. Got it. It's clear. After this patch is applied, I can adjust a little bit (only by moving it into ip_icmp_error()). Thanks, Jason > > > > > Or should it be a follow-up patch (moving it inside of > > ip_icmp_error()) to handle the rxrpc case and also prevent future > > misuse for other people?
Hi, On Wed, 2024-04-17 at 16:57 +0000, Eric Dumazet wrote: > Blamed commit claimed in its changelog that the new functionality > was guarded by IP_RECVERR/IPV6_RECVERR : > > Note that applications need to set IP_RECVERR/IPV6_RECVERR option to > enable this feature, and that the error message is only queued > while in SYN_SNT state. > > This was true only for IPv6, because ipv6_icmp_error() has > the following check: > > if (!inet6_test_bit(RECVERR6, sk)) > return; > > Other callers check IP_RECVERR by themselves, it is unclear > if we could factorize these checks in ip_icmp_error() > > For stable backports, I chose to add the missing check in tcp_v4_err() > > We think this missing check was the root cause for commit > 0a8de364ff7a ("tcp: no longer abort SYN_SENT when receiving > some ICMP") breakage, leading to a revert. > > Many thanks to Dragos Tatulea for conducting the investigations. > > As Jakub said : > > The suspicion is that SSH sees the ICMP report on the socket error queue > and tries to connect() again, but due to the patch the socket isn't > disconnected, so it gets EALREADY, and throws its hands up... > > The error bubbles up to Vagrant which also becomes unhappy. > > Can we skip the call to ip_icmp_error() for non-fatal ICMP errors? > > Fixes: 45af29ca761c ("tcp: allow traceroute -Mtcp for unpriv users") > Signed-off-by: Eric Dumazet <edumazet@google.com> > Tested-by: Dragos Tatulea <dtatulea@nvidia.com> > Cc: Dragos Tatulea <dtatulea@nvidia.com> > Cc: Maciej Żenczykowski <maze@google.com> > Cc: Willem de Bruijn <willemb@google.com> > Cc: Neal Cardwell <ncardwell@google.com> > Cc: Shachar Kagan <skagan@nvidia.com> > --- > net/ipv4/tcp_ipv4.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > index 88c83ac4212957f19efad0f967952d2502bdbc7f..a717db99972d977a64178d7ed1109325d64a6d51 100644 > --- a/net/ipv4/tcp_ipv4.c > +++ b/net/ipv4/tcp_ipv4.c > @@ -602,7 +602,8 @@ int tcp_v4_err(struct sk_buff *skb, u32 info) > if (fastopen && !fastopen->sk) > break; > > - ip_icmp_error(sk, skb, err, th->dest, info, (u8 *)th); > + if (inet_test_bit(RECVERR, sk)) > + ip_icmp_error(sk, skb, err, th->dest, info, (u8 *)th); > > if (!sock_owned_by_user(sk)) { > WRITE_ONCE(sk->sk_err, err); We have a fcnal-test.sh self-test failure: https://netdev.bots.linux.dev/contest.html?branch=net-next-2024-04-18--06-00&test=fcnal-test-sh that I suspect are related to this patch (or the following one): the test case creates a TCP connection on loopback and this is the only patchseries touching the related code, included in the relevant patch burst. Could you please have a look? Thanks! Paolo
On Thu, Apr 18, 2024 at 10:02 AM Paolo Abeni <pabeni@redhat.com> wrote: > > Hi, > > On Wed, 2024-04-17 at 16:57 +0000, Eric Dumazet wrote: > > Blamed commit claimed in its changelog that the new functionality > > was guarded by IP_RECVERR/IPV6_RECVERR : > > > > Note that applications need to set IP_RECVERR/IPV6_RECVERR option to > > enable this feature, and that the error message is only queued > > while in SYN_SNT state. > > > > This was true only for IPv6, because ipv6_icmp_error() has > > the following check: > > > > if (!inet6_test_bit(RECVERR6, sk)) > > return; > > > > Other callers check IP_RECVERR by themselves, it is unclear > > if we could factorize these checks in ip_icmp_error() > > > > For stable backports, I chose to add the missing check in tcp_v4_err() > > > > We think this missing check was the root cause for commit > > 0a8de364ff7a ("tcp: no longer abort SYN_SENT when receiving > > some ICMP") breakage, leading to a revert. > > > > Many thanks to Dragos Tatulea for conducting the investigations. > > > > As Jakub said : > > > > The suspicion is that SSH sees the ICMP report on the socket error queue > > and tries to connect() again, but due to the patch the socket isn't > > disconnected, so it gets EALREADY, and throws its hands up... > > > > The error bubbles up to Vagrant which also becomes unhappy. > > > > Can we skip the call to ip_icmp_error() for non-fatal ICMP errors? > > > > Fixes: 45af29ca761c ("tcp: allow traceroute -Mtcp for unpriv users") > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > Tested-by: Dragos Tatulea <dtatulea@nvidia.com> > > Cc: Dragos Tatulea <dtatulea@nvidia.com> > > Cc: Maciej Żenczykowski <maze@google.com> > > Cc: Willem de Bruijn <willemb@google.com> > > Cc: Neal Cardwell <ncardwell@google.com> > > Cc: Shachar Kagan <skagan@nvidia.com> > > --- > > net/ipv4/tcp_ipv4.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > > index 88c83ac4212957f19efad0f967952d2502bdbc7f..a717db99972d977a64178d7ed1109325d64a6d51 100644 > > --- a/net/ipv4/tcp_ipv4.c > > +++ b/net/ipv4/tcp_ipv4.c > > @@ -602,7 +602,8 @@ int tcp_v4_err(struct sk_buff *skb, u32 info) > > if (fastopen && !fastopen->sk) > > break; > > > > - ip_icmp_error(sk, skb, err, th->dest, info, (u8 *)th); > > + if (inet_test_bit(RECVERR, sk)) > > + ip_icmp_error(sk, skb, err, th->dest, info, (u8 *)th); > > > > if (!sock_owned_by_user(sk)) { > > WRITE_ONCE(sk->sk_err, err); > > We have a fcnal-test.sh self-test failure: > > https://netdev.bots.linux.dev/contest.html?branch=net-next-2024-04-18--06-00&test=fcnal-test-sh > > that I suspect are related to this patch (or the following one): the > test case creates a TCP connection on loopback and this is the only > patchseries touching the related code, included in the relevant patch > burst. > > Could you please have a look? Sure, thanks Paolo !
On Thu, Apr 18, 2024 at 10:03 AM Eric Dumazet <edumazet@google.com> wrote: > > On Thu, Apr 18, 2024 at 10:02 AM Paolo Abeni <pabeni@redhat.com> wrote: > > > > Hi, > > > > On Wed, 2024-04-17 at 16:57 +0000, Eric Dumazet wrote: > > > Blamed commit claimed in its changelog that the new functionality > > > was guarded by IP_RECVERR/IPV6_RECVERR : > > > > > > Note that applications need to set IP_RECVERR/IPV6_RECVERR option to > > > enable this feature, and that the error message is only queued > > > while in SYN_SNT state. > > > > > > This was true only for IPv6, because ipv6_icmp_error() has > > > the following check: > > > > > > if (!inet6_test_bit(RECVERR6, sk)) > > > return; > > > > > > Other callers check IP_RECVERR by themselves, it is unclear > > > if we could factorize these checks in ip_icmp_error() > > > > > > For stable backports, I chose to add the missing check in tcp_v4_err() > > > > > > We think this missing check was the root cause for commit > > > 0a8de364ff7a ("tcp: no longer abort SYN_SENT when receiving > > > some ICMP") breakage, leading to a revert. > > > > > > Many thanks to Dragos Tatulea for conducting the investigations. > > > > > > As Jakub said : > > > > > > The suspicion is that SSH sees the ICMP report on the socket error queue > > > and tries to connect() again, but due to the patch the socket isn't > > > disconnected, so it gets EALREADY, and throws its hands up... > > > > > > The error bubbles up to Vagrant which also becomes unhappy. > > > > > > Can we skip the call to ip_icmp_error() for non-fatal ICMP errors? > > > > > > Fixes: 45af29ca761c ("tcp: allow traceroute -Mtcp for unpriv users") > > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > > Tested-by: Dragos Tatulea <dtatulea@nvidia.com> > > > Cc: Dragos Tatulea <dtatulea@nvidia.com> > > > Cc: Maciej Żenczykowski <maze@google.com> > > > Cc: Willem de Bruijn <willemb@google.com> > > > Cc: Neal Cardwell <ncardwell@google.com> > > > Cc: Shachar Kagan <skagan@nvidia.com> > > > --- > > > net/ipv4/tcp_ipv4.c | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > > > index 88c83ac4212957f19efad0f967952d2502bdbc7f..a717db99972d977a64178d7ed1109325d64a6d51 100644 > > > --- a/net/ipv4/tcp_ipv4.c > > > +++ b/net/ipv4/tcp_ipv4.c > > > @@ -602,7 +602,8 @@ int tcp_v4_err(struct sk_buff *skb, u32 info) > > > if (fastopen && !fastopen->sk) > > > break; > > > > > > - ip_icmp_error(sk, skb, err, th->dest, info, (u8 *)th); > > > + if (inet_test_bit(RECVERR, sk)) > > > + ip_icmp_error(sk, skb, err, th->dest, info, (u8 *)th); > > > > > > if (!sock_owned_by_user(sk)) { > > > WRITE_ONCE(sk->sk_err, err); > > > > We have a fcnal-test.sh self-test failure: > > > > https://netdev.bots.linux.dev/contest.html?branch=net-next-2024-04-18--06-00&test=fcnal-test-sh > > > > that I suspect are related to this patch (or the following one): the > > test case creates a TCP connection on loopback and this is the only > > patchseries touching the related code, included in the relevant patch > > burst. > > > > Could you please have a look? > > Sure, thanks Paolo ! First patch is fine, I see no failure from fcnal-test.sh (as I would expect) For the second one, I am not familiar enough with this very slow test suite (all these "sleep 1" ... oh well) I guess "failing tests" depended on TCP connect() to immediately abort on one ICMP message, depending on old kernel behavior. I do not know how to launch a subset of the tests, and trace these. "./fcnal-test.sh -t ipv4_tcp" alone takes more than 9 minutes [1] in a VM running a non debug kernel :/ David, do you have an idea how to proceed ? Thanks. [1] Tests passed: 134 Tests failed: 0 real 9m33.085s user 0m40.159s sys 0m30.098s
On Thu, 2024-04-18 at 11:26 +0200, Eric Dumazet wrote: > On Thu, Apr 18, 2024 at 10:03 AM Eric Dumazet <edumazet@google.com> wrote: > > > > On Thu, Apr 18, 2024 at 10:02 AM Paolo Abeni <pabeni@redhat.com> wrote: > > > > > > Hi, > > > > > > On Wed, 2024-04-17 at 16:57 +0000, Eric Dumazet wrote: > > > > Blamed commit claimed in its changelog that the new functionality > > > > was guarded by IP_RECVERR/IPV6_RECVERR : > > > > > > > > Note that applications need to set IP_RECVERR/IPV6_RECVERR option to > > > > enable this feature, and that the error message is only queued > > > > while in SYN_SNT state. > > > > > > > > This was true only for IPv6, because ipv6_icmp_error() has > > > > the following check: > > > > > > > > if (!inet6_test_bit(RECVERR6, sk)) > > > > return; > > > > > > > > Other callers check IP_RECVERR by themselves, it is unclear > > > > if we could factorize these checks in ip_icmp_error() > > > > > > > > For stable backports, I chose to add the missing check in tcp_v4_err() > > > > > > > > We think this missing check was the root cause for commit > > > > 0a8de364ff7a ("tcp: no longer abort SYN_SENT when receiving > > > > some ICMP") breakage, leading to a revert. > > > > > > > > Many thanks to Dragos Tatulea for conducting the investigations. > > > > > > > > As Jakub said : > > > > > > > > The suspicion is that SSH sees the ICMP report on the socket error queue > > > > and tries to connect() again, but due to the patch the socket isn't > > > > disconnected, so it gets EALREADY, and throws its hands up... > > > > > > > > The error bubbles up to Vagrant which also becomes unhappy. > > > > > > > > Can we skip the call to ip_icmp_error() for non-fatal ICMP errors? > > > > > > > > Fixes: 45af29ca761c ("tcp: allow traceroute -Mtcp for unpriv users") > > > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > > > Tested-by: Dragos Tatulea <dtatulea@nvidia.com> > > > > Cc: Dragos Tatulea <dtatulea@nvidia.com> > > > > Cc: Maciej Żenczykowski <maze@google.com> > > > > Cc: Willem de Bruijn <willemb@google.com> > > > > Cc: Neal Cardwell <ncardwell@google.com> > > > > Cc: Shachar Kagan <skagan@nvidia.com> > > > > --- > > > > net/ipv4/tcp_ipv4.c | 3 ++- > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > > > > index 88c83ac4212957f19efad0f967952d2502bdbc7f..a717db99972d977a64178d7ed1109325d64a6d51 100644 > > > > --- a/net/ipv4/tcp_ipv4.c > > > > +++ b/net/ipv4/tcp_ipv4.c > > > > @@ -602,7 +602,8 @@ int tcp_v4_err(struct sk_buff *skb, u32 info) > > > > if (fastopen && !fastopen->sk) > > > > break; > > > > > > > > - ip_icmp_error(sk, skb, err, th->dest, info, (u8 *)th); > > > > + if (inet_test_bit(RECVERR, sk)) > > > > + ip_icmp_error(sk, skb, err, th->dest, info, (u8 *)th); > > > > > > > > if (!sock_owned_by_user(sk)) { > > > > WRITE_ONCE(sk->sk_err, err); > > > > > > We have a fcnal-test.sh self-test failure: > > > > > > https://netdev.bots.linux.dev/contest.html?branch=net-next-2024-04-18--06-00&test=fcnal-test-sh > > > > > > that I suspect are related to this patch (or the following one): the > > > test case creates a TCP connection on loopback and this is the only > > > patchseries touching the related code, included in the relevant patch > > > burst. > > > > > > Could you please have a look? > > > > Sure, thanks Paolo ! > > First patch is fine, I see no failure from fcnal-test.sh (as I would expect) > > For the second one, I am not familiar enough with this very slow test > suite (all these "sleep 1" ... oh well) @David, some of them could be replaced with loopy_wait calls > I guess "failing tests" depended on TCP connect() to immediately abort > on one ICMP message, > depending on old kernel behavior. > > I do not know how to launch a subset of the tests, and trace these. > > "./fcnal-test.sh -t ipv4_tcp" alone takes more than 9 minutes [1] in a > VM running a non debug kernel :/ > > David, do you have an idea how to proceed ? One very dumb thing I do in that cases is commenting out the other tests, something alike (completely untested!): --- diff --git a/tools/testing/selftests/net/fcnal-test.sh b/tools/testing/selftests/net/fcnal-test.sh index 386ebd829df5..494932aa99b2 100755 --- a/tools/testing/selftests/net/fcnal-test.sh +++ b/tools/testing/selftests/net/fcnal-test.sh @@ -1186,6 +1186,7 @@ ipv4_tcp_novrf() { local a +if false; then # # server tests # @@ -1271,6 +1272,7 @@ ipv4_tcp_novrf() log_test_addr ${a} $? 1 "Device server, unbound client, local connection" done +fi a=${NSA_IP} log_start run_cmd nettest -s & @@ -1487,12 +1489,14 @@ ipv4_tcp() set_sysctl net.ipv4.tcp_l3mdev_accept=0 ipv4_tcp_novrf log_subsection "tcp_l3mdev_accept enabled" +if false; then set_sysctl net.ipv4.tcp_l3mdev_accept=1 ipv4_tcp_novrf log_subsection "With VRF" setup "yes" ipv4_tcp_vrf +fi } ################################################################################
On Thu, Apr 18, 2024 at 11:58 AM Paolo Abeni <pabeni@redhat.com> wrote: > > On Thu, 2024-04-18 at 11:26 +0200, Eric Dumazet wrote: > > On Thu, Apr 18, 2024 at 10:03 AM Eric Dumazet <edumazet@google.com> wrote: > > > > > > On Thu, Apr 18, 2024 at 10:02 AM Paolo Abeni <pabeni@redhat.com> wrote: > > > > > > > > Hi, > > > > > > > > On Wed, 2024-04-17 at 16:57 +0000, Eric Dumazet wrote: > > > > > Blamed commit claimed in its changelog that the new functionality > > > > > was guarded by IP_RECVERR/IPV6_RECVERR : > > > > > > > > > > Note that applications need to set IP_RECVERR/IPV6_RECVERR option to > > > > > enable this feature, and that the error message is only queued > > > > > while in SYN_SNT state. > > > > > > > > > > This was true only for IPv6, because ipv6_icmp_error() has > > > > > the following check: > > > > > > > > > > if (!inet6_test_bit(RECVERR6, sk)) > > > > > return; > > > > > > > > > > Other callers check IP_RECVERR by themselves, it is unclear > > > > > if we could factorize these checks in ip_icmp_error() > > > > > > > > > > For stable backports, I chose to add the missing check in tcp_v4_err() > > > > > > > > > > We think this missing check was the root cause for commit > > > > > 0a8de364ff7a ("tcp: no longer abort SYN_SENT when receiving > > > > > some ICMP") breakage, leading to a revert. > > > > > > > > > > Many thanks to Dragos Tatulea for conducting the investigations. > > > > > > > > > > As Jakub said : > > > > > > > > > > The suspicion is that SSH sees the ICMP report on the socket error queue > > > > > and tries to connect() again, but due to the patch the socket isn't > > > > > disconnected, so it gets EALREADY, and throws its hands up... > > > > > > > > > > The error bubbles up to Vagrant which also becomes unhappy. > > > > > > > > > > Can we skip the call to ip_icmp_error() for non-fatal ICMP errors? > > > > > > > > > > Fixes: 45af29ca761c ("tcp: allow traceroute -Mtcp for unpriv users") > > > > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > > > > Tested-by: Dragos Tatulea <dtatulea@nvidia.com> > > > > > Cc: Dragos Tatulea <dtatulea@nvidia.com> > > > > > Cc: Maciej Żenczykowski <maze@google.com> > > > > > Cc: Willem de Bruijn <willemb@google.com> > > > > > Cc: Neal Cardwell <ncardwell@google.com> > > > > > Cc: Shachar Kagan <skagan@nvidia.com> > > > > > --- > > > > > net/ipv4/tcp_ipv4.c | 3 ++- > > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > > > > > index 88c83ac4212957f19efad0f967952d2502bdbc7f..a717db99972d977a64178d7ed1109325d64a6d51 100644 > > > > > --- a/net/ipv4/tcp_ipv4.c > > > > > +++ b/net/ipv4/tcp_ipv4.c > > > > > @@ -602,7 +602,8 @@ int tcp_v4_err(struct sk_buff *skb, u32 info) > > > > > if (fastopen && !fastopen->sk) > > > > > break; > > > > > > > > > > - ip_icmp_error(sk, skb, err, th->dest, info, (u8 *)th); > > > > > + if (inet_test_bit(RECVERR, sk)) > > > > > + ip_icmp_error(sk, skb, err, th->dest, info, (u8 *)th); > > > > > > > > > > if (!sock_owned_by_user(sk)) { > > > > > WRITE_ONCE(sk->sk_err, err); > > > > > > > > We have a fcnal-test.sh self-test failure: > > > > > > > > https://netdev.bots.linux.dev/contest.html?branch=net-next-2024-04-18--06-00&test=fcnal-test-sh > > > > > > > > that I suspect are related to this patch (or the following one): the > > > > test case creates a TCP connection on loopback and this is the only > > > > patchseries touching the related code, included in the relevant patch > > > > burst. > > > > > > > > Could you please have a look? > > > > > > Sure, thanks Paolo ! > > > > First patch is fine, I see no failure from fcnal-test.sh (as I would expect) > > > > For the second one, I am not familiar enough with this very slow test > > suite (all these "sleep 1" ... oh well) > > @David, some of them could be replaced with loopy_wait calls > > > I guess "failing tests" depended on TCP connect() to immediately abort > > on one ICMP message, > > depending on old kernel behavior. > > > > I do not know how to launch a subset of the tests, and trace these. > > > > "./fcnal-test.sh -t ipv4_tcp" alone takes more than 9 minutes [1] in a > > VM running a non debug kernel :/ > > > > David, do you have an idea how to proceed ? > > One very dumb thing I do in that cases is commenting out the other > tests, something alike (completely untested!): > --- > diff --git a/tools/testing/selftests/net/fcnal-test.sh b/tools/testing/selftests/net/fcnal-test.sh > index 386ebd829df5..494932aa99b2 100755 > --- a/tools/testing/selftests/net/fcnal-test.sh > +++ b/tools/testing/selftests/net/fcnal-test.sh > @@ -1186,6 +1186,7 @@ ipv4_tcp_novrf() > { > local a > > +if false; then > # > # server tests > # > @@ -1271,6 +1272,7 @@ ipv4_tcp_novrf() > log_test_addr ${a} $? 1 "Device server, unbound client, local connection" > done > > +fi > a=${NSA_IP} > log_start > run_cmd nettest -s & > @@ -1487,12 +1489,14 @@ ipv4_tcp() > set_sysctl net.ipv4.tcp_l3mdev_accept=0 > ipv4_tcp_novrf > log_subsection "tcp_l3mdev_accept enabled" > +if false; then > set_sysctl net.ipv4.tcp_l3mdev_accept=1 > ipv4_tcp_novrf > > log_subsection "With VRF" > setup "yes" > ipv4_tcp_vrf > +fi > } Thanks Paolo I found that the following patch is fixing the issue for me. diff --git a/tools/testing/selftests/net/nettest.c b/tools/testing/selftests/net/nettest.c index cd8a580974480212b45d86f35293b77f3d033473..ff25e53024ef6d4101f251c8a8a5e936e44e280f 100644 --- a/tools/testing/selftests/net/nettest.c +++ b/tools/testing/selftests/net/nettest.c @@ -1744,6 +1744,7 @@ static int connectsock(void *addr, socklen_t alen, struct sock_args *args) if (args->bind_test_only) goto out; + set_recv_attr(sd, args->version); if (connect(sd, addr, alen) < 0) { if (errno != EINPROGRESS) { log_err_errno("Failed to connect to remote host");
On Thu, Apr 18, 2024 at 12:15 PM Eric Dumazet <edumazet@google.com> wrote: > > On Thu, Apr 18, 2024 at 11:58 AM Paolo Abeni <pabeni@redhat.com> wrote: > > > > On Thu, 2024-04-18 at 11:26 +0200, Eric Dumazet wrote: > > > On Thu, Apr 18, 2024 at 10:03 AM Eric Dumazet <edumazet@google.com> wrote: > > > > > > > > On Thu, Apr 18, 2024 at 10:02 AM Paolo Abeni <pabeni@redhat.com> wrote: > > > > > > > > > > Hi, > > > > > > > > > > On Wed, 2024-04-17 at 16:57 +0000, Eric Dumazet wrote: > > > > > > Blamed commit claimed in its changelog that the new functionality > > > > > > was guarded by IP_RECVERR/IPV6_RECVERR : > > > > > > > > > > > > Note that applications need to set IP_RECVERR/IPV6_RECVERR option to > > > > > > enable this feature, and that the error message is only queued > > > > > > while in SYN_SNT state. > > > > > > > > > > > > This was true only for IPv6, because ipv6_icmp_error() has > > > > > > the following check: > > > > > > > > > > > > if (!inet6_test_bit(RECVERR6, sk)) > > > > > > return; > > > > > > > > > > > > Other callers check IP_RECVERR by themselves, it is unclear > > > > > > if we could factorize these checks in ip_icmp_error() > > > > > > > > > > > > For stable backports, I chose to add the missing check in tcp_v4_err() > > > > > > > > > > > > We think this missing check was the root cause for commit > > > > > > 0a8de364ff7a ("tcp: no longer abort SYN_SENT when receiving > > > > > > some ICMP") breakage, leading to a revert. > > > > > > > > > > > > Many thanks to Dragos Tatulea for conducting the investigations. > > > > > > > > > > > > As Jakub said : > > > > > > > > > > > > The suspicion is that SSH sees the ICMP report on the socket error queue > > > > > > and tries to connect() again, but due to the patch the socket isn't > > > > > > disconnected, so it gets EALREADY, and throws its hands up... > > > > > > > > > > > > The error bubbles up to Vagrant which also becomes unhappy. > > > > > > > > > > > > Can we skip the call to ip_icmp_error() for non-fatal ICMP errors? > > > > > > > > > > > > Fixes: 45af29ca761c ("tcp: allow traceroute -Mtcp for unpriv users") > > > > > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > > > > > Tested-by: Dragos Tatulea <dtatulea@nvidia.com> > > > > > > Cc: Dragos Tatulea <dtatulea@nvidia.com> > > > > > > Cc: Maciej Żenczykowski <maze@google.com> > > > > > > Cc: Willem de Bruijn <willemb@google.com> > > > > > > Cc: Neal Cardwell <ncardwell@google.com> > > > > > > Cc: Shachar Kagan <skagan@nvidia.com> > > > > > > --- > > > > > > net/ipv4/tcp_ipv4.c | 3 ++- > > > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > > > > > > index 88c83ac4212957f19efad0f967952d2502bdbc7f..a717db99972d977a64178d7ed1109325d64a6d51 100644 > > > > > > --- a/net/ipv4/tcp_ipv4.c > > > > > > +++ b/net/ipv4/tcp_ipv4.c > > > > > > @@ -602,7 +602,8 @@ int tcp_v4_err(struct sk_buff *skb, u32 info) > > > > > > if (fastopen && !fastopen->sk) > > > > > > break; > > > > > > > > > > > > - ip_icmp_error(sk, skb, err, th->dest, info, (u8 *)th); > > > > > > + if (inet_test_bit(RECVERR, sk)) > > > > > > + ip_icmp_error(sk, skb, err, th->dest, info, (u8 *)th); > > > > > > > > > > > > if (!sock_owned_by_user(sk)) { > > > > > > WRITE_ONCE(sk->sk_err, err); > > > > > > > > > > We have a fcnal-test.sh self-test failure: > > > > > > > > > > https://netdev.bots.linux.dev/contest.html?branch=net-next-2024-04-18--06-00&test=fcnal-test-sh > > > > > > > > > > that I suspect are related to this patch (or the following one): the > > > > > test case creates a TCP connection on loopback and this is the only > > > > > patchseries touching the related code, included in the relevant patch > > > > > burst. > > > > > > > > > > Could you please have a look? > > > > > > > > Sure, thanks Paolo ! > > > > > > First patch is fine, I see no failure from fcnal-test.sh (as I would expect) > > > > > > For the second one, I am not familiar enough with this very slow test > > > suite (all these "sleep 1" ... oh well) > > > > @David, some of them could be replaced with loopy_wait calls > > > > > I guess "failing tests" depended on TCP connect() to immediately abort > > > on one ICMP message, > > > depending on old kernel behavior. > > > > > > I do not know how to launch a subset of the tests, and trace these. > > > > > > "./fcnal-test.sh -t ipv4_tcp" alone takes more than 9 minutes [1] in a > > > VM running a non debug kernel :/ > > > > > > David, do you have an idea how to proceed ? > > > > One very dumb thing I do in that cases is commenting out the other > > tests, something alike (completely untested!): > > --- > > diff --git a/tools/testing/selftests/net/fcnal-test.sh b/tools/testing/selftests/net/fcnal-test.sh > > index 386ebd829df5..494932aa99b2 100755 > > --- a/tools/testing/selftests/net/fcnal-test.sh > > +++ b/tools/testing/selftests/net/fcnal-test.sh > > @@ -1186,6 +1186,7 @@ ipv4_tcp_novrf() > > { > > local a > > > > +if false; then > > # > > # server tests > > # > > @@ -1271,6 +1272,7 @@ ipv4_tcp_novrf() > > log_test_addr ${a} $? 1 "Device server, unbound client, local connection" > > done > > > > +fi > > a=${NSA_IP} > > log_start > > run_cmd nettest -s & > > @@ -1487,12 +1489,14 @@ ipv4_tcp() > > set_sysctl net.ipv4.tcp_l3mdev_accept=0 > > ipv4_tcp_novrf > > log_subsection "tcp_l3mdev_accept enabled" > > +if false; then > > set_sysctl net.ipv4.tcp_l3mdev_accept=1 > > ipv4_tcp_novrf > > > > log_subsection "With VRF" > > setup "yes" > > ipv4_tcp_vrf > > +fi > > } > > Thanks Paolo > > I found that the following patch is fixing the issue for me. > > diff --git a/tools/testing/selftests/net/nettest.c > b/tools/testing/selftests/net/nettest.c > index cd8a580974480212b45d86f35293b77f3d033473..ff25e53024ef6d4101f251c8a8a5e936e44e280f > 100644 > --- a/tools/testing/selftests/net/nettest.c > +++ b/tools/testing/selftests/net/nettest.c > @@ -1744,6 +1744,7 @@ static int connectsock(void *addr, socklen_t > alen, struct sock_args *args) > if (args->bind_test_only) > goto out; > > + set_recv_attr(sd, args->version); > if (connect(sd, addr, alen) < 0) { > if (errno != EINPROGRESS) { > log_err_errno("Failed to connect to remote host"); When tracing nettest we now have EHOSTUNREACH 3343 setsockopt(3, SOL_SOCKET, SO_REUSEPORT, [1], 4) = 0 <0.000210> 3343 setsockopt(3, SOL_SOCKET, SO_BINDTODEVICE, "eth1\0", 5) = 0 <0.000170> 3343 setsockopt(3, SOL_IP, IP_PKTINFO, [1], 4) = 0 <0.000161> 3343 setsockopt(3, SOL_IP, IP_RECVERR, [1], 4) = 0 <0.000181> 3343 connect(3, {sa_family=AF_INET, sin_port=htons(12345), sin_addr=inet_addr("172.16.2.1")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000874> 3343 pselect6(1024, NULL, [3], NULL, {tv_sec=5, tv_nsec=0}, NULL) = 1 (out [3], left {tv_sec=1, tv_nsec=930762080}) <3.069673> 3343 getsockopt(3, SOL_SOCKET, SO_ERROR, [EHOSTUNREACH], [4]) = 0 <0.000270> As mentioned in net/ipv4/icmp.c : RFC 1122: 3.2.2.1 States that NET_UNREACH, HOST_UNREACH and SR_FAILED MUST be considered 'transient errs'. Maybe another way to fix nettest would be to change wait_for_connect() to pass a non NULL fdset in 4th argument of select() select(FD_SETSIZE, NULL, &wfd, NULL /* here */, tv);
On Thu, Apr 18, 2024 at 12:22 PM Eric Dumazet <edumazet@google.com> wrote: > > On Thu, Apr 18, 2024 at 12:15 PM Eric Dumazet <edumazet@google.com> wrote: > > > > On Thu, Apr 18, 2024 at 11:58 AM Paolo Abeni <pabeni@redhat.com> wrote: > > > > > > On Thu, 2024-04-18 at 11:26 +0200, Eric Dumazet wrote: > > > > On Thu, Apr 18, 2024 at 10:03 AM Eric Dumazet <edumazet@google.com> wrote: > > > > > > > > > > On Thu, Apr 18, 2024 at 10:02 AM Paolo Abeni <pabeni@redhat.com> wrote: > > > > > > > > > > > > Hi, > > > > > > > > > > > > On Wed, 2024-04-17 at 16:57 +0000, Eric Dumazet wrote: > > > > > > > Blamed commit claimed in its changelog that the new functionality > > > > > > > was guarded by IP_RECVERR/IPV6_RECVERR : > > > > > > > > > > > > > > Note that applications need to set IP_RECVERR/IPV6_RECVERR option to > > > > > > > enable this feature, and that the error message is only queued > > > > > > > while in SYN_SNT state. > > > > > > > > > > > > > > This was true only for IPv6, because ipv6_icmp_error() has > > > > > > > the following check: > > > > > > > > > > > > > > if (!inet6_test_bit(RECVERR6, sk)) > > > > > > > return; > > > > > > > > > > > > > > Other callers check IP_RECVERR by themselves, it is unclear > > > > > > > if we could factorize these checks in ip_icmp_error() > > > > > > > > > > > > > > For stable backports, I chose to add the missing check in tcp_v4_err() > > > > > > > > > > > > > > We think this missing check was the root cause for commit > > > > > > > 0a8de364ff7a ("tcp: no longer abort SYN_SENT when receiving > > > > > > > some ICMP") breakage, leading to a revert. > > > > > > > > > > > > > > Many thanks to Dragos Tatulea for conducting the investigations. > > > > > > > > > > > > > > As Jakub said : > > > > > > > > > > > > > > The suspicion is that SSH sees the ICMP report on the socket error queue > > > > > > > and tries to connect() again, but due to the patch the socket isn't > > > > > > > disconnected, so it gets EALREADY, and throws its hands up... > > > > > > > > > > > > > > The error bubbles up to Vagrant which also becomes unhappy. > > > > > > > > > > > > > > Can we skip the call to ip_icmp_error() for non-fatal ICMP errors? > > > > > > > > > > > > > > Fixes: 45af29ca761c ("tcp: allow traceroute -Mtcp for unpriv users") > > > > > > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > > > > > > Tested-by: Dragos Tatulea <dtatulea@nvidia.com> > > > > > > > Cc: Dragos Tatulea <dtatulea@nvidia.com> > > > > > > > Cc: Maciej Żenczykowski <maze@google.com> > > > > > > > Cc: Willem de Bruijn <willemb@google.com> > > > > > > > Cc: Neal Cardwell <ncardwell@google.com> > > > > > > > Cc: Shachar Kagan <skagan@nvidia.com> > > > > > > > --- > > > > > > > net/ipv4/tcp_ipv4.c | 3 ++- > > > > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > > > > > > > index 88c83ac4212957f19efad0f967952d2502bdbc7f..a717db99972d977a64178d7ed1109325d64a6d51 100644 > > > > > > > --- a/net/ipv4/tcp_ipv4.c > > > > > > > +++ b/net/ipv4/tcp_ipv4.c > > > > > > > @@ -602,7 +602,8 @@ int tcp_v4_err(struct sk_buff *skb, u32 info) > > > > > > > if (fastopen && !fastopen->sk) > > > > > > > break; > > > > > > > > > > > > > > - ip_icmp_error(sk, skb, err, th->dest, info, (u8 *)th); > > > > > > > + if (inet_test_bit(RECVERR, sk)) > > > > > > > + ip_icmp_error(sk, skb, err, th->dest, info, (u8 *)th); > > > > > > > > > > > > > > if (!sock_owned_by_user(sk)) { > > > > > > > WRITE_ONCE(sk->sk_err, err); > > > > > > > > > > > > We have a fcnal-test.sh self-test failure: > > > > > > > > > > > > https://netdev.bots.linux.dev/contest.html?branch=net-next-2024-04-18--06-00&test=fcnal-test-sh > > > > > > > > > > > > that I suspect are related to this patch (or the following one): the > > > > > > test case creates a TCP connection on loopback and this is the only > > > > > > patchseries touching the related code, included in the relevant patch > > > > > > burst. > > > > > > > > > > > > Could you please have a look? > > > > > > > > > > Sure, thanks Paolo ! > > > > > > > > First patch is fine, I see no failure from fcnal-test.sh (as I would expect) > > > > > > > > For the second one, I am not familiar enough with this very slow test > > > > suite (all these "sleep 1" ... oh well) > > > > > > @David, some of them could be replaced with loopy_wait calls > > > > > > > I guess "failing tests" depended on TCP connect() to immediately abort > > > > on one ICMP message, > > > > depending on old kernel behavior. > > > > > > > > I do not know how to launch a subset of the tests, and trace these. > > > > > > > > "./fcnal-test.sh -t ipv4_tcp" alone takes more than 9 minutes [1] in a > > > > VM running a non debug kernel :/ > > > > > > > > David, do you have an idea how to proceed ? > > > > > > One very dumb thing I do in that cases is commenting out the other > > > tests, something alike (completely untested!): > > > --- > > > diff --git a/tools/testing/selftests/net/fcnal-test.sh b/tools/testing/selftests/net/fcnal-test.sh > > > index 386ebd829df5..494932aa99b2 100755 > > > --- a/tools/testing/selftests/net/fcnal-test.sh > > > +++ b/tools/testing/selftests/net/fcnal-test.sh > > > @@ -1186,6 +1186,7 @@ ipv4_tcp_novrf() > > > { > > > local a > > > > > > +if false; then > > > # > > > # server tests > > > # > > > @@ -1271,6 +1272,7 @@ ipv4_tcp_novrf() > > > log_test_addr ${a} $? 1 "Device server, unbound client, local connection" > > > done > > > > > > +fi > > > a=${NSA_IP} > > > log_start > > > run_cmd nettest -s & > > > @@ -1487,12 +1489,14 @@ ipv4_tcp() > > > set_sysctl net.ipv4.tcp_l3mdev_accept=0 > > > ipv4_tcp_novrf > > > log_subsection "tcp_l3mdev_accept enabled" > > > +if false; then > > > set_sysctl net.ipv4.tcp_l3mdev_accept=1 > > > ipv4_tcp_novrf > > > > > > log_subsection "With VRF" > > > setup "yes" > > > ipv4_tcp_vrf > > > +fi > > > } > > > > Thanks Paolo > > > > I found that the following patch is fixing the issue for me. > > > > diff --git a/tools/testing/selftests/net/nettest.c > > b/tools/testing/selftests/net/nettest.c > > index cd8a580974480212b45d86f35293b77f3d033473..ff25e53024ef6d4101f251c8a8a5e936e44e280f > > 100644 > > --- a/tools/testing/selftests/net/nettest.c > > +++ b/tools/testing/selftests/net/nettest.c > > @@ -1744,6 +1744,7 @@ static int connectsock(void *addr, socklen_t > > alen, struct sock_args *args) > > if (args->bind_test_only) > > goto out; > > > > + set_recv_attr(sd, args->version); > > if (connect(sd, addr, alen) < 0) { > > if (errno != EINPROGRESS) { > > log_err_errno("Failed to connect to remote host"); > > When tracing nettest we now have EHOSTUNREACH > > 3343 setsockopt(3, SOL_SOCKET, SO_REUSEPORT, [1], 4) = 0 <0.000210> > 3343 setsockopt(3, SOL_SOCKET, SO_BINDTODEVICE, "eth1\0", 5) = 0 <0.000170> > 3343 setsockopt(3, SOL_IP, IP_PKTINFO, [1], 4) = 0 <0.000161> > 3343 setsockopt(3, SOL_IP, IP_RECVERR, [1], 4) = 0 <0.000181> > 3343 connect(3, {sa_family=AF_INET, sin_port=htons(12345), > sin_addr=inet_addr("172.16.2.1")}, 16) = -1 EINPROGRESS (Operation now > in progress) <0.000874> > 3343 pselect6(1024, NULL, [3], NULL, {tv_sec=5, tv_nsec=0}, NULL) = 1 > (out [3], left {tv_sec=1, tv_nsec=930762080}) <3.069673> > 3343 getsockopt(3, SOL_SOCKET, SO_ERROR, [EHOSTUNREACH], [4]) = 0 <0.000270> > > As mentioned in net/ipv4/icmp.c : > RFC 1122: 3.2.2.1 States that NET_UNREACH, HOST_UNREACH and SR_FAILED > MUST be considered 'transient errs'. > > Maybe another way to fix nettest would be to change wait_for_connect() > to pass a non NULL fdset in 4th argument of select() > > select(FD_SETSIZE, NULL, &wfd, NULL /* here */, tv); This change in wait_for_connect() does not help. I am guessing set_recv_attr(sd, args->version); is what we need. I am running all ./fcnal-test.sh tests to make sure everything is green.
On 4/18/24 4:15 AM, Eric Dumazet wrote: > > Thanks Paolo > > I found that the following patch is fixing the issue for me. > > diff --git a/tools/testing/selftests/net/nettest.c > b/tools/testing/selftests/net/nettest.c > index cd8a580974480212b45d86f35293b77f3d033473..ff25e53024ef6d4101f251c8a8a5e936e44e280f > 100644 > --- a/tools/testing/selftests/net/nettest.c > +++ b/tools/testing/selftests/net/nettest.c > @@ -1744,6 +1744,7 @@ static int connectsock(void *addr, socklen_t > alen, struct sock_args *args) > if (args->bind_test_only) > goto out; > > + set_recv_attr(sd, args->version); > if (connect(sd, addr, alen) < 0) { > if (errno != EINPROGRESS) { > log_err_errno("Failed to connect to remote host"); You have a kernel patch that makes a test fail, and your solution is changing userspace? The tests are examples of userspace applications and how they can use APIs, so if the patch breaks a test it is by definition breaking userspace which is not allowed.
On Thu, Apr 18, 2024 at 7:46 PM David Ahern <dsahern@kernel.org> wrote: > > On 4/18/24 4:15 AM, Eric Dumazet wrote: > > > > Thanks Paolo > > > > I found that the following patch is fixing the issue for me. > > > > diff --git a/tools/testing/selftests/net/nettest.c > > b/tools/testing/selftests/net/nettest.c > > index cd8a580974480212b45d86f35293b77f3d033473..ff25e53024ef6d4101f251c8a8a5e936e44e280f > > 100644 > > --- a/tools/testing/selftests/net/nettest.c > > +++ b/tools/testing/selftests/net/nettest.c > > @@ -1744,6 +1744,7 @@ static int connectsock(void *addr, socklen_t > > alen, struct sock_args *args) > > if (args->bind_test_only) > > goto out; > > > > + set_recv_attr(sd, args->version); > > if (connect(sd, addr, alen) < 0) { > > if (errno != EINPROGRESS) { > > log_err_errno("Failed to connect to remote host"); > > You have a kernel patch that makes a test fail, and your solution is > changing userspace? The tests are examples of userspace applications and > how they can use APIs, so if the patch breaks a test it is by definition > breaking userspace which is not allowed. I think the userspace program relied on a bug added in linux in 2020 Jakub, I will stop trying to push the patches, this is a lost battle.
On 4/18/24 3:26 AM, Eric Dumazet wrote: > For the second one, I am not familiar enough with this very slow test > suite (all these "sleep 1" ... oh well) > > I guess "failing tests" depended on TCP connect() to immediately abort > on one ICMP message, > depending on old kernel behavior. > > I do not know how to launch a subset of the tests, and trace these. > > "./fcnal-test.sh -t ipv4_tcp" alone takes more than 9 minutes [1] in a > VM running a non debug kernel :/ > > David, do you have an idea how to proceed ? > > Thanks. > > [1] > Tests passed: 134 > Tests failed: 0 > > real 9m33.085s > user 0m40.159s > sys 0m30.098s The test suite was started in 2013 and has evolved to cover the many permutations of APIs -- setsockopts, cmsg, VRF, routing, ip rules, netfilter, etc. There are a lot of combinations. They verify not just control path or socket bind succeeds, but data path works as expected which means do a connect and packet transfer. Some years back nettest gained support to change namespaces and run both client and server in a single instance. I started converting the tests to use that capability and remove the sleeps, but it did not speed up the tests in any significant way. The tests are in blocks and the blocks can be split out to separate scripts or kept in one to retain the common setup code and launched in parallel. Splitting out to any lower level does not make sense. If someone wants to pick up the task of splitting the tests or running them in parallel, please do. I have zero time right now. That the suite detects changes in kernel behavior shows it is doing what it is designed to do and proves its value.
On 4/18/24 11:47 AM, Eric Dumazet wrote:
> I think the userspace program relied on a bug added in linux in 2020
which test - when was it added? nettest.c and the fcnal script were
merged in 2019.
On Thu, 18 Apr 2024 19:47:51 +0200 Eric Dumazet wrote: > > You have a kernel patch that makes a test fail, and your solution is > > changing userspace? The tests are examples of userspace applications and > > how they can use APIs, so if the patch breaks a test it is by definition > > breaking userspace which is not allowed. Tests are often overly sensitive to kernel behavior, while this is obviously a red flag it's not an automatic nack. The more tests we have the more often we'll catch tiny changes. A lot of tests started flaking with 6.9 because of the optimizations in the timer subsystem. You know where I'm going with this.. > I think the userspace program relied on a bug added in linux in 2020 > > Jakub, I will stop trying to push the patches, this is a lost battle. If you have the patches ready - please post them. I'm happy to take the blame if they actually regress something in the wild :( We're pursuing this because real application suffer real problems when routing changes cause transient ICMP errors. Users read the RFC and then come shouting at us that Linux is buggy.
On Thu, Apr 18, 2024 at 8:02 PM David Ahern <dsahern@kernel.org> wrote: > > On 4/18/24 11:47 AM, Eric Dumazet wrote: > > I think the userspace program relied on a bug added in linux in 2020 > > which test - when was it added? nettest.c and the fcnal script were > merged in 2019. The test relies on connect() being aborted by EHOSTUNREACH This is in complete violation with RFC Even our own net.ipv4/icmp.c states this clearly. RFC 1122: 3.2.2.1 States that NET_UNREACH, HOST_UNREACH and SR_FAILED MUST be considered 'transient errs'. Your program wants to use RECVERR instead of depending on a bug that we are trying very hard to solve.
On Thu, Apr 18, 2024 at 8:09 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Thu, 18 Apr 2024 19:47:51 +0200 Eric Dumazet wrote: > > > You have a kernel patch that makes a test fail, and your solution is > > > changing userspace? The tests are examples of userspace applications and > > > how they can use APIs, so if the patch breaks a test it is by definition > > > breaking userspace which is not allowed. > > Tests are often overly sensitive to kernel behavior, while this is > obviously a red flag it's not an automatic nack. The more tests we > have the more often we'll catch tiny changes. A lot of tests started > flaking with 6.9 because of the optimizations in the timer subsystem. > You know where I'm going with this.. > > > I think the userspace program relied on a bug added in linux in 2020 > > > > Jakub, I will stop trying to push the patches, this is a lost battle. > > If you have the patches ready - please post them. > I'm happy to take the blame if they actually regress something in > the wild :( The series with the 2 patches has been posted already. The remaining part is a nettest.c fix, that David is not happy with. diff --git a/tools/testing/selftests/net/nettest.c b/tools/testing/selftests/net/nettest.c index cd8a580974480212b45d86f35293b77f3d033473..23d56797900f19890923028af0b7ba162d9d5794 100644 --- a/tools/testing/selftests/net/nettest.c +++ b/tools/testing/selftests/net/nettest.c @@ -1744,6 +1744,8 @@ static int connectsock(void *addr, socklen_t alen, struct sock_args *args) if (args->bind_test_only) goto out; + set_recv_attr(sd, args->version); + if (connect(sd, addr, alen) < 0) { if (errno != EINPROGRESS) { log_err_errno("Failed to connect to remote host"); > > We're pursuing this because real application suffer real problems > when routing changes cause transient ICMP errors. Users read the RFC > and then come shouting at us that Linux is buggy.
On 4/18/24 12:09 PM, Jakub Kicinski wrote: > On Thu, 18 Apr 2024 19:47:51 +0200 Eric Dumazet wrote: >>> You have a kernel patch that makes a test fail, and your solution is >>> changing userspace? The tests are examples of userspace applications and >>> how they can use APIs, so if the patch breaks a test it is by definition >>> breaking userspace which is not allowed. > > Tests are often overly sensitive to kernel behavior, while this is That test script is a fairly comprehensive sweep of uAPIs and kernel behavior. Its sole job is to detect user visible changes and breakage from patches. It has done its job here. Do not shoot or criticize the messenger because you do not like the message. > obviously a red flag it's not an automatic nack. The more tests we > have the more often we'll catch tiny changes. A lot of tests started > flaking with 6.9 because of the optimizations in the timer subsystem. > You know where I'm going with this.. > >> I think the userspace program relied on a bug added in linux in 2020 >> >> Jakub, I will stop trying to push the patches, this is a lost battle. > > If you have the patches ready - please post them. > I'm happy to take the blame if they actually regress something in > the wild :( And because of this test suite you are making a conscious decision to merge a patch that is making a user visible change. That is part of the tough decisions a maintainer has to make; I have no problem with that.
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 88c83ac4212957f19efad0f967952d2502bdbc7f..a717db99972d977a64178d7ed1109325d64a6d51 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -602,7 +602,8 @@ int tcp_v4_err(struct sk_buff *skb, u32 info) if (fastopen && !fastopen->sk) break; - ip_icmp_error(sk, skb, err, th->dest, info, (u8 *)th); + if (inet_test_bit(RECVERR, sk)) + ip_icmp_error(sk, skb, err, th->dest, info, (u8 *)th); if (!sock_owned_by_user(sk)) { WRITE_ONCE(sk->sk_err, err);