Message ID | 20240420104801.94701-1-usiegl00@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | xprtsock: Fix a loop in xs_tcp_setup_socket() | expand |
On Sat, 2024-04-20 at 19:48 +0900, Lex Siegel wrote: > When using a bpf on kernel_connect(), the call can return -EPERM. > This causes xs_tcp_setup_socket() to loop forever, filling up the > syslog and causing the kernel to freeze up. > > Signed-off-by: Lex Siegel <usiegl00@gmail.com> > --- > net/sunrpc/xprtsock.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > index bb9b747d58a1..47b254806a08 100644 > --- a/net/sunrpc/xprtsock.c > +++ b/net/sunrpc/xprtsock.c > @@ -2446,6 +2446,8 @@ static void xs_tcp_setup_socket(struct work_struct *work) > /* Happens, for instance, if the user specified a link > * local IPv6 address without a scope-id. > */ > + case -EPERM: > + /* Happens, for instance, if a bpf is preventing the connect */ > case -ECONNREFUSED: > case -ECONNRESET: > case -ENETDOWN: Yeah, I think we'd consider -EPERM a permanent error. Reviewed-by: Jeff Layton <jlayton@kernel.org>
On Sat, 2024-04-20 at 19:48 +0900, Lex Siegel wrote: > [You don't often get email from usiegl00@gmail.com. Learn why this is > important at https://aka.ms/LearnAboutSenderIdentification ] > > When using a bpf on kernel_connect(), the call can return -EPERM. > This causes xs_tcp_setup_socket() to loop forever, filling up the > syslog and causing the kernel to freeze up. > > Signed-off-by: Lex Siegel <usiegl00@gmail.com> > --- > net/sunrpc/xprtsock.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > index bb9b747d58a1..47b254806a08 100644 > --- a/net/sunrpc/xprtsock.c > +++ b/net/sunrpc/xprtsock.c > @@ -2446,6 +2446,8 @@ static void xs_tcp_setup_socket(struct > work_struct *work) > /* Happens, for instance, if the user specified a > link > * local IPv6 address without a scope-id. > */ > + case -EPERM: > + /* Happens, for instance, if a bpf is preventing the > connect */ > case -ECONNREFUSED: > case -ECONNRESET: > case -ENETDOWN: > -- > 2.39.3 > This looks as if it will have some rather subtle interactions if the RPC_TASK_SOFTCONN flag is set. Have you tested this kind of scenario?
On Sat, 20 Apr 2024, Lex Siegel wrote: > When using a bpf on kernel_connect(), the call can return -EPERM. > This causes xs_tcp_setup_socket() to loop forever, filling up the > syslog and causing the kernel to freeze up. > > Signed-off-by: Lex Siegel <usiegl00@gmail.com> > --- > net/sunrpc/xprtsock.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > index bb9b747d58a1..47b254806a08 100644 > --- a/net/sunrpc/xprtsock.c > +++ b/net/sunrpc/xprtsock.c > @@ -2446,6 +2446,8 @@ static void xs_tcp_setup_socket(struct work_struct *work) > /* Happens, for instance, if the user specified a link > * local IPv6 address without a scope-id. > */ > + case -EPERM: > + /* Happens, for instance, if a bpf is preventing the connect */ This will propagate -EPERM up into other layers which might not be ready to handle it. It might be safer to map EPERM to an error we would be more likely to expect from the network system - such as ECONNREFUSED or ENETDOWN. Better still would be for kernel_connect() to return a more normal error code - not EPERM. If that cannot be achieved, then I think it would be best for the sunrpc code to map EPERM to something else at the place where kernel_connect() is called - catch it early. NeilBrown > case -ECONNREFUSED: > case -ECONNRESET: > case -ENETDOWN: > -- > 2.39.3 > >
> Better still would be for kernel_connect() to return a more normal error > code - not EPERM. If that cannot be achieved, then I think it would be > best for the sunrpc code to map EPERM to something else at the place > where kernel_connect() is called - catch it early. The question is whether a permission error, EPERM, should cause a retry or return. Currently xs_tcp_setup_socket() is retrying. For the retry to clear, the connect call will have to not return a permission error to halt the retry attempts. This is a default behavior because EPERM is not an explicit case of the switch statement. Because bpf appropriately uses EPERM to show that the kernel_connect was not permitted, it highlights the return handling for this case is missing. It is unlikely that retry was ever the intended result. Upstream, the bpf that caused this is at: https://github.com/cilium/cilium/blob/v1.15/bpf/bpf_sock.c#L336 This cilium bpf code has two return statuses, EPERM and ENXIO, that fall through to the default case of retrying. Here, cilium expects both of these statuses to indicate the connect failed. A retry is not the intended result. Handling this case without a retry aligns this code with the udp behavior. This precedence for passing EPERM back up the stack was set in 3dedbb5ca10ef. I will amend my patch to include an explicit case for ENXIO as well, as this is also in cilium's bpf and will cause the same bug to occur. On Mon, Apr 22, 2024 at 8:22 AM NeilBrown <neilb@suse.de> wrote: > > On Sat, 20 Apr 2024, Lex Siegel wrote: > > When using a bpf on kernel_connect(), the call can return -EPERM. > > This causes xs_tcp_setup_socket() to loop forever, filling up the > > syslog and causing the kernel to freeze up. > > > > Signed-off-by: Lex Siegel <usiegl00@gmail.com> > > --- > > net/sunrpc/xprtsock.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > > index bb9b747d58a1..47b254806a08 100644 > > --- a/net/sunrpc/xprtsock.c > > +++ b/net/sunrpc/xprtsock.c > > @@ -2446,6 +2446,8 @@ static void xs_tcp_setup_socket(struct work_struct *work) > > /* Happens, for instance, if the user specified a link > > * local IPv6 address without a scope-id. > > */ > > + case -EPERM: > > + /* Happens, for instance, if a bpf is preventing the connect */ > > This will propagate -EPERM up into other layers which might not be ready > to handle it. > It might be safer to map EPERM to an error we would be more likely to > expect from the network system - such as ECONNREFUSED or ENETDOWN. > > Better still would be for kernel_connect() to return a more normal error > code - not EPERM. If that cannot be achieved, then I think it would be > best for the sunrpc code to map EPERM to something else at the place > where kernel_connect() is called - catch it early. > > NeilBrown > > > > case -ECONNREFUSED: > > case -ECONNRESET: > > case -ENETDOWN: > > -- > > 2.39.3 > > > > >
On Mon, 22 Apr 2024, Lex Siegel wrote: > > Better still would be for kernel_connect() to return a more normal error > > code - not EPERM. If that cannot be achieved, then I think it would be > > best for the sunrpc code to map EPERM to something else at the place > > where kernel_connect() is called - catch it early. > > The question is whether a permission error, EPERM, should cause a retry or > return. Currently xs_tcp_setup_socket() is retrying. For the retry to clear, > the connect call will have to not return a permission error to halt the retry > attempts. > > This is a default behavior because EPERM is not an explicit case of the switch > statement. Because bpf appropriately uses EPERM to show that the kernel_connect > was not permitted, it highlights the return handling for this case is missing. > It is unlikely that retry was ever the intended result. > > Upstream, the bpf that caused this is at: > https://github.com/cilium/cilium/blob/v1.15/bpf/bpf_sock.c#L336 > > This cilium bpf code has two return statuses, EPERM and ENXIO, that fall > through to the default case of retrying. Here, cilium expects both of these > statuses to indicate the connect failed. A retry is not the intended result. > > Handling this case without a retry aligns this code with the udp behavior. This > precedence for passing EPERM back up the stack was set in 3dedbb5ca10ef. > > I will amend my patch to include an explicit case for ENXIO as well, as this is > also in cilium's bpf and will cause the same bug to occur. > I think it should be up to cilium to report an errno that the kernel understands, not up to the kernel to understand whatever errno cilium chooses to return. I don't think EPERM or ENXIO are appropriate errors for network problems. EHOSTUNREACH or ECONNREFUSED would make much more sense. NeilBrown > > On Mon, Apr 22, 2024 at 8:22 AM NeilBrown <neilb@suse.de> wrote: > > > > On Sat, 20 Apr 2024, Lex Siegel wrote: > > > When using a bpf on kernel_connect(), the call can return -EPERM. > > > This causes xs_tcp_setup_socket() to loop forever, filling up the > > > syslog and causing the kernel to freeze up. > > > > > > Signed-off-by: Lex Siegel <usiegl00@gmail.com> > > > --- > > > net/sunrpc/xprtsock.c | 2 ++ > > > 1 file changed, 2 insertions(+) > > > > > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > > > index bb9b747d58a1..47b254806a08 100644 > > > --- a/net/sunrpc/xprtsock.c > > > +++ b/net/sunrpc/xprtsock.c > > > @@ -2446,6 +2446,8 @@ static void xs_tcp_setup_socket(struct work_struct *work) > > > /* Happens, for instance, if the user specified a link > > > * local IPv6 address without a scope-id. > > > */ > > > + case -EPERM: > > > + /* Happens, for instance, if a bpf is preventing the connect */ > > > > This will propagate -EPERM up into other layers which might not be ready > > to handle it. > > It might be safer to map EPERM to an error we would be more likely to > > expect from the network system - such as ECONNREFUSED or ENETDOWN. > > > > Better still would be for kernel_connect() to return a more normal error > > code - not EPERM. If that cannot be achieved, then I think it would be > > best for the sunrpc code to map EPERM to something else at the place > > where kernel_connect() is called - catch it early. > > > > NeilBrown > > > > > > > case -ECONNREFUSED: > > > case -ECONNRESET: > > > case -ENETDOWN: > > > -- > > > 2.39.3 > > > > > > > > >
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index bb9b747d58a1..47b254806a08 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -2446,6 +2446,8 @@ static void xs_tcp_setup_socket(struct work_struct *work) /* Happens, for instance, if the user specified a link * local IPv6 address without a scope-id. */ + case -EPERM: + /* Happens, for instance, if a bpf is preventing the connect */ case -ECONNREFUSED: case -ECONNRESET: case -ENETDOWN:
When using a bpf on kernel_connect(), the call can return -EPERM. This causes xs_tcp_setup_socket() to loop forever, filling up the syslog and causing the kernel to freeze up. Signed-off-by: Lex Siegel <usiegl00@gmail.com> --- net/sunrpc/xprtsock.c | 2 ++ 1 file changed, 2 insertions(+)