diff mbox series

xprtsock: Fix a loop in xs_tcp_setup_socket()

Message ID 20240420104801.94701-1-usiegl00@gmail.com (mailing list archive)
State New, archived
Headers show
Series xprtsock: Fix a loop in xs_tcp_setup_socket() | expand

Commit Message

Lex Siegel April 20, 2024, 10:48 a.m. UTC
When using a bpf on kernel_connect(), the call can return -EPERM.
This causes xs_tcp_setup_socket() to loop forever, filling up the
syslog and causing the kernel to freeze up.

Signed-off-by: Lex Siegel <usiegl00@gmail.com>
---
 net/sunrpc/xprtsock.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Jeff Layton April 20, 2024, 11:06 a.m. UTC | #1
On Sat, 2024-04-20 at 19:48 +0900, Lex Siegel wrote:
> When using a bpf on kernel_connect(), the call can return -EPERM.
> This causes xs_tcp_setup_socket() to loop forever, filling up the
> syslog and causing the kernel to freeze up.
> 
> Signed-off-by: Lex Siegel <usiegl00@gmail.com>
> ---
>  net/sunrpc/xprtsock.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index bb9b747d58a1..47b254806a08 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -2446,6 +2446,8 @@ static void xs_tcp_setup_socket(struct work_struct *work)
>  		/* Happens, for instance, if the user specified a link
>  		 * local IPv6 address without a scope-id.
>  		 */
> +	case -EPERM:
> +		/* Happens, for instance, if a bpf is preventing the connect */
>  	case -ECONNREFUSED:
>  	case -ECONNRESET:
>  	case -ENETDOWN:

Yeah, I think we'd consider -EPERM a permanent error.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Trond Myklebust April 21, 2024, 10:32 p.m. UTC | #2
On Sat, 2024-04-20 at 19:48 +0900, Lex Siegel wrote:
> [You don't often get email from usiegl00@gmail.com. Learn why this is
> important at https://aka.ms/LearnAboutSenderIdentification ]
> 
> When using a bpf on kernel_connect(), the call can return -EPERM.
> This causes xs_tcp_setup_socket() to loop forever, filling up the
> syslog and causing the kernel to freeze up.
> 
> Signed-off-by: Lex Siegel <usiegl00@gmail.com>
> ---
>  net/sunrpc/xprtsock.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index bb9b747d58a1..47b254806a08 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -2446,6 +2446,8 @@ static void xs_tcp_setup_socket(struct
> work_struct *work)
>                 /* Happens, for instance, if the user specified a
> link
>                  * local IPv6 address without a scope-id.
>                  */
> +       case -EPERM:
> +               /* Happens, for instance, if a bpf is preventing the
> connect */
>         case -ECONNREFUSED:
>         case -ECONNRESET:
>         case -ENETDOWN:
> --
> 2.39.3
> 

This looks as if it will have some rather subtle interactions if the
RPC_TASK_SOFTCONN flag is set. Have you tested this kind of scenario?
NeilBrown April 21, 2024, 11:22 p.m. UTC | #3
On Sat, 20 Apr 2024, Lex Siegel wrote:
> When using a bpf on kernel_connect(), the call can return -EPERM.
> This causes xs_tcp_setup_socket() to loop forever, filling up the
> syslog and causing the kernel to freeze up.
> 
> Signed-off-by: Lex Siegel <usiegl00@gmail.com>
> ---
>  net/sunrpc/xprtsock.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index bb9b747d58a1..47b254806a08 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -2446,6 +2446,8 @@ static void xs_tcp_setup_socket(struct work_struct *work)
>  		/* Happens, for instance, if the user specified a link
>  		 * local IPv6 address without a scope-id.
>  		 */
> +	case -EPERM:
> +		/* Happens, for instance, if a bpf is preventing the connect */

This will propagate -EPERM up into other layers which might not be ready
to handle it.
It might be safer to map EPERM to an error we would be more likely to
expect  from the network system - such as ECONNREFUSED or ENETDOWN.

Better still would be for kernel_connect() to return a more normal error
code - not EPERM.  If that cannot be achieved, then I think it would be
best for the sunrpc code to map EPERM to something else at the place
where kernel_connect() is called - catch it early.

NeilBrown


>  	case -ECONNREFUSED:
>  	case -ECONNRESET:
>  	case -ENETDOWN:
> -- 
> 2.39.3
> 
>
Lex Siegel April 22, 2024, 3:32 a.m. UTC | #4
> Better still would be for kernel_connect() to return a more normal error
> code - not EPERM.  If that cannot be achieved, then I think it would be
> best for the sunrpc code to map EPERM to something else at the place
> where kernel_connect() is called - catch it early.

The question is whether a permission error, EPERM, should cause a retry or
return. Currently xs_tcp_setup_socket() is retrying. For the retry to clear,
the connect call will have to not return a permission error to halt the retry
attempts.

This is a default behavior because EPERM is not an explicit case of the switch
statement. Because bpf appropriately uses EPERM to show that the kernel_connect
was not permitted, it highlights the return handling for this case is missing.
It is unlikely that retry was ever the intended result.

Upstream, the bpf that caused this is at:
https://github.com/cilium/cilium/blob/v1.15/bpf/bpf_sock.c#L336

This cilium bpf code has two return statuses, EPERM and ENXIO, that fall
through to the default case of retrying. Here, cilium expects both of these
statuses to indicate the connect failed. A retry is not the intended result.

Handling this case without a retry aligns this code with the udp behavior. This
precedence for passing EPERM back up the stack was set in 3dedbb5ca10ef.

I will amend my patch to include an explicit case for ENXIO as well, as this is
also in cilium's bpf and will cause the same bug to occur.


On Mon, Apr 22, 2024 at 8:22 AM NeilBrown <neilb@suse.de> wrote:
>
> On Sat, 20 Apr 2024, Lex Siegel wrote:
> > When using a bpf on kernel_connect(), the call can return -EPERM.
> > This causes xs_tcp_setup_socket() to loop forever, filling up the
> > syslog and causing the kernel to freeze up.
> >
> > Signed-off-by: Lex Siegel <usiegl00@gmail.com>
> > ---
> >  net/sunrpc/xprtsock.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > index bb9b747d58a1..47b254806a08 100644
> > --- a/net/sunrpc/xprtsock.c
> > +++ b/net/sunrpc/xprtsock.c
> > @@ -2446,6 +2446,8 @@ static void xs_tcp_setup_socket(struct work_struct *work)
> >               /* Happens, for instance, if the user specified a link
> >                * local IPv6 address without a scope-id.
> >                */
> > +     case -EPERM:
> > +             /* Happens, for instance, if a bpf is preventing the connect */
>
> This will propagate -EPERM up into other layers which might not be ready
> to handle it.
> It might be safer to map EPERM to an error we would be more likely to
> expect  from the network system - such as ECONNREFUSED or ENETDOWN.
>
> Better still would be for kernel_connect() to return a more normal error
> code - not EPERM.  If that cannot be achieved, then I think it would be
> best for the sunrpc code to map EPERM to something else at the place
> where kernel_connect() is called - catch it early.
>
> NeilBrown
>
>
> >       case -ECONNREFUSED:
> >       case -ECONNRESET:
> >       case -ENETDOWN:
> > --
> > 2.39.3
> >
> >
>
NeilBrown April 22, 2024, 3:44 a.m. UTC | #5
On Mon, 22 Apr 2024, Lex Siegel wrote:
> > Better still would be for kernel_connect() to return a more normal error
> > code - not EPERM.  If that cannot be achieved, then I think it would be
> > best for the sunrpc code to map EPERM to something else at the place
> > where kernel_connect() is called - catch it early.
> 
> The question is whether a permission error, EPERM, should cause a retry or
> return. Currently xs_tcp_setup_socket() is retrying. For the retry to clear,
> the connect call will have to not return a permission error to halt the retry
> attempts.
> 
> This is a default behavior because EPERM is not an explicit case of the switch
> statement. Because bpf appropriately uses EPERM to show that the kernel_connect
> was not permitted, it highlights the return handling for this case is missing.
> It is unlikely that retry was ever the intended result.
> 
> Upstream, the bpf that caused this is at:
> https://github.com/cilium/cilium/blob/v1.15/bpf/bpf_sock.c#L336
> 
> This cilium bpf code has two return statuses, EPERM and ENXIO, that fall
> through to the default case of retrying. Here, cilium expects both of these
> statuses to indicate the connect failed. A retry is not the intended result.
> 
> Handling this case without a retry aligns this code with the udp behavior. This
> precedence for passing EPERM back up the stack was set in 3dedbb5ca10ef.
> 
> I will amend my patch to include an explicit case for ENXIO as well, as this is
> also in cilium's bpf and will cause the same bug to occur.
> 

I think it should be up to cilium to report an errno that the kernel
understands, not up to the kernel to understand whatever errno cilium
chooses to return.

I don't think EPERM or ENXIO are appropriate errors for network
problems.
EHOSTUNREACH or ECONNREFUSED would make much more sense.

NeilBrown


> 
> On Mon, Apr 22, 2024 at 8:22 AM NeilBrown <neilb@suse.de> wrote:
> >
> > On Sat, 20 Apr 2024, Lex Siegel wrote:
> > > When using a bpf on kernel_connect(), the call can return -EPERM.
> > > This causes xs_tcp_setup_socket() to loop forever, filling up the
> > > syslog and causing the kernel to freeze up.
> > >
> > > Signed-off-by: Lex Siegel <usiegl00@gmail.com>
> > > ---
> > >  net/sunrpc/xprtsock.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > > index bb9b747d58a1..47b254806a08 100644
> > > --- a/net/sunrpc/xprtsock.c
> > > +++ b/net/sunrpc/xprtsock.c
> > > @@ -2446,6 +2446,8 @@ static void xs_tcp_setup_socket(struct work_struct *work)
> > >               /* Happens, for instance, if the user specified a link
> > >                * local IPv6 address without a scope-id.
> > >                */
> > > +     case -EPERM:
> > > +             /* Happens, for instance, if a bpf is preventing the connect */
> >
> > This will propagate -EPERM up into other layers which might not be ready
> > to handle it.
> > It might be safer to map EPERM to an error we would be more likely to
> > expect  from the network system - such as ECONNREFUSED or ENETDOWN.
> >
> > Better still would be for kernel_connect() to return a more normal error
> > code - not EPERM.  If that cannot be achieved, then I think it would be
> > best for the sunrpc code to map EPERM to something else at the place
> > where kernel_connect() is called - catch it early.
> >
> > NeilBrown
> >
> >
> > >       case -ECONNREFUSED:
> > >       case -ECONNRESET:
> > >       case -ENETDOWN:
> > > --
> > > 2.39.3
> > >
> > >
> >
>
diff mbox series

Patch

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index bb9b747d58a1..47b254806a08 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2446,6 +2446,8 @@  static void xs_tcp_setup_socket(struct work_struct *work)
 		/* Happens, for instance, if the user specified a link
 		 * local IPv6 address without a scope-id.
 		 */
+	case -EPERM:
+		/* Happens, for instance, if a bpf is preventing the connect */
 	case -ECONNREFUSED:
 	case -ECONNRESET:
 	case -ENETDOWN: