diff mbox series

sunrpc: fix NFSACL RPC retry on soft mount

Message ID 20240425104938.3363417-1-dan.aloni@vastdata.com (mailing list archive)
State New
Headers show
Series sunrpc: fix NFSACL RPC retry on soft mount | expand

Commit Message

Dan Aloni April 25, 2024, 10:49 a.m. UTC
It used to be quite awhile ago since 1b63a75180c6 ('SUNRPC: Refactor
rpc_clone_client()'), in 2012, that `cl_timeout` was copied in so that
all mount parameters propagate to NFSACL clients. However since that
change, if mount options as follows are given:

    soft,timeo=50,retrans=16,vers=3

The resultant NFSACL client receives:

    cl_softrtry: 1
    cl_timeout: to_initval=60000, to_maxval=60000, to_increment=0, to_retries=2, to_exponential=0

These values lead to NFSACL operations not being retried under the
condition of transient network outages with soft mount. Instead, getacl
call fails after 60 seconds with EIO.

The simple fix is to pass the existing client's `cl_timeout` as the new
client timeout.

Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Link: https://lore.kernel.org/all/20231105154857.ryakhmgaptq3hb6b@gmail.com/T/
Fixes: 1b63a75180c6 ('SUNRPC: Refactor rpc_clone_client()')
Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
---
 net/sunrpc/clnt.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Benjamin Coddington April 25, 2024, 11:04 a.m. UTC | #1
On 25 Apr 2024, at 6:49, Dan Aloni wrote:

> It used to be quite awhile ago since 1b63a75180c6 ('SUNRPC: Refactor
> rpc_clone_client()'), in 2012, that `cl_timeout` was copied in so that
> all mount parameters propagate to NFSACL clients. However since that
> change, if mount options as follows are given:
>
>     soft,timeo=50,retrans=16,vers=3
>
> The resultant NFSACL client receives:
>
>     cl_softrtry: 1
>     cl_timeout: to_initval=60000, to_maxval=60000, to_increment=0, to_retries=2, to_exponential=0
>
> These values lead to NFSACL operations not being retried under the
> condition of transient network outages with soft mount. Instead, getacl
> call fails after 60 seconds with EIO.
>
> The simple fix is to pass the existing client's `cl_timeout` as the new
> client timeout.
>
> Cc: Chuck Lever <chuck.lever@oracle.com>
> Cc: Benjamin Coddington <bcodding@redhat.com>
> Link: https://lore.kernel.org/all/20231105154857.ryakhmgaptq3hb6b@gmail.com/T/
> Fixes: 1b63a75180c6 ('SUNRPC: Refactor rpc_clone_client()')
> Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>

This also affects the local rpcbind, and makes the change in
6b996476f364 sunrpc: honor rpc_task's timeout value in rpcb_create()
redundant.  Just an observation, thanks for fixing this!

Reviewed-by: Benjamin Coddington <bcodding@redhat.com>

Ben

> ---
>  net/sunrpc/clnt.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index cda0935a68c9..07ffd4ee695a 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -1068,6 +1068,7 @@ struct rpc_clnt *rpc_bind_new_program(struct rpc_clnt *old,
>  		.version	= vers,
>  		.authflavor	= old->cl_auth->au_flavor,
>  		.cred		= old->cl_cred,
> +		.timeout	= old->cl_timeout,
>  	};
>  	struct rpc_clnt *clnt;
>  	int err;
> -- 
> 2.39.3
diff mbox series

Patch

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index cda0935a68c9..07ffd4ee695a 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1068,6 +1068,7 @@  struct rpc_clnt *rpc_bind_new_program(struct rpc_clnt *old,
 		.version	= vers,
 		.authflavor	= old->cl_auth->au_flavor,
 		.cred		= old->cl_cred,
+		.timeout	= old->cl_timeout,
 	};
 	struct rpc_clnt *clnt;
 	int err;