diff mbox series

SUNRPC: remove the maximum number of retries in call_bind_status

Message ID 1678301132-24496-1-git-send-email-dai.ngo@oracle.com (mailing list archive)
State New, archived
Headers show
Series SUNRPC: remove the maximum number of retries in call_bind_status | expand

Commit Message

Dai Ngo March 8, 2023, 6:45 p.m. UTC
Currently call_bind_status places a hard limit of 3 to the number of
retries on EACCES error. This limit was done to accommodate the behavior
of a buggy server that keeps returning garbage when the NLM daemon is
killed on the NFS server. However this change causes problem for other
servers that take a little longer than 9 seconds for the port mapper to
become ready when the NFS server is restarted.

This patch removes this hard coded limit and let the RPC handles
the retry according to whether the export is soft or hard mounted.

To avoid the hang with buggy server, the client can use soft mount for
the export.

Fixes: 0b760113a3a1 ("NLM: Don't hang forever on NLM unlock requests")
Reported-by: Helen Chao <helen.chao@oracle.com>
Tested-by: Helen Chao <helen.chao@oracle.com>
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
---
 include/linux/sunrpc/sched.h | 3 +--
 net/sunrpc/clnt.c            | 3 ---
 net/sunrpc/sched.c           | 1 -
 3 files changed, 1 insertion(+), 6 deletions(-)

Comments

Chuck Lever March 8, 2023, 6:50 p.m. UTC | #1
> On Mar 8, 2023, at 1:45 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
> 
> Currently call_bind_status places a hard limit of 3 to the number of
> retries on EACCES error. This limit was done to accommodate the behavior
> of a buggy server that keeps returning garbage when the NLM daemon is
> killed on the NFS server. However this change causes problem for other
> servers that take a little longer than 9 seconds for the port mapper to
> become ready when the NFS server is restarted.
> 
> This patch removes this hard coded limit and let the RPC handles
> the retry according to whether the export is soft or hard mounted.
> 
> To avoid the hang with buggy server, the client can use soft mount for
> the export.
> 
> Fixes: 0b760113a3a1 ("NLM: Don't hang forever on NLM unlock requests")
> Reported-by: Helen Chao <helen.chao@oracle.com>
> Tested-by: Helen Chao <helen.chao@oracle.com>
> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>

Helen is the royal queen of ^C  ;-)

Did you try ^C on a mount while it waits for a rebind?


> ---
> include/linux/sunrpc/sched.h | 3 +--
> net/sunrpc/clnt.c            | 3 ---
> net/sunrpc/sched.c           | 1 -
> 3 files changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
> index b8ca3ecaf8d7..8ada7dc802d3 100644
> --- a/include/linux/sunrpc/sched.h
> +++ b/include/linux/sunrpc/sched.h
> @@ -90,8 +90,7 @@ struct rpc_task {
> #endif
> 	unsigned char		tk_priority : 2,/* Task priority */
> 				tk_garb_retry : 2,
> -				tk_cred_retry : 2,
> -				tk_rebind_retry : 2;
> +				tk_cred_retry : 2;
> };
> 
> typedef void			(*rpc_action)(struct rpc_task *);
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index 0b0b9f1eed46..63b438d8564b 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -2050,9 +2050,6 @@ call_bind_status(struct rpc_task *task)
> 			status = -EOPNOTSUPP;
> 			break;
> 		}
> -		if (task->tk_rebind_retry == 0)
> -			break;
> -		task->tk_rebind_retry--;
> 		rpc_delay(task, 3*HZ);
> 		goto retry_timeout;
> 	case -ENOBUFS:
> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> index be587a308e05..c8321de341ee 100644
> --- a/net/sunrpc/sched.c
> +++ b/net/sunrpc/sched.c
> @@ -817,7 +817,6 @@ rpc_init_task_statistics(struct rpc_task *task)
> 	/* Initialize retry counters */
> 	task->tk_garb_retry = 2;
> 	task->tk_cred_retry = 2;
> -	task->tk_rebind_retry = 2;
> 
> 	/* starting timestamp */
> 	task->tk_start = ktime_get();
> -- 
> 2.9.5
> 

--
Chuck Lever
Dai Ngo March 8, 2023, 7:03 p.m. UTC | #2
On 3/8/23 10:50 AM, Chuck Lever III wrote:
>
>> On Mar 8, 2023, at 1:45 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
>>
>> Currently call_bind_status places a hard limit of 3 to the number of
>> retries on EACCES error. This limit was done to accommodate the behavior
>> of a buggy server that keeps returning garbage when the NLM daemon is
>> killed on the NFS server. However this change causes problem for other
>> servers that take a little longer than 9 seconds for the port mapper to
>> become ready when the NFS server is restarted.
>>
>> This patch removes this hard coded limit and let the RPC handles
>> the retry according to whether the export is soft or hard mounted.
>>
>> To avoid the hang with buggy server, the client can use soft mount for
>> the export.
>>
>> Fixes: 0b760113a3a1 ("NLM: Don't hang forever on NLM unlock requests")
>> Reported-by: Helen Chao <helen.chao@oracle.com>
>> Tested-by: Helen Chao <helen.chao@oracle.com>
>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> Helen is the royal queen of ^C  ;-)
>
> Did you try ^C on a mount while it waits for a rebind?

She uses a test script that restarts the NFS server while NLM lock test
is running. The failure is random, sometimes it fails and sometimes it
passes depending on when the LOCK/UNLOCK requests come in so I think
it's hard to time it to do the ^C, but I will ask.

Thanks,
-Dai

>
>
>> ---
>> include/linux/sunrpc/sched.h | 3 +--
>> net/sunrpc/clnt.c            | 3 ---
>> net/sunrpc/sched.c           | 1 -
>> 3 files changed, 1 insertion(+), 6 deletions(-)
>>
>> diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
>> index b8ca3ecaf8d7..8ada7dc802d3 100644
>> --- a/include/linux/sunrpc/sched.h
>> +++ b/include/linux/sunrpc/sched.h
>> @@ -90,8 +90,7 @@ struct rpc_task {
>> #endif
>> 	unsigned char		tk_priority : 2,/* Task priority */
>> 				tk_garb_retry : 2,
>> -				tk_cred_retry : 2,
>> -				tk_rebind_retry : 2;
>> +				tk_cred_retry : 2;
>> };
>>
>> typedef void			(*rpc_action)(struct rpc_task *);
>> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
>> index 0b0b9f1eed46..63b438d8564b 100644
>> --- a/net/sunrpc/clnt.c
>> +++ b/net/sunrpc/clnt.c
>> @@ -2050,9 +2050,6 @@ call_bind_status(struct rpc_task *task)
>> 			status = -EOPNOTSUPP;
>> 			break;
>> 		}
>> -		if (task->tk_rebind_retry == 0)
>> -			break;
>> -		task->tk_rebind_retry--;
>> 		rpc_delay(task, 3*HZ);
>> 		goto retry_timeout;
>> 	case -ENOBUFS:
>> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>> index be587a308e05..c8321de341ee 100644
>> --- a/net/sunrpc/sched.c
>> +++ b/net/sunrpc/sched.c
>> @@ -817,7 +817,6 @@ rpc_init_task_statistics(struct rpc_task *task)
>> 	/* Initialize retry counters */
>> 	task->tk_garb_retry = 2;
>> 	task->tk_cred_retry = 2;
>> -	task->tk_rebind_retry = 2;
>>
>> 	/* starting timestamp */
>> 	task->tk_start = ktime_get();
>> -- 
>> 2.9.5
>>
> --
> Chuck Lever
>
>
Dai Ngo March 14, 2023, 4:19 p.m. UTC | #3
On 3/8/23 11:03 AM, dai.ngo@oracle.com wrote:
> On 3/8/23 10:50 AM, Chuck Lever III wrote:
>>
>>> On Mar 8, 2023, at 1:45 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
>>>
>>> Currently call_bind_status places a hard limit of 3 to the number of
>>> retries on EACCES error. This limit was done to accommodate the 
>>> behavior
>>> of a buggy server that keeps returning garbage when the NLM daemon is
>>> killed on the NFS server. However this change causes problem for other
>>> servers that take a little longer than 9 seconds for the port mapper to
>>> become ready when the NFS server is restarted.
>>>
>>> This patch removes this hard coded limit and let the RPC handles
>>> the retry according to whether the export is soft or hard mounted.
>>>
>>> To avoid the hang with buggy server, the client can use soft mount for
>>> the export.
>>>
>>> Fixes: 0b760113a3a1 ("NLM: Don't hang forever on NLM unlock requests")
>>> Reported-by: Helen Chao <helen.chao@oracle.com>
>>> Tested-by: Helen Chao <helen.chao@oracle.com>
>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>> Helen is the royal queen of ^C  ;-)
>>
>> Did you try ^C on a mount while it waits for a rebind?
>
> She uses a test script that restarts the NFS server while NLM lock test
> is running. The failure is random, sometimes it fails and sometimes it
> passes depending on when the LOCK/UNLOCK requests come in so I think
> it's hard to time it to do the ^C, but I will ask.

We did the test with ^C and here is what we found.

For synchronous RPC task the signal was delivered to the RPC task and
the task exit with -ERESTARTSYS from __rpc_execute as expected.

For asynchronous RPC task the process that invokes the RPC task to send
the request detected the signal in rpc_wait_for_completion_task and exits
with -ERESTARTSYS. However the async RPC was allowed to continue to run
to completion. So if the async RPC task was retrying an operation and
the NFS server was down, it will retry forever if this is a hard mount
or until the NFS server comes back up.

The question for the list is should we propagate the signal to the async
task via rpc_signal_task to stop its execution or just leave it alone as is.

-Dai

>
> Thanks,
> -Dai
>
>>
>>
>>> ---
>>> include/linux/sunrpc/sched.h | 3 +--
>>> net/sunrpc/clnt.c            | 3 ---
>>> net/sunrpc/sched.c           | 1 -
>>> 3 files changed, 1 insertion(+), 6 deletions(-)
>>>
>>> diff --git a/include/linux/sunrpc/sched.h 
>>> b/include/linux/sunrpc/sched.h
>>> index b8ca3ecaf8d7..8ada7dc802d3 100644
>>> --- a/include/linux/sunrpc/sched.h
>>> +++ b/include/linux/sunrpc/sched.h
>>> @@ -90,8 +90,7 @@ struct rpc_task {
>>> #endif
>>>     unsigned char        tk_priority : 2,/* Task priority */
>>>                 tk_garb_retry : 2,
>>> -                tk_cred_retry : 2,
>>> -                tk_rebind_retry : 2;
>>> +                tk_cred_retry : 2;
>>> };
>>>
>>> typedef void            (*rpc_action)(struct rpc_task *);
>>> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
>>> index 0b0b9f1eed46..63b438d8564b 100644
>>> --- a/net/sunrpc/clnt.c
>>> +++ b/net/sunrpc/clnt.c
>>> @@ -2050,9 +2050,6 @@ call_bind_status(struct rpc_task *task)
>>>             status = -EOPNOTSUPP;
>>>             break;
>>>         }
>>> -        if (task->tk_rebind_retry == 0)
>>> -            break;
>>> -        task->tk_rebind_retry--;
>>>         rpc_delay(task, 3*HZ);
>>>         goto retry_timeout;
>>>     case -ENOBUFS:
>>> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>>> index be587a308e05..c8321de341ee 100644
>>> --- a/net/sunrpc/sched.c
>>> +++ b/net/sunrpc/sched.c
>>> @@ -817,7 +817,6 @@ rpc_init_task_statistics(struct rpc_task *task)
>>>     /* Initialize retry counters */
>>>     task->tk_garb_retry = 2;
>>>     task->tk_cred_retry = 2;
>>> -    task->tk_rebind_retry = 2;
>>>
>>>     /* starting timestamp */
>>>     task->tk_start = ktime_get();
>>> -- 
>>> 2.9.5
>>>
>> -- 
>> Chuck Lever
>>
>>
Chuck Lever March 16, 2023, 1:38 p.m. UTC | #4
We need to hear from the NFS client maintainers about
the below question and the patch.


> Begin forwarded message:
> 
> From: dai.ngo@oracle.com
> Subject: Re: [PATCH] SUNRPC: remove the maximum number of retries in call_bind_status
> Date: March 14, 2023 at 12:19:30 PM EDT
> To: Chuck Lever III <chuck.lever@oracle.com>
> Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>, Helen Chao <helen.chao@oracle.com>
> 
> 
> On 3/8/23 11:03 AM, dai.ngo@oracle.com wrote:
>> On 3/8/23 10:50 AM, Chuck Lever III wrote:
>>> 
>>>> On Mar 8, 2023, at 1:45 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
>>>> 
>>>> Currently call_bind_status places a hard limit of 3 to the number of
>>>> retries on EACCES error. This limit was done to accommodate the behavior
>>>> of a buggy server that keeps returning garbage when the NLM daemon is
>>>> killed on the NFS server. However this change causes problem for other
>>>> servers that take a little longer than 9 seconds for the port mapper to
>>>> become ready when the NFS server is restarted.
>>>> 
>>>> This patch removes this hard coded limit and let the RPC handles
>>>> the retry according to whether the export is soft or hard mounted.
>>>> 
>>>> To avoid the hang with buggy server, the client can use soft mount for
>>>> the export.
>>>> 
>>>> Fixes: 0b760113a3a1 ("NLM: Don't hang forever on NLM unlock requests")
>>>> Reported-by: Helen Chao <helen.chao@oracle.com>
>>>> Tested-by: Helen Chao <helen.chao@oracle.com>
>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>> Helen is the royal queen of ^C  ;-)
>>> 
>>> Did you try ^C on a mount while it waits for a rebind?
>> 
>> She uses a test script that restarts the NFS server while NLM lock test
>> is running. The failure is random, sometimes it fails and sometimes it
>> passes depending on when the LOCK/UNLOCK requests come in so I think
>> it's hard to time it to do the ^C, but I will ask.
> 
> We did the test with ^C and here is what we found.
> 
> For synchronous RPC task the signal was delivered to the RPC task and
> the task exit with -ERESTARTSYS from __rpc_execute as expected.
> 
> For asynchronous RPC task the process that invokes the RPC task to send
> the request detected the signal in rpc_wait_for_completion_task and exits
> with -ERESTARTSYS. However the async RPC was allowed to continue to run
> to completion. So if the async RPC task was retrying an operation and
> the NFS server was down, it will retry forever if this is a hard mount
> or until the NFS server comes back up.
> 
> The question for the list is should we propagate the signal to the async
> task via rpc_signal_task to stop its execution or just leave it alone as is.
> 
> -Dai
> 
>> 
>> Thanks,
>> -Dai
>> 
>>> 
>>> 
>>>> ---
>>>> include/linux/sunrpc/sched.h | 3 +--
>>>> net/sunrpc/clnt.c            | 3 ---
>>>> net/sunrpc/sched.c           | 1 -
>>>> 3 files changed, 1 insertion(+), 6 deletions(-)
>>>> 
>>>> diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
>>>> index b8ca3ecaf8d7..8ada7dc802d3 100644
>>>> --- a/include/linux/sunrpc/sched.h
>>>> +++ b/include/linux/sunrpc/sched.h
>>>> @@ -90,8 +90,7 @@ struct rpc_task {
>>>> #endif
>>>>     unsigned char        tk_priority : 2,/* Task priority */
>>>>                 tk_garb_retry : 2,
>>>> -                tk_cred_retry : 2,
>>>> -                tk_rebind_retry : 2;
>>>> +                tk_cred_retry : 2;
>>>> };
>>>> 
>>>> typedef void            (*rpc_action)(struct rpc_task *);
>>>> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
>>>> index 0b0b9f1eed46..63b438d8564b 100644
>>>> --- a/net/sunrpc/clnt.c
>>>> +++ b/net/sunrpc/clnt.c
>>>> @@ -2050,9 +2050,6 @@ call_bind_status(struct rpc_task *task)
>>>>             status = -EOPNOTSUPP;
>>>>             break;
>>>>         }
>>>> -        if (task->tk_rebind_retry == 0)
>>>> -            break;
>>>> -        task->tk_rebind_retry--;
>>>>         rpc_delay(task, 3*HZ);
>>>>         goto retry_timeout;
>>>>     case -ENOBUFS:
>>>> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>>>> index be587a308e05..c8321de341ee 100644
>>>> --- a/net/sunrpc/sched.c
>>>> +++ b/net/sunrpc/sched.c
>>>> @@ -817,7 +817,6 @@ rpc_init_task_statistics(struct rpc_task *task)
>>>>     /* Initialize retry counters */
>>>>     task->tk_garb_retry = 2;
>>>>     task->tk_cred_retry = 2;
>>>> -    task->tk_rebind_retry = 2;
>>>> 
>>>>     /* starting timestamp */
>>>>     task->tk_start = ktime_get();
>>>> -- 
>>>> 2.9.5
>>>> 
>>> -- 
>>> Chuck Lever

--
Chuck Lever
Dai Ngo March 27, 2023, 4:05 p.m. UTC | #5
Hi,

Just a friendly reminder for the status of this patch.

Thanks,

-Dai

On 3/16/23 6:38 AM, Chuck Lever III wrote:
> We need to hear from the NFS client maintainers about
> the below question and the patch.
>
>
>> Begin forwarded message:
>>
>> From: dai.ngo@oracle.com
>> Subject: Re: [PATCH] SUNRPC: remove the maximum number of retries in call_bind_status
>> Date: March 14, 2023 at 12:19:30 PM EDT
>> To: Chuck Lever III <chuck.lever@oracle.com>
>> Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>, Helen Chao <helen.chao@oracle.com>
>>
>>
>> On 3/8/23 11:03 AM, dai.ngo@oracle.com wrote:
>>> On 3/8/23 10:50 AM, Chuck Lever III wrote:
>>>>> On Mar 8, 2023, at 1:45 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
>>>>>
>>>>> Currently call_bind_status places a hard limit of 3 to the number of
>>>>> retries on EACCES error. This limit was done to accommodate the behavior
>>>>> of a buggy server that keeps returning garbage when the NLM daemon is
>>>>> killed on the NFS server. However this change causes problem for other
>>>>> servers that take a little longer than 9 seconds for the port mapper to
>>>>> become ready when the NFS server is restarted.
>>>>>
>>>>> This patch removes this hard coded limit and let the RPC handles
>>>>> the retry according to whether the export is soft or hard mounted.
>>>>>
>>>>> To avoid the hang with buggy server, the client can use soft mount for
>>>>> the export.
>>>>>
>>>>> Fixes: 0b760113a3a1 ("NLM: Don't hang forever on NLM unlock requests")
>>>>> Reported-by: Helen Chao <helen.chao@oracle.com>
>>>>> Tested-by: Helen Chao <helen.chao@oracle.com>
>>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>>> Helen is the royal queen of ^C  ;-)
>>>>
>>>> Did you try ^C on a mount while it waits for a rebind?
>>> She uses a test script that restarts the NFS server while NLM lock test
>>> is running. The failure is random, sometimes it fails and sometimes it
>>> passes depending on when the LOCK/UNLOCK requests come in so I think
>>> it's hard to time it to do the ^C, but I will ask.
>> We did the test with ^C and here is what we found.
>>
>> For synchronous RPC task the signal was delivered to the RPC task and
>> the task exit with -ERESTARTSYS from __rpc_execute as expected.
>>
>> For asynchronous RPC task the process that invokes the RPC task to send
>> the request detected the signal in rpc_wait_for_completion_task and exits
>> with -ERESTARTSYS. However the async RPC was allowed to continue to run
>> to completion. So if the async RPC task was retrying an operation and
>> the NFS server was down, it will retry forever if this is a hard mount
>> or until the NFS server comes back up.
>>
>> The question for the list is should we propagate the signal to the async
>> task via rpc_signal_task to stop its execution or just leave it alone as is.
>>
>> -Dai
>>
>>> Thanks,
>>> -Dai
>>>
>>>>
>>>>> ---
>>>>> include/linux/sunrpc/sched.h | 3 +--
>>>>> net/sunrpc/clnt.c            | 3 ---
>>>>> net/sunrpc/sched.c           | 1 -
>>>>> 3 files changed, 1 insertion(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
>>>>> index b8ca3ecaf8d7..8ada7dc802d3 100644
>>>>> --- a/include/linux/sunrpc/sched.h
>>>>> +++ b/include/linux/sunrpc/sched.h
>>>>> @@ -90,8 +90,7 @@ struct rpc_task {
>>>>> #endif
>>>>>      unsigned char        tk_priority : 2,/* Task priority */
>>>>>                  tk_garb_retry : 2,
>>>>> -                tk_cred_retry : 2,
>>>>> -                tk_rebind_retry : 2;
>>>>> +                tk_cred_retry : 2;
>>>>> };
>>>>>
>>>>> typedef void            (*rpc_action)(struct rpc_task *);
>>>>> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
>>>>> index 0b0b9f1eed46..63b438d8564b 100644
>>>>> --- a/net/sunrpc/clnt.c
>>>>> +++ b/net/sunrpc/clnt.c
>>>>> @@ -2050,9 +2050,6 @@ call_bind_status(struct rpc_task *task)
>>>>>              status = -EOPNOTSUPP;
>>>>>              break;
>>>>>          }
>>>>> -        if (task->tk_rebind_retry == 0)
>>>>> -            break;
>>>>> -        task->tk_rebind_retry--;
>>>>>          rpc_delay(task, 3*HZ);
>>>>>          goto retry_timeout;
>>>>>      case -ENOBUFS:
>>>>> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>>>>> index be587a308e05..c8321de341ee 100644
>>>>> --- a/net/sunrpc/sched.c
>>>>> +++ b/net/sunrpc/sched.c
>>>>> @@ -817,7 +817,6 @@ rpc_init_task_statistics(struct rpc_task *task)
>>>>>      /* Initialize retry counters */
>>>>>      task->tk_garb_retry = 2;
>>>>>      task->tk_cred_retry = 2;
>>>>> -    task->tk_rebind_retry = 2;
>>>>>
>>>>>      /* starting timestamp */
>>>>>      task->tk_start = ktime_get();
>>>>> -- 
>>>>> 2.9.5
>>>>>
>>>> -- 
>>>> Chuck Lever
> --
> Chuck Lever
>
>
Jeff Layton April 6, 2023, 5:33 p.m. UTC | #6
On Tue, 2023-03-14 at 09:19 -0700, dai.ngo@oracle.com wrote:
> On 3/8/23 11:03 AM, dai.ngo@oracle.com wrote:
> > On 3/8/23 10:50 AM, Chuck Lever III wrote:
> > > 
> > > > On Mar 8, 2023, at 1:45 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
> > > > 
> > > > Currently call_bind_status places a hard limit of 3 to the number of
> > > > retries on EACCES error. This limit was done to accommodate the 
> > > > behavior
> > > > of a buggy server that keeps returning garbage when the NLM daemon is
> > > > killed on the NFS server. However this change causes problem for other
> > > > servers that take a little longer than 9 seconds for the port mapper to
> > > > become ready when the NFS server is restarted.
> > > > 
> > > > This patch removes this hard coded limit and let the RPC handles
> > > > the retry according to whether the export is soft or hard mounted.
> > > > 
> > > > To avoid the hang with buggy server, the client can use soft mount for
> > > > the export.
> > > > 
> > > > Fixes: 0b760113a3a1 ("NLM: Don't hang forever on NLM unlock requests")
> > > > Reported-by: Helen Chao <helen.chao@oracle.com>
> > > > Tested-by: Helen Chao <helen.chao@oracle.com>
> > > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> > > Helen is the royal queen of ^C  ;-)
> > > 
> > > Did you try ^C on a mount while it waits for a rebind?
> > 
> > She uses a test script that restarts the NFS server while NLM lock test
> > is running. The failure is random, sometimes it fails and sometimes it
> > passes depending on when the LOCK/UNLOCK requests come in so I think
> > it's hard to time it to do the ^C, but I will ask.
> 
> We did the test with ^C and here is what we found.
> 
> For synchronous RPC task the signal was delivered to the RPC task and
> the task exit with -ERESTARTSYS from __rpc_execute as expected.
> 
> For asynchronous RPC task the process that invokes the RPC task to send
> the request detected the signal in rpc_wait_for_completion_task and exits
> with -ERESTARTSYS. However the async RPC was allowed to continue to run
> to completion. So if the async RPC task was retrying an operation and
> the NFS server was down, it will retry forever if this is a hard mount
> or until the NFS server comes back up.
> 
> The question for the list is should we propagate the signal to the async
> task via rpc_signal_task to stop its execution or just leave it alone as is.
> 
> 

That is a good question.

I like the patch overall, as it gets rid of a special one-off retry
counter, but I too share some concerns about retrying indefinitely when
an server goes missing.

> 
Propagating a signal seems like the right thing to do. Looks like
rpcb_getport_done would also need to grow a check for RPC_SIGNALLED ?

It sounds pretty straightforward otherwise.
Jeff Layton April 6, 2023, 6:10 p.m. UTC | #7
On Thu, 2023-04-06 at 13:33 -0400, Jeff Layton wrote:
> On Tue, 2023-03-14 at 09:19 -0700, dai.ngo@oracle.com wrote:
> > On 3/8/23 11:03 AM, dai.ngo@oracle.com wrote:
> > > On 3/8/23 10:50 AM, Chuck Lever III wrote:
> > > > 
> > > > > On Mar 8, 2023, at 1:45 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
> > > > > 
> > > > > Currently call_bind_status places a hard limit of 3 to the number of
> > > > > retries on EACCES error. This limit was done to accommodate the 
> > > > > behavior
> > > > > of a buggy server that keeps returning garbage when the NLM daemon is
> > > > > killed on the NFS server. However this change causes problem for other
> > > > > servers that take a little longer than 9 seconds for the port mapper to
> > > > > 
> > > > > 


Actually, the EACCES error means that the host doesn't have the port
registered. That could happen if (e.g.) the host had a NFSv3 mount up
with an NLM connection and then crashed and rebooted and didn't remount
it.
 
> > > > > become ready when the NFS server is restarted.
> > > > > 
> > > > > This patch removes this hard coded limit and let the RPC handles
> > > > > the retry according to whether the export is soft or hard mounted.
> > > > > 
> > > > > To avoid the hang with buggy server, the client can use soft mount for
> > > > > the export.
> > > > > 
> > > > > Fixes: 0b760113a3a1 ("NLM: Don't hang forever on NLM unlock requests")
> > > > > Reported-by: Helen Chao <helen.chao@oracle.com>
> > > > > Tested-by: Helen Chao <helen.chao@oracle.com>
> > > > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> > > > Helen is the royal queen of ^C  ;-)
> > > > 
> > > > Did you try ^C on a mount while it waits for a rebind?
> > > 
> > > She uses a test script that restarts the NFS server while NLM lock test
> > > is running. The failure is random, sometimes it fails and sometimes it
> > > passes depending on when the LOCK/UNLOCK requests come in so I think
> > > it's hard to time it to do the ^C, but I will ask.
> > 
> > We did the test with ^C and here is what we found.
> > 
> > For synchronous RPC task the signal was delivered to the RPC task and
> > the task exit with -ERESTARTSYS from __rpc_execute as expected.
> > 
> > For asynchronous RPC task the process that invokes the RPC task to send
> > the request detected the signal in rpc_wait_for_completion_task and exits
> > with -ERESTARTSYS. However the async RPC was allowed to continue to run
> > to completion. So if the async RPC task was retrying an operation and
> > the NFS server was down, it will retry forever if this is a hard mount
> > or until the NFS server comes back up.
> > 
> > The question for the list is should we propagate the signal to the async
> > task via rpc_signal_task to stop its execution or just leave it alone as is.
> > 
> > 
> 
> That is a good question.
> 
> I like the patch overall, as it gets rid of a special one-off retry
> counter, but I too share some concerns about retrying indefinitely when
> an server goes missing.
> 
> > 
> Propagating a signal seems like the right thing to do. Looks like
> rpcb_getport_done would also need to grow a check for RPC_SIGNALLED ?
> 
> It sounds pretty straightforward otherwise.

Erm, except that some of these xprts are in the context of the server.

For instance: server-side lockd sometimes has to send messages to the
clients (e.g. GRANTED callbacks). Suppose we're trying to send a message
to the client, but it has crashed and rebooted...or maybe the client's
address got handed to another host that isn't doing NFS at all so the
NLM service will never come back.

This will mean that those RPCs will retry forever now in this situation.
I'm not sure that's what we want. Maybe we need some way to distinguish
between "user-driven" RPCs and those that aren't?

As a simpler workaround, would it work to just increase the number of
retries here? There's nothing magical about 3 tries. ISTM that having it
retry enough times to cover at least a minute or two would be
acceptable.
Dai Ngo April 6, 2023, 7:36 p.m. UTC | #8
Hi Jeff,

Thank you for taking a look at the patch.

On 4/6/23 11:10 AM, Jeff Layton wrote:
> On Thu, 2023-04-06 at 13:33 -0400, Jeff Layton wrote:
>> On Tue, 2023-03-14 at 09:19 -0700, dai.ngo@oracle.com wrote:
>>> On 3/8/23 11:03 AM, dai.ngo@oracle.com wrote:
>>>> On 3/8/23 10:50 AM, Chuck Lever III wrote:
>>>>>> On Mar 8, 2023, at 1:45 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
>>>>>>
>>>>>> Currently call_bind_status places a hard limit of 3 to the number of
>>>>>> retries on EACCES error. This limit was done to accommodate the
>>>>>> behavior
>>>>>> of a buggy server that keeps returning garbage when the NLM daemon is
>>>>>> killed on the NFS server. However this change causes problem for other
>>>>>> servers that take a little longer than 9 seconds for the port mapper to
>>>>>>
>>>>>>
>
> Actually, the EACCES error means that the host doesn't have the port
> registered.

Yes, our SQA team runs stress lock test and restart the NFS server.
Sometimes the NFS server starts up and register to the port mapper in
time and there is no problem but occasionally it's late coming up causing
this EACCES error.

>   That could happen if (e.g.) the host had a NFSv3 mount up
> with an NLM connection and then crashed and rebooted and didn't remount
> it.

can you please explain this scenario I don't quite follow it. If the v3
client crashed and did not remount the export then how can the client try
to access/lock anything on the server? I must have missing something here.

>   
>>>>>> become ready when the NFS server is restarted.
>>>>>>
>>>>>> This patch removes this hard coded limit and let the RPC handles
>>>>>> the retry according to whether the export is soft or hard mounted.
>>>>>>
>>>>>> To avoid the hang with buggy server, the client can use soft mount for
>>>>>> the export.
>>>>>>
>>>>>> Fixes: 0b760113a3a1 ("NLM: Don't hang forever on NLM unlock requests")
>>>>>> Reported-by: Helen Chao <helen.chao@oracle.com>
>>>>>> Tested-by: Helen Chao <helen.chao@oracle.com>
>>>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>>>> Helen is the royal queen of ^C  ;-)
>>>>>
>>>>> Did you try ^C on a mount while it waits for a rebind?
>>>> She uses a test script that restarts the NFS server while NLM lock test
>>>> is running. The failure is random, sometimes it fails and sometimes it
>>>> passes depending on when the LOCK/UNLOCK requests come in so I think
>>>> it's hard to time it to do the ^C, but I will ask.
>>> We did the test with ^C and here is what we found.
>>>
>>> For synchronous RPC task the signal was delivered to the RPC task and
>>> the task exit with -ERESTARTSYS from __rpc_execute as expected.
>>>
>>> For asynchronous RPC task the process that invokes the RPC task to send
>>> the request detected the signal in rpc_wait_for_completion_task and exits
>>> with -ERESTARTSYS. However the async RPC was allowed to continue to run
>>> to completion. So if the async RPC task was retrying an operation and
>>> the NFS server was down, it will retry forever if this is a hard mount
>>> or until the NFS server comes back up.
>>>
>>> The question for the list is should we propagate the signal to the async
>>> task via rpc_signal_task to stop its execution or just leave it alone as is.
>>>
>>>
>> That is a good question.

Maybe we should defer the propagation of the signal for later. Chuck has
some concern since this can change the existing behavior of async task
for other v4 operations.

>>
>> I like the patch overall, as it gets rid of a special one-off retry
>> counter, but I too share some concerns about retrying indefinitely when
>> an server goes missing.
>>
>> Propagating a signal seems like the right thing to do. Looks like
>> rpcb_getport_done would also need to grow a check for RPC_SIGNALLED ?
>>
>> It sounds pretty straightforward otherwise.
> Erm, except that some of these xprts are in the context of the server.
>
> For instance: server-side lockd sometimes has to send messages to the
> clients (e.g. GRANTED callbacks). Suppose we're trying to send a message
> to the client, but it has crashed and rebooted...or maybe the client's
> address got handed to another host that isn't doing NFS at all so the
> NLM service will never come back.
>
> This will mean that those RPCs will retry forever now in this situation.
> I'm not sure that's what we want. Maybe we need some way to distinguish
> between "user-driven" RPCs and those that aren't?
>
> As a simpler workaround, would it work to just increase the number of
> retries here? There's nothing magical about 3 tries. ISTM that having it
> retry enough times to cover at least a minute or two would be
> acceptable.

I'm happy with the simple workaround by just increasing the number of
retries. This would fix the immediate problem that we're facing without
potential of breaking things in other areas. If you agree then I will
prepare the simpler patch for this.

Thanks,
-Dai
Chuck Lever April 6, 2023, 7:43 p.m. UTC | #9
> On Apr 6, 2023, at 3:36 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
> 
> Hi Jeff,
> 
> Thank you for taking a look at the patch.
> 
> On 4/6/23 11:10 AM, Jeff Layton wrote:
>> On Thu, 2023-04-06 at 13:33 -0400, Jeff Layton wrote:
>>> On Tue, 2023-03-14 at 09:19 -0700, dai.ngo@oracle.com wrote:
>>>> On 3/8/23 11:03 AM, dai.ngo@oracle.com wrote:
>>>>> On 3/8/23 10:50 AM, Chuck Lever III wrote:
>>>>>>> On Mar 8, 2023, at 1:45 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
>>>>>>> 
>>>>>>> Currently call_bind_status places a hard limit of 3 to the number of
>>>>>>> retries on EACCES error. This limit was done to accommodate the
>>>>>>> behavior
>>>>>>> of a buggy server that keeps returning garbage when the NLM daemon is
>>>>>>> killed on the NFS server. However this change causes problem for other
>>>>>>> servers that take a little longer than 9 seconds for the port mapper to
>>>>>>> 
>>>>>>> 
>> 
>> Actually, the EACCES error means that the host doesn't have the port
>> registered.
> 
> Yes, our SQA team runs stress lock test and restart the NFS server.
> Sometimes the NFS server starts up and register to the port mapper in
> time and there is no problem but occasionally it's late coming up causing
> this EACCES error.
> 
>>  That could happen if (e.g.) the host had a NFSv3 mount up
>> with an NLM connection and then crashed and rebooted and didn't remount
>> it.
> 
> can you please explain this scenario I don't quite follow it. If the v3
> client crashed and did not remount the export then how can the client try
> to access/lock anything on the server? I must have missing something here.
> 
>>  
>>>>>>> become ready when the NFS server is restarted.
>>>>>>> 
>>>>>>> This patch removes this hard coded limit and let the RPC handles
>>>>>>> the retry according to whether the export is soft or hard mounted.
>>>>>>> 
>>>>>>> To avoid the hang with buggy server, the client can use soft mount for
>>>>>>> the export.
>>>>>>> 
>>>>>>> Fixes: 0b760113a3a1 ("NLM: Don't hang forever on NLM unlock requests")
>>>>>>> Reported-by: Helen Chao <helen.chao@oracle.com>
>>>>>>> Tested-by: Helen Chao <helen.chao@oracle.com>
>>>>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>>>>> Helen is the royal queen of ^C  ;-)
>>>>>> 
>>>>>> Did you try ^C on a mount while it waits for a rebind?
>>>>> She uses a test script that restarts the NFS server while NLM lock test
>>>>> is running. The failure is random, sometimes it fails and sometimes it
>>>>> passes depending on when the LOCK/UNLOCK requests come in so I think
>>>>> it's hard to time it to do the ^C, but I will ask.
>>>> We did the test with ^C and here is what we found.
>>>> 
>>>> For synchronous RPC task the signal was delivered to the RPC task and
>>>> the task exit with -ERESTARTSYS from __rpc_execute as expected.
>>>> 
>>>> For asynchronous RPC task the process that invokes the RPC task to send
>>>> the request detected the signal in rpc_wait_for_completion_task and exits
>>>> with -ERESTARTSYS. However the async RPC was allowed to continue to run
>>>> to completion. So if the async RPC task was retrying an operation and
>>>> the NFS server was down, it will retry forever if this is a hard mount
>>>> or until the NFS server comes back up.
>>>> 
>>>> The question for the list is should we propagate the signal to the async
>>>> task via rpc_signal_task to stop its execution or just leave it alone as is.
>>>> 
>>>> 
>>> That is a good question.
> 
> Maybe we should defer the propagation of the signal for later. Chuck has
> some concern since this can change the existing behavior of async task
> for other v4 operations.
> 
>>> 
>>> I like the patch overall, as it gets rid of a special one-off retry
>>> counter, but I too share some concerns about retrying indefinitely when
>>> an server goes missing.
>>> 
>>> Propagating a signal seems like the right thing to do. Looks like
>>> rpcb_getport_done would also need to grow a check for RPC_SIGNALLED ?
>>> 
>>> It sounds pretty straightforward otherwise.
>> Erm, except that some of these xprts are in the context of the server.
>> 
>> For instance: server-side lockd sometimes has to send messages to the
>> clients (e.g. GRANTED callbacks). Suppose we're trying to send a message
>> to the client, but it has crashed and rebooted...or maybe the client's
>> address got handed to another host that isn't doing NFS at all so the
>> NLM service will never come back.
>> 
>> This will mean that those RPCs will retry forever now in this situation.
>> I'm not sure that's what we want. Maybe we need some way to distinguish
>> between "user-driven" RPCs and those that aren't?
>> 
>> As a simpler workaround, would it work to just increase the number of
>> retries here? There's nothing magical about 3 tries. ISTM that having it
>> retry enough times to cover at least a minute or two would be
>> acceptable.
> 
> I'm happy with the simple workaround by just increasing the number of
> retries. This would fix the immediate problem that we're facing without
> potential of breaking things in other areas. If you agree then I will
> prepare the simpler patch for this.

A longer retry period is a short-term solution. I can get behind
that as long as the patch description makes clear some of the
concerns that have been brought up in this email thread.


--
Chuck Lever
Jeff Layton April 6, 2023, 7:59 p.m. UTC | #10
On Thu, 2023-04-06 at 19:43 +0000, Chuck Lever III wrote:
> 
> > On Apr 6, 2023, at 3:36 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
> > 
> > Hi Jeff,
> > 
> > Thank you for taking a look at the patch.
> > 
> > On 4/6/23 11:10 AM, Jeff Layton wrote:
> > > On Thu, 2023-04-06 at 13:33 -0400, Jeff Layton wrote:
> > > > On Tue, 2023-03-14 at 09:19 -0700, dai.ngo@oracle.com wrote:
> > > > > On 3/8/23 11:03 AM, dai.ngo@oracle.com wrote:
> > > > > > On 3/8/23 10:50 AM, Chuck Lever III wrote:
> > > > > > > > On Mar 8, 2023, at 1:45 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
> > > > > > > > 
> > > > > > > > Currently call_bind_status places a hard limit of 3 to the number of
> > > > > > > > retries on EACCES error. This limit was done to accommodate the
> > > > > > > > behavior
> > > > > > > > of a buggy server that keeps returning garbage when the NLM daemon is
> > > > > > > > killed on the NFS server. However this change causes problem for other
> > > > > > > > servers that take a little longer than 9 seconds for the port mapper to
> > > > > > > > 
> > > > > > > > 
> > > 
> > > Actually, the EACCES error means that the host doesn't have the port
> > > registered.
> > 
> > Yes, our SQA team runs stress lock test and restart the NFS server.
> > Sometimes the NFS server starts up and register to the port mapper in
> > time and there is no problem but occasionally it's late coming up causing
> > this EACCES error.
> > 
> > >  That could happen if (e.g.) the host had a NFSv3 mount up
> > > with an NLM connection and then crashed and rebooted and didn't remount
> > > it.
> > 
> > can you please explain this scenario I don't quite follow it. If the v3
> > client crashed and did not remount the export then how can the client try
> > to access/lock anything on the server? I must have missing something here.
> > 
> > >  

Suppose you have a client with an admin that mounts a NFSv3 mount "by
hand" (and doesn't set up statd to run at boot time). Client requests an
NLM lock and then reboots.

When it comes up, there's no notification to the server that the client
rebooted. Later, the lock becomes free and the server tries to grant it
to the client. It talks to rpcbind but lockd is never started and the
server keeps querying the client's rpcbind forever.

Maybe more likely situation: the client crashes and loses its DHCP
address when it comes back up, and the old addr gets reassigned to
another host that has rpcbind running but no NFS.

Either way, it'd keep trying to call the client back indefinitely that
way.

> > > > > > > > become ready when the NFS server is restarted.
> > > > > > > > 
> > > > > > > > This patch removes this hard coded limit and let the RPC handles
> > > > > > > > the retry according to whether the export is soft or hard mounted.
> > > > > > > > 
> > > > > > > > To avoid the hang with buggy server, the client can use soft mount for
> > > > > > > > the export.
> > > > > > > > 
> > > > > > > > Fixes: 0b760113a3a1 ("NLM: Don't hang forever on NLM unlock requests")
> > > > > > > > Reported-by: Helen Chao <helen.chao@oracle.com>
> > > > > > > > Tested-by: Helen Chao <helen.chao@oracle.com>
> > > > > > > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> > > > > > > Helen is the royal queen of ^C  ;-)
> > > > > > > 
> > > > > > > Did you try ^C on a mount while it waits for a rebind?
> > > > > > She uses a test script that restarts the NFS server while NLM lock test
> > > > > > is running. The failure is random, sometimes it fails and sometimes it
> > > > > > passes depending on when the LOCK/UNLOCK requests come in so I think
> > > > > > it's hard to time it to do the ^C, but I will ask.
> > > > > We did the test with ^C and here is what we found.
> > > > > 
> > > > > For synchronous RPC task the signal was delivered to the RPC task and
> > > > > the task exit with -ERESTARTSYS from __rpc_execute as expected.
> > > > > 
> > > > > For asynchronous RPC task the process that invokes the RPC task to send
> > > > > the request detected the signal in rpc_wait_for_completion_task and exits
> > > > > with -ERESTARTSYS. However the async RPC was allowed to continue to run
> > > > > to completion. So if the async RPC task was retrying an operation and
> > > > > the NFS server was down, it will retry forever if this is a hard mount
> > > > > or until the NFS server comes back up.
> > > > > 
> > > > > The question for the list is should we propagate the signal to the async
> > > > > task via rpc_signal_task to stop its execution or just leave it alone as is.
> > > > > 
> > > > > 
> > > > That is a good question.
> > 
> > Maybe we should defer the propagation of the signal for later. Chuck has
> > some concern since this can change the existing behavior of async task
> > for other v4 operations.
> > 

Fair enough.

> > > > 
> > > > I like the patch overall, as it gets rid of a special one-off retry
> > > > counter, but I too share some concerns about retrying indefinitely when
> > > > an server goes missing.
> > > > 
> > > > Propagating a signal seems like the right thing to do. Looks like
> > > > rpcb_getport_done would also need to grow a check for RPC_SIGNALLED ?
> > > > 
> > > > It sounds pretty straightforward otherwise.
> > > Erm, except that some of these xprts are in the context of the server.
> > > 
> > > For instance: server-side lockd sometimes has to send messages to the
> > > clients (e.g. GRANTED callbacks). Suppose we're trying to send a message
> > > to the client, but it has crashed and rebooted...or maybe the client's
> > > address got handed to another host that isn't doing NFS at all so the
> > > NLM service will never come back.
> > > 
> > > This will mean that those RPCs will retry forever now in this situation.
> > > I'm not sure that's what we want. Maybe we need some way to distinguish
> > > between "user-driven" RPCs and those that aren't?
> > > 
> > > As a simpler workaround, would it work to just increase the number of
> > > retries here? There's nothing magical about 3 tries. ISTM that having it
> > > retry enough times to cover at least a minute or two would be
> > > acceptable.
> > 
> > I'm happy with the simple workaround by just increasing the number of
> > retries. This would fix the immediate problem that we're facing without
> > potential of breaking things in other areas. If you agree then I will
> > prepare the simpler patch for this.
> 
> A longer retry period is a short-term solution. I can get behind
> that as long as the patch description makes clear some of the
> concerns that have been brought up in this email thread.
> 

Sounds good to me too -- 9s is an very short amount of time, IMO. We
generally wait on the order of minutes for RPC timeouts and this seems
like a similar situation.
Dai Ngo April 6, 2023, 8:58 p.m. UTC | #11
On 4/6/23 12:43 PM, Chuck Lever III wrote:
>
>> On Apr 6, 2023, at 3:36 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
>>
>> Hi Jeff,
>>
>> Thank you for taking a look at the patch.
>>
>> On 4/6/23 11:10 AM, Jeff Layton wrote:
>>> On Thu, 2023-04-06 at 13:33 -0400, Jeff Layton wrote:
>>>> On Tue, 2023-03-14 at 09:19 -0700, dai.ngo@oracle.com wrote:
>>>>> On 3/8/23 11:03 AM, dai.ngo@oracle.com wrote:
>>>>>> On 3/8/23 10:50 AM, Chuck Lever III wrote:
>>>>>>>> On Mar 8, 2023, at 1:45 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
>>>>>>>>
>>>>>>>> Currently call_bind_status places a hard limit of 3 to the number of
>>>>>>>> retries on EACCES error. This limit was done to accommodate the
>>>>>>>> behavior
>>>>>>>> of a buggy server that keeps returning garbage when the NLM daemon is
>>>>>>>> killed on the NFS server. However this change causes problem for other
>>>>>>>> servers that take a little longer than 9 seconds for the port mapper to
>>>>>>>>
>>>>>>>>
>>> Actually, the EACCES error means that the host doesn't have the port
>>> registered.
>> Yes, our SQA team runs stress lock test and restart the NFS server.
>> Sometimes the NFS server starts up and register to the port mapper in
>> time and there is no problem but occasionally it's late coming up causing
>> this EACCES error.
>>
>>>   That could happen if (e.g.) the host had a NFSv3 mount up
>>> with an NLM connection and then crashed and rebooted and didn't remount
>>> it.
>> can you please explain this scenario I don't quite follow it. If the v3
>> client crashed and did not remount the export then how can the client try
>> to access/lock anything on the server? I must have missing something here.
>>
>>>   
>>>>>>>> become ready when the NFS server is restarted.
>>>>>>>>
>>>>>>>> This patch removes this hard coded limit and let the RPC handles
>>>>>>>> the retry according to whether the export is soft or hard mounted.
>>>>>>>>
>>>>>>>> To avoid the hang with buggy server, the client can use soft mount for
>>>>>>>> the export.
>>>>>>>>
>>>>>>>> Fixes: 0b760113a3a1 ("NLM: Don't hang forever on NLM unlock requests")
>>>>>>>> Reported-by: Helen Chao <helen.chao@oracle.com>
>>>>>>>> Tested-by: Helen Chao <helen.chao@oracle.com>
>>>>>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>>>>>> Helen is the royal queen of ^C  ;-)
>>>>>>>
>>>>>>> Did you try ^C on a mount while it waits for a rebind?
>>>>>> She uses a test script that restarts the NFS server while NLM lock test
>>>>>> is running. The failure is random, sometimes it fails and sometimes it
>>>>>> passes depending on when the LOCK/UNLOCK requests come in so I think
>>>>>> it's hard to time it to do the ^C, but I will ask.
>>>>> We did the test with ^C and here is what we found.
>>>>>
>>>>> For synchronous RPC task the signal was delivered to the RPC task and
>>>>> the task exit with -ERESTARTSYS from __rpc_execute as expected.
>>>>>
>>>>> For asynchronous RPC task the process that invokes the RPC task to send
>>>>> the request detected the signal in rpc_wait_for_completion_task and exits
>>>>> with -ERESTARTSYS. However the async RPC was allowed to continue to run
>>>>> to completion. So if the async RPC task was retrying an operation and
>>>>> the NFS server was down, it will retry forever if this is a hard mount
>>>>> or until the NFS server comes back up.
>>>>>
>>>>> The question for the list is should we propagate the signal to the async
>>>>> task via rpc_signal_task to stop its execution or just leave it alone as is.
>>>>>
>>>>>
>>>> That is a good question.
>> Maybe we should defer the propagation of the signal for later. Chuck has
>> some concern since this can change the existing behavior of async task
>> for other v4 operations.
>>
>>>> I like the patch overall, as it gets rid of a special one-off retry
>>>> counter, but I too share some concerns about retrying indefinitely when
>>>> an server goes missing.
>>>>
>>>> Propagating a signal seems like the right thing to do. Looks like
>>>> rpcb_getport_done would also need to grow a check for RPC_SIGNALLED ?
>>>>
>>>> It sounds pretty straightforward otherwise.
>>> Erm, except that some of these xprts are in the context of the server.
>>>
>>> For instance: server-side lockd sometimes has to send messages to the
>>> clients (e.g. GRANTED callbacks). Suppose we're trying to send a message
>>> to the client, but it has crashed and rebooted...or maybe the client's
>>> address got handed to another host that isn't doing NFS at all so the
>>> NLM service will never come back.
>>>
>>> This will mean that those RPCs will retry forever now in this situation.
>>> I'm not sure that's what we want. Maybe we need some way to distinguish
>>> between "user-driven" RPCs and those that aren't?
>>>
>>> As a simpler workaround, would it work to just increase the number of
>>> retries here? There's nothing magical about 3 tries. ISTM that having it
>>> retry enough times to cover at least a minute or two would be
>>> acceptable.
>> I'm happy with the simple workaround by just increasing the number of
>> retries. This would fix the immediate problem that we're facing without
>> potential of breaking things in other areas. If you agree then I will
>> prepare the simpler patch for this.
> A longer retry period is a short-term solution. I can get behind
> that as long as the patch description makes clear some of the
> concerns that have been brought up in this email thread.

Thank you Chuck, will do.

-Dai

>
>
> --
> Chuck Lever
>
>
Dai Ngo April 6, 2023, 8:58 p.m. UTC | #12
On 4/6/23 12:59 PM, Jeff Layton wrote:
> On Thu, 2023-04-06 at 19:43 +0000, Chuck Lever III wrote:
>>> On Apr 6, 2023, at 3:36 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
>>>
>>> Hi Jeff,
>>>
>>> Thank you for taking a look at the patch.
>>>
>>> On 4/6/23 11:10 AM, Jeff Layton wrote:
>>>> On Thu, 2023-04-06 at 13:33 -0400, Jeff Layton wrote:
>>>>> On Tue, 2023-03-14 at 09:19 -0700, dai.ngo@oracle.com wrote:
>>>>>> On 3/8/23 11:03 AM, dai.ngo@oracle.com wrote:
>>>>>>> On 3/8/23 10:50 AM, Chuck Lever III wrote:
>>>>>>>>> On Mar 8, 2023, at 1:45 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
>>>>>>>>>
>>>>>>>>> Currently call_bind_status places a hard limit of 3 to the number of
>>>>>>>>> retries on EACCES error. This limit was done to accommodate the
>>>>>>>>> behavior
>>>>>>>>> of a buggy server that keeps returning garbage when the NLM daemon is
>>>>>>>>> killed on the NFS server. However this change causes problem for other
>>>>>>>>> servers that take a little longer than 9 seconds for the port mapper to
>>>>>>>>>
>>>>>>>>>
>>>> Actually, the EACCES error means that the host doesn't have the port
>>>> registered.
>>> Yes, our SQA team runs stress lock test and restart the NFS server.
>>> Sometimes the NFS server starts up and register to the port mapper in
>>> time and there is no problem but occasionally it's late coming up causing
>>> this EACCES error.
>>>
>>>>   That could happen if (e.g.) the host had a NFSv3 mount up
>>>> with an NLM connection and then crashed and rebooted and didn't remount
>>>> it.
>>> can you please explain this scenario I don't quite follow it. If the v3
>>> client crashed and did not remount the export then how can the client try
>>> to access/lock anything on the server? I must have missing something here.
>>>
>>>>   
> Suppose you have a client with an admin that mounts a NFSv3 mount "by
> hand" (and doesn't set up statd to run at boot time). Client requests an
> NLM lock and then reboots.
>
> When it comes up, there's no notification to the server that the client
> rebooted. Later, the lock becomes free and the server tries to grant it
> to the client. It talks to rpcbind but lockd is never started and the
> server keeps querying the client's rpcbind forever.
>
> Maybe more likely situation: the client crashes and loses its DHCP
> address when it comes back up, and the old addr gets reassigned to
> another host that has rpcbind running but no NFS.
>
> Either way, it'd keep trying to call the client back indefinitely that
> way.

Got it Jeff, thank you for the explanation. This is when NLM requests
are originated from the NFS server.

>
>>>>>>>>> become ready when the NFS server is restarted.
>>>>>>>>>
>>>>>>>>> This patch removes this hard coded limit and let the RPC handles
>>>>>>>>> the retry according to whether the export is soft or hard mounted.
>>>>>>>>>
>>>>>>>>> To avoid the hang with buggy server, the client can use soft mount for
>>>>>>>>> the export.
>>>>>>>>>
>>>>>>>>> Fixes: 0b760113a3a1 ("NLM: Don't hang forever on NLM unlock requests")
>>>>>>>>> Reported-by: Helen Chao <helen.chao@oracle.com>
>>>>>>>>> Tested-by: Helen Chao <helen.chao@oracle.com>
>>>>>>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>>>>>>> Helen is the royal queen of ^C  ;-)
>>>>>>>>
>>>>>>>> Did you try ^C on a mount while it waits for a rebind?
>>>>>>> She uses a test script that restarts the NFS server while NLM lock test
>>>>>>> is running. The failure is random, sometimes it fails and sometimes it
>>>>>>> passes depending on when the LOCK/UNLOCK requests come in so I think
>>>>>>> it's hard to time it to do the ^C, but I will ask.
>>>>>> We did the test with ^C and here is what we found.
>>>>>>
>>>>>> For synchronous RPC task the signal was delivered to the RPC task and
>>>>>> the task exit with -ERESTARTSYS from __rpc_execute as expected.
>>>>>>
>>>>>> For asynchronous RPC task the process that invokes the RPC task to send
>>>>>> the request detected the signal in rpc_wait_for_completion_task and exits
>>>>>> with -ERESTARTSYS. However the async RPC was allowed to continue to run
>>>>>> to completion. So if the async RPC task was retrying an operation and
>>>>>> the NFS server was down, it will retry forever if this is a hard mount
>>>>>> or until the NFS server comes back up.
>>>>>>
>>>>>> The question for the list is should we propagate the signal to the async
>>>>>> task via rpc_signal_task to stop its execution or just leave it alone as is.
>>>>>>
>>>>>>
>>>>> That is a good question.
>>> Maybe we should defer the propagation of the signal for later. Chuck has
>>> some concern since this can change the existing behavior of async task
>>> for other v4 operations.
>>>
> Fair enough.
>
>>>>> I like the patch overall, as it gets rid of a special one-off retry
>>>>> counter, but I too share some concerns about retrying indefinitely when
>>>>> an server goes missing.
>>>>>
>>>>> Propagating a signal seems like the right thing to do. Looks like
>>>>> rpcb_getport_done would also need to grow a check for RPC_SIGNALLED ?
>>>>>
>>>>> It sounds pretty straightforward otherwise.
>>>> Erm, except that some of these xprts are in the context of the server.
>>>>
>>>> For instance: server-side lockd sometimes has to send messages to the
>>>> clients (e.g. GRANTED callbacks). Suppose we're trying to send a message
>>>> to the client, but it has crashed and rebooted...or maybe the client's
>>>> address got handed to another host that isn't doing NFS at all so the
>>>> NLM service will never come back.
>>>>
>>>> This will mean that those RPCs will retry forever now in this situation.
>>>> I'm not sure that's what we want. Maybe we need some way to distinguish
>>>> between "user-driven" RPCs and those that aren't?
>>>>
>>>> As a simpler workaround, would it work to just increase the number of
>>>> retries here? There's nothing magical about 3 tries. ISTM that having it
>>>> retry enough times to cover at least a minute or two would be
>>>> acceptable.
>>> I'm happy with the simple workaround by just increasing the number of
>>> retries. This would fix the immediate problem that we're facing without
>>> potential of breaking things in other areas. If you agree then I will
>>> prepare the simpler patch for this.
>> A longer retry period is a short-term solution. I can get behind
>> that as long as the patch description makes clear some of the
>> concerns that have been brought up in this email thread.
>>
> Sounds good to me too -- 9s is an very short amount of time, IMO. We
> generally wait on the order of minutes for RPC timeouts and this seems
> like a similar situation.

I'll prepare the patch to increase the time out.

Thanks,
-Dai
Jeff Layton April 6, 2023, 9:51 p.m. UTC | #13
On Thu, 2023-04-06 at 13:58 -0700, dai.ngo@oracle.com wrote:
> On 4/6/23 12:59 PM, Jeff Layton wrote:
> > On Thu, 2023-04-06 at 19:43 +0000, Chuck Lever III wrote:
> > > > On Apr 6, 2023, at 3:36 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
> > > > 
> > > > Hi Jeff,
> > > > 
> > > > Thank you for taking a look at the patch.
> > > > 
> > > > On 4/6/23 11:10 AM, Jeff Layton wrote:
> > > > > On Thu, 2023-04-06 at 13:33 -0400, Jeff Layton wrote:
> > > > > > On Tue, 2023-03-14 at 09:19 -0700, dai.ngo@oracle.com wrote:
> > > > > > > On 3/8/23 11:03 AM, dai.ngo@oracle.com wrote:
> > > > > > > > On 3/8/23 10:50 AM, Chuck Lever III wrote:
> > > > > > > > > > On Mar 8, 2023, at 1:45 PM, Dai Ngo <dai.ngo@oracle.com> wrote:
> > > > > > > > > > 
> > > > > > > > > > Currently call_bind_status places a hard limit of 3 to the number of
> > > > > > > > > > retries on EACCES error. This limit was done to accommodate the
> > > > > > > > > > behavior
> > > > > > > > > > of a buggy server that keeps returning garbage when the NLM daemon is
> > > > > > > > > > killed on the NFS server. However this change causes problem for other
> > > > > > > > > > servers that take a little longer than 9 seconds for the port mapper to
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > Actually, the EACCES error means that the host doesn't have the port
> > > > > registered.
> > > > Yes, our SQA team runs stress lock test and restart the NFS server.
> > > > Sometimes the NFS server starts up and register to the port mapper in
> > > > time and there is no problem but occasionally it's late coming up causing
> > > > this EACCES error.
> > > > 
> > > > >   That could happen if (e.g.) the host had a NFSv3 mount up
> > > > > with an NLM connection and then crashed and rebooted and didn't remount
> > > > > it.
> > > > can you please explain this scenario I don't quite follow it. If the v3
> > > > client crashed and did not remount the export then how can the client try
> > > > to access/lock anything on the server? I must have missing something here.
> > > > 
> > > > >   
> > Suppose you have a client with an admin that mounts a NFSv3 mount "by
> > hand" (and doesn't set up statd to run at boot time). Client requests an
> > NLM lock and then reboots.
> > 
> > When it comes up, there's no notification to the server that the client
> > rebooted. Later, the lock becomes free and the server tries to grant it
> > to the client. It talks to rpcbind but lockd is never started and the
> > server keeps querying the client's rpcbind forever.
> > 
> > Maybe more likely situation: the client crashes and loses its DHCP
> > address when it comes back up, and the old addr gets reassigned to
> > another host that has rpcbind running but no NFS.
> > 
> > Either way, it'd keep trying to call the client back indefinitely that
> > way.
> 
> Got it Jeff, thank you for the explanation. This is when NLM requests
> are originated from the NFS server.
> 

Mostly, yes. The old, stateless NFS v2/v3 server code didn't have much
in the way of callbacks, and v4 (for the most part) doesn't rely on
rpcbind.

That said, there may be some RPC calls done by the v2/3 NFS client that
don't have a direct connection to a client task. Consider stuff like
writeback requests. Signals won't do anything there.

I think keeping a hard timeout of some sort is probably prudent.

>
diff mbox series

Patch

diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
index b8ca3ecaf8d7..8ada7dc802d3 100644
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -90,8 +90,7 @@  struct rpc_task {
 #endif
 	unsigned char		tk_priority : 2,/* Task priority */
 				tk_garb_retry : 2,
-				tk_cred_retry : 2,
-				tk_rebind_retry : 2;
+				tk_cred_retry : 2;
 };
 
 typedef void			(*rpc_action)(struct rpc_task *);
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 0b0b9f1eed46..63b438d8564b 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -2050,9 +2050,6 @@  call_bind_status(struct rpc_task *task)
 			status = -EOPNOTSUPP;
 			break;
 		}
-		if (task->tk_rebind_retry == 0)
-			break;
-		task->tk_rebind_retry--;
 		rpc_delay(task, 3*HZ);
 		goto retry_timeout;
 	case -ENOBUFS:
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index be587a308e05..c8321de341ee 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -817,7 +817,6 @@  rpc_init_task_statistics(struct rpc_task *task)
 	/* Initialize retry counters */
 	task->tk_garb_retry = 2;
 	task->tk_cred_retry = 2;
-	task->tk_rebind_retry = 2;
 
 	/* starting timestamp */
 	task->tk_start = ktime_get();