| Message ID | 1351627595-6462-1-git-send-email-bjschuma@netapp.com (mailing list archive) |
|---|---|
| State | New, archived |
On Tue, Oct 30, 2012 at 4:06 PM, <bjschuma@netapp.com> wrote:
> From: Bryan Schumaker <bjschuma@netapp.com>
>
> Currently, we will schedule session recovery and then return to the
> caller of nfs4_handle_exception. This works for most cases, but causes
> a hang on the following test case:
>
>         Client                        Server
>         ------                        ------
>         Open file over NFS v4.1
>         Write to file
>                                       Expire client
>         Try to lock file
>
> The server will return NFS4ERR_BADSESSION, prompting the client to
> schedule recovery. However, the client will continue placing lock
> attempts and the open recovery never seems to be scheduled.

I need more detail. Is this what is happening?

When the first NFS4ERR_BADSESSION error arrives, session recovery
starts by setting the NFS4_SESSION_DRAINING bit, which places all
non-RPC_PRIORITY_PRIVILEGED tasks with OP_SEQUENCE on the
slot_tbl_waitq, so the lock requests should be placed on the
slot_tbl_waitq and not go on the wire.

Once the session is drained, DESTROY_SESSION and CREATE_SESSION are
called, and one will return with NFS4ERR_STALE_CLIENTID, kicking off
the EXCHANGE_ID / CREATE_SESSION exchange to establish a new clientid.

Do you see the clientid being recovered?

Once the new session is created, it inherits the slot_tbl_waitq of the
old session, and all the lock requests and any other tasks piled up on
the slot_tbl_waitq will run. At the same time, the state_manager
thread is running recovery for any OPENs established under the old
clientid, and then their associated LOCKs.

The OPEN and LOCK recovery tasks run with RPC_PRIORITY_PRIVILEGED,
which should mean that they get run before the RPC tasks with lower
priority.

So my question (finally!): how is open recovery starvation happening?

nfs41_sequence_free_slot => nfs4_check_drain_fc_complete =>
rpc_wake_up_first on the session slot_tbl_waitq, which in turn calls
__rpc_find_next_queued_priority, which should choose
RPC_PRIORITY_PRIVILEGED tasks over lower-priority tasks on the session
slot_tbl_waitq.

So even with a bunch of slots, when the first one returns and frees
its slot, the priority tasks (e.g. open recovery) should run.

-->Andy

> The simplest solution is to wait for session recovery to run before
> retrying the lock.

If on-the-wire RPCs return with NFS4ERR_BADSESSION and then wait for
session recovery in the exception handler, they will not call their
rpc_release function.

> Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
> ---
>  fs/nfs/nfs4proc.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 8b04d14..70e0c47 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -338,8 +338,7 @@ static int nfs4_handle_exception(struct nfs_server *server, int errorcode, struc
>  			dprintk("%s ERROR: %d Reset session\n", __func__,
>  				errorcode);
>  			nfs4_schedule_session_recovery(clp->cl_session, errorcode);
> -			exception->retry = 1;
> -			break;
> +			goto wait_on_recovery;
>  #endif /* defined(CONFIG_NFS_V4_1) */
>  		case -NFS4ERR_FILE_OPEN:
>  			if (exception->timeout > HZ) {
> --
> 1.8.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 31, 2012 at 6:14 PM, Andy Adamson <androsadamson@gmail.com> wrote:
> On Tue, Oct 30, 2012 at 4:06 PM, <bjschuma@netapp.com> wrote:
>> From: Bryan Schumaker <bjschuma@netapp.com>
>>
>> Currently, we will schedule session recovery and then return to the
>> caller of nfs4_handle_exception. This works for most cases, but causes
>> a hang on the following test case:
>>
>>         Client                        Server
>>         ------                        ------
>>         Open file over NFS v4.1
>>         Write to file
>>                                       Expire client
>>         Try to lock file
>>
>> The server will return NFS4ERR_BADSESSION, prompting the client to
>> schedule recovery. However, the client will continue placing lock
>> attempts and the open recovery never seems to be scheduled.
>
> I need more detail. Is this what is happening?
>
> When the first NFS4ERR_BADSESSION error arrives, session recovery
> starts by setting the NFS4_SESSION_DRAINING bit, which places all
> non-RPC_PRIORITY_PRIVILEGED tasks with OP_SEQUENCE on the
> slot_tbl_waitq, so the lock requests should be placed on the
> slot_tbl_waitq and not go on the wire.
>
> Once the session is drained, DESTROY_SESSION and CREATE_SESSION are
> called, and one will return with NFS4ERR_STALE_CLIENTID, kicking off
> the EXCHANGE_ID / CREATE_SESSION exchange to establish a new clientid.
>
> Do you see the clientid being recovered?
>
> Once the new session is created, it inherits the slot_tbl_waitq of the
> old session, and all the lock requests and any other tasks piled up on
> the slot_tbl_waitq will run. At the same time, the state_manager
> thread is running recovery for any OPENs established under the old
> clientid, and then their associated LOCKs.
>
> The OPEN and LOCK recovery tasks run with RPC_PRIORITY_PRIVILEGED,
> which should mean that they get run before the RPC tasks with lower
> priority.
>
> So my question (finally!): how is open recovery starvation happening?
>
> nfs41_sequence_free_slot => nfs4_check_drain_fc_complete =>
> rpc_wake_up_first on the session slot_tbl_waitq, which in turn calls
> __rpc_find_next_queued_priority, which should choose
> RPC_PRIORITY_PRIVILEGED tasks over lower-priority tasks on the session
> slot_tbl_waitq.
>
> So even with a bunch of slots, when the first one returns and frees
> its slot, the priority tasks (e.g. open recovery) should run.
>
> -->Andy
>
>> The simplest solution is to wait for session recovery to run before
>> retrying the lock.
>
> If on-the-wire RPCs return with NFS4ERR_BADSESSION and then wait for
> session recovery in the exception handler, they will not call their
> rpc_release function.

Please ignore the above sentence - it was early thinking....

-->Andy

>>
>> Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
>> ---
>>  fs/nfs/nfs4proc.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index 8b04d14..70e0c47 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -338,8 +338,7 @@ static int nfs4_handle_exception(struct nfs_server *server, int errorcode, struc
>>  			dprintk("%s ERROR: %d Reset session\n", __func__,
>>  				errorcode);
>>  			nfs4_schedule_session_recovery(clp->cl_session, errorcode);
>> -			exception->retry = 1;
>> -			break;
>> +			goto wait_on_recovery;
>>  #endif /* defined(CONFIG_NFS_V4_1) */
>>  		case -NFS4ERR_FILE_OPEN:
>>  			if (exception->timeout > HZ) {
>> --
>> 1.8.0
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 8b04d14..70e0c47 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -338,8 +338,7 @@ static int nfs4_handle_exception(struct nfs_server *server, int errorcode, struc
 			dprintk("%s ERROR: %d Reset session\n", __func__,
 				errorcode);
 			nfs4_schedule_session_recovery(clp->cl_session, errorcode);
-			exception->retry = 1;
-			break;
+			goto wait_on_recovery;
 #endif /* defined(CONFIG_NFS_V4_1) */
 		case -NFS4ERR_FILE_OPEN:
 			if (exception->timeout > HZ) {