[RFC] nfsd: fix nfsd4_cb_recall_done error handling


Message ID

20140922182923.GA18904@infradead.org

State

New, archived

Headers

Date: Mon, 22 Sep 2014 11:29:23 -0700
From: Christoph Hellwig <hch@infradead.org>
To: linux-nfs@vger.kernel.org
Subject: [PATCH, RFC] nfsd: fix nfsd4_cb_recall_done error handling
Message-ID: <20140922182923.GA18904@infradead.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-nfs-owner@vger.kernel.org
Precedence: bulk

Commit Message

Christoph Hellwig Sept. 22, 2014, 6:29 p.m. UTC

The error handling for CB_RECALL seems fairly broken to me.

What looks good:

 - for EBADHANDLE and NFS4ERR_BAD_STATEID retry until dl_retries
   hits zero, then mark the connection down and set cb_done

What looks wrong:

 - for everything else we first mark the connection down, then
   retry until dl_retries hits zero, then mark the connection down
   again and set cb_done.

From all I can see what we want is:

 - keep the behavior for EBADHANDLE and NFS4ERR_BAD_STATEID,
   otherwise jump straight to marking the connection down
   and setting cb_done

But maybe I'm missing something?
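To make the "What looks wrong" case concrete, the pre-patch control flow of nfsd4_cb_recall_done() can be modelled in plain user-space C. This is a sketch under stated assumptions, not kernel code: enum cb_status, struct recall_model, old_recall_done() and run_old() are hypothetical names, the status values are placeholders rather than the kernel's errno or NFS4ERR_* values, and the counters only record how often nfsd4_mark_cb_down() and rpc_restart_call_prepare() would be reached when every attempt fails with the same status.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model statuses; these are NOT the kernel's errno or
 * NFS4ERR_* values. */
enum cb_status { CB_OK, CB_EBADHANDLE, CB_BAD_STATEID, CB_OTHER };

struct recall_model {
	int retries;   /* stands in for dp->dl_retries */
	int mark_down; /* number of nfsd4_mark_cb_down() calls */
	int restarts;  /* number of rpc_restart_call_prepare() calls */
	bool done;     /* stands in for cb->cb_done */
};

/* One run of the pre-patch done handler.  Returns true when the call
 * was restarted, i.e. the handler will be invoked again. */
static bool old_recall_done(struct recall_model *m, enum cb_status st)
{
	switch (st) {
	case CB_OK:
		m->done = true;
		return false;
	case CB_EBADHANDLE:
	case CB_BAD_STATEID:
		break;
	default:
		m->mark_down++;  /* marked down before retrying ... */
	}
	if (m->retries--) {
		m->restarts++;
		return true;
	}
	m->mark_down++;          /* ... and marked down again at the end */
	m->done = true;
	return false;
}

/* Drive the handler until it stops restarting, with every attempt
 * failing with the same status. */
static struct recall_model run_old(enum cb_status st, int retries)
{
	struct recall_model m = { .retries = retries };

	assert(retries >= 0);
	while (old_recall_done(&m, st))
		;
	return m;
}
```

With two retries, run_old(CB_BAD_STATEID, 2) restarts twice and marks the callback path down once, whereas run_old(CB_OTHER, 2) also restarts twice but marks it down four times: once per failed attempt plus the final call after the retries run out.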

Comments

Trond Myklebust Sept. 22, 2014, 8:03 p.m. UTC | #1

On Mon, Sep 22, 2014 at 2:29 PM, Christoph Hellwig <hch@infradead.org> wrote:
> The error handling for CB_RECALL seems fairly broken to me.
>
> What looks good:
>
>  - for EBADHANDLE and NFS4ERR_BAD_STATEID retry until dl_retries
>    hits zero, then mark the connection down and set cb_done
>
> What looks wrong:
>
>  - for everything else we first mark the connection down, then
>    retry until dl_retries hits zero, then mark the connection down
>    again and set cb_done.
>
> From all I can see what we want is:
>
>  - keep the behavior for EBADHANDLE and NFS4ERR_BAD_STATEID,
>    otherwise jump straight to marking the connection down
>    and setting cb_done
>
> But maybe I'm missing something?
>
>
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 17d5441..ed25c58 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -971,24 +971,21 @@ static void nfsd4_cb_recall_done(struct rpc_task *task, void *calldata)
>                 return;
>         switch (task->tk_status) {
>         case 0:
> -               cb->cb_done = true;
> -               return;
> +               break;
>         case -EBADHANDLE:
>         case -NFS4ERR_BAD_STATEID:
>                 /* Race: client probably got cb_recall
>                  * before open reply granting delegation */
> -               break;
> +               if (dp->dl_retries--) {
> +                       rpc_delay(task, 2*HZ);
> +                       task->tk_status = 0;
> +                       rpc_restart_call_prepare(task);
> +                       return;
> +               }
>         default:
>                 /* Network partition? */
>                 nfsd4_mark_cb_down(clp, task->tk_status);
>         }
> -       if (dp->dl_retries--) {
> -               rpc_delay(task, 2*HZ);
> -               task->tk_status = 0;
> -               rpc_restart_call_prepare(task);
> -               return;
> -       }
> -       nfsd4_mark_cb_down(clp, task->tk_status);
>         cb->cb_done = true;
>  }
>
>

We're also missing a handler for NFS4ERR_DELAY, which is listed as a
legal response to CB_RECALL in both RFC5661 and RFC3530bis. As far as
I can tell from the above, knfsd will currently take that to be a sign
it should mark the callback path as being down...

Christoph Hellwig Sept. 22, 2014, 8:06 p.m. UTC | #2

On Mon, Sep 22, 2014 at 04:03:37PM -0400, Trond Myklebust wrote:
> We're also missing a handler for NFS4ERR_DELAY, which is listed as a
> legal response to CB_RECALL in both RFC5661 and RFC3530bis. As far as
> I can tell from the above, knfsd will currently take that to be a sign
> it should mark the callback path as being down...

Yes.  I've got a fix of that further down in my queue with the pnfs
patches, just wanted to get this bit out first.

I plan to handle NFS4ERR_DELAY in the generic callback layer instead of
burdening the individual callback implementations with it.
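One way to picture handling NFS4ERR_DELAY in a generic layer is a common completion routine that retries on DELAY itself and only then dispatches to a per-operation hook. This is a hypothetical user-space sketch, not the design in the pending pnfs patches: struct cb_model, struct cb_model_ops, generic_cb_done(), recall_done() and run_replies() are invented names, and the status values are placeholders.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model statuses, not the kernel's values. */
enum cb_status { CB_OK, CB_DELAY, CB_OTHER };

struct cb_model;

/* Per-operation hook (recall, layout recall, ...); hypothetical. */
struct cb_model_ops {
	void (*done)(struct cb_model *cb, enum cb_status st);
};

struct cb_model {
	const struct cb_model_ops *ops;
	int delays;   /* times the generic layer restarted on DELAY */
	int op_calls; /* times the per-operation hook ran */
	bool done;
};

/* Generic completion: DELAY is handled here once, so no per-operation
 * hook needs to know about it.  Returns true if the call would be
 * restarted after a delay. */
static bool generic_cb_done(struct cb_model *cb, enum cb_status st)
{
	if (st == CB_DELAY) {
		cb->delays++;
		return true;
	}
	cb->ops->done(cb, st);
	return false;
}

/* A recall hook that never sees CB_DELAY. */
static void recall_done(struct cb_model *cb, enum cb_status st)
{
	assert(st != CB_DELAY);
	cb->op_calls++;
	cb->done = true;
}

static const struct cb_model_ops recall_ops = { .done = recall_done };

/* Feed a sequence of replies until the generic layer stops restarting. */
static struct cb_model run_replies(const enum cb_status *replies, int n)
{
	struct cb_model cb = { .ops = &recall_ops };

	for (int i = 0; i < n; i++)
		if (!generic_cb_done(&cb, replies[i]))
			break;
	return cb;
}
```

Feeding the replies DELAY, DELAY, OK through run_replies() restarts twice in the generic layer and calls the recall hook exactly once; the hook itself never has to recognise DELAY. The sketch retries on DELAY without a bound; a real implementation would presumably pace the retries (the existing code uses rpc_delay() for that) and may want a limit.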
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

J. Bruce Fields Sept. 22, 2014, 8:25 p.m. UTC | #3

On Mon, Sep 22, 2014 at 11:29:23AM -0700, Christoph Hellwig wrote:
> The error handling for CB_RECALL seems fairly broken to me.
> 
> What looks good:
> 
>  - for EBADHANDLE and NFS4ERR_BAD_STATEID retry until dl_retries
>    hits zero, then mark the connection down and set cb_done
> 
> What looks wrong:
> 
>  - for everything else we first mark the connection down, then
>    retry until dl_retries hits zero, then mark the connection down
>    again and set cb_done.
> 
> From all I can see what we want is:
> 
>  - keep the behavior for EBADHANDLE and NFS4ERR_BAD_STATEID,
>    otherwise jump straight to marking the connection down
>    and setting cb_done
> 
> But maybe I'm missing something?

I can't think of anything; let me know when you want something applied.

--b.

> 
> 
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 17d5441..ed25c58 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -971,24 +971,21 @@ static void nfsd4_cb_recall_done(struct rpc_task *task, void *calldata)
>  		return;
>  	switch (task->tk_status) {
>  	case 0:
> -		cb->cb_done = true;
> -		return;
> +		break;
>  	case -EBADHANDLE:
>  	case -NFS4ERR_BAD_STATEID:
>  		/* Race: client probably got cb_recall
>  		 * before open reply granting delegation */
> -		break;
> +		if (dp->dl_retries--) {
> +			rpc_delay(task, 2*HZ);
> +			task->tk_status = 0;
> +			rpc_restart_call_prepare(task);
> +			return;
> +		}
>  	default:
>  		/* Network partition? */
>  		nfsd4_mark_cb_down(clp, task->tk_status);
>  	}
> -	if (dp->dl_retries--) {
> -		rpc_delay(task, 2*HZ);
> -		task->tk_status = 0;
> -		rpc_restart_call_prepare(task);
> -		return;
> -	}
> -	nfsd4_mark_cb_down(clp, task->tk_status);
>  	cb->cb_done = true;
>  }
>  
> -- 
> 1.9.1
> 

Patch

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 17d5441..ed25c58 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -971,24 +971,21 @@ static void nfsd4_cb_recall_done(struct rpc_task *task, void *calldata)
 		return;
 	switch (task->tk_status) {
 	case 0:
-		cb->cb_done = true;
-		return;
+		break;
 	case -EBADHANDLE:
 	case -NFS4ERR_BAD_STATEID:
 		/* Race: client probably got cb_recall
 		 * before open reply granting delegation */
-		break;
+		if (dp->dl_retries--) {
+			rpc_delay(task, 2*HZ);
+			task->tk_status = 0;
+			rpc_restart_call_prepare(task);
+			return;
+		}
 	default:
 		/* Network partition? */
 		nfsd4_mark_cb_down(clp, task->tk_status);
 	}
-	if (dp->dl_retries--) {
-		rpc_delay(task, 2*HZ);
-		task->tk_status = 0;
-		rpc_restart_call_prepare(task);
-		return;
-	}
-	nfsd4_mark_cb_down(clp, task->tk_status);
 	cb->cb_done = true;
 }
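The same kind of hypothetical user-space model, applied to the patched handler, shows the intended behaviour: only the EBADHANDLE/BAD_STATEID race is retried, and every path ends with at most one mark-down. As before, enum cb_status, struct recall_model, new_recall_done() and run_new() are illustrative names, not kernel interfaces, and the status values are placeholders.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model statuses, not the kernel's values. */
enum cb_status { CB_OK, CB_EBADHANDLE, CB_BAD_STATEID, CB_OTHER };

struct recall_model {
	int retries;   /* stands in for dp->dl_retries */
	int mark_down; /* number of nfsd4_mark_cb_down() calls */
	int restarts;  /* number of rpc_restart_call_prepare() calls */
	bool done;     /* stands in for cb->cb_done */
};

/* One run of the patched done handler; returns true on restart. */
static bool new_recall_done(struct recall_model *m, enum cb_status st)
{
	switch (st) {
	case CB_OK:
		break;
	case CB_EBADHANDLE:
	case CB_BAD_STATEID:
		/* client probably saw the recall before the open reply */
		if (m->retries--) {
			m->restarts++;
			return true;
		}
		/* retries exhausted */
		/* fall through */
	default:
		m->mark_down++;
	}
	m->done = true;
	return false;
}

/* Drive the handler until it stops restarting. */
static struct recall_model run_new(enum cb_status st, int retries)
{
	struct recall_model m = { .retries = retries };

	assert(retries >= 0);
	while (new_recall_done(&m, st))
		;
	return m;
}
```

run_new(CB_BAD_STATEID, 2) still restarts twice before marking the path down once, while run_new(CB_OTHER, 2) marks it down once and completes without retrying. The explicit fall-through from the retry case into default mirrors the patch, where exhausting dl_retries reaches nfsd4_mark_cb_down() through the same default label.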
