Message ID | 4E45C164.1070604@panasas.com (mailing list archive) |
---|---|
State | New, archived |
On Fri, Aug 12, 2011 at 05:12:20PM -0700, Boaz Harrosh wrote:
> 
> locks_in_grace() was called twice: once for the "yes" case and a second
> time for the "no" case. If the status changes between these two calls,
> the server would do the wrong thing. Sample it only once.

I don't see how this fixes any bug.  The only thing that could happen in
between those two checks is that the grace period could end.  In that
case op_claim_type must be NFS4_OPEN_CLAIM_PREVIOUS (otherwise we would
have jumped to out and not hit the second case).  The second condition
will therefore be true and we'll fail with err_no_grace.  I don't see
how that is incorrect, as in fact the grace period is now over.

There may be a different bug: if the grace period ends *any time* after
locks_in_grace() is called but before we actually do the lock or open,
then a reclaim could be incorrectly granted.  The state lock prevents
this happening between two nfsv4 clients, but it could happen between a
v4 client and a lockd client, I think.  In more detail:

	lockd client			NFSv4 client
	------------			------------

					reclaim request passes grace check
	-- grace period ends --
	gets conflicting lock
	drops conflicting lock
					reclaim request granted

One fix might be to take some sort of reference count as long as you're
processing a reclaim request, and not end the grace period till that
count goes to zero.

Another might be to push the grace checks down into the core lock code
and make sure there's a lock that provides mutual exclusion between the
locks_end_grace() call and lock reclaims.

--b.

> 
> Also add a DMESG print in the case of a bad (old) client that does
> *not* send a RECLAIM_COMPLETE before doing new opens. The admin might
> want to know there is an unsupported client at hand, because in this
> case with our server the client will get stuck in an endless loop.
> 
> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
> ---
>  fs/nfsd/nfs4proc.c |   23 ++++++++++++++++-------
>  1 files changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index a68384f..efc6369 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -301,8 +301,12 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  	 */
>  	if (nfsd4_has_session(cstate) &&
>  	    !cstate->session->se_client->cl_firststate &&
> -	    open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS)
> +	    open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS) {
> +		printk(KERN_INFO
> +			"NFSD: nfsd4_open: Broken client, "
> +			"open sent before RECLAIM_COMPLETE done\n");
>  		return nfserr_grace;
> +	}
> 
>  	if (nfsd4_has_session(cstate))
>  		copy_clientid(&open->op_clientid, cstate->session);
> @@ -333,12 +337,17 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> 
>  	/* Openowner is now set, so sequence id will get bumped.  Now we need
>  	 * these checks before we do any creates: */
> -	status = nfserr_grace;
> -	if (locks_in_grace() && open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS)
> -		goto out;
> -	status = nfserr_no_grace;
> -	if (!locks_in_grace() && open->op_claim_type == NFS4_OPEN_CLAIM_PREVIOUS)
> -		goto out;
> +	if (locks_in_grace()) {
> +		if (open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS) {
> +			status = nfserr_grace;
> +			goto out;
> +		}
> +	} else {
> +		if (open->op_claim_type == NFS4_OPEN_CLAIM_PREVIOUS) {
> +			status = nfserr_no_grace;
> +			goto out;
> +		}
> +	}
> 
>  	switch (open->op_claim_type) {
>  	case NFS4_OPEN_CLAIM_DELEGATE_CUR:
> -- 
> 1.7.6
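[The second suggestion above, mutual exclusion between reclaims and locks_end_grace(), can be sketched with an rwsem: each reclaim holds it for read from the grace check until the reclaimed state is actually installed, and the grace period is only ended under the write lock. This is an illustration of the idea only; grace_rwsem, do_reclaim() and install_reclaimed_state() are made-up names, not code that exists in the tree.]

	#include <linux/rwsem.h>
	#include <linux/fs.h>	/* locks_in_grace(), locks_end_grace() */

	static DECLARE_RWSEM(grace_rwsem);

	/* Wrap each reclaim (NFSv4 CLAIM_PREVIOUS open, lockd reclaim). */
	static int do_reclaim(void *args)
	{
		int status;

		down_read(&grace_rwsem);
		if (!locks_in_grace()) {
			up_read(&grace_rwsem);
			return -EAGAIN;		/* grace is over, refuse the reclaim */
		}
		/* install_reclaimed_state() stands in for the existing
		 * open/lock reclaim path. */
		status = install_reclaimed_state(args);
		up_read(&grace_rwsem);
		return status;
	}

	/* Wherever the grace period currently ends: */
	static void end_grace(struct lock_manager *lm)
	{
		down_write(&grace_rwsem);	/* waits for in-flight reclaims */
		locks_end_grace(lm);
		up_write(&grace_rwsem);
	}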
You also asked more generally what we could do to make the grace period
less obnoxious.  I have some minimal notes here:

	http://wiki.linux-nfs.org/wiki/index.php/Nfsd4_server_recovery#Nice_to_have

Making sure there are no v4 clients on its own doesn't do the job since
we also want consistency with NLM clients.  But the kernel doesn't
currently know about NLM clients.

I haven't come up with any bright ideas here yet.

--b.
On 08/26/2011 01:39 PM, J. Bruce Fields wrote:
> On Fri, Aug 12, 2011 at 05:12:20PM -0700, Boaz Harrosh wrote:
>> 
>> locks_in_grace() was called twice: once for the "yes" case and a second
>> time for the "no" case. If the status changes between these two calls,
>> the server would do the wrong thing. Sample it only once.
> 
> I don't see how this fixes any bug.  The only thing that could happen in
> between those two checks is that the grace period could end.  In that
> case op_claim_type must be NFS4_OPEN_CLAIM_PREVIOUS (otherwise we would
> have jumped to out and not hit the second case).  The second condition
> will therefore be true and we'll fail with err_no_grace.  I don't see
> how that is incorrect, as in fact the grace period is now over.

OK, you are right, it is not NFS-incorrect. But I still think a single
sample is better coding practice, no? Though I agree that the commit log
wording should be changed.

What about the other part of the patch, the print? Should I send a
separate patch for that?

> 
> There may be a different bug: if the grace period ends *any time* after
> locks_in_grace() is called but before we actually do the lock or open,
> then a reclaim could be incorrectly granted.  The state lock prevents
> this happening between two nfsv4 clients, but it could happen between
> a v4 client and a lockd client, I think.  In more detail:
> 
> 	lockd client			NFSv4 client
> 	------------			------------
> 
> 					reclaim request passes grace check
> 	-- grace period ends --
> 	gets conflicting lock
> 	drops conflicting lock
> 					reclaim request granted
> 
> One fix might be to take some sort of reference count as long as you're
> processing a reclaim request, and not end the grace period till that
> count goes to zero.
> 
> Another might be to push the grace checks down into the core lock code
> and make sure there's a lock that provides mutual exclusion between the
> locks_end_grace() call and lock reclaims.
> 

You might get by by rechecking the grace period at the end of the
processing and, if it has passed, issuing a "reclaim request failed"
anyway. So the first check is only an optimization, but the final
disposition is the post-check. (Just as if you dropped that refcount
above.)

Boaz

> --b.
On Fri, Aug 26, 2011 at 02:35:22PM -0700, Boaz Harrosh wrote:
> On 08/26/2011 01:39 PM, J. Bruce Fields wrote:
> > On Fri, Aug 12, 2011 at 05:12:20PM -0700, Boaz Harrosh wrote:
> >> 
> >> locks_in_grace() was called twice: once for the "yes" case and a second
> >> time for the "no" case. If the status changes between these two calls,
> >> the server would do the wrong thing. Sample it only once.
> > 
> > I don't see how this fixes any bug.  The only thing that could happen in
> > between those two checks is that the grace period could end.  In that
> > case op_claim_type must be NFS4_OPEN_CLAIM_PREVIOUS (otherwise we would
> > have jumped to out and not hit the second case).  The second condition
> > will therefore be true and we'll fail with err_no_grace.  I don't see
> > how that is incorrect, as in fact the grace period is now over.
> 
> OK, you are right, it is not NFS-incorrect. But I still think a single
> sample is better coding practice, no? Though I agree that the commit log
> wording should be changed.

Sure, if you want to resend with just that, that'd be OK.

> What about the other part of the patch, the print? Should I send a
> separate patch for that?

As a general rule, I try to avoid logging client problems:

	- A malicious client could fill up our logs.
	- A bunch of broken clients could do the same unintentionally.

If it's really necessary then maybe do it with printk_once().

Which clients are you worried about exactly?

> > One fix might be to take some sort of reference count as long as you're
> > processing a reclaim request, and not end the grace period till that
> > count goes to zero.
> > 
> > Another might be to push the grace checks down into the core lock code
> > and make sure there's a lock that provides mutual exclusion between the
> > locks_end_grace() call and lock reclaims.
> 
> You might get by by rechecking the grace period at the end of the
> processing and, if it has passed, issuing a "reclaim request failed"
> anyway. So the first check is only an optimization, but the final
> disposition is the post-check. (Just as if you dropped that refcount
> above.)

I guess that could work, but you'd have to back out the operation you
just did if the check showed you'd left the grace period.  I'd rather
avoid that.

--b.
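[As an illustration of the printk_once() suggestion above, the hunk from the patch could be written as follows. This is only a sketch of the suggestion, not an applied change; printk_once() emits its message at most once per boot.]

	if (nfsd4_has_session(cstate) &&
	    !cstate->session->se_client->cl_firststate &&
	    open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS) {
		/* At most one line per boot, so broken or malicious
		 * clients cannot flood the log. */
		printk_once(KERN_INFO
			"NFSD: nfsd4_open: open sent before RECLAIM_COMPLETE done\n");
		return nfserr_grace;
	}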
On 08/26/2011 02:54 PM, J. Bruce Fields wrote:
>> You might get by by rechecking the grace period at the end of the
>> processing and, if it has passed, issuing a "reclaim request failed"
>> anyway. So the first check is only an optimization, but the final
>> disposition is the post-check. (Just as if you dropped that refcount
>> above.)
> 
> I guess that could work, but you'd have to back out the operation you
> just did if the check showed you'd left the grace period.  I'd rather
> avoid that.
> 

Yes, you'll need to "back out" of the operation. It was just a
suggestion; you'll have to see what is easier to implement. The above is
less invasive and does have merits. It is a bit like the RCU_FREE
pattern, where at the end of the operation you see that you lost the
race and need to "back out". But in the hot path it is very cheap (no
reference counts, no locking), and only in the very, very rare events,
those that today we fail on, do you do the extra "backing out". So
overall, runtime- and coding-wise, it might be the cheapest solution.

Just my $0.017

> --b.

Boaz
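[A sketch of the pre-check/post-check pattern being discussed, with made-up helper names: do_reclaim_open() stands in for the existing reclaim path, and undo_reclaim_open() is exactly the "back out" step Bruce would rather avoid.]

	static __be32 reclaim_open(struct nfsd4_open *open)
	{
		__be32 status;

		if (!locks_in_grace())		/* cheap pre-check: optimization only */
			return nfserr_no_grace;

		status = do_reclaim_open(open);	/* grant the reclaim optimistically */
		if (status)
			return status;

		if (!locks_in_grace()) {	/* authoritative post-check */
			/* Lost the race with the end of grace: back the grant out. */
			undo_reclaim_open(open);
			return nfserr_no_grace;
		}
		return nfs_ok;
	}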
On 08/26/2011 02:54 PM, J. Bruce Fields wrote:
> On Fri, Aug 26, 2011 at 02:35:22PM -0700, Boaz Harrosh wrote:
> 
> If it's really necessary then maybe do it with printk_once().
> 
> Which clients are you worried about exactly?
> 

I'm afraid of server bugs, actually, as demonstrated lately.

I'll submit a patch that at least does a dprintk to help someone
trying to debug their problem.

Thanks
Boaz
On Fri, Aug 26, 2011 at 04:56:02PM -0700, Boaz Harrosh wrote:
> On 08/26/2011 02:54 PM, J. Bruce Fields wrote:
> > On Fri, Aug 26, 2011 at 02:35:22PM -0700, Boaz Harrosh wrote:
> > 
> > If it's really necessary then maybe do it with printk_once().
> > 
> > Which clients are you worried about exactly?
> 
> I'm afraid of server bugs, actually, as demonstrated lately.

OK.

> I'll submit a patch that at least does a dprintk to help someone
> trying to debug their problem.

Sounds fine.

--b.
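[The dprintk() variant agreed on above would look roughly like this. dprintk is the existing sunrpc/nfsd debug macro, silent unless nfsd debugging is enabled (e.g. via rpcdebug -m nfsd); the exact message text here is illustrative, not the patch Boaz later sent.]

	if (nfsd4_has_session(cstate) &&
	    !cstate->session->se_client->cl_firststate &&
	    open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS) {
		/* Visible only with nfsd debugging enabled, so it
		 * cannot spam the system log. */
		dprintk("NFSD: nfsd4_open: open sent before RECLAIM_COMPLETE\n");
		return nfserr_grace;
	}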
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index a68384f..efc6369 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -301,8 +301,12 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	 */
 	if (nfsd4_has_session(cstate) &&
 	    !cstate->session->se_client->cl_firststate &&
-	    open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS)
+	    open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS) {
+		printk(KERN_INFO
+			"NFSD: nfsd4_open: Broken client, "
+			"open sent before RECLAIM_COMPLETE done\n");
 		return nfserr_grace;
+	}
 
 	if (nfsd4_has_session(cstate))
 		copy_clientid(&open->op_clientid, cstate->session);
@@ -333,12 +337,17 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 
 	/* Openowner is now set, so sequence id will get bumped.  Now we need
 	 * these checks before we do any creates: */
-	status = nfserr_grace;
-	if (locks_in_grace() && open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS)
-		goto out;
-	status = nfserr_no_grace;
-	if (!locks_in_grace() && open->op_claim_type == NFS4_OPEN_CLAIM_PREVIOUS)
-		goto out;
+	if (locks_in_grace()) {
+		if (open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS) {
+			status = nfserr_grace;
+			goto out;
+		}
+	} else {
+		if (open->op_claim_type == NFS4_OPEN_CLAIM_PREVIOUS) {
+			status = nfserr_no_grace;
+			goto out;
+		}
+	}
 
 	switch (open->op_claim_type) {
 	case NFS4_OPEN_CLAIM_DELEGATE_CUR:
locks_in_grace() was called twice: once for the "yes" case and a second
time for the "no" case. If the status changes between these two calls,
the server would do the wrong thing. Sample it only once.

Also add a DMESG print in the case of a bad (old) client that does
*not* send a RECLAIM_COMPLETE before doing new opens. The admin might
want to know there is an unsupported client at hand, because in this
case with our server the client will get stuck in an endless loop.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 fs/nfsd/nfs4proc.c |   23 ++++++++++++++++-------
 1 files changed, 16 insertions(+), 7 deletions(-)