Message ID | 4E45C164.1070604@panasas.com (mailing list archive) |
---|---|
State | New, archived |
On Fri, Aug 12, 2011 at 05:12:20PM -0700, Boaz Harrosh wrote:
> 
> locks_in_grace() was called twice: once for the "yes" case and a second
> time for the "no" case. If the status changes between these two calls,
> the server would do the wrong thing. Sample it only once.

I don't see how this fixes any bug.  The only thing that could happen in
between those two checks is that the grace period could end.  In that
case op_claim_type must be NFS4_OPEN_CLAIM_PREVIOUS (otherwise we would
have jumped to out and not hit the second case).  The second condition
will therefore be true and we'll fail with err_no_grace.  I don't see
how that is incorrect, as in fact the grace period is now over.

There may be a different bug: if the grace period ends *any time* after
locks_in_grace() is called but before we actually do the lock or open,
then a reclaim could be incorrectly granted.  The state lock prevents
this happening between two nfsv4 clients, but it could happen between a
v4 client and a lockd client, I think.  In more detail:

	lockd client			NFSv4 client
	------------			------------

					reclaim request passes grace check
	-- grace period ends --
	gets conflicting lock
	drops conflicting lock
					reclaim request granted

One fix might be to take some sort of reference count as long as you're
processing a reclaim request, and not end the grace period till that
count goes to zero.

Another might be to push the grace checks down into the core lock code
and make sure there's a lock that provides mutual exclusion between the
locks_end_grace() call and lock reclaims.

--b.

> 
> Also add a DMESG print in the case of a bad (old) client that does
> *not* send a RECLAIM_COMPLETE before doing new opens. The admin might
> want to know there is an unsupported client at hand, because in this
> case with our server the client will get stuck in an endless loop.
> 
> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
> ---
>  fs/nfsd/nfs4proc.c |   23 ++++++++++++++++-------
>  1 files changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index a68384f..efc6369 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -301,8 +301,12 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  	 */
>  	if (nfsd4_has_session(cstate) &&
>  	    !cstate->session->se_client->cl_firststate &&
> -	    open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS)
> +	    open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS) {
> +		printk(KERN_INFO
> +			"NFSD: nfsd4_open: Broken client, "
> +			"open sent before RECLAIM_COMPLETE done\n");
>  		return nfserr_grace;
> +	}
> 
>  	if (nfsd4_has_session(cstate))
>  		copy_clientid(&open->op_clientid, cstate->session);
> @@ -333,12 +337,17 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> 
>  	/* Openowner is now set, so sequence id will get bumped.  Now we need
>  	 * these checks before we do any creates: */
> -	status = nfserr_grace;
> -	if (locks_in_grace() && open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS)
> -		goto out;
> -	status = nfserr_no_grace;
> -	if (!locks_in_grace() && open->op_claim_type == NFS4_OPEN_CLAIM_PREVIOUS)
> -		goto out;
> +	if (locks_in_grace()) {
> +		if (open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS) {
> +			status = nfserr_grace;
> +			goto out;
> +		}
> +	} else {
> +		if (open->op_claim_type == NFS4_OPEN_CLAIM_PREVIOUS) {
> +			status = nfserr_no_grace;
> +			goto out;
> +		}
> +	}
> 
>  	switch (open->op_claim_type) {
>  	case NFS4_OPEN_CLAIM_DELEGATE_CUR:
> -- 
> 1.7.6
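[The second suggestion above, mutual exclusion between reclaims and locks_end_grace(), can be sketched with an rwsem: each reclaim holds it for read from the grace check until the reclaimed state is actually installed, and the grace period is only ended under the write lock. This is an illustration of the idea only; grace_rwsem, do_reclaim() and install_reclaimed_state() are made-up names, not code that exists in the tree.]

	#include <linux/rwsem.h>
	#include <linux/fs.h>	/* locks_in_grace(), locks_end_grace() */

	static DECLARE_RWSEM(grace_rwsem);

	/* Wrap each reclaim (NFSv4 CLAIM_PREVIOUS open, lockd reclaim). */
	static int do_reclaim(void *args)
	{
		int status;

		down_read(&grace_rwsem);
		if (!locks_in_grace()) {
			up_read(&grace_rwsem);
			return -EAGAIN;		/* grace is over, refuse the reclaim */
		}
		/* install_reclaimed_state() stands in for the existing
		 * open/lock reclaim path. */
		status = install_reclaimed_state(args);
		up_read(&grace_rwsem);
		return status;
	}

	/* Wherever the grace period currently ends: */
	static void end_grace(struct lock_manager *lm)
	{
		down_write(&grace_rwsem);	/* waits for in-flight reclaims */
		locks_end_grace(lm);
		up_write(&grace_rwsem);
	}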
You also asked more generally what we could do to make the grace period
less obnoxious.  I have some minimal notes here:

	http://wiki.linux-nfs.org/wiki/index.php/Nfsd4_server_recovery#Nice_to_have

Making sure there are no v4 clients on its own doesn't do the job since
we also want consistency with NLM clients.  But the kernel doesn't
currently know about NLM clients.

I haven't come up with any bright ideas here yet.

--b.
On 08/26/2011 01:39 PM, J. Bruce Fields wrote:
> On Fri, Aug 12, 2011 at 05:12:20PM -0700, Boaz Harrosh wrote:
>> 
>> locks_in_grace() was called twice: once for the "yes" case and a second
>> time for the "no" case. If the status changes between these two calls,
>> the server would do the wrong thing. Sample it only once.
> 
> I don't see how this fixes any bug.  The only thing that could happen in
> between those two checks is that the grace period could end.  In that
> case op_claim_type must be NFS4_OPEN_CLAIM_PREVIOUS (otherwise we would
> have jumped to out and not hit the second case).  The second condition
> will therefore be true and we'll fail with err_no_grace.  I don't see
> how that is incorrect, as in fact the grace period is now over.

OK, you are right, it is not NFS-incorrect. But I still think a single
sample is better coding practice, no? Though I agree that the commit log
wording should be changed.

What about the other part of the patch, the print? Should I send a
separate patch for that?

> 
> There may be a different bug: if the grace period ends *any time* after
> locks_in_grace() is called but before we actually do the lock or open,
> then a reclaim could be incorrectly granted.  The state lock prevents
> this happening between two nfsv4 clients, but it could happen between
> a v4 client and a lockd client, I think.  In more detail:
> 
> 	lockd client			NFSv4 client
> 	------------			------------
> 
> 					reclaim request passes grace check
> 	-- grace period ends --
> 	gets conflicting lock
> 	drops conflicting lock
> 					reclaim request granted
> 
> One fix might be to take some sort of reference count as long as you're
> processing a reclaim request, and not end the grace period till that
> count goes to zero.
> 
> Another might be to push the grace checks down into the core lock code
> and make sure there's a lock that provides mutual exclusion between the
> locks_end_grace() call and lock reclaims.
> 

You might get by by rechecking the grace period at the end of the
processing and, if it has passed, issuing a "reclaim request failed"
anyway. So the first check is only an optimization, but the final
disposition is the post-check. (Just as if you dropped that refcount
above.)

Boaz

> --b.
On Fri, Aug 26, 2011 at 02:35:22PM -0700, Boaz Harrosh wrote:
> On 08/26/2011 01:39 PM, J. Bruce Fields wrote:
> > On Fri, Aug 12, 2011 at 05:12:20PM -0700, Boaz Harrosh wrote:
> >> 
> >> locks_in_grace() was called twice: once for the "yes" case and a second
> >> time for the "no" case. If the status changes between these two calls,
> >> the server would do the wrong thing. Sample it only once.
> > 
> > I don't see how this fixes any bug.  The only thing that could happen in
> > between those two checks is that the grace period could end.  In that
> > case op_claim_type must be NFS4_OPEN_CLAIM_PREVIOUS (otherwise we would
> > have jumped to out and not hit the second case).  The second condition
> > will therefore be true and we'll fail with err_no_grace.  I don't see
> > how that is incorrect, as in fact the grace period is now over.
> 
> OK, you are right, it is not NFS-incorrect. But I still think a single
> sample is better coding practice, no? Though I agree that the commit log
> wording should be changed.

Sure, if you want to resend with just that, that'd be OK.

> What about the other part of the patch, the print? Should I send a
> separate patch for that?

As a general rule, I try to avoid logging client problems:

	- A malicious client could fill up our logs.
	- A bunch of broken clients could do the same unintentionally.

If it's really necessary then maybe do it with printk_once().

Which clients are you worried about exactly?

> > One fix might be to take some sort of reference count as long as you're
> > processing a reclaim request, and not end the grace period till that
> > count goes to zero.
> > 
> > Another might be to push the grace checks down into the core lock code
> > and make sure there's a lock that provides mutual exclusion between the
> > locks_end_grace() call and lock reclaims.
> 
> You might get by by rechecking the grace period at the end of the
> processing and, if it has passed, issuing a "reclaim request failed"
> anyway. So the first check is only an optimization, but the final
> disposition is the post-check. (Just as if you dropped that refcount
> above.)

I guess that could work, but you'd have to back out the operation you
just did if the check showed you'd left the grace period.  I'd rather
avoid that.

--b.
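[As an illustration of the printk_once() suggestion above, the hunk from the patch could be written as follows. This is only a sketch of the suggestion, not an applied change; printk_once() emits its message at most once per boot.]

	if (nfsd4_has_session(cstate) &&
	    !cstate->session->se_client->cl_firststate &&
	    open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS) {
		/* At most one line per boot, so broken or malicious
		 * clients cannot flood the log. */
		printk_once(KERN_INFO
			"NFSD: nfsd4_open: open sent before RECLAIM_COMPLETE done\n");
		return nfserr_grace;
	}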
On 08/26/2011 02:54 PM, J. Bruce Fields wrote:
>> You might get by by rechecking the grace period at the end of the
>> processing and, if it has passed, issuing a "reclaim request failed"
>> anyway. So the first check is only an optimization, but the final
>> disposition is the post-check. (Just as if you dropped that refcount
>> above.)
> 
> I guess that could work, but you'd have to back out the operation you
> just did if the check showed you'd left the grace period.  I'd rather
> avoid that.
> 

Yes, you'll need to "back out" of the operation. It was just a
suggestion; you'll have to see what is easier to implement. The above is
less invasive and does have merits. It is a bit like the RCU_FREE
pattern, where at the end of the operation you see that you lost the
race and need to "back out". But in the hot path it is very cheap (no
reference counts, no locking), and only in the very, very rare events,
those that today we fail on, do you do the extra "backing out". So
overall, runtime- and coding-wise, it might be the cheapest solution.

Just my $0.017

> --b.

Boaz
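[A sketch of the pre-check/post-check pattern being discussed, with made-up helper names: do_reclaim_open() stands in for the existing reclaim path, and undo_reclaim_open() is exactly the "back out" step Bruce would rather avoid.]

	static __be32 reclaim_open(struct nfsd4_open *open)
	{
		__be32 status;

		if (!locks_in_grace())		/* cheap pre-check: optimization only */
			return nfserr_no_grace;

		status = do_reclaim_open(open);	/* grant the reclaim optimistically */
		if (status)
			return status;

		if (!locks_in_grace()) {	/* authoritative post-check */
			/* Lost the race with the end of grace: back the grant out. */
			undo_reclaim_open(open);
			return nfserr_no_grace;
		}
		return nfs_ok;
	}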
On 08/26/2011 02:54 PM, J. Bruce Fields wrote:
> On Fri, Aug 26, 2011 at 02:35:22PM -0700, Boaz Harrosh wrote:
> 
> If it's really necessary then maybe do it with printk_once().
> 
> Which clients are you worried about exactly?
> 

I'm afraid of server bugs, actually, as demonstrated lately.

I'll submit a patch that at least does a dprintk to help someone
trying to debug their problem.

Thanks
Boaz
On Fri, Aug 26, 2011 at 04:56:02PM -0700, Boaz Harrosh wrote:
> On 08/26/2011 02:54 PM, J. Bruce Fields wrote:
> > On Fri, Aug 26, 2011 at 02:35:22PM -0700, Boaz Harrosh wrote:
> > 
> > If it's really necessary then maybe do it with printk_once().
> > 
> > Which clients are you worried about exactly?
> 
> I'm afraid of server bugs, actually, as demonstrated lately.

OK.

> I'll submit a patch that at least does a dprintk to help someone
> trying to debug their problem.

Sounds fine.

--b.
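[The dprintk() variant agreed on above would look roughly like this. dprintk is the existing sunrpc/nfsd debug macro, silent unless nfsd debugging is enabled (e.g. via rpcdebug -m nfsd); the exact message text here is illustrative, not the patch Boaz later sent.]

	if (nfsd4_has_session(cstate) &&
	    !cstate->session->se_client->cl_firststate &&
	    open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS) {
		/* Visible only with nfsd debugging enabled, so it
		 * cannot spam the system log. */
		dprintk("NFSD: nfsd4_open: open sent before RECLAIM_COMPLETE\n");
		return nfserr_grace;
	}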
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index a68384f..efc6369 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -301,8 +301,12 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	 */
 	if (nfsd4_has_session(cstate) &&
 	    !cstate->session->se_client->cl_firststate &&
-	    open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS)
+	    open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS) {
+		printk(KERN_INFO
+			"NFSD: nfsd4_open: Broken client, "
+			"open sent before RECLAIM_COMPLETE done\n");
 		return nfserr_grace;
+	}
 
 	if (nfsd4_has_session(cstate))
 		copy_clientid(&open->op_clientid, cstate->session);
@@ -333,12 +337,17 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 
 	/* Openowner is now set, so sequence id will get bumped.  Now we need
 	 * these checks before we do any creates: */
-	status = nfserr_grace;
-	if (locks_in_grace() && open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS)
-		goto out;
-	status = nfserr_no_grace;
-	if (!locks_in_grace() && open->op_claim_type == NFS4_OPEN_CLAIM_PREVIOUS)
-		goto out;
+	if (locks_in_grace()) {
+		if (open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS) {
+			status = nfserr_grace;
+			goto out;
+		}
+	} else {
+		if (open->op_claim_type == NFS4_OPEN_CLAIM_PREVIOUS) {
+			status = nfserr_no_grace;
+			goto out;
+		}
+	}
 
 	switch (open->op_claim_type) {
 	case NFS4_OPEN_CLAIM_DELEGATE_CUR:
locks_in_grace() was called twice: once for the "yes" case and a second
time for the "no" case. If the status changes between these two calls,
the server would do the wrong thing. Sample it only once.

Also add a DMESG print in the case of a bad (old) client that does
*not* send a RECLAIM_COMPLETE before doing new opens. The admin might
want to know there is an unsupported client at hand, because in this
case with our server the client will get stuck in an endless loop.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 fs/nfsd/nfs4proc.c |   23 ++++++++++++++++-------
 1 files changed, 16 insertions(+), 7 deletions(-)