Message ID | 1441037201-78787-1-git-send-email-aweits@rit.edu (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, 31 Aug 2015 12:06:41 -0400 Andrew Elble <aweits@rit.edu> wrote: > We have observed the server sending recalls for delegation stateids > that have already been successfully returned. Change > nfsd4_cb_recall_done() to return success if the client has returned > the delegation. While this does not completely eliminate the sending > of recalls for delegations that have already been returned, this > does prevent unnecessarily declaring the callback path to be down. > > Reported-by: Eric Meddaugh <etmsys@rit.edu> > Signed-off-by: Andrew Elble <aweits@rit.edu> > --- > fs/nfsd/nfs4state.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > index de45c2de1668..ac235a7470e3 100644 > --- a/fs/nfsd/nfs4state.c > +++ b/fs/nfsd/nfs4state.c > @@ -3509,6 +3509,9 @@ static int nfsd4_cb_recall_done(struct nfsd4_callback *cb, > { > struct nfs4_delegation *dp = cb_to_delegation(cb); > > + if (dp->dl_stid.sc_type == NFS4_CLOSED_DELEG_STID) > + return 1; > + > switch (task->tk_status) { > case 0: > return 1; Yeah, there will always be some potential for a CB_RECALL/DELEGRETURN race since the former is driven by the server and the latter by the client. What error are you usually getting back from the client when you see this happen? NFS4ERR_BAD_STATEID? If so, then maybe we should confine this check to that error case? OTOH, maybe there's no harm in just ignoring the error if the delegation is already returned... Acked-by: Jeff Layton <jlayton@poochiereds.net> -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> Yeah, there will always be some potential for a CB_RECALL/DELEGRETURN > race since the former is driven by the server and the latter by the > client. I believe there are some client differences to account for as well. i.e. when does the client "unhash" the delegation? Before the DELEGRETURN is sent, or after the response comes back? > What error are you usually getting back from the client when you see > this happen? NFS4ERR_BAD_STATEID? If so, then maybe we should confine > this check to that error case? Usually EBADHANDLE. I actually had a version of this patch where it was confined to those cases. My rationale in changing it was that continuing to solicit the return of the delegation when we have it is pointless. i.e. RFC5661 20.2.4: "...The recall is not complete until the delegation is returned using a DELEGRETURN operation." Thanks, Andy
I should probably state what I'm currently chasing: 1.) Somehow, delegations wind up on the cl_revoked list at the server. 2.) The server continually asserts SEQ4_STATUS_RECALLABLE_STATE_REVOKED. 3.) The client for some reason doesn't have anything to give the server to satisfy it. I speculate that this patch will assist in making this go away - I'm just not 100% sure of all the conditions that result in making #1 possible. (Enough recalls backed up on the callback slot on the same filehandle stacking timeouts/retries and resultant path down errors to exceed lease time?). Unfortunately this problem seems to always occur at ~1AM - and obtaining a network capture of the problem in action has been elusive. Thanks, Andy
On Tue, Sep 01, 2015 at 09:18:42AM -0400, Jeff Layton wrote: > On Mon, 31 Aug 2015 12:06:41 -0400 > Andrew Elble <aweits@rit.edu> wrote: > > > We have observed the server sending recalls for delegation stateids > > that have already been successfully returned. Change > > nfsd4_cb_recall_done() to return success if the client has returned > > the delegation. While this does not completely eliminate the sending > > of recalls for delegations that have already been returned, this > > does prevent unnecessarily declaring the callback path to be down. > > > > Reported-by: Eric Meddaugh <etmsys@rit.edu> > > Signed-off-by: Andrew Elble <aweits@rit.edu> > > --- > > fs/nfsd/nfs4state.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > > index de45c2de1668..ac235a7470e3 100644 > > --- a/fs/nfsd/nfs4state.c > > +++ b/fs/nfsd/nfs4state.c > > @@ -3509,6 +3509,9 @@ static int nfsd4_cb_recall_done(struct nfsd4_callback *cb, > > { > > struct nfs4_delegation *dp = cb_to_delegation(cb); > > > > + if (dp->dl_stid.sc_type == NFS4_CLOSED_DELEG_STID) > > + return 1; > > + > > switch (task->tk_status) { > > case 0: > > return 1; > > Yeah, there will always be some potential for a CB_RECALL/DELEGRETURN > race since the former is driven by the server and the latter by the > client. > > What error are you usually getting back from the client when you see > this happen? NFS4ERR_BAD_STATEID? If so, then maybe we should confine > this check to that error case? > > OTOH, maybe there's no harm in just ignoring the error if the > delegation is already returned... > > Acked-by: Jeff Layton <jlayton@poochiereds.net> Agreed, applying--thanks.--b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Sep 01, 2015 at 10:48:27AM -0400, Andrew W Elble wrote: > > I should probably state what I'm currently chasing: > > 1.) Somehow, delegations wind up on the cl_revoked list at the server. > 2.) The server continually asserts SEQ4_STATUS_RECALLABLE_STATE_REVOKED. > 3.) The client for some reason doesn't have anything to give the server > to satisfy it. > > I speculate that this patch will assist in making this go away - I'm > just not 100% sure of all the conditions that result in making #1 possible. > (Enough recalls backed up on the callback slot on the same filehandle > stacking timeouts/retries and resultant path down errors to exceed lease > time?). Even in the worst case where the client never gets the recall, in theory it should take the RECALLABLE_STATE_REVOKED as a sign that it's missed one and return its delegations. Is the client attempting any such recovery when it sees that flag? > Unfortunately this problem seems to always occur at ~1AM - and > obtaining a network capture of the problem in action has been elusive. Anyway, sounds interesting, let us know what you find.... --b. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index de45c2de1668..ac235a7470e3 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -3509,6 +3509,9 @@ static int nfsd4_cb_recall_done(struct nfsd4_callback *cb, { struct nfs4_delegation *dp = cb_to_delegation(cb); + if (dp->dl_stid.sc_type == NFS4_CLOSED_DELEG_STID) + return 1; + switch (task->tk_status) { case 0: return 1;
We have observed the server sending recalls for delegation stateids that have already been successfully returned. Change nfsd4_cb_recall_done() to return success if the client has returned the delegation. While this does not completely eliminate the sending of recalls for delegations that have already been returned, this does prevent unnecessarily declaring the callback path to be down. Reported-by: Eric Meddaugh <etmsys@rit.edu> Signed-off-by: Andrew Elble <aweits@rit.edu> --- fs/nfsd/nfs4state.c | 3 +++ 1 file changed, 3 insertions(+)