Message ID | 01047e4baa85ca541a5a43f88f588b15163292dc.1687890438.git.bcodding@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] Revert "NFSv4: Retry LOCK on OLD_STATEID during delegation return" | expand |
On Tue, 2023-06-27 at 14:31 -0400, Benjamin Coddington wrote: > Commmit f5ea16137a3f ("NFSv4: Retry LOCK on OLD_STATEID during > delegation > return") attempted to solve this problem by using nfs4's generic > async error > handling, but introduced a regression where v4.0 lock recovery would > hang. > The additional complexity introduced by overloading that error > handling is > not necessary for this case. > > The problem as originally explained in the above commit is: > > There's a small window where a LOCK sent during a delegation > return can > race with another OPEN on client, but the open stateid has not > yet been > updated. In this case, the client doesn't handle the OLD_STATEID > error > from the server and will lose this lock, emitting: > "NFS: nfs4_handle_delegation_recall_error: unhandled error - > 10024". > > We want a fix that is much more focused to the original problem. Fix > this > issue by returning -EAGAIN from the > nfs4_handle_delegation_recall_error() on > OLD_STATEID, and use that as a signal for the delegation return code > to > retry the LOCK operation. We should at this point be able to send > along > the updated stateid. > > Signed-off-by: Benjamin Coddington <bcodding@redhat.com> > --- > fs/nfs/delegation.c | 4 +++- > fs/nfs/nfs4proc.c | 1 + > 2 files changed, 4 insertions(+), 1 deletion(-) > > diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c > index cf7365581031..23aeb02319a5 100644 > --- a/fs/nfs/delegation.c > +++ b/fs/nfs/delegation.c > @@ -160,7 +160,9 @@ static int nfs_delegation_claim_locks(struct > nfs4_state *state, const nfs4_state > if (nfs_file_open_context(fl->fl_file)->state != > state) > continue; > spin_unlock(&flctx->flc_lock); > - status = nfs4_lock_delegation_recall(fl, state, > stateid); > + do { > + status = nfs4_lock_delegation_recall(fl, > state, stateid); > + } while (status == -EAGAIN); > if (status < 0) > goto out; > spin_lock(&flctx->flc_lock); > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c > index 6bb14f6cfbc0..399db73a57f4 100644 > --- a/fs/nfs/nfs4proc.c > +++ b/fs/nfs/nfs4proc.c > @@ -2262,6 +2262,7 @@ static int > nfs4_handle_delegation_recall_error(struct nfs_server *server, struct > case -NFS4ERR_BAD_HIGH_SLOT: > case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION: > case -NFS4ERR_DEADSESSION: > + case -NFS4ERR_OLD_STATEID: > return -EAGAIN; Hmm... Rather than issuing a blanket EAGAIN, we really should be looking at using either nfs4_refresh_lock_old_stateid() or nfs4_refresh_open_old_stateid(), depending on whether the stateid that saw the NFS4ERR_OLD_STATEID was a lock stateid or an open stateid. > case -NFS4ERR_STALE_CLIENTID: > case -NFS4ERR_STALE_STATEID:
On 29 Jun 2023, at 14:33, Trond Myklebust wrote: > On Tue, 2023-06-27 at 14:31 -0400, Benjamin Coddington wrote: >> Commmit f5ea16137a3f ("NFSv4: Retry LOCK on OLD_STATEID during >> delegation >> return") attempted to solve this problem by using nfs4's generic >> async error >> handling, but introduced a regression where v4.0 lock recovery would >> hang. >> The additional complexity introduced by overloading that error >> handling is >> not necessary for this case. >> >> The problem as originally explained in the above commit is: >> >> There's a small window where a LOCK sent during a delegation >> return can >> race with another OPEN on client, but the open stateid has not >> yet been >> updated. In this case, the client doesn't handle the OLD_STATEID >> error >> from the server and will lose this lock, emitting: >> "NFS: nfs4_handle_delegation_recall_error: unhandled error - >> 10024". >> >> We want a fix that is much more focused to the original problem. Fix >> this >> issue by returning -EAGAIN from the >> nfs4_handle_delegation_recall_error() on >> OLD_STATEID, and use that as a signal for the delegation return code >> to >> retry the LOCK operation. We should at this point be able to send >> along >> the updated stateid. >> >> Signed-off-by: Benjamin Coddington <bcodding@redhat.com> >> --- >> fs/nfs/delegation.c | 4 +++- >> fs/nfs/nfs4proc.c | 1 + >> 2 files changed, 4 insertions(+), 1 deletion(-) >> >> diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c >> index cf7365581031..23aeb02319a5 100644 >> --- a/fs/nfs/delegation.c >> +++ b/fs/nfs/delegation.c >> @@ -160,7 +160,9 @@ static int nfs_delegation_claim_locks(struct >> nfs4_state *state, const nfs4_state >> if (nfs_file_open_context(fl->fl_file)->state != >> state) >> continue; >> spin_unlock(&flctx->flc_lock); >> - status = nfs4_lock_delegation_recall(fl, state, >> stateid); >> + do { >> + status = nfs4_lock_delegation_recall(fl, >> state, stateid); >> + } while (status == -EAGAIN); >> if (status < 0) >> goto out; >> spin_lock(&flctx->flc_lock); >> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c >> index 6bb14f6cfbc0..399db73a57f4 100644 >> --- a/fs/nfs/nfs4proc.c >> +++ b/fs/nfs/nfs4proc.c >> @@ -2262,6 +2262,7 @@ static int >> nfs4_handle_delegation_recall_error(struct nfs_server *server, struct >> case -NFS4ERR_BAD_HIGH_SLOT: >> case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION: >> case -NFS4ERR_DEADSESSION: >> + case -NFS4ERR_OLD_STATEID: >> return -EAGAIN; > > Hmm... Rather than issuing a blanket EAGAIN, we really should be > looking at using either nfs4_refresh_lock_old_stateid() or > nfs4_refresh_open_old_stateid(), depending on whether the stateid that > saw the NFS4ERR_OLD_STATEID was a lock stateid or an open stateid. I figured if there was client race that would trigger the OLD_STATEID on open, we'd have heard from the "unhandled error" printk by now, so I rushed this out.. But I also don't like the overloading for open error handling here. I'll work it up as you suggest. The revert can go without a fix (IMO). The fix is worse than original bug which was really hard to hit. I couldn't reproduce it without artificial delays. Ben
diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c index cf7365581031..23aeb02319a5 100644 --- a/fs/nfs/delegation.c +++ b/fs/nfs/delegation.c @@ -160,7 +160,9 @@ static int nfs_delegation_claim_locks(struct nfs4_state *state, const nfs4_state if (nfs_file_open_context(fl->fl_file)->state != state) continue; spin_unlock(&flctx->flc_lock); - status = nfs4_lock_delegation_recall(fl, state, stateid); + do { + status = nfs4_lock_delegation_recall(fl, state, stateid); + } while (status == -EAGAIN); if (status < 0) goto out; spin_lock(&flctx->flc_lock); diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 6bb14f6cfbc0..399db73a57f4 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -2262,6 +2262,7 @@ static int nfs4_handle_delegation_recall_error(struct nfs_server *server, struct case -NFS4ERR_BAD_HIGH_SLOT: case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION: case -NFS4ERR_DEADSESSION: + case -NFS4ERR_OLD_STATEID: return -EAGAIN; case -NFS4ERR_STALE_CLIENTID: case -NFS4ERR_STALE_STATEID:
Commmit f5ea16137a3f ("NFSv4: Retry LOCK on OLD_STATEID during delegation return") attempted to solve this problem by using nfs4's generic async error handling, but introduced a regression where v4.0 lock recovery would hang. The additional complexity introduced by overloading that error handling is not necessary for this case. The problem as originally explained in the above commit is: There's a small window where a LOCK sent during a delegation return can race with another OPEN on client, but the open stateid has not yet been updated. In this case, the client doesn't handle the OLD_STATEID error from the server and will lose this lock, emitting: "NFS: nfs4_handle_delegation_recall_error: unhandled error -10024". We want a fix that is much more focused to the original problem. Fix this issue by returning -EAGAIN from the nfs4_handle_delegation_recall_error() on OLD_STATEID, and use that as a signal for the delegation return code to retry the LOCK operation. We should at this point be able to send along the updated stateid. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> --- fs/nfs/delegation.c | 4 +++- fs/nfs/nfs4proc.c | 1 + 2 files changed, 4 insertions(+), 1 deletion(-)