
[PATCH/RFC] Don't try to recover NFS locks when they are lost.

Message ID 20130904104946.1be45e52@notabene.brown (mailing list archive)
State New, archived

Commit Message

NeilBrown Sept. 4, 2013, 12:49 a.m. UTC
On Tue, 3 Sep 2013 18:43:23 +0000 "Myklebust, Trond"
<Trond.Myklebust@netapp.com> wrote:

> On Thu, 2013-08-15 at 12:36 +1000, NeilBrown wrote:
> > 
> > When an NFS (V4 specifically) client loses contact with the server it can
> > lose any locks that it holds.
> > Currently when it reconnects to the server it simply tries to reclaim
> > those locks.  This might succeed even though some other client has held and
> > released a lock in the meantime.  So the first client might think the file
> > is unchanged, but it isn't.  This isn't good.
> > 
> > If, when recovery happens, the locks cannot be claimed because some other
> > client still holds the lock, then  we get a message in the kernel logs, but
> > the client can still write.  So two clients can both think they have a lock
> > and can both write at the same time.  This is equally not good.
> > 
> > There was a patch a while ago
> >   http://comments.gmane.org/gmane.linux.nfs/41917
> > 
> > which tried to address some of this, but it didn't seem to go anywhere.
> > That patch would also send a signal to the process.  That might be useful
> > but I'm really just interested in failing the writes.
> > For NFSv4 (unlike v2/v3) there is a strong link between the lock and the
> > write request, so we can fairly easily fail an IO if the lock is gone.
> > 
> > The patch below attempts to do this.  Does it make sense?
> > Because this is a fairly big change, I introduce a module parameter
> > "recover_locks" which defaults to true (the current behaviour) but can be set
> > to "false" to tell the client not to try to recover things that were lost.
> > 
> > Comments?
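
For reference, a boolean module parameter like the proposed "recover_locks"
would typically be declared roughly as below; the default follows the
description above, while the placement and description text are only
assumptions, not the wording of the actual patch:

static bool recover_locks = true;
module_param(recover_locks, bool, 0644);
MODULE_PARM_DESC(recover_locks,
		 "If the server reports that a lock might be lost, "
		 "try to reclaim it (risking corruption) rather than "
		 "failing I/O on the file");
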
> 
> I think this patch is close to being usable. A couple of questions,
> though:
> 
>      1. What happens if another process' open() causes us to receive a
>         delegation after NFS_LOCK_LOST has been set on our lock stateid,
>         but before we call nfs4_set_rw_stateid()?

Good point.  I think we need to check for NFS_LOCK_LOST before checking for a
delegation.  Does the incremental patch below look OK?
It now takes a spinlock, in the case where we have a delegation and hold some
locks, which it didn't have to take before.  Is that a concern?


>      2. Shouldn't we clear NFS_LOCK_LOST at some point? It looks to me
>         as if a process which sees the EIO, and decides to recover by
>         calling close(), reopen()ing the file and then locking it again,
>         might find NFS_LOCK_LOST still being set.


NFS_LOCK_LOST is per nfs4_lock_state, which should be freed by
nfs4_fl_release_lock().
So when the file is closed, the locks are dropped, and the structure holding
the NFS_LOCK_LOST flag will go away.
Or did I miss something?
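
A minimal sketch of the release path being described, assuming the 3.11-era
layout of fs/nfs/nfs4state.c (an illustration, not necessarily the exact
source):

static void nfs4_fl_release_lock(struct file_lock *fl)
{
	/* Drop the per-owner nfs4_lock_state; NFS_LOCK_LOST is a bit in
	 * its ls_flags, so the flag goes away with the last reference. */
	nfs4_put_lock_state(fl->fl_u.nfs4_fl.owner);
}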

Thanks,
NeilBrown

Patch

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 4d103ff..bb1fd5d 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1040,10 +1040,11 @@  static int nfs4_copy_open_stateid(nfs4_stateid *dst, struct nfs4_state *state)
 int nfs4_select_rw_stateid(nfs4_stateid *dst, struct nfs4_state *state,
 		fmode_t fmode, const struct nfs_lockowner *lockowner)
 {
-	int ret = 0;
+	int ret = nfs4_copy_lock_stateid(dst, state, lockowner);
+	if (ret == -EIO)
+		goto out;
 	if (nfs4_copy_delegation_stateid(dst, state->inode, fmode))
 		goto out;
-	ret = nfs4_copy_lock_stateid(dst, state, lockowner);
 	if (ret != -ENOENT)
 		goto out;
 	ret = nfs4_copy_open_stateid(dst, state);
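
With this incremental change applied on top of the original RFC patch, the
reordered function comes out roughly as follows (reconstructed from the hunk
above; the code after the out: label is outside the hunk, so only the final
return is assumed):

int nfs4_select_rw_stateid(nfs4_stateid *dst, struct nfs4_state *state,
		fmode_t fmode, const struct nfs_lockowner *lockowner)
{
	/* Check the lock stateid first so that a lost lock (-EIO when
	 * NFS_LOCK_LOST is set) fails the I/O even if a delegation was
	 * handed out in the meantime. */
	int ret = nfs4_copy_lock_stateid(dst, state, lockowner);
	if (ret == -EIO)
		goto out;
	if (nfs4_copy_delegation_stateid(dst, state->inode, fmode))
		goto out;
	if (ret != -ENOENT)
		goto out;
	ret = nfs4_copy_open_stateid(dst, state);
out:
	/* ... remainder not shown in the hunk ... */
	return ret;
}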