Message ID: 1312481855.8919.20.camel@lade.trondhjem.org (mailing list archive)
State: New, archived
On Thu, Aug 04, 2011 at 02:17:35PM -0400, Trond Myklebust wrote:
> On Thu, 2011-08-04 at 19:27 +0200, Frank van Maarseveen wrote:
> > On Thu, Aug 04, 2011 at 01:10:20PM -0400, Trond Myklebust wrote:
> > > On Thu, 2011-08-04 at 12:49 -0400, J. Bruce Fields wrote:
> > > > On Thu, Aug 04, 2011 at 06:43:13PM +0200, Frank van Maarseveen wrote:
> > > > > On Thu, Aug 04, 2011 at 12:34:52PM -0400, J. Bruce Fields wrote:
> > > > > > On Thu, Aug 04, 2011 at 12:30:19PM +0200, Frank van Maarseveen wrote:
> > > > > > > Both client and server run 2.6.39.3, NFSv3 over UDP (without the
> > > > > > > relock_filesystem patch proposed earlier).
> > > > > > >
> > > > > > > A second client has an exclusive lock on a file on the server. The
> > > > > > > client under test calls fcntl(F_SETLKW) to wait for the same exclusive
> > > > > > > lock. Wireshark sees NLM V4 LOCK calls resulting in NLM_BLOCKED.
> > > > > > >
> > > > > > > Next the server is rebooted. The second client recovers the lock
> > > > > > > correctly. The client under test now receives NLM_DENIED_GRACE_PERIOD for
> > > > > > > every NLM V4 LOCK request resulting from the waiting fcntl(F_SETLKW). When
> > > > > > > this changes to NLM_BLOCKED after grace period expiration, the fcntl
> > > > > > > returns -ENOLCK ("No locks available.") instead of continuing to wait.
> > > > > >
> > > > > > So that sounds like a client bug, and correct behavior from the server
> > > > > > (assuming the second client was still holding the lock throughout).
> > > > >
> > > > > Yes.
> > >
> > > Is the client actually asking for a blocking lock after the grace period
> > > expires?
> >
> > Yes, according to my interpretation of wireshark's output; see my reply to Bruce.
>
> OK... Does the following patch help?
>
> Cheers
>   Trond
> ---
> diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
> index 8392cb8..40c0d88 100644
> --- a/fs/lockd/clntproc.c
> +++ b/fs/lockd/clntproc.c
> @@ -270,6 +270,9 @@ nlmclnt_call(struct rpc_cred *cred, struct nlm_rqst *req, u32 proc)
>  		return -ENOLCK;
>  	msg.rpc_proc = &clnt->cl_procinfo[proc];
>
> +	/* Reset the reply status */
> +	if (argp->block)
> +		resp->status = nlm_lck_blocked;
>  	/* Perform the RPC call. If an error occurs, try again */
>  	if ((status = rpc_call_sync(clnt, &msg, 0)) < 0) {
>  		dprintk("lockd: rpc_call returned error %d\n", -status);

Negative. I've tried it on the client under test and I'm seeing three types of behavior, one good, two bad. In all cases the secondary client (unmodified) correctly regains the lock after the server has rebooted.

Client-under-test behavior depends on whether it had queued the conflicting lock before or after the server reboot. Afterwards it seems to work with the above modification (I don't know whether that was already the case before the patch). When the client under test tries to lock before the server reboot, the fcntl(F_SETLKW) returns -ENOLCK either right after the NSM NOTIFY, without any NLM traffic, or when the NLM_DENIED_GRACE_PERIOD changes into NLM_BLOCKED (the original report).
--- a/fs/lockd/clntproc.c
+++ b/fs/lockd/clntproc.c
@@ -270,6 +270,9 @@ nlmclnt_call(struct rpc_cred *cred, struct nlm_rqst *req, u32 proc)
 		return -ENOLCK;
 	msg.rpc_proc = &clnt->cl_procinfo[proc];
 
+	/* Reset the reply status */
+	if (argp->block)
+		resp->status = nlm_lck_blocked;
 	/* Perform the RPC call. If an error occurs, try again */
 	if ((status = rpc_call_sync(clnt, &msg, 0)) < 0) {
 		dprintk("lockd: rpc_call returned error %d\n", -status);