diff mbox series

[1/2] NFSv4: Fix a nfs4_state_manager() race

Message ID 20230917230551.30483-1-trondmy@kernel.org (mailing list archive)
State New, archived
Headers show
Series [1/2] NFSv4: Fix a nfs4_state_manager() race | expand

Commit Message

Trond Myklebust Sept. 17, 2023, 11:05 p.m. UTC
From: Trond Myklebust <trond.myklebust@hammerspace.com>

If the NFS4CLNT_RUN_MANAGER flag got set just before we cleared
NFS4CLNT_MANAGER_RUNNING, then we might have won the race against
nfs4_schedule_state_manager(), and are responsible for handling the
recovery situation.

Fixes: aeabb3c96186 ("NFSv4: Fix a NFSv4 state manager deadlock")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/nfs4state.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

NeilBrown Sept. 18, 2023, 1:17 a.m. UTC | #1
On Mon, 18 Sep 2023, trondmy@kernel.org wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
> 
> If the NFS4CLNT_RUN_MANAGER flag got set just before we cleared
> NFS4CLNT_MANAGER_RUNNING, then we might have won the race against
> nfs4_schedule_state_manager(), and are responsible for handling the
> recovery situation.
> 
> Fixes: aeabb3c96186 ("NFSv4: Fix a NFSv4 state manager deadlock")
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
>  fs/nfs/nfs4state.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index e079987af4a3..0bc160fbabec 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -2703,6 +2703,13 @@ static void nfs4_state_manager(struct nfs_client *clp)
>  		nfs4_end_drain_session(clp);
>  		nfs4_clear_state_manager_bit(clp);
>  
> +		if (test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state) &&
> +		    !test_and_set_bit(NFS4CLNT_MANAGER_RUNNING,
> +				      &clp->cl_state)) {
> +			memflags = memalloc_nofs_save();
> +			continue;
> +		}
> +

I cannot see what race this closes.
When we cleared MANAGER_RUNNING, the only possible consequence is that
nfs4_wait_clnt_recover could have woken up.  This leads to
nfs4_schedule_state_manager() being run, which sets RUN_MANAGER whether
it was set or not.

I understand that there are problems with MANAGER_AVAILABLE which your
next patch addresses, but I cannot see what this one actually fixes.
Can you help me?

Thanks,
NeilBrown



>  		if (!test_and_set_bit(NFS4CLNT_RECALL_RUNNING, &clp->cl_state)) {
>  			if (test_and_clear_bit(NFS4CLNT_DELEGRETURN, &clp->cl_state)) {
>  				nfs_client_return_marked_delegations(clp);
> -- 
> 2.41.0
> 
>
Trond Myklebust Sept. 18, 2023, 2:20 a.m. UTC | #2
On Mon, 2023-09-18 at 11:17 +1000, NeilBrown wrote:
> On Mon, 18 Sep 2023, trondmy@kernel.org wrote:
> > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > 
> > If the NFS4CLNT_RUN_MANAGER flag got set just before we cleared
> > NFS4CLNT_MANAGER_RUNNING, then we might have won the race against
> > nfs4_schedule_state_manager(), and are responsible for handling the
> > recovery situation.
> > 
> > Fixes: aeabb3c96186 ("NFSv4: Fix a NFSv4 state manager deadlock")
> > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > ---
> >  fs/nfs/nfs4state.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > index e079987af4a3..0bc160fbabec 100644
> > --- a/fs/nfs/nfs4state.c
> > +++ b/fs/nfs/nfs4state.c
> > @@ -2703,6 +2703,13 @@ static void nfs4_state_manager(struct
> > nfs_client *clp)
> >                 nfs4_end_drain_session(clp);
> >                 nfs4_clear_state_manager_bit(clp);
> >  
> > +               if (test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state)
> > &&
> > +                   !test_and_set_bit(NFS4CLNT_MANAGER_RUNNING,
> > +                                     &clp->cl_state)) {
> > +                       memflags = memalloc_nofs_save();
> > +                       continue;
> > +               }
> > +
> 
> I cannot see what race this closes.
> When we cleared MANAGER_RUNNING, the only possible consequence is
> that
> nfs4_wait_clnt_recover could have woken up.  This leads to
> nfs4_schedule_state_manager() being run, which sets RUN_MANAGER
> whether
> it was set or not.
> 
> I understand that there are problems with MANAGER_AVAILABLE which
> your
> next patch addresses, but I cannot see what this one actually fixes.
> Can you help me?
> 

If NFS4CLNT_RUN_MANAGER gets set while we're handling the reboot
recovery or network partition, then NFS4CLNT_MANAGER_RUNNING will be
set, so nfs4_schedule_state_manager() will just exit rather than start
a new thread. If we don't catch that situation before we start handling
the asynchronous delegation returns, then we can deadlock.

If, OTOH, nfs4_schedule_state_manager() runs after we've cleared
NFS4CLNT_MANAGER_RUNNING, then we should be OK (assuming both patches
are applied).

Cheers
 Trond

> Thanks,
> NeilBrown
> 
> 
> 
> >                 if (!test_and_set_bit(NFS4CLNT_RECALL_RUNNING,
> > &clp->cl_state)) {
> >                         if
> > (test_and_clear_bit(NFS4CLNT_DELEGRETURN, &clp->cl_state)) {
> >                                 nfs_client_return_marked_delegation
> > s(clp);
> > -- 
> > 2.41.0
> > 
> > 
>
diff mbox series

Patch

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index e079987af4a3..0bc160fbabec 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -2703,6 +2703,13 @@  static void nfs4_state_manager(struct nfs_client *clp)
 		nfs4_end_drain_session(clp);
 		nfs4_clear_state_manager_bit(clp);
 
+		if (test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state) &&
+		    !test_and_set_bit(NFS4CLNT_MANAGER_RUNNING,
+				      &clp->cl_state)) {
+			memflags = memalloc_nofs_save();
+			continue;
+		}
+
 		if (!test_and_set_bit(NFS4CLNT_RECALL_RUNNING, &clp->cl_state)) {
 			if (test_and_clear_bit(NFS4CLNT_DELEGRETURN, &clp->cl_state)) {
 				nfs_client_return_marked_delegations(clp);