Message ID | 20230917230551.30483-1-trondmy@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] NFSv4: Fix a nfs4_state_manager() race | expand |
On Mon, 18 Sep 2023, trondmy@kernel.org wrote: > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > If the NFS4CLNT_RUN_MANAGER flag got set just before we cleared > NFS4CLNT_MANAGER_RUNNING, then we might have won the race against > nfs4_schedule_state_manager(), and are responsible for handling the > recovery situation. > > Fixes: aeabb3c96186 ("NFSv4: Fix a NFSv4 state manager deadlock") > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> > --- > fs/nfs/nfs4state.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c > index e079987af4a3..0bc160fbabec 100644 > --- a/fs/nfs/nfs4state.c > +++ b/fs/nfs/nfs4state.c > @@ -2703,6 +2703,13 @@ static void nfs4_state_manager(struct nfs_client *clp) > nfs4_end_drain_session(clp); > nfs4_clear_state_manager_bit(clp); > > + if (test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state) && > + !test_and_set_bit(NFS4CLNT_MANAGER_RUNNING, > + &clp->cl_state)) { > + memflags = memalloc_nofs_save(); > + continue; > + } > + I cannot see what race this closes. When we cleared MANAGER_RUNNING, the only possible consequence is that nfs4_wait_clnt_recover could have woken up. This leads to nfs4_schedule_state_manager() being run, which sets RUN_MANAGER whether it was set or not. I understand that there are problems with MANAGER_AVAILABLE which your next patch addresses, but I cannot see what this one actually fixes. Can you help me? Thanks, NeilBrown > if (!test_and_set_bit(NFS4CLNT_RECALL_RUNNING, &clp->cl_state)) { > if (test_and_clear_bit(NFS4CLNT_DELEGRETURN, &clp->cl_state)) { > nfs_client_return_marked_delegations(clp); > -- > 2.41.0 > >
On Mon, 2023-09-18 at 11:17 +1000, NeilBrown wrote: > On Mon, 18 Sep 2023, trondmy@kernel.org wrote: > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > If the NFS4CLNT_RUN_MANAGER flag got set just before we cleared > > NFS4CLNT_MANAGER_RUNNING, then we might have won the race against > > nfs4_schedule_state_manager(), and are responsible for handling the > > recovery situation. > > > > Fixes: aeabb3c96186 ("NFSv4: Fix a NFSv4 state manager deadlock") > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> > > --- > > fs/nfs/nfs4state.c | 7 +++++++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c > > index e079987af4a3..0bc160fbabec 100644 > > --- a/fs/nfs/nfs4state.c > > +++ b/fs/nfs/nfs4state.c > > @@ -2703,6 +2703,13 @@ static void nfs4_state_manager(struct > > nfs_client *clp) > > nfs4_end_drain_session(clp); > > nfs4_clear_state_manager_bit(clp); > > > > + if (test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state) > > && > > + !test_and_set_bit(NFS4CLNT_MANAGER_RUNNING, > > + &clp->cl_state)) { > > + memflags = memalloc_nofs_save(); > > + continue; > > + } > > + > > I cannot see what race this closes. > When we cleared MANAGER_RUNNING, the only possible consequence is > that > nfs4_wait_clnt_recover could have woken up. This leads to > nfs4_schedule_state_manager() being run, which sets RUN_MANAGER > whether > it was set or not. > > I understand that there are problems with MANAGER_AVAILABLE which > your > next patch addresses, but I cannot see what this one actually fixes. > Can you help me? > If NFS4CLNT_RUN_MANAGER gets set while we're handling the reboot recovery or network partition, then NFS4CLNT_MANAGER_RUNNING will be set, so nfs4_schedule_state_manager() will just exit rather than start a new thread. If we don't catch that situation before we start handling the asynchronous delegation returns, then we can deadlock. If, OTOH, nfs4_schedule_state_manager() runs after we've cleared NFS4CLNT_MANAGER_RUNNING, then we should be OK (assuming both patches are applied). Cheers Trond > Thanks, > NeilBrown > > > > > if (!test_and_set_bit(NFS4CLNT_RECALL_RUNNING, > > &clp->cl_state)) { > > if > > (test_and_clear_bit(NFS4CLNT_DELEGRETURN, &clp->cl_state)) { > > nfs_client_return_marked_delegation > > s(clp); > > -- > > 2.41.0 > > > > >
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index e079987af4a3..0bc160fbabec 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -2703,6 +2703,13 @@ static void nfs4_state_manager(struct nfs_client *clp) nfs4_end_drain_session(clp); nfs4_clear_state_manager_bit(clp); + if (test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state) && + !test_and_set_bit(NFS4CLNT_MANAGER_RUNNING, + &clp->cl_state)) { + memflags = memalloc_nofs_save(); + continue; + } + if (!test_and_set_bit(NFS4CLNT_RECALL_RUNNING, &clp->cl_state)) { if (test_and_clear_bit(NFS4CLNT_DELEGRETURN, &clp->cl_state)) { nfs_client_return_marked_delegations(clp);