Message ID | 20241031134000.53396-12-cel@kernel.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | async COPY fixes for NFSD | expand |
On Thu, 2024-10-31 at 09:40 -0400, cel@kernel.org wrote: > From: Chuck Lever <chuck.lever@oracle.com> > > nfsd4_shutdown_copy() is just this: > > while ((copy = nfsd4_get_copy(clp)) != NULL) > nfsd4_stop_copy(copy); > > nfsd4_get_copy() bumps @copy's reference count, preventing > nfsd4_stop_copy() from releasing @copy. > > A while loop like this usually works by removing the first element > of the list, but neither nfsd4_get_copy() nor nfsd4_stop_copy() > alters the async_copies list. > > Best I can tell, then, is that nfsd4_shutdown_copy() continues to > loop until other threads manage to remove all the items from this > list. The spinning loop blocks shutdown until these items are gone. > > Possibly the reason we haven't seen this issue in the field is > because client_has_state() prevents __destroy_client() from calling > nfsd4_shutdown_copy() if there are any items on this list. In a > subsequent patch I plan to remove that restriction. > > Fixes: e0639dc5805a ("NFSD introduce async copy feature") > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > --- > fs/nfsd/nfs4proc.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c > index 9f6617fa5412..8229bbfdd735 100644 > --- a/fs/nfsd/nfs4proc.c > +++ b/fs/nfsd/nfs4proc.c > @@ -1302,6 +1302,9 @@ static struct nfsd4_copy *nfsd4_get_copy(struct nfs4_client *clp) > copy = list_first_entry(&clp->async_copies, struct nfsd4_copy, > copies); > refcount_inc(©->refcount); > + copy->cp_clp = NULL; > + if (!list_empty(©->copies)) > + list_del_init(©->copies); > } > spin_unlock(&clp->async_lock); > return copy; My initial thought was: The problem sounds real, but is nfsd4_get_copy() the place for the above logic? But then I noticed that nfsd4_get_copy() is only called from nfsd4_shutdown_copy(). Maybe we should be rename nfsd4_get_copy() to nfsd4_find_and_unhash_copy() ?
On Fri, Nov 01, 2024 at 08:28:42AM -0400, Jeff Layton wrote: > On Thu, 2024-10-31 at 09:40 -0400, cel@kernel.org wrote: > > From: Chuck Lever <chuck.lever@oracle.com> > > > > nfsd4_shutdown_copy() is just this: > > > > while ((copy = nfsd4_get_copy(clp)) != NULL) > > nfsd4_stop_copy(copy); > > > > nfsd4_get_copy() bumps @copy's reference count, preventing > > nfsd4_stop_copy() from releasing @copy. > > > > A while loop like this usually works by removing the first element > > of the list, but neither nfsd4_get_copy() nor nfsd4_stop_copy() > > alters the async_copies list. > > > > Best I can tell, then, is that nfsd4_shutdown_copy() continues to > > loop until other threads manage to remove all the items from this > > list. The spinning loop blocks shutdown until these items are gone. > > > > Possibly the reason we haven't seen this issue in the field is > > because client_has_state() prevents __destroy_client() from calling > > nfsd4_shutdown_copy() if there are any items on this list. In a > > subsequent patch I plan to remove that restriction. > > > > Fixes: e0639dc5805a ("NFSD introduce async copy feature") > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > > --- > > fs/nfsd/nfs4proc.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c > > index 9f6617fa5412..8229bbfdd735 100644 > > --- a/fs/nfsd/nfs4proc.c > > +++ b/fs/nfsd/nfs4proc.c > > @@ -1302,6 +1302,9 @@ static struct nfsd4_copy *nfsd4_get_copy(struct nfs4_client *clp) > > copy = list_first_entry(&clp->async_copies, struct nfsd4_copy, > > copies); > > refcount_inc(©->refcount); > > + copy->cp_clp = NULL; > > + if (!list_empty(©->copies)) > > + list_del_init(©->copies); > > } > > spin_unlock(&clp->async_lock); > > return copy; > > My initial thought was: > > The problem sounds real, but is nfsd4_get_copy() the place for the > above logic? > > But then I noticed that nfsd4_get_copy() is only called from > nfsd4_shutdown_copy(). Maybe we should be rename nfsd4_get_copy() to > nfsd4_find_and_unhash_copy() ? nfsd4_get_copy() book-ends nfsd4_put_copy(). But nfsd4_unhash_copy() would work too, I think.
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 9f6617fa5412..8229bbfdd735 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1302,6 +1302,9 @@ static struct nfsd4_copy *nfsd4_get_copy(struct nfs4_client *clp) copy = list_first_entry(&clp->async_copies, struct nfsd4_copy, copies); refcount_inc(©->refcount); + copy->cp_clp = NULL; + if (!list_empty(©->copies)) + list_del_init(©->copies); } spin_unlock(&clp->async_lock); return copy;