diff mbox series

[v3,2/8] NFSD: Fix nfsd4_shutdown_copy()

Message ID 20241031134000.53396-12-cel@kernel.org (mailing list archive)
State New
Headers show
Series async COPY fixes for NFSD | expand

Commit Message

Chuck Lever Oct. 31, 2024, 1:40 p.m. UTC
From: Chuck Lever <chuck.lever@oracle.com>

nfsd4_shutdown_copy() is just this:

	while ((copy = nfsd4_get_copy(clp)) != NULL)
		nfsd4_stop_copy(copy);

nfsd4_get_copy() bumps @copy's reference count, preventing
nfsd4_stop_copy() from releasing @copy.

A while loop like this usually works by removing the first element
of the list, but neither nfsd4_get_copy() nor nfsd4_stop_copy()
alters the async_copies list.

Best I can tell, then, is that nfsd4_shutdown_copy() continues to
loop until other threads manage to remove all the items from this
list. The spinning loop blocks shutdown until these items are gone.

Possibly the reason we haven't seen this issue in the field is
because client_has_state() prevents __destroy_client() from calling
nfsd4_shutdown_copy() if there are any items on this list. In a
subsequent patch I plan to remove that restriction.

Fixes: e0639dc5805a ("NFSD introduce async copy feature")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs4proc.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Jeff Layton Nov. 1, 2024, 12:28 p.m. UTC | #1
On Thu, 2024-10-31 at 09:40 -0400, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> nfsd4_shutdown_copy() is just this:
> 
> 	while ((copy = nfsd4_get_copy(clp)) != NULL)
> 		nfsd4_stop_copy(copy);
> 
> nfsd4_get_copy() bumps @copy's reference count, preventing
> nfsd4_stop_copy() from releasing @copy.
> 
> A while loop like this usually works by removing the first element
> of the list, but neither nfsd4_get_copy() nor nfsd4_stop_copy()
> alters the async_copies list.
> 
> Best I can tell, then, is that nfsd4_shutdown_copy() continues to
> loop until other threads manage to remove all the items from this
> list. The spinning loop blocks shutdown until these items are gone.
> 
> Possibly the reason we haven't seen this issue in the field is
> because client_has_state() prevents __destroy_client() from calling
> nfsd4_shutdown_copy() if there are any items on this list. In a
> subsequent patch I plan to remove that restriction.
> 
> Fixes: e0639dc5805a ("NFSD introduce async copy feature")
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/nfs4proc.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 9f6617fa5412..8229bbfdd735 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -1302,6 +1302,9 @@ static struct nfsd4_copy *nfsd4_get_copy(struct nfs4_client *clp)
>  		copy = list_first_entry(&clp->async_copies, struct nfsd4_copy,
>  					copies);
>  		refcount_inc(&copy->refcount);
> +		copy->cp_clp = NULL;
> +		if (!list_empty(&copy->copies))
> +			list_del_init(&copy->copies);
>  	}
>  	spin_unlock(&clp->async_lock);
>  	return copy;

My initial thought was:

The problem sounds real, but is nfsd4_get_copy() the place for the
above logic?

But then I noticed that nfsd4_get_copy() is only called from
nfsd4_shutdown_copy(). Maybe we should be rename nfsd4_get_copy() to
nfsd4_find_and_unhash_copy() ?
Chuck Lever III Nov. 1, 2024, 1:04 p.m. UTC | #2
On Fri, Nov 01, 2024 at 08:28:42AM -0400, Jeff Layton wrote:
> On Thu, 2024-10-31 at 09:40 -0400, cel@kernel.org wrote:
> > From: Chuck Lever <chuck.lever@oracle.com>
> > 
> > nfsd4_shutdown_copy() is just this:
> > 
> > 	while ((copy = nfsd4_get_copy(clp)) != NULL)
> > 		nfsd4_stop_copy(copy);
> > 
> > nfsd4_get_copy() bumps @copy's reference count, preventing
> > nfsd4_stop_copy() from releasing @copy.
> > 
> > A while loop like this usually works by removing the first element
> > of the list, but neither nfsd4_get_copy() nor nfsd4_stop_copy()
> > alters the async_copies list.
> > 
> > Best I can tell, then, is that nfsd4_shutdown_copy() continues to
> > loop until other threads manage to remove all the items from this
> > list. The spinning loop blocks shutdown until these items are gone.
> > 
> > Possibly the reason we haven't seen this issue in the field is
> > because client_has_state() prevents __destroy_client() from calling
> > nfsd4_shutdown_copy() if there are any items on this list. In a
> > subsequent patch I plan to remove that restriction.
> > 
> > Fixes: e0639dc5805a ("NFSD introduce async copy feature")
> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > ---
> >  fs/nfsd/nfs4proc.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > index 9f6617fa5412..8229bbfdd735 100644
> > --- a/fs/nfsd/nfs4proc.c
> > +++ b/fs/nfsd/nfs4proc.c
> > @@ -1302,6 +1302,9 @@ static struct nfsd4_copy *nfsd4_get_copy(struct nfs4_client *clp)
> >  		copy = list_first_entry(&clp->async_copies, struct nfsd4_copy,
> >  					copies);
> >  		refcount_inc(&copy->refcount);
> > +		copy->cp_clp = NULL;
> > +		if (!list_empty(&copy->copies))
> > +			list_del_init(&copy->copies);
> >  	}
> >  	spin_unlock(&clp->async_lock);
> >  	return copy;
> 
> My initial thought was:
> 
> The problem sounds real, but is nfsd4_get_copy() the place for the
> above logic?
> 
> But then I noticed that nfsd4_get_copy() is only called from
> nfsd4_shutdown_copy(). Maybe we should be rename nfsd4_get_copy() to
> nfsd4_find_and_unhash_copy() ?

nfsd4_get_copy() book-ends nfsd4_put_copy(). But nfsd4_unhash_copy()
would work too, I think.
diff mbox series

Patch

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 9f6617fa5412..8229bbfdd735 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1302,6 +1302,9 @@  static struct nfsd4_copy *nfsd4_get_copy(struct nfs4_client *clp)
 		copy = list_first_entry(&clp->async_copies, struct nfsd4_copy,
 					copies);
 		refcount_inc(&copy->refcount);
+		copy->cp_clp = NULL;
+		if (!list_empty(&copy->copies))
+			list_del_init(&copy->copies);
 	}
 	spin_unlock(&clp->async_lock);
 	return copy;