[v3,8/8] NFSD: Send CB_OFFLOAD on graceful shutdown

Message ID	20241031134000.53396-18-cel@kernel.org (mailing list archive)
State	New
Headers	show Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F03B419DF8D for <linux-nfs@vger.kernel.org>; Thu, 31 Oct 2024 13:40:18 +0000 (UTC) From: cel@kernel.org To: Neil Brown <neilb@suse.de>, Jeff Layton <jlayton@kernel.org>, Olga Kornievskaia <okorniev@redhat.com>, Dai Ngo <dai.ngo@oracle.com>, Tom Talpey <tom@talpey.com> Cc: <linux-nfs@vger.kernel.org>, Chuck Lever <chuck.lever@oracle.com> Subject: [PATCH v3 8/8] NFSD: Send CB_OFFLOAD on graceful shutdown Date: Thu, 31 Oct 2024 09:40:09 -0400 Message-ID: <20241031134000.53396-18-cel@kernel.org> In-Reply-To: <20241031134000.53396-10-cel@kernel.org> References: <20241031134000.53396-10-cel@kernel.org> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	async COPY fixes for NFSD \| expand [v3,0/8] async COPY fixes for NFSD [v3,1/8] NFSD: Add a tracepoint to record canceled async COPY operations [v3,2/8] NFSD: Fix nfsd4_shutdown_copy() [v3,3/8] NFSD: Free async copy information in nfsd4_cb_offload_release() [v3,4/8] NFSD: Handle an NFS4ERR_DELAY response to CB_OFFLOAD [v3,5/8] NFSD: Block DESTROY_CLIENTID only when there are ongoing async COPY operations [v3,6/8] NFSD: Add a laundromat reaper for async copy state [v3,7/8] NFSD: Add nfsd4_copy time-to-live [v3,8/8] NFSD: Send CB_OFFLOAD on graceful shutdown

Message ID

20241031134000.53396-18-cel@kernel.org (mailing list archive)

State

New

Headers

From: cel@kernel.org
To: Neil Brown <neilb@suse.de>,
	Jeff Layton <jlayton@kernel.org>,
	Olga Kornievskaia <okorniev@redhat.com>,
	Dai Ngo <dai.ngo@oracle.com>,
	Tom Talpey <tom@talpey.com>
Cc: <linux-nfs@vger.kernel.org>,
	Chuck Lever <chuck.lever@oracle.com>
Subject: [PATCH v3 8/8] NFSD: Send CB_OFFLOAD on graceful shutdown
Date: Thu, 31 Oct 2024 09:40:09 -0400
Message-ID: <20241031134000.53396-18-cel@kernel.org>
In-Reply-To: <20241031134000.53396-10-cel@kernel.org>
References: <20241031134000.53396-10-cel@kernel.org>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Series

async COPY fixes for NFSD | expand

Commit Message

Chuck Lever Oct. 31, 2024, 1:40 p.m. UTC

From: Chuck Lever <chuck.lever@oracle.com>

If an async COPY operation happens to be running when the server is
shut down, notify the requesting client that the copy has completed.

Since the nfs4_client is going away, seems like this could introduce
some UAFs.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs4proc.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Jeff Layton Nov. 1, 2024, 1:05 p.m. UTC | #1

On Thu, 2024-10-31 at 09:40 -0400, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> If an async COPY operation happens to be running when the server is
> shut down, notify the requesting client that the copy has completed.
> 
> Since the nfs4_client is going away, seems like this could introduce
> some UAFs.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/nfs4proc.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 4c964bce6bd7..51b3f85f3791 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -68,6 +68,8 @@ MODULE_PARM_DESC(nfsd4_ssc_umount_timeout,
>  
>  #define NFSDDBG_FACILITY		NFSDDBG_PROC
>  
> +static void nfsd4_send_cb_offload(struct nfsd4_copy *copy);
> +
>  static u32 nfsd_attrmask[] = {
>  	NFSD_WRITEABLE_ATTRS_WORD0,
>  	NFSD_WRITEABLE_ATTRS_WORD1,
> @@ -1381,8 +1383,10 @@ void nfsd4_shutdown_copy(struct nfs4_client *clp)
>  {
>  	struct nfsd4_copy *copy;
>  
> -	while ((copy = nfsd4_get_copy(clp)) != NULL)
> +	while ((copy = nfsd4_get_copy(clp)) != NULL) {
>  		nfsd4_stop_copy(copy);
> +		nfsd4_send_cb_offload(copy);
> +	}

Not sure about a UAF, but  it seems like NFS4ERR_DELAY returns might
delay the client destruction for quite a while. Maybe this CB_OFFLOAD
shouldn't retry on DELAY?


>  }
>  #ifdef CONFIG_NFSD_V4_2_INTER_SSC
>

Chuck Lever Nov. 1, 2024, 1:18 p.m. UTC | #2

On Fri, Nov 01, 2024 at 09:05:07AM -0400, Jeff Layton wrote:
> On Thu, 2024-10-31 at 09:40 -0400, cel@kernel.org wrote:
> > From: Chuck Lever <chuck.lever@oracle.com>
> > 
> > If an async COPY operation happens to be running when the server is
> > shut down, notify the requesting client that the copy has completed.
> > 
> > Since the nfs4_client is going away, seems like this could introduce
> > some UAFs.
> > 
> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > ---
> >  fs/nfsd/nfs4proc.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > index 4c964bce6bd7..51b3f85f3791 100644
> > --- a/fs/nfsd/nfs4proc.c
> > +++ b/fs/nfsd/nfs4proc.c
> > @@ -68,6 +68,8 @@ MODULE_PARM_DESC(nfsd4_ssc_umount_timeout,
> >  
> >  #define NFSDDBG_FACILITY		NFSDDBG_PROC
> >  
> > +static void nfsd4_send_cb_offload(struct nfsd4_copy *copy);
> > +
> >  static u32 nfsd_attrmask[] = {
> >  	NFSD_WRITEABLE_ATTRS_WORD0,
> >  	NFSD_WRITEABLE_ATTRS_WORD1,
> > @@ -1381,8 +1383,10 @@ void nfsd4_shutdown_copy(struct nfs4_client *clp)
> >  {
> >  	struct nfsd4_copy *copy;
> >  
> > -	while ((copy = nfsd4_get_copy(clp)) != NULL)
> > +	while ((copy = nfsd4_get_copy(clp)) != NULL) {
> >  		nfsd4_stop_copy(copy);
> > +		nfsd4_send_cb_offload(copy);
> > +	}
> 
> Not sure about a UAF, but it seems like NFS4ERR_DELAY returns might
> delay the client destruction for quite a while.

The nfsd4_copy() is removed from the nfs4_client's async_copies
list, and the RPC proceeds in the background. It doesn't block the
destruction of the CLIENTID.

But it might be a problem for the RPC logic to transmit the call
when there's no nfs4_client to dereference. I should probably
drop this patch and try a different approach.


> Maybe this CB_OFFLOAD shouldn't retry on DELAY?
> 
> 
> >  }
> >  #ifdef CONFIG_NFSD_V4_2_INTER_SSC
> >  
> 
> 
> -- 
> Jeff Layton <jlayton@kernel.org>
>

Jeff Layton Nov. 1, 2024, 1:30 p.m. UTC | #3

On Fri, 2024-11-01 at 09:18 -0400, Chuck Lever wrote:
> On Fri, Nov 01, 2024 at 09:05:07AM -0400, Jeff Layton wrote:
> > On Thu, 2024-10-31 at 09:40 -0400, cel@kernel.org wrote:
> > > From: Chuck Lever <chuck.lever@oracle.com>
> > > 
> > > If an async COPY operation happens to be running when the server is
> > > shut down, notify the requesting client that the copy has completed.
> > > 
> > > Since the nfs4_client is going away, seems like this could introduce
> > > some UAFs.
> > > 
> > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > > ---
> > >  fs/nfsd/nfs4proc.c | 6 +++++-
> > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > > index 4c964bce6bd7..51b3f85f3791 100644
> > > --- a/fs/nfsd/nfs4proc.c
> > > +++ b/fs/nfsd/nfs4proc.c
> > > @@ -68,6 +68,8 @@ MODULE_PARM_DESC(nfsd4_ssc_umount_timeout,
> > >  
> > >  #define NFSDDBG_FACILITY		NFSDDBG_PROC
> > >  
> > > +static void nfsd4_send_cb_offload(struct nfsd4_copy *copy);
> > > +
> > >  static u32 nfsd_attrmask[] = {
> > >  	NFSD_WRITEABLE_ATTRS_WORD0,
> > >  	NFSD_WRITEABLE_ATTRS_WORD1,
> > > @@ -1381,8 +1383,10 @@ void nfsd4_shutdown_copy(struct nfs4_client *clp)
> > >  {
> > >  	struct nfsd4_copy *copy;
> > >  
> > > -	while ((copy = nfsd4_get_copy(clp)) != NULL)
> > > +	while ((copy = nfsd4_get_copy(clp)) != NULL) {
> > >  		nfsd4_stop_copy(copy);
> > > +		nfsd4_send_cb_offload(copy);
> > > +	}
> > 
> > Not sure about a UAF, but it seems like NFS4ERR_DELAY returns might
> > delay the client destruction for quite a while.
> 
> The nfsd4_copy() is removed from the nfs4_client's async_copies
> list, and the RPC proceeds in the background. It doesn't block the
> destruction of the CLIENTID.
> 
> But it might be a problem for the RPC logic to transmit the call
> when there's no nfs4_client to dereference. I should probably
> drop this patch and try a different approach.


No, I think this will block the client shutdown. After
nfsd4_shutdown_copy(), it calls nfsd4_shutdown_callback(), which does:

        flush_workqueue(clp->cl_callback_wq);                         
        nfsd41_cb_inflight_wait_complete(clp);  

So the CB_OFFLOAD workqueue jobs will run, and everything will wait for
clp->cl_cb_inflight to go to 0. That won't happen until the CB_OFFLOAD
jobs complete.

No objection from me though if you see a better approach. Dropping this
one for now seems reasonable.

> 
> > Maybe this CB_OFFLOAD shouldn't retry on DELAY?
> > 
> > 
> > >  }
> > >  #ifdef CONFIG_NFSD_V4_2_INTER_SSC
> > >  
> > 
> > 
> > -- 
> > Jeff Layton <jlayton@kernel.org>
> > 
>

Chuck Lever Nov. 1, 2024, 2 p.m. UTC | #4

On Fri, Nov 01, 2024 at 09:30:44AM -0400, Jeff Layton wrote:
> On Fri, 2024-11-01 at 09:18 -0400, Chuck Lever wrote:
> > On Fri, Nov 01, 2024 at 09:05:07AM -0400, Jeff Layton wrote:
> > > On Thu, 2024-10-31 at 09:40 -0400, cel@kernel.org wrote:
> > > > From: Chuck Lever <chuck.lever@oracle.com>
> > > > 
> > > > If an async COPY operation happens to be running when the server is
> > > > shut down, notify the requesting client that the copy has completed.
> > > > 
> > > > Since the nfs4_client is going away, seems like this could introduce
> > > > some UAFs.
> > > > 
> > > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > > > ---
> > > >  fs/nfsd/nfs4proc.c | 6 +++++-
> > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > > > index 4c964bce6bd7..51b3f85f3791 100644
> > > > --- a/fs/nfsd/nfs4proc.c
> > > > +++ b/fs/nfsd/nfs4proc.c
> > > > @@ -68,6 +68,8 @@ MODULE_PARM_DESC(nfsd4_ssc_umount_timeout,
> > > >  
> > > >  #define NFSDDBG_FACILITY		NFSDDBG_PROC
> > > >  
> > > > +static void nfsd4_send_cb_offload(struct nfsd4_copy *copy);
> > > > +
> > > >  static u32 nfsd_attrmask[] = {
> > > >  	NFSD_WRITEABLE_ATTRS_WORD0,
> > > >  	NFSD_WRITEABLE_ATTRS_WORD1,
> > > > @@ -1381,8 +1383,10 @@ void nfsd4_shutdown_copy(struct nfs4_client *clp)
> > > >  {
> > > >  	struct nfsd4_copy *copy;
> > > >  
> > > > -	while ((copy = nfsd4_get_copy(clp)) != NULL)
> > > > +	while ((copy = nfsd4_get_copy(clp)) != NULL) {
> > > >  		nfsd4_stop_copy(copy);
> > > > +		nfsd4_send_cb_offload(copy);
> > > > +	}
> > > 
> > > Not sure about a UAF, but it seems like NFS4ERR_DELAY returns might
> > > delay the client destruction for quite a while.
> > 
> > The nfsd4_copy() is removed from the nfs4_client's async_copies
> > list, and the RPC proceeds in the background. It doesn't block the
> > destruction of the CLIENTID.
> > 
> > But it might be a problem for the RPC logic to transmit the call
> > when there's no nfs4_client to dereference. I should probably
> > drop this patch and try a different approach.
> 
> 
> No, I think this will block the client shutdown. After
> nfsd4_shutdown_copy(), it calls nfsd4_shutdown_callback(), which does:
> 
>         flush_workqueue(clp->cl_callback_wq);                         
>         nfsd41_cb_inflight_wait_complete(clp);  
> 
> So the CB_OFFLOAD workqueue jobs will run, and everything will wait for
> clp->cl_cb_inflight to go to 0. That won't happen until the CB_OFFLOAD
> jobs complete.
> 
> No objection from me though if you see a better approach. Dropping this
> one for now seems reasonable.

There is no spec mandate (yet) for CB_OFFLOAD notification on
DESTROY_CLIENTID, so I'm dropping the patch for now.


> > > Maybe this CB_OFFLOAD shouldn't retry on DELAY?

Some logic can be added such that no retry is done. However, if the
client does not respond at all to the CB_OFFLOAD, NFSD's RPC timeout
is still 7 seconds.

We expect this will be exceptionally rare, though -- clients are
typically aware of ongoing COPY operations and generally won't try
to destroy a client ID where there is pending work that they care
about.


> > > >  }
> > > >  #ifdef CONFIG_NFSD_V4_2_INTER_SSC
> > > >  
> > > 
> > > 
> > > -- 
> > > Jeff Layton <jlayton@kernel.org>
> > > 
> > 
> 
> -- 
> Jeff Layton <jlayton@kernel.org>
>

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 4c964bce6bd7..51b3f85f3791 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -68,6 +68,8 @@  MODULE_PARM_DESC(nfsd4_ssc_umount_timeout,
 
 #define NFSDDBG_FACILITY		NFSDDBG_PROC
 
+static void nfsd4_send_cb_offload(struct nfsd4_copy *copy);
+
 static u32 nfsd_attrmask[] = {
 	NFSD_WRITEABLE_ATTRS_WORD0,
 	NFSD_WRITEABLE_ATTRS_WORD1,
@@ -1381,8 +1383,10 @@  void nfsd4_shutdown_copy(struct nfs4_client *clp)
 {
 	struct nfsd4_copy *copy;
 
-	while ((copy = nfsd4_get_copy(clp)) != NULL)
+	while ((copy = nfsd4_get_copy(clp)) != NULL) {
 		nfsd4_stop_copy(copy);
+		nfsd4_send_cb_offload(copy);
+	}
 }
 #ifdef CONFIG_NFSD_V4_2_INTER_SSC

[v3,8/8] NFSD: Send CB_OFFLOAD on graceful shutdown

Commit Message

Comments

Patch