
nfsd: fallback to sync COPY if async not possible

Message ID 173069566284.81717.2360317209010090007@noble.neil.brown.name (mailing list archive)
State New

Commit Message

NeilBrown Nov. 4, 2024, 4:47 a.m. UTC
An NFSv4.2 COPY request can explicitly request a synchronous copy.  If
that is not requested then the server is free to perform the copy
synchronously or asynchronously.

In the Linux implementation an async copy requires more resources than a
sync copy.  If nfsd cannot allocate these resources, the best response
is to simply perform the copy (or the first 4MB of it) synchronously.

This choice may be debatable if the unavailable resource was due to
memory allocation failure - when memalloc fails it might be best to
simply give up as the server is clearly under load.  However in the case
that policy prevents another kthread being created there is no benefit,
and much cost, in failing with NFS4ERR_DELAY.  In that case it seems
reasonable to avoid that error in all circumstances.

So change the out_err case to retry as a sync copy.

Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations")
Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/nfs4proc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


base-commit: 26e6e693936986309c01e8bb80e318d63fda4a44
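The before/after control flow of the patch can be sketched as plain userspace C. Everything below is a hypothetical stand-in for nfsd internals (try_start_async_copy() models the kthread and memory setup that can fail); it illustrates only the fallback and is not kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Standalone sketch of the decision this patch makes in nfsd4_copy().
 * try_start_async_copy() stands in for the async-copy setup that can
 * fail; do_copy_sync() stands in for nfsd4_do_copy() run synchronously.
 */
enum nfs_status { NFS_OK, NFS4ERR_DELAY };

static bool async_resources_available;	/* models the limit from aadc3bbea163 */

static bool try_start_async_copy(void)
{
	return async_resources_available;
}

static enum nfs_status do_copy_sync(void)
{
	return NFS_OK;	/* copies (up to) the first 4MB in the nfsd thread */
}

static enum nfs_status nfsd4_copy_sketch(bool client_requires_sync)
{
	if (!client_requires_sync && try_start_async_copy())
		return NFS_OK;		/* async copy proceeds in the background */
	/* retry_sync: fall back rather than failing with NFS4ERR_DELAY */
	return do_copy_sync();
}
```

Before the patch, the failure path returned nfserr_jukebox (NFS4ERR_DELAY) to the client; with the patch it reaches the sync-copy branch instead.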

Comments

Jeff Layton Nov. 4, 2024, 1:05 p.m. UTC | #1
On Mon, 2024-11-04 at 15:47 +1100, NeilBrown wrote:
> An NFSv4.2 COPY request can explicitly request a synchronous copy.  If
> that is not requested then the server is free to perform the copy
> synchronously or asynchronously.
> 
> In the Linux implementation an async copy requires more resources than a
> sync copy.  If nfsd cannot allocate these resources, the best response
> is to simply perform the copy (or the first 4MB of it) synchronously.
> 

Where does the copy get clamped at 4MB?

> This choice may be debatable if the unavailable resource was due to
> memory allocation failure - when memalloc fails it might be best to
> simply give up as the server is clearly under load.  However in the case
> that policy prevents another kthread being created there is no benefit
> and much cost is failing with NFS4ERR_DELAY.  In that case it seems
> reasonable to avoid that error in all circumstances.
> 
> So change the out_err case to retry as a sync copy.
> 
> Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations")
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/nfsd/nfs4proc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index fea171ffed62..06e0d9153ca9 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -1972,6 +1972,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  		wake_up_process(async_copy->copy_task);
>  		status = nfs_ok;
>  	} else {
> +	retry_sync:
>  		status = nfsd4_do_copy(copy, copy->nf_src->nf_file,
>  				       copy->nf_dst->nf_file, true);
>  	}
> @@ -1990,8 +1991,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  	}
>  	if (async_copy)
>  		cleanup_async_copy(async_copy);
> -	status = nfserr_jukebox;
> -	goto out;
> +	goto retry_sync;
>  }
>  
>  static struct nfsd4_copy *
> 
> base-commit: 26e6e693936986309c01e8bb80e318d63fda4a44


You're probably right that attempting to do this synchronously is
best. That should avoid another RPC, though at the cost of having to
shovel the data into the pagecache.

If memory is very tight, I suppose the synchronous copy might also fail
with NFS4ERR_DELAY, in which case we're no worse off.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Chuck Lever III Nov. 4, 2024, 5:10 p.m. UTC | #2
On Mon, Nov 04, 2024 at 08:05:27AM -0500, Jeff Layton wrote:
> On Mon, 2024-11-04 at 15:47 +1100, NeilBrown wrote:
> > An NFSv4.2 COPY request can explicitly request a synchronous copy.  If
> > that is not requested then the server is free to perform the copy
> > synchronously or asynchronously.
> > 
> > In the Linux implementation an async copy requires more resources than a
> > sync copy.  If nfsd cannot allocate these resources, the best response
> > is to simply perform the copy (or the first 4MB of it) synchronously.
> > 
> 
> Where does the copy get clamped at 4MB?

fs/nfsd/vfs.c: nfsd_copy_file_range()
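The clamp Chuck points to can be pictured with a tiny standalone helper (the 4MB figure comes from the thread above; this sketch is not the actual nfsd_copy_file_range()):

```c
#include <assert.h>
#include <stddef.h>

/* Illustration of the per-call clamp discussed above: a synchronous
 * COPY moves at most 4MB per operation, so a larger request only
 * copies its first 4MB chunk. */
#define MAX_SYNC_COPY ((size_t)4 << 20)		/* 4 MiB */

static size_t clamp_copy_count(size_t requested)
{
	return requested > MAX_SYNC_COPY ? MAX_SYNC_COPY : requested;
}
```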


Chuck Lever III Nov. 4, 2024, 5:15 p.m. UTC | #3
On Mon, Nov 04, 2024 at 03:47:42PM +1100, NeilBrown wrote:
> 
> [...]
> 
> Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations")

Hi Neil,

Why is a Fixes: tag necessary?

And why that commit? async copies can fail due to lack of resources
on kernels that don't have aadc3bbea163, AFAICT.


NeilBrown Nov. 4, 2024, 8:18 p.m. UTC | #4
On Tue, 05 Nov 2024, Jeff Layton wrote:
> On Mon, 2024-11-04 at 15:47 +1100, NeilBrown wrote:
> > An NFSv4.2 COPY request can explicitly request a synchronous copy.  If
> > that is not requested then the server is free to perform the copy
> > synchronously or asynchronously.
> > 
> > In the Linux implementation an async copy requires more resources than a
> > sync copy.  If nfsd cannot allocate these resources, the best response
> > is to simply perform the copy (or the first 4MB of it) synchronously.
> > 
> 
> Where does the copy get clamped at 4MB?
> 
> > [...]
> 
> You're probably right that attempting to do this synchronously is
> probably best. That should avoid another RPC, though at the cost of
> having to shovel the data into the pagecache.

COPY always shovels data into the pagecache.  The difference between
sync and async is:
 - sync runs in an nfsd thread, async runs in a workqueue
 - sync stops after 4MB, async loops around after each 4MB until it
   finishes
(plus the obvious protocol differences).
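That behaviour can be modelled in a few standalone lines (the 4MB chunking mirrors the description above; the pass-counting helper is illustrative only, not nfsd code):

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK ((size_t)4 << 20)	/* 4 MiB per pass */

/* A sync COPY copies one clamped chunk in the nfsd thread and
 * returns; the client issues further COPYs for the remainder. */
static size_t sync_copy_once(size_t total)
{
	return total < CHUNK ? total : CHUNK;
}

/* An async copy loops 4MB at a time until the range is finished;
 * this returns how many passes the background task would make. */
static size_t async_copy_passes(size_t total)
{
	size_t done = 0, passes = 0;

	while (done < total) {
		done += sync_copy_once(total - done);
		passes++;
	}
	return passes;
}
```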

> 
> If memory is very tight, I suppose the synchronous copy might also fail
> with NFS4ERR_DELAY, in which case we're no worse off.

Memory being tight isn't the interesting case.  More COPYs than nfsd
threads is the interesting case.

> 
> Reviewed-by: Jeff Layton <jlayton@kernel.org>

Thanks,
NeilBrown

>
NeilBrown Nov. 4, 2024, 8:30 p.m. UTC | #5
On Tue, 05 Nov 2024, Chuck Lever wrote:
> On Mon, Nov 04, 2024 at 03:47:42PM +1100, NeilBrown wrote:
> > 
> > [...]
> > 
> > Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations")
> 
> Hi Neil,
> 
> Why is a Fixes: tag necessary?
> 
> And why that commit? async copies can fail due to lack of resources
> on kernels that don't have aadc3bbea163, AFAICT.

I had hoped my commit message would have explained that, though I accept
it was not as explicit as it could be.

kmalloc(GFP_KERNEL) allocation failures aren't interesting.  They never
happen for smallish sizes, and if they do then the server is so borked
that it hardly matters what we do.

The fixed commit introduces a new failure mode that COULD easily be hit
in practice.  It causes the N+1st COPY to wait indefinitely until at
least one other copy completes which, as you observed in that commit,
could "run for a long time".  I don't think that behaviour is necessary
or appropriate.

Changing the handling for kmalloc failure was just an incidental
side-effect of changing the behaviour when the number of COPY requests
exceeded the number of configured threads.

This came up because CVE-2024-49974 was created so I had to do something
about the theoretical DoS vector in SLE kernels.  I didn't like the
patch so I backported

Commit 8d915bbf3926 ("NFSD: Force all NFSv4.2 COPY requests to be synchronous")

instead (and wondered why it hadn't gone to stable).

Thanks,
NeilBrown


Chuck Lever III Nov. 4, 2024, 8:52 p.m. UTC | #6
On Tue, Nov 05, 2024 at 07:30:03AM +1100, NeilBrown wrote:
> On Tue, 05 Nov 2024, Chuck Lever wrote:
> > On Mon, Nov 04, 2024 at 03:47:42PM +1100, NeilBrown wrote:
> > > 
> > > [...]
> > > 
> > > Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations")
> > 
> > Hi Neil,
> > 
> > Why is a Fixes: tag necessary?
> > 
> > And why that commit? async copies can fail due to lack of resources
> > on kernels that don't have aadc3bbea163, AFAICT.
> 
> I had hoped my commit message would have explained that, though I accept
> it was not as explicit as it could be.

The problem might be that you and I have different understandings of
what exactly aadc3bbea163 does.


> kmalloc(GFP_KERNEL) allocation failures aren't interesting.  They never
> happen for smallish sizes, and if they do then the server is so borked
> that it hardly matter what we do.
> 
> The fixed commit introduces a new failure mode that COULD easily be hit
> in practice.  It causes the N+1st COPY to wait indefinitely until at
> least one other copy completes which, as you observed in that commit,
> could "run for a long time".  I don't think that behaviour is necessary
> or appropriate.

The waiting happens on the client. An async COPY operation always
completes quickly on the server, in this case with NFS4ERR_DELAY. It
does not tie up an nfsd thread.

By the way, there are two fixes in this area that now appear in
v6.12-rc6 that you should check out.


> Changing the handling for kmalloc failure was just an irrelevant
> side-effect for changing the behaviour when then number of COPY requests
> exceeded the number of configured threads.

aadc3bbea163 checks the number of concurrent /async/ COPY requests,
which do not tie up nfsd threads, and thus are not limited by the
svc_thread count, as synchronous COPY operations are by definition.

I'm still thinking about the ramifications of converting an async
COPY to a sync COPY in this case. We want to reduce the server
workload in this case, rather than accommodate an aggressive client.


> This came up because CVE-2024-49974 was created so I had to do something
> about the theoretical DoS vector in SLE kernels.  I didn't like the
> patch so I backported
> 
> Commit 8d915bbf3926 ("NFSD: Force all NFSv4.2 COPY requests to be synchronous")
> 
> instead (and wondered why it hadn't gone to stable).

I was conservative about requesting a backport here. However, if a
CVE has been filed, and if there is no automation behind that
process, you can explicitly request aadc3bbea163 be backported.

The problem, to me, was less about server resource depletion and
more about client hangs.


NeilBrown Nov. 5, 2024, 12:32 a.m. UTC | #7
On Tue, 05 Nov 2024, Chuck Lever wrote:
> On Tue, Nov 05, 2024 at 07:30:03AM +1100, NeilBrown wrote:
> > On Tue, 05 Nov 2024, Chuck Lever wrote:
> > > On Mon, Nov 04, 2024 at 03:47:42PM +1100, NeilBrown wrote:
> > > > 
> > > > [...]
> > > > 
> > > > Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations")
> > > 
> > > Hi Neil,
> > > 
> > > Why is a Fixes: tag necessary?
> > > 
> > > And why that commit? async copies can fail due to lack of resources
> > > on kernels that don't have aadc3bbea163, AFAICT.
> > 
> > I had hoped my commit message would have explained that, though I accept
> > it was not as explicit as it could be.
> 
> The problem might be that you and I have different understandings of
> what exactly aadc3bbea163 does.

It might be.
My understanding is that it limits the number of concurrent async
COPY requests to ->sp_nrthreads and once that limit is reached
any further COPY requests that don't explicitly request "synchronous"
are refused with NFS4ERR_DELAY.

> 
> 
> > kmalloc(GFP_KERNEL) allocation failures aren't interesting.  They never
> > happen for smallish sizes, and if they do then the server is so borked
> > that it hardly matter what we do.
> > 
> > The fixed commit introduces a new failure mode that COULD easily be hit
> > in practice.  It causes the N+1st COPY to wait indefinitely until at
> > least one other copy completes which, as you observed in that commit,
> > could "run for a long time".  I don't think that behaviour is necessary
> > or appropriate.
> 
> The waiting happens on the client. An async COPY operation always
> completes quickly on the server, in this case with NFS4ERR_DELAY. It
> does not tie up an nfsd thread.

Agreed that it doesn't tie up an nfsd thread.  It does tie up a separate
kthread for which there is a limit matching the number of nfsd threads
(in the pool).

Agreed that the waiting happens on the client, but why should there be
any waiting at all?  The client doesn't know what it is waiting for, so
will typically wait a few seconds.  In that time many megabytes of sync
COPY could have been processed.

> 
> By the way, there are two fixes in this area that now appear in
> v6.12-rc6 that you should check out.

I'll try to schedule time to have a look - thanks.

> 
> 
> > Changing the handling for kmalloc failure was just an irrelevant
> > side-effect for changing the behaviour when then number of COPY requests
> > exceeded the number of configured threads.
> 
> aadc3bbea163 checks the number of concurrent /async/ COPY requests,
> which do not tie up nfsd threads, and thus are not limited by the
> svc_thread count, as synchronous COPY operations are by definition.

They are PRECISELY limited by the svc_thread count.  ->sp_nrthreads.

+               if (atomic_inc_return(&nn->pending_async_copies) >
+                               (int)rqstp->rq_pool->sp_nrthreads) {

> 
> I'm still thinking about the ramifications of converting an async
> COPY to a sync COPY in this case. We want to reduce the server
> workload in this case, rather than accommodate an aggressive client.

We are not "converting" an async COPY to a sync COPY.  There is no such
thing as an "async COPY" in terms of what the client requests.
The client can request "COPY", which the server may perform sync or
async, or the client can request "COPY synchronous" which the server
must perform synchronously, or refuse to perform.

By tying up a thread temporarily with a sync COPY we do reduce server
workload by potentially increasing latency to the client.  I don't
think that "aggressive" is a fair description of the client.
"opportunistic" might be reasonable.

My current thinking is that we should not start extra threads for
handling async copies.  We should create a queue of pending copies and
any nfsd thread can dequeue a copy and process 4MB each time through
"The main request loop" just like it calls nfsd_file_net_dispose() to do
a little bit of work.  That isn't needed now but I'll need something
like that before my dynamic thread pool work can land.
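A single-threaded toy model of that queue idea (the structure, names, and 4MB budget here are assumptions for illustration; a real implementation would need locking, wakeups, and client notification):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNK ((size_t)4 << 20)	/* 4 MiB of work per pass */

/* Hypothetical pending-copy queue: an nfsd thread dequeues one entry
 * per trip through the main request loop, copies up to 4MB, and
 * requeues the entry if work remains. */
struct pending_copy {
	size_t remaining;
	struct pending_copy *next;
};

static struct pending_copy *queue_head;
static struct pending_copy **queue_tail = &queue_head;

static void enqueue_copy(struct pending_copy *pc)
{
	pc->next = NULL;
	*queue_tail = pc;
	queue_tail = &pc->next;
}

static struct pending_copy *dequeue_copy(void)
{
	struct pending_copy *pc = queue_head;

	if (pc) {
		queue_head = pc->next;
		if (!queue_head)
			queue_tail = &queue_head;
	}
	return pc;
}

/* One pass of the "main request loop" side-work: returns 1 if a copy
 * made progress, 0 if the queue was empty. */
static int service_one_copy(void)
{
	struct pending_copy *pc = dequeue_copy();

	if (!pc)
		return 0;
	pc->remaining -= pc->remaining < CHUNK ? pc->remaining : CHUNK;
	if (pc->remaining)
		enqueue_copy(pc);	/* more 4MB passes needed */
	else
		free(pc);		/* copy complete */
	return 1;
}
```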

> 
> 
> > This came up because CVE-2024-49974 was created so I had to do something
> > about the theoretical DoS vector in SLE kernels.  I didn't like the
> > patch so I backported
> > 
> > Commit 8d915bbf3926 ("NFSD: Force all NFSv4.2 COPY requests to be synchronous")
> > 
> > instead (and wondered why it hadn't gone to stable).
> 
> I was conservative about requesting a backport here. However, if a
> CVE has been filed, and if there is no automation behind that
> process, you can explicitly request aadc3bbea163 be backported.
> 
> The problem, to me, was less about server resource depletion and
> more about client hangs.

And yet the patch that dealt with the less important server resource
depletion was marked for stable, and the patch that dealt with client
hangs wasn't??

The CVE was for that less important patch, probably because it contained
the magic word "DoS".

I think 8d915bbf3926 should go to stable but I would like to understand
why you felt the need to be conservative.

Thanks,
NeilBrown


Chuck Lever III Nov. 5, 2024, 2:46 p.m. UTC | #8
On Tue, Nov 05, 2024 at 11:32:48AM +1100, NeilBrown wrote:
> On Tue, 05 Nov 2024, Chuck Lever wrote:
> > On Tue, Nov 05, 2024 at 07:30:03AM +1100, NeilBrown wrote:
> > > On Tue, 05 Nov 2024, Chuck Lever wrote:
> > > > On Mon, Nov 04, 2024 at 03:47:42PM +1100, NeilBrown wrote:
> > > > > 
> > > > > [...]
> > > > > 
> > > > > Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations")
> > > > 
> > > > Hi Neil,
> > > > 
> > > > Why is a Fixes: tag necessary?
> > > > 
> > > > And why that commit? async copies can fail due to lack of resources
> > > > on kernels that don't have aadc3bbea163, AFAICT.
> > > 
> > > I had hoped my commit message would have explained that, though I accept
> > > it was not as explicit as it could be.
> > 
> > The problem might be that you and I have different understandings of
> > what exactly aadc3bbea163 does.
> 
> It might be.
> My understanding is that it limits the number of concurrent async
> COPY requests to ->sp_nrthreads and once that limit in reached
> any further COPY requests that don't explicitly request "synchronous"
> are refused with NFS4ERR_DELAY.
> 
> > > kmalloc(GFP_KERNEL) allocation failures aren't interesting.  They never
> > > happen for smallish sizes, and if they do then the server is so borked
> > > that it hardly matter what we do.
> > > 
> > > The fixed commit introduces a new failure mode that COULD easily be hit
> > > in practice.  It causes the N+1st COPY to wait indefinitely until at
> > > least one other copy completes which, as you observed in that commit,
> > > could "run for a long time".  I don't think that behaviour is necessary
> > > or appropriate.
> > 
> > The waiting happens on the client. An async COPY operation always
> > completes quickly on the server, in this case with NFS4ERR_DELAY. It
> > does not tie up an nfsd thread.
> 
> Agreed that it doesn't tie up an nfsd thread.  It does tie up a separate
> kthread for which there is a limit matching the number of nfsd threads
> (in the pool).
> 
> Agreed that the waiting happens on the client, but why should there be
> any waiting at all?  The client doesn't know what it is waiting for, so
> will typically wait a few seconds.  In that time many megabytes of sync
> COPY could have been processed.

The Linux NFS client's usual delay is, IIRC, 100 msec with
exponential backoff.
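As a rough model of that client behaviour (the 100ms base comes from Chuck's recollection above; the doubling and the 15s cap are assumptions for illustration, not the client's actual constants):

```c
#include <assert.h>

/* Rough model of NFS4ERR_DELAY retry backoff: start at 100ms and
 * double on each retry, capped here at 15s for illustration. */
static unsigned int nfs_delay_retry_ms(unsigned int attempt)
{
	unsigned int delay = 100;

	while (attempt-- && delay < 15000)
		delay *= 2;
	return delay < 15000 ? delay : 15000;
}
```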

It's possible that the number of async copies is large because they
are running on a slow export. Adding more copy work is going to make
the situation worse -- and by handling the work with a synchronous
COPY, it will tie up threads that should be available for other
work.

The feedback loop here should result in reducing server workload,
not increasing it.


> > By the way, there are two fixes in this area that now appear in
> > v6.12-rc6 that you should check out.
> 
> I'll try to schedule time to have a look - thanks.
> 
> > > Changing the handling for kmalloc failure was just an irrelevant
> > > side-effect for changing the behaviour when then number of COPY requests
> > > exceeded the number of configured threads.
> > 
> > aadc3bbea163 checks the number of concurrent /async/ COPY requests,
> > which do not tie up nfsd threads, and thus are not limited by the
> > svc_thread count, as synchronous COPY operations are by definition.
> 
> They are PRECISELY limited by the svc_thread count.  ->sp_nrthreads.

I was describing the situation /before/ aadc3bbea163, when there
was no limit at all.

Note that this is an arbitrary limit. We could pick something else if
this limit interferes with the dynamic thread count changes.


> My current thinking is that we should not start extra threads for
> handling async copies.  We should create a queue of pending copies and
> any nfsd thread can dequeue a copy and process 4MB each time through
> "The main request loop" just like it calls nfsd_file_net_dispose() to do
> a little bit of work.

Having nfsd threads handle this workload again invites a DoS vector.

The 4MB chunk limit is there precisely to prevent synchronous COPY
operations from tying up nfsd threads for too long. On a slow export,
this is still not going to work, so I'd rather see a timer for this
protection; say, 30ms, rather than a byte count. If more than 4MB
can be handled quickly, that will be good for throughput.

Note that we still want to limit the number of background copy
operations going on. I don't want a mechanism where a client can
start an unbounded amount of work on the server.
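A sketch of that kind of admission control with a simple atomic
counter; the names are invented here, and in aadc3bbea163 the cap is
tied to the nfsd thread count rather than being a caller-supplied
argument:

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int async_copies;	/* in-flight background copies */

/* Claim a background-copy slot, or fail if @limit are already
 * running, in which case the caller must fall back (NFS4ERR_DELAY,
 * or a synchronous copy as proposed in this thread). */
static bool try_begin_async_copy(int limit)
{
	int cur = atomic_load(&async_copies);

	while (cur < limit) {
		/* CAS so two racing requests can't both take the
		 * last slot; on failure cur is reloaded. */
		if (atomic_compare_exchange_weak(&async_copies, &cur,
						 cur + 1))
			return true;
	}
	return false;		/* at the cap */
}

static void end_async_copy(void)
{
	atomic_fetch_sub(&async_copies, 1);
}
```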


> > > This came up because CVE-2024-49974 was created so I had to do something
> > > about the theoretical DoS vector in SLE kernels.  I didn't like the
> > > patch so I backported
> > > 
> > > Commit 8d915bbf3926 ("NFSD: Force all NFSv4.2 COPY requests to be synchronous")
> > > 
> > > instead (and wondered why it hadn't gone to stable).
> > 
> > I was conservative about requesting a backport here. However, if a
> > CVE has been filed, and if there is no automation behind that
> > process, you can explicitly request aadc3bbea163 be backported.
> > 
> > The problem, to me, was less about server resource depletion and
> > more about client hangs.
> 
> And yet the patch that dealt with the less important server resource
> depletion was marked for stable, and the patch that dealt with client
> hangs wasn't??
> 
> The CVE was for that less important patch, probably because it contained
> the magic word "DoS".

Quite likely. I wasn't consulted before the CVE was opened, nor was
I notified that it had been created.

Note that distributions are encouraged to evaluate whether a CVE is
serious enough to address, rather than simply backporting the fixes
automatically. But I know some customers want every CVE handled, so
that is sometimes difficult.


> I think 8d915bbf3926 should go to stable but I would like to understand
> why you felt the need to be conservative.

First, I'm told that LTS generally avoids taking backports that
overtly change user-visible behavior like disabling server-to-server
copy (which requires async COPY to work). That was the main reason
for my hesitance.

But more importantly, the problem with the automatic backport
mechanism is that marked patches are taken /immediately/ into
stable. They don't get the kind of soak time that a normally-merged
unmarked patch gets. The only way to ensure they get any real-world
test experience at all is to not mark them, and then come back to
them later and explicitly request a backport.

And, generally, we want to know that a patch destined for LTS
kernels has actually been applied to and tested on LTS first.
Automatically backported patches don't get that verification at all.

My overall preference is that Fixes: patches should be ignored by
the automation, and that we have a designated NFSD LTS maintainer
who will test patches on each LTS kernel and request their backport.
I haven't found anyone to do that work, so we are limping along with
the current situation. I recognize, however, that this needs to
improve somehow with only the maintainer resources we have.


> > > > > Signed-off-by: NeilBrown <neilb@suse.de>
> > > > > ---
> > > > >  fs/nfsd/nfs4proc.c | 4 ++--
> > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > > > > index fea171ffed62..06e0d9153ca9 100644
> > > > > --- a/fs/nfsd/nfs4proc.c
> > > > > +++ b/fs/nfsd/nfs4proc.c
> > > > > @@ -1972,6 +1972,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > > >  		wake_up_process(async_copy->copy_task);
> > > > >  		status = nfs_ok;
> > > > >  	} else {
> > > > > +	retry_sync:
> > > > >  		status = nfsd4_do_copy(copy, copy->nf_src->nf_file,
> > > > >  				       copy->nf_dst->nf_file, true);
> > > > >  	}
> > > > > @@ -1990,8 +1991,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > > >  	}
> > > > >  	if (async_copy)
> > > > >  		cleanup_async_copy(async_copy);
> > > > > -	status = nfserr_jukebox;
> > > > > -	goto out;
> > > > > +	goto retry_sync;
> > > > >  }
> > > > >  
> > > > >  static struct nfsd4_copy *
> > > > > 
> > > > > base-commit: 26e6e693936986309c01e8bb80e318d63fda4a44
> > > > > -- 
> > > > > 2.47.0
> > > > > 
> > > > 
> > > > -- 
> > > > Chuck Lever
> > > > 
> > > 
> > 
> > -- 
> > Chuck Lever
> > 
>
NeilBrown Nov. 5, 2024, 9:06 p.m. UTC | #9
On Wed, 06 Nov 2024, Chuck Lever wrote:
> On Tue, Nov 05, 2024 at 11:32:48AM +1100, NeilBrown wrote:
> > On Tue, 05 Nov 2024, Chuck Lever wrote:
> > > On Tue, Nov 05, 2024 at 07:30:03AM +1100, NeilBrown wrote:
> > > > On Tue, 05 Nov 2024, Chuck Lever wrote:
> > > > > On Mon, Nov 04, 2024 at 03:47:42PM +1100, NeilBrown wrote:
> > > > > > 
> > > > > > An NFSv4.2 COPY request can explicitly request a synchronous copy.  If
> > > > > > that is not requested then the server is free to perform the copy
> > > > > > synchronously or asynchronously.
> > > > > > 
> > > > > > In the Linux implementation an async copy requires more resources than a
> > > > > > sync copy.  If nfsd cannot allocate these resources, the best response
> > > > > > is to simply perform the copy (or the first 4MB of it) synchronously.
> > > > > > 
> > > > > > This choice may be debatable if the unavailable resource was due to
> > > > > > memory allocation failure - when memalloc fails it might be best to
> > > > > > simply give up as the server is clearly under load.  However in the case
> > > > > > that policy prevents another kthread being created there is no benefit
> > > > > > and much cost is failing with NFS4ERR_DELAY.  In that case it seems
> > > > > > reasonable to avoid that error in all circumstances.
> > > > > > 
> > > > > > So change the out_err case to retry as a sync copy.
> > > > > > 
> > > > > > Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations")
> > > > > 
> > > > > Hi Neil,
> > > > > 
> > > > > Why is a Fixes: tag necessary?
> > > > > 
> > > > > And why that commit? async copies can fail due to lack of resources
> > > > > on kernels that don't have aadc3bbea163, AFAICT.
> > > > 
> > > > I had hoped my commit message would have explained that, though I accept
> > > > it was not as explicit as it could be.
> > > 
> > > The problem might be that you and I have different understandings of
> > > what exactly aadc3bbea163 does.
> > 
> > It might be.
> > My understanding is that it limits the number of concurrent async
> > COPY requests to ->sp_nrthreads and once that limit is reached
> > any further COPY requests that don't explicitly request "synchronous"
> > are refused with NFS4ERR_DELAY.
> > 
> > > > kmalloc(GFP_KERNEL) allocation failures aren't interesting.  They never
> > > > happen for smallish sizes, and if they do then the server is so borked
> > > > that it hardly matter what we do.
> > > > 
> > > > The fixed commit introduces a new failure mode that COULD easily be hit
> > > > in practice.  It causes the N+1st COPY to wait indefinitely until at
> > > > least one other copy completes which, as you observed in that commit,
> > > > could "run for a long time".  I don't think that behaviour is necessary
> > > > or appropriate.
> > > 
> > > The waiting happens on the client. An async COPY operation always
> > > completes quickly on the server, in this case with NFS4ERR_DELAY. It
> > > does not tie up an nfsd thread.
> > 
> > Agreed that it doesn't tie up an nfsd thread.  It does tie up a separate
> > kthread for which there is a limit matching the number of nfsd threads
> > (in the pool).
> > 
> > Agreed that the waiting happens on the client, but why should there be
> > any waiting at all?  The client doesn't know what it is waiting for, so
> > will typically wait a few seconds.  In that time many megabytes of sync
> > COPY could have been processed.
> 
> The Linux NFS client's usual delay is, IIRC, 100 msec with
> exponential backoff.

Yep, up to 15seconds.

> 
> It's possible that the number of async copies is large because they
> are running on a slow export. Adding more copy work is going to make
> the situation worse -- and by handling the work with a synchronous
> COPY, it will tie up threads that should be available for other
> work.

Should be available for "work" yes, but why "other work"?  Is COPY work
not as important as READ or WRITE or GETATTR work?
READ/WRITE are limited to 1MB, sync-COPY to 4MB, so there is a small
difference there, but it doesn't seem substantial.

> 
> The feedback loop here should result in reducing server workload,
> not increasing it.

I agree with not increasing it.  I don't see the rationale for reducing
workload, only for limiting it.

> 
> 
> > > By the way, there are two fixes in this area that now appear in
> > > v6.12-rc6 that you should check out.
> > 
> > I'll try to schedule time to have a look - thanks.
> > 
> > > > Changing the handling for kmalloc failure was just an irrelevant
> > > > side-effect for changing the behaviour when the number of COPY requests
> > > > exceeded the number of configured threads.
> > > 
> > > aadc3bbea163 checks the number of concurrent /async/ COPY requests,
> > > which do not tie up nfsd threads, and thus are not limited by the
> > > svc_thread count, as synchronous COPY operations are by definition.
> > 
> > They are PRECISELY limited by the svc_thread count.  ->sp_nrthreads.
> 
> I was describing the situation /before/ aadc3bbea163 , when there
> was no limit at all.
> 
> Note that this is an arbitrary limit. We could pick something else if
> this limit interferes with the dynamic thread count changes.
> 
> 
> > My current thinking is that we should not start extra threads for
> > handling async copies.  We should create a queue of pending copies and
> > any nfsd thread can dequeue a copy and process 4MB each time through
> > "The main request loop" just like it calls nfsd_file_net_dispose() to do
> > a little bit of work.
> 
> Having nfsd threads handle this workload again invites a DoS vector.

Any more so than having nfsd threads handle a WRITE workload?

> 
> The 4MB chunk limit is there precisely to prevent synchronous COPY
> operations from tying up nfsd threads for too long. On a slow export,
> this is still not going to work, so I'd rather see a timer for this
> protection; say, 30ms, rather than a byte count. If more than 4MB
> can be handled quickly, that will be good for throughput.

That sounds like a good goal.  Ideally we would need a way to negotiate
a window with write-back throttling so that we don't bother reading
until we know that writing to the page-cache won't block.  Certainly
worth exploring.

> 
> Note that we still want to limit the number of background copy
> operations going on. I don't want a mechanism where a client can
> start an unbounded amount of work on the server.

This isn't obvious to me.  The server makes no promise concerning the
throughput it will provide.  Having a list of COPY requests that add up
to many terabytes isn't intrinsically a problem.  Having millions of
COPY requests in the list *might* be a little bit of a burden.  Using
__GFP_NORETRY might help put a natural limit on that.
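The queue-of-pending-copies idea mentioned above could be sketched like
this in userspace; the names, the singly linked list, and the 4 MB step
are illustrative assumptions, not nfsd code:

```c
#include <stddef.h>
#include <stdlib.h>

#define COPY_CHUNK (4u * 1024 * 1024)

struct copy_job {
	struct copy_job *next;
	size_t remaining;	/* bytes still to copy */
};

static struct copy_job *queue_head, **queue_tail = &queue_head;

static void enqueue_copy(struct copy_job *job)
{
	job->next = NULL;
	*queue_tail = job;
	queue_tail = &job->next;
}

static struct copy_job *dequeue_copy(void)
{
	struct copy_job *job = queue_head;

	if (job) {
		queue_head = job->next;
		if (!queue_head)
			queue_tail = &queue_head;
	}
	return job;
}

/* One pass of the "main request loop": advance at most one pending
 * copy by one bounded chunk, then re-queue it if unfinished, so no
 * single copy monopolises a worker. */
static void service_one_chunk(void)
{
	struct copy_job *job = dequeue_copy();

	if (!job)
		return;
	job->remaining -= job->remaining < COPY_CHUNK ? job->remaining
						      : COPY_CHUNK;
	if (job->remaining)
		enqueue_copy(job);	/* take another turn later */
	else
		free(job);
}
```

Because each pass does bounded work, a terabyte-sized backlog costs
only list nodes, not threads, which is the shape of the trade-off being
argued here.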

> 
> 
> > > > This came up because CVE-2024-49974 was created so I had to do something
> > > > about the theoretical DoS vector in SLE kernels.  I didn't like the
> > > > patch so I backported
> > > > 
> > > > Commit 8d915bbf3926 ("NFSD: Force all NFSv4.2 COPY requests to be synchronous")
> > > > 
> > > > instead (and wondered why it hadn't gone to stable).
> > > 
> > > I was conservative about requesting a backport here. However, if a
> > > CVE has been filed, and if there is no automation behind that
> > > process, you can explicitly request aadc3bbea163 be backported.
> > > 
> > > The problem, to me, was less about server resource depletion and
> > > more about client hangs.
> > 
> > And yet the patch that dealt with the less important server resource
> > depletion was marked for stable, and the patch that dealt with client
> > hangs wasn't??
> > 
> > The CVE was for that less important patch, probably because it contained
> > the magic word "DoS".
> 
> Quite likely. I wasn't consulted before the CVE was opened, nor was
> I notified that it had been created.
> 
> Note that distributions are encouraged to evaluate whether a CVE is
> serious enough to address, rather than simply backporting the fixes
> automatically. But I know some customers want every CVE handled, so
> that is sometimes difficult.

Yes, it needs to be handled.  Declaring it invalid is certainly an
option for handling it.  I didn't quite feel I could justify that in
this case.

> 
> 
> > I think 8d915bbf3926 should go to stable but I would like to understand
> > why you felt the need to be conservative.
> 
> First, I'm told that LTS generally avoids taking backports that
> overtly change user-visible behavior like disabling server-to-server
> copy (which requires async COPY to work). That was the main reason
> for my hesitance.

Why does server-to-server require async COPY?
RFC 7862 section 4.5.  Inter-Server Copy says
  The destination server may perform the copy synchronously or
  asynchronously.
but I see that nfsd4_copy() returns nfs4err_notsupp if an inter-server
copy is synchronous (copy_is_sync()), though it isn't immediately
obvious to me why.  The patch which landed this functionality doesn't
explain the restriction.

I guess that with multiple 4MB sync COPY requests the server would need
to repeatedly mount and unmount the source server which could be
unnecessary work - or would need to cache the mount and know when to
unmount it....

On the other hand, the client unexpectedly hanging while waiting for a
server-side copy completion notification that it will never get is
user-visible behaviour that it would be desirable to change.

I'm starting to see why this is awkward.

> 
> But more importantly, the problem with the automatic backport
> mechanism is that marked patches are taken /immediately/ into
> stable. They don't get the kind of soak time that a normally-merged
> unmarked patch gets. The only way to ensure they get any real-world
> test experience at all is to not mark them, and then come back to
> them later and explicitly request a backport.
> 
> And, generally, we want to know that a patch destined for LTS
> kernels has actually been applied to and tested on LTS first.
> Automatically backported patches don't get that verification at all.

I thought it was possible to mark patches to tell the stable team
exactly what you want.  Greg certainly seems eager to give maintainers as
much control as they ask for - without requiring them to do anything
they don't want to do.  If you have a clear idea of what you want, it
might be good to spell that out and ask how to achieve it.

> 
> My overall preference is that Fixes: patches should be ignored by
> the automation, and that we have a designated NFSD LTS maintainer
> who will test patches on each LTS kernel and request their backport.
> I haven't found anyone to do that work, so we are limping along with
> the current situation. I recognize, however, that this needs to
> improve somehow with only the maintainer resources we have.

:-)

Thanks,
NeilBrown

> 
> 
> > > > > > Signed-off-by: NeilBrown <neilb@suse.de>
> > > > > > ---
> > > > > >  fs/nfsd/nfs4proc.c | 4 ++--
> > > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > > 
> > > > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > > > > > index fea171ffed62..06e0d9153ca9 100644
> > > > > > --- a/fs/nfsd/nfs4proc.c
> > > > > > +++ b/fs/nfsd/nfs4proc.c
> > > > > > @@ -1972,6 +1972,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > > > >  		wake_up_process(async_copy->copy_task);
> > > > > >  		status = nfs_ok;
> > > > > >  	} else {
> > > > > > +	retry_sync:
> > > > > >  		status = nfsd4_do_copy(copy, copy->nf_src->nf_file,
> > > > > >  				       copy->nf_dst->nf_file, true);
> > > > > >  	}
> > > > > > @@ -1990,8 +1991,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > > > >  	}
> > > > > >  	if (async_copy)
> > > > > >  		cleanup_async_copy(async_copy);
> > > > > > -	status = nfserr_jukebox;
> > > > > > -	goto out;
> > > > > > +	goto retry_sync;
> > > > > >  }
> > > > > >  
> > > > > >  static struct nfsd4_copy *
> > > > > > 
> > > > > > base-commit: 26e6e693936986309c01e8bb80e318d63fda4a44
> > > > > > -- 
> > > > > > 2.47.0
> > > > > > 
> > > > > 
> > > > > -- 
> > > > > Chuck Lever
> > > > > 
> > > > 
> > > 
> > > -- 
> > > Chuck Lever
> > > 
> > 
> 
> -- 
> Chuck Lever
>
Chuck Lever III Nov. 5, 2024, 9:28 p.m. UTC | #10
On Wed, Nov 06, 2024 at 08:06:40AM +1100, NeilBrown wrote:
> On Wed, 06 Nov 2024, Chuck Lever wrote:
> > But more importantly, the problem with the automatic backport
> > mechanism is that marked patches are taken /immediately/ into
> > stable. They don't get the kind of soak time that a normally-merged
> > unmarked patch gets. The only way to ensure they get any real-world
> > test experience at all is to not mark them, and then come back to
> > them later and explicitly request a backport.
> > 
> > And, generally, we want to know that a patch destined for LTS
> > kernels has actually been applied to and tested on LTS first.
> > Automatically backported patches don't get that verification at all.
> 
> I thought it was possible to mark patches to tell the stable team
> exactly what you want.  Greg certainly seems eager to give maintainers as
> much control as they ask for - without requiring them to do anything
> they don't want to do.

Yes, Greg and Sasha want to do something constructive. I see
suggestions flying by from time to time. The last specific one I
think was rejected by Linus. It appears to be an ongoing
conversation.


> If you have a clear idea of what you want, it
> might be good to spell that out and ask how to achieve it.

Perhaps having a separate review process, where each patch in
nfsd-next is audited just before the merge window to see whether it
should be marked for stable, would help. Keeping written notes on
these decisions would also be helpful; Greg asks when an explicit
backport request comes by why it wasn't marked when it went into
Linus' branch.

Making it easier / more automated to test-apply such patches to LTS
kernels is also something to think about. I have nightly CI for
those branches, but it tests what is already in the LTS queue rather
than testing NFSD-specific patches.
Olga Kornievskaia Nov. 5, 2024, 11:47 p.m. UTC | #11
On Tue, Nov 5, 2024 at 4:06 PM NeilBrown <neilb@suse.de> wrote:
>
> On Wed, 06 Nov 2024, Chuck Lever wrote:
> > On Tue, Nov 05, 2024 at 11:32:48AM +1100, NeilBrown wrote:
> > > On Tue, 05 Nov 2024, Chuck Lever wrote:
> > > > On Tue, Nov 05, 2024 at 07:30:03AM +1100, NeilBrown wrote:
> > > > > On Tue, 05 Nov 2024, Chuck Lever wrote:
> > > > > > On Mon, Nov 04, 2024 at 03:47:42PM +1100, NeilBrown wrote:
> > > > > > >
> > > > > > > An NFSv4.2 COPY request can explicitly request a synchronous copy.  If
> > > > > > > that is not requested then the server is free to perform the copy
> > > > > > > synchronously or asynchronously.
> > > > > > >
> > > > > > > In the Linux implementation an async copy requires more resources than a
> > > > > > > sync copy.  If nfsd cannot allocate these resources, the best response
> > > > > > > is to simply perform the copy (or the first 4MB of it) synchronously.
> > > > > > >
> > > > > > > This choice may be debatable if the unavailable resource was due to
> > > > > > > memory allocation failure - when memalloc fails it might be best to
> > > > > > > simply give up as the server is clearly under load.  However in the case
> > > > > > > that policy prevents another kthread being created there is no benefit
> > > > > > > and much cost is failing with NFS4ERR_DELAY.  In that case it seems
> > > > > > > reasonable to avoid that error in all circumstances.
> > > > > > >
> > > > > > > So change the out_err case to retry as a sync copy.
> > > > > > >
> > > > > > > Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations")
> > > > > >
> > > > > > Hi Neil,
> > > > > >
> > > > > > Why is a Fixes: tag necessary?
> > > > > >
> > > > > > And why that commit? async copies can fail due to lack of resources
> > > > > > on kernels that don't have aadc3bbea163, AFAICT.
> > > > >
> > > > > I had hoped my commit message would have explained that, though I accept
> > > > > it was not as explicit as it could be.
> > > >
> > > > The problem might be that you and I have different understandings of
> > > > what exactly aadc3bbea163 does.
> > >
> > > It might be.
> > > My understanding is that it limits the number of concurrent async
> > > COPY requests to ->sp_nrthreads and once that limit is reached
> > > any further COPY requests that don't explicitly request "synchronous"
> > > are refused with NFS4ERR_DELAY.
> > >
> > > > > kmalloc(GFP_KERNEL) allocation failures aren't interesting.  They never
> > > > > happen for smallish sizes, and if they do then the server is so borked
> > > > > that it hardly matter what we do.
> > > > >
> > > > > The fixed commit introduces a new failure mode that COULD easily be hit
> > > > > in practice.  It causes the N+1st COPY to wait indefinitely until at
> > > > > least one other copy completes which, as you observed in that commit,
> > > > > could "run for a long time".  I don't think that behaviour is necessary
> > > > > or appropriate.
> > > >
> > > > The waiting happens on the client. An async COPY operation always
> > > > completes quickly on the server, in this case with NFS4ERR_DELAY. It
> > > > does not tie up an nfsd thread.
> > >
> > > Agreed that it doesn't tie up an nfsd thread.  It does tie up a separate
> > > kthread for which there is a limit matching the number of nfsd threads
> > > (in the pool).
> > >
> > > Agreed that the waiting happens on the client, but why should there be
> > > any waiting at all?  The client doesn't know what it is waiting for, so
> > > will typically wait a few seconds.  In that time many megabytes of sync
> > > COPY could have been processed.
> >
> > The Linux NFS client's usual delay is, IIRC, 100 msec with
> > exponential backoff.
>
> Yep, up to 15seconds.
>
> >
> > It's possible that the number of async copies is large because they
> > are running on a slow export. Adding more copy work is going to make
> > the situation worse -- and by handling the work with a synchronous
> > COPY, it will tie up threads that should be available for other
> > work.
>
> Should be available for "work" yes, but why "other work"?  Is COPY work
> not as important as READ or WRITE or GETATTR work?
> READ/WRITE are limited to 1MB, sync-COPY to 4MB, so there is a small
> difference there, but it doesn't seem substantial.
>
> >
> > The feedback loop here should result in reducing server workload,
> > not increasing it.
>
> I agree with not increasing it.  I don't see the rationale for reducing
> workload, only for limiting it.
>
> >
> >
> > > > By the way, there are two fixes in this area that now appear in
> > > > v6.12-rc6 that you should check out.
> > >
> > > I'll try to schedule time to have a look - thanks.
> > >
> > > > > Changing the handling for kmalloc failure was just an irrelevant
> > > > > side-effect for changing the behaviour when the number of COPY requests
> > > > > exceeded the number of configured threads.
> > > >
> > > > aadc3bbea163 checks the number of concurrent /async/ COPY requests,
> > > > which do not tie up nfsd threads, and thus are not limited by the
> > > > svc_thread count, as synchronous COPY operations are by definition.
> > >
> > > They are PRECISELY limited by the svc_thread count.  ->sp_nrthreads.
> >
> > I was describing the situation /before/ aadc3bbea163 , when there
> > was no limit at all.
> >
> > Note that this is an arbitrary limit. We could pick something else if
> > this limit interferes with the dynamic thread count changes.
> >
> >
> > > My current thinking is that we should not start extra threads for
> > > handling async copies.  We should create a queue of pending copies and
> > > any nfsd thread can dequeue a copy and process 4MB each time through
> > > "The main request loop" just like it calls nfsd_file_net_dispose() to do
> > > a little bit of work.
> >
> > Having nfsd threads handle this workload again invites a DoS vector.
>
> Any more so than having nfsd threads handle a WRITE workload?

Isn't a difference between a COPY and a WRITE the fact that the server
has the ability to restrict the TCP window of the client sending the
bytes? And for simplicity's sake, if we assume the client/server has a
single TCP stream when the window size is limited then other WRITEs
are also prevented from sending more data. But a COPY operation
doesn't require much to be sent and preventing the client from sending
another COPY can't be achieved thru TCP window size.

> > The 4MB chunk limit is there precisely to prevent synchronous COPY
> > operations from tying up nfsd threads for too long. On a slow export,
> > this is still not going to work, so I'd rather see a timer for this
> > protection; say, 30ms, rather than a byte count. If more than 4MB
> > can be handled quickly, that will be good for throughput.
>
> That sounds like a good goal.  Ideally we would need a way to negotiate
> a window with write-back throttling so that we don't bother reading
> until we know that writing to the page-cache won't block.  Certainly
> worth exploring.
>
> >
> > Note that we still want to limit the number of background copy
> > operations going on. I don't want a mechanism where a client can
> > start an unbounded amount of work on the server.
>
> This isn't obvious to me.  The server makes no promise concerning the
> throughput it will provide.  Having a list of COPY requests that add up
> to many terabytes isn't intrinsically a problem.  Having millions of
> COPY requests in the list *might* be a little bit of a burden.  Using
> __GFP_NORETRY might help put a natural limit on that.
>
> >
> >
> > > > > This came up because CVE-2024-49974 was created so I had to do something
> > > > > about the theoretical DoS vector in SLE kernels.  I didn't like the
> > > > > patch so I backported
> > > > >
> > > > > Commit 8d915bbf3926 ("NFSD: Force all NFSv4.2 COPY requests to be synchronous")

I'm doing the same for RHEL. But the fact that a CVE was created at
all seems like a flaw of the CVE creation process in this case. It
should never have been made a CVE.

> > > > >
> > > > > instead (and wondered why it hadn't gone to stable).
> > > >
> > > > I was conservative about requesting a backport here. However, if a
> > > > CVE has been filed, and if there is no automation behind that
> > > > process, you can explicitly request aadc3bbea163 be backported.
> > > >
> > > > The problem, to me, was less about server resource depletion and
> > > > more about client hangs.
> > >
> > > And yet the patch that dealt with the less important server resource
> > > depletion was marked for stable, and the patch that dealt with client
> > > hangs wasn't??
> > >
> > > The CVE was for that less important patch, probably because it contained
> > > the magic word "DoS".
> >
> > Quite likely. I wasn't consulted before the CVE was opened, nor was
> > I notified that it had been created.
> >
> > Note that distributions are encouraged to evaluate whether a CVE is
> > serious enough to address, rather than simply backporting the fixes
> > automatically. But I know some customers want every CVE handled, so
> > that is sometimes difficult.
>
> Yes, it needs to be handled.  Declaring it invalid is certainly an
> option for handling it.  I didn't quite feel I could justify that in
> this case.
>
> >
> >
> > > I think 8d915bbf3926 should go to stable but I would like to understand
> > > why you felt the need to be conservative.
> >
> > First, I'm told that LTS generally avoids taking backports that
> > overtly change user-visible behavior like disabling server-to-server
> > copy (which requires async COPY to work). That was the main reason
> > for my hesitance.
>
> Why does server-to-server require async COPY?
> RFC 7862 section 4.5.  Inter-Server Copy says
>   The destination server may perform the copy synchronously or
>   asynchronously.
> but I see that nfsd4_copy() returns nfs4err_notsupp if the inter-server
> copy_is_sync(), but it isn't immediately obvious to me why.  The patch
> which landed this functionality doesn't explain the restriction.

The choice (my choice) not to implement a synchronous inter-server
copy was based on my understanding that such an operation would take
too much time and tie up a knfsd thread.

> I guess that with multiple 4MB sync COPY requests the server would need
> to repeatedly mount and unmount the source server which could be
> unnecessary work - or would need to cache the mount and know when to
> unmount it....

The implementation caches the (source server) mount for a period of
time. But needing to issue multiple COPY can eventually impact
performance.

> On the other hand, the client unexpectedly hanging while waiting for a
> server-side copy completion notification that it will never get is
> user-visible behaviour that it would be desirable to change.

> I'm starting to see why this is awkward.
>
> >
> > But more importantly, the problem with the automatic backport
> > mechanism is that marked patches are taken /immediately/ into
> > stable. They don't get the kind of soak time that a normally-merged
> > unmarked patch gets. The only way to ensure they get any real-world
> > test experience at all is to not mark them, and then come back to
> > them later and explicitly request a backport.
> >
> > And, generally, we want to know that a patch destined for LTS
> > kernels has actually been applied to and tested on LTS first.
> > Automatically backported patches don't get that verification at all.
>
> I thought it was possible to mark patches to tell the stable team
> exactly what you want.  Greg certainly seems eager to give maintainers as
> much control as they ask for - without requiring them to do anything
> they don't want to do.  If you have a clear idea of what you want, it
> might be good to spell that out and ask how to achieve it.
>
> >
> > My overall preference is that Fixes: patches should be ignored by
> > the automation, and that we have a designated NFSD LTS maintainer
> > who will test patches on each LTS kernel and request their backport.
> > I haven't found anyone to do that work, so we are limping along with
> > the current situation. I recognize, however, that this needs to
> > improve somehow with only the maintainer resources we have.
>
> :-)
>
> Thanks,
> NeilBrown
>
> >
> >
> > > > > > > Signed-off-by: NeilBrown <neilb@suse.de>
> > > > > > > ---
> > > > > > >  fs/nfsd/nfs4proc.c | 4 ++--
> > > > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > > >
> > > > > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > > > > > > index fea171ffed62..06e0d9153ca9 100644
> > > > > > > --- a/fs/nfsd/nfs4proc.c
> > > > > > > +++ b/fs/nfsd/nfs4proc.c
> > > > > > > @@ -1972,6 +1972,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > > > > >             wake_up_process(async_copy->copy_task);
> > > > > > >             status = nfs_ok;
> > > > > > >     } else {
> > > > > > > +   retry_sync:
> > > > > > >             status = nfsd4_do_copy(copy, copy->nf_src->nf_file,
> > > > > > >                                    copy->nf_dst->nf_file, true);
> > > > > > >     }
> > > > > > > @@ -1990,8 +1991,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > > > > > >     }
> > > > > > >     if (async_copy)
> > > > > > >             cleanup_async_copy(async_copy);
> > > > > > > -   status = nfserr_jukebox;
> > > > > > > -   goto out;
> > > > > > > +   goto retry_sync;
> > > > > > >  }
> > > > > > >
> > > > > > >  static struct nfsd4_copy *
> > > > > > >
> > > > > > > base-commit: 26e6e693936986309c01e8bb80e318d63fda4a44
> > > > > > > --
> > > > > > > 2.47.0
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Chuck Lever
> > > > > >
> > > > >
> > > >
> > > > --
> > > > Chuck Lever
> > > >
> > >
> >
> > --
> > Chuck Lever
> >
>
>
NeilBrown Nov. 6, 2024, 4:30 a.m. UTC | #12
On Wed, 06 Nov 2024, Olga Kornievskaia wrote:
> On Tue, Nov 5, 2024 at 4:06 PM NeilBrown <neilb@suse.de> wrote:
> >
> > On Wed, 06 Nov 2024, Chuck Lever wrote:
> > >
> > > Having nfsd threads handle this workload again invites a DoS vector.
> >
> > Any more so than having nfsd threads handling a WRITE workload?
> 
> Isn't a difference between a COPY and a WRITE the fact that the server
> has the ability to restrict the TCP window of the client sending the
> bytes. And for simplicity's sake, if we assume client/server has a
> single TCP stream when the window size is limited then other WRITEs
> are also prevented from sending more data. But a COPY operation
> doesn't require much to be sent and preventing the client from sending
> another COPY can't be achieved thru TCP window size.

I think you are saying that the WRITE requests are naturally throttled. 
However that isn't necessarily the case.  If the network is
significantly faster than the storage path, and if a client (or several
clients) send enough concurrent WRITE requests, then all nfsd threads
could get stuck in writeback throttling code - which could be seen as a
denial of service as no other requests could be serviced until
congestion eases.

So maybe some threads should be reserved for non-IO requests and so
would avoid dequeuing WRITE and READ and COPY requests - and would not
be involved in async COPY.

So I'm still not convinced that COPY in any way introduced a new DoS
problem - providing the limit of async-copy threads is in place.

I wonder if we could use request deferral to delay IO requests until an
IO thread is available...

> 
> > > The 4MB chunk limit is there precisely to prevent synchronous COPY
> > > operations from tying up nfsd threads for too long. On a slow export,
> > > this is still not going to work, so I'd rather see a timer for this
> > > protection; say, 30ms, rather than a byte count. If more than 4MB
> > > can be handled quickly, that will be good for throughput.
> >
> > That sounds like a good goal.  Ideally we would need a way to negotiate
> > a window with write-back throttling so that we don't bother reading
> > until we know that writing to the page-cache won't block.  Certainly
> > worth exploring.
> >
> > >
> > > Note that we still want to limit the number of background copy
> > > operations going on. I don't want a mechanism where a client can
> > > start an unbounded amount of work on the server.
> >
> > This isn't obvious to me.  The server makes no promise concerning the
> > throughput it will provide.  Having a list of COPY requests that add up
> > to many terabytes isn't intrinsically a problem.  Having millions of
> > COPY requests in the list *might* be a little bit of a burden.  Using
> > __GFP_NORETRY might help put a natural limit on that.
> >
> > >
> > >
> > > > > > This came up because CVE-2024-49974 was created so I had to do something
> > > > > > about the theoretical DoS vector in SLE kernels.  I didn't like the
> > > > > > patch so I backported
> > > > > >
> > > > > > Commit 8d915bbf3926 ("NFSD: Force all NFSv4.2 COPY requests to be synchronous")
> 
> I'm doing the same for RHEL. But the fact that CVE was created seems
> like a flaw of the CVE creation process in this case. It should have
> never been made a CVE.

I think everyone agrees that the CVE process is flawed.  Fixing it is
not so easy :-(
I'm glad you are doing the same.  I'm a little concerned that this
disabled inter-server copies too.  I guess the answer to that is to help
resolve the problems with async copy so that it can be re-enabled.

> 
> > > > > >
> > > > > > instead (and wondered why it hadn't gone to stable).
> > > > >
> > > > > I was conservative about requesting a backport here. However, if a
> > > > > CVE has been filed, and if there is no automation behind that
> > > > > process, you can explicitly request aadc3bbea163 be backported.
> > > > >
> > > > > The problem, to me, was less about server resource depletion and
> > > > > more about client hangs.
> > > >
> > > > And yet the patch that dealt with the less important server resource
> > > > depletion was marked for stable, and the patch that dealt with client
> > > > hangs wasn't??
> > > >
> > > > The CVE was for that less important patch, probably because it contained
> > > > the magic word "DoS".
> > >
> > > Quite likely. I wasn't consulted before the CVE was opened, nor was
> > > I notified that it had been created.
> > >
> > > Note that distributions are encouraged to evaluate whether a CVE is
> > > serious enough to address, rather than simply backporting the fixes
> > > automatically. But I know some customers want every CVE handled, so
> > > that is sometimes difficult.
> >
> > Yes, it needs to be handled.  Declaring it invalid is certainly an
> > option for handling it.  I didn't quite feel I could justify that in
> > this case.
> >
> > >
> > >
> > > > I think 8d915bbf3926 should go to stable but I would like to understand
> > > > why you felt the need to be conservative.
> > >
> > > First, I'm told that LTS generally avoids taking backports that
> > > overtly change user-visible behavior like disabling server-to-server
> > > copy (which requires async COPY to work). That was the main reason
> > > for my hesitance.
> >
> > Why does server-to-server require async COPY?
> > RFC 7862 section 4.5.  Inter-Server Copy says
> >   The destination server may perform the copy synchronously or
> >   asynchronously.
> > but I see that nfsd4_copy() returns nfs4err_notsupp if the inter-server
> > copy_is_sync(), but it isn't immediately obvious to me why.  The patch
> > which landed this functionality doesn't explain the restriction.
> 
> The choice (my choice) not to implement a synchronous inter-server
> copy was based on my understanding that such an operation would take
> too much time and tie up a knfsd thread.

I guess either mount or the read could add indefinite delays... Thanks.

> 
> > I guess that with multiple 4MB sync COPY requests the server would need
> > to repeatedly mount and unmount the source server which could be
> > unnecessary work - or would need to cache the mount and know when to
> > unmount it....
> 
> The implementation caches the (source server) mount for a period of
> time. But needing to issue multiple COPY can eventually impact
> performance.

True, but I think that is the goal.  If there are too many async COPYs,
force the rest to be sync so as to slow down the client and limit the
impact on the server.


Thanks,
NeilBrown
Chuck Lever III Nov. 6, 2024, 2:18 p.m. UTC | #13
On Wed, Nov 06, 2024 at 03:30:54PM +1100, NeilBrown wrote:
> On Wed, 06 Nov 2024, Olga Kornievskaia wrote:
> > On Tue, Nov 5, 2024 at 4:06 PM NeilBrown <neilb@suse.de> wrote:
> > >
> > > On Wed, 06 Nov 2024, Chuck Lever wrote:
> > > >
> > > > Having nfsd threads handle this workload again invites a DoS vector.
> > >
> > > Any more so than having nfsd threads handling a WRITE workload?
> > 
> > Isn't a difference between a COPY and a WRITE the fact that the server
> > has the ability to restrict the TCP window of the client sending the
> > bytes. And for simplicity's sake, if we assume client/server has a
> > single TCP stream when the window size is limited then other WRITEs
> > are also prevented from sending more data. But a COPY operation
> > doesn't require much to be sent and preventing the client from sending
> > another COPY can't be achieved thru TCP window size.
> 
> I think you are saying that the WRITE requests are naturally throttled. 
> However that isn't necessarily the case.  If the network is
> significantly faster than the storage path, and if a client (or several
> clients) send enough concurrent WRITE requests, then all nfsd threads
> could get stuck in writeback throttling code - which could be seen as a
> denial of service as no other requests could be serviced until
> congestion eases.

I agree that WRITE throttling needs attention too, but limiting
background COPY seems like the more urgent matter. Unlike READ or
WRITE, COPY isn't restricted to a single 1MB chunk.


> So maybe some threads should be reserved for non-IO requests and so
> would avoid dequeuing WRITE and READ and COPY requests - and would not
> be involved in async COPY.

I'm not enthusiastic about "reserving threads for other purposes".
The mechanism that services incoming network requests is simply not
at the right layer to sort I/O and non-I/O NFS requests.

Instead NFSD will need some kind of two-layer request handling
mechanism where any request that might take time or resources will
need to be moved out of the layer that handles ingress network
requests. I hesitate to call it deferral, because I suspect it will
be a hot path, unlike the current svc_defer.


> So I'm still not convinced that COPY in any way introduced a new DoS
> problem - providing the limit of async-copy threads is in place.

Because the kernel doesn't do background COPY at all right now, I'm
going to defer this change until after the v6.13 merge window. The
patch proposed at the beginning of this thread doesn't seem like a
fix to me, but rather a change in policy.


> > > > > > > This came up because CVE-2024-49974 was created so I had to do something
> > > > > > > about the theoretical DoS vector in SLE kernels.  I didn't like the
> > > > > > > patch so I backported
> > > > > > >
> > > > > > > Commit 8d915bbf3926 ("NFSD: Force all NFSv4.2 COPY requests to be synchronous")
> > 
> > I'm doing the same for RHEL. But the fact that CVE was created seems
> > like a flaw of the CVE creation process in this case. It should have
> > never been made a CVE.
> 
> I think everyone agrees that the CVE process is flawed.  Fixing it is
> not so easy :-(
> I'm glad you are doing the same.  I'm a little concerned that this
> disabled inter-server copies too.  I guess the answer to that is to help
> resolve the problems with async copy so that it can be re-enabled.

We've been working on restoring background COPY all summer. It is a
priority.

One issue is that a callback completion mechanism such as CB_OFFLOAD
is inherently unreliable. The SCSI community has loads of
implementation experience that backs this statement up. I've got an
implementation of OFFLOAD_STATUS ready for the Linux NFS client so
that it can poll for COPY completion if it thinks it has missed the
CB_OFFLOAD callback.

Another issue here is that RFC 7862 Section 4.8 wants each COPY
stateid to remain until the server sees the CB_OFFLOAD response.
NFSD needs to have a mechanism in place to ensure that a rogue
client or poor network conditions don't allow the set of waiting
COPY stateids to grow large over time.

Patch

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index fea171ffed62..06e0d9153ca9 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1972,6 +1972,7 @@  nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 		wake_up_process(async_copy->copy_task);
 		status = nfs_ok;
 	} else {
+	retry_sync:
 		status = nfsd4_do_copy(copy, copy->nf_src->nf_file,
 				       copy->nf_dst->nf_file, true);
 	}
@@ -1990,8 +1991,7 @@  nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	}
 	if (async_copy)
 		cleanup_async_copy(async_copy);
-	status = nfserr_jukebox;
-	goto out;
+	goto retry_sync;
 }
 
 static struct nfsd4_copy *