
[1/6] nfsd: filecache: remove race handling.

Message ID 20250207051701.3467505-2-neilb@suse.de (mailing list archive)
State: Changes Requested
Delegated to: Chuck Lever
Series: nfsd: filecache: various fixes

Commit Message

NeilBrown Feb. 7, 2025, 5:15 a.m. UTC
The race that this code tries to protect against is not interesting.
The code is problematic as we access the "nf" after we have given our
reference to the lru system.  While that takes 2+ seconds to free
things, it is still poor form.

The only interesting race I can find would be with
nfsd_file_close_inode_sync();
This is the only place that really doesn't want the file to stay on the
LRU when unhashed (which is the direct consequence of the race).

However for the race to happen, some other thread must own a reference
to a file and be putting it while nfsd_file_close_inode_sync() is trying
to close all files for an inode.  If this is possible, that other thread
could simply call nfsd_file_put() a little bit later and the result
would be the same: not all files are closed when
nfsd_file_close_inode_sync() completes.

If this was really a problem, we would need to wait in close_inode_sync
for the other references to be dropped.  We probably don't want to do
that.

So it is best to simply remove this code.
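
With this removed, the LRU branch of nfsd_file_put() reduces to the
following (reconstructed from the diff below, included only as a
reading aid):

		/* Try to add it to the LRU.  If that fails, decrement. */
		if (nfsd_file_lru_add(nf)) {
			nfsd_file_schedule_laundrette();
			return;
		}

i.e. once the file is on the LRU, our reference belongs to the LRU and
the laundrette, so "nf" must not be touched afterwards.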

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/filecache.c | 16 +++-------------
 1 file changed, 3 insertions(+), 13 deletions(-)

Comments

Dave Chinner Feb. 10, 2025, 11:33 p.m. UTC | #1
On Fri, Feb 07, 2025 at 04:15:11PM +1100, NeilBrown wrote:
> The race that this code tries to protect against is not interesting.
> The code is problematic as we access the "nf" after we have given our
> reference to the lru system.  While that takes 2+ seconds to free
> things, it is still poor form.
> 
> The only interesting race I can find would be with
> nfsd_file_close_inode_sync();
> This is the only place that really doesn't want the file to stay on the
> LRU when unhashed (which is the direct consequence of the race).
> 
> However for the race to happen, some other thread must own a reference
> to a file and be putting it while nfsd_file_close_inode_sync() is trying
> to close all files for an inode.  If this is possible, that other thread
> could simply call nfsd_file_put() a little bit later and the result
> would be the same: not all files are closed when
> nfsd_file_close_inode_sync() completes.
> 
> If this was really a problem, we would need to wait in close_inode_sync
> for the other references to be dropped.  We probably don't want to do
> that.
> 
> So it is best to simply remove this code.
> 
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/nfsd/filecache.c | 16 +++-------------
>  1 file changed, 3 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index a1cdba42c4fa..b13255bcbb96 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -371,20 +371,10 @@ nfsd_file_put(struct nfsd_file *nf)
>  
>  		/* Try to add it to the LRU.  If that fails, decrement. */
>  		if (nfsd_file_lru_add(nf)) {
> -			/* If it's still hashed, we're done */
> -			if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
> -				nfsd_file_schedule_laundrette();
> -				return;
> -			}
> -
> -			/*
> -			 * We're racing with unhashing, so try to remove it from
> -			 * the LRU. If removal fails, then someone else already
> -			 * has our reference.
> -			 */
> -			if (!nfsd_file_lru_remove(nf))
> -				return;
> +			nfsd_file_schedule_laundrette();
> +			return;

Why do we need to start the background gc on demand?  When the nfsd
subsystem is under load it is going to be calling
nfsd_file_schedule_laundrette() all the time. However, gc is almost
always going to be queued/running already.

i.e. the above code results in adding the overhead of an atomic
NFSD_FILE_CACHE_UP bit check and then a call to queue_delayed_work()
on an already queued item. This will disable interrupts, make an
atomic bit check that sees the work is queued, re-enable interrupts
and return.

IMO, there is no need to do this unnecessary work on every object
that is added to the LRU.  Changing the gc worker to always run
every 2s and check if it has work to do like so:

 static void
 nfsd_file_gc_worker(struct work_struct *work)
 {
-	nfsd_file_gc();
-	if (list_lru_count(&nfsd_file_lru))
-		nfsd_file_schedule_laundrette();
+	if (list_lru_count(&nfsd_file_lru))
+		nfsd_file_gc();
+	nfsd_file_schedule_laundrette();
 }

means that nfsd_file_gc() will be run the same way and have the same
behaviour as the current code. When the system is idle, it does a
list_lru_count() check every 2 seconds and goes back to sleep.
That's going to be pretty much unnoticeable on most machines that run
NFS servers.
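
i.e. the resulting worker is simply (the same code as the diff above,
shown whole for clarity):

static void
nfsd_file_gc_worker(struct work_struct *work)
{
	if (list_lru_count(&nfsd_file_lru))
		nfsd_file_gc();
	nfsd_file_schedule_laundrette();
}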

-Dave.
NeilBrown Feb. 12, 2025, 10:16 p.m. UTC | #2
On Tue, 11 Feb 2025, Dave Chinner wrote:
> On Fri, Feb 07, 2025 at 04:15:11PM +1100, NeilBrown wrote:
> > The race that this code tries to protect against is not interesting.
> > The code is problematic as we access the "nf" after we have given our
> > reference to the lru system.  While that takes 2+ seconds to free
> > things, it is still poor form.
> > 
> > The only interesting race I can find would be with
> > nfsd_file_close_inode_sync();
> > This is the only place that really doesn't want the file to stay on the
> > LRU when unhashed (which is the direct consequence of the race).
> > 
> > However for the race to happen, some other thread must own a reference
> > to a file and be putting it while nfsd_file_close_inode_sync() is trying
> > to close all files for an inode.  If this is possible, that other thread
> > could simply call nfsd_file_put() a little bit later and the result
> > would be the same: not all files are closed when
> > nfsd_file_close_inode_sync() completes.
> > 
> > If this was really a problem, we would need to wait in close_inode_sync
> > for the other references to be dropped.  We probably don't want to do
> > that.
> > 
> > So it is best to simply remove this code.
> > 
> > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > Signed-off-by: NeilBrown <neilb@suse.de>
> > ---
> >  fs/nfsd/filecache.c | 16 +++-------------
> >  1 file changed, 3 insertions(+), 13 deletions(-)
> > 
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index a1cdba42c4fa..b13255bcbb96 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -371,20 +371,10 @@ nfsd_file_put(struct nfsd_file *nf)
> >  
> >  		/* Try to add it to the LRU.  If that fails, decrement. */
> >  		if (nfsd_file_lru_add(nf)) {
> > -			/* If it's still hashed, we're done */
> > -			if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
> > -				nfsd_file_schedule_laundrette();
> > -				return;
> > -			}
> > -
> > -			/*
> > -			 * We're racing with unhashing, so try to remove it from
> > -			 * the LRU. If removal fails, then someone else already
> > -			 * has our reference.
> > -			 */
> > -			if (!nfsd_file_lru_remove(nf))
> > -				return;
> > +			nfsd_file_schedule_laundrette();
> > +			return;
> 
> Why do we need to start the background gc on demand?  When the nfsd
> subsystem is under load it is going to be calling
> nfsd_file_schedule_laundrette() all the time. However, gc is almost
> always going to be queued/running already.
> 
> i.e. the above code results in adding the overhead of an atomic
> NFSD_FILE_CACHE_UP bit check and then a call to queue_delayed_work()
> on an already queued item. This will disable interrupts, make an
> atomic bit check that sees the work is queued, re-enable interrupts
> and return.

I don't think we really need the NFSD_FILE_CACHE_UP test - if we have a
file to put, then the cache must be up.
And we could use delayed_work_pending() to avoid the irq disable.
That is still one atomic bit test though.
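
Something like this minimal sketch, keeping the existing
nfsd_filecache_laundrette / NFSD_LAUNDRETTE_DELAY names (untested,
only to illustrate the idea):

static void
nfsd_file_schedule_laundrette(void)
{
	/*
	 * Cheap check first: if the laundrette is already queued we
	 * can skip queue_delayed_work() and its irq disable entirely.
	 */
	if (delayed_work_pending(&nfsd_filecache_laundrette))
		return;
	queue_delayed_work(system_unbound_wq, &nfsd_filecache_laundrette,
			   NFSD_LAUNDRETTE_DELAY);
}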

> 
> IMO, there is no need to do this unnecessary work on every object
> that is added to the LRU.  Changing the gc worker to always run
> every 2s and check if it has work to do like so:
> 
>  static void
>  nfsd_file_gc_worker(struct work_struct *work)
>  {
> -	nfsd_file_gc();
> -	if (list_lru_count(&nfsd_file_lru))
> -		nfsd_file_schedule_laundrette();
> +	if (list_lru_count(&nfsd_file_lru))
> +		nfsd_file_gc();
> +	nfsd_file_schedule_laundrette();
>  }
> 
> means that nfsd_file_gc() will be run the same way and have the same
> behaviour as the current code. When the system is idle, it does a
> list_lru_count() check every 2 seconds and goes back to sleep.
> That's going to be pretty much unnoticeable on most machines that run
> NFS servers.

When serving a v4 only load we don't need the timer at all...  Maybe
that doesn't matter.

I would rather use a timer instead of a delayed work as the task doesn't
require sleeping, and we have a pool of nfsd threads to do any work that
might be needed.  So a timer that checks if work is needed then sets a
flag and wakes an nfsd would be even lower impact than a delayed work
which needs to wake a wq thread every time even if there is nothing to
do.
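
Something like the sketch below, where NFSD_FILE_GC_NEEDED and the
nfsd wake-up are hypothetical and not part of this series:

static struct timer_list nfsd_filecache_timer;

static void nfsd_filecache_timer_fn(struct timer_list *t)
{
	/* Only flag work when the LRU actually holds entries. */
	if (list_lru_count(&nfsd_file_lru)) {
		/* NFSD_FILE_GC_NEEDED is a hypothetical flag bit */
		set_bit(NFSD_FILE_GC_NEEDED, &nfsd_file_flags);
		/* ... then wake one nfsd thread to run nfsd_file_gc() */
	}
	/* re-arm; no workqueue thread is woken when the LRU is empty */
	mod_timer(&nfsd_filecache_timer, jiffies + 2 * HZ);
}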

Thanks,
NeilBrown
Chuck Lever Feb. 13, 2025, 3:02 p.m. UTC | #3
[ responding to both Neil and Dave ... ]

On 2/12/25 5:16 PM, NeilBrown wrote:
> On Tue, 11 Feb 2025, Dave Chinner wrote:
>> On Fri, Feb 07, 2025 at 04:15:11PM +1100, NeilBrown wrote:
>>> The race that this code tries to protect against is not interesting.
>>> The code is problematic as we access the "nf" after we have given our
>>> reference to the lru system.  While that takes 2+ seconds to free
>>> things, it is still poor form.
>>>
>>> The only interesting race I can find would be with
>>> nfsd_file_close_inode_sync();
>>> This is the only place that really doesn't want the file to stay on the
>>> LRU when unhashed (which is the direct consequence of the race).
>>>
>>> However for the race to happen, some other thread must own a reference
>>> to a file and be putting it while nfsd_file_close_inode_sync() is trying
>>> to close all files for an inode.  If this is possible, that other thread
>>> could simply call nfsd_file_put() a little bit later and the result
>>> would be the same: not all files are closed when
>>> nfsd_file_close_inode_sync() completes.
>>>
>>> If this was really a problem, we would need to wait in close_inode_sync
>>> for the other references to be dropped.  We probably don't want to do
>>> that.
>>>
>>> So it is best to simply remove this code.
>>>
>>> Reviewed-by: Jeff Layton <jlayton@kernel.org>
>>> Signed-off-by: NeilBrown <neilb@suse.de>
>>> ---
>>>  fs/nfsd/filecache.c | 16 +++-------------
>>>  1 file changed, 3 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
>>> index a1cdba42c4fa..b13255bcbb96 100644
>>> --- a/fs/nfsd/filecache.c
>>> +++ b/fs/nfsd/filecache.c
>>> @@ -371,20 +371,10 @@ nfsd_file_put(struct nfsd_file *nf)
>>>  
>>>  		/* Try to add it to the LRU.  If that fails, decrement. */
>>>  		if (nfsd_file_lru_add(nf)) {
>>> -			/* If it's still hashed, we're done */
>>> -			if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
>>> -				nfsd_file_schedule_laundrette();
>>> -				return;
>>> -			}
>>> -
>>> -			/*
>>> -			 * We're racing with unhashing, so try to remove it from
>>> -			 * the LRU. If removal fails, then someone else already
>>> -			 * has our reference.
>>> -			 */
>>> -			if (!nfsd_file_lru_remove(nf))
>>> -				return;
>>> +			nfsd_file_schedule_laundrette();
>>> +			return;
>>
>> Why do we need to start the background gc on demand?

This call is an original feature of the filecache -- I'm guessing it
was done to ensure that a final put closes the file quickly. But over
time, the filecache has grown explicit code that handles that, so I
think it is likely this call is no longer needed.


>> When the nfsd
>> subsystem is under load it is going to be calling
>> nfsd_file_schedule_laundrette() all the time. However, gc is almost
>> always going to be queued/running already.
>>
>> i.e. the above code results in adding the overhead of an atomic
>> NFSD_FILE_CACHE_UP bit check and then a call to queue_delayed_work()
>> on an already queued item. This will disable interrupts, make an
>> atomic bit check that sees the work is queued, re-enable interrupts
>> and return.
> 
> I don't think we really need the NFSD_FILE_CACHE_UP test - if we have a
> file to put, then the cache must be up.

I can change 1/6 to remove the call to nfsd_file_schedule_laundrette().
Then these questions become moot.
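
For illustration, the hunk would then become something like this (my
assumption about the respin, not a posted patch):

		/* Try to add it to the LRU.  If that fails, decrement. */
		if (nfsd_file_lru_add(nf))
			return;
	}
	if (refcount_dec_and_test(&nf->nf_ref))
		nfsd_file_free(nf);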


> And we could use delayed_work_pending() to avoid the irq disable.
> That is still one atomic bit test though.
> 
>>
>> IMO, there is no need to do this unnecessary work on every object
>> that is added to the LRU.  Changing the gc worker to always run
>> every 2s and check if it has work to do like so:
>>
>>  static void
>>  nfsd_file_gc_worker(struct work_struct *work)
>>  {
>> -	nfsd_file_gc();
>> -	if (list_lru_count(&nfsd_file_lru))
>> -		nfsd_file_schedule_laundrette();
>> +	if (list_lru_count(&nfsd_file_lru))
>> +		nfsd_file_gc();
>> +	nfsd_file_schedule_laundrette();
>>  }
>>
>> means that nfsd_file_gc() will be run the same way and have the same
>> behaviour as the current code. When the system is idle, it does a
>> list_lru_count() check every 2 seconds and goes back to sleep.
>> That's going to be pretty much unnoticeable on most machines that run
>> NFS servers.

I can add a patch to this series that makes this swap.


> When serving a v4 only load we don't need the timer at all...  Maybe
> that doesn't matter.
> 
> I would rather use a timer instead of a delayed work as the task doesn't
> require sleeping, and we have a pool of nfsd threads to do any work that
> might be needed.  So a timer that checks if work is needed then sets a
> flag and wakes an nfsd would be even lower impact than a delayed work
> which needs to wake a wq thread every time even if there is nothing to
> do.

Moving the laundrette work to nfsd threads seems sensible, but let's
leave that for another patch series.

Patch

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index a1cdba42c4fa..b13255bcbb96 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -371,20 +371,10 @@  nfsd_file_put(struct nfsd_file *nf)
 
 		/* Try to add it to the LRU.  If that fails, decrement. */
 		if (nfsd_file_lru_add(nf)) {
-			/* If it's still hashed, we're done */
-			if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
-				nfsd_file_schedule_laundrette();
-				return;
-			}
-
-			/*
-			 * We're racing with unhashing, so try to remove it from
-			 * the LRU. If removal fails, then someone else already
-			 * has our reference.
-			 */
-			if (!nfsd_file_lru_remove(nf))
-				return;
+			nfsd_file_schedule_laundrette();
+			return;
 		}
+
 	}
 	if (refcount_dec_and_test(&nf->nf_ref))
 		nfsd_file_free(nf);