[v2,2/7] NFSD: Re-organize nfsd_file_gc_worker()

Message ID	20250218153937.6125-3-cel@kernel.org (mailing list archive)
State	Under Review
Delegated to:	Chuck Lever
Series	nfsd: filecache: various fixes
	[v2,0/7] nfsd: filecache: various fixes
	[v2,1/7] nfsd: filecache: remove race handling.
	[v2,2/7] NFSD: Re-organize nfsd_file_gc_worker()
	[v2,3/7] nfsd: filecache: use nfsd_file_dispose_list() in nfsd_file_close_inode_sync()
	[v2,4/7] nfsd: filecache: use list_lru_walk_node() in nfsd_file_gc()
	[v2,5/7] nfsd: filecache: introduce NFSD_FILE_RECENT
	[v2,6/7] nfsd: filecache: don't repeatedly add/remove files on the lru list
	[v2,7/7] nfsd: filecache: drop the list_lru lock during lock gc scans

Headers

From: cel@kernel.org
To: Neil Brown <neilb@suse.de>,
	Jeff Layton <jlayton@kernel.org>,
	Olga Kornievskaia <okorniev@redhat.com>,
	Dai Ngo <dai.ngo@oracle.com>,
	Tom Talpey <tom@talpey.com>
Cc: <linux-nfs@vger.kernel.org>,
	Dave Chinner <david@fromorbit.com>,
	Chuck Lever <chuck.lever@oracle.com>
Subject: [PATCH v2 2/7] NFSD: Re-organize nfsd_file_gc_worker()
Date: Tue, 18 Feb 2025 10:39:32 -0500
Message-ID: <20250218153937.6125-3-cel@kernel.org>
In-Reply-To: <20250218153937.6125-1-cel@kernel.org>
References: <20250218153937.6125-1-cel@kernel.org>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Commit Message

Chuck Lever Feb. 18, 2025, 3:39 p.m. UTC

From: Chuck Lever <chuck.lever@oracle.com>

Dave opines:

IMO, there is no need to do this unnecessary work on every object
that is added to the LRU.  Changing the gc worker to always run
every 2s and check if it has work to do like so:

 static void
 nfsd_file_gc_worker(struct work_struct *work)
 {
-	nfsd_file_gc();
-	if (list_lru_count(&nfsd_file_lru))
-		nfsd_file_schedule_laundrette();
+	if (list_lru_count(&nfsd_file_lru))
+		nfsd_file_gc();
+	nfsd_file_schedule_laundrette();
 }

means that nfsd_file_gc() will be run the same way and have the same
behaviour as the current code. When the system is idle, it does a
list_lru_count() check every 2 seconds and goes back to sleep.
That's going to be pretty much unnoticeable on most machines that run
NFS servers.

Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/filecache.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Jeff Layton Feb. 18, 2025, 7:59 p.m. UTC | #1

On Tue, 2025-02-18 at 10:39 -0500, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> Dave opines:
> 
> IMO, there is no need to do this unnecessary work on every object
> that is added to the LRU.  Changing the gc worker to always run
> every 2s and check if it has work to do like so:
> 
>  static void
>  nfsd_file_gc_worker(struct work_struct *work)
>  {
> -	nfsd_file_gc();
> -	if (list_lru_count(&nfsd_file_lru))
> -		nfsd_file_schedule_laundrette();
> +	if (list_lru_count(&nfsd_file_lru))
> +		nfsd_file_gc();
> +	nfsd_file_schedule_laundrette();
>  }
> 
> means that nfsd_file_gc() will be run the same way and have the same
> behaviour as the current code. When the system is idle, it does a
> list_lru_count() check every 2 seconds and goes back to sleep.
> That's going to be pretty much unnoticeable on most machines that run
> NFS servers.
> 
> Suggested-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/filecache.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 909b5bc72bd3..2933cba1e5f4 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -549,9 +549,9 @@ nfsd_file_gc(void)
>  static void
>  nfsd_file_gc_worker(struct work_struct *work)
>  {
> -	nfsd_file_gc();
> +	nfsd_file_schedule_laundrette();
>  	if (list_lru_count(&nfsd_file_lru))
> -		nfsd_file_schedule_laundrette();
> +		nfsd_file_gc();
>  }
>  
>  static unsigned long

Given that it's a delayed workqueue job, it probably doesn't matter,
but why schedule the laundrette before doing the nfsd_file_gc() call?

Dave Chinner Feb. 19, 2025, 12:33 a.m. UTC | #2

On Tue, Feb 18, 2025 at 10:39:32AM -0500, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> Dave opines:
> 
> IMO, there is no need to do this unnecessary work on every object
> that is added to the LRU.  Changing the gc worker to always run
> every 2s and check if it has work to do like so:
> 
>  static void
>  nfsd_file_gc_worker(struct work_struct *work)
>  {
> -	nfsd_file_gc();
> -	if (list_lru_count(&nfsd_file_lru))
> -		nfsd_file_schedule_laundrette();
> +	if (list_lru_count(&nfsd_file_lru))
> +		nfsd_file_gc();
> +	nfsd_file_schedule_laundrette();
>  }
> 
> means that nfsd_file_gc() will be run the same way and have the same
> behaviour as the current code. When the system is idle, it does a
> list_lru_count() check every 2 seconds and goes back to sleep.
> That's going to be pretty much unnoticeable on most machines that run
> NFS servers.
> 
> Suggested-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/filecache.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 909b5bc72bd3..2933cba1e5f4 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -549,9 +549,9 @@ nfsd_file_gc(void)
>  static void
>  nfsd_file_gc_worker(struct work_struct *work)
>  {
> -	nfsd_file_gc();
> +	nfsd_file_schedule_laundrette();
>  	if (list_lru_count(&nfsd_file_lru))
> -		nfsd_file_schedule_laundrette();
> +		nfsd_file_gc();
>  }

IMO, the scheduling of new work is the wrong way around. It should
be done on completion of gc work, not before gc work is started.

i.e. If nfsd_file_gc() is overly delayed (because load, rt preempt,
etc), then a new gc worker will be started in 2s regardless of
whether the currently running gc worker has completed or not.

Worst case, there's a spinlock hang bug in nfsd_file_gc(). This code
will end up with N worker threads all spinning up in nfsd_file_gc()
chewing up all the CPU in the system, not making any progress....
If we schedule new work after completion of this work, then gc might
hang but it won't slowly drag the entire system down with it.

-Dave.

NeilBrown Feb. 19, 2025, 1:20 a.m. UTC | #3

On Wed, 19 Feb 2025, Dave Chinner wrote:
> On Tue, Feb 18, 2025 at 10:39:32AM -0500, cel@kernel.org wrote:
> > From: Chuck Lever <chuck.lever@oracle.com>
> > 
> > Dave opines:
> > 
> > IMO, there is no need to do this unnecessary work on every object
> > that is added to the LRU.  Changing the gc worker to always run
> > every 2s and check if it has work to do like so:
> > 
> >  static void
> >  nfsd_file_gc_worker(struct work_struct *work)
> >  {
> > -	nfsd_file_gc();
> > -	if (list_lru_count(&nfsd_file_lru))
> > -		nfsd_file_schedule_laundrette();
> > +	if (list_lru_count(&nfsd_file_lru))
> > +		nfsd_file_gc();
> > +	nfsd_file_schedule_laundrette();
> >  }
> > 
> > means that nfsd_file_gc() will be run the same way and have the same
> > behaviour as the current code. When the system is idle, it does a
> > list_lru_count() check every 2 seconds and goes back to sleep.
> > That's going to be pretty much unnoticeable on most machines that run
> > NFS servers.
> > 
> > Suggested-by: Dave Chinner <david@fromorbit.com>
> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > ---
> >  fs/nfsd/filecache.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index 909b5bc72bd3..2933cba1e5f4 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -549,9 +549,9 @@ nfsd_file_gc(void)
> >  static void
> >  nfsd_file_gc_worker(struct work_struct *work)
> >  {
> > -	nfsd_file_gc();
> > +	nfsd_file_schedule_laundrette();
> >  	if (list_lru_count(&nfsd_file_lru))
> > -		nfsd_file_schedule_laundrette();
> > +		nfsd_file_gc();
> >  }
> 
> IMO, the scheduling of new work is the wrong way around. It should
> be done on completion of gc work, not before gc work is started.
> 
> i.e. If nfsd_file_gc() is overly delayed (because load, rt preempt,
> etc), then a new gc worker will be started in 2s regardless of
> whether the currently running gc worker has completed or not.
> 
> Worst case, there's a spinlock hang bug in nfsd_file_gc(). This code
> will end up with N worker threads all spinning up in nfsd_file_gc()
> chewing up all the CPU in the system, not making any progress....
> If we schedule new work after completion of this work, then gc might
> hang but it won't slowly drag the entire system down with it.

While I agree that the enqueue is best done later rather than earlier, I
think your worst case is overstated.
queue_delayed_work() is a no-op if WORK_STRUCT_PENDING_BIT is still set.
A given work_struct can only have one instance running at a time.
If the timer fires while nfsd_file_gc() is still running,
nfsd_filecache_laundrette will be queued to start as soon as the
currently running instance completes.  So the worst case is that
there will always be one instance running.

Thanks,
NeilBrown

Chuck Lever Feb. 19, 2025, 2:01 p.m. UTC | #4

On 2/18/25 7:33 PM, Dave Chinner wrote:
> On Tue, Feb 18, 2025 at 10:39:32AM -0500, cel@kernel.org wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> Dave opines:
>>
>> IMO, there is no need to do this unnecessary work on every object
>> that is added to the LRU.  Changing the gc worker to always run
>> every 2s and check if it has work to do like so:
>>
>>  static void
>>  nfsd_file_gc_worker(struct work_struct *work)
>>  {
>> -	nfsd_file_gc();
>> -	if (list_lru_count(&nfsd_file_lru))
>> -		nfsd_file_schedule_laundrette();
>> +	if (list_lru_count(&nfsd_file_lru))
>> +		nfsd_file_gc();
>> +	nfsd_file_schedule_laundrette();
>>  }
>>
>> means that nfsd_file_gc() will be run the same way and have the same
>> behaviour as the current code. When the system is idle, it does a
>> list_lru_count() check every 2 seconds and goes back to sleep.
>> That's going to be pretty much unnoticeable on most machines that run
>> NFS servers.
>>
>> Suggested-by: Dave Chinner <david@fromorbit.com>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>  fs/nfsd/filecache.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
>> index 909b5bc72bd3..2933cba1e5f4 100644
>> --- a/fs/nfsd/filecache.c
>> +++ b/fs/nfsd/filecache.c
>> @@ -549,9 +549,9 @@ nfsd_file_gc(void)
>>  static void
>>  nfsd_file_gc_worker(struct work_struct *work)
>>  {
>> -	nfsd_file_gc();
>> +	nfsd_file_schedule_laundrette();
>>  	if (list_lru_count(&nfsd_file_lru))
>> -		nfsd_file_schedule_laundrette();
>> +		nfsd_file_gc();
>>  }
> 
> IMO, the scheduling of new work is the wrong way around. It should
> be done on completion of gc work, not before gc work is started.
> 
> i.e. If nfsd_file_gc() is overly delayed (because load, rt preempt,
> etc), then a new gc worker will be started in 2s regardless of
> whether the currently running gc worker has completed or not.
> 
> Worst case, there's a spinlock hang bug in nfsd_file_gc(). This code
> will end up with N worker threads all spinning up in nfsd_file_gc()
> chewing up all the CPU in the system, not making any progress....
> If we schedule new work after completion of this work, then gc might
> hang but it won't slowly drag the entire system down with it.

My bad. I miscopied your suggestion. Will fix in my tree.

Patch

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 909b5bc72bd3..2933cba1e5f4 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -549,9 +549,9 @@  nfsd_file_gc(void)
 static void
 nfsd_file_gc_worker(struct work_struct *work)
 {
-	nfsd_file_gc();
+	nfsd_file_schedule_laundrette();
 	if (list_lru_count(&nfsd_file_lru))
-		nfsd_file_schedule_laundrette();
+		nfsd_file_gc();
 }
 
 static unsigned long
