diff mbox series

[1/2] SUNRPC: Don't allow waiting for exiting tasks

Message ID f301a88d04e1929a313c458bd8ce1218903b648a.1743183579.git.trond.myklebust@hammerspace.com (mailing list archive)
State Handled Elsewhere
Headers show
Series [1/2] SUNRPC: Don't allow waiting for exiting tasks | expand

Commit Message

Trond Myklebust March 28, 2025, 5:40 p.m. UTC
From: Trond Myklebust <trond.myklebust@hammerspace.com>

Once a task calls exit_signals() it can no longer be signalled. So do
not allow it to do killable waits.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 net/sunrpc/sched.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Jeff Layton March 28, 2025, 5:53 p.m. UTC | #1
On Fri, 2025-03-28 at 13:40 -0400, trondmy@kernel.org wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
> 
> Once a task calls exit_signals() it can no longer be signalled. So do
> not allow it to do killable waits.
> 
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
>  net/sunrpc/sched.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> index 9b45fbdc90ca..73bc39281ef5 100644
> --- a/net/sunrpc/sched.c
> +++ b/net/sunrpc/sched.c
> @@ -276,6 +276,8 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
>  
>  static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
>  {
> +	if (unlikely(current->flags & PF_EXITING))
> +		return -EINTR;
>  	schedule();
>  	if (signal_pending_state(mode, current))
>  		return -ERESTARTSYS;

Won't this mean that if a task is signalled and does a final fput, that
a CLOSE sent in task_work will never get sent?
Trond Myklebust March 28, 2025, 6 p.m. UTC | #2
On Fri, 2025-03-28 at 13:53 -0400, Jeff Layton wrote:
> On Fri, 2025-03-28 at 13:40 -0400, trondmy@kernel.org wrote:
> > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > 
> > Once a task calls exit_signals() it can no longer be signalled. So
> > do
> > not allow it to do killable waits.
> > 
> > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > ---
> >  net/sunrpc/sched.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> > index 9b45fbdc90ca..73bc39281ef5 100644
> > --- a/net/sunrpc/sched.c
> > +++ b/net/sunrpc/sched.c
> > @@ -276,6 +276,8 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
> >  
> >  static int rpc_wait_bit_killable(struct wait_bit_key *key, int
> > mode)
> >  {
> > +	if (unlikely(current->flags & PF_EXITING))
> > +		return -EINTR;
> >  	schedule();
> >  	if (signal_pending_state(mode, current))
> >  		return -ERESTARTSYS;
> 
> Won't this mean that if a task is signalled and does a final fput,
> that
> a CLOSE sent in task_work will never get sent?

It should mean that the close gets sent, but the task won't wait for
completion.
Jeff Layton March 28, 2025, 6:09 p.m. UTC | #3
On Fri, 2025-03-28 at 18:00 +0000, Trond Myklebust wrote:
> On Fri, 2025-03-28 at 13:53 -0400, Jeff Layton wrote:
> > On Fri, 2025-03-28 at 13:40 -0400, trondmy@kernel.org wrote:
> > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > 
> > > Once a task calls exit_signals() it can no longer be signalled. So
> > > do
> > > not allow it to do killable waits.
> > > 
> > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > ---
> > >  net/sunrpc/sched.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> > > index 9b45fbdc90ca..73bc39281ef5 100644
> > > --- a/net/sunrpc/sched.c
> > > +++ b/net/sunrpc/sched.c
> > > @@ -276,6 +276,8 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
> > >  
> > >  static int rpc_wait_bit_killable(struct wait_bit_key *key, int
> > > mode)
> > >  {
> > > +	if (unlikely(current->flags & PF_EXITING))
> > > +		return -EINTR;
> > >  	schedule();
> > >  	if (signal_pending_state(mode, current))
> > >  		return -ERESTARTSYS;
> > 
> > Won't this mean that if a task is signalled and does a final fput,
> > that
> > a CLOSE sent in task_work will never get sent?
> 
> It should mean that the close gets sent, but the task won't wait for
> completion.
> 

Good enough, I guess.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Trond Myklebust March 28, 2025, 7:36 p.m. UTC | #4
On Fri, 2025-03-28 at 14:09 -0400, Jeff Layton wrote:
> On Fri, 2025-03-28 at 18:00 +0000, Trond Myklebust wrote:
> > On Fri, 2025-03-28 at 13:53 -0400, Jeff Layton wrote:
> > > On Fri, 2025-03-28 at 13:40 -0400, trondmy@kernel.org wrote:
> > > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > > 
> > > > Once a task calls exit_signals() it can no longer be signalled.
> > > > So
> > > > do
> > > > not allow it to do killable waits.
> > > > 
> > > > Signed-off-by: Trond Myklebust
> > > > <trond.myklebust@hammerspace.com>
> > > > ---
> > > >  net/sunrpc/sched.c | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > > 
> > > > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> > > > index 9b45fbdc90ca..73bc39281ef5 100644
> > > > --- a/net/sunrpc/sched.c
> > > > +++ b/net/sunrpc/sched.c
> > > > @@ -276,6 +276,8 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
> > > >  
> > > >  static int rpc_wait_bit_killable(struct wait_bit_key *key, int
> > > > mode)
> > > >  {
> > > > +	if (unlikely(current->flags & PF_EXITING))
> > > > +		return -EINTR;
> > > >  	schedule();
> > > >  	if (signal_pending_state(mode, current))
> > > >  		return -ERESTARTSYS;
> > > 
> > > Won't this mean that if a task is signalled and does a final
> > > fput,
> > > that
> > > a CLOSE sent in task_work will never get sent?
> > 
> > It should mean that the close gets sent, but the task won't wait
> > for
> > completion.
> > 
> 
> Good enough, I guess.
> 
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> 

It ensures that the task can die, and with the actual close RPC call
being asynchronous, it will keep going until either the server comes
back and processes it, or someone force umounts the partition and kills
the call, etc.
Mark Brown April 8, 2025, 10:31 a.m. UTC | #5
On Fri, Mar 28, 2025 at 01:40:44PM -0400, trondmy@kernel.org wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
> 
> Once a task calls exit_signals() it can no longer be signalled. So do
> not allow it to do killable waits.

We're seeing the LTP acct02 test failing in kernels with this patch
applied, testing on systems with NFS root filesystems:

10271 05:03:09.064993  tst_test.c:1900: TINFO: LTP version: 20250130-1-g60fe84aaf
10272 05:03:09.076425  tst_test.c:1904: TINFO: Tested kernel: 6.15.0-rc1 #1 SMP PREEMPT Sun Apr  6 21:18:14 UTC 2025 aarch64
10273 05:03:09.076733  tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
10274 05:03:09.087803  tst_test.c:1722: TINFO: Overall timeout per run is 0h 01m 30s
10275 05:03:09.088107  tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
10276 05:03:09.093097  acct02.c:63: TINFO: CONFIG_BSD_PROCESS_ACCT_V3=y
10277 05:03:09.093400  acct02.c:240: TINFO: Verifying using 'struct acct_v3'
10278 05:03:10.053504  <6>[   98.043143] Process accounting resumed
10279 05:03:10.053935  <6>[   98.043143] Process accounting resumed
10280 05:03:10.064653  acct02.c:193: TINFO: == entry 1 ==
10281 05:03:10.064953  acct02.c:84: TINFO: ac_comm != 'acct02_helper' ('acct02')
10282 05:03:10.076029  acct02.c:133: TINFO: ac_exitcode != 32768 (0)
10283 05:03:10.076331  acct02.c:141: TINFO: ac_ppid != 2466 (2461)
10284 05:03:10.076572  acct02.c:183: TFAIL: end of file reached
10285 05:03:10.076790  
10286 05:03:10.087439  HINT: You _MAY_ be missing kernel fixes:
10287 05:03:10.087741  
10288 05:03:10.087979  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d9570158b626
10289 05:03:10.088201  
10290 05:03:10.088414  Summary:
10291 05:03:10.088618  passed   0
10292 05:03:10.098852  failed   1
10293 05:03:10.099212  broken   0
10294 05:03:10.099454  skipped  0
10295 05:03:10.099675  warnings 0

I ran a bisect which zeroed in on this commit (log below), I didn't look
into it properly but the test does start and exit a test program to
verify that accounting records get created properly which does look
relevant.

git bisect start
# status: waiting for both good and bad commits
# bad: [0af2f6be1b4281385b618cb86ad946eded089ac8] Linux 6.15-rc1
git bisect bad 0af2f6be1b4281385b618cb86ad946eded089ac8
# status: waiting for good commit(s), bad commit known
# good: [38fec10eb60d687e30c8c6b5420d86e8149f7557] Linux 6.14
git bisect good 38fec10eb60d687e30c8c6b5420d86e8149f7557
# good: [fd71def6d9abc5ae362fb9995d46049b7b0ed391] Merge tag 'for-6.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
git bisect good fd71def6d9abc5ae362fb9995d46049b7b0ed391
# good: [93d52288679e29aaa44a6f12d5a02e8a90e742c5] Merge tag 'backlight-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight
git bisect good 93d52288679e29aaa44a6f12d5a02e8a90e742c5
# good: [2cd5769fb0b78b8ef583ab4c0015c2c48d525dac] Merge tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
git bisect good 2cd5769fb0b78b8ef583ab4c0015c2c48d525dac
# bad: [25757984d77da731922bed5001431673b6daf5ac] Merge tag 'staging-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect bad 25757984d77da731922bed5001431673b6daf5ac
# good: [28a1b05678f4e88de90b0987b06e13c454ad9bd6] Merge tag 'i2c-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
git bisect good 28a1b05678f4e88de90b0987b06e13c454ad9bd6
# good: [92b71befc349587d58fdbbe6cdd68fb67f4933a8] Merge tag 'objtool-urgent-2025-04-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 92b71befc349587d58fdbbe6cdd68fb67f4933a8
# good: [5e17b5c71729d8ce936c83a579ed45f65efcb456] Merge tag 'fuse-update-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
git bisect good 5e17b5c71729d8ce936c83a579ed45f65efcb456
# good: [344a50b0f4eecc160c61d780f53d2f75586016ce] staging: gpib: lpvo_usb_gpib: struct gpib_board
git bisect good 344a50b0f4eecc160c61d780f53d2f75586016ce
# bad: [9e8f324bd44c1fe026b582b75213de4eccfa1163] NFSv4: Check for delegation validity in nfs_start_delegation_return_locked()
git bisect bad 9e8f324bd44c1fe026b582b75213de4eccfa1163
# good: [df210d9b0951d714c1668c511ca5c8ff38cf6916] sunrpc: Add a sysfs file for adding a new xprt
git bisect good df210d9b0951d714c1668c511ca5c8ff38cf6916
# good: [bf9be373b830a3e48117da5d89bb6145a575f880] SUNRPC: rpc_clnt_set_transport() must not change the autobind setting
git bisect good bf9be373b830a3e48117da5d89bb6145a575f880
# good: [c81d5bcb7b38ab0322aea93152c091451b82d3f3] NFSv4: clp->cl_cons_state < 0 signifies an invalid nfs_client
git bisect good c81d5bcb7b38ab0322aea93152c091451b82d3f3
# bad: [14e41b16e8cb677bb440dca2edba8b041646c742] SUNRPC: Don't allow waiting for exiting tasks
git bisect bad 14e41b16e8cb677bb440dca2edba8b041646c742
# good: [0af5fb5ed3d2fd9e110c6112271f022b744a849a] NFSv4: Treat ENETUNREACH errors as fatal for state recovery
git bisect good 0af5fb5ed3d2fd9e110c6112271f022b744a849a
# first bad commit: [14e41b16e8cb677bb440dca2edba8b041646c742] SUNRPC: Don't allow waiting for exiting tasks
diff mbox series

Patch

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 9b45fbdc90ca..73bc39281ef5 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -276,6 +276,8 @@  EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
 
 static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
 {
+	if (unlikely(current->flags & PF_EXITING))
+		return -EINTR;
 	schedule();
 	if (signal_pending_state(mode, current))
 		return -ERESTARTSYS;