Message ID | f301a88d04e1929a313c458bd8ce1218903b648a.1743183579.git.trond.myklebust@hammerspace.com (mailing list archive) |
---|---|
State | Handled Elsewhere |
Headers | show |
Series | [1/2] SUNRPC: Don't allow waiting for exiting tasks | expand |
On Fri, 2025-03-28 at 13:40 -0400, trondmy@kernel.org wrote: > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > Once a task calls exit_signals() it can no longer be signalled. So do > not allow it to do killable waits. > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> > --- > net/sunrpc/sched.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c > index 9b45fbdc90ca..73bc39281ef5 100644 > --- a/net/sunrpc/sched.c > +++ b/net/sunrpc/sched.c > @@ -276,6 +276,8 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue); > > static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode) > { > + if (unlikely(current->flags & PF_EXITING)) > + return -EINTR; > schedule(); > if (signal_pending_state(mode, current)) > return -ERESTARTSYS; Won't this mean that if a task is signalled and does a final fput, that a CLOSE sent in task_work will never get sent?
On Fri, 2025-03-28 at 13:53 -0400, Jeff Layton wrote: > On Fri, 2025-03-28 at 13:40 -0400, trondmy@kernel.org wrote: > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > Once a task calls exit_signals() it can no longer be signalled. So > > do > > not allow it to do killable waits. > > > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> > > --- > > net/sunrpc/sched.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c > > index 9b45fbdc90ca..73bc39281ef5 100644 > > --- a/net/sunrpc/sched.c > > +++ b/net/sunrpc/sched.c > > @@ -276,6 +276,8 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue); > > > > static int rpc_wait_bit_killable(struct wait_bit_key *key, int > > mode) > > { > > + if (unlikely(current->flags & PF_EXITING)) > > + return -EINTR; > > schedule(); > > if (signal_pending_state(mode, current)) > > return -ERESTARTSYS; > > Won't this mean that if a task is signalled and does a final fput, > that > a CLOSE sent in task_work will never get sent? It should mean that the close gets sent, but the task won't wait for completion.
On Fri, 2025-03-28 at 18:00 +0000, Trond Myklebust wrote: > On Fri, 2025-03-28 at 13:53 -0400, Jeff Layton wrote: > > On Fri, 2025-03-28 at 13:40 -0400, trondmy@kernel.org wrote: > > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > > > Once a task calls exit_signals() it can no longer be signalled. So > > > do > > > not allow it to do killable waits. > > > > > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> > > > --- > > > net/sunrpc/sched.c | 2 ++ > > > 1 file changed, 2 insertions(+) > > > > > > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c > > > index 9b45fbdc90ca..73bc39281ef5 100644 > > > --- a/net/sunrpc/sched.c > > > +++ b/net/sunrpc/sched.c > > > @@ -276,6 +276,8 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue); > > > > > > static int rpc_wait_bit_killable(struct wait_bit_key *key, int > > > mode) > > > { > > > + if (unlikely(current->flags & PF_EXITING)) > > > + return -EINTR; > > > schedule(); > > > if (signal_pending_state(mode, current)) > > > return -ERESTARTSYS; > > > > Won't this mean that if a task is signalled and does a final fput, > > that > > a CLOSE sent in task_work will never get sent? > > It should mean that the close gets sent, but the task won't wait for > completion. > Good enough, I guess. Reviewed-by: Jeff Layton <jlayton@kernel.org>
On Fri, 2025-03-28 at 14:09 -0400, Jeff Layton wrote: > On Fri, 2025-03-28 at 18:00 +0000, Trond Myklebust wrote: > > On Fri, 2025-03-28 at 13:53 -0400, Jeff Layton wrote: > > > On Fri, 2025-03-28 at 13:40 -0400, trondmy@kernel.org wrote: > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > > > > > Once a task calls exit_signals() it can no longer be signalled. > > > > So > > > > do > > > > not allow it to do killable waits. > > > > > > > > Signed-off-by: Trond Myklebust > > > > <trond.myklebust@hammerspace.com> > > > > --- > > > > net/sunrpc/sched.c | 2 ++ > > > > 1 file changed, 2 insertions(+) > > > > > > > > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c > > > > index 9b45fbdc90ca..73bc39281ef5 100644 > > > > --- a/net/sunrpc/sched.c > > > > +++ b/net/sunrpc/sched.c > > > > @@ -276,6 +276,8 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue); > > > > > > > > static int rpc_wait_bit_killable(struct wait_bit_key *key, int > > > > mode) > > > > { > > > > + if (unlikely(current->flags & PF_EXITING)) > > > > + return -EINTR; > > > > schedule(); > > > > if (signal_pending_state(mode, current)) > > > > return -ERESTARTSYS; > > > > > > Won't this mean that if a task is signalled and does a final > > > fput, > > > that > > > a CLOSE sent in task_work will never get sent? > > > > It should mean that the close gets sent, but the task won't wait > > for > > completion. > > > > Good enough, I guess. > > Reviewed-by: Jeff Layton <jlayton@kernel.org> > It ensures that the task can die, and with the actual close RPC call being asynchronous, it will keep going until either the server comes back and processes it, or someone force umounts the partition and kills the call, etc.
On Fri, Mar 28, 2025 at 01:40:44PM -0400, trondmy@kernel.org wrote: > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > Once a task calls exit_signals() it can no longer be signalled. So do > not allow it to do killable waits. We're seeing the LTP acct02 test failing in kernels with this patch applied, testing on systems with NFS root filesystems: 10271 05:03:09.064993 tst_test.c:1900: TINFO: LTP version: 20250130-1-g60fe84aaf 10272 05:03:09.076425 tst_test.c:1904: TINFO: Tested kernel: 6.15.0-rc1 #1 SMP PREEMPT Sun Apr 6 21:18:14 UTC 2025 aarch64 10273 05:03:09.076733 tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz' 10274 05:03:09.087803 tst_test.c:1722: TINFO: Overall timeout per run is 0h 01m 30s 10275 05:03:09.088107 tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz' 10276 05:03:09.093097 acct02.c:63: TINFO: CONFIG_BSD_PROCESS_ACCT_V3=y 10277 05:03:09.093400 acct02.c:240: TINFO: Verifying using 'struct acct_v3' 10278 05:03:10.053504 <6>[ 98.043143] Process accounting resumed 10279 05:03:10.053935 <6>[ 98.043143] Process accounting resumed 10280 05:03:10.064653 acct02.c:193: TINFO: == entry 1 == 10281 05:03:10.064953 acct02.c:84: TINFO: ac_comm != 'acct02_helper' ('acct02') 10282 05:03:10.076029 acct02.c:133: TINFO: ac_exitcode != 32768 (0) 10283 05:03:10.076331 acct02.c:141: TINFO: ac_ppid != 2466 (2461) 10284 05:03:10.076572 acct02.c:183: TFAIL: end of file reached 10285 05:03:10.076790 10286 05:03:10.087439 HINT: You _MAY_ be missing kernel fixes: 10287 05:03:10.087741 10288 05:03:10.087979 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d9570158b626 10289 05:03:10.088201 10290 05:03:10.088414 Summary: 10291 05:03:10.088618 passed 0 10292 05:03:10.098852 failed 1 10293 05:03:10.099212 broken 0 10294 05:03:10.099454 skipped 0 10295 05:03:10.099675 warnings 0 I ran a bisect which zeroed in on this commit (log below), I didn't look into it properly but the test does start and exit a test program to verify that accounting records get created properly which does look relevant. git bisect start # status: waiting for both good and bad commits # bad: [0af2f6be1b4281385b618cb86ad946eded089ac8] Linux 6.15-rc1 git bisect bad 0af2f6be1b4281385b618cb86ad946eded089ac8 # status: waiting for good commit(s), bad commit known # good: [38fec10eb60d687e30c8c6b5420d86e8149f7557] Linux 6.14 git bisect good 38fec10eb60d687e30c8c6b5420d86e8149f7557 # good: [fd71def6d9abc5ae362fb9995d46049b7b0ed391] Merge tag 'for-6.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux git bisect good fd71def6d9abc5ae362fb9995d46049b7b0ed391 # good: [93d52288679e29aaa44a6f12d5a02e8a90e742c5] Merge tag 'backlight-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight git bisect good 93d52288679e29aaa44a6f12d5a02e8a90e742c5 # good: [2cd5769fb0b78b8ef583ab4c0015c2c48d525dac] Merge tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core git bisect good 2cd5769fb0b78b8ef583ab4c0015c2c48d525dac # bad: [25757984d77da731922bed5001431673b6daf5ac] Merge tag 'staging-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging git bisect bad 25757984d77da731922bed5001431673b6daf5ac # good: [28a1b05678f4e88de90b0987b06e13c454ad9bd6] Merge tag 'i2c-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux git bisect good 28a1b05678f4e88de90b0987b06e13c454ad9bd6 # good: [92b71befc349587d58fdbbe6cdd68fb67f4933a8] Merge tag 'objtool-urgent-2025-04-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip git bisect good 92b71befc349587d58fdbbe6cdd68fb67f4933a8 # good: [5e17b5c71729d8ce936c83a579ed45f65efcb456] Merge tag 'fuse-update-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse git bisect good 5e17b5c71729d8ce936c83a579ed45f65efcb456 # good: [344a50b0f4eecc160c61d780f53d2f75586016ce] staging: gpib: lpvo_usb_gpib: struct gpib_board git bisect good 344a50b0f4eecc160c61d780f53d2f75586016ce # bad: [9e8f324bd44c1fe026b582b75213de4eccfa1163] NFSv4: Check for delegation validity in nfs_start_delegation_return_locked() git bisect bad 9e8f324bd44c1fe026b582b75213de4eccfa1163 # good: [df210d9b0951d714c1668c511ca5c8ff38cf6916] sunrpc: Add a sysfs file for adding a new xprt git bisect good df210d9b0951d714c1668c511ca5c8ff38cf6916 # good: [bf9be373b830a3e48117da5d89bb6145a575f880] SUNRPC: rpc_clnt_set_transport() must not change the autobind setting git bisect good bf9be373b830a3e48117da5d89bb6145a575f880 # good: [c81d5bcb7b38ab0322aea93152c091451b82d3f3] NFSv4: clp->cl_cons_state < 0 signifies an invalid nfs_client git bisect good c81d5bcb7b38ab0322aea93152c091451b82d3f3 # bad: [14e41b16e8cb677bb440dca2edba8b041646c742] SUNRPC: Don't allow waiting for exiting tasks git bisect bad 14e41b16e8cb677bb440dca2edba8b041646c742 # good: [0af5fb5ed3d2fd9e110c6112271f022b744a849a] NFSv4: Treat ENETUNREACH errors as fatal for state recovery git bisect good 0af5fb5ed3d2fd9e110c6112271f022b744a849a # first bad commit: [14e41b16e8cb677bb440dca2edba8b041646c742] SUNRPC: Don't allow waiting for exiting tasks
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c index 9b45fbdc90ca..73bc39281ef5 100644 --- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -276,6 +276,8 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue); static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode) { + if (unlikely(current->flags & PF_EXITING)) + return -EINTR; schedule(); if (signal_pending_state(mode, current)) return -ERESTARTSYS;