
[v2,0/5] nfsd: don't allow concurrent queueing of workqueue jobs

Message ID: 20250220-nfsd-callback-v2-0-6a57f46e1c3a@kernel.org

Message

Jeff Layton Feb. 20, 2025, 4:47 p.m. UTC
While looking at the problem that Li Lingfeng reported [1] around
callback queueing failures, I noticed that there were potential
scenarios where the callback workqueue jobs could run concurrently with
an rpc_task. Since they touch some of the same fields, this is incorrect
at best and potentially dangerous.

This patchset adds a new mechanism for ensuring that the same
nfsd4_callback can't run concurrently with itself, regardless of where
it is in its execution. This also gives us a more sure mechanism for
handling the places where we need to take and hold a reference on an
object while the callback is running.

This should also fix the problem that Li Lingfeng reported, since
queueing the work from nfsd4_cb_release() should never fail. Note that
their earlier patch (fdf5c9413ea) should be dropped from nfsd-testing
before this will apply cleanly.

[1]: https://lore.kernel.org/linux-nfs/20250218135423.1487309-1-lilingfeng3@huawei.com/

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
Changes in v2:
- added patch to handle rpc_call_async() errors
- rename NFSD4_CALLBACK_RESTART to NFSD4_CALLBACK_REQUEUE
- add patch to replace CB_GETATTR_BUSY with NFSD4_CALLBACK_RUNNING
- Link to v1: https://lore.kernel.org/r/20250218-nfsd-callback-v1-0-14f966967dd8@kernel.org

---
Jeff Layton (5):
      nfsd: prevent callback tasks running concurrently
      nfsd: eliminate cl_ra_cblist and NFSD4_CLIENT_CB_RECALL_ANY
      nfsd: replace CB_GETATTR_BUSY with NFSD4_CALLBACK_RUNNING
      nfsd: move cb_need_restart flag into cb_flags
      nfsd: handle errors from rpc_call_async()

 fs/nfsd/nfs4callback.c | 26 +++++++++++++++++---------
 fs/nfsd/nfs4layouts.c  |  7 ++++---
 fs/nfsd/nfs4proc.c     |  2 +-
 fs/nfsd/nfs4state.c    | 31 ++++++++++++++-----------------
 fs/nfsd/state.h        | 18 +++++++++++-------
 fs/nfsd/trace.h        |  2 +-
 6 files changed, 48 insertions(+), 38 deletions(-)
---
base-commit: b7e85fd7c8964e31f8fa1cf7333b12f442b642f1
change-id: 20250218-nfsd-callback-f723b8498c78

Best regards,

Comments

Chuck Lever Feb. 20, 2025, 7:21 p.m. UTC
From: Chuck Lever <chuck.lever@oracle.com>

On Thu, 20 Feb 2025 11:47:12 -0500, Jeff Layton wrote:
> While looking at the problem that Li Lingfeng reported [1] around
> callback queueing failures, I noticed that there were potential
> scenarios where the callback workqueue jobs could run concurrently with
> an rpc_task. Since they touch some of the same fields, this is incorrect
> at best and potentially dangerous.
> 
> This patchset adds a new mechanism for ensuring that the same
> nfsd4_callback can't run concurrently with itself, regardless of where
> it is in its execution. This also gives us a more sure mechanism for
> handling the places where we need to take and hold a reference on an
> object while the callback is running.
> 
> [...]

Applied to nfsd-testing, thanks! This series replaces:

https://lore.kernel.org/linux-nfs/20250218135423.1487309-1-lilingfeng3@huawei.com/

Review is still open.

[1/5] nfsd: prevent callback tasks running concurrently
      commit: 9a03a9d82410bdb758a6b342689e0c235bba94f1
[2/5] nfsd: eliminate cl_ra_cblist and NFSD4_CLIENT_CB_RECALL_ANY
      commit: 743fda103062626c828dbac774716e718a74f81b
[3/5] nfsd: replace CB_GETATTR_BUSY with NFSD4_CALLBACK_RUNNING
      commit: d2d94554567f486eba111e953e75745eca09bee3
[4/5] nfsd: move cb_need_restart flag into cb_flags
      commit: 355f1ec5ce21ab324d9b3978d2d5abe6d0c84024
[5/5] nfsd: handle errors from rpc_call_async()
      commit: d0f1ba5ed270fbda06248ef8af822a9e14708ee1

--
Chuck Lever