Message ID | 20250220-nfsd-callback-v2-0-6a57f46e1c3a@kernel.org (mailing list archive) |
---|---|
Series | nfsd: don't allow concurrent queueing of workqueue jobs |
From: Chuck Lever <chuck.lever@oracle.com>

On Thu, 20 Feb 2025 11:47:12 -0500, Jeff Layton wrote:
> While looking at the problem that Li Lingfeng reported [1] around
> callback queueing failures, I noticed that there were potential
> scenarios where the callback workqueue jobs could run concurrently with
> an rpc_task. Since they touch some of the same fields, this is incorrect
> at best and potentially dangerous.
>
> This patchset adds a new mechanism for ensuring that the same
> nfsd4_callback can't run concurrently with itself, regardless of where
> it is in its execution. This also gives us a more sure mechanism for
> handling the places where we need to take and hold a reference on an
> object while the callback is running.
>
> [...]

Applied to nfsd-testing, thanks!

This series replaces:

https://lore.kernel.org/linux-nfs/20250218135423.1487309-1-lilingfeng3@huawei.com/

Review is still open.

[1/5] nfsd: prevent callback tasks running concurrently
      commit: 9a03a9d82410bdb758a6b342689e0c235bba94f1
[2/5] nfsd: eliminate cl_ra_cblist and NFSD4_CLIENT_CB_RECALL_ANY
      commit: 743fda103062626c828dbac774716e718a74f81b
[3/5] nfsd: replace CB_GETATTR_BUSY with NFSD4_CALLBACK_RUNNING
      commit: d2d94554567f486eba111e953e75745eca09bee3
[4/5] nfsd: move cb_need_restart flag into cb_flags
      commit: 355f1ec5ce21ab324d9b3978d2d5abe6d0c84024
[5/5] nfsd: handle errors from rpc_call_async()
      commit: d0f1ba5ed270fbda06248ef8af822a9e14708ee1

--
Chuck Lever
While looking at the problem that Li Lingfeng reported [1] around
callback queueing failures, I noticed that there were potential
scenarios where the callback workqueue jobs could run concurrently with
an rpc_task. Since they touch some of the same fields, this is incorrect
at best and potentially dangerous.

This patchset adds a new mechanism for ensuring that the same
nfsd4_callback can't run concurrently with itself, regardless of where
it is in its execution. This also gives us a more sure mechanism for
handling the places where we need to take and hold a reference on an
object while the callback is running.

This should also fix the problem that Li Lingfeng reported, since
queueing the work from nfsd4_cb_release() should never fail. Note that
their earlier patch (fdf5c9413ea) should be dropped from nfsd-testing
before this will apply cleanly.

[1]: https://lore.kernel.org/linux-nfs/20250218135423.1487309-1-lilingfeng3@huawei.com/

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
Changes in v2:
- added patch to handle rpc_call_async() errors
- rename NFSD4_CALLBACK_RESTART to NFSD4_CALLBACK_REQUEUE
- add patch to replace CB_GETATTR_BUSY with NFSD4_CALLBACK_RUNNING
- Link to v1: https://lore.kernel.org/r/20250218-nfsd-callback-v1-0-14f966967dd8@kernel.org

---
Jeff Layton (5):
      nfsd: prevent callback tasks running concurrently
      nfsd: eliminate cl_ra_cblist and NFSD4_CLIENT_CB_RECALL_ANY
      nfsd: replace CB_GETATTR_BUSY with NFSD4_CALLBACK_RUNNING
      nfsd: move cb_need_restart flag into cb_flags
      nfsd: handle errors from rpc_call_async()

 fs/nfsd/nfs4callback.c | 26 +++++++++++++++++---------
 fs/nfsd/nfs4layouts.c  |  7 ++++---
 fs/nfsd/nfs4proc.c     |  2 +-
 fs/nfsd/nfs4state.c    | 31 ++++++++++++++-----------------
 fs/nfsd/state.h        | 18 +++++++++++-------
 fs/nfsd/trace.h        |  2 +-
 6 files changed, 48 insertions(+), 38 deletions(-)
---
base-commit: b7e85fd7c8964e31f8fa1cf7333b12f442b642f1
change-id: 20250218-nfsd-callback-f723b8498c78

Best regards,
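
Editor's illustration: the general idea described in the cover letter (one "running" flag that keeps the same callback from being in flight twice, and that also brackets the reference held while it runs) can be sketched in userspace C11. This is not the code from the series. `struct cb_sketch`, `cb_sketch_try_queue()` and `cb_sketch_release()` are hypothetical names, and an `atomic_bool` stands in for what the patch titles suggest is a NFSD4_CALLBACK_RUNNING bit in cb_flags.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct nfsd4_callback; not the kernel layout. */
struct cb_sketch {
	atomic_bool running;	/* plays the role of a "callback running" bit */
	int refs;		/* references held on the owning object */
	int runs;		/* completed runs, for demonstration only */
};

/*
 * Try to claim the callback for one run. Only the caller that flips
 * "running" from false to true may proceed; it takes the reference that
 * is held for the duration of the run. Everyone else is turned away.
 */
static bool cb_sketch_try_queue(struct cb_sketch *cb)
{
	if (atomic_exchange(&cb->running, true))
		return false;	/* already queued or running */
	cb->refs++;		/* touched only by the current claimant */
	return true;
}

/*
 * Finish a run: drop the reference first, then clear the flag so that
 * the callback may be claimed again.
 */
static void cb_sketch_release(struct cb_sketch *cb)
{
	assert(atomic_load(&cb->running));	/* releasing unclaimed is a bug */
	cb->runs++;
	cb->refs--;
	atomic_store(&cb->running, false);
}
```

In this sketch a second `cb_sketch_try_queue()` on an in-flight callback simply returns false, and the flag is cleared only after the reference is dropped, so a new claimant never observes a stale reference count.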