Message ID | 4088a4fe-1c1e-7b9b-0685-dac367094b61@virtuozzo.com (mailing list archive) |
---|---|
State | New, archived |
Series | nfs4: skip locks_lock_inode_wait() in nfs4_locku_done if FL_ACCESS is set |
On 05.12.2021 13:12, Vasily Averin wrote:
> In 2006 Trond Myklebust added support for the FL_ACCESS flag in
> commit 01c3b861cd77 ("NLM,NFSv4: Wait on local locks before we put RPC
> calls on the wire"), as a result of which _nfs4_proc_setlk() began
> to execute _nfs4_do_setlk() with a modified request->fl_flags in which
> the FL_ACCESS flag was set.
>
> This did not matter until 2015, when commit c69899a17ca4 ("NFSv4:
> Update of VFS byte range lock must be atomic with the stateid update")
> added a do_vfs_lock call to nfs4_locku_done().
> nfs4_locku_done() in this case uses calldata->fl of nfs4_unlockdata.
> It is copied from struct nfs4_lockdata, which in turn uses the fl_flags
> copied from the request->fl_flags provided by _nfs4_do_setlk(), i.e. with
> the FL_ACCESS flag set.
>
> The FL_ACCESS flag is removed in nfs4_lock_done() in the non-cancelled
> case; however, the rpc task can be cancelled earlier.
>
> As a result, flock_lock_inode() can be called with request->fl_type
> F_UNLCK and FL_ACCESS set in fl_flags.
> Such a request is processed incorrectly. Instead of the expected search
> for and removal of existing flocks, it jumps to the "find_conflict" label
> and can call the locks_insert_block() function.
>
> On kernels before 2018 (i.e. before commit 7b587e1a5a6c
> ("NFS: use locks_copy_lock() to copy locks.")) this caused a BUG in
> __locks_insert_block() because the copied fl had an incorrectly linked
> fl_block.

Originally this was found while processing real customer bug reports on a
RHEL7-based OpenVz7 kernel.

kernel BUG at fs/locks.c:612!
CPU: 7 PID: 1019852 Comm: kworker/u65:43 ve: 0 Kdump: loaded Tainted: G W O ------------ 3.10.0-1160.41.1.vz7.183.5 #1 183.5
Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.3 05/23/2018
Workqueue: rpciod rpc_async_schedule [sunrpc]
task: ffff9d50e5de0000 ti: ffff9d3c9ec10000 task.ti: ffff9d3c9ec10000
RIP: 0010:[<ffffffffbe0d590a>] [<ffffffffbe0d590a>] __locks_insert_block+0xea/0xf0
RSP: 0018:ffff9d3c9ec13c78 EFLAGS: 00010297
RAX: 0000000000000000 RBX: ffff9d529554e180 RCX: 0000000000000001
RDX: 0000000000000001 RSI: ffff9d51d2363a98 RDI: ffff9d51d2363ab0
RBP: ffff9d3c9ec13c88 R08: 0000000000000003 R09: ffff9d5f5b8dfcd0
R10: ffff9d5f5b8dfd08 R11: ffffbb21594b5a80 R12: ffff9d51d2363a98
R13: 0000000000000000 R14: ffff9d50e5de0000 R15: ffff9d3da03915f8
FS: 0000000000000000(0000) GS:ffff9d55bfbc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f93d65ee1e8 CR3: 00000029a04d6000 CR4: 00000000000607e0
Call Trace:
 [<ffffffffbe0d5939>] locks_insert_block+0x29/0x40
 [<ffffffffbe0d6d5b>] flock_lock_inode_wait+0x2bb/0x310
 [<ffffffffc01c7470>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc]
 [<ffffffffbe0d6dce>] locks_lock_inode_wait+0x1e/0x40
 [<ffffffffc0c9f5c0>] nfs4_locku_done+0x90/0x190 [nfsv4]
 [<ffffffffc01bb750>] ? call_decode+0x1f0/0x880 [sunrpc]
 [<ffffffffc01c7470>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc]
 [<ffffffffc01c74a1>] rpc_exit_task+0x31/0x90 [sunrpc]
 [<ffffffffc01c9654>] __rpc_execute+0xe4/0x470 [sunrpc]
 [<ffffffffc01c99f2>] rpc_async_schedule+0x12/0x20 [sunrpc]
 [<ffffffffbdec1b25>] process_one_work+0x185/0x440
 [<ffffffffbdec27e6>] worker_thread+0x126/0x3c0
 [<ffffffffbdec26c0>] ? manage_workers.isra.26+0x2a0/0x2a0
 [<ffffffffbdec9e31>] kthread+0xd1/0xe0
 [<ffffffffbdec9d60>] ? create_kthread+0x60/0x60
 [<ffffffffbe5d2eb7>] ret_from_fork_nospec_begin+0x21/0x21
 [<ffffffffbdec9d60>] ? create_kthread+0x60/0x60
Code: 48 85 d2 49 89 54 24 08 74 04 48 89 4a 08 48 89 0c c5 c0 ee 09 bf 49 89 74 24 10 5b 41 5c 5d c3 90 49 8b 44 24 28 e9 80 ff ff ff <0f> 0b 0f 1f 40 00 66 66 66 66 90 55 48 89 e5 41 54 49 89 f4 53
RIP [<ffffffffbe0d590a>] __locks_insert_block+0xea/0xf0
 RSP <ffff9d3c9ec13c78>

In the crash dump I found that nfs4_unlockdata and the (already freed but not
yet reused) nfs4_lockdata both have fl->fl_flags = 0x8a, i.e. FL_SLEEP,
FL_ACCESS and FL_FLOCK set.

Thank you,
	Vasily Averin
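For reference, 0x8a is exactly that combination of flags. A minimal stand-alone
check, assuming the classic FL_* values from include/linux/fs.h (FL_FLOCK = 2,
FL_ACCESS = 8, FL_SLEEP = 128); this is illustration only, not kernel code:

/* Decodes fl_flags = 0x8a under the assumed include/linux/fs.h values. */
#include <stdio.h>

#define FL_FLOCK   2	/* flock(2)-style lock */
#define FL_ACCESS  8	/* not trying to lock, just looking */
#define FL_SLEEP   128	/* blocking lock */

int main(void)
{
	unsigned int fl_flags = FL_SLEEP | FL_ACCESS | FL_FLOCK;

	printf("fl_flags = 0x%x\n", fl_flags);	/* prints 0x8a */
	return 0;
}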
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index ee3bc79f6ca3..4417dde69202 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -6728,7 +6728,9 @@ static void nfs4_locku_done(struct rpc_task *task, void *data)
 	switch (task->tk_status) {
 	case 0:
 		renew_lease(calldata->server, calldata->timestamp);
-		locks_lock_inode_wait(calldata->lsp->ls_state->inode, &calldata->fl);
+		if (!(calldata->fl.fl_flags & FL_ACCESS))
+			locks_lock_inode_wait(calldata->lsp->ls_state->inode,
+					      &calldata->fl);
 		if (nfs4_update_lock_stateid(calldata->lsp,
 					&calldata->res.stateid))
 			break;
In 2006 Trond Myklebust added support for the FL_ACCESS flag in
commit 01c3b861cd77 ("NLM,NFSv4: Wait on local locks before we put RPC
calls on the wire"), as a result of which _nfs4_proc_setlk() began
to execute _nfs4_do_setlk() with a modified request->fl_flags in which
the FL_ACCESS flag was set.

This did not matter until 2015, when commit c69899a17ca4 ("NFSv4:
Update of VFS byte range lock must be atomic with the stateid update")
added a do_vfs_lock call to nfs4_locku_done().
nfs4_locku_done() in this case uses calldata->fl of nfs4_unlockdata.
It is copied from struct nfs4_lockdata, which in turn uses the fl_flags
copied from the request->fl_flags provided by _nfs4_do_setlk(), i.e. with
the FL_ACCESS flag set.

The FL_ACCESS flag is removed in nfs4_lock_done() in the non-cancelled case;
however, the rpc task can be cancelled earlier.

As a result, flock_lock_inode() can be called with request->fl_type F_UNLCK
and FL_ACCESS set in fl_flags.
Such a request is processed incorrectly. Instead of the expected search for
and removal of existing flocks, it jumps to the "find_conflict" label and
can call the locks_insert_block() function.

On kernels before 2018 (i.e. before commit 7b587e1a5a6c
("NFS: use locks_copy_lock() to copy locks.")) this caused a BUG in
__locks_insert_block() because the copied fl had an incorrectly linked
fl_block. On newer kernels all lists are properly initialized and no BUG
occurs; however, in any case such a call does nothing useful.

If I understand correctly, the locks_lock_inode_wait(F_UNLCK) call is
required to revert the locks_lock_inode_wait(F_LCK) request sent from
nfs4_lock_done(). An additional F_UNLCK request is dangerous, because it
can remove a flock set not by the cancelled task but by some other
concurrent process.

So I think we need to add an FL_ACCESS check in nfs4_locku_done() and skip
the locks_lock_inode_wait() call if this flag is set.

Fixes: c69899a17ca4 ("NFSv4: Update of VFS byte range lock must be atomic with the stateid update")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
---
 fs/nfs/nfs4proc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
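For context, the branching in flock_lock_inode() that the description above
refers to can be modelled by the following stand-alone sketch. This is NOT
kernel code: the FL_* values mirror the assumed include/linux/fs.h defines,
struct toy_request and toy_flock_lock_inode() are made-up stand-ins, and the
model assumes a conflicting lock is already present on the inode; it only
shows why an F_UNLCK request with FL_ACCESS set skips the removal path and
can end up queued as a blocked waiter.

/* Toy model of the flock_lock_inode() control flow described above. */
#include <stdio.h>
#include <fcntl.h>		/* F_UNLCK */

#define FL_FLOCK   2		/* flock(2)-style lock */
#define FL_ACCESS  8		/* not trying to lock, just looking */
#define FL_SLEEP   128		/* blocking lock */

struct toy_request {
	int type;		/* F_RDLCK, F_WRLCK or F_UNLCK */
	unsigned int flags;	/* FL_* bits */
};

/* Returns a description of the path the real function would take. */
static const char *toy_flock_lock_inode(const struct toy_request *req)
{
	if (req->flags & FL_ACCESS)
		goto find_conflict;	/* the early goto that causes the problem */

	/* Normal path: scan the inode's flock list, drop the matching lock,
	 * and for F_UNLCK stop right here. */
	if (req->type == F_UNLCK)
		return "scanned flock list, removed matching lock, done";

find_conflict:
	/* FL_ACCESS path: only a conflict check.  Assuming a conflicting lock
	 * exists, a request with FL_SLEEP set is queued as a blocked waiter
	 * (locks_insert_block() in the kernel) -- even for F_UNLCK. */
	if (req->flags & FL_SLEEP)
		return "queued as blocked waiter via locks_insert_block()";
	return "returned -EAGAIN without touching existing locks";
}

int main(void)
{
	struct toy_request unlock = {
		.type = F_UNLCK,
		.flags = FL_SLEEP | FL_ACCESS | FL_FLOCK,	/* = 0x8a */
	};

	printf("F_UNLCK with fl_flags=0x%x: %s\n",
	       unlock.flags, toy_flock_lock_inode(&unlock));
	return 0;
}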