
nfs4: skip locks_lock_inode_wait() in nfs4_locku_done if FL_ACCESS is set

Message ID 4088a4fe-1c1e-7b9b-0685-dac367094b61@virtuozzo.com (mailing list archive)
State New, archived
Series nfs4: skip locks_lock_inode_wait() in nfs4_locku_done if FL_ACCESS is set

Commit Message

Vasily Averin Dec. 5, 2021, 10:12 a.m. UTC
In 2006 Trond Myklebust added support for the FL_ACCESS flag in
commit 01c3b861cd77 ("NLM,NFSv4: Wait on local locks before we put RPC
calls on the wire"). As a result, _nfs4_proc_setlk() began to call
_nfs4_do_setlk() with a modified request->fl_flags in which the
FL_ACCESS flag was set.

This did not matter until 2015, when commit c69899a17ca4 ("NFSv4:
Update of VFS byte range lock must be atomic with the stateid update")
added a do_vfs_lock() call to nfs4_locku_done().
In this path nfs4_locku_done() uses calldata->fl of struct
nfs4_unlockdata, which is copied from struct nfs4_lockdata, whose
fl_flags in turn are copied from the request->fl_flags provided by
_nfs4_do_setlk(), i.e. with the FL_ACCESS flag set.
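
A simplified sketch of that propagation, abridged from the code paths
named above (error paths and delegation handling dropped, so treat it
as an illustration rather than verbatim kernel source):

/* Abridged sketch of _nfs4_proc_setlk(), not verbatim kernel code. */
static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd,
			    struct file_lock *request)
{
	unsigned char fl_flags = request->fl_flags;
	int status;

	/* Added by commit 01c3b861cd77: wait on local locks first. */
	request->fl_flags |= FL_ACCESS;
	status = do_vfs_lock(state->inode, request);
	if (status < 0)
		goto out;
	/*
	 * request->fl_flags still carries FL_ACCESS here, and
	 * _nfs4_do_setlk() copies it into struct nfs4_lockdata,
	 * from where it later reaches struct nfs4_unlockdata.
	 */
	status = _nfs4_do_setlk(state, cmd, request, NFS_LOCK_NEW);
out:
	request->fl_flags = fl_flags;	/* restored only on the way out */
	return status;
}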

The FL_ACCESS flag is removed in nfs4_lock_done() in the non-cancelled
case; however, the RPC task can be cancelled earlier.

As a result, flock_lock_inode() can be called with request->fl_type set
to F_UNLCK and the FL_ACCESS flag set in fl_flags.
Such a request is processed incorrectly: instead of the expected search
for and removal of existing flocks, it jumps to the "find_conflict"
label and can call locks_insert_block().
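
The relevant logic in flock_lock_inode() looks roughly like this
(heavily abridged from fs/locks.c; allocation, locking and the tail of
the function elided):

/* Heavily abridged sketch of flock_lock_inode(), not verbatim code. */
static int flock_lock_inode(struct inode *inode, struct file_lock *request)
{
	struct file_lock_context *ctx;
	struct file_lock *fl;

	ctx = locks_get_lock_context(inode, request->fl_type);
	/* ... allocation and flc_lock locking elided ... */

	if (request->fl_flags & FL_ACCESS)
		goto find_conflict;	/* skips the F_UNLCK path below */

	list_for_each_entry(fl, &ctx->flc_flock, fl_list) {
		if (request->fl_file != fl->fl_file)
			continue;
		/* ... found our flock: delete it, handle F_UNLCK ... */
	}

find_conflict:
	list_for_each_entry(fl, &ctx->flc_flock, fl_list) {
		if (!flock_locks_conflict(request, fl))
			continue;
		if (!(request->fl_flags & FL_SLEEP))
			return -EAGAIN;
		/* FL_SLEEP is set in the case at hand, so we block: */
		locks_insert_block(fl, request);
		return FILE_LOCK_DEFERRED;
	}
	/* ... */
	return 0;
}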

On kernels before 2018 (i.e. before commit 7b587e1a5a6c
("NFS: use locks_copy_lock() to copy locks.")) this caused a BUG in
__locks_insert_block(), because the copied fl had an incorrectly linked fl_block.
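
Before commit 7b587e1a5a6c the file_lock embedded in struct
nfs4_unlockdata was copied without locks_copy_lock(), so its fl_block
list_head could carry stale pointers from the original lock. The check
that fires sits at the top of __locks_insert_block(); roughly, from
fs/locks.c of that era:

/* Abridged: the waiter must not already be on a block list. */
static void __locks_insert_block(struct file_lock *blocker,
				 struct file_lock *waiter)
{
	BUG_ON(!list_empty(&waiter->fl_block));
	waiter->fl_next = blocker;
	list_add_tail(&waiter->fl_block, &blocker->fl_block);
	/* ... */
}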

On newer kernels all lists are properly initialized and no BUG occurs;
in any case, however, such a call does nothing useful.

If I understand correctly, the locks_lock_inode_wait(F_UNLCK) call is
required to revert the locks_lock_inode_wait(F_LCK) request sent from
nfs4_lock_done(). An extra F_UNLCK request is dangerous, because it can
remove a flock set not by the cancelled task but by some other
concurrent process.

So I think we need to add an FL_ACCESS check in nfs4_locku_done() and
skip the locks_lock_inode_wait() call if this flag is set.

Fixes: c69899a17ca4 ("NFSv4: Update of VFS byte range lock must be atomic with the stateid update")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
---
 fs/nfs/nfs4proc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Vasily Averin Dec. 5, 2021, 10:43 a.m. UTC | #1
On 05.12.2021 13:12, Vasily Averin wrote:
> On kernels before 2018 (i.e. before commit 7b587e1a5a6c
> ("NFS: use locks_copy_lock() to copy locks.")) this caused a BUG in
> __locks_insert_block(), because the copied fl had an incorrectly linked fl_block.

Originally this was found while processing real customer bug reports
on a RHEL7-based OpenVz7 kernel:
 kernel BUG at fs/locks.c:612!
 CPU: 7 PID: 1019852 Comm: kworker/u65:43 ve: 0 Kdump: loaded Tainted: G        W  O   ------------   3.10.0-1160.41.1.vz7.183.5 #1 183.5
 Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.3 05/23/2018
 Workqueue: rpciod rpc_async_schedule [sunrpc]
 task: ffff9d50e5de0000 ti: ffff9d3c9ec10000 task.ti: ffff9d3c9ec10000
 RIP: 0010:[<ffffffffbe0d590a>]  [<ffffffffbe0d590a>] __locks_insert_block+0xea/0xf0
 RSP: 0018:ffff9d3c9ec13c78  EFLAGS: 00010297
 RAX: 0000000000000000 RBX: ffff9d529554e180 RCX: 0000000000000001
 RDX: 0000000000000001 RSI: ffff9d51d2363a98 RDI: ffff9d51d2363ab0
 RBP: ffff9d3c9ec13c88 R08: 0000000000000003 R09: ffff9d5f5b8dfcd0
 R10: ffff9d5f5b8dfd08 R11: ffffbb21594b5a80 R12: ffff9d51d2363a98
 R13: 0000000000000000 R14: ffff9d50e5de0000 R15: ffff9d3da03915f8
 FS:  0000000000000000(0000) GS:ffff9d55bfbc0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f93d65ee1e8 CR3: 00000029a04d6000 CR4: 00000000000607e0
 Call Trace:
  [<ffffffffbe0d5939>] locks_insert_block+0x29/0x40
  [<ffffffffbe0d6d5b>] flock_lock_inode_wait+0x2bb/0x310
  [<ffffffffc01c7470>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc]
  [<ffffffffbe0d6dce>] locks_lock_inode_wait+0x1e/0x40
  [<ffffffffc0c9f5c0>] nfs4_locku_done+0x90/0x190 [nfsv4]
  [<ffffffffc01bb750>] ? call_decode+0x1f0/0x880 [sunrpc]
  [<ffffffffc01c7470>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc]
  [<ffffffffc01c74a1>] rpc_exit_task+0x31/0x90 [sunrpc]
  [<ffffffffc01c9654>] __rpc_execute+0xe4/0x470 [sunrpc]
  [<ffffffffc01c99f2>] rpc_async_schedule+0x12/0x20 [sunrpc]
  [<ffffffffbdec1b25>] process_one_work+0x185/0x440
  [<ffffffffbdec27e6>] worker_thread+0x126/0x3c0
  [<ffffffffbdec26c0>] ? manage_workers.isra.26+0x2a0/0x2a0
  [<ffffffffbdec9e31>] kthread+0xd1/0xe0
  [<ffffffffbdec9d60>] ? create_kthread+0x60/0x60
  [<ffffffffbe5d2eb7>] ret_from_fork_nospec_begin+0x21/0x21
  [<ffffffffbdec9d60>] ? create_kthread+0x60/0x60
 Code: 48 85 d2 49 89 54 24 08 74 04 48 89 4a 08 48 89 0c c5 c0 ee 09 bf 49 89 74 24 10 5b 41 5c 5d c3 90 49 8b 44 24 28 e9 80 ff ff ff <0f> 0b 0f 1f 40 00 66 66 66 66 90 55 48 89 e5 41 54 49 89 f4 53
 RIP  [<ffffffffbe0d590a>] __locks_insert_block+0xea/0xf0
 RSP <ffff9d3c9ec13c78>

In the crashdump I have found that both the struct nfs4_unlockdata and
the (already freed but not yet reused) struct nfs4_lockdata have
fl->fl_flags = 0x8a, i.e. FL_SLEEP, FL_ACCESS and FL_FLOCK set.
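
For reference, decoding 0x8a against the flag constants in
include/linux/fs.h (values as found in these kernels):

#define FL_FLOCK	2	/* 0x02: flock(2) style lock */
#define FL_ACCESS	8	/* 0x08: not trying to lock, just looking */
#define FL_SLEEP	128	/* 0x80: a blocking lock */
/* FL_SLEEP | FL_ACCESS | FL_FLOCK == 0x8a, matching the dump */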

Thank you,
	Vasily Averin

Patch

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index ee3bc79f6ca3..4417dde69202 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -6728,7 +6728,9 @@  static void nfs4_locku_done(struct rpc_task *task, void *data)
 	switch (task->tk_status) {
 		case 0:
 			renew_lease(calldata->server, calldata->timestamp);
-			locks_lock_inode_wait(calldata->lsp->ls_state->inode, &calldata->fl);
+			if (!(calldata->fl.fl_flags & FL_ACCESS))
+				locks_lock_inode_wait(calldata->lsp->ls_state->inode,
+						      &calldata->fl);
 			if (nfs4_update_lock_stateid(calldata->lsp,
 					&calldata->res.stateid))
 				break;