Message ID: 20140811113557.692d408b@tlielax.poochiereds.net (mailing list archive)
State: New, archived
On Mon, Aug 11, 2014 at 11:35 AM, Jeff Layton <jeff.layton@primarydata.com> wrote: > On Mon, 11 Aug 2014 08:09:24 -0700 > Christoph Hellwig <hch@infradead.org> wrote: > >> >> I managed to hit a similar but different issue with your patch applied (note >> the slab poisoning pattern in rax): >> >> generic/089 409s ...[ 399.057379] general protection fault: 0000 [#1] >> SMP >> [ 399.059137] Modules linked in: >> [ 399.060089] CPU: 2 PID: 4367 Comm: kworker/2:2 Not tainted 3.16.0-rc6+ #1153 >> [ 399.060617] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 >> [ 399.060617] Workqueue: nfsiod free_lock_state_work >> [ 399.060617] task: ffff88007ab68810 ti: ffff88007c3b4000 task.ti: ffff88007c3b4000 >> [ 399.060617] RIP: 0010:[<ffffffff8134e5bb>] [<ffffffff8134e5bb>] nfs41_free_lock_state+0x2b/0x70 >> [ 399.060617] RSP: 0018:ffff88007c3b7d18 EFLAGS: 00010286 >> [ 399.060617] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88007cdd3800 RCX: 0000000000000000 >> [ 399.060617] RDX: ffffffff81e04c60 RSI: ffff88007cdd39a0 RDI: ffff880079e5a000 >> [ 399.060617] RBP: ffff88007c3b7d38 R08: ffffffff832df6d0 R09: 000001c90f100000 >> [ 399.060617] R10: 0000000000000000 R11: 00000000000656f0 R12: ffff880079e5a000 >> [ 399.060617] R13: ffff88007fd18b00 R14: ffff88007cdd39c0 R15: 0000000000000000 >> [ 399.060617] FS: 0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000 >> [ 399.060617] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >> [ 399.060617] CR2: 00007f5ac2f56800 CR3: 000000007a95b000 CR4: 00000000000006e0 >> [ 399.060617] Stack: >> [ 399.060617] 000000007fd13240 ffff88007a8f7800 ffff88007fd13240 ffff88007fd18b00 >> [ 399.060617] ffff88007c3b7d58 ffffffff813621ae ffff88007cdd39c0 ffff88007a4d0c40 >> [ 399.060617] ffff88007c3b7dd8 ffffffff810cc877 ffffffff810cc80d ffff88007fd13258 >> [ 399.060617] Call Trace: >> [ 399.060617] [<ffffffff813621ae>] free_lock_state_work+0x2e/0x40 >> [ 399.060617] [<ffffffff810cc877>] process_one_work+0x1c7/0x490 >> [ 399.060617] [<ffffffff810cc80d>] ? process_one_work+0x15d/0x490 >> [ 399.060617] [<ffffffff810cd569>] worker_thread+0x119/0x4f0 >> [ 399.060617] [<ffffffff810fbbad>] ? trace_hardirqs_on+0xd/0x10 >> [ 399.060617] [<ffffffff810cd450>] ? init_pwq+0x190/0x190 >> [ 399.060617] [<ffffffff810d3c6f>] kthread+0xdf/0x100 >> [ 399.060617] [<ffffffff810d3b90>] ? __init_kthread_worker+0x70/0x70 >> [ 399.060617] [<ffffffff81d9873c>] ret_from_fork+0x7c/0xb0 >> [ 399.060617] [<ffffffff810d3b90>] ? __init_kthread_worker+0x70/0x70 >> >> (gdb) l *(nfs41_free_lock_state+0x2b) >> 0xffffffff8134e5bb is in nfs41_free_lock_state (fs/nfs/nfs4proc.c:8313). >> 8308 nfs41_free_lock_state(struct nfs_server *server, struct >> nfs4_lock_state *lsp) >> 8309 { >> 8310 struct rpc_task *task; >> 8311 struct rpc_cred *cred = lsp->ls_state->owner->so_cred; >> 8312 >> 8313 task = _nfs41_free_stateid(server, &lsp->ls_stateid, >> cred, false); >> 8314 nfs4_free_lock_state(server, lsp); >> 8315 if (IS_ERR(task)) >> 8316 return; >> 8317 rpc_put_task(task); >> > > Oof -- right. Same problem, just in a different spot. So there, we need > the openowner. We don't have a pointer directly to that, so we're > probably best off just holding a reference to the open stateid, and > pinning the superblock too. > > Maybe something like this instead? 
I'm also running xfstests on a VM > now to see if I can reproduce this and verify the fix: > > ---------------------[snip]----------------------- > > [PATCH] nfs: pin superblock and open state when running free_lock_state operations > > Christoph reported seeing this oops when running xfstests on a v4.1 client: > > generic/089 242s ...[ 2187.041239] general protection fault: 0000 [#1] SMP > [ 2187.042899] Modules linked in: > [ 2187.044000] CPU: 0 PID: 11913 Comm: kworker/0:1 Not tainted 3.16.0-rc6+ #1151 > [ 2187.044287] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 > [ 2187.044287] Workqueue: nfsiod free_lock_state_work > [ 2187.044287] task: ffff880072b50cd0 ti: ffff88007a4ec000 task.ti: ffff88007a4ec000 > [ 2187.044287] RIP: 0010:[<ffffffff81361ca6>] [<ffffffff81361ca6>] free_lock_state_work+0x16/0x30 > [ 2187.044287] RSP: 0018:ffff88007a4efd58 EFLAGS: 00010296 > [ 2187.044287] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88007a947ac0 RCX: 8000000000000000 > [ 2187.044287] RDX: ffffffff826af9e0 RSI: ffff88007b093c00 RDI: ffff88007b093db8 > [ 2187.044287] RBP: ffff88007a4efd58 R08: ffffffff832d3e10 R09: 000001c40efc0000 > [ 2187.044287] R10: 0000000000000000 R11: 0000000000059e30 R12: ffff88007fc13240 > [ 2187.044287] R13: ffff88007fc18b00 R14: ffff88007b093db8 R15: 0000000000000000 > [ 2187.044287] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 > [ 2187.044287] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [ 2187.044287] CR2: 00007f93ec33fb80 CR3: 0000000079dc2000 CR4: 00000000000006f0 > [ 2187.044287] Stack: > [ 2187.044287] ffff88007a4efdd8 ffffffff810cc877 ffffffff810cc80d ffff88007fc13258 > [ 2187.044287] 000000007a947af0 0000000000000000 ffffffff8353ccc8 ffffffff82b6f3d0 > [ 2187.044287] 0000000000000000 ffffffff82267679 ffff88007a4efdd8 ffff88007fc13240 > [ 2187.044287] Call Trace: > [ 2187.044287] [<ffffffff810cc877>] process_one_work+0x1c7/0x490 > [ 2187.044287] [<ffffffff810cc80d>] ? process_one_work+0x15d/0x490 > [ 2187.044287] [<ffffffff810cd569>] worker_thread+0x119/0x4f0 > [ 2187.044287] [<ffffffff810fbbad>] ? trace_hardirqs_on+0xd/0x10 > [ 2187.044287] [<ffffffff810cd450>] ? init_pwq+0x190/0x190 > [ 2187.044287] [<ffffffff810d3c6f>] kthread+0xdf/0x100 > [ 2187.044287] [<ffffffff810d3b90>] ? __init_kthread_worker+0x70/0x70 > [ 2187.044287] [<ffffffff81d9873c>] ret_from_fork+0x7c/0xb0 > [ 2187.044287] [<ffffffff810d3b90>] ? __init_kthread_worker+0x70/0x70 > [ 2187.044287] Code: 0f 1f 44 00 00 31 c0 5d c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 8d b7 48 fe ff ff 48 8b 87 58 fe ff ff 48 89 e5 48 8b 40 30 <48> 8b 00 48 8b 10 48 89 c7 48 8b 92 90 03 00 00 ff 52 28 5d c3 > [ 2187.044287] RIP [<ffffffff81361ca6>] free_lock_state_work+0x16/0x30 > [ 2187.044287] RSP <ffff88007a4efd58> > [ 2187.103626] ---[ end trace 0f11326d28e5d8fa ]--- > > It appears that the lock state outlived the open state with which it > was associated. > > Fix this by pinning down the open stateid and the superblock prior to > queueing the workqueue job, and then releasing putting those references > once the RPC task has been queued. 
> > Reported-by: Christoph Hellwig <hch@infradead.org> > Signed-off-by: Jeff Layton <jlayton@primarydata.com> > --- > fs/nfs/nfs4state.c | 13 +++++++++---- > 1 file changed, 9 insertions(+), 4 deletions(-) > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c > index a770c8e469a7..fb29c7cbe8e3 100644 > --- a/fs/nfs/nfs4state.c > +++ b/fs/nfs/nfs4state.c > @@ -804,11 +804,14 @@ free_lock_state_work(struct work_struct *work) > { > struct nfs4_lock_state *lsp = container_of(work, > struct nfs4_lock_state, ls_release); > - struct nfs4_state *state = lsp->ls_state; > - struct nfs_server *server = state->owner->so_server; > + struct nfs4_state *osp = lsp->ls_state; > + struct super_block *sb = osp->inode->i_sb; > + struct nfs_server *server = NFS_SB(sb); > struct nfs_client *clp = server->nfs_client; > > clp->cl_mvops->free_lock_state(server, lsp); > + nfs4_put_open_state(osp); > + nfs_sb_deactive(sb); > } > > /* > @@ -896,9 +899,11 @@ void nfs4_put_lock_state(struct nfs4_lock_state *lsp) > if (list_empty(&state->lock_states)) > clear_bit(LK_STATE_IN_USE, &state->flags); > spin_unlock(&state->state_lock); > - if (test_bit(NFS_LOCK_INITIALIZED, &lsp->ls_flags)) > + if (test_bit(NFS_LOCK_INITIALIZED, &lsp->ls_flags)) { > + atomic_inc(&lsp->ls_state->count); > + nfs_sb_active(lsp->ls_state->inode->i_sb); > queue_work(nfsiod_workqueue, &lsp->ls_release); > - else { > + } else { > server = state->owner->so_server; > nfs4_free_lock_state(server, lsp); > } > -- > 1.9.3 > Can we step back a little? It looks to me as if we're working on a whole new range of symptoms that are a consequence of a set of locking rule changes for fl_release_private that were not well thought through. Prior to commit 72f98e72551fa, the locking rules for fl_release_private stated that it _does_ allow blocking. Nothing in commit 72f98e72551fa was done to change the code that did block, instead it just decreed that fl_release_private no longer allows blocking as if that magically makes thing work. This whole thing of doing an asynchronous call started because locks_delete_lock() is calling lock_free_lock() instead of just unlinking, and then deferring the actual freeing of the locks until we've dropped the spinlocks. It should be trivial to change locks_delete_lock() so that after calling locks_unlink_lock(), it adds to a private list that can then be drained at the end of flock_lock_file(), __posix_lock_file(), and locks_remove_file(). The lease code looks like it may need to be special cased (lease_modify()), but that can just keep doing what it is currently doing until someone fixes it. Pretty please....
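For concreteness, a minimal sketch of the deferred-free approach described above might look like the following. The helper signatures, the locks_dispose_list() name, and the reuse of fl_block as the list linkage are assumptions for illustration only; this is not the actual fs/locks.c code.

#include <linux/fs.h>
#include <linux/list.h>

/* Unlink under the file-lock spinlock, but park the lock on a private
 * list instead of freeing it (and calling fl_release_private) there. */
static void locks_delete_lock(struct file_lock **thisfl_p,
			      struct list_head *dispose)
{
	struct file_lock *fl = *thisfl_p;

	locks_unlink_lock(thisfl_p);
	list_add(&fl->fl_block, dispose);	/* fl_block reuse is an assumption */
}

/* Drained by flock_lock_file(), __posix_lock_file() and
 * locks_remove_file() once the spinlock has been dropped, so the
 * fl_release_private callback is again free to block. */
static void locks_dispose_list(struct list_head *dispose)
{
	struct file_lock *fl, *tmp;

	list_for_each_entry_safe(fl, tmp, dispose, fl_block) {
		list_del_init(&fl->fl_block);
		locks_free_lock(fl);	/* releases private data outside the lock */
	}
}

A caller would declare a local LIST_HEAD(dispose), hand it to locks_delete_lock() while holding the spinlock, and call locks_dispose_list(&dispose) after unlocking.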
On Mon, 11 Aug 2014 12:47:48 -0400 Trond Myklebust <trond.myklebust@primarydata.com> wrote: > On Mon, Aug 11, 2014 at 11:35 AM, Jeff Layton > <jeff.layton@primarydata.com> wrote: > > On Mon, 11 Aug 2014 08:09:24 -0700 > > Christoph Hellwig <hch@infradead.org> wrote: > > > >> > >> I managed to hit a similar but different issue with your patch applied (note > >> the slab poisoning pattern in rax): > >> > >> generic/089 409s ...[ 399.057379] general protection fault: 0000 [#1] > >> SMP > >> [ 399.059137] Modules linked in: > >> [ 399.060089] CPU: 2 PID: 4367 Comm: kworker/2:2 Not tainted 3.16.0-rc6+ #1153 > >> [ 399.060617] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 > >> [ 399.060617] Workqueue: nfsiod free_lock_state_work > >> [ 399.060617] task: ffff88007ab68810 ti: ffff88007c3b4000 task.ti: ffff88007c3b4000 > >> [ 399.060617] RIP: 0010:[<ffffffff8134e5bb>] [<ffffffff8134e5bb>] nfs41_free_lock_state+0x2b/0x70 > >> [ 399.060617] RSP: 0018:ffff88007c3b7d18 EFLAGS: 00010286 > >> [ 399.060617] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88007cdd3800 RCX: 0000000000000000 > >> [ 399.060617] RDX: ffffffff81e04c60 RSI: ffff88007cdd39a0 RDI: ffff880079e5a000 > >> [ 399.060617] RBP: ffff88007c3b7d38 R08: ffffffff832df6d0 R09: 000001c90f100000 > >> [ 399.060617] R10: 0000000000000000 R11: 00000000000656f0 R12: ffff880079e5a000 > >> [ 399.060617] R13: ffff88007fd18b00 R14: ffff88007cdd39c0 R15: 0000000000000000 > >> [ 399.060617] FS: 0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000 > >> [ 399.060617] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > >> [ 399.060617] CR2: 00007f5ac2f56800 CR3: 000000007a95b000 CR4: 00000000000006e0 > >> [ 399.060617] Stack: > >> [ 399.060617] 000000007fd13240 ffff88007a8f7800 ffff88007fd13240 ffff88007fd18b00 > >> [ 399.060617] ffff88007c3b7d58 ffffffff813621ae ffff88007cdd39c0 ffff88007a4d0c40 > >> [ 399.060617] ffff88007c3b7dd8 ffffffff810cc877 ffffffff810cc80d ffff88007fd13258 > >> [ 399.060617] Call Trace: > >> [ 399.060617] [<ffffffff813621ae>] free_lock_state_work+0x2e/0x40 > >> [ 399.060617] [<ffffffff810cc877>] process_one_work+0x1c7/0x490 > >> [ 399.060617] [<ffffffff810cc80d>] ? process_one_work+0x15d/0x490 > >> [ 399.060617] [<ffffffff810cd569>] worker_thread+0x119/0x4f0 > >> [ 399.060617] [<ffffffff810fbbad>] ? trace_hardirqs_on+0xd/0x10 > >> [ 399.060617] [<ffffffff810cd450>] ? init_pwq+0x190/0x190 > >> [ 399.060617] [<ffffffff810d3c6f>] kthread+0xdf/0x100 > >> [ 399.060617] [<ffffffff810d3b90>] ? __init_kthread_worker+0x70/0x70 > >> [ 399.060617] [<ffffffff81d9873c>] ret_from_fork+0x7c/0xb0 > >> [ 399.060617] [<ffffffff810d3b90>] ? __init_kthread_worker+0x70/0x70 > >> > >> (gdb) l *(nfs41_free_lock_state+0x2b) > >> 0xffffffff8134e5bb is in nfs41_free_lock_state (fs/nfs/nfs4proc.c:8313). > >> 8308 nfs41_free_lock_state(struct nfs_server *server, struct > >> nfs4_lock_state *lsp) > >> 8309 { > >> 8310 struct rpc_task *task; > >> 8311 struct rpc_cred *cred = lsp->ls_state->owner->so_cred; > >> 8312 > >> 8313 task = _nfs41_free_stateid(server, &lsp->ls_stateid, > >> cred, false); > >> 8314 nfs4_free_lock_state(server, lsp); > >> 8315 if (IS_ERR(task)) > >> 8316 return; > >> 8317 rpc_put_task(task); > >> > > > > Oof -- right. Same problem, just in a different spot. So there, we need > > the openowner. We don't have a pointer directly to that, so we're > > probably best off just holding a reference to the open stateid, and > > pinning the superblock too. > > > > Maybe something like this instead? 
I'm also running xfstests on a VM > > now to see if I can reproduce this and verify the fix: > > > > ---------------------[snip]----------------------- > > > > [PATCH] nfs: pin superblock and open state when running free_lock_state operations > > > > Christoph reported seeing this oops when running xfstests on a v4.1 client: > > > > generic/089 242s ...[ 2187.041239] general protection fault: 0000 [#1] SMP > > [ 2187.042899] Modules linked in: > > [ 2187.044000] CPU: 0 PID: 11913 Comm: kworker/0:1 Not tainted 3.16.0-rc6+ #1151 > > [ 2187.044287] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 > > [ 2187.044287] Workqueue: nfsiod free_lock_state_work > > [ 2187.044287] task: ffff880072b50cd0 ti: ffff88007a4ec000 task.ti: ffff88007a4ec000 > > [ 2187.044287] RIP: 0010:[<ffffffff81361ca6>] [<ffffffff81361ca6>] free_lock_state_work+0x16/0x30 > > [ 2187.044287] RSP: 0018:ffff88007a4efd58 EFLAGS: 00010296 > > [ 2187.044287] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88007a947ac0 RCX: 8000000000000000 > > [ 2187.044287] RDX: ffffffff826af9e0 RSI: ffff88007b093c00 RDI: ffff88007b093db8 > > [ 2187.044287] RBP: ffff88007a4efd58 R08: ffffffff832d3e10 R09: 000001c40efc0000 > > [ 2187.044287] R10: 0000000000000000 R11: 0000000000059e30 R12: ffff88007fc13240 > > [ 2187.044287] R13: ffff88007fc18b00 R14: ffff88007b093db8 R15: 0000000000000000 > > [ 2187.044287] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 > > [ 2187.044287] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > > [ 2187.044287] CR2: 00007f93ec33fb80 CR3: 0000000079dc2000 CR4: 00000000000006f0 > > [ 2187.044287] Stack: > > [ 2187.044287] ffff88007a4efdd8 ffffffff810cc877 ffffffff810cc80d ffff88007fc13258 > > [ 2187.044287] 000000007a947af0 0000000000000000 ffffffff8353ccc8 ffffffff82b6f3d0 > > [ 2187.044287] 0000000000000000 ffffffff82267679 ffff88007a4efdd8 ffff88007fc13240 > > [ 2187.044287] Call Trace: > > [ 2187.044287] [<ffffffff810cc877>] process_one_work+0x1c7/0x490 > > [ 2187.044287] [<ffffffff810cc80d>] ? process_one_work+0x15d/0x490 > > [ 2187.044287] [<ffffffff810cd569>] worker_thread+0x119/0x4f0 > > [ 2187.044287] [<ffffffff810fbbad>] ? trace_hardirqs_on+0xd/0x10 > > [ 2187.044287] [<ffffffff810cd450>] ? init_pwq+0x190/0x190 > > [ 2187.044287] [<ffffffff810d3c6f>] kthread+0xdf/0x100 > > [ 2187.044287] [<ffffffff810d3b90>] ? __init_kthread_worker+0x70/0x70 > > [ 2187.044287] [<ffffffff81d9873c>] ret_from_fork+0x7c/0xb0 > > [ 2187.044287] [<ffffffff810d3b90>] ? __init_kthread_worker+0x70/0x70 > > [ 2187.044287] Code: 0f 1f 44 00 00 31 c0 5d c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 8d b7 48 fe ff ff 48 8b 87 58 fe ff ff 48 89 e5 48 8b 40 30 <48> 8b 00 48 8b 10 48 89 c7 48 8b 92 90 03 00 00 ff 52 28 5d c3 > > [ 2187.044287] RIP [<ffffffff81361ca6>] free_lock_state_work+0x16/0x30 > > [ 2187.044287] RSP <ffff88007a4efd58> > > [ 2187.103626] ---[ end trace 0f11326d28e5d8fa ]--- > > > > It appears that the lock state outlived the open state with which it > > was associated. > > > > Fix this by pinning down the open stateid and the superblock prior to > > queueing the workqueue job, and then releasing putting those references > > once the RPC task has been queued. 
> > > > Reported-by: Christoph Hellwig <hch@infradead.org> > > Signed-off-by: Jeff Layton <jlayton@primarydata.com> > > --- > > fs/nfs/nfs4state.c | 13 +++++++++---- > > 1 file changed, 9 insertions(+), 4 deletions(-) > > > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c > > index a770c8e469a7..fb29c7cbe8e3 100644 > > --- a/fs/nfs/nfs4state.c > > +++ b/fs/nfs/nfs4state.c > > @@ -804,11 +804,14 @@ free_lock_state_work(struct work_struct *work) > > { > > struct nfs4_lock_state *lsp = container_of(work, > > struct nfs4_lock_state, ls_release); > > - struct nfs4_state *state = lsp->ls_state; > > - struct nfs_server *server = state->owner->so_server; > > + struct nfs4_state *osp = lsp->ls_state; > > + struct super_block *sb = osp->inode->i_sb; > > + struct nfs_server *server = NFS_SB(sb); > > struct nfs_client *clp = server->nfs_client; > > > > clp->cl_mvops->free_lock_state(server, lsp); > > + nfs4_put_open_state(osp); > > + nfs_sb_deactive(sb); > > } > > > > /* > > @@ -896,9 +899,11 @@ void nfs4_put_lock_state(struct nfs4_lock_state *lsp) > > if (list_empty(&state->lock_states)) > > clear_bit(LK_STATE_IN_USE, &state->flags); > > spin_unlock(&state->state_lock); > > - if (test_bit(NFS_LOCK_INITIALIZED, &lsp->ls_flags)) > > + if (test_bit(NFS_LOCK_INITIALIZED, &lsp->ls_flags)) { > > + atomic_inc(&lsp->ls_state->count); > > + nfs_sb_active(lsp->ls_state->inode->i_sb); > > queue_work(nfsiod_workqueue, &lsp->ls_release); > > - else { > > + } else { > > server = state->owner->so_server; > > nfs4_free_lock_state(server, lsp); > > } > > -- > > 1.9.3 > > > > Can we step back a little? It looks to me as if we're working on a > whole new range of symptoms that are a consequence of a set of locking > rule changes for fl_release_private that were not well thought > through. > Prior to commit 72f98e72551fa, the locking rules for > fl_release_private stated that it _does_ allow blocking. > Nothing in commit 72f98e72551fa was done to change the code that did > block, instead it just decreed that fl_release_private no longer > allows blocking as if that magically makes thing work. > > This whole thing of doing an asynchronous call started because > locks_delete_lock() is calling lock_free_lock() instead of just > unlinking, and then deferring the actual freeing of the locks until > we've dropped the spinlocks. > > It should be trivial to change locks_delete_lock() so that after > calling locks_unlink_lock(), it adds to a private list that can then > be drained at the end of flock_lock_file(), __posix_lock_file(), and > locks_remove_file(). > The lease code looks like it may need to be special cased > (lease_modify()), but that can just keep doing what it is currently > doing until someone fixes it. > > Pretty please.... > Ok. It's not quite that simple but it should be doable... locks_release_private is also called by locks_copy_lock, and I don't see a good way to defer that call. We might be able to just get rid of it though, and just require a "pristine" lock be passed to locks_copy_lock. It looks like that's already done in most cases anyway (maybe all cases). I've already been making some motions toward doing a lock pushdown in the lease code anyway, so this may just give me a real reason to finish that up...
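As an illustration of the "pristine lock" idea mentioned above, locks_copy_lock() could drop its locks_release_private() call and simply assert that the destination carries no private state. The assertion style and the exact set of fields copied are assumptions made for this example, not the actual change:

/* Hypothetical sketch: callers must pass a freshly allocated lock, so
 * there is nothing to release before overwriting the destination. */
void locks_copy_lock(struct file_lock *new, struct file_lock *fl)
{
	WARN_ON_ONCE(new->fl_ops != NULL);	/* pristine: no fs private data */
	WARN_ON_ONCE(new->fl_lmops != NULL);	/* pristine: no lock-manager data */

	new->fl_owner = fl->fl_owner;
	new->fl_pid = fl->fl_pid;
	new->fl_file = fl->fl_file;
	new->fl_flags = fl->fl_flags;
	new->fl_type = fl->fl_type;
	new->fl_start = fl->fl_start;
	new->fl_end = fl->fl_end;
	new->fl_ops = fl->fl_ops;
	new->fl_lmops = fl->fl_lmops;
	if (fl->fl_ops && fl->fl_ops->fl_copy_lock)
		fl->fl_ops->fl_copy_lock(new, fl);
}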
> the openowner. We don't have a pointer directly to that, so we're > probably best off just holding a reference to the open stateid, and > pinning the superblock too. > > Maybe something like this instead? I'm also running xfstests on a VM > now to see if I can reproduce this and verify the fix: This version has survived several xfstests runs so far, so it seems good. I'll keep running it a bit longer.
On Mon, Aug 11, 2014 at 1:35 PM, Jeff Layton <jeff.layton@primarydata.com> wrote: > On Mon, 11 Aug 2014 12:47:48 -0400 > Trond Myklebust <trond.myklebust@primarydata.com> wrote: > >> On Mon, Aug 11, 2014 at 11:35 AM, Jeff Layton >> <jeff.layton@primarydata.com> wrote: >> > On Mon, 11 Aug 2014 08:09:24 -0700 >> > Christoph Hellwig <hch@infradead.org> wrote: >> > >> >> >> >> I managed to hit a similar but different issue with your patch applied (note >> >> the slab poisoning pattern in rax): >> >> >> >> generic/089 409s ...[ 399.057379] general protection fault: 0000 [#1] >> >> SMP >> >> [ 399.059137] Modules linked in: >> >> [ 399.060089] CPU: 2 PID: 4367 Comm: kworker/2:2 Not tainted 3.16.0-rc6+ #1153 >> >> [ 399.060617] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 >> >> [ 399.060617] Workqueue: nfsiod free_lock_state_work >> >> [ 399.060617] task: ffff88007ab68810 ti: ffff88007c3b4000 task.ti: ffff88007c3b4000 >> >> [ 399.060617] RIP: 0010:[<ffffffff8134e5bb>] [<ffffffff8134e5bb>] nfs41_free_lock_state+0x2b/0x70 >> >> [ 399.060617] RSP: 0018:ffff88007c3b7d18 EFLAGS: 00010286 >> >> [ 399.060617] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88007cdd3800 RCX: 0000000000000000 >> >> [ 399.060617] RDX: ffffffff81e04c60 RSI: ffff88007cdd39a0 RDI: ffff880079e5a000 >> >> [ 399.060617] RBP: ffff88007c3b7d38 R08: ffffffff832df6d0 R09: 000001c90f100000 >> >> [ 399.060617] R10: 0000000000000000 R11: 00000000000656f0 R12: ffff880079e5a000 >> >> [ 399.060617] R13: ffff88007fd18b00 R14: ffff88007cdd39c0 R15: 0000000000000000 >> >> [ 399.060617] FS: 0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000 >> >> [ 399.060617] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >> >> [ 399.060617] CR2: 00007f5ac2f56800 CR3: 000000007a95b000 CR4: 00000000000006e0 >> >> [ 399.060617] Stack: >> >> [ 399.060617] 000000007fd13240 ffff88007a8f7800 ffff88007fd13240 ffff88007fd18b00 >> >> [ 399.060617] ffff88007c3b7d58 ffffffff813621ae ffff88007cdd39c0 ffff88007a4d0c40 >> >> [ 399.060617] ffff88007c3b7dd8 ffffffff810cc877 ffffffff810cc80d ffff88007fd13258 >> >> [ 399.060617] Call Trace: >> >> [ 399.060617] [<ffffffff813621ae>] free_lock_state_work+0x2e/0x40 >> >> [ 399.060617] [<ffffffff810cc877>] process_one_work+0x1c7/0x490 >> >> [ 399.060617] [<ffffffff810cc80d>] ? process_one_work+0x15d/0x490 >> >> [ 399.060617] [<ffffffff810cd569>] worker_thread+0x119/0x4f0 >> >> [ 399.060617] [<ffffffff810fbbad>] ? trace_hardirqs_on+0xd/0x10 >> >> [ 399.060617] [<ffffffff810cd450>] ? init_pwq+0x190/0x190 >> >> [ 399.060617] [<ffffffff810d3c6f>] kthread+0xdf/0x100 >> >> [ 399.060617] [<ffffffff810d3b90>] ? __init_kthread_worker+0x70/0x70 >> >> [ 399.060617] [<ffffffff81d9873c>] ret_from_fork+0x7c/0xb0 >> >> [ 399.060617] [<ffffffff810d3b90>] ? __init_kthread_worker+0x70/0x70 >> >> >> >> (gdb) l *(nfs41_free_lock_state+0x2b) >> >> 0xffffffff8134e5bb is in nfs41_free_lock_state (fs/nfs/nfs4proc.c:8313). >> >> 8308 nfs41_free_lock_state(struct nfs_server *server, struct >> >> nfs4_lock_state *lsp) >> >> 8309 { >> >> 8310 struct rpc_task *task; >> >> 8311 struct rpc_cred *cred = lsp->ls_state->owner->so_cred; >> >> 8312 >> >> 8313 task = _nfs41_free_stateid(server, &lsp->ls_stateid, >> >> cred, false); >> >> 8314 nfs4_free_lock_state(server, lsp); >> >> 8315 if (IS_ERR(task)) >> >> 8316 return; >> >> 8317 rpc_put_task(task); >> >> >> > >> > Oof -- right. Same problem, just in a different spot. So there, we need >> > the openowner. 
We don't have a pointer directly to that, so we're >> > probably best off just holding a reference to the open stateid, and >> > pinning the superblock too. >> > >> > Maybe something like this instead? I'm also running xfstests on a VM >> > now to see if I can reproduce this and verify the fix: >> > >> > ---------------------[snip]----------------------- >> > >> > [PATCH] nfs: pin superblock and open state when running free_lock_state operations >> > >> > Christoph reported seeing this oops when running xfstests on a v4.1 client: >> > >> > generic/089 242s ...[ 2187.041239] general protection fault: 0000 [#1] SMP >> > [ 2187.042899] Modules linked in: >> > [ 2187.044000] CPU: 0 PID: 11913 Comm: kworker/0:1 Not tainted 3.16.0-rc6+ #1151 >> > [ 2187.044287] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 >> > [ 2187.044287] Workqueue: nfsiod free_lock_state_work >> > [ 2187.044287] task: ffff880072b50cd0 ti: ffff88007a4ec000 task.ti: ffff88007a4ec000 >> > [ 2187.044287] RIP: 0010:[<ffffffff81361ca6>] [<ffffffff81361ca6>] free_lock_state_work+0x16/0x30 >> > [ 2187.044287] RSP: 0018:ffff88007a4efd58 EFLAGS: 00010296 >> > [ 2187.044287] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88007a947ac0 RCX: 8000000000000000 >> > [ 2187.044287] RDX: ffffffff826af9e0 RSI: ffff88007b093c00 RDI: ffff88007b093db8 >> > [ 2187.044287] RBP: ffff88007a4efd58 R08: ffffffff832d3e10 R09: 000001c40efc0000 >> > [ 2187.044287] R10: 0000000000000000 R11: 0000000000059e30 R12: ffff88007fc13240 >> > [ 2187.044287] R13: ffff88007fc18b00 R14: ffff88007b093db8 R15: 0000000000000000 >> > [ 2187.044287] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 >> > [ 2187.044287] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >> > [ 2187.044287] CR2: 00007f93ec33fb80 CR3: 0000000079dc2000 CR4: 00000000000006f0 >> > [ 2187.044287] Stack: >> > [ 2187.044287] ffff88007a4efdd8 ffffffff810cc877 ffffffff810cc80d ffff88007fc13258 >> > [ 2187.044287] 000000007a947af0 0000000000000000 ffffffff8353ccc8 ffffffff82b6f3d0 >> > [ 2187.044287] 0000000000000000 ffffffff82267679 ffff88007a4efdd8 ffff88007fc13240 >> > [ 2187.044287] Call Trace: >> > [ 2187.044287] [<ffffffff810cc877>] process_one_work+0x1c7/0x490 >> > [ 2187.044287] [<ffffffff810cc80d>] ? process_one_work+0x15d/0x490 >> > [ 2187.044287] [<ffffffff810cd569>] worker_thread+0x119/0x4f0 >> > [ 2187.044287] [<ffffffff810fbbad>] ? trace_hardirqs_on+0xd/0x10 >> > [ 2187.044287] [<ffffffff810cd450>] ? init_pwq+0x190/0x190 >> > [ 2187.044287] [<ffffffff810d3c6f>] kthread+0xdf/0x100 >> > [ 2187.044287] [<ffffffff810d3b90>] ? __init_kthread_worker+0x70/0x70 >> > [ 2187.044287] [<ffffffff81d9873c>] ret_from_fork+0x7c/0xb0 >> > [ 2187.044287] [<ffffffff810d3b90>] ? __init_kthread_worker+0x70/0x70 >> > [ 2187.044287] Code: 0f 1f 44 00 00 31 c0 5d c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 8d b7 48 fe ff ff 48 8b 87 58 fe ff ff 48 89 e5 48 8b 40 30 <48> 8b 00 48 8b 10 48 89 c7 48 8b 92 90 03 00 00 ff 52 28 5d c3 >> > [ 2187.044287] RIP [<ffffffff81361ca6>] free_lock_state_work+0x16/0x30 >> > [ 2187.044287] RSP <ffff88007a4efd58> >> > [ 2187.103626] ---[ end trace 0f11326d28e5d8fa ]--- >> > >> > It appears that the lock state outlived the open state with which it >> > was associated. >> > >> > Fix this by pinning down the open stateid and the superblock prior to >> > queueing the workqueue job, and then releasing putting those references >> > once the RPC task has been queued. 
>> > >> > Reported-by: Christoph Hellwig <hch@infradead.org> >> > Signed-off-by: Jeff Layton <jlayton@primarydata.com> >> > --- >> > fs/nfs/nfs4state.c | 13 +++++++++---- >> > 1 file changed, 9 insertions(+), 4 deletions(-) >> > >> > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c >> > index a770c8e469a7..fb29c7cbe8e3 100644 >> > --- a/fs/nfs/nfs4state.c >> > +++ b/fs/nfs/nfs4state.c >> > @@ -804,11 +804,14 @@ free_lock_state_work(struct work_struct *work) >> > { >> > struct nfs4_lock_state *lsp = container_of(work, >> > struct nfs4_lock_state, ls_release); >> > - struct nfs4_state *state = lsp->ls_state; >> > - struct nfs_server *server = state->owner->so_server; >> > + struct nfs4_state *osp = lsp->ls_state; >> > + struct super_block *sb = osp->inode->i_sb; >> > + struct nfs_server *server = NFS_SB(sb); >> > struct nfs_client *clp = server->nfs_client; >> > >> > clp->cl_mvops->free_lock_state(server, lsp); >> > + nfs4_put_open_state(osp); >> > + nfs_sb_deactive(sb); >> > } >> > >> > /* >> > @@ -896,9 +899,11 @@ void nfs4_put_lock_state(struct nfs4_lock_state *lsp) >> > if (list_empty(&state->lock_states)) >> > clear_bit(LK_STATE_IN_USE, &state->flags); >> > spin_unlock(&state->state_lock); >> > - if (test_bit(NFS_LOCK_INITIALIZED, &lsp->ls_flags)) >> > + if (test_bit(NFS_LOCK_INITIALIZED, &lsp->ls_flags)) { >> > + atomic_inc(&lsp->ls_state->count); >> > + nfs_sb_active(lsp->ls_state->inode->i_sb); >> > queue_work(nfsiod_workqueue, &lsp->ls_release); >> > - else { >> > + } else { >> > server = state->owner->so_server; >> > nfs4_free_lock_state(server, lsp); >> > } >> > -- >> > 1.9.3 >> > >> >> Can we step back a little? It looks to me as if we're working on a >> whole new range of symptoms that are a consequence of a set of locking >> rule changes for fl_release_private that were not well thought >> through. >> Prior to commit 72f98e72551fa, the locking rules for >> fl_release_private stated that it _does_ allow blocking. >> Nothing in commit 72f98e72551fa was done to change the code that did >> block, instead it just decreed that fl_release_private no longer >> allows blocking as if that magically makes thing work. >> >> This whole thing of doing an asynchronous call started because >> locks_delete_lock() is calling lock_free_lock() instead of just >> unlinking, and then deferring the actual freeing of the locks until >> we've dropped the spinlocks. >> >> It should be trivial to change locks_delete_lock() so that after >> calling locks_unlink_lock(), it adds to a private list that can then >> be drained at the end of flock_lock_file(), __posix_lock_file(), and >> locks_remove_file(). >> The lease code looks like it may need to be special cased >> (lease_modify()), but that can just keep doing what it is currently >> doing until someone fixes it. >> >> Pretty please.... >> > > Ok. It's not quite that simple but it should be doable... > > locks_release_private is also called by locks_copy_lock, and I don't > see a good way to defer that call. We might be able to just get rid of > it though, and just require a "pristine" lock be passed to > locks_copy_lock. It looks like that's already done in most cases anyway > (maybe all cases). It is definitely the case that all existing callers are using pristine locks. > I've already been making some motions toward doing a lock pushdown in > the lease code anyway, so this may just give me a real reason to finish > that up... 
Ack, however please note that neither NFS nor AFS support leases, and since they are the only filesystems that set fl_ops->fl_release_private, you only need to consider the lock manager fl_lmops->fl_release_private callers for now.
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index a770c8e469a7..fb29c7cbe8e3 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -804,11 +804,14 @@ free_lock_state_work(struct work_struct *work)
 {
 	struct nfs4_lock_state *lsp = container_of(work,
 					struct nfs4_lock_state, ls_release);
-	struct nfs4_state *state = lsp->ls_state;
-	struct nfs_server *server = state->owner->so_server;
+	struct nfs4_state *osp = lsp->ls_state;
+	struct super_block *sb = osp->inode->i_sb;
+	struct nfs_server *server = NFS_SB(sb);
 	struct nfs_client *clp = server->nfs_client;
 
 	clp->cl_mvops->free_lock_state(server, lsp);
+	nfs4_put_open_state(osp);
+	nfs_sb_deactive(sb);
 }
 
 /*
@@ -896,9 +899,11 @@ void nfs4_put_lock_state(struct nfs4_lock_state *lsp)
 	if (list_empty(&state->lock_states))
 		clear_bit(LK_STATE_IN_USE, &state->flags);
 	spin_unlock(&state->state_lock);
-	if (test_bit(NFS_LOCK_INITIALIZED, &lsp->ls_flags))
+	if (test_bit(NFS_LOCK_INITIALIZED, &lsp->ls_flags)) {
+		atomic_inc(&lsp->ls_state->count);
+		nfs_sb_active(lsp->ls_state->inode->i_sb);
 		queue_work(nfsiod_workqueue, &lsp->ls_release);
-	else {
+	} else {
 		server = state->owner->so_server;
 		nfs4_free_lock_state(server, lsp);
 	}