Message ID | 1387299103-2577-1-git-send-email-dros@netapp.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Resending to Trond’s correct email address. -dros On Dec 17, 2013, at 11:51 AM, Weston Andros Adamson <dros@netapp.com> wrote: > When a task enters call_refreshresult with status 0 from call_refresh and > !rpcauth_uptodatecred(task) it enters call_refresh again with no rate-limiting > or max number of retries. > > Instead of trying forever, make use of the retry path that other errors use. > > This only seems to be possible when the crrefresh callback is gss_refresh_null, > which only happens when destroying the context. > > To reproduce: > > 1) mount with sec=krb5 (or sec=sys with krb5 negotiated for non FSID specific > operations). > > 2) reboot - the client will be stuck and will need to be hard rebooted > > BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:2:46] > Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache ppdev crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd serio_raw i2c_piix4 i2c_core e1000 parport_pc parport shpchp nfsd auth_rpcgss oid_registry exportfs nfs_acl lockd sunrpc autofs4 mptspi scsi_transport_spi mptscsih mptbase ata_generic floppy > irq event stamp: 195724 > hardirqs last enabled at (195723): [<ffffffff814a925c>] restore_args+0x0/0x30 > hardirqs last disabled at (195724): [<ffffffff814b0a6a>] apic_timer_interrupt+0x6a/0x80 > softirqs last enabled at (195722): [<ffffffff8103f583>] __do_softirq+0x1df/0x276 > softirqs last disabled at (195717): [<ffffffff8103f852>] irq_exit+0x53/0x9a > CPU: 0 PID: 46 Comm: kworker/0:2 Not tainted 3.13.0-rc3-branch-dros_testing+ #4 > Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 > Workqueue: rpciod rpc_async_schedule [sunrpc] > task: ffff8800799c4260 ti: ffff880079002000 task.ti: ffff880079002000 > RIP: 0010:[<ffffffffa0064fd4>] [<ffffffffa0064fd4>] __rpc_execute+0x8a/0x362 [sunrpc] > RSP: 0018:ffff880079003d18 EFLAGS: 00000246 > RAX: 0000000000000005 RBX: 0000000000000007 RCX: 0000000000000007 > RDX: 0000000000000007 RSI: ffff88007aecbae8 RDI: ffff8800783d8900 > RBP: ffff880079003d78 R08: ffff88006e30e9f8 R09: ffffffffa005a3d7 > R10: ffff88006e30e7b0 R11: ffff8800783d8900 R12: ffffffffa006675e > R13: ffff880079003ce8 R14: ffff88006e30e7b0 R15: ffff8800783d8900 > FS: 0000000000000000(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007f3072333000 CR3: 0000000001a0b000 CR4: 00000000001407f0 > Stack: > ffff880079003d98 0000000000000246 0000000000000000 ffff88007a9a4830 > ffff880000000000 ffffffff81073f47 ffff88007f212b00 ffff8800799c4260 > ffff8800783d8988 ffff88007f212b00 ffffe8ffff604800 0000000000000000 > Call Trace: > [<ffffffff81073f47>] ? trace_hardirqs_on_caller+0x145/0x1a1 > [<ffffffffa00652d3>] rpc_async_schedule+0x27/0x32 [sunrpc] > [<ffffffff81052974>] process_one_work+0x211/0x3a5 > [<ffffffff810528d5>] ? process_one_work+0x172/0x3a5 > [<ffffffff81052eeb>] worker_thread+0x134/0x202 > [<ffffffff81052db7>] ? rescuer_thread+0x280/0x280 > [<ffffffff81052db7>] ? rescuer_thread+0x280/0x280 > [<ffffffff810584a0>] kthread+0xc9/0xd1 > [<ffffffff810583d7>] ? __kthread_parkme+0x61/0x61 > [<ffffffff814afd6c>] ret_from_fork+0x7c/0xb0 > [<ffffffff810583d7>] ? __kthread_parkme+0x61/0x61 > Code: e8 87 63 fd e0 c6 05 10 dd 01 00 01 48 8b 43 70 4c 8d 6b 70 45 31 e4 a8 02 0f 85 d5 02 00 00 4c 8b 7b 48 48 c7 43 48 00 00 00 00 <4c> 8b 4b 50 4d 85 ff 75 0c 4d 85 c9 4d 89 cf 0f 84 32 01 00 00 > > And the output of "rpcdebug -m rpc -s all": > > RPC: 61 call_refresh (status 0) > RPC: 61 call_refresh (status 0) > RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 > RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 > RPC: 61 call_refreshresult (status 0) > RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 > RPC: 61 call_refreshresult (status 0) > RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 > RPC: 61 call_refresh (status 0) > RPC: 61 call_refreshresult (status 0) > RPC: 61 call_refresh (status 0) > RPC: 61 call_refresh (status 0) > RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 > RPC: 61 call_refreshresult (status 0) > RPC: 61 call_refresh (status 0) > RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 > RPC: 61 call_refresh (status 0) > RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 > RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 > RPC: 61 call_refreshresult (status 0) > RPC: 61 call_refresh (status 0) > RPC: 61 call_refresh (status 0) > RPC: 61 call_refresh (status 0) > RPC: 61 call_refresh (status 0) > RPC: 61 call_refreshresult (status 0) > RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 > Signed-off-by: Weston Andros Adamson <dros@netapp.com> > --- > net/sunrpc/clnt.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c > index f09b7db..5f585a7 100644 > --- a/net/sunrpc/clnt.c > +++ b/net/sunrpc/clnt.c > @@ -1529,9 +1529,14 @@ call_refreshresult(struct rpc_task *task) > task->tk_action = call_refresh; > switch (status) { > case 0: > - if (rpcauth_uptodatecred(task)) > + if (rpcauth_uptodatecred(task)) { > task->tk_action = call_allocate; > - return; > + return; > + } > + /* > + * Use rate-limiting and a max number of retries if refresh > + * had status 0 but failed to update the cred. > + */ > case -ETIMEDOUT: > rpc_delay(task, 3*HZ); > case -EAGAIN: > -- > 1.8.3.1 (Apple Git-46) > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index f09b7db..5f585a7 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -1529,9 +1529,14 @@ call_refreshresult(struct rpc_task *task) task->tk_action = call_refresh; switch (status) { case 0: - if (rpcauth_uptodatecred(task)) + if (rpcauth_uptodatecred(task)) { task->tk_action = call_allocate; - return; + return; + } + /* + * Use rate-limiting and a max number of retries if refresh + * had status 0 but failed to update the cred. + */ case -ETIMEDOUT: rpc_delay(task, 3*HZ); case -EAGAIN:
When a task enters call_refreshresult with status 0 from call_refresh and !rpcauth_uptodatecred(task) it enters call_refresh again with no rate-limiting or max number of retries. Instead of trying forever, make use of the retry path that other errors use. This only seems to be possible when the crrefresh callback is gss_refresh_null, which only happens when destroying the context. To reproduce: 1) mount with sec=krb5 (or sec=sys with krb5 negotiated for non FSID specific operations). 2) reboot - the client will be stuck and will need to be hard rebooted BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:2:46] Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache ppdev crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd serio_raw i2c_piix4 i2c_core e1000 parport_pc parport shpchp nfsd auth_rpcgss oid_registry exportfs nfs_acl lockd sunrpc autofs4 mptspi scsi_transport_spi mptscsih mptbase ata_generic floppy irq event stamp: 195724 hardirqs last enabled at (195723): [<ffffffff814a925c>] restore_args+0x0/0x30 hardirqs last disabled at (195724): [<ffffffff814b0a6a>] apic_timer_interrupt+0x6a/0x80 softirqs last enabled at (195722): [<ffffffff8103f583>] __do_softirq+0x1df/0x276 softirqs last disabled at (195717): [<ffffffff8103f852>] irq_exit+0x53/0x9a CPU: 0 PID: 46 Comm: kworker/0:2 Not tainted 3.13.0-rc3-branch-dros_testing+ #4 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 Workqueue: rpciod rpc_async_schedule [sunrpc] task: ffff8800799c4260 ti: ffff880079002000 task.ti: ffff880079002000 RIP: 0010:[<ffffffffa0064fd4>] [<ffffffffa0064fd4>] __rpc_execute+0x8a/0x362 [sunrpc] RSP: 0018:ffff880079003d18 EFLAGS: 00000246 RAX: 0000000000000005 RBX: 0000000000000007 RCX: 0000000000000007 RDX: 0000000000000007 RSI: ffff88007aecbae8 RDI: ffff8800783d8900 RBP: ffff880079003d78 R08: ffff88006e30e9f8 R09: ffffffffa005a3d7 R10: ffff88006e30e7b0 R11: ffff8800783d8900 R12: ffffffffa006675e R13: ffff880079003ce8 R14: ffff88006e30e7b0 R15: ffff8800783d8900 FS: 0000000000000000(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3072333000 CR3: 0000000001a0b000 CR4: 00000000001407f0 Stack: ffff880079003d98 0000000000000246 0000000000000000 ffff88007a9a4830 ffff880000000000 ffffffff81073f47 ffff88007f212b00 ffff8800799c4260 ffff8800783d8988 ffff88007f212b00 ffffe8ffff604800 0000000000000000 Call Trace: [<ffffffff81073f47>] ? trace_hardirqs_on_caller+0x145/0x1a1 [<ffffffffa00652d3>] rpc_async_schedule+0x27/0x32 [sunrpc] [<ffffffff81052974>] process_one_work+0x211/0x3a5 [<ffffffff810528d5>] ? process_one_work+0x172/0x3a5 [<ffffffff81052eeb>] worker_thread+0x134/0x202 [<ffffffff81052db7>] ? rescuer_thread+0x280/0x280 [<ffffffff81052db7>] ? rescuer_thread+0x280/0x280 [<ffffffff810584a0>] kthread+0xc9/0xd1 [<ffffffff810583d7>] ? __kthread_parkme+0x61/0x61 [<ffffffff814afd6c>] ret_from_fork+0x7c/0xb0 [<ffffffff810583d7>] ? __kthread_parkme+0x61/0x61 Code: e8 87 63 fd e0 c6 05 10 dd 01 00 01 48 8b 43 70 4c 8d 6b 70 45 31 e4 a8 02 0f 85 d5 02 00 00 4c 8b 7b 48 48 c7 43 48 00 00 00 00 <4c> 8b 4b 50 4d 85 ff 75 0c 4d 85 c9 4d 89 cf 0f 84 32 01 00 00 And the output of "rpcdebug -m rpc -s all": RPC: 61 call_refresh (status 0) RPC: 61 call_refresh (status 0) RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 call_refreshresult (status 0) RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 call_refreshresult (status 0) RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 call_refresh (status 0) RPC: 61 call_refreshresult (status 0) RPC: 61 call_refresh (status 0) RPC: 61 call_refresh (status 0) RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 call_refreshresult (status 0) RPC: 61 call_refresh (status 0) RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 call_refresh (status 0) RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 call_refreshresult (status 0) RPC: 61 call_refresh (status 0) RPC: 61 call_refresh (status 0) RPC: 61 call_refresh (status 0) RPC: 61 call_refresh (status 0) RPC: 61 call_refreshresult (status 0) RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 Signed-off-by: Weston Andros Adamson <dros@netapp.com> --- net/sunrpc/clnt.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)