Message ID | 1309995932.5447.6.camel@lade.trondhjem.org (mailing list archive) |
---|---|
State | New, archived |
On 07/06/2011 04:45 PM, Trond Myklebust wrote:
> On Wed, 2011-07-06 at 15:49 -0700, greearb@candelatech.com wrote:
>> From: Ben Greear <greearb@candelatech.com>
>>
>> The rpc_killall_tasks logic is not locked against
>> the work-queue thread, but it still directly modifies
>> function pointers and data in the task objects.
>>
>> This patch changes the killall-tasks logic to set a flag
>> that tells the work-queue thread to terminate the task
>> instead of directly calling the terminate logic.
>>
>> Signed-off-by: Ben Greear <greearb@candelatech.com>
>> ---
>>
>> NOTE: This needs review, as I am still struggling to understand
>> the rpc code, and it's quite possible this patch either doesn't
>> fully fix the problem or actually causes other issues. That said,
>> my nfs stress test seems to run a bit more stable with this patch applied.
>
> Yes, but I don't see why you are adding a new flag, nor do I see why we
> want to keep checking for that flag in the rpc_execute() loop.
> rpc_killall_tasks() is not a frequent operation that we want to optimise
> for.

I was hoping that if the killall logic never set anything that was also
set by the work-queue thread it would be lock-safe without needing
explicit locking.

I was a bit concerned that my flags |= KILLME logic would potentially
over-write flags that were being simultaneously written elsewhere
(so maybe I'd have to add a completely new variable for that KILLME flag
to really be safe.)

> How about the following instead?

I think it still races... more comments below.

> 8<----------------------------------------------------------------------------------
> From ecb7244b661c3f9d2008ef6048733e5cea2f98ab Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
> Date: Wed, 6 Jul 2011 19:44:52 -0400
> Subject: [PATCH] SUNRPC: Fix a race between work-queue and rpc_killall_tasks
>
> Since rpc_killall_tasks may modify the rpc_task's tk_action field
> without any locking, we need to be careful when dereferencing it.

> +		do_action = task->tk_callback;
> +		task->tk_callback = NULL;
> +		if (do_action == NULL) {

I think the race still exists, though it would be harder to hit.
What if the killall logic sets task->tk_callback right after you assign
do_action, but before you set tk_callback to NULL? Or after you set
tk_callback to NULL, for that matter.

> 			/*
> 			 * Perform the next FSM step.
> -			 * tk_action may be NULL when the task has been killed
> -			 * by someone else.
> +			 * tk_action may be NULL if the task has been killed.
> +			 * In particular, note that rpc_killall_tasks may
> +			 * do this at any time, so beware when dereferencing.
> 			 */
> -			if (task->tk_action == NULL)
> +			do_action = task->tk_action;
> +			if (do_action == NULL)
> 				break;
> -			task->tk_action(task);
> 		}
> +		do_action(task);
>
> 		/*
> 		 * Lockless check for whether task is sleeping or not.

Thanks,
Ben
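For readers following along: the core idea in Trond's patch is to read the shared function pointer into a local exactly once, then only ever dereference the local. The sketch below is a simplified userspace model (our construction, not SUNRPC code) of why that removes the NULL-dereference window, even when a killall-style writer fires at the worst possible moment:

```c
/*
 * Minimal model of the snapshot-then-call pattern. The names mirror
 * the SUNRPC fields for readability, but this is an illustration,
 * not the kernel code. In real kernel code the single load would
 * ideally be made explicit (e.g. with ACCESS_ONCE()).
 */
#include <stdio.h>
#include <stddef.h>

typedef void (*action_fn)(void);

static void step(void) { puts("FSM step"); }

/* Shared field, written without locking by another thread. */
static action_fn tk_action = step;

/* Stands in for rpc_killall_tasks() running concurrently. */
static void killall(void) { tk_action = NULL; }

int main(void)
{
    /* Racy shape: "if (tk_action) ... tk_action();" reloads the
     * field after the check and can dereference NULL if killall()
     * runs in between. */

    /* Safe shape used by the patch: snapshot once, use the copy. */
    action_fn do_action = tk_action;
    killall();                 /* writer fires inside the window */
    if (do_action != NULL)
        do_action();           /* still valid: we call the snapshot */
    return 0;
}
```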
On Wed, 2011-07-06 at 17:07 -0700, Ben Greear wrote:
> On 07/06/2011 04:45 PM, Trond Myklebust wrote:
>> On Wed, 2011-07-06 at 15:49 -0700, greearb@candelatech.com wrote:
>>> From: Ben Greear <greearb@candelatech.com>
>>>
>>> The rpc_killall_tasks logic is not locked against
>>> the work-queue thread, but it still directly modifies
>>> function pointers and data in the task objects.
>>>
>>> This patch changes the killall-tasks logic to set a flag
>>> that tells the work-queue thread to terminate the task
>>> instead of directly calling the terminate logic.
>>>
>>> Signed-off-by: Ben Greear <greearb@candelatech.com>
>>> ---
>>>
>>> NOTE: This needs review, as I am still struggling to understand
>>> the rpc code, and it's quite possible this patch either doesn't
>>> fully fix the problem or actually causes other issues. That said,
>>> my nfs stress test seems to run a bit more stable with this patch applied.
>>
>> Yes, but I don't see why you are adding a new flag, nor do I see why we
>> want to keep checking for that flag in the rpc_execute() loop.
>> rpc_killall_tasks() is not a frequent operation that we want to optimise
>> for.
>
> I was hoping that if the killall logic never set anything that was also
> set by the work-queue thread it would be lock-safe without needing
> explicit locking.
>
> I was a bit concerned that my flags |= KILLME logic would potentially
> over-write flags that were being simultaneously written elsewhere
> (so maybe I'd have to add a completely new variable for that KILLME flag
> to really be safe.)
>
>> How about the following instead?
>
> I think it still races... more comments below.
>
>> 8<----------------------------------------------------------------------------------
>> From ecb7244b661c3f9d2008ef6048733e5cea2f98ab Mon Sep 17 00:00:00 2001
>> From: Trond Myklebust <Trond.Myklebust@netapp.com>
>> Date: Wed, 6 Jul 2011 19:44:52 -0400
>> Subject: [PATCH] SUNRPC: Fix a race between work-queue and rpc_killall_tasks
>>
>> Since rpc_killall_tasks may modify the rpc_task's tk_action field
>> without any locking, we need to be careful when dereferencing it.
>
>> +		do_action = task->tk_callback;
>> +		task->tk_callback = NULL;
>> +		if (do_action == NULL) {
>
> I think the race still exists, though it would be harder to hit.
> What if the killall logic sets task->tk_callback right after you assign
> do_action, but before you set tk_callback to NULL? Or after you set
> tk_callback to NULL, for that matter.

What if it does? The rpc call will continue to execute until it
completes.

rpc_killall_tasks is really only useful for signalling to tasks that are
hanging on a completely unresponsive server that we want them to stop.
The only case where we really care is in rpc_shutdown_client(), where we
sleep and loop anyway.

IOW: I really don't care about 'fixing' rpc_killall_tasks to perfection.
All I care about is that it doesn't Oops.
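For context, the sleep-and-loop Trond refers to is the shutdown path in rpc_shutdown_client(). The fragment below is paraphrased from memory of that era's net/sunrpc/clnt.c, so treat the exact identifiers (destroy_wait, cl_tasks) as approximate rather than a verbatim quote; the point is that the caller keeps re-signalling until the task list drains, which is why a lost kill is tolerable:

```c
/* Roughly the rpc_shutdown_client() pattern Trond describes:
 * signal all tasks, sleep briefly, and re-signal until every
 * task on the client has drained. A kill that races and gets
 * lost is simply re-issued on the next pass. */
while (!list_empty(&clnt->cl_tasks)) {
	rpc_killall_tasks(clnt);
	wait_event_timeout(destroy_wait,
			   list_empty(&clnt->cl_tasks), 1*HZ);
}
```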
On 07/06/2011 05:17 PM, Trond Myklebust wrote:
> On Wed, 2011-07-06 at 17:07 -0700, Ben Greear wrote:
>> On 07/06/2011 04:45 PM, Trond Myklebust wrote:
>>> On Wed, 2011-07-06 at 15:49 -0700, greearb@candelatech.com wrote:
>>>> From: Ben Greear <greearb@candelatech.com>
>>>>
>>>> The rpc_killall_tasks logic is not locked against
>>>> the work-queue thread, but it still directly modifies
>>>> function pointers and data in the task objects.
>>>>
>>>> This patch changes the killall-tasks logic to set a flag
>>>> that tells the work-queue thread to terminate the task
>>>> instead of directly calling the terminate logic.
>>>>
>>>> Signed-off-by: Ben Greear <greearb@candelatech.com>
>>>> ---
>>>>
>>>> NOTE: This needs review, as I am still struggling to understand
>>>> the rpc code, and it's quite possible this patch either doesn't
>>>> fully fix the problem or actually causes other issues. That said,
>>>> my nfs stress test seems to run a bit more stable with this patch applied.
>>>
>>> Yes, but I don't see why you are adding a new flag, nor do I see why we
>>> want to keep checking for that flag in the rpc_execute() loop.
>>> rpc_killall_tasks() is not a frequent operation that we want to optimise
>>> for.
>>
>> I was hoping that if the killall logic never set anything that was also
>> set by the work-queue thread it would be lock-safe without needing
>> explicit locking.
>>
>> I was a bit concerned that my flags |= KILLME logic would potentially
>> over-write flags that were being simultaneously written elsewhere
>> (so maybe I'd have to add a completely new variable for that KILLME flag
>> to really be safe.)
>>
>>> How about the following instead?
>>
>> I think it still races... more comments below.
>>
>>> 8<----------------------------------------------------------------------------------
>>> From ecb7244b661c3f9d2008ef6048733e5cea2f98ab Mon Sep 17 00:00:00 2001
>>> From: Trond Myklebust <Trond.Myklebust@netapp.com>
>>> Date: Wed, 6 Jul 2011 19:44:52 -0400
>>> Subject: [PATCH] SUNRPC: Fix a race between work-queue and rpc_killall_tasks
>>>
>>> Since rpc_killall_tasks may modify the rpc_task's tk_action field
>>> without any locking, we need to be careful when dereferencing it.
>>
>>> +		do_action = task->tk_callback;
>>> +		task->tk_callback = NULL;
>>> +		if (do_action == NULL) {
>>
>> I think the race still exists, though it would be harder to hit.
>> What if the killall logic sets task->tk_callback right after you assign
>> do_action, but before you set tk_callback to NULL? Or after you set
>> tk_callback to NULL, for that matter.
>
> What if it does? The rpc call will continue to execute until it
> completes.
>
> rpc_killall_tasks is really only useful for signalling to tasks that are
> hanging on a completely unresponsive server that we want them to stop.
> The only case where we really care is in rpc_shutdown_client(), where we
> sleep and loop anyway.
>
> IOW: I really don't care about 'fixing' rpc_killall_tasks to perfection.
> All I care about is that it doesn't Oops.

That is my concern as well. I'll try your patch and see if it fixes
the crashes I'm seeing in this area.

Thanks,
Ben
On 07/06/2011 04:45 PM, Trond Myklebust wrote:
> On Wed, 2011-07-06 at 15:49 -0700, greearb@candelatech.com wrote:
>> From: Ben Greear <greearb@candelatech.com>
>>
>> The rpc_killall_tasks logic is not locked against
>> the work-queue thread, but it still directly modifies
>> function pointers and data in the task objects.
>>
>> This patch changes the killall-tasks logic to set a flag
>> that tells the work-queue thread to terminate the task
>> instead of directly calling the terminate logic.
>>
>> Signed-off-by: Ben Greear <greearb@candelatech.com>
>> ---
>>
>> NOTE: This needs review, as I am still struggling to understand
>> the rpc code, and it's quite possible this patch either doesn't
>> fully fix the problem or actually causes other issues. That said,
>> my nfs stress test seems to run a bit more stable with this patch applied.
>
> Yes, but I don't see why you are adding a new flag, nor do I see why we
> want to keep checking for that flag in the rpc_execute() loop.
> rpc_killall_tasks() is not a frequent operation that we want to optimise
> for.
>
> How about the following instead?
>
> 8<----------------------------------------------------------------------------------
> From ecb7244b661c3f9d2008ef6048733e5cea2f98ab Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
> Date: Wed, 6 Jul 2011 19:44:52 -0400
> Subject: [PATCH] SUNRPC: Fix a race between work-queue and rpc_killall_tasks
>
> Since rpc_killall_tasks may modify the rpc_task's tk_action field
> without any locking, we need to be careful when dereferencing it.
>
> Reported-by: Ben Greear <greearb@candelatech.com>
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

I've been testing this for 4+ hours, and it seems to fix the problem.
We'll continue to burn on it for a day or two just in case we're
getting (un)lucky in our testing.

Thanks,
Ben

> ---
>  net/sunrpc/sched.c |   27 +++++++++++----------------
>  1 files changed, 11 insertions(+), 16 deletions(-)
>
> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> index a27406b..4814e24 100644
> --- a/net/sunrpc/sched.c
> +++ b/net/sunrpc/sched.c
> @@ -616,30 +616,25 @@ static void __rpc_execute(struct rpc_task *task)
>  	BUG_ON(RPC_IS_QUEUED(task));
>
>  	for (;;) {
> +		void (*do_action)(struct rpc_task *);
>
>  		/*
> -		 * Execute any pending callback.
> +		 * Execute any pending callback first.
>  		 */
> -		if (task->tk_callback) {
> -			void (*save_callback)(struct rpc_task *);
> -
> -			/*
> -			 * We set tk_callback to NULL before calling it,
> -			 * in case it sets the tk_callback field itself:
> -			 */
> -			save_callback = task->tk_callback;
> -			task->tk_callback = NULL;
> -			save_callback(task);
> -		} else {
> +		do_action = task->tk_callback;
> +		task->tk_callback = NULL;
> +		if (do_action == NULL) {
>  			/*
>  			 * Perform the next FSM step.
> -			 * tk_action may be NULL when the task has been killed
> -			 * by someone else.
> +			 * tk_action may be NULL if the task has been killed.
> +			 * In particular, note that rpc_killall_tasks may
> +			 * do this at any time, so beware when dereferencing.
>  			 */
> -			if (task->tk_action == NULL)
> +			do_action = task->tk_action;
> +			if (do_action == NULL)
>  				break;
> -			task->tk_action(task);
>  		}
> +		do_action(task);
>
>  		/*
>  		 * Lockless check for whether task is sleeping or not.
On 07/07/2011 01:38 PM, Ben Greear wrote:
> On 07/06/2011 04:45 PM, Trond Myklebust wrote:
>> On Wed, 2011-07-06 at 15:49 -0700, greearb@candelatech.com wrote:
>>> From: Ben Greear <greearb@candelatech.com>
>>>
>>> The rpc_killall_tasks logic is not locked against
>>> the work-queue thread, but it still directly modifies
>>> function pointers and data in the task objects.
>>>
>>> This patch changes the killall-tasks logic to set a flag
>>> that tells the work-queue thread to terminate the task
>>> instead of directly calling the terminate logic.
>>>
>>> Signed-off-by: Ben Greear <greearb@candelatech.com>
>>> ---
>>>
>>> NOTE: This needs review, as I am still struggling to understand
>>> the rpc code, and it's quite possible this patch either doesn't
>>> fully fix the problem or actually causes other issues. That said,
>>> my nfs stress test seems to run a bit more stable with this patch applied.
>>
>> Yes, but I don't see why you are adding a new flag, nor do I see why we
>> want to keep checking for that flag in the rpc_execute() loop.
>> rpc_killall_tasks() is not a frequent operation that we want to optimise
>> for.
>>
>> How about the following instead?
>>
>> 8<----------------------------------------------------------------------------------
>> From ecb7244b661c3f9d2008ef6048733e5cea2f98ab Mon Sep 17 00:00:00 2001
>> From: Trond Myklebust <Trond.Myklebust@netapp.com>
>> Date: Wed, 6 Jul 2011 19:44:52 -0400
>> Subject: [PATCH] SUNRPC: Fix a race between work-queue and rpc_killall_tasks
>>
>> Since rpc_killall_tasks may modify the rpc_task's tk_action field
>> without any locking, we need to be careful when dereferencing it.
>>
>> Reported-by: Ben Greear <greearb@candelatech.com>
>> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
>
> I've been testing this for 4+ hours, and it seems to fix the problem. We'll
> continue to burn on it for a day or two just in case we're
> getting (un)lucky in our testing.
>
> Thanks,
> Ben

Well, we still hit the bug in over-night testing. Maybe there are
other races...
=============================================================================
BUG kmalloc-64: Object is on free-list
-----------------------------------------------------------------------------

INFO: Allocated in rpcb_getport_async+0x39c/0x5a5 [sunrpc] age=52 cpu=6 pid=18908
	__slab_alloc+0x348/0x3ba
	kmem_cache_alloc_trace+0x67/0xe7
	rpcb_getport_async+0x39c/0x5a5 [sunrpc]
	call_bind+0x70/0x75 [sunrpc]
	__rpc_execute+0x80/0x253 [sunrpc]
	rpc_execute+0x3d/0x42 [sunrpc]
	rpc_run_task+0x79/0x81 [sunrpc]
	rpc_call_sync+0x3f/0x60 [sunrpc]
	rpc_ping+0x42/0x58 [sunrpc]
	rpc_create+0x4aa/0x527 [sunrpc]
	nfs_create_rpc_client+0xb1/0xf6 [nfs]
	nfs_init_client+0x3b/0x7d [nfs]
	nfs_get_client+0x453/0x5ab [nfs]
	nfs_create_server+0x10b/0x437 [nfs]
	nfs_fs_mount+0x4ca/0x708 [nfs]
	mount_fs+0x6b/0x152
INFO: Freed in rpcb_map_release+0x3f/0x44 [sunrpc] age=119 cpu=5 pid=13934
	__slab_free+0x57/0x150
	kfree+0x107/0x13a
	rpcb_map_release+0x3f/0x44 [sunrpc]
	rpc_release_calldata+0x12/0x14 [sunrpc]
	rpc_free_task+0x72/0x7a [sunrpc]
	rpc_final_put_task+0x82/0x8a [sunrpc]
	__rpc_execute+0x244/0x253 [sunrpc]
	rpc_async_schedule+0x10/0x12 [sunrpc]
	process_one_work+0x230/0x41d
	worker_thread+0x133/0x217
	kthread+0x7d/0x85
	kernel_thread_helper+0x4/0x10
INFO: Slab 0xffffea0002c38b90 objects=20 used=13 fp=0xffff8800ca27e620 flags=0x20000000004081
INFO: Object 0xffff8800ca27e620 @offset=1568 fp=0xffff8800ca27f880

Bytes b4 0xffff8800ca27e610: d0 c4 1a 02 01 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a  ÐÄ......ZZZZZZZZ
Object  0xffff8800ca27e620: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object  0xffff8800ca27e630: 6b 6b 6b 6b 00 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkk..kkkkkkkkkk
Object  0xffff8800ca27e640: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object  0xffff8800ca27e650: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5  kkkkkkkkkkkkkkk¥
Redzone 0xffff8800ca27e660: bb bb bb bb bb bb bb bb                          »»»»»»»»
Padding 0xffff8800ca27e7a0: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ

Pid: 13035, comm: kworker/0:1 Not tainted 3.0.0-rc6+ #13
Call Trace:
 [<ffffffff81105907>] print_trailer+0x131/0x13a
 [<ffffffff81105945>] object_err+0x35/0x3e
 [<ffffffff811077b3>] verify_mem_not_deleted+0x7a/0xb7
 [<ffffffffa02891e5>] rpcb_getport_done+0x23/0x126 [sunrpc]
 [<ffffffffa02810df>] rpc_exit_task+0x3f/0x6d [sunrpc]
 [<ffffffffa02814d8>] __rpc_execute+0x80/0x253 [sunrpc]
 [<ffffffffa02816ed>] ? rpc_execute+0x42/0x42 [sunrpc]
 [<ffffffffa02816fd>] rpc_async_schedule+0x10/0x12 [sunrpc]
 [<ffffffff81061343>] process_one_work+0x230/0x41d
 [<ffffffff8106128e>] ? process_one_work+0x17b/0x41d
 [<ffffffff8106379f>] worker_thread+0x133/0x217
 [<ffffffff8106366c>] ? manage_workers+0x191/0x191
 [<ffffffff81066f9c>] kthread+0x7d/0x85
 [<ffffffff81485ee4>] kernel_thread_helper+0x4/0x10
 [<ffffffff8147f0d8>] ? retint_restore_args+0x13/0x13
 [<ffffffff81066f1f>] ? __init_kthread_worker+0x56/0x56
 [<ffffffff81485ee0>] ? gs_change+0x13/0x13

Thanks,
Ben
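For readers decoding the dump: the repeated byte patterns are SLUB's debug poison values from include/linux/poison.h, and they are how we know the object was read after being freed (the object body is solid 0x6b):

```c
/* SLUB debug poison values relevant to the dump above
 * (from include/linux/poison.h, listed here for reference): */
#define POISON_INUSE	0x5a	/* 'Z': allocated but not yet initialized */
#define POISON_FREE	0x6b	/* 'k': the object has been freed */
#define POISON_END	0xa5	/* last byte of a freed object */
/* The 0xbb redzone bytes (SLUB_RED_INACTIVE) guard an inactive
 * (freed) object, consistent with "Object is on free-list". */
```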
On 07/08/2011 11:11 AM, Myklebust, Trond wrote:
>> -----Original Message-----
>> From: Ben Greear [mailto:greearb@candelatech.com]
>> Sent: Friday, July 08, 2011 1:19 PM
>> To: Myklebust, Trond
>> Cc: linux-nfs@vger.kernel.org; linux-kernel@vger.kernel.org
>> Subject: Re: [RFC] sunrpc: Fix race between work-queue and
>> rpc_killall_tasks.
>>
>> On 07/06/2011 04:45 PM, Trond Myklebust wrote:
>>> On Wed, 2011-07-06 at 15:49 -0700, greearb@candelatech.com wrote:
>>>> From: Ben Greear <greearb@candelatech.com>
>>>>
>>>> The rpc_killall_tasks logic is not locked against
>>>> the work-queue thread, but it still directly modifies
>>>> function pointers and data in the task objects.
>>>>
>>>> This patch changes the killall-tasks logic to set a flag
>>>> that tells the work-queue thread to terminate the task
>>>> instead of directly calling the terminate logic.
>>>>
>>>> Signed-off-by: Ben Greear <greearb@candelatech.com>
>>>> ---
>>>>
>>>> NOTE: This needs review, as I am still struggling to understand
>>>> the rpc code, and it's quite possible this patch either doesn't
>>>> fully fix the problem or actually causes other issues. That said,
>>>> my nfs stress test seems to run a bit more stable with this patch
>>>> applied.
>>>
>>> Yes, but I don't see why you are adding a new flag, nor do I see why
>>> we want to keep checking for that flag in the rpc_execute() loop.
>>> rpc_killall_tasks() is not a frequent operation that we want to
>>> optimise for.
>>>
>>> How about the following instead?
>>
>> Ok, I looked at your patch closer. I think it can still cause
>> bad race conditions.
>>
>> For instance:
>>
>> Assume that tk_callback is NULL at beginning of while loop in
>> __rpc_execute, and tk_action is rpc_exit_task.
>>
>> While do_action(task) is being called, tk_action is set to NULL in
>> rpc_exit_task.
>>
>> But, right after tk_action is set to NULL in rpc_exit_task, the
>> rpc_killall_tasks method calls rpc_exit, which sets tk_action back
>> to rpc_exit_task.
>>
>> I believe this could cause the xprt_release(task) logic to be called
>> in the work-queue's execution of rpc_exit_task due to tk_action != NULL
>> when it should not be.
>
> Why would this be a problem? xprt_release() can certainly be called
> multiple times on an rpc_task. Ditto rpcb_getport_done.
>
> The only thing that is not re-entrant there is rpcb_map_release, which
> should only ever be called once whether or not something calls
> rpc_killall_tasks.

From the trace I posted, the stack trace below is being called with
the void *data object already freed. One way for this to happen would
be to have rpc_exit_task call task->tk_ops->rpc_call_done more than
once (I believe). Two calls to rpc_exit_task could do that, and since
the rpc_exit_task method is assigned to tk_action, I *think* the race
I mention above could cause rpc_exit_task to be called twice.

 [<ffffffff81105907>] print_trailer+0x131/0x13a
 [<ffffffff81105945>] object_err+0x35/0x3e
 [<ffffffff811077b3>] verify_mem_not_deleted+0x7a/0xb7
 [<ffffffffa02891e5>] rpcb_getport_done+0x23/0x126 [sunrpc]
 [<ffffffffa02810df>] rpc_exit_task+0x3f/0x6d [sunrpc]
 [<ffffffffa02814d8>] __rpc_execute+0x80/0x253 [sunrpc]
 [<ffffffffa02816ed>] ? rpc_execute+0x42/0x42 [sunrpc]
 [<ffffffffa02816fd>] rpc_async_schedule+0x10/0x12 [sunrpc]
 [<ffffffff81061343>] process_one_work+0x230/0x41d
 [<ffffffff8106128e>] ? process_one_work+0x17b/0x41d
 [<ffffffff8106379f>] worker_thread+0x133/0x217
 [<ffffffff8106366c>] ? manage_workers+0x191/0x191
 [<ffffffff81066f9c>] kthread+0x7d/0x85
 [<ffffffff81485ee4>] kernel_thread_helper+0x4/0x10
 [<ffffffff8147f0d8>] ? retint_restore_args+0x13/0x13
 [<ffffffff81066f1f>] ? __init_kthread_worker+0x56/0x56
 [<ffffffff81485ee0>] ? gs_change+0x13/0x13

Thanks,
Ben
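The interleaving Ben describes can be made deterministic in a single-threaded toy model. The program below is our simplified construction (not SUNRPC code): it shows how re-arming tk_action after rpc_exit_task has cleared it makes the "done" callback fire twice:

```c
/* Simplified model (assumption: toy code, not the kernel) of the
 * double-exit race: killall re-arms tk_action = rpc_exit_task in
 * the window after the work-queue thread has already run it once. */
#include <stdio.h>
#include <stddef.h>

struct task {
    void (*tk_action)(struct task *);
    int done_calls;
};

static void rpc_exit_task(struct task *t)
{
    t->tk_action = NULL;
    t->done_calls++;              /* stands in for tk_ops->rpc_call_done() */
}

static void rpc_exit(struct task *t) /* what rpc_killall_tasks() does */
{
    t->tk_action = rpc_exit_task; /* re-arms the exit path */
}

int main(void)
{
    struct task t = { .tk_action = rpc_exit_task, .done_calls = 0 };

    t.tk_action(&t);              /* first pass: done_calls == 1 */
    rpc_exit(&t);                 /* killall fires in the window */
    if (t.tk_action)              /* loop sees non-NULL and runs it again */
        t.tk_action(&t);

    printf("rpc_call_done ran %d times\n", t.done_calls); /* prints 2 */
    return 0;
}
```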
On 07/08/2011 03:14 PM, Myklebust, Trond wrote:
> The calldata gets freed in the rpc_final_put_task() which shouldn't
> ever be run while the task is still referenced in __rpc_execute

Ok, please go ahead and use your patch for the killall tasks race.

My problem remains with or without your patch, and with or without
my version. So, I'm hitting something else. I'm real low on ideas
of how exactly I am hitting the bug... but will keep poking around.

Thanks,
Ben

> IOW: it should be impossible to call rpc_exit_task() after
> rpc_final_put_task
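A toy model of the lifetime rule Trond is stating (our simplified userspace construction, not kernel code): calldata is released only on the *final* put, and __rpc_execute holds its own reference for as long as it runs, so the free should be unreachable while the task executes:

```c
/* Minimal refcount model of the rpc_task lifetime rule. */
#include <stdio.h>

struct task { int refcount; int calldata_freed; };

static void rpc_release_calldata(struct task *t) { t->calldata_freed = 1; }

static void rpc_put_task(struct task *t)
{
    if (--t->refcount == 0)
        rpc_release_calldata(t);  /* only on the final put */
}

int main(void)
{
    struct task t = { .refcount = 2, .calldata_freed = 0 }; /* caller + executor */
    rpc_put_task(&t);                    /* caller drops its ref */
    printf("freed while executing? %d\n", t.calldata_freed); /* 0 */
    rpc_put_task(&t);                    /* __rpc_execute finishes */
    printf("freed after final put? %d\n", t.calldata_freed);  /* 1 */
    return 0;
}
```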
On 07/08/2011 03:14 PM, Myklebust, Trond wrote:
>> [<ffffffff81105907>] print_trailer+0x131/0x13a
>> [<ffffffff81105945>] object_err+0x35/0x3e
>> [<ffffffff811077b3>] verify_mem_not_deleted+0x7a/0xb7
>> [<ffffffffa02891e5>] rpcb_getport_done+0x23/0x126 [sunrpc]
>> [<ffffffffa02810df>] rpc_exit_task+0x3f/0x6d [sunrpc]
>> [<ffffffffa02814d8>] __rpc_execute+0x80/0x253 [sunrpc]
>> [<ffffffffa02816ed>] ? rpc_execute+0x42/0x42 [sunrpc]
>> [<ffffffffa02816fd>] rpc_async_schedule+0x10/0x12 [sunrpc]
>> [<ffffffff81061343>] process_one_work+0x230/0x41d
>> [<ffffffff8106128e>] ? process_one_work+0x17b/0x41d
>> [<ffffffff8106379f>] worker_thread+0x133/0x217
>> [<ffffffff8106366c>] ? manage_workers+0x191/0x191
>> [<ffffffff81066f9c>] kthread+0x7d/0x85
>> [<ffffffff81485ee4>] kernel_thread_helper+0x4/0x10
>> [<ffffffff8147f0d8>] ? retint_restore_args+0x13/0x13
>> [<ffffffff81066f1f>] ? __init_kthread_worker+0x56/0x56
>> [<ffffffff81485ee0>] ? gs_change+0x13/0x13
>
> The calldata gets freed in the rpc_final_put_task() which shouldn't
> ever be run while the task is still referenced in __rpc_execute
>
> IOW: it should be impossible to call rpc_exit_task() after
> rpc_final_put_task

I added lots of locking around the calldata, work-queue logic, and
such, and still the problem persists w/out hitting any of the debug
warnings or poisoned values I put in. It almost seems like tk_calldata
is just assigned to two different tasks.

While poking through the code, I noticed that 'map' is static in
rpcb_getport_async.

That would seem to cause problems if two threads called this method at
the same time, possibly causing tk_calldata to be assigned to two
different tasks???

Any idea why it is static? I'm going to start another test run with
this non-static to see if that resolves things...

Thanks,
Ben
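To see why the static local is dangerous here, consider this minimal standalone demonstration (our illustration, not the rpcbind code): a static local means every call returns the same storage, so two concurrent callers end up sharing one "map" as their calldata, and whoever frees it first leaves the other with a dangling pointer:

```c
/* Why a static local breaks per-call state: all callers share one
 * object instead of each getting their own. */
#include <stdio.h>

struct rpcbind_args { int port; };

static struct rpcbind_args *get_map_static(void)
{
    static struct rpcbind_args map;   /* one instance for ALL callers */
    return &map;
}

int main(void)
{
    struct rpcbind_args *a = get_map_static();
    struct rpcbind_args *b = get_map_static();

    a->port = 111;
    b->port = 2049;                   /* clobbers a's value too */

    printf("same object: %s, a->port = %d\n",
           (a == b) ? "yes" : "no", a->port);  /* yes, 2049 */
    return 0;
}
```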
On 07/12/2011 10:25 AM, Myklebust, Trond wrote:
>> -----Original Message-----
>> From: Ben Greear [mailto:greearb@candelatech.com]
>> Sent: Tuesday, July 12, 2011 1:15 PM
>> To: Myklebust, Trond
>> Cc: linux-nfs@vger.kernel.org; linux-kernel@vger.kernel.org
>> Subject: Re: [RFC] sunrpc: Fix race between work-queue and
>> rpc_killall_tasks.
>>
>> On 07/08/2011 03:14 PM, Myklebust, Trond wrote:
>>
>>>> [<ffffffff81105907>] print_trailer+0x131/0x13a
>>>> [<ffffffff81105945>] object_err+0x35/0x3e
>>>> [<ffffffff811077b3>] verify_mem_not_deleted+0x7a/0xb7
>>>> [<ffffffffa02891e5>] rpcb_getport_done+0x23/0x126 [sunrpc]
>>>> [<ffffffffa02810df>] rpc_exit_task+0x3f/0x6d [sunrpc]
>>>> [<ffffffffa02814d8>] __rpc_execute+0x80/0x253 [sunrpc]
>>>> [<ffffffffa02816ed>] ? rpc_execute+0x42/0x42 [sunrpc]
>>>> [<ffffffffa02816fd>] rpc_async_schedule+0x10/0x12 [sunrpc]
>>>> [<ffffffff81061343>] process_one_work+0x230/0x41d
>>>> [<ffffffff8106128e>] ? process_one_work+0x17b/0x41d
>>>> [<ffffffff8106379f>] worker_thread+0x133/0x217
>>>> [<ffffffff8106366c>] ? manage_workers+0x191/0x191
>>>> [<ffffffff81066f9c>] kthread+0x7d/0x85
>>>> [<ffffffff81485ee4>] kernel_thread_helper+0x4/0x10
>>>> [<ffffffff8147f0d8>] ? retint_restore_args+0x13/0x13
>>>> [<ffffffff81066f1f>] ? __init_kthread_worker+0x56/0x56
>>>> [<ffffffff81485ee0>] ? gs_change+0x13/0x13
>>>
>>> The calldata gets freed in the rpc_final_put_task() which shouldn't
>>> ever be run while the task is still referenced in __rpc_execute
>>>
>>> IOW: it should be impossible to call rpc_exit_task() after
>>> rpc_final_put_task
>>
>> I added lots of locking around the calldata, work-queue logic, and
>> such, and still the problem persists w/out hitting any of the debug
>> warnings or poisoned values I put in. It almost seems like tk_calldata
>> is just assigned to two different tasks.
>>
>> While poking through the code, I noticed that 'map' is static in
>> rpcb_getport_async.
>>
>> That would seem to cause problems if two threads called this method at
>> the same time, possibly causing tk_calldata to be assigned to two
>> different tasks???
>>
>> Any idea why it is static?
>
> Doh! That is clearly a typo dating all the way back to when Chuck
> wrote that function.
>
> Yes, that would definitely explain your problem.

Ok, patch sent. I assume someone will propagate this to stable
as desired?

And assuming this fixes it, can I get some brownie points towards
review of the ip-addr binding patches? :)

Thanks,
Ben
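The patch Ben mentions is not included in this thread. Its shape would presumably be the one-line change of dropping the static qualifier; the hunk below is our reconstruction, so treat the exact context as an assumption:

```diff
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ ... @@ void rpcb_getport_async(struct rpc_task *task)
-	static struct rpcbind_args *map;
+	struct rpcbind_args *map;
```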
On 07/12/2011 10:30 AM, Ben Greear wrote:
> On 07/12/2011 10:25 AM, Myklebust, Trond wrote:
>>> -----Original Message-----
>>> From: Ben Greear [mailto:greearb@candelatech.com]
>>> Sent: Tuesday, July 12, 2011 1:15 PM
>>> To: Myklebust, Trond
>>> Cc: linux-nfs@vger.kernel.org; linux-kernel@vger.kernel.org
>>> Subject: Re: [RFC] sunrpc: Fix race between work-queue and
>>> rpc_killall_tasks.
>>>
>>> I added lots of locking around the calldata, work-queue logic, and
>>> such, and still the problem persists w/out hitting any of the debug
>>> warnings or poisoned values I put in. It almost seems like tk_calldata
>>> is just assigned to two different tasks.
>>>
>>> While poking through the code, I noticed that 'map' is static in
>>> rpcb_getport_async.
>>>
>>> That would seem to cause problems if two threads called this method at
>>> the same time, possibly causing tk_calldata to be assigned to two
>>> different tasks???
>>>
>>> Any idea why it is static?
>>
>> Doh! That is clearly a typo dating all the way back to when Chuck
>> wrote that function.
>>
>> Yes, that would definitely explain your problem.
>
> Ok, patch sent. I assume someone will propagate this to stable
> as desired?
>
> And assuming this fixes it, can I get some brownie points towards
> review of the ip-addr binding patches? :)

Just to close this issue: We ran a clean 24+ hour test mounting and
unmounting 200 mounts every 30 seconds, and it ran with zero problems.
This was with 2.6.38.8+ with this fix applied.

3.0-rc7+ is still flaky in various other ways, but I see no more NFS
problems at least.

So, that was the problem I was hitting, and it appears to be the last
problem in this area.

Thanks,
Ben
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index a27406b..4814e24 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -616,30 +616,25 @@ static void __rpc_execute(struct rpc_task *task)
 	BUG_ON(RPC_IS_QUEUED(task));
 
 	for (;;) {
+		void (*do_action)(struct rpc_task *);
 
 		/*
-		 * Execute any pending callback.
+		 * Execute any pending callback first.
 		 */
-		if (task->tk_callback) {
-			void (*save_callback)(struct rpc_task *);
-
-			/*
-			 * We set tk_callback to NULL before calling it,
-			 * in case it sets the tk_callback field itself:
-			 */
-			save_callback = task->tk_callback;
-			task->tk_callback = NULL;
-			save_callback(task);
-		} else {
+		do_action = task->tk_callback;
+		task->tk_callback = NULL;
+		if (do_action == NULL) {
 			/*
 			 * Perform the next FSM step.
-			 * tk_action may be NULL when the task has been killed
-			 * by someone else.
+			 * tk_action may be NULL if the task has been killed.
+			 * In particular, note that rpc_killall_tasks may
+			 * do this at any time, so beware when dereferencing.
 			 */
-			if (task->tk_action == NULL)
+			do_action = task->tk_action;
+			if (do_action == NULL)
 				break;
-			task->tk_action(task);
 		}
+		do_action(task);
 
 		/*
 		 * Lockless check for whether task is sleeping or not.
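One closing design note on the pattern this patch uses: correctness hinges on each racy field being loaded exactly once per loop iteration. A stricter variant (our assumption, not part of the posted patch) would spell that out with ACCESS_ONCE() so the compiler can never reload the field between the NULL check and the call:

```c
/* Hypothetical hardening of the same idea (not in the posted patch):
 * force a single load of each unlocked field per iteration. */
do_action = ACCESS_ONCE(task->tk_callback);
task->tk_callback = NULL;
if (do_action == NULL) {
	do_action = ACCESS_ONCE(task->tk_action);
	if (do_action == NULL)
		break;
}
do_action(task);
```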