
[RFC] sunrpc: Fix race between work-queue and rpc_killall_tasks.

Message ID 1309995932.5447.6.camel@lade.trondhjem.org (mailing list archive)
State New, archived

Commit Message

Trond Myklebust July 6, 2011, 11:45 p.m. UTC
On Wed, 2011-07-06 at 15:49 -0700, greearb@candelatech.com wrote: 
> From: Ben Greear <greearb@candelatech.com>
> 
> The rpc_killall_tasks logic is not locked against
> the work-queue thread, but it still directly modifies
> function pointers and data in the task objects.
> 
> This patch changes the killall-tasks logic to set a flag
> that tells the work-queue thread to terminate the task
> instead of directly calling the terminate logic.
> 
> Signed-off-by: Ben Greear <greearb@candelatech.com>
> ---
> 
> NOTE:  This needs review, as I am still struggling to understand
> the rpc code, and it's quite possible this patch either doesn't
> fully fix the problem or actually causes other issues.  That said,
> my nfs stress test seems to run a bit more stable with this patch applied.

Yes, but I don't see why you are adding a new flag, nor do I see why we
want to keep checking for that flag in the rpc_execute() loop.
rpc_killall_tasks() is not a frequent operation that we want to optimise
for.

How about the following instead?

8<---------------------------------------------------------------------------------- 
From ecb7244b661c3f9d2008ef6048733e5cea2f98ab Mon Sep 17 00:00:00 2001
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Wed, 6 Jul 2011 19:44:52 -0400
Subject: [PATCH] SUNRPC: Fix a race between work-queue and rpc_killall_tasks

Since rpc_killall_tasks may modify the rpc_task's tk_action field
without any locking, we need to be careful when dereferencing it.

Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
 net/sunrpc/sched.c |   27 +++++++++++----------------
 1 files changed, 11 insertions(+), 16 deletions(-)

Comments

Ben Greear July 7, 2011, 12:07 a.m. UTC | #1
On 07/06/2011 04:45 PM, Trond Myklebust wrote:
> On Wed, 2011-07-06 at 15:49 -0700, greearb@candelatech.com wrote:
>> From: Ben Greear<greearb@candelatech.com>
>>
>> The rpc_killall_tasks logic is not locked against
>> the work-queue thread, but it still directly modifies
>> function pointers and data in the task objects.
>>
>> This patch changes the killall-tasks logic to set a flag
>> that tells the work-queue thread to terminate the task
>> instead of directly calling the terminate logic.
>>
>> Signed-off-by: Ben Greear<greearb@candelatech.com>
>> ---
>>
>> NOTE:  This needs review, as I am still struggling to understand
>> the rpc code, and it's quite possible this patch either doesn't
>> fully fix the problem or actually causes other issues.  That said,
>> my nfs stress test seems to run a bit more stable with this patch applied.
>
> Yes, but I don't see why you are adding a new flag, nor do I see why we
> want to keep checking for that flag in the rpc_execute() loop.
> rpc_killall_tasks() is not a frequent operation that we want to optimise
> for.

I was hoping that if the killall logic never set anything that was also
set by the work-queue thread it would be lock-safe without needing
explicit locking.

I was a bit concerned that my flags |= KILLME logic would potentially
overwrite flags that were being simultaneously written elsewhere
(so maybe I'd have to add a completely new variable for that KILLME flag
to really be safe).
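Ben's worry is the classic lost-update problem with a plain `|=` on shared
flags; the kernel side-steps it with atomic bitops such as set_bit().  A
userspace C sketch (hypothetical flag names, the interleaving replayed
deterministically in one thread) shows the torn read-modify-write and
contrasts it with an atomic one:

```c
#include <stdatomic.h>

#define KILLME  (1u << 0)   /* hypothetical flag bits */
#define RUNNING (1u << 1)

/* Replay of the torn 'flags |= KILLME' Ben worries about. */
unsigned int plain_rmw_loses_update(void)
{
	unsigned int flags = 0;

	/* killall thread begins 'flags |= KILLME': it has read flags (0)... */
	unsigned int snapshot = flags;

	/* ...meanwhile the work-queue thread sets RUNNING... */
	flags |= RUNNING;

	/* ...and killall writes back its stale snapshot, ORed with KILLME */
	flags = snapshot | KILLME;

	return flags;	/* RUNNING has been silently lost */
}

/* With an atomic read-modify-write (set_bit() in the kernel, or C11
 * atomic_fetch_or() in userspace) no concurrent update can be lost. */
unsigned int atomic_rmw_keeps_both(void)
{
	_Atomic unsigned int flags = 0;

	atomic_fetch_or(&flags, RUNNING);
	atomic_fetch_or(&flags, KILLME);
	return atomic_load(&flags);
}
```

This is why a plain `|=` really would need either its own variable, as Ben
suggests, or an atomic bitop.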

>
> How about the following instead?

I think it still races... more comments below.

>
> 8<----------------------------------------------------------------------------------
>  From ecb7244b661c3f9d2008ef6048733e5cea2f98ab Mon Sep 17 00:00:00 2001
> From: Trond Myklebust<Trond.Myklebust@netapp.com>
> Date: Wed, 6 Jul 2011 19:44:52 -0400
> Subject: [PATCH] SUNRPC: Fix a race between work-queue and rpc_killall_tasks
>
> Since rpc_killall_tasks may modify the rpc_task's tk_action field
> without any locking, we need to be careful when dereferencing it.

> +		do_action = task->tk_callback;
> +		task->tk_callback = NULL;
> +		if (do_action == NULL) {

I think the race still exists, though it would be harder to hit.
What if the killall logic sets task->tk_callback right after you assign do_action, but before
you set tk_callback to NULL?  Or after you set tk_callback to NULL for
that matter.
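The window Ben describes can be replayed step by step in a userspace sketch
(plain C; the rpc_task is reduced to the one field under discussion and the
names are illustrative):

```c
struct rpc_task;
typedef void (*rpc_action)(struct rpc_task *);
struct rpc_task { rpc_action tk_callback; };

static int late_cb_ran;
static void late_cb(struct rpc_task *t) { (void)t; late_cb_ran = 1; }

/* Replay Ben's interleaving, one step at a time, in a single thread. */
int lost_callback_demo(void)
{
	struct rpc_task task = { .tk_callback = 0 };

	/* worker: do_action = task->tk_callback;  (reads NULL) */
	rpc_action do_action = task.tk_callback;

	/* killall logic, interleaved right here: installs a callback */
	task.tk_callback = late_cb;

	/* worker: task->tk_callback = NULL;  (silently discards late_cb) */
	task.tk_callback = 0;

	if (do_action)
		do_action(&task);

	return late_cb_ran;	/* 0: the callback set by killall never runs */
}
```

Note the outcome is a *lost* callback, not a wild pointer call, which is why
Trond argues below that this residual race is tolerable.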

>   			/*
>   			 * Perform the next FSM step.
> -			 * tk_action may be NULL when the task has been killed
> -			 * by someone else.
> +			 * tk_action may be NULL if the task has been killed.
> +			 * In particular, note that rpc_killall_tasks may
> +			 * do this at any time, so beware when dereferencing.
>   			 */
> -			if (task->tk_action == NULL)
> +			do_action = task->tk_action;
> +			if (do_action == NULL)
>   				break;
> -			task->tk_action(task);
>   		}
> +		do_action(task);
>
>   		/*
>   		 * Lockless check for whether task is sleeping or not.

Thanks,
Ben
Trond Myklebust July 7, 2011, 12:17 a.m. UTC | #2
On Wed, 2011-07-06 at 17:07 -0700, Ben Greear wrote: 
> On 07/06/2011 04:45 PM, Trond Myklebust wrote:
> > On Wed, 2011-07-06 at 15:49 -0700, greearb@candelatech.com wrote:
> >> From: Ben Greear<greearb@candelatech.com>
> >>
> >> The rpc_killall_tasks logic is not locked against
> >> the work-queue thread, but it still directly modifies
> >> function pointers and data in the task objects.
> >>
> >> This patch changes the killall-tasks logic to set a flag
> >> that tells the work-queue thread to terminate the task
> >> instead of directly calling the terminate logic.
> >>
> >> Signed-off-by: Ben Greear<greearb@candelatech.com>
> >> ---
> >>
> >> NOTE:  This needs review, as I am still struggling to understand
> >> the rpc code, and it's quite possible this patch either doesn't
> >> fully fix the problem or actually causes other issues.  That said,
> >> my nfs stress test seems to run a bit more stable with this patch applied.
> >
> > Yes, but I don't see why you are adding a new flag, nor do I see why we
> > want to keep checking for that flag in the rpc_execute() loop.
> > rpc_killall_tasks() is not a frequent operation that we want to optimise
> > for.
> 
> I was hoping that if the killall logic never set anything that was also
> set by the work-queue thread it would be lock-safe without needing
> explicit locking.
> 
> I was a bit concerned that my flags |= KILLME logic would potentially
> over-write flags that were being simultaneously written elsewhere
> (so maybe I'd have to add a completely new variable for that KILLME flag
> to really be safe.)
> 
> >
> > How about the following instead?
> 
> I think it still races..more comments below.
> 
> >
> > 8<----------------------------------------------------------------------------------
> >  From ecb7244b661c3f9d2008ef6048733e5cea2f98ab Mon Sep 17 00:00:00 2001
> > From: Trond Myklebust<Trond.Myklebust@netapp.com>
> > Date: Wed, 6 Jul 2011 19:44:52 -0400
> > Subject: [PATCH] SUNRPC: Fix a race between work-queue and rpc_killall_tasks
> >
> > Since rpc_killall_tasks may modify the rpc_task's tk_action field
> > without any locking, we need to be careful when dereferencing it.
> 
> > +		do_action = task->tk_callback;
> > +		task->tk_callback = NULL;
> > +		if (do_action == NULL) {
> 
> I think the race still exists, though it would be harder to hit.
> What if the killall logic sets task->tk_callback right after you assign do_action, but before
> you set tk_callback to NULL?  Or after you set tk_callback to NULL for
> that matter.

What if it does? The rpc call will continue to execute until it
completes.

rpc_killall_tasks is really only useful for signalling to tasks that are
hanging on a completely unresponsive server that we want them to stop.
The only case where we really care is in rpc_shutdown_client(), where we
sleep and loop anyway.

IOW: I really don't care about 'fixing' rpc_killall_tasks to perfection.
All I care about is that it doesn't Oops.
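The load-once pattern the patch relies on can be sketched in isolation
(userspace C, illustrative names): even if the killall path clears tk_action
between the load and the call, the worker only ever invokes its non-NULL
snapshot, so it cannot dereference NULL:

```c
struct rpc_task;
typedef void (*rpc_action)(struct rpc_task *);
struct rpc_task { rpc_action tk_action; };

static int steps_run;
static void step(struct rpc_task *t) { (void)t; steps_run++; }

/* The patched shape of __rpc_execute's dispatch: one read of
 * tk_action, then test and call the local copy only. */
int safe_dispatch(struct rpc_task *task)
{
	rpc_action do_action = task->tk_action;	/* single read */

	/* simulate rpc_killall_tasks clearing tk_action right here */
	task->tk_action = 0;

	if (do_action == 0)
		return 0;
	do_action(task);	/* calls the snapshot; never a NULL deref */
	return 1;
}
```

The old code tested `task->tk_action` and then dereferenced it a second
time, leaving a window in which the second read could return NULL.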
Ben Greear July 7, 2011, 12:35 a.m. UTC | #3
On 07/06/2011 05:17 PM, Trond Myklebust wrote:
> On Wed, 2011-07-06 at 17:07 -0700, Ben Greear wrote:
>> On 07/06/2011 04:45 PM, Trond Myklebust wrote:
>>> On Wed, 2011-07-06 at 15:49 -0700, greearb@candelatech.com wrote:
>>>> From: Ben Greear<greearb@candelatech.com>
>>>>
>>>> The rpc_killall_tasks logic is not locked against
>>>> the work-queue thread, but it still directly modifies
>>>> function pointers and data in the task objects.
>>>>
>>>> This patch changes the killall-tasks logic to set a flag
>>>> that tells the work-queue thread to terminate the task
>>>> instead of directly calling the terminate logic.
>>>>
>>>> Signed-off-by: Ben Greear<greearb@candelatech.com>
>>>> ---
>>>>
>>>> NOTE:  This needs review, as I am still struggling to understand
>>>> the rpc code, and it's quite possible this patch either doesn't
>>>> fully fix the problem or actually causes other issues.  That said,
>>>> my nfs stress test seems to run a bit more stable with this patch applied.
>>>
>>> Yes, but I don't see why you are adding a new flag, nor do I see why we
>>> want to keep checking for that flag in the rpc_execute() loop.
>>> rpc_killall_tasks() is not a frequent operation that we want to optimise
>>> for.
>>
>> I was hoping that if the killall logic never set anything that was also
>> set by the work-queue thread it would be lock-safe without needing
>> explicit locking.
>>
>> I was a bit concerned that my flags |= KILLME logic would potentially
>> over-write flags that were being simultaneously written elsewhere
>> (so maybe I'd have to add a completely new variable for that KILLME flag
>> to really be safe.)
>>
>>>
>>> How about the following instead?
>>
>> I think it still races..more comments below.
>>
>>>
>>> 8<----------------------------------------------------------------------------------
>>>   From ecb7244b661c3f9d2008ef6048733e5cea2f98ab Mon Sep 17 00:00:00 2001
>>> From: Trond Myklebust<Trond.Myklebust@netapp.com>
>>> Date: Wed, 6 Jul 2011 19:44:52 -0400
>>> Subject: [PATCH] SUNRPC: Fix a race between work-queue and rpc_killall_tasks
>>>
>>> Since rpc_killall_tasks may modify the rpc_task's tk_action field
>>> without any locking, we need to be careful when dereferencing it.
>>
>>> +		do_action = task->tk_callback;
>>> +		task->tk_callback = NULL;
>>> +		if (do_action == NULL) {
>>
>> I think the race still exists, though it would be harder to hit.
>> What if the killall logic sets task->tk_callback right after you assign do_action, but before
>> you set tk_callback to NULL?  Or after you set tk_callback to NULL for
>> that matter.
>
> What if it does? The rpc call will continue to execute until it
> completes.
>
> rpc_killall_tasks is really only useful for signalling to tasks that are
> hanging on a completely unresponsive server that we want them to stop.
> The only case where we really care is in rpc_shutdown_client(), where we
> sleep and loop anyway.
>
> IOW: I really don't care about 'fixing' rpc_killall_tasks to perfection.
> All I care about is that it doesn't Oops.

That is my concern as well.  I'll try your patch and see if it fixes
the crashes I'm seeing in this area.

Thanks,
Ben
Ben Greear July 7, 2011, 8:38 p.m. UTC | #4
On 07/06/2011 04:45 PM, Trond Myklebust wrote:
> On Wed, 2011-07-06 at 15:49 -0700, greearb@candelatech.com wrote:
>> From: Ben Greear<greearb@candelatech.com>
>>
>> The rpc_killall_tasks logic is not locked against
>> the work-queue thread, but it still directly modifies
>> function pointers and data in the task objects.
>>
>> This patch changes the killall-tasks logic to set a flag
>> that tells the work-queue thread to terminate the task
>> instead of directly calling the terminate logic.
>>
>> Signed-off-by: Ben Greear<greearb@candelatech.com>
>> ---
>>
>> NOTE:  This needs review, as I am still struggling to understand
>> the rpc code, and it's quite possible this patch either doesn't
>> fully fix the problem or actually causes other issues.  That said,
>> my nfs stress test seems to run a bit more stable with this patch applied.
>
> Yes, but I don't see why you are adding a new flag, nor do I see why we
> want to keep checking for that flag in the rpc_execute() loop.
> rpc_killall_tasks() is not a frequent operation that we want to optimise
> for.
>
> How about the following instead?
>
> 8<----------------------------------------------------------------------------------
>  From ecb7244b661c3f9d2008ef6048733e5cea2f98ab Mon Sep 17 00:00:00 2001
> From: Trond Myklebust<Trond.Myklebust@netapp.com>
> Date: Wed, 6 Jul 2011 19:44:52 -0400
> Subject: [PATCH] SUNRPC: Fix a race between work-queue and rpc_killall_tasks
>
> Since rpc_killall_tasks may modify the rpc_task's tk_action field
> without any locking, we need to be careful when dereferencing it.
>
> Reported-by: Ben Greear<greearb@candelatech.com>
> Signed-off-by: Trond Myklebust<Trond.Myklebust@netapp.com>

I've been testing this for 4+ hours, and it seems to fix the problem.  We'll
continue to burn on it for a day or two just in case we're
getting (un)lucky in our testing.

Thanks,
Ben

> ---
>   net/sunrpc/sched.c |   27 +++++++++++----------------
>   1 files changed, 11 insertions(+), 16 deletions(-)
>
> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> index a27406b..4814e24 100644
> --- a/net/sunrpc/sched.c
> +++ b/net/sunrpc/sched.c
> @@ -616,30 +616,25 @@ static void __rpc_execute(struct rpc_task *task)
>   	BUG_ON(RPC_IS_QUEUED(task));
>
>   	for (;;) {
> +		void (*do_action)(struct rpc_task *);
>
>   		/*
> -		 * Execute any pending callback.
> +		 * Execute any pending callback first.
>   		 */
> -		if (task->tk_callback) {
> -			void (*save_callback)(struct rpc_task *);
> -
> -			/*
> -			 * We set tk_callback to NULL before calling it,
> -			 * in case it sets the tk_callback field itself:
> -			 */
> -			save_callback = task->tk_callback;
> -			task->tk_callback = NULL;
> -			save_callback(task);
> -		} else {
> +		do_action = task->tk_callback;
> +		task->tk_callback = NULL;
> +		if (do_action == NULL) {
>   			/*
>   			 * Perform the next FSM step.
> -			 * tk_action may be NULL when the task has been killed
> -			 * by someone else.
> +			 * tk_action may be NULL if the task has been killed.
> +			 * In particular, note that rpc_killall_tasks may
> +			 * do this at any time, so beware when dereferencing.
>   			 */
> -			if (task->tk_action == NULL)
> +			do_action = task->tk_action;
> +			if (do_action == NULL)
>   				break;
> -			task->tk_action(task);
>   		}
> +		do_action(task);
>
>   		/*
>   		 * Lockless check for whether task is sleeping or not.
Ben Greear July 8, 2011, 3:03 p.m. UTC | #5
On 07/07/2011 01:38 PM, Ben Greear wrote:
> On 07/06/2011 04:45 PM, Trond Myklebust wrote:
>> On Wed, 2011-07-06 at 15:49 -0700, greearb@candelatech.com wrote:
>>> From: Ben Greear<greearb@candelatech.com>
>>>
>>> The rpc_killall_tasks logic is not locked against
>>> the work-queue thread, but it still directly modifies
>>> function pointers and data in the task objects.
>>>
>>> This patch changes the killall-tasks logic to set a flag
>>> that tells the work-queue thread to terminate the task
>>> instead of directly calling the terminate logic.
>>>
>>> Signed-off-by: Ben Greear<greearb@candelatech.com>
>>> ---
>>>
>>> NOTE: This needs review, as I am still struggling to understand
>>> the rpc code, and it's quite possible this patch either doesn't
>>> fully fix the problem or actually causes other issues. That said,
>>> my nfs stress test seems to run a bit more stable with this patch applied.
>>
>> Yes, but I don't see why you are adding a new flag, nor do I see why we
>> want to keep checking for that flag in the rpc_execute() loop.
>> rpc_killall_tasks() is not a frequent operation that we want to optimise
>> for.
>>
>> How about the following instead?
>>
>> 8<----------------------------------------------------------------------------------
>> From ecb7244b661c3f9d2008ef6048733e5cea2f98ab Mon Sep 17 00:00:00 2001
>> From: Trond Myklebust<Trond.Myklebust@netapp.com>
>> Date: Wed, 6 Jul 2011 19:44:52 -0400
>> Subject: [PATCH] SUNRPC: Fix a race between work-queue and rpc_killall_tasks
>>
>> Since rpc_killall_tasks may modify the rpc_task's tk_action field
>> without any locking, we need to be careful when dereferencing it.
>>
>> Reported-by: Ben Greear<greearb@candelatech.com>
>> Signed-off-by: Trond Myklebust<Trond.Myklebust@netapp.com>
>
> I've been testing this for 4+ hours, and it seems to fix the problem. We'll
> continue to burn on it for a day or two just in case we're
> getting (un)lucky in our testing.
>
> Thanks,
> Ben

Well, we still hit the bug in overnight testing.  Maybe there are other races...

=============================================================================
BUG kmalloc-64: Object is on free-list
-----------------------------------------------------------------------------

INFO: Allocated in rpcb_getport_async+0x39c/0x5a5 [sunrpc] age=52 cpu=6 pid=18908
         __slab_alloc+0x348/0x3ba
         kmem_cache_alloc_trace+0x67/0xe7
         rpcb_getport_async+0x39c/0x5a5 [sunrpc]
         call_bind+0x70/0x75 [sunrpc]
         __rpc_execute+0x80/0x253 [sunrpc]
         rpc_execute+0x3d/0x42 [sunrpc]
         rpc_run_task+0x79/0x81 [sunrpc]
         rpc_call_sync+0x3f/0x60 [sunrpc]
         rpc_ping+0x42/0x58 [sunrpc]
         rpc_create+0x4aa/0x527 [sunrpc]
         nfs_create_rpc_client+0xb1/0xf6 [nfs]
         nfs_init_client+0x3b/0x7d [nfs]
         nfs_get_client+0x453/0x5ab [nfs]
         nfs_create_server+0x10b/0x437 [nfs]
         nfs_fs_mount+0x4ca/0x708 [nfs]
         mount_fs+0x6b/0x152
INFO: Freed in rpcb_map_release+0x3f/0x44 [sunrpc] age=119 cpu=5 pid=13934
         __slab_free+0x57/0x150
         kfree+0x107/0x13a
         rpcb_map_release+0x3f/0x44 [sunrpc]
         rpc_release_calldata+0x12/0x14 [sunrpc]
         rpc_free_task+0x72/0x7a [sunrpc]
         rpc_final_put_task+0x82/0x8a [sunrpc]
         __rpc_execute+0x244/0x253 [sunrpc]
         rpc_async_schedule+0x10/0x12 [sunrpc]
         process_one_work+0x230/0x41d
         worker_thread+0x133/0x217
         kthread+0x7d/0x85
         kernel_thread_helper+0x4/0x10
INFO: Slab 0xffffea0002c38b90 objects=20 used=13 fp=0xffff8800ca27e620 flags=0x20000000004081
INFO: Object 0xffff8800ca27e620 @offset=1568 fp=0xffff8800ca27f880
Bytes b4 0xffff8800ca27e610:  d0 c4 1a 02 01 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ÐÄ......ZZZZZZZZ
   Object 0xffff8800ca27e620:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
   Object 0xffff8800ca27e630:  6b 6b 6b 6b 00 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkk..kkkkkkkkkk
   Object 0xffff8800ca27e640:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
   Object 0xffff8800ca27e650:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk¥
  Redzone 0xffff8800ca27e660:  bb bb bb bb bb bb bb bb                         »»»»»»»»
  Padding 0xffff8800ca27e7a0:  5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ
Pid: 13035, comm: kworker/0:1 Not tainted 3.0.0-rc6+ #13
Call Trace:
  [<ffffffff81105907>] print_trailer+0x131/0x13a
  [<ffffffff81105945>] object_err+0x35/0x3e
  [<ffffffff811077b3>] verify_mem_not_deleted+0x7a/0xb7
  [<ffffffffa02891e5>] rpcb_getport_done+0x23/0x126 [sunrpc]
  [<ffffffffa02810df>] rpc_exit_task+0x3f/0x6d [sunrpc]
  [<ffffffffa02814d8>] __rpc_execute+0x80/0x253 [sunrpc]
  [<ffffffffa02816ed>] ? rpc_execute+0x42/0x42 [sunrpc]
  [<ffffffffa02816fd>] rpc_async_schedule+0x10/0x12 [sunrpc]
  [<ffffffff81061343>] process_one_work+0x230/0x41d
  [<ffffffff8106128e>] ? process_one_work+0x17b/0x41d
  [<ffffffff8106379f>] worker_thread+0x133/0x217
  [<ffffffff8106366c>] ? manage_workers+0x191/0x191
  [<ffffffff81066f9c>] kthread+0x7d/0x85
  [<ffffffff81485ee4>] kernel_thread_helper+0x4/0x10
  [<ffffffff8147f0d8>] ? retint_restore_args+0x13/0x13
  [<ffffffff81066f1f>] ? __init_kthread_worker+0x56/0x56
  [<ffffffff81485ee0>] ? gs_change+0x13/0x13

Thanks,
Ben
Ben Greear July 8, 2011, 10:03 p.m. UTC | #6
On 07/08/2011 11:11 AM, Myklebust, Trond wrote:
>> -----Original Message-----
>> From: Ben Greear [mailto:greearb@candelatech.com]
>> Sent: Friday, July 08, 2011 1:19 PM
>> To: Myklebust, Trond
>> Cc: linux-nfs@vger.kernel.org; linux-kernel@vger.kernel.org
>> Subject: Re: [RFC] sunrpc: Fix race between work-queue and
>> rpc_killall_tasks.
>>
>> On 07/06/2011 04:45 PM, Trond Myklebust wrote:
>>> On Wed, 2011-07-06 at 15:49 -0700, greearb@candelatech.com wrote:
>>>> From: Ben Greear<greearb@candelatech.com>
>>>>
>>>> The rpc_killall_tasks logic is not locked against
>>>> the work-queue thread, but it still directly modifies
>>>> function pointers and data in the task objects.
>>>>
>>>> This patch changes the killall-tasks logic to set a flag
>>>> that tells the work-queue thread to terminate the task
>>>> instead of directly calling the terminate logic.
>>>>
>>>> Signed-off-by: Ben Greear<greearb@candelatech.com>
>>>> ---
>>>>
>>>> NOTE:  This needs review, as I am still struggling to understand
>>>> the rpc code, and it's quite possible this patch either doesn't
>>>> fully fix the problem or actually causes other issues.  That said,
>>>> my nfs stress test seems to run a bit more stable with this patch
>> applied.
>>>
>>> Yes, but I don't see why you are adding a new flag, nor do I see why
>> we
>>> want to keep checking for that flag in the rpc_execute() loop.
>>> rpc_killall_tasks() is not a frequent operation that we want to
>> optimise
>>> for.
>>>
>>> How about the following instead?
>>
>> Ok, I looked at your patch closer.  I think it can still cause
>> bad race conditions.
>>
>> For instance:
>>
>> Assume that tk_callback is NULL at beginning of while loop in
>> __rpc_execute,
>> and tk_action is rpc_exit_task.
>>
>> While do_action(task) is being called, tk_action is set to NULL in
>> rpc_exit_task.
>>
>> But, right after tk_action is set to NULL in rpc_exit_task, the
>> rpc_killall_tasks
>> method calls rpc_exit, which sets tk_action back to rpc_exit_task.
>>
>> I believe this could cause the xprt_release(task) logic to be called in
>> the
>> work-queue's execution of rpc_exit_task due to tk_action != NULL when
>> it should not be.
>
> Why would this be a problem? xprt_release() can certainly be called multiple times on an rpc_task. Ditto rpbc_getport_done.
>
> The only thing that is not re-entrant there is rpcb_map_release, which should only ever be called once whether or not something calls rpc_killall_tasks.


 From the trace I posted, the stack below shows rpcb_getport_done being
called with the void *data object already freed.

One way for this to happen would be to have rpc_exit_task call task->tk_ops->rpc_call_done
more than once (I believe).  Two calls to rpc_exit_task could do that, and since the
rpc_exit_task method is assigned to tk_action, I *think* the race I mention above could cause
rpc_exit_task to be called twice.

  [<ffffffff81105907>] print_trailer+0x131/0x13a
  [<ffffffff81105945>] object_err+0x35/0x3e
  [<ffffffff811077b3>] verify_mem_not_deleted+0x7a/0xb7
  [<ffffffffa02891e5>] rpcb_getport_done+0x23/0x126 [sunrpc]
  [<ffffffffa02810df>] rpc_exit_task+0x3f/0x6d [sunrpc]
  [<ffffffffa02814d8>] __rpc_execute+0x80/0x253 [sunrpc]
  [<ffffffffa02816ed>] ? rpc_execute+0x42/0x42 [sunrpc]
  [<ffffffffa02816fd>] rpc_async_schedule+0x10/0x12 [sunrpc]
  [<ffffffff81061343>] process_one_work+0x230/0x41d
  [<ffffffff8106128e>] ? process_one_work+0x17b/0x41d
  [<ffffffff8106379f>] worker_thread+0x133/0x217
  [<ffffffff8106366c>] ? manage_workers+0x191/0x191
  [<ffffffff81066f9c>] kthread+0x7d/0x85
  [<ffffffff81485ee4>] kernel_thread_helper+0x4/0x10
  [<ffffffff8147f0d8>] ? retint_restore_args+0x13/0x13
  [<ffffffff81066f1f>] ? __init_kthread_worker+0x56/0x56
  [<ffffffff81485ee0>] ? gs_change+0x13/0x13
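Ben's double-completion theory can be modeled without any kernel code: if
the completion path runs twice for one calldata, the second run touches
memory the first run's release path already freed (illustrative C,
hypothetical names; the real release function is rpcb_map_release):

```c
struct rpcb_calldata { int freed; };	/* stand-in for rpcbind_args */

static int use_after_free_hits;

/* stands in for tk_ops->rpc_call_done + ->rpc_release for rpcbind */
static void fake_getport_done(struct rpcb_calldata *map)
{
	if (map->freed)
		use_after_free_hits++;	/* second call touches freed memory */
	map->freed = 1;			/* release path frees the calldata */
}

/* If rpc_killall_tasks resets tk_action back to rpc_exit_task after the
 * task already ran it once, the completion path fires twice. */
int double_exit_demo(void)
{
	struct rpcb_calldata map = { 0 };

	fake_getport_done(&map);	/* normal completion */
	fake_getport_done(&map);	/* replayed completion */
	return use_after_free_hits;
}
```

A second invocation is exactly the "Object is on free-list" pattern the
SLUB report above flags.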

Thanks,
Ben
Ben Greear July 9, 2011, 4:34 p.m. UTC | #7
On 07/08/2011 03:14 PM, Myklebust, Trond wrote:

> The calldata gets freed in the rpc_final_put_task() which shouldn't ever be run while the task is still referenced in __rpc_execute

Ok, please go ahead and use your patch for the killall tasks race.  My problem remains with or without
your patch, and with or without my version.  So, I'm hitting something else.

I'm really low on ideas about how exactly I am hitting the bug... but will keep poking around.

Thanks,
Ben

>
> IOW: it should be impossible to call rpc_exit_task() after rpc_final_put_task
Ben Greear July 12, 2011, 5:14 p.m. UTC | #8
On 07/08/2011 03:14 PM, Myklebust, Trond wrote:

>>    [<ffffffff81105907>] print_trailer+0x131/0x13a
>>    [<ffffffff81105945>] object_err+0x35/0x3e
>>    [<ffffffff811077b3>] verify_mem_not_deleted+0x7a/0xb7
>>    [<ffffffffa02891e5>] rpcb_getport_done+0x23/0x126 [sunrpc]
>>    [<ffffffffa02810df>] rpc_exit_task+0x3f/0x6d [sunrpc]
>>    [<ffffffffa02814d8>] __rpc_execute+0x80/0x253 [sunrpc]
>>    [<ffffffffa02816ed>] ? rpc_execute+0x42/0x42 [sunrpc]
>>    [<ffffffffa02816fd>] rpc_async_schedule+0x10/0x12 [sunrpc]
>>    [<ffffffff81061343>] process_one_work+0x230/0x41d
>>    [<ffffffff8106128e>] ? process_one_work+0x17b/0x41d
>>    [<ffffffff8106379f>] worker_thread+0x133/0x217
>>    [<ffffffff8106366c>] ? manage_workers+0x191/0x191
>>    [<ffffffff81066f9c>] kthread+0x7d/0x85
>>    [<ffffffff81485ee4>] kernel_thread_helper+0x4/0x10
>>    [<ffffffff8147f0d8>] ? retint_restore_args+0x13/0x13
>>    [<ffffffff81066f1f>] ? __init_kthread_worker+0x56/0x56
>>    [<ffffffff81485ee0>] ? gs_change+0x13/0x13
>
> The calldata gets freed in the rpc_final_put_task() which shouldn't ever be run while the task is still referenced in __rpc_execute
>
> IOW: it should be impossible to call rpc_exit_task() after rpc_final_put_task

I added lots of locking around the calldata, work-queue logic, and such, and
still the problem persists w/out hitting any of the debug warnings or poisoned
values I put in.  It almost seems like tk_calldata is just assigned to two
different tasks.

While poking through the code, I noticed that 'map' is static in rpcb_getport_async.

That would seem to cause problems if two threads called this method at
the same time, possibly causing tk_calldata to be assigned to two different
tasks???

Any idea why it is static?

I'm going to start another test run with this non-static
to see if that resolves things...
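The failure mode a static local would cause can be sketched deterministically
(userspace C caricature, single-threaded replay of the interleaving; names
are illustrative, not the real rpcb_getport_async code):

```c
#include <stdlib.h>

struct rpcbind_args { int task_id; };

/* BUG shape: one static pointer shared by every in-flight getport call */
static struct rpcbind_args *map;

static void getport_alloc(int id)
{
	map = malloc(sizeof(*map));	/* each caller allocates... */
	map->task_id = id;		/* ...but into the shared variable */
}

static struct rpcbind_args *getport_finish(void)
{
	return map;	/* whatever 'map' points at *now* becomes tk_calldata */
}

int shared_calldata_demo(void)
{
	getport_alloc(1);			/* thread A allocates... */
	struct rpcbind_args *a_alloc = map;	/* remember A's allocation */
	getport_alloc(2);			/* ...thread B interleaves */
	struct rpcbind_args *a = getport_finish();	/* A resumes */
	struct rpcbind_args *b = getport_finish();
	int shared = (a == b && a->task_id == 2);
	free(a_alloc);
	free(b);
	return shared;	/* 1: both tasks share B's calldata; A's leaked */
}
```

Two tasks ending up with the same tk_calldata would mean two releases of one
allocation, matching the use-after-free above.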

Thanks,
Ben
Ben Greear July 12, 2011, 5:30 p.m. UTC | #9
On 07/12/2011 10:25 AM, Myklebust, Trond wrote:
>> -----Original Message-----
>> From: Ben Greear [mailto:greearb@candelatech.com]
>> Sent: Tuesday, July 12, 2011 1:15 PM
>> To: Myklebust, Trond
>> Cc: linux-nfs@vger.kernel.org; linux-kernel@vger.kernel.org
>> Subject: Re: [RFC] sunrpc: Fix race between work-queue and
>> rpc_killall_tasks.
>>
>> On 07/08/2011 03:14 PM, Myklebust, Trond wrote:
>>
>>>>     [<ffffffff81105907>] print_trailer+0x131/0x13a
>>>>     [<ffffffff81105945>] object_err+0x35/0x3e
>>>>     [<ffffffff811077b3>] verify_mem_not_deleted+0x7a/0xb7
>>>>     [<ffffffffa02891e5>] rpcb_getport_done+0x23/0x126 [sunrpc]
>>>>     [<ffffffffa02810df>] rpc_exit_task+0x3f/0x6d [sunrpc]
>>>>     [<ffffffffa02814d8>] __rpc_execute+0x80/0x253 [sunrpc]
>>>>     [<ffffffffa02816ed>] ? rpc_execute+0x42/0x42 [sunrpc]
>>>>     [<ffffffffa02816fd>] rpc_async_schedule+0x10/0x12 [sunrpc]
>>>>     [<ffffffff81061343>] process_one_work+0x230/0x41d
>>>>     [<ffffffff8106128e>] ? process_one_work+0x17b/0x41d
>>>>     [<ffffffff8106379f>] worker_thread+0x133/0x217
>>>>     [<ffffffff8106366c>] ? manage_workers+0x191/0x191
>>>>     [<ffffffff81066f9c>] kthread+0x7d/0x85
>>>>     [<ffffffff81485ee4>] kernel_thread_helper+0x4/0x10
>>>>     [<ffffffff8147f0d8>] ? retint_restore_args+0x13/0x13
>>>>     [<ffffffff81066f1f>] ? __init_kthread_worker+0x56/0x56
>>>>     [<ffffffff81485ee0>] ? gs_change+0x13/0x13
>>>
>>> The calldata gets freed in the rpc_final_put_task() which shouldn't
>> ever be run while the task is still referenced in __rpc_execute
>>>
>>> IOW: it should be impossible to call rpc_exit_task() after
>> rpc_final_put_task
>>
>> I added lots of locking around the calldata, work-queue logic, and
>> such, and
>> still the problem persists w/out hitting any of the debug warnings or
>> poisoned
>> values I put in.  It almost seems like tk_calldata is just assigned to
>> two
>> different tasks.
>>
>> While poking through the code, I noticed that 'map' is static in
>> rpcb_getport_async.
>>
>> That would seem to cause problems if two threads called this method at
>> the same time, possibly causing tk_calldata to be assigned to two
>> different
>> tasks???
>>
>> Any idea why it is static?
>
> Doh! That is clearly a typo dating all the way back to when Chuck wrote that function.
>
> Yes, that would definitely explain your problem.

Ok, patch sent.  I assume someone will propagate this to stable
as desired?

And assuming this fixes it, can I get some brownie points towards
review of the ip-addr binding patches? :)

Thanks,
Ben
Ben Greear July 14, 2011, 4:20 p.m. UTC | #10
On 07/12/2011 10:30 AM, Ben Greear wrote:
> On 07/12/2011 10:25 AM, Myklebust, Trond wrote:
>>> -----Original Message-----
>>> From: Ben Greear [mailto:greearb@candelatech.com]
>>> Sent: Tuesday, July 12, 2011 1:15 PM
>>> To: Myklebust, Trond
>>> Cc: linux-nfs@vger.kernel.org; linux-kernel@vger.kernel.org
>>> Subject: Re: [RFC] sunrpc: Fix race between work-queue and
>>> rpc_killall_tasks.
>>>

>>> I added lots of locking around the calldata, work-queue logic, and
>>> such, and
>>> still the problem persists w/out hitting any of the debug warnings or
>>> poisoned
>>> values I put in. It almost seems like tk_calldata is just assigned to
>>> two
>>> different tasks.
>>>
>>> While poking through the code, I noticed that 'map' is static in
>>> rpcb_getport_async.
>>>
>>> That would seem to cause problems if two threads called this method at
>>> the same time, possibly causing tk_calldata to be assigned to two
>>> different
>>> tasks???
>>>
>>> Any idea why it is static?
>>
>> Doh! That is clearly a typo dating all the way back to when Chuck
>> wrote that function.
>>
>> Yes, that would definitely explain your problem.
>
> Ok, patch sent. I assume someone will propagate this to stable
> as desired?
>
> And assuming this fixes it, can I get some brownie points towards
> review of the ip-addr binding patches? :)

Just to close this issue:  We ran a clean 24+ hour test mounting and
unmounting 200 mounts every 30 seconds, and it ran with zero problems.

This was with 2.6.38.8+ with this fix applied.

3.0-rc7+ is still flaky in various other ways, but I see no more
NFS problems at least.

So, that was the problem I was hitting, and it appears to be the
last problem in this area.

Thanks,
Ben

Patch

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index a27406b..4814e24 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -616,30 +616,25 @@  static void __rpc_execute(struct rpc_task *task)
 	BUG_ON(RPC_IS_QUEUED(task));
 
 	for (;;) {
+		void (*do_action)(struct rpc_task *);
 
 		/*
-		 * Execute any pending callback.
+		 * Execute any pending callback first.
 		 */
-		if (task->tk_callback) {
-			void (*save_callback)(struct rpc_task *);
-
-			/*
-			 * We set tk_callback to NULL before calling it,
-			 * in case it sets the tk_callback field itself:
-			 */
-			save_callback = task->tk_callback;
-			task->tk_callback = NULL;
-			save_callback(task);
-		} else {
+		do_action = task->tk_callback;
+		task->tk_callback = NULL;
+		if (do_action == NULL) {
 			/*
 			 * Perform the next FSM step.
-			 * tk_action may be NULL when the task has been killed
-			 * by someone else.
+			 * tk_action may be NULL if the task has been killed.
+			 * In particular, note that rpc_killall_tasks may
+			 * do this at any time, so beware when dereferencing.
 			 */
-			if (task->tk_action == NULL)
+			do_action = task->tk_action;
+			if (do_action == NULL)
 				break;
-			task->tk_action(task);
 		}
+		do_action(task);
 
 		/*
 		 * Lockless check for whether task is sleeping or not.