Message ID | 4FF730CD.7050907@codeaurora.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Jul 06, 2012 at 11:39:09AM -0700, Stephen Boyd wrote: > On 07/05/12 17:24, Paul E. McKenney wrote: > > On Thu, Jul 05, 2012 at 04:45:58PM -0700, Stephen Boyd wrote: > >> @@ -179,7 +184,7 @@ void __ref cpu_die(void) > >> mb(); > >> > >> /* Tell __cpu_die() that this CPU is now safe to dispose of */ > >> - complete(&cpu_died); > >> + __this_cpu_write(cpu_state, CPU_DEAD); > > Or you could do something like: > > > > RCU_NONIDLE(complete(&cpu_died)); > > > > This would tell RCU that it needed to pay attention to this CPU for > > the duration of the "complete()" function call despite the CPU's being > > idle. And might allow you to dispense with the rest of the patch. > > Great! I like that more since we get to keep the completion mechanism > instead of a busy wait. > > Russell, which one would you prefer? Here's the other version I think I prefer the version below. > > ----->8-----8<----- > Subject: [PATCH] ARM: smp: Fix suspicious RCU originating from cpu_die() > > While running hotplug tests I ran into this RCU splat > > =============================== > [ INFO: suspicious RCU usage. ] > 3.4.0 #3275 Tainted: G W > ------------------------------- > include/linux/rcupdate.h:729 rcu_read_lock() used illegally while idle! > > other info that might help us debug this: > > RCU used illegally from idle CPU! > rcu_scheduler_active = 1, debug_locks = 0 > RCU used illegally from extended quiescent state! > 4 locks held by swapper/2/0: > #0: ((cpu_died).wait.lock){......}, at: [<c00ab128>] complete+0x1c/0x5c > #1: (&p->pi_lock){-.-.-.}, at: [<c00b275c>] try_to_wake_up+0x2c/0x388 > #2: (&rq->lock){-.-.-.}, at: [<c00b2860>] try_to_wake_up+0x130/0x388 > #3: (rcu_read_lock){.+.+..}, at: [<c00abe5c>] cpuacct_charge+0x28/0x1f4 > > stack backtrace: > [<c001521c>] (unwind_backtrace+0x0/0x12c) from [<c00abec8>] (cpuacct_charge+0x94/0x1f4) > [<c00abec8>] (cpuacct_charge+0x94/0x1f4) from [<c00b395c>] (update_curr+0x24c/0x2c8) > [<c00b395c>] (update_curr+0x24c/0x2c8) from [<c00b59c4>] (enqueue_task_fair+0x50/0x194) > [<c00b59c4>] (enqueue_task_fair+0x50/0x194) from [<c00afea4>] (enqueue_task+0x30/0x34) > [<c00afea4>] (enqueue_task+0x30/0x34) from [<c00b0908>] (ttwu_activate+0x14/0x38) > [<c00b0908>] (ttwu_activate+0x14/0x38) from [<c00b28a8>] (try_to_wake_up+0x178/0x388) > [<c00b28a8>] (try_to_wake_up+0x178/0x388) from [<c00a82a0>] (__wake_up_common+0x34/0x78) > [<c00a82a0>] (__wake_up_common+0x34/0x78) from [<c00ab154>] (complete+0x48/0x5c) > [<c00ab154>] (complete+0x48/0x5c) from [<c07db7cc>] (cpu_die+0x2c/0x58) > [<c07db7cc>] (cpu_die+0x2c/0x58) from [<c000f954>] (cpu_idle+0x64/0xfc) > [<c000f954>] (cpu_idle+0x64/0xfc) from [<80208160>] (0x80208160) > > When a cpu is marked offline during its idle thread it calls > cpu_die() during an RCU idle period. cpu_die() calls complete() > to notify the killing process that the cpu has died. complete() > calls into the scheduler code and eventually grabs an RCU read > lock in cpuacct_charge(). > > Mark complete() as RCU_NONIDLE so that RCU pays attention to this > CPU for the duration of the complete() function even though it's > in idle. > > Suggested-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> > Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> > --- > arch/arm/kernel/smp.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c > index 2c7217d..aea74f5 100644 > --- a/arch/arm/kernel/smp.c > +++ b/arch/arm/kernel/smp.c > @@ -179,7 +179,7 @@ void __ref cpu_die(void) > mb(); > > /* Tell __cpu_die() that this CPU is now safe to dispose of */ > - complete(&cpu_died); > + RCU_NONIDLE(complete(&cpu_died)); > > /* > * actual CPU shutdown procedure is at least platform (if not > > -- > Sent by an employee of the Qualcomm Innovation Center, Inc. > The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On 07/06/12 13:30, Russell King - ARM Linux wrote: > On Fri, Jul 06, 2012 at 11:39:09AM -0700, Stephen Boyd wrote: >> On 07/05/12 17:24, Paul E. McKenney wrote: >>> On Thu, Jul 05, 2012 at 04:45:58PM -0700, Stephen Boyd wrote: >>>> @@ -179,7 +184,7 @@ void __ref cpu_die(void) >>>> mb(); >>>> >>>> /* Tell __cpu_die() that this CPU is now safe to dispose of */ >>>> - complete(&cpu_died); >>>> + __this_cpu_write(cpu_state, CPU_DEAD); >>> Or you could do something like: >>> >>> RCU_NONIDLE(complete(&cpu_died)); >>> >>> This would tell RCU that it needed to pay attention to this CPU for >>> the duration of the "complete()" function call despite the CPU's being >>> idle. And might allow you to dispense with the rest of the patch. >> Great! I like that more since we get to keep the completion mechanism >> instead of a busy wait. >> >> Russell, which one would you prefer? Here's the other version > I think I prefer the version below. > Ok. I put it in the patch tracker.
=============================== [ INFO: suspicious RCU usage. ] 3.4.0 #3275 Tainted: G W ------------------------------- include/linux/rcupdate.h:729 rcu_read_lock() used illegally while idle! other info that might help us debug this: RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0 RCU used illegally from extended quiescent state! 4 locks held by swapper/2/0: #0: ((cpu_died).wait.lock){......}, at: [<c00ab128>] complete+0x1c/0x5c #1: (&p->pi_lock){-.-.-.}, at: [<c00b275c>] try_to_wake_up+0x2c/0x388 #2: (&rq->lock){-.-.-.}, at: [<c00b2860>] try_to_wake_up+0x130/0x388 #3: (rcu_read_lock){.+.+..}, at: [<c00abe5c>] cpuacct_charge+0x28/0x1f4 stack backtrace: [<c001521c>] (unwind_backtrace+0x0/0x12c) from [<c00abec8>] (cpuacct_charge+0x94/0x1f4) [<c00abec8>] (cpuacct_charge+0x94/0x1f4) from [<c00b395c>] (update_curr+0x24c/0x2c8) [<c00b395c>] (update_curr+0x24c/0x2c8) from [<c00b59c4>] (enqueue_task_fair+0x50/0x194) [<c00b59c4>] (enqueue_task_fair+0x50/0x194) from [<c00afea4>] (enqueue_task+0x30/0x34) [<c00afea4>] (enqueue_task+0x30/0x34) from [<c00b0908>] (ttwu_activate+0x14/0x38) [<c00b0908>] (ttwu_activate+0x14/0x38) from [<c00b28a8>] (try_to_wake_up+0x178/0x388) [<c00b28a8>] (try_to_wake_up+0x178/0x388) from [<c00a82a0>] (__wake_up_common+0x34/0x78) [<c00a82a0>] (__wake_up_common+0x34/0x78) from [<c00ab154>] (complete+0x48/0x5c) [<c00ab154>] (complete+0x48/0x5c) from [<c07db7cc>] (cpu_die+0x2c/0x58) [<c07db7cc>] (cpu_die+0x2c/0x58) from [<c000f954>] (cpu_idle+0x64/0xfc) [<c000f954>] (cpu_idle+0x64/0xfc) from [<80208160>] (0x80208160) When a cpu is marked offline during its idle thread it calls cpu_die() during an RCU idle period. cpu_die() calls complete() to notify the killing process that the cpu has died. complete() calls into the scheduler code and eventually grabs an RCU read lock in cpuacct_charge(). Mark complete() as RCU_NONIDLE so that RCU pays attention to this CPU for the duration of the complete() function even though it's in idle. Suggested-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> --- arch/arm/kernel/smp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c index 2c7217d..aea74f5 100644 --- a/arch/arm/kernel/smp.c +++ b/arch/arm/kernel/smp.c @@ -179,7 +179,7 @@ void __ref cpu_die(void) mb(); /* Tell __cpu_die() that this CPU is now safe to dispose of */ - complete(&cpu_died); + RCU_NONIDLE(complete(&cpu_died)); /* * actual CPU shutdown procedure is at least platform (if not