Message ID | 1308561839-18407-1-git-send-email-santosh.shilimkar@ti.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, Jun 20, 2011 at 02:53:59PM +0530, Santosh Shilimkar wrote: > The current ARM CPU hotplug code suffers from couple of race conditions > in CPU online path with scheduler. > The ARM CPU hotplug code doesn't wait for hot-plugged CPU to be marked > active as part of cpu_notify() by the CPU which brought it up before > enabling interrupts. Hmm, why not just move the set_cpu_online() call before notify_cpu_starting() and add the wait after the set_cpu_online() ?
On 6/20/2011 3:20 PM, Russell King - ARM Linux wrote: > On Mon, Jun 20, 2011 at 02:53:59PM +0530, Santosh Shilimkar wrote: >> The current ARM CPU hotplug code suffers from couple of race conditions >> in CPU online path with scheduler. >> The ARM CPU hotplug code doesn't wait for hot-plugged CPU to be marked >> active as part of cpu_notify() by the CPU which brought it up before >> enabling interrupts. > > Hmm, why not just move the set_cpu_online() call before notify_cpu_starting() > and add the wait after the set_cpu_online() ? I think that's what patch is doing. Do you mean, calling hotplug notifier chain immediately after CPU marked as online. Something like below. set_cpu_online(cpu, true); notify_cpu_starting(cpu); while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask)) cpu_relax(); /* * Enable local interrupts. */ local_irq_enable(); local_fiq_enable(); Regards Santosh
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c index 344e52b..84373a9 100644 --- a/arch/arm/kernel/smp.c +++ b/arch/arm/kernel/smp.c @@ -302,13 +302,6 @@ asmlinkage void __cpuinit secondary_start_kernel(void) platform_secondary_init(cpu); /* - * Enable local interrupts. - */ - notify_cpu_starting(cpu); - local_irq_enable(); - local_fiq_enable(); - - /* * Setup the percpu timer for this CPU. */ percpu_timer_setup(); @@ -322,6 +315,17 @@ asmlinkage void __cpuinit secondary_start_kernel(void) */ set_cpu_online(cpu, true); + + while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask)) + cpu_relax(); + + /* + * Enable local interrupts. + */ + notify_cpu_starting(cpu); + local_irq_enable(); + local_fiq_enable(); + /* * OK, it's off to the idle thread for us */
The current ARM CPU hotplug code suffers from couple of race conditions in CPU online path with scheduler. The ARM CPU hotplug code doesn't wait for hot-plugged CPU to be marked active as part of cpu_notify() by the CPU which brought it up before enabling interrupts. So we end up in with couple of race conditions, 1) Interrupts are enabled even before CPU is marked as active. 2) Newly plugged CPU is marked as active but it is not marked online yet. When an interrupt happens before the cpu_active bit is set, the scheduler can't schedule the woken thread which is bound to that newly onlined cpu and and selects a fallback runqueue. Secondly marking CPU active before it is online also not desirable behaviour. Fix this race conditions. Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <rmk+kernel@arm.linux.org.uk> --- On v3.0 kernel I started seeing lock-up and random crashes when CPU online/offline was attempted aggressively. With git bisect I could reach to Peter's commit e4a52bcb9a18142d79e231b6733cabdbf2e67c1f[sched: Remove rq->lock from the first half of ttwu()] which was also reported by Marc Zyngier. But even after using the follow up fix from Peter d6aa8f85f163[sched: Fix ttwu() for __ARCH_WANT_INTERRUPTS_ON_CTXSW], I was still seeing issues with hotplug path. So as a experiment I just pushed down the interrupt enabling on newly plugged CPU after it's marked as online. This made things better and much stable but occasionally I was still seeing lock-up. With above as background I looked at arch/x86/ code and got convinced myself that the experimental hack could be the right fix. While doing this I came across a commit from Thomas fd8a7de177b [x86: cpu-hotplug: Prevent softirq wakeup on wrong CPU] which fixed the race 2) on x86 architecture. In this patch I have folded possible fixes for both race conditions for ARM hotplug code as mentioned in change log. Hopefully I am not introducing any new race with this patch and hence the RFC. arch/arm/kernel/smp.c | 18 +++++++++++------- 1 files changed, 11 insertions(+), 7 deletions(-)