From patchwork Mon Jun 20 09:23:59 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Santosh Shilimkar
X-Patchwork-Id: 896622
Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134])
	by demeter1.kernel.org (8.14.4/8.14.4) with ESMTP id p5K9S7Ex011747
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for ; Mon, 20 Jun 2011 09:28:28 GMT
Received: from canuck.infradead.org ([134.117.69.58])
	by merlin.infradead.org with esmtps (Exim 4.76 #1 (Red Hat Linux))
	id 1QYalA-00041Q-07; Mon, 20 Jun 2011 09:27:08 +0000
Received: from localhost ([127.0.0.1] helo=canuck.infradead.org)
	by canuck.infradead.org with esmtp (Exim 4.76 #1 (Red Hat Linux))
	id 1QYal9-0004Yx-D6; Mon, 20 Jun 2011 09:27:07 +0000
Received: from arroyo.ext.ti.com ([192.94.94.40])
	by canuck.infradead.org with esmtps (Exim 4.76 #1 (Red Hat Linux))
	id 1QYaio-0003rc-IF for linux-arm-kernel@lists.infradead.org;
	Mon, 20 Jun 2011 09:24:45 +0000
Received: from dbdp20.itg.ti.com ([172.24.170.38])
	by arroyo.ext.ti.com (8.13.7/8.13.7) with ESMTP id p5K9O9pC008522
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Mon, 20 Jun 2011 04:24:12 -0500
Received: from dbde71.ent.ti.com (localhost [127.0.0.1])
	by dbdp20.itg.ti.com (8.13.8/8.13.8) with ESMTP id p5K9O7uZ015770;
	Mon, 20 Jun 2011 14:54:08 +0530 (IST)
Received: from dbdp31.itg.ti.com (172.24.170.98)
	by DBDE71.ent.ti.com (172.24.170.149) with Microsoft SMTP Server
	id 8.3.106.1; Mon, 20 Jun 2011 14:54:07 +0530
Received: from linfarm476.india.ti.com (linfarm476.india.ti.com [10.24.132.205])
	by dbdp31.itg.ti.com (8.13.8/8.13.8) with ESMTP id p5K9O1VE010019;
	Mon, 20 Jun 2011 14:54:02 +0530 (IST)
Received: (from a0393909@localhost)
	by linfarm476.india.ti.com (8.12.11/8.13.8/Submit) id p5K9NxKS018432;
	Mon, 20 Jun 2011 14:53:59 +0530
From: Santosh Shilimkar
To:
Subject: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.
Date: Mon, 20 Jun 2011 14:53:59 +0530
Message-ID: <1308561839-18407-1-git-send-email-santosh.shilimkar@ti.com>
X-Mailer: git-send-email 1.5.6.6
MIME-Version: 1.0
X-CRM114-Version: 20090807-BlameThorstenAndJenny ( TRE 0.7.6 (BSD) ) MR-646709E3
X-CRM114-CacheID: sfid-20110620_052442_820812_C31B5C32
X-CRM114-Status: GOOD ( 18.24 )
X-Spam-Score: -2.3 (--)
X-Spam-Report: SpamAssassin version 3.3.1 on canuck.infradead.org summary:
	Content analysis details: (-2.3 points)
	pts  rule name              description
	---- ---------------------- --------------------------------------------------
	-2.3 RCVD_IN_DNSWL_MED      RBL: Sender listed at http://www.dnswl.org/, medium
	                            trust [192.94.94.40 listed in list.dnswl.org]
	-0.0 T_RP_MATCHES_RCVD      Envelope sender domain matches handover relay domain
Cc: Peter Zijlstra, linux-kernel@vger.kernel.org, Santosh Shilimkar,
	Russell King, Thomas Gleixner, linux-omap@vger.kernel.org
X-BeenThere: linux-arm-kernel@lists.infradead.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Sender: linux-arm-kernel-bounces@lists.infradead.org
Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org
X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by
	milter-greylist-4.2.6 (demeter1.kernel.org [140.211.167.41]);
	Mon, 20 Jun 2011 09:28:28 +0000 (UTC)

The current ARM CPU hotplug code suffers from a couple of race
conditions with the scheduler in the CPU online path.

Before enabling interrupts, the ARM CPU hotplug code doesn't wait for
the hot-plugged CPU to be marked active, which happens as part of
cpu_notify() on the CPU which brought it up. So we end up with a couple
of race conditions:

1) Interrupts are enabled even before the CPU is marked as active.
2) The newly plugged CPU is marked as active but it is not marked
   online yet.

When an interrupt happens before the cpu_active bit is set, the
scheduler can't schedule the woken thread which is bound to that newly
onlined CPU, and selects a fallback runqueue.

Secondly, marking a CPU active before it is online is also not
desirable behaviour.

Fix these race conditions.

Signed-off-by: Santosh Shilimkar
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Russell King
---
On a v3.0 kernel I started seeing lock-ups and random crashes when CPU
online/offline was attempted aggressively. With git bisect I arrived at
Peter's commit e4a52bcb9a18142d79e231b6733cabdbf2e67c1f [sched: Remove
rq->lock from the first half of ttwu()], which was also reported by
Marc Zyngier. But even after applying the follow-up fix from Peter,
d6aa8f85f163 [sched: Fix ttwu() for __ARCH_WANT_INTERRUPTS_ON_CTXSW],
I was still seeing issues in the hotplug path. So as an experiment I
pushed the interrupt enabling on the newly plugged CPU down to after it
is marked as online. This made things better and much more stable, but
occasionally I was still seeing lock-ups.

With the above as background I looked at the arch/x86/ code and
convinced myself that the experimental hack could be the right fix.
While doing this I came across a commit from Thomas, fd8a7de177b
[x86: cpu-hotplug: Prevent softirq wakeup on wrong CPU], which fixed
race 2) on the x86 architecture.

In this patch I have folded in possible fixes for both race conditions
in the ARM hotplug code, as mentioned in the change log. Hopefully I am
not introducing any new race with this patch, hence the RFC.

 arch/arm/kernel/smp.c |   18 +++++++++++-------
 1 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 344e52b..84373a9 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -302,13 +302,6 @@ asmlinkage void __cpuinit secondary_start_kernel(void)
 	platform_secondary_init(cpu);
 
 	/*
-	 * Enable local interrupts.
-	 */
-	notify_cpu_starting(cpu);
-	local_irq_enable();
-	local_fiq_enable();
-
-	/*
 	 * Setup the percpu timer for this CPU.
 	 */
 	percpu_timer_setup();
@@ -322,6 +315,17 @@ asmlinkage void __cpuinit secondary_start_kernel(void)
 	 */
 	set_cpu_online(cpu, true);
 
+
+	while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
+		cpu_relax();
+
+	/*
+	 * Enable local interrupts.
+	 */
+	notify_cpu_starting(cpu);
+	local_irq_enable();
+	local_fiq_enable();
+
 	/*
 	 * OK, it's off to the idle thread for us
 	 */
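
To make the reordering easier to reason about outside the kernel, here
is a small userspace model of race 1). It is only a sketch and not part
of the patch: the step names and has_irq_window_before_active() are
hypothetical, and the model assumes the CPU doing the bring-up sets the
active bit only after it has seen the online bit, so the wait can only
finish once the new CPU is online.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical step names modelling the tail of secondary_start_kernel(). */
enum bringup_step {
	STEP_ENABLE_IRQS,	/* local_irq_enable() / local_fiq_enable() */
	STEP_SET_ONLINE,	/* set_cpu_online(cpu, true) */
	STEP_WAIT_ACTIVE,	/* spin until the CPU is in cpu_active_mask */
};

/*
 * Walk a sequence of steps and report whether there is any point at
 * which interrupts are enabled while the CPU is not yet marked active.
 *
 * Assumption of this model (not taken from the kernel): the active bit
 * is set by the other CPU only after the online bit is visible, so
 * STEP_WAIT_ACTIVE completes only if STEP_SET_ONLINE came first.
 */
bool has_irq_window_before_active(const enum bringup_step *steps, size_t n)
{
	bool irqs_on = false;
	bool online = false;
	bool active = false;

	for (size_t i = 0; i < n; i++) {
		switch (steps[i]) {
		case STEP_ENABLE_IRQS:
			irqs_on = true;
			break;
		case STEP_SET_ONLINE:
			online = true;
			break;
		case STEP_WAIT_ACTIVE:
			/* The wait ends once the bring-up CPU marks us active. */
			if (online)
				active = true;
			break;
		}

		/* An interrupt arriving here would wake a thread on a non-active CPU. */
		if (irqs_on && !active)
			return true;
	}

	return false;
}
```

In this model the pre-patch order (STEP_ENABLE_IRQS, then
STEP_SET_ONLINE) reports a window, while the order introduced by the
patch (STEP_SET_ONLINE, STEP_WAIT_ACTIVE, STEP_ENABLE_IRQS) does not.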