From patchwork Wed Mar 6 11:18:35 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 2224691 Return-Path: X-Original-To: patchwork-linux-arm@patchwork.kernel.org Delivered-To: patchwork-process-083081@patchwork2.kernel.org Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134]) by patchwork2.kernel.org (Postfix) with ESMTP id 65565DF23A for ; Wed, 6 Mar 2013 11:25:40 +0000 (UTC) Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux)) id 1UDCQ1-0000S0-WC; Wed, 06 Mar 2013 11:21:58 +0000 Received: from www.linutronix.de ([62.245.132.108] helo=Galois.linutronix.de) by merlin.infradead.org with esmtps (Exim 4.80.1 #2 (Red Hat Linux)) id 1UDCOk-0007zQ-B3 for linux-arm-kernel@lists.infradead.org; Wed, 06 Mar 2013 11:20:46 +0000 Received: from localhost ([127.0.0.1] helo=localhost6.localdomain6) by Galois.linutronix.de with esmtp (Exim 4.72) (envelope-from ) id 1UDCMm-0006Mc-9i; Wed, 06 Mar 2013 12:18:36 +0100 Message-Id: <20130306111537.492045206@linutronix.de> User-Agent: quilt/0.48-1 Date: Wed, 06 Mar 2013 11:18:35 -0000 From: Thomas Gleixner To: LKML Subject: [patch 4/7] tick: Handle broadcast wakeup of multiple cpus References: <20130306111351.883117670@linutronix.de> Content-Disposition: inline; filename=tick-broadcast-handle-mass-wakeup.patch X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1, SHORTCIRCUIT=-0.0001 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20130306_062038_873466_78AC0023 X-CRM114-Status: GOOD ( 17.81 ) X-Spam-Score: -4.8 (----) X-Spam-Report: SpamAssassin version 3.3.2 on merlin.infradead.org summary: Content analysis details: (-4.8 points) pts rule name description ---- ---------------------- -------------------------------------------------- -2.3 RCVD_IN_DNSWL_MED RBL: Sender listed at http://www.dnswl.org/, medium trust [62.245.132.108 listed in list.dnswl.org] -0.6 RP_MATCHES_RCVD Envelope sender domain matches handover relay domain -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% [score: 0.0000] Cc: Lorenzo Pieralisi , Santosh Shilimkar , Jason Liu , John Stultz , LAK , Arjan van de Veen X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org Some brilliant hardware implementations wake multiple cores when the broadcast timer fires. This leads to the following interesting problem: CPU0 CPU1 wakeup from idle wakeup from idle leave broadcast mode leave broadcast mode restart per cpu timer restart per cpu timer go back to idle handle broadcast (empty mask) enter broadcast mode programm broadcast device enter broadcast mode programm broadcast device So what happens is that due to the forced reprogramming of the cpu local timer, we need to set a event in the future. Now if we manage to go back to idle before the timer fires, we switch off the timer and arm the broadcast device with an already expired time (covered by forced mode). So in the worst case we repeat the above ping pong forever. Unfortunately we have no information about what caused the wakeup, but we can check current time against the expiry time of the local cpu. If the local event is already in the past, we know that the broadcast timer is about to fire and send an IPI. So we mark ourself as an IPI target even if we left broadcast mode and avoid the reprogramming of the local cpu timer. This still leaves the possibility that a CPU which is not handling the broadcast interrupt is going to reach idle again before the IPI arrives. This can't be solved in the core code and will be handled in follow up patches. Reported-by: Jason Liu Signed-off-by: Thomas Gleixner Tested-by: Lorenzo Pieralisi --- kernel/time/tick-broadcast.c | 59 ++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 58 insertions(+), 1 deletion(-) Index: tip/kernel/time/tick-broadcast.c =================================================================== --- tip.orig/kernel/time/tick-broadcast.c +++ tip/kernel/time/tick-broadcast.c @@ -393,6 +393,7 @@ int tick_resume_broadcast(void) static cpumask_var_t tick_broadcast_oneshot_mask; static cpumask_var_t tick_broadcast_pending_mask; +static cpumask_var_t tick_broadcast_force_mask; /* * Exposed for debugging: see timer_list.c @@ -462,6 +463,10 @@ again: } } + /* Take care of enforced broadcast requests */ + cpumask_or(tmpmask, tmpmask, tick_broadcast_force_mask); + cpumask_clear(tick_broadcast_force_mask); + /* * Wakeup the cpus which have an expired event. */ @@ -497,6 +502,7 @@ void tick_broadcast_oneshot_control(unsi struct clock_event_device *bc, *dev; struct tick_device *td; unsigned long flags; + ktime_t now; int cpu; /* @@ -524,7 +530,16 @@ void tick_broadcast_oneshot_control(unsi WARN_ON_ONCE(cpumask_test_cpu(cpu, tick_broadcast_pending_mask)); if (!cpumask_test_and_set_cpu(cpu, tick_broadcast_oneshot_mask)) { clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN); - if (dev->next_event.tv64 < bc->next_event.tv64) + /* + * We only reprogram the broadcast timer if we + * did not mark ourself in the force mask and + * if the cpu local event is earlier than the + * broadcast event. If the current CPU is in + * the force mask, then we are going to be + * woken by the IPI right away. + */ + if (!cpumask_test_cpu(cpu, tick_broadcast_force_mask) && + dev->next_event.tv64 < bc->next_event.tv64) tick_broadcast_set_event(dev->next_event, 1); } } else { @@ -545,6 +560,47 @@ void tick_broadcast_oneshot_control(unsi tick_broadcast_pending_mask)) goto out; + /* + * If the pending bit is not set, then we are + * either the CPU handling the broadcast + * interrupt or we got woken by something else. + * + * We are not longer in the broadcast mask, so + * if the cpu local expiry time is already + * reached, we would reprogram the cpu local + * timer with an already expired event. + * + * This can lead to a ping-pong when we return + * to idle and therefor rearm the broadcast + * timer before the cpu local timer was able + * to fire. This happens because the forced + * reprogramming makes sure that the event + * will happen in the future and depending on + * the min_delta setting this might be far + * enough out that the ping-pong starts. + * + * If the cpu local next_event has expired + * then we know that the broadcast timer + * next_event has expired as well and + * broadcast is about to be handled. So we + * avoid reprogramming and enforce that the + * broadcast handler, which did not run yet, + * will invoke the cpu local handler. + * + * We cannot call the handler directly from + * here, because we might be in a NOHZ phase + * and we did not go through the irq_enter() + * nohz fixups. + */ + now = ktime_get(); + if (dev->next_event.tv64 <= now.tv64) { + cpumask_set_cpu(cpu, tick_broadcast_force_mask); + goto out; + } + /* + * We got woken by something else. Reprogram + * the cpu local timer device. + */ tick_program_event(dev->next_event, 1); } } @@ -686,5 +742,6 @@ void tick_broadcast_init(void) #ifdef CONFIG_TICK_ONESHOT alloc_cpumask_var(&tick_broadcast_oneshot_mask, GFP_NOWAIT); alloc_cpumask_var(&tick_broadcast_pending_mask, GFP_NOWAIT); + alloc_cpumask_var(&tick_broadcast_force_mask, GFP_NOWAIT); #endif }