From patchwork Fri Aug 23 19:45:11 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Colin Cross
X-Patchwork-Id: 2849036
From: Colin Cross
To: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Neil Zhang, Joseph Lo,
	linux-tegra@vger.kernel.org, Colin Cross, stable@vger.kernel.org,
	"Rafael J. Wysocki", Daniel Lezcano
Wysocki" , Daniel Lezcano Subject: [PATCH 2/3] cpuidle: coupled: abort idle if pokes are pending Date: Fri, 23 Aug 2013 12:45:11 -0700 Message-Id: <1377287112-12018-2-git-send-email-ccross@android.com> X-Mailer: git-send-email 1.8.3 In-Reply-To: <1377287112-12018-1-git-send-email-ccross@android.com> References: <1377287112-12018-1-git-send-email-ccross@android.com> Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org X-Spam-Status: No, score=-9.7 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Joseph Lo reported a lockup on Tegra3 caused by a race condition in coupled cpuidle. When two or more cpus enter idle at the same time, the first cpus to arrive may go to the ready loop without processing pending pokes from the last cpu to arrive. This patch adds a check for pending pokes once all cpus have been synchronized in the ready loop and resets the coupled state and retries if any cpus failed to handle their pending poke. Retrying on all cpus may trigger the same issue again, so this patch also adds a check to ensure that each cpu has received at least one poke between when it enters the waiting loop and when it moves on to the ready loop. Reported-by: Joseph Lo CC: stable@vger.kernel.org Signed-off-by: Colin Cross Tested-by: Joseph Lo --- drivers/cpuidle/coupled.c | 107 +++++++++++++++++++++++++++++++++++----------- 1 file changed, 82 insertions(+), 25 deletions(-) diff --git a/drivers/cpuidle/coupled.c b/drivers/cpuidle/coupled.c index db92bcb..bbc4ba5 100644 --- a/drivers/cpuidle/coupled.c +++ b/drivers/cpuidle/coupled.c @@ -106,6 +106,7 @@ struct cpuidle_coupled { cpumask_t coupled_cpus; int requested_state[NR_CPUS]; atomic_t ready_waiting_counts; + atomic_t abort_barrier; int online_count; int refcnt; int prevent; @@ -122,12 +123,19 @@ static DEFINE_MUTEX(cpuidle_coupled_lock); static DEFINE_PER_CPU(struct call_single_data, cpuidle_coupled_poke_cb); /* - * The cpuidle_coupled_poked_mask mask is used to avoid calling + * The cpuidle_coupled_poke_pending mask is used to avoid calling * __smp_call_function_single with the per cpu call_single_data struct already * in use. This prevents a deadlock where two cpus are waiting for each others * call_single_data struct to be available */ -static cpumask_t cpuidle_coupled_poked_mask; +static cpumask_t cpuidle_coupled_poke_pending; + +/* + * The cpuidle_coupled_poke_pending mask is used to ensure that each cpu has + * been poked once to minimize entering the ready loop with a poke pending, + * which would require aborting and retrying. 
 drivers/cpuidle/coupled.c | 107 +++++++++++++++++++++++++++++++++++----------
 1 file changed, 82 insertions(+), 25 deletions(-)

diff --git a/drivers/cpuidle/coupled.c b/drivers/cpuidle/coupled.c
index db92bcb..bbc4ba5 100644
--- a/drivers/cpuidle/coupled.c
+++ b/drivers/cpuidle/coupled.c
@@ -106,6 +106,7 @@ struct cpuidle_coupled {
 	cpumask_t coupled_cpus;
 	int requested_state[NR_CPUS];
 	atomic_t ready_waiting_counts;
+	atomic_t abort_barrier;
 	int online_count;
 	int refcnt;
 	int prevent;
@@ -122,12 +123,19 @@ static DEFINE_MUTEX(cpuidle_coupled_lock);
 static DEFINE_PER_CPU(struct call_single_data, cpuidle_coupled_poke_cb);
 
 /*
- * The cpuidle_coupled_poked_mask mask is used to avoid calling
+ * The cpuidle_coupled_poke_pending mask is used to avoid calling
  * __smp_call_function_single with the per cpu call_single_data struct already
  * in use.  This prevents a deadlock where two cpus are waiting for each others
  * call_single_data struct to be available
  */
-static cpumask_t cpuidle_coupled_poked_mask;
+static cpumask_t cpuidle_coupled_poke_pending;
+
+/*
+ * The cpuidle_coupled_poked mask is used to ensure that each cpu has
+ * been poked once to minimize entering the ready loop with a poke pending,
+ * which would require aborting and retrying.
+ */
+static cpumask_t cpuidle_coupled_poked;
 
 /**
  * cpuidle_coupled_parallel_barrier - synchronize all online coupled cpus
@@ -291,10 +299,11 @@ static inline int cpuidle_coupled_get_state(struct cpuidle_device *dev,
 	return state;
 }
 
-static void cpuidle_coupled_poked(void *info)
+static void cpuidle_coupled_handle_poke(void *info)
 {
 	int cpu = (unsigned long)info;
-	cpumask_clear_cpu(cpu, &cpuidle_coupled_poked_mask);
+	cpumask_set_cpu(cpu, &cpuidle_coupled_poked);
+	cpumask_clear_cpu(cpu, &cpuidle_coupled_poke_pending);
 }
 
 /**
@@ -313,7 +322,7 @@ static void cpuidle_coupled_poke(int cpu)
 {
 	struct call_single_data *csd = &per_cpu(cpuidle_coupled_poke_cb, cpu);
 
-	if (!cpumask_test_and_set_cpu(cpu, &cpuidle_coupled_poked_mask))
+	if (!cpumask_test_and_set_cpu(cpu, &cpuidle_coupled_poke_pending))
 		__smp_call_function_single(cpu, csd, 0);
 }
 
@@ -340,30 +349,19 @@ static void cpuidle_coupled_poke_others(int this_cpu,
  * @coupled: the struct coupled that contains the current cpu
  * @next_state: the index in drv->states of the requested state for this cpu
 *
- * Updates the requested idle state for the specified cpuidle device,
- * poking all coupled cpus out of idle if necessary to let them see the new
- * state.
+ * Updates the requested idle state for the specified cpuidle device.
+ * Returns the number of waiting cpus.
 */
-static void cpuidle_coupled_set_waiting(int cpu,
+static int cpuidle_coupled_set_waiting(int cpu,
 		struct cpuidle_coupled *coupled, int next_state)
 {
-	int w;
-
 	coupled->requested_state[cpu] = next_state;
 
 	/*
-	 * If this is the last cpu to enter the waiting state, poke
-	 * all the other cpus out of their waiting state so they can
-	 * enter a deeper state.  This can race with one of the cpus
-	 * exiting the waiting state due to an interrupt and
-	 * decrementing waiting_count, see comment below.
-	 *
 	 * The atomic_inc_return provides a write barrier to order the write
 	 * to requested_state with the later write that increments ready_count.
 	 */
-	w = atomic_inc_return(&coupled->ready_waiting_counts) & WAITING_MASK;
-	if (w == coupled->online_count)
-		cpuidle_coupled_poke_others(cpu, coupled);
+	return atomic_inc_return(&coupled->ready_waiting_counts) & WAITING_MASK;
 }
 
 /**
@@ -418,13 +416,24 @@ static void cpuidle_coupled_set_done(int cpu, struct cpuidle_coupled *coupled)
 static int cpuidle_coupled_clear_pokes(int cpu)
 {
 	local_irq_enable();
-	while (cpumask_test_cpu(cpu, &cpuidle_coupled_poked_mask))
+	while (cpumask_test_cpu(cpu, &cpuidle_coupled_poke_pending))
 		cpu_relax();
 	local_irq_disable();
 
 	return need_resched() ? -EINTR : 0;
 }
 
+static bool cpuidle_coupled_any_pokes_pending(struct cpuidle_coupled *coupled)
+{
+	cpumask_t cpus;
+	int ret;
+
+	cpumask_and(&cpus, cpu_online_mask, &coupled->coupled_cpus);
+	ret = cpumask_and(&cpus, &cpuidle_coupled_poke_pending, &cpus);
+
+	return ret;
+}
+
 /**
  * cpuidle_enter_state_coupled - attempt to enter a state with coupled cpus
  * @dev: struct cpuidle_device for the current cpu
@@ -449,6 +458,7 @@ int cpuidle_enter_state_coupled(struct cpuidle_device *dev,
 {
 	int entered_state = -1;
 	struct cpuidle_coupled *coupled = dev->coupled;
+	int w;
 
 	if (!coupled)
 		return -EINVAL;
@@ -466,14 +476,33 @@ int cpuidle_enter_state_coupled(struct cpuidle_device *dev,
 	/* Read barrier ensures online_count is read after prevent is cleared */
 	smp_rmb();
 
-	cpuidle_coupled_set_waiting(dev->cpu, coupled, next_state);
+reset:
+	cpumask_clear_cpu(dev->cpu, &cpuidle_coupled_poked);
+
+	w = cpuidle_coupled_set_waiting(dev->cpu, coupled, next_state);
+	/*
+	 * If this is the last cpu to enter the waiting state, poke
+	 * all the other cpus out of their waiting state so they can
+	 * enter a deeper state.  This can race with one of the cpus
+	 * exiting the waiting state due to an interrupt and
+	 * decrementing waiting_count, see comment below.
+	 */
+	if (w == coupled->online_count) {
+		cpumask_set_cpu(dev->cpu, &cpuidle_coupled_poked);
+		cpuidle_coupled_poke_others(dev->cpu, coupled);
+	}
 
 retry:
 	/*
 	 * Wait for all coupled cpus to be idle, using the deepest state
-	 * allowed for a single cpu.
+	 * allowed for a single cpu.  If this was not the poking cpu, wait
+	 * for at least one poke before leaving to avoid a race where
+	 * two cpus could arrive at the waiting loop at the same time,
+	 * but the first of the two to arrive could skip the loop without
+	 * processing the pokes from the last to arrive.
 	 */
-	while (!cpuidle_coupled_cpus_waiting(coupled)) {
+	while (!cpuidle_coupled_cpus_waiting(coupled) ||
+			!cpumask_test_cpu(dev->cpu, &cpuidle_coupled_poked)) {
 		if (cpuidle_coupled_clear_pokes(dev->cpu)) {
 			cpuidle_coupled_set_not_waiting(dev->cpu, coupled);
 			goto out;
@@ -495,6 +524,12 @@ retry:
 	}
 
 	/*
+	 * Make sure final poke status for this cpu is visible before setting
+	 * cpu as ready.
+	 */
+	smp_wmb();
+
+	/*
 	 * All coupled cpus are probably idle.  There is a small chance that
 	 * one of the other cpus just became active.  Increment the ready count,
 	 * and spin until all coupled cpus have incremented the counter. Once a
@@ -513,6 +548,28 @@ retry:
 		cpu_relax();
 	}
 
+	/*
+	 * Make sure read of all cpus ready is done before reading pending pokes
+	 */
+	smp_rmb();
+
+	/*
+	 * There is a small chance that a cpu left and reentered idle after this
+	 * cpu saw that all cpus were waiting.  The cpu that reentered idle will
+	 * have sent this cpu a poke, which will still be pending after the
+	 * ready loop.  The pending interrupt may be lost by the interrupt
+	 * controller when entering the deep idle state.  It's not possible to
+	 * clear a pending interrupt without turning interrupts on and handling
+	 * it, and it's too late to turn on interrupts here, so reset the
+	 * coupled idle state of all cpus and retry.
+	 */
+	if (cpuidle_coupled_any_pokes_pending(coupled)) {
+		cpuidle_coupled_set_done(dev->cpu, coupled);
+		/* Wait for all cpus to see the pending pokes */
+		cpuidle_coupled_parallel_barrier(dev, &coupled->abort_barrier);
+		goto reset;
+	}
+
 	/* all cpus have acked the coupled state */
 	next_state = cpuidle_coupled_get_state(dev, coupled);
 
@@ -598,7 +655,7 @@ have_coupled:
 	coupled->refcnt++;
 
 	csd = &per_cpu(cpuidle_coupled_poke_cb, dev->cpu);
-	csd->func = cpuidle_coupled_poked;
+	csd->func = cpuidle_coupled_handle_poke;
 	csd->info = (void *)(unsigned long)dev->cpu;
 
 	return 0;
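
For reference, the new cpuidle_coupled_any_pokes_pending() test relies on
cpumask_and() returning whether the destination mask ended up non-empty, so
two ANDs answer "does any online coupled cpu still have an unhandled poke?".
A standalone sketch of the same logic, with plain unsigned long bitmasks
standing in for cpumask_t and all mask values invented for the demo:

#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for the kernel cpumasks; values are made up for the demo. */
static unsigned long online_mask = 0xful;	/* cpus 0-3 online */
static unsigned long poke_pending;		/* set by poke, cleared by the handler */

static bool any_pokes_pending(unsigned long coupled_cpus)
{
	/* Restrict to online coupled cpus, then test for pending pokes,
	 * mirroring the two cpumask_and() calls in the patch. */
	unsigned long cpus = online_mask & coupled_cpus;

	return (poke_pending & cpus) != 0;
}

int main(void)
{
	unsigned long coupled_cpus = 0x3ul;	/* cpus 0 and 1 are coupled */

	poke_pending = 1ul << 1;		/* cpu 1 has an unhandled poke */
	printf("abort and retry? %s\n", any_pokes_pending(coupled_cpus) ? "yes" : "no");

	poke_pending = 0;			/* handler ran, poke consumed */
	printf("abort and retry? %s\n", any_pokes_pending(coupled_cpus) ? "yes" : "no");

	return 0;
}

When the test fires after the ready loop, the cpus cannot safely handle the
poke with interrupts off, so the patch marks the state done, synchronizes on
abort_barrier, and jumps back to reset rather than entering the deep state
with an IPI still pending.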