From patchwork Wed Oct 22 16:54:33 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul E. McKenney" X-Patchwork-Id: 5135881 Return-Path: X-Original-To: patchwork-linux-pm@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 1C00DC11AC for ; Wed, 22 Oct 2014 16:54:43 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id DA446201F5 for ; Wed, 22 Oct 2014 16:54:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id A8A0220154 for ; Wed, 22 Oct 2014 16:54:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753410AbaJVQyj (ORCPT ); Wed, 22 Oct 2014 12:54:39 -0400 Received: from e37.co.us.ibm.com ([32.97.110.158]:36357 "EHLO e37.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752423AbaJVQyi (ORCPT ); Wed, 22 Oct 2014 12:54:38 -0400 Received: from /spool/local by e37.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Wed, 22 Oct 2014 10:54:37 -0600 Received: from d03dlp03.boulder.ibm.com (9.17.202.179) by e37.co.us.ibm.com (192.168.1.137) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Wed, 22 Oct 2014 10:54:36 -0600 Received: from b03cxnp08025.gho.boulder.ibm.com (b03cxnp08025.gho.boulder.ibm.com [9.17.130.17]) by d03dlp03.boulder.ibm.com (Postfix) with ESMTP id 9799519D803D; Wed, 22 Oct 2014 10:43:18 -0600 (MDT) Received: from d03av06.boulder.ibm.com (d03av06.boulder.ibm.com [9.17.195.245]) by b03cxnp08025.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id s9MGsZwn51380286; Wed, 22 Oct 2014 18:54:35 +0200 Received: from d03av06.boulder.ibm.com (loopback [127.0.0.1]) by d03av06.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id s9MGxHFt013817; Wed, 22 Oct 2014 10:59:18 -0600 Received: from paulmck-ThinkPad-W500 ([9.70.82.148]) by d03av06.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVin) with ESMTP id s9MGxGLO013792; Wed, 22 Oct 2014 10:59:17 -0600 Received: by paulmck-ThinkPad-W500 (Postfix, from userid 1000) id 99A8C3821B8; Wed, 22 Oct 2014 09:54:33 -0700 (PDT) Date: Wed, 22 Oct 2014 09:54:33 -0700 From: "Paul E. McKenney" To: Jiri Kosina Cc: Peter Zijlstra , Ingo Molnar , "Rafael J. Wysocki" , Pavel Machek , Steven Rostedt , Dave Jones , Daniel Lezcano , Nicolas Pitre , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org Subject: Re: lockdep splat in CPU hotplug Message-ID: <20141022165433.GA22874@linux.vnet.ibm.com> Reply-To: paulmck@linux.vnet.ibm.com References: <20141022143837.GW4977@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20141022143837.GW4977@linux.vnet.ibm.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 14102216-0025-0000-0000-00000589CC3A Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org X-Spam-Status: No, score=-8.3 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Wed, Oct 22, 2014 at 07:38:37AM -0700, Paul E. McKenney wrote: > On Wed, Oct 22, 2014 at 11:53:49AM +0200, Jiri Kosina wrote: > > On Tue, 21 Oct 2014, Jiri Kosina wrote: > > > > > Hi, > > > > > > I am seeing the lockdep report below when resuming from suspend-to-disk > > > with current Linus' tree (c2661b80609). > > > > > > The reason for CCing Ingo and Peter is that I can't make any sense of one > > > of the stacktraces lockdep is providing. > > > > > > Please have a look at the very first stacktrace in the dump, where lockdep > > > is trying to explain where cpu_hotplug.lock#2 has been acquired. It seems > > > to imply that cpuidle_pause() is taking cpu_hotplug.lock, but that's not > > > the case at all. > > > > > > What am I missing? > > > > Okay, reverting 442bf3aaf55a ("sched: Let the scheduler see CPU idle > > states") and followup 83a0a96a5f26 ("sched/fair: Leverage the idle state > > info when choosing the "idlest" cpu") which depends on it makes the splat > > go away. > > > > Just for the sake of testing the hypothesis, I did just the minimal change > > below on top of current Linus' tree, and it also makes the splat go away > > (of course it's totally incorrect thing to do by itself alone): > > > > diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c > > index 125150d..d31e04c 100644 > > --- a/drivers/cpuidle/cpuidle.c > > +++ b/drivers/cpuidle/cpuidle.c > > @@ -225,12 +225,6 @@ void cpuidle_uninstall_idle_handler(void) > > initialized = 0; > > wake_up_all_idle_cpus(); > > } > > - > > - /* > > - * Make sure external observers (such as the scheduler) > > - * are done looking at pointed idle states. > > - */ > > - synchronize_rcu(); > > } > > > > /** > > > > So indeed 442bf3aaf55a is guilty. > > > > Paul was stating yesterday that it can't be the try_get_online_cpus() in > > synchronize_sched_expedited(), as it's doing only trylock. There are > > however more places where synchronize_sched_expedited() is acquiring > > cpu_hotplug.lock unconditionally by calling put_online_cpus(), so the race > > seems real. > > Gah! So I only half-eliminated the deadlock between > synchronize_sched_expedited() and CPU hotplug. Back to the drawing > board... Please see below for an untested alleged fix. Thanx, Paul ------------------------------------------------------------------------ --- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/kernel/cpu.c b/kernel/cpu.c index 356450f09c1f..8589b94b005a 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -64,6 +64,8 @@ static struct { * an ongoing cpu hotplug operation. */ int refcount; + /* And allows lockless put_online_cpus(). */ + atomic_t puts_pending; #ifdef CONFIG_DEBUG_LOCK_ALLOC struct lockdep_map dep_map; @@ -113,7 +115,10 @@ void put_online_cpus(void) { if (cpu_hotplug.active_writer == current) return; - mutex_lock(&cpu_hotplug.lock); + if (!mutex_trylock(&cpu_hotplug.lock)) { + atomic_inc(&cpu_hotplug.puts_pending); + return; + } if (WARN_ON(!cpu_hotplug.refcount)) cpu_hotplug.refcount++; /* try to fix things up */ @@ -155,6 +160,12 @@ void cpu_hotplug_begin(void) cpuhp_lock_acquire(); for (;;) { mutex_lock(&cpu_hotplug.lock); + if (atomic_read(&cpu_hotplug.puts_pending)) { + int delta; + + delta = atomic_xchg(&cpu_hotplug.puts_pending, 0); + cpu_hotplug.refcount -= delta; + } if (likely(!cpu_hotplug.refcount)) break; __set_current_state(TASK_UNINTERRUPTIBLE);