From patchwork Sat Sep 22 00:05:37 2012
X-Patchwork-Submitter: "Paul E. McKenney"
X-Patchwork-Id: 1493771
Date: Fri, 21 Sep 2012 17:05:37 -0700
From: "Paul E. McKenney"
To: Paul Walmsley
Cc: "Bruce, Becky", "Paul E. McKenney", "Hilman, Kevin",
 "Shilimkar, Santosh", "Hunter, Jon"
Subject: Re: rcu self-detected stall messages on OMAP3, 4 boards
Message-ID: <20120922000537.GH2454@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20120913011208.GT4257@linux.vnet.ibm.com>
 <20120920000351.GI2455@linux.vnet.ibm.com>
 <20120920220130.GN2449@linux.vnet.ibm.com>
 <20120921212054.GE2454@linux.vnet.ibm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Mailing-List: linux-omap@vger.kernel.org

On Fri, Sep 21, 2012 at 10:41:14PM +0000, Paul Walmsley wrote:
> On Fri, 21 Sep 2012, Paul E. McKenney wrote:
>
> > On Fri, Sep 21, 2012 at 05:47:31PM +0000, Paul Walmsley wrote:
> >
> > > I built an OMAP kernel from Linus' commit
> > > 4651afbbae968772efd6dc4ba461cba9b49bb9d8 ("Merge branch 'for-3.6-fixes' of
> > > git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq").  The config used
> > > was 'omap2plus_defconfig', and enabled CONFIG_CPU_IDLE by hand.  Booted it
> > > on a Pandaboard (OMAP4430ES2) into a very minimal Debian rootfs.
> >
> > Did you have the patch at https://lkml.org/lkml/2012/8/30/290 applied?
>
> No, it's just as described above.
>
> > If not, could you please try it?  (This patch cleared up a similar
> > problem for Becky, also on OMAP.)
>
> Did not seem to help, either with or without CONFIG_CPU_IDLE.

I was hoping!  ;-)

And my init=/bin/sh kernel ran idle for more than an hour without any
RCU CPU stall warnings...

I am wondering if your system somehow figured out how to start a grace
period that had no RCU callbacks waiting for it.  If that happened,
then a CONFIG_NO_HZ=y system could in theory get into a state where all
CPUs are in dyntick-idle mode, so that none of them is doing anything
to force the grace period to complete.

That should be easy to diagnose, anyway.  Please see below, which includes
the earlier diagnostic patch.

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 307caf1..696f189 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -879,6 +879,7 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
 	unsigned long flags;
 	int ndetected = 0;
 	struct rcu_node *rnp = rcu_get_root(rsp);
+	long totqlen = 0;
 
 	/* Only let one CPU complain about others per time interval. */
 
@@ -923,8 +924,11 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
 
 	print_cpu_stall_info_end();
-	printk(KERN_CONT "(detected by %d, t=%ld jiffies)\n",
-	       smp_processor_id(), (long)(jiffies - rsp->gp_start));
+	for_each_possible_cpu(cpu)
+		totqlen += per_cpu_ptr(rsp->rda, cpu)->qlen;
+	pr_cont("(detected by %d, t=%ld jiffies, g=%lu, c=%lu, q=%lu)\n",
+	       smp_processor_id(), (long)(jiffies - rsp->gp_start),
+	       rsp->gpnum, rsp->completed, totqlen);
 	if (ndetected == 0)
 		printk(KERN_ERR "INFO: Stall ended before state dump start\n");
 	else if (!trigger_all_cpu_backtrace())
@@ -939,8 +943,10 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
 
 static void print_cpu_stall(struct rcu_state *rsp)
 {
+	int cpu;
 	unsigned long flags;
 	struct rcu_node *rnp = rcu_get_root(rsp);
+	long totqlen = 0;
 
 	/*
 	 * OK, time to rat on ourselves...
@@ -951,7 +957,10 @@ static void print_cpu_stall(struct rcu_state *rsp)
 	print_cpu_stall_info_begin();
 	print_cpu_stall_info(rsp, smp_processor_id());
 	print_cpu_stall_info_end();
-	printk(KERN_CONT " (t=%lu jiffies)\n", jiffies - rsp->gp_start);
+	for_each_possible_cpu(cpu)
+		totqlen += per_cpu_ptr(rsp->rda, cpu)->qlen;
+	pr_cont(" (t=%lu jiffies g=%lu c=%lu q=%lu)\n",
+		jiffies - rsp->gp_start, rsp->gpnum, rsp->completed, totqlen);
 	if (!trigger_all_cpu_backtrace())
 		dump_stack();
 
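
For reading the new output: the hypothesis above (a grace period started with no RCU callbacks waiting for it) would presumably show up as g= differing from c= (grace period still in progress) together with q=0 (no callbacks queued on any CPU). The following is a minimal userspace sketch for checking captured console lines against that pattern. It is not part of the patch, and the names stall_info, parse_stall_line() and gp_without_callbacks() are made up for illustration. It assumes only the two message formats added by the diff.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Fields appended to the stall messages by the diagnostic patch. */
struct stall_info {
	unsigned long gpnum;     /* g=: rsp->gpnum, most recently started grace period */
	unsigned long completed; /* c=: rsp->completed, most recently completed one */
	unsigned long qlen;      /* q=: callbacks queued, summed over all possible CPUs */
};

/*
 * Extract g=, c= and q= from one console line (hypothetical helper).
 * Both formats in the patch put a space before each field, so search
 * for " g=" etc. to avoid matching inside other words.
 * Returns true only if all three fields were found and parsed.
 */
static bool parse_stall_line(const char *line, struct stall_info *out)
{
	const char *g = strstr(line, " g=");
	const char *c = strstr(line, " c=");
	const char *q = strstr(line, " q=");

	if (!g || !c || !q)
		return false;
	return sscanf(g, " g=%lu", &out->gpnum) == 1 &&
	       sscanf(c, " c=%lu", &out->completed) == 1 &&
	       sscanf(q, " q=%lu", &out->qlen) == 1;
}

/*
 * The condition suspected in the message: a grace period is in progress
 * (gpnum != completed) although no CPU has any callbacks queued.
 */
static bool gp_without_callbacks(const struct stall_info *si)
{
	return si->gpnum != si->completed && si->qlen == 0;
}
```

Feeding each stall line from the boot log to parse_stall_line() and then to gp_without_callbacks() should be enough to tell whether the stalled grace period had anything waiting on it. A line from an unpatched kernel has no g=/c=/q= fields, so parse_stall_line() returns false for it.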