From patchwork Fri Jul 29 14:02:11 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Felipe Balbi X-Patchwork-Id: 1020782 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by demeter2.kernel.org (8.14.4/8.14.4) with ESMTP id p6TE2LtK009278 for ; Fri, 29 Jul 2011 14:02:21 GMT Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755503Ab1G2OCT (ORCPT ); Fri, 29 Jul 2011 10:02:19 -0400 Received: from na3sys009aog113.obsmtp.com ([74.125.149.209]:60695 "EHLO na3sys009aog113.obsmtp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753774Ab1G2OCS (ORCPT ); Fri, 29 Jul 2011 10:02:18 -0400 Received: from mail-wy0-f179.google.com ([74.125.82.179]) (using TLSv1) by na3sys009aob113.postini.com ([74.125.148.12]) with SMTP ID DSNKTjK9Z8IBwZ5nEOfbSJdnUmHc6zNJhQHD@postini.com; Fri, 29 Jul 2011 07:02:17 PDT Received: by mail-wy0-f179.google.com with SMTP id 21so219457wyh.10 for ; Fri, 29 Jul 2011 07:02:15 -0700 (PDT) Received: by 10.204.30.202 with SMTP id v10mr420975bkc.268.1311948133995; Fri, 29 Jul 2011 07:02:13 -0700 (PDT) Received: from localhost (cs181221225.pp.htv.fi [82.181.221.225]) by mx.google.com with ESMTPS id t2sm591558bku.0.2011.07.29.07.02.12 (version=TLSv1/SSLv3 cipher=OTHER); Fri, 29 Jul 2011 07:02:13 -0700 (PDT) Date: Fri, 29 Jul 2011 17:02:11 +0300 From: Felipe Balbi To: Govindraj Cc: balbi@ti.com, "Govindraj.R" , linux-omap@vger.kernel.org, linux-serial@vger.kernel.org, linux-pm@lists.linux-foundation.org, "Rafael J. Wysocki" , Paul Walmsley , Kevin Hilman , Vishwanath Sripathy , Partha Basak , Santosh Shilimkar , Rajendra Nayak , Benoit Cousson , Tero Kristo Subject: Re: [RFC v2]: Issues implementing clock handling mechanism within UART driver Message-ID: <20110729140210.GT31013@legolas.emea.dhcp.ti.com> Reply-To: balbi@ti.com References: <1311845395-31917-1-git-send-email-govindraj.raja@ti.com> <20110729095512.GE31013@legolas.emea.dhcp.ti.com> <20110729113713.GK31013@legolas.emea.dhcp.ti.com> <20110729121907.GM31013@legolas.emea.dhcp.ti.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-omap-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-omap@vger.kernel.org X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by milter-greylist-4.2.6 (demeter2.kernel.org [140.211.167.43]); Fri, 29 Jul 2011 14:02:21 +0000 (UTC) Hi, On Fri, Jul 29, 2011 at 06:28:32PM +0530, Govindraj wrote: > On Fri, Jul 29, 2011 at 5:49 PM, Felipe Balbi wrote: > > On Fri, Jul 29, 2011 at 05:29:12PM +0530, Govindraj wrote: > >> Yes fine, But there are scenarios even before first runtime_suspend happens, > >> > >> ex: uart_port_configure -> get_sync -> pm_generic_runtime_resume > >> (omap_device_enable in this case) debug printk -> console_write -> get_sync. > >> > >> there are numerous such scenarios where we end from runtime context > >> to runtime api context again, or jumping from one uart port operation > >> to uart print operation. > > > > calling pm_runtime_get_sync() should not be a problem. It should only > > increase the usage counters... This sounds like a race condition on the > > driver, no ? > > Actually when we call a API to enable clocks we except the internals of API > to just enable clocks and return. > > *Clock enable API should not cause or trigger to do a _device_driver_operation_ > even before enabling clocks of the device-driver which called it* > > for uart context get_sync can land me to uart driver back > even before enabling the uart clocks due to printks. only if _you_ have prints or _your_ runtime_*() calls, no ? Let's say omap_hwmod.c wants to do a print: -> printk() -> pm_runtime_get_sync -> console_write -> pm_runtim_put now, if you have a printk() on your runtime_resume() before you enable the clocks, then I can see why you would deadlock: -> pm_runtime_get_sync -> omap_serial_runtime_resume -> printk -> pm_runtime_get_sync -> omap_serial_runtime_resume -> printk -> pm_runtime_get_sync ..... maybe I'm missing something, but can you add a stack dump on your ->runtime_resume and ->runtime_suspend methods, just so we try to figure out who's to blame here ? > > What you're experiencing, if I understood correctly, is a deadlock ? In > > that case, can you try to track the locking mechanism on the omap-serial > > driver to try to find if there isn't anything obviously wrong ? > > > > Yes deadlocks. due to entering from runtime context to runtime context > or entering from uart_port_operation to uart_console_write ops. > > There are already port locks used extensively within the uart driver > to secure a port operation. > > But cannot secure a port operation while using clock_enable API. > since clock enable API can land the control back to uart_console_write > operation.. but in that case, if clock isn't enabled, why don't you just ignore the print and enable the clock ?? Just return 0 and continue with clk_enable() ?? > >> So either we should not have those prints from pm_generic layers or suppress > >> them(seems pretty much a problem for a clean design within the driver > >> using console_lock/unlock for every get_sync, and for > >> runtime_put we cannot suppress the prints as it gets scheduled later) > >> > >> or if other folks who really need those prints form pm_generic* layers > >> to debug and analysis then we have no other choice rather control > >> the clk_enable/disable from outside driver code in idle path. > > > > yeah, none of these would be nice :-( > > > > I think this needs more debugging to be sure what's exactly going on. > > What's exactly causing the deadlock ? Which lock is held and never > > released ? > > > > I had done some investigations, from scenarios it simply boils down to fact > to handle clock within uart driver, uart driver expects clock enable API* used > to just enable uart clocks but rather not trigger a _uart_ops_ within which > kind of unacceptable from the uart_driver context. ok, now I see what you mean: 113 static int _omap_device_activate(struct omap_device *od, u8 ignore_lat) 114 { 115 struct timespec a, b, c; 116 117 pr_debug("omap_device: %s: activating\n", od->pdev.name); 118 119 while (od->pm_lat_level > 0) { 120 struct omap_device_pm_latency *odpl; 121 unsigned long long act_lat = 0; 122 123 od->pm_lat_level--; 124 125 odpl = od->pm_lats + od->pm_lat_level; 126 127 if (!ignore_lat && 128 (od->dev_wakeup_lat <= od->_dev_wakeup_lat_limit)) 129 break; 130 131 read_persistent_clock(&a); 132 133 /* XXX check return code */ 134 odpl->activate_func(od); 135 136 read_persistent_clock(&b); 137 138 c = timespec_sub(b, a); 139 act_lat = timespec_to_ns(&c); 140 141 pr_debug("omap_device: %s: pm_lat %d: activate: elapsed time " 142 "%llu nsec\n", od->pdev.name, od->pm_lat_level, 143 act_lat); 144 145 if (act_lat > odpl->activate_lat) { 146 odpl->activate_lat_worst = act_lat; 147 if (odpl->flags & OMAP_DEVICE_LATENCY_AUTO_ADJUST) { 148 odpl->activate_lat = act_lat; 149 pr_warning("omap_device: %s.%d: new worst case " 150 "activate latency %d: %llu\n", 151 od->pdev.name, od->pdev.id, 152 od->pm_lat_level, act_lat); 153 } else 154 pr_warning("omap_device: %s.%d: activate " 155 "latency %d higher than exptected. " 156 "(%llu > %d)\n", 157 od->pdev.name, od->pdev.id, 158 od->pm_lat_level, act_lat, 159 odpl->activate_lat); 160 } 161 162 od->dev_wakeup_lat -= odpl->activate_lat; 163 } 164 165 return 0; 166 } When that first pr_debug() triggers, UART's hwmod could be disabled, and that would trigger the state I described above where you would keep on calling pm_runtime_get_sync() forever ;-) isn't it enough to patch it like below: either the above or something like: if (pm_runtime_suspended(dev)) return 0; on console_write() ?? diff --git a/arch/arm/plat-omap/omap_device.c b/arch/arm/plat-omap/omap_device.c index b6b4097..560f622 100644 --- a/arch/arm/plat-omap/omap_device.c +++ b/arch/arm/plat-omap/omap_device.c @@ -114,8 +114,6 @@ static int _omap_device_activate(struct omap_device *od, u8 ignore_lat) { struct timespec a, b, c; - pr_debug("omap_device: %s: activating\n", od->pdev.name); - while (od->pm_lat_level > 0) { struct omap_device_pm_latency *odpl; unsigned long long act_lat = 0; @@ -162,6 +160,8 @@ static int _omap_device_activate(struct omap_device *od, u8 ignore_lat) od->dev_wakeup_lat -= odpl->activate_lat; } + pr_debug("omap_device: %s: activated\n", od->pdev.name); + return 0; }