From patchwork Tue Jul 30 20:38:53 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zoran Markovic
X-Patchwork-Id: 2835926
From: Zoran Markovic
To: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org, Benoit Goby, Android Kernel Team, Colin Cross,
 Todd Poynor, San Mehat, John Stultz, Pavel Machek, "Rafael J. Wysocki",
 Len Brown, Greg Kroah-Hartman, Zoran Markovic
Subject: [RFC PATCHv3] drivers: power: Detect device suspend/resume lockup
 and log event in pstore.
Date: Tue, 30 Jul 2013 13:38:53 -0700
Message-Id: <1375216733-6740-1-git-send-email-zoran.markovic@linaro.org>
X-Mailer: git-send-email 1.7.9.5

From: Benoit Goby

Rather than hard-lock the kernel, dump the suspend/resume thread stack
and panic() to capture a message in pstore when a driver takes too long
to suspend/resume. The default suspend/resume watchdog timeout is set to
12 seconds to be longer than the usbhid 10 second timeout, but can be
changed at compile time.

Exclude from the watchdog the time spent waiting for children that are
resumed asynchronously, and time every device, whether or not it is
resumed synchronously.

This patch is targeted at mobile devices where a suspend/resume lockup
could cause a system reboot. Information about the failing device can be
retrieved in a subsequent boot session by mounting pstore and inspecting
the log. The hardware watchdog timer is likely suspended during this
time and cannot be relied upon.
The soft-lockup detector would eventually report that tasks are not
being scheduled, but would provide little context as to why. The patch
hence uses the system timer and assumes it is still active while the
devices are suspended/resumed.

This feature can be enabled/disabled during kernel configuration.

Cc: Android Kernel Team
Cc: Colin Cross
Cc: Todd Poynor
Cc: San Mehat
Cc: Benoit Goby
Cc: John Stultz
Cc: Pavel Machek
Cc: Rafael J. Wysocki
Cc: Len Brown
Cc: Greg Kroah-Hartman
Original-author: San Mehat
Signed-off-by: Benoit Goby
[zoran.markovic@linaro.org: Changed printk(KERN_EMERG,...) to pr_emerg(...),
 tweaked commit message. Minor changes to add compile-time inclusion of
 the feature.]
Signed-off-by: Zoran Markovic
---
v3:
* Added explicit dependency on pstore
* Collapsed recovery options to system panic only
* Logged driver string in panic message

 drivers/base/power/main.c |   70 +++++++++++++++++++++++++++++++++++++++++++++
 kernel/power/Kconfig      |   16 +++++++++++
 2 files changed, 86 insertions(+)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 5a9b656..c19aec0 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -29,6 +29,8 @@
 #include
 #include
 #include
+#include
+
 
 #include "../base.h"
 #include "power.h"
@@ -54,6 +56,12 @@ struct suspend_stats suspend_stats;
 static DEFINE_MUTEX(dpm_list_mtx);
 static pm_message_t pm_transition;
 
+struct dpm_watchdog {
+	struct device		*dev;
+	struct task_struct	*tsk;
+	struct timer_list	timer;
+};
+
 static int async_error;
 
 /**
@@ -384,6 +392,60 @@ static int dpm_run_callback(pm_callback_t cb, struct device *dev,
 	return error;
 }
 
+#ifdef CONFIG_DPM_WD
+/**
+ * dpm_wd_handler - Driver suspend / resume watchdog handler.
+ *
+ * Called when a driver has timed out suspending or resuming.
+ * There's not much we can do here to recover so panic() to
+ * capture a crash-dump in pstore.
+ */
+static void dpm_wd_handler(unsigned long data)
+{
+	struct dpm_watchdog *wd = (void *)data;
+
+	dev_emerg(wd->dev, "**** DPM device timeout ****\n");
+	show_stack(wd->tsk, NULL);
+	panic("%s %s: unrecoverable failure\n",
+		dev_driver_string(wd->dev), dev_name(wd->dev));
+}
+
+/**
+ * dpm_wd_set - Enable pm watchdog for given device.
+ * @wd: Watchdog. Must be allocated on the stack.
+ * @dev: Device to handle.
+ */
+static void dpm_wd_set(struct dpm_watchdog *wd, struct device *dev)
+{
+	struct timer_list *timer = &wd->timer;
+
+	wd->dev = dev;
+	wd->tsk = get_current();
+
+	init_timer_on_stack(timer);
+	/* use same timeout value for both suspend and resume */
+	timer->expires = jiffies + HZ * CONFIG_DPM_WD_TIMEOUT;
+	timer->function = dpm_wd_handler;
+	timer->data = (unsigned long)wd;
+	add_timer(timer);
+}
+
+/**
+ * dpm_wd_clear - Disable suspend/resume watchdog.
+ * @wd: Watchdog to disable.
+ */
+static void dpm_wd_clear(struct dpm_watchdog *wd)
+{
+	struct timer_list *timer = &wd->timer;
+
+	del_timer_sync(timer);
+	destroy_timer_on_stack(timer);
+}
+#else
+#define dpm_wd_set(x, y)
+#define dpm_wd_clear(x)
+#endif
+
 /*------------------------- Resume routines -------------------------*/
 
 /**
@@ -570,6 +632,7 @@ static int device_resume(struct device *dev, pm_message_t state, bool async)
 	pm_callback_t callback = NULL;
 	char *info = NULL;
 	int error = 0;
+	struct dpm_watchdog wd;
 
 	TRACE_DEVICE(dev);
 	TRACE_RESUME(0);
@@ -585,6 +648,7 @@ static int device_resume(struct device *dev, pm_message_t state, bool async)
 	 * a resumed device, even if the device hasn't been completed yet.
 	 */
 	dev->power.is_prepared = false;
+	dpm_wd_set(&wd, dev);
 
 	if (!dev->power.is_suspended)
 		goto Unlock;
@@ -636,6 +700,7 @@ static int device_resume(struct device *dev, pm_message_t state, bool async)
 
  Unlock:
 	device_unlock(dev);
+	dpm_wd_clear(&wd);
 
  Complete:
 	complete_all(&dev->power.completion);
@@ -1053,6 +1118,7 @@ static int __device_suspend(struct device *dev, pm_message_t state, bool async)
 	pm_callback_t callback = NULL;
 	char *info = NULL;
 	int error = 0;
+	struct dpm_watchdog wd;
 
 	dpm_wait_for_children(dev, async);
 
@@ -1076,6 +1142,8 @@ static int __device_suspend(struct device *dev, pm_message_t state, bool async)
 	if (dev->power.syscore)
 		goto Complete;
 
+	dpm_wd_set(&wd, dev);
+
 	device_lock(dev);
 
 	if (dev->pm_domain) {
@@ -1131,6 +1199,8 @@ static int __device_suspend(struct device *dev, pm_message_t state, bool async)
 
 	device_unlock(dev);
 
+	dpm_wd_clear(&wd);
+
  Complete:
 	complete_all(&dev->power.completion);
 	if (error)
diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
index d444c4e..6a6b763 100644
--- a/kernel/power/Kconfig
+++ b/kernel/power/Kconfig
@@ -178,6 +178,22 @@ config PM_SLEEP_DEBUG
 	def_bool y
 	depends on PM_DEBUG && PM_SLEEP
 
+config DPM_WD
+	bool "Device suspend/resume watchdog"
+	depends on PM_DEBUG && PSTORE
+	---help---
+	  Sets up a watchdog timer to capture drivers that are
+	  locked up attempting to suspend/resume a device.
+	  A detected lockup causes system panic with message
+	  captured in pstore device for inspection in subsequent
+	  boot session.
+
+config DPM_WD_TIMEOUT
+	int "Watchdog timeout in seconds"
+	range 1 120
+	default 12
+	depends on DPM_WD
+
 config PM_TRACE
 	bool
 	help
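
Illustration only, not part of the patch: the arm/disarm flow of
dpm_wd_set() and dpm_wd_clear(), and the deadline computed as
jiffies + HZ * CONFIG_DPM_WD_TIMEOUT, can be modelled in userspace with
an explicit tick counter. Every name below (the sim_ prefix, SIM_HZ,
SIM_WD_TIMEOUT) is hypothetical, and SIM_HZ = 100 is an assumed tick
rate. The sketch neither registers a timer nor panics; it only answers
whether the deadline has been reached, using a wrap-safe comparison in
the spirit of the kernel's time_after_eq().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for HZ and CONFIG_DPM_WD_TIMEOUT. */
#define SIM_HZ		100UL	/* assumed tick rate, ticks per second */
#define SIM_WD_TIMEOUT	12UL	/* mirrors the patch's 12 second default */

/* Userspace model of struct dpm_watchdog: a deadline instead of a timer. */
struct sim_watchdog {
	const char	*dev_name;	/* device being suspended/resumed */
	unsigned long	expires;	/* tick at which the watchdog fires */
	bool		armed;
};

/*
 * Wrap-safe "a is at or after b". The unsigned difference is read as a
 * signed value, so the result stays correct when the counter overflows.
 */
bool sim_time_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/* Arm the watchdog: the same timeout serves suspend and resume. */
void sim_wd_set(struct sim_watchdog *wd, const char *dev_name,
		unsigned long now)
{
	assert(wd != NULL);
	wd->dev_name = dev_name;
	wd->expires = now + SIM_HZ * SIM_WD_TIMEOUT;
	wd->armed = true;
}

/* Disarm the watchdog once the callback has returned. */
void sim_wd_clear(struct sim_watchdog *wd)
{
	assert(wd != NULL);
	wd->armed = false;
}

/* True if an armed watchdog has reached its deadline at tick "now". */
bool sim_wd_expired(const struct sim_watchdog *wd, unsigned long now)
{
	return wd->armed && sim_time_after_eq(now, wd->expires);
}
```

With these assumed values, arming at tick 1000 gives a deadline of tick
2200. A callback that returns before then is followed by sim_wd_clear()
and nothing is reported, which corresponds to the dpm_wd_clear() calls
placed after device_unlock() in the patch.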