From patchwork Tue Oct 21 10:48:13 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dheeraj Jamwal X-Patchwork-Id: 5115921 Return-Path: X-Original-To: patchwork-ltsi-dev@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id D8C4BC11AC for ; Tue, 21 Oct 2014 11:22:30 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 750C42010F for ; Tue, 21 Oct 2014 11:22:29 +0000 (UTC) Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id EFB0D200CA for ; Tue, 21 Oct 2014 11:22:27 +0000 (UTC) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 4D26EDA3; Tue, 21 Oct 2014 11:03:12 +0000 (UTC) X-Original-To: ltsi-dev@lists.linuxfoundation.org Delivered-To: ltsi-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 5E103B2D for ; Tue, 21 Oct 2014 11:03:11 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by smtp1.linuxfoundation.org (Postfix) with ESMTP id 320FB1FAB2 for ; Tue, 21 Oct 2014 11:03:10 +0000 (UTC) Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga102.fm.intel.com with ESMTP; 21 Oct 2014 04:03:03 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.04,761,1406617200"; d="scan'208";a="617794539" Received: from ubuntu-desktop.png.intel.com ([10.221.122.25]) by fmsmga002.fm.intel.com with ESMTP; 21 Oct 2014 04:03:01 -0700 From: Dheeraj Jamwal To: ltsi-dev@lists.linuxfoundation.org Date: Tue, 21 Oct 2014 18:48:13 +0800 Message-Id: <1413889294-31328-294-git-send-email-dheerajx.s.jamwal@intel.com> X-Mailer: git-send-email 1.7.9.5 In-Reply-To: <1413889294-31328-1-git-send-email-dheerajx.s.jamwal@intel.com> References: <1413889294-31328-1-git-send-email-dheerajx.s.jamwal@intel.com> X-Spam-Status: No, score=-5.6 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org Subject: [LTSI-dev] [PATCH 0293/1094] drm/i915: Add reason for capture in error state X-BeenThere: ltsi-dev@lists.linuxfoundation.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: "A list to discuss patches, development, and other things related to the LTSI project" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ltsi-dev-bounces@lists.linuxfoundation.org Errors-To: ltsi-dev-bounces@lists.linuxfoundation.org X-Virus-Scanned: ClamAV using ClamSMTP From: Mika Kuoppala We capture error state not only when the GPU hangs but also on other situations as in interrupt errors and in situations where we can kick things forward without GPU reset. There will be log entry on most of these cases. But as error state capture might be only thing we have, if dmesg was not captured. Or as in GEN4 case, interrupt error can trigger error state capture without log entry, the exact reason why capture was made is hard to decipher. v2: Split out the the error code stuff to separate patch (Ben) References: https://bugs.freedesktop.org/show_bug.cgi?id=74193 Signed-off-by: Mika Kuoppala Reviewed-by: Chris Wilson Signed-off-by: Daniel Vetter (cherry picked from commit 581744626d89930a03f307558182896a4f51173c) Signed-off-by: Dheeraj Jamwal --- drivers/gpu/drm/i915/i915_debugfs.c | 5 ++-- drivers/gpu/drm/i915/i915_drv.h | 7 +++-- drivers/gpu/drm/i915/i915_gpu_error.c | 27 ++++++++++++++----- drivers/gpu/drm/i915/i915_irq.c | 46 +++++++++++++++++++++------------ 4 files changed, 58 insertions(+), 27 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index aef779b..43f3074 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -3190,9 +3190,8 @@ i915_wedged_set(void *data, u64 val) { struct drm_device *dev = data; - DRM_INFO("Manually setting wedged to %llu\n", val); - i915_handle_error(dev, val); - + i915_handle_error(dev, val, + "Manually setting wedged to %llu", val); return 0; } diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index ae5bc91..e444a8d 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2016,7 +2016,9 @@ extern void intel_console_resume(struct work_struct *work); /* i915_irq.c */ void i915_queue_hangcheck(struct drm_device *dev); -void i915_handle_error(struct drm_device *dev, bool wedged); +__printf(3, 4) +void i915_handle_error(struct drm_device *dev, bool wedged, + const char *fmt, ...); void gen6_set_pm_mask(struct drm_i915_private *dev_priv, u32 pm_iir, int new_delay); @@ -2457,7 +2459,8 @@ static inline void i915_error_state_buf_release( { kfree(eb->buf); } -void i915_capture_error_state(struct drm_device *dev); +void i915_capture_error_state(struct drm_device *dev, bool wedge, + const char *error_msg); void i915_error_state_get(struct drm_device *dev, struct i915_error_state_file_priv *error_priv); void i915_error_state_put(struct i915_error_state_file_priv *error_priv); diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index 4879a2a..debbc7d 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -1097,16 +1097,30 @@ static void i915_capture_reg_state(struct drm_i915_private *dev_priv, } static void i915_error_capture_msg(struct drm_device *dev, - struct drm_i915_error_state *error) + struct drm_i915_error_state *error, + bool wedged, + const char *error_msg) { struct drm_i915_private *dev_priv = dev->dev_private; u32 ecode; - int ring_id = -1; + int ring_id = -1, len; ecode = i915_error_generate_code(dev_priv, error, &ring_id); - scnprintf(error->error_msg, sizeof(error->error_msg), - "GPU HANG: ecode %d:0x%08x", ring_id, ecode); + len = scnprintf(error->error_msg, sizeof(error->error_msg), + "GPU HANG: ecode %d:0x%08x", ring_id, ecode); + + if (ring_id != -1 && error->ring[ring_id].pid != -1) + len += scnprintf(error->error_msg + len, + sizeof(error->error_msg) - len, + ", in %s [%d]", + error->ring[ring_id].comm, + error->ring[ring_id].pid); + + scnprintf(error->error_msg + len, sizeof(error->error_msg) - len, + ", reason: %s, action: %s", + error_msg, + wedged ? "reset" : "continue"); } /** @@ -1118,7 +1132,8 @@ static void i915_error_capture_msg(struct drm_device *dev, * out a structure which becomes available in debugfs for user level tools * to pick up. */ -void i915_capture_error_state(struct drm_device *dev) +void i915_capture_error_state(struct drm_device *dev, bool wedged, + const char *error_msg) { static bool warned; struct drm_i915_private *dev_priv = dev->dev_private; @@ -1144,7 +1159,7 @@ void i915_capture_error_state(struct drm_device *dev) error->overlay = intel_overlay_capture_error_state(dev); error->display = intel_display_capture_error_state(dev); - i915_error_capture_msg(dev, error); + i915_error_capture_msg(dev, error, wedged, error_msg); DRM_INFO("%s\n", error->error_msg); spin_lock_irqsave(&dev_priv->gpu_error.lock, flags); diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index ee012c4..5f92921 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -1309,8 +1309,8 @@ static void snb_gt_irq_handler(struct drm_device *dev, if (gt_iir & (GT_BLT_CS_ERROR_INTERRUPT | GT_BSD_CS_ERROR_INTERRUPT | GT_RENDER_CS_MASTER_ERROR_INTERRUPT)) { - DRM_ERROR("GT error interrupt 0x%08x\n", gt_iir); - i915_handle_error(dev, false); + i915_handle_error(dev, false, "GT error interrupt 0x%08x", + gt_iir); } if (gt_iir & GT_PARITY_ERROR(dev)) @@ -1557,8 +1557,9 @@ static void gen6_rps_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir) notify_ring(dev_priv->dev, &dev_priv->ring[VECS]); if (pm_iir & PM_VEBOX_CS_ERROR_INTERRUPT) { - DRM_ERROR("VEBOX CS error interrupt 0x%08x\n", pm_iir); - i915_handle_error(dev_priv->dev, false); + i915_handle_error(dev_priv->dev, false, + "VEBOX CS error interrupt 0x%08x", + pm_iir); } } } @@ -2290,11 +2291,18 @@ static void i915_report_and_clear_eir(struct drm_device *dev) * so userspace knows something bad happened (should trigger collection * of a ring dump etc.). */ -void i915_handle_error(struct drm_device *dev, bool wedged) +void i915_handle_error(struct drm_device *dev, bool wedged, + const char *fmt, ...) { struct drm_i915_private *dev_priv = dev->dev_private; + va_list args; + char error_msg[80]; - i915_capture_error_state(dev); + va_start(args, fmt); + vscnprintf(error_msg, sizeof(error_msg), fmt, args); + va_end(args); + + i915_capture_error_state(dev, wedged, error_msg); i915_report_and_clear_eir(dev); if (wedged) { @@ -2597,9 +2605,9 @@ ring_stuck(struct intel_ring_buffer *ring, u32 acthd) */ tmp = I915_READ_CTL(ring); if (tmp & RING_WAIT) { - DRM_ERROR("Kicking stuck wait on %s\n", - ring->name); - i915_handle_error(dev, false); + i915_handle_error(dev, false, + "Kicking stuck wait on %s", + ring->name); I915_WRITE_CTL(ring, tmp); return HANGCHECK_KICK; } @@ -2609,9 +2617,9 @@ ring_stuck(struct intel_ring_buffer *ring, u32 acthd) default: return HANGCHECK_HUNG; case 1: - DRM_ERROR("Kicking stuck semaphore on %s\n", - ring->name); - i915_handle_error(dev, false); + i915_handle_error(dev, false, + "Kicking stuck semaphore on %s", + ring->name); I915_WRITE_CTL(ring, tmp); return HANGCHECK_KICK; case 0: @@ -2733,7 +2741,7 @@ static void i915_hangcheck_elapsed(unsigned long data) } if (rings_hung) - return i915_handle_error(dev, true); + return i915_handle_error(dev, true, "Ring hung"); if (busy_count) /* Reset timer case chip hangs without another request @@ -3348,7 +3356,9 @@ static irqreturn_t i8xx_irq_handler(int irq, void *arg) */ spin_lock_irqsave(&dev_priv->irq_lock, irqflags); if (iir & I915_RENDER_COMMAND_PARSER_ERROR_INTERRUPT) - i915_handle_error(dev, false); + i915_handle_error(dev, false, + "Command parser error, iir 0x%08x", + iir); for_each_pipe(pipe) { int reg = PIPESTAT(pipe); @@ -3530,7 +3540,9 @@ static irqreturn_t i915_irq_handler(int irq, void *arg) */ spin_lock_irqsave(&dev_priv->irq_lock, irqflags); if (iir & I915_RENDER_COMMAND_PARSER_ERROR_INTERRUPT) - i915_handle_error(dev, false); + i915_handle_error(dev, false, + "Command parser error, iir 0x%08x", + iir); for_each_pipe(pipe) { int reg = PIPESTAT(pipe); @@ -3767,7 +3779,9 @@ static irqreturn_t i965_irq_handler(int irq, void *arg) */ spin_lock_irqsave(&dev_priv->irq_lock, irqflags); if (iir & I915_RENDER_COMMAND_PARSER_ERROR_INTERRUPT) - i915_handle_error(dev, false); + i915_handle_error(dev, false, + "Command parser error, iir 0x%08x", + iir); for_each_pipe(pipe) { int reg = PIPESTAT(pipe);