From: Arun Siluvery
To: intel-gfx@lists.freedesktop.org
Cc: Ian Lister, Tomas Elf
Date: Wed, 13 Jan 2016 17:28:24 +0000
Subject: [Intel-gfx] [PATCH 12/20] drm/i915: Debugfs interface for per-engine hang recovery.

From: Tomas Elf

1. The i915_wedged_set() function now allows for both legacy full GPU reset
   and per-engine reset of one or more engines at a time:

   a) Legacy full GPU hang recovery by passing 0.

   b) Multiple engine hang recovery by passing in an engine flag mask where
      bit 0 corresponds to engine 0 = RCS, bit 1 to engine 1 = VCS, bit 2 to
      BCS, bit 3 to VECS and bit 4 to VCS2. This allows any combination of
      engine hang recoveries to be tested. For example, passing in the value
      0x3 schedules hang recovery for engines 0 and 1 (RCS and VCS) at the
      same time.

2. The i915_hangcheck_info() function is extended with statistics for:

   a) Number of engine hangs detected by the periodic hang checker.
   b) Number of watchdog timeout hangs detected.
   c) Number of full GPU resets carried out.
   d) Number of engine resets carried out.
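The mask encoding described in item 1 can be driven from a userspace test harness. The sketch below is a minimal illustration under stated assumptions, not part of the patch: `wedged_engine_flag()` and `write_wedged()` are hypothetical helpers, the bit order is taken from the comment added to i915_wedged_set(), and the debugfs path passed in (for example /sys/kernel/debug/dri/0/i915_wedged) is an assumption that depends on the DRM minor number, on debugfs being mounted, and on running as root. The value is written in decimal.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: map an engine name to its bit in the i915_wedged
 * mask, using the bit order documented in i915_wedged_set()
 * (bit 0 = RCS ... bit 4 = VCS2). Returns 0 for an unknown name.
 */
static uint64_t wedged_engine_flag(const char *name)
{
	static const char *const names[] = { "RCS", "VCS", "BCS", "VECS", "VCS2" };
	size_t i;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (strcmp(name, names[i]) == 0)
			return UINT64_C(1) << i;
	return 0;
}

/*
 * Hypothetical helper: write a reset mask to the i915_wedged debugfs file.
 * A mask of 0 requests the legacy full GPU reset.
 * Returns 0 on success, -1 on failure (missing file, no permission, or the
 * kernel rejecting the write, e.g. with EAGAIN while a reset is pending).
 */
static int write_wedged(const char *path, uint64_t mask)
{
	FILE *f;
	int ret = 0;

	assert(path != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fprintf(f, "%llu\n", (unsigned long long)mask) < 0)
		ret = -1;
	/* stdio buffers the write; a kernel-side error surfaces on flush */
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}
```

For example, `write_wedged("/sys/kernel/debug/dri/0/i915_wedged", wedged_engine_flag("RCS") | wedged_engine_flag("VCS"))` would request simultaneous RCS and VCS recovery (mask 0x3). Because the patch makes i915_wedged_set() return -EAGAIN while a reset is already pending, a harness should treat a failed write as retryable rather than fatal.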
Signed-off-by: Tomas Elf
Signed-off-by: Arun Siluvery
Signed-off-by: Ian Lister
Cc: Chris Wilson
Cc: Mika Kuoppala
---
 drivers/gpu/drm/i915/i915_debugfs.c | 75 +++++++++++++++++++++++++++++++++++--
 1 file changed, 71 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index dabddda..62c9a41 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1357,6 +1357,8 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 	} else
 		seq_printf(m, "Hangcheck inactive\n");
 
+	seq_printf(m, "Full GPU resets = %u\n", i915_reset_count(&dev_priv->gpu_error));
+
 	for_each_ring(ring, dev_priv, i) {
 		seq_printf(m, "%s:\n", ring->name);
 		seq_printf(m, "\tseqno = %x [current %x]\n",
@@ -1368,6 +1370,12 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 			   (long long)ring->hangcheck.max_acthd);
 		seq_printf(m, "\tscore = %d\n", ring->hangcheck.score);
 		seq_printf(m, "\taction = %d\n", ring->hangcheck.action);
+		seq_printf(m, "\tengine resets = %u\n",
+			   ring->hangcheck.reset_count);
+		seq_printf(m, "\tengine hang detections = %u\n",
+			   ring->hangcheck.tdr_count);
+		seq_printf(m, "\tengine watchdog timeout detections = %u\n",
+			   ring->hangcheck.watchdog_count);
 
 		if (ring->id == RCS) {
 			seq_puts(m, "\tinstdone read =");
@@ -4701,11 +4709,48 @@ i915_wedged_get(void *data, u64 *val)
 	return 0;
 }
 
+static const char *ringid_to_str(enum intel_ring_id ring_id)
+{
+	switch (ring_id) {
+	case RCS:
+		return "RCS";
+	case VCS:
+		return "VCS";
+	case BCS:
+		return "BCS";
+	case VECS:
+		return "VECS";
+	case VCS2:
+		return "VCS2";
+	}
+
+	return "unknown";
+}
+
 static int
 i915_wedged_set(void *data, u64 val)
 {
 	struct drm_device *dev = data;
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_engine_cs *engine;
+	u32 i;
+#define ENGINE_MSGLEN 64
+	char msg[ENGINE_MSGLEN];
+
+	/*
+	 * Val contains the engine flag mask of engines to be reset.
+	 *
+	 * * Full GPU reset is caused by passing val == 0x0
+	 *
+	 * * Any combination of engine hangs is caused by setting up val as a
+	 *   mask with the following bits set for each engine to be hung:
+	 *
+	 *   Bit 0: RCS engine
+	 *   Bit 1: VCS engine
+	 *   Bit 2: BCS engine
+	 *   Bit 3: VECS engine
+	 *   Bit 4: VCS2 engine (if available)
+	 */
 
 	/*
 	 * There is no safeguard against this debugfs entry colliding
@@ -4714,14 +4759,36 @@ i915_wedged_set(void *data, u64 val)
 	 * test harness is responsible enough not to inject gpu hangs
 	 * while it is writing to 'i915_wedged'
 	 */
-
-	if (i915_reset_in_progress(&dev_priv->gpu_error))
+	if (i915_gem_check_wedge(dev_priv, NULL, true))
 		return -EAGAIN;
 
 	intel_runtime_pm_get(dev_priv);
 
-	i915_handle_error(dev, 0x0, false, val,
-			  "Manually setting wedged to %llu", val);
+	memset(msg, 0, sizeof(msg));
+
+	if (val) {
+		scnprintf(msg, sizeof(msg), "Manual reset:");
+
+		/* Assemble message string */
+		for_each_ring(engine, dev_priv, i)
+			if (intel_ring_flag(engine) & val) {
+				DRM_INFO("Manual reset: %s\n", engine->name);
+
+				scnprintf(msg, sizeof(msg),
+					  "%s [%s]",
+					  msg,
+					  ringid_to_str(i));
+			}
+
+	} else {
+		scnprintf(msg, sizeof(msg), "Manual global reset");
+	}
+
+	i915_handle_error(dev,
+			  val,
+			  false,
+			  true,
+			  msg);
 
 	intel_runtime_pm_put(dev_priv);
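In the patch, the engine list is built by calling scnprintf() with msg as both the destination and a %s argument. That relies on the formatter reading the argument before overwriting it: the C standard makes overlapping source and destination undefined for snprintf(), and the pattern is best not relied upon with the kernel's scnprintf() either. A common alternative is to track the current length and append at msg + len. The userspace sketch below shows that pattern; `scnprintf_like()`, `build_reset_msg()`, `engine_names` and `NUM_ENGINES` are hypothetical names, and unlike the patch's for_each_ring() loop it assumes all five engines are present.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: assumes all five engines exist (the patch skips absent ones). */
#define NUM_ENGINES 5

/* Indexed by mask bit, matching the bit order in i915_wedged_set(). */
static const char *const engine_names[NUM_ENGINES] = {
	"RCS", "VCS", "BCS", "VECS", "VCS2"
};

/*
 * Hypothetical userspace stand-in for the kernel's scnprintf(): returns the
 * number of characters actually stored in buf, excluding the trailing NUL,
 * rather than the would-be length that vsnprintf() reports.
 */
static size_t scnprintf_like(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	if (size == 0)
		return 0;
	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);
	if (n < 0) {
		buf[0] = '\0';
		return 0;
	}
	return (size_t)n < size ? (size_t)n : size - 1;
}

/*
 * Hypothetical: build the same text as the patch, but append each engine at
 * msg + len so the buffer is never both destination and source.
 * Returns the length of the (possibly truncated) message.
 */
static size_t build_reset_msg(char *msg, size_t size, uint64_t val)
{
	size_t len;
	unsigned int i;

	if (size == 0)
		return 0;
	assert(msg != NULL);
	memset(msg, 0, size);	/* mirrors the patch's memset() */

	if (!val)
		return scnprintf_like(msg, size, "Manual global reset");

	len = scnprintf_like(msg, size, "Manual reset:");
	for (i = 0; i < NUM_ENGINES; i++)
		if (val & (UINT64_C(1) << i))
			len += scnprintf_like(msg + len, size - len,
					      " [%s]", engine_names[i]);
	return len;
}
```

With a 64-byte buffer and val = 0x3 this produces "Manual reset: [RCS] [VCS]", the same text the patch assembles. Because scnprintf_like() never reports more than it stored, len stays below size, so a buffer too small for the full list just yields a truncated, NUL-terminated message.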