From patchwork Tue Mar 1 17:14:39 2016
X-Patchwork-Submitter: arun.siluvery@linux.intel.com
X-Patchwork-Id: 8467581
From: Arun Siluvery
To: intel-gfx@lists.freedesktop.org
Date: Tue, 1 Mar 2016 17:14:39 +0000
Message-Id: <1456852479-9375-3-git-send-email-arun.siluvery@linux.intel.com>
In-Reply-To: <1456852479-9375-1-git-send-email-arun.siluvery@linux.intel.com>
Subject: [Intel-gfx] [PATCH 2/2] drm/i915/guc: Reset GuC and retry on firmware load failure

Due to timing issues in the HW, some of the status bits required for GuC
authentication occasionally do not get set. When that happens, GuC cannot
be initialized and we are left with a wedged GPU. The suggested WA is to
perform a soft reset of GuC and attempt to reload the fw a few times
before giving up.

As the failure is timing dependent, tests that trigger a manual full GPU
reset (i915_wedged) sometimes hit it after several thousand iterations,
and sometimes ran even longer without any issues. The reset and reload
mechanism proved helpful when we did hit a fw load failure, so it is
better to include it to improve driver stability.

This change implements the following WA,
WaEnableuKernelHeaderValidFix:skl,bxt
WaEnableGuCBootHashCheckNotSet:skl,bxt

Cc: Dave Gordon
Cc: Alex Dai
Signed-off-by: Arun Siluvery
---
 drivers/gpu/drm/i915/i915_drv.h         |  1 +
 drivers/gpu/drm/i915/i915_guc_reg.h     |  1 +
 drivers/gpu/drm/i915/i915_reg.h         |  1 +
 drivers/gpu/drm/i915/intel_guc_loader.c | 49 +++++++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/intel_uncore.c     | 17 ++++++++++++
 5 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 55dadfc..3e5a2e5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2742,6 +2742,7 @@ extern long i915_compat_ioctl(struct file *filp, unsigned int cmd,
 #endif
 extern int intel_gpu_reset(struct drm_device *dev);
 extern int intel_engine_reset(struct intel_engine_cs *engine);
+extern int intel_guc_reset(struct drm_i915_private *dev_priv);
 extern bool intel_has_gpu_reset(struct drm_device *dev);
 extern int i915_reset(struct drm_device *dev);
 extern unsigned long i915_chipset_val(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_guc_reg.h b/drivers/gpu/drm/i915/i915_guc_reg.h
index e4ba582..94ceee5 100644
--- a/drivers/gpu/drm/i915/i915_guc_reg.h
+++ b/drivers/gpu/drm/i915/i915_guc_reg.h
@@ -27,6 +27,7 @@
 /* Definitions of GuC H/W registers, bits, etc */
 
 #define GUC_STATUS			_MMIO(0xc000)
+#define   GS_MIA_IN_RESET		  (1 << 0)
 #define   GS_BOOTROM_SHIFT		1
 #define   GS_BOOTROM_MASK		  (0x7F << GS_BOOTROM_SHIFT)
 #define   GS_BOOTROM_RSA_FAILED	  (0x50 << GS_BOOTROM_SHIFT)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index a798e40..4496fc7 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -166,6 +166,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define  GEN6_GRDOM_BLT			(1 << 3)
 #define  GEN6_GRDOM_VECS		(1 << 4)
 #define  GEN8_GRDOM_MEDIA2		(1 << 7)
+#define  GEN9_GRDOM_GUC			(1 << 5)
 
 #define RING_PP_DIR_BASE(ring)		_MMIO((ring)->mmio_base+0x228)
 #define RING_PP_DIR_BASE_READ(ring)	_MMIO((ring)->mmio_base+0x518)
diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c b/drivers/gpu/drm/i915/intel_guc_loader.c
index 82a3c03..f9cb814 100644
--- a/drivers/gpu/drm/i915/intel_guc_loader.c
+++ b/drivers/gpu/drm/i915/intel_guc_loader.c
@@ -353,6 +353,24 @@ static int guc_ucode_xfer(struct drm_i915_private *dev_priv)
 	return ret;
 }
 
+static int i915_reset_guc(struct drm_i915_private *dev_priv)
+{
+	int ret;
+	u32 guc_status;
+
+	ret = intel_guc_reset(dev_priv);
+	if (ret) {
+		DRM_ERROR("GuC reset failed, ret = %d\n", ret);
+		return ret;
+	}
+
+	guc_status = I915_READ(GUC_STATUS);
+	WARN(!(guc_status & GS_MIA_IN_RESET),
+	     "GuC status: 0x%x, MIA core expected to be in reset\n", guc_status);
+
+	return ret;
+}
+
 /**
  * intel_guc_ucode_load() - load GuC uCode into the device
  * @dev:	drm device
@@ -417,9 +435,36 @@ int intel_guc_ucode_load(struct drm_device *dev)
 	if (err)
 		goto fail;
 
+	/*
+	 * WaEnableuKernelHeaderValidFix:skl,bxt
+	 * For BXT, this is only upto B0 but below WA is required for later
+	 * steppings also so this is extended as well.
+	 */
+	/* WaEnableGuCBootHashCheckNotSet:skl,bxt */
 	err = guc_ucode_xfer(dev_priv);
-	if (err)
-		goto fail;
+	if (err) {
+		int retries = 3;
+
+		DRM_ERROR("GuC fw load failed, err=%d, attempting reset and retry\n", err);
+
+		while (retries--) {
+			err = i915_reset_guc(dev_priv);
+			if (err)
+				break;
+
+			err = guc_ucode_xfer(dev_priv);
+			if (!err) {
+				DRM_DEBUG_DRIVER("GuC fw reload succeeded after reset\n");
+				break;
+			}
+			DRM_DEBUG_DRIVER("GuC fw reload retries left: %d\n", retries);
+		}
+
+		if (err) {
+			DRM_ERROR("GuC fw reload attempt failed, ret=%d\n", err);
+			goto fail;
+		}
+	}
 
 	guc_fw->guc_fw_load_status = GUC_FIRMWARE_SUCCESS;
 
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index d003b78..19220b9 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1683,6 +1683,23 @@ int intel_engine_reset(struct intel_engine_cs *engine)
 	return gen8_do_engine_reset(engine);
 }
 
+int intel_guc_reset(struct drm_i915_private *dev_priv)
+{
+	int ret;
+
+	if (!i915.enable_guc_submission)
+		return -EINVAL;
+
+	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+
+	__raw_i915_write32(dev_priv, GEN6_GDRST, GEN9_GRDOM_GUC);
+	ret = wait_for_engine_reset(dev_priv, GEN9_GRDOM_GUC);
+
+	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+
+	return ret;
+}
+
 bool intel_uncore_unclaimed_mmio(struct drm_i915_private *dev_priv)
 {
 	return check_for_unclaimed_mmio(dev_priv);
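
The retry logic in intel_guc_ucode_load() can be exercised outside the kernel with a small userspace model. The sketch below is only an illustration of the control flow, not driver code: struct fake_guc, fake_xfer(), fake_reset() and load_with_retry() are hypothetical stand-ins for the hardware state, guc_ucode_xfer(), i915_reset_guc() and the load path, and the error value is an arbitrary placeholder. It shows that a budget of 3 retries allows at most 4 transfer attempts (the initial one plus one per reset), and that a failed reset ends the loop immediately.

```c
#include <assert.h>

/* Hypothetical model of the GuC; none of these names exist in i915. */
struct fake_guc {
	int failures_left;	/* upcoming transfer attempts that will fail */
	int xfer_calls;		/* how many times a transfer was attempted */
	int reset_calls;	/* how many soft resets were issued */
};

/* Stand-in for guc_ucode_xfer(): fails while failures_left > 0. */
static int fake_xfer(struct fake_guc *g)
{
	g->xfer_calls++;
	if (g->failures_left > 0) {
		g->failures_left--;
		return -1;	/* placeholder error code */
	}
	return 0;
}

/* Stand-in for i915_reset_guc(): always succeeds in this model. */
static int fake_reset(struct fake_guc *g)
{
	g->reset_calls++;
	return 0;
}

/* Same loop shape as the patch: try once, then reset and retry. */
static int load_with_retry(struct fake_guc *g, int retries)
{
	int err;

	assert(retries >= 0);

	err = fake_xfer(g);
	if (err) {
		while (retries--) {
			err = fake_reset(g);
			if (err)
				break;	/* reset failed, give up */

			err = fake_xfer(g);
			if (!err)
				break;	/* reload succeeded */
		}
	}
	return err;
}
```

Under these assumptions, a model that fails twice and is loaded with load_with_retry(&g, 3) succeeds after two resets and three transfer attempts, while a model that keeps failing returns an error after three resets and four transfer attempts.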