From patchwork Sat Mar 25 01:30:05 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michel Thierry
X-Patchwork-Id: 9644369
From: Michel Thierry
To: intel-gfx@lists.freedesktop.org
Date: Fri, 24 Mar 2017 18:30:05 -0700
Message-Id: <20170325013010.36244-14-michel.thierry@intel.com>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20170325013010.36244-1-michel.thierry@intel.com>
References: <20170325013010.36244-1-michel.thierry@intel.com>
Subject: [Intel-gfx] [PATCH v5 13/18] drm/i915/guc: Add support for reset engine using GuC commands

This patch adds per-engine reset and recovery (TDR) support when GuC is
used to submit workloads to the GPU.

When i915 submits directly to the ELSP, the driver manages hang
detection, recovery and resubmission. With GuC submission these tasks
are shared between the driver and GuC. i915 is still responsible for
detecting a hang; when it finds one, it only asks GuC to reset that
engine. GuC internally acquires forcewake and idles the engine before
actually resetting it. Once the reset succeeds, i915 takes over again
and handles resubmission: the i915 scheduler knows which requests are
pending, so after the engine reset it resubmits them.
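The split of responsibilities described above can be modeled as a short trace of steps. This is a hypothetical sketch, not i915 code: `enum step`, `struct trace`, `record()` and `reset_engine_model()` are invented for illustration, and the comments only point at the functions in this patch that each step stands for. The `guc_submission` flag plays the role of the `dev_priv->guc.execbuf_client` check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-engine reset flow; not i915 code. */
enum step {
	STEP_DRIVER_DETECTS_HANG,   /* hangcheck decides to reset one engine */
	STEP_DRIVER_IDLES_ENGINE,   /* intel_request_reset_engine() */
	STEP_DRIVER_RESETS_ENGINE,  /* intel_gpu_reset() */
	STEP_DRIVER_CLEARS_REQUEST, /* intel_unrequest_reset_engine() */
	STEP_GUC_IDLES_AND_RESETS,  /* i915_guc_request_engine_reset() */
	STEP_DRIVER_REINITS_ENGINE, /* i915_gem_restore_fences(), gen8_init_common_ring() */
	STEP_DRIVER_RESUBMITS_ELSP, /* execlists_submit_ports() */
	STEP_DRIVER_RESUBMITS_GUC,  /* i915_guc_submission_reenable_engine() */
};

#define MAX_STEPS 8

struct trace {
	enum step steps[MAX_STEPS];
	int n;
};

/* Append one step to the trace, refusing to overflow the array. */
static void record(struct trace *t, enum step s)
{
	assert(t->n < MAX_STEPS);
	t->steps[t->n++] = s;
}

/* guc_submission stands in for dev_priv->guc.execbuf_client != NULL. */
static void reset_engine_model(bool guc_submission, struct trace *t)
{
	record(t, STEP_DRIVER_DETECTS_HANG);

	if (!guc_submission) {
		/* ELSP submission: the driver idles and resets the engine itself. */
		record(t, STEP_DRIVER_IDLES_ENGINE);
		record(t, STEP_DRIVER_RESETS_ENGINE);
		record(t, STEP_DRIVER_CLEARS_REQUEST);
	} else {
		/* GuC submission: one request; GuC takes forcewake, idles, resets. */
		record(t, STEP_GUC_IDLES_AND_RESETS);
	}

	/* Either way, i915 takes over again and replays the pending requests. */
	record(t, STEP_DRIVER_REINITS_ENGINE);
	record(t, guc_submission ? STEP_DRIVER_RESUBMITS_GUC
				 : STEP_DRIVER_RESUBMITS_ELSP);
}
```

With GuC submission the model records four steps; on the ELSP path it records six, because the driver itself requests the idle, performs the reset and clears the reset request.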
Signed-off-by: Arun Siluvery
Signed-off-by: Jeff McGee
Signed-off-by: Michel Thierry
---
 drivers/gpu/drm/i915/i915_drv.c            | 43 +++++++++++++++++---------
 drivers/gpu/drm/i915/i915_drv.h            |  1 +
 drivers/gpu/drm/i915/i915_guc_submission.c | 48 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_guc_fwif.h      |  6 ++++
 drivers/gpu/drm/i915/intel_lrc.c           |  5 ++--
 drivers/gpu/drm/i915/intel_uc.h            |  1 +
 drivers/gpu/drm/i915/intel_uncore.c        |  5 ----
 7 files changed, 88 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index a111b39bbc12..3da0e7146ff8 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1940,23 +1940,34 @@ int i915_reset_engine(struct intel_engine_cs *engine)
 	 */
 	i915_gem_reset_engine(engine);
 
-	/* forcing engine to idle */
-	ret = intel_request_reset_engine(engine);
-	if (ret) {
-		DRM_ERROR("Failed to disable %s\n", engine->name);
-		goto error;
-	}
+	if (!dev_priv->guc.execbuf_client) {
+		/* forcing engine to idle */
+		ret = intel_request_reset_engine(engine);
+		if (ret) {
+			DRM_ERROR("Failed to disable %s\n", engine->name);
+			goto error;
+		}
 
-	/* finally, reset engine */
-	ret = intel_gpu_reset(dev_priv, intel_engine_flag(engine));
-	if (ret) {
-		DRM_ERROR("Failed to reset %s, ret=%d\n", engine->name, ret);
+		/* finally, reset engine */
+		ret = intel_gpu_reset(dev_priv, intel_engine_flag(engine));
+		if (ret) {
+			DRM_ERROR("Failed to reset %s, ret=%d\n",
+				  engine->name, ret);
+			intel_unrequest_reset_engine(engine);
+			goto error;
+		}
+
+		/* be sure the request reset bit gets cleared */
 		intel_unrequest_reset_engine(engine);
-		goto error;
-	}
 
-	/* be sure the request reset bit gets cleared */
-	intel_unrequest_reset_engine(engine);
+	} else {
+		ret = i915_guc_request_engine_reset(engine);
+		if (ret) {
+			DRM_ERROR("GuC failed to reset %s, ret=%d\n",
+				  engine->name, ret);
+			goto error;
+		}
+	}
 
 	/* i915_gem_reset_prepare revoked the fences */
 	i915_gem_restore_fences(dev_priv);
@@ -1967,6 +1978,10 @@ int i915_reset_engine(struct intel_engine_cs *engine)
 	if (ret)
 		goto error;
 
+	/* for guc too */
+	if (dev_priv->guc.execbuf_client)
+		i915_guc_submission_reenable_engine(engine);
+
 	error->reset_engine_count[engine->id]++;
 
 wakeup:
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index fbb4f200756a..d5c12ddd35b3 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3048,6 +3048,7 @@ extern void i915_reset(struct drm_i915_private *dev_priv, u32 engine_mask);
 extern bool intel_has_reset_engine(struct drm_i915_private *dev_priv);
 extern int intel_request_reset_engine(struct intel_engine_cs *engine);
 extern void intel_unrequest_reset_engine(struct intel_engine_cs *engine);
+extern int i915_guc_request_engine_reset(struct intel_engine_cs *engine);
 extern int intel_guc_reset(struct drm_i915_private *dev_priv);
 extern void intel_engine_init_hangcheck(struct intel_engine_cs *engine);
 extern void intel_hangcheck_init(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 2445af96aa71..fc21ec733f93 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -1335,6 +1335,25 @@ void i915_guc_submission_disable(struct drm_i915_private *dev_priv)
 	guc->execbuf_client = NULL;
 }
 
+void i915_guc_submission_reenable_engine(struct intel_engine_cs *engine)
+{
+	struct drm_i915_private *dev_priv = engine->i915;
+	struct intel_guc *guc = &dev_priv->guc;
+	struct i915_guc_client *client = guc->execbuf_client;
+	const int wqi_size = sizeof(struct guc_wq_item);
+	struct drm_i915_gem_request *rq;
+
+	GEM_BUG_ON(!client);
+	intel_guc_sample_forcewake(guc);
+
+	spin_lock_irq(&engine->timeline->lock);
+	list_for_each_entry(rq, &engine->timeline->requests, link) {
+		guc_client_update_wq_rsvd(client, wqi_size);
+		__i915_guc_submit(rq);
+	}
+	spin_unlock_irq(&engine->timeline->lock);
+}
+
 /**
  * intel_guc_suspend() - notify GuC entering suspend state
  * @dev_priv: i915 device private
@@ -1386,3 +1405,32 @@ int intel_guc_resume(struct drm_i915_private *dev_priv)
 
 	return intel_guc_send(guc, data, ARRAY_SIZE(data));
 }
+
+int i915_guc_request_engine_reset(struct intel_engine_cs *engine)
+{
+	struct drm_i915_private *dev_priv = engine->i915;
+	struct intel_guc *guc = &dev_priv->guc;
+	struct i915_gem_context *ctx;
+	u32 data[7];
+
+	if (!i915.enable_guc_submission)
+		return 0;
+
+	ctx = dev_priv->kernel_context;
+
+	/*
+	 * The affected context report is populated by GuC and is provided
+	 * to the driver using the shared page. We request for it but don't
+	 * use it as scheduler has all of these details.
+	 */
+	data[0] = INTEL_GUC_ACTION_REQUEST_ENGINE_RESET;
+	data[1] = engine->guc_id;
+	data[2] = INTEL_GUC_RESET_OPTION_REPORT_AFFECTED_CONTEXTS;
+	data[3] = 0;
+	data[4] = 0;
+	data[5] = guc->execbuf_client->stage_id;
+	/* first page is shared data with GuC */
+	data[6] = guc_ggtt_offset(ctx->engine[RCS].state);
+
+	return intel_guc_send(guc, data, ARRAY_SIZE(data));
+}
diff --git a/drivers/gpu/drm/i915/intel_guc_fwif.h b/drivers/gpu/drm/i915/intel_guc_fwif.h
index cb36cbf3818f..b627206b8f56 100644
--- a/drivers/gpu/drm/i915/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/intel_guc_fwif.h
@@ -506,6 +506,7 @@ union guc_log_control {
 /* This Action will be programmed in C180 - SOFT_SCRATCH_O_REG */
 enum intel_guc_action {
 	INTEL_GUC_ACTION_DEFAULT = 0x0,
+	INTEL_GUC_ACTION_REQUEST_ENGINE_RESET = 0x3,
 	INTEL_GUC_ACTION_SAMPLE_FORCEWAKE = 0x6,
 	INTEL_GUC_ACTION_ALLOCATE_DOORBELL = 0x10,
 	INTEL_GUC_ACTION_DEALLOCATE_DOORBELL = 0x20,
@@ -519,6 +520,11 @@ enum intel_guc_action {
 	INTEL_GUC_ACTION_LIMIT
 };
 
+/* Reset engine options */
+enum action_engine_reset_options {
+	INTEL_GUC_RESET_OPTION_REPORT_AFFECTED_CONTEXTS = 0x10,
+};
+
 /*
  * The GuC sends its response to a command by overwriting the
  * command in SS0. The response is distinguishable from a command
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index dd0e9d587852..bc224a24ddad 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1187,14 +1187,15 @@ static int gen8_init_common_ring(struct intel_engine_cs *engine)
 
 	/* After a GPU reset, we may have requests to replay */
 	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
-	if (!i915.enable_guc_submission && !execlists_elsp_idle(engine)) {
+	if (!execlists_elsp_idle(engine)) {
 		DRM_DEBUG_DRIVER("Restarting %s from requests [0x%x, 0x%x]\n",
 				 engine->name,
 				 port_seqno(&engine->execlist_port[0]),
 				 port_seqno(&engine->execlist_port[1]));
 		engine->execlist_port[0].count = 0;
 		engine->execlist_port[1].count = 0;
-		execlists_submit_ports(engine);
+		if (!dev_priv->guc.execbuf_client)
+			execlists_submit_ports(engine);
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/intel_uc.h b/drivers/gpu/drm/i915/intel_uc.h
index 6cf2d14fa0dc..4fb93f682b9c 100644
--- a/drivers/gpu/drm/i915/intel_uc.h
+++ b/drivers/gpu/drm/i915/intel_uc.h
@@ -221,6 +221,7 @@ int i915_guc_wq_reserve(struct drm_i915_gem_request *rq);
 void i915_guc_wq_unreserve(struct drm_i915_gem_request *request);
 void i915_guc_submission_disable(struct drm_i915_private *dev_priv);
 void i915_guc_submission_fini(struct drm_i915_private *dev_priv);
+void i915_guc_submission_reenable_engine(struct intel_engine_cs *engine);
 struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size);
 
 /* intel_guc_log.c */
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 9886d7bd11ba..533c86f41092 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1756,14 +1756,9 @@ bool intel_has_gpu_reset(struct drm_i915_private *dev_priv)
 	return intel_get_gpu_reset(dev_priv) != NULL;
 }
 
-/*
- * When GuC submission is enabled, GuC manages ELSP and can initiate the
- * engine reset too. For now, fall back to full GPU reset if it is enabled.
- */
 bool intel_has_reset_engine(struct drm_i915_private *dev_priv)
 {
 	return (dev_priv->info.has_reset_engine &&
-		!dev_priv->guc.execbuf_client &&
 		i915.reset == 2);
 }
 
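For reference, the seven-dword payload that `i915_guc_request_engine_reset()` hands to `intel_guc_send()` can be sketched as a standalone helper. This is a minimal illustration, not driver code: the macro names, `build_engine_reset_request()` and its parameters are hypothetical. Only the action code (0x3), the option flag (0x10) and the order of the fields come from the patch. The patch does not say what dwords 3 and 4 mean, so the sketch simply leaves them zero, as the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Values taken from intel_guc_fwif.h in this patch; the macro names and
 * the helper below are hypothetical and exist only for illustration.
 */
#define GUC_ACTION_REQUEST_ENGINE_RESET            0x3u
#define GUC_RESET_OPTION_REPORT_AFFECTED_CONTEXTS  0x10u
#define GUC_RESET_REQUEST_DWORDS                   7

/* Fill data[] in the same order as i915_guc_request_engine_reset(). */
static void build_engine_reset_request(uint32_t data[GUC_RESET_REQUEST_DWORDS],
				       uint32_t engine_guc_id,
				       uint32_t client_stage_id,
				       uint32_t shared_page_ggtt_offset)
{
	assert(data != NULL);
	data[0] = GUC_ACTION_REQUEST_ENGINE_RESET;           /* action code */
	data[1] = engine_guc_id;                             /* engine->guc_id */
	data[2] = GUC_RESET_OPTION_REPORT_AFFECTED_CONTEXTS; /* ask for the report */
	data[3] = 0;                                         /* left zero by the patch */
	data[4] = 0;                                         /* left zero by the patch */
	data[5] = client_stage_id;                           /* execbuf_client->stage_id */
	/* GGTT offset of the kernel context's RCS state; first page is shared with GuC */
	data[6] = shared_page_ggtt_offset;
}
```

The report requested through the option flag is written by GuC into that shared page; the patch asks for it but does not read it, since the i915 scheduler already tracks the pending requests.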