From patchwork Mon May 15 21:20:01 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michel Thierry X-Patchwork-Id: 9727983 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 50FFB60386 for ; Mon, 15 May 2017 21:20:04 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5722228414 for ; Mon, 15 May 2017 21:20:04 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 48AFF289B9; Mon, 15 May 2017 21:20:04 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id C8D1528414 for ; Mon, 15 May 2017 21:20:03 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 0B3AD89F35; Mon, 15 May 2017 21:20:03 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by gabe.freedesktop.org (Postfix) with ESMTPS id A451589F2D for ; Mon, 15 May 2017 21:20:01 +0000 (UTC) Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga105.jf.intel.com with ESMTP; 15 May 2017 14:20:01 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos; i="5.38,346,1491289200"; d="scan'208"; a="1130668385" Received: from relo-linux-11.sc.intel.com ([10.3.160.214]) by orsmga001.jf.intel.com with ESMTP; 15 May 2017 14:20:01 -0700 From: Michel Thierry To: intel-gfx@lists.freedesktop.org Date: Mon, 15 May 2017 14:20:01 -0700 Message-Id: <20170515212001.16418-1-michel.thierry@intel.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20170427231300.32841-6-michel.thierry@intel.com> References: <20170427231300.32841-6-michel.thierry@intel.com> Subject: [Intel-gfx] [PATCH 05/20] drm/i915: Cancel reset-engine if we couldn't find an active request X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP Before reseting an engine, check if there is an active request, and if the _hung_ request has completed. In these two cases, the seqno has moved after hang declaration and we can skip the reset. Also store the active request so that we only search for it once. v2: Check for request completion inside _prepare_engine, don't use ECANCELED, remove unnecessary null checks (Chris). Suggested-by: Chris Wilson Signed-off-by: Michel Thierry --- drivers/gpu/drm/i915/i915_drv.c | 21 +++++++++++++------ drivers/gpu/drm/i915/i915_drv.h | 6 ++++-- drivers/gpu/drm/i915/i915_gem.c | 45 ++++++++++++++++++++++++++++------------- 3 files changed, 50 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index d62793805794..6ee60c1e17ee 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -1895,23 +1895,28 @@ int i915_reset_engine(struct intel_engine_cs *engine) int ret; struct drm_i915_private *dev_priv = engine->i915; struct i915_gpu_error *error = &dev_priv->gpu_error; + struct drm_i915_gem_request *active_request; GEM_BUG_ON(!test_bit(I915_RESET_ENGINE_IN_PROGRESS, &error->flags)); DRM_DEBUG_DRIVER("resetting %s\n", engine->name); - ret = i915_gem_reset_prepare_engine(engine); - if (ret) { - DRM_ERROR("Previous reset failed - promote to full reset\n"); + active_request = i915_gem_reset_prepare_engine(engine); + if (!active_request) { + DRM_DEBUG_DRIVER("seqno moved after hang declaration, pardoned\n"); + goto canceled; + } else if (IS_ERR(active_request)) { + DRM_DEBUG_DRIVER("Previous reset failed, promote to full reset\n"); + ret = PTR_ERR(active_request); goto out; } /* - * the request that caused the hang is stuck on elsp, identify the - * active request and drop it, adjust head to skip the offending + * the request that caused the hang is stuck on elsp, we know the + * active request and can drop it, adjust head to skip the offending * request to resume executing remaining requests in the queue. */ - i915_gem_reset_engine(engine); + i915_gem_reset_engine(engine, active_request); /* forcing engine to idle */ ret = intel_reset_engine_start(engine); @@ -1942,6 +1947,10 @@ int i915_reset_engine(struct intel_engine_cs *engine) out: return ret; + +canceled: + i915_gem_reset_finish_engine(engine); + return 0; } static int i915_pm_suspend(struct device *kdev) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index a5b9c666b3bf..f8cbd286f904 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -3370,14 +3370,16 @@ static inline u32 i915_reset_count(struct i915_gpu_error *error) return READ_ONCE(error->reset_count); } -int i915_gem_reset_prepare_engine(struct intel_engine_cs *engine); +struct drm_i915_gem_request * +i915_gem_reset_prepare_engine(struct intel_engine_cs *engine); int i915_gem_reset_prepare(struct drm_i915_private *dev_priv); void i915_gem_reset(struct drm_i915_private *dev_priv); void i915_gem_reset_finish_engine(struct intel_engine_cs *engine); void i915_gem_reset_finish(struct drm_i915_private *dev_priv); void i915_gem_set_wedged(struct drm_i915_private *dev_priv); bool i915_gem_unset_wedged(struct drm_i915_private *dev_priv); -void i915_gem_reset_engine(struct intel_engine_cs *engine); +void i915_gem_reset_engine(struct intel_engine_cs *engine, + struct drm_i915_gem_request *request); void i915_gem_init_mmio(struct drm_i915_private *i915); int __must_check i915_gem_init(struct drm_i915_private *dev_priv); diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index b5dc073a5ddc..2e47678315d4 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2793,12 +2793,15 @@ static bool engine_stalled(struct intel_engine_cs *engine) return true; } -/* Ensure irq handler finishes, and not run again. */ -int i915_gem_reset_prepare_engine(struct intel_engine_cs *engine) +/* + * Ensure irq handler finishes, and not run again. + * For reset-engine we also store the active request so that we only search + * for it once. + */ +struct drm_i915_gem_request * +i915_gem_reset_prepare_engine(struct intel_engine_cs *engine) { - struct drm_i915_gem_request *request; - int err = 0; - + struct drm_i915_gem_request *request = NULL; /* Prevent the signaler thread from updating the request * state (by calling dma_fence_signal) as we are processing @@ -2827,21 +2830,34 @@ int i915_gem_reset_prepare_engine(struct intel_engine_cs *engine) if (engine_stalled(engine)) { request = i915_gem_find_active_request(engine); - if (request && request->fence.error == -EIO) - err = -EIO; /* Previous reset failed! */ + + if (request) { + if (request->fence.error == -EIO) + return ERR_PTR(-EIO); /* Previous reset failed! */ + + if (__i915_gem_request_completed(request, + engine->hangcheck.seqno)) + return NULL; /* request completed, skip reset */ + } } - return err; + return request; } int i915_gem_reset_prepare(struct drm_i915_private *dev_priv) { struct intel_engine_cs *engine; + struct drm_i915_gem_request *request; enum intel_engine_id id; int err = 0; - for_each_engine(engine, dev_priv, id) - err = i915_gem_reset_prepare_engine(engine); + for_each_engine(engine, dev_priv, id) { + request = i915_gem_reset_prepare_engine(engine); + if (IS_ERR(request)) { + err = PTR_ERR(request); + break; + } + } i915_gem_revoke_fences(dev_priv); @@ -2928,11 +2944,12 @@ static bool i915_gem_reset_request(struct drm_i915_gem_request *request) return guilty; } -void i915_gem_reset_engine(struct intel_engine_cs *engine) +void i915_gem_reset_engine(struct intel_engine_cs *engine, + struct drm_i915_gem_request *request) { - struct drm_i915_gem_request *request; + if (!request) + request = i915_gem_find_active_request(engine); - request = i915_gem_find_active_request(engine); if (request && i915_gem_reset_request(request)) { DRM_DEBUG_DRIVER("resetting %s to restart from tail of request 0x%x\n", engine->name, request->global_seqno); @@ -2958,7 +2975,7 @@ void i915_gem_reset(struct drm_i915_private *dev_priv) for_each_engine(engine, dev_priv, id) { struct i915_gem_context *ctx; - i915_gem_reset_engine(engine); + i915_gem_reset_engine(engine, NULL); ctx = fetch_and_zero(&engine->last_retired_context); if (ctx) engine->context_unpin(engine, ctx);