From patchwork Wed May 17 20:41:34 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michel Thierry X-Patchwork-Id: 9731883 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 28536601BC for ; Wed, 17 May 2017 20:41:40 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E42EE287C2 for ; Wed, 17 May 2017 20:41:39 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D728B287E0; Wed, 17 May 2017 20:41:39 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id BC239287E6 for ; Wed, 17 May 2017 20:41:38 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 52CCC6E0D6; Wed, 17 May 2017 20:41:37 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by gabe.freedesktop.org (Postfix) with ESMTPS id D0A286E0D6 for ; Wed, 17 May 2017 20:41:35 +0000 (UTC) Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 17 May 2017 13:41:35 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.38,355,1491289200"; d="scan'208";a="969624737" Received: from relo-linux-11.sc.intel.com ([10.3.160.214]) by orsmga003.jf.intel.com with ESMTP; 17 May 2017 13:41:34 -0700 From: Michel Thierry To: intel-gfx@lists.freedesktop.org Date: Wed, 17 May 2017 13:41:34 -0700 Message-Id: <20170517204134.25388-1-michel.thierry@intel.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20170427231300.32841-6-michel.thierry@intel.com> References: <20170427231300.32841-6-michel.thierry@intel.com> Subject: [Intel-gfx] [PATCH] drm/i915: Cancel reset-engine if we couldn't find an active request X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP Before reseting an engine, check if there is an active request, and if the _hung_ request has completed. In these two cases, the seqno has moved after hang declaration and we can skip the reset. Also store the active request so that we only search for it once, this applies for reset-engine and full reset. v2: Check for request completion inside _prepare_engine, don't use ECANCELED, remove unnecessary null checks (Chris). v3: Capture active requests during reset_prepare and store it the engine hangcheck obj (Chris). Suggested-by: Chris Wilson Signed-off-by: Michel Thierry --- drivers/gpu/drm/i915/i915_drv.c | 18 ++++++++++---- drivers/gpu/drm/i915/i915_drv.h | 3 ++- drivers/gpu/drm/i915/i915_gem.c | 42 +++++++++++++++++++++++---------- drivers/gpu/drm/i915/intel_ringbuffer.h | 1 + 4 files changed, 46 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index d62793805794..771857258292 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -1900,15 +1900,19 @@ int i915_reset_engine(struct intel_engine_cs *engine) DRM_DEBUG_DRIVER("resetting %s\n", engine->name); - ret = i915_gem_reset_prepare_engine(engine); - if (ret) { - DRM_ERROR("Previous reset failed - promote to full reset\n"); + engine->hangcheck.active_request = i915_gem_reset_prepare_engine(engine); + if (!engine->hangcheck.active_request) { + DRM_DEBUG_DRIVER("seqno moved after hang declaration, pardoned\n"); + goto canceled; + } else if (IS_ERR(engine->hangcheck.active_request)) { + DRM_DEBUG_DRIVER("Previous reset failed, promote to full reset\n"); + ret = PTR_ERR(engine->hangcheck.active_request); goto out; } /* - * the request that caused the hang is stuck on elsp, identify the - * active request and drop it, adjust head to skip the offending + * the request that caused the hang is stuck on elsp, we know the + * active request and can drop it, adjust head to skip the offending * request to resume executing remaining requests in the queue. */ i915_gem_reset_engine(engine); @@ -1942,6 +1946,10 @@ int i915_reset_engine(struct intel_engine_cs *engine) out: return ret; + +canceled: + i915_gem_reset_finish_engine(engine); + return 0; } static int i915_pm_suspend(struct device *kdev) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index a5b9c666b3bf..6cbfeaa02246 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -3370,7 +3370,8 @@ static inline u32 i915_reset_count(struct i915_gpu_error *error) return READ_ONCE(error->reset_count); } -int i915_gem_reset_prepare_engine(struct intel_engine_cs *engine); +struct drm_i915_gem_request * +i915_gem_reset_prepare_engine(struct intel_engine_cs *engine); int i915_gem_reset_prepare(struct drm_i915_private *dev_priv); void i915_gem_reset(struct drm_i915_private *dev_priv); void i915_gem_reset_finish_engine(struct intel_engine_cs *engine); diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index b5dc073a5ddc..5ec454dafb9f 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2793,12 +2793,14 @@ static bool engine_stalled(struct intel_engine_cs *engine) return true; } -/* Ensure irq handler finishes, and not run again. */ -int i915_gem_reset_prepare_engine(struct intel_engine_cs *engine) +/* + * Ensure irq handler finishes, and not run again. + * Also store the active request so that we only search for it once. + */ +struct drm_i915_gem_request * +i915_gem_reset_prepare_engine(struct intel_engine_cs *engine) { - struct drm_i915_gem_request *request; - int err = 0; - + struct drm_i915_gem_request *request = NULL; /* Prevent the signaler thread from updating the request * state (by calling dma_fence_signal) as we are processing @@ -2827,21 +2829,35 @@ int i915_gem_reset_prepare_engine(struct intel_engine_cs *engine) if (engine_stalled(engine)) { request = i915_gem_find_active_request(engine); - if (request && request->fence.error == -EIO) - err = -EIO; /* Previous reset failed! */ + + if (request) { + if (request->fence.error == -EIO) + return ERR_PTR(-EIO); /* Previous reset failed! */ + + if (i915_gem_request_completed(request)) + return NULL; /* request completed, skip it */ + } } - return err; + return request; } int i915_gem_reset_prepare(struct drm_i915_private *dev_priv) { struct intel_engine_cs *engine; + struct drm_i915_gem_request *request; enum intel_engine_id id; int err = 0; - for_each_engine(engine, dev_priv, id) - err = i915_gem_reset_prepare_engine(engine); + for_each_engine(engine, dev_priv, id) { + request = i915_gem_reset_prepare_engine(engine); + if (IS_ERR(request)) { + err = PTR_ERR(request); + break; + } + + engine->hangcheck.active_request = request; + } i915_gem_revoke_fences(dev_priv); @@ -2930,9 +2946,11 @@ static bool i915_gem_reset_request(struct drm_i915_gem_request *request) void i915_gem_reset_engine(struct intel_engine_cs *engine) { - struct drm_i915_gem_request *request; + struct drm_i915_gem_request *request = engine->hangcheck.active_request; + + if (!request) + request = i915_gem_find_active_request(engine); - request = i915_gem_find_active_request(engine); if (request && i915_gem_reset_request(request)) { DRM_DEBUG_DRIVER("resetting %s to restart from tail of request 0x%x\n", engine->name, request->global_seqno); diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index ec16fb6fde62..f850c4b12337 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -121,6 +121,7 @@ struct intel_engine_hangcheck { unsigned long action_timestamp; int deadlock; struct intel_instdone instdone; + struct drm_i915_gem_request *active_request; bool stalled; };