From patchwork Tue Apr 16 11:32:13 2013
X-Patchwork-Submitter: Mika Kuoppala
X-Patchwork-Id: 2449121
From: Mika Kuoppala
To: intel-gfx@lists.freedesktop.org
Date: Tue, 16 Apr 2013 14:32:13 +0300
Message-Id: <1366111933-24960-1-git-send-email-mika.kuoppala@intel.com>
In-Reply-To: <1365089568-20457-15-git-send-email-mika.kuoppala@intel.com>
References: <1365089568-20457-15-git-send-email-mika.kuoppala@intel.com>
Subject: [Intel-gfx] [PATCH 14/16] drm/i915: refuse to submit more batchbuffers from guilty context

If a context has recently submitted faulty batchbuffers guilty of a GPU
hang and decides to keep submitting more crap, ban it permanently.

v2: Store guilty ban status bool in gpu_error instead of pointers
    that might become dangling before hang is declared.

Signed-off-by: Mika Kuoppala
---
 drivers/gpu/drm/i915/i915_drv.c            |    6 +++++-
 drivers/gpu/drm/i915/i915_drv.h            |    9 +++++++++
 drivers/gpu/drm/i915/i915_gem.c            |   14 ++++++++++++--
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   13 +++++++++++++
 4 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index bddb9a5..ae689b4 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -885,10 +885,14 @@ int i915_reset(struct drm_device *dev)
 
 	mutex_lock(&dev->struct_mutex);
 
+	/* i915_gem_reset() will set this */
+	dev_priv->gpu_error.ctx_banned = false;
+
 	i915_gem_reset(dev);
 
 	ret = -ENODEV;
-	if (get_seconds() - dev_priv->gpu_error.last_reset < 5)
+	if (!dev_priv->gpu_error.ctx_banned &&
+	    get_seconds() - dev_priv->gpu_error.last_reset < 5)
 		DRM_ERROR("GPU hanging too fast, declaring wedged!\n");
 	else
 		ret = intel_gpu_reset(dev);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3b393ed..1945224 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -459,6 +459,12 @@ struct i915_ctx_hang_stats {
 
 	/* This context had batch active when hang was declared */
 	unsigned batch_active;
+
+	/* Time when this context was last blamed for a GPU reset */
+	unsigned long batch_active_reset_ts;
+
+	/* This context is banned to submit more work */
+	bool banned;
 };
 
 /* This must match up with the value previously used for execbuf2.rsvd1. */
@@ -835,6 +841,9 @@ struct i915_gpu_error {
 
 	unsigned long last_reset;
 
+	/* During reset handling, guilty context found and banned */
+	bool ctx_banned;
+
 	/**
 	 * State variable and reset counter controlling the reset flow
 	 *
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0e87765..134528a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2147,6 +2147,7 @@ static void i915_set_reset_status(struct intel_ring_buffer *ring,
 				  struct drm_i915_gem_request *request,
 				  u32 acthd)
 {
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	struct i915_ctx_hang_stats *hs = NULL;
 	bool inside, guilty;
 
@@ -2175,10 +2176,19 @@ static void i915_set_reset_status(struct intel_ring_buffer *ring,
 		hs = &request->file_priv->hang_stats;
 
 	if (hs) {
-		if (guilty)
+		if (guilty) {
+			if (!hs->banned &&
+			    get_seconds() - hs->batch_active_reset_ts < 5) {
+				hs->banned = true;
+				DRM_ERROR("context hanging too fast, "
+					  "declaring banned\n");
+				dev_priv->gpu_error.ctx_banned = true;
+			}
 			hs->batch_active++;
-		else
+			hs->batch_active_reset_ts = get_seconds();
+		} else {
 			hs->batch_pending++;
+		}
 	}
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index bd1750a..f1b1ea9 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -844,6 +844,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	struct drm_clip_rect *cliprects = NULL;
 	struct intel_ring_buffer *ring;
 	struct i915_hw_context *ctx;
+	struct i915_ctx_hang_stats *hs;
 	u32 ctx_id = i915_execbuffer2_get_context_id(*args);
 	u32 exec_start, exec_len;
 	u32 mask, flags;
@@ -1026,6 +1027,18 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	if (ret)
 		goto err;
 
+	hs = i915_gem_context_get_hang_stats(&dev_priv->ring[RCS],
+					     file, ctx_id);
+	if (IS_ERR(hs)) {
+		ret = PTR_ERR(hs);
+		goto err;
+	}
+
+	if (hs->banned) {
+		ret = -EIO;
+		goto err;
+	}
+
 	ctx = i915_switch_context(ring, file, ctx_id);
 	if (IS_ERR(ctx)) {
 		ret = PTR_ERR(ctx);
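
For readers who want to reason about the policy without a kernel tree, the
following is a rough userspace model of the three pieces this patch touches.
It is only a sketch, not driver code: struct hang_stats, account_hang(),
submit(), should_wedge() and BAN_PERIOD_SECONDS are hypothetical names
introduced here, and the "now" parameter stands in for get_seconds(). The
field names and the 5 second window are taken from the diff.

```c
/* Userspace model of the ban policy in this patch; NOT kernel code. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical name; the patch hard-codes the literal 5. */
#define BAN_PERIOD_SECONDS 5

/* Same fields as struct i915_ctx_hang_stats after the patch. */
struct hang_stats {
	unsigned batch_pending;
	unsigned batch_active;
	unsigned long batch_active_reset_ts;
	bool banned;
};

/*
 * Models the accounting added to i915_set_reset_status().
 * 'now' is assumed to be wall-clock seconds (large values), so a
 * zero-initialized timestamp behaves as "never blamed before".
 * Returns true if this call banned the context, which is what the
 * patch records in gpu_error.ctx_banned.
 */
static bool account_hang(struct hang_stats *hs, bool guilty, unsigned long now)
{
	bool newly_banned = false;

	assert(hs != NULL);

	if (!guilty) {
		hs->batch_pending++;
		return false;
	}

	/* Blamed again within the window: ban the context. */
	if (!hs->banned &&
	    now - hs->batch_active_reset_ts < BAN_PERIOD_SECONDS) {
		hs->banned = true;
		newly_banned = true;
	}
	hs->batch_active++;
	hs->batch_active_reset_ts = now;

	return newly_banned;
}

/* Models the check added to i915_gem_do_execbuffer(). */
static int submit(const struct hang_stats *hs)
{
	assert(hs != NULL);
	return hs->banned ? -EIO : 0;
}

/*
 * Models the condition changed in i915_reset(): the GPU is declared
 * wedged only if resets come too fast AND no context was banned
 * during this reset.
 */
static bool should_wedge(bool ctx_banned, unsigned long now,
			 unsigned long last_reset)
{
	return !ctx_banned && now - last_reset < BAN_PERIOD_SECONDS;
}
```

In this model, a context blamed at time t and again at t + 3 ends up banned,
submit() then returns -EIO, and should_wedge() reports false for that reset
because the banned context is taken to be the cause of the rapid hangs.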