From: Mika Kuoppala
To: intel-gfx@lists.freedesktop.org
Cc: Ben Widawsky, Paul Berry
Date: Tue, 10 Sep 2013 16:16:50 +0300
Message-Id: <1378819010-5173-1-git-send-email-mika.kuoppala@intel.com>
X-Patchwork-Id: 2866021
Subject: [Intel-gfx] [PATCH] drm/i915: optionally ban context on first hang

Current policy is to ban a context if it manages to hang the GPU
within a certain time window. Paul Berry asked whether a stricter
policy could be made available for use cases where the application
doesn't know if the rendering command stream sent to the GPU is valid
or not.

Provide an option, a flag at context creation time, to let userspace
set a stricter policy for handling GPU hangs for this context. If a
context with this flag set ever hangs the GPU, it will be permanently
banned from accessing the GPU. All subsequent batch submissions will
return -EIO.

Requested-by: Paul Berry
Cc: Paul Berry
Cc: Ben Widawsky
Signed-off-by: Mika Kuoppala
---
 drivers/gpu/drm/i915/i915_dma.c         |    3 +++
 drivers/gpu/drm/i915/i915_drv.h         |    3 +++
 drivers/gpu/drm/i915/i915_gem.c         |    9 ++++++++-
 drivers/gpu/drm/i915/i915_gem_context.c |   12 +++++++++---
 include/uapi/drm/i915_drm.h             |    5 +++++
 5 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 3de6050..4353458 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1003,6 +1003,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
 	case I915_PARAM_HAS_EXEC_HANDLE_LUT:
 		value = 1;
 		break;
+	case I915_PARAM_HAS_CONTEXT_BAN:
+		value = 1;
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 81ba5bb..9cf7050 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -594,6 +594,9 @@ struct i915_ctx_hang_stats {
 
 	/* This context is banned to submit more work */
 	bool banned;
+
+	/* Instead of default period based ban policy, ban on first hang */
+	bool ban_on_first;
 };
 
 /* This must match up with the value previously used for execbuf2.rsvd1. */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 04e810c..9feaafd2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2190,11 +2190,18 @@ static bool i915_request_guilty(struct drm_i915_gem_request *request,
 
 static bool i915_context_is_banned(const struct i915_ctx_hang_stats *hs)
 {
-	const unsigned long elapsed = get_seconds() - hs->guilty_ts;
+	unsigned long elapsed;
 
 	if (hs->banned)
 		return true;
 
+	if (hs->ban_on_first) {
+		DRM_ERROR("context banned on first hang!\n");
+		return true;
+	}
+
+	elapsed = get_seconds() - hs->guilty_ts;
+
 	if (elapsed <= DRM_I915_CTX_BAN_PERIOD) {
 		DRM_ERROR("context hanging too fast, declaring banned!\n");
 		return true;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 26c3fcc..8baa1d0 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -135,7 +135,8 @@ void i915_gem_context_free(struct kref *ctx_ref)
 
 static struct i915_hw_context *
 create_hw_context(struct drm_device *dev,
-		  struct drm_i915_file_private *file_priv)
+		  struct drm_i915_file_private *file_priv,
+		  const u64 flags)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct i915_hw_context *ctx;
@@ -178,6 +179,7 @@ create_hw_context(struct drm_device *dev,
 
 	ctx->file_priv = file_priv;
 	ctx->id = ret;
+	ctx->hang_stats.ban_on_first = !!(flags & I915_CONTEXT_BAN_ON_HANG);
 
 	return ctx;
 
@@ -203,7 +205,7 @@ static int create_default_context(struct drm_i915_private *dev_priv)
 
 	BUG_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
 
-	ctx = create_hw_context(dev_priv->dev, NULL);
+	ctx = create_hw_context(dev_priv->dev, NULL, 0);
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
 
@@ -520,11 +522,15 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 	if (dev_priv->hw_contexts_disabled)
 		return -ENODEV;
 
+	if (args->flags & __I915_CONTEXT_UNKNOWN_FLAGS)
+		return -EINVAL;
+
 	ret = i915_mutex_lock_interruptible(dev);
 	if (ret)
 		return ret;
 
-	ctx = create_hw_context(dev, file_priv);
+	ctx = create_hw_context(dev, file_priv, args->flags);
+
 	mutex_unlock(&dev->struct_mutex);
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 55bb572..a020454 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -335,6 +335,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_EXEC_NO_RELOC	 25
 #define I915_PARAM_HAS_EXEC_HANDLE_LUT	 26
 #define I915_PARAM_HAS_WT		 27
+#define I915_PARAM_HAS_CONTEXT_BAN	 28
 
 typedef struct drm_i915_getparam {
 	int param;
@@ -1015,10 +1016,14 @@ struct drm_i915_gem_wait {
 	__s64 timeout_ns;
 };
 
+#define I915_CONTEXT_BAN_ON_HANG (1 << 0)
+#define __I915_CONTEXT_UNKNOWN_FLAGS -(I915_CONTEXT_BAN_ON_HANG<<1)
+
 struct drm_i915_gem_context_create {
 	/*  output: id of new context*/
 	__u32 ctx_id;
 	__u32 pad;
+	__u64 flags;
 };
 
 struct drm_i915_gem_context_destroy {
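
For illustration only (this is not part of the patch), the two checks
it introduces can be modelled in a small standalone C sketch: the
unknown-flags mask built with the -(FLAG << 1) idiom, and the order of
tests in the ban decision. CTX_BAN_ON_HANG and CTX_UNKNOWN_FLAGS mirror
the uapi macros above under local names; BAN_PERIOD_SECS is an assumed
placeholder value, and struct hang_stats, check_create_flags() and
context_is_banned() are hypothetical helpers that take the current time
as a parameter instead of calling get_seconds().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the uapi definitions added by the patch. */
#define CTX_BAN_ON_HANG   (1 << 0)
/* -(highest known flag << 1) sets every bit above the known flags. */
#define CTX_UNKNOWN_FLAGS -(CTX_BAN_ON_HANG << 1)

/* Assumed placeholder; the driver uses DRM_I915_CTX_BAN_PERIOD. */
#define BAN_PERIOD_SECS 12UL

/* Hypothetical reduced copy of struct i915_ctx_hang_stats. */
struct hang_stats {
	unsigned long guilty_ts; /* time of the previous guilty hang, seconds */
	bool banned;             /* already banned from submitting work */
	bool ban_on_first;       /* strict policy requested at creation */
};

/* Reject any flag bit the interface does not know about. */
static int check_create_flags(uint64_t flags)
{
	if (flags & CTX_UNKNOWN_FLAGS)
		return -EINVAL;
	return 0;
}

/* Same order of tests as the patched i915_context_is_banned(). */
static bool context_is_banned(const struct hang_stats *hs, unsigned long now)
{
	if (hs->banned)
		return true;

	/* Strict policy: a single hang is enough. */
	if (hs->ban_on_first)
		return true;

	/* Default policy: ban only if hangs repeat within the period. */
	return now - hs->guilty_ts <= BAN_PERIOD_SECS;
}
```

In this model a context created with CTX_BAN_ON_HANG is reported as
banned on its first hang regardless of guilty_ts, while a context
created with flags == 0 is banned only when the previous guilty hang
falls inside the period. Any flag bit other than bit 0 makes
check_create_flags() return -EINVAL, which matches the ioctl check in
the patch.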