From: Michel Thierry
To: intel-gfx@lists.freedesktop.org
Date: Thu, 23 Feb 2017 11:44:19 -0800
Message-Id: <20170223194421.28463-3-michel.thierry@intel.com>
In-Reply-To: <20170223194421.28463-1-michel.thierry@intel.com>
References: <20170223194421.28463-1-michel.thierry@intel.com>
Subject: [Intel-gfx] [RFC 3/3] drm/i915: Watchdog timeout: DRM kernel interface to set the timeout

This is the final enablement patch for GPU hang detection using the
watchdog timeout. Through the gem_context_setparam ioctl, userspace can
specify the desired timeout value in milliseconds, and the driver
converts it into timestamp counts.

The _recommended_ default watchdog threshold for video engines is 60 ms,
since this value has been _empirically determined_ to be a good
compromise between low-latency requirements and a low rate of false
positives. The default register value is ~106 ms, and the theoretical
maximum value (all bits set to 1) is 353 seconds.
Signed-off-by: Tomas Elf
Signed-off-by: Arun Siluvery
Signed-off-by: Michel Thierry
---
 drivers/gpu/drm/i915/i915_gem_context.c    | 46 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_context.h    |  3 ++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  8 +-----
 include/uapi/drm/i915_drm.h                |  1 +
 4 files changed, 51 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 99c46f4dbde6..c3748878e64c 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -440,6 +440,32 @@ i915_gem_context_create_gvt(struct drm_device *dev)
 	return ctx;
 }
 
+void i915_context_watchdog_setup(struct i915_gem_context *ctx, u32 value_in_ms)
+{
+	/*
+	 * Based on time out value (ms) calculate
+	 * timer count thresholds needed based on core frequency.
+	 */
+#define TIMER_MILLISECOND 1000
+
+	/*
+	 * Timestamp timer resolution = 0.080 uSec,
+	 * or 12500000 counts per second
+	 */
+#define TIMESTAMP_CNTS_PER_SEC_80NS 12500000
+
+	ctx->watchdog_threshold =
+		((value_in_ms) *
+		 ((TIMESTAMP_CNTS_PER_SEC_80NS) / (TIMER_MILLISECOND)));
+
+	/*
+	 * watchdog register must never be programmed to zero. This would
+	 * cause the watchdog counter to exceed and not allow the engine to
+	 * go into IDLE state
+	 */
+	GEM_BUG_ON(ctx->watchdog_threshold == 0);
+}
+
 int i915_gem_context_init(struct drm_i915_private *dev_priv)
 {
 	struct i915_gem_context *ctx;
@@ -1056,6 +1082,7 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	struct drm_i915_file_private *file_priv = file->driver_priv;
 	struct drm_i915_gem_context_param *args = data;
 	struct i915_gem_context *ctx;
+	struct intel_engine_cs *engine;
 	int ret;
 
 	ret = i915_mutex_lock_interruptible(dev);
@@ -1090,6 +1117,15 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_CONTEXT_PARAM_BANNABLE:
 		args->value = i915_gem_context_is_bannable(ctx);
 		break;
+	case I915_CONTEXT_PARAM_WATCHDOG:
+		engine = to_i915(dev)->engine[VCS];
+		if (!engine->emit_start_watchdog)
+			ret = -EINVAL;
+		else if (args->value)
+			ret = -EINVAL;
+		else
+			args->value = ctx->watchdog_threshold;
+		break;
 	default:
 		ret = -EINVAL;
 		break;
@@ -1105,6 +1141,7 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 	struct drm_i915_file_private *file_priv = file->driver_priv;
 	struct drm_i915_gem_context_param *args = data;
 	struct i915_gem_context *ctx;
+	struct intel_engine_cs *engine;
 	int ret;
 
 	ret = i915_mutex_lock_interruptible(dev);
@@ -1147,6 +1184,15 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 		else
 			i915_gem_context_clear_bannable(ctx);
 		break;
+	case I915_CONTEXT_PARAM_WATCHDOG:
+		engine = to_i915(dev)->engine[VCS];
+		if (!engine->emit_start_watchdog)
+			ret = -EINVAL;
+		else if (!args->value)
+			ret = -EINVAL;
+		else
+			i915_context_watchdog_setup(ctx, args->value);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 0ac750b90f3d..133ed7b413aa 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -176,6 +176,9 @@ struct i915_gem_context {
 	/** ban_score: Accumulated score of all hangs caused by this context. */
 	int ban_score;
 
+	/** watchdog_threshold: hw watchdog threshold value, in clock counts */
+	u32 watchdog_threshold;
+
 	/** remap_slice: Bitmask of cache lines that need remapping */
 	u8 remap_slice;
 };
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 348d81c40e81..26c50a6d6158 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1419,12 +1419,6 @@ execbuf_submit(struct i915_execbuffer_params *params,
 	bool watchdog_running = false;
 	int ret;
 
-	/*
-	 * NB: Place-holder until watchdog timeout is enabled through DRM
-	 * interface
-	 */
-	bool enable_watchdog = false;
-
 	ret = i915_gem_execbuffer_move_to_gpu(params->request, vmas);
 	if (ret)
 		return ret;
@@ -1488,7 +1482,7 @@ execbuf_submit(struct i915_execbuffer_params *params,
 	}
 
 	/* Start watchdog timer */
-	if (enable_watchdog) {
+	if (params->ctx->watchdog_threshold != 0) {
 		if (!params->engine->emit_start_watchdog)
 			return -EINVAL;
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 3554495bef13..e318c4f53a9e 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1289,6 +1289,7 @@ struct drm_i915_gem_context_param {
 #define I915_CONTEXT_PARAM_GTT_SIZE	0x3
 #define I915_CONTEXT_PARAM_NO_ERROR_CAPTURE	0x4
 #define I915_CONTEXT_PARAM_BANNABLE	0x5
+#define I915_CONTEXT_PARAM_WATCHDOG	0x6
 	__u64 value;
 };
 