From patchwork Mon Apr 22 20:19:50 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nirmoy Das X-Patchwork-Id: 13638957 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id AA573C4345F for ; Mon, 22 Apr 2024 20:34:05 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id C96DA10EE07; Mon, 22 Apr 2024 20:33:59 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="Q5RHuB3R"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) by gabe.freedesktop.org (Postfix) with ESMTPS id 1DEA810EB5E; Mon, 22 Apr 2024 20:33:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1713818038; x=1745354038; h=from:to:cc:subject:date:message-id:mime-version: content-transfer-encoding; bh=rHjb9dn/8s5k2vqqkRS8COycFlT+zzP7jHa4lafViak=; b=Q5RHuB3R/EJriX9ivas9dklzX1d8eGr5OuagGF89XV/BaoFFYufyVpCe OAn7GDjljPV10/6oRBVKEm/kO5kSE05iaOpRDrAtwZkO7LZHAcy5FYj/M dvqZ09Gyr8PbZzFNnIWmVPXAYKA98MVPBARGks0Dr1bbYv1q2JkmB2WIl LAKc+sPFc+do3GhiXqG724VZtLY4fP2SzO82vJ8k7GeYpXKeZ1ExG1gGl httTNVF4/1HEdYpHe+tn9Ufj68Sl7X0WpxOpAihnnia/A5UhacnePJlct XurpHcpxmO6CKbfpa5Yio6MUoqA65olD4tfw2O2NnlnIc5X24rmMpEQoZ w==; X-CSE-ConnectionGUID: /cKx5N20Rcq1zwAU6vgVzg== X-CSE-MsgGUID: BpvydWxeR6ictoR4zKI3/Q== X-IronPort-AV: E=McAfee;i="6600,9927,11052"; a="20781594" X-IronPort-AV: E=Sophos;i="6.07,221,1708416000"; d="scan'208";a="20781594" Received: from fmviesa007.fm.intel.com ([10.60.135.147]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Apr 2024 13:33:58 -0700 X-CSE-ConnectionGUID: sZpNf4E4RLG2cTdDm/WErg== X-CSE-MsgGUID: 07NcmmcIR8qk5CeCku2CCQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.07,221,1708416000"; d="scan'208";a="24191719" Received: from nirmoyda-desk.igk.intel.com ([10.102.138.190]) by fmviesa007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Apr 2024 13:33:56 -0700 From: Nirmoy Das To: intel-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org, John.C.Harrison@Intel.com, Nirmoy Das , John Harrison Subject: [PATCH v2 1/2] drm/i915: Refactor confusing __intel_gt_reset() Date: Mon, 22 Apr 2024 22:19:50 +0200 Message-ID: <20240422201951.633-1-nirmoy.das@intel.com> X-Mailer: git-send-email 2.42.0 MIME-Version: 1.0 Organization: Intel Deutschland GmbH, Registered Address: Am Campeon 10, 85579 Neubiberg, Germany, Commercial Register: Amtsgericht Muenchen HRB 186928 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" __intel_gt_reset() is really for resetting engines though the name might suggest something else. So add a helper function to remove confusions with no functional changes. v2: Move intel_gt_reset_all_engines() next to intel_gt_reset_engine() to make diff simple(John) Cc: John Harrison Signed-off-by: Nirmoy Das Reviewed-by: John Harrison Reviewed-by: Andi Shyti --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 2 +- .../drm/i915/gt/intel_execlists_submission.c | 2 +- drivers/gpu/drm/i915/gt/intel_gt.c | 2 +- drivers/gpu/drm/i915/gt/intel_gt_pm.c | 2 +- drivers/gpu/drm/i915/gt/intel_reset.c | 35 +++++++++++++++---- drivers/gpu/drm/i915/gt/intel_reset.h | 3 +- drivers/gpu/drm/i915/gt/selftest_reset.c | 2 +- drivers/gpu/drm/i915/i915_driver.c | 2 +- 8 files changed, 37 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 8c44af1c3451..5c8e9ee3b008 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -678,7 +678,7 @@ void intel_engines_release(struct intel_gt *gt) */ GEM_BUG_ON(intel_gt_pm_is_awake(gt)); if (!INTEL_INFO(gt->i915)->gpu_reset_clobbers_display) - __intel_gt_reset(gt, ALL_ENGINES); + intel_gt_reset_all_engines(gt); /* Decouple the backend; but keep the layout for late GPU resets */ for_each_engine(engine, gt, id) { diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 355aab5b38ba..21829439e686 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -2898,7 +2898,7 @@ static void enable_error_interrupt(struct intel_engine_cs *engine) drm_err(&engine->i915->drm, "engine '%s' resumed still in error: %08x\n", engine->name, status); - __intel_gt_reset(engine->gt, engine->mask); + intel_gt_reset_engine(engine); } /* diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index 580b5141ce1e..626b166e67ef 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -832,7 +832,7 @@ void intel_gt_driver_unregister(struct intel_gt *gt) /* Scrub all HW state upon release */ with_intel_runtime_pm(gt->uncore->rpm, wakeref) - __intel_gt_reset(gt, ALL_ENGINES); + intel_gt_reset_all_engines(gt); } void intel_gt_driver_release(struct intel_gt *gt) diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c index 220ac4f92edf..c08fdb65cc69 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c @@ -159,7 +159,7 @@ static bool reset_engines(struct intel_gt *gt) if (INTEL_INFO(gt->i915)->gpu_reset_clobbers_display) return false; - return __intel_gt_reset(gt, ALL_ENGINES) == 0; + return intel_gt_reset_all_engines(gt) == 0; } static void gt_sanitize(struct intel_gt *gt, bool force) diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c index c8e9aa41fdea..b1393863ca9b 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.c +++ b/drivers/gpu/drm/i915/gt/intel_reset.c @@ -764,7 +764,7 @@ wa_14015076503_end(struct intel_gt *gt, intel_engine_mask_t engine_mask) HECI_H_GS1_ER_PREP, 0); } -int __intel_gt_reset(struct intel_gt *gt, intel_engine_mask_t engine_mask) +static int __intel_gt_reset(struct intel_gt *gt, intel_engine_mask_t engine_mask) { const int retries = engine_mask == ALL_ENGINES ? RESET_MAX_RETRIES : 1; reset_func reset; @@ -978,7 +978,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt) /* Even if the GPU reset fails, it should still stop the engines */ if (!INTEL_INFO(gt->i915)->gpu_reset_clobbers_display) - __intel_gt_reset(gt, ALL_ENGINES); + intel_gt_reset_all_engines(gt); for_each_engine(engine, gt, id) engine->submit_request = nop_submit_request; @@ -1089,7 +1089,7 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt) /* We must reset pending GPU events before restoring our submission */ ok = !HAS_EXECLISTS(gt->i915); /* XXX better agnosticism desired */ if (!INTEL_INFO(gt->i915)->gpu_reset_clobbers_display) - ok = __intel_gt_reset(gt, ALL_ENGINES) == 0; + ok = intel_gt_reset_all_engines(gt) == 0; if (!ok) { /* * Warn CI about the unrecoverable wedged condition. @@ -1133,10 +1133,10 @@ static int do_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask) { int err, i; - err = __intel_gt_reset(gt, ALL_ENGINES); + err = intel_gt_reset_all_engines(gt); for (i = 0; err && i < RESET_MAX_RETRIES; i++) { msleep(10 * (i + 1)); - err = __intel_gt_reset(gt, ALL_ENGINES); + err = intel_gt_reset_all_engines(gt); } if (err) return err; @@ -1270,7 +1270,30 @@ void intel_gt_reset(struct intel_gt *gt, goto finish; } -static int intel_gt_reset_engine(struct intel_engine_cs *engine) +/** + * intel_gt_reset_all_engines() - Reset all engines in the given gt. + * @gt: the GT to reset all engines for. + * + * This function resets all engines within the given gt. + * + * Returns: + * Zero on success, negative error code on failure. + */ +int intel_gt_reset_all_engines(struct intel_gt *gt) +{ + return __intel_gt_reset(gt, ALL_ENGINES); +} + +/** + * intel_gt_reset_engine() - Reset a specific engine within a gt. + * @engine: engine to be reset. + * + * This function resets the specified engine within a gt. + * + * Returns: + * Zero on success, negative error code on failure. + */ +int intel_gt_reset_engine(struct intel_engine_cs *engine) { return __intel_gt_reset(engine->gt, engine->mask); } diff --git a/drivers/gpu/drm/i915/gt/intel_reset.h b/drivers/gpu/drm/i915/gt/intel_reset.h index f615b30b81c5..c00de353075c 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.h +++ b/drivers/gpu/drm/i915/gt/intel_reset.h @@ -54,7 +54,8 @@ int intel_gt_terminally_wedged(struct intel_gt *gt); void intel_gt_set_wedged_on_init(struct intel_gt *gt); void intel_gt_set_wedged_on_fini(struct intel_gt *gt); -int __intel_gt_reset(struct intel_gt *gt, intel_engine_mask_t engine_mask); +int intel_gt_reset_engine(struct intel_engine_cs *engine); +int intel_gt_reset_all_engines(struct intel_gt *gt); int intel_reset_guc(struct intel_gt *gt); diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c b/drivers/gpu/drm/i915/gt/selftest_reset.c index f40de408cd3a..2cfc23c58e90 100644 --- a/drivers/gpu/drm/i915/gt/selftest_reset.c +++ b/drivers/gpu/drm/i915/gt/selftest_reset.c @@ -281,7 +281,7 @@ static int igt_atomic_reset(void *arg) awake = reset_prepare(gt); p->critical_section_begin(); - err = __intel_gt_reset(gt, ALL_ENGINES); + err = intel_gt_reset_all_engines(gt); p->critical_section_end(); reset_finish(gt, awake); diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c index 4b9233c07a22..622a24305bc2 100644 --- a/drivers/gpu/drm/i915/i915_driver.c +++ b/drivers/gpu/drm/i915/i915_driver.c @@ -202,7 +202,7 @@ static void sanitize_gpu(struct drm_i915_private *i915) unsigned int i; for_each_gt(gt, i915, i) - __intel_gt_reset(gt, ALL_ENGINES); + intel_gt_reset_all_engines(gt); } } From patchwork Mon Apr 22 20:19:51 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nirmoy Das X-Patchwork-Id: 13638956 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 1B52AC10F15 for ; Mon, 22 Apr 2024 20:34:02 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id BBB9310EB5E; Mon, 22 Apr 2024 20:33:59 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="TLnAKju6"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) by gabe.freedesktop.org (Postfix) with ESMTPS id 3BBBE10EB5E; Mon, 22 Apr 2024 20:33:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1713818039; x=1745354039; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=LUZ7YaAWEq8ANrIZ/9zRKcBMPghyjgHguorL0pKOUiY=; b=TLnAKju6tUR5UcvP7zIw9eKwl7W4rqfYmOAz4a7qVDzXRKMR9uGhjgwc ViKjUY3/JODChXNjKaadullJ6tzaNGnPYtk+HilQMz7tBwJyiW4l7sFqv LBS1z5ttDBCf8gR7tXKdW9eGKUVVJbZLBfqTHyeaT3TT6f8a+aVPKmlBM AvHiRo3beJQY1nj3OKT/0Df3bsnDnSGCIoMljgs/orVpzh9Of8eUnWAAO BUbEHCB4GYoQBSpldBFJ9DxzRWc7cc0UZg67NUEs1fD1PZ2m4CZzcj5Cl yjWOBGek3I7blspKCzEcw8GEAQNcdKeCi1/ldEZNvl35SNe8KxdUNpN9X g==; X-CSE-ConnectionGUID: rbzH6IZuTk+450G3FHha5w== X-CSE-MsgGUID: 19/NFJW8TQOdq/SwS+PuRw== X-IronPort-AV: E=McAfee;i="6600,9927,11052"; a="20781600" X-IronPort-AV: E=Sophos;i="6.07,221,1708416000"; d="scan'208";a="20781600" Received: from fmviesa007.fm.intel.com ([10.60.135.147]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Apr 2024 13:33:59 -0700 X-CSE-ConnectionGUID: mtMPpH7xTeK3lBMhu7wdoA== X-CSE-MsgGUID: f0zSYgkWThCPLjiskmjTSQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.07,221,1708416000"; d="scan'208";a="24191728" Received: from nirmoyda-desk.igk.intel.com ([10.102.138.190]) by fmviesa007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Apr 2024 13:33:57 -0700 From: Nirmoy Das To: intel-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org, John.C.Harrison@Intel.com, Nirmoy Das , John Harrison Subject: [PATCH v2 2/2] drm/i915: Fix gt reset with GuC submission is disabled Date: Mon, 22 Apr 2024 22:19:51 +0200 Message-ID: <20240422201951.633-2-nirmoy.das@intel.com> X-Mailer: git-send-email 2.42.0 In-Reply-To: <20240422201951.633-1-nirmoy.das@intel.com> References: <20240422201951.633-1-nirmoy.das@intel.com> MIME-Version: 1.0 Organization: Intel Deutschland GmbH, Registered Address: Am Campeon 10, 85579 Neubiberg, Germany, Commercial Register: Amtsgericht Muenchen HRB 186928 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Currently intel_gt_reset() kills the GuC and then resets requested engines. This is problematic because there is a dedicated CSB FIFO which only GuC can access and if that FIFO fills up, the hardware will block on the next context switch until there is space that means the system is effectively hung. If an engine is reset whilst actively executing a context, a CSB entry will be sent to say that the context has gone idle. Thus if reset happens on a very busy system then killing GuC before killing the engines will lead to deadlock because of filled up CSB FIFO. To address this issue, the GuC should be killed only after resetting the requested engines and before calling intel_gt_init_hw(). v2: Improve commit message(John) Cc: John Harrison Signed-off-by: Nirmoy Das Reviewed-by: John Harrison Reviewed-by: Andi Shyti --- drivers/gpu/drm/i915/gt/intel_reset.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c index b1393863ca9b..6161f7a3ff70 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.c +++ b/drivers/gpu/drm/i915/gt/intel_reset.c @@ -879,8 +879,17 @@ static intel_engine_mask_t reset_prepare(struct intel_gt *gt) intel_engine_mask_t awake = 0; enum intel_engine_id id; - /* For GuC mode, ensure submission is disabled before stopping ring */ - intel_uc_reset_prepare(>->uc); + /** + * For GuC mode with submission enabled, ensure submission + * is disabled before stopping ring. + * + * For GuC mode with submission disabled, ensure that GuC is not + * sanitized, do that after engine reset. reset_prepare() + * is followed by engine reset which in this mode requires GuC to + * process any CSB FIFO entries generated by the resets. + */ + if (intel_uc_uses_guc_submission(>->uc)) + intel_uc_reset_prepare(>->uc); for_each_engine(engine, gt, id) { if (intel_engine_pm_get_if_awake(engine)) @@ -1227,6 +1236,9 @@ void intel_gt_reset(struct intel_gt *gt, intel_overlay_reset(gt->i915); + /* sanitize uC after engine reset */ + if (!intel_uc_uses_guc_submission(>->uc)) + intel_uc_reset_prepare(>->uc); /* * Next we need to restore the context, but we don't use those * yet either...