From patchwork Sun Sep 10 03:58:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alan Previn X-Patchwork-Id: 13378450 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B9491EE7FF4 for ; Sun, 10 Sep 2023 03:59:01 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 7143410E079; Sun, 10 Sep 2023 03:58:51 +0000 (UTC) Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.65]) by gabe.freedesktop.org (Postfix) with ESMTPS id DACDD10E075; Sun, 10 Sep 2023 03:58:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1694318328; x=1725854328; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=8kgE2vDd4envF6ybBIHG4KsloqsYw8n65ahuHXkC8AA=; b=dX78x3rJlD+0IT55PdlSKy+MMoR6JQxQtrzRSgafPZKB0s9u7Lz4OPXs KZGlFs5GbxYkA4re0VJzRNw8PnrPICisiTvsUITxYFDb+YKY6LOj7gUMI SSt0KZbW1dEfjrL4Dl5haMD4nDsaA0uz44z7gkloOEm/pRVsi0sseOPsD +4CzOKaaiHRhSXYmdYW60O7FXW+YZAxXpwBSU6NL6aFXgnJ7VICMLV0ak Hmi+wu6D320ydRfpL38X3vsbMoj96VSwhftIMZVo1UQMLPs/vE3WtJ6eB rvz5OkfhRi6ffP2BzqSksGRQH4iO5sWKwspQvVf890Ys16G0bmnIQOdyj g==; X-IronPort-AV: E=McAfee;i="6600,9927,10827"; a="381684300" X-IronPort-AV: E=Sophos;i="6.02,241,1688454000"; d="scan'208";a="381684300" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Sep 2023 20:58:48 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10827"; a="736377110" X-IronPort-AV: E=Sophos;i="6.02,241,1688454000"; d="scan'208";a="736377110" Received: from aalteres-desk.fm.intel.com ([10.80.57.53]) by orsmga007.jf.intel.com with ESMTP; 09 Sep 2023 20:58:48 -0700 From: Alan Previn To: intel-gfx@lists.freedesktop.org Subject: [PATCH v3 1/3] drm/i915/guc: Flush context destruction worker at suspend Date: Sat, 9 Sep 2023 20:58:44 -0700 Message-Id: <20230910035846.493766-2-alan.previn.teres.alexis@intel.com> X-Mailer: git-send-email 2.39.0 In-Reply-To: <20230910035846.493766-1-alan.previn.teres.alexis@intel.com> References: <20230910035846.493766-1-alan.previn.teres.alexis@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: , Alan Previn , dri-devel@lists.freedesktop.org, Daniele Ceraolo Spurio , Rodrigo Vivi , intel.com@freedesktop.org, Mousumi Jana , John Harrison Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" When suspending, flush the context-guc-id deregistration worker at the final stages of intel_gt_suspend_late when we finally call gt_sanitize that eventually leads down to __uc_sanitize so that the deregistration worker doesn't fire off later as we reset the GuC microcontroller. Signed-off-by: Alan Previn Tested-by: Mousumi Jana --- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 5 +++++ drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h | 2 ++ drivers/gpu/drm/i915/gt/uc/intel_uc.c | 2 ++ 3 files changed, 9 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index cabdc645fcdd..0ed44637bca0 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1578,6 +1578,11 @@ static void guc_flush_submissions(struct intel_guc *guc) spin_unlock_irqrestore(&sched_engine->lock, flags); } +void intel_guc_submission_flush_work(struct intel_guc *guc) +{ + flush_work(&guc->submission_state.destroyed_worker); +} + static void guc_flush_destroyed_contexts(struct intel_guc *guc); void intel_guc_submission_reset_prepare(struct intel_guc *guc) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h index c57b29cdb1a6..b6df75622d3b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h @@ -38,6 +38,8 @@ int intel_guc_wait_for_pending_msg(struct intel_guc *guc, bool interruptible, long timeout); +void intel_guc_submission_flush_work(struct intel_guc *guc); + static inline bool intel_guc_submission_is_supported(struct intel_guc *guc) { return guc->submission_supported; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c index 98b103375b7a..eb3554cb5ea4 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c @@ -693,6 +693,8 @@ void intel_uc_suspend(struct intel_uc *uc) return; } + intel_guc_submission_flush_work(guc); + with_intel_runtime_pm(&uc_to_gt(uc)->i915->runtime_pm, wakeref) { err = intel_guc_suspend(guc); if (err) From patchwork Sun Sep 10 03:58:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alan Previn X-Patchwork-Id: 13378449 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D926EEE7FF4 for ; Sun, 10 Sep 2023 03:58:54 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 1981210E075; Sun, 10 Sep 2023 03:58:51 +0000 (UTC) Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.65]) by gabe.freedesktop.org (Postfix) with ESMTPS id 17E6910E05F; Sun, 10 Sep 2023 03:58:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1694318329; x=1725854329; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=FKE8uSxct1Hx8LSGCmMYLWpkbQDDqWrF7kXyidNA0/s=; b=ZOoghJ7MKpKXYlrd5AClqaidzaa9ZyFL2dlvLpsmk47saEiIkz8R21oV kx3flKsXI16jSqw4vsBucy4K7qku0NZeBheFymeoHLXbd3VlmhjbNBfMk foxCzvUWFGx+zVWlx4GDgE1fAjMK85kJJuKBos8GwBSuF78zb2nz//unv 6IKqRCvcmbVpkbyQCDBGeW7L593ECkmyTkIPXWsXedEbGomT/8TlMD757 gPOxuiPq0xXEE0P+s/+1WHaoiTq+bl5ZEIiVpJvX8SZVqHq6fO+uDLyyR 4BaumVvweU1egNUp+/30YT8YntGExTxvYaUizXM8gHzHc8bgDT/VIry2e w==; X-IronPort-AV: E=McAfee;i="6600,9927,10827"; a="381684303" X-IronPort-AV: E=Sophos;i="6.02,241,1688454000"; d="scan'208";a="381684303" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Sep 2023 20:58:48 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10827"; a="736377114" X-IronPort-AV: E=Sophos;i="6.02,241,1688454000"; d="scan'208";a="736377114" Received: from aalteres-desk.fm.intel.com ([10.80.57.53]) by orsmga007.jf.intel.com with ESMTP; 09 Sep 2023 20:58:48 -0700 From: Alan Previn To: intel-gfx@lists.freedesktop.org Subject: [PATCH v3 2/3] drm/i915/guc: Close deregister-context race against CT-loss Date: Sat, 9 Sep 2023 20:58:45 -0700 Message-Id: <20230910035846.493766-3-alan.previn.teres.alexis@intel.com> X-Mailer: git-send-email 2.39.0 In-Reply-To: <20230910035846.493766-1-alan.previn.teres.alexis@intel.com> References: <20230910035846.493766-1-alan.previn.teres.alexis@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: , Alan Previn , dri-devel@lists.freedesktop.org, Daniele Ceraolo Spurio , Rodrigo Vivi , intel.com@freedesktop.org, Mousumi Jana , John Harrison Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" If we are at the end of suspend or very early in resume its possible an async fence signal could lead us to the execution of the context destruction worker (after the prior worker flush). Even if checking that the CT is enabled before calling destroyed_worker_func, guc_lrc_desc_unpin may still fail because in corner cases, as we traverse the GuC's context-destroy list, the CT could get disabled in the mid of it right before calling the GuC's CT send function. We've witnessed this race condition once every ~6000-8000 suspend-resume cycles while ensuring workloads that render something onscreen is continuously started just before we suspend (and the workload is small enough to complete and trigger the queued engine/context free-up either very late in suspend or very early in resume). In such a case, we need to unroll the unpin process because guc-lrc-unpin takes a gt wakeref which only gets released in the G2H IRQ reply that never comes through in this corner case. That will cascade into a kernel hang later at the tail end of suspend in this function: intel_wakeref_wait_for_idle(>->wakeref) (called by) - intel_gt_pm_wait_for_idle (called by) - wait_for_suspend Doing this unroll and keeping the context in the GuC's destroy-list will allow the context to get picked up on the next destroy worker invocation or purged as part of a major GuC sanitization or reset flow. Signed-off-by: Alan Previn Tested-by: Mousumi Jana --- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 76 ++++++++++++++++--- 1 file changed, 65 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 0ed44637bca0..db7df1217350 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -235,6 +235,13 @@ set_context_destroyed(struct intel_context *ce) ce->guc_state.sched_state |= SCHED_STATE_DESTROYED; } +static inline void +clr_context_destroyed(struct intel_context *ce) +{ + lockdep_assert_held(&ce->guc_state.lock); + ce->guc_state.sched_state &= ~SCHED_STATE_DESTROYED; +} + static inline bool context_pending_disable(struct intel_context *ce) { return ce->guc_state.sched_state & SCHED_STATE_PENDING_DISABLE; @@ -612,6 +619,8 @@ static int guc_submission_send_busy_loop(struct intel_guc *guc, u32 g2h_len_dw, bool loop) { + int ret; + /* * We always loop when a send requires a reply (i.e. g2h_len_dw > 0), * so we don't handle the case where we don't get a reply because we @@ -622,7 +631,11 @@ static int guc_submission_send_busy_loop(struct intel_guc *guc, if (g2h_len_dw) atomic_inc(&guc->outstanding_submission_g2h); - return intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop); + ret = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop); + if (ret) + atomic_dec(&guc->outstanding_submission_g2h); + + return ret; } int intel_guc_wait_for_pending_msg(struct intel_guc *guc, @@ -3173,12 +3186,13 @@ static void guc_context_close(struct intel_context *ce) spin_unlock_irqrestore(&ce->guc_state.lock, flags); } -static inline void guc_lrc_desc_unpin(struct intel_context *ce) +static inline int guc_lrc_desc_unpin(struct intel_context *ce) { struct intel_guc *guc = ce_to_guc(ce); struct intel_gt *gt = guc_to_gt(guc); unsigned long flags; bool disabled; + int ret; GEM_BUG_ON(!intel_gt_pm_is_awake(gt)); GEM_BUG_ON(!ctx_id_mapped(guc, ce->guc_id.id)); @@ -3188,19 +3202,33 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce) /* Seal race with Reset */ spin_lock_irqsave(&ce->guc_state.lock, flags); disabled = submission_disabled(guc); - if (likely(!disabled)) { - __intel_gt_pm_get(gt); - set_context_destroyed(ce); - clr_context_registered(ce); - } - spin_unlock_irqrestore(&ce->guc_state.lock, flags); if (unlikely(disabled)) { + spin_unlock_irqrestore(&ce->guc_state.lock, flags); release_guc_id(guc, ce); __guc_context_destroy(ce); - return; + return 0; } - deregister_context(ce, ce->guc_id.id); + /* GuC is active, lets destroy this context, + * but at this point we can still be racing with + * suspend, so we undo everything if the H2G fails + */ + + /* Change context state to destroyed and get gt-pm */ + __intel_gt_pm_get(gt); + set_context_destroyed(ce); + clr_context_registered(ce); + + ret = deregister_context(ce, ce->guc_id.id); + if (ret) { + /* Undo the state change and put gt-pm if that failed */ + set_context_registered(ce); + clr_context_destroyed(ce); + intel_gt_pm_put(gt); + } + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + + return 0; } static void __guc_context_destroy(struct intel_context *ce) @@ -3268,7 +3296,22 @@ static void deregister_destroyed_contexts(struct intel_guc *guc) if (!ce) break; - guc_lrc_desc_unpin(ce); + if (guc_lrc_desc_unpin(ce)) { + /* + * This means GuC's CT link severed mid-way which could happen + * in suspend-resume corner cases. In this case, put the + * context back into the destroyed_contexts list which will + * get picked up on the next context deregistration event or + * purged in a GuC sanitization event (reset/unload/wedged/...). + */ + spin_lock_irqsave(&guc->submission_state.lock, flags); + list_add_tail(&ce->destroyed_link, + &guc->submission_state.destroyed_contexts); + spin_unlock_irqrestore(&guc->submission_state.lock, flags); + /* Bail now since the list might never be emptied if h2gs fail */ + break; + } + } } @@ -3279,6 +3322,17 @@ static void destroyed_worker_func(struct work_struct *w) struct intel_gt *gt = guc_to_gt(guc); int tmp; + /* + * In rare cases we can get here via async context-free fence-signals that + * come very late in suspend flow or very early in resume flows. In these + * cases, GuC won't be ready but just skipping it here is fine as these + * pending-destroy-contexts get destroyed totally at GuC reset time at the + * end of suspend.. OR.. this worker can be picked up later on the next + * context destruction trigger after resume-completes + */ + if (!intel_guc_is_ready(guc)) + return; + with_intel_gt_pm(gt, tmp) deregister_destroyed_contexts(guc); } From patchwork Sun Sep 10 03:58:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alan Previn X-Patchwork-Id: 13378452 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 2B07BEE7FF4 for ; Sun, 10 Sep 2023 03:59:07 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id BF8EA10E12E; Sun, 10 Sep 2023 03:58:56 +0000 (UTC) Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.65]) by gabe.freedesktop.org (Postfix) with ESMTPS id 4C0CF10E075; Sun, 10 Sep 2023 03:58:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1694318329; x=1725854329; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=AefDBmB4tsmMpjWp05okuoGD+ZVs6gw0OKfYwGkno+k=; b=eSjF+TbxF2aozP9G5QFVpteTWaOR8Zla0thBWDd/v+SwymSOM5zCE/Bw rxDPK//I7TnBaEyx8TRFNzAXiPv55nymX30L2aVmqRZN4fNPBS9CaHbYh y9EQPCaaLP+eN40c4kS3g+75AdqiEXZ6aADbdgSK5M0HWKjDKP0sbmuMR eyW6LQU5bcRDBKxGr/njzXUrKkFtMxgfzTeHmNCU5ZNnOXzD2cLMuN0kT Edn019V9h58O8IJDGkeACCDMuk3rZOyJzBy3aiGR6oCRJvxA+kwGGIU5x cjJvHz/9tadvh9kZbf64ndRnS1QiBz9ZbyOQtAzwtyQIL5OG/fq70+bp7 g==; X-IronPort-AV: E=McAfee;i="6600,9927,10827"; a="381684306" X-IronPort-AV: E=Sophos;i="6.02,241,1688454000"; d="scan'208";a="381684306" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Sep 2023 20:58:49 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10827"; a="736377118" X-IronPort-AV: E=Sophos;i="6.02,241,1688454000"; d="scan'208";a="736377118" Received: from aalteres-desk.fm.intel.com ([10.80.57.53]) by orsmga007.jf.intel.com with ESMTP; 09 Sep 2023 20:58:48 -0700 From: Alan Previn To: intel-gfx@lists.freedesktop.org Subject: [PATCH v3 3/3] drm/i915/gt: Timeout when waiting for idle in suspending Date: Sat, 9 Sep 2023 20:58:46 -0700 Message-Id: <20230910035846.493766-4-alan.previn.teres.alexis@intel.com> X-Mailer: git-send-email 2.39.0 In-Reply-To: <20230910035846.493766-1-alan.previn.teres.alexis@intel.com> References: <20230910035846.493766-1-alan.previn.teres.alexis@intel.com> MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: , Alan Previn , dri-devel@lists.freedesktop.org, Daniele Ceraolo Spurio , Rodrigo Vivi , intel.com@freedesktop.org, Mousumi Jana , John Harrison Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" When suspending, add a timeout when calling intel_gt_pm_wait_for_idle else if we have a lost G2H event that holds a wakeref (which would be indicative of a bug elsewhere in the driver), driver will at least complete the suspend-resume cycle, (albeit not hitting all the targets for low power hw counters), instead of hanging in the kernel. Signed-off-by: Alan Previn Tested-by: Mousumi Jana Reviewed-by: Rodrigo Vivi --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 2 +- drivers/gpu/drm/i915/gt/intel_gt_pm.c | 7 ++++++- drivers/gpu/drm/i915/gt/intel_gt_pm.h | 7 ++++++- drivers/gpu/drm/i915/intel_wakeref.c | 14 ++++++++++---- drivers/gpu/drm/i915/intel_wakeref.h | 6 ++++-- 5 files changed, 27 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index dfb69fc977a0..4811f3be0332 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -688,7 +688,7 @@ void intel_engines_release(struct intel_gt *gt) if (!engine->release) continue; - intel_wakeref_wait_for_idle(&engine->wakeref); + intel_wakeref_wait_for_idle(&engine->wakeref, 0); GEM_BUG_ON(intel_engine_pm_is_awake(engine)); engine->release(engine); diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c index 5a942af0a14e..ca46aee72573 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c @@ -289,6 +289,8 @@ int intel_gt_resume(struct intel_gt *gt) static void wait_for_suspend(struct intel_gt *gt) { + int timeout_ms = CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT ? : 10000; + if (!intel_gt_pm_is_awake(gt)) return; @@ -301,7 +303,10 @@ static void wait_for_suspend(struct intel_gt *gt) intel_gt_retire_requests(gt); } - intel_gt_pm_wait_for_idle(gt); + /* we are suspending, so we shouldn't be waiting forever */ + if (intel_gt_pm_wait_timeout_for_idle(gt, timeout_ms) == -ETIMEDOUT) + gt_warn(gt, "bailing from %s after %d milisec timeout\n", + __func__, timeout_ms); } void intel_gt_suspend_prepare(struct intel_gt *gt) diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h index 6c9a46452364..5358acc2b5b1 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h @@ -68,7 +68,12 @@ static inline void intel_gt_pm_might_put(struct intel_gt *gt) static inline int intel_gt_pm_wait_for_idle(struct intel_gt *gt) { - return intel_wakeref_wait_for_idle(>->wakeref); + return intel_wakeref_wait_for_idle(>->wakeref, 0); +} + +static inline int intel_gt_pm_wait_timeout_for_idle(struct intel_gt *gt, int timeout_ms) +{ + return intel_wakeref_wait_for_idle(>->wakeref, timeout_ms); } void intel_gt_pm_init_early(struct intel_gt *gt); diff --git a/drivers/gpu/drm/i915/intel_wakeref.c b/drivers/gpu/drm/i915/intel_wakeref.c index 718f2f1b6174..383a37521415 100644 --- a/drivers/gpu/drm/i915/intel_wakeref.c +++ b/drivers/gpu/drm/i915/intel_wakeref.c @@ -111,14 +111,20 @@ void __intel_wakeref_init(struct intel_wakeref *wf, "wakeref.work", &key->work, 0); } -int intel_wakeref_wait_for_idle(struct intel_wakeref *wf) +int intel_wakeref_wait_for_idle(struct intel_wakeref *wf, int timeout_ms) { - int err; + int err = 0; might_sleep(); - err = wait_var_event_killable(&wf->wakeref, - !intel_wakeref_is_active(wf)); + if (!timeout_ms) + err = wait_var_event_killable(&wf->wakeref, + !intel_wakeref_is_active(wf)); + else if (wait_var_event_timeout(&wf->wakeref, + !intel_wakeref_is_active(wf), + msecs_to_jiffies(timeout_ms)) < 1) + err = -ETIMEDOUT; + if (err) return err; diff --git a/drivers/gpu/drm/i915/intel_wakeref.h b/drivers/gpu/drm/i915/intel_wakeref.h index ec881b097368..302694a780d2 100644 --- a/drivers/gpu/drm/i915/intel_wakeref.h +++ b/drivers/gpu/drm/i915/intel_wakeref.h @@ -251,15 +251,17 @@ __intel_wakeref_defer_park(struct intel_wakeref *wf) /** * intel_wakeref_wait_for_idle: Wait until the wakeref is idle * @wf: the wakeref + * @timeout_ms: Timeout in ms, 0 means never timeout. * * Wait for the earlier asynchronous release of the wakeref. Note * this will wait for any third party as well, so make sure you only wait * when you have control over the wakeref and trust no one else is acquiring * it. * - * Return: 0 on success, error code if killed. + * Returns 0 on success, -ETIMEDOUT upon a timeout, or the unlikely + * error propagation from wait_var_event_killable if timeout_ms is 0. */ -int intel_wakeref_wait_for_idle(struct intel_wakeref *wf); +int intel_wakeref_wait_for_idle(struct intel_wakeref *wf, int timeout_ms); struct intel_wakeref_auto { struct drm_i915_private *i915;