From patchwork Sat Oct 14 01:04:11 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alan Previn
X-Patchwork-Id: 13421877
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by smtp.lore.kernel.org (Postfix) with ESMTPS id 19E63CDB483
 for ; Sat, 14 Oct 2023 01:04:27 +0000 (UTC)
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
 by gabe.freedesktop.org (Postfix) with ESMTP id F3F9710E08B;
 Sat, 14 Oct 2023 01:04:18 +0000 (UTC)
Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.136])
 by gabe.freedesktop.org (Postfix) with ESMTPS id DE0F810E07A;
 Sat, 14 Oct 2023 01:04:16 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
 d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
 t=1697245456; x=1728781456;
 h=from:to:cc:subject:date:message-id:in-reply-to:
 references:mime-version:content-transfer-encoding;
 bh=/TweaP5OeguMJ81EfpX5BPbRMgW6OrN2SdGzrAILsSM=;
 b=MY9OUZWp6Uh3mUDE7/1ALF4crSWGXEhfPLkQuNdubGkLy0u9AB2egtk5
 j/OY8Wm6WeoYIH6S3IW+vrQn1GVetzRZ7nUZkfrfH5knBeW4/ykvOLag1
 T05+VJzBU20fHESne/EkLu4danzETJgj6XqP+hVUHxW9IEhXmFBk40VIu
 ZP7Up675muvgfpyUe/AXAzGJSZze1GfJQPKNGO3SkioeMJPNfVE4g98jX
 8c0v9OcgyqL8x5HK/J/yEcouB8tKJioZBqqdECMef3HrU+4QVpJWcFokJ
 AnWTQLg5IlUzPbZpIRE7NOtJo3rufFMvx9CsPpsr2qxkJVUxO6wk1oFzl
 Q==;
X-IronPort-AV: E=McAfee;i="6600,9927,10862"; a="364656949"
X-IronPort-AV: E=Sophos;i="6.03,223,1694761200";
 d="scan'208";a="364656949"
Received: from fmsmga001.fm.intel.com ([10.253.24.23])
 by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 13 Oct 2023 18:04:16 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=McAfee;i="6600,9927,10862"; a="898756734"
X-IronPort-AV: E=Sophos;i="6.03,223,1694761200";
 d="scan'208";a="898756734"
Received: from aalteres-desk.fm.intel.com ([10.80.57.53])
 by fmsmga001.fm.intel.com with ESMTP; 13 Oct 2023 18:02:24 -0700
From: Alan Previn
To: intel-gfx@lists.freedesktop.org
Subject: [PATCH v5 1/3] drm/i915/guc: Flush context destruction worker at suspend
Date: Fri, 13 Oct 2023 18:04:11 -0700
Message-Id: <20231014010413.256468-2-alan.previn.teres.alexis@intel.com>
X-Mailer: git-send-email 2.39.0
In-Reply-To: <20231014010413.256468-1-alan.previn.teres.alexis@intel.com>
References: <20231014010413.256468-1-alan.previn.teres.alexis@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Alan Previn , Tvrtko Ursulin , Anshuman Gupta , dri-devel@lists.freedesktop.org, Daniele Ceraolo Spurio , Rodrigo Vivi , Mousumi Jana , John Harrison
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"

When suspending, flush the context-guc-id deregistration worker in
the final stages of intel_gt_suspend_late, where gt_sanitize is
finally called and eventually leads down to __uc_sanitize. This keeps
the deregistration worker from firing later, while the GuC
microcontroller is being reset.
Signed-off-by: Alan Previn
Reviewed-by: Rodrigo Vivi
Tested-by: Mousumi Jana
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 5 +++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h | 2 ++
 drivers/gpu/drm/i915/gt/uc/intel_uc.c             | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 2cce5ec1ff00..a5b68f77e494 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1610,6 +1610,11 @@ static void guc_flush_submissions(struct intel_guc *guc)
 	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }

+void intel_guc_submission_flush_work(struct intel_guc *guc)
+{
+	flush_work(&guc->submission_state.destroyed_worker);
+}
+
 static void guc_flush_destroyed_contexts(struct intel_guc *guc);

 void intel_guc_submission_reset_prepare(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index c57b29cdb1a6..b6df75622d3b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -38,6 +38,8 @@ int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
 				   bool interruptible,
 				   long timeout);

+void intel_guc_submission_flush_work(struct intel_guc *guc);
+
 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
 	return guc->submission_supported;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 98b103375b7a..eb3554cb5ea4 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -693,6 +693,8 @@ void intel_uc_suspend(struct intel_uc *uc)
 		return;
 	}

+	intel_guc_submission_flush_work(guc);
+
 	with_intel_runtime_pm(&uc_to_gt(uc)->i915->runtime_pm, wakeref) {
 		err = intel_guc_suspend(guc);
 		if (err)

From patchwork Sat Oct 14 01:04:12 2023
Content-Type: text/plain;
charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alan Previn
X-Patchwork-Id: 13421876
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by smtp.lore.kernel.org (Postfix) with ESMTPS id 17026CDB483
 for ; Sat, 14 Oct 2023 01:04:22 +0000 (UTC)
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
 by gabe.freedesktop.org (Postfix) with ESMTP id 4B84E10E07A;
 Sat, 14 Oct 2023 01:04:18 +0000 (UTC)
Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.136])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 101BC10E034;
 Sat, 14 Oct 2023 01:04:17 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
 d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
 t=1697245457; x=1728781457;
 h=from:to:cc:subject:date:message-id:in-reply-to:
 references:mime-version:content-transfer-encoding;
 bh=Y+0C/6NQeEsCCkcqyHmXtzobTsX44LL3xon0zEvOSiI=;
 b=hbR0pqRCchqt7eKXb3XwrtCKkhy2nt9Uejy6kbdebIN34H/+Kw6XAj81
 GTcTQYs0NvhdKk6/wMIRZ6oPYr3YUQbi2tg0Ay93w6OQZelzZNlvtm+qf
 XcvGbMw4auxAft0KEBQ46rlbDoA3Np9p+FjHpgeLnW6kE6WK9/YVl2Lp4
 Gkx9GOHFZZjwbgz9AM5r5JAz7L0O5NEevaj8gn3RUG0mqN5k1W0gSBe7U
 ntDoR3LB3fUso+OQNbkEmr55mrP3jgvLR32PeD8AfrWQEHbeIbTbMFxbD
 bl5h1Bqj+g5w96LYf8VQ0QnfS44s/wy3xKXL+//nm1kIGgw0+va/5XKHl
 A==;
X-IronPort-AV: E=McAfee;i="6600,9927,10862"; a="364656950"
X-IronPort-AV: E=Sophos;i="6.03,223,1694761200";
 d="scan'208";a="364656950"
Received: from fmsmga001.fm.intel.com ([10.253.24.23])
 by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 13 Oct 2023 18:04:16 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=McAfee;i="6600,9927,10862"; a="898756737"
X-IronPort-AV: E=Sophos;i="6.03,223,1694761200";
 d="scan'208";a="898756737"
Received: from aalteres-desk.fm.intel.com ([10.80.57.53]) by
 fmsmga001.fm.intel.com with ESMTP; 13 Oct 2023 18:02:24 -0700
From: Alan Previn
To: intel-gfx@lists.freedesktop.org
Subject: [PATCH v5 2/3] drm/i915/guc: Close deregister-context race against CT-loss
Date: Fri, 13 Oct 2023 18:04:12 -0700
Message-Id: <20231014010413.256468-3-alan.previn.teres.alexis@intel.com>
X-Mailer: git-send-email 2.39.0
In-Reply-To: <20231014010413.256468-1-alan.previn.teres.alexis@intel.com>
References: <20231014010413.256468-1-alan.previn.teres.alexis@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Alan Previn , Tvrtko Ursulin , Anshuman Gupta , dri-devel@lists.freedesktop.org, Daniele Ceraolo Spurio , Rodrigo Vivi , Mousumi Jana , John Harrison
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"

If we are at the end of suspend or very early in resume, it's
possible that an async fence signal (via call_rcu) triggers
free_engines, which could lead to the execution of the context
destruction worker (after a prior worker flush).

Thus, when suspending, insert rcu_barriers at the start
of i915_gem_suspend (part of driver's suspend prepare) and
again in i915_gem_suspend_late so that all such cases have
completed and the context destruction list isn't missing anything.

In destroyed_worker_func, close the race against CT-loss
by checking that CT is enabled before calling into
deregister_destroyed_contexts.

Based on testing, guc_lrc_desc_unpin may still race and fail
as we traverse the GuC's context-destroy list because the
CT could be disabled right before calling GuC's CT send function.
We've witnessed this race condition once every ~6000-8000
suspend-resume cycles while ensuring that workloads that render
something onscreen are continuously started just before
we suspend (and the workload is small enough to complete
and trigger the queued engine/context free-up either very
late in suspend or very early in resume).

In such a case, we need to unroll the entire process because
guc-lrc-unpin takes a gt wakeref which only gets released in
the G2H IRQ reply that never comes through in this corner
case. Without the unroll, the taken wakeref is leaked and will
cascade into a kernel hang later at the tail end of suspend in
this function:

   intel_wakeref_wait_for_idle(&gt->wakeref)
   (called by) - intel_gt_pm_wait_for_idle
   (called by) - wait_for_suspend

Thus, do an unroll in guc_lrc_desc_unpin and
deregister_destroyed_contexts if guc_lrc_desc_unpin fails due
to CT send failure. When unrolling, keep the context in the GuC's
destroy-list so it can get picked up on the next destroy worker
invocation (if suspend aborted) or get fully purged as part of a
GuC sanitization (end of suspend) or a reset flow.

Signed-off-by: Alan Previn
Signed-off-by: Anshuman Gupta
Tested-by: Mousumi Jana
Acked-by: Anshuman Gupta
---
 drivers/gpu/drm/i915/gem/i915_gem_pm.c        | 10 +++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 81 ++++++++++++++++---
 2 files changed, 80 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
index 0d812f4d787d..3b27218aabe2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
@@ -28,6 +28,13 @@ void i915_gem_suspend(struct drm_i915_private *i915)
 	GEM_TRACE("%s\n", dev_name(i915->drm.dev));

 	intel_wakeref_auto(&i915->runtime_pm.userfault_wakeref, 0);
+	/*
+	 * On rare occasions, we've observed the fence completion triggers
+	 * free_engines asynchronously via call_rcu. Ensure those are done.
+	 * This path is only called on suspend, so it's an acceptable cost.
+	 */
+	rcu_barrier();
+
 	flush_workqueue(i915->wq);

 	/*
@@ -160,6 +167,9 @@ void i915_gem_suspend_late(struct drm_i915_private *i915)
 	 * machine in an unusable condition.
 	 */

+	/* Like i915_gem_suspend, flush tasks staged from fence triggers */
+	rcu_barrier();
+
 	for_each_gt(gt, i915, i)
 		intel_gt_suspend_late(gt);

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index a5b68f77e494..9806b33c8561 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -235,6 +235,13 @@ set_context_destroyed(struct intel_context *ce)
 	ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
 }

+static inline void
+clr_context_destroyed(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+	ce->guc_state.sched_state &= ~SCHED_STATE_DESTROYED;
+}
+
 static inline bool context_pending_disable(struct intel_context *ce)
 {
 	return ce->guc_state.sched_state & SCHED_STATE_PENDING_DISABLE;
@@ -612,6 +619,8 @@ static int guc_submission_send_busy_loop(struct intel_guc *guc,
 					 u32 g2h_len_dw,
 					 bool loop)
 {
+	int ret;
+
 	/*
 	 * We always loop when a send requires a reply (i.e. g2h_len_dw > 0),
 	 * so we don't handle the case where we don't get a reply because we
@@ -622,7 +631,11 @@ static int guc_submission_send_busy_loop(struct intel_guc *guc,
 	if (g2h_len_dw)
 		atomic_inc(&guc->outstanding_submission_g2h);

-	return intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+	ret = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+	if (ret && g2h_len_dw)
+		atomic_dec(&guc->outstanding_submission_g2h);
+
+	return ret;
 }

 int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
@@ -3205,12 +3218,13 @@ static void guc_context_close(struct intel_context *ce)
 	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 }

-static inline void guc_lrc_desc_unpin(struct intel_context *ce)
+static inline int guc_lrc_desc_unpin(struct intel_context *ce)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
 	struct intel_gt *gt = guc_to_gt(guc);
 	unsigned long flags;
 	bool disabled;
+	int ret;

 	GEM_BUG_ON(!intel_gt_pm_is_awake(gt));
 	GEM_BUG_ON(!ctx_id_mapped(guc, ce->guc_id.id));
@@ -3220,19 +3234,38 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce)
 	/* Seal race with Reset */
 	spin_lock_irqsave(&ce->guc_state.lock, flags);
 	disabled = submission_disabled(guc);
-	if (likely(!disabled)) {
-		__intel_gt_pm_get(gt);
-		set_context_destroyed(ce);
-		clr_context_registered(ce);
-	}
-	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 	if (unlikely(disabled)) {
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 		release_guc_id(guc, ce);
 		__guc_context_destroy(ce);
-		return;
+		return 0;
 	}

-	deregister_context(ce, ce->guc_id.id);
+	/* GuC is active, let's destroy this context,
+	 * but at this point we can still be racing with
+	 * suspend, so we undo everything if the H2G fails
+	 */
+
+	/* Change context state to destroyed and get gt-pm */
+	__intel_gt_pm_get(gt);
+	set_context_destroyed(ce);
+	clr_context_registered(ce);
+
+	ret = deregister_context(ce, ce->guc_id.id);
+	if (ret) {
+		/* Undo the state change and put gt-pm if that failed */
+		set_context_registered(ce);
+		clr_context_destroyed(ce);
+		/*
+		 * Don't use might_sleep / ASYNC version of put because
+		 * CT loss in deregister_context could mean an ongoing
+		 * reset or suspend flow. Immediately put before the unlock
+		 */
+		__intel_wakeref_put(&gt->wakeref, 0);
+	}
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+	return ret;
 }

 static void __guc_context_destroy(struct intel_context *ce)
@@ -3300,7 +3333,22 @@ static void deregister_destroyed_contexts(struct intel_guc *guc)
 		if (!ce)
 			break;

-		guc_lrc_desc_unpin(ce);
+		if (guc_lrc_desc_unpin(ce)) {
+			/*
+			 * This means GuC's CT link severed mid-way which could happen
+			 * in suspend-resume corner cases. In this case, put the
+			 * context back into the destroyed_contexts list which will
+			 * get picked up on the next context deregistration event or
+			 * purged in a GuC sanitization event (reset/unload/wedged/...).
+			 */
+			spin_lock_irqsave(&guc->submission_state.lock, flags);
+			list_add_tail(&ce->destroyed_link,
+				      &guc->submission_state.destroyed_contexts);
+			spin_unlock_irqrestore(&guc->submission_state.lock, flags);
+			/* Bail now since the list might never be emptied if h2gs fail */
+			break;
+		}
+
 	}
 }

@@ -3311,6 +3359,17 @@ static void destroyed_worker_func(struct work_struct *w)
 	struct intel_gt *gt = guc_to_gt(guc);
 	int tmp;

+	/*
+	 * In rare cases we can get here via async context-free fence-signals that
+	 * come very late in suspend flow or very early in resume flows. In these
+	 * cases, GuC won't be ready but just skipping it here is fine as these
+	 * pending-destroy-contexts get destroyed totally at GuC reset time at the
+	 * end of suspend.. OR.. this worker can be picked up later on the next
+	 * context destruction trigger after resume-completes
+	 */
+	if (!intel_guc_is_ready(guc))
+		return;
+
 	with_intel_gt_pm(gt, tmp)
 		deregister_destroyed_contexts(guc);
 }

From patchwork Sat Oct 14 01:04:13 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alan Previn
X-Patchwork-Id: 13421879
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by smtp.lore.kernel.org (Postfix) with ESMTPS id 84B83CDB482
 for ; Sat, 14 Oct 2023 01:04:33 +0000 (UTC)
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
 by gabe.freedesktop.org (Postfix) with ESMTP id 4B65E10E0A1;
 Sat, 14 Oct 2023 01:04:24 +0000 (UTC)
Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.136])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 3B47F10E07A;
 Sat, 14 Oct 2023 01:04:17 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
 d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
 t=1697245457; x=1728781457;
 h=from:to:cc:subject:date:message-id:in-reply-to:
 references:mime-version:content-transfer-encoding;
 bh=BrSK/Ywnd8ITyDiA5Uq1XljbxoBnWJoRU9KoKD8ZEhk=;
 b=eNSWtrk+14y/zxwGM/umCPuwZapc8rKDnKpOK5WAiOJzVyb9g4zIOSDA
 TMs4MERQbI1wNT6sRVqpacxikA/IU/dWbsgTnRktAdlzlqMIhXtXVTwBm
 MbLPwsvejZexTv5Kftda91M8AAwDtpaHDk3/DLtX6+5hFTfvKSPsCi5WS
 unCd1Hj0IddVBNGadBDltBPtVXGy0sCI1loQNL4Y5rKccMb2F9dUbsThP
 64wLMT/vIZzA4aAUIANvFoc+ouUp17I1fJpTQhkoL+EzbsyAH++7iZZBy
 ZsM5C6OJpfIEBlqzdhbN8JMcB1qCT+VNlq3jmnEaXxFQE10gLqjBo6Wsr
 w==;
X-IronPort-AV: E=McAfee;i="6600,9927,10862"; a="364656951"
X-IronPort-AV: E=Sophos;i="6.03,223,1694761200";
 d="scan'208";a="364656951"
Received: from fmsmga001.fm.intel.com ([10.253.24.23])
 by fmsmga106.fm.intel.com with
 ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Oct 2023 18:04:16 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=McAfee;i="6600,9927,10862"; a="898756740"
X-IronPort-AV: E=Sophos;i="6.03,223,1694761200";
 d="scan'208";a="898756740"
Received: from aalteres-desk.fm.intel.com ([10.80.57.53])
 by fmsmga001.fm.intel.com with ESMTP; 13 Oct 2023 18:02:24 -0700
From: Alan Previn
To: intel-gfx@lists.freedesktop.org
Subject: [PATCH v5 3/3] drm/i915/gt: Timeout when waiting for idle in suspending
Date: Fri, 13 Oct 2023 18:04:13 -0700
Message-Id: <20231014010413.256468-4-alan.previn.teres.alexis@intel.com>
X-Mailer: git-send-email 2.39.0
In-Reply-To: <20231014010413.256468-1-alan.previn.teres.alexis@intel.com>
References: <20231014010413.256468-1-alan.previn.teres.alexis@intel.com>
MIME-Version: 1.0
X-BeenThere: dri-devel@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Direct Rendering Infrastructure - Development
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Alan Previn , Tvrtko Ursulin , Anshuman Gupta , dri-devel@lists.freedesktop.org, Daniele Ceraolo Spurio , Rodrigo Vivi , Mousumi Jana , John Harrison
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"

When suspending, add a timeout to the intel_gt_pm_wait_for_idle
call. Otherwise, if a lost G2H event leaves a wakeref held (which
would be indicative of a bug elsewhere in the driver), the wait
never completes. With the timeout, the driver will at least
complete the suspend-resume cycle (albeit not hitting all the
targets for low power hw counters) instead of hanging in the kernel.
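As a rough userspace illustration of the timeout convention described
above (0 means wait without a deadline, any positive value gives up
with -ETIMEDOUT), a minimal sketch follows. It is not driver code:
struct wakeref_model, now_ms() and wakeref_model_wait_for_idle() are
hypothetical names, and the polling loop only stands in for the
kernel's wait_var_event machinery.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical userspace stand-in for struct intel_wakeref. */
struct wakeref_model {
	atomic_int count;	/* > 0 while a reference is still held */
};

/* Monotonic clock in milliseconds. */
static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * timeout_ms == 0: wait until idle, however long that takes.
 * timeout_ms  > 0: give up with -ETIMEDOUT once the budget is spent.
 */
static int wakeref_model_wait_for_idle(struct wakeref_model *wf, int timeout_ms)
{
	const struct timespec tick = { .tv_sec = 0, .tv_nsec = 1000000 };
	long long deadline = now_ms() + timeout_ms;

	while (atomic_load(&wf->count) > 0) {
		if (timeout_ms && now_ms() >= deadline)
			return -ETIMEDOUT;
		/* Polling here replaces the kernel's wait queue. */
		nanosleep(&tick, NULL);
	}
	return 0;
}
```

A caller on the suspend path would pass a positive budget and only
warn on -ETIMEDOUT, so a leaked reference degrades power behaviour
rather than blocking suspend.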
Signed-off-by: Alan Previn
Reviewed-by: Rodrigo Vivi
Tested-by: Mousumi Jana
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c |  2 +-
 drivers/gpu/drm/i915/gt/intel_gt_pm.c     |  7 ++++++-
 drivers/gpu/drm/i915/gt/intel_gt_pm.h     |  7 ++++++-
 drivers/gpu/drm/i915/intel_wakeref.c      | 14 ++++++++++----
 drivers/gpu/drm/i915/intel_wakeref.h      |  6 ++++--
 5 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 179d9546865b..4b45a8f7fe5a 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -686,7 +686,7 @@ void intel_engines_release(struct intel_gt *gt)
 		if (!engine->release)
 			continue;

-		intel_wakeref_wait_for_idle(&engine->wakeref);
+		intel_wakeref_wait_for_idle(&engine->wakeref, 0);
 		GEM_BUG_ON(intel_engine_pm_is_awake(engine));

 		engine->release(engine);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index f5899d503e23..25cb39ba9fdf 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -306,6 +306,8 @@ int intel_gt_resume(struct intel_gt *gt)

 static void wait_for_suspend(struct intel_gt *gt)
 {
+	int final_timeout_ms = (I915_GT_SUSPEND_IDLE_TIMEOUT * 10);
+
 	if (!intel_gt_pm_is_awake(gt))
 		return;

@@ -318,7 +320,10 @@ static void wait_for_suspend(struct intel_gt *gt)
 		intel_gt_retire_requests(gt);
 	}

-	intel_gt_pm_wait_for_idle(gt);
+	/* we are suspending, so we shouldn't be waiting forever */
+	if (intel_gt_pm_wait_timeout_for_idle(gt, final_timeout_ms) == -ETIMEDOUT)
+		gt_warn(gt, "bailing from %s after %d millisec timeout\n",
+			__func__, final_timeout_ms);
 }

 void intel_gt_suspend_prepare(struct intel_gt *gt)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index b1eeb5b33918..1757ca4c3077 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -68,7 +68,12 @@ static inline void intel_gt_pm_might_put(struct intel_gt *gt)

 static inline int intel_gt_pm_wait_for_idle(struct intel_gt *gt)
 {
-	return intel_wakeref_wait_for_idle(&gt->wakeref);
+	return intel_wakeref_wait_for_idle(&gt->wakeref, 0);
+}
+
+static inline int intel_gt_pm_wait_timeout_for_idle(struct intel_gt *gt, int timeout_ms)
+{
+	return intel_wakeref_wait_for_idle(&gt->wakeref, timeout_ms);
 }

 void intel_gt_pm_init_early(struct intel_gt *gt);
diff --git a/drivers/gpu/drm/i915/intel_wakeref.c b/drivers/gpu/drm/i915/intel_wakeref.c
index 623a69089386..f2611c65246b 100644
--- a/drivers/gpu/drm/i915/intel_wakeref.c
+++ b/drivers/gpu/drm/i915/intel_wakeref.c
@@ -113,14 +113,20 @@ void __intel_wakeref_init(struct intel_wakeref *wf,
 			 "wakeref.work", &key->work, 0);
 }

-int intel_wakeref_wait_for_idle(struct intel_wakeref *wf)
+int intel_wakeref_wait_for_idle(struct intel_wakeref *wf, int timeout_ms)
 {
-	int err;
+	int err = 0;

 	might_sleep();

-	err = wait_var_event_killable(&wf->wakeref,
-				      !intel_wakeref_is_active(wf));
+	if (!timeout_ms)
+		err = wait_var_event_killable(&wf->wakeref,
+					      !intel_wakeref_is_active(wf));
+	else if (wait_var_event_timeout(&wf->wakeref,
+					!intel_wakeref_is_active(wf),
+					msecs_to_jiffies(timeout_ms)) < 1)
+		err = -ETIMEDOUT;
+
 	if (err)
 		return err;

diff --git a/drivers/gpu/drm/i915/intel_wakeref.h b/drivers/gpu/drm/i915/intel_wakeref.h
index ec881b097368..302694a780d2 100644
--- a/drivers/gpu/drm/i915/intel_wakeref.h
+++ b/drivers/gpu/drm/i915/intel_wakeref.h
@@ -251,15 +251,17 @@ __intel_wakeref_defer_park(struct intel_wakeref *wf)
 /**
  * intel_wakeref_wait_for_idle: Wait until the wakeref is idle
  * @wf: the wakeref
+ * @timeout_ms: Timeout in ms, 0 means never timeout.
  *
  * Wait for the earlier asynchronous release of the wakeref. Note
  * this will wait for any third party as well, so make sure you only wait
  * when you have control over the wakeref and trust no one else is acquiring
  * it.
  *
- * Return: 0 on success, error code if killed.
+ * Returns 0 on success, -ETIMEDOUT upon a timeout, or the unlikely
+ * error propagation from wait_var_event_killable if timeout_ms is 0.
  */
-int intel_wakeref_wait_for_idle(struct intel_wakeref *wf);
+int intel_wakeref_wait_for_idle(struct intel_wakeref *wf, int timeout_ms);

 struct intel_wakeref_auto {
 	struct drm_i915_private *i915;
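
The unroll that patch 2/3 adds to guc_lrc_desc_unpin and
deregister_destroyed_contexts can be modelled in plain userspace C.
The sketch below is only an illustration under stated assumptions:
gt_model, ctx_model, send_deregister(), ctx_unpin() and
drain_destroyed() are hypothetical names, locking is omitted, and
-ENODEV is an assumed placeholder for whatever error the CT send path
returns when the channel is down.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the context and GT state involved. */
struct ctx_model {
	bool registered;
	bool destroyed;
};

struct gt_model {
	int wakeref;		/* references waiting on a G2H reply */
	bool ct_enabled;	/* false once the CT channel is lost */
};

/* Pretend H2G send: fails when the channel is down (assumed errno). */
static int send_deregister(const struct gt_model *gt)
{
	return gt->ct_enabled ? 0 : -ENODEV;
}

/* Take the reference and flip state first, undo both if the send fails. */
static int ctx_unpin(struct gt_model *gt, struct ctx_model *ce)
{
	int ret;

	gt->wakeref++;		/* normally dropped by the G2H reply */
	ce->destroyed = true;
	ce->registered = false;

	ret = send_deregister(gt);
	if (ret) {
		/* No reply will ever arrive: restore state, drop the ref. */
		ce->registered = true;
		ce->destroyed = false;
		gt->wakeref--;
	}
	return ret;
}

/* Drain pending[0..n) in order; returns how many entries stay queued. */
static size_t drain_destroyed(struct gt_model *gt, struct ctx_model *pending,
			      size_t n)
{
	size_t done = 0;

	while (done < n) {
		/* On failure keep this and later entries queued and bail. */
		if (ctx_unpin(gt, &pending[done]))
			break;
		done++;
	}
	return n - done;
}
```

With the channel down nothing is consumed and no reference leaks, so
a later wait for idle is not blocked; once the channel is back, the
same queue drains and each entry holds one reference until its reply.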