diff mbox series

i915/guc: Run busyness worker only if gt is awake

Message ID 20230912005228.1736471-1-umesh.nerlige.ramappa@intel.com (mailing list archive)
State New, archived
Headers show
Series i915/guc: Run busyness worker only if gt is awake | expand

Commit Message

Umesh Nerlige Ramappa Sept. 12, 2023, 12:52 a.m. UTC
The worker is canceled in the __gt_park path, but we still see it
running sometimes during suspend.

Only update stats if gt is awake. If not, intel_guc_busyness_park would
have already updated the stats. Note that we do not requeue the worker
if gt is not awake since intel_guc_busyness_unpark would do that at some
point.

If the gt was parked longer than time taken for GT timestamp to roll
over, we ignore those rollovers since we don't care about tracking the
exact GT time. We only care about roll overs when the gt is active and
running workloads.

v2 (Daniele)
- Edit commit message and code comment
- Use runtime pm in the worker
- Put runtime pm after enabling the worker
- Use Link tag and add Fixes tag

Link: https://gitlab.freedesktop.org/drm/intel/-/issues/7077
Fixes: 77cdd054dd2c ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 26 ++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

Comments

Daniele Ceraolo Spurio Sept. 14, 2023, 5:23 p.m. UTC | #1
On 9/11/2023 5:52 PM, Umesh Nerlige Ramappa wrote:
> The worker is canceled in the __gt_park path, but we still see it
> running sometimes during suspend.
>
> Only update stats if gt is awake. If not, intel_guc_busyness_park would
> have already updated the stats. Note that we do not requeue the worker
> if gt is not awake since intel_guc_busyness_unpark would do that at some
> point.
>
> If the gt was parked longer than time taken for GT timestamp to roll
> over, we ignore those rollovers since we don't care about tracking the
> exact GT time. We only care about roll overs when the gt is active and
> running workloads.
>
> v2 (Daniele)
> - Edit commit message and code comment
> - Use runtime pm in the worker
> - Put runtime pm after enabling the worker
> - Use Link tag and add Fixes tag
>
> Link: https://gitlab.freedesktop.org/drm/intel/-/issues/7077
> Fixes: 77cdd054dd2c ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu")
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 26 ++++++++++++++++---
>   1 file changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index e250bedf90fb..d37b255559a0 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -1461,6 +1461,24 @@ static void guc_timestamp_ping(struct work_struct *wrk)
>   	unsigned long index;
>   	int srcu, ret;
>   
> +	/*
> +	 * The worker is canceled in the __gt_park path, but we still see it
> +	 * running sometimes during suspend.

This sounds very vague and more like there is a bug in __gt_park. We 
actually do know that the issue on the park side is that the worker 
re-queues itself if it's running while we cancel it; that's not really a 
problem as long as we don't wake the device after it's gone to sleep 
(which is what your change below does), so this sentence should be 
reworded or dropped.

> +	 *
> +	 * Only update stats if gt is awake. If not, intel_guc_busyness_park
> +	 * would have already updated the stats. Note that we do not requeue the
> +	 * worker in this case since intel_guc_busyness_unpark would do that at
> +	 * some point.
> +	 *
> +	 * If the gt was parked longer than time taken for GT timestamp to roll
> +	 * over, we ignore those rollovers since we don't care about tracking
> +	 * the exact GT time. We only care about roll overs when the gt is
> +	 * active and running workloads.
> +	 */
> +	wakeref = intel_runtime_pm_get_if_active(&gt->i915->runtime_pm);

The patch title and the comment refer to GT being awake, but here you're 
taking a global rpm ref and not a gt_pm ref. I understand that taking a 
gt_pm ref in here can be complicated because of the canceling of the 
worker in the gt_park flow and that taking an rpm ref does work for what 
we need, but the title/comments need to reflect and explain that.

Daniele

> +	if (!wakeref)
> +		return;
> +
>   	/*
>   	 * Synchronize with gt reset to make sure the worker does not
>   	 * corrupt the engine/guc stats. NB: can't actually block waiting
> @@ -1469,10 +1487,9 @@ static void guc_timestamp_ping(struct work_struct *wrk)
>   	 */
>   	ret = intel_gt_reset_trylock(gt, &srcu);
>   	if (ret)
> -		return;
> +		goto err_trylock;
>   
> -	with_intel_runtime_pm(&gt->i915->runtime_pm, wakeref)
> -		__update_guc_busyness_stats(guc);
> +	__update_guc_busyness_stats(guc);
>   
>   	/* adjust context stats for overflow */
>   	xa_for_each(&guc->context_lookup, index, ce)
> @@ -1481,6 +1498,9 @@ static void guc_timestamp_ping(struct work_struct *wrk)
>   	intel_gt_reset_unlock(gt, srcu);
>   
>   	guc_enable_busyness_worker(guc);
> +
> +err_trylock:
> +	intel_runtime_pm_put(&gt->i915->runtime_pm, wakeref);
>   }
>   
>   static int guc_action_enable_usage_stats(struct intel_guc *guc)
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index e250bedf90fb..d37b255559a0 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1461,6 +1461,24 @@  static void guc_timestamp_ping(struct work_struct *wrk)
 	unsigned long index;
 	int srcu, ret;
 
+	/*
+	 * The worker is canceled in the __gt_park path, but we still see it
+	 * running sometimes during suspend.
+	 *
+	 * Only update stats if gt is awake. If not, intel_guc_busyness_park
+	 * would have already updated the stats. Note that we do not requeue the
+	 * worker in this case since intel_guc_busyness_unpark would do that at
+	 * some point.
+	 *
+	 * If the gt was parked longer than time taken for GT timestamp to roll
+	 * over, we ignore those rollovers since we don't care about tracking
+	 * the exact GT time. We only care about roll overs when the gt is
+	 * active and running workloads.
+	 */
+	wakeref = intel_runtime_pm_get_if_active(&gt->i915->runtime_pm);
+	if (!wakeref)
+		return;
+
 	/*
 	 * Synchronize with gt reset to make sure the worker does not
 	 * corrupt the engine/guc stats. NB: can't actually block waiting
@@ -1469,10 +1487,9 @@  static void guc_timestamp_ping(struct work_struct *wrk)
 	 */
 	ret = intel_gt_reset_trylock(gt, &srcu);
 	if (ret)
-		return;
+		goto err_trylock;
 
-	with_intel_runtime_pm(&gt->i915->runtime_pm, wakeref)
-		__update_guc_busyness_stats(guc);
+	__update_guc_busyness_stats(guc);
 
 	/* adjust context stats for overflow */
 	xa_for_each(&guc->context_lookup, index, ce)
@@ -1481,6 +1498,9 @@  static void guc_timestamp_ping(struct work_struct *wrk)
 	intel_gt_reset_unlock(gt, srcu);
 
 	guc_enable_busyness_worker(guc);
+
+err_trylock:
+	intel_runtime_pm_put(&gt->i915->runtime_pm, wakeref);
 }
 
 static int guc_action_enable_usage_stats(struct intel_guc *guc)