diff mbox series

[1/2] i915/pmu: Add support for total context runtime for GuC back-end

Message ID 20230427224705.2785566-2-umesh.nerlige.ramappa@intel.com (mailing list archive)
State New, archived
Headers show
Series fdinfo: Enable some support for GuC based client busyness | expand

Commit Message

Umesh Nerlige Ramappa April 27, 2023, 10:47 p.m. UTC
GPU accumulates the context runtime in a 32 bit counter - CTX_TIMESTAMP
in the context image. This value is saved/restored on context switches.
KMD accumulates these values into a 64 bit counter taking care of any
overflows as needed. This count provides the basis for client specific
busyness in the fdinfo interface.

KMD accumulation happens just before the context is unpinned and when
context switches out. This works for execlist back-end since execlist
scheduling has visibility into context switches. With GuC mode, KMD does
not have visibility into context switches and this counter is
accumulated only when context is unpinned. Context is unpinned once the
context scheduling is successfully disabled. Disabling context
scheduling is an asynchronous operation. Also if a context is servicing
frequent requests, scheduling may never be disabled on it.

For GuC mode, since updates to the context runtime may be delayed, add
hooks to update the context runtime in a worker thread as well as when
a user queries for it.

Limitation:
- If a context is never switched out or runs for a long period of time,
  the runtime value of CTX_TIMESTAMP may never be updated, so the
  counter value may be unreliable. This patch does not support such
  cases. Such support must be available from the GuC FW and it is WIP.

This patch is an extract from previous work authored by John/Umesh here -
https://patchwork.freedesktop.org/patch/496441/?series=105085&rev=4

v2: (Ashutosh)
- Drop COPS_RUNTIME_ACTIVE_TOTAL
- s/guc_context_update_clks/__guc_context_update_stats
- Pin context before accessing in guc_timestamp_ping
- In guc_context_unpin, use spinlock to serialize access to runtime stats

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Co-developed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       |  5 ++-
 drivers/gpu/drm/i915/gt/intel_context.h       |  2 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |  2 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 31 +++++++++++++++++++
 4 files changed, 38 insertions(+), 2 deletions(-)

Comments

Dixit, Ashutosh April 28, 2023, 9:55 p.m. UTC | #1
On Thu, 27 Apr 2023 15:47:04 -0700, Umesh Nerlige Ramappa wrote:
>
> GPU accumulates the context runtime in a 32 bit counter - CTX_TIMESTAMP
> in the context image. This value is saved/restored on context switches.
> KMD accumulates these values into a 64 bit counter taking care of any
> overflows as needed. This count provides the basis for client specific
> busyness in the fdinfo interface.
>
> KMD accumulation happens just before the context is unpinned and when
> context switches out. This works for execlist back-end since execlist
> scheduling has visibility into context switches. With GuC mode, KMD does
> not have visibility into context switches and this counter is
> accumulated only when context is unpinned. Context is unpinned once the
> context scheduling is successfully disabled. Disabling context
> scheduling is an asynchronous operation. Also if a context is servicing
> frequent requests, scheduling may never be disabled on it.
>
> For GuC mode, since updates to the context runtime may be delayed, add
> hooks to update the context runtime in a worker thread as well as when
> a user queries for it.
>
> Limitation:
> - If a context is never switched out or runs for a long period of time,
>   the runtime value of CTX_TIMESTAMP may never be updated, so the
>   counter value may be unreliable. This patch does not support such
>   cases. Such support must be available from the GuC FW and it is WIP.
>
> This patch is an extract from previous work authored by John/Umesh here -
> https://patchwork.freedesktop.org/patch/496441/?series=105085&rev=4
>
> v2: (Ashutosh)
> - Drop COPS_RUNTIME_ACTIVE_TOTAL
> - s/guc_context_update_clks/__guc_context_update_stats
> - Pin context before accessing in guc_timestamp_ping
> - In guc_context_unpin, use spinlock to serialize access to runtime stats

LGTM now:

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

I have made a few notes for myself below.

>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> Co-developed-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_context.c       |  5 ++-
>  drivers/gpu/drm/i915/gt/intel_context.h       |  2 +-
>  drivers/gpu/drm/i915/gt/intel_context_types.h |  2 ++
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 31 +++++++++++++++++++
>  4 files changed, 38 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> index 2aa63ec521b8..a53b26178f0a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -578,10 +578,13 @@ void intel_context_bind_parent_child(struct intel_context *parent,
>	child->parallel.parent = parent;
>  }
>
> -u64 intel_context_get_total_runtime_ns(const struct intel_context *ce)
> +u64 intel_context_get_total_runtime_ns(struct intel_context *ce)
>  {
>	u64 total, active;
>
> +	if (ce->ops->update_stats)
> +		ce->ops->update_stats(ce);

This is indeed called from atomic context due to rcu_read_lock in
show_client_class, so we can't sleep in update_stats (and we are not due to
use of intel_context_pin_if_active).

Because we are already updating stats from guc_timestamp_ping, we could
have skipped updating them again from here without losing too much accuracy
but it's fine.

> +
>	total = ce->stats.runtime.total;
>	if (ce->ops->flags & COPS_RUNTIME_CYCLES)
>		total *= ce->engine->gt->clock_period_ns;
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
> index 48f888c3da08..43ed92f8dc83 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> @@ -375,7 +375,7 @@ intel_context_clear_nopreempt(struct intel_context *ce)
>	clear_bit(CONTEXT_NOPREEMPT, &ce->flags);
>  }
>
> -u64 intel_context_get_total_runtime_ns(const struct intel_context *ce);
> +u64 intel_context_get_total_runtime_ns(struct intel_context *ce);
>  u64 intel_context_get_avg_runtime_ns(struct intel_context *ce);
>
>  static inline u64 intel_context_clock(void)
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index e36670f2e626..aceaac28a33e 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -58,6 +58,8 @@ struct intel_context_ops {
>
>	void (*sched_disable)(struct intel_context *ce);
>
> +	void (*update_stats)(struct intel_context *ce);
> +
>	void (*reset)(struct intel_context *ce);
>	void (*destroy)(struct kref *kref);
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index ee3e8352637f..c869ddc73e69 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -1402,13 +1402,34 @@ static void __update_guc_busyness_stats(struct intel_guc *guc)
>	spin_unlock_irqrestore(&guc->timestamp.lock, flags);
>  }
>
> +static void __guc_context_update_stats(struct intel_context *ce)
> +{
> +	struct intel_guc *guc = ce_to_guc(ce);
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&guc->timestamp.lock, flags);
> +	lrc_update_runtime(ce);
> +	spin_unlock_irqrestore(&guc->timestamp.lock, flags);
> +}
> +
> +static void guc_context_update_stats(struct intel_context *ce)
> +{
> +	if (!intel_context_pin_if_active(ce))

Correct, all we need here is intel_context_pin_if_active. If the context is
not pinned we updated stats when we unpinned.

> +		return;
> +
> +	__guc_context_update_stats(ce);
> +	intel_context_unpin(ce);
> +}
> +
>  static void guc_timestamp_ping(struct work_struct *wrk)
>  {
>	struct intel_guc *guc = container_of(wrk, typeof(*guc),
>					     timestamp.work.work);
>	struct intel_uc *uc = container_of(guc, typeof(*uc), guc);
>	struct intel_gt *gt = guc_to_gt(guc);
> +	struct intel_context *ce;
>	intel_wakeref_t wakeref;
> +	unsigned long index;
>	int srcu, ret;
>
>	/*
> @@ -1424,6 +1445,10 @@ static void guc_timestamp_ping(struct work_struct *wrk)
>	with_intel_runtime_pm(&gt->i915->runtime_pm, wakeref)
>		__update_guc_busyness_stats(guc);
>
> +	/* adjust context stats for overflow */
> +	xa_for_each(&guc->context_lookup, index, ce)
> +		guc_context_update_stats(ce);

This helps prevent overflow if the context switched out (which updated the
runtime in the context image) but was not not unpinned.

> +
>	intel_gt_reset_unlock(gt, srcu);
>
>	guc_enable_busyness_worker(guc);
> @@ -2774,6 +2799,7 @@ static void guc_context_unpin(struct intel_context *ce)
>  {
>	struct intel_guc *guc = ce_to_guc(ce);
>
> +	__guc_context_update_stats(ce);

Correct, cannot call intel_context_unpin (from guc_context_update_stats)
since this function is called from intel_context_unpin.

>	unpin_guc_id(guc, ce);
>	lrc_unpin(ce);
>
> @@ -3455,6 +3481,7 @@ static void remove_from_context(struct i915_request *rq)
>  }
>
>  static const struct intel_context_ops guc_context_ops = {
> +	.flags = COPS_RUNTIME_CYCLES,
>	.alloc = guc_context_alloc,
>
>	.close = guc_context_close,
> @@ -3473,6 +3500,8 @@ static const struct intel_context_ops guc_context_ops = {
>
>	.sched_disable = guc_context_sched_disable,
>
> +	.update_stats = guc_context_update_stats,
> +
>	.reset = lrc_reset,
>	.destroy = guc_context_destroy,
>
> @@ -3728,6 +3757,7 @@ static int guc_virtual_context_alloc(struct intel_context *ce)
>  }
>
>  static const struct intel_context_ops virtual_guc_context_ops = {
> +	.flags = COPS_RUNTIME_CYCLES,
>	.alloc = guc_virtual_context_alloc,
>
>	.close = guc_context_close,
> @@ -3745,6 +3775,7 @@ static const struct intel_context_ops virtual_guc_context_ops = {
>	.exit = guc_virtual_context_exit,
>
>	.sched_disable = guc_context_sched_disable,
> +	.update_stats = guc_context_update_stats,

The approach should also work for virtual engines.

>
>	.destroy = guc_context_destroy,
>
> --
> 2.36.1
>

Thanks.
--
Ashutosh
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 2aa63ec521b8..a53b26178f0a 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -578,10 +578,13 @@  void intel_context_bind_parent_child(struct intel_context *parent,
 	child->parallel.parent = parent;
 }
 
-u64 intel_context_get_total_runtime_ns(const struct intel_context *ce)
+u64 intel_context_get_total_runtime_ns(struct intel_context *ce)
 {
 	u64 total, active;
 
+	if (ce->ops->update_stats)
+		ce->ops->update_stats(ce);
+
 	total = ce->stats.runtime.total;
 	if (ce->ops->flags & COPS_RUNTIME_CYCLES)
 		total *= ce->engine->gt->clock_period_ns;
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index 48f888c3da08..43ed92f8dc83 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -375,7 +375,7 @@  intel_context_clear_nopreempt(struct intel_context *ce)
 	clear_bit(CONTEXT_NOPREEMPT, &ce->flags);
 }
 
-u64 intel_context_get_total_runtime_ns(const struct intel_context *ce);
+u64 intel_context_get_total_runtime_ns(struct intel_context *ce);
 u64 intel_context_get_avg_runtime_ns(struct intel_context *ce);
 
 static inline u64 intel_context_clock(void)
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index e36670f2e626..aceaac28a33e 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -58,6 +58,8 @@  struct intel_context_ops {
 
 	void (*sched_disable)(struct intel_context *ce);
 
+	void (*update_stats)(struct intel_context *ce);
+
 	void (*reset)(struct intel_context *ce);
 	void (*destroy)(struct kref *kref);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index ee3e8352637f..c869ddc73e69 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1402,13 +1402,34 @@  static void __update_guc_busyness_stats(struct intel_guc *guc)
 	spin_unlock_irqrestore(&guc->timestamp.lock, flags);
 }
 
+static void __guc_context_update_stats(struct intel_context *ce)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+	unsigned long flags;
+
+	spin_lock_irqsave(&guc->timestamp.lock, flags);
+	lrc_update_runtime(ce);
+	spin_unlock_irqrestore(&guc->timestamp.lock, flags);
+}
+
+static void guc_context_update_stats(struct intel_context *ce)
+{
+	if (!intel_context_pin_if_active(ce))
+		return;
+
+	__guc_context_update_stats(ce);
+	intel_context_unpin(ce);
+}
+
 static void guc_timestamp_ping(struct work_struct *wrk)
 {
 	struct intel_guc *guc = container_of(wrk, typeof(*guc),
 					     timestamp.work.work);
 	struct intel_uc *uc = container_of(guc, typeof(*uc), guc);
 	struct intel_gt *gt = guc_to_gt(guc);
+	struct intel_context *ce;
 	intel_wakeref_t wakeref;
+	unsigned long index;
 	int srcu, ret;
 
 	/*
@@ -1424,6 +1445,10 @@  static void guc_timestamp_ping(struct work_struct *wrk)
 	with_intel_runtime_pm(&gt->i915->runtime_pm, wakeref)
 		__update_guc_busyness_stats(guc);
 
+	/* adjust context stats for overflow */
+	xa_for_each(&guc->context_lookup, index, ce)
+		guc_context_update_stats(ce);
+
 	intel_gt_reset_unlock(gt, srcu);
 
 	guc_enable_busyness_worker(guc);
@@ -2774,6 +2799,7 @@  static void guc_context_unpin(struct intel_context *ce)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
 
+	__guc_context_update_stats(ce);
 	unpin_guc_id(guc, ce);
 	lrc_unpin(ce);
 
@@ -3455,6 +3481,7 @@  static void remove_from_context(struct i915_request *rq)
 }
 
 static const struct intel_context_ops guc_context_ops = {
+	.flags = COPS_RUNTIME_CYCLES,
 	.alloc = guc_context_alloc,
 
 	.close = guc_context_close,
@@ -3473,6 +3500,8 @@  static const struct intel_context_ops guc_context_ops = {
 
 	.sched_disable = guc_context_sched_disable,
 
+	.update_stats = guc_context_update_stats,
+
 	.reset = lrc_reset,
 	.destroy = guc_context_destroy,
 
@@ -3728,6 +3757,7 @@  static int guc_virtual_context_alloc(struct intel_context *ce)
 }
 
 static const struct intel_context_ops virtual_guc_context_ops = {
+	.flags = COPS_RUNTIME_CYCLES,
 	.alloc = guc_virtual_context_alloc,
 
 	.close = guc_context_close,
@@ -3745,6 +3775,7 @@  static const struct intel_context_ops virtual_guc_context_ops = {
 	.exit = guc_virtual_context_exit,
 
 	.sched_disable = guc_context_sched_disable,
+	.update_stats = guc_context_update_stats,
 
 	.destroy = guc_context_destroy,