diff mbox series

[v6,09/11] drm/i915/perf: allow holding preemption on filtered ctx

Message ID 20190701113437.4043-10-lionel.g.landwerlin@intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915: Vulkan performance query support | expand

Commit Message

Lionel Landwerlin July 1, 2019, 11:34 a.m. UTC
We would like to make use of perf in Vulkan. The Vulkan API is much
lower level than OpenGL, with applications directly exposed to the
concept of command buffers (pretty much equivalent to our batch
buffers). In Vulkan, queries are always limited in scope to a command
buffer. In OpenGL, the lack of command buffer concept meant that
queries' duration could span multiple command buffers.

With that restriction gone in Vulkan, we would like to simplify
measuring performance just by measuring the deltas between the counter
snapshots written by 2 MI_RECORD_PERF_COUNT commands, rather than the
more complex scheme we currently have in the GL driver, using 2
MI_RECORD_PERF_COUNT commands and doing some post processing on the
stream of OA reports, coming from the global OA buffer, to remove any
unrelated deltas in between the 2 MI_RECORD_PERF_COUNT.

Disabling preemption only apply to a single context with which want to
query performance counters for and is considered a privileged
operation, by default protected by CAP_SYS_ADMIN. It is possible to
enable it for a normal user by disabling the paranoid stream setting.

v2: Store preemption setting in intel_context (Chris)

v3: Use priorities to avoid preemption rather than the HW mechanism

v4: Just modify the port priority reporting function

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  8 +++++++
 drivers/gpu/drm/i915/gt/intel_lrc.c           |  7 +++++-
 drivers/gpu/drm/i915/i915_drv.c               |  2 +-
 drivers/gpu/drm/i915/i915_drv.h               |  8 +++++++
 drivers/gpu/drm/i915/i915_perf.c              | 22 +++++++++++++++++--
 drivers/gpu/drm/i915/i915_priolist_types.h    |  7 ++++++
 drivers/gpu/drm/i915/i915_request.c           |  4 ++--
 drivers/gpu/drm/i915/i915_request.h           | 14 +++++++++++-
 drivers/gpu/drm/i915/intel_guc_submission.c   | 10 ++++++++-
 drivers/gpu/drm/i915/intel_pm.c               |  5 +++--
 include/uapi/drm/i915_drm.h                   | 11 ++++++++++
 11 files changed, 88 insertions(+), 10 deletions(-)

Comments

Chris Wilson July 1, 2019, 12:03 p.m. UTC | #1
Quoting Lionel Landwerlin (2019-07-01 12:34:35)
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index f92bace9caff..012d6d7f54e2 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -2104,6 +2104,14 @@ static int eb_oa_config(struct i915_execbuffer *eb)
>         if (err)
>                 return err;
>  
> +       /*
> +        * If the perf stream was opened with hold preemption, flag the
> +        * request properly so that the priority of the request is bumped once
> +        * it reaches the execlist ports.
> +        */
> +       if (eb->i915->perf.oa.exclusive_stream->hold_preemption)
> +               eb->request->flags |= I915_REQUEST_FLAGS_PERF;

Just to reassure myself that this is the behaviour you:

If the exclusive_stream is changed before the request is executed, it is
likely that we no longer notice the earlier preemption-protection. This
should not matter because the listener is no longer interested in those
events?
-Chris
Lionel Landwerlin July 1, 2019, 12:10 p.m. UTC | #2
On 01/07/2019 15:03, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-07-01 12:34:35)
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> index f92bace9caff..012d6d7f54e2 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> @@ -2104,6 +2104,14 @@ static int eb_oa_config(struct i915_execbuffer *eb)
>>          if (err)
>>                  return err;
>>   
>> +       /*
>> +        * If the perf stream was opened with hold preemption, flag the
>> +        * request properly so that the priority of the request is bumped once
>> +        * it reaches the execlist ports.
>> +        */
>> +       if (eb->i915->perf.oa.exclusive_stream->hold_preemption)
>> +               eb->request->flags |= I915_REQUEST_FLAGS_PERF;
> Just to reassure myself that this is the behaviour you:
>
> If the exclusive_stream is changed before the request is executed, it is
> likely that we no longer notice the earlier preemption-protection. This
> should not matter because the listener is no longer interested in those
> events?
> -Chris
>

Yeah, dropping the perf stream before your queries complete and you're 
in undefined behavior territory.


-Lionel
Chris Wilson July 1, 2019, 2:37 p.m. UTC | #3
Quoting Lionel Landwerlin (2019-07-01 13:10:53)
> On 01/07/2019 15:03, Chris Wilson wrote:
> > Quoting Lionel Landwerlin (2019-07-01 12:34:35)
> >> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> >> index f92bace9caff..012d6d7f54e2 100644
> >> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> >> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> >> @@ -2104,6 +2104,14 @@ static int eb_oa_config(struct i915_execbuffer *eb)
> >>          if (err)
> >>                  return err;
> >>   
> >> +       /*
> >> +        * If the perf stream was opened with hold preemption, flag the
> >> +        * request properly so that the priority of the request is bumped once
> >> +        * it reaches the execlist ports.
> >> +        */
> >> +       if (eb->i915->perf.oa.exclusive_stream->hold_preemption)
> >> +               eb->request->flags |= I915_REQUEST_FLAGS_PERF;
> > Just to reassure myself that this is the behaviour you:
> >
> > If the exclusive_stream is changed before the request is executed, it is
> > likely that we no longer notice the earlier preemption-protection. This
> > should not matter because the listener is no longer interested in those
> > events?
> > -Chris
> >
> 
> Yeah, dropping the perf stream before your queries complete and you're 
> in undefined behavior territory.

Then this should do what you want, and if I break it in future, I have
to fix it ;)

Hmm, this definitely merits some selftest/igt as I am very liable to
break it.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
Lionel Landwerlin July 9, 2019, 9:18 a.m. UTC | #4
On 01/07/2019 17:37, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-07-01 13:10:53)
>> On 01/07/2019 15:03, Chris Wilson wrote:
>>> Quoting Lionel Landwerlin (2019-07-01 12:34:35)
>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>>> index f92bace9caff..012d6d7f54e2 100644
>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>>> @@ -2104,6 +2104,14 @@ static int eb_oa_config(struct i915_execbuffer *eb)
>>>>           if (err)
>>>>                   return err;
>>>>    
>>>> +       /*
>>>> +        * If the perf stream was opened with hold preemption, flag the
>>>> +        * request properly so that the priority of the request is bumped once
>>>> +        * it reaches the execlist ports.
>>>> +        */
>>>> +       if (eb->i915->perf.oa.exclusive_stream->hold_preemption)
>>>> +               eb->request->flags |= I915_REQUEST_FLAGS_PERF;
>>> Just to reassure myself that this is the behaviour you:
>>>
>>> If the exclusive_stream is changed before the request is executed, it is
>>> likely that we no longer notice the earlier preemption-protection. This
>>> should not matter because the listener is no longer interested in those
>>> events?
>>> -Chris
>>>
>> Yeah, dropping the perf stream before your queries complete and you're
>> in undefined behavior territory.
> Then this should do what you want, and if I break it in future, I have
> to fix it ;)
>
> Hmm, this definitely merits some selftest/igt as I am very liable to
> break it.
>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> -Chris
>

I had an IGT test that used a spinning batch for a couple of seconds 
(same trick as the noa wait patch) and verified that no other context 
would come up in the OA buffer.

This might need allowing the preempt context or whatever is used for 
hang checks.


-Lionel
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index f92bace9caff..012d6d7f54e2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2104,6 +2104,14 @@  static int eb_oa_config(struct i915_execbuffer *eb)
 	if (err)
 		return err;
 
+	/*
+	 * If the perf stream was opened with hold preemption, flag the
+	 * request properly so that the priority of the request is bumped once
+	 * it reaches the execlist ports.
+	 */
+	if (eb->i915->perf.oa.exclusive_stream->hold_preemption)
+		eb->request->flags |= I915_REQUEST_FLAGS_PERF;
+
 	/*
 	 * If the config hasn't changed, skip reconfiguring the HW (this is
 	 * subject to a delay we want to avoid has much as possible).
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 001b696834f2..3ec7fe6d2790 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -256,7 +256,12 @@  static inline int rq_prio(const struct i915_request *rq)
 
 static int effective_prio(const struct i915_request *rq)
 {
-	int prio = rq_prio(rq);
+	int prio;
+
+	if (i915_request_has_perf(rq))
+		prio = I915_USER_PRIORITY(I915_PRIORITY_PERF);
+	else
+		prio = rq_prio(rq);
 
 	/*
 	 * On unwinding the active request, we give it a priority bump
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index a08c6123bf38..6395deadf73f 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -485,7 +485,7 @@  static int i915_getparam_ioctl(struct drm_device *dev, void *data,
 		value = INTEL_INFO(dev_priv)->has_coherent_ggtt;
 		break;
 	case I915_PARAM_PERF_REVISION:
-		value = 1;
+		value = 2;
 		break;
 	case I915_PARAM_HAS_EXEC_PERF_CONFIG:
 		/* Obviously requires perf support. */
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 664554114718..84f9dec59a19 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1232,6 +1232,14 @@  struct i915_perf_stream {
 	 */
 	bool enabled;
 
+	/**
+	 * @hold_preemption: Whether preemption is put on hold for command
+	 * submissions done on the @ctx. This is useful for some drivers that
+	 * cannot easily post process the OA buffer context to subtract delta
+	 * of performance counters not associated with @ctx.
+	 */
+	bool hold_preemption;
+
 	/**
 	 * @ops: The callbacks providing the implementation of this specific
 	 * type of configured stream.
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 6b659b5f1948..272036316192 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -344,6 +344,8 @@  static const struct i915_oa_format gen8_plus_oa_formats[I915_OA_FORMAT_MAX] = {
  * struct perf_open_properties - for validated properties given to open a stream
  * @sample_flags: `DRM_I915_PERF_PROP_SAMPLE_*` properties are tracked as flags
  * @single_context: Whether a single or all gpu contexts should be monitored
+ * @hold_preemption: Whether the preemption is disabled for the filtered
+ *                   context
  * @ctx_handle: A gem ctx handle for use with @single_context
  * @metrics_set: An ID for an OA unit metric set advertised via sysfs
  * @oa_format: An OA unit HW report format
@@ -358,6 +360,7 @@  struct perf_open_properties {
 	u32 sample_flags;
 
 	u64 single_context:1;
+	u64 hold_preemption:1;
 	u64 ctx_handle;
 
 	/* OA sampling state */
@@ -2399,6 +2402,8 @@  static int i915_oa_stream_init(struct i915_perf_stream *stream,
 	stream->sample_flags |= SAMPLE_OA_REPORT;
 	stream->sample_size += format_size;
 
+	stream->hold_preemption = props->hold_preemption;
+
 	dev_priv->perf.oa.oa_buffer.format_size = format_size;
 	if (WARN_ON(dev_priv->perf.oa.oa_buffer.format_size == 0))
 		return -EINVAL;
@@ -2937,6 +2942,15 @@  i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
 		}
 	}
 
+	if (props->hold_preemption) {
+		if (!props->single_context) {
+			DRM_DEBUG("preemption disable with no context\n");
+			ret = -EINVAL;
+			goto err;
+		}
+		privileged_op = true;
+	}
+
 	/*
 	 * On Haswell the OA unit supports clock gating off for a specific
 	 * context and in this mode there's no visibility of metrics for the
@@ -2951,8 +2965,9 @@  i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
 	 * MI_REPORT_PERF_COUNT commands and so consider it a privileged op to
 	 * enable the OA unit by default.
 	 */
-	if (IS_HASWELL(dev_priv) && specific_ctx)
+	if (IS_HASWELL(dev_priv) && specific_ctx && !props->hold_preemption) {
 		privileged_op = false;
+	}
 
 	/* Similar to perf's kernel.perf_paranoid_cpu sysctl option
 	 * we check a dev.i915.perf_stream_paranoid sysctl option
@@ -2961,7 +2976,7 @@  i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv,
 	 */
 	if (privileged_op &&
 	    i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
-		DRM_DEBUG("Insufficient privileges to open system-wide i915 perf stream\n");
+		DRM_DEBUG("Insufficient privileges to open i915 perf stream\n");
 		ret = -EACCES;
 		goto err_ctx;
 	}
@@ -3153,6 +3168,9 @@  static int read_properties_unlocked(struct drm_i915_private *dev_priv,
 			props->oa_periodic = true;
 			props->oa_period_exponent = value;
 			break;
+		case DRM_I915_PERF_PROP_HOLD_PREEMPTION:
+			props->hold_preemption = !!value;
+			break;
 		case DRM_I915_PERF_PROP_MAX:
 			MISSING_CASE(id);
 			return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_priolist_types.h b/drivers/gpu/drm/i915/i915_priolist_types.h
index 49709de69875..bb9d1e1fb94a 100644
--- a/drivers/gpu/drm/i915/i915_priolist_types.h
+++ b/drivers/gpu/drm/i915/i915_priolist_types.h
@@ -17,6 +17,13 @@  enum {
 	I915_PRIORITY_NORMAL = I915_CONTEXT_DEFAULT_PRIORITY,
 	I915_PRIORITY_MAX = I915_CONTEXT_MAX_USER_PRIORITY + 1,
 
+	/* Requests containing performance queries must not be preempted by
+	 * another context. They get scheduled with their default priority and
+	 * once they reach the execlist ports we bump them to
+	 * I915_PRIORITY_PERF so that they stick to the HW until they finish.
+	 */
+	I915_PRIORITY_PERF,
+
 	I915_PRIORITY_INVALID = INT_MIN
 };
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 5ff87c4a0cd5..222c9c56e9de 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -292,7 +292,7 @@  static bool i915_request_retire(struct i915_request *rq)
 		dma_fence_signal_locked(&rq->fence);
 	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags))
 		i915_request_cancel_breadcrumb(rq);
-	if (rq->waitboost) {
+	if (i915_request_has_waitboost(rq)) {
 		GEM_BUG_ON(!atomic_read(&rq->i915->gt_pm.rps.num_waiters));
 		atomic_dec(&rq->i915->gt_pm.rps.num_waiters);
 	}
@@ -684,7 +684,7 @@  __i915_request_create(struct intel_context *ce, gfp_t gfp)
 	rq->file_priv = NULL;
 	rq->batch = NULL;
 	rq->capture_list = NULL;
-	rq->waitboost = false;
+	rq->flags = 0;
 	rq->execution_mask = ALL_ENGINES;
 
 	INIT_LIST_HEAD(&rq->active_list);
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index b58ceef92e20..3bc2a3b8b9ca 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -216,7 +216,9 @@  struct i915_request {
 	/** Time at which this request was emitted, in jiffies. */
 	unsigned long emitted_jiffies;
 
-	bool waitboost;
+#define I915_REQUEST_FLAGS_WAITBOOST BIT(0)
+#define I915_REQUEST_FLAGS_PERF      BIT(1)
+	u32 flags;
 
 	/** timeline->request entry for this request */
 	struct list_head link;
@@ -430,6 +432,16 @@  static inline void i915_request_mark_complete(struct i915_request *rq)
 	rq->hwsp_seqno = (u32 *)&rq->fence.seqno; /* decouple from HWSP */
 }
 
+static inline bool i915_request_has_waitboost(const struct i915_request *rq)
+{
+	return rq->flags & I915_REQUEST_FLAGS_WAITBOOST;
+}
+
+static inline bool i915_request_has_perf(const struct i915_request *rq)
+{
+	return rq->flags & I915_REQUEST_FLAGS_PERF;
+}
+
 bool i915_retire_requests(struct drm_i915_private *i915);
 
 #endif /* I915_REQUEST_H */
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 12c22359fdac..48e7ae3d67a2 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -707,6 +707,14 @@  static inline int rq_prio(const struct i915_request *rq)
 	return rq->sched.attr.priority | __NO_PREEMPTION;
 }
 
+static inline int effective_prio(const struct i915_request *rq)
+{
+	if (i915_request_has_perf(rq))
+		return I915_USER_PRIORITY(I915_PRIORITY_PERF) | __NO_PREEMPTION;
+
+	return rq_prio(rq);
+}
+
 static struct i915_request *schedule_in(struct i915_request *rq, int idx)
 {
 	trace_i915_request_in(rq, idx);
@@ -747,7 +755,7 @@  static void __guc_dequeue(struct intel_engine_cs *engine)
 				&engine->i915->guc.preempt_work[engine->id];
 			int prio = execlists->queue_priority_hint;
 
-			if (i915_scheduler_need_preempt(prio, rq_prio(last))) {
+			if (i915_scheduler_need_preempt(prio, effective_prio(last))) {
 				intel_write_status_page(engine,
 							I915_GEM_HWS_PREEMPT,
 							GUC_PREEMPT_INPROGRESS);
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index d9a7a13ce32a..491251419c67 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6891,9 +6891,10 @@  void gen6_rps_boost(struct i915_request *rq)
 	/* Serializes with i915_request_retire() */
 	boost = false;
 	spin_lock_irqsave(&rq->lock, flags);
-	if (!rq->waitboost && !dma_fence_is_signaled_locked(&rq->fence)) {
+	if (!i915_request_has_waitboost(rq) &&
+	    !dma_fence_is_signaled_locked(&rq->fence)) {
 		boost = !atomic_fetch_inc(&rps->num_waiters);
-		rq->waitboost = true;
+		rq->flags |= I915_REQUEST_FLAGS_WAITBOOST;
 	}
 	spin_unlock_irqrestore(&rq->lock, flags);
 	if (!boost)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 96bc09af8cc4..f2b7a6c1e870 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1983,6 +1983,17 @@  enum drm_i915_perf_property_id {
 	 */
 	DRM_I915_PERF_PROP_OA_EXPONENT,
 
+	/**
+	 * Specifying this property is only valid when specify a context to
+	 * filter with DRM_I915_PERF_PROP_CTX_HANDLE. Specifying this property
+	 * will hold preemption of the particular context we want to gather
+	 * performance data about. The execbuf2 submissions must include a
+	 * drm_i915_gem_execbuffer_ext_perf parameter for this to apply.
+	 *
+	 * This property is available in perf revision 2.
+	 */
+	DRM_I915_PERF_PROP_HOLD_PREEMPTION,
+
 	DRM_I915_PERF_PROP_MAX /* non-ABI */
 };