
[13/15] drm/i915: Integrate GuC-based command submission

Message ID 1434393394-21002-14-git-send-email-david.s.gordon@intel.com (mailing list archive)
State New, archived

Commit Message

Dave Gordon June 15, 2015, 6:36 p.m. UTC
From: Alex Dai <yu.dai@intel.com>

GuC-based submission is mostly the same as execlist mode, up to
intel_logical_ring_advance_and_submit(), where the context being
dispatched would be added to the execlist queue; at this point
we submit the context to the GuC backend instead.
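In code terms (condensed from the intel_lrc.c hunk below), the point
of divergence is simply:

    if (dev_priv->guc.execbuf_client)
        i915_guc_submit(dev_priv->guc.execbuf_client, ctx, ring);
    else
        execlists_context_queue(ring, ctx, ringbuf->tail, request);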

There are, however, a few other changes also required, notably:
1.  Contexts must be pinned at GGTT addresses accessible by the GuC,
    i.e. NOT in the range [0..WOPCM_SIZE), so we have to add the
    PIN_OFFSET_BIAS flag to the relevant GGTT-pinning calls.

2.  The GuC's TLB must be invalidated after a context is pinned at
    a new GGTT address.

3.  The GuC firmware uses the one page before the ring context as
    shared data; therefore, whenever the driver needs the base address
    of an LRC, it must add a one-page offset. LRC_PPHWSP_PN is defined
    as the page number of the LRCA.

4.  In the work queue used to pass requests to the GuC, the GuC
    firmware requires the ring-tail offset to be represented as an
    11-bit value, expressed in QWords. Therefore, the ringbuffer size
    must be reduced to the representable range of 4 pages (see the
    sketch after this list).
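As an aside (illustrative, not part of the patch), here is how points
3 and 4 work out. The LRC_*_PN page numbers are the macros added to
intel_lrc.h below; the WQ_* names are made up for this sketch:

    /*
     * Context object layout after this patch:
     *
     *   page 0  LRC_GUCSHR_PN  driver<->GuC shared data
     *   page 1  LRC_PPHWSP_PN  per-process HWSP (the LRCA points here)
     *   page 2  LRC_STATE_PN   register state (CTX_RING_TAIL etc.)
     */

    /* Why four pages: the work queue item carries the ring tail as an
     * 11-bit QWord count, so the largest fully-addressable ring is:
     */
    #define WQ_TAIL_QWORD_BITS  11      /* illustrative name */
    #define WQ_MAX_RING_BYTES   ((1 << WQ_TAIL_QWORD_BITS) * sizeof(u64))
    /* == 2048 QWords * 8 bytes == 16384 bytes == 4 pages of 4 KiB */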

Issue: VIZ-4884
Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c        |    2 +-
 drivers/gpu/drm/i915/i915_guc_submission.c |   46 ++++++++++++++++++++++++--
 drivers/gpu/drm/i915/intel_guc.h           |    1 +
 drivers/gpu/drm/i915/intel_lrc.c           |   48 ++++++++++++++++++++--------
 drivers/gpu/drm/i915/intel_lrc.h           |    6 ++++
 5 files changed, 86 insertions(+), 17 deletions(-)

Comments

Chris Wilson June 16, 2015, 9:22 a.m. UTC | #1
On Mon, Jun 15, 2015 at 07:36:31PM +0100, Dave Gordon wrote:
> From: Alex Dai <yu.dai@intel.com>
> 
> GuC-based submission is mostly the same as execlist mode, up to
> intel_logical_ring_advance_and_submit(), where the context being
> dispatched would be added to the execlist queue; at this point
> we submit the context to the GuC backend instead.
> 
> There are, however, a few other changes also required, notably:
> 1.  Contexts must be pinned at GGTT addresses accessible by the GuC,
>     i.e. NOT in the range [0..WOPCM_SIZE), so we have to add the
>     PIN_OFFSET_BIAS flag to the relevant GGTT-pinning calls.
> 
> 2.  The GuC's TLB must be invalidated after a context is pinned at
>     a new GGTT address.
> 
> 3.  The GuC firmware uses the one page before the ring context as
>     shared data; therefore, whenever the driver needs the base address
>     of an LRC, it must add a one-page offset. LRC_PPHWSP_PN is defined
>     as the page number of the LRCA.
> 
> 4.  In the work queue used to pass requests to the GuC, the GuC
>     firmware requires the ring-tail offset to be represented as an
>     11-bit value, expressed in QWords. Therefore, the ringbuffer
>     size must be reduced to the representable range (4 pages).

I don't like how this sabotages the existing execlists implementation
in order for i915_guc_submission (an interesting choice of file name,
since we go i915_gem_execbuffer (API) -> intel_execlists (HW) ->
i915_guc_submission (HW), not fitting into our, admittedly loose, naming
convention very well) to share a few functions, even a couple of which
are already vfuncs.
-Chris
Dave Gordon June 19, 2015, 6:18 p.m. UTC | #2
On 16/06/15 10:22, Chris Wilson wrote:
> On Mon, Jun 15, 2015 at 07:36:31PM +0100, Dave Gordon wrote:
>> From: Alex Dai <yu.dai@intel.com>
>>
>> GuC-based submission is mostly the same as execlist mode, up to
>> intel_logical_ring_advance_and_submit(), where the context being
>> dispatched would be added to the execlist queue; at this point
>> we submit the context to the GuC backend instead.
>>
>> There are, however, a few other changes also required, notably:
>> 1.  Contexts must be pinned at GGTT addresses accessible by the GuC,
>>     i.e. NOT in the range [0..WOPCM_SIZE), so we have to add the
>>     PIN_OFFSET_BIAS flag to the relevant GGTT-pinning calls.
>>
>> 2.  The GuC's TLB must be invalidated after a context is pinned at
>>     a new GGTT address.
>>
>> 3.  The GuC firmware uses the one page before the ring context as
>>     shared data; therefore, whenever the driver needs the base address
>>     of an LRC, it must add a one-page offset. LRC_PPHWSP_PN is defined
>>     as the page number of the LRCA.
>>
>> 4.  In the work queue used to pass requests to the GuC, the GuC
>>     firmware requires the ring-tail offset to be represented as an
>>     11-bit value, expressed in QWords. Therefore, the ringbuffer
>>     size must be reduced to the representable range (4 pages).
> 
> I don't like how this sabotages the existing execlists implementation
> in order for i915_guc_submission (an interesting choice of file name,
> since we go i915_gem_execbuffer (API) -> intel_execlists (HW) ->
> i915_guc_submission (HW), not fitting into our, admittedly loose, naming
> convention very well) to share a few functions. Even a couple of which
> are already vfunc.
> -Chris

Not really "sabotages"; big ringbuffers are a waste of space anyway.
Four pages is enough to have at least 64 batchbuffers queued up for the
engine to process (1 second of video, or 0.00012 of a bitcoin). When the
scheduler lands, it will generally reduce the number of batches in the
h/w rings anyway, mostly to improve responsiveness and fair-sharing
among different applications.
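For the record, the arithmetic behind "at least 64" (the per-request
figure below is an assumption for illustration, not a measured number):

    /*
     * ring size:    4 pages * 4096 B = 16384 B = 4096 dwords
     * per request: ~64 dwords of commands (flushes, BB_START, seqno...)
     * => 4096 / 64 = 64 requests queued before the ring fills
     */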

I quite agree that the execlists implementation, which is mostly in a
file called intel_lrc.c, should really be split into LRC-related code
(common to execlists and GuC modes), with the rest moved into
intel_execlist_submission.c. Or is that i915_execlist_submission.c?

If we took contexts as primary, rather than "rings" (engines), we could
also share a lot more between GuC/execlists and legacy ringbuffer mode.
At least we should get some improvement with AntiOLR, so we'll have a

    request->context<->ringbuffer->engine->submission mechanism :)
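Purely as a speculative sketch of that chain (none of these exact
fields is claimed to exist today; the names are illustrative only):

    struct intel_context;
    struct intel_ringbuffer;
    struct intel_engine_cs;

    struct drm_i915_gem_request {
            struct intel_context *ctx;          /* request -> context */
    };

    struct intel_context {
            struct intel_ringbuffer *ringbuf;   /* context <-> ringbuffer */
    };

    struct intel_ringbuffer {
            struct intel_context *ctx;          /* back-pointer */
            struct intel_engine_cs *engine;     /* ringbuffer -> engine */
    };

    struct intel_engine_cs {
            /* engine -> submission backend (execlists or GuC) */
            void (*submit_request)(struct drm_i915_gem_request *rq);
    };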

.Dave.

Patch

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index b0aa4af..c6e2582 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1975,7 +1975,7 @@  static void i915_dump_lrc_obj(struct seq_file *m,
 		return;
 	}
 
-	page = i915_gem_object_get_page(ctx_obj, 1);
+	page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
 	if (!WARN_ON(page == NULL)) {
 		reg_state = kmap_atomic(page);
 
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 487f295..b423faf 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -356,16 +356,52 @@  static void guc_init_ctx_desc(struct intel_guc *guc,
 {
 	struct guc_context_desc desc;
 	struct sg_table *sg;
+	int i;
 
 	memset(&desc, 0, sizeof(desc));
 
 	desc.attribute = GUC_CTX_DESC_ATTR_ACTIVE | GUC_CTX_DESC_ATTR_KERNEL;
 	desc.context_id = client->ctx_index;
 	desc.priority = client->priority;
-	desc.engines_used = (1 << RCS) | (1 << VCS) | (1 << BCS) |
-			    (1 << VECS) | (1 << VCS2); /* all engines */
 	desc.db_id = client->doorbell_id;
 
+	for (i = 0; i < I915_NUM_RINGS; i++) {
+		struct guc_execlist_context *lrc = &desc.lrc[i];
+		struct intel_engine_cs *ring;
+		struct drm_i915_gem_object *obj;
+
+		/* TODO: There is a design issue to be resolved here: we only
+		 * know which engine the user will use once the first batch
+		 * arrives, but at this point the GuC expects the LRC and ring
+		 * to be pinned already. That's fine for the default context
+		 * (currently the only owner of a GuC client), but any future
+		 * owner must ensure its LRC is pinned before getting here.
+		 */
+		obj = client->owner->engine[i].state;
+		if (!obj)
+			break;
+
+		ring = client->owner->engine[i].ringbuf->ring;
+
+		lrc->context_desc = execlists_ctx_descriptor(ring, obj);
+		/* The state page is after PPHWSP */
+		lrc->ring_lcra = i915_gem_obj_ggtt_offset(obj) +
+				LRC_STATE_PN * PAGE_SIZE;
+		lrc->context_id = (client->ctx_index << GUC_ELC_CTXID_OFFSET) |
+				(ring->id << GUC_ELC_ENGINE_OFFSET);
+
+		obj = client->owner->engine[i].ringbuf->obj;
+
+		lrc->ring_begin = i915_gem_obj_ggtt_offset(obj);
+		lrc->ring_end = lrc->ring_begin + obj->base.size - 1;
+		lrc->ring_next_free_location = lrc->ring_begin;
+		lrc->ring_current_tail_pointer_value = 0;
+
+		desc.engines_used |= (1 << ring->id);
+	}
+
+	WARN_ON(desc.engines_used == 0);
+
 	/*
 	 * The CPU address is only needed at certain points, so kmap_atomic on
 	 * demand instead of storing it in the ctx descriptor.
@@ -650,6 +686,7 @@  static void guc_client_free(struct drm_device *dev,
  * Return:	An i915_guc_client object if success.
  */
 static struct i915_guc_client *guc_client_alloc(struct drm_device *dev,
+						struct intel_context *ctx,
 						u32 priority)
 {
 	struct i915_guc_client *client;
@@ -698,6 +735,8 @@  static struct i915_guc_client *guc_client_alloc(struct drm_device *dev,
 		/* XXX: evict a doorbell instead */
 		goto err;
 
+	client->owner = ctx;
+
 	guc_init_proc_desc(guc, client);
 	guc_init_ctx_desc(guc, client);
 	guc_init_doorbell(guc, client);
@@ -796,10 +835,11 @@  int i915_guc_submission_enable(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_guc *guc = &dev_priv->guc;
+	struct intel_context *ctx = dev_priv->ring[RCS].default_context;
 	struct i915_guc_client *client;
 
 	/* client for execbuf submission */
-	client = guc_client_alloc(dev, GUC_CTX_PRIORITY_NORMAL);
+	client = guc_client_alloc(dev, ctx, GUC_CTX_PRIORITY_NORMAL);
 	if (!client) {
 		DRM_ERROR("Failed to create execbuf guc_client\n");
 		return -ENOMEM;
diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
index 147d288..4f7d55f 100644
--- a/drivers/gpu/drm/i915/intel_guc.h
+++ b/drivers/gpu/drm/i915/intel_guc.h
@@ -33,6 +33,7 @@ 
 struct i915_guc_client {
 	spinlock_t wq_lock;
 	struct drm_i915_gem_object *client_obj;
+	struct intel_context *owner;
 	u32 priority;
 	off_t doorbell_offset;
 	off_t proc_desc_offset;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 4fd1941..2801fe2 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -180,7 +180,8 @@  int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
  */
 u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj)
 {
-	u32 lrca = i915_gem_obj_ggtt_offset(ctx_obj);
+	u32 lrca = i915_gem_obj_ggtt_offset(ctx_obj) +
+			LRC_PPHWSP_PN * PAGE_SIZE;
 
 	/* LRCA is required to be 4K aligned so the more significant 20 bits
 	 * are globally unique */
@@ -192,7 +193,8 @@  uint64_t execlists_ctx_descriptor(struct intel_engine_cs *ring,
 {
 	struct drm_device *dev = ring->dev;
 	uint64_t desc;
-	uint64_t lrca = i915_gem_obj_ggtt_offset(ctx_obj);
+	uint64_t lrca = i915_gem_obj_ggtt_offset(ctx_obj) +
+			LRC_PPHWSP_PN * PAGE_SIZE;
 
 	WARN_ON(lrca & 0xFFFFFFFF00000FFFULL);
 
@@ -262,7 +264,7 @@  static int execlists_update_context(struct drm_i915_gem_object *ctx_obj,
 	struct page *page;
 	uint32_t *reg_state;
 
-	page = i915_gem_object_get_page(ctx_obj, 1);
+	page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
 	reg_state = kmap_atomic(page);
 
 	reg_state[CTX_RING_TAIL+1] = tail;
@@ -644,13 +646,17 @@  intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf,
 				      struct drm_i915_gem_request *request)
 {
 	struct intel_engine_cs *ring = ringbuf->ring;
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 
 	intel_logical_ring_advance(ringbuf);
 
 	if (intel_ring_stopped(ring))
 		return;
 
-	execlists_context_queue(ring, ctx, ringbuf->tail, request);
+	if (dev_priv->guc.execbuf_client)
+		i915_guc_submit(dev_priv->guc.execbuf_client, ctx, ring);
+	else
+		execlists_context_queue(ring, ctx, ringbuf->tail, request);
 }
 
 static int logical_ring_wrap_buffer(struct intel_ringbuffer *ringbuf,
@@ -914,18 +920,23 @@  static int intel_lr_context_pin(struct intel_engine_cs *ring,
 {
 	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
 	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	int ret = 0;
 
 	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
 	if (ctx->engine[ring->id].pin_count++ == 0) {
-		ret = i915_gem_obj_ggtt_pin(ctx_obj,
-				GEN8_LR_CONTEXT_ALIGN, 0);
+		ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN,
+				PIN_OFFSET_BIAS | GUC_WOPCM_SIZE_VALUE);
 		if (ret)
 			goto reset_pin_count;
 
 		ret = intel_pin_and_map_ringbuffer_obj(ring->dev, ringbuf);
 		if (ret)
 			goto unpin_ctx_obj;
+
+		/* Invalidate GuC TLB. */
+		if (i915.enable_guc_submission)
+			I915_WRITE(GEN8_GTCR, GEN8_GTCR_INVALIDATE);
 	}
 
 	return ret;
@@ -1337,8 +1348,13 @@  out:
 static int gen8_init_rcs_context(struct intel_engine_cs *ring,
 		       struct intel_context *ctx)
 {
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	int ret;
 
+	/* Invalidate GuC TLB. */
+	if (i915.enable_guc_submission)
+		I915_WRITE(GEN8_GTCR, GEN8_GTCR_INVALIDATE);
+
 	ret = intel_logical_ring_workarounds_emit(ring, ctx);
 	if (ret)
 		return ret;
@@ -1677,7 +1693,7 @@  populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 
 	/* The second page of the context object contains some fields which must
 	 * be set up prior to the first execution. */
-	page = i915_gem_object_get_page(ctx_obj, 1);
+	page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
 	reg_state = kmap_atomic(page);
 
 	/* A context is actually a big batch buffer with several MI_LOAD_REGISTER_IMM
@@ -1823,12 +1839,14 @@  static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring,
 		struct drm_i915_gem_object *default_ctx_obj)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct page *page;
 
 	/* The status page is offset 0 from the default context object
 	 * in LRC mode. */
-	ring->status_page.gfx_addr = i915_gem_obj_ggtt_offset(default_ctx_obj);
-	ring->status_page.page_addr =
-			kmap(sg_page(default_ctx_obj->pages->sgl));
+	ring->status_page.gfx_addr = i915_gem_obj_ggtt_offset(default_ctx_obj)
+			+ LRC_PPHWSP_PN * PAGE_SIZE;
+	page = i915_gem_object_get_page(default_ctx_obj, LRC_PPHWSP_PN);
+	ring->status_page.page_addr = kmap(page);
 	ring->status_page.obj = default_ctx_obj;
 
 	I915_WRITE(RING_HWS_PGA(ring->mmio_base),
@@ -1864,6 +1882,9 @@  int intel_lr_context_deferred_create(struct intel_context *ctx,
 
 	context_size = round_up(get_lr_context_size(ring), 4096);
 
+	/* One extra page for data shared between the driver and the GuC */
+	context_size += PAGE_SIZE * LRC_PPHWSP_PN;
+
 	ctx_obj = i915_gem_alloc_object(dev, context_size);
 	if (!ctx_obj) {
 		DRM_DEBUG_DRIVER("Alloc LRC backing obj failed.\n");
@@ -1871,7 +1892,8 @@  int intel_lr_context_deferred_create(struct intel_context *ctx,
 	}
 
 	if (is_global_default_ctx) {
-		ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN, 0);
+		ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN,
+				PIN_OFFSET_BIAS | GUC_WOPCM_SIZE_VALUE);
 		if (ret) {
 			DRM_DEBUG_DRIVER("Pin LRC backing obj failed: %d\n",
 					ret);
@@ -1890,7 +1912,7 @@  int intel_lr_context_deferred_create(struct intel_context *ctx,
 
 	ringbuf->ring = ring;
 
-	ringbuf->size = 32 * PAGE_SIZE;
+	ringbuf->size = 4 * PAGE_SIZE;
 	ringbuf->effective_size = ringbuf->size;
 	ringbuf->head = 0;
 	ringbuf->tail = 0;
@@ -1981,7 +2003,7 @@  void intel_lr_context_reset(struct drm_device *dev,
 			WARN(1, "Failed get_pages for context obj\n");
 			continue;
 		}
-		page = i915_gem_object_get_page(ctx_obj, 1);
+		page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
 		reg_state = kmap_atomic(page);
 
 		reg_state[CTX_RING_HEAD+1] = 0;
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 19c9a02..fd5d791 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -67,6 +67,12 @@  static inline void intel_logical_ring_emit(struct intel_ringbuffer *ringbuf,
 }
 
 /* Logical Ring Contexts */
+
+/* One extra page is added before the LRC, holding data shared with the GuC */
+#define LRC_GUCSHR_PN	(0)
+#define LRC_PPHWSP_PN	(LRC_GUCSHR_PN + 1)
+#define LRC_STATE_PN	(LRC_PPHWSP_PN + 1)
+
 void intel_lr_context_free(struct intel_context *ctx);
 int intel_lr_context_deferred_create(struct intel_context *ctx,
 				     struct intel_engine_cs *ring);