[12/27] drm/i915/guc: Add multi-lrc context registration

Message ID	20210820224446.30620-13-matthew.brost@intel.com (mailing list archive)
State	New, archived
Headers
Return-Path: <SRS0=BHe2=NL=lists.freedesktop.org=intel-gfx-bounces@kernel.org>
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 310EF601FE
From: Matthew Brost <matthew.brost@intel.com>
To: <intel-gfx@lists.freedesktop.org>, <dri-devel@lists.freedesktop.org>
Cc: <daniel.vetter@ffwll.ch>, <tony.ye@intel.com>, <zhengguo.xu@intel.com>
Date: Fri, 20 Aug 2021 15:44:31 -0700
Message-Id: <20210820224446.30620-13-matthew.brost@intel.com>
In-Reply-To: <20210820224446.30620-1-matthew.brost@intel.com>
References: <20210820224446.30620-1-matthew.brost@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [Intel-gfx] [PATCH 12/27] drm/i915/guc: Add multi-lrc context registration
Precedence: list
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>
Series	Parallel submission aka multi-bb execbuf
[00/27] Parallel submission aka multi-bb execbuf
[01/27] drm/i915/guc: Squash Clean up GuC CI failures, simplify locking, and kernel DOC
[02/27] drm/i915/guc: Allow flexible number of context ids
[03/27] drm/i915/guc: Connect the number of guc_ids to debugfs
[04/27] drm/i915/guc: Take GT PM ref when deregistering context
[05/27] drm/i915: Add GT PM unpark worker
[06/27] drm/i915/guc: Take engine PM when a context is pinned with GuC submission
[07/27] drm/i915/guc: Don't call switch_to_kernel_context with GuC submission
[08/27] drm/i915: Add logical engine mapping
[09/27] drm/i915: Expose logical engine instance to user
[10/27] drm/i915/guc: Introduce context parent-child relationship
[11/27] drm/i915/guc: Implement parallel context pin / unpin functions
[12/27] drm/i915/guc: Add multi-lrc context registration
[13/27] drm/i915/guc: Ensure GuC schedule operations do not operate on child contexts
[14/27] drm/i915/guc: Assign contexts in parent-child relationship consecutive guc_ids
[15/27] drm/i915/guc: Implement multi-lrc submission
[16/27] drm/i915/guc: Insert submit fences between requests in parent-child relationship
[17/27] drm/i915/guc: Implement multi-lrc reset
[18/27] drm/i915/guc: Update debugfs for GuC multi-lrc
[19/27] drm/i915: Fix bug in user proto-context creation that leaked contexts
[20/27] drm/i915/guc: Connect UAPI to GuC multi-lrc interface
[21/27] drm/i915/doc: Update parallel submit doc to point to i915_drm.h
[22/27] drm/i915/guc: Add basic GuC multi-lrc selftest
[23/27] drm/i915/guc: Implement no mid batch preemption for multi-lrc
[24/27] drm/i915: Multi-BB execbuf
[25/27] drm/i915/guc: Handle errors in multi-lrc requests
[26/27] drm/i915: Enable multi-bb execbuf
[27/27] drm/i915/execlists: Weak parallel submission support for execlists

Commit Message

Matthew Brost Aug. 20, 2021, 10:44 p.m. UTC

Add the multi-lrc context registration H2G. In addition, a work queue and
process descriptor are set up during multi-lrc context registration, as
these data structures are needed for multi-lrc submission.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context_types.h |  12 ++
 drivers/gpu/drm/i915/gt/intel_lrc.c           |   5 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   2 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 109 +++++++++++++++++-
 4 files changed, 126 insertions(+), 2 deletions(-)
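The registration H2G added here sends the parent's descriptor offset followed by one offset per child, each stepping by sizeof(struct guc_lrc_desc). That stepping appears to rely on the children's descriptors sitting in consecutive slots after the parent's, which is what patch 14/27 of this series ("Assign contexts in parent-child relationship consecutive guc_ids") arranges. The userspace sketch below mirrors the payload layout of __guc_action_register_multi_lrc() under that assumption; the EXAMPLE_* constants and build_register_multi_lrc() are hypothetical placeholders, not the real GuC ABI values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Placeholders, NOT the real GuC interface values: the opcode and the
 * descriptor size below are made up purely for illustration.
 */
#define EXAMPLE_ACTION_REGISTER_MULTI_LRC 0xABCDu
#define EXAMPLE_MAX_ENGINE_INSTANCE       7u
#define EXAMPLE_LRC_DESC_SIZE             64u

/*
 * Build the multi-lrc registration payload: opcode, parent guc_id, number
 * of LRCs (parent + children), then one descriptor offset per LRC.
 * Child descriptors are assumed to sit in consecutive slots right after
 * the parent's, so each offset is the previous one plus one descriptor.
 * Returns the number of dwords written, or 0 if the request cannot fit.
 */
static size_t build_register_multi_lrc(uint32_t *action, size_t max_len,
				       uint32_t guc_id, uint32_t nr_children,
				       uint32_t parent_offset)
{
	uint32_t offset = parent_offset;
	size_t len = 0;
	uint32_t i;

	if (nr_children > EXAMPLE_MAX_ENGINE_INSTANCE ||
	    max_len < 4 + (size_t)nr_children)
		return 0;

	action[len++] = EXAMPLE_ACTION_REGISTER_MULTI_LRC;
	action[len++] = guc_id;
	action[len++] = nr_children + 1;	/* parent plus children */
	action[len++] = offset;			/* parent's descriptor */
	for (i = 0; i < nr_children; i++) {
		offset += EXAMPLE_LRC_DESC_SIZE;	/* next consecutive slot */
		action[len++] = offset;
	}

	return len;
}
```

In the patch, the resulting array is handed to guc_submission_send_busy_loop(); the sketch stops at building it.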

Comments

John Harrison Sept. 15, 2021, 7:21 p.m. UTC | #1

On 8/20/2021 15:44, Matthew Brost wrote:
> Add the multi-lrc context registration H2G. In addition, a work queue and
> process descriptor are set up during multi-lrc context registration, as
> these data structures are needed for multi-lrc submission.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_context_types.h |  12 ++
>   drivers/gpu/drm/i915/gt/intel_lrc.c           |   5 +
>   drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   2 +-
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 109 +++++++++++++++++-
>   4 files changed, 126 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index 0fafc178cf2c..6f567ebeb039 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -232,8 +232,20 @@ struct intel_context {
>   		/** @parent: pointer to parent if child */
>   		struct intel_context *parent;
>   
> +
> +		/** @guc_wqi_head: head pointer in work queue */
> +		u16 guc_wqi_head;
> +		/** @guc_wqi_tail: tail pointer in work queue */
> +		u16 guc_wqi_tail;
> +
Should these be in the 'guc_state' sub-struct? It would be good to keep all
GuC-specific content in one self-contained struct, especially given that the
other child/parent fields are not going to be guc_ prefixed any more.


>   		/** @guc_number_children: number of children if parent */
>   		u8 guc_number_children;
> +
> +		/**
> +		 * @parent_page: page in context used by parent for work queue,
Maybe 'page in context record'? Otherwise, exactly what 'context' is 
meant here? It isn't the 'struct intel_context'. The context record is 
saved as 'ce->state' / 'ce->lrc_reg_state', yes? Is it possible to link 
to either of those fields? Probably not, given that they don't appear to 
have any kerneldoc description :(. Maybe add that in too :).

> +		 * work queue descriptor
Later on, it is described as 'process descriptor and work queue'. It 
would be good to be consistent.

> +		 */
> +		u8 parent_page;
>   	};
>   
>   #ifdef CONFIG_DRM_I915_SELFTEST
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index bb4af4977920..0ddbad4e062a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -861,6 +861,11 @@ __lrc_alloc_state(struct intel_context *ce, struct intel_engine_cs *engine)
>   		context_size += PAGE_SIZE;
>   	}
>   
> +	if (intel_context_is_parent(ce)) {
> +		ce->parent_page = context_size / PAGE_SIZE;
> +		context_size += PAGE_SIZE;
> +	}
> +
>   	obj = i915_gem_object_create_lmem(engine->i915, context_size, 0);
>   	if (IS_ERR(obj))
>   		obj = i915_gem_object_create_shmem(engine->i915, context_size);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> index fa4be13c8854..0e600a3b8f1e 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> @@ -52,7 +52,7 @@
>   
>   #define GUC_DOORBELL_INVALID		256
>   
> -#define GUC_WQ_SIZE			(PAGE_SIZE * 2)
> +#define GUC_WQ_SIZE			(PAGE_SIZE / 2)
Is this size actually dictated by the GuC API? Or is it just a driver 
level decision? If the latter, shouldn't this be below instead?

>   
>   /* Work queue item header definitions */
>   #define WQ_STATUS_ACTIVE		1
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 14b24298cdd7..dbcb9ab28a9a 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -340,6 +340,39 @@ static struct i915_priolist *to_priolist(struct rb_node *rb)
>   	return rb_entry(rb, struct i915_priolist, node);
>   }
>   
> +/*
> + * When using multi-lrc submission an extra page in the context state is
> + * reserved for the process descriptor and work queue.
> + *
> + * The layout of this page is below:
> + * 0						guc_process_desc
> + * ...						unused
> + * PAGE_SIZE / 2				work queue start
> + * ...						work queue
> + * PAGE_SIZE - 1				work queue end
> + */
> +#define WQ_OFFSET	(PAGE_SIZE / 2)
Can this not be derived from GUC_WQ_SIZE given that the two are 
fundamentally linked? E.g. '#define WQ_OFFSET (PAGE_SIZE - 
GUC_WQ_SIZE)'? And maybe have a '#define WQ_TOTAL_SIZE PAGE_SIZE' and 
use that in all of WQ_OFFSET, GUC_WQ_SIZE and the allocation itself in 
intel_lrc.c?
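A minimal sketch of the suggestion above, assuming 4 KiB pages: only WQ_TOTAL_SIZE mentions the page size, the other two sizes are derived from it, and a compile-time check ties them together. EXAMPLE_PAGE_SIZE and reserve_parent_page() are hypothetical names; the latter mirrors the intel_lrc.c hunk that appends one extra page for a parent context and records its index.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Only the total size mentions the page size; everything else derives. */
#define WQ_TOTAL_SIZE	EXAMPLE_PAGE_SIZE
#define GUC_WQ_SIZE	(WQ_TOTAL_SIZE / 2)
#define WQ_OFFSET	(WQ_TOTAL_SIZE - GUC_WQ_SIZE)

/* The work queue must end exactly at the end of the reserved page. */
static_assert(WQ_OFFSET + GUC_WQ_SIZE == WQ_TOTAL_SIZE,
	      "work queue must fill the tail of the reserved page");

/*
 * Append the reserved page to a parent's context image and record which
 * page it is. context_size is assumed to already be page aligned.
 * Returns the new context size.
 */
static size_t reserve_parent_page(size_t context_size, unsigned int *parent_page)
{
	*parent_page = (unsigned int)(context_size / EXAMPLE_PAGE_SIZE);
	return context_size + WQ_TOTAL_SIZE;
}
```

With this arrangement, changing the split between descriptor space and work queue only touches GUC_WQ_SIZE, and the static_assert catches a layout that no longer fits.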

Also, the process descriptor is actually an array of descriptors sized 
by the number of children? Or am I misunderstanding the code below? If 
so, shouldn't there be a 'COMPILE_BUG_ON((MAX_ENGINE_INSTANCE * 
sizeof(descriptor)) > WQ_DESC_SIZE)' where WQ_DESC_SIZE is 
WQ_TOTAL_SIZE - WQ_SIZE?

> +static u32 __get_process_desc_offset(struct intel_context *ce)
> +{
> +	GEM_BUG_ON(!ce->parent_page);
> +
> +	return ce->parent_page * PAGE_SIZE;
> +}
> +
> +static u32 __get_wq_offset(struct intel_context *ce)
> +{
> +	return __get_process_desc_offset(ce) + WQ_OFFSET;
> +}
> +
> +static struct guc_process_desc *
> +__get_process_desc(struct intel_context *ce)
> +{
> +	return (struct guc_process_desc *)
> +		(ce->lrc_reg_state +
> +		 ((__get_process_desc_offset(ce) -
> +		   LRC_STATE_OFFSET) / sizeof(u32)));
Where did the LRC_STATE_OFFSET come from? Is that built into the 
lrc_reg_state pointer itself? That needs to be documented somewhere.

> +}
> +
>   static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
>   {
>   	struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
> @@ -1342,6 +1375,30 @@ static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
>   	spin_unlock_irqrestore(&guc->submission_state.lock, flags);
>   }
>   
> +static int __guc_action_register_multi_lrc(struct intel_guc *guc,
> +					   struct intel_context *ce,
> +					   u32 guc_id,
> +					   u32 offset,
> +					   bool loop)
> +{
> +	struct intel_context *child;
> +	u32 action[4 + MAX_ENGINE_INSTANCE];
> +	int len = 0;
> +
> +	GEM_BUG_ON(ce->guc_number_children > MAX_ENGINE_INSTANCE);
> +
> +	action[len++] = INTEL_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC;
> +	action[len++] = guc_id;
> +	action[len++] = ce->guc_number_children + 1;
> +	action[len++] = offset;
> +	for_each_child(ce, child) {
> +		offset += sizeof(struct guc_lrc_desc);
> +		action[len++] = offset;
> +	}
> +
> +	return guc_submission_send_busy_loop(guc, action, len, 0, loop);
> +}
> +
>   static int __guc_action_register_context(struct intel_guc *guc,
>   					 u32 guc_id,
>   					 u32 offset,
> @@ -1364,9 +1421,15 @@ static int register_context(struct intel_context *ce, bool loop)
>   		ce->guc_id.id * sizeof(struct guc_lrc_desc);
>   	int ret;
>   
> +	GEM_BUG_ON(intel_context_is_child(ce));
>   	trace_intel_context_register(ce);
>   
> -	ret = __guc_action_register_context(guc, ce->guc_id.id, offset, loop);
> +	if (intel_context_is_parent(ce))
> +		ret = __guc_action_register_multi_lrc(guc, ce, ce->guc_id.id,
> +						      offset, loop);
> +	else
> +		ret = __guc_action_register_context(guc, ce->guc_id.id, offset,
> +						    loop);
>   	if (likely(!ret)) {
>   		unsigned long flags;
>   
> @@ -1396,6 +1459,7 @@ static int deregister_context(struct intel_context *ce, u32 guc_id, bool loop)
>   {
>   	struct intel_guc *guc = ce_to_guc(ce);
>   
> +	GEM_BUG_ON(intel_context_is_child(ce));
>   	trace_intel_context_deregister(ce);
>   
>   	return __guc_action_deregister_context(guc, guc_id, loop);
> @@ -1423,6 +1487,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
>   	struct guc_lrc_desc *desc;
>   	bool context_registered;
>   	intel_wakeref_t wakeref;
> +	struct intel_context *child;
>   	int ret = 0;
>   
>   	GEM_BUG_ON(!engine->mask);
> @@ -1448,6 +1513,42 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
>   	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
>   	guc_context_policy_init(engine, desc);
>   
> +	/*
> +	 * Context is a parent, we need to register a process descriptor
> +	 * describing a work queue and register all child contexts.
Technically, this should say 'If the context is a parent'. Or just move 
it to be inside the if block.

> +	 */
> +	if (intel_context_is_parent(ce)) {
> +		struct guc_process_desc *pdesc;
> +
> +		ce->guc_wqi_tail = 0;
> +		ce->guc_wqi_head = 0;
> +
> +		desc->process_desc = i915_ggtt_offset(ce->state) +
> +			__get_process_desc_offset(ce);
> +		desc->wq_addr = i915_ggtt_offset(ce->state) +
> +			__get_wq_offset(ce);
> +		desc->wq_size = GUC_WQ_SIZE;
> +
> +		pdesc = __get_process_desc(ce);
> +		memset(pdesc, 0, sizeof(*(pdesc)));
> +		pdesc->stage_id = ce->guc_id.id;
> +		pdesc->wq_base_addr = desc->wq_addr;
> +		pdesc->wq_size_bytes = desc->wq_size;
> +		pdesc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
Should this not be inherited from the ce? And same below. Or are we not 
using this priority in that way?

John.

> +		pdesc->wq_status = WQ_STATUS_ACTIVE;
> +
> +		for_each_child(ce, child) {
> +			desc = __get_lrc_desc(guc, child->guc_id.id);
> +
> +			desc->engine_class =
> +				engine_class_to_guc_class(engine->class);
> +			desc->hw_context_desc = child->lrc.lrca;
> +			desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
> +			desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
> +			guc_context_policy_init(engine, desc);
> +		}
> +	}
> +
>   	/*
>   	 * The context_lookup xarray is used to determine if the hardware
>   	 * context is currently registered. There are two cases in which it
> @@ -2858,6 +2959,12 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
>   		return NULL;
>   	}
>   
> +	if (unlikely(intel_context_is_child(ce))) {
> +		drm_err(&guc_to_gt(guc)->i915->drm,
> +			"Context is child, desc_idx %u", desc_idx);
> +		return NULL;
> +	}
> +
>   	return ce;
>   }
>

Matthew Brost Sept. 15, 2021, 7:31 p.m. UTC | #2

On Wed, Sep 15, 2021 at 12:21:35PM -0700, John Harrison wrote:
> On 8/20/2021 15:44, Matthew Brost wrote:
> > Add the multi-lrc context registration H2G. In addition, a work queue and
> > process descriptor are set up during multi-lrc context registration, as
> > these data structures are needed for multi-lrc submission.
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_context_types.h |  12 ++
> >   drivers/gpu/drm/i915/gt/intel_lrc.c           |   5 +
> >   drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   2 +-
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 109 +++++++++++++++++-
> >   4 files changed, 126 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > index 0fafc178cf2c..6f567ebeb039 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > @@ -232,8 +232,20 @@ struct intel_context {
> >   		/** @parent: pointer to parent if child */
> >   		struct intel_context *parent;
> > +
> > +		/** @guc_wqi_head: head pointer in work queue */
> > +		u16 guc_wqi_head;
> > +		/** @guc_wqi_tail: tail pointer in work queue */
> > +		u16 guc_wqi_tail;
> > +
> Should these be in the 'guc_state' sub-struct? It would be good to keep all
> GuC-specific content in one self-contained struct, especially given that the
> other child/parent fields are not going to be guc_ prefixed any more.
> 

Right now I have everything in guc_state protected by guc_state.lock,
but these fields are not protected by this lock. IMO it is better to use a
different sub-structure for the parallel fields (even if anonymous).
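To make the locking argument concrete, here is a hypothetical layout that keeps the lock-protected GuC state and the unlocked parent/child fields in separate sub-structures. The names example_context, example_lock_t, sched_state and the "parallel" member are illustrative assumptions, and example_lock_t only stands in for spinlock_t so the sketch compiles in userspace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder for spinlock_t; only here so the layout compiles in userspace. */
typedef struct { int locked; } example_lock_t;

struct example_context {
	/* Fields serialized by guc_state.lock. */
	struct {
		example_lock_t lock;
		uint32_t sched_state;		/* hypothetical member */
	} guc_state;

	/* Parent/child fields: NOT protected by guc_state.lock. */
	struct {
		struct example_context *parent;	/* set on children */
		uint16_t wqi_head;		/* work queue head, parent only */
		uint16_t wqi_tail;		/* work queue tail, parent only */
		uint8_t number_children;	/* parent only */
		uint8_t parent_page;		/* page index in the context record */
	} parallel;
};
```

Grouping by lock rather than by back-end makes the locking rules readable from the struct definition itself; the trade-off raised in this thread is that GuC-only fields then live outside guc_state.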

> 
> >   		/** @guc_number_children: number of children if parent */
> >   		u8 guc_number_children;
> > +
> > +		/**
> > +		 * @parent_page: page in context used by parent for work queue,
> Maybe 'page in context record'? Otherwise, exactly what 'context' is meant
> here? It isn't the 'struct intel_context'. The context record is saved as
> 'ce->state' / 'ce->lrc_reg_state', yes? Is it possible to link to either of

It is the page in ce->state (i.e. that page's offset minus LRC_STATE_OFFSET,
relative to ce->lrc_reg_state). Will update the commit to make that clear.

> those fields? Probably not, given that they don't appear to have any kerneldoc
> description :(. Maybe add that in too :).
> 
> > +		 * work queue descriptor
> Later on, it is described as 'process descriptor and work queue'. It would
> be good to be consistent.
>

Yep. Will fix.

> > +		 */
> > +		u8 parent_page;
> >   	};
> >   #ifdef CONFIG_DRM_I915_SELFTEST
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > index bb4af4977920..0ddbad4e062a 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > @@ -861,6 +861,11 @@ __lrc_alloc_state(struct intel_context *ce, struct intel_engine_cs *engine)
> >   		context_size += PAGE_SIZE;
> >   	}
> > +	if (intel_context_is_parent(ce)) {
> > +		ce->parent_page = context_size / PAGE_SIZE;
> > +		context_size += PAGE_SIZE;
> > +	}
> > +
> >   	obj = i915_gem_object_create_lmem(engine->i915, context_size, 0);
> >   	if (IS_ERR(obj))
> >   		obj = i915_gem_object_create_shmem(engine->i915, context_size);
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> > index fa4be13c8854..0e600a3b8f1e 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> > @@ -52,7 +52,7 @@
> >   #define GUC_DOORBELL_INVALID		256
> > -#define GUC_WQ_SIZE			(PAGE_SIZE * 2)
> > +#define GUC_WQ_SIZE			(PAGE_SIZE / 2)
> Is this size actually dictated by the GuC API? Or is it just a driver level
> decision? If the latter, shouldn't this be below instead?
>

Driver level decision. What exactly do you mean by below?
 
> >   /* Work queue item header definitions */
> >   #define WQ_STATUS_ACTIVE		1
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 14b24298cdd7..dbcb9ab28a9a 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -340,6 +340,39 @@ static struct i915_priolist *to_priolist(struct rb_node *rb)
> >   	return rb_entry(rb, struct i915_priolist, node);
> >   }
> > +/*
> > + * When using multi-lrc submission an extra page in the context state is
> > + * reserved for the process descriptor and work queue.
> > + *
> > + * The layout of this page is below:
> > + * 0						guc_process_desc
> > + * ...						unused
> > + * PAGE_SIZE / 2				work queue start
> > + * ...						work queue
> > + * PAGE_SIZE - 1				work queue end
> > + */
> > +#define WQ_OFFSET	(PAGE_SIZE / 2)
> Can this not be derived from GUC_WQ_SIZE given that the two are
> fundamentally linked? E.g. '#define WQ_OFFSET (PAGE_SIZE - GUC_WQ_SIZE)'?

Yes. I like 'define WQ_OFFSET (PAGE_SIZE - GUC_WQ_SIZE)'. Will change.

> And maybe have a '#define WQ_TOTAL_SIZE PAGE_SIZE' and use that in all of
> WQ_OFFSET, GUC_WQ_SIZE and the allocation itself in intel_lrc.c?
> 
> Also, the process descriptor is actually an array of descriptors sized by
> the number of children? Or am I misunderstanding the code below? If so,

No, it is a fixed-size descriptor.

A later patch in the series uses the space above the process
descriptor for the preemption-point insertion handshake. That does
depend on the number of children. I will add a COMPILE_BUG_ON for that
to ensure everything fits in the memory layout.

> shouldn't there be a 'COMPILE_BUG_ON((MAX_ENGINE_INSTANCE *
> sizeof(descriptor)) > WQ_DESC_SIZE)' where WQ_DESC_SIZE is WQ_TOTAL_SIZE -
> WQ_SIZE?
> 
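The compile-time layout check discussed above could look roughly like this: a fixed-size process descriptor at offset 0, a per-LRC handshake area directly after it whose size depends on the maximum number of children, and both required to end at or before WQ_OFFSET. The structs below are hypothetical stand-ins (the real guc_process_desc and handshake layout are not shown in this patch), and in kernel code the check would typically be written with BUILD_BUG_ON() rather than C11 static_assert.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_PAGE_SIZE		4096u	/* assumption: 4 KiB pages */
#define GUC_WQ_SIZE			(EXAMPLE_PAGE_SIZE / 2)
#define WQ_OFFSET			(EXAMPLE_PAGE_SIZE - GUC_WQ_SIZE)
#define EXAMPLE_MAX_ENGINE_INSTANCE	7u

/* Hypothetical fixed-size stand-in for struct guc_process_desc. */
struct example_process_desc {
	uint32_t stage_id;
	uint32_t wq_status;
	uint32_t priority;
	uint32_t wq_size_bytes;
	uint64_t wq_base_addr;
	uint32_t reserved[10];
};

/* Hypothetical per-LRC slot for the preemption-point handshake. */
struct example_handshake_slot {
	uint32_t go;
	uint32_t join;
};

/* The handshake area starts right after the process descriptor ... */
#define HANDSHAKE_OFFSET	sizeof(struct example_process_desc)
/* ... and needs one slot per LRC: the parent plus up to MAX children. */
#define HANDSHAKE_SIZE \
	((EXAMPLE_MAX_ENGINE_INSTANCE + 1) * sizeof(struct example_handshake_slot))

/* Fails the build if the descriptor plus handshake area run into the WQ. */
static_assert(HANDSHAKE_OFFSET + HANDSHAKE_SIZE <= WQ_OFFSET,
	      "process descriptor + handshake area overlaps the work queue");

/* Byte offset, within the reserved page, of LRC @lrc_index's handshake slot. */
static size_t handshake_slot_offset(unsigned int lrc_index)
{
	return HANDSHAKE_OFFSET + lrc_index * sizeof(struct example_handshake_slot);
}
```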
> > +static u32 __get_process_desc_offset(struct intel_context *ce)
> > +{
> > +	GEM_BUG_ON(!ce->parent_page);
> > +
> > +	return ce->parent_page * PAGE_SIZE;
> > +}
> > +
> > +static u32 __get_wq_offset(struct intel_context *ce)
> > +{
> > +	return __get_process_desc_offset(ce) + WQ_OFFSET;
> > +}
> > +
> > +static struct guc_process_desc *
> > +__get_process_desc(struct intel_context *ce)
> > +{
> > +	return (struct guc_process_desc *)
> > +		(ce->lrc_reg_state +
> > +		 ((__get_process_desc_offset(ce) -
> > +		   LRC_STATE_OFFSET) / sizeof(u32)));
> Where did the LRC_STATE_OFFSET come from? Is that built into the
> lrc_reg_state pointer itself? That needs to be documented somewhere.
> 

In gt/intel_lrc.c (lrc_pin) ce->lrc_reg_state is assigned to
mmap(ce->state) + LRC_STATE_OFFSET. I can update the kerneldoc for that
field in this patch.
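Because ce->lrc_reg_state already points LRC_STATE_OFFSET bytes into the mapping of ce->state, and is a u32 pointer, __get_process_desc() has to subtract LRC_STATE_OFFSET from the byte offset and divide by sizeof(u32) before adding. The sketch below checks that this lands on the same address as "mapping + parent_page * PAGE_SIZE". It assumes 4 KiB pages and that LRC_STATE_OFFSET is one page; the example_* helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define EXAMPLE_PAGE_SIZE	4096u
/* Assumption: the register state begins one page into the context image. */
#define LRC_STATE_OFFSET	(1u * EXAMPLE_PAGE_SIZE)

/* Mirrors lrc_pin(): lrc_reg_state = mapping of ce->state + LRC_STATE_OFFSET. */
static uint32_t *example_lrc_reg_state(void *vaddr)
{
	return (uint32_t *)((unsigned char *)vaddr + LRC_STATE_OFFSET);
}

/*
 * Mirrors __get_process_desc(): convert a byte offset from the start of
 * ce->state into a pointer relative to the u32 lrc_reg_state pointer.
 * parent_page is non-zero (the patch has GEM_BUG_ON(!ce->parent_page)),
 * so the subtraction below cannot underflow.
 */
static void *example_process_desc(uint32_t *lrc_reg_state, unsigned int parent_page)
{
	size_t desc_offset = (size_t)parent_page * EXAMPLE_PAGE_SIZE;

	return lrc_reg_state + (desc_offset - LRC_STATE_OFFSET) / sizeof(uint32_t);
}
```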

> > +}
> > +
> >   static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
> >   {
> >   	struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
> > @@ -1342,6 +1375,30 @@ static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> >   	spin_unlock_irqrestore(&guc->submission_state.lock, flags);
> >   }
> > +static int __guc_action_register_multi_lrc(struct intel_guc *guc,
> > +					   struct intel_context *ce,
> > +					   u32 guc_id,
> > +					   u32 offset,
> > +					   bool loop)
> > +{
> > +	struct intel_context *child;
> > +	u32 action[4 + MAX_ENGINE_INSTANCE];
> > +	int len = 0;
> > +
> > +	GEM_BUG_ON(ce->guc_number_children > MAX_ENGINE_INSTANCE);
> > +
> > +	action[len++] = INTEL_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC;
> > +	action[len++] = guc_id;
> > +	action[len++] = ce->guc_number_children + 1;
> > +	action[len++] = offset;
> > +	for_each_child(ce, child) {
> > +		offset += sizeof(struct guc_lrc_desc);
> > +		action[len++] = offset;
> > +	}
> > +
> > +	return guc_submission_send_busy_loop(guc, action, len, 0, loop);
> > +}
> > +
> >   static int __guc_action_register_context(struct intel_guc *guc,
> >   					 u32 guc_id,
> >   					 u32 offset,
> > @@ -1364,9 +1421,15 @@ static int register_context(struct intel_context *ce, bool loop)
> >   		ce->guc_id.id * sizeof(struct guc_lrc_desc);
> >   	int ret;
> > +	GEM_BUG_ON(intel_context_is_child(ce));
> >   	trace_intel_context_register(ce);
> > -	ret = __guc_action_register_context(guc, ce->guc_id.id, offset, loop);
> > +	if (intel_context_is_parent(ce))
> > +		ret = __guc_action_register_multi_lrc(guc, ce, ce->guc_id.id,
> > +						      offset, loop);
> > +	else
> > +		ret = __guc_action_register_context(guc, ce->guc_id.id, offset,
> > +						    loop);
> >   	if (likely(!ret)) {
> >   		unsigned long flags;
> > @@ -1396,6 +1459,7 @@ static int deregister_context(struct intel_context *ce, u32 guc_id, bool loop)
> >   {
> >   	struct intel_guc *guc = ce_to_guc(ce);
> > +	GEM_BUG_ON(intel_context_is_child(ce));
> >   	trace_intel_context_deregister(ce);
> >   	return __guc_action_deregister_context(guc, guc_id, loop);
> > @@ -1423,6 +1487,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
> >   	struct guc_lrc_desc *desc;
> >   	bool context_registered;
> >   	intel_wakeref_t wakeref;
> > +	struct intel_context *child;
> >   	int ret = 0;
> >   	GEM_BUG_ON(!engine->mask);
> > @@ -1448,6 +1513,42 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
> >   	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
> >   	guc_context_policy_init(engine, desc);
> > +	/*
> > +	 * Context is a parent, we need to register a process descriptor
> > +	 * describing a work queue and register all child contexts.
> Technically, this should say 'If the context is a parent'. Or just move it
> to be inside the if block.
> 

I will add the "If the".

> > +	 */
> > +	if (intel_context_is_parent(ce)) {
> > +		struct guc_process_desc *pdesc;
> > +
> > +		ce->guc_wqi_tail = 0;
> > +		ce->guc_wqi_head = 0;
> > +
> > +		desc->process_desc = i915_ggtt_offset(ce->state) +
> > +			__get_process_desc_offset(ce);
> > +		desc->wq_addr = i915_ggtt_offset(ce->state) +
> > +			__get_wq_offset(ce);
> > +		desc->wq_size = GUC_WQ_SIZE;
> > +
> > +		pdesc = __get_process_desc(ce);
> > +		memset(pdesc, 0, sizeof(*(pdesc)));
> > +		pdesc->stage_id = ce->guc_id.id;
> > +		pdesc->wq_base_addr = desc->wq_addr;
> > +		pdesc->wq_size_bytes = desc->wq_size;
> > +		pdesc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
> Should this not be inherited from the ce? And same below. Or are we not
> using this priority in that way?
> 

Honestly, I don't think this field is used, and it may not even exist
anymore. I'll check the GuC code and likely delete this; if it is
still present, I'll inherit it from the ce.

Matt

> John.
> 
> > +		pdesc->wq_status = WQ_STATUS_ACTIVE;
> > +
> > +		for_each_child(ce, child) {
> > +			desc = __get_lrc_desc(guc, child->guc_id.id);
> > +
> > +			desc->engine_class =
> > +				engine_class_to_guc_class(engine->class);
> > +			desc->hw_context_desc = child->lrc.lrca;
> > +			desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
> > +			desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
> > +			guc_context_policy_init(engine, desc);
> > +		}
> > +	}
> > +
> >   	/*
> >   	 * The context_lookup xarray is used to determine if the hardware
> >   	 * context is currently registered. There are two cases in which it
> > @@ -2858,6 +2959,12 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
> >   		return NULL;
> >   	}
> > +	if (unlikely(intel_context_is_child(ce))) {
> > +		drm_err(&guc_to_gt(guc)->i915->drm,
> > +			"Context is child, desc_idx %u", desc_idx);
> > +		return NULL;
> > +	}
> > +
> >   	return ce;
> >   }
>

John Harrison Sept. 15, 2021, 8:23 p.m. UTC | #3

On 9/15/2021 12:31, Matthew Brost wrote:
> On Wed, Sep 15, 2021 at 12:21:35PM -0700, John Harrison wrote:
>> On 8/20/2021 15:44, Matthew Brost wrote:
>>> Add the multi-lrc context registration H2G. In addition, a work queue and
>>> process descriptor are set up during multi-lrc context registration, as
>>> these data structures are needed for multi-lrc submission.
>>>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gt/intel_context_types.h |  12 ++
>>>    drivers/gpu/drm/i915/gt/intel_lrc.c           |   5 +
>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   2 +-
>>>    .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 109 +++++++++++++++++-
>>>    4 files changed, 126 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
>>> index 0fafc178cf2c..6f567ebeb039 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
>>> @@ -232,8 +232,20 @@ struct intel_context {
>>>    		/** @parent: pointer to parent if child */
>>>    		struct intel_context *parent;
>>> +
>>> +		/** @guc_wqi_head: head pointer in work queue */
>>> +		u16 guc_wqi_head;
>>> +		/** @guc_wqi_tail: tail pointer in work queue */
>>> +		u16 guc_wqi_tail;
>>> +
>> Should these be in the 'guc_state' sub-struct? It would be good to keep all
>> GuC-specific content in one self-contained struct, especially given that the
>> other child/parent fields are not going to be guc_ prefixed any more.
>>
> Right now I have everything in guc_state protected by guc_state.lock,
> but these fields are not protected by this lock. IMO it is better to use a
> different sub-structure for the parallel fields (even if anonymous).
Hmm, I still think it is bad to be scattering back-end-specific fields
amongst regular fields. The GuC patches include a whole bunch of
complaints about execlist back-end-specific stuff leaking through to the
higher levels; we really shouldn't be guilty of doing the same with GuC
if at all possible. At the very least, the GuC-specific fields should be
grouped together at the end of the struct rather than intermingled.

>
>>>    		/** @guc_number_children: number of children if parent */
>>>    		u8 guc_number_children;
>>> +
>>> +		/**
>>> +		 * @parent_page: page in context used by parent for work queue,
>> Maybe 'page in context record'? Otherwise, exactly what 'context' is meant
>> here? It isn't the 'struct intel_context'. The context record is saved as
>> 'ce->state' / 'ce->lrc_reg_state', yes? Is it possible to link to either of
> It is the page in ce->state (i.e. that page's offset minus LRC_STATE_OFFSET,
> relative to ce->lrc_reg_state). Will update the commit to make that clear.
>
>> those fields? Probably not, given that they don't appear to have any kerneldoc
>> description :(. Maybe add that in too :).
>>
>>> +		 * work queue descriptor
>> Later on, it is described as 'process descriptor and work queue'. It would
>> be good to be consistent.
>>
> Yep. Will fix.
>
>>> +		 */
>>> +		u8 parent_page;
>>>    	};
>>>    #ifdef CONFIG_DRM_I915_SELFTEST
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> index bb4af4977920..0ddbad4e062a 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> @@ -861,6 +861,11 @@ __lrc_alloc_state(struct intel_context *ce, struct intel_engine_cs *engine)
>>>    		context_size += PAGE_SIZE;
>>>    	}
>>> +	if (intel_context_is_parent(ce)) {
>>> +		ce->parent_page = context_size / PAGE_SIZE;
>>> +		context_size += PAGE_SIZE;
>>> +	}
>>> +
>>>    	obj = i915_gem_object_create_lmem(engine->i915, context_size, 0);
>>>    	if (IS_ERR(obj))
>>>    		obj = i915_gem_object_create_shmem(engine->i915, context_size);
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
>>> index fa4be13c8854..0e600a3b8f1e 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
>>> @@ -52,7 +52,7 @@
>>>    #define GUC_DOORBELL_INVALID		256
>>> -#define GUC_WQ_SIZE			(PAGE_SIZE * 2)
>>> +#define GUC_WQ_SIZE			(PAGE_SIZE / 2)
>> Is this size actually dictated by the GuC API? Or is it just a driver level
>> decision? If the latter, shouldn't this be below instead?
>>
> Driver level decision. What exactly do you mean by below?
The next chunk of the patch - where WQ_OFFSET is defined and the whole 
caboodle is described.

>   
>>>    /* Work queue item header definitions */
>>>    #define WQ_STATUS_ACTIVE		1
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> index 14b24298cdd7..dbcb9ab28a9a 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> @@ -340,6 +340,39 @@ static struct i915_priolist *to_priolist(struct rb_node *rb)
>>>    	return rb_entry(rb, struct i915_priolist, node);
>>>    }
>>> +/*
>>> + * When using multi-lrc submission an extra page in the context state is
>>> + * reserved for the process descriptor and work queue.
>>> + *
>>> + * The layout of this page is below:
>>> + * 0						guc_process_desc
>>> + * ...						unused
>>> + * PAGE_SIZE / 2				work queue start
>>> + * ...						work queue
>>> + * PAGE_SIZE - 1				work queue end
>>> + */
>>> +#define WQ_OFFSET	(PAGE_SIZE / 2)
>> Can this not be derived from GUC_WQ_SIZE given that the two are
>> fundamentally linked? E.g. '#define WQ_OFFSET (PAGE_SIZE - GUC_WQ_SIZE)'?
> Yes. I like 'define WQ_OFFSET (PAGE_SIZE - GUC_WQ_SIZE)'. Will change.
>
>> And maybe have a '#define WQ_TOTAL_SIZE PAGE_SIZE' and use that in all of
>> WQ_OFFSET, GUC_WQ_SIZE and the allocation itself in intel_lrc.c?
>>
>> Also, the process descriptor is actually an array of descriptors sized by
>> the number of children? Or am I misunderstanding the code below? If so,
> No, it is a fixed-size descriptor.
Yeah, I think I was getting confused between pdesc and desc in the code 
below.

I still think it would be a good idea to have everything explicitly
named, with the only mention of PAGE_SIZE being in the 'total size' definition.

John.


>
> A later patch in the series uses the space above the process
> descriptor for the preemption point insertion handshake. That does
> depend on the number of children. I will add a COMPILE_BUG_ON for that
> to ensure everything fits in the memory layout.
>
>> shouldn't there be a 'COMPILE_BUG_ON((MAX_ENGINE_INSTANCE *
>> sizeof(descriptor)) < (WQ_DESC_SIZE)' where WQ_DESC_SIZE is WQ_TOTAL_SIZE -
>> WQ_SIZE?
>>
>>> +static u32 __get_process_desc_offset(struct intel_context *ce)
>>> +{
>>> +	GEM_BUG_ON(!ce->parent_page);
>>> +
>>> +	return ce->parent_page * PAGE_SIZE;
>>> +}
>>> +
>>> +static u32 __get_wq_offset(struct intel_context *ce)
>>> +{
>>> +	return __get_process_desc_offset(ce) + WQ_OFFSET;
>>> +}
>>> +
>>> +static struct guc_process_desc *
>>> +__get_process_desc(struct intel_context *ce)
>>> +{
>>> +	return (struct guc_process_desc *)
>>> +		(ce->lrc_reg_state +
>>> +		 ((__get_process_desc_offset(ce) -
>>> +		   LRC_STATE_OFFSET) / sizeof(u32)));
>> Where did the LRC_STATE_OFFSET come from? Is that built into the
>> lrc_reg_state pointer itself? That needs to be documented somewhere.
>>
> In gt/intel_lrc.c (lrc_pin) ce->lrc_reg_state is assigned to
> mmap(ce->state) + LRC_STATE_OFFSET. I can update the kerneldoc for that
> field in this patch.
>
>>> +}
>>> +
>>>    static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
>>>    {
>>>    	struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
>>> @@ -1342,6 +1375,30 @@ static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
>>>    	spin_unlock_irqrestore(&guc->submission_state.lock, flags);
>>>    }
>>> +static int __guc_action_register_multi_lrc(struct intel_guc *guc,
>>> +					   struct intel_context *ce,
>>> +					   u32 guc_id,
>>> +					   u32 offset,
>>> +					   bool loop)
>>> +{
>>> +	struct intel_context *child;
>>> +	u32 action[4 + MAX_ENGINE_INSTANCE];
>>> +	int len = 0;
>>> +
>>> +	GEM_BUG_ON(ce->guc_number_children > MAX_ENGINE_INSTANCE);
>>> +
>>> +	action[len++] = INTEL_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC;
>>> +	action[len++] = guc_id;
>>> +	action[len++] = ce->guc_number_children + 1;
>>> +	action[len++] = offset;
>>> +	for_each_child(ce, child) {
>>> +		offset += sizeof(struct guc_lrc_desc);
>>> +		action[len++] = offset;
>>> +	}
>>> +
>>> +	return guc_submission_send_busy_loop(guc, action, len, 0, loop);
>>> +}
>>> +
>>>    static int __guc_action_register_context(struct intel_guc *guc,
>>>    					 u32 guc_id,
>>>    					 u32 offset,
>>> @@ -1364,9 +1421,15 @@ static int register_context(struct intel_context *ce, bool loop)
>>>    		ce->guc_id.id * sizeof(struct guc_lrc_desc);
>>>    	int ret;
>>> +	GEM_BUG_ON(intel_context_is_child(ce));
>>>    	trace_intel_context_register(ce);
>>> -	ret = __guc_action_register_context(guc, ce->guc_id.id, offset, loop);
>>> +	if (intel_context_is_parent(ce))
>>> +		ret = __guc_action_register_multi_lrc(guc, ce, ce->guc_id.id,
>>> +						      offset, loop);
>>> +	else
>>> +		ret = __guc_action_register_context(guc, ce->guc_id.id, offset,
>>> +						    loop);
>>>    	if (likely(!ret)) {
>>>    		unsigned long flags;
>>> @@ -1396,6 +1459,7 @@ static int deregister_context(struct intel_context *ce, u32 guc_id, bool loop)
>>>    {
>>>    	struct intel_guc *guc = ce_to_guc(ce);
>>> +	GEM_BUG_ON(intel_context_is_child(ce));
>>>    	trace_intel_context_deregister(ce);
>>>    	return __guc_action_deregister_context(guc, guc_id, loop);
>>> @@ -1423,6 +1487,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
>>>    	struct guc_lrc_desc *desc;
>>>    	bool context_registered;
>>>    	intel_wakeref_t wakeref;
>>> +	struct intel_context *child;
>>>    	int ret = 0;
>>>    	GEM_BUG_ON(!engine->mask);
>>> @@ -1448,6 +1513,42 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
>>>    	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
>>>    	guc_context_policy_init(engine, desc);
>>> +	/*
>>> +	 * Context is a parent, we need to register a process descriptor
>>> +	 * describing a work queue and register all child contexts.
>> Technically, this should say 'If the context is a parent'. Or just move it
>> to be inside the if block.
>>
> I will add the "If the".
>
>>> +	 */
>>> +	if (intel_context_is_parent(ce)) {
>>> +		struct guc_process_desc *pdesc;
>>> +
>>> +		ce->guc_wqi_tail = 0;
>>> +		ce->guc_wqi_head = 0;
>>> +
>>> +		desc->process_desc = i915_ggtt_offset(ce->state) +
>>> +			__get_process_desc_offset(ce);
>>> +		desc->wq_addr = i915_ggtt_offset(ce->state) +
>>> +			__get_wq_offset(ce);
>>> +		desc->wq_size = GUC_WQ_SIZE;
>>> +
>>> +		pdesc = __get_process_desc(ce);
>>> +		memset(pdesc, 0, sizeof(*(pdesc)));
>>> +		pdesc->stage_id = ce->guc_id.id;
>>> +		pdesc->wq_base_addr = desc->wq_addr;
>>> +		pdesc->wq_size_bytes = desc->wq_size;
>>> +		pdesc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
>> Should this not be inherited from the ce? And same below. Or are we not
>> using this priority in that way?
>>
> Honestly, I don't think this field is used, or maybe it doesn't even exist
> anymore. I'll check the GuC code and likely delete it, or if it is
> still present I'll inherit it from the ce.
>
> Matt
>
>> John.
>>
>>> +		pdesc->wq_status = WQ_STATUS_ACTIVE;
>>> +
>>> +		for_each_child(ce, child) {
>>> +			desc = __get_lrc_desc(guc, child->guc_id.id);
>>> +
>>> +			desc->engine_class =
>>> +				engine_class_to_guc_class(engine->class);
>>> +			desc->hw_context_desc = child->lrc.lrca;
>>> +			desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
>>> +			desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
>>> +			guc_context_policy_init(engine, desc);
>>> +		}
>>> +	}
>>> +
>>>    	/*
>>>    	 * The context_lookup xarray is used to determine if the hardware
>>>    	 * context is currently registered. There are two cases in which it
>>> @@ -2858,6 +2959,12 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
>>>    		return NULL;
>>>    	}
>>> +	if (unlikely(intel_context_is_child(ce))) {
>>> +		drm_err(&guc_to_gt(guc)->i915->drm,
>>> +			"Context is child, desc_idx %u", desc_idx);
>>> +		return NULL;
>>> +	}
>>> +
>>>    	return ce;
>>>    }

Matthew Brost Sept. 15, 2021, 8:33 p.m. UTC | #4

On Wed, Sep 15, 2021 at 01:23:19PM -0700, John Harrison wrote:
> On 9/15/2021 12:31, Matthew Brost wrote:
> > On Wed, Sep 15, 2021 at 12:21:35PM -0700, John Harrison wrote:
> > > On 8/20/2021 15:44, Matthew Brost wrote:
> > > > Add multi-lrc context registration H2G. In addition, a work queue and
> > > > process descriptor are set up during multi-lrc context registration, as
> > > > these data structures are needed for multi-lrc submission.
> > > > 
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > >    drivers/gpu/drm/i915/gt/intel_context_types.h |  12 ++
> > > >    drivers/gpu/drm/i915/gt/intel_lrc.c           |   5 +
> > > >    drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   2 +-
> > > >    .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 109 +++++++++++++++++-
> > > >    4 files changed, 126 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > > > index 0fafc178cf2c..6f567ebeb039 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > > > @@ -232,8 +232,20 @@ struct intel_context {
> > > >    		/** @parent: pointer to parent if child */
> > > >    		struct intel_context *parent;
> > > > +
> > > > +		/** @guc_wqi_head: head pointer in work queue */
> > > > +		u16 guc_wqi_head;
> > > > +		/** @guc_wqi_tail: tail pointer in work queue */
> > > > +		u16 guc_wqi_tail;
> > > > +
> > > Should these be in the 'guc_state' sub-struct? It would be good to keep all
> > > GuC-specific content in one self-contained struct, especially given that the
> > > other child/parent fields are not going to be guc_ prefixed any more.
> > > 
> > Right now I have everything in guc_state protected by guc_state.lock;
> > these fields are not protected by this lock. IMO it is better to use a
> > different sub-structure for the parallel fields (even if anonymous).
> Hmm, I still think it is bad to be scattering back-end specific fields
> amongst regular fields. The GuC patches include a whole bunch of complaints
> about execlist back-end specific stuff leaking through to the higher levels,
> we really shouldn't be guilty of doing the same with GuC if at all possible.
> At the very least, the GuC specific fields should be grouped together at the
> end of the struct rather than inter-mingled.
> 

How about 2 different sub-structures: parallel (shared) and guc_parallel (GuC specific)?
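A minimal sketch of the two-sub-structure grouping proposed above, assuming a backend-agnostic `parallel` block for the parent/child links and a `guc_parallel` block, kept at the end of the struct, for the GuC work queue state that is not covered by `guc_state.lock`. Every name here (`sketch_context`, `sketch_is_parent()`, `sketch_is_child()`) is hypothetical and only mirrors the fields added by this patch; the real `struct intel_context` has many more members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct intel_context. */
struct sketch_context {
	/* ... backend-agnostic fields ... */

	/* Parent/child relationship, shared by every submission backend. */
	struct {
		struct sketch_context *parent;	/* non-NULL if this is a child */
		uint8_t number_children;	/* non-zero if this is a parent */
	} parallel;

	/*
	 * GuC-only state for parallel submission, grouped at the end of the
	 * struct. Not protected by guc_state.lock, hence a separate block.
	 */
	struct {
		uint16_t wqi_head;	/* head pointer in work queue */
		uint16_t wqi_tail;	/* tail pointer in work queue */
		uint8_t parent_page;	/* page of the state object holding the
					 * process descriptor and work queue */
	} guc_parallel;
};

/* A context with children is a parent. */
static int sketch_is_parent(const struct sketch_context *ce)
{
	return ce->parallel.number_children > 0;
}

/* A context that points at a parent is a child. */
static int sketch_is_child(const struct sketch_context *ce)
{
	return ce->parallel.parent != NULL;
}
```

With this split, execlists-only code never needs to see the `guc_parallel` block, while both backends can share the parent/child helpers.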

> > 
> > > >    		/** @guc_number_children: number of children if parent */
> > > >    		u8 guc_number_children;
> > > > +
> > > > +		/**
> > > > +		 * @parent_page: page in context used by parent for work queue,
> > > Maybe 'page in context record'? Otherwise, exactly what 'context' is meant
> > > here? It isn't the 'struct intel_context'. The context record is saved as
> > > 'ce->state' / 'ce->lrc_reg_state', yes? Is it possible to link to either of
> > It is the page in ce->state / page minus LRC reg offset in
> > ce->lrc_reg_state. Will update the commit to make that clear.
> > 
> > > those fields? Probably not, given that they don't appear to have any kerneldoc
> > > description :(. Maybe add that in too :).
> > > 
> > > > +		 * work queue descriptor
> > > Later on, it is described as 'process descriptor and work queue'. It would
> > > be good to be consistent.
> > > 
> > Yep. Will fix.
> > 
> > > > +		 */
> > > > +		u8 parent_page;
> > > >    	};
> > > >    #ifdef CONFIG_DRM_I915_SELFTEST
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > > > index bb4af4977920..0ddbad4e062a 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > > > @@ -861,6 +861,11 @@ __lrc_alloc_state(struct intel_context *ce, struct intel_engine_cs *engine)
> > > >    		context_size += PAGE_SIZE;
> > > >    	}
> > > > +	if (intel_context_is_parent(ce)) {
> > > > +		ce->parent_page = context_size / PAGE_SIZE;
> > > > +		context_size += PAGE_SIZE;
> > > > +	}
> > > > +
> > > >    	obj = i915_gem_object_create_lmem(engine->i915, context_size, 0);
> > > >    	if (IS_ERR(obj))
> > > >    		obj = i915_gem_object_create_shmem(engine->i915, context_size);
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> > > > index fa4be13c8854..0e600a3b8f1e 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> > > > @@ -52,7 +52,7 @@
> > > >    #define GUC_DOORBELL_INVALID		256
> > > > -#define GUC_WQ_SIZE			(PAGE_SIZE * 2)
> > > > +#define GUC_WQ_SIZE			(PAGE_SIZE / 2)
> > > Is this size actually dictated by the GuC API? Or is it just a driver level
> > > decision? If the latter, shouldn't this be below instead?
> > > 
> > Driver level decision. What exactly do you mean by below?
> The next chunk of the patch - where WQ_OFFSET is defined and the whole
> caboodle is described.
> 
> > > >    /* Work queue item header definitions */
> > > >    #define WQ_STATUS_ACTIVE		1
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > index 14b24298cdd7..dbcb9ab28a9a 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > @@ -340,6 +340,39 @@ static struct i915_priolist *to_priolist(struct rb_node *rb)
> > > >    	return rb_entry(rb, struct i915_priolist, node);
> > > >    }
> > > > +/*
> > > > + * When using multi-lrc submission an extra page in the context state is
> > > > + * reserved for the process descriptor and work queue.
> > > > + *
> > > > + * The layout of this page is below:
> > > > + * 0						guc_process_desc
> > > > + * ...						unused
> > > > + * PAGE_SIZE / 2				work queue start
> > > > + * ...						work queue
> > > > + * PAGE_SIZE - 1				work queue end
> > > > + */
> > > > +#define WQ_OFFSET	(PAGE_SIZE / 2)
> > > Can this not be derived from GUC_WQ_SIZE given that the two are
> > > fundamentally linked? E.g. '#define WQ_OFFSET (PAGE_SIZE - GUC_WQ_SIZE)'?
> > Yes. I like '#define WQ_OFFSET (PAGE_SIZE - GUC_WQ_SIZE)'. Will change.
> > 
> > > And maybe have a '#define WQ_TOTAL_SIZE PAGE_SIZE' and use that in all of
> > > WQ_OFFSET, GUC_WQ_SIZE and the allocation itself in intel_lrc.c?
> > > 
> > > Also, the process descriptor is actually an array of descriptors sized by
> > > the number of children? Or am I misunderstanding the code below? If so,
> > No, it is a fixed-size descriptor.
> Yeah, I think I was getting confused between pdesc and desc in the code
> below.
> 
> I still think it would be a good idea to have everything explicitly named
> and the only mention of PAGE_SIZE is in the 'total size' definition.
> 

#define PARENT_SCRATCH_SIZE 	PAGE_SIZE?

Matt 
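A minimal sketch of that naming scheme, assuming a 4 KiB page and using hypothetical stand-ins (`SKETCH_MAX_ENGINE_INSTANCE`, `struct sketch_desc`) for the real driver constants and descriptor type. `PARENT_SCRATCH_SIZE` is the only definition built from the page size; the work queue size and offset are derived from it, and the space in front of the work queue is checked at compile time. The kernel would express these checks with `BUILD_BUG_ON()`, which breaks the build when its condition is true, whereas C11 `static_assert` fails when its condition is false, so the intended "descriptors fit in front of the work queue" check reads `n * size <= WQ_DESC_SIZE` here.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE		4096u	/* assumption: 4 KiB pages */

/* The only definition that refers to the page size. */
#define PARENT_SCRATCH_SIZE		SKETCH_PAGE_SIZE

/* The work queue occupies the upper half of the scratch page. */
#define WQ_SIZE				(PARENT_SCRATCH_SIZE / 2)
#define WQ_OFFSET			(PARENT_SCRATCH_SIZE - WQ_SIZE)

/* Bytes in front of the work queue: process descriptor plus any
 * per-child data a later patch may place there. */
#define WQ_DESC_SIZE			WQ_OFFSET

#define SKETCH_MAX_ENGINE_INSTANCE	8	/* hypothetical value */
struct sketch_desc { uint32_t dw[16]; };	/* hypothetical 64-byte descriptor */

/* Everything placed before the work queue must fit in front of it. */
static_assert(SKETCH_MAX_ENGINE_INSTANCE * sizeof(struct sketch_desc) <= WQ_DESC_SIZE,
	      "per-child descriptors overlap the work queue");

/* The work queue must end exactly at the end of the scratch area. */
static_assert(WQ_OFFSET + WQ_SIZE == PARENT_SCRATCH_SIZE,
	      "work queue does not end at the end of the scratch area");

/* Byte offset of the work queue inside the context state object.
 * parent_page counts whole pages, as in __get_process_desc_offset(). */
static uint32_t sketch_wq_offset(uint32_t parent_page)
{
	return parent_page * SKETCH_PAGE_SIZE + WQ_OFFSET;
}
```

The allocation in `__lrc_alloc_state()` would then add `PARENT_SCRATCH_SIZE` instead of a bare `PAGE_SIZE`, so resizing the scratch area only touches one definition.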

> John.
> 
> 
> > 
> > A later patch in the series uses the space above the process
> > descriptor for the preemption point insertion handshake. That does
> > depend on the number of children. I will add a COMPILE_BUG_ON for that
> > to ensure everything fits in the memory layout.
> > 
> > > shouldn't there be a 'COMPILE_BUG_ON((MAX_ENGINE_INSTANCE *
> > > sizeof(descriptor)) < (WQ_DESC_SIZE)' where WQ_DESC_SIZE is WQ_TOTAL_SIZE -
> > > WQ_SIZE?
> > > 
> > > > +static u32 __get_process_desc_offset(struct intel_context *ce)
> > > > +{
> > > > +	GEM_BUG_ON(!ce->parent_page);
> > > > +
> > > > +	return ce->parent_page * PAGE_SIZE;
> > > > +}
> > > > +
> > > > +static u32 __get_wq_offset(struct intel_context *ce)
> > > > +{
> > > > +	return __get_process_desc_offset(ce) + WQ_OFFSET;
> > > > +}
> > > > +
> > > > +static struct guc_process_desc *
> > > > +__get_process_desc(struct intel_context *ce)
> > > > +{
> > > > +	return (struct guc_process_desc *)
> > > > +		(ce->lrc_reg_state +
> > > > +		 ((__get_process_desc_offset(ce) -
> > > > +		   LRC_STATE_OFFSET) / sizeof(u32)));
> > > Where did the LRC_STATE_OFFSET come from? Is that built into the
> > > lrc_reg_state pointer itself? That needs to be documented somewhere.
> > > 
> > In gt/intel_lrc.c (lrc_pin) ce->lrc_reg_state is assigned to
> > mmap(ce->state) + LRC_STATE_OFFSET. I can update the kerneldoc for that
> > field in this patch.
> > 
> > > > +}
> > > > +
> > > >    static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
> > > >    {
> > > >    	struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
> > > > @@ -1342,6 +1375,30 @@ static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > > >    	spin_unlock_irqrestore(&guc->submission_state.lock, flags);
> > > >    }
> > > > +static int __guc_action_register_multi_lrc(struct intel_guc *guc,
> > > > +					   struct intel_context *ce,
> > > > +					   u32 guc_id,
> > > > +					   u32 offset,
> > > > +					   bool loop)
> > > > +{
> > > > +	struct intel_context *child;
> > > > +	u32 action[4 + MAX_ENGINE_INSTANCE];
> > > > +	int len = 0;
> > > > +
> > > > +	GEM_BUG_ON(ce->guc_number_children > MAX_ENGINE_INSTANCE);
> > > > +
> > > > +	action[len++] = INTEL_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC;
> > > > +	action[len++] = guc_id;
> > > > +	action[len++] = ce->guc_number_children + 1;
> > > > +	action[len++] = offset;
> > > > +	for_each_child(ce, child) {
> > > > +		offset += sizeof(struct guc_lrc_desc);
> > > > +		action[len++] = offset;
> > > > +	}
> > > > +
> > > > +	return guc_submission_send_busy_loop(guc, action, len, 0, loop);
> > > > +}
> > > > +
> > > >    static int __guc_action_register_context(struct intel_guc *guc,
> > > >    					 u32 guc_id,
> > > >    					 u32 offset,
> > > > @@ -1364,9 +1421,15 @@ static int register_context(struct intel_context *ce, bool loop)
> > > >    		ce->guc_id.id * sizeof(struct guc_lrc_desc);
> > > >    	int ret;
> > > > +	GEM_BUG_ON(intel_context_is_child(ce));
> > > >    	trace_intel_context_register(ce);
> > > > -	ret = __guc_action_register_context(guc, ce->guc_id.id, offset, loop);
> > > > +	if (intel_context_is_parent(ce))
> > > > +		ret = __guc_action_register_multi_lrc(guc, ce, ce->guc_id.id,
> > > > +						      offset, loop);
> > > > +	else
> > > > +		ret = __guc_action_register_context(guc, ce->guc_id.id, offset,
> > > > +						    loop);
> > > >    	if (likely(!ret)) {
> > > >    		unsigned long flags;
> > > > @@ -1396,6 +1459,7 @@ static int deregister_context(struct intel_context *ce, u32 guc_id, bool loop)
> > > >    {
> > > >    	struct intel_guc *guc = ce_to_guc(ce);
> > > > +	GEM_BUG_ON(intel_context_is_child(ce));
> > > >    	trace_intel_context_deregister(ce);
> > > >    	return __guc_action_deregister_context(guc, guc_id, loop);
> > > > @@ -1423,6 +1487,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
> > > >    	struct guc_lrc_desc *desc;
> > > >    	bool context_registered;
> > > >    	intel_wakeref_t wakeref;
> > > > +	struct intel_context *child;
> > > >    	int ret = 0;
> > > >    	GEM_BUG_ON(!engine->mask);
> > > > @@ -1448,6 +1513,42 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
> > > >    	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
> > > >    	guc_context_policy_init(engine, desc);
> > > > +	/*
> > > > +	 * Context is a parent, we need to register a process descriptor
> > > > +	 * describing a work queue and register all child contexts.
> > > Technically, this should say 'If the context is a parent'. Or just move it
> > > to be inside the if block.
> > > 
> > I will add the "If the".
> > 
> > > > +	 */
> > > > +	if (intel_context_is_parent(ce)) {
> > > > +		struct guc_process_desc *pdesc;
> > > > +
> > > > +		ce->guc_wqi_tail = 0;
> > > > +		ce->guc_wqi_head = 0;
> > > > +
> > > > +		desc->process_desc = i915_ggtt_offset(ce->state) +
> > > > +			__get_process_desc_offset(ce);
> > > > +		desc->wq_addr = i915_ggtt_offset(ce->state) +
> > > > +			__get_wq_offset(ce);
> > > > +		desc->wq_size = GUC_WQ_SIZE;
> > > > +
> > > > +		pdesc = __get_process_desc(ce);
> > > > +		memset(pdesc, 0, sizeof(*(pdesc)));
> > > > +		pdesc->stage_id = ce->guc_id.id;
> > > > +		pdesc->wq_base_addr = desc->wq_addr;
> > > > +		pdesc->wq_size_bytes = desc->wq_size;
> > > > +		pdesc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
> > > Should this not be inherited from the ce? And same below. Or are we not
> > > using this priority in that way?
> > > 
> > Honestly, I don't think this field is used, or maybe it doesn't even exist
> > anymore. I'll check the GuC code and likely delete it, or if it is
> > still present I'll inherit it from the ce.
> > 
> > Matt
> > 
> > > John.
> > > 
> > > > +		pdesc->wq_status = WQ_STATUS_ACTIVE;
> > > > +
> > > > +		for_each_child(ce, child) {
> > > > +			desc = __get_lrc_desc(guc, child->guc_id.id);
> > > > +
> > > > +			desc->engine_class =
> > > > +				engine_class_to_guc_class(engine->class);
> > > > +			desc->hw_context_desc = child->lrc.lrca;
> > > > +			desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
> > > > +			desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
> > > > +			guc_context_policy_init(engine, desc);
> > > > +		}
> > > > +	}
> > > > +
> > > >    	/*
> > > >    	 * The context_lookup xarray is used to determine if the hardware
> > > >    	 * context is currently registered. There are two cases in which it
> > > > @@ -2858,6 +2959,12 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
> > > >    		return NULL;
> > > >    	}
> > > > +	if (unlikely(intel_context_is_child(ce))) {
> > > > +		drm_err(&guc_to_gt(guc)->i915->drm,
> > > > +			"Context is child, desc_idx %u", desc_idx);
> > > > +		return NULL;
> > > > +	}
> > > > +
> > > >    	return ce;
> > > >    }
>
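As explained in the thread, `ce->lrc_reg_state` is a `u32 *` that already points `LRC_STATE_OFFSET` bytes into the mapped state object, which is why `__get_process_desc()` subtracts `LRC_STATE_OFFSET` and then divides by `sizeof(u32)`. The standalone sketch below reproduces that arithmetic with a hypothetical `SKETCH_LRC_STATE_OFFSET` value and a plain array standing in for the mapped object; it is an illustration, not driver code.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE		4096u
#define SKETCH_LRC_STATE_OFFSET		(1 * SKETCH_PAGE_SIZE)	/* hypothetical */

/*
 * Convert a byte offset from the start of the state object into a pointer,
 * starting from lrc_reg_state (== object start + SKETCH_LRC_STATE_OFFSET).
 * lrc_reg_state is a uint32_t *, so each +1 advances 4 bytes; the rebased
 * byte offset is therefore divided by sizeof(uint32_t).
 */
static void *sketch_state_ptr(uint32_t *lrc_reg_state, uint32_t byte_offset)
{
	assert(byte_offset >= SKETCH_LRC_STATE_OFFSET);
	assert((byte_offset - SKETCH_LRC_STATE_OFFSET) % sizeof(uint32_t) == 0);

	return lrc_reg_state +
	       (byte_offset - SKETCH_LRC_STATE_OFFSET) / sizeof(uint32_t);
}
```

An equivalent byte-based form, `(char *)lrc_reg_state - LRC_STATE_OFFSET + byte_offset`, avoids the division at the cost of a cast. Either way the subtraction is only safe because the parent page is appended after the register state, so its offset is never below `LRC_STATE_OFFSET`.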

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 0fafc178cf2c..6f567ebeb039 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -232,8 +232,20 @@  struct intel_context {
 		/** @parent: pointer to parent if child */
 		struct intel_context *parent;
 
+
+		/** @guc_wqi_head: head pointer in work queue */
+		u16 guc_wqi_head;
+		/** @guc_wqi_tail: tail pointer in work queue */
+		u16 guc_wqi_tail;
+
 		/** @guc_number_children: number of children if parent */
 		u8 guc_number_children;
+
+		/**
+		 * @parent_page: page in context used by parent for work queue,
+		 * work queue descriptor
+		 */
+		u8 parent_page;
 	};
 
 #ifdef CONFIG_DRM_I915_SELFTEST
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index bb4af4977920..0ddbad4e062a 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -861,6 +861,11 @@  __lrc_alloc_state(struct intel_context *ce, struct intel_engine_cs *engine)
 		context_size += PAGE_SIZE;
 	}
 
+	if (intel_context_is_parent(ce)) {
+		ce->parent_page = context_size / PAGE_SIZE;
+		context_size += PAGE_SIZE;
+	}
+
 	obj = i915_gem_object_create_lmem(engine->i915, context_size, 0);
 	if (IS_ERR(obj))
 		obj = i915_gem_object_create_shmem(engine->i915, context_size);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index fa4be13c8854..0e600a3b8f1e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -52,7 +52,7 @@ 
 
 #define GUC_DOORBELL_INVALID		256
 
-#define GUC_WQ_SIZE			(PAGE_SIZE * 2)
+#define GUC_WQ_SIZE			(PAGE_SIZE / 2)
 
 /* Work queue item header definitions */
 #define WQ_STATUS_ACTIVE		1
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 14b24298cdd7..dbcb9ab28a9a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -340,6 +340,39 @@  static struct i915_priolist *to_priolist(struct rb_node *rb)
 	return rb_entry(rb, struct i915_priolist, node);
 }
 
+/*
+ * When using multi-lrc submission an extra page in the context state is
+ * reserved for the process descriptor and work queue.
+ *
+ * The layout of this page is below:
+ * 0						guc_process_desc
+ * ...						unused
+ * PAGE_SIZE / 2				work queue start
+ * ...						work queue
+ * PAGE_SIZE - 1				work queue end
+ */
+#define WQ_OFFSET	(PAGE_SIZE / 2)
+static u32 __get_process_desc_offset(struct intel_context *ce)
+{
+	GEM_BUG_ON(!ce->parent_page);
+
+	return ce->parent_page * PAGE_SIZE;
+}
+
+static u32 __get_wq_offset(struct intel_context *ce)
+{
+	return __get_process_desc_offset(ce) + WQ_OFFSET;
+}
+
+static struct guc_process_desc *
+__get_process_desc(struct intel_context *ce)
+{
+	return (struct guc_process_desc *)
+		(ce->lrc_reg_state +
+		 ((__get_process_desc_offset(ce) -
+		   LRC_STATE_OFFSET) / sizeof(u32)));
+}
+
 static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
 {
 	struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
@@ -1342,6 +1375,30 @@  static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
 	spin_unlock_irqrestore(&guc->submission_state.lock, flags);
 }
 
+static int __guc_action_register_multi_lrc(struct intel_guc *guc,
+					   struct intel_context *ce,
+					   u32 guc_id,
+					   u32 offset,
+					   bool loop)
+{
+	struct intel_context *child;
+	u32 action[4 + MAX_ENGINE_INSTANCE];
+	int len = 0;
+
+	GEM_BUG_ON(ce->guc_number_children > MAX_ENGINE_INSTANCE);
+
+	action[len++] = INTEL_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC;
+	action[len++] = guc_id;
+	action[len++] = ce->guc_number_children + 1;
+	action[len++] = offset;
+	for_each_child(ce, child) {
+		offset += sizeof(struct guc_lrc_desc);
+		action[len++] = offset;
+	}
+
+	return guc_submission_send_busy_loop(guc, action, len, 0, loop);
+}
+
 static int __guc_action_register_context(struct intel_guc *guc,
 					 u32 guc_id,
 					 u32 offset,
@@ -1364,9 +1421,15 @@  static int register_context(struct intel_context *ce, bool loop)
 		ce->guc_id.id * sizeof(struct guc_lrc_desc);
 	int ret;
 
+	GEM_BUG_ON(intel_context_is_child(ce));
 	trace_intel_context_register(ce);
 
-	ret = __guc_action_register_context(guc, ce->guc_id.id, offset, loop);
+	if (intel_context_is_parent(ce))
+		ret = __guc_action_register_multi_lrc(guc, ce, ce->guc_id.id,
+						      offset, loop);
+	else
+		ret = __guc_action_register_context(guc, ce->guc_id.id, offset,
+						    loop);
 	if (likely(!ret)) {
 		unsigned long flags;
 
@@ -1396,6 +1459,7 @@  static int deregister_context(struct intel_context *ce, u32 guc_id, bool loop)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
 
+	GEM_BUG_ON(intel_context_is_child(ce));
 	trace_intel_context_deregister(ce);
 
 	return __guc_action_deregister_context(guc, guc_id, loop);
@@ -1423,6 +1487,7 @@  static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 	struct guc_lrc_desc *desc;
 	bool context_registered;
 	intel_wakeref_t wakeref;
+	struct intel_context *child;
 	int ret = 0;
 
 	GEM_BUG_ON(!engine->mask);
@@ -1448,6 +1513,42 @@  static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
 	guc_context_policy_init(engine, desc);
 
+	/*
+	 * Context is a parent, we need to register a process descriptor
+	 * describing a work queue and register all child contexts.
+	 */
+	if (intel_context_is_parent(ce)) {
+		struct guc_process_desc *pdesc;
+
+		ce->guc_wqi_tail = 0;
+		ce->guc_wqi_head = 0;
+
+		desc->process_desc = i915_ggtt_offset(ce->state) +
+			__get_process_desc_offset(ce);
+		desc->wq_addr = i915_ggtt_offset(ce->state) +
+			__get_wq_offset(ce);
+		desc->wq_size = GUC_WQ_SIZE;
+
+		pdesc = __get_process_desc(ce);
+		memset(pdesc, 0, sizeof(*(pdesc)));
+		pdesc->stage_id = ce->guc_id.id;
+		pdesc->wq_base_addr = desc->wq_addr;
+		pdesc->wq_size_bytes = desc->wq_size;
+		pdesc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
+		pdesc->wq_status = WQ_STATUS_ACTIVE;
+
+		for_each_child(ce, child) {
+			desc = __get_lrc_desc(guc, child->guc_id.id);
+
+			desc->engine_class =
+				engine_class_to_guc_class(engine->class);
+			desc->hw_context_desc = child->lrc.lrca;
+			desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
+			desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
+			guc_context_policy_init(engine, desc);
+		}
+	}
+
 	/*
 	 * The context_lookup xarray is used to determine if the hardware
 	 * context is currently registered. There are two cases in which it
@@ -2858,6 +2959,12 @@  g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
 		return NULL;
 	}
 
+	if (unlikely(intel_context_is_child(ce))) {
+		drm_err(&guc_to_gt(guc)->i915->drm,
+			"Context is child, desc_idx %u", desc_idx);
+		return NULL;
+	}
+
 	return ce;
 }
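The H2G message built by `__guc_action_register_multi_lrc()` is a flat array: opcode, parent guc_id, number of contexts (children plus the parent), then one descriptor-pool offset per context. The sketch below rebuilds that array outside the driver with a placeholder opcode and a hypothetical 64-byte descriptor size. Child offsets are obtained by stepping one descriptor at a time from the parent's, which appears to rely on the children's guc_ids immediately following the parent's (the subject of patch 14/27 in this series).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real values live in the i915 GuC headers. */
#define SKETCH_ACTION_REGISTER_MULTI_LRC	0xABCDu	/* placeholder, not the real opcode */
#define SKETCH_MAX_ENGINE_INSTANCE		8
#define SKETCH_LRC_DESC_SIZE			64u

/*
 * Fill action[] in the same order as __guc_action_register_multi_lrc().
 * pool_base is the GGTT offset of the descriptor pool; the parent's
 * descriptor sits at pool_base + guc_id * descriptor size. Returns the
 * number of dwords written. action[] must hold 4 + MAX_ENGINE_INSTANCE.
 */
static int sketch_build_register_multi_lrc(uint32_t *action, uint32_t guc_id,
					   uint32_t pool_base,
					   unsigned int nchildren)
{
	uint32_t offset = pool_base + guc_id * SKETCH_LRC_DESC_SIZE;
	int len = 0;

	assert(nchildren <= SKETCH_MAX_ENGINE_INSTANCE);

	action[len++] = SKETCH_ACTION_REGISTER_MULTI_LRC;
	action[len++] = guc_id;
	action[len++] = nchildren + 1;		/* parent + children */
	action[len++] = offset;			/* parent descriptor */
	for (unsigned int i = 0; i < nchildren; i++) {
		/* Assumes child guc_ids are consecutive after the parent. */
		offset += SKETCH_LRC_DESC_SIZE;
		action[len++] = offset;
	}

	return len;
}
```

For a parent with guc_id 10 and two children, the message is six dwords long and the last offset points at the descriptor for guc_id 12.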
