[v2,4/5] drm/i915/guc: rework guc_add_workqueue_item()

Message ID	1461780195-17434-4-git-send-email-david.s.gordon@intel.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <intel-gfx-bounces@lists.freedesktop.org> From: Dave Gordon <david.s.gordon@intel.com> To: intel-gfx@lists.freedesktop.org Date: Wed, 27 Apr 2016 19:03:14 +0100 Message-Id: <1461780195-17434-4-git-send-email-david.s.gordon@intel.com> In-Reply-To: <1461780195-17434-1-git-send-email-david.s.gordon@intel.com> References: <1461780195-17434-1-git-send-email-david.s.gordon@intel.com> Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ Subject: [Intel-gfx] [PATCH v2 4/5] drm/i915/guc: rework guc_add_workqueue_item() Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Dave Gordon April 27, 2016, 6:03 p.m. UTC

Mostly little optimisations; for instance, if the driver is correctly
following the submission protocol, the "out of space" condition is
impossible, so the previous runtime WARN_ON() is promoted to a
GEM_BUG_ON() for a more dramatic effect in development and less impact
in end-user systems.

Similarly we can replace other WARN_ON() conditions that don't relate to
the hardware state with either BUILD_BUG_ON() for compile-time-
detectable issues, or GEM_BUG_ON() for logical "can't happen" errors.

With those changes, we can convert it to void, as suggested by Chris
Wilson, and update the calling code appropriately.

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>

---
 drivers/gpu/drm/i915/i915_guc_submission.c | 69 +++++++++++++++---------------
 drivers/gpu/drm/i915/intel_guc.h           |  2 +-
 drivers/gpu/drm/i915/intel_guc_fwif.h      |  3 +-
 3 files changed, 37 insertions(+), 37 deletions(-)

Tvrtko Ursulin April 29, 2016, 3:44 p.m. UTC | #1

On 27/04/16 19:03, Dave Gordon wrote:
> Mostly little optimisations; for instance, if the driver is correctly
> following the submission protocol, the "out of space" condition is
> impossible, so the previous runtime WARN_ON() is promoted to a
> GEM_BUG_ON() for a more dramatic effect in development and less impact
> in end-user systems.
>
> Similarly we can replace other WARN_ON() conditions that don't relate to
> the hardware state with either BUILD_BUG_ON() for compile-time-
> detectable issues, or GEM_BUG_ON() for logical "can't happen" errors.
>
> With those changes, we can convert it to void, as suggested by Chris
> Wilson, and update the calling code appropriately.
>
> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>
> ---
>   drivers/gpu/drm/i915/i915_guc_submission.c | 69 +++++++++++++++---------------
>   drivers/gpu/drm/i915/intel_guc.h           |  2 +-
>   drivers/gpu/drm/i915/intel_guc_fwif.h      |  3 +-
>   3 files changed, 37 insertions(+), 37 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
> index 6626eff..4d2ea84 100644
> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> @@ -470,23 +470,28 @@ int i915_guc_wq_check_space(struct drm_i915_gem_request *request)
>   	return -EAGAIN;
>   }
>
> -static int guc_add_workqueue_item(struct i915_guc_client *gc,
> -				  struct drm_i915_gem_request *rq)
> +static void guc_add_workqueue_item(struct i915_guc_client *gc,
> +				   struct drm_i915_gem_request *rq)
>   {
> +	/* wqi_len is in DWords, and does not include the one-word header */
> +	const size_t wqi_size = sizeof(struct guc_wq_item);

Again, u32 is correct I think.

> +	const u32 wqi_len = wqi_size/sizeof(u32) - 1;
>   	struct guc_process_desc *desc;
>   	struct guc_wq_item *wqi;
>   	void *base;
> -	u32 tail, wq_len, wq_off, space;
> +	u32 space, tail, wq_off, wq_page;
>
>   	desc = gc->client_base + gc->proc_desc_offset;
> +
> +	/* Space was checked earlier, in i915_guc_wq_check_space() above */

It may be above in the file, but the two do not call one another so I 
recommend saying exactly who called it.

>   	space = CIRC_SPACE(gc->wq_tail, desc->head, gc->wq_size);
> -	if (WARN_ON(space < sizeof(struct guc_wq_item)))
> -		return -ENOSPC; /* shouldn't happen */
> +	GEM_BUG_ON(space < wqi_size);

It is impossible to hit this only because of the struct_mutex guarding 
the whole time window from request creation to submission. If in the 
future, near or far, that gets fixed, then this will need reworking.

I don't have any better ideas though.

But a WARN_ON and return would be almost as good. Everything is better 
than a dead machine one can't ssh into...

So I appeal to make this a WARN_ON and return. Nothing bad would happen 
apart from software thinking GPU has hung.

>
> -	/* postincrement WQ tail for next time */
> -	wq_off = gc->wq_tail;
> -	gc->wq_tail += sizeof(struct guc_wq_item);
> -	gc->wq_tail &= gc->wq_size - 1;
> +	/* The GuC firmware wants the tail index in QWords, not bytes */
> +	tail = rq->tail;

Used to be sampled from rq->ringbuf->tail - are those the same?

> +	GEM_BUG_ON(tail & 7);
> +	tail >>= 3;
> +	GEM_BUG_ON(tail > WQ_RING_TAIL_MAX);
>
>   	/* For now workqueue item is 4 DWs; workqueue buffer is 2 pages. So we
>   	 * should not have the case where structure wqi is across page, neither
> @@ -495,19 +500,23 @@ static int guc_add_workqueue_item(struct i915_guc_client *gc,
>   	 * XXX: if not the case, we need save data to a temp wqi and copy it to
>   	 * workqueue buffer dw by dw.
>   	 */
> -	WARN_ON(sizeof(struct guc_wq_item) != 16);
> -	WARN_ON(wq_off & 3);
> +	BUILD_BUG_ON(wqi_size != 16);
>
> -	/* wq starts from the page after doorbell / process_desc */
> -	base = kmap_atomic(i915_gem_object_get_page(gc->client_obj,
> -			(wq_off + GUC_DB_SIZE) >> PAGE_SHIFT));
> +	/* postincrement WQ tail for next time */
> +	wq_off = gc->wq_tail;
> +	gc->wq_tail += wqi_size;
> +	gc->wq_tail &= gc->wq_size - 1;
> +	GEM_BUG_ON(wq_off & (wqi_size - 1));

Use to be wq_off & 3, now is wq_off & 15, which one is correct?

> +
> +	/* WQ starts from the page after doorbell / process_desc */
> +	wq_page = (wq_off + GUC_DB_SIZE) >> PAGE_SHIFT;
>   	wq_off &= PAGE_SIZE - 1;
> +	base = kmap_atomic(i915_gem_object_get_page(gc->client_obj, wq_page));
>   	wqi = (struct guc_wq_item *)((char *)base + wq_off);
>
> -	/* len does not include the header */
> -	wq_len = sizeof(struct guc_wq_item) / sizeof(u32) - 1;
> +	/* Now fill in the 4-word work queue item */
>   	wqi->header = WQ_TYPE_INORDER |
> -			(wq_len << WQ_LEN_SHIFT) |
> +			(wqi_len << WQ_LEN_SHIFT) |
>   			(rq->engine->guc_id << WQ_TARGET_SHIFT) |
>   			WQ_NO_WCFLUSH_WAIT;
>
> @@ -515,14 +524,10 @@ static int guc_add_workqueue_item(struct i915_guc_client *gc,
>   	wqi->context_desc = (u32)intel_lr_context_descriptor(rq->ctx,
>   							     rq->engine);
>
> -	/* The GuC firmware wants the tail index in QWords, not bytes */
> -	tail = rq->ringbuf->tail >> 3;
>   	wqi->ring_tail = tail << WQ_RING_TAIL_SHIFT;
> -	wqi->fence_id = 0; /*XXX: what fence to be here */
> +	wqi->fence_id = rq->seqno;

Not mentioned in the commit?

>
>   	kunmap_atomic(base);
> -
> -	return 0;
>   }
>
>   /**
> @@ -537,26 +542,20 @@ int i915_guc_submit(struct drm_i915_gem_request *rq)
>   	unsigned int engine_id = rq->engine->guc_id;
>   	struct intel_guc *guc = &rq->i915->guc;
>   	struct i915_guc_client *client = guc->execbuf_client;
> -	int q_ret, b_ret;
> +	int b_ret;

What is b_ret out of interest, door*B*ell return code ?

>
> -	q_ret = guc_add_workqueue_item(client, rq);
> -	if (q_ret == 0)
> -		b_ret = guc_ring_doorbell(client);
> +	guc_add_workqueue_item(client, rq);
> +	b_ret = guc_ring_doorbell(client);
>
>   	client->submissions[engine_id] += 1;
> -	if (q_ret) {
> -		client->q_fail += 1;
> -		client->retcode = q_ret;
> -	} else if (b_ret) {
> +	client->retcode = b_ret;

I wanted to ask for what is this for but then found it is for debugfs.

> +	if (b_ret)
>   		client->b_fail += 1;
> -		client->retcode = q_ret = b_ret;
> -	} else {
> -		client->retcode = 0;
> -	}
> +
>   	guc->submissions[engine_id] += 1;
>   	guc->last_seqno[engine_id] = rq->seqno;
>
> -	return q_ret;
> +	return b_ret;

Could also kill the return value from this one, the only caller, 
intel_logical_ring_advance_and_submit does not use it. That function 
itself does not look like it needs a return value at the moment. :)

>   }
>
>   /*
> diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
> index 436f2d6..10e1d5e 100644
> --- a/drivers/gpu/drm/i915/intel_guc.h
> +++ b/drivers/gpu/drm/i915/intel_guc.h
> @@ -74,7 +74,7 @@ struct i915_guc_client {
>   	/* GuC submission statistics & status */
>   	uint64_t submissions[GUC_MAX_ENGINES_NUM];
>   	uint32_t no_wq_space;		/* Space pre-check failed	*/
> -	uint32_t q_fail;		/* Failed to queue (MBZ)	*/
> +	uint32_t q_fail;		/* No longer used		*/

Get rid of it then?

>   	uint32_t b_fail;		/* Doorbell failure (MBZ)	*/
>   	int retcode;			/* Result of last guc_submit()	*/
>   };
> diff --git a/drivers/gpu/drm/i915/intel_guc_fwif.h b/drivers/gpu/drm/i915/intel_guc_fwif.h
> index 2de57ff..944786d 100644
> --- a/drivers/gpu/drm/i915/intel_guc_fwif.h
> +++ b/drivers/gpu/drm/i915/intel_guc_fwif.h
> @@ -71,7 +71,8 @@
>   #define   WQ_WORKLOAD_TOUCH		(2 << WQ_WORKLOAD_SHIFT)
>
>   #define WQ_RING_TAIL_SHIFT		20
> -#define WQ_RING_TAIL_MASK		(0x7FF << WQ_RING_TAIL_SHIFT)
> +#define WQ_RING_TAIL_MAX		0x7FF	/* 2^11 QWords */
> +#define WQ_RING_TAIL_MASK		(WQ_RING_TAIL_MAX << WQ_RING_TAIL_SHIFT)
>
>   #define GUC_DOORBELL_ENABLED		1
>   #define GUC_DOORBELL_DISABLED		0
>

Regards,

Tvrtko

Chris Wilson April 29, 2016, 4:10 p.m. UTC | #2

On Fri, Apr 29, 2016 at 04:44:24PM +0100, Tvrtko Ursulin wrote:
> 
> On 27/04/16 19:03, Dave Gordon wrote:
> >Mostly little optimisations; for instance, if the driver is correctly
> >following the submission protocol, the "out of space" condition is
> >impossible, so the previous runtime WARN_ON() is promoted to a
> >GEM_BUG_ON() for a more dramatic effect in development and less impact
> >in end-user systems.
> >
> >Similarly we can replace other WARN_ON() conditions that don't relate to
> >the hardware state with either BUILD_BUG_ON() for compile-time-
> >detectable issues, or GEM_BUG_ON() for logical "can't happen" errors.
> >
> >With those changes, we can convert it to void, as suggested by Chris
> >Wilson, and update the calling code appropriately.
> >
> >Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> >Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >
> >---
> >  drivers/gpu/drm/i915/i915_guc_submission.c | 69 +++++++++++++++---------------
> >  drivers/gpu/drm/i915/intel_guc.h           |  2 +-
> >  drivers/gpu/drm/i915/intel_guc_fwif.h      |  3 +-
> >  3 files changed, 37 insertions(+), 37 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
> >index 6626eff..4d2ea84 100644
> >--- a/drivers/gpu/drm/i915/i915_guc_submission.c
> >+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> >@@ -470,23 +470,28 @@ int i915_guc_wq_check_space(struct drm_i915_gem_request *request)
> >  	return -EAGAIN;
> >  }
> >
> >-static int guc_add_workqueue_item(struct i915_guc_client *gc,
> >-				  struct drm_i915_gem_request *rq)
> >+static void guc_add_workqueue_item(struct i915_guc_client *gc,
> >+				   struct drm_i915_gem_request *rq)
> >  {
> >+	/* wqi_len is in DWords, and does not include the one-word header */
> >+	const size_t wqi_size = sizeof(struct guc_wq_item);
> 
> Again, u32 is correct I think.
> 
> >+	const u32 wqi_len = wqi_size/sizeof(u32) - 1;
> >  	struct guc_process_desc *desc;
> >  	struct guc_wq_item *wqi;
> >  	void *base;
> >-	u32 tail, wq_len, wq_off, space;
> >+	u32 space, tail, wq_off, wq_page;
> >
> >  	desc = gc->client_base + gc->proc_desc_offset;
> >+
> >+	/* Space was checked earlier, in i915_guc_wq_check_space() above */
> 
> It may be above in the file, but the two do not call one another so
> I recommend saying exactly who called it.
> 
> >  	space = CIRC_SPACE(gc->wq_tail, desc->head, gc->wq_size);
> >-	if (WARN_ON(space < sizeof(struct guc_wq_item)))
> >-		return -ENOSPC; /* shouldn't happen */
> >+	GEM_BUG_ON(space < wqi_size);
> 
> It is impossible to hit this only because of the struct_mutex
> guarding the whole time window from request creation to submission.
> If in the future, near or far, that gets fixed, then this will need
> reworking.

Request submission will still have to serialised by a "ring" mutex, 
from the time we allocate the request to the time we add it to whatever
submission queue. It should still hold that we can pin all the required
resources (ringbuffer, context state, vm page tables, workqueues) up
front and take any errors early and then rely on our preallocation when
submitting the request.

> I don't have any better ideas though.
> 
> But a WARN_ON and return would be almost as good. Everything is
> better than a dead machine one can't ssh into...
> 
> So I appeal to make this a WARN_ON and return. Nothing bad would
> happen apart from software thinking GPU has hung.

Hence why not make it a bug? If you can't ssh in because the driver died
inside GEM, something is very wrong.
-Chris

Tvrtko Ursulin May 3, 2016, 10:26 a.m. UTC | #3

On 29/04/16 17:10, Chris Wilson wrote:
> On Fri, Apr 29, 2016 at 04:44:24PM +0100, Tvrtko Ursulin wrote:
>>
>> On 27/04/16 19:03, Dave Gordon wrote:
>>> Mostly little optimisations; for instance, if the driver is correctly
>>> following the submission protocol, the "out of space" condition is
>>> impossible, so the previous runtime WARN_ON() is promoted to a
>>> GEM_BUG_ON() for a more dramatic effect in development and less impact
>>> in end-user systems.
>>>
>>> Similarly we can replace other WARN_ON() conditions that don't relate to
>>> the hardware state with either BUILD_BUG_ON() for compile-time-
>>> detectable issues, or GEM_BUG_ON() for logical "can't happen" errors.
>>>
>>> With those changes, we can convert it to void, as suggested by Chris
>>> Wilson, and update the calling code appropriately.
>>>
>>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>>
>>> ---
>>>   drivers/gpu/drm/i915/i915_guc_submission.c | 69 +++++++++++++++---------------
>>>   drivers/gpu/drm/i915/intel_guc.h           |  2 +-
>>>   drivers/gpu/drm/i915/intel_guc_fwif.h      |  3 +-
>>>   3 files changed, 37 insertions(+), 37 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
>>> index 6626eff..4d2ea84 100644
>>> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
>>> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
>>> @@ -470,23 +470,28 @@ int i915_guc_wq_check_space(struct drm_i915_gem_request *request)
>>>   	return -EAGAIN;
>>>   }
>>>
>>> -static int guc_add_workqueue_item(struct i915_guc_client *gc,
>>> -				  struct drm_i915_gem_request *rq)
>>> +static void guc_add_workqueue_item(struct i915_guc_client *gc,
>>> +				   struct drm_i915_gem_request *rq)
>>>   {
>>> +	/* wqi_len is in DWords, and does not include the one-word header */
>>> +	const size_t wqi_size = sizeof(struct guc_wq_item);
>>
>> Again, u32 is correct I think.
>>
>>> +	const u32 wqi_len = wqi_size/sizeof(u32) - 1;
>>>   	struct guc_process_desc *desc;
>>>   	struct guc_wq_item *wqi;
>>>   	void *base;
>>> -	u32 tail, wq_len, wq_off, space;
>>> +	u32 space, tail, wq_off, wq_page;
>>>
>>>   	desc = gc->client_base + gc->proc_desc_offset;
>>> +
>>> +	/* Space was checked earlier, in i915_guc_wq_check_space() above */
>>
>> It may be above in the file, but the two do not call one another so
>> I recommend saying exactly who called it.
>>
>>>   	space = CIRC_SPACE(gc->wq_tail, desc->head, gc->wq_size);
>>> -	if (WARN_ON(space < sizeof(struct guc_wq_item)))
>>> -		return -ENOSPC; /* shouldn't happen */
>>> +	GEM_BUG_ON(space < wqi_size);
>>
>> It is impossible to hit this only because of the struct_mutex
>> guarding the whole time window from request creation to submission.
>> If in the future, near or far, that gets fixed, then this will need
>> reworking.
>
> Request submission will still have to serialised by a "ring" mutex,
> from the time we allocate the request to the time we add it to whatever
> submission queue. It should still hold that we can pin all the required
> resources (ringbuffer, context state, vm page tables, workqueues) up
> front and take any errors early and then rely on our preallocation when
> submitting the request.
>
>> I don't have any better ideas though.
>>
>> But a WARN_ON and return would be almost as good. Everything is
>> better than a dead machine one can't ssh into...
>>
>> So I appeal to make this a WARN_ON and return. Nothing bad would
>> happen apart from software thinking GPU has hung.
>
> Hence why not make it a bug? If you can't ssh in because the driver died
> inside GEM, something is very wrong.

I was sure bugs are kernel panics, looks like I've been running with 
panic on oops for too long. :(

GEM_BUG_ON is ok then.

Regards,

Tvrtko

Dave Gordon May 5, 2016, 6:38 p.m. UTC | #4

On 29/04/2016 16:44, Tvrtko Ursulin wrote:
>
> On 27/04/16 19:03, Dave Gordon wrote:
>> Mostly little optimisations; for instance, if the driver is correctly
>> following the submission protocol, the "out of space" condition is
>> impossible, so the previous runtime WARN_ON() is promoted to a
>> GEM_BUG_ON() for a more dramatic effect in development and less impact
>> in end-user systems.
>>
>> Similarly we can replace other WARN_ON() conditions that don't relate to
>> the hardware state with either BUILD_BUG_ON() for compile-time-
>> detectable issues, or GEM_BUG_ON() for logical "can't happen" errors.
>>
>> With those changes, we can convert it to void, as suggested by Chris
>> Wilson, and update the calling code appropriately.
>>
>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>
>> ---
>>   drivers/gpu/drm/i915/i915_guc_submission.c | 69 
>> +++++++++++++++---------------
>>   drivers/gpu/drm/i915/intel_guc.h           |  2 +-
>>   drivers/gpu/drm/i915/intel_guc_fwif.h      |  3 +-
>>   3 files changed, 37 insertions(+), 37 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c 
>> b/drivers/gpu/drm/i915/i915_guc_submission.c
>> index 6626eff..4d2ea84 100644
>> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
>> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
>> @@ -470,23 +470,28 @@ int i915_guc_wq_check_space(struct 
>> drm_i915_gem_request *request)
>>       return -EAGAIN;
>>   }
>>
>> -static int guc_add_workqueue_item(struct i915_guc_client *gc,
>> -                  struct drm_i915_gem_request *rq)
>> +static void guc_add_workqueue_item(struct i915_guc_client *gc,
>> +                   struct drm_i915_gem_request *rq)
>>   {
>> +    /* wqi_len is in DWords, and does not include the one-word 
>> header */
>> +    const size_t wqi_size = sizeof(struct guc_wq_item);
>
> Again, u32 is correct I think.
Nope, it's a sizeof(), but the compiler will check that it fits in u32 
when we convert it to DWords below.
>
>> +    const u32 wqi_len = wqi_size/sizeof(u32) - 1;
>>       struct guc_process_desc *desc;
>>       struct guc_wq_item *wqi;
>>       void *base;
>> -    u32 tail, wq_len, wq_off, space;
>> +    u32 space, tail, wq_off, wq_page;
>>
>>       desc = gc->client_base + gc->proc_desc_offset;
>> +
>> +    /* Space was checked earlier, in i915_guc_wq_check_space() above */
>
> It may be above in the file, but the two do not call one another so I 
> recommend saying exactly who called it.
I don't really mind who called it, as long as it was called sometime 
earlier in the request submission protocol -- the callsite may get moved 
around a bit. Use cscope(1) or your favourite IDE to find it.
>
>>       space = CIRC_SPACE(gc->wq_tail, desc->head, gc->wq_size);
>> -    if (WARN_ON(space < sizeof(struct guc_wq_item)))
>> -        return -ENOSPC; /* shouldn't happen */
>> +    GEM_BUG_ON(space < wqi_size);
>
> It is impossible to hit this only because of the struct_mutex guarding 
> the whole time window from request creation to submission. If in the 
> future, near or far, that gets fixed, then this will need reworking.
>
> I don't have any better ideas though.
>
> But a WARN_ON and return would be almost as good. Everything is better 
> than a dead machine one can't ssh into...
>
> So I appeal to make this a WARN_ON and return. Nothing bad would 
> happen apart from software thinking GPU has hung.
If the driver violates the sequencing of check+submit then it does 
indeed require reworking -- that would be a bug. Hence (GEM_)BUG_ON().
>
>>
>> -    /* postincrement WQ tail for next time */
>> -    wq_off = gc->wq_tail;
>> -    gc->wq_tail += sizeof(struct guc_wq_item);
>> -    gc->wq_tail &= gc->wq_size - 1;
>> +    /* The GuC firmware wants the tail index in QWords, not bytes */
>> +    tail = rq->tail;
>
> Used to be sampled from rq->ringbuf->tail - are those the same?
It should always have been rq->tail; they're the same at present because 
it was copied from ringbuffer as part of the submission process, but it 
might not be the same with the scheduler & preemption. So let's get it 
right before it becomes a mysterious bug.
>
>> +    GEM_BUG_ON(tail & 7);
>> +    tail >>= 3;
>> +    GEM_BUG_ON(tail > WQ_RING_TAIL_MAX);
>>
>>       /* For now workqueue item is 4 DWs; workqueue buffer is 2 
>> pages. So we
>>        * should not have the case where structure wqi is across page, 
>> neither
>> @@ -495,19 +500,23 @@ static int guc_add_workqueue_item(struct 
>> i915_guc_client *gc,
>>        * XXX: if not the case, we need save data to a temp wqi and 
>> copy it to
>>        * workqueue buffer dw by dw.
>>        */
>> -    WARN_ON(sizeof(struct guc_wq_item) != 16);
>> -    WARN_ON(wq_off & 3);
>> +    BUILD_BUG_ON(wqi_size != 16);
>>
>> -    /* wq starts from the page after doorbell / process_desc */
>> -    base = kmap_atomic(i915_gem_object_get_page(gc->client_obj,
>> -            (wq_off + GUC_DB_SIZE) >> PAGE_SHIFT));
>> +    /* postincrement WQ tail for next time */
>> +    wq_off = gc->wq_tail;
>> +    gc->wq_tail += wqi_size;
>> +    gc->wq_tail &= gc->wq_size - 1;
>> +    GEM_BUG_ON(wq_off & (wqi_size - 1));
>
> Use to be wq_off & 3, now is wq_off & 15, which one is correct?
The new one :) I made it more stringent, so we check for correct 
alignment of wq_tail to sizeof(queue item), not just DWord.
>
>> +
>> +    /* WQ starts from the page after doorbell / process_desc */
>> +    wq_page = (wq_off + GUC_DB_SIZE) >> PAGE_SHIFT;
>>       wq_off &= PAGE_SIZE - 1;
>> +    base = kmap_atomic(i915_gem_object_get_page(gc->client_obj, 
>> wq_page));
>>       wqi = (struct guc_wq_item *)((char *)base + wq_off);
>>
>> -    /* len does not include the header */
>> -    wq_len = sizeof(struct guc_wq_item) / sizeof(u32) - 1;
>> +    /* Now fill in the 4-word work queue item */
>>       wqi->header = WQ_TYPE_INORDER |
>> -            (wq_len << WQ_LEN_SHIFT) |
>> +            (wqi_len << WQ_LEN_SHIFT) |
>>               (rq->engine->guc_id << WQ_TARGET_SHIFT) |
>>               WQ_NO_WCFLUSH_WAIT;
>>
>> @@ -515,14 +524,10 @@ static int guc_add_workqueue_item(struct 
>> i915_guc_client *gc,
>>       wqi->context_desc = (u32)intel_lr_context_descriptor(rq->ctx,
>>                                    rq->engine);
>>
>> -    /* The GuC firmware wants the tail index in QWords, not bytes */
>> -    tail = rq->ringbuf->tail >> 3;
>>       wqi->ring_tail = tail << WQ_RING_TAIL_SHIFT;
>> -    wqi->fence_id = 0; /*XXX: what fence to be here */
>> +    wqi->fence_id = rq->seqno;
>
> Not mentioned in the commit?
Not actually used either, but I expect it was provided for a Windows 
fence ID, so our equivalent is a seqno.
It may show up in a GuC log, in which case seqno provides more helpful 
information.
>
>>
>>       kunmap_atomic(base);
>> -
>> -    return 0;
>>   }
>>
>>   /**
>> @@ -537,26 +542,20 @@ int i915_guc_submit(struct drm_i915_gem_request 
>> *rq)
>>       unsigned int engine_id = rq->engine->guc_id;
>>       struct intel_guc *guc = &rq->i915->guc;
>>       struct i915_guc_client *client = guc->execbuf_client;
>> -    int q_ret, b_ret;
>> +    int b_ret;
>
> What is b_ret out of interest, door*B*ell return code ?
>
Yes, originally q_ret & b_ret, for return from Queuing and Bellringing, 
respectively. Queuing can no longer return an error, but ringing the 
doorbell can, even if only in the event of hardware not behaving as 
specified.
>>
>> -    q_ret = guc_add_workqueue_item(client, rq);
>> -    if (q_ret == 0)
>> -        b_ret = guc_ring_doorbell(client);
>> +    guc_add_workqueue_item(client, rq);
>> +    b_ret = guc_ring_doorbell(client);
>>
>>       client->submissions[engine_id] += 1;
>> -    if (q_ret) {
>> -        client->q_fail += 1;
>> -        client->retcode = q_ret;
>> -    } else if (b_ret) {
>> +    client->retcode = b_ret;
>
> I wanted to ask for what is this for but then found it is for debugfs.
>
>> +    if (b_ret)
>>           client->b_fail += 1;
>> -        client->retcode = q_ret = b_ret;
>> -    } else {
>> -        client->retcode = 0;
>> -    }
>> +
>>       guc->submissions[engine_id] += 1;
>>       guc->last_seqno[engine_id] = rq->seqno;
>>
>> -    return q_ret;
>> +    return b_ret;
>
> Could also kill the return value from this one, the only caller, 
> intel_logical_ring_advance_and_submit does not use it. That function 
> itself does not look like it needs a return value at the moment. :)
That's really only because we don't have an exit strategy for submission 
failure. I would rather see all parts of the submission process able to 
return errors (e.g. hardware not responding) and let the top-level code 
implement some sort of recovery strategy. Anyway, I'm leaving this here 
until someone changes the call signature to void.
>
>>   }
>>
>>   /*
>> diff --git a/drivers/gpu/drm/i915/intel_guc.h 
>> b/drivers/gpu/drm/i915/intel_guc.h
>> index 436f2d6..10e1d5e 100644
>> --- a/drivers/gpu/drm/i915/intel_guc.h
>> +++ b/drivers/gpu/drm/i915/intel_guc.h
>> @@ -74,7 +74,7 @@ struct i915_guc_client {
>>       /* GuC submission statistics & status */
>>       uint64_t submissions[GUC_MAX_ENGINES_NUM];
>>       uint32_t no_wq_space;        /* Space pre-check failed */
>> -    uint32_t q_fail;        /* Failed to queue (MBZ)    */
>> +    uint32_t q_fail;        /* No longer used        */
>
> Get rid of it then?
No, it keeps things aligned. It can be reused for the next thing we need 
to add.

.Dave.

>
>>       uint32_t b_fail;        /* Doorbell failure (MBZ)    */
>>       int retcode;            /* Result of last guc_submit() */
>>   };
>> diff --git a/drivers/gpu/drm/i915/intel_guc_fwif.h 
>> b/drivers/gpu/drm/i915/intel_guc_fwif.h
>> index 2de57ff..944786d 100644
>> --- a/drivers/gpu/drm/i915/intel_guc_fwif.h
>> +++ b/drivers/gpu/drm/i915/intel_guc_fwif.h
>> @@ -71,7 +71,8 @@
>>   #define   WQ_WORKLOAD_TOUCH        (2 << WQ_WORKLOAD_SHIFT)
>>
>>   #define WQ_RING_TAIL_SHIFT        20
>> -#define WQ_RING_TAIL_MASK        (0x7FF << WQ_RING_TAIL_SHIFT)
>> +#define WQ_RING_TAIL_MAX        0x7FF    /* 2^11 QWords */
>> +#define WQ_RING_TAIL_MASK        (WQ_RING_TAIL_MAX << 
>> WQ_RING_TAIL_SHIFT)
>>
>>   #define GUC_DOORBELL_ENABLED        1
>>   #define GUC_DOORBELL_DISABLED        0
>>
>
> Regards,
>
> Tvrtko
>

Tvrtko Ursulin May 6, 2016, 8:55 a.m. UTC | #5

On 05/05/16 19:38, Dave Gordon wrote:
> On 29/04/2016 16:44, Tvrtko Ursulin wrote:
>>
>> On 27/04/16 19:03, Dave Gordon wrote:
>>> Mostly little optimisations; for instance, if the driver is correctly
>>> following the submission protocol, the "out of space" condition is
>>> impossible, so the previous runtime WARN_ON() is promoted to a
>>> GEM_BUG_ON() for a more dramatic effect in development and less impact
>>> in end-user systems.
>>>
>>> Similarly we can replace other WARN_ON() conditions that don't relate to
>>> the hardware state with either BUILD_BUG_ON() for compile-time-
>>> detectable issues, or GEM_BUG_ON() for logical "can't happen" errors.
>>>
>>> With those changes, we can convert it to void, as suggested by Chris
>>> Wilson, and update the calling code appropriately.
>>>
>>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>>
>>> ---
>>>   drivers/gpu/drm/i915/i915_guc_submission.c | 69
>>> +++++++++++++++---------------
>>>   drivers/gpu/drm/i915/intel_guc.h           |  2 +-
>>>   drivers/gpu/drm/i915/intel_guc_fwif.h      |  3 +-
>>>   3 files changed, 37 insertions(+), 37 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c
>>> b/drivers/gpu/drm/i915/i915_guc_submission.c
>>> index 6626eff..4d2ea84 100644
>>> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
>>> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
>>> @@ -470,23 +470,28 @@ int i915_guc_wq_check_space(struct
>>> drm_i915_gem_request *request)
>>>       return -EAGAIN;
>>>   }
>>>
>>> -static int guc_add_workqueue_item(struct i915_guc_client *gc,
>>> -                  struct drm_i915_gem_request *rq)
>>> +static void guc_add_workqueue_item(struct i915_guc_client *gc,
>>> +                   struct drm_i915_gem_request *rq)
>>>   {
>>> +    /* wqi_len is in DWords, and does not include the one-word
>>> header */
>>> +    const size_t wqi_size = sizeof(struct guc_wq_item);
>>
>> Again, u32 is correct I think.
> Nope, it's a sizeof(), but the compiler will check that it fits in u32
> when we convert it to DWords below.

I already wrote in this same thread few days ago this was my oversight.

>>
>>> +    const u32 wqi_len = wqi_size/sizeof(u32) - 1;
>>>       struct guc_process_desc *desc;
>>>       struct guc_wq_item *wqi;
>>>       void *base;
>>> -    u32 tail, wq_len, wq_off, space;
>>> +    u32 space, tail, wq_off, wq_page;
>>>
>>>       desc = gc->client_base + gc->proc_desc_offset;
>>> +
>>> +    /* Space was checked earlier, in i915_guc_wq_check_space() above */
>>
>> It may be above in the file, but the two do not call one another so I
>> recommend saying exactly who called it.
> I don't really mind who called it, as long as it was called sometime
> earlier in the request submission protocol -- the callsite may get moved
> around a bit. Use cscope(1) or your favourite IDE to find it.

So what does this comment supposed to tell the reader? 
guc_add_workqueue_item does not call i915_guc_wq_check_space so "above" 
what? Above in the file? Or maybe *earlier* in the call sequence of 
what? Might as well put an useful comment in when you are adding one.

>>
>>>       space = CIRC_SPACE(gc->wq_tail, desc->head, gc->wq_size);
>>> -    if (WARN_ON(space < sizeof(struct guc_wq_item)))
>>> -        return -ENOSPC; /* shouldn't happen */
>>> +    GEM_BUG_ON(space < wqi_size);
>>
>> It is impossible to hit this only because of the struct_mutex guarding
>> the whole time window from request creation to submission. If in the
>> future, near or far, that gets fixed, then this will need reworking.
>>
>> I don't have any better ideas though.
>>
>> But a WARN_ON and return would be almost as good. Everything is better
>> than a dead machine one can't ssh into...
>>
>> So I appeal to make this a WARN_ON and return. Nothing bad would
>> happen apart from software thinking GPU has hung.
> If the driver violates the sequencing of check+submit then it does
> indeed require reworking -- that would be a bug. Hence (GEM_)BUG_ON().

This also was clarified in this same thread few days ago and I conceded 
the point.

>>
>>>
>>> -    /* postincrement WQ tail for next time */
>>> -    wq_off = gc->wq_tail;
>>> -    gc->wq_tail += sizeof(struct guc_wq_item);
>>> -    gc->wq_tail &= gc->wq_size - 1;
>>> +    /* The GuC firmware wants the tail index in QWords, not bytes */
>>> +    tail = rq->tail;
>>
>> Used to be sampled from rq->ringbuf->tail - are those the same?
> It should always have been rq->tail; they're the same at present because
> it was copied from ringbuffer as part of the submission process, but it
> might not be the same with the scheduler & preemption. So let's get it
> right before it becomes a mysterious bug.
>>
>>> +    GEM_BUG_ON(tail & 7);
>>> +    tail >>= 3;
>>> +    GEM_BUG_ON(tail > WQ_RING_TAIL_MAX);
>>>
>>>       /* For now workqueue item is 4 DWs; workqueue buffer is 2
>>> pages. So we
>>>        * should not have the case where structure wqi is across page,
>>> neither
>>> @@ -495,19 +500,23 @@ static int guc_add_workqueue_item(struct
>>> i915_guc_client *gc,
>>>        * XXX: if not the case, we need save data to a temp wqi and
>>> copy it to
>>>        * workqueue buffer dw by dw.
>>>        */
>>> -    WARN_ON(sizeof(struct guc_wq_item) != 16);
>>> -    WARN_ON(wq_off & 3);
>>> +    BUILD_BUG_ON(wqi_size != 16);
>>>
>>> -    /* wq starts from the page after doorbell / process_desc */
>>> -    base = kmap_atomic(i915_gem_object_get_page(gc->client_obj,
>>> -            (wq_off + GUC_DB_SIZE) >> PAGE_SHIFT));
>>> +    /* postincrement WQ tail for next time */
>>> +    wq_off = gc->wq_tail;
>>> +    gc->wq_tail += wqi_size;
>>> +    gc->wq_tail &= gc->wq_size - 1;
>>> +    GEM_BUG_ON(wq_off & (wqi_size - 1));
>>
>> Use to be wq_off & 3, now is wq_off & 15, which one is correct?
> The new one :) I made it more stringent, so we check for correct
> alignment of wq_tail to sizeof(queue item), not just DWord.

Worth mentioning in the commit message? Since it begins with "Mostly 
little optimisations;" readers/reviewers expect to see no functional 
changes then.

>>
>>> +
>>> +    /* WQ starts from the page after doorbell / process_desc */
>>> +    wq_page = (wq_off + GUC_DB_SIZE) >> PAGE_SHIFT;
>>>       wq_off &= PAGE_SIZE - 1;
>>> +    base = kmap_atomic(i915_gem_object_get_page(gc->client_obj,
>>> wq_page));
>>>       wqi = (struct guc_wq_item *)((char *)base + wq_off);
>>>
>>> -    /* len does not include the header */
>>> -    wq_len = sizeof(struct guc_wq_item) / sizeof(u32) - 1;
>>> +    /* Now fill in the 4-word work queue item */
>>>       wqi->header = WQ_TYPE_INORDER |
>>> -            (wq_len << WQ_LEN_SHIFT) |
>>> +            (wqi_len << WQ_LEN_SHIFT) |
>>>               (rq->engine->guc_id << WQ_TARGET_SHIFT) |
>>>               WQ_NO_WCFLUSH_WAIT;
>>>
>>> @@ -515,14 +524,10 @@ static int guc_add_workqueue_item(struct
>>> i915_guc_client *gc,
>>>       wqi->context_desc = (u32)intel_lr_context_descriptor(rq->ctx,
>>>                                    rq->engine);
>>>
>>> -    /* The GuC firmware wants the tail index in QWords, not bytes */
>>> -    tail = rq->ringbuf->tail >> 3;
>>>       wqi->ring_tail = tail << WQ_RING_TAIL_SHIFT;
>>> -    wqi->fence_id = 0; /*XXX: what fence to be here */
>>> +    wqi->fence_id = rq->seqno;
>>
>> Not mentioned in the commit?
> Not actually used either, but I expect it was provided for a Windows
> fence ID, so our equivalent is a seqno.
> It may show up in a GuC log, in which case seqno provides more helpful
> information.

Best to mention it in the commit imho, again, not to confuse 
readers/reviewers.

>>
>>>
>>>       kunmap_atomic(base);
>>> -
>>> -    return 0;
>>>   }
>>>
>>>   /**
>>> @@ -537,26 +542,20 @@ int i915_guc_submit(struct drm_i915_gem_request
>>> *rq)
>>>       unsigned int engine_id = rq->engine->guc_id;
>>>       struct intel_guc *guc = &rq->i915->guc;
>>>       struct i915_guc_client *client = guc->execbuf_client;
>>> -    int q_ret, b_ret;
>>> +    int b_ret;
>>
>> What is b_ret out of interest, door*B*ell return code ?
>>
> Yes, originally q_ret & b_ret, for return from Queuing and Bellringing,
> respectively. Queuing can no longer return an error, but ringing the
> doorbell can, even if only in the event of hardware not behaving as
> specified.
>>>
>>> -    q_ret = guc_add_workqueue_item(client, rq);
>>> -    if (q_ret == 0)
>>> -        b_ret = guc_ring_doorbell(client);
>>> +    guc_add_workqueue_item(client, rq);
>>> +    b_ret = guc_ring_doorbell(client);
>>>
>>>       client->submissions[engine_id] += 1;
>>> -    if (q_ret) {
>>> -        client->q_fail += 1;
>>> -        client->retcode = q_ret;
>>> -    } else if (b_ret) {
>>> +    client->retcode = b_ret;
>>
>> I wanted to ask for what is this for but then found it is for debugfs.
>>
>>> +    if (b_ret)
>>>           client->b_fail += 1;
>>> -        client->retcode = q_ret = b_ret;
>>> -    } else {
>>> -        client->retcode = 0;
>>> -    }
>>> +
>>>       guc->submissions[engine_id] += 1;
>>>       guc->last_seqno[engine_id] = rq->seqno;
>>>
>>> -    return q_ret;
>>> +    return b_ret;
>>
>> Could also kill the return value from this one, the only caller,
>> intel_logical_ring_advance_and_submit does not use it. That function
>> itself does not look like it needs a return value at the moment. :)
> That's really only because we don't have an exit strategy for submission
> failure. I would rather see all parts of the submission process able to
> return errors (e.g. hardware not responding) and let the top-level code
> implement some sort of recovery strategy. Anyway, I'm leaving this here
> until someone changes the call signature to void.
>>
>>>   }
>>>
>>>   /*
>>> diff --git a/drivers/gpu/drm/i915/intel_guc.h
>>> b/drivers/gpu/drm/i915/intel_guc.h
>>> index 436f2d6..10e1d5e 100644
>>> --- a/drivers/gpu/drm/i915/intel_guc.h
>>> +++ b/drivers/gpu/drm/i915/intel_guc.h
>>> @@ -74,7 +74,7 @@ struct i915_guc_client {
>>>       /* GuC submission statistics & status */
>>>       uint64_t submissions[GUC_MAX_ENGINES_NUM];
>>>       uint32_t no_wq_space;        /* Space pre-check failed */
>>> -    uint32_t q_fail;        /* Failed to queue (MBZ)    */
>>> +    uint32_t q_fail;        /* No longer used        */
>>
>> Get rid of it then?
> No, it keeps things aligned. It can be reused for the next thing we need
> to add.

AFAIR u32s need to be 4-byte aligned on x86_64 so I don't get it but 
whatever.

Regards,

Tvrtko

Dave Gordon May 6, 2016, 3:06 p.m. UTC | #6

On 06/05/16 09:55, Tvrtko Ursulin wrote:
>
> On 05/05/16 19:38, Dave Gordon wrote:
>> On 29/04/2016 16:44, Tvrtko Ursulin wrote:
>>>
>>> On 27/04/16 19:03, Dave Gordon wrote:
>>>> Mostly little optimisations; for instance, if the driver is correctly
>>>> following the submission protocol, the "out of space" condition is
>>>> impossible, so the previous runtime WARN_ON() is promoted to a
>>>> GEM_BUG_ON() for a more dramatic effect in development and less impact
>>>> in end-user systems.
>>>>
>>>> Similarly we can replace other WARN_ON() conditions that don't
>>>> relate to
>>>> the hardware state with either BUILD_BUG_ON() for compile-time-
>>>> detectable issues, or GEM_BUG_ON() for logical "can't happen" errors.
>>>>
>>>> With those changes, we can convert it to void, as suggested by Chris
>>>> Wilson, and update the calling code appropriately.
>>>>
>>>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
>>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>>>
>>>> ---
>>>>   drivers/gpu/drm/i915/i915_guc_submission.c | 69
>>>> +++++++++++++++---------------
>>>>   drivers/gpu/drm/i915/intel_guc.h           |  2 +-
>>>>   drivers/gpu/drm/i915/intel_guc_fwif.h      |  3 +-
>>>>   3 files changed, 37 insertions(+), 37 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c
>>>> b/drivers/gpu/drm/i915/i915_guc_submission.c
>>>> index 6626eff..4d2ea84 100644
>>>> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
>>>> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
>>>> @@ -470,23 +470,28 @@ int i915_guc_wq_check_space(struct
>>>> drm_i915_gem_request *request)
>>>>       return -EAGAIN;
>>>>   }
>>>>
>>>> -static int guc_add_workqueue_item(struct i915_guc_client *gc,
>>>> -                  struct drm_i915_gem_request *rq)
>>>> +static void guc_add_workqueue_item(struct i915_guc_client *gc,
>>>> +                   struct drm_i915_gem_request *rq)
>>>>   {
>>>> +    /* wqi_len is in DWords, and does not include the one-word
>>>> header */
>>>> +    const size_t wqi_size = sizeof(struct guc_wq_item);
>>>
>>> Again, u32 is correct I think.
>> Nope, it's a sizeof(), but the compiler will check that it fits in u32
>> when we convert it to DWords below.
>
> I already wrote in this same thread few days ago this was my oversight.

Yeah, I'm answering messages out of sequence :)

>>>> +    const u32 wqi_len = wqi_size/sizeof(u32) - 1;
>>>>       struct guc_process_desc *desc;
>>>>       struct guc_wq_item *wqi;
>>>>       void *base;
>>>> -    u32 tail, wq_len, wq_off, space;
>>>> +    u32 space, tail, wq_off, wq_page;
>>>>
>>>>       desc = gc->client_base + gc->proc_desc_offset;
>>>> +
>>>> +    /* Space was checked earlier, in i915_guc_wq_check_space()
>>>> above */
>>>
>>> It may be above in the file, but the two do not call one another so I
>>> recommend saying exactly who called it.
 >>>
>> I don't really mind who called it, as long as it was called sometime
>> earlier in the request submission protocol -- the callsite may get moved
>> around a bit. Use cscope(1) or your favourite IDE to find it.
>
> So what does this comment supposed to tell the reader?

It tells the reader that we "cannot" be out of space, because we already 
checked that there was enough. Unless, of course, somebody removed the 
check. The commit message says that, too.

> guc_add_workqueue_item does not call i915_guc_wq_check_space so "above"
> what? Above in the file? Or maybe *earlier* in the call sequence of
> what? Might as well put an useful comment in when you are adding one.

It already has both the words "earlier" (as in, before in time, in 
calling sequence), and "above" (meaning, in textual order, in this 
file). So yes, both "above" AND "earlier". Anyway, this one-liner is 
only here to justify the lack of any check-and-wait code. OTOH I could 
add some kerneldoc elsewhere to explain that the caller must call 
check_space() before calling submit().

>>>>       space = CIRC_SPACE(gc->wq_tail, desc->head, gc->wq_size);
>>>> -    if (WARN_ON(space < sizeof(struct guc_wq_item)))
>>>> -        return -ENOSPC; /* shouldn't happen */
>>>> +    GEM_BUG_ON(space < wqi_size);
>>>
>>> It is impossible to hit this only because of the struct_mutex guarding
>>> the whole time window from request creation to submission. If in the
>>> future, near or far, that gets fixed, then this will need reworking.
>>>
>>> I don't have any better ideas though.
>>>
>>> But a WARN_ON and return would be almost as good. Everything is better
>>> than a dead machine one can't ssh into...
>>>
>>> So I appeal to make this a WARN_ON and return. Nothing bad would
>>> happen apart from software thinking GPU has hung.
 >>
>> If the driver violates the sequencing of check+submit then it does
>> indeed require reworking -- that would be a bug. Hence (GEM_)BUG_ON().
>
> This also was clarified in this same thread few days ago and I conceded
> the point.
>
>>>>
>>>> -    /* postincrement WQ tail for next time */
>>>> -    wq_off = gc->wq_tail;
>>>> -    gc->wq_tail += sizeof(struct guc_wq_item);
>>>> -    gc->wq_tail &= gc->wq_size - 1;
>>>> +    /* The GuC firmware wants the tail index in QWords, not bytes */
>>>> +    tail = rq->tail;
>>>
>>> Used to be sampled from rq->ringbuf->tail - are those the same?
 >>
>> It should always have been rq->tail; they're the same at present because
>> it was copied from ringbuffer as part of the submission process, but it
>> might not be the same with the scheduler & preemption. So let's get it
>> right before it becomes a mysterious bug.
>>
>>>> +    GEM_BUG_ON(tail & 7);
>>>> +    tail >>= 3;
>>>> +    GEM_BUG_ON(tail > WQ_RING_TAIL_MAX);
>>>>
>>>>       /* For now workqueue item is 4 DWs; workqueue buffer is 2
>>>> pages. So we
>>>>        * should not have the case where structure wqi is across page,
>>>> neither
>>>> @@ -495,19 +500,23 @@ static int guc_add_workqueue_item(struct
>>>> i915_guc_client *gc,
>>>>        * XXX: if not the case, we need save data to a temp wqi and
>>>> copy it to
>>>>        * workqueue buffer dw by dw.
>>>>        */
>>>> -    WARN_ON(sizeof(struct guc_wq_item) != 16);
>>>> -    WARN_ON(wq_off & 3);
>>>> +    BUILD_BUG_ON(wqi_size != 16);
>>>>
>>>> -    /* wq starts from the page after doorbell / process_desc */
>>>> -    base = kmap_atomic(i915_gem_object_get_page(gc->client_obj,
>>>> -            (wq_off + GUC_DB_SIZE) >> PAGE_SHIFT));
>>>> +    /* postincrement WQ tail for next time */
>>>> +    wq_off = gc->wq_tail;
>>>> +    gc->wq_tail += wqi_size;
>>>> +    gc->wq_tail &= gc->wq_size - 1;
>>>> +    GEM_BUG_ON(wq_off & (wqi_size - 1));
>>>
>>> Use to be wq_off & 3, now is wq_off & 15, which one is correct?
>> The new one :) I made it more stringent, so we check for correct
>> alignment of wq_tail to sizeof(queue item), not just DWord.
>
> Worth mentioning in the commit message? Since it begins with "Mostly
> little optimisations;" readers/reviewers expect to see no functional
> changes then.

It *isn't* a functional change. It *should* be impossible for the value 
ever NOT to be a multiple of 16, and neither the old version nor the new 
version has ever been triggered. It's just future-proofing for the day 
when someone else changes this code and introduces a bug :)

>>>> +
>>>> +    /* WQ starts from the page after doorbell / process_desc */
>>>> +    wq_page = (wq_off + GUC_DB_SIZE) >> PAGE_SHIFT;
>>>>       wq_off &= PAGE_SIZE - 1;
>>>> +    base = kmap_atomic(i915_gem_object_get_page(gc->client_obj,
>>>> wq_page));
>>>>       wqi = (struct guc_wq_item *)((char *)base + wq_off);
>>>>
>>>> -    /* len does not include the header */
>>>> -    wq_len = sizeof(struct guc_wq_item) / sizeof(u32) - 1;
>>>> +    /* Now fill in the 4-word work queue item */
>>>>       wqi->header = WQ_TYPE_INORDER |
>>>> -            (wq_len << WQ_LEN_SHIFT) |
>>>> +            (wqi_len << WQ_LEN_SHIFT) |
>>>>               (rq->engine->guc_id << WQ_TARGET_SHIFT) |
>>>>               WQ_NO_WCFLUSH_WAIT;
>>>>
>>>> @@ -515,14 +524,10 @@ static int guc_add_workqueue_item(struct
>>>> i915_guc_client *gc,
>>>>       wqi->context_desc = (u32)intel_lr_context_descriptor(rq->ctx,
>>>>                                    rq->engine);
>>>>
>>>> -    /* The GuC firmware wants the tail index in QWords, not bytes */
>>>> -    tail = rq->ringbuf->tail >> 3;
>>>>       wqi->ring_tail = tail << WQ_RING_TAIL_SHIFT;
>>>> -    wqi->fence_id = 0; /*XXX: what fence to be here */
>>>> +    wqi->fence_id = rq->seqno;
>>>
>>> Not mentioned in the commit?
 >>
>> Not actually used either, but I expect it was provided for a Windows
>> fence ID, so our equivalent is a seqno.
>> It may show up in a GuC log, in which case seqno provides more helpful
>> information.
>
> Best to mention it in the commit imho, again, not to confuse
> readers/reviewers.

As the (deleted) comment says, we didn't really know what to put here, 
so no one can have been relying on an *absence* of useful information. 
Maybe next week we'll put the system timestamp in this field instead. 
Also, I'm not guaranteeing that this value *will* show up anywhere at 
all; for now at least, GuC logging is an uncommitted interface. So 
noting that we're setting a previously-unused value to a new value that 
has no functional effect and isn't visible through any stable interface 
seems a little pointless.

(Of course, *next* time somebody changes it there will probably be a 
good reason, like "timestamp is more useful than seqno in GuC log").

>>>>       kunmap_atomic(base);
>>>> -
>>>> -    return 0;
>>>>   }
>>>>
>>>>   /**
>>>> @@ -537,26 +542,20 @@ int i915_guc_submit(struct drm_i915_gem_request
>>>> *rq)
>>>>       unsigned int engine_id = rq->engine->guc_id;
>>>>       struct intel_guc *guc = &rq->i915->guc;
>>>>       struct i915_guc_client *client = guc->execbuf_client;
>>>> -    int q_ret, b_ret;
>>>> +    int b_ret;
>>>
>>> What is b_ret out of interest, door*B*ell return code ?
>>>
>> Yes, originally q_ret & b_ret, for return from Queuing and Bellringing,
>> respectively. Queuing can no longer return an error, but ringing the
>> doorbell can, even if only in the event of hardware not behaving as
>> specified.
>>>
>>>> -    q_ret = guc_add_workqueue_item(client, rq);
>>>> -    if (q_ret == 0)
>>>> -        b_ret = guc_ring_doorbell(client);
>>>> +    guc_add_workqueue_item(client, rq);
>>>> +    b_ret = guc_ring_doorbell(client);
>>>>
>>>>       client->submissions[engine_id] += 1;
>>>> -    if (q_ret) {
>>>> -        client->q_fail += 1;
>>>> -        client->retcode = q_ret;
>>>> -    } else if (b_ret) {
>>>> +    client->retcode = b_ret;
>>>
>>> I wanted to ask for what is this for but then found it is for debugfs.
>>>
>>>> +    if (b_ret)
>>>>           client->b_fail += 1;
>>>> -        client->retcode = q_ret = b_ret;
>>>> -    } else {
>>>> -        client->retcode = 0;
>>>> -    }
>>>> +
>>>>       guc->submissions[engine_id] += 1;
>>>>       guc->last_seqno[engine_id] = rq->seqno;
>>>>
>>>> -    return q_ret;
>>>> +    return b_ret;
>>>
>>> Could also kill the return value from this one, the only caller,
>>> intel_logical_ring_advance_and_submit does not use it. That function
>>> itself does not look like it needs a return value at the moment. :)
 >>
>> That's really only because we don't have an exit strategy for submission
>> failure. I would rather see all parts of the submission process able to
>> return errors (e.g. hardware not responding) and let the top-level code
>> implement some sort of recovery strategy. Anyway, I'm leaving this here
>> until someone changes the call signature to void.
>>
>>>>   }
>>>>
>>>>   /*
>>>> diff --git a/drivers/gpu/drm/i915/intel_guc.h
>>>> b/drivers/gpu/drm/i915/intel_guc.h
>>>> index 436f2d6..10e1d5e 100644
>>>> --- a/drivers/gpu/drm/i915/intel_guc.h
>>>> +++ b/drivers/gpu/drm/i915/intel_guc.h
>>>> @@ -74,7 +74,7 @@ struct i915_guc_client {
>>>>       /* GuC submission statistics & status */
>>>>       uint64_t submissions[GUC_MAX_ENGINES_NUM];
>>>>       uint32_t no_wq_space;        /* Space pre-check failed */
>>>> -    uint32_t q_fail;        /* Failed to queue (MBZ)    */
>>>> +    uint32_t q_fail;        /* No longer used        */
>>>
>>> Get rid of it then?
>> No, it keeps things aligned. It can be reused for the next thing we need
>> to add.
>
> AFAIR u32s need to be 4-byte aligned on x86_64 so I don't get it but
> whatever.
>
> Regards,
> Tvrtko

It's not just about keeping things DWord aligned, but also optimising 
cache-line placement. One recommendation (from an Intel-internal site) 
suggests anything up to 16 bytes be aligned to the power-of-two value 
equal to or greater than its own size, and anything over 64 bytes (in 
aggregate) should be aligned to 64 bytes (1 cacheline). At present, this 
structure exactly fits a multiple of 64, so we might as well just leave 
the spare word for reuse later.

.Dave.

[v2,4/5] drm/i915/guc: rework guc_add_workqueue_item()

Commit Message

Comments

Patch