diff mbox

[v2,3/3] drm/i915: Consolidate ring freespace calculations

Message ID 1416341242-5352-4-git-send-email-david.s.gordon@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Dave Gordon Nov. 18, 2014, 8:07 p.m. UTC
There are numerous places in the code where the driver's idea of
how much space is left in a ring is updated using the driver's
latest notions of the positions of 'head' and 'tail' for the ring.
Among them are some that update one or both of these values before
(re)doing the calculation. In particular, there are four different
places in the code where 'last_retired_head' is copied to 'head'
and then set to -1; and two of these do not have a guard to check
that it has actually been updated since last time it was consumed,
leaving the possibility that the dummy -1 can be transferred from
'last_retired_head' to 'head', causing the space calculation to
produce 'impossible' results (previously seen on Android/VLV).

This code therefore consolidates all the calculation and updating of
these values, such that there is only one place where the ring space
is updated, and it ALWAYS uses (and consumes) 'last_retired_head' if
(and ONLY if) it has been updated since the last call.

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/i915_dma.c         |    5 ++-
 drivers/gpu/drm/i915/intel_lrc.c        |   25 +++++----------
 drivers/gpu/drm/i915/intel_ringbuffer.c |   51 ++++++++++++++++---------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |    1 +
 4 files changed, 37 insertions(+), 45 deletions(-)

Comments

Daniel Vetter Nov. 24, 2014, 10:04 a.m. UTC | #1
On Tue, Nov 18, 2014 at 08:07:22PM +0000, Dave Gordon wrote:
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index ae09258..a08ae65 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -52,16 +52,27 @@ intel_ring_initialized(struct intel_engine_cs *ring)
>  
>  int __intel_ring_space(int head, int tail, int size)
>  {
> -	int space = head - (tail + I915_RING_FREE_SPACE);
> -	if (space < 0)
> +	int space = head - tail;
> +	if (space <= 0)
>  		space += size;
> -	return space;
> +	return space - I915_RING_FREE_SPACE;

Changing the free space computation doesn't seem required, but resulting
in Chris&me just discussing it on irc to convince ourselves it's not
broken accidentally now. Can you please respin you patch with this change
dropped?

In generally it's good practice to review the diff after committing a
patch and hunt for any unecessary changes. Imo even when the endresult
looks a bit ulgy (e.g. crazy ordering of static functions which requires
tons of predeclarations) it's better if the resulting diff looks cleaner.
And if things get out of hand we can always do a pure cleanup patch.
-Daniel
Dave Gordon Nov. 24, 2014, 2:32 p.m. UTC | #2
On 24/11/14 10:04, Daniel Vetter wrote:
> On Tue, Nov 18, 2014 at 08:07:22PM +0000, Dave Gordon wrote:
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index ae09258..a08ae65 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -52,16 +52,27 @@ intel_ring_initialized(struct intel_engine_cs *ring)
>>  
>>  int __intel_ring_space(int head, int tail, int size)
>>  {
>> -	int space = head - (tail + I915_RING_FREE_SPACE);
>> -	if (space < 0)
>> +	int space = head - tail;
>> +	if (space <= 0)
>>  		space += size;
>> -	return space;
>> +	return space - I915_RING_FREE_SPACE;
> 
> Changing the free space computation doesn't seem required, but resulting
> in Chris&me just discussing it on irc to convince ourselves it's not
> broken accidentally now. Can you please respin you patch with this change
> dropped?
> 
> In generally it's good practice to review the diff after committing a
> patch and hunt for any unecessary changes. Imo even when the endresult
> looks a bit ulgy (e.g. crazy ordering of static functions which requires
> tons of predeclarations) it's better if the resulting diff looks cleaner.
> And if things get out of hand we can always do a pure cleanup patch.
> -Daniel

This isn't an accidental change; it's to improve resilience in the case
that head and/or tail end up with values they shouldn't have.

Here's a C program to model the two different calculations in a tiny ring:

#include <stdio.h>

#define FREE    4
#define SIZE    8

main()
{
        int h, t, s1, s2;

        printf("%s\t%s\t%s\t%s\n", "Head", "Tail", "OSpace", "NSpace");
        for (h = 0; h <= SIZE; ++h)
        {
                for (t = 0; t <= SIZE; ++t)
                {
                        s1 = h - (t + FREE);
                        if (s1 < 0)
                                s1 += SIZE;

                        s2 = h - t;
                        if (s2 <= 0)
                                s2 += SIZE;
                        s2 -= FREE;

                        printf("%2d\t%2d\t%2d\t%2d\n", h, t, s1, s2);
                }
                printf("\n");
        }
}

OSpace (s1) uses the old code, whereas NSpace (s2) is my new method.
They agree for all well-behaved cases, but look at this snippet:

Head	Tail	OSpace	NSpace
 6	 0	 2	 2
 6	 1	 1	 1
 6	 2	 0	 0
 6	 3	 7	-1
 6	 4	 6	-2
 6	 5	 5	-3
 6	 6	 4	 4
 6	 7	 3	 3
 6	 8	 2	 2

Both the old and new code give the right answers if HEAD and TAIL have
legitimate values. But if TAIL should somehow advance further than it
should, and run into the reserved area, the old code might tell you that
there's now LOTS of space available, whereas the new code returns "less
than zero" space available.

For example, the old calculation tells us that if head=6 and tail=4 then
there are 6 slots available -- which just can't be right, as by
definition the answer should never be more than (SIZE-RSVD). I'd much
rather get the answer "-2" for this case, as it's then obvious that this
really shouldn't happen.

In particular, this new code would mean that the commonly-used test
(available >= required) would immediately reject further writes into the
ring after an overrun, giving some chance that we can recover from or at
least diagnose the original problem; whereas allowing more writes would
likely both confuse the h/w and destroy the evidence.

It's also easier to understand, IMHO (if experienced programmers such as
you & Chris had to discuss the old code to be confident that it was
correct, that already suggests that it's not as clear as it could be).

The used space in a ring is given by the cyclic distance from the
consumer to the producer; conversely, the available space in a ring is
the cyclic distance from the producer to the consumer, MINUS the amount
reserved. The new code expresses that directly, without having to figure
out the meaning of ADDING the reserved amount to the tail before
subtracting it from head. In ASCII pix:

                  +++++++++++++++++++
                  +      free       +  0
                  +      free       +  1
                  +    reserved     +  2
                  +    reserved     +  3
(consumer) HEAD-> +      used       +  4
                  +      used       +  5
                  +      used       +  6
                  +      used       +  7
                  +      used       +  8
                  +      used       +  9
(producer) TAIL-> +      free       + 10
                  +      free       + 11
                  +++++++++++++++++++

The sketch above shows the situation with size=12, reserved=2, TAIL=10
and HEAD=4. Slots 4 to 9 are used (TAIL-HEAD = 10-4 = 6 slots). The
available space is (HEAD-TAIL (+SIZE) - RSVD = 4-10+12-2 = 4 slots).

                  +++++++++++++++++++
                  +      used       +  0
                  +      used       +  1
(producer) TAIL-> +      free       +  2
                  +      free       +  3
                  +    reserved     +  4
                  +    reserved     +  5
(consumer) HEAD-> +      used       +  6
                  +      used       +  7
                  +      used       +  8
                  +      used       +  9
                  +      used       + 10
                  +      used       + 11
                  +++++++++++++++++++

After TAIL has wrapped, we might have this situation with TAIL=2 and
HEAD=6. Used space is (TAIL-HEAD (+SIZE) = 2-6+12 = 8 slots), and
available space is (HEAD-TAIL - RSVD) = 6-2-2 = 2 slots. Straightforward
and easy to understand :)

So, I'd definitely prefer to keep this new code. We not only want to do
the calculation in only one place, but we want to do it in the best
possible way, with the minimum chance of propagating errors and
confusing both h/w and developers if someone does introduce a ring
overrun or off-by-one error in some ring-stuffing code elsewhere.
(BTW,

.Dave.
deepak.s@linux.intel.com Nov. 25, 2014, 7:59 a.m. UTC | #3
On Wednesday 19 November 2014 01:37 AM, Dave Gordon wrote:
> There are numerous places in the code where the driver's idea of
> how much space is left in a ring is updated using the driver's
> latest notions of the positions of 'head' and 'tail' for the ring.
> Among them are some that update one or both of these values before
> (re)doing the calculation. In particular, there are four different
> places in the code where 'last_retired_head' is copied to 'head'
> and then set to -1; and two of these do not have a guard to check
> that it has actually been updated since last time it was consumed,
> leaving the possibility that the dummy -1 can be transferred from
> 'last_retired_head' to 'head', causing the space calculation to
> produce 'impossible' results (previously seen on Android/VLV).
>
> This code therefore consolidates all the calculation and updating of
> these values, such that there is only one place where the ring space
> is updated, and it ALWAYS uses (and consumes) 'last_retired_head' if
> (and ONLY if) it has been updated since the last call.
>
> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_dma.c         |    5 ++-
>   drivers/gpu/drm/i915/intel_lrc.c        |   25 +++++----------
>   drivers/gpu/drm/i915/intel_ringbuffer.c |   51 ++++++++++++++++---------------
>   drivers/gpu/drm/i915/intel_ringbuffer.h |    1 +
>   4 files changed, 37 insertions(+), 45 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 5dc37f0..4a98399 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -154,11 +154,10 @@ void i915_kernel_lost_context(struct drm_device *dev)
>   	if (drm_core_check_feature(dev, DRIVER_MODESET))
>   		return;
>   
> +	ringbuf->last_retired_head = -1;
>   	ringbuf->head = I915_READ_HEAD(ring) & HEAD_ADDR;
>   	ringbuf->tail = I915_READ_TAIL(ring) & TAIL_ADDR;
> -	ringbuf->space = ringbuf->head - (ringbuf->tail + I915_RING_FREE_SPACE);
> -	if (ringbuf->space < 0)
> -		ringbuf->space += ringbuf->size;
> +	intel_ring_update_space(ringbuf);
>   
>   	if (!dev->primary->master)
>   		return;
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index ad31373..c9a5227 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -825,14 +825,8 @@ static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf,
>   	u32 seqno = 0;
>   	int ret;
>   
> -	if (ringbuf->last_retired_head != -1) {
> -		ringbuf->head = ringbuf->last_retired_head;
> -		ringbuf->last_retired_head = -1;
> -
> -		ringbuf->space = intel_ring_space(ringbuf);
> -		if (ringbuf->space >= bytes)
> -			return 0;
> -	}
> +	if (intel_ring_space(ringbuf) >= bytes)
> +		return 0;
>   
>   	list_for_each_entry(request, &ring->request_list, list) {
>   		/*
> @@ -860,11 +854,8 @@ static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf,
>   		return ret;
>   
>   	i915_gem_retire_requests_ring(ring);
> -	ringbuf->head = ringbuf->last_retired_head;
> -	ringbuf->last_retired_head = -1;
>   
> -	ringbuf->space = intel_ring_space(ringbuf);
> -	return 0;
> +	return intel_ring_space(ringbuf) >= bytes ? 0 : -ENOSPC;
>   }
>   
>   static int logical_ring_wait_for_space(struct intel_ringbuffer *ringbuf,
> @@ -890,12 +881,10 @@ static int logical_ring_wait_for_space(struct intel_ringbuffer *ringbuf,
>   	 * case by choosing an insanely large timeout. */
>   	end = jiffies + 60 * HZ;
>   
> +	ret = 0;
>   	do {
> -		ringbuf->space = intel_ring_space(ringbuf);
> -		if (ringbuf->space >= bytes) {
> -			ret = 0;
> +		if (intel_ring_space(ringbuf) >= bytes)
>   			break;
> -		}
>   
>   		msleep(1);
>   
> @@ -936,7 +925,7 @@ static int logical_ring_wrap_buffer(struct intel_ringbuffer *ringbuf)
>   		iowrite32(MI_NOOP, virt++);
>   
>   	ringbuf->tail = 0;
> -	ringbuf->space = intel_ring_space(ringbuf);
> +	intel_ring_update_space(ringbuf);
>   
>   	return 0;
>   }
> @@ -1777,8 +1766,8 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
>   	ringbuf->effective_size = ringbuf->size;
>   	ringbuf->head = 0;
>   	ringbuf->tail = 0;
> -	ringbuf->space = ringbuf->size;
>   	ringbuf->last_retired_head = -1;
> +	intel_ring_update_space(ringbuf);
>   
>   	/* TODO: For now we put this in the mappable region so that we can reuse
>   	 * the existing ringbuffer code which ioremaps it. When we start
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index ae09258..a08ae65 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -52,16 +52,27 @@ intel_ring_initialized(struct intel_engine_cs *ring)
>   
>   int __intel_ring_space(int head, int tail, int size)
>   {
> -	int space = head - (tail + I915_RING_FREE_SPACE);
> -	if (space < 0)
> +	int space = head - tail;
> +	if (space <= 0)
>   		space += size;
> -	return space;
> +	return space - I915_RING_FREE_SPACE;
> +}
> +
> +void intel_ring_update_space(struct intel_ringbuffer *ringbuf)
> +{
> +	if (ringbuf->last_retired_head != -1) {
> +		ringbuf->head = ringbuf->last_retired_head;
> +		ringbuf->last_retired_head = -1;
> +	}
> +
> +	ringbuf->space = __intel_ring_space(ringbuf->head & HEAD_ADDR,
> +					    ringbuf->tail, ringbuf->size);
>   }
>   
>   int intel_ring_space(struct intel_ringbuffer *ringbuf)
>   {
> -	return __intel_ring_space(ringbuf->head & HEAD_ADDR,
> -				  ringbuf->tail, ringbuf->size);
> +	intel_ring_update_space(ringbuf);
> +	return ringbuf->space;
>   }
>   
>   bool intel_ring_stopped(struct intel_engine_cs *ring)
> @@ -592,10 +603,10 @@ static int init_ring_common(struct intel_engine_cs *ring)
>   	if (!drm_core_check_feature(ring->dev, DRIVER_MODESET))
>   		i915_kernel_lost_context(ring->dev);
>   	else {
> +		ringbuf->last_retired_head = -1;
>   		ringbuf->head = I915_READ_HEAD(ring);
>   		ringbuf->tail = I915_READ_TAIL(ring) & TAIL_ADDR;
> -		ringbuf->space = intel_ring_space(ringbuf);
> -		ringbuf->last_retired_head = -1;
> +		intel_ring_update_space(ringbuf);
>   	}
>   
>   	memset(&ring->hangcheck, 0, sizeof(ring->hangcheck));
> @@ -1880,14 +1891,8 @@ static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
>   	u32 seqno = 0;
>   	int ret;
>   
> -	if (ringbuf->last_retired_head != -1) {
> -		ringbuf->head = ringbuf->last_retired_head;
> -		ringbuf->last_retired_head = -1;
> -
> -		ringbuf->space = intel_ring_space(ringbuf);
> -		if (ringbuf->space >= n)
> -			return 0;
> -	}
> +	if (intel_ring_space(ringbuf) >= n)
> +		return 0;
>   
>   	list_for_each_entry(request, &ring->request_list, list) {
>   		if (__intel_ring_space(request->tail, ringbuf->tail,
> @@ -1905,10 +1910,7 @@ static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
>   		return ret;
>   
>   	i915_gem_retire_requests_ring(ring);
> -	ringbuf->head = ringbuf->last_retired_head;
> -	ringbuf->last_retired_head = -1;
>   
> -	ringbuf->space = intel_ring_space(ringbuf);
>   	return 0;
>   }
>   
> @@ -1934,14 +1936,14 @@ static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
>   	 * case by choosing an insanely large timeout. */
>   	end = jiffies + 60 * HZ;
>   
> +	ret = 0;
>   	trace_i915_ring_wait_begin(ring);
>   	do {
> +		if (intel_ring_space(ringbuf) >= n)
> +			break;
>   		ringbuf->head = I915_READ_HEAD(ring);
> -		ringbuf->space = intel_ring_space(ringbuf);
> -		if (ringbuf->space >= n) {
> -			ret = 0;
> +		if (intel_ring_space(ringbuf) >= n)
>   			break;
> -		}
>   
>   		if (!drm_core_check_feature(dev, DRIVER_MODESET) &&
>   		    dev->primary->master) {
> @@ -1989,7 +1991,7 @@ static int intel_wrap_ring_buffer(struct intel_engine_cs *ring)
>   		iowrite32(MI_NOOP, virt++);
>   
>   	ringbuf->tail = 0;
> -	ringbuf->space = intel_ring_space(ringbuf);
> +	intel_ring_update_space(ringbuf);
>   
>   	return 0;
>   }
> @@ -2061,6 +2063,7 @@ int intel_ring_begin(struct intel_engine_cs *ring,
>   		     int num_dwords)
>   {
>   	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	struct intel_ringbuffer *ringbuf = ring->buffer;
>   	int ret;
>   
>   	ret = i915_gem_check_wedge(&dev_priv->gpu_error,
> @@ -2077,7 +2080,7 @@ int intel_ring_begin(struct intel_engine_cs *ring,
>   	if (ret)
>   		return ret;
>   
> -	ring->buffer->space -= num_dwords * sizeof(uint32_t);
> +	ringbuf->space -= num_dwords * sizeof(uint32_t);
>   	return 0;
>   }
>   
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index aab2e2f..24a5723 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -404,6 +404,7 @@ static inline void intel_ring_advance(struct intel_engine_cs *ring)
>   	ringbuf->tail &= ringbuf->size - 1;
>   }
>   int __intel_ring_space(int head, int tail, int size);
> +void intel_ring_update_space(struct intel_ringbuffer *ringbuf);
>   int intel_ring_space(struct intel_ringbuffer *ringbuf);
>   bool intel_ring_stopped(struct intel_engine_cs *ring);
>   void __intel_ring_advance(struct intel_engine_cs *ring);

Looks fine to me
Reviewed-by: Deepak S<deepak.s@linux.intel.com>
Daniel Vetter Nov. 25, 2014, 11:41 a.m. UTC | #4
On Mon, Nov 24, 2014 at 02:32:25PM +0000, Dave Gordon wrote:
> On 24/11/14 10:04, Daniel Vetter wrote:
> > On Tue, Nov 18, 2014 at 08:07:22PM +0000, Dave Gordon wrote:
> >> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> >> index ae09258..a08ae65 100644
> >> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> >> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> >> @@ -52,16 +52,27 @@ intel_ring_initialized(struct intel_engine_cs *ring)
> >>  
> >>  int __intel_ring_space(int head, int tail, int size)
> >>  {
> >> -	int space = head - (tail + I915_RING_FREE_SPACE);
> >> -	if (space < 0)
> >> +	int space = head - tail;
> >> +	if (space <= 0)
> >>  		space += size;
> >> -	return space;
> >> +	return space - I915_RING_FREE_SPACE;
> > 
> > Changing the free space computation doesn't seem required, but resulting
> > in Chris&me just discussing it on irc to convince ourselves it's not
> > broken accidentally now. Can you please respin you patch with this change
> > dropped?
> > 
> > In generally it's good practice to review the diff after committing a
> > patch and hunt for any unecessary changes. Imo even when the endresult
> > looks a bit ulgy (e.g. crazy ordering of static functions which requires
> > tons of predeclarations) it's better if the resulting diff looks cleaner.
> > And if things get out of hand we can always do a pure cleanup patch.
> > -Daniel
> 
> This isn't an accidental change; it's to improve resilience in the case
> that head and/or tail end up with values they shouldn't have.
> 
> Here's a C program to model the two different calculations in a tiny ring:
> 
> #include <stdio.h>
> 
> #define FREE    4
> #define SIZE    8
> 
> main()
> {
>         int h, t, s1, s2;
> 
>         printf("%s\t%s\t%s\t%s\n", "Head", "Tail", "OSpace", "NSpace");
>         for (h = 0; h <= SIZE; ++h)
>         {
>                 for (t = 0; t <= SIZE; ++t)
>                 {
>                         s1 = h - (t + FREE);
>                         if (s1 < 0)
>                                 s1 += SIZE;
> 
>                         s2 = h - t;
>                         if (s2 <= 0)
>                                 s2 += SIZE;
>                         s2 -= FREE;
> 
>                         printf("%2d\t%2d\t%2d\t%2d\n", h, t, s1, s2);
>                 }
>                 printf("\n");
>         }
> }
> 
> OSpace (s1) uses the old code, whereas NSpace (s2) is my new method.
> They agree for all well-behaved cases, but look at this snippet:
> 
> Head	Tail	OSpace	NSpace
>  6	 0	 2	 2
>  6	 1	 1	 1
>  6	 2	 0	 0
>  6	 3	 7	-1
>  6	 4	 6	-2
>  6	 5	 5	-3
>  6	 6	 4	 4
>  6	 7	 3	 3
>  6	 8	 2	 2
> 
> Both the old and new code give the right answers if HEAD and TAIL have
> legitimate values. But if TAIL should somehow advance further than it
> should, and run into the reserved area, the old code might tell you that
> there's now LOTS of space available, whereas the new code returns "less
> than zero" space available.
> 
> For example, the old calculation tells us that if head=6 and tail=4 then
> there are 6 slots available -- which just can't be right, as by
> definition the answer should never be more than (SIZE-RSVD). I'd much
> rather get the answer "-2" for this case, as it's then obvious that this
> really shouldn't happen.
> 
> In particular, this new code would mean that the commonly-used test
> (available >= required) would immediately reject further writes into the
> ring after an overrun, giving some chance that we can recover from or at
> least diagnose the original problem; whereas allowing more writes would
> likely both confuse the h/w and destroy the evidence.
> 
> It's also easier to understand, IMHO (if experienced programmers such as
> you & Chris had to discuss the old code to be confident that it was
> correct, that already suggests that it's not as clear as it could be).
> 
> The used space in a ring is given by the cyclic distance from the
> consumer to the producer; conversely, the available space in a ring is
> the cyclic distance from the producer to the consumer, MINUS the amount
> reserved. The new code expresses that directly, without having to figure
> out the meaning of ADDING the reserved amount to the tail before
> subtracting it from head. In ASCII pix:
> 
>                   +++++++++++++++++++
>                   +      free       +  0
>                   +      free       +  1
>                   +    reserved     +  2
>                   +    reserved     +  3
> (consumer) HEAD-> +      used       +  4
>                   +      used       +  5
>                   +      used       +  6
>                   +      used       +  7
>                   +      used       +  8
>                   +      used       +  9
> (producer) TAIL-> +      free       + 10
>                   +      free       + 11
>                   +++++++++++++++++++
> 
> The sketch above shows the situation with size=12, reserved=2, TAIL=10
> and HEAD=4. Slots 4 to 9 are used (TAIL-HEAD = 10-4 = 6 slots). The
> available space is (HEAD-TAIL (+SIZE) - RSVD = 4-10+12-2 = 4 slots).
> 
>                   +++++++++++++++++++
>                   +      used       +  0
>                   +      used       +  1
> (producer) TAIL-> +      free       +  2
>                   +      free       +  3
>                   +    reserved     +  4
>                   +    reserved     +  5
> (consumer) HEAD-> +      used       +  6
>                   +      used       +  7
>                   +      used       +  8
>                   +      used       +  9
>                   +      used       + 10
>                   +      used       + 11
>                   +++++++++++++++++++
> 
> After TAIL has wrapped, we might have this situation with TAIL=2 and
> HEAD=6. Used space is (TAIL-HEAD (+SIZE) = 2-6+12 = 8 slots), and
> available space is (HEAD-TAIL - RSVD) = 6-2-2 = 2 slots. Straightforward
> and easy to understand :)
> 
> So, I'd definitely prefer to keep this new code. We not only want to do
> the calculation in only one place, but we want to do it in the best
> possible way, with the minimum chance of propagating errors and
> confusing both h/w and developers if someone does introduce a ring
> overrun or off-by-one error in some ring-stuffing code elsewhere.
> (BTW,

Given the extensive explanation you've delivered here this _definitely_
should be in a separate patch. Rule-of-thumb: If you have multipled pieces
in your commit message/explanation and tie them together in the
explanation with an "and" you should consider splitting the patch along
the "and"s. Except when it's all really trivial stuff (e.g. going ocd over
docs or so). For the commit message please just reuse the above mail, it's
a great explanation!

And could we perhaps WARN_ON if the negative space thing happens, which
should make such an occurance a lot easier to understand?

Thanks, Daniel
Chris Wilson Nov. 25, 2014, 11:47 a.m. UTC | #5
On Tue, Nov 25, 2014 at 12:41:16PM +0100, Daniel Vetter wrote:
> On Mon, Nov 24, 2014 at 02:32:25PM +0000, Dave Gordon wrote:
> > On 24/11/14 10:04, Daniel Vetter wrote:
> > > On Tue, Nov 18, 2014 at 08:07:22PM +0000, Dave Gordon wrote:
> > >> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > >> index ae09258..a08ae65 100644
> > >> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > >> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > >> @@ -52,16 +52,27 @@ intel_ring_initialized(struct intel_engine_cs *ring)
> > >>  
> > >>  int __intel_ring_space(int head, int tail, int size)
> > >>  {
> > >> -	int space = head - (tail + I915_RING_FREE_SPACE);
> > >> -	if (space < 0)
> > >> +	int space = head - tail;
> > >> +	if (space <= 0)
> > >>  		space += size;
> > >> -	return space;
> > >> +	return space - I915_RING_FREE_SPACE;
> > > 
> > > Changing the free space computation doesn't seem required, but resulting
> > > in Chris&me just discussing it on irc to convince ourselves it's not
> > > broken accidentally now. Can you please respin you patch with this change
> > > dropped?
> > > 
> > > In generally it's good practice to review the diff after committing a
> > > patch and hunt for any unecessary changes. Imo even when the endresult
> > > looks a bit ulgy (e.g. crazy ordering of static functions which requires
> > > tons of predeclarations) it's better if the resulting diff looks cleaner.
> > > And if things get out of hand we can always do a pure cleanup patch.
> > > -Daniel
> > 
> > This isn't an accidental change; it's to improve resilience in the case
> > that head and/or tail end up with values they shouldn't have.

Looks like a fun story. How about tackling a real issue like we should
prevent updating TAIL whilst in the same cacheline as HEAD?
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 5dc37f0..4a98399 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -154,11 +154,10 @@  void i915_kernel_lost_context(struct drm_device *dev)
 	if (drm_core_check_feature(dev, DRIVER_MODESET))
 		return;
 
+	ringbuf->last_retired_head = -1;
 	ringbuf->head = I915_READ_HEAD(ring) & HEAD_ADDR;
 	ringbuf->tail = I915_READ_TAIL(ring) & TAIL_ADDR;
-	ringbuf->space = ringbuf->head - (ringbuf->tail + I915_RING_FREE_SPACE);
-	if (ringbuf->space < 0)
-		ringbuf->space += ringbuf->size;
+	intel_ring_update_space(ringbuf);
 
 	if (!dev->primary->master)
 		return;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index ad31373..c9a5227 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -825,14 +825,8 @@  static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf,
 	u32 seqno = 0;
 	int ret;
 
-	if (ringbuf->last_retired_head != -1) {
-		ringbuf->head = ringbuf->last_retired_head;
-		ringbuf->last_retired_head = -1;
-
-		ringbuf->space = intel_ring_space(ringbuf);
-		if (ringbuf->space >= bytes)
-			return 0;
-	}
+	if (intel_ring_space(ringbuf) >= bytes)
+		return 0;
 
 	list_for_each_entry(request, &ring->request_list, list) {
 		/*
@@ -860,11 +854,8 @@  static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf,
 		return ret;
 
 	i915_gem_retire_requests_ring(ring);
-	ringbuf->head = ringbuf->last_retired_head;
-	ringbuf->last_retired_head = -1;
 
-	ringbuf->space = intel_ring_space(ringbuf);
-	return 0;
+	return intel_ring_space(ringbuf) >= bytes ? 0 : -ENOSPC;
 }
 
 static int logical_ring_wait_for_space(struct intel_ringbuffer *ringbuf,
@@ -890,12 +881,10 @@  static int logical_ring_wait_for_space(struct intel_ringbuffer *ringbuf,
 	 * case by choosing an insanely large timeout. */
 	end = jiffies + 60 * HZ;
 
+	ret = 0;
 	do {
-		ringbuf->space = intel_ring_space(ringbuf);
-		if (ringbuf->space >= bytes) {
-			ret = 0;
+		if (intel_ring_space(ringbuf) >= bytes)
 			break;
-		}
 
 		msleep(1);
 
@@ -936,7 +925,7 @@  static int logical_ring_wrap_buffer(struct intel_ringbuffer *ringbuf)
 		iowrite32(MI_NOOP, virt++);
 
 	ringbuf->tail = 0;
-	ringbuf->space = intel_ring_space(ringbuf);
+	intel_ring_update_space(ringbuf);
 
 	return 0;
 }
@@ -1777,8 +1766,8 @@  int intel_lr_context_deferred_create(struct intel_context *ctx,
 	ringbuf->effective_size = ringbuf->size;
 	ringbuf->head = 0;
 	ringbuf->tail = 0;
-	ringbuf->space = ringbuf->size;
 	ringbuf->last_retired_head = -1;
+	intel_ring_update_space(ringbuf);
 
 	/* TODO: For now we put this in the mappable region so that we can reuse
 	 * the existing ringbuffer code which ioremaps it. When we start
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index ae09258..a08ae65 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -52,16 +52,27 @@  intel_ring_initialized(struct intel_engine_cs *ring)
 
 int __intel_ring_space(int head, int tail, int size)
 {
-	int space = head - (tail + I915_RING_FREE_SPACE);
-	if (space < 0)
+	int space = head - tail;
+	if (space <= 0)
 		space += size;
-	return space;
+	return space - I915_RING_FREE_SPACE;
+}
+
+void intel_ring_update_space(struct intel_ringbuffer *ringbuf)
+{
+	if (ringbuf->last_retired_head != -1) {
+		ringbuf->head = ringbuf->last_retired_head;
+		ringbuf->last_retired_head = -1;
+	}
+
+	ringbuf->space = __intel_ring_space(ringbuf->head & HEAD_ADDR,
+					    ringbuf->tail, ringbuf->size);
 }
 
 int intel_ring_space(struct intel_ringbuffer *ringbuf)
 {
-	return __intel_ring_space(ringbuf->head & HEAD_ADDR,
-				  ringbuf->tail, ringbuf->size);
+	intel_ring_update_space(ringbuf);
+	return ringbuf->space;
 }
 
 bool intel_ring_stopped(struct intel_engine_cs *ring)
@@ -592,10 +603,10 @@  static int init_ring_common(struct intel_engine_cs *ring)
 	if (!drm_core_check_feature(ring->dev, DRIVER_MODESET))
 		i915_kernel_lost_context(ring->dev);
 	else {
+		ringbuf->last_retired_head = -1;
 		ringbuf->head = I915_READ_HEAD(ring);
 		ringbuf->tail = I915_READ_TAIL(ring) & TAIL_ADDR;
-		ringbuf->space = intel_ring_space(ringbuf);
-		ringbuf->last_retired_head = -1;
+		intel_ring_update_space(ringbuf);
 	}
 
 	memset(&ring->hangcheck, 0, sizeof(ring->hangcheck));
@@ -1880,14 +1891,8 @@  static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
 	u32 seqno = 0;
 	int ret;
 
-	if (ringbuf->last_retired_head != -1) {
-		ringbuf->head = ringbuf->last_retired_head;
-		ringbuf->last_retired_head = -1;
-
-		ringbuf->space = intel_ring_space(ringbuf);
-		if (ringbuf->space >= n)
-			return 0;
-	}
+	if (intel_ring_space(ringbuf) >= n)
+		return 0;
 
 	list_for_each_entry(request, &ring->request_list, list) {
 		if (__intel_ring_space(request->tail, ringbuf->tail,
@@ -1905,10 +1910,7 @@  static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
 		return ret;
 
 	i915_gem_retire_requests_ring(ring);
-	ringbuf->head = ringbuf->last_retired_head;
-	ringbuf->last_retired_head = -1;
 
-	ringbuf->space = intel_ring_space(ringbuf);
 	return 0;
 }
 
@@ -1934,14 +1936,14 @@  static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
 	 * case by choosing an insanely large timeout. */
 	end = jiffies + 60 * HZ;
 
+	ret = 0;
 	trace_i915_ring_wait_begin(ring);
 	do {
+		if (intel_ring_space(ringbuf) >= n)
+			break;
 		ringbuf->head = I915_READ_HEAD(ring);
-		ringbuf->space = intel_ring_space(ringbuf);
-		if (ringbuf->space >= n) {
-			ret = 0;
+		if (intel_ring_space(ringbuf) >= n)
 			break;
-		}
 
 		if (!drm_core_check_feature(dev, DRIVER_MODESET) &&
 		    dev->primary->master) {
@@ -1989,7 +1991,7 @@  static int intel_wrap_ring_buffer(struct intel_engine_cs *ring)
 		iowrite32(MI_NOOP, virt++);
 
 	ringbuf->tail = 0;
-	ringbuf->space = intel_ring_space(ringbuf);
+	intel_ring_update_space(ringbuf);
 
 	return 0;
 }
@@ -2061,6 +2063,7 @@  int intel_ring_begin(struct intel_engine_cs *ring,
 		     int num_dwords)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct intel_ringbuffer *ringbuf = ring->buffer;
 	int ret;
 
 	ret = i915_gem_check_wedge(&dev_priv->gpu_error,
@@ -2077,7 +2080,7 @@  int intel_ring_begin(struct intel_engine_cs *ring,
 	if (ret)
 		return ret;
 
-	ring->buffer->space -= num_dwords * sizeof(uint32_t);
+	ringbuf->space -= num_dwords * sizeof(uint32_t);
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index aab2e2f..24a5723 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -404,6 +404,7 @@  static inline void intel_ring_advance(struct intel_engine_cs *ring)
 	ringbuf->tail &= ringbuf->size - 1;
 }
 int __intel_ring_space(int head, int tail, int size);
+void intel_ring_update_space(struct intel_ringbuffer *ringbuf);
 int intel_ring_space(struct intel_ringbuffer *ringbuf);
 bool intel_ring_stopped(struct intel_engine_cs *ring);
 void __intel_ring_advance(struct intel_engine_cs *ring);