
drm/i915/perf: don't read head/tail pointers outside critical section

Message ID 20200330091411.37357-1-lionel.g.landwerlin@intel.com (mailing list archive)
State New, archived
Series drm/i915/perf: don't read head/tail pointers outside critical section

Commit Message

Lionel Landwerlin March 30, 2020, 9:14 a.m. UTC
Reading or writing those fields should only happen under
stream->oa_buffer.ptr_lock.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: d1df41eb72ef ("drm/i915/perf: rework aging tail workaround")
---
 drivers/gpu/drm/i915/i915_perf.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

Chris Wilson March 30, 2020, 10:09 a.m. UTC | #1
Quoting Lionel Landwerlin (2020-03-30 10:14:11)
> Reading or writing those fields should only happen under
> stream->oa_buffer.ptr_lock.

Writing, yes. Reading as a pair, sure. There are other ways you can
ensure that the tail/head are read as one, but fair enough.
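
One such alternative, as an illustrative sketch only (not the i915 code):
assuming a 64-bit platform where a u64 access is single-copy atomic, and
that writers still serialise among themselves (as they do here, under
ptr_lock), the two u32 offsets can be packed into a single u64 so that
lock-free readers always see a coherent pair:

	struct oa_ptrs {
		u64 packed;	/* tail in bits 63:32, head in bits 31:0 */
	};

	static void oa_ptrs_set(struct oa_ptrs *p, u32 tail, u32 head)
	{
		/* Write side still serialised (e.g. under ptr_lock). */
		WRITE_ONCE(p->packed, (u64)tail << 32 | head);
	}

	static void oa_ptrs_get(const struct oa_ptrs *p, u32 *tail, u32 *head)
	{
		u64 v = READ_ONCE(p->packed);	/* one atomic load, coherent pair */

		*tail = upper_32_bits(v);
		*head = lower_32_bits(v);
	}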

> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Fixes: d1df41eb72ef ("drm/i915/perf: rework aging tail workaround")
> ---
>  drivers/gpu/drm/i915/i915_perf.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index c74ebac50015..ec9421f02ebd 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -463,6 +463,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>         u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
>         int report_size = stream->oa_buffer.format_size;
>         unsigned long flags;
> +       bool pollin;
>         u32 hw_tail;
>         u64 now;
>  
> @@ -532,10 +533,13 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>                 stream->oa_buffer.aging_timestamp = now;
>         }
>  
> +       pollin = OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
> +                         stream->oa_buffer.head - gtt_offset) >= report_size;
> +
> +

Bonus \n

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

>         spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>  
> -       return OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
> -                       stream->oa_buffer.head - gtt_offset) >= report_size;
> +       return pollin;

You could always leave the calculation here, and just have the read
inside.
-Chris
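
What Chris suggests above would look roughly like the following; this is a
sketch against the function as posted, not a tested patch, with tail and
head as new u32 locals:

	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);

	/* ... hw_tail sampling and pointer-aging logic as before ... */

	/* Sample the pair while holding the lock so it is coherent. */
	tail = stream->oa_buffer.tail;
	head = stream->oa_buffer.head;

	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);

	/* The arithmetic itself needs no lock. */
	return OA_TAKEN(tail - gtt_offset, head - gtt_offset) >= report_size;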
Dixit, Ashutosh March 30, 2020, 3:55 p.m. UTC | #2
On Mon, 30 Mar 2020 03:09:20 -0700, Chris Wilson wrote:
>
> Quoting Lionel Landwerlin (2020-03-30 10:14:11)
> > Reading or writing those fields should only happen under
> > stream->oa_buffer.ptr_lock.
>
> Writing, yes. Reading as a pair, sure. There are other ways you can
> ensure that the tail/head are read as one, but fair enough.

Sorry, but I am trying to understand exactly what the purpose of
stream->oa_buffer.ptr_lock is. This is a classic ring buffer
producer/consumer situation where the producer updates the tail and the
consumer updates the head. Since both are u32s, can't those operations be
done without requiring a lock?

>
> > Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> > Fixes: d1df41eb72ef ("drm/i915/perf: rework aging tail workaround")
> > ---
> >  drivers/gpu/drm/i915/i915_perf.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> > index c74ebac50015..ec9421f02ebd 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -463,6 +463,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
> >         u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
> >         int report_size = stream->oa_buffer.format_size;
> >         unsigned long flags;
> > +       bool pollin;
> >         u32 hw_tail;
> >         u64 now;
> >
> > @@ -532,10 +533,13 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
> >                 stream->oa_buffer.aging_timestamp = now;
> >         }
> >
> > +       pollin = OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
> > +                         stream->oa_buffer.head - gtt_offset) >= report_size;
> > +
> > +
>
> Bonus \n
>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
>
> >         spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
> >
> > -       return OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
> > -                       stream->oa_buffer.head - gtt_offset) >= report_size;
> > +       return pollin;

In what way is the original code incorrect? As I mentioned, head is a u32
and can be read atomically without requiring a lock. We had deliberately
moved this code outside the lock so as to pick up the latest value of head
if it had been updated in the consumer (read).
Chris Wilson March 30, 2020, 4:38 p.m. UTC | #3
Quoting Dixit, Ashutosh (2020-03-30 16:55:32)
> On Mon, 30 Mar 2020 03:09:20 -0700, Chris Wilson wrote:
> >
> > Quoting Lionel Landwerlin (2020-03-30 10:14:11)
> > > Reading or writing those fields should only happen under
> > > stream->oa_buffer.ptr_lock.
> >
> > Writing, yes. Reading as a pair, sure. There are other ways you can
> > ensure that the tail/head are read as one, but fair enough.
> 
> Sorry, but I am trying to understand exactly what the purpose of
> stream->oa_buffer.ptr_lock is. This is a classic ring buffer
> producer/consumer situation where the producer updates the tail and the
> consumer updates the head. Since both are u32s, can't those operations be
> done without requiring a lock?

> > >         spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
> > >
> > > -       return OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
> > > -                       stream->oa_buffer.head - gtt_offset) >= report_size;
> > > +       return pollin;
> 
> In what way is the original code incorrect? As I mentioned, head is a u32
> and can be read atomically without requiring a lock. We had deliberately
> moved this code outside the lock so as to pick up the latest value of head
> if it had been updated in the consumer (read).

It's the pair of reads here. What's the synchronisation between the read
of tail/head and the update? There's no sync between the reads, so their
order is not determined here.

So we may see the head updated for an old tail, and so think we have
plenty to report, when in fact there's none (or some other convolution).

Normal ringbuffer is to sample the head/tail pointers, smp_rmb(), then
consume the data between head/tail (with the write doing the smp_wmb()
after updating the data and before moving the tail). [So the normal
usage of barriers is around access to one of tail/head (the other is
under your control) and the shared contents.]
-Chris
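
For reference, the generic pattern described above looks roughly like this;
a sketch of the idiom only, not i915 code, with rb, its power-of-two size
and consume() as assumed names:

	/* producer: publish the record, then move the tail */
	memcpy(&rb->data[tail & (rb->size - 1)], report, report_size);
	smp_wmb();	/* order the data writes before the tail update */
	WRITE_ONCE(rb->tail, tail + report_size);

	/* consumer: sample the tail, then read the records behind it */
	tail = READ_ONCE(rb->tail);
	smp_rmb();	/* order the tail read before the data reads */
	while (head != tail) {
		consume(&rb->data[head & (rb->size - 1)], report_size);
		head += report_size;
	}
	WRITE_ONCE(rb->head, head);	/* free the space for the producer */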
Dixit, Ashutosh March 31, 2020, 1:40 a.m. UTC | #4
On Mon, 30 Mar 2020 09:38:23 -0700, Chris Wilson wrote:
>
> Quoting Dixit, Ashutosh (2020-03-30 16:55:32)
> > On Mon, 30 Mar 2020 03:09:20 -0700, Chris Wilson wrote:
> > >
> > > Quoting Lionel Landwerlin (2020-03-30 10:14:11)
> > > > Reading or writing those fields should only happen under
> > > > stream->oa_buffer.ptr_lock.
> > >
> > > Writing, yes. Reading as a pair, sure. There are other ways you can
> > > ensure that the tail/head are read as one, but fair enough.
> >
> > Sorry, but I am trying to understand exactly what the purpose of
> > stream->oa_buffer.ptr_lock is. This is a classic ring buffer
> > producer/consumer situation where the producer updates the tail and the
> > consumer updates the head. Since both are u32s, can't those operations be
> > done without requiring a lock?
>
> > > >         spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
> > > >
> > > > -       return OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
> > > > -                       stream->oa_buffer.head - gtt_offset) >= report_size;
> > > > +       return pollin;
> >
> > In what way is the original code incorrect? As I mentioned, head is a u32
> > and can be read atomically without requiring a lock. We had deliberately
> > moved this code outside the lock so as to pick up the latest value of head
> > if it had been updated in the consumer (read).
>
> It's the pair of reads here. What's the synchronisation between the read
> of tail/head and the update? There's no sync between the reads, so their
> order is not determined here.
>
> So we may see the head updated for an old tail, and so think we have
> plenty to report, when in fact there's none (or some other convolution).
>
> Normal ringbuffer is to sample the head/tail pointers, smp_rmb(), then
> consume the data between head/tail (with the write doing the smp_wmb()
> after updating the data and before moving the tail). [So the normal
> usage of barriers is around access to one of tail/head (the other is
> under your control) and the shared contents.]

Ok, thanks for the explanation, Chris. I think I understand how barriers
are used with ring buffers, but I am still not sure the previous code was
incorrect. It is almost consumer-side code running in the producer
thread. Anyway, let's just go with this patch for now:

Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

Patch

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index c74ebac50015..ec9421f02ebd 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -463,6 +463,7 @@  static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 	u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
 	int report_size = stream->oa_buffer.format_size;
 	unsigned long flags;
+	bool pollin;
 	u32 hw_tail;
 	u64 now;
 
@@ -532,10 +533,13 @@  static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 		stream->oa_buffer.aging_timestamp = now;
 	}
 
+	pollin = OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
+			  stream->oa_buffer.head - gtt_offset) >= report_size;
+
+
 	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
 
-	return OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
-			stream->oa_buffer.head - gtt_offset) >= report_size;
+	return pollin;
 }
 
 /**