Message ID | 20200330091411.37357-1-lionel.g.landwerlin@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/i915/perf: don't read head/tail pointers outside critical section | expand |
Quoting Lionel Landwerlin (2020-03-30 10:14:11) > Reading or writing those fields should only happen under > stream->oa_buffer.ptr_lock. Writing, yes. Reading as a pair, sure. There are other ways you can ensure that the tail/head are read as one, but fair enough. > Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> > Fixes: d1df41eb72ef ("drm/i915/perf: rework aging tail workaround") > --- > drivers/gpu/drm/i915/i915_perf.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c > index c74ebac50015..ec9421f02ebd 100644 > --- a/drivers/gpu/drm/i915/i915_perf.c > +++ b/drivers/gpu/drm/i915/i915_perf.c > @@ -463,6 +463,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream) > u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma); > int report_size = stream->oa_buffer.format_size; > unsigned long flags; > + bool pollin; > u32 hw_tail; > u64 now; > > @@ -532,10 +533,13 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream) > stream->oa_buffer.aging_timestamp = now; > } > > + pollin = OA_TAKEN(stream->oa_buffer.tail - gtt_offset, > + stream->oa_buffer.head - gtt_offset) >= report_size; > + > + Bonus \n Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> > spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags); > > - return OA_TAKEN(stream->oa_buffer.tail - gtt_offset, > - stream->oa_buffer.head - gtt_offset) >= report_size; > + return pollin; You could always leave the calculation here, and just have the read inside. -Chris
On Mon, 30 Mar 2020 03:09:20 -0700, Chris Wilson wrote: > > Quoting Lionel Landwerlin (2020-03-30 10:14:11) > > Reading or writing those fields should only happen under > > stream->oa_buffer.ptr_lock. > > Writing, yes. Reading as a pair, sure. There are other ways you can > ensure that the tail/head are read as one, but fair enough. Sorry but I am trying to understand exactly what the purpose of stream->oa_buffer.ptr_lock is? This is a classic ring buffer producer consumer situation where producer updates tail and consumer updates head. Since both are u32's can't those operations be done without requiring a lock? > > > Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> > > Fixes: d1df41eb72ef ("drm/i915/perf: rework aging tail workaround") > > --- > > drivers/gpu/drm/i915/i915_perf.c | 8 ++++++-- > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c > > index c74ebac50015..ec9421f02ebd 100644 > > --- a/drivers/gpu/drm/i915/i915_perf.c > > +++ b/drivers/gpu/drm/i915/i915_perf.c > > @@ -463,6 +463,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream) > > u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma); > > int report_size = stream->oa_buffer.format_size; > > unsigned long flags; > > + bool pollin; > > u32 hw_tail; > > u64 now; > > > > @@ -532,10 +533,13 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream) > > stream->oa_buffer.aging_timestamp = now; > > } > > > > + pollin = OA_TAKEN(stream->oa_buffer.tail - gtt_offset, > > + stream->oa_buffer.head - gtt_offset) >= report_size; > > + > > + > > Bonus \n > > Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> > > > spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags); > > > > - return OA_TAKEN(stream->oa_buffer.tail - gtt_offset, > > - stream->oa_buffer.head - gtt_offset) >= report_size; > > + return pollin; In what way is the original code incorrect? As I mentioned head is u32 and can be read atomically without requiring a lock? We had deliberately moved this code outside the lock so as to pick up the the latest value of head if it had been updated in the consumer (read).
Quoting Dixit, Ashutosh (2020-03-30 16:55:32) > On Mon, 30 Mar 2020 03:09:20 -0700, Chris Wilson wrote: > > > > Quoting Lionel Landwerlin (2020-03-30 10:14:11) > > > Reading or writing those fields should only happen under > > > stream->oa_buffer.ptr_lock. > > > > Writing, yes. Reading as a pair, sure. There are other ways you can > > ensure that the tail/head are read as one, but fair enough. > > Sorry but I am trying to understand exactly what the purpose of > stream->oa_buffer.ptr_lock is? This is a classic ring buffer producer > consumer situation where producer updates tail and consumer updates > head. Since both are u32's can't those operations be done without requiring > a lock? > > > spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags); > > > > > > - return OA_TAKEN(stream->oa_buffer.tail - gtt_offset, > > > - stream->oa_buffer.head - gtt_offset) >= report_size; > > > + return pollin; > > In what way is the original code incorrect? As I mentioned head is u32 and > can be read atomically without requiring a lock? We had deliberately moved > this code outside the lock so as to pick up the the latest value of head if > it had been updated in the consumer (read). It's the pair of reads here. What's the synchronisation between the read of tail/head with the update? There's no sync between the reads so order is not determined here. So we may see the head updated for an old tail, and so think we have plenty to report, when in fact there's none (or someother convolution). Normal ringbuffer is to sample the head/tail pointers, smp_rmb(), then consume the data between head/tail (with the write doing the smp_wmb() after updating the data and before moving the tail). [So the normal usage of barriers is around access to one of tail/head (the other is under your control) and the shared contents.] -Chris
On Mon, 30 Mar 2020 09:38:23 -0700, Chris Wilson wrote: > > Quoting Dixit, Ashutosh (2020-03-30 16:55:32) > > On Mon, 30 Mar 2020 03:09:20 -0700, Chris Wilson wrote: > > > > > > Quoting Lionel Landwerlin (2020-03-30 10:14:11) > > > > Reading or writing those fields should only happen under > > > > stream->oa_buffer.ptr_lock. > > > > > > Writing, yes. Reading as a pair, sure. There are other ways you can > > > ensure that the tail/head are read as one, but fair enough. > > > > Sorry but I am trying to understand exactly what the purpose of > > stream->oa_buffer.ptr_lock is? This is a classic ring buffer producer > > consumer situation where producer updates tail and consumer updates > > head. Since both are u32's can't those operations be done without requiring > > a lock? > > > > > spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags); > > > > > > > > - return OA_TAKEN(stream->oa_buffer.tail - gtt_offset, > > > > - stream->oa_buffer.head - gtt_offset) >= report_size; > > > > + return pollin; > > > > In what way is the original code incorrect? As I mentioned head is u32 and > > can be read atomically without requiring a lock? We had deliberately moved > > this code outside the lock so as to pick up the the latest value of head if > > it had been updated in the consumer (read). > > It's the pair of reads here. What's the synchronisation between the read > of tail/head with the update? There's no sync between the reads so > order is not determined here. > > So we may see the head updated for an old tail, and so think we have > plenty to report, when in fact there's none (or someother convolution). > > Normal ringbuffer is to sample the head/tail pointers, smp_rmb(), then > consume the data between head/tail (with the write doing the smp_wmb() > after updating the data and before moving the tail). [So the normal > usage of barriers is around access to one of tail/head (the other is > under your control) and the shared contents.] Ok, thanks for explanantion Chris, I think I understand how barriers are used with ring buffers but I am still not sure if the previous code was incorrect. It is almost consumer side code running in the producer thread. Anyway, let's just go with this patch for now: Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index c74ebac50015..ec9421f02ebd 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -463,6 +463,7 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream) u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma); int report_size = stream->oa_buffer.format_size; unsigned long flags; + bool pollin; u32 hw_tail; u64 now; @@ -532,10 +533,13 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream) stream->oa_buffer.aging_timestamp = now; } + pollin = OA_TAKEN(stream->oa_buffer.tail - gtt_offset, + stream->oa_buffer.head - gtt_offset) >= report_size; + + spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags); - return OA_TAKEN(stream->oa_buffer.tail - gtt_offset, - stream->oa_buffer.head - gtt_offset) >= report_size; + return pollin; } /**
Reading or writing those fields should only happen under stream->oa_buffer.ptr_lock. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: d1df41eb72ef ("drm/i915/perf: rework aging tail workaround") --- drivers/gpu/drm/i915/i915_perf.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)