
[1/3] drm/i915/perf: Reduce cpu overhead for blocking perf OA reads

Message ID 20200413154822.11620-2-umesh.nerlige.ramappa@intel.com (mailing list archive)
State New, archived
Series drm/i915/perf: add OA interrupt support

Commit Message

Umesh Nerlige Ramappa April 13, 2020, 3:48 p.m. UTC
The condition passed to wait_event_interruptible is checked twice before
the task actually waits on the event. These checks are redundant when the
hrtimer events already call oa_buffer_check_unlocked to update the
oa_buffer tail pointers, and the redundant checks add CPU overhead.
Simplify the check to reduce CPU overhead when using blocking I/O to read
OA buffer reports.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)
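
For context on the "checked twice" wording above: wait_event_interruptible()
evaluates its condition once before deciding to sleep and again inside its
wait loop before each schedule(). The sketch below is a simplified paraphrase
of that control flow, not the verbatim macro from include/linux/wait.h, with
the condition from this patch's call site inlined:

/*
 * Simplified paraphrase of
 *   wait_event_interruptible(stream->poll_wq, oa_buffer_check_unlocked(stream))
 * (illustration only). The condition is evaluated twice before the task
 * first sleeps and once more per wakeup, and every evaluation takes the
 * OA buffer spinlock.
 */
static int wait_for_oa_reports_sketch(struct i915_perf_stream *stream)
{
	DEFINE_WAIT(wait);
	int ret = 0;

	if (oa_buffer_check_unlocked(stream))		/* evaluation #1 */
		return 0;

	for (;;) {
		prepare_to_wait(&stream->poll_wq, &wait, TASK_INTERRUPTIBLE);
		if (oa_buffer_check_unlocked(stream))	/* evaluation #2 */
			break;
		if (signal_pending(current)) {
			ret = -ERESTARTSYS;
			break;
		}
		schedule();
	}
	finish_wait(&stream->poll_wq, &wait);

	return ret;
}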

Comments

Dixit, Ashutosh April 14, 2020, 5:08 a.m. UTC | #1
On Mon, 13 Apr 2020 08:48:20 -0700, Umesh Nerlige Ramappa wrote:
>
> A condition in wait_event_interruptible seems to be checked twice before
> waiting on the event to occur. These checks are redundant when hrtimer
> events will call oa_buffer_check_unlocked to update the oa_buffer tail
> pointers. The redundant checks add cpu overhead. Simplify the check
> to reduce cpu overhead when using blocking io to read oa buffer reports.
>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_perf.c | 28 +++++++++++++++++++++++++++-
>  1 file changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 5cde3e4e7be6..e28a3ab83fde 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -541,6 +541,32 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>	return pollin;
>  }
>
> +/**
> + * oa_buffer_check_reports - quick check if reports are available
> + * @stream: i915 stream instance
> + *
> + * The return from this function is used as a condition for
> + * wait_event_interruptible in blocking read. This is used to detect
> + * available reports.
> + *
> + * A condition in wait_event_interruptible seems to be checked twice before
> + * waiting on an event to occur. These checks are redundant when hrtimer events
> + * will call oa_buffer_check_unlocked to update the oa_buffer tail pointers. The
> + * redundant checks add cpu overhead. We simplify the check to reduce cpu
> + * overhead.
> + */
> +static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
> +{
> +	unsigned long flags;
> +	bool available;
> +
> +	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
> +	available = stream->oa_buffer.tail != stream->oa_buffer.head;
> +	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
> +
> +	return available;
> +}
> +
>  /**
>   * append_oa_status - Appends a status record to a userspace read() buffer.
>   * @stream: An i915-perf stream opened for OA metrics
> @@ -1150,7 +1176,7 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
>		return -EIO;
>
>	return wait_event_interruptible(stream->poll_wq,
> -					oa_buffer_check_unlocked(stream));
> +					oa_buffer_check_reports(stream));
>  }

I think the problem with this patch is that the original code had the
property that the condition for data availability is checked (and the OA
tail advanced) /before/ entering the wait. So the tail was advanced and if
there was data there was no need to wait at all. This change has lost that
property. With this change we will always enter the wait first and then get
unblocked after the timer interrupt, which will advance the tail and wake us
up.

I think if we want to do this, it is possible to do so without losing the
original property. Just call oa_buffer_check_unlocked() first (outside
wait_event) and if there is data just return. If not, put yourself on
stream->poll_wq, from which the timer callback will wake us up. I think this
is going to be something like the prepare_to_wait() / schedule() /
finish_wait() pattern, of which there are numerous examples in the
kernel. So in this case we won't call wait_event_interruptible() which
checks the condition upon waking up. No need to define
oa_buffer_check_reports() either.
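
A minimal sketch of what that alternative could look like (illustration
only, not code from this series; the function name is made up):

static int i915_oa_wait_on_timer_sketch(struct i915_perf_stream *stream)
{
	DEFINE_WAIT(wait);
	int ret = 0;

	/* Check for data (and advance the OA tail) before deciding to sleep. */
	if (oa_buffer_check_unlocked(stream))
		return 0;

	/*
	 * No data yet: park on poll_wq and let the periodic hrtimer callback
	 * wake us once it has advanced the tail. The timer fires periodically,
	 * so a wakeup is guaranteed to arrive.
	 */
	prepare_to_wait(&stream->poll_wq, &wait, TASK_INTERRUPTIBLE);
	if (signal_pending(current))
		ret = -ERESTARTSYS;
	else
		schedule();
	finish_wait(&stream->poll_wq, &wait);

	return ret;
}

On return the caller, i915_oa_wait_unlocked(), would fall back into the
normal read path, which checks for available reports anyway.
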
Dixit, Ashutosh April 14, 2020, 6:58 a.m. UTC | #2
On Mon, 13 Apr 2020 22:08:47 -0700, Dixit, Ashutosh wrote:
>
> On Mon, 13 Apr 2020 08:48:20 -0700, Umesh Nerlige Ramappa wrote:
> >
> > A condition in wait_event_interruptible seems to be checked twice before
> > waiting on the event to occur. These checks are redundant when hrtimer
> > events will call oa_buffer_check_unlocked to update the oa_buffer tail
> > pointers. The redundant checks add cpu overhead. Simplify the check
> > to reduce cpu overhead when using blocking io to read oa buffer reports.
> >
> > Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_perf.c | 28 +++++++++++++++++++++++++++-
> >  1 file changed, 27 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> > index 5cde3e4e7be6..e28a3ab83fde 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -541,6 +541,32 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
> >	return pollin;
> >  }
> >
> > +/**
> > + * oa_buffer_check_reports - quick check if reports are available
> > + * @stream: i915 stream instance
> > + *
> > + * The return from this function is used as a condition for
> > + * wait_event_interruptible in blocking read. This is used to detect
> > + * available reports.
> > + *
> > + * A condition in wait_event_interruptible seems to be checked twice before
> > + * waiting on an event to occur. These checks are redundant when hrtimer events
> > + * will call oa_buffer_check_unlocked to update the oa_buffer tail pointers. The
> > + * redundant checks add cpu overhead. We simplify the check to reduce cpu
> > + * overhead.
> > + */
> > +static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
> > +{
> > +	unsigned long flags;
> > +	bool available;
> > +
> > +	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
> > +	available = stream->oa_buffer.tail != stream->oa_buffer.head;
> > +	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
> > +
> > +	return available;
> > +}
> > +
> >  /**
> >   * append_oa_status - Appends a status record to a userspace read() buffer.
> >   * @stream: An i915-perf stream opened for OA metrics
> > @@ -1150,7 +1176,7 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
> >		return -EIO;
> >
> >	return wait_event_interruptible(stream->poll_wq,
> > -					oa_buffer_check_unlocked(stream));
> > +					oa_buffer_check_reports(stream));
> >  }
>
> I think the problem with this patch is that the original code had the
> property that the condition for data availability is checked (and the OA
> tail advanced) /before/ entering the wait. So the tail was advanced and if
> there was data there was no need to wait at all. This change has lost that
> property. With this change we will first always enter the wait and then get
> unblocked after the timer interrupt which will advance the tail and wake us
> up.
>
> I think if we want to do this, it is possible to do without losing the
> original property. Just call oa_buffer_check_unlocked() first (outside
> wait_event) and if there is data just return. If not, put yourself on
> stream->poll_wq from which the timer callback will wake us up. I think this
> is going to be something like prepare_to_wait() / schedule() /
> finish_wait() pattern of which there are numerous examples in the
> kernel. So in this case we won't call wait_event_interruptible() which
> checks the condition upon waking up. No need to define
> oa_buffer_check_reports() either.

If on the other hand we say that this should actually be the desired
behavior of the blocking read, that it should not unblock immediately but
only after the next timer interrupt (so that an app can call the blocking
read repeatedly without worrying that it will return a little bit of data
on each call at a considerable CPU cost), then we may be able to do
something like this, but even then we have to think about what the correct
way to do that would be. This patch may already be that correct way and we
may just need to change the commit message, but is that what we want?
Umesh Nerlige Ramappa April 14, 2020, 4:59 p.m. UTC | #3
On Mon, Apr 13, 2020 at 11:58:18PM -0700, Dixit, Ashutosh wrote:
>On Mon, 13 Apr 2020 22:08:47 -0700, Dixit, Ashutosh wrote:
>>
>> On Mon, 13 Apr 2020 08:48:20 -0700, Umesh Nerlige Ramappa wrote:
>> >
>> > A condition in wait_event_interruptible seems to be checked twice before
>> > waiting on the event to occur. These checks are redundant when hrtimer
>> > events will call oa_buffer_check_unlocked to update the oa_buffer tail
>> > pointers. The redundant checks add cpu overhead. Simplify the check
>> > to reduce cpu overhead when using blocking io to read oa buffer reports.
>> >
>> > Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> > ---
>> >  drivers/gpu/drm/i915/i915_perf.c | 28 +++++++++++++++++++++++++++-
>> >  1 file changed, 27 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> > index 5cde3e4e7be6..e28a3ab83fde 100644
>> > --- a/drivers/gpu/drm/i915/i915_perf.c
>> > +++ b/drivers/gpu/drm/i915/i915_perf.c
>> > @@ -541,6 +541,32 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>> >	return pollin;
>> >  }
>> >
>> > +/**
>> > + * oa_buffer_check_reports - quick check if reports are available
>> > + * @stream: i915 stream instance
>> > + *
>> > + * The return from this function is used as a condition for
>> > + * wait_event_interruptible in blocking read. This is used to detect
>> > + * available reports.
>> > + *
>> > + * A condition in wait_event_interruptible seems to be checked twice before
>> > + * waiting on an event to occur. These checks are redundant when hrtimer events
>> > + * will call oa_buffer_check_unlocked to update the oa_buffer tail pointers. The
>> > + * redundant checks add cpu overhead. We simplify the check to reduce cpu
>> > + * overhead.
>> > + */
>> > +static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
>> > +{
>> > +	unsigned long flags;
>> > +	bool available;
>> > +
>> > +	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>> > +	available = stream->oa_buffer.tail != stream->oa_buffer.head;
>> > +	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>> > +
>> > +	return available;
>> > +}
>> > +
>> >  /**
>> >   * append_oa_status - Appends a status record to a userspace read() buffer.
>> >   * @stream: An i915-perf stream opened for OA metrics
>> > @@ -1150,7 +1176,7 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
>> >		return -EIO;
>> >
>> >	return wait_event_interruptible(stream->poll_wq,
>> > -					oa_buffer_check_unlocked(stream));
>> > +					oa_buffer_check_reports(stream));
>> >  }
>>
>> I think the problem with this patch is that the original code had the
>> property that the condition for data availability is checked (and the OA
>> tail advanced) /before/ entering the wait. So the tail was advanced and if
>> there was data there was no need to wait at all. This change has lost that
>> property. With this change we will first always enter the wait and then get
>> unblocked after the timer interrupt which will advance the tail and wake us
>> up.
>>
>> I think if we want to do this, it is possible to do without losing the
>> original property. Just call oa_buffer_check_unlocked() first (outside
>> wait_event) and if there is data just return. If not, put yourself on
>> stream->poll_wq from which the timer callback will wake us up. I think this
>> is going to be something like prepare_to_wait() / schedule() /
>> finish_wait() pattern of which there are numerous examples in the
>> kernel. So in this case we won't call wait_event_interruptible() which
>> checks the condition upon waking up. No need to define
>> oa_buffer_check_reports() either.
>
>If on the other hand we say that this should actually be the desired
>behavior of the blocking read, that it should not unblock immediately but
>only after the next timer interrupt (so that an app can call the blocking
>read repeatedly without worrying about it will return a little bit of data
>in each call at a considerable amount of CPU load), then we may be able to
>do something like this, but even then we may have to think about what would
>be the correct way to do that. Though this may be that correct way and we
>may just need to change the commit message, but is that what we want?

I am not quite clear on how the blocking read should behave in terms of 
the API itself.

The change here is solely to reduce the CPU overhead of the two additional 
calls to oa_buffer_check_unlocked before blocking, which happen on every 
call to read. I agree that the read would block if the timer has not 
updated the tail yet, but that makes sense in that we don't want read to 
constantly return data when the OA sampling rate is high (say 100 us). 
With this change the behavior becomes consistent irrespective of the OA 
sampling rate, and the user has the flexibility to set POLL_OA_PERIOD to 
whatever value suits the chosen OA sampling rate.

Thanks,
Umesh
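
For illustration, the consumer loop being discussed could look like the
sketch below (not code from this thread; it assumes stream_fd came from
DRM_IOCTL_I915_PERF_OPEN without I915_PERF_FLAG_FD_NONBLOCK, so read()
blocks until the hrtimer has advanced the OA tail):

#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Blocking read loop on an already-opened i915-perf stream fd. The loop
 * wakes up at most once per poll period instead of spinning on small
 * amounts of data.
 */
static void drain_oa_reports(int stream_fd)
{
	char buf[16 * 1024];

	for (;;) {
		ssize_t n = read(stream_fd, buf, sizeof(buf));

		if (n < 0) {
			if (errno == EINTR)
				continue;
			perror("i915-perf read");
			break;
		}

		/* Hand the raw OA records to application-specific parsing. */
		printf("read %zd bytes of OA reports\n", n);
	}
}
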
Dixit, Ashutosh April 14, 2020, 7:09 p.m. UTC | #4
On Tue, 14 Apr 2020 09:59:48 -0700, Umesh Nerlige Ramappa wrote:
> On Mon, Apr 13, 2020 at 11:58:18PM -0700, Dixit, Ashutosh wrote:
> > On Mon, 13 Apr 2020 22:08:47 -0700, Dixit, Ashutosh wrote:
> >>
> >> On Mon, 13 Apr 2020 08:48:20 -0700, Umesh Nerlige Ramappa wrote:
> >> >
> >> > A condition in wait_event_interruptible seems to be checked twice before
> >> > waiting on the event to occur. These checks are redundant when hrtimer
> >> > events will call oa_buffer_check_unlocked to update the oa_buffer tail
> >> > pointers. The redundant checks add cpu overhead. Simplify the check
> >> > to reduce cpu overhead when using blocking io to read oa buffer reports.
> >> >
> >> > Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> >> > ---
> >> >  drivers/gpu/drm/i915/i915_perf.c | 28 +++++++++++++++++++++++++++-
> >> >  1 file changed, 27 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> >> > index 5cde3e4e7be6..e28a3ab83fde 100644
> >> > --- a/drivers/gpu/drm/i915/i915_perf.c
> >> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> >> > @@ -541,6 +541,32 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
> >> >	return pollin;
> >> >  }
> >> >
> >> > +/**
> >> > + * oa_buffer_check_reports - quick check if reports are available
> >> > + * @stream: i915 stream instance
> >> > + *
> >> > + * The return from this function is used as a condition for
> >> > + * wait_event_interruptible in blocking read. This is used to detect
> >> > + * available reports.
> >> > + *
> >> > + * A condition in wait_event_interruptible seems to be checked twice before
> >> > + * waiting on an event to occur. These checks are redundant when hrtimer events
> >> > + * will call oa_buffer_check_unlocked to update the oa_buffer tail pointers. The
> >> > + * redundant checks add cpu overhead. We simplify the check to reduce cpu
> >> > + * overhead.
> >> > + */
> >> > +static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
> >> > +{
> >> > +	unsigned long flags;
> >> > +	bool available;
> >> > +
> >> > +	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
> >> > +	available = stream->oa_buffer.tail != stream->oa_buffer.head;
> >> > +	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
> >> > +
> >> > +	return available;
> >> > +}
> >> > +
> >> >  /**
> >> >   * append_oa_status - Appends a status record to a userspace read() buffer.
> >> >   * @stream: An i915-perf stream opened for OA metrics
> >> > @@ -1150,7 +1176,7 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
> >> >		return -EIO;
> >> >
> >> >	return wait_event_interruptible(stream->poll_wq,
> >> > -					oa_buffer_check_unlocked(stream));
> >> > +					oa_buffer_check_reports(stream));
> >> >  }
> >>
> >> I think the problem with this patch is that the original code had the
> >> property that the condition for data availability is checked (and the OA
> >> tail advanced) /before/ entering the wait. So the tail was advanced and if
> >> there was data there was no need to wait at all. This change has lost that
> >> property. With this change we will first always enter the wait and then get
> >> unblocked after the timer interrupt which will advance the tail and wake us
> >> up.
> >>
> >> I think if we want to do this, it is possible to do without losing the
> >> original property. Just call oa_buffer_check_unlocked() first (outside
> >> wait_event) and if there is data just return. If not, put yourself on
> >> stream->poll_wq from which the timer callback will wake us up. I think this
> >> is going to be something like prepare_to_wait() / schedule() /
> >> finish_wait() pattern of which there are numerous examples in the
> >> kernel. So in this case we won't call wait_event_interruptible() which
> >> checks the condition upon waking up. No need to define
> >> oa_buffer_check_reports() either.
> >
> > If on the other hand we say that this should actually be the desired
> > behavior of the blocking read, that it should not unblock immediately but
> > only after the next timer interrupt (so that an app can call the blocking
> > read repeatedly without worrying about it will return a little bit of data
> > in each call at a considerable amount of CPU load), then we may be able to
> > do something like this, but even then we may have to think about what would
> > be the correct way to do that. Though this may be that correct way and we
> > may just need to change the commit message, but is that what we want?
>
> I am not quite clear on how the blocking read should behave in terms of the
> API itself.
>
> The change here is solely to reduce cpu overhead from the additional 2
> calls to oa_buffer_check_unlocked before blocking and that would happen on
> every call to the read. I agree that the read would block if the timer did
> not update the tail yet, but that makes sense in a way that we don't want
> read to constantly return data when the OA sampling rates are high (say 100
> us). With this change the behavior becomes consistent irrespective of the
> OA sampling rate and user has the flexibility to set the POLL_OA_PERIOD to
> whatever value is suitable for the OA sampling rate chosen.

I think the commit message should clearly state the reason for the patch.
As far as I can see, if the purpose of the patch is to reduce CPU overhead,
the patch should not change API behavior. If, on the other hand, the
purpose is to make the API behavior consistent, then please resubmit after
modifying the commit message to reflect that purpose, so that the patch can
be reviewed from that point of view.

Also I think we should be able to just use stream->pollin instead of
defining the new oa_buffer_check_reports(). E.g. isn't
stream->pollin == (stream->oa_buffer.tail != stream->oa_buffer.head) ?

Thanks!
--
Ashutosh
Dixit, Ashutosh April 14, 2020, 7:37 p.m. UTC | #5
On Tue, 14 Apr 2020 12:09:42 -0700, Dixit, Ashutosh wrote:
>
> On Tue, 14 Apr 2020 09:59:48 -0700, Umesh Nerlige Ramappa wrote:
> > On Mon, Apr 13, 2020 at 11:58:18PM -0700, Dixit, Ashutosh wrote:
> > > On Mon, 13 Apr 2020 22:08:47 -0700, Dixit, Ashutosh wrote:
> > >>
> > >> On Mon, 13 Apr 2020 08:48:20 -0700, Umesh Nerlige Ramappa wrote:
> > >> >
> > >> > A condition in wait_event_interruptible seems to be checked twice before
> > >> > waiting on the event to occur. These checks are redundant when hrtimer
> > >> > events will call oa_buffer_check_unlocked to update the oa_buffer tail
> > >> > pointers. The redundant checks add cpu overhead. Simplify the check
> > >> > to reduce cpu overhead when using blocking io to read oa buffer reports.
> > >> >
> > >> > Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> > >> > ---
> > >> >  drivers/gpu/drm/i915/i915_perf.c | 28 +++++++++++++++++++++++++++-
> > >> >  1 file changed, 27 insertions(+), 1 deletion(-)
> > >> >
> > >> > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> > >> > index 5cde3e4e7be6..e28a3ab83fde 100644
> > >> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > >> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > >> > @@ -541,6 +541,32 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
> > >> >	return pollin;
> > >> >  }
> > >> >
> > >> > +/**
> > >> > + * oa_buffer_check_reports - quick check if reports are available
> > >> > + * @stream: i915 stream instance
> > >> > + *
> > >> > + * The return from this function is used as a condition for
> > >> > + * wait_event_interruptible in blocking read. This is used to detect
> > >> > + * available reports.
> > >> > + *
> > >> > + * A condition in wait_event_interruptible seems to be checked twice before
> > >> > + * waiting on an event to occur. These checks are redundant when hrtimer events
> > >> > + * will call oa_buffer_check_unlocked to update the oa_buffer tail pointers. The
> > >> > + * redundant checks add cpu overhead. We simplify the check to reduce cpu
> > >> > + * overhead.
> > >> > + */
> > >> > +static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
> > >> > +{
> > >> > +	unsigned long flags;
> > >> > +	bool available;
> > >> > +
> > >> > +	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
> > >> > +	available = stream->oa_buffer.tail != stream->oa_buffer.head;
> > >> > +	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
> > >> > +
> > >> > +	return available;
> > >> > +}
> > >> > +
> > >> >  /**
> > >> >   * append_oa_status - Appends a status record to a userspace read() buffer.
> > >> >   * @stream: An i915-perf stream opened for OA metrics
> > >> > @@ -1150,7 +1176,7 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
> > >> >		return -EIO;
> > >> >
> > >> >	return wait_event_interruptible(stream->poll_wq,
> > >> > -					oa_buffer_check_unlocked(stream));
> > >> > +					oa_buffer_check_reports(stream));
> > >> >  }
> > >>
> > >> I think the problem with this patch is that the original code had the
> > >> property that the condition for data availability is checked (and the OA
> > >> tail advanced) /before/ entering the wait. So the tail was advanced and if
> > >> there was data there was no need to wait at all. This change has lost that
> > >> property. With this change we will first always enter the wait and then get
> > >> unblocked after the timer interrupt which will advance the tail and wake us
> > >> up.
> > >>
> > >> I think if we want to do this, it is possible to do without losing the
> > >> original property. Just call oa_buffer_check_unlocked() first (outside
> > >> wait_event) and if there is data just return. If not, put yourself on
> > >> stream->poll_wq from which the timer callback will wake us up. I think this
> > >> is going to be something like prepare_to_wait() / schedule() /
> > >> finish_wait() pattern of which there are numerous examples in the
> > >> kernel. So in this case we won't call wait_event_interruptible() which
> > >> checks the condition upon waking up. No need to define
> > >> oa_buffer_check_reports() either.
> > >
> > > If on the other hand we say that this should actually be the desired
> > > behavior of the blocking read, that it should not unblock immediately but
> > > only after the next timer interrupt (so that an app can call the blocking
> > > read repeatedly without worrying about it will return a little bit of data
> > > in each call at a considerable amount of CPU load), then we may be able to
> > > do something like this, but even then we may have to think about what would
> > > be the correct way to do that. Though this may be that correct way and we
> > > may just need to change the commit message, but is that what we want?
> >
> > I am not quite clear on how the blocking read should behave in terms of the
> > API itself.
> >
> > The change here is solely to reduce cpu overhead from the additional 2
> > calls to oa_buffer_check_unlocked before blocking and that would happen on
> > every call to the read. I agree that the read would block if the timer did
> > not update the tail yet, but that makes sense in a way that we don't want
> > read to constantly return data when the OA sampling rates are high (say 100
> > us). With this change the behavior becomes consistent irrespective of the
> > OA sampling rate and user has the flexibility to set the POLL_OA_PERIOD to
> > whatever value is suitable for the OA sampling rate chosen.
>
> I think the patch commit message should clearly state the reason for the
> patch. Afais if the purpose of the patch is to reduce cpu overhead, the
> patch should not change API behavior. If on the other hand the purpose of
> the patch is to make the API behavior consistent, then please resubmit the
> patch after modifying the commit message to reflect the purpose of the
> patch so that the patch can be reviewed from that point of view.
>
> Also I think we should be able to just use stream->pollin instead of
> defining the new oa_buffer_check_reports(). E.g. isn't
> stream->pollin == (stream->oa_buffer.tail != stream->oa_buffer.head) ?

Actually yes, using stream->pollin will make the wait for both blocking and
non-blocking reads identical. So the blocking wait just becomes:

static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
{
	...

        return wait_event_interruptible(stream->poll_wq, stream->pollin);
}

That is, if we re-purpose the patch to make the API behavior consistent
across blocking and non-blocking reads.
Lionel Landwerlin April 15, 2020, 10 a.m. UTC | #6
On 13/04/2020 18:48, Umesh Nerlige Ramappa wrote:
> A condition in wait_event_interruptible seems to be checked twice before
> waiting on the event to occur. These checks are redundant when hrtimer
> events will call oa_buffer_check_unlocked to update the oa_buffer tail
> pointers. The redundant checks add cpu overhead. Simplify the check
> to reduce cpu overhead when using blocking io to read oa buffer reports.
>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>


I cherry-picked this patch alone and it seems to break the 
disabled-read-error test.


> ---
>   drivers/gpu/drm/i915/i915_perf.c | 28 +++++++++++++++++++++++++++-
>   1 file changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 5cde3e4e7be6..e28a3ab83fde 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -541,6 +541,32 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>   	return pollin;
>   }
>   
> +/**
> + * oa_buffer_check_reports - quick check if reports are available
> + * @stream: i915 stream instance
> + *
> + * The return from this function is used as a condition for
> + * wait_event_interruptible in blocking read. This is used to detect
> + * available reports.
> + *
> + * A condition in wait_event_interruptible seems to be checked twice before
> + * waiting on an event to occur. These checks are redundant when hrtimer events
> + * will call oa_buffer_check_unlocked to update the oa_buffer tail pointers. The
> + * redundant checks add cpu overhead. We simplify the check to reduce cpu
> + * overhead.
> + */
> +static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
> +{
> +	unsigned long flags;
> +	bool available;
> +
> +	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
> +	available = stream->oa_buffer.tail != stream->oa_buffer.head;
> +	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
> +
> +	return available;
> +}
> +
>   /**
>    * append_oa_status - Appends a status record to a userspace read() buffer.
>    * @stream: An i915-perf stream opened for OA metrics
> @@ -1150,7 +1176,7 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
>   		return -EIO;
>   
>   	return wait_event_interruptible(stream->poll_wq,
> -					oa_buffer_check_unlocked(stream));
> +					oa_buffer_check_reports(stream));
>   }
>   
>   /**
Umesh Nerlige Ramappa April 15, 2020, 6:55 p.m. UTC | #7
On Wed, Apr 15, 2020 at 01:00:30PM +0300, Lionel Landwerlin wrote:
>On 13/04/2020 18:48, Umesh Nerlige Ramappa wrote:
>>A condition in wait_event_interruptible seems to be checked twice before
>>waiting on the event to occur. These checks are redundant when hrtimer
>>events will call oa_buffer_check_unlocked to update the oa_buffer tail
>>pointers. The redundant checks add cpu overhead. Simplify the check
>>to reduce cpu overhead when using blocking io to read oa buffer reports.
>>
>>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>
>
>I cherry picked this patch alone and it seems to break the 
>disabled-read-error test.

Strange. I don't see it fail on my CFL. I applied this on the latest 
drm-tip from yesterday.

The patch still checks whether reports are available before blocking. The 
behavior should still be the same w.r.t. this test.

What machine did you run it on? I will try on the same. Any chance you 
have the debug output from the test?

Thanks,
Umesh

>
>
>>---
>>  drivers/gpu/drm/i915/i915_perf.c | 28 +++++++++++++++++++++++++++-
>>  1 file changed, 27 insertions(+), 1 deletion(-)
>>
>>diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>>index 5cde3e4e7be6..e28a3ab83fde 100644
>>--- a/drivers/gpu/drm/i915/i915_perf.c
>>+++ b/drivers/gpu/drm/i915/i915_perf.c
>>@@ -541,6 +541,32 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
>>  	return pollin;
>>  }
>>+/**
>>+ * oa_buffer_check_reports - quick check if reports are available
>>+ * @stream: i915 stream instance
>>+ *
>>+ * The return from this function is used as a condition for
>>+ * wait_event_interruptible in blocking read. This is used to detect
>>+ * available reports.
>>+ *
>>+ * A condition in wait_event_interruptible seems to be checked twice before
>>+ * waiting on an event to occur. These checks are redundant when hrtimer events
>>+ * will call oa_buffer_check_unlocked to update the oa_buffer tail pointers. The
>>+ * redundant checks add cpu overhead. We simplify the check to reduce cpu
>>+ * overhead.
>>+ */
>>+static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
>>+{
>>+	unsigned long flags;
>>+	bool available;
>>+
>>+	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>>+	available = stream->oa_buffer.tail != stream->oa_buffer.head;
>>+	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>>+
>>+	return available;
>>+}
>>+
>>  /**
>>   * append_oa_status - Appends a status record to a userspace read() buffer.
>>   * @stream: An i915-perf stream opened for OA metrics
>>@@ -1150,7 +1176,7 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
>>  		return -EIO;
>>  	return wait_event_interruptible(stream->poll_wq,
>>-					oa_buffer_check_unlocked(stream));
>>+					oa_buffer_check_reports(stream));
>>  }
>>  /**
>
>
Lionel Landwerlin April 15, 2020, 7:16 p.m. UTC | #8
On 15/04/2020 21:55, Umesh Nerlige Ramappa wrote:
> On Wed, Apr 15, 2020 at 01:00:30PM +0300, Lionel Landwerlin wrote:
>> On 13/04/2020 18:48, Umesh Nerlige Ramappa wrote:
>>> A condition in wait_event_interruptible seems to be checked twice 
>>> before
>>> waiting on the event to occur. These checks are redundant when hrtimer
>>> events will call oa_buffer_check_unlocked to update the oa_buffer tail
>>> pointers. The redundant checks add cpu overhead. Simplify the check
>>> to reduce cpu overhead when using blocking io to read oa buffer 
>>> reports.
>>>
>>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>
>>
>> I cherry picked this patch alone and it seems to break the 
>> disabled-read-error test.
>
> Strange. I don't see it fail on my CFL. I am apply this on the latest 
> drm-tip from yesterday.
>
> The patch still checks if reports are available before blocking. The 
> behavior should still be the same w.r.t this test.
>
> What machine did you run it on? I will try on the same. Any chance you 
> have the debug output from the test?
>
> Thanks,
> Umesh
>

I got that on SKL GT4 : http://paste.debian.net/1140604/


Thanks,


-Lionel


>>
>>
>>> ---
>>>  drivers/gpu/drm/i915/i915_perf.c | 28 +++++++++++++++++++++++++++-
>>>  1 file changed, 27 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>>> b/drivers/gpu/drm/i915/i915_perf.c
>>> index 5cde3e4e7be6..e28a3ab83fde 100644
>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>> @@ -541,6 +541,32 @@ static bool oa_buffer_check_unlocked(struct 
>>> i915_perf_stream *stream)
>>>      return pollin;
>>>  }
>>> +/**
>>> + * oa_buffer_check_reports - quick check if reports are available
>>> + * @stream: i915 stream instance
>>> + *
>>> + * The return from this function is used as a condition for
>>> + * wait_event_interruptible in blocking read. This is used to detect
>>> + * available reports.
>>> + *
>>> + * A condition in wait_event_interruptible seems to be checked 
>>> twice before
>>> + * waiting on an event to occur. These checks are redundant when 
>>> hrtimer events
>>> + * will call oa_buffer_check_unlocked to update the oa_buffer tail 
>>> pointers. The
>>> + * redundant checks add cpu overhead. We simplify the check to 
>>> reduce cpu
>>> + * overhead.
>>> + */
>>> +static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
>>> +{
>>> +    unsigned long flags;
>>> +    bool available;
>>> +
>>> +    spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>>> +    available = stream->oa_buffer.tail != stream->oa_buffer.head;
>>> + spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>>> +
>>> +    return available;
>>> +}
>>> +
>>>  /**
>>>   * append_oa_status - Appends a status record to a userspace read() 
>>> buffer.
>>>   * @stream: An i915-perf stream opened for OA metrics
>>> @@ -1150,7 +1176,7 @@ static int i915_oa_wait_unlocked(struct 
>>> i915_perf_stream *stream)
>>>          return -EIO;
>>>      return wait_event_interruptible(stream->poll_wq,
>>> -                    oa_buffer_check_unlocked(stream));
>>> +                    oa_buffer_check_reports(stream));
>>>  }
>>>  /**
>>
>>
Umesh Nerlige Ramappa April 15, 2020, 10:47 p.m. UTC | #9
On Wed, Apr 15, 2020 at 10:16:59PM +0300, Lionel Landwerlin wrote:
>On 15/04/2020 21:55, Umesh Nerlige Ramappa wrote:
>>On Wed, Apr 15, 2020 at 01:00:30PM +0300, Lionel Landwerlin wrote:
>>>On 13/04/2020 18:48, Umesh Nerlige Ramappa wrote:
>>>>A condition in wait_event_interruptible seems to be checked 
>>>>twice before
>>>>waiting on the event to occur. These checks are redundant when hrtimer
>>>>events will call oa_buffer_check_unlocked to update the oa_buffer tail
>>>>pointers. The redundant checks add cpu overhead. Simplify the check
>>>>to reduce cpu overhead when using blocking io to read oa buffer 
>>>>reports.
>>>>
>>>>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>>
>>>
>>>I cherry picked this patch alone and it seems to break the 
>>>disabled-read-error test.
>>
>>Strange. I don't see it fail on my CFL. I am apply this on the 
>>latest drm-tip from yesterday.
>>
>>The patch still checks if reports are available before blocking. The 
>>behavior should still be the same w.r.t this test.
>>
>>What machine did you run it on? I will try on the same. Any chance 
>>you have the debug output from the test?
>>
>>Thanks,
>>Umesh
>>
>
>I got that on SKL GT4 : http://paste.debian.net/1140604/
>

It always fails on SKL GT4. Thanks for catching this :)

It also fails without this patch if the test uses non-blocking I/O.

This has to do with being able to read reports at sampling periods as 
short as 5 us (on some platforms, I guess).

Will look into it further and repost the series. Let me know if you have 
any other thoughts on this.

Thanks,
Umesh

>
>Thanks,
>
>
>-Lionel
>
>
>>>
>>>
>>>>---
>>>> drivers/gpu/drm/i915/i915_perf.c | 28 +++++++++++++++++++++++++++-
>>>> 1 file changed, 27 insertions(+), 1 deletion(-)
>>>>
>>>>diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>>>>b/drivers/gpu/drm/i915/i915_perf.c
>>>>index 5cde3e4e7be6..e28a3ab83fde 100644
>>>>--- a/drivers/gpu/drm/i915/i915_perf.c
>>>>+++ b/drivers/gpu/drm/i915/i915_perf.c
>>>>@@ -541,6 +541,32 @@ static bool oa_buffer_check_unlocked(struct 
>>>>i915_perf_stream *stream)
>>>>     return pollin;
>>>> }
>>>>+/**
>>>>+ * oa_buffer_check_reports - quick check if reports are available
>>>>+ * @stream: i915 stream instance
>>>>+ *
>>>>+ * The return from this function is used as a condition for
>>>>+ * wait_event_interruptible in blocking read. This is used to detect
>>>>+ * available reports.
>>>>+ *
>>>>+ * A condition in wait_event_interruptible seems to be checked 
>>>>twice before
>>>>+ * waiting on an event to occur. These checks are redundant 
>>>>when hrtimer events
>>>>+ * will call oa_buffer_check_unlocked to update the oa_buffer 
>>>>tail pointers. The
>>>>+ * redundant checks add cpu overhead. We simplify the check to 
>>>>reduce cpu
>>>>+ * overhead.
>>>>+ */
>>>>+static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
>>>>+{
>>>>+    unsigned long flags;
>>>>+    bool available;
>>>>+
>>>>+    spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>>>>+    available = stream->oa_buffer.tail != stream->oa_buffer.head;
>>>>+ spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
>>>>+
>>>>+    return available;
>>>>+}
>>>>+
>>>> /**
>>>>  * append_oa_status - Appends a status record to a userspace 
>>>>read() buffer.
>>>>  * @stream: An i915-perf stream opened for OA metrics
>>>>@@ -1150,7 +1176,7 @@ static int i915_oa_wait_unlocked(struct 
>>>>i915_perf_stream *stream)
>>>>         return -EIO;
>>>>     return wait_event_interruptible(stream->poll_wq,
>>>>-                    oa_buffer_check_unlocked(stream));
>>>>+                    oa_buffer_check_reports(stream));
>>>> }
>>>> /**
>>>
>>>
>

Patch

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 5cde3e4e7be6..e28a3ab83fde 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -541,6 +541,32 @@  static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
 	return pollin;
 }
 
+/**
+ * oa_buffer_check_reports - quick check if reports are available
+ * @stream: i915 stream instance
+ *
+ * The return from this function is used as a condition for
+ * wait_event_interruptible in blocking read. This is used to detect
+ * available reports.
+ *
+ * A condition in wait_event_interruptible seems to be checked twice before
+ * waiting on an event to occur. These checks are redundant when hrtimer events
+ * will call oa_buffer_check_unlocked to update the oa_buffer tail pointers. The
+ * redundant checks add cpu overhead. We simplify the check to reduce cpu
+ * overhead.
+ */
+static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
+{
+	unsigned long flags;
+	bool available;
+
+	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
+	available = stream->oa_buffer.tail != stream->oa_buffer.head;
+	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
+
+	return available;
+}
+
 /**
  * append_oa_status - Appends a status record to a userspace read() buffer.
  * @stream: An i915-perf stream opened for OA metrics
@@ -1150,7 +1176,7 @@  static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
 		return -EIO;
 
 	return wait_event_interruptible(stream->poll_wq,
-					oa_buffer_check_unlocked(stream));
+					oa_buffer_check_reports(stream));
 }
 
 /**