diff mbox

[v2] drm/i915: recheck ringspace after wait+retire

Message ID 1418922221-18789-1-git-send-email-david.s.gordon@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Dave Gordon Dec. 18, 2014, 5:03 p.m. UTC
In {intel,logical}_ring_wait_request(), we try to find a request
whose completion will release the amount of ring space required.
If we find such a request, we wait for it, and then retire it, in
the expectation that we will now have at least the required amount
of free space in the ring. But it's good to check that this has
actually happened, so we can back out with a meaningful error
code if something unexpected has happened, such as wait_request
returning early.

This code was already in the execlist version, so the change to
intel_lrc.c is just to add a comment; but we want the same check
in the legacy ringbuffer mode too.

v2: rebase to latest drm-intel-nightly

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        |    9 +++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.c |   12 +++++++++++-
 2 files changed, 20 insertions(+), 1 deletion(-)

Comments

Chris Wilson Dec. 18, 2014, 5:18 p.m. UTC | #1
On Thu, Dec 18, 2014 at 05:03:41PM +0000, Dave Gordon wrote:
> In {intel,logical}_ring_wait_request(), we try to find a request
> whose completion will release the amount of ring space required.
> If we find such a request, we wait for it, and then retire it, in
> the expectation that we will now have at least the required amount
> of free space in the ring. But it's good to check that this has
> actually happened, so we can back out with a meaningful error
> code if something unexpected has happened, such as wait_request
> returning early.

It's pretty pointless. It's a programming bug, if anything it should
WARN if it happens.
-Chris
Dave Gordon Dec. 18, 2014, 6:09 p.m. UTC | #2
On 18/12/14 17:18, Chris Wilson wrote:
> On Thu, Dec 18, 2014 at 05:03:41PM +0000, Dave Gordon wrote:
>> In {intel,logical}_ring_wait_request(), we try to find a request
>> whose completion will release the amount of ring space required.
>> If we find such a request, we wait for it, and then retire it, in
>> the expectation that we will now have at least the required amount
>> of free space in the ring. But it's good to check that this has
>> actually happened, so we can back out with a meaningful error
>> code if something unexpected has happened, such as wait_request
>> returning early.
> 
> It's pretty pointless. It's a programming bug, if anything it should
> WARN if it happens.
> -Chris

I don't regard preventing the propagation of errors as pointless,
whether or not it's a programming bug. If this case arose, then without
this check we would go ahead and trample over whatever was next in the
ring. Quite likely TAIL would overtake HEAD, and then all sorts of
bizarre things would happen some time later.

In particular, this may be able to catch bugs introduced by /future
features/, such as TDR, scheduling, and preemption, all of which are in
preparation now and all of which have the potential to disrupt the
predictable flow of requests. So it's good to be resilient to upcoming
changes and make it as easy as possible to catch bugs at the earliest
opportunity.

This is also a step towards the deduplication of the ringbuffer and LRC
paths by making sure that they are as similar as possible, and thus
simple to merge at some later opportunity.

.Dave.
Daniel Vetter Dec. 18, 2014, 8:19 p.m. UTC | #3
On Thu, Dec 18, 2014 at 06:09:18PM +0000, Dave Gordon wrote:
> On 18/12/14 17:18, Chris Wilson wrote:
> > On Thu, Dec 18, 2014 at 05:03:41PM +0000, Dave Gordon wrote:
> >> In {intel,logical}_ring_wait_request(), we try to find a request
> >> whose completion will release the amount of ring space required.
> >> If we find such a request, we wait for it, and then retire it, in
> >> the expectation that we will now have at least the required amount
> >> of free space in the ring. But it's good to check that this has
> >> actually happened, so we can back out with a meaningful error
> >> code if something unexpected has happened, such as wait_request
> >> returning early.
> > 
> > It's pretty pointless. It's a programming bug, if anything it should
> > WARN if it happens.
> > -Chris
> 
> I don't regard preventing the propagation of errors as pointless,
> whether or not it's a programming bug. If this case arose, then without
> this check we would go ahead and trample over whatever was next in the
> ring. Quite likely TAIL would overtake HEAD, and then all sorts of
> bizarre things would happen some time later.
> 
> In particular, this may be able to catch bugs introduced by /future
> features/, such as TDR, scheduling, and preemption, all of which are in
> preparation now and all of which have the potential to disrupt the
> predictable flow of requests. So it's good to be resilient to upcoming
> changes and make it as easy as possible to catch bugs at the earliest
> opportunity.
> 
> This is also a step towards the deduplication of the ringbuffer and LRC
> paths by making sure that they are as similar as possible, and thus
> simple to merge at some later opportunity.

Overwriting the TAIL is a fairly benign disaster as far as gpus are
concerned, the gpu tends to survive those. I concur with Chris that a
simple WARN_ON for paranoia is good enough. And I don't think we need the
comments, imo the code is clear enough as-is.
-Daniel
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 90867b2..329cb5c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -955,6 +955,15 @@  static int logical_ring_wait_request(struct intel_ringbuffer *ringbuf,
 
 	i915_gem_retire_requests_ring(ring);
 
+	/*
+	 * According to our calculation above, retiring the request we just
+	 * waited for should have resulted in there being enough space in
+	 * the ringbuffer; but let's check.
+	 *
+	 * If there is not now enough space, something has gone horribly worng
+	 * (such as wait_request returning early, but with no error, or
+	 * retire_requests failing to retire the request we expected it to).
+	 */
 	return intel_ring_space(ringbuf) >= bytes ? 0 : -ENOSPC;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 018a37e..c95bf3a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1932,6 +1932,7 @@  static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
 		return 0;
 
 	list_for_each_entry(request, &ring->request_list, list) {
+		/* Would completion of this request free enough space? */
 		if (__intel_ring_space(request->tail, ringbuf->tail,
 				       ringbuf->effective_size) >= n) {
 			break;
@@ -1947,7 +1948,16 @@  static int intel_ring_wait_request(struct intel_engine_cs *ring, int n)
 
 	i915_gem_retire_requests_ring(ring);
 
-	return 0;
+	/*
+	 * According to our calculation above, retiring the request we just
+	 * waited for should have resulted in there being enough space in
+	 * the ringbuffer; but let's check.
+	 *
+	 * If there is not now enough space, something has gone horribly worng
+	 * (such as wait_request returning early, but with no error, or
+	 * retire_requests failing to retire the request we expected it to).
+	 */
+	return intel_ring_space(ringbuf) >= n ? 0 : -ENOSPC;
 }
 
 static int ring_wait_for_space(struct intel_engine_cs *ring, int n)