
[2/2] drm/i915: Rework order of operations in {__intel, logical}_ring_prepare()

Message ID 1434128948-9221-3-git-send-email-david.s.gordon@intel.com
State New, archived

Commit Message

Dave Gordon June 12, 2015, 5:09 p.m. UTC
The original idea of preallocating the outstanding lazy request (OLR)
was implemented in

> 9d773091 drm/i915: Preallocate next seqno before touching the ring

and the sequence of operations was to allocate the OLR, then wrap past
the end of the ring if necessary, then wait for space if necessary.
But subsequently intel_ring_begin() was refactored, in

> 304d695 drm/i915: Flush outstanding requests before allocating new seqno

to ensure that pending work that might need to be flushed used the old
and not the newly-allocated request. This changed the sequence to wrap
and/or wait, then allocate, although the comment still said
	/* Preallocate the olr before touching the ring */
which was no longer true as intel_wrap_ring_buffer() touches the ring.

However, with the introduction of dynamic pinning, in

> 7ba717c drm/i915/bdw: Pin the ringbuffer backing object to GGTT on-demand

came the possibility that the ringbuffer might not be pinned to the GTT
or mapped into CPU address space when intel_ring_begin() is called. It
gets pinned when the request is allocated, so it's now important that
this comes *before* anything that can write into the ringbuffer, in this
case intel_wrap_ring_buffer(), as this will fault if (a) the ringbuffer
happens not to be mapped, and (b) tail happens to be sufficiently close
to the end of the ring to trigger wrapping.
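The two conditions above can be written as a small predicate. This is a hypothetical userspace model, not kernel code: `struct model_ring`, `needs_wrap()` and `early_wrap_would_fault()` are invented for illustration, the field names merely follow struct intel_ringbuffer, and a NULL `virtual_start` stands in for "not pinned to the GTT or mapped into CPU address space".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model; field names follow struct intel_ringbuffer. */
struct model_ring {
	uint32_t *virtual_start; /* NULL models "not pinned or mapped" */
	int size;                /* total ring size in bytes */
	int effective_size;      /* usable size in bytes */
	int tail;                /* byte offset of the next write */
};

/* Condition (b): the request does not fit before the end of the ring,
 * so the end must be padded with NOOPs and the request wrapped. */
static bool needs_wrap(const struct model_ring *r, int bytes)
{
	assert(r->tail >= 0 && r->tail <= r->size);
	return r->tail + bytes > r->effective_size;
}

/* If wrapping happens before request allocation, the padding is written
 * through virtual_start; that faults only when (a) and (b) both hold. */
bool early_wrap_would_fault(const struct model_ring *r, int bytes)
{
	return r->virtual_start == NULL && needs_wrap(r, bytes);
}
```

For example, with `tail` = 4000 in an unmapped 4096-byte ring, a 128-byte request would fault on the old path, while a 64-byte request (which fits) or any request starting at `tail` = 0 would not.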

So the correct order is neither the original allocate-wait-pad-wait, nor
the subsequent wait-pad-wait-allocate, but wait-allocate-pad, avoiding
both the problems described in the two commits mentioned above.
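A minimal userspace sketch of the wait-allocate-pad ordering, assuming a toy ring in which "allocating a request" is what maps the backing store (mirroring the dynamic pinning described above) and in which waiting simply frees the whole ring. All names (`model_ring`, `model_prepare`, `MODEL_MI_NOOP` and so on) are hypothetical; the `trace` string records W(ait), A(llocate) and P(ad) in the order they happen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MI_NOOP 0u /* stand-in for the MI_NOOP command dword */

struct model_ring {
	uint32_t *virtual_start; /* NULL until a request "pins" the ring */
	uint32_t *backing;       /* memory that request allocation maps in */
	bool have_request;       /* models ring->outstanding_lazy_request */
	int size, effective_size, tail, space; /* all in bytes */
	char trace[8];           /* steps taken: W(ait), A(llocate), P(ad) */
	int ntrace;
};

static void record(struct model_ring *r, char step)
{
	if (r->ntrace < (int)sizeof(r->trace) - 1) { /* keep room for NUL */
		r->trace[r->ntrace++] = step;
		r->trace[r->ntrace] = '\0';
	}
}

/* Hypothetical wait: pretend the GPU retired everything. */
static void model_wait(struct model_ring *r)
{
	record(r, 'W');
	r->space = r->size;
}

/* Hypothetical request allocation: this is what maps the ring. */
static void model_alloc(struct model_ring *r)
{
	record(r, 'A');
	r->have_request = true;
	r->virtual_start = r->backing;
}

/* Wait for padding + request in one go, then allocate, then pad. */
int model_prepare(struct model_ring *r, int bytes)
{
	int fill = 0;

	if (r->tail + bytes > r->effective_size) {
		fill = r->size - r->tail; /* bytes left before the end */
		bytes += fill;            /* include padding in the wait */
	}
	if (r->space < bytes)
		model_wait(r);
	if (!r->have_request)
		model_alloc(r);
	if (fill) {
		assert(r->virtual_start != NULL); /* mapped by allocation */
		record(r, 'P');
		uint32_t *virt = r->virtual_start + r->tail / 4;
		for (int n = fill / 4; n > 0; n--)
			*virt++ = MODEL_MI_NOOP;
		r->space -= fill; /* padding consumes the gap at the end */
		r->tail = 0;
	}
	return 0;
}
```

With `tail` = 4000 in a 4096-byte ring and a 128-byte request, the 96 bytes up to the end are added to the request, the single wait comes first, and the NOOP padding is only written after allocation has mapped the ring, so the trace reads `WAP`.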

As a bonus, we eliminate the special case where a single ring_begin()
might end up waiting twice (once to be able to wrap, and then again
if that still hadn't actually freed enough space for the request). We
just precalculate the total amount of space we'll need *including* any
for padding the end of the ring and wait for that much in one go :)
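To see why one wait is enough, here is a hypothetical worst-case model in which every wait frees exactly as much space as was asked for. `waits_old()` follows the shape of the removed wrap path (wait for the gap at the end, pad it, then check again for the request itself), while `waits_new()` folds the padding into the request size up front. The function names and the worst-case assumption are illustrative only.

```c
#include <assert.h>

/* Worst case assumed throughout: each wait frees exactly what was asked. */

/* Old shape: wait for the gap at the end, pad it, then check again
 * (and possibly wait again) for the request itself. */
int waits_old(int space, int size, int effective_size, int tail, int bytes)
{
	int waits = 0;

	assert(tail >= 0 && tail <= size);
	if (tail + bytes > effective_size) {
		int rem = size - tail;
		if (space < rem) {
			space = rem; /* just enough to pad */
			waits++;
		}
		space -= rem; /* padding consumes the gap */
	}
	if (space < bytes)
		waits++;
	return waits;
}

/* New shape: include the padding in a single up-front wait. */
int waits_new(int space, int size, int effective_size, int tail, int bytes)
{
	assert(tail >= 0 && tail <= size);
	if (tail + bytes > effective_size)
		bytes += size - tail;
	return space < bytes ? 1 : 0;
}
```

For example, with 50 bytes free, `tail` = 4000, a 4096-byte ring and a 128-byte request, the old shape waits for 96 bytes, spends them all on padding and then has to wait again for 128; the new shape waits once for 96 + 128 = 224 bytes.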

In the time since this code was written, it has all been cloned from
the original ringbuffer model to become the execlist submission code, in

> 82e104c drm/i915/bdw: New logical ring submission mechanism

So now we have to fix it in both paths ...

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        |   64 +++++++++++++++----------------
 drivers/gpu/drm/i915/intel_ringbuffer.c |   63 +++++++++++++++---------------
 2 files changed, 64 insertions(+), 63 deletions(-)

Comments

Chris Wilson June 12, 2015, 6:05 p.m. UTC | #1
On Fri, Jun 12, 2015 at 06:09:08PM +0100, Dave Gordon wrote:
> However, with the introduction of dynamic pinning, in
> 
> > 7ba717c drm/i915/bdw: Pin the ringbuffer backing object to GGTT on-demand
> 
> came the possibility that the ringbuffer might not be pinned to the GTT
> or mapped into CPU address space when intel_ring_begin() is called. It
> gets pinned when the request is allocated, so it's now important that
> this comes *before* anything that can write into the ringbuffer, in this
> case intel_wrap_ring_buffer(), as this will fault if (a) the ringbuffer
> happens not to be mapped, and (b) tail happens to be sufficiently close
> to the end of the ring to trigger wrapping.

On the other hand, the request allocation can itself write into the
ring. This is not the right fix; the right fix is the elimination of the
OLR itself and passing the request into intel_ring_begin. That way we can
be explicit in our ordering of ring access.
-Chris
Dave Gordon June 12, 2015, 6:54 p.m. UTC | #2
On 12/06/15 19:05, Chris Wilson wrote:
> On Fri, Jun 12, 2015 at 06:09:08PM +0100, Dave Gordon wrote:
>> However, with the introduction of dynamic pinning, in
>>
>>> 7ba717c drm/i915/bdw: Pin the ringbuffer backing object to GGTT on-demand
>>
>> came the possibility that the ringbuffer might not be pinned to the GTT
>> or mapped into CPU address space when intel_ring_begin() is called. It
>> gets pinned when the request is allocated, so it's now important that
>> this comes *before* anything that can write into the ringbuffer, in this
>> case intel_wrap_ring_buffer(), as this will fault if (a) the ringbuffer
>> happens not to be mapped, and (b) tail happens to be sufficiently close
>> to the end of the ring to trigger wrapping.
> 
> On the other hand, the request allocation can itself write into the
> ring. This is not the right fix; the right fix is the elimination of the
> OLR itself and passing the request into intel_ring_begin. That way we can
> be explicit in our ordering of ring access.
> -Chris

AFAICS, request allocation can write into the ring only if it actually
has to flush some *pre-existing* OLR. [Aside: it can actually trigger
writing into a completely different ringbuffer, but not the one we're
handling here!] The worst-case sequence is:

	i915_gem_request_alloc		finds there's no OLR
	  i915_gem_get_seqno		  finds the seqno is 0
	    i915_gem_init_seqno		    for_each_ring do ...
	      intel_ring_idle		      but no OLR, so OK

It only works because i915_gem_request_alloc() allocates the request
early but doesn't store it in the OLR until the end.
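Why storing the request in the OLR only at the end matters can be shown with a toy model. This is a hypothetical sketch: a single engine stands in for the for_each_ring loop, `model_ring_idle()` writes to the ring only when it finds an existing OLR to flush, and the `publish_early` flag models the (hypothetical) wrong order in which the new request is stored in the OLR before the seqno-wrap handling runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_request { int seqno; };

struct model_engine {
	struct model_request *olr; /* models outstanding_lazy_request */
	int ring_writes;           /* commands emitted into this ring */
};

/* Like intel_ring_idle(): only an existing OLR has anything to flush. */
static void model_ring_idle(struct model_engine *e)
{
	if (e->olr != NULL) {
		e->ring_writes++; /* flushing the OLR writes into the ring */
		e->olr = NULL;
	}
}

/* Like i915_gem_request_alloc() when the seqno has wrapped to 0:
 * reinitialising the seqno idles the ring before the new request is
 * published. publish_early models the hypothetical wrong order. */
void model_request_alloc(struct model_engine *e, bool seqno_wrapped,
			 bool publish_early)
{
	static struct model_request req;

	assert(e->olr == NULL); /* only reached when there is no OLR */
	if (publish_early)
		e->olr = &req;
	if (seqno_wrapped)
		model_ring_idle(e);
	req.seqno = 1; /* placeholder seqno */
	e->olr = &req;
}
```

In the model, publishing late leaves `ring_writes` at 0 even when the seqno wraps, whereas publishing early makes the idle step find "an OLR" and write into the very ring being prepared.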

OTOH I agree that the long-term answer is the elimination of the OLR;
this is really something of a stopgap until John H's Anti-OLR patchset
is merged.

Although, the simplification of the wait-wrap/wait-space sequence is
probably worthwhile in its own right, so if Anti-OLR gets merged first
we can put the rest of the changes on top of that. It's only code inside
the "if(!OLR)" section that would need to be removed.

.Dave.
Chris Wilson June 12, 2015, 7:10 p.m. UTC | #3
On Fri, Jun 12, 2015 at 07:54:17PM +0100, Dave Gordon wrote:
> On 12/06/15 19:05, Chris Wilson wrote:
> > On Fri, Jun 12, 2015 at 06:09:08PM +0100, Dave Gordon wrote:
> >> However, with the introduction of dynamic pinning, in
> >>
> >>> 7ba717c drm/i915/bdw: Pin the ringbuffer backing object to GGTT on-demand
> >>
> >> came the possibility that the ringbuffer might not be pinned to the GTT
> >> or mapped into CPU address space when intel_ring_begin() is called. It
> >> gets pinned when the request is allocated, so it's now important that
> >> this comes *before* anything that can write into the ringbuffer, in this
> >> case intel_wrap_ring_buffer(), as this will fault if (a) the ringbuffer
> >> happens not to be mapped, and (b) tail happens to be sufficiently close
> >> to the end of the ring to trigger wrapping.
> > 
> > On the other hand, the request allocation can itself write into the
> > ring. This is not the right fix; the right fix is the elimination of the
> > OLR itself and passing the request into intel_ring_begin. That way we can
> > be explicit in our ordering of ring access.
> > -Chris
> 
> AFAICS, request allocation can write into the ring only if it actually
> has to flush some *pre-existing* OLR. [Aside: it can actually trigger
> writing into a completely different ringbuffer, but not the one we're
> handling here!] The worst-case sequence is:

You forget that ultimately (or rather, as should have been in the design
for execlists once the shortcomings of the ad hoc method were apparent)
request allocation will also be responsible for context management (since
they are one and the same).

> It only works because i915_gem_request_alloc() allocates the request
> early but doesn't store it in the OLR until the end.
> 
> OTOH I agree that the long-term answer is the elimination of the OLR;
> this is really something of a stopgap until John H's Anti-OLR patchset
> is merged.

See my patches a year ago for a more complete cleanup.
-Chris
Dave Gordon June 12, 2015, 8:25 p.m. UTC | #4
Updated version split into two. The first tidies up the _ring_prepare()
functions and removes the corner case where we might have had to wait
twice; the second is a temporary workaround to solve a kernel OOPS that
can occur if logical_ring_begin is called while the ringbuffer is not
mapped because there's no current request.

The latter will be superseded by the Anti-OLR patch series currently
in review. But this helps with GuC submission, which is better than
the execlist path at exposing the problematic case :(
Daniel Vetter June 17, 2015, 11:04 a.m. UTC | #5
On Fri, Jun 12, 2015 at 09:25:36PM +0100, Dave Gordon wrote:
> Updated version split into two. The first tidies up the _ring_prepare()
> functions and removes the corner case where we might have had to wait
> twice; the second is a temporary workaround to solve a kernel OOPS that
> can occur if logical_ring_begin is called while the ringbuffer is not
> mapped because there's no current request.
> 
> The latter will be superseded by the Anti-OLR patch series currently
> in review. But this helps with GuC submission, which is better than
> the execlist path at exposing the problematic case :(

Maintainer broken record: Lack of changelog makes it hard to figure out
what changed and which patches are the latest version. Even more so when
trying to catch up from vacation ...
-Daniel
Jani Nikula June 17, 2015, 12:41 p.m. UTC | #6
On Wed, 17 Jun 2015, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Fri, Jun 12, 2015 at 09:25:36PM +0100, Dave Gordon wrote:
>> Updated version split into two. The first tidies up the _ring_prepare()
>> functions and removes the corner case where we might have had to wait
>> twice; the second is a temporary workaround to solve a kernel OOPS that
>> can occur if logical_ring_begin is called while the ringbuffer is not
>> mapped because there's no current request.
>> 
>> The latter will be superseded by the Anti-OLR patch series currently
>> in review. But this helps with GuC submission, which is better than
>> the execlist path at exposing the problematic case :(
>
> Maintainer broken record: Lack of changelog makes it hard to figure out
> what changed and which patches are the latest version. Even more so when
> trying to catch up from vacation ...

Is it time we adopted Greg's <formletter> approach with copy-pasted
snippets from [1]...? See [2] for an example.

BR,
Jani.


[1] https://github.com/gregkh/gregkh-linux/blob/master/forms/patch_bot
[2] http://mid.gmane.org/20150612153842.GA12274@kroah.com

> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Dave Gordon June 18, 2015, 10:30 a.m. UTC | #7
On 17/06/15 12:04, Daniel Vetter wrote:
> On Fri, Jun 12, 2015 at 09:25:36PM +0100, Dave Gordon wrote:
>> Updated version split into two. The first tidies up the _ring_prepare()
>> functions and removes the corner case where we might have had to wait
>> twice; the second is a temporary workaround to solve a kernel OOPS that
>> can occur if logical_ring_begin is called while the ringbuffer is not
>> mapped because there's no current request.
>>
>> The latter will be superseded by the Anti-OLR patch series currently
>> in review. But this helps with GuC submission, which is better than
>> the execlist path at exposing the problematic case :(
> 
> Maintainer broken record: Lack of changelog makes it hard to figure out
> what changed and which patches are the latest version. Even more so when
> trying to catch up from vacation ...
> -Daniel

Oops, that wasn't ready to go to the mailing list; it was just
supposed to go to myself so I could test whether the changes I'd made to
my git-format-patch and git-send-email settings worked! Hence lack of
subject line :(

And the settings obviously /weren't/ right; apart from it going to the
list, it didn't have the proper "Organisation" header, which was the
thing I was trying to update, as well as setting up proper definitions
so I could write "git send-email --identity=external --to=myself ..."

I think I got them all sorted out before sending the GuC submission
sequence though :)

.Dave.

Patch

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 454e836..3ef5fb6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -740,39 +740,22 @@  intel_logical_ring_advance_and_submit(struct intel_ringbuffer *ringbuf,
 	execlists_context_queue(ring, ctx, ringbuf->tail, request);
 }
 
-static int logical_ring_wrap_buffer(struct intel_ringbuffer *ringbuf,
-				    struct intel_context *ctx)
-{
-	uint32_t __iomem *virt;
-	int rem = ringbuf->size - ringbuf->tail;
-
-	if (ringbuf->space < rem) {
-		int ret = logical_ring_wait_for_space(ringbuf, ctx, rem);
-
-		if (ret)
-			return ret;
-	}
-
-	virt = ringbuf->virtual_start + ringbuf->tail;
-	rem /= 4;
-	while (rem--)
-		iowrite32(MI_NOOP, virt++);
-
-	ringbuf->tail = 0;
-	intel_ring_update_space(ringbuf);
-
-	return 0;
-}
-
 static int logical_ring_prepare(struct intel_ringbuffer *ringbuf,
 				struct intel_context *ctx, int bytes)
 {
+	int fill = 0;
 	int ret;
 
+	/*
+	 * If the request will not fit between 'tail' and the effective
+	 * size of the ringbuffer, then we need to pad the end of the
+	 * ringbuffer with NOOPs, then start the request at the top of
+	 * the ring. This increases the total size that we need to check
+	 * for by however much is left at the end of the ring ...
+	 */
 	if (unlikely(ringbuf->tail + bytes > ringbuf->effective_size)) {
-		ret = logical_ring_wrap_buffer(ringbuf, ctx);
-		if (unlikely(ret))
-			return ret;
+		fill = ringbuf->size - ringbuf->tail;
+		bytes += fill;
 	}
 
 	if (unlikely(ringbuf->space < bytes)) {
@@ -781,6 +764,28 @@  static int logical_ring_prepare(struct intel_ringbuffer *ringbuf,
 			return ret;
 	}
 
+	/* Ensure we have a request before touching the ring */
+	if (!ringbuf->ring->outstanding_lazy_request) {
+		ret = i915_gem_request_alloc(ringbuf->ring, ctx);
+		if (ret)
+			return ret;
+	}
+
+	if (unlikely(fill)) {
+		uint32_t __iomem *virt = ringbuf->virtual_start + ringbuf->tail;
+
+		/* tail should not have moved */
+		if (WARN_ON(fill != ringbuf->size - ringbuf->tail))
+			fill = ringbuf->size - ringbuf->tail;
+
+		do
+			iowrite32(MI_NOOP, virt++);
+		while ((fill -= 4) > 0);
+
+		ringbuf->tail = 0;
+		intel_ring_update_space(ringbuf);
+	}
+
 	return 0;
 }
 
@@ -814,11 +819,6 @@  static int intel_logical_ring_begin(struct intel_ringbuffer *ringbuf,
 	if (ret)
 		return ret;
 
-	/* Preallocate the olr before touching the ring */
-	ret = i915_gem_request_alloc(ring, ctx);
-	if (ret)
-		return ret;
-
 	ringbuf->space -= num_dwords * sizeof(uint32_t);
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index a3406b2..4c0bc29 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2137,29 +2137,6 @@  static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
 	return 0;
 }
 
-static int intel_wrap_ring_buffer(struct intel_engine_cs *ring)
-{
-	uint32_t __iomem *virt;
-	struct intel_ringbuffer *ringbuf = ring->buffer;
-	int rem = ringbuf->size - ringbuf->tail;
-
-	if (ringbuf->space < rem) {
-		int ret = ring_wait_for_space(ring, rem);
-		if (ret)
-			return ret;
-	}
-
-	virt = ringbuf->virtual_start + ringbuf->tail;
-	rem /= 4;
-	while (rem--)
-		iowrite32(MI_NOOP, virt++);
-
-	ringbuf->tail = 0;
-	intel_ring_update_space(ringbuf);
-
-	return 0;
-}
-
 int intel_ring_idle(struct intel_engine_cs *ring)
 {
 	struct drm_i915_gem_request *req;
@@ -2197,12 +2174,19 @@  static int __intel_ring_prepare(struct intel_engine_cs *ring,
 				int bytes)
 {
 	struct intel_ringbuffer *ringbuf = ring->buffer;
+	int fill = 0;
 	int ret;
 
+	/*
+	 * If the request will not fit between 'tail' and the effective
+	 * size of the ringbuffer, then we need to pad the end of the
+	 * ringbuffer with NOOPs, then start the request at the top of
+	 * the ring. This increases the total size that we need to check
+	 * for by however much is left at the end of the ring ...
+	 */
 	if (unlikely(ringbuf->tail + bytes > ringbuf->effective_size)) {
-		ret = intel_wrap_ring_buffer(ring);
-		if (unlikely(ret))
-			return ret;
+		fill = ringbuf->size - ringbuf->tail;
+		bytes += fill;
 	}
 
 	if (unlikely(ringbuf->space < bytes)) {
@@ -2211,6 +2195,28 @@  static int __intel_ring_prepare(struct intel_engine_cs *ring,
 			return ret;
 	}
 
+	/* Ensure we have a request before touching the ring */
+	if (!ringbuf->ring->outstanding_lazy_request) {
+		ret = i915_gem_request_alloc(ring, ring->default_context);
+		if (ret)
+			return ret;
+	}
+
+	if (unlikely(fill)) {
+		uint32_t __iomem *virt = ringbuf->virtual_start + ringbuf->tail;
+
+		/* tail should not have moved */
+		if (WARN_ON(fill != ringbuf->size - ringbuf->tail))
+			fill = ringbuf->size - ringbuf->tail;
+
+		do
+			iowrite32(MI_NOOP, virt++);
+		while ((fill -= 4) > 0);
+
+		ringbuf->tail = 0;
+		intel_ring_update_space(ringbuf);
+	}
+
 	return 0;
 }
 
@@ -2229,11 +2235,6 @@  int intel_ring_begin(struct intel_engine_cs *ring,
 	if (ret)
 		return ret;
 
-	/* Preallocate the olr before touching the ring */
-	ret = i915_gem_request_alloc(ring, ring->default_context);
-	if (ret)
-		return ret;
-
 	ring->buffer->space -= num_dwords * sizeof(uint32_t);
 	return 0;
 }