Message ID | 20210820224446.30620-26-matthew.brost@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | Parallel submission aka multi-bb execbuf |
On 8/20/2021 15:44, Matthew Brost wrote:
> If an error occurs in the front end when multi-lrc requests are getting
> generated we need to skip these in the backend but we still need to
> emit the breadcrumbs seqno. An issues arrises because with multi-lrc

arrises -> arises

> breadcrumbs there is a handshake between the parent and children to make
> forwad progress. If all the requests are not present this handshake

forwad -> forward

> doesn't work. To work around this, if multi-lrc request has an error we
> skip the handshake but still emit the breadcrumbs seqno.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 61 ++++++++++++++++++-
>  1 file changed, 58 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 2ef38557b0f0..61e737fd1eee 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -3546,8 +3546,8 @@ static int emit_bb_start_child_no_preempt_mid_batch(struct i915_request *rq,
>  }
>
>  static u32 *
> -emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
> -						  u32 *cs)
> +__emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
> +						    u32 *cs)
>  {
>  	struct intel_context *ce = rq->context;
>  	u8 i;
> @@ -3575,6 +3575,41 @@ emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
>  				  get_children_go_addr(ce),
>  				  0);
>
> +	return cs;
> +}
> +
> +/*
> + * If this true, a submission of multi-lrc requests had an error and the
> + * requests need to be skipped. The front end (execuf IOCTL) should've called
> + * i915_request_skip which squashes the BB but we still need to emit the fini
> + * breadrcrumbs seqno write. At this point we don't know how many of the
> + * requests in the multi-lrc submission were generated so we can't do the
> + * handshake between the parent and children (e.g. if 4 requests should be
> + * generated but 2nd hit an error only 1 would be seen by the GuC backend).
> + * Simply skip the handshake, but still emit the breadcrumbd seqno, if an error
> + * has occurred on any of the requests in submission / relationship.
> + */
> +static inline bool skip_handshake(struct i915_request *rq)
> +{
> +	return test_bit(I915_FENCE_FLAG_SKIP_PARALLEL, &rq->fence.flags);
> +}
> +
> +static u32 *
> +emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
> +						  u32 *cs)
> +{
> +	struct intel_context *ce = rq->context;
> +
> +	GEM_BUG_ON(!intel_context_is_parent(ce));
> +
> +	if (unlikely(skip_handshake(rq))) {
> +		memset(cs, 0, sizeof(u32) *
> +		       (ce->engine->emit_fini_breadcrumb_dw - 6));
> +		cs += ce->engine->emit_fini_breadcrumb_dw - 6;

Why -6? There are 12 words about to be written. Indeed the value of
emit_..._dw is '12 + 4*num_children'. This should only be skipping over the
4*children, right? As it stands, it will skip all but the last six words,
then write an extra twelve words and thus overflow the reservation by six.
Unless I am totally confused?

I assume there is some reason why the amount of data written must exactly
match the space reserved? It's a while since I've looked at the ring buffer
code!

Seems like it would be clearer to not split the semaphore writes out but
have them right next to the skip code that is meant to replicate them but
with no-ops.

> +	} else {
> +		cs = __emit_fini_breadcrumb_parent_no_preempt_mid_batch(rq, cs);
> +	}
> +
>  	/* Emit fini breadcrumb */
>  	cs = gen8_emit_ggtt_write(cs,
>  				  rq->fence.seqno,
> @@ -3591,7 +3626,8 @@ emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
>  }
>
>  static u32 *
> -emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq, u32 *cs)
> +__emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq,
> +						   u32 *cs)
>  {
>  	struct intel_context *ce = rq->context;
>
> @@ -3617,6 +3653,25 @@ emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq, u32 *cs
>  	*cs++ = get_children_go_addr(ce->parent);
>  	*cs++ = 0;
>
> +	return cs;
> +}
> +
> +static u32 *
> +emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq,
> +						 u32 *cs)
> +{
> +	struct intel_context *ce = rq->context;
> +
> +	GEM_BUG_ON(!intel_context_is_child(ce));
> +
> +	if (unlikely(skip_handshake(rq))) {
> +		memset(cs, 0, sizeof(u32) *
> +		       (ce->engine->emit_fini_breadcrumb_dw - 6));
> +		cs += ce->engine->emit_fini_breadcrumb_dw - 6;
> +	} else {
> +		cs = __emit_fini_breadcrumb_child_no_preempt_mid_batch(rq, cs);
> +	}

Same points as above - why -6 not -12 and would be clearer to keep the
no-ops and the writes adjacent.

John.

> +
>  	/* Emit fini breadcrumb */
>  	cs = gen8_emit_ggtt_write(cs,
>  				  rq->fence.seqno,
On Wed, Sep 29, 2021 at 01:44:10PM -0700, John Harrison wrote:
> On 8/20/2021 15:44, Matthew Brost wrote:
> > If an error occurs in the front end when multi-lrc requests are getting
> > generated we need to skip these in the backend but we still need to
> > emit the breadcrumbs seqno. An issues arrises because with multi-lrc
> arrises -> arises
>

Yep.

> > breadcrumbs there is a handshake between the parent and children to make
> > forwad progress. If all the requests are not present this handshake
> forwad -> forward
>

Yep.

> > doesn't work. To work around this, if multi-lrc request has an error we
> > skip the handshake but still emit the breadcrumbs seqno.
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 61 ++++++++++++++++++-
> >  1 file changed, 58 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 2ef38557b0f0..61e737fd1eee 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -3546,8 +3546,8 @@ static int emit_bb_start_child_no_preempt_mid_batch(struct i915_request *rq,
> >  }
> >  static u32 *
> > -emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
> > -						  u32 *cs)
> > +__emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
> > +						    u32 *cs)
> >  {
> >  	struct intel_context *ce = rq->context;
> >  	u8 i;
> > @@ -3575,6 +3575,41 @@ emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
> >  				  get_children_go_addr(ce),
> >  				  0);
> > +	return cs;
> > +}
> > +
> > +/*
> > + * If this true, a submission of multi-lrc requests had an error and the
> > + * requests need to be skipped. The front end (execuf IOCTL) should've called
> > + * i915_request_skip which squashes the BB but we still need to emit the fini
> > + * breadrcrumbs seqno write. At this point we don't know how many of the
> > + * requests in the multi-lrc submission were generated so we can't do the
> > + * handshake between the parent and children (e.g. if 4 requests should be
> > + * generated but 2nd hit an error only 1 would be seen by the GuC backend).
> > + * Simply skip the handshake, but still emit the breadcrumbd seqno, if an error
> > + * has occurred on any of the requests in submission / relationship.
> > + */
> > +static inline bool skip_handshake(struct i915_request *rq)
> > +{
> > +	return test_bit(I915_FENCE_FLAG_SKIP_PARALLEL, &rq->fence.flags);
> > +}
> > +
> > +static u32 *
> > +emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
> > +						  u32 *cs)
> > +{
> > +	struct intel_context *ce = rq->context;
> > +
> > +	GEM_BUG_ON(!intel_context_is_parent(ce));
> > +
> > +	if (unlikely(skip_handshake(rq))) {
> > +		memset(cs, 0, sizeof(u32) *
> > +		       (ce->engine->emit_fini_breadcrumb_dw - 6));
> > +		cs += ce->engine->emit_fini_breadcrumb_dw - 6;
> Why -6? There are 12 words about to be written. Indeed the value of
> emit_..._dw is '12 + 4*num_children'. This should only be skipping over the
> 4*children, right? As it stands, it will skip all but the last six words,
> then write an extra twelve words and thus overflow the reservation by six.
> Unless I am totally confused?
>

Let me decode the length:

'Wait on children' (in __emit_fini_breadcrumb_parent_no_preempt_mid_batch) = 4 * num_children
'Turn on preemption' (in __emit_fini_breadcrumb_parent_no_preempt_mid_batch) = 2
'Tell children go' (in __emit_fini_breadcrumb_parent_no_preempt_mid_batch) = 4
'Emit fini breadcrumb' (in emit_fini_breadcrumb_parent_no_preempt_mid_batch) = 4
'User interrupt' (in emit_fini_breadcrumb_parent_no_preempt_mid_batch) = 2

So for the total (emit_fini_breadcrumb_dw) we have '12 + 4 * num_children'.

We want to skip everything in __emit_fini_breadcrumb_parent_no_preempt_mid_batch,
so that is '6 + 4 * num_children' or 'emit_fini_breadcrumb_dw - 6'. The
remaining 6 dwords (fini breadcrumb + user interrupt) are still written after
the NOPs, so the reservation is filled exactly.

Make sense?

> I assume there is some reason why the amount of data written must exactly
> match the space reserved? It's a while since I've looked at the ring buffer
> code!
>

I think it is because the ring space is reserved at request creation time but
the fini breadcrumbs are not written until submission time.

> Seems like it would be clearer to not split the semaphore writes out but
> have them right next to the skip code that is meant to replicate them but
> with no-ops.
>

I guess that works too. I personally like the way it is, but if you insist I
can change it.

> > +	} else {
> > +		cs = __emit_fini_breadcrumb_parent_no_preempt_mid_batch(rq, cs);
> > +	}
> > +
> >  	/* Emit fini breadcrumb */
> >  	cs = gen8_emit_ggtt_write(cs,
> >  				  rq->fence.seqno,
> > @@ -3591,7 +3626,8 @@ emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
> >  }
> >  static u32 *
> > -emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq, u32 *cs)
> > +__emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq,
> > +						   u32 *cs)
> >  {
> >  	struct intel_context *ce = rq->context;
> > @@ -3617,6 +3653,25 @@ emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq, u32 *cs
> >  	*cs++ = get_children_go_addr(ce->parent);
> >  	*cs++ = 0;
> > +	return cs;
> > +}
> > +
> > +static u32 *
> > +emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq,
> > +						 u32 *cs)
> > +{
> > +	struct intel_context *ce = rq->context;
> > +
> > +	GEM_BUG_ON(!intel_context_is_child(ce));
> > +
> > +	if (unlikely(skip_handshake(rq))) {
> > +		memset(cs, 0, sizeof(u32) *
> > +		       (ce->engine->emit_fini_breadcrumb_dw - 6));
> > +		cs += ce->engine->emit_fini_breadcrumb_dw - 6;
> > +	} else {
> > +		cs = __emit_fini_breadcrumb_child_no_preempt_mid_batch(rq, cs);
> > +	}
> Same points as above - why -6 not -12 and would be clearer to keep the
> no-ops and the writes adjacent.
>

Same as above: we are NOP-ing the length of
__emit_fini_breadcrumb_child_no_preempt_mid_batch and still want to emit the
breadcrumbs below.

Matt

> John.
>
> > +
> >  	/* Emit fini breadcrumb */
> >  	cs = gen8_emit_ggtt_write(cs,
> >  				  rq->fence.seqno,
>
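The dword accounting in the reply above can be checked with a small userspace sketch. This is only a model under stated assumptions, not driver code: the enum names, `parent_total_dw()`, `parent_handshake_dw()` and `emit_parent_skip()` are hypothetical, the per-step dword counts are taken from the breakdown in the reply, and `0xdeadbeef` is a placeholder standing in for the breadcrumb and user-interrupt dwords.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Dword counts as listed in the reply (hypothetical names). */
enum {
	WAIT_PER_CHILD_DW  = 4, /* 'Wait on children', per child */
	ENABLE_PREEMPT_DW  = 2, /* 'Turn on preemption' */
	CHILDREN_GO_DW     = 4, /* 'Tell children go' */
	FINI_BREADCRUMB_DW = 4, /* 'Emit fini breadcrumb' */
	USER_INTERRUPT_DW  = 2, /* 'User interrupt' */
};

/* Handshake portion only: 6 + 4 * num_children. */
static unsigned int parent_handshake_dw(unsigned int num_children)
{
	return WAIT_PER_CHILD_DW * num_children +
	       ENABLE_PREEMPT_DW + CHILDREN_GO_DW;
}

/* Models emit_fini_breadcrumb_dw for a parent: 12 + 4 * num_children. */
static unsigned int parent_total_dw(unsigned int num_children)
{
	return parent_handshake_dw(num_children) +
	       FINI_BREADCRUMB_DW + USER_INTERRUPT_DW;
}

/*
 * Model of the skip path: zero-fill (total - 6) dwords in place of the
 * handshake, then write the 6 trailing dwords. Returns dwords consumed.
 */
static unsigned int emit_parent_skip(uint32_t *ring, unsigned int num_children)
{
	unsigned int total = parent_total_dw(num_children);
	uint32_t *cs = ring;
	unsigned int i;

	memset(cs, 0, sizeof(uint32_t) * (total - 6));
	cs += total - 6;

	/* Placeholder for the fini breadcrumb write and user interrupt. */
	for (i = 0; i < FINI_BREADCRUMB_DW + USER_INTERRUPT_DW; i++)
		*cs++ = 0xdeadbeefu;

	/* The write must land exactly on the end of the reservation. */
	assert((unsigned int)(cs - ring) == total);
	return (unsigned int)(cs - ring);
}
```

With three children the model reserves 24 dwords, zero-fills the first 18 and writes the last 6, so the skip path consumes exactly the reserved space rather than overflowing it.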
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 2ef38557b0f0..61e737fd1eee 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -3546,8 +3546,8 @@ static int emit_bb_start_child_no_preempt_mid_batch(struct i915_request *rq,
 }
 
 static u32 *
-emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
-						  u32 *cs)
+__emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
+						    u32 *cs)
 {
 	struct intel_context *ce = rq->context;
 	u8 i;
@@ -3575,6 +3575,41 @@ emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
 				  get_children_go_addr(ce),
 				  0);
 
+	return cs;
+}
+
+/*
+ * If this true, a submission of multi-lrc requests had an error and the
+ * requests need to be skipped. The front end (execuf IOCTL) should've called
+ * i915_request_skip which squashes the BB but we still need to emit the fini
+ * breadrcrumbs seqno write. At this point we don't know how many of the
+ * requests in the multi-lrc submission were generated so we can't do the
+ * handshake between the parent and children (e.g. if 4 requests should be
+ * generated but 2nd hit an error only 1 would be seen by the GuC backend).
+ * Simply skip the handshake, but still emit the breadcrumbd seqno, if an error
+ * has occurred on any of the requests in submission / relationship.
+ */
+static inline bool skip_handshake(struct i915_request *rq)
+{
+	return test_bit(I915_FENCE_FLAG_SKIP_PARALLEL, &rq->fence.flags);
+}
+
+static u32 *
+emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
+						  u32 *cs)
+{
+	struct intel_context *ce = rq->context;
+
+	GEM_BUG_ON(!intel_context_is_parent(ce));
+
+	if (unlikely(skip_handshake(rq))) {
+		memset(cs, 0, sizeof(u32) *
+		       (ce->engine->emit_fini_breadcrumb_dw - 6));
+		cs += ce->engine->emit_fini_breadcrumb_dw - 6;
+	} else {
+		cs = __emit_fini_breadcrumb_parent_no_preempt_mid_batch(rq, cs);
+	}
+
 	/* Emit fini breadcrumb */
 	cs = gen8_emit_ggtt_write(cs,
 				  rq->fence.seqno,
@@ -3591,7 +3626,8 @@ emit_fini_breadcrumb_parent_no_preempt_mid_batch(struct i915_request *rq,
 }
 
 static u32 *
-emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq, u32 *cs)
+__emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq,
+						   u32 *cs)
 {
 	struct intel_context *ce = rq->context;
 
@@ -3617,6 +3653,25 @@ emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq, u32 *cs
 	*cs++ = get_children_go_addr(ce->parent);
 	*cs++ = 0;
 
+	return cs;
+}
+
+static u32 *
+emit_fini_breadcrumb_child_no_preempt_mid_batch(struct i915_request *rq,
+						 u32 *cs)
+{
+	struct intel_context *ce = rq->context;
+
+	GEM_BUG_ON(!intel_context_is_child(ce));
+
+	if (unlikely(skip_handshake(rq))) {
+		memset(cs, 0, sizeof(u32) *
+		       (ce->engine->emit_fini_breadcrumb_dw - 6));
+		cs += ce->engine->emit_fini_breadcrumb_dw - 6;
+	} else {
+		cs = __emit_fini_breadcrumb_child_no_preempt_mid_batch(rq, cs);
+	}
+
 	/* Emit fini breadcrumb */
 	cs = gen8_emit_ggtt_write(cs,
 				  rq->fence.seqno,
If an error occurs in the front end when multi-lrc requests are getting
generated we need to skip these in the backend but we still need to
emit the breadcrumbs seqno. An issue arises because with multi-lrc
breadcrumbs there is a handshake between the parent and children to make
forward progress. If not all of the requests are present this handshake
doesn't work. To work around this, if a multi-lrc request has an error we
skip the handshake but still emit the breadcrumbs seqno.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 61 ++++++++++++++++++-
 1 file changed, 58 insertions(+), 3 deletions(-)
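The reason the handshake cannot complete when some requests are missing, as the commit message describes, can be illustrated with a toy model. This is a sketch only: `struct handshake_model`, `MAX_CHILDREN` and `parent_can_signal()` are hypothetical names, and the real handshake is done with semaphore commands in the ring, not with booleans on the CPU.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CHILDREN 8 /* arbitrary bound for this model */

/* Hypothetical model: which children have reached the join point. */
struct handshake_model {
	unsigned int num_children;
	bool child_joined[MAX_CHILDREN];
};

/*
 * The parent may write its seqno only once every child has joined,
 * unless the handshake is skipped because the submission hit an error.
 */
static bool parent_can_signal(const struct handshake_model *h, bool skip)
{
	unsigned int i;

	assert(h->num_children <= MAX_CHILDREN);

	if (skip)
		return true;

	for (i = 0; i < h->num_children; i++)
		if (!h->child_joined[i])
			return false; /* a missing child blocks the parent */

	return true;
}
```

In this model, if a child request was never generated its join flag never becomes true, so without the skip the parent would wait indefinitely; with the skip it can still write its seqno.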