Message ID | 20210819061639.21051-3-matthew.brost@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Clean up GuC CI failures, simplify locking, and kernel DOC | expand |
On Thu, Aug 19, 2021 at 02:31:51PM -0700, Daniele Ceraolo Spurio wrote: > > > On 8/18/2021 11:16 PM, Matthew Brost wrote: > > A small race that could result in incorrect accounting of the number > > of outstanding G2H. Basically prior to this patch we did not increment > > the number of outstanding G2H if we encoutered a GT reset while sending > > a H2G. This was incorrect as the context state had already been updated > > to anticipate a G2H response thus the counter should be incremented. > > > > Also always use helper when decrementing this value. > > > > Fixes: f4eb1f3fe946 ("drm/i915/guc: Ensure G2H response has space in buffer") > > Signed-off-by: Matthew Brost <matthew.brost@intel.com> > > Cc: <stable@vger.kernel.org> > > --- > > .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 24 ++++++++++--------- > > 1 file changed, 13 insertions(+), 11 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > > index 69faa39da178..32c414aa9009 100644 > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > > @@ -352,6 +352,12 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id, > > xa_unlock_irqrestore(&guc->context_lookup, flags); > > } > > +static void decr_outstanding_submission_g2h(struct intel_guc *guc) > > +{ > > + if (atomic_dec_and_test(&guc->outstanding_submission_g2h)) > > + wake_up_all(&guc->ct.wq); > > +} > > + > > static int guc_submission_send_busy_loop(struct intel_guc *guc, > > const u32 *action, > > u32 len, > > @@ -360,11 +366,13 @@ static int guc_submission_send_busy_loop(struct intel_guc *guc, > > { > > int err; > > - err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop); > > - > > - if (!err && g2h_len_dw) > > + if (g2h_len_dw) > > atomic_inc(&guc->outstanding_submission_g2h); > > + err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop); > > + if (err == -EBUSY && g2h_len_dw) > > + decr_outstanding_submission_g2h(guc); > > + > > here you're special casingĀ -EBUSY, which kind of implies that the caller > needs to handle this differently, but most callers seem to ignore the return > code. Is the counter handled somewhere else in case of EBUSY? if so, please > add a comment about it. > Good catch, this is a dead code path. Will delete. Matt > Daniele > > > return err; > > } > > @@ -616,7 +624,7 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc) > > init_sched_state(ce); > > if (pending_enable || destroyed || deregister) { > > - atomic_dec(&guc->outstanding_submission_g2h); > > + decr_outstanding_submission_g2h(guc); > > if (deregister) > > guc_signal_context_fence(ce); > > if (destroyed) { > > @@ -635,7 +643,7 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc) > > intel_engine_signal_breadcrumbs(ce->engine); > > } > > intel_context_sched_disable_unpin(ce); > > - atomic_dec(&guc->outstanding_submission_g2h); > > + decr_outstanding_submission_g2h(guc); > > spin_lock_irqsave(&ce->guc_state.lock, flags); > > guc_blocked_fence_complete(ce); > > spin_unlock_irqrestore(&ce->guc_state.lock, flags); > > @@ -2583,12 +2591,6 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx) > > return ce; > > } > > -static void decr_outstanding_submission_g2h(struct intel_guc *guc) > > -{ > > - if (atomic_dec_and_test(&guc->outstanding_submission_g2h)) > > - wake_up_all(&guc->ct.wq); > > -} > > - > > int intel_guc_deregister_done_process_msg(struct intel_guc *guc, > > const u32 *msg, > > u32 len) >
On 8/18/2021 11:16 PM, Matthew Brost wrote: > A small race that could result in incorrect accounting of the number > of outstanding G2H. Basically prior to this patch we did not increment > the number of outstanding G2H if we encoutered a GT reset while sending > a H2G. This was incorrect as the context state had already been updated > to anticipate a G2H response thus the counter should be incremented. > > Also always use helper when decrementing this value. > > Fixes: f4eb1f3fe946 ("drm/i915/guc: Ensure G2H response has space in buffer") > Signed-off-by: Matthew Brost <matthew.brost@intel.com> > Cc: <stable@vger.kernel.org> > --- > .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 24 ++++++++++--------- > 1 file changed, 13 insertions(+), 11 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > index 69faa39da178..32c414aa9009 100644 > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > @@ -352,6 +352,12 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id, > xa_unlock_irqrestore(&guc->context_lookup, flags); > } > > +static void decr_outstanding_submission_g2h(struct intel_guc *guc) > +{ > + if (atomic_dec_and_test(&guc->outstanding_submission_g2h)) > + wake_up_all(&guc->ct.wq); > +} > + > static int guc_submission_send_busy_loop(struct intel_guc *guc, > const u32 *action, > u32 len, > @@ -360,11 +366,13 @@ static int guc_submission_send_busy_loop(struct intel_guc *guc, > { > int err; > > - err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop); > - > - if (!err && g2h_len_dw) > + if (g2h_len_dw) > atomic_inc(&guc->outstanding_submission_g2h); > > + err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop); > + if (err == -EBUSY && g2h_len_dw) > + decr_outstanding_submission_g2h(guc); > + here you're special casingĀ -EBUSY, which kind of implies that the caller needs to handle this differently, but most callers seem to ignore the return code. Is the counter handled somewhere else in case of EBUSY? if so, please add a comment about it. Daniele > return err; > } > > @@ -616,7 +624,7 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc) > init_sched_state(ce); > > if (pending_enable || destroyed || deregister) { > - atomic_dec(&guc->outstanding_submission_g2h); > + decr_outstanding_submission_g2h(guc); > if (deregister) > guc_signal_context_fence(ce); > if (destroyed) { > @@ -635,7 +643,7 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc) > intel_engine_signal_breadcrumbs(ce->engine); > } > intel_context_sched_disable_unpin(ce); > - atomic_dec(&guc->outstanding_submission_g2h); > + decr_outstanding_submission_g2h(guc); > spin_lock_irqsave(&ce->guc_state.lock, flags); > guc_blocked_fence_complete(ce); > spin_unlock_irqrestore(&ce->guc_state.lock, flags); > @@ -2583,12 +2591,6 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx) > return ce; > } > > -static void decr_outstanding_submission_g2h(struct intel_guc *guc) > -{ > - if (atomic_dec_and_test(&guc->outstanding_submission_g2h)) > - wake_up_all(&guc->ct.wq); > -} > - > int intel_guc_deregister_done_process_msg(struct intel_guc *guc, > const u32 *msg, > u32 len)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 69faa39da178..32c414aa9009 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -352,6 +352,12 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id, xa_unlock_irqrestore(&guc->context_lookup, flags); } +static void decr_outstanding_submission_g2h(struct intel_guc *guc) +{ + if (atomic_dec_and_test(&guc->outstanding_submission_g2h)) + wake_up_all(&guc->ct.wq); +} + static int guc_submission_send_busy_loop(struct intel_guc *guc, const u32 *action, u32 len, @@ -360,11 +366,13 @@ static int guc_submission_send_busy_loop(struct intel_guc *guc, { int err; - err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop); - - if (!err && g2h_len_dw) + if (g2h_len_dw) atomic_inc(&guc->outstanding_submission_g2h); + err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop); + if (err == -EBUSY && g2h_len_dw) + decr_outstanding_submission_g2h(guc); + return err; } @@ -616,7 +624,7 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc) init_sched_state(ce); if (pending_enable || destroyed || deregister) { - atomic_dec(&guc->outstanding_submission_g2h); + decr_outstanding_submission_g2h(guc); if (deregister) guc_signal_context_fence(ce); if (destroyed) { @@ -635,7 +643,7 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc) intel_engine_signal_breadcrumbs(ce->engine); } intel_context_sched_disable_unpin(ce); - atomic_dec(&guc->outstanding_submission_g2h); + decr_outstanding_submission_g2h(guc); spin_lock_irqsave(&ce->guc_state.lock, flags); guc_blocked_fence_complete(ce); spin_unlock_irqrestore(&ce->guc_state.lock, flags); @@ -2583,12 +2591,6 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx) return ce; } -static void decr_outstanding_submission_g2h(struct intel_guc *guc) -{ - if (atomic_dec_and_test(&guc->outstanding_submission_g2h)) - wake_up_all(&guc->ct.wq); -} - int intel_guc_deregister_done_process_msg(struct intel_guc *guc, const u32 *msg, u32 len)
A small race that could result in incorrect accounting of the number of outstanding G2H. Basically prior to this patch we did not increment the number of outstanding G2H if we encoutered a GT reset while sending a H2G. This was incorrect as the context state had already been updated to anticipate a G2H response thus the counter should be incremented. Also always use helper when decrementing this value. Fixes: f4eb1f3fe946 ("drm/i915/guc: Ensure G2H response has space in buffer") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> --- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 24 ++++++++++--------- 1 file changed, 13 insertions(+), 11 deletions(-)