Message ID: 20171019183619.6235-14-michal.winiarski@intel.com (mailing list archive)
State: New, archived
Quoting Michał Winiarski (2017-10-19 19:36:18)
> With GuC, we're scheduling tasklet on USER_INTERRUPT - since some work
> has completed, perhaps we're able to submit some more work. We're doing
> similar thing for preemption - after preemption has completed, it's time
> to schedule the tasklet and submit more work (since the engine is now
> idle). Unfortunately, we can hit the scenarios where the preemption is
> done, but the interrupt is nowhere to be seen. To work around the
> problem, let's use a delayed work that's kicking the tasklet if
> preemption is done, and queueing itself otherwise.

I'm not buying it yet. Missing USER_INTERRUPT would not be guc specific
(surely?) and so we would be getting similar missed-breadcrumb warnings
on skl+. We dedicate a lot of CI towards detecting those...

The first alternative that springs to mind is bad ordering between ggtt
write and MI_USER_INTERRUPT, but that too is common to execlists, and
also results in missed-breadcrumb warnings (which are notable by their
absence). Though having said that, the HWSP ordering issue with
intel_iommu is worrying me that we have similar issues with breadcrumbs
+ intel_iommu. Is iommu a factor?

In short, if this is a problem here, it should be a problem everywhere.
Right?
-Chris
Quoting Michał Winiarski (2017-10-19 19:36:18)
> +#define GUC_LOST_IRQ_WORK_DELAY_MS 100
> +static void guc_lost_user_interrupt(struct work_struct *work)
> +{
> +	struct guc_preempt_work *preempt_work =
> +		container_of(to_delayed_work(work), typeof(*preempt_work),
> +			     lost_irq_work);
> +	struct intel_engine_cs *engine = preempt_work->engine;
> +	struct intel_guc *guc = &engine->i915->guc;

Since you switched preempt_work to be stored within intel_guc, you can
use container_of to get to the guc and cut out the dance. Here and in
inject_preempt_context.
-Chris
Quoting Chris Wilson (2017-10-19 20:44:41)
> Quoting Michał Winiarski (2017-10-19 19:36:18)
> > With GuC, we're scheduling tasklet on USER_INTERRUPT - since some work
> > has completed, perhaps we're able to submit some more work. We're doing
> > similar thing for preemption - after preemption has completed, it's time
> > to schedule the tasklet and submit more work (since the engine is now
> > idle). Unfortunately, we can hit the scenarios where the preemption is
> > done, but the interrupt is nowhere to be seen. To work around the
> > problem, let's use a delayed work that's kicking the tasklet if
> > preemption is done, and queueing itself otherwise.
>
> I'm not buying it yet.

I think I know what it is. I think it is as simple as us turning off the
irq when there's no more requests being signaled on the engine.
-Chris
On 20/10/17 10:22, Chris Wilson wrote:
> Quoting Chris Wilson (2017-10-19 20:44:41)
>> Quoting Michał Winiarski (2017-10-19 19:36:18)
>>> With GuC, we're scheduling tasklet on USER_INTERRUPT - since some work
>>> has completed, perhaps we're able to submit some more work. We're doing
>>> similar thing for preemption - after preemption has completed, it's time
>>> to schedule the tasklet and submit more work (since the engine is now
>>> idle). Unfortunately, we can hit the scenarios where the preemption is
>>> done, but the interrupt is nowhere to be seen. To work around the
>>> problem, let's use a delayed work that's kicking the tasklet if
>>> preemption is done, and queueing itself otherwise.
>>
>> I'm not buying it yet.
>
> I think I know what it is. I think it is as simple as us turning off the
> irq when there's no more requests being signaled on the engine.
> -Chris

I can confirm this, I was just about to reply with the same findings ;)
By hackily adding USER_INTERRUPT to irq_keep_mask we've been able to run
exec_whisper@fds-priority several times in a row with no problems.

Daniele
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index a11ed4deff4b..dbb03b5481d2 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -574,6 +574,26 @@ static void flush_ggtt_writes(struct i915_vma *vma)
 	POSTING_READ_FW(GUC_STATUS);
 }
 
+#define GUC_LOST_IRQ_WORK_DELAY_MS 100
+static void guc_lost_user_interrupt(struct work_struct *work)
+{
+	struct guc_preempt_work *preempt_work =
+		container_of(to_delayed_work(work), typeof(*preempt_work),
+			     lost_irq_work);
+	struct intel_engine_cs *engine = preempt_work->engine;
+	struct intel_guc *guc = &engine->i915->guc;
+	struct guc_shared_ctx_data *data = guc->shared_data_vaddr;
+	struct guc_ctx_report *report = &data->preempt_ctx_report[engine->guc_id];
+
+	if (report->report_return_status == INTEL_GUC_REPORT_STATUS_COMPLETE)
+		tasklet_schedule(&engine->execlists.irq_tasklet);
+	else
+		queue_delayed_work(guc->preempt_wq,
+				   &preempt_work->lost_irq_work,
+				   msecs_to_jiffies(GUC_LOST_IRQ_WORK_DELAY_MS));
+
+}
+
 #define GUC_PREEMPT_FINISHED 0x1
 #define GUC_PREEMPT_BREADCRUMB_DWORDS 0x8
 static void inject_preempt_context(struct work_struct *work)
@@ -629,7 +649,13 @@ static void inject_preempt_context(struct work_struct *work)
 	if (WARN_ON(intel_guc_send(guc, data, ARRAY_SIZE(data)))) {
 		WRITE_ONCE(engine->execlists.preempt, false);
 		tasklet_schedule(&engine->execlists.irq_tasklet);
+
+		return;
 	}
+
+	queue_delayed_work(engine->i915->guc.preempt_wq,
+			   &preempt_work->lost_irq_work,
+			   msecs_to_jiffies(GUC_LOST_IRQ_WORK_DELAY_MS));
 }
 
 /*
@@ -647,6 +673,10 @@ static void wait_for_guc_preempt_report(struct intel_engine_cs *engine)
 	struct guc_shared_ctx_data *data = guc->shared_data_vaddr;
 	struct guc_ctx_report *report = &data->preempt_ctx_report[engine->guc_id];
 
+	/* If we landed here, it means that we didn't lose an interrupt, and
+	 * we can safely cancel the worker */
+	cancel_delayed_work(&guc->preempt_work[engine->id].lost_irq_work);
+
 	WARN_ON(wait_for_atomic(report->report_return_status ==
 				INTEL_GUC_REPORT_STATUS_COMPLETE,
 				GUC_PREEMPT_POSTPROCESS_DELAY_MS));
@@ -1229,6 +1259,8 @@ static int guc_preempt_work_create(struct intel_guc *guc)
 	for_each_engine(engine, dev_priv, id) {
 		guc->preempt_work[id].engine = engine;
 		INIT_WORK(&guc->preempt_work[id].work, inject_preempt_context);
+		INIT_DELAYED_WORK(&guc->preempt_work[id].lost_irq_work,
+				  guc_lost_user_interrupt);
 	}
 
 	return 0;
@@ -1240,8 +1272,10 @@ static void guc_preempt_work_destroy(struct intel_guc *guc)
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
-	for_each_engine(engine, dev_priv, id)
+	for_each_engine(engine, dev_priv, id) {
+		cancel_delayed_work_sync(&guc->preempt_work[id].lost_irq_work);
 		cancel_work_sync(&guc->preempt_work[id].work);
+	}
 
 	destroy_workqueue(guc->preempt_wq);
 }
diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
index 7273a6be7dc1..0c9338b5c4b8 100644
--- a/drivers/gpu/drm/i915/intel_guc.h
+++ b/drivers/gpu/drm/i915/intel_guc.h
@@ -49,6 +49,7 @@ enum i915_guc_client_id {
 struct guc_preempt_work {
 	struct intel_engine_cs *engine;
 	struct work_struct work;
+	struct delayed_work lost_irq_work;
 };
 
 struct intel_guc {
With GuC, we're scheduling tasklet on USER_INTERRUPT - since some work
has completed, perhaps we're able to submit some more work. We're doing
similar thing for preemption - after preemption has completed, it's time
to schedule the tasklet and submit more work (since the engine is now
idle). Unfortunately, we can hit the scenarios where the preemption is
done, but the interrupt is nowhere to be seen. To work around the
problem, let's use a delayed work that's kicking the tasklet if
preemption is done, and queueing itself otherwise.

Testcase: igt/gem_exec_whisper/*-priority
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_guc_submission.c | 36 +++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_guc.h           |  1 +
 2 files changed, 36 insertions(+), 1 deletion(-)