Message ID | 1431356607-25092-1-git-send-email-peter.antoine@intel.com |
---|---|
State | New, archived |
On Mon, May 11, 2015 at 04:03:27PM +0100, Peter Antoine wrote:
> If a batch ends while the IRQs are not turned on, the notification can
> go missing and the GPU can hang. So generate a warning in this case.
>
> Signed-off-by: Peter Antoine <peter.antoine@intel.com>

Queued for -next, thanks for the patch.
-Daniel
On Mon, May 11, 2015 at 06:37:14PM +0200, Daniel Vetter wrote:
> Queued for -next, thanks for the patch.

Please drop this. We already have the test inside the irq handler. The
instance where you want the guard is during resume, inside
intel_lr_context_render_state_init() where you can even emit an error
code!
-Chris
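The sketch below models the alternative Chris describes. It is a minimal userspace version in which the IRQ check runs once, at a setup boundary, and reports an error code instead of warning on every submission. Several names are hypothetical and chosen only for illustration: `fake_device`, its `irqs_enabled` flag (standing in for the driver's `intel_irqs_enabled()` query), `render_state_init_sketch()`, and the use of `-EIO`. None of this is code from the i915 driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the device state; the real driver asks
 * intel_irqs_enabled(dev_priv) instead of reading a flag. */
struct fake_device {
	bool irqs_enabled;
	int submissions; /* batches actually handed to the "GPU" */
};

/*
 * Hypothetical one-time setup step, in the spirit of
 * intel_lr_context_render_state_init() during resume: the IRQ check
 * happens once, at the boundary, and the caller gets an error code
 * instead of a warning in the submission hot path.
 */
static int render_state_init_sketch(struct fake_device *dev)
{
	assert(dev != NULL);

	if (!dev->irqs_enabled) {
		fprintf(stderr, "render state init: IRQs disabled, not submitting\n");
		return -EIO; /* illustrative choice of error code */
	}

	dev->submissions++; /* stand-in for submitting the render-state batch */
	return 0;
}
```

A caller on the resume path would propagate the negative return value. The per-request submission and interrupt paths then carry no extra check.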
On Mon, May 11, 2015 at 09:26:40PM +0100, Chris Wilson wrote:
> Please drop this. We already have the test inside the irq handler. The
> instance where you want the guard is during resume, inside
> intel_lr_context_render_state_init() where you can even emit an error
> code!

_unqueue is also called from intel_logical_ring_advance_and_submit and
should cover us I hope. And I don't think we should add error handling for
this; that has a cost of its own.
-Daniel
On Tue, May 12, 2015 at 08:42:58AM +0200, Daniel Vetter wrote:
> _unqueue is also called from intel_logical_ring_advance_and_submit and
> should cover us I hope. And I don't think we should add error handling for
> this; that has a cost of its own.

Pardon? Your explanation for adding it to the *interrupt* handler after
an existing check is that it also catches the initial submission?
-Chris
On Tue, May 12, 2015 at 09:18:00AM +0100, Chris Wilson wrote:
> Pardon? Your explanation for adding it to the *interrupt* handler after
> an existing check is that it also catches the initial submission?

I wanted to add it as low down as possible to increase the chances of the
check surviving refactoring. This way it's as close as possible to the
writes to the submit ports. Yes, that means it's also run from interrupt
context when requeueing, but I'm fairly meh about that. What's your
concern?
-Daniel
On Tue, May 12, 2015 at 10:42:41AM +0200, Daniel Vetter wrote:
> I wanted to add it as low down as possible to increase the chances of the
> check surviving refactoring. This way it's as close as possible to the
> writes to the submit ports. Yes, that means it's also run from interrupt
> context when requeueing, but I'm fairly meh about that. What's your
> concern?

The interrupt handler + submission is the rate-limiting factor in execlists
throughput. Tests for external factors really should be on the boundary.
-Chris
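The cost argument can be made concrete with a small model of where the patch puts the check. This is a hypothetical userspace sketch. Every name in it is invented for illustration, and `WARN_ON_SKETCH` only approximates the kernel's `WARN_ON()`. It relies on two points from the thread: the check sits in the unqueue helper, and that helper is reached both from the submission path and from the interrupt handler's requeue. So the check runs once per submission and once per interrupt, not once per setup as a boundary check would.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's WARN_ON(): report when the condition
 * is true and hand the condition back (the kernel version also dumps a
 * backtrace). */
static bool warn_on_sketch(bool cond, const char *expr)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", expr);
	return cond;
}
#define WARN_ON_SKETCH(cond) warn_on_sketch(!!(cond), #cond)

/* Hypothetical engine model: a queue depth, a lock flag and a counter. */
struct fake_engine {
	bool irqs_enabled;
	bool lock_held;   /* models ring->execlist_lock being held */
	int pending;      /* requests waiting in the execlist queue */
	long irq_checks;  /* how often the IRQ check has run */
};

/* Models execlists_context_unqueue() with the patch applied: the check
 * runs on every call, before the early return for an empty queue. */
static void context_unqueue_sketch(struct fake_engine *e)
{
	assert(e->lock_held); /* models assert_spin_locked() */

	e->irq_checks++;
	WARN_ON_SKETCH(!e->irqs_enabled);

	if (e->pending == 0)
		return;
	e->pending--; /* stand-in for writing the submit ports */
}

/* Submission path (cf. intel_logical_ring_advance_and_submit()). */
static void submit_sketch(struct fake_engine *e)
{
	e->lock_held = true;
	e->pending++;
	context_unqueue_sketch(e);
	e->lock_held = false;
}

/* Interrupt path: requeue after a context-switch event. */
static void irq_requeue_sketch(struct fake_engine *e)
{
	e->lock_held = true;
	context_unqueue_sketch(e);
	e->lock_held = false;
}
```

Driving this model with 1000 submissions, each followed by one interrupt, runs the check 2000 times. All of those runs are on the path Chris identifies as limiting execlists throughput. Whether that cost is measurable on real hardware is not something this sketch can answer.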
Tested-By: Intel Graphics QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 6381
-------------------------------------Summary-------------------------------------
Platform | Delta | drm-intel-nightly | Series Applied |
---|---|---|---|
PNV | | 276/276 | 276/276 |
ILK | | 302/302 | 302/302 |
SNB | -1 | 314/314 | 313/314 |
IVB | | 338/338 | 338/338 |
BYT | | 286/286 | 286/286 |
BDW | -1 | 320/320 | 319/320 |
-------------------------------------Detailed-------------------------------------
Platform Test drm-intel-nightly Series Applied
SNB igt@pm_rpm@dpms-mode-unset-non-lpsp DMESG_WARN(13) PASS(1) DMESG_WARN(1)
(dmesg patch applied)WARNING:at_drivers/gpu/drm/i915/intel_uncore.c:#assert_device_not_suspended[i915]()@WARNING:.* at .* assert_device_not_suspended+0x
*BDW igt@gem_gtt_hog PASS(3) DMESG_WARN(1)
(dmesg patch applied)WARNING:at_drivers/gpu/drm/i915/intel_display.c:#assert_plane[i915]()@WARNING:.* at .* assert_plane
assertion_failure@assertion failure
WARNING:at_drivers/gpu/drm/drm_irq.c:#drm_wait_one_vblank[drm]()@WARNING:.* at .* drm_wait_one_vblank+0x
Note: You need to pay more attention to lines starting with '*'.
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0fa9209..0413b8f 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -394,6 +394,12 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
 
 	assert_spin_locked(&ring->execlist_lock);
 
+	/*
+	 * If irqs are not active generate a warning as batches that finish
+	 * without the irqs may get lost and a GPU Hang may occur.
+	 */
+	WARN_ON(!intel_irqs_enabled(ring->dev->dev_private));
+
 	if (list_empty(&ring->execlist_queue))
 		return;
If a batch ends while the IRQs are not turned on, the notification can
go missing and the GPU can hang. So generate a warning in this case.

Signed-off-by: Peter Antoine <peter.antoine@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 6 ++++++
 1 file changed, 6 insertions(+)