Message ID | 1433272019-21818-1-git-send-email-arun.siluvery@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Tested-By: Intel Graphics QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 6526
-------------------------------------Summary-------------------------------------
Platform Delta drm-intel-nightly Series Applied
PNV 276/276 276/276
ILK 303/303 303/303
SNB -1 315/315 314/315
IVB 343/343 343/343
BYT 287/287 287/287
BDW 321/321 321/321
-------------------------------------Detailed-------------------------------------
Platform Test drm-intel-nightly Series Applied
*SNB igt@pm_rpm@dpms-mode-unset-non-lpsp PASS(1) DMESG_WARN(1)
(dmesg patch applied)WARNING:at_drivers/gpu/drm/i915/intel_uncore.c:#assert_device_not_suspended[i915]()@WARNING:.* at .* assert_device_not_suspended+0x
Note: You need to pay more attention to line start with '*'
On Tue, Jun 02, 2015 at 08:06:59PM +0100, Arun Siluvery wrote: > After GPU reset, HW is losing the address of HWS page in the register. > The page itself is valid except that HW is not aware of its location. > > [ 64.368623] [drm:gen8_init_common_ring [i915]] *ERROR* HWS Page address = 0x00000000 > [ 64.368655] [drm:gen8_init_common_ring [i915]] *ERROR* HWS Page address = 0x00000000 > [ 64.368681] [drm:gen8_init_common_ring [i915]] *ERROR* HWS Page address = 0x00000000 > [ 64.368704] [drm:gen8_init_common_ring [i915]] *ERROR* HWS Page address = 0x00000000 > > This patch reloads this value into the register during ring init. > > Change-Id: Ibdd1e4645921b4deb02dfc8d0d8e6ba993ce7371 > Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> > --- > drivers/gpu/drm/i915/intel_lrc.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c > index 0413b8f..73033f5 100644 > --- a/drivers/gpu/drm/i915/intel_lrc.c > +++ b/drivers/gpu/drm/i915/intel_lrc.c > @@ -1085,6 +1085,12 @@ static int gen8_init_common_ring(struct intel_engine_cs *ring) > I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask)); > I915_WRITE(RING_HWSTAM(ring->mmio_base), 0xffffffff); > > + if (ring->status_page.obj) { > + I915_WRITE(RING_HWS_PGA(ring->mmio_base), > + (u32)ring->status_page.gfx_addr); > + POSTING_READ(RING_HWS_PGA(ring->mmio_base)); > + } I was going to suggest removing the same thing from the lrc_setup_hardware_status_page(), but after another look it seems we sometimes call .init_hw() before the context setup. Would be nice to have a more consistent sequence for init and reset. But anyway the patch looks OK to me. I verified that we indeed lose this register on GPU reset. Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> > + > I915_WRITE(RING_MODE_GEN7(ring), > _MASKED_BIT_DISABLE(GFX_REPLAY_MODE) | > _MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE)); > -- > 2.3.0 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Wed, 03 Jun 2015, Ville Syrjälä <ville.syrjala@linux.intel.com> wrote: > On Tue, Jun 02, 2015 at 08:06:59PM +0100, Arun Siluvery wrote: >> After GPU reset, HW is losing the address of HWS page in the register. >> The page itself is valid except that HW is not aware of its location. >> >> [ 64.368623] [drm:gen8_init_common_ring [i915]] *ERROR* HWS Page address = 0x00000000 >> [ 64.368655] [drm:gen8_init_common_ring [i915]] *ERROR* HWS Page address = 0x00000000 >> [ 64.368681] [drm:gen8_init_common_ring [i915]] *ERROR* HWS Page address = 0x00000000 >> [ 64.368704] [drm:gen8_init_common_ring [i915]] *ERROR* HWS Page address = 0x00000000 >> >> This patch reloads this value into the register during ring init. >> >> Change-Id: Ibdd1e4645921b4deb02dfc8d0d8e6ba993ce7371 >> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> >> --- >> drivers/gpu/drm/i915/intel_lrc.c | 6 ++++++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c >> index 0413b8f..73033f5 100644 >> --- a/drivers/gpu/drm/i915/intel_lrc.c >> +++ b/drivers/gpu/drm/i915/intel_lrc.c >> @@ -1085,6 +1085,12 @@ static int gen8_init_common_ring(struct intel_engine_cs *ring) >> I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask)); >> I915_WRITE(RING_HWSTAM(ring->mmio_base), 0xffffffff); >> >> + if (ring->status_page.obj) { >> + I915_WRITE(RING_HWS_PGA(ring->mmio_base), >> + (u32)ring->status_page.gfx_addr); >> + POSTING_READ(RING_HWS_PGA(ring->mmio_base)); >> + } > > I was going to suggest removing the same thing from the > lrc_setup_hardware_status_page(), but after another look it seems we > sometimes call .init_hw() before the context setup. Would be nice to > have a more consistent sequence for init and reset. But anyway the patch > looks OK to me. I verified that we indeed lose this register on GPU > reset. > > Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Pushed to drm-intel-fixes, thanks for the patch and review. BR, Jani. > >> + >> I915_WRITE(RING_MODE_GEN7(ring), >> _MASKED_BIT_DISABLE(GFX_REPLAY_MODE) | >> _MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE)); >> -- >> 2.3.0 >> >> _______________________________________________ >> Intel-gfx mailing list >> Intel-gfx@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/intel-gfx > > -- > Ville Syrjälä > Intel OTC > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Wed, Jun 3, 2015 at 6:14 PM, Ville Syrjälä <ville.syrjala@linux.intel.com> wrote: > I was going to suggest removing the same thing from the > lrc_setup_hardware_status_page(), but after another look it seems we > sometimes call .init_hw() before the context setup. Would be nice to > have a more consistent sequence for init and reset. But anyway the patch > looks OK to me. I verified that we indeed lose this register on GPU > reset. Yep, this is a mess. And historically _any_ difference between driver load and gpu reset (or resume fwiw) has lead to hilarious bugs, so this difference is really troubling to me. Arun, can you please work on a patch to unify the setup sequence here, so that both driver load gpu resets work the same way? By the time we're calling gem_init_hw the default context should have been created already, and hence we should be able to write to HWS_PGA in ring->init_hw only. Also I wonder about resume, where's the HWS_PGA restore for that case? -Daniel
On 15/06/2015 06:20, Daniel Vetter wrote: > On Wed, Jun 3, 2015 at 6:14 PM, Ville Syrjälä > <ville.syrjala@linux.intel.com> wrote: >> I was going to suggest removing the same thing from the >> lrc_setup_hardware_status_page(), but after another look it seems we >> sometimes call .init_hw() before the context setup. Would be nice to >> have a more consistent sequence for init and reset. But anyway the patch >> looks OK to me. I verified that we indeed lose this register on GPU >> reset. > > Yep, this is a mess. And historically _any_ difference between driver > load and gpu reset (or resume fwiw) has lead to hilarious bugs, so > this difference is really troubling to me. Arun, can you please work > on a patch to unify the setup sequence here, so that both driver load > gpu resets work the same way? By the time we're calling gem_init_hw > the default context should have been created already, and hence we > should be able to write to HWS_PGA in ring->init_hw only. > Hi Daniel, I think the problem in this case was the code to init HWS page after reset was missing for Gen8+. For Gen7 we are doing this as part of ring->init_hw. Gen7: i915_reset() +--> i915_gem_init_hw() +------> ring->init_hw() which is init_render_ring() +----------> init_ring_common() +------------> intel_ring_setup_status_page() Gen8: i915_reset() +--> i915_gem_init_hw() +------> ring->init_hw() which is gen8_init_render_ring() +--------> gen8_init_common_ring() - I added changes in this function. We could probably use intel_ring_setup_status_page() for both cases, does it have to be Gen7 specific? > Also I wonder about resume, where's the HWS_PGA restore for that case? It is covered. i915_drm_resume() +-->i915_gem_init_hw regards Arun > -Daniel >
On Thu, Jun 18, 2015 at 03:05:12PM +0100, Siluvery, Arun wrote: > On 15/06/2015 06:20, Daniel Vetter wrote: > >On Wed, Jun 3, 2015 at 6:14 PM, Ville Syrjälä > ><ville.syrjala@linux.intel.com> wrote: > >>I was going to suggest removing the same thing from the > >>lrc_setup_hardware_status_page(), but after another look it seems we > >>sometimes call .init_hw() before the context setup. Would be nice to > >>have a more consistent sequence for init and reset. But anyway the patch > >>looks OK to me. I verified that we indeed lose this register on GPU > >>reset. > > > >Yep, this is a mess. And historically _any_ difference between driver > >load and gpu reset (or resume fwiw) has lead to hilarious bugs, so > >this difference is really troubling to me. Arun, can you please work > >on a patch to unify the setup sequence here, so that both driver load > >gpu resets work the same way? By the time we're calling gem_init_hw > >the default context should have been created already, and hence we > >should be able to write to HWS_PGA in ring->init_hw only. > > > > Hi Daniel, > > I think the problem in this case was the code to init HWS page after reset > was missing for Gen8+. For Gen7 we are doing this as part of ring->init_hw. > > Gen7: > i915_reset() > +--> i915_gem_init_hw() > +------> ring->init_hw() which is init_render_ring() > +----------> init_ring_common() > +------------> intel_ring_setup_status_page() > > Gen8: > i915_reset() > +--> i915_gem_init_hw() > +------> ring->init_hw() which is gen8_init_render_ring() > +--------> gen8_init_common_ring() - I added changes in this function. > > We could probably use intel_ring_setup_status_page() for both cases, does it > have to be Gen7 specific? My concern isn't that we have two functions doing hws setup. My concern is that we now have 2 callsites for execlist mode doing hws setup, with slight differences between reset/driver load and resume. I want one, unconditional call to set up the hws page at exactly the right place in the setup sequence. That might require some refactoring, I haven't looked that closely at intel_lrc.c The usual approach is that gem_init does exclusively software setup, and gem_init_hw does all the register writes an actual enabling. I think the hws setup in the driver load code is currently called from gem_init(), which is the wrong place. > >Also I wonder about resume, where's the HWS_PGA restore for that case? > It is covered. > > i915_drm_resume() > +-->i915_gem_init_hw Ok, so should be covered with whatever fix we have for gpu reset. Thanks, Daniel
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 0413b8f..73033f5 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1085,6 +1085,12 @@ static int gen8_init_common_ring(struct intel_engine_cs *ring) I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask)); I915_WRITE(RING_HWSTAM(ring->mmio_base), 0xffffffff); + if (ring->status_page.obj) { + I915_WRITE(RING_HWS_PGA(ring->mmio_base), + (u32)ring->status_page.gfx_addr); + POSTING_READ(RING_HWS_PGA(ring->mmio_base)); + } + I915_WRITE(RING_MODE_GEN7(ring), _MASKED_BIT_DISABLE(GFX_REPLAY_MODE) | _MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE));
After GPU reset, HW is losing the address of HWS page in the register. The page itself is valid except that HW is not aware of its location. [ 64.368623] [drm:gen8_init_common_ring [i915]] *ERROR* HWS Page address = 0x00000000 [ 64.368655] [drm:gen8_init_common_ring [i915]] *ERROR* HWS Page address = 0x00000000 [ 64.368681] [drm:gen8_init_common_ring [i915]] *ERROR* HWS Page address = 0x00000000 [ 64.368704] [drm:gen8_init_common_ring [i915]] *ERROR* HWS Page address = 0x00000000 This patch reloads this value into the register during ring init. Change-Id: Ibdd1e4645921b4deb02dfc8d0d8e6ba993ce7371 Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> --- drivers/gpu/drm/i915/intel_lrc.c | 6 ++++++ 1 file changed, 6 insertions(+)