Message ID | 20210721062607.512307-1-zhenyuw@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | drm/i915/gvt: Fix cached atomics setting for Windows VM |
On Wed, Jul 21, 2021 at 8:21 AM Zhenyu Wang <zhenyuw@linux.intel.com> wrote:
> We've seen recent regression with host and windows VM running
> simultaneously that cause gpu hang or even crash. Finally bisect to
> 58586680ffad ("drm/i915: Disable atomics in L3 for gen9"), which seems
> cached atomics behavior difference caused regression issue.
>
> This trys to add new scratch register handler and add those in mmio
> save/restore list for context switch. No gpu hang produced with this one.
>
> Cc: stable@vger.kernel.org # 5.12+
> Cc: "Xu, Terrence" <terrence.xu@intel.com>
> Fixes: 58586680ffad ("drm/i915: Disable atomics in L3 for gen9")
> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>

Adding Jon Bloomfield, since different settings between linux and
windows for something that can hard-hang the machine on gen9 sounds
... not good.
-Daniel

> ---
>  drivers/gpu/drm/i915/gvt/handlers.c     | 1 +
>  drivers/gpu/drm/i915/gvt/mmio_context.c | 2 ++
>  2 files changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gvt/handlers.c b/drivers/gpu/drm/i915/gvt/handlers.c
> index 98eb48c24c46..345b4be5ebad 100644
> --- a/drivers/gpu/drm/i915/gvt/handlers.c
> +++ b/drivers/gpu/drm/i915/gvt/handlers.c
> @@ -3134,6 +3134,7 @@ static int init_bdw_mmio_info(struct intel_gvt *gvt)
>  	MMIO_DFH(_MMIO(0xb100), D_BDW, F_CMD_ACCESS, NULL, NULL);
>  	MMIO_DFH(_MMIO(0xb10c), D_BDW, F_CMD_ACCESS, NULL, NULL);
>  	MMIO_D(_MMIO(0xb110), D_BDW);
> +	MMIO_D(GEN9_SCRATCH_LNCF1, D_BDW_PLUS);
>
>  	MMIO_F(_MMIO(0x24d0), 48, F_CMD_ACCESS | F_CMD_WRITE_PATCH, 0, 0,
>  		D_BDW_PLUS, NULL, force_nonpriv_write);
> diff --git a/drivers/gpu/drm/i915/gvt/mmio_context.c b/drivers/gpu/drm/i915/gvt/mmio_context.c
> index b8ac80765461..f776c470914d 100644
> --- a/drivers/gpu/drm/i915/gvt/mmio_context.c
> +++ b/drivers/gpu/drm/i915/gvt/mmio_context.c
> @@ -105,6 +105,8 @@ static struct engine_mmio gen9_engine_mmio_list[] __cacheline_aligned = {
>  	{RCS0, COMMON_SLICE_CHICKEN2, 0xffff, true}, /* 0x7014 */
>  	{RCS0, GEN9_CS_DEBUG_MODE1, 0xffff, false}, /* 0x20ec */
>  	{RCS0, GEN8_L3SQCREG4, 0, false}, /* 0xb118 */
> +	{RCS0, GEN9_SCRATCH1, 0, false}, /* 0xb11c */
> +	{RCS0, GEN9_SCRATCH_LNCF1, 0, false}, /* 0xb008 */
>  	{RCS0, GEN7_HALF_SLICE_CHICKEN1, 0xffff, true}, /* 0xe100 */
>  	{RCS0, HALF_SLICE_CHICKEN2, 0xffff, true}, /* 0xe180 */
>  	{RCS0, HALF_SLICE_CHICKEN3, 0xffff, true}, /* 0xe184 */
> --
> 2.32.0.rc2
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Wed, Jul 21, 2021 at 4:08 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Wed, Jul 21, 2021 at 8:21 AM Zhenyu Wang <zhenyuw@linux.intel.com> wrote:
> > We've seen recent regression with host and windows VM running
> > simultaneously that cause gpu hang or even crash. Finally bisect to
> > 58586680ffad ("drm/i915: Disable atomics in L3 for gen9"), which seems
> > cached atomics behavior difference caused regression issue.
> >
> > This trys to add new scratch register handler and add those in mmio
> > save/restore list for context switch. No gpu hang produced with this one.
> >
> > Cc: stable@vger.kernel.org # 5.12+
> > Cc: "Xu, Terrence" <terrence.xu@intel.com>
> > Fixes: 58586680ffad ("drm/i915: Disable atomics in L3 for gen9")
> > Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
>
> Adding Jon Bloomfield, since different settings between linux and
> windows for something that can hard-hang the machine on gen9 sounds
> ... not good.

The difference there is legit and intentional. As far as what we do
about it for GVT, if we can safely smash L3 atomics off underneath
Windows without causing problems for the VM, we should do that. If
not, we need to discuss this internally before proceeding.

--Jason

> -Daniel
>
> > ---
> >  drivers/gpu/drm/i915/gvt/handlers.c     | 1 +
> >  drivers/gpu/drm/i915/gvt/mmio_context.c | 2 ++
> >  2 files changed, 3 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/gvt/handlers.c b/drivers/gpu/drm/i915/gvt/handlers.c
> > index 98eb48c24c46..345b4be5ebad 100644
> > --- a/drivers/gpu/drm/i915/gvt/handlers.c
> > +++ b/drivers/gpu/drm/i915/gvt/handlers.c
> > @@ -3134,6 +3134,7 @@ static int init_bdw_mmio_info(struct intel_gvt *gvt)
> >  	MMIO_DFH(_MMIO(0xb100), D_BDW, F_CMD_ACCESS, NULL, NULL);
> >  	MMIO_DFH(_MMIO(0xb10c), D_BDW, F_CMD_ACCESS, NULL, NULL);
> >  	MMIO_D(_MMIO(0xb110), D_BDW);
> > +	MMIO_D(GEN9_SCRATCH_LNCF1, D_BDW_PLUS);
> >
> >  	MMIO_F(_MMIO(0x24d0), 48, F_CMD_ACCESS | F_CMD_WRITE_PATCH, 0, 0,
> >  		D_BDW_PLUS, NULL, force_nonpriv_write);
> > diff --git a/drivers/gpu/drm/i915/gvt/mmio_context.c b/drivers/gpu/drm/i915/gvt/mmio_context.c
> > index b8ac80765461..f776c470914d 100644
> > --- a/drivers/gpu/drm/i915/gvt/mmio_context.c
> > +++ b/drivers/gpu/drm/i915/gvt/mmio_context.c
> > @@ -105,6 +105,8 @@ static struct engine_mmio gen9_engine_mmio_list[] __cacheline_aligned = {
> >  	{RCS0, COMMON_SLICE_CHICKEN2, 0xffff, true}, /* 0x7014 */
> >  	{RCS0, GEN9_CS_DEBUG_MODE1, 0xffff, false}, /* 0x20ec */
> >  	{RCS0, GEN8_L3SQCREG4, 0, false}, /* 0xb118 */
> > +	{RCS0, GEN9_SCRATCH1, 0, false}, /* 0xb11c */
> > +	{RCS0, GEN9_SCRATCH_LNCF1, 0, false}, /* 0xb008 */
> >  	{RCS0, GEN7_HALF_SLICE_CHICKEN1, 0xffff, true}, /* 0xe100 */
> >  	{RCS0, HALF_SLICE_CHICKEN2, 0xffff, true}, /* 0xe180 */
> >  	{RCS0, HALF_SLICE_CHICKEN3, 0xffff, true}, /* 0xe184 */
> > --
> > 2.32.0.rc2
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
diff --git a/drivers/gpu/drm/i915/gvt/handlers.c b/drivers/gpu/drm/i915/gvt/handlers.c
index 98eb48c24c46..345b4be5ebad 100644
--- a/drivers/gpu/drm/i915/gvt/handlers.c
+++ b/drivers/gpu/drm/i915/gvt/handlers.c
@@ -3134,6 +3134,7 @@ static int init_bdw_mmio_info(struct intel_gvt *gvt)
 	MMIO_DFH(_MMIO(0xb100), D_BDW, F_CMD_ACCESS, NULL, NULL);
 	MMIO_DFH(_MMIO(0xb10c), D_BDW, F_CMD_ACCESS, NULL, NULL);
 	MMIO_D(_MMIO(0xb110), D_BDW);
+	MMIO_D(GEN9_SCRATCH_LNCF1, D_BDW_PLUS);
 
 	MMIO_F(_MMIO(0x24d0), 48, F_CMD_ACCESS | F_CMD_WRITE_PATCH, 0, 0,
 		D_BDW_PLUS, NULL, force_nonpriv_write);
diff --git a/drivers/gpu/drm/i915/gvt/mmio_context.c b/drivers/gpu/drm/i915/gvt/mmio_context.c
index b8ac80765461..f776c470914d 100644
--- a/drivers/gpu/drm/i915/gvt/mmio_context.c
+++ b/drivers/gpu/drm/i915/gvt/mmio_context.c
@@ -105,6 +105,8 @@ static struct engine_mmio gen9_engine_mmio_list[] __cacheline_aligned = {
 	{RCS0, COMMON_SLICE_CHICKEN2, 0xffff, true}, /* 0x7014 */
 	{RCS0, GEN9_CS_DEBUG_MODE1, 0xffff, false}, /* 0x20ec */
 	{RCS0, GEN8_L3SQCREG4, 0, false}, /* 0xb118 */
+	{RCS0, GEN9_SCRATCH1, 0, false}, /* 0xb11c */
+	{RCS0, GEN9_SCRATCH_LNCF1, 0, false}, /* 0xb008 */
 	{RCS0, GEN7_HALF_SLICE_CHICKEN1, 0xffff, true}, /* 0xe100 */
 	{RCS0, HALF_SLICE_CHICKEN2, 0xffff, true}, /* 0xe180 */
 	{RCS0, HALF_SLICE_CHICKEN3, 0xffff, true}, /* 0xe184 */
We've seen a recent regression where running the host and a Windows VM
simultaneously causes a GPU hang or even a crash. We bisected it to
58586680ffad ("drm/i915: Disable atomics in L3 for gen9"); the difference
in cached atomics behavior appears to cause the regression.

This adds a handler for the new scratch register and adds the scratch
registers to the mmio save/restore list for context switch. No GPU hang
is produced with this patch applied.

Cc: stable@vger.kernel.org # 5.12+
Cc: "Xu, Terrence" <terrence.xu@intel.com>
Fixes: 58586680ffad ("drm/i915: Disable atomics in L3 for gen9")
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
---
 drivers/gpu/drm/i915/gvt/handlers.c     | 1 +
 drivers/gpu/drm/i915/gvt/mmio_context.c | 2 ++
 2 files changed, 3 insertions(+)
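
For readers unfamiliar with the save/restore list mentioned in the commit message, the idea can be pictured with a small standalone model. This is only a sketch under stated assumptions, not the i915/GVT code: `mmio_entry`, `reg_owner`, `switch_owner` and `demo_round_trip` are hypothetical names, the mask field of the real list entries is ignored, the meaning given to `in_context` is an assumption, and the register values 0x1, 0x2 and 0x3 are arbitrary. Only the three offsets come from the patch comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model only; the names below are hypothetical, not i915/GVT API. */
struct mmio_entry {
	uint32_t offset;	/* register offset, taken from the patch comments */
	bool in_context;	/* assumed: true means the context image carries it */
};

static const struct mmio_entry mmio_list[] = {
	{0xb118, false},	/* GEN8_L3SQCREG4 */
	{0xb11c, false},	/* GEN9_SCRATCH1 */
	{0xb008, false},	/* GEN9_SCRATCH_LNCF1 */
};

#define MMIO_LIST_LEN (sizeof(mmio_list) / sizeof(mmio_list[0]))

/* Stand-in for the physical registers, one slot per list entry. */
static uint32_t hw_regs[MMIO_LIST_LEN];

/* Per-owner shadow copy of the listed registers (the host or one VM). */
struct reg_owner {
	uint32_t shadow[MMIO_LIST_LEN];
};

/* Save the outgoing owner's register values and load the incoming owner's. */
void switch_owner(struct reg_owner *prev, const struct reg_owner *next)
{
	assert(prev != NULL && next != NULL);

	for (size_t i = 0; i < MMIO_LIST_LEN; i++) {
		if (mmio_list[i].in_context)
			continue;	/* assumed to be handled by the context image */
		prev->shadow[i] = hw_regs[i];	/* save outgoing owner's value */
		hw_regs[i] = next->shadow[i];	/* load incoming owner's value */
	}
}

/* Round trip host -> VM -> host; returns the value the host sees afterwards. */
uint32_t demo_round_trip(void)
{
	struct reg_owner host = { {0, 0, 0} };
	struct reg_owner vm = { {0, 0x2, 0} };

	hw_regs[1] = 0x1;		/* host programs its own setting */
	switch_owner(&host, &vm);	/* VM's shadow value 0x2 is loaded */
	hw_regs[1] = 0x3;		/* VM reprograms the register */
	switch_owner(&vm, &host);	/* host's setting is loaded again */
	return hw_regs[1];
}
```

In this model a register that is missing from the list is never saved or loaded on a switch, so a value written by one owner stays visible to the other. That is the kind of leakage that listing the scratch registers is meant to avoid.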