Message ID | 20230320202326.296498-2-andi.shyti@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | Report MMIO communication problems more clearly |
On Mon, 20 Mar 2023, Andi Shyti <andi.shyti@linux.intel.com> wrote:
> From: Matt Roper <matthew.d.roper@intel.com>
>
> We occasionally see the PCI device in a non-accessible state at the
> point the driver is loaded. When this happens, all BAR accesses will
> read back as 0xFFFFFFFF. Rather than reading registers and
> misinterpreting their (invalid) values, let's specifically check for
> 0xFFFFFFFF in a register that cannot have that value to see if the
> device is accessible.
>
> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_uncore.c | 34 +++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index e1e1f34490c8e..14ec45e6facfa 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -2602,11 +2602,45 @@ static int uncore_forcewake_init(struct intel_uncore *uncore)
>  	return 0;
>  }
>
> +static int sanity_check_mmio_access(struct intel_uncore *uncore)
> +{
> +	struct drm_i915_private *i915 = uncore->i915;
> +
> +	if (GRAPHICS_VER(i915) < 8)
> +		return 0;
> +
> +	/*
> +	 * Sanitycheck that MMIO access to the device is working properly. If
> +	 * the CPU is unable to communcate with a PCI device, BAR reads will
> +	 * return 0xFFFFFFFF. Let's make sure the device isn't in this state
> +	 * before we start trying to access registers.
> +	 *
> +	 * We use the primary GT's forcewake register as our guinea pig since
> +	 * it's been around since HSW and it's a masked register so the upper
> +	 * 16 bits can never read back as 1's if device access is operating
> +	 * properly.
> +	 *
> +	 * If MMIO isn't working, we'll wait up to 2 seconds to see if it
> +	 * recovers, then give up.
> +	 */
> +#define COND (__raw_uncore_read32(uncore, FORCEWAKE_MT) != ~0)
> +	if (wait_for(COND, 2000) == -ETIMEDOUT) {

I guess this somewhat reimplements intel_wait_for_register_fw()?

> +		drm_err(&i915->drm, "Device is non-operational; MMIO access returns 0xFFFFFFFF!\n");
> +		return -EIO;
> +	}
> +
> +	return 0;
> +}
> +
>  int intel_uncore_init_mmio(struct intel_uncore *uncore)
>  {
>  	struct drm_i915_private *i915 = uncore->i915;
>  	int ret;
>
> +	ret = sanity_check_mmio_access(uncore);
> +	if (ret)
> +		return ret;
> +
>  	/*
>  	 * The boot firmware initializes local memory and assesses its health.
>  	 * If memory training fails, the punit will have been instructed to
Hi Jani,

Thanks for looking into this,

[...]

> > +#define COND (__raw_uncore_read32(uncore, FORCEWAKE_MT) != ~0)
> > +	if (wait_for(COND, 2000) == -ETIMEDOUT) {
>
> I guess this somewhat reimplements intel_wait_for_register_fw()?

Thanks!
Andi

> > +		drm_err(&i915->drm, "Device is non-operational; MMIO access returns 0xFFFFFFFF!\n");
> > +		return -EIO;
> > +	}
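
For readers following the wait_for() discussion, here is a rough userspace sketch of the polling check the patch performs. It is not the i915 code: struct fake_device, fake_mmio_read32() and check_mmio_alive() are hypothetical names invented for illustration, and a bounded number of polls stands in for the 2000 ms timeout that the kernel's wait_for() applies.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a device: reads return all ones until
 * bad_reads_left drops to zero, then return live_value.
 */
struct fake_device {
	unsigned int bad_reads_left;
	uint32_t live_value;
};

static uint32_t fake_mmio_read32(struct fake_device *dev)
{
	if (dev->bad_reads_left > 0) {
		dev->bad_reads_left--;
		return UINT32_MAX;	/* what an inaccessible BAR reads as */
	}
	return dev->live_value;
}

/*
 * Poll until a read differs from 0xFFFFFFFF or max_polls reads have
 * been made. Returns 0 if the device responded, -EIO otherwise.
 * Assumption: the real patch bounds the wait by time, not by count.
 */
static int check_mmio_alive(struct fake_device *dev, unsigned int max_polls)
{
	unsigned int i;

	for (i = 0; i < max_polls; i++) {
		if (fake_mmio_read32(dev) != UINT32_MAX)
			return 0;
	}
	return -EIO;
}
```

A device that starts answering after a few reads passes the check, while one that keeps returning 0xFFFFFFFF ends in -EIO, which matches the error path in the patch.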
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index e1e1f34490c8e..14ec45e6facfa 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -2602,11 +2602,45 @@ static int uncore_forcewake_init(struct intel_uncore *uncore)
 	return 0;
 }
 
+static int sanity_check_mmio_access(struct intel_uncore *uncore)
+{
+	struct drm_i915_private *i915 = uncore->i915;
+
+	if (GRAPHICS_VER(i915) < 8)
+		return 0;
+
+	/*
+	 * Sanitycheck that MMIO access to the device is working properly. If
+	 * the CPU is unable to communcate with a PCI device, BAR reads will
+	 * return 0xFFFFFFFF. Let's make sure the device isn't in this state
+	 * before we start trying to access registers.
+	 *
+	 * We use the primary GT's forcewake register as our guinea pig since
+	 * it's been around since HSW and it's a masked register so the upper
+	 * 16 bits can never read back as 1's if device access is operating
+	 * properly.
+	 *
+	 * If MMIO isn't working, we'll wait up to 2 seconds to see if it
+	 * recovers, then give up.
+	 */
+#define COND (__raw_uncore_read32(uncore, FORCEWAKE_MT) != ~0)
+	if (wait_for(COND, 2000) == -ETIMEDOUT) {
+		drm_err(&i915->drm, "Device is non-operational; MMIO access returns 0xFFFFFFFF!\n");
+		return -EIO;
+	}
+
+	return 0;
+}
+
 int intel_uncore_init_mmio(struct intel_uncore *uncore)
 {
 	struct drm_i915_private *i915 = uncore->i915;
 	int ret;
 
+	ret = sanity_check_mmio_access(uncore);
+	if (ret)
+		return ret;
+
 	/*
 	 * The boot firmware initializes local memory and assesses its health.
 	 * If memory training fails, the punit will have been instructed to
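
The comment in the patch relies on FORCEWAKE_MT being a masked register. As a hedged illustration of why such a register makes a good probe, the sketch below models one in userspace. struct masked_reg and its helpers are hypothetical, the two macros only mirror the shape of the driver's _MASKED_BIT_ENABLE()/_MASKED_BIT_DISABLE() helpers, and the model assumes the mask half reads back as zero.

```c
#include <stdint.h>

/* Upper 16 bits select which of the lower 16 bits a write may change. */
#define MASKED_BIT_ENABLE(bit)	(((uint32_t)(bit) << 16) | (bit))
#define MASKED_BIT_DISABLE(bit)	((uint32_t)(bit) << 16)

/* Hypothetical model: only the low 16 bits are actually stored. */
struct masked_reg {
	uint16_t bits;
};

static void masked_reg_write(struct masked_reg *reg, uint32_t val)
{
	uint16_t mask = (uint16_t)(val >> 16);
	uint16_t data = (uint16_t)(val & 0xFFFF);

	/* Bits outside the mask keep their previous value. */
	reg->bits = (uint16_t)((reg->bits & ~mask) | (data & mask));
}

static uint32_t masked_reg_read(const struct masked_reg *reg)
{
	/* Assumption: the mask half is write-only and reads as zero. */
	return reg->bits;
}
```

Under that assumption, even writing 0xFFFFFFFF leaves a read value of at most 0x0000FFFF, so a read of 0xFFFFFFFF can only mean the access itself failed.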