Message ID | 20171109185101.5653-1-hdegoede@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Quoting Hans de Goede (2017-11-09 18:51:01) > Bay Trail devices are known to hang when changing the frequency often, > this is discussed in great length in: > https://bugzilla.kernel.org/show_bug.cgi?id=109051 > > Commit 6067a27d1f01 ("drm/i915: Avoid tweaking evaluation thresholds > on Baytrail v3") is an attempt to workaround this. Several users in > bko109051 report that an earlier version of this patch, v1: > https://bugzilla.kernel.org/attachment.cgi?id=251471 > > Works better for them and they still see hangs with the merged v3. > > Comparing the 2 versions shows that they are indeed not equivalent, > v1 not only skips writing the GEN6_RP* registers from valleyview_set_rps, > as v3 does. It also contained these modifications to i915_irq.c: > > if (pm_iir & GEN6_PM_RP_DOWN_EI_EXPIRED) { > if (!vlv_c0_above(dev_priv, > &dev_priv->rps.down_ei, &now, > - dev_priv->rps.down_threshold)) > + VLV_RP_DOWN_EI_THRESHOLD)) > events |= GEN6_PM_RP_DOWN_THRESHOLD; > dev_priv->rps.down_ei = now; > } > > if (pm_iir & GEN6_PM_RP_UP_EI_EXPIRED) { > if (vlv_c0_above(dev_priv, > &dev_priv->rps.up_ei, &now, > - dev_priv->rps.up_threshold)) > + VLV_RP_UP_EI_THRESHOLD)) > events |= GEN6_PM_RP_UP_THRESHOLD; > dev_priv->rps.up_ei = now; > } > > Which use less aggressive up/down thresholds, which results in less > GEN6_PM_RP_*_THRESHOLD events and thus in less calls to intel_set_rps() -> > valleyview_set_rps() -> vlv_punit_write(PUNIT_REG_GPU_FREQ_REQ). > With the last call being the likely cause of the hang. > > This commit hardcodes the threshold_up and _down values for Bay Trail to > less aggressive values, reducing the amount of clock frequency changes, > thus avoiding the hangs some people are still seeing with the merged fix. > > Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=109051 > Signed-off-by: Hans de Goede <hdegoede@redhat.com> > --- > drivers/gpu/drm/i915/i915_reg.h | 3 +++ > drivers/gpu/drm/i915/intel_pm.c | 5 ++++- > 2 files changed, 7 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h > index 68a58cce6ab1..2561af075ebb 100644 > --- a/drivers/gpu/drm/i915/i915_reg.h > +++ b/drivers/gpu/drm/i915/i915_reg.h > @@ -1355,6 +1355,9 @@ enum i915_power_well_id { > #define VLV_BIAS_CPU_125_SOC_875 (6 << 2) > #define CHV_BIAS_CPU_50_SOC_50 (3 << 2) > > +#define VLV_RP_UP_EI_THRESHOLD 90 > +#define VLV_RP_DOWN_EI_THRESHOLD 70 > + > /* vlv2 north clock has */ > #define CCK_FUSE_REG 0x8 > #define CCK_FUSE_HPLL_FREQ_MASK 0x3 > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c > index 01966b89be14..177b6caa0a38 100644 > --- a/drivers/gpu/drm/i915/intel_pm.c > +++ b/drivers/gpu/drm/i915/intel_pm.c > @@ -6096,8 +6096,11 @@ static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val) > /* When byt can survive without system hang with dynamic > * sw freq adjustments, this restriction can be lifted. > */ > - if (IS_VALLEYVIEW(dev_priv)) > + if (IS_VALLEYVIEW(dev_priv)) { > + threshold_up = VLV_RP_UP_EI_THRESHOLD; > + threshold_down = VLV_RP_DOWN_EI_THRESHOLD; Basically you want a limit on the frequency of ... pcode access? As presented, you ultimately do not trust any write and the only solution is to disable all writes. No RPS whatsoever, run at max and hope rc6 works (maybe even decrease the rc6 threshold). One of the ideas we floated was moving the pcode access to a worker and ratelimiting the updates. -Chris
Hi, On 10-11-17 15:56, Chris Wilson wrote: > Quoting Hans de Goede (2017-11-09 18:51:01) >> Bay Trail devices are known to hang when changing the frequency often, >> this is discussed in great length in: >> https://bugzilla.kernel.org/show_bug.cgi?id=109051 >> >> Commit 6067a27d1f01 ("drm/i915: Avoid tweaking evaluation thresholds >> on Baytrail v3") is an attempt to workaround this. Several users in >> bko109051 report that an earlier version of this patch, v1: >> https://bugzilla.kernel.org/attachment.cgi?id=251471 >> >> Works better for them and they still see hangs with the merged v3. >> >> Comparing the 2 versions shows that they are indeed not equivalent, >> v1 not only skips writing the GEN6_RP* registers from valleyview_set_rps, >> as v3 does. It also contained these modifications to i915_irq.c: >> >> if (pm_iir & GEN6_PM_RP_DOWN_EI_EXPIRED) { >> if (!vlv_c0_above(dev_priv, >> &dev_priv->rps.down_ei, &now, >> - dev_priv->rps.down_threshold)) >> + VLV_RP_DOWN_EI_THRESHOLD)) >> events |= GEN6_PM_RP_DOWN_THRESHOLD; >> dev_priv->rps.down_ei = now; >> } >> >> if (pm_iir & GEN6_PM_RP_UP_EI_EXPIRED) { >> if (vlv_c0_above(dev_priv, >> &dev_priv->rps.up_ei, &now, >> - dev_priv->rps.up_threshold)) >> + VLV_RP_UP_EI_THRESHOLD)) >> events |= GEN6_PM_RP_UP_THRESHOLD; >> dev_priv->rps.up_ei = now; >> } >> >> Which use less aggressive up/down thresholds, which results in less >> GEN6_PM_RP_*_THRESHOLD events and thus in less calls to intel_set_rps() -> >> valleyview_set_rps() -> vlv_punit_write(PUNIT_REG_GPU_FREQ_REQ). >> With the last call being the likely cause of the hang. >> >> This commit hardcodes the threshold_up and _down values for Bay Trail to >> less aggressive values, reducing the amount of clock frequency changes, >> thus avoiding the hangs some people are still seeing with the merged fix. >> >> Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=109051 >> Signed-off-by: Hans de Goede <hdegoede@redhat.com> >> --- >> drivers/gpu/drm/i915/i915_reg.h | 3 +++ >> drivers/gpu/drm/i915/intel_pm.c | 5 ++++- >> 2 files changed, 7 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h >> index 68a58cce6ab1..2561af075ebb 100644 >> --- a/drivers/gpu/drm/i915/i915_reg.h >> +++ b/drivers/gpu/drm/i915/i915_reg.h >> @@ -1355,6 +1355,9 @@ enum i915_power_well_id { >> #define VLV_BIAS_CPU_125_SOC_875 (6 << 2) >> #define CHV_BIAS_CPU_50_SOC_50 (3 << 2) >> >> +#define VLV_RP_UP_EI_THRESHOLD 90 >> +#define VLV_RP_DOWN_EI_THRESHOLD 70 >> + >> /* vlv2 north clock has */ >> #define CCK_FUSE_REG 0x8 >> #define CCK_FUSE_HPLL_FREQ_MASK 0x3 >> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c >> index 01966b89be14..177b6caa0a38 100644 >> --- a/drivers/gpu/drm/i915/intel_pm.c >> +++ b/drivers/gpu/drm/i915/intel_pm.c >> @@ -6096,8 +6096,11 @@ static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val) >> /* When byt can survive without system hang with dynamic >> * sw freq adjustments, this restriction can be lifted. >> */ >> - if (IS_VALLEYVIEW(dev_priv)) >> + if (IS_VALLEYVIEW(dev_priv)) { >> + threshold_up = VLV_RP_UP_EI_THRESHOLD; >> + threshold_down = VLV_RP_DOWN_EI_THRESHOLD; > > Basically you want a limit on the frequency of ... pcode access? > As presented, you ultimately do not trust any write and the only > solution is to disable all writes. No RPS whatsoever, run at max and > hope rc6 works (maybe even decrease the rc6 threshold). > > One of the ideas we floated was moving the pcode access to a worker and > ratelimiting the updates. Yes ratelimiting the amount of vlv_punit_write(PUNIT_REG_GPU_FREQ_REQ) calls is also something which came to my mind as an alternative solution. As I tried to explain in the commit message my main inspiration for this patch was that people were reporting different success rates with v1 and v3 of the Commit 6067a27d1f01 ("drm/i915: Avoid tweaking evaluation thresholds on Baytrail v3") patch. Regards, Hans > -Chris >
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 68a58cce6ab1..2561af075ebb 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -1355,6 +1355,9 @@ enum i915_power_well_id { #define VLV_BIAS_CPU_125_SOC_875 (6 << 2) #define CHV_BIAS_CPU_50_SOC_50 (3 << 2) +#define VLV_RP_UP_EI_THRESHOLD 90 +#define VLV_RP_DOWN_EI_THRESHOLD 70 + /* vlv2 north clock has */ #define CCK_FUSE_REG 0x8 #define CCK_FUSE_HPLL_FREQ_MASK 0x3 diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c index 01966b89be14..177b6caa0a38 100644 --- a/drivers/gpu/drm/i915/intel_pm.c +++ b/drivers/gpu/drm/i915/intel_pm.c @@ -6096,8 +6096,11 @@ static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val) /* When byt can survive without system hang with dynamic * sw freq adjustments, this restriction can be lifted. */ - if (IS_VALLEYVIEW(dev_priv)) + if (IS_VALLEYVIEW(dev_priv)) { + threshold_up = VLV_RP_UP_EI_THRESHOLD; + threshold_down = VLV_RP_DOWN_EI_THRESHOLD; goto skip_hw_write; + } I915_WRITE(GEN6_RP_UP_EI, GT_INTERVAL_FROM_US(dev_priv, ei_up));
Bay Trail devices are known to hang when changing the frequency often, this is discussed in great length in: https://bugzilla.kernel.org/show_bug.cgi?id=109051 Commit 6067a27d1f01 ("drm/i915: Avoid tweaking evaluation thresholds on Baytrail v3") is an attempt to workaround this. Several users in bko109051 report that an earlier version of this patch, v1: https://bugzilla.kernel.org/attachment.cgi?id=251471 Works better for them and they still see hangs with the merged v3. Comparing the 2 versions shows that they are indeed not equivalent, v1 not only skips writing the GEN6_RP* registers from valleyview_set_rps, as v3 does. It also contained these modifications to i915_irq.c: if (pm_iir & GEN6_PM_RP_DOWN_EI_EXPIRED) { if (!vlv_c0_above(dev_priv, &dev_priv->rps.down_ei, &now, - dev_priv->rps.down_threshold)) + VLV_RP_DOWN_EI_THRESHOLD)) events |= GEN6_PM_RP_DOWN_THRESHOLD; dev_priv->rps.down_ei = now; } if (pm_iir & GEN6_PM_RP_UP_EI_EXPIRED) { if (vlv_c0_above(dev_priv, &dev_priv->rps.up_ei, &now, - dev_priv->rps.up_threshold)) + VLV_RP_UP_EI_THRESHOLD)) events |= GEN6_PM_RP_UP_THRESHOLD; dev_priv->rps.up_ei = now; } Which use less aggressive up/down thresholds, which results in less GEN6_PM_RP_*_THRESHOLD events and thus in less calls to intel_set_rps() -> valleyview_set_rps() -> vlv_punit_write(PUNIT_REG_GPU_FREQ_REQ). With the last call being the likely cause of the hang. This commit hardcodes the threshold_up and _down values for Bay Trail to less aggressive values, reducing the amount of clock frequency changes, thus avoiding the hangs some people are still seeing with the merged fix. Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=109051 Signed-off-by: Hans de Goede <hdegoede@redhat.com> --- drivers/gpu/drm/i915/i915_reg.h | 3 +++ drivers/gpu/drm/i915/intel_pm.c | 5 ++++- 2 files changed, 7 insertions(+), 1 deletion(-)