Message ID | 20190409161310.20382-1-mika.kuoppala@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | [1/7] drm/i915: Use dedicated rc6 enabling sequence for gen11 |
Quoting Mika Kuoppala (2019-04-09 17:13:04)
> In order not to inflate gen9 rc6 enabling sequence with
> gen11 specifics, use a separate function for it.

And disable_rc6 remains as simple as before.

> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
On Tue, 09 Apr 2019 18:13:04 +0200, Mika Kuoppala
<mika.kuoppala@linux.intel.com> wrote:

[snip]

> +
> +	/*
> +	 * 2c: Program Coarse Power Gating Policies.
> +	 *
> +	 * Bspec's guidance is to use 25us (really 25 * 1280ns) here. What we
> +	 * use instead is a more conservative estimate for the maximum time
> +	 * it takes us to service a CS interrupt and submit a new ELSP - that
> +	 * is the time which the GPU is idle waiting for the CPU to select the
> +	 * next request to execute. If the idle hysteresis is less than that
> +	 * interrupt service latency, the hardware will automatically gate
> +	 * the power well and we will then incur the wake up cost on top of
> +	 * the service latency. A similar guide from intel_pstate is that we
> +	 * do not want the enable hysteresis to less than the wakeup latency.
> +	 *
> +	 * igt/gem_exec_nop/sequential provides a rough estimate for the
> +	 * service latency, and puts it around 10us for Broadwell (and other
> +	 * big core) and around 40us for Broxton (and other low power cores).
> +	 * [Note that for legacy ringbuffer submission, this is less than 1us!]
> +	 * However, the wakeup latency on Broxton is closer to 100us. To be
> +	 * conservative, we have to factor in a context switch on top (due
> +	 * to ksoftirqd).
> +	 */

Do we want to copy legacy comments to Gen11 specific function ?
Quoting Michal Wajdeczko (2019-04-09 17:57:58)
> On Tue, 09 Apr 2019 18:13:04 +0200, Mika Kuoppala
> <mika.kuoppala@linux.intel.com> wrote:
>
> [snip]
>
> > +
> > +	/*
> > +	 * 2c: Program Coarse Power Gating Policies.
> > +	 *
> > +	 * Bspec's guidance is to use 25us (really 25 * 1280ns) here. What we
> > +	 * use instead is a more conservative estimate for the maximum time
> > +	 * it takes us to service a CS interrupt and submit a new ELSP - that
> > +	 * is the time which the GPU is idle waiting for the CPU to select the
> > +	 * next request to execute. If the idle hysteresis is less than that
> > +	 * interrupt service latency, the hardware will automatically gate
> > +	 * the power well and we will then incur the wake up cost on top of
> > +	 * the service latency. A similar guide from intel_pstate is that we
> > +	 * do not want the enable hysteresis to less than the wakeup latency.
> > +	 *
> > +	 * igt/gem_exec_nop/sequential provides a rough estimate for the
> > +	 * service latency, and puts it around 10us for Broadwell (and other
> > +	 * big core) and around 40us for Broxton (and other low power cores).
> > +	 * [Note that for legacy ringbuffer submission, this is less than 1us!]
> > +	 * However, the wakeup latency on Broxton is closer to 100us. To be
> > +	 * conservative, we have to factor in a context switch on top (due
> > +	 * to ksoftirqd).
> > +	 */
>
> Do we want to copy legacy comments to Gen11 specific function ?

The comment isn't legacy until you crunch through the measurements to
work out the minimum tolerances that are sensible for us.
-Chris
Quoting Patchwork (2019-04-10 06:59:20)
> #### Possible fixes ####
>
>   * igt@i915_pm_rps@reset:
>     - shard-iclb:         FAIL [fdo#108059] -> PASS +2

\o/
-Chris
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index bba477e62a12..43ec0fb4c197 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -7120,6 +7120,76 @@ static void gen9_enable_rps(struct drm_i915_private *dev_priv)
 	intel_uncore_forcewake_put(&dev_priv->uncore, FORCEWAKE_ALL);
 }
 
+static void gen11_enable_rc6(struct drm_i915_private *dev_priv)
+{
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	/* 1a: Software RC state - RC0 */
+	I915_WRITE(GEN6_RC_STATE, 0);
+
+	/* 1b: Get forcewake during program sequence. Although the driver
+	 * hasn't enabled a state yet where we need forcewake, BIOS may have.*/
+	intel_uncore_forcewake_get(&dev_priv->uncore, FORCEWAKE_ALL);
+
+	/* 2a: Disable RC states. */
+	I915_WRITE(GEN6_RC_CONTROL, 0);
+
+	/* 2b: Program RC6 thresholds.*/
+	I915_WRITE(GEN6_RC6_WAKE_RATE_LIMIT, 54 << 16 | 85);
+	I915_WRITE(GEN10_MEDIA_WAKE_RATE_LIMIT, 150);
+
+	I915_WRITE(GEN6_RC_EVALUATION_INTERVAL, 125000); /* 12500 * 1280ns */
+	I915_WRITE(GEN6_RC_IDLE_HYSTERSIS, 25); /* 25 * 1280ns */
+	for_each_engine(engine, dev_priv, id)
+		I915_WRITE(RING_MAX_IDLE(engine->mmio_base), 10);
+
+	if (HAS_GUC(dev_priv))
+		I915_WRITE(GUC_MAX_IDLE_COUNT, 0xA);
+
+	I915_WRITE(GEN6_RC_SLEEP, 0);
+
+	/*
+	 * 2c: Program Coarse Power Gating Policies.
+	 *
+	 * Bspec's guidance is to use 25us (really 25 * 1280ns) here. What we
+	 * use instead is a more conservative estimate for the maximum time
+	 * it takes us to service a CS interrupt and submit a new ELSP - that
+	 * is the time which the GPU is idle waiting for the CPU to select the
+	 * next request to execute. If the idle hysteresis is less than that
+	 * interrupt service latency, the hardware will automatically gate
+	 * the power well and we will then incur the wake up cost on top of
+	 * the service latency. A similar guide from intel_pstate is that we
+	 * do not want the enable hysteresis to less than the wakeup latency.
+	 *
+	 * igt/gem_exec_nop/sequential provides a rough estimate for the
+	 * service latency, and puts it around 10us for Broadwell (and other
+	 * big core) and around 40us for Broxton (and other low power cores).
+	 * [Note that for legacy ringbuffer submission, this is less than 1us!]
+	 * However, the wakeup latency on Broxton is closer to 100us. To be
+	 * conservative, we have to factor in a context switch on top (due
+	 * to ksoftirqd).
+	 */
+	I915_WRITE(GEN9_MEDIA_PG_IDLE_HYSTERESIS, 250);
+	I915_WRITE(GEN9_RENDER_PG_IDLE_HYSTERESIS, 250);
+
+	/* 3a: Enable RC6 */
+	I915_WRITE(GEN6_RC6_THRESHOLD, 37500); /* 37.5/125ms per EI */
+
+	I915_WRITE(GEN6_RC_CONTROL,
+		   GEN6_RC_CTL_HW_ENABLE |
+		   GEN6_RC_CTL_RC6_ENABLE |
+		   GEN6_RC_CTL_EI_MODE(1));
+
+	/*
+	 * 3b: Enable Coarse Power Gating only when RC6 is enabled.
+	 */
+	I915_WRITE(GEN9_PG_ENABLE,
+		   GEN9_RENDER_PG_ENABLE | GEN9_MEDIA_PG_ENABLE);
+
+	intel_uncore_forcewake_put(&dev_priv->uncore, FORCEWAKE_ALL);
+}
+
 static void gen9_enable_rc6(struct drm_i915_private *dev_priv)
 {
 	struct intel_engine_cs *engine;
@@ -8596,6 +8666,8 @@ static void intel_enable_rc6(struct drm_i915_private *dev_priv)
 		cherryview_enable_rc6(dev_priv);
 	else if (IS_VALLEYVIEW(dev_priv))
 		valleyview_enable_rc6(dev_priv);
+	else if (INTEL_GEN(dev_priv) >= 11)
+		gen11_enable_rc6(dev_priv);
 	else if (INTEL_GEN(dev_priv) >= 9)
 		gen9_enable_rc6(dev_priv);
 	else if (IS_BROADWELL(dev_priv))
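
The reasoning in the 2c comment can be checked with a little arithmetic. The sketch below is not driver code: RC6_TICK_NS, rc6_ticks_to_ns() and hysteresis_covers() are hypothetical names, the 1280 ns tick is taken from the patch comments, and treating "service latency plus wakeup latency" as the bar to clear is an assumed, deliberately conservative reading of that comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unit named in the patch comments: one register tick is 1280 ns. */
#define RC6_TICK_NS 1280u

/* Hypothetical helper: convert a hysteresis value in ticks to nanoseconds. */
static inline uint32_t rc6_ticks_to_ns(uint32_t ticks)
{
	return ticks * RC6_TICK_NS;
}

/*
 * Hypothetical helper: true when the hysteresis window is at least as long
 * as the service latency plus the wakeup latency. Summing the two is an
 * assumption made here to be conservative; the comment itself only asks
 * that the hysteresis be no shorter than each of them.
 */
static inline bool hysteresis_covers(uint32_t ticks, uint32_t service_ns,
				     uint32_t wakeup_ns)
{
	return rc6_ticks_to_ns(ticks) >= service_ns + wakeup_ns;
}
```

With the Broxton figures quoted in the comment (about 40us service latency, about 100us wakeup latency), hysteresis_covers(250, 40000, 100000) holds because 250 ticks is 320us, while the Bspec value of 25 ticks (32us) falls short.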
In order not to inflate gen9 rc6 enabling sequence with
gen11 specifics, use a separate function for it.

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_pm.c | 72 +++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)
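
Splitting gen11 into its own function only works if the dispatch in intel_enable_rc6() tests for gen11 before the broader gen9 check, which is where the patch places the new branch. The following is a minimal standalone sketch of that ordering, not the driver's code: struct fake_device, enum rc6_path and pick_rc6_path() are hypothetical stand-ins for the driver's macros, and the branches before and after the ones visible in the hunk are assumed or elided.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the device info the driver queries via macros. */
struct fake_device {
	int gen;
	bool is_cherryview;
	bool is_valleyview;
};

enum rc6_path { RC6_CHV, RC6_VLV, RC6_GEN11, RC6_GEN9, RC6_OTHER };

/* Mirrors the order of the else-if chain shown in the second hunk. */
static enum rc6_path pick_rc6_path(const struct fake_device *dev)
{
	if (dev->is_cherryview)
		return RC6_CHV;
	else if (dev->is_valleyview)
		return RC6_VLV;
	else if (dev->gen >= 11)	/* must come before the gen >= 9 test */
		return RC6_GEN11;
	else if (dev->gen >= 9)
		return RC6_GEN9;
	return RC6_OTHER;		/* Broadwell and older paths elided */
}
```

If the two generation checks were swapped, a gen11 device would match `gen >= 9` first and never reach the dedicated path.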