[1/7] drm/i915: Use dedicated rc6 enabling sequence for gen11

Message ID 20190409161310.20382-1-mika.kuoppala@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Mika Kuoppala April 9, 2019, 4:13 p.m. UTC
In order not to inflate gen9 rc6 enabling sequence with
gen11 specifics, use a separate function for it.

Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_pm.c | 72 +++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

Comments

Chris Wilson April 9, 2019, 4:28 p.m. UTC | #1
Quoting Mika Kuoppala (2019-04-09 17:13:04)
> In order not to inflate gen9 rc6 enabling sequence with
> gen11 specifics, use a separate function for it.

And disable_rc6 remains as simple as before.
 
> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
Michal Wajdeczko April 9, 2019, 4:57 p.m. UTC | #2
On Tue, 09 Apr 2019 18:13:04 +0200, Mika Kuoppala  
<mika.kuoppala@linux.intel.com> wrote:

[snip]

> +
> +	/*
> +	 * 2c: Program Coarse Power Gating Policies.
> +	 *
> +	 * Bspec's guidance is to use 25us (really 25 * 1280ns) here. What we
> +	 * use instead is a more conservative estimate for the maximum time
> +	 * it takes us to service a CS interrupt and submit a new ELSP - that
> +	 * is the time which the GPU is idle waiting for the CPU to select the
> +	 * next request to execute. If the idle hysteresis is less than that
> +	 * interrupt service latency, the hardware will automatically gate
> +	 * the power well and we will then incur the wake up cost on top of
> +	 * the service latency. A similar guide from intel_pstate is that we
> +	 * do not want the enable hysteresis to be less than the wakeup latency.
> +	 *
> +	 * igt/gem_exec_nop/sequential provides a rough estimate for the
> +	 * service latency, and puts it around 10us for Broadwell (and other
> +	 * big core) and around 40us for Broxton (and other low power cores).
> +	 * [Note that for legacy ringbuffer submission, this is less than 1us!]
> +	 * However, the wakeup latency on Broxton is closer to 100us. To be
> +	 * conservative, we have to factor in a context switch on top (due
> +	 * to ksoftirqd).
> +	 */

Do we want to copy legacy comments to the Gen11-specific function?
Chris Wilson April 9, 2019, 5:04 p.m. UTC | #3
Quoting Michal Wajdeczko (2019-04-09 17:57:58)
> On Tue, 09 Apr 2019 18:13:04 +0200, Mika Kuoppala  
> <mika.kuoppala@linux.intel.com> wrote:
> 
> [snip]
> 
> > +
> > +     /*
> > +      * 2c: Program Coarse Power Gating Policies.
> > +      *
> > +      * Bspec's guidance is to use 25us (really 25 * 1280ns) here. What we
> > +      * use instead is a more conservative estimate for the maximum time
> > +      * it takes us to service a CS interrupt and submit a new ELSP - that
> > +      * is the time which the GPU is idle waiting for the CPU to select the
> > +      * next request to execute. If the idle hysteresis is less than that
> > +      * interrupt service latency, the hardware will automatically gate
> > +      * the power well and we will then incur the wake up cost on top of
> > +      * the service latency. A similar guide from intel_pstate is that we
> > +      * do not want the enable hysteresis to be less than the wakeup latency.
> > +      *
> > +      * igt/gem_exec_nop/sequential provides a rough estimate for the
> > +      * service latency, and puts it around 10us for Broadwell (and other
> > +      * big core) and around 40us for Broxton (and other low power cores).
> > +      * [Note that for legacy ringbuffer submission, this is less than 1us!]
> > +      * However, the wakeup latency on Broxton is closer to 100us. To be
> > +      * conservative, we have to factor in a context switch on top (due
> > +      * to ksoftirqd).
> > +      */
> 
> Do we want to copy legacy comments to the Gen11-specific function?

The comment isn't legacy until you crunch through the measurements to
work out the minimum tolerances that are sensible for us.
-Chris
Chris Wilson April 10, 2019, 9:04 a.m. UTC | #4
Quoting Patchwork (2019-04-10 06:59:20)
> #### Possible fixes ####
> 
>   * igt@i915_pm_rps@reset:
>     - shard-iclb:         FAIL [fdo#108059] -> PASS +2

\o/
-Chris
Patch

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index bba477e62a12..43ec0fb4c197 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -7120,6 +7120,76 @@  static void gen9_enable_rps(struct drm_i915_private *dev_priv)
 	intel_uncore_forcewake_put(&dev_priv->uncore, FORCEWAKE_ALL);
 }
 
+static void gen11_enable_rc6(struct drm_i915_private *dev_priv)
+{
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	/* 1a: Software RC state - RC0 */
+	I915_WRITE(GEN6_RC_STATE, 0);
+
+	/* 1b: Get forcewake during program sequence. Although the driver
+	 * hasn't enabled a state yet where we need forcewake, BIOS may have. */
+	intel_uncore_forcewake_get(&dev_priv->uncore, FORCEWAKE_ALL);
+
+	/* 2a: Disable RC states. */
+	I915_WRITE(GEN6_RC_CONTROL, 0);
+
+	/* 2b: Program RC6 thresholds. */
+	I915_WRITE(GEN6_RC6_WAKE_RATE_LIMIT, 54 << 16 | 85);
+	I915_WRITE(GEN10_MEDIA_WAKE_RATE_LIMIT, 150);
+
+	I915_WRITE(GEN6_RC_EVALUATION_INTERVAL, 125000); /* 12500 * 1280ns */
+	I915_WRITE(GEN6_RC_IDLE_HYSTERSIS, 25); /* 25 * 1280ns */
+	for_each_engine(engine, dev_priv, id)
+		I915_WRITE(RING_MAX_IDLE(engine->mmio_base), 10);
+
+	if (HAS_GUC(dev_priv))
+		I915_WRITE(GUC_MAX_IDLE_COUNT, 0xA);
+
+	I915_WRITE(GEN6_RC_SLEEP, 0);
+
+	/*
+	 * 2c: Program Coarse Power Gating Policies.
+	 *
+	 * Bspec's guidance is to use 25us (really 25 * 1280ns) here. What we
+	 * use instead is a more conservative estimate for the maximum time
+	 * it takes us to service a CS interrupt and submit a new ELSP - that
+	 * is the time which the GPU is idle waiting for the CPU to select the
+	 * next request to execute. If the idle hysteresis is less than that
+	 * interrupt service latency, the hardware will automatically gate
+	 * the power well and we will then incur the wake up cost on top of
+	 * the service latency. A similar guide from intel_pstate is that we
+	 * do not want the enable hysteresis to be less than the wakeup latency.
+	 *
+	 * igt/gem_exec_nop/sequential provides a rough estimate for the
+	 * service latency, and puts it around 10us for Broadwell (and other
+	 * big core) and around 40us for Broxton (and other low power cores).
+	 * [Note that for legacy ringbuffer submission, this is less than 1us!]
+	 * However, the wakeup latency on Broxton is closer to 100us. To be
+	 * conservative, we have to factor in a context switch on top (due
+	 * to ksoftirqd).
+	 */
+	I915_WRITE(GEN9_MEDIA_PG_IDLE_HYSTERESIS, 250);
+	I915_WRITE(GEN9_RENDER_PG_IDLE_HYSTERESIS, 250);
+
+	/* 3a: Enable RC6 */
+	I915_WRITE(GEN6_RC6_THRESHOLD, 37500); /* 37.5/125ms per EI */
+
+	I915_WRITE(GEN6_RC_CONTROL,
+		   GEN6_RC_CTL_HW_ENABLE |
+		   GEN6_RC_CTL_RC6_ENABLE |
+		   GEN6_RC_CTL_EI_MODE(1));
+
+	/*
+	 * 3b: Enable Coarse Power Gating only when RC6 is enabled.
+	 */
+	I915_WRITE(GEN9_PG_ENABLE,
+		   GEN9_RENDER_PG_ENABLE | GEN9_MEDIA_PG_ENABLE);
+
+	intel_uncore_forcewake_put(&dev_priv->uncore, FORCEWAKE_ALL);
+}
+
 static void gen9_enable_rc6(struct drm_i915_private *dev_priv)
 {
 	struct intel_engine_cs *engine;
@@ -8596,6 +8666,8 @@  static void intel_enable_rc6(struct drm_i915_private *dev_priv)
 		cherryview_enable_rc6(dev_priv);
 	else if (IS_VALLEYVIEW(dev_priv))
 		valleyview_enable_rc6(dev_priv);
+	else if (INTEL_GEN(dev_priv) >= 11)
+		gen11_enable_rc6(dev_priv);
 	else if (INTEL_GEN(dev_priv) >= 9)
 		gen9_enable_rc6(dev_priv);
 	else if (IS_BROADWELL(dev_priv))