diff mbox

[3/3] drm/i915: fix infinite loop at gen6_update_ring_freq

Message ID 1396391989-2244-4-git-send-email-przanoni@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Paulo Zanoni April 1, 2014, 10:39 p.m. UTC
From: Paulo Zanoni <paulo.r.zanoni@intel.com>

If I boot my Broadwell machine to X on a system with Mesa Gallium
llvmpipe instead of i965, then kill X and try to run pm_pc8.c, when we
disable PC8 and call gen6_update_ring_freq(), we will get stuck on an
infinite loop because the frequencies are zero and the variables are
unsigned. This happens because we never ran any batch, so we did not
enable RC6, so the variables are zero. If I run gem_exec_nop before
running pm_pc8, everything works as expected because gem_exec_nop
makes RC6 be enabled.

This commit should prevent the infinite loop, which IMHO is already a
good reason to be merged, but it is not the proper fix to the "RC6 is
not being enabled" problem.

Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
---
 drivers/gpu/drm/i915/intel_pm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Ben Widawsky April 2, 2014, 5:03 a.m. UTC | #1
On Tue, Apr 01, 2014 at 07:39:49PM -0300, Paulo Zanoni wrote:
> From: Paulo Zanoni <paulo.r.zanoni@intel.com>
> 
> If I boot my Broadwell machine to X on a system with Mesa Gallium
> llvmpipe instead of i965, then kill X and try to run pm_pc8.c, when we
> disable PC8 and call gen6_update_ring_freq(), we will get stuck on an
> infinite loop because the frequencies are zero and the variables are
> unsigned. This happens because we never ran any batch, so we did not
> enable RC6, so the variables are zero. If I run gem_exec_nop before
> running pm_pc8, everything works as expected because gem_exec_nop
> makes RC6 be enabled.
> 
> This commit should prevent the infinite loop, which IMHO is already a
> good reason to be merged, but it is not the proper fix to the "RC6 is
> not being enabled" problem.
> 
> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>

You should determine if it's possible to hit this without my patch. I
convinced myself pretty thoroughly that it isn't possible to enter pc8
(and thus get to this point) without actually turning on RC6. One theory
I had which I'm pretty sure I disproved is if BIOS leaves RC6 on, and we
get into PC8 before the deferred rc6 workqueue runs. I'm pretty certain
however that we forcibly disable rc6 first, so it can't happen.

> ---
>  drivers/gpu/drm/i915/intel_pm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index cebe0d4..0a0685b 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -3487,7 +3487,8 @@ void gen6_update_ring_freq(struct drm_device *dev)
>  	 * to use for memory access.  We do this by specifying the IA frequency
>  	 * the PCU should use as a reference to determine the ring frequency.
>  	 */

So I think it should really be (if anything at all):
if (WARN_ON(!dev_priv->rps.max_freq_softlimit))
	return;

> -	for (gpu_freq = dev_priv->rps.max_freq_softlimit; gpu_freq >= dev_priv->rps.min_freq_softlimit;
> +	for (gpu_freq = dev_priv->rps.max_freq_softlimit;
> +	     gpu_freq >= dev_priv->rps.min_freq_softlimit && gpu_freq != 0;
>  	     gpu_freq--) {
>  		int diff = dev_priv->rps.max_freq_softlimit - gpu_freq;
>  		unsigned int ia_freq = 0, ring_freq = 0;
> -- 
> 1.8.5.3
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Daniel Vetter April 2, 2014, 7:26 a.m. UTC | #2
On Tue, Apr 01, 2014 at 07:39:49PM -0300, Paulo Zanoni wrote:
> From: Paulo Zanoni <paulo.r.zanoni@intel.com>
> 
> If I boot my Broadwell machine to X on a system with Mesa Gallium
> llvmpipe instead of i965, then kill X and try to run pm_pc8.c, when we
> disable PC8 and call gen6_update_ring_freq(), we will get stuck on an
> infinite loop because the frequencies are zero and the variables are
> unsigned. This happens because we never ran any batch, so we did not
> enable RC6, so the variables are zero. If I run gem_exec_nop before
> running pm_pc8, everything works as expected because gem_exec_nop
> makes RC6 be enabled.
> 
> This commit should prevent the infinite loop, which IMHO is already a
> good reason to be merged, but it is not the proper fix to the "RC6 is
> not being enabled" problem.
> 
> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>

This will be fixed (I hope at least) once we have the in-kernel 3d state
emission stuff.

But there's still a race when we go into runtime suspend right after
coming out of system resume. I think we need to grab a runtime pm
reference when we launch the delayed rps work and drop it once that has
run. Or at least we need to have a flush_delayed_work in the runtime pm
suspend function.

There's probably piles more stuff like this.
-Daniel
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index cebe0d4..0a0685b 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3487,7 +3487,8 @@  void gen6_update_ring_freq(struct drm_device *dev)
 	 * to use for memory access.  We do this by specifying the IA frequency
 	 * the PCU should use as a reference to determine the ring frequency.
 	 */
-	for (gpu_freq = dev_priv->rps.max_freq_softlimit; gpu_freq >= dev_priv->rps.min_freq_softlimit;
+	for (gpu_freq = dev_priv->rps.max_freq_softlimit;
+	     gpu_freq >= dev_priv->rps.min_freq_softlimit && gpu_freq != 0;
 	     gpu_freq--) {
 		int diff = dev_priv->rps.max_freq_softlimit - gpu_freq;
 		unsigned int ia_freq = 0, ring_freq = 0;