[4/7] drm/i915: no hangcheck when reset is in progress

Message ID	1372861332-6308-5-git-send-email-mika.kuoppala@intel.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <intel-gfx-bounces+patchwork-intel-gfx=patchwork.kernel.org@lists.freedesktop.org> From: Mika Kuoppala <mika.kuoppala@linux.intel.com> To: intel-gfx@lists.freedesktop.org Date: Wed, 3 Jul 2013 17:22:09 +0300 Message-Id: <1372861332-6308-5-git-send-email-mika.kuoppala@intel.com> In-Reply-To: <1372861332-6308-1-git-send-email-mika.kuoppala@intel.com> References: <1372861332-6308-1-git-send-email-mika.kuoppala@intel.com> Subject: [Intel-gfx] [PATCH 4/7] drm/i915: no hangcheck when reset is in progress Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: intel-gfx-bounces+patchwork-intel-gfx=patchwork.kernel.org@lists.freedesktop.org Errors-To: intel-gfx-bounces+patchwork-intel-gfx=patchwork.kernel.org@lists.freedesktop.org

Message ID

1372861332-6308-5-git-send-email-mika.kuoppala@intel.com (mailing list archive)

State

New, archived

Headers

From: Mika Kuoppala <mika.kuoppala@linux.intel.com>
To: intel-gfx@lists.freedesktop.org
Date: Wed,  3 Jul 2013 17:22:09 +0300
Message-Id: <1372861332-6308-5-git-send-email-mika.kuoppala@intel.com>
In-Reply-To: <1372861332-6308-1-git-send-email-mika.kuoppala@intel.com>
References: <1372861332-6308-1-git-send-email-mika.kuoppala@intel.com>
Subject: [Intel-gfx] [PATCH 4/7] drm/i915: no hangcheck when reset is in
	progress
Precedence: list
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: intel-gfx-bounces+patchwork-intel-gfx=patchwork.kernel.org@lists.freedesktop.org
Errors-To: intel-gfx-bounces+patchwork-intel-gfx=patchwork.kernel.org@lists.freedesktop.org

Commit Message

Mika Kuoppala July 3, 2013, 2:22 p.m. UTC

From: Mika Kuoppala <mika.kuoppala@linux.intel.com>

The timer for hangchecking can run again before the previous
reset it has triggered has been handled. This can corrupt
the hangcheck state as reset handling will access and write to
the hangcheck data. To prevent this, avoid running the hangcheck
logic while reset is in progress.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_irq.c |    3 +++
 1 file changed, 3 insertions(+)

Comments

Daniel Vetter July 16, 2013, 8:45 a.m. UTC | #1

On Wed, Jul 03, 2013 at 05:22:09PM +0300, Mika Kuoppala wrote:
> From: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> 
> The timer for hangchecking can run again before the previous
> reset it has triggered has been handled. This can corrupt
> the hangcheck state as reset handling will access and write to
> the hangcheck data. To prevent this, avoid running the hangcheck
> logic while reset is in progress.
> 
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_irq.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index dc1b878..b0fec7f 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2447,6 +2447,9 @@ void i915_hangcheck_elapsed(unsigned long data)
>  	if (!i915_enable_hangcheck)
>  		return;
>  
> +	if (i915_reset_in_progress(&dev_priv->gpu_error))
> +		return;

atomic_t are weakly-ordered (I know, we're on x86 here ...) and I think
we're missing the requisite amount of barriers to keep races at bay.

I think wrapping up all access to the hangcheck stats (both from the
hangcheck timer and in the reset code and eventually in the readout code)
with an irq-save spinlock is the simpler and much more obviously correct
(and hence safer) option. Note that you can't optimize away the irq
save/restore stuff in the timer since we call the hangcheck stuff also
from hardirq context.

So I'll punt on this patch here for now, merged all the others leading up
to this one here.

Thanks, Daniel

> +
>  	for_each_ring(ring, dev_priv, i) {
>  		u32 seqno, acthd;
>  		bool busy = true;
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index dc1b878..b0fec7f 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2447,6 +2447,9 @@  void i915_hangcheck_elapsed(unsigned long data)
 	if (!i915_enable_hangcheck)
 		return;
 
+	if (i915_reset_in_progress(&dev_priv->gpu_error))
+		return;
+
 	for_each_ring(ring, dev_priv, i) {
 		u32 seqno, acthd;
 		bool busy = true;

[4/7] drm/i915: no hangcheck when reset is in progress

Commit Message

Comments

Patch