From patchwork Fri Jun 17 07:09:03 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: arun.siluvery@linux.intel.com
X-Patchwork-Id: 9182701
Return-Path: <intel-gfx-bounces@lists.freedesktop.org>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
	[172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id
	025886075F for <patchwork-intel-gfx@patchwork.kernel.org>;
	Fri, 17 Jun 2016 07:09:54 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E529D283A1
	for <patchwork-intel-gfx@patchwork.kernel.org>;
	Fri, 17 Jun 2016 07:09:53 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id D9DF7283A8; Fri, 17 Jun 2016 07:09:53 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-4.2 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_MED
	autolearn=ham version=3.3.1
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7D4E3283A1
	for <patchwork-intel-gfx@patchwork.kernel.org>;
	Fri, 17 Jun 2016 07:09:53 +0000 (UTC)
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id A1AB56EAE5;
	Fri, 17 Jun 2016 07:09:48 +0000 (UTC)
X-Original-To: intel-gfx@lists.freedesktop.org
Delivered-To: intel-gfx@lists.freedesktop.org
Received: from mga09.intel.com (mga09.intel.com [134.134.136.24])
	by gabe.freedesktop.org (Postfix) with ESMTP id 2DAB96EAE1
	for <intel-gfx@lists.freedesktop.org>;
	Fri, 17 Jun 2016 07:09:26 +0000 (UTC)
Received: from fmsmga002.fm.intel.com ([10.253.24.26])
	by orsmga102.jf.intel.com with ESMTP; 17 Jun 2016 00:09:26 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos; i="5.26,482,1459839600"; d="scan'208";
	a="1003894733"
Received: from asiluver-linux.isw.intel.com ([10.102.226.117])
	by fmsmga002.fm.intel.com with ESMTP; 17 Jun 2016 00:09:24 -0700
From: Arun Siluvery <arun.siluvery@linux.intel.com>
To: intel-gfx@lists.freedesktop.org
Date: Fri, 17 Jun 2016 08:09:03 +0100
Message-Id: <1466147355-4635-4-git-send-email-arun.siluvery@linux.intel.com>
X-Mailer: git-send-email 1.9.1
In-Reply-To: <1466147355-4635-1-git-send-email-arun.siluvery@linux.intel.com>
References: <1466147355-4635-1-git-send-email-arun.siluvery@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>, Tomas Elf <tomas.elf@intel.com>
Subject: [Intel-gfx] [PATCH v2 03/15] drm/i915: Reinstate hang recovery work
	queue.
X-BeenThere: intel-gfx@lists.freedesktop.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Intel graphics driver community testing & development
	<intel-gfx.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/intel-gfx>
List-Post: <mailto:intel-gfx@lists.freedesktop.org>
List-Help: <mailto:intel-gfx-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=subscribe>
MIME-Version: 1.0
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>
X-Virus-Scanned: ClamAV using ClamSMTP

From: Tomas Elf <tomas.elf@intel.com>

There used to be a work queue separating the error handler from the hang
recovery path, which was removed a while back in this commit:

commit b8d24a06568368076ebd5a858a011699a97bfa42
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Wed Jan 28 17:03:14 2015 +0200

    drm/i915: Remove nested work in gpu error handling

Now we need to revert most of that commit since the work queue separating
hang detection from hang recovery is needed in preparation for the upcoming
watchdog timeout feature. The watchdog interrupt service routine will be a
second callsite of the error handler alongside the periodic hang checker,
which runs in a work queue context. Seeing as the error handler will be
serving a caller in a hard interrupt execution context that means that the
error handler must never end up in a situation where it needs to grab the
struct_mutex.  Unfortunately, that is exactly what we need to do first at
the start of the hang recovery path, which might potentially sleep if the
struct_mutex is already held by another thread. Not good when you're in a
hard interrupt context.

Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_dma.c |  1 +
 drivers/gpu/drm/i915/i915_drv.h |  1 +
 drivers/gpu/drm/i915/i915_irq.c | 26 +++++++++++++++++++++-----
 3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 24b670f..8a94fc5 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1552,6 +1552,7 @@ int i915_driver_unload(struct drm_device *dev)
 	/* Free error state after interrupts are fully disabled. */
 	cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
 	i915_destroy_error_state(dev);
+	cancel_work_sync(&dev_priv->gpu_error.work);
 
 	/* Flush any outstanding unpin_work. */
 	flush_workqueue(dev_priv->wq);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8bb05b2..9ad15c7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1372,6 +1372,7 @@ struct i915_gpu_error {
 	spinlock_t lock;
 	/* Protected by the above dev->gpu_error.lock. */
 	struct drm_i915_error_state *first_error;
+	struct work_struct work;
 
 	unsigned long missed_irq_rings;
 
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 4378a65..224ff2b 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2516,14 +2516,18 @@ static void i915_error_wake_up(struct drm_i915_private *dev_priv,
 }
 
 /**
- * i915_reset_and_wakeup - do process context error handling work
- * @dev_priv: i915 device private
+ * i915_error_work_func - do process context error handling work
+ * @work: work item containing error struct, passed by the error handler
  *
  * Fire an error uevent so userspace can see that a hang or error
  * was detected.
  */
-static void i915_reset_and_wakeup(struct drm_i915_private *dev_priv)
+static void i915_error_work_func(struct work_struct *work)
 {
+	struct i915_gpu_error *error = container_of(work, struct i915_gpu_error,
+	                                            work);
+	struct drm_i915_private *dev_priv =
+	        container_of(error, struct drm_i915_private, gpu_error);
 	struct kobject *kobj = &dev_priv->dev->primary->kdev->kobj;
 	char *error_event[] = { I915_ERROR_UEVENT "=1", NULL };
 	char *reset_event[] = { I915_RESET_UEVENT "=1", NULL };
@@ -2717,7 +2721,19 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
 		i915_error_wake_up(dev_priv, false);
 	}
 
-	i915_reset_and_wakeup(dev_priv);
+	/*
+	 * Our reset work can grab modeset locks (since it needs to reset the
+	 * state of outstanding pageflips). Hence it must not be run on our own
+	 * dev-priv->wq work queue for otherwise the flush_work in the pageflip
+	 * code will deadlock.
+	 * If error_work is already in the work queue then it will not be added
+	 * again. It hasn't yet executed so it will see the reset flags when
+	 * it is scheduled. If it isn't in the queue or it is currently
+	 * executing then this call will add it to the queue again so that
+	 * even if it misses the reset flags during the current call it is
+	 * guaranteed to see them on the next call.
+	 */
+	schedule_work(&dev_priv->gpu_error.work);
 }
 
 /* Called from drm generic code, passed 'crtc' which
@@ -4556,7 +4572,7 @@ void intel_irq_init(struct drm_i915_private *dev_priv)
 	struct drm_device *dev = dev_priv->dev;
 
 	intel_hpd_init_work(dev_priv);
-
+	INIT_WORK(&dev_priv->gpu_error.work, i915_error_work_func);
 	INIT_WORK(&dev_priv->rps.work, gen6_pm_rps_work);
 	INIT_WORK(&dev_priv->l3_parity.error_work, ivybridge_parity_work);