diff mbox

Panic after S3 resume and modeset with MST

Message ID s5hlgrojkjy.wl-tiwai@suse.de (mailing list archive)
State New, archived
Headers show

Commit Message

Takashi Iwai March 29, 2017, 1:10 p.m. UTC
On Mon, 27 Mar 2017 18:02:13 +0200,
Takashi Iwai wrote:
> 
> Hi,
> 
> the upstream fix a16b7658f4e0d4aec9bc3e75a5f0cc3f7a3a0422
>     drm/i915: Call intel_dp_mst_resume() before resuming displays
> 
> seems to trigger a kernel panic when some modeset change happens after
> S3 resume.  The details are found in openSUSE bugzilla,
>   https://bugzilla.suse.com/show_bug.cgi?id=1029634
> 
> In short, the following procedure causes a kernel panic (supposedly)
> almost 100% on Dell Latitude with Skylake with MST DP on dock:
> 
> - Boot with a docking station, DP-1 connected.
> - Login on X
> - xrandr --output eDP-1 --primary --auto --output DP-1-1 --auto --left-of eDP-1
>   ==> This changes the mode.
> - Suspend ("systemctl suspend" in my case), and close the lid.
> - Remove from the dock (keep the lid closed).
> - Open the lid, which resumes automatically.  It works.
> - Suspend again.
> - Connect to the dock again (keep the lid closed).
> - Open the lid, which resumes automatically.  It's still OK.
> - xrandr --output eDP-1 --primary --auto --output DP-1-1 --auto --left-of eDP-1
>   ==> Now the kernel feezes.
> 
> Reverting the commit mentioned above fixes the problem.
> 
> The problem is present in all versions I tested.  The reported kernel
> in the Bugzilla is 4.4.x-based one, but the issue is seen in 4.11-rc3,
> too.  Note that the S3 resume itself works in 4.11-rc3; the kernel
> panic happens when invoking xrandr manually after that.
> 
> Unfortunately, I couldn't get a kernel panic message, so far.  kdump
> didn't work well in this case by some reason.  There are some
> screenshots taken by the original reporter (could switch VT
> beforehand), but I don't know whether it helps.
> 
> If you have any hints for further debugging, it'd be highly
> appreciated.

It seems that the patch below works around the problem.
Can anyone enlighten what's going on there?


thanks,

Takashi

-- 8< --
From: Takashi Iwai <tiwai@suse.de>
Subject: [PATCH] drm/i915: Fix crash after S3 resume with DP MST mode change

We've got a bug report showing that Skylake Dell machines with a
docking station causes a kernel panic after S3 resume and modeset.
The details are found in the openSUSE bugzilla entry below.  The
typical test procedure is:

- Laptop is Dell Latitude with eDP (1366x768)
- Boot with docking station connected to a DP (1920x1080)
- Login, change the mode via
  xrandr --output eDP-1 --auto --output DP-1-1 --auto --left-of eDP-1
- Suspend, and close the lid after the suspend
  (or close the lid to trigger the suspend)
- Undock while keeping the lid closed.
- Open the lid, which triggers the resume;
  the machine wakes up well, and X shows up.  No problem, so far.
- Suspend again, close the lid.
- Dock again while keeping the lid closed.
- Open the lid, triggering the resume; this wakes up still fine.
- At this moment, run xrandr again to re-setup DP-1
  xrandr --output eDP-1 --auto --output DP-1-1 --auto --left-of eDP-1
  ==> This triggers a hard crash.

I could bisect it, and this leaded to the commit a16b7658f4e0
("drm/i915: Call intel_dp_mst_resume() before resuming displays").

Basically the commit just shuffles the calls of intel_display_resume()
and intel_dp_mst_resume().  So as a workaround, I tried to split
intel_dp_mst_resume() call to postpone the suspected code (the
invocation of intel_dp_check_mst_status()), then bingo, this cured the
problem.

But don't ask me *why* this fixes.  It's still in a cargo-cult state.

Fixes: a16b7658f4e0 ("drm/i915: Call intel_dp_mst_resume() before resuming displays")
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1029634
Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 drivers/gpu/drm/i915/i915_drv.c  |  5 ++++-
 drivers/gpu/drm/i915/intel_dp.c  | 20 +++++++++++++++++++-
 drivers/gpu/drm/i915/intel_drv.h |  3 ++-
 3 files changed, 25 insertions(+), 3 deletions(-)

Comments

Ville Syrjälä March 29, 2017, 1:34 p.m. UTC | #1
On Wed, Mar 29, 2017 at 03:10:09PM +0200, Takashi Iwai wrote:
> On Mon, 27 Mar 2017 18:02:13 +0200,
> Takashi Iwai wrote:
> > 
> > Hi,
> > 
> > the upstream fix a16b7658f4e0d4aec9bc3e75a5f0cc3f7a3a0422
> >     drm/i915: Call intel_dp_mst_resume() before resuming displays
> > 
> > seems to trigger a kernel panic when some modeset change happens after
> > S3 resume.  The details are found in openSUSE bugzilla,
> >   https://bugzilla.suse.com/show_bug.cgi?id=1029634
> > 
> > In short, the following procedure causes a kernel panic (supposedly)
> > almost 100% on Dell Latitude with Skylake with MST DP on dock:
> > 
> > - Boot with a docking station, DP-1 connected.
> > - Login on X
> > - xrandr --output eDP-1 --primary --auto --output DP-1-1 --auto --left-of eDP-1
> >   ==> This changes the mode.
> > - Suspend ("systemctl suspend" in my case), and close the lid.
> > - Remove from the dock (keep the lid closed).
> > - Open the lid, which resumes automatically.  It works.
> > - Suspend again.
> > - Connect to the dock again (keep the lid closed).
> > - Open the lid, which resumes automatically.  It's still OK.
> > - xrandr --output eDP-1 --primary --auto --output DP-1-1 --auto --left-of eDP-1
> >   ==> Now the kernel feezes.
> > 
> > Reverting the commit mentioned above fixes the problem.
> > 
> > The problem is present in all versions I tested.  The reported kernel
> > in the Bugzilla is 4.4.x-based one, but the issue is seen in 4.11-rc3,
> > too.  Note that the S3 resume itself works in 4.11-rc3; the kernel
> > panic happens when invoking xrandr manually after that.
> > 
> > Unfortunately, I couldn't get a kernel panic message, so far.  kdump
> > didn't work well in this case by some reason.  There are some
> > screenshots taken by the original reporter (could switch VT
> > beforehand), but I don't know whether it helps.
> > 
> > If you have any hints for further debugging, it'd be highly
> > appreciated.
> 
> It seems that the patch below works around the problem.
> Can anyone enlighten what's going on there?
> 
> 
> thanks,
> 
> Takashi
> 
> -- 8< --
> From: Takashi Iwai <tiwai@suse.de>
> Subject: [PATCH] drm/i915: Fix crash after S3 resume with DP MST mode change
> 
> We've got a bug report showing that Skylake Dell machines with a
> docking station causes a kernel panic after S3 resume and modeset.
> The details are found in the openSUSE bugzilla entry below.  The
> typical test procedure is:
> 
> - Laptop is Dell Latitude with eDP (1366x768)
> - Boot with docking station connected to a DP (1920x1080)
> - Login, change the mode via
>   xrandr --output eDP-1 --auto --output DP-1-1 --auto --left-of eDP-1
> - Suspend, and close the lid after the suspend
>   (or close the lid to trigger the suspend)
> - Undock while keeping the lid closed.
> - Open the lid, which triggers the resume;
>   the machine wakes up well, and X shows up.  No problem, so far.
> - Suspend again, close the lid.
> - Dock again while keeping the lid closed.
> - Open the lid, triggering the resume; this wakes up still fine.
> - At this moment, run xrandr again to re-setup DP-1
>   xrandr --output eDP-1 --auto --output DP-1-1 --auto --left-of eDP-1
>   ==> This triggers a hard crash.
> 
> I could bisect it, and this leaded to the commit a16b7658f4e0
> ("drm/i915: Call intel_dp_mst_resume() before resuming displays").
> 
> Basically the commit just shuffles the calls of intel_display_resume()
> and intel_dp_mst_resume().  So as a workaround, I tried to split
> intel_dp_mst_resume() call to postpone the suspected code (the
> invocation of intel_dp_check_mst_status()), then bingo, this cured the
> problem.
> 
> But don't ask me *why* this fixes.  It's still in a cargo-cult state.
> 
> Fixes: a16b7658f4e0 ("drm/i915: Call intel_dp_mst_resume() before resuming displays")
> Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1029634
> Signed-off-by: Takashi Iwai <tiwai@suse.de>
> ---
>  drivers/gpu/drm/i915/i915_drv.c  |  5 ++++-
>  drivers/gpu/drm/i915/intel_dp.c  | 20 +++++++++++++++++++-
>  drivers/gpu/drm/i915/intel_drv.h |  3 ++-
>  3 files changed, 25 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 1c75402a59c1..62c40090ceed 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1559,6 +1559,7 @@ static int i915_suspend_switcheroo(struct drm_device *dev, pm_message_t state)
>  static int i915_drm_resume(struct drm_device *dev)
>  {
>  	struct drm_i915_private *dev_priv = to_i915(dev);
> +	int mst_pending;
>  	int ret;
>  
>  	disable_rpm_wakeref_asserts(dev_priv);
> @@ -1608,10 +1609,12 @@ static int i915_drm_resume(struct drm_device *dev)
>  		dev_priv->display.hpd_irq_setup(dev_priv);
>  	spin_unlock_irq(&dev_priv->irq_lock);
>  
> -	intel_dp_mst_resume(dev);
> +	mst_pending = intel_dp_mst_resume(dev);
>  
>  	intel_display_resume(dev);
>  
> +	intel_dp_mst_resume_post(dev, mst_pending);
> +
>  	drm_kms_helper_poll_enable(dev);
>  
>  	/*
> diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
> index d1670b8afbf5..fc5ea900e6f3 100644
> --- a/drivers/gpu/drm/i915/intel_dp.c
> +++ b/drivers/gpu/drm/i915/intel_dp.c
> @@ -6027,9 +6027,10 @@ void intel_dp_mst_suspend(struct drm_device *dev)
>  	}
>  }
>  
> -void intel_dp_mst_resume(struct drm_device *dev)
> +int intel_dp_mst_resume(struct drm_device *dev)
>  {
>  	struct drm_i915_private *dev_priv = to_i915(dev);
> +	int pending = 0;
>  	int i;
>  
>  	for (i = 0; i < I915_MAX_PORTS; i++) {
> @@ -6041,6 +6042,23 @@ void intel_dp_mst_resume(struct drm_device *dev)
>  
>  		ret = drm_dp_mst_topology_mgr_resume(&intel_dig_port->dp.mst_mgr);
>  		if (ret)
> +			pending |= 1 << i;
> +	}
> +
> +	return pending;
> +}
> +
> +void intel_dp_mst_resume_post(struct drm_device *dev, int pending)
> +{
> +	struct drm_i915_private *dev_priv = to_i915(dev);
> +	int i;
> +
> +	for (i = 0; i < I915_MAX_PORTS; i++) {
> +		struct intel_digital_port *intel_dig_port =
> +			dev_priv->hotplug.irq_port[i];
> +		if (!intel_dig_port || !intel_dig_port->dp.can_mst)
> +			continue;
> +		if (pending & (1 << i))
>  			intel_dp_check_mst_status(&intel_dig_port->dp);
>  	}
>  }

The whole MST resume is a bit of chicken and egg type of situation. We
need the HPD interrupts to resume the previous state, but we don't want
to actually process real hotplugs until we've done the resume. The
current code is definitely broken IMO.

But I'm not really sure why this patch fixes things because the HPD
processing that will occur when we talk to the sink during the display
resume should also call intel_dp_check_mst_status().

> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 344f238b283f..2025aafdcd26 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -1447,7 +1447,8 @@ void intel_edp_panel_on(struct intel_dp *intel_dp);
>  void intel_edp_panel_off(struct intel_dp *intel_dp);
>  void intel_dp_add_properties(struct intel_dp *intel_dp, struct drm_connector *connector);
>  void intel_dp_mst_suspend(struct drm_device *dev);
> -void intel_dp_mst_resume(struct drm_device *dev);
> +int intel_dp_mst_resume(struct drm_device *dev);
> +void intel_dp_mst_resume_post(struct drm_device *dev, int pending);
>  int intel_dp_max_link_rate(struct intel_dp *intel_dp);
>  int intel_dp_rate_select(struct intel_dp *intel_dp, int rate);
>  void intel_dp_hot_plug(struct intel_encoder *intel_encoder);
> -- 
> 2.11.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 1c75402a59c1..62c40090ceed 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1559,6 +1559,7 @@  static int i915_suspend_switcheroo(struct drm_device *dev, pm_message_t state)
 static int i915_drm_resume(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
+	int mst_pending;
 	int ret;
 
 	disable_rpm_wakeref_asserts(dev_priv);
@@ -1608,10 +1609,12 @@  static int i915_drm_resume(struct drm_device *dev)
 		dev_priv->display.hpd_irq_setup(dev_priv);
 	spin_unlock_irq(&dev_priv->irq_lock);
 
-	intel_dp_mst_resume(dev);
+	mst_pending = intel_dp_mst_resume(dev);
 
 	intel_display_resume(dev);
 
+	intel_dp_mst_resume_post(dev, mst_pending);
+
 	drm_kms_helper_poll_enable(dev);
 
 	/*
diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index d1670b8afbf5..fc5ea900e6f3 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -6027,9 +6027,10 @@  void intel_dp_mst_suspend(struct drm_device *dev)
 	}
 }
 
-void intel_dp_mst_resume(struct drm_device *dev)
+int intel_dp_mst_resume(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
+	int pending = 0;
 	int i;
 
 	for (i = 0; i < I915_MAX_PORTS; i++) {
@@ -6041,6 +6042,23 @@  void intel_dp_mst_resume(struct drm_device *dev)
 
 		ret = drm_dp_mst_topology_mgr_resume(&intel_dig_port->dp.mst_mgr);
 		if (ret)
+			pending |= 1 << i;
+	}
+
+	return pending;
+}
+
+void intel_dp_mst_resume_post(struct drm_device *dev, int pending)
+{
+	struct drm_i915_private *dev_priv = to_i915(dev);
+	int i;
+
+	for (i = 0; i < I915_MAX_PORTS; i++) {
+		struct intel_digital_port *intel_dig_port =
+			dev_priv->hotplug.irq_port[i];
+		if (!intel_dig_port || !intel_dig_port->dp.can_mst)
+			continue;
+		if (pending & (1 << i))
 			intel_dp_check_mst_status(&intel_dig_port->dp);
 	}
 }
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 344f238b283f..2025aafdcd26 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -1447,7 +1447,8 @@  void intel_edp_panel_on(struct intel_dp *intel_dp);
 void intel_edp_panel_off(struct intel_dp *intel_dp);
 void intel_dp_add_properties(struct intel_dp *intel_dp, struct drm_connector *connector);
 void intel_dp_mst_suspend(struct drm_device *dev);
-void intel_dp_mst_resume(struct drm_device *dev);
+int intel_dp_mst_resume(struct drm_device *dev);
+void intel_dp_mst_resume_post(struct drm_device *dev, int pending);
 int intel_dp_max_link_rate(struct intel_dp *intel_dp);
 int intel_dp_rate_select(struct intel_dp *intel_dp, int rate);
 void intel_dp_hot_plug(struct intel_encoder *intel_encoder);