Panic after S3 resume and modeset with MST

From: Takashi Iwai <tiwai@suse.de>

On Wed, 29 Mar 2017 15:34:15 +0200,
Ville Syrjälä wrote:
> 
> On Wed, Mar 29, 2017 at 03:10:09PM +0200, Takashi Iwai wrote:
> > On Mon, 27 Mar 2017 18:02:13 +0200,
> > Takashi Iwai wrote:
> > > 
> > > Hi,
> > > 
> > > the upstream fix a16b7658f4e0d4aec9bc3e75a5f0cc3f7a3a0422
> > >     drm/i915: Call intel_dp_mst_resume() before resuming displays
> > > 
> > > seems to trigger a kernel panic when some modeset change happens after
> > > S3 resume.  The details are found in openSUSE bugzilla,
> > >   https://bugzilla.suse.com/show_bug.cgi?id=1029634
> > > 
> > > In short, the following procedure causes a kernel panic (supposedly)
> > > almost 100% on Dell Latitude with Skylake with MST DP on dock:
> > > 
> > > - Boot with a docking station, DP-1 connected.
> > > - Login on X
> > > - xrandr --output eDP-1 --primary --auto --output DP-1-1 --auto --left-of eDP-1
> > >   ==> This changes the mode.
> > > - Suspend ("systemctl suspend" in my case), and close the lid.
> > > - Remove from the dock (keep the lid closed).
> > > - Open the lid, which resumes automatically.  It works.
> > > - Suspend again.
> > > - Connect to the dock again (keep the lid closed).
> > > - Open the lid, which resumes automatically.  It's still OK.
> > > - xrandr --output eDP-1 --primary --auto --output DP-1-1 --auto --left-of eDP-1
> > >   ==> Now the kernel feezes.
> > > 
> > > Reverting the commit mentioned above fixes the problem.
> > > 
> > > The problem is present in all versions I tested.  The reported kernel
> > > in the Bugzilla is 4.4.x-based one, but the issue is seen in 4.11-rc3,
> > > too.  Note that the S3 resume itself works in 4.11-rc3; the kernel
> > > panic happens when invoking xrandr manually after that.
> > > 
> > > Unfortunately, I couldn't get a kernel panic message, so far.  kdump
> > > didn't work well in this case by some reason.  There are some
> > > screenshots taken by the original reporter (could switch VT
> > > beforehand), but I don't know whether it helps.
> > > 
> > > If you have any hints for further debugging, it'd be highly
> > > appreciated.
> > 
> > It seems that the patch below works around the problem.
> > Can anyone enlighten what's going on there?
> > 
> > 
> > thanks,
> > 
> > Takashi
> > 
> > -- 8< --
> > From: Takashi Iwai <tiwai@suse.de>
> > Subject: [PATCH] drm/i915: Fix crash after S3 resume with DP MST mode change
> > 
> > We've got a bug report showing that Skylake Dell machines with a
> > docking station causes a kernel panic after S3 resume and modeset.
> > The details are found in the openSUSE bugzilla entry below.  The
> > typical test procedure is:
> > 
> > - Laptop is Dell Latitude with eDP (1366x768)
> > - Boot with docking station connected to a DP (1920x1080)
> > - Login, change the mode via
> >   xrandr --output eDP-1 --auto --output DP-1-1 --auto --left-of eDP-1
> > - Suspend, and close the lid after the suspend
> >   (or close the lid to trigger the suspend)
> > - Undock while keeping the lid closed.
> > - Open the lid, which triggers the resume;
> >   the machine wakes up well, and X shows up.  No problem, so far.
> > - Suspend again, close the lid.
> > - Dock again while keeping the lid closed.
> > - Open the lid, triggering the resume; this wakes up still fine.
> > - At this moment, run xrandr again to re-setup DP-1
> >   xrandr --output eDP-1 --auto --output DP-1-1 --auto --left-of eDP-1
> >   ==> This triggers a hard crash.
> > 
> > I could bisect it, and this leaded to the commit a16b7658f4e0
> > ("drm/i915: Call intel_dp_mst_resume() before resuming displays").
> > 
> > Basically the commit just shuffles the calls of intel_display_resume()
> > and intel_dp_mst_resume().  So as a workaround, I tried to split
> > intel_dp_mst_resume() call to postpone the suspected code (the
> > invocation of intel_dp_check_mst_status()), then bingo, this cured the
> > problem.
> > 
> > But don't ask me *why* this fixes.  It's still in a cargo-cult state.
> > 
> > Fixes: a16b7658f4e0 ("drm/i915: Call intel_dp_mst_resume() before resuming displays")
> > Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1029634
> > Signed-off-by: Takashi Iwai <tiwai@suse.de>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.c  |  5 ++++-
> >  drivers/gpu/drm/i915/intel_dp.c  | 20 +++++++++++++++++++-
> >  drivers/gpu/drm/i915/intel_drv.h |  3 ++-
> >  3 files changed, 25 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > index 1c75402a59c1..62c40090ceed 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -1559,6 +1559,7 @@ static int i915_suspend_switcheroo(struct drm_device *dev, pm_message_t state)
> >  static int i915_drm_resume(struct drm_device *dev)
> >  {
> >  	struct drm_i915_private *dev_priv = to_i915(dev);
> > +	int mst_pending;
> >  	int ret;
> >  
> >  	disable_rpm_wakeref_asserts(dev_priv);
> > @@ -1608,10 +1609,12 @@ static int i915_drm_resume(struct drm_device *dev)
> >  		dev_priv->display.hpd_irq_setup(dev_priv);
> >  	spin_unlock_irq(&dev_priv->irq_lock);
> >  
> > -	intel_dp_mst_resume(dev);
> > +	mst_pending = intel_dp_mst_resume(dev);
> >  
> >  	intel_display_resume(dev);
> >  
> > +	intel_dp_mst_resume_post(dev, mst_pending);
> > +
> >  	drm_kms_helper_poll_enable(dev);
> >  
> >  	/*
> > diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
> > index d1670b8afbf5..fc5ea900e6f3 100644
> > --- a/drivers/gpu/drm/i915/intel_dp.c
> > +++ b/drivers/gpu/drm/i915/intel_dp.c
> > @@ -6027,9 +6027,10 @@ void intel_dp_mst_suspend(struct drm_device *dev)
> >  	}
> >  }
> >  
> > -void intel_dp_mst_resume(struct drm_device *dev)
> > +int intel_dp_mst_resume(struct drm_device *dev)
> >  {
> >  	struct drm_i915_private *dev_priv = to_i915(dev);
> > +	int pending = 0;
> >  	int i;
> >  
> >  	for (i = 0; i < I915_MAX_PORTS; i++) {
> > @@ -6041,6 +6042,23 @@ void intel_dp_mst_resume(struct drm_device *dev)
> >  
> >  		ret = drm_dp_mst_topology_mgr_resume(&intel_dig_port->dp.mst_mgr);
> >  		if (ret)
> > +			pending |= 1 << i;
> > +	}
> > +
> > +	return pending;
> > +}
> > +
> > +void intel_dp_mst_resume_post(struct drm_device *dev, int pending)
> > +{
> > +	struct drm_i915_private *dev_priv = to_i915(dev);
> > +	int i;
> > +
> > +	for (i = 0; i < I915_MAX_PORTS; i++) {
> > +		struct intel_digital_port *intel_dig_port =
> > +			dev_priv->hotplug.irq_port[i];
> > +		if (!intel_dig_port || !intel_dig_port->dp.can_mst)
> > +			continue;
> > +		if (pending & (1 << i))
> >  			intel_dp_check_mst_status(&intel_dig_port->dp);
> >  	}
> >  }
> 
> The whole MST resume is a bit of chicken and egg type of situation. We
> need the HPD interrupts to resume the previous state, but we don't want
> to actually process real hotplugs until we've done the resume. The
> current code is definitely broken IMO.
> 
> But I'm not really sure why this patch fixes things because the HPD
> processing that will occur when we talk to the sink during the display
> resume should also call intel_dp_check_mst_status().

Actually, just dropping intel_dp_check_mst_status() calls in
intel_dp_mst_resume() seems enough to fix the problem.

Below is the v2 patch doing that.  Does it make more sense?

thanks,

Takashi

-- 8< --
From: Takashi Iwai <tiwai@suse.de>
Subject: [PATCH] drm/i915: Fix crash after S3 resume with DP MST mode change
 (v2)

We've got a bug report showing that Skylake Dell machines with a
docking station causes a kernel panic after S3 resume and modeset.
The details are found in the openSUSE bugzilla entry below.  The
typical test procedure is:

- Laptop is Dell Latitude with eDP (1366x768)
- Boot with docking station connected to a DP (1920x1080)
- Login, change the mode via
  xrandr --output eDP-1 --auto --output DP-1-1 --auto --left-of eDP-1
- Suspend, and close the lid after the suspend
  (or close the lid to trigger the suspend)
- Undock while keeping the lid closed.
- Open the lid, which triggers the resume;
  the machine wakes up well, and X shows up.  No problem, so far.
- Suspend again, close the lid.
- Dock again while keeping the lid closed.
- Open the lid, triggering the resume; this wakes up still fine.
- At this moment, run xrandr again to re-setup DP-1
  xrandr --output eDP-1 --auto --output DP-1-1 --auto --left-of eDP-1
  ==> This triggers a hard crash.

I could bisect it, and this leaded to the commit a16b7658f4e0
("drm/i915: Call intel_dp_mst_resume() before resuming displays").

This patch tries to work around the crash by ignoring the failed port
from drm_dp_mst_topology_mgr_resume().  They should be handled in hpd
later in anyway.

v1->v2: just ignore the drm_dp_mst_topology_mgr_resume() error codes
        instead of postponing.

Fixes: a16b7658f4e0 ("drm/i915: Call intel_dp_mst_resume() before resuming displays")
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1029634
Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 drivers/gpu/drm/i915/intel_dp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Panic after S3 resume and modeset with MST

Commit Message

Comments

Patch