From patchwork Wed Mar 29 13:54:43 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Takashi Iwai X-Patchwork-Id: 9651599 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 72CEA602BE for ; Wed, 29 Mar 2017 13:54:49 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5BF2F28496 for ; Wed, 29 Mar 2017 13:54:49 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4D06E2849F; Wed, 29 Mar 2017 13:54:49 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 946A628496 for ; Wed, 29 Mar 2017 13:54:48 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id A7F346E73D; Wed, 29 Mar 2017 13:54:47 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by gabe.freedesktop.org (Postfix) with ESMTPS id DAE106E73D for ; Wed, 29 Mar 2017 13:54:44 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay1.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 4A3D2ABBD; Wed, 29 Mar 2017 13:54:43 +0000 (UTC) Date: Wed, 29 Mar 2017 15:54:43 +0200 Message-ID: From: Takashi Iwai To: Ville =?UTF-8?B?U3lyasOkbMOk?= In-Reply-To: <20170329133415.GZ30290@intel.com> References: <20170329133415.GZ30290@intel.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI/1.14.6 (Maruoka) FLIM/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL/10.8 Emacs/25.1 (x86_64-suse-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka") Cc: intel-gfx@lists.freedesktop.org Subject: Re: [Intel-gfx] Panic after S3 resume and modeset with MST X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP On Wed, 29 Mar 2017 15:34:15 +0200, Ville Syrjälä wrote: > > On Wed, Mar 29, 2017 at 03:10:09PM +0200, Takashi Iwai wrote: > > On Mon, 27 Mar 2017 18:02:13 +0200, > > Takashi Iwai wrote: > > > > > > Hi, > > > > > > the upstream fix a16b7658f4e0d4aec9bc3e75a5f0cc3f7a3a0422 > > > drm/i915: Call intel_dp_mst_resume() before resuming displays > > > > > > seems to trigger a kernel panic when some modeset change happens after > > > S3 resume. The details are found in openSUSE bugzilla, > > > https://bugzilla.suse.com/show_bug.cgi?id=1029634 > > > > > > In short, the following procedure causes a kernel panic (supposedly) > > > almost 100% on Dell Latitude with Skylake with MST DP on dock: > > > > > > - Boot with a docking station, DP-1 connected. > > > - Login on X > > > - xrandr --output eDP-1 --primary --auto --output DP-1-1 --auto --left-of eDP-1 > > > ==> This changes the mode. > > > - Suspend ("systemctl suspend" in my case), and close the lid. > > > - Remove from the dock (keep the lid closed). > > > - Open the lid, which resumes automatically. It works. > > > - Suspend again. > > > - Connect to the dock again (keep the lid closed). > > > - Open the lid, which resumes automatically. It's still OK. > > > - xrandr --output eDP-1 --primary --auto --output DP-1-1 --auto --left-of eDP-1 > > > ==> Now the kernel feezes. > > > > > > Reverting the commit mentioned above fixes the problem. > > > > > > The problem is present in all versions I tested. The reported kernel > > > in the Bugzilla is 4.4.x-based one, but the issue is seen in 4.11-rc3, > > > too. Note that the S3 resume itself works in 4.11-rc3; the kernel > > > panic happens when invoking xrandr manually after that. > > > > > > Unfortunately, I couldn't get a kernel panic message, so far. kdump > > > didn't work well in this case by some reason. There are some > > > screenshots taken by the original reporter (could switch VT > > > beforehand), but I don't know whether it helps. > > > > > > If you have any hints for further debugging, it'd be highly > > > appreciated. > > > > It seems that the patch below works around the problem. > > Can anyone enlighten what's going on there? > > > > > > thanks, > > > > Takashi > > > > -- 8< -- > > From: Takashi Iwai > > Subject: [PATCH] drm/i915: Fix crash after S3 resume with DP MST mode change > > > > We've got a bug report showing that Skylake Dell machines with a > > docking station causes a kernel panic after S3 resume and modeset. > > The details are found in the openSUSE bugzilla entry below. The > > typical test procedure is: > > > > - Laptop is Dell Latitude with eDP (1366x768) > > - Boot with docking station connected to a DP (1920x1080) > > - Login, change the mode via > > xrandr --output eDP-1 --auto --output DP-1-1 --auto --left-of eDP-1 > > - Suspend, and close the lid after the suspend > > (or close the lid to trigger the suspend) > > - Undock while keeping the lid closed. > > - Open the lid, which triggers the resume; > > the machine wakes up well, and X shows up. No problem, so far. > > - Suspend again, close the lid. > > - Dock again while keeping the lid closed. > > - Open the lid, triggering the resume; this wakes up still fine. > > - At this moment, run xrandr again to re-setup DP-1 > > xrandr --output eDP-1 --auto --output DP-1-1 --auto --left-of eDP-1 > > ==> This triggers a hard crash. > > > > I could bisect it, and this leaded to the commit a16b7658f4e0 > > ("drm/i915: Call intel_dp_mst_resume() before resuming displays"). > > > > Basically the commit just shuffles the calls of intel_display_resume() > > and intel_dp_mst_resume(). So as a workaround, I tried to split > > intel_dp_mst_resume() call to postpone the suspected code (the > > invocation of intel_dp_check_mst_status()), then bingo, this cured the > > problem. > > > > But don't ask me *why* this fixes. It's still in a cargo-cult state. > > > > Fixes: a16b7658f4e0 ("drm/i915: Call intel_dp_mst_resume() before resuming displays") > > Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1029634 > > Signed-off-by: Takashi Iwai > > --- > > drivers/gpu/drm/i915/i915_drv.c | 5 ++++- > > drivers/gpu/drm/i915/intel_dp.c | 20 +++++++++++++++++++- > > drivers/gpu/drm/i915/intel_drv.h | 3 ++- > > 3 files changed, 25 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c > > index 1c75402a59c1..62c40090ceed 100644 > > --- a/drivers/gpu/drm/i915/i915_drv.c > > +++ b/drivers/gpu/drm/i915/i915_drv.c > > @@ -1559,6 +1559,7 @@ static int i915_suspend_switcheroo(struct drm_device *dev, pm_message_t state) > > static int i915_drm_resume(struct drm_device *dev) > > { > > struct drm_i915_private *dev_priv = to_i915(dev); > > + int mst_pending; > > int ret; > > > > disable_rpm_wakeref_asserts(dev_priv); > > @@ -1608,10 +1609,12 @@ static int i915_drm_resume(struct drm_device *dev) > > dev_priv->display.hpd_irq_setup(dev_priv); > > spin_unlock_irq(&dev_priv->irq_lock); > > > > - intel_dp_mst_resume(dev); > > + mst_pending = intel_dp_mst_resume(dev); > > > > intel_display_resume(dev); > > > > + intel_dp_mst_resume_post(dev, mst_pending); > > + > > drm_kms_helper_poll_enable(dev); > > > > /* > > diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c > > index d1670b8afbf5..fc5ea900e6f3 100644 > > --- a/drivers/gpu/drm/i915/intel_dp.c > > +++ b/drivers/gpu/drm/i915/intel_dp.c > > @@ -6027,9 +6027,10 @@ void intel_dp_mst_suspend(struct drm_device *dev) > > } > > } > > > > -void intel_dp_mst_resume(struct drm_device *dev) > > +int intel_dp_mst_resume(struct drm_device *dev) > > { > > struct drm_i915_private *dev_priv = to_i915(dev); > > + int pending = 0; > > int i; > > > > for (i = 0; i < I915_MAX_PORTS; i++) { > > @@ -6041,6 +6042,23 @@ void intel_dp_mst_resume(struct drm_device *dev) > > > > ret = drm_dp_mst_topology_mgr_resume(&intel_dig_port->dp.mst_mgr); > > if (ret) > > + pending |= 1 << i; > > + } > > + > > + return pending; > > +} > > + > > +void intel_dp_mst_resume_post(struct drm_device *dev, int pending) > > +{ > > + struct drm_i915_private *dev_priv = to_i915(dev); > > + int i; > > + > > + for (i = 0; i < I915_MAX_PORTS; i++) { > > + struct intel_digital_port *intel_dig_port = > > + dev_priv->hotplug.irq_port[i]; > > + if (!intel_dig_port || !intel_dig_port->dp.can_mst) > > + continue; > > + if (pending & (1 << i)) > > intel_dp_check_mst_status(&intel_dig_port->dp); > > } > > } > > The whole MST resume is a bit of chicken and egg type of situation. We > need the HPD interrupts to resume the previous state, but we don't want > to actually process real hotplugs until we've done the resume. The > current code is definitely broken IMO. > > But I'm not really sure why this patch fixes things because the HPD > processing that will occur when we talk to the sink during the display > resume should also call intel_dp_check_mst_status(). Actually, just dropping intel_dp_check_mst_status() calls in intel_dp_mst_resume() seems enough to fix the problem. Below is the v2 patch doing that. Does it make more sense? thanks, Takashi -- 8< -- From: Takashi Iwai Subject: [PATCH] drm/i915: Fix crash after S3 resume with DP MST mode change (v2) We've got a bug report showing that Skylake Dell machines with a docking station causes a kernel panic after S3 resume and modeset. The details are found in the openSUSE bugzilla entry below. The typical test procedure is: - Laptop is Dell Latitude with eDP (1366x768) - Boot with docking station connected to a DP (1920x1080) - Login, change the mode via xrandr --output eDP-1 --auto --output DP-1-1 --auto --left-of eDP-1 - Suspend, and close the lid after the suspend (or close the lid to trigger the suspend) - Undock while keeping the lid closed. - Open the lid, which triggers the resume; the machine wakes up well, and X shows up. No problem, so far. - Suspend again, close the lid. - Dock again while keeping the lid closed. - Open the lid, triggering the resume; this wakes up still fine. - At this moment, run xrandr again to re-setup DP-1 xrandr --output eDP-1 --auto --output DP-1-1 --auto --left-of eDP-1 ==> This triggers a hard crash. I could bisect it, and this leaded to the commit a16b7658f4e0 ("drm/i915: Call intel_dp_mst_resume() before resuming displays"). This patch tries to work around the crash by ignoring the failed port from drm_dp_mst_topology_mgr_resume(). They should be handled in hpd later in anyway. v1->v2: just ignore the drm_dp_mst_topology_mgr_resume() error codes instead of postponing. Fixes: a16b7658f4e0 ("drm/i915: Call intel_dp_mst_resume() before resuming displays") Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1029634 Signed-off-by: Takashi Iwai Reviewed-by: Lyude --- drivers/gpu/drm/i915/intel_dp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c index d1670b8afbf5..a6c0f0ac16eb 100644 --- a/drivers/gpu/drm/i915/intel_dp.c +++ b/drivers/gpu/drm/i915/intel_dp.c @@ -6041,6 +6041,7 @@ void intel_dp_mst_resume(struct drm_device *dev) ret = drm_dp_mst_topology_mgr_resume(&intel_dig_port->dp.mst_mgr); if (ret) - intel_dp_check_mst_status(&intel_dig_port->dp); + DRM_DEBUG_KMS("DP MST resume failed for port-%c\n", + port_name(intel_dig_port->port)); } }