diff mbox

drm/i915: Protect fbdev across slow or failed initialisation

Message ID 20160205145831.GA14084@wunner.de (mailing list archive)
State New, archived
Headers show

Commit Message

Lukas Wunner Feb. 5, 2016, 2:58 p.m. UTC
Hi Chris,

On Fri, Feb 05, 2016 at 11:09:27AM +0000, Chris Wilson wrote:
> On Fri, Feb 05, 2016 at 01:27:10AM +0100, Lukas Wunner wrote:
> > On Thu, Feb 04, 2016 at 09:21:17AM +0000, Li, Weinan Z wrote:
> > > We still need this patch. Seems 54632abe8ca3 ("drm/i915: Fix oops caused by fbdev initialization
> > > failure") as well as 366e39b4d2c5 ("drm/i915: Tear down fbdev if
> > > initialization fails") this 2 patches can???t cover this case. It???s access ifbdev->fb before the initialization
> > > finished, but not initialization failed. If don???t have any other patches or code update relative, it may still have in 4.5.
> > 
> > Okay, I see.
> > 
> > > 
> > > add info NULL check should be better, it is also initialized in the async queue
> > > >       info = ifbdev->helper.fbdev;
> > > > +     if (info == NULL)
> > > > +            return false;
> > > >       if (!info->screen_base)
> > 
> > So if there's indeed a race between fbdev initialization and the call to
> > intel_fbdev_restore_mode() (on lastclose), there's more that can go awry:
> > - intel_fbdev_restore_mode() dereferences ifbdev->fb->obj
> > - it calls __drm_atomic_helper_set_config() which WARNs if ifbdev->fb is NULL
> > - it calls drm_fb_helper_set_par() which dereferences ifbdev->helper.fbdev
> > 
> > Instead of checking all that it's probably simpler to call
> > async_synchronize_full() at the beginning of intel_fbdev_restore_mode(),
> > as Li Weinan suggested. I'll submit the corresponding one-liner patch
> > tomorrow if noone else does.
> > 
> > Chris' patch also modified intel_fbdev_set_suspend() as well as
> > intel_fbdev_output_poll_changed(), not sure if these are racy as well.
> > At least the stack traces posted by Li Weinan and Gustav Fägerlind
> > only indicate that lastclose is racy.
> 
> We called set-suspend internally, but we don't do any checks before
> hand. I was concerned we may be able to get into a situation where
> .fb_probe fails and leaves a dangling dev_priv->ifbdev for which we
> would then trip over the NULL info->screen_base. So I was looking for a
> suitable guard.

Yes, looking at this with a fresh pair of eyeballs I think you were
totally right, i915_pm_ops is declared as part of i915_pci_driver and
it might indeed happen that i915_pm_suspend() is called before the fbdev
is fully initialized.

As for intel_fbdev_output_poll_changed(), there's even a comment in
i915_load_modeset_init() stating that hotplug events might occur
during fdbev initial config.

Below patch adds the requisite async_synchronize_full() to the three
functions that you also modified and updates that comment.

However AFAICT a corner case remains where we're still vulnerable to races:
async_schedule() runs synchronously "if we're out of memory or if there's
too much work pending already" (see __async_schedule() in kernel/async.c).
In this case no entry is added to the pending list and
async_synchronize_full() might theoretically return before the synchronously
executed function has finished.

The existing call to async_synchronize_full() in intel_fbdev_fini()
seems to be susceptible to the same race condition, but it's probably
very hard to trigger it. (Unload the module before the fbdev is fully
initialized.)

To make it bullet proof we could declare a completion in intel_fbdev.c
and wait for that instead. Opinions?

Thanks,

Lukas

-- >8 --

Subject: [PATCH] drm/i915: Fix races on fbdev

The ->lastclose callback invokes intel_fbdev_restore_mode() and has
been witnessed to run before intel_fbdev_initial_config_async()
has finished.

We might likewise receive hotplug events or be suspended before
we've had a chance to fully set up the fbdev.

Fix by waiting for the asynchronous thread to finish.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93580
Reported-by: Gustav FÀgerlind <gustav.fagerlind@gmail.com>
Reported-by: "Li, Weinan Z" <weinan.z.li@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/gpu/drm/i915/i915_dma.c    | 8 +++-----
 drivers/gpu/drm/i915/intel_fbdev.c | 4 ++++
 2 files changed, 7 insertions(+), 5 deletions(-)

Comments

Daniel Vetter Feb. 15, 2016, 4:32 p.m. UTC | #1
On Fri, Feb 05, 2016 at 03:58:31PM +0100, Lukas Wunner wrote:
> Hi Chris,
> 
> On Fri, Feb 05, 2016 at 11:09:27AM +0000, Chris Wilson wrote:
> > On Fri, Feb 05, 2016 at 01:27:10AM +0100, Lukas Wunner wrote:
> > > On Thu, Feb 04, 2016 at 09:21:17AM +0000, Li, Weinan Z wrote:
> > > > We still need this patch. Seems 54632abe8ca3 ("drm/i915: Fix oops caused by fbdev initialization
> > > > failure") as well as 366e39b4d2c5 ("drm/i915: Tear down fbdev if
> > > > initialization fails") this 2 patches can???t cover this case. It???s access ifbdev->fb before the initialization
> > > > finished, but not initialization failed. If don???t have any other patches or code update relative, it may still have in 4.5.
> > > 
> > > Okay, I see.
> > > 
> > > > 
> > > > add info NULL check should be better, it is also initialized in the async queue
> > > > >       info = ifbdev->helper.fbdev;
> > > > > +     if (info == NULL)
> > > > > +            return false;
> > > > >       if (!info->screen_base)
> > > 
> > > So if there's indeed a race between fbdev initialization and the call to
> > > intel_fbdev_restore_mode() (on lastclose), there's more that can go awry:
> > > - intel_fbdev_restore_mode() dereferences ifbdev->fb->obj
> > > - it calls __drm_atomic_helper_set_config() which WARNs if ifbdev->fb is NULL
> > > - it calls drm_fb_helper_set_par() which dereferences ifbdev->helper.fbdev
> > > 
> > > Instead of checking all that it's probably simpler to call
> > > async_synchronize_full() at the beginning of intel_fbdev_restore_mode(),
> > > as Li Weinan suggested. I'll submit the corresponding one-liner patch
> > > tomorrow if noone else does.
> > > 
> > > Chris' patch also modified intel_fbdev_set_suspend() as well as
> > > intel_fbdev_output_poll_changed(), not sure if these are racy as well.
> > > At least the stack traces posted by Li Weinan and Gustav Fägerlind
> > > only indicate that lastclose is racy.
> > 
> > We called set-suspend internally, but we don't do any checks before
> > hand. I was concerned we may be able to get into a situation where
> > .fb_probe fails and leaves a dangling dev_priv->ifbdev for which we
> > would then trip over the NULL info->screen_base. So I was looking for a
> > suitable guard.
> 
> Yes, looking at this with a fresh pair of eyeballs I think you were
> totally right, i915_pm_ops is declared as part of i915_pci_driver and
> it might indeed happen that i915_pm_suspend() is called before the fbdev
> is fully initialized.
> 
> As for intel_fbdev_output_poll_changed(), there's even a comment in
> i915_load_modeset_init() stating that hotplug events might occur
> during fdbev initial config.
> 
> Below patch adds the requisite async_synchronize_full() to the three
> functions that you also modified and updates that comment.
> 
> However AFAICT a corner case remains where we're still vulnerable to races:
> async_schedule() runs synchronously "if we're out of memory or if there's
> too much work pending already" (see __async_schedule() in kernel/async.c).
> In this case no entry is added to the pending list and
> async_synchronize_full() might theoretically return before the synchronously
> executed function has finished.
> 
> The existing call to async_synchronize_full() in intel_fbdev_fini()
> seems to be susceptible to the same race condition, but it's probably
> very hard to trigger it. (Unload the module before the fbdev is fully
> initialized.)
> 
> To make it bullet proof we could declare a completion in intel_fbdev.c
> and wait for that instead. Opinions?
> 
> Thanks,
> 
> Lukas
> 
> -- >8 --
> 
> Subject: [PATCH] drm/i915: Fix races on fbdev
> 
> The ->lastclose callback invokes intel_fbdev_restore_mode() and has
> been witnessed to run before intel_fbdev_initial_config_async()
> has finished.
> 
> We might likewise receive hotplug events or be suspended before
> we've had a chance to fully set up the fbdev.
> 
> Fix by waiting for the asynchronous thread to finish.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93580
> Reported-by: Gustav FÀgerlind <gustav.fagerlind@gmail.com>
> Reported-by: "Li, Weinan Z" <weinan.z.li@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: stable@vger.kernel.org
> Signed-off-by: Lukas Wunner <lukas@wunner.de>

Can you pls resubmit this patch stand-alone? Patchwork (and therefore CI)
won't pick up inlined patches like this one here unfortunately.

Thanks, Daniel

> ---
>  drivers/gpu/drm/i915/i915_dma.c    | 8 +++-----
>  drivers/gpu/drm/i915/intel_fbdev.c | 4 ++++
>  2 files changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index a0f5659..a76b528 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -437,11 +437,9 @@ static int i915_load_modeset_init(struct drm_device *dev)
>  	 * Some ports require correctly set-up hpd registers for detection to
>  	 * work properly (leading to ghost connected connector status), e.g. VGA
>  	 * on gm45.  Hence we can only set up the initial fbdev config after hpd
> -	 * irqs are fully enabled. Now we should scan for the initial config
> -	 * only once hotplug handling is enabled, but due to screwed-up locking
> -	 * around kms/fbdev init we can't protect the fdbev initial config
> -	 * scanning against hotplug events. Hence do this first and ignore the
> -	 * tiny window where we will loose hotplug notifactions.
> +	 * irqs are fully enabled. We protect the fbdev initial config scanning
> +	 * against hotplug events by waiting in intel_fbdev_output_poll_changed
> +	 * until the asynchronous thread has finished.
>  	 */
>  	intel_fbdev_initial_config_async(dev);
>  
> diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
> index 09840f4..2002b13 100644
> --- a/drivers/gpu/drm/i915/intel_fbdev.c
> +++ b/drivers/gpu/drm/i915/intel_fbdev.c
> @@ -749,6 +749,7 @@ void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous
>  	struct intel_fbdev *ifbdev = dev_priv->fbdev;
>  	struct fb_info *info;
>  
> +	async_synchronize_full();
>  	if (!ifbdev)
>  		return;
>  
> @@ -795,6 +796,8 @@ void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous
>  void intel_fbdev_output_poll_changed(struct drm_device *dev)
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
> +
> +	async_synchronize_full();
>  	if (dev_priv->fbdev)
>  		drm_fb_helper_hotplug_event(&dev_priv->fbdev->helper);
>  }
> @@ -806,6 +809,7 @@ void intel_fbdev_restore_mode(struct drm_device *dev)
>  	struct intel_fbdev *ifbdev = dev_priv->fbdev;
>  	struct drm_fb_helper *fb_helper;
>  
> +	async_synchronize_full();
>  	if (!ifbdev)
>  		return;
>  
> -- 
> 1.8.5.2 (Apple Git-48)
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Lukas Wunner Feb. 15, 2016, 5:13 p.m. UTC | #2
Originally submitted inline with <20160205145831.GA14084@wunner.de>,
hereby resubmitted standalone for CI as requested by Daniel with
<20160215163251.GR11240@phenom.ffwll.local>.

Above-mentioned message contained the following important note:

> However AFAICT a corner case remains where we're still vulnerable to races:
> async_schedule() runs synchronously "if we're out of memory or if there's
> too much work pending already" (see __async_schedule() in kernel/async.c).
> In this case no entry is added to the pending list and
> async_synchronize_full() might theoretically return before the synchronously
> executed function has finished.
>
> The existing call to async_synchronize_full() in intel_fbdev_fini()
> seems to be susceptible to the same race condition, but it's probably
> very hard to trigger it. (Unload the module before the fbdev is fully
> initialized.)
>
> To make it bullet proof we could declare a completion in intel_fbdev.c
> and wait for that instead. Opinions?


Lukas Wunner (1):
  drm/i915: Fix races on fbdev

 drivers/gpu/drm/i915/i915_dma.c    | 8 +++-----
 drivers/gpu/drm/i915/intel_fbdev.c | 4 ++++
 2 files changed, 7 insertions(+), 5 deletions(-)
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index a0f5659..a76b528 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -437,11 +437,9 @@  static int i915_load_modeset_init(struct drm_device *dev)
 	 * Some ports require correctly set-up hpd registers for detection to
 	 * work properly (leading to ghost connected connector status), e.g. VGA
 	 * on gm45.  Hence we can only set up the initial fbdev config after hpd
-	 * irqs are fully enabled. Now we should scan for the initial config
-	 * only once hotplug handling is enabled, but due to screwed-up locking
-	 * around kms/fbdev init we can't protect the fdbev initial config
-	 * scanning against hotplug events. Hence do this first and ignore the
-	 * tiny window where we will loose hotplug notifactions.
+	 * irqs are fully enabled. We protect the fbdev initial config scanning
+	 * against hotplug events by waiting in intel_fbdev_output_poll_changed
+	 * until the asynchronous thread has finished.
 	 */
 	intel_fbdev_initial_config_async(dev);
 
diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
index 09840f4..2002b13 100644
--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -749,6 +749,7 @@  void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous
 	struct intel_fbdev *ifbdev = dev_priv->fbdev;
 	struct fb_info *info;
 
+	async_synchronize_full();
 	if (!ifbdev)
 		return;
 
@@ -795,6 +796,8 @@  void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous
 void intel_fbdev_output_poll_changed(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	async_synchronize_full();
 	if (dev_priv->fbdev)
 		drm_fb_helper_hotplug_event(&dev_priv->fbdev->helper);
 }
@@ -806,6 +809,7 @@  void intel_fbdev_restore_mode(struct drm_device *dev)
 	struct intel_fbdev *ifbdev = dev_priv->fbdev;
 	struct drm_fb_helper *fb_helper;
 
+	async_synchronize_full();
 	if (!ifbdev)
 		return;