diff mbox

drm/i915: Fix race condition in intel_dp_destroy_mst_connector()

Message ID 1458155884-13877-1-git-send-email-cpaul@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

cpaul@redhat.com March 16, 2016, 7:18 p.m. UTC
After unplugging a DP MST display from the system, we have to go through
and destroy all of the DRM connectors associated with it since none of
them are valid anymore. Unfortunately, intel_dp_destroy_mst_connector()
doesn't do a good enough job of ensuring that throughout the destruction
process that no modesettings can be done with the connectors. As it is
right now, intel_dp_destroy_mst_connector() works like this:

* Take all modeset locks
* Clear the configuration of the crtc on the connector, if there is one
* Drop all modeset locks, this is required because of circular
  dependency issues that arise with trying to remove the connector from
  sysfs with modeset locks held
* Unregister the connector
* Take all modeset locks, again
* Do the rest of the required cleaning for destroying the connector
* Finally drop all modeset locks for good

This only works sometimes. During the destruction process, it's very
possible that a userspace application will attempt to do a modesetting
using the connector. When we drop the modeset locks, an ioctl handler
such as drm_mode_setcrtc has the oppurtunity to take all of the modeset
locks from us. When this happens, one thing leads to another and
eventually we end up committing a mode with the non-existent connector:

	[drm:intel_dp_link_training_clock_recovery [i915]] *ERROR* failed to enable link training
	[drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x7cf0001f
	[drm:intel_dp_start_link_train [i915]] *ERROR* failed to start channel equalization
	[drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x7cf0001f
	[drm:intel_mst_pre_enable_dp [i915]] *ERROR* failed to allocate vcpi

And in some cases, such as with the T460s using an MST dock, this
results in breaking modesetting and/or panicking the system.

To work around this, we now unregister the connector at the very
beginning of intel_dp_destroy_mst_connector(), grab all the modesetting
locks, and then hold them until we finish the rest of the function.

CC: stable@vger.kernel.org
Signed-off-by: Lyude <cpaul@redhat.com>
Signed-off-by: Rob Clark <rclark@redhat.com>
---
 drivers/gpu/drm/i915/intel_dp_mst.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

Comments

Ville Syrjala March 16, 2016, 7:39 p.m. UTC | #1
On Wed, Mar 16, 2016 at 03:18:04PM -0400, Lyude wrote:
> After unplugging a DP MST display from the system, we have to go through
> and destroy all of the DRM connectors associated with it since none of
> them are valid anymore. Unfortunately, intel_dp_destroy_mst_connector()
> doesn't do a good enough job of ensuring that throughout the destruction
> process that no modesettings can be done with the connectors. As it is
> right now, intel_dp_destroy_mst_connector() works like this:
> 
> * Take all modeset locks
> * Clear the configuration of the crtc on the connector, if there is one
> * Drop all modeset locks, this is required because of circular
>   dependency issues that arise with trying to remove the connector from
>   sysfs with modeset locks held
> * Unregister the connector
> * Take all modeset locks, again
> * Do the rest of the required cleaning for destroying the connector
> * Finally drop all modeset locks for good

So pretty much what I suspected
https://lists.freedesktop.org/archives/intel-gfx/2016-February/087734.html

> 
> This only works sometimes. During the destruction process, it's very
> possible that a userspace application will attempt to do a modesetting
> using the connector. When we drop the modeset locks, an ioctl handler
> such as drm_mode_setcrtc has the oppurtunity to take all of the modeset
> locks from us. When this happens, one thing leads to another and
> eventually we end up committing a mode with the non-existent connector:
> 
> 	[drm:intel_dp_link_training_clock_recovery [i915]] *ERROR* failed to enable link training
> 	[drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x7cf0001f
> 	[drm:intel_dp_start_link_train [i915]] *ERROR* failed to start channel equalization
> 	[drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x7cf0001f
> 	[drm:intel_mst_pre_enable_dp [i915]] *ERROR* failed to allocate vcpi
> 
> And in some cases, such as with the T460s using an MST dock, this
> results in breaking modesetting and/or panicking the system.

Are these just kernel oopses etc.? If the hardware gets upset from
modesetting when the sink is gone, well, then we still have a problem
because the user can of course yank the cable while the modeset is already
underway.

> 
> To work around this, we now unregister the connector at the very
> beginning of intel_dp_destroy_mst_connector(), grab all the modesetting
> locks, and then hold them until we finish the rest of the function.
> 
> CC: stable@vger.kernel.org
> Signed-off-by: Lyude <cpaul@redhat.com>
> Signed-off-by: Rob Clark <rclark@redhat.com>

These sobs don't make much sense to me.

Patch itself does make sense to me, so 
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/intel_dp_mst.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_dp_mst.c b/drivers/gpu/drm/i915/intel_dp_mst.c
> index fa0dabf..b21ac88 100644
> --- a/drivers/gpu/drm/i915/intel_dp_mst.c
> +++ b/drivers/gpu/drm/i915/intel_dp_mst.c
> @@ -499,6 +499,8 @@ static void intel_dp_destroy_mst_connector(struct drm_dp_mst_topology_mgr *mgr,
>  	struct intel_connector *intel_connector = to_intel_connector(connector);
>  	struct drm_device *dev = connector->dev;
>  
> +	intel_connector->unregister(intel_connector);
> +
>  	/* need to nuke the connector */
>  	drm_modeset_lock_all(dev);
>  	if (connector->state->crtc) {
> @@ -512,11 +514,7 @@ static void intel_dp_destroy_mst_connector(struct drm_dp_mst_topology_mgr *mgr,
>  
>  		WARN(ret, "Disabling mst crtc failed with %i\n", ret);
>  	}
> -	drm_modeset_unlock_all(dev);
>  
> -	intel_connector->unregister(intel_connector);
> -
> -	drm_modeset_lock_all(dev);
>  	intel_connector_remove_from_fbdev(intel_connector);
>  	drm_connector_cleanup(connector);
>  	drm_modeset_unlock_all(dev);
> -- 
> 2.5.0
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
cpaul@redhat.com March 16, 2016, 7:44 p.m. UTC | #2
On Wed, 2016-03-16 at 21:39 +0200, Ville Syrjälä wrote:
> On Wed, Mar 16, 2016 at 03:18:04PM -0400, Lyude wrote:
> > 
> > After unplugging a DP MST display from the system, we have to go through
> > and destroy all of the DRM connectors associated with it since none of
> > them are valid anymore. Unfortunately, intel_dp_destroy_mst_connector()
> > doesn't do a good enough job of ensuring that throughout the destruction
> > process that no modesettings can be done with the connectors. As it is
> > right now, intel_dp_destroy_mst_connector() works like this:
> > 
> > * Take all modeset locks
> > * Clear the configuration of the crtc on the connector, if there is one
> > * Drop all modeset locks, this is required because of circular
> >   dependency issues that arise with trying to remove the connector from
> >   sysfs with modeset locks held
> > * Unregister the connector
> > * Take all modeset locks, again
> > * Do the rest of the required cleaning for destroying the connector
> > * Finally drop all modeset locks for good
> So pretty much what I suspected
> https://lists.freedesktop.org/archives/intel-gfx/2016-February/087734.html
> 
> > 
> > 
> > This only works sometimes. During the destruction process, it's very
> > possible that a userspace application will attempt to do a modesetting
> > using the connector. When we drop the modeset locks, an ioctl handler
> > such as drm_mode_setcrtc has the oppurtunity to take all of the modeset
> > locks from us. When this happens, one thing leads to another and
> > eventually we end up committing a mode with the non-existent connector:
> > 
> > 	[drm:intel_dp_link_training_clock_recovery [i915]] *ERROR* failed to
> > enable link training
> > 	[drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x7cf0001f
> > 	[drm:intel_dp_start_link_train [i915]] *ERROR* failed to start channel
> > equalization
> > 	[drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x7cf0001f
> > 	[drm:intel_mst_pre_enable_dp [i915]] *ERROR* failed to allocate vcpi
> > 
> > And in some cases, such as with the T460s using an MST dock, this
> > results in breaking modesetting and/or panicking the system.
> Are these just kernel oopses etc.? If the hardware gets upset from
> modesetting when the sink is gone, well, then we still have a problem
> because the user can of course yank the cable while the modeset is already
> underway.
It is more then that. Unfortunately though, fixing that part is not as easy. We
never expect an atomic modesetting commit to fail, but unfortunately any code
having to do with turning on DP MST has the chance of failing and we turn on DP
MST during commits. So fixing that would take moving quite a bit of code around.

> 
> > 
> > 
> > To work around this, we now unregister the connector at the very
> > beginning of intel_dp_destroy_mst_connector(), grab all the modesetting
> > locks, and then hold them until we finish the rest of the function.
> > 
> > CC: stable@vger.kernel.org
> > Signed-off-by: Lyude <cpaul@redhat.com>
> > Signed-off-by: Rob Clark <rclark@redhat.com>
> These sobs don't make much sense to me.
I should have mentioned that Rob Clark was the one who came up with the idea of
just moving the connector->unregister() call to the top of the function.

> 
> Patch itself does make sense to me, so 
> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> > 
> > ---
> >  drivers/gpu/drm/i915/intel_dp_mst.c | 6 ++----
> >  1 file changed, 2 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_dp_mst.c
> > b/drivers/gpu/drm/i915/intel_dp_mst.c
> > index fa0dabf..b21ac88 100644
> > --- a/drivers/gpu/drm/i915/intel_dp_mst.c
> > +++ b/drivers/gpu/drm/i915/intel_dp_mst.c
> > @@ -499,6 +499,8 @@ static void intel_dp_destroy_mst_connector(struct
> > drm_dp_mst_topology_mgr *mgr,
> >  	struct intel_connector *intel_connector =
> > to_intel_connector(connector);
> >  	struct drm_device *dev = connector->dev;
> >  
> > +	intel_connector->unregister(intel_connector);
> > +
> >  	/* need to nuke the connector */
> >  	drm_modeset_lock_all(dev);
> >  	if (connector->state->crtc) {
> > @@ -512,11 +514,7 @@ static void intel_dp_destroy_mst_connector(struct
> > drm_dp_mst_topology_mgr *mgr,
> >  
> >  		WARN(ret, "Disabling mst crtc failed with %i\n", ret);
> >  	}
> > -	drm_modeset_unlock_all(dev);
> >  
> > -	intel_connector->unregister(intel_connector);
> > -
> > -	drm_modeset_lock_all(dev);
> >  	intel_connector_remove_from_fbdev(intel_connector);
> >  	drm_connector_cleanup(connector);
> >  	drm_modeset_unlock_all(dev);
> > -- 
> > 2.5.0
> > 
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
Ville Syrjala March 16, 2016, 7:59 p.m. UTC | #3
On Wed, Mar 16, 2016 at 03:44:37PM -0400, Lyude Paul wrote:
> On Wed, 2016-03-16 at 21:39 +0200, Ville Syrjälä wrote:
> > On Wed, Mar 16, 2016 at 03:18:04PM -0400, Lyude wrote:
> > > 
> > > After unplugging a DP MST display from the system, we have to go through
> > > and destroy all of the DRM connectors associated with it since none of
> > > them are valid anymore. Unfortunately, intel_dp_destroy_mst_connector()
> > > doesn't do a good enough job of ensuring that throughout the destruction
> > > process that no modesettings can be done with the connectors. As it is
> > > right now, intel_dp_destroy_mst_connector() works like this:
> > > 
> > > * Take all modeset locks
> > > * Clear the configuration of the crtc on the connector, if there is one
> > > * Drop all modeset locks, this is required because of circular
> > >   dependency issues that arise with trying to remove the connector from
> > >   sysfs with modeset locks held
> > > * Unregister the connector
> > > * Take all modeset locks, again
> > > * Do the rest of the required cleaning for destroying the connector
> > > * Finally drop all modeset locks for good
> > So pretty much what I suspected
> > https://lists.freedesktop.org/archives/intel-gfx/2016-February/087734.html
> > 
> > > 
> > > 
> > > This only works sometimes. During the destruction process, it's very
> > > possible that a userspace application will attempt to do a modesetting
> > > using the connector. When we drop the modeset locks, an ioctl handler
> > > such as drm_mode_setcrtc has the oppurtunity to take all of the modeset
> > > locks from us. When this happens, one thing leads to another and
> > > eventually we end up committing a mode with the non-existent connector:
> > > 
> > > 	[drm:intel_dp_link_training_clock_recovery [i915]] *ERROR* failed to
> > > enable link training
> > > 	[drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x7cf0001f
> > > 	[drm:intel_dp_start_link_train [i915]] *ERROR* failed to start channel
> > > equalization
> > > 	[drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x7cf0001f
> > > 	[drm:intel_mst_pre_enable_dp [i915]] *ERROR* failed to allocate vcpi
> > > 
> > > And in some cases, such as with the T460s using an MST dock, this
> > > results in breaking modesetting and/or panicking the system.
> > Are these just kernel oopses etc.? If the hardware gets upset from
> > modesetting when the sink is gone, well, then we still have a problem
> > because the user can of course yank the cable while the modeset is already
> > underway.
> It is more then that. Unfortunately though, fixing that part is not as easy. We
> never expect an atomic modesetting commit to fail, but unfortunately any code
> having to do with turning on DP MST has the chance of failing and we turn on DP
> MST during commits. So fixing that would take moving quite a bit of code around.

SST has the same problems really. The sink may be gone so link training
etc. just won't succeed. But we should still finish the modeset without
killing the system or something.

> 
> > 
> > > 
> > > 
> > > To work around this, we now unregister the connector at the very
> > > beginning of intel_dp_destroy_mst_connector(), grab all the modesetting
> > > locks, and then hold them until we finish the rest of the function.
> > > 
> > > CC: stable@vger.kernel.org
> > > Signed-off-by: Lyude <cpaul@redhat.com>
> > > Signed-off-by: Rob Clark <rclark@redhat.com>
> > These sobs don't make much sense to me.
> I should have mentioned that Rob Clark was the one who came up with the idea of
> just moving the connector->unregister() call to the top of the function.
> 
> > 
> > Patch itself does make sense to me, so 
> > Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > > 
> > > ---
> > >  drivers/gpu/drm/i915/intel_dp_mst.c | 6 ++----
> > >  1 file changed, 2 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/intel_dp_mst.c
> > > b/drivers/gpu/drm/i915/intel_dp_mst.c
> > > index fa0dabf..b21ac88 100644
> > > --- a/drivers/gpu/drm/i915/intel_dp_mst.c
> > > +++ b/drivers/gpu/drm/i915/intel_dp_mst.c
> > > @@ -499,6 +499,8 @@ static void intel_dp_destroy_mst_connector(struct
> > > drm_dp_mst_topology_mgr *mgr,
> > >  	struct intel_connector *intel_connector =
> > > to_intel_connector(connector);
> > >  	struct drm_device *dev = connector->dev;
> > >  
> > > +	intel_connector->unregister(intel_connector);
> > > +
> > >  	/* need to nuke the connector */
> > >  	drm_modeset_lock_all(dev);
> > >  	if (connector->state->crtc) {
> > > @@ -512,11 +514,7 @@ static void intel_dp_destroy_mst_connector(struct
> > > drm_dp_mst_topology_mgr *mgr,
> > >  
> > >  		WARN(ret, "Disabling mst crtc failed with %i\n", ret);
> > >  	}
> > > -	drm_modeset_unlock_all(dev);
> > >  
> > > -	intel_connector->unregister(intel_connector);
> > > -
> > > -	drm_modeset_lock_all(dev);
> > >  	intel_connector_remove_from_fbdev(intel_connector);
> > >  	drm_connector_cleanup(connector);
> > >  	drm_modeset_unlock_all(dev);
> > > -- 
> > > 2.5.0
> > > 
> > > _______________________________________________
> > > dri-devel mailing list
> > > dri-devel@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
Daniel Vetter March 17, 2016, 8:12 a.m. UTC | #4
On Wed, Mar 16, 2016 at 09:39:49PM +0200, Ville Syrjälä wrote:
> On Wed, Mar 16, 2016 at 03:18:04PM -0400, Lyude wrote:
> > After unplugging a DP MST display from the system, we have to go through
> > and destroy all of the DRM connectors associated with it since none of
> > them are valid anymore. Unfortunately, intel_dp_destroy_mst_connector()
> > doesn't do a good enough job of ensuring that throughout the destruction
> > process that no modesettings can be done with the connectors. As it is
> > right now, intel_dp_destroy_mst_connector() works like this:
> > 
> > * Take all modeset locks
> > * Clear the configuration of the crtc on the connector, if there is one
> > * Drop all modeset locks, this is required because of circular
> >   dependency issues that arise with trying to remove the connector from
> >   sysfs with modeset locks held
> > * Unregister the connector
> > * Take all modeset locks, again
> > * Do the rest of the required cleaning for destroying the connector
> > * Finally drop all modeset locks for good
> 
> So pretty much what I suspected
> https://lists.freedesktop.org/archives/intel-gfx/2016-February/087734.html
> 
> > 
> > This only works sometimes. During the destruction process, it's very
> > possible that a userspace application will attempt to do a modesetting
> > using the connector. When we drop the modeset locks, an ioctl handler
> > such as drm_mode_setcrtc has the oppurtunity to take all of the modeset
> > locks from us. When this happens, one thing leads to another and
> > eventually we end up committing a mode with the non-existent connector:
> > 
> > 	[drm:intel_dp_link_training_clock_recovery [i915]] *ERROR* failed to enable link training
> > 	[drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x7cf0001f
> > 	[drm:intel_dp_start_link_train [i915]] *ERROR* failed to start channel equalization
> > 	[drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x7cf0001f
> > 	[drm:intel_mst_pre_enable_dp [i915]] *ERROR* failed to allocate vcpi
> > 
> > And in some cases, such as with the T460s using an MST dock, this
> > results in breaking modesetting and/or panicking the system.
> 
> Are these just kernel oopses etc.? If the hardware gets upset from
> modesetting when the sink is gone, well, then we still have a problem
> because the user can of course yank the cable while the modeset is already
> underway.
> 
> > 
> > To work around this, we now unregister the connector at the very
> > beginning of intel_dp_destroy_mst_connector(), grab all the modesetting
> > locks, and then hold them until we finish the rest of the function.
> > 
> > CC: stable@vger.kernel.org
> > Signed-off-by: Lyude <cpaul@redhat.com>
> > Signed-off-by: Rob Clark <rclark@redhat.com>
> 
> These sobs don't make much sense to me.
> 
> Patch itself does make sense to me, so 
> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

Applied, thanks.
-Daniel

> 
> > ---
> >  drivers/gpu/drm/i915/intel_dp_mst.c | 6 ++----
> >  1 file changed, 2 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_dp_mst.c b/drivers/gpu/drm/i915/intel_dp_mst.c
> > index fa0dabf..b21ac88 100644
> > --- a/drivers/gpu/drm/i915/intel_dp_mst.c
> > +++ b/drivers/gpu/drm/i915/intel_dp_mst.c
> > @@ -499,6 +499,8 @@ static void intel_dp_destroy_mst_connector(struct drm_dp_mst_topology_mgr *mgr,
> >  	struct intel_connector *intel_connector = to_intel_connector(connector);
> >  	struct drm_device *dev = connector->dev;
> >  
> > +	intel_connector->unregister(intel_connector);
> > +
> >  	/* need to nuke the connector */
> >  	drm_modeset_lock_all(dev);
> >  	if (connector->state->crtc) {
> > @@ -512,11 +514,7 @@ static void intel_dp_destroy_mst_connector(struct drm_dp_mst_topology_mgr *mgr,
> >  
> >  		WARN(ret, "Disabling mst crtc failed with %i\n", ret);
> >  	}
> > -	drm_modeset_unlock_all(dev);
> >  
> > -	intel_connector->unregister(intel_connector);
> > -
> > -	drm_modeset_lock_all(dev);
> >  	intel_connector_remove_from_fbdev(intel_connector);
> >  	drm_connector_cleanup(connector);
> >  	drm_modeset_unlock_all(dev);
> > -- 
> > 2.5.0
> > 
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 
> -- 
> Ville Syrjälä
> Intel OTC
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_dp_mst.c b/drivers/gpu/drm/i915/intel_dp_mst.c
index fa0dabf..b21ac88 100644
--- a/drivers/gpu/drm/i915/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/intel_dp_mst.c
@@ -499,6 +499,8 @@  static void intel_dp_destroy_mst_connector(struct drm_dp_mst_topology_mgr *mgr,
 	struct intel_connector *intel_connector = to_intel_connector(connector);
 	struct drm_device *dev = connector->dev;
 
+	intel_connector->unregister(intel_connector);
+
 	/* need to nuke the connector */
 	drm_modeset_lock_all(dev);
 	if (connector->state->crtc) {
@@ -512,11 +514,7 @@  static void intel_dp_destroy_mst_connector(struct drm_dp_mst_topology_mgr *mgr,
 
 		WARN(ret, "Disabling mst crtc failed with %i\n", ret);
 	}
-	drm_modeset_unlock_all(dev);
 
-	intel_connector->unregister(intel_connector);
-
-	drm_modeset_lock_all(dev);
 	intel_connector_remove_from_fbdev(intel_connector);
 	drm_connector_cleanup(connector);
 	drm_modeset_unlock_all(dev);