Message ID | 20230824205335.500163-6-gildekel@chromium.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/i915/dp_link_training: Define a final failure state when link training fails | expand |
On Thu, Aug 24, 2023 at 04:50:20PM -0400, Gil Dekel wrote: > Before sending a uevent to userspace in order to trigger a corrective > modeset, we change the failing connector's link-status to BAD. However, > the downstream MST branch ports are left in their original GOOD state. > > This patch utilizes the drm helper function > drm_dp_set_mst_topology_link_status() to rectify this and set all > downstream MST connectors' link-status to BAD before emitting the uevent > to userspace. > > Signed-off-by: Gil Dekel <gildekel@chromium.org> > --- > drivers/gpu/drm/i915/display/intel_dp.c | 16 ++++++++++------ > 1 file changed, 10 insertions(+), 6 deletions(-) > > diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c > index 42353b1ac487..e8b10f59e141 100644 > --- a/drivers/gpu/drm/i915/display/intel_dp.c > +++ b/drivers/gpu/drm/i915/display/intel_dp.c > @@ -5995,16 +5995,20 @@ static void intel_dp_modeset_retry_work_fn(struct work_struct *work) > struct intel_dp *intel_dp = > container_of(work, typeof(*intel_dp), modeset_retry_work); > struct drm_connector *connector = &intel_dp->attached_connector->base; > - drm_dbg_kms(connector->dev, "[CONNECTOR:%d:%s]\n", connector->base.id, > - connector->name); > > - /* Grab the locks before changing connector property*/ > - mutex_lock(&connector->dev->mode_config.mutex); > - /* Set connector link status to BAD and send a Uevent to notify > - * userspace to do a modeset. > + /* Set the connector's (and possibly all its downstream MST ports') link > + * status to BAD. > */ > + mutex_lock(&connector->dev->mode_config.mutex); > + drm_dbg_kms(connector->dev, "[CONNECTOR:%d:%s] link status %d -> %d\n", > + connector->base.id, connector->name, > + connector->state->link_status, DRM_MODE_LINK_STATUS_BAD); > drm_connector_set_link_status_property(connector, > DRM_MODE_LINK_STATUS_BAD); > + if (intel_dp->is_mst) { > + drm_dp_set_mst_topology_link_status(&intel_dp->mst_mgr, > + DRM_MODE_LINK_STATUS_BAD); Something is weird with the locking here. I noticed that on patch 3 this new function also gets the same mutex_lock(&connector->dev->mode_config.mutex); Since you didn't reach the deadlock, I'm clearly missing something on the flow. But regardless of what I could be missing, I believe this is totally not future proof and we will for sure hit dead-lock cases. > + } > mutex_unlock(&connector->dev->mode_config.mutex); > /* Send Hotplug uevent so userspace can reprobe */ > drm_kms_helper_connector_hotplug_event(connector); > -- > Gil Dekel, Software Engineer, Google / ChromeOS Display and Graphics
On Fri, Sep 1, 2023 at 2:55 PM Rodrigo Vivi <rodrigo.vivi@intel.com> wrote: > > On Thu, Aug 24, 2023 at 04:50:20PM -0400, Gil Dekel wrote: > > Before sending a uevent to userspace in order to trigger a corrective > > modeset, we change the failing connector's link-status to BAD. However, > > the downstream MST branch ports are left in their original GOOD state. > > > > This patch utilizes the drm helper function > > drm_dp_set_mst_topology_link_status() to rectify this and set all > > downstream MST connectors' link-status to BAD before emitting the uevent > > to userspace. > > > > Signed-off-by: Gil Dekel <gildekel@chromium.org> > > --- > > drivers/gpu/drm/i915/display/intel_dp.c | 16 ++++++++++------ > > 1 file changed, 10 insertions(+), 6 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c > > index 42353b1ac487..e8b10f59e141 100644 > > --- a/drivers/gpu/drm/i915/display/intel_dp.c > > +++ b/drivers/gpu/drm/i915/display/intel_dp.c > > @@ -5995,16 +5995,20 @@ static void intel_dp_modeset_retry_work_fn(struct work_struct *work) > > struct intel_dp *intel_dp = > > container_of(work, typeof(*intel_dp), modeset_retry_work); > > struct drm_connector *connector = &intel_dp->attached_connector->base; > > - drm_dbg_kms(connector->dev, "[CONNECTOR:%d:%s]\n", connector->base.id, > > - connector->name); > > > > - /* Grab the locks before changing connector property*/ > > - mutex_lock(&connector->dev->mode_config.mutex); > > - /* Set connector link status to BAD and send a Uevent to notify > > - * userspace to do a modeset. > > + /* Set the connector's (and possibly all its downstream MST ports') link > > + * status to BAD. > > */ > > + mutex_lock(&connector->dev->mode_config.mutex); > > + drm_dbg_kms(connector->dev, "[CONNECTOR:%d:%s] link status %d -> %d\n", > > + connector->base.id, connector->name, > > + connector->state->link_status, DRM_MODE_LINK_STATUS_BAD); > > drm_connector_set_link_status_property(connector, > > DRM_MODE_LINK_STATUS_BAD); > > + if (intel_dp->is_mst) { > > + drm_dp_set_mst_topology_link_status(&intel_dp->mst_mgr, > > + DRM_MODE_LINK_STATUS_BAD); > > Something is weird with the locking here. > I noticed that on patch 3 this new function also gets the same > mutex_lock(&connector->dev->mode_config.mutex); > > Since you didn't reach the deadlock, I'm clearly missing something > on the flow. But regardless of what I could be missing, I believe > this is totally not future proof and we will for sure hit dead-lock > cases. > You are not wrong. Something must have been wrong in my workflow, as I was positive I tested the code with this lock, but I must remember wrong. I tried testing my current code and it immediately locked, as you expected. So thank you for catching this. Lyude's original patch didn't include drm_dp_set_mst_topology_link_status() as an exposed drm helper function, so when I adjusted it for this series, I decided to add locks similar to how her other function using drm_dp_set_mst_topology_link_status() did. However, I failed to use the right lock, which is: drm_modeset_lock(&connector->dev->mode_config.connection_mutex, NULL); drm_modeset_unlock(&connector->dev->mode_config.connection_mutex); This is similar to how drm_connector_set_link_status_property() locks before writing to link_status. I made sure to test my code with the above locks, and it runs well. Here's an instrumented log excerpt for failing link-training with an MST hub (I hacked the driver to always fail non eDP connectors and print the raw pointer addresses of the drm_device and mutex right before locking): [ 43.466329] i915 0000:00:02.0: [drm] *ERROR* Link Training Unsuccessful via gildekel HACK - (not eDP) [ 43.594950] i915 0000:00:02.0: [drm] *ERROR* Link Training Unsuccessful via gildekel HACK - (not eDP) [ 43.594979] i915 0000:00:02.0: [drm] *ERROR* Link Training Unsuccessful [ 43.595023] i915 0000:00:02.0: [drm] *ERROR* [CONNECTOR:273:DP-3]: [ 43.595028] i915 0000:00:02.0: [drm] *ERROR* connector->dev=00000000d4850450 [ 43.595033] i915 0000:00:02.0: [drm] *ERROR* connector->dev->mode_config.mutex=00000000aac3fe45 [ 44.771091] i915 0000:00:02.0: [drm] *ERROR* [MST-CONNECTOR:300:DP-5]: [ 44.771108] i915 0000:00:02.0: [drm] *ERROR* connector->dev=000000003fb97435 [ 44.771115] i915 0000:00:02.0: [drm] *ERROR* &connector->dev->mode_config.connection_mutex=000000009aece20e [ 44.771127] i915 0000:00:02.0: [drm] *ERROR* [MST-CONNECTOR:303:DP-6]: [ 44.771132] i915 0000:00:02.0: [drm] *ERROR* connector->dev=0000000075236b75 [ 44.771137] i915 0000:00:02.0: [drm] *ERROR* &connector->dev->mode_config.connection_mutex=000000009aece20e Also, I was under the assumption that all connectors in an MST topology should reference the same drm_device object, but it seems like that's not the case. Is my assumption wrong? > > + } > > mutex_unlock(&connector->dev->mode_config.mutex); > > /* Send Hotplug uevent so userspace can reprobe */ > > drm_kms_helper_connector_hotplug_event(connector); > > -- > > Gil Dekel, Software Engineer, Google / ChromeOS Display and Graphics Thanks for your time and comments! -- Best, Gil Dekel, Software Engineer, Google / ChromeOS Display and Graphics
On Fri, Sep 1, 2023 at 5:13 PM Gil Dekel <gildekel@chromium.org> wrote: > > On Fri, Sep 1, 2023 at 2:55 PM Rodrigo Vivi <rodrigo.vivi@intel.com> wrote: > > > > On Thu, Aug 24, 2023 at 04:50:20PM -0400, Gil Dekel wrote: > > > Before sending a uevent to userspace in order to trigger a corrective > > > modeset, we change the failing connector's link-status to BAD. However, > > > the downstream MST branch ports are left in their original GOOD state. > > > > > > This patch utilizes the drm helper function > > > drm_dp_set_mst_topology_link_status() to rectify this and set all > > > downstream MST connectors' link-status to BAD before emitting the uevent > > > to userspace. > > > > > > Signed-off-by: Gil Dekel <gildekel@chromium.org> > > > --- > > > drivers/gpu/drm/i915/display/intel_dp.c | 16 ++++++++++------ > > > 1 file changed, 10 insertions(+), 6 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c > > > index 42353b1ac487..e8b10f59e141 100644 > > > --- a/drivers/gpu/drm/i915/display/intel_dp.c > > > +++ b/drivers/gpu/drm/i915/display/intel_dp.c > > > @@ -5995,16 +5995,20 @@ static void intel_dp_modeset_retry_work_fn(struct work_struct *work) > > > struct intel_dp *intel_dp = > > > container_of(work, typeof(*intel_dp), modeset_retry_work); > > > struct drm_connector *connector = &intel_dp->attached_connector->base; > > > - drm_dbg_kms(connector->dev, "[CONNECTOR:%d:%s]\n", connector->base.id, > > > - connector->name); > > > > > > - /* Grab the locks before changing connector property*/ > > > - mutex_lock(&connector->dev->mode_config.mutex); > > > - /* Set connector link status to BAD and send a Uevent to notify > > > - * userspace to do a modeset. > > > + /* Set the connector's (and possibly all its downstream MST ports') link > > > + * status to BAD. > > > */ > > > + mutex_lock(&connector->dev->mode_config.mutex); > > > + drm_dbg_kms(connector->dev, "[CONNECTOR:%d:%s] link status %d -> %d\n", > > > + connector->base.id, connector->name, > > > + connector->state->link_status, DRM_MODE_LINK_STATUS_BAD); > > > drm_connector_set_link_status_property(connector, > > > DRM_MODE_LINK_STATUS_BAD); > > > + if (intel_dp->is_mst) { > > > + drm_dp_set_mst_topology_link_status(&intel_dp->mst_mgr, > > > + DRM_MODE_LINK_STATUS_BAD); > > > > Something is weird with the locking here. > > I noticed that on patch 3 this new function also gets the same > > mutex_lock(&connector->dev->mode_config.mutex); > > > > Since you didn't reach the deadlock, I'm clearly missing something > > on the flow. But regardless of what I could be missing, I believe > > this is totally not future proof and we will for sure hit dead-lock > > cases. > > > You are not wrong. > > Something must have been wrong in my workflow, as I was positive I > tested the code with this lock, but I must remember wrong. I tried > testing my current code and it immediately locked, as you expected. > So thank you for catching this. > > Lyude's original patch didn't include drm_dp_set_mst_topology_link_status() > as an exposed drm helper function, so when I adjusted it for this series, I > decided to add locks similar to how her other function using > drm_dp_set_mst_topology_link_status() did. However, I failed to use the > right lock, which is: > drm_modeset_lock(&connector->dev->mode_config.connection_mutex, NULL); > drm_modeset_unlock(&connector->dev->mode_config.connection_mutex); > This is similar to how drm_connector_set_link_status_property() locks > before writing to link_status. > > I made sure to test my code with the above locks, and it runs well. Here's > an instrumented log excerpt for failing link-training with an MST hub > (I hacked the driver to always fail non eDP connectors and print the > raw pointer addresses of the drm_device and mutex right before locking): > [ 43.466329] i915 0000:00:02.0: [drm] *ERROR* Link Training > Unsuccessful via gildekel HACK - (not eDP) > [ 43.594950] i915 0000:00:02.0: [drm] *ERROR* Link Training > Unsuccessful via gildekel HACK - (not eDP) > [ 43.594979] i915 0000:00:02.0: [drm] *ERROR* Link Training Unsuccessful > [ 43.595023] i915 0000:00:02.0: [drm] *ERROR* [CONNECTOR:273:DP-3]: > [ 43.595028] i915 0000:00:02.0: [drm] *ERROR* > connector->dev=00000000d4850450 > [ 43.595033] i915 0000:00:02.0: [drm] *ERROR* > connector->dev->mode_config.mutex=00000000aac3fe45 > [ 44.771091] i915 0000:00:02.0: [drm] *ERROR* > [MST-CONNECTOR:300:DP-5]: > [ 44.771108] i915 0000:00:02.0: [drm] *ERROR* > connector->dev=000000003fb97435 > [ 44.771115] i915 0000:00:02.0: [drm] *ERROR* > &connector->dev->mode_config.connection_mutex=000000009aece20e > [ 44.771127] i915 0000:00:02.0: [drm] *ERROR* > [MST-CONNECTOR:303:DP-6]: > [ 44.771132] i915 0000:00:02.0: [drm] *ERROR* > connector->dev=0000000075236b75 > [ 44.771137] i915 0000:00:02.0: [drm] *ERROR* > &connector->dev->mode_config.connection_mutex=000000009aece20e > > Also, I was under the assumption that all connectors in an MST topology > should reference the same drm_device object, but it seems like that's > not the case. Is my assumption wrong? > Sorry, please disregard. I was printing the pointers' address instead of the value's address. Same drm_device is shared: Link Training Unsuccessful via gildekel HACK - (not eDP) Link Training Unsuccessful via gildekel HACK - (not eDP) Link Training Unsuccessful [CONNECTOR:273:DP-3]: connector->dev=00000000b88b882c connector->dev->mode_config.mutex=00000000d64b22db [MST-CONNECTOR:297:DP-5]: connector->dev=00000000b88b882c &connector->dev->mode_config.connection_mutex=000000002d876227 [MST-CONNECTOR:301:DP-6]: connector->dev=00000000b88b882c &connector->dev->mode_config.connection_mutex=000000002d876227 Sorry for the noise. > > > + } > > > mutex_unlock(&connector->dev->mode_config.mutex); > > > /* Send Hotplug uevent so userspace can reprobe */ > > > drm_kms_helper_connector_hotplug_event(connector); > > > -- > > > Gil Dekel, Software Engineer, Google / ChromeOS Display and Graphics > > > Thanks for your time and comments! > -- > Best, > Gil Dekel, Software Engineer, Google / ChromeOS Display and Graphics
diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c index 42353b1ac487..e8b10f59e141 100644 --- a/drivers/gpu/drm/i915/display/intel_dp.c +++ b/drivers/gpu/drm/i915/display/intel_dp.c @@ -5995,16 +5995,20 @@ static void intel_dp_modeset_retry_work_fn(struct work_struct *work) struct intel_dp *intel_dp = container_of(work, typeof(*intel_dp), modeset_retry_work); struct drm_connector *connector = &intel_dp->attached_connector->base; - drm_dbg_kms(connector->dev, "[CONNECTOR:%d:%s]\n", connector->base.id, - connector->name); - /* Grab the locks before changing connector property*/ - mutex_lock(&connector->dev->mode_config.mutex); - /* Set connector link status to BAD and send a Uevent to notify - * userspace to do a modeset. + /* Set the connector's (and possibly all its downstream MST ports') link + * status to BAD. */ + mutex_lock(&connector->dev->mode_config.mutex); + drm_dbg_kms(connector->dev, "[CONNECTOR:%d:%s] link status %d -> %d\n", + connector->base.id, connector->name, + connector->state->link_status, DRM_MODE_LINK_STATUS_BAD); drm_connector_set_link_status_property(connector, DRM_MODE_LINK_STATUS_BAD); + if (intel_dp->is_mst) { + drm_dp_set_mst_topology_link_status(&intel_dp->mst_mgr, + DRM_MODE_LINK_STATUS_BAD); + } mutex_unlock(&connector->dev->mode_config.mutex); /* Send Hotplug uevent so userspace can reprobe */ drm_kms_helper_connector_hotplug_event(connector);
Before sending a uevent to userspace in order to trigger a corrective modeset, we change the failing connector's link-status to BAD. However, the downstream MST branch ports are left in their original GOOD state. This patch utilizes the drm helper function drm_dp_set_mst_topology_link_status() to rectify this and set all downstream MST connectors' link-status to BAD before emitting the uevent to userspace. Signed-off-by: Gil Dekel <gildekel@chromium.org> --- drivers/gpu/drm/i915/display/intel_dp.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) -- Gil Dekel, Software Engineer, Google / ChromeOS Display and Graphics