Message ID | 20230824205335.500163-1-gildekel@chromium.org (mailing list archive) |
---|---|
Headers | show |
Series | drm/i915/dp_link_training: Define a final failure state when link training fails | expand |
Other then the name typo (s/Pual/Paul): Signed-off-by: Lyude Paul <lyude@redhat.com> (just since I co-authored things~) Reviewed-by: Lyude Paul <lyude@redhat.com> I think we definitely want to make sure we get intel's opinions on this though, especially regarding the usage of link-status. I think we're close enough to link-status's intended purpose, but I definitely would like to know what others think about that since userspace will definitely have to handle situations like this a bit differently than with SST. Also - definitely make sure you take a look at Imre's patch series that's currently on the list (I just finished reviewing it), since it adds some things to the helpers that might end up being useful here :) https://patchwork.freedesktop.org/series/122589/ On Thu, 2023-08-24 at 16:50 -0400, Gil Dekel wrote: > Next version of https://patchwork.freedesktop.org/series/122850/ > > v4: > Another blunder. I uploaded the patches from my ChromeiumOS kernel dev repo > instead of drm-tip/drm-tip. Apologies for the noise :( > > v3: > Still learning the ropes of upstream workflow. Apologies for mucking up v2. > This is just a re-upload. > > v2: > Reorganize into: > 1) Add for final failure state for SST and MST link training fallback. > 2) Add a DRM helper for setting downstream MST ports' link-status state. > 3) Make handling SST and MST connectors simpler via intel_dp. > 4) Update link-status for downstream MST ports. > 5) Emit a uevent with the "link-status" trigger property. > > v1: > Currently, when link training fails after all fallback values have been > exhausted, the i915 driver seizes to send uevents to userspace. This leave > userspace thinking that the last passing atomic commit was successful, and that > all connectors (displays) are connected and operational, when in fact, the last > link failed to train and the displays remain dark. This manifests as "zombie" > displays in userspace, in which users observe the displays appear in their > display settings page, but they are dark and unresponsive. > > Since, at the time of writing, MST link training fallback is not implemented, > failing MST link training is a significantly more common case then a complete > SST link training failure. And with users using MST hubs more than ever to > connect multiple displays via their USB-C ports we observe this case often. > > This patchset series suggest a solution, in which a final failure state is > defined. In this final state, the connector's bit rate capabilities, namely > max_link_rate and max_link_lane_count, are set to 0. This effectively set the > connector's bandwidth to 0Gbps, thus causing all its modes to be pruned in the > following connector probing. > > Next, with this state defined, we emit a link-status=Bad uevent. The next time > userspace probes the connector, it should recognize that the connector has no > modes and ignore it since it is in a bad state. > > I am aware that always sending a uevent and never stopping may result in some > userspaces having their expectations broken and enter an infinite loop of > modesets and link-training attempts. However, per DRM link-status spec: > ``` > * link-status: > * Connector link-status property to indicate the status of link. The > * default value of link-status is "GOOD". If something fails during or > * after modeset, the kernel driver may set this to "BAD" and issue a > * hotplug uevent. Drivers should update this value using > * drm_connector_set_link_status_property(). > * > * When user-space receives the hotplug uevent and detects a "BAD" > * link-status, the sink doesn't receive pixels anymore (e.g. the screen > * becomes completely black). The list of available modes may have > * changed. User-space is expected to pick a new mode if the current one > * has disappeared and perform a new modeset with link-status set to > * "GOOD" to re-enable the connector. > ``` > (form drivers/gpu/drm/drm_connector.c - DOC: standard connector properties) > > it seems reasonable to assume that the suggested state is an extension of the > spec's guidelines, in which the next new mode userspace picks for a connector > with no modes is - none, thus breaking the cycle of failed link-training > attempts. > > I suspect that, maybe, zeroing out the bit rate capabilities is not the right > way to go, and perhaps marking the connector as disconnected instead may be a > better solution. However, if marking a connector disconnected is the way to go, > We will have to iterate over all MST ports in the MST case and mark the spawned > connectors as disconnected as well. I -think- this is probably fine, that's likely how I'd > > As a final note I should add that this approach was tested with ChromeOS as > userspace, and we observed that the zombie displays stop showing up once the > connectors are pruned of all their modes and are ignored by userspace. > > For your consideration and guidance. > Thanks, > > Gil Dekel (6): > drm/i915/dp_link_training: Add a final failing state to link training > fallback > drm/i915/dp_link_training: Add a final failing state to link training > fallback for MST > drm/dp_mst: Add drm_dp_set_mst_topology_link_status() > drm/i915: Move DP modeset_retry_work into intel_dp > drm/i915/dp_link_training: Set all downstream MST ports to BAD before > retrying > drm/i915/dp_link_training: Emit a link-status=Bad uevent with trigger > property > > drivers/gpu/drm/display/drm_dp_mst_topology.c | 39 ++++++++++ > drivers/gpu/drm/i915/display/intel_display.c | 14 +++- > .../drm/i915/display/intel_display_types.h | 6 +- > drivers/gpu/drm/i915/display/intel_dp.c | 75 ++++++++++++------- > drivers/gpu/drm/i915/display/intel_dp.h | 2 +- > .../drm/i915/display/intel_dp_link_training.c | 11 ++- > include/drm/display/drm_dp_mst_helper.h | 3 + > 7 files changed, 110 insertions(+), 40 deletions(-) > > -- > Gil Dekel, Software Engineer, Google / ChromeOS Display and Graphics >
On Wed, Aug 30, 2023 at 05:41:37PM -0400, Lyude Paul wrote: > Other then the name typo (s/Pual/Paul): > > Signed-off-by: Lyude Paul <lyude@redhat.com> (just since I co-authored > things~) I believe having the Co-developed-by: in the patches that you helped out would be nice. > Reviewed-by: Lyude Paul <lyude@redhat.com> > > I think we definitely want to make sure we get intel's opinions on this > though, especially regarding the usage of link-status. I think we're close > enough to link-status's intended purpose, but I definitely would like to know > what others think about that since userspace will definitely have to handle > situations like this a bit differently than with SST. > > Also - definitely make sure you take a look at Imre's patch series that's > currently on the list (I just finished reviewing it), since it adds some > things to the helpers that might end up being useful here :) > > https://patchwork.freedesktop.org/series/122589/ > > On Thu, 2023-08-24 at 16:50 -0400, Gil Dekel wrote: > > Next version of https://patchwork.freedesktop.org/series/122850/ > > > > v4: > > Another blunder. I uploaded the patches from my ChromeiumOS kernel dev repo > > instead of drm-tip/drm-tip. Apologies for the noise :( > > > > v3: > > Still learning the ropes of upstream workflow. Apologies for mucking up v2. > > This is just a re-upload. > > > > v2: > > Reorganize into: > > 1) Add for final failure state for SST and MST link training fallback. > > 2) Add a DRM helper for setting downstream MST ports' link-status state. > > 3) Make handling SST and MST connectors simpler via intel_dp. > > 4) Update link-status for downstream MST ports. > > 5) Emit a uevent with the "link-status" trigger property. > > > > v1: > > Currently, when link training fails after all fallback values have been > > exhausted, the i915 driver seizes to send uevents to userspace. This leave > > userspace thinking that the last passing atomic commit was successful, and that > > all connectors (displays) are connected and operational, when in fact, the last > > link failed to train and the displays remain dark. This manifests as "zombie" > > displays in userspace, in which users observe the displays appear in their > > display settings page, but they are dark and unresponsive. > > > > Since, at the time of writing, MST link training fallback is not implemented, > > failing MST link training is a significantly more common case then a complete > > SST link training failure. And with users using MST hubs more than ever to > > connect multiple displays via their USB-C ports we observe this case often. > > > > This patchset series suggest a solution, in which a final failure state is > > defined. In this final state, the connector's bit rate capabilities, namely > > max_link_rate and max_link_lane_count, are set to 0. This effectively set the > > connector's bandwidth to 0Gbps, thus causing all its modes to be pruned in the > > following connector probing. > > > > Next, with this state defined, we emit a link-status=Bad uevent. The next time > > userspace probes the connector, it should recognize that the connector has no > > modes and ignore it since it is in a bad state. > > > > I am aware that always sending a uevent and never stopping may result in some > > userspaces having their expectations broken and enter an infinite loop of > > modesets and link-training attempts. However, per DRM link-status spec: > > ``` > > * link-status: > > * Connector link-status property to indicate the status of link. The > > * default value of link-status is "GOOD". If something fails during or > > * after modeset, the kernel driver may set this to "BAD" and issue a > > * hotplug uevent. Drivers should update this value using > > * drm_connector_set_link_status_property(). > > * > > * When user-space receives the hotplug uevent and detects a "BAD" > > * link-status, the sink doesn't receive pixels anymore (e.g. the screen > > * becomes completely black). The list of available modes may have > > * changed. User-space is expected to pick a new mode if the current one > > * has disappeared and perform a new modeset with link-status set to > > * "GOOD" to re-enable the connector. > > ``` > > (form drivers/gpu/drm/drm_connector.c - DOC: standard connector properties) > > > > it seems reasonable to assume that the suggested state is an extension of the > > spec's guidelines, in which the next new mode userspace picks for a connector > > with no modes is - none, thus breaking the cycle of failed link-training > > attempts. > > > > I suspect that, maybe, zeroing out the bit rate capabilities is not the right > > way to go, and perhaps marking the connector as disconnected instead may be a > > better solution. However, if marking a connector disconnected is the way to go, > > We will have to iterate over all MST ports in the MST case and mark the spawned > > connectors as disconnected as well. > > I -think- this is probably fine, that's likely how I'd > > > > > As a final note I should add that this approach was tested with ChromeOS as > > userspace, and we observed that the zombie displays stop showing up once the > > connectors are pruned of all their modes and are ignored by userspace. > > > > For your consideration and guidance. > > Thanks, > > > > Gil Dekel (6): > > drm/i915/dp_link_training: Add a final failing state to link training > > fallback > > drm/i915/dp_link_training: Add a final failing state to link training > > fallback for MST > > drm/dp_mst: Add drm_dp_set_mst_topology_link_status() > > drm/i915: Move DP modeset_retry_work into intel_dp > > drm/i915/dp_link_training: Set all downstream MST ports to BAD before > > retrying > > drm/i915/dp_link_training: Emit a link-status=Bad uevent with trigger > > property > > > > drivers/gpu/drm/display/drm_dp_mst_topology.c | 39 ++++++++++ > > drivers/gpu/drm/i915/display/intel_display.c | 14 +++- > > .../drm/i915/display/intel_display_types.h | 6 +- > > drivers/gpu/drm/i915/display/intel_dp.c | 75 ++++++++++++------- > > drivers/gpu/drm/i915/display/intel_dp.h | 2 +- > > .../drm/i915/display/intel_dp_link_training.c | 11 ++- > > include/drm/display/drm_dp_mst_helper.h | 3 + > > 7 files changed, 110 insertions(+), 40 deletions(-) > > > > -- > > Gil Dekel, Software Engineer, Google / ChromeOS Display and Graphics > > > > -- > Cheers, > Lyude Paul (she/her) > Software Engineer at Red Hat >
On Fri, Sep 01, 2023 at 02:38:11PM -0400, Rodrigo Vivi wrote: > On Wed, Aug 30, 2023 at 05:41:37PM -0400, Lyude Paul wrote: > > Other then the name typo (s/Pual/Paul): > > > > Signed-off-by: Lyude Paul <lyude@redhat.com> (just since I co-authored > > things~) > > I believe having the Co-developed-by: in the patches that you helped > out would be nice. > > > Reviewed-by: Lyude Paul <lyude@redhat.com> > > > > I think we definitely want to make sure we get intel's opinions on this > > though, especially regarding the usage of link-status. I think we're close > > enough to link-status's intended purpose, but I definitely would like to know > > what others think about that since userspace will definitely have to handle > > situations like this a bit differently than with SST. I'd like to get Jani, or Ville, or Imre's eyes here. I believe this series has a good potential to solve some bad lingering MST issues, but I'm indeed concerned with the impact on depending on userspace behavior here. (Besides that potential dead-lock...) > > > > Also - definitely make sure you take a look at Imre's patch series that's > > currently on the list (I just finished reviewing it), since it adds some > > things to the helpers that might end up being useful here :) > > > > https://patchwork.freedesktop.org/series/122589/ > > > > On Thu, 2023-08-24 at 16:50 -0400, Gil Dekel wrote: > > > Next version of https://patchwork.freedesktop.org/series/122850/ > > > > > > v4: > > > Another blunder. I uploaded the patches from my ChromeiumOS kernel dev repo > > > instead of drm-tip/drm-tip. Apologies for the noise :( > > > > > > v3: > > > Still learning the ropes of upstream workflow. Apologies for mucking up v2. > > > This is just a re-upload. > > > > > > v2: > > > Reorganize into: > > > 1) Add for final failure state for SST and MST link training fallback. > > > 2) Add a DRM helper for setting downstream MST ports' link-status state. > > > 3) Make handling SST and MST connectors simpler via intel_dp. > > > 4) Update link-status for downstream MST ports. > > > 5) Emit a uevent with the "link-status" trigger property. > > > > > > v1: > > > Currently, when link training fails after all fallback values have been > > > exhausted, the i915 driver seizes to send uevents to userspace. This leave > > > userspace thinking that the last passing atomic commit was successful, and that > > > all connectors (displays) are connected and operational, when in fact, the last > > > link failed to train and the displays remain dark. This manifests as "zombie" > > > displays in userspace, in which users observe the displays appear in their > > > display settings page, but they are dark and unresponsive. > > > > > > Since, at the time of writing, MST link training fallback is not implemented, > > > failing MST link training is a significantly more common case then a complete > > > SST link training failure. And with users using MST hubs more than ever to > > > connect multiple displays via their USB-C ports we observe this case often. > > > > > > This patchset series suggest a solution, in which a final failure state is > > > defined. In this final state, the connector's bit rate capabilities, namely > > > max_link_rate and max_link_lane_count, are set to 0. This effectively set the > > > connector's bandwidth to 0Gbps, thus causing all its modes to be pruned in the > > > following connector probing. > > > > > > Next, with this state defined, we emit a link-status=Bad uevent. The next time > > > userspace probes the connector, it should recognize that the connector has no > > > modes and ignore it since it is in a bad state. > > > > > > I am aware that always sending a uevent and never stopping may result in some > > > userspaces having their expectations broken and enter an infinite loop of > > > modesets and link-training attempts. However, per DRM link-status spec: > > > ``` > > > * link-status: > > > * Connector link-status property to indicate the status of link. The > > > * default value of link-status is "GOOD". If something fails during or > > > * after modeset, the kernel driver may set this to "BAD" and issue a > > > * hotplug uevent. Drivers should update this value using > > > * drm_connector_set_link_status_property(). > > > * > > > * When user-space receives the hotplug uevent and detects a "BAD" > > > * link-status, the sink doesn't receive pixels anymore (e.g. the screen > > > * becomes completely black). The list of available modes may have > > > * changed. User-space is expected to pick a new mode if the current one > > > * has disappeared and perform a new modeset with link-status set to > > > * "GOOD" to re-enable the connector. > > > ``` > > > (form drivers/gpu/drm/drm_connector.c - DOC: standard connector properties) > > > > > > it seems reasonable to assume that the suggested state is an extension of the > > > spec's guidelines, in which the next new mode userspace picks for a connector > > > with no modes is - none, thus breaking the cycle of failed link-training > > > attempts. > > > > > > I suspect that, maybe, zeroing out the bit rate capabilities is not the right > > > way to go, and perhaps marking the connector as disconnected instead may be a > > > better solution. However, if marking a connector disconnected is the way to go, > > > We will have to iterate over all MST ports in the MST case and mark the spawned > > > connectors as disconnected as well. > > > > I -think- this is probably fine, that's likely how I'd > > > > > > > > As a final note I should add that this approach was tested with ChromeOS as > > > userspace, and we observed that the zombie displays stop showing up once the > > > connectors are pruned of all their modes and are ignored by userspace. > > > > > > For your consideration and guidance. > > > Thanks, > > > > > > Gil Dekel (6): > > > drm/i915/dp_link_training: Add a final failing state to link training > > > fallback > > > drm/i915/dp_link_training: Add a final failing state to link training > > > fallback for MST > > > drm/dp_mst: Add drm_dp_set_mst_topology_link_status() > > > drm/i915: Move DP modeset_retry_work into intel_dp > > > drm/i915/dp_link_training: Set all downstream MST ports to BAD before > > > retrying > > > drm/i915/dp_link_training: Emit a link-status=Bad uevent with trigger > > > property > > > > > > drivers/gpu/drm/display/drm_dp_mst_topology.c | 39 ++++++++++ > > > drivers/gpu/drm/i915/display/intel_display.c | 14 +++- > > > .../drm/i915/display/intel_display_types.h | 6 +- > > > drivers/gpu/drm/i915/display/intel_dp.c | 75 ++++++++++++------- > > > drivers/gpu/drm/i915/display/intel_dp.h | 2 +- > > > .../drm/i915/display/intel_dp_link_training.c | 11 ++- > > > include/drm/display/drm_dp_mst_helper.h | 3 + > > > 7 files changed, 110 insertions(+), 40 deletions(-) > > > > > > -- > > > Gil Dekel, Software Engineer, Google / ChromeOS Display and Graphics > > > > > > > -- > > Cheers, > > Lyude Paul (she/her) > > Software Engineer at Red Hat > >
On Fri, Sep 1, 2023 at 3:05 PM Rodrigo Vivi <rodrigo.vivi@intel.com> wrote: > > On Fri, Sep 01, 2023 at 02:38:11PM -0400, Rodrigo Vivi wrote: > > On Wed, Aug 30, 2023 at 05:41:37PM -0400, Lyude Paul wrote: > > > Other then the name typo (s/Pual/Paul): > > > > > > Signed-off-by: Lyude Paul <lyude@redhat.com> (just since I co-authored > > > things~) > > > > I believe having the Co-developed-by: in the patches that you helped > > out would be nice. > > Agreed. I think I'll set Lyude as the author of the two patches she originally wrote, and set myself as the co-developer. Thank you both for pointing this out. > > > Reviewed-by: Lyude Paul <lyude@redhat.com> > > > > > > I think we definitely want to make sure we get intel's opinions on this > > > though, especially regarding the usage of link-status. I think we're close > > > enough to link-status's intended purpose, but I definitely would like to know > > > what others think about that since userspace will definitely have to handle > > > situations like this a bit differently than with SST. > > I'd like to get Jani, or Ville, or Imre's eyes here. I believe this > series has a good potential to solve some bad lingering MST issues, > but I'm indeed concerned with the impact on depending on userspace > behavior here. > I would love to have other userspaces reviewing (or at least ACKing) this series. Any thoughts on who should be added on the next revision? > (Besides that potential dead-lock...) > Response is on patch 4/5. > > > > > > Also - definitely make sure you take a look at Imre's patch series that's > > > currently on the list (I just finished reviewing it), since it adds some > > > things to the helpers that might end up being useful here :) > > > > > > https://patchwork.freedesktop.org/series/122589/ > > > Do you have anything particular in mind? > > > On Thu, 2023-08-24 at 16:50 -0400, Gil Dekel wrote: > > > > Next version of https://patchwork.freedesktop.org/series/122850/ > > > > > > > > v4: > > > > Another blunder. I uploaded the patches from my ChromeiumOS kernel dev repo > > > > instead of drm-tip/drm-tip. Apologies for the noise :( > > > > > > > > v3: > > > > Still learning the ropes of upstream workflow. Apologies for mucking up v2. > > > > This is just a re-upload. > > > > > > > > v2: > > > > Reorganize into: > > > > 1) Add for final failure state for SST and MST link training fallback. > > > > 2) Add a DRM helper for setting downstream MST ports' link-status state. > > > > 3) Make handling SST and MST connectors simpler via intel_dp. > > > > 4) Update link-status for downstream MST ports. > > > > 5) Emit a uevent with the "link-status" trigger property. > > > > > > > > v1: > > > > Currently, when link training fails after all fallback values have been > > > > exhausted, the i915 driver seizes to send uevents to userspace. This leave > > > > userspace thinking that the last passing atomic commit was successful, and that > > > > all connectors (displays) are connected and operational, when in fact, the last > > > > link failed to train and the displays remain dark. This manifests as "zombie" > > > > displays in userspace, in which users observe the displays appear in their > > > > display settings page, but they are dark and unresponsive. > > > > > > > > Since, at the time of writing, MST link training fallback is not implemented, > > > > failing MST link training is a significantly more common case then a complete > > > > SST link training failure. And with users using MST hubs more than ever to > > > > connect multiple displays via their USB-C ports we observe this case often. > > > > > > > > This patchset series suggest a solution, in which a final failure state is > > > > defined. In this final state, the connector's bit rate capabilities, namely > > > > max_link_rate and max_link_lane_count, are set to 0. This effectively set the > > > > connector's bandwidth to 0Gbps, thus causing all its modes to be pruned in the > > > > following connector probing. > > > > > > > > Next, with this state defined, we emit a link-status=Bad uevent. The next time > > > > userspace probes the connector, it should recognize that the connector has no > > > > modes and ignore it since it is in a bad state. > > > > > > > > I am aware that always sending a uevent and never stopping may result in some > > > > userspaces having their expectations broken and enter an infinite loop of > > > > modesets and link-training attempts. However, per DRM link-status spec: > > > > ``` > > > > * link-status: > > > > * Connector link-status property to indicate the status of link. The > > > > * default value of link-status is "GOOD". If something fails during or > > > > * after modeset, the kernel driver may set this to "BAD" and issue a > > > > * hotplug uevent. Drivers should update this value using > > > > * drm_connector_set_link_status_property(). > > > > * > > > > * When user-space receives the hotplug uevent and detects a "BAD" > > > > * link-status, the sink doesn't receive pixels anymore (e.g. the screen > > > > * becomes completely black). The list of available modes may have > > > > * changed. User-space is expected to pick a new mode if the current one > > > > * has disappeared and perform a new modeset with link-status set to > > > > * "GOOD" to re-enable the connector. > > > > ``` > > > > (form drivers/gpu/drm/drm_connector.c - DOC: standard connector properties) > > > > > > > > it seems reasonable to assume that the suggested state is an extension of the > > > > spec's guidelines, in which the next new mode userspace picks for a connector > > > > with no modes is - none, thus breaking the cycle of failed link-training > > > > attempts. > > > > > > > > I suspect that, maybe, zeroing out the bit rate capabilities is not the right > > > > way to go, and perhaps marking the connector as disconnected instead may be a > > > > better solution. However, if marking a connector disconnected is the way to go, > > > > We will have to iterate over all MST ports in the MST case and mark the spawned > > > > connectors as disconnected as well. > > > > > > I -think- this is probably fine, that's likely how I'd > > > > > > > > > > > As a final note I should add that this approach was tested with ChromeOS as > > > > userspace, and we observed that the zombie displays stop showing up once the > > > > connectors are pruned of all their modes and are ignored by userspace. > > > > > > > > For your consideration and guidance. > > > > Thanks, > > > > > > > > Gil Dekel (6): > > > > drm/i915/dp_link_training: Add a final failing state to link training > > > > fallback > > > > drm/i915/dp_link_training: Add a final failing state to link training > > > > fallback for MST > > > > drm/dp_mst: Add drm_dp_set_mst_topology_link_status() > > > > drm/i915: Move DP modeset_retry_work into intel_dp > > > > drm/i915/dp_link_training: Set all downstream MST ports to BAD before > > > > retrying > > > > drm/i915/dp_link_training: Emit a link-status=Bad uevent with trigger > > > > property > > > > > > > > drivers/gpu/drm/display/drm_dp_mst_topology.c | 39 ++++++++++ > > > > drivers/gpu/drm/i915/display/intel_display.c | 14 +++- > > > > .../drm/i915/display/intel_display_types.h | 6 +- > > > > drivers/gpu/drm/i915/display/intel_dp.c | 75 ++++++++++++------- > > > > drivers/gpu/drm/i915/display/intel_dp.h | 2 +- > > > > .../drm/i915/display/intel_dp_link_training.c | 11 ++- > > > > include/drm/display/drm_dp_mst_helper.h | 3 + > > > > 7 files changed, 110 insertions(+), 40 deletions(-) > > > > > > > > -- > > > > Gil Dekel, Software Engineer, Google / ChromeOS Display and Graphics > > > > > > > > > > -- > > > Cheers, > > > Lyude Paul (she/her) > > > Software Engineer at Red Hat > > > Thanks for your time and comments! -- Best, Gil Dekel, Software Engineer, Google / ChromeOS Display and Graphics