diff mbox

[RFC,xserver] modesetting: re-set the crtc's mode when link-status goes BAD

Message ID 20170126123728.5680-1-martin.peres@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Martin Peres Jan. 26, 2017, 12:37 p.m. UTC
Despite all the careful planing of the kernel, a link may become
insufficient to handle the currently-set mode. At this point, the
kernel should mark this particular configuration as being broken
and potentially prune the mode before setting the offending connector's
link-status to BAD and send the userspace a hotplug event. This may
happen right after a modeset or later on.

When available, we should use the link-status information to reset
the wanted mode.

Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
---

WARNING: The patches have not been merged in the kernel yet, so this patch is
merely an RFC.

This patch is the result of discussions happening mostly in DRI-devel and
Intel-GFX on how to handle link training failures. I would advise reading the
thread [0] first and then this thread [1] which explain in great length why this
is needed and why the selected approach seems to be the best.

The relevant kernel patches + this patch are enough to pass the relevant
DisplayPort compliance tests, provided that the Desktop Environment or another
program ([2]?) provides the initial modesetting on hotplug.

Would this patch be acceptable to you? Any comments or suggestions?

[0] https://lists.freedesktop.org/archives/dri-devel/2016-November/123366.html
[1] https://lists.freedesktop.org/archives/dri-devel/2016-November/124736.html
[2] https://cgit.freedesktop.org/~mperes/auto-randr/


 hw/xfree86/drivers/modesetting/drmmode_display.c | 51 ++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

Comments

Daniel Vetter Jan. 26, 2017, 5:21 p.m. UTC | #1
On Thu, Jan 26, 2017 at 02:37:28PM +0200, Martin Peres wrote:
> Despite all the careful planing of the kernel, a link may become
> insufficient to handle the currently-set mode. At this point, the
> kernel should mark this particular configuration as being broken
> and potentially prune the mode before setting the offending connector's
> link-status to BAD and send the userspace a hotplug event. This may
> happen right after a modeset or later on.
> 
> When available, we should use the link-status information to reset
> the wanted mode.
> 
> Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
> ---
> 
> WARNING: The patches have not been merged in the kernel yet, so this patch is
> merely an RFC.

That's how it's supposed to happen, before we can merge the kernel side,
we need acceptance (=reviewed) by the userspace side. Which is why this
patch here.
-Daniel

> 
> This patch is the result of discussions happening mostly in DRI-devel and
> Intel-GFX on how to handle link training failures. I would advise reading the
> thread [0] first and then this thread [1] which explain in great length why this
> is needed and why the selected approach seems to be the best.
> 
> The relevant kernel patches + this patch are enough to pass the relevant
> DisplayPort compliance tests, provided that the Desktop Environment or another
> program ([2]?) provides the initial modesetting on hotplug.
> 
> Would this patch be acceptable to you? Any comments or suggestions?
> 
> [0] https://lists.freedesktop.org/archives/dri-devel/2016-November/123366.html
> [1] https://lists.freedesktop.org/archives/dri-devel/2016-November/124736.html
> [2] https://cgit.freedesktop.org/~mperes/auto-randr/
> 
> 
>  hw/xfree86/drivers/modesetting/drmmode_display.c | 51 ++++++++++++++++++++++++
>  1 file changed, 51 insertions(+)
> 
> diff --git a/hw/xfree86/drivers/modesetting/drmmode_display.c b/hw/xfree86/drivers/modesetting/drmmode_display.c
> index 6e755e9482..3144620c67 100644
> --- a/hw/xfree86/drivers/modesetting/drmmode_display.c
> +++ b/hw/xfree86/drivers/modesetting/drmmode_display.c
> @@ -2262,6 +2262,10 @@ drmmode_setup_colormap(ScreenPtr pScreen, ScrnInfoPtr pScrn)
>  }
>  
>  #ifdef CONFIG_UDEV_KMS
> +
> +#define DRM_MODE_LINK_STATUS_GOOD       0
> +#define DRM_MODE_LINK_STATUS_BAD        1
> +
>  static void
>  drmmode_handle_uevents(int fd, void *closure)
>  {
> @@ -2281,6 +2285,49 @@ drmmode_handle_uevents(int fd, void *closure)
>      if (!found)
>          return;
>  
> +    /* Try to re-set the mode on all the connectors with a BAD link-state:
> +     * This may happen if a link degrades and a new modeset is necessary, using
> +     * different link-training parameters. If the kernel found that the current
> +     * mode is not achievable anymore, it should have pruned the mode before
> +     * sending the hotplug event. Try to re-set the currently-set mode to keep
> +     * the display alive, this will fail if the mode has been pruned.
> +     * In any case, we will send randr events for the Desktop Environment to
> +     * deal with it, if it wants to.
> +     */
> +    for (i = 0; i < config->num_output; i++) {
> +        xf86OutputPtr output = config->output[i];
> +        drmmode_output_private_ptr drmmode_output = output->driver_private;
> +        uint32_t con_id = drmmode_output->mode_output->connector_id;
> +        drmModeConnectorPtr koutput;
> +
> +        /* Get an updated view of the properties for the current connector and
> +         * look for the link-status property
> +         */
> +        koutput = drmModeGetConnectorCurrent(drmmode->fd, con_id);
> +        for (j = 0; koutput && j < koutput->count_props; j++) {
> +            drmModePropertyPtr props;
> +            props = drmModeGetProperty(drmmode->fd, koutput->props[j]);
> +            if (props && props->flags & DRM_MODE_PROP_ENUM &&
> +                !strcmp(props->name, "link-status") &&
> +                koutput->prop_values[j] == DRM_MODE_LINK_STATUS_BAD) {
> +                xf86CrtcPtr crtc = output->crtc;
> +                if (!crtc)
> +                    continue;
> +
> +                /* the connector got a link failure, re-set the current mode */
> +                drmmode_set_mode_major(crtc, &crtc->mode, crtc->rotation,
> +                                       crtc->x, crtc->y);
> +
> +                xf86DrvMsg(scrn->scrnIndex, X_WARNING,
> +                           "hotplug event: connector %u's link-state is BAD, "
> +                           "tried resetting the current mode. You may be left"
> +                           "with a black screen if this fails...\n", con_id);
> +            }
> +            drmModeFreeProperty(props);
> +        }
> +        drmModeFreeConnector(koutput);
> +    }
> +
>      mode_res = drmModeGetResources(drmmode->fd);
>      if (!mode_res)
>          goto out;
> @@ -2345,6 +2392,10 @@ out_free_res:
>  out:
>      RRGetInfo(xf86ScrnToScreen(scrn), TRUE);
>  }
> +
> +#undef DRM_MODE_LINK_STATUS_BAD
> +#undef DRM_MODE_LINK_STATUS_GOOD
> +
>  #endif
>  
>  void
> -- 
> 2.11.0
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Navare, Manasi Jan. 31, 2017, 5:08 p.m. UTC | #2
On Thu, Jan 26, 2017 at 06:21:20PM +0100, Daniel Vetter wrote:
> On Thu, Jan 26, 2017 at 02:37:28PM +0200, Martin Peres wrote:
> > Despite all the careful planing of the kernel, a link may become
> > insufficient to handle the currently-set mode. At this point, the
> > kernel should mark this particular configuration as being broken
> > and potentially prune the mode before setting the offending connector's
> > link-status to BAD and send the userspace a hotplug event. This may
> > happen right after a modeset or later on.
> > 
> > When available, we should use the link-status information to reset
> > the wanted mode.
> > 
> > Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
> > ---
> > 
> > WARNING: The patches have not been merged in the kernel yet, so this patch is
> > merely an RFC.
> 
> That's how it's supposed to happen, before we can merge the kernel side,
> we need acceptance (=reviewed) by the userspace side. Which is why this
> patch here.
> -Daniel
>

This has been validated with a compliance device inducing a forced link
failure during modeset, after which the kernel sets the link-status to BAD
and sends a hotplug to userspace. The -modesetting driver then resets the
configuration by forcing another mdoeset and retraining the link at lower
link rate/lane count. This has been proven to pass DP compliance.
It has been also verified that this avoids the black screens in case of such
mode failures due to link training failure.

Regards
Manasi

 
> > 
> > This patch is the result of discussions happening mostly in DRI-devel and
> > Intel-GFX on how to handle link training failures. I would advise reading the
> > thread [0] first and then this thread [1] which explain in great length why this
> > is needed and why the selected approach seems to be the best.
> > 
> > The relevant kernel patches + this patch are enough to pass the relevant
> > DisplayPort compliance tests, provided that the Desktop Environment or another
> > program ([2]?) provides the initial modesetting on hotplug.
> > 
> > Would this patch be acceptable to you? Any comments or suggestions?
> > 
> > [0] https://lists.freedesktop.org/archives/dri-devel/2016-November/123366.html
> > [1] https://lists.freedesktop.org/archives/dri-devel/2016-November/124736.html
> > [2] https://cgit.freedesktop.org/~mperes/auto-randr/
> > 
> > 
> >  hw/xfree86/drivers/modesetting/drmmode_display.c | 51 ++++++++++++++++++++++++
> >  1 file changed, 51 insertions(+)
> > 
> > diff --git a/hw/xfree86/drivers/modesetting/drmmode_display.c b/hw/xfree86/drivers/modesetting/drmmode_display.c
> > index 6e755e9482..3144620c67 100644
> > --- a/hw/xfree86/drivers/modesetting/drmmode_display.c
> > +++ b/hw/xfree86/drivers/modesetting/drmmode_display.c
> > @@ -2262,6 +2262,10 @@ drmmode_setup_colormap(ScreenPtr pScreen, ScrnInfoPtr pScrn)
> >  }
> >  
> >  #ifdef CONFIG_UDEV_KMS
> > +
> > +#define DRM_MODE_LINK_STATUS_GOOD       0
> > +#define DRM_MODE_LINK_STATUS_BAD        1
> > +
> >  static void
> >  drmmode_handle_uevents(int fd, void *closure)
> >  {
> > @@ -2281,6 +2285,49 @@ drmmode_handle_uevents(int fd, void *closure)
> >      if (!found)
> >          return;
> >  
> > +    /* Try to re-set the mode on all the connectors with a BAD link-state:
> > +     * This may happen if a link degrades and a new modeset is necessary, using
> > +     * different link-training parameters. If the kernel found that the current
> > +     * mode is not achievable anymore, it should have pruned the mode before
> > +     * sending the hotplug event. Try to re-set the currently-set mode to keep
> > +     * the display alive, this will fail if the mode has been pruned.
> > +     * In any case, we will send randr events for the Desktop Environment to
> > +     * deal with it, if it wants to.
> > +     */
> > +    for (i = 0; i < config->num_output; i++) {
> > +        xf86OutputPtr output = config->output[i];
> > +        drmmode_output_private_ptr drmmode_output = output->driver_private;
> > +        uint32_t con_id = drmmode_output->mode_output->connector_id;
> > +        drmModeConnectorPtr koutput;
> > +
> > +        /* Get an updated view of the properties for the current connector and
> > +         * look for the link-status property
> > +         */
> > +        koutput = drmModeGetConnectorCurrent(drmmode->fd, con_id);
> > +        for (j = 0; koutput && j < koutput->count_props; j++) {
> > +            drmModePropertyPtr props;
> > +            props = drmModeGetProperty(drmmode->fd, koutput->props[j]);
> > +            if (props && props->flags & DRM_MODE_PROP_ENUM &&
> > +                !strcmp(props->name, "link-status") &&
> > +                koutput->prop_values[j] == DRM_MODE_LINK_STATUS_BAD) {
> > +                xf86CrtcPtr crtc = output->crtc;
> > +                if (!crtc)
> > +                    continue;
> > +
> > +                /* the connector got a link failure, re-set the current mode */
> > +                drmmode_set_mode_major(crtc, &crtc->mode, crtc->rotation,
> > +                                       crtc->x, crtc->y);
> > +
> > +                xf86DrvMsg(scrn->scrnIndex, X_WARNING,
> > +                           "hotplug event: connector %u's link-state is BAD, "
> > +                           "tried resetting the current mode. You may be left"
> > +                           "with a black screen if this fails...\n", con_id);
> > +            }
> > +            drmModeFreeProperty(props);
> > +        }
> > +        drmModeFreeConnector(koutput);
> > +    }
> > +
> >      mode_res = drmModeGetResources(drmmode->fd);
> >      if (!mode_res)
> >          goto out;
> > @@ -2345,6 +2392,10 @@ out_free_res:
> >  out:
> >      RRGetInfo(xf86ScrnToScreen(scrn), TRUE);
> >  }
> > +
> > +#undef DRM_MODE_LINK_STATUS_BAD
> > +#undef DRM_MODE_LINK_STATUS_GOOD
> > +
> >  #endif
> >  
> >  void
> > -- 
> > 2.11.0
> > 
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
Eric Anholt Jan. 31, 2017, 8:13 p.m. UTC | #3
Martin Peres <martin.peres@linux.intel.com> writes:

> Despite all the careful planing of the kernel, a link may become
> insufficient to handle the currently-set mode. At this point, the
> kernel should mark this particular configuration as being broken
> and potentially prune the mode before setting the offending connector's
> link-status to BAD and send the userspace a hotplug event. This may
> happen right after a modeset or later on.
>
> When available, we should use the link-status information to reset
> the wanted mode.
>
> Signed-off-by: Martin Peres <martin.peres@linux.intel.com>

If I understand this right, there are two failure modes being handled:

1) A mode that won't actually work because the link isn't good enough.

2) A mode that should work, but link parameters were too optimistic and
if we just ask the kernel to set the mode again it'll use more
conservative parameters that work.

This patch seems good for 2).  For 1), the drmmode_set_mode_major is
going to set our old mode back.  Won't the modeset then fail to link
train again, and bring us back into this loop?  The only escape that I
see would be some other userspace responding to the advertised mode list
changing, and then asking X to modeset to something new.

To avoid that failure busy loop, should we re-fetching modes at this
point, and only re-setting if our mode still exists?
Jani Nikula Feb. 1, 2017, 10:03 a.m. UTC | #4
On Tue, 31 Jan 2017, Eric Anholt <eric@anholt.net> wrote:
> Martin Peres <martin.peres@linux.intel.com> writes:
>
>> Despite all the careful planing of the kernel, a link may become
>> insufficient to handle the currently-set mode. At this point, the
>> kernel should mark this particular configuration as being broken
>> and potentially prune the mode before setting the offending connector's
>> link-status to BAD and send the userspace a hotplug event. This may
>> happen right after a modeset or later on.
>>
>> When available, we should use the link-status information to reset
>> the wanted mode.
>>
>> Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
>
> If I understand this right, there are two failure modes being handled:
>
> 1) A mode that won't actually work because the link isn't good enough.
>
> 2) A mode that should work, but link parameters were too optimistic and
> if we just ask the kernel to set the mode again it'll use more
> conservative parameters that work.
>
> This patch seems good for 2).  For 1), the drmmode_set_mode_major is
> going to set our old mode back.  Won't the modeset then fail to link
> train again, and bring us back into this loop?  The only escape that I
> see would be some other userspace responding to the advertised mode list
> changing, and then asking X to modeset to something new.
>
> To avoid that failure busy loop, should we re-fetching modes at this
> point, and only re-setting if our mode still exists?

Disclaimer: I don't know anything about the internals of the modesetting
driver.

Perhaps we can identify the two cases now, but I'd put this more
generally: if the link status has gone bad, it's an indicator to
userspace that the circumstances may have changed, and userspace should
query the kernel for the list of available modes again. It should no
longer trust information obtained prior to getting the bad link status,
including the current mode.

But specifically, I think you're right, and AFAICT asking for the list
of modes again is the only way for the userspace to distinguish between
the two cases. I don't think there's a shortcut for deciding the current
mode is still valid.

BR,
Jani.
Jani Nikula Feb. 1, 2017, 10:17 a.m. UTC | #5
On Tue, 31 Jan 2017, Manasi Navare <manasi.d.navare@intel.com> wrote:
> On Thu, Jan 26, 2017 at 06:21:20PM +0100, Daniel Vetter wrote:
>> On Thu, Jan 26, 2017 at 02:37:28PM +0200, Martin Peres wrote:
>> > Despite all the careful planing of the kernel, a link may become
>> > insufficient to handle the currently-set mode. At this point, the
>> > kernel should mark this particular configuration as being broken
>> > and potentially prune the mode before setting the offending connector's
>> > link-status to BAD and send the userspace a hotplug event. This may
>> > happen right after a modeset or later on.
>> > 
>> > When available, we should use the link-status information to reset
>> > the wanted mode.
>> > 
>> > Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
>> > ---
>> > 
>> > WARNING: The patches have not been merged in the kernel yet, so this patch is
>> > merely an RFC.
>> 
>> That's how it's supposed to happen, before we can merge the kernel side,
>> we need acceptance (=reviewed) by the userspace side. Which is why this
>> patch here.
>> -Daniel
>>
>
> This has been validated with a compliance device inducing a forced link
> failure during modeset, after which the kernel sets the link-status to BAD
> and sends a hotplug to userspace. The -modesetting driver then resets the
> configuration by forcing another mdoeset and retraining the link at lower
> link rate/lane count. This has been proven to pass DP compliance.
> It has been also verified that this avoids the black screens in case of such
> mode failures due to link training failure.

Validation is great, but review is what is required and we're after!

BR,
Jani.
Navare, Manasi Feb. 1, 2017, 7:55 p.m. UTC | #6
On Thu, Jan 26, 2017 at 02:37:28PM +0200, Martin Peres wrote:
> Despite all the careful planing of the kernel, a link may become
> insufficient to handle the currently-set mode. At this point, the
> kernel should mark this particular configuration as being broken
> and potentially prune the mode before setting the offending connector's
> link-status to BAD and send the userspace a hotplug event. This may
> happen right after a modeset or later on.
> 
> When available, we should use the link-status information to reset
> the wanted mode.
> 
> Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
> ---
> 
> WARNING: The patches have not been merged in the kernel yet, so this patch is
> merely an RFC.
> 
> This patch is the result of discussions happening mostly in DRI-devel and
> Intel-GFX on how to handle link training failures. I would advise reading the
> thread [0] first and then this thread [1] which explain in great length why this
> is needed and why the selected approach seems to be the best.
> 
> The relevant kernel patches + this patch are enough to pass the relevant
> DisplayPort compliance tests, provided that the Desktop Environment or another
> program ([2]?) provides the initial modesetting on hotplug.
> 
> Would this patch be acceptable to you? Any comments or suggestions?
> 
> [0] https://lists.freedesktop.org/archives/dri-devel/2016-November/123366.html
> [1] https://lists.freedesktop.org/archives/dri-devel/2016-November/124736.html
> [2] https://cgit.freedesktop.org/~mperes/auto-randr/
> 
> 
>  hw/xfree86/drivers/modesetting/drmmode_display.c | 51 ++++++++++++++++++++++++
>  1 file changed, 51 insertions(+)
> 
> diff --git a/hw/xfree86/drivers/modesetting/drmmode_display.c b/hw/xfree86/drivers/modesetting/drmmode_display.c
> index 6e755e9482..3144620c67 100644
> --- a/hw/xfree86/drivers/modesetting/drmmode_display.c
> +++ b/hw/xfree86/drivers/modesetting/drmmode_display.c
> @@ -2262,6 +2262,10 @@ drmmode_setup_colormap(ScreenPtr pScreen, ScrnInfoPtr pScrn)
>  }
>  
>  #ifdef CONFIG_UDEV_KMS
> +
> +#define DRM_MODE_LINK_STATUS_GOOD       0
> +#define DRM_MODE_LINK_STATUS_BAD        1
> +
>  static void
>  drmmode_handle_uevents(int fd, void *closure)
>  {
> @@ -2281,6 +2285,49 @@ drmmode_handle_uevents(int fd, void *closure)
>      if (!found)
>          return;
>  
> +    /* Try to re-set the mode on all the connectors with a BAD link-state:
> +     * This may happen if a link degrades and a new modeset is necessary, using
> +     * different link-training parameters. If the kernel found that the current
> +     * mode is not achievable anymore, it should have pruned the mode before

The kernel does not prune the mode before sending a hotplug event. This happens
when userspace tries to do a mdoeset on the current mode after which kernel will
validate it agianst the new degraded link parameters and prune it if its not
supported anymore in which case the userspace driver will need to request a modeset
on the next lower mode.

Manasi


> +     * sending the hotplug event. Try to re-set the currently-set mode to keep
> +     * the display alive, this will fail if the mode has been pruned.
> +     * In any case, we will send randr events for the Desktop Environment to
> +     * deal with it, if it wants to.
> +     */
> +    for (i = 0; i < config->num_output; i++) {
> +        xf86OutputPtr output = config->output[i];
> +        drmmode_output_private_ptr drmmode_output = output->driver_private;
> +        uint32_t con_id = drmmode_output->mode_output->connector_id;
> +        drmModeConnectorPtr koutput;
> +
> +        /* Get an updated view of the properties for the current connector and
> +         * look for the link-status property
> +         */
> +        koutput = drmModeGetConnectorCurrent(drmmode->fd, con_id);
> +        for (j = 0; koutput && j < koutput->count_props; j++) {
> +            drmModePropertyPtr props;
> +            props = drmModeGetProperty(drmmode->fd, koutput->props[j]);
> +            if (props && props->flags & DRM_MODE_PROP_ENUM &&
> +                !strcmp(props->name, "link-status") &&
> +                koutput->prop_values[j] == DRM_MODE_LINK_STATUS_BAD) {
> +                xf86CrtcPtr crtc = output->crtc;
> +                if (!crtc)
> +                    continue;
> +
> +                /* the connector got a link failure, re-set the current mode */
> +                drmmode_set_mode_major(crtc, &crtc->mode, crtc->rotation,
> +                                       crtc->x, crtc->y);
> +
> +                xf86DrvMsg(scrn->scrnIndex, X_WARNING,
> +                           "hotplug event: connector %u's link-state is BAD, "
> +                           "tried resetting the current mode. You may be left"
> +                           "with a black screen if this fails...\n", con_id);
> +            }

What happens incase this mdoe is pruned? In this case the kernel will fail
the atomic_check() and return an error early on.
How does the driver request a new list of supported modes and request
a lower resolution mode?

Regards
Manasi


> +            drmModeFreeProperty(props);
> +        }
> +        drmModeFreeConnector(koutput);
> +    }
> +
>      mode_res = drmModeGetResources(drmmode->fd);
>      if (!mode_res)
>          goto out;
> @@ -2345,6 +2392,10 @@ out_free_res:
>  out:
>      RRGetInfo(xf86ScrnToScreen(scrn), TRUE);
>  }
> +
> +#undef DRM_MODE_LINK_STATUS_BAD
> +#undef DRM_MODE_LINK_STATUS_GOOD
> +
>  #endif
>  
>  void
> -- 
> 2.11.0
>
Eric Anholt Feb. 1, 2017, 7:58 p.m. UTC | #7
Jani Nikula <jani.nikula@linux.intel.com> writes:

> On Tue, 31 Jan 2017, Eric Anholt <eric@anholt.net> wrote:
>> Martin Peres <martin.peres@linux.intel.com> writes:
>>
>>> Despite all the careful planing of the kernel, a link may become
>>> insufficient to handle the currently-set mode. At this point, the
>>> kernel should mark this particular configuration as being broken
>>> and potentially prune the mode before setting the offending connector's
>>> link-status to BAD and send the userspace a hotplug event. This may
>>> happen right after a modeset or later on.
>>>
>>> When available, we should use the link-status information to reset
>>> the wanted mode.
>>>
>>> Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
>>
>> If I understand this right, there are two failure modes being handled:
>>
>> 1) A mode that won't actually work because the link isn't good enough.
>>
>> 2) A mode that should work, but link parameters were too optimistic and
>> if we just ask the kernel to set the mode again it'll use more
>> conservative parameters that work.
>>
>> This patch seems good for 2).  For 1), the drmmode_set_mode_major is
>> going to set our old mode back.  Won't the modeset then fail to link
>> train again, and bring us back into this loop?  The only escape that I
>> see would be some other userspace responding to the advertised mode list
>> changing, and then asking X to modeset to something new.
>>
>> To avoid that failure busy loop, should we re-fetching modes at this
>> point, and only re-setting if our mode still exists?
>
> Disclaimer: I don't know anything about the internals of the modesetting
> driver.
>
> Perhaps we can identify the two cases now, but I'd put this more
> generally: if the link status has gone bad, it's an indicator to
> userspace that the circumstances may have changed, and userspace should
> query the kernel for the list of available modes again. It should no
> longer trust information obtained prior to getting the bad link status,
> including the current mode.
>
> But specifically, I think you're right, and AFAICT asking for the list
> of modes again is the only way for the userspace to distinguish between
> the two cases. I don't think there's a shortcut for deciding the current
> mode is still valid.

To avoid the busy-loop problem, I think I'd like this patch to re-query
the kernel to ask about the current mode list, and only try to re-set
the mode if our mode is still there.

If the mode isn't there, then it's up to the DE to take action in
response to the notification of new modes.  If you don't have a DE to
take appropriate action, you're kind of out of luck.

As far as the ABI goes, this seems fine to me.  The only concern I had
about ABI was having to walk all the connectors on every uevent to see
if any had gone bad -- couldn't we have a flag of some sort about what
the uevent indicates?  But uevents should be super rare, so I'd say the
kernel could go ahead with the current plan.
Navare, Manasi Feb. 1, 2017, 8:05 p.m. UTC | #8
On Wed, Feb 01, 2017 at 11:58:16AM -0800, Eric Anholt wrote:
> Jani Nikula <jani.nikula@linux.intel.com> writes:
> 
> > On Tue, 31 Jan 2017, Eric Anholt <eric@anholt.net> wrote:
> >> Martin Peres <martin.peres@linux.intel.com> writes:
> >>
> >>> Despite all the careful planing of the kernel, a link may become
> >>> insufficient to handle the currently-set mode. At this point, the
> >>> kernel should mark this particular configuration as being broken
> >>> and potentially prune the mode before setting the offending connector's
> >>> link-status to BAD and send the userspace a hotplug event. This may
> >>> happen right after a modeset or later on.
> >>>
> >>> When available, we should use the link-status information to reset
> >>> the wanted mode.
> >>>
> >>> Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
> >>
> >> If I understand this right, there are two failure modes being handled:
> >>
> >> 1) A mode that won't actually work because the link isn't good enough.
> >>
> >> 2) A mode that should work, but link parameters were too optimistic and
> >> if we just ask the kernel to set the mode again it'll use more
> >> conservative parameters that work.
> >>
> >> This patch seems good for 2).  For 1), the drmmode_set_mode_major is
> >> going to set our old mode back.  Won't the modeset then fail to link
> >> train again, and bring us back into this loop?  The only escape that I
> >> see would be some other userspace responding to the advertised mode list
> >> changing, and then asking X to modeset to something new.
> >>
> >> To avoid that failure busy loop, should we re-fetching modes at this
> >> point, and only re-setting if our mode still exists?
> >
> > Disclaimer: I don't know anything about the internals of the modesetting
> > driver.
> >
> > Perhaps we can identify the two cases now, but I'd put this more
> > generally: if the link status has gone bad, it's an indicator to
> > userspace that the circumstances may have changed, and userspace should
> > query the kernel for the list of available modes again. It should no
> > longer trust information obtained prior to getting the bad link status,
> > including the current mode.
> >
> > But specifically, I think you're right, and AFAICT asking for the list
> > of modes again is the only way for the userspace to distinguish between
> > the two cases. I don't think there's a shortcut for deciding the current
> > mode is still valid.
> 
> To avoid the busy-loop problem, I think I'd like this patch to re-query
> the kernel to ask about the current mode list, and only try to re-set
> the mode if our mode is still there.
> 
> If the mode isn't there, then it's up to the DE to take action in
> response to the notification of new modes.  If you don't have a DE to
> take appropriate action, you're kind of out of luck.
> 
> As far as the ABI goes, this seems fine to me.  The only concern I had
> about ABI was having to walk all the connectors on every uevent to see
> if any had gone bad -- couldn't we have a flag of some sort about what
> the uevent indicates?  But uevents should be super rare, so I'd say the
> kernel could go ahead with the current plan.

Yes I agree. The kernel sets the link status BAD as soona s link training fails
but does not prune the modes until a new modelist is requested by the userspace.
So this patch should re query the mode list as soon as it sees the link status
BAD in order for the kernel to validate the modes again based on new link
paarmeters and send a new mode list back.
Remember that it could still not prune any mode if the same mode is valid
with lower bpp, it will still keep the mode list same and when the 
userspace retries the same mode, it will do a modeset at lower bpp (say 18bpp)
and still succeed. (Same mode at lower bpp still better than dropping the resolution)

Regards
Manasi
Daniel Vetter Feb. 2, 2017, 9:01 a.m. UTC | #9
On Wed, Feb 01, 2017 at 11:58:16AM -0800, Eric Anholt wrote:
> Jani Nikula <jani.nikula@linux.intel.com> writes:
> 
> > On Tue, 31 Jan 2017, Eric Anholt <eric@anholt.net> wrote:
> >> Martin Peres <martin.peres@linux.intel.com> writes:
> >>
> >>> Despite all the careful planing of the kernel, a link may become
> >>> insufficient to handle the currently-set mode. At this point, the
> >>> kernel should mark this particular configuration as being broken
> >>> and potentially prune the mode before setting the offending connector's
> >>> link-status to BAD and send the userspace a hotplug event. This may
> >>> happen right after a modeset or later on.
> >>>
> >>> When available, we should use the link-status information to reset
> >>> the wanted mode.
> >>>
> >>> Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
> >>
> >> If I understand this right, there are two failure modes being handled:
> >>
> >> 1) A mode that won't actually work because the link isn't good enough.
> >>
> >> 2) A mode that should work, but link parameters were too optimistic and
> >> if we just ask the kernel to set the mode again it'll use more
> >> conservative parameters that work.
> >>
> >> This patch seems good for 2).  For 1), the drmmode_set_mode_major is
> >> going to set our old mode back.  Won't the modeset then fail to link
> >> train again, and bring us back into this loop?  The only escape that I
> >> see would be some other userspace responding to the advertised mode list
> >> changing, and then asking X to modeset to something new.
> >>
> >> To avoid that failure busy loop, should we re-fetching modes at this
> >> point, and only re-setting if our mode still exists?
> >
> > Disclaimer: I don't know anything about the internals of the modesetting
> > driver.
> >
> > Perhaps we can identify the two cases now, but I'd put this more
> > generally: if the link status has gone bad, it's an indicator to
> > userspace that the circumstances may have changed, and userspace should
> > query the kernel for the list of available modes again. It should no
> > longer trust information obtained prior to getting the bad link status,
> > including the current mode.
> >
> > But specifically, I think you're right, and AFAICT asking for the list
> > of modes again is the only way for the userspace to distinguish between
> > the two cases. I don't think there's a shortcut for deciding the current
> > mode is still valid.
> 
> To avoid the busy-loop problem, I think I'd like this patch to re-query
> the kernel to ask about the current mode list, and only try to re-set
> the mode if our mode is still there.

Yeah, I guess that sounds like a reasonable heuristics. More integrated
compositors (aka the wayland ones) might be able to make more useful
decisions, but for -modesetting that's probably the best we can do.
 
> If the mode isn't there, then it's up to the DE to take action in
> response to the notification of new modes.  If you don't have a DE to
> take appropriate action, you're kind of out of luck.
> 
> As far as the ABI goes, this seems fine to me.  The only concern I had
> about ABI was having to walk all the connectors on every uevent to see
> if any had gone bad -- couldn't we have a flag of some sort about what
> the uevent indicates?  But uevents should be super rare, so I'd say the
> kernel could go ahead with the current plan.

We've discussed finer-grained uevent singalling a few times, and I'd like
that to be an orthogonal abi extension. Right now we just don't have the
required tracking even within the kernel to know which connector changed
(and the tracking we do have missed a few things, too). My idea is to push
the tracking into the leaves of the callchains with a function that
increases some per-connector epoch counter. Then we'd just have to expose
that somehow cheaply to userspace (could probably just send it along in
the uevent). Plus expose it as a read-only property so that userspace can
re-synchronize.

But again, that should be an orthogonal thing imo.
-Daniel
Eric Anholt Feb. 2, 2017, 5:57 p.m. UTC | #10
Daniel Vetter <daniel@ffwll.ch> writes:

> On Wed, Feb 01, 2017 at 11:58:16AM -0800, Eric Anholt wrote:
>> Jani Nikula <jani.nikula@linux.intel.com> writes:
>> 
>> > On Tue, 31 Jan 2017, Eric Anholt <eric@anholt.net> wrote:
>> >> Martin Peres <martin.peres@linux.intel.com> writes:
>> >>
>> >>> Despite all the careful planing of the kernel, a link may become
>> >>> insufficient to handle the currently-set mode. At this point, the
>> >>> kernel should mark this particular configuration as being broken
>> >>> and potentially prune the mode before setting the offending connector's
>> >>> link-status to BAD and send the userspace a hotplug event. This may
>> >>> happen right after a modeset or later on.
>> >>>
>> >>> When available, we should use the link-status information to reset
>> >>> the wanted mode.
>> >>>
>> >>> Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
>> >>
>> >> If I understand this right, there are two failure modes being handled:
>> >>
>> >> 1) A mode that won't actually work because the link isn't good enough.
>> >>
>> >> 2) A mode that should work, but link parameters were too optimistic and
>> >> if we just ask the kernel to set the mode again it'll use more
>> >> conservative parameters that work.
>> >>
>> >> This patch seems good for 2).  For 1), the drmmode_set_mode_major is
>> >> going to set our old mode back.  Won't the modeset then fail to link
>> >> train again, and bring us back into this loop?  The only escape that I
>> >> see would be some other userspace responding to the advertised mode list
>> >> changing, and then asking X to modeset to something new.
>> >>
>> >> To avoid that failure busy loop, should we re-fetching modes at this
>> >> point, and only re-setting if our mode still exists?
>> >
>> > Disclaimer: I don't know anything about the internals of the modesetting
>> > driver.
>> >
>> > Perhaps we can identify the two cases now, but I'd put this more
>> > generally: if the link status has gone bad, it's an indicator to
>> > userspace that the circumstances may have changed, and userspace should
>> > query the kernel for the list of available modes again. It should no
>> > longer trust information obtained prior to getting the bad link status,
>> > including the current mode.
>> >
>> > But specifically, I think you're right, and AFAICT asking for the list
>> > of modes again is the only way for the userspace to distinguish between
>> > the two cases. I don't think there's a shortcut for deciding the current
>> > mode is still valid.
>> 
>> To avoid the busy-loop problem, I think I'd like this patch to re-query
>> the kernel to ask about the current mode list, and only try to re-set
>> the mode if our mode is still there.
>
> Yeah, I guess that sounds like a reasonable heuristics. More integrated
> compositors (aka the wayland ones) might be able to make more useful
> decisions, but for -modesetting that's probably the best we can do.
>  
>> If the mode isn't there, then it's up to the DE to take action in
>> response to the notification of new modes.  If you don't have a DE to
>> take appropriate action, you're kind of out of luck.
>> 
>> As far as the ABI goes, this seems fine to me.  The only concern I had
>> about ABI was having to walk all the connectors on every uevent to see
>> if any had gone bad -- couldn't we have a flag of some sort about what
>> the uevent indicates?  But uevents should be super rare, so I'd say the
>> kernel could go ahead with the current plan.
>
> We've discussed finer-grained uevent singalling a few times, and I'd like
> that to be an orthogonal abi extension. Right now we just don't have the
> required tracking even within the kernel to know which connector changed
> (and the tracking we do have missed a few things, too). My idea is to push
> the tracking into the leaves of the callchains with a function that
> increases some per-connector epoch counter. Then we'd just have to expose
> that somehow cheaply to userspace (could probably just send it along in
> the uevent). Plus expose it as a read-only property so that userspace can
> re-synchronize.
>
> But again, that should be an orthogonal thing imo.

Yeah, that was roughly my thought process, too.
Martin Peres Feb. 2, 2017, 11:30 p.m. UTC | #11
On 01/02/17 22:05, Manasi Navare wrote:
> On Wed, Feb 01, 2017 at 11:58:16AM -0800, Eric Anholt wrote:
>> Jani Nikula <jani.nikula@linux.intel.com> writes:
>>
>>> On Tue, 31 Jan 2017, Eric Anholt <eric@anholt.net> wrote:
>>>> Martin Peres <martin.peres@linux.intel.com> writes:
>>>>
>>>>> Despite all the careful planing of the kernel, a link may become
>>>>> insufficient to handle the currently-set mode. At this point, the
>>>>> kernel should mark this particular configuration as being broken
>>>>> and potentially prune the mode before setting the offending connector's
>>>>> link-status to BAD and send the userspace a hotplug event. This may
>>>>> happen right after a modeset or later on.
>>>>>
>>>>> When available, we should use the link-status information to reset
>>>>> the wanted mode.
>>>>>
>>>>> Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
>>>> If I understand this right, there are two failure modes being handled:
>>>>
>>>> 1) A mode that won't actually work because the link isn't good enough.
>>>>
>>>> 2) A mode that should work, but link parameters were too optimistic and
>>>> if we just ask the kernel to set the mode again it'll use more
>>>> conservative parameters that work.
>>>>
>>>> This patch seems good for 2).  For 1), the drmmode_set_mode_major is
>>>> going to set our old mode back.  Won't the modeset then fail to link
>>>> train again, and bring us back into this loop?  The only escape that I
>>>> see would be some other userspace responding to the advertised mode list
>>>> changing, and then asking X to modeset to something new.
>>>>
>>>> To avoid that failure busy loop, should we re-fetching modes at this
>>>> point, and only re-setting if our mode still exists?
>>> Disclaimer: I don't know anything about the internals of the modesetting
>>> driver.
>>>
>>> Perhaps we can identify the two cases now, but I'd put this more
>>> generally: if the link status has gone bad, it's an indicator to
>>> userspace that the circumstances may have changed, and userspace should
>>> query the kernel for the list of available modes again. It should no
>>> longer trust information obtained prior to getting the bad link status,
>>> including the current mode.
>>>
>>> But specifically, I think you're right, and AFAICT asking for the list
>>> of modes again is the only way for the userspace to distinguish between
>>> the two cases. I don't think there's a shortcut for deciding the current
>>> mode is still valid.
>> To avoid the busy-loop problem, I think I'd like this patch to re-query
>> the kernel to ask about the current mode list, and only try to re-set
>> the mode if our mode is still there.
>>
>> If the mode isn't there, then it's up to the DE to take action in
>> response to the notification of new modes.  If you don't have a DE to
>> take appropriate action, you're kind of out of luck.
>>
>> As far as the ABI goes, this seems fine to me.  The only concern I had
>> about ABI was having to walk all the connectors on every uevent to see
>> if any had gone bad -- couldn't we have a flag of some sort about what
>> the uevent indicates?  But uevents should be super rare, so I'd say the
>> kernel could go ahead with the current plan.
> Yes I agree. The kernel sets the link status BAD as soona s link training fails
> but does not prune the modes until a new modelist is requested by the userspace.
> So this patch should re query the mode list as soon as it sees the link status
> BAD in order for the kernel to validate the modes again based on new link
> paarmeters and send a new mode list back.

Seems like a bad behaviour from the kernel, isn't it? It should return 
immediatly
if the mode is gonna be pruned :s

With the behaviour you are talking about, I should see 2 uevents when 
injecting one
BAD link-state (first uevent generates a new modeset that will then 
generate a BAD
state and another uevent, but this time the mode will have been pruned 
so when
-modesetting tries to set the mode, it will fail immediatly). During my 
testing, I do
not remember seeing such behaviour, so the kernel seemed to be doing the 
right thing
from my PoV (failing a modeset to a mode that is known not to be 
achievable). I can
verify tommorow, but it would be nice to make sure it is part of the ABI.

As for re-fetching the modes, this is something the DE will do anyway 
when asking
for them via randr. So, really, that will generate double probing in the 
common
case for what seems to be a workaround. Given that probing can be a 
super expensive
operation (request EDID from all monitors, potentially first starting up 
powered-down
GPUs such as NVIDIA or AMD), I would say that probing should not be 
taken lightly.

Isn't it possible to just return an error from the kernel if the mode 
should disapear?
As far as my testing goes, this was already what seemed to be 
happening... but I may be
wrong, especially since my DP monitor was fine with no link training 
whatsoever. What
is the current ABI for the userspace requesting a mode from a previous 
monitor to a new
one, because it did not reprobe?

In any case, this is a good discussion to have!
> Remember that it could still not prune any mode if the same mode is valid
> with lower bpp, it will still keep the mode list same and when the
> userspace retries the same mode, it will do a modeset at lower bpp (say 18bpp)
> and still succeed. (Same mode at lower bpp still better than dropping the resolution)

Yes, this is the reason why I am doing the re-set of the mode ;) 
Otherwise, we would not
need to do anything in there ;)

Martin
Navare, Manasi Feb. 3, 2017, 12:30 a.m. UTC | #12
On Fri, Feb 03, 2017 at 01:30:14AM +0200, Martin Peres wrote:
> On 01/02/17 22:05, Manasi Navare wrote:
> >On Wed, Feb 01, 2017 at 11:58:16AM -0800, Eric Anholt wrote:
> >>Jani Nikula <jani.nikula@linux.intel.com> writes:
> >>
> >>>On Tue, 31 Jan 2017, Eric Anholt <eric@anholt.net> wrote:
> >>>>Martin Peres <martin.peres@linux.intel.com> writes:
> >>>>
> >>>>>Despite all the careful planing of the kernel, a link may become
> >>>>>insufficient to handle the currently-set mode. At this point, the
> >>>>>kernel should mark this particular configuration as being broken
> >>>>>and potentially prune the mode before setting the offending connector's
> >>>>>link-status to BAD and send the userspace a hotplug event. This may
> >>>>>happen right after a modeset or later on.
> >>>>>
> >>>>>When available, we should use the link-status information to reset
> >>>>>the wanted mode.
> >>>>>
> >>>>>Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
> >>>>If I understand this right, there are two failure modes being handled:
> >>>>
> >>>>1) A mode that won't actually work because the link isn't good enough.
> >>>>
> >>>>2) A mode that should work, but link parameters were too optimistic and
> >>>>if we just ask the kernel to set the mode again it'll use more
> >>>>conservative parameters that work.
> >>>>
> >>>>This patch seems good for 2).  For 1), the drmmode_set_mode_major is
> >>>>going to set our old mode back.  Won't the modeset then fail to link
> >>>>train again, and bring us back into this loop?  The only escape that I
> >>>>see would be some other userspace responding to the advertised mode list
> >>>>changing, and then asking X to modeset to something new.
> >>>>
> >>>>To avoid that failure busy loop, should we re-fetching modes at this
> >>>>point, and only re-setting if our mode still exists?
> >>>Disclaimer: I don't know anything about the internals of the modesetting
> >>>driver.
> >>>
> >>>Perhaps we can identify the two cases now, but I'd put this more
> >>>generally: if the link status has gone bad, it's an indicator to
> >>>userspace that the circumstances may have changed, and userspace should
> >>>query the kernel for the list of available modes again. It should no
> >>>longer trust information obtained prior to getting the bad link status,
> >>>including the current mode.
> >>>
> >>>But specifically, I think you're right, and AFAICT asking for the list
> >>>of modes again is the only way for the userspace to distinguish between
> >>>the two cases. I don't think there's a shortcut for deciding the current
> >>>mode is still valid.
> >>To avoid the busy-loop problem, I think I'd like this patch to re-query
> >>the kernel to ask about the current mode list, and only try to re-set
> >>the mode if our mode is still there.
> >>
> >>If the mode isn't there, then it's up to the DE to take action in
> >>response to the notification of new modes.  If you don't have a DE to
> >>take appropriate action, you're kind of out of luck.
> >>
> >>As far as the ABI goes, this seems fine to me.  The only concern I had
> >>about ABI was having to walk all the connectors on every uevent to see
> >>if any had gone bad -- couldn't we have a flag of some sort about what
> >>the uevent indicates?  But uevents should be super rare, so I'd say the
> >>kernel could go ahead with the current plan.
> >Yes I agree. The kernel sets the link status BAD as soona s link training fails
> >but does not prune the modes until a new modelist is requested by the userspace.
> >So this patch should re query the mode list as soon as it sees the link status
> >BAD in order for the kernel to validate the modes again based on new link
> >paarmeters and send a new mode list back.
> 
> Seems like a bad behaviour from the kernel, isn't it? It should
> return immediatly
> if the mode is gonna be pruned :s

The kernel only knows that the link was bad because link training failed
so it sets the link status as BAD and sends a uevent expecting userspace
to take action. It will not prune the modes unless drm_helper_probe_single_connector_modes
is called by the userspace which would happen only when drmModeGetConnector call
is initiated by userspace. 

So in your function too to read the link status
you should do a drmModeGetConnector() that will probe the connector modes and 
validate and prune necessary modes, then if the link status is BAD handle it by two ways:

1. Modeset on the existing mode which will fail if the current mode was pruned already
2. If step 1 fails, then fetch the modes and this will be an updated mode list and
modeset on the first mode in the list.

This is how SNA driver implements this.

Danvet/Jani, please correct me if I am wrong and suggest if pruning should
be done by kernel instead before sending a uevent on link status BAD.

Regards
Manasi




> 
> With the behaviour you are talking about, I should see 2 uevents
> when injecting one
> BAD link-state (first uevent generates a new modeset that will then
> generate a BAD
> state and another uevent, but this time the mode will have been
> pruned so when
> -modesetting tries to set the mode, it will fail immediatly). During
> my testing, I do
> not remember seeing such behaviour, so the kernel seemed to be doing
> the right thing
> from my PoV (failing a modeset to a mode that is known not to be
> achievable). I can
> verify tommorow, but it would be nice to make sure it is part of the ABI.
> 
> As for re-fetching the modes, this is something the DE will do
> anyway when asking
> for them via randr. So, really, that will generate double probing in
> the common
> case for what seems to be a workaround. Given that probing can be a
> super expensive
> operation (request EDID from all monitors, potentially first
> starting up powered-down
> GPUs such as NVIDIA or AMD), I would say that probing should not be
> taken lightly.
> 
> Isn't it possible to just return an error from the kernel if the
> mode should disapear?
> As far as my testing goes, this was already what seemed to be
> happening... but I may be
> wrong, especially since my DP monitor was fine with no link training
> whatsoever. What
> is the current ABI for the userspace requesting a mode from a
> previous monitor to a new
> one, because it did not reprobe?
> 
> In any case, this is a good discussion to have!
> >Remember that it could still not prune any mode if the same mode is valid
> >with lower bpp, it will still keep the mode list same and when the
> >userspace retries the same mode, it will do a modeset at lower bpp (say 18bpp)
> >and still succeed. (Same mode at lower bpp still better than dropping the resolution)
> 
> Yes, this is the reason why I am doing the re-set of the mode ;)
> Otherwise, we would not
> need to do anything in there ;)
> 
> Martin
>
Daniel Vetter Feb. 3, 2017, 8:04 a.m. UTC | #13
On Fri, Feb 03, 2017 at 01:30:14AM +0200, Martin Peres wrote:
> On 01/02/17 22:05, Manasi Navare wrote:
> > On Wed, Feb 01, 2017 at 11:58:16AM -0800, Eric Anholt wrote:
> > > Jani Nikula <jani.nikula@linux.intel.com> writes:
> > > 
> > > > On Tue, 31 Jan 2017, Eric Anholt <eric@anholt.net> wrote:
> > > > > Martin Peres <martin.peres@linux.intel.com> writes:
> > > > > 
> > > > > > Despite all the careful planing of the kernel, a link may become
> > > > > > insufficient to handle the currently-set mode. At this point, the
> > > > > > kernel should mark this particular configuration as being broken
> > > > > > and potentially prune the mode before setting the offending connector's
> > > > > > link-status to BAD and send the userspace a hotplug event. This may
> > > > > > happen right after a modeset or later on.
> > > > > > 
> > > > > > When available, we should use the link-status information to reset
> > > > > > the wanted mode.
> > > > > > 
> > > > > > Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
> > > > > If I understand this right, there are two failure modes being handled:
> > > > > 
> > > > > 1) A mode that won't actually work because the link isn't good enough.
> > > > > 
> > > > > 2) A mode that should work, but link parameters were too optimistic and
> > > > > if we just ask the kernel to set the mode again it'll use more
> > > > > conservative parameters that work.
> > > > > 
> > > > > This patch seems good for 2).  For 1), the drmmode_set_mode_major is
> > > > > going to set our old mode back.  Won't the modeset then fail to link
> > > > > train again, and bring us back into this loop?  The only escape that I
> > > > > see would be some other userspace responding to the advertised mode list
> > > > > changing, and then asking X to modeset to something new.
> > > > > 
> > > > > To avoid that failure busy loop, should we re-fetching modes at this
> > > > > point, and only re-setting if our mode still exists?
> > > > Disclaimer: I don't know anything about the internals of the modesetting
> > > > driver.
> > > > 
> > > > Perhaps we can identify the two cases now, but I'd put this more
> > > > generally: if the link status has gone bad, it's an indicator to
> > > > userspace that the circumstances may have changed, and userspace should
> > > > query the kernel for the list of available modes again. It should no
> > > > longer trust information obtained prior to getting the bad link status,
> > > > including the current mode.
> > > > 
> > > > But specifically, I think you're right, and AFAICT asking for the list
> > > > of modes again is the only way for the userspace to distinguish between
> > > > the two cases. I don't think there's a shortcut for deciding the current
> > > > mode is still valid.
> > > To avoid the busy-loop problem, I think I'd like this patch to re-query
> > > the kernel to ask about the current mode list, and only try to re-set
> > > the mode if our mode is still there.
> > > 
> > > If the mode isn't there, then it's up to the DE to take action in
> > > response to the notification of new modes.  If you don't have a DE to
> > > take appropriate action, you're kind of out of luck.
> > > 
> > > As far as the ABI goes, this seems fine to me.  The only concern I had
> > > about ABI was having to walk all the connectors on every uevent to see
> > > if any had gone bad -- couldn't we have a flag of some sort about what
> > > the uevent indicates?  But uevents should be super rare, so I'd say the
> > > kernel could go ahead with the current plan.
> > Yes I agree. The kernel sets the link status BAD as soona s link training fails
> > but does not prune the modes until a new modelist is requested by the userspace.
> > So this patch should re query the mode list as soon as it sees the link status
> > BAD in order for the kernel to validate the modes again based on new link
> > paarmeters and send a new mode list back.
> 
> Seems like a bad behaviour from the kernel, isn't it? It should return
> immediatly
> if the mode is gonna be pruned :s

The mode list pruning isn't relevant for modeesets, the kernel doesn't
validate requested modes against that. The mode list is purely for
userspace's information. Which means updating it only when userspace asks
for it is fine.

I also thought some more about the loop when we're stuck on BAD, and I
think it shouldn't happen: When the link goes BAD we update the link
parameter values to the new limits, and the kernel will reject any mode
that won't fit anymore. Of course you might be unlucky, and the new link
limits are also not working, but eventually you're stuck at the "you're
link is broken, sry" stage, where the kernel rejects everything :-)

So I think the busy-loop has strictly a limited amount of time until it
runs out of steam.
-Daniel
Martin Peres Feb. 6, 2017, 3:50 p.m. UTC | #14
On 03/02/17 10:04, Daniel Vetter wrote:
> On Fri, Feb 03, 2017 at 01:30:14AM +0200, Martin Peres wrote:
>> On 01/02/17 22:05, Manasi Navare wrote:
>>> On Wed, Feb 01, 2017 at 11:58:16AM -0800, Eric Anholt wrote:
>>>> Jani Nikula <jani.nikula@linux.intel.com> writes:
>>>>
>>>>> On Tue, 31 Jan 2017, Eric Anholt <eric@anholt.net> wrote:
>>>>>> Martin Peres <martin.peres@linux.intel.com> writes:
>>>>>>
>>>>>>> Despite all the careful planing of the kernel, a link may become
>>>>>>> insufficient to handle the currently-set mode. At this point, the
>>>>>>> kernel should mark this particular configuration as being broken
>>>>>>> and potentially prune the mode before setting the offending connector's
>>>>>>> link-status to BAD and send the userspace a hotplug event. This may
>>>>>>> happen right after a modeset or later on.
>>>>>>>
>>>>>>> When available, we should use the link-status information to reset
>>>>>>> the wanted mode.
>>>>>>>
>>>>>>> Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
>>>>>> If I understand this right, there are two failure modes being handled:
>>>>>>
>>>>>> 1) A mode that won't actually work because the link isn't good enough.
>>>>>>
>>>>>> 2) A mode that should work, but link parameters were too optimistic and
>>>>>> if we just ask the kernel to set the mode again it'll use more
>>>>>> conservative parameters that work.
>>>>>>
>>>>>> This patch seems good for 2).  For 1), the drmmode_set_mode_major is
>>>>>> going to set our old mode back.  Won't the modeset then fail to link
>>>>>> train again, and bring us back into this loop?  The only escape that I
>>>>>> see would be some other userspace responding to the advertised mode list
>>>>>> changing, and then asking X to modeset to something new.
>>>>>>
>>>>>> To avoid that failure busy loop, should we re-fetching modes at this
>>>>>> point, and only re-setting if our mode still exists?
>>>>> Disclaimer: I don't know anything about the internals of the modesetting
>>>>> driver.
>>>>>
>>>>> Perhaps we can identify the two cases now, but I'd put this more
>>>>> generally: if the link status has gone bad, it's an indicator to
>>>>> userspace that the circumstances may have changed, and userspace should
>>>>> query the kernel for the list of available modes again. It should no
>>>>> longer trust information obtained prior to getting the bad link status,
>>>>> including the current mode.
>>>>>
>>>>> But specifically, I think you're right, and AFAICT asking for the list
>>>>> of modes again is the only way for the userspace to distinguish between
>>>>> the two cases. I don't think there's a shortcut for deciding the current
>>>>> mode is still valid.
>>>> To avoid the busy-loop problem, I think I'd like this patch to re-query
>>>> the kernel to ask about the current mode list, and only try to re-set
>>>> the mode if our mode is still there.
>>>>
>>>> If the mode isn't there, then it's up to the DE to take action in
>>>> response to the notification of new modes.  If you don't have a DE to
>>>> take appropriate action, you're kind of out of luck.
>>>>
>>>> As far as the ABI goes, this seems fine to me.  The only concern I had
>>>> about ABI was having to walk all the connectors on every uevent to see
>>>> if any had gone bad -- couldn't we have a flag of some sort about what
>>>> the uevent indicates?  But uevents should be super rare, so I'd say the
>>>> kernel could go ahead with the current plan.
>>> Yes I agree. The kernel sets the link status BAD as soona s link training fails
>>> but does not prune the modes until a new modelist is requested by the userspace.
>>> So this patch should re query the mode list as soon as it sees the link status
>>> BAD in order for the kernel to validate the modes again based on new link
>>> paarmeters and send a new mode list back.
>> Seems like a bad behaviour from the kernel, isn't it? It should return
>> immediatly
>> if the mode is gonna be pruned :s
> The mode list pruning isn't relevant for modeesets, the kernel doesn't
> validate requested modes against that. The mode list is purely for
> userspace's information. Which means updating it only when userspace asks
> for it is fine.

Hmm, ok. Fair enough!

> I also thought some more about the loop when we're stuck on BAD, and I
> think it shouldn't happen: When the link goes BAD we update the link
> parameter values to the new limits, and the kernel will reject any mode
> that won't fit anymore. Of course you might be unlucky, and the new link
> limits are also not working, but eventually you're stuck at the "you're
> link is broken, sry" stage, where the kernel rejects everything :-)
>
> So I think the busy-loop has strictly a limited amount of time until it
> runs out of steam.

OK, I have given it more thoughts and discussed with Ville and Chris IRL or
on IRC.

My current proposal is based on the idea that the kernel should reject a 
mode
it knows it cannot set. This is already largely tested in real life: Try
setting 4K@60Hz on an HDMI 1.x monitor, it should immediately fail on
prepare(). For this proposal to work, we would need to put in the ABI that a
driver that sets the link-status to BAD should also make sure that whatever
the userspace does, no infinite loop should be created (by changing the
maximum link characteristics before setting the link-status property).

Re-probing the list of modes and checking if the mode is still in there is
inherently racy, as a new screen may be plugged between the moment the list
is sent to the userspace and the moment when we try setting the mode. Or the
DE may also have changed the mode in the mean time and the kernel would
have reduced the limits even more. So, the userspace cannot be expected to
always do the right thing here :s.

 From this point of view, I really do not like the idea of re-probing, since
it increases the delay before the DE can handle the change and it does not
bring any guarantee of working properly.

Did I miss anything? Any opinions?

Cheers,
Martin
Martin Peres Feb. 8, 2017, 4:37 p.m. UTC | #15
On 06/02/17 17:50, Martin Peres wrote:
> On 03/02/17 10:04, Daniel Vetter wrote:
>> On Fri, Feb 03, 2017 at 01:30:14AM +0200, Martin Peres wrote:
>>> On 01/02/17 22:05, Manasi Navare wrote:
>>>> On Wed, Feb 01, 2017 at 11:58:16AM -0800, Eric Anholt wrote:
>>>>> Jani Nikula <jani.nikula@linux.intel.com> writes:
>>>>>
>>>>>> On Tue, 31 Jan 2017, Eric Anholt <eric@anholt.net> wrote:
>>>>>>> Martin Peres <martin.peres@linux.intel.com> writes:
>>>>>>>
>>>>>>>> Despite all the careful planing of the kernel, a link may become
>>>>>>>> insufficient to handle the currently-set mode. At this point, the
>>>>>>>> kernel should mark this particular configuration as being broken
>>>>>>>> and potentially prune the mode before setting the offending
>>>>>>>> connector's
>>>>>>>> link-status to BAD and send the userspace a hotplug event. This may
>>>>>>>> happen right after a modeset or later on.
>>>>>>>>
>>>>>>>> When available, we should use the link-status information to reset
>>>>>>>> the wanted mode.
>>>>>>>>
>>>>>>>> Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
>>>>>>> If I understand this right, there are two failure modes being
>>>>>>> handled:
>>>>>>>
>>>>>>> 1) A mode that won't actually work because the link isn't good
>>>>>>> enough.
>>>>>>>
>>>>>>> 2) A mode that should work, but link parameters were too
>>>>>>> optimistic and
>>>>>>> if we just ask the kernel to set the mode again it'll use more
>>>>>>> conservative parameters that work.
>>>>>>>
>>>>>>> This patch seems good for 2).  For 1), the drmmode_set_mode_major is
>>>>>>> going to set our old mode back.  Won't the modeset then fail to link
>>>>>>> train again, and bring us back into this loop?  The only escape
>>>>>>> that I
>>>>>>> see would be some other userspace responding to the advertised
>>>>>>> mode list
>>>>>>> changing, and then asking X to modeset to something new.
>>>>>>>
>>>>>>> To avoid that failure busy loop, should we re-fetching modes at this
>>>>>>> point, and only re-setting if our mode still exists?
>>>>>> Disclaimer: I don't know anything about the internals of the
>>>>>> modesetting
>>>>>> driver.
>>>>>>
>>>>>> Perhaps we can identify the two cases now, but I'd put this more
>>>>>> generally: if the link status has gone bad, it's an indicator to
>>>>>> userspace that the circumstances may have changed, and userspace
>>>>>> should
>>>>>> query the kernel for the list of available modes again. It should no
>>>>>> longer trust information obtained prior to getting the bad link
>>>>>> status,
>>>>>> including the current mode.
>>>>>>
>>>>>> But specifically, I think you're right, and AFAICT asking for the
>>>>>> list
>>>>>> of modes again is the only way for the userspace to distinguish
>>>>>> between
>>>>>> the two cases. I don't think there's a shortcut for deciding the
>>>>>> current
>>>>>> mode is still valid.
>>>>> To avoid the busy-loop problem, I think I'd like this patch to
>>>>> re-query
>>>>> the kernel to ask about the current mode list, and only try to re-set
>>>>> the mode if our mode is still there.
>>>>>
>>>>> If the mode isn't there, then it's up to the DE to take action in
>>>>> response to the notification of new modes.  If you don't have a DE to
>>>>> take appropriate action, you're kind of out of luck.
>>>>>
>>>>> As far as the ABI goes, this seems fine to me.  The only concern I had
>>>>> about ABI was having to walk all the connectors on every uevent to see
>>>>> if any had gone bad -- couldn't we have a flag of some sort about what
>>>>> the uevent indicates?  But uevents should be super rare, so I'd say
>>>>> the
>>>>> kernel could go ahead with the current plan.
>>>> Yes I agree. The kernel sets the link status BAD as soona s link
>>>> training fails
>>>> but does not prune the modes until a new modelist is requested by
>>>> the userspace.
>>>> So this patch should re query the mode list as soon as it sees the
>>>> link status
>>>> BAD in order for the kernel to validate the modes again based on new
>>>> link
>>>> paarmeters and send a new mode list back.
>>> Seems like a bad behaviour from the kernel, isn't it? It should return
>>> immediatly
>>> if the mode is gonna be pruned :s
>> The mode list pruning isn't relevant for modeesets, the kernel doesn't
>> validate requested modes against that. The mode list is purely for
>> userspace's information. Which means updating it only when userspace asks
>> for it is fine.
>
> Hmm, ok. Fair enough!
>
>> I also thought some more about the loop when we're stuck on BAD, and I
>> think it shouldn't happen: When the link goes BAD we update the link
>> parameter values to the new limits, and the kernel will reject any mode
>> that won't fit anymore. Of course you might be unlucky, and the new link
>> limits are also not working, but eventually you're stuck at the "you're
>> link is broken, sry" stage, where the kernel rejects everything :-)
>>
>> So I think the busy-loop has strictly a limited amount of time until it
>> runs out of steam.
>
> OK, I have given it more thoughts and discussed with Ville and Chris IRL or
> on IRC.
>
> My current proposal is based on the idea that the kernel should reject a
> mode
> it knows it cannot set. This is already largely tested in real life: Try
> setting 4K@60Hz on an HDMI 1.x monitor, it should immediately fail on
> prepare(). For this proposal to work, we would need to put in the ABI
> that a
> driver that sets the link-status to BAD should also make sure that whatever
> the userspace does, no infinite loop should be created (by changing the
> maximum link characteristics before setting the link-status property).
>
> Re-probing the list of modes and checking if the mode is still in there is
> inherently racy, as a new screen may be plugged between the moment the list
> is sent to the userspace and the moment when we try setting the mode. Or
> the
> DE may also have changed the mode in the mean time and the kernel would
> have reduced the limits even more. So, the userspace cannot be expected to
> always do the right thing here :s.
>
> From this point of view, I really do not like the idea of re-probing, since
> it increases the delay before the DE can handle the change and it does not
> bring any guarantee of working properly.
>
> Did I miss anything? Any opinions?

Any comments on this, Eric? Does it sound logical to you or did I miss 
something?

The kernel patches are stuck until this patch gets in. So far, you look 
like the only person who would be willing to review this patch (Ajax 
acked the patch, but that's the extent he was willing to go).

Feel free to suggest more potential reviewers if you don't want to 
commit to a R-b.

Cheers,
Martin
Eric Anholt Feb. 13, 2017, 9:05 p.m. UTC | #16
Martin Peres <martin.peres@linux.intel.com> writes:

> On 06/02/17 17:50, Martin Peres wrote:
>> On 03/02/17 10:04, Daniel Vetter wrote:
>>> On Fri, Feb 03, 2017 at 01:30:14AM +0200, Martin Peres wrote:
>>>> On 01/02/17 22:05, Manasi Navare wrote:
>>>>> On Wed, Feb 01, 2017 at 11:58:16AM -0800, Eric Anholt wrote:
>>>>>> Jani Nikula <jani.nikula@linux.intel.com> writes:
>>>>>>
>>>>>>> On Tue, 31 Jan 2017, Eric Anholt <eric@anholt.net> wrote:
>>>>>>>> Martin Peres <martin.peres@linux.intel.com> writes:
>>>>>>>>
>>>>>>>>> Despite all the careful planing of the kernel, a link may become
>>>>>>>>> insufficient to handle the currently-set mode. At this point, the
>>>>>>>>> kernel should mark this particular configuration as being broken
>>>>>>>>> and potentially prune the mode before setting the offending
>>>>>>>>> connector's
>>>>>>>>> link-status to BAD and send the userspace a hotplug event. This may
>>>>>>>>> happen right after a modeset or later on.
>>>>>>>>>
>>>>>>>>> When available, we should use the link-status information to reset
>>>>>>>>> the wanted mode.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
>>>>>>>> If I understand this right, there are two failure modes being
>>>>>>>> handled:
>>>>>>>>
>>>>>>>> 1) A mode that won't actually work because the link isn't good
>>>>>>>> enough.
>>>>>>>>
>>>>>>>> 2) A mode that should work, but link parameters were too
>>>>>>>> optimistic and
>>>>>>>> if we just ask the kernel to set the mode again it'll use more
>>>>>>>> conservative parameters that work.
>>>>>>>>
>>>>>>>> This patch seems good for 2).  For 1), the drmmode_set_mode_major is
>>>>>>>> going to set our old mode back.  Won't the modeset then fail to link
>>>>>>>> train again, and bring us back into this loop?  The only escape
>>>>>>>> that I
>>>>>>>> see would be some other userspace responding to the advertised
>>>>>>>> mode list
>>>>>>>> changing, and then asking X to modeset to something new.
>>>>>>>>
>>>>>>>> To avoid that failure busy loop, should we re-fetching modes at this
>>>>>>>> point, and only re-setting if our mode still exists?
>>>>>>> Disclaimer: I don't know anything about the internals of the
>>>>>>> modesetting
>>>>>>> driver.
>>>>>>>
>>>>>>> Perhaps we can identify the two cases now, but I'd put this more
>>>>>>> generally: if the link status has gone bad, it's an indicator to
>>>>>>> userspace that the circumstances may have changed, and userspace
>>>>>>> should
>>>>>>> query the kernel for the list of available modes again. It should no
>>>>>>> longer trust information obtained prior to getting the bad link
>>>>>>> status,
>>>>>>> including the current mode.
>>>>>>>
>>>>>>> But specifically, I think you're right, and AFAICT asking for the
>>>>>>> list
>>>>>>> of modes again is the only way for the userspace to distinguish
>>>>>>> between
>>>>>>> the two cases. I don't think there's a shortcut for deciding the
>>>>>>> current
>>>>>>> mode is still valid.
>>>>>> To avoid the busy-loop problem, I think I'd like this patch to
>>>>>> re-query
>>>>>> the kernel to ask about the current mode list, and only try to re-set
>>>>>> the mode if our mode is still there.
>>>>>>
>>>>>> If the mode isn't there, then it's up to the DE to take action in
>>>>>> response to the notification of new modes.  If you don't have a DE to
>>>>>> take appropriate action, you're kind of out of luck.
>>>>>>
>>>>>> As far as the ABI goes, this seems fine to me.  The only concern I had
>>>>>> about ABI was having to walk all the connectors on every uevent to see
>>>>>> if any had gone bad -- couldn't we have a flag of some sort about what
>>>>>> the uevent indicates?  But uevents should be super rare, so I'd say
>>>>>> the
>>>>>> kernel could go ahead with the current plan.
>>>>> Yes I agree. The kernel sets the link status BAD as soona s link
>>>>> training fails
>>>>> but does not prune the modes until a new modelist is requested by
>>>>> the userspace.
>>>>> So this patch should re query the mode list as soon as it sees the
>>>>> link status
>>>>> BAD in order for the kernel to validate the modes again based on new
>>>>> link
>>>>> paarmeters and send a new mode list back.
>>>> Seems like a bad behaviour from the kernel, isn't it? It should return
>>>> immediatly
>>>> if the mode is gonna be pruned :s
>>> The mode list pruning isn't relevant for modeesets, the kernel doesn't
>>> validate requested modes against that. The mode list is purely for
>>> userspace's information. Which means updating it only when userspace asks
>>> for it is fine.
>>
>> Hmm, ok. Fair enough!
>>
>>> I also thought some more about the loop when we're stuck on BAD, and I
>>> think it shouldn't happen: When the link goes BAD we update the link
>>> parameter values to the new limits, and the kernel will reject any mode
>>> that won't fit anymore. Of course you might be unlucky, and the new link
>>> limits are also not working, but eventually you're stuck at the "you're
>>> link is broken, sry" stage, where the kernel rejects everything :-)
>>>
>>> So I think the busy-loop has strictly a limited amount of time until it
>>> runs out of steam.
>>
>> OK, I have given it more thoughts and discussed with Ville and Chris IRL or
>> on IRC.
>>
>> My current proposal is based on the idea that the kernel should reject a
>> mode
>> it knows it cannot set. This is already largely tested in real life: Try
>> setting 4K@60Hz on an HDMI 1.x monitor, it should immediately fail on
>> prepare(). For this proposal to work, we would need to put in the ABI
>> that a
>> driver that sets the link-status to BAD should also make sure that whatever
>> the userspace does, no infinite loop should be created (by changing the
>> maximum link characteristics before setting the link-status property).
>>
>> Re-probing the list of modes and checking if the mode is still in there is
>> inherently racy, as a new screen may be plugged between the moment the list
>> is sent to the userspace and the moment when we try setting the mode. Or
>> the
>> DE may also have changed the mode in the mean time and the kernel would
>> have reduced the limits even more. So, the userspace cannot be expected to
>> always do the right thing here :s.
>>
>> From this point of view, I really do not like the idea of re-probing, since
>> it increases the delay before the DE can handle the change and it does not
>> bring any guarantee of working properly.
>>
>> Did I miss anything? Any opinions?
>
> Any comments on this, Eric? Does it sound logical to you or did I miss 
> something?
>
> The kernel patches are stuck until this patch gets in. So far, you look 
> like the only person who would be willing to review this patch (Ajax 
> acked the patch, but that's the extent he was willing to go).

I was just trying to provide review to get the kernel unstuck.  The
kernel should not be blocked until the patch gets lands (this obviously
isn't the case with ioctls, which *don't* land in userspace until kernel
does), you just need userspace published and generally agreed that the
ABI works.
Navare, Manasi Feb. 13, 2017, 11:14 p.m. UTC | #17
On Mon, Feb 13, 2017 at 01:05:17PM -0800, Eric Anholt wrote:
> Martin Peres <martin.peres@linux.intel.com> writes:
> 
> > On 06/02/17 17:50, Martin Peres wrote:
> >> On 03/02/17 10:04, Daniel Vetter wrote:
> >>> On Fri, Feb 03, 2017 at 01:30:14AM +0200, Martin Peres wrote:
> >>>> On 01/02/17 22:05, Manasi Navare wrote:
> >>>>> On Wed, Feb 01, 2017 at 11:58:16AM -0800, Eric Anholt wrote:
> >>>>>> Jani Nikula <jani.nikula@linux.intel.com> writes:
> >>>>>>
> >>>>>>> On Tue, 31 Jan 2017, Eric Anholt <eric@anholt.net> wrote:
> >>>>>>>> Martin Peres <martin.peres@linux.intel.com> writes:
> >>>>>>>>
> >>>>>>>>> Despite all the careful planing of the kernel, a link may become
> >>>>>>>>> insufficient to handle the currently-set mode. At this point, the
> >>>>>>>>> kernel should mark this particular configuration as being broken
> >>>>>>>>> and potentially prune the mode before setting the offending
> >>>>>>>>> connector's
> >>>>>>>>> link-status to BAD and send the userspace a hotplug event. This may
> >>>>>>>>> happen right after a modeset or later on.
> >>>>>>>>>
> >>>>>>>>> When available, we should use the link-status information to reset
> >>>>>>>>> the wanted mode.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
> >>>>>>>> If I understand this right, there are two failure modes being
> >>>>>>>> handled:
> >>>>>>>>
> >>>>>>>> 1) A mode that won't actually work because the link isn't good
> >>>>>>>> enough.
> >>>>>>>>
> >>>>>>>> 2) A mode that should work, but link parameters were too
> >>>>>>>> optimistic and
> >>>>>>>> if we just ask the kernel to set the mode again it'll use more
> >>>>>>>> conservative parameters that work.
> >>>>>>>>
> >>>>>>>> This patch seems good for 2).  For 1), the drmmode_set_mode_major is
> >>>>>>>> going to set our old mode back.  Won't the modeset then fail to link
> >>>>>>>> train again, and bring us back into this loop?  The only escape
> >>>>>>>> that I
> >>>>>>>> see would be some other userspace responding to the advertised
> >>>>>>>> mode list
> >>>>>>>> changing, and then asking X to modeset to something new.
> >>>>>>>>
> >>>>>>>> To avoid that failure busy loop, should we re-fetching modes at this
> >>>>>>>> point, and only re-setting if our mode still exists?
> >>>>>>> Disclaimer: I don't know anything about the internals of the
> >>>>>>> modesetting
> >>>>>>> driver.
> >>>>>>>
> >>>>>>> Perhaps we can identify the two cases now, but I'd put this more
> >>>>>>> generally: if the link status has gone bad, it's an indicator to
> >>>>>>> userspace that the circumstances may have changed, and userspace
> >>>>>>> should
> >>>>>>> query the kernel for the list of available modes again. It should no
> >>>>>>> longer trust information obtained prior to getting the bad link
> >>>>>>> status,
> >>>>>>> including the current mode.
> >>>>>>>
> >>>>>>> But specifically, I think you're right, and AFAICT asking for the
> >>>>>>> list
> >>>>>>> of modes again is the only way for the userspace to distinguish
> >>>>>>> between
> >>>>>>> the two cases. I don't think there's a shortcut for deciding the
> >>>>>>> current
> >>>>>>> mode is still valid.
> >>>>>> To avoid the busy-loop problem, I think I'd like this patch to
> >>>>>> re-query
> >>>>>> the kernel to ask about the current mode list, and only try to re-set
> >>>>>> the mode if our mode is still there.
> >>>>>>
> >>>>>> If the mode isn't there, then it's up to the DE to take action in
> >>>>>> response to the notification of new modes.  If you don't have a DE to
> >>>>>> take appropriate action, you're kind of out of luck.
> >>>>>>
> >>>>>> As far as the ABI goes, this seems fine to me.  The only concern I had
> >>>>>> about ABI was having to walk all the connectors on every uevent to see
> >>>>>> if any had gone bad -- couldn't we have a flag of some sort about what
> >>>>>> the uevent indicates?  But uevents should be super rare, so I'd say
> >>>>>> the
> >>>>>> kernel could go ahead with the current plan.
> >>>>> Yes I agree. The kernel sets the link status BAD as soona s link
> >>>>> training fails
> >>>>> but does not prune the modes until a new modelist is requested by
> >>>>> the userspace.
> >>>>> So this patch should re query the mode list as soon as it sees the
> >>>>> link status
> >>>>> BAD in order for the kernel to validate the modes again based on new
> >>>>> link
> >>>>> paarmeters and send a new mode list back.
> >>>> Seems like a bad behaviour from the kernel, isn't it? It should return
> >>>> immediatly
> >>>> if the mode is gonna be pruned :s
> >>> The mode list pruning isn't relevant for modeesets, the kernel doesn't
> >>> validate requested modes against that. The mode list is purely for
> >>> userspace's information. Which means updating it only when userspace asks
> >>> for it is fine.
> >>
> >> Hmm, ok. Fair enough!
> >>
> >>> I also thought some more about the loop when we're stuck on BAD, and I
> >>> think it shouldn't happen: When the link goes BAD we update the link
> >>> parameter values to the new limits, and the kernel will reject any mode
> >>> that won't fit anymore. Of course you might be unlucky, and the new link
> >>> limits are also not working, but eventually you're stuck at the "you're
> >>> link is broken, sry" stage, where the kernel rejects everything :-)
> >>>
> >>> So I think the busy-loop has strictly a limited amount of time until it
> >>> runs out of steam.
> >>
> >> OK, I have given it more thoughts and discussed with Ville and Chris IRL or
> >> on IRC.
> >>
> >> My current proposal is based on the idea that the kernel should reject a
> >> mode
> >> it knows it cannot set. This is already largely tested in real life: Try
> >> setting 4K@60Hz on an HDMI 1.x monitor, it should immediately fail on
> >> prepare(). For this proposal to work, we would need to put in the ABI
> >> that a
> >> driver that sets the link-status to BAD should also make sure that whatever
> >> the userspace does, no infinite loop should be created (by changing the
> >> maximum link characteristics before setting the link-status property).
> >>
> >> Re-probing the list of modes and checking if the mode is still in there is
> >> inherently racy, as a new screen may be plugged between the moment the list
> >> is sent to the userspace and the moment when we try setting the mode. Or
> >> the
> >> DE may also have changed the mode in the mean time and the kernel would
> >> have reduced the limits even more. So, the userspace cannot be expected to
> >> always do the right thing here :s.
> >>
> >> From this point of view, I really do not like the idea of re-probing, since
> >> it increases the delay before the DE can handle the change and it does not
> >> bring any guarantee of working properly.
> >>
> >> Did I miss anything? Any opinions?
> >
> > Any comments on this, Eric? Does it sound logical to you or did I miss 
> > something?
> >
> > The kernel patches are stuck until this patch gets in. So far, you look 
> > like the only person who would be willing to review this patch (Ajax 
> > acked the patch, but that's the extent he was willing to go).
> 
> I was just trying to provide review to get the kernel unstuck.  The
> kernel should not be blocked until the patch gets lands (this obviously
> isn't the case with ioctls, which *don't* land in userspace until kernel
> does), you just need userspace published and generally agreed that the
> ABI works.

Thanks Eric. I agree so if we have a consensus from you and Ajax on this
design, can we move forward with getting the kernel patches merged?

We already have consensus from kernel folks as well as Ajax, 
Do we have your ACK on this?

Regards
Manasi
Martin Peres Feb. 16, 2017, 7:56 a.m. UTC | #18
On 13/02/17 23:05, Eric Anholt wrote:
> I was just trying to provide review to get the kernel unstuck.  The
> kernel should not be blocked until the patch gets lands (this obviously
> isn't the case with ioctls, which *don't* land in userspace until kernel
> does), you just need userspace published and generally agreed that the
> ABI works.
>

Yeah, I found it a bit odd too that we would get such requirement.

Anyway, thanks for taking the time, we will take this as an ACK from 
your side, which should be enough to merge the patch in the kernel.

Thanks!
Martin
Navare, Manasi Feb. 24, 2017, 8:09 p.m. UTC | #19
Hi Daniel,

We have ACKs on the userspace design from both Adams and Eric.
Is this enough to merge the kernel patches?

I spoke to Eric briefly about this and he gave me a verbal r-b as well.
He said the userspace patches cant get merged unless DRM patches are merged.

So what should be some of our next steps here?

Regards
Manasi


On Mon, Feb 13, 2017 at 01:05:17PM -0800, Eric Anholt wrote:
> Martin Peres <martin.peres@linux.intel.com> writes:
> 
> > On 06/02/17 17:50, Martin Peres wrote:
> >> On 03/02/17 10:04, Daniel Vetter wrote:
> >>> On Fri, Feb 03, 2017 at 01:30:14AM +0200, Martin Peres wrote:
> >>>> On 01/02/17 22:05, Manasi Navare wrote:
> >>>>> On Wed, Feb 01, 2017 at 11:58:16AM -0800, Eric Anholt wrote:
> >>>>>> Jani Nikula <jani.nikula@linux.intel.com> writes:
> >>>>>>
> >>>>>>> On Tue, 31 Jan 2017, Eric Anholt <eric@anholt.net> wrote:
> >>>>>>>> Martin Peres <martin.peres@linux.intel.com> writes:
> >>>>>>>>
> >>>>>>>>> Despite all the careful planing of the kernel, a link may become
> >>>>>>>>> insufficient to handle the currently-set mode. At this point, the
> >>>>>>>>> kernel should mark this particular configuration as being broken
> >>>>>>>>> and potentially prune the mode before setting the offending
> >>>>>>>>> connector's
> >>>>>>>>> link-status to BAD and send the userspace a hotplug event. This may
> >>>>>>>>> happen right after a modeset or later on.
> >>>>>>>>>
> >>>>>>>>> When available, we should use the link-status information to reset
> >>>>>>>>> the wanted mode.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
> >>>>>>>> If I understand this right, there are two failure modes being
> >>>>>>>> handled:
> >>>>>>>>
> >>>>>>>> 1) A mode that won't actually work because the link isn't good
> >>>>>>>> enough.
> >>>>>>>>
> >>>>>>>> 2) A mode that should work, but link parameters were too
> >>>>>>>> optimistic and
> >>>>>>>> if we just ask the kernel to set the mode again it'll use more
> >>>>>>>> conservative parameters that work.
> >>>>>>>>
> >>>>>>>> This patch seems good for 2).  For 1), the drmmode_set_mode_major is
> >>>>>>>> going to set our old mode back.  Won't the modeset then fail to link
> >>>>>>>> train again, and bring us back into this loop?  The only escape
> >>>>>>>> that I
> >>>>>>>> see would be some other userspace responding to the advertised
> >>>>>>>> mode list
> >>>>>>>> changing, and then asking X to modeset to something new.
> >>>>>>>>
> >>>>>>>> To avoid that failure busy loop, should we re-fetching modes at this
> >>>>>>>> point, and only re-setting if our mode still exists?
> >>>>>>> Disclaimer: I don't know anything about the internals of the
> >>>>>>> modesetting
> >>>>>>> driver.
> >>>>>>>
> >>>>>>> Perhaps we can identify the two cases now, but I'd put this more
> >>>>>>> generally: if the link status has gone bad, it's an indicator to
> >>>>>>> userspace that the circumstances may have changed, and userspace
> >>>>>>> should
> >>>>>>> query the kernel for the list of available modes again. It should no
> >>>>>>> longer trust information obtained prior to getting the bad link
> >>>>>>> status,
> >>>>>>> including the current mode.
> >>>>>>>
> >>>>>>> But specifically, I think you're right, and AFAICT asking for the
> >>>>>>> list
> >>>>>>> of modes again is the only way for the userspace to distinguish
> >>>>>>> between
> >>>>>>> the two cases. I don't think there's a shortcut for deciding the
> >>>>>>> current
> >>>>>>> mode is still valid.
> >>>>>> To avoid the busy-loop problem, I think I'd like this patch to
> >>>>>> re-query
> >>>>>> the kernel to ask about the current mode list, and only try to re-set
> >>>>>> the mode if our mode is still there.
> >>>>>>
> >>>>>> If the mode isn't there, then it's up to the DE to take action in
> >>>>>> response to the notification of new modes.  If you don't have a DE to
> >>>>>> take appropriate action, you're kind of out of luck.
> >>>>>>
> >>>>>> As far as the ABI goes, this seems fine to me.  The only concern I had
> >>>>>> about ABI was having to walk all the connectors on every uevent to see
> >>>>>> if any had gone bad -- couldn't we have a flag of some sort about what
> >>>>>> the uevent indicates?  But uevents should be super rare, so I'd say
> >>>>>> the
> >>>>>> kernel could go ahead with the current plan.
> >>>>> Yes I agree. The kernel sets the link status BAD as soona s link
> >>>>> training fails
> >>>>> but does not prune the modes until a new modelist is requested by
> >>>>> the userspace.
> >>>>> So this patch should re query the mode list as soon as it sees the
> >>>>> link status
> >>>>> BAD in order for the kernel to validate the modes again based on new
> >>>>> link
> >>>>> paarmeters and send a new mode list back.
> >>>> Seems like a bad behaviour from the kernel, isn't it? It should return
> >>>> immediatly
> >>>> if the mode is gonna be pruned :s
> >>> The mode list pruning isn't relevant for modeesets, the kernel doesn't
> >>> validate requested modes against that. The mode list is purely for
> >>> userspace's information. Which means updating it only when userspace asks
> >>> for it is fine.
> >>
> >> Hmm, ok. Fair enough!
> >>
> >>> I also thought some more about the loop when we're stuck on BAD, and I
> >>> think it shouldn't happen: When the link goes BAD we update the link
> >>> parameter values to the new limits, and the kernel will reject any mode
> >>> that won't fit anymore. Of course you might be unlucky, and the new link
> >>> limits are also not working, but eventually you're stuck at the "you're
> >>> link is broken, sry" stage, where the kernel rejects everything :-)
> >>>
> >>> So I think the busy-loop has strictly a limited amount of time until it
> >>> runs out of steam.
> >>
> >> OK, I have given it more thoughts and discussed with Ville and Chris IRL or
> >> on IRC.
> >>
> >> My current proposal is based on the idea that the kernel should reject a
> >> mode
> >> it knows it cannot set. This is already largely tested in real life: Try
> >> setting 4K@60Hz on an HDMI 1.x monitor, it should immediately fail on
> >> prepare(). For this proposal to work, we would need to put in the ABI
> >> that a
> >> driver that sets the link-status to BAD should also make sure that whatever
> >> the userspace does, no infinite loop should be created (by changing the
> >> maximum link characteristics before setting the link-status property).
> >>
> >> Re-probing the list of modes and checking if the mode is still in there is
> >> inherently racy, as a new screen may be plugged between the moment the list
> >> is sent to the userspace and the moment when we try setting the mode. Or
> >> the
> >> DE may also have changed the mode in the mean time and the kernel would
> >> have reduced the limits even more. So, the userspace cannot be expected to
> >> always do the right thing here :s.
> >>
> >> From this point of view, I really do not like the idea of re-probing, since
> >> it increases the delay before the DE can handle the change and it does not
> >> bring any guarantee of working properly.
> >>
> >> Did I miss anything? Any opinions?
> >
> > Any comments on this, Eric? Does it sound logical to you or did I miss 
> > something?
> >
> > The kernel patches are stuck until this patch gets in. So far, you look 
> > like the only person who would be willing to review this patch (Ajax 
> > acked the patch, but that's the extent he was willing to go).
> 
> I was just trying to provide review to get the kernel unstuck.  The
> kernel should not be blocked until the patch gets lands (this obviously
> isn't the case with ioctls, which *don't* land in userspace until kernel
> does), you just need userspace published and generally agreed that the
> ABI works.
Daniel Vetter Feb. 26, 2017, 7:42 p.m. UTC | #20
On Fri, Feb 24, 2017 at 12:09:51PM -0800, Manasi Navare wrote:
> Hi Daniel,
> 
> We have ACKs on the userspace design from both Adams and Eric.
> Is this enough to merge the kernel patches?
> 
> I spoke to Eric briefly about this and he gave me a verbal r-b as well.
> He said the userspace patches cant get merged unless DRM patches are merged.
> 
> So what should be some of our next steps here?

Still needs review on the kernel patches themselves, iirc Jani still had
some opens on that. But I was out of the loop for 2 weeks :-)
-Daniel

> 
> Regards
> Manasi
> 
> 
> On Mon, Feb 13, 2017 at 01:05:17PM -0800, Eric Anholt wrote:
> > Martin Peres <martin.peres@linux.intel.com> writes:
> > 
> > > On 06/02/17 17:50, Martin Peres wrote:
> > >> On 03/02/17 10:04, Daniel Vetter wrote:
> > >>> On Fri, Feb 03, 2017 at 01:30:14AM +0200, Martin Peres wrote:
> > >>>> On 01/02/17 22:05, Manasi Navare wrote:
> > >>>>> On Wed, Feb 01, 2017 at 11:58:16AM -0800, Eric Anholt wrote:
> > >>>>>> Jani Nikula <jani.nikula@linux.intel.com> writes:
> > >>>>>>
> > >>>>>>> On Tue, 31 Jan 2017, Eric Anholt <eric@anholt.net> wrote:
> > >>>>>>>> Martin Peres <martin.peres@linux.intel.com> writes:
> > >>>>>>>>
> > >>>>>>>>> Despite all the careful planing of the kernel, a link may become
> > >>>>>>>>> insufficient to handle the currently-set mode. At this point, the
> > >>>>>>>>> kernel should mark this particular configuration as being broken
> > >>>>>>>>> and potentially prune the mode before setting the offending
> > >>>>>>>>> connector's
> > >>>>>>>>> link-status to BAD and send the userspace a hotplug event. This may
> > >>>>>>>>> happen right after a modeset or later on.
> > >>>>>>>>>
> > >>>>>>>>> When available, we should use the link-status information to reset
> > >>>>>>>>> the wanted mode.
> > >>>>>>>>>
> > >>>>>>>>> Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
> > >>>>>>>> If I understand this right, there are two failure modes being
> > >>>>>>>> handled:
> > >>>>>>>>
> > >>>>>>>> 1) A mode that won't actually work because the link isn't good
> > >>>>>>>> enough.
> > >>>>>>>>
> > >>>>>>>> 2) A mode that should work, but link parameters were too
> > >>>>>>>> optimistic and
> > >>>>>>>> if we just ask the kernel to set the mode again it'll use more
> > >>>>>>>> conservative parameters that work.
> > >>>>>>>>
> > >>>>>>>> This patch seems good for 2).  For 1), the drmmode_set_mode_major is
> > >>>>>>>> going to set our old mode back.  Won't the modeset then fail to link
> > >>>>>>>> train again, and bring us back into this loop?  The only escape
> > >>>>>>>> that I
> > >>>>>>>> see would be some other userspace responding to the advertised
> > >>>>>>>> mode list
> > >>>>>>>> changing, and then asking X to modeset to something new.
> > >>>>>>>>
> > >>>>>>>> To avoid that failure busy loop, should we re-fetching modes at this
> > >>>>>>>> point, and only re-setting if our mode still exists?
> > >>>>>>> Disclaimer: I don't know anything about the internals of the
> > >>>>>>> modesetting
> > >>>>>>> driver.
> > >>>>>>>
> > >>>>>>> Perhaps we can identify the two cases now, but I'd put this more
> > >>>>>>> generally: if the link status has gone bad, it's an indicator to
> > >>>>>>> userspace that the circumstances may have changed, and userspace
> > >>>>>>> should
> > >>>>>>> query the kernel for the list of available modes again. It should no
> > >>>>>>> longer trust information obtained prior to getting the bad link
> > >>>>>>> status,
> > >>>>>>> including the current mode.
> > >>>>>>>
> > >>>>>>> But specifically, I think you're right, and AFAICT asking for the
> > >>>>>>> list
> > >>>>>>> of modes again is the only way for the userspace to distinguish
> > >>>>>>> between
> > >>>>>>> the two cases. I don't think there's a shortcut for deciding the
> > >>>>>>> current
> > >>>>>>> mode is still valid.
> > >>>>>> To avoid the busy-loop problem, I think I'd like this patch to
> > >>>>>> re-query
> > >>>>>> the kernel to ask about the current mode list, and only try to re-set
> > >>>>>> the mode if our mode is still there.
> > >>>>>>
> > >>>>>> If the mode isn't there, then it's up to the DE to take action in
> > >>>>>> response to the notification of new modes.  If you don't have a DE to
> > >>>>>> take appropriate action, you're kind of out of luck.
> > >>>>>>
> > >>>>>> As far as the ABI goes, this seems fine to me.  The only concern I had
> > >>>>>> about ABI was having to walk all the connectors on every uevent to see
> > >>>>>> if any had gone bad -- couldn't we have a flag of some sort about what
> > >>>>>> the uevent indicates?  But uevents should be super rare, so I'd say
> > >>>>>> the
> > >>>>>> kernel could go ahead with the current plan.
> > >>>>> Yes I agree. The kernel sets the link status BAD as soona s link
> > >>>>> training fails
> > >>>>> but does not prune the modes until a new modelist is requested by
> > >>>>> the userspace.
> > >>>>> So this patch should re query the mode list as soon as it sees the
> > >>>>> link status
> > >>>>> BAD in order for the kernel to validate the modes again based on new
> > >>>>> link
> > >>>>> paarmeters and send a new mode list back.
> > >>>> Seems like a bad behaviour from the kernel, isn't it? It should return
> > >>>> immediatly
> > >>>> if the mode is gonna be pruned :s
> > >>> The mode list pruning isn't relevant for modeesets, the kernel doesn't
> > >>> validate requested modes against that. The mode list is purely for
> > >>> userspace's information. Which means updating it only when userspace asks
> > >>> for it is fine.
> > >>
> > >> Hmm, ok. Fair enough!
> > >>
> > >>> I also thought some more about the loop when we're stuck on BAD, and I
> > >>> think it shouldn't happen: When the link goes BAD we update the link
> > >>> parameter values to the new limits, and the kernel will reject any mode
> > >>> that won't fit anymore. Of course you might be unlucky, and the new link
> > >>> limits are also not working, but eventually you're stuck at the "you're
> > >>> link is broken, sry" stage, where the kernel rejects everything :-)
> > >>>
> > >>> So I think the busy-loop has strictly a limited amount of time until it
> > >>> runs out of steam.
> > >>
> > >> OK, I have given it more thoughts and discussed with Ville and Chris IRL or
> > >> on IRC.
> > >>
> > >> My current proposal is based on the idea that the kernel should reject a
> > >> mode
> > >> it knows it cannot set. This is already largely tested in real life: Try
> > >> setting 4K@60Hz on an HDMI 1.x monitor, it should immediately fail on
> > >> prepare(). For this proposal to work, we would need to put in the ABI
> > >> that a
> > >> driver that sets the link-status to BAD should also make sure that whatever
> > >> the userspace does, no infinite loop should be created (by changing the
> > >> maximum link characteristics before setting the link-status property).
> > >>
> > >> Re-probing the list of modes and checking if the mode is still in there is
> > >> inherently racy, as a new screen may be plugged between the moment the list
> > >> is sent to the userspace and the moment when we try setting the mode. Or
> > >> the
> > >> DE may also have changed the mode in the mean time and the kernel would
> > >> have reduced the limits even more. So, the userspace cannot be expected to
> > >> always do the right thing here :s.
> > >>
> > >> From this point of view, I really do not like the idea of re-probing, since
> > >> it increases the delay before the DE can handle the change and it does not
> > >> bring any guarantee of working properly.
> > >>
> > >> Did I miss anything? Any opinions?
> > >
> > > Any comments on this, Eric? Does it sound logical to you or did I miss 
> > > something?
> > >
> > > The kernel patches are stuck until this patch gets in. So far, you look 
> > > like the only person who would be willing to review this patch (Ajax 
> > > acked the patch, but that's the extent he was willing to go).
> > 
> > I was just trying to provide review to get the kernel unstuck.  The
> > kernel should not be blocked until the patch gets lands (this obviously
> > isn't the case with ioctls, which *don't* land in userspace until kernel
> > does), you just need userspace published and generally agreed that the
> > ABI works.
> 
>
Navare, Manasi Feb. 28, 2017, 4:07 a.m. UTC | #21
On Fri, Feb 24, 2017 at 12:09:51PM -0800, Manasi Navare wrote:
> Hi Daniel,
> 
> We have ACKs on the userspace design from both Adams and Eric.
> Is this enough to merge the kernel patches?
> 
> I spoke to Eric briefly about this and he gave me a verbal r-b as well.
> He said the userspace patches cant get merged unless DRM patches are merged.
> 
> So what should be some of our next steps here?

Still needs review on the kernel patches themselves, iirc Jani still had some opens on that. But I was out of the loop for 2  weeks :-) -Daniel

Thanks for merging the 1st patch in the kernel series. Are there any opens on the 2nd patch (the i915 patch)?
I wanted to actually respin the 2nd patch to add reset for intel_dp->lane_count  on the link training failure so that it doesn't accidently retrain the link with the stale/failed values. 

Regards
Manasi

> 
> Regards
> Manasi
> 
> 
> On Mon, Feb 13, 2017 at 01:05:17PM -0800, Eric Anholt wrote:
> > Martin Peres <martin.peres@linux.intel.com> writes:
> > 
> > > On 06/02/17 17:50, Martin Peres wrote:
> > >> On 03/02/17 10:04, Daniel Vetter wrote:
> > >>> On Fri, Feb 03, 2017 at 01:30:14AM +0200, Martin Peres wrote:
> > >>>> On 01/02/17 22:05, Manasi Navare wrote:
> > >>>>> On Wed, Feb 01, 2017 at 11:58:16AM -0800, Eric Anholt wrote:
> > >>>>>> Jani Nikula <jani.nikula@linux.intel.com> writes:
> > >>>>>>
> > >>>>>>> On Tue, 31 Jan 2017, Eric Anholt <eric@anholt.net> wrote:
> > >>>>>>>> Martin Peres <martin.peres@linux.intel.com> writes:
> > >>>>>>>>
> > >>>>>>>>> Despite all the careful planing of the kernel, a link may 
> > >>>>>>>>> become insufficient to handle the currently-set mode. At 
> > >>>>>>>>> this point, the kernel should mark this particular 
> > >>>>>>>>> configuration as being broken and potentially prune the 
> > >>>>>>>>> mode before setting the offending connector's link-status 
> > >>>>>>>>> to BAD and send the userspace a hotplug event. This may 
> > >>>>>>>>> happen right after a modeset or later on.
> > >>>>>>>>>
> > >>>>>>>>> When available, we should use the link-status information 
> > >>>>>>>>> to reset the wanted mode.
> > >>>>>>>>>
> > >>>>>>>>> Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
> > >>>>>>>> If I understand this right, there are two failure modes 
> > >>>>>>>> being
> > >>>>>>>> handled:
> > >>>>>>>>
> > >>>>>>>> 1) A mode that won't actually work because the link isn't 
> > >>>>>>>> good enough.
> > >>>>>>>>
> > >>>>>>>> 2) A mode that should work, but link parameters were too 
> > >>>>>>>> optimistic and if we just ask the kernel to set the mode 
> > >>>>>>>> again it'll use more conservative parameters that work.
> > >>>>>>>>
> > >>>>>>>> This patch seems good for 2).  For 1), the 
> > >>>>>>>> drmmode_set_mode_major is going to set our old mode back.  
> > >>>>>>>> Won't the modeset then fail to link train again, and bring 
> > >>>>>>>> us back into this loop?  The only escape that I see would 
> > >>>>>>>> be some other userspace responding to the advertised mode 
> > >>>>>>>> list changing, and then asking X to modeset to something 
> > >>>>>>>> new.
> > >>>>>>>>
> > >>>>>>>> To avoid that failure busy loop, should we re-fetching 
> > >>>>>>>> modes at this point, and only re-setting if our mode still exists?
> > >>>>>>> Disclaimer: I don't know anything about the internals of the 
> > >>>>>>> modesetting driver.
> > >>>>>>>
> > >>>>>>> Perhaps we can identify the two cases now, but I'd put this 
> > >>>>>>> more
> > >>>>>>> generally: if the link status has gone bad, it's an 
> > >>>>>>> indicator to userspace that the circumstances may have 
> > >>>>>>> changed, and userspace should query the kernel for the list 
> > >>>>>>> of available modes again. It should no longer trust 
> > >>>>>>> information obtained prior to getting the bad link status, 
> > >>>>>>> including the current mode.
> > >>>>>>>
> > >>>>>>> But specifically, I think you're right, and AFAICT asking 
> > >>>>>>> for the list of modes again is the only way for the 
> > >>>>>>> userspace to distinguish between the two cases. I don't 
> > >>>>>>> think there's a shortcut for deciding the current mode is 
> > >>>>>>> still valid.
> > >>>>>> To avoid the busy-loop problem, I think I'd like this patch 
> > >>>>>> to
> > >>>>>> re-query
> > >>>>>> the kernel to ask about the current mode list, and only try to re-set
> > >>>>>> the mode if our mode is still there.
> > >>>>>>
> > >>>>>> If the mode isn't there, then it's up to the DE to take action in
> > >>>>>> response to the notification of new modes.  If you don't have a DE to
> > >>>>>> take appropriate action, you're kind of out of luck.
> > >>>>>>
> > >>>>>> As far as the ABI goes, this seems fine to me.  The only concern I had
> > >>>>>> about ABI was having to walk all the connectors on every uevent to see
> > >>>>>> if any had gone bad -- couldn't we have a flag of some sort about what
> > >>>>>> the uevent indicates?  But uevents should be super rare, so I'd say
> > >>>>>> the
> > >>>>>> kernel could go ahead with the current plan.
> > >>>>> Yes I agree. The kernel sets the link status BAD as soona s link
> > >>>>> training fails
> > >>>>> but does not prune the modes until a new modelist is requested by
> > >>>>> the userspace.
> > >>>>> So this patch should re query the mode list as soon as it sees the
> > >>>>> link status
> > >>>>> BAD in order for the kernel to validate the modes again based on new
> > >>>>> link
> > >>>>> paarmeters and send a new mode list back.
> > >>>> Seems like a bad behaviour from the kernel, isn't it? It should return
> > >>>> immediatly
> > >>>> if the mode is gonna be pruned :s
> > >>> The mode list pruning isn't relevant for modeesets, the kernel doesn't
> > >>> validate requested modes against that. The mode list is purely for
> > >>> userspace's information. Which means updating it only when userspace asks
> > >>> for it is fine.
> > >>
> > >> Hmm, ok. Fair enough!
> > >>
> > >>> I also thought some more about the loop when we're stuck on BAD, and I
> > >>> think it shouldn't happen: When the link goes BAD we update the link
> > >>> parameter values to the new limits, and the kernel will reject any mode
> > >>> that won't fit anymore. Of course you might be unlucky, and the new link
> > >>> limits are also not working, but eventually you're stuck at the "you're
> > >>> link is broken, sry" stage, where the kernel rejects everything :-)
> > >>>
> > >>> So I think the busy-loop has strictly a limited amount of time until it
> > >>> runs out of steam.
> > >>
> > >> OK, I have given it more thoughts and discussed with Ville and Chris IRL or
> > >> on IRC.
> > >>
> > >> My current proposal is based on the idea that the kernel should reject a
> > >> mode
> > >> it knows it cannot set. This is already largely tested in real life: Try
> > >> setting 4K@60Hz on an HDMI 1.x monitor, it should immediately fail on
> > >> prepare(). For this proposal to work, we would need to put in the ABI
> > >> that a
> > >> driver that sets the link-status to BAD should also make sure that whatever
> > >> the userspace does, no infinite loop should be created (by changing the
> > >> maximum link characteristics before setting the link-status property).
> > >>
> > >> Re-probing the list of modes and checking if the mode is still in there is
> > >> inherently racy, as a new screen may be plugged between the moment the list
> > >> is sent to the userspace and the moment when we try setting the mode. Or
> > >> the
> > >> DE may also have changed the mode in the mean time and the kernel would
> > >> have reduced the limits even more. So, the userspace cannot be expected to
> > >> always do the right thing here :s.
> > >>
> > >> From this point of view, I really do not like the idea of re-probing, since
> > >> it increases the delay before the DE can handle the change and it does not
> > >> bring any guarantee of working properly.
> > >>
> > >> Did I miss anything? Any opinions?
> > >
> > > Any comments on this, Eric? Does it sound logical to you or did I miss 
> > > something?
> > >
> > > The kernel patches are stuck until this patch gets in. So far, you look 
> > > like the only person who would be willing to review this patch (Ajax 
> > > acked the patch, but that's the extent he was willing to go).
> > 
> > I was just trying to provide review to get the kernel unstuck.  The
> > kernel should not be blocked until the patch gets lands (this obviously
> > isn't the case with ioctls, which *don't* land in userspace until kernel
> > does), you just need userspace published and generally agreed that the
> > ABI works.
> 
>
Daniel Vetter Feb. 28, 2017, 8:42 a.m. UTC | #22
On Tue, Feb 28, 2017 at 04:07:02AM +0000, Navare, Manasi D wrote:
> 
> On Fri, Feb 24, 2017 at 12:09:51PM -0800, Manasi Navare wrote:
> > Hi Daniel,
> > 
> > We have ACKs on the userspace design from both Adams and Eric.
> > Is this enough to merge the kernel patches?
> > 
> > I spoke to Eric briefly about this and he gave me a verbal r-b as well.
> > He said the userspace patches cant get merged unless DRM patches are merged.
> > 
> > So what should be some of our next steps here?
> 
> Still needs review on the kernel patches themselves, iirc Jani still had some opens on that. But I was out of the loop for 2  weeks :-) -Daniel
> 
> Thanks for merging the 1st patch in the kernel series. Are there any opens on the 2nd patch (the i915 patch)?
> I wanted to actually respin the 2nd patch to add reset for intel_dp->lane_count  on the link training failure so that it doesn't accidently retrain the link with the stale/failed values. 

Didn't look like that, and we can do this as a follow-up. I only asked
where we should merge it for best results. Let's continue that discussion
in the other thread.
-Daniel

> 
> Regards
> Manasi
> 
> > 
> > Regards
> > Manasi
> > 
> > 
> > On Mon, Feb 13, 2017 at 01:05:17PM -0800, Eric Anholt wrote:
> > > Martin Peres <martin.peres@linux.intel.com> writes:
> > > 
> > > > On 06/02/17 17:50, Martin Peres wrote:
> > > >> On 03/02/17 10:04, Daniel Vetter wrote:
> > > >>> On Fri, Feb 03, 2017 at 01:30:14AM +0200, Martin Peres wrote:
> > > >>>> On 01/02/17 22:05, Manasi Navare wrote:
> > > >>>>> On Wed, Feb 01, 2017 at 11:58:16AM -0800, Eric Anholt wrote:
> > > >>>>>> Jani Nikula <jani.nikula@linux.intel.com> writes:
> > > >>>>>>
> > > >>>>>>> On Tue, 31 Jan 2017, Eric Anholt <eric@anholt.net> wrote:
> > > >>>>>>>> Martin Peres <martin.peres@linux.intel.com> writes:
> > > >>>>>>>>
> > > >>>>>>>>> Despite all the careful planing of the kernel, a link may 
> > > >>>>>>>>> become insufficient to handle the currently-set mode. At 
> > > >>>>>>>>> this point, the kernel should mark this particular 
> > > >>>>>>>>> configuration as being broken and potentially prune the 
> > > >>>>>>>>> mode before setting the offending connector's link-status 
> > > >>>>>>>>> to BAD and send the userspace a hotplug event. This may 
> > > >>>>>>>>> happen right after a modeset or later on.
> > > >>>>>>>>>
> > > >>>>>>>>> When available, we should use the link-status information 
> > > >>>>>>>>> to reset the wanted mode.
> > > >>>>>>>>>
> > > >>>>>>>>> Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
> > > >>>>>>>> If I understand this right, there are two failure modes 
> > > >>>>>>>> being
> > > >>>>>>>> handled:
> > > >>>>>>>>
> > > >>>>>>>> 1) A mode that won't actually work because the link isn't 
> > > >>>>>>>> good enough.
> > > >>>>>>>>
> > > >>>>>>>> 2) A mode that should work, but link parameters were too 
> > > >>>>>>>> optimistic and if we just ask the kernel to set the mode 
> > > >>>>>>>> again it'll use more conservative parameters that work.
> > > >>>>>>>>
> > > >>>>>>>> This patch seems good for 2).  For 1), the 
> > > >>>>>>>> drmmode_set_mode_major is going to set our old mode back.  
> > > >>>>>>>> Won't the modeset then fail to link train again, and bring 
> > > >>>>>>>> us back into this loop?  The only escape that I see would 
> > > >>>>>>>> be some other userspace responding to the advertised mode 
> > > >>>>>>>> list changing, and then asking X to modeset to something 
> > > >>>>>>>> new.
> > > >>>>>>>>
> > > >>>>>>>> To avoid that failure busy loop, should we re-fetching 
> > > >>>>>>>> modes at this point, and only re-setting if our mode still exists?
> > > >>>>>>> Disclaimer: I don't know anything about the internals of the 
> > > >>>>>>> modesetting driver.
> > > >>>>>>>
> > > >>>>>>> Perhaps we can identify the two cases now, but I'd put this 
> > > >>>>>>> more
> > > >>>>>>> generally: if the link status has gone bad, it's an 
> > > >>>>>>> indicator to userspace that the circumstances may have 
> > > >>>>>>> changed, and userspace should query the kernel for the list 
> > > >>>>>>> of available modes again. It should no longer trust 
> > > >>>>>>> information obtained prior to getting the bad link status, 
> > > >>>>>>> including the current mode.
> > > >>>>>>>
> > > >>>>>>> But specifically, I think you're right, and AFAICT asking 
> > > >>>>>>> for the list of modes again is the only way for the 
> > > >>>>>>> userspace to distinguish between the two cases. I don't 
> > > >>>>>>> think there's a shortcut for deciding the current mode is 
> > > >>>>>>> still valid.
> > > >>>>>> To avoid the busy-loop problem, I think I'd like this patch 
> > > >>>>>> to
> > > >>>>>> re-query
> > > >>>>>> the kernel to ask about the current mode list, and only try to re-set
> > > >>>>>> the mode if our mode is still there.
> > > >>>>>>
> > > >>>>>> If the mode isn't there, then it's up to the DE to take action in
> > > >>>>>> response to the notification of new modes.  If you don't have a DE to
> > > >>>>>> take appropriate action, you're kind of out of luck.
> > > >>>>>>
> > > >>>>>> As far as the ABI goes, this seems fine to me.  The only concern I had
> > > >>>>>> about ABI was having to walk all the connectors on every uevent to see
> > > >>>>>> if any had gone bad -- couldn't we have a flag of some sort about what
> > > >>>>>> the uevent indicates?  But uevents should be super rare, so I'd say
> > > >>>>>> the
> > > >>>>>> kernel could go ahead with the current plan.
> > > >>>>> Yes I agree. The kernel sets the link status BAD as soona s link
> > > >>>>> training fails
> > > >>>>> but does not prune the modes until a new modelist is requested by
> > > >>>>> the userspace.
> > > >>>>> So this patch should re query the mode list as soon as it sees the
> > > >>>>> link status
> > > >>>>> BAD in order for the kernel to validate the modes again based on new
> > > >>>>> link
> > > >>>>> paarmeters and send a new mode list back.
> > > >>>> Seems like a bad behaviour from the kernel, isn't it? It should return
> > > >>>> immediatly
> > > >>>> if the mode is gonna be pruned :s
> > > >>> The mode list pruning isn't relevant for modeesets, the kernel doesn't
> > > >>> validate requested modes against that. The mode list is purely for
> > > >>> userspace's information. Which means updating it only when userspace asks
> > > >>> for it is fine.
> > > >>
> > > >> Hmm, ok. Fair enough!
> > > >>
> > > >>> I also thought some more about the loop when we're stuck on BAD, and I
> > > >>> think it shouldn't happen: When the link goes BAD we update the link
> > > >>> parameter values to the new limits, and the kernel will reject any mode
> > > >>> that won't fit anymore. Of course you might be unlucky, and the new link
> > > >>> limits are also not working, but eventually you're stuck at the "you're
> > > >>> link is broken, sry" stage, where the kernel rejects everything :-)
> > > >>>
> > > >>> So I think the busy-loop has strictly a limited amount of time until it
> > > >>> runs out of steam.
> > > >>
> > > >> OK, I have given it more thoughts and discussed with Ville and Chris IRL or
> > > >> on IRC.
> > > >>
> > > >> My current proposal is based on the idea that the kernel should reject a
> > > >> mode
> > > >> it knows it cannot set. This is already largely tested in real life: Try
> > > >> setting 4K@60Hz on an HDMI 1.x monitor, it should immediately fail on
> > > >> prepare(). For this proposal to work, we would need to put in the ABI
> > > >> that a
> > > >> driver that sets the link-status to BAD should also make sure that whatever
> > > >> the userspace does, no infinite loop should be created (by changing the
> > > >> maximum link characteristics before setting the link-status property).
> > > >>
> > > >> Re-probing the list of modes and checking if the mode is still in there is
> > > >> inherently racy, as a new screen may be plugged between the moment the list
> > > >> is sent to the userspace and the moment when we try setting the mode. Or
> > > >> the
> > > >> DE may also have changed the mode in the mean time and the kernel would
> > > >> have reduced the limits even more. So, the userspace cannot be expected to
> > > >> always do the right thing here :s.
> > > >>
> > > >> From this point of view, I really do not like the idea of re-probing, since
> > > >> it increases the delay before the DE can handle the change and it does not
> > > >> bring any guarantee of working properly.
> > > >>
> > > >> Did I miss anything? Any opinions?
> > > >
> > > > Any comments on this, Eric? Does it sound logical to you or did I miss 
> > > > something?
> > > >
> > > > The kernel patches are stuck until this patch gets in. So far, you look 
> > > > like the only person who would be willing to review this patch (Ajax 
> > > > acked the patch, but that's the extent he was willing to go).
> > > 
> > > I was just trying to provide review to get the kernel unstuck.  The
> > > kernel should not be blocked until the patch gets lands (this obviously
> > > isn't the case with ioctls, which *don't* land in userspace until kernel
> > > does), you just need userspace published and generally agreed that the
> > > ABI works.
> > 
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
Daniel Vetter Feb. 28, 2017, 8:43 a.m. UTC | #23
On Thu, Feb 02, 2017 at 09:57:20AM -0800, Eric Anholt wrote:
> Daniel Vetter <daniel@ffwll.ch> writes:
> 
> > On Wed, Feb 01, 2017 at 11:58:16AM -0800, Eric Anholt wrote:
> >> Jani Nikula <jani.nikula@linux.intel.com> writes:
> >> 
> >> > On Tue, 31 Jan 2017, Eric Anholt <eric@anholt.net> wrote:
> >> >> Martin Peres <martin.peres@linux.intel.com> writes:
> >> >>
> >> >>> Despite all the careful planing of the kernel, a link may become
> >> >>> insufficient to handle the currently-set mode. At this point, the
> >> >>> kernel should mark this particular configuration as being broken
> >> >>> and potentially prune the mode before setting the offending connector's
> >> >>> link-status to BAD and send the userspace a hotplug event. This may
> >> >>> happen right after a modeset or later on.
> >> >>>
> >> >>> When available, we should use the link-status information to reset
> >> >>> the wanted mode.
> >> >>>
> >> >>> Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
> >> >>
> >> >> If I understand this right, there are two failure modes being handled:
> >> >>
> >> >> 1) A mode that won't actually work because the link isn't good enough.
> >> >>
> >> >> 2) A mode that should work, but link parameters were too optimistic and
> >> >> if we just ask the kernel to set the mode again it'll use more
> >> >> conservative parameters that work.
> >> >>
> >> >> This patch seems good for 2).  For 1), the drmmode_set_mode_major is
> >> >> going to set our old mode back.  Won't the modeset then fail to link
> >> >> train again, and bring us back into this loop?  The only escape that I
> >> >> see would be some other userspace responding to the advertised mode list
> >> >> changing, and then asking X to modeset to something new.
> >> >>
> >> >> To avoid that failure busy loop, should we re-fetching modes at this
> >> >> point, and only re-setting if our mode still exists?
> >> >
> >> > Disclaimer: I don't know anything about the internals of the modesetting
> >> > driver.
> >> >
> >> > Perhaps we can identify the two cases now, but I'd put this more
> >> > generally: if the link status has gone bad, it's an indicator to
> >> > userspace that the circumstances may have changed, and userspace should
> >> > query the kernel for the list of available modes again. It should no
> >> > longer trust information obtained prior to getting the bad link status,
> >> > including the current mode.
> >> >
> >> > But specifically, I think you're right, and AFAICT asking for the list
> >> > of modes again is the only way for the userspace to distinguish between
> >> > the two cases. I don't think there's a shortcut for deciding the current
> >> > mode is still valid.
> >> 
> >> To avoid the busy-loop problem, I think I'd like this patch to re-query
> >> the kernel to ask about the current mode list, and only try to re-set
> >> the mode if our mode is still there.
> >
> > Yeah, I guess that sounds like a reasonable heuristics. More integrated
> > compositors (aka the wayland ones) might be able to make more useful
> > decisions, but for -modesetting that's probably the best we can do.
> >  
> >> If the mode isn't there, then it's up to the DE to take action in
> >> response to the notification of new modes.  If you don't have a DE to
> >> take appropriate action, you're kind of out of luck.
> >> 
> >> As far as the ABI goes, this seems fine to me.  The only concern I had
> >> about ABI was having to walk all the connectors on every uevent to see
> >> if any had gone bad -- couldn't we have a flag of some sort about what
> >> the uevent indicates?  But uevents should be super rare, so I'd say the
> >> kernel could go ahead with the current plan.
> >
> > We've discussed finer-grained uevent singalling a few times, and I'd like
> > that to be an orthogonal abi extension. Right now we just don't have the
> > required tracking even within the kernel to know which connector changed
> > (and the tracking we do have missed a few things, too). My idea is to push
> > the tracking into the leaves of the callchains with a function that
> > increases some per-connector epoch counter. Then we'd just have to expose
> > that somehow cheaply to userspace (could probably just send it along in
> > the uevent). Plus expose it as a read-only property so that userspace can
> > re-synchronize.
> >
> > But again, that should be an orthogonal thing imo.
> 
> Yeah, that was roughly my thought process, too.

I merged the kernel side of this new uapi:

commit 40ee6fbef75fe6452dc9e69e6f9f1a2c7808ed67
Author: Manasi Navare <manasi.d.navare@intel.com>
Date:   Fri Dec 16 12:29:06 2016 +0200

    drm: Add a new connector atomic property for link status

Feel free to go ahead with the userspace side merging. (Yes the i915 side
isn't merged yet, but purely for "in which branch should I put it?"
technicality).

Cheers, Daniel
Martin Peres March 27, 2017, 2:12 p.m. UTC | #24
On 26/01/17 14:37, Martin Peres wrote:
> Despite all the careful planing of the kernel, a link may become
> insufficient to handle the currently-set mode. At this point, the
> kernel should mark this particular configuration as being broken
> and potentially prune the mode before setting the offending connector's
> link-status to BAD and send the userspace a hotplug event. This may
> happen right after a modeset or later on.
>
> When available, we should use the link-status information to reset
> the wanted mode.
>
> Signed-off-by: Martin Peres <martin.peres@linux.intel.com>

The relevant kernel patches have landed in drm-tip about a month ago.

Eric, would you mind providing feedback on or merging this patch?

Thanks,
Martin
Eric Anholt March 31, 2017, 12:37 a.m. UTC | #25
Martin Peres <martin.peres@linux.intel.com> writes:

> On 26/01/17 14:37, Martin Peres wrote:
>> Despite all the careful planing of the kernel, a link may become
>> insufficient to handle the currently-set mode. At this point, the
>> kernel should mark this particular configuration as being broken
>> and potentially prune the mode before setting the offending connector's
>> link-status to BAD and send the userspace a hotplug event. This may
>> happen right after a modeset or later on.
>>
>> When available, we should use the link-status information to reset
>> the wanted mode.
>>
>> Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
>
> The relevant kernel patches have landed in drm-tip about a month ago.
>
> Eric, would you mind providing feedback on or merging this patch?

The later discussion has sounded like the kernel will (always) prune the
mode when we re-query, meaning that it doesn't make any sense to try to
re-set to the old mode.  Is this not the case?
Navare, Manasi March 31, 2017, 12:50 a.m. UTC | #26
On Thu, Mar 30, 2017 at 05:37:55PM -0700, Eric Anholt wrote:
> Martin Peres <martin.peres@linux.intel.com> writes:
> 
> > On 26/01/17 14:37, Martin Peres wrote:
> >> Despite all the careful planing of the kernel, a link may become
> >> insufficient to handle the currently-set mode. At this point, the
> >> kernel should mark this particular configuration as being broken
> >> and potentially prune the mode before setting the offending connector's
> >> link-status to BAD and send the userspace a hotplug event. This may
> >> happen right after a modeset or later on.
> >>
> >> When available, we should use the link-status information to reset
> >> the wanted mode.
> >>
> >> Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
> >
> > The relevant kernel patches have landed in drm-tip about a month ago.
> >
> > Eric, would you mind providing feedback on or merging this patch?
> 
> The later discussion has sounded like the kernel will (always) prune the
> mode when we re-query, meaning that it doesn't make any sense to try to
> re-set to the old mode.  Is this not the case?


No the kernel will simply send a hotplug with link status as bad
and then after that point its userspace driver's responsibility
to check if link status is BAD, retry the same mode and if it fails
then re probe.

Regards
Manasi
Eric Anholt March 31, 2017, 8:08 p.m. UTC | #27
Manasi Navare <manasi.d.navare@intel.com> writes:

> On Thu, Mar 30, 2017 at 05:37:55PM -0700, Eric Anholt wrote:
>> Martin Peres <martin.peres@linux.intel.com> writes:
>> 
>> > On 26/01/17 14:37, Martin Peres wrote:
>> >> Despite all the careful planing of the kernel, a link may become
>> >> insufficient to handle the currently-set mode. At this point, the
>> >> kernel should mark this particular configuration as being broken
>> >> and potentially prune the mode before setting the offending connector's
>> >> link-status to BAD and send the userspace a hotplug event. This may
>> >> happen right after a modeset or later on.
>> >>
>> >> When available, we should use the link-status information to reset
>> >> the wanted mode.
>> >>
>> >> Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
>> >
>> > The relevant kernel patches have landed in drm-tip about a month ago.
>> >
>> > Eric, would you mind providing feedback on or merging this patch?
>> 
>> The later discussion has sounded like the kernel will (always) prune the
>> mode when we re-query, meaning that it doesn't make any sense to try to
>> re-set to the old mode.  Is this not the case?
>
>
> No the kernel will simply send a hotplug with link status as bad
> and then after that point its userspace driver's responsibility
> to check if link status is BAD, retry the same mode and if it fails
> then re probe.

So the kernel will sometimes allow the same mode to be re-set with the
same bpp?
Navare, Manasi March 31, 2017, 8:17 p.m. UTC | #28
On Fri, Mar 31, 2017 at 01:08:41PM -0700, Eric Anholt wrote:
> Manasi Navare <manasi.d.navare@intel.com> writes:
> 
> > On Thu, Mar 30, 2017 at 05:37:55PM -0700, Eric Anholt wrote:
> >> Martin Peres <martin.peres@linux.intel.com> writes:
> >> 
> >> > On 26/01/17 14:37, Martin Peres wrote:
> >> >> Despite all the careful planing of the kernel, a link may become
> >> >> insufficient to handle the currently-set mode. At this point, the
> >> >> kernel should mark this particular configuration as being broken
> >> >> and potentially prune the mode before setting the offending connector's
> >> >> link-status to BAD and send the userspace a hotplug event. This may
> >> >> happen right after a modeset or later on.
> >> >>
> >> >> When available, we should use the link-status information to reset
> >> >> the wanted mode.
> >> >>
> >> >> Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
> >> >
> >> > The relevant kernel patches have landed in drm-tip about a month ago.
> >> >
> >> > Eric, would you mind providing feedback on or merging this patch?
> >> 
> >> The later discussion has sounded like the kernel will (always) prune the
> >> mode when we re-query, meaning that it doesn't make any sense to try to
> >> re-set to the old mode.  Is this not the case?
> >
> >
> > No the kernel will simply send a hotplug with link status as bad
> > and then after that point its userspace driver's responsibility
> > to check if link status is BAD, retry the same mode and if it fails
> > then re probe.
> 
> So the kernel will sometimes allow the same mode to be re-set with the
> same bpp?

So when userspace driver re-sets the same mode, the kernel will call the
mode valid function where it will see it can allow the sam mode perhaps
at a lower bpp now since the link parameters are lowered.
So the mode which failed at 30 bpp, might still work at 18bpp and is
better going to a lower resolution.

Regards
Manasi
Eric Anholt April 1, 2017, 12:22 a.m. UTC | #29
Manasi Navare <manasi.d.navare@intel.com> writes:

> On Fri, Mar 31, 2017 at 01:08:41PM -0700, Eric Anholt wrote:
>> Manasi Navare <manasi.d.navare@intel.com> writes:
>> 
>> > On Thu, Mar 30, 2017 at 05:37:55PM -0700, Eric Anholt wrote:
>> >> Martin Peres <martin.peres@linux.intel.com> writes:
>> >> 
>> >> > On 26/01/17 14:37, Martin Peres wrote:
>> >> >> Despite all the careful planing of the kernel, a link may become
>> >> >> insufficient to handle the currently-set mode. At this point, the
>> >> >> kernel should mark this particular configuration as being broken
>> >> >> and potentially prune the mode before setting the offending connector's
>> >> >> link-status to BAD and send the userspace a hotplug event. This may
>> >> >> happen right after a modeset or later on.
>> >> >>
>> >> >> When available, we should use the link-status information to reset
>> >> >> the wanted mode.
>> >> >>
>> >> >> Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
>> >> >
>> >> > The relevant kernel patches have landed in drm-tip about a month ago.
>> >> >
>> >> > Eric, would you mind providing feedback on or merging this patch?
>> >> 
>> >> The later discussion has sounded like the kernel will (always) prune the
>> >> mode when we re-query, meaning that it doesn't make any sense to try to
>> >> re-set to the old mode.  Is this not the case?
>> >
>> >
>> > No the kernel will simply send a hotplug with link status as bad
>> > and then after that point its userspace driver's responsibility
>> > to check if link status is BAD, retry the same mode and if it fails
>> > then re probe.
>> 
>> So the kernel will sometimes allow the same mode to be re-set with the
>> same bpp?
>
> So when userspace driver re-sets the same mode, the kernel will call the
> mode valid function where it will see it can allow the sam mode perhaps
> at a lower bpp now since the link parameters are lowered.
> So the mode which failed at 30 bpp, might still work at 18bpp and is
> better going to a lower resolution.

The question was whether the kernel will ever allow the same mode at the
same bpp, since that's what this patch tries to do.
Daniel Vetter April 2, 2017, 12:28 p.m. UTC | #30
On Fri, Mar 31, 2017 at 05:22:09PM -0700, Eric Anholt wrote:
> Manasi Navare <manasi.d.navare@intel.com> writes:
> 
> > On Fri, Mar 31, 2017 at 01:08:41PM -0700, Eric Anholt wrote:
> >> Manasi Navare <manasi.d.navare@intel.com> writes:
> >> 
> >> > On Thu, Mar 30, 2017 at 05:37:55PM -0700, Eric Anholt wrote:
> >> >> Martin Peres <martin.peres@linux.intel.com> writes:
> >> >> 
> >> >> > On 26/01/17 14:37, Martin Peres wrote:
> >> >> >> Despite all the careful planing of the kernel, a link may become
> >> >> >> insufficient to handle the currently-set mode. At this point, the
> >> >> >> kernel should mark this particular configuration as being broken
> >> >> >> and potentially prune the mode before setting the offending connector's
> >> >> >> link-status to BAD and send the userspace a hotplug event. This may
> >> >> >> happen right after a modeset or later on.
> >> >> >>
> >> >> >> When available, we should use the link-status information to reset
> >> >> >> the wanted mode.
> >> >> >>
> >> >> >> Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
> >> >> >
> >> >> > The relevant kernel patches have landed in drm-tip about a month ago.
> >> >> >
> >> >> > Eric, would you mind providing feedback on or merging this patch?
> >> >> 
> >> >> The later discussion has sounded like the kernel will (always) prune the
> >> >> mode when we re-query, meaning that it doesn't make any sense to try to
> >> >> re-set to the old mode.  Is this not the case?
> >> >
> >> >
> >> > No the kernel will simply send a hotplug with link status as bad
> >> > and then after that point its userspace driver's responsibility
> >> > to check if link status is BAD, retry the same mode and if it fails
> >> > then re probe.
> >> 
> >> So the kernel will sometimes allow the same mode to be re-set with the
> >> same bpp?
> >
> > So when userspace driver re-sets the same mode, the kernel will call the
> > mode valid function where it will see it can allow the sam mode perhaps
> > at a lower bpp now since the link parameters are lowered.
> > So the mode which failed at 30 bpp, might still work at 18bpp and is
> > better going to a lower resolution.
> 
> The question was whether the kernel will ever allow the same mode at the
> same bpp, since that's what this patch tries to do.

Yes, this can happen. Doing a full modeset with recomputing clocks and
everything behind userspace's back might not be something the kernel
driver can pull of with a reasonable amount of effort, hence why we punt
to userspace. The interface spec makes this a CAN, not WILL, to allow less
unreasonable hw to handle these cases directly in the kernel driver. E.g.
plain link-retraining is handled in i915.ko still.
-Daniel
Eric Anholt April 3, 2017, 2:21 a.m. UTC | #31
Daniel Vetter <daniel@ffwll.ch> writes:

> On Fri, Mar 31, 2017 at 05:22:09PM -0700, Eric Anholt wrote:
>> Manasi Navare <manasi.d.navare@intel.com> writes:
>> 
>> > On Fri, Mar 31, 2017 at 01:08:41PM -0700, Eric Anholt wrote:
>> >> Manasi Navare <manasi.d.navare@intel.com> writes:
>> >> 
>> >> > On Thu, Mar 30, 2017 at 05:37:55PM -0700, Eric Anholt wrote:
>> >> >> Martin Peres <martin.peres@linux.intel.com> writes:
>> >> >> 
>> >> >> > On 26/01/17 14:37, Martin Peres wrote:
>> >> >> >> Despite all the careful planing of the kernel, a link may become
>> >> >> >> insufficient to handle the currently-set mode. At this point, the
>> >> >> >> kernel should mark this particular configuration as being broken
>> >> >> >> and potentially prune the mode before setting the offending connector's
>> >> >> >> link-status to BAD and send the userspace a hotplug event. This may
>> >> >> >> happen right after a modeset or later on.
>> >> >> >>
>> >> >> >> When available, we should use the link-status information to reset
>> >> >> >> the wanted mode.
>> >> >> >>
>> >> >> >> Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
>> >> >> >
>> >> >> > The relevant kernel patches have landed in drm-tip about a month ago.
>> >> >> >
>> >> >> > Eric, would you mind providing feedback on or merging this patch?
>> >> >> 
>> >> >> The later discussion has sounded like the kernel will (always) prune the
>> >> >> mode when we re-query, meaning that it doesn't make any sense to try to
>> >> >> re-set to the old mode.  Is this not the case?
>> >> >
>> >> >
>> >> > No the kernel will simply send a hotplug with link status as bad
>> >> > and then after that point its userspace driver's responsibility
>> >> > to check if link status is BAD, retry the same mode and if it fails
>> >> > then re probe.
>> >> 
>> >> So the kernel will sometimes allow the same mode to be re-set with the
>> >> same bpp?
>> >
>> > So when userspace driver re-sets the same mode, the kernel will call the
>> > mode valid function where it will see it can allow the sam mode perhaps
>> > at a lower bpp now since the link parameters are lowered.
>> > So the mode which failed at 30 bpp, might still work at 18bpp and is
>> > better going to a lower resolution.
>> 
>> The question was whether the kernel will ever allow the same mode at the
>> same bpp, since that's what this patch tries to do.
>
> Yes, this can happen. Doing a full modeset with recomputing clocks and
> everything behind userspace's back might not be something the kernel
> driver can pull of with a reasonable amount of effort, hence why we punt
> to userspace. The interface spec makes this a CAN, not WILL, to allow less
> unreasonable hw to handle these cases directly in the kernel driver. E.g.
> plain link-retraining is handled in i915.ko still.

So in that case you do need userspace to re-request the same mode at the
same bpp?
Navare, Manasi April 3, 2017, 6:25 a.m. UTC | #32
On Sun, Apr 02, 2017 at 07:21:09PM -0700, Eric Anholt wrote:
> Daniel Vetter <daniel@ffwll.ch> writes:
> 
> > On Fri, Mar 31, 2017 at 05:22:09PM -0700, Eric Anholt wrote:
> >> Manasi Navare <manasi.d.navare@intel.com> writes:
> >> 
> >> > On Fri, Mar 31, 2017 at 01:08:41PM -0700, Eric Anholt wrote:
> >> >> Manasi Navare <manasi.d.navare@intel.com> writes:
> >> >> 
> >> >> > On Thu, Mar 30, 2017 at 05:37:55PM -0700, Eric Anholt wrote:
> >> >> >> Martin Peres <martin.peres@linux.intel.com> writes:
> >> >> >> 
> >> >> >> > On 26/01/17 14:37, Martin Peres wrote:
> >> >> >> >> Despite all the careful planing of the kernel, a link may become
> >> >> >> >> insufficient to handle the currently-set mode. At this point, the
> >> >> >> >> kernel should mark this particular configuration as being broken
> >> >> >> >> and potentially prune the mode before setting the offending connector's
> >> >> >> >> link-status to BAD and send the userspace a hotplug event. This may
> >> >> >> >> happen right after a modeset or later on.
> >> >> >> >>
> >> >> >> >> When available, we should use the link-status information to reset
> >> >> >> >> the wanted mode.
> >> >> >> >>
> >> >> >> >> Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
> >> >> >> >
> >> >> >> > The relevant kernel patches have landed in drm-tip about a month ago.
> >> >> >> >
> >> >> >> > Eric, would you mind providing feedback on or merging this patch?
> >> >> >> 
> >> >> >> The later discussion has sounded like the kernel will (always) prune the
> >> >> >> mode when we re-query, meaning that it doesn't make any sense to try to
> >> >> >> re-set to the old mode.  Is this not the case?
> >> >> >
> >> >> >
> >> >> > No the kernel will simply send a hotplug with link status as bad
> >> >> > and then after that point its userspace driver's responsibility
> >> >> > to check if link status is BAD, retry the same mode and if it fails
> >> >> > then re probe.
> >> >> 
> >> >> So the kernel will sometimes allow the same mode to be re-set with the
> >> >> same bpp?
> >> >
> >> > So when userspace driver re-sets the same mode, the kernel will call the
> >> > mode valid function where it will see it can allow the sam mode perhaps
> >> > at a lower bpp now since the link parameters are lowered.
> >> > So the mode which failed at 30 bpp, might still work at 18bpp and is
> >> > better going to a lower resolution.
> >> 
> >> The question was whether the kernel will ever allow the same mode at the
> >> same bpp, since that's what this patch tries to do.
> >
> > Yes, this can happen. Doing a full modeset with recomputing clocks and
> > everything behind userspace's back might not be something the kernel
> > driver can pull of with a reasonable amount of effort, hence why we punt
> > to userspace. The interface spec makes this a CAN, not WILL, to allow less
> > unreasonable hw to handle these cases directly in the kernel driver. E.g.
> > plain link-retraining is handled in i915.ko still.
> 
> So in that case you do need userspace to re-request the same mode at the
> same bpp?

So yes because when userspace requests the same mode at same bpp,
kernel will still call intel_dp->mod_valid which validates the mode
against 18bpp so if the requested mode can be displayed at the lowest of
18bpp, then the kernel will try to do the modeset for that mode at lower
bpp. What I am trying to say is irrespective of what bpp userspace requests,
kernel will check if it can display that at the lowest of 18bpp.

Regards
Manasi

> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Daniel Vetter April 3, 2017, 7:19 a.m. UTC | #33
On Mon, Apr 3, 2017 at 8:25 AM, Manasi Navare <manasi.d.navare@intel.com> wrote:
>> So in that case you do need userspace to re-request the same mode at the
>> same bpp?
>
> So yes because when userspace requests the same mode at same bpp,
> kernel will still call intel_dp->mod_valid which validates the mode
> against 18bpp so if the requested mode can be displayed at the lowest of
> 18bpp, then the kernel will try to do the modeset for that mode at lower
> bpp. What I am trying to say is irrespective of what bpp userspace requests,
> kernel will check if it can display that at the lowest of 18bpp.

You're talking about two different bpp here I think. Eric talks about
the pixel format of the framebuffer, Manasi here about the bpp we send
over the wire. The kernel will auto-dither if the wire bpp is lower
than the stuff we scan out. Same with 6bpc panels really.

Right now userspace can't request a specific bpp for the sink/pipe,
that's fully under the kernel's control. It only gets to set the pixel
format of fbs.
-Daniel
Navare, Manasi April 5, 2017, 6:13 p.m. UTC | #34
On Mon, Apr 03, 2017 at 09:19:35AM +0200, Daniel Vetter wrote:
> On Mon, Apr 3, 2017 at 8:25 AM, Manasi Navare <manasi.d.navare@intel.com> wrote:
> >> So in that case you do need userspace to re-request the same mode at the
> >> same bpp?
> >
> > So yes because when userspace requests the same mode at same bpp,
> > kernel will still call intel_dp->mod_valid which validates the mode
> > against 18bpp so if the requested mode can be displayed at the lowest of
> > 18bpp, then the kernel will try to do the modeset for that mode at lower
> > bpp. What I am trying to say is irrespective of what bpp userspace requests,
> > kernel will check if it can display that at the lowest of 18bpp.
> 
> You're talking about two different bpp here I think. Eric talks about
> the pixel format of the framebuffer, Manasi here about the bpp we send
> over the wire. The kernel will auto-dither if the wire bpp is lower
> than the stuff we scan out. Same with 6bpc panels really.
> 
> Right now userspace can't request a specific bpp for the sink/pipe,
> that's fully under the kernel's control. It only gets to set the pixel
> format of fbs.
> -Daniel
> -- 

Yes so in that case, Eric we do need userspace to re-request the same mode
at the same bpp or the same pixel format. 
And if this also fails, then we need Gnome or KDE to be re triggering a
complete re probe.

Regards
Manasi


> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Navare, Manasi April 6, 2017, 5:15 p.m. UTC | #35
On Sun, Apr 02, 2017 at 07:21:09PM -0700, Eric Anholt wrote:
> Daniel Vetter <daniel@ffwll.ch> writes:
> 
> > On Fri, Mar 31, 2017 at 05:22:09PM -0700, Eric Anholt wrote:
> >> Manasi Navare <manasi.d.navare@intel.com> writes:
> >> 
> >> > On Fri, Mar 31, 2017 at 01:08:41PM -0700, Eric Anholt wrote:
> >> >> Manasi Navare <manasi.d.navare@intel.com> writes:
> >> >> 
> >> >> > On Thu, Mar 30, 2017 at 05:37:55PM -0700, Eric Anholt wrote:
> >> >> >> Martin Peres <martin.peres@linux.intel.com> writes:
> >> >> >> 
> >> >> >> > On 26/01/17 14:37, Martin Peres wrote:
> >> >> >> >> Despite all the careful planing of the kernel, a link may become
> >> >> >> >> insufficient to handle the currently-set mode. At this point, the
> >> >> >> >> kernel should mark this particular configuration as being broken
> >> >> >> >> and potentially prune the mode before setting the offending connector's
> >> >> >> >> link-status to BAD and send the userspace a hotplug event. This may
> >> >> >> >> happen right after a modeset or later on.
> >> >> >> >>
> >> >> >> >> When available, we should use the link-status information to reset
> >> >> >> >> the wanted mode.
> >> >> >> >>
> >> >> >> >> Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
> >> >> >> >
> >> >> >> > The relevant kernel patches have landed in drm-tip about a month ago.
> >> >> >> >
> >> >> >> > Eric, would you mind providing feedback on or merging this patch?
> >> >> >> 
> >> >> >> The later discussion has sounded like the kernel will (always) prune the
> >> >> >> mode when we re-query, meaning that it doesn't make any sense to try to
> >> >> >> re-set to the old mode.  Is this not the case?
> >> >> >
> >> >> >
> >> >> > No the kernel will simply send a hotplug with link status as bad
> >> >> > and then after that point its userspace driver's responsibility
> >> >> > to check if link status is BAD, retry the same mode and if it fails
> >> >> > then re probe.
> >> >> 
> >> >> So the kernel will sometimes allow the same mode to be re-set with the
> >> >> same bpp?
> >> >
> >> > So when userspace driver re-sets the same mode, the kernel will call the
> >> > mode valid function where it will see it can allow the sam mode perhaps
> >> > at a lower bpp now since the link parameters are lowered.
> >> > So the mode which failed at 30 bpp, might still work at 18bpp and is
> >> > better going to a lower resolution.
> >> 
> >> The question was whether the kernel will ever allow the same mode at the
> >> same bpp, since that's what this patch tries to do.
> >
> > Yes, this can happen. Doing a full modeset with recomputing clocks and
> > everything behind userspace's back might not be something the kernel
> > driver can pull of with a reasonable amount of effort, hence why we punt
> > to userspace. The interface spec makes this a CAN, not WILL, to allow less
> > unreasonable hw to handle these cases directly in the kernel driver. E.g.
> > plain link-retraining is handled in i915.ko still.
> 
> So in that case you do need userspace to re-request the same mode at the
> same bpp?

Hi Eric,

Yes so we do need userspace to re-request the same mode at the same bpp/pixel rate.
Kernel will try that at that bpp or lower it. Its fully in kernel's control.
If it fails then the atomic_check phase will return a failure and in that case
the Gnome/KDE will have to do a full reprobe.

Eric, do you have any more concerns about this patch or can this be pushed to Xserver?

Its critical for thsi patch to be pushed to Xserver so that the entire Gfx stack can handle
link failures and we can see some of the bugs going away.

Regards
Manasi

> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
diff mbox

Patch

diff --git a/hw/xfree86/drivers/modesetting/drmmode_display.c b/hw/xfree86/drivers/modesetting/drmmode_display.c
index 6e755e9482..3144620c67 100644
--- a/hw/xfree86/drivers/modesetting/drmmode_display.c
+++ b/hw/xfree86/drivers/modesetting/drmmode_display.c
@@ -2262,6 +2262,10 @@  drmmode_setup_colormap(ScreenPtr pScreen, ScrnInfoPtr pScrn)
 }
 
 #ifdef CONFIG_UDEV_KMS
+
+#define DRM_MODE_LINK_STATUS_GOOD       0
+#define DRM_MODE_LINK_STATUS_BAD        1
+
 static void
 drmmode_handle_uevents(int fd, void *closure)
 {
@@ -2281,6 +2285,49 @@  drmmode_handle_uevents(int fd, void *closure)
     if (!found)
         return;
 
+    /* Try to re-set the mode on all the connectors with a BAD link-state:
+     * This may happen if a link degrades and a new modeset is necessary, using
+     * different link-training parameters. If the kernel found that the current
+     * mode is not achievable anymore, it should have pruned the mode before
+     * sending the hotplug event. Try to re-set the currently-set mode to keep
+     * the display alive, this will fail if the mode has been pruned.
+     * In any case, we will send randr events for the Desktop Environment to
+     * deal with it, if it wants to.
+     */
+    for (i = 0; i < config->num_output; i++) {
+        xf86OutputPtr output = config->output[i];
+        drmmode_output_private_ptr drmmode_output = output->driver_private;
+        uint32_t con_id = drmmode_output->mode_output->connector_id;
+        drmModeConnectorPtr koutput;
+
+        /* Get an updated view of the properties for the current connector and
+         * look for the link-status property
+         */
+        koutput = drmModeGetConnectorCurrent(drmmode->fd, con_id);
+        for (j = 0; koutput && j < koutput->count_props; j++) {
+            drmModePropertyPtr props;
+            props = drmModeGetProperty(drmmode->fd, koutput->props[j]);
+            if (props && props->flags & DRM_MODE_PROP_ENUM &&
+                !strcmp(props->name, "link-status") &&
+                koutput->prop_values[j] == DRM_MODE_LINK_STATUS_BAD) {
+                xf86CrtcPtr crtc = output->crtc;
+                if (!crtc)
+                    continue;
+
+                /* the connector got a link failure, re-set the current mode */
+                drmmode_set_mode_major(crtc, &crtc->mode, crtc->rotation,
+                                       crtc->x, crtc->y);
+
+                xf86DrvMsg(scrn->scrnIndex, X_WARNING,
+                           "hotplug event: connector %u's link-state is BAD, "
+                           "tried resetting the current mode. You may be left"
+                           "with a black screen if this fails...\n", con_id);
+            }
+            drmModeFreeProperty(props);
+        }
+        drmModeFreeConnector(koutput);
+    }
+
     mode_res = drmModeGetResources(drmmode->fd);
     if (!mode_res)
         goto out;
@@ -2345,6 +2392,10 @@  out_free_res:
 out:
     RRGetInfo(xf86ScrnToScreen(scrn), TRUE);
 }
+
+#undef DRM_MODE_LINK_STATUS_BAD
+#undef DRM_MODE_LINK_STATUS_GOOD
+
 #endif
 
 void