
[2/4] drm/dp_mst: Only create connector for connected end device

Message ID 20210720160342.11415-3-Wayne.Lin@amd.com (mailing list archive)
State New, archived
Series Unregister mst connectors when hotplug

Commit Message

Lin, Wayne July 20, 2021, 4:03 p.m. UTC
[Why]
Currently, we create connectors for all output ports regardless of
whether they are connected. However, in MST, we can only determine
whether an output port really stands for a "connector" once it is
connected and we have checked that its peer device type is an end device.

In the current code, we may create connectors for output ports that are
connected to a branch device, and these connectors are redundant. E.g.
the StarTech 1-to-4 DP hub is built from two layers of internal 1-to-2
branch devices. Creating connectors for such internal output ports is
redundant.

[How]
Constrain connector creation to connected end devices only.

Fixes: 6f85f73821f6 ("drm/dp_mst: Add basic topology reprobing when resuming")
Cc: Juston Li <juston.li@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Harry Wentland <hwentlan@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Sean Paul <sean@poorly.run>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Eryk Brol <eryk.brol@amd.com>
Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: Nikola Cornij <nikola.cornij@amd.com>
Cc: Wayne Lin <Wayne.Lin@amd.com>
Cc: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Manasi Navare <manasi.d.navare@intel.com>
Cc: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
Cc: "José Roberto de Souza" <jose.souza@intel.com>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v5.5+
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
---
 drivers/gpu/drm/drm_dp_mst_topology.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
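
The constraint described under [How] can be sketched as a standalone predicate.
This is only an illustrative model, not the kernel code: the enum values,
struct port_model and should_create_connector() are hypothetical names, and
is_end_device() is a simplified stand-in for drm_dp_mst_is_end_device() that
assumes a branching peer without message capability counts as an end device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of peer device types (stand-ins for DP_PEER_DEVICE_*). */
enum peer_device_type {
	PDT_NONE,           /* nothing is connected to the port */
	PDT_MST_BRANCHING,  /* a branch device */
	PDT_SST_SINK,       /* a display */
	PDT_DP_LEGACY_CONV, /* a DP-to-legacy converter */
};

/* Simplified port model: only the fields the condition reads. */
struct port_model {
	bool input;         /* true for input ports */
	bool has_connector; /* a connector already exists for this port */
	enum peer_device_type pdt;
	bool mcs;           /* message capability status of the peer */
};

/* Assumption: sinks, legacy converters and non-MST branches are end devices. */
bool is_end_device(enum peer_device_type pdt, bool mcs)
{
	switch (pdt) {
	case PDT_SST_SINK:
	case PDT_DP_LEGACY_CONV:
		return true;
	case PDT_MST_BRANCHING:
		return !mcs;
	default:
		return false;
	}
}

/* Create a connector only for an output port with a connected end device. */
bool should_create_connector(const struct port_model *port)
{
	assert(port != NULL);
	return !port->input && !port->has_connector &&
	       port->pdt != PDT_NONE &&
	       is_end_device(port->pdt, port->mcs);
}
```

With this model, an output port whose peer is a display qualifies for a
connector, while an output port feeding an MST-capable branch device or a
disconnected output port does not.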

Comments

Lyude Paul Aug. 3, 2021, 11:58 p.m. UTC | #1
On Wed, 2021-07-21 at 00:03 +0800, Wayne Lin wrote:
> [Why]
> Currently, we will create connectors for all output ports no matter
> it's connected or not. However, in MST, we can only determine
> whether an output port really stands for a "connector" till it is
> connected and check its peer device type as an end device.

What is this commit trying to solve exactly? e.g. is AMD currently running
into issues with there being too many DRM connectors or something like that?
Ideally this is behavior I'd very much like us to keep as-is unless there's
good reason to change it.

Some context here btw - there are a lot of subtleties with MST locking that
aren't immediately obvious. It's been a while since I wrote this code, but if I
recall correctly one of those subtleties is that trying to create/destroy
connectors on the fly when ports change types introduces a lot of potential
issues with locking and some very complicated state transitions. Note that
because we maintain the topology as much as possible across suspend/resumes
this means there's a lot of potential state transitions with drm_dp_mst_port
and drm_dp_mst_branch we need to handle that would typically be impossible to
run into otherwise.

An example of this, if we were to try to prune connectors based on PDT
on the fly: assume we have a simple topology like this

Root MSTB -> Port 1 -> MSTB 1.1 (Connected w/ display)
          -> Port 2 -> MSTB 2.1

We suspend the system, unplug MSTB 1.1, and then resume. Once the
system starts reprobing, it will notice that MSTB 1.1 has been
disconnected. Since we no longer have a PDT, we decide to unregister our
connector. But there's a catch! We had a display connected to MSTB 1.1,
so even after unregistering the connector it's going to stay around
until userspace has committed a new mode with the connector disabled.

Now - assuming we're still in the same spot in the resume process, let's assume
somehow MSTB 1.1 is suddenly plugged back in. Once we've finished
responding to the hotplug event, we will have created a connector for
it. Now we've hit a bug - userspace hasn't removed the previous zombie
connector which means we have references to the drm_dp_mst_port in our
atomic state and potentially also our payload tables (?? unsure about
this one).

So then how do we manage to add/remove connectors for input connectors
on the fly? Well, that's one of the fun normally-impossible state
transitions I mentioned before. According to the spec input ports are always
disconnected, so we'll never receive a CSN for them. This means in
theory the only possible way we could have a connector go from being an
input connector to an output connector would be if the entire
topology was swapped out during suspend/resume, and the input/output
ports in the two topologies happen to be in different places.
Since we only have to reprobe once during resume before we get
hotplugging enabled, we're guaranteed this state transition will only
happen once in this state - which means the second replug I described in
the previous paragraph can never happen.

Note that while I don't actually know if there are topologies with input
ports at indexes other than 0, since the specification isn't super clear
on this bit we play it safe and assume it is possible.

Anyway - this is -all- based off my memory, so please point out anything
here that I've explained that doesn't make sense or doesn't seem
correct :). It's totally possible I might have misremembered something.

> 
> In current code, we have chance to create connectors for output ports
> connected with branch device and these are redundant connectors. e.g.
> StarTech 1-to-4 DP hub is constructed by internal 2 layer 1-to-2 branch
> devices. Creating connectors for such internal output ports are
> redundant.
> 
> [How]
> Put constraint on creating connector for connected end device only.
> 
> Fixes: 6f85f73821f6 ("drm/dp_mst: Add basic topology reprobing when
> resuming")
> Cc: Juston Li <juston.li@intel.com>
> Cc: Imre Deak <imre.deak@intel.com>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Cc: Harry Wentland <hwentlan@amd.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Sean Paul <sean@poorly.run>
> Cc: Lyude Paul <lyude@redhat.com>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Cc: Maxime Ripard <mripard@kernel.org>
> Cc: Thomas Zimmermann <tzimmermann@suse.de>
> Cc: David Airlie <airlied@linux.ie>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
> Cc: Eryk Brol <eryk.brol@amd.com>
> Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
> Cc: Nikola Cornij <nikola.cornij@amd.com>
> Cc: Wayne Lin <Wayne.Lin@amd.com>
> Cc: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
> Cc: Jani Nikula <jani.nikula@intel.com>
> Cc: Manasi Navare <manasi.d.navare@intel.com>
> Cc: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
> Cc: "José Roberto de Souza" <jose.souza@intel.com>
> Cc: Sean Paul <seanpaul@chromium.org>
> Cc: Ben Skeggs <bskeggs@redhat.com>
> Cc: dri-devel@lists.freedesktop.org
> Cc: <stable@vger.kernel.org> # v5.5+
> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
> ---
>  drivers/gpu/drm/drm_dp_mst_topology.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
> b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 51cd7f74f026..f13c7187b07f 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -2474,7 +2474,8 @@ drm_dp_mst_handle_link_address_port(struct
> drm_dp_mst_branch *mstb,
>  
>         if (port->connector)
>                 drm_modeset_unlock(&mgr->base.lock);
> -       else if (!port->input)
> +       else if (!port->input && port->pdt != DP_PEER_DEVICE_NONE &&
> +                drm_dp_mst_is_end_device(port->pdt, port->mcs))
>                 drm_dp_mst_port_add_connector(mstb, port);
>  
>         if (send_link_addr && port->mstb) {
> @@ -2557,6 +2558,10 @@ drm_dp_mst_handle_conn_stat(struct drm_dp_mst_branch
> *mstb,
>                 dowork = false;
>         }
>  
> +       if (!port->input && !port->connector && new_pdt !=
> DP_PEER_DEVICE_NONE &&
> +           drm_dp_mst_is_end_device(new_pdt, new_mcs))
> +               create_connector = true;
> +
>         if (port->connector)
>                 drm_modeset_unlock(&mgr->base.lock);
>         else if (create_connector)
Lyude Paul Aug. 4, 2021, 12:08 a.m. UTC | #2
On Tue, 2021-08-03 at 19:58 -0400, Lyude Paul wrote:
> On Wed, 2021-07-21 at 00:03 +0800, Wayne Lin wrote:
> > [Why]
> > Currently, we will create connectors for all output ports no matter
> > it's connected or not. However, in MST, we can only determine
> > whether an output port really stands for a "connector" till it is
> > connected and check its peer device type as an end device.
> 
> What is this commit trying to solve exactly? e.g. is AMD currently running
> into issues with there being too many DRM connectors or something like that?
> Ideally this is behavior I'd very much like us to keep as-is unless there's
> good reason to change it.
> 
> Some context here btw - there's a lot of subtleties with MST locking that
> isn't immediately obvious. It's been a while since I wrote this code, but if
> I
> recall correctly one of those subtleties is that trying to create/destroy
> connectors on the fly when ports change types introduces a lot of potential
> issues with locking and some very complicated state transitions. Note that
> because we maintain the topology as much as possible across suspend/resumes
> this means there's a lot of potential state transitions with drm_dp_mst_port
> and drm_dp_mst_branch we need to handle that would typically be impossible
> to
> run into otherwise.
> 
> An example of this, if we were to try to prune connectors based on PDT
> on the fly: assume we have a simple topology like this
> 
> Root MSTB -> Port 1 -> MSTB 1.1 (Connected w/ display)
>           -> Port 2 -> MSTB 2.1
> 
> We suspend the system, unplug MSTB 1.1, and then resume. Once the
> system starts reprobing, it will notice that MSTB 1.1 has been
> disconnected. Since we no longer have a PDT, we decide to unregister our
> connector. But there's a catch! We had a display connected to MSTB 1.1,
> so even after unregistering the connector it's going to stay around
> until userspace has committed a new mode with the connector disabled.
> 
> Now - assuming we're still in the same spot in the resume process, let's
> assume somehow MSTB 1.1 is suddenly plugged back in. Once we've finished
> responding to the hotplug event, we will have created a connector for
> it. Now we've hit a bug - userspace hasn't removed the previous zombie
> connector which means we have references to the drm_dp_mst_port in our
> atomic state and potentially also our payload tables (?? unsure about
> this one).

Whoops. One thing I totally forgot to mention here: the reason this is a
problem is because we'd now have two drm_connectors which both have the same
drm_dp_mst_port pointer.
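
The aliasing described above can be sketched with a toy model. None of this
is the DRM API: struct port_model, struct connector_model, the in_use flag
(standing in for "userspace has not yet committed a mode with the connector
disabled") and the helper functions are all hypothetical, and the model
assumes an unregistered connector stays alive for as long as it is in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CONNECTORS 4

struct port_model {
	int port_num;
};

struct connector_model {
	struct port_model *port; /* back-pointer to the MST port */
	bool registered;         /* visible to userspace */
	bool in_use;             /* still referenced by committed state */
};

struct topology_model {
	struct connector_model connectors[MAX_CONNECTORS];
	size_t count;
};

/* Create and register a new connector for the given port. */
struct connector_model *add_connector(struct topology_model *topo,
				      struct port_model *port)
{
	struct connector_model *conn;

	assert(topo->count < MAX_CONNECTORS);
	conn = &topo->connectors[topo->count++];
	conn->port = port;
	conn->registered = true;
	conn->in_use = false;
	return conn;
}

/* Unregistering hides the connector; it lingers while it is still in use. */
void unregister_connector(struct connector_model *conn)
{
	conn->registered = false;
}

/* Count live (registered or still in use) connectors pointing at a port. */
size_t live_connectors_for_port(const struct topology_model *topo,
				const struct port_model *port)
{
	size_t n = 0;

	for (size_t i = 0; i < topo->count; i++) {
		const struct connector_model *conn = &topo->connectors[i];

		if (conn->port == port && (conn->registered || conn->in_use))
			n++;
	}
	return n;
}
```

Replaying the scenario in this model (add a connector, mark it in use,
unregister it on unplug, add a second connector on replug) leaves two live
connectors holding the same port pointer until the first one is released.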

> 
> So then how do we manage to add/remove connectors for input connectors
> on the fly? Well, that's one of the fun normally-impossible state
> transitions I mentioned before. According to the spec input ports are always
> disconnected, so we'll never receive a CSN for them. This means in
> theory the only possible way we could have a connector go from being an
> input connector to an output connector would be if the entire
> topology was swapped out during suspend/resume, and the input/output
> ports in the two topologies happen to be in different places.
> Since we only have to reprobe once during resume before we get
> hotplugging enabled, we're guaranteed this state transition will only
> happen once in this state - which means the second replug I described in
> the previous paragraph can never happen.
> 
> Note that while I don't actually know if there's topologies with input
> ports at indexes other than 0, since the specification isn't super clear
> on this bit we play it safe and assume it is possible.
> 
> Anyway-this is -all- based off my memory, so please point out anything
> here that I've explained that doesn't make sense or doesn't seem
> correct :). It's totally possible I might have misremembered something.
> 
> > 
> > In current code, we have chance to create connectors for output ports
> > connected with branch device and these are redundant connectors. e.g.
> > StarTech 1-to-4 DP hub is constructed by internal 2 layer 1-to-2 branch
> > devices. Creating connectors for such internal output ports are
> > redundant.
> > 
> > [How]
> > Put constraint on creating connector for connected end device only.
> > 
> > Fixes: 6f85f73821f6 ("drm/dp_mst: Add basic topology reprobing when
> > resuming")
> > Cc: Juston Li <juston.li@intel.com>
> > Cc: Imre Deak <imre.deak@intel.com>
> > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > Cc: Harry Wentland <hwentlan@amd.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: Sean Paul <sean@poorly.run>
> > Cc: Lyude Paul <lyude@redhat.com>
> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > Cc: Maxime Ripard <mripard@kernel.org>
> > Cc: Thomas Zimmermann <tzimmermann@suse.de>
> > Cc: David Airlie <airlied@linux.ie>
> > Cc: Daniel Vetter <daniel@ffwll.ch>
> > Cc: Alex Deucher <alexander.deucher@amd.com>
> > Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> > Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
> > Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
> > Cc: Eryk Brol <eryk.brol@amd.com>
> > Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
> > Cc: Nikola Cornij <nikola.cornij@amd.com>
> > Cc: Wayne Lin <Wayne.Lin@amd.com>
> > Cc: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
> > Cc: Jani Nikula <jani.nikula@intel.com>
> > Cc: Manasi Navare <manasi.d.navare@intel.com>
> > Cc: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
> > Cc: "José Roberto de Souza" <jose.souza@intel.com>
> > Cc: Sean Paul <seanpaul@chromium.org>
> > Cc: Ben Skeggs <bskeggs@redhat.com>
> > Cc: dri-devel@lists.freedesktop.org
> > Cc: <stable@vger.kernel.org> # v5.5+
> > Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
> > ---
> >  drivers/gpu/drm/drm_dp_mst_topology.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
> > b/drivers/gpu/drm/drm_dp_mst_topology.c
> > index 51cd7f74f026..f13c7187b07f 100644
> > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > @@ -2474,7 +2474,8 @@ drm_dp_mst_handle_link_address_port(struct
> > drm_dp_mst_branch *mstb,
> >  
> >         if (port->connector)
> >                 drm_modeset_unlock(&mgr->base.lock);
> > -       else if (!port->input)
> > +       else if (!port->input && port->pdt != DP_PEER_DEVICE_NONE &&
> > +                drm_dp_mst_is_end_device(port->pdt, port->mcs))
> >                 drm_dp_mst_port_add_connector(mstb, port);
> >  
> >         if (send_link_addr && port->mstb) {
> > @@ -2557,6 +2558,10 @@ drm_dp_mst_handle_conn_stat(struct
> > drm_dp_mst_branch
> > *mstb,
> >                 dowork = false;
> >         }
> >  
> > +       if (!port->input && !port->connector && new_pdt !=
> > DP_PEER_DEVICE_NONE &&
> > +           drm_dp_mst_is_end_device(new_pdt, new_mcs))
> > +               create_connector = true;
> > +
> >         if (port->connector)
> >                 drm_modeset_unlock(&mgr->base.lock);
> >         else if (create_connector)
>
Lin, Wayne Aug. 4, 2021, 7:13 a.m. UTC | #3
[Public]

> -----Original Message-----
> From: Lyude Paul <lyude@redhat.com>
> Sent: Wednesday, August 4, 2021 8:09 AM
> To: Lin, Wayne <Wayne.Lin@amd.com>; dri-devel@lists.freedesktop.org
> Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Wentland, Harry <Harry.Wentland@amd.com>; Zuo, Jerry
> <Jerry.Zuo@amd.com>; Wu, Hersen <hersenxs.wu@amd.com>; Juston Li <juston.li@intel.com>; Imre Deak <imre.deak@intel.com>;
> Ville Syrjälä <ville.syrjala@linux.intel.com>; Wentland, Harry <Harry.Wentland@amd.com>; Daniel Vetter <daniel.vetter@ffwll.ch>;
> Sean Paul <sean@poorly.run>; Maarten Lankhorst <maarten.lankhorst@linux.intel.com>; Maxime Ripard <mripard@kernel.org>;
> Thomas Zimmermann <tzimmermann@suse.de>; David Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; Deucher,
> Alexander <Alexander.Deucher@amd.com>; Siqueira, Rodrigo <Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo
> <Aurabindo.Pillai@amd.com>; Eryk Brol <eryk.brol@amd.com>; Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>; Cornij, Nikola
> <Nikola.Cornij@amd.com>; Jani Nikula <jani.nikula@intel.com>; Manasi Navare <manasi.d.navare@intel.com>; Ankit Nautiyal
> <ankit.k.nautiyal@intel.com>; José Roberto de Souza <jose.souza@intel.com>; Sean Paul <seanpaul@chromium.org>; Ben Skeggs
> <bskeggs@redhat.com>; stable@vger.kernel.org
> Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected end device
>
> On Tue, 2021-08-03 at 19:58 -0400, Lyude Paul wrote:
> > On Wed, 2021-07-21 at 00:03 +0800, Wayne Lin wrote:
> > > [Why]
> > > Currently, we will create connectors for all output ports no matter
> > > it's connected or not. However, in MST, we can only determine
> > > whether an output port really stands for a "connector" till it is
> > > connected and check its peer device type as an end device.
> >
> > What is this commit trying to solve exactly? e.g. is AMD currently
> > running into issues with there being too many DRM connectors or something like that?
> > Ideally this is behavior I'd very much like us to keep as-is unless
> > there's good reason to change it.
Hi Lyude,
I really appreciate you taking the time to elaborate in such detail. Thanks!

I came up with this commit because I observed something confusing when I was analyzing
the life cycle of MST connectors. Take an instance of the topology you mentioned below:

Root MSTB -> Output_Port 1 -> MSTB 1.1 -> Output_Port 1 (Connected w/ display)
          |                            -> Output_Port 2 (Disconnected)
          -> Output_Port 2 -> MSTB 2.1 -> Output_Port 1 (Disconnected)
                                       -> Output_Port 2 (Disconnected)

This is exactly the topology of the StarTech DP 1-to-4 hub. There are three 1-to-2 branch
chips within this hub. With our MST implementation today, we create drm connectors for all
output ports, so we create 6 drm connectors in total here. However, the output ports of the
root MSTB are not connected to a stream sink; they are connected to branch devices.
Thus, creating a drm connector for such a port looks a bit strange to me and makes
tracking drm connectors more complex. My thought is that we only need to create drm
connectors for connected end devices. Once an output port is connected, we can
determine whether to add a drm connector for this port based on the peer device type.
Hence, this commit doesn't try to break the locking logic; it only adds more constraints
on when we try to add a drm connector. Please correct me if I misunderstand anything here. Thanks!
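
The connector counts for this topology can be checked with a small model.
This is only a sketch: the struct and function names are hypothetical, and
PEER_BRANCH is assumed to mean an MST-capable branch device, i.e. not an end
device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum peer_kind { PEER_NONE, PEER_BRANCH, PEER_SINK };

struct branch_model;

struct out_port_model {
	enum peer_kind peer;
	const struct branch_model *child; /* set only when peer == PEER_BRANCH */
};

struct branch_model {
	const struct out_port_model *ports;
	size_t num_ports;
};

/*
 * Count connectors below a branch device. With end_devices_only == false
 * every output port gets one; with true, only ports with a connected sink.
 */
size_t count_connectors(const struct branch_model *mstb, bool end_devices_only)
{
	size_t n = 0;

	assert(mstb != NULL);
	for (size_t i = 0; i < mstb->num_ports; i++) {
		const struct out_port_model *port = &mstb->ports[i];

		if (!end_devices_only || port->peer == PEER_SINK)
			n++;
		if (port->peer == PEER_BRANCH && port->child)
			n += count_connectors(port->child, end_devices_only);
	}
	return n;
}

/* The 1-to-4 hub from the diagram: three 1-to-2 branch chips, one display. */
const struct out_port_model mstb_1_1_ports[] = {
	{ PEER_SINK, NULL }, { PEER_NONE, NULL },
};
const struct out_port_model mstb_2_1_ports[] = {
	{ PEER_NONE, NULL }, { PEER_NONE, NULL },
};
const struct branch_model mstb_1_1 = { mstb_1_1_ports, 2 };
const struct branch_model mstb_2_1 = { mstb_2_1_ports, 2 };
const struct out_port_model root_ports[] = {
	{ PEER_BRANCH, &mstb_1_1 }, { PEER_BRANCH, &mstb_2_1 },
};
const struct branch_model root_mstb = { root_ports, 2 };
```

In this model, count_connectors(&root_mstb, false) gives the 6 connectors
mentioned above, while count_connectors(&root_mstb, true) gives 1, for the
single port with a display attached.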
> >
> > Some context here btw - there's a lot of subtleties with MST locking
> > that isn't immediately obvious. It's been a while since I wrote this
> > code, but if I recall correctly one of those subtleties is that trying
> > to create/destroy connectors on the fly when ports change types
> > introduces a lot of potential issues with locking and some very
> > complicated state transitions. Note that because we maintain the
> > topology as much as possible across suspend/resumes this means there's
> > a lot of potential state transitions with drm_dp_mst_port and
> > drm_dp_mst_branch we need to handle that would typically be impossible
> > to run into otherwise.
> >
> > An example of this, if we were to try to prune connectors based on PDT
> > on the fly: assume we have a simple topology like this
> >
> > Root MSTB -> Port 1 -> MSTB 1.1 (Connected w/ display)
> >           -> Port 2 -> MSTB 2.1
> >
> > We suspend the system, unplug MSTB 1.1, and then resume. Once the
> > system starts reprobing, it will notice that MSTB 1.1 has been
> > disconnected. Since we no longer have a PDT, we decide to unregister
> > our connector. But there's a catch! We had a display connected to MSTB
> > 1.1, so even after unregistering the connector it's going to stay
> > around until userspace has committed a new mode with the connector disabled.
> >
> > > Now - assuming we're still in the same spot in the resume process,
> > let's assume somehow MSTB 1.1 is suddenly plugged back in. Once we've
> > finished responding to the hotplug event, we will have created a
> > connector for it. Now we've hit a bug - userspace hasn't removed the
> > previous zombie connector which means we have references to the
> > drm_dp_mst_port in our atomic state and potentially also our payload
> > tables (?? unsure about this one).
>
> Whoops. One thing I totally forgot to mention here: the reason this is a problem is because we'd now have two drm_connectors
> which both have the same drm_dp_mst_port pointer.
>
> >
> > So then how do we manage to add/remove connectors for input connectors
> > on the fly? Well, that's one of the fun normally-impossible state
> > transitions I mentioned before. According to the spec input ports are
> > always disconnected, so we'll never receive a CSN for them. This means
I think the DisplayPort_Device_Plug_Status field of input ports is still set to 1? But yes,
according to DP1.4 spec 2.11.9.3, an MST device whose DPRX detects a
connection status change shall broadcast the CSN downstream only. Hence, we'll never
receive a CSN in this case.
> > in theory the only possible way we could have a connector go from
> > being an input connector to an output connector would be if
> > the entire topology was swapped out during suspend/resume, and the
> > input/output ports in the two topologies happen to be in different places.
> > Since we only have to reprobe once during resume before we get
> > hotplugging enabled, we're guaranteed this state transition will only
> > happen once in this state - which means the second replug I described
> > in the previous paragraph can never happen.
> >
> > Note that while I don't actually know if there's topologies with input
> > ports at indexes other than 0, since the specification isn't super
> > clear on this bit we play it safe and assume it is possible.
Based on DP1.4 spec 2.5.1, physical input ports are assigned smaller port
numbers than physical output ports. For a concentrator product, if its branch
device has 2 input ports, then their port numbers are port 0 & port 1;
see figure 2-122 of DP1.4.
> >
> > Anyway-this is -all- based off my memory, so please point out anything
> > here that I've explained that doesn't make sense or doesn't seem
> > correct :). It's totally possible I might have misremembered something.
Thanks again Lyude! I really appreciate your time and help! And please
correct me if I misunderstand anything here : )
> >
> > >
> > > In current code, we have chance to create connectors for output
> > > ports connected with branch device and these are redundant connectors. e.g.
> > > StarTech 1-to-4 DP hub is constructed by internal 2 layer 1-to-2
> > > branch devices. Creating connectors for such internal output ports
> > > are redundant.
> > >
> > > [How]
> > > Put constraint on creating connector for connected end device only.
> > >
> > > Fixes: 6f85f73821f6 ("drm/dp_mst: Add basic topology reprobing when
> > > resuming")
> > > Cc: Juston Li <juston.li@intel.com>
> > > Cc: Imre Deak <imre.deak@intel.com>
> > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > Cc: Harry Wentland <hwentlan@amd.com>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Cc: Sean Paul <sean@poorly.run>
> > > Cc: Lyude Paul <lyude@redhat.com>
> > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > > Cc: Maxime Ripard <mripard@kernel.org>
> > > Cc: Thomas Zimmermann <tzimmermann@suse.de>
> > > Cc: David Airlie <airlied@linux.ie>
> > > Cc: Daniel Vetter <daniel@ffwll.ch>
> > > Cc: Alex Deucher <alexander.deucher@amd.com>
> > > Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> > > Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
> > > Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
> > > Cc: Eryk Brol <eryk.brol@amd.com>
> > > Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
> > > Cc: Nikola Cornij <nikola.cornij@amd.com>
> > > Cc: Wayne Lin <Wayne.Lin@amd.com>
> > > Cc: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
> > > Cc: Jani Nikula <jani.nikula@intel.com>
> > > Cc: Manasi Navare <manasi.d.navare@intel.com>
> > > Cc: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
> > > Cc: "José Roberto de Souza" <jose.souza@intel.com>
> > > Cc: Sean Paul <seanpaul@chromium.org>
> > > Cc: Ben Skeggs <bskeggs@redhat.com>
> > > Cc: dri-devel@lists.freedesktop.org
> > > Cc: <stable@vger.kernel.org> # v5.5+
> > > Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
> > > ---
> > >  drivers/gpu/drm/drm_dp_mst_topology.c | 7 ++++++-
> > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
> > > b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > index 51cd7f74f026..f13c7187b07f 100644
> > > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > @@ -2474,7 +2474,8 @@ drm_dp_mst_handle_link_address_port(struct
> > > drm_dp_mst_branch *mstb,
> > >
> > >         if (port->connector)
> > >                 drm_modeset_unlock(&mgr->base.lock);
> > > -       else if (!port->input)
> > > +       else if (!port->input && port->pdt != DP_PEER_DEVICE_NONE &&
> > > +                drm_dp_mst_is_end_device(port->pdt, port->mcs))
> > >                 drm_dp_mst_port_add_connector(mstb, port);
> > >
> > >         if (send_link_addr && port->mstb) {
> > > @@ -2557,6 +2558,10 @@ drm_dp_mst_handle_conn_stat(struct
> > > drm_dp_mst_branch
> > > *mstb,
> > >                 dowork = false;
> > >         }
> > >
> > > +       if (!port->input && !port->connector && new_pdt !=
> > > DP_PEER_DEVICE_NONE &&
> > > +           drm_dp_mst_is_end_device(new_pdt, new_mcs))
> > > +               create_connector = true;
> > > +
> > >         if (port->connector)
> > >                 drm_modeset_unlock(&mgr->base.lock);
> > >         else if (create_connector)
> >
>
> --
> Cheers,
>  Lyude Paul (she/her)
>  Software Engineer at Red Hat
Regards,
Wayne Lin
Lyude Paul Aug. 10, 2021, 8:45 p.m. UTC | #4
On Wed, 2021-08-04 at 07:13 +0000, Lin, Wayne wrote:
> [Public]
> 
> > -----Original Message-----
> > From: Lyude Paul <lyude@redhat.com>
> > Sent: Wednesday, August 4, 2021 8:09 AM
> > To: Lin, Wayne <Wayne.Lin@amd.com>; dri-devel@lists.freedesktop.org
> > Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Wentland, Harry <
> > Harry.Wentland@amd.com>; Zuo, Jerry
> > <Jerry.Zuo@amd.com>; Wu, Hersen <hersenxs.wu@amd.com>; Juston Li <
> > juston.li@intel.com>; Imre Deak <imre.deak@intel.com>;
> > Ville Syrjälä <ville.syrjala@linux.intel.com>; Wentland, Harry <
> > Harry.Wentland@amd.com>; Daniel Vetter <daniel.vetter@ffwll.ch>;
> > Sean Paul <sean@poorly.run>; Maarten Lankhorst <
> > maarten.lankhorst@linux.intel.com>; Maxime Ripard <mripard@kernel.org>;
> > Thomas Zimmermann <tzimmermann@suse.de>; David Airlie <airlied@linux.ie>;
> > Daniel Vetter <daniel@ffwll.ch>; Deucher,
> > Alexander <Alexander.Deucher@amd.com>; Siqueira, Rodrigo <
> > Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo
> > <Aurabindo.Pillai@amd.com>; Eryk Brol <eryk.brol@amd.com>; Bas
> > Nieuwenhuizen <bas@basnieuwenhuizen.nl>; Cornij, Nikola
> > <Nikola.Cornij@amd.com>; Jani Nikula <jani.nikula@intel.com>; Manasi
> > Navare <manasi.d.navare@intel.com>; Ankit Nautiyal
> > <ankit.k.nautiyal@intel.com>; José Roberto de Souza
> > <jose.souza@intel.com>; Sean Paul <seanpaul@chromium.org>; Ben Skeggs
> > <bskeggs@redhat.com>; stable@vger.kernel.org
> > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected
> > end device
> > 
> > On Tue, 2021-08-03 at 19:58 -0400, Lyude Paul wrote:
> > > On Wed, 2021-07-21 at 00:03 +0800, Wayne Lin wrote:
> > > > [Why]
> > > > Currently, we will create connectors for all output ports no matter
> > > > it's connected or not. However, in MST, we can only determine
> > > > whether an output port really stands for a "connector" till it is
> > > > connected and check its peer device type as an end device.
> > > 
> > > What is this commit trying to solve exactly? e.g. is AMD currently
> > > running into issues with there being too many DRM connectors or
> > > something like that?
> > > Ideally this is behavior I'd very much like us to keep as-is unless
> > > there's good reason to change it.
> Hi Lyude,
> Really appreciate for your time to elaborate in such detail. Thanks!
> 
> I come up with this commit because I observed something confusing when I was
> analyzing
> MST connectors' life cycle. Take the topology instance you mentioned below
> 
> Root MSTB -> Output_Port 1 -> MSTB 1.1 -> Output_Port 1 (Connected w/ display)
>           |                            -> Output_Port 2 (Disconnected)
>           -> Output_Port 2 -> MSTB 2.1 -> Output_Port 1 (Disconnected)
>                                        -> Output_Port 2 (Disconnected)
> Which is exactly the topology of Startech DP 1-to-4 hub. There are 3 1-to-2
> branch chips
> within this hub. With our MST implementation today, we'll create drm
> connectors for all
> output ports. Hence, we totally create 6 drm connectors here. However,
> Output ports of
> Root MSTB are not connected to a stream sink. They are connected with branch
> devices.
> Thus, creating drm connector for such port looks a bit strange to me and
> increases
> complexity to tracking drm connectors.  My thought is we only need to create
> drm
> connector for those connected end device. Once output port is connected then
> we can
> determine whether to add on a drm connector for this port based on the peer
> device type.
> Hence, this commit doesn't try to break the locking logic but add more
> constraints when
> We try to add drm connector. Please correct me if I misunderstand anything
> here. Thanks!

Sorry - I will respond to this soon; some more stuff came up at work, so it
might take me a day or two

> > > 
> > > Some context here btw - there's a lot of subtleties with MST locking
> > > that isn't immediately obvious. It's been a while since I wrote this
> > > code, but if I recall correctly one of those subtleties is that trying
> > > to create/destroy connectors on the fly when ports change types
> > > introduces a lot of potential issues with locking and some very
> > > complicated state transitions. Note that because we maintain the
> > > topology as much as possible across suspend/resumes this means there's
> > > a lot of potential state transitions with drm_dp_mst_port and
> > > drm_dp_mst_branch we need to handle that would typically be impossible
> > > to run into otherwise.
> > > 
> > > An example of this, if we were to try to prune connectors based on PDT
> > > on the fly: assume we have a simple topology like this
> > > 
> > > Root MSTB -> Port 1 -> MSTB 1.1 (Connected w/ display)
> > >           -> Port 2 -> MSTB 2.1
> > > 
> > > We suspend the system, unplug MSTB 1.1, and then resume. Once the
> > > system starts reprobing, it will notice that MSTB 1.1 has been
> > > disconnected. Since we no longer have a PDT, we decide to unregister
> > > our connector. But there's a catch! We had a display connected to MSTB
> > > 1.1, so even after unregistering the connector it's going to stay
> > > around until userspace has committed a new mode with the connector
> > > disabled.
> > > 
> > > > Now - assuming we're still in the same spot in the resume process,
> > > let's assume somehow MSTB 1.1 is suddenly plugged back in. Once we've
> > > finished responding to the hotplug event, we will have created a
> > > connector for it. Now we've hit a bug - userspace hasn't removed the
> > > previous zombie connector which means we have references to the
> > > drm_dp_mst_port in our atomic state and potentially also our payload
> > > tables (?? unsure about this one).
> > 
> > Whoops. One thing I totally forgot to mention here: the reason this is a
> > problem is because we'd now have two drm_connectors
> > which both have the same drm_dp_mst_port pointer.
> > 
> > > 
> > > So then how do we manage to add/remove connectors for input connectors
> > > on the fly? Well, that's one of the fun normally-impossible state
> > > transitions I mentioned before. According to the spec input ports are
> > > always disconnected, so we'll never receive a CSN for them. This means
> I think input ports' DisplayPort_Device_Plug_Status field is still set to 1?
> But yes, according to DP 1.4 spec section 2.11.9.3, an MST device whose DPRX
> detects a connection status change shall broadcast the CSN downstream only.
> Hence, we'll never receive a CSN for this case.
> > > > in theory the only possible way we could have a connector go from
> > > > being an input connector to an output connector would be if
> > > > the entire topology was swapped out during suspend/resume, and the
> > > > input/output ports in the two topologies happen to be in
> > > > different places.
> > > Since we only have to reprobe once during resume before we get
> > > hotplugging enabled, we're guaranteed this state transition will only
> > > happen once in this state - which means the second replug I described
> > > in the previous paragraph can never happen.
> > > 
> > > Note that while I don't actually know if there's topologies with input
> > > ports at indexes other than 0, since the specification isn't super
> > > clear on this bit we play it safe and assume it is possible.
> According to DP 1.4 spec section 2.5.1, physical input ports are assigned
> smaller port numbers than physical output ports. For a concentrator product,
> if its branch device has 2 input ports, their port numbers are port 0 and
> port 1; see figure 2-122 of DP 1.4.
> > > 
> > > Anyway-this is -all- based off my memory, so please point out anything
> > > here that I've explained that doesn't make sense or doesn't seem
> > > correct :). It's totally possible I might have misremembered something.
> Thanks again Lyude! Much appreciated for your time and help! And please
> correct me if I misunderstand anything here : )
> > > 
> > > > 
> > > > In current code, we have chance to create connectors for output
> > > > ports connected with branch device and these are redundant connectors.
> > > > e.g.
> > > > StarTech 1-to-4 DP hub is constructed by internal 2 layer 1-to-2
> > > > branch devices. Creating connectors for such internal output ports
> > > > are redundant.
> > > > 
> > > > [How]
> > > > Put constraint on creating connector for connected end device only.
> > > > 
> > > > Fixes: 6f85f73821f6 ("drm/dp_mst: Add basic topology reprobing when
> > > > resuming")
> > > > Cc: Juston Li <juston.li@intel.com>
> > > > Cc: Imre Deak <imre.deak@intel.com>
> > > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > Cc: Harry Wentland <hwentlan@amd.com>
> > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > Cc: Sean Paul <sean@poorly.run>
> > > > Cc: Lyude Paul <lyude@redhat.com>
> > > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > > > Cc: Maxime Ripard <mripard@kernel.org>
> > > > Cc: Thomas Zimmermann <tzimmermann@suse.de>
> > > > Cc: David Airlie <airlied@linux.ie>
> > > > Cc: Daniel Vetter <daniel@ffwll.ch>
> > > > Cc: Alex Deucher <alexander.deucher@amd.com>
> > > > Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> > > > Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
> > > > Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
> > > > Cc: Eryk Brol <eryk.brol@amd.com>
> > > > Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
> > > > Cc: Nikola Cornij <nikola.cornij@amd.com>
> > > > Cc: Wayne Lin <Wayne.Lin@amd.com>
> > > > Cc: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
> > > > Cc: Jani Nikula <jani.nikula@intel.com>
> > > > Cc: Manasi Navare <manasi.d.navare@intel.com>
> > > > Cc: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
> > > > Cc: "José Roberto de Souza" <jose.souza@intel.com>
> > > > Cc: Sean Paul <seanpaul@chromium.org>
> > > > Cc: Ben Skeggs <bskeggs@redhat.com>
> > > > Cc: dri-devel@lists.freedesktop.org
> > > > Cc: <stable@vger.kernel.org> # v5.5+
> > > > Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
> > > > ---
> > > >  drivers/gpu/drm/drm_dp_mst_topology.c | 7 ++++++-
> > > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > index 51cd7f74f026..f13c7187b07f 100644
> > > > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > @@ -2474,7 +2474,8 @@ drm_dp_mst_handle_link_address_port(struct drm_dp_mst_branch *mstb,
> > > > 
> > > >         if (port->connector)
> > > >                 drm_modeset_unlock(&mgr->base.lock);
> > > > -       else if (!port->input)
> > > > +       else if (!port->input && port->pdt != DP_PEER_DEVICE_NONE &&
> > > > +                drm_dp_mst_is_end_device(port->pdt, port->mcs))
> > > >                 drm_dp_mst_port_add_connector(mstb, port);
> > > > 
> > > >         if (send_link_addr && port->mstb) {
> > > > @@ -2557,6 +2558,10 @@ drm_dp_mst_handle_conn_stat(struct drm_dp_mst_branch *mstb,
> > > >                 dowork = false;
> > > >         }
> > > > 
> > > > +       if (!port->input && !port->connector && new_pdt != DP_PEER_DEVICE_NONE &&
> > > > +           drm_dp_mst_is_end_device(new_pdt, new_mcs))
> > > > +               create_connector = true;
> > > > +
> > > >         if (port->connector)
> > > >                 drm_modeset_unlock(&mgr->base.lock);
> > > >         else if (create_connector)
> > > 
> > 
> > --
> > Cheers,
> >  Lyude Paul (she/her)
> >  Software Engineer at Red Hat
> Regards,
> Wayne Lin
>
Lin, Wayne Aug. 11, 2021, 9:49 a.m. UTC | #5
[Public]

> -----Original Message-----
> From: Lyude Paul <lyude@redhat.com>
> Sent: Wednesday, August 11, 2021 4:45 AM
> To: Lin, Wayne <Wayne.Lin@amd.com>; dri-devel@lists.freedesktop.org
> Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Wentland, Harry <Harry.Wentland@amd.com>; Zuo, Jerry
> <Jerry.Zuo@amd.com>; Wu, Hersen <hersenxs.wu@amd.com>; Juston Li <juston.li@intel.com>; Imre Deak <imre.deak@intel.com>;
> Ville Syrjälä <ville.syrjala@linux.intel.com>; Daniel Vetter <daniel.vetter@ffwll.ch>; Sean Paul <sean@poorly.run>; Maarten Lankhorst
> <maarten.lankhorst@linux.intel.com>; Maxime Ripard <mripard@kernel.org>; Thomas Zimmermann <tzimmermann@suse.de>;
> David Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; Deucher, Alexander <Alexander.Deucher@amd.com>; Siqueira,
> Rodrigo <Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo <Aurabindo.Pillai@amd.com>; Eryk Brol <eryk.brol@amd.com>; Bas
> Nieuwenhuizen <bas@basnieuwenhuizen.nl>; Cornij, Nikola <Nikola.Cornij@amd.com>; Jani Nikula <jani.nikula@intel.com>; Manasi
> Navare <manasi.d.navare@intel.com>; Ankit Nautiyal <ankit.k.nautiyal@intel.com>; José Roberto de Souza <jose.souza@intel.com>;
> Sean Paul <seanpaul@chromium.org>; Ben Skeggs <bskeggs@redhat.com>; stable@vger.kernel.org
> Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected end device
>
> On Wed, 2021-08-04 at 07:13 +0000, Lin, Wayne wrote:
> > [Public]
> >
> > > -----Original Message-----
> > > From: Lyude Paul <lyude@redhat.com>
> > > Sent: Wednesday, August 4, 2021 8:09 AM
> > > To: Lin, Wayne <Wayne.Lin@amd.com>; dri-devel@lists.freedesktop.org
> > > Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Wentland,
> > > Harry < Harry.Wentland@amd.com>; Zuo, Jerry <Jerry.Zuo@amd.com>; Wu,
> > > Hersen <hersenxs.wu@amd.com>; Juston Li < juston.li@intel.com>; Imre
> > > Deak <imre.deak@intel.com>; Ville Syrjälä
> > > <ville.syrjala@linux.intel.com>; Wentland, Harry <
> > > Harry.Wentland@amd.com>; Daniel Vetter <daniel.vetter@ffwll.ch>;
> > > Sean Paul <sean@poorly.run>; Maarten Lankhorst <
> > > maarten.lankhorst@linux.intel.com>; Maxime Ripard
> > > <mripard@kernel.org>; Thomas Zimmermann <tzimmermann@suse.de>; David
> > > Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; Deucher,
> > > Alexander <Alexander.Deucher@amd.com>; Siqueira, Rodrigo <
> > > Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo
> > > <Aurabindo.Pillai@amd.com>; Eryk Brol <eryk.brol@amd.com>; Bas
> > > Nieuwenhuizen <bas@basnieuwenhuizen.nl>; Cornij, Nikola
> > > <Nikola.Cornij@amd.com>; Jani Nikula <jani.nikula@intel.com>; Manasi
> > > Navare <manasi.d.navare@intel.com>; Ankit Nautiyal
> > > <ankit.k.nautiyal@intel.com>; José Roberto de Souza
> > > <jose.souza@intel.com>; Sean Paul <seanpaul@chromium.org>; Ben
> > > Skeggs <bskeggs@redhat.com>; stable@vger.kernel.org
> > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for
> > > connected end device
> > >
> > > On Tue, 2021-08-03 at 19:58 -0400, Lyude Paul wrote:
> > > > On Wed, 2021-07-21 at 00:03 +0800, Wayne Lin wrote:
> > > > > [Why]
> > > > > Currently, we will create connectors for all output ports no
> > > > > matter it's connected or not. However, in MST, we can only
> > > > > determine whether an output port really stands for a "connector"
> > > > > till it is connected and check its peer device type as an end device.
> > > >
> > > > What is this commit trying to solve exactly? e.g. is AMD currently
> > > > running into issues with there being too many DRM connectors or
> > > > something like that?
> > > > Ideally this is behavior I'd very much like us to keep as-is
> > > > unless there's good reason to change it.
> > Hi Lyude,
> > Really appreciate for your time to elaborate in such detail. Thanks!
> >
> > I come up with this commit because I observed something confusing when
> > I was analyzing MST connectors' life cycle. Take the topology instance
> > you mentioned below
> >
> > Root MSTB -> Output_Port 1 -> MSTB 1.1 -> Output_Port 1 (Connected w/ display)
> >           |                            -> Output_Port 2 (Disconnected)
> >           -> Output_Port 2 -> MSTB 2.1 -> Output_Port 1 (Disconnected)
> >                                        -> Output_Port 2 (Disconnected)
> >
> > Which is exactly the topology of
> > Startech DP 1-to-4 hub. There are 3 1-to-2 branch chips within this
> > hub. With our MST implementation today, we'll create drm connectors
> > for all output ports. Hence, we totally create 6 drm connectors here.
> > However, Output ports of Root MSTB are not connected to a stream sink.
> > They are connected with branch devices.
> > Thus, creating drm connector for such port looks a bit strange to me
> > and increases complexity to tracking drm connectors.  My thought is we
> > only need to create drm connector for those connected end device. Once
> > output port is connected then we can determine whether to add on a drm
> > connector for this port based on the peer device type.
> > Hence, this commit doesn't try to break the locking logic but add more
> > constraints when We try to add drm connector. Please correct me if I
> > misunderstand anything here. Thanks!
>
> Sorry-I will respond to this soon, some more stuff came up at work so it might take me a day or two
No worries. Your time is much appreciated!
>
> > > >
> > > > Some context here btw - there's a lot of subtleties with MST
> > > > locking that isn't immediately obvious. It's been a while since I
> > > > wrote this code, but if I recall correctly one of those subtleties
> > > > is that trying to create/destroy connectors on the fly when ports
> > > > change types introduces a lot of potential issues with locking and
> > > > some very complicated state transitions. Note that because we
> > > > maintain the topology as much as possible across suspend/resumes
> > > > this means there's a lot of potential state transitions with
> > > > drm_dp_mst_port and drm_dp_mst_branch we need to handle that would
> > > > typically be impossible to run into otherwise.
> > > >
> > > > An example of this, if we were to try to prune connectors based on
> > > > PDT on the fly: assume we have a simple topology like this
> > > >
> > > > Root MSTB -> Port 1 -> MSTB 1.1 (Connected w/ display)
> > > >           -> Port 2 -> MSTB 2.1
> > > >
> > > > We suspend the system, unplug MSTB 1.1, and then resume. Once the
> > > > system starts reprobing, it will notice that MSTB 1.1 has been
> > > > disconnected. Since we no longer have a PDT, we decide to
> > > > unregister our connector. But there's a catch! We had a display
> > > > connected to MSTB 1.1, so even after unregistering the connector
> > > > it's going to stay around until userspace has committed a new mode
> > > > with the connector disabled.
> > > >
> > > > Now - assuming we're still in the same spot in the resume
> > > > > process, let's assume somehow MSTB 1.1 is suddenly plugged back
> > > > in. Once we've finished responding to the hotplug event, we will
> > > > have created a connector for it. Now we've hit a bug - userspace
> > > > hasn't removed the previous zombie connector which means we have
> > > > references to the drm_dp_mst_port in our atomic state and
> > > > potentially also our payload tables (?? unsure about this one).
> > >
> > > Whoops. One thing I totally forgot to mention here: the reason this
> > > is a problem is because we'd now have two drm_connectors which both
> > > have the same drm_dp_mst_port pointer.
> > >
> > > >
> > > > So then how do we manage to add/remove connectors for input
> > > > connectors on the fly? Well, that's one of the fun
> > > > normally-impossible state transitions I mentioned before.
> > > > According to the spec input ports are always disconnected, so
> > > > we'll never receive a CSN for them. This means
> > I think input ports' DisplayPort_Device_Plug_Status field is still set to 1?
> > But yes, according to DP 1.4 spec section 2.11.9.3, an MST device whose DPRX
> > detects a connection status change shall broadcast the CSN downstream only.
> > Hence, we'll never receive a CSN for this case.
> > > > > in theory the only possible way we could have a connector go from
> > > > > being an input connector to an output connector would be
> > > > > if the entire topology was swapped out during suspend/resume, and
> > > > > the input/output ports in the two topologies happen to be
> > > > > in different places.
> > > > Since we only have to reprobe once during resume before we get
> > > > hotplugging enabled, we're guaranteed this state transition will
> > > > only happen once in this state - which means the second replug I
> > > > described in the previous paragraph can never happen.
> > > >
> > > > Note that while I don't actually know if there's topologies with
> > > > input ports at indexes other than 0, since the specification isn't
> > > > super clear on this bit we play it safe and assume it is possible.
> > According to DP 1.4 spec section 2.5.1, physical input ports are assigned
> > smaller port numbers than physical output ports. For a concentrator product,
> > if its branch device has 2 input ports, their port numbers are port 0 and
> > port 1; see figure 2-122 of DP 1.4.
> > > >
> > > > Anyway-this is -all- based off my memory, so please point out
> > > > anything here that I've explained that doesn't make sense or
> > > > doesn't seem correct :). It's totally possible I might have misremembered something.
> > Thanks again Lyude! Much appreciated for your time and help! And
> > please correct me if I misunderstand anything here : )
> > > >
> > > > >
> > > > > In current code, we have chance to create connectors for output
> > > > > ports connected with branch device and these are redundant connectors.
> > > > > e.g.
> > > > > StarTech 1-to-4 DP hub is constructed by internal 2 layer 1-to-2
> > > > > branch devices. Creating connectors for such internal output
> > > > > ports are redundant.
> > > > >
> > > > > [How]
> > > > > Put constraint on creating connector for connected end device only.
> > > > >
> > > > > Fixes: 6f85f73821f6 ("drm/dp_mst: Add basic topology reprobing
> > > > > when
> > > > > resuming")
> > > > > Cc: Juston Li <juston.li@intel.com>
> > > > > Cc: Imre Deak <imre.deak@intel.com>
> > > > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > > Cc: Harry Wentland <hwentlan@amd.com>
> > > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > > Cc: Sean Paul <sean@poorly.run>
> > > > > Cc: Lyude Paul <lyude@redhat.com>
> > > > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > > > > Cc: Maxime Ripard <mripard@kernel.org>
> > > > > Cc: Thomas Zimmermann <tzimmermann@suse.de>
> > > > > Cc: David Airlie <airlied@linux.ie>
> > > > > Cc: Daniel Vetter <daniel@ffwll.ch>
> > > > > Cc: Alex Deucher <alexander.deucher@amd.com>
> > > > > Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> > > > > Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
> > > > > Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
> > > > > Cc: Eryk Brol <eryk.brol@amd.com>
> > > > > Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
> > > > > Cc: Nikola Cornij <nikola.cornij@amd.com>
> > > > > Cc: Wayne Lin <Wayne.Lin@amd.com>
> > > > > Cc: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
> > > > > Cc: Jani Nikula <jani.nikula@intel.com>
> > > > > Cc: Manasi Navare <manasi.d.navare@intel.com>
> > > > > Cc: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
> > > > > Cc: "José Roberto de Souza" <jose.souza@intel.com>
> > > > > Cc: Sean Paul <seanpaul@chromium.org>
> > > > > Cc: Ben Skeggs <bskeggs@redhat.com>
> > > > > Cc: dri-devel@lists.freedesktop.org
> > > > > Cc: <stable@vger.kernel.org> # v5.5+
> > > > > Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
> > > > > ---
> > > > >  drivers/gpu/drm/drm_dp_mst_topology.c | 7 ++++++-
> > > > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > > >
> > > > > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > > > index 51cd7f74f026..f13c7187b07f 100644
> > > > > > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > > > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > > > @@ -2474,7 +2474,8 @@ drm_dp_mst_handle_link_address_port(struct drm_dp_mst_branch *mstb,
> > > > > >
> > > > > >         if (port->connector)
> > > > > >                 drm_modeset_unlock(&mgr->base.lock);
> > > > > > -       else if (!port->input)
> > > > > > +       else if (!port->input && port->pdt != DP_PEER_DEVICE_NONE &&
> > > > > > +                drm_dp_mst_is_end_device(port->pdt, port->mcs))
> > > > > >                 drm_dp_mst_port_add_connector(mstb, port);
> > > > > >
> > > > > >         if (send_link_addr && port->mstb) {
> > > > > > @@ -2557,6 +2558,10 @@ drm_dp_mst_handle_conn_stat(struct drm_dp_mst_branch *mstb,
> > > > > >                 dowork = false;
> > > > > >         }
> > > > > >
> > > > > > +       if (!port->input && !port->connector && new_pdt != DP_PEER_DEVICE_NONE &&
> > > > > > +           drm_dp_mst_is_end_device(new_pdt, new_mcs))
> > > > > > +               create_connector = true;
> > > > > > +
> > > > > >         if (port->connector)
> > > > > >                 drm_modeset_unlock(&mgr->base.lock);
> > > > > >         else if (create_connector)
> > > >
> > >
> > > --
> > > Cheers,
> > >  Lyude Paul (she/her)
> > >  Software Engineer at Red Hat
> > Regards,
> > Wayne Lin
> >
>
> --
> Cheers,
>  Lyude Paul (she/her)
>  Software Engineer at Red Hat
--
Regards,
Wayne Lin
Lyude Paul Aug. 18, 2021, 6:58 p.m. UTC | #6
On Wed, 2021-08-11 at 09:49 +0000, Lin, Wayne wrote:
> [Public]
> 
> > -----Original Message-----
> > From: Lyude Paul <lyude@redhat.com>
> > Sent: Wednesday, August 11, 2021 4:45 AM
> > To: Lin, Wayne <Wayne.Lin@amd.com>; dri-devel@lists.freedesktop.org
> > Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Wentland, Harry <
> > Harry.Wentland@amd.com>; Zuo, Jerry
> > <Jerry.Zuo@amd.com>; Wu, Hersen <hersenxs.wu@amd.com>; Juston Li <
> > juston.li@intel.com>; Imre Deak <imre.deak@intel.com>;
> > Ville Syrjälä <ville.syrjala@linux.intel.com>; Daniel Vetter <
> > daniel.vetter@ffwll.ch>; Sean Paul <sean@poorly.run>; Maarten Lankhorst
> > <maarten.lankhorst@linux.intel.com>; Maxime Ripard <mripard@kernel.org>;
> > Thomas Zimmermann <tzimmermann@suse.de>;
> > David Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; Deucher,
> > Alexander <Alexander.Deucher@amd.com>; Siqueira,
> > Rodrigo <Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo <
> > Aurabindo.Pillai@amd.com>; Eryk Brol <eryk.brol@amd.com>; Bas
> > Nieuwenhuizen <bas@basnieuwenhuizen.nl>; Cornij, Nikola <
> > Nikola.Cornij@amd.com>; Jani Nikula <jani.nikula@intel.com>; Manasi
> > Navare <manasi.d.navare@intel.com>; Ankit Nautiyal <
> > ankit.k.nautiyal@intel.com>; José Roberto de Souza <jose.souza@intel.com>;
> > Sean Paul <seanpaul@chromium.org>; Ben Skeggs <bskeggs@redhat.com>; 
> > stable@vger.kernel.org
> > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected
> > end device
> > 
> > On Wed, 2021-08-04 at 07:13 +0000, Lin, Wayne wrote:
> > > [Public]
> > > 
> > > > -----Original Message-----
> > > > From: Lyude Paul <lyude@redhat.com>
> > > > Sent: Wednesday, August 4, 2021 8:09 AM
> > > > To: Lin, Wayne <Wayne.Lin@amd.com>; dri-devel@lists.freedesktop.org
> > > > Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Wentland,
> > > > Harry < Harry.Wentland@amd.com>; Zuo, Jerry <Jerry.Zuo@amd.com>; Wu,
> > > > Hersen <hersenxs.wu@amd.com>; Juston Li < juston.li@intel.com>; Imre
> > > > Deak <imre.deak@intel.com>; Ville Syrjälä
> > > > <ville.syrjala@linux.intel.com>; Wentland, Harry <
> > > > Harry.Wentland@amd.com>; Daniel Vetter <daniel.vetter@ffwll.ch>;
> > > > Sean Paul <sean@poorly.run>; Maarten Lankhorst <
> > > > maarten.lankhorst@linux.intel.com>; Maxime Ripard
> > > > <mripard@kernel.org>; Thomas Zimmermann <tzimmermann@suse.de>; David
> > > > Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; Deucher,
> > > > Alexander <Alexander.Deucher@amd.com>; Siqueira, Rodrigo <
> > > > Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo
> > > > <Aurabindo.Pillai@amd.com>; Eryk Brol <eryk.brol@amd.com>; Bas
> > > > Nieuwenhuizen <bas@basnieuwenhuizen.nl>; Cornij, Nikola
> > > > <Nikola.Cornij@amd.com>; Jani Nikula <jani.nikula@intel.com>; Manasi
> > > > Navare <manasi.d.navare@intel.com>; Ankit Nautiyal
> > > > <ankit.k.nautiyal@intel.com>; José Roberto de Souza
> > > > <jose.souza@intel.com>; Sean Paul <seanpaul@chromium.org>; Ben
> > > > Skeggs <bskeggs@redhat.com>; stable@vger.kernel.org
> > > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for
> > > > connected end device
> > > > 
> > > > On Tue, 2021-08-03 at 19:58 -0400, Lyude Paul wrote:
> > > > > On Wed, 2021-07-21 at 00:03 +0800, Wayne Lin wrote:
> > > > > > [Why]
> > > > > > Currently, we will create connectors for all output ports no
> > > > > > matter it's connected or not. However, in MST, we can only
> > > > > > determine whether an output port really stands for a "connector"
> > > > > > till it is connected and check its peer device type as an end
> > > > > > device.
> > > > > 
> > > > > What is this commit trying to solve exactly? e.g. is AMD currently
> > > > > running into issues with there being too many DRM connectors or
> > > > > something like that?
> > > > > Ideally this is behavior I'd very much like us to keep as-is
> > > > > unless there's good reason to change it.
> > > Hi Lyude,
> > > Really appreciate for your time to elaborate in such detail. Thanks!
> > > 
> > > I come up with this commit because I observed something confusing when
> > > I was analyzing MST connectors' life cycle. Take the topology instance
> > > you mentioned below
> > > 
> > > Root MSTB -> Output_Port 1 -> MSTB 1.1 -> Output_Port 1 (Connected w/ display)
> > >           |                            -> Output_Port 2 (Disconnected)
> > >           -> Output_Port 2 -> MSTB 2.1 -> Output_Port 1 (Disconnected)
> > >                                        -> Output_Port 2 (Disconnected)
> > > 
> > > Which is exactly the topology of
> > > Startech DP 1-to-4 hub. There are 3 1-to-2 branch chips within this
> > > hub. With our MST implementation today, we'll create drm connectors
> > > for all output ports. Hence, we totally create 6 drm connectors here.
> > > However, Output ports of Root MSTB are not connected to a stream sink.
> > > They are connected with branch devices.
> > > Thus, creating drm connector for such port looks a bit strange to me
> > > and increases complexity to tracking drm connectors.  My thought is we
> > > only need to create drm connector for those connected end device. Once
> > > output port is connected then we can determine whether to add on a drm
> > > connector for this port based on the peer device type.
> > > Hence, this commit doesn't try to break the locking logic but add more
> > > constraints when We try to add drm connector. Please correct me if I
> > > misunderstand anything here. Thanks!
> > 
> > Sorry-I will respond to this soon, some more stuff came up at work so it
> > might take me a day or two
> No worries. Much appreciated for your time!
> > 

Alright - finally got some time to respond to this. So this change still
doesn't really seem correct to me (if anyone watching this thread wants to
chime in to correct me btw feel free).

JFYI - I don't think the commit is trying to break anything intentionally,
it's just that there are a lot of moving pieces with the locking here that are
easy to trip over. That being said, even setting the locking issues aside,
after thinking about this I'm still a bit skeptical about how well this would
work, or whether we would even want it.

To start off - my main issue with this is that it sounds like we're basically
entirely getting rid of the disconnected state for MST connectors, and then
only exposing the connector when something is connected. Unless I'm missing
something here, the PDT can pretty much change whenever something is
connected/disconnected or across suspend/resume reprobes. To do this with the
connector API would be very different from connector probing behavior for
other connector types, which already seems like an issue to me. This would
also break the ability to force a connector to be connected/disconnected, as
there would no longer be a way to force a disconnected MST connector on.
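
To make that concrete, here is a rough model of the connector-creation
condition before and after the patch. This is only a sketch: the enum, the
function names and the simplified end-device check are made up for
illustration and are not the actual helpers in drm_dp_mst_topology.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the peer device types (PDT) discussed here. */
enum pdt_model {
	PDT_NONE,          /* nothing plugged into the port */
	PDT_MST_BRANCHING, /* a branch device is attached */
	PDT_SST_SINK,      /* a stream sink (display) is attached */
};

/*
 * Simplified end-device check (an assumption for this sketch): a branch
 * device that handles MST messaging (mcs == true) is not an end device,
 * everything else is treated as one.
 */
bool model_is_end_device(enum pdt_model pdt, bool mcs)
{
	if (pdt == PDT_MST_BRANCHING)
		return !mcs;
	return true;
}

/* Condition today: every non-input port gets a connector. */
bool connector_created_today(bool input)
{
	return !input;
}

/* Condition with the proposed constraint: connected end devices only. */
bool connector_created_proposed(bool input, enum pdt_model pdt, bool mcs)
{
	return !input && pdt != PDT_NONE && model_is_end_device(pdt, mcs);
}

/* A disconnected output port keeps its connector today, but not with the patch. */
void check_disconnected_output_port(void)
{
	assert(connector_created_today(false));
	assert(!connector_created_proposed(false, PDT_NONE, false));
}
```

In this model a disconnected output port never gets a connector under the
proposed condition, so there is nothing left for userspace to force on.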

The other thing is I'm still not entirely clear on what this is trying to
accomplish. If you're trying to identify DRM connectors, there's already no
guaranteed consistency with connector names, which means that having fewer
connectors doesn't really make things any easier to identify. As for actually
figuring out more details on connectors, if this is something userspace needs,
it seems like something we should just be adding in the form of connector
props.
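
For reference, the connector counts being discussed for the hub topology
quoted above can be modeled as follows. This is a sketch under assumptions:
the enum and function names are hypothetical, and the port table is just my
reading of the topology as described (three 1-to-2 branch chips, one display).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of what is attached to one output port. */
enum peer_model {
	PEER_NONE,   /* nothing connected */
	PEER_BRANCH, /* another MST branch device */
	PEER_SINK,   /* a display (stream sink) */
};

/* The six output ports of the hub topology described in this thread. */
const enum peer_model hub_output_ports[] = {
	PEER_BRANCH, /* Root MSTB, Output_Port 1 -> MSTB 1.1 */
	PEER_BRANCH, /* Root MSTB, Output_Port 2 -> MSTB 2.1 */
	PEER_SINK,   /* MSTB 1.1, Output_Port 1 (display) */
	PEER_NONE,   /* MSTB 1.1, Output_Port 2 */
	PEER_NONE,   /* MSTB 2.1, Output_Port 1 */
	PEER_NONE,   /* MSTB 2.1, Output_Port 2 */
};
const size_t hub_output_port_count =
	sizeof(hub_output_ports) / sizeof(hub_output_ports[0]);

/* Today: one connector per output port, whatever is attached. */
size_t connectors_today(const enum peer_model *ports, size_t n)
{
	(void)ports;
	return n;
}

/* Proposed: only ports with a connected end device get a connector. */
size_t connectors_proposed(const enum peer_model *ports, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (ports[i] == PEER_SINK)
			count++;
	return count;
}

/* Counts for the hub: 6 connectors today, 1 with the proposed constraint. */
void check_hub_counts(void)
{
	assert(connectors_today(hub_output_ports, hub_output_port_count) == 6);
	assert(connectors_proposed(hub_output_ports, hub_output_port_count) == 1);
}
```

The count does go down, but nothing in this model makes the one remaining
connector any easier to identify by name.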

With all of this being said, this ends up just seeming like we're adding
potentially a lot of complexity to how we create connectors and to the
suspend/resume reprobing code. I think it'd be good to know what the precise
use case for this actually is, if this is something you still think is needed.

> > > > > 
> > > > > Some context here btw - there's a lot of subtleties with MST
> > > > > locking that isn't immediately obvious. It's been a while since I
> > > > > wrote this code, but if I recall correctly one of those subtleties
> > > > > is that trying to create/destroy connectors on the fly when ports
> > > > > change types introduces a lot of potential issues with locking and
> > > > > some very complicated state transitions. Note that because we
> > > > > maintain the topology as much as possible across suspend/resumes
> > > > > this means there's a lot of potential state transitions with
> > > > > drm_dp_mst_port and drm_dp_mst_branch we need to handle that would
> > > > > typically be impossible to run into otherwise.
> > > > > 
> > > > > An example of this, if we were to try to prune connectors based on
> > > > > PDT on the fly: assume we have a simple topology like this
> > > > > 
> > > > > Root MSTB -> Port 1 -> MSTB 1.1 (Connected w/ display)
> > > > >           -> Port 2 -> MSTB 2.1
> > > > > 
> > > > > We suspend the system, unplug MSTB 1.1, and then resume. Once the
> > > > > system starts reprobing, it will notice that MSTB 1.1 has been
> > > > > disconnected. Since we no longer have a PDT, we decide to
> > > > > unregister our connector. But there's a catch! We had a display
> > > > > connected to MSTB 1.1, so even after unregistering the connector
> > > > > it's going to stay around until userspace has committed a new mode
> > > > > with the connector disabled.
> > > > > 
> > > > > Now - assuming we're still in the same spot in the resume
> > > > > > process, let's assume somehow MSTB 1.1 is suddenly plugged back
> > > > > in. Once we've finished responding to the hotplug event, we will
> > > > > have created a connector for it. Now we've hit a bug - userspace
> > > > > hasn't removed the previous zombie connector which means we have
> > > > > references to the drm_dp_mst_port in our atomic state and
> > > > > potentially also our payload tables (?? unsure about this one).
> > > > 
> > > > Whoops. One thing I totally forgot to mention here: the reason this
> > > > is a problem is because we'd now have two drm_connectors which both
> > > > have the same drm_dp_mst_port pointer.
> > > > 
> > > > > 
> > > > > So then how do we manage to add/remove connectors for input
> > > > > connectors on the fly? Well, that's one of the fun
> > > > > normally-impossible state transitions I mentioned before.
> > > > > According to the spec input ports are always disconnected, so
> > > > > we'll never receive a CSN for them. This means
> > > I think input ports' DisplayPort_Device_Plug_Status field is still set
> > > to 1? But yes, according to DP 1.4 spec section 2.11.9.3, an MST device
> > > whose DPRX detects a connection status change shall broadcast the CSN
> > > downstream only. Hence, we'll never receive a CSN for this case.
> > > > > > in theory the only possible way we could have a connector go from
> > > > > > being an input connector to an output connector would be
> > > > > > if the entire topology was swapped out during suspend/resume, and
> > > > > > the input/output ports in the two topologies happen to be
> > > > > > in different places.
> > > > > Since we only have to reprobe once during resume before we get
> > > > > hotplugging enabled, we're guaranteed this state transition will
> > > > > only happen once in this state - which means the second replug I
> > > > > described in the previous paragraph can never happen.
> > > > > 
> > > > > Note that while I don't actually know if there's topologies with
> > > > > input ports at indexes other than 0, since the specification isn't
> > > > > super clear on this bit we play it safe and assume it is possible.
> > > According to DP 1.4 spec section 2.5.1, physical input ports are assigned
> > > smaller port numbers than physical output ports. For a concentrator
> > > product, if its branch device has 2 input ports, their port numbers are
> > > port 0 and port 1; see figure 2-122 of DP 1.4.
> > > > > 
> > > > > Anyway-this is -all- based off my memory, so please point out
> > > > > anything here that I've explained that doesn't make sense or
> > > > > doesn't seem correct :). It's totally possible I might have
> > > > > misremembered something.
> > > Thanks again Lyude! Much appreciated for your time and help! And
> > > please correct me if I misunderstand anything here : )
> > > > > 
> > > > > > 
> > > > > > In current code, we have chance to create connectors for output
> > > > > > ports connected with branch device and these are redundant
> > > > > > connectors.
> > > > > > e.g.
> > > > > > StarTech 1-to-4 DP hub is constructed by internal 2 layer 1-to-2
> > > > > > branch devices. Creating connectors for such internal output
> > > > > > ports are redundant.
> > > > > > 
> > > > > > [How]
> > > > > > Put constraint on creating connector for connected end device
> > > > > > only.
> > > > > > 
> > > > > > Fixes: 6f85f73821f6 ("drm/dp_mst: Add basic topology reprobing
> > > > > > when
> > > > > > resuming")
> > > > > > Cc: Juston Li <juston.li@intel.com>
> > > > > > Cc: Imre Deak <imre.deak@intel.com>
> > > > > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > > > Cc: Harry Wentland <hwentlan@amd.com>
> > > > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > > > Cc: Sean Paul <sean@poorly.run>
> > > > > > Cc: Lyude Paul <lyude@redhat.com>
> > > > > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > > > > > Cc: Maxime Ripard <mripard@kernel.org>
> > > > > > Cc: Thomas Zimmermann <tzimmermann@suse.de>
> > > > > > Cc: David Airlie <airlied@linux.ie>
> > > > > > Cc: Daniel Vetter <daniel@ffwll.ch>
> > > > > > Cc: Alex Deucher <alexander.deucher@amd.com>
> > > > > > Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> > > > > > Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
> > > > > > Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
> > > > > > Cc: Eryk Brol <eryk.brol@amd.com>
> > > > > > Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
> > > > > > Cc: Nikola Cornij <nikola.cornij@amd.com>
> > > > > > Cc: Wayne Lin <Wayne.Lin@amd.com>
> > > > > > Cc: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
> > > > > > Cc: Jani Nikula <jani.nikula@intel.com>
> > > > > > Cc: Manasi Navare <manasi.d.navare@intel.com>
> > > > > > Cc: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
> > > > > > Cc: "José Roberto de Souza" <jose.souza@intel.com>
> > > > > > Cc: Sean Paul <seanpaul@chromium.org>
> > > > > > Cc: Ben Skeggs <bskeggs@redhat.com>
> > > > > > Cc: dri-devel@lists.freedesktop.org
> > > > > > Cc: <stable@vger.kernel.org> # v5.5+
> > > > > > Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
> > > > > > ---
> > > > > >  drivers/gpu/drm/drm_dp_mst_topology.c | 7 ++++++-
> > > > > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > > > > index 51cd7f74f026..f13c7187b07f 100644
> > > > > > > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > > > > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > > > > @@ -2474,7 +2474,8 @@ drm_dp_mst_handle_link_address_port(struct drm_dp_mst_branch *mstb,
> > > > > > > 
> > > > > > >         if (port->connector)
> > > > > > >                 drm_modeset_unlock(&mgr->base.lock);
> > > > > > > -       else if (!port->input)
> > > > > > > +       else if (!port->input && port->pdt != DP_PEER_DEVICE_NONE &&
> > > > > > > +                drm_dp_mst_is_end_device(port->pdt, port->mcs))
> > > > > > >                 drm_dp_mst_port_add_connector(mstb, port);
> > > > > > > 
> > > > > > >         if (send_link_addr && port->mstb) {
> > > > > > > @@ -2557,6 +2558,10 @@ drm_dp_mst_handle_conn_stat(struct drm_dp_mst_branch *mstb,
> > > > > > >                 dowork = false;
> > > > > > >         }
> > > > > > > 
> > > > > > > +       if (!port->input && !port->connector && new_pdt != DP_PEER_DEVICE_NONE &&
> > > > > > > +           drm_dp_mst_is_end_device(new_pdt, new_mcs))
> > > > > > > +               create_connector = true;
> > > > > > > +
> > > > > > >         if (port->connector)
> > > > > > >                 drm_modeset_unlock(&mgr->base.lock);
> > > > > > >         else if (create_connector)
> > > > > 
> > > > 
> > > > --
> > > > Cheers,
> > > >  Lyude Paul (she/her)
> > > >  Software Engineer at Red Hat
> > > Regards,
> > > Wayne Lin
> > > 
> > 
> > --
> > Cheers,
> >  Lyude Paul (she/her)
> >  Software Engineer at Red Hat
> --
> Regards,
> Wayne Lin
>
Lin, Wayne Aug. 20, 2021, 11:20 a.m. UTC | #7
[Public]

> -----Original Message-----
> From: Lyude Paul <lyude@redhat.com>
> Sent: Thursday, August 19, 2021 2:59 AM
> To: Lin, Wayne <Wayne.Lin@amd.com>; dri-devel@lists.freedesktop.org
> Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Wentland, Harry <Harry.Wentland@amd.com>; Zuo, Jerry
> <Jerry.Zuo@amd.com>; Wu, Hersen <hersenxs.wu@amd.com>; Juston Li <juston.li@intel.com>; Imre Deak <imre.deak@intel.com>;
> Ville Syrjälä <ville.syrjala@linux.intel.com>; Daniel Vetter <daniel.vetter@ffwll.ch>; Sean Paul <sean@poorly.run>; Maarten Lankhorst
> <maarten.lankhorst@linux.intel.com>; Maxime Ripard <mripard@kernel.org>; Thomas Zimmermann <tzimmermann@suse.de>;
> David Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; Deucher, Alexander <Alexander.Deucher@amd.com>; Siqueira,
> Rodrigo <Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo <Aurabindo.Pillai@amd.com>; Eryk Brol <eryk.brol@amd.com>; Bas
> Nieuwenhuizen <bas@basnieuwenhuizen.nl>; Cornij, Nikola <Nikola.Cornij@amd.com>; Jani Nikula <jani.nikula@intel.com>; Manasi
> Navare <manasi.d.navare@intel.com>; Ankit Nautiyal <ankit.k.nautiyal@intel.com>; José Roberto de Souza <jose.souza@intel.com>;
> Sean Paul <seanpaul@chromium.org>; Ben Skeggs <bskeggs@redhat.com>; stable@vger.kernel.org
> Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected end device
>
> On Wed, 2021-08-11 at 09:49 +0000, Lin, Wayne wrote:
> > [Public]
> >
> > > -----Original Message-----
> > > From: Lyude Paul <lyude@redhat.com>
> > > Sent: Wednesday, August 11, 2021 4:45 AM
> > > To: Lin, Wayne <Wayne.Lin@amd.com>; dri-devel@lists.freedesktop.org
> > > Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Wentland,
> > > Harry < Harry.Wentland@amd.com>; Zuo, Jerry <Jerry.Zuo@amd.com>; Wu,
> > > Hersen <hersenxs.wu@amd.com>; Juston Li < juston.li@intel.com>; Imre
> > > Deak <imre.deak@intel.com>; Ville Syrjälä
> > > <ville.syrjala@linux.intel.com>; Daniel Vetter <
> > > daniel.vetter@ffwll.ch>; Sean Paul <sean@poorly.run>; Maarten
> > > Lankhorst <maarten.lankhorst@linux.intel.com>; Maxime Ripard
> > > <mripard@kernel.org>; Thomas Zimmermann <tzimmermann@suse.de>; David
> > > Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; Deucher,
> > > Alexander <Alexander.Deucher@amd.com>; Siqueira, Rodrigo
> > > <Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo <
> > > Aurabindo.Pillai@amd.com>; Eryk Brol <eryk.brol@amd.com>; Bas
> > > Nieuwenhuizen <bas@basnieuwenhuizen.nl>; Cornij, Nikola <
> > > Nikola.Cornij@amd.com>; Jani Nikula <jani.nikula@intel.com>; Manasi
> > > Navare <manasi.d.navare@intel.com>; Ankit Nautiyal <
> > > ankit.k.nautiyal@intel.com>; José Roberto de Souza
> > > <jose.souza@intel.com>; Sean Paul <seanpaul@chromium.org>; Ben
> > > Skeggs <bskeggs@redhat.com>; stable@vger.kernel.org
> > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for
> > > connected end device
> > >
> > > On Wed, 2021-08-04 at 07:13 +0000, Lin, Wayne wrote:
> > > > [Public]
> > > >
> > > > > -----Original Message-----
> > > > > From: Lyude Paul <lyude@redhat.com>
> > > > > Sent: Wednesday, August 4, 2021 8:09 AM
> > > > > To: Lin, Wayne <Wayne.Lin@amd.com>;
> > > > > dri-devel@lists.freedesktop.org
> > > > > Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>;
> > > > > Wentland, Harry < Harry.Wentland@amd.com>; Zuo, Jerry
> > > > > <Jerry.Zuo@amd.com>; Wu, Hersen <hersenxs.wu@amd.com>; Juston Li
> > > > > < juston.li@intel.com>; Imre Deak <imre.deak@intel.com>; Ville
> > > > > Syrjälä <ville.syrjala@linux.intel.com>; Wentland, Harry <
> > > > > Harry.Wentland@amd.com>; Daniel Vetter <daniel.vetter@ffwll.ch>;
> > > > > Sean Paul <sean@poorly.run>; Maarten Lankhorst <
> > > > > maarten.lankhorst@linux.intel.com>; Maxime Ripard
> > > > > <mripard@kernel.org>; Thomas Zimmermann <tzimmermann@suse.de>;
> > > > > David Airlie <airlied@linux.ie>; Daniel Vetter
> > > > > <daniel@ffwll.ch>; Deucher, Alexander
> > > > > <Alexander.Deucher@amd.com>; Siqueira, Rodrigo <
> > > > > Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo
> > > > > <Aurabindo.Pillai@amd.com>; Eryk Brol <eryk.brol@amd.com>; Bas
> > > > > Nieuwenhuizen <bas@basnieuwenhuizen.nl>; Cornij, Nikola
> > > > > <Nikola.Cornij@amd.com>; Jani Nikula <jani.nikula@intel.com>;
> > > > > Manasi Navare <manasi.d.navare@intel.com>; Ankit Nautiyal
> > > > > <ankit.k.nautiyal@intel.com>; José Roberto de Souza
> > > > > <jose.souza@intel.com>; Sean Paul <seanpaul@chromium.org>; Ben
> > > > > Skeggs <bskeggs@redhat.com>; stable@vger.kernel.org
> > > > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for
> > > > > connected end device
> > > > >
> > > > > On Tue, 2021-08-03 at 19:58 -0400, Lyude Paul wrote:
> > > > > > On Wed, 2021-07-21 at 00:03 +0800, Wayne Lin wrote:
> > > > > > > [Why]
> > > > > > > Currently, we will create connectors for all output ports no
> > > > > > > matter it's connected or not. However, in MST, we can only
> > > > > > > determine whether an output port really stands for a "connector"
> > > > > > > till it is connected and check its peer device type as an
> > > > > > > end device.
> > > > > >
> > > > > > What is this commit trying to solve exactly? e.g. is AMD
> > > > > > currently running into issues with there being too many DRM
> > > > > > connectors or something like that?
> > > > > > Ideally this is behavior I'd very much like us to keep as-is
> > > > > > unless there's good reason to change it.
> > > > Hi Lyude,
> > > > Really appreciate for your time to elaborate in such detail. Thanks!
> > > >
> > > > I come up with this commit because I observed something confusing
> > > > when I was analyzing MST connectors' life cycle. Take the topology
> > > > instance you mentioned below
> > > >
> > > > Root MSTB -> Output_Port 1 -> MSTB 1.1 -> Output_Port 1 (Connected w/ display)
> > > >                     |                   -> Output_Port 2 (Disconnected)
> > > >                     -> Output_Port 2 -> MSTB 2.1 -> Output_Port 1 (Disconnected)
> > > >                                                  -> Output_Port 2 (Disconnected)
> > > >
> > > > Which is exactly the topology of
> > > > Startech DP 1-to-4 hub. There are 3 1-to-2 branch chips within
> > > > this hub. With our MST implementation today, we'll create drm
> > > > connectors for all output ports. Hence, we totally create 6 drm connectors here.
> > > > However, Output ports of Root MSTB are not connected to a stream sink.
> > > > They are connected with branch devices.
> > > > Thus, creating drm connector for such port looks a bit strange to
> > > > me and increases complexity to tracking drm connectors.  My
> > > > thought is we only need to create drm connector for those
> > > > connected end device. Once output port is connected then we can
> > > > determine whether to add on a drm connector for this port based on the peer device type.
> > > > Hence, this commit doesn't try to break the locking logic but add
> > > > more constraints when We try to add drm connector. Please correct
> > > > me if I misunderstand anything here. Thanks!
> > >
> > > Sorry-I will respond to this soon, some more stuff came up at work
> > > so it might take me a day or two
> > No worries. Much appreciated for your time!
> > >
>
> Alright - finally got some time to respond to this. So this change still doesn't really seem correct to me (if anyone watching this thread
> wants to chime in to correct me btw feel free).
>
> JFYI - I don't think the commit is trying to break anything intentionally, it's just that there's a lot of moving pieces with the locking here
> that are easy to trip over. That being said though, besides the locking issues after thinking about this I'm still a bit skeptical on how
> much this would work or even if we would want it.
>
> To start off - my main issue with this is that it sounds like we're basically entirely getting rid of the disconnected state for MST
> connectors, and then only exposing the connector when something is connected. Unless I'm missing something here, the PDT can
> pretty much change whenever something is connected/disconnected or across suspend/resume reprobes. To do this with the
> connector API would be very different from connector probing behavior for other connector types, which already seems like an issue
> to me. This would also break the ability to force a connector to be connected/disconnected, as there would no longer be a way to
> force a disconnected MST connector on.
>
> The other thing is I'm not entirely clear still on what's trying to be accomplished here. If you're trying to identify DRM connectors,
> there's already no guaranteed consistency with connector names which means that having less connectors doesn't really make things
> any easier to identify. For actually trying to figure out more details on connectors, if this is something userspace needs, this seems like
> something we should just be adding in the form of connector props.
>
> With all of this being said, this ends up just seeming like we're adding potentially a lot of complexity to how we create connectors and
> the suspend/resume reprobing code. I think it'd be good to know what the precise usecase for this actually is, if this is something you
> still think is needed.
Hi Lyude,

Thank you for being willing to explain in such detail. I really appreciate it.

I'm trying to fix some problems that we observed after these 2 patches:
* 09b974e8983 drm/amd/amdgpu_dm/mst: Remove ->destroy_connector() callback
* 72dc0f51591 drm/dp_mst: Remove drm_dp_mst_topology_cbs.destroy_connector

With the above patches, we now remove the dc_sink when the connector is about to be destroyed. However, we found that
connectors don't get destroyed after hotplugs. Thus, after a few hotplugs, we can't create any new dc_sink since the number of
sinks exceeds our limit. As a result, I'm trying to figure out why the refcount of the connectors never reaches zero.
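
To illustrate the failure mode, here is a toy userspace model, not amdgpu code. It assumes a fixed-size pool of sink slots where a slot is only released when its connector is destroyed. The names model_sink_pool, model_hotplug_in(), model_connector_destroyed() and the limit MODEL_MAX_SINKS are all hypothetical and chosen only for this sketch.

```c
#include <stdbool.h>

/* Hypothetical limit, picked only for this illustration. */
#define MODEL_MAX_SINKS 4

/* Toy stand-in for the driver's bookkeeping of allocated sinks. */
struct model_sink_pool {
	int in_use;
};

/* A hotplug-in needs a free sink slot; fails once the pool is exhausted. */
bool model_hotplug_in(struct model_sink_pool *pool)
{
	if (pool->in_use >= MODEL_MAX_SINKS)
		return false;
	pool->in_use++;
	return true;
}

/*
 * A slot is only returned here, i.e. when the connector is destroyed.
 * If the connector refcount never drops to zero, this is never called
 * and every unplug leaves a slot behind.
 */
void model_connector_destroyed(struct model_sink_pool *pool)
{
	if (pool->in_use > 0)
		pool->in_use--;
}
```

In this model, repeated plug/unplug cycles without a matching model_connector_destroyed() call fill the pool, after which every later hotplug fails, which matches the symptom described above.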

Based on my analysis, the problem occurs if we connect an sst monitor to an mst hub, connect the hub to the system, and then unplug
the sst monitor from the hub. E.g.
src - mst hub - sst monitor => src - mst hub  (unplug) sst monitor

In this case, we never put the refcount of the sst monitor, which is what I tried to resolve with [PATCH 3/4].
But this raises a question that confuses me: can I destroy the connector in this case? Compare it with another case, where the
mst hub is connected to an mst monitor like this:
src - mst hub - mst monitor => src - mst hub  (unplug) mst monitor

We will put the topology refcount of the mst monitor's branching unit in drm_dp_port_set_pdt() and eventually call
drm_dp_delayed_destroy_port() to unregister the connector of the logical port. Following the same rule, I think dynamically
unregistering an mst connector is what we want, and it should be reasonable to also destroy sst connectors in my case. But this conflicts with
the idea we have here, which is to create connectors for all output ports. So if dynamically creating/destroying connectors is what we
want, the question I'm considering is when the appropriate time to create one is.

Take the StarTech 1-to-4 DP hub for instance. Internally, this hub is built from 3 1-to-2 mst branch chips. The 2 output
ports of the 1st chip are hardwired to the other 2 chips; that is how it supports 1-to-4 mst branching. So in this case, the 2 internal
output ports of the 1st chip are not connected to a stream sink and never will be. Thus, I'm thinking the best time
to attach a connector to a port may be when the port is connected and the connected PDT is determined to be a stream sink.
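
To make the proposed condition concrete, here is a small standalone sketch, not the actual patch. It models the check in userspace C. The PDT_* values are meant to mirror the kernel's DP_PEER_DEVICE_* constants as I understand them, and model_is_end_device() follows my reading of drm_dp_mst_is_end_device(); struct model_port and the other model_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Peer device types; assumed to match the kernel's DP_PEER_DEVICE_* values. */
enum model_pdt {
	PDT_NONE           = 0x0,
	PDT_SOURCE_OR_SST  = 0x1,
	PDT_MST_BRANCHING  = 0x2,
	PDT_SST_SINK       = 0x3,
	PDT_DP_LEGACY_CONV = 0x4,
};

/* Hypothetical, trimmed-down stand-in for struct drm_dp_mst_port. */
struct model_port {
	bool input;         /* true for input ports */
	bool has_connector; /* a connector is already attached */
	uint8_t pdt;        /* peer device type reported for the port */
	bool mcs;           /* peer supports MST message transactions */
};

/* A branching peer counts as an end device only if it lacks MCS. */
bool model_is_end_device(uint8_t pdt, bool mcs)
{
	switch (pdt) {
	case PDT_DP_LEGACY_CONV:
	case PDT_SST_SINK:
		return true;
	case PDT_MST_BRANCHING:
		return !mcs;
	}
	return true;
}

/* Proposed rule: output port, no connector yet, connected, end device. */
bool model_should_create_connector(const struct model_port *port)
{
	return !port->input && !port->has_connector &&
	       port->pdt != PDT_NONE &&
	       model_is_end_device(port->pdt, port->mcs);
}

/* Count how many ports in an array would get a connector under the rule. */
int model_count_connectors(const struct model_port *ports, int n)
{
	int count = 0;

	for (int i = 0; i < n; i++)
		if (model_should_create_connector(&ports[i]))
			count++;
	return count;
}
```

Applied to the hub above with one display attached, the two internal ports of the 1st chip (PDT_MST_BRANCHING with mcs set) and the three disconnected ports (PDT_NONE) are skipped, so only the port with the display would get a connector, instead of all 6 output ports.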

Sorry if I misunderstand anything here, and thanks a lot for taking the time to shed light on this : ) Thanks Lyude.
>
> > > > > >
> > > > > > Some context here btw - there's a lot of subtleties with MST
> > > > > > locking that isn't immediately obvious. It's been a while
> > > > > > since I wrote this code, but if I recall correctly one of
> > > > > > those subtleties is that trying to create/destroy connectors
> > > > > > on the fly when ports change types introduces a lot of
> > > > > > potential issues with locking and some very complicated state
> > > > > > transitions. Note that because we maintain the topology as
> > > > > > much as possible across suspend/resumes this means there's a
> > > > > > lot of potential state transitions with drm_dp_mst_port and
> > > > > > drm_dp_mst_branch we need to handle that would typically be impossible to run into otherwise.
> > > > > >
> > > > > > An example of this, if we were to try to prune connectors
> > > > > > based on PDT on the fly: assume we have a simple topology like
> > > > > > this
> > > > > >
> > > > > > Root MSTB -> Port 1 -> MSTB 1.1 (Connected w/ display)
> > > > > >           -> Port 2 -> MSTB 2.1
> > > > > >
> > > > > > We suspend the system, unplug MSTB 1.1, and then resume. Once
> > > > > > the system starts reprobing, it will notice that MSTB 1.1 has
> > > > > > been disconnected. Since we no longer have a PDT, we decide to
> > > > > > unregister our connector. But there's a catch! We had a
> > > > > > display connected to MSTB 1.1, so even after unregistering the
> > > > > > connector it's going to stay around until userspace has
> > > > > > committed a new mode with the connector disabled.
> > > > > >
> > > > > > Now - assuming we're still in the same spot in the resume
> > > > > > > process, let's assume somehow MSTB 1.1 is suddenly plugged
> > > > > > back in. Once we've finished responding to the hotplug event,
> > > > > > we will have created a connector for it. Now we've hit a bug -
> > > > > > userspace hasn't removed the previous zombie connector which
> > > > > > means we have references to the drm_dp_mst_port in our atomic
> > > > > > state and potentially also our payload tables (?? unsure about this one).
> > > > >
> > > > > Whoops. One thing I totally forgot to mention here: the reason
> > > > > this is a problem is because we'd now have two drm_connectors
> > > > > which both have the same drm_dp_mst_port pointer.
> > > > >
> > > > > >
> > > > > > So then how do we manage to add/remove connectors for input
> > > > > > connectors on the fly? Well, that's one of the fun
> > > > > > normally-impossible state transitions I mentioned before.
> > > > > > According to the spec input ports are always disconnected, so
> > > > > > we'll never receive a CSN for them. This means
> > > > I think input ports' DisplayPort_Device_Plug_Status field is still
> > > > set to 1?
> > > > But yes, according to DP1.4 spec 2.11.9.3, an MST device whose DPRX
> > > > detected the connection status change shall broadcast the CSN downstream only.
> > > > Hence, we'll never receive a CSN for this case.
> > > > > > > in theory the only possible way we could have a connector go
> > > > > > > from being an input connector to an output connector
> > > > > > > would be if the entire topology was swapped out during
> > > > > > > suspend/resume, and the input/output ports in the two
> > > > > > > topologies happen to be in different places.
> > > > > > Since we only have to reprobe once during resume before we get
> > > > > > hotplugging enabled, we're guaranteed this state transition
> > > > > > will only happen once in this state - which means the second
> > > > > > replug I described in the previous paragraph can never happen.
> > > > > >
> > > > > > Note that while I don't actually know if there's topologies
> > > > > > with input ports at indexes other than 0, since the
> > > > > > specification isn't super clear on this bit we play it safe and assume it is possible.
> > > > Based on DP1.4 spec 2.5.1, physical input ports are assigned
> > > > smaller port numbers than physical output ports. For a concentrator
> > > > product, if its branch device has 2 input ports, then
> > > > their port numbers are port 0 and port 1; see figure 2-122 of DP1.4.
> > > > > >
> > > > > > Anyway-this is -all- based off my memory, so please point out
> > > > > > anything here that I've explained that doesn't make sense or
> > > > > > doesn't seem correct :). It's totally possible I might have
> > > > > > misremembered something.
> > > > Thanks again Lyude! Much appreciated for your time and help! And
> > > > please correct me if I misunderstand anything here : )
> > > > > >
> > > > > > >
> > > > > > > In current code, we have chance to create connectors for
> > > > > > > output ports connected with branch device and these are
> > > > > > > redundant connectors.
> > > > > > > e.g.
> > > > > > > StarTech 1-to-4 DP hub is constructed by internal 2 layer
> > > > > > > 1-to-2 branch devices. Creating connectors for such internal
> > > > > > > output ports are redundant.
> > > > > > >
> > > > > > > [How]
> > > > > > > Put constraint on creating connector for connected end
> > > > > > > device only.
> > > > > > >
> > > > > > > Fixes: 6f85f73821f6 ("drm/dp_mst: Add basic topology
> > > > > > > reprobing when
> > > > > > > resuming")
> > > > > > > Cc: Juston Li <juston.li@intel.com>
> > > > > > > Cc: Imre Deak <imre.deak@intel.com>
> > > > > > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > > > > Cc: Harry Wentland <hwentlan@amd.com>
> > > > > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > > > > Cc: Sean Paul <sean@poorly.run>
> > > > > > > Cc: Lyude Paul <lyude@redhat.com>
> > > > > > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > > > > > > Cc: Maxime Ripard <mripard@kernel.org>
> > > > > > > Cc: Thomas Zimmermann <tzimmermann@suse.de>
> > > > > > > Cc: David Airlie <airlied@linux.ie>
> > > > > > > Cc: Daniel Vetter <daniel@ffwll.ch>
> > > > > > > Cc: Alex Deucher <alexander.deucher@amd.com>
> > > > > > > Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> > > > > > > Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
> > > > > > > Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
> > > > > > > Cc: Eryk Brol <eryk.brol@amd.com>
> > > > > > > Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
> > > > > > > Cc: Nikola Cornij <nikola.cornij@amd.com>
> > > > > > > Cc: Wayne Lin <Wayne.Lin@amd.com>
> > > > > > > Cc: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
> > > > > > > Cc: Jani Nikula <jani.nikula@intel.com>
> > > > > > > Cc: Manasi Navare <manasi.d.navare@intel.com>
> > > > > > > Cc: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
> > > > > > > Cc: "José Roberto de Souza" <jose.souza@intel.com>
> > > > > > > Cc: Sean Paul <seanpaul@chromium.org>
> > > > > > > Cc: Ben Skeggs <bskeggs@redhat.com>
> > > > > > > Cc: dri-devel@lists.freedesktop.org
> > > > > > > Cc: <stable@vger.kernel.org> # v5.5+
> > > > > > > Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
> > > > > > > ---
> > > > > > >  drivers/gpu/drm/drm_dp_mst_topology.c | 7 ++++++-
> > > > > > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > > > > >
> > > > > > > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > > > > > index 51cd7f74f026..f13c7187b07f 100644
> > > > > > > > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > > > > > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > > > > > @@ -2474,7 +2474,8 @@ drm_dp_mst_handle_link_address_port(struct drm_dp_mst_branch *mstb,
> > > > > > > >
> > > > > > > >         if (port->connector)
> > > > > > > >                 drm_modeset_unlock(&mgr->base.lock);
> > > > > > > > -       else if (!port->input)
> > > > > > > > +       else if (!port->input && port->pdt != DP_PEER_DEVICE_NONE &&
> > > > > > > > +                drm_dp_mst_is_end_device(port->pdt, port->mcs))
> > > > > > > >                 drm_dp_mst_port_add_connector(mstb, port);
> > > > > > > >
> > > > > > > >         if (send_link_addr && port->mstb) {
> > > > > > > > @@ -2557,6 +2558,10 @@ drm_dp_mst_handle_conn_stat(struct drm_dp_mst_branch *mstb,
> > > > > > > >                 dowork = false;
> > > > > > > >         }
> > > > > > > >
> > > > > > > > +       if (!port->input && !port->connector && new_pdt != DP_PEER_DEVICE_NONE &&
> > > > > > > > +           drm_dp_mst_is_end_device(new_pdt, new_mcs))
> > > > > > > > +               create_connector = true;
> > > > > > > > +
> > > > > > > >         if (port->connector)
> > > > > > > >                 drm_modeset_unlock(&mgr->base.lock);
> > > > > > > >         else if (create_connector)
> > > > > >
> > > > >
> > > > > --
> > > > > Cheers,
> > > > >  Lyude Paul (she/her)
> > > > >  Software Engineer at Red Hat
> > > > Regards,
> > > > Wayne Lin
> > > >
> > >
> > > --
> > > Cheers,
> > >  Lyude Paul (she/her)
> > >  Software Engineer at Red Hat
> > --
> > Regards,
> > Wayne Lin
> >
>
> --
> Cheers,
>  Lyude Paul (she/her)
>  Software Engineer at Red Hat
Best regards,
Wayne Lin
Lyude Paul Aug. 20, 2021, 8:47 p.m. UTC | #8
On Fri, 2021-08-20 at 11:20 +0000, Lin, Wayne wrote:
> [Public]
> 
> > -----Original Message-----
> > From: Lyude Paul <lyude@redhat.com>
> > Sent: Thursday, August 19, 2021 2:59 AM
> > To: Lin, Wayne <Wayne.Lin@amd.com>; dri-devel@lists.freedesktop.org
> > Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Wentland, Harry <
> > Harry.Wentland@amd.com>; Zuo, Jerry
> > <Jerry.Zuo@amd.com>; Wu, Hersen <hersenxs.wu@amd.com>; Juston Li <
> > juston.li@intel.com>; Imre Deak <imre.deak@intel.com>;
> > Ville Syrjälä <ville.syrjala@linux.intel.com>; Daniel Vetter <
> > daniel.vetter@ffwll.ch>; Sean Paul <sean@poorly.run>; Maarten Lankhorst
> > <maarten.lankhorst@linux.intel.com>; Maxime Ripard <mripard@kernel.org>;
> > Thomas Zimmermann <tzimmermann@suse.de>;
> > David Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; Deucher,
> > Alexander <Alexander.Deucher@amd.com>; Siqueira,
> > Rodrigo <Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo <
> > Aurabindo.Pillai@amd.com>; Eryk Brol <eryk.brol@amd.com>; Bas
> > Nieuwenhuizen <bas@basnieuwenhuizen.nl>; Cornij, Nikola <
> > Nikola.Cornij@amd.com>; Jani Nikula <jani.nikula@intel.com>; Manasi
> > Navare <manasi.d.navare@intel.com>; Ankit Nautiyal <
> > ankit.k.nautiyal@intel.com>; José Roberto de Souza <jose.souza@intel.com>;
> > Sean Paul <seanpaul@chromium.org>; Ben Skeggs <bskeggs@redhat.com>; 
> > stable@vger.kernel.org
> > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected
> > end device
> > 
> > On Wed, 2021-08-11 at 09:49 +0000, Lin, Wayne wrote:
> > > [Public]
> > > 
> > > > -----Original Message-----
> > > > From: Lyude Paul <lyude@redhat.com>
> > > > Sent: Wednesday, August 11, 2021 4:45 AM
> > > > To: Lin, Wayne <Wayne.Lin@amd.com>; dri-devel@lists.freedesktop.org
> > > > Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Wentland,
> > > > Harry < Harry.Wentland@amd.com>; Zuo, Jerry <Jerry.Zuo@amd.com>; Wu,
> > > > Hersen <hersenxs.wu@amd.com>; Juston Li < juston.li@intel.com>; Imre
> > > > Deak <imre.deak@intel.com>; Ville Syrjälä
> > > > <ville.syrjala@linux.intel.com>; Daniel Vetter <
> > > > daniel.vetter@ffwll.ch>; Sean Paul <sean@poorly.run>; Maarten
> > > > Lankhorst <maarten.lankhorst@linux.intel.com>; Maxime Ripard
> > > > <mripard@kernel.org>; Thomas Zimmermann <tzimmermann@suse.de>; David
> > > > Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; Deucher,
> > > > Alexander <Alexander.Deucher@amd.com>; Siqueira, Rodrigo
> > > > <Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo <
> > > > Aurabindo.Pillai@amd.com>; Eryk Brol <eryk.brol@amd.com>; Bas
> > > > Nieuwenhuizen <bas@basnieuwenhuizen.nl>; Cornij, Nikola <
> > > > Nikola.Cornij@amd.com>; Jani Nikula <jani.nikula@intel.com>; Manasi
> > > > Navare <manasi.d.navare@intel.com>; Ankit Nautiyal <
> > > > ankit.k.nautiyal@intel.com>; José Roberto de Souza
> > > > <jose.souza@intel.com>; Sean Paul <seanpaul@chromium.org>; Ben
> > > > Skeggs <bskeggs@redhat.com>; stable@vger.kernel.org
> > > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for
> > > > connected end device
> > > > 
> > > > On Wed, 2021-08-04 at 07:13 +0000, Lin, Wayne wrote:
> > > > > [Public]
> > > > > 
> > > > > > -----Original Message-----
> > > > > > From: Lyude Paul <lyude@redhat.com>
> > > > > > Sent: Wednesday, August 4, 2021 8:09 AM
> > > > > > To: Lin, Wayne <Wayne.Lin@amd.com>;
> > > > > > dri-devel@lists.freedesktop.org
> > > > > > Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>;
> > > > > > Wentland, Harry < Harry.Wentland@amd.com>; Zuo, Jerry
> > > > > > <Jerry.Zuo@amd.com>; Wu, Hersen <hersenxs.wu@amd.com>; Juston Li
> > > > > > < juston.li@intel.com>; Imre Deak <imre.deak@intel.com>; Ville
> > > > > > Syrjälä <ville.syrjala@linux.intel.com>; Wentland, Harry <
> > > > > > Harry.Wentland@amd.com>; Daniel Vetter <daniel.vetter@ffwll.ch>;
> > > > > > Sean Paul <sean@poorly.run>; Maarten Lankhorst <
> > > > > > maarten.lankhorst@linux.intel.com>; Maxime Ripard
> > > > > > <mripard@kernel.org>; Thomas Zimmermann <tzimmermann@suse.de>;
> > > > > > David Airlie <airlied@linux.ie>; Daniel Vetter
> > > > > > <daniel@ffwll.ch>; Deucher, Alexander
> > > > > > <Alexander.Deucher@amd.com>; Siqueira, Rodrigo <
> > > > > > Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo
> > > > > > <Aurabindo.Pillai@amd.com>; Eryk Brol <eryk.brol@amd.com>; Bas
> > > > > > Nieuwenhuizen <bas@basnieuwenhuizen.nl>; Cornij, Nikola
> > > > > > <Nikola.Cornij@amd.com>; Jani Nikula <jani.nikula@intel.com>;
> > > > > > Manasi Navare <manasi.d.navare@intel.com>; Ankit Nautiyal
> > > > > > <ankit.k.nautiyal@intel.com>; José Roberto de Souza
> > > > > > <jose.souza@intel.com>; Sean Paul <seanpaul@chromium.org>; Ben
> > > > > > Skeggs <bskeggs@redhat.com>; stable@vger.kernel.org
> > > > > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for
> > > > > > connected end device
> > > > > > 
> > > > > > On Tue, 2021-08-03 at 19:58 -0400, Lyude Paul wrote:
> > > > > > > On Wed, 2021-07-21 at 00:03 +0800, Wayne Lin wrote:
> > > > > > > > [Why]
> > > > > > > > Currently, we will create connectors for all output ports no
> > > > > > > > matter it's connected or not. However, in MST, we can only
> > > > > > > > determine whether an output port really stands for a
> > > > > > > > "connector"
> > > > > > > > till it is connected and check its peer device type as an
> > > > > > > > end device.
> > > > > > > 
> > > > > > > What is this commit trying to solve exactly? e.g. is AMD
> > > > > > > currently running into issues with there being too many DRM
> > > > > > > connectors or something like that?
> > > > > > > Ideally this is behavior I'd very much like us to keep as-is
> > > > > > > unless there's good reason to change it.
> > > > > Hi Lyude,
> > > > > Really appreciate for your time to elaborate in such detail. Thanks!
> > > > > 
> > > > > I come up with this commit because I observed something confusing
> > > > > when I was analyzing MST connectors' life cycle. Take the topology
> > > > > instance you mentioned below
> > > > > 
> > > > > > Root MSTB -> Output_Port 1 -> MSTB 1.1 -> Output_Port 1 (Connected w/ display)
> > > > > >                     |                   -> Output_Port 2 (Disconnected)
> > > > > >                     -> Output_Port 2 -> MSTB 2.1 -> Output_Port 1 (Disconnected)
> > > > > >                                                  -> Output_Port 2 (Disconnected)
> > > > > >
> > > > > > Which is exactly the topology of
> > > > > Startech DP 1-to-4 hub. There are 3 1-to-2 branch chips within
> > > > > this hub. With our MST implementation today, we'll create drm
> > > > > connectors for all output ports. Hence, we totally create 6 drm
> > > > > connectors here.
> > > > > However, Output ports of Root MSTB are not connected to a stream
> > > > > sink.
> > > > > They are connected with branch devices.
> > > > > Thus, creating drm connector for such port looks a bit strange to
> > > > > me and increases complexity to tracking drm connectors.  My
> > > > > thought is we only need to create drm connector for those
> > > > > connected end device. Once output port is connected then we can
> > > > > determine whether to add on a drm connector for this port based on
> > > > > the peer device type.
> > > > > Hence, this commit doesn't try to break the locking logic but add
> > > > > more constraints when We try to add drm connector. Please correct
> > > > > me if I misunderstand anything here. Thanks!
> > > > 
> > > > Sorry-I will respond to this soon, some more stuff came up at work
> > > > so it might take me a day or two
> > > No worries. Much appreciated for your time!
> > > > 
> > 
> > Alright - finally got some time to respond to this. So this change still
> > doesn't really seem correct to me (if anyone watching this thread
> > wants to chime in to correct me btw feel free).
> > 
> > JFYI - I don't think the commit is trying to break anything intentionally,
> > it's just that there's a lot of moving pieces with the locking here
> > that are easy to trip over. That being said though, besides the locking
> > issues after thinking about this I'm still a bit skeptical on how
> > much this would work or even if we would want it.
> > 
> > To start off - my main issue with this is that it sounds like we're
> > basically entirely getting rid of the disconnected state for MST
> > connectors, and then only exposing the connector when something is
> > connected. Unless I'm missing something here, the PDT can
> > pretty much change whenever something is connected/disconnected or across
> > suspend/resume reprobes. To do this with the
> > connector API would be very different from connector probing behavior for
> > other connector types, which already seems like an issue
> > to me. This would also break the ability to force a connector to be
> > connected/disconnected, as there would no longer be a way to
> > force a disconnected MST connector on.
> > 
> > The other thing is I'm not entirely clear still on what's trying to be
> > accomplished here. If you're trying to identify DRM connectors,
> > there's already no guaranteed consistency with connector names which means
> > that having less connectors doesn't really make things
> > any easier to identify. For actually trying to figure out more details on
> > connectors, if this is something userspace needs, this seems like
> > something we should just be adding in the form of connector props.
> > 
> > With all of this being said, this ends up just seeming like we're adding
> > potentially a lot of complexity to how we create connectors and
> > the suspend/resume reprobing code. I think it'd be good to know what the
> > precise usecase for this actually is, if this is something you
> > still think is needed.
> Hi Lyude,
> 
> Really thankful for willing to explain in such details. Really appreciate.
> 
> I'm trying to fix some problems that observed after these 2 patches
> * 09b974e8983 drm/amd/amdgpu_dm/mst: Remove ->destroy_connector() callback
> * 72dc0f51591 drm/dp_mst: Remove drm_dp_mst_topology_cbs.destroy_connector
> 
> With above patches, we now change to remove dc_sink when connector is about
> to be destroyed. However, we found out that
> connectors won't get destroyed after hotplugs. Thus, after few times
> hotplugs, we won't create any new dc_sink since number of
> sink is exceeding our limitation. As the result of that, I'm trying to
> figure out why the refcount of connectors won't get zero.
> 
> Based on my analysis, I found out that if we connect a sst monitor to a mst
> hub then connect the hub to the system, and then unplug
> the sst monitor from the hub. E.g.
> src - mst hub - sst monitor => src - mst hub  (unplug) sst monitor
> 
> Within this case, we won't try to put refcount of the sst monitor. Which is
> what I tried to resolve by [PATCH 3/4].
> But here comes a problem which is confusing me that if I can destroy
> connector in this case. By comparing to another case, if now
> mst hub is connected with a mst monitor like this:
> src - mst hub - mst monitor => src - mst hub  (unplug) mst monitor
> 
> We will put the topology refcount of mst monitor's branching unit in
> drm_dp_port_set_pdt() and eventually call
> drm_dp_delayed_destroy_port() to unregister the connector of the logical
> port. So following the same rule, I think to dynamically
> unregister a mst connector is what we want and should be reasonable to also
> destroy sst connectors in my case. But this conflicts the
> idea what we have here. We want to create connectors for all output ports.
> So if dynamically creating/destroying connectors is what we
> want, when is the appropriate time for us to create one is what I'm
> considering.
> 
> Take the StartTech hub DP 1to4 DP output ports for instance. This hub,
> internally, is constructed by  3 1-to-2 mst branch chips. 2 output
> ports of 1st chip are hardwired to another 2 chips. It's how it makes it to
> support 1-to-4 mst branching. So within this case, the internal 2
> output ports of 1st chip is not connecting to a stream sink and will never
> get connected to one.  Thus, I'm thinking maybe the best timing
> to attach a connector to a port is when the port is connected, and the
> connected PDT is determined as a stream sink.
> 
> Sorry if I misunderstand anything here and really thanks for your time to
> shed light on this : ) Thanks Lyude.

It's no problem, it is my job after all! Sorry for how long my responses have
been taking, but my plate seems to be finally clearing up for the foreseeable
future.

That being said - it sounds like with this we still aren't actually clear on
where the topology refcount leak is happening - only when it's happening,
which says to me that the leak itself is what we really need to find the cause
of, as opposed to trying to work around it.

Actually - refcount leaks are an issue I've run into a number of times in the
past, so a while back I added some nice debugging features to assist with
debugging them. You can turn these on by enabling the following options in
your kernel config:

CONFIG_EXPERT=y # This must be set first before the next option
CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS=y

Unfortunately, I'm suddenly realizing after typing this that apparently I
never bothered adding a way for us to debug the refcounts of ports/mstbs that
haven't been released yet - only for the ones that have. This shouldn't be
difficult at all for me to add, so I'll send you a patch either today or at
the start of next week to try debugging with, and then we can figure out
where this leak is really coming from.
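
To illustrate the kind of tracking described above, here is a minimal userspace sketch in C. It is not the kernel's CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS implementation; struct tracked_ref, tracked_get(), tracked_put() and tracked_dump_live() are hypothetical names invented for this example. The idea is only that each get/put is attributed to an owner label, so that an object which is still alive can be asked which owners have not yet dropped their references.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_OWNERS 8

/* Hypothetical per-owner bookkeeping; not a kernel structure. */
struct ref_owner {
    const char *label; /* who holds the reference, e.g. "connector" */
    int gets;
    int puts;
};

struct tracked_ref {
    int count;                          /* current reference count */
    struct ref_owner owners[MAX_OWNERS];
    int nowners;
};

/* Find the record for an owner label, creating it on first use. */
static struct ref_owner *find_owner(struct tracked_ref *r, const char *label)
{
    for (int i = 0; i < r->nowners; i++)
        if (strcmp(r->owners[i].label, label) == 0)
            return &r->owners[i];

    assert(r->nowners < MAX_OWNERS);
    struct ref_owner *o = &r->owners[r->nowners++];
    o->label = label;
    o->gets = 0;
    o->puts = 0;
    return o;
}

static void tracked_get(struct tracked_ref *r, const char *label)
{
    find_owner(r, label)->gets++;
    r->count++;
}

static void tracked_put(struct tracked_ref *r, const char *label)
{
    assert(r->count > 0);
    find_owner(r, label)->puts++;
    r->count--;
}

/*
 * Report owners whose gets and puts do not balance on an object that has
 * not been released yet. Returns the number of unbalanced owners.
 */
static int tracked_dump_live(const struct tracked_ref *r, FILE *out)
{
    int unbalanced = 0;

    for (int i = 0; i < r->nowners; i++) {
        int delta = r->owners[i].gets - r->owners[i].puts;

        if (delta != 0) {
            fprintf(out, "%s: %d outstanding\n", r->owners[i].label, delta);
            unbalanced++;
        }
    }
    return unbalanced;
}
```

With a zero-initialized struct tracked_ref, taking references as "topology" and "connector" and dropping only the "topology" one leaves the count at 1, and tracked_dump_live() reports the "connector" owner as the one still holding on.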

> > 
> > > > > > > 
> > > > > > > Some context here btw - there's a lot of subtleties with MST
> > > > > > > locking that isn't immediately obvious. It's been a while
> > > > > > > since I wrote this code, but if I recall correctly one of
> > > > > > > those subtleties is that trying to create/destroy connectors
> > > > > > > on the fly when ports change types introduces a lot of
> > > > > > > potential issues with locking and some very complicated state
> > > > > > > transitions. Note that because we maintain the topology as
> > > > > > > much as possible across suspend/resumes this means there's a
> > > > > > > lot of potential state transitions with drm_dp_mst_port and
> > > > > > > drm_dp_mst_branch we need to handle that would typically be
> > > > > > > impossible to run into otherwise.
> > > > > > > 
> > > > > > > An example of this, if we were to try to prune connectors
> > > > > > > based on PDT on the fly: assume we have a simple topology like
> > > > > > > this
> > > > > > > 
> > > > > > > Root MSTB -> Port 1 -> MSTB 1.1 (Connected w/ display)
> > > > > > >           -> Port 2 -> MSTB 2.1
> > > > > > > 
> > > > > > > We suspend the system, unplug MSTB 1.1, and then resume. Once
> > > > > > > the system starts reprobing, it will notice that MSTB 1.1 has
> > > > > > > been disconnected. Since we no longer have a PDT, we decide to
> > > > > > > unregister our connector. But there's a catch! We had a
> > > > > > > display connected to MSTB 1.1, so even after unregistering the
> > > > > > > connector it's going to stay around until userspace has
> > > > > > > committed a new mode with the connector disabled.
> > > > > > > 
> > > > > > > Now - assuming we're still in the same spot in the resume
> > > > > > > > process, let's assume somehow MSTB 1.1 is suddenly plugged
> > > > > > > back in. Once we've finished responding to the hotplug event,
> > > > > > > we will have created a connector for it. Now we've hit a bug -
> > > > > > > userspace hasn't removed the previous zombie connector which
> > > > > > > means we have references to the drm_dp_mst_port in our atomic
> > > > > > > state and potentially also our payload tables (?? unsure about
> > > > > > > this one).
> > > > > > 
> > > > > > Whoops. One thing I totally forgot to mention here: the reason
> > > > > > this is a problem is because we'd now have two drm_connectors
> > > > > > which both have the same drm_dp_mst_port pointer.
> > > > > > 
> > > > > > > 
> > > > > > > So then how do we manage to add/remove connectors for input
> > > > > > > connectors on the fly? Well, that's one of the fun
> > > > > > > normally-impossible state transitions I mentioned before.
> > > > > > > According to the spec input ports are always disconnected, so
> > > > > > > we'll never receive a CSN for them. This means
> > > > > I think input ports' DisplayPort_Device_Plug_Status field is still
> > > > > set to 1?
> > > > > But yes,
> > > > > according to DP1.4 spec 2.11.9.3, when MST device whose DPRX
> > > > > detected the connection status change shall broadcast CSN downstream
> > > > > only.
> > > > > Hence, we'll never receive a CSN for this case.
> > > > > > > in theory the only possible way we could have a connector go
> > > > > > > > from being an input connector to an output connector
> > > > > > > would be if the entire topology was swapped out during
> > > > > > > suspend/resume, and the input/output ports in the two
> > > > > > > > topologies happen to be in different places.
> > > > > > > Since we only have to reprobe once during resume before we get
> > > > > > > hotplugging enabled, we're guaranteed this state transition
> > > > > > > will only happen once in this state - which means the second
> > > > > > > replug I described in the previous paragraph can never happen.
> > > > > > > 
> > > > > > > Note that while I don't actually know if there's topologies
> > > > > > > with input ports at indexes other than 0, since the
> > > > > > > specification isn't super clear on this bit we play it safe and
> > > > > > > assume it is possible.
> > > > > Based on DP1.4 spec 2.5.1. Physical input ports are assigned
> > > > > smaller port numbers than physical output ports. For concentrator
> > > > > product, if there are 2 input ports of it's branch device, then
> > > > > their port numbers are port 0 & port 1,
> > > > > which can refer to figure 2-122 of DP1.4.
> > > > > > > 
> > > > > > > Anyway-this is -all- based off my memory, so please point out
> > > > > > > anything here that I've explained that doesn't make sense or
> > > > > > > doesn't seem correct :). It's totally possible I might have
> > > > > > > misremembered something.
> > > > > Thanks again Lyude! Much appreciated for your time and help! And
> > > > > please correct me if I misunderstand anything here : )
> > > > > > > 
> > > > > > > > 
> > > > > > > > In current code, we have chance to create connectors for
> > > > > > > > output ports connected with branch device and these are
> > > > > > > > redundant connectors.
> > > > > > > > e.g.
> > > > > > > > StarTech 1-to-4 DP hub is constructed by internal 2 layer
> > > > > > > > 1-to-2 branch devices. Creating connectors for such internal
> > > > > > > > output ports are redundant.
> > > > > > > > 
> > > > > > > > [How]
> > > > > > > > Put constraint on creating connector for connected end
> > > > > > > > device only.
> > > > > > > > 
> > > > > > > > Fixes: 6f85f73821f6 ("drm/dp_mst: Add basic topology
> > > > > > > > reprobing when
> > > > > > > > resuming")
> > > > > > > > Cc: Juston Li <juston.li@intel.com>
> > > > > > > > Cc: Imre Deak <imre.deak@intel.com>
> > > > > > > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > > > > > Cc: Harry Wentland <hwentlan@amd.com>
> > > > > > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > > > > > Cc: Sean Paul <sean@poorly.run>
> > > > > > > > Cc: Lyude Paul <lyude@redhat.com>
> > > > > > > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > > > > > > > Cc: Maxime Ripard <mripard@kernel.org>
> > > > > > > > Cc: Thomas Zimmermann <tzimmermann@suse.de>
> > > > > > > > Cc: David Airlie <airlied@linux.ie>
> > > > > > > > Cc: Daniel Vetter <daniel@ffwll.ch>
> > > > > > > > Cc: Alex Deucher <alexander.deucher@amd.com>
> > > > > > > > Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> > > > > > > > Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
> > > > > > > > Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
> > > > > > > > Cc: Eryk Brol <eryk.brol@amd.com>
> > > > > > > > Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
> > > > > > > > Cc: Nikola Cornij <nikola.cornij@amd.com>
> > > > > > > > Cc: Wayne Lin <Wayne.Lin@amd.com>
> > > > > > > > Cc: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
> > > > > > > > Cc: Jani Nikula <jani.nikula@intel.com>
> > > > > > > > Cc: Manasi Navare <manasi.d.navare@intel.com>
> > > > > > > > Cc: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
> > > > > > > > Cc: "José Roberto de Souza" <jose.souza@intel.com>
> > > > > > > > Cc: Sean Paul <seanpaul@chromium.org>
> > > > > > > > Cc: Ben Skeggs <bskeggs@redhat.com>
> > > > > > > > Cc: dri-devel@lists.freedesktop.org
> > > > > > > > Cc: <stable@vger.kernel.org> # v5.5+
> > > > > > > > Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
> > > > > > > > ---
> > > > > > > >  drivers/gpu/drm/drm_dp_mst_topology.c | 7 ++++++-
> > > > > > > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > > > > > > 
> > > > > > > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > > > > > b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > > > > > index 51cd7f74f026..f13c7187b07f 100644
> > > > > > > > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > > > > > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > > > > > @@ -2474,7 +2474,8 @@
> > > > > > > > drm_dp_mst_handle_link_address_port(struct
> > > > > > > > drm_dp_mst_branch *mstb,
> > > > > > > > 
> > > > > > > >         if (port->connector)
> > > > > > > >                 drm_modeset_unlock(&mgr->base.lock);
> > > > > > > > -       else if (!port->input)
> > > > > > > > +       else if (!port->input && port->pdt !=
> > > > > > > > +DP_PEER_DEVICE_NONE &&
> > > > > > > > +                drm_dp_mst_is_end_device(port->pdt,
> > > > > > > > +port->mcs))
> > > > > > > >                 drm_dp_mst_port_add_connector(mstb, port);
> > > > > > > > 
> > > > > > > >         if (send_link_addr && port->mstb) { @@ -2557,6
> > > > > > > > +2558,10 @@ drm_dp_mst_handle_conn_stat(struct
> > > > > > > > drm_dp_mst_branch
> > > > > > > > *mstb,
> > > > > > > >                 dowork = false;
> > > > > > > >         }
> > > > > > > > 
> > > > > > > > +       if (!port->input && !port->connector && new_pdt !=
> > > > > > > > DP_PEER_DEVICE_NONE &&
> > > > > > > > +           drm_dp_mst_is_end_device(new_pdt, new_mcs))
> > > > > > > > +               create_connector = true;
> > > > > > > > +
> > > > > > > >         if (port->connector)
> > > > > > > >                 drm_modeset_unlock(&mgr->base.lock);
> > > > > > > >         else if (create_connector)
> > > > > > > 
> > > > > > 
> > > > > > --
> > > > > > Cheers,
> > > > > >  Lyude Paul (she/her)
> > > > > >  Software Engineer at Red Hat
> > > > > Regards,
> > > > > Wayne Lin
> > > > > 
> > > > 
> > > > --
> > > > Cheers,
> > > >  Lyude Paul (she/her)
> > > >  Software Engineer at Red Hat
> > > --
> > > Regards,
> > > Wayne Lin
> > > 
> > 
> > --
> > Cheers,
> >  Lyude Paul (she/her)
> >  Software Engineer at Red Hat
> Best regards,
> Wayne Lin
>
Lin, Wayne Aug. 23, 2021, 6:33 a.m. UTC | #9
[Public]

> -----Original Message-----
> From: Lyude Paul <lyude@redhat.com>
> Sent: Saturday, August 21, 2021 4:48 AM
> To: Lin, Wayne <Wayne.Lin@amd.com>; dri-devel@lists.freedesktop.org
> Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Wentland, Harry <Harry.Wentland@amd.com>; Zuo, Jerry
> <Jerry.Zuo@amd.com>; Wu, Hersen <hersenxs.wu@amd.com>; Juston Li <juston.li@intel.com>; Imre Deak <imre.deak@intel.com>;
> Ville Syrjälä <ville.syrjala@linux.intel.com>; Daniel Vetter <daniel.vetter@ffwll.ch>; Sean Paul <sean@poorly.run>; Maarten Lankhorst
> <maarten.lankhorst@linux.intel.com>; Maxime Ripard <mripard@kernel.org>; Thomas Zimmermann <tzimmermann@suse.de>;
> David Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; Deucher, Alexander <Alexander.Deucher@amd.com>; Siqueira,
> Rodrigo <Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo <Aurabindo.Pillai@amd.com>; Bas Nieuwenhuizen
> <bas@basnieuwenhuizen.nl>; Cornij, Nikola <Nikola.Cornij@amd.com>; Jani Nikula <jani.nikula@intel.com>; Manasi Navare
> <manasi.d.navare@intel.com>; Ankit Nautiyal <ankit.k.nautiyal@intel.com>; José Roberto de Souza <jose.souza@intel.com>; Sean
> Paul <seanpaul@chromium.org>; Ben Skeggs <bskeggs@redhat.com>; stable@vger.kernel.org
> Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected end device
>
> On Fri, 2021-08-20 at 11:20 +0000, Lin, Wayne wrote:
> > [Public]
> >
> > > -----Original Message-----
> > > From: Lyude Paul <lyude@redhat.com>
> > > Sent: Thursday, August 19, 2021 2:59 AM
> > > To: Lin, Wayne <Wayne.Lin@amd.com>; dri-devel@lists.freedesktop.org
> > > Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Wentland,
> > > Harry < Harry.Wentland@amd.com>; Zuo, Jerry <Jerry.Zuo@amd.com>; Wu,
> > > Hersen <hersenxs.wu@amd.com>; Juston Li < juston.li@intel.com>; Imre
> > > Deak <imre.deak@intel.com>; Ville Syrjälä
> > > <ville.syrjala@linux.intel.com>; Daniel Vetter <
> > > daniel.vetter@ffwll.ch>; Sean Paul <sean@poorly.run>; Maarten
> > > Lankhorst <maarten.lankhorst@linux.intel.com>; Maxime Ripard
> > > <mripard@kernel.org>; Thomas Zimmermann <tzimmermann@suse.de>; David
> > > Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; Deucher,
> > > Alexander <Alexander.Deucher@amd.com>; Siqueira, Rodrigo
> > > <Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo <
> > > Aurabindo.Pillai@amd.com>; Eryk Brol <eryk.brol@amd.com>; Bas
> > > Nieuwenhuizen <bas@basnieuwenhuizen.nl>; Cornij, Nikola <
> > > Nikola.Cornij@amd.com>; Jani Nikula <jani.nikula@intel.com>; Manasi
> > > Navare <manasi.d.navare@intel.com>; Ankit Nautiyal <
> > > ankit.k.nautiyal@intel.com>; José Roberto de Souza
> > > <jose.souza@intel.com>; Sean Paul <seanpaul@chromium.org>; Ben
> > > Skeggs <bskeggs@redhat.com>; stable@vger.kernel.org
> > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for
> > > connected end device
> > >
> > > On Wed, 2021-08-11 at 09:49 +0000, Lin, Wayne wrote:
> > > > [Public]
> > > >
> > > > > -----Original Message-----
> > > > > From: Lyude Paul <lyude@redhat.com>
> > > > > Sent: Wednesday, August 11, 2021 4:45 AM
> > > > > To: Lin, Wayne <Wayne.Lin@amd.com>;
> > > > > dri-devel@lists.freedesktop.org
> > > > > Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>;
> > > > > Wentland, Harry < Harry.Wentland@amd.com>; Zuo, Jerry
> > > > > <Jerry.Zuo@amd.com>; Wu, Hersen <hersenxs.wu@amd.com>; Juston Li
> > > > > < juston.li@intel.com>; Imre Deak <imre.deak@intel.com>; Ville
> > > > > Syrjälä <ville.syrjala@linux.intel.com>; Daniel Vetter <
> > > > > daniel.vetter@ffwll.ch>; Sean Paul <sean@poorly.run>; Maarten
> > > > > Lankhorst <maarten.lankhorst@linux.intel.com>; Maxime Ripard
> > > > > <mripard@kernel.org>; Thomas Zimmermann <tzimmermann@suse.de>;
> > > > > David Airlie <airlied@linux.ie>; Daniel Vetter
> > > > > <daniel@ffwll.ch>; Deucher, Alexander
> > > > > <Alexander.Deucher@amd.com>; Siqueira, Rodrigo
> > > > > <Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo <
> > > > > Aurabindo.Pillai@amd.com>; Eryk Brol <eryk.brol@amd.com>; Bas
> > > > > Nieuwenhuizen <bas@basnieuwenhuizen.nl>; Cornij, Nikola <
> > > > > Nikola.Cornij@amd.com>; Jani Nikula <jani.nikula@intel.com>;
> > > > > Manasi Navare <manasi.d.navare@intel.com>; Ankit Nautiyal <
> > > > > ankit.k.nautiyal@intel.com>; José Roberto de Souza
> > > > > <jose.souza@intel.com>; Sean Paul <seanpaul@chromium.org>; Ben
> > > > > Skeggs <bskeggs@redhat.com>; stable@vger.kernel.org
> > > > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for
> > > > > connected end device
> > > > >
> > > > > On Wed, 2021-08-04 at 07:13 +0000, Lin, Wayne wrote:
> > > > > > [Public]
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Lyude Paul <lyude@redhat.com>
> > > > > > > Sent: Wednesday, August 4, 2021 8:09 AM
> > > > > > > To: Lin, Wayne <Wayne.Lin@amd.com>;
> > > > > > > dri-devel@lists.freedesktop.org
> > > > > > > Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>;
> > > > > > > Wentland, Harry < Harry.Wentland@amd.com>; Zuo, Jerry
> > > > > > > <Jerry.Zuo@amd.com>; Wu, Hersen <hersenxs.wu@amd.com>;
> > > > > > > Juston Li < juston.li@intel.com>; Imre Deak
> > > > > > > <imre.deak@intel.com>; Ville Syrjälä
> > > > > > > <ville.syrjala@linux.intel.com>; Wentland, Harry <
> > > > > > > Harry.Wentland@amd.com>; Daniel Vetter
> > > > > > > <daniel.vetter@ffwll.ch>; Sean Paul <sean@poorly.run>;
> > > > > > > Maarten Lankhorst < maarten.lankhorst@linux.intel.com>;
> > > > > > > Maxime Ripard <mripard@kernel.org>; Thomas Zimmermann
> > > > > > > <tzimmermann@suse.de>; David Airlie <airlied@linux.ie>;
> > > > > > > Daniel Vetter <daniel@ffwll.ch>; Deucher, Alexander
> > > > > > > <Alexander.Deucher@amd.com>; Siqueira, Rodrigo <
> > > > > > > Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo
> > > > > > > <Aurabindo.Pillai@amd.com>; Eryk Brol <eryk.brol@amd.com>;
> > > > > > > Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>; Cornij, Nikola
> > > > > > > <Nikola.Cornij@amd.com>; Jani Nikula
> > > > > > > <jani.nikula@intel.com>; Manasi Navare
> > > > > > > <manasi.d.navare@intel.com>; Ankit Nautiyal
> > > > > > > <ankit.k.nautiyal@intel.com>; José Roberto de Souza
> > > > > > > <jose.souza@intel.com>; Sean Paul <seanpaul@chromium.org>;
> > > > > > > Ben Skeggs <bskeggs@redhat.com>; stable@vger.kernel.org
> > > > > > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector
> > > > > > > for connected end device
> > > > > > >
> > > > > > > On Tue, 2021-08-03 at 19:58 -0400, Lyude Paul wrote:
> > > > > > > > On Wed, 2021-07-21 at 00:03 +0800, Wayne Lin wrote:
> > > > > > > > > [Why]
> > > > > > > > > Currently, we will create connectors for all output
> > > > > > > > > ports no matter it's connected or not. However, in MST,
> > > > > > > > > we can only determine whether an output port really
> > > > > > > > > stands for a "connector"
> > > > > > > > > till it is connected and check its peer device type as
> > > > > > > > > an end device.
> > > > > > > >
> > > > > > > > What is this commit trying to solve exactly? e.g. is AMD
> > > > > > > > currently running into issues with there being too many
> > > > > > > > DRM connectors or something like that?
> > > > > > > > Ideally this is behavior I'd very much like us to keep
> > > > > > > > as-is unless there's good reason to change it.
> > > > > > Hi Lyude,
> > > > > > Really appreciate for your time to elaborate in such detail. Thanks!
> > > > > >
> > > > > > I come up with this commit because I observed something
> > > > > > confusing when I was analyzing MST connectors' life cycle.
> > > > > > Take the topology instance you mentioned below
> > > > > >
> > > > > > Root MSTB -> Output_Port 1 -> MSTB 1.1 -> Output_Port 1 (Connected w/ display)
> > > > > >                     |                  -> Output_Port 2 (Disconnected)
> > > > > >                     -> Output_Port 2 -> MSTB 2.1 -> Output_Port 1 (Disconnected)
> > > > > >                                                  -> Output_Port 2 (Disconnected)
> > > > > >
> > > > > > Which is exactly the topology of
> > > > > > Startech DP 1-to-4 hub. There are 3 1-to-2 branch chips within
> > > > > > this hub. With our MST implementation today, we'll create drm
> > > > > > connectors for all output ports. Hence, we totally create 6
> > > > > > drm connectors here.
> > > > > > However, Output ports of Root MSTB are not connected to a
> > > > > > stream sink.
> > > > > > They are connected with branch devices.
> > > > > > Thus, creating drm connector for such port looks a bit strange
> > > > > > to me and increases complexity to tracking drm connectors.  My
> > > > > > thought is we only need to create drm connector for those
> > > > > > connected end device. Once output port is connected then we
> > > > > > can determine whether to add on a drm connector for this port
> > > > > > based on the peer device type.
> > > > > > Hence, this commit doesn't try to break the locking logic but
> > > > > > add more constraints when We try to add drm connector. Please
> > > > > > correct me if I misunderstand anything here. Thanks!
> > > > >
> > > > > Sorry-I will respond to this soon, some more stuff came up at
> > > > > work so it might take me a day or two
> > > > No worries. Much appreciated for your time!
> > > > >
> > >
> > > Alright - finally got some time to respond to this. So this change
> > > still doesn't really seem correct to me (if anyone watching this
> > > thread wants to chime in to correct me btw feel free).
> > >
> > > JFYI - I don't think the commit is trying to break anything
> > > intentionally, it's just that there's a lot of moving pieces with
> > > the locking here that are easy to trip over. That being said though,
> > > besides the locking issues after thinking about this I'm still a bit
> > > skeptical on how much this would work or even if we would want it.
> > >
> > > To start off - my main issue with this is that it sounds like we're
> > > basically entirely getting rid of the disconnected state for MST
> > > connectors, and then only exposing the connector when something is
> > > connected. Unless I'm missing something here, the PDT can pretty
> > > much change whenever something is connected/disconnected or across
> > > suspend/resume reprobes. To do this with the connector API would be
> > > very different from connector probing behavior for other connector
> > > types, which already seems like an issue to me. This would also
> > > break the ability to force a connector to be connected/disconnected,
> > > as there would no longer be a way to force a disconnected MST
> > > connector on.
> > >
> > > The other thing is I'm not entirely clear still on what's trying to
> > > be accomplished here. If you're trying to identify DRM connectors,
> > > there's already no guaranteed consistency with connector names which
> > > means that having less connectors doesn't really make things any
> > > easier to identify. For actually trying to figure out more details
> > > > on connectors, if this is something userspace needs, this seems like
> > > something we should just be adding in the form of connector props.
> > >
> > > With all of this being said, this ends up just seeming like we're
> > > adding potentially a lot of complexity to how we create connectors
> > > and the suspend/resume reprobing code. I think it'd be good to know
> > > what the precise usecase for this actually is, if this is something
> > > you still think is needed.
> > Hi Lyude,
> >
> > Really thankful for willing to explain in such details. Really appreciate.
> >
> > I'm trying to fix some problems that observed after these 2 patches
> > * 09b974e8983 drm/amd/amdgpu_dm/mst: Remove ->destroy_connector()
> > callback
> > * 72dc0f51591 drm/dp_mst: Remove
> > drm_dp_mst_topology_cbs.destroy_connector
> >
> > With above patches, we now change to remove dc_sink when connector is
> > about to be destroyed. However, we found out that connectors won't get
> > destroyed after hotplugs. Thus, after few times hotplugs, we won't
> > create any new dc_sink since number of sink is exceeding our
> > limitation. As the result of that, I'm trying to figure out why the
> > refcount of connectors won't get zero.
> >
> > Based on my analysis, I found out that if we connect a sst monitor to
> > a mst hub then connect the hub to the system, and then unplug the sst
> > monitor from the hub. E.g.
> > src - mst hub - sst monitor => src - mst hub  (unplug) sst monitor
> >
> > Within this case, we won't try to put refcount of the sst monitor.
> > Which is what I tried to resolve by [PATCH 3/4].
> > But here comes a problem which is confusing me that if I can destroy
> > connector in this case. By comparing to another case, if now mst hub
> > is connected with a mst monitor like this:
> > src - mst hub - mst monitor => src - mst hub  (unplug) mst monitor
> >
> > We will put the topology refcount of mst monitor's branching unit in
> > drm_dp_port_set_pdt() and eventually call
> > drm_dp_delayed_destroy_port() to unregister the connector of the
> > logical port. So following the same rule, I think to dynamically
> > unregister a mst connector is what we want and should be reasonable to
> > also destroy sst connectors in my case. But this conflicts the idea
> > what we have here. We want to create connectors for all output ports.
> > So if dynamically creating/destroying connectors is what we want, when
> > is the appropriate time for us to create one is what I'm considering.
> >
> > Take the StartTech hub DP 1to4 DP output ports for instance. This hub,
> > internally, is constructed by  3 1-to-2 mst branch chips. 2 output
> > ports of 1st chip are hardwired to another 2 chips. It's how it makes
> > it to support 1-to-4 mst branching. So within this case, the internal
> > 2 output ports of 1st chip is not connecting to a stream sink and will
> > never get connected to one.  Thus, I'm thinking maybe the best timing
> > to attach a connector to a port is when the port is connected, and the
> > connected PDT is determined as a stream sink.
> >
> > Sorry if I misunderstand anything here and really thanks for your time
> > to shed light on this : ) Thanks Lyude.
>
> It's no problem, it is my job after all! Sorry for how long my responses have been taking, but my plate seems to be finally clearing up
> for the foreseeable future.
>
> That being said - it sounds like with this we still aren't actually clear on where the topology refcount leak is happening - only when it's
> happening, which says to me that's the issue we really need to be figuring out the cause of as opposed to trying to workaround it.
>
> Actually - refcount leaks is an issue I've ran into a number of times before in the past, so a while back I actually added some nice
> debugging features to assist with debugging leaks. If you enable the following options in your kernel config:
>
> CONFIG_EXPERT=y # This must be set first before the next option
> CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS=y
>
> Unfortunately, I'm suddenly realizing after typing this that apparently I never bothered adding a way for us to debug the refcounts of
> ports/mstbs that haven't been released yet - only the ones for ones that have. This shouldn't be difficult at all for me to add, so I'll
> send you a patch either today or at the start of next week to try debugging with using this, and then we can figure out where this leak
> is really coming from.

Thanks Lyude!

Sorry to bother you, but I would like to clarify this again. So it sounds like you also agree that we should destroy the associated
connector when we unplug the sst monitor from a mst hub in the case that I described? In that case (unplugging the sst monitor), we only
receive a CSN from the hub notifying us that the connection status of one of its downstream output ports has changed to disconnected.
There is no topology refcount that needs to be decreased on this disconnected port, only the malloc refcount. Since the output port is
still declared by the mst hub, I think we shouldn't destroy the port. Actually, I think no ports or mst branch devices should get
destroyed in this case. The result of LINK_ADDRESS is the same before and after removing the sst monitor, except for the
DisplayPort_Device_Plug_Status/Legacy_Device_Plug_Status fields.

Hence, if you agree that we should put refcount of the connector of the disconnected port within the unplugging sst monitor case to
release the allocated resource, it means we don't want to create connectors for those disconnected ports. Which conflicts current flow
to create connectors for all declared output ports.

Thanks again for your time Lyude!
>
> > >
> > > > > > > >
> > > > > > > > Some context here btw - there's a lot of subtleties with
> > > > > > > > MST locking that isn't immediately obvious. It's been a
> > > > > > > > while since I wrote this code, but if I recall correctly
> > > > > > > > one of those subtleties is that trying to create/destroy
> > > > > > > > connectors on the fly when ports change types introduces a
> > > > > > > > lot of potential issues with locking and some very
> > > > > > > > complicated state transitions. Note that because we
> > > > > > > > maintain the topology as much as possible across
> > > > > > > > suspend/resumes this means there's a lot of potential
> > > > > > > > state transitions with drm_dp_mst_port and
> > > > > > > > drm_dp_mst_branch we need to handle that would typically be impossible to run into otherwise.
> > > > > > > >
> > > > > > > > An example of this, if we were to try to prune connectors
> > > > > > > > based on PDT on the fly: assume we have a simple topology
> > > > > > > > like this
> > > > > > > >
> > > > > > > > Root MSTB -> Port 1 -> MSTB 1.1 (Connected w/ display)
> > > > > > > >           -> Port 2 -> MSTB 2.1
> > > > > > > >
> > > > > > > > We suspend the system, unplug MSTB 1.1, and then resume.
> > > > > > > > Once the system starts reprobing, it will notice that MSTB
> > > > > > > > 1.1 has been disconnected. Since we no longer have a PDT,
> > > > > > > > we decide to unregister our connector. But there's a
> > > > > > > > catch! We had a display connected to MSTB 1.1, so even
> > > > > > > > after unregistering the connector it's going to stay
> > > > > > > > around until userspace has committed a new mode with the connector disabled.
> > > > > > > >
> > > > > > > > Now - assuming we're still in the same spot in the resume
> > > > > > > > > process, let's assume somehow MSTB 1.1 is suddenly
> > > > > > > > plugged back in. Once we've finished responding to the
> > > > > > > > hotplug event, we will have created a connector for it.
> > > > > > > > Now we've hit a bug - userspace hasn't removed the
> > > > > > > > previous zombie connector which means we have references
> > > > > > > > to the drm_dp_mst_port in our atomic state and potentially
> > > > > > > > also our payload tables (?? unsure about this one).
> > > > > > >
> > > > > > > Whoops. One thing I totally forgot to mention here: the
> > > > > > > reason this is a problem is because we'd now have two
> > > > > > > drm_connectors which both have the same drm_dp_mst_port pointer.
> > > > > > >
> > > > > > > >
> > > > > > > > So then how do we manage to add/remove connectors for
> > > > > > > > input connectors on the fly? Well, that's one of the fun
> > > > > > > > normally-impossible state transitions I mentioned before.
> > > > > > > > According to the spec input ports are always disconnected,
> > > > > > > > so we'll never receive a CSN for them. This means
> > > > > > I think input ports' DisplayPort_Device_Plug_Status field is
> > > > > > still set to 1?
> > > > > > > But yes,
> > > > > > > according to DP1.4 spec 2.11.9.3, an MST device whose DPRX
> > > > > > > detected the connection status change shall broadcast the CSN
> > > > > > > downstream only.
> > > > > > > Hence, we'll never receive a CSN for this case.
> > > > > > > > in theory the only possible way we could have a connector
> > > > > > > > > go from being an input connector to an output connector
> > > > > > > > > would be if the entire topology was swapped out
> > > > > > > > > during suspend/resume, and the input/output ports in the
> > > > > > > > > two topologies happen to be in different places.
> > > > > > > > Since we only have to reprobe once during resume before we
> > > > > > > > get hotplugging enabled, we're guaranteed this state
> > > > > > > > transition will only happen once in this state - which
> > > > > > > > means the second replug I described in the previous paragraph can never happen.
> > > > > > > >
> > > > > > > > Note that while I don't actually know if there's
> > > > > > > > topologies with input ports at indexes other than 0, since
> > > > > > > > the specification isn't super clear on this bit we play it
> > > > > > > > safe and assume it is possible.
> > > > > > > Based on DP1.4 spec 2.5.1, physical input ports are assigned
> > > > > > > smaller port numbers than physical output ports. For a
> > > > > > > concentrator product, if its branch device has 2 input ports,
> > > > > > > then their port numbers are port 0 & port 1; see figure 2-122
> > > > > > > of DP1.4.
> > > > > > > >
> > > > > > > > Anyway-this is -all- based off my memory, so please point
> > > > > > > > out anything here that I've explained that doesn't make
> > > > > > > > sense or doesn't seem correct :). It's totally possible I
> > > > > > > > might have misremembered something.
> > > > > > Thanks again Lyude! Much appreciated for your time and help!
> > > > > > And please correct me if I misunderstand anything here : )
> > > > > > > >
> > > > > > > > >
> > > > > > > > > In current code, we have chance to create connectors for
> > > > > > > > > output ports connected with branch device and these are
> > > > > > > > > redundant connectors.
> > > > > > > > > > e.g. StarTech 1-to-4 DP hub is constructed by internal 2 layer
> > > > > > > > > > 1-to-2 branch devices. Creating connectors for such
> > > > > > > > > > internal output ports are redundant.
> > > > > > > > >
> > > > > > > > > [How]
> > > > > > > > > Put constraint on creating connector for connected end
> > > > > > > > > device only.
> > > > > > > > >
> > > > > > > > > > Fixes: 6f85f73821f6 ("drm/dp_mst: Add basic topology reprobing when resuming")
> > > > > > > > > Cc: Juston Li <juston.li@intel.com>
> > > > > > > > > Cc: Imre Deak <imre.deak@intel.com>
> > > > > > > > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > > > > > > Cc: Harry Wentland <hwentlan@amd.com>
> > > > > > > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > > > > > > Cc: Sean Paul <sean@poorly.run>
> > > > > > > > > Cc: Lyude Paul <lyude@redhat.com>
> > > > > > > > > Cc: Maarten Lankhorst
> > > > > > > > > <maarten.lankhorst@linux.intel.com>
> > > > > > > > > Cc: Maxime Ripard <mripard@kernel.org>
> > > > > > > > > Cc: Thomas Zimmermann <tzimmermann@suse.de>
> > > > > > > > > Cc: David Airlie <airlied@linux.ie>
> > > > > > > > > Cc: Daniel Vetter <daniel@ffwll.ch>
> > > > > > > > > Cc: Alex Deucher <alexander.deucher@amd.com>
> > > > > > > > > Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> > > > > > > > > Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
> > > > > > > > > Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
> > > > > > > > > Cc: Eryk Brol <eryk.brol@amd.com>
> > > > > > > > > Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
> > > > > > > > > Cc: Nikola Cornij <nikola.cornij@amd.com>
> > > > > > > > > Cc: Wayne Lin <Wayne.Lin@amd.com>
> > > > > > > > > Cc: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
> > > > > > > > > Cc: Jani Nikula <jani.nikula@intel.com>
> > > > > > > > > Cc: Manasi Navare <manasi.d.navare@intel.com>
> > > > > > > > > Cc: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
> > > > > > > > > Cc: "José Roberto de Souza" <jose.souza@intel.com>
> > > > > > > > > Cc: Sean Paul <seanpaul@chromium.org>
> > > > > > > > > Cc: Ben Skeggs <bskeggs@redhat.com>
> > > > > > > > > Cc: dri-devel@lists.freedesktop.org
> > > > > > > > > Cc: <stable@vger.kernel.org> # v5.5+
> > > > > > > > > Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
> > > > > > > > > ---
> > > > > > > > >  drivers/gpu/drm/drm_dp_mst_topology.c | 7 ++++++-
> > > > > > > > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > > > > > > >
> > > > > > > > > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > > > > > > > index 51cd7f74f026..f13c7187b07f 100644
> > > > > > > > > > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > > > > > > > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > > > > > > > > @@ -2474,7 +2474,8 @@ drm_dp_mst_handle_link_address_port(struct drm_dp_mst_branch *mstb,
> > > > > > > > > >
> > > > > > > > > >         if (port->connector)
> > > > > > > > > >                 drm_modeset_unlock(&mgr->base.lock);
> > > > > > > > > > -       else if (!port->input)
> > > > > > > > > > +       else if (!port->input && port->pdt != DP_PEER_DEVICE_NONE &&
> > > > > > > > > > +                drm_dp_mst_is_end_device(port->pdt, port->mcs))
> > > > > > > > > >                 drm_dp_mst_port_add_connector(mstb, port);
> > > > > > > > > >
> > > > > > > > > >         if (send_link_addr && port->mstb) {
> > > > > > > > > > @@ -2557,6 +2558,10 @@ drm_dp_mst_handle_conn_stat(struct drm_dp_mst_branch *mstb,
> > > > > > > > > >                 dowork = false;
> > > > > > > > > >         }
> > > > > > > > > >
> > > > > > > > > > +       if (!port->input && !port->connector && new_pdt != DP_PEER_DEVICE_NONE &&
> > > > > > > > > > +           drm_dp_mst_is_end_device(new_pdt, new_mcs))
> > > > > > > > > > +               create_connector = true;
> > > > > > > > > > +
> > > > > > > > > >         if (port->connector)
> > > > > > > > > >                 drm_modeset_unlock(&mgr->base.lock);
> > > > > > > > > >         else if (create_connector)
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Cheers,
> > > > > > >  Lyude Paul (she/her)
> > > > > > >  Software Engineer at Red Hat
> > > > > > Regards,
> > > > > > Wayne Lin
> > > > > >
> > > > >
> > > > > --
> > > > > Cheers,
> > > > >  Lyude Paul (she/her)
> > > > >  Software Engineer at Red Hat
> > > > --
> > > > Regards,
> > > > Wayne Lin
> > > >
> > >
> > > --
> > > Cheers,
> > >  Lyude Paul (she/her)
> > >  Software Engineer at Red Hat
> > Best regards,
> > Wayne Lin
> >
>
> --
> Cheers,
>  Lyude Paul (she/her)
>  Software Engineer at Red Hat
---
Regards,
Wayne Lin
Lyude Paul Aug. 23, 2021, 9:18 p.m. UTC | #10
[snip]

I think I might still be misunderstanding something, some comments below

On Mon, 2021-08-23 at 06:33 +0000, Lin, Wayne wrote:
> > > Hi Lyude,
> > > 
> > > Really thankful for willing to explain in such details. Really
> > > appreciate.
> > > 
> > > I'm trying to fix some problems that observed after these 2 patches
> > > * 09b974e8983 drm/amd/amdgpu_dm/mst: Remove ->destroy_connector()
> > > callback
> > > * 72dc0f51591 drm/dp_mst: Remove
> > > drm_dp_mst_topology_cbs.destroy_connector
> > > 
> > > With above patches, we now change to remove dc_sink when connector is
> > > about to be destroyed. However, we found out that connectors won't get
> > > destroyed after hotplugs. Thus, after few times hotplugs, we won't
> > > create any new dc_sink since number of sink is exceeding our
> > > limitation. As the result of that, I'm trying to figure out why the
> > > refcount of connectors won't get zero.
> > > 
> > > Based on my analysis, I found out that if we connect a sst monitor to
> > > a mst hub then connect the hub to the system, and then unplug the sst
> > > monitor from the hub. E.g.
> > > src - mst hub - sst monitor => src - mst hub  (unplug) sst monitor
> > > 
> > > Within this case, we won't try to put refcount of the sst monitor.
> > > Which is what I tried to resolve by [PATCH 3/4].
> > > But here comes a problem which is confusing me that if I can destroy
> > > connector in this case. By comparing to another case, if now mst hub
> > > is connected with a mst monitor like this:
> > > src - mst hub - mst monitor => src - mst hub  (unplug) mst monitor
> > > 
> > > We will put the topology refcount of mst monitor's branching unit in
> > > and
> > > drm_dp_port_set_pdt() and eventually call
> > > drm_dp_delayed_destroy_port() to unregister the connector of the
> > > logical port. So following the same rule, I think to dynamically
> > > unregister a mst connector is what we want and should be reasonable to
> > > also destroy sst connectors in my case. But this conflicts the idea
> > > what we have here. We want to create connectors for all output ports.
> > > So if dynamically creating/destroying connectors is what we want, when
> > > is the appropriate time for us to create one is what I'm considering.
> > > 
> > > Take the StartTech hub DP 1to4 DP output ports for instance. This hub,
> > > internally, is constructed by  3 1-to-2 mst branch chips. 2 output
> > > ports of 1st chip are hardwired to another 2 chips. It's how it makes
> > > it to support 1-to-4 mst branching. So within this case, the internal
> > > 2 output ports of 1st chip is not connecting to a stream sink and will
> > > never get connected to one.  Thus, I'm thinking maybe the best timing
> > > to attach a connector to a port is when the port is connected, and the
> > > connected PDT is determined as a stream sink.
> > > 
> > > Sorry if I misunderstand anything here and really thanks for your time
> > > to shed light on this : ) Thanks Lyude.
> > 
> > It's no problem, it is my job after all! Sorry for how long my responses
> > have been taking, but my plate seems to be finally clearing up
> > for the foreseeable future.
> > 
> > That being said - it sounds like with this we still aren't actually clear
> > on where the topology refcount leak is happening - only when it's
> > happening, which says to me that's the issue we really need to be figuring
> > out the cause of as opposed to trying to workaround it.
> > 
> > Actually - refcount leaks is an issue I've ran into a number of times
> > before in the past, so a while back I actually added some nice
> > debugging features to assist with debugging leaks. If you enable the
> > following options in your kernel config:
> > 
> > CONFIG_EXPERT=y # This must be set first before the next option
> > CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS=y
> > 
> > Unfortunately, I'm suddenly realizing after typing this that apparently I
> > never bothered adding a way for us to debug the refcounts of
> > ports/mstbs that haven't been released yet - only the ones for ones that
> > have. This shouldn't be difficult at all for me to add, so I'll
> > send you a patch either today or at the start of next week to try
> > debugging with using this, and then we can figure out where this leak
> > is really coming from.
> 
> Thanks Lyude!
> 
> Sorry to bother you, but I would like to clarify this again.  So it sounds

It's no problem! It's my job and I'm happy to help :).

> like you also agree that we should destroy associated connector

Not quite. I think a better way of explaining this might be to point out
that the lifetime of an MST port and its connector isn't supposed to be
determined by whether or not it has something plugged into it - its
lifetime is supposed to depend on whether there's a valid path from us
down the MST topology to the port we're trying to reach. So an MSTB with
ports that is unplugged would destroy all of its ports - but an
unplugged port should just be the same as a disconnected DRM connector -
even if the port itself is just hosting a branching device.
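
To illustrate the distinction in the paragraph above, here is a minimal
user-space sketch. It is not kernel code: the lt_* types and functions
are hypothetical names made up for this example, and the model assumes
a port "exists" exactly when every upstream link back to the root
branch is still plugged.

```c
#include <stdbool.h>
#include <stddef.h>

struct lt_mstb;

/* Hypothetical stand-in for an MST port (not struct drm_dp_mst_port). */
struct lt_port {
	struct lt_mstb *parent;  /* branch device that declares this port */
	bool plugged;            /* something is attached downstream */
};

/* Hypothetical stand-in for an MST branch device. */
struct lt_mstb {
	struct lt_port *port_parent;  /* NULL for the root branch */
};

/*
 * Lifetime: the port stays in the topology as long as there is a valid
 * path from the source down to the branch that declares it.
 */
bool lt_port_reachable(const struct lt_port *port)
{
	const struct lt_mstb *mstb = port->parent;

	while (mstb->port_parent) {
		if (!mstb->port_parent->plugged)
			return false;
		mstb = mstb->port_parent->parent;
	}
	return true;
}

/*
 * Connection status: a reachable port with nothing plugged in behaves
 * like a disconnected connector; it is not destroyed.
 */
bool lt_connector_connected(const struct lt_port *port)
{
	return lt_port_reachable(port) && port->plugged;
}
```

In this model, unplugging a display only flips lt_connector_connected()
for its port, while unplugging a branch device makes every port below
it unreachable.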

Additionally - we don't want to try "delaying" connector creation
either. In the modern world hotplugging is almost always reliable in
normal situations, but even so there's still use cases for wanting force
probing for analog devices on DP converters and just in general as it's
a feature commonly used by developers or users working around monitors
with problematic HPD issues or EDID issues.

> when we unplug sst monitor from a mst hub in the case that I described? In
> the case I described (unplug sst monitor), we only receive
> CSN from the hub that notifying us the connection status of one of its
> downstream output ports is changed to disconnected. There is no
> topology refcount needed to be decreased on this disconnected port but the
> malloc refcount. Since the output port is still declared by

Apologies - I misunderstood your original mail as implying that topology
refcounts were being leaked - but it sounds like it's actually malloc
refcounts being leaked instead? In any case - that means we're still
tracing down a leak, just a malloc ref leak.

But, this still doesn't totally make sense to me. Malloc refs only keep
the actual drm_dp_mst_port/drm_dp_mst_branch struct alive in memory.
Nothing else is kept around, meaning the DRM connector (and I assume by
proxy, the dc_sink) should both be getting dropped still and the only
thing that should be leaked is a memory allocation. These things should
instead be dropped once there's no longer any topology references
around. So, are we _sure_ that the problem here is a missing
drm_dp_mst_port_put_malloc() or drm_dp_mst_mstb_put_malloc()?
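
As a rough illustration of why a leaked malloc ref alone should not
keep a connector around, here is a minimal user-space sketch of the
two-refcount scheme. The rc_* names are hypothetical and the model is
simplified (plain ints instead of krefs, no locking, no delayed
destroy work); it only assumes that the last topology ref tears down
the connector and then drops one malloc ref.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a port carrying both reference counts. */
struct rc_port {
	int topology_refs;          /* port is a live member of the topology */
	int malloc_refs;            /* struct memory may still be dereferenced */
	bool connector_registered;
	bool *freed_flag;           /* set when the struct memory is released */
};

struct rc_port *rc_port_new(bool *freed_flag)
{
	struct rc_port *port = calloc(1, sizeof(*port));

	if (!port)
		return NULL;
	port->topology_refs = 1;
	port->malloc_refs = 1;  /* held on behalf of the topology ref */
	port->connector_registered = true;
	port->freed_flag = freed_flag;
	*freed_flag = false;
	return port;
}

void rc_port_get_malloc(struct rc_port *port)
{
	assert(port->malloc_refs > 0);
	port->malloc_refs++;
}

/* Dropping the last malloc ref only frees the memory. */
void rc_port_put_malloc(struct rc_port *port)
{
	assert(port->malloc_refs > 0);
	if (--port->malloc_refs == 0) {
		*port->freed_flag = true;
		free(port);
	}
}

/* Dropping the last topology ref tears down the connector first. */
void rc_port_put_topology(struct rc_port *port)
{
	assert(port->topology_refs > 0);
	if (--port->topology_refs == 0) {
		port->connector_registered = false;
		rc_port_put_malloc(port);
	}
}
```

With an extra rc_port_get_malloc() that is never balanced, the
connector still goes away on the last rc_port_put_topology(); only the
allocation stays behind.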

If we are, unfortunately we don't have equivalent tools for malloc()
tracing. I'm totally fine with trying to add some if we have trouble
figuring out this issue, but I'm a bit suspicious of the commits you
mentioned that introduced this problem. If the problem doesn't
happen until those two commits, then it's something in the code changes
there that is causing this problem.

The main thing I'm suspicious of just from looking at changes in
09b974e8983a4b163d4a406b46d50bf869da3073 is that the call to
amdgpu_dm_update_freesync_caps() that was previously in
dm_dp_destroy_mst_connector() appears to be dropped and not re-added in
(oh dear, this is a /very/ confusingly similar function name!!!)
dm_dp_mst_connector_destroy(). I don't remember if this was intentional
on my part, but does adding a call back to
amdgpu_dm_update_freesync_caps() into dm_dp_destroy_mst_connector()
right before the dc_link_remove_remote_sink() call fix anything?

As well, I'm far less suspicious of this one but does re-adding this
hunk:

	aconnector->dc_sink = NULL;
	aconnector->dc_link->cur_link_settings.lane_count = 0;

After dc_sink_release() fix anything either?

> the mst hub,  I think we shouldn't destroy the port. Actually, no ports nor
> mst branch devices should get destroyed in this case I think.
> The result of LINK_ADDRESS is still the same before/after removing the sst
> monitor except the
> DisplayPort_Device_Plug_Status/ Legacy_Device_Plug_Status.
> 
> Hence, if you agree that we should put refcount of the connector of the
> disconnected port within the unplugging sst monitor case to
> release the allocated resource, it means we don't want to create connectors
> for those disconnected ports. Which conflicts current flow
> to create connectors for all declared output ports.
> 
> Thanks again for your time Lyude!
Lin, Wayne Aug. 25, 2021, 3:35 a.m. UTC | #11
[Public]

> -----Original Message-----
> From: Lyude Paul <lyude@redhat.com>
> Sent: Tuesday, August 24, 2021 5:18 AM
> To: Lin, Wayne <Wayne.Lin@amd.com>; dri-devel@lists.freedesktop.org
> Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Wentland, Harry <Harry.Wentland@amd.com>; Zuo, Jerry
> <Jerry.Zuo@amd.com>; Wu, Hersen <hersenxs.wu@amd.com>; Juston Li <juston.li@intel.com>; Imre Deak <imre.deak@intel.com>;
> Ville Syrjälä <ville.syrjala@linux.intel.com>; Daniel Vetter <daniel.vetter@ffwll.ch>; Sean Paul <sean@poorly.run>; Maarten Lankhorst
> <maarten.lankhorst@linux.intel.com>; Maxime Ripard <mripard@kernel.org>; Thomas Zimmermann <tzimmermann@suse.de>;
> David Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; Deucher, Alexander <Alexander.Deucher@amd.com>; Siqueira,
> Rodrigo <Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo <Aurabindo.Pillai@amd.com>; Bas Nieuwenhuizen
> <bas@basnieuwenhuizen.nl>; Cornij, Nikola <Nikola.Cornij@amd.com>; Jani Nikula <jani.nikula@intel.com>; Manasi Navare
> <manasi.d.navare@intel.com>; Ankit Nautiyal <ankit.k.nautiyal@intel.com>; José Roberto de Souza <jose.souza@intel.com>; Sean
> Paul <seanpaul@chromium.org>; Ben Skeggs <bskeggs@redhat.com>; stable@vger.kernel.org
> Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected end device
>
> [snip]
>
> I think I might still be misunderstanding something, some comments below
>
> On Mon, 2021-08-23 at 06:33 +0000, Lin, Wayne wrote:
> > > > Hi Lyude,
> > > >
> > > > Really thankful for willing to explain in such details. Really
> > > > appreciate.
> > > >
> > > > I'm trying to fix some problems that observed after these 2
> > > > patches
> > > > * 09b974e8983 drm/amd/amdgpu_dm/mst: Remove ->destroy_connector()
> > > > callback
> > > > * 72dc0f51591 drm/dp_mst: Remove
> > > > drm_dp_mst_topology_cbs.destroy_connector
> > > >
> > > > With above patches, we now change to remove dc_sink when connector
> > > > is about to be destroyed. However, we found out that connectors
> > > > won't get destroyed after hotplugs. Thus, after few times
> > > > hotplugs, we won't create any new dc_sink since number of sink is
> > > > exceeding our limitation. As the result of that, I'm trying to
> > > > figure out why the refcount of connectors won't get zero.
> > > >
> > > > Based on my analysis, I found out that if we connect a sst monitor
> > > > to a mst hub then connect the hub to the system, and then unplug
> > > > the sst monitor from the hub. E.g.
> > > > src - mst hub - sst monitor => src - mst hub  (unplug) sst monitor
> > > >
> > > > Within this case, we won't try to put refcount of the sst monitor.
> > > > Which is what I tried to resolve by [PATCH 3/4].
> > > > But here comes a problem which is confusing me that if I can
> > > > destroy connector in this case. By comparing to another case, if
> > > > now mst hub is connected with a mst monitor like this:
> > > > src - mst hub - mst monitor => src - mst hub  (unplug) mst monitor
> > > >
> > > > We will put the topology refcount of mst monitor's branching unit
> > > > in and
> > > > drm_dp_port_set_pdt() and eventually call
> > > > drm_dp_delayed_destroy_port() to unregister the connector of the
> > > > logical port. So following the same rule, I think to dynamically
> > > > unregister a mst connector is what we want and should be
> > > > reasonable to also destroy sst connectors in my case. But this
> > > > conflicts the idea what we have here. We want to create connectors for all output ports.
> > > > So if dynamically creating/destroying connectors is what we want,
> > > > when is the appropriate time for us to create one is what I'm considering.
> > > >
> > > > Take the StartTech hub DP 1to4 DP output ports for instance. This
> > > > hub, internally, is constructed by  3 1-to-2 mst branch chips. 2
> > > > output ports of 1st chip are hardwired to another 2 chips. It's
> > > > how it makes it to support 1-to-4 mst branching. So within this
> > > > case, the internal
> > > > 2 output ports of 1st chip is not connecting to a stream sink and
> > > > will never get connected to one.  Thus, I'm thinking maybe the
> > > > best timing to attach a connector to a port is when the port is
> > > > connected, and the connected PDT is determined as a stream sink.
> > > >
> > > > Sorry if I misunderstand anything here and really thanks for your
> > > > time to shed light on this : ) Thanks Lyude.
> > >
> > > It's no problem, it is my job after all! Sorry for how long my
> > > responses have been taking, but my plate seems to be finally
> > > clearing up for the foreseeable future.
> > >
> > > That being said - it sounds like with this we still aren't actually
> > > clear on where the topology refcount leak is happening - only when
> > > it's happening, which says to me that's the issue we really need to
> > > be figuring out the cause of as opposed to trying to workaround it.
> > >
> > > Actually - refcount leaks is an issue I've ran into a number of
> > > times before in the past, so a while back I actually added some nice
> > > debugging features to assist with debugging leaks. If you enable the
> > > following options in your kernel config:
> > >
> > > CONFIG_EXPERT=y # This must be set first before the next option
> > > CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS=y
> > >
> > > Unfortunately, I'm suddenly realizing after typing this that
> > > > apparently I never bothered adding a way for us to debug the
> > > > refcounts of ports/mstbs that haven't been released yet - only the
> > > ones for ones that have. This shouldn't be difficult at all for me
> > > to add, so I'll send you a patch either today or at the start of
> > > next week to try debugging with using this, and then we can figure
> > > out where this leak is really coming from.
> >
> > Thanks Lyude!
> >
> > Sorry to bother you, but I would like to clarify this again.  So it
> > sounds
>
> It's no problem! It's my job and I'm happy to help :).

Thanks!
I would like to learn more from you as below : p
>
> > like you also agree that we should destroy associated connector
>
> Not quite. I think a better way of explaining this might be to point out that the lifetime of an MST port and its connector isn't supposed
> to be determined by whether or not it has something plugged into it - its lifetime is supposed to depend on whether there's a valid
> path from us down the MST topology to the port we're trying to reach. So an MSTB with ports that is unplugged would destroy all of
> its ports - but an unplugged port should just be the same as a disconnected DRM connector - even if the port itself is just hosting a
> branching device.

This is the part that is a bit difficult for me. I treat a DRM connector as the place where we associate with a stream sink. So if the statement
is "All DP mst output ports are places where we connect a stream sink", I would say it is false, since there is a counterexample: an
output port connected to an mst branch device. Thus, it looks like we can only determine whether to create a connector for an output
port once the peer device type is known?
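
To make the condition under discussion concrete, here is a minimal user-space sketch of the check that patch 2/4 proposes. It is only a
model: the pdt_* names and struct pdt_port are hypothetical stand-ins, the enum values are the DP peer device type encodings as I
understand them, and pdt_is_end_device() is my reading of what drm_dp_mst_is_end_device() is meant to decide, not a copy of it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Peer device types as reported in LINK_ADDRESS replies and CSNs. */
enum pdt_type {
	PDT_NONE           = 0x0,
	PDT_SOURCE_OR_SST  = 0x1,
	PDT_MST_BRANCHING  = 0x2,
	PDT_SST_SINK       = 0x3,
	PDT_DP_LEGACY_CONV = 0x4,
};

/* Hypothetical, cut-down stand-in for struct drm_dp_mst_port. */
struct pdt_port {
	bool input;
	bool has_connector;
	uint8_t pdt;
	bool mcs;  /* peer can handle sideband messages */
};

/* Assumption: a branching peer without message capability is an end device. */
bool pdt_is_end_device(uint8_t pdt, bool mcs)
{
	switch (pdt) {
	case PDT_SST_SINK:
	case PDT_DP_LEGACY_CONV:
		return true;
	case PDT_MST_BRANCHING:
		return !mcs;
	default:
		return true;
	}
}

/* The proposed rule: only connected end devices on output ports get a connector. */
bool pdt_should_create_connector(const struct pdt_port *port)
{
	return !port->input && !port->has_connector &&
	       port->pdt != PDT_NONE &&
	       pdt_is_end_device(port->pdt, port->mcs);
}
```

Under this rule, the two internal output ports of the first chip in the 1-to-4 hub (PDT_MST_BRANCHING with mcs set) and any unplugged
output port (PDT_NONE) would get no connector, which is exactly the behaviour being debated here.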
>
> Additionally - we don't want to try "delaying" connector creation either. In the modern world hotplugging is almost always reliable in
> normal situations, but even so there's still use cases for wanting force probing for analog devices on DP converters and just in general
> as it's a feature commonly used by developers or users working around monitors with problematic HPD issues or EDID issues.

I think I understand why we want to create connectors for all output ports here. But under the use cases mentioned, aren't we still
able to force a connector to enable a stream? An MST hub with multi-function capability will enumerate the connected virtual DP peer device.
For problematic HPD or EDID issues, the connection status is also connected.

My understanding of an output port is that it is an internal node that helps construct an end-to-end virtual channel between a stream source device
and a stream sink device. The idea of creating connectors for internal nodes within a virtual channel is a bit hard for me to grasp. Please correct
me if I misunderstand anything here. Thanks Lyude!
>
> > when we unplug sst monitor from a mst hub in the case that I
> > described? In the case I described (unplug sst monitor), we only
> > receive CSN from the hub that notifying us the connection status of
> > one of its downstream output ports is changed to disconnected. There
> > is no topology refcount needed to be decreased on this disconnected
> > port but the malloc refcount. Since the output port is still declared
> > by
>
> Apologies - I misunderstood your original mail as implying that topology refcounts were being leaked - but it sounds like it's actually
> malloc refcounts being leaked instead? In any case - that means we're still tracing down a leak, just a malloc ref leak.
>
> But, this still doesn't totally make sense to me. Malloc refs only keep the actual drm_dp_mst_port/drm_dp_mst_branch struct alive in
> memory.
> Nothing else is kept around, meaning the DRM connector (and I assume by proxy, the dc_sink) should both be getting dropped still
> and the only thing that should be leaked is a memory allocation. These things should instead be dropped once there's no longer any
> topology references around. So, are we _sure_ that the problem here is a missing
> drm_dp_mst_port_put_malloc() or drm_dp_mst_mstb_put_malloc()?

Just my two cents: I don't think it's a malloc ref leak either. As you said, the malloc ref deals with the last step of freeing the port/mstb.
If there is still a topology refcount on the port/mstb in my case, we won't free the port/mstb.
>
> If we are unfortunately we don't have equivalent tools for malloc() tracing. I'm totally fine with trying to add some if we have trouble
> figuring out this issue, but I'm a bit suspicious of the commits you mentioned that introduced this problem. If the problem doesn't
> happen until those two commits, then it's something in the code changes there that are causing this problem.

I think we probably also had the problem before these commits, but we didn't notice it. It was only when we changed to clean up
everything in dm_dp_mst_connector_destroy() that I started trying to figure all of this out.
>
> The main thing I'm suspicious of just from looking at changes in
> 09b974e8983a4b163d4a406b46d50bf869da3073 is that the call to
> amdgpu_dm_update_freesync_caps() that was previously in
> dm_dp_destroy_mst_connector() appears to be dropped and not re-added in (oh dear, this is a /very/ confusingly similar function

Lol. I also have a hard time with this..
> name!!!) dm_dp_mst_connector_destroy(). I don't remember if this was intentional on my part, but does adding a call back to
> amdgpu_dm_update_freesync_caps() into dm_dp_destroy_mst_connector() right before the dc_link_remove_remote_sink() call fix
> anything?
>
> As well, I'm far less suspicious of this one but does re-adding this
> hunk:
>
>       aconnector->dc_sink = NULL;
>       aconnector->dc_link->cur_link_settings.lane_count = 0;
>
> After dc_sink_release() fix anything either?

So the main problem is that we don't get a chance to call dc_link_remove_remote_sink() in the SST unplug case. We only get a chance to
remove the remote sink of a link when a mstb is unplugged.
>
> > the mst hub,  I think we shouldn't destroy the port. Actually, no
> > ports nor mst branch devices should get destroyed in this case I think.
> > The result of LINK_ADDRESS is still the same before/after removing the
> > sst monitor except the DisplayPort_Device_Plug_Status/
> > Legacy_Device_Plug_Status.
> >
> > Hence, if you agree that we should put refcount of the connector of
> > the disconnected port within the unplugging sst monitor case to
> > release the allocated resource, it means we don't want to create
> > connectors for those disconnected ports. Which conflicts current flow
> > to create connectors for all declared output ports.
> >
> > Thanks again for your time Lyude!
>
> --
> Cheers,
>  Lyude Paul (she/her)
>  Software Engineer at Red Hat
--
Regards,
Wayne
Lyude Paul Aug. 31, 2021, 10:47 p.m. UTC | #12
(I am going to try responding to this tomorrow btw. I haven't been super busy
this week, but this has been a surprisingly difficult email to respond to
because I actually need to do a deep dive into some of the MST helpers
tomorrow to figure out more of the specifics on why I realized we couldn't
just hot add/remove port->connector here).

On Wed, 2021-08-25 at 03:35 +0000, Lin, Wayne wrote:
> [Public]
> 
> > -----Original Message-----
> > From: Lyude Paul <lyude@redhat.com>
> > Sent: Tuesday, August 24, 2021 5:18 AM
> > To: Lin, Wayne <Wayne.Lin@amd.com>; dri-devel@lists.freedesktop.org
> > Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Wentland, Harry
> > <Harry.Wentland@amd.com>; Zuo, Jerry
> > <Jerry.Zuo@amd.com>; Wu, Hersen <hersenxs.wu@amd.com>; Juston Li
> > <juston.li@intel.com>; Imre Deak <imre.deak@intel.com>;
> > Ville Syrjälä <ville.syrjala@linux.intel.com>; Daniel Vetter
> > <daniel.vetter@ffwll.ch>; Sean Paul <sean@poorly.run>; Maarten Lankhorst
> > <maarten.lankhorst@linux.intel.com>; Maxime Ripard <mripard@kernel.org>;
> > Thomas Zimmermann <tzimmermann@suse.de>;
> > David Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; Deucher,
> > Alexander <Alexander.Deucher@amd.com>; Siqueira,
> > Rodrigo <Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo
> > <Aurabindo.Pillai@amd.com>; Bas Nieuwenhuizen
> > <bas@basnieuwenhuizen.nl>; Cornij, Nikola <Nikola.Cornij@amd.com>; Jani
> > Nikula <jani.nikula@intel.com>; Manasi Navare
> > <manasi.d.navare@intel.com>; Ankit Nautiyal <ankit.k.nautiyal@intel.com>;
> > José Roberto de Souza <jose.souza@intel.com>; Sean
> > Paul <seanpaul@chromium.org>; Ben Skeggs <bskeggs@redhat.com>;
> > stable@vger.kernel.org
> > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected
> > end device
> > 
> > [snip]
> > 
> > I think I might still be misunderstanding something, some comments below
> > 
> > On Mon, 2021-08-23 at 06:33 +0000, Lin, Wayne wrote:
> > > > > Hi Lyude,
> > > > > 
> > > > > Really thankful for willing to explain in such details. Really
> > > > > appreciate.
> > > > > 
> > > > > I'm trying to fix some problems that observed after these 2
> > > > > patches
> > > > > * 09b974e8983 drm/amd/amdgpu_dm/mst: Remove ->destroy_connector()
> > > > > callback
> > > > > * 72dc0f51591 drm/dp_mst: Remove
> > > > > drm_dp_mst_topology_cbs.destroy_connector
> > > > > 
> > > > > With above patches, we now change to remove dc_sink when connector
> > > > > is about to be destroyed. However, we found out that connectors
> > > > > won't get destroyed after hotplugs. Thus, after few times
> > > > > hotplugs, we won't create any new dc_sink since number of sink is
> > > > > exceeding our limitation. As the result of that, I'm trying to
> > > > > figure out why the refcount of connectors won't get zero.
> > > > > 
> > > > > Based on my analysis, I found out that if we connect a sst monitor
> > > > > to a mst hub then connect the hub to the system, and then unplug
> > > > > the sst monitor from the hub. E.g.
> > > > > src - mst hub - sst monitor => src - mst hub  (unplug) sst monitor
> > > > > 
> > > > > Within this case, we won't try to put refcount of the sst monitor.
> > > > > Which is what I tried to resolve by [PATCH 3/4].
> > > > > But here comes a problem which is confusing me that if I can
> > > > > destroy connector in this case. By comparing to another case, if
> > > > > now mst hub is connected with a mst monitor like this:
> > > > > src - mst hub - mst monitor => src - mst hub  (unplug) mst monitor
> > > > > 
> > > > > We will put the topology refcount of mst monitor's branching unit
> > > > > in and
> > > > > drm_dp_port_set_pdt() and eventually call
> > > > > drm_dp_delayed_destroy_port() to unregister the connector of the
> > > > > logical port. So following the same rule, I think to dynamically
> > > > > unregister a mst connector is what we want and should be
> > > > > reasonable to also destroy sst connectors in my case. But this
> > > > > conflicts the idea what we have here. We want to create connectors
> > > > > for all output ports.
> > > > > So if dynamically creating/destroying connectors is what we want,
> > > > > when is the appropriate time for us to create one is what I'm
> > > > > considering.
> > > > > 
> > > > > Take the StartTech hub DP 1to4 DP output ports for instance. This
> > > > > hub, internally, is constructed by  3 1-to-2 mst branch chips. 2
> > > > > output ports of 1st chip are hardwired to another 2 chips. It's
> > > > > how it makes it to support 1-to-4 mst branching. So within this
> > > > > case, the internal
> > > > > 2 output ports of 1st chip is not connecting to a stream sink and
> > > > > will never get connected to one.  Thus, I'm thinking maybe the
> > > > > best timing to attach a connector to a port is when the port is
> > > > > connected, and the connected PDT is determined as a stream sink.
> > > > > 
> > > > > Sorry if I misunderstand anything here and really thanks for your
> > > > > time to shed light on this : ) Thanks Lyude.
> > > > 
> > > > It's no problem, it is my job after all! Sorry for how long my
> > > > responses have been taking, but my plate seems to be finally
> > > > clearing up for the foreseeable future.
> > > > 
> > > > That being said - it sounds like with this we still aren't actually
> > > > clear on where the topology refcount leak is happening - only when
> > > > it's happening, which says to me that's the issue we really need to
> > > > be figuring out the cause of as opposed to trying to workaround it.
> > > > 
> > > > Actually - refcount leaks is an issue I've ran into a number of
> > > > times before in the past, so a while back I actually added some nice
> > > > debugging features to assist with debugging leaks. If you enable the
> > > > following options in your kernel config:
> > > > 
> > > > CONFIG_EXPERT=y # This must be set first before the next option
> > > > CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS=y
> > > > 
> > > > Unfortunately, I'm suddenly realizing after typing this that
> > > > apparently I never bothered adding a way for us to debug the
> > > > refcounts of ports/mstbs that haven't been released yet - only the
> > > > ones for ones that have. This shouldn't be difficult at all for me
> > > > to add, so I'll send you a patch either today or at the start of
> > > > next week to try debugging with using this, and then we can figure
> > > > out where this leak is really coming from.
> > > 
> > > Thanks Lyude!
> > > 
> > > Sorry to bother you, but I would like to clarify this again.  So it
> > > sounds
> > 
> > It's no problem! It's my job and I'm happy to help :).
> 
> Thanks!
> I would like to learn more from you as below : p
> > 
> > > like you also agree that we should destroy associated connector
> > 
> > Not quite. I think a better way of explaining this might be to point out
> > that the lifetime of an MST port and its connector isn't supposed
> > to be determined by whether or not it has something plugged into it - its
> > lifetime is supposed to depend on whether there's a valid
> > path from us down the MST topology to the port we're trying to reach. So
> > an MSTB with ports that is unplugged would destroy all of
> > its ports - but an unplugged port should just be the same as a
> > disconnected DRM connector - even if the port itself is just hosting a
> > branching device.
> 
> This is the part a bit difficult to me. I treat DRM connector as the place
> where we associate with a stream sink. So if the statement
> is "All DP mst output ports are places we connect with stream sink", I would
> say false to this since I can find the negative example when
> output port is connected with mst branch device. Thus, looks like we could
> only determine whether to create a connector for an output
> port when the peer device type is known?
> > 
> > Additionally - we don't want to try "delaying" connector creation either.
> > In the modern world hotplugging is almost always reliable in
> > normal situations, but even so there's still use cases for wanting force
> > probing for analog devices on DP converters and just in general
> > as it's a feature commonly used by developers or users working around
> > monitors with problematic HPD issues or EDID issues.
> 
> I think I understand that why we want to create connectors for all output
> ports here. But under these mentioned use cases, aren't we still
> capable to force connector to enable stream? MST hub with muti-functon
> capability, it will enumerate connected virtual DP peer device.
> For problematic HPD issues or EDID issues, their connection status is also
> connected.
> 
> My understanding of output port is it is an internal node to help construct
> an end-to-end virtual channel between a stream source device
> and a stream sink device. Creating connectors for internal nodes within a
> virtual channel is a bit hard for me to get the idea. Please correct
> me if I misunderstand anything here. Thanks Lyude!
> > 
> > > when we unplug sst monitor from a mst hub in the case that I
> > > described? In the case I described (unplug sst monitor), we only
> > > receive CSN from the hub that notifying us the connection status of
> > > one of its downstream output ports is changed to disconnected. There
> > > is no topology refcount needed to be decreased on this disconnected
> > > port but the malloc refcount. Since the output port is still declared
> > > by
> > 
> > Apologies - I misunderstood your original mail as implying that topology
> > refcounts were being leaked - but it sounds like it's actually
> > malloc refcounts being leaked instead? In any case - that means we're
> > still tracing down a leak, just a malloc ref leak.
> > 
> > But, this still doesn't totally make sense to me. Malloc refs only keep
> > the actual drm_dp_mst_port/drm_dp_mst_branch struct alive in
> > memory.
> > Nothing else is kept around, meaning the DRM connector (and I assume by
> > proxy, the dc_sink) should both be getting dropped still
> > and the only thing that should be leaked is a memory allocation. These
> > things should instead be dropped once there's no longer any
> > topology references around. So, are we _sure_ that the problem here is a
> > missing
> > drm_dp_mst_port_put_malloc() or drm_dp_mst_mstb_put_malloc()?
> 
> Just my two cents, I don't think it's leak of malloc ref neither. As you
> said, malloc ref is dealing with the last step to free port/mstb.
> If there is still topology refcount on port/mstb in my case, we won't free
> port/mstb.
> > 
> > If we are unfortunately we don't have equivalent tools for malloc()
> > tracing. I'm totally fine with trying to add some if we have trouble
> > figuring out this issue, but I'm a bit suspicious of the commits you
> > mentioned that introduced this problem. If the problem doesn't
> > happen until those two commits, then it's something in the code changes
> > there that are causing this problem.
> 
> I think we probably also have the problem before these commits, but we
> didn't notice this before. Just when we change to clean up all
> things in dm_dp_mst_connector_destroy(), I start to try to figure out all
> these things out.
> > 
> > The main thing I'm suspicious of just from looking at changes in
> > 09b974e8983a4b163d4a406b46d50bf869da3073 is that the call to
> > amdgpu_dm_update_freesync_caps() that was previously in
> > dm_dp_destroy_mst_connector() appears to be dropped and not re-added in
> > (oh dear, this is a /very/ confusingly similar function
> 
> Lol. I also have a hard time with this one..
> > name!!!) dm_dp_mst_connector_destroy(). I don't remember if this was
> > intentional on my part, but does adding a call back to
> > amdgpu_dm_update_freesync_caps() into dm_dp_destroy_mst_connector() right
> > before the dc_link_remove_remote_sink() call fix
> > anything?
> > 
> > As well, I'm far less suspicious of this one but does re-adding this
> > hunk:
> > 
> >       aconnector->dc_sink = NULL;
> >       aconnector->dc_link->cur_link_settings.lane_count = 0;
> > 
> > After dc_sink_release() fix anything either?
> 
> So the main problem is that we don't get a chance to call
> dc_link_remove_remote_sink() in the SST unplug case. We only get a chance
> to remove the remote sink of a link when an mstb is unplugged.
> > 
> > > the mst hub,  I think we shouldn't destroy the port. Actually, no
> > > ports nor mst branch devices should get destroyed in this case I think.
> > > The result of LINK_ADDRESS is still the same before/after removing the
> > > sst monitor except the DisplayPort_Device_Plug_Status/
> > > Legacy_Device_Plug_Status.
> > > 
> > > Hence, if you agree that we should put refcount of the connector of
> > > the disconnected port within the unplugging sst monitor case to
> > > release the allocated resource, it means we don't want to create
> > > connectors for those disconnected ports. Which conflicts current flow
> > > to create connectors for all declared output ports.
> > > 
> > > Thanks again for your time Lyude!
> > 
> > --
> > Cheers,
> >  Lyude Paul (she/her)
> >  Software Engineer at Red Hat
> --
> Regards,
> Wayne
>
Lyude Paul Sept. 1, 2021, 9:59 p.m. UTC | #13
Actually - I did some more thinking, and I think we shouldn't try to make
changes like this until we actually know what the problem is here. I could try
to figure out what the actual race conditions were that I was facing before
when trying to add/destroy connectors based on PDT, but we still don't even
have a clear idea of what's broken here. I'd much rather we figure out exactly
how this leak is happening before considering changes like this, because we
have no way of knowing whether we've properly fixed it if we don't know what
the problem is in the first place.

I'm still happy to write up the topology debugging stuff I mentioned to you if
you think that would help you debug this issue - since that would make it a
lot easier for you to track down what references are keeping a connector alive
(and additionally, where those references were taken in code. thanks
stack_depot!)
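
To illustrate the kind of reference tracking described above, here is a
minimal sketch in plain userspace C. It is not the kernel's implementation:
the struct, the function names and the fixed-size table are all hypothetical,
and a string tag stands in for the stack trace that stack_depot would record.
The idea is only that every "get" remembers who took it, so that whatever is
still listed after the expected "put" calls points at the leak.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_HOLDERS 8

/* Hypothetical toy object; this is not the real drm_dp_mst_port. */
struct tracked_ref {
	int count;                        /* outstanding references */
	const char *holders[MAX_HOLDERS]; /* tag of the call site that took each one */
};

/* Take a reference and remember who took it. Returns 0, or -1 if the table is full. */
static int tracked_get(struct tracked_ref *r, const char *who)
{
	if (r->count >= MAX_HOLDERS)
		return -1;
	r->holders[r->count++] = who;
	return 0;
}

/* Drop the reference taken by `who`. Returns 0, or -1 if `who` holds none. */
static int tracked_put(struct tracked_ref *r, const char *who)
{
	for (int i = 0; i < r->count; i++) {
		if (strcmp(r->holders[i], who) == 0) {
			/* Move the last entry into the freed slot. */
			r->holders[i] = r->holders[--r->count];
			return 0;
		}
	}
	return -1;
}

/* Print every call site that still holds a reference; returns how many remain. */
static int tracked_dump(const struct tracked_ref *r)
{
	assert(r->count >= 0 && r->count <= MAX_HOLDERS);
	for (int i = 0; i < r->count; i++)
		printf("still held by: %s\n", r->holders[i]);
	return r->count;
}
```

Calling tracked_dump() after a hotplug cycle that should have dropped every
reference would then list the call sites that never did their put.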

Lin, Wayne Sept. 14, 2021, 8:46 a.m. UTC | #14
[Public]

> -----Original Message-----
> From: Lyude Paul <lyude@redhat.com>
> Sent: Thursday, September 2, 2021 6:00 AM
> To: Lin, Wayne <Wayne.Lin@amd.com>; dri-devel@lists.freedesktop.org
> Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Wentland, Harry <Harry.Wentland@amd.com>; Zuo, Jerry
> <Jerry.Zuo@amd.com>; Wu, Hersen <hersenxs.wu@amd.com>; Juston Li <juston.li@intel.com>; Imre Deak <imre.deak@intel.com>;
> Ville Syrjälä <ville.syrjala@linux.intel.com>; Daniel Vetter <daniel.vetter@ffwll.ch>; Sean Paul <sean@poorly.run>; Maarten Lankhorst
> <maarten.lankhorst@linux.intel.com>; Maxime Ripard <mripard@kernel.org>; Thomas Zimmermann <tzimmermann@suse.de>;
> David Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; Deucher, Alexander <Alexander.Deucher@amd.com>; Siqueira,
> Rodrigo <Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo <Aurabindo.Pillai@amd.com>; Bas Nieuwenhuizen
> <bas@basnieuwenhuizen.nl>; Cornij, Nikola <Nikola.Cornij@amd.com>; Jani Nikula <jani.nikula@intel.com>; Manasi Navare
> <manasi.d.navare@intel.com>; Ankit Nautiyal <ankit.k.nautiyal@intel.com>; José Roberto de Souza <jose.souza@intel.com>; Sean
> Paul <seanpaul@chromium.org>; Ben Skeggs <bskeggs@redhat.com>; stable@vger.kernel.org
> Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected end device
>
> Actually - did some more thinking, and I think we shouldn't try to make changes like this until we actually know what the problem is
> here. I could try to figure out what the actual race conditions were that I was facing before when trying to add/destroy connectors based on PDT,
> but we still don't even actually have a clear idea of what's broken here. I'd much rather us figure out exactly how this leak is
> happening before considering making changes like this, because we have no way of knowing if we've properly fixed it or not if we
> don't know what the problem is in the first place.
>
> I'm still happy to write up the topology debugging stuff I mentioned to you if you think that would help you debug this issue - since
> that would make it a lot easier for you to track down what references are keeping a connector alive (and additionally, where those
> references were taken in code. thanks
> stack_depot!)
Hi Lyude,
Sorry for the late response. I've been a bit busy with other stuff recently..

I'm really thankful for all your help : ) I'd also be glad to have the debugging tool if it won't bother you too much. But before debugging,
I need to reach consensus with you about where we expect to release the resources allocated for a stream sink when it's reported as
disconnected. The previous patch suggests releasing the resources when the connector is destroyed, which happens when the topology refcount
reaches zero (i.e. when an mstb is unplugged from the topology). But when we receive a CSN notifying us of a connection change, we currently
don't try to destroy the connector. And this is not caused by a topology/malloc refcount leak, since I don't expect either of them to
drop to zero in this case (the topology of mstbs and ports is unchanged). Hence, my plan was to also destroy the connector in
this case, and the reasoning seems sound to me, as described in my previous mail. With this patch set, I can see connectors eventually get
successfully destroyed after userspace commits set_crtc() to free connectors (although this also needs a fix for the connector refcount
grabbed by drm_client_modeset_probe() under a specific scenario).

I think the main problem I encountered here is that I couldn't find a place that notifies us to release the resources allocated for a
disconnected stream sink when we receive a CSN. If we decide not to destroy the connector in this case, then I'll probably need some guidance
about where to do the release work.
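
A minimal sketch of the situation described above, in plain userspace C. This
is a toy model under stated assumptions, not the drm_dp_mst helpers: every name
below (toy_port, toy_handle_csn() and so on) is hypothetical, and the real code
paths are considerably more involved. It only models the behaviour discussed in
this thread: the sink is released on the connector-destroy path, that path runs
when the topology refcount reaches zero, and a CSN that merely reports
"disconnected" touches neither refcount.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a port; field names are illustrative only. */
struct toy_port {
	int topology_refs;  /* port is part of the topology while > 0 */
	int malloc_refs;    /* struct memory stays allocated while > 0 */
	bool plugged;       /* plug status as reported by LINK_ADDRESS or a CSN */
	bool has_connector; /* a connector exists for this port */
	bool has_sink;      /* driver-side sink resource is still allocated */
	bool freed;         /* stands in for the final free of the struct */
};

static void toy_port_init(struct toy_port *p)
{
	p->topology_refs = 1;
	p->malloc_refs = 1;
	p->plugged = true;
	p->has_connector = true;
	p->has_sink = true;
	p->freed = false;
}

/* Dropping the last malloc ref only frees the memory. */
static void toy_put_malloc(struct toy_port *p)
{
	assert(p->malloc_refs > 0);
	if (--p->malloc_refs == 0)
		p->freed = true;
}

/* Dropping the last topology ref destroys the connector, and with it the sink. */
static void toy_put_topology(struct toy_port *p)
{
	assert(p->topology_refs > 0);
	if (--p->topology_refs == 0) {
		p->has_connector = false;
		p->has_sink = false;
		toy_put_malloc(p);
	}
}

/* A CSN only updates the plug status; no refcount is touched. */
static void toy_handle_csn(struct toy_port *p, bool plugged)
{
	p->plugged = plugged;
}
```

In this model, calling toy_handle_csn(&port, false) leaves has_sink set, which
is the gap in question: nothing on the CSN path releases the sink, and only
toy_put_topology() ever does.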

Thanks again Lyude!
>
> On Tue, 2021-08-31 at 18:47 -0400, Lyude Paul wrote:
> > (I am going to try responding to this tomorrow btw. I haven't been
> > super busy this week, but this has been a surprisingly difficult email
> > to respond to because I actually need to do a deep dive into some
> > of the MST helpers tomorrow to figure out more of the specifics on why
> > I realized we couldn't just hot add/remove port->connector here).
> >
> > On Wed, 2021-08-25 at 03:35 +0000, Lin, Wayne wrote:
> > > [Public]
> > >
> > > > -----Original Message-----
> > > > From: Lyude Paul <lyude@redhat.com>
> > > > Sent: Tuesday, August 24, 2021 5:18 AM
> > > > To: Lin, Wayne <Wayne.Lin@amd.com>;
> > > > dri-devel@lists.freedesktop.org
> > > > Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Wentland,
> > > > Harry <Harry.Wentland@amd.com>; Zuo, Jerry <Jerry.Zuo@amd.com>;
> > > > Wu, Hersen <hersenxs.wu@amd.com>; Juston Li <juston.li@intel.com>;
> > > > Imre Deak <imre.deak@intel.com>; Ville Syrjälä
> > > > <ville.syrjala@linux.intel.com>; Daniel Vetter
> > > > <daniel.vetter@ffwll.ch>; Sean Paul <sean@poorly.run>; Maarten
> > > > Lankhorst <maarten.lankhorst@linux.intel.com>; Maxime Ripard
> > > > <mripard@kernel.org>; Thomas Zimmermann <tzimmermann@suse.de>;
> > > > David Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>;
> > > > Deucher, Alexander <Alexander.Deucher@amd.com>; Siqueira, Rodrigo
> > > > <Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo
> > > > <Aurabindo.Pillai@amd.com>; Bas Nieuwenhuizen
> > > > <bas@basnieuwenhuizen.nl>; Cornij, Nikola <Nikola.Cornij@amd.com>;
> > > > Jani Nikula <jani.nikula@intel.com>; Manasi Navare
> > > > <manasi.d.navare@intel.com>; Ankit Nautiyal
> > > > <ankit.k.nautiyal@intel.com>; José Roberto de Souza
> > > > <jose.souza@intel.com>; Sean Paul <seanpaul@chromium.org>; Ben
> > > > Skeggs <bskeggs@redhat.com>; stable@vger.kernel.org
> > > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for
> > > > connected end device
> > > >
> > > > [snip]
> > > >
> > > > I think I might still be misunderstanding something, some comments
> > > > below
> > > >
> > > > On Mon, 2021-08-23 at 06:33 +0000, Lin, Wayne wrote:
> > > > > > > Hi Lyude,
> > > > > > >
> > > > > > > Really thankful for willing to explain in such details.
> > > > > > > Really appreciate.
> > > > > > >
> > > > > > > I'm trying to fix some problems that observed after these 2
> > > > > > > patches
> > > > > > > * 09b974e8983 drm/amd/amdgpu_dm/mst: Remove
> > > > > > > ->destroy_connector() callback
> > > > > > > * 72dc0f51591 drm/dp_mst: Remove
> > > > > > > drm_dp_mst_topology_cbs.destroy_connector
> > > > > > >
> > > > > > > With above patches, we now change to remove dc_sink when
> > > > > > > connector is about to be destroyed. However, we found out
> > > > > > > that connectors won't get destroyed after hotplugs. Thus,
> > > > > > > after few times hotplugs, we won't create any new dc_sink
> > > > > > > since number of sink is exceeding our limitation. As the
> > > > > > > result of that, I'm trying to figure out why the refcount of connectors won't get zero.
> > > > > > >
> > > > > > > Based on my analysis, I found out that if we connect a sst
> > > > > > > monitor to a mst hub then connect the hub to the system, and
> > > > > > > then unplug the sst monitor from the hub. E.g.
> > > > > > > src - mst hub - sst monitor => src - mst hub  (unplug) sst
> > > > > > > monitor
> > > > > > >
> > > > > > > Within this case, we won't try to put refcount of the sst monitor.
> > > > > > > Which is what I tried to resolve by [PATCH 3/4].
> > > > > > > But here comes a problem which is confusing me that if I can
> > > > > > > destroy connector in this case. By comparing to another
> > > > > > > case, if now mst hub is connected with a mst monitor like this:
> > > > > > > src - mst hub - mst monitor => src - mst hub  (unplug) mst
> > > > > > > monitor
> > > > > > >
> > > > > > > We will put the topology refcount of mst monitor's branching
> > > > > > > unit in and
> > > > > > > drm_dp_port_set_pdt() and eventually call
> > > > > > > drm_dp_delayed_destroy_port() to unregister the connector of
> > > > > > > the logical port. So following the same rule, I think to
> > > > > > > dynamically unregister a mst connector is what we want and
> > > > > > > should be reasonable to also destroy sst connectors in my
> > > > > > > case. But this conflicts the idea what we have here. We want
> > > > > > > to create connectors for all output ports.
> > > > > > > So if dynamically creating/destroying connectors is what we
> > > > > > > want, when is the appropriate time for us to create one is
> > > > > > > what I'm considering.
> > > > > > >
> > > > > > > > Take the StarTech hub DP 1to4 DP output ports for instance.
> > > > > > > This hub, internally, is constructed by  3 1-to-2 mst branch
> > > > > > > chips. 2 output ports of 1st chip are hardwired to another 2
> > > > > > > chips. It's how it makes it to support 1-to-4 mst branching.
> > > > > > > So within this case, the internal
> > > > > > > 2 output ports of 1st chip is not connecting to a stream
> > > > > > > sink and will never get connected to one.  Thus, I'm
> > > > > > > thinking maybe the best timing to attach a connector to a
> > > > > > > port is when the port is connected, and the connected PDT is determined as a stream sink.
> > > > > > >
> > > > > > > Sorry if I misunderstand anything here and really thanks for
> > > > > > > your time to shed light on this : ) Thanks Lyude.
> > > > > >
> > > > > > It's no problem, it is my job after all! Sorry for how long my
> > > > > > responses have been taking, but my plate seems to be finally
> > > > > > clearing up for the foreseeable future.
> > > > > >
> > > > > > That being said - it sounds like with this we still aren't
> > > > > > actually clear on where the topology refcount leak is
> > > > > > happening - only when it's happening, which says to me that's
> > > > > > the issue we really need to be figuring out the cause of as opposed to trying to workaround it.
> > > > > >
> > > > > > Actually - refcount leaks is an issue I've ran into a number
> > > > > > of times before in the past, so a while back I actually added
> > > > > > some nice debugging features to assist with debugging leaks.
> > > > > > If you enable the following options in your kernel config:
> > > > > >
> > > > > > > CONFIG_EXPERT=y # This must be set first before the next option
> > > > > > > CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS=y
> > > > > >
> > > > > > Unfortunately, I'm suddenly realizing after typing this that
> > > > > > > apparently I never bothered adding a way for us to debug the
> > > > > > > refcounts of ports/mstbs that haven't been released yet - only
> > > > > > the ones for ones that have. This shouldn't be difficult at
> > > > > > all for me to add, so I'll send you a patch either today or at
> > > > > > the start of next week to try debugging with using this, and
> > > > > > then we can figure out where this leak is really coming from.
> > > > >
> > > > > Thanks Lyude!
> > > > >
> > > > > Sorry to bother you, but I would like to clarify this again.  So
> > > > > it sounds
> > > >
> > > > It's no problem! It's my job and I'm happy to help :).
> > >
> > > Thanks!
> > > I would like to learn more from you as below : p
> > > >
> > > > > like you also agree that we should destroy associated connector
> > > >
> > > > Not quite. I think a better way of explaining this might be to
> > > > point out that the lifetime of an MST port and its connector isn't
> > > > supposed to be determined by whether or not it has something
> > > > plugged into it - its lifetime is supposed to depend on whether
> > > > there's a valid path from us down the MST topology to the port
> > > > we're trying to reach. So an MSTB with ports that is unplugged
> > > > would destroy all of its ports - but an unplugged port should just
> > > > be the same as a disconnected DRM connector - even if the port
> > > > itself is just hosting a branching device.
> > >
> > > This is the part a bit difficult to me. I treat DRM connector as the
> > > place where we associate with a stream sink. So if the statement is
> > > "All DP mst output ports are places we connect with stream sink", I
> > > would say false to this since I can find the negative example when
> > > output port is connected with mst branch device. Thus, looks like we
> > > could only determine whether to create a connector for an output
> > > port when the peer device type is known?
> > > >
> > > > Additionally - we don't want to try "delaying" connector creation
> > > > either.
> > > > In the modern world hotplugging is almost always reliable in
> > > > normal situations, but even so there's still use cases for wanting
> > > > force probing for analog devices on DP converters and just in
> > > > general as it's a feature commonly used by developers or users
> > > > working around monitors with problematic HPD issues or EDID issues.
> > >
> > > I think I understand that why we want to create connectors for all
> > > output ports here. But under these mentioned use cases, aren't we
> > > > still able to force a connector to enable a stream? An MST hub with
> > > > multi-function capability will enumerate the connected virtual DP peer device.
> > > For problematic HPD issues or EDID issues, their connection status
> > > is also connected.
> > >
> > > My understanding of output port is it is an internal node to help
> > > construct an end-to-end virtual channel between a stream source
> > > device and a stream sink device. Creating connectors for internal
> > > nodes within a virtual channel is a bit hard for me to get the idea.
> > > Please correct me if I misunderstand anything here. Thanks Lyude!
> > > >
> > > > > when we unplug sst monitor from a mst hub in the case that I
> > > > > described? In the case I described (unplug sst monitor), we only
> > > > > receive CSN from the hub that notifying us the connection status
> > > > > of one of its downstream output ports is changed to
> > > > > disconnected. There is no topology refcount needed to be
> > > > > decreased on this disconnected port but the malloc refcount.
> > > > > Since the output port is still declared by
> > > >
> > > > Apologies - I misunderstood your original mail as implying that
> > > > topology refcounts were being leaked - but it sounds like it's
> > > > actually malloc refcounts being leaked instead? In any case - that
> > > > means we're still tracing down a leak, just a malloc ref leak.
> > > >
> > > > But, this still doesn't totally make sense to me. Malloc refs only
> > > > keep the actual drm_dp_mst_port/drm_dp_mst_branch struct alive in
> > > > memory.
> > > > Nothing else is kept around, meaning the DRM connector (and I
> > > > assume by proxy, the dc_sink) should both be getting dropped still
> > > > and the only thing that should be leaked is a memory allocation.
> > > > These things should instead be dropped once there's no longer any
> > > > topology references around. So, are we _sure_ that the problem
> > > > here is a missing
> > > > drm_dp_mst_port_put_malloc() or drm_dp_mst_mstb_put_malloc()?
> > >
> > > Just my two cents, I don't think it's leak of malloc ref neither. As
> > > you said, malloc ref is dealing with the last step to free port/mstb.
> > > If there is still topology refcount on port/mstb in my case, we
> > > won't free port/mstb.
> > > >
> > > > If we are, unfortunately we don't have equivalent tools for
> > > > malloc() tracing. I'm totally fine with trying to add some if we
> > > > have trouble figuring out this issue, but I'm a bit suspicious of
> > > > the commits you mentioned that introduced this problem. If the
> > > > problem doesn't happen until those two commits, then it's
> > > > something in the code changes there that are causing this problem.
> > >
> > > I think we probably also have the problem before these commits, but
> > > we didn't notice this before. Just when we change to clean up all
> > > things in dm_dp_mst_connector_destroy(), I start to try to figure
> > > out all these things out.
> > > >
> > > > The main thing I'm suspicious of just from looking at changes in
> > > > 09b974e8983a4b163d4a406b46d50bf869da3073 is that the call to
> > > > amdgpu_dm_update_freesync_caps() that was previously in
> > > > dm_dp_destroy_mst_connector() appears to be dropped and not
> > > > re-added in (oh dear, this is a /very/ confusingly similar
> > > > function
> > >
> > > Lol. I also have hard time on this..
> > > > name!!!) dm_dp_mst_connector_destroy(). I don't remember if this
> > > > was intentional on my part, but does adding a call back to
> > > > amdgpu_dm_update_freesync_caps() into
> > > > dm_dp_destroy_mst_connector() right before the
> > > > dc_link_remove_remote_sink() call fix anything?
> > > >
> > > > As well, I'm far less suspicious of this one but does re-adding
> > > > this
> > > > hunk:
> > > >
> > > >       aconnector->dc_sink = NULL;
> > > >       aconnector->dc_link->cur_link_settings.lane_count = 0;
> > > >
> > > > After dc_sink_release() fix anything either?
> > >
> > > So the main problem is we don't have chance to call
> > > dc_link_remove_remote_sink() in the unplugging SST case. We only
> > > have chance to remove the remote sink of a link when unplug a mstb.
> > > >
> > > > > the mst hub,  I think we shouldn't destroy the port. Actually,
> > > > > no ports nor mst branch devices should get destroyed in this
> > > > > case I think.
> > > > > The result of LINK_ADDRESS is still the same before/after
> > > > > removing the sst monitor except the
> > > > > DisplayPort_Device_Plug_Status/ Legacy_Device_Plug_Status.
> > > > >
> > > > > Hence, if you agree that we should put refcount of the connector
> > > > > of the disconnected port within the unplugging sst monitor case
> > > > > to release the allocated resource, it means we don't want to
> > > > > create connectors for those disconnected ports. Which conflicts
> > > > > current flow to create connectors for all declared output ports.
> > > > >
> > > > > Thanks again for your time Lyude!
> > > >
> > > > --
> > > > Cheers,
> > > >  Lyude Paul (she/her)
> > > >  Software Engineer at Red Hat
> > > --
> > > Regards,
> > > Wayne
> > >
> >
>
> --
> Cheers,
>  Lyude Paul (she/her)
>  Software Engineer at Red Hat
--
Regards,
Wayne Lin
Lyude Paul Sept. 17, 2021, 5:48 p.m. UTC | #15
Sorry about the slow response, this week XDC has been going on and I've been
mostly paying attention to that.

On Tue, 2021-09-14 at 08:46 +0000, Lin, Wayne wrote:
> [Public]
> 
> > -----Original Message-----
> > From: Lyude Paul <lyude@redhat.com>
> > Sent: Thursday, September 2, 2021 6:00 AM
> > To: Lin, Wayne <Wayne.Lin@amd.com>; dri-devel@lists.freedesktop.org
> > Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Wentland, Harry
> > <Harry.Wentland@amd.com>; Zuo, Jerry
> > <Jerry.Zuo@amd.com>; Wu, Hersen <hersenxs.wu@amd.com>; Juston Li
> > <juston.li@intel.com>; Imre Deak <imre.deak@intel.com>;
> > Ville Syrjälä <ville.syrjala@linux.intel.com>; Daniel Vetter
> > <daniel.vetter@ffwll.ch>; Sean Paul <sean@poorly.run>; Maarten Lankhorst
> > <maarten.lankhorst@linux.intel.com>; Maxime Ripard <mripard@kernel.org>;
> > Thomas Zimmermann <tzimmermann@suse.de>;
> > David Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; Deucher,
> > Alexander <Alexander.Deucher@amd.com>; Siqueira,
> > Rodrigo <Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo
> > <Aurabindo.Pillai@amd.com>; Bas Nieuwenhuizen
> > <bas@basnieuwenhuizen.nl>; Cornij, Nikola <Nikola.Cornij@amd.com>; Jani
> > Nikula <jani.nikula@intel.com>; Manasi Navare
> > <manasi.d.navare@intel.com>; Ankit Nautiyal <ankit.k.nautiyal@intel.com>;
> > José Roberto de Souza <jose.souza@intel.com>; Sean
> > Paul <seanpaul@chromium.org>; Ben Skeggs <bskeggs@redhat.com>;
> > stable@vger.kernel.org
> > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected end
> > device
> > 
> > Actually - did some more thinking, and I think we shouldn't try to make
> > changes like this until we actually know what the problem is
> > here. I could try to figure out what the actual race conditions I was facing
> > before with trying to add/destroy connectors based on PDT,
> > but we still don't even actually have a clear idea of what's broken here.
> > I'd much rather us figure out exactly how this leak is
> > happening before considering making changes like this, because we have no
> > way of knowing if we've properly fixed it or not if we
> > don't know what the problem is in the first place.
> > 
> > I'm still happy to write up the topology debugging stuff I mentioned to you
> > if you think that would help you debug this issue - since
> > that would make it a lot easier for you to track down what references are
> > keeping a connector alive (and additionally, where those
> > references were taken in code. thanks
> > stack_depot!)
> Hi Lyude,
> Sorry for late response. A bit busy on other stuff recently..
> 
> Really really thankful for all your help : ) I'm also glad to have the
> debugging tool if it won’t bother you too much. But before debugging,

No problem! I will get to it early next week then.

> I need to have consensus with you about where do we expect to release resource
> allocated for a stream sink when it's reported as
> disconnected. Previous patch suggests releasing resource when connector is
> destroyed which will happen when topology refcount
> reaches zero (i.e. unplug mstb from topology). But when the case is receiving
> CSN notifying connection change, we don't try to destroy
> connector in this case now. And this is not caused by topology/malloc refcount
> leak since I don't expect neither one of them get
> decrease to zero under this case (topology of mstbs and ports is not changed).
> Hence, my plan was to also try to destroy connector under

Ah - I wonder if this might have been where some of the confusion here came
from. So - both mstbs and ports (assume I'm talking about the actual drm_dp_mst_port
and drm_dp_mst_branch structs here) are supposed to have non-zero topology
refcounts as long as there is a valid path between the port or mstb, and our
source. This also means that for ports, the drm_connector associated with
these ports should stay around as long as the port is reachable from the source
- regardless of whether anything is actually plugged into the port or not.

So - a CSN on its own shouldn't really get rid of the port it was notifying
us about. But if that CSN results in an MSTB -with- its own ports being
removed, this would mean there would no longer be a valid path between our
source and the ports on said MSTB and as such - the connector for each one of
those ports is removed from the topology. Remember however, when I say
"removed from the topology" what I'm referring to is the fact that the MST
helpers have dropped the main topology reference for a given mstb or port.
Since various MST helpers retrieve temporary topology references to connectors
they work on in order to simplify handling I/O errors, the operations from
those helpers would potentially keep the port or mstb around in the topology
until those helpers have had a chance to abort and drop their refs. And then
once all the topology references are released, a destruction worker gets
scheduled which handles unregistering the drm_connector (not destroying it).
The drm_connector stays around unregistered, up until the point at which all
malloc references to the drm_dp_mst_port have been released.
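
As a rough illustration of the two refcounts described above, here is a small
userspace model in C. It is only a sketch under stated assumptions: every
model_* name is hypothetical and is not part of the MST helpers, the deferred
destruction worker is collapsed into an inline step, and the real locking is
ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; NOT the real struct drm_dp_mst_port. */
struct model_port {
	int topology_refs;         /* > 0 while the port is still part of the topology */
	int malloc_refs;           /* > 0 while the struct must stay in memory */
	bool plugged;              /* last plug status reported by a CSN */
	bool connector_registered; /* connector still visible to userspace */
	bool freed;                /* stands in for freeing the port struct */
};

static void model_port_init(struct model_port *p)
{
	p->topology_refs = 1; /* the "main" topology reference */
	p->malloc_refs = 1;   /* held on behalf of the topology references */
	p->plugged = true;
	p->connector_registered = true;
	p->freed = false;
}

static void model_port_get_malloc(struct model_port *p)
{
	assert(!p->freed);
	p->malloc_refs++;
}

static void model_port_put_malloc(struct model_port *p)
{
	assert(p->malloc_refs > 0);
	if (--p->malloc_refs == 0)
		p->freed = true;
}

static void model_port_get_topology(struct model_port *p)
{
	assert(p->topology_refs > 0); /* a removed port cannot be revived */
	p->topology_refs++;
}

static void model_port_put_topology(struct model_port *p)
{
	assert(p->topology_refs > 0);
	if (--p->topology_refs == 0) {
		/* The real helpers defer this to a worker; the model does it inline. */
		p->connector_registered = false; /* unregistered, not destroyed */
		model_port_put_malloc(p);
	}
}

/* A CSN for a port that stays reachable only updates its plug status. */
static void model_port_handle_csn(struct model_port *p, bool plugged)
{
	p->plugged = plugged;
}
```

In this model a CSN reporting "unplugged" changes nothing but the plug status,
dropping the last topology reference unregisters the connector, and the struct
is only freed once the last malloc reference goes away.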

I think it may also be worth clarifying the lifetime of drm_connector itself
here as well, since that also actually has a refcount. Basically, as long as
userspace has a mode committed which references a drm_connector - that
drm_connector will still exist in memory, and its mode object ID will remain
valid. This means if we were to have an MST topology hooked up with one display
turned on and then suddenly unplugged it, keeping in mind that the port with
said display now becomes inaccessible from the topology, the drm_connector
associated with that display would continue to have a valid mode object ID up
until the point at which userspace has committed a new mode which disables it.
The sysfs paths for the connector however, will disappear immediately once the
connector is unregistered so as to ensure that userspace applications cannot
try to reuse it later or attempt to reprobe it.
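
The difference between "unregistered" and "destroyed" for the connector can be
sketched the same way. Again this is a hypothetical userspace model, not the
real struct drm_connector or its API: the model_connector_* names are made up,
and a single integer stands in for the connector's reference count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; NOT the real struct drm_connector. */
struct model_connector {
	int refs;        /* one for the owning port, plus one per committed state using it */
	bool registered; /* sysfs paths exist, userspace can probe it */
	bool destroyed;  /* memory released, mode object ID no longer valid */
};

static void model_connector_init(struct model_connector *c)
{
	c->refs = 1; /* reference held by the owning port */
	c->registered = true;
	c->destroyed = false;
}

static void model_connector_get(struct model_connector *c)
{
	assert(!c->destroyed);
	c->refs++;
}

static void model_connector_put(struct model_connector *c)
{
	assert(c->refs > 0);
	if (--c->refs == 0)
		c->destroyed = true;
}

/* Unregistering hides the connector from userspace but frees nothing. */
static void model_connector_unregister(struct model_connector *c)
{
	c->registered = false;
}

/* The mode object ID stays valid for as long as the object exists. */
static bool model_connector_id_valid(const struct model_connector *c)
{
	return !c->destroyed;
}
```

Here a committed mode holds its own reference, so the connector outlives both
the unregister step and the owner's put, and only goes away once a later
commit that disables it drops that last reference.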

Any resource releases beyond this (streams on the driver side, for instance)
are up to the driver, but typically I would expect them to happen in the same
places as they would with an SST connector. Does that answer your question?

> this case and the reason is reasonable to me as described in previous mail.
> With this patch set, I can see connectors eventually get
> successfully destroyed after userspace committing set_crtc() to free
> connectors (although also need a fix on the connector refcount
> grabbed by drm_client_modeset_probe() under specific scenario).
> 
> I think the main problem I encountered here is that I couldn't find a place
> that notify us to release resource allocated for a disconnected
> stream sink when receive CSN. If we decide not to destroy connector under this
> case, then I probably need some guidance about where
> to do the release work.
> 
> Thanks again Lyude!
> > 
> > On Tue, 2021-08-31 at 18:47 -0400, Lyude Paul wrote:
> > > > (I am going to try responding to this tomorrow btw. I haven't been
> > > > super busy this week, but this has been a surprisingly difficult email
> > > > to respond to because I actually need to do a deep dive into some
> > > > of the MST helpers tomorrow to figure out more of the specifics on why
> > > > I realized we couldn't just hot add/remove port->connector here).
> > > 
> > > On Wed, 2021-08-25 at 03:35 +0000, Lin, Wayne wrote:
> > > > [Public]
> > > > 
> > > > > -----Original Message-----
> > > > > From: Lyude Paul <lyude@redhat.com>
> > > > > Sent: Tuesday, August 24, 2021 5:18 AM
> > > > > To: Lin, Wayne <Wayne.Lin@amd.com>;
> > > > > dri-devel@lists.freedesktop.org
> > > > > Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Wentland,
> > > > > Harry <Harry.Wentland@amd.com>; Zuo, Jerry <Jerry.Zuo@amd.com>;
> > > > > Wu, Hersen <hersenxs.wu@amd.com>; Juston Li <juston.li@intel.com>;
> > > > > Imre Deak <imre.deak@intel.com>; Ville Syrjälä
> > > > > <ville.syrjala@linux.intel.com>; Daniel Vetter
> > > > > <daniel.vetter@ffwll.ch>; Sean Paul <sean@poorly.run>; Maarten
> > > > > Lankhorst <maarten.lankhorst@linux.intel.com>; Maxime Ripard
> > > > > <mripard@kernel.org>; Thomas Zimmermann <tzimmermann@suse.de>;
> > > > > David Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>;
> > > > > Deucher, Alexander <Alexander.Deucher@amd.com>; Siqueira, Rodrigo
> > > > > <Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo
> > > > > <Aurabindo.Pillai@amd.com>; Bas Nieuwenhuizen
> > > > > <bas@basnieuwenhuizen.nl>; Cornij, Nikola <Nikola.Cornij@amd.com>;
> > > > > Jani Nikula <jani.nikula@intel.com>; Manasi Navare
> > > > > <manasi.d.navare@intel.com>; Ankit Nautiyal
> > > > > <ankit.k.nautiyal@intel.com>; José Roberto de Souza
> > > > > <jose.souza@intel.com>; Sean Paul <seanpaul@chromium.org>; Ben
> > > > > Skeggs <bskeggs@redhat.com>; stable@vger.kernel.org
> > > > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for
> > > > > connected end device
> > > > > 
> > > > > [snip]
> > > > > 
> > > > > I think I might still be misunderstanding something, some comments
> > > > > below
> > > > > 
> > > > > On Mon, 2021-08-23 at 06:33 +0000, Lin, Wayne wrote:
> > > > > > > > Hi Lyude,
> > > > > > > > 
> > > > > > > > Really thankful for willing to explain in such details.
> > > > > > > > Really appreciate.
> > > > > > > > 
> > > > > > > > I'm trying to fix some problems that observed after these 2
> > > > > > > > patches
> > > > > > > > * 09b974e8983 drm/amd/amdgpu_dm/mst: Remove
> > > > > > > > ->destroy_connector() callback
> > > > > > > > * 72dc0f51591 drm/dp_mst: Remove
> > > > > > > > drm_dp_mst_topology_cbs.destroy_connector
> > > > > > > > 
> > > > > > > > With above patches, we now change to remove dc_sink when
> > > > > > > > connector is about to be destroyed. However, we found out
> > > > > > > > that connectors won't get destroyed after hotplugs. Thus,
> > > > > > > > after few times hotplugs, we won't create any new dc_sink
> > > > > > > > since number of sink is exceeding our limitation. As the
> > > > > > > > result of that, I'm trying to figure out why the refcount of
> > > > > > > > connectors won't get zero.
> > > > > > > > 
> > > > > > > > Based on my analysis, I found out that if we connect a sst
> > > > > > > > monitor to a mst hub then connect the hub to the system, and
> > > > > > > > then unplug the sst monitor from the hub. E.g.
> > > > > > > > src - mst hub - sst monitor => src - mst hub  (unplug) sst
> > > > > > > > monitor
> > > > > > > > 
> > > > > > > > Within this case, we won't try to put refcount of the sst
> > > > > > > > monitor.
> > > > > > > > Which is what I tried to resolve by [PATCH 3/4].
> > > > > > > > But here comes a problem which is confusing me that if I can
> > > > > > > > destroy connector in this case. By comparing to another
> > > > > > > > case, if now mst hub is connected with a mst monitor like this:
> > > > > > > > src - mst hub - mst monitor => src - mst hub  (unplug) mst
> > > > > > > > monitor
> > > > > > > > 
> > > > > > > > We will put the topology refcount of mst monitor's branching
> > > > > > > > unit in and
> > > > > > > > drm_dp_port_set_pdt() and eventually call
> > > > > > > > drm_dp_delayed_destroy_port() to unregister the connector of
> > > > > > > > the logical port. So following the same rule, I think to
> > > > > > > > dynamically unregister a mst connector is what we want and
> > > > > > > > should be reasonable to also destroy sst connectors in my
> > > > > > > > case. But this conflicts the idea what we have here. We want
> > > > > > > > to create connectors for all output ports.
> > > > > > > > So if dynamically creating/destroying connectors is what we
> > > > > > > > want, when is the appropriate time for us to create one is
> > > > > > > > what I'm considering.
> > > > > > > > 
> > > > > > > > > Take the StarTech hub DP 1to4 DP output ports for instance.
> > > > > > > > This hub, internally, is constructed by  3 1-to-2 mst branch
> > > > > > > > chips. 2 output ports of 1st chip are hardwired to another 2
> > > > > > > > chips. It's how it makes it to support 1-to-4 mst branching.
> > > > > > > > So within this case, the internal
> > > > > > > > 2 output ports of 1st chip is not connecting to a stream
> > > > > > > > sink and will never get connected to one.  Thus, I'm
> > > > > > > > thinking maybe the best timing to attach a connector to a
> > > > > > > > port is when the port is connected, and the connected PDT is
> > > > > > > > determined as a stream sink.
> > > > > > > > 
> > > > > > > > Sorry if I misunderstand anything here and really thanks for
> > > > > > > > your time to shed light on this : ) Thanks Lyude.
> > > > > > > 
> > > > > > > It's no problem, it is my job after all! Sorry for how long my
> > > > > > > responses have been taking, but my plate seems to be finally
> > > > > > > clearing up for the foreseeable future.
> > > > > > > 
> > > > > > > That being said - it sounds like with this we still aren't
> > > > > > > actually clear on where the topology refcount leak is
> > > > > > > happening - only when it's happening, which says to me that's
> > > > > > > the issue we really need to be figuring out the cause of as
> > > > > > > opposed to trying to workaround it.
> > > > > > > 
> > > > > > > Actually - refcount leaks is an issue I've ran into a number
> > > > > > > of times before in the past, so a while back I actually added
> > > > > > > some nice debugging features to assist with debugging leaks.
> > > > > > > If you enable the following options in your kernel config:
> > > > > > > 
> > > > > > > > CONFIG_EXPERT=y # This must be set first before the next option
> > > > > > > > CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS=y
> > > > > > > 
> > > > > > > Unfortunately, I'm suddenly realizing after typing this that
> > > > > > > > apparently I never bothered adding a way for us to debug the
> > > > > > > > refcounts of ports/mstbs that haven't been released yet - only
> > > > > > > the ones for ones that have. This shouldn't be difficult at
> > > > > > > all for me to add, so I'll send you a patch either today or at
> > > > > > > the start of next week to try debugging with using this, and
> > > > > > > then we can figure out where this leak is really coming from.
> > > > > > 
> > > > > > Thanks Lyude!
> > > > > > 
> > > > > > Sorry to bother you, but I would like to clarify this again.  So
> > > > > > it sounds
> > > > > 
> > > > > It's no problem! It's my job and I'm happy to help :).
> > > > 
> > > > Thanks!
> > > > I would like to learn more from you as below : p
> > > > > 
> > > > > > like you also agree that we should destroy associated connector
> > > > > 
> > > > > Not quite. I think a better way of explaining this might be to
> > > > > point out that the lifetime of an MST port and its connector isn't
> > > > > supposed to be determined by whether or not it has something
> > > > > plugged into it - its lifetime is supposed to depend on whether
> > > > > there's a valid path from us down the MST topology to the port
> > > > > we're trying to reach. So an MSTB with ports that is unplugged
> > > > > would destroy all of its ports - but an unplugged port should just
> > > > > be the same as a disconnected DRM connector - even if the port
> > > > > itself is just hosting a branching device.
> > > > 
> > > > This is the part a bit difficult to me. I treat DRM connector as the
> > > > place where we associate with a stream sink. So if the statement is
> > > > "All DP mst output ports are places we connect with stream sink", I
> > > > would say false to this since I can find the negative example when
> > > > output port is connected with mst branch device. Thus, looks like we
> > > > could only determine whether to create a connector for an output
> > > > port when the peer device type is known?
> > > > > 
> > > > > Additionally - we don't want to try "delaying" connector creation
> > > > > either.
> > > > > In the modern world hotplugging is almost always reliable in
> > > > > normal situations, but even so there's still use cases for wanting
> > > > > force probing for analog devices on DP converters and just in
> > > > > general as it's a feature commonly used by developers or users
> > > > > working around monitors with problematic HPD issues or EDID issues.
> > > > 
> > > > I think I understand that why we want to create connectors for all
> > > > output ports here. But under these mentioned use cases, aren't we
> > > > > still able to force a connector to enable a stream? An MST hub with
> > > > > multi-function capability will enumerate the connected virtual DP peer
> > > > > device.
> > > > For problematic HPD issues or EDID issues, their connection status
> > > > is also connected.
> > > > 
> > > > My understanding of output port is it is an internal node to help
> > > > construct an end-to-end virtual channel between a stream source
> > > > device and a stream sink device. Creating connectors for internal
> > > > nodes within a virtual channel is a bit hard for me to get the idea.
> > > > Please correct me if I misunderstand anything here. Thanks Lyude!
> > > > > 
> > > > > > when we unplug sst monitor from a mst hub in the case that I
> > > > > > described? In the case I described (unplug sst monitor), we only
> > > > > > receive CSN from the hub that notifying us the connection status
> > > > > > of one of its downstream output ports is changed to
> > > > > > disconnected. There is no topology refcount needed to be
> > > > > > decreased on this disconnected port but the malloc refcount.
> > > > > > Since the output port is still declared by
> > > > > 
> > > > > Apologies - I misunderstood your original mail as implying that
> > > > > topology refcounts were being leaked - but it sounds like it's
> > > > > actually malloc refcounts being leaked instead? In any case - that
> > > > > means we're still tracing down a leak, just a malloc ref leak.
> > > > > 
> > > > > But, this still doesn't totally make sense to me. Malloc refs only
> > > > > keep the actual drm_dp_mst_port/drm_dp_mst_branch struct alive in
> > > > > memory.
> > > > > Nothing else is kept around, meaning the DRM connector (and I
> > > > > assume by proxy, the dc_sink) should both be getting dropped still
> > > > > and the only thing that should be leaked is a memory allocation.
> > > > > These things should instead be dropped once there's no longer any
> > > > > topology references around. So, are we _sure_ that the problem
> > > > > here is a missing
> > > > > drm_dp_mst_port_put_malloc() or drm_dp_mst_mstb_put_malloc()?
> > > > 
> > > > Just my two cents, I don't think it's a malloc ref leak either. As
> > > > you said, the malloc ref deals with the last step of freeing the
> > > > port/mstb. If there is still a topology refcount on the port/mstb
> > > > in my case, we won't free the port/mstb.
> > > > > 
> > > > > If we are, unfortunately we don't have equivalent tools for
> > > > > malloc() tracing. I'm totally fine with trying to add some if we
> > > > > have trouble figuring out this issue, but I'm a bit suspicious of
> > > > > the commits you mentioned that introduced this problem. If the
> > > > > problem doesn't happen until those two commits, then it's
> > > > > something in the code changes there that is causing this problem.
> > > > 
> > > > I think we probably also had the problem before these commits, but
> > > > we didn't notice it. It was only when we changed to cleaning up
> > > > everything in dm_dp_mst_connector_destroy() that I started trying
> > > > to figure all of this out.
> > > > > 
> > > > > The main thing I'm suspicious of just from looking at changes in
> > > > > 09b974e8983a4b163d4a406b46d50bf869da3073 is that the call to
> > > > > amdgpu_dm_update_freesync_caps() that was previously in
> > > > > dm_dp_destroy_mst_connector() appears to be dropped and not
> > > > > re-added in (oh dear, this is a /very/ confusingly similar
> > > > > function
> > > > 
> > > > Lol. I also have a hard time with this..
> > > > > name!!!) dm_dp_mst_connector_destroy(). I don't remember if this
> > > > > was intentional on my part, but does adding a call back to
> > > > > amdgpu_dm_update_freesync_caps() into
> > > > > dm_dp_destroy_mst_connector() right before the
> > > > > dc_link_remove_remote_sink() call fix anything?
> > > > > 
> > > > > As well, I'm far less suspicious of this one but does re-adding
> > > > > this
> > > > > hunk:
> > > > > 
> > > > >       aconnector->dc_sink = NULL;
> > > > >       aconnector->dc_link->cur_link_settings.lane_count = 0;
> > > > > 
> > > > > After dc_sink_release() fix anything either?
> > > > 
> > > > So the main problem is that we don't get a chance to call
> > > > dc_link_remove_remote_sink() in the unplugging SST case. We only
> > > > get a chance to remove the remote sink of a link when unplugging an
> > > > mstb.
> > > > > 
> > > > > > the mst hub, I think we shouldn't destroy the port. Actually, I
> > > > > > think no ports or mst branch devices should get destroyed in
> > > > > > this case.
> > > > > > The result of LINK_ADDRESS is the same before and after removing
> > > > > > the sst monitor, except for
> > > > > > DisplayPort_Device_Plug_Status/Legacy_Device_Plug_Status.
> > > > > > 
> > > > > > Hence, if you agree that we should put the refcount of the
> > > > > > connector of the disconnected port in the unplugging sst monitor
> > > > > > case to release the allocated resource, it means we don't want
> > > > > > to create connectors for disconnected ports, which conflicts
> > > > > > with the current flow of creating connectors for all declared
> > > > > > output ports.
> > > > > > 
> > > > > > Thanks again for your time Lyude!
> > > > > 
> > > > > --
> > > > > Cheers,
> > > > >  Lyude Paul (she/her)
> > > > >  Software Engineer at Red Hat
> > > > --
> > > > Regards,
> > > > Wayne
> > > > 
> > > 
> > 
> > --
> > Cheers,
> >  Lyude Paul (she/her)
> >  Software Engineer at Red Hat
> --
> Regards,
> Wayne Lin
>
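
To make the distinction between the two refcounts in the quoted discussion
concrete, here is a minimal userspace sketch in C. It is only a model under
stated assumptions: struct demo_port and the demo_* functions are
hypothetical stand-ins, not the kernel's drm_dp_mst_port or its helpers, and
plain integers replace kref. It shows the behaviour described above: once
the last topology reference is dropped the connector goes away, while an
outstanding malloc reference only keeps the memory itself valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a port; NOT the real struct drm_dp_mst_port. */
struct demo_port {
	int topology_refs;  /* > 0 while the port is reachable in the topology */
	int malloc_refs;    /* > 0 while the memory itself must stay valid */
	bool has_connector; /* models the connector (and sink) tied to the port */
	bool *freed_flag;   /* set when the memory is released, for observation */
};

struct demo_port *demo_port_create(bool *freed_flag)
{
	struct demo_port *port = calloc(1, sizeof(*port));

	if (!port)
		return NULL;
	port->topology_refs = 1;
	port->malloc_refs = 1; /* held on behalf of the topology references */
	port->has_connector = true;
	port->freed_flag = freed_flag;
	*freed_flag = false;
	return port;
}

void demo_port_get_malloc(struct demo_port *port)
{
	port->malloc_refs++;
}

void demo_port_put_malloc(struct demo_port *port)
{
	assert(port->malloc_refs > 0);
	if (--port->malloc_refs == 0) {
		/* Last step only: release the memory, nothing else is left. */
		*port->freed_flag = true;
		free(port);
	}
}

void demo_port_put_topology(struct demo_port *port)
{
	assert(port->topology_refs > 0);
	if (--port->topology_refs == 0) {
		/* The port left the topology: tear down the connector here. */
		port->has_connector = false;
		/* Drop the malloc reference the topology was holding. */
		demo_port_put_malloc(port);
	}
}
```

In this model, a caller that took an extra malloc reference can still safely
read the struct after the last topology reference is gone, but the connector
has already been dropped; the memory is released only by the final
demo_port_put_malloc().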
Lyude Paul Oct. 12, 2021, 9:17 p.m. UTC | #16
OK - got sidetracked by an issue at work but I just resumed working on this
today, should hopefully have it done at the start of next week at the latest
(hooray for having time to do things upstream again! :).

On Tue, 2021-09-14 at 08:46 +0000, Lin, Wayne wrote:
> [Public]
> 
> > -----Original Message-----
> > From: Lyude Paul <lyude@redhat.com>
> > Sent: Thursday, September 2, 2021 6:00 AM
> > To: Lin, Wayne <Wayne.Lin@amd.com>; dri-devel@lists.freedesktop.org
> > Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Wentland, Harry
> > <Harry.Wentland@amd.com>; Zuo, Jerry
> > <Jerry.Zuo@amd.com>; Wu, Hersen <hersenxs.wu@amd.com>; Juston Li
> > <juston.li@intel.com>; Imre Deak <imre.deak@intel.com>;
> > Ville Syrjälä <ville.syrjala@linux.intel.com>; Daniel Vetter
> > <daniel.vetter@ffwll.ch>; Sean Paul <sean@poorly.run>; Maarten Lankhorst
> > <maarten.lankhorst@linux.intel.com>; Maxime Ripard <mripard@kernel.org>;
> > Thomas Zimmermann <tzimmermann@suse.de>;
> > David Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; Deucher,
> > Alexander <Alexander.Deucher@amd.com>; Siqueira,
> > Rodrigo <Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo
> > <Aurabindo.Pillai@amd.com>; Bas Nieuwenhuizen
> > <bas@basnieuwenhuizen.nl>; Cornij, Nikola <Nikola.Cornij@amd.com>; Jani
> > Nikula <jani.nikula@intel.com>; Manasi Navare
> > <manasi.d.navare@intel.com>; Ankit Nautiyal <ankit.k.nautiyal@intel.com>;
> > José Roberto de Souza <jose.souza@intel.com>; Sean
> > Paul <seanpaul@chromium.org>; Ben Skeggs <bskeggs@redhat.com>;
> > stable@vger.kernel.org
> > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected
> > end device
> > 
> > Actually - did some more thinking, and I think we shouldn't try to make
> > changes like this until we actually know what the problem is here. I
> > could try to figure out what the actual race conditions were that I was
> > facing before when trying to add/destroy connectors based on PDT, but we
> > still don't even have a clear idea of what's broken here.
> > I'd much rather we figure out exactly how this leak is happening before
> > considering making changes like this, because we have no way of knowing
> > if we've properly fixed it or not if we don't know what the problem is
> > in the first place.
> > 
> > I'm still happy to write up the topology debugging stuff I mentioned to
> > you if you think that would help you debug this issue - since that would
> > make it a lot easier for you to track down what references are keeping a
> > connector alive (and additionally, where those references were taken in
> > code. thanks stack_depot!)
> Hi Lyude,
> Sorry for the late response. I've been a bit busy with other things
> recently..
> 
> Really, really thankful for all your help : ) I'd also be glad to have
> the debugging tool if it won't bother you too much. But before debugging,
> I need to reach consensus with you about where we expect to release the
> resources allocated for a stream sink when it's reported as disconnected.
> The previous patch suggests releasing resources when the connector is
> destroyed, which happens when the topology refcount reaches zero (i.e.
> when an mstb is unplugged from the topology). But when we receive a CSN
> notifying us of a connection change, we currently don't try to destroy
> the connector. And this is not caused by a topology/malloc refcount leak,
> since I don't expect either of them to drop to zero in this case (the
> topology of mstbs and ports is unchanged). Hence, my plan was to also
> try to destroy the connector in this case, for the reasons described in
> my previous mail.
> With this patch set, I can see connectors eventually get successfully
> destroyed after userspace commits set_crtc() to free the connectors
> (although this also needs a fix for the connector refcount grabbed by
> drm_client_modeset_probe() under a specific scenario).
> 
> I think the main problem I encountered here is that I couldn't find a
> place that notifies us to release the resources allocated for a
> disconnected stream sink when we receive a CSN. If we decide not to
> destroy the connector in this case, then I probably need some guidance
> about where to do the release work.
> 
> Thanks again Lyude!
> > 
> > [snip]
> > --
> > Cheers,
> >  Lyude Paul (she/her)
> >  Software Engineer at Red Hat
> --
> Regards,
> Wayne Lin
>
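
For readers following the disagreement in this thread, the two connector
policies under discussion can be sketched as small predicates in C. This is
a simplified illustration, not code from the patch: the enum, the struct and
the function names are hypothetical, and the set of peer device types
treated as an "end device" is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical peer device types; illustrative names, not the kernel's. */
enum demo_pdt {
	DEMO_PDT_NONE,        /* nothing plugged into the port */
	DEMO_PDT_MST_BRANCH,  /* another branch device, e.g. a hub's internal chip */
	DEMO_PDT_SST_SINK,    /* a stream sink such as an SST monitor */
	DEMO_PDT_LEGACY_CONV, /* DP-to-legacy converter */
};

/* Hypothetical snapshot of what the source knows about one port. */
struct demo_port_state {
	bool is_output; /* output port, as opposed to an input port */
	bool plugged;   /* plug status reported for the port */
	enum demo_pdt pdt;
};

/* Existing flow per the commit message: every output port gets a connector. */
bool demo_wants_connector_all_outputs(const struct demo_port_state *s)
{
	return s->is_output;
}

/* Proposed flow: only a connected end device gets a connector. */
bool demo_wants_connector_end_device_only(const struct demo_port_state *s)
{
	if (!s->is_output || !s->plugged)
		return false;
	return s->pdt == DEMO_PDT_SST_SINK || s->pdt == DEMO_PDT_LEGACY_CONV;
}
```

Under the first predicate, an internal output port that is hardwired to
another branch chip still gets a connector; under the second it does not,
but neither does an empty port, which is the case the objection about force
probing is concerned with.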
Lin, Wayne Oct. 15, 2021, 10:16 a.m. UTC | #17
[Public]

Thanks Lyude! And sorry for the late reply.
Honestly, I've also been struggling with other tasks, so I haven't gotten through your detailed elaboration yet.
I'd like to take some time to think it through : ) Anyway, I will respond ASAP.

Thanks again!

> -----Original Message-----
> From: Lyude Paul <lyude@redhat.com>
> Sent: Wednesday, October 13, 2021 5:17 AM
> To: Lin, Wayne <Wayne.Lin@amd.com>; dri-devel@lists.freedesktop.org
> Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected end device
>
> OK - got sidetracked by an issue at work but I just resumed working on this today, should hopefully have it done at the start of next week at
> the latest (hooray for having time to do things upstream again! :).
>
> On Tue, 2021-09-14 at 08:46 +0000, Lin, Wayne wrote:
> > [Public]
> >
> > > -----Original Message-----
> > > From: Lyude Paul <lyude@redhat.com>
> > > Sent: Thursday, September 2, 2021 6:00 AM
> > > To: Lin, Wayne <Wayne.Lin@amd.com>; dri-devel@lists.freedesktop.org
> > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for
> > > connected end device
> > >
> > > Actually - did some more thinking, and I think we shouldn't try to
> > > make changes like this until we actually know what the problem is
> > > here. I could try to figure out what the actual race conditions
> > > were that I was facing before when trying to add/destroy connectors
> > > based on PDT, but we still don't even have a clear idea of what's
> > > broken here.
> > > I'd much rather we figure out exactly how this leak is happening
> > > before considering making changes like this, because we have no way
> > > of knowing if we've properly fixed it or not if we don't know what
> > > the problem is in the first place.
> > >
> > > I'm still happy to write up the topology debugging stuff I mentioned
> > > to you if you think that would help you debug this issue - since
> > > that would make it a lot easier for you to track down what
> > > references are keeping a connector alive (and additionally, where
> > > those references were taken in code. thanks stack_depot!)
> > Hi Lyude,
> > Sorry for the late response. I've been a bit busy with other things
> > recently..
> >
> > Really, really thankful for all your help : ) I'd also be glad to have
> > the debugging tool if it won't bother you too much. But before
> > debugging, I need to reach consensus with you about where we expect to
> > release the resources allocated for a stream sink when it's reported
> > as disconnected. The previous patch suggests releasing resources when
> > the connector is destroyed, which happens when the topology refcount
> > reaches zero (i.e. when an mstb is unplugged from the topology). But
> > when we receive a CSN notifying us of a connection change, we
> > currently don't try to destroy the connector. And this is not caused
> > by a topology/malloc refcount leak, since I don't expect either of
> > them to drop to zero in this case (the topology of mstbs and ports is
> > unchanged). Hence, my plan was to also try to destroy the connector in
> > this case, for the reasons described in my previous mail.
> > With this patch set, I can see connectors eventually get successfully
> > destroyed after userspace commits set_crtc() to free the connectors
> > (although this also needs a fix for the connector refcount grabbed by
> > drm_client_modeset_probe() under a specific scenario).
> >
> > I think the main problem I encountered here is that I couldn't find a
> > place that notifies us to release the resources allocated for a
> > disconnected stream sink when we receive a CSN. If we decide not to
> > destroy the connector in this case, then I probably need some guidance
> > about where to do the release work.
> >
> > Thanks again Lyude!
> > >
> > > On Tue, 2021-08-31 at 18:47 -0400, Lyude Paul wrote:
> > > > (I am going to try responding to this tomorrow btw. I haven't been
> > > > super busy this week, but this has been a surprisingly difficult
> > > > email to respond to because I actually need to do a deep dive into
> > > > some of the MST helpers tomorrow to figure out more of the
> > > > specifics on why I realized we couldn't just hot add/remove
> > > > port->connector here).
> > > >
> > > > On Wed, 2021-08-25 at 03:35 +0000, Lin, Wayne wrote:
> > > > > [Public]
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Lyude Paul <lyude@redhat.com>
> > > > > > Sent: Tuesday, August 24, 2021 5:18 AM
> > > > > > To: Lin, Wayne <Wayne.Lin@amd.com>;
> > > > > > dri-devel@lists.freedesktop.org
> > > > > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for
> > > > > > connected end device
> > > > > >
> > > > > > [snip]
> > > > > >
> > > > > > I think I might still be misunderstanding something, some
> > > > > > comments below
> > > > > >
> > > > > > On Mon, 2021-08-23 at 06:33 +0000, Lin, Wayne wrote:
> > > > > > > > > Hi Lyude,
> > > > > > > > >
> > > > > > > > > > I'm really thankful that you're willing to explain in
> > > > > > > > > > such detail. I really appreciate it.
> > > > > > > > >
> > > > > > > > > > I'm trying to fix some problems that were observed
> > > > > > > > > > after these 2 patches
> > > > > > > > > * 09b974e8983 drm/amd/amdgpu_dm/mst: Remove
> > > > > > > > > ->destroy_connector() callback
> > > > > > > > > * 72dc0f51591 drm/dp_mst: Remove
> > > > > > > > > drm_dp_mst_topology_cbs.destroy_connector
> > > > > > > > >
> > > > > > > > > > With the above patches, we now remove the dc_sink when
> > > > > > > > > > the connector is about to be destroyed. However, we
> > > > > > > > > > found out that connectors don't get destroyed after
> > > > > > > > > > hotplugs. Thus, after a few hotplugs, we can't create
> > > > > > > > > > any new dc_sink because the number of sinks exceeds
> > > > > > > > > > our limit. As a result, I'm trying to figure out why
> > > > > > > > > > the refcount of connectors never reaches zero.
> > > > > > > > >
> > > > > > > > > Based on my analysis, I found out that if we connect a
> > > > > > > > > sst monitor to a mst hub then connect the hub to the
> > > > > > > > > system, and then unplug the sst monitor from the hub. E.g.
> > > > > > > > > src - mst hub - sst monitor => src - mst hub  (unplug)
> > > > > > > > > sst monitor
> > > > > > > > >
> > > > > > > > > Within this case, we won't try to put refcount of the
> > > > > > > > > sst monitor.
> > > > > > > > > Which is what I tried to resolve by [PATCH 3/4].
> > > > > > > > > But here comes a problem which is confusing me that if I
> > > > > > > > > can destroy connector in this case. By comparing to
> > > > > > > > > another case, if now mst hub is connected with a mst
> > > > > > > > > monitor like
> > > > > > > > > this:
> > > > > > > > > src - mst hub - mst monitor => src - mst hub  (unplug)
> > > > > > > > > mst monitor
> > > > > > > > >
> > > > > > > > > We will put the topology refcount of mst monitor's
> > > > > > > > > branching unit in and
> > > > > > > > > drm_dp_port_set_pdt() and eventually call
> > > > > > > > > drm_dp_delayed_destroy_port() to unregister the
> > > > > > > > > connector of the logical port. So following the same
> > > > > > > > > rule, I think to dynamically unregister a mst connector
> > > > > > > > > is what we want and should be reasonable to also destroy
> > > > > > > > > sst connectors in my case. But this conflicts the idea
> > > > > > > > > what we have here. We want to create connectors for all output ports.
> > > > > > > > > So if dynamically creating/destroying connectors is what
> > > > > > > > > we want, when is the appropriate time for us to create
> > > > > > > > > one is what I'm considering.
> > > > > > > > >
> > > > > > > > > > Take the StarTech hub DP 1to4 DP output ports for instance.
> > > > > > > > > This hub, internally, is constructed by  3 1-to-2 mst
> > > > > > > > > branch chips. 2 output ports of 1st chip are hardwired
> > > > > > > > > to another 2 chips. It's how it makes it to support 1-to-4 mst branching.
> > > > > > > > > So within this case, the internal
> > > > > > > > > 2 output ports of 1st chip is not connecting to a stream
> > > > > > > > > sink and will never get connected to one.  Thus, I'm
> > > > > > > > > thinking maybe the best timing to attach a connector to
> > > > > > > > > a port is when the port is connected, and the connected
> > > > > > > > > PDT is determined as a stream sink.
> > > > > > > > >
> > > > > > > > > Sorry if I misunderstand anything here and really thanks
> > > > > > > > > for your time to shed light on this : ) Thanks Lyude.
> > > > > > > >
> > > > > > > > It's no problem, it is my job after all! Sorry for how
> > > > > > > > long my responses have been taking, but my plate seems to
> > > > > > > > be finally clearing up for the foreseeable future.
> > > > > > > >
> > > > > > > > That being said - it sounds like with this we still aren't
> > > > > > > > actually clear on where the topology refcount leak is
> > > > > > > > happening - only when it's happening, which says to me
> > > > > > > > that's the issue we really need to be figuring out the
> > > > > > > > cause of as opposed to trying to workaround it.
> > > > > > > >
> > > > > > > > Actually - refcount leaks is an issue I've ran into a
> > > > > > > > number of times before in the past, so a while back I
> > > > > > > > actually added some nice debugging features to assist with debugging leaks.
> > > > > > > > If you enable the following options in your kernel config:
> > > > > > > >
> > > > > > > > > CONFIG_EXPERT=y # This must be set first before the next option
> > > > > > > > > CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS=y
> > > > > > > >
> > > > > > > > Unfortunately, I'm suddenly realizing after typing this
> > > > > > > > that apparently I never bothered adding a way for us to
> > > > > > > > debug the
> > > > >
> > > > > > > > refcounts of ports/mstbs that haven't been released yet -
> > > > > > > > only the ones for ones that have. This shouldn't be
> > > > > > > > difficult at all for me to add, so I'll send you a patch
> > > > > > > > either today or at the start of next week to try debugging
> > > > > > > > with using this, and then we can figure out where this leak is really coming from.
> > > > > > >
> > > > > > > Thanks Lyude!
> > > > > > >
> > > > > > > Sorry to bother you, but I would like to clarify this again.
> > > > > > > So it sounds
> > > > > >
> > > > > > It's no problem! It's my job and I'm happy to help :).
> > > > >
> > > > > Thanks!
> > > > > I would like to learn more from you as below : p
> > > > > >
> > > > > > > like you also agree that we should destroy associated
> > > > > > > connector
> > > > > >
> > > > > > Not quite. I think a better way of explaining this might be to
> > > > > > point out that the lifetime of an MST port and its connector
> > > > > > isn't supposed to be determined by whether or not it has
> > > > > > something plugged into it - its lifetime is supposed to depend
> > > > > > on whether there's a valid path from us down the MST topology
> > > > > > to the port we're trying to reach. So an MSTB with ports that
> > > > > > is unplugged would destroy all of its ports - but an unplugged
> > > > > > port should just be the same as a disconnected DRM connector -
> > > > > > even if the port itself is just hosting a branching device.
> > > > >
> > > > > This is the part a bit difficult to me. I treat DRM connector as
> > > > > the place where we associate with a stream sink. So if the
> > > > > statement is "All DP mst output ports are places we connect with
> > > > > stream sink", I would say false to this since I can find the
> > > > > negative example when output port is connected with mst branch
> > > > > device. Thus, looks like we could only determine whether to
> > > > > create a connector for an output port when the peer device type is known?
> > > > > >
> > > > > > Additionally - we don't want to try "delaying" connector
> > > > > > creation either.
> > > > > > In the modern world hotplugging is almost always reliable in
> > > > > > normal situations, but even so there's still use cases for
> > > > > > wanting force probing for analog devices on DP converters and
> > > > > > just in general as it's a feature commonly used by developers
> > > > > > or users working around monitors with problematic HPD issues or EDID issues.
> > > > >
> > > > > I think I understand that why we want to create connectors for
> > > > > all output ports here. But under these mentioned use cases,
> > > > > aren't we still capable to force connector to enable stream? MST
> > > > > > hub with multi-function capability, it will enumerate connected
> > > > > virtual DP peer device.
> > > > > For problematic HPD issues or EDID issues, their connection
> > > > > status is also connected.
> > > > >
> > > > > My understanding of output port is it is an internal node to
> > > > > help construct an end-to-end virtual channel between a stream
> > > > > source device and a stream sink device. Creating connectors for
> > > > > internal nodes within a virtual channel is a bit hard for me to get the idea.
> > > > > Please correct me if I misunderstand anything here. Thanks Lyude!
> > > > > >
> > > > > > > when we unplug sst monitor from a mst hub in the case that I
> > > > > > > described? In the case I described (unplug sst monitor), we
> > > > > > > only receive CSN from the hub that notifying us the
> > > > > > > connection status of one of its downstream output ports is
> > > > > > > changed to disconnected. There is no topology refcount
> > > > > > > needed to be decreased on this disconnected port but the malloc refcount.
> > > > > > > Since the output port is still declared by
> > > > > >
> > > > > > Apologies - I misunderstood your original mail as implying
> > > > > > that topology refcounts were being leaked - but it sounds like
> > > > > > it's actually malloc refcounts being leaked instead? In any
> > > > > > case - that means we're still tracing down a leak, just a malloc ref leak.
> > > > > >
> > > > > > But, this still doesn't totally make sense to me. Malloc refs
> > > > > > only keep the actual drm_dp_mst_port/drm_dp_mst_branch struct
> > > > > > alive in memory.
> > > > > > Nothing else is kept around, meaning the DRM connector (and I
> > > > > > assume by proxy, the dc_sink) should both be getting dropped
> > > > > > still and the only thing that should be leaked is a memory allocation.
> > > > > > These things should instead be dropped once there's no longer
> > > > > > any topology references around. So, are we _sure_ that the
> > > > > > problem here is a missing
> > > > > > drm_dp_mst_port_put_malloc() or drm_dp_mst_mstb_put_malloc()?
> > > > >
> > > > > Just my two cents, I don't think it's leak of malloc ref
> > > > > neither. As you said, malloc ref is dealing with the last step to free port/mstb.
> > > > > If there is still topology refcount on port/mstb in my case, we
> > > > > won't free port/mstb.
> > > > > >
> > > > > > If we are unfortunately we don't have equivalent tools for
> > > > > > malloc() tracing. I'm totally fine with trying to add some if
> > > > > > we have trouble figuring out this issue, but I'm a bit
> > > > > > suspicious of the commits you mentioned that introduced this
> > > > > > problem. If the problem doesn't happen until those two
> > > > > > commits, then it's something in the code changes there that are causing this problem.
> > > > >
> > > > > I think we probably also have the problem before these commits,
> > > > > but we didn't notice this before. Just when we change to clean
> > > > > up all things in dm_dp_mst_connector_destroy(), I start to try
> > > > > to figure out all these things out.
> > > > > >
> > > > > > The main thing I'm suspicious of just from looking at changes
> > > > > > in
> > > > > > 09b974e8983a4b163d4a406b46d50bf869da3073 is that the call to
> > > > > > amdgpu_dm_update_freesync_caps() that was previously in
> > > > > > dm_dp_destroy_mst_connector() appears to be dropped and not
> > > > > > re-added in (oh dear, this is a /very/ confusingly similar
> > > > > > function
> > > > >
> > > > > Lol. I also have hard time on this..
> > > > > > name!!!) dm_dp_mst_connector_destroy(). I don't remember if
> > > > > > this was intentional on my part, but does adding a call back
> > > > > > to
> > > > > > amdgpu_dm_update_freesync_caps() into
> > > > > > dm_dp_destroy_mst_connector() right before the
> > > > > > dc_link_remove_remote_sink() call fix anything?
> > > > > >
> > > > > > As well, I'm far less suspicious of this one but does
> > > > > > re-adding this
> > > > > > hunk:
> > > > > >
> > > > > >       aconnector->dc_sink = NULL;
> > > > > >       aconnector->dc_link->cur_link_settings.lane_count = 0;
> > > > > >
> > > > > > After dc_sink_release() fix anything either?
> > > > >
> > > > > So the main problem is we don't have chance to call
> > > > > dc_link_remove_remote_sink() in the unplugging SST case. We only
> > > > > have chance to remove the remote sink of a link when unplug a mstb.
> > > > > >
> > > > > > > the mst hub,  I think we shouldn't destroy the port.
> > > > > > > Actually, no ports nor mst branch devices should get
> > > > > > > destroyed in this case I think.
> > > > > > > The result of LINK_ADDRESS is still the same before/after
> > > > > > > removing the sst monitor except the
> > > > > > > DisplayPort_Device_Plug_Status/ Legacy_Device_Plug_Status.
> > > > > > >
> > > > > > > Hence, if you agree that we should put refcount of the
> > > > > > > connector of the disconnected port within the unplugging sst
> > > > > > > monitor case to release the allocated resource, it means we
> > > > > > > don't want to create connectors for those disconnected
> > > > > > > ports. Which conflicts current flow to create connectors for all declared output ports.
> > > > > > >
> > > > > > > Thanks again for your time Lyude!
> > > > > >
> > > > > > --
> > > > > > Cheers,
> > > > > >  Lyude Paul (she/her)
> > > > > >  Software Engineer at Red Hat
> > > > > --
> > > > > Regards,
> > > > > Wayne
> > > > >
> > > >
> > >
> > > --
> > > Cheers,
> > >  Lyude Paul (she/her)
> > >  Software Engineer at Red Hat
> > --
> > Regards,
> > Wayne Lin
> >
>
> --
> Cheers,
>  Lyude Paul (she/her)
>  Software Engineer at Red Hat
--
Regards,
Wayne Lin
Lin, Wayne Oct. 26, 2021, 3:50 a.m. UTC | #18
[Public]

Hi Lyude!
Apologies for the late reply, and thank you for explaining in such detail!
Here are some of my thoughts : )

> -----Original Message-----
> From: Lyude Paul <lyude@redhat.com>
> Sent: Saturday, September 18, 2021 1:48 AM
> To: Lin, Wayne <Wayne.Lin@amd.com>; dri-devel@lists.freedesktop.org
> Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Wentland, Harry <Harry.Wentland@amd.com>; Zuo, Jerry
> <Jerry.Zuo@amd.com>; Wu, Hersen <hersenxs.wu@amd.com>; Juston Li <juston.li@intel.com>; Imre Deak <imre.deak@intel.com>; Ville
> Syrjälä <ville.syrjala@linux.intel.com>; Daniel Vetter <daniel.vetter@ffwll.ch>; Sean Paul <sean@poorly.run>; Maarten Lankhorst
> <maarten.lankhorst@linux.intel.com>; Maxime Ripard <mripard@kernel.org>; Thomas Zimmermann <tzimmermann@suse.de>; David Airlie
> <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; Deucher, Alexander <Alexander.Deucher@amd.com>; Siqueira, Rodrigo
> <Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo <Aurabindo.Pillai@amd.com>; Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>; Cornij,
> Nikola <Nikola.Cornij@amd.com>; Jani Nikula <jani.nikula@intel.com>; Manasi Navare <manasi.d.navare@intel.com>; Ankit Nautiyal
> <ankit.k.nautiyal@intel.com>; José Roberto de Souza <jose.souza@intel.com>; Sean Paul <seanpaul@chromium.org>; Ben Skeggs
> <bskeggs@redhat.com>; stable@vger.kernel.org
> Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected end device
>
> Sorry about the slow response, this week XDC has been going on and I've been mostly paying attention to that.
>
> On Tue, 2021-09-14 at 08:46 +0000, Lin, Wayne wrote:
> > [Public]
> >
> > > -----Original Message-----
> > > From: Lyude Paul <lyude@redhat.com>
> > > Sent: Thursday, September 2, 2021 6:00 AM
> > > To: Lin, Wayne <Wayne.Lin@amd.com>; dri-devel@lists.freedesktop.org
> > > Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Wentland,
> > > Harry <Harry.Wentland@amd.com>; Zuo, Jerry <Jerry.Zuo@amd.com>; Wu,
> > > Hersen <hersenxs.wu@amd.com>; Juston Li <juston.li@intel.com>; Imre
> > > Deak <imre.deak@intel.com>; Ville Syrjälä
> > > <ville.syrjala@linux.intel.com>; Daniel Vetter
> > > <daniel.vetter@ffwll.ch>; Sean Paul <sean@poorly.run>; Maarten
> > > Lankhorst <maarten.lankhorst@linux.intel.com>; Maxime Ripard
> > > <mripard@kernel.org>; Thomas Zimmermann <tzimmermann@suse.de>; David
> > > Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; Deucher,
> > > Alexander <Alexander.Deucher@amd.com>; Siqueira, Rodrigo
> > > <Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo
> > > <Aurabindo.Pillai@amd.com>; Bas Nieuwenhuizen
> > > <bas@basnieuwenhuizen.nl>; Cornij, Nikola <Nikola.Cornij@amd.com>;
> > > Jani Nikula <jani.nikula@intel.com>; Manasi Navare
> > > <manasi.d.navare@intel.com>; Ankit Nautiyal
> > > <ankit.k.nautiyal@intel.com>; José Roberto de Souza
> > > <jose.souza@intel.com>; Sean Paul <seanpaul@chromium.org>; Ben
> > > Skeggs <bskeggs@redhat.com>; stable@vger.kernel.org
> > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for
> > > connected end device
> > >
> > > Actually - did some more thinking, and I think we shouldn't try to
> > > make changes like this until we actually know what the problem is
> > > here. I could try to figure out what the actual race conditions I
> > > was facing before with trying to add/destroy connectors based on
> > > PDT, but we still don't even actually have a clear idea of what's broken here.
> > > I'd much rather us figure out exactly how this leak is happening
> > > before considering making changes like this, because we have no way
> > > of knowing if we've properly fixed it or not if we don't know what
> > > the problem is in the first place.
> > >
> > > I'm still happy to write up the topology debugging stuff I mentioned
> > > to you if you think that would help you debug this issue - since
> > > that would make it a lot easier for you to track down what
> > > references are keeping a connector alive (and additionally, where
> > > those references were taken in code. thanks
> > > stack_depot!)
> > Hi Lyude,
> > Sorry for late response. A bit busy on other stuff recently..
> >
> > Really really thankful for all your help : ) I'm also glad to have the
> > debugging tool if it won’t bother you too much. But before debugging,
>
> no problem! I will get to it early next week then
>
> > I need to have consensus with you about where do we expect to release
> > resource allocated for a stream sink when it's reported as
> > disconnected. Previous patch suggests releasing resource when
> > connector is destroyed which will happen when topology refcount
> > reaches zero (i.e. unplug mstb from topology). But when the case is
> > receiving CSN notifying connection change, we don't try to destroy
> > connector in this case now. And this is not caused by topology/malloc
> > refcount leak since I don't expect neither one of them get decrease to
> > zero under this case (topology of mstbs and ports is not changed).
> > Hence, my plan was to also try to destroy connector under
>
> Ah - I wonder if this might have been where some of the confusion here came from. So - both mstbs and ports (assume I'm talking the actual
> drm_dp_mst_port and drm_dp_mst_branch structs here) are supposed to have non-zero topology refcounts as long as there is a valid path
> between the port or mstb, and our source. This also means that for ports, the drm_connector associated with these ports should stay
> around as long as the port is reachable from the sink - regardless of whether anything is actually plugged into the port or not.

This concept is the part that is a bit hard for me to get through.
I was thinking that we don't have to associate a drm connector with an MST port whenever the port exists, since an MST port is not always connected
to a stream sink. I treat an MST port as an intermediate node of a virtual channel, which is an end-to-end direct virtual connection between a
stream source and a stream sink. A virtual channel can be built from multiple links, and the stream sink is connected at the end port. Hence,
I was thinking of associating a drm connector with the end stream sink only.
I think we probably wouldn't want to attach a connector to a relay/retimer/redriver within a stream path? I see an MST port as playing a similar role when
it is hardwired to an MST branch device.

I think this is a bit different from the SST case. For legacy DP (before DP 1.2), we can attach a connector to the physical end output port since it's dedicated
to a stream sink. An MST port is not. However, I would understand if there is an implementation requirement for us to associate a drm connector
with every MST port.
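
To make the idea above concrete, here is a minimal userspace sketch of the policy I have in mind: a connector is wanted only for an output port that is plugged and whose peer is an end device. The enum and the function names are made up for this mail and are not the helper's API; which peer device types count as an "end device" is also my assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative peer device types; names are local to this sketch. */
enum peer_device_type {
	PDT_NONE,
	PDT_SOURCE_OR_SST,
	PDT_MST_BRANCHING,
	PDT_SST_SINK,
	PDT_DP_LEGACY_CONV,
};

/*
 * Hypothetical policy: attach a connector only to an output port that is
 * plugged and whose peer is a stream sink (or a legacy converter).
 */
static bool port_wants_connector(bool is_input, bool plugged,
				 enum peer_device_type pdt)
{
	if (is_input || !plugged)
		return false;

	switch (pdt) {
	case PDT_SST_SINK:
	case PDT_DP_LEGACY_CONV:
		return true;
	default:
		return false;
	}
}

/* The 1-to-4 hub case: internal ports feed branch chips, outer ports feed sinks. */
static void port_policy_examples(void)
{
	assert(!port_wants_connector(false, true, PDT_MST_BRANCHING));
	assert(port_wants_connector(false, true, PDT_SST_SINK));
	assert(!port_wants_connector(false, false, PDT_NONE));
}
```

With this policy the two hardwired internal ports of the first chip never get a connector, while each outer port gets one only once a sink is reported on it. The sketch leaves out the force-probing use case that Lyude mentioned.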

>
> So - a CSN on its own shouldn't really get rid of the port it was notifying us about. But if that CSN results in an MSTB -with- its own ports
> being removed, this would mean there would no longer be a valid path between our source and the ports on said MSTB and as such - the
> connector for each one of those ports is removed from the topology. Remember however, when I say "removed from the topology" what
> I'm referring to is the fact that the MST helpers have dropped the main topology reference for a given mstb or port.
> Since various MST helpers retrieve temporary topology references to connectors they work on in order to simplify handling I/O errors, the
> operations from those helpers would potentially keep the port or mstb around in the topology until those helpers have had a chance to
> abort and drop their refs. And then once all the topology references are released, a destruction worker gets scheduled which handles
> unregistering the drm_connector (not destroying it).
> The drm_connector stays around unregistered, up until the point at which all malloc references to the drm_dp_mst_port have been
> released.
>
> I think it may also be worth clarifying the lifetime of drm_connector itself here as well, since that also actually has a refcount. Basically, as
> long as userspace has a mode committed which references a drm_connector - that drm_connector will still exist in memory, and its mode
> object ID will remain valid. This means if we were to have a MST topology hooked up with one display turned on and then suddenly
> unplugged it, keeping in mind that the port with said display now becomes inaccessible from the topology, the drm_connector associated
> with that display would continue to have a valid mode object ID up until the point at which userspace has committed a new mode which
> disables it.
> The sysfs paths for the connector however, will disappear immediately once the connector is unregistered so as to ensure that userspace
> applications cannot try to reuse it later or attempt to reprobe it.
>
> Any resource releases beyond this (streams on the driver side, for instance) are up to the driver, but typically I would expect them to happen
> in the same places as they would with an SST connector. Does that answer your question?
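
To check that I read the lifetime rules above correctly, here is a small userspace model of the two refcounts. It is only a sketch: the struct, its fields, and the function names are made up for this mail, and the real helpers do more work (locking, the delayed destroy worker, the drm_connector's own refcount) than is shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; field and function names are made up for this mail. */
struct lifetime_port {
	int topology_refs;          /* > 0 while a valid path from the source exists */
	int malloc_refs;            /* > 0 while the struct memory must stay valid */
	bool plugged;               /* plug status last reported for the port */
	bool connector_registered;  /* visible to userspace (sysfs) */
	bool freed;                 /* stands in for freeing the port struct */
};

static void lifetime_port_init(struct lifetime_port *p)
{
	p->topology_refs = 1;
	p->malloc_refs = 1;
	p->plugged = true;
	p->connector_registered = true;
	p->freed = false;
}

static void lifetime_port_get_malloc(struct lifetime_port *p)
{
	assert(!p->freed);
	p->malloc_refs++;
}

static void lifetime_port_put_malloc(struct lifetime_port *p)
{
	assert(p->malloc_refs > 0);
	if (--p->malloc_refs == 0)
		p->freed = true;
}

static void lifetime_port_put_topology(struct lifetime_port *p)
{
	assert(p->topology_refs > 0);
	if (--p->topology_refs == 0) {
		/* Destruction work: unregister, then drop the initial malloc ref. */
		p->connector_registered = false;
		lifetime_port_put_malloc(p);
	}
}

/* A CSN that only reports "unplugged" touches neither refcount. */
static void lifetime_port_csn_unplugged(struct lifetime_port *p)
{
	p->plugged = false;
}
```

In this model, unplugging a sink changes only the plug status, so the connector stays registered. Only losing the path to the port (the last topology reference) unregisters it, and the memory goes away once the last malloc reference is dropped.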

The unplug events of an SST sink and an MST remote sink are a bit different. An SST unplug event relies on a long HPD IRQ, but an MST CSN relies on a short HPD IRQ.
We currently use the MST helper function drm_dp_mst_handle_conn_stat() to deal with the CSN short HPD IRQ. But within this function, the driver doesn't
get notified of the disconnection event, so it cannot release the associated allocated resources. So, if we leave the drm connector association logic
unchanged, should we add a new callback function here?
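
As a rough illustration of what I mean by a callback, here is a userspace sketch. Everything in it is hypothetical: port_disconnected is not an existing helper callback as far as I know, and the model_* types only stand in for the manager, the port, and the driver-side sink resource.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_port {
	bool plugged;        /* last plug status reported by a CSN */
	int sink_resources;  /* stands in for driver state such as a dc_sink */
};

struct model_mgr_cbs {
	/* Hypothetical hook, called when a port goes from plugged to unplugged. */
	void (*port_disconnected)(struct model_port *port);
};

struct model_mgr {
	const struct model_mgr_cbs *cbs;
};

/* Models the CSN path: record the new status, notify the driver on unplug. */
static void model_handle_conn_stat(struct model_mgr *mgr,
				   struct model_port *port, bool plugged)
{
	bool was_plugged;

	assert(mgr != NULL && port != NULL);

	was_plugged = port->plugged;
	port->plugged = plugged;

	if (was_plugged && !plugged && mgr->cbs && mgr->cbs->port_disconnected)
		mgr->cbs->port_disconnected(port);
}

/* Example driver-side hook: release what was allocated for the sink. */
static void model_driver_release_sink(struct model_port *port)
{
	port->sink_resources = 0;
}
```

The port and its connector are left alone here. Only the driver-side resource is released, which would keep the current rule that a connector lives as long as its port is reachable.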

Sorry Lyude, I don't understand this as well as you do and would like to learn more from you. Please correct me if I misunderstand anything
here. Much appreciated!

>
> > this case and the reason is reasonable to me as described in previous mail.
> > With this patch set, I can see connectors eventually get successfully
> > destroyed after userspace committing set_crtc() to free connectors
> > (although also need a fix on the connector refcount grabbed by
> > drm_client_modeset_probe() under specific scenario).
> >
> > I think the main problem I encountered here is that I couldn't find a
> > place that notify us to release resource allocated for a disconnected
> > stream sink when receive CSN. If we decide not to destroy connector
> > under this case, then I probably need some guidance about where to do
> > the release work.
> >
> > Thanks again Lyude!
> > >
> > > On Tue, 2021-08-31 at 18:47 -0400, Lyude Paul wrote:
> > > > (I am going to try responding to this tomorrow btw. I haven't been
> > > > super busy this week, but this has been a surprisingly difficult
> > > > email to respond to because I actually need to do a deep
> > > > dive into some of the MST helpers tomorrow to figure out more of the
> > > > specifics on why I realized we couldn't just hot add/remove port->connector here).
> > > >
> > > > On Wed, 2021-08-25 at 03:35 +0000, Lin, Wayne wrote:
> > > > > [Public]
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Lyude Paul <lyude@redhat.com>
> > > > > > Sent: Tuesday, August 24, 2021 5:18 AM
> > > > > > To: Lin, Wayne <Wayne.Lin@amd.com>;
> > > > > > dri-devel@lists.freedesktop.org
> > > > > > Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>;
> > > > > > Wentland, Harry <Harry.Wentland@amd.com>; Zuo, Jerry
> > > > > > <Jerry.Zuo@amd.com>; Wu, Hersen <hersenxs.wu@amd.com>; Juston
> > > > > > Li <juston.li@intel.com>; Imre Deak <imre.deak@intel.com>;
> > > > > > Ville Syrjälä <ville.syrjala@linux.intel.com>; Daniel Vetter
> > > > > > <daniel.vetter@ffwll.ch>; Sean Paul <sean@poorly.run>; Maarten
> > > > > > Lankhorst <maarten.lankhorst@linux.intel.com>; Maxime Ripard
> > > > > > <mripard@kernel.org>; Thomas Zimmermann <tzimmermann@suse.de>;
> > > > > > David Airlie <airlied@linux.ie>; Daniel Vetter
> > > > > > <daniel@ffwll.ch>; Deucher, Alexander
> > > > > > <Alexander.Deucher@amd.com>; Siqueira, Rodrigo
> > > > > > <Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo
> > > > > > <Aurabindo.Pillai@amd.com>; Bas Nieuwenhuizen
> > > > > > <bas@basnieuwenhuizen.nl>; Cornij, Nikola
> > > > > > <Nikola.Cornij@amd.com>; Jani Nikula <jani.nikula@intel.com>;
> > > > > > Manasi Navare <manasi.d.navare@intel.com>; Ankit Nautiyal
> > > > > > <ankit.k.nautiyal@intel.com>; José Roberto de Souza
> > > > > > <jose.souza@intel.com>; Sean Paul <seanpaul@chromium.org>; Ben
> > > > > > Skeggs <bskeggs@redhat.com>; stable@vger.kernel.org
> > > > > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for
> > > > > > connected end device
> > > > > >
> > > > > > [snip]
> > > > > >
> > > > > > I think I might still be misunderstanding something, some
> > > > > > comments below
> > > > > >
> > > > > > On Mon, 2021-08-23 at 06:33 +0000, Lin, Wayne wrote:
> > > > > > > > > Hi Lyude,
> > > > > > > > >
> > > > > > > > > Really thankful for willing to explain in such details.
> > > > > > > > > Really appreciate.
> > > > > > > > >
> > > > > > > > > I'm trying to fix some problems that observed after
> > > > > > > > > these 2 patches
> > > > > > > > > * 09b974e8983 drm/amd/amdgpu_dm/mst: Remove
> > > > > > > > > ->destroy_connector() callback
> > > > > > > > > * 72dc0f51591 drm/dp_mst: Remove
> > > > > > > > > drm_dp_mst_topology_cbs.destroy_connector
> > > > > > > > >
> > > > > > > > > With above patches, we now change to remove dc_sink when
> > > > > > > > > connector is about to be destroyed. However, we found
> > > > > > > > > out that connectors won't get destroyed after hotplugs.
> > > > > > > > > Thus, after few times hotplugs, we won't create any new
> > > > > > > > > dc_sink since number of sink is exceeding our
> > > > > > > > > limitation. As the result of that, I'm trying to figure
> > > > > > > > > out why the refcount of connectors won't get zero.
> > > > > > > > >
> > > > > > > > > Based on my analysis, I found out that if we connect a
> > > > > > > > > sst monitor to a mst hub then connect the hub to the
> > > > > > > > > system, and then unplug the sst monitor from the hub. E.g.
> > > > > > > > > src - mst hub - sst monitor => src - mst hub  (unplug)
> > > > > > > > > sst monitor
> > > > > > > > >
> > > > > > > > > Within this case, we won't try to put refcount of the
> > > > > > > > > sst monitor.
> > > > > > > > > Which is what I tried to resolve by [PATCH 3/4].
> > > > > > > > > But here comes a problem which is confusing me that if I
> > > > > > > > > can destroy connector in this case. By comparing to
> > > > > > > > > another case, if now mst hub is connected with a mst monitor like this:
> > > > > > > > > src - mst hub - mst monitor => src - mst hub  (unplug)
> > > > > > > > > mst monitor
> > > > > > > > >
> > > > > > > > > We will put the topology refcount of mst monitor's
> > > > > > > > > branching unit in and
> > > > > > > > > drm_dp_port_set_pdt() and eventually call
> > > > > > > > > drm_dp_delayed_destroy_port() to unregister the
> > > > > > > > > connector of the logical port. So following the same
> > > > > > > > > rule, I think to dynamically unregister a mst connector
> > > > > > > > > is what we want and should be reasonable to also destroy
> > > > > > > > > sst connectors in my case. But this conflicts the idea
> > > > > > > > > what we have here. We want to create connectors for all output ports.
> > > > > > > > > So if dynamically creating/destroying connectors is what
> > > > > > > > > we want, when is the appropriate time for us to create
> > > > > > > > > one is what I'm considering.
> > > > > > > > >
> > > > > > > > > > Take the StarTech hub DP 1to4 DP output ports for instance.
> > > > > > > > > This hub, internally, is constructed by  3 1-to-2 mst
> > > > > > > > > branch chips. 2 output ports of 1st chip are hardwired
> > > > > > > > > to another 2 chips. It's how it makes it to support 1-to-4 mst branching.
> > > > > > > > > So within this case, the internal
> > > > > > > > > 2 output ports of 1st chip is not connecting to a stream
> > > > > > > > > sink and will never get connected to one.  Thus, I'm
> > > > > > > > > thinking maybe the best timing to attach a connector to
> > > > > > > > > a port is when the port is connected, and the connected
> > > > > > > > > PDT is determined as a stream sink.
> > > > > > > > >
> > > > > > > > > Sorry if I misunderstand anything here and really thanks
> > > > > > > > > for your time to shed light on this : ) Thanks Lyude.
> > > > > > > >
> > > > > > > > It's no problem, it is my job after all! Sorry for how
> > > > > > > > long my responses have been taking, but my plate seems to
> > > > > > > > be finally clearing up for the foreseeable future.
> > > > > > > >
> > > > > > > > That being said - it sounds like with this we still aren't
> > > > > > > > actually clear on where the topology refcount leak is
> > > > > > > > happening - only when it's happening, which says to me
> > > > > > > > that's the issue we really need to be figuring out the
> > > > > > > > cause of as opposed to trying to workaround it.
> > > > > > > >
> > > > > > > > Actually - refcount leaks is an issue I've ran into a
> > > > > > > > number of times before in the past, so a while back I
> > > > > > > > actually added some nice debugging features to assist with debugging leaks.
> > > > > > > > If you enable the following options in your kernel config:
> > > > > > > >
> > > > > > > > > CONFIG_EXPERT=y # This must be set first before the next option
> > > > > > > > > CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS=y
> > > > > > > >
> > > > > > > > Unfortunately, I'm suddenly realizing after typing this
> > > > > > > > that apparently I never bothered adding a way for us to
> > > > > > > > debug the
> > > > >
> > > > > > > > refcounts of ports/mstbs that haven't been released yet -
> > > > > > > > only the ones for ones that have. This shouldn't be
> > > > > > > > difficult at all for me to add, so I'll send you a patch
> > > > > > > > either today or at the start of next week to try debugging
> > > > > > > > with using this, and then we can figure out where this leak is really coming from.
> > > > > > >
> > > > > > > Thanks Lyude!
> > > > > > >
> > > > > > > Sorry to bother you, but I would like to clarify this again.
> > > > > > > So it sounds
> > > > > >
> > > > > > It's no problem! It's my job and I'm happy to help :).
> > > > >
> > > > > Thanks!
> > > > > I would like to learn more from you as below : p
> > > > > >
> > > > > > > like you also agree that we should destroy associated
> > > > > > > connector
> > > > > >
> > > > > > Not quite. I think a better way of explaining this might be to
> > > > > > point out that the lifetime of an MST port and its connector
> > > > > > isn't supposed to be determined by whether or not it has
> > > > > > something plugged into it - its lifetime is supposed to depend
> > > > > > on whether there's a valid path from us down the MST topology
> > > > > > to the port we're trying to reach. So an MSTB with ports that
> > > > > > is unplugged would destroy all of its ports - but an unplugged
> > > > > > port should just be the same as a disconnected DRM connector -
> > > > > > even if the port itself is just hosting a branching device.
> > > > >
> > > > > This is the part a bit difficult to me. I treat DRM connector as
> > > > > the place where we associate with a stream sink. So if the
> > > > > statement is "All DP mst output ports are places we connect with
> > > > > stream sink", I would say false to this since I can find the
> > > > > negative example when output port is connected with mst branch
> > > > > device. Thus, looks like we could only determine whether to
> > > > > create a connector for an output port when the peer device type is known?
> > > > > >
> > > > > > Additionally - we don't want to try "delaying" connector
> > > > > > creation either.
> > > > > > In the modern world hotplugging is almost always reliable in
> > > > > > normal situations, but even so there's still use cases for
> > > > > > wanting force probing for analog devices on DP converters and
> > > > > > just in general as it's a feature commonly used by developers
> > > > > > or users working around monitors with problematic HPD issues or EDID issues.
> > > > >
> > > > > I think I understand that why we want to create connectors for
> > > > > all output ports here. But under these mentioned use cases,
> > > > > > aren't we still able to force the connector to enable a stream? An
> > > > > > MST hub with multi-function capability will enumerate a connected
> > > > > > virtual DP peer device.
> > > > > For problematic HPD issues or EDID issues, their connection
> > > > > status is also connected.
> > > > >
> > > > > My understanding of output port is it is an internal node to
> > > > > help construct an end-to-end virtual channel between a stream
> > > > > source device and a stream sink device. Creating connectors for
> > > > > internal nodes within a virtual channel is a bit hard for me to get the idea.
> > > > > Please correct me if I misunderstand anything here. Thanks Lyude!
> > > > > >
> > > > > > > when we unplug sst monitor from a mst hub in the case that I
> > > > > > > described? In the case I described (unplug sst monitor), we
> > > > > > > only receive CSN from the hub that notifying us the
> > > > > > > connection status of one of its downstream output ports is
> > > > > > > changed to disconnected. There is no topology refcount
> > > > > > > needed to be decreased on this disconnected port but the malloc refcount.
> > > > > > > Since the output port is still declared by
> > > > > >
> > > > > > Apologies - I misunderstood your original mail as implying
> > > > > > that topology refcounts were being leaked - but it sounds like
> > > > > > it's actually malloc refcounts being leaked instead? In any
> > > > > > case - that means we're still tracing down a leak, just a malloc ref leak.
> > > > > >
> > > > > > But, this still doesn't totally make sense to me. Malloc refs
> > > > > > only keep the actual drm_dp_mst_port/drm_dp_mst_branch struct
> > > > > > alive in memory.
> > > > > > Nothing else is kept around, meaning the DRM connector (and I
> > > > > > assume by proxy, the dc_sink) should both be getting dropped
> > > > > > still and the only thing that should be leaked is a memory allocation.
> > > > > > These things should instead be dropped once there's no longer
> > > > > > any topology references around. So, are we _sure_ that the
> > > > > > problem here is a missing
> > > > > > drm_dp_mst_port_put_malloc() or drm_dp_mst_mstb_put_malloc()?
> > > > >
> > > > > Just my two cents, I don't think it's leak of malloc ref
> > > > > neither. As you said, malloc ref is dealing with the last step to free port/mstb.
> > > > > If there is still topology refcount on port/mstb in my case, we
> > > > > won't free port/mstb.
> > > > > >
> > > > > > > If we are, unfortunately we don't have equivalent tools for
> > > > > > malloc() tracing. I'm totally fine with trying to add some if
> > > > > > we have trouble figuring out this issue, but I'm a bit
> > > > > > suspicious of the commits you mentioned that introduced this
> > > > > > problem. If the problem doesn't happen until those two
> > > > > > commits, then it's something in the code changes there that are causing this problem.
> > > > >
> > > > > I think we probably also have the problem before these commits,
> > > > > but we didn't notice this before. Just when we change to clean
> > > > > up all things in dm_dp_mst_connector_destroy(), I start to try
> > > > > to figure out all these things out.
> > > > > >
> > > > > > The main thing I'm suspicious of just from looking at changes
> > > > > > in
> > > > > > 09b974e8983a4b163d4a406b46d50bf869da3073 is that the call to
> > > > > > amdgpu_dm_update_freesync_caps() that was previously in
> > > > > > dm_dp_destroy_mst_connector() appears to be dropped and not
> > > > > > re-added in (oh dear, this is a /very/ confusingly similar
> > > > > > function
> > > > >
> > > > > Lol. I also have hard time on this..
> > > > > > name!!!) dm_dp_mst_connector_destroy(). I don't remember if
> > > > > > this was intentional on my part, but does adding a call back
> > > > > > to
> > > > > > amdgpu_dm_update_freesync_caps() into
> > > > > > dm_dp_destroy_mst_connector() right before the
> > > > > > dc_link_remove_remote_sink() call fix anything?
> > > > > >
> > > > > > As well, I'm far less suspicious of this one but does
> > > > > > re-adding this
> > > > > > hunk:
> > > > > >
> > > > > >       aconnector->dc_sink = NULL;
> > > > > >       aconnector->dc_link->cur_link_settings.lane_count = 0;
> > > > > >
> > > > > > After dc_sink_release() fix anything either?
> > > > >
> > > > > So the main problem is we don't have chance to call
> > > > > dc_link_remove_remote_sink() in the unplugging SST case. We only
> > > > > have chance to remove the remote sink of a link when unplug a mstb.
> > > > > >
> > > > > > > the mst hub,  I think we shouldn't destroy the port.
> > > > > > > Actually, no ports nor mst branch devices should get
> > > > > > > destroyed in this case I think.
> > > > > > > The result of LINK_ADDRESS is still the same before/after
> > > > > > > removing the sst monitor except the
> > > > > > > DisplayPort_Device_Plug_Status/ Legacy_Device_Plug_Status.
> > > > > > >
> > > > > > > Hence, if you agree that we should put refcount of the
> > > > > > > connector of the disconnected port within the unplugging sst
> > > > > > > monitor case to release the allocated resource, it means we
> > > > > > > don't want to create connectors for those disconnected
> > > > > > > ports. Which conflicts current flow to create connectors for all declared output ports.
> > > > > > >
> > > > > > > Thanks again for your time Lyude!
> > > > > >
> > > > > > --
> > > > > > Cheers,
> > > > > >  Lyude Paul (she/her)
> > > > > >  Software Engineer at Red Hat
> > > > > --
> > > > > Regards,
> > > > > Wayne
> > > > >
> > > >
> > >
> > > --
> > > Cheers,
> > >  Lyude Paul (she/her)
> > >  Software Engineer at Red Hat
> > --
> > Regards,
> > Wayne Lin
> >
>
> --
> Cheers,
>  Lyude Paul (she/her)
>  Software Engineer at Red Hat

--
Regards,
Wayne Lin
Lyude Paul Oct. 26, 2021, 7:34 p.m. UTC | #19
Comments below

On Tue, 2021-10-26 at 03:50 +0000, Lin, Wayne wrote:
> [Public]
> 
> Hi Lyude!
> Apologize for replying late and really thanks for elaborating in such
> details!
> Following are some of my thoughts : )
> 
> > -----Original Message-----
> > From: Lyude Paul <lyude@redhat.com>
> > Sent: Saturday, September 18, 2021 1:48 AM
> > To: Lin, Wayne <Wayne.Lin@amd.com>; dri-devel@lists.freedesktop.org
> > Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Wentland, Harry
> > <Harry.Wentland@amd.com>; Zuo, Jerry
> > <Jerry.Zuo@amd.com>; Wu, Hersen <hersenxs.wu@amd.com>; Juston Li
> > <juston.li@intel.com>; Imre Deak <imre.deak@intel.com>; Ville
> > Syrjälä <ville.syrjala@linux.intel.com>; Daniel Vetter
> > <daniel.vetter@ffwll.ch>; Sean Paul <sean@poorly.run>; Maarten Lankhorst
> > <maarten.lankhorst@linux.intel.com>; Maxime Ripard <mripard@kernel.org>;
> > Thomas Zimmermann <tzimmermann@suse.de>; David Airlie
> > <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; Deucher, Alexander
> > <Alexander.Deucher@amd.com>; Siqueira, Rodrigo
> > <Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo <Aurabindo.Pillai@amd.com>;
> > Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>; Cornij,
> > Nikola <Nikola.Cornij@amd.com>; Jani Nikula <jani.nikula@intel.com>;
> > Manasi Navare <manasi.d.navare@intel.com>; Ankit Nautiyal
> > <ankit.k.nautiyal@intel.com>; José Roberto de Souza
> > <jose.souza@intel.com>; Sean Paul <seanpaul@chromium.org>; Ben Skeggs
> > <bskeggs@redhat.com>; stable@vger.kernel.org
> > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected
> > end device
> > 
> > Sorry about the slow response, this week XDC has been going on and I've
> > been mostly paying attention to that.
> > 
> > On Tue, 2021-09-14 at 08:46 +0000, Lin, Wayne wrote:
> > > [Public]
> > > 
> > > > -----Original Message-----
> > > > From: Lyude Paul <lyude@redhat.com>
> > > > Sent: Thursday, September 2, 2021 6:00 AM
> > > > To: Lin, Wayne <Wayne.Lin@amd.com>; dri-devel@lists.freedesktop.org
> > > > Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Wentland,
> > > > Harry <Harry.Wentland@amd.com>; Zuo, Jerry <Jerry.Zuo@amd.com>; Wu,
> > > > Hersen <hersenxs.wu@amd.com>; Juston Li <juston.li@intel.com>; Imre
> > > > Deak <imre.deak@intel.com>; Ville Syrjälä
> > > > <ville.syrjala@linux.intel.com>; Daniel Vetter
> > > > <daniel.vetter@ffwll.ch>; Sean Paul <sean@poorly.run>; Maarten
> > > > Lankhorst <maarten.lankhorst@linux.intel.com>; Maxime Ripard
> > > > <mripard@kernel.org>; Thomas Zimmermann <tzimmermann@suse.de>; David
> > > > Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; Deucher,
> > > > Alexander <Alexander.Deucher@amd.com>; Siqueira, Rodrigo
> > > > <Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo
> > > > <Aurabindo.Pillai@amd.com>; Bas Nieuwenhuizen
> > > > <bas@basnieuwenhuizen.nl>; Cornij, Nikola <Nikola.Cornij@amd.com>;
> > > > Jani Nikula <jani.nikula@intel.com>; Manasi Navare
> > > > <manasi.d.navare@intel.com>; Ankit Nautiyal
> > > > <ankit.k.nautiyal@intel.com>; José Roberto de Souza
> > > > <jose.souza@intel.com>; Sean Paul <seanpaul@chromium.org>; Ben
> > > > Skeggs <bskeggs@redhat.com>; stable@vger.kernel.org
> > > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for
> > > > connected end device
> > > > 
> > > > Actually - did some more thinking, and I think we shouldn't try to
> > > > make changes like this until we actually know what the problem is
> > > > here. I could try to figure out what the actual race conditions I
> > > > was facing before with trying to add/destroy connectors based on
> > > > PDT, but we still don't even actually have a clear idea of what's
> > > > broken here.
> > > > I'd much rather us figure out exactly how this leak is happening
> > > > before considering making changes like this, because we have no way
> > > > of knowing if we've properly fixed it or not if we don't know what
> > > > the problem is in the first place.
> > > > 
> > > > I'm still happy to write up the topology debugging stuff I mentioned
> > > > to you if you think that would help you debug this issue - since
> > > > that would make it a lot easier for you to track down what
> > > > references are keeping a connector alive (and additionally, where
> > > > those references were taken in code. thanks
> > > > stack_depot!)
> > > Hi Lyude,
> > > Sorry for late response. A bit busy on other stuff recently..
> > > 
> > > Really really thankful for all your help : ) I'm also glad to have the
> > > debugging tool if it won’t bother you too much. But before debugging,
> > 
> > no problem! I will get to it early next week then
> > 
> > > I need to have consensus with you about where do we expect to release
> > > resource allocated for a stream sink when it's reported as
> > > disconnected. Previous patch suggests releasing resource when
> > > connector is destroyed which will happen when topology refcount
> > > reaches zero (i.e. unplug mstb from topology). But when the case is
> > > receiving CSN notifying connection change, we don't try to destroy
> > > connector in this case now. And this is not caused by topology/malloc
> > > refcount leak since I don't expect neither one of them get decrease to
> > > zero under this case (topology of mstbs and ports is not changed).
> > > Hence, my plan was to also try to destroy connector under
> > 
> > Ah - I wonder if this might have been where some of the confusion here
> > came from. So - both mstbs and ports (assume I'm talking about the actual
> > drm_dp_mst_port and drm_dp_mst_branch structs here) are supposed to have
> > non-zero topology refcounts as long as there is a valid path
> > between the port or mstb, and our source. This also means that for ports,
> > the drm_connector associated with these ports should stay
> > around as long as the port is reachable from the sink
> > - regardless of whether anything is actually plugged into the port or not.
> 
> This concept is the place where a bit hard for me to get through.
> I was thinking that we don’t have to associate a drm connector with a MST
> port whenever the port exists, since MST port is not always connected
> to a stream sink. I treat MST port as an intermediate node of a virtual
> channel, which is an end-to-end direct virtual connection between a
> stream source and a stream sink. Virtual channel could be constructed by
> multiple link count and stream sink is connected at end port. Hence,

Just to clarify I'm understanding you correctly, when you say multiple link
count do you just mean VCPI slots or do you mean two separate DP links? I
haven't dealt with the former, but IIRC that is how certain high resolutions
displays are handled over TB correct?

Regardless though, I would think that we could just handle this mostly from
the atomic state even with a connector for every port. For instance, i915
already has something called "big joiner" for supporting display
configurations where one display can take up two separate display pipes
(CRTCs). We could likely do something similar but with connectors if we end up
having to deal with a display driven by two DP links.

> I was thinking to associate a drm connector for end stream sink only.
> I think we probably won't want to attach a connector to a
> relay/retimer/redriver within a stream path? I treat MST port as the similar
> role when it's fixed to connect to a MST branch device.

If it's a fixed connection, it might actually be OK to avoid attaching a
connector to it. Currently with input ports, where we know we can never
receive a CSN for them during runtime, we're able to avoid creating a
connector because no potential for a CSN during runtime means the only
possible time an input port could transition would be suspend/resume. So if
we detect we're on another topology where something that was previously an
output port is now an input port, we get rid of the connector by removing the
drm_dp_mst_port it's associated with from the topology and replacing it with a
new one. This works pretty well, as it avoids doing any actual connector
destruction from the suspend/resume codepath and ensures that any pointer
references to the now non-existent output port remain valid for as long as
needed. So I might actually be open to expanding this for fixed connections
like relays, retimers and redrivers if we handle things in a similar manner.
For anything that can receive a CSN though, a drm_connector is unconditionally
needed even if nothing's connected.

> 
> I think it's a bit different to SST case. For legacy DP (before DP 1.2), we
> can attach a connector to its physical end output port since it's dedicated
> for a stream sink. But MST port is not. However, I understand if there is
> any implementation requirement for us to associate drm connector
> for all MST ports.
> 
> > 
> > So - a CSN on its own shouldn't really get rid of the port it was
> > notifying us about. But if that CSN results in an MSTB -with- its own
> > ports
> > being removed, this would mean there would no longer be a valid path
> > between our source and the ports on said MSTB and as such - the
> > connector for each one of those ports is removed from the topology.
> > Remember however, when I say "removed from the topology" what
> > I'm referring to is the fact that the MST helpers have dropped the main
> > topology reference for a given mstb or port.
> > Since various MST helpers retrieve temporary topology references to
> > connectors they work on in order to simplify handling I/O errors, the
> > operations from those helpers would potentially keep the port or mstb
> > around in the topology until those helpers have had a chance to
> > abort and drop their refs. And then once all the topology references are
> > released, a destruction worker gets scheduled which handles
> > unregistering the drm_connector (not destroying it).
> > The drm_connector stays around unregistered, up until the point at which
> > all malloc references to the drm_dp_mst_port have been
> > released.
> > 
> > I think it may also be worth clarifying the lifetime of drm_connector
> > itself here as well, since that also actually has a refcount. Basically,
> > as
> > long as userspace has a mode committed which references a drm_connector -
> > that drm_connector will still exist in memory, and its mode
> > object ID will remain valid. This means if we were to have a MST topology
> > hooked up with one display turned on and then suddenly
> > unplugged it, keeping in mind that the port with said display now becomes
> > inaccessible from the topology, the drm_connector associated
> > with that display would continue to have a valid mode object ID up until
> > the point at which userspace has committed a new mode which
> > disables it.
> > The sysfs paths for the connector however, will disappear immediately once
> > the connector is unregistered so as to ensure that userspace
> > applications cannot try to reuse it later or attempt to reprobe it.
> > 
> > Any resource releases beyond this (streams on the driver side, for
> > instance) are up to the driver, but typically I would expect them to
> > happen
> > in the same places as they would with an SST connector. Does that answer
> > your question?
> 
> Unplug event of SST sink and MST remote sink is a bit different. SST unplug
> event relies on long HPD IRQ but MST CSN relies on short HPD IRQ.
> Now we use MST helper function drm_dp_mst_handle_conn_stat() to deal with
> CSN short HPD IRQ. But within this function, driver won't
> get notification of disconnection event to release associated allocated
> resource. So, by not changing the drm connector association logic
> here, should we add a new call back function here?
> 
> Sorry Lyude, I don't understand as well as you on this and would like to
> learn more from you. Please correct me if I misunderstand anything
> here. Much appreciate!

It's no problem at all! I'm always glad to help :). This still sounds a lot
like a bug in amdgpu to me, because we actually do send a hotplug event here.
Basically, the only function that calls this, drm_dp_mst_process_up_req(),
will assume it needs to request a hotplug whenever it calls
drm_dp_mst_handle_conn_stat(). From there we pass this information up to
drm_dp_mst_up_req_work(). Then once we've finished handling all pending up
requests, we send a single hotplug to indicate to userspace that it needs to
reprobe everything.
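
The batching described above can be modelled in a few lines of plain C. This
is only a rough, self-contained sketch under assumptions: the model_* names
and the request types are made up for illustration and are not the
drm_dp_mst helper API. The point is just that any number of CSNs handled in
one pass of the work function collapses into a single hotplug event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for up request types; not the real DRM definitions. */
enum model_up_req_type {
	MODEL_CONNECTION_STATUS_NOTIFY,
	MODEL_RESOURCE_STATUS_NOTIFY,
};

struct model_up_req {
	enum model_up_req_type type;
};

/* Counts how many hotplug events the model has sent to "userspace". */
static int model_hotplug_events_sent;

/* Models drm_dp_mst_process_up_req(): returns true when a hotplug is wanted. */
static bool model_process_up_req(const struct model_up_req *req)
{
	/* A CSN is where the real code would call drm_dp_mst_handle_conn_stat(). */
	return req->type == MODEL_CONNECTION_STATUS_NOTIFY;
}

/* Models drm_dp_mst_up_req_work(): drain all requests, then notify once. */
static void model_up_req_work(const struct model_up_req *reqs, size_t count)
{
	bool send_hotplug = false;
	size_t i;

	assert(reqs != NULL || count == 0);

	for (i = 0; i < count; i++)
		send_hotplug |= model_process_up_req(&reqs[i]);

	/* One event for the whole batch, however many CSNs it contained. */
	if (send_hotplug)
		model_hotplug_events_sent++;
}
```

If this model matches the real flow, the place for a driver to react to a CSN
disconnect would be its handling of that one hotplug event (and the reprobe
that follows), rather than a new per-port callback.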

Also, I'm still working on the debugging stuff btw!
Lin, Wayne Oct. 29, 2021, 12:11 p.m. UTC | #20
[Public]

Thanks, Lyude, for patiently guiding me on this : )
I would like to learn more, as follows.

> -----Original Message-----
> From: Lyude Paul <lyude@redhat.com>
> Sent: Wednesday, October 27, 2021 3:35 AM
> To: Lin, Wayne <Wayne.Lin@amd.com>; dri-devel@lists.freedesktop.org
> Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Wentland, Harry <Harry.Wentland@amd.com>; Zuo, Jerry
> <Jerry.Zuo@amd.com>; Wu, Hersen <hersenxs.wu@amd.com>; Juston Li <juston.li@intel.com>; Imre Deak <imre.deak@intel.com>;
> Ville Syrjälä <ville.syrjala@linux.intel.com>; Daniel Vetter <daniel.vetter@ffwll.ch>; Sean Paul <sean@poorly.run>; Maarten Lankhorst
> <maarten.lankhorst@linux.intel.com>; Maxime Ripard <mripard@kernel.org>; Thomas Zimmermann <tzimmermann@suse.de>;
> David Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; Deucher, Alexander <Alexander.Deucher@amd.com>; Siqueira,
> Rodrigo <Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo <Aurabindo.Pillai@amd.com>; Bas Nieuwenhuizen
> <bas@basnieuwenhuizen.nl>; Cornij, Nikola <Nikola.Cornij@amd.com>; Jani Nikula <jani.nikula@intel.com>; Manasi Navare
> <manasi.d.navare@intel.com>; Ankit Nautiyal <ankit.k.nautiyal@intel.com>; José Roberto de Souza <jose.souza@intel.com>; Sean
> Paul <seanpaul@chromium.org>; Ben Skeggs <bskeggs@redhat.com>; stable@vger.kernel.org
> Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for connected end device
>
> Comments below
>
> On Tue, 2021-10-26 at 03:50 +0000, Lin, Wayne wrote:
> > [Public]
> >
> > Hi Lyude!
> > Apologize for replying late and really thanks for elaborating in such
> > details!
> > Following are some of my thoughts : )
> >
> > > -----Original Message-----
> > > From: Lyude Paul <lyude@redhat.com>
> > > Sent: Saturday, September 18, 2021 1:48 AM
> > > To: Lin, Wayne <Wayne.Lin@amd.com>; dri-devel@lists.freedesktop.org
> > > Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>; Wentland,
> > > Harry <Harry.Wentland@amd.com>; Zuo, Jerry <Jerry.Zuo@amd.com>; Wu,
> > > Hersen <hersenxs.wu@amd.com>; Juston Li <juston.li@intel.com>; Imre
> > > Deak <imre.deak@intel.com>; Ville Syrjälä
> > > <ville.syrjala@linux.intel.com>; Daniel Vetter
> > > <daniel.vetter@ffwll.ch>; Sean Paul <sean@poorly.run>; Maarten
> > > Lankhorst <maarten.lankhorst@linux.intel.com>; Maxime Ripard
> > > <mripard@kernel.org>; Thomas Zimmermann <tzimmermann@suse.de>; David
> > > Airlie <airlied@linux.ie>; Daniel Vetter <daniel@ffwll.ch>; Deucher,
> > > Alexander <Alexander.Deucher@amd.com>; Siqueira, Rodrigo
> > > <Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo
> > > <Aurabindo.Pillai@amd.com>; Bas Nieuwenhuizen
> > > <bas@basnieuwenhuizen.nl>; Cornij, Nikola <Nikola.Cornij@amd.com>;
> > > Jani Nikula <jani.nikula@intel.com>; Manasi Navare
> > > <manasi.d.navare@intel.com>; Ankit Nautiyal
> > > <ankit.k.nautiyal@intel.com>; José Roberto de Souza
> > > <jose.souza@intel.com>; Sean Paul <seanpaul@chromium.org>; Ben
> > > Skeggs <bskeggs@redhat.com>; stable@vger.kernel.org
> > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for
> > > connected end device
> > >
> > > Sorry about the slow response, this week XDC has been going on and
> > > I've been mostly paying attention to that.
> > >
> > > On Tue, 2021-09-14 at 08:46 +0000, Lin, Wayne wrote:
> > > > [Public]
> > > >
> > > > > -----Original Message-----
> > > > > From: Lyude Paul <lyude@redhat.com>
> > > > > Sent: Thursday, September 2, 2021 6:00 AM
> > > > > To: Lin, Wayne <Wayne.Lin@amd.com>;
> > > > > dri-devel@lists.freedesktop.org
> > > > > Cc: Kazlauskas, Nicholas <Nicholas.Kazlauskas@amd.com>;
> > > > > Wentland, Harry <Harry.Wentland@amd.com>; Zuo, Jerry
> > > > > <Jerry.Zuo@amd.com>; Wu, Hersen <hersenxs.wu@amd.com>; Juston Li
> > > > > <juston.li@intel.com>; Imre Deak <imre.deak@intel.com>; Ville
> > > > > Syrjälä <ville.syrjala@linux.intel.com>; Daniel Vetter
> > > > > <daniel.vetter@ffwll.ch>; Sean Paul <sean@poorly.run>; Maarten
> > > > > Lankhorst <maarten.lankhorst@linux.intel.com>; Maxime Ripard
> > > > > <mripard@kernel.org>; Thomas Zimmermann <tzimmermann@suse.de>;
> > > > > David Airlie <airlied@linux.ie>; Daniel Vetter
> > > > > <daniel@ffwll.ch>; Deucher, Alexander
> > > > > <Alexander.Deucher@amd.com>; Siqueira, Rodrigo
> > > > > <Rodrigo.Siqueira@amd.com>; Pillai, Aurabindo
> > > > > <Aurabindo.Pillai@amd.com>; Bas Nieuwenhuizen
> > > > > <bas@basnieuwenhuizen.nl>; Cornij, Nikola
> > > > > <Nikola.Cornij@amd.com>; Jani Nikula <jani.nikula@intel.com>;
> > > > > Manasi Navare <manasi.d.navare@intel.com>; Ankit Nautiyal
> > > > > <ankit.k.nautiyal@intel.com>; José Roberto de Souza
> > > > > <jose.souza@intel.com>; Sean Paul <seanpaul@chromium.org>; Ben
> > > > > Skeggs <bskeggs@redhat.com>; stable@vger.kernel.org
> > > > > Subject: Re: [PATCH 2/4] drm/dp_mst: Only create connector for
> > > > > connected end device
> > > > >
> > > > > Actually - did some more thinking, and I think we shouldn't try
> > > > > to make changes like this until we actually know what the
> > > > > problem is here. I could try to figure out what the actual race
> > > > > conditions I was facing before with trying to add/destroy
> > > > > connectors based on PDT, but we still don't even actually have a
> > > > > clear idea of what's broken here.
> > > > > I'd much rather us figure out exactly how this leak is happening
> > > > > before considering making changes like this, because we have no
> > > > > way of knowing if we've properly fixed it or not if we don't
> > > > > know what the problem is in the first place.
> > > > >
> > > > > I'm still happy to write up the topology debugging stuff I
> > > > > mentioned to you if you think that would help you debug this
> > > > > issue - since that would make it a lot easier for you to track
> > > > > down what references are keeping a connector alive (and
> > > > > > additionally, where those references were taken in code. thanks
> > > > > stack_depot!)
> > > > Hi Lyude,
> > > > Sorry for late response. A bit busy on other stuff recently..
> > > >
> > > > Really really thankful for all your help : ) I'm also glad to have
> > > > the debugging tool if it won’t bother you too much. But before
> > > > debugging,
> > >
> > > no problem! I will get to it early next week then
> > >
> > > > I need to have consensus with you about where do we expect to
> > > > release resource allocated for a stream sink when it's reported as
> > > > disconnected. Previous patch suggests releasing resource when
> > > > connector is destroyed which will happen when topology refcount
> > > > reaches zero (i.e. unplug mstb from topology). But when the case
> > > > is receiving CSN notifying connection change, we don't try to
> > > > destroy connector in this case now. And this is not caused by
> > > > topology/malloc refcount leak since I don't expect neither one of
> > > > them get decrease to zero under this case (topology of mstbs and ports is not changed).
> > > > Hence, my plan was to also try to destroy connector under
> > >
> > > Ah - I wonder if this might have been where some of the confusion
> > > > here came from. So - both mstbs and ports (assume I'm talking about the
> > > actual drm_dp_mst_port and drm_dp_mst_branch structs here) are
> > > supposed to have non-zero topology refcounts as long as there is a
> > > valid path between the port or mstb, and our source. This also means
> > > that for ports, the drm_connector associated with these ports should
> > > stay around as long as the port is reachable from the sink
> > > - regardless of whether anything is actually plugged into the port or not.
> >
> > This concept is the place where a bit hard for me to get through.
> > I was thinking that we don’t have to associate a drm connector with a
> > MST port whenever the port exists, since MST port is not always
> > connected to a stream sink. I treat MST port as an intermediate node
> > of a virtual channel, which is an end-to-end direct virtual connection
> > between a stream source and a stream sink. Virtual channel could be
> > constructed by multiple link count and stream sink is connected at end
> > port. Hence,
>
> Just to clarify I'm understanding you correctly, when you say multiple link count do you just mean VCPI slots or do you mean two
> separate DP links? I haven't dealt with the former, but IIRC that is how certain high resolutions displays are handled over TB correct?

Sorry for not being clear on this. I mean the case where the LCT (Link Count Total) field in the packet header is greater than one, which means
there are multiple hops from source to sink,
e.g.  src->mstb->mstb..->sink
I was trying to express the idea of only associating a drm connector with the end-point stream sink once this virtual channel path is constructed.
Since the idea of a virtual channel is to construct an end-to-end direct connection, my first thought was that intermediate ports are transparent
to userspace and there is no need to create drm connectors for them.

>
> Regardless though, I would think that we could just handle this mostly from the atomic state even with a connector for every port. For
> instance, i915 already has something called "big joiner" for supporting display configurations where one display can take up two
> separate display pipes (CRTCs). We could likely do something similar but with connectors if we end up having to deal with a display
> driven by two DP links.
>
> > I was thinking to associate a drm connector for end stream sink only.
> > I think we probably won't want to attach a connector to a
> > relay/retimer/redriver within a stream path? I treat MST port as the
> > similar role when It's fixed to connect to a MST branch device.
>
> If it's a fixed connection, this might actually be OK to avoid attaching connectors on. Currently with input ports where we know we can
> never receive a CSN for them during runtime, we're able to avoid creating a connector because no potential for CSN during runtime
> means the only possible time an input port could transition would be suspend/resume. So if we detect we're on another topology
> where something that was previously an output port that is an input port on the new topology, we get rid of the connector by
> removing the drm_dp_mst_port it's associated with from the topology and replace it with a new one. This works pretty well, as it
> avoids doing any actual connector destruction from the suspend/resume codepath and ensures that any pointer references to the
> now non-existent output port remain valid for as long as needed. So I might actually be open to expanding this for fixed connections
> like relays, retimers and redrivers if we handle things in a similar manner.
> For anything that can receive a CSN though, a drm_connector is unconditionally needed even if nothing's connected.

I'd like to deepen my understanding here. Sorry Lyude, could you explain more on this please?
Are you saying that if we change to associating drm connectors as I proposed in this patch, we would introduce actual connector destruction
in the suspend/resume codepath, and that this is the problem here? I thought that once the connection status changes from connected to
disconnected during suspend/resume, we would still do the same as what we do in drm_dp_delayed_destroy_port():
i.e.
if (port->connector) {
        drm_connector_unregister(port->connector);
        drm_connector_put(port->connector);
}
So we wouldn't directly destroy the drm connector?
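
For reference, the unregister-then-put sequence above can be modelled as below. This is a minimal sketch, assuming a simplified connector with a single reference count; the model_* names are hypothetical and are not the DRM API. It shows that the two calls only unregister the connector and drop the port's reference, while the object itself is released only once the last holder (for example a committed mode, as described earlier in the thread) drops its reference too.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for a refcounted connector; not struct drm_connector. */
struct model_connector {
	int refcount;      /* number of holders keeping the object alive */
	bool registered;   /* visible to userspace (sysfs and so on) */
	bool freed;        /* set once the last reference is dropped */
};

static void model_connector_get(struct model_connector *c)
{
	assert(!c->freed);
	c->refcount++;
}

static void model_connector_put(struct model_connector *c)
{
	assert(c->refcount > 0);
	/* Only the final put releases the object. */
	if (--c->refcount == 0)
		c->freed = true;
}

static void model_connector_unregister(struct model_connector *c)
{
	c->registered = false;
}

/* Mirrors the two calls quoted above from drm_dp_delayed_destroy_port(). */
static void model_delayed_destroy_port(struct model_connector *c)
{
	model_connector_unregister(c);
	model_connector_put(c);
}
```

In this model, a connector that is still referenced elsewhere ends up unregistered but not freed after model_delayed_destroy_port(), which is the behaviour being asked about.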

>
> >
> > I think it's a bit different to SST case. For legacy DP (before DP
> > 1.2), we can attach a connector to its physical end output port since
> > it's dedicated for a stream sink. But MST port is not. However, I
> > understand if there is any implementation requirement for us to
> > associate drm connector for all MST ports.
> >
> > >
> > > > So - a CSN on its own shouldn't really get rid of the port it was
> > > notifying us about. But if that CSN results in an MSTB -with- its
> > > own ports being removed, this would mean there would no longer be a
> > > valid path between our source and the ports on said MSTB and as such
> > > - the connector for each one of those ports is removed from the
> > > topology.
> > > Remember however, when I say "removed from the topology" what I'm
> > > referring to is the fact that the MST helpers have dropped the main
> > > topology reference for a given mstb or port.
> > > Since various MST helpers retrieve temporary topology references to
> > > connectors they work on in order to simplify handling I/O errors,
> > > the operations from those helpers would potentially keep the port or
> > > mstb around in the topology until those helpers have had a chance to
> > > abort and drop their refs. And then once all the topology references
> > > are released, a destruction worker gets scheduled which handles
> > > unregistering the drm_connector (not destroying it).
> > > The drm_connector stays around unregistered, up until the point at
> > > which all malloc references to the drm_dp_mst_port have been
> > > released.
> > >
> > > I think it may also be worth clarifying the lifetime of
> > > drm_connector itself here as well, since that also actually has a
> > > refcount. Basically, as long as userspace has a mode committed which
> > > references a drm_connector - that drm_connector will still exist in
> > > memory, and its mode object ID will remain valid. This means if we
> > > were to have a MST topology hooked up with one display turned on and
> > > then suddenly unplugged it, keeping in mind that the port with said
> > > display now becomes inaccessible from the topology, the
> > > drm_connector associated with that display would continue to have a
> > > valid mode object ID up until the point at which userspace has
> > > committed a new mode which disables it.
> > > The sysfs paths for the connector however, will disappear
> > > immediately once the connector is unregistered so as to ensure that
> > > userspace applications cannot try to reuse it later or attempt to reprobe it.
> > >
> > > Any resource releases beyond this (streams on the driver side, for
> > > instance) are up to the driver, but typically I would expect them to
> > > happen in the same places as they would with an SST connector. Does
> > > that answer your question?
> >
> > Unplug event of SST sink and MST remote sink is a bit different. SST
> > unplug event relies on long HPD IRQ but MST CSN relies on short HPD IRQ.
> > Now we use MST helper function drm_dp_mst_handle_conn_stat() to deal
> > with CSN short HPD IRQ. But within this function, driver won't get
> > notification of disconnection event to release associated allocated
> > resource. So, by not changing the drm connector association logic
> > here, should we add a new call back function here?
> >
> > Sorry Lyude, I don't understand as well as you on this and would like
> > to learn more from you. Please correct me if I misunderstand anything
> > here. Much appreciate!
>
> It's no problem at all! I'm always glad to help :). This still sounds a lot like a bug to me in amdgpu, because we actually do send a
> hotplug event here.
> Basically, the only function that calls this; drm_dp_mst_process_up_req(), will assume it needs to request a hotplug if we ever call
> drm_dp_mst_handle_conn_stat(). From there we pass this information up to drm_dp_mst_up_req_work(). Then once we've
> finished handling all pending up requests, we send a single hotplug to indicate to userspace it needs to reprobe everything.
>
Right! I might not recall correctly, but I think that's why I wanted this patch. I believe I ran into a case where userspace doesn’t explicitly
react to this unplug event. Instead, it only reacts the next time we plug in a monitor. The problem is that when we plug in
the monitor the next time, the stale resources have not been released yet, and we then hit a limitation within our HW. That made me want to explicitly
release the resources once the driver detects the unplug event (just like the SST long HPD event, I think). By the way, just out of curiosity, when
do you think is the right time to release sink-related resources if we rely on the hotplug event notifying userspace? When userspace frees the
pipe associated with the connector? Couldn't that be a transient state in which userspace only frees the pipe temporarily?
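
The single-hotplug behaviour Lyude describes in the quoted text can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: the model_* names and the request-type enum are hypothetical stand-ins for drm_dp_mst_process_up_req() and drm_dp_mst_up_req_work(), not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request types; only the CSN requests a hotplug in this model. */
enum model_req_type {
	MODEL_REQ_CONNECTION_STATUS_NOTIFY,
	MODEL_REQ_RESOURCE_STATUS_NOTIFY,
};

/* Stands in for drm_dp_mst_process_up_req(): reports whether a hotplug is needed. */
static bool model_process_up_req(enum model_req_type type)
{
	return type == MODEL_REQ_CONNECTION_STATUS_NOTIFY;
}

/*
 * Stands in for drm_dp_mst_up_req_work(): handles every pending up request,
 * then returns how many hotplug events would be sent (0 or 1).
 */
static int model_up_req_work(const enum model_req_type *pending, size_t count)
{
	bool send_hotplug = false;

	for (size_t i = 0; i < count; i++) {
		if (model_process_up_req(pending[i]))
			send_hotplug = true;
	}

	return send_hotplug ? 1 : 0;
}
```

In this model, any number of pending CSNs collapses into one hotplug event, so userspace is told to reprobe everything once rather than once per port.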

> Also, I'm still working on the debugging stuff btw!
Much appreciated, Lyude! Thanks!

>
> --
> Cheers,
>  Lyude Paul (she/her)
>  Software Engineer at Red Hat
--
Regards,
Wayne Lin
Lyude Paul Nov. 2, 2021, 10:31 p.m. UTC | #21
On Fri, 2021-10-29 at 12:11 +0000, Lin, Wayne wrote:
> [Public]
> 
> Thanks Lyude for patiently guiding on this : )
> Would like to learn more as following

I do follow your bit about connectors only being created when a virtual path
is instantiated, but that still doesn't match how connectors in DRM typically
behave, as this idea still comes down to "we don't have disconnected
connectors, only connected ones". That still breaks force probing (if the
connector doesn't exist in userspace because we destroyed it, how do we get to
its force sysfs file?), and also just hides information from userspace
that it might actually care about (what if, for instance, a GUI wanted to display
the topology layout of an MST hub -including- all of the currently disconnected
ports on it?). Considering we allow this for things like USB, it doesn't make
sense to hide them for MST.
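
To make the objection concrete, here is a minimal user-space sketch of the condition the patch adds in front of drm_dp_mst_port_add_connector(). The peer-device-type values and the end-device check are local copies written from memory of the kernel definitions, and would_create_connector() is a hypothetical name, so treat this as an illustration and not as the kernel code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the kernel's DP_PEER_DEVICE_* values (assumed). */
enum peer_device_type {
	PDT_NONE = 0x0,
	PDT_SOURCE_OR_SST = 0x1,
	PDT_MST_BRANCHING = 0x2,
	PDT_SST_SINK = 0x3,
	PDT_DP_LEGACY_CONV = 0x4,
};

/* Modelled on drm_dp_mst_is_end_device(); mcs is the port's messaging capability. */
static bool is_end_device(uint8_t pdt, bool mcs)
{
	switch (pdt) {
	case PDT_DP_LEGACY_CONV:
	case PDT_SST_SINK:
		return true;
	case PDT_MST_BRANCHING:
		/* A branching unit without message capability acts as an SST branch. */
		return !mcs;
	default:
		return true;
	}
}

/* The condition proposed in the patch: output port, something attached, end device. */
static bool would_create_connector(bool input, uint8_t pdt, bool mcs)
{
	assert(pdt <= PDT_DP_LEGACY_CONV);
	return !input && pdt != PDT_NONE && is_end_device(pdt, mcs);
}
```

Under this condition a disconnected output port (PDT_NONE) never gets a connector, which is the case that leaves userspace without a force sysfs file to write to.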

As well, while your idea for what an MST connector is honestly does make a lot
more sense than what we have, that's not really the issue here. The problem is
that connector creation/destruction is already quite racy, and requires a _lot_
of care to get right. We've already had tons of bugs in the past that led to us
resorting to all of the tricks we're currently using, for instance:
Which just seems to add a lot of
complication to the current MST code, without much reason here besides trying
to reduce the amount of connectors along with a potential bug with leaking
connectors that we still don't know the cause of. Trying to solve problems
without understanding exactly what's causing them 
something around a bug that could be entirely unrelated to how we create
connectors, because then it's not even really guaranteed we've fixed anything
if we don't know what caused the problem in the first place. Working around
problems might temporarily fix the ones we're dealing with right now, but
without understanding what's causing it there's no guarantee it won't just pop
up again in the future or that we won't introduce new problems in the process.

> 
> > 
> > Regardless though, I would think that we could just handle this mostly
> > from the atomic state even with a connector for every port. For
> > instance, i915 already has something called "big joiner" for supporting
> > display configurations where one display can take up two
> > separate display pipes (CRTCs). We could likely do something similar but
> > with connectors if we end up having to deal with a display
> > driven by two DP links.
> > 
> > > I was thinking to associate a drm connector for end stream sink only.
> > > I think we probably won't want to attach a connector to a
> > > relay/retimer/redriver within a stream path? I treat MST port as the
> > > similar role when It's fixed to connect to a MST branch device.
> > 
> > If it's a fixed connection, this might actually be OK to avoid attaching
> > connectors on. Currently with input ports where we know we can
> > never receive a CSN for them during runtime, we're able to avoid creating
> > a connector because no potential for CSN during runtime
> > means the only possible time an input port could transition would be
> > suspend/resume. So if we detect we're on another topology
> > where something that was previously an output port that is an input port
> > on the new topology, we get rid of the connector by
> > removing the drm_dp_mst_port it's associated with from the topology and
> > replace it with a new one. This works pretty well, as it
> > avoids doing any actual connector destruction from the suspend/resume
> > codepath and ensures that any pointer references to the
> > now non-existent output port remain valid for as long as needed. So I
> > might actually be open to expanding this for fixed connections
> > like relays, retimers and redrivers if we handle things in a similar
> > manner.
> > For anything that can receive a CSN though, a drm_connector is
> > unconditionally needed even if nothing's connected.
> 
> Want to deepen my knowledge here. Sorry Lyude could you explain more on this
> please?
> Are you saying that if we change to associate drm connector as what I
> proposed in this patch, we will create actual connector destruction
> from the suspend/resume codepath and which is a problem here? I thought once
> the connection status changed from connected to
> disconnected during suspend/resume, we still use the same way as what we did
> in drm_dp_delayed_destroy_port():
> i.e.
> if (port->connector) {
>         drm_connector_unregister(port->connector);
>         drm_connector_put(port->connector);
> }
> We won't directly destruct the drm connector?

Something like that, I'd need to go look further into the details because I
very vividly remember most of the tricks we do in the MST helpers regarding
delayed connector destruction and when/how we change various members of the
drm_dp_mst_port/drm_dp_mst_branch structures. I vaguely remember the problem
with trying to hot add/remove connectors (I -did- actually try to do this once
I believe! but not as thoroughly as you have) being some kind of lockdep
issue. I started trying to dig into the MST code a bit deeper to get a clear
answer on this, but I actually decided to take that time and just finish up
the debug helpers I mentioned (I'll send the WIP patch I've got to you in a
moment, and will send it off to the mailing list once I finish hooking things up
in i915) because it really just doesn't seem to me like we actually have a
clear understanding of how this issue is being caused - and it's not a good
idea for us to make any kind of API change like this to attempt (and
inevitably fail or break something else) to fix an issue we don't fully
understand.
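
For readers following along, the delayed-destruction scheme mentioned here (topology references versus malloc references, as described earlier in the thread) can be sketched as a small user-space model. Everything named model_* is hypothetical, the connector's own refcount is deliberately left out, and a flag stands in for the real free, so this is a rough illustration and not the MST helper implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for drm_dp_mst_port, reduced to its lifetime state. */
struct model_port {
	int topology_refs;         /* port is reachable in the topology while > 0 */
	int malloc_refs;           /* memory stays allocated while > 0 */
	bool connector_registered; /* connector is visible to userspace */
	bool memory_released;      /* set instead of calling free() */
};

static void model_port_init(struct model_port *port)
{
	port->topology_refs = 1; /* the main topology reference */
	port->malloc_refs = 1;   /* malloc reference held on behalf of the topology */
	port->connector_registered = true;
	port->memory_released = false;
}

static void model_port_get_malloc(struct model_port *port)
{
	assert(port->malloc_refs > 0);
	port->malloc_refs++;
}

static void model_port_put_malloc(struct model_port *port)
{
	assert(port->malloc_refs > 0);
	if (--port->malloc_refs == 0)
		port->memory_released = true;
}

/* Temporary topology reference; fails once the port has left the topology. */
static bool model_port_get_topology(struct model_port *port)
{
	if (port->topology_refs == 0)
		return false;
	port->topology_refs++;
	return true;
}

static void model_port_put_topology(struct model_port *port)
{
	assert(port->topology_refs > 0);
	if (--port->topology_refs == 0) {
		/* Stands in for the destroy worker: unregister, do not free. */
		port->connector_registered = false;
		model_port_put_malloc(port);
	}
}
```

The point of the two counters is that the connector is only unregistered when the last topology reference goes away, while the memory survives until the last malloc reference is dropped.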

[snip...]

> > 
> Right! I might not recall correctly, but I think that's why I want this
> patch. I probably encountered that userspace doesn’t explicitly
> try to react to this unplug event. Instead, it tries to react when we plug
> in monitor next time. And the problem is when we plug in
> monitor next time, stale resources are not released yet. It then hits the
> limitation within our HW. Which let me want to explicitly
> release resource once driver detect the unplug event (just like sst long HPD
> event I think).  By the way, just out of curiosity, when
> do you think is the timing to release sink related resource if we rely on
> hotplug event notifying userspace? When userspace frees the
> associated pipe of the connector? Won't it be a transient state that
> userspace just free the pipe temporarily?

The timing of releasing resources should be done at the same time that we
disable the connector. In general, MST modesetting shouldn't be much different
from anything else - except for having to maintain a payload table and
bandwidth limitations across a shared connection. So pretty much everything
related to enabling or disabling streams should be in the atomic commit phase
(with any bandwidth calculations done beforehand, WIP...). I'm going to say,
let's figure out where this is happening first. I've got the debugging patches
for this ready and will send them to you now.
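
As a rough illustration of "release at disable time", the sketch below models a shared link whose slots are only given back in the disable path of a commit. The names, the stream limit and the slot counts are all hypothetical; this is not amdgpu or MST helper code, only a toy model of the bookkeeping being discussed.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_STREAMS 4

/* Hypothetical shared link: a fixed slot budget split between streams. */
struct model_link {
	int total_slots;                  /* capacity of the shared link (assumed value) */
	int allocated[MODEL_MAX_STREAMS]; /* slots currently held by each stream */
};

static int model_slots_in_use(const struct model_link *link)
{
	int used = 0;

	for (int i = 0; i < MODEL_MAX_STREAMS; i++)
		used += link->allocated[i];
	return used;
}

/* Enable path of a commit: fails if the shared link cannot fit the stream. */
static bool model_commit_enable(struct model_link *link, int stream, int slots)
{
	assert(stream >= 0 && stream < MODEL_MAX_STREAMS);
	assert(slots > 0);

	int others = model_slots_in_use(link) - link->allocated[stream];

	if (others + slots > link->total_slots)
		return false;
	link->allocated[stream] = slots;
	return true;
}

/* Disable path of a commit: the only place in this model where slots are released. */
static void model_commit_disable(struct model_link *link, int stream)
{
	assert(stream >= 0 && stream < MODEL_MAX_STREAMS);
	link->allocated[stream] = 0;
}
```

In this model an unplug on its own changes nothing: a new stream keeps failing to fit until a commit disables the old one, which mirrors the stale-resource situation described above.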

> 
> > Also, I'm still working on the debugging stuff btw!
> Much appreciate Lyude! Thanks!
> 
> > 
> > --
> > Cheers,
> >  Lyude Paul (she/her)
> >  Software Engineer at Red Hat
> --
> Regards,
> Wayne Lin

Patch

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 51cd7f74f026..f13c7187b07f 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -2474,7 +2474,8 @@  drm_dp_mst_handle_link_address_port(struct drm_dp_mst_branch *mstb,
 
 	if (port->connector)
 		drm_modeset_unlock(&mgr->base.lock);
-	else if (!port->input)
+	else if (!port->input && port->pdt != DP_PEER_DEVICE_NONE &&
+		 drm_dp_mst_is_end_device(port->pdt, port->mcs))
 		drm_dp_mst_port_add_connector(mstb, port);
 
 	if (send_link_addr && port->mstb) {
@@ -2557,6 +2558,10 @@  drm_dp_mst_handle_conn_stat(struct drm_dp_mst_branch *mstb,
 		dowork = false;
 	}
 
+	if (!port->input && !port->connector && new_pdt != DP_PEER_DEVICE_NONE &&
+	    drm_dp_mst_is_end_device(new_pdt, new_mcs))
+		create_connector = true;
+
 	if (port->connector)
 		drm_modeset_unlock(&mgr->base.lock);
 	else if (create_connector)