diff mbox series

[v6,14/26] drm/bridge: add support for refcounted DRM bridges

Message ID 20250206-hotplug-drm-bridge-v6-14-9d6f2c9c3058@bootlin.com (mailing list archive)
State New, archived
Headers show
Series Add support for hot-pluggable DRM bridges | expand

Commit Message

Luca Ceresoli Feb. 6, 2025, 6:14 p.m. UTC
DRM bridges are currently considered as a fixed element of a DRM card, and
thus their lifetime is assumed to extend for as long as the card
exists. New use cases, such as hot-pluggable hardware with video bridges,
require DRM bridges to be added and removed to a DRM card without tearing
the card down. This is possible for connectors already (used by DP MST), so
add this possibility to DRM bridges as well.

Implementation is based on drm_connector_init() as far as it makes sense,
and differs when it doesn't. A difference is that bridges are not exposed
to userspace, hence struct drm_bridge does not embed a struct
drm_mode_object which would provide the refcount. Instead we add to struct
drm_bridge a refcount field (we don't need other struct drm_mode_object
fields here) and instead of using the drm_mode_object_*() functions we
reimplement from those functions the few lines that drm_bridge needs for
refcounting.

Also add a new devm_drm_bridge_alloc() macro to allocate a new refcounted
bridge.

Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>

---

Changes in v6:
 - use drm_warn, not WARN_ON (Jani Nikula)
 - Add devm_drm_bridge_alloc() to replace drm_bridge_init() (similar to
   drmm_encoder_alloc)
 - Remove .destroy func: deallocation is done via the struct offset
   computed by the devm_drm_bridge_alloc() macro
 - use fixed free callback, as the same callback is used in all cases
   anyway (remove free_cb, add bool is_refcounted)
 - add drm_bridge_get/put() to drm_bridge_attach/detach() (add the bridge
   to a list)
 - make some DRM_DEBUG() strings more informative

This patch was added in v5.
---
 drivers/gpu/drm/drm_bridge.c |  76 ++++++++++++++++++++++++++--
 include/drm/drm_bridge.h     | 117 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 189 insertions(+), 4 deletions(-)

Comments

Maxime Ripard Feb. 7, 2025, 11:47 a.m. UTC | #1
Hi,

On Thu, Feb 06, 2025 at 07:14:29PM +0100, Luca Ceresoli wrote:
> DRM bridges are currently considered as a fixed element of a DRM card, and
> thus their lifetime is assumed to extend for as long as the card
> exists. New use cases, such as hot-pluggable hardware with video bridges,
> require DRM bridges to be added and removed to a DRM card without tearing
> the card down. This is possible for connectors already (used by DP MST), so
> add this possibility to DRM bridges as well.
> 
> Implementation is based on drm_connector_init() as far as it makes sense,
> and differs when it doesn't. A difference is that bridges are not exposed
> to userspace, hence struct drm_bridge does not embed a struct
> drm_mode_object which would provide the refcount. Instead we add to struct
> drm_bridge a refcount field (we don't need other struct drm_mode_object
> fields here) and instead of using the drm_mode_object_*() functions we
> reimplement from those functions the few lines that drm_bridge needs for
> refcounting.
> 
> Also add a new devm_drm_bridge_alloc() macro to allocate a new refcounted
> bridge.
> 
> Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>

So, a couple of general comments:

- I've said it a couple of times already, but I really think you're
  making it harder than necessary for you here. This (and only this!)
  should be the very first series you should be pushing. The rest can
  only ever work if that work goes through, and it's already hard enough
  as it is. So, split that patch into a series of its own, get that
  merged, and then we will be able to deal with panels conversion and
  whatever. That's even more true with panels since there's ongoing work
  that will make it easier for you too. So the best thing here is
  probably to wait.

- This patch really needs to be split into several patches, something
  along the lines of:

  + Creating devm_drm_bridge_alloc()
  + Adding refcounting
  + Taking the references in all the needed places
  + Converting a bunch of drivers

> Changes in v6:
>  - use drm_warn, not WARN_ON (Jani Nikula)
>  - Add devm_drm_bridge_alloc() to replace drm_bridge_init() (similar to
>    drmm_encoder_alloc)
>  - Remove .destroy func: deallocation is done via the struct offset
>    computed by the devm_drm_bridge_alloc() macro
>  - use fixed free callback, as the same callback is used in all cases
>    anyway (remove free_cb, add bool is_refcounted)
>  - add drm_bridge_get/put() to drm_bridge_attach/detach() (add the bridge
>    to a list)
>  - make some DRM_DEBUG() strings more informative
> 
> This patch was added in v5.
> ---
>  drivers/gpu/drm/drm_bridge.c |  76 ++++++++++++++++++++++++++--
>  include/drm/drm_bridge.h     | 117 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 189 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_bridge.c b/drivers/gpu/drm/drm_bridge.c
> index 1955a231378050abf1071d74a145831b425c47c5..f694b32ca59cb91c32846bc00b43360df41cc1ad 100644
> --- a/drivers/gpu/drm/drm_bridge.c
> +++ b/drivers/gpu/drm/drm_bridge.c
> @@ -200,6 +200,57 @@
>  DEFINE_MUTEX(bridge_lock);
>  LIST_HEAD(bridge_list);
>  
> +/* Internal function (for refcounted bridges) */
> +void __drm_bridge_free(struct kref *kref)
> +{
> +	struct drm_bridge *bridge = container_of(kref, struct drm_bridge, refcount);
> +	void *container = ((void *)bridge) - bridge->container_offset;
> +
> +	DRM_DEBUG("bridge=%p, container=%p FREE\n", bridge, container);

Pointers are not really useful to track here, since they are obfuscated
most of the time. Using the bridge device name would probably be better
(or removing the SHOUTING DEBUG entirely :))

> +	kfree(container);
> +}
> +EXPORT_SYMBOL(__drm_bridge_free);
> +
> +static void drm_bridge_put_void(void *data)
> +{
> +	struct drm_bridge *bridge = (struct drm_bridge *)data;
> +
> +	drm_bridge_put(bridge);
> +}
> +
> +void *__devm_drm_bridge_alloc(struct device *dev, size_t size, size_t offset,
> +			      const struct drm_bridge_funcs *funcs)
> +{
> +	void *container;
> +	struct drm_bridge *bridge;
> +	int err;
> +
> +	if (!funcs) {
> +		dev_warn(dev, "Missing funcs pointer\n");
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	container = kzalloc(size, GFP_KERNEL);
> +	if (!container)
> +		return ERR_PTR(-ENOMEM);
> +
> +	bridge = container + offset;
> +	bridge->container_offset = offset;
> +	bridge->funcs = funcs;
> +	kref_init(&bridge->refcount);
> +	bridge->is_refcounted = 1;
> +
> +	err = devm_add_action_or_reset(dev, drm_bridge_put_void, bridge);
> +	if (err)
> +		return ERR_PTR(err);
> +
> +	DRM_DEBUG("bridge=%p, container=%p, funcs=%ps ALLOC\n", bridge, container, funcs);
> +
> +	return container;
> +}
> +EXPORT_SYMBOL(__devm_drm_bridge_alloc);
> +
>  /**
>   * drm_bridge_add - add the given bridge to the global bridge list
>   *
> @@ -209,6 +260,10 @@ void drm_bridge_add(struct drm_bridge *bridge)
>  {
>  	struct drm_bridge *br, *tmp;
>  
> +	DRM_DEBUG("bridge=%p ADD\n", bridge);
> +
> +	drm_bridge_get(bridge);
> +
>  	mutex_init(&bridge->hpd_mutex);
>  
>  	if (bridge->ops & DRM_BRIDGE_OP_HDMI)
> @@ -257,6 +312,8 @@ void drm_bridge_remove(struct drm_bridge *bridge)
>  {
>  	struct drm_bridge *br, *tmp;
>  
> +	DRM_DEBUG("bridge=%p REMOVE\n", bridge);
> +
>  	mutex_lock(&bridge_lock);
>  	list_del_init(&bridge->list);
>  	mutex_unlock(&bridge_lock);
> @@ -266,6 +323,8 @@ void drm_bridge_remove(struct drm_bridge *bridge)
>  			br->funcs->bridge_event_notify(br, DRM_EVENT_BRIDGE_REMOVE, bridge);
>  
>  	mutex_destroy(&bridge->hpd_mutex);
> +
> +	drm_bridge_put(bridge);
>  }
>  EXPORT_SYMBOL(drm_bridge_remove);
>  
> @@ -326,11 +385,17 @@ int drm_bridge_attach(struct drm_encoder *encoder, struct drm_bridge *bridge,
>  	if (!encoder || !bridge)
>  		return -EINVAL;
>  
> -	if (previous && (!previous->dev || previous->encoder != encoder))
> -		return -EINVAL;
> +	drm_bridge_get(bridge);
>  
> -	if (bridge->dev)
> -		return -EBUSY;
> +	if (previous && (!previous->dev || previous->encoder != encoder)) {
> +		ret = -EINVAL;
> +		goto err_put_bridge;
> +	}
> +
> +	if (bridge->dev) {
> +		ret = -EBUSY;
> +		goto err_put_bridge;
> +	}
>  
>  	bridge->dev = encoder->dev;
>  	bridge->encoder = encoder;
> @@ -379,6 +444,8 @@ int drm_bridge_attach(struct drm_encoder *encoder, struct drm_bridge *bridge,
>  			      "failed to attach bridge %pOF to encoder %s\n",
>  			      bridge->of_node, encoder->name);
>  
> +err_put_bridge:
> +	drm_bridge_put(bridge);
>  	return ret;
>  }
>  EXPORT_SYMBOL(drm_bridge_attach);
> @@ -399,6 +466,7 @@ void drm_bridge_detach(struct drm_bridge *bridge)
>  
>  	list_del(&bridge->chain_node);
>  	bridge->dev = NULL;
> +	drm_bridge_put(bridge);
>  }
>  
>  /**
> diff --git a/include/drm/drm_bridge.h b/include/drm/drm_bridge.h
> index ad7ba444a13e5ecf16f996de3742e4ac67dc21f1..43cef0f6ccd36034f64ad2babfebea62db1d9e43 100644
> --- a/include/drm/drm_bridge.h
> +++ b/include/drm/drm_bridge.h
> @@ -31,6 +31,7 @@
>  #include <drm/drm_encoder.h>
>  #include <drm/drm_mode_object.h>
>  #include <drm/drm_modes.h>
> +#include <drm/drm_print.h>
>  
>  struct device_node;
>  
> @@ -863,6 +864,22 @@ struct drm_bridge {
>  	const struct drm_bridge_timings *timings;
>  	/** @funcs: control functions */
>  	const struct drm_bridge_funcs *funcs;
> +
> +	/**
> +	 * @container_offset: Offset of this struct within the container
> +	 * struct embedding it. Used for refcounted bridges to free the
> +	 * embeddeing struct when the refcount drops to zero. Unused on
> +	 * legacy bridges.
> +	 */
> +	size_t container_offset;

This shouldn't be in there. You can create an intermediate structure and
store both pointers for the action to consume.

> +	/**
> +	 * @refcount: reference count for bridges with dynamic lifetime
> +	 * (see drm_bridge_init)
> +	 */
> +	struct kref refcount;
> +	/** @is_refcounted: this bridge has dynamic lifetime management */
> +	bool is_refcounted;
> +

I'm not sure we want to treat both paths separately too. It'll require
to update most of/all the drivers, but it's not too hard with
coccinelle:

virtual patch

@@
identifier f;
identifier b, c, d;
expression bf;
type T;
@@

 f(...)
 {
	...
-	T *c;
+	T *c;
	...
-	c = devm_kzalloc(d, ...);
+	c = devm_drm_bridge_alloc(d, T, b, bf);
	...
-	c->b.funcs = bf;
	...
	drm_bridge_add(&c->b);
	...
 }

We need to add a bit more variations (like kzalloc vs devm_kzalloc,
drm_bridge_add vs devm_drm_bridge_add, etc.), but it should be a good
first approximation

Maxime
Dmitry Baryshkov Feb. 7, 2025, 7:54 p.m. UTC | #2
On Fri, Feb 07, 2025 at 12:47:51PM +0100, Maxime Ripard wrote:
> Hi,
> 
> On Thu, Feb 06, 2025 at 07:14:29PM +0100, Luca Ceresoli wrote:
> > DRM bridges are currently considered as a fixed element of a DRM card, and
> > thus their lifetime is assumed to extend for as long as the card
> > exists. New use cases, such as hot-pluggable hardware with video bridges,
> > require DRM bridges to be added and removed to a DRM card without tearing
> > the card down. This is possible for connectors already (used by DP MST), so
> > add this possibility to DRM bridges as well.
> > 
> > Implementation is based on drm_connector_init() as far as it makes sense,
> > and differs when it doesn't. A difference is that bridges are not exposed
> > to userspace, hence struct drm_bridge does not embed a struct
> > drm_mode_object which would provide the refcount. Instead we add to struct
> > drm_bridge a refcount field (we don't need other struct drm_mode_object
> > fields here) and instead of using the drm_mode_object_*() functions we
> > reimplement from those functions the few lines that drm_bridge needs for
> > refcounting.
> > 
> > Also add a new devm_drm_bridge_alloc() macro to allocate a new refcounted
> > bridge.
> > 
> > Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
> 
> So, a couple of general comments:
> 
> - I've said it a couple of times already, but I really think you're
>   making it harder than necessary for you here. This (and only this!)
>   should be the very first series you should be pushing. The rest can
>   only ever work if that work goes through, and it's already hard enough
>   as it is. So, split that patch into a series of its own, get that
>   merged, and then we will be able to deal with panels conversion and
>   whatever. That's even more true with panels since there's ongoing work
>   that will make it easier for you too. So the best thing here is
>   probably to wait.

Luca and I had a quick chat on this during FOSDEM. I really think that
panel (part of the) series can go in first as it fixes a very well known
bug _and_ allows a pretty good cleanup to a whole set of drivers. With
all those panel / bridge wrappers gone we should be able to see a
clearer picture of what individual drivers are doing. In other words,
which memory and which code actually hosts and uses internal
'next_bridge' reference.

> - This patch really needs to be split into several patches, something
>   along the lines of:
> 
>   + Creating devm_drm_bridge_alloc()
>   + Adding refcounting
>   + Taking the references in all the needed places
>   + Converting a bunch of drivers

The last two parts seem troublematic to me, but, I must admit, I didn't
spend so much time reviewing all drm_bridge usage patterns.

> 
> > Changes in v6:
> >  - use drm_warn, not WARN_ON (Jani Nikula)
> >  - Add devm_drm_bridge_alloc() to replace drm_bridge_init() (similar to
> >    drmm_encoder_alloc)
> >  - Remove .destroy func: deallocation is done via the struct offset
> >    computed by the devm_drm_bridge_alloc() macro
> >  - use fixed free callback, as the same callback is used in all cases
> >    anyway (remove free_cb, add bool is_refcounted)
> >  - add drm_bridge_get/put() to drm_bridge_attach/detach() (add the bridge
> >    to a list)
> >  - make some DRM_DEBUG() strings more informative
> > 
> > This patch was added in v5.
> > ---
> >  drivers/gpu/drm/drm_bridge.c |  76 ++++++++++++++++++++++++++--
> >  include/drm/drm_bridge.h     | 117 +++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 189 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/drm_bridge.c b/drivers/gpu/drm/drm_bridge.c
> > index 1955a231378050abf1071d74a145831b425c47c5..f694b32ca59cb91c32846bc00b43360df41cc1ad 100644
> > --- a/drivers/gpu/drm/drm_bridge.c
> > +++ b/drivers/gpu/drm/drm_bridge.c
> > @@ -200,6 +200,57 @@
> >  DEFINE_MUTEX(bridge_lock);
> >  LIST_HEAD(bridge_list);
> >  
> > +/* Internal function (for refcounted bridges) */
> > +void __drm_bridge_free(struct kref *kref)
> > +{
> > +	struct drm_bridge *bridge = container_of(kref, struct drm_bridge, refcount);
> > +	void *container = ((void *)bridge) - bridge->container_offset;
> > +
> > +	DRM_DEBUG("bridge=%p, container=%p FREE\n", bridge, container);
> 
> Pointers are not really useful to track here, since they are obfuscated
> most of the time. Using the bridge device name would probably be better
> (or removing the SHOUTING DEBUG entirely :))

bridge device name or bridge funcs (I opted for the latter for the
debugfs file)
Maxime Ripard Feb. 10, 2025, 12:31 p.m. UTC | #3
On Fri, Feb 07, 2025 at 09:54:06PM +0200, Dmitry Baryshkov wrote:
> On Fri, Feb 07, 2025 at 12:47:51PM +0100, Maxime Ripard wrote:
> > Hi,
> > 
> > On Thu, Feb 06, 2025 at 07:14:29PM +0100, Luca Ceresoli wrote:
> > > DRM bridges are currently considered as a fixed element of a DRM card, and
> > > thus their lifetime is assumed to extend for as long as the card
> > > exists. New use cases, such as hot-pluggable hardware with video bridges,
> > > require DRM bridges to be added and removed to a DRM card without tearing
> > > the card down. This is possible for connectors already (used by DP MST), so
> > > add this possibility to DRM bridges as well.
> > > 
> > > Implementation is based on drm_connector_init() as far as it makes sense,
> > > and differs when it doesn't. A difference is that bridges are not exposed
> > > to userspace, hence struct drm_bridge does not embed a struct
> > > drm_mode_object which would provide the refcount. Instead we add to struct
> > > drm_bridge a refcount field (we don't need other struct drm_mode_object
> > > fields here) and instead of using the drm_mode_object_*() functions we
> > > reimplement from those functions the few lines that drm_bridge needs for
> > > refcounting.
> > > 
> > > Also add a new devm_drm_bridge_alloc() macro to allocate a new refcounted
> > > bridge.
> > > 
> > > Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
> > 
> > So, a couple of general comments:
> > 
> > - I've said it a couple of times already, but I really think you're
> >   making it harder than necessary for you here. This (and only this!)
> >   should be the very first series you should be pushing. The rest can
> >   only ever work if that work goes through, and it's already hard enough
> >   as it is. So, split that patch into a series of its own, get that
> >   merged, and then we will be able to deal with panels conversion and
> >   whatever. That's even more true with panels since there's ongoing work
> >   that will make it easier for you too. So the best thing here is
> >   probably to wait.
> 
> Luca and I had a quick chat on this during FOSDEM. I really think that
> panel (part of the) series can go in first as it fixes a very well known
> bug _and_ allows a pretty good cleanup to a whole set of drivers.

I don't necessarily disagree on principle, but if you state that it can
get first, and fixes a known problem (which one?), then it should be a
separate, standalone, series.

Ever-expanding features are bad for both the reviewers and the
contributors, even more so when the discussion happens off-list.

> With all those panel / bridge wrappers gone we should be able to see a
> clearer picture of what individual drivers are doing. In other words,
> which memory and which code actually hosts and uses internal
> 'next_bridge' reference.
> 
> > - This patch really needs to be split into several patches, something
> >   along the lines of:
> > 
> >   + Creating devm_drm_bridge_alloc()
> >   + Adding refcounting
> >   + Taking the references in all the needed places
> >   + Converting a bunch of drivers
> 
> The last two parts seem troublematic to me, but, I must admit, I didn't
> spend so much time reviewing all drm_bridge usage patterns.

Why? the third one is already done by that patch, the fourth can
relatively easily be done using coccinelle.

Maxime
Dmitry Baryshkov Feb. 10, 2025, 2:23 p.m. UTC | #4
On Mon, 10 Feb 2025 at 14:31, Maxime Ripard <mripard@kernel.org> wrote:
>
> On Fri, Feb 07, 2025 at 09:54:06PM +0200, Dmitry Baryshkov wrote:
> > On Fri, Feb 07, 2025 at 12:47:51PM +0100, Maxime Ripard wrote:
> > > Hi,
> > >
> > > On Thu, Feb 06, 2025 at 07:14:29PM +0100, Luca Ceresoli wrote:
> > > > DRM bridges are currently considered as a fixed element of a DRM card, and
> > > > thus their lifetime is assumed to extend for as long as the card
> > > > exists. New use cases, such as hot-pluggable hardware with video bridges,
> > > > require DRM bridges to be added and removed to a DRM card without tearing
> > > > the card down. This is possible for connectors already (used by DP MST), so
> > > > add this possibility to DRM bridges as well.
> > > >
> > > > Implementation is based on drm_connector_init() as far as it makes sense,
> > > > and differs when it doesn't. A difference is that bridges are not exposed
> > > > to userspace, hence struct drm_bridge does not embed a struct
> > > > drm_mode_object which would provide the refcount. Instead we add to struct
> > > > drm_bridge a refcount field (we don't need other struct drm_mode_object
> > > > fields here) and instead of using the drm_mode_object_*() functions we
> > > > reimplement from those functions the few lines that drm_bridge needs for
> > > > refcounting.
> > > >
> > > > Also add a new devm_drm_bridge_alloc() macro to allocate a new refcounted
> > > > bridge.
> > > >
> > > > Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
> > >
> > > So, a couple of general comments:
> > >
> > > - I've said it a couple of times already, but I really think you're
> > >   making it harder than necessary for you here. This (and only this!)
> > >   should be the very first series you should be pushing. The rest can
> > >   only ever work if that work goes through, and it's already hard enough
> > >   as it is. So, split that patch into a series of its own, get that
> > >   merged, and then we will be able to deal with panels conversion and
> > >   whatever. That's even more true with panels since there's ongoing work
> > >   that will make it easier for you too. So the best thing here is
> > >   probably to wait.
> >
> > Luca and I had a quick chat on this during FOSDEM. I really think that
> > panel (part of the) series can go in first as it fixes a very well known
> > bug _and_ allows a pretty good cleanup to a whole set of drivers.
>
> I don't necessarily disagree on principle, but if you state that it can
> get first, and fixes a known problem (which one?), then it should be a
> separate, standalone, series.

A problem of panel bridges having the wrong lifetime because of devm_
attachment to a wrong device and so either being kept for too long or
being destroyed too early.

>
> Ever-expanding features are bad for both the reviewers and the
> contributors, even more so when the discussion happens off-list.
>
> > With all those panel / bridge wrappers gone we should be able to see a
> > clearer picture of what individual drivers are doing. In other words,
> > which memory and which code actually hosts and uses internal
> > 'next_bridge' reference.
> >
> > > - This patch really needs to be split into several patches, something
> > >   along the lines of:
> > >
> > >   + Creating devm_drm_bridge_alloc()
> > >   + Adding refcounting
> > >   + Taking the references in all the needed places
> > >   + Converting a bunch of drivers
> >
> > The last two parts seem troublematic to me, but, I must admit, I didn't
> > spend so much time reviewing all drm_bridge usage patterns.
>
> Why? the third one is already done by that patch, the fourth can
> relatively easily be done using coccinelle.

I have doubts about cocci. It doesn't have a way to know, what is the
lifetime of the references to the reference-holding memory. Maybe I'm
missing a point there.
Luca Ceresoli Feb. 10, 2025, 5:12 p.m. UTC | #5
Hello Maxime, Dmitry,

On Mon, 10 Feb 2025 16:23:44 +0200
Dmitry Baryshkov <dmitry.baryshkov@linaro.org> wrote:

> On Mon, 10 Feb 2025 at 14:31, Maxime Ripard <mripard@kernel.org>
> wrote:
> >
> > On Fri, Feb 07, 2025 at 09:54:06PM +0200, Dmitry Baryshkov wrote:  
> > > On Fri, Feb 07, 2025 at 12:47:51PM +0100, Maxime Ripard wrote:  
> > > > Hi,
> > > >
> > > > On Thu, Feb 06, 2025 at 07:14:29PM +0100, Luca Ceresoli wrote:  
> > > > > DRM bridges are currently considered as a fixed element of a
> > > > > DRM card, and thus their lifetime is assumed to extend for as
> > > > > long as the card exists. New use cases, such as hot-pluggable
> > > > > hardware with video bridges, require DRM bridges to be added
> > > > > and removed to a DRM card without tearing the card down. This
> > > > > is possible for connectors already (used by DP MST), so add
> > > > > this possibility to DRM bridges as well.
> > > > >
> > > > > Implementation is based on drm_connector_init() as far as it
> > > > > makes sense, and differs when it doesn't. A difference is
> > > > > that bridges are not exposed to userspace, hence struct
> > > > > drm_bridge does not embed a struct drm_mode_object which
> > > > > would provide the refcount. Instead we add to struct
> > > > > drm_bridge a refcount field (we don't need other struct
> > > > > drm_mode_object fields here) and instead of using the
> > > > > drm_mode_object_*() functions we reimplement from those
> > > > > functions the few lines that drm_bridge needs for refcounting.
> > > > >
> > > > > Also add a new devm_drm_bridge_alloc() macro to allocate a
> > > > > new refcounted bridge.
> > > > >
> > > > > Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>  
> > > >
> > > > So, a couple of general comments:
> > > >
> > > > - I've said it a couple of times already, but I really think
> > > > you're making it harder than necessary for you here. This (and
> > > > only this!) should be the very first series you should be
> > > > pushing. The rest can only ever work if that work goes through,
> > > > and it's already hard enough as it is. So, split that patch
> > > > into a series of its own, get that merged, and then we will be
> > > > able to deal with panels conversion and whatever. That's even
> > > > more true with panels since there's ongoing work that will make
> > > > it easier for you too. So the best thing here is probably to
> > > > wait.  

The idea you proposed was to handle the issues current panel bridge
code adds to the hotplug work by adding a .destroy callback and some
more devm magic. I explored the idea but even after some clarifications
from you it still didn't appear clearly doable and correct to me. And
even in the case it were perfectly correct and doable, it is based on
adding more complexity and "magic" on top of a topic that is already
hard to understand: panel_bridge lifetime.

So I opted for the other way: rework panel_bridge code so its lifetime
is clear and as one would expect (panel_bridge lifetime == panel
lifetime).

Possibly more work for me, but now it's done and it's in these patches
so why waiting?

This work has made the hotplug work on top of it way cleaner, and you
can also see the good cleanup in the samsung-dsim driver in patch 11.

I'm at work to apply the improvements suggested by Dmitry and send a
new version, and I'm fine with applying more fixes as needed.

And after this "basic" panel_bridge, more cleanups and rationalization
can be done at any moment, in order to make the panel and panel_bridge
even closer, and remove the code duplication.

> > > Luca and I had a quick chat on this during FOSDEM. I really think
> > > that panel (part of the) series can go in first as it fixes a
> > > very well known bug _and_ allows a pretty good cleanup to a whole
> > > set of drivers.  
> >
> > I don't necessarily disagree on principle, but if you state that it
> > can get first, and fixes a known problem (which one?), then it
> > should be a separate, standalone, series.  
> 
> A problem of panel bridges having the wrong lifetime because of devm_
> attachment to a wrong device and so either being kept for too long or
> being destroyed too early.

Exactly, because panel_bridge is usually created in the consumer driver
context, so the devm calls get a struct device pointer to the consumer,
not the panel itself (more in patch 8).

> > Ever-expanding features are bad for both the reviewers and the
> > contributors, even more so when the discussion happens off-list.
> >  
> > > With all those panel / bridge wrappers gone we should be able to
> > > see a clearer picture of what individual drivers are doing. In
> > > other words, which memory and which code actually hosts and uses
> > > internal 'next_bridge' reference.
> > >  
> > > > - This patch really needs to be split into several patches,
> > > > something along the lines of:
> > > >
> > > >   + Creating devm_drm_bridge_alloc()
> > > >   + Adding refcounting
> > > >   + Taking the references in all the needed places
> > > >   + Converting a bunch of drivers  

OK, I can split it.

> > > The last two parts seem troublematic to me, but, I must admit, I
> > > didn't spend so much time reviewing all drm_bridge usage
> > > patterns.  
> >
> > Why? the third one is already done by that patch, the fourth can
> > relatively easily be done using coccinelle.  
> 
> I have doubts about cocci. It doesn't have a way to know, what is the
> lifetime of the references to the reference-holding memory. Maybe I'm
> missing a point there.

I haven't looked into that, but I think coccinelle would be helpful in
adding the needed gets, but probably not for the puts, which are
usually more complex to get right.

Luca
Luca Ceresoli Feb. 10, 2025, 5:12 p.m. UTC | #6
Hi Maxime, Dmitry

On Fri, 7 Feb 2025 21:54:06 +0200
Dmitry Baryshkov <dmitry.baryshkov@linaro.org> wrote:

> > > +/* Internal function (for refcounted bridges) */
> > > +void __drm_bridge_free(struct kref *kref)
> > > +{
> > > +	struct drm_bridge *bridge = container_of(kref, struct drm_bridge, refcount);
> > > +	void *container = ((void *)bridge) - bridge->container_offset;
> > > +
> > > +	DRM_DEBUG("bridge=%p, container=%p FREE\n", bridge, container);  
> > 
> > Pointers are not really useful to track here, since they are obfuscated
> > most of the time. Using the bridge device name would probably be better
> > (or removing the SHOUTING DEBUG entirely :))  
> 
> bridge device name or bridge funcs (I opted for the latter for the
> debugfs file)

These DRM_DEBUG()s proved extremely useful exactly because of the
pointer. This is because when using hotplug one normally has the same
device added and removed multiple times, and so the device name or
bridge funcs is always the same, preventing from understanding which
instance is leaking, or being freed, get, put, etc.

Do you think this is a sufficient motivation to keep it?

Luca
Luca Ceresoli Feb. 10, 2025, 5:12 p.m. UTC | #7
Hello Maxime,

On Fri, 7 Feb 2025 12:47:51 +0100
Maxime Ripard <mripard@kernel.org> wrote:

> > diff --git a/include/drm/drm_bridge.h b/include/drm/drm_bridge.h
> > index ad7ba444a13e5ecf16f996de3742e4ac67dc21f1..43cef0f6ccd36034f64ad2babfebea62db1d9e43 100644
> > --- a/include/drm/drm_bridge.h
> > +++ b/include/drm/drm_bridge.h
> > @@ -31,6 +31,7 @@
> >  #include <drm/drm_encoder.h>
> >  #include <drm/drm_mode_object.h>
> >  #include <drm/drm_modes.h>
> > +#include <drm/drm_print.h>
> >  
> >  struct device_node;
> >  
> > @@ -863,6 +864,22 @@ struct drm_bridge {
> >  	const struct drm_bridge_timings *timings;
> >  	/** @funcs: control functions */
> >  	const struct drm_bridge_funcs *funcs;
> > +
> > +	/**
> > +	 * @container_offset: Offset of this struct within the container
> > +	 * struct embedding it. Used for refcounted bridges to free the
> > +	 * embeddeing struct when the refcount drops to zero. Unused on
> > +	 * legacy bridges.
> > +	 */
> > +	size_t container_offset;  
> 
> This shouldn't be in there. You can create an intermediate structure and
> store both pointers for the action to consume.

You mean to store container_offset + refcount + is_refcounted?

It can be named drm_bridge_object maybe, as it is somewhat resembling
struct drm_mode_object?

> > +	/**
> > +	 * @refcount: reference count for bridges with dynamic lifetime
> > +	 * (see drm_bridge_init)
> > +	 */
> > +	struct kref refcount;
> > +	/** @is_refcounted: this bridge has dynamic lifetime management */
> > +	bool is_refcounted;
> > +  
> 
> I'm not sure we want to treat both paths separately too. It'll require
> to update most of/all the drivers, but it's not too hard with
> coccinelle:
> 
> virtual patch
> 
> @@
> identifier f;
> identifier b, c, d;
> expression bf;
> type T;
> @@
> 
>  f(...)
>  {
> 	...
> -	T *c;
> +	T *c;
> 	...
> -	c = devm_kzalloc(d, ...);
> +	c = devm_drm_bridge_alloc(d, T, b, bf);
> 	...
> -	c->b.funcs = bf;
> 	...
> 	drm_bridge_add(&c->b);
> 	...
>  }
> 
> We need to add a bit more variations (like kzalloc vs devm_kzalloc,
> drm_bridge_add vs devm_drm_bridge_add, etc.), but it should be a good
> first approximation

Sure, this can be useful, thanks.

Luca
Maxime Ripard Feb. 10, 2025, 6:16 p.m. UTC | #8
On Mon, Feb 10, 2025 at 06:12:03PM +0100, Luca Ceresoli wrote:
> Hello Maxime, Dmitry,
> 
> On Mon, 10 Feb 2025 16:23:44 +0200
> Dmitry Baryshkov <dmitry.baryshkov@linaro.org> wrote:
> 
> > On Mon, 10 Feb 2025 at 14:31, Maxime Ripard <mripard@kernel.org>
> > wrote:
> > >
> > > On Fri, Feb 07, 2025 at 09:54:06PM +0200, Dmitry Baryshkov wrote:  
> > > > On Fri, Feb 07, 2025 at 12:47:51PM +0100, Maxime Ripard wrote:  
> > > > > Hi,
> > > > >
> > > > > On Thu, Feb 06, 2025 at 07:14:29PM +0100, Luca Ceresoli wrote:  
> > > > > > DRM bridges are currently considered as a fixed element of a
> > > > > > DRM card, and thus their lifetime is assumed to extend for as
> > > > > > long as the card exists. New use cases, such as hot-pluggable
> > > > > > hardware with video bridges, require DRM bridges to be added
> > > > > > and removed to a DRM card without tearing the card down. This
> > > > > > is possible for connectors already (used by DP MST), so add
> > > > > > this possibility to DRM bridges as well.
> > > > > >
> > > > > > Implementation is based on drm_connector_init() as far as it
> > > > > > makes sense, and differs when it doesn't. A difference is
> > > > > > that bridges are not exposed to userspace, hence struct
> > > > > > drm_bridge does not embed a struct drm_mode_object which
> > > > > > would provide the refcount. Instead we add to struct
> > > > > > drm_bridge a refcount field (we don't need other struct
> > > > > > drm_mode_object fields here) and instead of using the
> > > > > > drm_mode_object_*() functions we reimplement from those
> > > > > > functions the few lines that drm_bridge needs for refcounting.
> > > > > >
> > > > > > Also add a new devm_drm_bridge_alloc() macro to allocate a
> > > > > > new refcounted bridge.
> > > > > >
> > > > > > Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>  
> > > > >
> > > > > So, a couple of general comments:
> > > > >
> > > > > - I've said it a couple of times already, but I really think
> > > > > you're making it harder than necessary for you here. This (and
> > > > > only this!) should be the very first series you should be
> > > > > pushing. The rest can only ever work if that work goes through,
> > > > > and it's already hard enough as it is. So, split that patch
> > > > > into a series of its own, get that merged, and then we will be
> > > > > able to deal with panels conversion and whatever. That's even
> > > > > more true with panels since there's ongoing work that will make
> > > > > it easier for you too. So the best thing here is probably to
> > > > > wait.  
> 
> The idea you proposed was to handle the issues current panel bridge
> code adds to the hotplug work by adding a .destroy callback and some
> more devm magic. I explored the idea but even after some clarifications
> from you it still didn't appear clearly doable and correct to me. And
> even in the case it were perfectly correct and doable, it is based on
> adding more complexity and "magic" on top of a topic that is already
> hard to understand: panel_bridge lifetime.

Not really, no. I told you several time that you shouldn't deal with
panels yet.

> So I opted for the other way: rework panel_bridge code so its lifetime
> is clear and as one would expect (panel_bridge lifetime == panel
> lifetime).
> 
> Possibly more work for me, but now it's done and it's in these patches
> so why waiting?

No, it's not done. You have the same issue with panels than you are
trying to fix with bridges: it's allocated through devm so they'll get
destroyed too soon. The panel_bridge might work fine now, but the panel
won't.

So it's more work, more scope-creep, and more discussions. For example,
I'm really not convinced on moving the drm_panel code under bridge.

Splitting it up will allow you to at least merge the parts that are
somewhat agreed upon. But do however you want :)

Maxime
Maxime Ripard Feb. 10, 2025, 6:17 p.m. UTC | #9
On Mon, Feb 10, 2025 at 04:23:44PM +0200, Dmitry Baryshkov wrote:
> On Mon, 10 Feb 2025 at 14:31, Maxime Ripard <mripard@kernel.org> wrote:
> >
> > On Fri, Feb 07, 2025 at 09:54:06PM +0200, Dmitry Baryshkov wrote:
> > > On Fri, Feb 07, 2025 at 12:47:51PM +0100, Maxime Ripard wrote:
> > > > Hi,
> > > >
> > > > On Thu, Feb 06, 2025 at 07:14:29PM +0100, Luca Ceresoli wrote:
> > > > > DRM bridges are currently considered as a fixed element of a DRM card, and
> > > > > thus their lifetime is assumed to extend for as long as the card
> > > > > exists. New use cases, such as hot-pluggable hardware with video bridges,
> > > > > require DRM bridges to be added and removed to a DRM card without tearing
> > > > > the card down. This is possible for connectors already (used by DP MST), so
> > > > > add this possibility to DRM bridges as well.
> > > > >
> > > > > Implementation is based on drm_connector_init() as far as it makes sense,
> > > > > and differs when it doesn't. A difference is that bridges are not exposed
> > > > > to userspace, hence struct drm_bridge does not embed a struct
> > > > > drm_mode_object which would provide the refcount. Instead we add to struct
> > > > > drm_bridge a refcount field (we don't need other struct drm_mode_object
> > > > > fields here) and instead of using the drm_mode_object_*() functions we
> > > > > reimplement from those functions the few lines that drm_bridge needs for
> > > > > refcounting.
> > > > >
> > > > > Also add a new devm_drm_bridge_alloc() macro to allocate a new refcounted
> > > > > bridge.
> > > > >
> > > > > Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
> > > >
> > > > So, a couple of general comments:
> > > >
> > > > - I've said it a couple of times already, but I really think you're
> > > >   making it harder than necessary for you here. This (and only this!)
> > > >   should be the very first series you should be pushing. The rest can
> > > >   only ever work if that work goes through, and it's already hard enough
> > > >   as it is. So, split that patch into a series of its own, get that
> > > >   merged, and then we will be able to deal with panels conversion and
> > > >   whatever. That's even more true with panels since there's ongoing work
> > > >   that will make it easier for you too. So the best thing here is
> > > >   probably to wait.
> > >
> > > Luca and I had a quick chat on this during FOSDEM. I really think that
> > > panel (part of the) series can go in first as it fixes a very well known
> > > bug _and_ allows a pretty good cleanup to a whole set of drivers.
> >
> > I don't necessarily disagree on principle, but if you state that it can
> > get first, and fixes a known problem (which one?), then it should be a
> > separate, standalone, series.
> 
> A problem of panel bridges having the wrong lifetime because of devm_
> attachment to a wrong device and so either being kept for too long or
> being destroyed too early.

Yeah, and panels too. Which isn't solved there.

> > Ever-expanding features are bad for both the reviewers and the
> > contributors, even more so when the discussion happens off-list.
> >
> > > With all those panel / bridge wrappers gone we should be able to see a
> > > clearer picture of what individual drivers are doing. In other words,
> > > which memory and which code actually hosts and uses internal
> > > 'next_bridge' reference.
> > >
> > > > - This patch really needs to be split into several patches, something
> > > >   along the lines of:
> > > >
> > > >   + Creating devm_drm_bridge_alloc()
> > > >   + Adding refcounting
> > > >   + Taking the references in all the needed places
> > > >   + Converting a bunch of drivers
> > >
> > > The last two parts seem troublematic to me, but, I must admit, I didn't
> > > spend so much time reviewing all drm_bridge usage patterns.
> >
> > Why? the third one is already done by that patch, the fourth can
> > relatively easily be done using coccinelle.
> 
> I have doubts about cocci. It doesn't have a way to know, what is the
> lifetime of the references to the reference-holding memory. Maybe I'm
> missing a point there.

It doesn't matter? Cocci doesn't (have to) understand the code, it
generates a patch for you. It's up to the author to make a correct
patch.

Maxime
Dmitry Baryshkov Feb. 10, 2025, 11:14 p.m. UTC | #10
On Mon, Feb 10, 2025 at 06:12:44PM +0100, Luca Ceresoli wrote:
> Hi Maxime, Dmitry
> 
> On Fri, 7 Feb 2025 21:54:06 +0200
> Dmitry Baryshkov <dmitry.baryshkov@linaro.org> wrote:
> 
> > > > +/* Internal function (for refcounted bridges) */
> > > > +void __drm_bridge_free(struct kref *kref)
> > > > +{
> > > > +	struct drm_bridge *bridge = container_of(kref, struct drm_bridge, refcount);
> > > > +	void *container = ((void *)bridge) - bridge->container_offset;
> > > > +
> > > > +	DRM_DEBUG("bridge=%p, container=%p FREE\n", bridge, container);  
> > > 
> > > Pointers are not really useful to track here, since they are obfuscated
> > > most of the time. Using the bridge device name would probably be better
> > > (or removing the SHOUTING DEBUG entirely :))  
> > 
> > bridge device name or bridge funcs (I opted for the latter for the
> > debugfs file)
> 
> These DRM_DEBUG()s proved extremely useful exactly because of the
> pointer. This is because when using hotplug one normally has the same
> device added and removed multiple times, and so the device name or
> bridge funcs is always the same, preventing from understanding which
> instance is leaking, or being freed, get, put, etc.
> 
> Do you think this is a sufficient motivation to keep it?

Then it should be something like %px. I found that %p is mangled.
What about having both device name _and_ a pointer?

> 
> Luca
> 
> -- 
> Luca Ceresoli, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com
Maxime Ripard Feb. 11, 2025, 8:48 a.m. UTC | #11
On Tue, Feb 11, 2025 at 01:14:28AM +0200, Dmitry Baryshkov wrote:
> On Mon, Feb 10, 2025 at 06:12:44PM +0100, Luca Ceresoli wrote:
> > Hi Maxime, Dmitry
> > 
> > On Fri, 7 Feb 2025 21:54:06 +0200
> > Dmitry Baryshkov <dmitry.baryshkov@linaro.org> wrote:
> > 
> > > > > +/* Internal function (for refcounted bridges) */
> > > > > +void __drm_bridge_free(struct kref *kref)
> > > > > +{
> > > > > +	struct drm_bridge *bridge = container_of(kref, struct drm_bridge, refcount);
> > > > > +	void *container = ((void *)bridge) - bridge->container_offset;
> > > > > +
> > > > > +	DRM_DEBUG("bridge=%p, container=%p FREE\n", bridge, container);  
> > > > 
> > > > Pointers are not really useful to track here, since they are obfuscated
> > > > most of the time. Using the bridge device name would probably be better
> > > > (or removing the SHOUTING DEBUG entirely :))  
> > > 
> > > bridge device name or bridge funcs (I opted for the latter for the
> > > debugfs file)
> > 
> > These DRM_DEBUG()s proved extremely useful exactly because of the
> > pointer. This is because when using hotplug one normally has the same
> > device added and removed multiple times, and so the device name or
> > bridge funcs is always the same, preventing from understanding which
> > instance is leaking, or being freed, get, put, etc.
> > 
> > Do you think this is a sufficient motivation to keep it?
> 
> Then it should be something like %px. I found that %p is mangled.
> What about having both device name _and_ a pointer?

No, %px must not be used there. %p is mangled but should be consistent
across calls. But yeah, it's kind of the reason I suggested to use the
bridge device name instead.

Maxime
Maxime Ripard Feb. 11, 2025, 1:10 p.m. UTC | #12
On Mon, Feb 10, 2025 at 06:12:52PM +0100, Luca Ceresoli wrote:
> Hello Maxime,
> 
> On Fri, 7 Feb 2025 12:47:51 +0100
> Maxime Ripard <mripard@kernel.org> wrote:
> 
> > > diff --git a/include/drm/drm_bridge.h b/include/drm/drm_bridge.h
> > > index ad7ba444a13e5ecf16f996de3742e4ac67dc21f1..43cef0f6ccd36034f64ad2babfebea62db1d9e43 100644
> > > --- a/include/drm/drm_bridge.h
> > > +++ b/include/drm/drm_bridge.h
> > > @@ -31,6 +31,7 @@
> > >  #include <drm/drm_encoder.h>
> > >  #include <drm/drm_mode_object.h>
> > >  #include <drm/drm_modes.h>
> > > +#include <drm/drm_print.h>
> > >  
> > >  struct device_node;
> > >  
> > > @@ -863,6 +864,22 @@ struct drm_bridge {
> > >  	const struct drm_bridge_timings *timings;
> > >  	/** @funcs: control functions */
> > >  	const struct drm_bridge_funcs *funcs;
> > > +
> > > +	/**
> > > +	 * @container_offset: Offset of this struct within the container
> > > +	 * struct embedding it. Used for refcounted bridges to free the
> > > +	 * embeddeing struct when the refcount drops to zero. Unused on
> > > +	 * legacy bridges.
> > > +	 */
> > > +	size_t container_offset;  
> > 
> > This shouldn't be in there. You can create an intermediate structure and
> > store both pointers for the action to consume.
> 
> You mean to store container_offset + refcount + is_refcounted?

No, I meant for the private structure pointer and the drm_bridge
pointer. refcount should be in drm_bridge, and I think is_refcounted
should be dropped.

> It can be named drm_bridge_object maybe, as it is somewhat resembling
> struct drm_mode_object?
> 
> > > +	/**
> > > +	 * @refcount: reference count for bridges with dynamic lifetime
> > > +	 * (see drm_bridge_init)
> > > +	 */
> > > +	struct kref refcount;
> > > +	/** @is_refcounted: this bridge has dynamic lifetime management */
> > > +	bool is_refcounted;
> > > +  
> > 
> > I'm not sure we want to treat both paths separately too. It'll require
> > to update most of/all the drivers, but it's not too hard with
> > coccinelle:
> > 
> > virtual patch
> > 
> > @@
> > identifier f;
> > identifier b, c, d;
> > expression bf;
> > type T;
> > @@
> > 
> >  f(...)
> >  {
> > 	...
> > -	T *c;
> > +	T *c;
> > 	...
> > -	c = devm_kzalloc(d, ...);
> > +	c = devm_drm_bridge_alloc(d, T, b, bf);
> > 	...
> > -	c->b.funcs = bf;
> > 	...
> > 	drm_bridge_add(&c->b);
> > 	...
> >  }
> > 
> > We need to add a bit more variations (like kzalloc vs devm_kzalloc,
> > drm_bridge_add vs devm_drm_bridge_add, etc.), but it should be a good
> > first approximation
> 
> Sure, this can be useful, thanks.

You can identify all the bridges affected by this issue using:

virtual report

@ find_add @
identifier add_f;
identifier c;
identifier b;
expression d;
position p;
identifier r;
type T;
@@

 add_f(...)
 {
 	...
- 	T *c;
+ 	T *c;
 	...
(
	drm_bridge_add(&c->b)@p;
|
	devm_drm_bridge_add(d, &c->b)@p;
|
	r = devm_drm_bridge_add(d, &c->b)@p;
)
	...
 }

@ find_allocation depends on find_add @
identifier alloc_f;
type find_add.T;
identifier cal;
position p;
@@

 alloc_f(...)
 {
     ...
-    T *cal;
+    T *cal;
     ...
(
     cal = kzalloc(...)@p;
|
     cal = devm_kzalloc(...)@p;
)
     ...
 }

@ script:python depends on report && (find_add && find_allocation) @
add_f << find_add.add_f;
alloc_f << find_allocation.alloc_f;
add_p << find_add.p;
alloc_p << find_allocation.p;
@@

coccilib.report.print_report(alloc_p[0], "ERROR: Bridge Driver is unsafely allocated in %s and added in %s" % (alloc_f, add_f))


Maxime
Dmitry Baryshkov Feb. 12, 2025, 12:55 a.m. UTC | #13
On Tue, Feb 11, 2025 at 09:48:31AM +0100, Maxime Ripard wrote:
> On Tue, Feb 11, 2025 at 01:14:28AM +0200, Dmitry Baryshkov wrote:
> > On Mon, Feb 10, 2025 at 06:12:44PM +0100, Luca Ceresoli wrote:
> > > Hi Maxime, Dmitry
> > > 
> > > On Fri, 7 Feb 2025 21:54:06 +0200
> > > Dmitry Baryshkov <dmitry.baryshkov@linaro.org> wrote:
> > > 
> > > > > > +/* Internal function (for refcounted bridges) */
> > > > > > +void __drm_bridge_free(struct kref *kref)
> > > > > > +{
> > > > > > +	struct drm_bridge *bridge = container_of(kref, struct drm_bridge, refcount);
> > > > > > +	void *container = ((void *)bridge) - bridge->container_offset;
> > > > > > +
> > > > > > +	DRM_DEBUG("bridge=%p, container=%p FREE\n", bridge, container);  
> > > > > 
> > > > > Pointers are not really useful to track here, since they are obfuscated
> > > > > most of the time. Using the bridge device name would probably be better
> > > > > (or removing the SHOUTING DEBUG entirely :))  
> > > > 
> > > > bridge device name or bridge funcs (I opted for the latter for the
> > > > debugfs file)
> > > 
> > > These DRM_DEBUG()s proved extremely useful exactly because of the
> > > pointer. This is because when using hotplug one normally has the same
> > > device added and removed multiple times, and so the device name or
> > > bridge funcs is always the same, preventing from understanding which
> > > instance is leaking, or being freed, get, put, etc.
> > > 
> > > Do you think this is a sufficient motivation to keep it?
> > 
> > Then it should be something like %px. I found that %p is mangled.
> > What about having both device name _and_ a pointer?
> 
> No, %px must not be used there. %p is mangled but should be consistent
> across calls. But yeah, it's kind of the reason I suggested to use the
> bridge device name instead.

Then we need to extend struct drm_bridge with struct device *dev (which
I would appreciate, it will solve whole hdmi_audio_dev / CEC device /
etc story).
Maxime Ripard Feb. 12, 2025, 10:08 a.m. UTC | #14
On Wed, Feb 12, 2025 at 02:55:10AM +0200, Dmitry Baryshkov wrote:
> On Tue, Feb 11, 2025 at 09:48:31AM +0100, Maxime Ripard wrote:
> > On Tue, Feb 11, 2025 at 01:14:28AM +0200, Dmitry Baryshkov wrote:
> > > On Mon, Feb 10, 2025 at 06:12:44PM +0100, Luca Ceresoli wrote:
> > > > Hi Maxime, Dmitry
> > > > 
> > > > On Fri, 7 Feb 2025 21:54:06 +0200
> > > > Dmitry Baryshkov <dmitry.baryshkov@linaro.org> wrote:
> > > > 
> > > > > > > +/* Internal function (for refcounted bridges) */
> > > > > > > +void __drm_bridge_free(struct kref *kref)
> > > > > > > +{
> > > > > > > +	struct drm_bridge *bridge = container_of(kref, struct drm_bridge, refcount);
> > > > > > > +	void *container = ((void *)bridge) - bridge->container_offset;
> > > > > > > +
> > > > > > > +	DRM_DEBUG("bridge=%p, container=%p FREE\n", bridge, container);  
> > > > > > 
> > > > > > Pointers are not really useful to track here, since they are obfuscated
> > > > > > most of the time. Using the bridge device name would probably be better
> > > > > > (or removing the SHOUTING DEBUG entirely :))  
> > > > > 
> > > > > bridge device name or bridge funcs (I opted for the latter for the
> > > > > debugfs file)
> > > > 
> > > > These DRM_DEBUG()s proved extremely useful exactly because of the
> > > > pointer. This is because when using hotplug one normally has the same
> > > > device added and removed multiple times, and so the device name or
> > > > bridge funcs is always the same, preventing from understanding which
> > > > instance is leaking, or being freed, get, put, etc.
> > > > 
> > > > Do you think this is a sufficient motivation to keep it?
> > > 
> > > Then it should be something like %px. I found that %p is mangled.
> > > What about having both device name _and_ a pointer?
> > 
> > No, %px must not be used there. %p is mangled but should be consistent
> > across calls. But yeah, it's kind of the reason I suggested to use the
> > bridge device name instead.
> 
> Then we need to extend struct drm_bridge with struct device *dev (which
> I would appreciate, it will solve whole hdmi_audio_dev / CEC device /
> etc story).

Let's not get carried away and start yet another side discussion here.
Most of these log messages need to be reworked anyway, so I'm sure we
can find something that wouldn't require yet another rework to argue
about.

Maxime
diff mbox series

Patch

diff --git a/drivers/gpu/drm/drm_bridge.c b/drivers/gpu/drm/drm_bridge.c
index 1955a231378050abf1071d74a145831b425c47c5..f694b32ca59cb91c32846bc00b43360df41cc1ad 100644
--- a/drivers/gpu/drm/drm_bridge.c
+++ b/drivers/gpu/drm/drm_bridge.c
@@ -200,6 +200,57 @@ 
 DEFINE_MUTEX(bridge_lock);
 LIST_HEAD(bridge_list);
 
+/* Internal function (for refcounted bridges) */
+void __drm_bridge_free(struct kref *kref)
+{
+	struct drm_bridge *bridge = container_of(kref, struct drm_bridge, refcount);
+	void *container = ((void *)bridge) - bridge->container_offset;
+
+	DRM_DEBUG("bridge=%p, container=%p FREE\n", bridge, container);
+
+	kfree(container);
+}
+EXPORT_SYMBOL(__drm_bridge_free);
+
+static void drm_bridge_put_void(void *data)
+{
+	struct drm_bridge *bridge = (struct drm_bridge *)data;
+
+	drm_bridge_put(bridge);
+}
+
+void *__devm_drm_bridge_alloc(struct device *dev, size_t size, size_t offset,
+			      const struct drm_bridge_funcs *funcs)
+{
+	void *container;
+	struct drm_bridge *bridge;
+	int err;
+
+	if (!funcs) {
+		dev_warn(dev, "Missing funcs pointer\n");
+		return ERR_PTR(-EINVAL);
+	}
+
+	container = kzalloc(size, GFP_KERNEL);
+	if (!container)
+		return ERR_PTR(-ENOMEM);
+
+	bridge = container + offset;
+	bridge->container_offset = offset;
+	bridge->funcs = funcs;
+	kref_init(&bridge->refcount);
+	bridge->is_refcounted = 1;
+
+	err = devm_add_action_or_reset(dev, drm_bridge_put_void, bridge);
+	if (err)
+		return ERR_PTR(err);
+
+	DRM_DEBUG("bridge=%p, container=%p, funcs=%ps ALLOC\n", bridge, container, funcs);
+
+	return container;
+}
+EXPORT_SYMBOL(__devm_drm_bridge_alloc);
+
 /**
  * drm_bridge_add - add the given bridge to the global bridge list
  *
@@ -209,6 +260,10 @@  void drm_bridge_add(struct drm_bridge *bridge)
 {
 	struct drm_bridge *br, *tmp;
 
+	DRM_DEBUG("bridge=%p ADD\n", bridge);
+
+	drm_bridge_get(bridge);
+
 	mutex_init(&bridge->hpd_mutex);
 
 	if (bridge->ops & DRM_BRIDGE_OP_HDMI)
@@ -257,6 +312,8 @@  void drm_bridge_remove(struct drm_bridge *bridge)
 {
 	struct drm_bridge *br, *tmp;
 
+	DRM_DEBUG("bridge=%p REMOVE\n", bridge);
+
 	mutex_lock(&bridge_lock);
 	list_del_init(&bridge->list);
 	mutex_unlock(&bridge_lock);
@@ -266,6 +323,8 @@  void drm_bridge_remove(struct drm_bridge *bridge)
 			br->funcs->bridge_event_notify(br, DRM_EVENT_BRIDGE_REMOVE, bridge);
 
 	mutex_destroy(&bridge->hpd_mutex);
+
+	drm_bridge_put(bridge);
 }
 EXPORT_SYMBOL(drm_bridge_remove);
 
@@ -326,11 +385,17 @@  int drm_bridge_attach(struct drm_encoder *encoder, struct drm_bridge *bridge,
 	if (!encoder || !bridge)
 		return -EINVAL;
 
-	if (previous && (!previous->dev || previous->encoder != encoder))
-		return -EINVAL;
+	drm_bridge_get(bridge);
 
-	if (bridge->dev)
-		return -EBUSY;
+	if (previous && (!previous->dev || previous->encoder != encoder)) {
+		ret = -EINVAL;
+		goto err_put_bridge;
+	}
+
+	if (bridge->dev) {
+		ret = -EBUSY;
+		goto err_put_bridge;
+	}
 
 	bridge->dev = encoder->dev;
 	bridge->encoder = encoder;
@@ -379,6 +444,8 @@  int drm_bridge_attach(struct drm_encoder *encoder, struct drm_bridge *bridge,
 			      "failed to attach bridge %pOF to encoder %s\n",
 			      bridge->of_node, encoder->name);
 
+err_put_bridge:
+	drm_bridge_put(bridge);
 	return ret;
 }
 EXPORT_SYMBOL(drm_bridge_attach);
@@ -399,6 +466,7 @@  void drm_bridge_detach(struct drm_bridge *bridge)
 
 	list_del(&bridge->chain_node);
 	bridge->dev = NULL;
+	drm_bridge_put(bridge);
 }
 
 /**
diff --git a/include/drm/drm_bridge.h b/include/drm/drm_bridge.h
index ad7ba444a13e5ecf16f996de3742e4ac67dc21f1..43cef0f6ccd36034f64ad2babfebea62db1d9e43 100644
--- a/include/drm/drm_bridge.h
+++ b/include/drm/drm_bridge.h
@@ -31,6 +31,7 @@ 
 #include <drm/drm_encoder.h>
 #include <drm/drm_mode_object.h>
 #include <drm/drm_modes.h>
+#include <drm/drm_print.h>
 
 struct device_node;
 
@@ -863,6 +864,22 @@  struct drm_bridge {
 	const struct drm_bridge_timings *timings;
 	/** @funcs: control functions */
 	const struct drm_bridge_funcs *funcs;
+
+	/**
+	 * @container_offset: Offset of this struct within the container
+	 * struct embedding it. Used for refcounted bridges to free the
+	 * embeddeing struct when the refcount drops to zero. Unused on
+	 * legacy bridges.
+	 */
+	size_t container_offset;
+	/**
+	 * @refcount: reference count for bridges with dynamic lifetime
+	 * (see drm_bridge_init)
+	 */
+	struct kref refcount;
+	/** @is_refcounted: this bridge has dynamic lifetime management */
+	bool is_refcounted;
+
 	/** @driver_private: pointer to the bridge driver's internal context */
 	void *driver_private;
 	/** @ops: bitmask of operations supported by the bridge */
@@ -964,6 +981,106 @@  drm_priv_to_bridge(struct drm_private_obj *priv)
 	return container_of(priv, struct drm_bridge, base);
 }
 
+static inline bool drm_bridge_is_refcounted(struct drm_bridge *bridge)
+{
+	return bridge->is_refcounted;
+}
+
+void __drm_bridge_free(struct kref *kref);
+
+/**
+ * drm_bridge_get - Acquire a bridge reference
+ * @bridge: DRM bridge
+ *
+ * This function increments the bridge's refcount.
+ *
+ * It does nothing on non-refcounted bridges. See drm_bridge_init().
+ */
+static inline void drm_bridge_get(struct drm_bridge *bridge)
+{
+	if (!drm_bridge_is_refcounted(bridge))
+		return;
+
+	DRM_DEBUG("bridge=%p GET\n", bridge);
+
+	kref_get(&bridge->refcount);
+}
+
+/**
+ * drm_bridge_put - Release a bridge reference
+ * @bridge: DRM bridge
+ *
+ * This function decrements the bridge's reference count and frees the
+ * object if the reference count drops to zero.
+ *
+ * It does nothing on non-refcounted bridges. See drm_bridge_init().
+ *
+ * See also drm_bridge_put_and_clear() which is more handy in many cases.
+ */
+static inline void drm_bridge_put(struct drm_bridge *bridge)
+{
+	if (!drm_bridge_is_refcounted(bridge))
+		return;
+
+	DRM_DEBUG("bridge=%p PUT\n", bridge);
+
+	kref_put(&bridge->refcount, __drm_bridge_free);
+}
+
+/**
+ * drm_bridge_put_and_clear - Given a bridge pointer, clear the pointer
+ *                            then put the bridge
+ *
+ * @bridge_pp: pointer to pointer to a struct drm_bridge
+ *
+ * Helper to put a DRM bridge (whose pointer is passed), but only after
+ * setting its pointer to NULL. Useful for drivers having struct drm_bridge
+ * pointers they need to dispose of, without leaving a use-after-free
+ * window where the pointed bridge might have been freed while still
+ * holding a pointer to it.
+ *
+ * For example a driver having this private struct::
+ *
+ *     struct my_bridge {
+ *         struct drm_bridge *remote_bridge;
+ *         ...
+ *     };
+ *
+ * can dispose of remote_bridge using::
+ *
+ *     drm_bridge_put_and_clear(&my_bridge->remote_bridge);
+ */
+static inline void drm_bridge_put_and_clear(struct drm_bridge **bridge_pp)
+{
+	struct drm_bridge *bridge = *bridge_pp;
+
+	*bridge_pp = NULL;
+	drm_bridge_put(bridge);
+}
+
+void *__devm_drm_bridge_alloc(struct device *dev, size_t size, size_t offset,
+			      const struct drm_bridge_funcs *funcs);
+
+/**
+ * devm_drm_bridge_alloc - Allocate and initialize an refcounted bridge
+ * @dev: struct device of the bridge device
+ * @type: the type of the struct which contains struct &drm_bridge
+ * @member: the name of the &drm_bridge within @type
+ * @funcs: callbacks for this bridge
+ *
+ * The bridge will have a dynamic lifetime (aka a refcounted bridge).
+ *
+ * The returned refcount is initialized to 1. This reference will be
+ * automatically dropped via devm (by calling drm_bridge_put()) when the
+ * device is removed.
+ *
+ * Returns:
+ * Pointer to new bridge, or ERR_PTR on failure.
+ */
+#define devm_drm_bridge_alloc(dev, type, member, funcs) \
+	((type *)__devm_drm_bridge_alloc(dev, sizeof(type), \
+					 offsetof(type, member), funcs))
+
 void drm_bridge_add(struct drm_bridge *bridge);
 int devm_drm_bridge_add(struct device *dev, struct drm_bridge *bridge);
 void drm_bridge_remove(struct drm_bridge *bridge);