mbox series

[v2,0/7] Devm helpers for regulator get and enable

Message ID cover.1660292316.git.mazziesaccount@gmail.com (mailing list archive)
Headers show
Series Devm helpers for regulator get and enable | expand

Message

Matti Vaittinen Aug. 12, 2022, 10:08 a.m. UTC
Devm helpers for regulator get and enable

First patch in the series is actually just a simple documentation fix
which could be taken in as it is now.

A few* drivers seem to use pattern demonstrated by pseudocode:

- devm_regulator_get()
- regulator_enable()
- devm_add_action_or_reset(regulator_disable())

Introducing devm helpers for this pattern would remove bunch of code from
drivers. Typically following:

- replace 3 calls (devm_regulator_get[_optional](), regulator_enable(),
  devm_add_action_or_reset()) with just one
  (devm_regulator_get_enable[_optional]()).
- drop disable callback.
- remove stored pointer to struct regulator - which can lead to problem
  when an devm action for regulator_disable is used.

I believe this simplifies things by removing some dublicated code.

The suggested managed 'get_enable' APIs do not return the pointer to
regulators for user because any call to regulator_disable()
(or regulator_enable()) may easily lead to regulator enable count imbalance
upon device detach. (Eg, if someone calls regulator_disable() and the
device is then detached before user has re-enabled the regulator). Not
returning the pointer to obtained regulator to caller is a good hint that
the enable/disable should not be manually handled when these APIs are used.

OTOH, not returning the pointer reduces the use-cases by not allowing
the consumers to perform other regulator actions. For example request the
voltages. A few drivers which used the "get, enable,
devm_action_to_disable" did also query the voltages. The API does not suit
needs of such users.

This series reowrks only a few drivers as I am short of time. So, there is
still plenty of fish in the sea for people who like to improve the code
(or count the beans ;]).

Finally - most of the converted drivers have not been tested (other than
compile-tested) due to lack of HW. All reviews and testing is _highly_
appreciated (as always!).

Revision history:

RFCv1 => v2:
	- Add devm_regulator_bulk_get_enable() and
	  devm_regulator_bulk_put()
	- Convert a couple of drivers to use the new
	  devm_regulator_bulk_get_enable().
	- Squash all IIO patches into one.

Patch 1:
	Fix docmentation (devres API list) for regulator APIs
Patch 2:
	The new devm helpers.
Patch 3:
	Add new devm-helper APIs to docs.
Patch 4:
	simplified CLK driver(s)
Patch 5:
	simplified GPU driver(s)
Patch 6:
	simplified hwmon driver(s)
Patch 7:
	simplified IIO driver(s)

---

Matti Vaittinen (7):
  docs: devres: regulator: Add missing devm_* functions to devres.rst
  regulator: Add devm helpers for get and enable
  docs: devres: regulator: Add new get_enable functions to devres.rst
  clk: cdce925: simplify using devm_regulator_get_enable()
  gpu: drm: simplify drivers using devm_regulator_*get_enable*()
  hwmon: lm90: simplify using devm_regulator_get_enable()
  iio: Simplify drivers using devm_regulator_*get_enable()

 .../driver-api/driver-model/devres.rst        |  11 ++
 drivers/clk/clk-cdce925.c                     |  21 +--
 drivers/gpu/drm/bridge/sii902x.c              |  22 +--
 drivers/gpu/drm/meson/meson_dw_hdmi.c         |  23 +--
 drivers/hwmon/lm90.c                          |  21 +--
 drivers/iio/adc/ad7192.c                      |  15 +-
 drivers/iio/dac/ltc2688.c                     |  23 +--
 drivers/iio/gyro/bmg160_core.c                |  24 +--
 drivers/iio/imu/st_lsm6dsx/st_lsm6dsx.h       |   2 -
 drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c  |  30 +---
 drivers/regulator/devres.c                    | 164 ++++++++++++++++++
 include/linux/regulator/consumer.h            |  27 +++
 12 files changed, 227 insertions(+), 156 deletions(-)

Comments

Laurent Pinchart Aug. 15, 2022, 3:54 p.m. UTC | #1
On Mon, Aug 15, 2022 at 04:44:44PM +0100, Mark Brown wrote:
> On Fri, 12 Aug 2022 13:08:17 +0300, Matti Vaittinen wrote:
> > Devm helpers for regulator get and enable
> > 
> > First patch in the series is actually just a simple documentation fix
> > which could be taken in as it is now.
> > 
> > A few* drivers seem to use pattern demonstrated by pseudocode:
> > 
> > [...]
> 
> Applied to
> 
>    https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git for-next
> 
> Thanks!
> 
> [1/7] docs: devres: regulator: Add missing devm_* functions to devres.rst
>       commit: 9b6744f60b6b47bc0757a1955adb4d2c3ab22e13
> [2/7] regulator: Add devm helpers for get and enable
>       (no commit info)

I didn't have time to reply to the series yet, but I think this isn't a
great idea. There are two issues:

- With devres, you don't have full control over the order in which
  resources will be released, which means that you can't control the
  power off sequence, in particular if it needs to be sequenced with
  GPIOs and clocks. That's not a concern for all drivers, but this API
  will creep in in places where it shouldn't be used, driver authours
  should really pay attention to power management and not live with the
  false impression that everything will be handled automatically for
  them. In the worst cases, an incorrect power off sequence could lead
  to hardware damage.

- Powering regulators on at probe time and leaving them on is a very bad
  practice from a power management point of view, and should really be
  discouraged. Adding convenience helpers to make this easy is the wrong
  message, we should instead push driver authors to implement proper
  runtime PM.

> All being well this means that it will be integrated into the linux-next
> tree (usually sometime in the next 24 hours) and sent to Linus during
> the next merge window (or sooner if it is a bug fix), however if
> problems are discovered then the patch may be dropped or reverted.
> 
> You may get further e-mails resulting from automated or manual testing
> and review of the tree, please engage with people reporting problems and
> send followup patches addressing any issues that are reported if needed.
> 
> If any updates are required or you are submitting further changes they
> should be sent as incremental updates against current git, existing
> patches will not be replaced.
> 
> Please add any relevant lists and maintainers to the CCs when replying
> to this mail.
Mark Brown Aug. 15, 2022, 4:33 p.m. UTC | #2
On Mon, Aug 15, 2022 at 06:54:45PM +0300, Laurent Pinchart wrote:

> - With devres, you don't have full control over the order in which
>   resources will be released, which means that you can't control the
>   power off sequence, in particular if it needs to be sequenced with
>   GPIOs and clocks. That's not a concern for all drivers, but this API
>   will creep in in places where it shouldn't be used, driver authours
>   should really pay attention to power management and not live with the
>   false impression that everything will be handled automatically for
>   them. In the worst cases, an incorrect power off sequence could lead
>   to hardware damage.

I basically agree with these concerns which is why I was only happy with
this API when Matti suggested doing it in a way that meant that the
callers are unable to access the regulator at runtime, this means that
if anyone wants to do any kind of management of the power state outside
of probe and remove they are forced to convert to the full fat APIs.
The general ordering concern with devm is that the free happens too late
but for the most part this isn't such a concern with regulators, they
might have delayed power off anyway due to sharing - it's no worse than
memory allocation AFAICT.  Given all the other APIs using devm it's
probably going to end up fixing some bugs.

For sequencing I'm not convinced it's much worse than the bulk API is
anyway, and practically speaking I expect most devices that have
problems here will also need more control over power anyway - it's
certainly the common case that hardware has pretty basic requirements
and is fairly tolerant.

> - Powering regulators on at probe time and leaving them on is a very bad
>   practice from a power management point of view, and should really be
>   discouraged. Adding convenience helpers to make this easy is the wrong
>   message, we should instead push driver authors to implement proper
>   runtime PM.

The stick simply isn't working here as far as I can see.
Laurent Pinchart Aug. 15, 2022, 6:52 p.m. UTC | #3
Hi Mark,

On Mon, Aug 15, 2022 at 05:33:06PM +0100, Mark Brown wrote:
> On Mon, Aug 15, 2022 at 06:54:45PM +0300, Laurent Pinchart wrote:
> 
> > - With devres, you don't have full control over the order in which
> >   resources will be released, which means that you can't control the
> >   power off sequence, in particular if it needs to be sequenced with
> >   GPIOs and clocks. That's not a concern for all drivers, but this API
> >   will creep in in places where it shouldn't be used, driver authours
> >   should really pay attention to power management and not live with the
> >   false impression that everything will be handled automatically for
> >   them. In the worst cases, an incorrect power off sequence could lead
> >   to hardware damage.
> 
> I basically agree with these concerns which is why I was only happy with
> this API when Matti suggested doing it in a way that meant that the
> callers are unable to access the regulator at runtime, this means that
> if anyone wants to do any kind of management of the power state outside
> of probe and remove they are forced to convert to the full fat APIs.
> The general ordering concern with devm is that the free happens too late
> but for the most part this isn't such a concern with regulators, they
> might have delayed power off anyway due to sharing - it's no worse than
> memory allocation AFAICT.  Given all the other APIs using devm it's
> probably going to end up fixing some bugs.
> 
> For sequencing I'm not convinced it's much worse than the bulk API is
> anyway, and practically speaking I expect most devices that have
> problems here will also need more control over power anyway - it's
> certainly the common case that hardware has pretty basic requirements
> and is fairly tolerant.

I'm not extremely concerned here at the moment, as power should be the
last thing to be turned off, after clocks and reset signals. As clocks
and GPIOs will still be controlled manually in the driver .remove()
function, it means that power will go last, which should be fine.
However, should a devm_clk_get_enable() or similar function be
implemented, we'll run into trouble. Supplying active high input signals
to a device that is not powered can lead to latch-up, which tends to
only manifest after a statistically significant number of occurrences of
the condition, and can slowly damage the hardware over time. This is a
real concern as it will typically not be caught during early
development. I think we would still be better off with requiring drivers
to manually handle powering off the device until we provide a mechanism
that can do so safely in an automated way.

> > - Powering regulators on at probe time and leaving them on is a very bad
> >   practice from a power management point of view, and should really be
> >   discouraged. Adding convenience helpers to make this easy is the wrong
> >   message, we should instead push driver authors to implement proper
> >   runtime PM.
> 
> The stick simply isn't working here as far as I can see.

Do you think there's no way we can get it to work, instead of giving up
and adding an API that goes in the wrong direction ? :-( I'll give a
talk about the dangers of devm_* at the kernel summit, this is something
I can mention to raise awareness of the issue among maintainers,
hopefully leading to improvements through better reviews.
Stephen Boyd Aug. 15, 2022, 8:58 p.m. UTC | #4
Quoting Laurent Pinchart (2022-08-15 11:52:36)
> Hi Mark,
> 
> On Mon, Aug 15, 2022 at 05:33:06PM +0100, Mark Brown wrote:
> > On Mon, Aug 15, 2022 at 06:54:45PM +0300, Laurent Pinchart wrote:
> > 
> > > - With devres, you don't have full control over the order in which
> > >   resources will be released, which means that you can't control the
> > >   power off sequence, in particular if it needs to be sequenced with
> > >   GPIOs and clocks. That's not a concern for all drivers, but this API
> > >   will creep in in places where it shouldn't be used, driver authours
> > >   should really pay attention to power management and not live with the
> > >   false impression that everything will be handled automatically for
> > >   them. In the worst cases, an incorrect power off sequence could lead
> > >   to hardware damage.

I think the main issue is that platform drivers are being asked to do
too much. We've put the burden on platform driver authors to intimately
understand how their devices are integrated, and as we all know they're
not very interested in these details because they already have a hard
time to write a driver just to make their latest gizmo whir. Throw in
power management and you get these wrappers that try to compartmentalize
power management logic away from the main part of the driver that's
plugging into the driver subsystem because the SoC integration logic is
constantly changing but the device core isn't.

We need to enhance the platform bus layer to make it SoC aware when the
platform device is inside an SoC, or "board" aware when the device lives
outside of an SoC, i.e. it's a discrete IC. The bus layer should manage
power state transitions for the platform devices, and the platform
drivers should only be able to request runtime power/performance state
changes through device PM APIs (dev_pm_*). If this can all be done
through genpds then it sounds great. We may need to write some generic
code for discrete ICs that enables regulators and then clks before
muxing out pins or something like that. Obviously, I don't have all the
details figured out.

The basic idea is that drivers should be focused on what they're
driving, not navigating the (sometimes) complex integration that's
taking place around them. When a device driver probe function is called
the device should already be powered on. When the driver is
removed/unbound, the power should be removed after the driver's remove
function is called. We're only going to be able to solve the power
sequencing and ordering problem by taking away power control and
sequencing from drivers.

> > 
> > I basically agree with these concerns which is why I was only happy with
> > this API when Matti suggested doing it in a way that meant that the
> > callers are unable to access the regulator at runtime, this means that
> > if anyone wants to do any kind of management of the power state outside
> > of probe and remove they are forced to convert to the full fat APIs.
> > The general ordering concern with devm is that the free happens too late
> > but for the most part this isn't such a concern with regulators, they
> > might have delayed power off anyway due to sharing - it's no worse than
> > memory allocation AFAICT.  Given all the other APIs using devm it's
> > probably going to end up fixing some bugs.
> > 
> > For sequencing I'm not convinced it's much worse than the bulk API is
> > anyway, and practically speaking I expect most devices that have
> > problems here will also need more control over power anyway - it's
> > certainly the common case that hardware has pretty basic requirements
> > and is fairly tolerant.
> 
> I'm not extremely concerned here at the moment, as power should be the
> last thing to be turned off, after clocks and reset signals. As clocks
> and GPIOs will still be controlled manually in the driver .remove()
> function, it means that power will go last, which should be fine.
> However, should a devm_clk_get_enable() or similar function be

This API is implemented now.

> implemented, we'll run into trouble. Supplying active high input signals
> to a device that is not powered can lead to latch-up, which tends to
> only manifest after a statistically significant number of occurrences of
> the condition, and can slowly damage the hardware over time. This is a
> real concern as it will typically not be caught during early
> development. I think we would still be better off with requiring drivers
> to manually handle powering off the device until we provide a mechanism
> that can do so safely in an automated way.

Can you describe the error scenario further? I think it's driver author
error that would lead to getting and enabling the regulator after
getting and enabling a clk that drives out a clock signal on some pins
that aren't powered yet. I'm not sure that's all that much easier to do
with these sorts of devm APIs, but if it is then I'm concerned.

> 
> > > - Powering regulators on at probe time and leaving them on is a very bad
> > >   practice from a power management point of view, and should really be
> > >   discouraged. Adding convenience helpers to make this easy is the wrong
> > >   message, we should instead push driver authors to implement proper
> > >   runtime PM.
> > 
> > The stick simply isn't working here as far as I can see.
> 
> Do you think there's no way we can get it to work, instead of giving up
> and adding an API that goes in the wrong direction ? :-( I'll give a
> talk about the dangers of devm_* at the kernel summit, this is something
> I can mention to raise awareness of the issue among maintainers,
> hopefully leading to improvements through better reviews.
> 

I agree with Mark, the stick isn't working. We discussed these exact
same issues for years with the devm clk get APIs. Search the archives.
Laurent Pinchart Aug. 15, 2022, 9:17 p.m. UTC | #5
Hi Stephan,

On Mon, Aug 15, 2022 at 01:58:55PM -0700, Stephen Boyd wrote:
> Quoting Laurent Pinchart (2022-08-15 11:52:36)
> > On Mon, Aug 15, 2022 at 05:33:06PM +0100, Mark Brown wrote:
> > > On Mon, Aug 15, 2022 at 06:54:45PM +0300, Laurent Pinchart wrote:
> > > 
> > > > - With devres, you don't have full control over the order in which
> > > >   resources will be released, which means that you can't control the
> > > >   power off sequence, in particular if it needs to be sequenced with
> > > >   GPIOs and clocks. That's not a concern for all drivers, but this API
> > > >   will creep in in places where it shouldn't be used, driver authours
> > > >   should really pay attention to power management and not live with the
> > > >   false impression that everything will be handled automatically for
> > > >   them. In the worst cases, an incorrect power off sequence could lead
> > > >   to hardware damage.
> 
> I think the main issue is that platform drivers are being asked to do
> too much. We've put the burden on platform driver authors to intimately
> understand how their devices are integrated, and as we all know they're
> not very interested in these details because they already have a hard
> time to write a driver just to make their latest gizmo whir. Throw in
> power management and you get these wrappers that try to compartmentalize
> power management logic away from the main part of the driver that's
> plugging into the driver subsystem because the SoC integration logic is
> constantly changing but the device core isn't.
> 
> We need to enhance the platform bus layer to make it SoC aware when the
> platform device is inside an SoC, or "board" aware when the device lives
> outside of an SoC, i.e. it's a discrete IC. The bus layer should manage
> power state transitions for the platform devices, and the platform
> drivers should only be able to request runtime power/performance state
> changes through device PM APIs (dev_pm_*). If this can all be done
> through genpds then it sounds great. We may need to write some generic
> code for discrete ICs that enables regulators and then clks before
> muxing out pins or something like that. Obviously, I don't have all the
> details figured out.
> 
> The basic idea is that drivers should be focused on what they're
> driving, not navigating the (sometimes) complex integration that's
> taking place around them. When a device driver probe function is called
> the device should already be powered on.

No. ACPI does that in many cases, and that's a real bad idea. There are
devices that you do *not* want to power up on probe. I'm thinking, for
example, about camera sensors that have a privacy LED that will light up
when the sensor is powered up. You don't want it to flash on boot. There
are also other use cases related to fault tolerance where you want
drivers to initialize properly and only access the device later.

> When the driver is
> removed/unbound, the power should be removed after the driver's remove
> function is called. We're only going to be able to solve the power
> sequencing and ordering problem by taking away power control and
> sequencing from drivers.

For SoC devices we may be able to achieve this to some extent, at least
for simple devices that don't have exotic requirements (by exotic I mean
requiring more than one clock for instance, so it's not *that* exotic).
For on-board devices that's impossible by nature, the power up/down
constraints are specific to the device, it's the job of the driver to
handle them, the same way it has to handle everything else that is
device-specific.

The right way to handle this in my opinion is to push for RPM support in
drivers. The regulator, reset, clock & co. handling then happens in the
RPM suspend and resume handlers, they're self-contained, well ordered,
and much easier to review. The rest of the driver then only uses the RPM
API. It's easy(er) and clean (at least when you don't throw ACPI to the
mix, with its "peculiar" idea of calling probe with the device powered
on *BUT* in the RPM suspended state - due to historical reasons I
believe - but I think that should be fixable on the ACPI side, albeit
perhaps not without substantial effort), and you get simple review
rules: if the driver doesn't implement RPM, or handles resource
enable/disable outside of the RPM suspend/resume handlers, it's most
likely getting it wrong.

These devres helpers go in the exact opposite direction of what we
should be doing, by telling driver authors it's totally fine to not
implement power management. Why don't we just drop error handling and go
back to the big kernel lock in that case ? That was much easier to
program too.

> > > I basically agree with these concerns which is why I was only happy with
> > > this API when Matti suggested doing it in a way that meant that the
> > > callers are unable to access the regulator at runtime, this means that
> > > if anyone wants to do any kind of management of the power state outside
> > > of probe and remove they are forced to convert to the full fat APIs.
> > > The general ordering concern with devm is that the free happens too late
> > > but for the most part this isn't such a concern with regulators, they
> > > might have delayed power off anyway due to sharing - it's no worse than
> > > memory allocation AFAICT.  Given all the other APIs using devm it's
> > > probably going to end up fixing some bugs.
> > > 
> > > For sequencing I'm not convinced it's much worse than the bulk API is
> > > anyway, and practically speaking I expect most devices that have
> > > problems here will also need more control over power anyway - it's
> > > certainly the common case that hardware has pretty basic requirements
> > > and is fairly tolerant.
> > 
> > I'm not extremely concerned here at the moment, as power should be the
> > last thing to be turned off, after clocks and reset signals. As clocks
> > and GPIOs will still be controlled manually in the driver .remove()
> > function, it means that power will go last, which should be fine.
> > However, should a devm_clk_get_enable() or similar function be
> 
> This API is implemented now.

:-( We're really going straight into the wall.

> > implemented, we'll run into trouble. Supplying active high input signals
> > to a device that is not powered can lead to latch-up, which tends to
> > only manifest after a statistically significant number of occurrences of
> > the condition, and can slowly damage the hardware over time. This is a
> > real concern as it will typically not be caught during early
> > development. I think we would still be better off with requiring drivers
> > to manually handle powering off the device until we provide a mechanism
> > that can do so safely in an automated way.
> 
> Can you describe the error scenario further? I think it's driver author
> error that would lead to getting and enabling the regulator after
> getting and enabling a clk that drives out a clock signal on some pins
> that aren't powered yet. I'm not sure that's all that much easier to do
> with these sorts of devm APIs, but if it is then I'm concerned.

You will very quickly see drivers doing this (either directly or
indirectly):

probe()
{
	devm_clk_get_enabled();
	devm_regulator_get_enable();
}

Without a devres-based get+enable API drivers can get the resources they
need in any order, possibly moving some of those resource acquisition
operations to different functions, and then have a clear block of code
that enables the resources in the right order. These devres helpers give
a false sense of security to driver authors and they will end up
introducing problems, the same way that devm_kzalloc() makes it
outrageously easy to crash the kernel by disconnecting a device that is
in use.

> > > > - Powering regulators on at probe time and leaving them on is a very bad
> > > >   practice from a power management point of view, and should really be
> > > >   discouraged. Adding convenience helpers to make this easy is the wrong
> > > >   message, we should instead push driver authors to implement proper
> > > >   runtime PM.
> > > 
> > > The stick simply isn't working here as far as I can see.
> > 
> > Do you think there's no way we can get it to work, instead of giving up
> > and adding an API that goes in the wrong direction ? :-( I'll give a
> > talk about the dangers of devm_* at the kernel summit, this is something
> > I can mention to raise awareness of the issue among maintainers,
> > hopefully leading to improvements through better reviews.
> 
> I agree with Mark, the stick isn't working. We discussed these exact
> same issues for years with the devm clk get APIs. Search the archives.

And the conclusion was to just give up and do nothing ? :-(
Mark Brown Aug. 15, 2022, 10:07 p.m. UTC | #6
On Mon, Aug 15, 2022 at 01:58:55PM -0700, Stephen Boyd wrote:
> Quoting Laurent Pinchart (2022-08-15 11:52:36)
> > On Mon, Aug 15, 2022 at 05:33:06PM +0100, Mark Brown wrote:
> > > On Mon, Aug 15, 2022 at 06:54:45PM +0300, Laurent Pinchart wrote:

> > > > - With devres, you don't have full control over the order in which
> > > >   resources will be released, which means that you can't control the
> > > >   power off sequence, in particular if it needs to be sequenced with
> > > >   GPIOs and clocks. That's not a concern for all drivers, but this API
> > > >   will creep in in places where it shouldn't be used, driver authours
> > > >   should really pay attention to power management and not live with the
> > > >   false impression that everything will be handled automatically for
> > > >   them. In the worst cases, an incorrect power off sequence could lead
> > > >   to hardware damage.

> I think the main issue is that platform drivers are being asked to do
> too much. We've put the burden on platform driver authors to intimately
> understand how their devices are integrated, and as we all know they're

This is for the regulator API, it's mainly for off SoC devices so it's
not a question of understanding the integration of a device into a piece
of silicon, it's a question of understanding the integration of a chip
into a board which seems reasonably in scope for a chip driver and is
certainly the sort of thing that you'd be talking to your customers
about as a silicon vendor.

> The basic idea is that drivers should be focused on what they're
> driving, not navigating the (sometimes) complex integration that's
> taking place around them. When a device driver probe function is called
> the device should already be powered on. When the driver is
> removed/unbound, the power should be removed after the driver's remove
> function is called. We're only going to be able to solve the power
> sequencing and ordering problem by taking away power control and
> sequencing from drivers.

That is a sensible approach for most on SoC things but for something
shipped as a separate driver there's little point in separating the
power and clocking domain driver from the device since there's typically
a 1:1 mapping.  Usually either it's extremely simple (eg, turn
everything on and remove reset) but some devices really need to manage
things.  There's obviously some edge cases in SoC integration as well
(eg, the need to manage card supplies for SD controllers, or knowing
exact clock rates for things like audio controllers) so you need some
flex.
Mark Brown Aug. 15, 2022, 10:55 p.m. UTC | #7
On Tue, Aug 16, 2022 at 12:17:17AM +0300, Laurent Pinchart wrote:
> On Mon, Aug 15, 2022 at 01:58:55PM -0700, Stephen Boyd wrote:

> > The basic idea is that drivers should be focused on what they're
> > driving, not navigating the (sometimes) complex integration that's
> > taking place around them. When a device driver probe function is called
> > the device should already be powered on.

> No. ACPI does that in many cases, and that's a real bad idea. There are
> devices that you do *not* want to power up on probe. I'm thinking, for
> example, about camera sensors that have a privacy LED that will light up
> when the sensor is powered up. You don't want it to flash on boot. There
> are also other use cases related to fault tolerance where you want
> drivers to initialize properly and only access the device later.

I don't think it's an either/or thing in terms of approach here - we
need a range of options to choose from.  ACPI is totally fine and solves
real problems for the systems it targets, the problems we see with it
are mainly that it has a very strong system abstraction and doesn't cope
well when things go outside that coupled with the fact that Windows long
ago decided that board files were totally fine for papering over any
problems so people haven't worked on standardisation where they should.
Some SoCs like to do similar things with their power controller cores.

Conversely for example with many (but not all) SoC IPs the mechanics of
the system integration and range of options available are such that
dealing with them is kind of out of scope of the driver, but they're
often very repetitive over any given SoC so there is a benefit in
pushing things into power domains rather than having the driver for the
IP manage everything.  We need to be able to be flexible so we can find
the best idioms to represent the different systems in front of us rather
than trying to force all systems into a single idiom.

> These devres helpers go in the exact opposite direction of what we
> should be doing, by telling driver authors it's totally fine to not
> implement power management. Why don't we just drop error handling and go
> back to the big kernel lock in that case ? That was much easier to
> program too.

Sometimes it's totally fine to not worry, at least at a first pass.
Perhaps you're more concerned with real time, perhaps your system
doesn't provide control for the relevant resources.  Sometimes the
savings are so negligable that it's questionable if doing the power
manageement is an overall power saving.

> You will very quickly see drivers doing this (either directly or
> indirectly):

> probe()
> {
> 	devm_clk_get_enabled();
> 	devm_regulator_get_enable();
> }

> Without a devres-based get+enable API drivers can get the resources they
> need in any order, possibly moving some of those resource acquisition
> operations to different functions, and then have a clear block of code
> that enables the resources in the right order. These devres helpers give
> a false sense of security to driver authors and they will end up
> introducing problems, the same way that devm_kzalloc() makes it
> outrageously easy to crash the kernel by disconnecting a device that is
> in use.

TBH I think the problem you have here is with devm not with this
particular function.  That's a different conversation, and a totally
valid one especially when you start looking at things like implementing
userspace APIs which need to cope with hardware going away while still
visible to userspace.  It's *probably* more of a subsystem conversation
than a driver one though, or at least I think subsystems should try to
arrange to make it so.
Matti Vaittinen Aug. 16, 2022, 4:56 a.m. UTC | #8
Hi dee Ho Mark, Laurent, Stephen, all

On 8/16/22 01:55, Mark Brown wrote:
> On Tue, Aug 16, 2022 at 12:17:17AM +0300, Laurent Pinchart wrote:
>> On Mon, Aug 15, 2022 at 01:58:55PM -0700, Stephen Boyd wrote:
> 
>> You will very quickly see drivers doing this (either directly or
>> indirectly):
> 
>> probe()
>> {
>> 	devm_clk_get_enabled();
>> 	devm_regulator_get_enable();
>> }
> 
>> Without a devres-based get+enable API drivers can get the resources they
>> need in any order, possibly moving some of those resource acquisition
>> operations to different functions, and then have a clear block of code
>> that enables the resources in the right order.

I agree. And I think that drivers which do that should stick with it. 
Still, as you know the devm-unwinding is also done in well defined 
order. I believe that instead of fighting against the devm we should try 
educate people to pay attention in the order of unwinding (also when not 
handled by the devm. Driver writers occasionally break things also w/o 
devm for example by freeing resources needed by IRQ handlers prior 
freeing the IRQ.)

If "purging" must not be done in reverse order compared to the 
aquisition - then one should not use devm. I know people have done 
errors with devm - OTOH, I've seen devm also fixing bunch of errors.

>> These devres helpers give
>> a false sense of security to driver authors and they will end up
>> introducing problems, the same way that devm_kzalloc() makes it
>> outrageously easy to crash the kernel by disconnecting a device that is
>> in use.

I think this is going a bit "off-topic" but I'd like to understand what 
is behind this statement? From device-writer's perspective - I don't 
know much better alternatives to free up the memory. I don't see how 
freeing stuff at .remove would be any better? As far as I understand - 
if someone is using driver's resources after the device has gone and the 
driver is detached, then there is not much the driver could do to 
free-up the stuff be it devm or not? This sounds like fundamentally 
different problem (to me).

> TBH I think the problem you have here is with devm not with this
> particular function.

I must say I kind of agree with Mark. If we stop for a second to think 
what would the Laurent's example look like if there were no 
devm_regulator_get_enable() provided. I bet the poor driver author could 
have used devm_clk_get_enabled() - and then implemented a .remove for 
disabling the regulator. That would be even worse, right?

> That's a different conversation, and a totally
> valid one especially when you start looking at things like implementing
> userspace APIs which need to cope with hardware going away while still
> visible to userspace.

This is interesting. It's not easy for me to spot how devm changes 
things here? If we consider some removable device - then I guess also 
the .remove() is ran only after HW has already gone? Yes, devm might 
make the time window when userspace can see hardware that has gone 
longer but does it bring any new problem there? It seems to me devm can 
make hitting the spot more likely - but I don't think it brings 
completely new issues? (Well, I may be wrong here - wouldn't be the 
first time :])

> It's *probably* more of a subsystem conversation
> than a driver one though, or at least I think subsystems should try to
> arrange to make it so.
Andy Shevchenko Aug. 16, 2022, 8:23 a.m. UTC | #9
On Mon, Aug 15, 2022 at 11:20 PM Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
> On Mon, Aug 15, 2022 at 05:33:06PM +0100, Mark Brown wrote:

...

> However, should a devm_clk_get_enable() or similar function be
> implemented, we'll run into trouble.

And in 5.19 we have devm_clk_get_enable(), are we already in trouble?
Andy Shevchenko Aug. 16, 2022, 8:42 a.m. UTC | #10
On Tue, Aug 16, 2022 at 8:37 AM Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
> On Mon, Aug 15, 2022 at 01:58:55PM -0700, Stephen Boyd wrote:
> > Quoting Laurent Pinchart (2022-08-15 11:52:36)
> > > On Mon, Aug 15, 2022 at 05:33:06PM +0100, Mark Brown wrote:

...

> > > we'll run into trouble. Supplying active high input signals
> > > to a device that is not powered can lead to latch-up, which tends to
> > > only manifest after a statistically significant number of occurrences of
> > > the condition, and can slowly damage the hardware over time. This is a
> > > real concern as it will typically not be caught during early
> > > development. I think we would still be better off with requiring drivers
> > > to manually handle powering off the device until we provide a mechanism
> > > that can do so safely in an automated way.
> >
> > Can you describe the error scenario further? I think it's driver author
> > error that would lead to getting and enabling the regulator after
> > getting and enabling a clk that drives out a clock signal on some pins
> > that aren't powered yet. I'm not sure that's all that much easier to do
> > with these sorts of devm APIs, but if it is then I'm concerned.
>
> You will very quickly see drivers doing this (either directly or
> indirectly):
>
> probe()
> {
>         devm_clk_get_enabled();
>         devm_regulator_get_enable();
> }

And how is it devm specific? If the author puts the same without devm
the ordering would be wrong, correct? devm allows us to focus on
ordering in a *single* place, which is a win. You seem to be proposing
to make a high burden on a driver's author to focus on ordering in the
*three* places. I disagree with that. Yet the driver author has to
understand many issues with any tool they use. So the root cause of
your whining is rather on the edge of documentation and education.
(Yes, I have heard about issues with object lifetime in v4l2
subdevices regarding to devm, but it seems irrelevant to devm
mechanism itself.)

> Without a devres-based get+enable API drivers can get the resources they
> need in any order, possibly moving some of those resource acquisition
> operations to different functions, and then have a clear block of code
> that enables the resources in the right order. These devres helpers give
> a false sense of security to driver authors and they will end up
> introducing problems, the same way that devm_kzalloc() makes it
> outrageously easy to crash the kernel by disconnecting a device that is
> in use.
Mark Brown Aug. 16, 2022, 10:36 a.m. UTC | #11
On Tue, Aug 16, 2022 at 07:56:06AM +0300, Matti Vaittinen wrote:
> On 8/16/22 01:55, Mark Brown wrote:
> > On Tue, Aug 16, 2022 at 12:17:17AM +0300, Laurent Pinchart wrote:

> > > These devres helpers give
> > > a false sense of security to driver authors and they will end up
> > > introducing problems, the same way that devm_kzalloc() makes it
> > > outrageously easy to crash the kernel by disconnecting a device that is
> > > in use.

> I think this is going a bit "off-topic" but I'd like to understand what is
> behind this statement? From device-writer's perspective - I don't know much
> better alternatives to free up the memory. I don't see how freeing stuff at
> .remove would be any better? As far as I understand - if someone is using
> driver's resources after the device has gone and the driver is detached,
> then there is not much the driver could do to free-up the stuff be it devm
> or not? This sounds like fundamentally different problem (to me).

If a driver has done something like create a file that's accessible to
userspace then that file may be held open by userspace even after the
device goes away, the driver that created the file needs to ensure that
any storage that's used by the file remains allocated until the file is
also freed (typically this is data specific to the file rather than the
full device data).  Similar situations can exist elsewhere, that's just
the common case.  There'll be a deletion callback on whatever other
object causes the problem, the allocation needs to be reference counted
against both the the device and whatever other users there might be.

> > That's a different conversation, and a totally
> > valid one especially when you start looking at things like implementing
> > userspace APIs which need to cope with hardware going away while still
> > visible to userspace.

> This is interesting. It's not easy for me to spot how devm changes things
> here? If we consider some removable device - then I guess also the .remove()
> is ran only after HW has already gone? Yes, devm might make the time window
> when userspace can see hardware that has gone longer but does it bring any
> new problem there? It seems to me devm can make hitting the spot more likely
> - but I don't think it brings completely new issues? (Well, I may be wrong
> here - wouldn't be the first time :])

See above, something can still know the device was there even after it's
gone.
Vaittinen, Matti Aug. 16, 2022, 11:06 a.m. UTC | #12
On 8/16/22 13:36, Mark Brown wrote:
> On Tue, Aug 16, 2022 at 07:56:06AM +0300, Matti Vaittinen wrote:
>> On 8/16/22 01:55, Mark Brown wrote:
>>> On Tue, Aug 16, 2022 at 12:17:17AM +0300, Laurent Pinchart wrote:
> 
>>>> These devres helpers give
>>>> a false sense of security to driver authors and they will end up
>>>> introducing problems, the same way that devm_kzalloc() makes it
>>>> outrageously easy to crash the kernel by disconnecting a device that is
>>>> in use.
> 
>> I think this is going a bit "off-topic" but I'd like to understand what is
>> behind this statement? From device-writer's perspective - I don't know much
>> better alternatives to free up the memory. I don't see how freeing stuff at
>> .remove would be any better? As far as I understand - if someone is using
>> driver's resources after the device has gone and the driver is detached,
>> then there is not much the driver could do to free-up the stuff be it devm
>> or not? This sounds like fundamentally different problem (to me).
> 
> If a driver has done something like create a file that's accessible to
> userspace then that file may be held open by userspace even after the
> device goes away, the driver that created the file needs to ensure that
> any storage that's used by the file remains allocated until the file is
> also freed (typically this is data specific to the file rather than the
> full device data).  Similar situations can exist elsewhere, that's just
> the common case.  There'll be a deletion callback on whatever other
> object causes the problem, the allocation needs to be reference counted
> against both the the device and whatever other users there might be.

Oh right. Thanks. So we're discussing (a corner?) case where the freeing 
is done by a callback that was registered by a driver. Callback being 
called for example when the refcount for a resource gets down, 
potentially long after the driver was gone.

I see the problem of releasing something with devm in such case. Still, 
I don't think it is something we should avoid by banning the use of devm 
- which is useful in many other cases. It'd be equally wrong release the 
resource in .remove() or doing any other "double freeing". To me this 
boils down to educating people about the life-times.

I wonder if writing such 'release callbacks' is compulsory? I mean, if I 
was writing a driver to some new (to me) subsystem and was required to 
write an explicit release-callback for a resource - then it'd surely 
rang a bell about potentially double freeing stuff with devm. Especially 
if the doc stated the callback can be called after the driver has been 
detached.

> 
>>> That's a different conversation, and a totally
>>> valid one especially when you start looking at things like implementing
>>> userspace APIs which need to cope with hardware going away while still
>>> visible to userspace.
> 
>> This is interesting. It's not easy for me to spot how devm changes things
>> here? If we consider some removable device - then I guess also the .remove()
>> is ran only after HW has already gone? Yes, devm might make the time window
>> when userspace can see hardware that has gone longer but does it bring any
>> new problem there? It seems to me devm can make hitting the spot more likely
>> - but I don't think it brings completely new issues? (Well, I may be wrong
>> here - wouldn't be the first time :])
> 
> See above, something can still know the device was there even after it's
> gone.

Yes. Thanks for the education.
I'm still not sure I understand how devm changes the picture. I'd guess 
in the case you described above, the user-space would still see the 
device until it closes the file and the call-back cleans-up. I guess 
freeing the stuff (that is used until user-space drops the handle) 
anywhere except the clean-up call-back is wrong - be the mechanism devm 
or driver's .remove() or any other. To me the key is still teaching 
people to know what bits may be used after the driver has been detached.

Thanks for the explanation guys - this has been insightful to me :)

Best Regards
-- Matti
Mark Brown Aug. 16, 2022, 11:31 a.m. UTC | #13
On Tue, Aug 16, 2022 at 11:06:21AM +0000, Vaittinen, Matti wrote:

> I wonder if writing such 'release callbacks' is compulsory? I mean, if I 
> was writing a driver to some new (to me) subsystem and was required to 
> write an explicit release-callback for a resource - then it'd surely 
> rang a bell about potentially double freeing stuff with devm. Especially 
> if the doc stated the callback can be called after the driver has been 
> detached.

Generally yes, thoguh people can and do leave them blank and it's easy
enough to do some cleanup in there that assumes that the device is still
present and not think the device might've gone away especially if the
hardware isn't practically hotpluggable.
Matti Vaittinen Aug. 18, 2022, 11:33 a.m. UTC | #14
Hi Mark,

On 8/15/22 18:44, Mark Brown wrote:
> On Fri, 12 Aug 2022 13:08:17 +0300, Matti Vaittinen wrote:
>> Devm helpers for regulator get and enable
>>
>> First patch in the series is actually just a simple documentation fix
>> which could be taken in as it is now.
>>
>> A few* drivers seem to use pattern demonstrated by pseudocode:
>>
>> [...]
> 
> Applied to
> 
>     https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git for-next
> 
> Thanks!
> 
> [1/7] docs: devres: regulator: Add missing devm_* functions to devres.rst
>        commit: 9b6744f60b6b47bc0757a1955adb4d2c3ab22e13
> [2/7] regulator: Add devm helpers for get and enable
>        (no commit info)
> 

I was planning to send out the v3 (where IIO patches are no longer 
squashed into one). I didn't spot the above mentioned patch 2/7 from 
regulator/for-next. I'd just like to get confirmation the 2/7 was not 
merged even though it's mentioned in this mail before re-spinning the 
series with it.

Best Regards
   --Matti
Mark Brown Aug. 18, 2022, 11:54 a.m. UTC | #15
On Thu, Aug 18, 2022 at 02:33:53PM +0300, Matti Vaittinen wrote:
> On 8/15/22 18:44, Mark Brown wrote:

> > [2/7] regulator: Add devm helpers for get and enable
> >        (no commit info)

> I was planning to send out the v3 (where IIO patches are no longer squashed
> into one). I didn't spot the above mentioned patch 2/7 from
> regulator/for-next. I'd just like to get confirmation the 2/7 was not merged
> even though it's mentioned in this mail before re-spinning the series with
> it.

It's not there yet (that's the "no commit info"), but it is queued for
today.
Vaittinen, Matti Aug. 18, 2022, 12:04 p.m. UTC | #16
On 8/18/22 14:54, Mark Brown wrote:
> On Thu, Aug 18, 2022 at 02:33:53PM +0300, Matti Vaittinen wrote:
>> On 8/15/22 18:44, Mark Brown wrote:
> 
>>> [2/7] regulator: Add devm helpers for get and enable
>>>         (no commit info)
> 
>> I was planning to send out the v3 (where IIO patches are no longer squashed
>> into one). I didn't spot the above mentioned patch 2/7 from
>> regulator/for-next. I'd just like to get confirmation the 2/7 was not merged
>> even though it's mentioned in this mail before re-spinning the series with
>> it.
> 
> It's not there yet (that's the "no commit info"), but it is queued for
> today.

Understood. Thanks! I'll rebase the next version on top of the 
regulator/for-next when it's out then.

--Matti
Stephen Boyd Aug. 30, 2022, 7:42 p.m. UTC | #17
Quoting Mark Brown (2022-08-15 15:07:35)
> On Mon, Aug 15, 2022 at 01:58:55PM -0700, Stephen Boyd wrote:
> 
> > I think the main issue is that platform drivers are being asked to do
> > too much. We've put the burden on platform driver authors to intimately
> > understand how their devices are integrated, and as we all know they're
> 
> This is for the regulator API, it's mainly for off SoC devices so it's
> not a question of understanding the integration of a device into a piece
> of silicon, it's a question of understanding the integration of a chip
> into a board which seems reasonably in scope for a chip driver and is
> certainly the sort of thing that you'd be talking to your customers
> about as a silicon vendor.

Right. I'm coming from the devm_clk_get_*() APIs angle when saying that
platform drivers don't want to know everything.

> 
> > The basic idea is that drivers should be focused on what they're
> > driving, not navigating the (sometimes) complex integration that's
> > taking place around them. When a device driver probe function is called
> > the device should already be powered on. When the driver is
> > removed/unbound, the power should be removed after the driver's remove
> > function is called. We're only going to be able to solve the power
> > sequencing and ordering problem by taking away power control and
> > sequencing from drivers.
> 
> That is a sensible approach for most on SoC things but for something
> shipped as a separate driver there's little point in separating the
> power and clocking domain driver from the device since there's typically
> a 1:1 mapping.  Usually either it's extremely simple (eg, turn
> everything on and remove reset) but some devices really need to manage
> things.  There's obviously some edge cases in SoC integration as well
> (eg, the need to manage card supplies for SD controllers, or knowing
> exact clock rates for things like audio controllers) so you need some
> flex.

I think we're on the same page. The clk API bridges both on SoC and off
SoC devices, but leans more towards on SoC devices so I'm coming from
that angle. 

I agree it doesn't make sense to rip out and move power management logic
for off SoC devices (your chip driver), because then you get a driver
that is split to two places. The hardware engineer for those types of
devices has designed the chip to be more aware of the system integration
and how their chip is powered, so that it can be easily integrated into
various designs without their involvement. This allows it to be used on
numerous boards and that's partly the reason why Linux doesn't have
board files or board "drivers" because the combinatorial explosion is
unmanageable, hence DTS and driver subsystems. The boundary of the
combinations ends at the chip which is 1:1 with the platform driver.

For on SoC devices, the hardware engineer typically isn't involved in
the system integration at all. Instead they hand that task off to the
SoC integrator who has to wire everything up (clks, power, resets) and
layout the SoC. The combinatorial explosion isn't possible here, because
only so many SoCs are ever created and customers can't rewire the
internals of the SoC to change which clks go there (although I guess
with FPGAs this may be possible). The boundary where the combinations
exist is at the device level, not the SoC level, but we've encoded the
SoC details into the compatible strings and the drivers to the point
that the boundary is pushed to the SoC level.

For these on SoC devices, we should extract and consolidate the power
management logic away from the drivers, because we're spreading the SoC
integration knowledge all around the drivers/ directory for every device
class that exists in that SoC. I continue to see drivers that get
another clk in the next SoC generation because there was some change to
split a clk domain or they get another regulator because they split a
power domain. The driver doesn't care, but it has to match a new
compatible string and then get the proper list of clks or regulators
even though it just wants to turn the thing on and get it running. This
gunk needs to go. Runtime PM is a solution to part of the problem, but I
think RPM ops should be about poking device registers for these on SoC
devices, not about controlling yet another clk or regulator that got
wired up this SoC generation.

Probably we need to get away from having platform driver probe for on
SoC devices get resources like clks, regulators, and interconnects at
all. Instead, those should be managed by some "SoC" driver that knows
the integration of the SoC and can make sure the proper power sequencing
is followed. Hopefully we can do this largely via genpd and RPM, with a
little bit of help from some SoC driver that registers genpds for
devices under /soc. Of course there are exact clk frequencies required
sometimes (audio rates, display link rates, serial baud rates, etc.) but
those sorts of things could use a higher level of abstraction so that we
don't get bogged down in the details of which clk needs to be used to
set the clk frequency when this compatible is present. Maybe
dev_pm_opp_set_rate() is the answer there, but then we have some drivers
that call clk APIs like clk_round_rate() that we'll need to figure out
how to manage.

Long story short, there's a need for both approaches so that we can
manage the combinatorial complexity at the place where it is. I hope the
devm APIs are going to help us find that place so we can come up with a
solution to the "drivers don't want to know" problem when we see that
XYZ driver can use a genpd that turns on the power along with RPM to
turn it off during suspend.