[v9,0/4] driver core: add probe error check helper

Message ID 20200713144324.23654-1-a.hajda@samsung.com (mailing list archive)

Message

Andrzej Hajda July 13, 2020, 2:43 p.m. UTC
Hi All,

Thanks for comments.

Changes since v8:
- fixed typo in function name,
- removed cocci script (added by mistake)

Changes since v7:
- improved commit message
- added R-Bs

Changes since v6:
- removed leftovers from old naming scheme in commit descriptions,
- added R-Bs.

Changes since v5:
- removed patch adding macro, dev_err_probe(dev, PTR_ERR(ptr), ...) should be used instead,
- added dev_dbg logging in case of -EPROBE_DEFER,
- renamed functions and vars according to comments,
- extended docs,
- cosmetics.

Original message (with small adjustments):

Recently I took some time to re-check error handling in driver probe code,
and I noticed that the number of cases of incorrect resource acquisition error
handling has increased, and there are no other proposals that could cure the situation.

So I have decided to resend my old proposal of the probe_err helper, which should
simplify resource acquisition error handling. It is also extended to add the deferred
probe reason to the devices_deferred debugfs property, which should improve the debugging
experience for developers/testers.
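
For readers without the patches at hand, the behaviour described above can be approximated in a small userspace sketch. This is not the kernel code from the series: struct sketch_device, sketch_dev_err_probe(), sketch_get_clock() and sketch_probe() are hypothetical stand-ins, and the deferred_reason field only imitates what the devices_deferred debugfs property would report. The sketch assumes the helper logs an error for every failure except -EPROBE_DEFER, records the reason in the deferral case, and always returns the error code it was given.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>

#define EPROBE_DEFER 517	/* value used by the kernel; not in userspace errno.h */

/* Hypothetical userspace stand-in for struct device. */
struct sketch_device {
	const char *name;
	char deferred_reason[128];	/* imitates the devices_deferred entry */
	int errors_logged;		/* number of error-level messages emitted */
};

/* Hypothetical states of a resource provider. */
enum sketch_clk_state { SKETCH_CLK_READY, SKETCH_CLK_NOT_READY, SKETCH_CLK_MISSING };

/*
 * Sketch of the helper: a real failure is logged as an error, a deferral
 * only stores its reason, and err is passed through in both cases.
 */
static int sketch_dev_err_probe(struct sketch_device *dev, int err,
				const char *fmt, ...)
{
	char msg[128];
	va_list args;

	assert(dev != NULL);

	va_start(args, fmt);
	vsnprintf(msg, sizeof(msg), fmt, args);
	va_end(args);

	if (err != -EPROBE_DEFER) {
		fprintf(stderr, "%s: error %d: %s", dev->name, err, msg);
		dev->errors_logged++;
	} else {
		/* The kernel would log this at debug level only. */
		snprintf(dev->deferred_reason, sizeof(dev->deferred_reason),
			 "%s", msg);
	}
	return err;
}

/* Hypothetical resource getter: 0 on success, negative errno otherwise. */
static int sketch_get_clock(enum sketch_clk_state state)
{
	switch (state) {
	case SKETCH_CLK_READY:
		return 0;
	case SKETCH_CLK_NOT_READY:
		return -EPROBE_DEFER;
	default:
		return -ENOENT;
	}
}

/* A probe routine needs one line per failing resource. */
static int sketch_probe(struct sketch_device *dev, enum sketch_clk_state state)
{
	int ret = sketch_get_clock(state);

	if (ret)
		return sketch_dev_err_probe(dev, ret, "failed to get clock\n");
	return 0;
}
```

The point of the pattern is that the caller no longer needs its own "if not -EPROBE_DEFER then print" branch; it returns the helper's result directly.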

I have also added two patches showing usage and benefits of the helper.

My dirty/ad-hoc cocci scripts show that this helper can be used in at least 2700 places,
saving about 3500 lines of code.

Regards
Andrzej


Andrzej Hajda (4):
  driver core: add device probe log helper
  driver core: add deferring probe reason to devices_deferred property
  drm/bridge/sii8620: fix resource acquisition error handling
  drm/bridge: lvds-codec: simplify error handling

 drivers/base/base.h                  |  3 ++
 drivers/base/core.c                  | 46 ++++++++++++++++++++++++++++
 drivers/base/dd.c                    | 23 +++++++++++++-
 drivers/gpu/drm/bridge/lvds-codec.c  | 10 ++----
 drivers/gpu/drm/bridge/sil-sii8620.c | 21 ++++++-------
 include/linux/device.h               |  3 ++
 6 files changed, 86 insertions(+), 20 deletions(-)

Comments

Andrzej Hajda July 28, 2020, 3:05 p.m. UTC | #1
Hi Greg,

Apparently the patchset has no more comments.

Could you take the patches to your tree? At least 1st and 2nd.


Regards

Andrzej


Greg KH July 30, 2020, 7:08 a.m. UTC | #2
On Tue, Jul 28, 2020 at 05:05:03PM +0200, Andrzej Hajda wrote:
> Hi Greg,
> 
> Apparently the patchset has no more comments.
> 
> Could you take the patches to your tree? At least 1st and 2nd.

All now queued up, thanks!

greg k-h

Dmitry Torokhov July 30, 2020, 4:18 p.m. UTC | #3
On Thu, Jul 30, 2020 at 12:10 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Tue, Jul 28, 2020 at 05:05:03PM +0200, Andrzej Hajda wrote:
> > Hi Greg,
> >
> > Apparently the patchset has no more comments.
> >
> > Could you take the patches to your tree? At least 1st and 2nd.
>
> All now queued up, thanks!

I believe it still has not been answered why this can't be pushed into
resource providers (clock, regulators, gpio, interrupts, etc),
especially for devm APIs where we know exactly what device we are
requesting a resource for, so that individual drivers do not need to
change anything. We can mark the device as being probed so that probe
deferral is only handled when we actually execute probe() (and for the
bonus points scream loudly if someone tries to return -EPROBE_DEFER
outside of probe path).
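
To make the alternative in the paragraph above concrete, here is a userspace sketch of what provider-side handling might look like. It is an illustration only, not code from this series or from any kernel subsystem: struct sketch_consumer, its probing flag, sketch_devm_clk_get() and sketch_consumer_probe() are all hypothetical names. The sketch assumes a devm-style getter that receives the requesting device, so it can record the deferral reason itself and complain when a deferral is raised outside probe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define EPROBE_DEFER 517	/* value used by the kernel */

/* Hypothetical userspace stand-in for a consumer device. */
struct sketch_consumer {
	const char *name;
	bool probing;			/* true only while probe() runs */
	char deferred_reason[128];	/* reason attached by the provider */
	int defer_outside_probe;	/* deferrals raised outside probe */
};

/*
 * Hypothetical devm-style getter. Because the device is an argument,
 * the provider can attach the deferral reason without driver changes.
 */
int sketch_devm_clk_get(struct sketch_consumer *dev, const char *con_id,
			bool provider_ready)
{
	assert(dev != NULL && con_id != NULL);

	if (provider_ready)
		return 0;

	if (dev->probing) {
		snprintf(dev->deferred_reason, sizeof(dev->deferred_reason),
			 "clock '%s' not ready", con_id);
	} else {
		/* The "scream loudly" case: deferral makes no sense here. */
		fprintf(stderr, "%s: -EPROBE_DEFER for clock '%s' outside probe\n",
			dev->name, con_id);
		dev->defer_outside_probe++;
	}
	return -EPROBE_DEFER;
}

/* The driver's probe just floats the error up, with no logging of its own. */
int sketch_consumer_probe(struct sketch_consumer *dev, bool provider_ready)
{
	int ret;

	dev->probing = true;
	ret = sketch_devm_clk_get(dev, "pclk", provider_ready);
	dev->probing = false;
	return ret;
}
```

In this arrangement the driver code stays unchanged, at the cost of the message being limited to what the provider knows (here, the connection ID).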

And now with coccinelle script we can expect a deluge of patches
reshuffling drivers...

Thanks.

Mark Brown July 30, 2020, 4:48 p.m. UTC | #4
On Thu, Jul 30, 2020 at 09:18:30AM -0700, Dmitry Torokhov wrote:

> I believe it still has not been answered why this can't be pushed into
> resource providers (clock, regulators, gpio, interrupts, etc),
> especially for devm APIs where we know exactly what device we are
> requesting a resource for, so that individual drivers do not need to
> change anything.

The error messages are frequently in the caller rather than the
frameworks, it's often helpful for the comprehensibility of the error
messages especially in cases where things may be legitimately absent.

>                  We can mark the device as being probed so that probe
> deferral is only handled when we actually execute probe() (and for the
> bonus points scream loudly if someone tries to return -EPROBE_DEFER
> outside of probe path).

Is this a big issue?

Dmitry Torokhov July 30, 2020, 5:46 p.m. UTC | #5
On Thu, Jul 30, 2020 at 9:49 AM Mark Brown <broonie@kernel.org> wrote:
>
> On Thu, Jul 30, 2020 at 09:18:30AM -0700, Dmitry Torokhov wrote:
>
> > I believe it still has not been answered why this can't be pushed into
> > resource providers (clock, regulators, gpio, interrupts, etc),
> > especially for devm APIs where we know exactly what device we are
> > requesting a resource for, so that individual drivers do not need to
> > change anything.
>
> The error messages are frequently in the caller rather than the
> frameworks, it's often helpful for the comprehensibility of the error
> messages especially in cases where things may be legitimately absent.

Not for deferral. All you need to know in this case is:

"device A is attempting to request resource B which is not ready yet"

There is nothing to handle on the caller part except to float the error up.

>
> >                  We can mark the device as being probed so that probe
> > deferral is only handled when we actually execute probe() (and for the
> > bonus points scream loudly if someone tries to return -EPROBE_DEFER
> > outside of probe path).
>
> Is this a big issue?

We do not know ;) Probably not. It will just get reported as an
ordinary failure and the driver will handle it somehow. Still it would
be nice to know if we attempt to raise deferrals in code paths where
they do not make sense.

Thanks.

Mark Brown July 30, 2020, 6:16 p.m. UTC | #6
On Thu, Jul 30, 2020 at 10:46:31AM -0700, Dmitry Torokhov wrote:
> On Thu, Jul 30, 2020 at 9:49 AM Mark Brown <broonie@kernel.org> wrote:

> > The error messages are frequently in the caller rather than the
> > frameworks, it's often helpful for the comprehensibility of the error
> > messages especially in cases where things may be legitimately absent.

> Not for deferral. All you need to know in this case is:

> "device A is attempting to request resource B which is not ready yet"

> There is nothing to handle on the caller part except to float the error up.

You can sometimes do a better job of explaining what the resource you
were looking for was, and of course you still need diagnostics in the
non-deferral case.  Whatever happens we'll need a lot of per-driver
churn, either removing existing diagnostics that get factored into cores
or updating to use this new API.

Dmitry Torokhov July 30, 2020, 6:45 p.m. UTC | #7
On Thu, Jul 30, 2020 at 11:16 AM Mark Brown <broonie@kernel.org> wrote:
>
> On Thu, Jul 30, 2020 at 10:46:31AM -0700, Dmitry Torokhov wrote:
> > On Thu, Jul 30, 2020 at 9:49 AM Mark Brown <broonie@kernel.org> wrote:
>
> > > The error messages are frequently in the caller rather than the
> > > frameworks, it's often helpful for the comprehensibility of the error
> > > messages especially in cases where things may be legitimately absent.
>
> > Not for deferral. All you need to know in this case is:
>
> > "device A is attempting to request resource B which is not ready yet"
>
> > There is nothing to handle on the caller part except to float the error up.
>
> You can sometimes do a better job of explaining what the resource you
> were looking for was,

I think it is true for very esoteric cases. I.e. your driver uses 2
interrupt lines, or something like that. For GPIO, regulators, and
clocks we normally have a name/connection ID that provides enough of
context. We need to remember, the error messages really only make
total sense to a person familiar with the driver to begin with, not
for a random person looking at the log.

> and of course you still need diagnostics in the
> non-deferral case.  Whatever happens we'll need a lot of per-driver
> churn, either removing existing diagnostics that get factored into cores
> or updating to use this new API.

The point is if you push it into core you'll get the benefit of
notifying about the deferral (and can "attach" deferral reason to a
device) without changing drivers at all. You can clean them up later
if you want, or decide that additional logging in error paths does not
hurt. This new API does not do you any good unless you convert
drivers, and you need to convert the majority of them to be able to
rely on the deferral diagnostic that is being added.

Thanks.

Mark Brown July 30, 2020, 7:06 p.m. UTC | #8
On Thu, Jul 30, 2020 at 11:45:25AM -0700, Dmitry Torokhov wrote:
> On Thu, Jul 30, 2020 at 11:16 AM Mark Brown <broonie@kernel.org> wrote:

> > You can sometimes do a better job of explaining what the resource you
> > were looking for was,

> I think it is true for very esoteric cases. I.e. your driver uses 2
> interrupt lines, or something like that. For GPIO, regulators, and
> clocks we normally have a name/connection ID that provides enough of

*Normally* but not always - some of the older bindings do love their
arrays of phandles (or mixes of numbers and phandles!) unfortunately.

> context. We need to remember, the error messages really only make
> total sense to a person familiar with the driver to begin with, not
> for a random person looking at the log.

Not really, one of the big targets is people doing system integration
who are writing a DT or possibly producing a highly tuned kernel config.
They needn't have a strong familiarity with the driver, they're often
just picking it up off the shelf.

> > and of course you still need diagnostics in the
> > non-deferral case.  Whatever happens we'll need a lot of per-driver
> > churn, either removing existing diagnostics that get factored into cores
> > or updating to use this new API.

> The point is if you push it into core you'll get the benefit of
> notifying about the deferral (and can "attach" deferral reason to a
> device) without changing drivers at all. You can clean them up later
> if you want, or decide that additional logging in error paths does not
> hurt. This new API does not do you any good unless you convert
> drivers, and you need to convert the majority of them to be able to
> rely on the deferral diagnostic that is being added.

The push for this is that there's already people going around modifying
drivers whatever happens but at present they're mainly trying to delete
diagnostics which isn't wonderful.  Besides, even if we push things into
the subsystems they'd want to use this interface or something quite like
it anyway - it's more a question of if we go quickly add some users to
subsystems isn't it?  I'm not against that.