Message ID | 20200713144324.23654-1-a.hajda@samsung.com (mailing list archive) |
---|---|
Series | driver core: add probe error check helper |
Hi Greg,

Apparently the patchset has no more comments.

Could you take the patches to your tree? At least 1st and 2nd.

Regards
Andrzej

On 13.07.2020 16:43, Andrzej Hajda wrote:
> Hi All,
>
> Thanks for comments.
>
> Changes since v8:
> - fixed typo in function name,
> - removed cocci script (added by mistake)
>
> Changes since v7:
> - improved commit message
> - added R-Bs
>
> Changes since v6:
> - removed leftovers from old naming scheme in commit descriptions,
> - added R-Bs.
>
> Changes since v5:
> - removed patch adding macro, dev_err_probe(dev, PTR_ERR(ptr), ...) should be used instead,
> - added dev_dbg logging in case of -EPROBE_DEFER,
> - renamed functions and vars according to comments,
> - extended docs,
> - cosmetics.
>
> Original message (with small adjustments):
>
> Recently I took some time to re-check error handling in drivers' probe code,
> and I have noticed that the number of cases of incorrect resource acquisition
> error handling has increased, and there are no other proposals which can cure
> the situation.
>
> So I have decided to resend my old proposal of the probe_err helper, which should
> simplify resource acquisition error handling. I have also extended it to add the
> deferred probe reason to the devices_deferred debugfs property, which should improve
> the debugging experience for developers/testers.
>
> I have also added two patches showing usage and benefits of the helper.
>
> My dirty/ad-hoc cocci scripts show that this helper can be used in at least 2700 places,
> saving about 3500 lines of code.
>
> Regards
> Andrzej
>
>
> Andrzej Hajda (4):
>   driver core: add device probe log helper
>   driver core: add deferring probe reason to devices_deferred property
>   drm/bridge/sii8620: fix resource acquisition error handling
>   drm/bridge: lvds-codec: simplify error handling
>
>  drivers/base/base.h                  |  3 ++
>  drivers/base/core.c                  | 46 ++++++++++++++++++++++++++++
>  drivers/base/dd.c                    | 23 +++++++++++++-
>  drivers/gpu/drm/bridge/lvds-codec.c  | 10 ++----
>  drivers/gpu/drm/bridge/sil-sii8620.c | 21 ++++++-------
>  include/linux/device.h               |  3 ++
>  6 files changed, 86 insertions(+), 20 deletions(-)

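A minimal userspace model of the helper described in the cover letter may make its contract easier to see. It assumes the behaviour stated in the changelog plus the fact that the helper returns the error code it was given: genuine failures are reported at error level, `-EPROBE_DEFER` is logged only at debug level, and the message is recorded as the deferral reason that devices_deferred can show. The `struct device` here, its `deferred_reason` field, `fake_clk_get()` and `example_probe()` are simplified hypothetical stand-ins, not kernel code; the real helper lives in drivers/base/core.c and prints through `dev_err()`/`dev_dbg()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Kernel-internal errno value (include/linux/errno.h); userspace lacks it. */
#define EPROBE_DEFER 517

/* Simplified stand-in for struct device: a name plus a deferral reason. */
struct device {
	const char *name;
	char deferred_reason[128];
};

/*
 * Userspace model of dev_err_probe(): format the message, report real
 * errors loudly, remember the reason for -EPROBE_DEFER, and return err
 * so the caller can write "return dev_err_probe(...)".
 */
static int dev_err_probe(struct device *dev, int err, const char *fmt, ...)
{
	char msg[96];
	va_list args;

	assert(dev != NULL);
	va_start(args, fmt);
	vsnprintf(msg, sizeof(msg), fmt, args);
	va_end(args);

	if (err != -EPROBE_DEFER) {
		/* Real failure: always printed, like dev_err(). */
		fprintf(stderr, "%s: error %d: %s", dev->name, err, msg);
	} else {
		/* Deferral: the kernel helper only uses dev_dbg() here, so the
		 * model prints nothing and just records the reason. */
		snprintf(dev->deferred_reason, sizeof(dev->deferred_reason),
			 "%s", msg);
	}
	return err;
}

/* Hypothetical resource lookup: fails with -EPROBE_DEFER until ready. */
static int fake_clk_get(int provider_ready)
{
	return provider_ready ? 0 : -EPROBE_DEFER;
}

/*
 * Hypothetical probe after conversion. The pre-helper pattern was:
 *	if (ret != -EPROBE_DEFER)
 *		dev_err(dev, "failed to get bus clock\n");
 *	return ret;
 */
static int example_probe(struct device *dev, int provider_ready)
{
	int ret = fake_clk_get(provider_ready);

	if (ret)
		return dev_err_probe(dev, ret, "failed to get bus clock\n");
	return 0;
}
```

Because the helper returns `err`, each failure site collapses to a single `return dev_err_probe(...)` instead of an `if (ret != -EPROBE_DEFER)` guard around `dev_err()`, which is where the line savings quoted in the cover letter come from.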
On Tue, Jul 28, 2020 at 05:05:03PM +0200, Andrzej Hajda wrote:
> Hi Greg,
>
> Apparently the patchset has no more comments.
>
> Could you take the patches to your tree? At least 1st and 2nd.

All now queued up, thanks!

greg k-h

On Thu, Jul 30, 2020 at 12:10 AM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
>
> On Tue, Jul 28, 2020 at 05:05:03PM +0200, Andrzej Hajda wrote:
> > Hi Greg,
> >
> > Apparently the patchset has no more comments.
> >
> > Could you take the patches to your tree? At least 1st and 2nd.
>
> All now queued up, thanks!

I believe it still has not been answered why this can't be pushed into
resource providers (clock, regulators, gpio, interrupts, etc),
especially for devm APIs where we know exactly what device we are
requesting a resource for, so that individual drivers do not need to
change anything. We can mark the device as being probed so that probe
deferral is only handled when we actually execute probe() (and for the
bonus points scream loudly if someone tries to return -EPROBE_DEFER
outside of probe path).

And now with a coccinelle script we can expect a deluge of patches
reshuffling drivers...

Thanks.

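Dmitry's alternative can be sketched as a userspace model, assuming a hypothetical `probing` flag that the core sets around the driver's probe callback and a hypothetical `provider_defer()` hook that a resource framework calls whenever it is about to return `-EPROBE_DEFER`. None of these names (`provider_defer`, `demo_clk_get`, `demo_probe`, `core_run_probe`, the extra `struct device` fields) exist in the kernel; they only illustrate how a devm-style getter, which already knows the requesting device, could record the deferral reason itself and warn when deferral happens outside probe, while the driver simply propagates the error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Kernel-internal errno value; userspace <errno.h> does not define it. */
#define EPROBE_DEFER 517

/* Stand-in for struct device with hypothetical core-owned fields. */
struct device {
	const char *name;
	bool probing;               /* hypothetical: set by the core around probe() */
	char deferred_reason[128];  /* what a devices_deferred listing could show */
	int defer_outside_probe;    /* hypothetical: counts misuse warnings */
};

/*
 * Hypothetical hook a resource framework calls just before returning
 * -EPROBE_DEFER for @dev. Drivers do not need to change at all.
 */
static int provider_defer(struct device *dev, const char *kind,
			  const char *con_id)
{
	if (!dev->probing) {
		/* "Scream loudly": deferral outside probe() makes no sense. */
		fprintf(stderr, "WARNING: %s: deferral of %s '%s' outside probe\n",
			dev->name, kind, con_id);
		dev->defer_outside_probe++;
	} else {
		snprintf(dev->deferred_reason, sizeof(dev->deferred_reason),
			 "%s '%s' not ready", kind, con_id);
	}
	return -EPROBE_DEFER;
}

/* Hypothetical devm-style getter: it already knows the requesting device. */
static int demo_clk_get(struct device *dev, const char *con_id, bool ready)
{
	return ready ? 0 : provider_defer(dev, "clock", con_id);
}

/* Driver probe: nothing to handle, just float the error up. */
static int demo_probe(struct device *dev, bool clk_ready)
{
	return demo_clk_get(dev, "bus", clk_ready);
}

/* Hypothetical core wrapper that marks the device as being probed. */
static int core_run_probe(struct device *dev, bool clk_ready)
{
	int ret;

	dev->probing = true;
	ret = demo_probe(dev, clk_ready);
	dev->probing = false;
	return ret;
}
```

The trade-off argued over in the rest of the thread is visible here: the provider only knows its own resource kind and the connection ID it was given, so the recorded reason is only as descriptive as the lookup key the driver passed in.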
On Thu, Jul 30, 2020 at 09:18:30AM -0700, Dmitry Torokhov wrote:

> I believe it still has not been answered why this can't be pushed into
> resource providers (clock, regulators, gpio, interrupts, etc),
> especially for devm APIs where we know exactly what device we are
> requesting a resource for, so that individual drivers do not need to
> change anything.

The error messages are frequently in the caller rather than the
frameworks, it's often helpful for the comprehensibility of the error
messages especially in cases where things may be legitimately absent.

> We can mark the device as being probed so that probe
> deferral is only handled when we actually execute probe() (and for the
> bonus points scream loudly if someone tries to return -EPROBE_DEFER
> outside of probe path).

Is this a big issue?

On Thu, Jul 30, 2020 at 9:49 AM Mark Brown <broonie@kernel.org> wrote:
>
> On Thu, Jul 30, 2020 at 09:18:30AM -0700, Dmitry Torokhov wrote:
>
> > I believe it still has not been answered why this can't be pushed into
> > resource providers (clock, regulators, gpio, interrupts, etc),
> > especially for devm APIs where we know exactly what device we are
> > requesting a resource for, so that individual drivers do not need to
> > change anything.
>
> The error messages are frequently in the caller rather than the
> frameworks, it's often helpful for the comprehensibility of the error
> messages especially in cases where things may be legitimately absent.

Not for deferral. All you need to know in this case is:

"device A is attempting to request resource B which is not ready yet"

There is nothing to handle on the caller part except to float the error up.

>
> > We can mark the device as being probed so that probe
> > deferral is only handled when we actually execute probe() (and for the
> > bonus points scream loudly if someone tries to return -EPROBE_DEFER
> > outside of probe path).
>
> Is this a big issue?

We do not know ;) Probably not. It will just get reported as an
ordinary failure and the driver will handle it somehow. Still it would
be nice to know if we attempt to raise deferrals in code paths where
they do not make sense.

Thanks.

On Thu, Jul 30, 2020 at 10:46:31AM -0700, Dmitry Torokhov wrote:
> On Thu, Jul 30, 2020 at 9:49 AM Mark Brown <broonie@kernel.org> wrote:

> > The error messages are frequently in the caller rather than the
> > frameworks, it's often helpful for the comprehensibility of the error
> > messages especially in cases where things may be legitimately absent.

> Not for deferral. All you need to know in this case is:

> "device A is attempting to request resource B which is not ready yet"

> There is nothing to handle on the caller part except to float the error up.

You can sometimes do a better job of explaining what the resource you
were looking for was, and of course you still need diagnostics in the
non-deferral case. Whatever happens we'll need a lot of per-driver
churn, either removing existing diagnostics that get factored into cores
or updating to use this new API.

On Thu, Jul 30, 2020 at 11:16 AM Mark Brown <broonie@kernel.org> wrote:
>
> On Thu, Jul 30, 2020 at 10:46:31AM -0700, Dmitry Torokhov wrote:
> > On Thu, Jul 30, 2020 at 9:49 AM Mark Brown <broonie@kernel.org> wrote:
>
> > > The error messages are frequently in the caller rather than the
> > > frameworks, it's often helpful for the comprehensibility of the error
> > > messages especially in cases where things may be legitimately absent.
>
> > Not for deferral. All you need to know in this case is:
>
> > "device A is attempting to request resource B which is not ready yet"
>
> > There is nothing to handle on the caller part except to float the error up.
>
> You can sometimes do a better job of explaining what the resource you
> were looking for was,

I think it is true for very esoteric cases. I.e. your driver uses 2
interrupt lines, or something like that. For GPIO, regulators, and
clocks we normally have a name/connection ID that provides enough of
context. We need to remember, the error messages really only make
total sense to a person familiar with the driver to begin with, not
for a random person looking at the log.

> and of course you still need diagnostics in the
> non-deferral case. Whatever happens we'll need a lot of per-driver
> churn, either removing existing diagnostics that get factored into cores
> or updating to use this new API.

The point is if you push it into core you'll get the benefit of
notifying about the deferral (and can "attach" deferral reason to a
device) without changing drivers at all. You can clean them up later
if you want, or decide that additional logging in error paths does not
hurt. This new API does not do you any good unless you convert
drivers, and you need to convert the majority of them to be able to
rely on the deferral diagnostic that is being added.

Thanks.

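The disagreement over who can write the more useful message can be made concrete with two hypothetical formatting helpers: one a core might use, falling back to a bare index when the lookup had no connection ID (as with multiple interrupt lines, or older bindings built on arrays of phandles), and one a driver might use, naming what the resource is for. Both functions and the strings they produce are illustrative assumptions, not messages printed by any real subsystem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical core-side message: names the resource by its connection ID
 * when the lookup had one, otherwise only by its index in the binding.
 */
static void core_defer_msg(char *buf, size_t len, const char *dev_name,
			   const char *kind, const char *con_id, int index)
{
	if (con_id)
		snprintf(buf, len, "%s: %s '%s' not ready", dev_name, kind, con_id);
	else
		snprintf(buf, len, "%s: %s #%d not ready", dev_name, kind, index);
}

/*
 * Hypothetical driver-side message: the driver knows what each entry in
 * its binding is for and can say so.
 */
static void driver_defer_msg(char *buf, size_t len, const char *dev_name,
			     const char *purpose)
{
	snprintf(buf, len, "%s: failed to get %s", dev_name, purpose);
}
```

With a connection ID the two messages carry roughly the same information, which is Dmitry's point; with index-only lookups, only the driver's wording tells a system integrator writing a DT which entry is missing, which is Mark's.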
On Thu, Jul 30, 2020 at 11:45:25AM -0700, Dmitry Torokhov wrote:
> On Thu, Jul 30, 2020 at 11:16 AM Mark Brown <broonie@kernel.org> wrote:

> > You can sometimes do a better job of explaining what the resource you
> > were looking for was,

> I think it is true for very esoteric cases. I.e. your driver uses 2
> interrupt lines, or something like that. For GPIO, regulators, and
> clocks we normally have a name/connection ID that provides enough of

*Normally* but not always - some of the older bindings do love their
arrays of phandles (or mixes of numbers and phandles!) unfortunately.

> context. We need to remember, the error messages really only make
> total sense to a person familiar with the driver to begin with, not
> for a random person looking at the log.

Not really, one of the big targets is people doing system integration
who are writing a DT or possibly producing a highly tuned kernel config.
They needn't have a strong familiarity with the driver, they're often
just picking it up off the shelf.

> > and of course you still need diagnostics in the
> > non-deferral case. Whatever happens we'll need a lot of per-driver
> > churn, either removing existing diagnostics that get factored into cores
> > or updating to use this new API.

> The point is if you push it into core you'll get the benefit of
> notifying about the deferral (and can "attach" deferral reason to a
> device) without changing drivers at all. You can clean them up later
> if you want, or decide that additional logging in error paths does not
> hurt. This new API does not do you any good unless you convert
> drivers, and you need to convert the majority of them to be able to
> rely on the deferral diagnostic that is being added.

The push for this is that there's already people going around modifying
drivers whatever happens, but at present they're mainly trying to delete
diagnostics, which isn't wonderful.

Besides, even if we push things into the subsystems they'd want to use
this interface or something quite like it anyway - it's more a question
of whether we go and quickly add some users to the subsystems, isn't it?
I'm not against that.