[v3,2/3] drm/bridge: parade-ps8640: Use regmap APIs

Message ID	20210914162825.v3.2.Ib06997ddd73e2ac29e185f039d85cfa8e760d641@changeid (mailing list archive)
State	New, archived
Headers	From: Philip Chen <philipchen@chromium.org> To: LKML <linux-kernel@vger.kernel.org> Cc: dianders@chromium.org, swboyd@chromium.org, Philip Chen <philipchen@chromium.org>, Andrzej Hajda <a.hajda@samsung.com>, Daniel Vetter <daniel@ffwll.ch>, David Airlie <airlied@linux.ie>, Jernej Skrabec <jernej.skrabec@gmail.com>, Jonas Karlman <jonas@kwiboo.se>, Laurent Pinchart <Laurent.pinchart@ideasonboard.com>, Neil Armstrong <narmstrong@baylibre.com>, Robert Foss <robert.foss@linaro.org>, dri-devel@lists.freedesktop.org Subject: [PATCH v3 2/3] drm/bridge: parade-ps8640: Use regmap APIs Date: Tue, 14 Sep 2021 16:28:44 -0700
Series	[v3,1/3] drm/bridge: parade-ps8640: Improve logging at probing; [v3,2/3] drm/bridge: parade-ps8640: Use regmap APIs; [v3,3/3] drm/bridge: parade-ps8640: Add support for AUX channel

Philip Chen Sept. 14, 2021, 11:28 p.m. UTC

Replace the direct i2c access (i2c_smbus_* functions) with regmap APIs,
which will simplify future updates to the ps8640 driver.

Reviewed-by: Douglas Anderson <dianders@chromium.org>

Signed-off-by: Philip Chen <philipchen@chromium.org>
---

Changes in v3:
- Fix the nits from v2 review

Changes in v2:
- Add separate reg map config per page

 drivers/gpu/drm/bridge/parade-ps8640.c | 94 +++++++++++++++++++-------
 1 file changed, 68 insertions(+), 26 deletions(-)

Stephen Boyd Sept. 15, 2021, 12:29 a.m. UTC | #1

Quoting Philip Chen (2021-09-14 16:28:44)
> diff --git a/drivers/gpu/drm/bridge/parade-ps8640.c b/drivers/gpu/drm/bridge/parade-ps8640.c
> index e340af381e05..8d3e7a147170 100644
> --- a/drivers/gpu/drm/bridge/parade-ps8640.c
> +++ b/drivers/gpu/drm/bridge/parade-ps8640.c
> @@ -368,6 +396,12 @@ static int ps8640_probe(struct i2c_client *client)
>
>         ps_bridge->page[PAGE0_DP_CNTL] = client;
>
> +       ps_bridge->regmap[PAGE0_DP_CNTL] = devm_regmap_init_i2c(client, ps8640_regmap_config);
> +       if (IS_ERR(ps_bridge->regmap[PAGE0_DP_CNTL])) {
> +               return dev_err_probe(dev, PTR_ERR(ps_bridge->regmap[PAGE0_DP_CNTL]),
> +                                    "Error initting page 0 regmap\n");

This one also doesn't return -EPROBE_DEFER? The dev_err_probe() should
really only be used on "get" style APIs that can defer.

> +       }
> +
>         for (i = 1; i < ARRAY_SIZE(ps_bridge->page); i++) {
>                 ps_bridge->page[i] = devm_i2c_new_dummy_device(&client->dev,
>                                                              client->adapter,

Doug Anderson Sept. 15, 2021, 2:17 a.m. UTC | #2

Hi,

On Tue, Sep 14, 2021 at 5:29 PM Stephen Boyd <swboyd@chromium.org> wrote:
>
> Quoting Philip Chen (2021-09-14 16:28:44)
> > diff --git a/drivers/gpu/drm/bridge/parade-ps8640.c b/drivers/gpu/drm/bridge/parade-ps8640.c
> > index e340af381e05..8d3e7a147170 100644
> > --- a/drivers/gpu/drm/bridge/parade-ps8640.c
> > +++ b/drivers/gpu/drm/bridge/parade-ps8640.c
> > @@ -368,6 +396,12 @@ static int ps8640_probe(struct i2c_client *client)
> >
> >         ps_bridge->page[PAGE0_DP_CNTL] = client;
> >
> > +       ps_bridge->regmap[PAGE0_DP_CNTL] = devm_regmap_init_i2c(client, ps8640_regmap_config);
> > +       if (IS_ERR(ps_bridge->regmap[PAGE0_DP_CNTL])) {
> > +               return dev_err_probe(dev, PTR_ERR(ps_bridge->regmap[PAGE0_DP_CNTL]),
> > +                                    "Error initting page 0 regmap\n");
>
> This one also doesn't return -EPROBE_DEFER? The dev_err_probe() should
> really only be used on "get" style APIs that can defer.

Any reason why you say that dev_err_probe() should only be used on
"get" style APIs that can defer? Even if an API can't return
-EPROBE_DEFER, using dev_err_probe() still (IMO) makes the code
cleaner and should be used for any error cases like this during probe.
Why?

* It shows the error code in a standard way for you.
* It returns the error code you passed it so you can make your error
return "one line" instead of 2.

Is there some bad thing about dev_err_probe() that makes it
problematic to use? If not then the above advantages should be a net
win, right?

-Doug

Stephen Boyd Sept. 15, 2021, 2:50 a.m. UTC | #3

Quoting Doug Anderson (2021-09-14 19:17:03)
> Hi,
>
> On Tue, Sep 14, 2021 at 5:29 PM Stephen Boyd <swboyd@chromium.org> wrote:
> >
> > Quoting Philip Chen (2021-09-14 16:28:44)
> > > diff --git a/drivers/gpu/drm/bridge/parade-ps8640.c b/drivers/gpu/drm/bridge/parade-ps8640.c
> > > index e340af381e05..8d3e7a147170 100644
> > > --- a/drivers/gpu/drm/bridge/parade-ps8640.c
> > > +++ b/drivers/gpu/drm/bridge/parade-ps8640.c
> > > @@ -368,6 +396,12 @@ static int ps8640_probe(struct i2c_client *client)
> > >
> > >         ps_bridge->page[PAGE0_DP_CNTL] = client;
> > >
> > > +       ps_bridge->regmap[PAGE0_DP_CNTL] = devm_regmap_init_i2c(client, ps8640_regmap_config);
> > > +       if (IS_ERR(ps_bridge->regmap[PAGE0_DP_CNTL])) {
> > > +               return dev_err_probe(dev, PTR_ERR(ps_bridge->regmap[PAGE0_DP_CNTL]),
> > > +                                    "Error initting page 0 regmap\n");
> >
> > This one also doesn't return -EPROBE_DEFER? The dev_err_probe() should
> > really only be used on "get" style APIs that can defer.
>
> Any reason why you say that dev_err_probe() should only be used on
> "get" style APIs that can defer? Even if an API can't return
> -EPROBE_DEFER, using dev_err_probe() still (IMO) makes the code
> cleaner and should be used for any error cases like this during probe.
> Why?
>
> * It shows the error code in a standard way for you.
> * It returns the error code you passed it so you can make your error
> return "one line" instead of 2.

I'd rather see any sort of error message in getter APIs be pushed into
the callee so that we reduce the text size of the kernel by having one
message instead of hundreds/thousands about "failure to get something".
As far as I can tell this API is designed to skip printing anything when
EPROBE_DEFER is returned, and only print something when it isn't that
particular error code. The other benefit of this API is it sets the
deferred reason in debugfs which is nice to know why some device failed
to probe. Of course now with fw_devlink that almost never triggers so
the feature is becoming useless.

>
> Is there some bad thing about dev_err_probe() that makes it
> problematic to use? If not then the above advantages should be a net
> win, right?
>

I view it as an anti-pattern. We should strive for driver probe to be
fairly simple so that it's basically getting resources and registering
with frameworks. The error messages in probe may help when you're trying
to get the driver to work and the resource APIs don't make any sense but
after that it's basically debug messages hiding as error messages.
They're never supposed to happen in practice, because the code is
tested, right?

Doug Anderson Sept. 15, 2021, 4:41 p.m. UTC | #4

Hi,

On Tue, Sep 14, 2021 at 7:50 PM Stephen Boyd <swboyd@chromium.org> wrote:
>
> Quoting Doug Anderson (2021-09-14 19:17:03)
> > Hi,
> >
> > On Tue, Sep 14, 2021 at 5:29 PM Stephen Boyd <swboyd@chromium.org> wrote:
> > >
> > > Quoting Philip Chen (2021-09-14 16:28:44)
> > > > diff --git a/drivers/gpu/drm/bridge/parade-ps8640.c b/drivers/gpu/drm/bridge/parade-ps8640.c
> > > > index e340af381e05..8d3e7a147170 100644
> > > > --- a/drivers/gpu/drm/bridge/parade-ps8640.c
> > > > +++ b/drivers/gpu/drm/bridge/parade-ps8640.c
> > > > @@ -368,6 +396,12 @@ static int ps8640_probe(struct i2c_client *client)
> > > >
> > > >         ps_bridge->page[PAGE0_DP_CNTL] = client;
> > > >
> > > > +       ps_bridge->regmap[PAGE0_DP_CNTL] = devm_regmap_init_i2c(client, ps8640_regmap_config);
> > > > +       if (IS_ERR(ps_bridge->regmap[PAGE0_DP_CNTL])) {
> > > > +               return dev_err_probe(dev, PTR_ERR(ps_bridge->regmap[PAGE0_DP_CNTL]),
> > > > +                                    "Error initting page 0 regmap\n");
> > >
> > > This one also doesn't return -EPROBE_DEFER? The dev_err_probe() should
> > > really only be used on "get" style APIs that can defer.
> >
> > Any reason why you say that dev_err_probe() should only be used on
> > "get" style APIs that can defer? Even if an API can't return
> > -EPROBE_DEFER, using dev_err_probe() still (IMO) makes the code
> > cleaner and should be used for any error cases like this during probe.
> > Why?
> >
> > * It shows the error code in a standard way for you.
> > * It returns the error code you passed it so you can make your error
> > return "one line" instead of 2.
>
> I'd rather see any sort of error message in getter APIs be pushed into
> the callee so that we reduce the text size of the kernel by having one
> message instead of hundreds/thousands about "failure to get something".
> As far as I can tell this API is designed to skip printing anything when
> EPROBE_DEFER is returned, and only print something when it isn't that
> particular error code. The other benefit of this API is it sets the
> deferred reason in debugfs which is nice to know why some device failed
> to probe. Of course now with fw_devlink that almost never triggers so
> the feature is becoming useless.

I guess we need to split this apart into two issues. One (1) is
whether we should be printing errors like this in probe() and the
other (2) is the use of dev_err_probe() for cases where err could
never be -EPROBE_DEFER.

So the argument about reducing the text size for thousands of slightly
different errors is all about (1), right? In other words, you'd be
equally opposed to a change that added a normal error print with
dev_err(), right? IMO, this is a fair debate to have and it comes down
to a choice that has pros and cons. Yes the error messages are not
needed in the normal case and yes they bloat the kernel size, but when
something inevitably goes wrong then you have a way to track it down
instead of trying to guess or having to recompile the code to add
prints everywhere. Often this can give you a quick clue about a
missing Kconfig or a wrongly coded device tree file without tons of
time adding prints and recompiling code. That seems like it's worth
something...

One could also make the argument that if you don't care about all
these similar errors bloating the text segment that it would be pretty
easy to create a new Kconfig: "CONFIG_I_THINK_PROBE_ERRORS_ARE_BLOAT".
If that config is set then it could throw away the strings for every
dev_err_probe() that you compile in.
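
A rough userspace sketch of how such a switch could discard the strings
at compile time. Everything here is hypothetical: the
I_THINK_PROBE_ERRORS_ARE_BLOAT macro stands in for the proposed Kconfig
symbol, and probe_err(), struct mock_device and both helpers are made-up
names, not kernel APIs. The point is only that when the macro is defined,
the format string is never referenced and so need not end up in the
image.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for struct device; counts reported errors. */
struct mock_device {
	const char *name;
	int probe_errors;
};

#ifdef I_THINK_PROBE_ERRORS_ARE_BLOAT

/* String-free variant: only the device name and error code are printed. */
static int probe_err_no_string(struct mock_device *dev, int err)
{
	assert(err < 0);
	fprintf(stderr, "%s: probe error %d\n", dev->name, err);
	dev->probe_errors++;
	return err;
}

/* The format string and its arguments are dropped by the preprocessor. */
#define probe_err(dev, err, ...) probe_err_no_string((dev), (err))

#else

/* Default variant: print the full message, then return the code. */
static int probe_err_with_string(struct mock_device *dev, int err,
				 const char *fmt, ...)
{
	va_list args;

	assert(err < 0);
	fprintf(stderr, "%s: error %d: ", dev->name, err);
	va_start(args, fmt);
	vfprintf(stderr, fmt, args);
	va_end(args);
	dev->probe_errors++;
	return err;
}

#define probe_err(dev, err, ...) \
	probe_err_with_string((dev), (err), __VA_ARGS__)

#endif

/* Hypothetical probe path; simulated_ret plays the failing resource call. */
static int example_probe(struct mock_device *dev, int simulated_ret)
{
	if (simulated_ret < 0)
		return probe_err(dev, simulated_ret,
				 "Error initting page %d regmap\n", 0);
	return 0;
}
```

The call site in example_probe() is identical in both configurations;
only the macro expansion changes.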

I'm not so convinced about the argument (2) that dev_err_probe()
should only be used if the error code could be -EPROBE_DEFER. Compare
these two:

Old:
  ret = do_something_that_cant_defer();
  if (ret < 0) {
    dev_err(dev, "The foo failed to bar (%pe)\n", ERR_PTR(ret));
    return ret;
  }

New:
  ret = do_something_that_cant_defer();
  if (ret < 0)
    return dev_err_probe(dev, ret, "The foo failed to bar\n");

It seems clear to me that the "New" case is better. The error code is
printed in a consistent fashion compared to all other error prints and
the fact that it returns the error code makes it cleaner. It's fine
that the error could never be -EPROBE_DEFER. Certainly we could add a
new function called dev_err_with_code() that worked exactly like
dev_err_probe() except that it didn't have special logic for
-EPROBE_DEFER but why?

Also note that the current function is dev_err_probe(), not
dev_err_might_defer(). By the name, it should be useful / OK to use
for any errors that come up in the probe path.
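
The Old/New comparison above can be tried out in userspace with a small
mock. This is only a sketch under stated assumptions:
mock_dev_err_probe() imitates the behaviour described in the kernel-doc
(error-level print unless the code is -EPROBE_DEFER, always return the
code), struct mock_device and the simulated_ret parameter are invented
for illustration, and EPROBE_DEFER is defined locally because userspace
errno.h does not provide it.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>

/* Kernel-internal errno value; not defined by userspace <errno.h>. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Hypothetical stand-in for struct device; counts prints per level. */
struct mock_device {
	const char *name;
	int err_prints;
	int dbg_prints;
};

/*
 * Userspace imitation of dev_err_probe(): print at error level unless
 * err is -EPROBE_DEFER, and always hand the error code back.
 */
static int mock_dev_err_probe(struct mock_device *dev, int err,
			      const char *fmt, ...)
{
	va_list args;

	assert(dev != NULL);
	va_start(args, fmt);
	if (err != -EPROBE_DEFER) {
		fprintf(stderr, "%s: error %d: ", dev->name, err);
		vfprintf(stderr, fmt, args);
		dev->err_prints++;
	} else {
		/* The real helper prints at debug level and records the reason. */
		dev->dbg_prints++;
	}
	va_end(args);
	return err;
}

/* Hypothetical resource call; returns whatever code it is told to. */
static int do_something_that_cant_defer(int simulated_ret)
{
	return simulated_ret;
}

/* "Old" style: separate print and return, error code formatted by hand. */
static int probe_old(struct mock_device *dev, int simulated_ret)
{
	int ret = do_something_that_cant_defer(simulated_ret);

	if (ret < 0) {
		fprintf(stderr, "%s: The foo failed to bar (%d)\n",
			dev->name, ret);
		dev->err_prints++;
		return ret;
	}
	return 0;
}

/* "New" style: one statement prints and returns. */
static int probe_new(struct mock_device *dev, int simulated_ret)
{
	int ret = do_something_that_cant_defer(simulated_ret);

	if (ret < 0)
		return mock_dev_err_probe(dev, ret, "The foo failed to bar\n");
	return 0;
}
```

Both variants return the same code for the same failure; the difference
is only in how much the caller has to write and in who formats the error
code.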

> > Is there some bad thing about dev_err_probe() that makes it
> > problematic to use? If not then the above advantages should be a net
> > win, right?
> >
>
> I view it as an anti-pattern. We should strive for driver probe to be
> fairly simple so that it's basically getting resources and registering
> with frameworks. The error messages in probe may help when you're trying
> to get the driver to work and the resource APIs don't make any sense but
> after that it's basically debug messages hiding as error messages.
> They're never supposed to happen in practice, because the code is
> tested, right?

IMO they happen even after initial driver bringup. You can trip error
cases from device tree problems and config problems pretty easily. It
could also be that you're bringing up an old / tested / tried and true
driver but on new hardware where some other thing (clock, regulators,
etc) is returning an error. Being able to track these down easily can
justify the error messages long term.

...or maybe what you're saying is that if it's clear that the only
case that an error could be returned is due to a driver error then we
should skip the error message? I guess so, but only if it's somehow
built-in to the concept of the function that the only error case is a
driver error. Otherwise the function may change to check for more
errors in the future and you're back where you started.

In the case of devm_regmap_init_i2c(), the driver could be fine but
you might be trying to instantiate it on a system whose i2c bus lacks
the needed functionality. That's not a bug in the bridge driver but an
error in system integration. Yeah, after bringup of the new system you
probably don't need the error, but it will be useful during people's
bringups year after year.

-Doug

Stephen Boyd Sept. 16, 2021, 10:17 p.m. UTC | #5

TL;DR: Please try to reduce these error messages in drivers and
consolidate them into subsystems so that drivers stay simple.

Quoting Doug Anderson (2021-09-15 09:41:39)
> Hi,
>
> On Tue, Sep 14, 2021 at 7:50 PM Stephen Boyd <swboyd@chromium.org> wrote:
> >
> >
> > I'd rather see any sort of error message in getter APIs be pushed into
> > the callee so that we reduce the text size of the kernel by having one
> > message instead of hundreds/thousands about "failure to get something".
> > As far as I can tell this API is designed to skip printing anything when
> > EPROBE_DEFER is returned, and only print something when it isn't that
> > particular error code. The other benefit of this API is it sets the
> > deferred reason in debugfs which is nice to know why some device failed
> > to probe. Of course now with fw_devlink that almost never triggers so
> > the feature is becoming useless.
>
> I guess we need to split this apart into two issues. One (1) is
> whether we should be printing errors like this in probe() and the
> other (2) is the use of dev_err_probe() for cases where err could
> never be -EPROBE_DEFER.
>
> So the argument about reducing the text size for thousands of slightly
> different errors is all about (1), right? In other words, you'd be
> equally opposed to a change that added a normal error print with
> dev_err(), right? IMO, this is a fair debate to have and it comes down
> to a choice that has pros and cons. Yes the error messages are not
> needed in the normal case and yes they bloat the kernel size, but when
> something inevitably goes wrong then you have a way to track it down
> instead of trying to guess or having to recompile the code to add
> prints everywhere. Often this can give you a quick clue about a
> missing Kconfig or a wrongly coded device tree file without tons of
> time adding prints and recompiling code. That seems like it's worth
> something...

Agreed. dev_err_probe() does that by putting that into the deferred
reason debugfs file. I'm saying that drivers shouldn't really be using
this API unless they're doing something exotic. The subsystems that are
implementing the 'get' operation that may defer should use this function
and then drivers should just return the error value to driver core so
that we can consolidate error messages and shrink the kernel size.

Maybe we can look for the defer reason in call_driver_probe() and print
a warning message if the string is set. Right now -EPROBE_DEFER is
handled but it's a dev_dbg() print that probably nobody enables and it
doesn't print the reason string.

Even better, we could make the defer reason the 'probe failed reason'
instead, and then jam the dev_err_probe() string into there regardless
of EPROBE_DEFER being returned or not. This would elevate this API to
any sort of device probe error. One more crazy idea is that we could
save the stack when the dev_err_probe() call is made and print out the
stacktrace when the error string is printed in driver core. I'm not sure
this is any better than making it a WARN_ON() though.
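
A userspace sketch of the 'probe failed reason' idea, purely as an
illustration and not how driver core works today. All names are
hypothetical: struct mock_device, its fail_reason buffer,
mock_dev_err_probe() and mock_call_driver_probe() are invented, and
EPROBE_DEFER is defined locally since userspace errno.h lacks it. The
helper only stashes the formatted string, whatever the error code, and
the single print lives in the mock core.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Kernel-internal errno value; not defined by userspace <errno.h>. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Hypothetical stand-in for struct device with a stored failure reason. */
struct mock_device {
	const char *name;
	char fail_reason[128];
};

/* Stash the formatted reason for any error code and return the code. */
static int mock_dev_err_probe(struct mock_device *dev, int err,
			      const char *fmt, ...)
{
	va_list args;

	assert(dev != NULL);
	va_start(args, fmt);
	vsnprintf(dev->fail_reason, sizeof(dev->fail_reason), fmt, args);
	va_end(args);
	return err;
}

typedef int (*probe_fn)(struct mock_device *dev);

/* Mock driver core: the only place that prints the stored reason. */
static int mock_call_driver_probe(struct mock_device *dev, probe_fn probe)
{
	int ret;

	dev->fail_reason[0] = '\0';
	ret = probe(dev);
	if (ret < 0 && dev->fail_reason[0] != '\0')
		fprintf(stderr, "%s: probe failed (%d): %s",
			dev->name, ret, dev->fail_reason);
	return ret;
}

/* Hypothetical probe that fails outright. */
static int failing_probe(struct mock_device *dev)
{
	return mock_dev_err_probe(dev, -EIO,
				  "Error initting page %d regmap\n", 0);
}

/* Hypothetical probe that defers. */
static int deferring_probe(struct mock_device *dev)
{
	return mock_dev_err_probe(dev, -EPROBE_DEFER,
				  "waiting for regulator\n");
}
```

With this shape a driver's probe contains no print call of its own; it
just returns what the helper hands back.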

>
> One could also make the argument that if you don't care about all
> these similar errors bloating the text segment that it would be pretty
> easy to create a new Kconfig: "CONFIG_I_THINK_PROBE_ERRORS_ARE_BLOAT".
> If that config is set then it could throw away the strings for every
> dev_err_probe() that you compile in.

I'll leave this little CONFIG_PRINTK=n sledgehammer here.

>
>
> I'm not so convinced about the argument (2) that dev_err_probe()
> should only be used if the error code could be -EPROBE_DEFER. Compare
> these two:
>
> Old:
>   ret = do_something_that_cant_defer();
>   if (ret < 0) {
>     dev_err(dev, "The foo failed to bar (%pe)\n", ERR_PTR(ret));
>     return ret;
>   }
>
> New:
>   ret = do_something_that_cant_defer();
>   if (ret < 0)
>     return dev_err_probe(dev, ret, "The foo failed to bar\n");
>
> It seems clear to me that the "New" case is better. The error code is
> printed in a consistent fashion compared to all other error prints and
> the fact that it returns the error code makes it cleaner. It's fine
> that the error could never be -EPROBE_DEFER. Certainly we could add a
> new function called dev_err_with_code() that worked exactly like
> dev_err_probe() except that it didn't have special logic for
> -EPROBE_DEFER but why?
>
> Also note that the current function is dev_err_probe(), not
> dev_err_might_defer(). By the name, it should be useful / OK to use
> for any errors that come up in the probe path.

I looked at the documentation for dev_err_probe():

 * This helper implements common pattern present in probe functions for error
 * checking: print debug or error message depending if the error value is
 * -EPROBE_DEFER and propagate error upwards.
 * In case of -EPROBE_DEFER it sets also defer probe reason, which can be
 * checked later by reading devices_deferred debugfs attribute.

This seems to imply that it's all about EPROBE_DEFER. I'm just
reconstructing what I read from kernel-doc. If the intent is to use it
outside of probe defer, then please update the documentation to
alleviate confusion.

>
>
> > > Is there some bad thing about dev_err_probe() that makes it
> > > problematic to use? If not then the above advantages should be a net
> > > win, right?
> > >
> >
> > I view it as an anti-pattern. We should strive for driver probe to be
> > fairly simple so that it's basically getting resources and registering
> > with frameworks. The error messages in probe may help when you're trying
> > to get the driver to work and the resource APIs don't make any sense but
> > after that it's basically debug messages hiding as error messages.
> > They're never supposed to happen in practice, because the code is
> > tested, right?
>
> IMO they happen even after initial driver bringup. You can trip error
> cases from device tree problems and config problems pretty easily. It
> could also be that you're bringing up an old / tested / tried and true
> driver but on new hardware where some other thing (clock, regulators,
> etc) is returning an error. Being able to track these down easily can
> justify the error messages long term.
>
> ...or maybe what you're saying is that if it's clear that the only
> case that an error could be returned is due to a driver error then we
> should skip the error message? I guess so, but only if it's somehow
> built-in to the concept of the function that the only error case is a
> driver error. Otherwise the function may change to check for more
> errors in the future and you're back where you started.

I didn't really follow this paragraph, sorry.

>
> In the case of devm_regmap_init_i2c(), the driver could be fine but
> you might be trying to instantiate it on a system whose i2c bus lacks
> the needed functionality. That's not a bug in the bridge driver but an
> error in system integration. Yeah, after bringup of the new system you
> probably don't need the error, but it will be useful during people's
> bringups year after year.
>

The point I'm trying to make is that these error messages in probe
almost never get printed after the driver is brought up on the hardware
that starts shipping out to non-kernel developers. Of course they happen
when kernel devs are enabling new hardware year after year on the same
tried and tested driver. They're worthwhile messages to have to make our
lives easier at figuring out some misconfiguration, etc. The problem is
they lead to bloat once the bringup/configuration phase is over.

At one point we directed driver authors at dev_dbg() for these prints so
that the strings would be removed from the kernel image if debugging
wasn't enabled. It looks like dev_err_probe() goes in the opposite
direction by printing an error message and passing the string to an
exported function, so dev_dbg() won't reduce the image size. Ugh!

Doug Anderson Sept. 16, 2021, 11:21 p.m. UTC | #6

Hi,

On Thu, Sep 16, 2021 at 3:17 PM Stephen Boyd <swboyd@chromium.org> wrote:
>
> TL;DR: Please try to reduce these error messages in drivers and
> consolidate them into subsystems so that drivers stay simple.
>
> Quoting Doug Anderson (2021-09-15 09:41:39)
> > Hi,
> >
> > On Tue, Sep 14, 2021 at 7:50 PM Stephen Boyd <swboyd@chromium.org> wrote:
> > >
> > >
> > > I'd rather see any sort of error message in getter APIs be pushed into
> > > the callee so that we reduce the text size of the kernel by having one
> > > message instead of hundreds/thousands about "failure to get something".
> > > As far as I can tell this API is designed to skip printing anything when
> > > EPROBE_DEFER is returned, and only print something when it isn't that
> > > particular error code. The other benefit of this API is it sets the
> > > deferred reason in debugfs which is nice to know why some device failed
> > > to probe. Of course now with fw_devlink that almost never triggers so
> > > the feature is becoming useless.
> >
> > I guess we need to split this apart into two issues. One (1) is
> > whether we should be printing errors like this in probe() and the
> > other (2) is the use of dev_err_probe() for cases where err could
> > never be -EPROBE_DEFER.
> >
> > So the argument about reducing the text size for thousands of slightly
> > different errors is all about (1), right? In other words, you'd be
> > equally opposed to a change that added a normal error print with
> > dev_err(), right? IMO, this is a fair debate to have and it comes down
> > to a choice that has pros and cons. Yes the error messages are not
> > needed in the normal case and yes they bloat the kernel size, but when
> > something inevitably goes wrong then you have a way to track it down
> > instead of trying to guess or having to recompile the code to add
> > prints everywhere. Often this can give you a quick clue about a
> > missing Kconfig or a wrongly coded device tree file without tons of
> > time adding prints and recompiling code. That seems like it's worth
> > something...
>
> Agreed. dev_err_probe() does that by putting that into the deferred
> reason debugfs file. I'm saying that drivers shouldn't really be using
> this API unless they're doing something exotic. The subsystems that are
> implementing the 'get' operation that may defer should use this function
> and then drivers should just return the error value to driver core so
> that we can consolidate error messages and shrink the kernel size.
>
> Maybe we can look for the defer reason in call_driver_probe() and print
> a warning message if the string is set. Right now -EPROBE_DEFER is
> handled but it's a dev_dbg() print that probably nobody enables and it
> doesn't print the reason string.

Actually, in recent versions of the kernel it stashes the reason too.
I think there's a debugfs file "devices_deferred".


> Even better, we could make the defer reason the 'probe failed reason'
> instead, and then jam the dev_err_probe() string into there regardless
> of EPROBE_DEFER being returned or not. This would elevate this API to
> any sort of device probe error. One more crazy idea is that we could
> save the stack when the dev_err_probe() call is made and print out the
> stacktrace when the error string is printed in driver core. I'm not sure
> this is any better than making it a WARN_ON() though.
>
> >
> > One could also make the argument that if you don't care about all
> > these similar errors bloating the text segment that it would be pretty
> > easy to create a new Kconfig: "CONFIG_I_THINK_PROBE_ERRORS_ARE_BLOAT".
> > If that config is set then it could throw away the strings for every
> > dev_err_probe() that you compile in.
>
> I'll leave this little CONFIG_PRINTK=n sledgehammer here.
>
> >
> >
> > I'm not so convinced about the argument (2) that dev_err_probe()
> > should only be used if the error code could be -EPROBE_DEFER. Compare
> > these two:
> >
> > Old:
> >   ret = do_something_that_cant_defer();
> >   if (ret < 0) {
> >     dev_err(dev, "The foo failed to bar (%pe)\n", ERR_PTR(ret));
> >     return ret;
> >   }
> >
> > New:
> >   ret = do_something_that_cant_defer();
> >   if (ret < 0)
> >     return dev_err_probe(dev, ret, "The foo failed to bar\n");
> >
> > It seems clear to me that the "New" case is better. The error code is
> > printed in a consistent fashion compared to all other error prints and
> > the fact that it returns the error code makes it cleaner. It's fine
> > that the error could never be -EPROBE_DEFER. Certainly we could add a
> > new function called dev_err_with_code() that worked exactly like
> > dev_err_probe() except that it didn't have special logic for
> > -EPROBE_DEFER but why?
> >
> > Also note that the current function is dev_err_probe(), not
> > dev_err_might_defer(). By the name, it should be useful / OK to use
> > for any errors that come up in the probe path.
>
> I looked at the documentation for dev_err_probe()
>
>  * This helper implements common pattern present in probe functions for error
>  * checking: print debug or error message depending if the error value is
>  * -EPROBE_DEFER and propagate error upwards.
>  * In case of -EPROBE_DEFER it sets also defer probe reason, which can be
>  * checked later by reading devices_deferred debugfs attribute.
>
> This seems to imply that it's all about EPROBE_DEFER. I'm just
> reconstructing what I read from kernel-doc. If the intent is to use it
> outside of probe defer, then please update the documentation to
> alleviate confusion.

Meh. Yeah, it talks a lot about -EPROBE_DEFER, but it doesn't say it's
only for that.

Sure, I'll post a patch.

https://lore.kernel.org/r/20210916161931.1.I32bea713bd6c6fb419a24da73686145742b6c117@changeid


> > In the case of devm_regmap_init_i2c(), the driver could be fine but
> > you might be trying to instantiate it on a system whose i2c bus lacks
> > the needed functionality. That's not a bug in the bridge driver but an
> > error in system integration. Yeah, after bringup of the new system you
> > probably don't need the error, but it will be useful during people's
> > bringups year after year.
> >
>
> The point I'm trying to make is that these error messages in probe
> almost never get printed after the driver is brought up on the hardware
> that starts shipping out to non-kernel developers. Of course they happen
> when kernel devs are enabling new hardware year after year on the same
> tried and tested driver. They're worthwhile messages to have to make our
> lives easier at figuring out some misconfiguration, etc. The problem is
> they lead to bloat once the bringup/configuration phase is over.
>
> At one point we directed driver authors at dev_dbg() for these prints so
> that the strings would be removed from the kernel image if debugging
> wasn't enabled. It looks like dev_err_probe() goes in the opposite
> direction by printing an error message and passing the string to an
> exported function, so dev_dbg() won't reduce the image size. Ugh!

So maybe the key here is that "CONFIG_PRINTK=n" is not the same as
"CONFIG_I_THINK_PROBE_ERRORS_ARE_BLOAT" and it's not just that one has
a more flippant name than the other. I think your argument about the
fact that these errors almost never come up in practice is actually
true for pretty much _all_ probe errors, isn't it? So if you wanted to
keep non-probe errors in your system (keep PRINTK=y) and just do away
with these bloat-y probe errors then dev_err_probe() could really be
the key and there'd be a big benefit to using it for all these errors
during probe, not just ones that have a chance of deferring. ...and
yes, you could make this config do something fancy like do a stack
dump or print the return address if you actually hit one of these
errors once you've thrown away the string.

I also wouldn't necessarily agree that dev_dbg() was an amazing fit
for these error messages. They truly were error-level things that were
happening. These are things that are causing the probe to abort, not
just extra spammy debug info. Calling them "error" messages rather
than "debug" messages seems better...


-Doug

Stephen Boyd Sept. 17, 2021, 6:12 a.m. UTC | #7

Quoting Doug Anderson (2021-09-16 16:21:12)
> Hi,
>
> On Thu, Sep 16, 2021 at 3:17 PM Stephen Boyd <swboyd@chromium.org> wrote:
> >
> > TL;DR: Please try to reduce these error messages in drivers and
> > consolidate them into subsystems so that drivers stay simple.
> >
> > Quoting Doug Anderson (2021-09-15 09:41:39)
> > > Hi,
> > >
> > > On Tue, Sep 14, 2021 at 7:50 PM Stephen Boyd <swboyd@chromium.org> wrote:
> > > >
> > > >
> > > > I'd rather see any sort of error message in getter APIs be pushed into
> > > > the callee so that we reduce the text size of the kernel by having one
> > > > message instead of hundreds/thousands about "failure to get something".
> > > > As far as I can tell this API is designed to skip printing anything when
> > > > EPROBE_DEFER is returned, and only print something when it isn't that
> > > > particular error code. The other benefit of this API is it sets the
> > > > deferred reason in debugfs which is nice to know why some device failed
> > > > to probe. Of course now with fw_devlink that almost never triggers so
> > > > the feature is becoming useless.
> > >
> > > I guess we need to split this apart into two issues. One (1) is
> > > whether we should be printing errors like this in probe() and the
> > > other (2) is the use of dev_err_probe() for cases where err could
> > > never be -EPROBE_DEFER.
> > >
> > > So the argument about reducing the text size for thousands of slightly
> > > different errors is all about (1), right? In other words, you'd be
> > > equally opposed to a change that added a normal error print with
> > > dev_err(), right? IMO, this is a fair debate to have and it comes down
> > > to a choice that has pros and cons. Yes the error messages are not
> > > needed in the normal case and yes they bloat the kernel size, but when
> > > something inevitably goes wrong then you have a way to track it down
> > > instead of trying to guess or having to recompile the code to add
> > > prints everywhere. Often this can give you a quick clue about a
> > > missing Kconfig or a wrongly coded device tree file without tons of
> > > time adding prints and recompiling code. That seems like it's worth
> > > something...
> >
> > Agreed. dev_err_probe() does that by putting that into the deferred
> > reason debugfs file. I'm saying that drivers shouldn't really be using
> > this API unless they're doing something exotic. The subsystems that are
> > implementing the 'get' operation that may defer should use this function
> > and then drivers should just return the error value to driver core so
> > that we can consolidate error messages and shrink the kernel size.
> >
> > Maybe we can look for the defer reason in call_driver_probe() and print
> > a warning message if the string is set. Right now -EPROBE_DEFER is
> > handled but it's a dev_dbg() print that probably nobody enables and it
> > doesn't print the reason string.
>
> Actually, in recent versions of the kernel it stashes the reason too.
> I think there's a debugfs file "devices_deferred"

Yep that's what I meant by "putting that into the deferred reason
debugfs file" above.

> > This seems to imply that it's all about EPROBE_DEFER. I'm just
> > reconstructing what I read from kernel-doc. If the intent is to use it
> > outside of probe defer, then please update the documentation to
> > alleviate confusion.
>
> Meh. Yeah, it talks a lot about -EPROBE_DEFER, but it doesn't say it's
> only for that.
>
> Sure, I'll post a patch.
>
> https://lore.kernel.org/r/20210916161931.1.I32bea713bd6c6fb419a24da73686145742b6c117@changeid

Cool thanks.

>
>
> > > In the case of devm_regmap_init_i2c(), the driver could be fine but
> > > you might be trying to instantiate it on a system whose i2c bus lacks
> > > the needed functionality. That's not a bug in the bridge driver but an
> > > error in system integration. Yeah, after bringup of the new system you
> > > probably don't need the error, but it will be useful during people's
> > > bringups year after year.
> > >
> >
> > The point I'm trying to make is that these error messages in probe
> > almost never get printed after the driver is brought up on the hardware
> > that starts shipping out to non-kernel developers. Of course they happen
> > when kernel devs are enabling new hardware year after year on the same
> > tried and tested driver. They're worthwhile messages to have to make our
> > lives easier at figuring out some misconfiguration, etc. The problem is
> > they lead to bloat once the bringup/configuration phase is over.
> >
> > At one point we directed driver authors at dev_dbg() for these prints so
> > that the strings would be removed from the kernel image if debugging
> > wasn't enabled. It looks like dev_err_probe() goes in the opposite
> > direction by printing an error message and passing the string to an
> > exported function, so dev_dbg() won't reduce the image size. Ugh!
>
> So maybe the key here is that "CONFIG_PRINTK=n" is not the same as
> "CONFIG_I_THINK_PROBE_ERRORS_ARE_BLOAT" and it's not just that one has
> a more flippant name than the other. I think your argument about the
> fact that these errors almost never come up in practice is actually
> true for pretty much _all_ probe errors, isn't it? So if you wanted to
> keep non-probe errors in your system (keep PRINTK=y) and just do away
> with these bloat-y probe errors then dev_err_probe() could really be
> the key and there'd be a big benefit for using for all these errors
> during probe, not just ones that have a chance of deferring. ...and
> yes, you could make this config do something fancy like do a stack
> dump or print the return address if you actually hit one of these
> errors once you've thrown away the string.

Yes, but it's also just as important to push similar messages, i.e. "I
failed to get some resource", into the API that hands resources out so
that bloat is minimized further and drivers are kept simple.

>
> I also wouldn't necessarily agree that dev_dbg() was an amazing fit
> for these error messages. They truly were error-level things that were
> happening. These are things that are causing the probe to abort, not
> just extra spammy debug info. Calling them "error" messages rather
> than "debug" messages seems better...
>

Agreed. When all we had was dev_dbg(), it was the best option for getting
rid of these types of driver development printks.

Doug Anderson Sept. 17, 2021, 3:02 p.m. UTC | #8

Hi,

On Thu, Sep 16, 2021 at 11:12 PM Stephen Boyd <swboyd@chromium.org> wrote:
>
> > > > In the case of devm_regmap_init_i2c(), the driver could be fine but
> > > > you might be trying to instantiate it on a system whose i2c bus lacks
> > > > the needed functionality. That's not a bug in the bridge driver but an
> > > > error in system integration. Yeah, after bringup of the new system you
> > > > probably don't need the error, but it will be useful during people's
> > > > bringups year after year.
> > > >
> > >
> > > The point I'm trying to make is that these error messages in probe
> > > almost never get printed after the driver is brought up on the hardware
> > > that starts shipping out to non-kernel developers. Of course they happen
> > > when kernel devs are enabling new hardware year after year on the same
> > > tried and tested driver. They're worthwhile messages to have to make our
> > > lives easier at figuring out some misconfiguration, etc. The problem is
> > > they lead to bloat once the bringup/configuration phase is over.
> > >
> > > At one point we directed driver authors at dev_dbg() for these prints so
> > > that the strings would be removed from the kernel image if debugging
> > > wasn't enabled. It looks like dev_err_probe() goes in the opposite
> > > direction by printing an error message and passing the string to an
> > > exported function, so dev_dbg() won't reduce the image size. Ugh!
> >
> > So maybe the key here is that "CONFIG_PRINTK=n" is not the same as
> > "CONFIG_I_THINK_PROBE_ERRORS_ARE_BLOAT" and it's not just that one has
> > a more flippant name than the other. I think your argument about the
> > fact that these errors almost never come up in practice is actually
> > true for pretty much _all_ probe errors, isn't it? So if you wanted to
> > keep non-probe errors in your system (keep PRINTK=y) and just do away
> > with these bloat-y probe errors then dev_err_probe() could really be
> > the key and there'd be a big benefit for using for all these errors
> > during probe, not just ones that have a chance of deferring. ...and
> > yes, you could make this config do something fancy like do a stack
> > dump or print the return address if you actually hit one of these
> > errors once you've thrown away the string.
>
> Yes, but it's also just as important to push similar messages, i.e. "I
> failed to get some resource", into the API that hands resources out so
> that bloat is minimized further and drivers are kept simple.

Sure, but this is a slippery slope. If there's any chance that a
caller might want to know about the error but _not_ want the error
message printed then you can't push the error message into the API.
It's really hard to find error cases (even with "get resource" type
functions) where the caller _always_ wants the error reported. Even
kmalloc() has a nod to this with __GFP_NOWARN, though I'm not
advocating adding a "no warn" flag to all APIs. It's always possible
that the caller is expecting some types of errors and handles the case
elegantly.
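
As a rough illustration of the "caller expects the error" case, here is a
userspace sketch, not kernel code. The names mock_get_resource() and
mock_probe_with_fallback() and the nowarn flag are hypothetical, loosely
modelled on the __GFP_NOWARN idea; no real kernel API has this signature.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Counts messages emitted by the getter, so the behaviour can be inspected. */
static int warnings_printed;

/*
 * Hypothetical resource getter that reports its own failures, i.e. the
 * error message lives in the API rather than in each caller.
 */
static int mock_get_resource(bool available, bool nowarn)
{
	if (available)
		return 0;

	if (!nowarn) {
		fprintf(stderr, "failed to get resource\n");
		warnings_printed++;
	}
	return -ENODEV;
}

/* Caller that expects the preferred resource may be missing and falls back. */
static int mock_probe_with_fallback(bool preferred_ok, bool fallback_ok)
{
	/* A miss here is expected, so ask the getter to stay quiet. */
	if (mock_get_resource(preferred_ok, true) == 0)
		return 0;

	/* A miss on the fallback is a real error, so let it be reported. */
	return mock_get_resource(fallback_ok, false);
}
```

Without the flag, the first call would print an error even though the
caller recovers, which is the reason a message cannot always live in the
API.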

Let's pop all the way back up to the original point here, which was
about devm_regmap_init_i2c(). What should happen with errors? Let's
look specifically at the errors that could be returned by
regmap_get_i2c_bus() which is the first thing devm_regmap_init_i2c()
tries to do. Those errors have to do with the i2c bus not supporting
the features needed by our regmap.

a) We could return the error without printing anything like the code does today.

No bloat, but during bringup of this bridge chip on a new i2c bus we'd
have to manually add printouts to the probe function to figure out
this error.

b) We could push error reporting into regmap_get_i2c_bus().

No per-driver bloat, but some drivers might have a legitimate reason
not to have an error print here. Perhaps they have a fallback `regmap`
config that they want to be able to use that works with different bus
capabilities. I don't think we can do this.

c) We could use dev_dbg() to print the error.

Bloat only if dynamic debug is enabled or DEBUG is defined.

d) We could use dev_err_probe() to print the error

Extra bloat, though it could be minimized (without sacrificing all
"printk") with a future patch to drop the string from dev_err_probe()
and perhaps replace it with a WARN_ON(). Also handles the fact that
perhaps someday someone might find a reason that regmap_get_i2c_bus()
and/or devm_regmap_init_i2c() should suddenly start returning
-EPROBE_DEFER.
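
To make option d) concrete, here is a minimal userspace model of the
behaviour described for dev_err_probe() in this thread: it always returns
the error it was given, prints at error level unless the error is
-EPROBE_DEFER, and in the deferral case only records a reason. The names
mock_device, mock_dev_err_probe() and mock_probe() are hypothetical
stand-ins, and the message format is an assumption; this is not the
kernel implementation.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>

/* The kernel defines EPROBE_DEFER as 517; userspace errno.h does not. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Hypothetical stand-in for struct device: just enough state for the model. */
struct mock_device {
	const char *name;
	char deferred_reason[128];	/* models the stashed deferral reason */
	int errors_printed;		/* counts error-level prints */
};

/*
 * Userspace model of dev_err_probe(): returns err unchanged, prints unless
 * err is -EPROBE_DEFER, and records the reason when probing is deferred.
 */
static int mock_dev_err_probe(struct mock_device *dev, int err,
			      const char *fmt, ...)
{
	char msg[128];
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(msg, sizeof(msg), fmt, ap);
	va_end(ap);

	if (err == -EPROBE_DEFER) {
		snprintf(dev->deferred_reason, sizeof(dev->deferred_reason),
			 "%s", msg);
	} else {
		fprintf(stderr, "%s: error %d: %s", dev->name, err, msg);
		dev->errors_printed++;
	}
	return err;
}

/* Option d): the probe path reports and returns in a single statement. */
static int mock_probe(struct mock_device *dev, int regmap_err)
{
	if (regmap_err)
		return mock_dev_err_probe(dev, regmap_err,
					  "failed to init regmap\n");
	return 0;
}
```

The call site stays a single statement whether or not the error can ever
be -EPROBE_DEFER, which is the property that would let a future config
option drop the strings in one place.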

I'm still advocating for "d)" above and I believe you originally
advocated for "a)" or "c)". It's really not such a huge deal, so if
you're adamant about "a)" then I'll shut up. I'm curious if I've
managed to convince you all about "d)" though.

-Doug

Philip Chen Sept. 17, 2021, 10:49 p.m. UTC | #9

Hi Doug and Stephen,

Thanks for the review.
Until we reach a consensus on the best logging option, I'll remove the
printouts from this patch and just return PTR_ERR().
Once we do, we can improve logging in a separate patch.
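
For reference, the "just return PTR_ERR" pattern looks roughly like the
sketch below. It is a userspace model: ERR_PTR(), PTR_ERR() and IS_ERR()
are re-created here to mirror the kernel helpers, while mock_regmap,
mock_regmap_init() and mock_probe_silent() are hypothetical stand-ins for
the regmap and the driver's probe path, not the actual patch.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace re-creations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for the object a regmap init call would return. */
struct mock_regmap {
	int unused;
};

/* Returns a valid map, or an error pointer when fail_with is a negative errno. */
static struct mock_regmap *mock_regmap_init(int fail_with)
{
	static struct mock_regmap map;

	return fail_with ? ERR_PTR(fail_with) : &map;
}

/* Probe path with no printout: the error code is simply propagated. */
static int mock_probe_silent(int fail_with)
{
	struct mock_regmap *map = mock_regmap_init(fail_with);

	if (IS_ERR(map))
		return PTR_ERR(map);

	return 0;
}
```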

On Fri, Sep 17, 2021 at 8:02 AM Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Thu, Sep 16, 2021 at 11:12 PM Stephen Boyd <swboyd@chromium.org> wrote:
> >
> > > > > In the case of devm_regmap_init_i2c(), the driver could be fine but
> > > > > you might be trying to instantiate it on a system whose i2c bus lacks
> > > > > the needed functionality. That's not a bug in the bridge driver but an
> > > > > error in system integration. Yeah, after bringup of the new system you
> > > > > probably don't need the error, but it will be useful during people's
> > > > > bringups year after year.
> > > > >
> > > >
> > > > The point I'm trying to make is that these error messages in probe
> > > > almost never get printed after the driver is brought up on the hardware
> > > > that starts shipping out to non-kernel developers. Of course they happen
> > > > when kernel devs are enabling new hardware year after year on the same
> > > > tried and tested driver. They're worthwhile messages to have to make our
> > > > lives easier at figuring out some misconfiguration, etc. The problem is
> > > > they lead to bloat once the bringup/configuration phase is over.
> > > >
> > > > At one point we directed driver authors at dev_dbg() for these prints so
> > > > that the strings would be removed from the kernel image if debugging
> > > > wasn't enabled. It looks like dev_err_probe() goes in the opposite
> > > > direction by printing an error message and passing the string to an
> > > > exported function, so dev_dbg() won't reduce the image size. Ugh!
> > >
> > > So maybe the key here is that "CONFIG_PRINTK=n" is not the same as
> > > "CONFIG_I_THINK_PROBE_ERRORS_ARE_BLOAT" and it's not just that one has
> > > a more flippant name than the other. I think your argument about the
> > > fact that these errors almost never come up in practice is actually
> > > true for pretty much _all_ probe errors, isn't it? So if you wanted to
> > > keep non-probe errors in your system (keep PRINTK=y) and just do away
> > > with these bloat-y probe errors then dev_err_probe() could really be
> > > the key and there'd be a big benefit for using for all these errors
> > > during probe, not just ones that have a chance of deferring. ...and
> > > yes, you could make this config do something fancy like do a stack
> > > dump or print the return address if you actually hit one of these
> > > errors once you've thrown away the string.
> >
> > Yes, but it's also just as important to push similar messages, i.e. "I
> > failed to get some resource", into the API that hands resources out so
> > that bloat is minimized further and drivers are kept simple.
>
> Sure, but this is a slippery slope. If there's any chance that a
> caller might want to know about the error but _not_ want the error
> message printed then you can't push the error message into the API.
> It's really hard to find error cases (even with "get resource" type
> functions) where the caller _always_ wants the error reported. Even
> kmalloc() has a nod to this with __GFP_NOWARN, though I'm not
> advocating adding a "no warn" flag to all APIs. It's always possible
> that the caller is expecting some types of errors and handles the case
> elegantly.
>
> Let's pop all the way back up to the original point here, which was
> about devm_regmap_init_i2c(). What should happen with errors? Let's
> look specifically at the errors that could be returned by
> regmap_get_i2c_bus() which is the first thing devm_regmap_init_i2c()
> tries to do. Those errors have to do with the i2c bus not supporting
> the features needed by our regmap.
>
>
> a) We could return the error without printing anything like the code does today.
>
> No bloat, but during bringup of this bridge chip on a new i2c bus we'd
> have to manually add printouts to the probe function to figure out
> this error.
>
>
> b) We could push error reporting into regmap_get_i2c_bus().
>
> No per-driver bloat, but some drivers might have a legitimate reason
> not to have an error print here. Perhaps they have a fallback `regmap`
> config that they want to be able to use that works with different bus
> capabilities. I don't think we can do this.
>
>
> c) We could use dev_dbg() to print the error
>
> Only bloat if dynamic debug or DEBUG is defined
>
>
> d) We could use dev_err_probe() to print the error
>
> Extra bloat, though it could be minimized (without sacrificing all
> "printk") with a future patch to drop the string from dev_err_probe()
> and perhaps replace it with a WARN_ON(). Also handles the fact that
> perhaps someday someone might find a reason that regmap_get_i2c_bus()
> and/or devm_regmap_init_i2c() should suddenly start returning
> -EPROBE_DEFER.
>
>
> I'm still advocating for "d)" above and I believe you originally
> advocated for "a)" or "c)". It's really not such a huge deal, so if
> you're adamant about "a)" then I'll shut up. I'm curious if I've
> managed to convince you all about "d)" though.
>
> -Doug
