diff mbox series

[v2,2/2] of: base: Avoid console probe delay when fw_devlink.strict=1

Message ID 20220623080344.783549-3-saravanak@google.com (mailing list archive)
State Not Applicable
Delegated to: Netdev Maintainers
Headers show
Series Fix console probe delay due to fw_devlink | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch

Commit Message

Saravana Kannan June 23, 2022, 8:03 a.m. UTC
Commit 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
enabled iommus and dmas dependency enforcement by default. On some
systems, this caused the console device's probe to get delayed until the
deferred_probe_timeout expires.

We need consoles to work as soon as possible, so mark the console device
node with FWNODE_FLAG_BEST_EFFORT so that fw_delink knows not to delay
the probe of the console device for suppliers without drivers. The
driver can then make the decision on where it can probe without those
suppliers or defer its probe.

Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
Reported-by: Sascha Hauer <sha@pengutronix.de>
Reported-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Saravana Kannan <saravanak@google.com>
Tested-by: Peng Fan <peng.fan@nxp.com>
---
 drivers/of/base.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Sascha Hauer June 23, 2022, 10:04 a.m. UTC | #1
On Thu, Jun 23, 2022 at 01:03:43AM -0700, Saravana Kannan wrote:
> Commit 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> enabled iommus and dmas dependency enforcement by default. On some
> systems, this caused the console device's probe to get delayed until the
> deferred_probe_timeout expires.
> 
> We need consoles to work as soon as possible, so mark the console device
> node with FWNODE_FLAG_BEST_EFFORT so that fw_delink knows not to delay
> the probe of the console device for suppliers without drivers. The
> driver can then make the decision on where it can probe without those
> suppliers or defer its probe.
> 
> Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> Reported-by: Sascha Hauer <sha@pengutronix.de>
> Reported-by: Peng Fan <peng.fan@nxp.com>
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> Tested-by: Peng Fan <peng.fan@nxp.com>
> ---
>  drivers/of/base.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index d4f98c8469ed..a19cd0c73644 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -1919,6 +1919,8 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 align))
>  			of_property_read_string(of_aliases, "stdout", &name);
>  		if (name)
>  			of_stdout = of_find_node_opts_by_path(name, &of_stdout_options);
> +		if (of_stdout)
> +			of_stdout->fwnode.flags |= FWNODE_FLAG_BEST_EFFORT;

The device given in the stdout-path property doesn't necessarily have to
be consistent with the console= parameter. The former is usually
statically set in the device trees contained in the kernel while the
latter is dynamically set by the bootloader. So if you change the
console uart in the bootloader then you'll still run into this trap.

It's problematic to consult only the device tree for dependencies. I
found several examples of drivers in the tree for which dma support
is optional. They use it if they can, but continue without it when
not available. "hwlock" is another property which consider several
drivers as optional. Also consider SoCs in early upstreaming phases
when the device tree is merged with "dmas" or "hwlock" properties,
but the corresponding drivers are not yet upstreamed. It's not nice
to defer probing of all these devices for a long time.

I wonder if it wouldn't be a better approach to just probe all devices
and record the device(node) they are waiting on. Then you know that you
don't need to probe them again until the device they are waiting for
is available.

Sascha
Andy Shevchenko June 23, 2022, 4:39 p.m. UTC | #2
On Thu, Jun 23, 2022 at 12:04:21PM +0200, sascha hauer wrote:
> On Thu, Jun 23, 2022 at 01:03:43AM -0700, Saravana Kannan wrote:

...

> I wonder if it wouldn't be a better approach to just probe all devices
> and record the device(node) they are waiting on. Then you know that you
> don't need to probe them again until the device they are waiting for
> is available.

There may be no device, but resource. And we become again to the something like
deferred probe ugly hack.

The real solution is to rework device driver model in the kernel that it will
create a graph of dependencies and then simply follow it. But actually it should
be more than 1 graph, because there are resources and there are power, clock and
resets that may be orthogonal to the higher dependencies (like driver X provides
a resource to driver Y).
Rafael J. Wysocki June 23, 2022, 5:22 p.m. UTC | #3
On Thu, Jun 23, 2022 at 6:39 PM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
>
> On Thu, Jun 23, 2022 at 12:04:21PM +0200, sascha hauer wrote:
> > On Thu, Jun 23, 2022 at 01:03:43AM -0700, Saravana Kannan wrote:
>
> ...
>
> > I wonder if it wouldn't be a better approach to just probe all devices
> > and record the device(node) they are waiting on. Then you know that you
> > don't need to probe them again until the device they are waiting for
> > is available.
>
> There may be no device, but resource. And we become again to the something like
> deferred probe ugly hack.
>
> The real solution is to rework device driver model in the kernel that it will
> create a graph of dependencies and then simply follow it. But actually it should
> be more than 1 graph, because there are resources and there are power, clock and
> resets that may be orthogonal to the higher dependencies (like driver X provides
> a resource to driver Y).

There is one graph, or it wouldn't be possible to shut down the system orderly.

The problem is that this graph is generally dynamic, especially during
system init, and following dependencies in transient states is
generally hard.

Device links allow the already known dependencies to be recorded and
taken into account later, so we already have a graph for those.

The unknown dependencies obviously cannot be used for creating a graph
of any sort, though, and here we are in the business of guessing what
the unknown dependencies may be IIUC.
Saravana Kannan June 23, 2022, 5:26 p.m. UTC | #4
On Thu, Jun 23, 2022 at 3:05 AM sascha hauer <sha@pengutronix.de> wrote:
>
> On Thu, Jun 23, 2022 at 01:03:43AM -0700, Saravana Kannan wrote:
> > Commit 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> > enabled iommus and dmas dependency enforcement by default. On some
> > systems, this caused the console device's probe to get delayed until the
> > deferred_probe_timeout expires.
> >
> > We need consoles to work as soon as possible, so mark the console device
> > node with FWNODE_FLAG_BEST_EFFORT so that fw_delink knows not to delay
> > the probe of the console device for suppliers without drivers. The
> > driver can then make the decision on where it can probe without those
> > suppliers or defer its probe.
> >
> > Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> > Reported-by: Sascha Hauer <sha@pengutronix.de>
> > Reported-by: Peng Fan <peng.fan@nxp.com>
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > Tested-by: Peng Fan <peng.fan@nxp.com>
> > ---
> >  drivers/of/base.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/of/base.c b/drivers/of/base.c
> > index d4f98c8469ed..a19cd0c73644 100644
> > --- a/drivers/of/base.c
> > +++ b/drivers/of/base.c
> > @@ -1919,6 +1919,8 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 align))
> >                       of_property_read_string(of_aliases, "stdout", &name);
> >               if (name)
> >                       of_stdout = of_find_node_opts_by_path(name, &of_stdout_options);
> > +             if (of_stdout)
> > +                     of_stdout->fwnode.flags |= FWNODE_FLAG_BEST_EFFORT;
>
> The device given in the stdout-path property doesn't necessarily have to
> be consistent with the console= parameter. The former is usually
> statically set in the device trees contained in the kernel while the
> latter is dynamically set by the bootloader. So if you change the
> console uart in the bootloader then you'll still run into this trap.
>
> It's problematic to consult only the device tree for dependencies. I
> found several examples of drivers in the tree for which dma support
> is optional. They use it if they can, but continue without it when
> not available. "hwlock" is another property which consider several
> drivers as optional. Also consider SoCs in early upstreaming phases
> when the device tree is merged with "dmas" or "hwlock" properties,
> but the corresponding drivers are not yet upstreamed. It's not nice
> to defer probing of all these devices for a long time.
>
> I wonder if it wouldn't be a better approach to just probe all devices
> and record the device(node) they are waiting on. Then you know that you
> don't need to probe them again until the device they are waiting for
> is available.

That actually breaks things in a worse sense. There are cases where
the consumer driver is built in and the optional supplier driver is
loaded at boot. Without fw_devlink and the deferred probe timeout, we
end up probing the consumer with limited functionality. With the
current setup, sure we delay some probes a bit but at least everything
works with the right functionality. And you can reduce or remove the
delay if you want to optimize it.

-Saravana
Saravana Kannan June 23, 2022, 5:30 p.m. UTC | #5
On Thu, Jun 23, 2022 at 9:39 AM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
>
> On Thu, Jun 23, 2022 at 12:04:21PM +0200, sascha hauer wrote:
> > On Thu, Jun 23, 2022 at 01:03:43AM -0700, Saravana Kannan wrote:
>
> ...
>
> > I wonder if it wouldn't be a better approach to just probe all devices
> > and record the device(node) they are waiting on. Then you know that you
> > don't need to probe them again until the device they are waiting for
> > is available.
>
> There may be no device, but resource. And we become again to the something like
> deferred probe ugly hack.
>
> The real solution is to rework device driver model in the kernel that it will
> create a graph of dependencies and then simply follow it. But actually it should
> be more than 1 graph, because there are resources and there are power, clock and
> resets that may be orthogonal to the higher dependencies (like driver X provides
> a resource to driver Y).

We already do this with fw_devlink for DT based systems and we do
effectively just probe the devices in graph order (by deferring any
attempts that happen too early and before it even gets to the driver).
The problem is the knowledge of what's considered an optional vs
mandatory dependency and that's affected by the global state of driver
support in the kernel.

-Saravana
Ahmad Fatoum June 23, 2022, 5:35 p.m. UTC | #6
Hello Saravana,

On 23.06.22 19:26, Saravana Kannan wrote:
> On Thu, Jun 23, 2022 at 3:05 AM sascha hauer <sha@pengutronix.de> wrote:
>>
>> On Thu, Jun 23, 2022 at 01:03:43AM -0700, Saravana Kannan wrote:
>>> Commit 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
>>> enabled iommus and dmas dependency enforcement by default. On some
>>> systems, this caused the console device's probe to get delayed until the
>>> deferred_probe_timeout expires.
>>>
>>> We need consoles to work as soon as possible, so mark the console device
>>> node with FWNODE_FLAG_BEST_EFFORT so that fw_delink knows not to delay
>>> the probe of the console device for suppliers without drivers. The
>>> driver can then make the decision on where it can probe without those
>>> suppliers or defer its probe.
>>>
>>> Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
>>> Reported-by: Sascha Hauer <sha@pengutronix.de>
>>> Reported-by: Peng Fan <peng.fan@nxp.com>
>>> Signed-off-by: Saravana Kannan <saravanak@google.com>
>>> Tested-by: Peng Fan <peng.fan@nxp.com>
>>> ---
>>>  drivers/of/base.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/of/base.c b/drivers/of/base.c
>>> index d4f98c8469ed..a19cd0c73644 100644
>>> --- a/drivers/of/base.c
>>> +++ b/drivers/of/base.c
>>> @@ -1919,6 +1919,8 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 align))
>>>                       of_property_read_string(of_aliases, "stdout", &name);
>>>               if (name)
>>>                       of_stdout = of_find_node_opts_by_path(name, &of_stdout_options);
>>> +             if (of_stdout)
>>> +                     of_stdout->fwnode.flags |= FWNODE_FLAG_BEST_EFFORT;
>>
>> The device given in the stdout-path property doesn't necessarily have to
>> be consistent with the console= parameter. The former is usually
>> statically set in the device trees contained in the kernel while the
>> latter is dynamically set by the bootloader. So if you change the
>> console uart in the bootloader then you'll still run into this trap.
>>
>> It's problematic to consult only the device tree for dependencies. I
>> found several examples of drivers in the tree for which dma support
>> is optional. They use it if they can, but continue without it when
>> not available. "hwlock" is another property which consider several
>> drivers as optional. Also consider SoCs in early upstreaming phases
>> when the device tree is merged with "dmas" or "hwlock" properties,
>> but the corresponding drivers are not yet upstreamed. It's not nice
>> to defer probing of all these devices for a long time.
>>
>> I wonder if it wouldn't be a better approach to just probe all devices
>> and record the device(node) they are waiting on. Then you know that you
>> don't need to probe them again until the device they are waiting for
>> is available.
> 
> That actually breaks things in a worse sense. There are cases where
> the consumer driver is built in and the optional supplier driver is
> loaded at boot. Without fw_devlink and the deferred probe timeout, we
> end up probing the consumer with limited functionality. With the
> current setup, sure we delay some probes a bit but at least everything
> works with the right functionality. And you can reduce or remove the
> delay if you want to optimize it.

I have a system that doesn't use stdout-path and has the bootloader
set console= either to ttynull when secure booting or to an UART
when booting normally. How would I optimize the kernel to avoid
my UART being loaded after DMA controller probe without touching
the bootloader?

Cheers,
Ahmad

> 
> -Saravana
> 
>
Saravana Kannan June 23, 2022, 6:17 p.m. UTC | #7
On Thu, Jun 23, 2022 at 10:36 AM Ahmad Fatoum <a.fatoum@pengutronix.de> wrote:
>
> Hello Saravana,
>
> On 23.06.22 19:26, Saravana Kannan wrote:
> > On Thu, Jun 23, 2022 at 3:05 AM sascha hauer <sha@pengutronix.de> wrote:
> >>
> >> On Thu, Jun 23, 2022 at 01:03:43AM -0700, Saravana Kannan wrote:
> >>> Commit 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> >>> enabled iommus and dmas dependency enforcement by default. On some
> >>> systems, this caused the console device's probe to get delayed until the
> >>> deferred_probe_timeout expires.
> >>>
> >>> We need consoles to work as soon as possible, so mark the console device
> >>> node with FWNODE_FLAG_BEST_EFFORT so that fw_delink knows not to delay
> >>> the probe of the console device for suppliers without drivers. The
> >>> driver can then make the decision on where it can probe without those
> >>> suppliers or defer its probe.
> >>>
> >>> Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> >>> Reported-by: Sascha Hauer <sha@pengutronix.de>
> >>> Reported-by: Peng Fan <peng.fan@nxp.com>
> >>> Signed-off-by: Saravana Kannan <saravanak@google.com>
> >>> Tested-by: Peng Fan <peng.fan@nxp.com>
> >>> ---
> >>>  drivers/of/base.c | 2 ++
> >>>  1 file changed, 2 insertions(+)
> >>>
> >>> diff --git a/drivers/of/base.c b/drivers/of/base.c
> >>> index d4f98c8469ed..a19cd0c73644 100644
> >>> --- a/drivers/of/base.c
> >>> +++ b/drivers/of/base.c
> >>> @@ -1919,6 +1919,8 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 align))
> >>>                       of_property_read_string(of_aliases, "stdout", &name);
> >>>               if (name)
> >>>                       of_stdout = of_find_node_opts_by_path(name, &of_stdout_options);
> >>> +             if (of_stdout)
> >>> +                     of_stdout->fwnode.flags |= FWNODE_FLAG_BEST_EFFORT;
> >>
> >> The device given in the stdout-path property doesn't necessarily have to
> >> be consistent with the console= parameter. The former is usually
> >> statically set in the device trees contained in the kernel while the
> >> latter is dynamically set by the bootloader. So if you change the
> >> console uart in the bootloader then you'll still run into this trap.
> >>
> >> It's problematic to consult only the device tree for dependencies. I
> >> found several examples of drivers in the tree for which dma support
> >> is optional. They use it if they can, but continue without it when
> >> not available. "hwlock" is another property which consider several
> >> drivers as optional. Also consider SoCs in early upstreaming phases
> >> when the device tree is merged with "dmas" or "hwlock" properties,
> >> but the corresponding drivers are not yet upstreamed. It's not nice
> >> to defer probing of all these devices for a long time.
> >>
> >> I wonder if it wouldn't be a better approach to just probe all devices
> >> and record the device(node) they are waiting on. Then you know that you
> >> don't need to probe them again until the device they are waiting for
> >> is available.
> >
> > That actually breaks things in a worse sense. There are cases where
> > the consumer driver is built in and the optional supplier driver is
> > loaded at boot. Without fw_devlink and the deferred probe timeout, we
> > end up probing the consumer with limited functionality. With the
> > current setup, sure we delay some probes a bit but at least everything
> > works with the right functionality. And you can reduce or remove the
> > delay if you want to optimize it.
>
> I have a system that doesn't use stdout-path and has the bootloader
> set console= either to ttynull when secure booting or to an UART
> when booting normally. How would I optimize the kernel to avoid
> my UART being loaded after DMA controller probe without touching
> the bootloader?
>

Thanks for the report Ahmad. I think someone else reported a similar
thing in another thread. I plan to take a look at it. It should be
possible to find the device and set the flag for those devices too.

-Saravana
Sascha Hauer June 23, 2022, 8:37 p.m. UTC | #8
On Thu, Jun 23, 2022 at 10:26:46AM -0700, Saravana Kannan wrote:
> On Thu, Jun 23, 2022 at 3:05 AM sascha hauer <sha@pengutronix.de> wrote:
> >
> > On Thu, Jun 23, 2022 at 01:03:43AM -0700, Saravana Kannan wrote:
> > > Commit 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> > > enabled iommus and dmas dependency enforcement by default. On some
> > > systems, this caused the console device's probe to get delayed until the
> > > deferred_probe_timeout expires.
> > >
> > > We need consoles to work as soon as possible, so mark the console device
> > > node with FWNODE_FLAG_BEST_EFFORT so that fw_delink knows not to delay
> > > the probe of the console device for suppliers without drivers. The
> > > driver can then make the decision on where it can probe without those
> > > suppliers or defer its probe.
> > >
> > > Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> > > Reported-by: Sascha Hauer <sha@pengutronix.de>
> > > Reported-by: Peng Fan <peng.fan@nxp.com>
> > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > Tested-by: Peng Fan <peng.fan@nxp.com>
> > > ---
> > >  drivers/of/base.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/drivers/of/base.c b/drivers/of/base.c
> > > index d4f98c8469ed..a19cd0c73644 100644
> > > --- a/drivers/of/base.c
> > > +++ b/drivers/of/base.c
> > > @@ -1919,6 +1919,8 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 align))
> > >                       of_property_read_string(of_aliases, "stdout", &name);
> > >               if (name)
> > >                       of_stdout = of_find_node_opts_by_path(name, &of_stdout_options);
> > > +             if (of_stdout)
> > > +                     of_stdout->fwnode.flags |= FWNODE_FLAG_BEST_EFFORT;
> >
> > The device given in the stdout-path property doesn't necessarily have to
> > be consistent with the console= parameter. The former is usually
> > statically set in the device trees contained in the kernel while the
> > latter is dynamically set by the bootloader. So if you change the
> > console uart in the bootloader then you'll still run into this trap.
> >
> > It's problematic to consult only the device tree for dependencies. I
> > found several examples of drivers in the tree for which dma support
> > is optional. They use it if they can, but continue without it when
> > not available. "hwlock" is another property which consider several
> > drivers as optional. Also consider SoCs in early upstreaming phases
> > when the device tree is merged with "dmas" or "hwlock" properties,
> > but the corresponding drivers are not yet upstreamed. It's not nice
> > to defer probing of all these devices for a long time.
> >
> > I wonder if it wouldn't be a better approach to just probe all devices
> > and record the device(node) they are waiting on. Then you know that you
> > don't need to probe them again until the device they are waiting for
> > is available.
> 
> That actually breaks things in a worse sense. There are cases where
> the consumer driver is built in and the optional supplier driver is
> loaded at boot. Without fw_devlink and the deferred probe timeout, we
> end up probing the consumer with limited functionality. With the
> current setup, sure we delay some probes a bit but at least everything
> works with the right functionality. And you can reduce or remove the
> delay if you want to optimize it.

We have optional and mandatory resources. In this situation a driver has
to decide what to do. Either it continues with limited resources or it
defers probing. Some drivers try to allocate the optional resources at
open time so that they are able to use them once they are available.  We
could even think of an asynchronous callback into a driver when a
resource becomes available. Whether we put this decision what is
optional or not into the driver or in the framework doesn't make a
difference to the problem, it is still the same: When a resource is not
yet available we have no idea if and when it becomes available, if it's
worth waiting for it or not.

The difference is that with my proposal (which isn't actually mine but
from my collegue Lucas) a driver can decide very fine grained how it
wants to deal with the situation. With fw_devlink we try to put this
intelligence into the framework and it seems there are quite some quirks
necessary to get that running for everyone.

Anyway, we have fw_devlink now and actually I think the dependency graph
that we have with fw_devlink is quite nice to resolve the natural probe
order. But why do we have to put an extra penalty on drivers whose
resources are not yet available?  Probe devices with complete resources
as long as you find them, execute more initcalls as long as there are
any, but when there are no more left, you could start probing devices
with incomplete resources, why wait for another ten seconds?

For me it's no problem when the UART probes late, we have earlycon which
can be used to debug problems that arise before the UART probes, but
what nags is the ten seconds delay. zero would be a much saner value for
me.

Sascha
Saravana Kannan June 23, 2022, 11:13 p.m. UTC | #9
On Thu, Jun 23, 2022 at 1:37 PM sascha hauer <sha@pengutronix.de> wrote:
>
> On Thu, Jun 23, 2022 at 10:26:46AM -0700, Saravana Kannan wrote:
> > On Thu, Jun 23, 2022 at 3:05 AM sascha hauer <sha@pengutronix.de> wrote:
> > >
> > > On Thu, Jun 23, 2022 at 01:03:43AM -0700, Saravana Kannan wrote:
> > > > Commit 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> > > > enabled iommus and dmas dependency enforcement by default. On some
> > > > systems, this caused the console device's probe to get delayed until the
> > > > deferred_probe_timeout expires.
> > > >
> > > > We need consoles to work as soon as possible, so mark the console device
> > > > node with FWNODE_FLAG_BEST_EFFORT so that fw_delink knows not to delay
> > > > the probe of the console device for suppliers without drivers. The
> > > > driver can then make the decision on where it can probe without those
> > > > suppliers or defer its probe.
> > > >
> > > > Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> > > > Reported-by: Sascha Hauer <sha@pengutronix.de>
> > > > Reported-by: Peng Fan <peng.fan@nxp.com>
> > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > Tested-by: Peng Fan <peng.fan@nxp.com>
> > > > ---
> > > >  drivers/of/base.c | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/drivers/of/base.c b/drivers/of/base.c
> > > > index d4f98c8469ed..a19cd0c73644 100644
> > > > --- a/drivers/of/base.c
> > > > +++ b/drivers/of/base.c
> > > > @@ -1919,6 +1919,8 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 align))
> > > >                       of_property_read_string(of_aliases, "stdout", &name);
> > > >               if (name)
> > > >                       of_stdout = of_find_node_opts_by_path(name, &of_stdout_options);
> > > > +             if (of_stdout)
> > > > +                     of_stdout->fwnode.flags |= FWNODE_FLAG_BEST_EFFORT;
> > >
> > > The device given in the stdout-path property doesn't necessarily have to
> > > be consistent with the console= parameter. The former is usually
> > > statically set in the device trees contained in the kernel while the
> > > latter is dynamically set by the bootloader. So if you change the
> > > console uart in the bootloader then you'll still run into this trap.
> > >
> > > It's problematic to consult only the device tree for dependencies. I
> > > found several examples of drivers in the tree for which dma support
> > > is optional. They use it if they can, but continue without it when
> > > not available. "hwlock" is another property which consider several
> > > drivers as optional. Also consider SoCs in early upstreaming phases
> > > when the device tree is merged with "dmas" or "hwlock" properties,
> > > but the corresponding drivers are not yet upstreamed. It's not nice
> > > to defer probing of all these devices for a long time.
> > >
> > > I wonder if it wouldn't be a better approach to just probe all devices
> > > and record the device(node) they are waiting on. Then you know that you
> > > don't need to probe them again until the device they are waiting for
> > > is available.
> >
> > That actually breaks things in a worse sense. There are cases where
> > the consumer driver is built in and the optional supplier driver is
> > loaded at boot. Without fw_devlink and the deferred probe timeout, we
> > end up probing the consumer with limited functionality. With the
> > current setup, sure we delay some probes a bit but at least everything
> > works with the right functionality. And you can reduce or remove the
> > delay if you want to optimize it.
>
> We have optional and mandatory resources. In this situation a driver has
> to decide what to do. Either it continues with limited resources or it
> defers probing. Some drivers try to allocate the optional resources at
> open time so that they are able to use them once they are available.  We
> could even think of an asynchronous callback into a driver when a
> resource becomes available. Whether we put this decision what is
> optional or not into the driver or in the framework doesn't make a
> difference to the problem, it is still the same: When a resource is not
> yet available we have no idea if and when it becomes available, if it's
> worth waiting for it or not.
>
> The difference is that with my proposal (which isn't actually mine but
> from my collegue Lucas) a driver can decide very fine grained how it
> wants to deal with the situation. With fw_devlink we try to put this
> intelligence into the framework and it seems there are quite some quirks
> necessary to get that running for everyone.

That's one possible solution, but for that to work, all drivers with
optional suppliers would need to be changed to take advantage of this
callback to work correctly when the optional suppliers become
available. We could add this callback, but it would be a long time
before the callback handles all/most cases of optional suppliers.

One of the goals of fw_devlink is so that people can stop having to
play initcall chicken where they try to tune their initcall levels wrt
to the chain of suppliers to avoid probe failures or minimize deferred
probed. Technically with deferred probes and proper error handling,
people shouldn't have to play initcall chicken, but we still have a
lot of those. Adding this callback is just going to make writing
drivers even harder. And there are tons of drivers that can't do
proper clean up and some drivers can't even be unbound once they are
bound.

Also, if I'm not mistaken (I could be), stuff like pinctrl is setup
before we even get to driver->probe(). So when the pinctrl supplier
becomes available, the driver would need to unbind fully and rebind.
What if there's a current user of the device?

> Anyway, we have fw_devlink now and actually I think the dependency graph
> that we have with fw_devlink is quite nice to resolve the natural probe
> order. But why do we have to put an extra penalty on drivers whose
> resources are not yet available?  Probe devices with complete resources
> as long as you find them, execute more initcalls as long as there are
> any, but when there are no more left, you could start probing devices
> with incomplete resources, why wait for another ten seconds?

The timeout is defining how long after the most recent module load
that we give up waiting for more modules to be loaded. On a Pixel 6
with serial console output, the timeout of 5 seconds would work
because the worst case gap between two module loads is ~2.8 seconds
(so 5 seconds for some margin). The default is this high to
accommodate slow storage devices where mounting all the filesystems
can take time (think HDD or network FS). The default is configured for
correctness so that we can maximize functionality across systems, but
people can optimize for the specific case.

> For me it's no problem when the UART probes late, we have earlycon which
> can be used to debug problems that arise before the UART probes, but
> what nags is the ten seconds delay. zero would be a much saner value for
> me.

Having said all that, I empathize with your annoyance at the delay.
Open to ideas of making this better without making the default
functionality worse.

-Saravana
Rob Herring (Arm) June 27, 2022, 5:50 p.m. UTC | #10
On Thu, Jun 23, 2022 at 12:04:21PM +0200, sascha hauer wrote:
> On Thu, Jun 23, 2022 at 01:03:43AM -0700, Saravana Kannan wrote:
> > Commit 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> > enabled iommus and dmas dependency enforcement by default. On some
> > systems, this caused the console device's probe to get delayed until the
> > deferred_probe_timeout expires.
> > 
> > We need consoles to work as soon as possible, so mark the console device
> > node with FWNODE_FLAG_BEST_EFFORT so that fw_delink knows not to delay
> > the probe of the console device for suppliers without drivers. The
> > driver can then make the decision on where it can probe without those
> > suppliers or defer its probe.
> > 
> > Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> > Reported-by: Sascha Hauer <sha@pengutronix.de>
> > Reported-by: Peng Fan <peng.fan@nxp.com>
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > Tested-by: Peng Fan <peng.fan@nxp.com>
> > ---
> >  drivers/of/base.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/of/base.c b/drivers/of/base.c
> > index d4f98c8469ed..a19cd0c73644 100644
> > --- a/drivers/of/base.c
> > +++ b/drivers/of/base.c
> > @@ -1919,6 +1919,8 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 align))
> >  			of_property_read_string(of_aliases, "stdout", &name);
> >  		if (name)
> >  			of_stdout = of_find_node_opts_by_path(name, &of_stdout_options);
> > +		if (of_stdout)
> > +			of_stdout->fwnode.flags |= FWNODE_FLAG_BEST_EFFORT;
> 
> The device given in the stdout-path property doesn't necessarily have to
> be consistent with the console= parameter. The former is usually
> statically set in the device trees contained in the kernel while the
> latter is dynamically set by the bootloader. So if you change the
> console uart in the bootloader then you'll still run into this trap.
> 
> It's problematic to consult only the device tree for dependencies. I
> found several examples of drivers in the tree for which dma support
> is optional. They use it if they can, but continue without it when
> not available. "hwlock" is another property which consider several
> drivers as optional. Also consider SoCs in early upstreaming phases
> when the device tree is merged with "dmas" or "hwlock" properties,
> but the corresponding drivers are not yet upstreamed. It's not nice
> to defer probing of all these devices for a long time.
> 
> I wonder if it wouldn't be a better approach to just probe all devices
> and record the device(node) they are waiting on. Then you know that you
> don't need to probe them again until the device they are waiting for
> is available.

Can't we have a driver flag 'I have optional dependencies' that will 
trigger probe without all dependencies and then the driver can defer 
probe if required dependencies are not yet met.

Rob
Saravana Kannan June 27, 2022, 6:20 p.m. UTC | #11
On Mon, Jun 27, 2022 at 10:50 AM Rob Herring <robh@kernel.org> wrote:
>
> On Thu, Jun 23, 2022 at 12:04:21PM +0200, sascha hauer wrote:
> > On Thu, Jun 23, 2022 at 01:03:43AM -0700, Saravana Kannan wrote:
> > > Commit 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> > > enabled iommus and dmas dependency enforcement by default. On some
> > > systems, this caused the console device's probe to get delayed until the
> > > deferred_probe_timeout expires.
> > >
> > > We need consoles to work as soon as possible, so mark the console device
> > > node with FWNODE_FLAG_BEST_EFFORT so that fw_delink knows not to delay
> > > the probe of the console device for suppliers without drivers. The
> > > driver can then make the decision on where it can probe without those
> > > suppliers or defer its probe.
> > >
> > > Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
> > > Reported-by: Sascha Hauer <sha@pengutronix.de>
> > > Reported-by: Peng Fan <peng.fan@nxp.com>
> > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > Tested-by: Peng Fan <peng.fan@nxp.com>
> > > ---
> > >  drivers/of/base.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/drivers/of/base.c b/drivers/of/base.c
> > > index d4f98c8469ed..a19cd0c73644 100644
> > > --- a/drivers/of/base.c
> > > +++ b/drivers/of/base.c
> > > @@ -1919,6 +1919,8 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 align))
> > >                     of_property_read_string(of_aliases, "stdout", &name);
> > >             if (name)
> > >                     of_stdout = of_find_node_opts_by_path(name, &of_stdout_options);
> > > +           if (of_stdout)
> > > +                   of_stdout->fwnode.flags |= FWNODE_FLAG_BEST_EFFORT;
> >
> > The device given in the stdout-path property doesn't necessarily have to
> > be consistent with the console= parameter. The former is usually
> > statically set in the device trees contained in the kernel while the
> > latter is dynamically set by the bootloader. So if you change the
> > console uart in the bootloader then you'll still run into this trap.
> >
> > It's problematic to consult only the device tree for dependencies. I
> > found several examples of drivers in the tree for which dma support
> > is optional. They use it if they can, but continue without it when
> > not available. "hwlock" is another property which consider several
> > drivers as optional. Also consider SoCs in early upstreaming phases
> > when the device tree is merged with "dmas" or "hwlock" properties,
> > but the corresponding drivers are not yet upstreamed. It's not nice
> > to defer probing of all these devices for a long time.
> >
> > I wonder if it wouldn't be a better approach to just probe all devices
> > and record the device(node) they are waiting on. Then you know that you
> > don't need to probe them again until the device they are waiting for
> > is available.
>
> Can't we have a driver flag 'I have optional dependencies' that will
> trigger probe without all dependencies and then the driver can defer
> probe if required dependencies are not yet met.

Haha... that's kinda what I'm working on right now. But named
intentionally in a more limited sense of "I can't wait for the
timeout" where fw_devlink will relax and allow the driver to probe
(and have it make the call) once we hit late_initcall(). I'm
explicitly limiting it to "timeout" because we don't want everyone
adding this flag everytime they hit an issue. That'll beat the point
of fw_devlink=on.

Also, setting the flag for a driver to fix one system might break
another system because in the other system the user might want to wait
for the timeout because the supplier drivers would be loaded before
the timeout.

Another option is to restart the timer (if it hasn't expired) when
filesystems get mounted (in addition to the current "when new driver
gets registered). That way, we might be able to drop the timeout from
10s to 5s.

-Saravana
Linus Walleij June 28, 2022, 1:41 p.m. UTC | #12
On Thu, Jun 23, 2022 at 12:05 PM sascha hauer <sha@pengutronix.de> wrote:

> Also consider SoCs in early upstreaming phases
> when the device tree is merged with "dmas" or "hwlock" properties,
> but the corresponding drivers are not yet upstreamed. It's not nice
> to defer probing of all these devices for a long time.

Actually this drives a truck through the entire approach in a way.

It is perfectly legal to have a device tree with dmas specified
but leave them unused in the operating system. DT just describes
what hardware is there, it does not mandate that the OS
implement drivers for all of it.

This approach really needs that the resolution mechanism
is aware of whether:

1. a driver exist for the resource at all so it will eventually resolve

2. if that driver is compiled in or module at all (IS_ENABLED())

3. If the resource should be grabbed early or optionally later
    such as dmas for console UART

Only then can the mechanism work in the generic case.

Yours,
Linus Walleij
diff mbox series

Patch

diff --git a/drivers/of/base.c b/drivers/of/base.c
index d4f98c8469ed..a19cd0c73644 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -1919,6 +1919,8 @@  void of_alias_scan(void * (*dt_alloc)(u64 size, u64 align))
 			of_property_read_string(of_aliases, "stdout", &name);
 		if (name)
 			of_stdout = of_find_node_opts_by_path(name, &of_stdout_options);
+		if (of_stdout)
+			of_stdout->fwnode.flags |= FWNODE_FLAG_BEST_EFFORT;
 	}
 
 	if (!of_aliases)