diff mbox series

[v2,6/9] Revert "driver core: Set default deferred_probe_timeout back to 0."

Message ID 20220601070707.3946847-7-saravanak@google.com (mailing list archive)
State Not Applicable
Delegated to: Netdev Maintainers
Headers show
Series deferred_probe_timeout logic clean up | expand

Checks

Context Check Description
netdev/tree_selection success Guessing tree name failed - patch did not apply

Commit Message

Saravana Kannan June 1, 2022, 7:07 a.m. UTC
This reverts commit 11f7e7ef553b6b93ac1aa74a3c2011b9cc8aeb61.

Let's take another shot at getting deferred_probe_timeout=10 to work.

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/dd.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Geert Uytterhoeven July 20, 2022, 5:31 p.m. UTC | #1
Hi Saravana,

On Wed, Jun 1, 2022 at 9:45 AM Saravana Kannan <saravanak@google.com> wrote:
> This reverts commit 11f7e7ef553b6b93ac1aa74a3c2011b9cc8aeb61.
>
> Let's take another shot at getting deferred_probe_timeout=10 to work.
>
> Signed-off-by: Saravana Kannan <saravanak@google.com>

Thanks for your patch, which is now commit f516d01b9df2782b
("Revert "driver core: Set default deferred_probe_timeout
back to 0."") in driver-core/driver-core-next.

Wolfram found an issue on a Renesas board where disabling the IOMMU
driver (CONFIG_IPMMU_VMSA=n) causes the system to fail to boot,
and bisected this to a merge of driver-core/driver-core-next.
After some trials, I managed to reproduce the issue, and bisected it
further to commit f516d01b9df2782b.

The affected config has:
    CONFIG_MODULES=y
    CONFIG_RCAR_DMAC=y
    CONFIG_IPMMU_VMSA=n

In arch/arm64/boot/dts/renesas/r8a77951-salvator-xs.dtb,
e6e88000.serial links to a dmac, and the dmac links to an iommu,
for which no driver is available.

Playing with deferred_probe_timeout values doesn't help.

However, the above options do not seem to be sufficient to trigger
the issue, as I had other configs with those three options that do
boot fine.

After bisecting configs, I found the culprit: CONFIG_IP_PNP.
As Wolfram was using an initramfs, CONFIG_IP_PNP was not needed.
If CONFIG_IP_PNP=n, booting fails.
If CONFIG_IP_PNP=y, booting succeeds.
In fact, just disabling late_initcall(ip_auto_config) makes it fail,
too.
Reducing ip_auto_config(), it turns out the call to
wait_for_init_devices_probe() is what is needed to unblock booting.

So I guess wait_for_init_devices_probe() needs to be called (where?)
if CONFIG_IP_PNP=n, too?

> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -256,7 +256,12 @@ static int deferred_devs_show(struct seq_file *s, void *data)
>  }
>  DEFINE_SHOW_ATTRIBUTE(deferred_devs);
>
> +#ifdef CONFIG_MODULES
> +int driver_deferred_probe_timeout = 10;
> +#else
>  int driver_deferred_probe_timeout;
> +#endif
> +
>  EXPORT_SYMBOL_GPL(driver_deferred_probe_timeout);
>
>  static int __init deferred_probe_timeout_setup(char *str)

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Saravana Kannan July 20, 2022, 7:01 p.m. UTC | #2
On Wed, Jul 20, 2022 at 10:31 AM Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
>
> Hi Saravana,
>
> On Wed, Jun 1, 2022 at 9:45 AM Saravana Kannan <saravanak@google.com> wrote:
> > This reverts commit 11f7e7ef553b6b93ac1aa74a3c2011b9cc8aeb61.
> >
> > Let's take another shot at getting deferred_probe_timeout=10 to work.
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
>
> Thanks for your patch, which is now commit f516d01b9df2782b
> ("Revert "driver core: Set default deferred_probe_timeout
> back to 0."") in driver-core/driver-core-next.
>
> Wolfram found an issue on a Renesas board where disabling the IOMMU
> driver (CONFIG_IPMMU_VMSA=n) causes the system to fail to boot,
> and bisected this to a merge of driver-core/driver-core-next.
> After some trials, I managed to reproduce the issue, and bisected it
> further to commit f516d01b9df2782b.
>
> The affected config has:
>     CONFIG_MODULES=y
>     CONFIG_RCAR_DMAC=y
>     CONFIG_IPMMU_VMSA=n
>
> In arch/arm64/boot/dts/renesas/r8a77951-salvator-xs.dtb,
> e6e88000.serial links to a dmac, and the dmac links to an iommu,
> for which no driver is available.

Thanks for digging into this and giving more details.

Is e6e88000.serial being blocked the reason for the boot failure?

If so, can you give this a shot?
https://lore.kernel.org/lkml/20220701012647.2007122-1-saravanak@google.com/

> Playing with deferred_probe_timeout values doesn't help.

This part is strange though. If you set deferred_probe_timeout=1,
fw_devlink will stop blocking all probes 1 second after
late_initcall()s finish. So, similar to the ip autoconfig issue, is
the issue caused by something that needs to be finished before we hit
late_initcall() but is getting blocked?

> However, the above options do not seem to be sufficient to trigger
> the issue, as I had other configs with those three options that do
> boot fine.
>
> After bisecting configs, I found the culprit: CONFIG_IP_PNP.
> As Wolfram was using an initramfs, CONFIG_IP_PNP was not needed.
> If CONFIG_IP_PNP=n, booting fails.
> If CONFIG_IP_PNP=y, booting succeeds.
> In fact, just disabling late_initcall(ip_auto_config) makes it fail,
> too.
> Reducing ip_auto_config(), it turns out the call to
> wait_for_init_devices_probe() is what is needed to unblock booting.
>
> So I guess wait_for_init_devices_probe() needs to be called (where?)
> if CONFIG_IP_PNP=n, too?

That function just unblocks all devices and allows them to try and
probe and then waits for all possible probes to finish before
returning. They problem with call it randomly/every time is that it
breaks functionality where an optional supplier will probe after a few
modules are loaded in the future.

I guess one possible issue with the timeout not helping is that once
the timeout expires, things are still being probed and nothing is
being blocked till they finish probing.

I'm trying to have the default config (in terms of fw_devlink,
deferred probe behavior, timeouts, etc) be the same between a fully
static kernel (but CONFIG_MODULES still enabled) and a fully modular
kernel (like GKI). But it might end up being an untenable problem.

I'll wait to see what specifically is the issue in this case and then
I'll go from there.

-Saravana

> > --- a/drivers/base/dd.c
> > +++ b/drivers/base/dd.c
> > @@ -256,7 +256,12 @@ static int deferred_devs_show(struct seq_file *s, void *data)
> >  }
> >  DEFINE_SHOW_ATTRIBUTE(deferred_devs);
> >
> > +#ifdef CONFIG_MODULES
> > +int driver_deferred_probe_timeout = 10;
> > +#else
> >  int driver_deferred_probe_timeout;
> > +#endif
> > +
> >  EXPORT_SYMBOL_GPL(driver_deferred_probe_timeout);
> >
> >  static int __init deferred_probe_timeout_setup(char *str)
>
> Gr{oetje,eeting}s,
>
>                         Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>
Geert Uytterhoeven July 21, 2022, 8:40 a.m. UTC | #3
Hi Saravana,

On Wed, Jul 20, 2022 at 9:02 PM Saravana Kannan <saravanak@google.com> wrote:
> On Wed, Jul 20, 2022 at 10:31 AM Geert Uytterhoeven
> <geert@linux-m68k.org> wrote:
> > On Wed, Jun 1, 2022 at 9:45 AM Saravana Kannan <saravanak@google.com> wrote:
> > > This reverts commit 11f7e7ef553b6b93ac1aa74a3c2011b9cc8aeb61.
> > >
> > > Let's take another shot at getting deferred_probe_timeout=10 to work.
> > >
> > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> >
> > Thanks for your patch, which is now commit f516d01b9df2782b
> > ("Revert "driver core: Set default deferred_probe_timeout
> > back to 0."") in driver-core/driver-core-next.
> >
> > Wolfram found an issue on a Renesas board where disabling the IOMMU
> > driver (CONFIG_IPMMU_VMSA=n) causes the system to fail to boot,
> > and bisected this to a merge of driver-core/driver-core-next.
> > After some trials, I managed to reproduce the issue, and bisected it
> > further to commit f516d01b9df2782b.
> >
> > The affected config has:
> >     CONFIG_MODULES=y
> >     CONFIG_RCAR_DMAC=y
> >     CONFIG_IPMMU_VMSA=n
> >
> > In arch/arm64/boot/dts/renesas/r8a77951-salvator-xs.dtb,
> > e6e88000.serial links to a dmac, and the dmac links to an iommu,
> > for which no driver is available.
>
> Thanks for digging into this and giving more details.
>
> Is e6e88000.serial being blocked the reason for the boot failure?

It doesn't seem to be.

> If so, can you give this a shot?
> https://lore.kernel.org/lkml/20220701012647.2007122-1-saravanak@google.com/

Thanks, but it doesn't make a difference.

> > After bisecting configs, I found the culprit: CONFIG_IP_PNP.
> > As Wolfram was using an initramfs, CONFIG_IP_PNP was not needed.
> > If CONFIG_IP_PNP=n, booting fails.
> > If CONFIG_IP_PNP=y, booting succeeds.
> > In fact, just disabling late_initcall(ip_auto_config) makes it fail,
> > too.
> > Reducing ip_auto_config(), it turns out the call to
> > wait_for_init_devices_probe() is what is needed to unblock booting.
> >
> > So I guess wait_for_init_devices_probe() needs to be called (where?)
> > if CONFIG_IP_PNP=n, too?
>
> That function just unblocks all devices and allows them to try and
> probe and then waits for all possible probes to finish before
> returning. They problem with call it randomly/every time is that it
> breaks functionality where an optional supplier will probe after a few
> modules are loaded in the future.
>
> I guess one possible issue with the timeout not helping is that once
> the timeout expires, things are still being probed and nothing is
> being blocked till they finish probing.

I'm not sure that it's a device that's missing.

Calling wait_for_init_devices_probe() or not changes lots of little
things in the probing order. But when comparing the sorted boot logs,
there does not seem to be any difference in the list of devices that
was probed successfully.
It looks like the system is just blocked on something else?...

I tried getting a list of all locks held using Magic SysRq + d,
but Magic SysRq on the serial console does not work at this point
(it does work in the booted kernel with CONFIG_IP_PNP=y).


Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
diff mbox series

Patch

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 4a55fbb7e0da..335e71d3a618 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -256,7 +256,12 @@  static int deferred_devs_show(struct seq_file *s, void *data)
 }
 DEFINE_SHOW_ATTRIBUTE(deferred_devs);
 
+#ifdef CONFIG_MODULES
+int driver_deferred_probe_timeout = 10;
+#else
 int driver_deferred_probe_timeout;
+#endif
+
 EXPORT_SYMBOL_GPL(driver_deferred_probe_timeout);
 
 static int __init deferred_probe_timeout_setup(char *str)