[RFC] driver core: Ensure wait_for_device_probe() waits until the deferred_probe_timeout fires

Message ID 20200407070609.42865-1-john.stultz@linaro.org (mailing list archive)
State Superseded, archived

Commit Message

John Stultz April 7, 2020, 7:06 a.m. UTC
In commit c8c43cee29f6 ("driver core: Fix
driver_deferred_probe_check_state() logic"), we set the default
driver_deferred_probe_timeout value to 30 seconds to allow for
drivers that are missing dependencies to have some time so that
the dependency may be loaded from userland after initcalls_done
is set.

However, Yoshihiro Shimoda reported that his device, which
expects to have unmet dependencies (due to "optional links" in
its devicetree), was failing to mount the NFS root.

Digging further, the problem seemed to be that while the
device properly probes after waiting 30 seconds for any missing
modules to load, ip_auto_config() had already failed by then,
causing the NFS mount to fail. This was due to ip_auto_config()
calling wait_for_device_probe(), which doesn't wait for the
driver_deferred_probe_timeout to fire.

This patch tries to fix the issue by creating a waitqueue
for the driver_deferred_probe_timeout, and calling wait_event()
in wait_for_device_probe() to block until
driver_deferred_probe_timeout is zero, so that all the probing
is finished before it returns.
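
As a rough illustration of the wait/wake pattern described above, here
is a userspace analogue using POSIX threads. It is only a sketch and
not kernel code: fake_probe_timeout, fake_timeout_work(),
fake_wait_for_device_probe() and demo_wait_for_timeout() are
hypothetical stand-ins, a mutex plus condition variable replaces the
kernel waitqueue, and it assumes the timeout work function sets the
timeout to zero before waking waiters.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for driver_deferred_probe_timeout (nonzero = pending). */
static int fake_probe_timeout = 30;
static pthread_mutex_t fake_lock = PTHREAD_MUTEX_INITIALIZER;
/* Hypothetical stand-in for probe_timeout_waitqueue. */
static pthread_cond_t fake_waitqueue = PTHREAD_COND_INITIALIZER;

/* Stand-in for the timeout work function: fires after a short delay. */
static void *fake_timeout_work(void *arg)
{
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000 };

	(void)arg;
	nanosleep(&delay, NULL);	/* 50 ms here instead of 30 s */

	pthread_mutex_lock(&fake_lock);
	fake_probe_timeout = 0;			/* assumed: timeout is zeroed on expiry */
	pthread_cond_broadcast(&fake_waitqueue);	/* analogue of wake_up() */
	pthread_mutex_unlock(&fake_lock);
	return NULL;
}

/* Stand-in for the new wait at the top of wait_for_device_probe(). */
static void fake_wait_for_device_probe(void)
{
	pthread_mutex_lock(&fake_lock);
	/* analogue of wait_event(queue, !timeout): re-check after every wakeup */
	while (fake_probe_timeout != 0)
		pthread_cond_wait(&fake_waitqueue, &fake_lock);
	pthread_mutex_unlock(&fake_lock);
}

/* Runs one timer thread and one waiter; returns the timeout seen after waiting. */
static int demo_wait_for_timeout(void)
{
	pthread_t timer;
	int seen;
	int rc = pthread_create(&timer, NULL, fake_timeout_work, NULL);

	assert(rc == 0);
	fake_wait_for_device_probe();

	pthread_mutex_lock(&fake_lock);
	seen = fake_probe_timeout;
	pthread_mutex_unlock(&fake_lock);

	pthread_join(timer, NULL);
	return seen;
}
```

The waiter cannot return until the timer thread has zeroed the
timeout, whether or not anything was actually waiting on a missing
dependency.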

NOTE: I'm not 100% sure this won't have other unwanted side
effects (I don't have failing hardware myself to validate),
so I'd appreciate testing and close review.

If this approach doesn't work, I'll simply set the default
driver_deferred_probe_timeout value back to zero, to avoid any
behavioral change from before.

Thanks to Geert for chasing down that ip_auto_config was why NFS
was failing in this case!

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Rob Herring <robh@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: netdev <netdev@vger.kernel.org>
Cc: linux-pm@vger.kernel.org
Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Fixes: c8c43cee29f6 ("driver core: Fix driver_deferred_probe_check_state() logic")
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 drivers/base/dd.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Geert Uytterhoeven April 7, 2020, 7:50 a.m. UTC | #1
Hi John,

On Tue, Apr 7, 2020 at 9:06 AM John Stultz <john.stultz@linaro.org> wrote:
> In commit c8c43cee29f6 ("driver core: Fix
> driver_deferred_probe_check_state() logic"), we set the default
> driver_deferred_probe_timeout value to 30 seconds to allow for
> drivers that are missing dependencies to have some time so that
> the dependency may be loaded from userland after initcalls_done
> is set.
>
> However, Yoshihiro Shimoda reported that on his device that
> expects to have unmet dependencies (due to "optional links" in
> its devicetree), was failing to mount the NFS root.
>
> In digging further, it seemed the problem was that while the
> device properly probes after waiting 30 seconds for any missing
> modules to load, the ip_auto_config() had already failed,
> resulting in NFS to fail. This was due to ip_auto_config()
> calling wait_for_device_probe() which doesn't wait for the
> driver_deferred_probe_timeout to fire.
>
> This patch tries to fix the issue by creating a waitqueue
> for the driver_deferred_probe_timeout, and calling wait_event()
> to make sure driver_deferred_probe_timeout is zero in
> wait_for_device_probe() to make sure all the probing is
> finished.
>
> NOTE: I'm not 100% sure this won't have other unwanted side
> effects (I don't have failing hardware myself to validate),
> so I'd appreciate testing and close review.
>
> If this approach doesn't work, I'll simply set the default
> driver_deferred_probe_timeout value back to zero, to avoid any
> behavioral change from before.
>
> Thanks to Geert for chasing down that ip_auto_config was why NFS
> was failing in this case!
>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> Cc: Rob Herring <robh@kernel.org>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> Cc: netdev <netdev@vger.kernel.org>
> Cc: linux-pm@vger.kernel.org
> Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> Fixes: c8c43cee29f6 ("driver core: Fix driver_deferred_probe_check_state() logic")
> Signed-off-by: John Stultz <john.stultz@linaro.org>

Thanks, this fixes the issue for me!

Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert
Geert Uytterhoeven April 7, 2020, 4:46 p.m. UTC | #2
Hi John,

On Tue, Apr 7, 2020 at 9:50 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> On Tue, Apr 7, 2020 at 9:06 AM John Stultz <john.stultz@linaro.org> wrote:
> > In commit c8c43cee29f6 ("driver core: Fix
> > driver_deferred_probe_check_state() logic"), we set the default
> > driver_deferred_probe_timeout value to 30 seconds to allow for
> > drivers that are missing dependencies to have some time so that
> > the dependency may be loaded from userland after initcalls_done
> > is set.
> >
> > However, Yoshihiro Shimoda reported that on his device that
> > expects to have unmet dependencies (due to "optional links" in
> > its devicetree), was failing to mount the NFS root.
> >
> > In digging further, it seemed the problem was that while the
> > device properly probes after waiting 30 seconds for any missing
> > modules to load, the ip_auto_config() had already failed,
> > resulting in NFS to fail. This was due to ip_auto_config()
> > calling wait_for_device_probe() which doesn't wait for the
> > driver_deferred_probe_timeout to fire.
> >
> > This patch tries to fix the issue by creating a waitqueue
> > for the driver_deferred_probe_timeout, and calling wait_event()
> > to make sure driver_deferred_probe_timeout is zero in
> > wait_for_device_probe() to make sure all the probing is
> > finished.
> >
> > NOTE: I'm not 100% sure this won't have other unwanted side
> > effects (I don't have failing hardware myself to validate),
> > so I'd appreciate testing and close review.
> >
> > If this approach doesn't work, I'll simply set the default
> > driver_deferred_probe_timeout value back to zero, to avoid any
> > behavioral change from before.
> >
> > Thanks to Geert for chasing down that ip_auto_config was why NFS
> > was failing in this case!
> >
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> > Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> > Cc: Rob Herring <robh@kernel.org>
> > Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> > Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> > Cc: netdev <netdev@vger.kernel.org>
> > Cc: linux-pm@vger.kernel.org
> > Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> > Fixes: c8c43cee29f6 ("driver core: Fix driver_deferred_probe_check_state() logic")
> > Signed-off-by: John Stultz <john.stultz@linaro.org>
>
> Thanks, this fixes the issue for me!
>
> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Unfortunately this adds another delay of ca. 30 s to mounting NFS root
when using a kernel config that does include IOMMU and MODULES
support.

Gr{oetje,eeting}s,

                        Geert
John Stultz April 7, 2020, 6:38 p.m. UTC | #3
On Tue, Apr 7, 2020 at 9:46 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi John,
>
> On Tue, Apr 7, 2020 at 9:50 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Tue, Apr 7, 2020 at 9:06 AM John Stultz <john.stultz@linaro.org> wrote:
> > > In commit c8c43cee29f6 ("driver core: Fix
> > > driver_deferred_probe_check_state() logic"), we set the default
> > > driver_deferred_probe_timeout value to 30 seconds to allow for
> > > drivers that are missing dependencies to have some time so that
> > > the dependency may be loaded from userland after initcalls_done
> > > is set.
> > >
> > > However, Yoshihiro Shimoda reported that on his device that
> > > expects to have unmet dependencies (due to "optional links" in
> > > its devicetree), was failing to mount the NFS root.
> > >
> > > In digging further, it seemed the problem was that while the
> > > device properly probes after waiting 30 seconds for any missing
> > > modules to load, the ip_auto_config() had already failed,
> > > resulting in NFS to fail. This was due to ip_auto_config()
> > > calling wait_for_device_probe() which doesn't wait for the
> > > driver_deferred_probe_timeout to fire.
> > >
> > > This patch tries to fix the issue by creating a waitqueue
> > > for the driver_deferred_probe_timeout, and calling wait_event()
> > > to make sure driver_deferred_probe_timeout is zero in
> > > wait_for_device_probe() to make sure all the probing is
> > > finished.
> > >
> > > NOTE: I'm not 100% sure this won't have other unwanted side
> > > effects (I don't have failing hardware myself to validate),
> > > so I'd appreciate testing and close review.
> > >
> > > If this approach doesn't work, I'll simply set the default
> > > driver_deferred_probe_timeout value back to zero, to avoid any
> > > behavioral change from before.
> > >
> > > Thanks to Geert for chasing down that ip_auto_config was why NFS
> > > was failing in this case!
> > >
> > > Cc: "David S. Miller" <davem@davemloft.net>
> > > Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> > > Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> > > Cc: Jakub Kicinski <kuba@kernel.org>
> > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> > > Cc: Rob Herring <robh@kernel.org>
> > > Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> > > Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> > > Cc: netdev <netdev@vger.kernel.org>
> > > Cc: linux-pm@vger.kernel.org
> > > Reported-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> > > Fixes: c8c43cee29f6 ("driver core: Fix driver_deferred_probe_check_state() logic")
> > > Signed-off-by: John Stultz <john.stultz@linaro.org>
> >
> > Thanks, this fixes the issue for me!
> >
> > Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
>
> Unfortunately this adds another delay of ca. 30 s to mounting NFS root
> when using a kernel config that does include IOMMU and MODULES
> support.

Yeah. I worry the other downside is that systems with no missing
dependencies will also see the stall here, since we're waiting for the
timeout regardless of whether any drivers are missing.

So in the light of morning (well, just barely), I think just setting
the probe timeout to zero by default is the best approach. The series
then doesn't change behavior but just cleans things up.

Though, I guess one could argue this fix should go along with setting
the value to zero, so at least if folks specify a delay on the boot
command line, things don't fail because they didn't wait.

thanks
-john
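
The trade-off discussed in this thread can be summarized with a small
toy model. This is a sketch under simplifying assumptions, not kernel
code: nfs_root_ready_time() is a hypothetical helper, times are seconds
after initcalls_done, ordinary probe time is ignored, and a device with
an unmet optional dependency is assumed to probe only when the timeout
fires.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model: returns the time (seconds after initcalls_done) at which
 * ip_auto_config() proceeds, or -1 if it proceeds before the network
 * device has probed, i.e. the NFS root mount fails.
 */
static int nfs_root_ready_time(int timeout_s, bool has_unmet_dep,
			       bool wait_for_timeout)
{
	assert(timeout_s >= 0);

	/* Assumed: a device with an unmet dependency probes when the timeout fires. */
	int device_probed_at = has_unmet_dep ? timeout_s : 0;

	/* With the patch, wait_for_device_probe() blocks until the timeout fires. */
	int ip_auto_config_at = wait_for_timeout ? timeout_s : 0;

	return ip_auto_config_at >= device_probed_at ? ip_auto_config_at : -1;
}
```

In this model, nfs_root_ready_time(30, true, false) gives -1 (the
reported failure), nfs_root_ready_time(30, true, true) gives 30 (the
patch fixes it), and nfs_root_ready_time(30, false, true) also gives 30
(the stall on systems with nothing missing). With a default timeout of
zero, every combination gives 0.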

Patch

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 06ec0e851fa1..8c13f0df3282 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -232,9 +232,10 @@  DEFINE_SHOW_ATTRIBUTE(deferred_devs);
 int driver_deferred_probe_timeout = 30;
 #else
 /* In the case of !modules, no probe timeout needed */
-int driver_deferred_probe_timeout = -1;
+int driver_deferred_probe_timeout;
 #endif
 EXPORT_SYMBOL_GPL(driver_deferred_probe_timeout);
+static DECLARE_WAIT_QUEUE_HEAD(probe_timeout_waitqueue);
 
 static int __init deferred_probe_timeout_setup(char *str)
 {
@@ -266,7 +267,7 @@  int driver_deferred_probe_check_state(struct device *dev)
 		return -ENODEV;
 	}
 
-	if (!driver_deferred_probe_timeout) {
+	if (!driver_deferred_probe_timeout && initcalls_done) {
 		dev_WARN(dev, "deferred probe timeout, ignoring dependency");
 		return -ETIMEDOUT;
 	}
@@ -284,6 +285,7 @@  static void deferred_probe_timeout_work_func(struct work_struct *work)
 
 	list_for_each_entry_safe(private, p, &deferred_probe_pending_list, deferred_probe)
 		dev_info(private->device, "deferred probe pending");
+	wake_up(&probe_timeout_waitqueue);
 }
 static DECLARE_DELAYED_WORK(deferred_probe_timeout_work, deferred_probe_timeout_work_func);
 
@@ -658,6 +660,9 @@  int driver_probe_done(void)
  */
 void wait_for_device_probe(void)
 {
+	/* wait for probe timeout */
+	wait_event(probe_timeout_waitqueue, !driver_deferred_probe_timeout);
+
 	/* wait for the deferred probe workqueue to finish */
 	flush_work(&deferred_probe_work);