diff mbox series

driver core: Reorder devices on successful probe

Message ID 20201203175756.1405564-1-thierry.reding@gmail.com (mailing list archive)
State Not Applicable, archived
Headers show
Series driver core: Reorder devices on successful probe | expand

Commit Message

Thierry Reding Dec. 3, 2020, 5:57 p.m. UTC
From: Thierry Reding <treding@nvidia.com>

Device drivers usually depend on the fact that the devices that they
control are suspended in the same order that they were probed in. In
most cases this is already guaranteed via deferred probe.

However, there's one case where this can still break: if a device is
instantiated before a dependency (for example if it appears before the
dependency in device tree) but gets probed only after the dependency is
probed. Instantiation order would cause the dependency to get probed
later, in which case probe of the original device would be deferred and
the suspend/resume queue would get reordered properly. However, if the
dependency is provided by a built-in driver and the device depending on
that driver is controlled by a loadable module, which may only get
loaded after the root filesystem has become available, we can be faced
with a situation where the probe order ends up being different from the
suspend/resume order.

One example where this happens is on Tegra186, where the ACONNECT is
listed very early in device tree (sorted by unit-address) and depends on
BPMP (listed very late because it has no unit-address) for power domains
and clocks/resets. If the ACONNECT driver is built-in, there is no
problem because it will be probed before BPMP, causing a probe deferral
and that in turn reorders the suspend/resume queue. However, if built as
a module, it will end up being probed after BPMP, and therefore not
result in a probe deferral, and therefore the suspend/resume queue will
stay in the instantiation order. This in turn causes problems because
ACONNECT will be resumed before BPMP, which will result in a hang
because the ACONNECT's power domain cannot be powered on as long as the
BPMP is still suspended.

Fix this by always reordering devices on successful probe. This ensures
that the suspend/resume queue is always in probe order and hence meets
the natural expectations of drivers vs. their dependencies.

Reported-by: Jonathan Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/base/dd.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Rafael J. Wysocki Dec. 3, 2020, 6:17 p.m. UTC | #1
On Thu, Dec 3, 2020 at 6:58 PM Thierry Reding <thierry.reding@gmail.com> wrote:
>
> From: Thierry Reding <treding@nvidia.com>
>
> Device drivers usually depend on the fact that the devices that they
> control are suspended in the same order that they were probed in. In
> most cases this is already guaranteed via deferred probe.
>
> However, there's one case where this can still break: if a device is
> instantiated before a dependency (for example if it appears before the
> dependency in device tree) but gets probed only after the dependency is
> probed. Instantiation order would cause the dependency to get probed
> later, in which case probe of the original device would be deferred and
> the suspend/resume queue would get reordered properly. However, if the
> dependency is provided by a built-in driver and the device depending on
> that driver is controlled by a loadable module, which may only get
> loaded after the root filesystem has become available, we can be faced
> with a situation where the probe order ends up being different from the
> suspend/resume order.
>
> One example where this happens is on Tegra186, where the ACONNECT is
> listed very early in device tree (sorted by unit-address) and depends on
> BPMP (listed very late because it has no unit-address) for power domains
> and clocks/resets. If the ACONNECT driver is built-in, there is no
> problem because it will be probed before BPMP, causing a probe deferral
> and that in turn reorders the suspend/resume queue. However, if built as
> a module, it will end up being probed after BPMP, and therefore not
> result in a probe deferral, and therefore the suspend/resume queue will
> stay in the instantiation order. This in turn causes problems because
> ACONNECT will be resumed before BPMP, which will result in a hang
> because the ACONNECT's power domain cannot be powered on as long as the
> BPMP is still suspended.
>
> Fix this by always reordering devices on successful probe. This ensures
> that the suspend/resume queue is always in probe order and hence meets
> the natural expectations of drivers vs. their dependencies.
>
> Reported-by: Jonathan Hunter <jonathanh@nvidia.com>
> Signed-off-by: Thierry Reding <treding@nvidia.com>

Saravana had submitted a very similar patch (I don't have a pointer to
that one though) and I was against it at that time due to
overhead-related concerns.  There still are some, but maybe that
doesn't matter in practice.

Also, I kind of expect this to blow up somewhere, but since I have no
examples ready from the top of my head, I think let's try and see, so:

Acked-by: Rafael. J. Wysocki <rafael@kernel.org>

> ---
>  drivers/base/dd.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 148e81969e04..cfc079e738bb 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -370,6 +370,13 @@ static void driver_bound(struct device *dev)
>
>         device_pm_check_callbacks(dev);
>
> +       /*
> +        * Reorder successfully probed devices to the end of the device list.
> +        * This ensures that suspend/resume order matches probe order, which
> +        * is usually what drivers rely on.
> +        */
> +       device_pm_move_to_tail(dev);
> +
>         /*
>          * Make sure the device is no longer in one of the deferred lists and
>          * kick off retrying all pending devices
> --
> 2.29.2
>
Saravana Kannan Dec. 3, 2020, 7:19 p.m. UTC | #2
On Thu, Dec 3, 2020 at 10:17 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, Dec 3, 2020 at 6:58 PM Thierry Reding <thierry.reding@gmail.com> wrote:
> >
> > From: Thierry Reding <treding@nvidia.com>
> >
> > Device drivers usually depend on the fact that the devices that they
> > control are suspended in the same order that they were probed in. In
> > most cases this is already guaranteed via deferred probe.
> >
> > However, there's one case where this can still break: if a device is
> > instantiated before a dependency (for example if it appears before the
> > dependency in device tree) but gets probed only after the dependency is
> > probed. Instantiation order would cause the dependency to get probed
> > later, in which case probe of the original device would be deferred and
> > the suspend/resume queue would get reordered properly. However, if the
> > dependency is provided by a built-in driver and the device depending on
> > that driver is controlled by a loadable module, which may only get
> > loaded after the root filesystem has become available, we can be faced
> > with a situation where the probe order ends up being different from the
> > suspend/resume order.
> >
> > One example where this happens is on Tegra186, where the ACONNECT is
> > listed very early in device tree (sorted by unit-address) and depends on
> > BPMP (listed very late because it has no unit-address) for power domains
> > and clocks/resets. If the ACONNECT driver is built-in, there is no
> > problem because it will be probed before BPMP, causing a probe deferral
> > and that in turn reorders the suspend/resume queue. However, if built as
> > a module, it will end up being probed after BPMP, and therefore not
> > result in a probe deferral, and therefore the suspend/resume queue will
> > stay in the instantiation order. This in turn causes problems because
> > ACONNECT will be resumed before BPMP, which will result in a hang
> > because the ACONNECT's power domain cannot be powered on as long as the
> > BPMP is still suspended.
> >
> > Fix this by always reordering devices on successful probe. This ensures
> > that the suspend/resume queue is always in probe order and hence meets
> > the natural expectations of drivers vs. their dependencies.
> >
> > Reported-by: Jonathan Hunter <jonathanh@nvidia.com>
> > Signed-off-by: Thierry Reding <treding@nvidia.com>
>
> Saravana had submitted a very similar patch (I don't have a pointer to
> that one though) and I was against it at that time due to
> overhead-related concerns.  There still are some, but maybe that
> doesn't matter in practice.

Yeah, it's a very similar patch but I also suggested deleting the
reorder done in the deferred probe code (I'm pretty sure we can drop
it). Here's the thread:
https://lore.kernel.org/lkml/20200625032430.152447-1-saravanak@google.com/

Btw, I've been wondering about this recently. Do we even need
device_pm_move_to_tail() to do the recursive thing once we do "add
device to end of list when added" + "move probed devices to the end
after probe" thing here? Doesn't this guarantee that none of the
consumers can come before the supplier in the dpm list?

-Saravana

>
> Also, I kind of expect this to blow up somewhere, but since I have no
> examples ready from the top of my head, I think let's try and see, so:
>
> Acked-by: Rafael. J. Wysocki <rafael@kernel.org>
>
> > ---
> >  drivers/base/dd.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > index 148e81969e04..cfc079e738bb 100644
> > --- a/drivers/base/dd.c
> > +++ b/drivers/base/dd.c
> > @@ -370,6 +370,13 @@ static void driver_bound(struct device *dev)
> >
> >         device_pm_check_callbacks(dev);
> >
> > +       /*
> > +        * Reorder successfully probed devices to the end of the device list.
> > +        * This ensures that suspend/resume order matches probe order, which
> > +        * is usually what drivers rely on.
> > +        */
> > +       device_pm_move_to_tail(dev);
> > +
> >         /*
> >          * Make sure the device is no longer in one of the deferred lists and
> >          * kick off retrying all pending devices
> > --
> > 2.29.2
> >
Thierry Reding Dec. 4, 2020, 10:46 a.m. UTC | #3
On Thu, Dec 03, 2020 at 11:19:11AM -0800, Saravana Kannan wrote:
> On Thu, Dec 3, 2020 at 10:17 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Thu, Dec 3, 2020 at 6:58 PM Thierry Reding <thierry.reding@gmail.com> wrote:
> > >
> > > From: Thierry Reding <treding@nvidia.com>
> > >
> > > Device drivers usually depend on the fact that the devices that they
> > > control are suspended in the same order that they were probed in. In
> > > most cases this is already guaranteed via deferred probe.
> > >
> > > However, there's one case where this can still break: if a device is
> > > instantiated before a dependency (for example if it appears before the
> > > dependency in device tree) but gets probed only after the dependency is
> > > probed. Instantiation order would cause the dependency to get probed
> > > later, in which case probe of the original device would be deferred and
> > > the suspend/resume queue would get reordered properly. However, if the
> > > dependency is provided by a built-in driver and the device depending on
> > > that driver is controlled by a loadable module, which may only get
> > > loaded after the root filesystem has become available, we can be faced
> > > with a situation where the probe order ends up being different from the
> > > suspend/resume order.
> > >
> > > One example where this happens is on Tegra186, where the ACONNECT is
> > > listed very early in device tree (sorted by unit-address) and depends on
> > > BPMP (listed very late because it has no unit-address) for power domains
> > > and clocks/resets. If the ACONNECT driver is built-in, there is no
> > > problem because it will be probed before BPMP, causing a probe deferral
> > > and that in turn reorders the suspend/resume queue. However, if built as
> > > a module, it will end up being probed after BPMP, and therefore not
> > > result in a probe deferral, and therefore the suspend/resume queue will
> > > stay in the instantiation order. This in turn causes problems because
> > > ACONNECT will be resumed before BPMP, which will result in a hang
> > > because the ACONNECT's power domain cannot be powered on as long as the
> > > BPMP is still suspended.
> > >
> > > Fix this by always reordering devices on successful probe. This ensures
> > > that the suspend/resume queue is always in probe order and hence meets
> > > the natural expectations of drivers vs. their dependencies.
> > >
> > > Reported-by: Jonathan Hunter <jonathanh@nvidia.com>
> > > Signed-off-by: Thierry Reding <treding@nvidia.com>
> >
> > Saravana had submitted a very similar patch (I don't have a pointer to
> > that one though) and I was against it at that time due to
> > overhead-related concerns.  There still are some, but maybe that
> > doesn't matter in practice.
> 
> Yeah, it's a very similar patch but I also suggested deleting the
> reorder done in the deferred probe code (I'm pretty sure we can drop
> it). Here's the thread:
> https://lore.kernel.org/lkml/20200625032430.152447-1-saravanak@google.com/

Yeah, I was thinking about that as well and I agree with your assessment
that this should no longer be necessary after this patch. Any device
that would've gotten reordered by a deferred probe should now get
reordered after successful probe as well. The ordering of the DPM list
may not end up being exactly the same, but at least it should be from a
functional point of view.

I didn't want to roll these changes all into one so that we could first
make sure that this change itself works and then at a later time we
could remove the deferred probe reordering incrementally to validate
that we really no longer need it.

But thinking about it, it might be worth making these changes all at
once, so that when we break things we break them once instead of
potentially twice.

> Btw, I've been wondering about this recently. Do we even need
> device_pm_move_to_tail() to do the recursive thing once we do "add
> device to end of list when added" + "move probed devices to the end
> after probe" thing here? Doesn't this guarantee that none of the
> consumers can come before the supplier in the dpm list?

I think we still at least need to reorder the children recursively, but
I'm pretty sure we could also drop the reordering of consumer device
links because, yes, every consumer should defer probe if the provider
hasn't been probed yet, which in turn would cause the consumer to get
sorted into the right position in the DPM queue after its successful
probe.

Thierry
Thierry Reding Dec. 4, 2020, 10:55 a.m. UTC | #4
On Thu, Dec 03, 2020 at 07:17:30PM +0100, Rafael J. Wysocki wrote:
> On Thu, Dec 3, 2020 at 6:58 PM Thierry Reding <thierry.reding@gmail.com> wrote:
> >
> > From: Thierry Reding <treding@nvidia.com>
> >
> > Device drivers usually depend on the fact that the devices that they
> > control are suspended in the same order that they were probed in. In
> > most cases this is already guaranteed via deferred probe.
> >
> > However, there's one case where this can still break: if a device is
> > instantiated before a dependency (for example if it appears before the
> > dependency in device tree) but gets probed only after the dependency is
> > probed. Instantiation order would cause the dependency to get probed
> > later, in which case probe of the original device would be deferred and
> > the suspend/resume queue would get reordered properly. However, if the
> > dependency is provided by a built-in driver and the device depending on
> > that driver is controlled by a loadable module, which may only get
> > loaded after the root filesystem has become available, we can be faced
> > with a situation where the probe order ends up being different from the
> > suspend/resume order.
> >
> > One example where this happens is on Tegra186, where the ACONNECT is
> > listed very early in device tree (sorted by unit-address) and depends on
> > BPMP (listed very late because it has no unit-address) for power domains
> > and clocks/resets. If the ACONNECT driver is built-in, there is no
> > problem because it will be probed before BPMP, causing a probe deferral
> > and that in turn reorders the suspend/resume queue. However, if built as
> > a module, it will end up being probed after BPMP, and therefore not
> > result in a probe deferral, and therefore the suspend/resume queue will
> > stay in the instantiation order. This in turn causes problems because
> > ACONNECT will be resumed before BPMP, which will result in a hang
> > because the ACONNECT's power domain cannot be powered on as long as the
> > BPMP is still suspended.
> >
> > Fix this by always reordering devices on successful probe. This ensures
> > that the suspend/resume queue is always in probe order and hence meets
> > the natural expectations of drivers vs. their dependencies.
> >
> > Reported-by: Jonathan Hunter <jonathanh@nvidia.com>
> > Signed-off-by: Thierry Reding <treding@nvidia.com>
> 
> Saravana had submitted a very similar patch (I don't have a pointer to
> that one though) and I was against it at that time due to
> overhead-related concerns.  There still are some, but maybe that
> doesn't matter in practice.

I suspect that any overhead would be offset if we can get rid of the
deferred probe reordering and the recursive provider/consumer reordering
as discussed with Saravana in that other subthread. Even if we can't do
that, this is a one-time cost per device and therefore shouldn't have a
huge impact.

Besides, as the example above and Saravana's in the discussion in June
shows, there are certain cases where we just have no other way of doing
the sorting correctly, so I think we need this for correctness.

> Also, I kind of expect this to blow up somewhere, but since I have no
> examples ready from the top of my head, I think let's try and see, so:

I'm slightly worried about that, too. But I did give this quite a bit of
thought and I can't come up with a case where this would blow up. Maybe
the one case where this might break something is if some combination of
drivers specifically rely on the suspend/resume order to be *different*
from the probe order. That's a bit far-fetched and I would think that
either driver would have a workaround in place to deal with that somehow
so this might actually unveil such workarounds and gives us an
opportunity to do things right.

But I think it'd probably be best to feed this into linux-next sometime
soon after v5.11-rc1 to get broad exposure and see if there are any
cases where this causes trouble.

Thierry
diff mbox series

Patch

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 148e81969e04..cfc079e738bb 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -370,6 +370,13 @@  static void driver_bound(struct device *dev)
 
 	device_pm_check_callbacks(dev);
 
+	/*
+	 * Reorder successfully probed devices to the end of the device list.
+	 * This ensures that suspend/resume order matches probe order, which
+	 * is usually what drivers rely on.
+	 */
+	device_pm_move_to_tail(dev);
+
 	/*
 	 * Make sure the device is no longer in one of the deferred lists and
 	 * kick off retrying all pending devices