PM / runtime: Drop children check from __pm_runtime_set_status()

Message ID 1713438.irjm9MTSvo@aspire.rjw.lan (mailing list archive)
State Mainlined
Delegated to: Rafael Wysocki

Commit Message

Rafael J. Wysocki Nov. 12, 2017, 12:27 a.m. UTC
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The check for "active" children in __pm_runtime_set_status(), when
trying to set the parent device status to "suspended", doesn't
really make sense, because in fact it is not invalid to set the
status of a device with runtime PM disabled to "suspended" in any
case.  It is invalid to enable runtime PM for a device with its
status set to "suspended" while its child_count reference counter
is nonzero, but the check in __pm_runtime_set_status() doesn't
really cover that situation.

For this reason, drop the children check from __pm_runtime_set_status()
and add a check against child_count reference counters of "suspended"
devices to pm_runtime_enable().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/base/power/runtime.c |   30 ++++++++++--------------------
 1 file changed, 10 insertions(+), 20 deletions(-)
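
A minimal sketch of the rule the commit message describes, written as a user-space model rather than kernel code. The struct and function names (model_dev, model_set_suspended, enable_would_warn) are hypothetical; only the field names and the WARN() condition are taken from the patch. The -EAGAIN behavior for a device with runtime PM enabled is an assumption about the surrounding kernel code, which this excerpt of the diff does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; names mirror the kernel fields only. */
enum rpm_status { RPM_ACTIVE, RPM_SUSPENDED };

struct model_dev {
	enum rpm_status runtime_status;
	bool ignore_children;
	int child_count;	/* an atomic_t in the kernel */
	int disable_depth;	/* > 0 means runtime PM is disabled */
};

/*
 * With the patch applied, marking a device "suspended" while runtime PM
 * is disabled no longer looks at child_count at all.
 */
static int model_set_suspended(struct model_dev *dev)
{
	assert(dev != NULL);
	if (dev->disable_depth == 0)
		return -1;	/* assumption: the kernel returns -EAGAIN here */
	dev->runtime_status = RPM_SUSPENDED;
	return 0;
}

/* Same condition as the WARN() the patch adds to pm_runtime_enable(). */
static bool enable_would_warn(const struct model_dev *dev)
{
	assert(dev != NULL);
	return dev->runtime_status == RPM_SUSPENDED &&
	       !dev->ignore_children &&
	       dev->child_count > 0;
}
```

In this model, a device with child_count == 1 can be marked suspended without an error, and the inconsistency is only reported when enable_would_warn() is evaluated at enable time.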

Comments

Ulf Hansson Nov. 13, 2017, 1:26 p.m. UTC | #1
On 12 November 2017 at 01:27, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> The check for "active" children in __pm_runtime_set_status(), when
> trying to set the parent device status to "suspended", doesn't
> really make sense, because in fact it is not invalid to set the
> status of a device with runtime PM disabled to "suspended" in any
> case.  It is invalid to enable runtime PM for a device with its
> status set to "suspended" while its child_count reference counter
> is nonzero, but the check in __pm_runtime_set_status() doesn't
> really cover that situation.

The reason I changed this in commit a8636c89648a ("PM /
Runtime: Don't allow to suspend a device with an active child") was
to get consistent behavior.

Because, setting the device's status to active (RPM_ACTIVE) via
__pm_runtime_set_status(), requires its parent to also be active (in
case the parent has runtime PM enabled).

I would prefer to try to maintain this consistency, but I also realize
that commit a8636c89648a should also have been checking if the parent
has runtime PM enabled (again for consistency), which it doesn't.

What about fixing that instead?

Kind regards
Uffe

>
> For this reason, drop the children check from __pm_runtime_set_status()
> and add a check against child_count reference counters of "suspended"
> devices to pm_runtime_enable().
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/base/power/runtime.c |   30 ++++++++++--------------------
>  1 file changed, 10 insertions(+), 20 deletions(-)
>
> Index: linux-pm/drivers/base/power/runtime.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/runtime.c
> +++ linux-pm/drivers/base/power/runtime.c
> @@ -1101,29 +1101,13 @@ int __pm_runtime_set_status(struct devic
>                 goto out;
>         }
>
> -       if (dev->power.runtime_status == status)
> +       if (dev->power.runtime_status == status || !parent)
>                 goto out_set;
>
>         if (status == RPM_SUSPENDED) {
> -               /*
> -                * It is invalid to suspend a device with an active child,
> -                * unless it has been set to ignore its children.
> -                */
> -               if (!dev->power.ignore_children &&
> -                       atomic_read(&dev->power.child_count)) {
> -                       dev_err(dev, "runtime PM trying to suspend device but active child\n");
> -                       error = -EBUSY;
> -                       goto out;
> -               }
> -
> -               if (parent) {
> -                       atomic_add_unless(&parent->power.child_count, -1, 0);
> -                       notify_parent = !parent->power.ignore_children;
> -               }
> -               goto out_set;
> -       }
> -
> -       if (parent) {
> +               atomic_add_unless(&parent->power.child_count, -1, 0);
> +               notify_parent = !parent->power.ignore_children;
> +       } else {
>                 spin_lock_nested(&parent->power.lock, SINGLE_DEPTH_NESTING);
>
>                 /*
> @@ -1307,6 +1291,12 @@ void pm_runtime_enable(struct device *de
>         else
>                 dev_warn(dev, "Unbalanced %s!\n", __func__);
>
> +       WARN(dev->power.runtime_status == RPM_SUSPENDED &&
> +            !dev->power.ignore_children &&
> +            atomic_read(&dev->power.child_count) > 0,
> +            "Enabling runtime PM for inactive device (%s) with active children\n",
> +            dev_name(dev));
> +
>         spin_unlock_irqrestore(&dev->power.lock, flags);
>  }
>  EXPORT_SYMBOL_GPL(pm_runtime_enable);
>
Rafael J. Wysocki Nov. 13, 2017, 9:50 p.m. UTC | #2
On Monday, November 13, 2017 2:26:28 PM CET Ulf Hansson wrote:
> On 12 November 2017 at 01:27, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > The check for "active" children in __pm_runtime_set_status(), when
> > trying to set the parent device status to "suspended", doesn't
> > really make sense, because in fact it is not invalid to set the
> > status of a device with runtime PM disabled to "suspended" in any
> > case.  It is invalid to enable runtime PM for a device with its
> > status set to "suspended" while its child_count reference counter
> > is nonzero, but the check in __pm_runtime_set_status() doesn't
> > really cover that situation.
> 
> The reason to why I changed this in commit a8636c89648a ("PM /
> Runtime: Don't allow to suspend a device with an active child") was
> because to get a consistent behavior.

Well, it causes the function to return an error in a non-error situation,
which IMnsHO is a bug.

> Because, setting the device's status to active (RPM_ACTIVE) via
> __pm_runtime_set_status(), requires its parent to also be active (in
> case the parent has runtime PM enabled).

No!!!

The check is in there, because the parent's usage_count is affected by that
code and incrementing it in case the parent has runtime PM enabled and is
RPM_SUSPENDED leads to an inconsistent runtime PM state of the parent.  IOW,
it would confuse the framework.

There's no such issue if the runtime PM status of a child is set to RPM_SUSPENDED.

It is all consistent without the check I'm removing and is made inconsistent
by that very check.
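
A minimal sketch of the asymmetry described above, as a simplified user-space model. It assumes the counter touched on the parent is child_count, that dev itself has runtime PM disabled, and that only two status values exist. The names model_dev, model_set_status and MODEL_EBUSY are hypothetical, and locking and parent idle notification are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum rpm_status { RPM_ACTIVE, RPM_SUSPENDED };

struct model_dev {
	struct model_dev *parent;
	enum rpm_status runtime_status;
	bool ignore_children;
	int child_count;	/* an atomic_t in the kernel */
	int disable_depth;	/* > 0 means runtime PM is disabled */
};

#define MODEL_EBUSY 16	/* stand-in for the kernel's EBUSY */

/* Models the post-patch flow for a device with runtime PM disabled. */
static int model_set_status(struct model_dev *dev, enum rpm_status status)
{
	struct model_dev *parent;

	assert(dev != NULL);
	parent = dev->parent;

	if (dev->runtime_status == status || parent == NULL)
		goto out_set;

	if (status == RPM_SUSPENDED) {
		/* Like atomic_add_unless(..., -1, 0): never drop below zero. */
		if (parent->child_count > 0)
			parent->child_count--;
	} else {
		/*
		 * Only this direction can fail: counting an active child
		 * under an enabled, suspended parent would leave the
		 * parent in an inconsistent state.
		 */
		if (parent->disable_depth == 0 && !parent->ignore_children &&
		    parent->runtime_status != RPM_ACTIVE)
			return -MODEL_EBUSY;
		parent->child_count++;
	}

out_set:
	dev->runtime_status = status;
	return 0;
}
```

The "suspended" direction only lowers the parent's counter, so in this model it has no failure case that depends on the device's own children.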

> I would prefer to try maintain this consistency, but I also I realize
> that commit a8636c89648a, should also have been checking if the parent
> has runtime PM enabled (again for consistency), which it doesn't.
> 
> What about fixing that instead?

Runtime PM is *disabled* for the parent at this point, guaranteed, so there's
nothing to check, I'm afraid ...

OTOH, the runtime PM status of the children doesn't matter here, because their
reference counters are not updated.

Thanks,
Rafael
Rafael J. Wysocki Nov. 13, 2017, 9:58 p.m. UTC | #3
On Mon, Nov 13, 2017 at 10:50 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Monday, November 13, 2017 2:26:28 PM CET Ulf Hansson wrote:
>> On 12 November 2017 at 01:27, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> >
>> > The check for "active" children in __pm_runtime_set_status(), when
>> > trying to set the parent device status to "suspended", doesn't
>> > really make sense, because in fact it is not invalid to set the
>> > status of a device with runtime PM disabled to "suspended" in any
>> > case.  It is invalid to enable runtime PM for a device with its
>> > status set to "suspended" while its child_count reference counter
>> > is nonzero, but the check in __pm_runtime_set_status() doesn't
>> > really cover that situation.
>>
>> The reason to why I changed this in commit a8636c89648a ("PM /
>> Runtime: Don't allow to suspend a device with an active child") was
>> because to get a consistent behavior.
>
> Well, it causes the function to return an error in a non-error situation,
> which IMnsHO is a bug.
>
>> Because, setting the device's status to active (RPM_ACTIVE) via
>> __pm_runtime_set_status(), requires its parent to also be active (in
>> case the parent has runtime PM enabled).
>
> No!!!
>
> The check is in there, because the parent's usage_count is affected by that

Actually, the parent's child_count is affected, but that doesn't matter here.

> code and incrementing it in case the parent has runtime PM enabled and is
> RPM_SUSPENDED leads to an inconsistent runtime PM state of the parent.  IOW,
> it would confuse the framework.
>
> There's no such issue if the runtime PM status of a child is set to RPM_SUSPENDED.
>
> It is all consistent without the check I'm removing and is made inconsistent
> by that very check.
>
>> I would prefer to try maintain this consistency, but I also I realize
>> that commit a8636c89648a, should also have been checking if the parent
>> has runtime PM enabled (again for consistency), which it doesn't.
>>
>> What about fixing that instead?
>
> Runtime PM is *disabled* for the parent at this point, guaranteed, so there's
> nothing to check, I'm afraid ...
>
> OTOH, the runtime PM status of the children doesn't matter here, because their
> reference counters are not updated.

Thanks,
Rafael
Ulf Hansson Nov. 14, 2017, 9:13 a.m. UTC | #4
On 13 November 2017 at 22:50, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Monday, November 13, 2017 2:26:28 PM CET Ulf Hansson wrote:
>> On 12 November 2017 at 01:27, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> >
>> > The check for "active" children in __pm_runtime_set_status(), when
>> > trying to set the parent device status to "suspended", doesn't
>> > really make sense, because in fact it is not invalid to set the
>> > status of a device with runtime PM disabled to "suspended" in any
>> > case.  It is invalid to enable runtime PM for a device with its
>> > status set to "suspended" while its child_count reference counter
>> > is nonzero, but the check in __pm_runtime_set_status() doesn't
>> > really cover that situation.
>>
>> The reason to why I changed this in commit a8636c89648a ("PM /
>> Runtime: Don't allow to suspend a device with an active child") was
>> because to get a consistent behavior.
>
> Well, it causes the function to return an error in a non-error situation,
> which IMnsHO is a bug.
>
>> Because, setting the device's status to active (RPM_ACTIVE) via
>> __pm_runtime_set_status(), requires its parent to also be active (in
>> case the parent has runtime PM enabled).
>
> No!!!
>
> The check is in there, because the parent's usage_count is affected by that
> code and incrementing it in case the parent has runtime PM enabled and is
> RPM_SUSPENDED leads to an inconsistent runtime PM state of the parent.  IOW,
> it would confuse the framework.

Right, I do understand the reasons *why* it is like this.

>
> There's no such issue if the runtime PM status of a child is set to RPM_SUSPENDED.
>
> It is all consistent without the check I'm removing and is made inconsistent
> by that very check.
>
>> I would prefer to try maintain this consistency, but I also I realize
>> that commit a8636c89648a, should also have been checking if the parent
>> has runtime PM enabled (again for consistency), which it doesn't.
>>
>> What about fixing that instead?
>
> Runtime PM is *disabled* for the parent at this point, guaranteed, so there's
> nothing to check, I'm afraid ...

Where and how is that guarantee made?

[...]

Kind regards
Uffe
Ulf Hansson Nov. 14, 2017, 9:56 a.m. UTC | #5
On 14 November 2017 at 10:13, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On 13 November 2017 at 22:50, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> On Monday, November 13, 2017 2:26:28 PM CET Ulf Hansson wrote:
>>> On 12 November 2017 at 01:27, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>> >
>>> > The check for "active" children in __pm_runtime_set_status(), when
>>> > trying to set the parent device status to "suspended", doesn't
>>> > really make sense, because in fact it is not invalid to set the
>>> > status of a device with runtime PM disabled to "suspended" in any
>>> > case.  It is invalid to enable runtime PM for a device with its
>>> > status set to "suspended" while its child_count reference counter
>>> > is nonzero, but the check in __pm_runtime_set_status() doesn't
>>> > really cover that situation.
>>>
>>> The reason to why I changed this in commit a8636c89648a ("PM /
>>> Runtime: Don't allow to suspend a device with an active child") was
>>> because to get a consistent behavior.
>>
>> Well, it causes the function to return an error in a non-error situation,
>> which IMnsHO is a bug.
>>
>>> Because, setting the device's status to active (RPM_ACTIVE) via
>>> __pm_runtime_set_status(), requires its parent to also be active (in
>>> case the parent has runtime PM enabled).
>>
>> No!!!
>>
>> The check is in there, because the parent's usage_count is affected by that
>> code and incrementing it in case the parent has runtime PM enabled and is
>> RPM_SUSPENDED leads to an inconsistent runtime PM state of the parent.  IOW,
>> it would confuse the framework.
>
> Right, I do understand the reasons *why* it is like this.
>
>>
>> There's no such issue if the runtime PM status of a child is set to RPM_SUSPENDED.
>>
>> It is all consistent without the check I'm removing and is made inconsistent
>> by that very check.
>>
>>> I would prefer to try maintain this consistency, but I also I realize
>>> that commit a8636c89648a, should also have been checking if the parent
>>> has runtime PM enabled (again for consistency), which it doesn't.
>>>
>>> What about fixing that instead?
>>
>> Runtime PM is *disabled* for the parent at this point, guaranteed, so there's
>> nothing to check, I'm afraid ...
>
> Where and how is that guarantee made?

Oh, I just realized that it should be "child" instead of "parent" in the
above reasoning. Apologies for giving the wrong argument.

So, let me take this once again, to make it clear.

When pm_runtime_set_suspended(dev) is called, dev's child device may
still be runtime PM enabled and active.
I was suggesting to add a check for this scenario, to see if dev's
child device has runtime PM enabled, as an additional constraint
before deciding to return an error code.

The idea was to get consistent behavior from the
pm_runtime_set_active|suspended() APIs' point of view, and not from the
runtime PM core's point of view.

Of course, because dev->power.child_count is maintained properly even
when runtime PM is disabled for dev's child device, it works as you
have suggested in the $subject patch as well.

[...]

Kind regards
Uffe
Rafael J. Wysocki Nov. 14, 2017, 9:44 p.m. UTC | #6
On Tuesday, November 14, 2017 10:56:39 AM CET Ulf Hansson wrote:
> On 14 November 2017 at 10:13, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > On 13 November 2017 at 22:50, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >> On Monday, November 13, 2017 2:26:28 PM CET Ulf Hansson wrote:
> >>> On 12 November 2017 at 01:27, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >>> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >>> >
> >>> > The check for "active" children in __pm_runtime_set_status(), when
> >>> > trying to set the parent device status to "suspended", doesn't
> >>> > really make sense, because in fact it is not invalid to set the
> >>> > status of a device with runtime PM disabled to "suspended" in any
> >>> > case.  It is invalid to enable runtime PM for a device with its
> >>> > status set to "suspended" while its child_count reference counter
> >>> > is nonzero, but the check in __pm_runtime_set_status() doesn't
> >>> > really cover that situation.
> >>>
> >>> The reason to why I changed this in commit a8636c89648a ("PM /
> >>> Runtime: Don't allow to suspend a device with an active child") was
> >>> because to get a consistent behavior.
> >>
> >> Well, it causes the function to return an error in a non-error situation,
> >> which IMnsHO is a bug.
> >>
> >>> Because, setting the device's status to active (RPM_ACTIVE) via
> >>> __pm_runtime_set_status(), requires its parent to also be active (in
> >>> case the parent has runtime PM enabled).
> >>
> >> No!!!
> >>
> >> The check is in there, because the parent's usage_count is affected by that
> >> code and incrementing it in case the parent has runtime PM enabled and is
> >> RPM_SUSPENDED leads to an inconsistent runtime PM state of the parent.  IOW,
> >> it would confuse the framework.
> >
> > Right, I do understand the reasons *why* it is like this.
> >
> >>
> >> There's no such issue if the runtime PM status of a child is set to RPM_SUSPENDED.
> >>
> >> It is all consistent without the check I'm removing and is made inconsistent
> >> by that very check.
> >>
> >>> I would prefer to try maintain this consistency, but I also I realize
> >>> that commit a8636c89648a, should also have been checking if the parent
> >>> has runtime PM enabled (again for consistency), which it doesn't.
> >>>
> >>> What about fixing that instead?
> >>
> >> Runtime PM is *disabled* for the parent at this point, guaranteed, so there's
> >> nothing to check, I'm afraid ...
> >
> > Where and how is that guarantee made?
> 
> Oh, just realize that it should be "child" instead of "parent", in the
> above reasoning. Apologize for giving the wrong argument.
> 
> So, let's me take this once again, to make it clear.
> 
> When pm_runtime_set_suspended(dev) is called, dev's child device may
> still be runtime PM enabled and active.
> I was suggesting to add a check for this scenario, to see if dev's
> child device is runtime PM is enabled, as and additional constraint
> before deciding to return an error code.

Well, that's sort of difficult to do, however, because the code would need to
walk all of the children of the device and the child power lock cannot be
acquired under the one of the parent, so it would be fragile and ugly.

> The idea was to get a consistent behavior, from the
> pm_runtime_set_active|suspended() APIs point of view, and not from the
> runtime PM core point of view.

Yes, but the cost is high and the benefit is shallow.

The enable-time WARN() should cover the really broken cases without that
much complexity.

Thanks,
Rafael
Ulf Hansson Nov. 15, 2017, 7:22 a.m. UTC | #7
[...]

>>
>> When pm_runtime_set_suspended(dev) is called, dev's child device may
>> still be runtime PM enabled and active.
>> I was suggesting to add a check for this scenario, to see if dev's
>> child device is runtime PM is enabled, as and additional constraint
>> before deciding to return an error code.
>
> Well, that's sort of difficult to do, however, because the code would need to
> walk all of the children of the device and the child power lock cannot be
> acquired under the one of the parent, so it would be fragile and ugly.

Yeah, you have a point.

>
>> The idea was to get a consistent behavior, from the
>> pm_runtime_set_active|suspended() APIs point of view, and not from the
>> runtime PM core point of view.
>
> Yes, but the cost is high and the benefit is shallow.
>
> The enable-time WARN() should cover the really broken cases without that
> much complexity.

Fair enough!

Feel free to add:
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

Kind regards
Uffe
Johan Hovold Nov. 16, 2017, 9:22 a.m. UTC | #8
On Sun, Nov 12, 2017 at 01:27:30AM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> The check for "active" children in __pm_runtime_set_status(), when
> trying to set the parent device status to "suspended", doesn't
> really make sense, because in fact it is not invalid to set the
> status of a device with runtime PM disabled to "suspended" in any
> case.  It is invalid to enable runtime PM for a device with its
> status set to "suspended" while its child_count reference counter
> is nonzero, but the check in __pm_runtime_set_status() doesn't
> really cover that situation.
> 
> For this reason, drop the children check from __pm_runtime_set_status()
> and add a check against child_count reference counters of "suspended"
> devices to pm_runtime_enable().
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Looks good to me, but you should also fix
Documentation/power/runtime_pm.txt which was updated to reflect the
constraint that is now being reverted.

Reviewed-by: Johan Hovold <johan@kernel.org>

Thanks,
Johan
Rafael J. Wysocki Nov. 16, 2017, 1:57 p.m. UTC | #9
On Thursday, November 16, 2017 10:22:41 AM CET Johan Hovold wrote:
> On Sun, Nov 12, 2017 at 01:27:30AM +0100, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > 
> > The check for "active" children in __pm_runtime_set_status(), when
> > trying to set the parent device status to "suspended", doesn't
> > really make sense, because in fact it is not invalid to set the
> > status of a device with runtime PM disabled to "suspended" in any
> > case.  It is invalid to enable runtime PM for a device with its
> > status set to "suspended" while its child_count reference counter
> > is nonzero, but the check in __pm_runtime_set_status() doesn't
> > really cover that situation.
> > 
> > For this reason, drop the children check from __pm_runtime_set_status()
> > and add a check against child_count reference counters of "suspended"
> > devices to pm_runtime_enable().
> > 
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Looks good to me, but you should also fix
> Documentation/power/runtime_pm.txt which was updated to reflect the
> constraint that is now being reverted.

Thanks for pointing that out.

> Reviewed-by: Johan Hovold <johan@kernel.org>

Thanks!

Rafael
Geert Uytterhoeven Nov. 28, 2017, 10:58 a.m. UTC | #10
Hi Rafael, Shimoda-san,

On Sun, Nov 12, 2017 at 1:27 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> The check for "active" children in __pm_runtime_set_status(), when
> trying to set the parent device status to "suspended", doesn't
> really make sense, because in fact it is not invalid to set the
> status of a device with runtime PM disabled to "suspended" in any
> case.  It is invalid to enable runtime PM for a device with its
> status set to "suspended" while its child_count reference counter
> is nonzero, but the check in __pm_runtime_set_status() doesn't
> really cover that situation.
>
> For this reason, drop the children check from __pm_runtime_set_status()
> and add a check against child_count reference counters of "suspended"
> devices to pm_runtime_enable().
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/base/power/runtime.c |   30 ++++++++++--------------------
>  1 file changed, 10 insertions(+), 20 deletions(-)
>
> Index: linux-pm/drivers/base/power/runtime.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/runtime.c
> +++ linux-pm/drivers/base/power/runtime.c
> @@ -1101,29 +1101,13 @@ int __pm_runtime_set_status(struct devic
>                 goto out;
>         }
>
> -       if (dev->power.runtime_status == status)
> +       if (dev->power.runtime_status == status || !parent)
>                 goto out_set;
>
>         if (status == RPM_SUSPENDED) {
> -               /*
> -                * It is invalid to suspend a device with an active child,
> -                * unless it has been set to ignore its children.
> -                */
> -               if (!dev->power.ignore_children &&
> -                       atomic_read(&dev->power.child_count)) {
> -                       dev_err(dev, "runtime PM trying to suspend device but active child\n");

JFTR, this triggered before during system resume on e.g. Salvator-XS with
R-Car H3:

    ohci-platform ee080000.usb: runtime PM trying to suspend device but active child
    phy_rcar_gen3_usb2 ee080200.usb-phy: runtime PM trying to suspend device but active child
    ohci-platform ee0c0000.usb: runtime PM trying to suspend device but active child
    ohci-platform ee0a0000.usb: runtime PM trying to suspend device but active child
    phy_rcar_gen3_usb2 ee0c0200.usb-phy: runtime PM trying to suspend device but active child
    phy_rcar_gen3_usb2 ee0a0200.usb-phy: runtime PM trying to suspend device but active child

so this was an existing issue with USB before.

> -                       error = -EBUSY;
> -                       goto out;
> -               }
> -
> -               if (parent) {
> -                       atomic_add_unless(&parent->power.child_count, -1, 0);
> -                       notify_parent = !parent->power.ignore_children;
> -               }
> -               goto out_set;
> -       }
> -
> -       if (parent) {
> +               atomic_add_unless(&parent->power.child_count, -1, 0);
> +               notify_parent = !parent->power.ignore_children;
> +       } else {
>                 spin_lock_nested(&parent->power.lock, SINGLE_DEPTH_NESTING);
>
>                 /*
> @@ -1307,6 +1291,12 @@ void pm_runtime_enable(struct device *de
>         else
>                 dev_warn(dev, "Unbalanced %s!\n", __func__);
>
> +       WARN(dev->power.runtime_status == RPM_SUSPENDED &&
> +            !dev->power.ignore_children &&
> +            atomic_read(&dev->power.child_count) > 0,
> +            "Enabling runtime PM for inactive device (%s) with active children\n",
> +            dev_name(dev));

And now it became a bit more noisy:

Enabling runtime PM for inactive device (ee0a0200.usb-phy) with active children
WARNING: CPU: 0 PID: 1697 at drivers/base/power/runtime.c:1299 pm_runtime_enable+0x94/0xd8
CPU: 0 PID: 1697 Comm: s2idle Not tainted 4.15.0-rc1-arm64-renesas-00381-g40d152b966c941dd #41
Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT)
task: ffff8006f81bb100 task.stack: ffff00000aa80000
pstate: 60000085 (nZCv daIf -PAN -UAO)
pc : pm_runtime_enable+0x94/0xd8
lr : pm_runtime_enable+0x94/0xd8
sp : ffff00000aa83b50
x29: ffff00000aa83b50 x28: ffff8006f81bb100
x27: ffff000008841000 x26: ffff000008b4b640
x25: ffff000008b7f6e0 x24: ffff0000097a2f90
x23: ffff000008b7f000 x22: 0000000000000010
x21: 0000000000000000 x20: ffff8006fa9ad158
x19: ffff8006fa9ad010 x18: 00000000013c6577
x17: ffff00000947ea80 x16: 0000000000006370
x15: 000000000000636e x14: 646c696863206576
x13: 0000000000000001 x12: ffff8006f81bbaa8
x11: ffff8006f9479230 x10: ffff8006f97a63e0
x9 : 0000000000000000 x8 : ffff8006f97a6408
x7 : ffff8006f97a63c0 x6 : ffff00000975de80
x5 : 0000000000000000 x4 : 0000000000000000
x3 : ffffffffffffffff x2 : ffff000008b4bbf0
x1 : ffff8006f81bb100 x0 : 000000000000004f
Call trace:
 pm_runtime_enable+0x94/0xd8
 device_resume_early+0x50/0xec
 dpm_resume_early+0x118/0x204
 suspend_devices_and_enter+0x2a8/0x4b0
 pm_suspend+0x22c/0x27c
 state_store+0x84/0x108
 kobj_attr_store+0x14/0x24
 sysfs_kf_write+0x54/0x64
 kernfs_fop_write+0xd8/0x1ec
 __vfs_write+0x28/0x124
 vfs_write+0xa0/0x198
 SyS_write+0x44/0xa0
 el0_svc_naked+0x20/0x24
---[ end trace 965c08c229b62a65 ]---
------------[ cut here ]------------
Enabling runtime PM for inactive device (ee0c0200.usb-phy) with active children
WARNING: CPU: 0 PID: 1697 at drivers/base/power/runtime.c:1299 pm_runtime_enable+0x94/0xd8
CPU: 0 PID: 1697 Comm: s2idle Tainted: G        W 4.15.0-rc1-arm64-renesas-00381-g40d152b966c941dd #41
Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT)
task: ffff8006f81bb100 task.stack: ffff00000aa80000
pstate: 60000085 (nZCv daIf -PAN -UAO)
pc : pm_runtime_enable+0x94/0xd8
lr : pm_runtime_enable+0x94/0xd8
sp : ffff00000aa83b50
x29: ffff00000aa83b50 x28: ffff8006f81bb100
x27: ffff000008841000 x26: ffff000008b4b640
x25: ffff000008b7f6e0 x24: ffff0000097a2f90
x23: ffff000008b7f000 x22: 0000000000000010
x21: 0000000000000000 x20: ffff8006fa9ad958
x19: ffff8006fa9ad810 x18: 00000000013c6577
x17: ffff00000947ea80 x16: 0000000000006370
x15: 000000000000636e x14: 0720072007200720
x13: 0000000000000001 x12: ffff8006f81bbaa8
x11: ffff8006f9478940 x10: ffff8006f97a7ad8
x9 : 0000000000000000 x8 : ffff8006f97a7b00
x7 : ffff8006f97a7ab8 x6 : ffff00000975de80
x5 : 0000000000000000 x4 : 0000000000000000
x3 : ffffffffffffffff x2 : ffff000008b4bbf0
x1 : ffff8006f81bb100 x0 : 000000000000004f
Call trace:
 pm_runtime_enable+0x94/0xd8
 device_resume_early+0x50/0xec
 dpm_resume_early+0x118/0x204
 suspend_devices_and_enter+0x2a8/0x4b0
 pm_suspend+0x22c/0x27c
 state_store+0x84/0x108
 kobj_attr_store+0x14/0x24
 sysfs_kf_write+0x54/0x64
 kernfs_fop_write+0xd8/0x1ec
 __vfs_write+0x28/0x124
 vfs_write+0xa0/0x198
 SyS_write+0x44/0xa0
 el0_svc_naked+0x20/0x24
---[ end trace 965c08c229b62a66 ]---
------------[ cut here ]------------
...

> +
>         spin_unlock_irqrestore(&dev->power.lock, flags);
>  }
>  EXPORT_SYMBOL_GPL(pm_runtime_enable);

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Yoshihiro Shimoda Nov. 28, 2017, 12:48 p.m. UTC | #11
Hi Geert-san,

> From: Geert Uytterhoeven, Sent: Tuesday, November 28, 2017 7:58 PM
>
> Hi Rafael, Shimoda-san,
>
> On Sun, Nov 12, 2017 at 1:27 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > The check for "active" children in __pm_runtime_set_status(), when
> > trying to set the parent device status to "suspended", doesn't
> > really make sense, because in fact it is not invalid to set the
> > status of a device with runtime PM disabled to "suspended" in any
> > case.  It is invalid to enable runtime PM for a device with its
> > status set to "suspended" while its child_count reference counter
> > is nonzero, but the check in __pm_runtime_set_status() doesn't
> > really cover that situation.
> >
> > For this reason, drop the children check from __pm_runtime_set_status()
> > and add a check against child_count reference counters of "suspended"
> > devices to pm_runtime_enable().
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >  drivers/base/power/runtime.c |   30 ++++++++++--------------------
> >  1 file changed, 10 insertions(+), 20 deletions(-)
> >
> > Index: linux-pm/drivers/base/power/runtime.c
> > ===================================================================
> > --- linux-pm.orig/drivers/base/power/runtime.c
> > +++ linux-pm/drivers/base/power/runtime.c
> > @@ -1101,29 +1101,13 @@ int __pm_runtime_set_status(struct devic
> >                 goto out;
> >         }
> >
> > -       if (dev->power.runtime_status == status)
> > +       if (dev->power.runtime_status == status || !parent)
> >                 goto out_set;
> >
> >         if (status == RPM_SUSPENDED) {
> > -               /*
> > -                * It is invalid to suspend a device with an active child,
> > -                * unless it has been set to ignore its children.
> > -                */
> > -               if (!dev->power.ignore_children &&
> > -                       atomic_read(&dev->power.child_count)) {
> > -                       dev_err(dev, "runtime PM trying to suspend device but active child\n");
>
> JFTR, this triggered before during system resume on e.g. Salvator-XS with
> R-Car H3:
>
>     ohci-platform ee080000.usb: runtime PM trying to suspend device but active child
>     phy_rcar_gen3_usb2 ee080200.usb-phy: runtime PM trying to suspend device but active child
>     ohci-platform ee0c0000.usb: runtime PM trying to suspend device but active child
>     ohci-platform ee0a0000.usb: runtime PM trying to suspend device but active child
>     phy_rcar_gen3_usb2 ee0c0200.usb-phy: runtime PM trying to suspend device but active child
>     phy_rcar_gen3_usb2 ee0a0200.usb-phy: runtime PM trying to suspend device but active child
>
> so this was an existing issue with USB before.

Thank you for the report!
I knew about that, but since it didn't cause any trouble until now,
I postponed investigating the issue. I investigated it today, though.
I haven't found the root cause yet. However, it seems related to the USB host and/or USB core.
--> USB host related devices' child_count will be 1 at suspend time.
 --> I guess the remote wakeup feature is enabled? But I haven't found the point yet.

The renesas_usbhs also uses the phy_rcar_gen3_usb2 driver.
--> If I only used the renesas_usbhs driver (in other words, I didn't install the
    [eo]hci-{hcd,platform} drivers), the issue disappeared.
 --> So, I think the phy_rcar_gen3_usb2 driver doesn't cause this issue.
    (It could still be related, though.)

I'll continue to investigate this issue tomorrow.

Best regards,
Yoshihiro Shimoda
Rafael J. Wysocki Nov. 28, 2017, 2:17 p.m. UTC | #12
On Tue, Nov 28, 2017 at 11:58 AM, Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
> Hi Rafael, Shimoda-san,
>
> On Sun, Nov 12, 2017 at 1:27 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> The check for "active" children in __pm_runtime_set_status(), when
>> trying to set the parent device status to "suspended", doesn't
>> really make sense, because in fact it is not invalid to set the
>> status of a device with runtime PM disabled to "suspended" in any
>> case.  It is invalid to enable runtime PM for a device with its
>> status set to "suspended" while its child_count reference counter
>> is nonzero, but the check in __pm_runtime_set_status() doesn't
>> really cover that situation.
>>
>> For this reason, drop the children check from __pm_runtime_set_status()
>> and add a check against child_count reference counters of "suspended"
>> devices to pm_runtime_enable().
>>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> ---
>>  drivers/base/power/runtime.c |   30 ++++++++++--------------------
>>  1 file changed, 10 insertions(+), 20 deletions(-)
>>
>> Index: linux-pm/drivers/base/power/runtime.c
>> ===================================================================
>> --- linux-pm.orig/drivers/base/power/runtime.c
>> +++ linux-pm/drivers/base/power/runtime.c
>> @@ -1101,29 +1101,13 @@ int __pm_runtime_set_status(struct devic
>>                 goto out;
>>         }
>>
>> -       if (dev->power.runtime_status == status)
>> +       if (dev->power.runtime_status == status || !parent)
>>                 goto out_set;
>>
>>         if (status == RPM_SUSPENDED) {
>> -               /*
>> -                * It is invalid to suspend a device with an active child,
>> -                * unless it has been set to ignore its children.
>> -                */
>> -               if (!dev->power.ignore_children &&
>> -                       atomic_read(&dev->power.child_count)) {
>> -                       dev_err(dev, "runtime PM trying to suspend device but active child\n");
>
> JFTR, this triggered before during system resume on e.g. Salvator-XS with
> R-Car H3:
>
>     ohci-platform ee080000.usb: runtime PM trying to suspend device
> but active child
>     phy_rcar_gen3_usb2 ee080200.usb-phy: runtime PM trying to suspend
> device but active child
>     ohci-platform ee0c0000.usb: runtime PM trying to suspend device
> but active child
>     ohci-platform ee0a0000.usb: runtime PM trying to suspend device
> but active child
>     phy_rcar_gen3_usb2 ee0c0200.usb-phy: runtime PM trying to suspend
> device but active child
>     phy_rcar_gen3_usb2 ee0a0200.usb-phy: runtime PM trying to suspend
> device but active child
>
> so this was an existing issue with USB before.
>
>> -                       error = -EBUSY;
>> -                       goto out;
>> -               }
>> -
>> -               if (parent) {
>> -                       atomic_add_unless(&parent->power.child_count, -1, 0);
>> -                       notify_parent = !parent->power.ignore_children;
>> -               }
>> -               goto out_set;
>> -       }
>> -
>> -       if (parent) {
>> +               atomic_add_unless(&parent->power.child_count, -1, 0);
>> +               notify_parent = !parent->power.ignore_children;
>> +       } else {
>>                 spin_lock_nested(&parent->power.lock, SINGLE_DEPTH_NESTING);
>>
>>                 /*
>> @@ -1307,6 +1291,12 @@ void pm_runtime_enable(struct device *de
>>         else
>>                 dev_warn(dev, "Unbalanced %s!\n", __func__);
>>
>> +       WARN(dev->power.runtime_status == RPM_SUSPENDED &&
>> +            !dev->power.ignore_children &&
>> +            atomic_read(&dev->power.child_count) > 0,
>> +            "Enabling runtime PM for inactive device (%s) with active children\n",
>> +            dev_name(dev));
>
> And now it became a bit more noisy:

Well, it's all existing issues, although the WARN() doesn't provide
additional information in this particular case.

I'm considering changing it to print a message without a stack trace.

Thanks,
Rafael
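
For readers following the thread, the condition that the patch adds to pm_runtime_enable() can be modelled in plain userspace C. This is only a sketch: struct dev_model, enum rpm_status_model and both functions are hypothetical stand-ins, not kernel API, and the atomic child_count is simplified to an int. It prints a single line instead of calling WARN(), along the lines of what Rafael says he is considering above.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins for the kernel's runtime PM fields. */
enum rpm_status_model { MODEL_RPM_ACTIVE, MODEL_RPM_SUSPENDED };

struct dev_model {
	const char *name;
	enum rpm_status_model runtime_status;
	bool ignore_children;
	int child_count;	/* an atomic_t in the kernel */
};

/*
 * Same three-part condition as the WARN() added by the patch:
 * the device is "suspended", does not ignore its children, and
 * still has a nonzero child_count.
 */
static bool enable_is_suspicious(const struct dev_model *dev)
{
	return dev->runtime_status == MODEL_RPM_SUSPENDED &&
	       !dev->ignore_children &&
	       dev->child_count > 0;
}

/* Reports the problem with one line on stderr, without a stack trace. */
static bool model_pm_runtime_enable(const struct dev_model *dev)
{
	bool bad = enable_is_suspicious(dev);

	if (bad)
		fprintf(stderr,
			"Enabling runtime PM for inactive device (%s) with active children\n",
			dev->name);
	return bad;
}
```

In this model a device with ignore_children set, or one whose status is "active", never triggers the message, whatever its child_count is.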
Alan Stern Nov. 28, 2017, 3:06 p.m. UTC | #13
On Tue, 28 Nov 2017, Yoshihiro Shimoda wrote:

> Hi Geert-san,
> 
> > From: Geert Uytterhoeven, Sent: Tuesday, November 28, 2017 7:58 PM
> > 
> > Hi Rafael, Shimoda-san,
> > 
> > On Sun, Nov 12, 2017 at 1:27 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > The check for "active" children in __pm_runtime_set_status(), when
> > > trying to set the parent device status to "suspended", doesn't
> > > really make sense, because in fact it is not invalid to set the
> > > status of a device with runtime PM disabled to "suspended" in any
> > > case.  It is invalid to enable runtime PM for a device with its
> > > status set to "suspended" while its child_count reference counter
> > > is nonzero, but the check in __pm_runtime_set_status() doesn't
> > > really cover that situation.
> > >
> > > For this reason, drop the children check from __pm_runtime_set_status()
> > > and add a check against child_count reference counters of "suspended"
> > > devices to pm_runtime_enable().
> > >
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > ---
> > >  drivers/base/power/runtime.c |   30 ++++++++++--------------------
> > >  1 file changed, 10 insertions(+), 20 deletions(-)
> > >
> > > Index: linux-pm/drivers/base/power/runtime.c
> > > ===================================================================
> > > --- linux-pm.orig/drivers/base/power/runtime.c
> > > +++ linux-pm/drivers/base/power/runtime.c
> > > @@ -1101,29 +1101,13 @@ int __pm_runtime_set_status(struct devic
> > >                 goto out;
> > >         }
> > >
> > > -       if (dev->power.runtime_status == status)
> > > +       if (dev->power.runtime_status == status || !parent)
> > >                 goto out_set;
> > >
> > >         if (status == RPM_SUSPENDED) {
> > > -               /*
> > > -                * It is invalid to suspend a device with an active child,
> > > -                * unless it has been set to ignore its children.
> > > -                */
> > > -               if (!dev->power.ignore_children &&
> > > -                       atomic_read(&dev->power.child_count)) {
> > > -                       dev_err(dev, "runtime PM trying to suspend device but active child\n");
> > 
> > JFTR, this triggered before during system resume on e.g. Salvator-XS with
> > R-Car H3:
> > 
> >     ohci-platform ee080000.usb: runtime PM trying to suspend device
> > but active child
> >     phy_rcar_gen3_usb2 ee080200.usb-phy: runtime PM trying to suspend
> > device but active child
> >     ohci-platform ee0c0000.usb: runtime PM trying to suspend device
> > but active child
> >     ohci-platform ee0a0000.usb: runtime PM trying to suspend device
> > but active child
> >     phy_rcar_gen3_usb2 ee0c0200.usb-phy: runtime PM trying to suspend
> > device but active child
> >     phy_rcar_gen3_usb2 ee0a0200.usb-phy: runtime PM trying to suspend
> > device but active child
> > 
> > so this was an existing issue with USB before.
> 
> Thank you for the report!
> I know that, but since this didn't cause any trouble until now,
> I postponed to investigate the issue... But, I investigate it today.
> I don't find the root cause yet. However, it seems related to usb host and/or usb core.
> --> USB host related devices' child_count will be 1 in suspend timing.
>  --> I guess remote wakeup feature is enabled? But, I don't find the point yet.
> 
> The renesas_usbhs also uses the phy_rcar_gen3_usb2 driver.
> --> If I only used the renesas_usbhs driver (in other words, I don't install
>     [eo]hci-{hcd,platform} drivers), the issue disappeared.
>  --> So, I think the phy_rcar_gen3_usb2 driver doesn't cause this issue.
>     (But, it is possible to be related though.)
> 
> I'll continue to investigate this issue tomorrow.

Does the phy_rcar_gen3_usb2 driver use runtime PM?  It looks like the 
phy device somehow gets enabled for runtime PM when it shouldn't be.

(And by the way, what device is the child of ee0a0200.usb-phy?)

Alan Stern
Ulf Hansson Nov. 28, 2017, 5:22 p.m. UTC | #14
On 28 November 2017 at 13:48, Yoshihiro Shimoda
<yoshihiro.shimoda.uh@renesas.com> wrote:
> Hi Geert-san,
>
>> From: Geert Uytterhoeven, Sent: Tuesday, November 28, 2017 7:58 PM
>>
>> Hi Rafael, Shimoda-san,
>>
>> On Sun, Nov 12, 2017 at 1:27 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> >
>> > The check for "active" children in __pm_runtime_set_status(), when
>> > trying to set the parent device status to "suspended", doesn't
>> > really make sense, because in fact it is not invalid to set the
>> > status of a device with runtime PM disabled to "suspended" in any
>> > case.  It is invalid to enable runtime PM for a device with its
>> > status set to "suspended" while its child_count reference counter
>> > is nonzero, but the check in __pm_runtime_set_status() doesn't
>> > really cover that situation.
>> >
>> > For this reason, drop the children check from __pm_runtime_set_status()
>> > and add a check against child_count reference counters of "suspended"
>> > devices to pm_runtime_enable().
>> >
>> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> > ---
>> >  drivers/base/power/runtime.c |   30 ++++++++++--------------------
>> >  1 file changed, 10 insertions(+), 20 deletions(-)
>> >
>> > Index: linux-pm/drivers/base/power/runtime.c
>> > ===================================================================
>> > --- linux-pm.orig/drivers/base/power/runtime.c
>> > +++ linux-pm/drivers/base/power/runtime.c
>> > @@ -1101,29 +1101,13 @@ int __pm_runtime_set_status(struct devic
>> >                 goto out;
>> >         }
>> >
>> > -       if (dev->power.runtime_status == status)
>> > +       if (dev->power.runtime_status == status || !parent)
>> >                 goto out_set;
>> >
>> >         if (status == RPM_SUSPENDED) {
>> > -               /*
>> > -                * It is invalid to suspend a device with an active child,
>> > -                * unless it has been set to ignore its children.
>> > -                */
>> > -               if (!dev->power.ignore_children &&
>> > -                       atomic_read(&dev->power.child_count)) {
>> > -                       dev_err(dev, "runtime PM trying to suspend device but active child\n");
>>
>> JFTR, this triggered before during system resume on e.g. Salvator-XS with
>> R-Car H3:
>>
>>     ohci-platform ee080000.usb: runtime PM trying to suspend device
>> but active child
>>     phy_rcar_gen3_usb2 ee080200.usb-phy: runtime PM trying to suspend
>> device but active child
>>     ohci-platform ee0c0000.usb: runtime PM trying to suspend device
>> but active child
>>     ohci-platform ee0a0000.usb: runtime PM trying to suspend device
>> but active child
>>     phy_rcar_gen3_usb2 ee0c0200.usb-phy: runtime PM trying to suspend
>> device but active child
>>     phy_rcar_gen3_usb2 ee0a0200.usb-phy: runtime PM trying to suspend
>> device but active child
>>
>> so this was an existing issue with USB before.
>
> Thank you for the report!
> I know that, but since this didn't cause any trouble until now,
> I postponed to investigate the issue... But, I investigate it today.
> I don't find the root cause yet. However, it seems related to usb host and/or usb core.
> --> USB host related devices' child_count will be 1 in suspend timing.
>  --> I guess remote wakeup feature is enabled? But, I don't find the point yet.

I am guessing the issue is triggered by genpd in the suspend noirq
phase (genpd_suspend_noirq()). In there, there is a call to
pm_runtime_force_suspend(), which calls pm_runtime_set_suspended(),
and that is what printed the earlier error messages.

The reason genpd calls pm_runtime_force_suspend() is that, when
validating the wakeup configuration for the device with "if
(dev->power.wakeup_path && genpd_is_active_wakeup(genpd))", it
thinks wakeup isn't configured while it probably should be.

An additional note: genpd calls pm_runtime_force_suspend() only when
it has GENPD_FLAG_PM_CLK set, which causes genpd->dev_ops.stop|start()
to be assigned. Otherwise it doesn't.

Perhaps try out the series I recently posted improving the code
dealing with wakeups in genpd and the PM core:
https://www.spinics.net/lists/linux-renesas-soc/msg20122.html
For that, you need to set the new flag (introduced in the above series)
DPM_FLAG_IN_BAND_WAKEUP in the driver that configures wakeup of its
device.

Hope this helps!

>
> The renesas_usbhs also uses the phy_rcar_gen3_usb2 driver.
> --> If I only used the renesas_usbhs driver (in other words, I don't install
>     [eo]hci-{hcd,platform} drivers), the issue disappeared.
>  --> So, I think the phy_rcar_gen3_usb2 driver doesn't cause this issue.
>     (But, it is possible to be related though.)
>
> I'll continue to investigate this issue tomorrow.

Please keep me posted, I am interested in why the problem exists. :-)

Kind regards
Uffe
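
The decision Ulf describes for the suspend noirq phase can be summarised with a small userspace C model. It is a sketch based only on the description in this message: struct genpd_model, struct pm_dev_model, enum noirq_action and model_suspend_noirq() are hypothetical names, and the real genpd code does more than this.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the genpd and device state discussed above. */
struct genpd_model {
	bool active_wakeup;	/* models genpd_is_active_wakeup(genpd) */
	bool has_stop_start;	/* dev_ops.stop/start assigned (GENPD_FLAG_PM_CLK) */
};

struct pm_dev_model {
	bool wakeup_path;	/* models dev->power.wakeup_path */
};

enum noirq_action {
	NOIRQ_LEAVE_ACTIVE,	/* wakeup configured: device is left alone */
	NOIRQ_NO_FORCE_SUSPEND,	/* no stop/start ops: no forced suspend */
	NOIRQ_FORCE_SUSPEND	/* pm_runtime_force_suspend() would be called */
};

static enum noirq_action model_suspend_noirq(const struct pm_dev_model *dev,
					      const struct genpd_model *genpd)
{
	/* The wakeup check quoted in the message comes first. */
	if (dev->wakeup_path && genpd->active_wakeup)
		return NOIRQ_LEAVE_ACTIVE;

	/* Without stop/start callbacks genpd does not force a suspend. */
	if (!genpd->has_stop_start)
		return NOIRQ_NO_FORCE_SUSPEND;

	/* This is the path that led to the reported error messages. */
	return NOIRQ_FORCE_SUSPEND;
}
```

Under this model, the error messages can only appear on the NOIRQ_FORCE_SUSPEND path, which is why the thread focuses on whether wakeup_path is set for the USB devices.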
Yoshihiro Shimoda Nov. 29, 2017, 8:21 a.m. UTC | #15
Hi,

> From: Alan Stern, Sent: Wednesday, November 29, 2017 12:07 AM
> 
> On Tue, 28 Nov 2017, Yoshihiro Shimoda wrote:
> 
> > Hi Geert-san,
> >
> > > From: Geert Uytterhoeven, Sent: Tuesday, November 28, 2017 7:58 PM
> > >
> > > Hi Rafael, Shimoda-san,
> > >
> > > On Sun, Nov 12, 2017 at 1:27 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
<snip>
> > > JFTR, this triggered before during system resume on e.g. Salvator-XS with
> > > R-Car H3:
> > >
> > >     ohci-platform ee080000.usb: runtime PM trying to suspend device
> > > but active child
> > >     phy_rcar_gen3_usb2 ee080200.usb-phy: runtime PM trying to suspend
> > > device but active child
> > >     ohci-platform ee0c0000.usb: runtime PM trying to suspend device
> > > but active child
> > >     ohci-platform ee0a0000.usb: runtime PM trying to suspend device
> > > but active child
> > >     phy_rcar_gen3_usb2 ee0c0200.usb-phy: runtime PM trying to suspend
> > > device but active child
> > >     phy_rcar_gen3_usb2 ee0a0200.usb-phy: runtime PM trying to suspend
> > > device but active child
> > >
> > > so this was an existing issue with USB before.
> >
> > Thank you for the report!
> > I know that, but since this didn't cause any trouble until now,
> > I postponed to investigate the issue... But, I investigate it today.
> > I don't find the root cause yet. However, it seems related to usb host and/or usb core.
> > --> USB host related devices' child_count will be 1 in suspend timing.
> >  --> I guess remote wakeup feature is enabled? But, I don't find the point yet.
> >
> > The renesas_usbhs also uses the phy_rcar_gen3_usb2 driver.

I'm so sorry, but this is a mistake.
The renesas_usbhs doesn't use the phy_rcar_gen3_usb2 driver.
So,

> > --> If I only used the renesas_usbhs driver (in other words, I don't install
> >     [eo]hci-{hcd,platform} drivers), the issue disappeared.
> >  --> So, I think the phy_rcar_gen3_usb2 driver doesn't cause this issue.
> >     (But, it is possible to be related though.)

These are also mistakes.

> > I'll continue to investigate this issue tomorrow.
> 
> Does the phy_rcar_gen3_usb2 driver use runtime PM?

Yes, the phy_rcar_gen3_usb2 uses runtime PM.

>  It looks like the
> phy device somehow gets enabled for runtime PM when it shouldn't be.

I also think so now.
I haven't found out why yet, but the usage_count of a phy device was not 1 just before suspend.
(Here, "a phy device" means the child of the ee0a0200.usb-phy device.)

> (And by the way, what device is the child of ee0a0200.usb-phy?)

It's a phy device:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/phy/phy-core.c?h=v4.15-rc1#n773

Best regards,
Yoshihiro Shimoda

> Alan Stern
Yoshihiro Shimoda Nov. 29, 2017, 8:21 a.m. UTC | #16
Hi,

> From: Ulf Hansson, Sent: Wednesday, November 29, 2017 2:23 AM
>
> On 28 November 2017 at 13:48, Yoshihiro Shimoda
> <yoshihiro.shimoda.uh@renesas.com> wrote:
> > Hi Geert-san,
> >
> >> From: Geert Uytterhoeven, Sent: Tuesday, November 28, 2017 7:58 PM
> >>
> >> Hi Rafael, Shimoda-san,
> >>
> >> On Sun, Nov 12, 2017 at 1:27 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
<snip>
> >> JFTR, this triggered before during system resume on e.g. Salvator-XS with
> >> R-Car H3:
> >>
> >>     ohci-platform ee080000.usb: runtime PM trying to suspend device
> >> but active child
> >>     phy_rcar_gen3_usb2 ee080200.usb-phy: runtime PM trying to suspend
> >> device but active child
> >>     ohci-platform ee0c0000.usb: runtime PM trying to suspend device
> >> but active child
> >>     ohci-platform ee0a0000.usb: runtime PM trying to suspend device
> >> but active child
> >>     phy_rcar_gen3_usb2 ee0c0200.usb-phy: runtime PM trying to suspend
> >> device but active child
> >>     phy_rcar_gen3_usb2 ee0a0200.usb-phy: runtime PM trying to suspend
> >> device but active child
> >>
> >> so this was an existing issue with USB before.
> >
> > Thank you for the report!
> > I know that, but since this didn't cause any trouble until now,
> > I postponed to investigate the issue... But, I investigate it today.
> > I don't find the root cause yet. However, it seems related to usb host and/or usb core.
> > --> USB host related devices' child_count will be 1 in suspend timing.
> >  --> I guess remote wakeup feature is enabled? But, I don't find the point yet.
>
> I am guessing the issue is triggered by genpd in the suspend noirq
> phase (genpd_suspend_noirq()). In there,  there is a call to
> pm_runtime_force_suspend() (which calls pm_runtime_set_suspended() and
> which triggered the earlier error messages being printed).
>
> The reason why genpd calls pm_runtime_force_suspend(), is because when
> validating wakeup configurations for the device "if
> (dev->power.wakeup_path && genpd_is_active_wakeup(genpd))", it's
> thinks wakeup isn't configured while it probably should be.
>
> An additional note, only when genpd has the GENPD_FLAG_PM_CLK set,
> which makes the genpd->dev_ops.stop|start() being assigned, genpd
> calls pm_runtime_force_suspend() - else it doesn't.
>
> Perhaps try out the series I recently posted improving the code
> dealing with wakeups in genpd and the PM core:
> https://www.spinics.net/lists/linux-renesas-soc/msg20122.html
> To that, you need to set the new flag (invented in the above series)
> DPM_FLAG_IN_BAND_WAKEUP in the driver that configures wakeup of its
> device.
>
> Hope this helps!

Thank you for the comments!
I tried DPM_FLAG_IN_BAND_WAKEUP, but the issue still exists.
I added the flag in the [eo]hci-platform driver and usb/core/driver.c.
I also added the flag in the phy_rcar_gen3_usb2 driver except usb host drivers.

> > The renesas_usbhs also uses the phy_rcar_gen3_usb2 driver.
> > --> If I only used the renesas_usbhs driver (in other words, I don't install
> >     [eo]hci-{hcd,platform} drivers), the issue disappeared.
> >  --> So, I think the phy_rcar_gen3_usb2 driver doesn't cause this issue.
> >     (But, it is possible to be related though.)
> >
> > I'll continue to investigate this issue tomorrow.
>
> Please keep me posted, I am interested about the why the problem exists. :-)

Sure! :)

Best regards,
Yoshihiro Shimoda

> Kind regards
> Uffe
Ulf Hansson Nov. 29, 2017, 9:24 a.m. UTC | #17
On 29 November 2017 at 09:21, Yoshihiro Shimoda
<yoshihiro.shimoda.uh@renesas.com> wrote:
> Hi,
>
>> From: Ulf Hansson, Sent: Wednesday, November 29, 2017 2:23 AM
>>
>> On 28 November 2017 at 13:48, Yoshihiro Shimoda
>> <yoshihiro.shimoda.uh@renesas.com> wrote:
>> > Hi Geert-san,
>> >
>> >> From: Geert Uytterhoeven, Sent: Tuesday, November 28, 2017 7:58 PM
>> >>
>> >> Hi Rafael, Shimoda-san,
>> >>
>> >> On Sun, Nov 12, 2017 at 1:27 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> >> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> <snip>
>> >> JFTR, this triggered before during system resume on e.g. Salvator-XS with
>> >> R-Car H3:
>> >>
>> >>     ohci-platform ee080000.usb: runtime PM trying to suspend device
>> >> but active child
>> >>     phy_rcar_gen3_usb2 ee080200.usb-phy: runtime PM trying to suspend
>> >> device but active child
>> >>     ohci-platform ee0c0000.usb: runtime PM trying to suspend device
>> >> but active child
>> >>     ohci-platform ee0a0000.usb: runtime PM trying to suspend device
>> >> but active child
>> >>     phy_rcar_gen3_usb2 ee0c0200.usb-phy: runtime PM trying to suspend
>> >> device but active child
>> >>     phy_rcar_gen3_usb2 ee0a0200.usb-phy: runtime PM trying to suspend
>> >> device but active child
>> >>
>> >> so this was an existing issue with USB before.
>> >
>> > Thank you for the report!
>> > I know that, but since this didn't cause any trouble until now,
>> > I postponed to investigate the issue... But, I investigate it today.
>> > I don't find the root cause yet. However, it seems related to usb host and/or usb core.
>> > --> USB host related devices' child_count will be 1 in suspend timing.
>> >  --> I guess remote wakeup feature is enabled? But, I don't find the point yet.
>>
>> I am guessing the issue is triggered by genpd in the suspend noirq
>> phase (genpd_suspend_noirq()). In there,  there is a call to
>> pm_runtime_force_suspend() (which calls pm_runtime_set_suspended() and
>> which triggered the earlier error messages being printed).
>>
>> The reason why genpd calls pm_runtime_force_suspend(), is because when
>> validating wakeup configurations for the device "if
>> (dev->power.wakeup_path && genpd_is_active_wakeup(genpd))", it's
>> thinks wakeup isn't configured while it probably should be.
>>
>> An additional note, only when genpd has the GENPD_FLAG_PM_CLK set,
>> which makes the genpd->dev_ops.stop|start() being assigned, genpd
>> calls pm_runtime_force_suspend() - else it doesn't.
>>
>> Perhaps try out the series I recently posted improving the code
>> dealing with wakeups in genpd and the PM core:
>> https://www.spinics.net/lists/linux-renesas-soc/msg20122.html
>> To that, you need to set the new flag (invented in the above series)
>> DPM_FLAG_IN_BAND_WAKEUP in the driver that configures wakeup of its
>> device.
>>
>> Hope this helps!
>
> Thank you for the comments!
> I tried DPM_FLAG_IN_BAND_WAKEUP, but the issue still exists.
> I added the flag in the [eo]hci-platform driver and usb/core/driver.c.
> I also added the flag in the phy_rcar_gen3_usb2 driver except usb host drivers.

First, did you confirm that genpd was used? If so, for what device?

Second, did you check whether the call to pm_runtime_force_suspend()
made by genpd is the reason for the error messages?

Third, it should be sufficient to add DPM_FLAG_IN_BAND_WAKEUP for the
driver that is actually dealing with the wakeup. However, does this
driver's system ->suspend() callback check device_may_wakeup() before
it decides to enable wakeup?
If not, the PM core and genpd don't notice that wakeup is enabled for
the device.

>
>> > The renesas_usbhs also uses the phy_rcar_gen3_usb2 driver.
>> > --> If I only used the renesas_usbhs driver (in other words, I don't install
>> >     [eo]hci-{hcd,platform} drivers), the issue disappeared.
>> >  --> So, I think the phy_rcar_gen3_usb2 driver doesn't cause this issue.
>> >     (But, it is possible to be related though.)
>> >
>> > I'll continue to investigate this issue tomorrow.
>>
>> Please keep me posted, I am interested about the why the problem exists. :-)
>
> Sure! :)

Great, thanks.

Kind regards
Uffe
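
The third question above is about a common driver pattern: a system suspend callback that arms wakeup only when device_may_wakeup() allows it. A minimal userspace C model of that pattern follows. All names here (struct wake_dev_model, model_device_may_wakeup(), model_driver_suspend()) are hypothetical illustrations, and the "capable and enabled" rule is an assumption about what device_may_wakeup() reports, not a copy of kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the wakeup-related state of a device. */
struct wake_dev_model {
	bool can_wakeup;	/* hardware is able to wake the system */
	bool wakeup_enabled;	/* policy allows this device to wake the system */
	bool wakeup_armed;	/* set by the driver's suspend callback */
};

/* Models device_may_wakeup(): capable and allowed (an assumption). */
static bool model_device_may_wakeup(const struct wake_dev_model *dev)
{
	return dev->can_wakeup && dev->wakeup_enabled;
}

/*
 * Models a driver's system ->suspend() callback that consults the
 * may-wakeup state instead of arming wakeup unconditionally.
 */
static int model_driver_suspend(struct wake_dev_model *dev)
{
	dev->wakeup_armed = model_device_may_wakeup(dev);
	return 0;
}
```

A driver that armed wakeup without this check would, per Ulf's point, be doing so behind the back of the PM core and genpd.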
Geert Uytterhoeven Nov. 29, 2017, 9:43 a.m. UTC | #18
Hi Ulf,

On Wed, Nov 29, 2017 at 10:24 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On 29 November 2017 at 09:21, Yoshihiro Shimoda
> <yoshihiro.shimoda.uh@renesas.com> wrote:
>>> From: Ulf Hansson, Sent: Wednesday, November 29, 2017 2:23 AM
>>> On 28 November 2017 at 13:48, Yoshihiro Shimoda
>>> <yoshihiro.shimoda.uh@renesas.com> wrote:
>>> >> From: Geert Uytterhoeven, Sent: Tuesday, November 28, 2017 7:58 PM
>>> >> On Sun, Nov 12, 2017 at 1:27 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>> >> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> <snip>
>>> >> JFTR, this triggered before during system resume on e.g. Salvator-XS with
>>> >> R-Car H3:
>>> >>
>>> >>     ohci-platform ee080000.usb: runtime PM trying to suspend device
>>> >> but active child
>>> >>     phy_rcar_gen3_usb2 ee080200.usb-phy: runtime PM trying to suspend
>>> >> device but active child
>>> >>     ohci-platform ee0c0000.usb: runtime PM trying to suspend device
>>> >> but active child
>>> >>     ohci-platform ee0a0000.usb: runtime PM trying to suspend device
>>> >> but active child
>>> >>     phy_rcar_gen3_usb2 ee0c0200.usb-phy: runtime PM trying to suspend
>>> >> device but active child
>>> >>     phy_rcar_gen3_usb2 ee0a0200.usb-phy: runtime PM trying to suspend
>>> >> device but active child
>>> >>
>>> >> so this was an existing issue with USB before.
>>> >
>>> > Thank you for the report!
>>> > I know that, but since this didn't cause any trouble until now,
>>> > I postponed to investigate the issue... But, I investigate it today.
>>> > I don't find the root cause yet. However, it seems related to usb host and/or usb core.
>>> > --> USB host related devices' child_count will be 1 in suspend timing.
>>> >  --> I guess remote wakeup feature is enabled? But, I don't find the point yet.
>>>
>>> I am guessing the issue is triggered by genpd in the suspend noirq
>>> phase (genpd_suspend_noirq()). In there,  there is a call to
>>> pm_runtime_force_suspend() (which calls pm_runtime_set_suspended() and
>>> which triggered the earlier error messages being printed).
>>>
>>> The reason why genpd calls pm_runtime_force_suspend(), is because when
>>> validating wakeup configurations for the device "if
>>> (dev->power.wakeup_path && genpd_is_active_wakeup(genpd))", it's
>>> thinks wakeup isn't configured while it probably should be.
>>>
>>> An additional note, only when genpd has the GENPD_FLAG_PM_CLK set,
>>> which makes the genpd->dev_ops.stop|start() being assigned, genpd
>>> calls pm_runtime_force_suspend() - else it doesn't.
>>>
>>> Perhaps try out the series I recently posted improving the code
>>> dealing with wakeups in genpd and the PM core:
>>> https://www.spinics.net/lists/linux-renesas-soc/msg20122.html
>>> To that, you need to set the new flag (invented in the above series)
>>> DPM_FLAG_IN_BAND_WAKEUP in the driver that configures wakeup of its
>>> device.
>>>
>>> Hope this helps!
>>
>> Thank you for the comments!
>> I tried DPM_FLAG_IN_BAND_WAKEUP, but the issue still exists.
>> I added the flag in the [eo]hci-platform driver and usb/core/driver.c.
>> I also added the flag in the phy_rcar_gen3_usb2 driver except usb host drivers.
>
> First, did you confirm that genpd was used? Then for what device?

All 6 devices are part of the SYSC PM Domain.

> Second, did you check the call to pm_runtime_force_suspend() called by
> genpd, is the reason to the error messages?
>
> Third, it should be sufficient to add DPM_FLAG_IN_BAND_WAKEUP for the
> driver that is actually dealing with the wakeup. Although, does this
> driver's system ->suspend() callback check device_may_wakeup(), before
> it decides to enable wakeup?
> If not, the PM core and genpd don't notice that wakeup is enabled for
> the device.

Actually I saw this with my patches setting GENPD_FLAG_ACTIVE_WAKEUP
for the SYSC PM Domain, which should trigger the same behavior.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Ulf Hansson Nov. 29, 2017, 9:59 a.m. UTC | #19
On 29 November 2017 at 10:43, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> Hi Ulf,
>
> On Wed, Nov 29, 2017 at 10:24 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>> On 29 November 2017 at 09:21, Yoshihiro Shimoda
>> <yoshihiro.shimoda.uh@renesas.com> wrote:
>>>> From: Ulf Hansson, Sent: Wednesday, November 29, 2017 2:23 AM
>>>> On 28 November 2017 at 13:48, Yoshihiro Shimoda
>>>> <yoshihiro.shimoda.uh@renesas.com> wrote:
>>>> >> From: Geert Uytterhoeven, Sent: Tuesday, November 28, 2017 7:58 PM
>>>> >> On Sun, Nov 12, 2017 at 1:27 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>>> >> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>> <snip>
>>>> >> JFTR, this triggered before during system resume on e.g. Salvator-XS with
>>>> >> R-Car H3:
>>>> >>
>>>> >>     ohci-platform ee080000.usb: runtime PM trying to suspend device
>>>> >> but active child
>>>> >>     phy_rcar_gen3_usb2 ee080200.usb-phy: runtime PM trying to suspend
>>>> >> device but active child
>>>> >>     ohci-platform ee0c0000.usb: runtime PM trying to suspend device
>>>> >> but active child
>>>> >>     ohci-platform ee0a0000.usb: runtime PM trying to suspend device
>>>> >> but active child
>>>> >>     phy_rcar_gen3_usb2 ee0c0200.usb-phy: runtime PM trying to suspend
>>>> >> device but active child
>>>> >>     phy_rcar_gen3_usb2 ee0a0200.usb-phy: runtime PM trying to suspend
>>>> >> device but active child
>>>> >>
>>>> >> so this was an existing issue with USB before.
>>>> >
>>>> > Thank you for the report!
>>>> > I know that, but since this didn't cause any trouble until now,
>>>> > I postponed to investigate the issue... But, I investigate it today.
>>>> > I don't find the root cause yet. However, it seems related to usb host and/or usb core.
>>>> > --> USB host related devices' child_count will be 1 in suspend timing.
>>>> >  --> I guess remote wakeup feature is enabled? But, I don't find the point yet.
>>>>
>>>> I am guessing the issue is triggered by genpd in the suspend noirq
>>>> phase (genpd_suspend_noirq()). In there,  there is a call to
>>>> pm_runtime_force_suspend() (which calls pm_runtime_set_suspended() and
>>>> which triggered the earlier error messages being printed).
>>>>
>>>> The reason why genpd calls pm_runtime_force_suspend(), is because when
>>>> validating wakeup configurations for the device "if
>>>> (dev->power.wakeup_path && genpd_is_active_wakeup(genpd))", it's
>>>> thinks wakeup isn't configured while it probably should be.
>>>>
>>>> An additional note, only when genpd has the GENPD_FLAG_PM_CLK set,
>>>> which makes the genpd->dev_ops.stop|start() being assigned, genpd
>>>> calls pm_runtime_force_suspend() - else it doesn't.
>>>>
>>>> Perhaps try out the series I recently posted improving the code
>>>> dealing with wakeups in genpd and the PM core:
>>>> https://www.spinics.net/lists/linux-renesas-soc/msg20122.html
>>>> To that, you need to set the new flag (invented in the above series)
>>>> DPM_FLAG_IN_BAND_WAKEUP in the driver that configures wakeup of its
>>>> device.
>>>>
>>>> Hope this helps!
>>>
>>> Thank you for the comments!
>>> I tried DPM_FLAG_IN_BAND_WAKEUP, but the issue still exists.
>>> I added the flag in the [eo]hci-platform driver and usb/core/driver.c.
>>> I also added the flag in the phy_rcar_gen3_usb2 driver except usb host drivers.
>>
>> First, did you confirm that genpd was used? Then for what device?
>
> All 6 devices are part of the SYSC PM Domain.

Okay!

Can you perhaps clarify which 6 devices/drivers are involved, and
perhaps also point out their child devices?

>
>> Second, did you check the call to pm_runtime_force_suspend() called by
>> genpd, is the reason to the error messages?
>>
>> Third, it should be sufficient to add DPM_FLAG_IN_BAND_WAKEUP for the
>> driver that is actually dealing with the wakeup. Although, does this
>> driver's system ->suspend() callback check device_may_wakeup(), before
>> it decides to enable wakeup?
>> If not, the PM core and genpd don't notice that wakeup is enabled for
>> the device.
>
> Actually I saw this with my patches setting GENPD_FLAG_ACTIVE_WAKEUP
> for the SYSC PM Domain, which should trigger the same behavior.

Okay, so the problem remains no matter which solution for wakeup you
pick in genpd.

Then this seems to indicate that the driver may be misbehaving in some
way. I can help check what is going on.

Kind regards
Uffe
Geert Uytterhoeven Nov. 29, 2017, 2:09 p.m. UTC | #20
Hi Ulf,

On Wed, Nov 29, 2017 at 10:59 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On 29 November 2017 at 10:43, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>> On Wed, Nov 29, 2017 at 10:24 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>>> On 29 November 2017 at 09:21, Yoshihiro Shimoda
>>> <yoshihiro.shimoda.uh@renesas.com> wrote:
>>>>> From: Ulf Hansson, Sent: Wednesday, November 29, 2017 2:23 AM
>>>>> On 28 November 2017 at 13:48, Yoshihiro Shimoda
>>>>> <yoshihiro.shimoda.uh@renesas.com> wrote:
>>>>> >> From: Geert Uytterhoeven, Sent: Tuesday, November 28, 2017 7:58 PM
>>>>> >> On Sun, Nov 12, 2017 at 1:27 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>>>> >> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>> <snip>
>>>>> >> JFTR, this triggered before during system resume on e.g. Salvator-XS with
>>>>> >> R-Car H3:
>>>>> >>
>>>>> >>     ohci-platform ee080000.usb: runtime PM trying to suspend device
>>>>> >> but active child
>>>>> >>     phy_rcar_gen3_usb2 ee080200.usb-phy: runtime PM trying to suspend
>>>>> >> device but active child
>>>>> >>     ohci-platform ee0c0000.usb: runtime PM trying to suspend device
>>>>> >> but active child
>>>>> >>     ohci-platform ee0a0000.usb: runtime PM trying to suspend device
>>>>> >> but active child
>>>>> >>     phy_rcar_gen3_usb2 ee0c0200.usb-phy: runtime PM trying to suspend
>>>>> >> device but active child
>>>>> >>     phy_rcar_gen3_usb2 ee0a0200.usb-phy: runtime PM trying to suspend
>>>>> >> device but active child
>>>>> >>
>>>>> >> so this was an existing issue with USB before.
>>>>> >
>>>>> > Thank you for the report!
>>>>> > I know that, but since this didn't cause any trouble until now,
>>>>> > I postponed to investigate the issue... But, I investigate it today.
>>>>> > I don't find the root cause yet. However, it seems related to usb host and/or usb core.
>>>>> > --> USB host related devices' child_count will be 1 in suspend timing.
>>>>> >  --> I guess remote wakeup feature is enabled? But, I don't find the point yet.
>>>>>
>>>>> I am guessing the issue is triggered by genpd in the suspend noirq
>>>>> phase (genpd_suspend_noirq()). In there,  there is a call to
>>>>> pm_runtime_force_suspend() (which calls pm_runtime_set_suspended() and
>>>>> which triggered the earlier error messages being printed).
>>>>>
>>>>> The reason why genpd calls pm_runtime_force_suspend(), is because when
>>>>> validating wakeup configurations for the device "if
>>>>> (dev->power.wakeup_path && genpd_is_active_wakeup(genpd))", it's
>>>>> thinks wakeup isn't configured while it probably should be.
>>>>>
>>>>> An additional note, only when genpd has the GENPD_FLAG_PM_CLK set,
>>>>> which makes the genpd->dev_ops.stop|start() being assigned, genpd
>>>>> calls pm_runtime_force_suspend() - else it doesn't.
>>>>>
>>>>> Perhaps try out the series I recently posted improving the code
>>>>> dealing with wakeups in genpd and the PM core:
>>>>> https://www.spinics.net/lists/linux-renesas-soc/msg20122.html
>>>>> To that, you need to set the new flag (invented in the above series)
>>>>> DPM_FLAG_IN_BAND_WAKEUP in the driver that configures wakeup of its
>>>>> device.
>>>>>
>>>>> Hope this helps!
>>>>
>>>> Thank you for the comments!
>>>> I tried DPM_FLAG_IN_BAND_WAKEUP, but the issue still exists.
>>>> I added the flag in the [eo]hci-platform driver and usb/core/driver.c.
>>>> I also added the flag in the phy_rcar_gen3_usb2 driver except usb host drivers.
>>>
>>> First, did you confirm that genpd was used? Then for what device?
>>
>> All 6 devices are part of the SYSC PM Domain.
>
> Okay!
>
> Can you perhaps clarify which 6 devices/drivers are involved, and
> perhaps also point out their child devices?

/sys/devices/platform/soc/ee080000.usb
/sys/devices/platform/soc/ee0c0000.usb
/sys/devices/platform/soc/ee0a0000.usb

Driver: ohci-platform

The children are usb6/6-0:1.0, usb3/3-0:1.0, and usb4/4-0:1.0, respectively,
all using the usb "hub" driver.

/sys/devices/platform/soc/ee080200.usb-phy
/sys/devices/platform/soc/ee0a0200.usb-phy
/sys/devices/platform/soc/ee0c0200.usb-phy

Driver: phy_rcar_gen3_usb2

The children are:

phy/phy-ee080200.usb-phy.2
phy/phy-ee0a0200.usb-phy.0
phy/phy-ee0c0200.usb-phy.1

all without a driver, according to sysfs.

Note that at first I had missed them, as printing the children using
device_for_each_child() does not print them, unlike the hub devices that
are children of the usb hosts.

With some debug code added, logging inc/dec of child_count:

USB driver init:

 ehci-pci: EHCI PCI platform driver
 ehci-platform: EHCI generic platform driver
+phy_rcar_gen3_usb2 ee0a0200.usb-phy: rpm_resume:830: inc child_count of parent soc
+phy phy-ee0a0200.usb-phy.0: rpm_resume:830: inc child_count of parent ee0a0200.usb-phy
+phy phy-ee0a0200.usb-phy.0: rpm_suspend:606: dec child_count of parent ee0a0200.usb-phy
+phy phy-ee0a0200.usb-phy.0: rpm_resume:759: inc child_count of parent ee0a0200.usb-phy

+phy_rcar_gen3_usb2 ee0c0200.usb-phy: rpm_resume:830: inc child_count of parent soc
+phy phy-ee0c0200.usb-phy.1: rpm_resume:830: inc child_count of parent ee0c0200.usb-phy
+phy phy-ee0c0200.usb-phy.1: rpm_suspend:606: dec child_count of parent ee0c0200.usb-phy
+phy phy-ee0c0200.usb-phy.1: rpm_resume:759: inc child_count of parent ee0c0200.usb-phy

+phy phy-ee080200.usb-phy.2: rpm_resume:830: inc child_count of parent ee080200.usb-phy
+phy phy-ee080200.usb-phy.2: rpm_suspend:606: dec child_count of parent ee080200.usb-phy
+phy phy-ee080200.usb-phy.2: rpm_resume:759: inc child_count of parent ee080200.usb-phy

Somehow the phy class phy-ee0*0200.usb-phy.* devices have the platform
ee0*0200.usb-phy devices as parent, but they're not part of the list of
children? Looks like a bug in the USB PHY driver or subsystem.


USB hub instantiation:

+usb usb3: __pm_runtime_set_status:1131: inc child_count of parent ee0a0000.usb
+usb usb3: rpm_suspend:606: dec child_count of parent ee0a0000.usb

+usb usb4: __pm_runtime_set_status:1131: inc child_count of parent ee0c0000.usb
+usb usb4: rpm_suspend:606: dec child_count of parent ee0c0000.usb

+usb usb6: __pm_runtime_set_status:1131: inc child_count of parent ee080000.usb
+usb usb6: rpm_suspend:606: dec child_count of parent ee080000.usb


System suspend:

+usb usb6: rpm_resume:830: inc child_count of parent ee080000.usb
+usb usb3: rpm_resume:830: inc child_count of parent ee0a0000.usb
+usb usb4: rpm_resume:830: inc child_count of parent ee0c0000.usb


System resume:

 Enabling runtime PM for inactive device (ee0c0200.usb-phy) with active children
 Enabling runtime PM for inactive device (ee0c0200.usb-phy) with active children
 Enabling runtime PM for inactive device (ee0a0000.usb) with active children
 Enabling runtime PM for inactive device (ee0c0000.usb) with active children
 Enabling runtime PM for inactive device (ee080200.usb-phy) with active children
 Enabling runtime PM for inactive device (ee080000.usb) with active children

+usb usb4: rpm_suspend:606: dec child_count of parent ee0c0000.usb
+usb usb6: rpm_suspend:606: dec child_count of parent ee080000.usb
+usb usb3: rpm_suspend:606: dec child_count of parent ee0a0000.usb
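
As a rough cross-check of the log above, the inc/dec events for one parent can be tallied in a small userspace sketch. This is not kernel code: the struct and function names are hypothetical, and it assumes the counter started at zero and that the log shows every event for that parent.

```c
#include <string.h>

/* Hypothetical record for one "inc/dec child_count" line of the log. */
struct cc_event {
	const char *parent;	/* parent device named in the log line */
	int delta;		/* +1 for "inc child_count", -1 for "dec" */
};

/* Sum the deltas logged for one parent over the first n events. */
static int child_count_after(const struct cc_event *ev, size_t n,
			     const char *parent)
{
	int count = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(ev[i].parent, parent) == 0)
			count += ev[i].delta;
	return count;
}

/* Events logged for the ee0c0000.usb parent, in order of appearance. */
static const struct cc_event usb4_log[] = {
	{ "ee0c0000.usb", +1 },	/* hub instantiation: __pm_runtime_set_status:1131 */
	{ "ee0c0000.usb", -1 },	/* hub instantiation: rpm_suspend:606 */
	{ "ee0c0000.usb", +1 },	/* system suspend: rpm_resume:830 */
	{ "ee0c0000.usb", -1 },	/* system resume: rpm_suspend:606 */
};
```

Under those assumptions, the tally for ee0c0000.usb is 1 after the first three events, i.e. when entering system resume, and returns to 0 only after the final rpm_suspend line. That is consistent with the warnings being logged before the dec lines.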

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Yoshihiro Shimoda Nov. 30, 2017, 12:51 p.m. UTC | #21
Hi,

> From: Ulf Hansson, Sent: Wednesday, November 29, 2017 6:59 PM
>
> On 29 November 2017 at 10:43, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > Hi Ulf,
<snip>
> Okay, so the problem remains no matter which solution for wakeup you
> pick in genpd.

Yes. Today I was able to reproduce this issue without the usb host driver.
- The renesas_usb3 usb peripheral driver has generic phy handling.
  (The peripheral driver uses a different generic phy driver (phy-rcar-gen3-usb3.c), though.)
 --> If I used the current renesas_usb3 (which doesn't call phy_power_{on,off}()),
     the issue didn't happen.
 --> If I added calls to phy_power_{on,off}(), the issue happened.
  --> So, I think these APIs are related to the issue.

- The generic phy APIs are in drivers/phy/phy-core.c.
 --> The phy-rcar-gen3-usb[23] drivers only call pm_runtime_enable() before devm_phy_create().
  --> The phy-core calls pm_runtime_{get_sync,put}() in phy_{init,exit,power_{on,off}}.
   --> So, IIUC, both the phy-<dev_name>.<id> and <dev_name> devices are handled by the runtime PM APIs.
 --> The runtime PM implementation of phy-core looks good to me. But...?
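
The parent/child interaction described above can be illustrated with a toy userspace model. This is only a sketch under simplifying assumptions (no locking, no callbacks, no autosuspend, no disable_depth); every name prefixed with model_ is hypothetical and none of this is the actual runtime PM or phy-core code.

```c
#include <stddef.h>

enum model_status { MODEL_SUSPENDED, MODEL_ACTIVE };

/* Toy stand-in for the runtime PM fields of a device. */
struct model_dev {
	struct model_dev *parent;
	enum model_status status;
	int usage_count;	/* references held via get_sync()/put() */
	int child_count;	/* number of active children */
};

/* Resume a device: the parent is made active first, then accounts the child. */
static void model_resume(struct model_dev *dev)
{
	if (dev->status == MODEL_ACTIVE)
		return;
	if (dev->parent)
		model_resume(dev->parent);
	dev->status = MODEL_ACTIVE;
	if (dev->parent)
		dev->parent->child_count++;
}

/* Suspend a device only if nothing uses it and no child is active. */
static void model_try_suspend(struct model_dev *dev)
{
	if (dev->status == MODEL_SUSPENDED ||
	    dev->usage_count > 0 || dev->child_count > 0)
		return;
	dev->status = MODEL_SUSPENDED;
	if (dev->parent) {
		dev->parent->child_count--;
		model_try_suspend(dev->parent);
	}
}

/* Rough analogue of pm_runtime_get_sync(): take a reference and resume. */
static void model_get_sync(struct model_dev *dev)
{
	dev->usage_count++;
	model_resume(dev);
}

/* Rough analogue of pm_runtime_put(): drop a reference, suspend if idle. */
static void model_put(struct model_dev *dev)
{
	if (--dev->usage_count == 0)
		model_try_suspend(dev);
}
```

In this model, a get_sync on the phy-<dev_name>.<id> device also activates <dev_name> and raises its child_count to 1, and the matching put brings both back down. The point is only that balanced get/put calls on the child leave the parent's counter at zero.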

> Then this seems to indicate that the driver may be misbehaving in some
> way. I can help check what is going on.

I guess so, but I haven't found it yet...

Best regards,
Yoshihiro Shimoda

> Kind regards
> Uffe

Patch

Index: linux-pm/drivers/base/power/runtime.c
===================================================================
--- linux-pm.orig/drivers/base/power/runtime.c
+++ linux-pm/drivers/base/power/runtime.c
@@ -1101,29 +1101,13 @@  int __pm_runtime_set_status(struct devic
 		goto out;
 	}
 
-	if (dev->power.runtime_status == status)
+	if (dev->power.runtime_status == status || !parent)
 		goto out_set;
 
 	if (status == RPM_SUSPENDED) {
-		/*
-		 * It is invalid to suspend a device with an active child,
-		 * unless it has been set to ignore its children.
-		 */
-		if (!dev->power.ignore_children &&
-			atomic_read(&dev->power.child_count)) {
-			dev_err(dev, "runtime PM trying to suspend device but active child\n");
-			error = -EBUSY;
-			goto out;
-		}
-
-		if (parent) {
-			atomic_add_unless(&parent->power.child_count, -1, 0);
-			notify_parent = !parent->power.ignore_children;
-		}
-		goto out_set;
-	}
-
-	if (parent) {
+		atomic_add_unless(&parent->power.child_count, -1, 0);
+		notify_parent = !parent->power.ignore_children;
+	} else {
 		spin_lock_nested(&parent->power.lock, SINGLE_DEPTH_NESTING);
 
 		/*
@@ -1307,6 +1291,12 @@  void pm_runtime_enable(struct device *de
 	else
 		dev_warn(dev, "Unbalanced %s!\n", __func__);
 
+	WARN(dev->power.runtime_status == RPM_SUSPENDED &&
+	     !dev->power.ignore_children &&
+	     atomic_read(&dev->power.child_count) > 0,
+	     "Enabling runtime PM for inactive device (%s) with active children\n",
+	     dev_name(dev));
+
 	spin_unlock_irqrestore(&dev->power.lock, flags);
 }
 EXPORT_SYMBOL_GPL(pm_runtime_enable);
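
For readers who want to reason about the new check outside the kernel, the condition passed to WARN() in the second hunk can be restated as a plain predicate. This is a userspace sketch only: the model_ types and enable_would_warn() are hypothetical stand-ins for dev->power and are not part of the patch.

```c
#include <stdbool.h>

enum model_rpm_status { MODEL_RPM_ACTIVE, MODEL_RPM_SUSPENDED };

/* Hypothetical stand-in for the three dev->power fields the check reads. */
struct model_pm_info {
	enum model_rpm_status runtime_status;
	bool ignore_children;
	int child_count;
};

/*
 * Mirrors the WARN() condition added to pm_runtime_enable(): the device
 * is marked suspended, does not ignore its children, and still has a
 * nonzero child_count.
 */
static bool enable_would_warn(const struct model_pm_info *p)
{
	return p->runtime_status == MODEL_RPM_SUSPENDED &&
	       !p->ignore_children &&
	       p->child_count > 0;
}
```

All three conditions must hold for the warning to fire, so a suspended device with ignore_children set, or an active device with active children, stays silent.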