diff mbox series

PCI/PM: Target PM state is D3cold if the upstream bridge is power manageable

Message ID 20210510102647.40322-1-mika.westerberg@linux.intel.com (mailing list archive)
State Superseded
Delegated to: Bjorn Helgaas
Headers show
Series PCI/PM: Target PM state is D3cold if the upstream bridge is power manageable | expand

Commit Message

Mika Westerberg May 10, 2021, 10:26 a.m. UTC
ASMedia xHCI controller only supports PME from D3cold:

11:00.0 USB controller: ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller (prog-if 30 [XHCI])
  ...
  Capabilities: [78] Power Management version 3
  	  Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
	  Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-

Now, if the controller is part of a Thunderbolt device for instance, it
is connected to a PCIe switch downstream port. When the hierarchy then
enters D3cold as a result of s2idle cycle pci_target_state() returns D0
because the device does not support PME from the default target_state
(D3hot). So what happens is that the whole hierarchy is left into D0
breaking power management.

For this reason choose target_state to be D3cold if there is a upstream
bridge that is power manageable. The reasoning here is that the upstream
bridge will be also placed into D3 making the effective power state of
the device in question to be D3cold.

Reported-by: Utkarsh H Patel <utkarsh.h.patel@intel.com>
Reported-by: Koba Ko <koba.ko@canonical.com>
Tested-by: Koba Ko <koba.ko@canonical.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
 drivers/pci/pci.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

Comments

Rajat Jain May 17, 2021, 10:33 p.m. UTC | #1
[+Kai]

Hi,

I don't understand the power management very well, so pardon my
ignorance but I have a question.

On Mon, May 10, 2021 at 3:30 AM Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
>
> ASMedia xHCI controller only supports PME from D3cold:
>
> 11:00.0 USB controller: ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller (prog-if 30 [XHCI])
>   ...
>   Capabilities: [78] Power Management version 3
>           Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
>           Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>
> Now, if the controller is part of a Thunderbolt device for instance, it
> is connected to a PCIe switch downstream port. When the hierarchy then
> enters D3cold as a result of s2idle cycle pci_target_state() returns D0
> because the device does not support PME from the default target_state
> (D3hot). So what happens is that the whole hierarchy is left into D0
> breaking power management.at suspend time or resume time

Can you please provide a small call stack, when this issue is seen?
(I'm primarily trying to understand whether the issue is breaking
suspend, or the suspend is fine, but resume is broken?)

>
> For this reason choose target_state to be D3cold if there is a upstream
> bridge that is power manageable.

It seems to me that the goal of pci_target_state() is to find the
lowest power state that a device can be put into, from which device
can still generate PME (if needed). So I'm curious why it starts with
target_state = PCI_D3hot in the first place? Wouldn't starting with
PCI_D3cold will always be better (regardless of parent bridge
capabilities)?

And then I came across the commit 8feaec33b986 ("PCI / PM: Always
check PME wakeup capability for runtime wakeup support"), which
addresses the same device that this patch addresses, and 1 excerpt
from the commit log that stood out:
============================================================
    In addition, change wakeup flag passed to pci_target_state() from false
    to true, because we want to find the deepest state *different from D3cold*
    that the device can still generate PME#. In this case, it's D0 for the
    device in question.
============================================================

So, does returning D3Cold from this function break any other
assumption somewhere?

Thanks,
Rajat


> The reasoning here is that the upstream
> bridge will be also placed into D3 making the effective power state of
> the device in question to be D3cold.
>
> Reported-by: Utkarsh H Patel <utkarsh.h.patel@intel.com>
> Reported-by: Koba Ko <koba.ko@canonical.com>
> Tested-by: Koba Ko <koba.ko@canonical.com>
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> ---
>  drivers/pci/pci.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index b717680377a9..e3f3b9010762 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2578,8 +2578,19 @@ static pci_power_t pci_target_state(struct pci_dev *dev, bool wakeup)
>                 return target_state;
>         }
>
> -       if (!dev->pm_cap)
> +       if (!dev->pm_cap) {
>                 target_state = PCI_D0;
> +       } else {
> +               struct pci_dev *bridge;
> +
> +               /*
> +                * If the upstream bridge can be put to D3 then it means
> +                * that our target state is D3cold instead of D3hot.
> +                */
> +               bridge = pci_upstream_bridge(dev);
> +               if (bridge && pci_bridge_d3_possible(bridge))
> +                       target_state = PCI_D3cold;
> +       }
>
>         /*
>          * If the device is in D3cold even though it's not power-manageable by
> --
> 2.30.2
>
Mika Westerberg May 18, 2021, 10:44 a.m. UTC | #2
Hi,

On Mon, May 17, 2021 at 03:33:56PM -0700, Rajat Jain wrote:
> [+Kai]
> 
> Hi,
> 
> I don't understand the power management very well, so pardon my
> ignorance but I have a question.
> 
> On Mon, May 10, 2021 at 3:30 AM Mika Westerberg
> <mika.westerberg@linux.intel.com> wrote:
> >
> > ASMedia xHCI controller only supports PME from D3cold:
> >
> > 11:00.0 USB controller: ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller (prog-if 30 [XHCI])
> >   ...
> >   Capabilities: [78] Power Management version 3
> >           Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
> >           Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> >
> > Now, if the controller is part of a Thunderbolt device for instance, it
> > is connected to a PCIe switch downstream port. When the hierarchy then
> > enters D3cold as a result of s2idle cycle pci_target_state() returns D0
> > because the device does not support PME from the default target_state
> > (D3hot). So what happens is that the whole hierarchy is left into D0
> > breaking power management.at suspend time or resume time
> 
> Can you please provide a small call stack, when this issue is seen?
> (I'm primarily trying to understand whether the issue is breaking
> suspend, or the suspend is fine, but resume is broken?)

It is on suspend path. I added WARN_ON() to log the whole chain:

[   37.820164] RIP: 0010:pci_target_state+0x7f/0x100
[   37.820172] Code: 9d 00 00 00 01 19 c0 f7 d0 83 e0 03 83 bb 98 00 00 00 04 74 1b 40 84 ed 74 e1 0f b6 93 9e 00 00 00 f6 c2 1f 74 3d 85 c0 75 1c <0f> 0b 31 c0 eb cb b8 04 00 00 00 40 84 ed 74 c1 0f b6 93 9e 00 00
[   37.820180] RSP: 0018:ffff9e6bc037bd40 EFLAGS: 00010246
[   37.820188] RAX: 0000000000000000 RBX: ffff8a5984da4000 RCX: 0000000000000000
[   37.820194] RDX: 0000000000000010 RSI: 0000000000000001 RDI: 0000000000000000
[   37.820199] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[   37.820203] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[   37.820208] R13: 0000000000000001 R14: 0000000000000002 R15: 0000000000000000
[   37.820213] FS:  0000000000000000(0000) GS:ffff8a5d1f600000(0000) knlGS:0000000000000000
[   37.820221] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   37.820243] CR2: 00007fe0fc96e1c8 CR3: 00000002a3012001 CR4: 0000000000770ee0
[   37.820249] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   37.820253] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   37.820256] PKRU: 55555554
[   37.820260] Call Trace:
[   37.820265]  pci_prepare_to_sleep+0x2e/0xc0
[   37.820275]  hcd_pci_suspend_noirq+0x58/0x130
[   37.820281]  ? find_held_lock+0x32/0x90
[   37.820289]  pci_pm_suspend_noirq+0x6d/0x290
[   37.820297]  ? lock_release+0x14f/0x430
[   37.820306]  ? pci_pm_suspend_late+0x30/0x30
[   37.820315]  dpm_run_callback+0x61/0x1d0
[   37.820330]  __device_suspend_noirq+0x84/0x270
[   37.820340]  async_suspend_noirq+0x16/0x90
[   37.820348]  async_run_entry_fn+0x2e/0x120
[   37.820360]  process_one_work+0x27c/0x540
[   37.820378]  worker_thread+0x4d/0x3f0
[   37.820387]  ? rescuer_thread+0x390/0x390
[   37.820396]  kthread+0x14c/0x170
[   37.820403]  ? __kthread_bind_mask+0x60/0x60
[   37.820413]  ret_from_fork+0x1f/0x30


> > For this reason choose target_state to be D3cold if there is a upstream
> > bridge that is power manageable.
> 
> It seems to me that the goal of pci_target_state() is to find the
> lowest power state that a device can be put into, from which device
> can still generate PME (if needed). So I'm curious why it starts with
> target_state = PCI_D3hot in the first place? Wouldn't starting with
> PCI_D3cold will always be better (regardless of parent bridge
> capabilities)?

That one is not my code but I suspect that for two reasons: one is historic
(older devices did not have proper D3cold support), the other is that typically
with S3/S4 it is the BIOS that in the end configures wakes (which of course
does not work with s2idle and especially devices that are not-onboard to begin
with).

I may be wrong too.

> And then I came across the commit 8feaec33b986 ("PCI / PM: Always
> check PME wakeup capability for runtime wakeup support"), which
> addresses the same device that this patch addresses, and 1 excerpt
> from the commit log that stood out:
> ============================================================
>     In addition, change wakeup flag passed to pci_target_state() from false
>     to true, because we want to find the deepest state *different from D3cold*
>     that the device can still generate PME#. In this case, it's D0 for the
>     device in question.
> ============================================================
> 
> So, does returning D3Cold from this function break any other
> assumption somewhere?

I can't be 100% sure but effectively the device is in D3cold once the parent
bridge is in D3hot as the device config space is not accessible anymore.
Kai-Heng Feng May 18, 2021, 11 a.m. UTC | #3
Hi,

On Tue, May 18, 2021 at 6:44 PM Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
>
> Hi,
>
> On Mon, May 17, 2021 at 03:33:56PM -0700, Rajat Jain wrote:
> > [+Kai]
> >
> > Hi,
> >
> > I don't understand the power management very well, so pardon my
> > ignorance but I have a question.
> >
> > On Mon, May 10, 2021 at 3:30 AM Mika Westerberg
> > <mika.westerberg@linux.intel.com> wrote:
> > >
> > > ASMedia xHCI controller only supports PME from D3cold:
> > >
> > > 11:00.0 USB controller: ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller (prog-if 30 [XHCI])
> > >   ...
> > >   Capabilities: [78] Power Management version 3
> > >           Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
> > >           Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> > >
> > > Now, if the controller is part of a Thunderbolt device for instance, it
> > > is connected to a PCIe switch downstream port. When the hierarchy then
> > > enters D3cold as a result of s2idle cycle pci_target_state() returns D0
> > > because the device does not support PME from the default target_state
> > > (D3hot). So what happens is that the whole hierarchy is left into D0
> > > breaking power management.at suspend time or resume time
> >
> > Can you please provide a small call stack, when this issue is seen?
> > (I'm primarily trying to understand whether the issue is breaking
> > suspend, or the suspend is fine, but resume is broken?)
>
> It is on suspend path. I added WARN_ON() to log the whole chain:
>
> [   37.820164] RIP: 0010:pci_target_state+0x7f/0x100
> [   37.820172] Code: 9d 00 00 00 01 19 c0 f7 d0 83 e0 03 83 bb 98 00 00 00 04 74 1b 40 84 ed 74 e1 0f b6 93 9e 00 00 00 f6 c2 1f 74 3d 85 c0 75 1c <0f> 0b 31 c0 eb cb b8 04 00 00 00 40 84 ed 74 c1 0f b6 93 9e 00 00
> [   37.820180] RSP: 0018:ffff9e6bc037bd40 EFLAGS: 00010246
> [   37.820188] RAX: 0000000000000000 RBX: ffff8a5984da4000 RCX: 0000000000000000
> [   37.820194] RDX: 0000000000000010 RSI: 0000000000000001 RDI: 0000000000000000
> [   37.820199] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
> [   37.820203] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [   37.820208] R13: 0000000000000001 R14: 0000000000000002 R15: 0000000000000000
> [   37.820213] FS:  0000000000000000(0000) GS:ffff8a5d1f600000(0000) knlGS:0000000000000000
> [   37.820221] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   37.820243] CR2: 00007fe0fc96e1c8 CR3: 00000002a3012001 CR4: 0000000000770ee0
> [   37.820249] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   37.820253] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   37.820256] PKRU: 55555554
> [   37.820260] Call Trace:
> [   37.820265]  pci_prepare_to_sleep+0x2e/0xc0
> [   37.820275]  hcd_pci_suspend_noirq+0x58/0x130
> [   37.820281]  ? find_held_lock+0x32/0x90
> [   37.820289]  pci_pm_suspend_noirq+0x6d/0x290
> [   37.820297]  ? lock_release+0x14f/0x430
> [   37.820306]  ? pci_pm_suspend_late+0x30/0x30
> [   37.820315]  dpm_run_callback+0x61/0x1d0
> [   37.820330]  __device_suspend_noirq+0x84/0x270
> [   37.820340]  async_suspend_noirq+0x16/0x90
> [   37.820348]  async_run_entry_fn+0x2e/0x120
> [   37.820360]  process_one_work+0x27c/0x540
> [   37.820378]  worker_thread+0x4d/0x3f0
> [   37.820387]  ? rescuer_thread+0x390/0x390
> [   37.820396]  kthread+0x14c/0x170
> [   37.820403]  ? __kthread_bind_mask+0x60/0x60
> [   37.820413]  ret_from_fork+0x1f/0x30
>
>
> > > For this reason choose target_state to be D3cold if there is a upstream
> > > bridge that is power manageable.
> >
> > It seems to me that the goal of pci_target_state() is to find the
> > lowest power state that a device can be put into, from which device
> > can still generate PME (if needed). So I'm curious why it starts with
> > target_state = PCI_D3hot in the first place? Wouldn't starting with
> > PCI_D3cold will always be better (regardless of parent bridge
> > capabilities)?
>
> That one is not my code but I suspect that for two reasons: one is historic
> (older devices did not have proper D3cold support), the other is that typically
> with S3/S4 it is the BIOS that in the end configures wakes (which of course
> does not work with s2idle and especially devices that are not-onboard to begin
> with).
>
> I may be wrong too.
>
> > And then I came across the commit 8feaec33b986 ("PCI / PM: Always
> > check PME wakeup capability for runtime wakeup support"), which
> > addresses the same device that this patch addresses, and 1 excerpt
> > from the commit log that stood out:
> > ============================================================
> >     In addition, change wakeup flag passed to pci_target_state() from false
> >     to true, because we want to find the deepest state *different from D3cold*
> >     that the device can still generate PME#. In this case, it's D0 for the
> >     device in question.
> > ============================================================
> >
> > So, does returning D3Cold from this function break any other
> > assumption somewhere?
>
> I can't be 100% sure but effectively the device is in D3cold once the parent
> bridge is in D3hot as the device config space is not accessible anymore.

The patch is to fix a regression that ASmedia is runtime-suspended to
D3cold and PME stops working.
D3cold PME in general requires platform firmware support, so that
commit will prevent D3cold being the target state for _runtime_
suspend.

For system-wide suspend I think this patch is doing the correct thing
as Mika mentioned above.

Acked-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Bjorn Helgaas May 25, 2021, 11:36 p.m. UTC | #4
On Mon, May 10, 2021 at 01:26:47PM +0300, Mika Westerberg wrote:
> ASMedia xHCI controller only supports PME from D3cold:
> 
> 11:00.0 USB controller: ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller (prog-if 30 [XHCI])
>   ...
>   Capabilities: [78] Power Management version 3
>   	  Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
> 	  Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> 
> Now, if the controller is part of a Thunderbolt device for instance, it
> is connected to a PCIe switch downstream port. When the hierarchy then
> enters D3cold as a result of s2idle cycle pci_target_state() returns D0
> because the device does not support PME from the default target_state
> (D3hot). So what happens is that the whole hierarchy is left into D0
> breaking power management.
> 
> For this reason choose target_state to be D3cold if there is a upstream
> bridge that is power manageable. The reasoning here is that the upstream
> bridge will be also placed into D3 making the effective power state of
> the device in question to be D3cold.

I'm having a hard time understanding this in a generic way and
relating it to anything in the specs.  This isn't written as a quirk,
so I assume this is not specific to the ASM1042A or to Thunderbolt.

The same considerations apparently should apply to *any* device that
is below a power-manageable bridge and doesn't support PME from D3hot.
If so, let's lead off the commit log with that, and use ASM1042A
merely as an example instead of as the motivation.

"When the hierarchy enters D3cold" -- I guess you mean the bridge and
all downstream devices are in D3cold?  Does a bridge being in D3cold
actually force all downstream devices to be in D3cold as well?  I
guess not, because it seems that the bridge is in D3 but the whole
point of this is to change the target_state of the device from D0 to
D3cold, right?

Is s2idle relevant in itself?  My impression is that the important
things are the PME capabilities and the D0/D3hot/D3cold states of the
bridge and the device, and "s2idle" is just a distraction.

"Breaking power management" -- I assume this just means we don't save
as much power as we'd like?

"For this reason" -- I missed the actual reason.  Is the reason "the
whole hierarchy is in D0 and wastes more power"?  I guess we don't
really need a *reason*; saving power is good enough.  What we *do*
need is justification for why it is safe, and I can't connect the dots
yet.

You mention putting the bridge in D3.  Does that mean D3hot or D3cold?
If it can be either, say that.  If it means only one, be specific.
I'd like to eradicate "D3" from PCI because the ambiguity just makes
things hard.

What does "the effective power state of the device is D3cold" mean?
Does that mean the device is *actually* in D3cold, so it has no power
and will need complete re-initialization?  Or does it simply mean that
the device is unreachable because the bridge is not in D0, and the OS
can't directly wake it?

These are all questions I'd like to see answered in the commit log,
not just in the email thread.

> Reported-by: Utkarsh H Patel <utkarsh.h.patel@intel.com>
> Reported-by: Koba Ko <koba.ko@canonical.com>
> Tested-by: Koba Ko <koba.ko@canonical.com>
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> ---
>  drivers/pci/pci.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index b717680377a9..e3f3b9010762 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2578,8 +2578,19 @@ static pci_power_t pci_target_state(struct pci_dev *dev, bool wakeup)
>  		return target_state;
>  	}
>  
> -	if (!dev->pm_cap)
> +	if (!dev->pm_cap) {
>  		target_state = PCI_D0;
> +	} else {
> +		struct pci_dev *bridge;
> +
> +		/*
> +		 * If the upstream bridge can be put to D3 then it means
> +		 * that our target state is D3cold instead of D3hot.

Can you expand on this a bit?  Expand "D3" to be specific, and more
importantly, say something about *why* the target state is D3cold.

> +		 */
> +		bridge = pci_upstream_bridge(dev);
> +		if (bridge && pci_bridge_d3_possible(bridge))
> +			target_state = PCI_D3cold;

I guess we don't or can't do this for the
platform_pci_power_manageable() case?

> +	}
>  
>  	/*
>  	 * If the device is in D3cold even though it's not power-manageable by
> -- 
> 2.30.2
>
Mika Westerberg May 26, 2021, 10:42 a.m. UTC | #5
Hi,

On Tue, May 25, 2021 at 06:36:04PM -0500, Bjorn Helgaas wrote:
> On Mon, May 10, 2021 at 01:26:47PM +0300, Mika Westerberg wrote:
> > ASMedia xHCI controller only supports PME from D3cold:
> > 
> > 11:00.0 USB controller: ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller (prog-if 30 [XHCI])
> >   ...
> >   Capabilities: [78] Power Management version 3
> >   	  Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
> > 	  Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> > 
> > Now, if the controller is part of a Thunderbolt device for instance, it
> > is connected to a PCIe switch downstream port. When the hierarchy then
> > enters D3cold as a result of s2idle cycle pci_target_state() returns D0
> > because the device does not support PME from the default target_state
> > (D3hot). So what happens is that the whole hierarchy is left into D0
> > breaking power management.
> > 
> > For this reason choose target_state to be D3cold if there is a upstream
> > bridge that is power manageable. The reasoning here is that the upstream
> > bridge will be also placed into D3 making the effective power state of
> > the device in question to be D3cold.
> 
> I'm having a hard time understanding this in a generic way and
> relating it to anything in the specs.  This isn't written as a quirk,
> so I assume this is not specific to the ASM1042A or to Thunderbolt.

Right. Sorry about that. I tried to explain this as well as I can.

> The same considerations apparently should apply to *any* device that
> is below a power-manageable bridge and doesn't support PME from D3hot.
> If so, let's lead off the commit log with that, and use ASM1042A
> merely as an example instead of as the motivation.

OK, got it.

> "When the hierarchy enters D3cold" -- I guess you mean the bridge and
> all downstream devices are in D3cold?  Does a bridge being in D3cold
> actually force all downstream devices to be in D3cold as well?  I
> guess not, because it seems that the bridge is in D3 but the whole
> point of this is to change the target_state of the device from D0 to
> D3cold, right?

Right but the upstream bridge must not be in higher power state than the
children so if the brige is in D3cold so are its children. Otherwise it
is against the PCIe spec.

> Is s2idle relevant in itself?  My impression is that the important
> things are the PME capabilities and the D0/D3hot/D3cold states of the
> bridge and the device, and "s2idle" is just a distraction.

It is not relevant. Only an example of a situation we observe the issue.

> "Breaking power management" -- I assume this just means we don't save
> as much power as we'd like?

Yes, it means we leave one root port and the connected devices fully
powered, and in case of Intel hardware it means the system does not
enter low power idle states either.

> "For this reason" -- I missed the actual reason.  Is the reason "the
> whole hierarchy is in D0 and wastes more power"?  I guess we don't
> really need a *reason*; saving power is good enough.  What we *do*
> need is justification for why it is safe, and I can't connect the dots
> yet.
> 
> You mention putting the bridge in D3.  Does that mean D3hot or D3cold?
> If it can be either, say that.  If it means only one, be specific.
> I'd like to eradicate "D3" from PCI because the ambiguity just makes
> things hard.

It means D3hot but later on D3cold once the ACPI power resource has been
turned off. That's why I said only D3. But sure.

> What does "the effective power state of the device is D3cold" mean?
> Does that mean the device is *actually* in D3cold, so it has no power
> and will need complete re-initialization?  Or does it simply mean that
> the device is unreachable because the bridge is not in D0, and the OS
> can't directly wake it?

It means the upstream bridge is in D3hot and thus not forward
configuration requests to the device below it. In addition once all the
devices under the root port are in D3hot the ACPI powre resource is
turned off, the links go to L2 (or L3) and the device is actually in
D3cold and need to go through reset and complete re-initialization
before it can be functional again.

> These are all questions I'd like to see answered in the commit log,
> not just in the email thread.

Sure.

In a nutshell what happens here is that the one device prevents all the
devices above it (up to the root port) to enter low power states thus
wasting energy.

I will add all these to the commit log in next version.

> > Reported-by: Utkarsh H Patel <utkarsh.h.patel@intel.com>
> > Reported-by: Koba Ko <koba.ko@canonical.com>
> > Tested-by: Koba Ko <koba.ko@canonical.com>
> > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > ---
> >  drivers/pci/pci.c | 13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index b717680377a9..e3f3b9010762 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -2578,8 +2578,19 @@ static pci_power_t pci_target_state(struct pci_dev *dev, bool wakeup)
> >  		return target_state;
> >  	}
> >  
> > -	if (!dev->pm_cap)
> > +	if (!dev->pm_cap) {
> >  		target_state = PCI_D0;
> > +	} else {
> > +		struct pci_dev *bridge;
> > +
> > +		/*
> > +		 * If the upstream bridge can be put to D3 then it means
> > +		 * that our target state is D3cold instead of D3hot.
> 
> Can you expand on this a bit?  Expand "D3" to be specific, and more
> importantly, say something about *why* the target state is D3cold.

OK.

> > +		 */
> > +		bridge = pci_upstream_bridge(dev);
> > +		if (bridge && pci_bridge_d3_possible(bridge))
> > +			target_state = PCI_D3cold;
> 
> I guess we don't or can't do this for the
> platform_pci_power_manageable() case?

No because this may be a bridge that has no ACPI description (e.g
further down the hierarchy).

> 
> > +	}
> >  
> >  	/*
> >  	 * If the device is in D3cold even though it's not power-manageable by
> > -- 
> > 2.30.2
> >
diff mbox series

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b717680377a9..e3f3b9010762 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2578,8 +2578,19 @@  static pci_power_t pci_target_state(struct pci_dev *dev, bool wakeup)
 		return target_state;
 	}
 
-	if (!dev->pm_cap)
+	if (!dev->pm_cap) {
 		target_state = PCI_D0;
+	} else {
+		struct pci_dev *bridge;
+
+		/*
+		 * If the upstream bridge can be put to D3 then it means
+		 * that our target state is D3cold instead of D3hot.
+		 */
+		bridge = pci_upstream_bridge(dev);
+		if (bridge && pci_bridge_d3_possible(bridge))
+			target_state = PCI_D3cold;
+	}
 
 	/*
 	 * If the device is in D3cold even though it's not power-manageable by