diff mbox series

[v2] PCI: Run platform power transition on initial D0 entry

Message ID 20210314000439.3138941-1-luzmaximilian@gmail.com (mailing list archive)
State Not Applicable
Delegated to: Bjorn Helgaas
Headers show
Series [v2] PCI: Run platform power transition on initial D0 entry | expand

Commit Message

Maximilian Luz March 14, 2021, 12:04 a.m. UTC
On some devices and platforms, the initial platform (e.g. ACPI) power
state is not in sync with the power state of the PCI device.

This seems like it is, for all intents and purposes, an issue with the
device firmware (e.g. ACPI). On some devices, specifically Microsoft
Surface Books 2 and 3, we encounter ACPI code akin to the following
power resource, corresponding to a PCI device:

    PowerResource (PRP5, 0x00, 0x0000)
    {
        // Initialized to zero, i.e. off. There is no logic for checking
        // the actual state dynamically.
        Name (_STA, Zero)

        Method (_ON, 0, Serialized)
        {
            // ... code omitted ...
            _STA = One
        }

        Method (_OFF, 0, Serialized)
        {
            // ... code omitted ...
            _STA = Zero
        }
    }

This resource is initialized to 'off' and does not have any logic for
checking its actual state, i.e. the state of the corresponding PCI
device. The stored state of this resource can only be changed by running
the (platform/ACPI) power transition functions (i.e. _ON and _OFF).

This means that, at boot, the PCI device power state is out of sync with
the power state of the corresponding ACPI resource.

During initial bring-up of a PCI device, pci_enable_device_flags()
updates its PCI core state (from initially 'unknown') by reading from
its PCI_PM_CTRL register. It does, however, not check if the platform
(here ACPI) state is in sync with/valid for the actual device state and
needs updating.

The later call to pci_set_power_state(..., PCI_D0), which would normally
perform such platform transitions, will evaluate to a no-op if the
stored state has been updated to D0 via this. Thus any such power
resource, required for D0, will never get "officially" turned on.

One could also make the case that this could cause PCI devices requiring
some secondary power resources to not work properly, as no attempt is
ever made at checking that they are in fact synchronized (i.e. turned
on/off as required e.g. by ACPI) for the updated state.

Ultimately, this breaks the first D3cold entry for the discrete GPU on
the aforementioned Surface Books. On transition of the PCI device to
D3cold, the power resource is not properly turned off as it is already
considered off:

    $ echo auto > /sys/bus/pci/devices/0000:02:00.0/power/control

    [  104.430304] ACPI: \_SB_.PCI0.RP05: ACPI: PM: GPE69 enabled for wakeup
    [  104.430310] pcieport 0000:00:1c.4: Wakeup enabled by ACPI
    [  104.444144] ACPI: \_SB_.PCI0.RP05: ACPI: PM: Power state change: D3cold -> D3cold
    [  104.444151] acpi device:3b: Device already in D3cold
    [  104.444154] pcieport 0000:00:1c.4: power state changed by ACPI to D3cold

However, the device still consumes a non-negligible amount of power and
gets warm. A full power-cycle fixes this and results in the device being
properly transitioned to D3cold:

    $ echo on > /sys/bus/pci/devices/0000:02:00.0/power/control

    [  134.258039] ACPI: \_SB_.PCI0.RP05: ACPI: PM: Power state change: D3cold -> D0
    [  134.369140] ACPI: PM: Power resource [PRP5] turned on
    [  134.369157] acpi device:3b: Power state changed to D0
    [  134.369165] pcieport 0000:00:1c.4: power state changed by ACPI to D0
    [  134.369338] pcieport 0000:00:1c.4: Wakeup disabled by ACPI

    $ echo auto > /sys/bus/pci/devices/0000:02:00.0/power/control

    [  310.338597] ACPI: \_SB_.PCI0.RP05: ACPI: PM: GPE69 enabled for wakeup
    [  310.338604] pcieport 0000:00:1c.4: Wakeup enabled by ACPI
    [  310.353841] ACPI: \_SB_.PCI0.RP05: ACPI: PM: Power state change: D0 -> D3cold
    [  310.403879] ACPI: PM: Power resource [PRP5] turned off
    [  310.403894] acpi device:3b: Power state changed to D3cold
    [  310.403901] pcieport 0000:00:1c.4: power state changed by ACPI to D3cold

By replacing pci_set_power_state(..., PCI_DO) with pci_power_up() in
do_pci_enable_device(), we can ensure that the state of power resources
is always checked. This essentially drops the initial (first) check on
the current state of the PCI device and calls platform specific code,
i.e. pci_platform_power_transition() to perform the appropriate platform
transition.

This can be verified by

    $ echo auto > /sys/bus/pci/devices/0000:02:00.0/power/control

    [  152.790448] ACPI: \_SB_.PCI0.RP05: ACPI: PM: GPE69 enabled for wakeup
    [  152.790454] pcieport 0000:00:1c.4: Wakeup enabled by ACPI
    [  152.804252] ACPI: \_SB_.PCI0.RP05: ACPI: PM: Power state change: D0 -> D3cold
    [  152.857548] ACPI: PM: Power resource [PRP5] turned off
    [  152.857563] acpi device:3b: Power state changed to D3cold
    [  152.857570] pcieport 0000:00:1c.4: power state changed by ACPI to D3cold

which yields the expected behavior.

Note that

 a) pci_set_power_state() would call pci_power_up() if the check on the
 current state failed. This means that in case of the updated state not
 being D0, there is no functional change.

 b) The core and platform transitions, i.e. pci_raw_set_power_state()
 and pci_platform_power_transition() (should) both have checks on the
 current state. So in case either state is up to date, the corresponding
 transition should still evaluate to a no-op. The only notable
 difference made by the suggested replacement is pci_resume_bus() being
 called for devices where runtime D3cold is enabled.

Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
---

Changes in v2:
 - No code changes
 - Improve commit message, add details
 - Add Rafael and linux-pm to CCs

---
 drivers/pci/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Rafael J. Wysocki March 15, 2021, 3:34 p.m. UTC | #1
On Sun, Mar 14, 2021 at 1:06 AM Maximilian Luz <luzmaximilian@gmail.com> wrote:
>
> On some devices and platforms, the initial platform (e.g. ACPI) power
> state is not in sync with the power state of the PCI device.
>
> This seems like it is, for all intents and purposes, an issue with the
> device firmware (e.g. ACPI). On some devices, specifically Microsoft
> Surface Books 2 and 3, we encounter ACPI code akin to the following
> power resource, corresponding to a PCI device:
>
>     PowerResource (PRP5, 0x00, 0x0000)
>     {
>         // Initialized to zero, i.e. off. There is no logic for checking
>         // the actual state dynamically.
>         Name (_STA, Zero)
>
>         Method (_ON, 0, Serialized)
>         {
>             // ... code omitted ...
>             _STA = One
>         }
>
>         Method (_OFF, 0, Serialized)
>         {
>             // ... code omitted ...
>             _STA = Zero
>         }
>     }
>
> This resource is initialized to 'off' and does not have any logic for
> checking its actual state, i.e. the state of the corresponding PCI
> device. The stored state of this resource can only be changed by running
> the (platform/ACPI) power transition functions (i.e. _ON and _OFF).

Well, there is _STA that returns "off" initially, so the OS should set
the initial state of the device to D3cold and transition it into D0 as
appropriate (i.e. starting with setting all of the power resources
used by it to "on").

> This means that, at boot, the PCI device power state is out of sync with
> the power state of the corresponding ACPI resource.
>
> During initial bring-up of a PCI device, pci_enable_device_flags()
> updates its PCI core state (from initially 'unknown') by reading from
> its PCI_PM_CTRL register. It does, however, not check if the platform
> (here ACPI) state is in sync with/valid for the actual device state and
> needs updating.

Well, that's inconsistent.

Also, it is rather pointless to update the device's power state at
this point, because nothing between this point and the later
do_pci_enable_device() call in this function requires its
current_state to be up to date AFAICS.

Have you tried to drop the power state update from
pci_enable_device_flags()?  [Note that we're talking about relatively
old code here and it looks like that code is not necessary any more.]

Either it should be possible to do that and all should work, or there
is a good reason to make current_state reflect the real current power
state of the device upfront, but then that should be done by putting
it into D0 diractly at that point rather than later.

Calling pci_power_up(dev) instead of pci_set_power_state(dev, PCI_D0)
when current_state is already 0 only pokes at the power resources,
because pci_raw_set_power_state() will do nothing then, but that is a
rather less-than-straightforward way of doing this.  Moreover, the
ordering of actions mandated by the spec is to set power resources to
"on" first and then write to the PMCSR, not the other way around.
Maximilian Luz March 15, 2021, 6:28 p.m. UTC | #2
On 3/15/21 4:34 PM, Rafael J. Wysocki wrote:
> On Sun, Mar 14, 2021 at 1:06 AM Maximilian Luz <luzmaximilian@gmail.com> wrote:
>>
>> On some devices and platforms, the initial platform (e.g. ACPI) power
>> state is not in sync with the power state of the PCI device.
>>
>> This seems like it is, for all intents and purposes, an issue with the
>> device firmware (e.g. ACPI). On some devices, specifically Microsoft
>> Surface Books 2 and 3, we encounter ACPI code akin to the following
>> power resource, corresponding to a PCI device:
>>
>>      PowerResource (PRP5, 0x00, 0x0000)
>>      {
>>          // Initialized to zero, i.e. off. There is no logic for checking
>>          // the actual state dynamically.
>>          Name (_STA, Zero)
>>
>>          Method (_ON, 0, Serialized)
>>          {
>>              // ... code omitted ...
>>              _STA = One
>>          }
>>
>>          Method (_OFF, 0, Serialized)
>>          {
>>              // ... code omitted ...
>>              _STA = Zero
>>          }
>>      }
>>
>> This resource is initialized to 'off' and does not have any logic for
>> checking its actual state, i.e. the state of the corresponding PCI
>> device. The stored state of this resource can only be changed by running
>> the (platform/ACPI) power transition functions (i.e. _ON and _OFF).
> 
> Well, there is _STA that returns "off" initially, so the OS should set
> the initial state of the device to D3cold and transition it into D0 as
> appropriate (i.e. starting with setting all of the power resources
> used by it to "on").
> 
>> This means that, at boot, the PCI device power state is out of sync with
>> the power state of the corresponding ACPI resource.
>>
>> During initial bring-up of a PCI device, pci_enable_device_flags()
>> updates its PCI core state (from initially 'unknown') by reading from
>> its PCI_PM_CTRL register. It does, however, not check if the platform
>> (here ACPI) state is in sync with/valid for the actual device state and
>> needs updating.
> 
> Well, that's inconsistent.
> 
> Also, it is rather pointless to update the device's power state at
> this point, because nothing between this point and the later
> do_pci_enable_device() call in this function requires its
> current_state to be up to date AFAICS.
> 
> Have you tried to drop the power state update from
> pci_enable_device_flags()?  [Note that we're talking about relatively
> old code here and it looks like that code is not necessary any more.]

I had not tried this before, as I assumed the comment was still
relevant. I did test that now and it works! I can't detect any
regressions.

Do you want to send this in or should I do that?

> Either it should be possible to do that and all should work, or there
> is a good reason to make current_state reflect the real current power
> state of the device upfront, but then that should be done by putting
> it into D0 diractly at that point rather than later.
> 
> Calling pci_power_up(dev) instead of pci_set_power_state(dev, PCI_D0)
> when current_state is already 0 only pokes at the power resources,
> because pci_raw_set_power_state() will do nothing then, but that is a
> rather less-than-straightforward way of doing this.  Moreover, the
> ordering of actions mandated by the spec is to set power resources to
> "on" first and then write to the PMCSR, not the other way around.

I don't know much about the PCI core (let alone spec), so that seemed
like the least intrusive way to fix this for me.

Thanks!
Max
Rafael J. Wysocki March 16, 2021, 1:36 p.m. UTC | #3
On Mon, Mar 15, 2021 at 7:28 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
>
> On 3/15/21 4:34 PM, Rafael J. Wysocki wrote:
> > On Sun, Mar 14, 2021 at 1:06 AM Maximilian Luz <luzmaximilian@gmail.com> wrote:
> >>
> >> On some devices and platforms, the initial platform (e.g. ACPI) power
> >> state is not in sync with the power state of the PCI device.
> >>
> >> This seems like it is, for all intents and purposes, an issue with the
> >> device firmware (e.g. ACPI). On some devices, specifically Microsoft
> >> Surface Books 2 and 3, we encounter ACPI code akin to the following
> >> power resource, corresponding to a PCI device:
> >>
> >>      PowerResource (PRP5, 0x00, 0x0000)
> >>      {
> >>          // Initialized to zero, i.e. off. There is no logic for checking
> >>          // the actual state dynamically.
> >>          Name (_STA, Zero)
> >>
> >>          Method (_ON, 0, Serialized)
> >>          {
> >>              // ... code omitted ...
> >>              _STA = One
> >>          }
> >>
> >>          Method (_OFF, 0, Serialized)
> >>          {
> >>              // ... code omitted ...
> >>              _STA = Zero
> >>          }
> >>      }
> >>
> >> This resource is initialized to 'off' and does not have any logic for
> >> checking its actual state, i.e. the state of the corresponding PCI
> >> device. The stored state of this resource can only be changed by running
> >> the (platform/ACPI) power transition functions (i.e. _ON and _OFF).
> >
> > Well, there is _STA that returns "off" initially, so the OS should set
> > the initial state of the device to D3cold and transition it into D0 as
> > appropriate (i.e. starting with setting all of the power resources
> > used by it to "on").
> >
> >> This means that, at boot, the PCI device power state is out of sync with
> >> the power state of the corresponding ACPI resource.
> >>
> >> During initial bring-up of a PCI device, pci_enable_device_flags()
> >> updates its PCI core state (from initially 'unknown') by reading from
> >> its PCI_PM_CTRL register. It does, however, not check if the platform
> >> (here ACPI) state is in sync with/valid for the actual device state and
> >> needs updating.
> >
> > Well, that's inconsistent.
> >
> > Also, it is rather pointless to update the device's power state at
> > this point, because nothing between this point and the later
> > do_pci_enable_device() call in this function requires its
> > current_state to be up to date AFAICS.
> >
> > Have you tried to drop the power state update from
> > pci_enable_device_flags()?  [Note that we're talking about relatively
> > old code here and it looks like that code is not necessary any more.]
>
> I had not tried this before, as I assumed the comment was still
> relevant. I did test that now and it works! I can't detect any
> regressions.
>
> Do you want to send this in or should I do that?

I'll post it, thanks!
Maximilian Luz March 16, 2021, 2:10 p.m. UTC | #4
On 3/16/21 2:36 PM, Rafael J. Wysocki wrote:
> On Mon, Mar 15, 2021 at 7:28 PM Maximilian Luz <luzmaximilian@gmail.com> wrote:
>>
>> On 3/15/21 4:34 PM, Rafael J. Wysocki wrote:
>>> On Sun, Mar 14, 2021 at 1:06 AM Maximilian Luz <luzmaximilian@gmail.com> wrote:
>>>>
>>>> On some devices and platforms, the initial platform (e.g. ACPI) power
>>>> state is not in sync with the power state of the PCI device.
>>>>
>>>> This seems like it is, for all intents and purposes, an issue with the
>>>> device firmware (e.g. ACPI). On some devices, specifically Microsoft
>>>> Surface Books 2 and 3, we encounter ACPI code akin to the following
>>>> power resource, corresponding to a PCI device:
>>>>
>>>>       PowerResource (PRP5, 0x00, 0x0000)
>>>>       {
>>>>           // Initialized to zero, i.e. off. There is no logic for checking
>>>>           // the actual state dynamically.
>>>>           Name (_STA, Zero)
>>>>
>>>>           Method (_ON, 0, Serialized)
>>>>           {
>>>>               // ... code omitted ...
>>>>               _STA = One
>>>>           }
>>>>
>>>>           Method (_OFF, 0, Serialized)
>>>>           {
>>>>               // ... code omitted ...
>>>>               _STA = Zero
>>>>           }
>>>>       }
>>>>
>>>> This resource is initialized to 'off' and does not have any logic for
>>>> checking its actual state, i.e. the state of the corresponding PCI
>>>> device. The stored state of this resource can only be changed by running
>>>> the (platform/ACPI) power transition functions (i.e. _ON and _OFF).
>>>
>>> Well, there is _STA that returns "off" initially, so the OS should set
>>> the initial state of the device to D3cold and transition it into D0 as
>>> appropriate (i.e. starting with setting all of the power resources
>>> used by it to "on").
>>>
>>>> This means that, at boot, the PCI device power state is out of sync with
>>>> the power state of the corresponding ACPI resource.
>>>>
>>>> During initial bring-up of a PCI device, pci_enable_device_flags()
>>>> updates its PCI core state (from initially 'unknown') by reading from
>>>> its PCI_PM_CTRL register. It does, however, not check if the platform
>>>> (here ACPI) state is in sync with/valid for the actual device state and
>>>> needs updating.
>>>
>>> Well, that's inconsistent.
>>>
>>> Also, it is rather pointless to update the device's power state at
>>> this point, because nothing between this point and the later
>>> do_pci_enable_device() call in this function requires its
>>> current_state to be up to date AFAICS.
>>>
>>> Have you tried to drop the power state update from
>>> pci_enable_device_flags()?  [Note that we're talking about relatively
>>> old code here and it looks like that code is not necessary any more.]
>>
>> I had not tried this before, as I assumed the comment was still
>> relevant. I did test that now and it works! I can't detect any
>> regressions.
>>
>> Do you want to send this in or should I do that?
> 
> I'll post it, thanks!

Thank you!

Feel free to add my tested-by tag.

Regards,
Max
diff mbox series

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 16a17215f633..cc42315210e4 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1800,7 +1800,7 @@  static int do_pci_enable_device(struct pci_dev *dev, int bars)
 	u16 cmd;
 	u8 pin;
 
-	err = pci_set_power_state(dev, PCI_D0);
+	err = pci_power_up(dev);
 	if (err < 0 && err != -EIO)
 		return err;