Message ID | 20220124122107.12148-1-abhsahu@nvidia.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | PCI: Fix the ACPI power state during runtime resume | expand |
[+cc Rafael, hoping for your review :) Wonder if we should add something like this to MAINTAINERS so you get cc'd on power-related things: diff --git a/MAINTAINERS b/MAINTAINERS index ea3e6c914384..3d9a211cad5d 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -15422,6 +15422,7 @@ F: include/linux/pm.h F: include/linux/pm_* F: include/linux/powercap.h F: kernel/configs/nopm.config +K: pci_[a-z_]*power[a-z_]*\( DYNAMIC THERMAL POWER MANAGEMENT (DTPM) M: Daniel Lezcano <daniel.lezcano@kernel.org> ] On Mon, Jan 24, 2022 at 05:51:07PM +0530, Abhishek Sahu wrote: > Consider the following sequence during PCI device runtime > suspend/resume: > > 1. PCI device goes into runtime suspended state. The PCI state > will be changed to PCI_D0 and then pci_platform_power_transition() > will be called which changes the ACPI state to ACPI_STATE_D3_HOT. > > 2. Parent bridge goes into runtime suspended state. If parent > bridge supports D3cold, then it will change the power state of all its > children to D3cold state and the power will be removed. > > 3. During wake-up time, the bridge will be runtime resumed first > and pci_power_up() will be called for the bridge. Now, the power > supply will be resumed. > > 4. pci_resume_bus() will be called which will internally invoke > pci_restore_standard_config(). pci_update_current_state() > will read PCI_PM_CTRL register and the current_state will be > updated to D0. > > In the above process, at step 4, the ACPI device state will still be > ACPI_STATE_D3_HOT since pci_platform_power_transition() is not being > invoked. We need call the pci_platform_power_transition() with state > D0 to change the ACPI state to ACPI_STATE_D0. > > This patch calls pci_power_up() if current power state is D0 inside > pci_restore_standard_config(). This pci_power_up() will change the > ACPI state to ACPI_STATE_D0. > > Following are the steps to confirm: > > Enable the debug prints in acpi_pci_set_power_state() > > 0000:01:00.0 is PCI device and 0000:00:01.0 is parent bridge device > > Before: > > 0000:01:00.0: power state changed by ACPI to D3hot > 0000:00:01.0: power state changed by ACPI to D3cold > 0000:00:01.0: power state changed by ACPI to D0 > > After: > > 0000:01:00.0: power state changed by ACPI to D3hot > 0000:00:01.0: power state changed by ACPI to D3cold > 0000:00:01.0: power state changed by ACPI to D0 > 0000:01:00.0: power state changed by ACPI to D0 > > So with this patch, the PCI device ACPI state is also being > changed to D0. > > Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> > --- > drivers/pci/pci-driver.c | 14 +++++++++++--- > 1 file changed, 11 insertions(+), 3 deletions(-) > > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c > index 588588cfda48..64e0cca12f16 100644 > --- a/drivers/pci/pci-driver.c > +++ b/drivers/pci/pci-driver.c > @@ -521,14 +521,22 @@ static void pci_device_shutdown(struct device *dev) > */ > static int pci_restore_standard_config(struct pci_dev *pci_dev) > { > + int error = 0; > pci_update_current_state(pci_dev, PCI_UNKNOWN); > > if (pci_dev->current_state != PCI_D0) { > - int error = pci_set_power_state(pci_dev, PCI_D0); > - if (error) > - return error; > + error = pci_set_power_state(pci_dev, PCI_D0); > + } else { > + /* > + * The platform power state can still be non-D0, so this is > + * required to change the platform power state to D0. > + */ > + error = pci_power_up(pci_dev); > } > > + if (error) > + return error; > + > pci_restore_state(pci_dev); > pci_pme_restore(pci_dev); > return 0; > -- > 2.17.1 >
On Saturday, February 5, 2022 12:32:19 AM CET Bjorn Helgaas wrote: > [+cc Rafael, hoping for your review :) +Mika > Wonder if we should add something like this to MAINTAINERS so you get > cc'd on power-related things: > > diff --git a/MAINTAINERS b/MAINTAINERS > index ea3e6c914384..3d9a211cad5d 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -15422,6 +15422,7 @@ F: include/linux/pm.h > F: include/linux/pm_* > F: include/linux/powercap.h > F: kernel/configs/nopm.config > +K: pci_[a-z_]*power[a-z_]*\( It seems so, but generally PM patches should be CCed to linux-pm anyway. > > DYNAMIC THERMAL POWER MANAGEMENT (DTPM) > M: Daniel Lezcano <daniel.lezcano@kernel.org> > ] > > On Mon, Jan 24, 2022 at 05:51:07PM +0530, Abhishek Sahu wrote: > > Consider the following sequence during PCI device runtime > > suspend/resume: > > > > 1. PCI device goes into runtime suspended state. The PCI state > > will be changed to PCI_D0 and then pci_platform_power_transition() > > will be called which changes the ACPI state to ACPI_STATE_D3_HOT. You mean PCI_D3hot I suppose? > > 2. Parent bridge goes into runtime suspended state. If parent > > bridge supports D3cold, then it will change the power state of all its > > children to D3cold state and the power will be removed. > > > > 3. During wake-up time, the bridge will be runtime resumed first > > and pci_power_up() will be called for the bridge. Now, the power > > supply will be resumed. > > > > 4. pci_resume_bus() will be called which will internally invoke > > pci_restore_standard_config(). pci_update_current_state() > > will read PCI_PM_CTRL register and the current_state will be > > updated to D0. > > > > In the above process, at step 4, the ACPI device state will still be > > ACPI_STATE_D3_HOT since pci_platform_power_transition() is not being > > invoked. I'm not quite following. I'm assuming that this description applies to the endpoint device that was previously put into D3_hot. Since its current state is D3_hot, it is not D0 (in particular) and the pci_set_power_state() in pci_restore_standard_config() should put int into D0 proper, including the platform firmware part. > > We need call the pci_platform_power_transition() with state > > D0 to change the ACPI state to ACPI_STATE_D0. > > > > This patch calls pci_power_up() if current power state is D0 inside > > pci_restore_standard_config(). This pci_power_up() will change the > > ACPI state to ACPI_STATE_D0. > > > > Following are the steps to confirm: > > > > Enable the debug prints in acpi_pci_set_power_state() > > > > 0000:01:00.0 is PCI device and 0000:00:01.0 is parent bridge device > > > > Before: > > > > 0000:01:00.0: power state changed by ACPI to D3hot > > 0000:00:01.0: power state changed by ACPI to D3cold > > 0000:00:01.0: power state changed by ACPI to D0 > > > > After: > > > > 0000:01:00.0: power state changed by ACPI to D3hot > > 0000:00:01.0: power state changed by ACPI to D3cold > > 0000:00:01.0: power state changed by ACPI to D0 > > 0000:01:00.0: power state changed by ACPI to D0 > > > > So with this patch, the PCI device ACPI state is also being > > changed to D0. > > > > Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> > > --- > > drivers/pci/pci-driver.c | 14 +++++++++++--- > > 1 file changed, 11 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c > > index 588588cfda48..64e0cca12f16 100644 > > --- a/drivers/pci/pci-driver.c > > +++ b/drivers/pci/pci-driver.c > > @@ -521,14 +521,22 @@ static void pci_device_shutdown(struct device *dev) > > */ > > static int pci_restore_standard_config(struct pci_dev *pci_dev) > > { > > + int error = 0; > > pci_update_current_state(pci_dev, PCI_UNKNOWN); > > > > if (pci_dev->current_state != PCI_D0) { > > - int error = pci_set_power_state(pci_dev, PCI_D0); > > - if (error) > > - return error; > > + error = pci_set_power_state(pci_dev, PCI_D0); > > + } else { > > + /* > > + * The platform power state can still be non-D0, so this is > > + * required to change the platform power state to D0. > > + */ This really isn't expected to happen. If the device's power state has been changed to D3hot by ACPI, it is not in D0. It looks like the state tracking is not working here. > > + error = pci_power_up(pci_dev); > > } > > > > + if (error) > > + return error; > > + > > pci_restore_state(pci_dev); > > pci_pme_restore(pci_dev); > > return 0; >
On Mon, Feb 07, 2022 at 07:58:13PM +0100, Rafael J. Wysocki wrote: > On Saturday, February 5, 2022 12:32:19 AM CET Bjorn Helgaas wrote: > > Wonder if we should add something like this to MAINTAINERS so you get > > cc'd on power-related things: > > > > diff --git a/MAINTAINERS b/MAINTAINERS > > index ea3e6c914384..3d9a211cad5d 100644 > > --- a/MAINTAINERS > > +++ b/MAINTAINERS > > @@ -15422,6 +15422,7 @@ F: include/linux/pm.h > > F: include/linux/pm_* > > F: include/linux/powercap.h > > F: kernel/configs/nopm.config > > +K: pci_[a-z_]*power[a-z_]*\( > > It seems so, but generally PM patches should be CCed to linux-pm anyway. Definitely. It's just that running get_maintainer.pl on this patch currently shows: Bjorn Helgaas <bhelgaas@google.com> (supporter:PCI SUBSYSTEM) linux-pci@vger.kernel.org (open list:PCI SUBSYSTEM) linux-kernel@vger.kernel.org (open list) so it's not as helpful as it could be. The MAINTAINERS patch above would change it to this: Bjorn Helgaas <bhelgaas@google.com> (supporter:PCI SUBSYSTEM) "Rafael J. Wysocki" <rafael@kernel.org> (supporter:POWER MANAGEMENT CORE) linux-pci@vger.kernel.org (open list:PCI SUBSYSTEM) linux-kernel@vger.kernel.org (open list) linux-pm@vger.kernel.org (open list:POWER MANAGEMENT CORE) Bjorn
On 2/8/2022 12:28 AM, Rafael J. Wysocki wrote: > On Saturday, February 5, 2022 12:32:19 AM CET Bjorn Helgaas wrote: >> [+cc Rafael, hoping for your review :) > > +Mika > >> Wonder if we should add something like this to MAINTAINERS so you get >> cc'd on power-related things: >> >> diff --git a/MAINTAINERS b/MAINTAINERS >> index ea3e6c914384..3d9a211cad5d 100644 >> --- a/MAINTAINERS >> +++ b/MAINTAINERS >> @@ -15422,6 +15422,7 @@ F: include/linux/pm.h >> F: include/linux/pm_* >> F: include/linux/powercap.h >> F: kernel/configs/nopm.config >> +K: pci_[a-z_]*power[a-z_]*\( > > It seems so, but generally PM patches should be CCed to linux-pm anyway. > >> >> DYNAMIC THERMAL POWER MANAGEMENT (DTPM) >> M: Daniel Lezcano <daniel.lezcano@kernel.org> >> ] >> >> On Mon, Jan 24, 2022 at 05:51:07PM +0530, Abhishek Sahu wrote: >>> Consider the following sequence during PCI device runtime >>> suspend/resume: >>> >>> 1. PCI device goes into runtime suspended state. The PCI state >>> will be changed to PCI_D0 and then pci_platform_power_transition() >>> will be called which changes the ACPI state to ACPI_STATE_D3_HOT. > > You mean PCI_D3hot I suppose? > Yes. It should be PCI_D3hot here. >>> 2. Parent bridge goes into runtime suspended state. If parent >>> bridge supports D3cold, then it will change the power state of all its >>> children to D3cold state and the power will be removed. >>> >>> 3. During wake-up time, the bridge will be runtime resumed first >>> and pci_power_up() will be called for the bridge. Now, the power >>> supply will be resumed. >>> >>> 4. pci_resume_bus() will be called which will internally invoke >>> pci_restore_standard_config(). pci_update_current_state() >>> will read PCI_PM_CTRL register and the current_state will be >>> updated to D0. >>> >>> In the above process, at step 4, the ACPI device state will still be >>> ACPI_STATE_D3_HOT since pci_platform_power_transition() is not being >>> invoked. > > I'm not quite following. > > I'm assuming that this description applies to the endpoint device that was > previously put into D3_hot. > Yes. This is applicable for endpoint devices which was previously put into D3hot. > Since its current state is D3_hot, it is not D0 (in particular) and the > pci_set_power_state() in pci_restore_standard_config() should put int into > D0 proper, including the platform firmware part. > The pci_restore_standard_config() for endpoint devices are being called internally during wake-up of upstream bridge. pci_power_up(struct pci_dev *dev) { ... if (dev->runtime_d3cold) { /* * When powering on a bridge from D3cold, the whole hierarchy * may be powered on into D0uninitialized state, resume them to * give them a chance to suspend again */ pci_resume_bus(dev->subordinate); } ... } For the upstream bridge, the above code will trigger the wake-up of endpoint devices and then following code will be executed for the endpoint devices: pci_update_current_state(struct pci_dev *dev, pci_power_t state) { if (platform_pci_get_power_state(dev) == PCI_D3cold || !pci_device_is_present(dev)) { dev->current_state = PCI_D3cold; } else if (dev->pm_cap) { u16 pmcsr; pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr); dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK); } else { dev->current_state = state; } } In the above code, the current_state will be set to D0 for the endpoint devices since it will go into second block where it will read the PM_CTRL register. >>> We need call the pci_platform_power_transition() with state >>> D0 to change the ACPI state to ACPI_STATE_D0. >>> >>> This patch calls pci_power_up() if current power state is D0 inside >>> pci_restore_standard_config(). This pci_power_up() will change the >>> ACPI state to ACPI_STATE_D0. >>> >>> Following are the steps to confirm: >>> >>> Enable the debug prints in acpi_pci_set_power_state() >>> >>> 0000:01:00.0 is PCI device and 0000:00:01.0 is parent bridge device >>> >>> Before: >>> >>> 0000:01:00.0: power state changed by ACPI to D3hot >>> 0000:00:01.0: power state changed by ACPI to D3cold >>> 0000:00:01.0: power state changed by ACPI to D0 >>> >>> After: >>> >>> 0000:01:00.0: power state changed by ACPI to D3hot >>> 0000:00:01.0: power state changed by ACPI to D3cold >>> 0000:00:01.0: power state changed by ACPI to D0 >>> 0000:01:00.0: power state changed by ACPI to D0 >>> >>> So with this patch, the PCI device ACPI state is also being >>> changed to D0. >>> >>> Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> >>> --- >>> drivers/pci/pci-driver.c | 14 +++++++++++--- >>> 1 file changed, 11 insertions(+), 3 deletions(-) >>> >>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c >>> index 588588cfda48..64e0cca12f16 100644 >>> --- a/drivers/pci/pci-driver.c >>> +++ b/drivers/pci/pci-driver.c >>> @@ -521,14 +521,22 @@ static void pci_device_shutdown(struct device *dev) >>> */ >>> static int pci_restore_standard_config(struct pci_dev *pci_dev) >>> { >>> + int error = 0; >>> pci_update_current_state(pci_dev, PCI_UNKNOWN); >>> >>> if (pci_dev->current_state != PCI_D0) { >>> - int error = pci_set_power_state(pci_dev, PCI_D0); >>> - if (error) >>> - return error; >>> + error = pci_set_power_state(pci_dev, PCI_D0); >>> + } else { >>> + /* >>> + * The platform power state can still be non-D0, so this is >>> + * required to change the platform power state to D0. >>> + */ > > This really isn't expected to happen. > > If the device's power state has been changed to D3hot by ACPI, it is not in D0. > > It looks like the state tracking is not working here. > The state setting to D0 is happening due to the current logic present in pci_update_current_state(). If we can fix the logic in pci_update_current_state() to detect this condition and return state D3hot, then it should also fix the issue. Thanks, Abhishek >>> + error = pci_power_up(pci_dev); >>> } >>> >>> + if (error) >>> + return error; >>> + >>> pci_restore_state(pci_dev); >>> pci_pme_restore(pci_dev); >>> return 0; >> > > > >
On 2/8/2022 4:00 PM, Abhishek Sahu wrote: > On 2/8/2022 12:28 AM, Rafael J. Wysocki wrote: >> On Saturday, February 5, 2022 12:32:19 AM CET Bjorn Helgaas wrote: >>> [+cc Rafael, hoping for your review :) >> >> +Mika >> >>> Wonder if we should add something like this to MAINTAINERS so you get >>> cc'd on power-related things: >>> >>> diff --git a/MAINTAINERS b/MAINTAINERS >>> index ea3e6c914384..3d9a211cad5d 100644 >>> --- a/MAINTAINERS >>> +++ b/MAINTAINERS >>> @@ -15422,6 +15422,7 @@ F: include/linux/pm.h >>> F: include/linux/pm_* >>> F: include/linux/powercap.h >>> F: kernel/configs/nopm.config >>> +K: pci_[a-z_]*power[a-z_]*\( >> >> It seems so, but generally PM patches should be CCed to linux-pm anyway. >> >>> >>> DYNAMIC THERMAL POWER MANAGEMENT (DTPM) >>> M: Daniel Lezcano <daniel.lezcano@kernel.org> >>> ] >>> >>> On Mon, Jan 24, 2022 at 05:51:07PM +0530, Abhishek Sahu wrote: >>>> Consider the following sequence during PCI device runtime >>>> suspend/resume: >>>> >>>> 1. PCI device goes into runtime suspended state. The PCI state >>>> will be changed to PCI_D0 and then pci_platform_power_transition() >>>> will be called which changes the ACPI state to ACPI_STATE_D3_HOT. >> >> You mean PCI_D3hot I suppose? >> > > Yes. It should be PCI_D3hot here. > >>>> 2. Parent bridge goes into runtime suspended state. If parent >>>> bridge supports D3cold, then it will change the power state of all its >>>> children to D3cold state and the power will be removed. >>>> >>>> 3. During wake-up time, the bridge will be runtime resumed first >>>> and pci_power_up() will be called for the bridge. Now, the power >>>> supply will be resumed. >>>> >>>> 4. pci_resume_bus() will be called which will internally invoke >>>> pci_restore_standard_config(). pci_update_current_state() >>>> will read PCI_PM_CTRL register and the current_state will be >>>> updated to D0. >>>> >>>> In the above process, at step 4, the ACPI device state will still be >>>> ACPI_STATE_D3_HOT since pci_platform_power_transition() is not being >>>> invoked. >> >> I'm not quite following. >> >> I'm assuming that this description applies to the endpoint device that was >> previously put into D3_hot. >> > > Yes. This is applicable for endpoint devices which was previously put > into D3hot. > >> Since its current state is D3_hot, it is not D0 (in particular) and the >> pci_set_power_state() in pci_restore_standard_config() should put int into >> D0 proper, including the platform firmware part. >> > > The pci_restore_standard_config() for endpoint devices are being called > internally during wake-up of upstream bridge. > > pci_power_up(struct pci_dev *dev) > { > ... > if (dev->runtime_d3cold) { > /* > * When powering on a bridge from D3cold, the whole hierarchy > * may be powered on into D0uninitialized state, resume them to > * give them a chance to suspend again > */ > pci_resume_bus(dev->subordinate); > } > ... > } > > For the upstream bridge, the above code will trigger the wake-up of > endpoint devices and then following code will be executed for the > endpoint devices: > > pci_update_current_state(struct pci_dev *dev, pci_power_t state) > { > if (platform_pci_get_power_state(dev) == PCI_D3cold || > !pci_device_is_present(dev)) { > dev->current_state = PCI_D3cold; > } else if (dev->pm_cap) { > u16 pmcsr; > > pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr); > dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK); > } else { > dev->current_state = state; > } > } > > In the above code, the current_state will be set to D0 for the > endpoint devices since it will go into second block where > it will read the PM_CTRL register. > >>>> We need call the pci_platform_power_transition() with state >>>> D0 to change the ACPI state to ACPI_STATE_D0. >>>> >>>> This patch calls pci_power_up() if current power state is D0 inside >>>> pci_restore_standard_config(). This pci_power_up() will change the >>>> ACPI state to ACPI_STATE_D0. >>>> >>>> Following are the steps to confirm: >>>> >>>> Enable the debug prints in acpi_pci_set_power_state() >>>> >>>> 0000:01:00.0 is PCI device and 0000:00:01.0 is parent bridge device >>>> >>>> Before: >>>> >>>> 0000:01:00.0: power state changed by ACPI to D3hot >>>> 0000:00:01.0: power state changed by ACPI to D3cold >>>> 0000:00:01.0: power state changed by ACPI to D0 >>>> >>>> After: >>>> >>>> 0000:01:00.0: power state changed by ACPI to D3hot >>>> 0000:00:01.0: power state changed by ACPI to D3cold >>>> 0000:00:01.0: power state changed by ACPI to D0 >>>> 0000:01:00.0: power state changed by ACPI to D0 >>>> >>>> So with this patch, the PCI device ACPI state is also being >>>> changed to D0. >>>> >>>> Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> >>>> --- >>>> drivers/pci/pci-driver.c | 14 +++++++++++--- >>>> 1 file changed, 11 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c >>>> index 588588cfda48..64e0cca12f16 100644 >>>> --- a/drivers/pci/pci-driver.c >>>> +++ b/drivers/pci/pci-driver.c >>>> @@ -521,14 +521,22 @@ static void pci_device_shutdown(struct device *dev) >>>> */ >>>> static int pci_restore_standard_config(struct pci_dev *pci_dev) >>>> { >>>> + int error = 0; >>>> pci_update_current_state(pci_dev, PCI_UNKNOWN); >>>> >>>> if (pci_dev->current_state != PCI_D0) { >>>> - int error = pci_set_power_state(pci_dev, PCI_D0); >>>> - if (error) >>>> - return error; >>>> + error = pci_set_power_state(pci_dev, PCI_D0); >>>> + } else { >>>> + /* >>>> + * The platform power state can still be non-D0, so this is >>>> + * required to change the platform power state to D0. >>>> + */ >> >> This really isn't expected to happen. >> >> If the device's power state has been changed to D3hot by ACPI, it is not in D0. >> >> It looks like the state tracking is not working here. >> > > The state setting to D0 is happening due to the current logic present in > pci_update_current_state(). If we can fix the logic in > pci_update_current_state() to detect this condition and return state D3hot, > then it should also fix the issue. > > Thanks, > Abhishek > Hi Rafael/Mika, Could you please help regarding the correct way to fix this issue. I can update the patch accordingly. Thanks, Abhishek >>>> + error = pci_power_up(pci_dev); >>>> } >>>> >>>> + if (error) >>>> + return error; >>>> + >>>> pci_restore_state(pci_dev); >>>> pci_pme_restore(pci_dev); >>>> return 0; >>> >> >> >> >> >
On Tuesday, April 5, 2022 6:36:34 PM CEST Abhishek Sahu wrote: > On 2/8/2022 4:00 PM, Abhishek Sahu wrote: > > On 2/8/2022 12:28 AM, Rafael J. Wysocki wrote: > >> On Saturday, February 5, 2022 12:32:19 AM CET Bjorn Helgaas wrote: > >>> [+cc Rafael, hoping for your review :) > >> > >> +Mika > >> > >>> Wonder if we should add something like this to MAINTAINERS so you get > >>> cc'd on power-related things: > >>> > >>> diff --git a/MAINTAINERS b/MAINTAINERS > >>> index ea3e6c914384..3d9a211cad5d 100644 > >>> --- a/MAINTAINERS > >>> +++ b/MAINTAINERS > >>> @@ -15422,6 +15422,7 @@ F: include/linux/pm.h > >>> F: include/linux/pm_* > >>> F: include/linux/powercap.h > >>> F: kernel/configs/nopm.config > >>> +K: pci_[a-z_]*power[a-z_]*\( > >> > >> It seems so, but generally PM patches should be CCed to linux-pm anyway. > >> > >>> > >>> DYNAMIC THERMAL POWER MANAGEMENT (DTPM) > >>> M: Daniel Lezcano <daniel.lezcano@kernel.org> > >>> ] > >>> > >>> On Mon, Jan 24, 2022 at 05:51:07PM +0530, Abhishek Sahu wrote: > >>>> Consider the following sequence during PCI device runtime > >>>> suspend/resume: > >>>> > >>>> 1. PCI device goes into runtime suspended state. The PCI state > >>>> will be changed to PCI_D0 and then pci_platform_power_transition() > >>>> will be called which changes the ACPI state to ACPI_STATE_D3_HOT. > >> > >> You mean PCI_D3hot I suppose? > >> > > > > Yes. It should be PCI_D3hot here. > > > >>>> 2. Parent bridge goes into runtime suspended state. If parent > >>>> bridge supports D3cold, then it will change the power state of all its > >>>> children to D3cold state and the power will be removed. > >>>> > >>>> 3. During wake-up time, the bridge will be runtime resumed first > >>>> and pci_power_up() will be called for the bridge. Now, the power > >>>> supply will be resumed. > >>>> > >>>> 4. pci_resume_bus() will be called which will internally invoke > >>>> pci_restore_standard_config(). pci_update_current_state() > >>>> will read PCI_PM_CTRL register and the current_state will be > >>>> updated to D0. > >>>> > >>>> In the above process, at step 4, the ACPI device state will still be > >>>> ACPI_STATE_D3_HOT since pci_platform_power_transition() is not being > >>>> invoked. > >> > >> I'm not quite following. > >> > >> I'm assuming that this description applies to the endpoint device that was > >> previously put into D3_hot. > >> > > > > Yes. This is applicable for endpoint devices which was previously put > > into D3hot. > > > >> Since its current state is D3_hot, it is not D0 (in particular) and the > >> pci_set_power_state() in pci_restore_standard_config() should put int into > >> D0 proper, including the platform firmware part. > >> > > > > The pci_restore_standard_config() for endpoint devices are being called > > internally during wake-up of upstream bridge. > > > > pci_power_up(struct pci_dev *dev) > > { > > ... > > if (dev->runtime_d3cold) { > > /* > > * When powering on a bridge from D3cold, the whole hierarchy > > * may be powered on into D0uninitialized state, resume them to > > * give them a chance to suspend again > > */ > > pci_resume_bus(dev->subordinate); > > } > > ... > > } > > > > For the upstream bridge, the above code will trigger the wake-up of > > endpoint devices and then following code will be executed for the > > endpoint devices: > > > > pci_update_current_state(struct pci_dev *dev, pci_power_t state) > > { > > if (platform_pci_get_power_state(dev) == PCI_D3cold || > > !pci_device_is_present(dev)) { > > dev->current_state = PCI_D3cold; > > } else if (dev->pm_cap) { > > u16 pmcsr; > > > > pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr); > > dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK); > > } else { > > dev->current_state = state; > > } > > } > > > > In the above code, the current_state will be set to D0 for the > > endpoint devices since it will go into second block where > > it will read the PM_CTRL register. > > > >>>> We need call the pci_platform_power_transition() with state > >>>> D0 to change the ACPI state to ACPI_STATE_D0. > >>>> > >>>> This patch calls pci_power_up() if current power state is D0 inside > >>>> pci_restore_standard_config(). This pci_power_up() will change the > >>>> ACPI state to ACPI_STATE_D0. > >>>> > >>>> Following are the steps to confirm: > >>>> > >>>> Enable the debug prints in acpi_pci_set_power_state() > >>>> > >>>> 0000:01:00.0 is PCI device and 0000:00:01.0 is parent bridge device > >>>> > >>>> Before: > >>>> > >>>> 0000:01:00.0: power state changed by ACPI to D3hot > >>>> 0000:00:01.0: power state changed by ACPI to D3cold > >>>> 0000:00:01.0: power state changed by ACPI to D0 > >>>> > >>>> After: > >>>> > >>>> 0000:01:00.0: power state changed by ACPI to D3hot > >>>> 0000:00:01.0: power state changed by ACPI to D3cold > >>>> 0000:00:01.0: power state changed by ACPI to D0 > >>>> 0000:01:00.0: power state changed by ACPI to D0 > >>>> > >>>> So with this patch, the PCI device ACPI state is also being > >>>> changed to D0. > >>>> > >>>> Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> > >>>> --- > >>>> drivers/pci/pci-driver.c | 14 +++++++++++--- > >>>> 1 file changed, 11 insertions(+), 3 deletions(-) > >>>> > >>>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c > >>>> index 588588cfda48..64e0cca12f16 100644 > >>>> --- a/drivers/pci/pci-driver.c > >>>> +++ b/drivers/pci/pci-driver.c > >>>> @@ -521,14 +521,22 @@ static void pci_device_shutdown(struct device *dev) > >>>> */ > >>>> static int pci_restore_standard_config(struct pci_dev *pci_dev) > >>>> { > >>>> + int error = 0; > >>>> pci_update_current_state(pci_dev, PCI_UNKNOWN); > >>>> > >>>> if (pci_dev->current_state != PCI_D0) { > >>>> - int error = pci_set_power_state(pci_dev, PCI_D0); > >>>> - if (error) > >>>> - return error; > >>>> + error = pci_set_power_state(pci_dev, PCI_D0); > >>>> + } else { > >>>> + /* > >>>> + * The platform power state can still be non-D0, so this is > >>>> + * required to change the platform power state to D0. > >>>> + */ > >> > >> This really isn't expected to happen. > >> > >> If the device's power state has been changed to D3hot by ACPI, it is not in D0. > >> > >> It looks like the state tracking is not working here. > >> > > > > The state setting to D0 is happening due to the current logic present in > > pci_update_current_state(). If we can fix the logic in > > pci_update_current_state() to detect this condition and return state D3hot, > > then it should also fix the issue. > > > > Thanks, > > Abhishek > > > > Hi Rafael/Mika, > > Could you please help regarding the correct way to fix this issue. > I can update the patch accordingly. I think you can try one of the patches posted recently: https://patchwork.kernel.org/project/linux-pm/patch/3623886.MHq7AAxBmi@kreacher/ Thanks!
On 4/5/2022 10:20 PM, Rafael J. Wysocki wrote: > On Tuesday, April 5, 2022 6:36:34 PM CEST Abhishek Sahu wrote: >> On 2/8/2022 4:00 PM, Abhishek Sahu wrote: >>> On 2/8/2022 12:28 AM, Rafael J. Wysocki wrote: >>>> On Saturday, February 5, 2022 12:32:19 AM CET Bjorn Helgaas wrote: >>>>> [+cc Rafael, hoping for your review :) >>>> >>>> +Mika >>>> >>>>> Wonder if we should add something like this to MAINTAINERS so you get >>>>> cc'd on power-related things: >>>>> >>>>> diff --git a/MAINTAINERS b/MAINTAINERS >>>>> index ea3e6c914384..3d9a211cad5d 100644 >>>>> --- a/MAINTAINERS >>>>> +++ b/MAINTAINERS >>>>> @@ -15422,6 +15422,7 @@ F: include/linux/pm.h >>>>> F: include/linux/pm_* >>>>> F: include/linux/powercap.h >>>>> F: kernel/configs/nopm.config >>>>> +K: pci_[a-z_]*power[a-z_]*\( >>>> >>>> It seems so, but generally PM patches should be CCed to linux-pm anyway. >>>> >>>>> >>>>> DYNAMIC THERMAL POWER MANAGEMENT (DTPM) >>>>> M: Daniel Lezcano <daniel.lezcano@kernel.org> >>>>> ] >>>>> >>>>> On Mon, Jan 24, 2022 at 05:51:07PM +0530, Abhishek Sahu wrote: >>>>>> Consider the following sequence during PCI device runtime >>>>>> suspend/resume: >>>>>> >>>>>> 1. PCI device goes into runtime suspended state. The PCI state >>>>>> will be changed to PCI_D0 and then pci_platform_power_transition() >>>>>> will be called which changes the ACPI state to ACPI_STATE_D3_HOT. >>>> >>>> You mean PCI_D3hot I suppose? >>>> >>> >>> Yes. It should be PCI_D3hot here. >>> >>>>>> 2. Parent bridge goes into runtime suspended state. If parent >>>>>> bridge supports D3cold, then it will change the power state of all its >>>>>> children to D3cold state and the power will be removed. >>>>>> >>>>>> 3. During wake-up time, the bridge will be runtime resumed first >>>>>> and pci_power_up() will be called for the bridge. Now, the power >>>>>> supply will be resumed. >>>>>> >>>>>> 4. pci_resume_bus() will be called which will internally invoke >>>>>> pci_restore_standard_config(). pci_update_current_state() >>>>>> will read PCI_PM_CTRL register and the current_state will be >>>>>> updated to D0. >>>>>> >>>>>> In the above process, at step 4, the ACPI device state will still be >>>>>> ACPI_STATE_D3_HOT since pci_platform_power_transition() is not being >>>>>> invoked. >>>> >>>> I'm not quite following. >>>> >>>> I'm assuming that this description applies to the endpoint device that was >>>> previously put into D3_hot. >>>> >>> >>> Yes. This is applicable for endpoint devices which was previously put >>> into D3hot. >>> >>>> Since its current state is D3_hot, it is not D0 (in particular) and the >>>> pci_set_power_state() in pci_restore_standard_config() should put int into >>>> D0 proper, including the platform firmware part. >>>> >>> >>> The pci_restore_standard_config() for endpoint devices are being called >>> internally during wake-up of upstream bridge. >>> >>> pci_power_up(struct pci_dev *dev) >>> { >>> ... >>> if (dev->runtime_d3cold) { >>> /* >>> * When powering on a bridge from D3cold, the whole hierarchy >>> * may be powered on into D0uninitialized state, resume them to >>> * give them a chance to suspend again >>> */ >>> pci_resume_bus(dev->subordinate); >>> } >>> ... >>> } >>> >>> For the upstream bridge, the above code will trigger the wake-up of >>> endpoint devices and then following code will be executed for the >>> endpoint devices: >>> >>> pci_update_current_state(struct pci_dev *dev, pci_power_t state) >>> { >>> if (platform_pci_get_power_state(dev) == PCI_D3cold || >>> !pci_device_is_present(dev)) { >>> dev->current_state = PCI_D3cold; >>> } else if (dev->pm_cap) { >>> u16 pmcsr; >>> >>> pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr); >>> dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK); >>> } else { >>> dev->current_state = state; >>> } >>> } >>> >>> In the above code, the current_state will be set to D0 for the >>> endpoint devices since it will go into second block where >>> it will read the PM_CTRL register. >>> >>>>>> We need call the pci_platform_power_transition() with state >>>>>> D0 to change the ACPI state to ACPI_STATE_D0. >>>>>> >>>>>> This patch calls pci_power_up() if current power state is D0 inside >>>>>> pci_restore_standard_config(). This pci_power_up() will change the >>>>>> ACPI state to ACPI_STATE_D0. >>>>>> >>>>>> Following are the steps to confirm: >>>>>> >>>>>> Enable the debug prints in acpi_pci_set_power_state() >>>>>> >>>>>> 0000:01:00.0 is PCI device and 0000:00:01.0 is parent bridge device >>>>>> >>>>>> Before: >>>>>> >>>>>> 0000:01:00.0: power state changed by ACPI to D3hot >>>>>> 0000:00:01.0: power state changed by ACPI to D3cold >>>>>> 0000:00:01.0: power state changed by ACPI to D0 >>>>>> >>>>>> After: >>>>>> >>>>>> 0000:01:00.0: power state changed by ACPI to D3hot >>>>>> 0000:00:01.0: power state changed by ACPI to D3cold >>>>>> 0000:00:01.0: power state changed by ACPI to D0 >>>>>> 0000:01:00.0: power state changed by ACPI to D0 >>>>>> >>>>>> So with this patch, the PCI device ACPI state is also being >>>>>> changed to D0. >>>>>> >>>>>> Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> >>>>>> --- >>>>>> drivers/pci/pci-driver.c | 14 +++++++++++--- >>>>>> 1 file changed, 11 insertions(+), 3 deletions(-) >>>>>> >>>>>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c >>>>>> index 588588cfda48..64e0cca12f16 100644 >>>>>> --- a/drivers/pci/pci-driver.c >>>>>> +++ b/drivers/pci/pci-driver.c >>>>>> @@ -521,14 +521,22 @@ static void pci_device_shutdown(struct device *dev) >>>>>> */ >>>>>> static int pci_restore_standard_config(struct pci_dev *pci_dev) >>>>>> { >>>>>> + int error = 0; >>>>>> pci_update_current_state(pci_dev, PCI_UNKNOWN); >>>>>> >>>>>> if (pci_dev->current_state != PCI_D0) { >>>>>> - int error = pci_set_power_state(pci_dev, PCI_D0); >>>>>> - if (error) >>>>>> - return error; >>>>>> + error = pci_set_power_state(pci_dev, PCI_D0); >>>>>> + } else { >>>>>> + /* >>>>>> + * The platform power state can still be non-D0, so this is >>>>>> + * required to change the platform power state to D0. >>>>>> + */ >>>> >>>> This really isn't expected to happen. >>>> >>>> If the device's power state has been changed to D3hot by ACPI, it is not in D0. >>>> >>>> It looks like the state tracking is not working here. >>>> >>> >>> The state setting to D0 is happening due to the current logic present in >>> pci_update_current_state(). If we can fix the logic in >>> pci_update_current_state() to detect this condition and return state D3hot, >>> then it should also fix the issue. >>> >>> Thanks, >>> Abhishek >>> >> >> Hi Rafael/Mika, >> >> Could you please help regarding the correct way to fix this issue. >> I can update the patch accordingly. > > I think you can try one of the patches posted recently: > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.kernel.org%2Fproject%2Flinux-pm%2Fpatch%2F3623886.MHq7AAxBmi%40kreacher%2F&data=04%7C01%7Cabhsahu%40nvidia.com%7Cae4c8574f5a44973514a08da172471d6%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637847743178405297%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=aasJ79EICVnlJQ4EbXA2AtZFW0qnRsMkHEZRI8mnDI8%3D&reserved=0 > > Thanks! > > > Thanks Rafael. I have applied both the changes and still the issue which I mentioned is happening. Following are the prints: 0000:01:00.0: power state changed by ACPI to D3hot 0000:00:01.0: power state changed by ACPI to D3cold 0000:00:01.0: power state changed by ACPI to D0 So ACPI state change is still not happening for PCI endpoint devices. Also, the I checked the code and the pci_power_up() will not be called for endpoint devices. For endpoints, pci_restore_standard_config() will be called first where the current state will come as D0. Thanks, Abhishek
On Wednesday, April 6, 2022 7:32:45 AM CEST Abhishek Sahu wrote: > On 4/5/2022 10:20 PM, Rafael J. Wysocki wrote: > > On Tuesday, April 5, 2022 6:36:34 PM CEST Abhishek Sahu wrote: > >> On 2/8/2022 4:00 PM, Abhishek Sahu wrote: > >>> On 2/8/2022 12:28 AM, Rafael J. Wysocki wrote: > >>>> On Saturday, February 5, 2022 12:32:19 AM CET Bjorn Helgaas wrote: > >>>>> [+cc Rafael, hoping for your review :) > >>>> > >>>> +Mika > >>>> > >>>>> Wonder if we should add something like this to MAINTAINERS so you get > >>>>> cc'd on power-related things: > >>>>> > >>>>> diff --git a/MAINTAINERS b/MAINTAINERS > >>>>> index ea3e6c914384..3d9a211cad5d 100644 > >>>>> --- a/MAINTAINERS > >>>>> +++ b/MAINTAINERS > >>>>> @@ -15422,6 +15422,7 @@ F: include/linux/pm.h > >>>>> F: include/linux/pm_* > >>>>> F: include/linux/powercap.h > >>>>> F: kernel/configs/nopm.config > >>>>> +K: pci_[a-z_]*power[a-z_]*\( > >>>> > >>>> It seems so, but generally PM patches should be CCed to linux-pm anyway. > >>>> > >>>>> > >>>>> DYNAMIC THERMAL POWER MANAGEMENT (DTPM) > >>>>> M: Daniel Lezcano <daniel.lezcano@kernel.org> > >>>>> ] > >>>>> > >>>>> On Mon, Jan 24, 2022 at 05:51:07PM +0530, Abhishek Sahu wrote: > >>>>>> Consider the following sequence during PCI device runtime > >>>>>> suspend/resume: > >>>>>> > >>>>>> 1. PCI device goes into runtime suspended state. The PCI state > >>>>>> will be changed to PCI_D0 and then pci_platform_power_transition() > >>>>>> will be called which changes the ACPI state to ACPI_STATE_D3_HOT. > >>>> > >>>> You mean PCI_D3hot I suppose? > >>>> > >>> > >>> Yes. It should be PCI_D3hot here. > >>> > >>>>>> 2. Parent bridge goes into runtime suspended state. If parent > >>>>>> bridge supports D3cold, then it will change the power state of all its > >>>>>> children to D3cold state and the power will be removed. > >>>>>> > >>>>>> 3. During wake-up time, the bridge will be runtime resumed first > >>>>>> and pci_power_up() will be called for the bridge. Now, the power > >>>>>> supply will be resumed. > >>>>>> > >>>>>> 4. pci_resume_bus() will be called which will internally invoke > >>>>>> pci_restore_standard_config(). pci_update_current_state() > >>>>>> will read PCI_PM_CTRL register and the current_state will be > >>>>>> updated to D0. > >>>>>> > >>>>>> In the above process, at step 4, the ACPI device state will still be > >>>>>> ACPI_STATE_D3_HOT since pci_platform_power_transition() is not being > >>>>>> invoked. > >>>> > >>>> I'm not quite following. > >>>> > >>>> I'm assuming that this description applies to the endpoint device that was > >>>> previously put into D3_hot. > >>>> > >>> > >>> Yes. This is applicable for endpoint devices which was previously put > >>> into D3hot. > >>> > >>>> Since its current state is D3_hot, it is not D0 (in particular) and the > >>>> pci_set_power_state() in pci_restore_standard_config() should put int into > >>>> D0 proper, including the platform firmware part. > >>>> > >>> > >>> The pci_restore_standard_config() for endpoint devices are being called > >>> internally during wake-up of upstream bridge. > >>> > >>> pci_power_up(struct pci_dev *dev) > >>> { > >>> ... > >>> if (dev->runtime_d3cold) { > >>> /* > >>> * When powering on a bridge from D3cold, the whole hierarchy > >>> * may be powered on into D0uninitialized state, resume them to > >>> * give them a chance to suspend again > >>> */ > >>> pci_resume_bus(dev->subordinate); > >>> } > >>> ... > >>> } > >>> > >>> For the upstream bridge, the above code will trigger the wake-up of > >>> endpoint devices and then following code will be executed for the > >>> endpoint devices: > >>> > >>> pci_update_current_state(struct pci_dev *dev, pci_power_t state) > >>> { > >>> if (platform_pci_get_power_state(dev) == PCI_D3cold || > >>> !pci_device_is_present(dev)) { > >>> dev->current_state = PCI_D3cold; > >>> } else if (dev->pm_cap) { > >>> u16 pmcsr; > >>> > >>> pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr); > >>> dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK); > >>> } else { > >>> dev->current_state = state; > >>> } > >>> } > >>> > >>> In the above code, the current_state will be set to D0 for the > >>> endpoint devices since it will go into second block where > >>> it will read the PM_CTRL register. > >>> > >>>>>> We need call the pci_platform_power_transition() with state > >>>>>> D0 to change the ACPI state to ACPI_STATE_D0. > >>>>>> > >>>>>> This patch calls pci_power_up() if current power state is D0 inside > >>>>>> pci_restore_standard_config(). This pci_power_up() will change the > >>>>>> ACPI state to ACPI_STATE_D0. > >>>>>> > >>>>>> Following are the steps to confirm: > >>>>>> > >>>>>> Enable the debug prints in acpi_pci_set_power_state() > >>>>>> > >>>>>> 0000:01:00.0 is PCI device and 0000:00:01.0 is parent bridge device > >>>>>> > >>>>>> Before: > >>>>>> > >>>>>> 0000:01:00.0: power state changed by ACPI to D3hot > >>>>>> 0000:00:01.0: power state changed by ACPI to D3cold > >>>>>> 0000:00:01.0: power state changed by ACPI to D0 > >>>>>> > >>>>>> After: > >>>>>> > >>>>>> 0000:01:00.0: power state changed by ACPI to D3hot > >>>>>> 0000:00:01.0: power state changed by ACPI to D3cold > >>>>>> 0000:00:01.0: power state changed by ACPI to D0 > >>>>>> 0000:01:00.0: power state changed by ACPI to D0 > >>>>>> > >>>>>> So with this patch, the PCI device ACPI state is also being > >>>>>> changed to D0. > >>>>>> > >>>>>> Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> > >>>>>> --- > >>>>>> drivers/pci/pci-driver.c | 14 +++++++++++--- > >>>>>> 1 file changed, 11 insertions(+), 3 deletions(-) > >>>>>> > >>>>>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c > >>>>>> index 588588cfda48..64e0cca12f16 100644 > >>>>>> --- a/drivers/pci/pci-driver.c > >>>>>> +++ b/drivers/pci/pci-driver.c > >>>>>> @@ -521,14 +521,22 @@ static void pci_device_shutdown(struct device *dev) > >>>>>> */ > >>>>>> static int pci_restore_standard_config(struct pci_dev *pci_dev) > >>>>>> { > >>>>>> + int error = 0; > >>>>>> pci_update_current_state(pci_dev, PCI_UNKNOWN); > >>>>>> > >>>>>> if (pci_dev->current_state != PCI_D0) { > >>>>>> - int error = pci_set_power_state(pci_dev, PCI_D0); > >>>>>> - if (error) > >>>>>> - return error; > >>>>>> + error = pci_set_power_state(pci_dev, PCI_D0); > >>>>>> + } else { > >>>>>> + /* > >>>>>> + * The platform power state can still be non-D0, so this is > >>>>>> + * required to change the platform power state to D0. > >>>>>> + */ > >>>> > >>>> This really isn't expected to happen. > >>>> > >>>> If the device's power state has been changed to D3hot by ACPI, it is not in D0. > >>>> > >>>> It looks like the state tracking is not working here. > >>>> > >>> > >>> The state setting to D0 is happening due to the current logic present in > >>> pci_update_current_state(). If we can fix the logic in > >>> pci_update_current_state() to detect this condition and return state D3hot, > >>> then it should also fix the issue. > >>> > >>> Thanks, > >>> Abhishek > >>> > >> > >> Hi Rafael/Mika, > >> > >> Could you please help regarding the correct way to fix this issue. > >> I can update the patch accordingly. > > > > I think you can try one of the patches posted recently: > > > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.kernel.org%2Fproject%2Flinux-pm%2Fpatch%2F3623886.MHq7AAxBmi%40kreacher%2F&data=04%7C01%7Cabhsahu%40nvidia.com%7Cae4c8574f5a44973514a08da172471d6%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637847743178405297%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=aasJ79EICVnlJQ4EbXA2AtZFW0qnRsMkHEZRI8mnDI8%3D&reserved=0 > > > > Thanks! > > > > > > > > Thanks Rafael. > I have applied both the changes and still the issue which I mentioned is happening. > > Following are the prints: > > 0000:01:00.0: power state changed by ACPI to D3hot > 0000:00:01.0: power state changed by ACPI to D3cold > 0000:00:01.0: power state changed by ACPI to D0 > > So ACPI state change is still not happening for PCI endpoint devices. > > Also, the I checked the code and the pci_power_up() will not be called > for endpoint devices. For endpoints, pci_restore_standard_config() will > be called first where the current state will come as D0. OK, I see. The problem is that if the PCI device goes to D0 because of the bridge power-up, it's ACPI companion's power state may not follow, which means that we really want to do a full power-up in there. Please test the appended patch with the patch from https://patchwork.kernel.org/project/linux-pm/patch/3623886.MHq7AAxBmi@kreacher/ still applied. --- drivers/pci/pci-driver.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: linux-pm/drivers/pci/pci-driver.c =================================================================== --- linux-pm.orig/drivers/pci/pci-driver.c +++ linux-pm/drivers/pci/pci-driver.c @@ -1312,7 +1312,7 @@ static int pci_pm_runtime_resume(struct * to a driver because although we left it in D0, it may have gone to * D3cold when the bridge above it runtime suspended. */ - pci_restore_standard_config(pci_dev); + pci_pm_default_resume_early(pci_dev); if (!pci_dev->driver) return 0;
On 4/6/2022 5:36 PM, Rafael J. Wysocki wrote: > On Wednesday, April 6, 2022 7:32:45 AM CEST Abhishek Sahu wrote: >> On 4/5/2022 10:20 PM, Rafael J. Wysocki wrote: >>> On Tuesday, April 5, 2022 6:36:34 PM CEST Abhishek Sahu wrote: >>>> On 2/8/2022 4:00 PM, Abhishek Sahu wrote: >>>>> On 2/8/2022 12:28 AM, Rafael J. Wysocki wrote: >>>>>> On Saturday, February 5, 2022 12:32:19 AM CET Bjorn Helgaas wrote: >>>>>>> [+cc Rafael, hoping for your review :) >>>>>> >>>>>> +Mika >>>>>> >>>>>>> Wonder if we should add something like this to MAINTAINERS so you get >>>>>>> cc'd on power-related things: >>>>>>> >>>>>>> diff --git a/MAINTAINERS b/MAINTAINERS >>>>>>> index ea3e6c914384..3d9a211cad5d 100644 >>>>>>> --- a/MAINTAINERS >>>>>>> +++ b/MAINTAINERS >>>>>>> @@ -15422,6 +15422,7 @@ F: include/linux/pm.h >>>>>>> F: include/linux/pm_* >>>>>>> F: include/linux/powercap.h >>>>>>> F: kernel/configs/nopm.config >>>>>>> +K: pci_[a-z_]*power[a-z_]*\( >>>>>> >>>>>> It seems so, but generally PM patches should be CCed to linux-pm anyway. >>>>>> >>>>>>> >>>>>>> DYNAMIC THERMAL POWER MANAGEMENT (DTPM) >>>>>>> M: Daniel Lezcano <daniel.lezcano@kernel.org> >>>>>>> ] >>>>>>> >>>>>>> On Mon, Jan 24, 2022 at 05:51:07PM +0530, Abhishek Sahu wrote: >>>>>>>> Consider the following sequence during PCI device runtime >>>>>>>> suspend/resume: >>>>>>>> >>>>>>>> 1. PCI device goes into runtime suspended state. The PCI state >>>>>>>> will be changed to PCI_D0 and then pci_platform_power_transition() >>>>>>>> will be called which changes the ACPI state to ACPI_STATE_D3_HOT. >>>>>> >>>>>> You mean PCI_D3hot I suppose? >>>>>> >>>>> >>>>> Yes. It should be PCI_D3hot here. >>>>> >>>>>>>> 2. Parent bridge goes into runtime suspended state. If parent >>>>>>>> bridge supports D3cold, then it will change the power state of all its >>>>>>>> children to D3cold state and the power will be removed. >>>>>>>> >>>>>>>> 3. During wake-up time, the bridge will be runtime resumed first >>>>>>>> and pci_power_up() will be called for the bridge. Now, the power >>>>>>>> supply will be resumed. >>>>>>>> >>>>>>>> 4. pci_resume_bus() will be called which will internally invoke >>>>>>>> pci_restore_standard_config(). pci_update_current_state() >>>>>>>> will read PCI_PM_CTRL register and the current_state will be >>>>>>>> updated to D0. >>>>>>>> >>>>>>>> In the above process, at step 4, the ACPI device state will still be >>>>>>>> ACPI_STATE_D3_HOT since pci_platform_power_transition() is not being >>>>>>>> invoked. >>>>>> >>>>>> I'm not quite following. >>>>>> >>>>>> I'm assuming that this description applies to the endpoint device that was >>>>>> previously put into D3_hot. >>>>>> >>>>> >>>>> Yes. This is applicable for endpoint devices which was previously put >>>>> into D3hot. >>>>> >>>>>> Since its current state is D3_hot, it is not D0 (in particular) and the >>>>>> pci_set_power_state() in pci_restore_standard_config() should put int into >>>>>> D0 proper, including the platform firmware part. >>>>>> >>>>> >>>>> The pci_restore_standard_config() for endpoint devices are being called >>>>> internally during wake-up of upstream bridge. >>>>> >>>>> pci_power_up(struct pci_dev *dev) >>>>> { >>>>> ... >>>>> if (dev->runtime_d3cold) { >>>>> /* >>>>> * When powering on a bridge from D3cold, the whole hierarchy >>>>> * may be powered on into D0uninitialized state, resume them to >>>>> * give them a chance to suspend again >>>>> */ >>>>> pci_resume_bus(dev->subordinate); >>>>> } >>>>> ... >>>>> } >>>>> >>>>> For the upstream bridge, the above code will trigger the wake-up of >>>>> endpoint devices and then following code will be executed for the >>>>> endpoint devices: >>>>> >>>>> pci_update_current_state(struct pci_dev *dev, pci_power_t state) >>>>> { >>>>> if (platform_pci_get_power_state(dev) == PCI_D3cold || >>>>> !pci_device_is_present(dev)) { >>>>> dev->current_state = PCI_D3cold; >>>>> } else if (dev->pm_cap) { >>>>> u16 pmcsr; >>>>> >>>>> pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr); >>>>> dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK); >>>>> } else { >>>>> dev->current_state = state; >>>>> } >>>>> } >>>>> >>>>> In the above code, the current_state will be set to D0 for the >>>>> endpoint devices since it will go into second block where >>>>> it will read the PM_CTRL register. >>>>> >>>>>>>> We need call the pci_platform_power_transition() with state >>>>>>>> D0 to change the ACPI state to ACPI_STATE_D0. >>>>>>>> >>>>>>>> This patch calls pci_power_up() if current power state is D0 inside >>>>>>>> pci_restore_standard_config(). This pci_power_up() will change the >>>>>>>> ACPI state to ACPI_STATE_D0. >>>>>>>> >>>>>>>> Following are the steps to confirm: >>>>>>>> >>>>>>>> Enable the debug prints in acpi_pci_set_power_state() >>>>>>>> >>>>>>>> 0000:01:00.0 is PCI device and 0000:00:01.0 is parent bridge device >>>>>>>> >>>>>>>> Before: >>>>>>>> >>>>>>>> 0000:01:00.0: power state changed by ACPI to D3hot >>>>>>>> 0000:00:01.0: power state changed by ACPI to D3cold >>>>>>>> 0000:00:01.0: power state changed by ACPI to D0 >>>>>>>> >>>>>>>> After: >>>>>>>> >>>>>>>> 0000:01:00.0: power state changed by ACPI to D3hot >>>>>>>> 0000:00:01.0: power state changed by ACPI to D3cold >>>>>>>> 0000:00:01.0: power state changed by ACPI to D0 >>>>>>>> 0000:01:00.0: power state changed by ACPI to D0 >>>>>>>> >>>>>>>> So with this patch, the PCI device ACPI state is also being >>>>>>>> changed to D0. >>>>>>>> >>>>>>>> Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> >>>>>>>> --- >>>>>>>> drivers/pci/pci-driver.c | 14 +++++++++++--- >>>>>>>> 1 file changed, 11 insertions(+), 3 deletions(-) >>>>>>>> >>>>>>>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c >>>>>>>> index 588588cfda48..64e0cca12f16 100644 >>>>>>>> --- a/drivers/pci/pci-driver.c >>>>>>>> +++ b/drivers/pci/pci-driver.c >>>>>>>> @@ -521,14 +521,22 @@ static void pci_device_shutdown(struct device *dev) >>>>>>>> */ >>>>>>>> static int pci_restore_standard_config(struct pci_dev *pci_dev) >>>>>>>> { >>>>>>>> + int error = 0; >>>>>>>> pci_update_current_state(pci_dev, PCI_UNKNOWN); >>>>>>>> >>>>>>>> if (pci_dev->current_state != PCI_D0) { >>>>>>>> - int error = pci_set_power_state(pci_dev, PCI_D0); >>>>>>>> - if (error) >>>>>>>> - return error; >>>>>>>> + error = pci_set_power_state(pci_dev, PCI_D0); >>>>>>>> + } else { >>>>>>>> + /* >>>>>>>> + * The platform power state can still be non-D0, so this is >>>>>>>> + * required to change the platform power state to D0. >>>>>>>> + */ >>>>>> >>>>>> This really isn't expected to happen. >>>>>> >>>>>> If the device's power state has been changed to D3hot by ACPI, it is not in D0. >>>>>> >>>>>> It looks like the state tracking is not working here. >>>>>> >>>>> >>>>> The state setting to D0 is happening due to the current logic present in >>>>> pci_update_current_state(). If we can fix the logic in >>>>> pci_update_current_state() to detect this condition and return state D3hot, >>>>> then it should also fix the issue. >>>>> >>>>> Thanks, >>>>> Abhishek >>>>> >>>> >>>> Hi Rafael/Mika, >>>> >>>> Could you please help regarding the correct way to fix this issue. >>>> I can update the patch accordingly. >>> >>> I think you can try one of the patches posted recently: >>> >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.kernel.org%2Fproject%2Flinux-pm%2Fpatch%2F3623886.MHq7AAxBmi%40kreacher%2F&data=04%7C01%7Cabhsahu%40nvidia.com%7C87470c4f30f14871bcc708da17c5e913%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637848436478123578%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=USJs87Fmm0Vcjm1YWszZCBTpcNQED3InOuOdGgE3f88%3D&reserved=0 >>> >>> Thanks! >>> >>> >>> >> >> Thanks Rafael. >> I have applied both the changes and still the issue which I mentioned is happening. >> >> Following are the prints: >> >> 0000:01:00.0: power state changed by ACPI to D3hot >> 0000:00:01.0: power state changed by ACPI to D3cold >> 0000:00:01.0: power state changed by ACPI to D0 >> >> So ACPI state change is still not happening for PCI endpoint devices. >> >> Also, the I checked the code and the pci_power_up() will not be called >> for endpoint devices. For endpoints, pci_restore_standard_config() will >> be called first where the current state will come as D0. > > OK, I see. > > The problem is that if the PCI device goes to D0 because of the bridge power-up, > it's ACPI companion's power state may not follow, which means that we really > want to do a full power-up in there. > > Please test the appended patch with the patch from > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.kernel.org%2Fproject%2Flinux-pm%2Fpatch%2F3623886.MHq7AAxBmi%40kreacher%2F&data=04%7C01%7Cabhsahu%40nvidia.com%7C87470c4f30f14871bcc708da17c5e913%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637848436478123578%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=USJs87Fmm0Vcjm1YWszZCBTpcNQED3InOuOdGgE3f88%3D&reserved=0 > > still applied. > > --- > drivers/pci/pci-driver.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > Index: linux-pm/drivers/pci/pci-driver.c > =================================================================== > --- linux-pm.orig/drivers/pci/pci-driver.c > +++ linux-pm/drivers/pci/pci-driver.c > @@ -1312,7 +1312,7 @@ static int pci_pm_runtime_resume(struct > * to a driver because although we left it in D0, it may have gone to > * D3cold when the bridge above it runtime suspended. > */ > - pci_restore_standard_config(pci_dev); > + pci_pm_default_resume_early(pci_dev); > > if (!pci_dev->driver) > return 0; > Thanks Rafael. With the above code change, the issue is getting fixed and the PCI endpoint power state is also changing to D0. 0000:01:00.0: power state changed by ACPI to D3hot 0000:00:01.0: power state changed by ACPI to D3cold 0000:00:01.0: power state changed by ACPI to D0 0000:01:00.0: power state changed by ACPI to D0 Regards, Abhishek
On Wednesday, April 6, 2022 8:43:19 PM CEST Abhishek Sahu wrote: > On 4/6/2022 5:36 PM, Rafael J. Wysocki wrote: > > On Wednesday, April 6, 2022 7:32:45 AM CEST Abhishek Sahu wrote: > >> On 4/5/2022 10:20 PM, Rafael J. Wysocki wrote: > >>> On Tuesday, April 5, 2022 6:36:34 PM CEST Abhishek Sahu wrote: > >>>> On 2/8/2022 4:00 PM, Abhishek Sahu wrote: > >>>>> On 2/8/2022 12:28 AM, Rafael J. Wysocki wrote: > >>>>>> On Saturday, February 5, 2022 12:32:19 AM CET Bjorn Helgaas wrote: > >>>>>>> [+cc Rafael, hoping for your review :) > >>>>>> > >>>>>> +Mika > >>>>>> > >>>>>>> Wonder if we should add something like this to MAINTAINERS so you get > >>>>>>> cc'd on power-related things: > >>>>>>> > >>>>>>> diff --git a/MAINTAINERS b/MAINTAINERS > >>>>>>> index ea3e6c914384..3d9a211cad5d 100644 > >>>>>>> --- a/MAINTAINERS > >>>>>>> +++ b/MAINTAINERS > >>>>>>> @@ -15422,6 +15422,7 @@ F: include/linux/pm.h > >>>>>>> F: include/linux/pm_* > >>>>>>> F: include/linux/powercap.h > >>>>>>> F: kernel/configs/nopm.config > >>>>>>> +K: pci_[a-z_]*power[a-z_]*\( > >>>>>> > >>>>>> It seems so, but generally PM patches should be CCed to linux-pm anyway. > >>>>>> > >>>>>>> > >>>>>>> DYNAMIC THERMAL POWER MANAGEMENT (DTPM) > >>>>>>> M: Daniel Lezcano <daniel.lezcano@kernel.org> > >>>>>>> ] > >>>>>>> > >>>>>>> On Mon, Jan 24, 2022 at 05:51:07PM +0530, Abhishek Sahu wrote: > >>>>>>>> Consider the following sequence during PCI device runtime > >>>>>>>> suspend/resume: > >>>>>>>> > >>>>>>>> 1. PCI device goes into runtime suspended state. The PCI state > >>>>>>>> will be changed to PCI_D0 and then pci_platform_power_transition() > >>>>>>>> will be called which changes the ACPI state to ACPI_STATE_D3_HOT. > >>>>>> > >>>>>> You mean PCI_D3hot I suppose? > >>>>>> > >>>>> > >>>>> Yes. It should be PCI_D3hot here. > >>>>> > >>>>>>>> 2. Parent bridge goes into runtime suspended state. If parent > >>>>>>>> bridge supports D3cold, then it will change the power state of all its > >>>>>>>> children to D3cold state and the power will be removed. > >>>>>>>> > >>>>>>>> 3. During wake-up time, the bridge will be runtime resumed first > >>>>>>>> and pci_power_up() will be called for the bridge. Now, the power > >>>>>>>> supply will be resumed. > >>>>>>>> > >>>>>>>> 4. pci_resume_bus() will be called which will internally invoke > >>>>>>>> pci_restore_standard_config(). pci_update_current_state() > >>>>>>>> will read PCI_PM_CTRL register and the current_state will be > >>>>>>>> updated to D0. > >>>>>>>> > >>>>>>>> In the above process, at step 4, the ACPI device state will still be > >>>>>>>> ACPI_STATE_D3_HOT since pci_platform_power_transition() is not being > >>>>>>>> invoked. > >>>>>> > >>>>>> I'm not quite following. > >>>>>> > >>>>>> I'm assuming that this description applies to the endpoint device that was > >>>>>> previously put into D3_hot. > >>>>>> > >>>>> > >>>>> Yes. This is applicable for endpoint devices which was previously put > >>>>> into D3hot. > >>>>> > >>>>>> Since its current state is D3_hot, it is not D0 (in particular) and the > >>>>>> pci_set_power_state() in pci_restore_standard_config() should put int into > >>>>>> D0 proper, including the platform firmware part. > >>>>>> > >>>>> > >>>>> The pci_restore_standard_config() for endpoint devices are being called > >>>>> internally during wake-up of upstream bridge. > >>>>> > >>>>> pci_power_up(struct pci_dev *dev) > >>>>> { > >>>>> ... > >>>>> if (dev->runtime_d3cold) { > >>>>> /* > >>>>> * When powering on a bridge from D3cold, the whole hierarchy > >>>>> * may be powered on into D0uninitialized state, resume them to > >>>>> * give them a chance to suspend again > >>>>> */ > >>>>> pci_resume_bus(dev->subordinate); > >>>>> } > >>>>> ... > >>>>> } > >>>>> > >>>>> For the upstream bridge, the above code will trigger the wake-up of > >>>>> endpoint devices and then following code will be executed for the > >>>>> endpoint devices: > >>>>> > >>>>> pci_update_current_state(struct pci_dev *dev, pci_power_t state) > >>>>> { > >>>>> if (platform_pci_get_power_state(dev) == PCI_D3cold || > >>>>> !pci_device_is_present(dev)) { > >>>>> dev->current_state = PCI_D3cold; > >>>>> } else if (dev->pm_cap) { > >>>>> u16 pmcsr; > >>>>> > >>>>> pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr); > >>>>> dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK); > >>>>> } else { > >>>>> dev->current_state = state; > >>>>> } > >>>>> } > >>>>> > >>>>> In the above code, the current_state will be set to D0 for the > >>>>> endpoint devices since it will go into second block where > >>>>> it will read the PM_CTRL register. > >>>>> > >>>>>>>> We need call the pci_platform_power_transition() with state > >>>>>>>> D0 to change the ACPI state to ACPI_STATE_D0. > >>>>>>>> > >>>>>>>> This patch calls pci_power_up() if current power state is D0 inside > >>>>>>>> pci_restore_standard_config(). This pci_power_up() will change the > >>>>>>>> ACPI state to ACPI_STATE_D0. > >>>>>>>> > >>>>>>>> Following are the steps to confirm: > >>>>>>>> > >>>>>>>> Enable the debug prints in acpi_pci_set_power_state() > >>>>>>>> > >>>>>>>> 0000:01:00.0 is PCI device and 0000:00:01.0 is parent bridge device > >>>>>>>> > >>>>>>>> Before: > >>>>>>>> > >>>>>>>> 0000:01:00.0: power state changed by ACPI to D3hot > >>>>>>>> 0000:00:01.0: power state changed by ACPI to D3cold > >>>>>>>> 0000:00:01.0: power state changed by ACPI to D0 > >>>>>>>> > >>>>>>>> After: > >>>>>>>> > >>>>>>>> 0000:01:00.0: power state changed by ACPI to D3hot > >>>>>>>> 0000:00:01.0: power state changed by ACPI to D3cold > >>>>>>>> 0000:00:01.0: power state changed by ACPI to D0 > >>>>>>>> 0000:01:00.0: power state changed by ACPI to D0 > >>>>>>>> > >>>>>>>> So with this patch, the PCI device ACPI state is also being > >>>>>>>> changed to D0. > >>>>>>>> > >>>>>>>> Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> > >>>>>>>> --- > >>>>>>>> drivers/pci/pci-driver.c | 14 +++++++++++--- > >>>>>>>> 1 file changed, 11 insertions(+), 3 deletions(-) > >>>>>>>> > >>>>>>>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c > >>>>>>>> index 588588cfda48..64e0cca12f16 100644 > >>>>>>>> --- a/drivers/pci/pci-driver.c > >>>>>>>> +++ b/drivers/pci/pci-driver.c > >>>>>>>> @@ -521,14 +521,22 @@ static void pci_device_shutdown(struct device *dev) > >>>>>>>> */ > >>>>>>>> static int pci_restore_standard_config(struct pci_dev *pci_dev) > >>>>>>>> { > >>>>>>>> + int error = 0; > >>>>>>>> pci_update_current_state(pci_dev, PCI_UNKNOWN); > >>>>>>>> > >>>>>>>> if (pci_dev->current_state != PCI_D0) { > >>>>>>>> - int error = pci_set_power_state(pci_dev, PCI_D0); > >>>>>>>> - if (error) > >>>>>>>> - return error; > >>>>>>>> + error = pci_set_power_state(pci_dev, PCI_D0); > >>>>>>>> + } else { > >>>>>>>> + /* > >>>>>>>> + * The platform power state can still be non-D0, so this is > >>>>>>>> + * required to change the platform power state to D0. > >>>>>>>> + */ > >>>>>> > >>>>>> This really isn't expected to happen. > >>>>>> > >>>>>> If the device's power state has been changed to D3hot by ACPI, it is not in D0. > >>>>>> > >>>>>> It looks like the state tracking is not working here. > >>>>>> > >>>>> > >>>>> The state setting to D0 is happening due to the current logic present in > >>>>> pci_update_current_state(). If we can fix the logic in > >>>>> pci_update_current_state() to detect this condition and return state D3hot, > >>>>> then it should also fix the issue. > >>>>> > >>>>> Thanks, > >>>>> Abhishek > >>>>> > >>>> > >>>> Hi Rafael/Mika, > >>>> > >>>> Could you please help regarding the correct way to fix this issue. > >>>> I can update the patch accordingly. > >>> > >>> I think you can try one of the patches posted recently: > >>> > >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.kernel.org%2Fproject%2Flinux-pm%2Fpatch%2F3623886.MHq7AAxBmi%40kreacher%2F&data=04%7C01%7Cabhsahu%40nvidia.com%7C87470c4f30f14871bcc708da17c5e913%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637848436478123578%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=USJs87Fmm0Vcjm1YWszZCBTpcNQED3InOuOdGgE3f88%3D&reserved=0 > >>> > >>> Thanks! > >>> > >>> > >>> > >> > >> Thanks Rafael. > >> I have applied both the changes and still the issue which I mentioned is happening. > >> > >> Following are the prints: > >> > >> 0000:01:00.0: power state changed by ACPI to D3hot > >> 0000:00:01.0: power state changed by ACPI to D3cold > >> 0000:00:01.0: power state changed by ACPI to D0 > >> > >> So ACPI state change is still not happening for PCI endpoint devices. > >> > >> Also, the I checked the code and the pci_power_up() will not be called > >> for endpoint devices. For endpoints, pci_restore_standard_config() will > >> be called first where the current state will come as D0. > > > > OK, I see. > > > > The problem is that if the PCI device goes to D0 because of the bridge power-up, > > it's ACPI companion's power state may not follow, which means that we really > > want to do a full power-up in there. > > > > Please test the appended patch with the patch from > > > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.kernel.org%2Fproject%2Flinux-pm%2Fpatch%2F3623886.MHq7AAxBmi%40kreacher%2F&data=04%7C01%7Cabhsahu%40nvidia.com%7C87470c4f30f14871bcc708da17c5e913%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637848436478123578%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=USJs87Fmm0Vcjm1YWszZCBTpcNQED3InOuOdGgE3f88%3D&reserved=0 > > > > still applied. > > > > --- > > drivers/pci/pci-driver.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > Index: linux-pm/drivers/pci/pci-driver.c > > =================================================================== > > --- linux-pm.orig/drivers/pci/pci-driver.c > > +++ linux-pm/drivers/pci/pci-driver.c > > @@ -1312,7 +1312,7 @@ static int pci_pm_runtime_resume(struct > > * to a driver because although we left it in D0, it may have gone to > > * D3cold when the bridge above it runtime suspended. > > */ > > - pci_restore_standard_config(pci_dev); > > + pci_pm_default_resume_early(pci_dev); > > > > if (!pci_dev->driver) > > return 0; > > > > Thanks Rafael. > With the above code change, the issue is getting fixed and the > PCI endpoint power state is also changing to D0. > > 0000:01:00.0: power state changed by ACPI to D3hot > 0000:00:01.0: power state changed by ACPI to D3cold > 0000:00:01.0: power state changed by ACPI to D0 > 0000:01:00.0: power state changed by ACPI to D0 Thanks for testing!
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 588588cfda48..64e0cca12f16 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -521,14 +521,22 @@ static void pci_device_shutdown(struct device *dev) */ static int pci_restore_standard_config(struct pci_dev *pci_dev) { + int error = 0; pci_update_current_state(pci_dev, PCI_UNKNOWN); if (pci_dev->current_state != PCI_D0) { - int error = pci_set_power_state(pci_dev, PCI_D0); - if (error) - return error; + error = pci_set_power_state(pci_dev, PCI_D0); + } else { + /* + * The platform power state can still be non-D0, so this is + * required to change the platform power state to D0. + */ + error = pci_power_up(pci_dev); } + if (error) + return error; + pci_restore_state(pci_dev); pci_pme_restore(pci_dev); return 0;
Consider the following sequence during PCI device runtime suspend/resume: 1. PCI device goes into runtime suspended state. The PCI state will be changed to PCI_D0 and then pci_platform_power_transition() will be called which changes the ACPI state to ACPI_STATE_D3_HOT. 2. Parent bridge goes into runtime suspended state. If parent bridge supports D3cold, then it will change the power state of all its children to D3cold state and the power will be removed. 3. During wake-up time, the bridge will be runtime resumed first and pci_power_up() will be called for the bridge. Now, the power supply will be resumed. 4. pci_resume_bus() will be called which will internally invoke pci_restore_standard_config(). pci_update_current_state() will read PCI_PM_CTRL register and the current_state will be updated to D0. In the above process, at step 4, the ACPI device state will still be ACPI_STATE_D3_HOT since pci_platform_power_transition() is not being invoked. We need call the pci_platform_power_transition() with state D0 to change the ACPI state to ACPI_STATE_D0. This patch calls pci_power_up() if current power state is D0 inside pci_restore_standard_config(). This pci_power_up() will change the ACPI state to ACPI_STATE_D0. Following are the steps to confirm: Enable the debug prints in acpi_pci_set_power_state() 0000:01:00.0 is PCI device and 0000:00:01.0 is parent bridge device Before: 0000:01:00.0: power state changed by ACPI to D3hot 0000:00:01.0: power state changed by ACPI to D3cold 0000:00:01.0: power state changed by ACPI to D0 After: 0000:01:00.0: power state changed by ACPI to D3hot 0000:00:01.0: power state changed by ACPI to D3cold 0000:00:01.0: power state changed by ACPI to D0 0000:01:00.0: power state changed by ACPI to D0 So with this patch, the PCI device ACPI state is also being changed to D0. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> --- drivers/pci/pci-driver.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)