Message ID | 20240925144526.2482-2-ville.syrjala@linux.intel.com (mailing list archive) |
---|---|
State | Changes Requested |
Series | drm/i915/pm: Clean up the hibernate vs. PCI D3 quirk |
On Wed, Sep 25, 2024 at 05:45:21PM +0300, Ville Syrjala wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>
> On some older laptops i915 needs to leave the GPU in
> D0 when hibernating the system, or else the BIOS
> hangs somewhere. Currently that is achieved by calling
> pci_save_state() ahead of time, which then skips the
> whole pci_prepare_to_sleep() stuff.

IIUC this refers to pci_pm_suspend_noirq(), which has this:

  if (!pci_dev->state_saved) {
          pci_save_state(pci_dev);
          if (!pci_dev->skip_bus_pm && pci_power_manageable(pci_dev))
                  pci_prepare_to_sleep(pci_dev);
  }

Would be good if the commit log included the name of the function
where pci_prepare_to_sleep() is skipped.

If there's a general requirement to leave all devices in D0 when
hibernating, it would be nice to have some documentation like an
ACPI spec reference.

Or if this is some i915-specific thing, maybe a pointer to history
like a lore or bugzilla reference.

> It feels to me that this approach could lead to unintended
> side effects as it causes the pci code to deviate from the
> standard path in various ways. In order to keep i915
> behaviour more standard it seems preferrable to use
> pci_dev->skip_bus_pm here. Duplicate the relevant logic
> from pci_pm_suspend_noirq() in pci_pm_poweroff_noirq().
>
> It also looks like the current code is may put the parent
> bridge into D3 despite leaving the device in D0. Though
> perhaps the host bridge (which is where the integrated
> GPU lives) always has subordinates, which would make
> this a non-issue for i915. But maybe this could be a
> problem for other devices. Utilizing skip_bus_pm will
> make the behaviour of leaving the bridge in D0 a bit
> more explicit if nothing else.

s/is may/may/

Rewrap to fill 75 columns. Could apply to all patches in the series.

Will need an ack from Rafael, author of:

  d491f2b75237 ("PCI: PM: Avoid possible suspend-to-idle issue")
  3e26c5feed2a ("PCI: PM: Skip devices in D0 for suspend-to-idle")

which added .skip_bus_pm and its use in pci_pm_suspend_noirq().

IIUC this is a cleanup that doesn't fix any known problem? The
overall diffstat doesn't make it look like a simplification, although
it might certainly be cleaner somehow:

>  drivers/gpu/drm/i915/i915_driver.c | 121 +++++++++++++++++++----------
>  drivers/pci/pci-driver.c           |  16 +++-
>  2 files changed, 94 insertions(+), 43 deletions(-)

> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Cc: linux-pci@vger.kernel.org
> Cc: intel-gfx@lists.freedesktop.org
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> ---
>  drivers/pci/pci-driver.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index f412ef73a6e4..ef436895939c 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1142,6 +1142,8 @@ static int pci_pm_poweroff(struct device *dev)
>  	struct pci_dev *pci_dev = to_pci_dev(dev);
>  	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
>
> +	pci_dev->skip_bus_pm = false;
> +
>  	if (pci_has_legacy_pm_support(pci_dev))
>  		return pci_legacy_suspend(dev, PMSG_HIBERNATE);
>
> @@ -1206,9 +1208,21 @@ static int pci_pm_poweroff_noirq(struct device *dev)
>  			return error;
>  	}
>
> -	if (!pci_dev->state_saved && !pci_has_subordinate(pci_dev))
> +	if (!pci_dev->state_saved && !pci_dev->skip_bus_pm &&
> +	    !pci_has_subordinate(pci_dev))
>  		pci_prepare_to_sleep(pci_dev);
>
> +	if (pci_dev->current_state == PCI_D0) {
> +		pci_dev->skip_bus_pm = true;
> +		/*
> +		 * Per PCI PM r1.2, table 6-1, a bridge must be in D0 if any
> +		 * downstream device is in D0, so avoid changing the power state
> +		 * of the parent bridge by setting the skip_bus_pm flag for it.
> +		 */
> +		if (pci_dev->bus->self)
> +			pci_dev->bus->self->skip_bus_pm = true;
> +	}
> +
>  	/*
>  	 * The reason for doing this here is the same as for the analogous code
>  	 * in pci_pm_suspend_noirq().
> --
> 2.44.2
>
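The gate quoted from pci_pm_suspend_noirq() can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: struct toy_dev, its fields, and the toy_* functions are hypothetical stand-ins for struct pci_dev and the kernel helpers, and pci_prepare_to_sleep() is assumed to end in D3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few struct pci_dev fields discussed above. */
struct toy_dev {
	bool state_saved;      /* set once config space has been saved */
	bool skip_bus_pm;      /* core flag: leave the power state alone */
	bool power_manageable; /* models pci_power_manageable() */
	bool in_d3;            /* assumed effect of pci_prepare_to_sleep() */
};

/* Models a driver calling pci_save_state() from its own suspend hook. */
void toy_driver_save_state(struct toy_dev *dev)
{
	assert(dev != NULL);
	dev->state_saved = true;
}

/* Models only the snippet quoted from pci_pm_suspend_noirq(). */
void toy_suspend_noirq(struct toy_dev *dev)
{
	assert(dev != NULL);
	if (!dev->state_saved) {
		dev->state_saved = true;   /* pci_save_state() */
		if (!dev->skip_bus_pm && dev->power_manageable)
			dev->in_d3 = true; /* pci_prepare_to_sleep() */
	}
}
```

In this model a device whose driver called toy_driver_save_state() first never reaches the power-down branch, which is the side effect the existing workaround relies on. Setting skip_bus_pm instead still lets the core save the state and only suppresses the power transition.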
On Wed, Sep 25, 2024 at 02:28:42PM -0500, Bjorn Helgaas wrote:
> On Wed, Sep 25, 2024 at 05:45:21PM +0300, Ville Syrjala wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> >
> > On some older laptops i915 needs to leave the GPU in
> > D0 when hibernating the system, or else the BIOS
> > hangs somewhere. Currently that is achieved by calling
> > pci_save_state() ahead of time, which then skips the
> > whole pci_prepare_to_sleep() stuff.
>
> IIUC this refers to pci_pm_suspend_noirq(), which has this:
>
>   if (!pci_dev->state_saved) {
>           pci_save_state(pci_dev);
>           if (!pci_dev->skip_bus_pm && pci_power_manageable(pci_dev))
>                   pci_prepare_to_sleep(pci_dev);
>   }
>
> Would be good if the commit log included the name of the function
> where pci_prepare_to_sleep() is skipped.

Sure, I can amend the commit msg.

>
> If there's a general requirement to leave all devices in D0 when
> hibernating, it would be nice to have some documentation like an
> ACPI spec reference.

No, IIRC the ACPI spec even says that you must (or at least
should) put devices into D3. But the buggy BIOS on some old
laptops keels over when you do that. Hence we need this quirk.

> Or if this is some i915-specific thing, maybe a pointer to history
> like a lore or bugzilla reference.

The two relevant commits I can find are:

commit 54875571bbfd ("drm/i915: apply the PCI_D0/D3 hibernation
workaround everywhere on pre GEN6")
commit ab3be73fa7b4 ("drm/i915: gen4: work around hang during
hibernation")

>
> > It feels to me that this approach could lead to unintended
> > side effects as it causes the pci code to deviate from the
> > standard path in various ways. In order to keep i915
> > behaviour more standard it seems preferrable to use
> > pci_dev->skip_bus_pm here. Duplicate the relevant logic
> > from pci_pm_suspend_noirq() in pci_pm_poweroff_noirq().
> >
> > It also looks like the current code is may put the parent
> > bridge into D3 despite leaving the device in D0. Though
> > perhaps the host bridge (which is where the integrated
> > GPU lives) always has subordinates, which would make
> > this a non-issue for i915. But maybe this could be a
> > problem for other devices. Utilizing skip_bus_pm will
> > make the behaviour of leaving the bridge in D0 a bit
> > more explicit if nothing else.
>
> s/is may/may/
>
> Rewrap to fill 75 columns. Could apply to all patches in the series.
>
> Will need an ack from Rafael, author of:
>
>   d491f2b75237 ("PCI: PM: Avoid possible suspend-to-idle issue")
>   3e26c5feed2a ("PCI: PM: Skip devices in D0 for suspend-to-idle")
>
> which added .skip_bus_pm and its use in pci_pm_suspend_noirq().
>
> IIUC this is a cleanup that doesn't fix any known problem? The
> overall diffstat doesn't make it look like a simplification, although
> it might certainly be cleaner somehow:

My main concern is that using pci_save_state() might cause the pci
code to deviate from the normal path in more ways than just skipping
the D0->D3 transition. And then we might end up constantly chasing
after driver/pci changes in order to match its behaviour.

Not to mention that having the pci_save_state() in the driver code
is clearly confusing a bunch of our developers.

>
> >  drivers/gpu/drm/i915/i915_driver.c | 121 +++++++++++++++++++----------
> >  drivers/pci/pci-driver.c           |  16 +++-
> >  2 files changed, 94 insertions(+), 43 deletions(-)
>
> > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > Cc: linux-pci@vger.kernel.org
> > Cc: intel-gfx@lists.freedesktop.org
> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > ---
> >  drivers/pci/pci-driver.c | 16 +++++++++++++++-
> >  1 file changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > index f412ef73a6e4..ef436895939c 100644
> > --- a/drivers/pci/pci-driver.c
> > +++ b/drivers/pci/pci-driver.c
> > @@ -1142,6 +1142,8 @@ static int pci_pm_poweroff(struct device *dev)
> >  	struct pci_dev *pci_dev = to_pci_dev(dev);
> >  	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> >
> > +	pci_dev->skip_bus_pm = false;
> > +
> >  	if (pci_has_legacy_pm_support(pci_dev))
> >  		return pci_legacy_suspend(dev, PMSG_HIBERNATE);
> >
> > @@ -1206,9 +1208,21 @@ static int pci_pm_poweroff_noirq(struct device *dev)
> >  			return error;
> >  	}
> >
> > -	if (!pci_dev->state_saved && !pci_has_subordinate(pci_dev))
> > +	if (!pci_dev->state_saved && !pci_dev->skip_bus_pm &&
> > +	    !pci_has_subordinate(pci_dev))
> >  		pci_prepare_to_sleep(pci_dev);
> >
> > +	if (pci_dev->current_state == PCI_D0) {
> > +		pci_dev->skip_bus_pm = true;
> > +		/*
> > +		 * Per PCI PM r1.2, table 6-1, a bridge must be in D0 if any
> > +		 * downstream device is in D0, so avoid changing the power state
> > +		 * of the parent bridge by setting the skip_bus_pm flag for it.
> > +		 */
> > +		if (pci_dev->bus->self)
> > +			pci_dev->bus->self->skip_bus_pm = true;
> > +	}
> > +
> >  	/*
> >  	 * The reason for doing this here is the same as for the analogous code
> >  	 * in pci_pm_suspend_noirq().
> > --
> > 2.44.2
> >
On Thu, Sep 26, 2024 at 07:03:16PM +0300, Ville Syrjälä wrote:
> On Wed, Sep 25, 2024 at 02:28:42PM -0500, Bjorn Helgaas wrote:
> > On Wed, Sep 25, 2024 at 05:45:21PM +0300, Ville Syrjala wrote:
> > > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > >
> > > On some older laptops i915 needs to leave the GPU in
> > > D0 when hibernating the system, or else the BIOS
> > > hangs somewhere. Currently that is achieved by calling
> > > pci_save_state() ahead of time, which then skips the
> > > whole pci_prepare_to_sleep() stuff.

> > If there's a general requirement to leave all devices in D0 when
> > hibernating, it would be nice to have some documentation like an
> > ACPI spec reference.
>
> No, IIRC the ACPI spec even says that you must (or at least
> should) put devices into D3. But the buggy BIOS on some old
> laptops keels over when you do that. Hence we need this quirk.

Can we include a reference to this part of the ACPI spec and some
details on which laptops have this issue?

I'm a little bit wary of changing the PCI core in a generic-looking
way on the basis of some unspecified buggy old BIOS. That feels like
something we're likely to break in the future.

> > Or if this is some i915-specific thing, maybe a pointer to history
> > like a lore or bugzilla reference.
>
> The two relevant commits I can find are:
>
> commit 54875571bbfd ("drm/i915: apply the PCI_D0/D3 hibernation
> workaround everywhere on pre GEN6")
> commit ab3be73fa7b4 ("drm/i915: gen4: work around hang during
> hibernation")

Thanks, this feels like important history to include somewhere.

> > IIUC this is a cleanup that doesn't fix any known problem? The
> > overall diffstat doesn't make it look like a simplification, although
> > it might certainly be cleaner somehow:
>
> My main concern is that using pci_save_state() might cause the pci
> code to deviate from the normal path in more ways than just skipping
> the D0->D3 transition. And then we might end up constantly chasing
> after driver/pci changes in order to match its behaviour.
>
> Not to mention that having the pci_save_state() in the driver code
> is clearly confusing a bunch of our developers.

I'm all in favor of removing pci_save_state() from drivers when
possible. I take it that this doesn't fix a functional issue.

Bjorn
On Mon, Sep 30, 2024 at 02:50:09PM -0500, Bjorn Helgaas wrote:
> On Thu, Sep 26, 2024 at 07:03:16PM +0300, Ville Syrjälä wrote:
> > On Wed, Sep 25, 2024 at 02:28:42PM -0500, Bjorn Helgaas wrote:
> > > On Wed, Sep 25, 2024 at 05:45:21PM +0300, Ville Syrjala wrote:
> > > > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > >
> > > > On some older laptops i915 needs to leave the GPU in
> > > > D0 when hibernating the system, or else the BIOS
> > > > hangs somewhere. Currently that is achieved by calling
> > > > pci_save_state() ahead of time, which then skips the
> > > > whole pci_prepare_to_sleep() stuff.
>
> > > If there's a general requirement to leave all devices in D0 when
> > > hibernating, it would be nice to have some documentation like an
> > > ACPI spec reference.
> >
> > No, IIRC the ACPI spec even says that you must (or at least
> > should) put devices into D3. But the buggy BIOS on some old
> > laptops keels over when you do that. Hence we need this quirk.
>
> Can we include a reference to this part of the ACPI spec

It's been years since I looked at that, but a quick trawl of the
ACPI 6.3 spec (what I had at hand) landed me this:

"7.4.2.5 System \_S4 State
 ...
 - Devices states are compatible with the current Power Resource
   states. In other words, all devices are in the D3 state when the
   system state is S4."

"16.1.6 Transitioning from the Working to the Sleeping State
 ...
 4. OSPM places all device drivers into their respective Dx state.
    If the device is enabled for wake, it enters the Dx state
    associated with the wake capability. If the device is not
    enabled to wake the system, it enters the D3 state."

> and some
> details on which laptops have this issue?

The known models are listed in a comment in i915 code (added in
the two mentioned commits), though I suspect there are probably
more because we couldn't find any obvious pattern why these
known models are affected.

>
> I'm a little bit wary of changing the PCI core in a generic-looking
> way on the basis of some unspecified buggy old BIOS. That feels like
> something we're likely to break in the future.
>
> > > Or if this is some i915-specific thing, maybe a pointer to history
> > > like a lore or bugzilla reference.
> >
> > The two relevant commits I can find are:
> >
> > commit 54875571bbfd ("drm/i915: apply the PCI_D0/D3 hibernation
> > workaround everywhere on pre GEN6")
> > commit ab3be73fa7b4 ("drm/i915: gen4: work around hang during
> > hibernation")
>
> Thanks, this feels like important history to include somewhere.
>
> > > IIUC this is a cleanup that doesn't fix any known problem? The
> > > overall diffstat doesn't make it look like a simplification, although
> > > it might certainly be cleaner somehow:
> >
> > My main concern is that using pci_save_state() might cause the pci
> > code to deviate from the normal path in more ways than just skipping
> > the D0->D3 transition. And then we might end up constantly chasing
> > after driver/pci changes in order to match its behaviour.
> >
> > Not to mention that having the pci_save_state() in the driver code
> > is clearly confusing a bunch of our developers.
>
> I'm all in favor of removing pci_save_state() from drivers when
> possible. I take it that this doesn't fix a functional issue.

No known issue so far. But we are probably going to add e.g. PME
support at some point, and the fact that pci_save_state() also
skips pci_enable_wake() makes me think we'd have to hand roll a
lot more stuff in the driver code if we keep using the
pci_save_state().

Though I suppose we could do the pci_save_state() only on those old
systems which won't have PME anyway.
On Wed, Sep 25, 2024 at 02:28:42PM -0500, Bjorn Helgaas wrote:
> On Wed, Sep 25, 2024 at 05:45:21PM +0300, Ville Syrjala wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> >
> > On some older laptops i915 needs to leave the GPU in
> > D0 when hibernating the system, or else the BIOS
> > hangs somewhere. Currently that is achieved by calling
> > pci_save_state() ahead of time, which then skips the
> > whole pci_prepare_to_sleep() stuff.
>
> IIUC this refers to pci_pm_suspend_noirq(), which has this:
>
>   if (!pci_dev->state_saved) {
>           pci_save_state(pci_dev);
>           if (!pci_dev->skip_bus_pm && pci_power_manageable(pci_dev))
>                   pci_prepare_to_sleep(pci_dev);
>   }
>
> Would be good if the commit log included the name of the function
> where pci_prepare_to_sleep() is skipped.
>
> If there's a general requirement to leave all devices in D0 when
> hibernating, it would be nice to have some documentation like an
> ACPI spec reference.
>
> Or if this is some i915-specific thing, maybe a pointer to history
> like a lore or bugzilla reference.
>
> > It feels to me that this approach could lead to unintended
> > side effects as it causes the pci code to deviate from the
> > standard path in various ways. In order to keep i915
> > behaviour more standard it seems preferrable to use
> > pci_dev->skip_bus_pm here. Duplicate the relevant logic
> > from pci_pm_suspend_noirq() in pci_pm_poweroff_noirq().
> >
> > It also looks like the current code is may put the parent
> > bridge into D3 despite leaving the device in D0. Though
> > perhaps the host bridge (which is where the integrated
> > GPU lives) always has subordinates, which would make
> > this a non-issue for i915. But maybe this could be a
> > problem for other devices. Utilizing skip_bus_pm will
> > make the behaviour of leaving the bridge in D0 a bit
> > more explicit if nothing else.
>
> s/is may/may/
>
> Rewrap to fill 75 columns. Could apply to all patches in the series.
>
> Will need an ack from Rafael, author of:
>
>   d491f2b75237 ("PCI: PM: Avoid possible suspend-to-idle issue")
>   3e26c5feed2a ("PCI: PM: Skip devices in D0 for suspend-to-idle")
>
> which added .skip_bus_pm and its use in pci_pm_suspend_noirq().

Rafael, any thoughts on this stuff?

>
> IIUC this is a cleanup that doesn't fix any known problem? The
> overall diffstat doesn't make it look like a simplification, although
> it might certainly be cleaner somehow:
>
> >  drivers/gpu/drm/i915/i915_driver.c | 121 +++++++++++++++++++----------
> >  drivers/pci/pci-driver.c           |  16 +++-
> >  2 files changed, 94 insertions(+), 43 deletions(-)
>
> > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > Cc: linux-pci@vger.kernel.org
> > Cc: intel-gfx@lists.freedesktop.org
> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > ---
> >  drivers/pci/pci-driver.c | 16 +++++++++++++++-
> >  1 file changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > index f412ef73a6e4..ef436895939c 100644
> > --- a/drivers/pci/pci-driver.c
> > +++ b/drivers/pci/pci-driver.c
> > @@ -1142,6 +1142,8 @@ static int pci_pm_poweroff(struct device *dev)
> >  	struct pci_dev *pci_dev = to_pci_dev(dev);
> >  	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> >
> > +	pci_dev->skip_bus_pm = false;
> > +
> >  	if (pci_has_legacy_pm_support(pci_dev))
> >  		return pci_legacy_suspend(dev, PMSG_HIBERNATE);
> >
> > @@ -1206,9 +1208,21 @@ static int pci_pm_poweroff_noirq(struct device *dev)
> >  			return error;
> >  	}
> >
> > -	if (!pci_dev->state_saved && !pci_has_subordinate(pci_dev))
> > +	if (!pci_dev->state_saved && !pci_dev->skip_bus_pm &&
> > +	    !pci_has_subordinate(pci_dev))
> >  		pci_prepare_to_sleep(pci_dev);
> >
> > +	if (pci_dev->current_state == PCI_D0) {
> > +		pci_dev->skip_bus_pm = true;
> > +		/*
> > +		 * Per PCI PM r1.2, table 6-1, a bridge must be in D0 if any
> > +		 * downstream device is in D0, so avoid changing the power state
> > +		 * of the parent bridge by setting the skip_bus_pm flag for it.
> > +		 */
> > +		if (pci_dev->bus->self)
> > +			pci_dev->bus->self->skip_bus_pm = true;
> > +	}
> > +
> >  	/*
> >  	 * The reason for doing this here is the same as for the analogous code
> >  	 * in pci_pm_suspend_noirq().
> > --
> > 2.44.2
> >
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index f412ef73a6e4..ef436895939c 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1142,6 +1142,8 @@ static int pci_pm_poweroff(struct device *dev)
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
 
+	pci_dev->skip_bus_pm = false;
+
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend(dev, PMSG_HIBERNATE);
 
@@ -1206,9 +1208,21 @@ static int pci_pm_poweroff_noirq(struct device *dev)
 			return error;
 	}
 
-	if (!pci_dev->state_saved && !pci_has_subordinate(pci_dev))
+	if (!pci_dev->state_saved && !pci_dev->skip_bus_pm &&
+	    !pci_has_subordinate(pci_dev))
 		pci_prepare_to_sleep(pci_dev);
 
+	if (pci_dev->current_state == PCI_D0) {
+		pci_dev->skip_bus_pm = true;
+		/*
+		 * Per PCI PM r1.2, table 6-1, a bridge must be in D0 if any
+		 * downstream device is in D0, so avoid changing the power state
+		 * of the parent bridge by setting the skip_bus_pm flag for it.
+		 */
+		if (pci_dev->bus->self)
+			pci_dev->bus->self->skip_bus_pm = true;
+	}
+
 	/*
 	 * The reason for doing this here is the same as for the analogous code
 	 * in pci_pm_suspend_noirq().
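
To make the effect of the two hunks easier to follow, here is a userspace model of the patched logic. It is a sketch under stated assumptions rather than kernel code: struct toy_pci_dev, enum toy_state, and the toy_* functions are hypothetical, parent_bridge stands in for pci_dev->bus->self, and pci_prepare_to_sleep() is assumed to leave the device in D3. How a driver would request the skip (setting skip_bus_pm from its own hook) is also an assumption; the thread only says that i915 would use pci_dev->skip_bus_pm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_state { TOY_D0, TOY_D3 };

/* Hypothetical stand-in for struct pci_dev; only fields the patch reads. */
struct toy_pci_dev {
	struct toy_pci_dev *parent_bridge; /* models pci_dev->bus->self */
	bool state_saved;
	bool skip_bus_pm;
	bool has_subordinate;              /* models pci_has_subordinate() */
	enum toy_state current_state;
};

/* Models the line added to pci_pm_poweroff(): reset the flag each cycle. */
void toy_poweroff(struct toy_pci_dev *dev)
{
	assert(dev != NULL);
	dev->skip_bus_pm = false;
}

/* Models the patched hunk of pci_pm_poweroff_noirq(). */
void toy_poweroff_noirq(struct toy_pci_dev *dev)
{
	assert(dev != NULL);
	if (!dev->state_saved && !dev->skip_bus_pm && !dev->has_subordinate)
		dev->current_state = TOY_D3; /* assumed pci_prepare_to_sleep() result */

	if (dev->current_state == TOY_D0) {
		dev->skip_bus_pm = true;
		/* Keep the upstream bridge in D0 as well. */
		if (dev->parent_bridge)
			dev->parent_bridge->skip_bus_pm = true;
	}
}
```

The model depends on the child being handled before its parent bridge in the noirq phase, so that the bridge already carries skip_bus_pm when its own callback runs. A device with no flag set and no subordinate bus still goes to D3 as before.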