Message ID | 20240416043225.1462548-2-kai.heng.feng@canonical.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | [v8,1/3] PCI: Add helper to check if any of ancestor device support D3cold | expand |
On 4/15/24 9:32 PM, Kai-Heng Feng wrote: > When the power rail gets cut off, the hardware can create some electric > noise on the link that triggers AER. If IRQ is shared between AER with > PME, such AER noise will cause a spurious wakeup on system suspend. > > When the power rail gets back, the firmware of the device resets itself > and can create unexpected behavior like sending PTM messages. For this > case, the driver will always be too late to toggle off features should > be disabled. > > As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power > Management", TLP and DLLP transmission are disabled for a Link in L2/L3 > Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold) states. So if > the power will be turned off during suspend process, disable AER service > and re-enable it during the resume process. This should not affect the > basic functionality. > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=209149 > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218090 > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> > --- Looks good to me. Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> > v8: > - Add more bug reports. > > v7: > - Wording > - Disable AER completely (again) if power will be turned off > > v6: > v5: > - Wording. > > v4: > v3: > - No change. > > v2: > - Only disable AER IRQ. > - No more check on PME IRQ#. > - Use helper. > > drivers/pci/pcie/aer.c | 25 +++++++++++++++++++++++++ > 1 file changed, 25 insertions(+) > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > index ac6293c24976..bea7818c2d1b 100644 > --- a/drivers/pci/pcie/aer.c > +++ b/drivers/pci/pcie/aer.c > @@ -28,6 +28,7 @@ > #include <linux/delay.h> > #include <linux/kfifo.h> > #include <linux/slab.h> > +#include <linux/suspend.h> > #include <acpi/apei.h> > #include <acpi/ghes.h> > #include <ras/ras_event.h> > @@ -1497,6 +1498,28 @@ static int aer_probe(struct pcie_device *dev) > return 0; > } > > +static int aer_suspend(struct pcie_device *dev) > +{ > + struct aer_rpc *rpc = get_service_data(dev); > + struct pci_dev *pdev = rpc->rpd; > + > + if (pci_ancestor_pr3_present(pdev) || pm_suspend_via_firmware()) > + aer_disable_rootport(rpc); > + > + return 0; > +} > + > +static int aer_resume(struct pcie_device *dev) > +{ > + struct aer_rpc *rpc = get_service_data(dev); > + struct pci_dev *pdev = rpc->rpd; > + > + if (pci_ancestor_pr3_present(pdev) || pm_resume_via_firmware()) > + aer_enable_rootport(rpc); > + > + return 0; > +} > + > /** > * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP > * @dev: pointer to Root Port, RCEC, or RCiEP > @@ -1561,6 +1584,8 @@ static struct pcie_port_service_driver aerdriver = { > .service = PCIE_PORT_SERVICE_AER, > > .probe = aer_probe, > + .suspend = aer_suspend, > + .resume = aer_resume, > .remove = aer_remove, > }; >
On Tue, Apr 16, 2024 at 12:32:24PM +0800, Kai-Heng Feng wrote: > When the power rail gets cut off, the hardware can create some electric > noise on the link that triggers AER. If IRQ is shared between AER with > PME, such AER noise will cause a spurious wakeup on system suspend. > > When the power rail gets back, the firmware of the device resets itself > and can create unexpected behavior like sending PTM messages. For this > case, the driver will always be too late to toggle off features should > be disabled. > > As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power > Management", TLP and DLLP transmission are disabled for a Link in L2/L3 > Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold) states. So if > the power will be turned off during suspend process, disable AER service > and re-enable it during the resume process. This should not affect the > basic functionality. > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=209149 > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218090 > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Thanks for reviving this series. I tried follow the history about this, but there are at least two series that were very similar and I can't put it all together. > --- > v8: > - Add more bug reports. > > v7: > - Wording > - Disable AER completely (again) if power will be turned off > > v6: > v5: > - Wording. > > v4: > v3: > - No change. > > v2: > - Only disable AER IRQ. > - No more check on PME IRQ#. > - Use helper. > > drivers/pci/pcie/aer.c | 25 +++++++++++++++++++++++++ > 1 file changed, 25 insertions(+) > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > index ac6293c24976..bea7818c2d1b 100644 > --- a/drivers/pci/pcie/aer.c > +++ b/drivers/pci/pcie/aer.c > @@ -28,6 +28,7 @@ > #include <linux/delay.h> > #include <linux/kfifo.h> > #include <linux/slab.h> > +#include <linux/suspend.h> > #include <acpi/apei.h> > #include <acpi/ghes.h> > #include <ras/ras_event.h> > @@ -1497,6 +1498,28 @@ static int aer_probe(struct pcie_device *dev) > return 0; > } > > +static int aer_suspend(struct pcie_device *dev) > +{ > + struct aer_rpc *rpc = get_service_data(dev); > + struct pci_dev *pdev = rpc->rpd; > + > + if (pci_ancestor_pr3_present(pdev) || pm_suspend_via_firmware()) > + aer_disable_rootport(rpc); Why do we check pci_ancestor_pr3_present(pdev) and pm_suspend_via_firmware()? I'm getting pretty convinced that we need to disable AER interrupts on suspend in general. I think it will be better if we do that consistently on all platforms, not special cases based on details of how we suspend. Also, why do we use aer_disable_rootport() instead of just aer_disable_irq()? I think it's the interrupt that causes issues on suspend. I see that there *were* some versions that used aer_disable_irq(), but I can't find the reason it changed. > + > + return 0; > +} > + > +static int aer_resume(struct pcie_device *dev) > +{ > + struct aer_rpc *rpc = get_service_data(dev); > + struct pci_dev *pdev = rpc->rpd; > + > + if (pci_ancestor_pr3_present(pdev) || pm_resume_via_firmware()) > + aer_enable_rootport(rpc); > + > + return 0; > +} > + > /** > * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP > * @dev: pointer to Root Port, RCEC, or RCiEP > @@ -1561,6 +1584,8 @@ static struct pcie_port_service_driver aerdriver = { > .service = PCIE_PORT_SERVICE_AER, > > .probe = aer_probe, > + .suspend = aer_suspend, > + .resume = aer_resume, > .remove = aer_remove, > }; > > -- > 2.34.1 >
On Fri, Apr 19, 2024 at 4:35 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Tue, Apr 16, 2024 at 12:32:24PM +0800, Kai-Heng Feng wrote: > > When the power rail gets cut off, the hardware can create some electric > > noise on the link that triggers AER. If IRQ is shared between AER with > > PME, such AER noise will cause a spurious wakeup on system suspend. > > > > When the power rail gets back, the firmware of the device resets itself > > and can create unexpected behavior like sending PTM messages. For this > > case, the driver will always be too late to toggle off features should > > be disabled. > > > > As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power > > Management", TLP and DLLP transmission are disabled for a Link in L2/L3 > > Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold) states. So if > > the power will be turned off during suspend process, disable AER service > > and re-enable it during the resume process. This should not affect the > > basic functionality. > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=209149 > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218090 > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> > > Thanks for reviving this series. I tried follow the history about > this, but there are at least two series that were very similar and I > can't put it all together. > > > --- > > v8: > > - Add more bug reports. > > > > v7: > > - Wording > > - Disable AER completely (again) if power will be turned off > > > > v6: > > v5: > > - Wording. > > > > v4: > > v3: > > - No change. > > > > v2: > > - Only disable AER IRQ. > > - No more check on PME IRQ#. > > - Use helper. > > > > drivers/pci/pcie/aer.c | 25 +++++++++++++++++++++++++ > > 1 file changed, 25 insertions(+) > > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > > index ac6293c24976..bea7818c2d1b 100644 > > --- a/drivers/pci/pcie/aer.c > > +++ b/drivers/pci/pcie/aer.c > > @@ -28,6 +28,7 @@ > > #include <linux/delay.h> > > #include <linux/kfifo.h> > > #include <linux/slab.h> > > +#include <linux/suspend.h> > > #include <acpi/apei.h> > > #include <acpi/ghes.h> > > #include <ras/ras_event.h> > > @@ -1497,6 +1498,28 @@ static int aer_probe(struct pcie_device *dev) > > return 0; > > } > > > > +static int aer_suspend(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + struct pci_dev *pdev = rpc->rpd; > > + > > + if (pci_ancestor_pr3_present(pdev) || pm_suspend_via_firmware()) > > + aer_disable_rootport(rpc); > > Why do we check pci_ancestor_pr3_present(pdev) and > pm_suspend_via_firmware()? I'm getting pretty convinced that we need > to disable AER interrupts on suspend in general. I think it will be > better if we do that consistently on all platforms, not special cases > based on details of how we suspend. Sure. Will change in next revision. > > Also, why do we use aer_disable_rootport() instead of just > aer_disable_irq()? I think it's the interrupt that causes issues on > suspend. I see that there *were* some versions that used > aer_disable_irq(), but I can't find the reason it changed. Interrupt can cause system wakeup, if it's shared with PME. The reason why aer_disable_rootport() is used over aer_disable_irq() is that when the latter is used the error still gets logged during sleep cycle. Once the pcieport driver resumes, it invokes aer_root_reset() to reset the hierarchy, while the hierarchy hasn't resumed yet. So use aer_disable_rootport() to prevent such issue from happening. Kai-Heng > > > + > > + return 0; > > +} > > + > > +static int aer_resume(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + struct pci_dev *pdev = rpc->rpd; > > + > > + if (pci_ancestor_pr3_present(pdev) || pm_resume_via_firmware()) > > + aer_enable_rootport(rpc); > > + > > + return 0; > > +} > > + > > /** > > * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP > > * @dev: pointer to Root Port, RCEC, or RCiEP > > @@ -1561,6 +1584,8 @@ static struct pcie_port_service_driver aerdriver = { > > .service = PCIE_PORT_SERVICE_AER, > > > > .probe = aer_probe, > > + .suspend = aer_suspend, > > + .resume = aer_resume, > > .remove = aer_remove, > > }; > > > > -- > > 2.34.1 > >
On Thu, Apr 25, 2024 at 03:33:01PM +0800, Kai-Heng Feng wrote: > On Fri, Apr 19, 2024 at 4:35 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > On Tue, Apr 16, 2024 at 12:32:24PM +0800, Kai-Heng Feng wrote: > > > When the power rail gets cut off, the hardware can create some electric > > > noise on the link that triggers AER. If IRQ is shared between AER with > > > PME, such AER noise will cause a spurious wakeup on system suspend. > > > > > > When the power rail gets back, the firmware of the device resets itself > > > and can create unexpected behavior like sending PTM messages. For this > > > case, the driver will always be too late to toggle off features should > > > be disabled. > > > > > > As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power > > > Management", TLP and DLLP transmission are disabled for a Link in L2/L3 > > > Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold) states. So if > > > the power will be turned off during suspend process, disable AER service > > > and re-enable it during the resume process. This should not affect the > > > basic functionality. > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=209149 > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218090 > > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> > > > > Thanks for reviving this series. I tried follow the history about > > this, but there are at least two series that were very similar and I > > can't put it all together. > > > > > --- > > > v8: > > > - Add more bug reports. > > > > > > v7: > > > - Wording > > > - Disable AER completely (again) if power will be turned off > > > > > > v6: > > > v5: > > > - Wording. > > > > > > v4: > > > v3: > > > - No change. > > > > > > v2: > > > - Only disable AER IRQ. > > > - No more check on PME IRQ#. > > > - Use helper. > > > > > > drivers/pci/pcie/aer.c | 25 +++++++++++++++++++++++++ > > > 1 file changed, 25 insertions(+) > > > > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > > > index ac6293c24976..bea7818c2d1b 100644 > > > --- a/drivers/pci/pcie/aer.c > > > +++ b/drivers/pci/pcie/aer.c > > > @@ -28,6 +28,7 @@ > > > #include <linux/delay.h> > > > #include <linux/kfifo.h> > > > #include <linux/slab.h> > > > +#include <linux/suspend.h> > > > #include <acpi/apei.h> > > > #include <acpi/ghes.h> > > > #include <ras/ras_event.h> > > > @@ -1497,6 +1498,28 @@ static int aer_probe(struct pcie_device *dev) > > > return 0; > > > } > > > > > > +static int aer_suspend(struct pcie_device *dev) > > > +{ > > > + struct aer_rpc *rpc = get_service_data(dev); > > > + struct pci_dev *pdev = rpc->rpd; > > > + > > > + if (pci_ancestor_pr3_present(pdev) || pm_suspend_via_firmware()) > > > + aer_disable_rootport(rpc); > > > > Why do we check pci_ancestor_pr3_present(pdev) and > > pm_suspend_via_firmware()? I'm getting pretty convinced that we need > > to disable AER interrupts on suspend in general. I think it will be > > better if we do that consistently on all platforms, not special cases > > based on details of how we suspend. > > Sure. Will change in next revision. > > > Also, why do we use aer_disable_rootport() instead of just > > aer_disable_irq()? I think it's the interrupt that causes issues on > > suspend. I see that there *were* some versions that used > > aer_disable_irq(), but I can't find the reason it changed. > > Interrupt can cause system wakeup, if it's shared with PME. > > The reason why aer_disable_rootport() is used over aer_disable_irq() > is that when the latter is used the error still gets logged during > sleep cycle. Once the pcieport driver resumes, it invokes > aer_root_reset() to reset the hierarchy, while the hierarchy hasn't > resumed yet. > > So use aer_disable_rootport() to prevent such issue from happening. I think the issue is more likely on the resume side. aer_disable_rootport() disables AER interrupts, then clears PCI_ERR_ROOT_STATUS, so the path looks like this: aer_suspend aer_disable_rootport aer_disable_irq() pci_write_config_dword(PCI_ERR_ROOT_STATUS) # clear This happens during suspend, so at this point I think the link is still active and the spurious AER errors haven't happened yet and it probably doesn't matter that we clear PCI_ERR_ROOT_STATUS *here*. My guess is that what really matters is that we disable the AER interrupt so it doesn't happen during suspend, and then when we resume, we probably want to clear out the status registers before re-enabling the AER interrupt. In any event, I think we need to push this forward. I'll post a v9 based on this but dropping the pci_ancestor_pr3_present(pdev) and pm_suspend_via_firmware() tests so we do this unconditionally. > > > + return 0; > > > +} > > > + > > > +static int aer_resume(struct pcie_device *dev) > > > +{ > > > + struct aer_rpc *rpc = get_service_data(dev); > > > + struct pci_dev *pdev = rpc->rpd; > > > + > > > + if (pci_ancestor_pr3_present(pdev) || pm_resume_via_firmware()) > > > + aer_enable_rootport(rpc); > > > + > > > + return 0; > > > +} > > > + > > > /** > > > * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP > > > * @dev: pointer to Root Port, RCEC, or RCiEP > > > @@ -1561,6 +1584,8 @@ static struct pcie_port_service_driver aerdriver = { > > > .service = PCIE_PORT_SERVICE_AER, > > > > > > .probe = aer_probe, > > > + .suspend = aer_suspend, > > > + .resume = aer_resume, > > > .remove = aer_remove, > > > }; > > > > > > -- > > > 2.34.1 > > >
On Wed, Jun 19, 2024 at 4:48 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Thu, Apr 25, 2024 at 03:33:01PM +0800, Kai-Heng Feng wrote: > > On Fri, Apr 19, 2024 at 4:35 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > > > On Tue, Apr 16, 2024 at 12:32:24PM +0800, Kai-Heng Feng wrote: > > > > When the power rail gets cut off, the hardware can create some electric > > > > noise on the link that triggers AER. If IRQ is shared between AER with > > > > PME, such AER noise will cause a spurious wakeup on system suspend. > > > > > > > > When the power rail gets back, the firmware of the device resets itself > > > > and can create unexpected behavior like sending PTM messages. For this > > > > case, the driver will always be too late to toggle off features should > > > > be disabled. > > > > > > > > As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power > > > > Management", TLP and DLLP transmission are disabled for a Link in L2/L3 > > > > Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold) states. So if > > > > the power will be turned off during suspend process, disable AER service > > > > and re-enable it during the resume process. This should not affect the > > > > basic functionality. > > > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=209149 > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218090 > > > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> > > > > > > Thanks for reviving this series. I tried follow the history about > > > this, but there are at least two series that were very similar and I > > > can't put it all together. > > > > > > > --- > > > > v8: > > > > - Add more bug reports. > > > > > > > > v7: > > > > - Wording > > > > - Disable AER completely (again) if power will be turned off > > > > > > > > v6: > > > > v5: > > > > - Wording. > > > > > > > > v4: > > > > v3: > > > > - No change. > > > > > > > > v2: > > > > - Only disable AER IRQ. > > > > - No more check on PME IRQ#. > > > > - Use helper. > > > > > > > > drivers/pci/pcie/aer.c | 25 +++++++++++++++++++++++++ > > > > 1 file changed, 25 insertions(+) > > > > > > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > > > > index ac6293c24976..bea7818c2d1b 100644 > > > > --- a/drivers/pci/pcie/aer.c > > > > +++ b/drivers/pci/pcie/aer.c > > > > @@ -28,6 +28,7 @@ > > > > #include <linux/delay.h> > > > > #include <linux/kfifo.h> > > > > #include <linux/slab.h> > > > > +#include <linux/suspend.h> > > > > #include <acpi/apei.h> > > > > #include <acpi/ghes.h> > > > > #include <ras/ras_event.h> > > > > @@ -1497,6 +1498,28 @@ static int aer_probe(struct pcie_device *dev) > > > > return 0; > > > > } > > > > > > > > +static int aer_suspend(struct pcie_device *dev) > > > > +{ > > > > + struct aer_rpc *rpc = get_service_data(dev); > > > > + struct pci_dev *pdev = rpc->rpd; > > > > + > > > > + if (pci_ancestor_pr3_present(pdev) || pm_suspend_via_firmware()) > > > > + aer_disable_rootport(rpc); > > > > > > Why do we check pci_ancestor_pr3_present(pdev) and > > > pm_suspend_via_firmware()? I'm getting pretty convinced that we need > > > to disable AER interrupts on suspend in general. I think it will be > > > better if we do that consistently on all platforms, not special cases > > > based on details of how we suspend. > > > > Sure. Will change in next revision. > > > > > Also, why do we use aer_disable_rootport() instead of just > > > aer_disable_irq()? I think it's the interrupt that causes issues on > > > suspend. I see that there *were* some versions that used > > > aer_disable_irq(), but I can't find the reason it changed. > > > > Interrupt can cause system wakeup, if it's shared with PME. > > > > The reason why aer_disable_rootport() is used over aer_disable_irq() > > is that when the latter is used the error still gets logged during > > sleep cycle. Once the pcieport driver resumes, it invokes > > aer_root_reset() to reset the hierarchy, while the hierarchy hasn't > > resumed yet. > > > > So use aer_disable_rootport() to prevent such issue from happening. > > I think the issue is more likely on the resume side. > > aer_disable_rootport() disables AER interrupts, then clears > PCI_ERR_ROOT_STATUS, so the path looks like this: > > aer_suspend > aer_disable_rootport > aer_disable_irq() > pci_write_config_dword(PCI_ERR_ROOT_STATUS) # clear > > This happens during suspend, so at this point I think the link is > still active and the spurious AER errors haven't happened yet and it > probably doesn't matter that we clear PCI_ERR_ROOT_STATUS *here*. > > My guess is that what really matters is that we disable the AER > interrupt so it doesn't happen during suspend, and then when we > resume, we probably want to clear out the status registers before > re-enabling the AER interrupt. Thanks for catching this. Clearing status registers does the trick for my cases here. > > In any event, I think we need to push this forward. I'll post a v9 > based on this but dropping the pci_ancestor_pr3_present(pdev) and > pm_suspend_via_firmware() tests so we do this unconditionally. Thanks for the v9. Kai-Heng > > > > > + return 0; > > > > +} > > > > + > > > > +static int aer_resume(struct pcie_device *dev) > > > > +{ > > > > + struct aer_rpc *rpc = get_service_data(dev); > > > > + struct pci_dev *pdev = rpc->rpd; > > > > + > > > > + if (pci_ancestor_pr3_present(pdev) || pm_resume_via_firmware()) > > > > + aer_enable_rootport(rpc); > > > > + > > > > + return 0; > > > > +} > > > > + > > > > /** > > > > * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP > > > > * @dev: pointer to Root Port, RCEC, or RCiEP > > > > @@ -1561,6 +1584,8 @@ static struct pcie_port_service_driver aerdriver = { > > > > .service = PCIE_PORT_SERVICE_AER, > > > > > > > > .probe = aer_probe, > > > > + .suspend = aer_suspend, > > > > + .resume = aer_resume, > > > > .remove = aer_remove, > > > > }; > > > > > > > > -- > > > > 2.34.1 > > > >
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index ac6293c24976..bea7818c2d1b 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -28,6 +28,7 @@ #include <linux/delay.h> #include <linux/kfifo.h> #include <linux/slab.h> +#include <linux/suspend.h> #include <acpi/apei.h> #include <acpi/ghes.h> #include <ras/ras_event.h> @@ -1497,6 +1498,28 @@ static int aer_probe(struct pcie_device *dev) return 0; } +static int aer_suspend(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + struct pci_dev *pdev = rpc->rpd; + + if (pci_ancestor_pr3_present(pdev) || pm_suspend_via_firmware()) + aer_disable_rootport(rpc); + + return 0; +} + +static int aer_resume(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + struct pci_dev *pdev = rpc->rpd; + + if (pci_ancestor_pr3_present(pdev) || pm_resume_via_firmware()) + aer_enable_rootport(rpc); + + return 0; +} + /** * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP * @dev: pointer to Root Port, RCEC, or RCiEP @@ -1561,6 +1584,8 @@ static struct pcie_port_service_driver aerdriver = { .service = PCIE_PORT_SERVICE_AER, .probe = aer_probe, + .suspend = aer_suspend, + .resume = aer_resume, .remove = aer_remove, };
When the power rail gets cut off, the hardware can create some electric noise on the link that triggers AER. If IRQ is shared between AER with PME, such AER noise will cause a spurious wakeup on system suspend. When the power rail gets back, the firmware of the device resets itself and can create unexpected behavior like sending PTM messages. For this case, the driver will always be too late to toggle off features should be disabled. As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power Management", TLP and DLLP transmission are disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold) states. So if the power will be turned off during suspend process, disable AER service and re-enable it during the resume process. This should not affect the basic functionality. Link: https://bugzilla.kernel.org/show_bug.cgi?id=209149 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 Link: https://bugzilla.kernel.org/show_bug.cgi?id=218090 Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> --- v8: - Add more bug reports. v7: - Wording - Disable AER completely (again) if power will be turned off v6: v5: - Wording. v4: v3: - No change. v2: - Only disable AER IRQ. - No more check on PME IRQ#. - Use helper. drivers/pci/pcie/aer.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+)