Message ID | 20210809042414.107430-1-kai.heng.feng@canonical.com |
---|---|
State | Changes Requested |
Delegated to: | Bjorn Helgaas |
Series | PCI/portdrv: Disallow runtime suspend when waekup is required but PME service isn't supported |
[cc += Mika]

On Mon, Aug 09, 2021 at 12:24:12PM +0800, Kai-Heng Feng wrote:
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=213873

The last comment on this bugzilla says "BIOS will fix this."
and the status is RESOLVED WILL_NOT_FIX.

Why is the patch still necessary?

> Some platforms cannot detect ethernet hotplug once its upstream port is
> runtime suspended because PME isn't enabled in _OSC.

If PME is not handled natively, why does the NIC runtime suspend?
Shouldn't this be fixed in the NIC driver by keeping the device
runtime active if PME cannot be used?

> Disallow port runtime suspend when any child device requires wakeup, so
> pci_pme_list_scan() can still read the PME status from the devices
> behind the port.

pci_pme_list_scan() is for broken devices which fail to signal PME.
Is this NIC really among them or does PME fail merely because it's not
granted to OSPM?

> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -59,14 +59,30 @@ static int pcie_port_runtime_suspend(struct device *dev)
>  	return pcie_port_device_runtime_suspend(dev);
>  }
>
> +static int pcie_port_wakeup_check(struct device *dev, void *data)
> +{
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +
> +	if (!pdev)
> +		return 0;
> +
> +	return pdev->wakeup_prepared;
> +}
> +
>  static int pcie_port_runtime_idle(struct device *dev)
>  {
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +
> +	if (!pcie_port_find_device(pdev, PCIE_PORT_SERVICE_PME) &&
> +	    device_for_each_child(dev, NULL, pcie_port_wakeup_check))
> +		return -EBUSY;
> +
>  	/*
>  	 * Assume the PCI core has set bridge_d3 whenever it thinks the port
>  	 * should be good to go to D3. Everything else, including moving
>  	 * the port to D3, is handled by the PCI core.
>  	 */
> -	return to_pci_dev(dev)->bridge_d3 ? 0 : -EBUSY;
> +	return pdev->bridge_d3 ? 0 : -EBUSY;

If an additional check is necessary for this issue, it should be
integrated into pci_dev_check_d3cold() instead of pcie_port_runtime_idle().

Thanks,

Lukas
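For readers following the review, the decision the patch adds to pcie_port_runtime_idle() can be modelled in user space as below. This is a hedged sketch, not kernel code: `struct model_port`, `struct model_child` and the `model_*` functions are hypothetical stand-ins for the port's `struct pci_dev` fields (`bridge_d3`), each child's `wakeup_prepared`, and the result of `pcie_port_find_device(pdev, PCIE_PORT_SERVICE_PME)`.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a child's struct pci_dev::wakeup_prepared. */
struct model_child {
	bool wakeup_prepared;
};

/* Hypothetical stand-in for the root port's state. */
struct model_port {
	bool has_pme_service;  /* models pcie_port_find_device(..., PCIE_PORT_SERVICE_PME) != NULL */
	bool bridge_d3;        /* models struct pci_dev::bridge_d3 */
	const struct model_child *children;
	size_t nr_children;
};

/* Models device_for_each_child() with pcie_port_wakeup_check():
 * true as soon as one child has wakeup prepared. */
static bool model_any_child_wakeup(const struct model_port *port)
{
	for (size_t i = 0; i < port->nr_children; i++)
		if (port->children[i].wakeup_prepared)
			return true;
	return false;
}

/* Returns 0 to allow runtime suspend, -EBUSY to keep the port in D0. */
static int model_port_runtime_idle(const struct model_port *port)
{
	/* Check added by the patch: without the PME service, keep the port
	 * active so PME polling can still reach children needing wakeup. */
	if (!port->has_pme_service && model_any_child_wakeup(port))
		return -EBUSY;

	/* Pre-existing behaviour: defer to bridge_d3. */
	return port->bridge_d3 ? 0 : -EBUSY;
}
```

With a wakeup-prepared NIC below a port that lacks the PME service, the model returns -EBUSY even when bridge_d3 is set; once the PME service is present, bridge_d3 alone decides, as before the patch.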
On Mon, Aug 9, 2021 at 5:47 PM Lukas Wunner <lukas@wunner.de> wrote:
>
> [cc += Mika]
>
> On Mon, Aug 09, 2021 at 12:24:12PM +0800, Kai-Heng Feng wrote:
> > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=213873
>
> The last comment on this bugzilla says "BIOS will fix this."
> and the status is RESOLVED WILL_NOT_FIX.
>
> Why is the patch still necessary?

Well, let me update the bug report.

>
> > Some platforms cannot detect ethernet hotplug once its upstream port is
> > runtime suspended because PME isn't enabled in _OSC.
>
> If PME is not handled natively, why does the NIC runtime suspend?
> Shouldn't this be fixed in the NIC driver by keeping the device
> runtime active if PME cannot be used?

That means we need to fix every user of pci_dev_run_wake(), or fix the
issue in pci_dev_run_wake() helper itself.

However, I am afraid that implementing the fix in pci_dev_run_wake()
may break the while loop check:

bool pci_dev_run_wake(struct pci_dev *dev)
{
	...
	while (bus->parent) {
		struct pci_dev *bridge = bus->self;

		if (device_can_wakeup(&bridge->dev))
			return true;

		bus = bus->parent;
	...
}

So I took the current approach.

>
> > Disallow port runtime suspend when any child device requires wakeup, so
> > pci_pme_list_scan() can still read the PME status from the devices
> > behind the port.
>
> pci_pme_list_scan() is for broken devices which fail to signal PME.
> Is this NIC really among them or does PME fail merely because it's not
> granted to OSPM?

The latter, PME IRQ isn't enabled because it's not granted by BIOS _OSC.

>
> > --- a/drivers/pci/pcie/portdrv_pci.c
> > +++ b/drivers/pci/pcie/portdrv_pci.c
> > @@ -59,14 +59,30 @@ static int pcie_port_runtime_suspend(struct device *dev)
> >  	return pcie_port_device_runtime_suspend(dev);
> >  }
> >
> > +static int pcie_port_wakeup_check(struct device *dev, void *data)
> > +{
> > +	struct pci_dev *pdev = to_pci_dev(dev);
> > +
> > +	if (!pdev)
> > +		return 0;
> > +
> > +	return pdev->wakeup_prepared;
> > +}
> > +
> >  static int pcie_port_runtime_idle(struct device *dev)
> >  {
> > +	struct pci_dev *pdev = to_pci_dev(dev);
> > +
> > +	if (!pcie_port_find_device(pdev, PCIE_PORT_SERVICE_PME) &&
> > +	    device_for_each_child(dev, NULL, pcie_port_wakeup_check))
> > +		return -EBUSY;
> > +
> >  	/*
> >  	 * Assume the PCI core has set bridge_d3 whenever it thinks the port
> >  	 * should be good to go to D3. Everything else, including moving
> >  	 * the port to D3, is handled by the PCI core.
> >  	 */
> > -	return to_pci_dev(dev)->bridge_d3 ? 0 : -EBUSY;
> > +	return pdev->bridge_d3 ? 0 : -EBUSY;
>
> If an additional check is necessary for this issue, it should be
> integrated into pci_dev_check_d3cold() instead of pcie_port_runtime_idle().

I think PME IRQ and D3cold are different things here.
The root port of the affected NIC doesn't support D3cold because
there's no power resource.

Kai-Heng

>
> Thanks,
>
> Lukas
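The loop Kai-Heng quotes from pci_dev_run_wake() walks from the device's bus towards the root and returns true as soon as any upstream bridge can wake. A minimal user-space model of just that loop is below, assuming a hypothetical `struct model_node` whose `upstream` pointer replaces the bus/bridge links and whose `can_wakeup` flag stands in for device_can_wakeup(); the elided parts of the real function are not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node: a device or bridge in the PCI hierarchy. */
struct model_node {
	const struct model_node *upstream; /* bridge above, NULL above the root port */
	bool can_wakeup;                   /* models device_can_wakeup() */
};

/*
 * Models the quoted while loop: start at the bridge directly above the
 * device and walk towards the root; one wakeup-capable bridge anywhere
 * on the path is enough to return true.
 */
static bool model_upstream_can_wake(const struct model_node *dev)
{
	for (const struct model_node *bridge = dev->upstream; bridge;
	     bridge = bridge->upstream)
		if (bridge->can_wakeup)
			return true;
	return false;
}
```

In this model a wakeup-capable root port makes the result true for every device below it, regardless of whether the PME interrupt at that root port is actually handled by anyone.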
[cc += Rafael]

On Mon, Aug 09, 2021 at 06:40:41PM +0800, Kai-Heng Feng wrote:
> On Mon, Aug 9, 2021 at 5:47 PM Lukas Wunner <lukas@wunner.de> wrote:
> > On Mon, Aug 09, 2021 at 12:24:12PM +0800, Kai-Heng Feng wrote:
> > > Some platforms cannot detect ethernet hotplug once its upstream port is
> > > runtime suspended because PME isn't enabled in _OSC.
> >
> > If PME is not handled natively, why does the NIC runtime suspend?
> > Shouldn't this be fixed in the NIC driver by keeping the device
> > runtime active if PME cannot be used?
>
> That means we need to fix every user of pci_dev_run_wake(), or fix the
> issue in pci_dev_run_wake() helper itself.

If PME is not granted to the OS, the only consequence is that the PME
port service is not instantiated at the root port. But PME is still
enabled for downstream devices. Maybe that's a mistake? I think the
ACPI spec is a little unclear what to do if PME control is *not* granted.
It only specifies what to do if PME control is *granted*:

"If the OS successfully receives control of this feature, it must
handle power management events as described in the PCI Express Base
Specification."

"If firmware allows the OS control of this feature, then in the context
of the _OSC method it must ensure that all PMEs are routed to root port
interrupts as described in the PCI Express Base Specification.
Additionally, after control is transferred to the OS, firmware must not
update the PME Status field in the Root Status register or the PME
Interrupt Enable field in the Root Control register. If control of this
feature was requested and denied or was not requested, firmware returns
this bit set to 0."

Perhaps something like the below is appropriate, I'm not sure.

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 091b4a4..7e64185 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3099,7 +3099,7 @@ void pci_pm_init(struct pci_dev *dev)
 	}
 
 	pmc &= PCI_PM_CAP_PME_MASK;
-	if (pmc) {
+	if (pmc && pci_find_host_bridge(dev->bus)->native_pme) {
 		pci_info(dev, "PME# supported from%s%s%s%s%s\n",
 			 (pmc & PCI_PM_CAP_PME_D0) ? " D0" : "",
 			 (pmc & PCI_PM_CAP_PME_D1) ? " D1" : "",

> > > --- a/drivers/pci/pcie/portdrv_pci.c
> > > +++ b/drivers/pci/pcie/portdrv_pci.c
> > > @@ -59,14 +59,30 @@ static int pcie_port_runtime_suspend(struct device *dev)
> > >  	return pcie_port_device_runtime_suspend(dev);
> > >  }
> > >
> > > +static int pcie_port_wakeup_check(struct device *dev, void *data)
> > > +{
> > > +	struct pci_dev *pdev = to_pci_dev(dev);
> > > +
> > > +	if (!pdev)
> > > +		return 0;
> > > +
> > > +	return pdev->wakeup_prepared;
> > > +}
> > > +
> > >  static int pcie_port_runtime_idle(struct device *dev)
> > >  {
> > > +	struct pci_dev *pdev = to_pci_dev(dev);
> > > +
> > > +	if (!pcie_port_find_device(pdev, PCIE_PORT_SERVICE_PME) &&
> > > +	    device_for_each_child(dev, NULL, pcie_port_wakeup_check))
> > > +		return -EBUSY;
> > > +
> > >  	/*
> > >  	 * Assume the PCI core has set bridge_d3 whenever it thinks the port
> > >  	 * should be good to go to D3. Everything else, including moving
> > >  	 * the port to D3, is handled by the PCI core.
> > >  	 */
> > > -	return to_pci_dev(dev)->bridge_d3 ? 0 : -EBUSY;
> > > +	return pdev->bridge_d3 ? 0 : -EBUSY;
> >
> > If an additional check is necessary for this issue, it should be
> > integrated into pci_dev_check_d3cold() instead of pcie_port_runtime_idle().
>
> I think PME IRQ and D3cold are different things here.
> The root port of the affected NIC doesn't support D3cold because
> there's no power resource.

If a bridge is runtime suspended to D3, the hierarchy below it is
inaccessible, which is basically the same as if it's put in D3cold,
hence the name pci_dev_check_d3cold(). That function allows a device
to block an upstream bridge from runtime suspending because the device
is not allowed to go to D3cold. The function specifically checks whether
a device is PME-capable from D3cold. The NIC claims it's capable but
the PME event has no effect because PME control wasn't granted to the
OS and firmware neglected to set PME Interrupt Enable in the Root Control
register. We could check for this case and block runtime PM of bridges
based on the rationale that PME polling is needed to detect wakeup.

Thanks,

Lukas
On Mon, Aug 9, 2021 at 11:00 PM Lukas Wunner <lukas@wunner.de> wrote:
>
> [cc += Rafael]
>
> On Mon, Aug 09, 2021 at 06:40:41PM +0800, Kai-Heng Feng wrote:
> > On Mon, Aug 9, 2021 at 5:47 PM Lukas Wunner <lukas@wunner.de> wrote:
> > > On Mon, Aug 09, 2021 at 12:24:12PM +0800, Kai-Heng Feng wrote:
> > > > Some platforms cannot detect ethernet hotplug once its upstream port is
> > > > runtime suspended because PME isn't enabled in _OSC.
> > >
> > > If PME is not handled natively, why does the NIC runtime suspend?
> > > Shouldn't this be fixed in the NIC driver by keeping the device
> > > runtime active if PME cannot be used?
> >
> > That means we need to fix every user of pci_dev_run_wake(), or fix the
> > issue in pci_dev_run_wake() helper itself.
>
> If PME is not granted to the OS, the only consequence is that the PME
> port service is not instantiated at the root port. But PME is still
> enabled for downstream devices. Maybe that's a mistake? I think the
> ACPI spec is a little unclear what to do if PME control is *not* granted.
> It only specifies what to do if PME control is *granted*:

So do you prefer to just disable runtime PM for the downstream device?

>
> "If the OS successfully receives control of this feature, it must
> handle power management events as described in the PCI Express Base
> Specification."
>
> "If firmware allows the OS control of this feature, then in the context
> of the _OSC method it must ensure that all PMEs are routed to root port
> interrupts as described in the PCI Express Base Specification.
> Additionally, after control is transferred to the OS, firmware must not
> update the PME Status field in the Root Status register or the PME
> Interrupt Enable field in the Root Control register. If control of this
> feature was requested and denied or was not requested, firmware returns
> this bit set to 0."
>
> Perhaps something like the below is appropriate, I'm not sure.
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 091b4a4..7e64185 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3099,7 +3099,7 @@ void pci_pm_init(struct pci_dev *dev)
>  	}
>
>  	pmc &= PCI_PM_CAP_PME_MASK;
> -	if (pmc) {
> +	if (pmc && pci_find_host_bridge(dev->bus)->native_pme) {
>  		pci_info(dev, "PME# supported from%s%s%s%s%s\n",
>  			 (pmc & PCI_PM_CAP_PME_D0) ? " D0" : "",
>  			 (pmc & PCI_PM_CAP_PME_D1) ? " D1" : "",
>
>

I think this will also prevent non-root port devices from using PME.

[snipped]

> >
> > I think PME IRQ and D3cold are different things here.
> > The root port of the affected NIC doesn't support D3cold because
> > there's no power resource.
>
> If a bridge is runtime suspended to D3, the hierarchy below it is
> inaccessible, which is basically the same as if it's put in D3cold,
> hence the name pci_dev_check_d3cold(). That function allows a device
> to block an upstream bridge from runtime suspending because the device
> is not allowed to go to D3cold. The function specifically checks whether
> a device is PME-capable from D3cold. The NIC claims it's capable but
> the PME event has no effect because PME control wasn't granted to the
> OS and firmware neglected to set PME Interrupt Enable in the Root Control
> register. We could check for this case and block runtime PM of bridges
> based on the rationale that PME polling is needed to detect wakeup.

So for this case, should we prevent the downstream devices from runtime
suspending, or let it suspend but keep the root port active in order to
make pci_pme_list_scan() work?

Kai-Heng

>
> Thanks,
>
> Lukas
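Kai-Heng's objection to the proposed pci_pm_init() hunk can be illustrated with a user-space model of the quoted lines: the PME# bits of the PM Capabilities (PMC) register are decoded into the "PME# supported from ..." suffix and into pme_support, optionally gated by native_pme as in the proposal. The PCI_PM_CAP_PME_* values mirror include/uapi/linux/pci_regs.h; `model_pm_init()` and its parameters are hypothetical. Because the proposed gate reads the host bridge's native_pme, it would leave pme_support at 0 for every device below that host bridge, not just the root port.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Values mirror include/uapi/linux/pci_regs.h. */
#define PCI_PM_CAP_PME_MASK   0xF800
#define PCI_PM_CAP_PME_D0     0x0800
#define PCI_PM_CAP_PME_D1     0x1000
#define PCI_PM_CAP_PME_D2     0x2000
#define PCI_PM_CAP_PME_D3hot  0x4000
#define PCI_PM_CAP_PME_D3cold 0x8000
#define PCI_PM_CAP_PME_SHIFT  11

/*
 * Hypothetical model of the quoted hunk. Returns the pme_support value the
 * device would end up with and writes the suffix of the
 * "PME# supported from" message to @msg (empty if PME is not recorded).
 * @len must be at least 1.
 */
static uint8_t model_pm_init(uint16_t pmc, bool native_pme,
			     char *msg, size_t len)
{
	msg[0] = '\0';
	pmc &= PCI_PM_CAP_PME_MASK;
	if (!pmc || !native_pme)  /* "&& native_pme" is the proposed change */
		return 0;

	snprintf(msg, len, "%s%s%s%s%s",
		 (pmc & PCI_PM_CAP_PME_D0) ? " D0" : "",
		 (pmc & PCI_PM_CAP_PME_D1) ? " D1" : "",
		 (pmc & PCI_PM_CAP_PME_D2) ? " D2" : "",
		 (pmc & PCI_PM_CAP_PME_D3hot) ? " D3hot" : "",
		 (pmc & PCI_PM_CAP_PME_D3cold) ? " D3cold" : "");
	return (uint8_t)(pmc >> PCI_PM_CAP_PME_SHIFT);
}
```

For example, a PMC value of 0xC803 yields " D0 D3hot D3cold" and pme_support 0x19 when native_pme is set, and nothing at all when it is not.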
On Tue, Aug 10, 2021 at 11:37:12PM +0800, Kai-Heng Feng wrote:
> On Mon, Aug 9, 2021 at 11:00 PM Lukas Wunner <lukas@wunner.de> wrote:
> > If PME is not granted to the OS, the only consequence is that the PME
> > port service is not instantiated at the root port. But PME is still
> > enabled for downstream devices. Maybe that's a mistake? I think the
> > ACPI spec is a little unclear what to do if PME control is *not* granted.
> > It only specifies what to do if PME control is *granted*:
>
> So do you prefer to just disable runtime PM for the downstream device?

I honestly don't know. I was just wondering whether it is okay
to enable PME on devices if control is not granted by the firmware.
The spec is fairly vague. But I guess the idea is that enabling PME
on devices is correct, just handling the interrupts is done by firmware
instead of the OS.

In your case, the endpoint device claims it can signal PME from D3cold,
which is why we allow the root port above to runtime suspend to D3hot.
The lspci output you've attached to the bugzilla indicates that yes,
signaling PME in D3cold does work, but the PME interrupt is neither
handled by the OS (because it's not allowed to) nor by firmware.

So you would like to rely on PME polling instead, which only works if the
root port remains in D0. Otherwise config space of the endpoint device
is inaccessible.

I think the proper solution is that firmware should handle the PME
interrupt. You've said the vendor objects because they found PME
doesn't work reliably. Well in that case the endpoint device shouldn't
indicate that it can signal PME, at least not from D3cold. Perhaps
the vendor is able to change the endpoint device's config space so
that it doesn't claim to support PME?

If that doesn't work and thus a kernel patch is necessary, the next
question is whether changing core code is the right approach.

If you do want to change core code, I'd suggest modifying
pci_dev_check_d3cold() so that it blocks runtime PM on upstream
bridges if PME is not handled natively AND firmware failed to enable
the PME interrupt at the root port. The rationale is that upstream
bridges need to remain in D0 so that PME polling is possible.

An alternative would be a quirk for this specific laptop which clears
pdev->pme_support.

Thanks,

Lukas
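The suggested change to pci_dev_check_d3cold() can be sketched as a user-space model of its PME-related rule plus the proposed extra condition. All names are hypothetical stand-ins: `may_wakeup` for device_may_wakeup(), `pme_d3cold` for pci_pme_capable(dev, PCI_D3cold), `native_pme` for the host bridge flag, and `root_pmeie` for PCI_EXP_RTCTL_PMEIE in the root port's Root Control register. The real function checks further conditions (for example whether the device may enter D3cold at all), which are left out here.

```c
#include <stdbool.h>

/* Hypothetical per-device inputs. */
struct model_dev {
	bool may_wakeup;  /* models device_may_wakeup(&dev->dev) */
	bool pme_d3cold;  /* models pci_pme_capable(dev, PCI_D3cold) */
};

/* Hypothetical platform state. */
struct model_platform {
	bool native_pme;  /* _OSC granted PME control to the OS */
	bool root_pmeie;  /* firmware set PME Interrupt Enable at the root port */
};

/*
 * Returns true if this device lets its upstream bridge runtime suspend.
 * Only the PME-related conditions discussed in the thread are modelled.
 */
static bool model_check_d3cold(const struct model_dev *dev,
			       const struct model_platform *plat)
{
	if (!dev->may_wakeup)
		return true;

	/* Existing rule: a wakeup device must signal PME from D3cold. */
	if (!dev->pme_d3cold)
		return false;

	/*
	 * Proposed rule: if neither the OS nor firmware handles the PME
	 * interrupt, keep the bridge in D0 so PME polling still works.
	 */
	if (!plat->native_pme && !plat->root_pmeie)
		return false;

	return true;
}
```

Only the combination "PME not native and PMEIntEna clear" blocks the bridge for a wakeup-capable NIC; platforms where either the OS or firmware handles the interrupt keep today's behaviour.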
On Wed, Aug 11, 2021 at 12:21 AM Lukas Wunner <lukas@wunner.de> wrote:
>
> On Tue, Aug 10, 2021 at 11:37:12PM +0800, Kai-Heng Feng wrote:
> > On Mon, Aug 9, 2021 at 11:00 PM Lukas Wunner <lukas@wunner.de> wrote:
> > > If PME is not granted to the OS, the only consequence is that the PME
> > > port service is not instantiated at the root port. But PME is still
> > > enabled for downstream devices. Maybe that's a mistake? I think the
> > > ACPI spec is a little unclear what to do if PME control is *not* granted.
> > > It only specifies what to do if PME control is *granted*:
> >
> > So do you prefer to just disable runtime PM for the downstream device?
>
> I honestly don't know. I was just wondering whether it is okay
> to enable PME on devices if control is not granted by the firmware.
> The spec is fairly vague. But I guess the idea is that enabling PME
> on devices is correct, just handling the interrupts is done by firmware
> instead of the OS.

Does this imply that current ACPI doesn't handle this part?

>
> In your case, the endpoint device claims it can signal PME from D3cold,
> which is why we allow the root port above to runtime suspend to D3hot.
> The lspci output you've attached to the bugzilla indicates that yes,
> signaling PME in D3cold does work, but the PME interrupt is neither
> handled by the OS (because it's not allowed to) nor by firmware.
>
> So you would like to rely on PME polling instead, which only works if the
> root port remains in D0. Otherwise config space of the endpoint device
> is inaccessible.

The Windows approach is to make the entire hierarchy stays at D0, I
think maybe it's a better way than relying on PME polling.

>
> I think the proper solution is that firmware should handle the PME
> interrupt. You've said the vendor objects because they found PME
> doesn't work reliably.

The PME works; what the vendor said is that enabling PME makes the
system "unstable".

> Well in that case the endpoint device shouldn't
> indicate that it can signal PME, at least not from D3cold. Perhaps
> the vendor is able to change the endpoint device's config space so
> that it doesn't claim to support PME?

This is not a viable option, and we have to consider that BIOS from
different vendors can exhibit the same behavior.

>
> If that doesn't work and thus a kernel patch is necessary, the next
> question is whether changing core code is the right approach.

I really don't see another way, because non-granted PME is a
system-wide thing...

>
> If you do want to change core code, I'd suggest modifying
> pci_dev_check_d3cold() so that it blocks runtime PM on upstream
> bridges if PME is not handled natively AND firmware failed to enable
> the PME interrupt at the root port. The rationale is that upstream
> bridges need to remain in D0 so that PME polling is possible.

How do I know that firmware failed to enable PME IRQ?
And let me see how to make pci_dev_check_d3cold() work for this case.

>
> An alternative would be a quirk for this specific laptop which clears
> pdev->pme_support.

This won't scale, because many models are affected.

Kai-Heng

>
> Thanks,
>
> Lukas
On Wed, Aug 11, 2021 at 01:06:27PM +0800, Kai-Heng Feng wrote:
> On Wed, Aug 11, 2021 at 12:21 AM Lukas Wunner <lukas@wunner.de> wrote:
> >
> > On Tue, Aug 10, 2021 at 11:37:12PM +0800, Kai-Heng Feng wrote:
> > I honestly don't know. I was just wondering whether it is okay
> > to enable PME on devices if control is not granted by the firmware.
> > The spec is fairly vague. But I guess the idea is that enabling PME
> > on devices is correct, just handling the interrupts is done by firmware
> > instead of the OS.
>
> Does this imply that current ACPI doesn't handle this part?

Apparently not, according to the "lspci-bridge-after-hotplug" you've
attached to the bugzilla, the PME Interrupt Enable bit wasn't set in
the Root Control register. The kernel doesn't register an IRQ handler
for PME because firmware doesn't grant it control, so it's firmware's
job to enable and handle the IRQ (or poll the relevant register or
whatever).

RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
                                                 ^^^^^^^^^^

> The Windows approach is to make the entire hierarchy stays at D0, I
> think maybe it's a better way than relying on PME polling.

Including the endpoint device, i.e. the NIC?

> > If you do want to change core code, I'd suggest modifying
> > pci_dev_check_d3cold() so that it blocks runtime PM on upstream
> > bridges if PME is not handled natively AND firmware failed to enable
> > the PME interrupt at the root port. The rationale is that upstream
> > bridges need to remain in D0 so that PME polling is possible.
>
> How do I know that firmware failed to enable PME IRQ?

Check whether PCI_EXP_RTCTL_PMEIE was set by firmware in the Root Control
register.

> > An alternative would be a quirk for this specific laptop which clears
> > pdev->pme_support.
>
> This won't scale, because many models are affected.

We already have quirks which clear pdev->pme_support, e.g.
pci_fixup_no_d0_pme() and pci_fixup_no_msi_no_pme().
Perhaps something like that would be appropriate here.

Thanks,

Lukas
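Checking PCI_EXP_RTCTL_PMEIE amounts to testing one bit of the Root Control register, the bit lspci prints as PMEIntEna in the RootCtl line quoted in this thread. The user-space sketch below decodes a Root Control value into that lspci-style line; the bit values mirror include/uapi/linux/pci_regs.h, while `model_pme_irq_enabled()` and `model_decode_rtctl()` are hypothetical helpers. In the kernel the register value would come from pcie_capability_read_word(dev, PCI_EXP_RTCTL, &val).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Root Control register bits, mirroring include/uapi/linux/pci_regs.h. */
#define PCI_EXP_RTCTL_SECEE   0x0001 /* System Error on Correctable Error */
#define PCI_EXP_RTCTL_SENFEE  0x0002 /* System Error on Non-Fatal Error */
#define PCI_EXP_RTCTL_SEFEE   0x0004 /* System Error on Fatal Error */
#define PCI_EXP_RTCTL_PMEIE   0x0008 /* PME Interrupt Enable */
#define PCI_EXP_RTCTL_CRSSVE  0x0010 /* CRS Software Visibility Enable */

/* True if someone (OS or firmware) enabled the PME interrupt at the root port. */
static bool model_pme_irq_enabled(uint16_t rtctl)
{
	return rtctl & PCI_EXP_RTCTL_PMEIE;
}

/* Formats @rtctl like the RootCtl line lspci prints. */
static void model_decode_rtctl(uint16_t rtctl, char *buf, size_t len)
{
	snprintf(buf, len,
		 "RootCtl: ErrCorrectable%c ErrNon-Fatal%c ErrFatal%c PMEIntEna%c CRSVisible%c",
		 (rtctl & PCI_EXP_RTCTL_SECEE) ? '+' : '-',
		 (rtctl & PCI_EXP_RTCTL_SENFEE) ? '+' : '-',
		 (rtctl & PCI_EXP_RTCTL_SEFEE) ? '+' : '-',
		 (rtctl & PCI_EXP_RTCTL_PMEIE) ? '+' : '-',
		 (rtctl & PCI_EXP_RTCTL_CRSSVE) ? '+' : '-');
}
```

A register value of 0x0000 reproduces the line from the bugzilla attachment, with PMEIntEna- indicating that nobody enabled the PME interrupt.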
On Wed, Aug 11, 2021 at 3:11 PM Lukas Wunner <lukas@wunner.de> wrote:
>
> On Wed, Aug 11, 2021 at 01:06:27PM +0800, Kai-Heng Feng wrote:
> > On Wed, Aug 11, 2021 at 12:21 AM Lukas Wunner <lukas@wunner.de> wrote:
> > >
> > > On Tue, Aug 10, 2021 at 11:37:12PM +0800, Kai-Heng Feng wrote:
> > > I honestly don't know. I was just wondering whether it is okay
> > > to enable PME on devices if control is not granted by the firmware.
> > > The spec is fairly vague. But I guess the idea is that enabling PME
> > > on devices is correct, just handling the interrupts is done by firmware
> > > instead of the OS.
> >
> > Does this imply that current ACPI doesn't handle this part?
>
> Apparently not, according to the "lspci-bridge-after-hotplug" you've
> attached to the bugzilla, the PME Interrupt Enable bit wasn't set in
> the Root Control register. The kernel doesn't register an IRQ handler
> for PME because firmware doesn't grant it control, so it's firmware's
> job to enable and handle the IRQ (or poll the relevant register or
> whatever).
>
> RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
>                                                  ^^^^^^^^^^

OK, I'll send a patch that checks this flag for PME capability.

>
> > The Windows approach is to make the entire hierarchy stays at D0, I
> > think maybe it's a better way than relying on PME polling.
>
> Including the endpoint device, i.e. the NIC?

Yes, including the endpoint device.

>
> > > If you do want to change core code, I'd suggest modifying
> > > pci_dev_check_d3cold() so that it blocks runtime PM on upstream
> > > bridges if PME is not handled natively AND firmware failed to enable
> > > the PME interrupt at the root port. The rationale is that upstream
> > > bridges need to remain in D0 so that PME polling is possible.
> >
> > How do I know that firmware failed to enable PME IRQ?
>
> Check whether PCI_EXP_RTCTL_PMEIE was set by firmware in the Root Control
> register.

I originally thought there could be a special ACPI method to query this info.

>
> > > An alternative would be a quirk for this specific laptop which clears
> > > pdev->pme_support.
> >
> > This won't scale, because many models are affected.
>
> We already have quirks which clear pdev->pme_support, e.g.
> pci_fixup_no_d0_pme() and pci_fixup_no_msi_no_pme().
> Perhaps something like that would be appropriate here.

OK, I'll take this approach.

Kai-Heng

>
> Thanks,
>
> Lukas
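The quirk approach works by clearing bits in pdev->pme_support, which holds one bit per power state (the PMC PME# bits shifted down by PCI_PM_CAP_PME_SHIFT) and is what pci_pme_capable() tests. The user-space sketch below models that; the constant values mirror include/uapi/linux/pci_regs.h, `model_pme_capable()` mirrors the bit test in pci_pme_capable(), and `model_quirk_no_d3cold_pme()` is a hypothetical quirk in the style of pci_fixup_no_d0_pme() that clears the D3cold bit instead of the D0 bit. Whether clearing only D3cold or all of pme_support (as pci_fixup_no_msi_no_pme() does) is appropriate here is not settled in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values mirror include/uapi/linux/pci_regs.h. */
#define PCI_PM_CAP_PME_D3cold 0x8000
#define PCI_PM_CAP_PME_SHIFT  11

/* Same ordering as the kernel's pci_power_t states (D0 = 0 ... D3cold = 4). */
enum model_power_state { MODEL_D0, MODEL_D1, MODEL_D2, MODEL_D3hot, MODEL_D3cold };

/* Models the bit test in pci_pme_capable(): one pme_support bit per state. */
static bool model_pme_capable(uint8_t pme_support, enum model_power_state state)
{
	return pme_support & (1u << state);
}

/* Hypothetical quirk: drop PME# from D3cold, keep the other states. */
static uint8_t model_quirk_no_d3cold_pme(uint8_t pme_support)
{
	return pme_support & (uint8_t)~(PCI_PM_CAP_PME_D3cold >> PCI_PM_CAP_PME_SHIFT);
}
```

After such a quirk, a pci_pme_capable(dev, PCI_D3cold)-style test fails for the device, which is the condition pci_dev_check_d3cold() uses to keep an upstream bridge out of D3 when the device may wake.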
Some platforms cannot detect ethernet hotplug once its upstream port is
runtime suspended because PME isn't enabled in _OSC. The issue can be
worked around by "pcie_ports=native".

The vendor confirmed that the PME in _OSC is disabled intentionally for
stability issues on the other OS, so we should also honor the PME
setting here.

Disallow port runtime suspend when any child device requires wakeup, so
pci_pme_list_scan() can still read the PME status from the devices
behind the port.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=213873
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
---
 drivers/pci/pcie/portdrv_pci.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index c7ff1eea225a..e693d243c90d 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -59,14 +59,30 @@ static int pcie_port_runtime_suspend(struct device *dev)
 	return pcie_port_device_runtime_suspend(dev);
 }
 
+static int pcie_port_wakeup_check(struct device *dev, void *data)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	if (!pdev)
+		return 0;
+
+	return pdev->wakeup_prepared;
+}
+
 static int pcie_port_runtime_idle(struct device *dev)
 {
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	if (!pcie_port_find_device(pdev, PCIE_PORT_SERVICE_PME) &&
+	    device_for_each_child(dev, NULL, pcie_port_wakeup_check))
+		return -EBUSY;
+
 	/*
 	 * Assume the PCI core has set bridge_d3 whenever it thinks the port
 	 * should be good to go to D3. Everything else, including moving
 	 * the port to D3, is handled by the PCI core.
 	 */
-	return to_pci_dev(dev)->bridge_d3 ? 0 : -EBUSY;
+	return pdev->bridge_d3 ? 0 : -EBUSY;
 }
 
 static const struct dev_pm_ops pcie_portdrv_pm_ops = {
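The commit message says booting with "pcie_ports=native" works around the problem. A hedged user-space model of why: as we understand it, drivers/acpi/pci_root.c clears the host bridge's native_pme flag when _OSC does not grant PCI Express Native PME control (bit 2 of the _OSC Control field, OSC_PCI_EXPRESS_PME_CONTROL in the kernel), and drivers/pci/pcie/portdrv_core.c sets up the PME port service at a root port only if native_pme is set or pcie_ports=native was given. `model_native_pme()` and `model_pme_service()` are hypothetical helpers, not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* _OSC Control field bit for PCI Express Native PME control,
 * same value as the kernel's OSC_PCI_EXPRESS_PME_CONTROL. */
#define OSC_PCI_EXPRESS_PME_CONTROL 0x00000004u

/* Models host_bridge->native_pme after _OSC negotiation. */
static bool model_native_pme(uint32_t osc_control_granted)
{
	return osc_control_granted & OSC_PCI_EXPRESS_PME_CONTROL;
}

/* Models whether the PME port service is set up at a root port. */
static bool model_pme_service(uint32_t osc_control_granted, bool pcie_ports_native)
{
	return pcie_ports_native || model_native_pme(osc_control_granted);
}
```

On the affected platforms firmware withholds the PME bit, so without the command-line override no PME service exists at the root port; with "pcie_ports=native" the service is instantiated regardless of _OSC, which matches the workaround described above.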