Message ID | 20190805205214.194981-4-helgaas@kernel.org (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
Series | PCI: Add PCI_ERROR_RESPONSE, check for errors | expand |
On Mon, Aug 5, 2019 at 10:52 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > From: Bjorn Helgaas <bhelgaas@google.com> > > pci_check_pme_status() reads the Power Management capability to determine > whether a device has generated a PME. The capability is in config space, > which is accessible in D0, D1, D2, and D3hot, but not in D3cold. > > If we call pci_check_pme_status() on a device that's in D3cold, config > reads fail and return ~0 data, which we erroneously interpreted as "the > device has generated a PME". > > 000dd5316e1c ("PCI: Do not poll for PME if the device is in D3cold") > avoided many of these problems by avoiding pci_check_pme_status() if we > think the device is in D3cold. However, it is not a complete fix because > the device may go to D3cold after we check its power state but before > pci_check_pme_status() reads the Power Management Status Register. > > Return false ("device has not generated a PME") if we get an error response > reading the Power Management Status Register. > > Fixes: 000dd5316e1c ("PCI: Do not poll for PME if the device is in D3cold") > Fixes: 71a83bd727cc ("PCI/PM: add runtime PM support to PCIe port") > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> > --- > drivers/pci/pci.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index 984171d40858..af6a97d7012b 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -2008,6 +2008,9 @@ bool pci_check_pme_status(struct pci_dev *dev) > > pmcsr_pos = dev->pm_cap + PCI_PM_CTRL; > pci_read_config_word(dev, pmcsr_pos, &pmcsr); > + if (pmcsr == (u16) PCI_ERROR_RESPONSE) > + return false; > + No, sorry. We tried that and it didn't work. pcie_pme_handle_request() depends on this returning "true" for all bits set, as from its perspective "device is not accessible" may very well mean "device may have signaled PME". If you want to make this change, you need to rework pcie_pme_handle_request() along with it. > if (!(pmcsr & PCI_PM_CTRL_PME_STATUS)) > return false;
On Mon, Aug 05, 2019 at 11:02:51PM +0200, Rafael J. Wysocki wrote: > On Mon, Aug 5, 2019 at 10:52 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > pci_check_pme_status() reads the Power Management capability to determine > > whether a device has generated a PME. The capability is in config space, > > which is accessible in D0, D1, D2, and D3hot, but not in D3cold. > > > > If we call pci_check_pme_status() on a device that's in D3cold, config > > reads fail and return ~0 data, which we erroneously interpreted as "the > > device has generated a PME". > > > > 000dd5316e1c ("PCI: Do not poll for PME if the device is in D3cold") > > avoided many of these problems by avoiding pci_check_pme_status() if we > > think the device is in D3cold. However, it is not a complete fix because > > the device may go to D3cold after we check its power state but before > > pci_check_pme_status() reads the Power Management Status Register. > > > > Return false ("device has not generated a PME") if we get an error response > > reading the Power Management Status Register. > > > > Fixes: 000dd5316e1c ("PCI: Do not poll for PME if the device is in D3cold") > > Fixes: 71a83bd727cc ("PCI/PM: add runtime PM support to PCIe port") > > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> > > --- > > drivers/pci/pci.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > index 984171d40858..af6a97d7012b 100644 > > --- a/drivers/pci/pci.c > > +++ b/drivers/pci/pci.c > > @@ -2008,6 +2008,9 @@ bool pci_check_pme_status(struct pci_dev *dev) > > > > pmcsr_pos = dev->pm_cap + PCI_PM_CTRL; > > pci_read_config_word(dev, pmcsr_pos, &pmcsr); > > + if (pmcsr == (u16) PCI_ERROR_RESPONSE) > > + return false; > > No, sorry. > > We tried that and it didn't work. > > pcie_pme_handle_request() depends on this returning "true" for all > bits set, as from its perspective "device is not accessible" may very > well mean "device may have signaled PME". Right, it's obviously wrong in the case of devices that advertise D3cold in PME_Support, i.e., devices that can generate PME even with main power off. Also, we may want to try to wake up devices if the config read fails for a reason other than the device being in D3cold. What I don't like about the current code is that it checks PCI_PM_CTRL_PME_STATUS in data that may be completely bogus. Do you think it would be better to do something like this: pci_read_config_word(dev, pmcsr_pos, &pmcsr); if (pmcsr == (u16) PCI_ERROR_RESPONSE) { if (pci_pme_capable(dev, PCI_PM_CAP_PME_D3cold)) return true; return false; } or maybe this: pci_read_config_word(dev, pmcsr_pos, &pmcsr); if (pmcsr == (u16) PCI_ERROR_RESPONSE) return true; We should get PCI_ERROR_RESPONSE pretty reliably from devices in D3cold, so the first possibility would cover that case. But since pci_check_pme_status() basically returns a hint ("true" means a device *may* have generated a PME), and even if the hint is wrong, the worst that happens is an unnecessary wakeup, maybe the second possibility is safer. What do you think? > > if (!(pmcsr & PCI_PM_CTRL_PME_STATUS)) > > return false;
On Tuesday, August 6, 2019 3:36:38 PM CEST Bjorn Helgaas wrote: > On Mon, Aug 05, 2019 at 11:02:51PM +0200, Rafael J. Wysocki wrote: > > On Mon, Aug 5, 2019 at 10:52 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > > > pci_check_pme_status() reads the Power Management capability to determine > > > whether a device has generated a PME. The capability is in config space, > > > which is accessible in D0, D1, D2, and D3hot, but not in D3cold. > > > > > > If we call pci_check_pme_status() on a device that's in D3cold, config > > > reads fail and return ~0 data, which we erroneously interpreted as "the > > > device has generated a PME". > > > > > > 000dd5316e1c ("PCI: Do not poll for PME if the device is in D3cold") > > > avoided many of these problems by avoiding pci_check_pme_status() if we > > > think the device is in D3cold. However, it is not a complete fix because > > > the device may go to D3cold after we check its power state but before > > > pci_check_pme_status() reads the Power Management Status Register. > > > > > > Return false ("device has not generated a PME") if we get an error response > > > reading the Power Management Status Register. > > > > > > Fixes: 000dd5316e1c ("PCI: Do not poll for PME if the device is in D3cold") > > > Fixes: 71a83bd727cc ("PCI/PM: add runtime PM support to PCIe port") > > > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> > > > --- > > > drivers/pci/pci.c | 3 +++ > > > 1 file changed, 3 insertions(+) > > > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > > index 984171d40858..af6a97d7012b 100644 > > > --- a/drivers/pci/pci.c > > > +++ b/drivers/pci/pci.c > > > @@ -2008,6 +2008,9 @@ bool pci_check_pme_status(struct pci_dev *dev) > > > > > > pmcsr_pos = dev->pm_cap + PCI_PM_CTRL; > > > pci_read_config_word(dev, pmcsr_pos, &pmcsr); > > > + if (pmcsr == (u16) PCI_ERROR_RESPONSE) > > > + return false; > > > > No, sorry. > > > > We tried that and it didn't work. > > > > pcie_pme_handle_request() depends on this returning "true" for all > > bits set, as from its perspective "device is not accessible" may very > > well mean "device may have signaled PME". > > Right, it's obviously wrong in the case of devices that advertise > D3cold in PME_Support, i.e., devices that can generate PME even with > main power off. Also, we may want to try to wake up devices if the > config read fails for a reason other than the device being in D3cold. > > What I don't like about the current code is that it checks > PCI_PM_CTRL_PME_STATUS in data that may be completely bogus. Whether or not the other bits in the register make sense doesn't matter here. Only the PME_STATUS bit matters. > Do you think it would be better to do something like this: > > pci_read_config_word(dev, pmcsr_pos, &pmcsr); > if (pmcsr == (u16) PCI_ERROR_RESPONSE) { > if (pci_pme_capable(dev, PCI_PM_CAP_PME_D3cold)) > return true; > return false; > } > > or maybe this: > > pci_read_config_word(dev, pmcsr_pos, &pmcsr); > if (pmcsr == (u16) PCI_ERROR_RESPONSE) > return true; In this case it still would be prudent to check PME_ENABLE before returning true and so there is no practical difference between ERROR_RESPONSE and the valid data with PME_STATUS set. Except that in the ERROR_RESPONSE case we may as well avoid the PMCSR write which is not going to make a difference. > We should get PCI_ERROR_RESPONSE pretty reliably from devices in > D3cold, so the first possibility would cover that case. > > But since pci_check_pme_status() basically returns a hint ("true" > means a device *may* have generated a PME), and even if the hint is > wrong, the worst that happens is an unnecessary wakeup, maybe the > second possibility is safer. > > What do you think? So if you really want to avoid the PMCSR write in the ERROR_RESPONSE case, something like this can be done IMO: return false; /* Clear PME status. */ - pmcsr |= PCI_PM_CTRL_PME_STATUS; if (pmcsr & PCI_PM_CTRL_PME_ENABLE) { + if (pmcsr == (u16) PCI_ERROR_RESPONSE) + return true; + /* Disable PME to avoid interrupt flood. */ pmcsr &= ~PCI_PM_CTRL_PME_ENABLE; ret = true;
On Wed, Aug 14, 2019 at 01:26:56AM +0200, Rafael J. Wysocki wrote: > On Tuesday, August 6, 2019 3:36:38 PM CEST Bjorn Helgaas wrote: > > On Mon, Aug 05, 2019 at 11:02:51PM +0200, Rafael J. Wysocki wrote: > > > On Mon, Aug 5, 2019 at 10:52 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > > > > > pci_check_pme_status() reads the Power Management capability to determine > > > > whether a device has generated a PME. The capability is in config space, > > > > which is accessible in D0, D1, D2, and D3hot, but not in D3cold. > > > > > > > > If we call pci_check_pme_status() on a device that's in D3cold, config > > > > reads fail and return ~0 data, which we erroneously interpreted as "the > > > > device has generated a PME". > > > > > > > > 000dd5316e1c ("PCI: Do not poll for PME if the device is in D3cold") > > > > avoided many of these problems by avoiding pci_check_pme_status() if we > > > > think the device is in D3cold. However, it is not a complete fix because > > > > the device may go to D3cold after we check its power state but before > > > > pci_check_pme_status() reads the Power Management Status Register. > > > > > > > > Return false ("device has not generated a PME") if we get an error response > > > > reading the Power Management Status Register. > > > > > > > > Fixes: 000dd5316e1c ("PCI: Do not poll for PME if the device is in D3cold") > > > > Fixes: 71a83bd727cc ("PCI/PM: add runtime PM support to PCIe port") > > > > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> > > > > --- > > > > drivers/pci/pci.c | 3 +++ > > > > 1 file changed, 3 insertions(+) > > > > > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > > > index 984171d40858..af6a97d7012b 100644 > > > > --- a/drivers/pci/pci.c > > > > +++ b/drivers/pci/pci.c > > > > @@ -2008,6 +2008,9 @@ bool pci_check_pme_status(struct pci_dev *dev) > > > > > > > > pmcsr_pos = dev->pm_cap + PCI_PM_CTRL; > > > > pci_read_config_word(dev, pmcsr_pos, &pmcsr); > > > > + if (pmcsr == (u16) PCI_ERROR_RESPONSE) > > > > + return false; > > > > > > No, sorry. > > > > > > We tried that and it didn't work. > > > > > > pcie_pme_handle_request() depends on this returning "true" for all > > > bits set, as from its perspective "device is not accessible" may very > > > well mean "device may have signaled PME". > > > > Right, it's obviously wrong in the case of devices that advertise > > D3cold in PME_Support, i.e., devices that can generate PME even with > > main power off. Also, we may want to try to wake up devices if the > > config read fails for a reason other than the device being in D3cold. > > > > What I don't like about the current code is that it checks > > PCI_PM_CTRL_PME_STATUS in data that may be completely bogus. > > Whether or not the other bits in the register make sense doesn't > matter here. Only the PME_STATUS bit matters. Of course. It just relies on the implicit assumption that the bit in the error response matches the PME_STATUS state that we want, which is a little bit ugly. > > Do you think it would be better to do something like this: > > > > pci_read_config_word(dev, pmcsr_pos, &pmcsr); > > if (pmcsr == (u16) PCI_ERROR_RESPONSE) { > > if (pci_pme_capable(dev, PCI_PM_CAP_PME_D3cold)) > > return true; > > return false; > > } > > > > or maybe this: > > > > pci_read_config_word(dev, pmcsr_pos, &pmcsr); > > if (pmcsr == (u16) PCI_ERROR_RESPONSE) > > return true; > > In this case it still would be prudent to check PME_ENABLE before > returning true and so there is no practical difference between > ERROR_RESPONSE and the valid data with PME_STATUS set. > > Except that in the ERROR_RESPONSE case we may as well avoid the > PMCSR write which is not going to make a difference. > > > We should get PCI_ERROR_RESPONSE pretty reliably from devices in > > D3cold, so the first possibility would cover that case. > > > > But since pci_check_pme_status() basically returns a hint ("true" > > means a device *may* have generated a PME), and even if the hint is > > wrong, the worst that happens is an unnecessary wakeup, maybe the > > second possibility is safer. > > > > What do you think? > > So if you really want to avoid the PMCSR write in the ERROR_RESPONSE case, > something like this can be done IMO: > > return false; > > /* Clear PME status. */ > - pmcsr |= PCI_PM_CTRL_PME_STATUS; > if (pmcsr & PCI_PM_CTRL_PME_ENABLE) { > + if (pmcsr == (u16) PCI_ERROR_RESPONSE) > + return true; > + > /* Disable PME to avoid interrupt flood. */ > pmcsr &= ~PCI_PM_CTRL_PME_ENABLE; > ret = true; Agreed, that's the conclusion I came to as well. I wouldn't do this just to avoid the config write, since as you mentioned that will get dropped anyway. The reason I would consider this is as an example of how drivers might think about validating data they read from devices. Bjorn
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 984171d40858..af6a97d7012b 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -2008,6 +2008,9 @@ bool pci_check_pme_status(struct pci_dev *dev) pmcsr_pos = dev->pm_cap + PCI_PM_CTRL; pci_read_config_word(dev, pmcsr_pos, &pmcsr); + if (pmcsr == (u16) PCI_ERROR_RESPONSE) + return false; + if (!(pmcsr & PCI_PM_CTRL_PME_STATUS)) return false;