Message ID | 36c7c3005c4d86a6884b270807d84433a86c0953.1633972263.git.naveennaidu479@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | PCI: Unify PCI error response checking | expand |
On Mon, Oct 11, 2021 at 11:37:33PM +0530, Naveen Naidu wrote: > An MMIO read from a PCI device that doesn't exist or doesn't respond > causes a PCI error. There's no real data to return to satisfy the > CPU read, so most hardware fabricates ~0 data. > > Use RESPONSE_IS_PCI_ERROR() to check the response we get when we read > data from hardware. Actually what happens is that PCI read transactions *time out*, so the host controller fabricates a response. By contrast, a PCI *error* usually denotes an Uncorrectable or Correctable Error as specified in section 6.2.2 of the PCIe Base Spec. Thus something like RESPONSE_IS_PCI_TIMEOUT() or IS_PCI_TIMEOUT() would probably be more appropriate. I'll leave the exact bikeshed color for others to decide. :-) > Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com> > --- > drivers/pci/hotplug/pciehp_hpc.c | 10 +++++----- > 1 file changed, 5 insertions(+), 5 deletions(-) Acked-by: Lukas Wunner <lukas@wunner.de>
On 11/10, Lukas Wunner wrote: > On Mon, Oct 11, 2021 at 11:37:33PM +0530, Naveen Naidu wrote: > > An MMIO read from a PCI device that doesn't exist or doesn't respond > > causes a PCI error. There's no real data to return to satisfy the > > CPU read, so most hardware fabricates ~0 data. > > > > Use RESPONSE_IS_PCI_ERROR() to check the response we get when we read > > data from hardware. > > Actually what happens is that PCI read transactions *time out*, > so the host controller fabricates a response. > Ah! yes. Now that I look at it, RESPONSE_IS_PCI_TIMEOUT() does indeed seem like a better option to RESPONSE_IS_PCI_ERROR(), since it's more specfic and depicts the actual condition. I'll wait for sometime and see if others have any objection/a better name for the macro and then redo the patch with that. Thank you very much for the review ^^ > By contrast, a PCI *error* usually denotes an Uncorrectable or > Correctable Error as specified in section 6.2.2 of the PCIe Base Spec. > > Thus something like RESPONSE_IS_PCI_TIMEOUT() or IS_PCI_TIMEOUT() would > probably be more appropriate. I'll leave the exact bikeshed color for > others to decide. :-) > > > > Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com> > > --- > > drivers/pci/hotplug/pciehp_hpc.c | 10 +++++----- > > 1 file changed, 5 insertions(+), 5 deletions(-) > > Acked-by: Lukas Wunner <lukas@wunner.de>
On Tuesday 12 October 2021 21:35:13 Naveen Naidu wrote: > On 11/10, Lukas Wunner wrote: > > On Mon, Oct 11, 2021 at 11:37:33PM +0530, Naveen Naidu wrote: > > > An MMIO read from a PCI device that doesn't exist or doesn't respond > > > causes a PCI error. There's no real data to return to satisfy the > > > CPU read, so most hardware fabricates ~0 data. > > > > > > Use RESPONSE_IS_PCI_ERROR() to check the response we get when we read > > > data from hardware. > > > > Actually what happens is that PCI read transactions *time out*, > > so the host controller fabricates a response. > > > > Ah! yes. Now that I look at it, RESPONSE_IS_PCI_TIMEOUT() does indeed > seem like a better option to RESPONSE_IS_PCI_ERROR(), since it's more > specfic and depicts the actual condition. This is not fully correct. 0xffffffff is returned when some error happens. It does not have to be timeout error. Errors like Unsupported Request, Completer Abort or Configuration Request Retry Status (when CRSSVE bit is disabled) are also reported as 0xffffffff and they do not represent timeout. For example Unsupported Request is returned when you try to read from non-existent device behind some PCIe switch. Also pci-aardvark.c fabricates value 0xffffffff when trying to read from config space below the PCIe Root Port when PCIe link is not up. And I have seen that Completer Abort was returned by PCIe switch when switch itself did not received reply from device below switch. So it means that controller can receive some reply from other device even when no real reply was sent. Which means that timeout can be reported by some other message. So I think that generic PCI_ERROR is the best name. You do not know what really happened (only some controller drivers can provide additional information, it does not have any standard HW<-->OS API) and application logic must decide how to process error. > I'll wait for sometime and see if others have any objection/a better > name for the macro and then redo the patch with that. > > Thank you very much for the review ^^ > > > By contrast, a PCI *error* usually denotes an Uncorrectable or > > Correctable Error as specified in section 6.2.2 of the PCIe Base Spec. > > > > Thus something like RESPONSE_IS_PCI_TIMEOUT() or IS_PCI_TIMEOUT() would > > probably be more appropriate. I'll leave the exact bikeshed color for > > others to decide. :-) > > > > > > > Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com> > > > --- > > > drivers/pci/hotplug/pciehp_hpc.c | 10 +++++----- > > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > Acked-by: Lukas Wunner <lukas@wunner.de>
On Wed, Oct 13, 2021 at 01:12:01AM +0200, Pali Rohár wrote: > > On 11/10, Lukas Wunner wrote: > > > On Mon, Oct 11, 2021 at 11:37:33PM +0530, Naveen Naidu wrote: > > > > An MMIO read from a PCI device that doesn't exist or doesn't respond > > > > causes a PCI error. There's no real data to return to satisfy the > > > > CPU read, so most hardware fabricates ~0 data. > > > > > > > > Use RESPONSE_IS_PCI_ERROR() to check the response we get when we read > > > > data from hardware. > > > > > > Actually what happens is that PCI read transactions *time out*, > > > so the host controller fabricates a response. > > This is not fully correct. 0xffffffff is returned when some error > happens. It does not have to be timeout error. Errors like Unsupported > Request, Completer Abort or Configuration Request Retry Status (when > CRSSVE bit is disabled) are also reported as 0xffffffff and they do not > represent timeout. For example Unsupported Request is returned when you > try to read from non-existent device behind some PCIe switch. This particular patch concerns pciehp and in that context, "all ones" responses are predominantly timeouts caused by hot-removed devices. Thanks, Lukas
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index 3024d7e85e6a..8a2f6bb643b5 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -89,7 +89,7 @@ static int pcie_poll_cmd(struct controller *ctrl, int timeout) do { pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status); - if (slot_status == (u16) ~0) { + if (RESPONSE_IS_PCI_ERROR(&slot_status)) { ctrl_info(ctrl, "%s: no response from device\n", __func__); return 0; @@ -165,7 +165,7 @@ static void pcie_do_write_cmd(struct controller *ctrl, u16 cmd, pcie_wait_cmd(ctrl); pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, &slot_ctrl); - if (slot_ctrl == (u16) ~0) { + if (RESPONSE_IS_PCI_ERROR(&slot_ctrl)) { ctrl_info(ctrl, "%s: no response from device\n", __func__); goto out; } @@ -236,7 +236,7 @@ int pciehp_check_link_active(struct controller *ctrl) int ret; ret = pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status); - if (ret == PCIBIOS_DEVICE_NOT_FOUND || lnk_status == (u16)~0) + if (ret == PCIBIOS_DEVICE_NOT_FOUND || RESPONSE_IS_PCI_ERROR(&lnk_status)) return -ENODEV; ret = !!(lnk_status & PCI_EXP_LNKSTA_DLLLA); @@ -443,7 +443,7 @@ int pciehp_card_present(struct controller *ctrl) int ret; ret = pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status); - if (ret == PCIBIOS_DEVICE_NOT_FOUND || slot_status == (u16)~0) + if (ret == PCIBIOS_DEVICE_NOT_FOUND || RESPONSE_IS_PCI_ERROR(&slot_status)) return -ENODEV; return !!(slot_status & PCI_EXP_SLTSTA_PDS); @@ -621,7 +621,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id) read_status: pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &status); - if (status == (u16) ~0) { + if (RESPONSE_IS_PCI_ERROR(&status)) { ctrl_info(ctrl, "%s: no response from device\n", __func__); if (parent) pm_runtime_put(parent);
An MMIO read from a PCI device that doesn't exist or doesn't respond causes a PCI error. There's no real data to return to satisfy the CPU read, so most hardware fabricates ~0 data. Use RESPONSE_IS_PCI_ERROR() to check the response we get when we read data from hardware. This helps unify PCI error response checking and make error checks consistent and easier to find. Compile tested only. Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com> --- drivers/pci/hotplug/pciehp_hpc.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)