Message ID | 40b03450-8f42-29d5-b65e-43644ec44940@nvidia.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | Query about secondary_bu_reset implementation | expand |
Hi Folks, Could you please take time to help us understand this better? Thanks, Vidya Sagar On 11/15/2021 11:24 AM, Vidya Sagar wrote: > Hi folks, > Regarding the below commit that added pci_dev_wait() API to wait for the > device (supposed to be a downstream device.. i.e. and endpoint) get > ready, I'm wondering, given the 'dev' pointer here points to an upstream > device (i.e. a root port) because the same is passed to > pcibios_reset_secondary_bus() API, how is passing a root port's dev > pointer to pci_dev_wait() is going to serve the purpose? > My understanding is that it would always get the response immediately as > the reset is applied to the endpoint here (through secondary bus reset) > and not to the root port, right? or am I missing something here? > > > commit 6b2f1351af567110cec80d7c067314c633a14f50 > Author: Sinan Kaya <okaya@codeaurora.org> > Date: Tue Feb 27 14:14:12 2018 -0600 > > PCI: Wait for device to become ready after secondary bus reset > > Setting Secondary Bus Reset of a downstream port sends a hot reset. > PCIe > r4.0, sec 2.3.1, Request Handling Rules, indicates that a device > can return > CRS Completion Status following such a reset. Wait until the device > becomes ready in that situation. > > Signed-off-by: Sinan Kaya <okaya@codeaurora.org> > Signed-off-by: Bjorn Helgaas <helgaas@kernel.org> > Reviewed-by: Christoph Hellwig <hch@lst.de> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index dde40506ffe5..0b8e8ee84bbc 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -4233,7 +4233,7 @@ int pci_reset_bridge_secondary_bus(struct pci_dev > *dev) > { > pcibios_reset_secondary_bus(dev); > > - return 0; > + return pci_dev_wait(dev, "bus reset", PCIE_RESET_READY_POLL_MS); > } > EXPORT_SYMBOL_GPL(pci_reset_bridge_secondary_bus); > > > Thanks, > Vidya Sagar
On 11/18/2021 10:03 AM, Vidya Sagar wrote: > Regarding the below commit that added pci_dev_wait() API to wait for the > device (supposed to be a downstream device.. i.e. and endpoint) get > ready, I'm wondering, given the 'dev' pointer here points to an upstream > device (i.e. a root port) because the same is passed to > pcibios_reset_secondary_bus() API, how is passing a root port's dev > pointer to pci_dev_wait() is going to serve the purpose? > My understanding is that it would always get the response immediately as > the reset is applied to the endpoint here (through secondary bus reset) > and not to the root port, right? or am I missing something here? Root port is not reset. This is a link reset and recovery from link reset can take time per CRS response. We have seen some GPUs going all the way up to 60 seconds while returning CRS response and waiting to reinitialize.
On 11/19/2021 12:16 AM, Sinan Kaya wrote: > External email: Use caution opening links or attachments > > > On 11/18/2021 10:03 AM, Vidya Sagar wrote: >> Regarding the below commit that added pci_dev_wait() API to wait for the >> device (supposed to be a downstream device.. i.e. and endpoint) get >> ready, I'm wondering, given the 'dev' pointer here points to an upstream >> device (i.e. a root port) because the same is passed to >> pcibios_reset_secondary_bus() API, how is passing a root port's dev >> pointer to pci_dev_wait() is going to serve the purpose? > >> My understanding is that it would always get the response immediately as >> the reset is applied to the endpoint here (through secondary bus reset) >> and not to the root port, right? or am I missing something here? > > Root port is not reset. > This is a link reset and recovery from link reset can take time per CRS > response. > > We have seen some GPUs going all the way up to 60 seconds while > returning CRS response and waiting to reinitialize. Yes, but the pci_dev_wait() is called here with the pci_dev * of the RP and not the endpoint, right? So, how is CRSes from the endpoint are handled in this case?
On 11/19/2021 11:20 AM, Vidya Sagar wrote: > > > On 11/19/2021 12:16 AM, Sinan Kaya wrote: >> We have seen some GPUs going all the way up to 60 seconds while >> returning CRS response and waiting to reinitialize. > Yes, but the pci_dev_wait() is called here with the pci_dev * of the RP > and not the endpoint, right? So, how is CRSes from the endpoint are > handled in this case? I see what you are saying. Yes, that looks like a bug. It should have been a config space read to the EP.
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index dde40506ffe5..0b8e8ee84bbc 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4233,7 +4233,7 @@ int pci_reset_bridge_secondary_bus(struct pci_dev *dev) { pcibios_reset_secondary_bus(dev); - return 0; + return pci_dev_wait(dev, "bus reset", PCIE_RESET_READY_POLL_MS); } EXPORT_SYMBOL_GPL(pci_reset_bridge_secondary_bus);