Message ID | 20220919183342.4090-1-vidyas@nvidia.com (mailing list archive) |
---|---|
Series | PCI: designware-ep: Fix DBI access before core init |
On Tue, Sep 20, 2022 at 12:03:39AM +0530, Vidya Sagar wrote:
> This series attempts to fix the issue with core register (Ex:- DBI) accesses
> causing system hang issues in platforms where there is a dependency on the
> availability of PCIe Reference clock from the host for their core
> initialization.
> This series is verified on Tegra194 & Tegra234 platforms.

I think this design is just kind of weird, specifically, the fact that
setting .core_init_notifier makes dw_pcie_ep_init() bail out early.
The usual pattern is more like "if the specific driver sets this
function pointer, the generic code calls it."

The name "dw_pcie_ep_init_complete()" is not as helpful as it could
be: it tells us something about what has happened before this point,
but it doesn't tell us anything about what dw_pcie_ep_init_complete()
*does*.

Same thing with dw_pcie_ep_init_notify() -- it doesn't tell us
anything about what the function *does*.  I see that it calls
pci_epc_init_notify(), which calls a notifier call chain (currently
always empty except for a test case).  I think pci_epc_linkup() is a
better name because it says something about what's happening: the link
is now up and we're telling somebody about it.  "pci_epc_init_notify()"
doesn't convey that.  "pci_epc_core_initialized()" might.

It looks like both qcom and tegra wait for an interrupt before calling
dw_pcie_ep_init_notify(), but I'm a little concerned because I can't
figure out what specifically they do to start the process that
ultimately generates the interrupt.  Presumably they request the IRQ
*before* starting the process, but there's not much between the
devm_request_threaded_irq() and the interrupt handler, which makes me
wonder if both are racy.

> Manivannan, could you please verify on qcom platforms?
>
> V4:
> * Addressed review comments from Bjorn and Manivannan
> * Added .ep_init_late() ops
> * Added patches to refactor code in qcom and tegra platforms
>
> Vidya Sagar (3):
>   PCI: designware-ep: Fix DBI access before core init
>   PCI: qcom-ep: Refactor EP initialization completion
>   PCI: tegra194: Refactor EP initialization completion
>
>  .../pci/controller/dwc/pcie-designware-ep.c  | 112 ++++++++++--------
>  drivers/pci/controller/dwc/pcie-designware.h |  10 +-
>  drivers/pci/controller/dwc/pcie-qcom-ep.c    |  27 +++--
>  drivers/pci/controller/dwc/pcie-tegra194.c   |   4 +-
>  4 files changed, 85 insertions(+), 68 deletions(-)
>
> --
> 2.17.1
>
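
For readers unfamiliar with the "usual pattern" mentioned in the review above (generic code always runs and calls a driver-supplied function pointer only if one is set), a minimal standalone sketch might look like the following. All names (`demo_ep`, `demo_ep_ops`, `init_late`, `demo_ep_init`) are hypothetical and chosen for illustration only; this is not the DWC endpoint API, and it is not taken from the patch series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the DWC driver API. */
struct demo_ep;

struct demo_ep_ops {
	/* Optional platform-specific hook; may be left NULL. */
	int (*init_late)(struct demo_ep *ep);
};

struct demo_ep {
	const struct demo_ep_ops *ops;
	int generic_steps_done;	/* counts runs of the generic setup */
	int late_hook_calls;	/* counts invocations of the optional hook */
};

/*
 * Generic code: always performs its own work, then calls the hook
 * only if the platform driver provided one. Setting the hook never
 * causes the generic path to be skipped.
 */
int demo_ep_init(struct demo_ep *ep)
{
	assert(ep != NULL);

	ep->generic_steps_done++;	/* stand-in for the common setup */

	if (ep->ops && ep->ops->init_late)
		return ep->ops->init_late(ep);

	return 0;
}

/* Example platform hook that only records that it ran. */
int demo_platform_init_late(struct demo_ep *ep)
{
	ep->late_hook_calls++;
	return 0;
}
```

A platform that needs extra work would point `ops->init_late` at its own function; one that does not would leave it NULL, and `demo_ep_init()` behaves the same either way apart from the extra call.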
On 9/20/2022 4:10 AM, Bjorn Helgaas wrote:
> On Tue, Sep 20, 2022 at 12:03:39AM +0530, Vidya Sagar wrote:
>> This series attempts to fix the issue with core register (Ex:- DBI) accesses
>> causing system hang issues in platforms where there is a dependency on the
>> availability of PCIe Reference clock from the host for their core
>> initialization.
>> This series is verified on Tegra194 & Tegra234 platforms.
>
> I think this design is just kind of weird, specifically, the fact that
> setting .core_init_notifier makes dw_pcie_ep_init() bail out early.
> The usual pattern is more like "if the specific driver sets this
> function pointer, the generic code calls it."

Thanks for the review, Bjorn.

Typically, PCIe endpoints run using the reference clock from the host
that they are connected to. Our hardware designers followed the same
approach here as well, but the main difference is that the controllers
operating in endpoint mode are not standalone controllers but part of
a bigger Tegra (/Qcom) system.
So, the complete controller initialization sequence just can't happen
during the boot stage itself. Hence, the boot initialization sequence
needs to be split into two parts, viz.
a) early initialization, which just parses DT and does the programming
   that doesn't depend on the reference clock from the host, and
b) the programming that can only be performed after the reference
   clock is available from the host.
We are working with our hardware designers to avoid this dependency on
the reference clock from the host, so that all the programming can
happen during boot itself and the hardware is smart enough to switch to
the reference clock from the host when it becomes available. But this
is for future designs; Tegra194 & Tegra234 continue to have this
limitation.
>
> The name "dw_pcie_ep_init_complete()" is not as helpful as it could
> be: it tells us something about what has happened before this point,
> but it doesn't tell us anything about what dw_pcie_ep_init_complete()
> *does*.

To be in line with the new ep_init_late op that I added in this series,
would it be fine to name this dw_pcie_ep_init_late()?

>
> Same thing with dw_pcie_ep_init_notify() -- it doesn't tell us
> anything about what the function *does*.

Would it make more sense to rename it to dw_pcie_ep_linkup_notify()?

> I see that it calls
> pci_epc_init_notify(), which calls a notifier call chain (currently
> always empty except for a test case).  I think pci_epc_linkup() is a
> better name because it says something about what's happening: the link
> is now up and we're telling somebody about it.  "pci_epc_init_notify()"
> doesn't convey that.  "pci_epc_core_initialized()" might.

Ok. I'll rename it to pci_epc_core_initialized().

>
> It looks like both qcom and tegra wait for an interrupt before calling
> dw_pcie_ep_init_notify(), but I'm a little concerned because I can't
> figure out what specifically they do to start the process that
> ultimately generates the interrupt.

As part of 'start'ing the endpoint, as described in
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/PCI/endpoint/pci-test-howto.rst#n101
we execute the following:

    echo 1 > controllers/141a0000.pcie-ep/start

which enables interrupt generation for toggles on the PERST# line.

> Presumably they request the IRQ
> *before* starting the process, but there's not much between the
> devm_request_threaded_irq() and the interrupt handler, which makes me
> wonder if both are racy.

I don't think there is any race between these two, as the 'start' is
initiated from user space. Not sure if I'm missing something here
though.

>
>> Manivannan, could you please verify on qcom platforms?
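
The two-part initialization described in the reply above (clock-independent setup at boot, clock-dependent setup only once the host supplies the reference clock) could be modelled roughly as in the sketch below. This is a simplified, userspace-style illustration under stated assumptions: `demo_ctrl`, `demo_init_early()`, `demo_core_write()` and `demo_init_late()` are hypothetical names, the "register write" is just a counter, and the explicit clock check stands in for hardware behaviour that the thread describes as a system hang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names come from the kernel. */
struct demo_ctrl {
	bool refclk_present;	/* host is supplying the reference clock */
	bool early_done;	/* clock-independent setup finished */
	bool core_initialized;	/* clock-dependent setup finished */
	int core_reg_writes;	/* core (DBI-like) accesses performed */
};

/* Part a): runs at boot/probe time and must not touch core registers. */
void demo_init_early(struct demo_ctrl *c)
{
	assert(c != NULL);
	c->early_done = true;	/* stand-in for DT parsing and similar */
}

/* Guarded core register access: refused while the clock is absent. */
int demo_core_write(struct demo_ctrl *c)
{
	if (!c->refclk_present)
		return -1;	/* on the hardware discussed, this could hang */
	c->core_reg_writes++;
	return 0;
}

/* Part b): runs only once the host's reference clock is available. */
int demo_init_late(struct demo_ctrl *c)
{
	assert(c != NULL);

	if (!c->early_done)
		return -1;
	if (demo_core_write(c) != 0)
		return -1;

	c->core_initialized = true;
	return 0;
}
```

In this model, calling `demo_init_late()` before `refclk_present` is set fails without performing any core access, and calling it afterwards completes the initialization. That is the ordering the series aims to enforce.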
Hi Bjorn,

Did you find time to take a look at my responses?
If you don't have anything to add further, I'll take care of the review
comments as mentioned and send the V5 patch for review.
Please let me know.

Thanks,
Vidya Sagar

On 9/26/2022 8:32 PM, Vidya Sagar wrote:
> [...]