Message ID | 20230927041845.1222080-1-s-vadapalli@ti.com (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Krzysztof Wilczyński |
Headers | show |
Series | [v3] PCI: keystone: Fix race condition when initializing PHYs | expand |
On 9/27/23 9:48 AM, Siddharth Vadapalli wrote: > The PCI driver invokes the PHY APIs using the ks_pcie_enable_phy() > function. The PHY in this case is the Serdes. It is possible that the > PCI instance is configured for 2 lane operation across two different > Serdes instances, using 1 lane of each Serdes. In such a configuration, > if the reference clock for one Serdes is provided by the other Serdes, > it results in a race condition. After the Serdes providing the reference > clock is initialized by the PCI driver by invoking its PHY APIs, it is > not guaranteed that this Serdes remains powered on long enough for the > PHY APIs based initialization of the dependent Serdes. In such cases, > the PLL of the dependent Serdes fails to lock due to the absence of the > reference clock from the former Serdes which has been powered off by the > PM Core. > > Fix this by obtaining reference to the PHYs before invoking the PHY > initialization APIs and releasing reference after the initialization is > complete. Sounds reasonable. Acked-by: Ravi Gunasekaran <r-gunasekaran@ti.com> Ravi > > Fixes: 49229238ab47 ("PCI: keystone: Cleanup PHY handling") > Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> > --- > > NOTE: This patch is based on linux-next tagged next-20230927. > > v2: > https://lore.kernel.org/r/20230926063638.1005124-1-s-vadapalli@ti.com/ > > Changes since v2: > - Implement suggestion by Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> > moving the phy_pm_runtime_put_sync() For-Loop section before the > return value of ks_pcie_enable_phy(ks_pcie) is checked, thereby > preventing duplication of the For-Loop. > - Rebase patch on next-20230927. > > v1: > https://lore.kernel.org/r/20230926054200.963803-1-s-vadapalli@ti.com/ > > Changes since v1: > - Add code to release reference(s) to the phy(s) when > ks_pcie_enable_phy(ks_pcie) fails. > > Regards, > Siddharth. > > drivers/pci/controller/dwc/pci-keystone.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/drivers/pci/controller/dwc/pci-keystone.c b/drivers/pci/controller/dwc/pci-keystone.c > index 49aea6ce3e87..0ec6720cc2df 100644 > --- a/drivers/pci/controller/dwc/pci-keystone.c > +++ b/drivers/pci/controller/dwc/pci-keystone.c > @@ -1218,7 +1218,16 @@ static int __init ks_pcie_probe(struct platform_device *pdev) > goto err_link; > } > > + /* Obtain reference(s) to the phy(s) */ > + for (i = 0; i < num_lanes; i++) > + phy_pm_runtime_get_sync(ks_pcie->phy[i]); > + > ret = ks_pcie_enable_phy(ks_pcie); > + > + /* Release reference(s) to the phy(s) */ > + for (i = 0; i < num_lanes; i++) > + phy_pm_runtime_put_sync(ks_pcie->phy[i]); > + > if (ret) { > dev_err(dev, "failed to enable phy\n"); > goto err_link;
Hello Bjorn, Could you please merge this patch? On 28/09/23 13:21, Ravi Gunasekaran wrote: > > > On 9/27/23 9:48 AM, Siddharth Vadapalli wrote: >> The PCI driver invokes the PHY APIs using the ks_pcie_enable_phy() >> function. The PHY in this case is the Serdes. It is possible that the >> PCI instance is configured for 2 lane operation across two different >> Serdes instances, using 1 lane of each Serdes. In such a configuration, >> if the reference clock for one Serdes is provided by the other Serdes, >> it results in a race condition. After the Serdes providing the reference >> clock is initialized by the PCI driver by invoking its PHY APIs, it is >> not guaranteed that this Serdes remains powered on long enough for the >> PHY APIs based initialization of the dependent Serdes. In such cases, >> the PLL of the dependent Serdes fails to lock due to the absence of the >> reference clock from the former Serdes which has been powered off by the >> PM Core. >> >> Fix this by obtaining reference to the PHYs before invoking the PHY >> initialization APIs and releasing reference after the initialization is >> complete. > > Sounds reasonable. > > Acked-by: Ravi Gunasekaran <r-gunasekaran@ti.com> > > Ravi >> >> Fixes: 49229238ab47 ("PCI: keystone: Cleanup PHY handling") >> Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> >> --- >> >> NOTE: This patch is based on linux-next tagged next-20230927. >> >> v2: >> https://lore.kernel.org/r/20230926063638.1005124-1-s-vadapalli@ti.com/ >> >> Changes since v2: >> - Implement suggestion by Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> >> moving the phy_pm_runtime_put_sync() For-Loop section before the >> return value of ks_pcie_enable_phy(ks_pcie) is checked, thereby >> preventing duplication of the For-Loop. >> - Rebase patch on next-20230927. >> >> v1: >> https://lore.kernel.org/r/20230926054200.963803-1-s-vadapalli@ti.com/ >> >> Changes since v1: >> - Add code to release reference(s) to the phy(s) when >> ks_pcie_enable_phy(ks_pcie) fails. >> >> Regards, >> Siddharth. >> >> drivers/pci/controller/dwc/pci-keystone.c | 9 +++++++++ >> 1 file changed, 9 insertions(+) >> >> diff --git a/drivers/pci/controller/dwc/pci-keystone.c b/drivers/pci/controller/dwc/pci-keystone.c >> index 49aea6ce3e87..0ec6720cc2df 100644 >> --- a/drivers/pci/controller/dwc/pci-keystone.c >> +++ b/drivers/pci/controller/dwc/pci-keystone.c >> @@ -1218,7 +1218,16 @@ static int __init ks_pcie_probe(struct platform_device *pdev) >> goto err_link; >> } >> >> + /* Obtain reference(s) to the phy(s) */ >> + for (i = 0; i < num_lanes; i++) >> + phy_pm_runtime_get_sync(ks_pcie->phy[i]); >> + >> ret = ks_pcie_enable_phy(ks_pcie); >> + >> + /* Release reference(s) to the phy(s) */ >> + for (i = 0; i < num_lanes; i++) >> + phy_pm_runtime_put_sync(ks_pcie->phy[i]); >> + >> if (ret) { >> dev_err(dev, "failed to enable phy\n"); >> goto err_link; >
Hello, > The PCI driver invokes the PHY APIs using the ks_pcie_enable_phy() > function. The PHY in this case is the Serdes. It is possible that the > PCI instance is configured for 2 lane operation across two different > Serdes instances, using 1 lane of each Serdes. In such a configuration, > if the reference clock for one Serdes is provided by the other Serdes, > it results in a race condition. After the Serdes providing the reference > clock is initialized by the PCI driver by invoking its PHY APIs, it is > not guaranteed that this Serdes remains powered on long enough for the > PHY APIs based initialization of the dependent Serdes. In such cases, > the PLL of the dependent Serdes fails to lock due to the absence of the > reference clock from the former Serdes which has been powered off by the > PM Core. > > Fix this by obtaining reference to the PHYs before invoking the PHY > initialization APIs and releasing reference after the initialization is > complete. Applied to controller/keystone, thank you! [1/1] PCI: keystone: Fix race condition when initializing PHYs https://git.kernel.org/pci/pci/c/c12ca110c613 Krzysztof
On Wed, Sep 27, 2023 at 09:48:45AM +0530, Siddharth Vadapalli wrote: > The PCI driver invokes the PHY APIs using the ks_pcie_enable_phy() > function. The PHY in this case is the Serdes. It is possible that the > PCI instance is configured for 2 lane operation across two different > Serdes instances, using 1 lane of each Serdes. In such a configuration, > if the reference clock for one Serdes is provided by the other Serdes, > it results in a race condition. After the Serdes providing the reference > clock is initialized by the PCI driver by invoking its PHY APIs, it is > not guaranteed that this Serdes remains powered on long enough for the > PHY APIs based initialization of the dependent Serdes. In such cases, > the PLL of the dependent Serdes fails to lock due to the absence of the > reference clock from the former Serdes which has been powered off by the > PM Core. > > Fix this by obtaining reference to the PHYs before invoking the PHY > initialization APIs and releasing reference after the initialization is > complete. > > Fixes: 49229238ab47 ("PCI: keystone: Cleanup PHY handling") > Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> > --- > > NOTE: This patch is based on linux-next tagged next-20230927. > > v2: > https://lore.kernel.org/r/20230926063638.1005124-1-s-vadapalli@ti.com/ > > Changes since v2: > - Implement suggestion by Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> > moving the phy_pm_runtime_put_sync() For-Loop section before the > return value of ks_pcie_enable_phy(ks_pcie) is checked, thereby > preventing duplication of the For-Loop. > - Rebase patch on next-20230927. > > v1: > https://lore.kernel.org/r/20230926054200.963803-1-s-vadapalli@ti.com/ > > Changes since v1: > - Add code to release reference(s) to the phy(s) when > ks_pcie_enable_phy(ks_pcie) fails. > > Regards, > Siddharth. > > drivers/pci/controller/dwc/pci-keystone.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/drivers/pci/controller/dwc/pci-keystone.c b/drivers/pci/controller/dwc/pci-keystone.c > index 49aea6ce3e87..0ec6720cc2df 100644 > --- a/drivers/pci/controller/dwc/pci-keystone.c > +++ b/drivers/pci/controller/dwc/pci-keystone.c > @@ -1218,7 +1218,16 @@ static int __init ks_pcie_probe(struct platform_device *pdev) > goto err_link; > } > > + /* Obtain reference(s) to the phy(s) */ > + for (i = 0; i < num_lanes; i++) > + phy_pm_runtime_get_sync(ks_pcie->phy[i]); > + > ret = ks_pcie_enable_phy(ks_pcie); > + > + /* Release reference(s) to the phy(s) */ > + for (i = 0; i < num_lanes; i++) > + phy_pm_runtime_put_sync(ks_pcie->phy[i]); This looks good and has already been applied, so no immediate action required. This is the only call to ks_pcie_enable_phy(), and these loops get and put the PM references for the same PHYs initialized in ks_pcie_enable_phy(), so it seems like maybe these loops could be moved *into* ks_pcie_enable_phy(). Is there any similar issue in ks_pcie_disable_phy()? What if we power-off a PHY that provides a reference clock to other PHYs that are still powered-up? Will the dependent PHYs still power-off cleanly? > if (ret) { > dev_err(dev, "failed to enable phy\n"); > goto err_link; > -- > 2.34.1 > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Hello Bjorn, On 10/01/24 02:53, Bjorn Helgaas wrote: > On Wed, Sep 27, 2023 at 09:48:45AM +0530, Siddharth Vadapalli wrote: >> The PCI driver invokes the PHY APIs using the ks_pcie_enable_phy() >> function. The PHY in this case is the Serdes. It is possible that the >> PCI instance is configured for 2 lane operation across two different ... >> >> + /* Obtain reference(s) to the phy(s) */ >> + for (i = 0; i < num_lanes; i++) >> + phy_pm_runtime_get_sync(ks_pcie->phy[i]); >> + >> ret = ks_pcie_enable_phy(ks_pcie); >> + >> + /* Release reference(s) to the phy(s) */ >> + for (i = 0; i < num_lanes; i++) >> + phy_pm_runtime_put_sync(ks_pcie->phy[i]); > > This looks good and has already been applied, so no immediate action > required. > > This is the only call to ks_pcie_enable_phy(), and these loops get and > put the PM references for the same PHYs initialized in > ks_pcie_enable_phy(), so it seems like maybe these loops could be > moved *into* ks_pcie_enable_phy(). Does the following look fine? =============================================================================== diff --git a/drivers/pci/controller/dwc/pci-keystone.c b/drivers/pci/controller/dwc/pci-keystone.c index e02236003b46..6e9f9589d26c 100644 --- a/drivers/pci/controller/dwc/pci-keystone.c +++ b/drivers/pci/controller/dwc/pci-keystone.c @@ -962,6 +962,9 @@ static int ks_pcie_enable_phy(struct keystone_pcie *ks_pcie) int num_lanes = ks_pcie->num_lanes; for (i = 0; i < num_lanes; i++) { + /* Obtain reference to the phy */ + phy_pm_runtime_get_sync(ks_pcie->phy[i]); + ret = phy_reset(ks_pcie->phy[i]); if (ret < 0) goto err_phy; @@ -977,12 +980,18 @@ static int ks_pcie_enable_phy(struct keystone_pcie *ks_pcie) } } + /* Release reference(s) to the phy(s) */ + for (i = 0; i < num_lanes; i++) + phy_pm_runtime_put_sync(ks_pcie->phy[i]); + return 0; err_phy: while (--i >= 0) { phy_power_off(ks_pcie->phy[i]); phy_exit(ks_pcie->phy[i]); + /* Release reference to the phy */ + phy_pm_runtime_put_sync(ks_pcie->phy[i]); } return ret; =============================================================================== > > Is there any similar issue in ks_pcie_disable_phy()? What if we > power-off a PHY that provides a reference clock to other PHYs that are > still powered-up? Will the dependent PHYs still power-off cleanly? While debugging the issue fixed by this patch, I had bisected and identified that prior to the following commit: https://github.com/torvalds/linux/commit/e611f8cd8717c8fe7d4229997e6cd029a1465253 despite the race condition being present, there was no issue. While I am not fully certain, I believe that the above observation indicates that prior to the aforementioned commit, the race condition did exist, but there was a slightly longer delay between the PHY providing the reference clock being powered off within "ks_pcie_enable_phy()". That delay was sufficient for the dependent PHY to lock its PLL based on the reference clock provided, following which, despite the PHY providing the reference clock being powered off and the dependent PHY staying powered on, there was no issue observed. Therefore, it appears to me that holding reference to the PHY providing the reference clock isn't necessary once the dependent PHY's PLL is locked. ...
On Wed, Jan 10, 2024 at 11:35:24AM +0530, Siddharth Vadapalli wrote: > Hello Bjorn, > > On 10/01/24 02:53, Bjorn Helgaas wrote: > > On Wed, Sep 27, 2023 at 09:48:45AM +0530, Siddharth Vadapalli wrote: > >> The PCI driver invokes the PHY APIs using the ks_pcie_enable_phy() > >> function. The PHY in this case is the Serdes. It is possible that the > >> PCI instance is configured for 2 lane operation across two different > > ... > > >> > >> + /* Obtain reference(s) to the phy(s) */ > >> + for (i = 0; i < num_lanes; i++) > >> + phy_pm_runtime_get_sync(ks_pcie->phy[i]); > >> + > >> ret = ks_pcie_enable_phy(ks_pcie); > >> + > >> + /* Release reference(s) to the phy(s) */ > >> + for (i = 0; i < num_lanes; i++) > >> + phy_pm_runtime_put_sync(ks_pcie->phy[i]); > > > > This looks good and has already been applied, so no immediate action > > required. > > > > This is the only call to ks_pcie_enable_phy(), and these loops get and > > put the PM references for the same PHYs initialized in > > ks_pcie_enable_phy(), so it seems like maybe these loops could be > > moved *into* ks_pcie_enable_phy(). > > Does the following look fine? > =============================================================================== > diff --git a/drivers/pci/controller/dwc/pci-keystone.c > b/drivers/pci/controller/dwc/pci-keystone.c > index e02236003b46..6e9f9589d26c 100644 > --- a/drivers/pci/controller/dwc/pci-keystone.c > +++ b/drivers/pci/controller/dwc/pci-keystone.c > @@ -962,6 +962,9 @@ static int ks_pcie_enable_phy(struct keystone_pcie *ks_pcie) > int num_lanes = ks_pcie->num_lanes; > > for (i = 0; i < num_lanes; i++) { > + /* Obtain reference to the phy */ > + phy_pm_runtime_get_sync(ks_pcie->phy[i]); I thought the point was that you needed to guarantee that all PHYs are powered on and stay that way before initializing any of them. If so, you would need a separate loop, e.g., for (i = 0; i < num_lanes; i++) phy_pm_runtime_get_sync(ks_pcie->phy[i]); for (i = 0; i < num_lanes; i++) { ret = phy_reset(ks_pcie->phy[i]); ... > ret = phy_reset(ks_pcie->phy[i]); > if (ret < 0) > goto err_phy; > @@ -977,12 +980,18 @@ static int ks_pcie_enable_phy(struct keystone_pcie *ks_pcie) > } > } > > + /* Release reference(s) to the phy(s) */ > + for (i = 0; i < num_lanes; i++) > + phy_pm_runtime_put_sync(ks_pcie->phy[i]); > + > return 0; > > err_phy: > while (--i >= 0) { > phy_power_off(ks_pcie->phy[i]); > phy_exit(ks_pcie->phy[i]); > + /* Release reference to the phy */ > + phy_pm_runtime_put_sync(ks_pcie->phy[i]); > } > > return ret;
On 20/01/24 04:50, Bjorn Helgaas wrote: > On Wed, Jan 10, 2024 at 11:35:24AM +0530, Siddharth Vadapalli wrote: >> Hello Bjorn, >> >> On 10/01/24 02:53, Bjorn Helgaas wrote: ... >> >> Does the following look fine? >> =============================================================================== >> diff --git a/drivers/pci/controller/dwc/pci-keystone.c >> b/drivers/pci/controller/dwc/pci-keystone.c >> index e02236003b46..6e9f9589d26c 100644 >> --- a/drivers/pci/controller/dwc/pci-keystone.c >> +++ b/drivers/pci/controller/dwc/pci-keystone.c >> @@ -962,6 +962,9 @@ static int ks_pcie_enable_phy(struct keystone_pcie *ks_pcie) >> int num_lanes = ks_pcie->num_lanes; >> >> for (i = 0; i < num_lanes; i++) { >> + /* Obtain reference to the phy */ >> + phy_pm_runtime_get_sync(ks_pcie->phy[i]); > > I thought the point was that you needed to guarantee that all PHYs are > powered on and stay that way before initializing any of them. If so, > you would need a separate loop, e.g., > > for (i = 0; i < num_lanes; i++) > phy_pm_runtime_get_sync(ks_pcie->phy[i]); > > for (i = 0; i < num_lanes; i++) { > ret = phy_reset(ks_pcie->phy[i]); > ... > Yes, the above implementation seems better. The strict requirement will be that post initialization of the first PHY (Serdes), it remains powered ON so that it can provide its reference clock to the second PHY (Serdes). Therefore, getting the reference to the PHYs within the loop will work too since the reference is being release only outside the loop. Nevertheless I shall go ahead with the implementation suggested by you since that looks much better and cleaner. >> ret = phy_reset(ks_pcie->phy[i]); >> if (ret < 0) >> goto err_phy; >> @@ -977,12 +980,18 @@ static int ks_pcie_enable_phy(struct keystone_pcie *ks_pcie) >> } >> } >> >> + /* Release reference(s) to the phy(s) */ >> + for (i = 0; i < num_lanes; i++) >> + phy_pm_runtime_put_sync(ks_pcie->phy[i]); >> + >> return 0; >> >> err_phy: >> while (--i >= 0) { >> phy_power_off(ks_pcie->phy[i]); >> phy_exit(ks_pcie->phy[i]); >> + /* Release reference to the phy */ >> + phy_pm_runtime_put_sync(ks_pcie->phy[i]); >> } >> >> return ret;
diff --git a/drivers/pci/controller/dwc/pci-keystone.c b/drivers/pci/controller/dwc/pci-keystone.c index 49aea6ce3e87..0ec6720cc2df 100644 --- a/drivers/pci/controller/dwc/pci-keystone.c +++ b/drivers/pci/controller/dwc/pci-keystone.c @@ -1218,7 +1218,16 @@ static int __init ks_pcie_probe(struct platform_device *pdev) goto err_link; } + /* Obtain reference(s) to the phy(s) */ + for (i = 0; i < num_lanes; i++) + phy_pm_runtime_get_sync(ks_pcie->phy[i]); + ret = ks_pcie_enable_phy(ks_pcie); + + /* Release reference(s) to the phy(s) */ + for (i = 0; i < num_lanes; i++) + phy_pm_runtime_put_sync(ks_pcie->phy[i]); + if (ret) { dev_err(dev, "failed to enable phy\n"); goto err_link;
The PCI driver invokes the PHY APIs using the ks_pcie_enable_phy() function. The PHY in this case is the Serdes. It is possible that the PCI instance is configured for 2 lane operation across two different Serdes instances, using 1 lane of each Serdes. In such a configuration, if the reference clock for one Serdes is provided by the other Serdes, it results in a race condition. After the Serdes providing the reference clock is initialized by the PCI driver by invoking its PHY APIs, it is not guaranteed that this Serdes remains powered on long enough for the PHY APIs based initialization of the dependent Serdes. In such cases, the PLL of the dependent Serdes fails to lock due to the absence of the reference clock from the former Serdes which has been powered off by the PM Core. Fix this by obtaining reference to the PHYs before invoking the PHY initialization APIs and releasing reference after the initialization is complete. Fixes: 49229238ab47 ("PCI: keystone: Cleanup PHY handling") Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> --- NOTE: This patch is based on linux-next tagged next-20230927. v2: https://lore.kernel.org/r/20230926063638.1005124-1-s-vadapalli@ti.com/ Changes since v2: - Implement suggestion by Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> moving the phy_pm_runtime_put_sync() For-Loop section before the return value of ks_pcie_enable_phy(ks_pcie) is checked, thereby preventing duplication of the For-Loop. - Rebase patch on next-20230927. v1: https://lore.kernel.org/r/20230926054200.963803-1-s-vadapalli@ti.com/ Changes since v1: - Add code to release reference(s) to the phy(s) when ks_pcie_enable_phy(ks_pcie) fails. Regards, Siddharth. drivers/pci/controller/dwc/pci-keystone.c | 9 +++++++++ 1 file changed, 9 insertions(+)