Message ID | 92d97c68-d226-6290-37d6-f46f42ea604b@free.fr (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
Series | [v1] phy: qcom-qmp: Raise qcom_qmp_phy_enable() polling delay | expand |
Hi Marc, On 6/13/2019 5:02 PM, Marc Gonzalez wrote: > readl_poll_timeout() calls usleep_range() to sleep between reads. > usleep_range() doesn't work efficiently for tiny values. > > Raise the polling delay in qcom_qmp_phy_enable() to bring it in line > with the delay in qcom_qmp_phy_com_init(). > > Signed-off-by: Marc Gonzalez <marc.w.gonzalez@free.fr> > --- > Vivek, do you remember why you didn't use the same delay value in > qcom_qmp_phy_enable) and qcom_qmp_phy_com_init() ? phy_qcom_init() thingy came from the PCIE phy driver from downstream msm-3.18 PCIE did something as below: ----- do { if (pcie_phy_is_ready(dev)) break; retries++; usleep_range(REFCLK_STABILIZATION_DELAY_US_MIN, REFCLK_STABILIZATION_DELAY_US_MAX); } while (retries < PHY_READY_TIMEOUT_COUNT); REFCLK_STABILIZATION_DELAY_US_MIN/MAX ==> 1000/1005 PHY_READY_TIMEOUT_COUNT ==> 10 ----- phy_enable() from the usb phy driver from downstream. /* Wait for PHY initialization to be done */ do { if (readl_relaxed(phy->base + phy->phy_reg[USB3_PHY_PCS_STATUS]) & PHYSTATUS) usleep_range(1, 2); else break; } while (--init_timeout_usec); init_timeout_usec ==> 1000 ----- USB never had a COM_PHY status bit. So clearly the resolutions were different. Does this change solves an issue at hand? > --- > drivers/phy/qualcomm/phy-qcom-qmp.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.c b/drivers/phy/qualcomm/phy-qcom-qmp.c > index bb522b915fa9..34ff6434da8f 100644 > --- a/drivers/phy/qualcomm/phy-qcom-qmp.c > +++ b/drivers/phy/qualcomm/phy-qcom-qmp.c > @@ -1548,7 +1548,7 @@ static int qcom_qmp_phy_enable(struct phy *phy) > status = pcs + cfg->regs[QPHY_PCS_READY_STATUS]; > mask = cfg->mask_pcs_ready; > > - ret = readl_poll_timeout(status, val, val & mask, 1, > + ret = readl_poll_timeout(status, val, val & mask, 10, > PHY_INIT_COMPLETE_TIMEOUT); > if (ret) { > dev_err(qmp->dev, "phy initialization timed-out\n");
+ Doug (who is familiar with usleep_range quirks) On 14/06/2019 11:50, Vivek Gautam wrote: > On 6/13/2019 5:02 PM, Marc Gonzalez wrote: > >> readl_poll_timeout() calls usleep_range() to sleep between reads. >> usleep_range() doesn't work efficiently for tiny values. >> >> Raise the polling delay in qcom_qmp_phy_enable() to bring it in line >> with the delay in qcom_qmp_phy_com_init(). >> >> Signed-off-by: Marc Gonzalez <marc.w.gonzalez@free.fr> >> --- >> Vivek, do you remember why you didn't use the same delay value in >> qcom_qmp_phy_enable) and qcom_qmp_phy_com_init() ? > > phy_qcom_init() thingy came from the PCIE phy driver from downstream > msm-3.18 PCIE did something as below: FWIW and IMO, drivers/pci/host/pci-msm.c is a good example of how not to write a device driver. It's huge (7000+ lines) because it handles multiple platforms via ifdefs, and lumps everything together (phy, core IP, SoC specific glue) in a single file. > ----- > do { > if (pcie_phy_is_ready(dev)) > break; > retries++; > usleep_range(REFCLK_STABILIZATION_DELAY_US_MIN, > REFCLK_STABILIZATION_DELAY_US_MAX); > } while (retries < PHY_READY_TIMEOUT_COUNT); > > REFCLK_STABILIZATION_DELAY_US_MIN/MAX ==> 1000/1005 > PHY_READY_TIMEOUT_COUNT ==> 10 > ----- https://source.codeaurora.org/quic/la/kernel/msm-4.4/tree/drivers/pci/host/pci-msm.c?h=LE.UM.1.3.r3.25#n4624 https://source.codeaurora.org/quic/la/kernel/msm-4.4/tree/drivers/pci/host/pci-msm.c?h=LE.UM.1.3.r3.25#n1721 readl_relaxed(dev->phy + PCIE_N_PCS_STATUS(dev->rc_idx, dev->common_phy)) & BIT(6) is equivalent to: the check in qcom_qmp_phy_enable() readl_relaxed(dev->phy + PCIE_COM_PCS_READY_STATUS) & 0x1 is equivalent to: the check in qcom_qmp_phy_com_init() I'll take a closer look, using some printks, to narrow down the run-time execution path. > phy_enable() from the usb phy driver from downstream. > /* Wait for PHY initialization to be done */ > do { > if (readl_relaxed(phy->base + > phy->phy_reg[USB3_PHY_PCS_STATUS]) & PHYSTATUS) > usleep_range(1, 2); > else > break; > } while (--init_timeout_usec); > > init_timeout_usec ==> 1000 > ----- > USB never had a COM_PHY status bit. > > So clearly the resolutions were different. > > Does this change solve an issue at hand? The issue is usleep_range() being misused ^_^ Although usleep_range() takes unsigned longs as parameters, it is not appropriate over the entire 0-2^64 range. a) It should not be used with tiny values, because the cost of programming the timer interrupt, and processing the resulting IRQ would dominate. b) It should not be used with large values (above 2000000/HZ) because msleep() is more efficient, and is acceptable for these ranges. Regards.
Hi, On 14/06/19 6:08 PM, Marc Gonzalez wrote: > + Doug (who is familiar with usleep_range quirks) > > On 14/06/2019 11:50, Vivek Gautam wrote: > >> On 6/13/2019 5:02 PM, Marc Gonzalez wrote: >> >>> readl_poll_timeout() calls usleep_range() to sleep between reads. >>> usleep_range() doesn't work efficiently for tiny values. >>> >>> Raise the polling delay in qcom_qmp_phy_enable() to bring it in line >>> with the delay in qcom_qmp_phy_com_init(). >>> >>> Signed-off-by: Marc Gonzalez <marc.w.gonzalez@free.fr> >>> --- >>> Vivek, do you remember why you didn't use the same delay value in >>> qcom_qmp_phy_enable) and qcom_qmp_phy_com_init() ? >> >> phy_qcom_init() thingy came from the PCIE phy driver from downstream >> msm-3.18 PCIE did something as below: > > FWIW and IMO, drivers/pci/host/pci-msm.c is a good example of how not to write > a device driver. It's huge (7000+ lines) because it handles multiple platforms > via ifdefs, and lumps everything together (phy, core IP, SoC specific glue) > in a single file. > >> ----- >> do { >> if (pcie_phy_is_ready(dev)) >> break; >> retries++; >> usleep_range(REFCLK_STABILIZATION_DELAY_US_MIN, >> REFCLK_STABILIZATION_DELAY_US_MAX); >> } while (retries < PHY_READY_TIMEOUT_COUNT); >> >> REFCLK_STABILIZATION_DELAY_US_MIN/MAX ==> 1000/1005 >> PHY_READY_TIMEOUT_COUNT ==> 10 >> ----- > > https://source.codeaurora.org/quic/la/kernel/msm-4.4/tree/drivers/pci/host/pci-msm.c?h=LE.UM.1.3.r3.25#n4624 > > https://source.codeaurora.org/quic/la/kernel/msm-4.4/tree/drivers/pci/host/pci-msm.c?h=LE.UM.1.3.r3.25#n1721 > > readl_relaxed(dev->phy + PCIE_N_PCS_STATUS(dev->rc_idx, dev->common_phy)) & BIT(6) > is equivalent to: > the check in qcom_qmp_phy_enable() > > readl_relaxed(dev->phy + PCIE_COM_PCS_READY_STATUS) & 0x1 > is equivalent to: > the check in qcom_qmp_phy_com_init() > > I'll take a closer look, using some printks, to narrow down the run-time > execution path. > >> phy_enable() from the usb phy driver from downstream. >> /* Wait for PHY initialization to be done */ >> do { >> if (readl_relaxed(phy->base + >> phy->phy_reg[USB3_PHY_PCS_STATUS]) & PHYSTATUS) >> usleep_range(1, 2); >> else >> break; >> } while (--init_timeout_usec); >> >> init_timeout_usec ==> 1000 >> ----- >> USB never had a COM_PHY status bit. >> >> So clearly the resolutions were different. >> >> Does this change solve an issue at hand? > > The issue is usleep_range() being misused ^_^ > > Although usleep_range() takes unsigned longs as parameters, it is > not appropriate over the entire 0-2^64 range. > > a) It should not be used with tiny values, because the cost of programming > the timer interrupt, and processing the resulting IRQ would dominate. > > b) It should not be used with large values (above 2000000/HZ) because > msleep() is more efficient, and is acceptable for these ranges. Documentation/timers/timers-howto.txt has all the information on the various kernel delay/sleep mechanisms. For < ~10us, it recommends to use udelay (readx_poll_timeout_atomic). Depending on the actual timeout to be used, the delay mechanism in timers-howto.txt should be used. Thanks Kishon
On 20/06/2019 08:25, Kishon Vijay Abraham I wrote: > On 14/06/19 6:08 PM, Marc Gonzalez wrote: > >> The issue is usleep_range() being misused ^_^ >> >> Although usleep_range() takes unsigned longs as parameters, it is >> not appropriate over the entire 0-2^64 range. >> >> a) It should not be used with tiny values, because the cost of programming >> the timer interrupt, and processing the resulting IRQ would dominate. >> >> b) It should not be used with large values (above 2000000/HZ) because >> msleep() is more efficient, and is acceptable for these ranges. > > Documentation/timers/timers-howto.txt has all the information on the various > kernel delay/sleep mechanisms. For < ~10us, it recommends to use udelay > (readx_poll_timeout_atomic). Depending on the actual timeout to be used, the > delay mechanism in timers-howto.txt should be used. Hello Kishon, I believe the proposed patch does the right thing: a) polling for the ready bit is not done in atomic context, therefore we don't need to busy-loop b) since we're ultimately calling usleep_range(), we should pass an appropriate parameter, such as max_us = 10 (instead of max_us = 1, which is outside usleep_range spec) Maybe it would help if someone reviewed this patch. Regards.
Hi, On Thu, Jun 13, 2019 at 8:28 AM Marc Gonzalez <marc.w.gonzalez@free.fr> wrote: > > readl_poll_timeout() calls usleep_range() to sleep between reads. > usleep_range() doesn't work efficiently for tiny values. > > Raise the polling delay in qcom_qmp_phy_enable() to bring it in line > with the delay in qcom_qmp_phy_com_init(). > > Signed-off-by: Marc Gonzalez <marc.w.gonzalez@free.fr> > --- > Vivek, do you remember why you didn't use the same delay value in > qcom_qmp_phy_enable) and qcom_qmp_phy_com_init() ? > --- > drivers/phy/qualcomm/phy-qcom-qmp.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.c b/drivers/phy/qualcomm/phy-qcom-qmp.c > index bb522b915fa9..34ff6434da8f 100644 > --- a/drivers/phy/qualcomm/phy-qcom-qmp.c > +++ b/drivers/phy/qualcomm/phy-qcom-qmp.c > @@ -1548,7 +1548,7 @@ static int qcom_qmp_phy_enable(struct phy *phy) > status = pcs + cfg->regs[QPHY_PCS_READY_STATUS]; > mask = cfg->mask_pcs_ready; > > - ret = readl_poll_timeout(status, val, val & mask, 1, > + ret = readl_poll_timeout(status, val, val & mask, 10, > PHY_INIT_COMPLETE_TIMEOUT); I would agree that the existing code is almost certainly wrong, since, as you said, trying to sleep for 1 us is likely pointless. I quickly coded up a test and ran it on sdm845-cheza. It looked like this: -- ktime_t a, b, c; a = ktime_get(); b = ktime_get(); usleep_range(1, 1); c = ktime_get(); pr_info("DOUG: %d ns, %d ns\n", (int)ktime_to_ns(ktime_sub(b, a)), (int)ktime_to_ns(ktime_sub(c, b))); -- At bootup I got: [ 4.121247] DOUG: 52 ns, 9479 ns [ 4.144990] DOUG: 52 ns, 9636 ns [ 4.328168] DOUG: 0 ns, 11667 ns [ 4.332659] DOUG: 52 ns, 7136 ns [ 4.358833] DOUG: 0 ns, 6666 ns [ 4.362095] DOUG: 52 ns, 8229 ns So basically the existing code is already waiting 5-10 us between polls but it's spending all of that time context switching. Changing the above to: usleep_range(5, 10); Give me instead: [ 4.120781] DOUG: 52 ns, 16927 ns [ 4.144626] DOUG: 53 ns, 17447 ns [ 4.327932] DOUG: 52 ns, 11302 ns [ 4.332501] DOUG: 0 ns, 7395 ns [ 4.357912] DOUG: 0 ns, 6823 ns [ 4.361175] DOUG: 52 ns, 9063 ns ...and that seems fine to me. -- Thus: Reviewed-by: Douglas Anderson <dianders@chromium.org>
diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.c b/drivers/phy/qualcomm/phy-qcom-qmp.c index bb522b915fa9..34ff6434da8f 100644 --- a/drivers/phy/qualcomm/phy-qcom-qmp.c +++ b/drivers/phy/qualcomm/phy-qcom-qmp.c @@ -1548,7 +1548,7 @@ static int qcom_qmp_phy_enable(struct phy *phy) status = pcs + cfg->regs[QPHY_PCS_READY_STATUS]; mask = cfg->mask_pcs_ready; - ret = readl_poll_timeout(status, val, val & mask, 1, + ret = readl_poll_timeout(status, val, val & mask, 10, PHY_INIT_COMPLETE_TIMEOUT); if (ret) { dev_err(qmp->dev, "phy initialization timed-out\n");
readl_poll_timeout() calls usleep_range() to sleep between reads. usleep_range() doesn't work efficiently for tiny values. Raise the polling delay in qcom_qmp_phy_enable() to bring it in line with the delay in qcom_qmp_phy_com_init(). Signed-off-by: Marc Gonzalez <marc.w.gonzalez@free.fr> --- Vivek, do you remember why you didn't use the same delay value in qcom_qmp_phy_enable) and qcom_qmp_phy_com_init() ? --- drivers/phy/qualcomm/phy-qcom-qmp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)