Message ID | 20240112093006.2832105-1-ajayagarwal@google.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v5] PCI: dwc: Wait for link up only if link is started | expand |
Hi Mani A friendly reminder for your review.
On Fri, Jan 12, 2024 at 03:00:06PM +0530, Ajay Agarwal wrote: > In dw_pcie_host_init() regardless of whether the link has been > started or not, the code waits for the link to come up. Even in > cases where start_link() is not defined the code ends up spinning > in a loop for 1 second. Since in some systems dw_pcie_host_init() > gets called during probe, this one second loop for each pcie > interface instance ends up extending the boot time. > Which platform you are working on? Is that upstreamed? You should mention the specific platform where you are observing the issue. Right now, intel-gw and designware-plat are the only drivers not defining that callback. First one definitely needs a fixup and I do not know how the latter works. - Mani > Wait for the link up in only if the start_link() is defined. > > Signed-off-by: Ajay Agarwal <ajayagarwal@google.com> > --- > v4 was applied, but then reverted. The reason being v4 added a > regression on some products which expect the link to not come up as a > part of the probe. Since v4 returned error from dw_pcie_wait_for_link > check, the probe function of these products started to fail. > > Changelog since v4: > - Do not return error from dw_pcie_wait_for_link check > > .../pci/controller/dwc/pcie-designware-host.c | 12 +++++++---- > drivers/pci/controller/dwc/pcie-designware.c | 20 ++++++++++++------- > drivers/pci/controller/dwc/pcie-designware.h | 1 + > 3 files changed, 22 insertions(+), 11 deletions(-) > > diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c > index 7991f0e179b2..e53132663d1d 100644 > --- a/drivers/pci/controller/dwc/pcie-designware-host.c > +++ b/drivers/pci/controller/dwc/pcie-designware-host.c > @@ -487,14 +487,18 @@ int dw_pcie_host_init(struct dw_pcie_rp *pp) > if (ret) > goto err_remove_edma; > > - if (!dw_pcie_link_up(pci)) { > + if (dw_pcie_link_up(pci)) { > + dw_pcie_print_link_status(pci); > + } else { > ret = dw_pcie_start_link(pci); > if (ret) > goto err_remove_edma; > - } > > - /* Ignore errors, the link may come up later */ > - dw_pcie_wait_for_link(pci); > + if (pci->ops && pci->ops->start_link) { > + /* Ignore errors, the link may come up later */ > + dw_pcie_wait_for_link(pci); > + } > + } > > bridge->sysdata = pp; > > diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c > index 250cf7f40b85..c067d2e960cf 100644 > --- a/drivers/pci/controller/dwc/pcie-designware.c > +++ b/drivers/pci/controller/dwc/pcie-designware.c > @@ -645,9 +645,20 @@ void dw_pcie_disable_atu(struct dw_pcie *pci, u32 dir, int index) > dw_pcie_writel_atu(pci, dir, index, PCIE_ATU_REGION_CTRL2, 0); > } > > -int dw_pcie_wait_for_link(struct dw_pcie *pci) > +void dw_pcie_print_link_status(struct dw_pcie *pci) > { > u32 offset, val; > + > + offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP); > + val = dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKSTA); > + > + dev_info(pci->dev, "PCIe Gen.%u x%u link up\n", > + FIELD_GET(PCI_EXP_LNKSTA_CLS, val), > + FIELD_GET(PCI_EXP_LNKSTA_NLW, val)); > +} > + > +int dw_pcie_wait_for_link(struct dw_pcie *pci) > +{ > int retries; > > /* Check if the link is up or not */ > @@ -663,12 +674,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci) > return -ETIMEDOUT; > } > > - offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP); > - val = dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKSTA); > - > - dev_info(pci->dev, "PCIe Gen.%u x%u link up\n", > - FIELD_GET(PCI_EXP_LNKSTA_CLS, val), > - FIELD_GET(PCI_EXP_LNKSTA_NLW, val)); > + dw_pcie_print_link_status(pci); > > return 0; > } > diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h > index 55ff76e3d384..164214a7219a 100644 > --- a/drivers/pci/controller/dwc/pcie-designware.h > +++ b/drivers/pci/controller/dwc/pcie-designware.h > @@ -447,6 +447,7 @@ void dw_pcie_setup(struct dw_pcie *pci); > void dw_pcie_iatu_detect(struct dw_pcie *pci); > int dw_pcie_edma_detect(struct dw_pcie *pci); > void dw_pcie_edma_remove(struct dw_pcie *pci); > +void dw_pcie_print_link_status(struct dw_pcie *pci); > > int dw_pcie_suspend_noirq(struct dw_pcie *pci); > int dw_pcie_resume_noirq(struct dw_pcie *pci); > -- > 2.43.0.381.gb435a96ce8-goog >
On Fri, Jan 19, 2024 at 01:22:19PM +0530, Manivannan Sadhasivam wrote: > On Fri, Jan 12, 2024 at 03:00:06PM +0530, Ajay Agarwal wrote: > > In dw_pcie_host_init() regardless of whether the link has been > > started or not, the code waits for the link to come up. Even in > > cases where start_link() is not defined the code ends up spinning > > in a loop for 1 second. Since in some systems dw_pcie_host_init() > > gets called during probe, this one second loop for each pcie > > interface instance ends up extending the boot time. > > > > Which platform you are working on? Is that upstreamed? You should mention the > specific platform where you are observing the issue. > This is for the Pixel phone platform. The platform driver for the same is not upstreamed yet. It is in the process. > Right now, intel-gw and designware-plat are the only drivers not defining that > callback. First one definitely needs a fixup and I do not know how the latter > works. > > - Mani > > > Wait for the link up in only if the start_link() is defined. > > > > Signed-off-by: Ajay Agarwal <ajayagarwal@google.com> > > --- > > v4 was applied, but then reverted. The reason being v4 added a > > regression on some products which expect the link to not come up as a > > part of the probe. Since v4 returned error from dw_pcie_wait_for_link > > check, the probe function of these products started to fail. > > > > Changelog since v4: > > - Do not return error from dw_pcie_wait_for_link check > > > > .../pci/controller/dwc/pcie-designware-host.c | 12 +++++++---- > > drivers/pci/controller/dwc/pcie-designware.c | 20 ++++++++++++------- > > drivers/pci/controller/dwc/pcie-designware.h | 1 + > > 3 files changed, 22 insertions(+), 11 deletions(-) > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c > > index 7991f0e179b2..e53132663d1d 100644 > > --- a/drivers/pci/controller/dwc/pcie-designware-host.c > > +++ b/drivers/pci/controller/dwc/pcie-designware-host.c > > @@ -487,14 +487,18 @@ int dw_pcie_host_init(struct dw_pcie_rp *pp) > > if (ret) > > goto err_remove_edma; > > > > - if (!dw_pcie_link_up(pci)) { > > + if (dw_pcie_link_up(pci)) { > > + dw_pcie_print_link_status(pci); > > + } else { > > ret = dw_pcie_start_link(pci); > > if (ret) > > goto err_remove_edma; > > - } > > > > - /* Ignore errors, the link may come up later */ > > - dw_pcie_wait_for_link(pci); > > + if (pci->ops && pci->ops->start_link) { > > + /* Ignore errors, the link may come up later */ > > + dw_pcie_wait_for_link(pci); > > + } > > + } > > > > bridge->sysdata = pp; > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c > > index 250cf7f40b85..c067d2e960cf 100644 > > --- a/drivers/pci/controller/dwc/pcie-designware.c > > +++ b/drivers/pci/controller/dwc/pcie-designware.c > > @@ -645,9 +645,20 @@ void dw_pcie_disable_atu(struct dw_pcie *pci, u32 dir, int index) > > dw_pcie_writel_atu(pci, dir, index, PCIE_ATU_REGION_CTRL2, 0); > > } > > > > -int dw_pcie_wait_for_link(struct dw_pcie *pci) > > +void dw_pcie_print_link_status(struct dw_pcie *pci) > > { > > u32 offset, val; > > + > > + offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP); > > + val = dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKSTA); > > + > > + dev_info(pci->dev, "PCIe Gen.%u x%u link up\n", > > + FIELD_GET(PCI_EXP_LNKSTA_CLS, val), > > + FIELD_GET(PCI_EXP_LNKSTA_NLW, val)); > > +} > > + > > +int dw_pcie_wait_for_link(struct dw_pcie *pci) > > +{ > > int retries; > > > > /* Check if the link is up or not */ > > @@ -663,12 +674,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci) > > return -ETIMEDOUT; > > } > > > > - offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP); > > - val = dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKSTA); > > - > > - dev_info(pci->dev, "PCIe Gen.%u x%u link up\n", > > - FIELD_GET(PCI_EXP_LNKSTA_CLS, val), > > - FIELD_GET(PCI_EXP_LNKSTA_NLW, val)); > > + dw_pcie_print_link_status(pci); > > > > return 0; > > } > > diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h > > index 55ff76e3d384..164214a7219a 100644 > > --- a/drivers/pci/controller/dwc/pcie-designware.h > > +++ b/drivers/pci/controller/dwc/pcie-designware.h > > @@ -447,6 +447,7 @@ void dw_pcie_setup(struct dw_pcie *pci); > > void dw_pcie_iatu_detect(struct dw_pcie *pci); > > int dw_pcie_edma_detect(struct dw_pcie *pci); > > void dw_pcie_edma_remove(struct dw_pcie *pci); > > +void dw_pcie_print_link_status(struct dw_pcie *pci); > > > > int dw_pcie_suspend_noirq(struct dw_pcie *pci); > > int dw_pcie_resume_noirq(struct dw_pcie *pci); > > -- > > 2.43.0.381.gb435a96ce8-goog > > > > -- > மணிவண்ணன் சதாசிவம்
[+cc Bjorn A, Fabio, Xiaolei, who reported the v4 issue] On Fri, Jan 12, 2024 at 03:00:06PM +0530, Ajay Agarwal wrote: > In dw_pcie_host_init() regardless of whether the link has been > started or not, the code waits for the link to come up. Even in > cases where start_link() is not defined the code ends up spinning > in a loop for 1 second. Since in some systems dw_pcie_host_init() > gets called during probe, this one second loop for each pcie > interface instance ends up extending the boot time. s/start_link()/.start_link()/ to hint that this is a function pointer, not a function name in its own right (also below). s/1/one/ to be consistent. s/pcie/PCIe/ to follow spec usage. > Wait for the link up in only if the start_link() is defined. > > Signed-off-by: Ajay Agarwal <ajayagarwal@google.com> > --- > v4 was applied, but then reverted. The reason being v4 added a > regression on some products which expect the link to not come up as a > part of the probe. Since v4 returned error from dw_pcie_wait_for_link > check, the probe function of these products started to fail. I know this part doesn't get included in the commit log, but I think it would be nice to include the relevant commits here: da56a1bfbab5 ("PCI: dwc: Wait for link up only if link is started") c5097b9869a1 ("Revert "PCI: dwc: Wait for link up only if link is started"") because the latter includes details about the actual failure, so we can review this with that platform in mind. > Changelog since v4: > - Do not return error from dw_pcie_wait_for_link check > > .../pci/controller/dwc/pcie-designware-host.c | 12 +++++++---- > drivers/pci/controller/dwc/pcie-designware.c | 20 ++++++++++++------- > drivers/pci/controller/dwc/pcie-designware.h | 1 + > 3 files changed, 22 insertions(+), 11 deletions(-) > > diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c > index 7991f0e179b2..e53132663d1d 100644 > --- a/drivers/pci/controller/dwc/pcie-designware-host.c > +++ b/drivers/pci/controller/dwc/pcie-designware-host.c > @@ -487,14 +487,18 @@ int dw_pcie_host_init(struct dw_pcie_rp *pp) > if (ret) > goto err_remove_edma; > > - if (!dw_pcie_link_up(pci)) { > + if (dw_pcie_link_up(pci)) { > + dw_pcie_print_link_status(pci); > + } else { > ret = dw_pcie_start_link(pci); > if (ret) > goto err_remove_edma; > - } > > - /* Ignore errors, the link may come up later */ > - dw_pcie_wait_for_link(pci); > + if (pci->ops && pci->ops->start_link) { > + /* Ignore errors, the link may come up later */ > + dw_pcie_wait_for_link(pci); > + } > + } > > bridge->sysdata = pp; > > diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c > index 250cf7f40b85..c067d2e960cf 100644 > --- a/drivers/pci/controller/dwc/pcie-designware.c > +++ b/drivers/pci/controller/dwc/pcie-designware.c > @@ -645,9 +645,20 @@ void dw_pcie_disable_atu(struct dw_pcie *pci, u32 dir, int index) > dw_pcie_writel_atu(pci, dir, index, PCIE_ATU_REGION_CTRL2, 0); > } > > -int dw_pcie_wait_for_link(struct dw_pcie *pci) > +void dw_pcie_print_link_status(struct dw_pcie *pci) > { > u32 offset, val; > + > + offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP); > + val = dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKSTA); > + > + dev_info(pci->dev, "PCIe Gen.%u x%u link up\n", > + FIELD_GET(PCI_EXP_LNKSTA_CLS, val), > + FIELD_GET(PCI_EXP_LNKSTA_NLW, val)); > +} > + > +int dw_pcie_wait_for_link(struct dw_pcie *pci) > +{ > int retries; > > /* Check if the link is up or not */ > @@ -663,12 +674,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci) > return -ETIMEDOUT; > } > > - offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP); > - val = dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKSTA); > - > - dev_info(pci->dev, "PCIe Gen.%u x%u link up\n", > - FIELD_GET(PCI_EXP_LNKSTA_CLS, val), > - FIELD_GET(PCI_EXP_LNKSTA_NLW, val)); > + dw_pcie_print_link_status(pci); > > return 0; > } This part (pcie-designware.c and pcie-designware.h changes) could arguably be in a separate patch so they're not a distraction from the important functional change. > diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h > index 55ff76e3d384..164214a7219a 100644 > --- a/drivers/pci/controller/dwc/pcie-designware.h > +++ b/drivers/pci/controller/dwc/pcie-designware.h > @@ -447,6 +447,7 @@ void dw_pcie_setup(struct dw_pcie *pci); > void dw_pcie_iatu_detect(struct dw_pcie *pci); > int dw_pcie_edma_detect(struct dw_pcie *pci); > void dw_pcie_edma_remove(struct dw_pcie *pci); > +void dw_pcie_print_link_status(struct dw_pcie *pci); > > int dw_pcie_suspend_noirq(struct dw_pcie *pci); > int dw_pcie_resume_noirq(struct dw_pcie *pci); > -- > 2.43.0.381.gb435a96ce8-goog >
On Fri, Jan 19, 2024 at 11:29:22PM +0530, Ajay Agarwal wrote: > On Fri, Jan 19, 2024 at 01:22:19PM +0530, Manivannan Sadhasivam wrote: > > On Fri, Jan 12, 2024 at 03:00:06PM +0530, Ajay Agarwal wrote: > > > In dw_pcie_host_init() regardless of whether the link has been > > > started or not, the code waits for the link to come up. Even in > > > cases where start_link() is not defined the code ends up spinning > > > in a loop for 1 second. Since in some systems dw_pcie_host_init() > > > gets called during probe, this one second loop for each pcie > > > interface instance ends up extending the boot time. > > > > > > > Which platform you are working on? Is that upstreamed? You should mention the > > specific platform where you are observing the issue. > > > This is for the Pixel phone platform. The platform driver for the same > is not upstreamed yet. It is in the process. > Then you should submit this patch at the time of the driver submission. Right now, you are trying to fix a problem which is not present in upstream. One can argue that it is a problem for designware-plat driver, but honestly I do not know how it works. - Mani > > Right now, intel-gw and designware-plat are the only drivers not defining that > > callback. First one definitely needs a fixup and I do not know how the latter > > works. > > > > - Mani > > > > > Wait for the link up in only if the start_link() is defined. > > > > > > Signed-off-by: Ajay Agarwal <ajayagarwal@google.com> > > > --- > > > v4 was applied, but then reverted. The reason being v4 added a > > > regression on some products which expect the link to not come up as a > > > part of the probe. Since v4 returned error from dw_pcie_wait_for_link > > > check, the probe function of these products started to fail. > > > > > > Changelog since v4: > > > - Do not return error from dw_pcie_wait_for_link check > > > > > > .../pci/controller/dwc/pcie-designware-host.c | 12 +++++++---- > > > drivers/pci/controller/dwc/pcie-designware.c | 20 ++++++++++++------- > > > drivers/pci/controller/dwc/pcie-designware.h | 1 + > > > 3 files changed, 22 insertions(+), 11 deletions(-) > > > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c > > > index 7991f0e179b2..e53132663d1d 100644 > > > --- a/drivers/pci/controller/dwc/pcie-designware-host.c > > > +++ b/drivers/pci/controller/dwc/pcie-designware-host.c > > > @@ -487,14 +487,18 @@ int dw_pcie_host_init(struct dw_pcie_rp *pp) > > > if (ret) > > > goto err_remove_edma; > > > > > > - if (!dw_pcie_link_up(pci)) { > > > + if (dw_pcie_link_up(pci)) { > > > + dw_pcie_print_link_status(pci); > > > + } else { > > > ret = dw_pcie_start_link(pci); > > > if (ret) > > > goto err_remove_edma; > > > - } > > > > > > - /* Ignore errors, the link may come up later */ > > > - dw_pcie_wait_for_link(pci); > > > + if (pci->ops && pci->ops->start_link) { > > > + /* Ignore errors, the link may come up later */ > > > + dw_pcie_wait_for_link(pci); > > > + } > > > + } > > > > > > bridge->sysdata = pp; > > > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c > > > index 250cf7f40b85..c067d2e960cf 100644 > > > --- a/drivers/pci/controller/dwc/pcie-designware.c > > > +++ b/drivers/pci/controller/dwc/pcie-designware.c > > > @@ -645,9 +645,20 @@ void dw_pcie_disable_atu(struct dw_pcie *pci, u32 dir, int index) > > > dw_pcie_writel_atu(pci, dir, index, PCIE_ATU_REGION_CTRL2, 0); > > > } > > > > > > -int dw_pcie_wait_for_link(struct dw_pcie *pci) > > > +void dw_pcie_print_link_status(struct dw_pcie *pci) > > > { > > > u32 offset, val; > > > + > > > + offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP); > > > + val = dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKSTA); > > > + > > > + dev_info(pci->dev, "PCIe Gen.%u x%u link up\n", > > > + FIELD_GET(PCI_EXP_LNKSTA_CLS, val), > > > + FIELD_GET(PCI_EXP_LNKSTA_NLW, val)); > > > +} > > > + > > > +int dw_pcie_wait_for_link(struct dw_pcie *pci) > > > +{ > > > int retries; > > > > > > /* Check if the link is up or not */ > > > @@ -663,12 +674,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci) > > > return -ETIMEDOUT; > > > } > > > > > > - offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP); > > > - val = dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKSTA); > > > - > > > - dev_info(pci->dev, "PCIe Gen.%u x%u link up\n", > > > - FIELD_GET(PCI_EXP_LNKSTA_CLS, val), > > > - FIELD_GET(PCI_EXP_LNKSTA_NLW, val)); > > > + dw_pcie_print_link_status(pci); > > > > > > return 0; > > > } > > > diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h > > > index 55ff76e3d384..164214a7219a 100644 > > > --- a/drivers/pci/controller/dwc/pcie-designware.h > > > +++ b/drivers/pci/controller/dwc/pcie-designware.h > > > @@ -447,6 +447,7 @@ void dw_pcie_setup(struct dw_pcie *pci); > > > void dw_pcie_iatu_detect(struct dw_pcie *pci); > > > int dw_pcie_edma_detect(struct dw_pcie *pci); > > > void dw_pcie_edma_remove(struct dw_pcie *pci); > > > +void dw_pcie_print_link_status(struct dw_pcie *pci); > > > > > > int dw_pcie_suspend_noirq(struct dw_pcie *pci); > > > int dw_pcie_resume_noirq(struct dw_pcie *pci); > > > -- > > > 2.43.0.381.gb435a96ce8-goog > > > > > > > -- > > மணிவண்ணன் சதாசிவம்
On Fri, Jan 19, 2024 at 02:42:11PM -0600, Bjorn Helgaas wrote: > [+cc Bjorn A, Fabio, Xiaolei, who reported the v4 issue] > > On Fri, Jan 12, 2024 at 03:00:06PM +0530, Ajay Agarwal wrote: > > In dw_pcie_host_init() regardless of whether the link has been > > started or not, the code waits for the link to come up. Even in > > cases where start_link() is not defined the code ends up spinning > > in a loop for 1 second. Since in some systems dw_pcie_host_init() > > gets called during probe, this one second loop for each pcie > > interface instance ends up extending the boot time. > > s/start_link()/.start_link()/ to hint that this is a function pointer, > not a function name in its own right (also below). > s/1/one/ to be consistent. > s/pcie/PCIe/ to follow spec usage. > Sure, will fix it in the patch series. > > Wait for the link up in only if the start_link() is defined. > > > > Signed-off-by: Ajay Agarwal <ajayagarwal@google.com> > > --- > > v4 was applied, but then reverted. The reason being v4 added a > > regression on some products which expect the link to not come up as a > > part of the probe. Since v4 returned error from dw_pcie_wait_for_link > > check, the probe function of these products started to fail. > > I know this part doesn't get included in the commit log, but I think > it would be nice to include the relevant commits here: > > da56a1bfbab5 ("PCI: dwc: Wait for link up only if link is started") > c5097b9869a1 ("Revert "PCI: dwc: Wait for link up only if link is started"") > > because the latter includes details about the actual failure, so we > can review this with that platform in mind. > Sure, will add the details in the patch series. > > Changelog since v4: > > - Do not return error from dw_pcie_wait_for_link check > > > > .../pci/controller/dwc/pcie-designware-host.c | 12 +++++++---- > > drivers/pci/controller/dwc/pcie-designware.c | 20 ++++++++++++------- > > drivers/pci/controller/dwc/pcie-designware.h | 1 + > > 3 files changed, 22 insertions(+), 11 deletions(-) > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c > > index 7991f0e179b2..e53132663d1d 100644 > > --- a/drivers/pci/controller/dwc/pcie-designware-host.c > > +++ b/drivers/pci/controller/dwc/pcie-designware-host.c > > @@ -487,14 +487,18 @@ int dw_pcie_host_init(struct dw_pcie_rp *pp) > > if (ret) > > goto err_remove_edma; > > > > - if (!dw_pcie_link_up(pci)) { > > + if (dw_pcie_link_up(pci)) { > > + dw_pcie_print_link_status(pci); > > + } else { > > ret = dw_pcie_start_link(pci); > > if (ret) > > goto err_remove_edma; > > - } > > > > - /* Ignore errors, the link may come up later */ > > - dw_pcie_wait_for_link(pci); > > + if (pci->ops && pci->ops->start_link) { > > + /* Ignore errors, the link may come up later */ > > + dw_pcie_wait_for_link(pci); > > + } > > + } > > > > bridge->sysdata = pp; > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c > > index 250cf7f40b85..c067d2e960cf 100644 > > --- a/drivers/pci/controller/dwc/pcie-designware.c > > +++ b/drivers/pci/controller/dwc/pcie-designware.c > > @@ -645,9 +645,20 @@ void dw_pcie_disable_atu(struct dw_pcie *pci, u32 dir, int index) > > dw_pcie_writel_atu(pci, dir, index, PCIE_ATU_REGION_CTRL2, 0); > > } > > > > -int dw_pcie_wait_for_link(struct dw_pcie *pci) > > +void dw_pcie_print_link_status(struct dw_pcie *pci) > > { > > u32 offset, val; > > + > > + offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP); > > + val = dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKSTA); > > + > > + dev_info(pci->dev, "PCIe Gen.%u x%u link up\n", > > + FIELD_GET(PCI_EXP_LNKSTA_CLS, val), > > + FIELD_GET(PCI_EXP_LNKSTA_NLW, val)); > > +} > > + > > +int dw_pcie_wait_for_link(struct dw_pcie *pci) > > +{ > > int retries; > > > > /* Check if the link is up or not */ > > @@ -663,12 +674,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci) > > return -ETIMEDOUT; > > } > > > > - offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP); > > - val = dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKSTA); > > - > > - dev_info(pci->dev, "PCIe Gen.%u x%u link up\n", > > - FIELD_GET(PCI_EXP_LNKSTA_CLS, val), > > - FIELD_GET(PCI_EXP_LNKSTA_NLW, val)); > > + dw_pcie_print_link_status(pci); > > > > return 0; > > } > > This part (pcie-designware.c and pcie-designware.h changes) could > arguably be in a separate patch so they're not a distraction from the > important functional change. > Sure, will separate it out in the patch series. > > diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h > > index 55ff76e3d384..164214a7219a 100644 > > --- a/drivers/pci/controller/dwc/pcie-designware.h > > +++ b/drivers/pci/controller/dwc/pcie-designware.h > > @@ -447,6 +447,7 @@ void dw_pcie_setup(struct dw_pcie *pci); > > void dw_pcie_iatu_detect(struct dw_pcie *pci); > > int dw_pcie_edma_detect(struct dw_pcie *pci); > > void dw_pcie_edma_remove(struct dw_pcie *pci); > > +void dw_pcie_print_link_status(struct dw_pcie *pci); > > > > int dw_pcie_suspend_noirq(struct dw_pcie *pci); > > int dw_pcie_resume_noirq(struct dw_pcie *pci); > > -- > > 2.43.0.381.gb435a96ce8-goog > >
On Sat, Jan 20, 2024 at 08:04:34PM +0530, Manivannan Sadhasivam wrote: > On Fri, Jan 19, 2024 at 11:29:22PM +0530, Ajay Agarwal wrote: > > On Fri, Jan 19, 2024 at 01:22:19PM +0530, Manivannan Sadhasivam wrote: > > > On Fri, Jan 12, 2024 at 03:00:06PM +0530, Ajay Agarwal wrote: > > > > In dw_pcie_host_init() regardless of whether the link has been > > > > started or not, the code waits for the link to come up. Even in > > > > cases where start_link() is not defined the code ends up spinning > > > > in a loop for 1 second. Since in some systems dw_pcie_host_init() > > > > gets called during probe, this one second loop for each pcie > > > > interface instance ends up extending the boot time. > > > > > > > > > > Which platform you are working on? Is that upstreamed? You should mention the > > > specific platform where you are observing the issue. > > > > > This is for the Pixel phone platform. The platform driver for the same > > is not upstreamed yet. It is in the process. > > > > Then you should submit this patch at the time of the driver submission. Right > now, you are trying to fix a problem which is not present in upstream. One can > argue that it is a problem for designware-plat driver, but honestly I do not > know how it works. > > - Mani > Yes Mani, this can be a problem for the designware-plat driver. To me, the problem of a second being wasted in the probe path seems pretty obvious. We will wait for the link to be up even though we are not starting the link training. Can this patch be accepted considering the problem in the dw-plat driver then? Additionally, I have made this patch a part of a series [1] to keep the functional and refactoring changes separate. Please let me know if you see a concern with that. [1] https://lore.kernel.org/linux-pci/20240124092533.1267836-1-ajayagarwal@google.com/ > > > Right now, intel-gw and designware-plat are the only drivers not defining that > > > callback. First one definitely needs a fixup and I do not know how the latter > > > works. > > > > > > - Mani > > > > > > > Wait for the link up in only if the start_link() is defined. > > > > > > > > Signed-off-by: Ajay Agarwal <ajayagarwal@google.com> > > > > --- > > > > v4 was applied, but then reverted. The reason being v4 added a > > > > regression on some products which expect the link to not come up as a > > > > part of the probe. Since v4 returned error from dw_pcie_wait_for_link > > > > check, the probe function of these products started to fail. > > > > > > > > Changelog since v4: > > > > - Do not return error from dw_pcie_wait_for_link check > > > > > > > > .../pci/controller/dwc/pcie-designware-host.c | 12 +++++++---- > > > > drivers/pci/controller/dwc/pcie-designware.c | 20 ++++++++++++------- > > > > drivers/pci/controller/dwc/pcie-designware.h | 1 + > > > > 3 files changed, 22 insertions(+), 11 deletions(-) > > > > > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c > > > > index 7991f0e179b2..e53132663d1d 100644 > > > > --- a/drivers/pci/controller/dwc/pcie-designware-host.c > > > > +++ b/drivers/pci/controller/dwc/pcie-designware-host.c > > > > @@ -487,14 +487,18 @@ int dw_pcie_host_init(struct dw_pcie_rp *pp) > > > > if (ret) > > > > goto err_remove_edma; > > > > > > > > - if (!dw_pcie_link_up(pci)) { > > > > + if (dw_pcie_link_up(pci)) { > > > > + dw_pcie_print_link_status(pci); > > > > + } else { > > > > ret = dw_pcie_start_link(pci); > > > > if (ret) > > > > goto err_remove_edma; > > > > - } > > > > > > > > - /* Ignore errors, the link may come up later */ > > > > - dw_pcie_wait_for_link(pci); > > > > + if (pci->ops && pci->ops->start_link) { > > > > + /* Ignore errors, the link may come up later */ > > > > + dw_pcie_wait_for_link(pci); > > > > + } > > > > + } > > > > > > > > bridge->sysdata = pp; > > > > > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c > > > > index 250cf7f40b85..c067d2e960cf 100644 > > > > --- a/drivers/pci/controller/dwc/pcie-designware.c > > > > +++ b/drivers/pci/controller/dwc/pcie-designware.c > > > > @@ -645,9 +645,20 @@ void dw_pcie_disable_atu(struct dw_pcie *pci, u32 dir, int index) > > > > dw_pcie_writel_atu(pci, dir, index, PCIE_ATU_REGION_CTRL2, 0); > > > > } > > > > > > > > -int dw_pcie_wait_for_link(struct dw_pcie *pci) > > > > +void dw_pcie_print_link_status(struct dw_pcie *pci) > > > > { > > > > u32 offset, val; > > > > + > > > > + offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP); > > > > + val = dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKSTA); > > > > + > > > > + dev_info(pci->dev, "PCIe Gen.%u x%u link up\n", > > > > + FIELD_GET(PCI_EXP_LNKSTA_CLS, val), > > > > + FIELD_GET(PCI_EXP_LNKSTA_NLW, val)); > > > > +} > > > > + > > > > +int dw_pcie_wait_for_link(struct dw_pcie *pci) > > > > +{ > > > > int retries; > > > > > > > > /* Check if the link is up or not */ > > > > @@ -663,12 +674,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci) > > > > return -ETIMEDOUT; > > > > } > > > > > > > > - offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP); > > > > - val = dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKSTA); > > > > - > > > > - dev_info(pci->dev, "PCIe Gen.%u x%u link up\n", > > > > - FIELD_GET(PCI_EXP_LNKSTA_CLS, val), > > > > - FIELD_GET(PCI_EXP_LNKSTA_NLW, val)); > > > > + dw_pcie_print_link_status(pci); > > > > > > > > return 0; > > > > } > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h > > > > index 55ff76e3d384..164214a7219a 100644 > > > > --- a/drivers/pci/controller/dwc/pcie-designware.h > > > > +++ b/drivers/pci/controller/dwc/pcie-designware.h > > > > @@ -447,6 +447,7 @@ void dw_pcie_setup(struct dw_pcie *pci); > > > > void dw_pcie_iatu_detect(struct dw_pcie *pci); > > > > int dw_pcie_edma_detect(struct dw_pcie *pci); > > > > void dw_pcie_edma_remove(struct dw_pcie *pci); > > > > +void dw_pcie_print_link_status(struct dw_pcie *pci); > > > > > > > > int dw_pcie_suspend_noirq(struct dw_pcie *pci); > > > > int dw_pcie_resume_noirq(struct dw_pcie *pci); > > > > -- > > > > 2.43.0.381.gb435a96ce8-goog > > > > > > > > > > -- > > > மணிவண்ணன் சதாசிவம் > > -- > மணிவண்ணன் சதாசிவம்
On Mon, Jan 29, 2024 at 12:21:51PM +0530, Ajay Agarwal wrote: > On Sat, Jan 20, 2024 at 08:04:34PM +0530, Manivannan Sadhasivam wrote: > > On Fri, Jan 19, 2024 at 11:29:22PM +0530, Ajay Agarwal wrote: > > > On Fri, Jan 19, 2024 at 01:22:19PM +0530, Manivannan Sadhasivam wrote: > > > > On Fri, Jan 12, 2024 at 03:00:06PM +0530, Ajay Agarwal wrote: > > > > > In dw_pcie_host_init() regardless of whether the link has been > > > > > started or not, the code waits for the link to come up. Even in > > > > > cases where start_link() is not defined the code ends up spinning > > > > > in a loop for 1 second. Since in some systems dw_pcie_host_init() > > > > > gets called during probe, this one second loop for each pcie > > > > > interface instance ends up extending the boot time. > > > > > > > > > > > > > Which platform you are working on? Is that upstreamed? You should mention the > > > > specific platform where you are observing the issue. > > > > > > > This is for the Pixel phone platform. The platform driver for the same > > > is not upstreamed yet. It is in the process. > > > > > > > Then you should submit this patch at the time of the driver submission. Right > > now, you are trying to fix a problem which is not present in upstream. One can > > argue that it is a problem for designware-plat driver, but honestly I do not > > know how it works. > > > > - Mani > > > Yes Mani, this can be a problem for the designware-plat driver. To me, > the problem of a second being wasted in the probe path seems pretty > obvious. We will wait for the link to be up even though we are not > starting the link training. Can this patch be accepted considering the > problem in the dw-plat driver then? > If that's the case with your driver, when are you starting the link training? > Additionally, I have made this patch a part of a series [1] to keep the > functional and refactoring changes separate. Please let me know if you > see a concern with that. > I have responded to this series with my concerns. - Mani
On Mon, Jan 29, 2024 at 12:40:25PM +0530, Manivannan Sadhasivam wrote: > On Mon, Jan 29, 2024 at 12:21:51PM +0530, Ajay Agarwal wrote: > > On Sat, Jan 20, 2024 at 08:04:34PM +0530, Manivannan Sadhasivam wrote: > > > On Fri, Jan 19, 2024 at 11:29:22PM +0530, Ajay Agarwal wrote: > > > > On Fri, Jan 19, 2024 at 01:22:19PM +0530, Manivannan Sadhasivam wrote: > > > > > On Fri, Jan 12, 2024 at 03:00:06PM +0530, Ajay Agarwal wrote: > > > > > > In dw_pcie_host_init() regardless of whether the link has been > > > > > > started or not, the code waits for the link to come up. Even in > > > > > > cases where start_link() is not defined the code ends up spinning > > > > > > in a loop for 1 second. Since in some systems dw_pcie_host_init() > > > > > > gets called during probe, this one second loop for each pcie > > > > > > interface instance ends up extending the boot time. > > > > > > > > > > > > > > > > Which platform you are working on? Is that upstreamed? You should mention the > > > > > specific platform where you are observing the issue. > > > > > > > > > This is for the Pixel phone platform. The platform driver for the same > > > > is not upstreamed yet. It is in the process. > > > > > > > > > > Then you should submit this patch at the time of the driver submission. Right > > > now, you are trying to fix a problem which is not present in upstream. One can > > > argue that it is a problem for designware-plat driver, but honestly I do not > > > know how it works. > > > > > > - Mani > > > > > Yes Mani, this can be a problem for the designware-plat driver. To me, > > the problem of a second being wasted in the probe path seems pretty > > obvious. We will wait for the link to be up even though we are not > > starting the link training. Can this patch be accepted considering the > > problem in the dw-plat driver then? > > > > If that's the case with your driver, when are you starting the link training? > The link training starts later based on a userspace/debugfs trigger. > > Additionally, I have made this patch a part of a series [1] to keep the > > functional and refactoring changes separate. Please let me know if you > > see a concern with that. > > > > I have responded to this series with my concerns. > > - Mani > Sure, I will check there and respond. Thanks. > -- > மணிவண்ணன் சதாசிவம்
On Mon, Jan 29, 2024 at 01:34:52PM +0530, Ajay Agarwal wrote: > On Mon, Jan 29, 2024 at 12:40:25PM +0530, Manivannan Sadhasivam wrote: > > On Mon, Jan 29, 2024 at 12:21:51PM +0530, Ajay Agarwal wrote: > > > On Sat, Jan 20, 2024 at 08:04:34PM +0530, Manivannan Sadhasivam wrote: > > > > On Fri, Jan 19, 2024 at 11:29:22PM +0530, Ajay Agarwal wrote: > > > > > On Fri, Jan 19, 2024 at 01:22:19PM +0530, Manivannan Sadhasivam wrote: > > > > > > On Fri, Jan 12, 2024 at 03:00:06PM +0530, Ajay Agarwal wrote: > > > > > > > In dw_pcie_host_init() regardless of whether the link has been > > > > > > > started or not, the code waits for the link to come up. Even in > > > > > > > cases where start_link() is not defined the code ends up spinning > > > > > > > in a loop for 1 second. Since in some systems dw_pcie_host_init() > > > > > > > gets called during probe, this one second loop for each pcie > > > > > > > interface instance ends up extending the boot time. > > > > > > > > > > > > > > > > > > > Which platform you are working on? Is that upstreamed? You should mention the > > > > > > specific platform where you are observing the issue. > > > > > > > > > > > This is for the Pixel phone platform. The platform driver for the same > > > > > is not upstreamed yet. It is in the process. > > > > > > > > > > > > > Then you should submit this patch at the time of the driver submission. Right > > > > now, you are trying to fix a problem which is not present in upstream. One can > > > > argue that it is a problem for designware-plat driver, but honestly I do not > > > > know how it works. > > > > > > > > - Mani > > > > > > > Yes Mani, this can be a problem for the designware-plat driver. To me, > > > the problem of a second being wasted in the probe path seems pretty > > > obvious. We will wait for the link to be up even though we are not > > > starting the link training. Can this patch be accepted considering the > > > problem in the dw-plat driver then? > > > > > > > If that's the case with your driver, when are you starting the link training? > > > The link training starts later based on a userspace/debugfs trigger. > Why does it happen as such? What's the problem in starting the link during probe? Keep it in mind that if you rely on the userspace for starting the link based on a platform (like Android), then if the same SoC or peripheral instance get reused in other platform (non-android), the it won't be a seamless user experience. If there are any other usecases, please state them. - Mani
On Mon, Jan 29, 2024 at 01:42:54PM +0530, Manivannan Sadhasivam wrote: > On Mon, Jan 29, 2024 at 01:34:52PM +0530, Ajay Agarwal wrote: > > On Mon, Jan 29, 2024 at 12:40:25PM +0530, Manivannan Sadhasivam wrote: > > > On Mon, Jan 29, 2024 at 12:21:51PM +0530, Ajay Agarwal wrote: > > > > On Sat, Jan 20, 2024 at 08:04:34PM +0530, Manivannan Sadhasivam wrote: > > > > > On Fri, Jan 19, 2024 at 11:29:22PM +0530, Ajay Agarwal wrote: > > > > > > On Fri, Jan 19, 2024 at 01:22:19PM +0530, Manivannan Sadhasivam wrote: > > > > > > > On Fri, Jan 12, 2024 at 03:00:06PM +0530, Ajay Agarwal wrote: > > > > > > > > In dw_pcie_host_init() regardless of whether the link has been > > > > > > > > started or not, the code waits for the link to come up. Even in > > > > > > > > cases where start_link() is not defined the code ends up spinning > > > > > > > > in a loop for 1 second. Since in some systems dw_pcie_host_init() > > > > > > > > gets called during probe, this one second loop for each pcie > > > > > > > > interface instance ends up extending the boot time. > > > > > > > > > > > > > > > > > > > > > > Which platform you are working on? Is that upstreamed? You should mention the > > > > > > > specific platform where you are observing the issue. > > > > > > > > > > > > > This is for the Pixel phone platform. The platform driver for the same > > > > > > is not upstreamed yet. It is in the process. > > > > > > > > > > > > > > > > Then you should submit this patch at the time of the driver submission. Right > > > > > now, you are trying to fix a problem which is not present in upstream. One can > > > > > argue that it is a problem for designware-plat driver, but honestly I do not > > > > > know how it works. > > > > > > > > > > - Mani > > > > > > > > > Yes Mani, this can be a problem for the designware-plat driver. To me, > > > > the problem of a second being wasted in the probe path seems pretty > > > > obvious. We will wait for the link to be up even though we are not > > > > starting the link training. Can this patch be accepted considering the > > > > problem in the dw-plat driver then? > > > > > > > > > > If that's the case with your driver, when are you starting the link training? > > > > > The link training starts later based on a userspace/debugfs trigger. > > > > Why does it happen as such? What's the problem in starting the link during > probe? Keep it in mind that if you rely on the userspace for starting the link > based on a platform (like Android), then if the same SoC or peripheral instance > get reused in other platform (non-android), the it won't be a seamless user > experience. > > If there are any other usecases, please state them. > > - Mani > This SoC is targeted for an android phone usecase and the endpoints being enumerated need to go through an appropriate and device specific power sequence which gets triggered only when the userspace is up. The PCIe probe cannot assume that the EPs have been powered up already and hence the link-up is not attempted. > -- > மணிவண்ணன் சதாசிவம்
On Mon, Jan 29, 2024 at 06:56:24PM +0530, Ajay Agarwal wrote: > On Mon, Jan 29, 2024 at 01:42:54PM +0530, Manivannan Sadhasivam wrote: > > On Mon, Jan 29, 2024 at 01:34:52PM +0530, Ajay Agarwal wrote: > > > On Mon, Jan 29, 2024 at 12:40:25PM +0530, Manivannan Sadhasivam wrote: > > > > On Mon, Jan 29, 2024 at 12:21:51PM +0530, Ajay Agarwal wrote: > > > > > On Sat, Jan 20, 2024 at 08:04:34PM +0530, Manivannan Sadhasivam wrote: > > > > > > On Fri, Jan 19, 2024 at 11:29:22PM +0530, Ajay Agarwal wrote: > > > > > > > On Fri, Jan 19, 2024 at 01:22:19PM +0530, Manivannan Sadhasivam wrote: > > > > > > > > On Fri, Jan 12, 2024 at 03:00:06PM +0530, Ajay Agarwal wrote: > > > > > > > > > In dw_pcie_host_init() regardless of whether the link has been > > > > > > > > > started or not, the code waits for the link to come up. Even in > > > > > > > > > cases where start_link() is not defined the code ends up spinning > > > > > > > > > in a loop for 1 second. Since in some systems dw_pcie_host_init() > > > > > > > > > gets called during probe, this one second loop for each pcie > > > > > > > > > interface instance ends up extending the boot time. > > > > > > > > > > > > > > > > > > > > > > > > > Which platform you are working on? Is that upstreamed? You should mention the > > > > > > > > specific platform where you are observing the issue. > > > > > > > > > > > > > > > This is for the Pixel phone platform. The platform driver for the same > > > > > > > is not upstreamed yet. It is in the process. > > > > > > > > > > > > > > > > > > > Then you should submit this patch at the time of the driver submission. Right > > > > > > now, you are trying to fix a problem which is not present in upstream. One can > > > > > > argue that it is a problem for designware-plat driver, but honestly I do not > > > > > > know how it works. > > > > > > > > > > > > - Mani > > > > > > > > > > > Yes Mani, this can be a problem for the designware-plat driver. To me, > > > > > the problem of a second being wasted in the probe path seems pretty > > > > > obvious. We will wait for the link to be up even though we are not > > > > > starting the link training. Can this patch be accepted considering the > > > > > problem in the dw-plat driver then? > > > > > > > > > > > > > If that's the case with your driver, when are you starting the link training? > > > > > > > The link training starts later based on a userspace/debugfs trigger. > > > > > > > Why does it happen as such? What's the problem in starting the link during > > probe? Keep it in mind that if you rely on the userspace for starting the link > > based on a platform (like Android), then if the same SoC or peripheral instance > > get reused in other platform (non-android), the it won't be a seamless user > > experience. > > > > If there are any other usecases, please state them. > > > > - Mani > > > This SoC is targeted for an android phone usecase and the endpoints > being enumerated need to go through an appropriate and device specific > power sequence which gets triggered only when the userspace is up. The > PCIe probe cannot assume that the EPs have been powered up already and > hence the link-up is not attempted. Still, I do not see the necessity to not call start_link() during probe. If you add PROBE_PREFER_ASYNCHRONOUS to your controller driver, this delay would become negligible. The reason why I'm against not calling start_link() is due to below reasons: 1. If the same SoC gets reused for other platforms, even other android phones that powers up the endpoints during boot, then it creates a dependency with userspace to always start the link even though the devices were available. That's why we should never fix the behavior of the controller drivers based on a single platform. 2. This will create fragmentation among the DWC glue drivers w.r.t the behavior and will become a maintenance nightmare (there are enough already). So, I'd suggest you to use the asynchronous probe mentioned above so that other drivers may probe in parallel thus avoiding the delay that you are worried about. - Mani
On Tue, Jan 30, 2024 at 12:15:55PM +0530, Manivannan Sadhasivam wrote: > On Mon, Jan 29, 2024 at 06:56:24PM +0530, Ajay Agarwal wrote: > > On Mon, Jan 29, 2024 at 01:42:54PM +0530, Manivannan Sadhasivam wrote: > > > On Mon, Jan 29, 2024 at 01:34:52PM +0530, Ajay Agarwal wrote: > > > > On Mon, Jan 29, 2024 at 12:40:25PM +0530, Manivannan Sadhasivam wrote: > > > > > On Mon, Jan 29, 2024 at 12:21:51PM +0530, Ajay Agarwal wrote: > > > > > > On Sat, Jan 20, 2024 at 08:04:34PM +0530, Manivannan Sadhasivam wrote: > > > > > > > On Fri, Jan 19, 2024 at 11:29:22PM +0530, Ajay Agarwal wrote: > > > > > > > > On Fri, Jan 19, 2024 at 01:22:19PM +0530, Manivannan Sadhasivam wrote: > > > > > > > > > On Fri, Jan 12, 2024 at 03:00:06PM +0530, Ajay Agarwal wrote: > > > > > > > > > > In dw_pcie_host_init() regardless of whether the link has been > > > > > > > > > > started or not, the code waits for the link to come up. Even in > > > > > > > > > > cases where start_link() is not defined the code ends up spinning > > > > > > > > > > in a loop for 1 second. Since in some systems dw_pcie_host_init() > > > > > > > > > > gets called during probe, this one second loop for each pcie > > > > > > > > > > interface instance ends up extending the boot time. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Which platform you are working on? Is that upstreamed? You should mention the > > > > > > > > > specific platform where you are observing the issue. > > > > > > > > > > > > > > > > > This is for the Pixel phone platform. The platform driver for the same > > > > > > > > is not upstreamed yet. It is in the process. > > > > > > > > > > > > > > > > > > > > > > Then you should submit this patch at the time of the driver submission. Right > > > > > > > now, you are trying to fix a problem which is not present in upstream. One can > > > > > > > argue that it is a problem for designware-plat driver, but honestly I do not > > > > > > > know how it works. > > > > > > > > > > > > > > - Mani > > > > > > > > > > > > > Yes Mani, this can be a problem for the designware-plat driver. To me, > > > > > > the problem of a second being wasted in the probe path seems pretty > > > > > > obvious. We will wait for the link to be up even though we are not > > > > > > starting the link training. Can this patch be accepted considering the > > > > > > problem in the dw-plat driver then? > > > > > > > > > > > > > > > > If that's the case with your driver, when are you starting the link training? > > > > > > > > > The link training starts later based on a userspace/debugfs trigger. > > > > > > > > > > Why does it happen as such? What's the problem in starting the link during > > > probe? Keep it in mind that if you rely on the userspace for starting the link > > > based on a platform (like Android), then if the same SoC or peripheral instance > > > get reused in other platform (non-android), the it won't be a seamless user > > > experience. > > > > > > If there are any other usecases, please state them. > > > > > > - Mani > > > > > This SoC is targeted for an android phone usecase and the endpoints > > being enumerated need to go through an appropriate and device specific > > power sequence which gets triggered only when the userspace is up. The > > PCIe probe cannot assume that the EPs have been powered up already and > > hence the link-up is not attempted. > > Still, I do not see the necessity to not call start_link() during probe. If you I am not adding any logic/condition around calling the start_link() itself. I am only avoiding the wait for the link to be up if the controller driver has not defined start_link(). > add PROBE_PREFER_ASYNCHRONOUS to your controller driver, this delay would become > negligible. The reason why I'm against not calling start_link() is due to below > reasons: > > 1. If the same SoC gets reused for other platforms, even other android phones > that powers up the endpoints during boot, then it creates a dependency with > userspace to always start the link even though the devices were available. > That's why we should never fix the behavior of the controller drivers based on a > single platform. I wonder how the behavior is changing with this patch. Do you have an example of a platform which does not have start_link() defined but would like to still wait for a second for the link to come up? For example, consider the intel-gw driver. The 1 sec wait time in its probe path is also a waste because it explicitly starts link training later in time. > > 2. This will create fragmentation among the DWC glue drivers w.r.t the behavior > and will become a maintenance nightmare (there are enough already). > > So, I'd suggest you to use the asynchronous probe mentioned above so that other > drivers may probe in parallel thus avoiding the delay that you are worried about. > > - Mani > > -- > மணிவண்ணன் சதாசிவம்
On Tue, Jan 30, 2024 at 02:30:27PM +0530, Ajay Agarwal wrote: [...] > > > > > > If that's the case with your driver, when are you starting the link training? > > > > > > > > > > > The link training starts later based on a userspace/debugfs trigger. > > > > > > > > > > > > > Why does it happen as such? What's the problem in starting the link during > > > > probe? Keep it in mind that if you rely on the userspace for starting the link > > > > based on a platform (like Android), then if the same SoC or peripheral instance > > > > get reused in other platform (non-android), the it won't be a seamless user > > > > experience. > > > > > > > > If there are any other usecases, please state them. > > > > > > > > - Mani > > > > > > > This SoC is targeted for an android phone usecase and the endpoints > > > being enumerated need to go through an appropriate and device specific > > > power sequence which gets triggered only when the userspace is up. The > > > PCIe probe cannot assume that the EPs have been powered up already and > > > hence the link-up is not attempted. > > > > Still, I do not see the necessity to not call start_link() during probe. If you > I am not adding any logic/condition around calling the start_link() > itself. I am only avoiding the wait for the link to be up if the > controller driver has not defined start_link(). > I'm saying that not defining the start_link() callback itself is wrong. > > add PROBE_PREFER_ASYNCHRONOUS to your controller driver, this delay would become > > negligible. The reason why I'm against not calling start_link() is due to below > > reasons: > > > > 1. If the same SoC gets reused for other platforms, even other android phones > > that powers up the endpoints during boot, then it creates a dependency with > > userspace to always start the link even though the devices were available. > > That's why we should never fix the behavior of the controller drivers based on a > > single platform. > I wonder how the behavior is changing with this patch. Do you have an > example of a platform which does not have start_link() defined but would > like to still wait for a second for the link to come up? > Did you went through my reply completely? I mentioned that the 1s delay would be gone if you add the async flag to your driver and you are ignoring that. But again, I'm saying that not defining start_link() itself is wrong and I've already mentioned the reasons. > For example, consider the intel-gw driver. The 1 sec wait time in its > probe path is also a waste because it explicitly starts link training > later in time. > I previously mentioned that the intel-gw needs fixing since there is no point in starting the link and waiting for it to come up in its probe() if the DWC core is already doing that. - Mani
On Tue, Jan 30, 2024 at 05:59:06PM +0530, Manivannan Sadhasivam wrote: > On Tue, Jan 30, 2024 at 02:30:27PM +0530, Ajay Agarwal wrote: > > [...] > > > > > > > > If that's the case with your driver, when are you starting the link training? > > > > > > > > > > > > > The link training starts later based on a userspace/debugfs trigger. > > > > > > > > > > > > > > > > Why does it happen as such? What's the problem in starting the link during > > > > > probe? Keep it in mind that if you rely on the userspace for starting the link > > > > > based on a platform (like Android), then if the same SoC or peripheral instance > > > > > get reused in other platform (non-android), the it won't be a seamless user > > > > > experience. > > > > > > > > > > If there are any other usecases, please state them. > > > > > > > > > > - Mani > > > > > > > > > This SoC is targeted for an android phone usecase and the endpoints > > > > being enumerated need to go through an appropriate and device specific > > > > power sequence which gets triggered only when the userspace is up. The > > > > PCIe probe cannot assume that the EPs have been powered up already and > > > > hence the link-up is not attempted. > > > > > > Still, I do not see the necessity to not call start_link() during probe. If you > > I am not adding any logic/condition around calling the start_link() > > itself. I am only avoiding the wait for the link to be up if the > > controller driver has not defined start_link(). > > > > I'm saying that not defining the start_link() callback itself is wrong. > Whether the start_link() should be defined or not, is a different design discussion. We currently have 2 drivers in upstream (intel-gw and dw-plat) which do not have start_link() defined. Waiting for the link to come up for the platforms using those drivers is not a good idea. And that is what we are trying to avoid. > > > add PROBE_PREFER_ASYNCHRONOUS to your controller driver, this delay would become > > > negligible. The reason why I'm against not calling start_link() is due to below > > > reasons: > > > > > > 1. If the same SoC gets reused for other platforms, even other android phones > > > that powers up the endpoints during boot, then it creates a dependency with > > > userspace to always start the link even though the devices were available. > > > That's why we should never fix the behavior of the controller drivers based on a > > > single platform. > > I wonder how the behavior is changing with this patch. Do you have an > > example of a platform which does not have start_link() defined but would > > like to still wait for a second for the link to come up? > > > > Did you went through my reply completely? I mentioned that the 1s delay would be > gone if you add the async flag to your driver and you are ignoring that. > Yes, I did go through your suggestion of async probe and that might solve my problem of the 1 sec delay. But I would like to fix the problem at the core. > But again, I'm saying that not defining start_link() itself is wrong and I've > already mentioned the reasons. > > > For example, consider the intel-gw driver. The 1 sec wait time in its > > probe path is also a waste because it explicitly starts link training > > later in time. > > > > I previously mentioned that the intel-gw needs fixing since there is no point in > starting the link and waiting for it to come up in its probe() if the DWC core > is already doing that. > > - Mani > > -- > மணிவண்ணன் சதாசிவம் I think we are at a dead-end in terms of agreeing to a policy. I would like the maintainers to pitch in here with their views.
On Tue, Jan 30, 2024 at 10:48:59PM +0530, Ajay Agarwal wrote: > On Tue, Jan 30, 2024 at 05:59:06PM +0530, Manivannan Sadhasivam wrote: > > On Tue, Jan 30, 2024 at 02:30:27PM +0530, Ajay Agarwal wrote: > > > > [...] > > > > > > > > > > If that's the case with your driver, when are you starting the link training? > > > > > > > > > > > > > > > The link training starts later based on a userspace/debugfs trigger. > > > > > > > > > > > > > > > > > > > Why does it happen as such? What's the problem in starting the link during > > > > > > probe? Keep it in mind that if you rely on the userspace for starting the link > > > > > > based on a platform (like Android), then if the same SoC or peripheral instance > > > > > > get reused in other platform (non-android), the it won't be a seamless user > > > > > > experience. > > > > > > > > > > > > If there are any other usecases, please state them. > > > > > > > > > > > > - Mani > > > > > > > > > > > This SoC is targeted for an android phone usecase and the endpoints > > > > > being enumerated need to go through an appropriate and device specific > > > > > power sequence which gets triggered only when the userspace is up. The > > > > > PCIe probe cannot assume that the EPs have been powered up already and > > > > > hence the link-up is not attempted. > > > > > > > > Still, I do not see the necessity to not call start_link() during probe. If you > > > I am not adding any logic/condition around calling the start_link() > > > itself. I am only avoiding the wait for the link to be up if the > > > controller driver has not defined start_link(). > > > > > > > I'm saying that not defining the start_link() callback itself is wrong. > > > Whether the start_link() should be defined or not, is a different > design discussion. We currently have 2 drivers in upstream (intel-gw and > dw-plat) which do not have start_link() defined. Waiting for the link to > come up for the platforms using those drivers is not a good idea. And > that is what we are trying to avoid. > NO. The sole intention of this patch is to fix the delay observed with _your_ out-of-tree controller driver as you explicitly said before. Impact for the existing 2 drivers are just a side effect. > > > > add PROBE_PREFER_ASYNCHRONOUS to your controller driver, this delay would become > > > > negligible. The reason why I'm against not calling start_link() is due to below > > > > reasons: > > > > > > > > 1. If the same SoC gets reused for other platforms, even other android phones > > > > that powers up the endpoints during boot, then it creates a dependency with > > > > userspace to always start the link even though the devices were available. > > > > That's why we should never fix the behavior of the controller drivers based on a > > > > single platform. > > > I wonder how the behavior is changing with this patch. Do you have an > > > example of a platform which does not have start_link() defined but would > > > like to still wait for a second for the link to come up? > > > > > > > Did you went through my reply completely? I mentioned that the 1s delay would be > > gone if you add the async flag to your driver and you are ignoring that. > > > Yes, I did go through your suggestion of async probe and that might > solve my problem of the 1 sec delay. But I would like to fix the problem > at the core. > There is no problem at the core. The problem is with some controller drivers. Please do not try to fix a problem which is not there. There are no _special_ reasons for those 2 drivers to not define start_link() callback. I'm trying to point you in the right path, but you are always chosing the other one. > > But again, I'm saying that not defining start_link() itself is wrong and I've > > already mentioned the reasons. > > > > > For example, consider the intel-gw driver. The 1 sec wait time in its > > > probe path is also a waste because it explicitly starts link training > > > later in time. > > > > > > > I previously mentioned that the intel-gw needs fixing since there is no point in > > starting the link and waiting for it to come up in its probe() if the DWC core > > is already doing that. > > > > - Mani > > > > -- > > மணிவண்ணன் சதாசிவம் > I think we are at a dead-end in terms of agreeing to a policy. I would > like the maintainers to pitch in here with their views. I'm the maintainer of the DWC drivers that you are proposing the patch for. If you happen to spin future revision of this series, please carry: Nacked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> - Mani
On Fri, Jan 19, 2024 at 01:22:19PM +0530, Manivannan Sadhasivam wrote: > On Fri, Jan 12, 2024 at 03:00:06PM +0530, Ajay Agarwal wrote: > > In dw_pcie_host_init() regardless of whether the link has been > > started or not, the code waits for the link to come up. Even in > > cases where start_link() is not defined the code ends up spinning > > in a loop for 1 second. Since in some systems dw_pcie_host_init() > > gets called during probe, this one second loop for each pcie > > interface instance ends up extending the boot time. > > Which platform you are working on? Is that upstreamed? You should mention the > specific platform where you are observing the issue. > > Right now, intel-gw and designware-plat are the only drivers not > defining that callback. First one definitely needs a fixup and I do > not know how the latter works. What fixup do you have in mind for intel-gw? It looks a little strange to me because it duplicates dw_pcie_setup_rc() and dw_pcie_wait_for_link(): dw_pcie_host_init() calls them first via pp->ops->init(), and then calls them a second time directly: struct dw_pcie_host_ops intel_pcie_dw_ops = { .init = intel_pcie_rc_init } intel_pcie_probe pp->ops = &intel_pcie_dw_ops dw_pcie_host_init(pp) if (pp->ops->init) pp->ops->init intel_pcie_rc_init intel_pcie_host_setup dw_pcie_setup_rc # <-- dw_pcie_wait_for_link # <-- dw_pcie_setup_rc # <-- dw_pcie_wait_for_link # <-- Is that what you're thinking? Bjorn
[+cc Chuanhua Lei, intel-gw maintainer, sorry I forgot this!] On Wed, Jan 31, 2024 at 05:48:17PM -0600, Bjorn Helgaas wrote: > On Fri, Jan 19, 2024 at 01:22:19PM +0530, Manivannan Sadhasivam wrote: > > On Fri, Jan 12, 2024 at 03:00:06PM +0530, Ajay Agarwal wrote: > > > In dw_pcie_host_init() regardless of whether the link has been > > > started or not, the code waits for the link to come up. Even in > > > cases where start_link() is not defined the code ends up spinning > > > in a loop for 1 second. Since in some systems dw_pcie_host_init() > > > gets called during probe, this one second loop for each pcie > > > interface instance ends up extending the boot time. > > > > Which platform you are working on? Is that upstreamed? You should mention the > > specific platform where you are observing the issue. > > > > Right now, intel-gw and designware-plat are the only drivers not > > defining that callback. First one definitely needs a fixup and I do > > not know how the latter works. > > What fixup do you have in mind for intel-gw? > > It looks a little strange to me because it duplicates > dw_pcie_setup_rc() and dw_pcie_wait_for_link(): dw_pcie_host_init() > calls them first via pp->ops->init(), and then calls them a second > time directly: > > struct dw_pcie_host_ops intel_pcie_dw_ops = { > .init = intel_pcie_rc_init > } > > intel_pcie_probe > pp->ops = &intel_pcie_dw_ops > dw_pcie_host_init(pp) > if (pp->ops->init) > pp->ops->init > intel_pcie_rc_init > intel_pcie_host_setup > dw_pcie_setup_rc # <-- > dw_pcie_wait_for_link # <-- > dw_pcie_setup_rc # <-- > dw_pcie_wait_for_link # <-- > > Is that what you're thinking? > > Bjorn
On Wed, Jan 31, 2024 at 09:14:13PM -0600, Bjorn Helgaas wrote: > [+cc Chuanhua Lei, intel-gw maintainer, sorry I forgot this!] > > On Wed, Jan 31, 2024 at 05:48:17PM -0600, Bjorn Helgaas wrote: > > On Fri, Jan 19, 2024 at 01:22:19PM +0530, Manivannan Sadhasivam wrote: > > > On Fri, Jan 12, 2024 at 03:00:06PM +0530, Ajay Agarwal wrote: > > > > In dw_pcie_host_init() regardless of whether the link has been > > > > started or not, the code waits for the link to come up. Even in > > > > cases where start_link() is not defined the code ends up spinning > > > > in a loop for 1 second. Since in some systems dw_pcie_host_init() > > > > gets called during probe, this one second loop for each pcie > > > > interface instance ends up extending the boot time. > > > > > > Which platform you are working on? Is that upstreamed? You should mention the > > > specific platform where you are observing the issue. > > > > > > Right now, intel-gw and designware-plat are the only drivers not > > > defining that callback. First one definitely needs a fixup and I do > > > not know how the latter works. > > > > What fixup do you have in mind for intel-gw? > > > > It looks a little strange to me because it duplicates > > dw_pcie_setup_rc() and dw_pcie_wait_for_link(): dw_pcie_host_init() > > calls them first via pp->ops->init(), and then calls them a second > > time directly: > > > > struct dw_pcie_host_ops intel_pcie_dw_ops = { > > .init = intel_pcie_rc_init > > } > > > > intel_pcie_probe > > pp->ops = &intel_pcie_dw_ops > > dw_pcie_host_init(pp) > > if (pp->ops->init) > > pp->ops->init > > intel_pcie_rc_init > > intel_pcie_host_setup > > dw_pcie_setup_rc # <-- > > dw_pcie_wait_for_link # <-- > > dw_pcie_setup_rc # <-- > > dw_pcie_wait_for_link # <-- > > > > Is that what you're thinking? > > Right. There is no need of this driver duplicating dw_pcie_setup_rc() and dw_pcie_wait_for_link(). Perhaps those functions were added to dw_pcie_host_init() after this driver got upstreamed and the author failed to take this driver into account. But my point was, the new drivers _should_not_ take inspiration from this driver to not define start_link() callback at the first place (unless there is a real requirement). - Mani
We will work on the updated version to remove the duplication. Chuanhua From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Sent: Thursday, February 1, 2024 3:32 PM To: Bjorn Helgaas <helgaas@kernel.org> Cc: Ajay Agarwal <ajayagarwal@google.com>; Jingoo Han <jingoohan1@gmail.com>; Johan Hovold <johan+linaro@kernel.org>; Jon Hunter <jonathanh@nvidia.com>; Lorenzo Pieralisi <lpieralisi@kernel.org>; Krzysztof Wilczyński <kw@linux.com>; Rob Herring <robh@kernel.org>; Bjorn Helgaas <bhelgaas@google.com>; Manu Gautam <manugautam@google.com>; Doug Zobel <zobel@google.com>; William McVicker <willmcvicker@google.com>; Serge Semin <fancer.lancer@gmail.com>; Robin Murphy <robin.murphy@arm.com>; linux-pci@vger.kernel.org <linux-pci@vger.kernel.org>; Lei Chuan Hua <lchuanhua@maxlinear.com> Subject: Re: [PATCH v5] PCI: dwc: Wait for link up only if link is started This email was sent from outside of MaxLinear. On Wed, Jan 31, 2024 at 09:14:13PM -0600, Bjorn Helgaas wrote: > [+cc Chuanhua Lei, intel-gw maintainer, sorry I forgot this!] > > On Wed, Jan 31, 2024 at 05:48:17PM -0600, Bjorn Helgaas wrote: > > On Fri, Jan 19, 2024 at 01:22:19PM +0530, Manivannan Sadhasivam wrote: > > > On Fri, Jan 12, 2024 at 03:00:06PM +0530, Ajay Agarwal wrote: > > > > In dw_pcie_host_init() regardless of whether the link has been > > > > started or not, the code waits for the link to come up. Even in > > > > cases where start_link() is not defined the code ends up spinning > > > > in a loop for 1 second. Since in some systems dw_pcie_host_init() > > > > gets called during probe, this one second loop for each pcie > > > > interface instance ends up extending the boot time. > > > > > > Which platform you are working on? Is that upstreamed? You should mention the > > > specific platform where you are observing the issue. > > > > > > Right now, intel-gw and designware-plat are the only drivers not > > > defining that callback. First one definitely needs a fixup and I do > > > not know how the latter works. > > > > What fixup do you have in mind for intel-gw? > > > > It looks a little strange to me because it duplicates > > dw_pcie_setup_rc() and dw_pcie_wait_for_link(): dw_pcie_host_init() > > calls them first via pp->ops->init(), and then calls them a second > > time directly: > > > > struct dw_pcie_host_ops intel_pcie_dw_ops = { > > .init = intel_pcie_rc_init > > } > > > > intel_pcie_probe > > pp->ops = &intel_pcie_dw_ops > > dw_pcie_host_init(pp) > > if (pp->ops->init) > > pp->ops->init > > intel_pcie_rc_init > > intel_pcie_host_setup > > dw_pcie_setup_rc # <-- > > dw_pcie_wait_for_link # <-- > > dw_pcie_setup_rc # <-- > > dw_pcie_wait_for_link # <-- > > > > Is that what you're thinking? > > Right. There is no need of this driver duplicating dw_pcie_setup_rc() and dw_pcie_wait_for_link(). Perhaps those functions were added to dw_pcie_host_init() after this driver got upstreamed and the author failed to take this driver into account. But my point was, the new drivers _should_not_ take inspiration from this driver to not define start_link() callback at the first place (unless there is a real requirement). - Mani -- மணிவண்ணன் சதாசிவம்
On Wed, Jan 31, 2024 at 12:06:26AM +0530, Manivannan Sadhasivam wrote: > On Tue, Jan 30, 2024 at 10:48:59PM +0530, Ajay Agarwal wrote: > > On Tue, Jan 30, 2024 at 05:59:06PM +0530, Manivannan Sadhasivam wrote: > > > On Tue, Jan 30, 2024 at 02:30:27PM +0530, Ajay Agarwal wrote: > > > > > > [...] > > > > > > > > > > > > If that's the case with your driver, when are you starting the link training? > > > > > > > > > > > > > > > > > The link training starts later based on a userspace/debugfs trigger. > > > > > > > > > > > > > > > > > > > > > > Why does it happen as such? What's the problem in starting the link during > > > > > > > probe? Keep it in mind that if you rely on the userspace for starting the link > > > > > > > based on a platform (like Android), then if the same SoC or peripheral instance > > > > > > > get reused in other platform (non-android), the it won't be a seamless user > > > > > > > experience. > > > > > > > > > > > > > > If there are any other usecases, please state them. > > > > > > > > > > > > > > - Mani > > > > > > > > > > > > > This SoC is targeted for an android phone usecase and the endpoints > > > > > > being enumerated need to go through an appropriate and device specific > > > > > > power sequence which gets triggered only when the userspace is up. The > > > > > > PCIe probe cannot assume that the EPs have been powered up already and > > > > > > hence the link-up is not attempted. > > > > > > > > > > Still, I do not see the necessity to not call start_link() during probe. If you > > > > I am not adding any logic/condition around calling the start_link() > > > > itself. I am only avoiding the wait for the link to be up if the > > > > controller driver has not defined start_link(). > > > > > > > > > > I'm saying that not defining the start_link() callback itself is wrong. > > > > > Whether the start_link() should be defined or not, is a different > > design discussion. We currently have 2 drivers in upstream (intel-gw and > > dw-plat) which do not have start_link() defined. Waiting for the link to > > come up for the platforms using those drivers is not a good idea. And > > that is what we are trying to avoid. > > > > NO. The sole intention of this patch is to fix the delay observed with _your_ > out-of-tree controller driver as you explicitly said before. Impact for the > existing 2 drivers are just a side effect. > Hi Mani, What is the expectation from the pcie-designware-host driver? If the .start_link() has to be defined by the vendor driver, then shouldn't the probe be failed if the vendor has not defined it? Thereby failing the probe for intel-gw and pcie-dw-plat drivers? Additionally, if the link fails to come up even after 1 sec of wait time, shouldn't the probe be failed in that case too? My understanding of these drivers is that the .start_link() is an OPTIONAL callback and that the dw_pcie_host_init should help setup the SW structures regardless of whether the .start_link() has been defined or not, and whether the link is up or not. The vendor should be allowed to train the link at a later point of time as well. Please let me know your thoughts. > > > > > add PROBE_PREFER_ASYNCHRONOUS to your controller driver, this delay would become > > > > > negligible. The reason why I'm against not calling start_link() is due to below > > > > > reasons: > > > > > > > > > > 1. If the same SoC gets reused for other platforms, even other android phones > > > > > that powers up the endpoints during boot, then it creates a dependency with > > > > > userspace to always start the link even though the devices were available. > > > > > That's why we should never fix the behavior of the controller drivers based on a > > > > > single platform. > > > > I wonder how the behavior is changing with this patch. Do you have an > > > > example of a platform which does not have start_link() defined but would > > > > like to still wait for a second for the link to come up? > > > > > > > > > > Did you went through my reply completely? I mentioned that the 1s delay would be > > > gone if you add the async flag to your driver and you are ignoring that. > > > The async probe might not help in all the cases. Consider a situation where the PCIe is probed after the boot is already completed. The user will face the delay then. Do you agree? > > Yes, I did go through your suggestion of async probe and that might > > solve my problem of the 1 sec delay. But I would like to fix the problem > > at the core. > > > > There is no problem at the core. The problem is with some controller drivers. > Please do not try to fix a problem which is not there. There are no _special_ > reasons for those 2 drivers to not define start_link() callback. I'm trying to > point you in the right path, but you are always chosing the other one. > > > > But again, I'm saying that not defining start_link() itself is wrong and I've > > > already mentioned the reasons. > > > > > > > For example, consider the intel-gw driver. The 1 sec wait time in its > > > > probe path is also a waste because it explicitly starts link training > > > > later in time. > > > > > > > > > > I previously mentioned that the intel-gw needs fixing since there is no point in > > > starting the link and waiting for it to come up in its probe() if the DWC core > > > is already doing that. > > > > > > - Mani > > > > > > -- > > > மணிவண்ணன் சதாசிவம் > > I think we are at a dead-end in terms of agreeing to a policy. I would > > like the maintainers to pitch in here with their views. > > I'm the maintainer of the DWC drivers that you are proposing the patch for. If > you happen to spin future revision of this series, please carry: > > Nacked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> > > - Mani > > -- > மணிவண்ணன் சதாசிவம்
+ Joao On Mon, Feb 05, 2024 at 04:30:20PM +0530, Ajay Agarwal wrote: > On Wed, Jan 31, 2024 at 12:06:26AM +0530, Manivannan Sadhasivam wrote: > > On Tue, Jan 30, 2024 at 10:48:59PM +0530, Ajay Agarwal wrote: > > > On Tue, Jan 30, 2024 at 05:59:06PM +0530, Manivannan Sadhasivam wrote: > > > > On Tue, Jan 30, 2024 at 02:30:27PM +0530, Ajay Agarwal wrote: > > > > > > > > [...] > > > > > > > > > > > > > > If that's the case with your driver, when are you starting the link training? > > > > > > > > > > > > > > > > > > > The link training starts later based on a userspace/debugfs trigger. > > > > > > > > > > > > > > > > > > > > > > > > > Why does it happen as such? What's the problem in starting the link during > > > > > > > > probe? Keep it in mind that if you rely on the userspace for starting the link > > > > > > > > based on a platform (like Android), then if the same SoC or peripheral instance > > > > > > > > get reused in other platform (non-android), the it won't be a seamless user > > > > > > > > experience. > > > > > > > > > > > > > > > > If there are any other usecases, please state them. > > > > > > > > > > > > > > > > - Mani > > > > > > > > > > > > > > > This SoC is targeted for an android phone usecase and the endpoints > > > > > > > being enumerated need to go through an appropriate and device specific > > > > > > > power sequence which gets triggered only when the userspace is up. The > > > > > > > PCIe probe cannot assume that the EPs have been powered up already and > > > > > > > hence the link-up is not attempted. > > > > > > > > > > > > Still, I do not see the necessity to not call start_link() during probe. If you > > > > > I am not adding any logic/condition around calling the start_link() > > > > > itself. I am only avoiding the wait for the link to be up if the > > > > > controller driver has not defined start_link(). > > > > > > > > > > > > > I'm saying that not defining the start_link() callback itself is wrong. > > > > > > > Whether the start_link() should be defined or not, is a different > > > design discussion. We currently have 2 drivers in upstream (intel-gw and > > > dw-plat) which do not have start_link() defined. Waiting for the link to > > > come up for the platforms using those drivers is not a good idea. And > > > that is what we are trying to avoid. > > > > > > > NO. The sole intention of this patch is to fix the delay observed with _your_ > > out-of-tree controller driver as you explicitly said before. Impact for the > > existing 2 drivers are just a side effect. > > > Hi Mani, > What is the expectation from the pcie-designware-host driver? If the > .start_link() has to be defined by the vendor driver, then shouldn't the > probe be failed if the vendor has not defined it? Thereby failing the > probe for intel-gw and pcie-dw-plat drivers? > intel-gw maintainer agreed to fix the driver [1], but I cannot really comment on the other one. It is not starting the link at all, so don't know how it works. I've CCed the driver author to get some idea. [1] https://lore.kernel.org/linux-pci/BY3PR19MB50764E90F107B3256189804CBD432@BY3PR19MB5076.namprd19.prod.outlook.com/ > Additionally, if the link fails to come up even after 1 sec of wait > time, shouldn't the probe be failed in that case too? > Why? The device can be attached at any point of time. What I'm stressing is, the driver should check for the link to be up during probe and if there is no device, then it should just continue and hope for the device to show up later. This way, the driver can detect the powered up devices during boot and also detect the hotplug devices. > My understanding of these drivers is that the .start_link() is an > OPTIONAL callback and that the dw_pcie_host_init should help setup the > SW structures regardless of whether the .start_link() has been defined > or not, and whether the link is up or not. The vendor should be allowed > to train the link at a later point of time as well. > What do you mean by later point of time? Bringing the link through debugfs? NO. We cannot let each driver behave differently, unless there is a really valid reason. > Please let me know your thoughts. > > > > > > add PROBE_PREFER_ASYNCHRONOUS to your controller driver, this delay would become > > > > > > negligible. The reason why I'm against not calling start_link() is due to below > > > > > > reasons: > > > > > > > > > > > > 1. If the same SoC gets reused for other platforms, even other android phones > > > > > > that powers up the endpoints during boot, then it creates a dependency with > > > > > > userspace to always start the link even though the devices were available. > > > > > > That's why we should never fix the behavior of the controller drivers based on a > > > > > > single platform. > > > > > I wonder how the behavior is changing with this patch. Do you have an > > > > > example of a platform which does not have start_link() defined but would > > > > > like to still wait for a second for the link to come up? > > > > > > > > > > > > > Did you went through my reply completely? I mentioned that the 1s delay would be > > > > gone if you add the async flag to your driver and you are ignoring that. > > > > > The async probe might not help in all the cases. Consider a situation > where the PCIe is probed after the boot is already completed. The user > will face the delay then. Do you agree? > You mean loading the driver module post boot? If the link is still not up, yes the users will see the 1sec delay. But again, the point I'm trying to make here is, all the drivers have to follow one flow. We cannot let each driver do its way of starting the link. There could be exceptions if we get a valid reason for a driver to not do so, but so far I haven't come across such reason. (the existing drivers, intel-gw and designware-plat are not exceptions, but they need fixing). For your driver, you said that the userspace brings up the link, post boot when the devices are powered on. So starting the link during probe incurs 1s delay, as there would be no devices. But I suggested you to enable async probe to nullify the 1s delay during probe and you confirmed that it fixes the issue for you. Then you are again debating about not defining the start_link() callback :( I've made by point clear, please do not take inspiration from those drivers, they really need fixing. And for your usecase, allowing the controller driver to start the link post boot just because a device on your Pixel phone comes up later is not a good argument. You _should_not_ define the behavior of a controller driver based on one platform, it is really a bad design. - Mani > > > Yes, I did go through your suggestion of async probe and that might > > > solve my problem of the 1 sec delay. But I would like to fix the problem > > > at the core. > > > > > > > There is no problem at the core. The problem is with some controller drivers. > > Please do not try to fix a problem which is not there. There are no _special_ > > reasons for those 2 drivers to not define start_link() callback. I'm trying to > > point you in the right path, but you are always chosing the other one. > > > > > > But again, I'm saying that not defining start_link() itself is wrong and I've > > > > already mentioned the reasons. > > > > > > > > > For example, consider the intel-gw driver. The 1 sec wait time in its > > > > > probe path is also a waste because it explicitly starts link training > > > > > later in time. > > > > > > > > > > > > > I previously mentioned that the intel-gw needs fixing since there is no point in > > > > starting the link and waiting for it to come up in its probe() if the DWC core > > > > is already doing that. > > > > > > > > - Mani > > > > > > > > -- > > > > மணிவண்ணன் சதாசிவம் > > > I think we are at a dead-end in terms of agreeing to a policy. I would > > > like the maintainers to pitch in here with their views. > > > > I'm the maintainer of the DWC drivers that you are proposing the patch for. If > > you happen to spin future revision of this series, please carry: > > > > Nacked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> > > > > - Mani > > > > -- > > மணிவண்ணன் சதாசிவம்
On Tue, Feb 06, 2024 at 10:40:43PM +0530, Manivannan Sadhasivam wrote: > ... > ... And for your usecase, allowing the controller driver to start > the link post boot just because a device on your Pixel phone comes > up later is not a good argument. You _should_not_ define the > behavior of a controller driver based on one platform, it is really > a bad design. I haven't followed the entire discussion, and I don't know much about the specifics of Ajay's situation, but from the controller driver's point of view, shouldn't a device coming up later look like a normal hot-add? I think most drivers are designed with the assumption that Endpoints are present and powered up at the time of host controller probe, which seems a little stronger than necessary. I think the host controller probe should initialize the Root Port such that its LTSSM enters the Detect state, and that much should be basically straight-line code with no waiting. If no Endpoint is attached, i.e., "the slot is empty", it would be nice if the probe could then complete immediately without waiting at all. If the link comes up later, could we handle it as a hot-add? This might be an actual hot-add, or it might be that an Endpoint was present at boot but link training didn't complete until later. I admit it doesn't look trivial to actually implement this. We would need to be able to detect link-up events, e.g., via hotplug or other link management interrupts. Lacking that hardware functionality, we might need driver-specific code to wait for the link to come up (possibly drivers could skip the wait if they can detect the "slot empty" case). Also, the hotplug functionality (pciehp or acpiphp) is currently initialized later and there's probably a race with enabling and detecting hot-add events in the "slot occupied" case. Bjorn
On Wed, Feb 14, 2024 at 04:02:28PM -0600, Bjorn Helgaas wrote: > On Tue, Feb 06, 2024 at 10:40:43PM +0530, Manivannan Sadhasivam wrote: > > ... > > > ... And for your usecase, allowing the controller driver to start > > the link post boot just because a device on your Pixel phone comes > > up later is not a good argument. You _should_not_ define the > > behavior of a controller driver based on one platform, it is really > > a bad design. > > I haven't followed the entire discussion, and I don't know much about > the specifics of Ajay's situation, but from the controller driver's > point of view, shouldn't a device coming up later look like a normal > hot-add? > Yes, but most of the form factors that these drivers work with do not support native hotplug. So users have to rescan the bus through sysfs. > I think most drivers are designed with the assumption that Endpoints > are present and powered up at the time of host controller probe, which > seems a little stronger than necessary. > Most of the drivers work with endpoints that are fixed in the board design (like M.2), so the endpoints would be up when the controller probes. > I think the host controller probe should initialize the Root Port such > that its LTSSM enters the Detect state, and that much should be > basically straight-line code with no waiting. If no Endpoint is > attached, i.e., "the slot is empty", it would be nice if the probe > could then complete immediately without waiting at all. > Atleast on Qcom platforms, the LTSSM would be in "Detect" state even if no endpoints are found during probe. Then once an endpoint comes up later, link training happens and user can rescan the bus through sysfs. But, I don't know the real need of 1s loop to wait for the link. It predates my work on DWC drivers. Maybe Lorenzo could shed some light. I could not find the reference in both DWC and PCIe specs (maybe my grep was bad). > If the link comes up later, could we handle it as a hot-add? This > might be an actual hot-add, or it might be that an Endpoint was > present at boot but link training didn't complete until later. > > I admit it doesn't look trivial to actually implement this. We would > need to be able to detect link-up events, e.g., via hotplug or other > link management interrupts. Lacking that hardware functionality, we > might need driver-specific code to wait for the link to come up > (possibly drivers could skip the wait if they can detect the "slot > empty" case). > > Also, the hotplug functionality (pciehp or acpiphp) is currently > initialized later and there's probably a race with enabling and > detecting hot-add events in the "slot occupied" case. > As I mentioned above, hotplug is not possible in all the cases. There is a series floating to add GPIO based hotplug, but still that requires board designers to route a dedicated GPIO to the endpoint. To conclude, we do need to check for the existence of the endpoints during probe. But whether the driver should wait for 1s for the link to come up, should be clarified by Lorenzo. - Mani
On Thu, Feb 15, 2024 at 07:39:08PM +0530, Manivannan Sadhasivam wrote: > On Wed, Feb 14, 2024 at 04:02:28PM -0600, Bjorn Helgaas wrote: > > On Tue, Feb 06, 2024 at 10:40:43PM +0530, Manivannan Sadhasivam wrote: > > > ... > > > > > ... And for your usecase, allowing the controller driver to > > > start the link post boot just because a device on your Pixel > > > phone comes up later is not a good argument. You _should_not_ > > > define the behavior of a controller driver based on one > > > platform, it is really a bad design. > > > > I haven't followed the entire discussion, and I don't know much > > about the specifics of Ajay's situation, but from the controller > > driver's point of view, shouldn't a device coming up later look > > like a normal hot-add? > > Yes, but most of the form factors that these drivers work with do > not support native hotplug. So users have to rescan the bus through > sysfs. > > > I think most drivers are designed with the assumption that > > Endpoints are present and powered up at the time of host > > controller probe, which seems a little stronger than necessary. > > Most of the drivers work with endpoints that are fixed in the board > design (like M.2), so the endpoints would be up when the controller > probes. > > > I think the host controller probe should initialize the Root Port > > such that its LTSSM enters the Detect state, and that much should > > be basically straight-line code with no waiting. If no Endpoint > > is attached, i.e., "the slot is empty", it would be nice if the > > probe could then complete immediately without waiting at all. > > Atleast on Qcom platforms, the LTSSM would be in "Detect" state even > if no endpoints are found during probe. Then once an endpoint comes > up later, link training happens and user can rescan the bus through > sysfs. Can the hardware tell us when the link state changes? If so, we should be able to scan the bus automatically without the need for sysfs. For example, if the Root Port advertised PCI_EXP_FLAGS_SLOT, we might be able to use a Data Link Layer State Changed interrupt to scan the bus via pciehp when the Endpoint is powered up, even if the Endpoint is actually soldered down and not physically hot-pluggable. Bjorn
On Fri, Feb 16, 2024 at 06:07:23PM -0600, Bjorn Helgaas wrote: > On Thu, Feb 15, 2024 at 07:39:08PM +0530, Manivannan Sadhasivam wrote: > > On Wed, Feb 14, 2024 at 04:02:28PM -0600, Bjorn Helgaas wrote: > > > On Tue, Feb 06, 2024 at 10:40:43PM +0530, Manivannan Sadhasivam wrote: > > > > ... > > > > > > > ... And for your usecase, allowing the controller driver to > > > > start the link post boot just because a device on your Pixel > > > > phone comes up later is not a good argument. You _should_not_ > > > > define the behavior of a controller driver based on one > > > > platform, it is really a bad design. > > > > > > I haven't followed the entire discussion, and I don't know much > > > about the specifics of Ajay's situation, but from the controller > > > driver's point of view, shouldn't a device coming up later look > > > like a normal hot-add? > > > > Yes, but most of the form factors that these drivers work with do > > not support native hotplug. So users have to rescan the bus through > > sysfs. > > > > > I think most drivers are designed with the assumption that > > > Endpoints are present and powered up at the time of host > > > controller probe, which seems a little stronger than necessary. > > > > Most of the drivers work with endpoints that are fixed in the board > > design (like M.2), so the endpoints would be up when the controller > > probes. > > > > > I think the host controller probe should initialize the Root Port > > > such that its LTSSM enters the Detect state, and that much should > > > be basically straight-line code with no waiting. If no Endpoint > > > is attached, i.e., "the slot is empty", it would be nice if the > > > probe could then complete immediately without waiting at all. > > > > Atleast on Qcom platforms, the LTSSM would be in "Detect" state even > > if no endpoints are found during probe. Then once an endpoint comes > > up later, link training happens and user can rescan the bus through > > sysfs. > > Can the hardware tell us when the link state changes? If so, we > should be able to scan the bus automatically without the need for > sysfs. For example, if the Root Port advertised PCI_EXP_FLAGS_SLOT, > we might be able to use a Data Link Layer State Changed interrupt to > scan the bus via pciehp when the Endpoint is powered up, even if the > Endpoint is actually soldered down and not physically hot-pluggable. > I don't think the state change interrupt is generated on Qcom platforms. I will check with the hw team and reply back. As a reply to my earlier question on requiring 1s delay for waiting for the link to come up during boot: PCIe spec r5, sec 6.6.1 says: "Unless Readiness Notifications mechanisms are used, the Root Complex and/or system software must allow at least 1.0 s after a Conventional Reset of a device, before determining that the device is broken if it fails to return a Successful Completion status for a valid Configuration Request. This period is independent of how quickly Link training completes." So this might be the reason. If so, I don't see a way to avoid the delay. - Mani
On Tue, Feb 06, 2024 at 10:40:43PM +0530, Manivannan Sadhasivam wrote: > + Joao > > On Mon, Feb 05, 2024 at 04:30:20PM +0530, Ajay Agarwal wrote: > > On Wed, Jan 31, 2024 at 12:06:26AM +0530, Manivannan Sadhasivam wrote: > > > On Tue, Jan 30, 2024 at 10:48:59PM +0530, Ajay Agarwal wrote: > > > > On Tue, Jan 30, 2024 at 05:59:06PM +0530, Manivannan Sadhasivam wrote: > > > > > On Tue, Jan 30, 2024 at 02:30:27PM +0530, Ajay Agarwal wrote: > > > > > > > > > > [...] > > > > > > > > > > > > > > > > If that's the case with your driver, when are you starting the link training? > > > > > > > > > > > > > > > > > > > > > The link training starts later based on a userspace/debugfs trigger. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Why does it happen as such? What's the problem in starting the link during > > > > > > > > > probe? Keep it in mind that if you rely on the userspace for starting the link > > > > > > > > > based on a platform (like Android), then if the same SoC or peripheral instance > > > > > > > > > get reused in other platform (non-android), the it won't be a seamless user > > > > > > > > > experience. > > > > > > > > > > > > > > > > > > If there are any other usecases, please state them. > > > > > > > > > > > > > > > > > > - Mani > > > > > > > > > > > > > > > > > This SoC is targeted for an android phone usecase and the endpoints > > > > > > > > being enumerated need to go through an appropriate and device specific > > > > > > > > power sequence which gets triggered only when the userspace is up. The > > > > > > > > PCIe probe cannot assume that the EPs have been powered up already and > > > > > > > > hence the link-up is not attempted. > > > > > > > > > > > > > > Still, I do not see the necessity to not call start_link() during probe. If you > > > > > > I am not adding any logic/condition around calling the start_link() > > > > > > itself. I am only avoiding the wait for the link to be up if the > > > > > > controller driver has not defined start_link(). > > > > > > > > > > > > > > > > I'm saying that not defining the start_link() callback itself is wrong. > > > > > > > > > Whether the start_link() should be defined or not, is a different > > > > design discussion. We currently have 2 drivers in upstream (intel-gw and > > > > dw-plat) which do not have start_link() defined. Waiting for the link to > > > > come up for the platforms using those drivers is not a good idea. And > > > > that is what we are trying to avoid. > > > > > > > > > > NO. The sole intention of this patch is to fix the delay observed with _your_ > > > out-of-tree controller driver as you explicitly said before. Impact for the > > > existing 2 drivers are just a side effect. > > > > > Hi Mani, > > What is the expectation from the pcie-designware-host driver? If the > > .start_link() has to be defined by the vendor driver, then shouldn't the > > probe be failed if the vendor has not defined it? Thereby failing the > > probe for intel-gw and pcie-dw-plat drivers? > > > > intel-gw maintainer agreed to fix the driver [1], but I cannot really comment on > the other one. It is not starting the link at all, so don't know how it works. > I've CCed the driver author to get some idea. > > [1] https://lore.kernel.org/linux-pci/BY3PR19MB50764E90F107B3256189804CBD432@BY3PR19MB5076.namprd19.prod.outlook.com/ > > > Additionally, if the link fails to come up even after 1 sec of wait > > time, shouldn't the probe be failed in that case too? > > > > Why? The device can be attached at any point of time. What I'm stressing is, the > driver should check for the link to be up during probe and if there is no > device, then it should just continue and hope for the device to show up later. My change is still checking whether the link is up during probe. If yes, print the link status (negotiated speed and width). If no, and the .start_link() exists, then call the same and wait for 1 second for the link to be up. > This way, the driver can detect the powered up devices during boot and also > detect the hotplug devices. > If the .start_link() is not defined, how will the link come up? Are you assuming that the bootloader might have enabled link training? > > My understanding of these drivers is that the .start_link() is an > > OPTIONAL callback and that the dw_pcie_host_init should help setup the > > SW structures regardless of whether the .start_link() has been defined > > or not, and whether the link is up or not. The vendor should be allowed > > to train the link at a later point of time as well. > > > > What do you mean by later point of time? Bringing the link through debugfs? NO. > We cannot let each driver behave differently, unless there is a really valid > reason. > pci_rescan_bus() is an exported API. I am assuming that this means that it is supposed to be used by modules when they know that the link is up. If a module wishes to bring the link up using debugfs or some other HW trigger, why is that a bad design? In my opinion, this is providing more options to the HW/product designer and the module author, in addition to the already existing .start_link() callback. > > Please let me know your thoughts. > > > > > > > add PROBE_PREFER_ASYNCHRONOUS to your controller driver, this delay would become > > > > > > > negligible. The reason why I'm against not calling start_link() is due to below > > > > > > > reasons: > > > > > > > > > > > > > > 1. If the same SoC gets reused for other platforms, even other android phones > > > > > > > that powers up the endpoints during boot, then it creates a dependency with > > > > > > > userspace to always start the link even though the devices were available. > > > > > > > That's why we should never fix the behavior of the controller drivers based on a > > > > > > > single platform. > > > > > > I wonder how the behavior is changing with this patch. Do you have an > > > > > > example of a platform which does not have start_link() defined but would > > > > > > like to still wait for a second for the link to come up? > > > > > > > > > > > > > > > > Did you went through my reply completely? I mentioned that the 1s delay would be > > > > > gone if you add the async flag to your driver and you are ignoring that. > > > > > > > The async probe might not help in all the cases. Consider a situation > > where the PCIe is probed after the boot is already completed. The user > > will face the delay then. Do you agree? > > > > You mean loading the driver module post boot? If the link is still not up, yes > the users will see the 1sec delay. > > But again, the point I'm trying to make here is, all the drivers have to follow > one flow. We cannot let each driver do its way of starting the link. There could > be exceptions if we get a valid reason for a driver to not do so, but so far I > haven't come across such reason. (the existing drivers, intel-gw and > designware-plat are not exceptions, but they need fixing). > > For your driver, you said that the userspace brings up the link, post boot when > the devices are powered on. So starting the link during probe incurs 1s delay, > as there would be no devices. But I suggested you to enable async probe to > nullify the 1s delay during probe and you confirmed that it fixes the issue for > you. > There are emulation/simulation platforms in which 1 second of runtime can correspond to ~1 hour of real-world time. Even if PCIe is probed asyncronously, it will still block the next set of processes. > Then you are again debating about not defining the start_link() callback :( > I am not sure why you think I am debating against defining the .start_link() callback. It is clearly an optional pointer and I am choosing to not use it because of the usecase. And if it is optional and I am not using it, why waste 1 sec of runtime waiting for the link? Do we have an example in upstream of platforms where this 1 sec could prove useful where link training is not being started by the platform driver but EPs have to be detected because they are present and powered-up? > I've made by point clear, please do not take inspiration from those drivers, > they really need fixing. And for your usecase, allowing the controller driver > to start the link post boot just because a device on your Pixel phone comes up > later is not a good argument. You _should_not_ define the behavior of a > controller driver based on one platform, it is really a bad design. > > - Mani > > > > > Yes, I did go through your suggestion of async probe and that might > > > > solve my problem of the 1 sec delay. But I would like to fix the problem > > > > at the core. > > > > > > > > > > There is no problem at the core. The problem is with some controller drivers. > > > Please do not try to fix a problem which is not there. There are no _special_ > > > reasons for those 2 drivers to not define start_link() callback. I'm trying to > > > point you in the right path, but you are always chosing the other one. > > > > > > > > But again, I'm saying that not defining start_link() itself is wrong and I've > > > > > already mentioned the reasons. > > > > > > > > > > > For example, consider the intel-gw driver. The 1 sec wait time in its > > > > > > probe path is also a waste because it explicitly starts link training > > > > > > later in time. > > > > > > > > > > > > > > > > I previously mentioned that the intel-gw needs fixing since there is no point in > > > > > starting the link and waiting for it to come up in its probe() if the DWC core > > > > > is already doing that. > > > > > > > > > > - Mani > > > > > > > > > > -- > > > > > மணிவண்ணன் சதாசிவம் > > > > I think we are at a dead-end in terms of agreeing to a policy. I would > > > > like the maintainers to pitch in here with their views. > > > > > > I'm the maintainer of the DWC drivers that you are proposing the patch for. If > > > you happen to spin future revision of this series, please carry: > > > > > > Nacked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> > > > > > > - Mani > > > > > > -- > > > மணிவண்ணன் சதாசிவம் > > -- > மணிவண்ணன் சதாசிவம்
On Thu, Feb 15, 2024 at 07:39:08PM +0530, Manivannan Sadhasivam wrote: > On Wed, Feb 14, 2024 at 04:02:28PM -0600, Bjorn Helgaas wrote: > > On Tue, Feb 06, 2024 at 10:40:43PM +0530, Manivannan Sadhasivam wrote: > > > ... > > > > > ... And for your usecase, allowing the controller driver to start > > > the link post boot just because a device on your Pixel phone comes > > > up later is not a good argument. You _should_not_ define the > > > behavior of a controller driver based on one platform, it is really > > > a bad design. > > > > I haven't followed the entire discussion, and I don't know much about > > the specifics of Ajay's situation, but from the controller driver's > > point of view, shouldn't a device coming up later look like a normal > > hot-add? > > > > Yes, but most of the form factors that these drivers work with do not support > native hotplug. So users have to rescan the bus through sysfs. > > > I think most drivers are designed with the assumption that Endpoints > > are present and powered up at the time of host controller probe, which > > seems a little stronger than necessary. > > > > Most of the drivers work with endpoints that are fixed in the board design (like > M.2), so the endpoints would be up when the controller probes. > > > I think the host controller probe should initialize the Root Port such > > that its LTSSM enters the Detect state, and that much should be > > basically straight-line code with no waiting. If no Endpoint is > > attached, i.e., "the slot is empty", it would be nice if the probe > > could then complete immediately without waiting at all. > > > > Atleast on Qcom platforms, the LTSSM would be in "Detect" state even if no > endpoints are found during probe. Then once an endpoint comes up later, link > training happens and user can rescan the bus through sysfs. > > But, I don't know the real need of 1s loop to wait for the link. It predates my > work on DWC drivers. Maybe Lorenzo could shed some light. I could not find the > reference in both DWC and PCIe specs (maybe my grep was bad). > > > If the link comes up later, could we handle it as a hot-add? This > > might be an actual hot-add, or it might be that an Endpoint was > > present at boot but link training didn't complete until later. > > > > I admit it doesn't look trivial to actually implement this. We would > > need to be able to detect link-up events, e.g., via hotplug or other > > link management interrupts. Lacking that hardware functionality, we > > might need driver-specific code to wait for the link to come up > > (possibly drivers could skip the wait if they can detect the "slot > > empty" case). > > > > Also, the hotplug functionality (pciehp or acpiphp) is currently > > initialized later and there's probably a race with enabling and > > detecting hot-add events in the "slot occupied" case. > > > > As I mentioned above, hotplug is not possible in all the cases. There is a > series floating to add GPIO based hotplug, but still that requires board > designers to route a dedicated GPIO to the endpoint. > > To conclude, we do need to check for the existence of the endpoints during > probe. But whether the driver should wait for 1s for the link to come up, > should be clarified by Lorenzo. > Lorenzo himself applied my patch [1] which had caused the regression on QCOM platforms. So I am assuming that he is okay with removing this 1s delay. [1] https://lore.kernel.org/all/168509076553.135117.7288121992217982937.b4-ty@kernel.org/ > - Mani > > -- > மணிவண்ணன் சதாசிவம்
On Thu, Feb 22, 2024 at 10:00:09AM +0530, Ajay Agarwal wrote: > On Thu, Feb 15, 2024 at 07:39:08PM +0530, Manivannan Sadhasivam wrote: > > On Wed, Feb 14, 2024 at 04:02:28PM -0600, Bjorn Helgaas wrote: > > > On Tue, Feb 06, 2024 at 10:40:43PM +0530, Manivannan Sadhasivam wrote: > > > > ... > > > > > > > ... And for your usecase, allowing the controller driver to start > > > > the link post boot just because a device on your Pixel phone comes > > > > up later is not a good argument. You _should_not_ define the > > > > behavior of a controller driver based on one platform, it is really > > > > a bad design. > > > > > > I haven't followed the entire discussion, and I don't know much about > > > the specifics of Ajay's situation, but from the controller driver's > > > point of view, shouldn't a device coming up later look like a normal > > > hot-add? > > > > > > > Yes, but most of the form factors that these drivers work with do not support > > native hotplug. So users have to rescan the bus through sysfs. > > > > > I think most drivers are designed with the assumption that Endpoints > > > are present and powered up at the time of host controller probe, which > > > seems a little stronger than necessary. > > > > > > > Most of the drivers work with endpoints that are fixed in the board design (like > > M.2), so the endpoints would be up when the controller probes. > > > > > I think the host controller probe should initialize the Root Port such > > > that its LTSSM enters the Detect state, and that much should be > > > basically straight-line code with no waiting. If no Endpoint is > > > attached, i.e., "the slot is empty", it would be nice if the probe > > > could then complete immediately without waiting at all. > > > > > > > Atleast on Qcom platforms, the LTSSM would be in "Detect" state even if no > > endpoints are found during probe. Then once an endpoint comes up later, link > > training happens and user can rescan the bus through sysfs. > > > > But, I don't know the real need of 1s loop to wait for the link. It predates my > > work on DWC drivers. Maybe Lorenzo could shed some light. I could not find the > > reference in both DWC and PCIe specs (maybe my grep was bad). > > > > > If the link comes up later, could we handle it as a hot-add? This > > > might be an actual hot-add, or it might be that an Endpoint was > > > present at boot but link training didn't complete until later. > > > > > > I admit it doesn't look trivial to actually implement this. We would > > > need to be able to detect link-up events, e.g., via hotplug or other > > > link management interrupts. Lacking that hardware functionality, we > > > might need driver-specific code to wait for the link to come up > > > (possibly drivers could skip the wait if they can detect the "slot > > > empty" case). > > > > > > Also, the hotplug functionality (pciehp or acpiphp) is currently > > > initialized later and there's probably a race with enabling and > > > detecting hot-add events in the "slot occupied" case. > > > > > > > As I mentioned above, hotplug is not possible in all the cases. There is a > > series floating to add GPIO based hotplug, but still that requires board > > designers to route a dedicated GPIO to the endpoint. > > > > To conclude, we do need to check for the existence of the endpoints during > > probe. But whether the driver should wait for 1s for the link to come up, > > should be clarified by Lorenzo. > > > Lorenzo himself applied my patch [1] which had caused the regression on > QCOM platforms. So I am assuming that he is okay with removing this 1s > delay. > > [1] https://lore.kernel.org/all/168509076553.135117.7288121992217982937.b4-ty@kernel.org/ > Hi Mani, I am wondering if you have any thoughts on the above comment? > > - Mani > > > > -- > > மணிவண்ணன் சதாசிவம்
On Tue, Feb 20, 2024 at 11:04:09PM +0530, Ajay Agarwal wrote: > On Tue, Feb 06, 2024 at 10:40:43PM +0530, Manivannan Sadhasivam wrote: > > + Joao > > > > On Mon, Feb 05, 2024 at 04:30:20PM +0530, Ajay Agarwal wrote: > > > On Wed, Jan 31, 2024 at 12:06:26AM +0530, Manivannan Sadhasivam wrote: > > > > On Tue, Jan 30, 2024 at 10:48:59PM +0530, Ajay Agarwal wrote: > > > > > On Tue, Jan 30, 2024 at 05:59:06PM +0530, Manivannan Sadhasivam wrote: > > > > > > On Tue, Jan 30, 2024 at 02:30:27PM +0530, Ajay Agarwal wrote: > > > > > > > > > > > > [...] > > > > > > > > > > > > > > > > > > If that's the case with your driver, when are you starting the link training? > > > > > > > > > > > > > > > > > > > > > > > The link training starts later based on a userspace/debugfs trigger. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Why does it happen as such? What's the problem in starting the link during > > > > > > > > > > probe? Keep it in mind that if you rely on the userspace for starting the link > > > > > > > > > > based on a platform (like Android), then if the same SoC or peripheral instance > > > > > > > > > > get reused in other platform (non-android), the it won't be a seamless user > > > > > > > > > > experience. > > > > > > > > > > > > > > > > > > > > If there are any other usecases, please state them. > > > > > > > > > > > > > > > > > > > > - Mani > > > > > > > > > > > > > > > > > > > This SoC is targeted for an android phone usecase and the endpoints > > > > > > > > > being enumerated need to go through an appropriate and device specific > > > > > > > > > power sequence which gets triggered only when the userspace is up. The > > > > > > > > > PCIe probe cannot assume that the EPs have been powered up already and > > > > > > > > > hence the link-up is not attempted. > > > > > > > > > > > > > > > > Still, I do not see the necessity to not call start_link() during probe. If you > > > > > > > I am not adding any logic/condition around calling the start_link() > > > > > > > itself. I am only avoiding the wait for the link to be up if the > > > > > > > controller driver has not defined start_link(). > > > > > > > > > > > > > > > > > > > I'm saying that not defining the start_link() callback itself is wrong. > > > > > > > > > > > Whether the start_link() should be defined or not, is a different > > > > > design discussion. We currently have 2 drivers in upstream (intel-gw and > > > > > dw-plat) which do not have start_link() defined. Waiting for the link to > > > > > come up for the platforms using those drivers is not a good idea. And > > > > > that is what we are trying to avoid. > > > > > > > > > > > > > NO. The sole intention of this patch is to fix the delay observed with _your_ > > > > out-of-tree controller driver as you explicitly said before. Impact for the > > > > existing 2 drivers are just a side effect. > > > > > > > Hi Mani, > > > What is the expectation from the pcie-designware-host driver? If the > > > .start_link() has to be defined by the vendor driver, then shouldn't the > > > probe be failed if the vendor has not defined it? Thereby failing the > > > probe for intel-gw and pcie-dw-plat drivers? > > > > > > > intel-gw maintainer agreed to fix the driver [1], but I cannot really comment on > > the other one. It is not starting the link at all, so don't know how it works. > > I've CCed the driver author to get some idea. > > > > [1] https://lore.kernel.org/linux-pci/BY3PR19MB50764E90F107B3256189804CBD432@BY3PR19MB5076.namprd19.prod.outlook.com/ > > > > > Additionally, if the link fails to come up even after 1 sec of wait > > > time, shouldn't the probe be failed in that case too? > > > > > > > Why? The device can be attached at any point of time. What I'm stressing is, the > > driver should check for the link to be up during probe and if there is no > > device, then it should just continue and hope for the device to show up later. > My change is still checking whether the link is up during probe. > If yes, print the link status (negotiated speed and width). > If no, and the .start_link() exists, then call the same and wait for 1 > second for the link to be up. > There is a reason why we are wating for 1s for the device to show up during probe. Please look at my earlier reply to Bjorn. > > This way, the driver can detect the powered up devices during boot and also > > detect the hotplug devices. > > > If the .start_link() is not defined, how will the link come up? Are you > assuming that the bootloader might have enabled link training? > Yes, something like that. But that assumption is moot in the first place. > > > My understanding of these drivers is that the .start_link() is an > > > OPTIONAL callback and that the dw_pcie_host_init should help setup the > > > SW structures regardless of whether the .start_link() has been defined > > > or not, and whether the link is up or not. The vendor should be allowed > > > to train the link at a later point of time as well. > > > > > > > What do you mean by later point of time? Bringing the link through debugfs? NO. > > We cannot let each driver behave differently, unless there is a really valid > > reason. > > > pci_rescan_bus() is an exported API. I am assuming that this means that > it is supposed to be used by modules when they know that the link is up. > If a module wishes to bring the link up using debugfs or some other HW > trigger, why is that a bad design? In my opinion, this is providing more > options to the HW/product designer and the module author, in addition to > the already existing .start_link() callback. > If the driver _knows_ that the device is up, then rescanning the bus makes sense. But here we are checking for the existence of the device. > > > Please let me know your thoughts. > > > > > > > > add PROBE_PREFER_ASYNCHRONOUS to your controller driver, this delay would become > > > > > > > > negligible. The reason why I'm against not calling start_link() is due to below > > > > > > > > reasons: > > > > > > > > > > > > > > > > 1. If the same SoC gets reused for other platforms, even other android phones > > > > > > > > that powers up the endpoints during boot, then it creates a dependency with > > > > > > > > userspace to always start the link even though the devices were available. > > > > > > > > That's why we should never fix the behavior of the controller drivers based on a > > > > > > > > single platform. > > > > > > > I wonder how the behavior is changing with this patch. Do you have an > > > > > > > example of a platform which does not have start_link() defined but would > > > > > > > like to still wait for a second for the link to come up? > > > > > > > > > > > > > > > > > > > Did you went through my reply completely? I mentioned that the 1s delay would be > > > > > > gone if you add the async flag to your driver and you are ignoring that. > > > > > > > > > The async probe might not help in all the cases. Consider a situation > > > where the PCIe is probed after the boot is already completed. The user > > > will face the delay then. Do you agree? > > > > > > > You mean loading the driver module post boot? If the link is still not up, yes > > the users will see the 1sec delay. > > > > But again, the point I'm trying to make here is, all the drivers have to follow > > one flow. We cannot let each driver do its way of starting the link. There could > > be exceptions if we get a valid reason for a driver to not do so, but so far I > > haven't come across such reason. (the existing drivers, intel-gw and > > designware-plat are not exceptions, but they need fixing). > > > > For your driver, you said that the userspace brings up the link, post boot when > > the devices are powered on. So starting the link during probe incurs 1s delay, > > as there would be no devices. But I suggested you to enable async probe to > > nullify the 1s delay during probe and you confirmed that it fixes the issue for > > you. > > > There are emulation/simulation platforms in which 1 second of runtime > can correspond to ~1 hour of real-world time. Even if PCIe is probed > asyncronously, it will still block the next set of processes. > Which simulation/emulation platform? The one during pre-production stage? If there is no endpoint connected, why would you enable the controller first place? And how can the endpoint show up later during simulation? Why can't it be up earlier? > > Then you are again debating about not defining the start_link() callback :( > > > I am not sure why you think I am debating against defining the > .start_link() callback. It is clearly an optional pointer and I am > choosing to not use it because of the usecase. And if it is optional and > I am not using it, why waste 1 sec of runtime waiting for the link? Do > we have an example in upstream of platforms where this 1 sec could prove > useful where link training is not being started by the platform driver > but EPs have to be detected because they are present and powered-up? > Please tell me how the link is started in your case without start_link() callback. - Mani
On Wed, Feb 28, 2024 at 10:59:51PM +0530, Manivannan Sadhasivam wrote: > On Tue, Feb 20, 2024 at 11:04:09PM +0530, Ajay Agarwal wrote: > > On Tue, Feb 06, 2024 at 10:40:43PM +0530, Manivannan Sadhasivam wrote: > > > + Joao > > > > > > On Mon, Feb 05, 2024 at 04:30:20PM +0530, Ajay Agarwal wrote: > > > > On Wed, Jan 31, 2024 at 12:06:26AM +0530, Manivannan Sadhasivam wrote: > > > > > On Tue, Jan 30, 2024 at 10:48:59PM +0530, Ajay Agarwal wrote: > > > > > > On Tue, Jan 30, 2024 at 05:59:06PM +0530, Manivannan Sadhasivam wrote: > > > > > > > On Tue, Jan 30, 2024 at 02:30:27PM +0530, Ajay Agarwal wrote: > > > > > > > > > > > > > > [...] > > > > > > > > > > > > > > > > > > > > If that's the case with your driver, when are you starting the link training? > > > > > > > > > > > > > > > > > > > > > > > > > The link training starts later based on a userspace/debugfs trigger. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Why does it happen as such? What's the problem in starting the link during > > > > > > > > > > > probe? Keep it in mind that if you rely on the userspace for starting the link > > > > > > > > > > > based on a platform (like Android), then if the same SoC or peripheral instance > > > > > > > > > > > get reused in other platform (non-android), the it won't be a seamless user > > > > > > > > > > > experience. > > > > > > > > > > > > > > > > > > > > > > If there are any other usecases, please state them. > > > > > > > > > > > > > > > > > > > > > > - Mani > > > > > > > > > > > > > > > > > > > > > This SoC is targeted for an android phone usecase and the endpoints > > > > > > > > > > being enumerated need to go through an appropriate and device specific > > > > > > > > > > power sequence which gets triggered only when the userspace is up. The > > > > > > > > > > PCIe probe cannot assume that the EPs have been powered up already and > > > > > > > > > > hence the link-up is not attempted. > > > > > > > > > > > > > > > > > > Still, I do not see the necessity to not call start_link() during probe. If you > > > > > > > > I am not adding any logic/condition around calling the start_link() > > > > > > > > itself. I am only avoiding the wait for the link to be up if the > > > > > > > > controller driver has not defined start_link(). > > > > > > > > > > > > > > > > > > > > > > I'm saying that not defining the start_link() callback itself is wrong. > > > > > > > > > > > > > Whether the start_link() should be defined or not, is a different > > > > > > design discussion. We currently have 2 drivers in upstream (intel-gw and > > > > > > dw-plat) which do not have start_link() defined. Waiting for the link to > > > > > > come up for the platforms using those drivers is not a good idea. And > > > > > > that is what we are trying to avoid. > > > > > > > > > > > > > > > > NO. The sole intention of this patch is to fix the delay observed with _your_ > > > > > out-of-tree controller driver as you explicitly said before. Impact for the > > > > > existing 2 drivers are just a side effect. > > > > > > > > > Hi Mani, > > > > What is the expectation from the pcie-designware-host driver? If the > > > > .start_link() has to be defined by the vendor driver, then shouldn't the > > > > probe be failed if the vendor has not defined it? Thereby failing the > > > > probe for intel-gw and pcie-dw-plat drivers? > > > > > > > > > > intel-gw maintainer agreed to fix the driver [1], but I cannot really comment on > > > the other one. It is not starting the link at all, so don't know how it works. > > > I've CCed the driver author to get some idea. > > > > > > [1] https://lore.kernel.org/linux-pci/BY3PR19MB50764E90F107B3256189804CBD432@BY3PR19MB5076.namprd19.prod.outlook.com/ > > > > > > > Additionally, if the link fails to come up even after 1 sec of wait > > > > time, shouldn't the probe be failed in that case too? > > > > > > > > > > Why? The device can be attached at any point of time. What I'm stressing is, the > > > driver should check for the link to be up during probe and if there is no > > > device, then it should just continue and hope for the device to show up later. > > My change is still checking whether the link is up during probe. > > If yes, print the link status (negotiated speed and width). > > If no, and the .start_link() exists, then call the same and wait for 1 > > second for the link to be up. > > > > There is a reason why we are wating for 1s for the device to show up during > probe. Please look at my earlier reply to Bjorn. > Yes, I looked at that. I am not sure if that is the real reason behind this delay. The explanation that you quoted from the spec talks about allowing 1s delay for the EP to return a completion. Whereas the delay here is to wait for the link training itself to be completed. We could surely wait for Lorenzo to explain the reason behind this delay, but he himself approved the earlier patch [1] (which caused the regression and had to be reverted): [1] https://lore.kernel.org/all/168509076553.135117.7288121992217982937.b4-ty@kernel.org/ > > > This way, the driver can detect the powered up devices during boot and also > > > detect the hotplug devices. > > > > > If the .start_link() is not defined, how will the link come up? Are you > > assuming that the bootloader might have enabled link training? > > > > Yes, something like that. But that assumption is moot in the first place. > Isnt it weird that a PCIe driver, which will most likely initialize all the resources like power, resets, clocks etc., relies on the bootloader to have enabled the link training? I think it is safe to assume that a driver should have the start_link() defined if it wishes to bring the link up during probe. > > > > My understanding of these drivers is that the .start_link() is an > > > > OPTIONAL callback and that the dw_pcie_host_init should help setup the > > > > SW structures regardless of whether the .start_link() has been defined > > > > or not, and whether the link is up or not. The vendor should be allowed > > > > to train the link at a later point of time as well. > > > > > > > > > > What do you mean by later point of time? Bringing the link through debugfs? NO. > > > We cannot let each driver behave differently, unless there is a really valid > > > reason. > > > > > pci_rescan_bus() is an exported API. I am assuming that this means that > > it is supposed to be used by modules when they know that the link is up. > > If a module wishes to bring the link up using debugfs or some other HW > > trigger, why is that a bad design? In my opinion, this is providing more > > options to the HW/product designer and the module author, in addition to > > the already existing .start_link() callback. > > > > If the driver _knows_ that the device is up, then rescanning the bus makes > sense. But here we are checking for the existence of the device. > Yeah, so the driver could _know_ that the device is up later in the game, right? And then re-scan the bus? Why wait for 1 sec here? > > > > Please let me know your thoughts. > > > > > > > > > add PROBE_PREFER_ASYNCHRONOUS to your controller driver, this delay would become > > > > > > > > > negligible. The reason why I'm against not calling start_link() is due to below > > > > > > > > > reasons: > > > > > > > > > > > > > > > > > > 1. If the same SoC gets reused for other platforms, even other android phones > > > > > > > > > that powers up the endpoints during boot, then it creates a dependency with > > > > > > > > > userspace to always start the link even though the devices were available. > > > > > > > > > That's why we should never fix the behavior of the controller drivers based on a > > > > > > > > > single platform. > > > > > > > > I wonder how the behavior is changing with this patch. Do you have an > > > > > > > > example of a platform which does not have start_link() defined but would > > > > > > > > like to still wait for a second for the link to come up? > > > > > > > > > > > > > > > > > > > > > > Did you went through my reply completely? I mentioned that the 1s delay would be > > > > > > > gone if you add the async flag to your driver and you are ignoring that. > > > > > > > > > > > The async probe might not help in all the cases. Consider a situation > > > > where the PCIe is probed after the boot is already completed. The user > > > > will face the delay then. Do you agree? > > > > > > > > > > You mean loading the driver module post boot? If the link is still not up, yes > > > the users will see the 1sec delay. > > > > > > But again, the point I'm trying to make here is, all the drivers have to follow > > > one flow. We cannot let each driver do its way of starting the link. There could > > > be exceptions if we get a valid reason for a driver to not do so, but so far I > > > haven't come across such reason. (the existing drivers, intel-gw and > > > designware-plat are not exceptions, but they need fixing). > > > > > > For your driver, you said that the userspace brings up the link, post boot when > > > the devices are powered on. So starting the link during probe incurs 1s delay, > > > as there would be no devices. But I suggested you to enable async probe to > > > nullify the 1s delay during probe and you confirmed that it fixes the issue for > > > you. > > > > > There are emulation/simulation platforms in which 1 second of runtime > > can correspond to ~1 hour of real-world time. Even if PCIe is probed > > asyncronously, it will still block the next set of processes. > > > > Which simulation/emulation platform? The one during pre-production stage? If Yes, the pre-production stage emulation platform. > there is no endpoint connected, why would you enable the controller first place? The endpoint is connected. But my usecase requires me to not bring it up by default. Rather, I start the link training later using debugfs. So I want to test my driver probe without attempting link training and thereby, I face the 1 sec delay. I want to avoid it. > And how can the endpoint show up later during simulation? Why can't it be up > earlier? > As explained above, I use debugfs to bring the link up later. The logic for that is present in my platform driver. Once the link is up, I call pci_rescan_bus() to enumerate the EP. > > > Then you are again debating about not defining the start_link() callback :( > > > > > I am not sure why you think I am debating against defining the > > .start_link() callback. It is clearly an optional pointer and I am > > choosing to not use it because of the usecase. And if it is optional and > > I am not using it, why waste 1 sec of runtime waiting for the link? Do > > we have an example in upstream of platforms where this 1 sec could prove > > useful where link training is not being started by the platform driver > > but EPs have to be detected because they are present and powered-up? > > > > Please tell me how the link is started in your case without start_link() > callback. > As explained above, one method is to use debugfs to start the link training. The actual usecase is for the userspace to start the link training based on certain GPIO events from the endpoint. > - Mani > > -- > மணிவண்ணன் சதாசிவம்
On Wed, Mar 06, 2024 at 05:30:53PM +0530, Ajay Agarwal wrote: > On Wed, Feb 28, 2024 at 10:59:51PM +0530, Manivannan Sadhasivam wrote: > > On Tue, Feb 20, 2024 at 11:04:09PM +0530, Ajay Agarwal wrote: > > > On Tue, Feb 06, 2024 at 10:40:43PM +0530, Manivannan Sadhasivam wrote: > > > > + Joao > > > > > > > > On Mon, Feb 05, 2024 at 04:30:20PM +0530, Ajay Agarwal wrote: > > > > > On Wed, Jan 31, 2024 at 12:06:26AM +0530, Manivannan Sadhasivam wrote: > > > > > > On Tue, Jan 30, 2024 at 10:48:59PM +0530, Ajay Agarwal wrote: > > > > > > > On Tue, Jan 30, 2024 at 05:59:06PM +0530, Manivannan Sadhasivam wrote: > > > > > > > > On Tue, Jan 30, 2024 at 02:30:27PM +0530, Ajay Agarwal wrote: > > > > > > > > > > > > > > > > [...] > > > > > > > > > > > > > > > > > > > > > > If that's the case with your driver, when are you starting the link training? > > > > > > > > > > > > > > > > > > > > > > > > > > > The link training starts later based on a userspace/debugfs trigger. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Why does it happen as such? What's the problem in starting the link during > > > > > > > > > > > > probe? Keep it in mind that if you rely on the userspace for starting the link > > > > > > > > > > > > based on a platform (like Android), then if the same SoC or peripheral instance > > > > > > > > > > > > get reused in other platform (non-android), the it won't be a seamless user > > > > > > > > > > > > experience. > > > > > > > > > > > > > > > > > > > > > > > > If there are any other usecases, please state them. > > > > > > > > > > > > > > > > > > > > > > > > - Mani > > > > > > > > > > > > > > > > > > > > > > > This SoC is targeted for an android phone usecase and the endpoints > > > > > > > > > > > being enumerated need to go through an appropriate and device specific > > > > > > > > > > > power sequence which gets triggered only when the userspace is up. The > > > > > > > > > > > PCIe probe cannot assume that the EPs have been powered up already and > > > > > > > > > > > hence the link-up is not attempted. > > > > > > > > > > > > > > > > > > > > Still, I do not see the necessity to not call start_link() during probe. If you > > > > > > > > > I am not adding any logic/condition around calling the start_link() > > > > > > > > > itself. I am only avoiding the wait for the link to be up if the > > > > > > > > > controller driver has not defined start_link(). > > > > > > > > > > > > > > > > > > > > > > > > > I'm saying that not defining the start_link() callback itself is wrong. > > > > > > > > > > > > > > > Whether the start_link() should be defined or not, is a different > > > > > > > design discussion. We currently have 2 drivers in upstream (intel-gw and > > > > > > > dw-plat) which do not have start_link() defined. Waiting for the link to > > > > > > > come up for the platforms using those drivers is not a good idea. And > > > > > > > that is what we are trying to avoid. > > > > > > > > > > > > > > > > > > > NO. The sole intention of this patch is to fix the delay observed with _your_ > > > > > > out-of-tree controller driver as you explicitly said before. Impact for the > > > > > > existing 2 drivers are just a side effect. > > > > > > > > > > > Hi Mani, > > > > > What is the expectation from the pcie-designware-host driver? If the > > > > > .start_link() has to be defined by the vendor driver, then shouldn't the > > > > > probe be failed if the vendor has not defined it? Thereby failing the > > > > > probe for intel-gw and pcie-dw-plat drivers? > > > > > > > > > > > > > intel-gw maintainer agreed to fix the driver [1], but I cannot really comment on > > > > the other one. It is not starting the link at all, so don't know how it works. > > > > I've CCed the driver author to get some idea. > > > > > > > > [1] https://lore.kernel.org/linux-pci/BY3PR19MB50764E90F107B3256189804CBD432@BY3PR19MB5076.namprd19.prod.outlook.com/ > > > > > > > > > Additionally, if the link fails to come up even after 1 sec of wait > > > > > time, shouldn't the probe be failed in that case too? > > > > > > > > > > > > > Why? The device can be attached at any point of time. What I'm stressing is, the > > > > driver should check for the link to be up during probe and if there is no > > > > device, then it should just continue and hope for the device to show up later. > > > My change is still checking whether the link is up during probe. > > > If yes, print the link status (negotiated speed and width). > > > If no, and the .start_link() exists, then call the same and wait for 1 > > > second for the link to be up. > > > > > > > There is a reason why we are wating for 1s for the device to show up during > > probe. Please look at my earlier reply to Bjorn. > > > Yes, I looked at that. I am not sure if that is the real reason behind > this delay. The explanation that you quoted from the spec talks about > allowing 1s delay for the EP to return a completion. Whereas the delay > here is to wait for the link training itself to be completed. > It is implied, afaiu. But the best person to comment here is Lorenzo. > We could surely wait for Lorenzo to explain the reason behind this > delay, but he himself approved the earlier patch [1] (which caused the > regression and had to be reverted): > This is not a valid argument. > [1] https://lore.kernel.org/all/168509076553.135117.7288121992217982937.b4-ty@kernel.org/ > > > > > This way, the driver can detect the powered up devices during boot and also > > > > detect the hotplug devices. > > > > > > > If the .start_link() is not defined, how will the link come up? Are you > > > assuming that the bootloader might have enabled link training? > > > > > > > Yes, something like that. But that assumption is moot in the first place. > > > Isnt it weird that a PCIe driver, which will most likely initialize all > the resources like power, resets, clocks etc., relies on the bootloader > to have enabled the link training? > That's why I said that assumption is 'moot'. > I think it is safe to assume that a driver should have the start_link() > defined if it wishes to bring the link up during probe. > I keep repeating this. But let me do one more time. There should be a valid reason for a driver to defer the start_link(). In your case, the problem is you are tying the driver with the Android usecase which is not going to work on other platforms. What is worse is, you are forcing the users to enable the link training post boot. It may work for you currently, as you need to bring up the endpoint manually for various reasons, but what if some other OEM has connected an endpoint that gets powered on during the controller probe? Still the user has to start the link manually. Will that provide a nice user experience? NO. Then the developers will start complaining that they cannot see the endpoint during boot even though it got powered up properly. Second, we cannot have different policies for different drivers to start the link unless there is a valid reason. This will become a maintenance burden. For sure, there are differences in the current drivers, but those should be fixed instead of extended. So to address your issue, I can suggest two things: 1. Wait for Lorenzo to clarify the 1s loop while waiting for link up. If that seem to be required, then keep this patch out-of-tree. Btw, your driver is still out-of-tree... 2. Try to bringup the endpoint during boot itself. We already have a series to achieve that [1]. - Mani [1] https://lore.kernel.org/linux-pci/20240216203215.40870-1-brgl@bgdev.pl/
diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c index 7991f0e179b2..e53132663d1d 100644 --- a/drivers/pci/controller/dwc/pcie-designware-host.c +++ b/drivers/pci/controller/dwc/pcie-designware-host.c @@ -487,14 +487,18 @@ int dw_pcie_host_init(struct dw_pcie_rp *pp) if (ret) goto err_remove_edma; - if (!dw_pcie_link_up(pci)) { + if (dw_pcie_link_up(pci)) { + dw_pcie_print_link_status(pci); + } else { ret = dw_pcie_start_link(pci); if (ret) goto err_remove_edma; - } - /* Ignore errors, the link may come up later */ - dw_pcie_wait_for_link(pci); + if (pci->ops && pci->ops->start_link) { + /* Ignore errors, the link may come up later */ + dw_pcie_wait_for_link(pci); + } + } bridge->sysdata = pp; diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c index 250cf7f40b85..c067d2e960cf 100644 --- a/drivers/pci/controller/dwc/pcie-designware.c +++ b/drivers/pci/controller/dwc/pcie-designware.c @@ -645,9 +645,20 @@ void dw_pcie_disable_atu(struct dw_pcie *pci, u32 dir, int index) dw_pcie_writel_atu(pci, dir, index, PCIE_ATU_REGION_CTRL2, 0); } -int dw_pcie_wait_for_link(struct dw_pcie *pci) +void dw_pcie_print_link_status(struct dw_pcie *pci) { u32 offset, val; + + offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP); + val = dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKSTA); + + dev_info(pci->dev, "PCIe Gen.%u x%u link up\n", + FIELD_GET(PCI_EXP_LNKSTA_CLS, val), + FIELD_GET(PCI_EXP_LNKSTA_NLW, val)); +} + +int dw_pcie_wait_for_link(struct dw_pcie *pci) +{ int retries; /* Check if the link is up or not */ @@ -663,12 +674,7 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci) return -ETIMEDOUT; } - offset = dw_pcie_find_capability(pci, PCI_CAP_ID_EXP); - val = dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKSTA); - - dev_info(pci->dev, "PCIe Gen.%u x%u link up\n", - FIELD_GET(PCI_EXP_LNKSTA_CLS, val), - FIELD_GET(PCI_EXP_LNKSTA_NLW, val)); + dw_pcie_print_link_status(pci); return 0; } diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h index 55ff76e3d384..164214a7219a 100644 --- a/drivers/pci/controller/dwc/pcie-designware.h +++ b/drivers/pci/controller/dwc/pcie-designware.h @@ -447,6 +447,7 @@ void dw_pcie_setup(struct dw_pcie *pci); void dw_pcie_iatu_detect(struct dw_pcie *pci); int dw_pcie_edma_detect(struct dw_pcie *pci); void dw_pcie_edma_remove(struct dw_pcie *pci); +void dw_pcie_print_link_status(struct dw_pcie *pci); int dw_pcie_suspend_noirq(struct dw_pcie *pci); int dw_pcie_resume_noirq(struct dw_pcie *pci);
In dw_pcie_host_init() regardless of whether the link has been started or not, the code waits for the link to come up. Even in cases where start_link() is not defined the code ends up spinning in a loop for 1 second. Since in some systems dw_pcie_host_init() gets called during probe, this one second loop for each pcie interface instance ends up extending the boot time. Wait for the link up in only if the start_link() is defined. Signed-off-by: Ajay Agarwal <ajayagarwal@google.com> --- v4 was applied, but then reverted. The reason being v4 added a regression on some products which expect the link to not come up as a part of the probe. Since v4 returned error from dw_pcie_wait_for_link check, the probe function of these products started to fail. Changelog since v4: - Do not return error from dw_pcie_wait_for_link check .../pci/controller/dwc/pcie-designware-host.c | 12 +++++++---- drivers/pci/controller/dwc/pcie-designware.c | 20 ++++++++++++------- drivers/pci/controller/dwc/pcie-designware.h | 1 + 3 files changed, 22 insertions(+), 11 deletions(-)