mbox series

[v7,0/2] PCI: designware-ep: Fix DBI access before core init

Message ID 20231120084014.108274-1-manivannan.sadhasivam@linaro.org (mailing list archive)
Headers show
Series PCI: designware-ep: Fix DBI access before core init | expand

Message

Manivannan Sadhasivam Nov. 20, 2023, 8:40 a.m. UTC
Hello,

This series is the continuation of previous work by Vidya Sagar [1] to fix the
issues related to accessing DBI register space before completing the core
initialization in some EP platforms like Tegra194/234 and Qcom SM8450.

Since Vidya is busy, I took over the series based on his consent (off-list
discussion).

I've reworked the series in v7 to make it bisect friendly, and also to avoid
build issue with the dependency. This resulted in the patches being heavily
modified, so I took over the authorship of the patches.

Testing
=======

I've tested the series on Qcom SM8450 based dev board. I also expect it to work
on Tegra platforms as well but it'd be good if Vidya or someone could test it.

- Mani

[1] https://lore.kernel.org/linux-pci/20221013175712.7539-1-vidyas@nvidia.com/
[2] https://lore.kernel.org/linux-pci/20230825123843.GD6005@thinkpad/

Changes in v7:

- Rebased on top of v6.7-rc1
- Kept the current dw_pcie_ep_init_complete() API instead of renaming it to
  dw_pcie_ep_init_late(), since changing the name causes a slight ambiguity.
- Splitted the change that moves pci_epc_init_notify() inside
  dw_pcie_ep_init_notify() to help bisecting and also to avoid build issue.
- Added a new patch that moves pci_epc_init_notify() inside
  dw_pcie_ep_init_notify().
- Took over the authorship and dropped the previous Ack as the patches are
  heavily modified.

Changes in v6:

- Rebased on top of pci/next (6e2fca71e187)
- removed ep_init_late() callback as it is no longer necessary

For previous changelog, please refer [1].


Manivannan Sadhasivam (2):
  PCI: designware-ep: Fix DBI access before core init
  PCI: designware-ep: Move pci_epc_init_notify() inside
    dw_pcie_ep_init_complete()

 .../pci/controller/dwc/pcie-designware-ep.c   | 149 +++++++++++-------
 drivers/pci/controller/dwc/pcie-designware.h  |   5 -
 drivers/pci/controller/dwc/pcie-qcom-ep.c     |   2 -
 drivers/pci/controller/dwc/pcie-tegra194.c    |   2 -
 4 files changed, 93 insertions(+), 65 deletions(-)

Comments

Krzysztof Wilczyński Jan. 7, 2024, 7:27 a.m. UTC | #1
Hello,

> This series is the continuation of previous work by Vidya Sagar [1] to fix the
> issues related to accessing DBI register space before completing the core
> initialization in some EP platforms like Tegra194/234 and Qcom SM8450.

Applied to controller/dwc-ep, thank you!

[01/02] PCI: designware-ep: Fix DBI access before core init
        https://git.kernel.org/pci/pci/c/d3d13b00a2cf
[02/02] PCI: designware-ep: Move pci_epc_init_notify() inside dw_pcie_ep_init_complete()
        https://git.kernel.org/pci/pci/c/a171e1d60dad

	Krzysztof
Krzysztof Wilczyński Jan. 7, 2024, 7:34 a.m. UTC | #2
Hello,

> > This series is the continuation of previous work by Vidya Sagar [1] to fix the
> > issues related to accessing DBI register space before completing the core
> > initialization in some EP platforms like Tegra194/234 and Qcom SM8450.
> 
> Applied to controller/dwc-ep, thank you!
> 
> [01/02] PCI: designware-ep: Fix DBI access before core init
>         https://git.kernel.org/pci/pci/c/d3d13b00a2cf
> [02/02] PCI: designware-ep: Move pci_epc_init_notify() inside dw_pcie_ep_init_complete()
>         https://git.kernel.org/pci/pci/c/a171e1d60dad

Manivannan, I applied this series to "controller/dwc-ep" rather than
"controller/dwc" so that I can have a look at the conflicts we have
between this series and the current "controller/dwc" branch.

There have been changes from both Shimoda-san and Frank Li that both
changed some things and moved some things around within the code base,
so this series no longer cleanly applies.

However, I wanted the CI system to pick the branch for testing, as the
sooner, the better.

Hopefully, we can resolve the conflicts before Bjorn sends his Pull
Request with 6.8 changes.

	Krzysztof
Niklas Cassel Jan. 9, 2024, 5:58 p.m. UTC | #3
On Sun, Jan 07, 2024 at 04:27:07PM +0900, Krzysztof Wilczyński wrote:
> Hello,

Hello Krzysztof, Manivannan,

> 
> > This series is the continuation of previous work by Vidya Sagar [1] to fix the
> > issues related to accessing DBI register space before completing the core
> > initialization in some EP platforms like Tegra194/234 and Qcom SM8450.
> 
> Applied to controller/dwc-ep, thank you!
> 
> [01/02] PCI: designware-ep: Fix DBI access before core init
>         https://git.kernel.org/pci/pci/c/d3d13b00a2cf
> [02/02] PCI: designware-ep: Move pci_epc_init_notify() inside dw_pcie_ep_init_complete()
>         https://git.kernel.org/pci/pci/c/a171e1d60dad
> 
> 	Krzysztof

Considering that we know that this series introduces new problems
for drivers with a .core_init_notifier (i.e. tegra and qcom), see:
https://lore.kernel.org/linux-pci/ZWYmX8Y%2F7Q9WMxES@x1-carbon/

Do we really want to apply this series as is?


Reading the patch, it appears that (at least some) tegra and
qcom boards currently causes a whole system hang, which IIUC,
renders those boards unusable.

So perhaps the new issues introduced by this series is preferable
to a whole system hang.

As such, I can understand the urgency to merge this series.
However, at the very least, I think that it might be worth amending
the commit message to mention that this will currently not deregister
the dma device in a clean way, leading to e.g. superfluous entries in
/sys/class/dma/ and debugfs warnings being printed to dmesg.


Kind regards,
Niklas
Manivannan Sadhasivam Jan. 10, 2024, 3:11 a.m. UTC | #4
On Tue, Jan 09, 2024 at 05:58:53PM +0000, Niklas Cassel wrote:
> On Sun, Jan 07, 2024 at 04:27:07PM +0900, Krzysztof Wilczyński wrote:
> > Hello,
> 
> Hello Krzysztof, Manivannan,
> 
> > 
> > > This series is the continuation of previous work by Vidya Sagar [1] to fix the
> > > issues related to accessing DBI register space before completing the core
> > > initialization in some EP platforms like Tegra194/234 and Qcom SM8450.
> > 
> > Applied to controller/dwc-ep, thank you!
> > 
> > [01/02] PCI: designware-ep: Fix DBI access before core init
> >         https://git.kernel.org/pci/pci/c/d3d13b00a2cf
> > [02/02] PCI: designware-ep: Move pci_epc_init_notify() inside dw_pcie_ep_init_complete()
> >         https://git.kernel.org/pci/pci/c/a171e1d60dad
> > 
> > 	Krzysztof
> 
> Considering that we know that this series introduces new problems
> for drivers with a .core_init_notifier (i.e. tegra and qcom), see:
> https://lore.kernel.org/linux-pci/ZWYmX8Y%2F7Q9WMxES@x1-carbon/
> 
> Do we really want to apply this series as is?
> 
> 

Niklas, I think I explained it in this thread itself. Let me reiterate here
again.

The fact that you are seeing the dmaengine warnings is due to function drivers
not releasing the channels properly. It is not the job of the DWC driver to
release the channels. The channels are requested by the function drivers [1]
and they _should_ release them when the channels are no longer required.

I know that the PCI_EPF_TEST driver is not doing so and so you are seeing the
warnings. But I do not have a device to test that function driver. Qcom
platforms use a dedicated function driver and that releases the channels when it
gets the LINK_DOWN event from EPC [2].

So my conclusion is that the issue is there even without this series. If you
still want me to fix the EPF_TEST driver, I can submit a change, but someone has
to test it.

- Mani

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/endpoint/functions/pci-epf-test.c#n229
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/endpoint/functions/pci-epf-mhi.c#n563

> Reading the patch, it appears that (at least some) tegra and
> qcom boards currently causes a whole system hang, which IIUC,
> renders those boards unusable.
> 
> So perhaps the new issues introduced by this series is preferable
> to a whole system hang.
> 
> As such, I can understand the urgency to merge this series.
> However, at the very least, I think that it might be worth amending
> the commit message to mention that this will currently not deregister
> the dma device in a clean way, leading to e.g. superfluous entries in
> /sys/class/dma/ and debugfs warnings being printed to dmesg.
> 
> 
> Kind regards,
> Niklas
Niklas Cassel Jan. 10, 2024, 10:01 a.m. UTC | #5
Hello Mani,

On Wed, Jan 10, 2024 at 08:41:37AM +0530, Manivannan Sadhasivam wrote:
> On Tue, Jan 09, 2024 at 05:58:53PM +0000, Niklas Cassel wrote:
> > On Sun, Jan 07, 2024 at 04:27:07PM +0900, Krzysztof Wilczyński wrote:
> > 
> > Considering that we know that this series introduces new problems
> > for drivers with a .core_init_notifier (i.e. tegra and qcom), see:
> > https://lore.kernel.org/linux-pci/ZWYmX8Y%2F7Q9WMxES@x1-carbon/
> > 
> > Do we really want to apply this series as is?
> > 
> > 
> 
> Niklas, I think I explained it in this thread itself. Let me reiterate here
> again.
> 
> The fact that you are seeing the dmaengine warnings is due to function drivers
> not releasing the channels properly. It is not the job of the DWC driver to
> release the channels. The channels are requested by the function drivers [1]
> and they _should_ release them when the channels are no longer required.

Sure, the function driver should release the channels.


> 
> I know that the PCI_EPF_TEST driver is not doing so and so you are seeing the
> warnings. But I do not have a device to test that function driver. Qcom
> platforms use a dedicated function driver and that releases the channels when it
> gets the LINK_DOWN event from EPC [2].
> 
> So my conclusion is that the issue is there even without this series. If you
> still want me to fix the EPF_TEST driver, I can submit a change, but someone has
> to test it.

That conclusion is not fully correct.

Let's take e.g. these error messages that this series introduces:
[ 1000.714355] debugfs: File 'mf' in directory '/' already present!
[ 1000.714890] debugfs: File 'wr_ch_cnt' in directory '/' already present!
[ 1000.715476] debugfs: File 'rd_ch_cnt' in directory '/' already present!
[ 1000.716061] debugfs: Directory 'registers' with parent '/' already present!

These come from dw_edma_core_debugfs_on(), which is called by dw_edma_probe().

This is a direct result from your patch:
https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/commit/?h=controller/dwc-ep&id=9ab5c8bb7a305135b1b6c65cb8db92b4acbef79d

Which moves dw_pcie_edma_detect() from dw_pcie_ep_init_complete() to
dw_pcie_ep_late_init() (since dw_pcie_edma_detect() calls dw_edma_probe()).

So without your patch, those debugfs error messages are not seen.

Thus, I do not think that it is sufficient to only modify the pci-epf-test
driver to release the dma channels, as I don't see how that will avoid e.g.
the debugfs error messages introduced by this patch.


Kind regards,
Niklas
Manivannan Sadhasivam Jan. 10, 2024, 3:27 p.m. UTC | #6
On Wed, Jan 10, 2024 at 10:01:08AM +0000, Niklas Cassel wrote:
> Hello Mani,
> 
> On Wed, Jan 10, 2024 at 08:41:37AM +0530, Manivannan Sadhasivam wrote:
> > On Tue, Jan 09, 2024 at 05:58:53PM +0000, Niklas Cassel wrote:
> > > On Sun, Jan 07, 2024 at 04:27:07PM +0900, Krzysztof Wilczyński wrote:
> > > 
> > > Considering that we know that this series introduces new problems
> > > for drivers with a .core_init_notifier (i.e. tegra and qcom), see:
> > > https://lore.kernel.org/linux-pci/ZWYmX8Y%2F7Q9WMxES@x1-carbon/
> > > 
> > > Do we really want to apply this series as is?
> > > 
> > > 
> > 
> > Niklas, I think I explained it in this thread itself. Let me reiterate here
> > again.
> > 
> > The fact that you are seeing the dmaengine warnings is due to function drivers
> > not releasing the channels properly. It is not the job of the DWC driver to
> > release the channels. The channels are requested by the function drivers [1]
> > and they _should_ release them when the channels are no longer required.
> 
> Sure, the function driver should release the channels.
> 
> 
> > 
> > I know that the PCI_EPF_TEST driver is not doing so and so you are seeing the
> > warnings. But I do not have a device to test that function driver. Qcom
> > platforms use a dedicated function driver and that releases the channels when it
> > gets the LINK_DOWN event from EPC [2].
> > 
> > So my conclusion is that the issue is there even without this series. If you
> > still want me to fix the EPF_TEST driver, I can submit a change, but someone has
> > to test it.
> 
> That conclusion is not fully correct.
> 
> Let's take e.g. these error messages that this series introduces:
> [ 1000.714355] debugfs: File 'mf' in directory '/' already present!
> [ 1000.714890] debugfs: File 'wr_ch_cnt' in directory '/' already present!
> [ 1000.715476] debugfs: File 'rd_ch_cnt' in directory '/' already present!
> [ 1000.716061] debugfs: Directory 'registers' with parent '/' already present!
> 
> These come from dw_edma_core_debugfs_on(), which is called by dw_edma_probe().
> 
> This is a direct result from your patch:
> https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/commit/?h=controller/dwc-ep&id=9ab5c8bb7a305135b1b6c65cb8db92b4acbef79d
> 
> Which moves dw_pcie_edma_detect() from dw_pcie_ep_init_complete() to
> dw_pcie_ep_late_init() (since dw_pcie_edma_detect() calls dw_edma_probe()).
> 
> So without your patch, those debugfs error messages are not seen.
> 
> Thus, I do not think that it is sufficient to only modify the pci-epf-test
> driver to release the dma channels, as I don't see how that will avoid e.g.
> the debugfs error messages introduced by this patch.
> 

Ah, sorry I overlooked the warnings from the edma core. I think adding
dw_pcie_edma_remove() to the perst_assert() function would fix this issue. But
I'm traveling this week, so couldn't verify it on the device.

Bjorn, Krzysztof feel free to drop this series for 6.8. I will modify this
series to address some other issues discussed so far and resubmit it for 6.9.

- Mani

> 
> Kind regards,
> Niklas