Message ID | 20211210161025.3287927-1-sr@denx.de (mailing list archive) |
---|---|
State | Handled Elsewhere |
Headers | show |
Series | [RFC] PCI/MSI: Only mask all MSI-X entries when MSI-X is used | expand |
Stefan, On Fri, Dec 10 2021 at 17:10, Stefan Roese wrote: > I've debugged the MSI integration of the ZynqMP PCIe rootport driver > (pcie-xilinx-nwl.c) and found no issues there. Also the MSI framework > in the Kernel did not reveal any problems - at least for me. Looking > a bit deeper into the lspci output, I found an interesting difference > between v5.4 and v5.10 (or later). > > v5.4: > 04:00.0 Non-Volatile memory controller: Marvell Technology Group Ltd. Device 1321 (rev 02) (prog-if 02 [NVM Express]) > ... > Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit+ > Address: 00000000fd480000 Data: 0004 > Masking: 00000000 Pending: 00000000 > Capabilities: [70] Express (v2) Endpoint, MSI 00 > ... > Capabilities: [b0] MSI-X: Enable- Count=67 Masked- > Vector table: BAR=0 offset=00002000 > PBA: BAR=0 offset=00003000 > ... > > v5.10: > 04:00.0 Non-Volatile memory controller: Marvell Technology Group Ltd. Device 1321 (rev 02) (prog-if 02 [NVM Express]) > ... > Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit+ > Address: 00000000fd480000 Data: 0004 > Masking: 00000000 Pending: 00000000 > Capabilities: [70] Express (v2) Endpoint, MSI 00 > ... > Capabilities: [b0] MSI-X: Enable- Count=67 Masked+ > Vector table: BAR=0 offset=00002000 > PBA: BAR=0 offset=00003000 > ... > > So the only difference here being the "Masked+" compared to the > "Masked-" in the working v5.4 Kernel. Testing in this area has shown, > that the root cause for the masked bit being set was the call to > msix_mask_all() in msix_capability_init(). Without this, all works just > fine and the MSI interrupts are received again by the NVMe driver. Not really. The Masked+ in the capabilities entry has nothing to do with the entries in the table being masked. The Masked+ reflects the PCI_MSIX_FLAGS_MASKALL bit in the MSI-X control register. That is set early on and not cleared in the error handling path. The error handling just clears the MSIX_FLAGS_ENABLE bit. Can you try the patch below? It might still be that this Marvell part really combines the per entry mask bits from MSI-X with MSI, then we need both. Thanks, tglx --- --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -777,7 +777,7 @@ static int msix_capability_init(struct p free_msi_irqs(dev); out_disable: - pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0); + pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL | PCI_MSIX_FLAGS_ENABLE, 0); return ret; }
Hi Thomas, On 12/11/21 11:17, Thomas Gleixner wrote: > Stefan, > > On Fri, Dec 10 2021 at 17:10, Stefan Roese wrote: >> I've debugged the MSI integration of the ZynqMP PCIe rootport driver >> (pcie-xilinx-nwl.c) and found no issues there. Also the MSI framework >> in the Kernel did not reveal any problems - at least for me. Looking >> a bit deeper into the lspci output, I found an interesting difference >> between v5.4 and v5.10 (or later). >> >> v5.4: >> 04:00.0 Non-Volatile memory controller: Marvell Technology Group Ltd. Device 1321 (rev 02) (prog-if 02 [NVM Express]) >> ... >> Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit+ >> Address: 00000000fd480000 Data: 0004 >> Masking: 00000000 Pending: 00000000 >> Capabilities: [70] Express (v2) Endpoint, MSI 00 >> ... >> Capabilities: [b0] MSI-X: Enable- Count=67 Masked- >> Vector table: BAR=0 offset=00002000 >> PBA: BAR=0 offset=00003000 >> ... >> >> v5.10: >> 04:00.0 Non-Volatile memory controller: Marvell Technology Group Ltd. Device 1321 (rev 02) (prog-if 02 [NVM Express]) >> ... >> Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit+ >> Address: 00000000fd480000 Data: 0004 >> Masking: 00000000 Pending: 00000000 >> Capabilities: [70] Express (v2) Endpoint, MSI 00 >> ... >> Capabilities: [b0] MSI-X: Enable- Count=67 Masked+ >> Vector table: BAR=0 offset=00002000 >> PBA: BAR=0 offset=00003000 >> ... >> >> So the only difference here being the "Masked+" compared to the >> "Masked-" in the working v5.4 Kernel. Testing in this area has shown, >> that the root cause for the masked bit being set was the call to >> msix_mask_all() in msix_capability_init(). Without this, all works just >> fine and the MSI interrupts are received again by the NVMe driver. > > Not really. The Masked+ in the capabilities entry has nothing to do with > the entries in the table being masked. The Masked+ reflects the > PCI_MSIX_FLAGS_MASKALL bit in the MSI-X control register. > > That is set early on and not cleared in the error handling path. The > error handling just clears the MSIX_FLAGS_ENABLE bit. > > Can you try the patch below? Sure, please see below. > It might still be that this Marvell part really combines the per entry > mask bits from MSI-X with MSI, then we need both. With your patch applied only (mine not), the Masked+ is gone but still the MSI interrupts are not received in the system. So you seem to have guessed correctly, that we need both changes. How to continue? Should I integrate your patch into mine and send a new version? Or will you send it separately to the list for integration? Thanks, Stefan > Thanks, > > tglx > --- > --- a/drivers/pci/msi.c > +++ b/drivers/pci/msi.c > @@ -777,7 +777,7 @@ static int msix_capability_init(struct p > free_msi_irqs(dev); > > out_disable: > - pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0); > + pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL | PCI_MSIX_FLAGS_ENABLE, 0); > > return ret; > } > Viele Grüße, Stefan Roese
Stefan, On Sat, Dec 11 2021 at 14:58, Stefan Roese wrote: > On 12/11/21 11:17, Thomas Gleixner wrote: >> Can you try the patch below? > > Sure, please see below. > >> It might still be that this Marvell part really combines the per entry >> mask bits from MSI-X with MSI, then we need both. > > With your patch applied only (mine not), the Masked+ is gone but still > the MSI interrupts are not received in the system. So you seem to have > guessed correctly, that we need both changes. Groan. How is that device specification compliant? Vector Control for MSI-X Table Entries -------------------------------------- "00: Mask bit: When this bit is set, the function is prohibited from sending a message using this MSI-X Table entry. .... This bit’s state after reset is 1 (entry is masked)." So how can that work in the first place if that device is PCI specification compliant? Seems that PCI/SIG compliance program is just another rubberstamping nonsense. Can someone who has access to that group please ask them what their specification compliance stuff is actualy testing? Sure, that went unnoticed so far on that marvelous device because the kernel was missing a defense line, but sigh... > How to continue? Should I integrate your patch into mine and send a new > version? Or will you send it separately to the list for integration? Your patch is incomplete. The function can fail later on, which results in the same problem, no? So we need something like the below. Just to satisfy my curiosity: The device supports obviously MSI-X, which is preferred over MSI. So why is the MSI-X initialization failing in the first place on this platform? Thanks, tglx --- --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -722,9 +722,6 @@ static int msix_capability_init(struct p goto out_disable; } - /* Ensure that all table entries are masked. */ - msix_mask_all(base, tsize); - ret = msix_setup_entries(dev, base, entries, nvec, affd); if (ret) goto out_disable; @@ -751,6 +748,9 @@ static int msix_capability_init(struct p /* Set MSI-X enabled bits and unmask the function */ pci_intx_for_msi(dev, 0); dev->msix_enabled = 1; + + /* Ensure that all table entries are masked. */ + msix_mask_all(base, tsize); pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0); pcibios_free_irq(dev); @@ -777,7 +777,7 @@ static int msix_capability_init(struct p free_msi_irqs(dev); out_disable: - pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0); + pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL | PCI_MSIX_FLAGS_ENABLE, 0); return ret; }
Hi Thomas, On 12/11/21 22:02, Thomas Gleixner wrote: > Stefan, > > On Sat, Dec 11 2021 at 14:58, Stefan Roese wrote: >> On 12/11/21 11:17, Thomas Gleixner wrote: >>> Can you try the patch below? >> >> Sure, please see below. >> >>> It might still be that this Marvell part really combines the per entry >>> mask bits from MSI-X with MSI, then we need both. >> >> With your patch applied only (mine not), the Masked+ is gone but still >> the MSI interrupts are not received in the system. So you seem to have >> guessed correctly, that we need both changes. > > Groan. How is that device specification compliant? > > Vector Control for MSI-X Table Entries > -------------------------------------- > > "00: Mask bit: When this bit is set, the function is prohibited from > sending a message using this MSI-X Table entry. > .... > This bit’s state after reset is 1 (entry is masked)." > > So how can that work in the first place if that device is PCI > specification compliant? Seems that PCI/SIG compliance program is just > another rubberstamping nonsense. > > Can someone who has access to that group please ask them what their > specification compliance stuff is actualy testing? > > Sure, that went unnoticed so far on that marvelous device because the > kernel was missing a defense line, but sigh... > >> How to continue? Should I integrate your patch into mine and send a new >> version? Or will you send it separately to the list for integration? > > Your patch is incomplete. The function can fail later on, which results > in the same problem, no? Yes, agreed. > So we need something like the below. The patch below works fine on my ZynqMP platform. MSI interrupts are now received okay. > Just to satisfy my curiosity: > > The device supports obviously MSI-X, which is preferred over MSI. I would gladly use MSI-X interrupts, if easily possible. But... > So why is the MSI-X initialization failing in the first place on this > platform? ... the ZyqnMP PCIe rootport driver only support legacy and MSI interrupts but not MSI-X (yet) [1]. Thanks, Stefan [1] https://elixir.bootlin.com/linux/latest/source/drivers/pci/controller/pcie-xilinx-nwl.c > Thanks, > > tglx > --- > --- a/drivers/pci/msi.c > +++ b/drivers/pci/msi.c > @@ -722,9 +722,6 @@ static int msix_capability_init(struct p > goto out_disable; > } > > - /* Ensure that all table entries are masked. */ > - msix_mask_all(base, tsize); > - > ret = msix_setup_entries(dev, base, entries, nvec, affd); > if (ret) > goto out_disable; > @@ -751,6 +748,9 @@ static int msix_capability_init(struct p > /* Set MSI-X enabled bits and unmask the function */ > pci_intx_for_msi(dev, 0); > dev->msix_enabled = 1; > + > + /* Ensure that all table entries are masked. */ > + msix_mask_all(base, tsize); > pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0); > > pcibios_free_irq(dev); > @@ -777,7 +777,7 @@ static int msix_capability_init(struct p > free_msi_irqs(dev); > > out_disable: > - pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0); > + pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL | PCI_MSIX_FLAGS_ENABLE, 0); > > return ret; > } > Viele Grüße, Stefan Roese
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index a7a1c7411348..25b659dd5e2b 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -825,9 +825,6 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries, goto out_disable; } - /* Ensure that all table entries are masked. */ - msix_mask_all(base, tsize); - ret = msix_setup_entries(dev, base, entries, nvec, affd); if (ret) goto out_disable; @@ -836,6 +833,9 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries, if (ret) goto out_avail; + /* Ensure that all table entries are masked. */ + msix_mask_all(base, tsize); + /* Check if all MSI entries honor device restrictions */ ret = msi_verify_entries(dev); if (ret)
This patch moves the masking of the MSI-X entries to a later stage in msix_capability_init(), which is not reached on platforms not supporting MSI-X. Without this, MSI interrupts from a NVMe drive are not received at all on this ZynqMP based platform, only supporting legacy and MSI interrupts. Background: This patch fixes a problem on our ZynqMP based system working with newer NVMe drives which support MSI & MSI-X. Running v5.4 all is fine and these drives correctly configure an MSI interrupt and this IRQ is received just fine in the ZynqMP rootport. But when updating to v5.10 or later (I also tested with v5.15 and v5.16-rc4) the MSI interrupt gets assigned but no interrupts are received by the NVMe driver at all. Note: The ZynqMP PCIe rootport driver only supports legacy and MSI interrupts, not MSI-X (yet). I've debugged the MSI integration of the ZynqMP PCIe rootport driver (pcie-xilinx-nwl.c) and found no issues there. Also the MSI framework in the Kernel did not reveal any problems - at least for me. Looking a bit deeper into the lspci output, I found an interesting difference between v5.4 and v5.10 (or later). v5.4: 04:00.0 Non-Volatile memory controller: Marvell Technology Group Ltd. Device 1321 (rev 02) (prog-if 02 [NVM Express]) ... Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit+ Address: 00000000fd480000 Data: 0004 Masking: 00000000 Pending: 00000000 Capabilities: [70] Express (v2) Endpoint, MSI 00 ... Capabilities: [b0] MSI-X: Enable- Count=67 Masked- Vector table: BAR=0 offset=00002000 PBA: BAR=0 offset=00003000 ... v5.10: 04:00.0 Non-Volatile memory controller: Marvell Technology Group Ltd. Device 1321 (rev 02) (prog-if 02 [NVM Express]) ... Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit+ Address: 00000000fd480000 Data: 0004 Masking: 00000000 Pending: 00000000 Capabilities: [70] Express (v2) Endpoint, MSI 00 ... Capabilities: [b0] MSI-X: Enable- Count=67 Masked+ Vector table: BAR=0 offset=00002000 PBA: BAR=0 offset=00003000 ... So the only difference here being the "Masked+" compared to the "Masked-" in the working v5.4 Kernel. Testing in this area has shown, that the root cause for the masked bit being set was the call to msix_mask_all() in msix_capability_init(). Without this, all works just fine and the MSI interrupts are received again by the NVMe driver. BTW: I've also tested this problem with the latest version of Thomas's PCI/MSI Spring cleaning on top of v5.16-rc4. No change - the masked bit is still set and the MSI interrupt are note received by the NVMe driver. I'm open to other ideas to fix this issue. So please review and comment. Fixes: aa8092c1d1f1 ("PCI/MSI: Mask all unused MSI-X entries") Signed-off-by: Stefan Roese <sr@denx.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Michal Simek <michal.simek@xilinx.com> Cc: Marek Vasut <marex@denx.de> --- drivers/pci/msi.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)