Message ID | 20240717195937.2240400-1-maz@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | irqchip/gic-v3-its: Correctly honor the RID remapping | expand |
On Wed, Jul 17, 2024 at 08:59:37PM +0100, Marc Zyngier wrote: > Since 6adb35ff43a16 ("irqchip/gic-v3-its: Provide MSI parent for > PCI/MSI[-X]"), the primary domain a PCI device allocates its interrupts > from is the one that is directly attached to the device itself. > > By virtue of being a PCI device, it has no OF node. > > This domain is (through more layer than it is worth describing) > passed to its_pci_msi_prepare(), which tries to compute the > full RID that is presented to the ITS by the device. This is ultimately > done by calling pci_msi_domain_get_msi_rid(), passing both the > domain and the PCI device as arguments. > > The baked-in assumption is that either the domain that is passed > to pci_msi_domain_get_msi_rid() describes an interrupt controller > with either an OF node or an entry in an ACPI IORT table. > In our case, it is *neither*. This domain is does not represent nit: stray "is" > anything firmware-based, but just an allocation unit for the device. > > As a result, we fail to provide the full RID (which requires inspecting > the msi-map/msi-mask properties in the DT), and stick to the BDF, > which isn't very useful. > > Tragedy follows with a litany of devices that randomly die as > they fail to see any MSI (because the RID is wrong) or fail to > get an allocation (because they try to steal LPIs from their > neighbour's pool). > > This will happen on any system where a single ITS is shared by > multiple root ports and end-points with overlapping BDF numbers, > and has the topology described in the device-tree. Simpler DT > topologies will luckily work, and so will ACPI-based systems. > > Solve it by pointing pci_msi_domain_get_msi_rid() at the *parent* > domain, which is the ITS, resulting in a correct mapping and a > restored happiness in my personal zoo. > > Fixes: 6adb35ff43a16 ("irqchip/gic-v3-its: Provide MSI parent for PCI/MSI[-X]") > Reported-by: Johan Hovold <johan@kernel.org> > Signed-off-by: Marc Zyngier <maz@kernel.org> > Cc: Thomas Gleixner <tglx@linutronix.de> This fixes the regression for the Qualcomm machines I ran into this with: Tested-by: Johan Hovold <johan+linaro@kernel.org> Thanks! Johan
diff --git a/drivers/irqchip/irq-gic-v3-its-msi-parent.c b/drivers/irqchip/irq-gic-v3-its-msi-parent.c index 780e1bc9df549..2f3fc597331ba 100644 --- a/drivers/irqchip/irq-gic-v3-its-msi-parent.c +++ b/drivers/irqchip/irq-gic-v3-its-msi-parent.c @@ -65,7 +65,7 @@ static int its_pci_msi_prepare(struct irq_domain *domain, struct device *dev, } /* ITS specific DeviceID, as the core ITS ignores dev. */ - info->scratchpad[0].ul = pci_msi_domain_get_msi_rid(domain, pdev); + info->scratchpad[0].ul = pci_msi_domain_get_msi_rid(domain->parent, pdev); /* * @domain->msi_domain_info->hwsize contains the size of the
Since 6adb35ff43a16 ("irqchip/gic-v3-its: Provide MSI parent for PCI/MSI[-X]"), the primary domain a PCI device allocates its interrupts from is the one that is directly attached to the device itself. By virtue of being a PCI device, it has no OF node. This domain is (through more layer than it is worth describing) passed to its_pci_msi_prepare(), which tries to compute the full RID that is presented to the ITS by the device. This is ultimately done by calling pci_msi_domain_get_msi_rid(), passing both the domain and the PCI device as arguments. The baked-in assumption is that either the domain that is passed to pci_msi_domain_get_msi_rid() describes an interrupt controller with either an OF node or an entry in an ACPI IORT table. In our case, it is *neither*. This domain is does not represent anything firmware-based, but just an allocation unit for the device. As a result, we fail to provide the full RID (which requires inspecting the msi-map/msi-mask properties in the DT), and stick to the BDF, which isn't very useful. Tragedy follows with a litany of devices that randomly die as they fail to see any MSI (because the RID is wrong) or fail to get an allocation (because they try to steal LPIs from their neighbour's pool). This will happen on any system where a single ITS is shared by multiple root ports and end-points with overlapping BDF numbers, and has the topology described in the device-tree. Simpler DT topologies will luckily work, and so will ACPI-based systems. Solve it by pointing pci_msi_domain_get_msi_rid() at the *parent* domain, which is the ITS, resulting in a correct mapping and a restored happiness in my personal zoo. Fixes: 6adb35ff43a16 ("irqchip/gic-v3-its: Provide MSI parent for PCI/MSI[-X]") Reported-by: Johan Hovold <johan@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> --- drivers/irqchip/irq-gic-v3-its-msi-parent.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)