[v2,1/2] PCI: Forcefully set the PCI_REASSIGN_ALL_BUS flag for Marvell CN96XX/CN10XXX boards

Message ID 20250311135229.3329381-2-Bo.Sun.CN@windriver.com (mailing list archive)
State New
Delegated to: Bjorn Helgaas
Series PCI: Marvell CN96XX/CN10XXX quirk and bus-range omission

Commit Message

Bo Sun March 11, 2025, 1:52 p.m. UTC
On our Marvell OCTEON CN96XX board, we observed the following panic on
v6.14-rc6:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000080
CPU: 22 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc6 #20
Hardware name: Marvell OcteonTX CN96XX board (DT)
pc : of_pci_add_properties+0x278/0x4c8
Call trace:
 of_pci_add_properties+0x278/0x4c8 (P)
 of_pci_make_dev_node+0xe0/0x158
 pci_bus_add_device+0x158/0x228
 pci_bus_add_devices+0x40/0x98
 pci_host_probe+0x94/0x118
 pci_host_common_probe+0x130/0x1b0
 platform_probe+0x70/0xf0

The dmesg log shows that bridge 0002:00:07.0 was scanned with an invalid bus range:
 pci-host-generic 878020000000.pci: PCI host bridge to bus 0002:00
 pci_bus 0002:00: root bus resource [bus 00-ff]
 pci 0002:00:00.0: scanning [bus f9-f9] behind bridge, pass 0
 pci 0002:00:01.0: scanning [bus fa-fa] behind bridge, pass 0
 pci 0002:00:02.0: scanning [bus fb-fb] behind bridge, pass 0
 pci 0002:00:03.0: scanning [bus fc-fc] behind bridge, pass 0
 pci 0002:00:04.0: scanning [bus fd-fd] behind bridge, pass 0
 pci 0002:00:05.0: scanning [bus fe-fe] behind bridge, pass 0
 pci 0002:00:06.0: scanning [bus ff-ff] behind bridge, pass 0
 pci 0002:00:07.0: scanning [bus 00-00] behind bridge, pass 0
 pci 0002:00:07.0: bridge configuration invalid ([bus 00-00]), reconfiguring
 pci 0002:00:08.0: scanning [bus 01-01] behind bridge, pass 0
 pci 0002:00:09.0: scanning [bus 02-02] behind bridge, pass 0
 pci 0002:00:0a.0: scanning [bus 03-03] behind bridge, pass 0
 pci 0002:00:0b.0: scanning [bus 04-04] behind bridge, pass 0
 pci 0002:00:0c.0: scanning [bus 05-05] behind bridge, pass 0
 pci 0002:00:0d.0: scanning [bus 06-06] behind bridge, pass 0
 pci 0002:00:0e.0: scanning [bus 07-07] behind bridge, pass 0
 pci 0002:00:0f.0: scanning [bus 08-08] behind bridge, pass 0
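
The "bridge configuration invalid" line for 0002:00:07.0 can be illustrated
with a small userspace sketch. It is loosely modelled on the kind of sanity
check the bus scan applies to a firmware-provided window; the struct and
function names here are hypothetical and are not kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a bridge's bus number registers. */
struct bridge_buses {
	uint8_t primary;     /* bus the bridge itself sits on */
	uint8_t secondary;   /* first bus behind the bridge */
	uint8_t subordinate; /* last bus behind the bridge */
};

/*
 * Return true when the firmware-provided window looks usable for a bridge
 * found on bus `parent`. This is an assumption-level model of the check,
 * not a copy of the kernel code.
 */
bool bridge_config_sane(const struct bridge_buses *b, uint8_t parent)
{
	if (b->primary != parent)
		return false;
	if (b->secondary <= parent)	/* child bus must be numbered above parent */
		return false;
	if (b->secondary > b->subordinate)
		return false;
	return true;
}
```

With the values from the log above, [bus 00-00] behind a bridge on bus 00
fails the second test, while [bus ff-ff] and [bus 08-08] pass.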

This regression was introduced by commit 7246a4520b4b ("PCI: Use
preserve_config in place of pci_flags"). On our board, the 0002:00:07.0
bridge is misconfigured by the bootloader. Both its secondary and
subordinate bus numbers are initialized to 0, while its fixed secondary
bus number is set to 8. However, bus number 8 is also assigned to another
bridge (0002:00:0f.0). Although this is a bootloader issue, before the
change in commit 7246a4520b4b, the PCI_REASSIGN_ALL_BUS flag was set
by default when PCI_PROBE_ONLY was not enabled, ensuring that all the
bus numbers for these bridges were reassigned, avoiding any conflicts.

After the change introduced in commit 7246a4520b4b, the bus numbers
assigned by the bootloader are reused by all other bridges, except
the misconfigured 0002:00:07.0 bridge. The kernel attempts to reconfigure
0002:00:07.0 by reusing the fixed secondary bus number 8 assigned by the
bootloader. However, since a pci_bus has already been allocated for
bus 8 due to the probe of 0002:00:0f.0, no new pci_bus is allocated for
0002:00:07.0. This results in a PCI bridge device without a pci_bus
attached (pdev->subordinate == NULL). Consequently, accessing
pdev->subordinate in of_pci_prop_bus_range() leads to a NULL pointer
dereference.
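
The sequence above can be sketched as a userspace model. All names
(sim_domain, sim_bridge, sim_scan_bridge) are hypothetical, and the model
only assumes what the paragraph describes: a bus number that already exists
in the domain does not get a second bus, so the later bridge keeps a NULL
subordinate pointer.

```c
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAX_BUSES 256

struct sim_bus {
	int number;
};

struct sim_bridge {
	int wanted_busnr;            /* bus number the scan tries to use */
	struct sim_bus *subordinate; /* NULL until a child bus is attached */
};

struct sim_domain {
	struct sim_bus buses[SIM_MAX_BUSES];
	bool in_use[SIM_MAX_BUSES];
};

/* Attach a new child bus unless that number already exists in the domain. */
void sim_scan_bridge(struct sim_domain *d, struct sim_bridge *br)
{
	int n = br->wanted_busnr;

	if (n < 0 || n >= SIM_MAX_BUSES || d->in_use[n])
		return; /* no new bus is created; subordinate stays NULL */

	d->in_use[n] = true;
	d->buses[n].number = n;
	br->subordinate = &d->buses[n];
}

/* Return the child bus number, or -1 when no child bus is attached. */
int sim_bus_range_start(const struct sim_bridge *br)
{
	return br->subordinate ? br->subordinate->number : -1;
}
```

If 0002:00:0f.0 is scanned first with bus 8 and 0002:00:07.0 then asks for
bus 8 as well, the second bridge ends up with subordinate == NULL. The NULL
check in sim_bus_range_start() exists only to keep the sketch safe to run;
it is not meant to describe the kernel code path that crashed.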

To summarize, we need to set the PCI_REASSIGN_ALL_BUS flag when
PCI_PROBE_ONLY is not enabled in order to work around issues like the
one described above.
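
The intended flag behaviour can be modelled in userspace as follows. The
enum values and function names are illustrative assumptions (the kernel's
real flag definitions live in <linux/pci.h>); only the condition itself is
taken from the summary above.

```c
#include <stdbool.h>

/* Illustrative bit values only; not the kernel's definitions. */
enum sim_pci_flags {
	SIM_PCI_REASSIGN_ALL_BUS = 1u << 0,
	SIM_PCI_PROBE_ONLY       = 1u << 1,
};

/* Request bus renumbering unless the platform asked for probe-only mode. */
unsigned int sim_apply_quirk(unsigned int flags)
{
	if (!(flags & SIM_PCI_PROBE_ONLY))
		flags |= SIM_PCI_REASSIGN_ALL_BUS;
	return flags;
}

/* True when the scan should ignore firmware-assigned bus numbers. */
bool sim_reassign_all_buses(unsigned int flags)
{
	return (flags & SIM_PCI_REASSIGN_ALL_BUS) != 0;
}
```

Applying the quirk twice gives the same result as applying it once, and a
probe-only configuration is left untouched.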

Cc: stable@vger.kernel.org
Fixes: 7246a4520b4b ("PCI: Use preserve_config in place of pci_flags")
Signed-off-by: Bo Sun <Bo.Sun.CN@windriver.com>
---
Changes in v2:
 - Added explicit comment about the quirk, as requested by Mani.
 - Made commit message more clear, as requested by Bjorn.

 drivers/pci/quirks.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Patch

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 82b21e34c545..cec58c7479e1 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -6181,6 +6181,23 @@  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1536, rom_bar_overlap_defect);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1537, rom_bar_overlap_defect);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1538, rom_bar_overlap_defect);
 
+/*
+ * Quirk for Marvell CN96XX/CN10XXX boards:
+ *
+ * Adds PCI_REASSIGN_ALL_BUS unless PCI_PROBE_ONLY is set, forcing bus number
+ * reassignment to avoid conflicts caused by bootloader-misconfigured PCI bridges.
+ *
+ * This resolves a regression introduced by commit 7246a4520b4b ("PCI: Use
+ * preserve_config in place of pci_flags"), which removed this behavior.
+ */
+static void quirk_marvell_cn96xx_cn10xxx_reassign_all_busnr(struct pci_dev *dev)
+{
+	if (!pci_has_flag(PCI_PROBE_ONLY))
+		pci_add_flags(PCI_REASSIGN_ALL_BUS);
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_CAVIUM, 0xa002,
+			 quirk_marvell_cn96xx_cn10xxx_reassign_all_busnr);
+
 #ifdef CONFIG_PCIEASPM
 /*
  * Several Intel DG2 graphics devices advertise that they can only tolerate