Message ID | 20211101150405.14618-1-pali@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | PCI: Marvell: Update PCIe fixup | expand |
On Mon, Nov 01, 2021 at 04:04:05PM +0100, Pali Rohár wrote: > - The code relies on rc_pci_fixup being called, which only happens > when CONFIG_PCI_QUIRKS is enabled, so add that to Kconfig. Omitting > this causes a booting failure with a non-obvious cause. > - Update rc_pci_fixup to set the class properly, copying the > more modern style from other places > - Correct the rc_pci_fixup comment > > This patch just re-applies commit 1dc831bf53fd ("ARM: Kirkwood: Update > PCI-E fixup") for all other Marvell platforms which use same buggy PCIe > controller. I wonder if that code is even relevant any more since we started using CONFIG_PCI_MVEBU ? Really, these broken controllers should not be used "raw" but always via their special host bridge driver that fixes all the config space problems. Jason
On Monday 01 November 2021 13:27:11 Jason Gunthorpe wrote: > On Mon, Nov 01, 2021 at 04:04:05PM +0100, Pali Rohár wrote: > > - The code relies on rc_pci_fixup being called, which only happens > > when CONFIG_PCI_QUIRKS is enabled, so add that to Kconfig. Omitting > > this causes a booting failure with a non-obvious cause. > > - Update rc_pci_fixup to set the class properly, copying the > > more modern style from other places > > - Correct the rc_pci_fixup comment > > > > This patch just re-applies commit 1dc831bf53fd ("ARM: Kirkwood: Update > > PCI-E fixup") for all other Marvell platforms which use same buggy PCIe > > controller. > > I wonder if that code is even relevant any more since we started using > CONFIG_PCI_MVEBU > > ? It is (still) relevant for platforms which do not use CONFIG_PCI_MVEBU yet. > Really, these broken controllers should not be used "raw" but always > via their special host bridge driver that fixes all the config space > problems. I agree. Long-term goal should be to convert these platforms to use pci-mvebu.c driver. And until it happens simple fixes like in commit 1dc831bf53fd is needed for all affected Marvell platforms. Some details how these Marvell PCIe controllers are broken is in email: https://lore.kernel.org/linux-pci/20211003120944.3lmwxylnhlp2kfj7@pali/
On Mon, Nov 01, 2021 at 06:56:49PM +0100, Pali Rohár wrote: > On Monday 01 November 2021 13:27:11 Jason Gunthorpe wrote: > > On Mon, Nov 01, 2021 at 04:04:05PM +0100, Pali Rohár wrote: > > > - The code relies on rc_pci_fixup being called, which only happens > > > when CONFIG_PCI_QUIRKS is enabled, so add that to Kconfig. Omitting > > > this causes a booting failure with a non-obvious cause. > > > - Update rc_pci_fixup to set the class properly, copying the > > > more modern style from other places > > > - Correct the rc_pci_fixup comment > > > > > > This patch just re-applies commit 1dc831bf53fd ("ARM: Kirkwood: Update > > > PCI-E fixup") for all other Marvell platforms which use same buggy PCIe > > > controller. > > > > I wonder if that code is even relevant any more since we started using > > CONFIG_PCI_MVEBU > > > > ? > > It is (still) relevant for platforms which do not use CONFIG_PCI_MVEBU > yet. I think you should explain this in the commit message > > Really, these broken controllers should not be used "raw" but always > > via their special host bridge driver that fixes all the config space > > problems. > > I agree. > > Long-term goal should be to convert these platforms to use pci-mvebu.c > driver. And until it happens simple fixes like in commit 1dc831bf53fd is > needed for all affected Marvell platforms. IIRC all these platforms were obsolete before I wrote the above commit, so I'm not sure why this has suddenly cropped up? If you want to use a new kernel on this really old HW then update to use the right PCI driver? > Some details how these Marvell PCIe controllers are broken is in email: > https://lore.kernel.org/linux-pci/20211003120944.3lmwxylnhlp2kfj7@pali/ Yes, everything about this controller is broken. It does not function properly without its driver. The solution to these bugs is to use the driver, which was specifically made to deal with them. That said, I can't recall if this driver needs some fixing to work with pre-kirkwood systems.. Jason
On Mon, Nov 01, 2021 at 04:04:05PM +0100, Pali Rohár wrote: > - The code relies on rc_pci_fixup being called, which only happens > when CONFIG_PCI_QUIRKS is enabled, so add that to Kconfig. Omitting > this causes a booting failure with a non-obvious cause. > - Update rc_pci_fixup to set the class properly, copying the > more modern style from other places > - Correct the rc_pci_fixup comment > > This patch just re-applies commit 1dc831bf53fd ("ARM: Kirkwood: Update > PCI-E fixup") for all other Marvell platforms which use same buggy PCIe > controller. > [..] > diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig > index 771ca53af06d..c8d51bd20b84 100644 > --- a/arch/mips/Kconfig > +++ b/arch/mips/Kconfig > @@ -346,6 +346,7 @@ config MIPS_COBALT > select CEVT_GT641XX > select DMA_NONCOHERENT > select FORCE_PCI > + select PCI_QUIRKS > select I8253 > select I8259 > select IRQ_MIPS_CPU this is enabled by default, via drivers/pci/Kconfig config PCI_QUIRKS default y bool "Enable PCI quirk workarounds" if EXPERT help This enables workarounds for various PCI chipset bugs/quirks. Disable this only if your target machine is unaffected by PCI quirks. > diff --git a/arch/mips/pci/fixup-cobalt.c b/arch/mips/pci/fixup-cobalt.c > index 44be65c3e6bb..202f3a0bd97d 100644 > --- a/arch/mips/pci/fixup-cobalt.c > +++ b/arch/mips/pci/fixup-cobalt.c > @@ -36,6 +36,12 @@ > #define VIA_COBALT_BRD_ID_REG 0x94 > #define VIA_COBALT_BRD_REG_to_ID(reg) ((unsigned char)(reg) >> 4) > > +/* > + * The root complex has a hardwired class of PCI_CLASS_MEMORY_OTHER, when it > + * is operating as a root complex this needs to be switched to > + * PCI_CLASS_BRIDGE_HOST or Linux will errantly try to process the BAR's on > + * the device. Decoding setup is handled by the orion code. > + */ > static void qube_raq_galileo_early_fixup(struct pci_dev *dev) > { > if (dev->devfn == PCI_DEVFN(0, 0) && this is not a PCIe controller, so how is this patch related ? Thomas.
On Tuesday 02 November 2021 09:42:41 Thomas Bogendoerfer wrote: > On Mon, Nov 01, 2021 at 04:04:05PM +0100, Pali Rohár wrote: > > - The code relies on rc_pci_fixup being called, which only happens > > when CONFIG_PCI_QUIRKS is enabled, so add that to Kconfig. Omitting > > this causes a booting failure with a non-obvious cause. > > - Update rc_pci_fixup to set the class properly, copying the > > more modern style from other places > > - Correct the rc_pci_fixup comment > > > > This patch just re-applies commit 1dc831bf53fd ("ARM: Kirkwood: Update > > PCI-E fixup") for all other Marvell platforms which use same buggy PCIe > > controller. > > [..] > > > diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig > > index 771ca53af06d..c8d51bd20b84 100644 > > --- a/arch/mips/Kconfig > > +++ b/arch/mips/Kconfig > > @@ -346,6 +346,7 @@ config MIPS_COBALT > > select CEVT_GT641XX > > select DMA_NONCOHERENT > > select FORCE_PCI > > + select PCI_QUIRKS > > select I8253 > > select I8259 > > select IRQ_MIPS_CPU > > this is enabled by default, via drivers/pci/Kconfig IIRC 'default y' can be disabled but 'select' not. > > config PCI_QUIRKS > default y > bool "Enable PCI quirk workarounds" if EXPERT > help > This enables workarounds for various PCI chipset bugs/quirks. > Disable this only if your target machine is unaffected by PCI > quirks. > > > diff --git a/arch/mips/pci/fixup-cobalt.c b/arch/mips/pci/fixup-cobalt.c > > index 44be65c3e6bb..202f3a0bd97d 100644 > > --- a/arch/mips/pci/fixup-cobalt.c > > +++ b/arch/mips/pci/fixup-cobalt.c > > @@ -36,6 +36,12 @@ > > #define VIA_COBALT_BRD_ID_REG 0x94 > > #define VIA_COBALT_BRD_REG_to_ID(reg) ((unsigned char)(reg) >> 4) > > > > +/* > > + * The root complex has a hardwired class of PCI_CLASS_MEMORY_OTHER, when it > > + * is operating as a root complex this needs to be switched to > > + * PCI_CLASS_BRIDGE_HOST or Linux will errantly try to process the BAR's on > > + * the device. Decoding setup is handled by the orion code. > > + */ > > static void qube_raq_galileo_early_fixup(struct pci_dev *dev) > > { > > if (dev->devfn == PCI_DEVFN(0, 0) && > > this is not a PCIe controller, so how is this patch related ? I put that comment into all quirk code which is related to Marvell PCIe device XX:00.0 and changes PCI class type from PCI_CLASS_MEMORY_OTHER to PCI_CLASS_BRIDGE_HOST. From all what I saw, I'm sure that this device with this specific characteristics is really (non-compliant) Marvell PCIe controller. But I do not have this hardware to verify it. > Thomas. > > -- > Crap can work. Given enough thrust pigs will fly, but it's not necessarily a > good idea. [ RFC1925, 2.3 ]
On Tue, Nov 02, 2021 at 10:02:46AM +0100, Pali Rohár wrote: > On Tuesday 02 November 2021 09:42:41 Thomas Bogendoerfer wrote: > > On Mon, Nov 01, 2021 at 04:04:05PM +0100, Pali Rohár wrote: > > > - The code relies on rc_pci_fixup being called, which only happens > > > when CONFIG_PCI_QUIRKS is enabled, so add that to Kconfig. Omitting > > > this causes a booting failure with a non-obvious cause. > > > - Update rc_pci_fixup to set the class properly, copying the > > > more modern style from other places > > > - Correct the rc_pci_fixup comment > > > > > > This patch just re-applies commit 1dc831bf53fd ("ARM: Kirkwood: Update > > > PCI-E fixup") for all other Marvell platforms which use same buggy PCIe > > > controller. > > > [..] > > > > > diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig > > > index 771ca53af06d..c8d51bd20b84 100644 > > > --- a/arch/mips/Kconfig > > > +++ b/arch/mips/Kconfig > > > @@ -346,6 +346,7 @@ config MIPS_COBALT > > > select CEVT_GT641XX > > > select DMA_NONCOHERENT > > > select FORCE_PCI > > > + select PCI_QUIRKS > > > select I8253 > > > select I8259 > > > select IRQ_MIPS_CPU > > > > this is enabled by default, via drivers/pci/Kconfig > > IIRC 'default y' can be disabled but 'select' not. overruled only if CONFIG_EXPERT is enabled, which IMHO sounds good enough. > > config PCI_QUIRKS > > default y > > bool "Enable PCI quirk workarounds" if EXPERT > > help > > This enables workarounds for various PCI chipset bugs/quirks. > > Disable this only if your target machine is unaffected by PCI > > quirks. > > > > > diff --git a/arch/mips/pci/fixup-cobalt.c b/arch/mips/pci/fixup-cobalt.c > > > index 44be65c3e6bb..202f3a0bd97d 100644 > > > --- a/arch/mips/pci/fixup-cobalt.c > > > +++ b/arch/mips/pci/fixup-cobalt.c > > > @@ -36,6 +36,12 @@ > > > #define VIA_COBALT_BRD_ID_REG 0x94 > > > #define VIA_COBALT_BRD_REG_to_ID(reg) ((unsigned char)(reg) >> 4) > > > > > > +/* > > > + * The root complex has a hardwired class of PCI_CLASS_MEMORY_OTHER, when it > > > + * is operating as a root complex this needs to be switched to > > > + * PCI_CLASS_BRIDGE_HOST or Linux will errantly try to process the BAR's on > > > + * the device. Decoding setup is handled by the orion code. > > > + */ > > > static void qube_raq_galileo_early_fixup(struct pci_dev *dev) > > > { > > > if (dev->devfn == PCI_DEVFN(0, 0) && > > > > this is not a PCIe controller, so how is this patch related ? > > I put that comment into all quirk code which is related to Marvell PCIe > device XX:00.0 and changes PCI class type from PCI_CLASS_MEMORY_OTHER to > PCI_CLASS_BRIDGE_HOST. > > >From all what I saw, I'm sure that this device with this specific > characteristics is really (non-compliant) Marvell PCIe controller. just nitpicking, it's a Galileo PCI bridge and not PCIe. > But I do not have this hardware to verify it. I still have a few Cobalt systems here. Thomas.
On Tuesday 02 November 2021 10:47:00 Thomas Bogendoerfer wrote: > On Tue, Nov 02, 2021 at 10:02:46AM +0100, Pali Rohár wrote: > > On Tuesday 02 November 2021 09:42:41 Thomas Bogendoerfer wrote: > > > On Mon, Nov 01, 2021 at 04:04:05PM +0100, Pali Rohár wrote: > > > > - The code relies on rc_pci_fixup being called, which only happens > > > > when CONFIG_PCI_QUIRKS is enabled, so add that to Kconfig. Omitting > > > > this causes a booting failure with a non-obvious cause. > > > > - Update rc_pci_fixup to set the class properly, copying the > > > > more modern style from other places > > > > - Correct the rc_pci_fixup comment > > > > > > > > This patch just re-applies commit 1dc831bf53fd ("ARM: Kirkwood: Update > > > > PCI-E fixup") for all other Marvell platforms which use same buggy PCIe > > > > controller. > > > > [..] > > > > > > > diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig > > > > index 771ca53af06d..c8d51bd20b84 100644 > > > > --- a/arch/mips/Kconfig > > > > +++ b/arch/mips/Kconfig > > > > @@ -346,6 +346,7 @@ config MIPS_COBALT > > > > select CEVT_GT641XX > > > > select DMA_NONCOHERENT > > > > select FORCE_PCI > > > > + select PCI_QUIRKS > > > > select I8253 > > > > select I8259 > > > > select IRQ_MIPS_CPU > > > > > > this is enabled by default, via drivers/pci/Kconfig > > > > IIRC 'default y' can be disabled but 'select' not. > > overruled only if CONFIG_EXPERT is enabled, which IMHO sounds good enough. Well, if you think this is not needed (anymore), I can drop it. I just reapplied original fix 1dc831bf53fd. > > > config PCI_QUIRKS > > > default y > > > bool "Enable PCI quirk workarounds" if EXPERT > > > help > > > This enables workarounds for various PCI chipset bugs/quirks. > > > Disable this only if your target machine is unaffected by PCI > > > quirks. > > > > > > > diff --git a/arch/mips/pci/fixup-cobalt.c b/arch/mips/pci/fixup-cobalt.c > > > > index 44be65c3e6bb..202f3a0bd97d 100644 > > > > --- a/arch/mips/pci/fixup-cobalt.c > > > > +++ b/arch/mips/pci/fixup-cobalt.c > > > > @@ -36,6 +36,12 @@ > > > > #define VIA_COBALT_BRD_ID_REG 0x94 > > > > #define VIA_COBALT_BRD_REG_to_ID(reg) ((unsigned char)(reg) >> 4) > > > > > > > > +/* > > > > + * The root complex has a hardwired class of PCI_CLASS_MEMORY_OTHER, when it > > > > + * is operating as a root complex this needs to be switched to > > > > + * PCI_CLASS_BRIDGE_HOST or Linux will errantly try to process the BAR's on > > > > + * the device. Decoding setup is handled by the orion code. > > > > + */ > > > > static void qube_raq_galileo_early_fixup(struct pci_dev *dev) > > > > { > > > > if (dev->devfn == PCI_DEVFN(0, 0) && > > > > > > this is not a PCIe controller, so how is this patch related ? > > > > I put that comment into all quirk code which is related to Marvell PCIe > > device XX:00.0 and changes PCI class type from PCI_CLASS_MEMORY_OTHER to > > PCI_CLASS_BRIDGE_HOST. > > > > >From all what I saw, I'm sure that this device with this specific > > characteristics is really (non-compliant) Marvell PCIe controller. > > just nitpicking, it's a Galileo PCI bridge and not PCIe. Marvell acquired Galileo Technology in the past, so it is possible that this bad design is originated in Galileo. And maybe same for PCIe from PCI. At least PCI vendor id for all (new) PCIe controllers is this one. > > But I do not have this hardware to verify it. > > I still have a few Cobalt systems here. Perfect! It would help if you could provide 'lspci -nn -vv' output from that system. In case you have very old version of lspci on that system you could try to run it with '-xxxx' (or '-xxx') which prints hexdump and I can parse it with local lspci. > Thomas. > > -- > Crap can work. Given enough thrust pigs will fly, but it's not necessarily a > good idea. [ RFC1925, 2.3 ]
On Tue, 2 Nov 2021, Pali Rohár wrote: > > > >From all what I saw, I'm sure that this device with this specific > > > characteristics is really (non-compliant) Marvell PCIe controller. > > > > just nitpicking, it's a Galileo PCI bridge and not PCIe. > > Marvell acquired Galileo Technology in the past, so it is possible that > this bad design is originated in Galileo. And maybe same for PCIe from > PCI. At least PCI vendor id for all (new) PCIe controllers is this one. Umm, PCIe is so different hardware-wise from PCI I doubt there's any chance for errata to be carried across. Plus the MIPS SysAD bus is so different from other CPU buses. And we're talking 20+ years old Galileo devices here. None of the Galileo system controllers I came across had the class code set incorrectly. > > > But I do not have this hardware to verify it. > > > > I still have a few Cobalt systems here. > > Perfect! It would help if you could provide 'lspci -nn -vv' output from > that system. In case you have very old version of lspci on that system > you could try to run it with '-xxxx' (or '-xxx') which prints hexdump > and I can parse it with local lspci. For the record here's one from a core card used with a Malta (as with arch/mips/pci/fixup-malta.c); it has a newer 64120A chip (marked as an engineering sample BTW): 00:00.0 Host bridge: Marvell Technology Group Ltd. GT-64120/64120A/64121A System Controller (rev 11) 00: ab 11 20 46 47 01 80 22 11 00 00 06 00 20 00 00 10: 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 00 20: 00 00 e0 1b 00 00 00 00 00 00 00 00 00 00 00 00 30: 00 00 00 00 00 00 00 00 00 00 00 00 ff 01 00 00 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 The lack of a quirk with a platform does not mean it cannot have a certain PCI/e device. As I recall various Atlas/Malta core cards had any of the three device variants covered by this vendor:device ID and later batches were actually indeed marked Marvell rather than Galileo. Maciej
On Tuesday 02 November 2021 12:35:41 Maciej W. Rozycki wrote: > On Tue, 2 Nov 2021, Pali Rohár wrote: > > > > > >From all what I saw, I'm sure that this device with this specific > > > > characteristics is really (non-compliant) Marvell PCIe controller. > > > > > > just nitpicking, it's a Galileo PCI bridge and not PCIe. > > > > Marvell acquired Galileo Technology in the past, so it is possible that > > this bad design is originated in Galileo. And maybe same for PCIe from > > PCI. At least PCI vendor id for all (new) PCIe controllers is this one. > > Umm, PCIe is so different hardware-wise from PCI Yes hardware is different. But software is same and fully backward compatible. And base part of config space is same for both PCI an PCIe. So it is possible to copy+paste HDL code which fills initial values into config space from old PCI IPs to new PCIe IPs. > I doubt there's any > chance for errata to be carried across. Plus the MIPS SysAD bus is so > different from other CPU buses. And we're talking 20+ years old Galileo > devices here. > > None of the Galileo system controllers I came across had the class code > set incorrectly. In kernel there is quirk only for one device with id: PCI_VENDOR_ID_MARVELL (0x11ab) PCI_DEVICE_ID_MARVELL_GT64111 (0x4146) So for some reasons quirk is needed... Anyway, patch for this quirk just adds comment as there is no explanation for it. It does not modify quirk code. So it is possible that Marvell (or rather Galileo at that time) included some config space fixup in some products and 0x4146 did not have it. Just guessing... We can really only guess what could happen at that time 20 years ago... Running git blame told me that this class code quirk was introduce in: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c4ed38a0c6e2e5c4906296758f816ee71373792f But there is also no information... > > > > But I do not have this hardware to verify it. > > > > > > I still have a few Cobalt systems here. > > > > Perfect! It would help if you could provide 'lspci -nn -vv' output from > > that system. In case you have very old version of lspci on that system > > you could try to run it with '-xxxx' (or '-xxx') which prints hexdump > > and I can parse it with local lspci. > > For the record here's one from a core card used with a Malta (as with > arch/mips/pci/fixup-malta.c); it has a newer 64120A chip (marked as an > engineering sample BTW): > > 00:00.0 Host bridge: Marvell Technology Group Ltd. GT-64120/64120A/64121A System Controller (rev 11) > 00: ab 11 20 46 47 01 80 22 11 00 00 06 00 20 00 00 > 10: 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 00 > 20: 00 00 e0 1b 00 00 00 00 00 00 00 00 00 00 00 00 > 30: 00 00 00 00 00 00 00 00 00 00 00 00 ff 01 00 00 > 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > The lack of a quirk with a platform does not mean it cannot have a certain > PCI/e device. This is 11ab:4620 device an there is no PCIe capability in its config space (you can inspect it via 'lspci -F dump.txt -nn -vv' but there is nothing interesting). > As I recall various Atlas/Malta core cards had any of the three device > variants covered by this vendor:device ID and later batches were actually > indeed marked Marvell rather than Galileo. > > Maciej
On Tue, 2 Nov 2021, Pali Rohár wrote: > > None of the Galileo system controllers I came across had the class code > > set incorrectly. > > In kernel there is quirk only for one device with id: > PCI_VENDOR_ID_MARVELL (0x11ab) PCI_DEVICE_ID_MARVELL_GT64111 (0x4146) > > So for some reasons quirk is needed... Anyway, patch for this quirk just > adds comment as there is no explanation for it. It does not modify quirk > code. > > So it is possible that Marvell (or rather Galileo at that time) included > some config space fixup in some products and 0x4146 did not have it. > Just guessing... We can really only guess what could happen at that time > 20 years ago... Ah, there you go! -- sadly I don't seem to have a copy of the datasheet for the GT-64111, but the GT-64115 has it[1]: Table 158: PCI Class Code and Revision ID, Offset: 0x008 Bits Field name Function Initial Value 7:0 RevID Indicates the GT-64115 PCI Revision 0x01 number. 15:8 Reserved Read only. 0x0 23:16 SubClass Indicates the GT-64115 Subclass - Mem- 0x80 ory controller. 31:24 BaseClass Indicates the GT-64115 Base Class - 0x05 memory controller. and then: "Device and Vendor ID (0x000), Class Code and Revision ID (0x008), and Header Type (0x00e) fields are read only from the PCI bus. These fields can be modified and read via the CPU bus." Likewise with the GT-64120[2]: Table 208: PCI_0 Class Code and Revision ID, Offset: 0x008 from PCI_0 or CPU; 0x088 from PCI_1 Bits Field name Function Initial Value 7:0 RevID Indicates the GT-64120 PCI_0 revision number. 0x02 15:8 Reserved Read Only 0. 0x0 23:16 SubClass Indicates the GT-64120 Subclass Depends on value 0x00 - Host Bridge Device. sampled at reset 0x80 - Memory Device. on BankSel[0]. See Table 44 on page 11-1. 31:24 BaseClass Indicates the GT-64120 Base Class Depends on value 0x06 - Bridge Device. sampled at reset 0x05 - Memory Device. on BankSel[0]. See Table 44 on page 11-1. Table 209: PCI_1 Class Code and Revision ID, Offset: 0x088 from PCI_0 or CPU; 0x008 from PCI_1 Bits Field name Function Initial Value 31:0 Various Same as for PCI_0 Class Code and Revision ID. and then also: "Device and Vendor ID (0x000), Class Code and Revision ID (0x008), and Header Type (0x00e) fields are read only from the PCI bus. These fields can be modified and read via the CPU bus." -- so this is system-specific and an intended chip feature rather than an erratum (or rather it is a system erratum if the reset strap or the boot firmware has got it wrong). The memory device class code is IIUC meant to be typically chosen when the Galileo/Marvell device is used without the CPU interface, i.e. as a PCI memory controller device only[3]. > > The lack of a quirk with a platform does not mean it cannot have a certain > > PCI/e device. > > This is 11ab:4620 device an there is no PCIe capability in its config > space (you can inspect it via 'lspci -F dump.txt -nn -vv' but there is > nothing interesting). Of course, just as Thomas told you about the GT-64111 too. But you were right in that the memory controller feature seems shared across all the chip line, whether PCI or PCIe. References: [1] "GT-64115 System Controller for RC4640, RM523X, and VR4300 CPUs", Galileo Technology, Datasheet Revision 1.11, APR 04, 2000, Section 18.16 "PCI Configuration", p. 161 [2] "GT-64120 System Controller For RC4650/4700/5000 and RM526X/527X/7000 CPUs", Galileo Technology, Datasheet Revision 1.4, SEP 14, 1999, Section 17.16 "PCI Configuration", p. 17-59 [3] same, Chapter 14. "Using the GT-64120 Without the CPU Interface", p. 14-1 Maciej
On Tuesday 02 November 2021 14:01:41 Maciej W. Rozycki wrote: > On Tue, 2 Nov 2021, Pali Rohár wrote: > > > > None of the Galileo system controllers I came across had the class code > > > set incorrectly. > > > > In kernel there is quirk only for one device with id: > > PCI_VENDOR_ID_MARVELL (0x11ab) PCI_DEVICE_ID_MARVELL_GT64111 (0x4146) > > > > So for some reasons quirk is needed... Anyway, patch for this quirk just > > adds comment as there is no explanation for it. It does not modify quirk > > code. > > > > So it is possible that Marvell (or rather Galileo at that time) included > > some config space fixup in some products and 0x4146 did not have it. > > Just guessing... We can really only guess what could happen at that time > > 20 years ago... > > Ah, there you go! -- sadly I don't seem to have a copy of the datasheet > for the GT-64111, but the GT-64115 has it[1]: > > Table 158: PCI Class Code and Revision ID, Offset: 0x008 > Bits Field name Function Initial Value > 7:0 RevID Indicates the GT-64115 PCI Revision 0x01 > number. > 15:8 Reserved Read only. 0x0 > 23:16 SubClass Indicates the GT-64115 Subclass - Mem- 0x80 > ory controller. > 31:24 BaseClass Indicates the GT-64115 Base Class - 0x05 > memory controller. > > and then: > > "Device and Vendor ID (0x000), Class Code and Revision ID (0x008), and > Header Type (0x00e) fields are read only from the PCI bus. These fields > can be modified and read via the CPU bus." > > Likewise with the GT-64120[2]: > > Table 208: PCI_0 Class Code and Revision ID, Offset: 0x008 from PCI_0 or CPU; 0x088 from > PCI_1 > Bits Field name Function Initial Value > 7:0 RevID Indicates the GT-64120 PCI_0 revision number. 0x02 > 15:8 Reserved Read Only 0. 0x0 > 23:16 SubClass Indicates the GT-64120 Subclass Depends on value > 0x00 - Host Bridge Device. sampled at reset > 0x80 - Memory Device. on BankSel[0]. See > Table 44 on page > 11-1. > 31:24 BaseClass Indicates the GT-64120 Base Class Depends on value > 0x06 - Bridge Device. sampled at reset > 0x05 - Memory Device. on BankSel[0]. See > Table 44 on page > 11-1. > > Table 209: PCI_1 Class Code and Revision ID, Offset: 0x088 from PCI_0 or CPU; 0x008 from > PCI_1 > Bits Field name Function Initial Value > 31:0 Various Same as for PCI_0 Class Code and Revision ID. > > and then also: > > "Device and Vendor ID (0x000), Class Code and Revision ID (0x008), and > Header Type (0x00e) fields are read only from the PCI bus. These fields > can be modified and read via the CPU bus." > > -- so this is system-specific and an intended chip feature rather than an > erratum (or rather it is a system erratum if the reset strap or the boot > firmware has got it wrong). > > The memory device class code is IIUC meant to be typically chosen when > the Galileo/Marvell device is used without the CPU interface, i.e. as a > PCI memory controller device only[3]. > > > > The lack of a quirk with a platform does not mean it cannot have a certain > > > PCI/e device. > > > > This is 11ab:4620 device an there is no PCIe capability in its config > > space (you can inspect it via 'lspci -F dump.txt -nn -vv' but there is > > nothing interesting). > > Of course, just as Thomas told you about the GT-64111 too. But you were > right in that the memory controller feature seems shared across all the > chip line, whether PCI or PCIe. > > References: > > [1] "GT-64115 System Controller for RC4640, RM523X, and VR4300 CPUs", > Galileo Technology, Datasheet Revision 1.11, APR 04, 2000, Section > 18.16 "PCI Configuration", p. 161 > > [2] "GT-64120 System Controller For RC4650/4700/5000 and RM526X/527X/7000 > CPUs", Galileo Technology, Datasheet Revision 1.4, SEP 14, 1999, > Section 17.16 "PCI Configuration", p. 17-59 > > [3] same, Chapter 14. "Using the GT-64120 Without the CPU Interface", p. > 14-1 > > Maciej Hello Maciej! Thank you very much for the explanation! Now it makes sense and looks like that for GT-64111 it is "system erratum" that strapping pins are incorrectly set which leads to wrong PCI class code. I will update comment for GT-64111 quirk in v2. I'm surprised that Marvell copied this 20 years old MIPS Galileo PCI logic into followup ARM SoC PCIe IPs (and later also into recent ARM64 A3720 SoC PCIe IP), removed configuration of PCI class code via strapping pins and let default PCI class code value to Memory device, even also when PCIe controller is running in Root Complex mode. And so correction can be done only from "CPU bus". Just Marvell forgot to include chapter about usage without CPU interface in new ARM and ARM64 SoCs and origin/usage of that Memory Controller Device was lost in history, even Marvell people was not able to figure out what was wrong with PCIe IP in their ARM SoCs... https://lore.kernel.org/linux-pci/20211003120944.3lmwxylnhlp2kfj7@pali/ Maciej, if I had known that you have this kind of information I would have written you year ago :-)
On Tue, Nov 02, 2021 at 11:00:34AM +0100, Pali Rohár wrote: > > > But I do not have this hardware to verify it. > > > > I still have a few Cobalt systems here. > > Perfect! It would help if you could provide 'lspci -nn -vv' output from > that system. In case you have very old version of lspci on that system > you could try to run it with '-xxxx' (or '-xxx') which prints hexdump > and I can parse it with local lspci. not sure, if you still needed: root@raq2:~# lspci -nn -vv 00:00.0 Host bridge [0600]: Marvell Technology Group Ltd. Device [11ab:4146] (rev 11) Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR+ INTx- Latency: 64, Cache Line Size: 32 bytes Interrupt: pin A routed to IRQ 0 Region 1: Memory at 08000000 (32-bit, non-prefetchable) [size=128M] Region 2: Memory at 1c000000 (32-bit, non-prefetchable) [size=32M] Region 3: Memory at 1f000000 (32-bit, non-prefetchable) [size=16M] Region 4: Memory at 14000000 (32-bit, non-prefetchable) [size=4K] Region 5: I/O ports at 4000000 [disabled] [size=4K] root@raq2:~# lspci -xxxx 00:00.0 Host bridge: Marvell Technology Group Ltd. Device 4146 (rev 11) 00: ab 11 46 41 06 00 80 a2 11 00 80 05 08 40 00 00 10: 00 00 00 00 00 00 00 08 00 00 00 1c 00 00 00 1f 20: 00 00 00 14 01 00 00 14 00 00 00 00 00 00 00 00 30: 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Thomas.
On Tuesday 02 November 2021 16:02:01 Thomas Bogendoerfer wrote: > On Tue, Nov 02, 2021 at 11:00:34AM +0100, Pali Rohár wrote: > > > > But I do not have this hardware to verify it. > > > > > > I still have a few Cobalt systems here. > > > > Perfect! It would help if you could provide 'lspci -nn -vv' output from > > that system. In case you have very old version of lspci on that system > > you could try to run it with '-xxxx' (or '-xxx') which prints hexdump > > and I can parse it with local lspci. > > not sure, if you still needed: > > root@raq2:~# lspci -nn -vv > 00:00.0 Host bridge [0600]: Marvell Technology Group Ltd. Device [11ab:4146] (rev 11) > Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- > Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR+ INTx- > Latency: 64, Cache Line Size: 32 bytes > Interrupt: pin A routed to IRQ 0 > Region 1: Memory at 08000000 (32-bit, non-prefetchable) [size=128M] > Region 2: Memory at 1c000000 (32-bit, non-prefetchable) [size=32M] > Region 3: Memory at 1f000000 (32-bit, non-prefetchable) [size=16M] > Region 4: Memory at 14000000 (32-bit, non-prefetchable) [size=4K] > Region 5: I/O ports at 4000000 [disabled] [size=4K] > > > root@raq2:~# lspci -xxxx > 00:00.0 Host bridge: Marvell Technology Group Ltd. Device 4146 (rev 11) > 00: ab 11 46 41 06 00 80 a2 11 00 80 05 08 40 00 00 ^^ ^^ ^^ Here is class code So it confirms that PCI Class code is 0580 which is Memory Controller. And not Host Bridge as it should be. If I put this hexdump into dump.txt and run 'lspci -F dump.txt -nn' then I see: 00:00.0 Memory controller [0580]: Marvell Technology Group Ltd. Device [11ab:4146] (rev 11) In your output above is "Host bridge" which means that quirk was applied: 00:00.0 Host bridge [0600]: Marvell Technology Group Ltd. Device [11ab:4146] (rev 11) (I guess in 'lspci -nn -vv -b' should be Memory controller as lspci with '-b' should not see that quirk change) > 10: 00 00 00 00 00 00 00 08 00 00 00 1c 00 00 00 1f > 20: 00 00 00 14 01 00 00 14 00 00 00 00 00 00 00 00 > 30: 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 > 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > Thomas. > > -- > Crap can work. Given enough thrust pigs will fly, but it's not necessarily a > good idea. [ RFC1925, 2.3 ]
On Tuesday 02 November 2021 15:49:29 Pali Rohár wrote: > On Tuesday 02 November 2021 14:01:41 Maciej W. Rozycki wrote: > > On Tue, 2 Nov 2021, Pali Rohár wrote: > > > > > > None of the Galileo system controllers I came across had the class code > > > > set incorrectly. > > > > > > In kernel there is quirk only for one device with id: > > > PCI_VENDOR_ID_MARVELL (0x11ab) PCI_DEVICE_ID_MARVELL_GT64111 (0x4146) > > > > > > So for some reasons quirk is needed... Anyway, patch for this quirk just > > > adds comment as there is no explanation for it. It does not modify quirk > > > code. > > > > > > So it is possible that Marvell (or rather Galileo at that time) included > > > some config space fixup in some products and 0x4146 did not have it. > > > Just guessing... We can really only guess what could happen at that time > > > 20 years ago... > > > > Ah, there you go! -- sadly I don't seem to have a copy of the datasheet > > for the GT-64111, but the GT-64115 has it[1]: > > > > Table 158: PCI Class Code and Revision ID, Offset: 0x008 > > Bits Field name Function Initial Value > > 7:0 RevID Indicates the GT-64115 PCI Revision 0x01 > > number. > > 15:8 Reserved Read only. 0x0 > > 23:16 SubClass Indicates the GT-64115 Subclass - Mem- 0x80 > > ory controller. > > 31:24 BaseClass Indicates the GT-64115 Base Class - 0x05 > > memory controller. > > > > and then: > > > > "Device and Vendor ID (0x000), Class Code and Revision ID (0x008), and > > Header Type (0x00e) fields are read only from the PCI bus. These fields > > can be modified and read via the CPU bus." > > > > Likewise with the GT-64120[2]: > > > > Table 208: PCI_0 Class Code and Revision ID, Offset: 0x008 from PCI_0 or CPU; 0x088 from > > PCI_1 > > Bits Field name Function Initial Value > > 7:0 RevID Indicates the GT-64120 PCI_0 revision number. 0x02 > > 15:8 Reserved Read Only 0. 0x0 > > 23:16 SubClass Indicates the GT-64120 Subclass Depends on value > > 0x00 - Host Bridge Device. sampled at reset > > 0x80 - Memory Device. on BankSel[0]. See > > Table 44 on page > > 11-1. > > 31:24 BaseClass Indicates the GT-64120 Base Class Depends on value > > 0x06 - Bridge Device. sampled at reset > > 0x05 - Memory Device. on BankSel[0]. See > > Table 44 on page > > 11-1. > > > > Table 209: PCI_1 Class Code and Revision ID, Offset: 0x088 from PCI_0 or CPU; 0x008 from > > PCI_1 > > Bits Field name Function Initial Value > > 31:0 Various Same as for PCI_0 Class Code and Revision ID. > > > > and then also: > > > > "Device and Vendor ID (0x000), Class Code and Revision ID (0x008), and > > Header Type (0x00e) fields are read only from the PCI bus. These fields > > can be modified and read via the CPU bus." > > > > -- so this is system-specific and an intended chip feature rather than an > > erratum (or rather it is a system erratum if the reset strap or the boot > > firmware has got it wrong). > > > > The memory device class code is IIUC meant to be typically chosen when > > the Galileo/Marvell device is used without the CPU interface, i.e. as a > > PCI memory controller device only[3]. I have found on internet some copy of GT64111 datasheet ("GT-64111 System Controller for RC4640, RM523X and VR4300 CPUs", Galileo Technology, Product Preview Revision 1.1, FEB 4, 1999) and in section "17.15 PCI Configuration Registers" there is subsection "Class Code and Revision ID, Offset: 0x008" with content: Bits Field name Function Initial Value 7:0 RevID Indicates the GT-64111 Revision number. 0x10 GT-64111-P-0 = 0x10 15:8 Reserved 0x0 23:16 SubClass Indicates the GT-64111 Subclass (0x80 - other mem- 0x80 ory controller) 31:24 BaseClass Indicates the GT-64111 Base Class (0x5 - memory 0x05 controller). And in section "6.5.3 PCI Autoconfiguration at RESET" is following interesting information: Eight PCI registers can be automatically loaded after Rst*. Autoconfiguration mode is enabled by asserting the DMAReq[3]* LOW on Rst*. Any PCI transactions targeted for the GT-64111 will be retried while the loading of the PCI configuration registers is in process. It is highly recommended that all PC applications utilize the PCI Autoconfiguration at RESET feature. The autoload feature can be easily implemented with a very low cost EPLD. Galileo provides sample EPLD equations upon request. (You can always pull the EPLD off your final product if you find there are no issues during testing.) NOTE: The GT-64111’s default Class Code is 0x0580 (Memory Controller) which is a change from the GT-64011. The GT-64011 used the Class Code 0x0600 which denotes Host Bridge. Some PCs refuse to configure host bridges if they are found plugged into a PCI slot (ask the BIOS vendors why...). The “Memory Controller” Class Code does not cause a problem for these non-compliant BIOSes, so we used this as the default in the GT-64111. The Class Code can be reporgrammed in both devices via autoload or CPU register writes. The PCI register values are loaded from the ROM controlled by BootCS* are shown in Table 21, below. TABLE 21. PCI Registers Loaded at RESET Register Offset Boot Device Address Device and Vendor ID 0x000 0x1fffffe0 Class Code and Revision ID 0x008 0x1fffffe4 Subsystem Device and Vendor ID 0x02c 0x1fffffe8 Interrupt Pin and Line 0x03c 0x1fffffec RAS[1:0]* Bank Size 0xc08 0x1ffffff0 RAS[3:2]* Bank Size 0xc0c 0x1ffffff4 CS[2:0]* Bank Size 0xc10 0x1ffffff8 CS[3]* & Boot CS* Bank Size 0xc14 0x1ffffffc === So the conclusion is that there is also some RESET configuration via BootCS (I have no idea what it is or was). And default value (when RESET configuration is not used) is always "Memory controller" due to existence of "broken PC BIOSes" (probably x86). Hence the quirk for GT64111 in kernel is always needed. And Thomas already confirmed in his pci hexdump that PCI Class code is set to Memory Controller. I hope that now this mystery of this GT64111 quirk is solved :-) I will update patch with correct comment, why quirk is needed. So due to the fact that 20 years ago there were broken x86 BIOSes which did not like PCI devices with PCI Class code of Host Bridge, Marvell changed default PCI Class code to Memory Controller and let it in this state also for future PCIe-based ARM and AR64 SoCs for next 20 years. Which generally leaded to broken PCIe support in mvebu SoCs. I have no more comments about it... :-( > > > > The lack of a quirk with a platform does not mean it cannot have a certain > > > > PCI/e device. > > > > > > This is 11ab:4620 device an there is no PCIe capability in its config > > > space (you can inspect it via 'lspci -F dump.txt -nn -vv' but there is > > > nothing interesting). > > > > Of course, just as Thomas told you about the GT-64111 too. But you were > > right in that the memory controller feature seems shared across all the > > chip line, whether PCI or PCIe. > > > > References: > > > > [1] "GT-64115 System Controller for RC4640, RM523X, and VR4300 CPUs", > > Galileo Technology, Datasheet Revision 1.11, APR 04, 2000, Section > > 18.16 "PCI Configuration", p. 161 > > > > [2] "GT-64120 System Controller For RC4650/4700/5000 and RM526X/527X/7000 > > CPUs", Galileo Technology, Datasheet Revision 1.4, SEP 14, 1999, > > Section 17.16 "PCI Configuration", p. 17-59 > > > > [3] same, Chapter 14. "Using the GT-64120 Without the CPU Interface", p. > > 14-1 > > > > Maciej > > Hello Maciej! Thank you very much for the explanation! > > Now it makes sense and looks like that for GT-64111 it is "system > erratum" that strapping pins are incorrectly set which leads to wrong > PCI class code. > > I will update comment for GT-64111 quirk in v2. > > I'm surprised that Marvell copied this 20 years old MIPS Galileo PCI > logic into followup ARM SoC PCIe IPs (and later also into recent ARM64 > A3720 SoC PCIe IP), removed configuration of PCI class code via > strapping pins and let default PCI class code value to Memory device, > even also when PCIe controller is running in Root Complex mode. And so > correction can be done only from "CPU bus". > > Just Marvell forgot to include chapter about usage without CPU interface > in new ARM and ARM64 SoCs and origin/usage of that Memory Controller > Device was lost in history, even Marvell people was not able to figure > out what was wrong with PCIe IP in their ARM SoCs... > https://lore.kernel.org/linux-pci/20211003120944.3lmwxylnhlp2kfj7@pali/ > > Maciej, if I had known that you have this kind of information I would > have written you year ago :-)
On 02.11.21 16:48, Pali Rohár wrote: > On Tuesday 02 November 2021 15:49:29 Pali Rohár wrote: >> On Tuesday 02 November 2021 14:01:41 Maciej W. Rozycki wrote: >>> On Tue, 2 Nov 2021, Pali Rohár wrote: >>> >>>>> None of the Galileo system controllers I came across had the class code >>>>> set incorrectly. >>>> >>>> In kernel there is quirk only for one device with id: >>>> PCI_VENDOR_ID_MARVELL (0x11ab) PCI_DEVICE_ID_MARVELL_GT64111 (0x4146) >>>> >>>> So for some reasons quirk is needed... Anyway, patch for this quirk just >>>> adds comment as there is no explanation for it. It does not modify quirk >>>> code. >>>> >>>> So it is possible that Marvell (or rather Galileo at that time) included >>>> some config space fixup in some products and 0x4146 did not have it. >>>> Just guessing... We can really only guess what could happen at that time >>>> 20 years ago... >>> >>> Ah, there you go! -- sadly I don't seem to have a copy of the datasheet >>> for the GT-64111, but the GT-64115 has it[1]: >>> >>> Table 158: PCI Class Code and Revision ID, Offset: 0x008 >>> Bits Field name Function Initial Value >>> 7:0 RevID Indicates the GT-64115 PCI Revision 0x01 >>> number. >>> 15:8 Reserved Read only. 0x0 >>> 23:16 SubClass Indicates the GT-64115 Subclass - Mem- 0x80 >>> ory controller. >>> 31:24 BaseClass Indicates the GT-64115 Base Class - 0x05 >>> memory controller. >>> >>> and then: >>> >>> "Device and Vendor ID (0x000), Class Code and Revision ID (0x008), and >>> Header Type (0x00e) fields are read only from the PCI bus. These fields >>> can be modified and read via the CPU bus." >>> >>> Likewise with the GT-64120[2]: >>> >>> Table 208: PCI_0 Class Code and Revision ID, Offset: 0x008 from PCI_0 or CPU; 0x088 from >>> PCI_1 >>> Bits Field name Function Initial Value >>> 7:0 RevID Indicates the GT-64120 PCI_0 revision number. 0x02 >>> 15:8 Reserved Read Only 0. 0x0 >>> 23:16 SubClass Indicates the GT-64120 Subclass Depends on value >>> 0x00 - Host Bridge Device. sampled at reset >>> 0x80 - Memory Device. on BankSel[0]. See >>> Table 44 on page >>> 11-1. >>> 31:24 BaseClass Indicates the GT-64120 Base Class Depends on value >>> 0x06 - Bridge Device. sampled at reset >>> 0x05 - Memory Device. on BankSel[0]. See >>> Table 44 on page >>> 11-1. >>> >>> Table 209: PCI_1 Class Code and Revision ID, Offset: 0x088 from PCI_0 or CPU; 0x008 from >>> PCI_1 >>> Bits Field name Function Initial Value >>> 31:0 Various Same as for PCI_0 Class Code and Revision ID. >>> >>> and then also: >>> >>> "Device and Vendor ID (0x000), Class Code and Revision ID (0x008), and >>> Header Type (0x00e) fields are read only from the PCI bus. These fields >>> can be modified and read via the CPU bus." >>> >>> -- so this is system-specific and an intended chip feature rather than an >>> erratum (or rather it is a system erratum if the reset strap or the boot >>> firmware has got it wrong). >>> >>> The memory device class code is IIUC meant to be typically chosen when >>> the Galileo/Marvell device is used without the CPU interface, i.e. as a >>> PCI memory controller device only[3]. > > I have found on internet some copy of GT64111 datasheet ("GT-64111 > System Controller for RC4640, RM523X and VR4300 CPUs", Galileo > Technology, Product Preview Revision 1.1, FEB 4, 1999) and in section > "17.15 PCI Configuration Registers" there is subsection "Class Code and > Revision ID, Offset: 0x008" with content: > > Bits Field name Function Initial Value > 7:0 RevID Indicates the GT-64111 Revision number. 0x10 > GT-64111-P-0 = 0x10 > 15:8 Reserved 0x0 > 23:16 SubClass Indicates the GT-64111 Subclass (0x80 - other mem- 0x80 > ory controller) > 31:24 BaseClass Indicates the GT-64111 Base Class (0x5 - memory 0x05 > controller). > > And in section "6.5.3 PCI Autoconfiguration at RESET" is following > interesting information: > > Eight PCI registers can be automatically loaded after Rst*. > Autoconfiguration mode is enabled by asserting the DMAReq[3]* LOW on > Rst*. Any PCI transactions targeted for the GT-64111 will be retried > while the loading of the PCI configuration registers is in process. > > It is highly recommended that all PC applications utilize the PCI > Autoconfiguration at RESET feature. The autoload feature can be easily > implemented with a very low cost EPLD. Galileo provides sample EPLD > equations upon request. (You can always pull the EPLD off your final > product if you find there are no issues during testing.) > > NOTE: The GT-64111’s default Class Code is 0x0580 (Memory Controller) > which is a change from the GT-64011. > > The GT-64011 used the Class Code 0x0600 which denotes Host Bridge. Some > PCs refuse to configure host bridges if they are found plugged into a > PCI slot (ask the BIOS vendors why...). The “Memory Controller” Class > Code does not cause a problem for these non-compliant BIOSes, so we used > this as the default in the GT-64111. The Class Code can be reporgrammed > in both devices via autoload or CPU register writes. > > The PCI register values are loaded from the ROM controlled by BootCS* > are shown in Table 21, below. > > TABLE 21. PCI Registers Loaded at RESET > Register Offset Boot Device Address > Device and Vendor ID 0x000 0x1fffffe0 > Class Code and Revision ID 0x008 0x1fffffe4 > Subsystem Device and Vendor ID 0x02c 0x1fffffe8 > Interrupt Pin and Line 0x03c 0x1fffffec > RAS[1:0]* Bank Size 0xc08 0x1ffffff0 > RAS[3:2]* Bank Size 0xc0c 0x1ffffff4 > CS[2:0]* Bank Size 0xc10 0x1ffffff8 > CS[3]* & Boot CS* Bank Size 0xc14 0x1ffffffc > > === > > So the conclusion is that there is also some RESET configuration via > BootCS (I have no idea what it is or was). And default value (when RESET > configuration is not used) is always "Memory controller" due to > existence of "broken PC BIOSes" (probably x86). > > Hence the quirk for GT64111 in kernel is always needed. And Thomas > already confirmed in his pci hexdump that PCI Class code is set to > Memory Controller. > > I hope that now this mystery of this GT64111 quirk is solved :-) I will > update patch with correct comment, why quirk is needed. > > So due to the fact that 20 years ago there were broken x86 BIOSes which > did not like PCI devices with PCI Class code of Host Bridge, Marvell > changed default PCI Class code to Memory Controller and let it in this > state also for future PCIe-based ARM and AR64 SoCs for next 20 years. > Which generally leaded to broken PCIe support in mvebu SoCs. I have no > more comments about it... :-( If this is really the case, that all this was "copied" in such a bad design state into newer SoC's for that many years, which I don't doubt right now any more, then this is absolutely amazing and pretty sad IMHO. Pali, many thanks for being persistant enough to dig through this. Thanks, Stefan
On Tue, 2 Nov 2021, Pali Rohár wrote: > Hello Maciej! Thank you very much for the explanation! You are welcome! > I'm surprised that Marvell copied this 20 years old MIPS Galileo PCI > logic into followup ARM SoC PCIe IPs (and later also into recent ARM64 > A3720 SoC PCIe IP), removed configuration of PCI class code via > strapping pins and let default PCI class code value to Memory device, > even also when PCIe controller is running in Root Complex mode. And so > correction can be done only from "CPU bus". Still the bootstrap firmware (say U-boot, as I can see it mentioned in your reference) can write the correct value to the class code register. Or can it? I guess you can try it and report your findings back. You can poke at PCI/e registers directly from U-boot (`pci write.w', etc.) as with any reasonable firmware monitor, no need to write code; I guess you probably know that already. I have no such hardware and I have no incentive to chase documentation for it even if public copies are available for the affected devices. Also you say it's IP rather than actual discrete chips as it used to be with the Galileo system controllers, so it could be up to the customer to get the IP wired/configured correctly. > Maciej, if I had known that you have this kind of information I would > have written you year ago :-) Well, I have all kinds of information. Maciej
On Tue, 2 Nov 2021, Pali Rohár wrote: > So the conclusion is that there is also some RESET configuration via > BootCS (I have no idea what it is or was). And default value (when RESET > configuration is not used) is always "Memory controller" due to > existence of "broken PC BIOSes" (probably x86). BootCS is one of the chip selects for the memory/device bus (AD bus), one of the three (or four in dual-PCI implementations), along with the SysAD bus and the PCI bus(es), interconnected, which is where DRAM sits as well as possibly other locally accessed devices with the Galileo system controllers. See Figure 5 on page 23 of the GT-64111 document. Maciej
On Wednesday 03 November 2021 14:49:07 Maciej W. Rozycki wrote: > On Tue, 2 Nov 2021, Pali Rohár wrote: > > > Hello Maciej! Thank you very much for the explanation! > > You are welcome! > > > I'm surprised that Marvell copied this 20 years old MIPS Galileo PCI > > logic into followup ARM SoC PCIe IPs (and later also into recent ARM64 > > A3720 SoC PCIe IP), removed configuration of PCI class code via > > strapping pins and let default PCI class code value to Memory device, > > even also when PCIe controller is running in Root Complex mode. And so > > correction can be done only from "CPU bus". > > Still the bootstrap firmware (say U-boot, as I can see it mentioned in > your reference) can write the correct value to the class code register. > Or can it? Yes, it can. And I have already sent patches to do it.
On Tuesday 02 November 2021 16:13:34 Pali Rohár wrote: > On Tuesday 02 November 2021 16:02:01 Thomas Bogendoerfer wrote: > > On Tue, Nov 02, 2021 at 11:00:34AM +0100, Pali Rohár wrote: > > > > > But I do not have this hardware to verify it. > > > > > > > > I still have a few Cobalt systems here. > > > > > > Perfect! It would help if you could provide 'lspci -nn -vv' output from > > > that system. In case you have very old version of lspci on that system > > > you could try to run it with '-xxxx' (or '-xxx') which prints hexdump > > > and I can parse it with local lspci. Thomas, one more question, do you have also GT-64115 system which has PCI device id 0x4611? Based on Maciej quote, GT-64115 probably also reports itself as "Memory controller" instead of "Host Bridge". So lspci output from GT-64115 could be also interesting. > > not sure, if you still needed: > > > > root@raq2:~# lspci -nn -vv > > 00:00.0 Host bridge [0600]: Marvell Technology Group Ltd. Device [11ab:4146] (rev 11) > > Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- > > Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR+ INTx- > > Latency: 64, Cache Line Size: 32 bytes > > Interrupt: pin A routed to IRQ 0 > > Region 1: Memory at 08000000 (32-bit, non-prefetchable) [size=128M] > > Region 2: Memory at 1c000000 (32-bit, non-prefetchable) [size=32M] > > Region 3: Memory at 1f000000 (32-bit, non-prefetchable) [size=16M] > > Region 4: Memory at 14000000 (32-bit, non-prefetchable) [size=4K] > > Region 5: I/O ports at 4000000 [disabled] [size=4K] > > > > > > root@raq2:~# lspci -xxxx > > 00:00.0 Host bridge: Marvell Technology Group Ltd. Device 4146 (rev 11) > > 00: ab 11 46 41 06 00 80 a2 11 00 80 05 08 40 00 00 > > ^^ ^^ ^^ > Here is class code > > So it confirms that PCI Class code is 0580 which is Memory Controller. > And not Host Bridge as it should be. > > If I put this hexdump into dump.txt and run 'lspci -F dump.txt -nn' then I see: > 00:00.0 Memory controller [0580]: Marvell Technology Group Ltd. Device [11ab:4146] (rev 11) > > In your output above is "Host bridge" which means that quirk was applied: > 00:00.0 Host bridge [0600]: Marvell Technology Group Ltd. Device [11ab:4146] (rev 11) > > (I guess in 'lspci -nn -vv -b' should be Memory controller as lspci with > '-b' should not see that quirk change) > > > 10: 00 00 00 00 00 00 00 08 00 00 00 1c 00 00 00 1f > > 20: 00 00 00 14 01 00 00 14 00 00 00 00 00 00 00 00 > > 30: 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 > > 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > > > Thomas. > > > > -- > > Crap can work. Given enough thrust pigs will fly, but it's not necessarily a > > good idea. [ RFC1925, 2.3 ]
On Wed, Nov 10, 2021 at 12:42:53AM +0100, Pali Rohár wrote: > On Tuesday 02 November 2021 16:13:34 Pali Rohár wrote: > > On Tuesday 02 November 2021 16:02:01 Thomas Bogendoerfer wrote: > > > On Tue, Nov 02, 2021 at 11:00:34AM +0100, Pali Rohár wrote: > > > > > > But I do not have this hardware to verify it. > > > > > > > > > > I still have a few Cobalt systems here. > > > > > > > > Perfect! It would help if you could provide 'lspci -nn -vv' output from > > > > that system. In case you have very old version of lspci on that system > > > > you could try to run it with '-xxxx' (or '-xxx') which prints hexdump > > > > and I can parse it with local lspci. > > Thomas, one more question, do you have also GT-64115 system which has > PCI device id 0x4611? Based on Maciej quote, GT-64115 probably also > reports itself as "Memory controller" instead of "Host Bridge". So lspci > output from GT-64115 could be also interesting. The only systems with GT64-xxx chips I have are Cobalt systems, but none of them has a GT-64115 chip (Raq1 comes with GT-64011 and Raq2 with GT-64111). Thomas.
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index fc196421b2ce..9f157e973555 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -400,6 +400,7 @@ config ARCH_DOVE select GENERIC_IRQ_MULTI_HANDLER select GPIOLIB select HAVE_PCI + select PCI_QUIRKS if PCI select MVEBU_MBUS select PINCTRL select PINCTRL_DOVE diff --git a/arch/arm/mach-dove/pcie.c b/arch/arm/mach-dove/pcie.c index ee91ac6b5ebf..ecf057a0f5ba 100644 --- a/arch/arm/mach-dove/pcie.c +++ b/arch/arm/mach-dove/pcie.c @@ -135,14 +135,19 @@ static struct pci_ops pcie_ops = { .write = pcie_wr_conf, }; +/* + * The root complex has a hardwired class of PCI_CLASS_MEMORY_OTHER, when it + * is operating as a root complex this needs to be switched to + * PCI_CLASS_BRIDGE_HOST or Linux will errantly try to process the BAR's on + * the device. Decoding setup is handled by the orion code. + */ static void rc_pci_fixup(struct pci_dev *dev) { - /* - * Prevent enumeration of root complex. - */ if (dev->bus->parent == NULL && dev->devfn == 0) { int i; + dev->class &= 0xff; + dev->class |= PCI_CLASS_BRIDGE_HOST << 8; for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) { dev->resource[i].start = 0; dev->resource[i].end = 0; diff --git a/arch/arm/mach-mv78xx0/pcie.c b/arch/arm/mach-mv78xx0/pcie.c index 636d84b40466..9362b5fc116f 100644 --- a/arch/arm/mach-mv78xx0/pcie.c +++ b/arch/arm/mach-mv78xx0/pcie.c @@ -177,14 +177,19 @@ static struct pci_ops pcie_ops = { .write = pcie_wr_conf, }; +/* + * The root complex has a hardwired class of PCI_CLASS_MEMORY_OTHER, when it + * is operating as a root complex this needs to be switched to + * PCI_CLASS_BRIDGE_HOST or Linux will errantly try to process the BAR's on + * the device. Decoding setup is handled by the orion code. + */ static void rc_pci_fixup(struct pci_dev *dev) { - /* - * Prevent enumeration of root complex. - */ if (dev->bus->parent == NULL && dev->devfn == 0) { int i; + dev->class &= 0xff; + dev->class |= PCI_CLASS_BRIDGE_HOST << 8; for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) { dev->resource[i].start = 0; dev->resource[i].end = 0; diff --git a/arch/arm/mach-orion5x/Kconfig b/arch/arm/mach-orion5x/Kconfig index e94a61901ffd..7189a5b1ec46 100644 --- a/arch/arm/mach-orion5x/Kconfig +++ b/arch/arm/mach-orion5x/Kconfig @@ -6,6 +6,7 @@ menuconfig ARCH_ORION5X select GPIOLIB select MVEBU_MBUS select FORCE_PCI + select PCI_QUIRKS select PHYLIB if NETDEVICES select PLAT_ORION_LEGACY help diff --git a/arch/arm/mach-orion5x/pci.c b/arch/arm/mach-orion5x/pci.c index 76951bfbacf5..5145fe89702e 100644 --- a/arch/arm/mach-orion5x/pci.c +++ b/arch/arm/mach-orion5x/pci.c @@ -509,14 +509,20 @@ static int __init pci_setup(struct pci_sys_data *sys) /***************************************************************************** * General PCIe + PCI ****************************************************************************/ + +/* + * The root complex has a hardwired class of PCI_CLASS_MEMORY_OTHER, when it + * is operating as a root complex this needs to be switched to + * PCI_CLASS_BRIDGE_HOST or Linux will errantly try to process the BAR's on + * the device. Decoding setup is handled by the orion code. + */ static void rc_pci_fixup(struct pci_dev *dev) { - /* - * Prevent enumeration of root complex. - */ if (dev->bus->parent == NULL && dev->devfn == 0) { int i; + dev->class &= 0xff; + dev->class |= PCI_CLASS_BRIDGE_HOST << 8; for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) { dev->resource[i].start = 0; dev->resource[i].end = 0; diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index 771ca53af06d..c8d51bd20b84 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -346,6 +346,7 @@ config MIPS_COBALT select CEVT_GT641XX select DMA_NONCOHERENT select FORCE_PCI + select PCI_QUIRKS select I8253 select I8259 select IRQ_MIPS_CPU diff --git a/arch/mips/pci/fixup-cobalt.c b/arch/mips/pci/fixup-cobalt.c index 44be65c3e6bb..202f3a0bd97d 100644 --- a/arch/mips/pci/fixup-cobalt.c +++ b/arch/mips/pci/fixup-cobalt.c @@ -36,6 +36,12 @@ #define VIA_COBALT_BRD_ID_REG 0x94 #define VIA_COBALT_BRD_REG_to_ID(reg) ((unsigned char)(reg) >> 4) +/* + * The root complex has a hardwired class of PCI_CLASS_MEMORY_OTHER, when it + * is operating as a root complex this needs to be switched to + * PCI_CLASS_BRIDGE_HOST or Linux will errantly try to process the BAR's on + * the device. Decoding setup is handled by the orion code. + */ static void qube_raq_galileo_early_fixup(struct pci_dev *dev) { if (dev->devfn == PCI_DEVFN(0, 0) &&
- The code relies on rc_pci_fixup being called, which only happens when CONFIG_PCI_QUIRKS is enabled, so add that to Kconfig. Omitting this causes a booting failure with a non-obvious cause. - Update rc_pci_fixup to set the class properly, copying the more modern style from other places - Correct the rc_pci_fixup comment This patch just re-applies commit 1dc831bf53fd ("ARM: Kirkwood: Update PCI-E fixup") for all other Marvell platforms which use same buggy PCIe controller. Signed-off-by: Pali Rohár <pali@kernel.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: stable@vger.kernel.org --- arch/arm/Kconfig | 1 + arch/arm/mach-dove/pcie.c | 11 ++++++++--- arch/arm/mach-mv78xx0/pcie.c | 11 ++++++++--- arch/arm/mach-orion5x/Kconfig | 1 + arch/arm/mach-orion5x/pci.c | 12 +++++++++--- arch/mips/Kconfig | 1 + arch/mips/pci/fixup-cobalt.c | 6 ++++++ 7 files changed, 34 insertions(+), 9 deletions(-)