PCI: brcmstb: Avoid downstream access during link training

Message ID	385baf9dbfb6b65112803998dfc0cd6f84a5e6ba.1691296765.git.lukas@wunner.de (mailing list archive)
State	Not Applicable
Delegated to:	Bjorn Helgaas
Headers	show Return-Path: <linux-pci-owner@vger.kernel.org> Message-Id: <385baf9dbfb6b65112803998dfc0cd6f84a5e6ba.1691296765.git.lukas@wunner.de> From: Lukas Wunner <lukas@wunner.de> Date: Sun, 06 Aug 2023 06:44:50 +0200 Subject: [PATCH] PCI: brcmstb: Avoid downstream access during link training To: Jim Quinlan <jim2101024@gmail.com>, Nicolas Saenz Julienne <nsaenz@kernel.org>, Florian Fainelli <florian.fainelli@broadcom.com>, Lorenzo Pieralisi <lpieralisi@kernel.org>, Krzysztof Wilczynski <kw@linux.com>, Rob Herring <robh@kernel.org>, Bjorn Helgaas <bhelgaas@google.com> Cc: bcm-kernel-feedback-list@broadcom.com, linux-pci@vger.kernel.org, Philipp Rosenberger <p.rosenberger@kunbus.com>, Cyril Brulebois <kibi@debian.org> Precedence: bulk
Series	PCI: brcmstb: Avoid downstream access during link training \| expand PCI: brcmstb: Avoid downstream access during link training

Lukas Wunner Aug. 6, 2023, 4:44 a.m. UTC

The Broadcom Set Top Box PCIe controller signals an Asynchronous SError
Interrupt and thus causes a kernel panic when non-posted transactions
time out.  This differs from most other PCIe controllers which return a
fabricated "all ones" response instead.

To avoid gratuitous kernel panics, the driver reads the link status from
a proprietary PCIE_MISC_PCIE_STATUS register and skips downstream
accesses if the link is down.

However the bits in the proprietary register may purport that the link
is up even though link training is still in progress (as indicated by
the Link Training bit in the Link Status register).

This has been observed with various PCIe switches attached to a BCM2711
(Raspberry Pi CM4):  The issue is most pronounced with the Pericom
PI7C9X2G404SV, but has also occasionally been witnessed with the Pericom
PI7C9X2G404SL and ASMedia ASM1184e.

Check the Link Training bit in addition to the PCIE_MISC_PCIE_STATUS
register before performing downstream accesses.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: <stable@vger.kernel.org>
---
 drivers/pci/controller/pcie-brcmstb.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Florian Fainelli Aug. 6, 2023, 3:15 p.m. UTC | #1

On 8/5/2023 9:44 PM, Lukas Wunner wrote:
> The Broadcom Set Top Box PCIe controller signals an Asynchronous SError
> Interrupt and thus causes a kernel panic when non-posted transactions
> time out.  This differs from most other PCIe controllers which return a
> fabricated "all ones" response instead.
> 
> To avoid gratuitous kernel panics, the driver reads the link status from
> a proprietary PCIE_MISC_PCIE_STATUS register and skips downstream
> accesses if the link is down.
> 
> However the bits in the proprietary register may purport that the link
> is up even though link training is still in progress (as indicated by
> the Link Training bit in the Link Status register).
> 
> This has been observed with various PCIe switches attached to a BCM2711
> (Raspberry Pi CM4):  The issue is most pronounced with the Pericom
> PI7C9X2G404SV, but has also occasionally been witnessed with the Pericom
> PI7C9X2G404SL and ASMedia ASM1184e.
> 
> Check the Link Training bit in addition to the PCIE_MISC_PCIE_STATUS
> register before performing downstream accesses.
> 
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: <stable@vger.kernel.org>

Since you CC stable, it would be neat to provide a Fixes: tag here, most 
likely this dates back from the original commit adding support for this 
driver.

> ---
>   drivers/pci/controller/pcie-brcmstb.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
> index f593a422bd63..b4abfced8e9b 100644
> --- a/drivers/pci/controller/pcie-brcmstb.c
> +++ b/drivers/pci/controller/pcie-brcmstb.c
> @@ -679,8 +679,10 @@ static bool brcm_pcie_link_up(struct brcm_pcie *pcie)
>   	u32 val = readl(pcie->base + PCIE_MISC_PCIE_STATUS);
>   	u32 dla = FIELD_GET(PCIE_MISC_PCIE_STATUS_PCIE_DL_ACTIVE_MASK, val);
>   	u32 plu = FIELD_GET(PCIE_MISC_PCIE_STATUS_PCIE_PHYLINKUP_MASK, val);
> +	u16 lnksta = readw(pcie->base + BRCM_PCIE_CAP_REGS + PCI_EXP_LNKSTA);
> +	u16 lt = FIELD_GET(PCI_EXP_LNKSTA_LT, lnksta);
>   
> -	return dla && plu;
> +	return dla && plu && !lt;

The patch looks good to me in spirit though I want Jim to review and 
provide feedback here since he was looking at this sort of issues not so 
long ago.

Thanks for tracking this down!

Pali Rohár Aug. 6, 2023, 9:43 p.m. UTC | #2

On Sunday 06 August 2023 06:44:50 Lukas Wunner wrote:
> The Broadcom Set Top Box PCIe controller signals an Asynchronous SError
> Interrupt

This is little incorrect wording. PCIe controller cannot send Async
SError, this is ARMv8 specific thing. In this case PCIe controller is
connected to ARM core via AXI bus and on PCIe transaction timeout it
sends AXI Slave Error, which then ARMv8 core reports to kernel as Async
SError interrupt.

The proper fix is to configure PCIe controller to never send AXI Slave
Error and neither AXI Decode Error (to prevent SErrors at all). For
example Synopsys PCIe controllers have proprietary hidden configuration
bits for enabling/disabling this AXI error reporting behavior.

Or second option is to access affected memory from the ARMv8 core via
synchronous operations and map memory as nGnRnE. Then ARMv8 core reports
AXI Slave Error as Synchronous Abort Exception which you can catch,
examine that was caused on PCIe memory region and fabricate all-ones
response. But the second option is not available for some licensed ARMv8
Cortex cores (e.g. A53) as they do not implement nE (non Early Write
Acknowledgement) memory mapping correctly.

The patch below does not fix the issue at all, instead it opens a new
race condition that if link state is changed _after_ the check and
_before_ accessing config space.

> and thus causes a kernel panic when non-posted transactions
> time out.  This differs from most other PCIe controllers which return a
> fabricated "all ones" response instead.
> 
> To avoid gratuitous kernel panics, the driver reads the link status from
> a proprietary PCIE_MISC_PCIE_STATUS register and skips downstream
> accesses if the link is down.
> 
> However the bits in the proprietary register may purport that the link
> is up even though link training is still in progress (as indicated by
> the Link Training bit in the Link Status register).
> 
> This has been observed with various PCIe switches attached to a BCM2711
> (Raspberry Pi CM4):  The issue is most pronounced with the Pericom
> PI7C9X2G404SV, but has also occasionally been witnessed with the Pericom
> PI7C9X2G404SL and ASMedia ASM1184e.
> 
> Check the Link Training bit in addition to the PCIE_MISC_PCIE_STATUS
> register before performing downstream accesses.
> 
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: <stable@vger.kernel.org>
> ---
>  drivers/pci/controller/pcie-brcmstb.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
> index f593a422bd63..b4abfced8e9b 100644
> --- a/drivers/pci/controller/pcie-brcmstb.c
> +++ b/drivers/pci/controller/pcie-brcmstb.c
> @@ -679,8 +679,10 @@ static bool brcm_pcie_link_up(struct brcm_pcie *pcie)
>  	u32 val = readl(pcie->base + PCIE_MISC_PCIE_STATUS);
>  	u32 dla = FIELD_GET(PCIE_MISC_PCIE_STATUS_PCIE_DL_ACTIVE_MASK, val);
>  	u32 plu = FIELD_GET(PCIE_MISC_PCIE_STATUS_PCIE_PHYLINKUP_MASK, val);
> +	u16 lnksta = readw(pcie->base + BRCM_PCIE_CAP_REGS + PCI_EXP_LNKSTA);
> +	u16 lt = FIELD_GET(PCI_EXP_LNKSTA_LT, lnksta);
>  
> -	return dla && plu;
> +	return dla && plu && !lt;
>  }
>  
>  static void __iomem *brcm_pcie_map_bus(struct pci_bus *bus,
> -- 
> 2.39.2
>

Florian Fainelli Aug. 7, 2023, 2:48 a.m. UTC | #3

On 8/6/2023 2:43 PM, Pali Rohár wrote:
> On Sunday 06 August 2023 06:44:50 Lukas Wunner wrote:
>> The Broadcom Set Top Box PCIe controller signals an Asynchronous SError
>> Interrupt
> 
> This is little incorrect wording. PCIe controller cannot send Async
> SError, this is ARMv8 specific thing. In this case PCIe controller is
> connected to ARM core via AXI bus and on PCIe transaction timeout it
> sends AXI Slave Error, which then ARMv8 core reports to kernel as Async
> SError interrupt.

That is indeed a better way to explain the issue. FWIW, on BCM2711 the 
PCIe core connects via SCB and then AXI towards the ARMv8 CPU, does not 
change a thing about your paragraph.

> 
> The proper fix is to configure PCIe controller to never send AXI Slave
> Error and neither AXI Decode Error (to prevent SErrors at all). For
> example Synopsys PCIe controllers have proprietary hidden configuration
> bits for enabling/disabling this AXI error reporting behavior.

That does not exist with the version of the block present in BCM2711 
unfortunately.

> 
> Or second option is to access affected memory from the ARMv8 core via
> synchronous operations and map memory as nGnRnE. Then ARMv8 core reports
> AXI Slave Error as Synchronous Abort Exception which you can catch,
> examine that was caused on PCIe memory region and fabricate all-ones
> response. But the second option is not available for some licensed ARMv8
> Cortex cores (e.g. A53) as they do not implement nE (non Early Write
> Acknowledgement) memory mapping correctly.

BCM2711 uses Cortex-A72 do these cores still not implement nE correctly? 
Do you have a reference backing up that claim (not disputing it, just 
curious).

> 
> The patch below does not fix the issue at all, instead it opens a new
> race condition that if link state is changed _after_ the check and
> _before_ accessing config space.

Fair enough, but in the situation you describe there is not much that 
can be done anyway so we might as well deal with a narrowed window?

Thanks for reviewing.

Pali Rohár Aug. 7, 2023, 7:13 a.m. UTC | #4

On Sunday 06 August 2023 19:48:59 Florian Fainelli wrote:
> On 8/6/2023 2:43 PM, Pali Rohár wrote:
> > On Sunday 06 August 2023 06:44:50 Lukas Wunner wrote:
> > > The Broadcom Set Top Box PCIe controller signals an Asynchronous SError
> > > Interrupt
> > 
> > This is little incorrect wording. PCIe controller cannot send Async
> > SError, this is ARMv8 specific thing. In this case PCIe controller is
> > connected to ARM core via AXI bus and on PCIe transaction timeout it
> > sends AXI Slave Error, which then ARMv8 core reports to kernel as Async
> > SError interrupt.
> 
> That is indeed a better way to explain the issue. FWIW, on BCM2711 the PCIe
> core connects via SCB and then AXI towards the ARMv8 CPU, does not change a
> thing about your paragraph.

Ok, it is better describe the issue correctly.

> > 
> > The proper fix is to configure PCIe controller to never send AXI Slave
> > Error and neither AXI Decode Error (to prevent SErrors at all). For
> > example Synopsys PCIe controllers have proprietary hidden configuration
> > bits for enabling/disabling this AXI error reporting behavior.
> 
> That does not exist with the version of the block present in BCM2711
> unfortunately.

I was expecting such answer. I think that we have been discussing this
issue privately more months ago.

> > 
> > Or second option is to access affected memory from the ARMv8 core via
> > synchronous operations and map memory as nGnRnE. Then ARMv8 core reports
> > AXI Slave Error as Synchronous Abort Exception which you can catch,
> > examine that was caused on PCIe memory region and fabricate all-ones
> > response. But the second option is not available for some licensed ARMv8
> > Cortex cores (e.g. A53) as they do not implement nE (non Early Write
> > Acknowledgement) memory mapping correctly.
> 
> BCM2711 uses Cortex-A72 do these cores still not implement nE correctly? Do
> you have a reference backing up that claim (not disputing it, just curious).

I remember that I read this in Cortex-A53 documentation. Let me find it.

It is here:
ARM Cortex-A53 MPCore Processor Technical Reference Manual r0p3 - Support for v8 memory types:
https://developer.arm.com/documentation/ddi0500/e/level-1-memory-system/support-for-v8-memory-types

"nGnRnE - Treated the same as nGnRE inside a Cortex-A53 processor"

So really, disabling Early-acknowledge has no effect on A53.

> > 
> > The patch below does not fix the issue at all, instead it opens a new
> > race condition that if link state is changed _after_ the check and
> > _before_ accessing config space.
> 
> Fair enough, but in the situation you describe there is not much that can be
> done anyway so we might as well deal with a narrowed window?

Unless there is hidden/proprietary/undocumented bits for PCIe controller
to disable generating AXI errors, or ability to use only synchronous
access to affected memory from ARM core (e.g. by nE) there is really not
much what can be done here. I can just say that this is badly designed HW.

One more thing, if I remember correctly, ARMv8 specifies that Syndrome
register which delivers SError has some bits reserved for optional
feature - target AXI slave id which sent AXI Error. A53 does not
implement it and I highly doubt that other licensed ARM cores have them.
But maybe you could check A72. But even with that I'm really not sure if
it could be possible to catch SError, detect that it was sent by AXI
slave corresponding to PCIe controller and ignore it.

> Thanks for reviewing.
> -- 
> Florian

Pali Rohár Aug. 7, 2023, 7:28 a.m. UTC | #5

On Monday 07 August 2023 09:13:58 Pali Rohár wrote:
> On Sunday 06 August 2023 19:48:59 Florian Fainelli wrote:
> > On 8/6/2023 2:43 PM, Pali Rohár wrote:
> > > On Sunday 06 August 2023 06:44:50 Lukas Wunner wrote:
> > > > The Broadcom Set Top Box PCIe controller signals an Asynchronous SError
> > > > Interrupt
> > > 
> > > This is little incorrect wording. PCIe controller cannot send Async
> > > SError, this is ARMv8 specific thing. In this case PCIe controller is
> > > connected to ARM core via AXI bus and on PCIe transaction timeout it
> > > sends AXI Slave Error, which then ARMv8 core reports to kernel as Async
> > > SError interrupt.
> > 
> > That is indeed a better way to explain the issue. FWIW, on BCM2711 the PCIe
> > core connects via SCB and then AXI towards the ARMv8 CPU, does not change a
> > thing about your paragraph.
> 
> Ok, it is better describe the issue correctly.
> 
> > > 
> > > The proper fix is to configure PCIe controller to never send AXI Slave
> > > Error and neither AXI Decode Error (to prevent SErrors at all). For
> > > example Synopsys PCIe controllers have proprietary hidden configuration
> > > bits for enabling/disabling this AXI error reporting behavior.
> > 
> > That does not exist with the version of the block present in BCM2711
> > unfortunately.
> 
> I was expecting such answer. I think that we have been discussing this
> issue privately more months ago.
> 
> > > 
> > > Or second option is to access affected memory from the ARMv8 core via
> > > synchronous operations and map memory as nGnRnE. Then ARMv8 core reports
> > > AXI Slave Error as Synchronous Abort Exception which you can catch,
> > > examine that was caused on PCIe memory region and fabricate all-ones
> > > response. But the second option is not available for some licensed ARMv8
> > > Cortex cores (e.g. A53) as they do not implement nE (non Early Write
> > > Acknowledgement) memory mapping correctly.
> > 
> > BCM2711 uses Cortex-A72 do these cores still not implement nE correctly? Do
> > you have a reference backing up that claim (not disputing it, just curious).
> 
> I remember that I read this in Cortex-A53 documentation. Let me find it.
> 
> It is here:
> ARM Cortex-A53 MPCore Processor Technical Reference Manual r0p3 - Support for v8 memory types:
> https://developer.arm.com/documentation/ddi0500/e/level-1-memory-system/support-for-v8-memory-types
> 
> "nGnRnE - Treated the same as nGnRE inside a Cortex-A53 processor"
> 
> So really, disabling Early-acknowledge has no effect on A53.
> 
> > > 
> > > The patch below does not fix the issue at all, instead it opens a new
> > > race condition that if link state is changed _after_ the check and
> > > _before_ accessing config space.
> > 
> > Fair enough, but in the situation you describe there is not much that can be
> > done anyway so we might as well deal with a narrowed window?
> 
> Unless there is hidden/proprietary/undocumented bits for PCIe controller
> to disable generating AXI errors, or ability to use only synchronous
> access to affected memory from ARM core (e.g. by nE) there is really not
> much what can be done here. I can just say that this is badly designed HW.
> 
> One more thing, if I remember correctly, ARMv8 specifies that Syndrome
> register which delivers SError has some bits reserved for optional
> feature - target AXI slave id which sent AXI Error. A53 does not
> implement it and I highly doubt that other licensed ARM cores have them.
> But maybe you could check A72. But even with that I'm really not sure if
> it could be possible to catch SError, detect that it was sent by AXI
> slave corresponding to PCIe controller and ignore it.
> 
> > Thanks for reviewing.
> > -- 
> > Florian

One more idea. Does platform have some DMA controller? If yes, maybe it
could be possible to completely avoid access to that affected memory
from ARM core and instead access its content via DMA controller? DMA
controller could be better and does not signal slave errors as SError to
ARM core.

Jim Quinlan Aug. 14, 2023, 7:55 p.m. UTC | #6

On Sun, Aug 6, 2023 at 11:16 AM Florian Fainelli
<florian.fainelli@broadcom.com> wrote:
>
>
>
> On 8/5/2023 9:44 PM, Lukas Wunner wrote:
> > The Broadcom Set Top Box PCIe controller signals an Asynchronous SError
> > Interrupt and thus causes a kernel panic when non-posted transactions
> > time out.  This differs from most other PCIe controllers which return a
> > fabricated "all ones" response instead.
> >
> > To avoid gratuitous kernel panics, the driver reads the link status from
> > a proprietary PCIE_MISC_PCIE_STATUS register and skips downstream
> > accesses if the link is down.
> >
> > However the bits in the proprietary register may purport that the link
> > is up even though link training is still in progress (as indicated by
> > the Link Training bit in the Link Status register).
> >
> > This has been observed with various PCIe switches attached to a BCM2711
> > (Raspberry Pi CM4):  The issue is most pronounced with the Pericom
> > PI7C9X2G404SV, but has also occasionally been witnessed with the Pericom
> > PI7C9X2G404SL and ASMedia ASM1184e.
> >
> > Check the Link Training bit in addition to the PCIE_MISC_PCIE_STATUS
> > register before performing downstream accesses.
> >
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > Cc: <stable@vger.kernel.org>
>
> Since you CC stable, it would be neat to provide a Fixes: tag here, most
> likely this dates back from the original commit adding support for this
> driver.
>
> > ---
> >   drivers/pci/controller/pcie-brcmstb.c | 4 +++-
> >   1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
> > index f593a422bd63..b4abfced8e9b 100644
> > --- a/drivers/pci/controller/pcie-brcmstb.c
> > +++ b/drivers/pci/controller/pcie-brcmstb.c
> > @@ -679,8 +679,10 @@ static bool brcm_pcie_link_up(struct brcm_pcie *pcie)
> >       u32 val = readl(pcie->base + PCIE_MISC_PCIE_STATUS);
> >       u32 dla = FIELD_GET(PCIE_MISC_PCIE_STATUS_PCIE_DL_ACTIVE_MASK, val);
> >       u32 plu = FIELD_GET(PCIE_MISC_PCIE_STATUS_PCIE_PHYLINKUP_MASK, val);
> > +     u16 lnksta = readw(pcie->base + BRCM_PCIE_CAP_REGS + PCI_EXP_LNKSTA);
> > +     u16 lt = FIELD_GET(PCI_EXP_LNKSTA_LT, lnksta);
> >
> > -     return dla && plu;
> > +     return dla && plu && !lt;
>
> The patch looks good to me in spirit though I want Jim to review and

This looks good to me but please allow me a few days to do some
testing -- I'm currently in Covid prison.

Regards,
Jim Quinlan
Broadcom STB

> provide feedback here since he was looking at this sort of issues not so
> long ago.
>
> Thanks for tracking this down!
> --
> Florian

Bjorn Helgaas Oct. 24, 2023, 12:51 a.m. UTC | #7

On Sun, Aug 06, 2023 at 06:44:50AM +0200, Lukas Wunner wrote:
> The Broadcom Set Top Box PCIe controller signals an Asynchronous SError
> Interrupt and thus causes a kernel panic when non-posted transactions
> time out.  This differs from most other PCIe controllers which return a
> fabricated "all ones" response instead.
> 
> To avoid gratuitous kernel panics, the driver reads the link status from
> a proprietary PCIE_MISC_PCIE_STATUS register and skips downstream
> accesses if the link is down.
> 
> However the bits in the proprietary register may purport that the link
> is up even though link training is still in progress (as indicated by
> the Link Training bit in the Link Status register).
> 
> This has been observed with various PCIe switches attached to a BCM2711
> (Raspberry Pi CM4):  The issue is most pronounced with the Pericom
> PI7C9X2G404SV, but has also occasionally been witnessed with the Pericom
> PI7C9X2G404SL and ASMedia ASM1184e.

So somebody is seeing kernel panics when these switches are connected?
Do we have pointers to those reports that we can reference here?

> Check the Link Training bit in addition to the PCIE_MISC_PCIE_STATUS
> register before performing downstream accesses.

I guess the theory is that link training takes longer than usual with
these devices?  Is the idea here that we wait longer in
brcm_pcie_start_link()?

Or is it that we avoid config accesses to downstream devices while the
link is not yet up?  This seems like it would be problematic (see
below).

> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: <stable@vger.kernel.org>
> ---
>  drivers/pci/controller/pcie-brcmstb.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
> index f593a422bd63..b4abfced8e9b 100644
> --- a/drivers/pci/controller/pcie-brcmstb.c
> +++ b/drivers/pci/controller/pcie-brcmstb.c
> @@ -679,8 +679,10 @@ static bool brcm_pcie_link_up(struct brcm_pcie *pcie)
>  	u32 val = readl(pcie->base + PCIE_MISC_PCIE_STATUS);
>  	u32 dla = FIELD_GET(PCIE_MISC_PCIE_STATUS_PCIE_DL_ACTIVE_MASK, val);
>  	u32 plu = FIELD_GET(PCIE_MISC_PCIE_STATUS_PCIE_PHYLINKUP_MASK, val);
> +	u16 lnksta = readw(pcie->base + BRCM_PCIE_CAP_REGS + PCI_EXP_LNKSTA);
> +	u16 lt = FIELD_GET(PCI_EXP_LNKSTA_LT, lnksta);
>  
> -	return dla && plu;
> +	return dla && plu && !lt;

It looks like this will make config accesses to downstream devices
fail while PCI_EXP_LNKSTA_LT is set by making brcm_pcie_link_up()
return false, which makes brcm_pcie_map_bus() return NULL, which will
make pci_generic_config_read() return PCIBIOS_DEVICE_NOT_FOUND without
attempting the config read.

So this should avoid the SError (mostly, at least; I'm sure this is
still racy), but what about the config access?  Presumably the caller
depends on it happening, and it sounds like it *would* happen if we
tried a little later.  I don't think we can count on the caller to
retry a failed access, e.g., enumeration config reads that return ~0
are just interpreted as "there's no device here."

Maybe the real issue is that we need to make brcm_pcie_start_link()
wait longer, where we aren't attempting a config read?

Jim, are you still interested in testing this?

>  }
>  
>  static void __iomem *brcm_pcie_map_bus(struct pci_bus *bus,
> -- 
> 2.39.2
>

Florian Fainelli Oct. 24, 2023, 1:13 a.m. UTC | #8

On 8/7/2023 12:28 AM, Pali Rohár wrote:
> On Monday 07 August 2023 09:13:58 Pali Rohár wrote:
>> On Sunday 06 August 2023 19:48:59 Florian Fainelli wrote:
>>> On 8/6/2023 2:43 PM, Pali Rohár wrote:
>>>> On Sunday 06 August 2023 06:44:50 Lukas Wunner wrote:
>>>>> The Broadcom Set Top Box PCIe controller signals an Asynchronous SError
>>>>> Interrupt
>>>>
>>>> This is little incorrect wording. PCIe controller cannot send Async
>>>> SError, this is ARMv8 specific thing. In this case PCIe controller is
>>>> connected to ARM core via AXI bus and on PCIe transaction timeout it
>>>> sends AXI Slave Error, which then ARMv8 core reports to kernel as Async
>>>> SError interrupt.
>>>
>>> That is indeed a better way to explain the issue. FWIW, on BCM2711 the PCIe
>>> core connects via SCB and then AXI towards the ARMv8 CPU, does not change a
>>> thing about your paragraph.
>>
>> Ok, it is better describe the issue correctly.
>>
>>>>
>>>> The proper fix is to configure PCIe controller to never send AXI Slave
>>>> Error and neither AXI Decode Error (to prevent SErrors at all). For
>>>> example Synopsys PCIe controllers have proprietary hidden configuration
>>>> bits for enabling/disabling this AXI error reporting behavior.
>>>
>>> That does not exist with the version of the block present in BCM2711
>>> unfortunately.
>>
>> I was expecting such answer. I think that we have been discussing this
>> issue privately more months ago.
>>
>>>>
>>>> Or second option is to access affected memory from the ARMv8 core via
>>>> synchronous operations and map memory as nGnRnE. Then ARMv8 core reports
>>>> AXI Slave Error as Synchronous Abort Exception which you can catch,
>>>> examine that was caused on PCIe memory region and fabricate all-ones
>>>> response. But the second option is not available for some licensed ARMv8
>>>> Cortex cores (e.g. A53) as they do not implement nE (non Early Write
>>>> Acknowledgement) memory mapping correctly.
>>>
>>> BCM2711 uses Cortex-A72 do these cores still not implement nE correctly? Do
>>> you have a reference backing up that claim (not disputing it, just curious).
>>
>> I remember that I read this in Cortex-A53 documentation. Let me find it.
>>
>> It is here:
>> ARM Cortex-A53 MPCore Processor Technical Reference Manual r0p3 - Support for v8 memory types:
>> https://developer.arm.com/documentation/ddi0500/e/level-1-memory-system/support-for-v8-memory-types
>>
>> "nGnRnE - Treated the same as nGnRE inside a Cortex-A53 processor"
>>
>> So really, disabling Early-acknowledge has no effect on A53.
>>
>>>>
>>>> The patch below does not fix the issue at all, instead it opens a new
>>>> race condition that if link state is changed _after_ the check and
>>>> _before_ accessing config space.
>>>
>>> Fair enough, but in the situation you describe there is not much that can be
>>> done anyway so we might as well deal with a narrowed window?
>>
>> Unless there is hidden/proprietary/undocumented bits for PCIe controller
>> to disable generating AXI errors, or ability to use only synchronous
>> access to affected memory from ARM core (e.g. by nE) there is really not
>> much what can be done here. I can just say that this is badly designed HW.
>>
>> One more thing, if I remember correctly, ARMv8 specifies that Syndrome
>> register which delivers SError has some bits reserved for optional
>> feature - target AXI slave id which sent AXI Error. A53 does not
>> implement it and I highly doubt that other licensed ARM cores have them.
>> But maybe you could check A72. But even with that I'm really not sure if
>> it could be possible to catch SError, detect that it was sent by AXI
>> slave corresponding to PCIe controller and ignore it.
>>
>>> Thanks for reviewing.
>>> -- 
>>> Florian
> 
> One more idea. Does platform have some DMA controller? If yes, maybe it
> could be possible to completely avoid access to that affected memory
> from ARM core and instead access its content via DMA controller? DMA
> controller could be better and does not signal slave errors as SError to
> ARM core.

Looks like I never got back to you on that, yes 2711 has 15 programmable 
DMA controllers (drivers/dma/bcm2835-dma.c) which could be an option 
here, though since the PCIe bridge is supposed to be transparent, I am 
not sure how we could hook it in into memory cycles?

Would not that require faulting (from a MMU perspective) on the PCIe 
outbound window to then go and kick off the DMA engine to transfer data 
around?

Bjorn Helgaas Dec. 8, 2023, 7:53 p.m. UTC | #9

On Sun, Aug 06, 2023 at 06:44:50AM +0200, Lukas Wunner wrote:
> The Broadcom Set Top Box PCIe controller signals an Asynchronous SError
> Interrupt and thus causes a kernel panic when non-posted transactions
> time out.  This differs from most other PCIe controllers which return a
> fabricated "all ones" response instead.
> 
> To avoid gratuitous kernel panics, the driver reads the link status from
> a proprietary PCIE_MISC_PCIE_STATUS register and skips downstream
> accesses if the link is down.
> 
> However the bits in the proprietary register may purport that the link
> is up even though link training is still in progress (as indicated by
> the Link Training bit in the Link Status register).
> 
> This has been observed with various PCIe switches attached to a BCM2711
> (Raspberry Pi CM4):  The issue is most pronounced with the Pericom
> PI7C9X2G404SV, but has also occasionally been witnessed with the Pericom
> PI7C9X2G404SL and ASMedia ASM1184e.
> 
> Check the Link Training bit in addition to the PCIE_MISC_PCIE_STATUS
> register before performing downstream accesses.
> 
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: <stable@vger.kernel.org>

Since there was a fair bit of discussion and no obvious conclusion,
I'm dropping this for now.  Please repost if appropriate and maybe we
can get some testing/reviews/acks.

> ---
>  drivers/pci/controller/pcie-brcmstb.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
> index f593a422bd63..b4abfced8e9b 100644
> --- a/drivers/pci/controller/pcie-brcmstb.c
> +++ b/drivers/pci/controller/pcie-brcmstb.c
> @@ -679,8 +679,10 @@ static bool brcm_pcie_link_up(struct brcm_pcie *pcie)
>  	u32 val = readl(pcie->base + PCIE_MISC_PCIE_STATUS);
>  	u32 dla = FIELD_GET(PCIE_MISC_PCIE_STATUS_PCIE_DL_ACTIVE_MASK, val);
>  	u32 plu = FIELD_GET(PCIE_MISC_PCIE_STATUS_PCIE_PHYLINKUP_MASK, val);
> +	u16 lnksta = readw(pcie->base + BRCM_PCIE_CAP_REGS + PCI_EXP_LNKSTA);
> +	u16 lt = FIELD_GET(PCI_EXP_LNKSTA_LT, lnksta);
>  
> -	return dla && plu;
> +	return dla && plu && !lt;
>  }
>  
>  static void __iomem *brcm_pcie_map_bus(struct pci_bus *bus,
> -- 
> 2.39.2
>

PCI: brcmstb: Avoid downstream access during link training

Commit Message

Comments

Patch