[V2] PCI: rcar: Add L1 link state fix into data abort hook

Message ID	20200926160934.136182-1-marek.vasut@gmail.com (mailing list archive)
State	Superseded, archived
Delegated to:	Lorenzo Pieralisi
Headers	Return-Path: <SRS0=/pen=DD=vger.kernel.org=linux-pci-owner@kernel.org>
	From: marek.vasut@gmail.com
	To: linux-pci@vger.kernel.org
	Cc: Marek Vasut <marek.vasut+renesas@gmail.com>, Bjorn Helgaas <bhelgaas@google.com>, Geert Uytterhoeven <geert+renesas@glider.be>, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>, Wolfram Sang <wsa@the-dreams.de>, Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>, linux-renesas-soc@vger.kernel.org
	Subject: [PATCH V2] PCI: rcar: Add L1 link state fix into data abort hook
	Date: Sat, 26 Sep 2020 18:09:34 +0200
	Message-Id: <20200926160934.136182-1-marek.vasut@gmail.com>
	MIME-Version: 1.0
	Content-Transfer-Encoding: 8bit
	Precedence: bulk

Marek Vasut Sept. 26, 2020, 4:09 p.m. UTC

From: Marek Vasut <marek.vasut+renesas@gmail.com>

The R-Car PCIe controller is capable of handling L0s/L1 link states.
While the controller can enter and exit L0s link state, and exit L1
link state, without any additional action from the driver, to enter
L1 link state, the driver must complete the link state transition by
issuing additional commands to the controller.

The problem is, this transition is not atomic. The controller sets
PMEL1RX bit in PMSR register upon reception of PM_ENTER_L1 DLLP from
the PCIe card, but then the controller enters some sort of inbetween
state. The driver must detect this condition and complete the link
state transition, by setting L1IATN bit in PMCTLR and waiting for
the link state transition to complete.

If a PCIe access happens inside of this window, where the controller
is inbetween L0 and L1 link states, the access generates a fault and
the ARM 'imprecise external abort' handler is invoked.

Just like other PCI controller drivers, here we hook the fault handler,
perform the fixup to help the controller enter L1 link state, and then
restart the instruction which triggered the fault. Since the controller
is in L1 link state now, the link can exit from L1 link state to L0 and
successfully complete the access.

Note that this fixup is applicable only to Aarch32 R-Car controllers,
the Aarch64 R-Car perform the same fixup in TFA, see TFA commit [1]
0969397f2 ("rcar_gen3: plat: Prevent PCIe hang during L1X config access")
[1] https://github.com/ARM-software/arm-trusted-firmware/commit/0969397f295621aa26b3d14b76dd397d22be58bf

Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Wolfram Sang <wsa@the-dreams.de>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: linux-renesas-soc@vger.kernel.org
---
V2: - Update commit message, add link to TFA repository commit
    - Handle the LPAE case as in ARM fault.c and fsr-{2,3}level.c
    - Cache clock and check whether they are enabled before register
      access
---
 drivers/pci/controller/pcie-rcar-host.c | 65 +++++++++++++++++++++++++
 drivers/pci/controller/pcie-rcar.h      |  7 +++
 2 files changed, 72 insertions(+)
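
For readers following the thread, the fixup sequence described in the commit message can be modelled in user space. This is only a sketch under stated assumptions: the bit positions, the `model_pmsr`/`model_pmctlr` variables and the `model_*` helpers are placeholders invented for illustration and are not taken from the hardware manual or the driver. Only the names PMSR, PMCTLR, PMEL1RX, PMSTATE, PMSTATE_L1, L1IATN and L1FAEG come from the patch.

```c
#include <stdint.h>

/* Placeholder bit layout for the model, NOT taken from the hardware manual. */
#define L1FAEG      (1u << 31)  /* PMSR: L1 entry finished */
#define PMEL1RX     (1u << 23)  /* PMSR: PM_ENTER_L1 DLLP received */
#define PMSTATE     (7u << 16)  /* PMSR: current link state field */
#define PMSTATE_L1  (3u << 16)
#define L1IATN      (1u << 31)  /* PMCTLR: complete the L1 transition */

/* Hypothetical software stand-ins for the two controller registers. */
static uint32_t model_pmsr;
static uint32_t model_pmctlr;

static uint32_t model_read_pmsr(void)
{
	return model_pmsr;
}

/* In this model, writing L1IATN immediately completes the transition. */
static void model_write_pmctlr(uint32_t val)
{
	model_pmctlr = val;
	if (val & L1IATN)
		model_pmsr = (model_pmsr & ~PMSTATE) | PMSTATE_L1 | L1FAEG;
}

/* The status bits are modelled as write-one-to-clear (an assumption). */
static void model_write_pmsr(uint32_t val)
{
	model_pmsr &= ~(val & (L1FAEG | PMEL1RX));
}

/*
 * Mirrors the control flow of the abort handler in the patch:
 * returns 0 when the fixup was applied, 1 when the abort is not ours.
 */
static int l1_fixup(void)
{
	uint32_t pmsr = model_read_pmsr();

	if ((pmsr & PMEL1RX) && ((pmsr & PMSTATE) != PMSTATE_L1)) {
		model_write_pmctlr(L1IATN);
		while (!(model_read_pmsr() & L1FAEG))
			;
		model_write_pmsr(L1FAEG | PMEL1RX);
		return 0;
	}

	return 1;
}
```

Starting from a state where only PMEL1RX is set, `l1_fixup()` moves the model into L1 and clears both status bits. A second call finds nothing to fix and reports the abort as unhandled, which is the case where the real handler would let the kernel deliver SIGBUS.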

Sergey Shtylyov Sept. 27, 2020, 8:29 a.m. UTC | #1

Hello!

On 26.09.2020 19:09, marek.vasut@gmail.com wrote:

> From: Marek Vasut <marek.vasut+renesas@gmail.com>
> 
> The R-Car PCIe controller is capable of handling L0s/L1 link states.
> While the controller can enter and exit L0s link state, and exit L1
> link state, without any additional action from the driver, to enter
> L1 link state, the driver must complete the link state transition by
> issuing additional commands to the controller.
> 
> The problem is, this transition is not atomic. The controller sets
> PMEL1RX bit in PMSR register upon reception of PM_ENTER_L1 DLLP from
> the PCIe card, but then the controller enters some sort of inbetween
> state. The driver must detect this condition and complete the link
> state transition, by setting L1IATN bit in PMCTLR and waiting for
> the link state transition to complete.
> 
> If a PCIe access happens inside of this window, where the controller

     s/of//.

> is inbetween L0 and L1 link states, the access generates a fault and

    My spellchecker trips on "inbetween"...

> the ARM 'imprecise external abort' handler is invoked.
> 
> Just like other PCI controller drivers, here we hook the fault handler,
> perform the fixup to help the controller enter L1 link state, and then
> restart the instruction which triggered the fault. Since the controller

    If this is imprecise or async external abort, how we can re-execute the 
instruction that triggered the fault? It's been probably executed already, no?

> is in L1 link state now, the link can exit from L1 link state to L0 and
> successfully complete the access.
> 
> Note that this fixup is applicable only to Aarch32 R-Car controllers,
> the Aarch64 R-Car perform the same fixup in TFA, see TFA commit [1]
> 0969397f2 ("rcar_gen3: plat: Prevent PCIe hang during L1X config access")
> [1] https://github.com/ARM-software/arm-trusted-firmware/commit/0969397f295621aa26b3d14b76dd397d22be58bf
> 
> Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Geert Uytterhoeven <geert+renesas@glider.be>
> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Cc: Wolfram Sang <wsa@the-dreams.de>
> Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> Cc: linux-renesas-soc@vger.kernel.org
[...]
> +static int __init rcar_pcie_init(void)
> +{
> +#ifdef CONFIG_ARM_LPAE
> +	hook_fault_code(17, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> +			"asynchronous external abort");
> +#else
> +	hook_fault_code(22, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> +			"imprecise external abort");
> +#endif
> +	return platform_driver_register(&rcar_pcie_driver);
> +}
> +device_initcall(rcar_pcie_init);
> +#else
>   builtin_platform_driver(rcar_pcie_driver);
> +#endif
[...]

MBR, Sergei

Geert Uytterhoeven Sept. 28, 2020, 9:35 a.m. UTC | #2

Hi Marek,

On Sat, Sep 26, 2020 at 6:09 PM <marek.vasut@gmail.com> wrote:
> From: Marek Vasut <marek.vasut+renesas@gmail.com>
>
> The R-Car PCIe controller is capable of handling L0s/L1 link states.
> While the controller can enter and exit L0s link state, and exit L1
> link state, without any additional action from the driver, to enter
> L1 link state, the driver must complete the link state transition by
> issuing additional commands to the controller.
>
> The problem is, this transition is not atomic. The controller sets
> PMEL1RX bit in PMSR register upon reception of PM_ENTER_L1 DLLP from
> the PCIe card, but then the controller enters some sort of inbetween
> state. The driver must detect this condition and complete the link
> state transition, by setting L1IATN bit in PMCTLR and waiting for
> the link state transition to complete.
>
> If a PCIe access happens inside of this window, where the controller
> is inbetween L0 and L1 link states, the access generates a fault and
> the ARM 'imprecise external abort' handler is invoked.
>
> Just like other PCI controller drivers, here we hook the fault handler,
> perform the fixup to help the controller enter L1 link state, and then
> restart the instruction which triggered the fault. Since the controller
> is in L1 link state now, the link can exit from L1 link state to L0 and
> successfully complete the access.
>
> Note that this fixup is applicable only to Aarch32 R-Car controllers,
> the Aarch64 R-Car perform the same fixup in TFA, see TFA commit [1]
> 0969397f2 ("rcar_gen3: plat: Prevent PCIe hang during L1X config access")
> [1] https://github.com/ARM-software/arm-trusted-firmware/commit/0969397f295621aa26b3d14b76dd397d22be58bf
>
> Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Geert Uytterhoeven <geert+renesas@glider.be>
> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Cc: Wolfram Sang <wsa@the-dreams.de>
> Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> Cc: linux-renesas-soc@vger.kernel.org
> ---
> V2: - Update commit message, add link to TFA repository commit
>     - Handle the LPAE case as in ARM fault.c and fsr-{2,3}level.c
>     - Cache clock and check whether they are enabled before register
>       access

Thanks for the update!

> --- a/drivers/pci/controller/pcie-rcar-host.c
> +++ b/drivers/pci/controller/pcie-rcar-host.c
> @@ -13,6 +13,7 @@
>
>  #include <linux/bitops.h>
>  #include <linux/clk.h>
> +#include <linux/clk-provider.h>
>  #include <linux/delay.h>
>  #include <linux/interrupt.h>
>  #include <linux/irq.h>
> @@ -42,6 +43,21 @@ struct rcar_msi {
>         int irq2;
>  };
>
> +#ifdef CONFIG_ARM
> +/*
> + * Here we keep a static copy of the remapped PCIe controller address.
> + * This is only used on aarch32 systems, all of which have one single
> + * PCIe controller, to provide quick access to the PCIe controller in
> + * the L1 link state fixup function, called from ARM fault handler.

the ARM fault handler

> + */
> +static void __iomem *pcie_base;
> +/*
> + * Static copy of bus clock pointer, so we can check whether the clock
> + * are enabled or not.

is

> + */
> +static struct clk *pcie_bus_clk;
> +#endif

> @@ -1050,4 +1072,47 @@ static struct platform_driver rcar_pcie_driver = {
>         },
>         .probe = rcar_pcie_probe,
>  };
> +
> +#ifdef CONFIG_ARM
> +static int rcar_pcie_aarch32_abort_handler(unsigned long addr,
> +               unsigned int fsr, struct pt_regs *regs)
> +{
> +       u32 pmsr;
> +
> +       if (!pcie_base || !pcie_bus_clk || !__clk_is_enabled(pcie_bus_clk))

As __clk_is_enabled() handles a NULL pointer parameter, the second
test is not needed.

> +               return 1;
> +
> +       pmsr = readl(pcie_base + PMSR);
> +
> +       /*
> +        * Test if the PCIe controller received PM_ENTER_L1 DLLP and
> +        * the PCIe controller is not in L1 link state. If true, apply
> +        * fix, which will put the controller into L1 link state, from
> +        * which it can return to L0s/L0 on its own.
> +        */
> +       if ((pmsr & PMEL1RX) && ((pmsr & PMSTATE) != PMSTATE_L1)) {
> +               writel(L1IATN, pcie_base + PMCTLR);
> +               while (!(readl(pcie_base + PMSR) & L1FAEG))
> +                       ;

To answer my own question from previous review cycle: this takes 7
cycles on r8a7791/koelsch.

> +               writel(L1FAEG | PMEL1RX, pcie_base + PMSR);
> +               return 0;
> +       }
> +
> +       return 1;
> +}
> +
> +static int __init rcar_pcie_init(void)
> +{
> +#ifdef CONFIG_ARM_LPAE
> +       hook_fault_code(17, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> +                       "asynchronous external abort");
> +#else
> +       hook_fault_code(22, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> +                       "imprecise external abort");
> +#endif

As there can be only a single handler, this may interfere with a handler
for another platform in a multi-platform kernel.
Hence I think this should not be done unconditionally, but be moved to
the driver's .probe() callback.

> +       return platform_driver_register(&rcar_pcie_driver);
> +}
> +device_initcall(rcar_pcie_init);
> +#else
>  builtin_platform_driver(rcar_pcie_driver);
> +#endif

As this fixes the crash seen during s2ram with an Intel E1000E card
present and the e1000e driver loaded:
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

Marek Vasut Oct. 4, 2020, 2:10 p.m. UTC | #3

On 9/27/20 10:29 AM, Sergei Shtylyov wrote:
[...]
>> the ARM 'imprecise external abort' handler is invoked.
>>
>> Just like other PCI controller drivers, here we hook the fault handler,
>> perform the fixup to help the controller enter L1 link state, and then
>> restart the instruction which triggered the fault. Since the controller
> 
>    If this is imprecise or async external abort, how we can re-execute
> the instruction that triggered the fault? It's been probably executed
> already, no?

It has been executed and triggered the fault, because it could not
access the data across the bus. Now the bus is back in operational
state, so restart the instruction, let it access the data and do its task.

[...]

Marek Vasut Oct. 4, 2020, 2:14 p.m. UTC | #4

On 9/28/20 11:35 AM, Geert Uytterhoeven wrote:
[...]
>> +static int __init rcar_pcie_init(void)
>> +{
>> +#ifdef CONFIG_ARM_LPAE
>> +       hook_fault_code(17, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
>> +                       "asynchronous external abort");
>> +#else
>> +       hook_fault_code(22, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
>> +                       "imprecise external abort");
>> +#endif
> 
> As there can be only a single handler, this may interfere with a handler
> for another platform in a multi-platform kernel.
> Hence I think this should not be done unconditionally, but be moved to
> the driver's .probe() callback.

Why is nobody doing this in the probe code then ? It seems all the other
drivers which hook fault code do it in init as well. I can imagine that
something might trip the fault handler even before probe is called, e.g.
some PM handling or simply user accessing that PCIe area using setpci.

[...]

Geert Uytterhoeven Oct. 5, 2020, 7:31 a.m. UTC | #5

Hi Marek,

On Sun, Oct 4, 2020 at 4:16 PM Marek Vasut <marek.vasut@gmail.com> wrote:
> On 9/28/20 11:35 AM, Geert Uytterhoeven wrote:
> [...]
> >> +static int __init rcar_pcie_init(void)
> >> +{
> >> +#ifdef CONFIG_ARM_LPAE
> >> +       hook_fault_code(17, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> >> +                       "asynchronous external abort");
> >> +#else
> >> +       hook_fault_code(22, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> >> +                       "imprecise external abort");
> >> +#endif
> >
> > As there can be only a single handler, this may interfere with a handler
> > for another platform in a multi-platform kernel.
> > Hence I think this should not be done unconditionally, but be moved to
> > the driver's .probe() callback.
>
> Why is nobody doing this in the probe code then ? It seems all the other

drivers/pci/controller/dwc/pci-keystone.c is:

  ks_pcie_probe()
    ks_pcie_add_pcie_port()
      dw_pcie_host_init()
        pp->ops->host_init(pp) = ks_pcie_host_init()
          hook_fault_code()

> drivers which hook fault code do it in init as well. I can imagine that

Probably nobody bothered exercising the external abort handler on
multi-platform kernels?

> something might trip the fault handler even before probe is called, e.g.
> some PM handling or simply user accessing that PCIe area using setpci.

If that is the case, it must indeed be done earlier, but still
conditional on the presence of the actual PCIe controller.

Gr{oetje,eeting}s,

                        Geert

Marek Vasut Oct. 5, 2020, 8 a.m. UTC | #6

On 10/5/20 9:31 AM, Geert Uytterhoeven wrote:
> Hi Marek,

Hi,

> On Sun, Oct 4, 2020 at 4:16 PM Marek Vasut <marek.vasut@gmail.com> wrote:
>> On 9/28/20 11:35 AM, Geert Uytterhoeven wrote:
>> [...]
>>>> +static int __init rcar_pcie_init(void)
>>>> +{
>>>> +#ifdef CONFIG_ARM_LPAE
>>>> +       hook_fault_code(17, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
>>>> +                       "asynchronous external abort");
>>>> +#else
>>>> +       hook_fault_code(22, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
>>>> +                       "imprecise external abort");
>>>> +#endif
>>>
>>> As there can be only a single handler, this may interfere with a handler
>>> for another platform in a multi-platform kernel.
>>> Hence I think this should not be done unconditionally, but be moved to
>>> the driver's .probe() callback.
>>
>> Why is nobody doing this in the probe code then ? It seems all the other
> 
> drivers/pci/controller/dwc/pci-keystone.c is:
> 
>   ks_pcie_probe()
>     ks_pcie_add_pcie_port()
>       dw_pcie_host_init()
>         pp->ops->host_init(pp) = ks_pcie_host_init()
>           hook_fault_code()

Well that one is interesting. I wonder whether that driver has the same
LPAE bug (different fault code for LPAE and non-LPAE configuration) we
found here too, since it is used on CA15 TI SoCs.

>> drivers which hook fault code do it in init as well. I can imagine that
> 
> Probably nobody bothered exercising the external abort handler on
> multi-platform kernels?
> 
>> something might trip the fault handler even before probe is called, e.g.
>> some PM handling or simply user accessing that PCIe area using setpci.
> 
> If that is the case, it must indeed be done earlier, but still
> conditional on the presence of the actual PCIe controller.

I am open to suggestions how to do that part.

Geert Uytterhoeven Oct. 5, 2020, 8:07 a.m. UTC | #7

Hi Marek,

On Mon, Oct 5, 2020 at 10:00 AM Marek Vasut <marek.vasut@gmail.com> wrote:
> On 10/5/20 9:31 AM, Geert Uytterhoeven wrote:
> > On Sun, Oct 4, 2020 at 4:16 PM Marek Vasut <marek.vasut@gmail.com> wrote:
> >> On 9/28/20 11:35 AM, Geert Uytterhoeven wrote:
> >> [...]
> >>>> +static int __init rcar_pcie_init(void)
> >>>> +{
> >>>> +#ifdef CONFIG_ARM_LPAE
> >>>> +       hook_fault_code(17, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> >>>> +                       "asynchronous external abort");
> >>>> +#else
> >>>> +       hook_fault_code(22, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> >>>> +                       "imprecise external abort");
> >>>> +#endif
> >>>
> >>> As there can be only a single handler, this may interfere with a handler
> >>> for another platform in a multi-platform kernel.
> >>> Hence I think this should not be done unconditionally, but be moved to
> >>> the driver's .probe() callback.
> >>
> >> Why is nobody doing this in the probe code then ? It seems all the other
> >
> > drivers/pci/controller/dwc/pci-keystone.c is:
> >
> >   ks_pcie_probe()
> >     ks_pcie_add_pcie_port()
> >       dw_pcie_host_init()
> >         pp->ops->host_init(pp) = ks_pcie_host_init()
> >           hook_fault_code()
>
> Well that one is interesting. I wonder whether that driver has the same
> LPAE bug (different fault code for LPAE and non-LPAE configuration) we
> found here too, since it is used on CA15 TI SoCs.

Isn't that an ARM "feature"?

    arch/arm/mm/fault.c-/* FSR definition */
    arch/arm/mm/fault.c:#ifdef CONFIG_ARM_LPAE
    arch/arm/mm/fault.c-#include "fsr-3level.c"
    arch/arm/mm/fault.c-#else
    arch/arm/mm/fault.c-#include "fsr-2level.c"
    arch/arm/mm/fault.c-#endif

> >> drivers which hook fault code do it in init as well. I can imagine that
> >
> > Probably nobody bothered exercising the external abort handler on
> > multi-platform kernels?
> >
> >> something might trip the fault handler even before probe is called, e.g.
> >> some PM handling or simply user accessing that PCIe area using setpci.
> >
> > If that is the case, it must indeed be done earlier, but still
> > conditional on the presence of the actual PCIe controller.
>
> I am open to suggestions how to do that part.

    if (of_find_matching_node(...))
           do_the_right_stuff();

Gr{oetje,eeting}s,

                        Geert
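
The concern discussed above, one handler slot per fault code plus registration that depends on the controller being present, can be illustrated with a small user-space model. This is a sketch only: `fault_table`, `model_hook_fault_code()`, `model_controller_present` and the two handlers are hypothetical names, the handler signature is simplified (the kernel one also takes a `struct pt_regs *`), and the presence flag merely stands in for a device tree lookup such as `of_find_matching_node()`. The fault codes 17 (LPAE) and 22 (non-LPAE) are the ones used in the patch.

```c
#include <stddef.h>

#define NR_FAULT_CODES 64	/* table size chosen for the model only */

typedef int (*fault_fn)(unsigned long addr, unsigned int fsr);

/* One slot per fault code, so a later hook replaces an earlier one. */
static fault_fn fault_table[NR_FAULT_CODES];

static void model_hook_fault_code(int nr, fault_fn fn)
{
	if (nr >= 0 && nr < NR_FAULT_CODES)
		fault_table[nr] = fn;
}

/* Handler installed by some other platform in a multi-platform kernel. */
static int other_platform_handler(unsigned long addr, unsigned int fsr)
{
	(void)addr;
	(void)fsr;
	return 1;
}

/* Stand-in for the R-Car abort handler. */
static int rcar_model_handler(unsigned long addr, unsigned int fsr)
{
	(void)addr;
	(void)fsr;
	return 0;
}

/* Hypothetical flag standing in for "a matching DT node exists". */
static int model_controller_present;

/* Fault code differs between LPAE and non-LPAE, as in the patch. */
static int model_abort_code(int lpae)
{
	return lpae ? 17 : 22;
}

/* Hook the handler only when the controller is actually present. */
static void model_rcar_init(int lpae)
{
	if (!model_controller_present)
		return;

	model_hook_fault_code(model_abort_code(lpae), rcar_model_handler);
}
```

With the flag cleared, a previously installed handler for code 22 survives `model_rcar_init(0)`. With the flag set, the R-Car handler takes over that slot, which is the only situation in which replacing it is wanted.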

Rob Herring (Arm) Oct. 7, 2020, 6:29 p.m. UTC | #8

On Mon, Oct 05, 2020 at 09:31:54AM +0200, Geert Uytterhoeven wrote:
> Hi Marek,
> 
> On Sun, Oct 4, 2020 at 4:16 PM Marek Vasut <marek.vasut@gmail.com> wrote:
> > On 9/28/20 11:35 AM, Geert Uytterhoeven wrote:
> > [...]
> > >> +static int __init rcar_pcie_init(void)
> > >> +{
> > >> +#ifdef CONFIG_ARM_LPAE
> > >> +       hook_fault_code(17, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> > >> +                       "asynchronous external abort");
> > >> +#else
> > >> +       hook_fault_code(22, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> > >> +                       "imprecise external abort");
> > >> +#endif
> > >
> > > As there can be only a single handler, this may interfere with a handler
> > > for another platform in a multi-platform kernel.
> > > Hence I think this should not be done unconditionally, but be moved to
> > > the driver's .probe() callback.
> >
> > Why is nobody doing this in the probe code then ? It seems all the other
> 
> drivers/pci/controller/dwc/pci-keystone.c is:
> 
>   ks_pcie_probe()
>     ks_pcie_add_pcie_port()
>       dw_pcie_host_init()
>         pp->ops->host_init(pp) = ks_pcie_host_init()
>           hook_fault_code()

Looks broken in deferred probe case as hook_fault_code is __init.

Really, hook_fault_code needs to be exported so these drivers can be 
modules. Or we split out all the abort handlers to a separate broken, 
aborting PCI hosts module.


> > drivers which hook fault code do it in init as well. I can imagine that
> 
> Probably nobody bothered exercising the external abort handler on
> multi-platform kernels?
> 
> > something might trip the fault handler even before probe is called, e.g.
> > some PM handling or simply user accessing that PCIe area using setpci.

I don't see how that's possible. You'd first hit translation faults as 
nothing is mapped.


> If that is the case, it must indeed be done earlier, but still
> conditional on the presence of the actual PCIe controller.

imx6 should be conditional too.

Rob
