diff mbox series

of: property: Fix fw_devlink handling of interrupt-map

Message ID 20240528164132.2451685-1-maz@kernel.org (mailing list archive)
State New, archived
Headers show
Series of: property: Fix fw_devlink handling of interrupt-map | expand

Commit Message

Marc Zyngier May 28, 2024, 4:41 p.m. UTC
Commit d976c6f4b32c ("of: property: Add fw_devlink support for
interrupt-map property") tried to do what it says on the tin,
but failed on a couple of points:

- it confuses bytes and cells. Not a huge deal, except when it
  comes to pointer arithmetic

- it doesn't really handle anything but interrupt-maps that have
  their parent #address-cells set to 0

The combinations of the two leads to some serious fun on my M1
box, with plenty of WARN-ON() firing all over the shop, and
amusing values being generated for interrupt specifiers.

Address both issues so that I can boot my machines again.

Fixes: d976c6f4b32c ("of: property: Add fw_devlink support for interrupt-map property")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Anup Patel <apatel@ventanamicro.com>
Cc: Saravana Kannan <saravanak@google.com>
Cc: Rob Herring (Arm) <robh@kernel.org>
---
 drivers/of/property.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

Comments

Rob Herring (Arm) May 28, 2024, 5:14 p.m. UTC | #1
On Tue, 28 May 2024 17:41:32 +0100, Marc Zyngier wrote:
> Commit d976c6f4b32c ("of: property: Add fw_devlink support for
> interrupt-map property") tried to do what it says on the tin,
> but failed on a couple of points:
> 
> - it confuses bytes and cells. Not a huge deal, except when it
>   comes to pointer arithmetic
> 
> - it doesn't really handle anything but interrupt-maps that have
>   their parent #address-cells set to 0
> 
> The combinations of the two leads to some serious fun on my M1
> box, with plenty of WARN-ON() firing all over the shop, and
> amusing values being generated for interrupt specifiers.
> 
> Address both issues so that I can boot my machines again.
> 
> Fixes: d976c6f4b32c ("of: property: Add fw_devlink support for interrupt-map property")
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> Cc: Anup Patel <apatel@ventanamicro.com>
> Cc: Saravana Kannan <saravanak@google.com>
> Cc: Rob Herring (Arm) <robh@kernel.org>
> ---
>  drivers/of/property.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 

Applied, thanks!
Saravana Kannan May 28, 2024, 5:23 p.m. UTC | #2
On Tue, May 28, 2024 at 6:41 PM Marc Zyngier <maz@kernel.org> wrote:
>
> Commit d976c6f4b32c ("of: property: Add fw_devlink support for
> interrupt-map property") tried to do what it says on the tin,
> but failed on a couple of points:
>
> - it confuses bytes and cells. Not a huge deal, except when it
>   comes to pointer arithmetic
>
> - it doesn't really handle anything but interrupt-maps that have
>   their parent #address-cells set to 0
>
> The combinations of the two leads to some serious fun on my M1
> box, with plenty of WARN-ON() firing all over the shop, and
> amusing values being generated for interrupt specifiers.
>
> Address both issues so that I can boot my machines again.
>
> Fixes: d976c6f4b32c ("of: property: Add fw_devlink support for interrupt-map property")
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> Cc: Anup Patel <apatel@ventanamicro.com>
> Cc: Saravana Kannan <saravanak@google.com>
> Cc: Rob Herring (Arm) <robh@kernel.org>
> ---
>  drivers/of/property.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/of/property.c b/drivers/of/property.c
> index 1c83e68f805b..9adebc63bea9 100644
> --- a/drivers/of/property.c
> +++ b/drivers/of/property.c
> @@ -1322,7 +1322,13 @@ static struct device_node *parse_interrupt_map(struct device_node *np,
>         addrcells = of_bus_n_addr_cells(np);
>
>         imap = of_get_property(np, "interrupt-map", &imaplen);
> -       if (!imap || imaplen <= (addrcells + intcells))
> +       imaplen /= sizeof(*imap);
> +
> +       /*
> +        * Check that we have enough runway for the child unit interrupt
> +        * specifier and a phandle. That's the bare minimum we can expect.
> +        */
> +       if (!imap || imaplen <= (addrcells + intcells + 1))
>                 return NULL;
>         imap_end = imap + imaplen;
>
> @@ -1346,8 +1352,14 @@ static struct device_node *parse_interrupt_map(struct device_node *np,
>                 if (!index)
>                         return sup_args.np;
>
> -               of_node_put(sup_args.np);
> +               /*
> +                * Account for the full parent unit interrupt specifier
> +                * (address cells, interrupt cells, and phandle).
> +                */
> +               imap += of_bus_n_addr_cells(sup_args.np);
>                 imap += sup_args.args_count + 1;
> +
> +               of_node_put(sup_args.np);
>                 index--;
>         }
>

Thanks Marc! And sorry for not catching this in my earlier review!

Acked-by: Saravana Kannan <saravanak@google.com>

-Saravana
Anup Patel May 29, 2024, 5:15 a.m. UTC | #3
On Tue, May 28, 2024 at 10:11 PM Marc Zyngier <maz@kernel.org> wrote:
>
> Commit d976c6f4b32c ("of: property: Add fw_devlink support for
> interrupt-map property") tried to do what it says on the tin,
> but failed on a couple of points:
>
> - it confuses bytes and cells. Not a huge deal, except when it
>   comes to pointer arithmetic
>
> - it doesn't really handle anything but interrupt-maps that have
>   their parent #address-cells set to 0
>
> The combinations of the two leads to some serious fun on my M1
> box, with plenty of WARN-ON() firing all over the shop, and
> amusing values being generated for interrupt specifiers.
>
> Address both issues so that I can boot my machines again.
>
> Fixes: d976c6f4b32c ("of: property: Add fw_devlink support for interrupt-map property")
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> Cc: Anup Patel <apatel@ventanamicro.com>
> Cc: Saravana Kannan <saravanak@google.com>
> Cc: Rob Herring (Arm) <robh@kernel.org>

Thanks for the fix patch but unfortunately it breaks for RISC-V.

> ---
>  drivers/of/property.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/of/property.c b/drivers/of/property.c
> index 1c83e68f805b..9adebc63bea9 100644
> --- a/drivers/of/property.c
> +++ b/drivers/of/property.c
> @@ -1322,7 +1322,13 @@ static struct device_node *parse_interrupt_map(struct device_node *np,
>         addrcells = of_bus_n_addr_cells(np);
>
>         imap = of_get_property(np, "interrupt-map", &imaplen);
> -       if (!imap || imaplen <= (addrcells + intcells))
> +       imaplen /= sizeof(*imap);
> +
> +       /*
> +        * Check that we have enough runway for the child unit interrupt
> +        * specifier and a phandle. That's the bare minimum we can expect.
> +        */
> +       if (!imap || imaplen <= (addrcells + intcells + 1))
>                 return NULL;
>         imap_end = imap + imaplen;
>
> @@ -1346,8 +1352,14 @@ static struct device_node *parse_interrupt_map(struct device_node *np,
>                 if (!index)
>                         return sup_args.np;
>
> -               of_node_put(sup_args.np);
> +               /*
> +                * Account for the full parent unit interrupt specifier
> +                * (address cells, interrupt cells, and phandle).
> +                */
> +               imap += of_bus_n_addr_cells(sup_args.np);

This breaks for RISC-V because we don't have "#address-cells"
property in interrupt controller DT node and of_bus_n_addr_cells()
retrieves "#address-cells" from the parent of interrupt controller.

The of_irq_parse_raw() looks for "#address-cells" property
in the interrupt controller DT node only so we should do a
similar thing here as well.

The below change on top of this patch worked for me.

diff --git a/drivers/of/property.c b/drivers/of/property.c
index 9adebc63bea9..f54da2989ea9 100644
--- a/drivers/of/property.c
+++ b/drivers/of/property.c
@@ -1308,7 +1308,7 @@ static struct device_node
*parse_interrupt_map(struct device_node *np,
 {
     const __be32 *imap, *imap_end, *addr;
     struct of_phandle_args sup_args;
-    u32 addrcells, intcells;
+    u32 addrcells, intcells, paddrcells;
     int i, imaplen;

     if (!IS_ENABLED(CONFIG_OF_IRQ))
@@ -1356,7 +1356,8 @@ static struct device_node
*parse_interrupt_map(struct device_node *np,
          * Account for the full parent unit interrupt specifier
          * (address cells, interrupt cells, and phandle).
          */
-        imap += of_bus_n_addr_cells(sup_args.np);
+        if (!of_property_read_u32(sup_args.np, "#address-cells", &paddrcells))
+            imap += paddrcells;
         imap += sup_args.args_count + 1;

         of_node_put(sup_args.np);

<snip>

Regards,
Anup
Marc Zyngier May 29, 2024, 6:33 a.m. UTC | #4
On Wed, 29 May 2024 06:15:52 +0100,
Anup Patel <apatel@ventanamicro.com> wrote:
> 
> On Tue, May 28, 2024 at 10:11 PM Marc Zyngier <maz@kernel.org> wrote:
> >
> > Commit d976c6f4b32c ("of: property: Add fw_devlink support for
> > interrupt-map property") tried to do what it says on the tin,
> > but failed on a couple of points:
> >
> > - it confuses bytes and cells. Not a huge deal, except when it
> >   comes to pointer arithmetic
> >
> > - it doesn't really handle anything but interrupt-maps that have
> >   their parent #address-cells set to 0
> >
> > The combinations of the two leads to some serious fun on my M1
> > box, with plenty of WARN-ON() firing all over the shop, and
> > amusing values being generated for interrupt specifiers.
> >
> > Address both issues so that I can boot my machines again.
> >
> > Fixes: d976c6f4b32c ("of: property: Add fw_devlink support for interrupt-map property")
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > Cc: Anup Patel <apatel@ventanamicro.com>
> > Cc: Saravana Kannan <saravanak@google.com>
> > Cc: Rob Herring (Arm) <robh@kernel.org>
> 
> Thanks for the fix patch but unfortunately it breaks for RISC-V.
> 
> > ---
> >  drivers/of/property.c | 16 ++++++++++++++--
> >  1 file changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/of/property.c b/drivers/of/property.c
> > index 1c83e68f805b..9adebc63bea9 100644
> > --- a/drivers/of/property.c
> > +++ b/drivers/of/property.c
> > @@ -1322,7 +1322,13 @@ static struct device_node *parse_interrupt_map(struct device_node *np,
> >         addrcells = of_bus_n_addr_cells(np);
> >
> >         imap = of_get_property(np, "interrupt-map", &imaplen);
> > -       if (!imap || imaplen <= (addrcells + intcells))
> > +       imaplen /= sizeof(*imap);
> > +
> > +       /*
> > +        * Check that we have enough runway for the child unit interrupt
> > +        * specifier and a phandle. That's the bare minimum we can expect.
> > +        */
> > +       if (!imap || imaplen <= (addrcells + intcells + 1))
> >                 return NULL;
> >         imap_end = imap + imaplen;
> >
> > @@ -1346,8 +1352,14 @@ static struct device_node *parse_interrupt_map(struct device_node *np,
> >                 if (!index)
> >                         return sup_args.np;
> >
> > -               of_node_put(sup_args.np);
> > +               /*
> > +                * Account for the full parent unit interrupt specifier
> > +                * (address cells, interrupt cells, and phandle).
> > +                */
> > +               imap += of_bus_n_addr_cells(sup_args.np);
> 
> This breaks for RISC-V because we don't have "#address-cells"
> property in interrupt controller DT node and of_bus_n_addr_cells()
> retrieves "#address-cells" from the parent of interrupt controller.

That's a feature, not a bug. #address-cells, AFAICT, applies to all
child nodes until you set it otherwise.

> 
> The of_irq_parse_raw() looks for "#address-cells" property
> in the interrupt controller DT node only so we should do a
> similar thing here as well.

This looks more like a of_irq_parse_raw() bug than anything else.

> 
> The below change on top of this patch worked for me.
> 
> diff --git a/drivers/of/property.c b/drivers/of/property.c
> index 9adebc63bea9..f54da2989ea9 100644
> --- a/drivers/of/property.c
> +++ b/drivers/of/property.c
> @@ -1308,7 +1308,7 @@ static struct device_node
> *parse_interrupt_map(struct device_node *np,
>  {
>      const __be32 *imap, *imap_end, *addr;
>      struct of_phandle_args sup_args;
> -    u32 addrcells, intcells;
> +    u32 addrcells, intcells, paddrcells;
>      int i, imaplen;
> 
>      if (!IS_ENABLED(CONFIG_OF_IRQ))
> @@ -1356,7 +1356,8 @@ static struct device_node
> *parse_interrupt_map(struct device_node *np,
>           * Account for the full parent unit interrupt specifier
>           * (address cells, interrupt cells, and phandle).
>           */
> -        imap += of_bus_n_addr_cells(sup_args.np);
> +        if (!of_property_read_u32(sup_args.np, "#address-cells", &paddrcells))
> +            imap += paddrcells;

This looks wrong to me for the reason I outlined above: you need to
look for a valid #address-cells all along the parent chain, not just
in the interrupt-controller node.

	M.
Anup Patel May 29, 2024, 10:16 a.m. UTC | #5
On Wed, May 29, 2024 at 12:03 PM Marc Zyngier <maz@kernel.org> wrote:
>
> On Wed, 29 May 2024 06:15:52 +0100,
> Anup Patel <apatel@ventanamicro.com> wrote:
> >
> > On Tue, May 28, 2024 at 10:11 PM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > Commit d976c6f4b32c ("of: property: Add fw_devlink support for
> > > interrupt-map property") tried to do what it says on the tin,
> > > but failed on a couple of points:
> > >
> > > - it confuses bytes and cells. Not a huge deal, except when it
> > >   comes to pointer arithmetic
> > >
> > > - it doesn't really handle anything but interrupt-maps that have
> > >   their parent #address-cells set to 0
> > >
> > > The combinations of the two leads to some serious fun on my M1
> > > box, with plenty of WARN-ON() firing all over the shop, and
> > > amusing values being generated for interrupt specifiers.
> > >
> > > Address both issues so that I can boot my machines again.
> > >
> > > Fixes: d976c6f4b32c ("of: property: Add fw_devlink support for interrupt-map property")
> > > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > > Cc: Anup Patel <apatel@ventanamicro.com>
> > > Cc: Saravana Kannan <saravanak@google.com>
> > > Cc: Rob Herring (Arm) <robh@kernel.org>
> >
> > Thanks for the fix patch but unfortunately it breaks for RISC-V.
> >
> > > ---
> > >  drivers/of/property.c | 16 ++++++++++++++--
> > >  1 file changed, 14 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/of/property.c b/drivers/of/property.c
> > > index 1c83e68f805b..9adebc63bea9 100644
> > > --- a/drivers/of/property.c
> > > +++ b/drivers/of/property.c
> > > @@ -1322,7 +1322,13 @@ static struct device_node *parse_interrupt_map(struct device_node *np,
> > >         addrcells = of_bus_n_addr_cells(np);
> > >
> > >         imap = of_get_property(np, "interrupt-map", &imaplen);
> > > -       if (!imap || imaplen <= (addrcells + intcells))
> > > +       imaplen /= sizeof(*imap);
> > > +
> > > +       /*
> > > +        * Check that we have enough runway for the child unit interrupt
> > > +        * specifier and a phandle. That's the bare minimum we can expect.
> > > +        */
> > > +       if (!imap || imaplen <= (addrcells + intcells + 1))
> > >                 return NULL;
> > >         imap_end = imap + imaplen;
> > >
> > > @@ -1346,8 +1352,14 @@ static struct device_node *parse_interrupt_map(struct device_node *np,
> > >                 if (!index)
> > >                         return sup_args.np;
> > >
> > > -               of_node_put(sup_args.np);
> > > +               /*
> > > +                * Account for the full parent unit interrupt specifier
> > > +                * (address cells, interrupt cells, and phandle).
> > > +                */
> > > +               imap += of_bus_n_addr_cells(sup_args.np);
> >
> > This breaks for RISC-V because we don't have "#address-cells"
> > property in interrupt controller DT node and of_bus_n_addr_cells()
> > retrieves "#address-cells" from the parent of interrupt controller.
>
> That's a feature, not a bug. #address-cells, AFAICT, applies to all
> child nodes until you set it otherwise.
>
> >
> > The of_irq_parse_raw() looks for "#address-cells" property
> > in the interrupt controller DT node only so we should do a
> > similar thing here as well.
>
> This looks more like a of_irq_parse_raw() bug than anything else.

If we change of_irq_parse_raw() to use of_bus_n_addr_cells()
then it would still break for RISC-V.

Using of_bus_n_addr_cells() over here forces interrupt controller
DT nodes to have a "#address-cells" DT property. There are many
interrupt controller (e.g. RISC-V PLIC or RISC-V APLIC) where the
DT bindings don't require "#address-cells" DT property and existing
DTS files with such interrupt controllers don't have it either.

In the RISC-V world, there have been quite a few QEMU releases
where the generated DT node of the interrupt controller does not
have the "#address-cells" property. This patch breaks the kernel
for all such QEMU releases.

I think we should align the implementation in parse_interrupt_map()
with of_irq_parse_raw() and use of_property_read_u32() instead of
of_bus_n_addr_cells().

Regards,
Anup

>
> >
> > The below change on top of this patch worked for me.
> >
> > diff --git a/drivers/of/property.c b/drivers/of/property.c
> > index 9adebc63bea9..f54da2989ea9 100644
> > --- a/drivers/of/property.c
> > +++ b/drivers/of/property.c
> > @@ -1308,7 +1308,7 @@ static struct device_node
> > *parse_interrupt_map(struct device_node *np,
> >  {
> >      const __be32 *imap, *imap_end, *addr;
> >      struct of_phandle_args sup_args;
> > -    u32 addrcells, intcells;
> > +    u32 addrcells, intcells, paddrcells;
> >      int i, imaplen;
> >
> >      if (!IS_ENABLED(CONFIG_OF_IRQ))
> > @@ -1356,7 +1356,8 @@ static struct device_node
> > *parse_interrupt_map(struct device_node *np,
> >           * Account for the full parent unit interrupt specifier
> >           * (address cells, interrupt cells, and phandle).
> >           */
> > -        imap += of_bus_n_addr_cells(sup_args.np);
> > +        if (!of_property_read_u32(sup_args.np, "#address-cells", &paddrcells))
> > +            imap += paddrcells;
>
> This looks wrong to me for the reason I outlined above: you need to
> look for a valid #address-cells all along the parent chain, not just
> in the interrupt-controller node.
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.
Marc Zyngier May 29, 2024, 10:44 a.m. UTC | #6
On Wed, 29 May 2024 11:16:30 +0100,
Anup Patel <apatel@ventanamicro.com> wrote:
> 
> On Wed, May 29, 2024 at 12:03 PM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Wed, 29 May 2024 06:15:52 +0100,
> > Anup Patel <apatel@ventanamicro.com> wrote:
> > >
> > > On Tue, May 28, 2024 at 10:11 PM Marc Zyngier <maz@kernel.org> wrote:
> > > >
> > > > Commit d976c6f4b32c ("of: property: Add fw_devlink support for
> > > > interrupt-map property") tried to do what it says on the tin,
> > > > but failed on a couple of points:
> > > >
> > > > - it confuses bytes and cells. Not a huge deal, except when it
> > > >   comes to pointer arithmetic
> > > >
> > > > - it doesn't really handle anything but interrupt-maps that have
> > > >   their parent #address-cells set to 0
> > > >
> > > > The combinations of the two leads to some serious fun on my M1
> > > > box, with plenty of WARN-ON() firing all over the shop, and
> > > > amusing values being generated for interrupt specifiers.
> > > >
> > > > Address both issues so that I can boot my machines again.
> > > >
> > > > Fixes: d976c6f4b32c ("of: property: Add fw_devlink support for interrupt-map property")
> > > > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > > > Cc: Anup Patel <apatel@ventanamicro.com>
> > > > Cc: Saravana Kannan <saravanak@google.com>
> > > > Cc: Rob Herring (Arm) <robh@kernel.org>
> > >
> > > Thanks for the fix patch but unfortunately it breaks for RISC-V.
> > >
> > > > ---
> > > >  drivers/of/property.c | 16 ++++++++++++++--
> > > >  1 file changed, 14 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/of/property.c b/drivers/of/property.c
> > > > index 1c83e68f805b..9adebc63bea9 100644
> > > > --- a/drivers/of/property.c
> > > > +++ b/drivers/of/property.c
> > > > @@ -1322,7 +1322,13 @@ static struct device_node *parse_interrupt_map(struct device_node *np,
> > > >         addrcells = of_bus_n_addr_cells(np);
> > > >
> > > >         imap = of_get_property(np, "interrupt-map", &imaplen);
> > > > -       if (!imap || imaplen <= (addrcells + intcells))
> > > > +       imaplen /= sizeof(*imap);
> > > > +
> > > > +       /*
> > > > +        * Check that we have enough runway for the child unit interrupt
> > > > +        * specifier and a phandle. That's the bare minimum we can expect.
> > > > +        */
> > > > +       if (!imap || imaplen <= (addrcells + intcells + 1))
> > > >                 return NULL;
> > > >         imap_end = imap + imaplen;
> > > >
> > > > @@ -1346,8 +1352,14 @@ static struct device_node *parse_interrupt_map(struct device_node *np,
> > > >                 if (!index)
> > > >                         return sup_args.np;
> > > >
> > > > -               of_node_put(sup_args.np);
> > > > +               /*
> > > > +                * Account for the full parent unit interrupt specifier
> > > > +                * (address cells, interrupt cells, and phandle).
> > > > +                */
> > > > +               imap += of_bus_n_addr_cells(sup_args.np);
> > >
> > > This breaks for RISC-V because we don't have "#address-cells"
> > > property in interrupt controller DT node and of_bus_n_addr_cells()
> > > retrieves "#address-cells" from the parent of interrupt controller.
> >
> > That's a feature, not a bug. #address-cells, AFAICT, applies to all
> > child nodes until you set it otherwise.
> >
> > >
> > > The of_irq_parse_raw() looks for "#address-cells" property
> > > in the interrupt controller DT node only so we should do a
> > > similar thing here as well.
> >
> > This looks more like a of_irq_parse_raw() bug than anything else.
> 
> If we change of_irq_parse_raw() to use of_bus_n_addr_cells()
> then it would still break for RISC-V.

I'm not trying to "fix" riscv. I'm merely outlining that you are
relying on both broken DTs and a buggy OS.

> 
> Using of_bus_n_addr_cells() over here forces interrupt controller
> DT nodes to have a "#address-cells" DT property. There are many
> interrupt controller (e.g. RISC-V PLIC or RISC-V APLIC) where the
> DT bindings don't require "#address-cells" DT property and existing
> DTS files with such interrupt controllers don't have it either.

It forces you to set #address-cells *if you rely on a different
value in a child node*. It's not like the semantics are new.

> 
> In the RISC-V world, there have been quite a few QEMU releases
> where the generated DT node of the interrupt controller does not
> have the "#address-cells" property. This patch breaks the kernel
> for all such QEMU releases.

Congratulations, you've forked DT. News at 11.

> 
> I think we should align the implementation in parse_interrupt_map()
> with of_irq_parse_raw() and use of_property_read_u32() instead of
> of_bus_n_addr_cells().

I think we should fix the kernel and quirk riscv as broken, just like
PPC or sparc.

	M.
Anup Patel May 29, 2024, 11:28 a.m. UTC | #7
On Wed, May 29, 2024 at 4:15 PM Marc Zyngier <maz@kernel.org> wrote:
>
> On Wed, 29 May 2024 11:16:30 +0100,
> Anup Patel <apatel@ventanamicro.com> wrote:
> >
> > On Wed, May 29, 2024 at 12:03 PM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > On Wed, 29 May 2024 06:15:52 +0100,
> > > Anup Patel <apatel@ventanamicro.com> wrote:
> > > >
> > > > On Tue, May 28, 2024 at 10:11 PM Marc Zyngier <maz@kernel.org> wrote:
> > > > >
> > > > > Commit d976c6f4b32c ("of: property: Add fw_devlink support for
> > > > > interrupt-map property") tried to do what it says on the tin,
> > > > > but failed on a couple of points:
> > > > >
> > > > > - it confuses bytes and cells. Not a huge deal, except when it
> > > > >   comes to pointer arithmetic
> > > > >
> > > > > - it doesn't really handle anything but interrupt-maps that have
> > > > >   their parent #address-cells set to 0
> > > > >
> > > > > The combinations of the two leads to some serious fun on my M1
> > > > > box, with plenty of WARN-ON() firing all over the shop, and
> > > > > amusing values being generated for interrupt specifiers.
> > > > >
> > > > > Address both issues so that I can boot my machines again.
> > > > >
> > > > > Fixes: d976c6f4b32c ("of: property: Add fw_devlink support for interrupt-map property")
> > > > > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > > > > Cc: Anup Patel <apatel@ventanamicro.com>
> > > > > Cc: Saravana Kannan <saravanak@google.com>
> > > > > Cc: Rob Herring (Arm) <robh@kernel.org>
> > > >
> > > > Thanks for the fix patch but unfortunately it breaks for RISC-V.
> > > >
> > > > > ---
> > > > >  drivers/of/property.c | 16 ++++++++++++++--
> > > > >  1 file changed, 14 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/of/property.c b/drivers/of/property.c
> > > > > index 1c83e68f805b..9adebc63bea9 100644
> > > > > --- a/drivers/of/property.c
> > > > > +++ b/drivers/of/property.c
> > > > > @@ -1322,7 +1322,13 @@ static struct device_node *parse_interrupt_map(struct device_node *np,
> > > > >         addrcells = of_bus_n_addr_cells(np);
> > > > >
> > > > >         imap = of_get_property(np, "interrupt-map", &imaplen);
> > > > > -       if (!imap || imaplen <= (addrcells + intcells))
> > > > > +       imaplen /= sizeof(*imap);
> > > > > +
> > > > > +       /*
> > > > > +        * Check that we have enough runway for the child unit interrupt
> > > > > +        * specifier and a phandle. That's the bare minimum we can expect.
> > > > > +        */
> > > > > +       if (!imap || imaplen <= (addrcells + intcells + 1))
> > > > >                 return NULL;
> > > > >         imap_end = imap + imaplen;
> > > > >
> > > > > @@ -1346,8 +1352,14 @@ static struct device_node *parse_interrupt_map(struct device_node *np,
> > > > >                 if (!index)
> > > > >                         return sup_args.np;
> > > > >
> > > > > -               of_node_put(sup_args.np);
> > > > > +               /*
> > > > > +                * Account for the full parent unit interrupt specifier
> > > > > +                * (address cells, interrupt cells, and phandle).
> > > > > +                */
> > > > > +               imap += of_bus_n_addr_cells(sup_args.np);
> > > >
> > > > This breaks for RISC-V because we don't have "#address-cells"
> > > > property in interrupt controller DT node and of_bus_n_addr_cells()
> > > > retrieves "#address-cells" from the parent of interrupt controller.
> > >
> > > That's a feature, not a bug. #address-cells, AFAICT, applies to all
> > > child nodes until you set it otherwise.
> > >
> > > >
> > > > The of_irq_parse_raw() looks for "#address-cells" property
> > > > in the interrupt controller DT node only so we should do a
> > > > similar thing here as well.
> > >
> > > This looks more like a of_irq_parse_raw() bug than anything else.
> >
> > If we change of_irq_parse_raw() to use of_bus_n_addr_cells()
> > then it would still break for RISC-V.
>
> I'm not trying to "fix" riscv. I'm merely outlining that you are
> relying on both broken DTs and a buggy OS.
>
> >
> > Using of_bus_n_addr_cells() over here forces interrupt controller
> > DT nodes to have a "#address-cells" DT property. There are many
> > interrupt controller (e.g. RISC-V PLIC or RISC-V APLIC) where the
> > DT bindings don't require "#address-cells" DT property and existing
> > DTS files with such interrupt controllers don't have it either.
>
> It forces you to set #address-cells *if you rely on a different
> value in a child node*. It's not like the semantics are new.

We don't have child nodes under the interrupt controller DT node
(for both RISC-V PLIC and APLIC) so we certainly don't require the
"#address-cells" property in the interrupt controller DT node.

>
> >
> > In the RISC-V world, there have been quite a few QEMU releases
> > where the generated DT node of the interrupt controller does not
> > have the "#address-cells" property. This patch breaks the kernel
> > for all such QEMU releases.
>
> Congratulations, you've forked DT. News at 11.

Can you elaborate how ?

>
> >
> > I think we should align the implementation in parse_interrupt_map()
> > with of_irq_parse_raw() and use of_property_read_u32() instead of
> > of_bus_n_addr_cells().
>
> I think we should fix the kernel and quirk riscv as broken, just like
> PPC or sparc.
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

Regards,
Anup
Marc Zyngier May 29, 2024, noon UTC | #8
On Wed, 29 May 2024 12:28:34 +0100,
Anup Patel <apatel@ventanamicro.com> wrote:
> 
> On Wed, May 29, 2024 at 4:15 PM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Wed, 29 May 2024 11:16:30 +0100,
> > Anup Patel <apatel@ventanamicro.com> wrote:
> > >
> > > On Wed, May 29, 2024 at 12:03 PM Marc Zyngier <maz@kernel.org> wrote:
> > > >
> > > > On Wed, 29 May 2024 06:15:52 +0100,
> > > > Anup Patel <apatel@ventanamicro.com> wrote:
> > > > >
> > > > > On Tue, May 28, 2024 at 10:11 PM Marc Zyngier <maz@kernel.org> wrote:
> > > > > >
> > > > > > Commit d976c6f4b32c ("of: property: Add fw_devlink support for
> > > > > > interrupt-map property") tried to do what it says on the tin,
> > > > > > but failed on a couple of points:
> > > > > >
> > > > > > - it confuses bytes and cells. Not a huge deal, except when it
> > > > > >   comes to pointer arithmetic
> > > > > >
> > > > > > - it doesn't really handle anything but interrupt-maps that have
> > > > > >   their parent #address-cells set to 0
> > > > > >
> > > > > > The combinations of the two leads to some serious fun on my M1
> > > > > > box, with plenty of WARN-ON() firing all over the shop, and
> > > > > > amusing values being generated for interrupt specifiers.
> > > > > >
> > > > > > Address both issues so that I can boot my machines again.
> > > > > >
> > > > > > Fixes: d976c6f4b32c ("of: property: Add fw_devlink support for interrupt-map property")
> > > > > > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > > > > > Cc: Anup Patel <apatel@ventanamicro.com>
> > > > > > Cc: Saravana Kannan <saravanak@google.com>
> > > > > > Cc: Rob Herring (Arm) <robh@kernel.org>
> > > > >
> > > > > Thanks for the fix patch but unfortunately it breaks for RISC-V.
> > > > >
> > > > > > ---
> > > > > >  drivers/of/property.c | 16 ++++++++++++++--
> > > > > >  1 file changed, 14 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/of/property.c b/drivers/of/property.c
> > > > > > index 1c83e68f805b..9adebc63bea9 100644
> > > > > > --- a/drivers/of/property.c
> > > > > > +++ b/drivers/of/property.c
> > > > > > @@ -1322,7 +1322,13 @@ static struct device_node *parse_interrupt_map(struct device_node *np,
> > > > > >         addrcells = of_bus_n_addr_cells(np);
> > > > > >
> > > > > >         imap = of_get_property(np, "interrupt-map", &imaplen);
> > > > > > -       if (!imap || imaplen <= (addrcells + intcells))
> > > > > > +       imaplen /= sizeof(*imap);
> > > > > > +
> > > > > > +       /*
> > > > > > +        * Check that we have enough runway for the child unit interrupt
> > > > > > +        * specifier and a phandle. That's the bare minimum we can expect.
> > > > > > +        */
> > > > > > +       if (!imap || imaplen <= (addrcells + intcells + 1))
> > > > > >                 return NULL;
> > > > > >         imap_end = imap + imaplen;
> > > > > >
> > > > > > @@ -1346,8 +1352,14 @@ static struct device_node *parse_interrupt_map(struct device_node *np,
> > > > > >                 if (!index)
> > > > > >                         return sup_args.np;
> > > > > >
> > > > > > -               of_node_put(sup_args.np);
> > > > > > +               /*
> > > > > > +                * Account for the full parent unit interrupt specifier
> > > > > > +                * (address cells, interrupt cells, and phandle).
> > > > > > +                */
> > > > > > +               imap += of_bus_n_addr_cells(sup_args.np);
> > > > >
> > > > > This breaks for RISC-V because we don't have "#address-cells"
> > > > > property in interrupt controller DT node and of_bus_n_addr_cells()
> > > > > retrieves "#address-cells" from the parent of interrupt controller.
> > > >
> > > > That's a feature, not a bug. #address-cells, AFAICT, applies to all
> > > > child nodes until you set it otherwise.
> > > >
> > > > >
> > > > > The of_irq_parse_raw() looks for "#address-cells" property
> > > > > in the interrupt controller DT node only so we should do a
> > > > > similar thing here as well.
> > > >
> > > > This looks more like a of_irq_parse_raw() bug than anything else.
> > >
> > > If we change of_irq_parse_raw() to use of_bus_n_addr_cells()
> > > then it would still break for RISC-V.
> >
> > I'm not trying to "fix" riscv. I'm merely outlining that you are
> > relying on both broken DTs and a buggy OS.
> >
> > >
> > > Using of_bus_n_addr_cells() over here forces interrupt controller
> > > DT nodes to have a "#address-cells" DT property. There are many
> > > interrupt controller (e.g. RISC-V PLIC or RISC-V APLIC) where the
> > > DT bindings don't require "#address-cells" DT property and existing
> > > DTS files with such interrupt controllers don't have it either.
> >
> > It forces you to set #address-cells *if you rely on a different
> > value in a child node*. It's not like the semantics are new.
> 
> We don't have child nodes under the interrupt controller DT node
> (for both RISC-V PLIC and APLIC) so we certainly don't require the
> "#address-cells" property in the interrupt controller DT node.

You keep missing the point.

You *do* require it if the parent node has an #address-cells value
that doesn't apply to its children nodes. Basic property inheritance.
Interrupt controller nodes are not special in this regard (and please
allow me to think that I know a thing or two about those).

So it's not that "you don't need it". It's that "you're relying on
something that is broken".

But in your defence, the DT spec is amusingly self-contradictory:

<quote>
2.3.5. #address-cells and #size-cells

The #address-cells and #size-cells properties may be used in any
device node that has children in the devicetree hierarchy and
describes how child device nodes should be addressed. The
#address-cells property defines the number of <u32> cells used to
encode the address field in a child node’s reg property. The
#size-cells property defines the number of <u32> cells used to encode
the size field in a child node’s reg property.

The #address-cells and #size-cells properties are not inherited from
ancestors in the devicetree. They shall be explicitly defined.
</quote>

Followed by:

<quote>
2.4.3.1. interrupt-map

Note

Both the child node and the interrupt parent node are required to have
#address-cells and #interrupt-cells properties defined. If a unit
address component is not required, #address-cells shall be explicitly
defined to be zero.
</quote>

which says one thing and then the other about property inheritance,
but then asserts that #address-cells isn't optional.

> >
> > >
> > > In the RISC-V world, there have been quite a few QEMU releases
> > > where the generated DT node of the interrupt controller does not
> > > have the "#address-cells" property. This patch breaks the kernel
> > > for all such QEMU releases.
> >
> > Congratulations, you've forked DT. News at 11.
> 
> Can you elaborate how ?

You've stated it yourself. You are relying on a behaviour that
deviates from the standard by having DTs with missing properties

And since we can't travel back it time to fix this, the only solution
I can see is to support both behaviours by quirking it.

	M.
Rob Herring (Arm) May 29, 2024, 1:44 p.m. UTC | #9
On Wed, May 29, 2024 at 1:33 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Wed, 29 May 2024 06:15:52 +0100,
> Anup Patel <apatel@ventanamicro.com> wrote:
> >
> > On Tue, May 28, 2024 at 10:11 PM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > Commit d976c6f4b32c ("of: property: Add fw_devlink support for
> > > interrupt-map property") tried to do what it says on the tin,
> > > but failed on a couple of points:
> > >
> > > - it confuses bytes and cells. Not a huge deal, except when it
> > >   comes to pointer arithmetic
> > >
> > > - it doesn't really handle anything but interrupt-maps that have
> > >   their parent #address-cells set to 0
> > >
> > > The combinations of the two leads to some serious fun on my M1
> > > box, with plenty of WARN-ON() firing all over the shop, and
> > > amusing values being generated for interrupt specifiers.
> > >
> > > Address both issues so that I can boot my machines again.
> > >
> > > Fixes: d976c6f4b32c ("of: property: Add fw_devlink support for interrupt-map property")
> > > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > > Cc: Anup Patel <apatel@ventanamicro.com>
> > > Cc: Saravana Kannan <saravanak@google.com>
> > > Cc: Rob Herring (Arm) <robh@kernel.org>
> >
> > Thanks for the fix patch but unfortunately it breaks for RISC-V.
> >
> > > ---
> > >  drivers/of/property.c | 16 ++++++++++++++--
> > >  1 file changed, 14 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/of/property.c b/drivers/of/property.c
> > > index 1c83e68f805b..9adebc63bea9 100644
> > > --- a/drivers/of/property.c
> > > +++ b/drivers/of/property.c
> > > @@ -1322,7 +1322,13 @@ static struct device_node *parse_interrupt_map(struct device_node *np,
> > >         addrcells = of_bus_n_addr_cells(np);
> > >
> > >         imap = of_get_property(np, "interrupt-map", &imaplen);
> > > -       if (!imap || imaplen <= (addrcells + intcells))
> > > +       imaplen /= sizeof(*imap);
> > > +
> > > +       /*
> > > +        * Check that we have enough runway for the child unit interrupt
> > > +        * specifier and a phandle. That's the bare minimum we can expect.
> > > +        */
> > > +       if (!imap || imaplen <= (addrcells + intcells + 1))
> > >                 return NULL;
> > >         imap_end = imap + imaplen;
> > >
> > > @@ -1346,8 +1352,14 @@ static struct device_node *parse_interrupt_map(struct device_node *np,
> > >                 if (!index)
> > >                         return sup_args.np;
> > >
> > > -               of_node_put(sup_args.np);
> > > +               /*
> > > +                * Account for the full parent unit interrupt specifier
> > > +                * (address cells, interrupt cells, and phandle).
> > > +                */
> > > +               imap += of_bus_n_addr_cells(sup_args.np);
> >
> > This breaks for RISC-V because we don't have "#address-cells"
> > property in interrupt controller DT node and of_bus_n_addr_cells()
> > retrieves "#address-cells" from the parent of interrupt controller.
>
> That's a feature, not a bug. #address-cells, AFAICT, applies to all
> child nodes until you set it otherwise.

That may be supported in some places, but only because of buggy DTs
(we're talking 2000 era). Current dtc should warn if an interrupt
controller node doesn't have #address-cells AND is referred to by
interrupt-map.

There's also the notion of default root values, but that's broken as
well. dtc and the kernel don't even agree on the default for some
arches. Fortunately, that's been a dtc warning longer than I've worked
on DT.


> > The of_irq_parse_raw() looks for "#address-cells" property
> > in the interrupt controller DT node only so we should do a
> > similar thing here as well.
>
> This looks more like a of_irq_parse_raw() bug than anything else.

Nope.

Rob
Rob Herring (Arm) May 29, 2024, 1:51 p.m. UTC | #10
On Wed, May 29, 2024 at 6:28 AM Anup Patel <apatel@ventanamicro.com> wrote:
>
> On Wed, May 29, 2024 at 4:15 PM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Wed, 29 May 2024 11:16:30 +0100,
> > Anup Patel <apatel@ventanamicro.com> wrote:
> > >
> > > On Wed, May 29, 2024 at 12:03 PM Marc Zyngier <maz@kernel.org> wrote:
> > > >
> > > > On Wed, 29 May 2024 06:15:52 +0100,
> > > > Anup Patel <apatel@ventanamicro.com> wrote:
> > > > >
> > > > > On Tue, May 28, 2024 at 10:11 PM Marc Zyngier <maz@kernel.org> wrote:
> > > > > >
> > > > > > Commit d976c6f4b32c ("of: property: Add fw_devlink support for
> > > > > > interrupt-map property") tried to do what it says on the tin,
> > > > > > but failed on a couple of points:
> > > > > >
> > > > > > - it confuses bytes and cells. Not a huge deal, except when it
> > > > > >   comes to pointer arithmetic
> > > > > >
> > > > > > - it doesn't really handle anything but interrupt-maps that have
> > > > > >   their parent #address-cells set to 0
> > > > > >
> > > > > > The combinations of the two leads to some serious fun on my M1
> > > > > > box, with plenty of WARN-ON() firing all over the shop, and
> > > > > > amusing values being generated for interrupt specifiers.
> > > > > >
> > > > > > Address both issues so that I can boot my machines again.
> > > > > >
> > > > > > Fixes: d976c6f4b32c ("of: property: Add fw_devlink support for interrupt-map property")
> > > > > > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > > > > > Cc: Anup Patel <apatel@ventanamicro.com>
> > > > > > Cc: Saravana Kannan <saravanak@google.com>
> > > > > > Cc: Rob Herring (Arm) <robh@kernel.org>
> > > > >
> > > > > Thanks for the fix patch but unfortunately it breaks for RISC-V.
> > > > >
> > > > > > ---
> > > > > >  drivers/of/property.c | 16 ++++++++++++++--
> > > > > >  1 file changed, 14 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/of/property.c b/drivers/of/property.c
> > > > > > index 1c83e68f805b..9adebc63bea9 100644
> > > > > > --- a/drivers/of/property.c
> > > > > > +++ b/drivers/of/property.c
> > > > > > @@ -1322,7 +1322,13 @@ static struct device_node *parse_interrupt_map(struct device_node *np,
> > > > > >         addrcells = of_bus_n_addr_cells(np);
> > > > > >
> > > > > >         imap = of_get_property(np, "interrupt-map", &imaplen);
> > > > > > -       if (!imap || imaplen <= (addrcells + intcells))
> > > > > > +       imaplen /= sizeof(*imap);
> > > > > > +
> > > > > > +       /*
> > > > > > +        * Check that we have enough runway for the child unit interrupt
> > > > > > +        * specifier and a phandle. That's the bare minimum we can expect.
> > > > > > +        */
> > > > > > +       if (!imap || imaplen <= (addrcells + intcells + 1))
> > > > > >                 return NULL;
> > > > > >         imap_end = imap + imaplen;
> > > > > >
> > > > > > @@ -1346,8 +1352,14 @@ static struct device_node *parse_interrupt_map(struct device_node *np,
> > > > > >                 if (!index)
> > > > > >                         return sup_args.np;
> > > > > >
> > > > > > -               of_node_put(sup_args.np);
> > > > > > +               /*
> > > > > > +                * Account for the full parent unit interrupt specifier
> > > > > > +                * (address cells, interrupt cells, and phandle).
> > > > > > +                */
> > > > > > +               imap += of_bus_n_addr_cells(sup_args.np);
> > > > >
> > > > > This breaks for RISC-V because we don't have "#address-cells"
> > > > > property in interrupt controller DT node and of_bus_n_addr_cells()
> > > > > retrieves "#address-cells" from the parent of interrupt controller.
> > > >
> > > > That's a feature, not a bug. #address-cells, AFAICT, applies to all
> > > > child nodes until you set it otherwise.
> > > >
> > > > >
> > > > > The of_irq_parse_raw() looks for "#address-cells" property
> > > > > in the interrupt controller DT node only so we should do a
> > > > > similar thing here as well.
> > > >
> > > > This looks more like a of_irq_parse_raw() bug than anything else.
> > >
> > > If we change of_irq_parse_raw() to use of_bus_n_addr_cells()
> > > then it would still break for RISC-V.
> >
> > I'm not trying to "fix" riscv. I'm merely outlining that you are
> > relying on both broken DTs and a buggy OS.
> >
> > >
> > > Using of_bus_n_addr_cells() over here forces interrupt controller
> > > DT nodes to have a "#address-cells" DT property. There are many
> > > interrupt controller (e.g. RISC-V PLIC or RISC-V APLIC) where the
> > > DT bindings don't require "#address-cells" DT property and existing
> > > DTS files with such interrupt controllers don't have it either.
> >
> > It forces you to set #address-cells *if you rely on a different
> > value in a child node*. It's not like the semantics are new.
>
> We don't have child nodes under the interrupt controller DT node
> (for both RISC-V PLIC and APLIC) so we certainly don't require the
> "#address-cells" property in the interrupt controller DT node.

interrupt controller nodes which are referred to by interrupt-map
require #address-cells. Period. Child nodes or not.

Really, it should be just interrupt-controller nodes require
#address-cells, but that spewed too many warnings so it's limited to
where it is really needed.

Rob
Marc Zyngier May 29, 2024, 2 p.m. UTC | #11
On Wed, 29 May 2024 14:44:07 +0100,
Rob Herring <robh@kernel.org> wrote:
> 
> On Wed, May 29, 2024 at 1:33 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Wed, 29 May 2024 06:15:52 +0100,
> > Anup Patel <apatel@ventanamicro.com> wrote:
> > >
> > > This breaks for RISC-V because we don't have "#address-cells"
> > > property in interrupt controller DT node and of_bus_n_addr_cells()
> > > retrieves "#address-cells" from the parent of interrupt controller.
> >
> > That's a feature, not a bug. #address-cells, AFAICT, applies to all
> > child nodes until you set it otherwise.
> 
> That may be supported in some places, but only because of buggy DTs
> (we're talking 2000 era). Current dtc should warn if an interrupt
> controller node doesn't have #address-cells AND is referred to by
> interrupt-map.

Clearly that didn't deter the riscv folks from doing silly things, and
we're now stuck with more random hacks.

Anyhow, I vented enough about this, and I'm going back to doing
semi-useful stuff.

	M.
Conor Dooley May 29, 2024, 3:17 p.m. UTC | #12
On Wed, May 29, 2024 at 01:00:07PM +0100, Marc Zyngier wrote:
> > > > In the RISC-V world, there have been quite a few QEMU releases
> > > > where the generated DT node of the interrupt controller does not
> > > > have the "#address-cells" property. This patch breaks the kernel
> > > > for all such QEMU releases.
> > >
> > > Congratulations, you've forked DT. News at 11.
> > 
> > Can you elaborate how ?
> 
> You've stated it yourself. You are relying on a behaviour that
> deviates from the standard by having DTs with missing properties
> 
> And since we can't travel back it time to fix this, the only solution
> I can see is to support both behaviours by quirking it.

I'm not convinced that there is any actual production hardware that
would get broken by your patch, just QEMU, so I think it should get
fixed to output devicetrees that are spec compliant rather than add some
riscv-specific hacks that we can't even gate on the "qemu,aplic"
compatible because QEMU doesn't use the compatible created for it...

Spec violations aside, the QEMU aplic nodes in the DT contain a bunch
of other issues, including using properties that changed in the
upstreaming process. Here's the issues with Alistair's current riscv
tree for QEMU w/ -smp 4 -M virt,aia=aplic,dumpdtb=$(qemu_dtb) -cpu max -m 1G -nographic

qemu.dtb: aplic@d000000: $nodename:0: 'aplic@d000000' does not match '^interrupt-controller(@[0-9a-f,]+)*$'
        from schema $id: http://devicetree.org/schemas/interrupt-controller/riscv,aplic.yaml#
qemu.dtb: aplic@d000000: compatible:0: 'riscv,aplic' is not one of ['qemu,aplic']
        from schema $id: http://devicetree.org/schemas/interrupt-controller/riscv,aplic.yaml#
qemu.dtb: aplic@d000000: compatible: ['riscv,aplic'] is too short
        from schema $id: http://devicetree.org/schemas/interrupt-controller/riscv,aplic.yaml#
qemu.dtb: aplic@d000000: Unevaluated properties are not allowed ('compatible' was unexpected)
        from schema $id: http://devicetree.org/schemas/interrupt-controller/riscv,aplic.yaml#
qemu.dtb: aplic@c000000: $nodename:0: 'aplic@c000000' does not match '^interrupt-controller(@[0-9a-f,]+)*$'
        from schema $id: http://devicetree.org/schemas/interrupt-controller/riscv,aplic.yaml#
qemu.dtb: aplic@c000000: compatible:0: 'riscv,aplic' is not one of ['qemu,aplic']
        from schema $id: http://devicetree.org/schemas/interrupt-controller/riscv,aplic.yaml#
qemu.dtb: aplic@c000000: compatible: ['riscv,aplic'] is too short
        from schema $id: http://devicetree.org/schemas/interrupt-controller/riscv,aplic.yaml#
qemu.dtb: aplic@c000000: Unevaluated properties are not allowed ('compatible', 'riscv,delegate' were unexpected)
        from schema $id: http://devicetree.org/schemas/interrupt-controller/riscv,aplic.yaml#

I guess noone updated QEMU to comply with the bindings that actually got
upstreamed for the aplic?
Anup Patel May 29, 2024, 4:37 p.m. UTC | #13
On Wed, May 29, 2024 at 8:47 PM Conor Dooley <conor@kernel.org> wrote:
>
> On Wed, May 29, 2024 at 01:00:07PM +0100, Marc Zyngier wrote:
> > > > > In the RISC-V world, there have been quite a few QEMU releases
> > > > > where the generated DT node of the interrupt controller does not
> > > > > have the "#address-cells" property. This patch breaks the kernel
> > > > > for all such QEMU releases.
> > > >
> > > > Congratulations, you've forked DT. News at 11.
> > >
> > > Can you elaborate how ?
> >
> > You've stated it yourself. You are relying on a behaviour that
> > deviates from the standard by having DTs with missing properties
> >
> > And since we can't travel back it time to fix this, the only solution
> > I can see is to support both behaviours by quirking it.
>
> I'm not convinced that there is any actual production hardware that
> would get broken by your patch, just QEMU, so I think it should get
> fixed to output devicetrees that are spec compliant rather than add some
> riscv-specific hacks that we can't even gate on the "qemu,aplic"
> compatible because QEMU doesn't use the compatible created for it...

I also did further digging and it turns out the "#address-cells"
is missing only for APLIC DT nodes and this issue only impacts
APLIC DT node creation in QEMU RISC-V virt machine. We
should just go ahead and fix QEMU.

>
> Spec violations aside, the QEMU aplic nodes in the DT contain a bunch
> of other issues, including using properties that changed in the
> upstreaming process. Here's the issues with Alistair's current riscv
> tree for QEMU w/ -smp 4 -M virt,aia=aplic,dumpdtb=$(qemu_dtb) -cpu max -m 1G -nographic
>
> qemu.dtb: aplic@d000000: $nodename:0: 'aplic@d000000' does not match '^interrupt-controller(@[0-9a-f,]+)*$'
>         from schema $id: http://devicetree.org/schemas/interrupt-controller/riscv,aplic.yaml#
> qemu.dtb: aplic@d000000: compatible:0: 'riscv,aplic' is not one of ['qemu,aplic']
>         from schema $id: http://devicetree.org/schemas/interrupt-controller/riscv,aplic.yaml#
> qemu.dtb: aplic@d000000: compatible: ['riscv,aplic'] is too short
>         from schema $id: http://devicetree.org/schemas/interrupt-controller/riscv,aplic.yaml#
> qemu.dtb: aplic@d000000: Unevaluated properties are not allowed ('compatible' was unexpected)
>         from schema $id: http://devicetree.org/schemas/interrupt-controller/riscv,aplic.yaml#
> qemu.dtb: aplic@c000000: $nodename:0: 'aplic@c000000' does not match '^interrupt-controller(@[0-9a-f,]+)*$'
>         from schema $id: http://devicetree.org/schemas/interrupt-controller/riscv,aplic.yaml#
> qemu.dtb: aplic@c000000: compatible:0: 'riscv,aplic' is not one of ['qemu,aplic']
>         from schema $id: http://devicetree.org/schemas/interrupt-controller/riscv,aplic.yaml#
> qemu.dtb: aplic@c000000: compatible: ['riscv,aplic'] is too short
>         from schema $id: http://devicetree.org/schemas/interrupt-controller/riscv,aplic.yaml#
> qemu.dtb: aplic@c000000: Unevaluated properties are not allowed ('compatible', 'riscv,delegate' were unexpected)
>         from schema $id: http://devicetree.org/schemas/interrupt-controller/riscv,aplic.yaml#
>
> I guess noone updated QEMU to comply with the bindings that actually got
> upstreamed for the aplic?

Yes, we never bothered to update the QEMU DT generation after
AIA DT bindings were accepted. Thanks for catching.

Regards,
Anup
Rob Herring (Arm) May 29, 2024, 4:46 p.m. UTC | #14
On Wed, May 29, 2024 at 8:51 AM Rob Herring <robh@kernel.org> wrote:
>
> On Wed, May 29, 2024 at 6:28 AM Anup Patel <apatel@ventanamicro.com> wrote:
> >
> > On Wed, May 29, 2024 at 4:15 PM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > On Wed, 29 May 2024 11:16:30 +0100,
> > > Anup Patel <apatel@ventanamicro.com> wrote:
> > > >
> > > > On Wed, May 29, 2024 at 12:03 PM Marc Zyngier <maz@kernel.org> wrote:
> > > > >
> > > > > On Wed, 29 May 2024 06:15:52 +0100,
> > > > > Anup Patel <apatel@ventanamicro.com> wrote:
> > > > > >
> > > > > > On Tue, May 28, 2024 at 10:11 PM Marc Zyngier <maz@kernel.org> wrote:
> > > > > > >
> > > > > > > Commit d976c6f4b32c ("of: property: Add fw_devlink support for
> > > > > > > interrupt-map property") tried to do what it says on the tin,
> > > > > > > but failed on a couple of points:
> > > > > > >
> > > > > > > - it confuses bytes and cells. Not a huge deal, except when it
> > > > > > >   comes to pointer arithmetic
> > > > > > >
> > > > > > > - it doesn't really handle anything but interrupt-maps that have
> > > > > > >   their parent #address-cells set to 0
> > > > > > >
> > > > > > > The combinations of the two leads to some serious fun on my M1
> > > > > > > box, with plenty of WARN-ON() firing all over the shop, and
> > > > > > > amusing values being generated for interrupt specifiers.
> > > > > > >
> > > > > > > Address both issues so that I can boot my machines again.
> > > > > > >
> > > > > > > Fixes: d976c6f4b32c ("of: property: Add fw_devlink support for interrupt-map property")
> > > > > > > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > > > > > > Cc: Anup Patel <apatel@ventanamicro.com>
> > > > > > > Cc: Saravana Kannan <saravanak@google.com>
> > > > > > > Cc: Rob Herring (Arm) <robh@kernel.org>
> > > > > >
> > > > > > Thanks for the fix patch but unfortunately it breaks for RISC-V.
> > > > > >
> > > > > > > ---
> > > > > > >  drivers/of/property.c | 16 ++++++++++++++--
> > > > > > >  1 file changed, 14 insertions(+), 2 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/of/property.c b/drivers/of/property.c
> > > > > > > index 1c83e68f805b..9adebc63bea9 100644
> > > > > > > --- a/drivers/of/property.c
> > > > > > > +++ b/drivers/of/property.c
> > > > > > > @@ -1322,7 +1322,13 @@ static struct device_node *parse_interrupt_map(struct device_node *np,
> > > > > > >         addrcells = of_bus_n_addr_cells(np);
> > > > > > >
> > > > > > >         imap = of_get_property(np, "interrupt-map", &imaplen);
> > > > > > > -       if (!imap || imaplen <= (addrcells + intcells))
> > > > > > > +       imaplen /= sizeof(*imap);
> > > > > > > +
> > > > > > > +       /*
> > > > > > > +        * Check that we have enough runway for the child unit interrupt
> > > > > > > +        * specifier and a phandle. That's the bare minimum we can expect.
> > > > > > > +        */
> > > > > > > +       if (!imap || imaplen <= (addrcells + intcells + 1))
> > > > > > >                 return NULL;
> > > > > > >         imap_end = imap + imaplen;
> > > > > > >
> > > > > > > @@ -1346,8 +1352,14 @@ static struct device_node *parse_interrupt_map(struct device_node *np,
> > > > > > >                 if (!index)
> > > > > > >                         return sup_args.np;
> > > > > > >
> > > > > > > -               of_node_put(sup_args.np);
> > > > > > > +               /*
> > > > > > > +                * Account for the full parent unit interrupt specifier
> > > > > > > +                * (address cells, interrupt cells, and phandle).
> > > > > > > +                */
> > > > > > > +               imap += of_bus_n_addr_cells(sup_args.np);
> > > > > >
> > > > > > This breaks for RISC-V because we don't have "#address-cells"
> > > > > > property in interrupt controller DT node and of_bus_n_addr_cells()
> > > > > > retrieves "#address-cells" from the parent of interrupt controller.
> > > > >
> > > > > That's a feature, not a bug. #address-cells, AFAICT, applies to all
> > > > > child nodes until you set it otherwise.
> > > > >
> > > > > >
> > > > > > The of_irq_parse_raw() looks for "#address-cells" property
> > > > > > in the interrupt controller DT node only so we should do a
> > > > > > similar thing here as well.
> > > > >
> > > > > This looks more like a of_irq_parse_raw() bug than anything else.
> > > >
> > > > If we change of_irq_parse_raw() to use of_bus_n_addr_cells()
> > > > then it would still break for RISC-V.
> > >
> > > I'm not trying to "fix" riscv. I'm merely outlining that you are
> > > relying on both broken DTs and a buggy OS.
> > >
> > > >
> > > > Using of_bus_n_addr_cells() over here forces interrupt controller
> > > > DT nodes to have a "#address-cells" DT property. There are many
> > > > interrupt controller (e.g. RISC-V PLIC or RISC-V APLIC) where the
> > > > DT bindings don't require "#address-cells" DT property and existing
> > > > DTS files with such interrupt controllers don't have it either.
> > >
> > > It forces you to set #address-cells *if you rely on a different
> > > value in a child node*. It's not like the semantics are new.
> >
> > We don't have child nodes under the interrupt controller DT node
> > (for both RISC-V PLIC and APLIC) so we certainly don't require the
> > "#address-cells" property in the interrupt controller DT node.
>
> interrupt controller nodes which are referred to by interrupt-map
> require #address-cells. Period. Child nodes or not.

Now that I've re-read the code, the kernel will treat missing
#address-cells in the interrupt parent as 0. Looking in the parent
nodes for #address-cells only happens for the input address (i.e. the
address before the phandle). IOW, the first use of_bus_n_addr_cells()
was correct, but the 2nd use is not correct. That's not great, but
that's how this code passed down on stone tablets works...

As I commented on v1 of the original patch. I don't like parsing
interrupt-map in 2 places. It leads to problems exactly as this thread
has shown. The duplication was reduced some, but is still more than
I'd like. So I'm looking at how to refactor of_irq_parse_raw() to make
it work for what's needed here. Maybe I'll have a fix quickly. If not,
I'm going to revert the original commit.

Rob
diff mbox series

Patch

diff --git a/drivers/of/property.c b/drivers/of/property.c
index 1c83e68f805b..9adebc63bea9 100644
--- a/drivers/of/property.c
+++ b/drivers/of/property.c
@@ -1322,7 +1322,13 @@  static struct device_node *parse_interrupt_map(struct device_node *np,
 	addrcells = of_bus_n_addr_cells(np);
 
 	imap = of_get_property(np, "interrupt-map", &imaplen);
-	if (!imap || imaplen <= (addrcells + intcells))
+	imaplen /= sizeof(*imap);
+
+	/*
+	 * Check that we have enough runway for the child unit interrupt
+	 * specifier and a phandle. That's the bare minimum we can expect.
+	 */
+	if (!imap || imaplen <= (addrcells + intcells + 1))
 		return NULL;
 	imap_end = imap + imaplen;
 
@@ -1346,8 +1352,14 @@  static struct device_node *parse_interrupt_map(struct device_node *np,
 		if (!index)
 			return sup_args.np;
 
-		of_node_put(sup_args.np);
+		/*
+		 * Account for the full parent unit interrupt specifier
+		 * (address cells, interrupt cells, and phandle).
+		 */
+		imap += of_bus_n_addr_cells(sup_args.np);
 		imap += sup_args.args_count + 1;
+
+		of_node_put(sup_args.np);
 		index--;
 	}