diff mbox series

[1/4] of/fdt: Update zone_dma_bits when running in bcm2711

Message ID 20201001161740.29064-2-nsaenzjulienne@suse.de (mailing list archive)
State New, archived
Headers show
Series arm64: Default to 32-bit wide ZONE_DMA | expand

Commit Message

Nicolas Saenz Julienne Oct. 1, 2020, 4:17 p.m. UTC
arm64 wants to be able to set ZONE_DMA's size depending on the specific
platform its being run on. Ideally this could be achieved in a smart way
by parsing all dma-ranges and calculating the smaller DMA constraint in
the system. Easier said than done. We compromised on a simpler solution
as the only platform interested in using this is the Raspberry Pi 4.

So update zone_dma_bits if the machine's compatible string matches
Raspberry Pi 4's, otherwise let arm64's mm code deal with it.

Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
---
 drivers/of/fdt.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Catalin Marinas Oct. 1, 2020, 5:15 p.m. UTC | #1
Hi Nicolas,

Thanks for putting this together.

On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote:
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index 4602e467ca8b..cd0d115ef329 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -25,6 +25,7 @@
>  #include <linux/serial_core.h>
>  #include <linux/sysfs.h>
>  #include <linux/random.h>
> +#include <linux/dma-direct.h>	/* for zone_dma_bits */
>  
>  #include <asm/setup.h>  /* for COMMAND_LINE_SIZE */
>  #include <asm/page.h>
> @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
>  	of_scan_flat_dt(early_init_dt_scan_memory, NULL);
>  }
>  
> +void __init early_init_dt_update_zone_dma_bits(void)
> +{
> +	unsigned long dt_root = of_get_flat_dt_root();
> +
> +	if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
> +		zone_dma_bits = 30;
> +}

I think we could keep this entirely in the arm64 setup_machine_fdt() and
not pollute the core code with RPi4-specific code.
Catalin Marinas Oct. 1, 2020, 5:23 p.m. UTC | #2
On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote:
> Hi Nicolas,
> 
> Thanks for putting this together.
> 
> On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote:
> > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > index 4602e467ca8b..cd0d115ef329 100644
> > --- a/drivers/of/fdt.c
> > +++ b/drivers/of/fdt.c
> > @@ -25,6 +25,7 @@
> >  #include <linux/serial_core.h>
> >  #include <linux/sysfs.h>
> >  #include <linux/random.h>
> > +#include <linux/dma-direct.h>	/* for zone_dma_bits */
> >  
> >  #include <asm/setup.h>  /* for COMMAND_LINE_SIZE */
> >  #include <asm/page.h>
> > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
> >  	of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> >  }
> >  
> > +void __init early_init_dt_update_zone_dma_bits(void)
> > +{
> > +	unsigned long dt_root = of_get_flat_dt_root();
> > +
> > +	if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
> > +		zone_dma_bits = 30;
> > +}
> 
> I think we could keep this entirely in the arm64 setup_machine_fdt() and
> not pollute the core code with RPi4-specific code.

Actually, even better, could we not move the check to
arm64_memblock_init() when we initialise zone_dma_bits?
Nicolas Saenz Julienne Oct. 1, 2020, 5:31 p.m. UTC | #3
On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote:
> On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote:
> > Hi Nicolas,
> > 
> > Thanks for putting this together.
> > 
> > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote:
> > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > index 4602e467ca8b..cd0d115ef329 100644
> > > --- a/drivers/of/fdt.c
> > > +++ b/drivers/of/fdt.c
> > > @@ -25,6 +25,7 @@
> > >  #include <linux/serial_core.h>
> > >  #include <linux/sysfs.h>
> > >  #include <linux/random.h>
> > > +#include <linux/dma-direct.h>	/* for zone_dma_bits */
> > >  
> > >  #include <asm/setup.h>  /* for COMMAND_LINE_SIZE */
> > >  #include <asm/page.h>
> > > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
> > >  	of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> > >  }
> > >  
> > > +void __init early_init_dt_update_zone_dma_bits(void)
> > > +{
> > > +	unsigned long dt_root = of_get_flat_dt_root();
> > > +
> > > +	if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
> > > +		zone_dma_bits = 30;
> > > +}
> > 
> > I think we could keep this entirely in the arm64 setup_machine_fdt() and
> > not pollute the core code with RPi4-specific code.
> 
> Actually, even better, could we not move the check to
> arm64_memblock_init() when we initialise zone_dma_bits?

I did it this way as I vaguely remembered Rob saying he wanted to centralise
all early boot fdt code in one place. But I'll be happy to move it there.

Regards,
Nicolas
Rob Herring Oct. 1, 2020, 8:02 p.m. UTC | #4
On Thu, Oct 1, 2020 at 12:31 PM Nicolas Saenz Julienne
<nsaenzjulienne@suse.de> wrote:
>
> On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote:
> > On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote:
> > > Hi Nicolas,
> > >
> > > Thanks for putting this together.
> > >
> > > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote:
> > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > > index 4602e467ca8b..cd0d115ef329 100644
> > > > --- a/drivers/of/fdt.c
> > > > +++ b/drivers/of/fdt.c
> > > > @@ -25,6 +25,7 @@
> > > >  #include <linux/serial_core.h>
> > > >  #include <linux/sysfs.h>
> > > >  #include <linux/random.h>
> > > > +#include <linux/dma-direct.h>    /* for zone_dma_bits */
> > > >
> > > >  #include <asm/setup.h>  /* for COMMAND_LINE_SIZE */
> > > >  #include <asm/page.h>
> > > > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
> > > >   of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> > > >  }
> > > >
> > > > +void __init early_init_dt_update_zone_dma_bits(void)
> > > > +{
> > > > + unsigned long dt_root = of_get_flat_dt_root();
> > > > +
> > > > + if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
> > > > +         zone_dma_bits = 30;
> > > > +}
> > >
> > > I think we could keep this entirely in the arm64 setup_machine_fdt() and
> > > not pollute the core code with RPi4-specific code.
> >
> > Actually, even better, could we not move the check to
> > arm64_memblock_init() when we initialise zone_dma_bits?
>
> I did it this way as I vaguely remembered Rob saying he wanted to centralise
> all early boot fdt code in one place. But I'll be happy to move it there.

Right, unless zone_dma_bits is only an arm64 thing, then this doesn't
really have anything arch specific.

Reviewed-by: Rob Herring <robh@kernel.org>

Rob
Catalin Marinas Oct. 2, 2020, 11:55 a.m. UTC | #5
On Thu, Oct 01, 2020 at 07:31:19PM +0200, Nicolas Saenz Julienne wrote:
> On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote:
> > On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote:
> > > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote:
> > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > > index 4602e467ca8b..cd0d115ef329 100644
> > > > --- a/drivers/of/fdt.c
> > > > +++ b/drivers/of/fdt.c
> > > > @@ -25,6 +25,7 @@
> > > >  #include <linux/serial_core.h>
> > > >  #include <linux/sysfs.h>
> > > >  #include <linux/random.h>
> > > > +#include <linux/dma-direct.h>	/* for zone_dma_bits */
> > > >  
> > > >  #include <asm/setup.h>  /* for COMMAND_LINE_SIZE */
> > > >  #include <asm/page.h>
> > > > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
> > > >  	of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> > > >  }
> > > >  
> > > > +void __init early_init_dt_update_zone_dma_bits(void)
> > > > +{
> > > > +	unsigned long dt_root = of_get_flat_dt_root();
> > > > +
> > > > +	if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
> > > > +		zone_dma_bits = 30;
> > > > +}
> > > 
> > > I think we could keep this entirely in the arm64 setup_machine_fdt() and
> > > not pollute the core code with RPi4-specific code.
> > 
> > Actually, even better, could we not move the check to
> > arm64_memblock_init() when we initialise zone_dma_bits?
> 
> I did it this way as I vaguely remembered Rob saying he wanted to centralise
> all early boot fdt code in one place. But I'll be happy to move it there.

I can see Rob replied and I'm fine if that's his preference. However,
what I don't particularly like is that in the arm64 code, if
zone_dma_bits == 24, we set it to 32 assuming that it wasn't touched by
the early_init_dt_update_zone_dma_bits(). What if at some point we'll
get a platform that actually needs 24 here (I truly hope not, but just
the principle of relying on magic values)?

So rather than guessing, I'd prefer if the arch code can override
ZONE_DMA_BITS_DEFAULT. Then, in arm64, we'll just set it to 32 and no
need to explicitly touch the zone_dma_bits variable.
Nicolas Saenz Julienne Oct. 8, 2020, 10:05 a.m. UTC | #6
Hi Catalin, sorry for the late reply.

On Fri, 2020-10-02 at 12:55 +0100, Catalin Marinas wrote:
> On Thu, Oct 01, 2020 at 07:31:19PM +0200, Nicolas Saenz Julienne wrote:
> > On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote:
> > > On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote:
> > > > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote:
> > > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > > > index 4602e467ca8b..cd0d115ef329 100644
> > > > > --- a/drivers/of/fdt.c
> > > > > +++ b/drivers/of/fdt.c
> > > > > @@ -25,6 +25,7 @@
> > > > >  #include <linux/serial_core.h>
> > > > >  #include <linux/sysfs.h>
> > > > >  #include <linux/random.h>
> > > > > +#include <linux/dma-direct.h>	/* for zone_dma_bits */
> > > > >  
> > > > >  #include <asm/setup.h>  /* for COMMAND_LINE_SIZE */
> > > > >  #include <asm/page.h>
> > > > > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
> > > > >  	of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> > > > >  }
> > > > >  
> > > > > +void __init early_init_dt_update_zone_dma_bits(void)
> > > > > +{
> > > > > +	unsigned long dt_root = of_get_flat_dt_root();
> > > > > +
> > > > > +	if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
> > > > > +		zone_dma_bits = 30;
> > > > > +}
> > > > 
> > > > I think we could keep this entirely in the arm64 setup_machine_fdt() and
> > > > not pollute the core code with RPi4-specific code.
> > > 
> > > Actually, even better, could we not move the check to
> > > arm64_memblock_init() when we initialise zone_dma_bits?
> > 
> > I did it this way as I vaguely remembered Rob saying he wanted to centralise
> > all early boot fdt code in one place. But I'll be happy to move it there.
> 
> I can see Rob replied and I'm fine if that's his preference. However,
> what I don't particularly like is that in the arm64 code, if
> zone_dma_bits == 24, we set it to 32 assuming that it wasn't touched by
> the early_init_dt_update_zone_dma_bits(). What if at some point we'll
> get a platform that actually needs 24 here (I truly hope not, but just
> the principle of relying on magic values)?
> 
> So rather than guessing, I'd prefer if the arch code can override
> ZONE_DMA_BITS_DEFAULT. Then, in arm64, we'll just set it to 32 and no
> need to explicitly touch the zone_dma_bits variable.

Yes, sonds like the way to go. TBH I wasn't happy with that solution either,
but couldn't think of a nicer alternative.

Sadly I just realised that the series is incomplete, we have RPi4 users that
want to boot unsing ACPI, and this series would break things for them. I'll
have a word with them to see what we can do for their use-case.

Regards,
Nicolas
Catalin Marinas Oct. 8, 2020, 10:13 a.m. UTC | #7
On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> On Fri, 2020-10-02 at 12:55 +0100, Catalin Marinas wrote:
> > On Thu, Oct 01, 2020 at 07:31:19PM +0200, Nicolas Saenz Julienne wrote:
> > > On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote:
> > > > On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote:
> > > > > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote:
> > > > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > > > > index 4602e467ca8b..cd0d115ef329 100644
> > > > > > --- a/drivers/of/fdt.c
> > > > > > +++ b/drivers/of/fdt.c
> > > > > > @@ -25,6 +25,7 @@
> > > > > >  #include <linux/serial_core.h>
> > > > > >  #include <linux/sysfs.h>
> > > > > >  #include <linux/random.h>
> > > > > > +#include <linux/dma-direct.h>	/* for zone_dma_bits */
> > > > > >  
> > > > > >  #include <asm/setup.h>  /* for COMMAND_LINE_SIZE */
> > > > > >  #include <asm/page.h>
> > > > > > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
> > > > > >  	of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> > > > > >  }
> > > > > >  
> > > > > > +void __init early_init_dt_update_zone_dma_bits(void)
> > > > > > +{
> > > > > > +	unsigned long dt_root = of_get_flat_dt_root();
> > > > > > +
> > > > > > +	if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
> > > > > > +		zone_dma_bits = 30;
> > > > > > +}
> > > > > 
> > > > > I think we could keep this entirely in the arm64 setup_machine_fdt() and
> > > > > not pollute the core code with RPi4-specific code.
> > > > 
> > > > Actually, even better, could we not move the check to
> > > > arm64_memblock_init() when we initialise zone_dma_bits?
> > > 
> > > I did it this way as I vaguely remembered Rob saying he wanted to centralise
> > > all early boot fdt code in one place. But I'll be happy to move it there.
> > 
> > I can see Rob replied and I'm fine if that's his preference. However,
> > what I don't particularly like is that in the arm64 code, if
> > zone_dma_bits == 24, we set it to 32 assuming that it wasn't touched by
> > the early_init_dt_update_zone_dma_bits(). What if at some point we'll
> > get a platform that actually needs 24 here (I truly hope not, but just
> > the principle of relying on magic values)?
> > 
> > So rather than guessing, I'd prefer if the arch code can override
> > ZONE_DMA_BITS_DEFAULT. Then, in arm64, we'll just set it to 32 and no
> > need to explicitly touch the zone_dma_bits variable.
> 
> Yes, sonds like the way to go. TBH I wasn't happy with that solution either,
> but couldn't think of a nicer alternative.
> 
> Sadly I just realised that the series is incomplete, we have RPi4 users that
> want to boot unsing ACPI, and this series would break things for them. I'll
> have a word with them to see what we can do for their use-case.

Is there a way to get some SoC information from ACPI?
Ard Biesheuvel Oct. 8, 2020, 7:43 p.m. UTC | #8
(+ Lorenzo)

On Thu, 8 Oct 2020 at 12:14, Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> > On Fri, 2020-10-02 at 12:55 +0100, Catalin Marinas wrote:
> > > On Thu, Oct 01, 2020 at 07:31:19PM +0200, Nicolas Saenz Julienne wrote:
> > > > On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote:
> > > > > On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote:
> > > > > > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote:
> > > > > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > > > > > index 4602e467ca8b..cd0d115ef329 100644
> > > > > > > --- a/drivers/of/fdt.c
> > > > > > > +++ b/drivers/of/fdt.c
> > > > > > > @@ -25,6 +25,7 @@
> > > > > > >  #include <linux/serial_core.h>
> > > > > > >  #include <linux/sysfs.h>
> > > > > > >  #include <linux/random.h>
> > > > > > > +#include <linux/dma-direct.h>      /* for zone_dma_bits */
> > > > > > >
> > > > > > >  #include <asm/setup.h>  /* for COMMAND_LINE_SIZE */
> > > > > > >  #include <asm/page.h>
> > > > > > > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
> > > > > > >     of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> > > > > > >  }
> > > > > > >
> > > > > > > +void __init early_init_dt_update_zone_dma_bits(void)
> > > > > > > +{
> > > > > > > +   unsigned long dt_root = of_get_flat_dt_root();
> > > > > > > +
> > > > > > > +   if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
> > > > > > > +           zone_dma_bits = 30;
> > > > > > > +}
> > > > > >
> > > > > > I think we could keep this entirely in the arm64 setup_machine_fdt() and
> > > > > > not pollute the core code with RPi4-specific code.
> > > > >
> > > > > Actually, even better, could we not move the check to
> > > > > arm64_memblock_init() when we initialise zone_dma_bits?
> > > >
> > > > I did it this way as I vaguely remembered Rob saying he wanted to centralise
> > > > all early boot fdt code in one place. But I'll be happy to move it there.
> > >
> > > I can see Rob replied and I'm fine if that's his preference. However,
> > > what I don't particularly like is that in the arm64 code, if
> > > zone_dma_bits == 24, we set it to 32 assuming that it wasn't touched by
> > > the early_init_dt_update_zone_dma_bits(). What if at some point we'll
> > > get a platform that actually needs 24 here (I truly hope not, but just
> > > the principle of relying on magic values)?
> > >
> > > So rather than guessing, I'd prefer if the arch code can override
> > > ZONE_DMA_BITS_DEFAULT. Then, in arm64, we'll just set it to 32 and no
> > > need to explicitly touch the zone_dma_bits variable.
> >
> > Yes, sonds like the way to go. TBH I wasn't happy with that solution either,
> > but couldn't think of a nicer alternative.
> >
> > Sadly I just realised that the series is incomplete, we have RPi4 users that
> > want to boot unsing ACPI, and this series would break things for them. I'll
> > have a word with them to see what we can do for their use-case.
>
> Is there a way to get some SoC information from ACPI?
>

This is unfortunate. We used ACPI _DMA methods as they were designed
to communicate the DMA limit of the XHCI controller to the OS.

It shouldn't be too hard to match the OEM id field in the DSDT, and
switch to the smaller mask. But it sucks to have to add a quirk like
that.
Jeremy Linton Oct. 9, 2020, 3:59 a.m. UTC | #9
On 10/8/20 2:43 PM, Ard Biesheuvel wrote:
> (+ Lorenzo)
> 
> On Thu, 8 Oct 2020 at 12:14, Catalin Marinas <catalin.marinas@arm.com> wrote:
>>
>> On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
>>> On Fri, 2020-10-02 at 12:55 +0100, Catalin Marinas wrote:
>>>> On Thu, Oct 01, 2020 at 07:31:19PM +0200, Nicolas Saenz Julienne wrote:
>>>>> On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote:
>>>>>> On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote:
>>>>>>> On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote:
>>>>>>>> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
>>>>>>>> index 4602e467ca8b..cd0d115ef329 100644
>>>>>>>> --- a/drivers/of/fdt.c
>>>>>>>> +++ b/drivers/of/fdt.c
>>>>>>>> @@ -25,6 +25,7 @@
>>>>>>>>   #include <linux/serial_core.h>
>>>>>>>>   #include <linux/sysfs.h>
>>>>>>>>   #include <linux/random.h>
>>>>>>>> +#include <linux/dma-direct.h>      /* for zone_dma_bits */
>>>>>>>>
>>>>>>>>   #include <asm/setup.h>  /* for COMMAND_LINE_SIZE */
>>>>>>>>   #include <asm/page.h>
>>>>>>>> @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
>>>>>>>>      of_scan_flat_dt(early_init_dt_scan_memory, NULL);
>>>>>>>>   }
>>>>>>>>
>>>>>>>> +void __init early_init_dt_update_zone_dma_bits(void)
>>>>>>>> +{
>>>>>>>> +   unsigned long dt_root = of_get_flat_dt_root();
>>>>>>>> +
>>>>>>>> +   if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
>>>>>>>> +           zone_dma_bits = 30;
>>>>>>>> +}
>>>>>>>
>>>>>>> I think we could keep this entirely in the arm64 setup_machine_fdt() and
>>>>>>> not pollute the core code with RPi4-specific code.
>>>>>>
>>>>>> Actually, even better, could we not move the check to
>>>>>> arm64_memblock_init() when we initialise zone_dma_bits?
>>>>>
>>>>> I did it this way as I vaguely remembered Rob saying he wanted to centralise
>>>>> all early boot fdt code in one place. But I'll be happy to move it there.
>>>>
>>>> I can see Rob replied and I'm fine if that's his preference. However,
>>>> what I don't particularly like is that in the arm64 code, if
>>>> zone_dma_bits == 24, we set it to 32 assuming that it wasn't touched by
>>>> the early_init_dt_update_zone_dma_bits(). What if at some point we'll
>>>> get a platform that actually needs 24 here (I truly hope not, but just
>>>> the principle of relying on magic values)?
>>>>
>>>> So rather than guessing, I'd prefer if the arch code can override
>>>> ZONE_DMA_BITS_DEFAULT. Then, in arm64, we'll just set it to 32 and no
>>>> need to explicitly touch the zone_dma_bits variable.
>>>
>>> Yes, sonds like the way to go. TBH I wasn't happy with that solution either,
>>> but couldn't think of a nicer alternative.
>>>
>>> Sadly I just realised that the series is incomplete, we have RPi4 users that
>>> want to boot unsing ACPI, and this series would break things for them. I'll
>>> have a word with them to see what we can do for their use-case.
>>
>> Is there a way to get some SoC information from ACPI?
>>
> 
> This is unfortunate. We used ACPI _DMA methods as they were designed
> to communicate the DMA limit of the XHCI controller to the OS.
> 
> It shouldn't be too hard to match the OEM id field in the DSDT, and
> switch to the smaller mask. But it sucks to have to add a quirk like
> that.
> It also requires delaying setting the arm64_dma_phy_limit a bit, but 
that doesn't appear to be causing a problem. I've been boot/compiling 
with a patch like:

diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
index cada0b816c8a..9dfe776c1c75 100644
--- a/arch/arm64/kernel/acpi.c
+++ b/arch/arm64/kernel/acpi.c
@@ -14,6 +14,7 @@

  #include <linux/acpi.h>
  #include <linux/cpumask.h>
+#include <linux/dma-direct.h>
  #include <linux/efi.h>
  #include <linux/efi-bgrt.h>
  #include <linux/init.h>
@@ -168,6 +169,11 @@ static int __init acpi_fadt_sanity_check(void)
                 ret = -EINVAL;
         }

+       if (!strncmp(table->oem_id, "RPIFDN", ACPI_OEM_ID_SIZE) &&
+           !strncmp(table->oem_table_id,  "RPI4    ", 
ACPI_OEM_TABLE_ID_SIZE) &&
+           table->oem_revision <= 0x200)  {
+               zone_dma_bits = 30;
+       }
  out:
         /*
          * acpi_get_table() creates FADT table mapping that
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index cd5caca8a929..6c8aaf1570ce 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -195,6 +195,7 @@ static void __init zone_sizes_init(unsigned long 
min, unsigned long max)
         unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};

  #ifdef CONFIG_ZONE_DMA
+       arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
         max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
  #endif
  #ifdef CONFIG_ZONE_DMA32
@@ -393,7 +394,6 @@ void __init arm64_memblock_init(void)
                  */
                 if (zone_dma_bits == ZONE_DMA_BITS_DEFAULT)
                         zone_dma_bits = 32;
-               arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
         }

         if (IS_ENABLED(CONFIG_ZONE_DMA32))
Christoph Hellwig Oct. 9, 2020, 7:10 a.m. UTC | #10
On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> Sadly I just realised that the series is incomplete, we have RPi4 users that
> want to boot unsing ACPI, and this series would break things for them. I'll
> have a word with them to see what we can do for their use-case.

Stupid question:  why do these users insist on a totally unsuitable
interface?  And why would we as Linux developers care to support such
a aims?
Ard Biesheuvel Oct. 9, 2020, 7:37 a.m. UTC | #11
On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig <hch@lst.de> wrote:
>
> On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> > Sadly I just realised that the series is incomplete, we have RPi4 users that
> > want to boot unsing ACPI, and this series would break things for them. I'll
> > have a word with them to see what we can do for their use-case.
>
> Stupid question:  why do these users insist on a totally unsuitable
> interface?  And why would we as Linux developers care to support such
> a aims?
>

The point is really whether we want to revert changes in Linux that
made both DT and ACPI boot work without quirks on RPi4. Having to
check the RPi4 compatible string or OEM id in core init code is awful,
regardless of whether you boot via ACPI or via DT.

The problem with this hardware is that it uses a DMA mask which is
narrower than 32, and the arm64 kernel is simply not set up to deal
with that at all. On DT, we have DMA ranges properties and the likes
to describe such limitations, on ACPI we have _DMA methods as well as
DMA range attributes in the IORT, both of which are now handled
correctly. So all the information is there, we just have to figure out
how to consume it early on.

Interestingly, this limitation always existed in the SoC, but it
wasn't until they started shipping it with more than 1 GB of DRAM that
it became a problem. This means issues like this could resurface in
the future with existing SoCs when they get shipped with more memory,
and so I would prefer fixing this in a generic way.

Also, I assume papering over the issue like this does not fix the
kdump issue fundamentally, it just works around it, and so we might
run into this again in the future.
Nicolas Saenz Julienne Oct. 9, 2020, 8:36 a.m. UTC | #12
On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote:
> On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig <hch@lst.de> wrote:
> > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> > > Sadly I just realised that the series is incomplete, we have RPi4 users that
> > > want to boot unsing ACPI, and this series would break things for them. I'll
> > > have a word with them to see what we can do for their use-case.
> > 
> > Stupid question:  why do these users insist on a totally unsuitable
> > interface? And why would we as Linux developers care to support such
> > a aims?
>
> The point is really whether we want to revert changes in Linux that
> made both DT and ACPI boot work without quirks on RPi4.

Well, and broke a big amount of devices that were otherwise fine.

> Having to check the RPi4 compatible string or OEM id in core init code is
> awful, regardless of whether you boot via ACPI or via DT.
>
> The problem with this hardware is that it uses a DMA mask which is
> narrower than 32, and the arm64 kernel is simply not set up to deal
> with that at all. On DT, we have DMA ranges properties and the likes
> to describe such limitations, on ACPI we have _DMA methods as well as
> DMA range attributes in the IORT, both of which are now handled
> correctly. So all the information is there, we just have to figure out
> how to consume it early on.

Is it worth the effort just for a single board? I don't know about ACPI but
parsing dma-ranges that early at boot time is not trivial. My intuition tells
me that it'd be even harder for ACPI, being a more complex data structure.

> Interestingly, this limitation always existed in the SoC, but it
> wasn't until they started shipping it with more than 1 GB of DRAM that
> it became a problem. This means issues like this could resurface in
> the future with existing SoCs when they get shipped with more memory,
> and so I would prefer fixing this in a generic way.

Actually what I proposed here is pretty generic. Specially from arm64's
perspective. We call early_init_dt_scan(), which sets up zone_dma_bits based on
whatever it finds in DT. Both those operations are architecture independent.
arm64 arch code doesn't care about the logic involved in ascertaining
zone_dma_bits. I get that the last step isn't generic. But it's all setup so as
to make it as such whenever it's worth the effort.

> Also, I assume papering over the issue like this does not fix the
> kdump issue fundamentally, it just works around it, and so we might
> run into this again in the future.

Any ideas? The way I understand it the kdump issue is just a shortcoming of
the memory zones design.

Regards,
Nicolas
Nicolas Saenz Julienne Oct. 9, 2020, 8:37 a.m. UTC | #13
Hi Jeremy.

On Thu, 2020-10-08 at 22:59 -0500, Jeremy Linton wrote:
> On 10/8/20 2:43 PM, Ard Biesheuvel wrote:
> > (+ Lorenzo)
> > 
> > On Thu, 8 Oct 2020 at 12:14, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> > > > On Fri, 2020-10-02 at 12:55 +0100, Catalin Marinas wrote:
> > > > > On Thu, Oct 01, 2020 at 07:31:19PM +0200, Nicolas Saenz Julienne wrote:
> > > > > > On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote:
> > > > > > > On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote:
> > > > > > > > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote:
> > > > > > > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > > > > > > > index 4602e467ca8b..cd0d115ef329 100644
> > > > > > > > > --- a/drivers/of/fdt.c
> > > > > > > > > +++ b/drivers/of/fdt.c
> > > > > > > > > @@ -25,6 +25,7 @@
> > > > > > > > >   #include <linux/serial_core.h>
> > > > > > > > >   #include <linux/sysfs.h>
> > > > > > > > >   #include <linux/random.h>
> > > > > > > > > +#include <linux/dma-direct.h>      /* for zone_dma_bits */
> > > > > > > > > 
> > > > > > > > >   #include <asm/setup.h>  /* for COMMAND_LINE_SIZE */
> > > > > > > > >   #include <asm/page.h>
> > > > > > > > > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
> > > > > > > > >      of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> > > > > > > > >   }
> > > > > > > > > 
> > > > > > > > > +void __init early_init_dt_update_zone_dma_bits(void)
> > > > > > > > > +{
> > > > > > > > > +   unsigned long dt_root = of_get_flat_dt_root();
> > > > > > > > > +
> > > > > > > > > +   if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
> > > > > > > > > +           zone_dma_bits = 30;
> > > > > > > > > +}
> > > > > > > > 
> > > > > > > > I think we could keep this entirely in the arm64 setup_machine_fdt() and
> > > > > > > > not pollute the core code with RPi4-specific code.
> > > > > > > 
> > > > > > > Actually, even better, could we not move the check to
> > > > > > > arm64_memblock_init() when we initialise zone_dma_bits?
> > > > > > 
> > > > > > I did it this way as I vaguely remembered Rob saying he wanted to centralise
> > > > > > all early boot fdt code in one place. But I'll be happy to move it there.
> > > > > 
> > > > > I can see Rob replied and I'm fine if that's his preference. However,
> > > > > what I don't particularly like is that in the arm64 code, if
> > > > > zone_dma_bits == 24, we set it to 32 assuming that it wasn't touched by
> > > > > the early_init_dt_update_zone_dma_bits(). What if at some point we'll
> > > > > get a platform that actually needs 24 here (I truly hope not, but just
> > > > > the principle of relying on magic values)?
> > > > > 
> > > > > So rather than guessing, I'd prefer if the arch code can override
> > > > > ZONE_DMA_BITS_DEFAULT. Then, in arm64, we'll just set it to 32 and no
> > > > > need to explicitly touch the zone_dma_bits variable.
> > > > 
> > > > Yes, sonds like the way to go. TBH I wasn't happy with that solution either,
> > > > but couldn't think of a nicer alternative.
> > > > 
> > > > Sadly I just realised that the series is incomplete, we have RPi4 users that
> > > > want to boot unsing ACPI, and this series would break things for them. I'll
> > > > have a word with them to see what we can do for their use-case.
> > > 
> > > Is there a way to get some SoC information from ACPI?
> > > 
> > 
> > This is unfortunate. We used ACPI _DMA methods as they were designed
> > to communicate the DMA limit of the XHCI controller to the OS.
> > 
> > It shouldn't be too hard to match the OEM id field in the DSDT, and
> > switch to the smaller mask. But it sucks to have to add a quirk like
> > that.
> > It also requires delaying setting the arm64_dma_phy_limit a bit, but 
> that doesn't appear to be causing a problem. I've been boot/compiling 
> with a patch like:
> 
> diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
> index cada0b816c8a..9dfe776c1c75 100644
> --- a/arch/arm64/kernel/acpi.c
> +++ b/arch/arm64/kernel/acpi.c
> @@ -14,6 +14,7 @@
> 
>   #include <linux/acpi.h>
>   #include <linux/cpumask.h>
> +#include <linux/dma-direct.h>
>   #include <linux/efi.h>
>   #include <linux/efi-bgrt.h>
>   #include <linux/init.h>
> @@ -168,6 +169,11 @@ static int __init acpi_fadt_sanity_check(void)
>                  ret = -EINVAL;
>          }
> 
> +       if (!strncmp(table->oem_id, "RPIFDN", ACPI_OEM_ID_SIZE) &&
> +           !strncmp(table->oem_table_id,  "RPI4    ", 
> ACPI_OEM_TABLE_ID_SIZE) &&
> +           table->oem_revision <= 0x200)  {
> +               zone_dma_bits = 30;
> +       }
>   out:
>          /*
>           * acpi_get_table() creates FADT table mapping that
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index cd5caca8a929..6c8aaf1570ce 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -195,6 +195,7 @@ static void __init zone_sizes_init(unsigned long 
> min, unsigned long max)
>          unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
> 
>   #ifdef CONFIG_ZONE_DMA
> +       arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
>          max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
>   #endif
>   #ifdef CONFIG_ZONE_DMA32
> @@ -393,7 +394,6 @@ void __init arm64_memblock_init(void)
>                   */
>                  if (zone_dma_bits == ZONE_DMA_BITS_DEFAULT)
>                          zone_dma_bits = 32;
> -               arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
>          }
> 
>          if (IS_ENABLED(CONFIG_ZONE_DMA32))
> 
> 

Thanks for taking the time to look at this!

Regards,
Nicolas
Ard Biesheuvel Oct. 9, 2020, 9:13 a.m. UTC | #14
On Fri, 9 Oct 2020 at 10:36, Nicolas Saenz Julienne
<nsaenzjulienne@suse.de> wrote:
>
> On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote:
> > On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig <hch@lst.de> wrote:
> > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> > > > Sadly I just realised that the series is incomplete, we have RPi4 users that
> > > > want to boot unsing ACPI, and this series would break things for them. I'll
> > > > have a word with them to see what we can do for their use-case.
> > >
> > > Stupid question:  why do these users insist on a totally unsuitable
> > > interface? And why would we as Linux developers care to support such
> > > a aims?
> >
> > The point is really whether we want to revert changes in Linux that
> > made both DT and ACPI boot work without quirks on RPi4.
>
> Well, and broke a big amount of devices that were otherwise fine.
>

Yeah that was unfortunate.

> > Having to check the RPi4 compatible string or OEM id in core init code is
> > awful, regardless of whether you boot via ACPI or via DT.
> >
> > The problem with this hardware is that it uses a DMA mask which is
> > narrower than 32, and the arm64 kernel is simply not set up to deal
> > with that at all. On DT, we have DMA ranges properties and the likes
> > to describe such limitations, on ACPI we have _DMA methods as well as
> > DMA range attributes in the IORT, both of which are now handled
> > correctly. So all the information is there, we just have to figure out
> > how to consume it early on.
>
> Is it worth the effort just for a single board? I don't know about ACPI but
> parsing dma-ranges that early at boot time is not trivial. My intuition tells
> me that it'd be even harder for ACPI, being a more complex data structure.
>

Yes, it will be harder, especially for the _DMA methods.

> > Interestingly, this limitation always existed in the SoC, but it
> > wasn't until they started shipping it with more than 1 GB of DRAM that
> > it became a problem. This means issues like this could resurface in
> > the future with existing SoCs when they get shipped with more memory,
> > and so I would prefer fixing this in a generic way.
>
> Actually what I proposed here is pretty generic. Specially from arm64's
> perspective. We call early_init_dt_scan(), which sets up zone_dma_bits based on
> whatever it finds in DT. Both those operations are architecture independent.
> arm64 arch code doesn't care about the logic involved in ascertaining
> zone_dma_bits. I get that the last step isn't generic. But it's all setup so as
> to make it as such whenever it's worth the effort.
>

The problem is that, while we are providing a full description of the
SoC's capabilities, we short circuit this by inserting knowledge into
the code (that is shared between all DT architectures) that
"brcm,bcm2711" is special, and needs a DMA zone override.

I think for ACPI boot, we might be able to work around this by cold
plugging the memory above 1 GB, but I have to double check whether it
won't get pulled into ZONE_DMA32 anyway (unless anyone can answer that
for me here from the top of their head)
Nicolas Saenz Julienne Oct. 9, 2020, 1:33 p.m. UTC | #15
On Fri, 2020-10-09 at 11:13 +0200, Ard Biesheuvel wrote:
> On Fri, 9 Oct 2020 at 10:36, Nicolas Saenz Julienne
> <nsaenzjulienne@suse.de> wrote:
> > On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote:
> > > On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig <hch@lst.de> wrote:
> > > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> > > > > Sadly I just realised that the series is incomplete, we have RPi4 users that
> > > > > want to boot unsing ACPI, and this series would break things for them. I'll
> > > > > have a word with them to see what we can do for their use-case.
> > > > 
> > > > Stupid question:  why do these users insist on a totally unsuitable
> > > > interface? And why would we as Linux developers care to support such
> > > > a aims?
> > > 
> > > The point is really whether we want to revert changes in Linux that
> > > made both DT and ACPI boot work without quirks on RPi4.
> > 
> > Well, and broke a big amount of devices that were otherwise fine.
> > 
> 
> Yeah that was unfortunate.
> 
> > > Having to check the RPi4 compatible string or OEM id in core init code is
> > > awful, regardless of whether you boot via ACPI or via DT.
> > > 
> > > The problem with this hardware is that it uses a DMA mask which is
> > > narrower than 32, and the arm64 kernel is simply not set up to deal
> > > with that at all. On DT, we have DMA ranges properties and the likes
> > > to describe such limitations, on ACPI we have _DMA methods as well as
> > > DMA range attributes in the IORT, both of which are now handled
> > > correctly. So all the information is there, we just have to figure out
> > > how to consume it early on.
> > 
> > Is it worth the effort just for a single board? I don't know about ACPI but
> > parsing dma-ranges that early at boot time is not trivial. My intuition tells
> > me that it'd be even harder for ACPI, being a more complex data structure.
> > 
> 
> Yes, it will be harder, especially for the _DMA methods.
> 
> > > Interestingly, this limitation always existed in the SoC, but it
> > > wasn't until they started shipping it with more than 1 GB of DRAM that
> > > it became a problem. This means issues like this could resurface in
> > > the future with existing SoCs when they get shipped with more memory,
> > > and so I would prefer fixing this in a generic way.
> > 
> > Actually what I proposed here is pretty generic. Specially from arm64's
> > perspective. We call early_init_dt_scan(), which sets up zone_dma_bits based on
> > whatever it finds in DT. Both those operations are architecture independent.
> > arm64 arch code doesn't care about the logic involved in ascertaining
> > zone_dma_bits. I get that the last step isn't generic. But it's all setup so as
> > to make it as such whenever it's worth the effort.
> > 
> 
> The problem is that, while we are providing a full description of the
> SoC's capabilities, we short circuit this by inserting knowledge into
> the code (that is shared between all DT architectures) that
> "brcm,bcm2711" is special, and needs a DMA zone override.

Yes I understand this and I sympathize with it, not the most beautiful thing
out there :). But that's only half the issue, as I said, implementing this
early at boot time is a tangible amount of work and a burden to maintain just
for one board. So this is the compromise we discussed with the DT maintainer
(RobH). The series sets things up so as to be able to implement the right
thing transparently to arm64's architecture when deemed worth the effort.

Ultimately, if you're worried about inserting knowledge into the code, aren't
we doing that, in a more extreme way, when imposing an extra unwarranted zone
to the whole arm64 ecosystem?

Note that I'm more that happy to work on alternative solutions, but let's first
settle on what would be acceptable to everyone.

> I think for ACPI boot, we might be able to work around this by cold
> plugging the memory above 1 GB, but I have to double check whether it
> won't get pulled into ZONE_DMA32 anyway (unless anyone can answer that
> for me here from the top of their head)

Don't know much about what ACPI memory cold plugging involves, but we'll still
need a proper ZONE_DMA32 (i.e. spanning the whole 32-bit address space) for
RPi4.

Regards,
Nicolas
Lorenzo Pieralisi Oct. 9, 2020, 3:24 p.m. UTC | #16
On Fri, Oct 09, 2020 at 11:13:59AM +0200, Ard Biesheuvel wrote:
> On Fri, 9 Oct 2020 at 10:36, Nicolas Saenz Julienne
> <nsaenzjulienne@suse.de> wrote:
> >
> > On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote:
> > > On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig <hch@lst.de> wrote:
> > > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> > > > > Sadly I just realised that the series is incomplete, we have RPi4 users that
> > > > > want to boot unsing ACPI, and this series would break things for them. I'll
> > > > > have a word with them to see what we can do for their use-case.
> > > >
> > > > Stupid question:  why do these users insist on a totally unsuitable
> > > > interface? And why would we as Linux developers care to support such
> > > > a aims?
> > >
> > > The point is really whether we want to revert changes in Linux that
> > > made both DT and ACPI boot work without quirks on RPi4.
> >
> > Well, and broke a big amount of devices that were otherwise fine.
> >
> 
> Yeah that was unfortunate.
> 
> > > Having to check the RPi4 compatible string or OEM id in core init code is
> > > awful, regardless of whether you boot via ACPI or via DT.
> > >
> > > The problem with this hardware is that it uses a DMA mask which is
> > > narrower than 32, and the arm64 kernel is simply not set up to deal
> > > with that at all. On DT, we have DMA ranges properties and the likes
> > > to describe such limitations, on ACPI we have _DMA methods as well as
> > > DMA range attributes in the IORT, both of which are now handled
> > > correctly. So all the information is there, we just have to figure out
> > > how to consume it early on.
> >
> > Is it worth the effort just for a single board? I don't know about ACPI but
> > parsing dma-ranges that early at boot time is not trivial. My intuition tells
> > me that it'd be even harder for ACPI, being a more complex data structure.
> >
> 
> Yes, it will be harder, especially for the _DMA methods.
> 
> > > Interestingly, this limitation always existed in the SoC, but it
> > > wasn't until they started shipping it with more than 1 GB of DRAM that
> > > it became a problem. This means issues like this could resurface in
> > > the future with existing SoCs when they get shipped with more memory,
> > > and so I would prefer fixing this in a generic way.
> >
> > Actually what I proposed here is pretty generic. Specially from arm64's
> > perspective. We call early_init_dt_scan(), which sets up zone_dma_bits based on
> > whatever it finds in DT. Both those operations are architecture independent.
> > arm64 arch code doesn't care about the logic involved in ascertaining
> > zone_dma_bits. I get that the last step isn't generic. But it's all setup so as
> > to make it as such whenever it's worth the effort.
> >
> 
> The problem is that, while we are providing a full description of the
> SoC's capabilities, we short circuit this by inserting knowledge into
> the code (that is shared between all DT architectures) that
> "brcm,bcm2711" is special, and needs a DMA zone override.
> 
> I think for ACPI boot, we might be able to work around this by cold
> plugging the memory above 1 GB, but I have to double check whether it
> won't get pulled into ZONE_DMA32 anyway (unless anyone can answer that
> for me here from the top of their head)

Is this information that we can infer from IORT nodes and make it
generic (ie make a DMA limit out of all IORT nodes allowed ranges) ?

We can move this check to IORT code and call it from arm64 if it
can be made to work.

Thanks,
Lorenzo
Ard Biesheuvel Oct. 9, 2020, 4:23 p.m. UTC | #17
On Fri, 9 Oct 2020 at 17:24, Lorenzo Pieralisi
<lorenzo.pieralisi@arm.com> wrote:
>
> On Fri, Oct 09, 2020 at 11:13:59AM +0200, Ard Biesheuvel wrote:
> > On Fri, 9 Oct 2020 at 10:36, Nicolas Saenz Julienne
> > <nsaenzjulienne@suse.de> wrote:
> > >
> > > On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote:
> > > > On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig <hch@lst.de> wrote:
> > > > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> > > > > > Sadly I just realised that the series is incomplete, we have RPi4 users that
> > > > > > want to boot unsing ACPI, and this series would break things for them. I'll
> > > > > > have a word with them to see what we can do for their use-case.
> > > > >
> > > > > Stupid question:  why do these users insist on a totally unsuitable
> > > > > interface? And why would we as Linux developers care to support such
> > > > > a aims?
> > > >
> > > > The point is really whether we want to revert changes in Linux that
> > > > made both DT and ACPI boot work without quirks on RPi4.
> > >
> > > Well, and broke a big amount of devices that were otherwise fine.
> > >
> >
> > Yeah that was unfortunate.
> >
> > > > Having to check the RPi4 compatible string or OEM id in core init code is
> > > > awful, regardless of whether you boot via ACPI or via DT.
> > > >
> > > > The problem with this hardware is that it uses a DMA mask which is
> > > > narrower than 32, and the arm64 kernel is simply not set up to deal
> > > > with that at all. On DT, we have DMA ranges properties and the likes
> > > > to describe such limitations, on ACPI we have _DMA methods as well as
> > > > DMA range attributes in the IORT, both of which are now handled
> > > > correctly. So all the information is there, we just have to figure out
> > > > how to consume it early on.
> > >
> > > Is it worth the effort just for a single board? I don't know about ACPI but
> > > parsing dma-ranges that early at boot time is not trivial. My intuition tells
> > > me that it'd be even harder for ACPI, being a more complex data structure.
> > >
> >
> > Yes, it will be harder, especially for the _DMA methods.
> >
> > > > Interestingly, this limitation always existed in the SoC, but it
> > > > wasn't until they started shipping it with more than 1 GB of DRAM that
> > > > it became a problem. This means issues like this could resurface in
> > > > the future with existing SoCs when they get shipped with more memory,
> > > > and so I would prefer fixing this in a generic way.
> > >
> > > Actually what I proposed here is pretty generic. Specially from arm64's
> > > perspective. We call early_init_dt_scan(), which sets up zone_dma_bits based on
> > > whatever it finds in DT. Both those operations are architecture independent.
> > > arm64 arch code doesn't care about the logic involved in ascertaining
> > > zone_dma_bits. I get that the last step isn't generic. But it's all setup so as
> > > to make it as such whenever it's worth the effort.
> > >
> >
> > The problem is that, while we are providing a full description of the
> > SoC's capabilities, we short circuit this by inserting knowledge into
> > the code (that is shared between all DT architectures) that
> > "brcm,bcm2711" is special, and needs a DMA zone override.
> >
> > I think for ACPI boot, we might be able to work around this by cold
> > plugging the memory above 1 GB, but I have to double check whether it
> > won't get pulled into ZONE_DMA32 anyway (unless anyone can answer that
> > for me here from the top of their head)
>
> Is this information that we can infer from IORT nodes and make it
> generic (ie make a DMA limit out of all IORT nodes allowed ranges) ?
>

Yes, that seems feasible.

> We can move this check to IORT code and call it from arm64 if it
> can be made to work.
>

Finding the smallest value in the IORT, and assigning it to
zone_dma_bits if it is < 32 should be easy. But as I understand it,
having these separate DMA and DMA32 zones is what breaks kdump, no? So
how is this going to fix the underlying issue?

Nicolas, were there any other reported regressions caused by the
introduction of ZONE_DMA?
Catalin Marinas Oct. 9, 2020, 5:10 p.m. UTC | #18
On Fri, Oct 09, 2020 at 06:23:06PM +0200, Ard Biesheuvel wrote:
> On Fri, 9 Oct 2020 at 17:24, Lorenzo Pieralisi
> <lorenzo.pieralisi@arm.com> wrote:
> > We can move this check to IORT code and call it from arm64 if it
> > can be made to work.
> 
> Finding the smallest value in the IORT, and assigning it to
> zone_dma_bits if it is < 32 should be easy. But as I understand it,
> having these separate DMA and DMA32 zones is what breaks kdump, no? So
> how is this going to fix the underlying issue?

If zone_dma_bits is 32, ZONE_DMA32 disappears into ZONE_DMA (GFP_DMA32
allocations fall back to ZONE_DMA).

kdump wants DMA-able memory and, without a 30-bit ZONE_DMA, that would
be the bottom 32-bit. With the introduction of ZONE_DMA, this suddenly
became 1GB. We could change kdump to allocate ZONE_DMA32 but this one
may also be small as it lost 1GB to ZONE_DMA. However, the kdump kernel
would need to be rebuilt without ZONE_DMA since it won't have any. IIRC
(it's been a while since I looked), the kdump allocation couldn't span
multiple zones.

In a separate thread, we try to fix kdump to use allocations above 4G as
a fallback but this only fixes platforms with enough RAM (and maybe it's
only those platforms that care about kdump).
Ard Biesheuvel Oct. 10, 2020, 10:36 a.m. UTC | #19
On Fri, 9 Oct 2020 at 19:10, Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Fri, Oct 09, 2020 at 06:23:06PM +0200, Ard Biesheuvel wrote:
> > On Fri, 9 Oct 2020 at 17:24, Lorenzo Pieralisi
> > <lorenzo.pieralisi@arm.com> wrote:
> > > We can move this check to IORT code and call it from arm64 if it
> > > can be made to work.
> >
> > Finding the smallest value in the IORT, and assigning it to
> > zone_dma_bits if it is < 32 should be easy. But as I understand it,
> > having these separate DMA and DMA32 zones is what breaks kdump, no? So
> > how is this going to fix the underlying issue?
>
> If zone_dma_bits is 32, ZONE_DMA32 disappears into ZONE_DMA (GFP_DMA32
> allocations fall back to ZONE_DMA).
>
> kdump wants DMA-able memory and, without a 30-bit ZONE_DMA, that would
> be the bottom 32-bit. With the introduction of ZONE_DMA, this suddenly
> became 1GB. We could change kdump to allocate ZONE_DMA32 but this one
> may also be small as it lost 1GB to ZONE_DMA. However, the kdump kernel
> would need to be rebuilt without ZONE_DMA since it won't have any. IIRC
> (it's been a while since I looked), the kdump allocation couldn't span
> multiple zones.
>
> In a separate thread, we try to fix kdump to use allocations above 4G as
> a fallback but this only fixes platforms with enough RAM (and maybe it's
> only those platforms that care about kdump).
>

One thing that strikes me as odd is that we are applying the same
shifting logic to ZONE_DMA as we are applying to ZONE_DMA32, i.e., if
DRAM starts outside of the zone, it is shifted upwards.

On a typical ARM box, this gives me

[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000080000000-0x00000000bfffffff]
[    0.000000]   DMA32    [mem 0x00000000c0000000-0x00000000ffffffff]
[    0.000000]   Normal   [mem 0x0000000100000000-0x0000000fffffffff]

i.e., the 30-bit addressable range has bit 31 set, which is weird.

I wonder if it wouldn't be better (and less problematic in the general
case) to drop this logic for ZONE_DMA, and simply let it remain empty
unless there is really some memory there.
Nicolas Saenz Julienne Oct. 10, 2020, 10:53 a.m. UTC | #20
On Sat, 2020-10-10 at 12:36 +0200, Ard Biesheuvel wrote:
> On Fri, 9 Oct 2020 at 19:10, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Fri, Oct 09, 2020 at 06:23:06PM +0200, Ard Biesheuvel wrote:
> > > On Fri, 9 Oct 2020 at 17:24, Lorenzo Pieralisi
> > > <lorenzo.pieralisi@arm.com> wrote:
> > > > We can move this check to IORT code and call it from arm64 if it
> > > > can be made to work.
> > > 
> > > Finding the smallest value in the IORT, and assigning it to
> > > zone_dma_bits if it is < 32 should be easy. But as I understand it,
> > > having these separate DMA and DMA32 zones is what breaks kdump, no? So
> > > how is this going to fix the underlying issue?
> > 
> > If zone_dma_bits is 32, ZONE_DMA32 disappears into ZONE_DMA (GFP_DMA32
> > allocations fall back to ZONE_DMA).
> > 
> > kdump wants DMA-able memory and, without a 30-bit ZONE_DMA, that would
> > be the bottom 32-bit. With the introduction of ZONE_DMA, this suddenly
> > became 1GB. We could change kdump to allocate ZONE_DMA32 but this one
> > may also be small as it lost 1GB to ZONE_DMA. However, the kdump kernel
> > would need to be rebuilt without ZONE_DMA since it won't have any. IIRC
> > (it's been a while since I looked), the kdump allocation couldn't span
> > multiple zones.
> > 
> > In a separate thread, we try to fix kdump to use allocations above 4G as
> > a fallback but this only fixes platforms with enough RAM (and maybe it's
> > only those platforms that care about kdump).
> > 
> 
> One thing that strikes me as odd is that we are applying the same
> shifting logic to ZONE_DMA as we are applying to ZONE_DMA32, i.e., if
> DRAM starts outside of the zone, it is shifted upwards.
> 
> On a typical ARM box, this gives me
> 
> [    0.000000] Zone ranges:
> [    0.000000]   DMA      [mem 0x0000000080000000-0x00000000bfffffff]
> [    0.000000]   DMA32    [mem 0x00000000c0000000-0x00000000ffffffff]
> [    0.000000]   Normal   [mem 0x0000000100000000-0x0000000fffffffff]
> 
> i.e., the 30-bit addressable range has bit 31 set, which is weird.

Yes I agree it's weird, and IMO plain useless. I sent a series this summer to
address this[1], which ultimately triggered the discussion we're having right
now.

Although with with your latest patch and the DT counterpart, we should be OK.
It would be weird for a HW description to define DMA constraints that are
impossible to reach on that system.

> I wonder if it wouldn't be better (and less problematic in the general
> case) to drop this logic for ZONE_DMA, and simply let it remain empty
> unless there is really some memory there.

From my experience, you can't have empty ZONE_DMA when enabled. Empty
ZONE_DMA32 is OK though. Although I'm sure it's something that can be changed.

Regards,
Nicolas

[1] https://lkml.org/lkml/2020/8/19/1022
Catalin Marinas Oct. 10, 2020, 12:38 p.m. UTC | #21
On Sat, Oct 10, 2020 at 12:53:19PM +0200, Nicolas Saenz Julienne wrote:
> On Sat, 2020-10-10 at 12:36 +0200, Ard Biesheuvel wrote:
> > On Fri, 9 Oct 2020 at 19:10, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > On Fri, Oct 09, 2020 at 06:23:06PM +0200, Ard Biesheuvel wrote:
> > > > On Fri, 9 Oct 2020 at 17:24, Lorenzo Pieralisi
> > > > <lorenzo.pieralisi@arm.com> wrote:
> > > > > We can move this check to IORT code and call it from arm64 if it
> > > > > can be made to work.
> > > > 
> > > > Finding the smallest value in the IORT, and assigning it to
> > > > zone_dma_bits if it is < 32 should be easy. But as I understand it,
> > > > having these separate DMA and DMA32 zones is what breaks kdump, no? So
> > > > how is this going to fix the underlying issue?
> > > 
> > > If zone_dma_bits is 32, ZONE_DMA32 disappears into ZONE_DMA (GFP_DMA32
> > > allocations fall back to ZONE_DMA).
> > > 
> > > kdump wants DMA-able memory and, without a 30-bit ZONE_DMA, that would
> > > be the bottom 32-bit. With the introduction of ZONE_DMA, this suddenly
> > > became 1GB. We could change kdump to allocate ZONE_DMA32 but this one
> > > may also be small as it lost 1GB to ZONE_DMA. However, the kdump kernel
> > > would need to be rebuilt without ZONE_DMA since it won't have any. IIRC
> > > (it's been a while since I looked), the kdump allocation couldn't span
> > > multiple zones.
> > > 
> > > In a separate thread, we try to fix kdump to use allocations above 4G as
> > > a fallback but this only fixes platforms with enough RAM (and maybe it's
> > > only those platforms that care about kdump).
> > > 
> > 
> > One thing that strikes me as odd is that we are applying the same
> > shifting logic to ZONE_DMA as we are applying to ZONE_DMA32, i.e., if
> > DRAM starts outside of the zone, it is shifted upwards.
> > 
> > On a typical ARM box, this gives me
> > 
> > [    0.000000] Zone ranges:
> > [    0.000000]   DMA      [mem 0x0000000080000000-0x00000000bfffffff]
> > [    0.000000]   DMA32    [mem 0x00000000c0000000-0x00000000ffffffff]
> > [    0.000000]   Normal   [mem 0x0000000100000000-0x0000000fffffffff]
> > 
> > i.e., the 30-bit addressable range has bit 31 set, which is weird.
> 
> Yes I agree it's weird, and IMO plain useless. I sent a series this summer to
> address this[1], which ultimately triggered the discussion we're having right
> now.

I don't mind assuming that ZONE_DMA is always from pfn 0 (i.e. no
dma_offset for such constrained devices). But if ZONE_DMA32 is squeezed
out with ZONE_DMA extended to 4GB, it should allow non-zero upper 32
bits. IIRC we do have SoCs with RAM starting above 4GB.

However, your patch didn't completely solve the problem for non-RPi4
platforms as there's hardware with RAM starting at 0 that doesn't need
the 1GB ZONE_DMA. We may end up with a combination of the two
approaches.

> Although with with your latest patch and the DT counterpart, we should be OK.
> It would be weird for a HW description to define DMA constraints that are
> impossible to reach on that system.

I don't remember the difficulties with parsing a DT early for inferring
the ZONE_DMA requirements. Could we not check the dma-ranges property in
the soc node? I can see bcm2711.dtsi defines a 30-bit address range. We
are not interested in the absolute physical/bus addresses, just the
size to check whether it's smaller than 32-bit.

> > I wonder if it wouldn't be better (and less problematic in the general
> > case) to drop this logic for ZONE_DMA, and simply let it remain empty
> > unless there is really some memory there.
> 
> From my experience, you can't have empty ZONE_DMA when enabled. Empty
> ZONE_DMA32 is OK though. Although I'm sure it's something that can be changed.

Indeed, because we still have GFP_DMA around which can't fall back to
ZONE_DMA32 (well, unless CONFIG_ZONE_DMA is disabled).
Christoph Hellwig Oct. 12, 2020, 6:47 a.m. UTC | #22
On Fri, Oct 09, 2020 at 06:10:52PM +0100, Catalin Marinas wrote:
> kdump wants DMA-able memory and,

DMAable by whom?  The only way to guranteed DMAable memory is to use
the DMA memory allocator(s) and pass a specific device to them.  Everyting
else is just fundamentally broken.  Note that even when device is not
DMAable we can still use swiotlb to access it.
Catalin Marinas Oct. 12, 2020, 8:47 a.m. UTC | #23
On Mon, Oct 12, 2020 at 08:47:15AM +0200, Christoph Hellwig wrote:
> On Fri, Oct 09, 2020 at 06:10:52PM +0100, Catalin Marinas wrote:
> > kdump wants DMA-able memory and,
> 
> DMAable by whom?  The only way to guranteed DMAable memory is to use
> the DMA memory allocator(s) and pass a specific device to them.  Everyting
> else is just fundamentally broken.  Note that even when device is not
> DMAable we can still use swiotlb to access it.

What I meant is that the new kexec'ed kernel needs some memory in the
ZONE_DMA range, currently set to the bottom 30-bit even for platforms
that can cope with the whole 32-bit range (anything other than RPi4).
The memory range available to the kdump kernels is limited to what
reserve_crashkernel() allocated, which may not fit in the lower 1GB.

There are two ongoing threads (complementary):

1. Change the arm64 reserve_crashkernel() similar to x86 which allocates
   memory above 4G with a small block in the ZONE_DMA range.

2. Allow zone_dma_bits to be 32 for arm64 platforms other than RPi4.

The second point also fixes some regressions with CMA reservations that
could no longer fit in the lower 1GB.
diff mbox series

Patch

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 4602e467ca8b..cd0d115ef329 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -25,6 +25,7 @@ 
 #include <linux/serial_core.h>
 #include <linux/sysfs.h>
 #include <linux/random.h>
+#include <linux/dma-direct.h>	/* for zone_dma_bits */
 
 #include <asm/setup.h>  /* for COMMAND_LINE_SIZE */
 #include <asm/page.h>
@@ -1198,6 +1199,14 @@  void __init early_init_dt_scan_nodes(void)
 	of_scan_flat_dt(early_init_dt_scan_memory, NULL);
 }
 
+void __init early_init_dt_update_zone_dma_bits(void)
+{
+	unsigned long dt_root = of_get_flat_dt_root();
+
+	if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
+		zone_dma_bits = 30;
+}
+
 bool __init early_init_dt_scan(void *params)
 {
 	bool status;
@@ -1207,6 +1216,7 @@  bool __init early_init_dt_scan(void *params)
 		return false;
 
 	early_init_dt_scan_nodes();
+	early_init_dt_update_zone_dma_bits();
 	return true;
 }