Message ID | 20201001161740.29064-2-nsaenzjulienne@suse.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | arm64: Default to 32-bit wide ZONE_DMA | expand |
Hi Nicolas, Thanks for putting this together. On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote: > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c > index 4602e467ca8b..cd0d115ef329 100644 > --- a/drivers/of/fdt.c > +++ b/drivers/of/fdt.c > @@ -25,6 +25,7 @@ > #include <linux/serial_core.h> > #include <linux/sysfs.h> > #include <linux/random.h> > +#include <linux/dma-direct.h> /* for zone_dma_bits */ > > #include <asm/setup.h> /* for COMMAND_LINE_SIZE */ > #include <asm/page.h> > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void) > of_scan_flat_dt(early_init_dt_scan_memory, NULL); > } > > +void __init early_init_dt_update_zone_dma_bits(void) > +{ > + unsigned long dt_root = of_get_flat_dt_root(); > + > + if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711")) > + zone_dma_bits = 30; > +} I think we could keep this entirely in the arm64 setup_machine_fdt() and not pollute the core code with RPi4-specific code.
On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote: > Hi Nicolas, > > Thanks for putting this together. > > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote: > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c > > index 4602e467ca8b..cd0d115ef329 100644 > > --- a/drivers/of/fdt.c > > +++ b/drivers/of/fdt.c > > @@ -25,6 +25,7 @@ > > #include <linux/serial_core.h> > > #include <linux/sysfs.h> > > #include <linux/random.h> > > +#include <linux/dma-direct.h> /* for zone_dma_bits */ > > > > #include <asm/setup.h> /* for COMMAND_LINE_SIZE */ > > #include <asm/page.h> > > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void) > > of_scan_flat_dt(early_init_dt_scan_memory, NULL); > > } > > > > +void __init early_init_dt_update_zone_dma_bits(void) > > +{ > > + unsigned long dt_root = of_get_flat_dt_root(); > > + > > + if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711")) > > + zone_dma_bits = 30; > > +} > > I think we could keep this entirely in the arm64 setup_machine_fdt() and > not pollute the core code with RPi4-specific code. Actually, even better, could we not move the check to arm64_memblock_init() when we initialise zone_dma_bits?
On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote: > On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote: > > Hi Nicolas, > > > > Thanks for putting this together. > > > > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote: > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c > > > index 4602e467ca8b..cd0d115ef329 100644 > > > --- a/drivers/of/fdt.c > > > +++ b/drivers/of/fdt.c > > > @@ -25,6 +25,7 @@ > > > #include <linux/serial_core.h> > > > #include <linux/sysfs.h> > > > #include <linux/random.h> > > > +#include <linux/dma-direct.h> /* for zone_dma_bits */ > > > > > > #include <asm/setup.h> /* for COMMAND_LINE_SIZE */ > > > #include <asm/page.h> > > > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void) > > > of_scan_flat_dt(early_init_dt_scan_memory, NULL); > > > } > > > > > > +void __init early_init_dt_update_zone_dma_bits(void) > > > +{ > > > + unsigned long dt_root = of_get_flat_dt_root(); > > > + > > > + if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711")) > > > + zone_dma_bits = 30; > > > +} > > > > I think we could keep this entirely in the arm64 setup_machine_fdt() and > > not pollute the core code with RPi4-specific code. > > Actually, even better, could we not move the check to > arm64_memblock_init() when we initialise zone_dma_bits? I did it this way as I vaguely remembered Rob saying he wanted to centralise all early boot fdt code in one place. But I'll be happy to move it there. Regards, Nicolas
On Thu, Oct 1, 2020 at 12:31 PM Nicolas Saenz Julienne <nsaenzjulienne@suse.de> wrote: > > On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote: > > On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote: > > > Hi Nicolas, > > > > > > Thanks for putting this together. > > > > > > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote: > > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c > > > > index 4602e467ca8b..cd0d115ef329 100644 > > > > --- a/drivers/of/fdt.c > > > > +++ b/drivers/of/fdt.c > > > > @@ -25,6 +25,7 @@ > > > > #include <linux/serial_core.h> > > > > #include <linux/sysfs.h> > > > > #include <linux/random.h> > > > > +#include <linux/dma-direct.h> /* for zone_dma_bits */ > > > > > > > > #include <asm/setup.h> /* for COMMAND_LINE_SIZE */ > > > > #include <asm/page.h> > > > > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void) > > > > of_scan_flat_dt(early_init_dt_scan_memory, NULL); > > > > } > > > > > > > > +void __init early_init_dt_update_zone_dma_bits(void) > > > > +{ > > > > + unsigned long dt_root = of_get_flat_dt_root(); > > > > + > > > > + if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711")) > > > > + zone_dma_bits = 30; > > > > +} > > > > > > I think we could keep this entirely in the arm64 setup_machine_fdt() and > > > not pollute the core code with RPi4-specific code. > > > > Actually, even better, could we not move the check to > > arm64_memblock_init() when we initialise zone_dma_bits? > > I did it this way as I vaguely remembered Rob saying he wanted to centralise > all early boot fdt code in one place. But I'll be happy to move it there. Right, unless zone_dma_bits is only an arm64 thing, then this doesn't really have anything arch specific. Reviewed-by: Rob Herring <robh@kernel.org> Rob
On Thu, Oct 01, 2020 at 07:31:19PM +0200, Nicolas Saenz Julienne wrote: > On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote: > > On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote: > > > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote: > > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c > > > > index 4602e467ca8b..cd0d115ef329 100644 > > > > --- a/drivers/of/fdt.c > > > > +++ b/drivers/of/fdt.c > > > > @@ -25,6 +25,7 @@ > > > > #include <linux/serial_core.h> > > > > #include <linux/sysfs.h> > > > > #include <linux/random.h> > > > > +#include <linux/dma-direct.h> /* for zone_dma_bits */ > > > > > > > > #include <asm/setup.h> /* for COMMAND_LINE_SIZE */ > > > > #include <asm/page.h> > > > > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void) > > > > of_scan_flat_dt(early_init_dt_scan_memory, NULL); > > > > } > > > > > > > > +void __init early_init_dt_update_zone_dma_bits(void) > > > > +{ > > > > + unsigned long dt_root = of_get_flat_dt_root(); > > > > + > > > > + if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711")) > > > > + zone_dma_bits = 30; > > > > +} > > > > > > I think we could keep this entirely in the arm64 setup_machine_fdt() and > > > not pollute the core code with RPi4-specific code. > > > > Actually, even better, could we not move the check to > > arm64_memblock_init() when we initialise zone_dma_bits? > > I did it this way as I vaguely remembered Rob saying he wanted to centralise > all early boot fdt code in one place. But I'll be happy to move it there. I can see Rob replied and I'm fine if that's his preference. However, what I don't particularly like is that in the arm64 code, if zone_dma_bits == 24, we set it to 32 assuming that it wasn't touched by the early_init_dt_update_zone_dma_bits(). What if at some point we'll get a platform that actually needs 24 here (I truly hope not, but just the principle of relying on magic values)? So rather than guessing, I'd prefer if the arch code can override ZONE_DMA_BITS_DEFAULT. Then, in arm64, we'll just set it to 32 and no need to explicitly touch the zone_dma_bits variable.
Hi Catalin, sorry for the late reply. On Fri, 2020-10-02 at 12:55 +0100, Catalin Marinas wrote: > On Thu, Oct 01, 2020 at 07:31:19PM +0200, Nicolas Saenz Julienne wrote: > > On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote: > > > On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote: > > > > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote: > > > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c > > > > > index 4602e467ca8b..cd0d115ef329 100644 > > > > > --- a/drivers/of/fdt.c > > > > > +++ b/drivers/of/fdt.c > > > > > @@ -25,6 +25,7 @@ > > > > > #include <linux/serial_core.h> > > > > > #include <linux/sysfs.h> > > > > > #include <linux/random.h> > > > > > +#include <linux/dma-direct.h> /* for zone_dma_bits */ > > > > > > > > > > #include <asm/setup.h> /* for COMMAND_LINE_SIZE */ > > > > > #include <asm/page.h> > > > > > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void) > > > > > of_scan_flat_dt(early_init_dt_scan_memory, NULL); > > > > > } > > > > > > > > > > +void __init early_init_dt_update_zone_dma_bits(void) > > > > > +{ > > > > > + unsigned long dt_root = of_get_flat_dt_root(); > > > > > + > > > > > + if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711")) > > > > > + zone_dma_bits = 30; > > > > > +} > > > > > > > > I think we could keep this entirely in the arm64 setup_machine_fdt() and > > > > not pollute the core code with RPi4-specific code. > > > > > > Actually, even better, could we not move the check to > > > arm64_memblock_init() when we initialise zone_dma_bits? > > > > I did it this way as I vaguely remembered Rob saying he wanted to centralise > > all early boot fdt code in one place. But I'll be happy to move it there. > > I can see Rob replied and I'm fine if that's his preference. However, > what I don't particularly like is that in the arm64 code, if > zone_dma_bits == 24, we set it to 32 assuming that it wasn't touched by > the early_init_dt_update_zone_dma_bits(). What if at some point we'll > get a platform that actually needs 24 here (I truly hope not, but just > the principle of relying on magic values)? > > So rather than guessing, I'd prefer if the arch code can override > ZONE_DMA_BITS_DEFAULT. Then, in arm64, we'll just set it to 32 and no > need to explicitly touch the zone_dma_bits variable. Yes, sonds like the way to go. TBH I wasn't happy with that solution either, but couldn't think of a nicer alternative. Sadly I just realised that the series is incomplete, we have RPi4 users that want to boot unsing ACPI, and this series would break things for them. I'll have a word with them to see what we can do for their use-case. Regards, Nicolas
On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote: > On Fri, 2020-10-02 at 12:55 +0100, Catalin Marinas wrote: > > On Thu, Oct 01, 2020 at 07:31:19PM +0200, Nicolas Saenz Julienne wrote: > > > On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote: > > > > On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote: > > > > > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote: > > > > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c > > > > > > index 4602e467ca8b..cd0d115ef329 100644 > > > > > > --- a/drivers/of/fdt.c > > > > > > +++ b/drivers/of/fdt.c > > > > > > @@ -25,6 +25,7 @@ > > > > > > #include <linux/serial_core.h> > > > > > > #include <linux/sysfs.h> > > > > > > #include <linux/random.h> > > > > > > +#include <linux/dma-direct.h> /* for zone_dma_bits */ > > > > > > > > > > > > #include <asm/setup.h> /* for COMMAND_LINE_SIZE */ > > > > > > #include <asm/page.h> > > > > > > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void) > > > > > > of_scan_flat_dt(early_init_dt_scan_memory, NULL); > > > > > > } > > > > > > > > > > > > +void __init early_init_dt_update_zone_dma_bits(void) > > > > > > +{ > > > > > > + unsigned long dt_root = of_get_flat_dt_root(); > > > > > > + > > > > > > + if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711")) > > > > > > + zone_dma_bits = 30; > > > > > > +} > > > > > > > > > > I think we could keep this entirely in the arm64 setup_machine_fdt() and > > > > > not pollute the core code with RPi4-specific code. > > > > > > > > Actually, even better, could we not move the check to > > > > arm64_memblock_init() when we initialise zone_dma_bits? > > > > > > I did it this way as I vaguely remembered Rob saying he wanted to centralise > > > all early boot fdt code in one place. But I'll be happy to move it there. > > > > I can see Rob replied and I'm fine if that's his preference. However, > > what I don't particularly like is that in the arm64 code, if > > zone_dma_bits == 24, we set it to 32 assuming that it wasn't touched by > > the early_init_dt_update_zone_dma_bits(). What if at some point we'll > > get a platform that actually needs 24 here (I truly hope not, but just > > the principle of relying on magic values)? > > > > So rather than guessing, I'd prefer if the arch code can override > > ZONE_DMA_BITS_DEFAULT. Then, in arm64, we'll just set it to 32 and no > > need to explicitly touch the zone_dma_bits variable. > > Yes, sonds like the way to go. TBH I wasn't happy with that solution either, > but couldn't think of a nicer alternative. > > Sadly I just realised that the series is incomplete, we have RPi4 users that > want to boot unsing ACPI, and this series would break things for them. I'll > have a word with them to see what we can do for their use-case. Is there a way to get some SoC information from ACPI?
(+ Lorenzo) On Thu, 8 Oct 2020 at 12:14, Catalin Marinas <catalin.marinas@arm.com> wrote: > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote: > > On Fri, 2020-10-02 at 12:55 +0100, Catalin Marinas wrote: > > > On Thu, Oct 01, 2020 at 07:31:19PM +0200, Nicolas Saenz Julienne wrote: > > > > On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote: > > > > > On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote: > > > > > > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote: > > > > > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c > > > > > > > index 4602e467ca8b..cd0d115ef329 100644 > > > > > > > --- a/drivers/of/fdt.c > > > > > > > +++ b/drivers/of/fdt.c > > > > > > > @@ -25,6 +25,7 @@ > > > > > > > #include <linux/serial_core.h> > > > > > > > #include <linux/sysfs.h> > > > > > > > #include <linux/random.h> > > > > > > > +#include <linux/dma-direct.h> /* for zone_dma_bits */ > > > > > > > > > > > > > > #include <asm/setup.h> /* for COMMAND_LINE_SIZE */ > > > > > > > #include <asm/page.h> > > > > > > > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void) > > > > > > > of_scan_flat_dt(early_init_dt_scan_memory, NULL); > > > > > > > } > > > > > > > > > > > > > > +void __init early_init_dt_update_zone_dma_bits(void) > > > > > > > +{ > > > > > > > + unsigned long dt_root = of_get_flat_dt_root(); > > > > > > > + > > > > > > > + if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711")) > > > > > > > + zone_dma_bits = 30; > > > > > > > +} > > > > > > > > > > > > I think we could keep this entirely in the arm64 setup_machine_fdt() and > > > > > > not pollute the core code with RPi4-specific code. > > > > > > > > > > Actually, even better, could we not move the check to > > > > > arm64_memblock_init() when we initialise zone_dma_bits? > > > > > > > > I did it this way as I vaguely remembered Rob saying he wanted to centralise > > > > all early boot fdt code in one place. But I'll be happy to move it there. > > > > > > I can see Rob replied and I'm fine if that's his preference. However, > > > what I don't particularly like is that in the arm64 code, if > > > zone_dma_bits == 24, we set it to 32 assuming that it wasn't touched by > > > the early_init_dt_update_zone_dma_bits(). What if at some point we'll > > > get a platform that actually needs 24 here (I truly hope not, but just > > > the principle of relying on magic values)? > > > > > > So rather than guessing, I'd prefer if the arch code can override > > > ZONE_DMA_BITS_DEFAULT. Then, in arm64, we'll just set it to 32 and no > > > need to explicitly touch the zone_dma_bits variable. > > > > Yes, sonds like the way to go. TBH I wasn't happy with that solution either, > > but couldn't think of a nicer alternative. > > > > Sadly I just realised that the series is incomplete, we have RPi4 users that > > want to boot unsing ACPI, and this series would break things for them. I'll > > have a word with them to see what we can do for their use-case. > > Is there a way to get some SoC information from ACPI? > This is unfortunate. We used ACPI _DMA methods as they were designed to communicate the DMA limit of the XHCI controller to the OS. It shouldn't be too hard to match the OEM id field in the DSDT, and switch to the smaller mask. But it sucks to have to add a quirk like that.
On 10/8/20 2:43 PM, Ard Biesheuvel wrote: > (+ Lorenzo) > > On Thu, 8 Oct 2020 at 12:14, Catalin Marinas <catalin.marinas@arm.com> wrote: >> >> On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote: >>> On Fri, 2020-10-02 at 12:55 +0100, Catalin Marinas wrote: >>>> On Thu, Oct 01, 2020 at 07:31:19PM +0200, Nicolas Saenz Julienne wrote: >>>>> On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote: >>>>>> On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote: >>>>>>> On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote: >>>>>>>> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c >>>>>>>> index 4602e467ca8b..cd0d115ef329 100644 >>>>>>>> --- a/drivers/of/fdt.c >>>>>>>> +++ b/drivers/of/fdt.c >>>>>>>> @@ -25,6 +25,7 @@ >>>>>>>> #include <linux/serial_core.h> >>>>>>>> #include <linux/sysfs.h> >>>>>>>> #include <linux/random.h> >>>>>>>> +#include <linux/dma-direct.h> /* for zone_dma_bits */ >>>>>>>> >>>>>>>> #include <asm/setup.h> /* for COMMAND_LINE_SIZE */ >>>>>>>> #include <asm/page.h> >>>>>>>> @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void) >>>>>>>> of_scan_flat_dt(early_init_dt_scan_memory, NULL); >>>>>>>> } >>>>>>>> >>>>>>>> +void __init early_init_dt_update_zone_dma_bits(void) >>>>>>>> +{ >>>>>>>> + unsigned long dt_root = of_get_flat_dt_root(); >>>>>>>> + >>>>>>>> + if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711")) >>>>>>>> + zone_dma_bits = 30; >>>>>>>> +} >>>>>>> >>>>>>> I think we could keep this entirely in the arm64 setup_machine_fdt() and >>>>>>> not pollute the core code with RPi4-specific code. >>>>>> >>>>>> Actually, even better, could we not move the check to >>>>>> arm64_memblock_init() when we initialise zone_dma_bits? >>>>> >>>>> I did it this way as I vaguely remembered Rob saying he wanted to centralise >>>>> all early boot fdt code in one place. But I'll be happy to move it there. >>>> >>>> I can see Rob replied and I'm fine if that's his preference. However, >>>> what I don't particularly like is that in the arm64 code, if >>>> zone_dma_bits == 24, we set it to 32 assuming that it wasn't touched by >>>> the early_init_dt_update_zone_dma_bits(). What if at some point we'll >>>> get a platform that actually needs 24 here (I truly hope not, but just >>>> the principle of relying on magic values)? >>>> >>>> So rather than guessing, I'd prefer if the arch code can override >>>> ZONE_DMA_BITS_DEFAULT. Then, in arm64, we'll just set it to 32 and no >>>> need to explicitly touch the zone_dma_bits variable. >>> >>> Yes, sonds like the way to go. TBH I wasn't happy with that solution either, >>> but couldn't think of a nicer alternative. >>> >>> Sadly I just realised that the series is incomplete, we have RPi4 users that >>> want to boot unsing ACPI, and this series would break things for them. I'll >>> have a word with them to see what we can do for their use-case. >> >> Is there a way to get some SoC information from ACPI? >> > > This is unfortunate. We used ACPI _DMA methods as they were designed > to communicate the DMA limit of the XHCI controller to the OS. > > It shouldn't be too hard to match the OEM id field in the DSDT, and > switch to the smaller mask. But it sucks to have to add a quirk like > that. > It also requires delaying setting the arm64_dma_phy_limit a bit, but that doesn't appear to be causing a problem. I've been boot/compiling with a patch like: diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c index cada0b816c8a..9dfe776c1c75 100644 --- a/arch/arm64/kernel/acpi.c +++ b/arch/arm64/kernel/acpi.c @@ -14,6 +14,7 @@ #include <linux/acpi.h> #include <linux/cpumask.h> +#include <linux/dma-direct.h> #include <linux/efi.h> #include <linux/efi-bgrt.h> #include <linux/init.h> @@ -168,6 +169,11 @@ static int __init acpi_fadt_sanity_check(void) ret = -EINVAL; } + if (!strncmp(table->oem_id, "RPIFDN", ACPI_OEM_ID_SIZE) && + !strncmp(table->oem_table_id, "RPI4 ", ACPI_OEM_TABLE_ID_SIZE) && + table->oem_revision <= 0x200) { + zone_dma_bits = 30; + } out: /* * acpi_get_table() creates FADT table mapping that diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index cd5caca8a929..6c8aaf1570ce 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -195,6 +195,7 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max) unsigned long max_zone_pfns[MAX_NR_ZONES] = {0}; #ifdef CONFIG_ZONE_DMA + arm64_dma_phys_limit = max_zone_phys(zone_dma_bits); max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit); #endif #ifdef CONFIG_ZONE_DMA32 @@ -393,7 +394,6 @@ void __init arm64_memblock_init(void) */ if (zone_dma_bits == ZONE_DMA_BITS_DEFAULT) zone_dma_bits = 32; - arm64_dma_phys_limit = max_zone_phys(zone_dma_bits); } if (IS_ENABLED(CONFIG_ZONE_DMA32))
On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote: > Sadly I just realised that the series is incomplete, we have RPi4 users that > want to boot unsing ACPI, and this series would break things for them. I'll > have a word with them to see what we can do for their use-case. Stupid question: why do these users insist on a totally unsuitable interface? And why would we as Linux developers care to support such a aims?
On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig <hch@lst.de> wrote: > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote: > > Sadly I just realised that the series is incomplete, we have RPi4 users that > > want to boot unsing ACPI, and this series would break things for them. I'll > > have a word with them to see what we can do for their use-case. > > Stupid question: why do these users insist on a totally unsuitable > interface? And why would we as Linux developers care to support such > a aims? > The point is really whether we want to revert changes in Linux that made both DT and ACPI boot work without quirks on RPi4. Having to check the RPi4 compatible string or OEM id in core init code is awful, regardless of whether you boot via ACPI or via DT. The problem with this hardware is that it uses a DMA mask which is narrower than 32, and the arm64 kernel is simply not set up to deal with that at all. On DT, we have DMA ranges properties and the likes to describe such limitations, on ACPI we have _DMA methods as well as DMA range attributes in the IORT, both of which are now handled correctly. So all the information is there, we just have to figure out how to consume it early on. Interestingly, this limitation always existed in the SoC, but it wasn't until they started shipping it with more than 1 GB of DRAM that it became a problem. This means issues like this could resurface in the future with existing SoCs when they get shipped with more memory, and so I would prefer fixing this in a generic way. Also, I assume papering over the issue like this does not fix the kdump issue fundamentally, it just works around it, and so we might run into this again in the future.
On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote: > On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig <hch@lst.de> wrote: > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote: > > > Sadly I just realised that the series is incomplete, we have RPi4 users that > > > want to boot unsing ACPI, and this series would break things for them. I'll > > > have a word with them to see what we can do for their use-case. > > > > Stupid question: why do these users insist on a totally unsuitable > > interface? And why would we as Linux developers care to support such > > a aims? > > The point is really whether we want to revert changes in Linux that > made both DT and ACPI boot work without quirks on RPi4. Well, and broke a big amount of devices that were otherwise fine. > Having to check the RPi4 compatible string or OEM id in core init code is > awful, regardless of whether you boot via ACPI or via DT. > > The problem with this hardware is that it uses a DMA mask which is > narrower than 32, and the arm64 kernel is simply not set up to deal > with that at all. On DT, we have DMA ranges properties and the likes > to describe such limitations, on ACPI we have _DMA methods as well as > DMA range attributes in the IORT, both of which are now handled > correctly. So all the information is there, we just have to figure out > how to consume it early on. Is it worth the effort just for a single board? I don't know about ACPI but parsing dma-ranges that early at boot time is not trivial. My intuition tells me that it'd be even harder for ACPI, being a more complex data structure. > Interestingly, this limitation always existed in the SoC, but it > wasn't until they started shipping it with more than 1 GB of DRAM that > it became a problem. This means issues like this could resurface in > the future with existing SoCs when they get shipped with more memory, > and so I would prefer fixing this in a generic way. Actually what I proposed here is pretty generic. Specially from arm64's perspective. We call early_init_dt_scan(), which sets up zone_dma_bits based on whatever it finds in DT. Both those operations are architecture independent. arm64 arch code doesn't care about the logic involved in ascertaining zone_dma_bits. I get that the last step isn't generic. But it's all setup so as to make it as such whenever it's worth the effort. > Also, I assume papering over the issue like this does not fix the > kdump issue fundamentally, it just works around it, and so we might > run into this again in the future. Any ideas? The way I understand it the kdump issue is just a shortcoming of the memory zones design. Regards, Nicolas
Hi Jeremy. On Thu, 2020-10-08 at 22:59 -0500, Jeremy Linton wrote: > On 10/8/20 2:43 PM, Ard Biesheuvel wrote: > > (+ Lorenzo) > > > > On Thu, 8 Oct 2020 at 12:14, Catalin Marinas <catalin.marinas@arm.com> wrote: > > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote: > > > > On Fri, 2020-10-02 at 12:55 +0100, Catalin Marinas wrote: > > > > > On Thu, Oct 01, 2020 at 07:31:19PM +0200, Nicolas Saenz Julienne wrote: > > > > > > On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote: > > > > > > > On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote: > > > > > > > > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote: > > > > > > > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c > > > > > > > > > index 4602e467ca8b..cd0d115ef329 100644 > > > > > > > > > --- a/drivers/of/fdt.c > > > > > > > > > +++ b/drivers/of/fdt.c > > > > > > > > > @@ -25,6 +25,7 @@ > > > > > > > > > #include <linux/serial_core.h> > > > > > > > > > #include <linux/sysfs.h> > > > > > > > > > #include <linux/random.h> > > > > > > > > > +#include <linux/dma-direct.h> /* for zone_dma_bits */ > > > > > > > > > > > > > > > > > > #include <asm/setup.h> /* for COMMAND_LINE_SIZE */ > > > > > > > > > #include <asm/page.h> > > > > > > > > > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void) > > > > > > > > > of_scan_flat_dt(early_init_dt_scan_memory, NULL); > > > > > > > > > } > > > > > > > > > > > > > > > > > > +void __init early_init_dt_update_zone_dma_bits(void) > > > > > > > > > +{ > > > > > > > > > + unsigned long dt_root = of_get_flat_dt_root(); > > > > > > > > > + > > > > > > > > > + if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711")) > > > > > > > > > + zone_dma_bits = 30; > > > > > > > > > +} > > > > > > > > > > > > > > > > I think we could keep this entirely in the arm64 setup_machine_fdt() and > > > > > > > > not pollute the core code with RPi4-specific code. > > > > > > > > > > > > > > Actually, even better, could we not move the check to > > > > > > > arm64_memblock_init() when we initialise zone_dma_bits? > > > > > > > > > > > > I did it this way as I vaguely remembered Rob saying he wanted to centralise > > > > > > all early boot fdt code in one place. But I'll be happy to move it there. > > > > > > > > > > I can see Rob replied and I'm fine if that's his preference. However, > > > > > what I don't particularly like is that in the arm64 code, if > > > > > zone_dma_bits == 24, we set it to 32 assuming that it wasn't touched by > > > > > the early_init_dt_update_zone_dma_bits(). What if at some point we'll > > > > > get a platform that actually needs 24 here (I truly hope not, but just > > > > > the principle of relying on magic values)? > > > > > > > > > > So rather than guessing, I'd prefer if the arch code can override > > > > > ZONE_DMA_BITS_DEFAULT. Then, in arm64, we'll just set it to 32 and no > > > > > need to explicitly touch the zone_dma_bits variable. > > > > > > > > Yes, sonds like the way to go. TBH I wasn't happy with that solution either, > > > > but couldn't think of a nicer alternative. > > > > > > > > Sadly I just realised that the series is incomplete, we have RPi4 users that > > > > want to boot unsing ACPI, and this series would break things for them. I'll > > > > have a word with them to see what we can do for their use-case. > > > > > > Is there a way to get some SoC information from ACPI? > > > > > > > This is unfortunate. We used ACPI _DMA methods as they were designed > > to communicate the DMA limit of the XHCI controller to the OS. > > > > It shouldn't be too hard to match the OEM id field in the DSDT, and > > switch to the smaller mask. But it sucks to have to add a quirk like > > that. > > It also requires delaying setting the arm64_dma_phy_limit a bit, but > that doesn't appear to be causing a problem. I've been boot/compiling > with a patch like: > > diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c > index cada0b816c8a..9dfe776c1c75 100644 > --- a/arch/arm64/kernel/acpi.c > +++ b/arch/arm64/kernel/acpi.c > @@ -14,6 +14,7 @@ > > #include <linux/acpi.h> > #include <linux/cpumask.h> > +#include <linux/dma-direct.h> > #include <linux/efi.h> > #include <linux/efi-bgrt.h> > #include <linux/init.h> > @@ -168,6 +169,11 @@ static int __init acpi_fadt_sanity_check(void) > ret = -EINVAL; > } > > + if (!strncmp(table->oem_id, "RPIFDN", ACPI_OEM_ID_SIZE) && > + !strncmp(table->oem_table_id, "RPI4 ", > ACPI_OEM_TABLE_ID_SIZE) && > + table->oem_revision <= 0x200) { > + zone_dma_bits = 30; > + } > out: > /* > * acpi_get_table() creates FADT table mapping that > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c > index cd5caca8a929..6c8aaf1570ce 100644 > --- a/arch/arm64/mm/init.c > +++ b/arch/arm64/mm/init.c > @@ -195,6 +195,7 @@ static void __init zone_sizes_init(unsigned long > min, unsigned long max) > unsigned long max_zone_pfns[MAX_NR_ZONES] = {0}; > > #ifdef CONFIG_ZONE_DMA > + arm64_dma_phys_limit = max_zone_phys(zone_dma_bits); > max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit); > #endif > #ifdef CONFIG_ZONE_DMA32 > @@ -393,7 +394,6 @@ void __init arm64_memblock_init(void) > */ > if (zone_dma_bits == ZONE_DMA_BITS_DEFAULT) > zone_dma_bits = 32; > - arm64_dma_phys_limit = max_zone_phys(zone_dma_bits); > } > > if (IS_ENABLED(CONFIG_ZONE_DMA32)) > > Thanks for taking the time to look at this! Regards, Nicolas
On Fri, 9 Oct 2020 at 10:36, Nicolas Saenz Julienne <nsaenzjulienne@suse.de> wrote: > > On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote: > > On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig <hch@lst.de> wrote: > > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote: > > > > Sadly I just realised that the series is incomplete, we have RPi4 users that > > > > want to boot unsing ACPI, and this series would break things for them. I'll > > > > have a word with them to see what we can do for their use-case. > > > > > > Stupid question: why do these users insist on a totally unsuitable > > > interface? And why would we as Linux developers care to support such > > > a aims? > > > > The point is really whether we want to revert changes in Linux that > > made both DT and ACPI boot work without quirks on RPi4. > > Well, and broke a big amount of devices that were otherwise fine. > Yeah that was unfortunate. > > Having to check the RPi4 compatible string or OEM id in core init code is > > awful, regardless of whether you boot via ACPI or via DT. > > > > The problem with this hardware is that it uses a DMA mask which is > > narrower than 32, and the arm64 kernel is simply not set up to deal > > with that at all. On DT, we have DMA ranges properties and the likes > > to describe such limitations, on ACPI we have _DMA methods as well as > > DMA range attributes in the IORT, both of which are now handled > > correctly. So all the information is there, we just have to figure out > > how to consume it early on. > > Is it worth the effort just for a single board? I don't know about ACPI but > parsing dma-ranges that early at boot time is not trivial. My intuition tells > me that it'd be even harder for ACPI, being a more complex data structure. > Yes, it will be harder, especially for the _DMA methods. > > Interestingly, this limitation always existed in the SoC, but it > > wasn't until they started shipping it with more than 1 GB of DRAM that > > it became a problem. This means issues like this could resurface in > > the future with existing SoCs when they get shipped with more memory, > > and so I would prefer fixing this in a generic way. > > Actually what I proposed here is pretty generic. Specially from arm64's > perspective. We call early_init_dt_scan(), which sets up zone_dma_bits based on > whatever it finds in DT. Both those operations are architecture independent. > arm64 arch code doesn't care about the logic involved in ascertaining > zone_dma_bits. I get that the last step isn't generic. But it's all setup so as > to make it as such whenever it's worth the effort. > The problem is that, while we are providing a full description of the SoC's capabilities, we short circuit this by inserting knowledge into the code (that is shared between all DT architectures) that "brcm,bcm2711" is special, and needs a DMA zone override. I think for ACPI boot, we might be able to work around this by cold plugging the memory above 1 GB, but I have to double check whether it won't get pulled into ZONE_DMA32 anyway (unless anyone can answer that for me here from the top of their head)
On Fri, 2020-10-09 at 11:13 +0200, Ard Biesheuvel wrote: > On Fri, 9 Oct 2020 at 10:36, Nicolas Saenz Julienne > <nsaenzjulienne@suse.de> wrote: > > On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote: > > > On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig <hch@lst.de> wrote: > > > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote: > > > > > Sadly I just realised that the series is incomplete, we have RPi4 users that > > > > > want to boot unsing ACPI, and this series would break things for them. I'll > > > > > have a word with them to see what we can do for their use-case. > > > > > > > > Stupid question: why do these users insist on a totally unsuitable > > > > interface? And why would we as Linux developers care to support such > > > > a aims? > > > > > > The point is really whether we want to revert changes in Linux that > > > made both DT and ACPI boot work without quirks on RPi4. > > > > Well, and broke a big amount of devices that were otherwise fine. > > > > Yeah that was unfortunate. > > > > Having to check the RPi4 compatible string or OEM id in core init code is > > > awful, regardless of whether you boot via ACPI or via DT. > > > > > > The problem with this hardware is that it uses a DMA mask which is > > > narrower than 32, and the arm64 kernel is simply not set up to deal > > > with that at all. On DT, we have DMA ranges properties and the likes > > > to describe such limitations, on ACPI we have _DMA methods as well as > > > DMA range attributes in the IORT, both of which are now handled > > > correctly. So all the information is there, we just have to figure out > > > how to consume it early on. > > > > Is it worth the effort just for a single board? I don't know about ACPI but > > parsing dma-ranges that early at boot time is not trivial. My intuition tells > > me that it'd be even harder for ACPI, being a more complex data structure. > > > > Yes, it will be harder, especially for the _DMA methods. > > > > Interestingly, this limitation always existed in the SoC, but it > > > wasn't until they started shipping it with more than 1 GB of DRAM that > > > it became a problem. This means issues like this could resurface in > > > the future with existing SoCs when they get shipped with more memory, > > > and so I would prefer fixing this in a generic way. > > > > Actually what I proposed here is pretty generic. Specially from arm64's > > perspective. We call early_init_dt_scan(), which sets up zone_dma_bits based on > > whatever it finds in DT. Both those operations are architecture independent. > > arm64 arch code doesn't care about the logic involved in ascertaining > > zone_dma_bits. I get that the last step isn't generic. But it's all setup so as > > to make it as such whenever it's worth the effort. > > > > The problem is that, while we are providing a full description of the > SoC's capabilities, we short circuit this by inserting knowledge into > the code (that is shared between all DT architectures) that > "brcm,bcm2711" is special, and needs a DMA zone override. Yes I understand this and I sympathize with it, not the most beautiful thing out there :). But that's only half the issue, as I said, implementing this early at boot time is a tangible amount of work and a burden to maintain just for one board. So this is the compromise we discussed with the DT maintainer (RobH). The series sets things up so as to be able to implement the right thing transparently to arm64's architecture when deemed worth the effort. Ultimately, if you're worried about inserting knowledge into the code, aren't we doing that, in a more extreme way, when imposing an extra unwarranted zone to the whole arm64 ecosystem? Note that I'm more that happy to work on alternative solutions, but let's first settle on what would be acceptable to everyone. > I think for ACPI boot, we might be able to work around this by cold > plugging the memory above 1 GB, but I have to double check whether it > won't get pulled into ZONE_DMA32 anyway (unless anyone can answer that > for me here from the top of their head) Don't know much about what ACPI memory cold plugging involves, but we'll still need a proper ZONE_DMA32 (i.e. spanning the whole 32-bit address space) for RPi4. Regards, Nicolas
On Fri, Oct 09, 2020 at 11:13:59AM +0200, Ard Biesheuvel wrote: > On Fri, 9 Oct 2020 at 10:36, Nicolas Saenz Julienne > <nsaenzjulienne@suse.de> wrote: > > > > On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote: > > > On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig <hch@lst.de> wrote: > > > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote: > > > > > Sadly I just realised that the series is incomplete, we have RPi4 users that > > > > > want to boot unsing ACPI, and this series would break things for them. I'll > > > > > have a word with them to see what we can do for their use-case. > > > > > > > > Stupid question: why do these users insist on a totally unsuitable > > > > interface? And why would we as Linux developers care to support such > > > > a aims? > > > > > > The point is really whether we want to revert changes in Linux that > > > made both DT and ACPI boot work without quirks on RPi4. > > > > Well, and broke a big amount of devices that were otherwise fine. > > > > Yeah that was unfortunate. > > > > Having to check the RPi4 compatible string or OEM id in core init code is > > > awful, regardless of whether you boot via ACPI or via DT. > > > > > > The problem with this hardware is that it uses a DMA mask which is > > > narrower than 32, and the arm64 kernel is simply not set up to deal > > > with that at all. On DT, we have DMA ranges properties and the likes > > > to describe such limitations, on ACPI we have _DMA methods as well as > > > DMA range attributes in the IORT, both of which are now handled > > > correctly. So all the information is there, we just have to figure out > > > how to consume it early on. > > > > Is it worth the effort just for a single board? I don't know about ACPI but > > parsing dma-ranges that early at boot time is not trivial. My intuition tells > > me that it'd be even harder for ACPI, being a more complex data structure. > > > > Yes, it will be harder, especially for the _DMA methods. > > > > Interestingly, this limitation always existed in the SoC, but it > > > wasn't until they started shipping it with more than 1 GB of DRAM that > > > it became a problem. This means issues like this could resurface in > > > the future with existing SoCs when they get shipped with more memory, > > > and so I would prefer fixing this in a generic way. > > > > Actually what I proposed here is pretty generic. Specially from arm64's > > perspective. We call early_init_dt_scan(), which sets up zone_dma_bits based on > > whatever it finds in DT. Both those operations are architecture independent. > > arm64 arch code doesn't care about the logic involved in ascertaining > > zone_dma_bits. I get that the last step isn't generic. But it's all setup so as > > to make it as such whenever it's worth the effort. > > > > The problem is that, while we are providing a full description of the > SoC's capabilities, we short circuit this by inserting knowledge into > the code (that is shared between all DT architectures) that > "brcm,bcm2711" is special, and needs a DMA zone override. > > I think for ACPI boot, we might be able to work around this by cold > plugging the memory above 1 GB, but I have to double check whether it > won't get pulled into ZONE_DMA32 anyway (unless anyone can answer that > for me here from the top of their head) Is this information that we can infer from IORT nodes and make it generic (ie make a DMA limit out of all IORT nodes allowed ranges) ? We can move this check to IORT code and call it from arm64 if it can be made to work. Thanks, Lorenzo
On Fri, 9 Oct 2020 at 17:24, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote: > > On Fri, Oct 09, 2020 at 11:13:59AM +0200, Ard Biesheuvel wrote: > > On Fri, 9 Oct 2020 at 10:36, Nicolas Saenz Julienne > > <nsaenzjulienne@suse.de> wrote: > > > > > > On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote: > > > > On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig <hch@lst.de> wrote: > > > > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote: > > > > > > Sadly I just realised that the series is incomplete, we have RPi4 users that > > > > > > want to boot unsing ACPI, and this series would break things for them. I'll > > > > > > have a word with them to see what we can do for their use-case. > > > > > > > > > > Stupid question: why do these users insist on a totally unsuitable > > > > > interface? And why would we as Linux developers care to support such > > > > > a aims? > > > > > > > > The point is really whether we want to revert changes in Linux that > > > > made both DT and ACPI boot work without quirks on RPi4. > > > > > > Well, and broke a big amount of devices that were otherwise fine. > > > > > > > Yeah that was unfortunate. > > > > > > Having to check the RPi4 compatible string or OEM id in core init code is > > > > awful, regardless of whether you boot via ACPI or via DT. > > > > > > > > The problem with this hardware is that it uses a DMA mask which is > > > > narrower than 32, and the arm64 kernel is simply not set up to deal > > > > with that at all. On DT, we have DMA ranges properties and the likes > > > > to describe such limitations, on ACPI we have _DMA methods as well as > > > > DMA range attributes in the IORT, both of which are now handled > > > > correctly. So all the information is there, we just have to figure out > > > > how to consume it early on. > > > > > > Is it worth the effort just for a single board? I don't know about ACPI but > > > parsing dma-ranges that early at boot time is not trivial. My intuition tells > > > me that it'd be even harder for ACPI, being a more complex data structure. > > > > > > > Yes, it will be harder, especially for the _DMA methods. > > > > > > Interestingly, this limitation always existed in the SoC, but it > > > > wasn't until they started shipping it with more than 1 GB of DRAM that > > > > it became a problem. This means issues like this could resurface in > > > > the future with existing SoCs when they get shipped with more memory, > > > > and so I would prefer fixing this in a generic way. > > > > > > Actually what I proposed here is pretty generic. Specially from arm64's > > > perspective. We call early_init_dt_scan(), which sets up zone_dma_bits based on > > > whatever it finds in DT. Both those operations are architecture independent. > > > arm64 arch code doesn't care about the logic involved in ascertaining > > > zone_dma_bits. I get that the last step isn't generic. But it's all setup so as > > > to make it as such whenever it's worth the effort. > > > > > > > The problem is that, while we are providing a full description of the > > SoC's capabilities, we short circuit this by inserting knowledge into > > the code (that is shared between all DT architectures) that > > "brcm,bcm2711" is special, and needs a DMA zone override. > > > > I think for ACPI boot, we might be able to work around this by cold > > plugging the memory above 1 GB, but I have to double check whether it > > won't get pulled into ZONE_DMA32 anyway (unless anyone can answer that > > for me here from the top of their head) > > Is this information that we can infer from IORT nodes and make it > generic (ie make a DMA limit out of all IORT nodes allowed ranges) ? > Yes, that seems feasible. > We can move this check to IORT code and call it from arm64 if it > can be made to work. > Finding the smallest value in the IORT, and assigning it to zone_dma_bits if it is < 32 should be easy. But as I understand it, having these separate DMA and DMA32 zones is what breaks kdump, no? So how is this going to fix the underlying issue? Nicolas, were there any other reported regressions caused by the introduction of ZONE_DMA?
On Fri, Oct 09, 2020 at 06:23:06PM +0200, Ard Biesheuvel wrote: > On Fri, 9 Oct 2020 at 17:24, Lorenzo Pieralisi > <lorenzo.pieralisi@arm.com> wrote: > > We can move this check to IORT code and call it from arm64 if it > > can be made to work. > > Finding the smallest value in the IORT, and assigning it to > zone_dma_bits if it is < 32 should be easy. But as I understand it, > having these separate DMA and DMA32 zones is what breaks kdump, no? So > how is this going to fix the underlying issue? If zone_dma_bits is 32, ZONE_DMA32 disappears into ZONE_DMA (GFP_DMA32 allocations fall back to ZONE_DMA). kdump wants DMA-able memory and, without a 30-bit ZONE_DMA, that would be the bottom 32-bit. With the introduction of ZONE_DMA, this suddenly became 1GB. We could change kdump to allocate ZONE_DMA32 but this one may also be small as it lost 1GB to ZONE_DMA. However, the kdump kernel would need to be rebuilt without ZONE_DMA since it won't have any. IIRC (it's been a while since I looked), the kdump allocation couldn't span multiple zones. In a separate thread, we try to fix kdump to use allocations above 4G as a fallback but this only fixes platforms with enough RAM (and maybe it's only those platforms that care about kdump).
On Fri, 9 Oct 2020 at 19:10, Catalin Marinas <catalin.marinas@arm.com> wrote: > > On Fri, Oct 09, 2020 at 06:23:06PM +0200, Ard Biesheuvel wrote: > > On Fri, 9 Oct 2020 at 17:24, Lorenzo Pieralisi > > <lorenzo.pieralisi@arm.com> wrote: > > > We can move this check to IORT code and call it from arm64 if it > > > can be made to work. > > > > Finding the smallest value in the IORT, and assigning it to > > zone_dma_bits if it is < 32 should be easy. But as I understand it, > > having these separate DMA and DMA32 zones is what breaks kdump, no? So > > how is this going to fix the underlying issue? > > If zone_dma_bits is 32, ZONE_DMA32 disappears into ZONE_DMA (GFP_DMA32 > allocations fall back to ZONE_DMA). > > kdump wants DMA-able memory and, without a 30-bit ZONE_DMA, that would > be the bottom 32-bit. With the introduction of ZONE_DMA, this suddenly > became 1GB. We could change kdump to allocate ZONE_DMA32 but this one > may also be small as it lost 1GB to ZONE_DMA. However, the kdump kernel > would need to be rebuilt without ZONE_DMA since it won't have any. IIRC > (it's been a while since I looked), the kdump allocation couldn't span > multiple zones. > > In a separate thread, we try to fix kdump to use allocations above 4G as > a fallback but this only fixes platforms with enough RAM (and maybe it's > only those platforms that care about kdump). > One thing that strikes me as odd is that we are applying the same shifting logic to ZONE_DMA as we are applying to ZONE_DMA32, i.e., if DRAM starts outside of the zone, it is shifted upwards. On a typical ARM box, this gives me [ 0.000000] Zone ranges: [ 0.000000] DMA [mem 0x0000000080000000-0x00000000bfffffff] [ 0.000000] DMA32 [mem 0x00000000c0000000-0x00000000ffffffff] [ 0.000000] Normal [mem 0x0000000100000000-0x0000000fffffffff] i.e., the 30-bit addressable range has bit 31 set, which is weird. I wonder if it wouldn't be better (and less problematic in the general case) to drop this logic for ZONE_DMA, and simply let it remain empty unless there is really some memory there.
On Sat, 2020-10-10 at 12:36 +0200, Ard Biesheuvel wrote: > On Fri, 9 Oct 2020 at 19:10, Catalin Marinas <catalin.marinas@arm.com> wrote: > > On Fri, Oct 09, 2020 at 06:23:06PM +0200, Ard Biesheuvel wrote: > > > On Fri, 9 Oct 2020 at 17:24, Lorenzo Pieralisi > > > <lorenzo.pieralisi@arm.com> wrote: > > > > We can move this check to IORT code and call it from arm64 if it > > > > can be made to work. > > > > > > Finding the smallest value in the IORT, and assigning it to > > > zone_dma_bits if it is < 32 should be easy. But as I understand it, > > > having these separate DMA and DMA32 zones is what breaks kdump, no? So > > > how is this going to fix the underlying issue? > > > > If zone_dma_bits is 32, ZONE_DMA32 disappears into ZONE_DMA (GFP_DMA32 > > allocations fall back to ZONE_DMA). > > > > kdump wants DMA-able memory and, without a 30-bit ZONE_DMA, that would > > be the bottom 32-bit. With the introduction of ZONE_DMA, this suddenly > > became 1GB. We could change kdump to allocate ZONE_DMA32 but this one > > may also be small as it lost 1GB to ZONE_DMA. However, the kdump kernel > > would need to be rebuilt without ZONE_DMA since it won't have any. IIRC > > (it's been a while since I looked), the kdump allocation couldn't span > > multiple zones. > > > > In a separate thread, we try to fix kdump to use allocations above 4G as > > a fallback but this only fixes platforms with enough RAM (and maybe it's > > only those platforms that care about kdump). > > > > One thing that strikes me as odd is that we are applying the same > shifting logic to ZONE_DMA as we are applying to ZONE_DMA32, i.e., if > DRAM starts outside of the zone, it is shifted upwards. > > On a typical ARM box, this gives me > > [ 0.000000] Zone ranges: > [ 0.000000] DMA [mem 0x0000000080000000-0x00000000bfffffff] > [ 0.000000] DMA32 [mem 0x00000000c0000000-0x00000000ffffffff] > [ 0.000000] Normal [mem 0x0000000100000000-0x0000000fffffffff] > > i.e., the 30-bit addressable range has bit 31 set, which is weird. Yes I agree it's weird, and IMO plain useless. I sent a series this summer to address this[1], which ultimately triggered the discussion we're having right now. Although with with your latest patch and the DT counterpart, we should be OK. It would be weird for a HW description to define DMA constraints that are impossible to reach on that system. > I wonder if it wouldn't be better (and less problematic in the general > case) to drop this logic for ZONE_DMA, and simply let it remain empty > unless there is really some memory there. From my experience, you can't have empty ZONE_DMA when enabled. Empty ZONE_DMA32 is OK though. Although I'm sure it's something that can be changed. Regards, Nicolas [1] https://lkml.org/lkml/2020/8/19/1022
On Sat, Oct 10, 2020 at 12:53:19PM +0200, Nicolas Saenz Julienne wrote: > On Sat, 2020-10-10 at 12:36 +0200, Ard Biesheuvel wrote: > > On Fri, 9 Oct 2020 at 19:10, Catalin Marinas <catalin.marinas@arm.com> wrote: > > > On Fri, Oct 09, 2020 at 06:23:06PM +0200, Ard Biesheuvel wrote: > > > > On Fri, 9 Oct 2020 at 17:24, Lorenzo Pieralisi > > > > <lorenzo.pieralisi@arm.com> wrote: > > > > > We can move this check to IORT code and call it from arm64 if it > > > > > can be made to work. > > > > > > > > Finding the smallest value in the IORT, and assigning it to > > > > zone_dma_bits if it is < 32 should be easy. But as I understand it, > > > > having these separate DMA and DMA32 zones is what breaks kdump, no? So > > > > how is this going to fix the underlying issue? > > > > > > If zone_dma_bits is 32, ZONE_DMA32 disappears into ZONE_DMA (GFP_DMA32 > > > allocations fall back to ZONE_DMA). > > > > > > kdump wants DMA-able memory and, without a 30-bit ZONE_DMA, that would > > > be the bottom 32-bit. With the introduction of ZONE_DMA, this suddenly > > > became 1GB. We could change kdump to allocate ZONE_DMA32 but this one > > > may also be small as it lost 1GB to ZONE_DMA. However, the kdump kernel > > > would need to be rebuilt without ZONE_DMA since it won't have any. IIRC > > > (it's been a while since I looked), the kdump allocation couldn't span > > > multiple zones. > > > > > > In a separate thread, we try to fix kdump to use allocations above 4G as > > > a fallback but this only fixes platforms with enough RAM (and maybe it's > > > only those platforms that care about kdump). > > > > > > > One thing that strikes me as odd is that we are applying the same > > shifting logic to ZONE_DMA as we are applying to ZONE_DMA32, i.e., if > > DRAM starts outside of the zone, it is shifted upwards. > > > > On a typical ARM box, this gives me > > > > [ 0.000000] Zone ranges: > > [ 0.000000] DMA [mem 0x0000000080000000-0x00000000bfffffff] > > [ 0.000000] DMA32 [mem 0x00000000c0000000-0x00000000ffffffff] > > [ 0.000000] Normal [mem 0x0000000100000000-0x0000000fffffffff] > > > > i.e., the 30-bit addressable range has bit 31 set, which is weird. > > Yes I agree it's weird, and IMO plain useless. I sent a series this summer to > address this[1], which ultimately triggered the discussion we're having right > now. I don't mind assuming that ZONE_DMA is always from pfn 0 (i.e. no dma_offset for such constrained devices). But if ZONE_DMA32 is squeezed out with ZONE_DMA extended to 4GB, it should allow non-zero upper 32 bits. IIRC we do have SoCs with RAM starting above 4GB. However, your patch didn't completely solve the problem for non-RPi4 platforms as there's hardware with RAM starting at 0 that doesn't need the 1GB ZONE_DMA. We may end up with a combination of the two approaches. > Although with with your latest patch and the DT counterpart, we should be OK. > It would be weird for a HW description to define DMA constraints that are > impossible to reach on that system. I don't remember the difficulties with parsing a DT early for inferring the ZONE_DMA requirements. Could we not check the dma-ranges property in the soc node? I can see bcm2711.dtsi defines a 30-bit address range. We are not interested in the absolute physical/bus addresses, just the size to check whether it's smaller than 32-bit. > > I wonder if it wouldn't be better (and less problematic in the general > > case) to drop this logic for ZONE_DMA, and simply let it remain empty > > unless there is really some memory there. > > From my experience, you can't have empty ZONE_DMA when enabled. Empty > ZONE_DMA32 is OK though. Although I'm sure it's something that can be changed. Indeed, because we still have GFP_DMA around which can't fall back to ZONE_DMA32 (well, unless CONFIG_ZONE_DMA is disabled).
On Fri, Oct 09, 2020 at 06:10:52PM +0100, Catalin Marinas wrote:
> kdump wants DMA-able memory and,
DMAable by whom? The only way to guranteed DMAable memory is to use
the DMA memory allocator(s) and pass a specific device to them. Everyting
else is just fundamentally broken. Note that even when device is not
DMAable we can still use swiotlb to access it.
On Mon, Oct 12, 2020 at 08:47:15AM +0200, Christoph Hellwig wrote: > On Fri, Oct 09, 2020 at 06:10:52PM +0100, Catalin Marinas wrote: > > kdump wants DMA-able memory and, > > DMAable by whom? The only way to guranteed DMAable memory is to use > the DMA memory allocator(s) and pass a specific device to them. Everyting > else is just fundamentally broken. Note that even when device is not > DMAable we can still use swiotlb to access it. What I meant is that the new kexec'ed kernel needs some memory in the ZONE_DMA range, currently set to the bottom 30-bit even for platforms that can cope with the whole 32-bit range (anything other than RPi4). The memory range available to the kdump kernels is limited to what reserve_crashkernel() allocated, which may not fit in the lower 1GB. There are two ongoing threads (complementary): 1. Change the arm64 reserve_crashkernel() similar to x86 which allocates memory above 4G with a small block in the ZONE_DMA range. 2. Allow zone_dma_bits to be 32 for arm64 platforms other than RPi4. The second point also fixes some regressions with CMA reservations that could no longer fit in the lower 1GB.
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index 4602e467ca8b..cd0d115ef329 100644 --- a/drivers/of/fdt.c +++ b/drivers/of/fdt.c @@ -25,6 +25,7 @@ #include <linux/serial_core.h> #include <linux/sysfs.h> #include <linux/random.h> +#include <linux/dma-direct.h> /* for zone_dma_bits */ #include <asm/setup.h> /* for COMMAND_LINE_SIZE */ #include <asm/page.h> @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void) of_scan_flat_dt(early_init_dt_scan_memory, NULL); } +void __init early_init_dt_update_zone_dma_bits(void) +{ + unsigned long dt_root = of_get_flat_dt_root(); + + if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711")) + zone_dma_bits = 30; +} + bool __init early_init_dt_scan(void *params) { bool status; @@ -1207,6 +1216,7 @@ bool __init early_init_dt_scan(void *params) return false; early_init_dt_scan_nodes(); + early_init_dt_update_zone_dma_bits(); return true; }
arm64 wants to be able to set ZONE_DMA's size depending on the specific platform its being run on. Ideally this could be achieved in a smart way by parsing all dma-ranges and calculating the smaller DMA constraint in the system. Easier said than done. We compromised on a simpler solution as the only platform interested in using this is the Raspberry Pi 4. So update zone_dma_bits if the machine's compatible string matches Raspberry Pi 4's, otherwise let arm64's mm code deal with it. Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> --- drivers/of/fdt.c | 10 ++++++++++ 1 file changed, 10 insertions(+)