Message ID | 20201021123437.21538-6-nsaenzjulienne@suse.de (mailing list archive) |
---|---|
State | New, archived |
Series | arm64: Default to 32-bit wide ZONE_DMA |
On Wed, Oct 21, 2020 at 02:34:35PM +0200, Nicolas Saenz Julienne wrote:
> @@ -188,9 +186,11 @@ static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
>  static void __init zone_sizes_init(unsigned long min, unsigned long max)
>  {
>  	unsigned long max_zone_pfns[MAX_NR_ZONES] = {0};
> +	unsigned int __maybe_unused dt_zone_dma_bits;
>  
>  #ifdef CONFIG_ZONE_DMA
> -	zone_dma_bits = ARM64_ZONE_DMA_BITS;
> +	dt_zone_dma_bits = ilog2(of_dma_get_max_cpu_address(NULL));
> +	zone_dma_bits = min(32U, dt_zone_dma_bits);

A thought: can we remove the min here and expand ZONE_DMA to whatever
dt_zone_dma_bits says? More on this below.

>  	arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
>  	max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
>  #endif

I was talking earlier to Ard and Robin on the ZONE_DMA32 history and the
need for max_zone_phys(). This was rather theoretical, the Seattle
platform has all RAM starting above 4GB and that led to an empty
ZONE_DMA32 originally. The max_zone_phys() hack was meant to lift
ZONE_DMA32 into the bottom of the RAM, on the assumption that such
32-bit devices would have a DMA offset hardwired. We are not aware of
any such case on arm64 systems and even on Seattle, IIUC 32-bit devices
only work if they are behind an SMMU (so no hardwired offset).

In hindsight, it would have made more sense on platforms with RAM above
4GB to expand ZONE_DMA32 to cover the whole memory (so empty
ZONE_NORMAL). Something like:

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index a53c1e0fb017..7d5e3dd85617 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -187,8 +187,12 @@ static void __init reserve_elfcorehdr(void)
  */
 static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
 {
-	phys_addr_t offset = memblock_start_of_DRAM() & GENMASK_ULL(63, zone_bits);
-	return min(offset + (1ULL << zone_bits), memblock_end_of_DRAM());
+	phys_addr_t zone_mask = 1ULL << zone_bits;
+
+	if (!(memblock_start_of_DRAM() & zone_mask))
+		zone_mask = PHYS_ADDR_MAX;
+
+	return min(zone_mask, memblock_end_of_DRAM());
 }
 
 static void __init zone_sizes_init(unsigned long min, unsigned long max)

I don't think this makes any difference for ZONE_DMA unless a
broken DT or IORT reports the max CPU address below the start of DRAM.

There's a minor issue if of_dma_get_max_cpu_address() matches
memblock_end_of_DRAM() but they are not a power of 2. We'd be left with
a bit of RAM at the end in ZONE_NORMAL due to ilog2 truncation.
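
To put a number on the ilog2 truncation mentioned above, here is a minimal user-space sketch. It is not kernel code: ilog2_u64() and ram_left_in_zone_normal() are hypothetical helpers written for this illustration, DRAM is assumed to start at address 0, and the 3 GiB limit is an invented example value.

```c
#include <stdint.h>

/*
 * User-space stand-in for the kernel's ilog2(): index of the highest set
 * bit. Returns 0 for an input of 0, a case the kernel helper leaves
 * undefined.
 */
static unsigned int ilog2_u64(uint64_t v)
{
	unsigned int bits = 0;

	while ((v >>= 1) != 0)
		bits++;
	return bits;
}

/*
 * Hypothetical helper: amount of RAM that ends up above the zone limit
 * when the zone is sized as 1 << ilog2(max_cpu_addr) and DRAM (assumed
 * to start at 0) ends exactly at max_cpu_addr.
 */
static uint64_t ram_left_in_zone_normal(uint64_t max_cpu_addr)
{
	uint64_t zone_limit = 1ULL << ilog2_u64(max_cpu_addr);

	return max_cpu_addr - zone_limit;
}
```

With a limit of 3 GiB, ilog2_u64() yields 31, so the zone would stop at 2 GiB and the last 1 GiB would fall outside it. With a power-of-2 limit such as 4 GiB nothing is left over.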
Hi Catalin,

On Thu, 2020-10-22 at 19:06 +0100, Catalin Marinas wrote:
> On Wed, Oct 21, 2020 at 02:34:35PM +0200, Nicolas Saenz Julienne wrote:
> > @@ -188,9 +186,11 @@ static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
> >  static void __init zone_sizes_init(unsigned long min, unsigned long max)
> >  {
> >  	unsigned long max_zone_pfns[MAX_NR_ZONES] = {0};
> > +	unsigned int __maybe_unused dt_zone_dma_bits;
> >  
> >  #ifdef CONFIG_ZONE_DMA
> > -	zone_dma_bits = ARM64_ZONE_DMA_BITS;
> > +	dt_zone_dma_bits = ilog2(of_dma_get_max_cpu_address(NULL));
> > +	zone_dma_bits = min(32U, dt_zone_dma_bits);
>
> A thought: can we remove the min here and expand ZONE_DMA to whatever
> dt_zone_dma_bits says? More on this below.

On most platforms we'd get PHYS_ADDR_MAX, or something bigger than the actual
amount of RAM. Which would ultimately create a system wide ZONE_DMA. At first
sight, I don't see it breaking dma-direct in any way.

On the other hand, there is a big amount of MMIO devices out there that can
only handle 32-bit addressing. Be it PCI cards or actual IP cores. To make
things worse, this limitation is often expressed in the driver, not FW (with
dma_set_mask() and friends). If those devices aren't behind an IOMMU we have
to be able to provide at least 32-bit addressable memory. See this comment
from dma_direct_supported():

	/*
	 * Because 32-bit DMA masks are so common we expect every architecture
	 * to be able to satisfy them - either by not supporting more physical
	 * memory, or by providing a ZONE_DMA32. If neither is the case, the
	 * architecture needs to use an IOMMU instead of the direct mapping.
	 */

I think, for the common case, we're stuck with at least one zone spanning the
32-bit address space.

> >  	arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
> >  	max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
> >  #endif
>
> I was talking earlier to Ard and Robin on the ZONE_DMA32 history and the
> need for max_zone_phys(). This was rather theoretical, the Seattle
> platform has all RAM starting above 4GB and that led to an empty
> ZONE_DMA32 originally. The max_zone_phys() hack was meant to lift
> ZONE_DMA32 into the bottom of the RAM, on the assumption that such
> 32-bit devices would have a DMA offset hardwired. We are not aware of
> any such case on arm64 systems and even on Seattle, IIUC 32-bit devices
> only work if they are behind an SMMU (so no hardwired offset).
>
> In hindsight, it would have made more sense on platforms with RAM above
> 4GB to expand ZONE_DMA32 to cover the whole memory (so empty
> ZONE_NORMAL). Something like:
>
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index a53c1e0fb017..7d5e3dd85617 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -187,8 +187,12 @@ static void __init reserve_elfcorehdr(void)
>   */
>  static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
>  {
> -	phys_addr_t offset = memblock_start_of_DRAM() & GENMASK_ULL(63, zone_bits);
> -	return min(offset + (1ULL << zone_bits), memblock_end_of_DRAM());
> +	phys_addr_t zone_mask = 1ULL << zone_bits;
> +
> +	if (!(memblock_start_of_DRAM() & zone_mask))
> +		zone_mask = PHYS_ADDR_MAX;
> +
> +	return min(zone_mask, memblock_end_of_DRAM());
>  }
>  
>  static void __init zone_sizes_init(unsigned long min, unsigned long max)
>
> I don't think this makes any difference for ZONE_DMA unless a
> broken DT or IORT reports the max CPU address below the start of DRAM.
>
> There's a minor issue if of_dma_get_max_cpu_address() matches
> memblock_end_of_DRAM() but they are not a power of 2. We'd be left with
> a bit of RAM at the end in ZONE_NORMAL due to ilog2 truncation.

I agree it makes no sense to create more than one zone when the beginning of
RAM is located above the 32-bit address space. I'm all for disregarding the
possibility of hardwired offsets. As a bonus, as we already discussed some time
ago, this is something that never played well with current dma-direct code[1].

Regards,
Nicolas

[1] https://lkml.org/lkml/2020/9/8/377
On Fri, Oct 23, 2020 at 05:27:49PM +0200, Nicolas Saenz Julienne wrote:
> On Thu, 2020-10-22 at 19:06 +0100, Catalin Marinas wrote:
> > On Wed, Oct 21, 2020 at 02:34:35PM +0200, Nicolas Saenz Julienne wrote:
> > > @@ -188,9 +186,11 @@ static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
> > >  static void __init zone_sizes_init(unsigned long min, unsigned long max)
> > >  {
> > >  	unsigned long max_zone_pfns[MAX_NR_ZONES] = {0};
> > > +	unsigned int __maybe_unused dt_zone_dma_bits;
> > >  
> > >  #ifdef CONFIG_ZONE_DMA
> > > -	zone_dma_bits = ARM64_ZONE_DMA_BITS;
> > > +	dt_zone_dma_bits = ilog2(of_dma_get_max_cpu_address(NULL));
> > > +	zone_dma_bits = min(32U, dt_zone_dma_bits);
> >
> > A thought: can we remove the min here and expand ZONE_DMA to whatever
> > dt_zone_dma_bits says? More on this below.
>
> On most platforms we'd get PHYS_ADDR_MAX, or something bigger than the actual
> amount of RAM. Which would ultimately create a system wide ZONE_DMA. At first
> sight, I don't see it breaking dma-direct in any way.
>
> On the other hand, there is a big amount of MMIO devices out there that can
> only handle 32-bit addressing. Be it PCI cards or actual IP cores. To make
> things worse, this limitation is often expressed in the driver, not FW (with
> dma_set_mask() and friends). If those devices aren't behind an IOMMU we have
> to be able to provide at least 32-bit addressable memory. See this comment
> from dma_direct_supported():
>
> 	/*
> 	 * Because 32-bit DMA masks are so common we expect every architecture
> 	 * to be able to satisfy them - either by not supporting more physical
> 	 * memory, or by providing a ZONE_DMA32. If neither is the case, the
> 	 * architecture needs to use an IOMMU instead of the direct mapping.
> 	 */
>
> I think, for the common case, we're stuck with at least one zone spanning the
> 32-bit address space.

You are right, I guess it makes sense to keep a 32-bit zone as not all
devices would be described as such.

> > >  	arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
> > >  	max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
> > >  #endif
> >
> > I was talking earlier to Ard and Robin on the ZONE_DMA32 history and the
> > need for max_zone_phys(). This was rather theoretical, the Seattle
> > platform has all RAM starting above 4GB and that led to an empty
> > ZONE_DMA32 originally. The max_zone_phys() hack was meant to lift
> > ZONE_DMA32 into the bottom of the RAM, on the assumption that such
> > 32-bit devices would have a DMA offset hardwired. We are not aware of
> > any such case on arm64 systems and even on Seattle, IIUC 32-bit devices
> > only work if they are behind an SMMU (so no hardwired offset).
> >
> > In hindsight, it would have made more sense on platforms with RAM above
> > 4GB to expand ZONE_DMA32 to cover the whole memory (so empty
> > ZONE_NORMAL). Something like:
> >
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index a53c1e0fb017..7d5e3dd85617 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -187,8 +187,12 @@ static void __init reserve_elfcorehdr(void)
> >   */
> >  static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
> >  {
> > -	phys_addr_t offset = memblock_start_of_DRAM() & GENMASK_ULL(63, zone_bits);
> > -	return min(offset + (1ULL << zone_bits), memblock_end_of_DRAM());
> > +	phys_addr_t zone_mask = 1ULL << zone_bits;
> > +
> > +	if (!(memblock_start_of_DRAM() & zone_mask))
> > +		zone_mask = PHYS_ADDR_MAX;
> > +
> > +	return min(zone_mask, memblock_end_of_DRAM());
> >  }
> >  
> >  static void __init zone_sizes_init(unsigned long min, unsigned long max)
> >
> > I don't think this makes any difference for ZONE_DMA unless a
> > broken DT or IORT reports the max CPU address below the start of DRAM.
> >
> > There's a minor issue if of_dma_get_max_cpu_address() matches
> > memblock_end_of_DRAM() but they are not a power of 2. We'd be left with
> > a bit of RAM at the end in ZONE_NORMAL due to ilog2 truncation.
>
> I agree it makes no sense to create more than one zone when the beginning of
> RAM is located above the 32-bit address space. I'm all for disregarding the
> possibility of hardwired offsets. As a bonus, as we already discussed some time
> ago, this is something that never played well with current dma-direct code[1].
>
> [1] https://lkml.org/lkml/2020/9/8/377

Maybe this one is still worth fixing, at least for consistency. But it's
not urgent.

My diff above has a side-effect that if dt_zone_dma_bits is below the
start of DRAM, ZONE_DMA gets expanded to PHYS_ADDR_MAX. If this was
32-bit, that's fine but if it was, say, 30-bit because of some firmware
misdescription with RAM starting at 2GB, we end up with no ZONE_DMA32. I
think max_zone_phys() could cap this at 32, as a safety mechanism:

static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
{
	phys_addr_t zone_mask = (1ULL << zone_bits) - 1;
	phys_addr_t phys_start = memblock_start_of_DRAM();

	if (!(phys_start & U32_MAX))
		zone_mask = PHYS_ADDR_MAX;
	else if (!(phys_start & zone_mask))
		zone_mask = U32_MAX;

	return min(zone_mask + 1, memblock_end_of_DRAM());
}

Assuming I got the shifting right, arm64_dma_phys_limit becomes:

	arm64_dma_phys_limit = max_zone_phys(zone_dma_bits, 32);
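
The three cases described in the message above (DRAM entirely above 4GB, a DMA limit below the start of DRAM, and the normal case) can be tried out in user space. The sketch below is not the code as posted: model_max_zone_phys() and its dram_start/dram_end parameters are hypothetical stand-ins for max_zone_phys() and the memblock helpers. It also swaps the mask tests for plain comparisons and takes the minimum on inclusive addresses so that adding one cannot wrap. Both are assumptions about the intent, and zone_bits is assumed to be at most 32.

```c
#include <stdint.h>

#define MODEL_PHYS_ADDR_MAX	UINT64_MAX
#define MODEL_U32_MAX		0xffffffffULL

/*
 * Hypothetical user-space model. dram_start and dram_end (exclusive)
 * replace memblock_start_of_DRAM() and memblock_end_of_DRAM().
 * Returns the exclusive upper physical limit of the DMA zone.
 */
static uint64_t model_max_zone_phys(unsigned int zone_bits,
				    uint64_t dram_start, uint64_t dram_end)
{
	uint64_t zone_mask = (1ULL << zone_bits) - 1;
	uint64_t last = dram_end - 1;

	if (dram_start > MODEL_U32_MAX)
		/* All RAM sits above 4GB: let the zone cover everything. */
		zone_mask = MODEL_PHYS_ADDR_MAX;
	else if (dram_start > zone_mask)
		/* Limit lies below the start of DRAM: fall back to 32 bits. */
		zone_mask = MODEL_U32_MAX;

	/* Compare inclusive addresses first, then convert to exclusive. */
	return (zone_mask < last ? zone_mask : last) + 1;
}
```

With invented inputs: 30 bits and DRAM at 0..8GB gives a 1GB limit; 30 bits with DRAM at 2GB..6GB (the misdescription case) gives 4GB; 32 bits with DRAM starting at 512GB gives the end of DRAM.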
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 410721fc4fc0..94e38f99748b 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -42,8 +42,6 @@
 #include <asm/tlb.h>
 #include <asm/alternative.h>
 
-#define ARM64_ZONE_DMA_BITS	30
-
 /*
  * We need to be able to catch inadvertent references to memstart_addr
  * that occur (potentially in generic code) before arm64_memblock_init()
@@ -188,9 +186,11 @@ static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
 static void __init zone_sizes_init(unsigned long min, unsigned long max)
 {
 	unsigned long max_zone_pfns[MAX_NR_ZONES] = {0};
+	unsigned int __maybe_unused dt_zone_dma_bits;
 
 #ifdef CONFIG_ZONE_DMA
-	zone_dma_bits = ARM64_ZONE_DMA_BITS;
+	dt_zone_dma_bits = ilog2(of_dma_get_max_cpu_address(NULL));
+	zone_dma_bits = min(32U, dt_zone_dma_bits);
 	arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
 	max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
 #endif
We recently introduced a 1 GB sized ZONE_DMA to cater for platforms
incorporating masters that can address less than 32 bits of DMA, in
particular the Raspberry Pi 4, which has 4 or 8 GB of DRAM, but has
peripherals that can only address up to 1 GB (and its PCIe host
bridge can only access the bottom 3 GB).

The DMA layer also needs to be able to allocate memory that is
guaranteed to meet those DMA constraints, for bounce buffering as well
as allocating the backing for consistent mappings. This is why the 1 GB
ZONE_DMA was introduced recently. Unfortunately, it turns out that
having a 1 GB ZONE_DMA as well as a ZONE_DMA32 causes problems with
kdump, and potentially in other places where allocations cannot cross
zone boundaries. Therefore, we should avoid having two separate DMA
zones when possible.

So, with the help of of_dma_get_max_cpu_address(), get the topmost
physical address accessible to all DMA masters in the system and use
that information to fine-tune ZONE_DMA's size. In the absence of
addressing-limited masters, ZONE_DMA will span the whole 32-bit address
space. Otherwise, as in the case of the Raspberry Pi 4, it will only
span the 30-bit address space, and ZONE_DMA32 will cover the rest of
the 32-bit address space.

Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
---
Changes since v3:
 - Simplify code for readability.

Changes since v2:
 - Updated commit log by shamelessly copying Ard's ACPI commit log

 arch/arm64/mm/init.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
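
The sizing rule from the commit message, zone_dma_bits = min(32U, ilog2(of_dma_get_max_cpu_address(NULL))), can be modelled in user space. In this sketch ilog2_u64() and model_zone_dma_bits() are hypothetical helpers, and the input addresses are invented values rather than what any particular device tree reports.

```c
#include <stdint.h>

/*
 * User-space stand-in for the kernel's ilog2(): index of the highest set
 * bit. Returns 0 for an input of 0, a case the kernel helper leaves
 * undefined.
 */
static unsigned int ilog2_u64(uint64_t v)
{
	unsigned int bits = 0;

	while ((v >>= 1) != 0)
		bits++;
	return bits;
}

/*
 * Hypothetical model of the patch's rule: take the bit width implied by
 * the topmost address all DMA masters can reach, capped at 32 bits.
 */
static unsigned int model_zone_dma_bits(uint64_t max_cpu_addr)
{
	unsigned int dt_zone_dma_bits = ilog2_u64(max_cpu_addr);

	return dt_zone_dma_bits < 32U ? dt_zone_dma_bits : 32U;
}
```

An unrestricted system, modelled here as UINT64_MAX, gives 32 bits, so ZONE_DMA spans the whole 32-bit address space. A hypothetical 1 GB limit (1ULL << 30) gives 30 bits, matching the Raspberry Pi 4 case in the commit message.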