Message ID: 20241125171650.77424-1-yang@os.amperecomputing.com (mailing list archive)
State: New, archived
Series: arm64: mm: fix zone_dma_limit calculation
Hi Yang,

On Mon, Nov 25 2024, Yang Shi wrote:
> The commit ba0fb44aed47 ("dma-mapping: replace zone_dma_bits by
> zone_dma_limit") changed how zone_dma_limit was calculated. Now it
> returns the memsize limit in IORT or device tree instead of U32_MAX if
> the memsize limit is greater than U32_MAX.

Can you give a concrete example of memory layout and dma-ranges that
demonstrates this issue?

> This resulted in DMA allocations may use GFP_DMA even though the devices
> don't require it. It caused regression on our two sockets systems due
> to excessive remote memory access.

That is, DMA zone used to cover all memory before commit ba0fb44aed47,
but now DMA zone is limited to the smallest dma-ranges. Is that correct?

Thanks,
baruch

> Fixes: ba0fb44aed47 ("dma-mapping: replace zone_dma_bits by zone_dma_limit")
> Cc: <stable@vger.kernel.org> [6.12+]
> Reported-by: Yutang Jiang <jiangyutang@os.amperecomputing.com>
> Tested-by: Yutang Jiang <jiangyutang@os.amperecomputing.com>
> Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
> ---
>  arch/arm64/mm/init.c | 17 ++++++++---------
>  1 file changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index d21f67d67cf5..ccdef53872a0 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -117,15 +117,6 @@ static void __init arch_reserve_crashkernel(void)
>
>  static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
>  {
> -	/**
> -	 * Information we get from firmware (e.g. DT dma-ranges) describe DMA
> -	 * bus constraints. Devices using DMA might have their own limitations.
> -	 * Some of them rely on DMA zone in low 32-bit memory. Keep low RAM
> -	 * DMA zone on platforms that have RAM there.
> -	 */
> -	if (memblock_start_of_DRAM() < U32_MAX)
> -		zone_limit = min(zone_limit, U32_MAX);
> -
>  	return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
>  }
>
> @@ -141,6 +132,14 @@ static void __init zone_sizes_init(void)
>  	acpi_zone_dma_limit = acpi_iort_dma_get_max_cpu_address();
>  	dt_zone_dma_limit = of_dma_get_max_cpu_address(NULL);
>  	zone_dma_limit = min(dt_zone_dma_limit, acpi_zone_dma_limit);
> +	/*
> +	 * Information we get from firmware (e.g. DT dma-ranges) describe DMA
> +	 * bus constraints. Devices using DMA might have their own limitations.
> +	 * Some of them rely on DMA zone in low 32-bit memory. Keep low RAM
> +	 * DMA zone on platforms that have RAM there.
> +	 */
> +	if (memblock_start_of_DRAM() < U32_MAX)
> +		zone_dma_limit = min(zone_dma_limit, U32_MAX);
>  	arm64_dma_phys_limit = max_zone_phys(zone_dma_limit);
>  	max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
>  #endif
On 11/25/24 10:27 PM, Baruch Siach wrote:
> Hi Yang,
>
> On Mon, Nov 25 2024, Yang Shi wrote:
>> The commit ba0fb44aed47 ("dma-mapping: replace zone_dma_bits by
>> zone_dma_limit") changed how zone_dma_limit was calculated. Now it
>> returns the memsize limit in IORT or device tree instead of U32_MAX if
>> the memsize limit is greater than U32_MAX.
> Can you give a concrete example of memory layout and dma-ranges that
> demonstrates this issue?

Our two-socket system has physical memory starting at 0x0 on node 0 and
0x200000000000 on node 1. The memory size limit defined in IORT is 0x30
(48 bits).

The DMA zone is:

  pages free     887722
        boost    0
        min      229
        low      1108
        high     1987
        promo    2866
        spanned  983040
        present  982034
        managed  903238
        cma      16384
        protection: (0, 0, 124824, 0, 0)
  start_pfn:     65536

When allocating a DMA buffer, dma_direct_optimal_gfp_mask() is called to
determine the proper zone constraints. If the phys_limit is less than
zone_dma_limit, it will use GFP_DMA. But zone_dma_limit is
0xffffffffffff on v6.12 instead of 4G prior to v6.12, which means all
DMA buffer allocations go to the DMA zone even though the devices don't
require it.

The DMA zone is on node 0, so we saw excessive remote access on the
two-socket system.

>> This resulted in DMA allocations may use GFP_DMA even though the devices
>> don't require it. It caused regression on our two sockets systems due
>> to excessive remote memory access.
> That is, DMA zone used to cover all memory before commit ba0fb44aed47,
> but now DMA zone is limited to the smallest dma-ranges. Is that correct?

The physical address range for the DMA zone is correct; the problem is
the wrong zone_dma_limit. Before commit ba0fb44aed47, zone_dma_limit was
4G; after it, it is the whole memory even though the DMA zone just
covers the low 4G.

Thanks,
Yang

> Thanks,
> baruch
>
>> Fixes: ba0fb44aed47 ("dma-mapping: replace zone_dma_bits by zone_dma_limit")
>> Cc: <stable@vger.kernel.org> [6.12+]
>> Reported-by: Yutang Jiang <jiangyutang@os.amperecomputing.com>
>> Tested-by: Yutang Jiang <jiangyutang@os.amperecomputing.com>
>> Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
>> ---
>>  arch/arm64/mm/init.c | 17 ++++++++---------
>>  1 file changed, 8 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index d21f67d67cf5..ccdef53872a0 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -117,15 +117,6 @@ static void __init arch_reserve_crashkernel(void)
>>
>>  static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
>>  {
>> -	/**
>> -	 * Information we get from firmware (e.g. DT dma-ranges) describe DMA
>> -	 * bus constraints. Devices using DMA might have their own limitations.
>> -	 * Some of them rely on DMA zone in low 32-bit memory. Keep low RAM
>> -	 * DMA zone on platforms that have RAM there.
>> -	 */
>> -	if (memblock_start_of_DRAM() < U32_MAX)
>> -		zone_limit = min(zone_limit, U32_MAX);
>> -
>>  	return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
>>  }
>>
>> @@ -141,6 +132,14 @@ static void __init zone_sizes_init(void)
>>  	acpi_zone_dma_limit = acpi_iort_dma_get_max_cpu_address();
>>  	dt_zone_dma_limit = of_dma_get_max_cpu_address(NULL);
>>  	zone_dma_limit = min(dt_zone_dma_limit, acpi_zone_dma_limit);
>> +	/*
>> +	 * Information we get from firmware (e.g. DT dma-ranges) describe DMA
>> +	 * bus constraints. Devices using DMA might have their own limitations.
>> +	 * Some of them rely on DMA zone in low 32-bit memory. Keep low RAM
>> +	 * DMA zone on platforms that have RAM there.
>> +	 */
>> +	if (memblock_start_of_DRAM() < U32_MAX)
>> +		zone_dma_limit = min(zone_dma_limit, U32_MAX);
>>  	arm64_dma_phys_limit = max_zone_phys(zone_dma_limit);
>>  	max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
>>  #endif
+ Robin

On Tue, Nov 26, 2024 at 09:38:22AM -0800, Yang Shi wrote:
> On 11/25/24 10:27 PM, Baruch Siach wrote:
> > On Mon, Nov 25 2024, Yang Shi wrote:
> > > The commit ba0fb44aed47 ("dma-mapping: replace zone_dma_bits by
> > > zone_dma_limit") changed how zone_dma_limit was calculated. Now it
> > > returns the memsize limit in IORT or device tree instead of U32_MAX if
> > > the memsize limit is greater than U32_MAX.
> >
> > Can you give a concrete example of memory layout and dma-ranges that
> > demonstrates this issue?
>
> Our 2 sockets system has physical memory starts at 0x0 on node 0 and
> 0x200000000000 on node 1. The memory size limit defined in IORT is 0x30 (48
> bits).
>
> The DMA zone is:
>
> pages free     887722
>       boost    0
>       min      229
>       low      1108
>       high     1987
>       promo    2866
>       spanned  983040
>       present  982034
>       managed  903238
>       cma      16384
>       protection: (0, 0, 124824, 0, 0)
> start_pfn:     65536
>
> When allocating DMA buffer, dma_direct_optimal_gfp_mask() is called to
> determine the proper zone constraints. If the phys_limit is less than
> zone_dma_limit, it will use GFP_DMA. But zone_dma_limit is 0xffffffffffff on
> v6.12 instead of 4G prior v6.12, it means all DMA buffer allocation will go
> to DMA zone even though the devices don't require it.
>
> DMA zone is on node 0, so we saw excessive remote access on 2 sockets
> system.
[...]
> The physical addr range for DMA zone is correct, the problem is wrong
> zone_dma_limit. Before commit ba0fb44aed47 zone_dma_limit was 4G, after it
> it is the whole memory even though DMA zone just covers low 4G.

Thanks for the details. I agree that zone_dma_limit shouldn't be higher
than the ZONE_DMA upper boundary, otherwise it gets confusing for
functions like dma_direct_optimal_gfp_mask() and we may force
allocations to a specific range unnecessarily.

If IORT or DT indicate a large mask covering the whole RAM (i.e. no
restrictions), in an ideal world, we should normally extend ZONE_DMA to
the same.

One problem is ZONE_DMA32 (and GFP_DMA32) and the fact that ZONE_DMA
sits below it. Until we hear otherwise, we assume a DMA offset of 0 for
such 32-bit devices and therefore define ZONE_DMA32 in the lower 4GB if
RAM starts below this limit (and an empty ZONE_DMA32 if RAM starts
above).

Another aspect to consider is that we don't always have DT or IORT
information or some devices need a smaller limit than what's advertised
in the firmware tables (typically 32-bit masks). This code went through
a couple of fixes already:

833bd284a454 ("arm64: mm: fix DMA zone when dma-ranges is missing")
122c234ef4e1 ("arm64: mm: keep low RAM dma zone")

Since our current assumption is to assume ZONE_DMA within 32-bit if RAM
below 4GB, I'm happy to make this conditional on CONFIG_ZONE_DMA32 also
being enabled. So, from your patch below:

> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index d21f67d67cf5..ccdef53872a0 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -117,15 +117,6 @@ static void __init arch_reserve_crashkernel(void)
>
>  static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
>  {
> -	/**
> -	 * Information we get from firmware (e.g. DT dma-ranges) describe DMA
> -	 * bus constraints. Devices using DMA might have their own limitations.
> -	 * Some of them rely on DMA zone in low 32-bit memory. Keep low RAM
> -	 * DMA zone on platforms that have RAM there.
> -	 */
> -	if (memblock_start_of_DRAM() < U32_MAX)
> -		zone_limit = min(zone_limit, U32_MAX);
> -
>  	return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
>  }

This part is fine.

> @@ -141,6 +132,14 @@ static void __init zone_sizes_init(void)
>  	acpi_zone_dma_limit = acpi_iort_dma_get_max_cpu_address();
>  	dt_zone_dma_limit = of_dma_get_max_cpu_address(NULL);
>  	zone_dma_limit = min(dt_zone_dma_limit, acpi_zone_dma_limit);
> +	/*
> +	 * Information we get from firmware (e.g. DT dma-ranges) describe DMA
> +	 * bus constraints. Devices using DMA might have their own limitations.
> +	 * Some of them rely on DMA zone in low 32-bit memory. Keep low RAM
> +	 * DMA zone on platforms that have RAM there.
> +	 */
> +	if (memblock_start_of_DRAM() < U32_MAX)
> +		zone_dma_limit = min(zone_dma_limit, U32_MAX);
>  	arm64_dma_phys_limit = max_zone_phys(zone_dma_limit);
>  	max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
>  #endif

But I'd move the zone_dma_limit update further down in the
CONFIG_ZONE_DMA32 block. I think we only need to limit it to
dma32_phys_limit and ignore the U32_MAX check. The former is already
capped to 32-bit. For the second hunk, something like below (untested):

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index d21f67d67cf5..ffaf5bd8d0a1 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -146,8 +146,10 @@ static void __init zone_sizes_init(void)
 #endif
 #ifdef CONFIG_ZONE_DMA32
 	max_zone_pfns[ZONE_DMA32] = PFN_DOWN(dma32_phys_limit);
-	if (!arm64_dma_phys_limit)
+	if (!arm64_dma_phys_limit || arm64_dma_phys_limit > dma32_phys_limit) {
 		arm64_dma_phys_limit = dma32_phys_limit;
+		zone_dma_limit = arm64_dma_phys_limit - 1;
+	}
 #endif
 	if (!arm64_dma_phys_limit)
 		arm64_dma_phys_limit = PHYS_MASK + 1;

With some comment on why we do this but most likely not the current
comment in max_zone_phys() (more like keep ZONE_DMA below ZONE_DMA32).
On 2024-11-27 5:49 pm, Catalin Marinas wrote:
> + Robin
>
> On Tue, Nov 26, 2024 at 09:38:22AM -0800, Yang Shi wrote:
>> On 11/25/24 10:27 PM, Baruch Siach wrote:
>>> On Mon, Nov 25 2024, Yang Shi wrote:
>>>> The commit ba0fb44aed47 ("dma-mapping: replace zone_dma_bits by
>>>> zone_dma_limit") changed how zone_dma_limit was calculated. Now it
>>>> returns the memsize limit in IORT or device tree instead of U32_MAX if
>>>> the memsize limit is greater than U32_MAX.
>>>
>>> Can you give a concrete example of memory layout and dma-ranges that
>>> demonstrates this issue?
>>
>> Our 2 sockets system has physical memory starts at 0x0 on node 0 and
>> 0x200000000000 on node 1. The memory size limit defined in IORT is 0x30 (48
>> bits).
>>
>> The DMA zone is:
>>
>> pages free     887722
>>       boost    0
>>       min      229
>>       low      1108
>>       high     1987
>>       promo    2866
>>       spanned  983040
>>       present  982034
>>       managed  903238
>>       cma      16384
>>       protection: (0, 0, 124824, 0, 0)
>> start_pfn:     65536
>>
>> When allocating DMA buffer, dma_direct_optimal_gfp_mask() is called to
>> determine the proper zone constraints. If the phys_limit is less than
>> zone_dma_limit, it will use GFP_DMA. But zone_dma_limit is 0xffffffffffff on
>> v6.12 instead of 4G prior v6.12, it means all DMA buffer allocation will go
>> to DMA zone even though the devices don't require it.
>>
>> DMA zone is on node 0, so we saw excessive remote access on 2 sockets
>> system.
> [...]
>> The physical addr range for DMA zone is correct, the problem is wrong
>> zone_dma_limit. Before commit ba0fb44aed47 zone_dma_limit was 4G, after it
>> it is the whole memory even though DMA zone just covers low 4G.
>
> Thanks for the details. I agree that zone_dma_limit shouldn't be higher
> than the ZONE_DMA upper boundary, otherwise it gets confusing for
> functions like dma_direct_optimal_gfp_mask() and we may force
> allocations to a specific range unnecessarily.

Oof, indeed the original min3() logic did also result in the
32-bit-clamped value being written back to zone_dma_bits itself, so
that's another subtlety overlooked in ba0fb44aed47 *and* the subsequent
fixes reinstating that clamping (conditionally) within max_zone_phys().

> If IORT or DT indicate a large mask covering the whole RAM (i.e. no
> restrictions), in an ideal world, we should normally extend ZONE_DMA to
> the same.

That's not right, ZONE_DMA should still be relatively limited in size
(unless we really do only have a tiny amount of RAM) - just because a DT
dma-ranges property says the system interconnect can carry >32 address
bits in general doesn't mean that individual device DMA masks can't
still be 32-bit or smaller. IIRC we're still implicitly assuming that if
DT does describe an offset range into "high" RAM, it must represent a
suitable lowest common denominator for all relevant devices already, and
therefore we can get away with sizing ZONE_DMA off it blindly. I'm yet
to hear of any platform having a general offset with no significant
upper limit, but if we did want to support such a thing then we would
need to revisit the previously-discussed notion of an accurate
of_dma_get_min_cpu_address() rather than the assumption based on
memblock_start_of_DRAM().

After staring at it for long enough, I think $SUBJECT patch is actually
correct as it is. In fact I'm now wondering why the fix was put inside
max_zone_phys() in the first place, since it's clearly pointless to
clamp DMA_BIT_MASK(32) to U32_MAX in the dma32_phys_limit case...
However the commit message is perhaps not as clear as it could be -
technically we are correctly *calculating* the appropriate effective
zone_dma_limit value within the scope of zone_sizes_init(), we're just
failing to properly update the actual zone_dma_limit variable for the
benefit of other users.

Thanks,
Robin.

> One problem is ZONE_DMA32 (and GFP_DMA32) and the fact that
> ZONE_DMA sits below it. Until we hear otherwise, we assume a DMA offset
> of 0 for such 32-bit devices and therefore define ZONE_DMA32 in the
> lower 4GB if RAM starts below this limit (and an empty ZONE_DMA32 if RAM
> starts above).
>
> Another aspect to consider is that we don't always have DT or IORT
> information or some devices need a smaller limit than what's advertised
> in the firmware tables (typically 32-bit masks). This code went through
> a couple of fixes already:
>
> 833bd284a454 ("arm64: mm: fix DMA zone when dma-ranges is missing")
> 122c234ef4e1 ("arm64: mm: keep low RAM dma zone")
>
> Since our current assumption is to assume ZONE_DMA within 32-bit if RAM
> below 4GB, I'm happy to make this conditional on CONFIG_ZONE_DMA32 also
> being enabled. So, from your patch below:
>
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index d21f67d67cf5..ccdef53872a0 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -117,15 +117,6 @@ static void __init arch_reserve_crashkernel(void)
>>
>>  static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
>>  {
>> -	/**
>> -	 * Information we get from firmware (e.g. DT dma-ranges) describe DMA
>> -	 * bus constraints. Devices using DMA might have their own limitations.
>> -	 * Some of them rely on DMA zone in low 32-bit memory. Keep low RAM
>> -	 * DMA zone on platforms that have RAM there.
>> -	 */
>> -	if (memblock_start_of_DRAM() < U32_MAX)
>> -		zone_limit = min(zone_limit, U32_MAX);
>> -
>>  	return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
>>  }
>
> This part is fine.
>
>> @@ -141,6 +132,14 @@ static void __init zone_sizes_init(void)
>>  	acpi_zone_dma_limit = acpi_iort_dma_get_max_cpu_address();
>>  	dt_zone_dma_limit = of_dma_get_max_cpu_address(NULL);
>>  	zone_dma_limit = min(dt_zone_dma_limit, acpi_zone_dma_limit);
>> +	/*
>> +	 * Information we get from firmware (e.g. DT dma-ranges) describe DMA
>> +	 * bus constraints. Devices using DMA might have their own limitations.
>> +	 * Some of them rely on DMA zone in low 32-bit memory. Keep low RAM
>> +	 * DMA zone on platforms that have RAM there.
>> +	 */
>> +	if (memblock_start_of_DRAM() < U32_MAX)
>> +		zone_dma_limit = min(zone_dma_limit, U32_MAX);
>>  	arm64_dma_phys_limit = max_zone_phys(zone_dma_limit);
>>  	max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
>>  #endif
>
> But I'd move the zone_dma_limit update further down in the
> CONFIG_ZONE_DMA32 block. I think we only need to limit it to
> dma32_phys_limit and ignore the U32_MAX check. The former is already
> capped to 32-bit. For the second hunk, something like below (untested):
>
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index d21f67d67cf5..ffaf5bd8d0a1 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -146,8 +146,10 @@ static void __init zone_sizes_init(void)
>  #endif
>  #ifdef CONFIG_ZONE_DMA32
>  	max_zone_pfns[ZONE_DMA32] = PFN_DOWN(dma32_phys_limit);
> -	if (!arm64_dma_phys_limit)
> +	if (!arm64_dma_phys_limit || arm64_dma_phys_limit > dma32_phys_limit) {
>  		arm64_dma_phys_limit = dma32_phys_limit;
> +		zone_dma_limit = arm64_dma_phys_limit - 1;
> +	}
>  #endif
>  	if (!arm64_dma_phys_limit)
>  		arm64_dma_phys_limit = PHYS_MASK + 1;
>
> With some comment on why we do this but most likely not the current
> comment in max_zone_phys() (more like keep ZONE_DMA below ZONE_DMA32).
On Fri, Nov 29, 2024 at 06:06:50PM +0000, Robin Murphy wrote:
> On 2024-11-27 5:49 pm, Catalin Marinas wrote:
> > If IORT or DT indicate a large mask covering the whole RAM (i.e. no
> > restrictions), in an ideal world, we should normally extend ZONE_DMA to
> > the same.
>
> That's not right, ZONE_DMA should still be relatively limited in size
> (unless we really do only have a tiny amount of RAM) - just because a DT
> dma-ranges property says the system interconnect can carry >32 address bits
> in general doesn't mean that individual device DMA masks can't still be
> 32-bit or smaller. IIRC we're still implicitly assuming that if DT does
> describe an offset range into "high" RAM, it must represent a suitable
> lowest common denominator for all relevant devices already, and therefore we
> can get away with sizing ZONE_DMA off it blindly.

Fine by me to keep ZONE_DMA in the low range always. I was thinking of
only doing this if ZONE_DMA32 is enabled.

> After staring at it for long enough, I think $SUBJECT patch is actually
> correct as it is.

Thanks Robin for having a look. Can I add your reviewed-by?

> In fact I'm now wondering why the fix was put inside
> max_zone_phys() in the first place, since it's clearly pointless to clamp
> DMA_BIT_MASK(32) to U32_MAX in the dma32_phys_limit case...

I came to the same conclusion. I think it might have been some left-over
from when we had a ZONE_DMA32 in the above 4GB (AMD Seattle?). Then we
changed it a few times but only focused on this function for setting the
limits.

> However the commit message is perhaps not as clear as it could be -
> technically we are correctly *calculating* the appropriate effective
> zone_dma_limit value within the scope of zone_sizes_init(), we're just
> failing to properly update the actual zone_dma_limit variable for the
> benefit of other users.

I'll have a look next week at rewriting the commit message, unless Yang
does it first. I'm planning to queue this patch for -rc2.
On 11/29/24 11:38 AM, Catalin Marinas wrote:
> On Fri, Nov 29, 2024 at 06:06:50PM +0000, Robin Murphy wrote:
>> On 2024-11-27 5:49 pm, Catalin Marinas wrote:
>>> If IORT or DT indicate a large mask covering the whole RAM (i.e. no
>>> restrictions), in an ideal world, we should normally extend ZONE_DMA to
>>> the same.
>> That's not right, ZONE_DMA should still be relatively limited in size
>> (unless we really do only have a tiny amount of RAM) - just because a DT
>> dma-ranges property says the system interconnect can carry >32 address bits
>> in general doesn't mean that individual device DMA masks can't still be
>> 32-bit or smaller. IIRC we're still implicitly assuming that if DT does
>> describe an offset range into "high" RAM, it must represent a suitable
>> lowest common denominator for all relevant devices already, and therefore we
>> can get away with sizing ZONE_DMA off it blindly.
> Fine by me to keep ZONE_DMA in the low range always. I was thinking of
> only doing this if ZONE_DMA32 is enabled.
>
>> After staring at it for long enough, I think $SUBJECT patch is actually
>> correct as it is.
> Thanks Robin for having a look. Can I add your reviewed-by?
>
>> In fact I'm now wondering why the fix was put inside
>> max_zone_phys() in the first place, since it's clearly pointless to clamp
>> DMA_BIT_MASK(32) to U32_MAX in the dma32_phys_limit case...
> I came to the same conclusion. I think it might have been some left-over
> from when we had a ZONE_DMA32 in the above 4GB (AMD Seattle?). Then we
> changed it a few times but only focused on this function for setting the
> limits.
>
>> However the commit message is perhaps not as clear as it could be -
>> technically we are correctly *calculating* the appropriate effective
>> zone_dma_limit value within the scope of zone_sizes_init(), we're just
>> failing to properly update the actual zone_dma_limit variable for the
>> benefit of other users.
> I'll have a look next week at rewriting the commit message, unless Yang
> does it first. I'm planning to queue this patch for -rc2.

Hi Catalin and Robin,

Thanks for moving this forward. How about the below commit message?

    We failed to properly update the actual zone_dma_limit variable. Now
    it is the memsize limit in IORT or device tree instead of U32_MAX if
    the memsize limit is greater than U32_MAX.

    The zone_dma_limit is used to determine whether GFP_DMA should be
    used or not when allocating DMA buffers. The wrong zone_dma_limit
    resulted in DMA allocations using GFP_DMA even though the devices
    don't require it, and then falling into the DMA zone on node 0. It
    caused a regression on our two-socket systems due to excessive
    remote memory access.
On Mon, 25 Nov 2024 09:16:50 -0800, Yang Shi wrote:
> The commit ba0fb44aed47 ("dma-mapping: replace zone_dma_bits by
> zone_dma_limit") changed how zone_dma_limit was calculated. Now it
> returns the memsize limit in IORT or device tree instead of U32_MAX if
> the memsize limit is greater than U32_MAX.
>
> This resulted in DMA allocations may use GFP_DMA even though the devices
> don't require it. It caused regression on our two sockets systems due
> to excessive remote memory access.
>
> [...]

Applied to arm64 (for-next/fixes), thanks!

[1/1] arm64: mm: fix zone_dma_limit calculation
      https://git.kernel.org/arm64/c/56a708742a8b
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index d21f67d67cf5..ccdef53872a0 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -117,15 +117,6 @@ static void __init arch_reserve_crashkernel(void)
 
 static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
 {
-	/**
-	 * Information we get from firmware (e.g. DT dma-ranges) describe DMA
-	 * bus constraints. Devices using DMA might have their own limitations.
-	 * Some of them rely on DMA zone in low 32-bit memory. Keep low RAM
-	 * DMA zone on platforms that have RAM there.
-	 */
-	if (memblock_start_of_DRAM() < U32_MAX)
-		zone_limit = min(zone_limit, U32_MAX);
-
 	return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
 }
 
@@ -141,6 +132,14 @@ static void __init zone_sizes_init(void)
 	acpi_zone_dma_limit = acpi_iort_dma_get_max_cpu_address();
 	dt_zone_dma_limit = of_dma_get_max_cpu_address(NULL);
 	zone_dma_limit = min(dt_zone_dma_limit, acpi_zone_dma_limit);
+	/*
+	 * Information we get from firmware (e.g. DT dma-ranges) describe DMA
+	 * bus constraints. Devices using DMA might have their own limitations.
+	 * Some of them rely on DMA zone in low 32-bit memory. Keep low RAM
+	 * DMA zone on platforms that have RAM there.
+	 */
+	if (memblock_start_of_DRAM() < U32_MAX)
+		zone_dma_limit = min(zone_dma_limit, U32_MAX);
 	arm64_dma_phys_limit = max_zone_phys(zone_dma_limit);
 	max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
 #endif