
arm64: limit MAX_PHYSMEM_BITS based on vmemmap

Message ID 20240703210707.1986816-1-scott@os.amperecomputing.com (mailing list archive)
State New, archived

Commit Message

D Scott Phillips July 3, 2024, 9:07 p.m. UTC
Prior to the memory map adjustments in v6.9-rc1, the amdgpu driver could
trip the warning:

`WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));`

in vmemmap_populate()[1]. After the adjustments, it becomes a translation
fault and panic.

The cause is that the amdgpu driver allocates some unused space from
iomem_resource, claims it as MEMORY_DEVICE_PRIVATE, and passes it to
devm_memremap_pages(). An address above the range backed by the vmemmap
is used.

Adjust MAX_PHYSMEM_BITS so that addresses not backed by the vmemmap will
not be chosen as device private addresses.

[1]: Call trace:
      vmemmap_populate+0x30/0x48
      __populate_section_memmap+0x40/0x90
      sparse_add_section+0xfc/0x3e8
      __add_pages+0xb4/0x168
      pagemap_range+0x300/0x410
      memremap_pages+0x184/0x2d8
      devm_memremap_pages+0x30/0x90
      kgd2kfd_init_zone_device+0xe0/0x1f0 [amdgpu]
      amdgpu_device_ip_init+0x674/0x888 [amdgpu]
      amdgpu_device_init+0x7bc/0xed8 [amdgpu]
      amdgpu_driver_load_kms+0x28/0x1c0 [amdgpu]
      amdgpu_pci_probe+0x194/0x580 [amdgpu]
      local_pci_probe+0x48/0xb8
      work_for_cpu_fn+0x24/0x40
      process_one_work+0x170/0x3e0
      worker_thread+0x2ac/0x3e0
      kthread+0xf4/0x108
      ret_from_fork+0x10/0x20

Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com>
---
 arch/arm64/include/asm/sparsemem.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Anshuman Khandual July 4, 2024, 3:12 a.m. UTC | #1
Hello Scott,

On 7/4/24 02:37, D Scott Phillips wrote:
> Prior to the memory map adjustments in v6.9-rc1, the amdgpu driver could
> trip over the warning of:
> 
> `WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));`

Could you please provide the mainline commit ID for the mentioned memory
adjustment changes.

> 
> in vmemmap_populate()[1]. After the adjustments, it becomes a translation
> fault and panic.

Probably caused by accessing memory which does not exist at that address.

> 
> The cause is that the amdgpu driver allocates some unused space from
> iomem_resource and claims it as MEMORY_DEVICE_PRIVATE and
> devm_memremap_pages() it. An address above those backed by the vmemmap is
> used.
> 
> Adjust MAX_PHYSMEM_BITS so that addresses not backed by the vmemmap will
> not be chosen as device private addresses.
> 
> [1]: Call trace:
>       vmemmap_populate+0x30/0x48
>       __populate_section_memmap+0x40/0x90
>       sparse_add_section+0xfc/0x3e8
>       __add_pages+0xb4/0x168
>       pagemap_range+0x300/0x410
>       memremap_pages+0x184/0x2d8
>       devm_memremap_pages+0x30/0x90
>       kgd2kfd_init_zone_device+0xe0/0x1f0 [amdgpu]
>       amdgpu_device_ip_init+0x674/0x888 [amdgpu]
>       amdgpu_device_init+0x7bc/0xed8 [amdgpu]
>       amdgpu_driver_load_kms+0x28/0x1c0 [amdgpu]
>       amdgpu_pci_probe+0x194/0x580 [amdgpu]
>       local_pci_probe+0x48/0xb8
>       work_for_cpu_fn+0x24/0x40
>       process_one_work+0x170/0x3e0
>       worker_thread+0x2ac/0x3e0
>       kthread+0xf4/0x108
>       ret_from_fork+0x10/0x20
> 
> Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com>
> ---
>  arch/arm64/include/asm/sparsemem.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/sparsemem.h b/arch/arm64/include/asm/sparsemem.h
> index 8a8acc220371c..8387301f2e206 100644
> --- a/arch/arm64/include/asm/sparsemem.h
> +++ b/arch/arm64/include/asm/sparsemem.h
> @@ -5,7 +5,7 @@
>  #ifndef __ASM_SPARSEMEM_H
>  #define __ASM_SPARSEMEM_H
>  
> -#define MAX_PHYSMEM_BITS	CONFIG_ARM64_PA_BITS
> +#define MAX_PHYSMEM_BITS	ilog2(VMEMMAP_RANGE)

Just wondering if there is another method which avoids selecting physical
memory ranges not backed by vmemmap. Also, will reducing MAX_PHYSMEM_BITS
below ARM64_PA_BITS have other side effects? Do other platforms have this
exact same correlation between MAX_PHYSMEM_BITS and the vmemmap range?

>  
>  /*
>   * Section size must be at least 512MB for 64K base
Catalin Marinas July 4, 2024, 2:08 p.m. UTC | #2
On Thu, Jul 04, 2024 at 08:42:52AM +0530, Anshuman Khandual wrote:
> On 7/4/24 02:37, D Scott Phillips wrote:
> > diff --git a/arch/arm64/include/asm/sparsemem.h b/arch/arm64/include/asm/sparsemem.h
> > index 8a8acc220371c..8387301f2e206 100644
> > --- a/arch/arm64/include/asm/sparsemem.h
> > +++ b/arch/arm64/include/asm/sparsemem.h
> > @@ -5,7 +5,7 @@
> >  #ifndef __ASM_SPARSEMEM_H
> >  #define __ASM_SPARSEMEM_H
> >  
> > -#define MAX_PHYSMEM_BITS	CONFIG_ARM64_PA_BITS
> > +#define MAX_PHYSMEM_BITS	ilog2(VMEMMAP_RANGE)
> 
> Just wondering if there is another method, which avoids selecting physical
> memory ranges not backed with vmemmap. Also will reducing MAX_PHYSMEM_BITS
> below ARM64_PA_BITS have other side effects ? Do other platforms have this
> exact same co-relation between MAX_PHYSMEM_BITS and vmemmap range ?

That's indeed a pretty weird workaround. MAX_PHYSMEM_BITS, as the name
implies, is about the physical bits supported for memory while
VMEMMAP_RANGE tells us the virtual address range. There is a correlation
between them but they are different things conceptually.

The memory hotplug code uses arch_get_mappable_range(). This should be
called from the amdgpu code rather than changing MAX_PHYSMEM_BITS.
D Scott Phillips July 8, 2024, 3:52 a.m. UTC | #3
Anshuman Khandual <anshuman.khandual@arm.com> writes:

> Hello Scott,
>
> On 7/4/24 02:37, D Scott Phillips wrote:
> > Prior to the memory map adjustments in v6.9-rc1, the amdgpu driver could
> > trip over the warning of:
> > 
> > `WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));`
>
> Could you please provide the mainline commit ID for the mentioned memory
> adjustment changes.

Yes, it's commit 32697ff38287 ("arm64: vmemmap: Avoid base2 order of
struct page size to dimension region")

> > 
> > in vmemmap_populate()[1]. After the adjustments, it becomes a translation
> > fault and panic.
>
> Probably caused by accessing memory which does not exist on that address.

It seems the changed VMEMMAP_START causes the out-of-range address to
wrap around into the low virtual address space.

| [   22.177085] Unable to handle kernel paging request at virtual address 000001ffa6000034
| [   22.186641] Mem abort info:
| [   22.192201]   ESR = 0x0000000096000044
| [   22.198720]   EC = 0x25: DABT (current EL), IL = 32 bits
| [   22.206798]   SET = 0, FnV = 0
| [   22.212624]   EA = 0, S1PTW = 0
| [   22.218612]   FSC = 0x04: level 0 translation fault
| [   22.226343] Data abort info:
| [   22.231992]   ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000
| [   22.240245]   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
| [   22.248064]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
| [   22.256145] user pgtable: 4k pages, 48-bit VAs, pgdp=000008000287c000
| [   22.265357] [000001ffa6000034] pgd=0000000000000000, p4d=0000000000000000
| [   22.265362] Internal error: Oops: 0000000096000044 [#1] SMP
| [   22.265365] Modules linked in: amdgpu(+) drm_ttm_helper ttm drm_exec drm_suballoc_helper amdxcp drm_buddy gpu_sched drm_display_helper cec
| [   22.292879] CPU: 0 PID: 564 Comm: kworker/0:2 Tainted: G        W          6.8.0-rc3+ #2
| [   22.300957] Hardware name: ADLINK Ampere Altra Developer Platform/Ampere Altra Developer Platform, BIOS TianoCore 2.04.100.11 (SYS: 2.06.20220308) 11/06/2
| [   22.314763] Workqueue: events work_for_cpu_fn
| [   22.319112] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| [   22.326062] pc : __init_zone_device_page.constprop.0+0x2c/0xa8
| [   22.331886] lr : memmap_init_zone_device+0xf0/0x210
| [   22.336751] sp : ffff80008359b9f0
| [   22.340053] x29: ffff80008359b9f0 x28: 0000000fffa00000 x27: ffff07ffe3e3a7c8
| [   22.347177] x26: 000001ffa6000000 x25: 0000000000000001 x24: fffffdffc0000000
| [   22.354300] x23: ffffcb2dc0b21dd0 x22: 0000001000000000 x21: 0000000000000000
| [   22.361424] x20: 0000000000000000 x19: 0000000000000001 x18: 0000000000001fb8
| [   22.368547] x17: ffffcb2d80f67db4 x16: ffffcb2dc00f7090 x15: ffffcb2d80f5cc50
| [   22.375671] x14: ffffcb2d80f6aff8 x13: ffffcb2dbea78a4c x12: ffffcb2dbea6d82c
| [   22.382794] x11: ffffcb2dc1661008 x10: ffffcb2dbea78a4c x9 : ffffcb2dc00eef40
| [   22.389918] x8 : ffff081f6fdcd2c0 x7 : 00000000ffffffff x6 : 0000ffff00004000
| [   22.397041] x5 : 0000000000000001 x4 : 000001ffa6000008 x3 : ffff07ffe3e3a7c8
| [   22.404164] x2 : 0000000000000000 x1 : 0000000fffa00000 x0 : 000001ffa6000000
| [   22.411287] Call trace:
| [   22.413721]  __init_zone_device_page.constprop.0+0x2c/0xa8
| [   22.419194]  memmap_init_zone_device+0xf0/0x210
| [   22.423713]  pagemap_range+0x1e0/0x410
| [   22.427452]  memremap_pages+0x18c/0x2e0
| [   22.431275]  devm_memremap_pages+0x30/0x90
| [   22.435360]  kgd2kfd_init_zone_device+0xf0/0x200 [amdgpu]
| [   22.441301]  amdgpu_device_ip_init+0x674/0x888 [amdgpu]
| [   22.446989]  amdgpu_device_init+0x7a4/0xea0 [amdgpu]
| [   22.452415]  amdgpu_driver_load_kms+0x28/0x1c0 [amdgpu]
| [   22.458101]  amdgpu_pci_probe+0x1a0/0x560 [amdgpu]
| [   22.463353]  local_pci_probe+0x48/0xb8
| [   22.467093]  work_for_cpu_fn+0x24/0x40
| [   22.470831]  process_one_work+0x170/0x3e0
| [   22.474829]  worker_thread+0x2ac/0x3e0
| [   22.478567]  kthread+0xf4/0x108
| [   22.481696]  ret_from_fork+0x10/0x20
| [   22.485261] Code: d2880006 a90153f3 91002004 f2dfffe6 (b9003405) 
| [   22.491342] ---[ end trace 0000000000000000 ]---

> > 
> > The cause is that the amdgpu driver allocates some unused space from
> > iomem_resource and claims it as MEMORY_DEVICE_PRIVATE and
> > devm_memremap_pages() it. An address above those backed by the vmemmap is
> > used.
> > 
> > Adjust MAX_PHYSMEM_BITS so that addresses not backed by the vmemmap will
> > not be chosen as device private addresses.
> > 
> > [1]: Call trace:
> >       vmemmap_populate+0x30/0x48
> >       __populate_section_memmap+0x40/0x90
> >       sparse_add_section+0xfc/0x3e8
> >       __add_pages+0xb4/0x168
> >       pagemap_range+0x300/0x410
> >       memremap_pages+0x184/0x2d8
> >       devm_memremap_pages+0x30/0x90
> >       kgd2kfd_init_zone_device+0xe0/0x1f0 [amdgpu]
> >       amdgpu_device_ip_init+0x674/0x888 [amdgpu]
> >       amdgpu_device_init+0x7bc/0xed8 [amdgpu]
> >       amdgpu_driver_load_kms+0x28/0x1c0 [amdgpu]
> >       amdgpu_pci_probe+0x194/0x580 [amdgpu]
> >       local_pci_probe+0x48/0xb8
> >       work_for_cpu_fn+0x24/0x40
> >       process_one_work+0x170/0x3e0
> >       worker_thread+0x2ac/0x3e0
> >       kthread+0xf4/0x108
> >       ret_from_fork+0x10/0x20
> > 
> > Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com>
> > ---
> >  arch/arm64/include/asm/sparsemem.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm64/include/asm/sparsemem.h b/arch/arm64/include/asm/sparsemem.h
> > index 8a8acc220371c..8387301f2e206 100644
> > --- a/arch/arm64/include/asm/sparsemem.h
> > +++ b/arch/arm64/include/asm/sparsemem.h
> > @@ -5,7 +5,7 @@
> >  #ifndef __ASM_SPARSEMEM_H
> >  #define __ASM_SPARSEMEM_H
> >  
> > -#define MAX_PHYSMEM_BITS	CONFIG_ARM64_PA_BITS
> > +#define MAX_PHYSMEM_BITS	ilog2(VMEMMAP_RANGE)
>
> Just wondering if there is another method, which avoids selecting physical
> memory ranges not backed with vmemmap. Also will reducing MAX_PHYSMEM_BITS
> below ARM64_PA_BITS have other side effects ? Do other platforms have this
> exact same co-relation between MAX_PHYSMEM_BITS and vmemmap range ?

I'm not that familiar with other architectures, but I assume that high
addresses up to MAX_PHYSMEM_BITS work properly for HMM there, since the
code that allocates a free range works from high addresses, descending.
D Scott Phillips July 8, 2024, 3:52 a.m. UTC | #4
Catalin Marinas <catalin.marinas@arm.com> writes:

> On Thu, Jul 04, 2024 at 08:42:52AM +0530, Anshuman Khandual wrote:
> > On 7/4/24 02:37, D Scott Phillips wrote:
> > > diff --git a/arch/arm64/include/asm/sparsemem.h b/arch/arm64/include/asm/sparsemem.h
> > > index 8a8acc220371c..8387301f2e206 100644
> > > --- a/arch/arm64/include/asm/sparsemem.h
> > > +++ b/arch/arm64/include/asm/sparsemem.h
> > > @@ -5,7 +5,7 @@
> > >  #ifndef __ASM_SPARSEMEM_H
> > >  #define __ASM_SPARSEMEM_H
> > >  
> > > -#define MAX_PHYSMEM_BITS	CONFIG_ARM64_PA_BITS
> > > +#define MAX_PHYSMEM_BITS	ilog2(VMEMMAP_RANGE)
> > 
> > Just wondering if there is another method, which avoids selecting
> > physical memory ranges not backed with vmemmap. Also will reducing
> > MAX_PHYSMEM_BITS below ARM64_PA_BITS have other side effects ? Do
> > other platforms have this exact same co-relation between
> > MAX_PHYSMEM_BITS and vmemmap range ?
>
> That's indeed a pretty weird workaround. MAX_PHYSMEM_BITS, as the name
> implies, is about the physical bits supported for memory while
> VMEMMAP_RANGE tells us the virtual address range. There is a
> correlation between them but they are different things conceptually.
>
> The memory hotplug code uses arch_get_mappable_range(). This should be
> called from the amdgpu code rather than changing MAX_PHYSMEM_BITS.

OK, thanks I'll pursue that approach.

Patch

diff --git a/arch/arm64/include/asm/sparsemem.h b/arch/arm64/include/asm/sparsemem.h
index 8a8acc220371c..8387301f2e206 100644
--- a/arch/arm64/include/asm/sparsemem.h
+++ b/arch/arm64/include/asm/sparsemem.h
@@ -5,7 +5,7 @@ 
 #ifndef __ASM_SPARSEMEM_H
 #define __ASM_SPARSEMEM_H
 
-#define MAX_PHYSMEM_BITS	CONFIG_ARM64_PA_BITS
+#define MAX_PHYSMEM_BITS	ilog2(VMEMMAP_RANGE)
 
 /*
  * Section size must be at least 512MB for 64K base