Message ID | f8e67f82-103d-156c-deb0-d6d6e2756f5e@microchip.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | RISC-V reserved memory problems | expand |
On 8/16/22 22:41, Conor.Dooley@microchip.com wrote: > Hey all, > We've run into a bit of a problem with reserved memory on PolarFire, or > more accurately a pair of problems that seem to have opposite fixes. > > The first of these problems is triggered when trying to implement a > remoteproc driver. To get the reserved memory buffer, remoteproc > does an of_reserved_mem_lookup(), something like: > > np = of_parse_phandle(pdev->of_node, "memory-region", 0); > if (!np) > return -EINVAL; > > rmem = of_reserved_mem_lookup(np); > if (!rmem) > return -EINVAL; > > of_reserved_mem_lookup() then uses reserved_mem[i].name to try and find > a match - but this was triggering kernel panics for us. We did some > debugging and found that the name string's pointer was pointing to an > address in the 0x4000_0000 range. The minimum reproduction for this > crash is attached - it hacks in some print_reserved_mem()s into > setup_vm_final() around a tlb flush so you can see the before/after. > (You'll need a reserved memory node in your dts to replicate) > > The output is like so, with the same crash as in the remoteproc driver: > > [ 0.000000] Linux version 6.0.0-rc1-00001-g0d9d6953d834 (conor@wendy) (riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Aug 16 13:42:09 IST 2022 > [ 0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000 > [ 0.000000] Machine model: Microchip PolarFire-SoC Icicle Kit > [ 0.000000] earlycon: ns16550a0 at MMIO32 0x0000000020100000 (options '115200n8') > [ 0.000000] printk: bootconsole [ns16550a0] enabled > [ 0.000000] printk: debug: skip boot console de-registration. > [ 0.000000] efi: UEFI not found. > [ 0.000000] before flush > [ 0.000000] OF: reserved mem: debug name is fabricbuf@ae000000 > [ 0.000000] after flush > [ 0.000000] Unable to handle kernel paging request at virtual address 00000000401c31ac > [ 0.000000] Oops [#1] > [ 0.000000] Modules linked in: > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.0.0-rc1-00001-g0d9d6953d834 #1 > [ 0.000000] Hardware name: Microchip PolarFire-SoC Icicle Kit (DT) > [ 0.000000] epc : string+0x4a/0xea > [ 0.000000] ra : vsnprintf+0x1e4/0x336 > [ 0.000000] epc : ffffffff80335ea0 ra : ffffffff80338936 sp : ffffffff81203be0 > [ 0.000000] gp : ffffffff812e0a98 tp : ffffffff8120de40 t0 : 0000000000000000 > [ 0.000000] t1 : ffffffff81203e28 t2 : 7265736572203a46 s0 : ffffffff81203c20 > [ 0.000000] s1 : ffffffff81203e28 a0 : ffffffff81203d22 a1 : 0000000000000000 > [ 0.000000] a2 : ffffffff81203d08 a3 : 0000000081203d21 a4 : ffffffffffffffff > [ 0.000000] a5 : 00000000401c31ac a6 : ffff0a00ffffff04 a7 : ffffffffffffffff > [ 0.000000] s2 : ffffffff81203d08 s3 : ffffffff81203d00 s4 : 0000000000000008 > [ 0.000000] s5 : ffffffff000000ff s6 : 0000000000ffffff s7 : 00000000ffffff00 > [ 0.000000] s8 : ffffffff80d9821a s9 : ffffffff81203d22 s10: 0000000000000002 > [ 0.000000] s11: ffffffff80d9821c t3 : ffffffff812f3617 t4 : ffffffff812f3617 > [ 0.000000] t5 : ffffffff812f3618 t6 : ffffffff81203d08 > [ 0.000000] status: 0000000200000100 badaddr: 00000000401c31ac cause: 000000000000000d > [ 0.000000] [<ffffffff80338936>] vsnprintf+0x1e4/0x336 > [ 0.000000] [<ffffffff80055ae2>] vprintk_store+0xf6/0x344 > [ 0.000000] [<ffffffff80055d86>] vprintk_emit+0x56/0x192 > [ 0.000000] [<ffffffff80055ed8>] vprintk_default+0x16/0x1e > [ 0.000000] [<ffffffff800563d2>] vprintk+0x72/0x80 > [ 0.000000] [<ffffffff806813b2>] _printk+0x36/0x50 > [ 0.000000] [<ffffffff8068af48>] print_reserved_mem+0x1c/0x24 > [ 0.000000] [<ffffffff808057ec>] paging_init+0x528/0x5bc > [ 0.000000] [<ffffffff808031ae>] setup_arch+0xd0/0x592 > [ 0.000000] [<ffffffff8080070e>] start_kernel+0x82/0x73c > [ 0.000000] ---[ end trace 0000000000000000 ]--- > [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task! > [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- > > We traced this back to early_init_fdt_scan_reserved_mem() in > setup_bootmem() - moving it later back up the boot sequence to > after the dt has been remapped etc has fixed the problem for us. > > The least movement to get it working is attached, and also pushed > here: git.kernel.org/conor/c/1735589baefc > > The second problem is a bit more complicated to explain - but we > found the solution conflicted with the remoteproc fix as we had > to move early_init_fdt_scan_reserved_mem() _earlier_ in the boot > process to solve this one. > > We want to have a node in our devicetree that contains some memory > that is non-cached & marked as reserved-memory. Maybe we have just > missed something, but from what we've seen: > - the really early setup looks at the dtb, picks the highest bit > of memory and puts the dtb etc there so it can start using it > - early_init_fdt_scan_reserved_mem() is then called, which figures > out if memory is reserved or not. > > Unfortunately, the highest bit of memory is the non-cached bit so > everything falls over, but we can avoid this by moving the call to > early_init_fdt_scan_reserved_mem() above the dtb memblock alloc that > takes place right before it in setup_bootmem(). > > Obviously, both of these changes are moving the function call in > opposite directions and we can only really do one of them. We are not > sure if what we are doing with the non-cached reserved-memory section > is just not permitted & cannot work - or if this is something that > was overlooked for RISC-V specifically and works for other archs. > > It does seem like the first issue is a real bug, and I am happy to > submit the patch for that whenever - but having two problems with > opposite fixes seemed as if there was something else lurking that we > just don't have enough understanding to detect. > > Any help would be great! > > Thanks, > Conor. > > > Hello Conor, could you, please, provide the relevant device-tree sniplets. Please, have a look at the no-map property in Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt. It controls if the kernel in any way will access the memory outside of your new driver. Best regards Heinrich
Hey Heinrich, Thanks for chiming in. On 18/08/2022 15:32, Heinrich Schuchardt wrote: > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe > > On 8/16/22 22:41, Conor.Dooley@microchip.com wrote: >> Hey all, >> We've run into a bit of a problem with reserved memory on PolarFire, or >> more accurately a pair of problems that seem to have opposite fixes. >> >> The first of these problems is triggered when trying to implement a >> remoteproc driver. To get the reserved memory buffer, remoteproc >> does an of_reserved_mem_lookup(), something like: >> >> np = of_parse_phandle(pdev->of_node, "memory-region", 0); >> if (!np) >> return -EINVAL; >> >> rmem = of_reserved_mem_lookup(np); >> if (!rmem) >> return -EINVAL; >> >> of_reserved_mem_lookup() then uses reserved_mem[i].name to try and find >> a match - but this was triggering kernel panics for us. We did some >> debugging and found that the name string's pointer was pointing to an >> address in the 0x4000_0000 range. The minimum reproduction for this >> crash is attached - it hacks in some print_reserved_mem()s into >> setup_vm_final() around a tlb flush so you can see the before/after. >> (You'll need a reserved memory node in your dts to replicate) >> >> The output is like so, with the same crash as in the remoteproc driver: >> >> [ 0.000000] Linux version 6.0.0-rc1-00001-g0d9d6953d834 (conor@wendy) (riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Aug 16 13:42:09 IST 2022 >> [ 0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000 >> [ 0.000000] Machine model: Microchip PolarFire-SoC Icicle Kit >> [ 0.000000] earlycon: ns16550a0 at MMIO32 0x0000000020100000 (options '115200n8') >> [ 0.000000] printk: bootconsole [ns16550a0] enabled >> [ 0.000000] printk: debug: skip boot console de-registration. >> [ 0.000000] efi: UEFI not found. >> [ 0.000000] before flush >> [ 0.000000] OF: reserved mem: debug name is fabricbuf@ae000000 >> [ 0.000000] after flush >> [ 0.000000] Unable to handle kernel paging request at virtual address 00000000401c31ac >> [ 0.000000] Oops [#1] >> [ 0.000000] Modules linked in: >> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.0.0-rc1-00001-g0d9d6953d834 #1 >> [ 0.000000] Hardware name: Microchip PolarFire-SoC Icicle Kit (DT) >> [ 0.000000] epc : string+0x4a/0xea >> [ 0.000000] ra : vsnprintf+0x1e4/0x336 >> [ 0.000000] epc : ffffffff80335ea0 ra : ffffffff80338936 sp : ffffffff81203be0 >> [ 0.000000] gp : ffffffff812e0a98 tp : ffffffff8120de40 t0 : 0000000000000000 >> [ 0.000000] t1 : ffffffff81203e28 t2 : 7265736572203a46 s0 : ffffffff81203c20 >> [ 0.000000] s1 : ffffffff81203e28 a0 : ffffffff81203d22 a1 : 0000000000000000 >> [ 0.000000] a2 : ffffffff81203d08 a3 : 0000000081203d21 a4 : ffffffffffffffff >> [ 0.000000] a5 : 00000000401c31ac a6 : ffff0a00ffffff04 a7 : ffffffffffffffff >> [ 0.000000] s2 : ffffffff81203d08 s3 : ffffffff81203d00 s4 : 0000000000000008 >> [ 0.000000] s5 : ffffffff000000ff s6 : 0000000000ffffff s7 : 00000000ffffff00 >> [ 0.000000] s8 : ffffffff80d9821a s9 : ffffffff81203d22 s10: 0000000000000002 >> [ 0.000000] s11: ffffffff80d9821c t3 : ffffffff812f3617 t4 : ffffffff812f3617 >> [ 0.000000] t5 : ffffffff812f3618 t6 : ffffffff81203d08 >> [ 0.000000] status: 0000000200000100 badaddr: 00000000401c31ac cause: 000000000000000d >> [ 0.000000] [<ffffffff80338936>] vsnprintf+0x1e4/0x336 >> [ 0.000000] [<ffffffff80055ae2>] vprintk_store+0xf6/0x344 >> [ 0.000000] [<ffffffff80055d86>] vprintk_emit+0x56/0x192 >> [ 0.000000] [<ffffffff80055ed8>] vprintk_default+0x16/0x1e >> [ 0.000000] [<ffffffff800563d2>] vprintk+0x72/0x80 >> [ 0.000000] [<ffffffff806813b2>] _printk+0x36/0x50 >> [ 0.000000] [<ffffffff8068af48>] print_reserved_mem+0x1c/0x24 >> [ 0.000000] [<ffffffff808057ec>] paging_init+0x528/0x5bc >> [ 0.000000] [<ffffffff808031ae>] setup_arch+0xd0/0x592 >> [ 0.000000] [<ffffffff8080070e>] start_kernel+0x82/0x73c >> [ 0.000000] ---[ end trace 0000000000000000 ]--- >> [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task! >> [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- >> >> We traced this back to early_init_fdt_scan_reserved_mem() in >> setup_bootmem() - moving it later back up the boot sequence to >> after the dt has been remapped etc has fixed the problem for us. >> >> The least movement to get it working is attached, and also pushed >> here: git.kernel.org/conor/c/1735589baefc >> >> The second problem is a bit more complicated to explain - but we >> found the solution conflicted with the remoteproc fix as we had >> to move early_init_fdt_scan_reserved_mem() _earlier_ in the boot >> process to solve this one. >> >> We want to have a node in our devicetree that contains some memory >> that is non-cached & marked as reserved-memory. Maybe we have just >> missed something, but from what we've seen: >> - the really early setup looks at the dtb, picks the highest bit >> of memory and puts the dtb etc there so it can start using it >> - early_init_fdt_scan_reserved_mem() is then called, which figures >> out if memory is reserved or not. >> >> Unfortunately, the highest bit of memory is the non-cached bit so >> everything falls over, but we can avoid this by moving the call to >> early_init_fdt_scan_reserved_mem() above the dtb memblock alloc that >> takes place right before it in setup_bootmem(). >> >> Obviously, both of these changes are moving the function call in >> opposite directions and we can only really do one of them. We are not >> sure if what we are doing with the non-cached reserved-memory section >> is just not permitted & cannot work - or if this is something that >> was overlooked for RISC-V specifically and works for other archs. >> >> It does seem like the first issue is a real bug, and I am happy to >> submit the patch for that whenever - but having two problems with >> opposite fixes seemed as if there was something else lurking that we >> just don't have enough understanding to detect. >> >> Any help would be great! > could you, please, provide the relevant device-tree sniplets. Sure. That "might" have been a good thing to do from the start.. For the first problem it is actually fairly straightforward, something like the following triggered it for me: reserved-memory { ranges; #size-cells = <2>; #address-cells = <2>; fabricbuf0: fabricbuf@0 { compatible = "shared-dma-pool"; reg = <0x0 0xae000000 0x0 0x2000000>; label = "fabricbuf0-ddr-c"; }; }; I was able to repro this with the stanza in u-boot's dt and not in Linux's too. > > Please, have a look at the no-map property in > Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt. For those playing along at home, this has been moved to: Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml For this case, it is something *like* the following: reserved-memory { ranges; #size-cells = <2>; #address-cells = <2>; dma_nc_hi: linux,dma { compatible = "shared-dma-pool"; size = <0x0 0x1000000>; no-map; linux,dma-default; alloc-ranges = <0x14 0x00000000 0x0 0x1000000>; dma-ranges = <0x14 0x00000000 0x14 0x00000000 0x0 0x1000000>; }; }; ddrc_cache_lo: memory@80000000 { device_type = "memory"; reg = <0x0 0x80000000 0x0 0x2e000000>; status = "okay"; label = "cache-lo"; }; ddrc_cache_hi: memory@1000000000 { device_type = "memory"; reg = <0x10 0x0 0x0 0x20000000>; status = "okay"; label = "cache-hi"; }; ddr_nc_hi: memory@1400000000 { device_type = "memory"; reg = <0x14 0x00000000 0x0 0x1000000>; status = "okay"; label = "non-cache-hi"; }; As you can see, that does in fact have a no-map in it. I have adapted this slightly so that it would resemble the existing dts, so it not the *exact* one the issue was found with but it is functionally the same. Hope that helps explain things a little better. Thanks, Conor.
Hi, On 16.08.2022 23:41, Conor.Dooley@microchip.com wrote: > Hey all, > We've run into a bit of a problem with reserved memory on PolarFire, or > more accurately a pair of problems that seem to have opposite fixes. > > The first of these problems is triggered when trying to implement a > remoteproc driver. To get the reserved memory buffer, remoteproc > does an of_reserved_mem_lookup(), something like: > > np = of_parse_phandle(pdev->of_node, "memory-region", 0); > if (!np) > return -EINVAL; > > rmem = of_reserved_mem_lookup(np); > if (!rmem) > return -EINVAL; > > of_reserved_mem_lookup() then uses reserved_mem[i].name to try and find > a match - but this was triggering kernel panics for us. We did some > debugging and found that the name string's pointer was pointing to an > address in the 0x4000_0000 range. The minimum reproduction for this > crash is attached - it hacks in some print_reserved_mem()s into > setup_vm_final() around a tlb flush so you can see the before/after. > (You'll need a reserved memory node in your dts to replicate) > > The output is like so, with the same crash as in the remoteproc driver: > > [ 0.000000] Linux version 6.0.0-rc1-00001-g0d9d6953d834 (conor@wendy) (riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Aug 16 13:42:09 IST 2022 > [ 0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000 > [ 0.000000] Machine model: Microchip PolarFire-SoC Icicle Kit > [ 0.000000] earlycon: ns16550a0 at MMIO32 0x0000000020100000 (options '115200n8') > [ 0.000000] printk: bootconsole [ns16550a0] enabled > [ 0.000000] printk: debug: skip boot console de-registration. > [ 0.000000] efi: UEFI not found. > [ 0.000000] before flush > [ 0.000000] OF: reserved mem: debug name is fabricbuf@ae000000 > [ 0.000000] after flush > [ 0.000000] Unable to handle kernel paging request at virtual address 00000000401c31ac > [ 0.000000] Oops [#1] > [ 0.000000] Modules linked in: > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.0.0-rc1-00001-g0d9d6953d834 #1 > [ 0.000000] Hardware name: Microchip PolarFire-SoC Icicle Kit (DT) > [ 0.000000] epc : string+0x4a/0xea > [ 0.000000] ra : vsnprintf+0x1e4/0x336 > [ 0.000000] epc : ffffffff80335ea0 ra : ffffffff80338936 sp : ffffffff81203be0 > [ 0.000000] gp : ffffffff812e0a98 tp : ffffffff8120de40 t0 : 0000000000000000 > [ 0.000000] t1 : ffffffff81203e28 t2 : 7265736572203a46 s0 : ffffffff81203c20 > [ 0.000000] s1 : ffffffff81203e28 a0 : ffffffff81203d22 a1 : 0000000000000000 > [ 0.000000] a2 : ffffffff81203d08 a3 : 0000000081203d21 a4 : ffffffffffffffff > [ 0.000000] a5 : 00000000401c31ac a6 : ffff0a00ffffff04 a7 : ffffffffffffffff > [ 0.000000] s2 : ffffffff81203d08 s3 : ffffffff81203d00 s4 : 0000000000000008 > [ 0.000000] s5 : ffffffff000000ff s6 : 0000000000ffffff s7 : 00000000ffffff00 > [ 0.000000] s8 : ffffffff80d9821a s9 : ffffffff81203d22 s10: 0000000000000002 > [ 0.000000] s11: ffffffff80d9821c t3 : ffffffff812f3617 t4 : ffffffff812f3617 > [ 0.000000] t5 : ffffffff812f3618 t6 : ffffffff81203d08 > [ 0.000000] status: 0000000200000100 badaddr: 00000000401c31ac cause: 000000000000000d > [ 0.000000] [<ffffffff80338936>] vsnprintf+0x1e4/0x336 > [ 0.000000] [<ffffffff80055ae2>] vprintk_store+0xf6/0x344 > [ 0.000000] [<ffffffff80055d86>] vprintk_emit+0x56/0x192 > [ 0.000000] [<ffffffff80055ed8>] vprintk_default+0x16/0x1e > [ 0.000000] [<ffffffff800563d2>] vprintk+0x72/0x80 > [ 0.000000] [<ffffffff806813b2>] _printk+0x36/0x50 > [ 0.000000] [<ffffffff8068af48>] print_reserved_mem+0x1c/0x24 > [ 0.000000] [<ffffffff808057ec>] paging_init+0x528/0x5bc > [ 0.000000] [<ffffffff808031ae>] setup_arch+0xd0/0x592 > [ 0.000000] [<ffffffff8080070e>] start_kernel+0x82/0x73c > [ 0.000000] ---[ end trace 0000000000000000 ]--- > [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task! > [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- > > We traced this back to early_init_fdt_scan_reserved_mem() in > setup_bootmem() - moving it later back up the boot sequence to > after the dt has been remapped etc has fixed the problem for us. > > The least movement to get it working is attached, and also pushed > here: git.kernel.org/conor/c/1735589baefc Any updates on this? I have encountered the same issue with invalid reserved_mem[i].name pointers recently, while working on a remoteproc driver for our RISC-V-based SoC. I can confirm that "riscv: fix reserved memory setup" (git.kernel.org/conor/c/1735589baefc) fixes the issue in our kernel based on 5.15.x. Your patch does not seem to have any adverse side-effects either, so: Tested-by: Evgenii Shatokhin <e.shatokhin@yadro.com> If there are newer versions or variants of the fix, I'll be glad to test them too. By the way, I wonder why arm and aarch64 do not seem to be affected by the issue. As far as I can see, these architectures also populate reserved_mem[] before switching to the final memory mapping during kernel init. I have not dug deep into that though. > > The second problem is a bit more complicated to explain - but we > found the solution conflicted with the remoteproc fix as we had > to move early_init_fdt_scan_reserved_mem() _earlier_ in the boot > process to solve this one. > > We want to have a node in our devicetree that contains some memory > that is non-cached & marked as reserved-memory. Maybe we have just > missed something, but from what we've seen: > - the really early setup looks at the dtb, picks the highest bit > of memory and puts the dtb etc there so it can start using it > - early_init_fdt_scan_reserved_mem() is then called, which figures > out if memory is reserved or not. > > Unfortunately, the highest bit of memory is the non-cached bit so > everything falls over, but we can avoid this by moving the call to > early_init_fdt_scan_reserved_mem() above the dtb memblock alloc that > takes place right before it in setup_bootmem(). > > Obviously, both of these changes are moving the function call in > opposite directions and we can only really do one of them. We are not > sure if what we are doing with the non-cached reserved-memory section > is just not permitted & cannot work - or if this is something that > was overlooked for RISC-V specifically and works for other archs. > > It does seem like the first issue is a real bug, and I am happy to > submit the patch for that whenever - but having two problems with > opposite fixes seemed as if there was something else lurking that we > just don't have enough understanding to detect. > > Any help would be great! > > Thanks, > Conor. > > > Regards, Evgenii
On Thu, Nov 03, 2022 at 02:46:37PM +0300, Evgenii Shatokhin wrote: > Hi, > > On 16.08.2022 23:41, Conor.Dooley@microchip.com wrote: > > Hey all, > > We've run into a bit of a problem with reserved memory on PolarFire, or > > more accurately a pair of problems that seem to have opposite fixes. > > > > The first of these problems is triggered when trying to implement a > > remoteproc driver. To get the reserved memory buffer, remoteproc > > does an of_reserved_mem_lookup(), something like: > > > > np = of_parse_phandle(pdev->of_node, "memory-region", 0); > > if (!np) > > return -EINVAL; > > > > rmem = of_reserved_mem_lookup(np); > > if (!rmem) > > return -EINVAL; > > > > of_reserved_mem_lookup() then uses reserved_mem[i].name to try and find > > a match - but this was triggering kernel panics for us. We did some > > debugging and found that the name string's pointer was pointing to an > > address in the 0x4000_0000 range. The minimum reproduction for this > > crash is attached - it hacks in some print_reserved_mem()s into > > setup_vm_final() around a tlb flush so you can see the before/after. > > (You'll need a reserved memory node in your dts to replicate) > > > > The output is like so, with the same crash as in the remoteproc driver: > > > > [ 0.000000] Linux version 6.0.0-rc1-00001-g0d9d6953d834 (conor@wendy) (riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Aug 16 13:42:09 IST 2022 > > [ 0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000 > > [ 0.000000] Machine model: Microchip PolarFire-SoC Icicle Kit > > [ 0.000000] earlycon: ns16550a0 at MMIO32 0x0000000020100000 (options '115200n8') > > [ 0.000000] printk: bootconsole [ns16550a0] enabled > > [ 0.000000] printk: debug: skip boot console de-registration. > > [ 0.000000] efi: UEFI not found. > > [ 0.000000] before flush > > [ 0.000000] OF: reserved mem: debug name is fabricbuf@ae000000 > > [ 0.000000] after flush > > [ 0.000000] Unable to handle kernel paging request at virtual address 00000000401c31ac > > [ 0.000000] Oops [#1] > > [ 0.000000] Modules linked in: > > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.0.0-rc1-00001-g0d9d6953d834 #1 > > [ 0.000000] Hardware name: Microchip PolarFire-SoC Icicle Kit (DT) > > [ 0.000000] epc : string+0x4a/0xea > > [ 0.000000] ra : vsnprintf+0x1e4/0x336 > > [ 0.000000] epc : ffffffff80335ea0 ra : ffffffff80338936 sp : ffffffff81203be0 > > [ 0.000000] gp : ffffffff812e0a98 tp : ffffffff8120de40 t0 : 0000000000000000 > > [ 0.000000] t1 : ffffffff81203e28 t2 : 7265736572203a46 s0 : ffffffff81203c20 > > [ 0.000000] s1 : ffffffff81203e28 a0 : ffffffff81203d22 a1 : 0000000000000000 > > [ 0.000000] a2 : ffffffff81203d08 a3 : 0000000081203d21 a4 : ffffffffffffffff > > [ 0.000000] a5 : 00000000401c31ac a6 : ffff0a00ffffff04 a7 : ffffffffffffffff > > [ 0.000000] s2 : ffffffff81203d08 s3 : ffffffff81203d00 s4 : 0000000000000008 > > [ 0.000000] s5 : ffffffff000000ff s6 : 0000000000ffffff s7 : 00000000ffffff00 > > [ 0.000000] s8 : ffffffff80d9821a s9 : ffffffff81203d22 s10: 0000000000000002 > > [ 0.000000] s11: ffffffff80d9821c t3 : ffffffff812f3617 t4 : ffffffff812f3617 > > [ 0.000000] t5 : ffffffff812f3618 t6 : ffffffff81203d08 > > [ 0.000000] status: 0000000200000100 badaddr: 00000000401c31ac cause: 000000000000000d > > [ 0.000000] [<ffffffff80338936>] vsnprintf+0x1e4/0x336 > > [ 0.000000] [<ffffffff80055ae2>] vprintk_store+0xf6/0x344 > > [ 0.000000] [<ffffffff80055d86>] vprintk_emit+0x56/0x192 > > [ 0.000000] [<ffffffff80055ed8>] vprintk_default+0x16/0x1e > > [ 0.000000] [<ffffffff800563d2>] vprintk+0x72/0x80 > > [ 0.000000] [<ffffffff806813b2>] _printk+0x36/0x50 > > [ 0.000000] [<ffffffff8068af48>] print_reserved_mem+0x1c/0x24 > > [ 0.000000] [<ffffffff808057ec>] paging_init+0x528/0x5bc > > [ 0.000000] [<ffffffff808031ae>] setup_arch+0xd0/0x592 > > [ 0.000000] [<ffffffff8080070e>] start_kernel+0x82/0x73c > > [ 0.000000] ---[ end trace 0000000000000000 ]--- > > [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task! > > [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- > > > > We traced this back to early_init_fdt_scan_reserved_mem() in > > setup_bootmem() - moving it later back up the boot sequence to > > after the dt has been remapped etc has fixed the problem for us. > > > > The least movement to get it working is attached, and also pushed > > here: git.kernel.org/conor/c/1735589baefc > > Any updates on this? "Yes". I /briefly/ chatted with Palmer about this at Plumbers. (see below). Funny timing from you replying though, I'm planning on spending the next days/week/weeks trying to sort this whole thing out - it's come to a head for us. > I have encountered the same issue with invalid reserved_mem[i].name pointers > recently, while working on a remoteproc driver for our RISC-V-based SoC. > > I can confirm that "riscv: fix reserved memory setup" > (git.kernel.org/conor/c/1735589baefc) fixes the issue in our kernel based on > 5.15.x. Aye, we're on 5.15 for our vendor tree too so that's where we found the problem initally. > Your patch does not seem to have any adverse side-effects either, so: This patch itself, from what we can see, doesn't have any adverse side-effects. The other issue that I pointed out in this mail with reserved memory allocations requires an opposite fix, so there's something else going on that needs digging into. When I spoke with Palmer, I said I'd spend the time looking at it but just have not got around to doing it until now. I think it may make some sense to try and merge this patch as it at least makes things better and unblocks people wanting to upstream remoteproc drivers etc. And then when I (eventually?) come up with something better maybe it can build on that. I'll resend this patch as standalone early next week unless I've somehow made a breakthrough between now and then. > Tested-by: Evgenii Shatokhin <e.shatokhin@yadro.com> Thanks! > If there are newer versions or variants of the fix, I'll be glad to test > them too. This patch is currently the most recent work I have done, but hopefully that's going to change. > By the way, I wonder why arm and aarch64 do not seem to be affected by the > issue. As far as I can see, these architectures also populate reserved_mem[] > before switching to the final memory mapping during kernel init. I have not > dug deep into that though. Aye, that's at least where I will be starting to do comparisons.. > > The second problem is a bit more complicated to explain - but we > > found the solution conflicted with the remoteproc fix as we had > > to move early_init_fdt_scan_reserved_mem() _earlier_ in the boot > > process to solve this one. > > > > We want to have a node in our devicetree that contains some memory > > that is non-cached & marked as reserved-memory. Maybe we have just > > missed something, but from what we've seen: > > - the really early setup looks at the dtb, picks the highest bit > > of memory and puts the dtb etc there so it can start using it > > - early_init_fdt_scan_reserved_mem() is then called, which figures > > out if memory is reserved or not. > > > > Unfortunately, the highest bit of memory is the non-cached bit so > > everything falls over, but we can avoid this by moving the call to > > early_init_fdt_scan_reserved_mem() above the dtb memblock alloc that > > takes place right before it in setup_bootmem(). > > > > Obviously, both of these changes are moving the function call in > > opposite directions and we can only really do one of them. We are not > > sure if what we are doing with the non-cached reserved-memory section > > is just not permitted & cannot work - or if this is something that > > was overlooked for RISC-V specifically and works for other archs. > > > > It does seem like the first issue is a real bug, and I am happy to > > submit the patch for that whenever - but having two problems with > > opposite fixes seemed as if there was something else lurking that we > > just don't have enough understanding to detect. > > > > Any help would be great!
Hullo Palmer, Mike & whoever else may read this, Just reviving this thread from a little while ago as I have been in the area again recently... On Tue, Aug 16, 2022 at 08:41:05PM +0000, Conor.Dooley@microchip.com wrote: > Hey all, > We've run into a bit of a problem with reserved memory on PolarFire, or > more accurately a pair of problems that seem to have opposite fixes. > > The first of these problems is triggered when trying to implement a > remoteproc driver. To get the reserved memory buffer, remoteproc > does an of_reserved_mem_lookup(), something like: > > np = of_parse_phandle(pdev->of_node, "memory-region", 0); > if (!np) > return -EINVAL; > > rmem = of_reserved_mem_lookup(np); > if (!rmem) > return -EINVAL; > > of_reserved_mem_lookup() then uses reserved_mem[i].name to try and find > a match - but this was triggering kernel panics for us. We did some > debugging and found that the name string's pointer was pointing to an > address in the 0x4000_0000 range. The minimum reproduction for this > crash is attached - it hacks in some print_reserved_mem()s into > setup_vm_final() around a tlb flush so you can see the before/after. > (You'll need a reserved memory node in your dts to replicate) > > The output is like so, with the same crash as in the remoteproc driver: > > [ 0.000000] Linux version 6.0.0-rc1-00001-g0d9d6953d834 (conor@wendy) (riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Aug 16 13:42:09 IST 2022 [...] > [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- > > We traced this back to early_init_fdt_scan_reserved_mem() in > setup_bootmem() - moving it later back up the boot sequence to > after the dt has been remapped etc has fixed the problem for us. > > The least movement to get it working is attached, and also pushed > here: git.kernel.org/conor/c/1735589baefc This one is fixed now, as of commit 50e63dd8ed92 ("riscv: fix reserved memory setup"). > The second problem is a bit more complicated to explain - but we > found the solution conflicted with the remoteproc fix as we had > to move early_init_fdt_scan_reserved_mem() _earlier_ in the boot > process to solve this one. > > We want to have a node in our devicetree that contains some memory > that is non-cached & marked as reserved-memory. Maybe we have just > missed something, but from what we've seen: > - the really early setup looks at the dtb, picks the highest bit > of memory and puts the dtb etc there so it can start using it > - early_init_fdt_scan_reserved_mem() is then called, which figures > out if memory is reserved or not. > > Unfortunately, the highest bit of memory is the non-cached bit so > everything falls over, but we can avoid this by moving the call to > early_init_fdt_scan_reserved_mem() above the dtb memblock alloc that > takes place right before it in setup_bootmem(). > > Obviously, both of these changes are moving the function call in > opposite directions and we can only really do one of them. We are not > sure if what we are doing with the non-cached reserved-memory section > is just not permitted & cannot work - or if this is something that > was overlooked for RISC-V specifically and works for other archs. We ended up working around this one by making sure that U-Boot loaded the dtb to somewhere that would be inside the kernel's memory map, thus avoiding the remapping in the first place. We did run into another problem recently though, and 50e63dd8ed92 is kinda at fault for it. This particular issue was encountered with a devicetree where the top-most memory region was entirely reserved & was not observed prior to my fix for the first issue. On RISC-V, the boot sequence is something like: setup_bootmem(); setup_vm_final(); unflatten_device_tree(); early_init_fdt_scan_reserved_mem(); Whereas, before my patch it used to be (give-or-take): setup_bootmem(); early_init_fdt_scan_reserved_mem(); setup_vm_final(); unflatten_device_tree(); The difference being that we used to have scanned the reserved memory regions before calling setup_vm_final() & therefore know which regions we cannot use. As a reminder, calling early_init_fdt_scan_reserved_mem() before we've got the dt in a proper virtual memory address will cause the kernel to panic if it tries to read a reserved memory node's label. As we are now calling setup_vm_final() *before* we know what the reserved memory regions are & as RISC-V allocates memblocks from the top down, the allocations in setup_vm_final() will be done in the highest memory region. When early_init_fdt_scan_reserved_mem() then tries to reserve the entirety of that top-most memory region, the reservation fails as part of this region has already been allocated. In the scenario where I found this bug, that top-most region is non- cached memory & the kernel ends up panicking. The memblock debug code made this pretty easy to spot, otherwise I'd probably have spent more than just a few hours trying to figure out why it was panicking! My "this needs to be fixed today" solution for this problem was calling memblock_set_bottom_up(true) in setup_bootmem() & that's what we are going to carry downstream for now. I haven't tested it (yet) but I suspect that it would also fix our problem of the dtb being remapped into a non-cached region of memory that we would later go on to reserve too. Non-cached being an issue mainly due to the panicking, but failing to reserve (and using!) memory regions that are meant to be reserved is very far from ideal even when they are memory that the kernel can actually use. I have no idea if that is an acceptable solution for upstream though, so I guess this is me putting out feelers as to whether this is something I should send a patch to do *OR* if this is another sign of the issues that you (Mike, Palmer) mentioned in the past. If it isn't an acceptable solution, I'm not really too sure how to proceed! Cheers, Conor.
Hi Conor, Sorry for the delay, somehow this slipped between the cracks. On Thu, Feb 02, 2023 at 10:01:26PM +0000, Conor Dooley wrote: > Hullo Palmer, Mike & whoever else may read this, > > Just reviving this thread from a little while ago as I have been in the > area again recently... TBH, I didn't really dig deep into the issues, but the thought I had was what if DT was mapped via fixmap until the setup_vm_final() and then it would be possible to call DT methods early. Could be I'm shooting in the dark :) > On Tue, Aug 16, 2022 at 08:41:05PM +0000, Conor.Dooley@microchip.com wrote: > > Hey all, > > We've run into a bit of a problem with reserved memory on PolarFire, or > > more accurately a pair of problems that seem to have opposite fixes. > > > > The first of these problems is triggered when trying to implement a > > remoteproc driver. To get the reserved memory buffer, remoteproc > > does an of_reserved_mem_lookup(), something like: > > > > np = of_parse_phandle(pdev->of_node, "memory-region", 0); > > if (!np) > > return -EINVAL; > > > > rmem = of_reserved_mem_lookup(np); > > if (!rmem) > > return -EINVAL; > > > > of_reserved_mem_lookup() then uses reserved_mem[i].name to try and find > > a match - but this was triggering kernel panics for us. We did some > > debugging and found that the name string's pointer was pointing to an > > address in the 0x4000_0000 range. The minimum reproduction for this > > crash is attached - it hacks in some print_reserved_mem()s into > > setup_vm_final() around a tlb flush so you can see the before/after. > > (You'll need a reserved memory node in your dts to replicate) > > > > The output is like so, with the same crash as in the remoteproc driver: > > > > [ 0.000000] Linux version 6.0.0-rc1-00001-g0d9d6953d834 (conor@wendy) (riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Aug 16 13:42:09 IST 2022 > > [...] > > > [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- > > > > We traced this back to early_init_fdt_scan_reserved_mem() in > > setup_bootmem() - moving it later back up the boot sequence to > > after the dt has been remapped etc has fixed the problem for us. > > > > The least movement to get it working is attached, and also pushed > > here: git.kernel.org/conor/c/1735589baefc > > This one is fixed now, as of commit 50e63dd8ed92 ("riscv: fix reserved > memory setup"). > > > The second problem is a bit more complicated to explain - but we > > found the solution conflicted with the remoteproc fix as we had > > to move early_init_fdt_scan_reserved_mem() _earlier_ in the boot > > process to solve this one. > > > > We want to have a node in our devicetree that contains some memory > > that is non-cached & marked as reserved-memory. Maybe we have just > > missed something, but from what we've seen: > > - the really early setup looks at the dtb, picks the highest bit > > of memory and puts the dtb etc there so it can start using it > > - early_init_fdt_scan_reserved_mem() is then called, which figures > > out if memory is reserved or not. > > > > Unfortunately, the highest bit of memory is the non-cached bit so > > everything falls over, but we can avoid this by moving the call to > > early_init_fdt_scan_reserved_mem() above the dtb memblock alloc that > > takes place right before it in setup_bootmem(). > > > > Obviously, both of these changes are moving the function call in > > opposite directions and we can only really do one of them. We are not > > sure if what we are doing with the non-cached reserved-memory section > > is just not permitted & cannot work - or if this is something that > > was overlooked for RISC-V specifically and works for other archs. > > We ended up working around this one by making sure that U-Boot loaded > the dtb to somewhere that would be inside the kernel's memory map, thus > avoiding the remapping in the first place. > > We did run into another problem recently though, and 50e63dd8ed92 is > kinda at fault for it. > This particular issue was encountered with a devicetree where the > top-most memory region was entirely reserved & was not observed prior > to my fix for the first issue. > > On RISC-V, the boot sequence is something like: > setup_bootmem(); > setup_vm_final(); > unflatten_device_tree(); > early_init_fdt_scan_reserved_mem(); > > Whereas, before my patch it used to be (give-or-take): > setup_bootmem(); > early_init_fdt_scan_reserved_mem(); > setup_vm_final(); > unflatten_device_tree(); > > The difference being that we used to have scanned the reserved memory > regions before calling setup_vm_final() & therefore know which regions > we cannot use. As a reminder, calling early_init_fdt_scan_reserved_mem() > before we've got the dt in a proper virtual memory address will cause > the kernel to panic if it tries to read a reserved memory node's label. > > As we are now calling setup_vm_final() *before* we know what the > reserved memory regions are & as RISC-V allocates memblocks from the top > down, the allocations in setup_vm_final() will be done in the highest > memory region. > When early_init_fdt_scan_reserved_mem() then tries to reserve the > entirety of that top-most memory region, the reservation fails as part > of this region has already been allocated. > In the scenario where I found this bug, that top-most region is non- > cached memory & the kernel ends up panicking. > The memblock debug code made this pretty easy to spot, otherwise I'd > probably have spent more than just a few hours trying to figure out why > it was panicking! > > My "this needs to be fixed today" solution for this problem was calling > memblock_set_bottom_up(true) in setup_bootmem() & that's what we are > going to carry downstream for now. > > I haven't tested it (yet) but I suspect that it would also fix our > problem of the dtb being remapped into a non-cached region of memory > that we would later go on to reserve too. Non-cached being an issue > mainly due to the panicking, but failing to reserve (and using!) memory > regions that are meant to be reserved is very far from ideal even when > they are memory that the kernel can actually use. > > I have no idea if that is an acceptable solution for upstream though, so > I guess this is me putting out feelers as to whether this is something I > should send a patch to do *OR* if this is another sign of the issues > that you (Mike, Palmer) mentioned in the past. > If it isn't an acceptable solution, I'm not really too sure how to > proceed! > > Cheers, > Conor. >
On Tue, Mar 07, 2023 at 01:35:11PM +0200, Mike Rapoport wrote: > Hi Conor, > > Sorry for the delay, somehow this slipped between the cracks. No worries. > On Thu, Feb 02, 2023 at 10:01:26PM +0000, Conor Dooley wrote: > > Hullo Palmer, Mike & whoever else may read this, > > > > Just reviving this thread from a little while ago as I have been in the > > area again recently... > > TBH, I didn't really dig deep into the issues, I only preserved most of the context here to point out that it wasn't an isolated issue, the top-down/bottom-up bit is the main part that I was interested in. The others are fixed, or workaround-able without "harming" anyone else. > but the thought I had was > what if DT was mapped via fixmap until the setup_vm_final() and then it > would be possible to call DT methods early. From my memory, this would be more along the lines of what arm64 does. I'll give it a shot and see how it goes. I figure it'll take me some time! > Could be I'm shooting in the dark :) A pointer on where to start is helpful, even if it is "rewrite a bunch of stuff". Cheers, Conor. > > On Tue, Aug 16, 2022 at 08:41:05PM +0000, Conor.Dooley@microchip.com wrote: > > > Hey all, > > > We've run into a bit of a problem with reserved memory on PolarFire, or > > > more accurately a pair of problems that seem to have opposite fixes. > > > > > > The first of these problems is triggered when trying to implement a > > > remoteproc driver. To get the reserved memory buffer, remoteproc > > > does an of_reserved_mem_lookup(), something like: > > > > > > np = of_parse_phandle(pdev->of_node, "memory-region", 0); > > > if (!np) > > > return -EINVAL; > > > > > > rmem = of_reserved_mem_lookup(np); > > > if (!rmem) > > > return -EINVAL; > > > > > > of_reserved_mem_lookup() then uses reserved_mem[i].name to try and find > > > a match - but this was triggering kernel panics for us. We did some > > > debugging and found that the name string's pointer was pointing to an > > > address in the 0x4000_0000 range. The minimum reproduction for this > > > crash is attached - it hacks in some print_reserved_mem()s into > > > setup_vm_final() around a tlb flush so you can see the before/after. > > > (You'll need a reserved memory node in your dts to replicate) > > > > > > The output is like so, with the same crash as in the remoteproc driver: > > > > > > [ 0.000000] Linux version 6.0.0-rc1-00001-g0d9d6953d834 (conor@wendy) (riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Aug 16 13:42:09 IST 2022 > > > > [...] > > > > > [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- > > > > > > We traced this back to early_init_fdt_scan_reserved_mem() in > > > setup_bootmem() - moving it later back up the boot sequence to > > > after the dt has been remapped etc has fixed the problem for us. > > > > > > The least movement to get it working is attached, and also pushed > > > here: git.kernel.org/conor/c/1735589baefc > > > > This one is fixed now, as of commit 50e63dd8ed92 ("riscv: fix reserved > > memory setup"). > > > > > The second problem is a bit more complicated to explain - but we > > > found the solution conflicted with the remoteproc fix as we had > > > to move early_init_fdt_scan_reserved_mem() _earlier_ in the boot > > > process to solve this one. > > > > > > We want to have a node in our devicetree that contains some memory > > > that is non-cached & marked as reserved-memory. Maybe we have just > > > missed something, but from what we've seen: > > > - the really early setup looks at the dtb, picks the highest bit > > > of memory and puts the dtb etc there so it can start using it > > > - early_init_fdt_scan_reserved_mem() is then called, which figures > > > out if memory is reserved or not. > > > > > > Unfortunately, the highest bit of memory is the non-cached bit so > > > everything falls over, but we can avoid this by moving the call to > > > early_init_fdt_scan_reserved_mem() above the dtb memblock alloc that > > > takes place right before it in setup_bootmem(). > > > > > > Obviously, both of these changes are moving the function call in > > > opposite directions and we can only really do one of them. We are not > > > sure if what we are doing with the non-cached reserved-memory section > > > is just not permitted & cannot work - or if this is something that > > > was overlooked for RISC-V specifically and works for other archs. > > > > We ended up working around this one by making sure that U-Boot loaded > > the dtb to somewhere that would be inside the kernel's memory map, thus > > avoiding the remapping in the first place. > > > > We did run into another problem recently though, and 50e63dd8ed92 is > > kinda at fault for it. > > This particular issue was encountered with a devicetree where the > > top-most memory region was entirely reserved & was not observed prior > > to my fix for the first issue. > > > > On RISC-V, the boot sequence is something like: > > setup_bootmem(); > > setup_vm_final(); > > unflatten_device_tree(); > > early_init_fdt_scan_reserved_mem(); > > > > Whereas, before my patch it used to be (give-or-take): > > setup_bootmem(); > > early_init_fdt_scan_reserved_mem(); > > setup_vm_final(); > > unflatten_device_tree(); > > > > The difference being that we used to have scanned the reserved memory > > regions before calling setup_vm_final() & therefore know which regions > > we cannot use. As a reminder, calling early_init_fdt_scan_reserved_mem() > > before we've got the dt in a proper virtual memory address will cause > > the kernel to panic if it tries to read a reserved memory node's label. > > > > As we are now calling setup_vm_final() *before* we know what the > > reserved memory regions are & as RISC-V allocates memblocks from the top > > down, the allocations in setup_vm_final() will be done in the highest > > memory region. > > When early_init_fdt_scan_reserved_mem() then tries to reserve the > > entirety of that top-most memory region, the reservation fails as part > > of this region has already been allocated. > > In the scenario where I found this bug, that top-most region is non- > > cached memory & the kernel ends up panicking. > > The memblock debug code made this pretty easy to spot, otherwise I'd > > probably have spent more than just a few hours trying to figure out why > > it was panicking! > > > > My "this needs to be fixed today" solution for this problem was calling > > memblock_set_bottom_up(true) in setup_bootmem() & that's what we are > > going to carry downstream for now. > > > > I haven't tested it (yet) but I suspect that it would also fix our > > problem of the dtb being remapped into a non-cached region of memory > > that we would later go on to reserve too. Non-cached being an issue > > mainly due to the panicking, but failing to reserve (and using!) memory > > regions that are meant to be reserved is very far from ideal even when > > they are memory that the kernel can actually use. > > > > I have no idea if that is an acceptable solution for upstream though, so > > I guess this is me putting out feelers as to whether this is something I > > should send a patch to do *OR* if this is another sign of the issues > > that you (Mike, Palmer) mentioned in the past. > > If it isn't an acceptable solution, I'm not really too sure how to > > proceed! > > > > Cheers, > > Conor. > > > > > > -- > Sincerely yours, > Mike. >
Hi Conor, On 8/16/22 22:41, Conor.Dooley@microchip.com wrote: > Hey all, > We've run into a bit of a problem with reserved memory on PolarFire, or > more accurately a pair of problems that seem to have opposite fixes. > > The first of these problems is triggered when trying to implement a > remoteproc driver. To get the reserved memory buffer, remoteproc > does an of_reserved_mem_lookup(), something like: > > np = of_parse_phandle(pdev->of_node, "memory-region", 0); > if (!np) > return -EINVAL; > > rmem = of_reserved_mem_lookup(np); > if (!rmem) > return -EINVAL; > > of_reserved_mem_lookup() then uses reserved_mem[i].name to try and find > a match - but this was triggering kernel panics for us. We did some > debugging and found that the name string's pointer was pointing to an > address in the 0x4000_0000 range. The minimum reproduction for this 0x4000_0000 corresponds to DTB_EARLY_BASE_VA: this is the address that is used to map the dtb before we can access it using the linear mapping. > crash is attached - it hacks in some print_reserved_mem()s into > setup_vm_final() around a tlb flush so you can see the before/after. > (You'll need a reserved memory node in your dts to replicate) > > The output is like so, with the same crash as in the remoteproc driver: > > [ 0.000000] Linux version 6.0.0-rc1-00001-g0d9d6953d834 (conor@wendy) (riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Aug 16 13:42:09 IST 2022 > [ 0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000 > [ 0.000000] Machine model: Microchip PolarFire-SoC Icicle Kit > [ 0.000000] earlycon: ns16550a0 at MMIO32 0x0000000020100000 (options '115200n8') > [ 0.000000] printk: bootconsole [ns16550a0] enabled > [ 0.000000] printk: debug: skip boot console de-registration. > [ 0.000000] efi: UEFI not found. > [ 0.000000] before flush > [ 0.000000] OF: reserved mem: debug name is fabricbuf@ae000000 > [ 0.000000] after flush > [ 0.000000] Unable to handle kernel paging request at virtual address 00000000401c31ac You take the trap here because the mapping for the dtb does not exist in swapper_pg_dir, but you don't need this mapping anymore as you can access the device tree through the linear mapping now. I would say that: you build your kernel with CONFIG_BUILTIN_DTB and then you don't call early_init_dt_verify which resets initial_boot_params to the linear mapping address (it was initially set to 0x4000_0000 in parse_dtb). If that's the case, does the following fix your issue? diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c index 376d2827e736..2b09f0bd8432 100644 --- a/arch/riscv/kernel/setup.c +++ b/arch/riscv/kernel/setup.c @@ -276,6 +276,7 @@ void __init setup_arch(char **cmdline_p) efi_init(); paging_init(); #if IS_ENABLED(CONFIG_BUILTIN_DTB) + initial_boot_params = __va(XIP_FIXUP(dtb_early_pa)); unflatten_and_copy_device_tree(); #else if (early_init_dt_verify(__va(XIP_FIXUP(dtb_early_pa)))) > [ 0.000000] Oops [#1] > [ 0.000000] Modules linked in: > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.0.0-rc1-00001-g0d9d6953d834 #1 > [ 0.000000] Hardware name: Microchip PolarFire-SoC Icicle Kit (DT) > [ 0.000000] epc : string+0x4a/0xea > [ 0.000000] ra : vsnprintf+0x1e4/0x336 > [ 0.000000] epc : ffffffff80335ea0 ra : ffffffff80338936 sp : ffffffff81203be0 > [ 0.000000] gp : ffffffff812e0a98 tp : ffffffff8120de40 t0 : 0000000000000000 > [ 0.000000] t1 : ffffffff81203e28 t2 : 7265736572203a46 s0 : ffffffff81203c20 > [ 0.000000] s1 : ffffffff81203e28 a0 : ffffffff81203d22 a1 : 0000000000000000 > [ 0.000000] a2 : ffffffff81203d08 a3 : 0000000081203d21 a4 : ffffffffffffffff > [ 0.000000] a5 : 00000000401c31ac a6 : ffff0a00ffffff04 a7 : ffffffffffffffff > [ 0.000000] s2 : ffffffff81203d08 s3 : ffffffff81203d00 s4 : 0000000000000008 > [ 0.000000] s5 : ffffffff000000ff s6 : 0000000000ffffff s7 : 00000000ffffff00 > [ 0.000000] s8 : ffffffff80d9821a s9 : ffffffff81203d22 s10: 0000000000000002 > [ 0.000000] s11: ffffffff80d9821c t3 : ffffffff812f3617 t4 : ffffffff812f3617 > [ 0.000000] t5 : ffffffff812f3618 t6 : ffffffff81203d08 > [ 0.000000] status: 0000000200000100 badaddr: 00000000401c31ac cause: 000000000000000d > [ 0.000000] [<ffffffff80338936>] vsnprintf+0x1e4/0x336 > [ 0.000000] [<ffffffff80055ae2>] vprintk_store+0xf6/0x344 > [ 0.000000] [<ffffffff80055d86>] vprintk_emit+0x56/0x192 > [ 0.000000] [<ffffffff80055ed8>] vprintk_default+0x16/0x1e > [ 0.000000] [<ffffffff800563d2>] vprintk+0x72/0x80 > [ 0.000000] [<ffffffff806813b2>] _printk+0x36/0x50 > [ 0.000000] [<ffffffff8068af48>] print_reserved_mem+0x1c/0x24 > [ 0.000000] [<ffffffff808057ec>] paging_init+0x528/0x5bc > [ 0.000000] [<ffffffff808031ae>] setup_arch+0xd0/0x592 > [ 0.000000] [<ffffffff8080070e>] start_kernel+0x82/0x73c > [ 0.000000] ---[ end trace 0000000000000000 ]--- > [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task! > [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- > > We traced this back to early_init_fdt_scan_reserved_mem() in > setup_bootmem() - moving it later back up the boot sequence to > after the dt has been remapped etc has fixed the problem for us. > > The least movement to get it working is attached, and also pushed > here: git.kernel.org/conor/c/1735589baefc > > The second problem is a bit more complicated to explain - but we > found the solution conflicted with the remoteproc fix as we had > to move early_init_fdt_scan_reserved_mem() _earlier_ in the boot > process to solve this one. > > We want to have a node in our devicetree that contains some memory > that is non-cached & marked as reserved-memory. Maybe we have just > missed something, but from what we've seen: > - the really early setup looks at the dtb, picks the highest bit > of memory and puts the dtb etc there so it can start using it > - early_init_fdt_scan_reserved_mem() is then called, which figures > out if memory is reserved or not. > > Unfortunately, the highest bit of memory is the non-cached bit so > everything falls over, but we can avoid this by moving the call to > early_init_fdt_scan_reserved_mem() above the dtb memblock alloc that > takes place right before it in setup_bootmem(). And then I suppose the allocations you are mentioning happen in unflatten_XXX, so parsing the device tree for reserved memory nodes before this should do the trick. Does the following fix your second issue? diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c index 2b09f0bd8432..94b3d049fe9d 100644 --- a/arch/riscv/kernel/setup.c +++ b/arch/riscv/kernel/setup.c @@ -277,14 +277,15 @@ void __init setup_arch(char **cmdline_p) paging_init(); #if IS_ENABLED(CONFIG_BUILTIN_DTB) initial_boot_params = __va(XIP_FIXUP(dtb_early_pa)); + early_init_fdt_scan_reserved_mem(); unflatten_and_copy_device_tree(); #else - if (early_init_dt_verify(__va(XIP_FIXUP(dtb_early_pa)))) + if (early_init_dt_verify(__va(XIP_FIXUP(dtb_early_pa)))) { + early_init_fdt_scan_reserved_mem(); unflatten_device_tree(); - else + } else pr_err("No DTB found in kernel mappings\n"); #endif - early_init_fdt_scan_reserved_mem(); misc_mem_init(); init_resources(); > > Obviously, both of these changes are moving the function call in > opposite directions and we can only really do one of them. We are not > sure if what we are doing with the non-cached reserved-memory section > is just not permitted & cannot work - or if this is something that > was overlooked for RISC-V specifically and works for other archs. > > It does seem like the first issue is a real bug, and I am happy to > submit the patch for that whenever - but having two problems with > opposite fixes seemed as if there was something else lurking that we > just don't have enough understanding to detect. > > Any help would be great! > > Thanks, > Conor. > Even if that does not fix your issue, the first patch is necessary as it fixes initial_boot_params. > > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv
On 3/9/23 11:30, Alexandre Ghiti wrote: > Hi Conor, > > On 8/16/22 22:41, Conor.Dooley@microchip.com wrote: >> Hey all, >> We've run into a bit of a problem with reserved memory on PolarFire, or >> more accurately a pair of problems that seem to have opposite fixes. >> >> The first of these problems is triggered when trying to implement a >> remoteproc driver. To get the reserved memory buffer, remoteproc >> does an of_reserved_mem_lookup(), something like: >> >> np = of_parse_phandle(pdev->of_node, "memory-region", 0); >> if (!np) >> return -EINVAL; >> >> rmem = of_reserved_mem_lookup(np); >> if (!rmem) >> return -EINVAL; >> >> of_reserved_mem_lookup() then uses reserved_mem[i].name to try and find >> a match - but this was triggering kernel panics for us. We did some >> debugging and found that the name string's pointer was pointing to an >> address in the 0x4000_0000 range. The minimum reproduction for this > > > 0x4000_0000 corresponds to DTB_EARLY_BASE_VA: this is the address that > is used to map the dtb before we can access it using the linear mapping. > > >> crash is attached - it hacks in some print_reserved_mem()s into >> setup_vm_final() around a tlb flush so you can see the before/after. >> (You'll need a reserved memory node in your dts to replicate) >> >> The output is like so, with the same crash as in the remoteproc driver: >> >> [ 0.000000] Linux version 6.0.0-rc1-00001-g0d9d6953d834 >> (conor@wendy) (riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0, >> GNU ld (GNU Binutils) 2.37) #1 SMP Tue Aug 16 13:42:09 IST 2022 >> [ 0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000 >> [ 0.000000] Machine model: Microchip PolarFire-SoC Icicle Kit >> [ 0.000000] earlycon: ns16550a0 at MMIO32 0x0000000020100000 >> (options '115200n8') >> [ 0.000000] printk: bootconsole [ns16550a0] enabled >> [ 0.000000] printk: debug: skip boot console de-registration. >> [ 0.000000] efi: UEFI not found. >> [ 0.000000] before flush >> [ 0.000000] OF: reserved mem: debug name is fabricbuf@ae000000 >> [ 0.000000] after flush >> [ 0.000000] Unable to handle kernel paging request at virtual >> address 00000000401c31ac > > > You take the trap here because the mapping for the dtb does not exist > in swapper_pg_dir, but you don't need this mapping anymore as you can > access the device tree through the linear mapping now. You can forget everything below, I was completely wrong. I'll follow up on Mike's answer. > > I would say that: you build your kernel with CONFIG_BUILTIN_DTB and > then you don't call early_init_dt_verify which resets > initial_boot_params to the linear mapping address (it was initially > set to 0x4000_0000 in parse_dtb). If that's the case, does the > following fix your issue? > > > diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c > index 376d2827e736..2b09f0bd8432 100644 > --- a/arch/riscv/kernel/setup.c > +++ b/arch/riscv/kernel/setup.c > @@ -276,6 +276,7 @@ void __init setup_arch(char **cmdline_p) > efi_init(); > paging_init(); > #if IS_ENABLED(CONFIG_BUILTIN_DTB) > + initial_boot_params = __va(XIP_FIXUP(dtb_early_pa)); > unflatten_and_copy_device_tree(); > #else > if (early_init_dt_verify(__va(XIP_FIXUP(dtb_early_pa)))) > > >> [ 0.000000] Oops [#1] >> [ 0.000000] Modules linked in: >> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted >> 6.0.0-rc1-00001-g0d9d6953d834 #1 >> [ 0.000000] Hardware name: Microchip PolarFire-SoC Icicle Kit (DT) >> [ 0.000000] epc : string+0x4a/0xea >> [ 0.000000] ra : vsnprintf+0x1e4/0x336 >> [ 0.000000] epc : ffffffff80335ea0 ra : ffffffff80338936 sp : >> ffffffff81203be0 >> [ 0.000000] gp : ffffffff812e0a98 tp : ffffffff8120de40 t0 : >> 0000000000000000 >> [ 0.000000] t1 : ffffffff81203e28 t2 : 7265736572203a46 s0 : >> ffffffff81203c20 >> [ 0.000000] s1 : ffffffff81203e28 a0 : ffffffff81203d22 a1 : >> 0000000000000000 >> [ 0.000000] a2 : ffffffff81203d08 a3 : 0000000081203d21 a4 : >> ffffffffffffffff >> [ 0.000000] a5 : 00000000401c31ac a6 : ffff0a00ffffff04 a7 : >> ffffffffffffffff >> [ 0.000000] s2 : ffffffff81203d08 s3 : ffffffff81203d00 s4 : >> 0000000000000008 >> [ 0.000000] s5 : ffffffff000000ff s6 : 0000000000ffffff s7 : >> 00000000ffffff00 >> [ 0.000000] s8 : ffffffff80d9821a s9 : ffffffff81203d22 s10: >> 0000000000000002 >> [ 0.000000] s11: ffffffff80d9821c t3 : ffffffff812f3617 t4 : >> ffffffff812f3617 >> [ 0.000000] t5 : ffffffff812f3618 t6 : ffffffff81203d08 >> [ 0.000000] status: 0000000200000100 badaddr: 00000000401c31ac >> cause: 000000000000000d >> [ 0.000000] [<ffffffff80338936>] vsnprintf+0x1e4/0x336 >> [ 0.000000] [<ffffffff80055ae2>] vprintk_store+0xf6/0x344 >> [ 0.000000] [<ffffffff80055d86>] vprintk_emit+0x56/0x192 >> [ 0.000000] [<ffffffff80055ed8>] vprintk_default+0x16/0x1e >> [ 0.000000] [<ffffffff800563d2>] vprintk+0x72/0x80 >> [ 0.000000] [<ffffffff806813b2>] _printk+0x36/0x50 >> [ 0.000000] [<ffffffff8068af48>] print_reserved_mem+0x1c/0x24 >> [ 0.000000] [<ffffffff808057ec>] paging_init+0x528/0x5bc >> [ 0.000000] [<ffffffff808031ae>] setup_arch+0xd0/0x592 >> [ 0.000000] [<ffffffff8080070e>] start_kernel+0x82/0x73c >> [ 0.000000] ---[ end trace 0000000000000000 ]--- >> [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle >> task! >> [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill >> the idle task! ]--- >> >> We traced this back to early_init_fdt_scan_reserved_mem() in >> setup_bootmem() - moving it later back up the boot sequence to >> after the dt has been remapped etc has fixed the problem for us. >> >> The least movement to get it working is attached, and also pushed >> here: git.kernel.org/conor/c/1735589baefc >> >> The second problem is a bit more complicated to explain - but we >> found the solution conflicted with the remoteproc fix as we had >> to move early_init_fdt_scan_reserved_mem() _earlier_ in the boot >> process to solve this one. >> >> We want to have a node in our devicetree that contains some memory >> that is non-cached & marked as reserved-memory. Maybe we have just >> missed something, but from what we've seen: >> - the really early setup looks at the dtb, picks the highest bit >> of memory and puts the dtb etc there so it can start using it >> - early_init_fdt_scan_reserved_mem() is then called, which figures >> out if memory is reserved or not. >> >> Unfortunately, the highest bit of memory is the non-cached bit so >> everything falls over, but we can avoid this by moving the call to >> early_init_fdt_scan_reserved_mem() above the dtb memblock alloc that >> takes place right before it in setup_bootmem(). > > > And then I suppose the allocations you are mentioning happen in > unflatten_XXX, so parsing the device tree for reserved memory nodes > before this should do the trick. Does the following fix your second > issue? > > diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c > index 2b09f0bd8432..94b3d049fe9d 100644 > --- a/arch/riscv/kernel/setup.c > +++ b/arch/riscv/kernel/setup.c > @@ -277,14 +277,15 @@ void __init setup_arch(char **cmdline_p) > paging_init(); > #if IS_ENABLED(CONFIG_BUILTIN_DTB) > initial_boot_params = __va(XIP_FIXUP(dtb_early_pa)); > + early_init_fdt_scan_reserved_mem(); > unflatten_and_copy_device_tree(); > #else > - if (early_init_dt_verify(__va(XIP_FIXUP(dtb_early_pa)))) > + if (early_init_dt_verify(__va(XIP_FIXUP(dtb_early_pa)))) { > + early_init_fdt_scan_reserved_mem(); > unflatten_device_tree(); > - else > + } else > pr_err("No DTB found in kernel mappings\n"); > #endif > - early_init_fdt_scan_reserved_mem(); > misc_mem_init(); > > init_resources(); > > > >> >> Obviously, both of these changes are moving the function call in >> opposite directions and we can only really do one of them. We are not >> sure if what we are doing with the non-cached reserved-memory section >> is just not permitted & cannot work - or if this is something that >> was overlooked for RISC-V specifically and works for other archs. >> >> It does seem like the first issue is a real bug, and I am happy to >> submit the patch for that whenever - but having two problems with >> opposite fixes seemed as if there was something else lurking that we >> just don't have enough understanding to detect. >> >> Any help would be great! >> >> Thanks, >> Conor. >> > > Even if that does not fix your issue, the first patch is necessary as > it fixes initial_boot_params. > > >> >> >> _______________________________________________ >> linux-riscv mailing list >> linux-riscv@lists.infradead.org >> http://lists.infradead.org/mailman/listinfo/linux-riscv
On Thu, Mar 09, 2023 at 11:30:07AM +0100, Alexandre Ghiti wrote: > Hi Conor, Hey Alex! > On 8/16/22 22:41, Conor.Dooley@microchip.com wrote: > > Hey all, > > We've run into a bit of a problem with reserved memory on PolarFire, or > > more accurately a pair of problems that seem to have opposite fixes. > > > > The first of these problems is triggered when trying to implement a > > remoteproc driver. To get the reserved memory buffer, remoteproc > > does an of_reserved_mem_lookup(), something like: > > > > np = of_parse_phandle(pdev->of_node, "memory-region", 0); > > if (!np) > > return -EINVAL; > > > > rmem = of_reserved_mem_lookup(np); > > if (!rmem) > > return -EINVAL; > > > > of_reserved_mem_lookup() then uses reserved_mem[i].name to try and find > > a match - but this was triggering kernel panics for us. We did some > > debugging and found that the name string's pointer was pointing to an > > address in the 0x4000_0000 range. The minimum reproduction for this > > > 0x4000_0000 corresponds to DTB_EARLY_BASE_VA: this is the address that is > used to map the dtb before we can access it using the linear mapping. > > > > crash is attached - it hacks in some print_reserved_mem()s into > > setup_vm_final() around a tlb flush so you can see the before/after. > > (You'll need a reserved memory node in your dts to replicate) > > > > The output is like so, with the same crash as in the remoteproc driver: > > > > [ 0.000000] Linux version 6.0.0-rc1-00001-g0d9d6953d834 (conor@wendy) (riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Aug 16 13:42:09 IST 2022 > > [ 0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000 > > [ 0.000000] Machine model: Microchip PolarFire-SoC Icicle Kit > > [ 0.000000] earlycon: ns16550a0 at MMIO32 0x0000000020100000 (options '115200n8') > > [ 0.000000] printk: bootconsole [ns16550a0] enabled > > [ 0.000000] printk: debug: skip boot console de-registration. > > [ 0.000000] efi: UEFI not found. > > [ 0.000000] before flush > > [ 0.000000] OF: reserved mem: debug name is fabricbuf@ae000000 > > [ 0.000000] after flush > > [ 0.000000] Unable to handle kernel paging request at virtual address 00000000401c31ac > > > You take the trap here because the mapping for the dtb does not exist in > swapper_pg_dir, but you don't need this mapping anymore as you can access > the device tree through the linear mapping now. > > I would say that: you build your kernel with CONFIG_BUILTIN_DTB I do not have any "NONPORTABLE" options selected. > and then you > don't call early_init_dt_verify which resets initial_boot_params to the > linear mapping address (it was initially set to 0x4000_0000 in parse_dtb). > If that's the case, does the following fix your issue? I dunno, I don't hit this issue anymore as of 50e63dd8ed92 ("riscv: fix reserved memory setup"). That might not be the right fix in the grand scheme of things, but it solved problems that both I and others were hitting. > diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c > index 376d2827e736..2b09f0bd8432 100644 > --- a/arch/riscv/kernel/setup.c > +++ b/arch/riscv/kernel/setup.c > @@ -276,6 +276,7 @@ void __init setup_arch(char **cmdline_p) > efi_init(); > paging_init(); > #if IS_ENABLED(CONFIG_BUILTIN_DTB) > + initial_boot_params = __va(XIP_FIXUP(dtb_early_pa)); > unflatten_and_copy_device_tree(); > #else > if (early_init_dt_verify(__va(XIP_FIXUP(dtb_early_pa)))) I'll try and allocate some time for looking at this next week, got some items on my todo list first. I doubt it'll help me cos I have not get that option set, but I can go and test reserved memory stuff for systems that do use it (I have a k210). > > [ 0.000000] Oops [#1] > > [ 0.000000] Modules linked in: > > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.0.0-rc1-00001-g0d9d6953d834 #1 > > [ 0.000000] Hardware name: Microchip PolarFire-SoC Icicle Kit (DT) > > [ 0.000000] epc : string+0x4a/0xea > > [ 0.000000] ra : vsnprintf+0x1e4/0x336 > > [ 0.000000] epc : ffffffff80335ea0 ra : ffffffff80338936 sp : ffffffff81203be0 > > [ 0.000000] gp : ffffffff812e0a98 tp : ffffffff8120de40 t0 : 0000000000000000 > > [ 0.000000] t1 : ffffffff81203e28 t2 : 7265736572203a46 s0 : ffffffff81203c20 > > [ 0.000000] s1 : ffffffff81203e28 a0 : ffffffff81203d22 a1 : 0000000000000000 > > [ 0.000000] a2 : ffffffff81203d08 a3 : 0000000081203d21 a4 : ffffffffffffffff > > [ 0.000000] a5 : 00000000401c31ac a6 : ffff0a00ffffff04 a7 : ffffffffffffffff > > [ 0.000000] s2 : ffffffff81203d08 s3 : ffffffff81203d00 s4 : 0000000000000008 > > [ 0.000000] s5 : ffffffff000000ff s6 : 0000000000ffffff s7 : 00000000ffffff00 > > [ 0.000000] s8 : ffffffff80d9821a s9 : ffffffff81203d22 s10: 0000000000000002 > > [ 0.000000] s11: ffffffff80d9821c t3 : ffffffff812f3617 t4 : ffffffff812f3617 > > [ 0.000000] t5 : ffffffff812f3618 t6 : ffffffff81203d08 > > [ 0.000000] status: 0000000200000100 badaddr: 00000000401c31ac cause: 000000000000000d > > [ 0.000000] [<ffffffff80338936>] vsnprintf+0x1e4/0x336 > > [ 0.000000] [<ffffffff80055ae2>] vprintk_store+0xf6/0x344 > > [ 0.000000] [<ffffffff80055d86>] vprintk_emit+0x56/0x192 > > [ 0.000000] [<ffffffff80055ed8>] vprintk_default+0x16/0x1e > > [ 0.000000] [<ffffffff800563d2>] vprintk+0x72/0x80 > > [ 0.000000] [<ffffffff806813b2>] _printk+0x36/0x50 > > [ 0.000000] [<ffffffff8068af48>] print_reserved_mem+0x1c/0x24 > > [ 0.000000] [<ffffffff808057ec>] paging_init+0x528/0x5bc > > [ 0.000000] [<ffffffff808031ae>] setup_arch+0xd0/0x592 > > [ 0.000000] [<ffffffff8080070e>] start_kernel+0x82/0x73c > > [ 0.000000] ---[ end trace 0000000000000000 ]--- > > [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task! > > [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- > > > > We traced this back to early_init_fdt_scan_reserved_mem() in > > setup_bootmem() - moving it later back up the boot sequence to > > after the dt has been remapped etc has fixed the problem for us. > > > > The least movement to get it working is attached, and also pushed > > here: git.kernel.org/conor/c/1735589baefc > > > > The second problem is a bit more complicated to explain - but we > > found the solution conflicted with the remoteproc fix as we had > > to move early_init_fdt_scan_reserved_mem() _earlier_ in the boot > > process to solve this one. > > > > We want to have a node in our devicetree that contains some memory > > that is non-cached & marked as reserved-memory. Maybe we have just > > missed something, but from what we've seen: > > - the really early setup looks at the dtb, picks the highest bit > > of memory and puts the dtb etc there so it can start using it > > - early_init_fdt_scan_reserved_mem() is then called, which figures > > out if memory is reserved or not. > > > > Unfortunately, the highest bit of memory is the non-cached bit so > > everything falls over, but we can avoid this by moving the call to > > early_init_fdt_scan_reserved_mem() above the dtb memblock alloc that > > takes place right before it in setup_bootmem(). > > > And then I suppose the allocations you are mentioning happen in > unflatten_XXX, so parsing the device tree for reserved memory nodes before > this should do the trick. Does the following fix your second issue? Again, I'll have to look at this. To give you a quick answer, I believe it is the create_pgd_mapping() stuff in setup_vm_final() actually. I'm not super sure about this particular bit of the issue, but in the email I sent more recently that expands on these issues (that Mike replied to the other day) the similar issue that I saw the allocations in reserved memory were inside paging_init(). > diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c > index 2b09f0bd8432..94b3d049fe9d 100644 > --- a/arch/riscv/kernel/setup.c > +++ b/arch/riscv/kernel/setup.c > @@ -277,14 +277,15 @@ void __init setup_arch(char **cmdline_p) > paging_init(); > #if IS_ENABLED(CONFIG_BUILTIN_DTB) > initial_boot_params = __va(XIP_FIXUP(dtb_early_pa)); > + early_init_fdt_scan_reserved_mem(); > unflatten_and_copy_device_tree(); > #else > - if (early_init_dt_verify(__va(XIP_FIXUP(dtb_early_pa)))) > + if (early_init_dt_verify(__va(XIP_FIXUP(dtb_early_pa)))) { > + early_init_fdt_scan_reserved_mem(); > unflatten_device_tree(); Previously, this was problematic (doing the scan before unflattening). I moved the scan to the earliest possible place that I could call it without issues. I'll get back to you... > - else > + } else > pr_err("No DTB found in kernel mappings\n"); > #endif > - early_init_fdt_scan_reserved_mem(); > misc_mem_init(); > > init_resources(); > > > > > > > Obviously, both of these changes are moving the function call in > > opposite directions and we can only really do one of them. We are not > > sure if what we are doing with the non-cached reserved-memory section > > is just not permitted & cannot work - or if this is something that > > was overlooked for RISC-V specifically and works for other archs. > > > > It does seem like the first issue is a real bug, and I am happy to > > submit the patch for that whenever - but having two problems with > > opposite fixes seemed as if there was something else lurking that we > > just don't have enough understanding to detect. > > > > Any help would be great! > > > > Thanks, > > Conor. > > > > Even if that does not fix your issue, the first patch is necessary as it > fixes initial_boot_params. Sure, send a patch (with a nice commit message please, so that I don't have to remember the whole process to review it!!) as my brain has long since swapped out the nitty gritty of the unflattening -> scanning process. Thanks Alex! Conor.
On 3/7/23 12:35, Mike Rapoport wrote: > Hi Conor, > > Sorry for the delay, somehow this slipped between the cracks. > > On Thu, Feb 02, 2023 at 10:01:26PM +0000, Conor Dooley wrote: >> Hullo Palmer, Mike & whoever else may read this, >> >> Just reviving this thread from a little while ago as I have been in the >> area again recently... > TBH, I didn't really dig deep into the issues, but the thought I had was > what if DT was mapped via fixmap until the setup_vm_final() and then it > would be possible to call DT methods early. > > Could be I'm shooting in the dark :) I think I understand the issue now, it's because In riscv, we establish 2 different virtual mappings and we map the device tree at 2 different virtual addresses, which is the problem. So to me, the solution is: - to revert your previous fix, that is calling early_init_fdt_scan_reserved_mem() before any call to memblock_alloc() (which could result in an allocation in the area you want to reserve) - to map the device tree at the same virtual address, because early_init_fdt_scan_reserved_mem() initializes reserved_mem with the dtb mapping established in setup_vm() and uses reserved_mem with the new mapping from setup_vm_final (which is what Mike proposes, we should use the fixmap region to have the same virtual addresses) Hope that makes sense: I'll come up with something this afternoon for you to test! Sorry for the first completely wrong email, Thanks, Alex > >> On Tue, Aug 16, 2022 at 08:41:05PM +0000, Conor.Dooley@microchip.com wrote: >>> Hey all, >>> We've run into a bit of a problem with reserved memory on PolarFire, or >>> more accurately a pair of problems that seem to have opposite fixes. >>> >>> The first of these problems is triggered when trying to implement a >>> remoteproc driver. To get the reserved memory buffer, remoteproc >>> does an of_reserved_mem_lookup(), something like: >>> >>> np = of_parse_phandle(pdev->of_node, "memory-region", 0); >>> if (!np) >>> return -EINVAL; >>> >>> rmem = of_reserved_mem_lookup(np); >>> if (!rmem) >>> return -EINVAL; >>> >>> of_reserved_mem_lookup() then uses reserved_mem[i].name to try and find >>> a match - but this was triggering kernel panics for us. We did some >>> debugging and found that the name string's pointer was pointing to an >>> address in the 0x4000_0000 range. The minimum reproduction for this >>> crash is attached - it hacks in some print_reserved_mem()s into >>> setup_vm_final() around a tlb flush so you can see the before/after. >>> (You'll need a reserved memory node in your dts to replicate) >>> >>> The output is like so, with the same crash as in the remoteproc driver: >>> >>> [ 0.000000] Linux version 6.0.0-rc1-00001-g0d9d6953d834 (conor@wendy) (riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Aug 16 13:42:09 IST 2022 >> [...] >> >>> [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- >>> >>> We traced this back to early_init_fdt_scan_reserved_mem() in >>> setup_bootmem() - moving it later back up the boot sequence to >>> after the dt has been remapped etc has fixed the problem for us. >>> >>> The least movement to get it working is attached, and also pushed >>> here: git.kernel.org/conor/c/1735589baefc >> This one is fixed now, as of commit 50e63dd8ed92 ("riscv: fix reserved >> memory setup"). >> >>> The second problem is a bit more complicated to explain - but we >>> found the solution conflicted with the remoteproc fix as we had >>> to move early_init_fdt_scan_reserved_mem() _earlier_ in the boot >>> process to solve this one. >>> >>> We want to have a node in our devicetree that contains some memory >>> that is non-cached & marked as reserved-memory. Maybe we have just >>> missed something, but from what we've seen: >>> - the really early setup looks at the dtb, picks the highest bit >>> of memory and puts the dtb etc there so it can start using it >>> - early_init_fdt_scan_reserved_mem() is then called, which figures >>> out if memory is reserved or not. >>> >>> Unfortunately, the highest bit of memory is the non-cached bit so >>> everything falls over, but we can avoid this by moving the call to >>> early_init_fdt_scan_reserved_mem() above the dtb memblock alloc that >>> takes place right before it in setup_bootmem(). >>> >>> Obviously, both of these changes are moving the function call in >>> opposite directions and we can only really do one of them. We are not >>> sure if what we are doing with the non-cached reserved-memory section >>> is just not permitted & cannot work - or if this is something that >>> was overlooked for RISC-V specifically and works for other archs. >> We ended up working around this one by making sure that U-Boot loaded >> the dtb to somewhere that would be inside the kernel's memory map, thus >> avoiding the remapping in the first place. >> >> We did run into another problem recently though, and 50e63dd8ed92 is >> kinda at fault for it. >> This particular issue was encountered with a devicetree where the >> top-most memory region was entirely reserved & was not observed prior >> to my fix for the first issue. >> >> On RISC-V, the boot sequence is something like: >> setup_bootmem(); >> setup_vm_final(); >> unflatten_device_tree(); >> early_init_fdt_scan_reserved_mem(); >> >> Whereas, before my patch it used to be (give-or-take): >> setup_bootmem(); >> early_init_fdt_scan_reserved_mem(); >> setup_vm_final(); >> unflatten_device_tree(); >> >> The difference being that we used to have scanned the reserved memory >> regions before calling setup_vm_final() & therefore know which regions >> we cannot use. As a reminder, calling early_init_fdt_scan_reserved_mem() >> before we've got the dt in a proper virtual memory address will cause >> the kernel to panic if it tries to read a reserved memory node's label. >> >> As we are now calling setup_vm_final() *before* we know what the >> reserved memory regions are & as RISC-V allocates memblocks from the top >> down, the allocations in setup_vm_final() will be done in the highest >> memory region. >> When early_init_fdt_scan_reserved_mem() then tries to reserve the >> entirety of that top-most memory region, the reservation fails as part >> of this region has already been allocated. >> In the scenario where I found this bug, that top-most region is non- >> cached memory & the kernel ends up panicking. >> The memblock debug code made this pretty easy to spot, otherwise I'd >> probably have spent more than just a few hours trying to figure out why >> it was panicking! >> >> My "this needs to be fixed today" solution for this problem was calling >> memblock_set_bottom_up(true) in setup_bootmem() & that's what we are >> going to carry downstream for now. >> >> I haven't tested it (yet) but I suspect that it would also fix our >> problem of the dtb being remapped into a non-cached region of memory >> that we would later go on to reserve too. Non-cached being an issue >> mainly due to the panicking, but failing to reserve (and using!) memory >> regions that are meant to be reserved is very far from ideal even when >> they are memory that the kernel can actually use. >> >> I have no idea if that is an acceptable solution for upstream though, so >> I guess this is me putting out feelers as to whether this is something I >> should send a patch to do *OR* if this is another sign of the issues >> that you (Mike, Palmer) mentioned in the past. >> If it isn't an acceptable solution, I'm not really too sure how to >> proceed! >> >> Cheers, >> Conor. >> > >
On Thu, Mar 09, 2023 at 01:45:05PM +0100, Alexandre Ghiti wrote: > > On 3/7/23 12:35, Mike Rapoport wrote: > > Hi Conor, > > > > Sorry for the delay, somehow this slipped between the cracks. > > > > On Thu, Feb 02, 2023 at 10:01:26PM +0000, Conor Dooley wrote: > > > Hullo Palmer, Mike & whoever else may read this, > > > > > > Just reviving this thread from a little while ago as I have been in the > > > area again recently... > > TBH, I didn't really dig deep into the issues, but the thought I had was > > what if DT was mapped via fixmap until the setup_vm_final() and then it > > would be possible to call DT methods early. > > > > Could be I'm shooting in the dark :) > > > I think I understand the issue now, it's because In riscv, we establish 2 > different virtual mappings and we map the device tree at 2 different virtual > addresses, which is the problem. > > So to me, the solution is: > > - to revert your previous fix, that is calling > early_init_fdt_scan_reserved_mem() before any call to memblock_alloc() > (which could result in an allocation in the area you want to reserve) > > - to map the device tree at the same virtual address, because > early_init_fdt_scan_reserved_mem() initializes reserved_mem with the dtb > mapping established in setup_vm() and uses reserved_mem with the new mapping > from setup_vm_final (which is what Mike proposes, we should use the fixmap > region to have the same virtual addresses) > > Hope that makes sense: I'll come up with something this afternoon for you to > test! Sounds good. Please give me some ELI5 commit messages if you can, explanations for this stuff (which I found took a lot of archaeology to understand) would be very welcome next time we need to go back looking at this stuff. Thanks Alex! Conor.
On 3/9/23 13:51, Conor Dooley wrote: > On Thu, Mar 09, 2023 at 01:45:05PM +0100, Alexandre Ghiti wrote: >> On 3/7/23 12:35, Mike Rapoport wrote: >>> Hi Conor, >>> >>> Sorry for the delay, somehow this slipped between the cracks. >>> >>> On Thu, Feb 02, 2023 at 10:01:26PM +0000, Conor Dooley wrote: >>>> Hullo Palmer, Mike & whoever else may read this, >>>> >>>> Just reviving this thread from a little while ago as I have been in the >>>> area again recently... >>> TBH, I didn't really dig deep into the issues, but the thought I had was >>> what if DT was mapped via fixmap until the setup_vm_final() and then it >>> would be possible to call DT methods early. >>> >>> Could be I'm shooting in the dark :) >> >> I think I understand the issue now, it's because In riscv, we establish 2 >> different virtual mappings and we map the device tree at 2 different virtual >> addresses, which is the problem. >> >> So to me, the solution is: >> >> - to revert your previous fix, that is calling >> early_init_fdt_scan_reserved_mem() before any call to memblock_alloc() >> (which could result in an allocation in the area you want to reserve) >> >> - to map the device tree at the same virtual address, because >> early_init_fdt_scan_reserved_mem() initializes reserved_mem with the dtb >> mapping established in setup_vm() and uses reserved_mem with the new mapping >> from setup_vm_final (which is what Mike proposes, we should use the fixmap >> region to have the same virtual addresses) >> >> Hope that makes sense: I'll come up with something this afternoon for you to >> test! > Sounds good. Please give me some ELI5 commit messages if you can, > explanations for this stuff (which I found took a lot of archaeology to > understand) would be very welcome next time we need to go back looking > at this stuff. Can you give it a try here: https://github.com/AlexGhiti/riscv-linux/commits/dev/alex/conor_dtb_fixmap_v1 ? That works for me but I need to carefully explain and check that's correct though, not upstreamable as is. Thanks, Alex > Thanks Alex! > Conor. > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv
On Thu, Mar 09, 2023 at 04:12:57PM +0100, Alexandre Ghiti wrote: > On 3/9/23 13:51, Conor Dooley wrote: > > On Thu, Mar 09, 2023 at 01:45:05PM +0100, Alexandre Ghiti wrote: > > > On 3/7/23 12:35, Mike Rapoport wrote: > > > > Hi Conor, > > > > > > > > Sorry for the delay, somehow this slipped between the cracks. > > > > > > > > On Thu, Feb 02, 2023 at 10:01:26PM +0000, Conor Dooley wrote: > > > > > Hullo Palmer, Mike & whoever else may read this, > > > > > > > > > > Just reviving this thread from a little while ago as I have been in the > > > > > area again recently... > > > > TBH, I didn't really dig deep into the issues, but the thought I had was > > > > what if DT was mapped via fixmap until the setup_vm_final() and then it > > > > would be possible to call DT methods early. > > > > > > > > Could be I'm shooting in the dark :) > > > > > > I think I understand the issue now, it's because In riscv, we establish 2 > > > different virtual mappings and we map the device tree at 2 different virtual > > > addresses, which is the problem. > > > > > > So to me, the solution is: > > > > > > - to revert your previous fix, that is calling > > > early_init_fdt_scan_reserved_mem() before any call to memblock_alloc() > > > (which could result in an allocation in the area you want to reserve) > > > > > > - to map the device tree at the same virtual address, because > > > early_init_fdt_scan_reserved_mem() initializes reserved_mem with the dtb > > > mapping established in setup_vm() and uses reserved_mem with the new mapping > > > from setup_vm_final (which is what Mike proposes, we should use the fixmap > > > region to have the same virtual addresses) > > > > > > Hope that makes sense: I'll come up with something this afternoon for you to > > > test! > > Sounds good. Please give me some ELI5 commit messages if you can, > > explanations for this stuff (which I found took a lot of archaeology to > > understand) would be very welcome next time we need to go back looking > > at this stuff. > > > Can you give it a try here: > https://github.com/AlexGhiti/riscv-linux/commits/dev/alex/conor_dtb_fixmap_v1 > ? > > That works for me but I need to carefully explain and check that's correct > though, not upstreamable as is. Hey Alex, So I ended up being pretty sick & had to take a week off. I gave this an initial spin today & it appears to work. I'll take it for a longer test-drive when you send a "real" patch for it, but I tested both the lookup by name & the situation that was allocating in reserved memory and both were not an issue. Thanks for working on this, Conor.
From 14447dc618a3007bf17dd27c03f7fec095efbc6f Mon Sep 17 00:00:00 2001 From: Valentina Fernandez <valentina.fernandezalanis@microchip.com> Date: Wed, 13 Jul 2022 10:56:47 +0100 Subject: [PATCH] Debug tlb_flush with reserved memory --- arch/riscv/boot/dts/microchip/Makefile | 1 + .../microchip/mpfs-icicle-kit-context-a.dts | 203 ++++++++++++++++++ arch/riscv/mm/init.c | 9 +- drivers/of/of_reserved_mem.c | 6 + include/linux/of_reserved_mem.h | 2 + 5 files changed, 220 insertions(+), 1 deletion(-) create mode 100644 arch/riscv/boot/dts/microchip/mpfs-icicle-kit-context-a.dts diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c index d466ec670e1f..5458519c3eb4 100644 --- a/arch/riscv/mm/init.c +++ b/arch/riscv/mm/init.c @@ -20,7 +20,7 @@ #include <linux/dma-map-ops.h> #include <linux/crash_dump.h> #include <linux/hugetlb.h> - +#include <linux/of_reserved_mem.h> #include <asm/fixmap.h> #include <asm/tlbflush.h> #include <asm/sections.h> @@ -1112,8 +1112,15 @@ static void __init setup_vm_final(void) /* Move to swapper page table */ csr_write(CSR_SATP, PFN_DOWN(__pa_symbol(swapper_pg_dir)) | satp_mode); + + pr_err("before flush\n"); + print_reserved_mem(); + local_flush_tlb_all(); + pr_err("after flush\n"); + print_reserved_mem(); + pt_ops_set_late(); } #else diff --git a/drivers/of/of_reserved_mem.c b/drivers/of/of_reserved_mem.c index 75caa6f5d36f..6aaebb05730d 100644 --- a/drivers/of/of_reserved_mem.c +++ b/drivers/of/of_reserved_mem.c @@ -445,3 +445,9 @@ struct reserved_mem *of_reserved_mem_lookup(struct device_node *np) return NULL; } EXPORT_SYMBOL_GPL(of_reserved_mem_lookup); + +void print_reserved_mem(void) +{ + pr_err("debug name is %s\n", reserved_mem[0].name); +} +EXPORT_SYMBOL_GPL(print_reserved_mem); \ No newline at end of file diff --git a/include/linux/of_reserved_mem.h b/include/linux/of_reserved_mem.h index 4de2a24cadc9..1fe504af3944 100644 --- a/include/linux/of_reserved_mem.h +++ b/include/linux/of_reserved_mem.h @@ -40,6 +40,8 @@ int of_reserved_mem_device_init_by_name(struct device *dev, void of_reserved_mem_device_release(struct device *dev); struct reserved_mem *of_reserved_mem_lookup(struct device_node *np); + +void print_reserved_mem(void); #else #define RESERVEDMEM_OF_DECLARE(name, compat, init) \ -- 2.25.1