Message ID | 20181214040819.58625-1-cai@lca.pw (mailing list archive) |
---|---|
State | New, archived |
Series | [v2] arm64: invalidate TLB just before turning MMU on |
On Fri, Dec 14, 2018 at 9:39 AM Qian Cai <cai@lca.pw> wrote:
>
> On this HPE Apollo 70 arm64 server with 256 CPUs, triggering a crash
> dump just hung. It has 4 threads on each core. Each 2-core share a same
> L1 and L2 caches, so that is 8 CPUs shares those. All CPUs share a same
> L3 cache.
>
> It turned out that this was due to the TLB contained stale entries (or
> uninitialized junk which just happened to look valid) before turning the
> MMU on in the second kernel which caused this instruction hung,
>
> msr     sctlr_el1, x0
>
> Although there is a local TLB flush in the second kernel in
> __cpu_setup(), it is called too early. When the time to turn the MMU on
> later, the TLB is dirty again from some reasons.
>
> Also tried to move the local TLB flush part around a bit inside
> __cpu_setup(), although it did complete kdump some times, it did trigger
> "Synchronous Exception" in EFI after a cold-reboot fairly often that
> seems no way to recover remotely without reinstalling the OS. For
> example, in those places,
>
> ENTRY(__cpu_setup)
> +       isb
>         tlbi    vmalle1
>         dsb     nsh
>
> or
>
>         mov     x0, #3 << 20
>         msr     cpacr_el1, x0
> +       tlbi    vmalle1
> +       dsb     nsh
>
> Since it is only necessary to flush local TLB right before turning the
> MMU on, just re-arrage the part a bit like the one in __primary_switch()
> within CONFIG_RANDOMIZE_BASE path, so it does not depends on other
> instructions in between that could pollute the TLB, and it no longer
> trigger "Synchronous Exception" as well.
>
> Signed-off-by: Qian Cai <cai@lca.pw>
> ---
>
> v2: merge the similar part from __cpu_setup() pointed out by James.
>
>  arch/arm64/kernel/head.S | 4 ++++
>  arch/arm64/mm/proc.S     | 3 ---
>  2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 4471f570a295..7f555dd4577e 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -771,6 +771,10 @@ ENTRY(__enable_mmu)
>  	msr	ttbr0_el1, x2			// load TTBR0
>  	msr	ttbr1_el1, x1			// load TTBR1
>  	isb
> +
> +	tlbi	vmalle1				// invalidate TLB
> +	dsb	nsh
> +
>  	msr	sctlr_el1, x0
>  	isb
>  	/*
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 2c75b0b903ae..14f68afdd57f 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -406,9 +406,6 @@ ENDPROC(idmap_kpti_install_ng_mappings)
>   */
>  	.pushsection ".idmap.text", "awx"
>  ENTRY(__cpu_setup)
> -	tlbi	vmalle1				// Invalidate local TLB
> -	dsb	nsh
> -
>  	mov	x0, #3 << 20
>  	msr	cpacr_el1, x0			// Enable FP/ASIMD
>  	mov	x0, #1 << 12			// Reset mdscr_el1 and disable
> --
> 2.17.2 (Apple Git-113)
>

Not sure why I can't reproduce on my HPE Apollo machine, so a couple
of questions:

1. How many CPUs do you enable in the kdump kernel - do you pass
'nr_cpus=1' to the kdump kernel to limit the maximum number of cores
to 1 in the kdump kernel?
2. Which firmware version do you use on your board?

Thanks,
Bhupesh
On Fri, 14 Dec 2018 at 05:08, Qian Cai <cai@lca.pw> wrote:
> Also tried to move the local TLB flush part around a bit inside
> __cpu_setup(), although it did complete kdump some times, it did trigger
> "Synchronous Exception" in EFI after a cold-reboot fairly often that
> seems no way to recover remotely without reinstalling the OS.

This doesn't make any sense to me. If the system gets into a weird
state out of cold reboot, how could this code be the culprit? Please
check your firmware, and try to reproduce the issue on a system that
doesn't have such defects.
On 12/14/18 12:01 AM, Bhupesh Sharma wrote:
> Not sure why I can't reproduce on my HPE Apollo machine, so a couple
> of questions:
> 1. How many CPUs do you enable in the kdump kernel - do you pass
> 'nr_cpus=1' to the kdump kernel to limit the maximum number of cores
> to 1 in the kdump kernel?

Yes

> 2. Which firmware version do you use on your board?

Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
        Vendor: American Megatrends Inc.
        Version: L50_5.13_1.0.6
        Release Date: 07/10/2018
        Address: 0xF0000
        Runtime Size: 64 kB
        ROM Size: 64 MB
        Characteristics:
                PCI is supported
                BIOS is upgradeable
                BIOS shadowing is allowed
                Boot from CD is supported
                Selectable boot is supported
                BIOS ROM is socketed
                ACPI is supported
                BIOS boot specification is supported
                Targeted content distribution is supported
                UEFI is supported
        BIOS Revision: 6.3
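Since the firmware question comes down to comparing a few fields of the dmidecode output between two machines, a small parser can make that comparison mechanical. This is only a minimal sketch: the function name `parse_bios_info` and the `SAMPLE` string are illustrative, and it assumes the "BIOS Information" block layout shown in the reply above.

```python
def parse_bios_info(dmidecode_text: str) -> dict:
    """Collect "Key: Value" pairs from the "BIOS Information" block.

    Lines without a value (such as "Characteristics:" and the feature
    list under it) are skipped. Parsing stops at the first blank line
    after the block header.
    """
    info = {}
    in_section = False
    for raw_line in dmidecode_text.splitlines():
        line = raw_line.strip()
        if line == "BIOS Information":
            in_section = True
            continue
        if not in_section:
            continue
        if not line:
            break
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


# Abbreviated copy of the output quoted in this thread.
SAMPLE = """\
Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
    Vendor: American Megatrends Inc.
    Version: L50_5.13_1.0.6
    Release Date: 07/10/2018
    Characteristics:
        PCI is supported
        UEFI is supported
    BIOS Revision: 6.3
"""

bios = parse_bios_info(SAMPLE)
print(bios["Version"], bios["Release Date"])
```

Running the same function on the output from both machines and comparing the resulting dictionaries would show whether the two boards really report the same firmware.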
On 12/14/18 2:23 AM, Ard Biesheuvel wrote:
> On Fri, 14 Dec 2018 at 05:08, Qian Cai <cai@lca.pw> wrote:
>> Also tried to move the local TLB flush part around a bit inside
>> __cpu_setup(), although it did complete kdump some times, it did trigger
>> "Synchronous Exception" in EFI after a cold-reboot fairly often that
>> seems no way to recover remotely without reinstalling the OS.
>
> This doesn't make any sense to me. If the system gets into a weird
> state out of cold reboot, how could this code be the culprit? Please
> check your firmware, and try to reproduce the issue on a system that
> doesn't have such defects.
>

I'll continue investigating those "Synchronous Exception" although it is kind of
hard due to I don't have any source code of the firmware to confirm it is buggy
or not.

I did manage to reproduce this kdump issue on around 5 of those server running a
fairly recent version of the firmware (07/01/2018). I don't have access to other
large CPU machines.
Hi Qian,

On Sat, Dec 15, 2018 at 7:24 AM Qian Cai <cai@lca.pw> wrote:
>
> On 12/14/18 2:23 AM, Ard Biesheuvel wrote:
> > On Fri, 14 Dec 2018 at 05:08, Qian Cai <cai@lca.pw> wrote:
> >> Also tried to move the local TLB flush part around a bit inside
> >> __cpu_setup(), although it did complete kdump some times, it did trigger
> >> "Synchronous Exception" in EFI after a cold-reboot fairly often that
> >> seems no way to recover remotely without reinstalling the OS.
> >
> > This doesn't make any sense to me. If the system gets into a weird
> > state out of cold reboot, how could this code be the culprit? Please
> > check your firmware, and try to reproduce the issue on a system that
> > doesn't have such defects.
> >
>
> I'll continue investigating those "Synchronous Exception" although it is kind of
> hard due to I don't have any source code of the firmware to confirm it is buggy
> or not.
>
> I did manage to reproduce this kdump issue on around 5 of those server running a
> fairly recent version of the firmware (07/01/2018). I don't have access to other
> large CPU machines.

Sorry I got busy with some other stuff, but as I reported earlier, I
am not able to reproduce this on my HPE apollo with the latest linus
tree as well.

Here are some details on my setup:

1. # uname -r
5.0.0-rc1+

with the following commit as the HEAD:

commit a88cc8da0279f8e481b0d90e51a0a1cffac55906 (HEAD -> master,
origin/master, origin/HEAD)
Merge: 9cb2feb4d21d 73444bc4d8f9
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Tue Jan 8 18:58:29 2019 -0800

    Merge branch 'akpm' (patches from Andrew)

2. I use the following kdump commandline:

Kernel command line: BOOT_IMAGE=(hd9,gpt2)/vmlinuz-5.0.0-rc1+ ro
irqpoll nr_cpus=1 swiotlb=noforce reset_devices
earlycon=pl011,mmio,0x402020000

3. I am able to run kdump successfully on the machine and also collect
the crash core properly:

.. snip..
kdump: saving to /sysroot//var/crash/127.0.0.1-2019-01-10-10:52:25/
kdump: saving vmcore-dmesg.txt
kdump: saving vmcore-dmesg.txt complete
kdump: saving vmcore
Copying data : [100.0 %] \ eta: 0s
kdump: saving vmcore complete
.. snip ..

4. I use the same firmware version on the board as you shared earlier:

# dmidecode | grep -A 20 -i "BIOS Information"
BIOS Information
        Vendor: American Megatrends Inc.
        Version: L50_5.13_1.0.6
        Release Date: 07/10/2018
        Address: 0xF0000
        Runtime Size: 64 kB
        ROM Size: 64 MB
        Characteristics:
                PCI is supported
                BIOS is upgradeable
                BIOS shadowing is allowed
                Boot from CD is supported
                Selectable boot is supported
                BIOS ROM is socketed
                ACPI is supported
                BIOS boot specification is supported
                Targeted content distribution is supported
                UEFI is supported
        BIOS Revision: 6.3

So, I am guessing that it might be a kdump command line issue at your end.

Thanks,
Bhupesh
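To compare kdump command lines between the two setups, the parameters can be pulled out of the "Kernel command line:" string programmatically. The following is a minimal sketch under stated assumptions: `parse_cmdline` and `kdump_nr_cpus` are hypothetical helper names, quoting inside parameter values is not handled, and `EXAMPLE_CMDLINE` is the command line quoted in the message above.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into a {name: value} dictionary.

    Bare flags such as "ro" or "irqpoll" map to None. Only the first
    "=" in a token separates name from value, so values like
    "pl011,mmio,0x402020000" are kept whole.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def kdump_nr_cpus(cmdline: str):
    """Return nr_cpus as an int, or None if the parameter is absent."""
    value = parse_cmdline(cmdline).get("nr_cpus")
    return int(value) if value is not None else None


EXAMPLE_CMDLINE = (
    "BOOT_IMAGE=(hd9,gpt2)/vmlinuz-5.0.0-rc1+ ro irqpoll nr_cpus=1 "
    "swiotlb=noforce reset_devices earlycon=pl011,mmio,0x402020000"
)

print(kdump_nr_cpus(EXAMPLE_CMDLINE))
```

On a booted kdump kernel the same helper could be fed the contents of `/proc/cmdline`, which makes it easy to diff the parameter sets of a working and a hanging configuration.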
On this HPE Apollo 70 arm64 server with 256 CPUs, triggering a crash
dump just hung. It has 4 threads on each core. Each 2-core share a same
L1 and L2 caches, so that is 8 CPUs shares those. All CPUs share a same
L3 cache.

It turned out that this was due to the TLB contained stale entries (or
uninitialized junk which just happened to look valid) before turning the
MMU on in the second kernel which caused this instruction hung,

msr     sctlr_el1, x0

Although there is a local TLB flush in the second kernel in
__cpu_setup(), it is called too early. When the time to turn the MMU on
later, the TLB is dirty again from some reasons.

Also tried to move the local TLB flush part around a bit inside
__cpu_setup(), although it did complete kdump some times, it did trigger
"Synchronous Exception" in EFI after a cold-reboot fairly often that
seems no way to recover remotely without reinstalling the OS. For
example, in those places,

ENTRY(__cpu_setup)
+       isb
        tlbi    vmalle1
        dsb     nsh

or

        mov     x0, #3 << 20
        msr     cpacr_el1, x0
+       tlbi    vmalle1
+       dsb     nsh

Since it is only necessary to flush local TLB right before turning the
MMU on, just re-arrage the part a bit like the one in __primary_switch()
within CONFIG_RANDOMIZE_BASE path, so it does not depends on other
instructions in between that could pollute the TLB, and it no longer
trigger "Synchronous Exception" as well.

Signed-off-by: Qian Cai <cai@lca.pw>
---

v2: merge the similar part from __cpu_setup() pointed out by James.

 arch/arm64/kernel/head.S | 4 ++++
 arch/arm64/mm/proc.S     | 3 ---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 4471f570a295..7f555dd4577e 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -771,6 +771,10 @@ ENTRY(__enable_mmu)
 	msr	ttbr0_el1, x2			// load TTBR0
 	msr	ttbr1_el1, x1			// load TTBR1
 	isb
+
+	tlbi	vmalle1				// invalidate TLB
+	dsb	nsh
+
 	msr	sctlr_el1, x0
 	isb
 	/*
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 2c75b0b903ae..14f68afdd57f 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -406,9 +406,6 @@ ENDPROC(idmap_kpti_install_ng_mappings)
  */
 	.pushsection ".idmap.text", "awx"
 ENTRY(__cpu_setup)
-	tlbi	vmalle1				// Invalidate local TLB
-	dsb	nsh
-
 	mov	x0, #3 << 20
 	msr	cpacr_el1, x0			// Enable FP/ASIMD
 	mov	x0, #1 << 12			// Reset mdscr_el1 and disable
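The ordering argument in the patch description can be illustrated with a toy model. This is only a sketch and not a description of real hardware: the step names, the abstract `pollute` event, and the function `mmu_enable_is_safe` are all hypothetical, and the thread does not establish what actually repopulates the TLB between `__cpu_setup()` and the write to `sctlr_el1`.

```python
def mmu_enable_is_safe(steps):
    """Replay boot steps against a toy TLB.

    Returns True if the toy TLB is empty at the moment the MMU is
    enabled, False otherwise. The TLB starts out holding a stale entry,
    standing in for whatever the crashed kernel left behind.
    """
    tlb = {"stale"}
    for step in steps:
        if step == "tlbi":
            # Models "tlbi vmalle1; dsb nsh": drop every local entry.
            tlb.clear()
        elif step == "pollute":
            # Assumption: some intervening activity inserts an entry.
            tlb.add("junk")
        elif step == "enable_mmu":
            # Models "msr sctlr_el1, x0".
            return not tlb
        # Any other step is treated as having no effect on the TLB.
    raise ValueError("sequence never enables the MMU")


# Before the patch: flush at the top of __cpu_setup(), far from the enable.
OLD_ORDER = ["tlbi", "cpu_setup", "pollute", "load_ttbr", "enable_mmu"]
# After the patch: flush in __enable_mmu, immediately before the enable.
NEW_ORDER = ["cpu_setup", "pollute", "load_ttbr", "tlbi", "enable_mmu"]

print(mmu_enable_is_safe(OLD_ORDER), mmu_enable_is_safe(NEW_ORDER))
```

In this model the early flush gives no guarantee once anything touches the TLB afterwards, while flushing as the last step before the enable leaves no window. Whether such a window exists on the affected machines is exactly what the replies in this thread question.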