Message ID | 20191016110304.44932-1-mark.rutland@arm.com (mailing list archive) |
---|---|
State | Mainlined |
Commit | 3813733595c0c7c0674d106309b04e871d54dc1c |
Headers | show |
Series | arm64: mm: fix inverted PAR_EL1.F check | expand |
Hi Mark, On 16/10/2019 12:03, Mark Rutland wrote: > When detecting a spurious EL1 translation fault, we have the CPU retry > the translation using an AT S1E1R instruction, and inspect PAR_EL1 to > determine if the fault was spurious. > > When PAR_EL1.F == 0, the AT instruction successfully translated the > address without a fault, which implies the original fault was spurious. > However, in this case we return false and treat the original fault as if > it was not spurious. > > Invert the return value so that we treat such a case as spurious. With d0b7a302d58a reverted I can use A57's #1387217[1] to hit this spurious case. (it needs planetary alignment too). With this this patch the 'spurious' message is printed, and the kernel keeps running [0]. If its useful: Tested-by: James Morse <james.morse@arm.com> Thanks, James [0] survivable spurious translation fault: [ 2186.464077] ------------[ cut here ]------------ [ 2186.468687] Ignoring spurious kernel translation fault at virtual address fffffdfffe5fd020 [ 2186.476950] WARNING: CPU: 7 PID: 0 at ../arch/arm64/mm/fault.c:315 __do_kernel_fault+0xf0/0x120 [ 2186.485634] Modules linked in: crct10dif_ce ip_tables x_tables ipv6 nf_defrag_ipv6 [ 2186.493197] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G W 5.4.0-rc3-00002-g08245819bd8b #125 [ 2186.502663] Hardware name: AMD Seattle (Rev.B0) Development Board (Overdrive) (DT) [ 2186.510219] pstate: 40000085 (nZcv daIf -PAN -UAO) [ 2186.514998] pc : __do_kernel_fault+0xf0/0x120 [ 2186.519342] lr : __do_kernel_fault+0xf0/0x120 [ 2186.606470] Call trace: [ 2186.608906] __do_kernel_fault+0xf0/0x120 [ 2186.612904] do_translation_fault+0x40/0x70 [ 2186.617075] do_mem_abort+0x3c/0x98 [ 2186.620551] el1_da+0x20/0x94 [ 2186.623507] __memcpy_fromio+0x44/0x98 [ 2186.627245] __ghes_peek_estatus.isra.13+0x54/0xc8 [ 2186.632023] ghes_proc+0x40/0x158 [ 2186.635325] ghes_poll_func+0x38/0x68 [ 2186.638977] call_timer_fn.isra.31+0x20/0x78 [ 2186.643234] run_timer_softirq+0x33c/0x398 [ 2186.647317] __do_softirq+0x114/0x230 [ 2186.650968] irq_exit+0xcc/0xd8 [ 2186.654097] __handle_domain_irq+0x60/0xb8 [ 2186.658180] gic_handle_irq+0x58/0xb0 [ 2186.661829] el1_irq+0xb8/0x180 [ 2186.664958] arch_cpu_idle+0x10/0x18 [ 2186.668521] do_idle+0x1c4/0x290 [ 2186.671736] cpu_startup_entry+0x20/0x40 [ 2186.675647] secondary_start_kernel+0x1bc/0x218 [ 2186.680165] ---[ end trace b0c5d14deb160e8f ]--- [1] https://static.docs.arm.com/epm049219/220/Arm_Cortex_A57_MPCore_Software_Developers_Errata_Notice_SDEN_v22.pdf
On Wed, Oct 16, 2019 at 05:29:27PM +0100, James Morse wrote: > On 16/10/2019 12:03, Mark Rutland wrote: > > When detecting a spurious EL1 translation fault, we have the CPU retry > > the translation using an AT S1E1R instruction, and inspect PAR_EL1 to > > determine if the fault was spurious. > > > > When PAR_EL1.F == 0, the AT instruction successfully translated the > > address without a fault, which implies the original fault was spurious. > > However, in this case we return false and treat the original fault as if > > it was not spurious. > > > > Invert the return value so that we treat such a case as spurious. > > With d0b7a302d58a reverted I can use A57's #1387217[1] to hit this spurious case. > (it needs planetary alignment too). Haha, I like your choice of words here! "Using" an erratum for testing is an interesting concept. > With this this patch the 'spurious' message is printed, and the kernel keeps running [0]. > > If its useful: > Tested-by: James Morse <james.morse@arm.com> Thanks, I'll pick this up along with the other patch to the field definition (did you have that one too?) Will
Hi Will, On 16/10/2019 17:49, Will Deacon wrote: > On Wed, Oct 16, 2019 at 05:29:27PM +0100, James Morse wrote: >> On 16/10/2019 12:03, Mark Rutland wrote: >>> When detecting a spurious EL1 translation fault, we have the CPU retry >>> the translation using an AT S1E1R instruction, and inspect PAR_EL1 to >>> determine if the fault was spurious. >>> >>> When PAR_EL1.F == 0, the AT instruction successfully translated the >>> address without a fault, which implies the original fault was spurious. >>> However, in this case we return false and treat the original fault as if >>> it was not spurious. >>> >>> Invert the return value so that we treat such a case as spurious. >> >> With d0b7a302d58a reverted I can use A57's #1387217[1] to hit this spurious case. >> (it needs planetary alignment too). > > Haha, I like your choice of words here! "Using" an erratum for testing is an > interesting concept. > >> With this this patch the 'spurious' message is printed, and the kernel keeps running [0]. >> >> If its useful: >> Tested-by: James Morse <james.morse@arm.com> > > Thanks, I'll pick this up along with the other patch to the field definition > (did you have that one too?) No... is it in rc3? git log --oneline | 08245819bd8b arm64: mm: fix inverted PAR_EL1.F check | 123260acf6ac Revert "Revert "arm64: Remove unnecessary ISBs from set_{pte,pmd,pud}"" | 4f5cafb5cb84 Linux 5.4-rc3 I ran it without Mark's patch and it died as expected. Thanks, James
On Wed, Oct 16, 2019 at 05:55:04PM +0100, James Morse wrote: > On 16/10/2019 17:49, Will Deacon wrote: > > On Wed, Oct 16, 2019 at 05:29:27PM +0100, James Morse wrote: > >> With this this patch the 'spurious' message is printed, and the kernel keeps running [0]. > >> > >> If its useful: > >> Tested-by: James Morse <james.morse@arm.com> > > > > Thanks, I'll pick this up along with the other patch to the field definition > > (did you have that one too?) > > No... is it in rc3? No: 1571197377-48650-1-git-send-email-yangyingliang@huawei.com Will
On 16/10/2019 18:12, Will Deacon wrote: > On Wed, Oct 16, 2019 at 05:55:04PM +0100, James Morse wrote: >> On 16/10/2019 17:49, Will Deacon wrote: >>> On Wed, Oct 16, 2019 at 05:29:27PM +0100, James Morse wrote: >>>> With this this patch the 'spurious' message is printed, and the kernel keeps running [0]. >>>> >>>> If its useful: >>>> Tested-by: James Morse <james.morse@arm.com> >>> >>> Thanks, I'll pick this up along with the other patch to the field definition >>> (did you have that one too?) >> >> No... is it in rc3? > > No: 1571197377-48650-1-git-send-email-yangyingliang@huawei.com Huh, so we depend on the bottom FSC bit being set if it faults. Just for completeness, tested again: | 2b41181c36d6 arm64: mm: fix inverted PAR_EL1.F check | c5ff5e2c9b97 arm64: sysreg: fix wrong PAR_EL1.F macro | 123260acf6ac Revert "Revert "arm64: Remove unnecessary ISBs from set_{pte,pmd,pud}"" | 4f5cafb5cb84 Linux 5.4-rc3 With this patch: | ------------[ cut here ]------------ | Ignoring spurious kernel translation fault at virtual address fffffdfffe5fd988 | WARNING: CPU: 7 PID: 0 at ../arch/arm64/mm/fault.c:315 __do_kernel_fault+0xf0/0x120 | Modules linked in: crct10dif_ce ip_tables x_tables ipv6 nf_defrag_ipv6 | CPU: 7 PID: 0 Comm: swapper/7 Tainted: G W 5.4.0-rc3-00003-g2b41181c36d6 #126 | Hardware name: AMD Seattle (Rev.B0) Development Board (Overdrive) (DT) | pstate: 40000085 (nZcv daIf -PAN -UAO) | pc : __do_kernel_fault+0xf0/0x120 | lr : __do_kernel_fault+0xf0/0x120 | Call trace: | __do_kernel_fault+0xf0/0x120 | do_translation_fault+0x40/0x70 | do_mem_abort+0x3c/0x98 | el1_da+0x20/0x94 | __memcpy_fromio+0x44/0x98 | __ghes_peek_estatus.isra.13+0x54/0xc8 | ghes_proc+0x40/0x158 | ghes_poll_func+0x38/0x68 | call_timer_fn.isra.31+0x20/0x78 | run_timer_softirq+0x33c/0x398 | __do_softirq+0x114/0x230 | irq_exit+0xcc/0xd8 | __handle_domain_irq+0x60/0xb8 | gic_handle_irq+0x58/0xb0 | el1_irq+0xb8/0x180 | arch_cpu_idle+0x10/0x18 | do_idle+0x1c4/0x290 | cpu_startup_entry+0x20/0x40 | secondary_start_kernel+0x1bc/0x218 | ---[ end trace e60cc177313a1b39 ]--- and without: | Unable to handle kernel paging request at virtual address fffffdfffe5fd020 | Mem abort info: | ESR = 0x96000007 | EC = 0x25: DABT (current EL), IL = 32 bits | SET = 0, FnV = 0 | EA = 0, S1PTW = 0 | Data abort info: | ISV = 0, ISS = 0x00000007 | CM = 0, WnR = 0 | swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000008003282000 | [fffffdfffe5fd020] pgd=0000008003954003, pud=0000008003955003, pmd=0000008003953003, pte=00e80087fef6f713 | Internal error: Oops: 96000007 [#1] PREEMPT SMP | Modules linked in: crct10dif_ce ip_tables x_tables ipv6 nf_defrag_ipv6 | CPU: 7 PID: 0 Comm: swapper/7 Tainted: G W 5.4.0-rc3-00002-gc5ff5e2c9b97 #127 | Hardware name: AMD Seattle (Rev.B0) Development Board (Overdrive) (DT) | pstate: 20000085 (nzCv daIf -PAN -UAO) | pc : __memcpy_fromio+0x44/0x98 | lr : ghes_copy_tofrom_phys+0x7c/0x158 | Call trace: | __memcpy_fromio+0x44/0x98 | __ghes_peek_estatus.isra.13+0x54/0xc8 | ghes_proc+0x40/0x158 | ghes_poll_func+0x38/0x68 | call_timer_fn.isra.31+0x20/0x78 | run_timer_softirq+0x33c/0x398 | __do_softirq+0x114/0x230 | irq_exit+0xcc/0xd8 | __handle_domain_irq+0x60/0xb8 | gic_handle_irq+0x58/0xb0 | el1_irq+0xb8/0x180 | arch_cpu_idle+0x10/0x18 | do_idle+0x1c4/0x290 | cpu_startup_entry+0x20/0x40 | secondary_start_kernel+0x1bc/0x218 | Code: 927df0e6 aa0003e3 910020c6 8b060006 (c8dffc85) | ---[ end trace 869b94c4c0803e42 ]--- | Kernel panic - not syncing: Fatal exception in interrupt | SMP: stopping secondary CPUs | Kernel Offset: disabled | CPU features: 0x0002,20006082 | Memory Limit: none | ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Thanks, James
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c index 855f2a7954e6..9fc6db0bcbad 100644 --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -268,8 +268,12 @@ static bool __kprobes is_spurious_el1_translation_fault(unsigned long addr, par = read_sysreg(par_el1); local_irq_restore(flags); + /* + * If we now have a valid translation, treat the translation fault as + * spurious. + */ if (!(par & SYS_PAR_EL1_F)) - return false; + return true; /* * If we got a different type of fault from the AT instruction,
When detecting a spurious EL1 translation fault, we have the CPU retry the translation using an AT S1E1R instruction, and inspect PAR_EL1 to determine if the fault was spurious. When PAR_EL1.F == 0, the AT instruction successfully translated the address without a fault, which implies the original fault was spurious. However, in this case we return false and treat the original fault as if it was not spurious. Invert the return value so that we treat such a case as spurious. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> --- arch/arm64/mm/fault.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)