Message ID | 20221108090219.3285030-1-guoren@kernel.org (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | riscv: asid: Fixup stale TLB entry cause application crash | expand |
Context | Check | Description |
---|---|---|
conchuod/patch_count | success | Link |
conchuod/cover_letter | success | Single patches do not need cover letters |
conchuod/tree_selection | success | Guessed tree name to be fixes |
conchuod/fixes_present | success | Fixes tag present in non-next series |
conchuod/verify_signedoff | success | Signed-off-by tag matches author and committer |
conchuod/kdoc | success | Errors and warnings before: 0 this patch: 0 |
conchuod/module_param | success | Was 0 now: 0 |
conchuod/build_rv32_defconfig | success | Build OK |
conchuod/build_warn_rv64 | success | Errors and warnings before: 0 this patch: 0 |
conchuod/dtb_warn_rv64 | success | Errors and warnings before: 0 this patch: 0 |
conchuod/header_inline | success | No static functions without inline keyword in header files |
conchuod/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 7 lines checked |
conchuod/source_inline | success | Was 0 now: 0 |
conchuod/build_rv64_nommu_k210_defconfig | success | Build OK |
conchuod/verify_fixes | success | Fixes tag looks correct |
conchuod/build_rv64_nommu_virt_defconfig | success | Build OK |
On Tue, Nov 8, 2022 at 5:02 PM <guoren@kernel.org> wrote: > > From: Guo Ren <guoren@linux.alibaba.com> > > When use_asid_allocator is enabled, the userspace application would > crash by stale tlb entry. Because only using cpumask_clear_cpu without > local_flush_tlb_all couldn't guarantee CPU's tlb contains fresh mapping > entry. Then set_mm_asid would cause user space application get a stale > value by stale tlb entry, but set_mm_noasid is okay. > > Here is the symptom of the bug: > unhandled signal 11 code 0x1 (coredump) > 0x0000003fd6d22524 <+4>: auipc s0,0x70 > 0x0000003fd6d22528 <+8>: ld s0,-148(s0) # 0x3fd6d92490 > => 0x0000003fd6d2252c <+12>: ld a5,0(s0) > (gdb) i r s0 > s0 0x8082ed1cc3198b21 0x8082ed1cc3198b21 > (gdb) x/16 0x3fd6d92490 > 0x3fd6d92490: 0xd80ac8a8 0x0000003f > The core dump file show us the value of register s0 is wrong, but the > value in memory is right. > > When task run on CPU0, the task loaded/speculative-loaded the value of > address-0x3fd6d92490, the first version tlb mapping enter in CPU0's tlb. > When the task switched from CPU0 to CPU1 without local_tlb_flush_all > (because of asid), the task happened to write a value on address: > 0x3fd6d92490 that caused do_page_fault -> wp_page_copy -> > ptep_clear_flush -> ptep_get_and_clear & flush_tlb_page. > The flush_tlb_page would use mm_cpumask(mm) to determine which CPUs need > tlb flush, but CPU0 cleared the CPU0's mm_cpumask in previous switch_mm. > So we only flushed the CPU1's tlb entry, and setted second version mapping > of the pte. When the task switch from CPU1 to CPU0 again, it still used a > stale tlb entry on CPU0 which contained a wrong target physical address. > When the task happened to read that value, the bug would be raised. > > Fixes: 65d4b9c53017 ("RISC-V: Implement ASID allocator") > Signed-off-by: Guo Ren <guoren@linux.alibaba.com> > Signed-off-by: Guo Ren <guoren@kernel.org> > Cc: Anup Patel <apatel@ventanamicro.com> > Cc: Palmer Dabbelt <palmer@rivosinc.com> > --- > arch/riscv/mm/context.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c > index 7acbfbd14557..843e86b63532 100644 > --- a/arch/riscv/mm/context.c > +++ b/arch/riscv/mm/context.c > @@ -317,7 +317,6 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next, > */ > cpu = smp_processor_id(); > > - cpumask_clear_cpu(cpu, mm_cpumask(prev)); It should be: - cpumask_clear_cpu(cpu, mm_cpumask(prev)); + if (static_branch_unlikely(&use_asid_allocator)) + cpumask_clear_cpu(cpu, mm_cpumask(prev)); + No bug on noasid part. > cpumask_set_cpu(cpu, mm_cpumask(next)); > > set_mm(next, cpu); > -- > 2.36.1 >
On Tue, Nov 8, 2022 at 5:25 PM Guo Ren <guoren@kernel.org> wrote: > > On Tue, Nov 8, 2022 at 5:02 PM <guoren@kernel.org> wrote: > > > > From: Guo Ren <guoren@linux.alibaba.com> > > > > When use_asid_allocator is enabled, the userspace application would > > crash by stale tlb entry. Because only using cpumask_clear_cpu without > > local_flush_tlb_all couldn't guarantee CPU's tlb contains fresh mapping > > entry. Then set_mm_asid would cause user space application get a stale > > value by stale tlb entry, but set_mm_noasid is okay. > > > > Here is the symptom of the bug: > > unhandled signal 11 code 0x1 (coredump) > > 0x0000003fd6d22524 <+4>: auipc s0,0x70 > > 0x0000003fd6d22528 <+8>: ld s0,-148(s0) # 0x3fd6d92490 > > => 0x0000003fd6d2252c <+12>: ld a5,0(s0) > > (gdb) i r s0 > > s0 0x8082ed1cc3198b21 0x8082ed1cc3198b21 > > (gdb) x/16 0x3fd6d92490 > > 0x3fd6d92490: 0xd80ac8a8 0x0000003f > > The core dump file show us the value of register s0 is wrong, but the > > value in memory is right. > > > > When task run on CPU0, the task loaded/speculative-loaded the value of > > address-0x3fd6d92490, the first version tlb mapping enter in CPU0's tlb. > > When the task switched from CPU0 to CPU1 without local_tlb_flush_all > > (because of asid), the task happened to write a value on address: > > 0x3fd6d92490 that caused do_page_fault -> wp_page_copy -> > > ptep_clear_flush -> ptep_get_and_clear & flush_tlb_page. > > The flush_tlb_page would use mm_cpumask(mm) to determine which CPUs need > > tlb flush, but CPU0 cleared the CPU0's mm_cpumask in previous switch_mm. > > So we only flushed the CPU1's tlb entry, and setted second version mapping > > of the pte. When the task switch from CPU1 to CPU0 again, it still used a > > stale tlb entry on CPU0 which contained a wrong target physical address. > > When the task happened to read that value, the bug would be raised. > > > > Fixes: 65d4b9c53017 ("RISC-V: Implement ASID allocator") > > Signed-off-by: Guo Ren <guoren@linux.alibaba.com> > > Signed-off-by: Guo Ren <guoren@kernel.org> > > Cc: Anup Patel <apatel@ventanamicro.com> > > Cc: Palmer Dabbelt <palmer@rivosinc.com> > > --- > > arch/riscv/mm/context.c | 1 - > > 1 file changed, 1 deletion(-) > > > > diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c > > index 7acbfbd14557..843e86b63532 100644 > > --- a/arch/riscv/mm/context.c > > +++ b/arch/riscv/mm/context.c > > @@ -317,7 +317,6 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next, > > */ > > cpu = smp_processor_id(); > > > > - cpumask_clear_cpu(cpu, mm_cpumask(prev)); > It should be: > > - cpumask_clear_cpu(cpu, mm_cpumask(prev)); > + if (static_branch_unlikely(&use_asid_allocator)) Sorry, if (!static_branch_unlikely(&use_asid_allocator)) I would RESEND the patch. > + cpumask_clear_cpu(cpu, mm_cpumask(prev)); > + > > No bug on noasid part. > > > cpumask_set_cpu(cpu, mm_cpumask(next)); > > > > set_mm(next, cpu); > > -- > > 2.36.1 > > > > > -- > Best Regards > Guo Ren
diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c index 7acbfbd14557..843e86b63532 100644 --- a/arch/riscv/mm/context.c +++ b/arch/riscv/mm/context.c @@ -317,7 +317,6 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next, */ cpu = smp_processor_id(); - cpumask_clear_cpu(cpu, mm_cpumask(prev)); cpumask_set_cpu(cpu, mm_cpumask(next)); set_mm(next, cpu);