Message ID | 20221108102044.3317793-1-guoren@kernel.org (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [RESEND] riscv: asid: Fixup stale TLB entry cause application crash | expand |
Context | Check | Description |
---|---|---|
conchuod/patch_count | success | Link |
conchuod/cover_letter | success | Single patches do not need cover letters |
conchuod/tree_selection | success | Guessed tree name to be fixes |
conchuod/fixes_present | success | Fixes tag present in non-next series |
conchuod/verify_signedoff | success | Signed-off-by tag matches author and committer |
conchuod/kdoc | success | Errors and warnings before: 0 this patch: 0 |
conchuod/module_param | success | Was 0 now: 0 |
conchuod/build_rv32_defconfig | success | Build OK |
conchuod/build_warn_rv64 | success | Errors and warnings before: 0 this patch: 0 |
conchuod/dtb_warn_rv64 | success | Errors and warnings before: 0 this patch: 0 |
conchuod/header_inline | success | No static functions without inline keyword in header files |
conchuod/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 10 lines checked |
conchuod/source_inline | success | Was 0 now: 0 |
conchuod/build_rv64_nommu_k210_defconfig | fail | Build failed |
conchuod/verify_fixes | success | Fixes tag looks correct |
conchuod/build_rv64_nommu_virt_defconfig | fail | Build failed |
On 8 November 2022 10:20:44 GMT, guoren@kernel.org wrote: >From: Guo Ren <guoren@linux.alibaba.com> > >After use_asid_allocator enabled, the userspace application will >crash for stale tlb entry. Because only using cpumask_clear_cpu without >local_flush_tlb_all couldn't guarantee CPU's tlb entries fresh. Then >set_mm_asid would cause user space application get a stale value by >the stale tlb entry, but set_mm_noasid is okay. > >Here is the symptom of the bug: >unhandled signal 11 code 0x1 (coredump) > 0x0000003fd6d22524 <+4>: auipc s0,0x70 > 0x0000003fd6d22528 <+8>: ld s0,-148(s0) # 0x3fd6d92490 >=> 0x0000003fd6d2252c <+12>: ld a5,0(s0) >(gdb) i r s0 >s0 0x8082ed1cc3198b21 0x8082ed1cc3198b21 >(gdb) x/16 0x3fd6d92490 >0x3fd6d92490: 0xd80ac8a8 0x0000003f >The core dump file shows that the value of register s0 is wrong, but the >value in memory is right. This is because 'ld s0, -148(s0)' use a stale >mapping entry in TLB and got a wrong value from a stale physical >address. > >When task run on CPU0, the task loaded/speculative-loaded the value of >address(0x3fd6d92490), and the first version of tlb mapping entry was >PTWed into CPU0's tlb. >When the task switched from CPU0 to CPU1 without local_tlb_flush_all >(because of asid), the task happened to write a value on address >(0x3fd6d92490). It caused do_page_fault -> wp_page_copy -> >ptep_clear_flush -> ptep_get_and_clear & flush_tlb_page. >The flush_tlb_page used mm_cpumask(mm) to determine which CPUs need >tlb flush, but CPU0 had cleared the CPU0's mm_cpumask in previous switch_mm. >So we only flushed the CPU1 tlb, and setted second version mapping >of the pte. When the task switch from CPU1 to CPU0 again, CPU0 still used a >stale tlb mapping entry which contained a wrong target physical address. >When the task happened to read that value, the bug would be raised. > >Fixes: 65d4b9c53017 ("RISC-V: Implement ASID allocator") >Signed-off-by: Guo Ren <guoren@linux.alibaba.com> >Signed-off-by: Guo Ren <guoren@kernel.org> >Cc: Anup Patel <apatel@ventanamicro.com> >Cc: Palmer Dabbelt <palmer@rivosinc.com> >--- > arch/riscv/mm/context.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > >diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c >index 7acbfbd14557..8ad6c2493e93 100644 >--- a/arch/riscv/mm/context.c >+++ b/arch/riscv/mm/context.c >@@ -317,7 +317,9 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next, > */ > cpu = smp_processor_id(); > >- cpumask_clear_cpu(cpu, mm_cpumask(prev)); >+ if (!static_branch_unlikely(&use_asid_allocator)) >+ cpumask_clear_cpu(cpu, mm_cpumask(prev)); >+ > cpumask_set_cpu(cpu, mm_cpumask(next)); > > set_mm(next, cpu); This is a completely different patch to what you already sent. Why have you marked it RESEND rather than v2?
On Tue, Nov 08, 2022 at 10:27:51AM +0000, Conor Dooley wrote: > > > On 8 November 2022 10:20:44 GMT, guoren@kernel.org wrote: > >From: Guo Ren <guoren@linux.alibaba.com> > > > >After use_asid_allocator enabled, the userspace application will > >crash for stale tlb entry. Because only using cpumask_clear_cpu without > >local_flush_tlb_all couldn't guarantee CPU's tlb entries fresh. Then > >set_mm_asid would cause user space application get a stale value by > >the stale tlb entry, but set_mm_noasid is okay. > > > >Here is the symptom of the bug: > >unhandled signal 11 code 0x1 (coredump) > > 0x0000003fd6d22524 <+4>: auipc s0,0x70 > > 0x0000003fd6d22528 <+8>: ld s0,-148(s0) # 0x3fd6d92490 > >=> 0x0000003fd6d2252c <+12>: ld a5,0(s0) > >(gdb) i r s0 > >s0 0x8082ed1cc3198b21 0x8082ed1cc3198b21 > >(gdb) x/16 0x3fd6d92490 > >0x3fd6d92490: 0xd80ac8a8 0x0000003f > >The core dump file shows that the value of register s0 is wrong, but the > >value in memory is right. This is because 'ld s0, -148(s0)' use a stale > >mapping entry in TLB and got a wrong value from a stale physical > >address. > > > >When task run on CPU0, the task loaded/speculative-loaded the value of > >address(0x3fd6d92490), and the first version of tlb mapping entry was > >PTWed into CPU0's tlb. > >When the task switched from CPU0 to CPU1 without local_tlb_flush_all > >(because of asid), the task happened to write a value on address > >(0x3fd6d92490). It caused do_page_fault -> wp_page_copy -> > >ptep_clear_flush -> ptep_get_and_clear & flush_tlb_page. > >The flush_tlb_page used mm_cpumask(mm) to determine which CPUs need > >tlb flush, but CPU0 had cleared the CPU0's mm_cpumask in previous switch_mm. > >So we only flushed the CPU1 tlb, and setted second version mapping > >of the pte. When the task switch from CPU1 to CPU0 again, CPU0 still used a > >stale tlb mapping entry which contained a wrong target physical address. > >When the task happened to read that value, the bug would be raised. > > > >Fixes: 65d4b9c53017 ("RISC-V: Implement ASID allocator") > >Signed-off-by: Guo Ren <guoren@linux.alibaba.com> > >Signed-off-by: Guo Ren <guoren@kernel.org> > >Cc: Anup Patel <apatel@ventanamicro.com> > >Cc: Palmer Dabbelt <palmer@rivosinc.com> > >--- > > arch/riscv/mm/context.c | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > >diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c > >index 7acbfbd14557..8ad6c2493e93 100644 > >--- a/arch/riscv/mm/context.c > >+++ b/arch/riscv/mm/context.c > >@@ -317,7 +317,9 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next, > > */ > > cpu = smp_processor_id(); > > > >- cpumask_clear_cpu(cpu, mm_cpumask(prev)); > >+ if (!static_branch_unlikely(&use_asid_allocator)) > >+ cpumask_clear_cpu(cpu, mm_cpumask(prev)); > >+ > > cpumask_set_cpu(cpu, mm_cpumask(next)); > > > > set_mm(next, cpu); > > This is a completely different patch to what you already sent. > Why have you marked it RESEND rather than v2? In addition, it seems to break the build for the nommu defconfigs.
On Tue, Nov 8, 2022 at 6:27 PM Conor Dooley <conor@kernel.org> wrote: > > > > On 8 November 2022 10:20:44 GMT, guoren@kernel.org wrote: > >From: Guo Ren <guoren@linux.alibaba.com> > > > >After use_asid_allocator enabled, the userspace application will > >crash for stale tlb entry. Because only using cpumask_clear_cpu without > >local_flush_tlb_all couldn't guarantee CPU's tlb entries fresh. Then > >set_mm_asid would cause user space application get a stale value by > >the stale tlb entry, but set_mm_noasid is okay. > > > >Here is the symptom of the bug: > >unhandled signal 11 code 0x1 (coredump) > > 0x0000003fd6d22524 <+4>: auipc s0,0x70 > > 0x0000003fd6d22528 <+8>: ld s0,-148(s0) # 0x3fd6d92490 > >=> 0x0000003fd6d2252c <+12>: ld a5,0(s0) > >(gdb) i r s0 > >s0 0x8082ed1cc3198b21 0x8082ed1cc3198b21 > >(gdb) x/16 0x3fd6d92490 > >0x3fd6d92490: 0xd80ac8a8 0x0000003f > >The core dump file shows that the value of register s0 is wrong, but the > >value in memory is right. This is because 'ld s0, -148(s0)' use a stale > >mapping entry in TLB and got a wrong value from a stale physical > >address. > > > >When task run on CPU0, the task loaded/speculative-loaded the value of > >address(0x3fd6d92490), and the first version of tlb mapping entry was > >PTWed into CPU0's tlb. > >When the task switched from CPU0 to CPU1 without local_tlb_flush_all > >(because of asid), the task happened to write a value on address > >(0x3fd6d92490). It caused do_page_fault -> wp_page_copy -> > >ptep_clear_flush -> ptep_get_and_clear & flush_tlb_page. > >The flush_tlb_page used mm_cpumask(mm) to determine which CPUs need > >tlb flush, but CPU0 had cleared the CPU0's mm_cpumask in previous switch_mm. > >So we only flushed the CPU1 tlb, and setted second version mapping > >of the pte. When the task switch from CPU1 to CPU0 again, CPU0 still used a > >stale tlb mapping entry which contained a wrong target physical address. > >When the task happened to read that value, the bug would be raised. > > > >Fixes: 65d4b9c53017 ("RISC-V: Implement ASID allocator") > >Signed-off-by: Guo Ren <guoren@linux.alibaba.com> > >Signed-off-by: Guo Ren <guoren@kernel.org> > >Cc: Anup Patel <apatel@ventanamicro.com> > >Cc: Palmer Dabbelt <palmer@rivosinc.com> > >--- > > arch/riscv/mm/context.c | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > >diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c > >index 7acbfbd14557..8ad6c2493e93 100644 > >--- a/arch/riscv/mm/context.c > >+++ b/arch/riscv/mm/context.c > >@@ -317,7 +317,9 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next, > > */ > > cpu = smp_processor_id(); > > > >- cpumask_clear_cpu(cpu, mm_cpumask(prev)); > >+ if (!static_branch_unlikely(&use_asid_allocator)) > >+ cpumask_clear_cpu(cpu, mm_cpumask(prev)); > >+ > > cpumask_set_cpu(cpu, mm_cpumask(next)); > > > > set_mm(next, cpu); > > This is a completely different patch to what you already sent. Why have you marked it RESEND rather than v2? Okay, I should send v2.
Yes, thx for pointing that out. On Tue, Nov 8, 2022 at 10:23 PM Conor Dooley <conor.dooley@microchip.com> wrote: > > On Tue, Nov 08, 2022 at 10:27:51AM +0000, Conor Dooley wrote: > > > > > > On 8 November 2022 10:20:44 GMT, guoren@kernel.org wrote: > > >From: Guo Ren <guoren@linux.alibaba.com> > > > > > >After use_asid_allocator enabled, the userspace application will > > >crash for stale tlb entry. Because only using cpumask_clear_cpu without > > >local_flush_tlb_all couldn't guarantee CPU's tlb entries fresh. Then > > >set_mm_asid would cause user space application get a stale value by > > >the stale tlb entry, but set_mm_noasid is okay. > > > > > >Here is the symptom of the bug: > > >unhandled signal 11 code 0x1 (coredump) > > > 0x0000003fd6d22524 <+4>: auipc s0,0x70 > > > 0x0000003fd6d22528 <+8>: ld s0,-148(s0) # 0x3fd6d92490 > > >=> 0x0000003fd6d2252c <+12>: ld a5,0(s0) > > >(gdb) i r s0 > > >s0 0x8082ed1cc3198b21 0x8082ed1cc3198b21 > > >(gdb) x/16 0x3fd6d92490 > > >0x3fd6d92490: 0xd80ac8a8 0x0000003f > > >The core dump file shows that the value of register s0 is wrong, but the > > >value in memory is right. This is because 'ld s0, -148(s0)' use a stale > > >mapping entry in TLB and got a wrong value from a stale physical > > >address. > > > > > >When task run on CPU0, the task loaded/speculative-loaded the value of > > >address(0x3fd6d92490), and the first version of tlb mapping entry was > > >PTWed into CPU0's tlb. > > >When the task switched from CPU0 to CPU1 without local_tlb_flush_all > > >(because of asid), the task happened to write a value on address > > >(0x3fd6d92490). It caused do_page_fault -> wp_page_copy -> > > >ptep_clear_flush -> ptep_get_and_clear & flush_tlb_page. > > >The flush_tlb_page used mm_cpumask(mm) to determine which CPUs need > > >tlb flush, but CPU0 had cleared the CPU0's mm_cpumask in previous switch_mm. > > >So we only flushed the CPU1 tlb, and setted second version mapping > > >of the pte. When the task switch from CPU1 to CPU0 again, CPU0 still used a > > >stale tlb mapping entry which contained a wrong target physical address. > > >When the task happened to read that value, the bug would be raised. > > > > > >Fixes: 65d4b9c53017 ("RISC-V: Implement ASID allocator") > > >Signed-off-by: Guo Ren <guoren@linux.alibaba.com> > > >Signed-off-by: Guo Ren <guoren@kernel.org> > > >Cc: Anup Patel <apatel@ventanamicro.com> > > >Cc: Palmer Dabbelt <palmer@rivosinc.com> > > >--- > > > arch/riscv/mm/context.c | 4 +++- > > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > > >diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c > > >index 7acbfbd14557..8ad6c2493e93 100644 > > >--- a/arch/riscv/mm/context.c > > >+++ b/arch/riscv/mm/context.c > > >@@ -317,7 +317,9 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next, > > > */ > > > cpu = smp_processor_id(); > > > > > >- cpumask_clear_cpu(cpu, mm_cpumask(prev)); > > >+ if (!static_branch_unlikely(&use_asid_allocator)) > > >+ cpumask_clear_cpu(cpu, mm_cpumask(prev)); > > >+ > > > cpumask_set_cpu(cpu, mm_cpumask(next)); > > > > > > set_mm(next, cpu); > > > > This is a completely different patch to what you already sent. > > Why have you marked it RESEND rather than v2? > > In addition, it seems to break the build for the nommu defconfigs.
Hi,
I love your patch! Yet something to improve:
[auto build test ERROR on linus/master]
[also build test ERROR on v6.1-rc4 next-20221108]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/guoren-kernel-org/riscv-asid-Fixup-stale-TLB-entry-cause-application-crash/20221108-182151
patch link: https://lore.kernel.org/r/20221108102044.3317793-1-guoren%40kernel.org
patch subject: [PATCH RESEND] riscv: asid: Fixup stale TLB entry cause application crash
config: riscv-nommu_virt_defconfig
compiler: riscv64-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/025af39c1eb149e584705dfc72ac117812c82f3e
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review guoren-kernel-org/riscv-asid-Fixup-stale-TLB-entry-cause-application-crash/20221108-182151
git checkout 025af39c1eb149e584705dfc72ac117812c82f3e
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
riscv64-linux-ld: arch/riscv/mm/context.o: in function `.L0 ':
>> context.c:(__jump_table+0x8): undefined reference to `use_asid_allocator'
Hi,
I love your patch! Yet something to improve:
[auto build test ERROR on linus/master]
[also build test ERROR on v6.1-rc4 next-20221108]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/guoren-kernel-org/riscv-asid-Fixup-stale-TLB-entry-cause-application-crash/20221108-182151
patch link: https://lore.kernel.org/r/20221108102044.3317793-1-guoren%40kernel.org
patch subject: [PATCH RESEND] riscv: asid: Fixup stale TLB entry cause application crash
config: riscv-nommu_k210_defconfig
compiler: riscv64-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/025af39c1eb149e584705dfc72ac117812c82f3e
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review guoren-kernel-org/riscv-asid-Fixup-stale-TLB-entry-cause-application-crash/20221108-182151
git checkout 025af39c1eb149e584705dfc72ac117812c82f3e
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
riscv64-linux-ld: arch/riscv/mm/context.o: in function `.L0 ':
>> context.c:(.text+0x2e): undefined reference to `use_asid_allocator'
diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c index 7acbfbd14557..8ad6c2493e93 100644 --- a/arch/riscv/mm/context.c +++ b/arch/riscv/mm/context.c @@ -317,7 +317,9 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next, */ cpu = smp_processor_id(); - cpumask_clear_cpu(cpu, mm_cpumask(prev)); + if (!static_branch_unlikely(&use_asid_allocator)) + cpumask_clear_cpu(cpu, mm_cpumask(prev)); + cpumask_set_cpu(cpu, mm_cpumask(next)); set_mm(next, cpu);