Message ID | 20230329082950.726-1-cuiyunhui@bytedance.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] riscv: Dump user opcode bytes on fatal faults | expand |
Context | Check | Description |
---|---|---|
conchuod/cover_letter | success | Single patches do not need cover letters |
conchuod/tree_selection | success | Guessed tree name to be for-next at HEAD d34a6b715a23 |
conchuod/fixes_present | success | Fixes tag not required for -next series |
conchuod/maintainers_pattern | success | MAINTAINERS pattern errors before the patch: 1 and now 1 |
conchuod/verify_signedoff | success | Signed-off-by tag matches author and committer |
conchuod/kdoc | success | Errors and warnings before: 0 this patch: 0 |
conchuod/build_rv64_clang_allmodconfig | success | Errors and warnings before: 3323 this patch: 3323 |
conchuod/module_param | success | Was 0 now: 0 |
conchuod/build_rv64_gcc_allmodconfig | success | Errors and warnings before: 17794 this patch: 17794 |
conchuod/build_rv32_defconfig | success | Build OK |
conchuod/dtb_warn_rv64 | success | Errors and warnings before: 3 this patch: 3 |
conchuod/header_inline | success | No static functions without inline keyword in header files |
conchuod/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 50 lines checked |
conchuod/source_inline | success | Was 0 now: 0 |
conchuod/build_rv64_nommu_k210_defconfig | success | Build OK |
conchuod/verify_fixes | success | No Fixes tag |
conchuod/build_rv64_nommu_virt_defconfig | success | Build OK |
Dear All, Any comments on this patch? Thanks, Yunhui
Hi Yunhui, Waking up the dead! ;-) Yunhui Cui <cuiyunhui@bytedance.com> writes: > We encountered such a problem that when the system starts to execute > init, init exits unexpectedly with error message: "unhandled signal 4 > code 0x1 ...". > > We are more curious about which instruction execution caused the > exception. After dumping it through show_opcodes(), we found that it > was caused by a floating-point instruction. > > In this way, we found the problem: in the system bringup , it is > precisely that we have not enabled the floating point function(CONFIG_FPU > is set, but not enalbe COMPAT_HWCAP_ISA_F/D in the dts or acpi). > > Like commit ba54d856a9d8 ("x86/fault: Dump user opcode bytes on fatal > faults"), when an exception occurs, it is necessary to dump the > instruction that caused the exception. X86's show_opcodes() is used both for kernel oops:es, and userland unhandled signals. On RISC-V there's dump_kernel_instr() added in commit eb165bfa8eaf ("riscv: Add instruction dump to RISC-V splats"). Wdyt about reworking that function, so that it works for userland epc as well? I think it's useful to have the surrounding instruction context, and not just on instruction. Björn
Hi Björn, On Wed, Aug 16, 2023 at 11:11 PM Björn Töpel <bjorn@kernel.org> wrote: > > Hi Yunhui, > > Waking up the dead! ;-) > > > X86's show_opcodes() is used both for kernel oops:es, and userland > unhandled signals. On RISC-V there's dump_kernel_instr() added in commit > eb165bfa8eaf ("riscv: Add instruction dump to RISC-V splats"). > > Wdyt about reworking that function, so that it works for userland epc as > well? I think it's useful to have the surrounding instruction context, > and not just on instruction. Okay, Based on your suggestion, I'm going to rename dump_kernel_instr to dump_instr. Like: static void dump_instr(const char *loglvl, struct pt_regs *regs) { ... if (user_mode(regs)) bad = get_user_nofault(val, &insns[i]); else bad = get_kernel_nofault(val, &insns[i]); ... } What do you think? Thanks, Yunhui
yunhui cui <cuiyunhui@bytedance.com> writes: > Hi Björn, > > On Wed, Aug 16, 2023 at 11:11 PM Björn Töpel <bjorn@kernel.org> wrote: >> >> Hi Yunhui, >> >> Waking up the dead! ;-) >> > >> >> X86's show_opcodes() is used both for kernel oops:es, and userland >> unhandled signals. On RISC-V there's dump_kernel_instr() added in commit >> eb165bfa8eaf ("riscv: Add instruction dump to RISC-V splats"). >> >> Wdyt about reworking that function, so that it works for userland epc as >> well? I think it's useful to have the surrounding instruction context, >> and not just on instruction. > > Okay, Based on your suggestion, I'm going to rename dump_kernel_instr > to dump_instr. Like: > static void dump_instr(const char *loglvl, struct pt_regs *regs) > { > ... > if (user_mode(regs)) > bad = get_user_nofault(val, &insns[i]); > else > bad = get_kernel_nofault(val, &insns[i]); > ... > } > > What do you think? Yeah, looks good! Does that work for you? Björn
Hi Björn, On Thu, Aug 17, 2023 at 6:03 PM Björn Töpel <bjorn@kernel.org> wrote: > > yunhui cui <cuiyunhui@bytedance.com> writes: > > > Hi Björn, > > > > On Wed, Aug 16, 2023 at 11:11 PM Björn Töpel <bjorn@kernel.org> wrote: > >> > >> Hi Yunhui, > >> > >> Waking up the dead! ;-) > >> > > > >> > >> X86's show_opcodes() is used both for kernel oops:es, and userland > >> unhandled signals. On RISC-V there's dump_kernel_instr() added in commit > >> eb165bfa8eaf ("riscv: Add instruction dump to RISC-V splats"). > >> > >> Wdyt about reworking that function, so that it works for userland epc as > >> well? I think it's useful to have the surrounding instruction context, > >> and not just on instruction. > > > > Okay, Based on your suggestion, I'm going to rename dump_kernel_instr > > to dump_instr. Like: > > static void dump_instr(const char *loglvl, struct pt_regs *regs) > > { > > ... > > if (user_mode(regs)) > > bad = get_user_nofault(val, &insns[i]); > > else > > bad = get_kernel_nofault(val, &insns[i]); > > ... > > } > > > > What do you think? > > Yeah, looks good! Does that work for you? Yeah, It works for me, I'll update the patch to v3 and send it. Thanks, Yunhui
diff --git a/arch/riscv/include/asm/bug.h b/arch/riscv/include/asm/bug.h index 1aaea81fb141..56dab998d05d 100644 --- a/arch/riscv/include/asm/bug.h +++ b/arch/riscv/include/asm/bug.h @@ -86,6 +86,7 @@ struct pt_regs; struct task_struct; void __show_regs(struct pt_regs *regs); +void show_opcodes(struct pt_regs *regs); void die(struct pt_regs *regs, const char *str); void do_trap(struct pt_regs *regs, int signo, int code, unsigned long addr); diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c index 774ffde386ab..9ba9f8719605 100644 --- a/arch/riscv/kernel/process.c +++ b/arch/riscv/kernel/process.c @@ -83,6 +83,36 @@ void show_regs(struct pt_regs *regs) dump_backtrace(regs, NULL, KERN_DEFAULT); } +static int copy_code(struct pt_regs *regs, u8 *buf, unsigned long src, + unsigned int nbytes) +{ + if (!user_mode(regs)) + return copy_from_kernel_nofault(buf, (u8 *)src, nbytes); + + /* The user space code from other tasks cannot be accessed. */ + if (regs != task_pt_regs(current)) + return -EPERM; + + return copy_from_user_nofault(buf, (void __user *)src, nbytes); +} + +void show_opcodes(struct pt_regs *regs) +{ + u8 opcodes[4]; + + switch (copy_code(regs, opcodes, regs->epc, sizeof(opcodes))) { + case 0: + pr_info("Opcode: %4ph", opcodes); + break; + case -EPERM: + pr_err("Unable to access userspace of other tasks"); + break; + default: + pr_err("Failed to access opcode"); + break; + } +} + #ifdef CONFIG_COMPAT static bool compat_mode_supported __read_mostly; diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c index f6fda94e8e59..892826234ee9 100644 --- a/arch/riscv/kernel/traps.c +++ b/arch/riscv/kernel/traps.c @@ -100,6 +100,7 @@ void do_trap(struct pt_regs *regs, int signo, int code, unsigned long addr) print_vma_addr(KERN_CONT " in ", instruction_pointer(regs)); pr_cont("\n"); __show_regs(regs); + show_opcodes(regs); } force_sig_fault(signo, code, (void __user *)addr);
We encountered such a problem that when the system starts to execute init, init exits unexpectedly with error message: "unhandled signal 4 code 0x1 ...". We are more curious about which instruction execution caused the exception. After dumping it through show_opcodes(), we found that it was caused by a floating-point instruction. In this way, we found the problem: in the system bringup , it is precisely that we have not enabled the floating point function(CONFIG_FPU is set, but not enalbe COMPAT_HWCAP_ISA_F/D in the dts or acpi). Like commit ba54d856a9d8 ("x86/fault: Dump user opcode bytes on fatal faults"), when an exception occurs, it is necessary to dump the instruction that caused the exception. Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com> --- arch/riscv/include/asm/bug.h | 1 + arch/riscv/kernel/process.c | 30 ++++++++++++++++++++++++++++++ arch/riscv/kernel/traps.c | 1 + 3 files changed, 32 insertions(+)