diff mbox series

[v2] riscv: Dump user opcode bytes on fatal faults

Message ID 20230329082950.726-1-cuiyunhui@bytedance.com (mailing list archive)
State New, archived
Headers show
Series [v2] riscv: Dump user opcode bytes on fatal faults | expand

Checks

Context Check Description
conchuod/cover_letter success Single patches do not need cover letters
conchuod/tree_selection success Guessed tree name to be for-next at HEAD d34a6b715a23
conchuod/fixes_present success Fixes tag not required for -next series
conchuod/maintainers_pattern success MAINTAINERS pattern errors before the patch: 1 and now 1
conchuod/verify_signedoff success Signed-off-by tag matches author and committer
conchuod/kdoc success Errors and warnings before: 0 this patch: 0
conchuod/build_rv64_clang_allmodconfig success Errors and warnings before: 3323 this patch: 3323
conchuod/module_param success Was 0 now: 0
conchuod/build_rv64_gcc_allmodconfig success Errors and warnings before: 17794 this patch: 17794
conchuod/build_rv32_defconfig success Build OK
conchuod/dtb_warn_rv64 success Errors and warnings before: 3 this patch: 3
conchuod/header_inline success No static functions without inline keyword in header files
conchuod/checkpatch success total: 0 errors, 0 warnings, 0 checks, 50 lines checked
conchuod/source_inline success Was 0 now: 0
conchuod/build_rv64_nommu_k210_defconfig success Build OK
conchuod/verify_fixes success No Fixes tag
conchuod/build_rv64_nommu_virt_defconfig success Build OK

Commit Message

Yunhui Cui March 29, 2023, 8:29 a.m. UTC
We encountered such a problem that when the system starts to execute
init, init exits unexpectedly with error message: "unhandled signal 4
code 0x1 ...".

We are more curious about which instruction execution caused the
exception. After dumping it through show_opcodes(), we found that it
was caused by a floating-point instruction.

In this way, we found the problem: in the system bringup , it is
precisely that we have not enabled the floating point function(CONFIG_FPU
is set, but not enalbe COMPAT_HWCAP_ISA_F/D in the dts or acpi).

Like commit ba54d856a9d8 ("x86/fault: Dump user opcode bytes on fatal
faults"), when an exception occurs, it is necessary to dump the
instruction that caused the exception.

Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
---
 arch/riscv/include/asm/bug.h |  1 +
 arch/riscv/kernel/process.c  | 30 ++++++++++++++++++++++++++++++
 arch/riscv/kernel/traps.c    |  1 +
 3 files changed, 32 insertions(+)

Comments

Yunhui Cui April 11, 2023, 12:31 p.m. UTC | #1
Dear All,

Any comments on this patch?

Thanks,
Yunhui
Björn Töpel Aug. 16, 2023, 3:10 p.m. UTC | #2
Hi Yunhui,

Waking up the dead! ;-)

Yunhui Cui <cuiyunhui@bytedance.com> writes:

> We encountered such a problem that when the system starts to execute
> init, init exits unexpectedly with error message: "unhandled signal 4
> code 0x1 ...".
>
> We are more curious about which instruction execution caused the
> exception. After dumping it through show_opcodes(), we found that it
> was caused by a floating-point instruction.
>
> In this way, we found the problem: in the system bringup , it is
> precisely that we have not enabled the floating point function(CONFIG_FPU
> is set, but not enalbe COMPAT_HWCAP_ISA_F/D in the dts or acpi).
>
> Like commit ba54d856a9d8 ("x86/fault: Dump user opcode bytes on fatal
> faults"), when an exception occurs, it is necessary to dump the
> instruction that caused the exception.

X86's show_opcodes() is used both for kernel oops:es, and userland
unhandled signals. On RISC-V there's dump_kernel_instr() added in commit
eb165bfa8eaf ("riscv: Add instruction dump to RISC-V splats").

Wdyt about reworking that function, so that it works for userland epc as
well? I think it's useful to have the surrounding instruction context,
and not just on instruction.


Björn
Yunhui Cui Aug. 17, 2023, 8:23 a.m. UTC | #3
Hi Björn,

On Wed, Aug 16, 2023 at 11:11 PM Björn Töpel <bjorn@kernel.org> wrote:
>
> Hi Yunhui,
>
> Waking up the dead! ;-)
>

>
> X86's show_opcodes() is used both for kernel oops:es, and userland
> unhandled signals. On RISC-V there's dump_kernel_instr() added in commit
> eb165bfa8eaf ("riscv: Add instruction dump to RISC-V splats").
>
> Wdyt about reworking that function, so that it works for userland epc as
> well? I think it's useful to have the surrounding instruction context,
> and not just on instruction.

Okay,  Based on your suggestion, I'm going to rename dump_kernel_instr
to dump_instr. Like:
static void dump_instr(const char *loglvl, struct pt_regs *regs)
{
...
                if (user_mode(regs))
                        bad = get_user_nofault(val, &insns[i]);
                else
                        bad = get_kernel_nofault(val, &insns[i]);
...
}

What do you think?

Thanks,
Yunhui
Björn Töpel Aug. 17, 2023, 10:02 a.m. UTC | #4
yunhui cui <cuiyunhui@bytedance.com> writes:

> Hi Björn,
>
> On Wed, Aug 16, 2023 at 11:11 PM Björn Töpel <bjorn@kernel.org> wrote:
>>
>> Hi Yunhui,
>>
>> Waking up the dead! ;-)
>>
>
>>
>> X86's show_opcodes() is used both for kernel oops:es, and userland
>> unhandled signals. On RISC-V there's dump_kernel_instr() added in commit
>> eb165bfa8eaf ("riscv: Add instruction dump to RISC-V splats").
>>
>> Wdyt about reworking that function, so that it works for userland epc as
>> well? I think it's useful to have the surrounding instruction context,
>> and not just on instruction.
>
> Okay,  Based on your suggestion, I'm going to rename dump_kernel_instr
> to dump_instr. Like:
> static void dump_instr(const char *loglvl, struct pt_regs *regs)
> {
> ...
>                 if (user_mode(regs))
>                         bad = get_user_nofault(val, &insns[i]);
>                 else
>                         bad = get_kernel_nofault(val, &insns[i]);
> ...
> }
>
> What do you think?

Yeah, looks good! Does that work for you?


Björn
Yunhui Cui Aug. 17, 2023, 11:32 a.m. UTC | #5
Hi  Björn,

On Thu, Aug 17, 2023 at 6:03 PM Björn Töpel <bjorn@kernel.org> wrote:
>
> yunhui cui <cuiyunhui@bytedance.com> writes:
>
> > Hi Björn,
> >
> > On Wed, Aug 16, 2023 at 11:11 PM Björn Töpel <bjorn@kernel.org> wrote:
> >>
> >> Hi Yunhui,
> >>
> >> Waking up the dead! ;-)
> >>
> >
> >>
> >> X86's show_opcodes() is used both for kernel oops:es, and userland
> >> unhandled signals. On RISC-V there's dump_kernel_instr() added in commit
> >> eb165bfa8eaf ("riscv: Add instruction dump to RISC-V splats").
> >>
> >> Wdyt about reworking that function, so that it works for userland epc as
> >> well? I think it's useful to have the surrounding instruction context,
> >> and not just on instruction.
> >
> > Okay,  Based on your suggestion, I'm going to rename dump_kernel_instr
> > to dump_instr. Like:
> > static void dump_instr(const char *loglvl, struct pt_regs *regs)
> > {
> > ...
> >                 if (user_mode(regs))
> >                         bad = get_user_nofault(val, &insns[i]);
> >                 else
> >                         bad = get_kernel_nofault(val, &insns[i]);
> > ...
> > }
> >
> > What do you think?
>
> Yeah, looks good! Does that work for you?
Yeah, It works for me, I'll update the patch to v3 and send it.

Thanks,
Yunhui
diff mbox series

Patch

diff --git a/arch/riscv/include/asm/bug.h b/arch/riscv/include/asm/bug.h
index 1aaea81fb141..56dab998d05d 100644
--- a/arch/riscv/include/asm/bug.h
+++ b/arch/riscv/include/asm/bug.h
@@ -86,6 +86,7 @@  struct pt_regs;
 struct task_struct;
 
 void __show_regs(struct pt_regs *regs);
+void show_opcodes(struct pt_regs *regs);
 void die(struct pt_regs *regs, const char *str);
 void do_trap(struct pt_regs *regs, int signo, int code, unsigned long addr);
 
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 774ffde386ab..9ba9f8719605 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -83,6 +83,36 @@  void show_regs(struct pt_regs *regs)
 		dump_backtrace(regs, NULL, KERN_DEFAULT);
 }
 
+static int copy_code(struct pt_regs *regs, u8 *buf, unsigned long src,
+		     unsigned int nbytes)
+{
+	if (!user_mode(regs))
+		return copy_from_kernel_nofault(buf, (u8 *)src, nbytes);
+
+	/* The user space code from other tasks cannot be accessed. */
+	if (regs != task_pt_regs(current))
+		return -EPERM;
+
+	return copy_from_user_nofault(buf, (void __user *)src, nbytes);
+}
+
+void show_opcodes(struct pt_regs *regs)
+{
+	u8 opcodes[4];
+
+	switch (copy_code(regs, opcodes, regs->epc, sizeof(opcodes))) {
+	case 0:
+		pr_info("Opcode: %4ph", opcodes);
+		break;
+	case -EPERM:
+		pr_err("Unable to access userspace of other tasks");
+		break;
+	default:
+		pr_err("Failed to access opcode");
+		break;
+	}
+}
+
 #ifdef CONFIG_COMPAT
 static bool compat_mode_supported __read_mostly;
 
diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index f6fda94e8e59..892826234ee9 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -100,6 +100,7 @@  void do_trap(struct pt_regs *regs, int signo, int code, unsigned long addr)
 		print_vma_addr(KERN_CONT " in ", instruction_pointer(regs));
 		pr_cont("\n");
 		__show_regs(regs);
+		show_opcodes(regs);
 	}
 
 	force_sig_fault(signo, code, (void __user *)addr);