Message ID | 20210526214917.20099-3-madvenka@linux.microsoft.com (mailing list archive) |
---|---|
State | New, archived |
Series | arm64: Implement stack trace reliability checks |
On Wed, May 26, 2021 at 04:49:17PM -0500, madvenka@linux.microsoft.com wrote:

> The unwinder should check if the return PC falls in any function that
> is considered unreliable from an unwinding perspective. If it does,
> mark the stack trace unreliable.

Reviewed-by: Mark Brown <broonie@kernel.org>

However, it'd be good for someone else to double check this as it's
entirely possible that I've missed some case here.

> + * Some special cases covered by sym_code_functions[] deserve a mention here:

> + * - All EL1 interrupt and exception stack traces will be considered
> + *   unreliable. This is the correct behavior as interrupts and exceptions
> + *   can happen on any instruction including ones in the frame pointer
> + *   prolog and epilog. Unless stack metadata is available so the unwinder
> + *   can unwind through these special cases, such stack traces will be
> + *   considered unreliable.
> + *

If you're respinning this, it's probably also worth noting that we only
ever perform reliable stack trace on either blocked tasks or the current
task. If my reasoning is correct, the exclusions here mean we avoid
having to worry about many race conditions when entering and leaving
functions: if a task got preempted at the wrong moment in one of them,
we should observe the preemption and mark the trace unreliable for that
reason, so any confusion the race causes is a non-issue.
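A minimal sketch of the caller-side constraint described above, assuming the
task_curr() helper from <linux/sched.h>; reliable_trace_candidate() is a
hypothetical name for illustration only, not code from this series:

#include <linux/sched.h>

/*
 * Illustrative only: reliable stack traces are attempted only for the
 * current task or for tasks that are not running on any CPU. A task that
 * was preempted carries an EL1 interrupt frame on its stack, which the
 * sym_code_functions[] check then flags as unreliable.
 */
static bool reliable_trace_candidate(struct task_struct *tsk)
{
	if (tsk == current)
		return true;

	/* task_curr() is nonzero while the task is executing on a CPU. */
	return !task_curr(tsk);
}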
On Wed, May 26, 2021 at 04:49:17PM -0500, madvenka@linux.microsoft.com wrote:

> + * - return_to_handler() is handled by the unwinder by attempting to
> + *   retrieve the original return address from the per-task return
> + *   address stack.
> + *
> + * - kretprobe_trampoline() can be handled in a similar fashion by
> + *   attempting to retrieve the original return address from the per-task
> + *   kretprobe instance list.
> + *
> + * - I reckon optprobes can be handled in a similar fashion in the future?

Note that there's a patch for optprobes on the list now:

https://lore.kernel.org/r/1622803839-27354-1-git-send-email-liuqi115@huawei.com
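For the return_to_handler() case, a sketch of the per-task return stack
lookup, assuming CONFIG_FUNCTION_GRAPH_TRACER and the ftrace_graph_ret_addr()
helper from <linux/ftrace.h>; recover_return_address() is a hypothetical
wrapper for illustration, not code from the patch (the real arm64 unwinder
additionally strips PAC bits from the PC before the comparison):

#include <linux/ftrace.h>
#include <linux/sched.h>

/*
 * If the saved PC is the function graph tracer's return trampoline,
 * fetch the original return address from the per-task return stack.
 * 'graph' is the caller's running index into that stack.
 */
static unsigned long recover_return_address(struct task_struct *tsk,
					    int *graph, unsigned long pc,
					    unsigned long *frame_pointer)
{
	if (tsk->ret_stack && pc == (unsigned long)return_to_handler)
		return ftrace_graph_ret_addr(tsk, graph, pc, frame_pointer);

	return pc;
}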
On 6/4/21 11:24 AM, Mark Brown wrote:
> On Wed, May 26, 2021 at 04:49:17PM -0500, madvenka@linux.microsoft.com wrote:
>
>> The unwinder should check if the return PC falls in any function that
>> is considered unreliable from an unwinding perspective. If it does,
>> mark the stack trace unreliable.
>
> Reviewed-by: Mark Brown <broonie@kernel.org>
>

Thanks.

> However, it'd be good for someone else to double check this as it's
> entirely possible that I've missed some case here.
>

I will request Mark Rutland to review this as well.

>> + * Some special cases covered by sym_code_functions[] deserve a mention here:
>
>> + * - All EL1 interrupt and exception stack traces will be considered
>> + *   unreliable. This is the correct behavior as interrupts and exceptions
>> + *   can happen on any instruction including ones in the frame pointer
>> + *   prolog and epilog. Unless stack metadata is available so the unwinder
>> + *   can unwind through these special cases, such stack traces will be
>> + *   considered unreliable.
>> + *
>
> If you're respinning this, it's probably also worth noting that we only
> ever perform reliable stack trace on either blocked tasks or the current
> task. If my reasoning is correct, the exclusions here mean we avoid
> having to worry about many race conditions when entering and leaving
> functions: if a task got preempted at the wrong moment in one of them,
> we should observe the preemption and mark the trace unreliable for that
> reason, so any confusion the race causes is a non-issue.
>

I will add a comment saying that livepatch only looks at tasks that are
currently not on any CPU (except for the current task). Such tasks have
either blocked on something and given up the CPU voluntarily, or they
were preempted. The above comment applies to the latter case.

Madhavan
On 6/4/21 11:59 AM, Mark Brown wrote:
> On Wed, May 26, 2021 at 04:49:17PM -0500, madvenka@linux.microsoft.com wrote:
>
>> + * - return_to_handler() is handled by the unwinder by attempting to
>> + *   retrieve the original return address from the per-task return
>> + *   address stack.
>> + *
>> + * - kretprobe_trampoline() can be handled in a similar fashion by
>> + *   attempting to retrieve the original return address from the per-task
>> + *   kretprobe instance list.
>> + *
>> + * - I reckon optprobes can be handled in a similar fashion in the future?
>
> Note that there's a patch for optprobes on the list now:
>
> https://lore.kernel.org/r/1622803839-27354-1-git-send-email-liuqi115@huawei.com

Yes. I saw that.

Madhavan
On Wed, 2021-05-26 at 16:49 -0500, madvenka@linux.microsoft.com wrote:
> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
>
> The unwinder should check if the return PC falls in any function that
> is considered unreliable from an unwinding perspective. If it does,
> mark the stack trace unreliable.
>

[snip]

Correct me if I'm wrong, but do you not need to move the final frame
check to before the unwinder_is_unreliable() call?

Userland threads which have ret_from_fork as the last entry on the
stack will always be marked unreliable, as they will always have a
SYM_CODE entry on their stack (ret_from_fork).

Also, given that this means the last frame has been reached and there
is no more unwinding to do, I don't think we care if the last PC is a
code address.

- Suraj

>  *
> @@ -133,7 +236,20 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
>  	 * - Foreign code (e.g. EFI runtime services)
>  	 * - Procedure Linkage Table (PLT) entries and veneer functions
>  	 */
> -	if (!__kernel_text_address(frame->pc))
> +	if (!__kernel_text_address(frame->pc)) {
> +		frame->reliable = false;
> +		return 0;
> +	}
> +
> +	/*
> +	 * If the final frame has been reached, there is no more unwinding
> +	 * to do. There is no need to check if the return PC is considered
> +	 * unreliable by the unwinder.
> +	 */
> +	if (!frame->fp)
> +		return 0;

if (frame->fp == (unsigned long)task_pt_regs(tsk)->stackframe)
	return -ENOENT;

> +
> +	if (unwinder_is_unreliable(frame->pc))
> 		frame->reliable = false;
>
>  	return 0;
> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> index 7eea7888bb02..32e8d57397a1 100644
> --- a/arch/arm64/kernel/vmlinux.lds.S
> +++ b/arch/arm64/kernel/vmlinux.lds.S
> @@ -103,6 +103,12 @@ jiffies = jiffies_64;
>  #define TRAMP_TEXT
>  #endif
>
> +#define SYM_CODE_FUNCTIONS			\
> +	. = ALIGN(16);				\
> +	__sym_code_functions_start = .;		\
> +	KEEP(*(sym_code_functions))		\
> +	__sym_code_functions_end = .;
> +
>  /*
>   * The size of the PE/COFF section that covers the kernel image, which
>   * runs from _stext to _edata, must be a round multiple of the PE/COFF
> @@ -218,6 +224,7 @@ SECTIONS
>  		CON_INITCALL
>  		INIT_RAM_FS
>  		*(.init.altinstructions .init.bss)	/* from the EFI stub */
> +		SYM_CODE_FUNCTIONS
>  	}
>  	.exit.data : {
>  		EXIT_DATA
Hi Suraj,

> if (frame->fp == (unsigned long)task_pt_regs(tsk)->stackframe)
> 	return -ENOENT;

If I understand correctly, a similar final frame check is introduced in
this patch:

  arm64: Implement stack trace termination record
  https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?id=7d7b720a4b80

It is currently merged into the for-next/stacktrace branch.

Thanks & Best Regards,
Keiya Nobuta <nobuta.keiya@fujitsu.com>
---------------------------------------------------------
Solution Development Dept. Software Div.
FUJITSU COMPUTER TECHNOLOGIES Ltd.
On 6/15/21 8:52 PM, Suraj Jitindar Singh wrote:
> On Wed, 2021-05-26 at 16:49 -0500, madvenka@linux.microsoft.com wrote:
>> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
>>
>> The unwinder should check if the return PC falls in any function that
>> is considered unreliable from an unwinding perspective. If it does,
>> mark the stack trace unreliable.
>>
>
> [snip]
>
> Correct me if I'm wrong, but do you not need to move the final frame
> check to before the unwinder_is_unreliable() call?
>

That is done in a patch series that has been merged into the
for-next/stacktrace branch. When I merge this patch series with that,
the final frame check will be done beforehand.

I have mentioned this in the cover letter:

Last stack frame
================

If a SYM_CODE function occurs in the very last frame in the stack trace,
then the stack trace is not considered unreliable. This is because there
is no more unwinding to do. Examples:

- EL0 exception stack traces end in the top level EL0 exception handlers.

- All kernel thread stack traces end in ret_from_fork().

Madhavan

> Userland threads which have ret_from_fork as the last entry on the
> stack will always be marked unreliable, as they will always have a
> SYM_CODE entry on their stack (ret_from_fork).
>
> Also, given that this means the last frame has been reached and there
> is no more unwinding to do, I don't think we care if the last PC is a
> code address.
>
> - Suraj
>
> [snip]
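Putting the two sides of this exchange together, a sketch of how the merged
ordering inside unwind_frame() could look, assuming the termination record
from the for-next/stacktrace series; this is an illustration drawn from the
snippets above, not the exact merged code:

	/* Reject PCs outside kernel text outright. */
	if (!__kernel_text_address(frame->pc)) {
		frame->reliable = false;
		return 0;
	}

	/*
	 * Final frame: the termination record installed by "arm64:
	 * Implement stack trace termination record". Stop before the
	 * SYM_CODE check so that ret_from_fork() and the top level EL0
	 * handlers do not mark an otherwise finished trace unreliable.
	 */
	if (frame->fp == (unsigned long)task_pt_regs(tsk)->stackframe)
		return -ENOENT;

	if (unwinder_is_unreliable(frame->pc))
		frame->reliable = false;

	return 0;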
diff --git a/arch/arm64/include/asm/linkage.h b/arch/arm64/include/asm/linkage.h
index ba89a9af820a..3b5f1fd332b0 100644
--- a/arch/arm64/include/asm/linkage.h
+++ b/arch/arm64/include/asm/linkage.h
@@ -60,4 +60,16 @@
 	SYM_FUNC_END(x);		\
 	SYM_FUNC_END_ALIAS(__pi_##x)
 
+/*
+ * Record the address range of each SYM_CODE function in a struct code_range
+ * in a special section.
+ */
+#define SYM_CODE_END(name)				\
+	SYM_END(name, SYM_T_NONE)			;\
+	99:						;\
+	.pushsection "sym_code_functions", "aw"		;\
+	.quad	name					;\
+	.quad	99b					;\
+	.popsection
+
 #endif
diff --git a/arch/arm64/include/asm/sections.h b/arch/arm64/include/asm/sections.h
index 2f36b16a5b5d..29cb566f65ec 100644
--- a/arch/arm64/include/asm/sections.h
+++ b/arch/arm64/include/asm/sections.h
@@ -20,5 +20,6 @@ extern char __exittext_begin[], __exittext_end[];
 extern char __irqentry_text_start[], __irqentry_text_end[];
 extern char __mmuoff_data_start[], __mmuoff_data_end[];
 extern char __entry_tramp_text_start[], __entry_tramp_text_end[];
+extern char __sym_code_functions_start[], __sym_code_functions_end[];
 
 #endif /* __ASM_SECTIONS_H */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 9061375c8785..5477a9d39b12 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -18,6 +18,109 @@
 #include <asm/stack_pointer.h>
 #include <asm/stacktrace.h>
 
+struct code_range {
+	unsigned long start;
+	unsigned long end;
+};
+
+static struct code_range *sym_code_functions;
+static int num_sym_code_functions;
+
+int __init init_sym_code_functions(void)
+{
+	size_t size;
+
+	size = (unsigned long)__sym_code_functions_end -
+	       (unsigned long)__sym_code_functions_start;
+
+	sym_code_functions = kmalloc(size, GFP_KERNEL);
+	if (!sym_code_functions)
+		return -ENOMEM;
+
+	memcpy(sym_code_functions, __sym_code_functions_start, size);
+	/* Update num_sym_code_functions after copying sym_code_functions. */
+	smp_mb();
+	num_sym_code_functions = size / sizeof(struct code_range);
+
+	return 0;
+}
+early_initcall(init_sym_code_functions);
+
+/*
+ * Check the return PC against sym_code_functions[]. If there is a match,
+ * then consider the stack frame unreliable. These functions contain
+ * low-level code where the frame pointer and/or the return address register
+ * cannot be relied upon. This addresses the following situations:
+ *
+ * - Exception handlers and entry assembly
+ * - Trampoline assembly (e.g., ftrace, kprobes)
+ * - Hypervisor-related assembly
+ * - Hibernation-related assembly
+ * - CPU start-stop, suspend-resume assembly
+ * - Kernel relocation assembly
+ *
+ * Some special cases covered by sym_code_functions[] deserve a mention here:
+ *
+ * - All EL1 interrupt and exception stack traces will be considered
+ *   unreliable. This is the correct behavior as interrupts and exceptions
+ *   can happen on any instruction including ones in the frame pointer
+ *   prolog and epilog. Unless stack metadata is available so the unwinder
+ *   can unwind through these special cases, such stack traces will be
+ *   considered unreliable.
+ *
+ * - A task can get preempted at the end of an interrupt. Stack traces
+ *   of preempted tasks will show the interrupt frame in the stack trace
+ *   and will be considered unreliable.
+ *
+ * - Breakpoints are exceptions. So, all stack traces in the breakpoint
+ *   handler (including probes) will be considered unreliable.
+ *
+ * - All of the ftrace entry trampolines are considered unreliable. So,
+ *   all stack traces taken from tracer functions will be considered
+ *   unreliable.
+ *
+ * - The function graph tracer return trampoline (return_to_handler)
+ *   and the kretprobe return trampoline (kretprobe_trampoline) are
+ *   also considered unreliable.
+ *
+ * Some of the special cases above can be unwound through using special logic
+ * in unwind_frame().
+ *
+ * - return_to_handler() is handled by the unwinder by attempting to
+ *   retrieve the original return address from the per-task return
+ *   address stack.
+ *
+ * - kretprobe_trampoline() can be handled in a similar fashion by
+ *   attempting to retrieve the original return address from the per-task
+ *   kretprobe instance list.
+ *
+ * - I reckon optprobes can be handled in a similar fashion in the future?
+ *
+ * - Stack traces taken from the ftrace tracer functions can be handled
+ *   as well. ftrace_call is an inner label defined in the ftrace entry
+ *   trampoline. This is the location where the call to a tracer function
+ *   is patched. So, if the return PC equals ftrace_call+4, it is
+ *   reliable. At that point, proper stack frames have already been set
+ *   up for the traced function and its caller.
+ */
+static bool unwinder_is_unreliable(unsigned long pc)
+{
+	const struct code_range *range;
+	int i;
+
+	/*
+	 * If sym_code_functions[] were sorted, a binary search could be
+	 * done to make this more performant.
+	 */
+	for (i = 0; i < num_sym_code_functions; i++) {
+		range = &sym_code_functions[i];
+		if (pc >= range->start && pc < range->end)
+			return true;
+	}
+
+	return false;
+}
+
 /*
  * AArch64 PCS assigns the frame pointer to x29.
  *
@@ -133,7 +236,20 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 	 * - Foreign code (e.g. EFI runtime services)
 	 * - Procedure Linkage Table (PLT) entries and veneer functions
 	 */
-	if (!__kernel_text_address(frame->pc))
+	if (!__kernel_text_address(frame->pc)) {
+		frame->reliable = false;
+		return 0;
+	}
+
+	/*
+	 * If the final frame has been reached, there is no more unwinding
+	 * to do. There is no need to check if the return PC is considered
+	 * unreliable by the unwinder.
+	 */
+	if (!frame->fp)
+		return 0;
+
+	if (unwinder_is_unreliable(frame->pc))
 		frame->reliable = false;
 
 	return 0;
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 7eea7888bb02..32e8d57397a1 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -103,6 +103,12 @@ jiffies = jiffies_64;
 #define TRAMP_TEXT
 #endif
 
+#define SYM_CODE_FUNCTIONS			\
+	. = ALIGN(16);				\
+	__sym_code_functions_start = .;		\
+	KEEP(*(sym_code_functions))		\
+	__sym_code_functions_end = .;
+
 /*
  * The size of the PE/COFF section that covers the kernel image, which
  * runs from _stext to _edata, must be a round multiple of the PE/COFF
@@ -218,6 +224,7 @@ SECTIONS
 		CON_INITCALL
 		INIT_RAM_FS
 		*(.init.altinstructions .init.bss)	/* from the EFI stub */
+		SYM_CODE_FUNCTIONS
 	}
 	.exit.data : {
 		EXIT_DATA
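The comment in unwinder_is_unreliable() points out that a sorted
sym_code_functions[] would permit a binary search. A sketch of that
optimization using the kernel's sort() and bsearch() helpers, assuming the
recorded ranges do not overlap; cmp_range_start(), cmp_code_range(),
unwinder_is_unreliable_sorted() and the one-time sort call are hypothetical
additions for illustration, not part of this patch:

#include <linux/bsearch.h>
#include <linux/sort.h>

/* sort() comparator: order ranges by ascending start address. */
static int cmp_range_start(const void *a, const void *b)
{
	const struct code_range *ra = a, *rb = b;

	if (ra->start < rb->start)
		return -1;
	return ra->start > rb->start;
}

/* bsearch() comparator: match when the PC falls inside the range. */
static int cmp_code_range(const void *key, const void *elt)
{
	unsigned long pc = *(const unsigned long *)key;
	const struct code_range *range = elt;

	if (pc < range->start)
		return -1;
	return pc >= range->end ? 1 : 0;
}

/*
 * init_sym_code_functions() would sort once, after the memcpy():
 *
 *	sort(sym_code_functions, num_sym_code_functions,
 *	     sizeof(struct code_range), cmp_range_start, NULL);
 *
 * The linear scan then becomes an O(log n) lookup.
 */
static bool unwinder_is_unreliable_sorted(unsigned long pc)
{
	return bsearch(&pc, sym_code_functions, num_sym_code_functions,
		       sizeof(struct code_range), cmp_code_range) != NULL;
}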