Message ID | 20210405204313.21346-2-madvenka@linux.microsoft.com (mailing list archive)
---|---
State | New, archived
Series | arm64: Implement stack trace reliability checks
On Mon, Apr 05, 2021 at 03:43:10PM -0500, madvenka@linux.microsoft.com wrote:

> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
>
> Implement a check_reliability() function that will contain checks for the
> presence of various features and conditions that can render the stack trace
> unreliable.

Reviewed-by: Mark Brown <broonie@kernel.org>
On Mon, Apr 05, 2021 at 03:43:10PM -0500, madvenka@linux.microsoft.com wrote:

> These checks will involve checking the return PC to see if it falls inside
> any special functions where the stack trace is considered unreliable.
> Implement the infrastructure needed for this.

Following up again based on an off-list discussion with Mark Rutland: while I
think this is a reasonable implementation for specifically listing functions
that cause problems, we could make life easier for ourselves by instead using
annotations at the call sites to put things into sections which indicate that
they're unsafe for unwinding. We can then check for any address in one of
those sections (or possibly do the reverse and check for any address in a
section we specifically know is safe) rather than having to enumerate
problematic functions in the unwinder. This also has the advantage of not
having a list that's separate from the functions themselves, so it's less
likely that the unwinder will get out of sync with the rest of the code as
things evolve.

We already have SYM_CODE_START() annotations in the code for assembly
functions that aren't using the standard calling convention, which should
help a lot here; we could add a variant of that for things that we know are
safe on stacks (like those we expect to find at the bottom of stacks).
On 4/8/21 12:17 PM, Mark Brown wrote:
> On Mon, Apr 05, 2021 at 03:43:10PM -0500, madvenka@linux.microsoft.com wrote:
>
>> These checks will involve checking the return PC to see if it falls inside
>> any special functions where the stack trace is considered unreliable.
>> Implement the infrastructure needed for this.
>
> Following up again based on an off-list discussion with Mark Rutland:
> while I think this is a reasonable implementation for specifically
> listing functions that cause problems we could make life easier for
> ourselves by instead using annotations at the call sites to put things
> into sections which indicate that they're unsafe for unwinding, we can
> then check for any address in one of those sections (or possibly do the
> reverse and check for any address in a section we specifically know is
> safe) rather than having to enumerate problematic functions in the
> unwinder. This also has the advantage of not having a list that's
> separate to the functions themselves so it's less likely that the
> unwinder will get out of sync with the rest of the code as things evolve.
>
> We already have SYM_CODE_START() annotations in the code for assembly
> functions that aren't using the standard calling convention which should
> help a lot here, we could add a variant of that for things that we know
> are safe on stacks (like those we expect to find at the bottom of
> stacks).

As I already mentioned before, I like the idea of sections. The only reason
that I did not try it was that I have to address FTRACE trampolines and the
kretprobe_trampoline (and optprobes in the future).

I have the following options:

1. Create a common section (I will have to come up with an appropriate name)
   and put all such functions in that one section.

2. Create one section for each logical type (exception section, ftrace
   section and kprobe section) or some such.

3. Use the section idea only for the el1 exceptions. For the others, use the
   current special_functions[] approach.

Which one do you and Mark Rutland prefer? Or is there another choice?

Madhavan
On 4/8/21 2:30 PM, Madhavan T. Venkataraman wrote:
> On 4/8/21 12:17 PM, Mark Brown wrote:
>> On Mon, Apr 05, 2021 at 03:43:10PM -0500, madvenka@linux.microsoft.com wrote:
>>
>>> These checks will involve checking the return PC to see if it falls inside
>>> any special functions where the stack trace is considered unreliable.
>>> Implement the infrastructure needed for this.
>>
>> Following up again based on an off-list discussion with Mark Rutland:
>> while I think this is a reasonable implementation for specifically
>> listing functions that cause problems we could make life easier for
>> ourselves by instead using annotations at the call sites to put things
>> into sections which indicate that they're unsafe for unwinding, we can
>> then check for any address in one of those sections (or possibly do the
>> reverse and check for any address in a section we specifically know is
>> safe) rather than having to enumerate problematic functions in the
>> unwinder. This also has the advantage of not having a list that's
>> separate to the functions themselves so it's less likely that the
>> unwinder will get out of sync with the rest of the code as things evolve.
>>
>> We already have SYM_CODE_START() annotations in the code for assembly
>> functions that aren't using the standard calling convention which should
>> help a lot here, we could add a variant of that for things that we know
>> are safe on stacks (like those we expect to find at the bottom of
>> stacks).
>
> As I already mentioned before, I like the idea of sections. The only reason
> that I did not try it was that I have to address FTRACE trampolines and the
> kretprobe_trampoline (and optprobes in the future).
>
> I have the following options:
>
> 1. Create a common section (I will have to come up with an appropriate name)
>    and put all such functions in that one section.
>
> 2. Create one section for each logical type (exception section, ftrace
>    section and kprobe section) or some such.

For now, I will start with idea 2. I will create a special section for each
class of functions (EL1 exception handlers, FTRACE trampolines, KPROBE
trampolines). Instead of a special functions array, I will implement a
special_sections array. The rest of the code should just fall into place.

Let me know if you prefer something different.

Thanks.

Madhavan

> 3. Use the section idea only for the el1 exceptions. For the others use the
>    current special_functions[] approach.
>
> Which one do you and Mark Rutland prefer? Or, is there another choice?
>
> Madhavan
On Thu, Apr 08, 2021 at 06:30:22PM -0500, Madhavan T. Venkataraman wrote:
> On 4/8/21 2:30 PM, Madhavan T. Venkataraman wrote:
>
>> 1. Create a common section (I will have to come up with an appropriate name)
>>    and put all such functions in that one section.
>>
>> 2. Create one section for each logical type (exception section, ftrace
>>    section and kprobe section) or some such.
>
> For now, I will start with idea 2. I will create a special section for each
> class of functions (EL1 exception handlers, FTRACE trampolines, KPROBE
> trampolines). Instead of a special functions array, I will implement a
> special_sections array. The rest of the code should just fall into place.
>
> Let me know if you prefer something different.

It might be safer to start off by just putting all SYM_CODE into a section,
then pulling bits we know to be safe out of the section as needed. We know
that anything that's SYM_CODE is doing something non-standard and needs
checking to verify that the unwinder will be happy with it, and I think that
should cover most if not all of the cases above, as well as anything else we
didn't explicitly think of.
diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index eb29b1fe8255..684f65808394 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -59,6 +59,7 @@ struct stackframe {
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	int graph;
 #endif
+	bool reliable;
 };
 
 extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
@@ -169,6 +170,7 @@ static inline void start_backtrace(struct stackframe *frame,
 	bitmap_zero(frame->stacks_done, __NR_STACK_TYPES);
 	frame->prev_fp = 0;
 	frame->prev_type = STACK_TYPE_UNKNOWN;
+	frame->reliable = true;
 }
 
 #endif	/* __ASM_STACKTRACE_H */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index ad20981dfda4..557657d6e6bd 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -18,6 +18,84 @@
 #include <asm/stack_pointer.h>
 #include <asm/stacktrace.h>
 
+struct function_range {
+	unsigned long	start;
+	unsigned long	end;
+};
+
+/*
+ * Special functions where the stack trace is unreliable.
+ */
+static struct function_range	special_functions[] = {
+	{ /* sentinel */ }
+};
+
+static bool is_reliable_function(unsigned long pc)
+{
+	static bool inited = false;
+	struct function_range *func;
+
+	if (!inited) {
+		static char sym[KSYM_NAME_LEN];
+		unsigned long size, offset;
+
+		for (func = special_functions; func->start; func++) {
+			if (kallsyms_lookup(func->start, &size, &offset,
+					    NULL, sym)) {
+				func->start -= offset;
+				func->end = func->start + size;
+			} else {
+				/*
+				 * This is just a label. So, we only need to
+				 * consider that particular location. So, size
+				 * is the size of one AArch64 instruction.
+				 */
+				func->end = func->start + 4;
+			}
+		}
+		inited = true;
+	}
+
+	for (func = special_functions; func->start; func++) {
+		if (pc >= func->start && pc < func->end)
+			return false;
+	}
+	return true;
+}
+
+/*
+ * Check for the presence of features and conditions that render the stack
+ * trace unreliable.
+ *
+ * Once all such cases have been addressed, this function can aid live
+ * patching (and this comment can be removed).
+ */
+static void check_reliability(struct stackframe *frame)
+{
+	/*
+	 * If the stack trace has already been marked unreliable, just return.
+	 */
+	if (!frame->reliable)
+		return;
+
+	/*
+	 * First, make sure that the return address is a proper kernel text
+	 * address. A NULL or invalid return address probably means there's
+	 * some generated code which __kernel_text_address() doesn't know
+	 * about. Mark the stack trace as not reliable.
+	 */
+	if (!__kernel_text_address(frame->pc)) {
+		frame->reliable = false;
+		return;
+	}
+
+	/*
+	 * Check the reliability of the return PC's function.
+	 */
+	if (!is_reliable_function(frame->pc))
+		frame->reliable = false;
+}
+
 /*
  * AArch64 PCS assigns the frame pointer to x29.
  *
@@ -108,6 +186,8 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 
 	frame->pc = ptrauth_strip_insn_pac(frame->pc);
 
+	check_reliability(frame);
+
 	return 0;
 }
 NOKPROBE_SYMBOL(unwind_frame);