Message ID | 20211123193723.12112-5-madvenka@linux.microsoft.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | arm64: Reorganize the unwinder and implement stack trace reliability checks | expand |
On Tue, Nov 23, 2021 at 01:37:22PM -0600, madvenka@linux.microsoft.com wrote: > Introduce arch_stack_walk_reliable() for ARM64. This works like > arch_stack_walk() except that it returns -EINVAL if the stack trace is not > reliable. > Until all the reliability checks are in place, arch_stack_walk_reliable() > may not be used by livepatch. But it may be used by debug and test code. Probably also worth noting that this doesn't select HAVE_RELIABLE_STACKTRACE which is what any actual users are going to use to identify if the architecture has the feature. I would have been tempted to add arch_stack_walk() as a separate patch but equally having the user code there (even if it itself can't yet be used...) helps with reviewing the actual unwinder so I don't mind. > +static void unwind_check_reliability(struct task_struct *task, > + struct stackframe *frame) > +{ > + if (frame->fp == (unsigned long)task_pt_regs(task)->stackframe) { > + /* Final frame; no more unwind, no need to check reliability */ > + return; > + } If the unwinder carries on for some reason (the code for that is elsewhere and may be updated separately...) then this will start checking again. I'm not sure if this is a *problem* as such but the thing about this being the final frame coupled with not actually explicitly stopping the unwind here makes me think this should at least be clearer, the comment begs the question about what happens if something decides it is not in fact the final frame.
On 11/25/21 8:56 AM, Mark Brown wrote: > On Tue, Nov 23, 2021 at 01:37:22PM -0600, madvenka@linux.microsoft.com wrote: > >> Introduce arch_stack_walk_reliable() for ARM64. This works like >> arch_stack_walk() except that it returns -EINVAL if the stack trace is not >> reliable. > >> Until all the reliability checks are in place, arch_stack_walk_reliable() >> may not be used by livepatch. But it may be used by debug and test code. > > Probably also worth noting that this doesn't select > HAVE_RELIABLE_STACKTRACE which is what any actual users are going to use > to identify if the architecture has the feature. I would have been > tempted to add arch_stack_walk() as a separate patch but equally having > the user code there (even if it itself can't yet be used...) helps with > reviewing the actual unwinder so I don't mind. > I did not select HAVE_RELIABLE_STACKTRACE just in case we think that some more reliability checks need to be added. But if reviewers agree that this patch series contains all the reliability checks we need, I will add a patch to select HAVE_RELIABLE_STACKTRACE to the series. >> +static void unwind_check_reliability(struct task_struct *task, >> + struct stackframe *frame) >> +{ >> + if (frame->fp == (unsigned long)task_pt_regs(task)->stackframe) { >> + /* Final frame; no more unwind, no need to check reliability */ >> + return; >> + } > > If the unwinder carries on for some reason (the code for that is > elsewhere and may be updated separately...) then this will start > checking again. I'm not sure if this is a *problem* as such but the > thing about this being the final frame coupled with not actually > explicitly stopping the unwind here makes me think this should at least > be clearer, the comment begs the question about what happens if > something decides it is not in fact the final frame. > I can address this by adding an explicit comment to that effect. For example, define a separate function to check for the final frame: /* * Check if this is the final frame. Unwind must stop at the final * frame. */ static inline bool unwind_is_final_frame(struct task_struct *task, struct stackframe *frame) { return frame->fp == (unsigned long)task_pt_regs(task)->stackframe; } Then, use this function in unwind_check_reliability() and unwind_continue(). Is this acceptable? Madhavan
On Thu, Nov 25, 2021 at 10:59:27AM -0600, Madhavan T. Venkataraman wrote: > On 11/25/21 8:56 AM, Mark Brown wrote: > > On Tue, Nov 23, 2021 at 01:37:22PM -0600, madvenka@linux.microsoft.com wrote: > > Probably also worth noting that this doesn't select > > HAVE_RELIABLE_STACKTRACE which is what any actual users are going to use > > to identify if the architecture has the feature. I would have been > > tempted to add arch_stack_walk() as a separate patch but equally having > > the user code there (even if it itself can't yet be used...) helps with > > reviewing the actual unwinder so I don't mind. > I did not select HAVE_RELIABLE_STACKTRACE just in case we think that some > more reliability checks need to be added. But if reviewers agree > that this patch series contains all the reliability checks we need, I > will add a patch to select HAVE_RELIABLE_STACKTRACE to the series. I agree that more checks probably need to be added, might be worth throwing that patch into the end of the series though to provide a place to discuss what exactly we need. My main thought here was that it's worth explicitly highlighting in this change that the Kconfig bit isn't glued up here so reviewers notice that's what's happening. > >> +static void unwind_check_reliability(struct task_struct *task, > >> + struct stackframe *frame) > >> +{ > >> + if (frame->fp == (unsigned long)task_pt_regs(task)->stackframe) { > >> + /* Final frame; no more unwind, no need to check reliability */ > >> + return; > >> + } > > If the unwinder carries on for some reason (the code for that is > > elsewhere and may be updated separately...) then this will start > > checking again. I'm not sure if this is a *problem* as such but the > > thing about this being the final frame coupled with not actually > > explicitly stopping the unwind here makes me think this should at least > > be clearer, the comment begs the question about what happens if > > something decides it is not in fact the final frame. > I can address this by adding an explicit comment to that effect. > For example, define a separate function to check for the final frame: > /* > * Check if this is the final frame. Unwind must stop at the final > * frame. > */ > static inline bool unwind_is_final_frame(struct task_struct *task, > struct stackframe *frame) > { > return frame->fp == (unsigned long)task_pt_regs(task)->stackframe; > } > Then, use this function in unwind_check_reliability() and unwind_continue(). > Is this acceptable? Yes, I think that should address the issue - I'd have to see it in context to be sure but it does make it clear that the same check is being done which was the main thing.
On 11/26/21 7:29 AM, Mark Brown wrote: > On Thu, Nov 25, 2021 at 10:59:27AM -0600, Madhavan T. Venkataraman wrote: >> On 11/25/21 8:56 AM, Mark Brown wrote: >>> On Tue, Nov 23, 2021 at 01:37:22PM -0600, madvenka@linux.microsoft.com wrote: > >>> Probably also worth noting that this doesn't select >>> HAVE_RELIABLE_STACKTRACE which is what any actual users are going to use >>> to identify if the architecture has the feature. I would have been >>> tempted to add arch_stack_walk() as a separate patch but equally having >>> the user code there (even if it itself can't yet be used...) helps with >>> reviewing the actual unwinder so I don't mind. > >> I did not select HAVE_RELIABLE_STACKTRACE just in case we think that some >> more reliability checks need to be added. But if reviewers agree >> that this patch series contains all the reliability checks we need, I >> will add a patch to select HAVE_RELIABLE_STACKTRACE to the series. > > I agree that more checks probably need to be added, might be worth > throwing that patch into the end of the series though to provide a place > to discuss what exactly we need. My main thought here was that it's > worth explicitly highlighting in this change that the Kconfig bit isn't > glued up here so reviewers notice that's what's happening. > OK. I will add the patch to the next version. >>>> +static void unwind_check_reliability(struct task_struct *task, >>>> + struct stackframe *frame) >>>> +{ >>>> + if (frame->fp == (unsigned long)task_pt_regs(task)->stackframe) { >>>> + /* Final frame; no more unwind, no need to check reliability */ >>>> + return; >>>> + } > >>> If the unwinder carries on for some reason (the code for that is >>> elsewhere and may be updated separately...) then this will start >>> checking again. I'm not sure if this is a *problem* as such but the >>> thing about this being the final frame coupled with not actually >>> explicitly stopping the unwind here makes me think this should at least >>> be clearer, the comment begs the question about what happens if >>> something decides it is not in fact the final frame. > >> I can address this by adding an explicit comment to that effect. >> For example, define a separate function to check for the final frame: > >> /* >> * Check if this is the final frame. Unwind must stop at the final >> * frame. >> */ >> static inline bool unwind_is_final_frame(struct task_struct *task, >> struct stackframe *frame) >> { >> return frame->fp == (unsigned long)task_pt_regs(task)->stackframe; >> } > >> Then, use this function in unwind_check_reliability() and unwind_continue(). > >> Is this acceptable? > > Yes, I think that should address the issue - I'd have to see it in > context to be sure but it does make it clear that the same check is > being done which was the main thing. > OK. I will make the above changes in the next version. Thanks. Madhavan
diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h index d838586adef9..7143e80c3d96 100644 --- a/arch/arm64/include/asm/stacktrace.h +++ b/arch/arm64/include/asm/stacktrace.h @@ -53,6 +53,8 @@ struct stack_info { * value. * * @failed: Unwind failed. + * + * @reliable: Stack trace is reliable. */ struct stackframe { unsigned long fp; @@ -64,6 +66,7 @@ struct stackframe { struct llist_node *kr_cur; #endif bool failed; + bool reliable; }; extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk, diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c index 3b670ab1f0e9..77eb00e45558 100644 --- a/arch/arm64/kernel/stacktrace.c +++ b/arch/arm64/kernel/stacktrace.c @@ -18,6 +18,26 @@ #include <asm/stack_pointer.h> #include <asm/stacktrace.h> +/* + * Check the stack frame for conditions that make further unwinding unreliable. + */ +static void unwind_check_reliability(struct task_struct *task, + struct stackframe *frame) +{ + if (frame->fp == (unsigned long)task_pt_regs(task)->stackframe) { + /* Final frame; no more unwind, no need to check reliability */ + return; + } + + /* + * If the PC is not a known kernel text address, then we cannot + * be sure that a subsequent unwind will be reliable, as we + * don't know that the code follows our unwind requirements. + */ + if (!__kernel_text_address(frame->pc)) + frame->reliable = false; +} + /* * AArch64 PCS assigns the frame pointer to x29. * @@ -33,8 +53,9 @@ */ -static void unwind_start(struct stackframe *frame, unsigned long fp, - unsigned long pc) +static void unwind_start(struct task_struct *task, + struct stackframe *frame, + unsigned long fp, unsigned long pc) { frame->fp = fp; frame->pc = pc; @@ -55,6 +76,8 @@ static void unwind_start(struct stackframe *frame, unsigned long fp, frame->prev_fp = 0; frame->prev_type = STACK_TYPE_UNKNOWN; frame->failed = false; + frame->reliable = true; + unwind_check_reliability(task, frame); } /* @@ -141,6 +164,7 @@ static void notrace unwind_next(struct task_struct *tsk, if (is_kretprobe_trampoline(frame->pc)) frame->pc = kretprobe_find_ret_addr(tsk, (void *)frame->fp, &frame->kr_cur); #endif + unwind_check_reliability(tsk, frame); } NOKPROBE_SYMBOL(unwind_next); @@ -166,15 +190,16 @@ static bool unwind_continue(struct task_struct *task, return true; } -static void notrace unwind(struct task_struct *tsk, +static bool notrace unwind(struct task_struct *tsk, unsigned long fp, unsigned long pc, bool (*fn)(void *, unsigned long), void *data) { struct stackframe frame; - unwind_start(&frame, fp, pc); + unwind_start(tsk, &frame, fp, pc); while (unwind_continue(tsk, &frame, fn, data)) unwind_next(tsk, &frame); + return !frame.failed && frame.reliable; } NOKPROBE_SYMBOL(unwind); @@ -231,3 +256,29 @@ noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry, } unwind(task, fp, pc, consume_entry, cookie); } + +/* + * arch_stack_walk_reliable() may not be used for livepatch until all of + * the reliability checks are in place in unwind_consume(). However, + * debug and test code can choose to use it even if all the checks are not + * in place. + */ +noinline int notrace arch_stack_walk_reliable(stack_trace_consume_fn consume_fn, + void *cookie, + struct task_struct *task) +{ + unsigned long fp, pc; + + if (task == current) { + /* Skip arch_stack_walk_reliable() in the stack trace. */ + fp = (unsigned long)__builtin_frame_address(1); + pc = (unsigned long)__builtin_return_address(0); + } else { + /* Caller guarantees that the task is not running. */ + fp = thread_saved_fp(task); + pc = thread_saved_pc(task); + } + if (unwind(task, fp, pc, consume_fn, cookie)) + return 0; + return -EINVAL; +}