Message ID | 20210503173615.21576-2-madvenka@linux.microsoft.com (mailing list archive) |
---|---|
State | New, archived |
Series | arm64: Stack trace reliability checks in the unwinder |
On Mon, May 03, 2021 at 12:36:12PM -0500, madvenka@linux.microsoft.com wrote:

> +	/*
> +	 * First, make sure that the return address is a proper kernel text
> +	 * address. A NULL or invalid return address probably means there's
> +	 * some generated code which __kernel_text_address() doesn't know
> +	 * about. Mark the stack trace as not reliable.
> +	 */
> +	if (!__kernel_text_address(frame->pc)) {
> +		frame->reliable = false;
> +		return 0;
> +	}

Do we want the return here?  It means that...

> +
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>  	if (tsk->ret_stack &&
> -		(ptrauth_strip_insn_pac(frame->pc) == (unsigned long)return_to_handler)) {
> +		frame->pc == (unsigned long)return_to_handler) {
>  		struct ftrace_ret_stack *ret_stack;
>  		/*
>  		 * This is a case where function graph tracer has
> @@ -103,11 +117,10 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
>  		if (WARN_ON_ONCE(!ret_stack))
>  			return -EINVAL;
>  		frame->pc = ret_stack->ret;
> +		frame->pc = ptrauth_strip_insn_pac(frame->pc);
>  	}

...we skip this handling in the case where we're not in kernel code.  I
don't know offhand if that's a case that can happen right now, but it
seems more robust to run through this and anything else we add later,
even if it's not relevant now; changes either in the unwinder itself or
resulting from some future work elsewhere may mean it later becomes
important.  Skipping further reliability checks is obviously fine if
we've already decided things aren't reliable, but this is more than just
a reliability check.
On 5/4/21 10:50 AM, Mark Brown wrote:
> On Mon, May 03, 2021 at 12:36:12PM -0500, madvenka@linux.microsoft.com wrote:
>
>> +	/*
>> +	 * First, make sure that the return address is a proper kernel text
>> +	 * address. A NULL or invalid return address probably means there's
>> +	 * some generated code which __kernel_text_address() doesn't know
>> +	 * about. Mark the stack trace as not reliable.
>> +	 */
>> +	if (!__kernel_text_address(frame->pc)) {
>> +		frame->reliable = false;
>> +		return 0;
>> +	}
>
> Do we want the return here?  It means that...
>
>> +
>>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>>  	if (tsk->ret_stack &&
>> -		(ptrauth_strip_insn_pac(frame->pc) == (unsigned long)return_to_handler)) {
>> +		frame->pc == (unsigned long)return_to_handler) {
>>  		struct ftrace_ret_stack *ret_stack;
>>  		/*
>>  		 * This is a case where function graph tracer has
>> @@ -103,11 +117,10 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
>>  		if (WARN_ON_ONCE(!ret_stack))
>>  			return -EINVAL;
>>  		frame->pc = ret_stack->ret;
>> +		frame->pc = ptrauth_strip_insn_pac(frame->pc);
>>  	}
>
> ...we skip this handling in the case where we're not in kernel code.  I
> don't know offhand if that's a case that can happen right now, but it
> seems more robust to run through this and anything else we add later,
> even if it's not relevant now; changes either in the unwinder itself or
> resulting from some future work elsewhere may mean it later becomes
> important.  Skipping further reliability checks is obviously fine if
> we've already decided things aren't reliable, but this is more than just
> a reliability check.
>

AFAICT, currently, all the functions that the unwinder checks do have
valid kernel text addresses. However, I don't think there is any harm in
letting it fall through and make all the checks. So, I will remove the
return statement.

Thanks!

Madhavan
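The change agreed on above (flag the frame as unreliable but keep going, so the function graph tracer handling still runs) can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the revised kernel patch: `model_frame`, `model_text_address()`, `model_check_frame()`, the text range, and the `MODEL_RETURN_TO_HANDLER` address are all hypothetical stand-ins for `struct stackframe`, `__kernel_text_address()`, and `return_to_handler`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the kernel's struct stackframe. */
struct model_frame {
	uintptr_t pc;
	bool reliable;
};

/* Assumed text range; stands in for what __kernel_text_address() accepts. */
#define MODEL_TEXT_START 0x1000u
#define MODEL_TEXT_END   0x2000u

/* Assumed address of the tracer trampoline (return_to_handler in the patch). */
#define MODEL_RETURN_TO_HANDLER 0x1800u

static bool model_text_address(uintptr_t pc)
{
	return pc >= MODEL_TEXT_START && pc < MODEL_TEXT_END;
}

/*
 * Checks applied after a frame record has been loaded. An address outside
 * the text range only clears the reliable flag; there is no early return,
 * so the trampoline fix-up below runs for every frame.
 */
static int model_check_frame(struct model_frame *frame, uintptr_t saved_ret)
{
	if (!model_text_address(frame->pc))
		frame->reliable = false;	/* note it, but fall through */

	/* Replace the trampoline address with the saved original return address. */
	if (frame->pc == MODEL_RETURN_TO_HANDLER)
		frame->pc = saved_ret;

	return 0;
}
```

In this model a frame with `pc = 0x5000` comes back with `reliable == false` and an unchanged `pc`, while a frame whose `pc` equals `MODEL_RETURN_TO_HANDLER` has its `pc` replaced by `saved_ret`.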
On Mon, May 03, 2021 at 12:36:12PM -0500, madvenka@linux.microsoft.com wrote:
> @@ -44,6 +44,8 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
>  	unsigned long fp = frame->fp;
>  	struct stack_info info;
>
> +	frame->reliable = true;
> +

Why set 'reliable' to true on every invocation of unwind_frame()?
Shouldn't it be remembered across frames?

Also, it looks like there are several error scenarios where it returns
-EINVAL but doesn't set 'reliable' to false.
On 5/4/21 4:52 PM, Josh Poimboeuf wrote:
> On Mon, May 03, 2021 at 12:36:12PM -0500, madvenka@linux.microsoft.com wrote:
>> @@ -44,6 +44,8 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
>>  	unsigned long fp = frame->fp;
>>  	struct stack_info info;
>>
>> +	frame->reliable = true;
>> +
>
> Why set 'reliable' to true on every invocation of unwind_frame()?
> Shouldn't it be remembered across frames?
>

This is mainly for debug purposes in case a caller wants to print the whole stack and also
print which functions are unreliable. For livepatch, it does not make any difference. It will
quit as soon as it encounters an unreliable frame.

> Also, it looks like there are several error scenarios where it returns
> -EINVAL but doesn't set 'reliable' to false.
>

I wanted to make a distinction between an error situation (like stack corruption where unwinding
has to stop) and an unreliable situation (where unwinding can still proceed). E.g., when a
stack trace is taken for informational purposes or debug purposes, the unwinding will try to
proceed until either the stack trace ends or an error happens.

Madhavan
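The two kinds of caller described above can be modeled in a few lines. The sketch below is an assumption-laden illustration, not code from the series: `model_entry`, `model_dump_trace()`, and `model_reliable_prefix()` are hypothetical names, and the `"? "` prefix for unreliable entries is just an illustrative marker.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record of one unwound frame and its per-frame flag. */
struct model_entry {
	unsigned long pc;
	bool reliable;
};

/*
 * Debug-style consumer: print every frame and mark the unreliable ones.
 * The walk continues past unreliable frames. Returns the number printed.
 */
static size_t model_dump_trace(const struct model_entry *e, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		printf("%s0x%lx\n", e[i].reliable ? "  " : "? ", e[i].pc);
	return n;
}

/*
 * Livepatch-style consumer: stop at the first unreliable frame.
 * Returns how many leading frames were accepted.
 */
static size_t model_reliable_prefix(const struct model_entry *e, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!e[i].reliable)
			break;
	return i;
}
```

Because the flag is reset for each frame, the debug-style walk can still show which later frames look trustworthy, whereas the livepatch-style walk never gets that far.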
On Tue, May 04, 2021 at 06:13:39PM -0500, Madhavan T. Venkataraman wrote:
> On 5/4/21 4:52 PM, Josh Poimboeuf wrote:
> > On Mon, May 03, 2021 at 12:36:12PM -0500, madvenka@linux.microsoft.com wrote:
> >> @@ -44,6 +44,8 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
> >>  	unsigned long fp = frame->fp;
> >>  	struct stack_info info;
> >>
> >> +	frame->reliable = true;
> >> +
> >
> > Why set 'reliable' to true on every invocation of unwind_frame()?
> > Shouldn't it be remembered across frames?
> >
>
> This is mainly for debug purposes in case a caller wants to print the whole stack and also
> print which functions are unreliable. For livepatch, it does not make any difference. It will
> quit as soon as it encounters an unreliable frame.

Hm, ok.  So 'frame->reliable' refers to the current frame, not the
entire stack.

> > Also, it looks like there are several error scenarios where it returns
> > -EINVAL but doesn't set 'reliable' to false.
> >
>
> I wanted to make a distinction between an error situation (like stack corruption where unwinding
> has to stop) and an unreliable situation (where unwinding can still proceed). E.g., when a
> stack trace is taken for informational purposes or debug purposes, the unwinding will try to
> proceed until either the stack trace ends or an error happens.

Ok, but I don't understand how that relates to my comment.

Why wouldn't a stack corruption like !on_accessible_stack() set
'frame->reliable' to false?

In other words: for livepatch purposes, how does the caller tell the
difference between hitting the final stack record -- which returns an
error with reliable 'true' -- and a stack corruption like
!on_accessible_stack(), which also returns an error with reliable
'true'?  Surely the latter should be considered unreliable?
On 5/4/21 7:07 PM, Josh Poimboeuf wrote:
> On Tue, May 04, 2021 at 06:13:39PM -0500, Madhavan T. Venkataraman wrote:
>> On 5/4/21 4:52 PM, Josh Poimboeuf wrote:
>>> On Mon, May 03, 2021 at 12:36:12PM -0500, madvenka@linux.microsoft.com wrote:
>>>> @@ -44,6 +44,8 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
>>>>  	unsigned long fp = frame->fp;
>>>>  	struct stack_info info;
>>>>
>>>> +	frame->reliable = true;
>>>> +
>>>
>>> Why set 'reliable' to true on every invocation of unwind_frame()?
>>> Shouldn't it be remembered across frames?
>>>
>>
>> This is mainly for debug purposes in case a caller wants to print the whole stack and also
>> print which functions are unreliable. For livepatch, it does not make any difference. It will
>> quit as soon as it encounters an unreliable frame.
>
> Hm, ok.  So 'frame->reliable' refers to the current frame, not the
> entire stack.
>

Yes.

>>> Also, it looks like there are several error scenarios where it returns
>>> -EINVAL but doesn't set 'reliable' to false.
>>>
>>
>> I wanted to make a distinction between an error situation (like stack corruption where unwinding
>> has to stop) and an unreliable situation (where unwinding can still proceed). E.g., when a
>> stack trace is taken for informational purposes or debug purposes, the unwinding will try to
>> proceed until either the stack trace ends or an error happens.
>
> Ok, but I don't understand how that relates to my comment.
>
> Why wouldn't a stack corruption like !on_accessible_stack() set
> 'frame->reliable' to false?
>

I do see your point. If an error has been hit, then the stack trace is essentially unreliable
regardless of anything else. So, I accept your comment. I will mark the stack trace as unreliable
if any kind of error is encountered.

Thanks!

Madhavan
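One way to read the outcome of this exchange is sketched below as a user-space model. Everything in it is hypothetical: `model_frame`, `model_on_stack()`, `model_unwind_step()`, `model_trace_is_reliable()`, and the stack range are stand-ins, and the sketch assumes that reaching the terminal record (-ENOENT) counts as a clean end of the trace rather than as an error. The thread does not say whether the revised patch treats the terminal record that way.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct stackframe. */
struct model_frame {
	unsigned long fp;
	bool reliable;
};

/* Assumed accessible stack range; stands in for on_accessible_stack(). */
#define MODEL_STACK_LOW  0x8000UL
#define MODEL_STACK_HIGH 0x9000UL

static bool model_on_stack(unsigned long fp)
{
	return fp >= MODEL_STACK_LOW && fp < MODEL_STACK_HIGH;
}

/*
 * One unwind step. next_fp models the frame pointer that would be read
 * from the current frame record. The corruption path clears the flag
 * before returning its error code.
 */
static int model_unwind_step(struct model_frame *frame, unsigned long next_fp)
{
	frame->reliable = true;

	if (!frame->fp)
		return -ENOENT;		/* terminal record (assumed clean end) */

	if (!model_on_stack(frame->fp)) {
		frame->reliable = false;	/* error implies unreliable */
		return -EINVAL;
	}

	frame->fp = next_fp;
	return 0;
}

/*
 * Livepatch-style walk: the trace is trusted only if it reaches the
 * terminal record without any error or unreliable frame on the way.
 */
static bool model_trace_is_reliable(struct model_frame *frame,
				    const unsigned long *next_fps, size_t n)
{
	size_t i;

	for (i = 0; ; i++) {
		/* After the modeled chain runs out, the next record is terminal. */
		int ret = model_unwind_step(frame, i < n ? next_fps[i] : 0);

		if (ret == -ENOENT)
			return true;
		if (ret < 0 || !frame->reliable)
			return false;
	}
}
```

With this shape the caller no longer has to guess: a walk that ends at the terminal record leaves `reliable` set, and a walk that ends in an out-of-range frame pointer leaves it cleared.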
diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index eb29b1fe8255..f1eab6b029f7 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -49,6 +49,8 @@ struct stack_info {
  *
  * @graph:       When FUNCTION_GRAPH_TRACER is selected, holds the index of a
  *               replacement lr value in the ftrace graph stack.
+ *
+ * @reliable:    Is this stack frame reliable?
  */
 struct stackframe {
 	unsigned long fp;
@@ -59,6 +61,7 @@ struct stackframe {
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	int graph;
 #endif
+	bool reliable;
 };
 
 extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
@@ -169,6 +172,7 @@ static inline void start_backtrace(struct stackframe *frame,
 	bitmap_zero(frame->stacks_done, __NR_STACK_TYPES);
 	frame->prev_fp = 0;
 	frame->prev_type = STACK_TYPE_UNKNOWN;
+	frame->reliable = true;
 }
 
 #endif	/* __ASM_STACKTRACE_H */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index d55bdfb7789c..c21a1bca28f3 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -44,6 +44,8 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 	unsigned long fp = frame->fp;
 	struct stack_info info;
 
+	frame->reliable = true;
+
 	/* Terminal record; nothing to unwind */
 	if (!fp)
 		return -ENOENT;
@@ -86,12 +88,24 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 	 */
 	frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
 	frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
+	frame->pc = ptrauth_strip_insn_pac(frame->pc);
 	frame->prev_fp = fp;
 	frame->prev_type = info.type;
 
+	/*
+	 * First, make sure that the return address is a proper kernel text
+	 * address. A NULL or invalid return address probably means there's
+	 * some generated code which __kernel_text_address() doesn't know
+	 * about. Mark the stack trace as not reliable.
+	 */
+	if (!__kernel_text_address(frame->pc)) {
+		frame->reliable = false;
+		return 0;
+	}
+
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	if (tsk->ret_stack &&
-		(ptrauth_strip_insn_pac(frame->pc) == (unsigned long)return_to_handler)) {
+		frame->pc == (unsigned long)return_to_handler) {
 		struct ftrace_ret_stack *ret_stack;
 		/*
 		 * This is a case where function graph tracer has
@@ -103,11 +117,10 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 		if (WARN_ON_ONCE(!ret_stack))
 			return -EINVAL;
 		frame->pc = ret_stack->ret;
+		frame->pc = ptrauth_strip_insn_pac(frame->pc);
 	}
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
 
-	frame->pc = ptrauth_strip_insn_pac(frame->pc);
-
 	return 0;
 }
 NOKPROBE_SYMBOL(unwind_frame);