diff mbox series

[RFC,v3,1/4] arm64: Introduce stack trace reliability checks in the unwinder

Message ID 20210503173615.21576-2-madvenka@linux.microsoft.com (mailing list archive)
State New, archived
Headers show
Series arm64: Stack trace reliability checks in the unwinder | expand

Commit Message

Madhavan T. Venkataraman May 3, 2021, 5:36 p.m. UTC
From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

The unwinder should check for the presence of various features and
conditions that can render the stack trace unreliable and mark the
the stack trace as unreliable for the benefit of the caller.

Introduce the first reliability check - If a return PC encountered in a
stack trace is not a valid kernel text address, the stack trace is
considered unreliable. It could be some generated code. Mark the stack
trace unreliable.

Other reliability checks will be added in the future.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm64/include/asm/stacktrace.h |  4 ++++
 arch/arm64/kernel/stacktrace.c      | 19 ++++++++++++++++---
 2 files changed, 20 insertions(+), 3 deletions(-)

Comments

Mark Brown May 4, 2021, 3:50 p.m. UTC | #1
On Mon, May 03, 2021 at 12:36:12PM -0500, madvenka@linux.microsoft.com wrote:

> +	/*
> +	 * First, make sure that the return address is a proper kernel text
> +	 * address. A NULL or invalid return address probably means there's
> +	 * some generated code which __kernel_text_address() doesn't know
> +	 * about. Mark the stack trace as not reliable.
> +	 */
> +	if (!__kernel_text_address(frame->pc)) {
> +		frame->reliable = false;
> +		return 0;
> +	}

Do we want the return here?  It means that...

> +
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>  	if (tsk->ret_stack &&
> -		(ptrauth_strip_insn_pac(frame->pc) == (unsigned long)return_to_handler)) {
> +		frame->pc == (unsigned long)return_to_handler) {
>  		struct ftrace_ret_stack *ret_stack;
>  		/*
>  		 * This is a case where function graph tracer has
> @@ -103,11 +117,10 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
>  		if (WARN_ON_ONCE(!ret_stack))
>  			return -EINVAL;
>  		frame->pc = ret_stack->ret;
> +		frame->pc = ptrauth_strip_insn_pac(frame->pc);
>  	}

...we skip this handling in the case where we're not in kernel code.  I
don't know off hand if that's a case that can happen right now but it
seems more robust to run through this and anything else we add later,
even if it's not relevant now changes either in the unwinder itself or
resulting from some future work elsewhere may mean it later becomes
important.  Skipping futher reliability checks is obviously fine if
we've already decided things aren't reliable but this is more than just
a reliability check.
Madhavan T. Venkataraman May 4, 2021, 7:14 p.m. UTC | #2
On 5/4/21 10:50 AM, Mark Brown wrote:
> On Mon, May 03, 2021 at 12:36:12PM -0500, madvenka@linux.microsoft.com wrote:
> 
>> +	/*
>> +	 * First, make sure that the return address is a proper kernel text
>> +	 * address. A NULL or invalid return address probably means there's
>> +	 * some generated code which __kernel_text_address() doesn't know
>> +	 * about. Mark the stack trace as not reliable.
>> +	 */
>> +	if (!__kernel_text_address(frame->pc)) {
>> +		frame->reliable = false;
>> +		return 0;
>> +	}
> 
> Do we want the return here?  It means that...
> 
>> +
>>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>>  	if (tsk->ret_stack &&
>> -		(ptrauth_strip_insn_pac(frame->pc) == (unsigned long)return_to_handler)) {
>> +		frame->pc == (unsigned long)return_to_handler) {
>>  		struct ftrace_ret_stack *ret_stack;
>>  		/*
>>  		 * This is a case where function graph tracer has
>> @@ -103,11 +117,10 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
>>  		if (WARN_ON_ONCE(!ret_stack))
>>  			return -EINVAL;
>>  		frame->pc = ret_stack->ret;
>> +		frame->pc = ptrauth_strip_insn_pac(frame->pc);
>>  	}
> 
> ...we skip this handling in the case where we're not in kernel code.  I
> don't know off hand if that's a case that can happen right now but it
> seems more robust to run through this and anything else we add later,
> even if it's not relevant now changes either in the unwinder itself or
> resulting from some future work elsewhere may mean it later becomes
> important.  Skipping futher reliability checks is obviously fine if
> we've already decided things aren't reliable but this is more than just
> a reliability check.
> 

AFAICT, currently, all the functions that the unwinder checks do have
valid kernel text addresses. However, I don't think there is any harm
in letting it fall through and make all the checks. So, I will remove the
return statement.

Thanks!

Madhavan
Josh Poimboeuf May 4, 2021, 9:52 p.m. UTC | #3
On Mon, May 03, 2021 at 12:36:12PM -0500, madvenka@linux.microsoft.com wrote:
> @@ -44,6 +44,8 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
>  	unsigned long fp = frame->fp;
>  	struct stack_info info;
>  
> +	frame->reliable = true;
> +

Why set 'reliable' to true on every invocation of unwind_frame()?
Shouldn't it be remembered across frames?

Also, it looks like there are several error scenarios where it returns
-EINVAL but doesn't set 'reliable' to false.
Madhavan T. Venkataraman May 4, 2021, 11:13 p.m. UTC | #4
On 5/4/21 4:52 PM, Josh Poimboeuf wrote:
> On Mon, May 03, 2021 at 12:36:12PM -0500, madvenka@linux.microsoft.com wrote:
>> @@ -44,6 +44,8 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
>>  	unsigned long fp = frame->fp;
>>  	struct stack_info info;
>>  
>> +	frame->reliable = true;
>> +
> 
> Why set 'reliable' to true on every invocation of unwind_frame()?
> Shouldn't it be remembered across frames?
> 

This is mainly for debug purposes in case a caller wants to print the whole stack and also
print which functions are unreliable. For livepatch, it does not make any difference. It will
quit as soon as it encounters an unreliable frame.

> Also, it looks like there are several error scenarios where it returns
> -EINVAL but doesn't set 'reliable' to false.
> 

I wanted to make a distinction between an error situation (like stack corruption where unwinding
has to stop) and an unreliable situation (where unwinding can still proceed). E.g., when a
stack trace is taken for informational purposes or debug purposes, the unwinding will try to
proceed until either the stack trace ends or an error happens.

Madhavan
Josh Poimboeuf May 5, 2021, 12:07 a.m. UTC | #5
On Tue, May 04, 2021 at 06:13:39PM -0500, Madhavan T. Venkataraman wrote:
> 
> 
> On 5/4/21 4:52 PM, Josh Poimboeuf wrote:
> > On Mon, May 03, 2021 at 12:36:12PM -0500, madvenka@linux.microsoft.com wrote:
> >> @@ -44,6 +44,8 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
> >>  	unsigned long fp = frame->fp;
> >>  	struct stack_info info;
> >>  
> >> +	frame->reliable = true;
> >> +
> > 
> > Why set 'reliable' to true on every invocation of unwind_frame()?
> > Shouldn't it be remembered across frames?
> > 
> 
> This is mainly for debug purposes in case a caller wants to print the whole stack and also
> print which functions are unreliable. For livepatch, it does not make any difference. It will
> quit as soon as it encounters an unreliable frame.

Hm, ok.  So 'frame->reliable' refers to the current frame, not the
entire stack.

> > Also, it looks like there are several error scenarios where it returns
> > -EINVAL but doesn't set 'reliable' to false.
> > 
> 
> I wanted to make a distinction between an error situation (like stack corruption where unwinding
> has to stop) and an unreliable situation (where unwinding can still proceed). E.g., when a
> stack trace is taken for informational purposes or debug purposes, the unwinding will try to
> proceed until either the stack trace ends or an error happens.

Ok, but I don't understand how that relates to my comment.

Why wouldn't a stack corruption like !on_accessible_stack() set
'frame->reliable' to false?

In other words: for livepatch purposes, how does the caller tell the
difference between hitting the final stack record -- which returns an
error with reliable 'true' -- and a stack corruption like
!on_accessible_stack(), which also returns an error with reliable
'true'?  Surely the latter should be considered unreliable?
Madhavan T. Venkataraman May 5, 2021, 12:21 a.m. UTC | #6
On 5/4/21 7:07 PM, Josh Poimboeuf wrote:
> On Tue, May 04, 2021 at 06:13:39PM -0500, Madhavan T. Venkataraman wrote:
>>
>>
>> On 5/4/21 4:52 PM, Josh Poimboeuf wrote:
>>> On Mon, May 03, 2021 at 12:36:12PM -0500, madvenka@linux.microsoft.com wrote:
>>>> @@ -44,6 +44,8 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
>>>>  	unsigned long fp = frame->fp;
>>>>  	struct stack_info info;
>>>>  
>>>> +	frame->reliable = true;
>>>> +
>>>
>>> Why set 'reliable' to true on every invocation of unwind_frame()?
>>> Shouldn't it be remembered across frames?
>>>
>>
>> This is mainly for debug purposes in case a caller wants to print the whole stack and also
>> print which functions are unreliable. For livepatch, it does not make any difference. It will
>> quit as soon as it encounters an unreliable frame.
> 
> Hm, ok.  So 'frame->reliable' refers to the current frame, not the
> entire stack.
> 

Yes.

>>> Also, it looks like there are several error scenarios where it returns
>>> -EINVAL but doesn't set 'reliable' to false.
>>>
>>
>> I wanted to make a distinction between an error situation (like stack corruption where unwinding
>> has to stop) and an unreliable situation (where unwinding can still proceed). E.g., when a
>> stack trace is taken for informational purposes or debug purposes, the unwinding will try to
>> proceed until either the stack trace ends or an error happens.
> 
> Ok, but I don't understand how that relates to my comment.
> 
> Why wouldn't a stack corruption like !on_accessible_stack() set
> 'frame->reliable' to false?
> 

I do see your point. If an error has been hit, then the stack trace is essentially unreliable
regardless of anything else. So, I accept your comment. I will mark the stack trace as unreliable
if any kind of error is encountered.

Thanks!

Madhavan
diff mbox series

Patch

diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index eb29b1fe8255..f1eab6b029f7 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -49,6 +49,8 @@  struct stack_info {
  *
  * @graph:       When FUNCTION_GRAPH_TRACER is selected, holds the index of a
  *               replacement lr value in the ftrace graph stack.
+ *
+ * @reliable:	Is this stack frame reliable?
  */
 struct stackframe {
 	unsigned long fp;
@@ -59,6 +61,7 @@  struct stackframe {
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	int graph;
 #endif
+	bool reliable;
 };
 
 extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
@@ -169,6 +172,7 @@  static inline void start_backtrace(struct stackframe *frame,
 	bitmap_zero(frame->stacks_done, __NR_STACK_TYPES);
 	frame->prev_fp = 0;
 	frame->prev_type = STACK_TYPE_UNKNOWN;
+	frame->reliable = true;
 }
 
 #endif	/* __ASM_STACKTRACE_H */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index d55bdfb7789c..c21a1bca28f3 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -44,6 +44,8 @@  int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 	unsigned long fp = frame->fp;
 	struct stack_info info;
 
+	frame->reliable = true;
+
 	/* Terminal record; nothing to unwind */
 	if (!fp)
 		return -ENOENT;
@@ -86,12 +88,24 @@  int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 	 */
 	frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
 	frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
+	frame->pc = ptrauth_strip_insn_pac(frame->pc);
 	frame->prev_fp = fp;
 	frame->prev_type = info.type;
 
+	/*
+	 * First, make sure that the return address is a proper kernel text
+	 * address. A NULL or invalid return address probably means there's
+	 * some generated code which __kernel_text_address() doesn't know
+	 * about. Mark the stack trace as not reliable.
+	 */
+	if (!__kernel_text_address(frame->pc)) {
+		frame->reliable = false;
+		return 0;
+	}
+
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	if (tsk->ret_stack &&
-		(ptrauth_strip_insn_pac(frame->pc) == (unsigned long)return_to_handler)) {
+		frame->pc == (unsigned long)return_to_handler) {
 		struct ftrace_ret_stack *ret_stack;
 		/*
 		 * This is a case where function graph tracer has
@@ -103,11 +117,10 @@  int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 		if (WARN_ON_ONCE(!ret_stack))
 			return -EINVAL;
 		frame->pc = ret_stack->ret;
+		frame->pc = ptrauth_strip_insn_pac(frame->pc);
 	}
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
 
-	frame->pc = ptrauth_strip_insn_pac(frame->pc);
-
 	return 0;
 }
 NOKPROBE_SYMBOL(unwind_frame);