[RFC,v2,1/4] arm64: Implement infrastructure for stack trace reliability checks

Message ID 20210405204313.21346-2-madvenka@linux.microsoft.com (mailing list archive)
State New, archived
Series arm64: Implement stack trace reliability checks

Commit Message

Madhavan T. Venkataraman April 5, 2021, 8:43 p.m. UTC
From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

Implement a check_reliability() function that will contain checks for the
presence of various features and conditions that can render the stack trace
unreliable.

Introduce the first reliability check: if a return PC encountered in a
stack trace is not a valid kernel text address, the stack trace is
considered unreliable. Such an address could belong to generated code
that the kernel text address check does not know about.

Other reliability checks will be added in the future.

These checks will involve checking the return PC to see if it falls inside
any special functions where the stack trace is considered unreliable.
Implement the infrastructure needed for this.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm64/include/asm/stacktrace.h |  2 +
 arch/arm64/kernel/stacktrace.c      | 80 +++++++++++++++++++++++++++++
 2 files changed, 82 insertions(+)

Comments

Mark Brown April 8, 2021, 3:15 p.m. UTC | #1
On Mon, Apr 05, 2021 at 03:43:10PM -0500, madvenka@linux.microsoft.com wrote:
> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
> 
> Implement a check_reliability() function that will contain checks for the
> presence of various features and conditions that can render the stack trace
> unreliable.

Reviewed-by: Mark Brown <broonie@kernel.org>
Mark Brown April 8, 2021, 5:17 p.m. UTC | #2
On Mon, Apr 05, 2021 at 03:43:10PM -0500, madvenka@linux.microsoft.com wrote:

> These checks will involve checking the return PC to see if it falls inside
> any special functions where the stack trace is considered unreliable.
> Implement the infrastructure needed for this.

Following up again based on an off-list discussion with Mark Rutland:
while I think this is a reasonable implementation for specifically
listing functions that cause problems, we could make life easier for
ourselves by instead using annotations at the call sites to put things
into sections which indicate that they're unsafe for unwinding. We can
then check for any address in one of those sections (or possibly do the
reverse and check for any address in a section we specifically know is
safe) rather than having to enumerate problematic functions in the
unwinder.  This also has the advantage of not having a list that's
separate to the functions themselves, so it's less likely that the
unwinder will get out of sync with the rest of the code as things evolve.

We already have SYM_CODE_START() annotations in the code for assembly
functions that aren't using the standard calling convention, which should
help a lot here. We could add a variant of that for things that we know
are safe on stacks (like those we expect to find at the bottom of
stacks).
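
A minimal userspace sketch of the section idea above, showing both the
"unsafe section" check and the reverse "known safe section" check. The
struct, the function names and the address bounds are assumptions made
for illustration; in the kernel the bounds would presumably come from
linker-script symbols, which this thread does not define.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one text section: [start, end). */
struct text_section {
	uintptr_t start;	/* first address in the section */
	uintptr_t end;		/* one past the last address */
};

static bool pc_in_section(const struct text_section *sec, uintptr_t pc)
{
	return pc >= sec->start && pc < sec->end;
}

/* Deny-list variant: any PC inside the unsafe section is unreliable. */
static bool reliable_by_unsafe_section(const struct text_section *unsafe,
				       uintptr_t pc)
{
	return !pc_in_section(unsafe, pc);
}

/* Allow-list variant: only PCs inside the known-safe section are reliable. */
static bool reliable_by_safe_section(const struct text_section *safe,
				     uintptr_t pc)
{
	return pc_in_section(safe, pc);
}
```

Either variant is a single range comparison per section, so the unwinder
would not need to know the names of the individual functions.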
Madhavan T. Venkataraman April 8, 2021, 7:30 p.m. UTC | #3
On 4/8/21 12:17 PM, Mark Brown wrote:
> On Mon, Apr 05, 2021 at 03:43:10PM -0500, madvenka@linux.microsoft.com wrote:
> 
>> These checks will involve checking the return PC to see if it falls inside
>> any special functions where the stack trace is considered unreliable.
>> Implement the infrastructure needed for this.
> 
> Following up again based on an off-list discussion with Mark Rutland:
> while I think this is a reasonable implementation for specifically
> listing functions that cause problems we could make life easier for
> ourselves by instead using annotations at the call sites to put things
> into sections which indicate that they're unsafe for unwinding, we can
> then check for any address in one of those sections (or possibly do the
> reverse and check for any address in a section we specifically know is
> safe) rather than having to enumerate problematic functions in the
> unwinder.  This also has the advantage of not having a list that's
> separate to the functions themselves so it's less likely that the
> unwinder will get out of sync with the rest of the code as things evolve.
> 
> We already have SYM_CODE_START() annotations in the code for assembly
> functions that aren't using the standard calling convention which should
> help a lot here, we could add a variant of that for things that we know
> are safe on stacks (like those we expect to find at the bottom of
> stacks).
> 

As I already mentioned before, I like the idea of sections. The only reason that I did
not try it was that I have to address FTRACE trampolines and the kretprobe_trampoline
(and optprobes in the future).

I have the following options:

1. Create a common section (I will have to come up with an appropriate name) and put
   all such functions in that one section.

2. Create one section for each logical type (exception section, ftrace section and
   kprobe section) or some such.

3. Use the section idea only for the el1 exceptions. For the others use the current
   special_functions[] approach.

Which one do you and Mark Rutland prefer? Or, is there another choice?

Madhavan
Madhavan T. Venkataraman April 8, 2021, 11:30 p.m. UTC | #4
On 4/8/21 2:30 PM, Madhavan T. Venkataraman wrote:
> 
> 
> On 4/8/21 12:17 PM, Mark Brown wrote:
>> On Mon, Apr 05, 2021 at 03:43:10PM -0500, madvenka@linux.microsoft.com wrote:
>>
>>> These checks will involve checking the return PC to see if it falls inside
>>> any special functions where the stack trace is considered unreliable.
>>> Implement the infrastructure needed for this.
>>
>> Following up again based on an off-list discussion with Mark Rutland:
>> while I think this is a reasonable implementation for specifically
>> listing functions that cause problems we could make life easier for
>> ourselves by instead using annotations at the call sites to put things
>> into sections which indicate that they're unsafe for unwinding, we can
>> then check for any address in one of those sections (or possibly do the
>> reverse and check for any address in a section we specifically know is
>> safe) rather than having to enumerate problematic functions in the
>> unwinder.  This also has the advantage of not having a list that's
>> separate to the functions themselves so it's less likely that the
>> unwinder will get out of sync with the rest of the code as things evolve.
>>
>> We already have SYM_CODE_START() annotations in the code for assembly
>> functions that aren't using the standard calling convention which should
>> help a lot here, we could add a variant of that for things that we know
>> are safe on stacks (like those we expect to find at the bottom of
>> stacks).
>>
> 
> As I already mentioned before, I like the idea of sections. The only reason that I did
> not try it was that I have to address FTRACE trampolines and the kretprobe_trampoline
> (and optprobes in the future).
> 
> I have the following options:
> 
> 1. Create a common section (I will have to come up with an appropriate name) and put
>    all such functions in that one section.
> 
> 2. Create one section for each logical type (exception section, ftrace section and
>    kprobe section) or some such.
> 

For now, I will start with idea 2. I will create a special section for each class of
functions (EL1 exception handlers, FTRACE trampolines, KPROBE trampolines). Instead of a
special functions array, I will implement a special_sections array. The rest of the code
should just fall into place.
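
A rough userspace sketch of what such a special_sections array could
look like, reusing the sentinel-terminated loop from the current
special_functions[] code. The addresses are made up for illustration,
and the names section_range and is_reliable_pc are assumptions, not
part of the patch.

```c
#include <stdbool.h>

/* Hypothetical bounds of one special section: [start, end). */
struct section_range {
	unsigned long start;
	unsigned long end;
};

/* One entry per class of functions; the addresses are placeholders. */
static const struct section_range special_sections[] = {
	{ 0x10000, 0x11000 },	/* EL1 exception handlers */
	{ 0x20000, 0x20800 },	/* FTRACE trampolines */
	{ 0x30000, 0x30100 },	/* KPROBE trampolines */
	{ 0, 0 }		/* sentinel */
};

/* A PC inside any special section makes the stack trace unreliable. */
static bool is_reliable_pc(unsigned long pc)
{
	const struct section_range *sec;

	for (sec = special_sections; sec->start; sec++) {
		if (pc >= sec->start && pc < sec->end)
			return false;
	}
	return true;
}
```

Unlike the function array, the section bounds are already known, so the
kallsyms_lookup() based lazy initialization would not be needed here.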

Let me know if you prefer something different.

Thanks.

Madhavan

> 3. Use the section idea only for the el1 exceptions. For the others use the current
>    special_functions[] approach.
> 
> Which one do you and Mark Rutland prefer? Or, is there another choice?
> 
> Madhavan
>
Mark Brown April 9, 2021, 11:57 a.m. UTC | #5
On Thu, Apr 08, 2021 at 06:30:22PM -0500, Madhavan T. Venkataraman wrote:
> On 4/8/21 2:30 PM, Madhavan T. Venkataraman wrote:

> > 1. Create a common section (I will have to come up with an appropriate name) and put
> >    all such functions in that one section.

> > 2. Create one section for each logical type (exception section, ftrace section and
> >    kprobe section) or some such.

> For now, I will start with idea 2. I will create a special section for each class of
> functions (EL1 exception handlers, FTRACE trampolines, KPROBE trampolines). Instead of a
> special functions array, I will implement a special_sections array. The rest of the code
> should just fall into place.

> Let me know if you prefer something different.

It might be safer to start off by just putting all SYM_CODE into a
section then pulling bits we know to be safe out of the section as
needed - we know that anything that's SYM_CODE is doing something
non-standard and needs checking to verify that the unwinder will be
happy with it and I think that should cover most if not all of the cases above
as well as anything else we didn't explicitly think of.
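
A small userspace model of that default-unsafe policy, as a sketch
only. The suggestion above is to move vetted code out of the section;
this model stands in for that with a per-symbol kind, where
SYM_KIND_CODE_SAFE represents SYM_CODE that has been reviewed and
pulled out. All names and addresses here are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of a symbol by its annotation. */
enum sym_kind {
	SYM_KIND_FUNC,		/* normal function, standard calling convention */
	SYM_KIND_CODE,		/* SYM_CODE, not yet vetted: treated as unsafe */
	SYM_KIND_CODE_SAFE,	/* SYM_CODE that was checked and pulled out */
};

struct sym_entry {
	unsigned long start;	/* first address of the symbol */
	unsigned long end;	/* one past the last address */
	enum sym_kind kind;
};

/* Linear search for the symbol containing pc, or NULL if there is none. */
static const struct sym_entry *find_sym(const struct sym_entry *tab,
					size_t n, unsigned long pc)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (pc >= tab[i].start && pc < tab[i].end)
			return &tab[i];
	}
	return NULL;
}

/* Unknown addresses and unvetted SYM_CODE both mark the trace unreliable. */
static bool sym_pc_is_reliable(const struct sym_entry *tab, size_t n,
			       unsigned long pc)
{
	const struct sym_entry *sym = find_sym(tab, n, pc);

	if (!sym)
		return false;
	return sym->kind != SYM_KIND_CODE;
}
```

The point of the default is that newly added SYM_CODE starts out as
unreliable without anyone having to update the unwinder.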
Patch

diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index eb29b1fe8255..684f65808394 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -59,6 +59,7 @@  struct stackframe {
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	int graph;
 #endif
+	bool reliable;
 };
 
 extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
@@ -169,6 +170,7 @@  static inline void start_backtrace(struct stackframe *frame,
 	bitmap_zero(frame->stacks_done, __NR_STACK_TYPES);
 	frame->prev_fp = 0;
 	frame->prev_type = STACK_TYPE_UNKNOWN;
+	frame->reliable = true;
 }
 
 #endif	/* __ASM_STACKTRACE_H */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index ad20981dfda4..557657d6e6bd 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -18,6 +18,84 @@ 
 #include <asm/stack_pointer.h>
 #include <asm/stacktrace.h>
 
+struct function_range {
+	unsigned long	start;
+	unsigned long	end;
+};
+
+/*
+ * Special functions where the stack trace is unreliable.
+ */
+static struct function_range	special_functions[] = {
+	{ /* sentinel */ }
+};
+
+static bool is_reliable_function(unsigned long pc)
+{
+	static bool inited = false;
+	struct function_range *func;
+
+	if (!inited) {
+		static char sym[KSYM_NAME_LEN];
+		unsigned long size, offset;
+
+		for (func = special_functions; func->start; func++) {
+			if (kallsyms_lookup(func->start, &size, &offset,
+					    NULL, sym)) {
+				func->start -= offset;
+				func->end = func->start + size;
+			} else {
+				/*
+				 * This is just a label. So, we only need to
+				 * consider that particular location. So, size
+				 * is the size of one AArch64 instruction.
+				 */
+				func->end = func->start + 4;
+			}
+		}
+		inited = true;
+	}
+
+	for (func = special_functions; func->start; func++) {
+		if (pc >= func->start && pc < func->end)
+			return false;
+	}
+	return true;
+}
+
+/*
+ * Check for the presence of features and conditions that render the stack
+ * trace unreliable.
+ *
+ * Once all such cases have been addressed, this function can aid live
+ * patching (and this comment can be removed).
+ */
+static void check_reliability(struct stackframe *frame)
+{
+	/*
+	 * If the stack trace has already been marked unreliable, just return.
+	 */
+	if (!frame->reliable)
+		return;
+
+	/*
+	 * First, make sure that the return address is a proper kernel text
+	 * address. A NULL or invalid return address probably means there's
+	 * some generated code which __kernel_text_address() doesn't know
+	 * about. Mark the stack trace as not reliable.
+	 */
+	if (!__kernel_text_address(frame->pc)) {
+		frame->reliable = false;
+		return;
+	}
+
+	/*
+	 * Check the reliability of the return PC's function.
+	 */
+	if (!is_reliable_function(frame->pc))
+		frame->reliable = false;
+}
+
 /*
  * AArch64 PCS assigns the frame pointer to x29.
  *
@@ -108,6 +186,8 @@  int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 
 	frame->pc = ptrauth_strip_insn_pac(frame->pc);
 
+	check_reliability(frame);
+
 	return 0;
 }
 NOKPROBE_SYMBOL(unwind_frame);