Message ID: 20210330190955.13707-1-madvenka@linux.microsoft.com
Series: arm64: Implement stack trace reliability checks
On Tue, Mar 30, 2021 at 02:09:51PM -0500, madvenka@linux.microsoft.com wrote: > From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com> > > There are a number of places in kernel code where the stack trace is not > reliable. Enhance the unwinder to check for those cases and mark the > stack trace as unreliable. Once all of the checks are in place, the unwinder > can be used for livepatching. This assumes all such places are known. That's a big assumption, as a) hand-written asm code may inadvertantly skip frame pointer setup; b) for inline asm which calls a function, the compiler may blindly insert it into a function before the frame pointer setup. That's where objtool stack validation would come in. > Detect EL1 exception frame > ========================== > > EL1 exceptions can happen on any instruction including instructions in > the frame pointer prolog or epilog. Depending on where exactly they happen, > they could render the stack trace unreliable. > > Add all of the EL1 exception handlers to special_functions[]. > > - el1_sync() > - el1_irq() > - el1_error() > - el1_sync_invalid() > - el1_irq_invalid() > - el1_fiq_invalid() > - el1_error_invalid() A possibly more robust alternative would be to somehow mark el1 exception frames so the unwinder can detect them more generally. For example, as described in my previous email, encode the frame pointer so the unwinder can detect el1 frames automatically. > Detect ftrace frame > =================== > > When FTRACE executes at the beginning of a traced function, it creates two > frames and calls the tracer function: > > - One frame for the traced function > > - One frame for the caller of the traced function > > That gives a sensible stack trace while executing in the tracer function. > When FTRACE returns to the traced function, the frames are popped and > everything is back to normal. > > However, in cases like live patch, the tracer function redirects execution > to a different function. When FTRACE returns, control will go to that target > function. A stack trace taken in the tracer function will not show the target > function. The target function is the real function that we want to track. > So, the stack trace is unreliable. I don't think this is a real problem. Livepatch only checks the stacks of blocked tasks (and the task calling into livepatch). So the reliability of unwinding from the livepatch tracer function itself (klp_ftrace_handler) isn't a concern since it doesn't sleep. > To detect FTRACE in a stack trace, add the following to special_functions[]: > > - ftrace_graph_call() > - ftrace_graph_caller() > > Please see the diff for a comment that explains why ftrace_graph_call() > must be checked. > > Also, the Function Graph Tracer modifies the return address of a traced > function to a return trampoline (return_to_handler()) to gather tracing > data on function return. Stack traces taken from the traced function and > functions it calls will not show the original caller of the traced function. > The unwinder handles this case by getting the original caller from FTRACE. > > However, stack traces taken from the trampoline itself and functions it calls > are unreliable as the original return address may not be available in > that context. This is because the trampoline calls FTRACE to gather trace > data as well as to obtain the actual return address and FTRACE discards the > record of the original return address along the way. 
Again, this shouldn't be a concern because livepatch won't be unwinding from a function_graph trampoline unless it got preempted somehow (and then the el1 frame would get detected anyway). > Add return_to_handler() to special_functions[]. > > Check for kretprobe > =================== > > For functions with a kretprobe set up, probe code executes on entry > to the function and replaces the return address in the stack frame with a > kretprobe trampoline. Whenever the function returns, control is > transferred to the trampoline. The trampoline eventually returns to the > original return address. > > A stack trace taken while executing in the function (or in functions that > get called from the function) will not show the original return address. > Similarly, a stack trace taken while executing in the trampoline itself > (and functions that get called from the trampoline) will not show the > original return address. This means that the caller of the probed function > will not show. This makes the stack trace unreliable. > > Add the kretprobe trampoline to special_functions[]. > > FYI, each task contains a task->kretprobe_instances list that can > theoretically be consulted to find the orginal return address. But I am > not entirely sure how to safely traverse that list for stack traces > not on the current process. So, I have taken the easy way out. For kretprobes, unwinding from the trampoline or kretprobe handler shouldn't be a reliability concern for live patching, for similar reasons as above. Otherwise, when unwinding from a blocked task which has 'kretprobe_trampoline' on the stack, the unwinder needs a way to get the original return address. Masami has been working on an interface to make that possible for x86. I assume something similar could be done for arm64. > Optprobes > ========= > > Optprobes may be implemented in the future for arm64. For optprobes, > the relevant trampoline(s) can be added to special_functions[]. Similar comment here, livepatch doesn't unwind from such trampolines.
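[Editor's note: to make the "encode the frame pointer" suggestion above concrete, here is a minimal sketch of what such a scheme could look like. This is illustrative only and is not what the series implements (the approach was rejected in review, as the reply below explains); the helper names are made up for illustration.]

/*
 * Sketch only: mark EL1 exception boundaries by setting the low bit of
 * the frame pointer saved in the exception's pt_regs stackframe record.
 * Not part of this series; names are hypothetical.
 */
#define FP_EL1_EXCEPTION_BIT	0x1UL

/* Entry code would record the interrupted fp with the low bit set. */
static inline unsigned long encode_el1_frame(unsigned long fp)
{
	return fp | FP_EL1_EXCEPTION_BIT;
}

/* The unwinder would recognize the marker ... */
static inline bool is_el1_exception_frame(unsigned long fp)
{
	return fp & FP_EL1_EXCEPTION_BIT;
}

/* ... and strip it before following the frame chain. */
static inline unsigned long decode_frame_pointer(unsigned long fp)
{
	return fp & ~FP_EL1_EXCEPTION_BIT;
}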
On 4/3/21 12:01 PM, Josh Poimboeuf wrote: > On Tue, Mar 30, 2021 at 02:09:51PM -0500, madvenka@linux.microsoft.com wrote: >> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com> >> >> There are a number of places in kernel code where the stack trace is not >> reliable. Enhance the unwinder to check for those cases and mark the >> stack trace as unreliable. Once all of the checks are in place, the unwinder >> can be used for livepatching. > > This assumes all such places are known. That's a big assumption, as > > a) hand-written asm code may inadvertantly skip frame pointer setup; > > b) for inline asm which calls a function, the compiler may blindly > insert it into a function before the frame pointer setup. > > That's where objtool stack validation would come in. > Yes. I meant that the reliable stack trace in the kernel is necessary. I did not imply that it was sufficient. Clearly, it is not sufficient. It relies on the frame pointer being setup correctly for all functions. That has to be guaranteed by another entity such as objtool. So, I will improve the wording and make it clear in the next version. >> Detect EL1 exception frame >> ========================== >> >> EL1 exceptions can happen on any instruction including instructions in >> the frame pointer prolog or epilog. Depending on where exactly they happen, >> they could render the stack trace unreliable. >> >> Add all of the EL1 exception handlers to special_functions[]. >> >> - el1_sync() >> - el1_irq() >> - el1_error() >> - el1_sync_invalid() >> - el1_irq_invalid() >> - el1_fiq_invalid() >> - el1_error_invalid() > > A possibly more robust alternative would be to somehow mark el1 > exception frames so the unwinder can detect them more generally. > > For example, as described in my previous email, encode the frame pointer > so the unwinder can detect el1 frames automatically. > Encoding the frame pointer by setting the LSB (like X86) was my first solution. Mark Rutland NAKed it. His objection was that it would confuse the debuggers which are expecting the last 4 bits of the frame pointer to be 0. I agree with this objection. My problem with the encoding was also that it is not possible to know if the LSB was set because of encoding or because of stack corruption. My second attempt was to encode the frame pointer indirectly. That is, make pt_regs->stackframe the exception frame and use other fields in the pt_regs (including a frame type encoding field) for verification. Mark Rutland NAKed it. His objection (if I am rephrasing it correctly) was that garbage on the stack may accidentally match the values the unwinder checks in the pt_regs (however unlikely that match might be). The consensus was that the return PC must be checked against special functions to recognize those special cases as the special functions are only invoked in those special contexts and nowhere else. As an aside, Mark Brown suggested (if I recall correctly) that the exception functions could be placed in a special exception section so the unwinder can check a return PC against the section bounds instead of individual functions. I did consider implementing this. But I needed a way to address FTRACE trampolines and KPROBE trampolines as well. So, I did not do that. 
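[Editor's note: for reference, the section-bounds alternative mentioned above (Mark Brown's suggestion) would amount to a simple range check. This is a sketch under assumptions, not code from the series; it assumes the EL1 handlers are grouped into a dedicated text section bounded by hypothetical linker symbols.]

/* Hypothetical linker symbols delimiting a dedicated exception text section. */
extern char __exception_text_start[], __exception_text_end[];

static bool pc_is_exception_entry(unsigned long pc)
{
	return pc >= (unsigned long)__exception_text_start &&
	       pc <  (unsigned long)__exception_text_end;
}

Such a check would cover the EL1 handlers without listing them individually, but, as noted above, it does not by itself handle the FTRACE and kprobe trampolines.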
>> Detect ftrace frame >> =================== >> >> When FTRACE executes at the beginning of a traced function, it creates two >> frames and calls the tracer function: >> >> - One frame for the traced function >> >> - One frame for the caller of the traced function >> >> That gives a sensible stack trace while executing in the tracer function. >> When FTRACE returns to the traced function, the frames are popped and >> everything is back to normal. >> >> However, in cases like live patch, the tracer function redirects execution >> to a different function. When FTRACE returns, control will go to that target >> function. A stack trace taken in the tracer function will not show the target >> function. The target function is the real function that we want to track. >> So, the stack trace is unreliable. > > I don't think this is a real problem. Livepatch only checks the stacks > of blocked tasks (and the task calling into livepatch). So the > reliability of unwinding from the livepatch tracer function itself > (klp_ftrace_handler) isn't a concern since it doesn't sleep. > My thinking was - arch_stack_walk_reliable() should provide a reliable stack trace and not assume anything about its consumers. It should not assume that livepatch is the only consumer although it might be. Theoretically, there can be a tracer function that calls some kernel function F() that can go to sleep. Is this not allowed? Or, F() could call arch_stack_walk_reliable() on the current task for debugging or tracing purposes. It should still work correctly. >> To detect FTRACE in a stack trace, add the following to special_functions[]: >> >> - ftrace_graph_call() >> - ftrace_graph_caller() >> >> Please see the diff for a comment that explains why ftrace_graph_call() >> must be checked. >> >> Also, the Function Graph Tracer modifies the return address of a traced >> function to a return trampoline (return_to_handler()) to gather tracing >> data on function return. Stack traces taken from the traced function and >> functions it calls will not show the original caller of the traced function. >> The unwinder handles this case by getting the original caller from FTRACE. >> >> However, stack traces taken from the trampoline itself and functions it calls >> are unreliable as the original return address may not be available in >> that context. This is because the trampoline calls FTRACE to gather trace >> data as well as to obtain the actual return address and FTRACE discards the >> record of the original return address along the way. > > Again, this shouldn't be a concern because livepatch won't be unwinding > from a function_graph trampoline unless it got preempted somehow (and > then the el1 frame would get detected anyway). > Please see previous answer. >> Add return_to_handler() to special_functions[]. >> >> Check for kretprobe >> =================== >> >> For functions with a kretprobe set up, probe code executes on entry >> to the function and replaces the return address in the stack frame with a >> kretprobe trampoline. Whenever the function returns, control is >> transferred to the trampoline. The trampoline eventually returns to the >> original return address. >> >> A stack trace taken while executing in the function (or in functions that >> get called from the function) will not show the original return address. >> Similarly, a stack trace taken while executing in the trampoline itself >> (and functions that get called from the trampoline) will not show the >> original return address. 
This means that the caller of the probed function >> will not show. This makes the stack trace unreliable. >> >> Add the kretprobe trampoline to special_functions[]. >> >> FYI, each task contains a task->kretprobe_instances list that can >> theoretically be consulted to find the orginal return address. But I am >> not entirely sure how to safely traverse that list for stack traces >> not on the current process. So, I have taken the easy way out. > > For kretprobes, unwinding from the trampoline or kretprobe handler > shouldn't be a reliability concern for live patching, for similar > reasons as above. > Please see previous answer. > Otherwise, when unwinding from a blocked task which has > 'kretprobe_trampoline' on the stack, the unwinder needs a way to get the > original return address. Masami has been working on an interface to > make that possible for x86. I assume something similar could be done > for arm64. > OK. Until that is available, this case needs to be addressed. Madhavan
Hi Madhaven, On Sat, 3 Apr 2021 22:29:12 -0500 "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com> wrote: > >> Check for kretprobe > >> =================== > >> > >> For functions with a kretprobe set up, probe code executes on entry > >> to the function and replaces the return address in the stack frame with a > >> kretprobe trampoline. Whenever the function returns, control is > >> transferred to the trampoline. The trampoline eventually returns to the > >> original return address. > >> > >> A stack trace taken while executing in the function (or in functions that > >> get called from the function) will not show the original return address. > >> Similarly, a stack trace taken while executing in the trampoline itself > >> (and functions that get called from the trampoline) will not show the > >> original return address. This means that the caller of the probed function > >> will not show. This makes the stack trace unreliable. > >> > >> Add the kretprobe trampoline to special_functions[]. > >> > >> FYI, each task contains a task->kretprobe_instances list that can > >> theoretically be consulted to find the orginal return address. But I am > >> not entirely sure how to safely traverse that list for stack traces > >> not on the current process. So, I have taken the easy way out. > > > > For kretprobes, unwinding from the trampoline or kretprobe handler > > shouldn't be a reliability concern for live patching, for similar > > reasons as above. > > > > Please see previous answer. > > > Otherwise, when unwinding from a blocked task which has > > 'kretprobe_trampoline' on the stack, the unwinder needs a way to get the > > original return address. Masami has been working on an interface to > > make that possible for x86. I assume something similar could be done > > for arm64. > > > > OK. Until that is available, this case needs to be addressed. Actually, I've done that on arm64 :) See below patch. (and I also have a similar code for arm32, what I'm considering is how to unify x86/arm/arm64 kretprobe_find_ret_addr(), since those are very similar.) This is applicable on my x86 series v5 https://lore.kernel.org/bpf/161676170650.330141.6214727134265514123.stgit@devnote2/ Thank you, From 947cf6cf1fd4154edd5533d18c2f8dfedc8d993d Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu <mhiramat@kernel.org> Date: Sat, 20 Mar 2021 00:14:29 +0900 Subject: [PATCH] arm64: Recover kretprobe modified return address in stacktrace Since the kretprobe replaces the function return address with the kretprobe_trampoline on the stack, arm64 unwinder shows it instead of the correct return address. This finds the correct return address from the per-task kretprobe_instances list and verify it is in between the caller fp and callee fp. 
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/arm64/include/asm/stacktrace.h |  2 ++
 arch/arm64/kernel/probes/kprobes.c  | 28 ++++++++++++++++++++++++++++
 arch/arm64/kernel/stacktrace.c      |  3 +++
 kernel/kprobes.c                    |  8 ++++----
 4 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index eb29b1fe8255..50ebc9e9dba9 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -9,6 +9,7 @@
 #include <linux/sched.h>
 #include <linux/sched/task_stack.h>
 #include <linux/types.h>
+#include <linux/llist.h>
 
 #include <asm/memory.h>
 #include <asm/ptrace.h>
@@ -59,6 +60,7 @@ struct stackframe {
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	int graph;
 #endif
+	struct llist_node *kr_cur;
 };
 
 extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
index fce681fdfce6..204e475cbff3 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -410,6 +410,34 @@ int __init arch_populate_kprobe_blacklist(void)
 	return ret;
 }
 
+unsigned long __kretprobe_find_ret_addr(struct task_struct *tsk,
+					struct llist_node **cur);
+
+unsigned long kretprobe_find_ret_addr(struct task_struct *tsk,
+				      void *fp, struct llist_node **cur)
+{
+	struct kretprobe_instance *ri;
+	unsigned long ret;
+
+	do {
+		ret = __kretprobe_find_ret_addr(tsk, cur);
+		if (!ret)
+			return ret;
+		ri = container_of(*cur, struct kretprobe_instance, llist);
+		/*
+		 * Since arm64 stores the stack pointer of the entry of target
+		 * function (callee) to ri->fp, the given real @fp must be
+		 * smaller than ri->fp, but bigger than the previous ri->fp.
+		 *
+		 *  callee sp (prev ri->fp)
+		 *    fp (and *saved_lr)
+		 *  caller sp (ri->fp)
+		 */
+	} while (ri->fp <= fp);
+
+	return ret;
+}
+
 void __kprobes __used *trampoline_probe_handler(struct pt_regs *regs)
 {
 	return (void *)kretprobe_trampoline_handler(regs, (void *)kernel_stack_pointer(regs));
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index ad20981dfda4..956486d4ac10 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -105,6 +105,8 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 		frame->pc = ret_stack->ret;
 	}
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
+	if (is_kretprobe_trampoline(frame->pc))
+		frame->pc = kretprobe_find_ret_addr(tsk, (void *)fp, &frame->kr_cur);
 
 	frame->pc = ptrauth_strip_insn_pac(frame->pc);
 
@@ -199,6 +201,7 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
 {
 	struct stackframe frame;
 
+	memset(&frame, 0, sizeof(frame));
 	if (regs)
 		start_backtrace(&frame, regs->regs[29], regs->pc);
 	else if (task == current)
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 4ce3e6f5d28d..12677a463561 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1859,8 +1859,8 @@ static struct notifier_block kprobe_exceptions_nb = {
 #ifdef CONFIG_KRETPROBES
 /* This assumes the tsk is current or the task which is not running.
  */
-static unsigned long __kretprobe_find_ret_addr(struct task_struct *tsk,
-					       struct llist_node **cur)
+unsigned long __kretprobe_find_ret_addr(struct task_struct *tsk,
+					struct llist_node **cur)
 {
 	struct kretprobe_instance *ri = NULL;
 	struct llist_node *node = *cur;
@@ -1882,8 +1882,8 @@ static unsigned long __kretprobe_find_ret_addr(struct task_struct *tsk,
 }
 NOKPROBE_SYMBOL(__kretprobe_find_ret_addr);
 
-unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
-				      struct llist_node **cur)
+unsigned long __weak kretprobe_find_ret_addr(struct task_struct *tsk,
+					     void *fp, struct llist_node **cur)
 {
 	struct kretprobe_instance *ri = NULL;
 	unsigned long ret;
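[Editor's note: the unwind_frame() hunk above calls is_kretprobe_trampoline(), which is provided by the prerequisite x86 series linked earlier rather than by this diff. As a rough, hedged sketch (assuming the trampoline symbol name used at the time of this thread), the helper amounts to a simple address comparison.]

/*
 * Sketch only: is_kretprobe_trampoline() comes from the prerequisite
 * series, not from the diff above; this is roughly what it boils down to.
 */
extern void kretprobe_trampoline(void);

static inline bool is_kretprobe_trampoline(unsigned long addr)
{
	return addr == (unsigned long)kretprobe_trampoline;
}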
On 4/5/21 8:24 AM, Masami Hiramatsu wrote: > Hi Madhaven, > > On Sat, 3 Apr 2021 22:29:12 -0500 > "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com> wrote: > > >>>> Check for kretprobe >>>> =================== >>>> >>>> For functions with a kretprobe set up, probe code executes on entry >>>> to the function and replaces the return address in the stack frame with a >>>> kretprobe trampoline. Whenever the function returns, control is >>>> transferred to the trampoline. The trampoline eventually returns to the >>>> original return address. >>>> >>>> A stack trace taken while executing in the function (or in functions that >>>> get called from the function) will not show the original return address. >>>> Similarly, a stack trace taken while executing in the trampoline itself >>>> (and functions that get called from the trampoline) will not show the >>>> original return address. This means that the caller of the probed function >>>> will not show. This makes the stack trace unreliable. >>>> >>>> Add the kretprobe trampoline to special_functions[]. >>>> >>>> FYI, each task contains a task->kretprobe_instances list that can >>>> theoretically be consulted to find the orginal return address. But I am >>>> not entirely sure how to safely traverse that list for stack traces >>>> not on the current process. So, I have taken the easy way out. >>> >>> For kretprobes, unwinding from the trampoline or kretprobe handler >>> shouldn't be a reliability concern for live patching, for similar >>> reasons as above. >>> >> >> Please see previous answer. >> >>> Otherwise, when unwinding from a blocked task which has >>> 'kretprobe_trampoline' on the stack, the unwinder needs a way to get the >>> original return address. Masami has been working on an interface to >>> make that possible for x86. I assume something similar could be done >>> for arm64. >>> >> >> OK. Until that is available, this case needs to be addressed. > > Actually, I've done that on arm64 :) See below patch. > (and I also have a similar code for arm32, what I'm considering is how > to unify x86/arm/arm64 kretprobe_find_ret_addr(), since those are very > similar.) > > This is applicable on my x86 series v5 > > https://lore.kernel.org/bpf/161676170650.330141.6214727134265514123.stgit@devnote2/ > > Thank you, > > OK. I will take a look. Thanks. Madhavan
On 4/5/21 8:24 AM, Masami Hiramatsu wrote: > Hi Madhaven, > > On Sat, 3 Apr 2021 22:29:12 -0500 > "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com> wrote: > > >>>> Check for kretprobe >>>> =================== >>>> >>>> For functions with a kretprobe set up, probe code executes on entry >>>> to the function and replaces the return address in the stack frame with a >>>> kretprobe trampoline. Whenever the function returns, control is >>>> transferred to the trampoline. The trampoline eventually returns to the >>>> original return address. >>>> >>>> A stack trace taken while executing in the function (or in functions that >>>> get called from the function) will not show the original return address. >>>> Similarly, a stack trace taken while executing in the trampoline itself >>>> (and functions that get called from the trampoline) will not show the >>>> original return address. This means that the caller of the probed function >>>> will not show. This makes the stack trace unreliable. >>>> >>>> Add the kretprobe trampoline to special_functions[]. >>>> >>>> FYI, each task contains a task->kretprobe_instances list that can >>>> theoretically be consulted to find the orginal return address. But I am >>>> not entirely sure how to safely traverse that list for stack traces >>>> not on the current process. So, I have taken the easy way out. >>> >>> For kretprobes, unwinding from the trampoline or kretprobe handler >>> shouldn't be a reliability concern for live patching, for similar >>> reasons as above. >>> >> >> Please see previous answer. >> >>> Otherwise, when unwinding from a blocked task which has >>> 'kretprobe_trampoline' on the stack, the unwinder needs a way to get the >>> original return address. Masami has been working on an interface to >>> make that possible for x86. I assume something similar could be done >>> for arm64. >>> >> >> OK. Until that is available, this case needs to be addressed. > > Actually, I've done that on arm64 :) See below patch. > (and I also have a similar code for arm32, what I'm considering is how > to unify x86/arm/arm64 kretprobe_find_ret_addr(), since those are very > similar.) > > This is applicable on my x86 series v5 > > https://lore.kernel.org/bpf/161676170650.330141.6214727134265514123.stgit@devnote2/ > > Thank you, > > I took a brief look at your changes. Looks reasonable. However, for now, I am going to include the kretprobe_trampoline in the special_functions[] array until your changes are merged. At that point, it is just a matter of deleting kretprobe_trampoline from the special_functions[] array. That is all. I hope that is fine with everyone. Madhavan
On 4/5/21 9:56 AM, Madhavan T. Venkataraman wrote: > > > On 4/5/21 8:24 AM, Masami Hiramatsu wrote: >> Hi Madhaven, >> >> On Sat, 3 Apr 2021 22:29:12 -0500 >> "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com> wrote: >> >> >>>>> Check for kretprobe >>>>> =================== >>>>> >>>>> For functions with a kretprobe set up, probe code executes on entry >>>>> to the function and replaces the return address in the stack frame with a >>>>> kretprobe trampoline. Whenever the function returns, control is >>>>> transferred to the trampoline. The trampoline eventually returns to the >>>>> original return address. >>>>> >>>>> A stack trace taken while executing in the function (or in functions that >>>>> get called from the function) will not show the original return address. >>>>> Similarly, a stack trace taken while executing in the trampoline itself >>>>> (and functions that get called from the trampoline) will not show the >>>>> original return address. This means that the caller of the probed function >>>>> will not show. This makes the stack trace unreliable. >>>>> >>>>> Add the kretprobe trampoline to special_functions[]. >>>>> >>>>> FYI, each task contains a task->kretprobe_instances list that can >>>>> theoretically be consulted to find the orginal return address. But I am >>>>> not entirely sure how to safely traverse that list for stack traces >>>>> not on the current process. So, I have taken the easy way out. >>>> >>>> For kretprobes, unwinding from the trampoline or kretprobe handler >>>> shouldn't be a reliability concern for live patching, for similar >>>> reasons as above. >>>> >>> >>> Please see previous answer. >>> >>>> Otherwise, when unwinding from a blocked task which has >>>> 'kretprobe_trampoline' on the stack, the unwinder needs a way to get the >>>> original return address. Masami has been working on an interface to >>>> make that possible for x86. I assume something similar could be done >>>> for arm64. >>>> >>> >>> OK. Until that is available, this case needs to be addressed. >> >> Actually, I've done that on arm64 :) See below patch. >> (and I also have a similar code for arm32, what I'm considering is how >> to unify x86/arm/arm64 kretprobe_find_ret_addr(), since those are very >> similar.) >> >> This is applicable on my x86 series v5 >> >> https://lore.kernel.org/bpf/161676170650.330141.6214727134265514123.stgit@devnote2/ >> >> Thank you, >> >> > > I took a brief look at your changes. Looks reasonable. > > However, for now, I am going to include the kretprobe_trampoline in the special_functions[] > array until your changes are merged. At that point, it is just a matter of deleting > kretprobe_trampoline from the special_functions[] array. That is all. > > I hope that is fine with everyone. > Actually, there may still be a problem to solve. If arch_stack_walk_reliable() is ever called from within kretprobe_trampoline() for debugging or other purposes after the instance is deleted from the task instance list, it would not be able to retrieve the original return address. The stack trace would be unreliable in that case, would it not? Madhavan
On Mon, 5 Apr 2021 12:12:08 -0500 "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com> wrote: > > > On 4/5/21 9:56 AM, Madhavan T. Venkataraman wrote: > > > > > > On 4/5/21 8:24 AM, Masami Hiramatsu wrote: > >> Hi Madhaven, > >> > >> On Sat, 3 Apr 2021 22:29:12 -0500 > >> "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com> wrote: > >> > >> > >>>>> Check for kretprobe > >>>>> =================== > >>>>> > >>>>> For functions with a kretprobe set up, probe code executes on entry > >>>>> to the function and replaces the return address in the stack frame with a > >>>>> kretprobe trampoline. Whenever the function returns, control is > >>>>> transferred to the trampoline. The trampoline eventually returns to the > >>>>> original return address. > >>>>> > >>>>> A stack trace taken while executing in the function (or in functions that > >>>>> get called from the function) will not show the original return address. > >>>>> Similarly, a stack trace taken while executing in the trampoline itself > >>>>> (and functions that get called from the trampoline) will not show the > >>>>> original return address. This means that the caller of the probed function > >>>>> will not show. This makes the stack trace unreliable. > >>>>> > >>>>> Add the kretprobe trampoline to special_functions[]. > >>>>> > >>>>> FYI, each task contains a task->kretprobe_instances list that can > >>>>> theoretically be consulted to find the orginal return address. But I am > >>>>> not entirely sure how to safely traverse that list for stack traces > >>>>> not on the current process. So, I have taken the easy way out. > >>>> > >>>> For kretprobes, unwinding from the trampoline or kretprobe handler > >>>> shouldn't be a reliability concern for live patching, for similar > >>>> reasons as above. > >>>> > >>> > >>> Please see previous answer. > >>> > >>>> Otherwise, when unwinding from a blocked task which has > >>>> 'kretprobe_trampoline' on the stack, the unwinder needs a way to get the > >>>> original return address. Masami has been working on an interface to > >>>> make that possible for x86. I assume something similar could be done > >>>> for arm64. > >>>> > >>> > >>> OK. Until that is available, this case needs to be addressed. > >> > >> Actually, I've done that on arm64 :) See below patch. > >> (and I also have a similar code for arm32, what I'm considering is how > >> to unify x86/arm/arm64 kretprobe_find_ret_addr(), since those are very > >> similar.) > >> > >> This is applicable on my x86 series v5 > >> > >> https://lore.kernel.org/bpf/161676170650.330141.6214727134265514123.stgit@devnote2/ > >> > >> Thank you, > >> > >> > > > > I took a brief look at your changes. Looks reasonable. > > > > However, for now, I am going to include the kretprobe_trampoline in the special_functions[] > > array until your changes are merged. At that point, it is just a matter of deleting > > kretprobe_trampoline from the special_functions[] array. That is all. > > > > I hope that is fine with everyone. > > > > Actually, there may still be a problem to solve. > > If arch_stack_walk_reliable() is ever called from within kretprobe_trampoline() for debugging or > other purposes after the instance is deleted from the task instance list, it would not be able > to retrieve the original return address. > > The stack trace would be unreliable in that case, would it not? Good catch! I'm preparing a patch to fix that case (currently only for x86, see below). This is currently only for x86. 
Arm64 kretprobe may have to modify its stack layout similar to x86 so that unwinder can find the return address from stack.

Thank you,

From cdca74a1ebc174062eb99a376072002ae21f7d7e Mon Sep 17 00:00:00 2001
From: Masami Hiramatsu <mhiramat@kernel.org>
Date: Mon, 8 Mar 2021 00:22:51 +0900
Subject: [PATCH] x86/kprobes: Fixup return address in generic trampoline handler

In x86, kretprobe trampoline address on the stack frame will be
replaced with the real return address after returning from
trampoline_handler.

Before fixing the return address, the real return address can be
found in the current->kretprobe_instances. However, since there is
a window between updating the current->kretprobe_instances and
fixing the address on the stack, if an interrupt caused at that
timing and the interrupt handler does stacktrace, it may fail to
unwind because it can not get the correct return address from
current->kretprobe_instances.

This will minimize that window by fixing the return address right
before updating current->kretprobe_instances.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/x86/kernel/kprobes/core.c | 14 ++++++++++++--
 kernel/kprobes.c               |  8 ++++++++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 00c5944ae8f6..950b8e873937 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -1032,6 +1032,7 @@ STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
 #undef UNWIND_HINT_FUNC
 #define UNWIND_HINT_FUNC
 #endif
+
 /*
  * When a retprobed function returns, this code saves registers and
  * calls trampoline_handler() runs, which calls the kretprobe's handler.
@@ -1073,6 +1074,16 @@ asm(
 );
 NOKPROBE_SYMBOL(kretprobe_trampoline);
 
+void arch_kretprobe_fixup_return(struct pt_regs *regs,
+				 unsigned long correct_ret_addr)
+{
+	unsigned long *frame_pointer;
+	frame_pointer = ((unsigned long *)&regs->sp) + 1;
+
+	/* Replace fake return address with real one. */
+	*frame_pointer = correct_ret_addr;
+}
+
 /*
  * Called from kretprobe_trampoline
  */
@@ -1090,8 +1101,7 @@ __used __visible void trampoline_handler(struct pt_regs *regs)
 	regs->sp += sizeof(long);
 	frame_pointer = ((unsigned long *)&regs->sp) + 1;
 
-	/* Replace fake return address with real one. */
-	*frame_pointer = kretprobe_trampoline_handler(regs, frame_pointer);
+	kretprobe_trampoline_handler(regs, frame_pointer);
 	/*
 	 * Move flags to sp so that kretprobe_trapmoline can return
 	 * right after popf.
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 12677a463561..3c72df5b31dd 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1899,6 +1899,12 @@ unsigned long __weak kretprobe_find_ret_addr(struct task_struct *tsk,
 }
 NOKPROBE_SYMBOL(kretprobe_find_ret_addr);
 
+void __weak arch_kretprobe_fixup_return(struct pt_regs *regs,
+					unsigned long correct_ret_addr)
+{
+	/* Do nothing by default. */
+}
+
 unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
 					     void *frame_pointer)
 {
@@ -1939,6 +1945,8 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
 		first = first->next;
 	}
 
+	arch_kretprobe_fixup_return(regs, (unsigned long)correct_ret_addr);
+
 	/* Unlink all nodes for this frame. */
 	first = current->kretprobe_instances.first;
 	current->kretprobe_instances.first = node->next;
On Mon, 5 Apr 2021 09:56:48 -0500 "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com> wrote: > > > On 4/5/21 8:24 AM, Masami Hiramatsu wrote: > > Hi Madhaven, > > > > On Sat, 3 Apr 2021 22:29:12 -0500 > > "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com> wrote: > > > > > >>>> Check for kretprobe > >>>> =================== > >>>> > >>>> For functions with a kretprobe set up, probe code executes on entry > >>>> to the function and replaces the return address in the stack frame with a > >>>> kretprobe trampoline. Whenever the function returns, control is > >>>> transferred to the trampoline. The trampoline eventually returns to the > >>>> original return address. > >>>> > >>>> A stack trace taken while executing in the function (or in functions that > >>>> get called from the function) will not show the original return address. > >>>> Similarly, a stack trace taken while executing in the trampoline itself > >>>> (and functions that get called from the trampoline) will not show the > >>>> original return address. This means that the caller of the probed function > >>>> will not show. This makes the stack trace unreliable. > >>>> > >>>> Add the kretprobe trampoline to special_functions[]. > >>>> > >>>> FYI, each task contains a task->kretprobe_instances list that can > >>>> theoretically be consulted to find the orginal return address. But I am > >>>> not entirely sure how to safely traverse that list for stack traces > >>>> not on the current process. So, I have taken the easy way out. > >>> > >>> For kretprobes, unwinding from the trampoline or kretprobe handler > >>> shouldn't be a reliability concern for live patching, for similar > >>> reasons as above. > >>> > >> > >> Please see previous answer. > >> > >>> Otherwise, when unwinding from a blocked task which has > >>> 'kretprobe_trampoline' on the stack, the unwinder needs a way to get the > >>> original return address. Masami has been working on an interface to > >>> make that possible for x86. I assume something similar could be done > >>> for arm64. > >>> > >> > >> OK. Until that is available, this case needs to be addressed. > > > > Actually, I've done that on arm64 :) See below patch. > > (and I also have a similar code for arm32, what I'm considering is how > > to unify x86/arm/arm64 kretprobe_find_ret_addr(), since those are very > > similar.) > > > > This is applicable on my x86 series v5 > > > > https://lore.kernel.org/bpf/161676170650.330141.6214727134265514123.stgit@devnote2/ > > > > Thank you, > > > > > > I took a brief look at your changes. Looks reasonable. > > However, for now, I am going to include the kretprobe_trampoline in the special_functions[] > array until your changes are merged. At that point, it is just a matter of deleting > kretprobe_trampoline from the special_functions[] array. That is all. > > I hope that is fine with everyone. Agreed, that is reasonable unless my series is merged. Thank you,
From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

There are a number of places in kernel code where the stack trace is not reliable. Enhance the unwinder to check for those cases and mark the stack trace as unreliable. Once all of the checks are in place, the unwinder can be used for livepatching.

Except for the return address check, all the other checks involve checking the return PC of every frame against certain kernel functions. To do this, implement some infrastructure code:

	- Define a special_functions[] array and populate the array with the special functions

	- Using kallsyms_lookup(), lookup the symbol table entries for the functions and record their address ranges

	- Define an is_reliable_function(pc) to match a return PC against the special functions.

The unwinder calls is_reliable_function(pc) for every return PC and marks the stack trace as reliable or unreliable accordingly.

Return address check
====================

Check the return PC of every stack frame to make sure that it is a valid kernel text address (and not some generated code, for example).

Detect EL1 exception frame
==========================

EL1 exceptions can happen on any instruction including instructions in the frame pointer prolog or epilog. Depending on where exactly they happen, they could render the stack trace unreliable.

Add all of the EL1 exception handlers to special_functions[].

	- el1_sync()
	- el1_irq()
	- el1_error()
	- el1_sync_invalid()
	- el1_irq_invalid()
	- el1_fiq_invalid()
	- el1_error_invalid()

Interrupts are EL1 exceptions. When a task is preempted, the preempt interrupt EL1 frame will show on the stack and the stack trace is considered unreliable. This is correct behavior as preemption can happen anywhere.

Breakpoints are EL1 exceptions and can happen anywhere. Stack traces taken from within the breakpoint handler are, therefore, unreliable. This includes KProbe code that gets called from the breakpoint handler.

Mark Rutland wanted me to send the EL1 checks in a separate patch series because the exception handling code is being reorganized. But the infrastructure code is common to the EL1 detection and other cases listed below. I was not entirely sure how to neatly split the patches. Besides, all this patch does is include the EL1 exception handlers in special_functions[]. When the names change because of the code reorg, this array can simply be edited. So, in the interest of getting review comments on this EL1 related work, I have included it in this patch series. Hope this is ok.

Detect ftrace frame
===================

When FTRACE executes at the beginning of a traced function, it creates two frames and calls the tracer function:

	- One frame for the traced function

	- One frame for the caller of the traced function

That gives a sensible stack trace while executing in the tracer function. When FTRACE returns to the traced function, the frames are popped and everything is back to normal.

However, in cases like live patch, the tracer function redirects execution to a different function. When FTRACE returns, control will go to that target function. A stack trace taken in the tracer function will not show the target function. The target function is the real function that we want to track. So, the stack trace is unreliable.

To detect FTRACE in a stack trace, add the following to special_functions[]:

	- ftrace_graph_call()
	- ftrace_graph_caller()

Please see the diff for a comment that explains why ftrace_graph_call() must be checked.

Also, the Function Graph Tracer modifies the return address of a traced function to a return trampoline (return_to_handler()) to gather tracing data on function return. Stack traces taken from the traced function and functions it calls will not show the original caller of the traced function. The unwinder handles this case by getting the original caller from FTRACE.

However, stack traces taken from the trampoline itself and functions it calls are unreliable as the original return address may not be available in that context. This is because the trampoline calls FTRACE to gather trace data as well as to obtain the actual return address and FTRACE discards the record of the original return address along the way.

Add return_to_handler() to special_functions[].

Check for kretprobe
===================

For functions with a kretprobe set up, probe code executes on entry to the function and replaces the return address in the stack frame with a kretprobe trampoline. Whenever the function returns, control is transferred to the trampoline. The trampoline eventually returns to the original return address.

A stack trace taken while executing in the function (or in functions that get called from the function) will not show the original return address. Similarly, a stack trace taken while executing in the trampoline itself (and functions that get called from the trampoline) will not show the original return address. This means that the caller of the probed function will not show. This makes the stack trace unreliable.

Add the kretprobe trampoline to special_functions[].

FYI, each task contains a task->kretprobe_instances list that can theoretically be consulted to find the original return address. But I am not entirely sure how to safely traverse that list for stack traces not on the current process. So, I have taken the easy way out.

Optprobes
=========

Optprobes may be implemented in the future for arm64. For optprobes, the relevant trampoline(s) can be added to special_functions[].

---
v1
	- Define a bool field in struct stackframe. This will indicate if a stack trace is reliable.

	- Implement a special_functions[] array that will be populated with special functions in which the stack trace is considered unreliable.

	- Using kallsyms_lookup(), get the address ranges for the special functions and record them.

	- Implement an is_reliable_function(pc). This function will check if a given return PC falls in any of the special functions. If it does, the stack trace is unreliable.

	- Implement check_reliability() function that will check if a stack frame is reliable. Call is_reliable_function() from check_reliability().

	- Before a return PC is checked against special_functions[], it must be validated as a proper kernel text address. Call __kernel_text_address() from check_reliability().

	- Finally, call check_reliability() from unwind_frame() for each stack frame.

	- Add EL1 exception handlers to special_functions[].

		el1_sync();
		el1_irq();
		el1_error();
		el1_sync_invalid();
		el1_irq_invalid();
		el1_fiq_invalid();
		el1_error_invalid();

	- The above functions are currently defined as LOCAL symbols. Make them global so that they can be referenced from the unwinder code.

	- Add FTRACE trampolines to special_functions[]:

		ftrace_graph_call()
		ftrace_graph_caller()
		return_to_handler()

	- Add the kretprobe trampoline to special_functions[]:

		kretprobe_trampoline()

Madhavan T. Venkataraman (4):
  arm64: Implement infrastructure for stack trace reliability checks
  arm64: Mark a stack trace unreliable if an EL1 exception frame is detected
  arm64: Detect FTRACE cases that make the stack trace unreliable
  arm64: Mark stack trace as unreliable if kretprobed functions are present

 arch/arm64/include/asm/exception.h  |   8 ++
 arch/arm64/include/asm/stacktrace.h |   2 +
 arch/arm64/kernel/entry-ftrace.S    |  10 ++
 arch/arm64/kernel/entry.S           |  14 +-
 arch/arm64/kernel/stacktrace.c      | 211 ++++++++++++++++++++++++++++
 5 files changed, 238 insertions(+), 7 deletions(-)

base-commit: 0d02ec6b3136c73c09e7859f0d0e4e2c4c07b49b
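[Editor's note: to make the infrastructure described in this cover letter concrete, here is a rough sketch of the special_functions[] matching. It is illustrative only, not the code from the series (which lives in arch/arm64/kernel/stacktrace.c); helper names like init_special_functions() are made up, and kallsyms_lookup_size_offset() is used here for brevity where the cover letter mentions kallsyms_lookup().]

/*
 * Rough sketch of the special_functions[] idea described above.
 * Names and details are assumptions, not the actual patch.
 */
#include <linux/kallsyms.h>
#include <linux/kernel.h>

/* Assembly entry points, assumed to be made global by this series. */
extern void el1_sync(void);
extern void kretprobe_trampoline(void);

struct function_range {
	unsigned long	start;
	unsigned long	end;
};

/* Functions in which a stack trace is considered unreliable. */
static struct function_range special_functions[] = {
	{ (unsigned long)el1_sync,             0 },
	{ (unsigned long)kretprobe_trampoline, 0 },
	/* ... remaining EL1 handlers, FTRACE and fgraph trampolines ... */
	{ 0, 0 }
};

/* Record each function's address range using kallsyms. */
static void __init init_special_functions(void)
{
	struct function_range *func;
	unsigned long size, offset;

	for (func = special_functions; func->start; func++) {
		if (kallsyms_lookup_size_offset(func->start, &size, &offset))
			func->end = func->start + size;
	}
}

/* Returns true if @pc does not fall inside any special function. */
static bool is_reliable_function(unsigned long pc)
{
	struct function_range *func;

	for (func = special_functions; func->start; func++) {
		if (pc >= func->start && pc < func->end)
			return false;
	}
	return true;
}

Per the cover letter, the unwinder would first validate each return PC with __kernel_text_address() and then call is_reliable_function() on it, clearing the stack trace's "reliable" flag whenever either check fails.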