Message ID: 20210405204313.21346-1-madvenka@linux.microsoft.com (mailing list archive)
Series: arm64: Implement stack trace reliability checks
Hi Madhavan, I've noted some concerns below. At a high-level, I'm not keen on the blacklisting approach, and I think there's some other preparatory work that would be more valuable in the short term. On Mon, Apr 05, 2021 at 03:43:09PM -0500, madvenka@linux.microsoft.com wrote: > From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com> > > There are a number of places in kernel code where the stack trace is not > reliable. Enhance the unwinder to check for those cases and mark the > stack trace as unreliable. Once all of the checks are in place, the unwinder > can provide a reliable stack trace. But before this can be used for livepatch, > some other entity needs to guarantee that the frame pointers are all set up > correctly in kernel functions. objtool is currently being worked on to > fill that gap. > > Except for the return address check, all the other checks involve checking > the return PC of every frame against certain kernel functions. To do this, > implement some infrastructure code: > > - Define a special_functions[] array and populate the array with > the special functions I'm not too keen on having to manually collate this within the unwinder, as it's very painful from a maintenance perspective. I'd much rather we could associate this information with the implementations of these functions, so that they're more likely to stay in sync. Further, I believe all the special cases are assembly functions, and most of those are already in special sections to begin with. I reckon it'd be simpler and more robust to reject unwinding based on the section. If we need to unwind across specific functions in those sections, we could opt-in with some metadata. So e.g. we could reject all functions in ".entry.text", special casing the EL0 entry functions if necessary. As I mentioned before, I'm currently reworking the entry assembly to make this simpler to do. I'd prefer to not make invasive changes in that area until that's sorted. I think there's a lot more code that we cannot unwind, e.g. KVM exception code, or almost anything marked with SYM_CODE_END(). > - Using kallsyms_lookup(), lookup the symbol table entries for the > functions and record their address ranges > > - Define an is_reliable_function(pc) to match a return PC against > the special functions. > > The unwinder calls is_reliable_function(pc) for every return PC and marks > the stack trace as reliable or unreliable accordingly. > > Return address check > ==================== > > Check the return PC of every stack frame to make sure that it is a valid > kernel text address (and not some generated code, for example). > > Detect EL1 exception frame > ========================== > > EL1 exceptions can happen on any instruction including instructions in > the frame pointer prolog or epilog. Depending on where exactly they happen, > they could render the stack trace unreliable. > > Add all of the EL1 exception handlers to special_functions[]. > > - el1_sync() > - el1_irq() > - el1_error() > - el1_sync_invalid() > - el1_irq_invalid() > - el1_fiq_invalid() > - el1_error_invalid() > > Detect ftrace frame > =================== > > When FTRACE executes at the beginning of a traced function, it creates two > frames and calls the tracer function: > > - One frame for the traced function > > - One frame for the caller of the traced function > > That gives a sensible stack trace while executing in the tracer function. > When FTRACE returns to the traced function, the frames are popped and > everything is back to normal. 
> > However, in cases like live patch, the tracer function redirects execution > to a different function. When FTRACE returns, control will go to that target > function. A stack trace taken in the tracer function will not show the target > function. The target function is the real function that we want to track. > So, the stack trace is unreliable. This doesn't match my understanding of the reliable stacktrace requirements, but I might have misunderstood what you're saying here. IIUC what you're describing here is: 1) A calls B 2) B is traced 3) tracer replaces B with TARGET 4) tracer returns to TARGET ... and if a stacktrace is taken at step 3 (before the return address is patched), the trace will show B rather than TARGET. My understanding is that this is legitimate behaviour. > To detect stack traces from a tracer function, add the following to > special_functions[]: > > - ftrace_call + 4 > > ftrace_call is the label at which the tracer function is patched in. So, > ftrace_call + 4 is its return address. This is what will show up in a > stack trace taken from the tracer function. > > When Function Graph Tracing is on, ftrace_graph_caller is patched in > at the label ftrace_graph_call. If a tracer function called before it has > redirected execution as mentioned above, the stack traces taken from within > ftrace_graph_caller will also be unreliable for the same reason as mentioned > above. So, add ftrace_graph_caller to special_functions[] as well. > > Also, the Function Graph Tracer modifies the return address of a traced > function to a return trampoline (return_to_handler()) to gather tracing > data on function return. Stack traces taken from the traced function and > functions it calls will not show the original caller of the traced function. > The unwinder handles this case by getting the original caller from FTRACE. > > However, stack traces taken from the trampoline itself and functions it calls > are unreliable as the original return address may not be available in > that context. This is because the trampoline calls FTRACE to gather trace > data as well as to obtain the actual return address and FTRACE discards the > record of the original return address along the way. The reason we cannot unwind the trampolines in the usual way is because they are not AAPCS compliant functions. We don't discard the original return address, but it's not in the usual location. With care, we could write a special case unwinder for them. Note that we also cannot unwind from any PLT on the way to the trampolines, so we'd also need to identify those. Luckily we're in charge of creating those, and (for now) we only need to care about the module PLTs. The bigger problem is return_to_handler, since there's a transient period when C code removes the return address from the graph return stack before passing this to assembly in a register, and so we can't reliably find the correct return address during this period. With care we could special case unwinding immediately before/after this. If we could find a way to restructure return_to_handler such that we can reliably find the correct return address, that would be a useful improvement today, and would mean that we don't have to blacklist it for reliable stacktrace. Thanks, Mark. > Add return_to_handler() to special_functions[]. > > Check for kretprobe > =================== > > For functions with a kretprobe set up, probe code executes on entry > to the function and replaces the return address in the stack frame with a > kretprobe trampoline. 
Whenever the function returns, control is > transferred to the trampoline. The trampoline eventually returns to the > original return address. > > A stack trace taken while executing in the function (or in functions that > get called from the function) will not show the original return address. > Similarly, a stack trace taken while executing in the trampoline itself > (and functions that get called from the trampoline) will not show the > original return address. This means that the caller of the probed function > will not show. This makes the stack trace unreliable. > > Add the kretprobe trampoline to special_functions[]. > > Optprobes > ========= > > Optprobes may be implemented in the future for arm64. For optprobes, > the relevant trampoline(s) can be added to special_functions[]. > --- > Changelog: > > v1 > - Define a bool field in struct stackframe. This will indicate if > a stack trace is reliable. > > - Implement a special_functions[] array that will be populated > with special functions in which the stack trace is considered > unreliable. > > - Using kallsyms_lookup(), get the address ranges for the special > functions and record them. > > - Implement an is_reliable_function(pc). This function will check > if a given return PC falls in any of the special functions. If > it does, the stack trace is unreliable. > > - Implement check_reliability() function that will check if a > stack frame is reliable. Call is_reliable_function() from > check_reliability(). > > - Before a return PC is checked against special_funtions[], it > must be validates as a proper kernel text address. Call > __kernel_text_address() from check_reliability(). > > - Finally, call check_reliability() from unwind_frame() for > each stack frame. > > - Add EL1 exception handlers to special_functions[]. > > el1_sync(); > el1_irq(); > el1_error(); > el1_sync_invalid(); > el1_irq_invalid(); > el1_fiq_invalid(); > el1_error_invalid(); > > - The above functions are currently defined as LOCAL symbols. > Make them global so that they can be referenced from the > unwinder code. > > - Add FTRACE trampolines to special_functions[]: > > ftrace_graph_call() > ftrace_graph_caller() > return_to_handler() > > - Add the kretprobe trampoline to special functions[]: > > kretprobe_trampoline() > > v2 > - Removed the terminating entry { 0, 0 } in special_functions[] > and replaced it with the idiom { /* sentinel */ }. > > - Change the ftrace trampoline entry ftrace_graph_call in > special_functions[] to ftrace_call + 4 and added explanatory > comments. > > - Unnested #ifdefs in special_functions[] for FTRACE. > > Madhavan T. Venkataraman (4): > arm64: Implement infrastructure for stack trace reliability checks > arm64: Mark a stack trace unreliable if an EL1 exception frame is > detected > arm64: Detect FTRACE cases that make the stack trace unreliable > arm64: Mark stack trace as unreliable if kretprobed functions are > present > > arch/arm64/include/asm/exception.h | 8 ++ > arch/arm64/include/asm/stacktrace.h | 2 + > arch/arm64/kernel/entry-ftrace.S | 12 ++ > arch/arm64/kernel/entry.S | 14 +- > arch/arm64/kernel/stacktrace.c | 215 ++++++++++++++++++++++++++++ > 5 files changed, 244 insertions(+), 7 deletions(-) > > > base-commit: 0d02ec6b3136c73c09e7859f0d0e4e2c4c07b49b > -- > 2.25.1 >
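For concreteness, the section-based rejection Mark suggests could look something like the sketch below. The .entry.text bounds come from the generic section declarations; the check_reliability() hook and the 'reliable' flag in struct stackframe are the ones proposed by this series, so treat this as an illustrative sketch rather than an actual patch.

#include <linux/kallsyms.h>
#include <asm/sections.h>
#include <asm/stacktrace.h>

/*
 * Sketch only: reject any return PC that falls inside .entry.text,
 * instead of matching individual special functions by name.
 */
static bool pc_in_entry_text(unsigned long pc)
{
	return pc >= (unsigned long)__entry_text_start &&
	       pc <  (unsigned long)__entry_text_end;
}

static void check_reliability(struct stackframe *frame)
{
	if (!__kernel_text_address(frame->pc) || pc_in_entry_text(frame->pc))
		frame->reliable = false;
}

EL0 entry functions that are expected at the base of every stack would then need an explicit opt-in, as noted above.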
On 4/9/21 7:09 AM, Mark Rutland wrote: > Hi Madhavan, > > I've noted some concerns below. At a high-level, I'm not keen on the > blacklisting approach, and I think there's some other preparatory work > that would be more valuable in the short term. > Some kind of blacklisting has to be done whichever way you do it. > On Mon, Apr 05, 2021 at 03:43:09PM -0500, madvenka@linux.microsoft.com wrote: >> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com> >> >> There are a number of places in kernel code where the stack trace is not >> reliable. Enhance the unwinder to check for those cases and mark the >> stack trace as unreliable. Once all of the checks are in place, the unwinder >> can provide a reliable stack trace. But before this can be used for livepatch, >> some other entity needs to guarantee that the frame pointers are all set up >> correctly in kernel functions. objtool is currently being worked on to >> fill that gap. >> >> Except for the return address check, all the other checks involve checking >> the return PC of every frame against certain kernel functions. To do this, >> implement some infrastructure code: >> >> - Define a special_functions[] array and populate the array with >> the special functions > > I'm not too keen on having to manually collate this within the unwinder, > as it's very painful from a maintenance perspective. I'd much rather we > could associate this information with the implementations of these > functions, so that they're more likely to stay in sync. > > Further, I believe all the special cases are assembly functions, and > most of those are already in special sections to begin with. I reckon > it'd be simpler and more robust to reject unwinding based on the > section. If we need to unwind across specific functions in those > sections, we could opt-in with some metadata. So e.g. we could reject > all functions in ".entry.text", special casing the EL0 entry functions > if necessary. > Yes. I have already agreed that using sections is the way to go. I am working on that now. > As I mentioned before, I'm currently reworking the entry assembly to > make this simpler to do. I'd prefer to not make invasive changes in that > area until that's sorted. > I don't plan to make any invasive changes. But a couple of cosmetic changes may be necessary. I don't know yet. But I will keep in mind that you don't want any invasive changes there. > I think there's a lot more code that we cannot unwind, e.g. KVM > exception code, or almost anything marked with SYM_CODE_END(). > As Mark Brown suggested, I will take a look at all code that is marked as SYM_CODE. His idea of placing all SYM_CODE in a separate section and blacklisting that to begin with and refining things as we go along appears to me to be a reasonable approach. >> - Using kallsyms_lookup(), lookup the symbol table entries for the >> functions and record their address ranges >> >> - Define an is_reliable_function(pc) to match a return PC against >> the special functions. >> >> The unwinder calls is_reliable_function(pc) for every return PC and marks >> the stack trace as reliable or unreliable accordingly. >> >> Return address check >> ==================== >> >> Check the return PC of every stack frame to make sure that it is a valid >> kernel text address (and not some generated code, for example). >> >> Detect EL1 exception frame >> ========================== >> >> EL1 exceptions can happen on any instruction including instructions in >> the frame pointer prolog or epilog. 
Depending on where exactly they happen, >> they could render the stack trace unreliable. >> >> Add all of the EL1 exception handlers to special_functions[]. >> >> - el1_sync() >> - el1_irq() >> - el1_error() >> - el1_sync_invalid() >> - el1_irq_invalid() >> - el1_fiq_invalid() >> - el1_error_invalid() >> >> Detect ftrace frame >> =================== >> >> When FTRACE executes at the beginning of a traced function, it creates two >> frames and calls the tracer function: >> >> - One frame for the traced function >> >> - One frame for the caller of the traced function >> >> That gives a sensible stack trace while executing in the tracer function. >> When FTRACE returns to the traced function, the frames are popped and >> everything is back to normal. >> >> However, in cases like live patch, the tracer function redirects execution >> to a different function. When FTRACE returns, control will go to that target >> function. A stack trace taken in the tracer function will not show the target >> function. The target function is the real function that we want to track. >> So, the stack trace is unreliable. > > This doesn't match my understanding of the reliable stacktrace > requirements, but I might have misunderstood what you're saying here. > > IIUC what you're describing here is: > > 1) A calls B > 2) B is traced > 3) tracer replaces B with TARGET > 4) tracer returns to TARGET > > ... and if a stacktrace is taken at step 3 (before the return address is > patched), the trace will show B rather than TARGET. > > My understanding is that this is legitimate behaviour. > My understanding is as follows (correct me if I am wrong): - Before B is traced, the situation is "A calls B". - A trace is placed on B to redirect execution to TARGET. Semantically, it becomes "A calls TARGET" beyond that point and B is irrelevant. - But temporarily, the stack trace will show A -> B. >> To detect stack traces from a tracer function, add the following to >> special_functions[]: >> >> - ftrace_call + 4 >> >> ftrace_call is the label at which the tracer function is patched in. So, >> ftrace_call + 4 is its return address. This is what will show up in a >> stack trace taken from the tracer function. >> >> When Function Graph Tracing is on, ftrace_graph_caller is patched in >> at the label ftrace_graph_call. If a tracer function called before it has >> redirected execution as mentioned above, the stack traces taken from within >> ftrace_graph_caller will also be unreliable for the same reason as mentioned >> above. So, add ftrace_graph_caller to special_functions[] as well. >> >> Also, the Function Graph Tracer modifies the return address of a traced >> function to a return trampoline (return_to_handler()) to gather tracing >> data on function return. Stack traces taken from the traced function and >> functions it calls will not show the original caller of the traced function. >> The unwinder handles this case by getting the original caller from FTRACE. >> >> However, stack traces taken from the trampoline itself and functions it calls >> are unreliable as the original return address may not be available in >> that context. This is because the trampoline calls FTRACE to gather trace >> data as well as to obtain the actual return address and FTRACE discards the >> record of the original return address along the way. > > The reason we cannot unwind the trampolines in the usual way is because > they are not AAPCS compliant functions. We don't discard the original > return address, but it's not in the usual location. 
With care, we could > write a special case unwinder for them. Note that we also cannot unwind > from any PLT on the way to the trampolines, so we'd also need to > identify those. Luckily we're in charge of creating those, and (for > now) we only need to care about the module PLTs. > > The bigger problem is return_to_handler, since there's a transient > period when C code removes the return address from the graph return > stack before passing this to assembly in a register, and so we can't > reliably find the correct return address during this period. With care > we could special case unwinding immediately before/after this. > This is what I meant when I said "as the original return address may not be available in that context" because the original address is popped off the return address stack by the ftrace code called from the trampoline. > If we could find a way to restructure return_to_handler such that we can > reliably find the correct return address, that would be a useful > improvement today, and would mean that we don't have to blacklist it for > reliable stacktrace. > Agreed. But until then it needs to be blacklisted. Rather than wait for that restructuring to be done, we could initially blacklist it and remove the blacklist if and when the restructuring is done. Thanks. Madhavan
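As context for the return_to_handler discussion: the case the unwinder already handles is recovering the original caller when the trampoline shows up as a saved return address, which can be done through the generic fgraph API roughly as sketched below. The transient window inside return_to_handler itself, described above, is the part that still defeats the unwinder. The helper name and the frame->graph index are illustrative (the arm64 unwinder keeps a similar index), not code from this series.

#include <linux/ftrace.h>
#include <asm/stacktrace.h>

/*
 * Sketch: when the saved return address is the fgraph trampoline, ask
 * ftrace for the real caller instead of reporting return_to_handler.
 */
static unsigned long unwind_recover_return_address(struct task_struct *tsk,
						   struct stackframe *frame,
						   unsigned long pc)
{
	struct ftrace_ret_stack *ret_stack;

	if (pc != (unsigned long)return_to_handler)
		return pc;

	ret_stack = ftrace_graph_get_ret_stack(tsk, frame->graph++);
	if (WARN_ON_ONCE(!ret_stack))
		return pc;	/* cannot recover here: mark the trace unreliable */

	return ret_stack->ret;	/* the original return address */
}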
On Fri, Apr 09, 2021 at 01:09:09PM +0100, Mark Rutland wrote: > On Mon, Apr 05, 2021 at 03:43:09PM -0500, madvenka@linux.microsoft.com wrote: > > From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com> > > > > There are a number of places in kernel code where the stack trace is not > > reliable. Enhance the unwinder to check for those cases and mark the > > stack trace as unreliable. Once all of the checks are in place, the unwinder > > can provide a reliable stack trace. But before this can be used for livepatch, > > some other entity needs to guarantee that the frame pointers are all set up > > correctly in kernel functions. objtool is currently being worked on to > > fill that gap. > > > > Except for the return address check, all the other checks involve checking > > the return PC of every frame against certain kernel functions. To do this, > > implement some infrastructure code: > > > > - Define a special_functions[] array and populate the array with > > the special functions > > I'm not too keen on having to manually collate this within the unwinder, > as it's very painful from a maintenance perspective. Agreed. > I'd much rather we could associate this information with the > implementations of these functions, so that they're more likely to > stay in sync. > > Further, I believe all the special cases are assembly functions, and > most of those are already in special sections to begin with. I reckon > it'd be simpler and more robust to reject unwinding based on the > section. If we need to unwind across specific functions in those > sections, we could opt-in with some metadata. So e.g. we could reject > all functions in ".entry.text", special casing the EL0 entry functions > if necessary. Couldn't this also end up being somewhat fragile? Saying "certain sections are deemed unreliable" isn't necessarily obvious to somebody who doesn't already know about it, and it could be overlooked or forgotten over time. And there's no way to enforce it stays that way. FWIW, over the years we've had zero issues with encoding the frame pointer on x86. After you save pt_regs, you encode the frame pointer to point to it. Ideally in the same macro so it's hard to overlook. If you're concerned about debuggers getting confused by the encoding - which debuggers specifically? In my experience, if vmlinux has debuginfo, gdb and most other debuggers will use DWARF (which is already broken in asm code) and completely ignore frame pointers. > I think there's a lot more code that we cannot unwind, e.g. KVM > exception code, or almost anything marked with SYM_CODE_END(). Just a reminder that livepatch only unwinds blocked tasks (plus the 'current' task which calls into livepatch). So practically speaking, it doesn't matter whether the 'unreliable' detection has full coverage. The only exceptions which really matter are those which end up calling schedule(), e.g. preemption or page faults. Being able to consistently detect *all* possible unreliable paths would be nice in theory, but it's unnecessary and may not be worth the extra complexity.
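For readers unfamiliar with the x86 convention Josh refers to, the encode/decode pair looks roughly like this; it is paraphrased from memory of the x86 entry and unwinder code, not taken from this series.

#include <asm/ptrace.h>

/*
 * Rough sketch of the x86 scheme: on exception entry, once pt_regs has
 * been saved, the frame pointer is pointed at it with the low bit set
 * as a marker, e.g. on x86-64:
 *
 *	leaq	1(%rsp), %rbp		// rbp = &pt_regs, bit 0 = marker
 *
 * and the unwinder decodes the marker when walking the frame chain:
 */
static struct pt_regs *decode_frame_pointer(unsigned long fp)
{
	if (!(fp & 0x1))
		return NULL;			/* ordinary frame */

	return (struct pt_regs *)(fp & ~0x1UL);	/* frame points at pt_regs */
}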
On 4/9/21 4:37 PM, Josh Poimboeuf wrote: > On Fri, Apr 09, 2021 at 01:09:09PM +0100, Mark Rutland wrote: >> On Mon, Apr 05, 2021 at 03:43:09PM -0500, madvenka@linux.microsoft.com wrote: >>> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com> >>> >>> There are a number of places in kernel code where the stack trace is not >>> reliable. Enhance the unwinder to check for those cases and mark the >>> stack trace as unreliable. Once all of the checks are in place, the unwinder >>> can provide a reliable stack trace. But before this can be used for livepatch, >>> some other entity needs to guarantee that the frame pointers are all set up >>> correctly in kernel functions. objtool is currently being worked on to >>> fill that gap. >>> >>> Except for the return address check, all the other checks involve checking >>> the return PC of every frame against certain kernel functions. To do this, >>> implement some infrastructure code: >>> >>> - Define a special_functions[] array and populate the array with >>> the special functions >> >> I'm not too keen on having to manually collate this within the unwinder, >> as it's very painful from a maintenance perspective. > > Agreed. > >> I'd much rather we could associate this information with the >> implementations of these functions, so that they're more likely to >> stay in sync. >> >> Further, I believe all the special cases are assembly functions, and >> most of those are already in special sections to begin with. I reckon >> it'd be simpler and more robust to reject unwinding based on the >> section. If we need to unwind across specific functions in those >> sections, we could opt-in with some metadata. So e.g. we could reject >> all functions in ".entry.text", special casing the EL0 entry functions >> if necessary. > > Couldn't this also end up being somewhat fragile? Saying "certain > sections are deemed unreliable" isn't necessarily obvious to somebody > who doesn't already know about it, and it could be overlooked or > forgotten over time. And there's no way to enforce it stays that way. > Good point! > FWIW, over the years we've had zero issues with encoding the frame > pointer on x86. After you save pt_regs, you encode the frame pointer to > point to it. Ideally in the same macro so it's hard to overlook. > I had the same opinion. In fact, in my encoding scheme, I have additional checks to make absolutely sure that it is a true encoding and not stack corruption. The chances of all of those values accidentally matching are, well, null. > If you're concerned about debuggers getting confused by the encoding - > which debuggers specifically? In my experience, if vmlinux has > debuginfo, gdb and most other debuggers will use DWARF (which is already > broken in asm code) and completely ignore frame pointers. > Yes. I checked gdb actually. It did not show a problem. >> I think there's a lot more code that we cannot unwind, e.g. KVM >> exception code, or almost anything marked with SYM_CODE_END(). > > Just a reminder that livepatch only unwinds blocked tasks (plus the > 'current' task which calls into livepatch). So practically speaking, it > doesn't matter whether the 'unreliable' detection has full coverage. > The only exceptions which really matter are those which end up calling > schedule(), e.g. preemption or page faults. > > Being able to consistently detect *all* possible unreliable paths would > be nice in theory, but it's unnecessary and may not be worth the extra > complexity. > You do have a point. 
I tried to think of arch_stack_walk_reliable() as something that should be implemented independent of livepatching. But I could not really come up with a single example of where else it would really be useful. So, if we assume that the reliable stack trace is solely for the purpose of livepatching, I agree with your earlier comments as well. Thanks! Madhavan
On Fri, Apr 09, 2021 at 05:05:58PM -0500, Madhavan T. Venkataraman wrote: > > FWIW, over the years we've had zero issues with encoding the frame > > pointer on x86. After you save pt_regs, you encode the frame pointer to > > point to it. Ideally in the same macro so it's hard to overlook. > > > > I had the same opinion. In fact, in my encoding scheme, I have additional > checks to make absolutely sure that it is a true encoding and not stack > corruption. The chances of all of those values accidentally matching are, > well, null. Right, stack corruption -- which is already exceedingly rare -- would have to be combined with a miracle or two in order to come out of the whole thing marked as 'reliable' :-) And really, we already take a similar risk today by "trusting" the frame pointer value on the stack to a certain extent. > >> I think there's a lot more code that we cannot unwind, e.g. KVM > >> exception code, or almost anything marked with SYM_CODE_END(). > > > > Just a reminder that livepatch only unwinds blocked tasks (plus the > > 'current' task which calls into livepatch). So practically speaking, it > > doesn't matter whether the 'unreliable' detection has full coverage. > > The only exceptions which really matter are those which end up calling > > schedule(), e.g. preemption or page faults. > > > > Being able to consistently detect *all* possible unreliable paths would > > be nice in theory, but it's unnecessary and may not be worth the extra > > complexity. > > > > You do have a point. I tried to think of arch_stack_walk_reliable() as > something that should be implemented independent of livepatching. But > I could not really come up with a single example of where else it would > really be useful. > > So, if we assume that the reliable stack trace is solely for the purpose > of livepatching, I agree with your earlier comments as well. One thought: if folks really view this as a problem, it might help to just rename things to reduce confusion. For example, instead of calling it 'reliable', we could call it something more precise, like 'klp_reliable', to indicate that its reliable enough for live patching. Then have a comment above 'klp_reliable' and/or stack_trace_save_tsk_klp_reliable() which describes what that means. Hm, for that matter, even without renaming things, a comment above stack_trace_save_tsk_reliable() describing the meaning of "reliable" would be a good idea.
On Fri, Apr 09, 2021 at 05:32:27PM -0500, Josh Poimboeuf wrote: > On Fri, Apr 09, 2021 at 05:05:58PM -0500, Madhavan T. Venkataraman wrote: > > > FWIW, over the years we've had zero issues with encoding the frame > > > pointer on x86. After you save pt_regs, you encode the frame pointer to > > > point to it. Ideally in the same macro so it's hard to overlook. > > > > > > > I had the same opinion. In fact, in my encoding scheme, I have additional > > checks to make absolutely sure that it is a true encoding and not stack > > corruption. The chances of all of those values accidentally matching are, > > well, null. > > Right, stack corruption -- which is already exceedingly rare -- would > have to be combined with a miracle or two in order to come out of the > whole thing marked as 'reliable' :-) > > And really, we already take a similar risk today by "trusting" the frame > pointer value on the stack to a certain extent. Oh yeah, I forgot to mention some more benefits of encoding the frame pointer (or marking pt_regs in some other way): a) Stack addresses can be printed properly: '%pS' for printing regs->pc and '%pB' for printing call returns. Using '%pS' for call returns (as arm64 seems to do today) will result in printing the wrong function when you have tail calls to noreturn functions on the stack (which is actually quite common for calls to panic(), die(), etc). More details: https://lkml.kernel.org/r/20210403155948.ubbgtwmlsdyar7yp@treble b) Stack dumps to the console can dump the exception registers they find along the way. This is actually quite nice for debugging.
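A minimal illustration of the %pS/%pB distinction described above; the helper and its arguments are hypothetical, the format specifiers are the standard printk ones.

#include <linux/printk.h>
#include <asm/ptrace.h>

/* Hypothetical helper, only to show the two specifiers side by side. */
static void show_trace_entry(struct pt_regs *regs, unsigned long return_addr)
{
	/* %pS: symbolise a precise PC, e.g. the interrupted instruction. */
	pr_info(" pc : %pS\n", (void *)regs->pc);

	/*
	 * %pB: symbolise a *return* address.  It looks up addr - 1, so a
	 * return address sitting just after a call to a noreturn function
	 * (panic(), die(), ...) is attributed to the calling function rather
	 * than to whatever symbol happens to start at that address.
	 */
	pr_info(" ret: %pB\n", (void *)return_addr);
}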
On 4/9/21 5:53 PM, Josh Poimboeuf wrote: > On Fri, Apr 09, 2021 at 05:32:27PM -0500, Josh Poimboeuf wrote: >> On Fri, Apr 09, 2021 at 05:05:58PM -0500, Madhavan T. Venkataraman wrote: >>>> FWIW, over the years we've had zero issues with encoding the frame >>>> pointer on x86. After you save pt_regs, you encode the frame pointer to >>>> point to it. Ideally in the same macro so it's hard to overlook. >>>> >>> >>> I had the same opinion. In fact, in my encoding scheme, I have additional >>> checks to make absolutely sure that it is a true encoding and not stack >>> corruption. The chances of all of those values accidentally matching are, >>> well, null. >> >> Right, stack corruption -- which is already exceedingly rare -- would >> have to be combined with a miracle or two in order to come out of the >> whole thing marked as 'reliable' :-) >> >> And really, we already take a similar risk today by "trusting" the frame >> pointer value on the stack to a certain extent. > > Oh yeah, I forgot to mention some more benefits of encoding the frame > pointer (or marking pt_regs in some other way): > > a) Stack addresses can be printed properly: '%pS' for printing regs->pc > and '%pB' for printing call returns. > > Using '%pS' for call returns (as arm64 seems to do today) will result > in printing the wrong function when you have tail calls to noreturn > functions on the stack (which is actually quite common for calls to > panic(), die(), etc). > > More details: > > https://lkml.kernel.org/r/20210403155948.ubbgtwmlsdyar7yp@treble > > b) Stack dumps to the console can dump the exception registers they find > along the way. This is actually quite nice for debugging. > > Great. I am preparing version 3 taking into account comments from yourself, Mark Rutland and Mark Brown. Stay tuned. Madhavan
On Fri, Apr 09, 2021 at 05:32:27PM -0500, Josh Poimboeuf wrote: > Hm, for that matter, even without renaming things, a comment above > stack_trace_save_tsk_reliable() describing the meaning of "reliable" > would be a good idea. Might be better to place something at the prototype for arch_stack_walk_reliable() or cross link the two since that's where any new architectures should be starting, or perhaps even better to extend the document that Mark wrote further and point to that from both places. Some more explicit pointer to live patching as the only user would definitely be good but I think the more important thing would be writing down any assumptions in the API that aren't already written down and we're supposed to be relying on. Mark's document captured a lot of it but it sounds like there's more here, and even with knowing that this interface is only used by live patch and digging into what it does it's not always clear what happens to work with the code right now and what's something that's suitable to be relied on.
On Fri, Apr 09, 2021 at 04:37:41PM -0500, Josh Poimboeuf wrote: > On Fri, Apr 09, 2021 at 01:09:09PM +0100, Mark Rutland wrote: > > Further, I believe all the special cases are assembly functions, and > > most of those are already in special sections to begin with. I reckon > > it'd be simpler and more robust to reject unwinding based on the > > section. If we need to unwind across specific functions in those > > sections, we could opt-in with some metadata. So e.g. we could reject > > all functions in ".entry.text", special casing the EL0 entry functions > > if necessary. > Couldn't this also end up being somewhat fragile? Saying "certain > sections are deemed unreliable" isn't necessarily obvious to somebody > who doesn't already know about it, and it could be overlooked or > forgotten over time. And there's no way to enforce it stays that way. Anything in this area is going to have some opportunity for fragility and missed assumptions somewhere. I do find the idea of using the SYM_CODE annotations that we already have and use for other purposes to flag code that we don't expect to be suitable for reliable unwinding appealing from that point of view. It's pretty clear at the points where they're used that they're needed, even with a pretty surface level review, and the bit actually pushing things into a section is going to be in a single place where the macro is defined. That seems relatively robust as these things go, it seems no worse than our reliance on SYM_FUNC to create BTI annotations. Missing those causes oopses when we try to branch to the function.
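To make the "single place where the macro is defined" point concrete, the kind of override being discussed might look like the sketch below. The section name, and the assumption that the generic SYM_CODE_START()/SYM_CODE_END() definitions can be overridden per-arch, are illustrative rather than taken from this thread.

/*
 * Hypothetical override, e.g. in arch/arm64/include/asm/linkage.h: route
 * everything declared with SYM_CODE_START()/SYM_CODE_END() into its own
 * section so the unwinder can reject it by address range.
 */
#define SYM_CODE_START(name)				\
	.pushsection ".code.text", "ax";		\
	SYM_START(name, SYM_L_GLOBAL, SYM_A_ALIGN)

#define SYM_CODE_START_LOCAL(name)			\
	.pushsection ".code.text", "ax";		\
	SYM_START(name, SYM_L_LOCAL, SYM_A_ALIGN)

#define SYM_CODE_END(name)				\
	SYM_END(name, SYM_T_NONE);			\
	.popsection

The linker script would also need to keep the new section together and export its bounds so the unwinder can test return PCs against them.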
On 4/12/21 12:36 PM, Mark Brown wrote: > On Fri, Apr 09, 2021 at 04:37:41PM -0500, Josh Poimboeuf wrote: >> On Fri, Apr 09, 2021 at 01:09:09PM +0100, Mark Rutland wrote: > >>> Further, I believe all the special cases are assembly functions, and >>> most of those are already in special sections to begin with. I reckon >>> it'd be simpler and more robust to reject unwinding based on the >>> section. If we need to unwind across specific functions in those >>> sections, we could opt-in with some metadata. So e.g. we could reject >>> all functions in ".entry.text", special casing the EL0 entry functions >>> if necessary. > >> Couldn't this also end up being somewhat fragile? Saying "certain >> sections are deemed unreliable" isn't necessarily obvious to somebody >> who doesn't already know about it, and it could be overlooked or >> forgotten over time. And there's no way to enforce it stays that way. > > Anything in this area is going to have some opportunity for fragility > and missed assumptions somewhere. I do find the idea of using the > SYM_CODE annotations that we already have and use for other purposes to > flag code that we don't expect to be suitable for reliable unwinding > appealing from that point of view. It's pretty clear at the points > where they're used that they're needed, even with a pretty surface level > review, and the bit actually pushing things into a section is going to > be in a single place where the macro is defined. That seems relatively > robust as these things go, it seems no worse than our reliance on > SYM_FUNC to create BTI annotations. Missing those causes oopses when we > try to branch to the function. > OK. Just so I am clear on the whole picture, let me state my understanding so far. Correct me if I am wrong. 1. We are hoping that we can convert a significant number of SYM_CODE functions to SYM_FUNC functions by providing them with a proper FP prolog and epilog so that we can get objtool coverage for them. These don't need any blacklisting. 2. If we can locate the pt_regs structures created on the stack cleanly for EL1 exceptions, etc, then we can handle those cases in the unwinder without needing any black listing. I have a solution for this in version 3 that does it without encoding the FP or matching values on the stack. I have addressed all of the objections so far on that count. I will send the patch series out soon. 3. We are going to assume that the reliable unwinder is only for livepatch purposes and will only be invoked on a task that is not currently running. The task either voluntarily gave up the CPU or was pre-empted. We can safely ignore all SYM_CODE functions that will never voluntarily give up the CPU. They can only be pre-empted and pre-emption is already handled in (2). We don't need to blacklist any of these functions. 4. So, the only functions that will need blacklisting are the remaining SYM_CODE functions that might give up the CPU voluntarily. At this point, I am not even sure how many of these will exist. One hopes that all of these would have ended up as SYM_FUNC functions in (1). So, IMHO, placing code in a black listed section should be the last step and not the first one. This also satisfies Mark Rutland's requirement that no one muck with the entry text while he is sorting out that code. I suggest we do (3) first. Then, review the assembly functions to do (1). Then, review the remaining ones to see which ones must be blacklisted, if any. Do you agree? Madhavan
On Mon, Apr 12, 2021 at 02:55:35PM -0500, Madhavan T. Venkataraman wrote: > > OK. Just so I am clear on the whole picture, let me state my understanding so far. > Correct me if I am wrong. > 1. We are hoping that we can convert a significant number of SYM_CODE functions to > SYM_FUNC functions by providing them with a proper FP prolog and epilog so that > we can get objtool coverage for them. These don't need any blacklisting. I wouldn't expect to be converting lots of SYM_CODE to SYM_FUNC. I'd expect the overwhelming majority of SYM_CODE to be SYM_CODE because it's required to be non standard due to some external interface - things like the exception vectors, ftrace, and stuff around suspend/hibernate. A quick grep seems to confirm this. > 3. We are going to assume that the reliable unwinder is only for livepatch purposes > and will only be invoked on a task that is not currently running. The task either The reliable unwinder can also be invoked on itself. > 4. So, the only functions that will need blacklisting are the remaining SYM_CODE functions > that might give up the CPU voluntarily. At this point, I am not even sure how > many of these will exist. One hopes that all of these would have ended up as > SYM_FUNC functions in (1). There's stuff like ret_from_fork there. > I suggest we do (3) first. Then, review the assembly functions to do (1). Then, review the > remaining ones to see which ones must be blacklisted, if any. I'm not clear what the concrete steps you're planning to do first are there - your 3 seems like a statement of assumptions. For flagging functions I do think it'd be safer to default to assuming that all SYM_CODE functions can't be unwound reliably rather than only explicitly listing ones that cause problems.
On Mon, Apr 12, 2021 at 05:59:33PM +0100, Mark Brown wrote: > On Fri, Apr 09, 2021 at 05:32:27PM -0500, Josh Poimboeuf wrote: > > > Hm, for that matter, even without renaming things, a comment above > > stack_trace_save_tsk_reliable() describing the meaning of "reliable" > > would be a good idea. > > Might be better to place something at the prototype for > arch_stack_walk_reliable() or cross link the two since that's where any > new architectures should be starting, or perhaps even better to extend > the document that Mark wrote further and point to that from both places. > > Some more explict pointer to live patching as the only user would > definitely be good but I think the more important thing would be writing > down any assumptions in the API that aren't already written down and > we're supposed to be relying on. Mark's document captured a lot of it > but it sounds like there's more here, and even with knowing that this > interface is only used by live patch and digging into what it does it's > not always clear what happens to work with the code right now and what's > something that's suitable to be relied on. Something like so? From: Josh Poimboeuf <jpoimboe@redhat.com> Subject: [PATCH] livepatch: Clarify the meaning of 'reliable' Update the comments and documentation to reflect what 'reliable' unwinding actually means, in the context of live patching. Suggested-by: Mark Brown <broonie@kernel.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> --- .../livepatch/reliable-stacktrace.rst | 26 +++++++++++++---- arch/x86/kernel/stacktrace.c | 6 ---- include/linux/stacktrace.h | 29 +++++++++++++++++-- kernel/stacktrace.c | 7 ++++- 4 files changed, 53 insertions(+), 15 deletions(-) diff --git a/Documentation/livepatch/reliable-stacktrace.rst b/Documentation/livepatch/reliable-stacktrace.rst index 67459d2ca2af..e325efc7e952 100644 --- a/Documentation/livepatch/reliable-stacktrace.rst +++ b/Documentation/livepatch/reliable-stacktrace.rst @@ -72,7 +72,21 @@ The unwinding process varies across architectures, their respective procedure call standards, and kernel configurations. This section describes common details that architectures should consider. -4.1 Identifying successful termination +4.1 Only preemptible code needs reliability detection +----------------------------------------------------- + +The only current user of reliable stacktracing is livepatch, which only +calls it for a) inactive tasks; or b) the current task in task context. + +Therefore, the unwinder only needs to detect the reliability of stacks +involving *preemptible* code. + +Practically speaking, reliability of stacks involving *non-preemptible* +code is a "don't-care". It may help to return a wrong reliability +result for such cases, if it results in reduced complexity, since such +cases will not happen in practice. + +4.2 Identifying successful termination -------------------------------------- Unwinding may terminate early for a number of reasons, including: @@ -95,7 +109,7 @@ architectures verify that a stacktrace ends at an expected location, e.g. * On a specific stack expected for a kernel entry point (e.g. if the architecture has separate task and IRQ stacks). -4.2 Identifying unwindable code +4.3 Identifying unwindable code ------------------------------- Unwinding typically relies on code following specific conventions (e.g. @@ -129,7 +143,7 @@ unreliable to unwind from, e.g. * Identifying specific portions of code using bounds information. 
-4.3 Unwinding across interrupts and exceptions +4.4 Unwinding across interrupts and exceptions ---------------------------------------------- At function call boundaries the stack and other unwind state is expected to be @@ -156,7 +170,7 @@ have no such cases) should attempt to unwind across exception boundaries, as doing so can prevent unnecessarily stalling livepatch consistency checks and permits livepatch transitions to complete more quickly. -4.4 Rewriting of return addresses +4.5 Rewriting of return addresses --------------------------------- Some trampolines temporarily modify the return address of a function in order @@ -222,7 +236,7 @@ middle of return_to_handler and can report this as unreliable. Architectures are not required to unwind from other trampolines which modify the return address. -4.5 Obscuring of return addresses +4.6 Obscuring of return addresses --------------------------------- Some trampolines do not rewrite the return address in order to intercept @@ -249,7 +263,7 @@ than the link register as would usually be the case. Architectures must either ensure that unwinders either reliably unwind such cases, or report the unwinding as unreliable. -4.6 Link register unreliability +4.7 Link register unreliability ------------------------------- On some other architectures, 'call' instructions place the return address into a diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c index 8627fda8d993..15b058eefc4e 100644 --- a/arch/x86/kernel/stacktrace.c +++ b/arch/x86/kernel/stacktrace.c @@ -29,12 +29,6 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie, } } -/* - * This function returns an error if it detects any unreliable features of the - * stack. Otherwise it guarantees that the stack trace is reliable. - * - * If the task is not 'current', the caller *must* ensure the task is inactive. - */ int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry, void *cookie, struct task_struct *task) { diff --git a/include/linux/stacktrace.h b/include/linux/stacktrace.h index 50e2df30b0aa..1b6a65a0ad22 100644 --- a/include/linux/stacktrace.h +++ b/include/linux/stacktrace.h @@ -26,7 +26,7 @@ unsigned int stack_trace_save_user(unsigned long *store, unsigned int size); #ifdef CONFIG_ARCH_STACKWALK /** - * stack_trace_consume_fn - Callback for arch_stack_walk() + * stack_trace_consume_fn() - Callback for arch_stack_walk() * @cookie: Caller supplied pointer handed back by arch_stack_walk() * @addr: The stack entry address to consume * @@ -35,7 +35,7 @@ unsigned int stack_trace_save_user(unsigned long *store, unsigned int size); */ typedef bool (*stack_trace_consume_fn)(void *cookie, unsigned long addr); /** - * arch_stack_walk - Architecture specific function to walk the stack + * arch_stack_walk() - Architecture specific function to walk the stack * @consume_entry: Callback which is invoked by the architecture code for * each entry. * @cookie: Caller supplied pointer which is handed back to @@ -52,8 +52,33 @@ typedef bool (*stack_trace_consume_fn)(void *cookie, unsigned long addr); */ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie, struct task_struct *task, struct pt_regs *regs); + +/** + * arch_stack_walk_reliable() - Architecture specific function to walk the + * stack, with stack reliability check + * @consume_entry: Callback which is invoked by the architecture code for + * each entry. 
+ * @cookie: Caller supplied pointer which is handed back to + * @consume_entry + * @task: Pointer to a task struct, can be NULL for current + * + * Return: 0 if the stack trace is considered reliable for livepatch; else < 0. + * + * NOTE: This interface is only used by livepatch. The caller must ensure that + * it's only called in one of the following two scenarios: + * + * a) the task is inactive (and guaranteed to remain so); or + * + * b) the task is 'current', running in task context. + * + * Effectively, this means the arch unwinder doesn't need to detect the + * reliability of stacks involving non-preemptible code. + * + * For more details, see Documentation/livepatch/reliable-stacktrace.rst. + */ int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry, void *cookie, struct task_struct *task); + void arch_stack_walk_user(stack_trace_consume_fn consume_entry, void *cookie, const struct pt_regs *regs); diff --git a/kernel/stacktrace.c b/kernel/stacktrace.c index 9f8117c7cfdd..a198fd194fed 100644 --- a/kernel/stacktrace.c +++ b/kernel/stacktrace.c @@ -185,7 +185,12 @@ unsigned int stack_trace_save_regs(struct pt_regs *regs, unsigned long *store, * stack. Otherwise it guarantees that the stack trace is * reliable and returns the number of entries stored. * - * If the task is not 'current', the caller *must* ensure the task is inactive. + * NOTE: This interface is only used by livepatch. The caller must ensure that + * it's only called in one of the following two scenarios: + * + * a) the task is inactive (and guaranteed to remain so); or + * + * b) the task is 'current', running in task context. */ int stack_trace_save_tsk_reliable(struct task_struct *tsk, unsigned long *store, unsigned int size)
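For reference, the caller side that the NOTE in the patch above describes is the livepatch transition code; the sketch below is paraphrased from memory of kernel/livepatch/transition.c, so treat the details as approximate.

#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/stacktrace.h>

#define MAX_STACK_ENTRIES	64	/* as used by the livepatch transition code */

/* Roughly how livepatch consumes the interface documented above. */
static int klp_check_task_stack(struct task_struct *task)
{
	static unsigned long entries[MAX_STACK_ENTRIES];
	int nr_entries;

	/* 'task' is either inactive or the current task, per the NOTE above. */
	nr_entries = stack_trace_save_tsk_reliable(task, entries,
						   ARRAY_SIZE(entries));
	if (nr_entries < 0)
		return nr_entries;	/* unreliable: retry the transition later */

	/* ... each entry is then checked against the functions being patched ... */
	return 0;
}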
On 4/13/21 6:02 AM, Mark Brown wrote: > On Mon, Apr 12, 2021 at 02:55:35PM -0500, Madhavan T. Venkataraman wrote: > >> >> OK. Just so I am clear on the whole picture, let me state my understanding so far. >> Correct me if I am wrong. > >> 1. We are hoping that we can convert a significant number of SYM_CODE functions to >> SYM_FUNC functions by providing them with a proper FP prolog and epilog so that >> we can get objtool coverage for them. These don't need any blacklisting. > > I wouldn't expect to be converting lots of SYM_CODE to SYM_FUNC. I'd > expect the overwhelming majority of SYM_CODE to be SYM_CODE because it's > required to be non standard due to some external interface - things like > the exception vectors, ftrace, and stuff around suspend/hibernate. A > quick grep seems to confirm this. > OK. Fair enough. >> 3. We are going to assume that the reliable unwinder is only for livepatch purposes >> and will only be invoked on a task that is not currently running. The task either > > The reliable unwinder can also be invoked on itself. > I have not called out the self-directed case because I am assuming that the reliable unwinder is only used for livepatch. So, AFAICT, this is applicable to the task that performs the livepatch operation itself. In this case, there should be no unreliable functions on the self-directed stack trace (otherwise, livepatching would always fail). >> 4. So, the only functions that will need blacklisting are the remaining SYM_CODE functions >> that might give up the CPU voluntarily. At this point, I am not even sure how >> many of these will exist. One hopes that all of these would have ended up as >> SYM_FUNC functions in (1). > > There's stuff like ret_from_fork there. > OK. There would be a few functions that fit this category. I agree. >> I suggest we do (3) first. Then, review the assembly functions to do (1). Then, review the >> remaining ones to see which ones must be blacklisted, if any. > > I'm not clear what the concrete steps you're planning to do first are > there - your 3 seems like a statement of assumptions. For flagging > functions I do think it'd be safer to default to assuming that all > SYM_CODE functions can't be unwound reliably rather than only explicitly > listing ones that cause problems. > They are not assumptions. They are true statements. But I probably did not do a good job of explaining. But Josh sent out a patch that updates the documentation that explains what I said a lot better. In any case, I have absolutely no problems in implementing your section idea. I will make an attempt to do that in version 3 of my patch series. Stay tuned. And, thanks for all the input. It is very helpful. Madhavan
On Tue, Apr 13, 2021 at 05:53:10PM -0500, Josh Poimboeuf wrote: > On Mon, Apr 12, 2021 at 05:59:33PM +0100, Mark Brown wrote: > > Some more explict pointer to live patching as the only user would > > definitely be good but I think the more important thing would be writing > > down any assumptions in the API that aren't already written down and > Something like so? Yeah, looks reasonable - it'll need rebasing against current code as I moved the docs in the source out of the arch code into the header this cycle (they were copied verbatim in a couple of places). > #ifdef CONFIG_ARCH_STACKWALK > > /** > - * stack_trace_consume_fn - Callback for arch_stack_walk() > + * stack_trace_consume_fn() - Callback for arch_stack_walk() > * @cookie: Caller supplied pointer handed back by arch_stack_walk() > * @addr: The stack entry address to consume > * > @@ -35,7 +35,7 @@ unsigned int stack_trace_save_user(unsigned long *store, unsigned int size); > */ > typedef bool (*stack_trace_consume_fn)(void *cookie, unsigned long addr); > /** > - * arch_stack_walk - Architecture specific function to walk the stack > + * arch_stack_walk() - Architecture specific function to walk the stack > * @consume_entry: Callback which is invoked by the architecture code for > * each entry. > * @cookie: Caller supplied pointer which is handed back to These two should be separated.
On Wed, Apr 14, 2021 at 05:23:38AM -0500, Madhavan T. Venkataraman wrote: > On 4/13/21 6:02 AM, Mark Brown wrote: > > On Mon, Apr 12, 2021 at 02:55:35PM -0500, Madhavan T. Venkataraman wrote: > >> 3. We are going to assume that the reliable unwinder is only for livepatch purposes > >> and will only be invoked on a task that is not currently running. The task either > > > > The reliable unwinder can also be invoked on itself. > I have not called out the self-directed case because I am assuming that the reliable unwinder > is only used for livepatch. So, AFAICT, this is applicable to the task that performs the > livepatch operation itself. In this case, there should be no unreliable functions on the > self-directed stack trace (otherwise, livepatching would always fail). Someone might've added a probe of some kind which upsets things so there's a possibility things might fail. Like you say there's no way a system in such a state can successfully apply a live patch but we might still run into that situation. > >> I suggest we do (3) first. Then, review the assembly functions to do (1). Then, review the > >> remaining ones to see which ones must be blacklisted, if any. > > I'm not clear what the concrete steps you're planning to do first are > > there - your 3 seems like a statement of assumptions. For flagging > > functions I do think it'd be safer to default to assuming that all > > SYM_CODE functions can't be unwound reliably rather than only explicitly > > listing ones that cause problems. > They are not assumptions. They are true statements. But I probably did not do a good > job of explaining. But Josh sent out a patch that updates the documentation that > explains what I said a lot better. You say true statements, I say assumptions :)
On 4/14/21 5:23 AM, Madhavan T. Venkataraman wrote: > In any case, I have absolutely no problems in implementing your section idea. I will > make an attempt to do that in version 3 of my patch series. So, I attempted a patch with just declaring all .entry.text functions as unreliable by checking just the section bounds. It does work for EL1 exceptions. But there are other functions that are actually reliable that show up as unreliable. The example in my test is el0_sync() which is at the base of all system call stacks. How would you prefer I handle this? Should I place all SYM_CODE functions that are actually safe for the unwinder in a separate section? I could just take some approach and solve this. But I would like to get your opinion and Mark Rutland's opinion so we are all on the same page. Please let me know. Madhavan
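One possible shape for that special case, purely as a sketch: treat a .entry.text return PC as acceptable when it is the terminating frame, since el0_sync() legitimately sits at the base of every syscall stack. pc_in_entry_text() and the final-frame test are hypothetical helpers, not code from the series.

/*
 * Variant of the earlier .entry.text sketch: only a *non-terminating*
 * frame inside .entry.text makes the trace unreliable.  The EL0 entry
 * path at the very bottom of the stack is expected and harmless.
 */
static void check_reliability(struct stackframe *frame, bool final_frame)
{
	if (!__kernel_text_address(frame->pc)) {
		frame->reliable = false;
		return;
	}

	if (pc_in_entry_text(frame->pc) && !final_frame)
		frame->reliable = false;
}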
On Fri, Apr 16, 2021 at 09:43:48AM -0500, Madhavan T. Venkataraman wrote: > How would you prefer I handle this? Should I place all SYM_CODE functions that > are actually safe for the unwinder in a separate section? I could just take > some approach and solve this. But I would like to get your opinion and Mark > Rutland's opinion so we are all on the same page. That sounds reasonable to me, obviously we'd have to look at how exactly the annotation ends up getting done and general bikeshed colour discussions. I'm not sure if we want a specific "safe for unwinder section" or to split things up into sections per reason things are safe for the unwinder (kind of like what you were proposing for flagging things as a problem), that might end up being useful for other things at some point.
From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com> There are a number of places in kernel code where the stack trace is not reliable. Enhance the unwinder to check for those cases and mark the stack trace as unreliable. Once all of the checks are in place, the unwinder can provide a reliable stack trace. But before this can be used for livepatch, some other entity needs to guarantee that the frame pointers are all set up correctly in kernel functions. objtool is currently being worked on to fill that gap. Except for the return address check, all the other checks involve checking the return PC of every frame against certain kernel functions. To do this, implement some infrastructure code: - Define a special_functions[] array and populate the array with the special functions - Using kallsyms_lookup(), lookup the symbol table entries for the functions and record their address ranges - Define an is_reliable_function(pc) to match a return PC against the special functions. The unwinder calls is_reliable_function(pc) for every return PC and marks the stack trace as reliable or unreliable accordingly. Return address check ==================== Check the return PC of every stack frame to make sure that it is a valid kernel text address (and not some generated code, for example). Detect EL1 exception frame ========================== EL1 exceptions can happen on any instruction including instructions in the frame pointer prolog or epilog. Depending on where exactly they happen, they could render the stack trace unreliable. Add all of the EL1 exception handlers to special_functions[]. - el1_sync() - el1_irq() - el1_error() - el1_sync_invalid() - el1_irq_invalid() - el1_fiq_invalid() - el1_error_invalid() Detect ftrace frame =================== When FTRACE executes at the beginning of a traced function, it creates two frames and calls the tracer function: - One frame for the traced function - One frame for the caller of the traced function That gives a sensible stack trace while executing in the tracer function. When FTRACE returns to the traced function, the frames are popped and everything is back to normal. However, in cases like live patch, the tracer function redirects execution to a different function. When FTRACE returns, control will go to that target function. A stack trace taken in the tracer function will not show the target function. The target function is the real function that we want to track. So, the stack trace is unreliable. To detect stack traces from a tracer function, add the following to special_functions[]: - ftrace_call + 4 ftrace_call is the label at which the tracer function is patched in. So, ftrace_call + 4 is its return address. This is what will show up in a stack trace taken from the tracer function. When Function Graph Tracing is on, ftrace_graph_caller is patched in at the label ftrace_graph_call. If a tracer function called before it has redirected execution as mentioned above, the stack traces taken from within ftrace_graph_caller will also be unreliable for the same reason as mentioned above. So, add ftrace_graph_caller to special_functions[] as well. Also, the Function Graph Tracer modifies the return address of a traced function to a return trampoline (return_to_handler()) to gather tracing data on function return. Stack traces taken from the traced function and functions it calls will not show the original caller of the traced function. The unwinder handles this case by getting the original caller from FTRACE. 
However, stack traces taken from the trampoline itself and functions it calls are unreliable as the original return address may not be available in that context. This is because the trampoline calls FTRACE to gather trace data as well as to obtain the actual return address and FTRACE discards the record of the original return address along the way. Add return_to_handler() to special_functions[]. Check for kretprobe =================== For functions with a kretprobe set up, probe code executes on entry to the function and replaces the return address in the stack frame with a kretprobe trampoline. Whenever the function returns, control is transferred to the trampoline. The trampoline eventually returns to the original return address. A stack trace taken while executing in the function (or in functions that get called from the function) will not show the original return address. Similarly, a stack trace taken while executing in the trampoline itself (and functions that get called from the trampoline) will not show the original return address. This means that the caller of the probed function will not show. This makes the stack trace unreliable. Add the kretprobe trampoline to special_functions[]. Optprobes ========= Optprobes may be implemented in the future for arm64. For optprobes, the relevant trampoline(s) can be added to special_functions[]. --- Changelog: v1 - Define a bool field in struct stackframe. This will indicate if a stack trace is reliable. - Implement a special_functions[] array that will be populated with special functions in which the stack trace is considered unreliable. - Using kallsyms_lookup(), get the address ranges for the special functions and record them. - Implement an is_reliable_function(pc). This function will check if a given return PC falls in any of the special functions. If it does, the stack trace is unreliable. - Implement check_reliability() function that will check if a stack frame is reliable. Call is_reliable_function() from check_reliability(). - Before a return PC is checked against special_functions[], it must be validated as a proper kernel text address. Call __kernel_text_address() from check_reliability(). - Finally, call check_reliability() from unwind_frame() for each stack frame. - Add EL1 exception handlers to special_functions[]. el1_sync(); el1_irq(); el1_error(); el1_sync_invalid(); el1_irq_invalid(); el1_fiq_invalid(); el1_error_invalid(); - The above functions are currently defined as LOCAL symbols. Make them global so that they can be referenced from the unwinder code. - Add FTRACE trampolines to special_functions[]: ftrace_graph_call() ftrace_graph_caller() return_to_handler() - Add the kretprobe trampoline to special_functions[]: kretprobe_trampoline() v2 - Removed the terminating entry { 0, 0 } in special_functions[] and replaced it with the idiom { /* sentinel */ }. - Change the ftrace trampoline entry ftrace_graph_call in special_functions[] to ftrace_call + 4 and added explanatory comments. - Unnested #ifdefs in special_functions[] for FTRACE. Madhavan T.
Venkataraman (4): arm64: Implement infrastructure for stack trace reliability checks arm64: Mark a stack trace unreliable if an EL1 exception frame is detected arm64: Detect FTRACE cases that make the stack trace unreliable arm64: Mark stack trace as unreliable if kretprobed functions are present arch/arm64/include/asm/exception.h | 8 ++ arch/arm64/include/asm/stacktrace.h | 2 + arch/arm64/kernel/entry-ftrace.S | 12 ++ arch/arm64/kernel/entry.S | 14 +- arch/arm64/kernel/stacktrace.c | 215 ++++++++++++++++++++++++++++ 5 files changed, 244 insertions(+), 7 deletions(-) base-commit: 0d02ec6b3136c73c09e7859f0d0e4e2c4c07b49b
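To make the cover letter's description concrete, the infrastructure could be reconstructed roughly as below. The names special_functions[] and is_reliable_function() come from the cover letter; everything else (the struct layout, the use of kallsyms_lookup_size_offset() in place of kallsyms_lookup(), and the el1_* declarations assumed to be exported via asm/exception.h once the series makes them global) is an illustrative guess, not the actual patch.

#include <linux/kallsyms.h>
#include <asm/exception.h>	/* el1_sync() etc., assuming the series makes them global */

struct function_range {
	unsigned long start;
	unsigned long end;
};

/* Functions whose presence in a trace makes it unreliable (abridged). */
static struct function_range special_functions[] = {
	{ (unsigned long)el1_sync, 0 },
	{ (unsigned long)el1_irq, 0 },
	/* ... el1_error(), ftrace_call + 4, return_to_handler(), ... */
	{ /* sentinel */ }
};

/* Fill in each entry's [start, end) range once, via kallsyms. */
static void init_special_functions(void)
{
	struct function_range *func;
	unsigned long size, offset;

	for (func = special_functions; func->start; func++) {
		if (kallsyms_lookup_size_offset(func->start, &size, &offset))
			func->end = func->start - offset + size;
		else
			func->end = func->start;
	}
}

/* Return false if a return PC lands inside any special function. */
static bool is_reliable_function(unsigned long pc)
{
	struct function_range *func;

	for (func = special_functions; func->start; func++)
		if (pc >= func->start && pc < func->end)
			return false;

	return true;
}

The unwinder would then call __kernel_text_address() and is_reliable_function() on every return PC from unwind_frame(), clearing the trace's reliable flag when either check fails, as the changelog above describes.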