Message ID | 20210315165800.5948-3-madvenka@linux.microsoft.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | arm64: Implement reliable stack trace | expand |
On Mon, Mar 15, 2021 at 11:57:54AM -0500, madvenka@linux.microsoft.com wrote: > To summarize, pt_regs->stackframe is used (or will be used) as a marker > frame in stack traces. To enable the unwinder to detect these frames, tag > each pt_regs->stackframe with a type. To record the type, use the unused2 > field in struct pt_regs and rename it to frame_type. The types are: Unless I'm misreading what's going on here this is more trying to set a type for the stack as a whole than for a specific stack frame. I'm also finding this a bit confusing as the unwinder already tracks things it calls frame types and it handles types that aren't covered here like SDEI. At the very least there's a naming issue here. Taking a step back though do we want to be tracking this via pt_regs? It's reliant on us robustly finding the correct pt_regs and on having the things that make the stack unreliable explicitly go in and set the appropriate type. That seems like it will be error prone, I'd been expecting to do something more like using sections to filter code for unreliable features based on the addresses of the functions we find on the stack or similar. This could still go wrong of course but there's fewer moving pieces, and especially fewer moving pieces specific to reliable stack trace. I'm wary of tracking data that only ever gets used for the reliable stack trace path given that it's going to be fairly infrequently used and hence tested, especially things that only crop up in cases that are hard to provoke reliably. If there's a way to detect things that doesn't use special data that seems safer. > EL1_FRAME > EL1 exception frame. We do trap into EL2 as well, the patch will track EL2 frames as EL1 frames. Even if we can treat them the same the naming ought to be clear. > FTRACE_FRAME > FTRACE frame. This is implemented later in the series. If using this approach I'd suggest pulling the change in entry-ftrace.S that sets this into this patch, it's easier than adding a note about this being added later and should help with any bisect issues.
On 3/18/21 12:40 PM, Mark Brown wrote: > On Mon, Mar 15, 2021 at 11:57:54AM -0500, madvenka@linux.microsoft.com wrote: > >> To summarize, pt_regs->stackframe is used (or will be used) as a marker >> frame in stack traces. To enable the unwinder to detect these frames, tag >> each pt_regs->stackframe with a type. To record the type, use the unused2 >> field in struct pt_regs and rename it to frame_type. The types are: > > Unless I'm misreading what's going on here this is more trying to set a > type for the stack as a whole than for a specific stack frame. I'm also > finding this a bit confusing as the unwinder already tracks things it > calls frame types and it handles types that aren't covered here like > SDEI. At the very least there's a naming issue here. > When the unwinder gets to EL1 pt_regs->stackframe, it needs to be sure that it is indeed a frame inside an EL1 pt_regs structure. It performs the following checks: FP == pt_regs->regs[29] PC == pt_regs->pc type == EL1_FRAME to confirm that the frame is EL1 pt_regs->stackframe. Similarly, for EL0, the type is EL0_FRAME. Both these frames are on the task stack. So, it is not a stack type. > Taking a step back though do we want to be tracking this via pt_regs? > It's reliant on us robustly finding the correct pt_regs and on having > the things that make the stack unreliable explicitly go in and set the > appropriate type. That seems like it will be error prone, I'd been > expecting to do something more like using sections to filter code for > unreliable features based on the addresses of the functions we find on > the stack or similar. This could still go wrong of course but there's > fewer moving pieces, and especially fewer moving pieces specific to > reliable stack trace. > In that case, I suggest doing both. That is, check the type as well as specific functions. For instance, in the EL1 pt_regs, in addition to the above checks, check the PC against el1_sync(), el1_irq() and el1_error(). I have suggested this in the cover letter. If this is OK with you, we could do that. We want to make really sure that nothing goes wrong with detecting the exception frame. > I'm wary of tracking data that only ever gets used for the reliable > stack trace path given that it's going to be fairly infrequently used > and hence tested, especially things that only crop up in cases that are > hard to provoke reliably. If there's a way to detect things that > doesn't use special data that seems safer. > If you dislike the frame type, I could remove it and just do the following checks: FP == pt_regs->regs[29] PC == pt_regs->pc and the address check against el1_*() functions and similar changes for EL0 as well. I still think that the frame type check makes it more robust. >> EL1_FRAME >> EL1 exception frame. > > We do trap into EL2 as well, the patch will track EL2 frames as EL1 > frames. Even if we can treat them the same the naming ought to be > clear. > Are you referring to ARMv8.1 VHE extension where the kernel can run at EL2? Could you elaborate? I thought that EL2 was basically for Hypervisors. Thanks. >> FTRACE_FRAME >> FTRACE frame. > > This is implemented later in the series. If using this approach I'd > suggest pulling the change in entry-ftrace.S that sets this into this > patch, it's easier than adding a note about this being added later and > should help with any bisect issues. > OK. Good point. Madhavan
On Thu, Mar 18, 2021 at 05:22:49PM -0500, Madhavan T. Venkataraman wrote: > On 3/18/21 12:40 PM, Mark Brown wrote: > > Unless I'm misreading what's going on here this is more trying to set a > > type for the stack as a whole than for a specific stack frame. I'm also > > finding this a bit confusing as the unwinder already tracks things it > > calls frame types and it handles types that aren't covered here like > > SDEI. At the very least there's a naming issue here. > Both these frames are on the task stack. So, it is not a stack type. OTOH it's also not something that applies to every frame but only to the base frame from each stack which I think was more where I was coming from there. In any case, the issue is also that there's already another thing that the unwinder calls a frame type so there's at least that collision which needs to be resolved if nothing else. > > Taking a step back though do we want to be tracking this via pt_regs? > > It's reliant on us robustly finding the correct pt_regs and on having > > the things that make the stack unreliable explicitly go in and set the > > appropriate type. That seems like it will be error prone, I'd been > > expecting to do something more like using sections to filter code for > > unreliable features based on the addresses of the functions we find on > > the stack or similar. This could still go wrong of course but there's > > fewer moving pieces, and especially fewer moving pieces specific to > > reliable stack trace. > In that case, I suggest doing both. That is, check the type as well > as specific functions. For instance, in the EL1 pt_regs, in addition > to the above checks, check the PC against el1_sync(), el1_irq() and > el1_error(). I have suggested this in the cover letter. > If this is OK with you, we could do that. We want to make really sure that > nothing goes wrong with detecting the exception frame. ... > If you dislike the frame type, I could remove it and just do the > following checks: > FP == pt_regs->regs[29] > PC == pt_regs->pc > and the address check against el1_*() functions > and similar changes for EL0 as well. > I still think that the frame type check makes it more robust. Yeah, we know the entry points so they can serve the same role as checking an explicitly written value. It does mean one less operation on exception entry, though I'm not sure that's that a big enough overhead to actually worry about. I don't have *super* strong opinons against adding the explicitly written value other than it being one more thing we don't otherwise use which we have to get right for reliable stack trace, there's a greater risk of bitrot if it's not something that we ever look at outside of the reliable stack trace code. > >> EL1_FRAME > >> EL1 exception frame. > > We do trap into EL2 as well, the patch will track EL2 frames as EL1 > > frames. Even if we can treat them the same the naming ought to be > > clear. > Are you referring to ARMv8.1 VHE extension where the kernel can run > at EL2? Could you elaborate? I thought that EL2 was basically for > Hypervisors. KVM is the main case, yes - IIRC otherwise it's mainly error handlers but I might be missing something. We do recommend that the kernel is started at EL2 where possible. Actually now I look again it's just not adding anything on EL2 entries at all, they use a separate set of macros which aren't updated - this will only update things for EL0 and EL1 entries so my comment above about this tracking EL2 as EL1 isn't accurate.
On 3/19/21 8:22 AM, Mark Brown wrote: > On Thu, Mar 18, 2021 at 05:22:49PM -0500, Madhavan T. Venkataraman wrote: >> On 3/18/21 12:40 PM, Mark Brown wrote: > >>> Unless I'm misreading what's going on here this is more trying to set a >>> type for the stack as a whole than for a specific stack frame. I'm also >>> finding this a bit confusing as the unwinder already tracks things it >>> calls frame types and it handles types that aren't covered here like >>> SDEI. At the very least there's a naming issue here. > >> Both these frames are on the task stack. So, it is not a stack type. > > OTOH it's also not something that applies to every frame but only to the > base frame from each stack which I think was more where I was coming > from there. In any case, the issue is also that there's already another > thing that the unwinder calls a frame type so there's at least that > collision which needs to be resolved if nothing else. > The base frame from each stack as well as intermediate marker frames such as the EL1 frame and the Ftrace frame. As for the frame type, I will try to come up with a better name. >>> Taking a step back though do we want to be tracking this via pt_regs? >>> It's reliant on us robustly finding the correct pt_regs and on having >>> the things that make the stack unreliable explicitly go in and set the >>> appropriate type. That seems like it will be error prone, I'd been >>> expecting to do something more like using sections to filter code for >>> unreliable features based on the addresses of the functions we find on >>> the stack or similar. This could still go wrong of course but there's >>> fewer moving pieces, and especially fewer moving pieces specific to >>> reliable stack trace. > >> In that case, I suggest doing both. That is, check the type as well >> as specific functions. For instance, in the EL1 pt_regs, in addition >> to the above checks, check the PC against el1_sync(), el1_irq() and >> el1_error(). I have suggested this in the cover letter. > >> If this is OK with you, we could do that. We want to make really sure that >> nothing goes wrong with detecting the exception frame. > > ... > >> If you dislike the frame type, I could remove it and just do the >> following checks: > >> FP == pt_regs->regs[29] >> PC == pt_regs->pc >> and the address check against el1_*() functions > >> and similar changes for EL0 as well. > >> I still think that the frame type check makes it more robust. > > Yeah, we know the entry points so they can serve the same role as > checking an explicitly written value. It does mean one less operation > on exception entry, though I'm not sure that's that a big enough > overhead to actually worry about. I don't have *super* strong opinons > against adding the explicitly written value other than it being one more > thing we don't otherwise use which we have to get right for reliable > stack trace, there's a greater risk of bitrot if it's not something that > we ever look at outside of the reliable stack trace code. > So, I will add the address checks for robustness. I will think some more about the frame type. >>>> EL1_FRAME >>>> EL1 exception frame. > >>> We do trap into EL2 as well, the patch will track EL2 frames as EL1 >>> frames. Even if we can treat them the same the naming ought to be >>> clear. > >> Are you referring to ARMv8.1 VHE extension where the kernel can run >> at EL2? Could you elaborate? I thought that EL2 was basically for >> Hypervisors. > > KVM is the main case, yes - IIRC otherwise it's mainly error handlers > but I might be missing something. We do recommend that the kernel is > started at EL2 where possible. > > Actually now I look again it's just not adding anything on EL2 entries > at all, they use a separate set of macros which aren't updated - this > will only update things for EL0 and EL1 entries so my comment above > about this tracking EL2 as EL1 isn't accurate. > OK. Madhavan
On 3/19/21 9:40 AM, Madhavan T. Venkataraman wrote: >> Are you referring to ARMv8.1 VHE extension where the kernel can run >> at EL2? Could you elaborate? I thought that EL2 was basically for >> Hypervisors. > KVM is the main case, yes - IIRC otherwise it's mainly error handlers > but I might be missing something. We do recommend that the kernel is > started at EL2 where possible. > > Actually now I look again it's just not adding anything on EL2 entries > at all, they use a separate set of macros which aren't updated - this > will only update things for EL0 and EL1 entries so my comment above > about this tracking EL2 as EL1 isn't accurate. So, do I need to do anything here? Madhavan
On Fri, Mar 19, 2021 at 10:02:52AM -0500, Madhavan T. Venkataraman wrote: > On 3/19/21 9:40 AM, Madhavan T. Venkataraman wrote: > > Actually now I look again it's just not adding anything on EL2 entries > > at all, they use a separate set of macros which aren't updated - this > > will only update things for EL0 and EL1 entries so my comment above > > about this tracking EL2 as EL1 isn't accurate. > So, do I need to do anything here? Probably worth some note somewhere about other stack types existing and how they end up being handled, in the changelog at least.
On 3/19/21 11:20 AM, Mark Brown wrote: > On Fri, Mar 19, 2021 at 10:02:52AM -0500, Madhavan T. Venkataraman wrote: >> On 3/19/21 9:40 AM, Madhavan T. Venkataraman wrote: > >>> Actually now I look again it's just not adding anything on EL2 entries >>> at all, they use a separate set of macros which aren't updated - this >>> will only update things for EL0 and EL1 entries so my comment above >>> about this tracking EL2 as EL1 isn't accurate. > >> So, do I need to do anything here? > > Probably worth some note somewhere about other stack types existing and > how they end up being handled, in the changelog at least. > OK. Madhavan
On Mon, Mar 15, 2021 at 11:57:54AM -0500, madvenka@linux.microsoft.com wrote: > From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com> > > Apart from the task pt_regs, pt_regs is also created on the stack for other > other cases: > > - EL1 exception. A pt_regs is created on the stack to save register > state. In addition, pt_regs->stackframe is set up for the > interrupted kernel function so that the function shows up in the > EL1 exception stack trace. > > - When a traced function calls the ftrace infrastructure at the > beginning of the function, ftrace creates a pt_regs on the stack > at that point to save register state. In addition, it sets up > pt_regs->stackframe for the traced function so that the traced > function shows up in the stack trace taken from anywhere in the > ftrace code after that point. When the ftrace code returns to the > traced function, the pt_regs is removed from the stack. > > To summarize, pt_regs->stackframe is used (or will be used) as a marker > frame in stack traces. To enable the unwinder to detect these frames, tag > each pt_regs->stackframe with a type. To record the type, use the unused2 > field in struct pt_regs and rename it to frame_type. The types are: > > TASK_FRAME > Terminating frame for a normal stack trace. > EL0_FRAME > Terminating frame for an EL0 exception. > EL1_FRAME > EL1 exception frame. > FTRACE_FRAME > FTRACE frame. > > These frame types will be used by the unwinder later to validate frames. I don't think that we need a marker in the pt_regs: * For kernel tasks and user tasks we just need the terminal frame record to be at a known location. We don't need the pt_regs to determine this. * For EL1<->EL1 exception boundaries, we already chain the frame records together, and we can identify the entry functions to see that there's an exception boundary. We don't need the pt_regs to determine this. * For ftrace using patchable-function-entry, we can identify the trampoline function. I'm also hoping to move away from pt_regs to an ftrace_regs here, and I'd like to avoid more strongly coupling this to pt_regs. Maybe I'm missing something you need for this last case? > > Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> > --- > arch/arm64/include/asm/ptrace.h | 15 +++++++++++++-- > arch/arm64/kernel/asm-offsets.c | 1 + > arch/arm64/kernel/entry.S | 4 ++++ > arch/arm64/kernel/head.S | 2 ++ > arch/arm64/kernel/process.c | 1 + > 5 files changed, 21 insertions(+), 2 deletions(-) > > diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h > index e58bca832dff..a75211ce009a 100644 > --- a/arch/arm64/include/asm/ptrace.h > +++ b/arch/arm64/include/asm/ptrace.h > @@ -117,6 +117,17 @@ > */ > #define NO_SYSCALL (-1) > > +/* > + * pt_regs->stackframe is a marker frame that is used in different > + * situations. These are the different types of frames. Use patterns > + * for the frame types instead of (0, 1, 2, 3, ..) so that it is less > + * likely to find them on the stack. > + */ > +#define TASK_FRAME 0xDEADBEE0 /* Task stack termination frame */ > +#define EL0_FRAME 0xDEADBEE1 /* EL0 exception frame */ > +#define EL1_FRAME 0xDEADBEE2 /* EL1 exception frame */ > +#define FTRACE_FRAME 0xDEADBEE3 /* FTrace frame */ This sounds like we're using this as a heuristic, which I don't think we should do. I'd strongly prefr to avoid magic valuess here, and if we cannot be 100% certain of the stack contents, this is not reliable anyway. Thanks, Mark. > #ifndef __ASSEMBLY__ > #include <linux/bug.h> > #include <linux/types.h> > @@ -187,11 +198,11 @@ struct pt_regs { > }; > u64 orig_x0; > #ifdef __AARCH64EB__ > - u32 unused2; > + u32 frame_type; > s32 syscallno; > #else > s32 syscallno; > - u32 unused2; > + u32 frame_type; > #endif > u64 sdei_ttbr1; > /* Only valid when ARM64_HAS_IRQ_PRIO_MASKING is enabled. */ > diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c > index a36e2fc330d4..43f97dbc7dfc 100644 > --- a/arch/arm64/kernel/asm-offsets.c > +++ b/arch/arm64/kernel/asm-offsets.c > @@ -75,6 +75,7 @@ int main(void) > DEFINE(S_SDEI_TTBR1, offsetof(struct pt_regs, sdei_ttbr1)); > DEFINE(S_PMR_SAVE, offsetof(struct pt_regs, pmr_save)); > DEFINE(S_STACKFRAME, offsetof(struct pt_regs, stackframe)); > + DEFINE(S_FRAME_TYPE, offsetof(struct pt_regs, frame_type)); > DEFINE(PT_REGS_SIZE, sizeof(struct pt_regs)); > BLANK(); > #ifdef CONFIG_COMPAT > diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S > index e2dc2e998934..ecc3507d9cdd 100644 > --- a/arch/arm64/kernel/entry.S > +++ b/arch/arm64/kernel/entry.S > @@ -269,8 +269,12 @@ alternative_else_nop_endif > */ > .if \el == 0 > stp xzr, xzr, [sp, #S_STACKFRAME] > + ldr w17, =EL0_FRAME > + str w17, [sp, #S_FRAME_TYPE] > .else > stp x29, x22, [sp, #S_STACKFRAME] > + ldr w17, =EL1_FRAME > + str w17, [sp, #S_FRAME_TYPE] > .endif > add x29, sp, #S_STACKFRAME > > diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S > index 2769b20934d4..d2ee78f8f97f 100644 > --- a/arch/arm64/kernel/head.S > +++ b/arch/arm64/kernel/head.S > @@ -410,6 +410,8 @@ SYM_FUNC_END(__create_page_tables) > */ > .macro setup_last_frame > sub sp, sp, #PT_REGS_SIZE > + ldr w17, =TASK_FRAME > + str w17, [sp, #S_FRAME_TYPE] > stp xzr, xzr, [sp, #S_STACKFRAME] > add x29, sp, #S_STACKFRAME > ldr x30, =ret_from_fork > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c > index 7ffa689e8b60..5c152fd60503 100644 > --- a/arch/arm64/kernel/process.c > +++ b/arch/arm64/kernel/process.c > @@ -442,6 +442,7 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start, > * as the last frame for the new task. > */ > p->thread.cpu_context.fp = (unsigned long)childregs->stackframe; > + childregs->frame_type = TASK_FRAME; > > ptrace_hw_copy_thread(p); > > -- > 2.25.1 >
diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h index e58bca832dff..a75211ce009a 100644 --- a/arch/arm64/include/asm/ptrace.h +++ b/arch/arm64/include/asm/ptrace.h @@ -117,6 +117,17 @@ */ #define NO_SYSCALL (-1) +/* + * pt_regs->stackframe is a marker frame that is used in different + * situations. These are the different types of frames. Use patterns + * for the frame types instead of (0, 1, 2, 3, ..) so that it is less + * likely to find them on the stack. + */ +#define TASK_FRAME 0xDEADBEE0 /* Task stack termination frame */ +#define EL0_FRAME 0xDEADBEE1 /* EL0 exception frame */ +#define EL1_FRAME 0xDEADBEE2 /* EL1 exception frame */ +#define FTRACE_FRAME 0xDEADBEE3 /* FTrace frame */ + #ifndef __ASSEMBLY__ #include <linux/bug.h> #include <linux/types.h> @@ -187,11 +198,11 @@ struct pt_regs { }; u64 orig_x0; #ifdef __AARCH64EB__ - u32 unused2; + u32 frame_type; s32 syscallno; #else s32 syscallno; - u32 unused2; + u32 frame_type; #endif u64 sdei_ttbr1; /* Only valid when ARM64_HAS_IRQ_PRIO_MASKING is enabled. */ diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c index a36e2fc330d4..43f97dbc7dfc 100644 --- a/arch/arm64/kernel/asm-offsets.c +++ b/arch/arm64/kernel/asm-offsets.c @@ -75,6 +75,7 @@ int main(void) DEFINE(S_SDEI_TTBR1, offsetof(struct pt_regs, sdei_ttbr1)); DEFINE(S_PMR_SAVE, offsetof(struct pt_regs, pmr_save)); DEFINE(S_STACKFRAME, offsetof(struct pt_regs, stackframe)); + DEFINE(S_FRAME_TYPE, offsetof(struct pt_regs, frame_type)); DEFINE(PT_REGS_SIZE, sizeof(struct pt_regs)); BLANK(); #ifdef CONFIG_COMPAT diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index e2dc2e998934..ecc3507d9cdd 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -269,8 +269,12 @@ alternative_else_nop_endif */ .if \el == 0 stp xzr, xzr, [sp, #S_STACKFRAME] + ldr w17, =EL0_FRAME + str w17, [sp, #S_FRAME_TYPE] .else stp x29, x22, [sp, #S_STACKFRAME] + ldr w17, =EL1_FRAME + str w17, [sp, #S_FRAME_TYPE] .endif add x29, sp, #S_STACKFRAME diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index 2769b20934d4..d2ee78f8f97f 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -410,6 +410,8 @@ SYM_FUNC_END(__create_page_tables) */ .macro setup_last_frame sub sp, sp, #PT_REGS_SIZE + ldr w17, =TASK_FRAME + str w17, [sp, #S_FRAME_TYPE] stp xzr, xzr, [sp, #S_STACKFRAME] add x29, sp, #S_STACKFRAME ldr x30, =ret_from_fork diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index 7ffa689e8b60..5c152fd60503 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -442,6 +442,7 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start, * as the last frame for the new task. */ p->thread.cpu_context.fp = (unsigned long)childregs->stackframe; + childregs->frame_type = TASK_FRAME; ptrace_hw_copy_thread(p);