diff mbox series

[RFC,v2,1/8] arm64: Implement stack trace termination record

Message ID 20210315165800.5948-2-madvenka@linux.microsoft.com (mailing list archive)
State New, archived
Headers show
Series arm64: Implement reliable stack trace | expand

Commit Message

Madhavan T. Venkataraman March 15, 2021, 4:57 p.m. UTC
From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>

The unwinder needs to be able to reliably tell when it has reached the end
of a stack trace. One way to do this is to have the last stack frame at a
fixed offset from the base of the task stack. When the unwinder reaches
that offset, it knows it is done.

Kernel Tasks
============

All tasks except the idle task have a pt_regs structure right after the
task stack. This is called the task pt_regs. The pt_regs structure has a
special stackframe field. Make this stackframe field the last frame in the
task stack. This needs to be done in copy_thread() which initializes a new
task's pt_regs and initial CPU context.

For the idle task, there is no task pt_regs. For our purpose, we need one.
So, create a pt_regs just like other kernel tasks and make
pt_regs->stackframe the last frame in the idle task stack. This needs to be
done at two places:

	- On the primary CPU, the boot task runs. It calls start_kernel()
	  and eventually becomes the idle task for the primary CPU. Just
	  before start_kernel() is called, set up the last frame.

	- On each secondary CPU, a startup task runs that calls
	  secondary_startup_kernel() and eventually becomes the idle task
	  on the secondary CPU. Just before secondary_start_kernel() is
	  called, set up the last frame.

User Tasks
==========

User tasks are initially set up like kernel tasks when they are created.
Then, they return to userland after fork via ret_from_fork(). After that,
they enter the kernel only on an EL0 exception. (In arm64, system calls are
also EL0 exceptions). The EL0 exception handler stores state in the task
pt_regs and calls different functions based on the type of exception. The
stack trace for an EL0 exception must end at the task pt_regs. So, make
task pt_regs->stackframe as the last frame in the EL0 exception stack.

In summary, task pt_regs->stackframe is where a successful stack trace ends.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
 arch/arm64/kernel/entry.S   |  8 +++++---
 arch/arm64/kernel/head.S    | 28 ++++++++++++++++++++++++----
 arch/arm64/kernel/process.c |  5 +++++
 3 files changed, 34 insertions(+), 7 deletions(-)

Comments

Mark Brown March 18, 2021, 3:09 p.m. UTC | #1
On Mon, Mar 15, 2021 at 11:57:53AM -0500, madvenka@linux.microsoft.com wrote:

> In summary, task pt_regs->stackframe is where a successful stack trace ends.

>         .if \el == 0
> -       mov     x29, xzr
> +       stp     xzr, xzr, [sp, #S_STACKFRAME]
>         .else
>         stp     x29, x22, [sp, #S_STACKFRAME]
> -       add     x29, sp, #S_STACKFRAME
>         .endif
> +       add     x29, sp, #S_STACKFRAME

For both user and kernel threads this patch (at least by itself) results
in an additional record being reported in stack traces with a NULL
function pointer since it keeps the existing record where it is and adds
this new fixed record below it.  This is addressed for the kernel later
in the series, by "arm64: Terminate the stack trace at TASK_FRAME and
EL0_FRAME", but will still be visible to other unwinders such as
debuggers.  I'm not sure that this *matters* but it might and should at
least be called out more explicitly.

If we are going to add the extra record there would probably be less
potential for confusion if we pointed it at some sensibly named dummy
function so anything or anyone that does see it on the stack doesn't get
confused by a NULL.
Madhavan T. Venkataraman March 18, 2021, 8:26 p.m. UTC | #2
On 3/18/21 10:09 AM, Mark Brown wrote:
> On Mon, Mar 15, 2021 at 11:57:53AM -0500, madvenka@linux.microsoft.com wrote:
> 
>> In summary, task pt_regs->stackframe is where a successful stack trace ends.
> 
>>         .if \el == 0
>> -       mov     x29, xzr
>> +       stp     xzr, xzr, [sp, #S_STACKFRAME]
>>         .else
>>         stp     x29, x22, [sp, #S_STACKFRAME]
>> -       add     x29, sp, #S_STACKFRAME
>>         .endif
>> +       add     x29, sp, #S_STACKFRAME
> 
> For both user and kernel threads this patch (at least by itself) results
> in an additional record being reported in stack traces with a NULL
> function pointer since it keeps the existing record where it is and adds
> this new fixed record below it.  This is addressed for the kernel later
> in the series, by "arm64: Terminate the stack trace at TASK_FRAME and
> EL0_FRAME", but will still be visible to other unwinders such as
> debuggers.  I'm not sure that this *matters* but it might and should at
> least be called out more explicitly.
> 
> If we are going to add the extra record there would probably be less
> potential for confusion if we pointed it at some sensibly named dummy
> function so anything or anyone that does see it on the stack doesn't get
> confused by a NULL.
> 

I agree. I will think about this some more. If no other solution presents
itself, I will add the dummy function.

Madhavan

> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
Mark Brown March 19, 2021, 12:30 p.m. UTC | #3
On Thu, Mar 18, 2021 at 03:26:13PM -0500, Madhavan T. Venkataraman wrote:
> On 3/18/21 10:09 AM, Mark Brown wrote:

> > If we are going to add the extra record there would probably be less
> > potential for confusion if we pointed it at some sensibly named dummy
> > function so anything or anyone that does see it on the stack doesn't get
> > confused by a NULL.

> I agree. I will think about this some more. If no other solution presents
> itself, I will add the dummy function.

After discussing this with Mark Rutland offlist he convinced me that so
long as we ensure the kernel doesn't print the NULL record we're
probably OK here, the effort setting the function pointer up correctly
in all circumstances (especially when we're not in the normal memory
map) is probably not worth it for the limited impact it's likely to have
to see the NULL pointer (probably mainly a person working with some
external debugger).  It should be noted in the changelog though, and/or
merged in with the relevant change to the unwinder.
Madhavan T. Venkataraman March 19, 2021, 2:29 p.m. UTC | #4
On 3/19/21 7:30 AM, Mark Brown wrote:
> On Thu, Mar 18, 2021 at 03:26:13PM -0500, Madhavan T. Venkataraman wrote:
>> On 3/18/21 10:09 AM, Mark Brown wrote:
> 
>>> If we are going to add the extra record there would probably be less
>>> potential for confusion if we pointed it at some sensibly named dummy
>>> function so anything or anyone that does see it on the stack doesn't get
>>> confused by a NULL.
> 
>> I agree. I will think about this some more. If no other solution presents
>> itself, I will add the dummy function.
> 
> After discussing this with Mark Rutland offlist he convinced me that so
> long as we ensure the kernel doesn't print the NULL record we're
> probably OK here, the effort setting the function pointer up correctly
> in all circumstances (especially when we're not in the normal memory
> map) is probably not worth it for the limited impact it's likely to have
> to see the NULL pointer (probably mainly a person working with some
> external debugger).  It should be noted in the changelog though, and/or
> merged in with the relevant change to the unwinder.
> 

OK. I will add a comment as well as note it in the changelog.

Thanks to both of you.

Madhavan
Madhavan T. Venkataraman March 19, 2021, 6:19 p.m. UTC | #5
On 3/19/21 9:29 AM, Madhavan T. Venkataraman wrote:
> 
> 
> On 3/19/21 7:30 AM, Mark Brown wrote:
>> On Thu, Mar 18, 2021 at 03:26:13PM -0500, Madhavan T. Venkataraman wrote:
>>> On 3/18/21 10:09 AM, Mark Brown wrote:
>>
>>>> If we are going to add the extra record there would probably be less
>>>> potential for confusion if we pointed it at some sensibly named dummy
>>>> function so anything or anyone that does see it on the stack doesn't get
>>>> confused by a NULL.
>>
>>> I agree. I will think about this some more. If no other solution presents
>>> itself, I will add the dummy function.
>>
>> After discussing this with Mark Rutland offlist he convinced me that so
>> long as we ensure the kernel doesn't print the NULL record we're
>> probably OK here, the effort setting the function pointer up correctly
>> in all circumstances (especially when we're not in the normal memory
>> map) is probably not worth it for the limited impact it's likely to have
>> to see the NULL pointer (probably mainly a person working with some
>> external debugger).  It should be noted in the changelog though, and/or
>> merged in with the relevant change to the unwinder.
>>
> 
> OK. I will add a comment as well as note it in the changelog.
> 
> Thanks to both of you.
> 
> Madhavan
> 

I thought about this some more. I think I have a simple solution. I will
prepare a patch and send it out. If you and Mark Rutland approve, I will
include it in the next version of this RFC.

Madhavan
Madhavan T. Venkataraman March 19, 2021, 10:03 p.m. UTC | #6
On 3/19/21 1:19 PM, Madhavan T. Venkataraman wrote:
> 
> 
> On 3/19/21 9:29 AM, Madhavan T. Venkataraman wrote:
>>
>>
>> On 3/19/21 7:30 AM, Mark Brown wrote:
>>> On Thu, Mar 18, 2021 at 03:26:13PM -0500, Madhavan T. Venkataraman wrote:
>>>> On 3/18/21 10:09 AM, Mark Brown wrote:
>>>
>>>>> If we are going to add the extra record there would probably be less
>>>>> potential for confusion if we pointed it at some sensibly named dummy
>>>>> function so anything or anyone that does see it on the stack doesn't get
>>>>> confused by a NULL.
>>>
>>>> I agree. I will think about this some more. If no other solution presents
>>>> itself, I will add the dummy function.
>>>
>>> After discussing this with Mark Rutland offlist he convinced me that so
>>> long as we ensure the kernel doesn't print the NULL record we're
>>> probably OK here, the effort setting the function pointer up correctly
>>> in all circumstances (especially when we're not in the normal memory
>>> map) is probably not worth it for the limited impact it's likely to have
>>> to see the NULL pointer (probably mainly a person working with some
>>> external debugger).  It should be noted in the changelog though, and/or
>>> merged in with the relevant change to the unwinder.
>>>
>>
>> OK. I will add a comment as well as note it in the changelog.
>>
>> Thanks to both of you.
>>
>> Madhavan
>>
> 
> I thought about this some more. I think I have a simple solution. I will
> prepare a patch and send it out. If you and Mark Rutland approve, I will
> include it in the next version of this RFC.
> 
> Madhavan
> 

I solved this by using existing functions logically instead of inventing a
dummy function. I initialize pt_regs->stackframe[1] to an existing function
so that the stack trace will not show a 0x0 entry as well as the kernel and
gdb will show identical stack traces.

For all task stack traces including the idle tasks, the stack trace will
end at copy_thread() as copy_thread() is the function that initializes the
pt_regs and the first stack frame for a task.

For EL0 exceptions, the stack trace will end with vectors() as vectors
entries call the EL0 handlers.

Here are sample stack traces (I only show the ending of each trace):

Idle task on primary CPU
========================

		 ...
[    0.022557]   start_kernel+0x5b8/0x5f4
[    0.022570]   __primary_switched+0xa8/0xb8
[    0.022578]   copy_thread+0x0/0x188

Idle task on secondary CPU
==========================

		 ...
[    0.023397]   secondary_start_kernel+0x188/0x1e0
[    0.023406]   __secondary_switched+0x40/0x88
[    0.023415]   copy_thread+0x0/0x188

All other kernel threads
========================

		 ...
[   13.501062]   ret_from_fork+0x10/0x18
[   13.507998]   copy_thread+0x0/0x188

User threads (EL0 exception)
============

write(2) system call example:

		 ...
[  521.686148]   vfs_write+0xc8/0x2c0
[  521.686156]   ksys_write+0x74/0x108
[  521.686161]   __arm64_sys_write+0x24/0x30
[  521.686166]   el0_svc_common.constprop.0+0x70/0x1a8
[  521.686175]   do_el0_svc+0x2c/0x98
[  521.686180]   el0_svc+0x2c/0x70
[  521.686188]   el0_sync_handler+0xb0/0xb8
[  521.686193]   el0_sync+0x17c/0x180
[  521.686198]   vectors+0x0/0x7d8

Here are the code changes:

========================================================================
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index a31a0a713c85..514307e80b79 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -261,16 +261,19 @@ alternative_else_nop_endif
 	stp	lr, x21, [sp, #S_LR]
 
 	/*
-	 * For exceptions from EL0, terminate the callchain here.
+	 * For exceptions from EL0, terminate the callchain here at
+	 * task_pt_regs(current)->stackframe.
+	 *
 	 * For exceptions from EL1, create a synthetic frame record so the
 	 * interrupted code shows up in the backtrace.
 	 */
 	.if \el == 0
-	mov	x29, xzr
+	ldr	x17, =vectors
+	stp	xzr, x17, [sp, #S_STACKFRAME]
 	.else
 	stp	x29, x22, [sp, #S_STACKFRAME]
-	add	x29, sp, #S_STACKFRAME
 	.endif
+	add	x29, sp, #S_STACKFRAME
 
 #ifdef CONFIG_ARM64_SW_TTBR0_PAN
 alternative_if_not ARM64_HAS_PAN
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 66b0e0b66e31..699e0dd313a1 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -393,6 +393,29 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
 	ret	x28
 SYM_FUNC_END(__create_page_tables)
 
+	/*
+	 * The boot task becomes the idle task for the primary CPU. The
+	 * CPU startup task on each secondary CPU becomes the idle task
+	 * for the secondary CPU.
+	 *
+	 * The idle task does not require pt_regs. But create a dummy
+	 * pt_regs so that task_pt_regs(idle_task)->stackframe can be
+	 * set up to be the last frame on the idle task stack just like
+	 * all the other kernel tasks. This helps the unwinder to
+	 * terminate the stack trace at a well-known stack offset.
+	 *
+	 * Also, set up the last return PC to be copy_thread() just
+	 * like all the other kernel tasks so that the stack trace of
+	 * all kernel tasks ends with the same function.
+	 */
+	.macro setup_last_frame
+	sub	sp, sp, #PT_REGS_SIZE
+	ldr	x17, =copy_thread
+	stp	xzr, x17, [sp, #S_STACKFRAME]
+	add	x29, sp, #S_STACKFRAME
+	adr	x30, #0
+	.endm
+
 /*
  * The following fragment of code is executed with the MMU enabled.
  *
@@ -447,8 +470,7 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 #endif
 	bl	switch_to_vhe			// Prefer VHE if possible
 	add	sp, sp, #16
-	mov	x29, #0
-	mov	x30, #0
+	setup_last_frame
 	b	start_kernel
 SYM_FUNC_END(__primary_switched)
 
@@ -606,8 +628,7 @@ SYM_FUNC_START_LOCAL(__secondary_switched)
 	cbz	x2, __secondary_too_slow
 	msr	sp_el0, x2
 	scs_load x2, x3
-	mov	x29, #0
-	mov	x30, #0
+	setup_last_frame
 
 #ifdef CONFIG_ARM64_PTR_AUTH
 	ptrauth_keys_init_cpu x2, x3, x4, x5
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 325c83b1a24d..9050699ff67c 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -437,6 +437,12 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
 	}
 	p->thread.cpu_context.pc = (unsigned long)ret_from_fork;
 	p->thread.cpu_context.sp = (unsigned long)childregs;
+	/*
+	 * For the benefit of the unwinder, set up childregs->stackframe
+	 * as the last frame for the new task.
+	 */
+	p->thread.cpu_context.fp = (unsigned long)childregs->stackframe;
+	childregs->stackframe[1] = (unsigned long)copy_thread;
 
 	ptrace_hw_copy_thread(p);
 
======================================================================

If you approve, the above will become RFC Patch v3 1/8 in the next version.

Let me know.

Also, I could introduce an extra frame in the EL1 exception stack trace that
includes vectors so the stack trace would look like this (timer interrupt example):

call_timer_fn
run_timer_softirq
__do_softirq
irq_exit
__handle_domain_irq
gic_handle_irq
el1_irq
vectors

This way, if the unwinder finds vectors, it knows that it is an exception frame.

Madhavan
Mark Rutland March 23, 2021, 10:24 a.m. UTC | #7
On Fri, Mar 19, 2021 at 05:03:09PM -0500, Madhavan T. Venkataraman wrote:
> I solved this by using existing functions logically instead of inventing a
> dummy function. I initialize pt_regs->stackframe[1] to an existing function
> so that the stack trace will not show a 0x0 entry as well as the kernel and
> gdb will show identical stack traces.
> 
> For all task stack traces including the idle tasks, the stack trace will
> end at copy_thread() as copy_thread() is the function that initializes the
> pt_regs and the first stack frame for a task.

I don't think this is a good idea, as it will mean that copy_thread()
will appear to be live in every thread, and therefore will not be
patchable.

There are other things people need to be aware of when using an external
debugger (e.g. during EL0<->ELx transitions there are periods when X29
and X30 contain the EL0 values, and cannot be used to unwind), so I
don't think there's a strong need to make this look prettier to an
external debugger.

> For EL0 exceptions, the stack trace will end with vectors() as vectors
> entries call the EL0 handlers.
> 
> Here are sample stack traces (I only show the ending of each trace):
> 
> Idle task on primary CPU
> ========================
> 
> 		 ...
> [    0.022557]   start_kernel+0x5b8/0x5f4
> [    0.022570]   __primary_switched+0xa8/0xb8
> [    0.022578]   copy_thread+0x0/0x188
> 
> Idle task on secondary CPU
> ==========================
> 
> 		 ...
> [    0.023397]   secondary_start_kernel+0x188/0x1e0
> [    0.023406]   __secondary_switched+0x40/0x88
> [    0.023415]   copy_thread+0x0/0x188
> 
> All other kernel threads
> ========================
> 
> 		 ...
> [   13.501062]   ret_from_fork+0x10/0x18
> [   13.507998]   copy_thread+0x0/0x188
> 
> User threads (EL0 exception)
> ============
> 
> write(2) system call example:
> 
> 		 ...
> [  521.686148]   vfs_write+0xc8/0x2c0
> [  521.686156]   ksys_write+0x74/0x108
> [  521.686161]   __arm64_sys_write+0x24/0x30
> [  521.686166]   el0_svc_common.constprop.0+0x70/0x1a8
> [  521.686175]   do_el0_svc+0x2c/0x98
> [  521.686180]   el0_svc+0x2c/0x70
> [  521.686188]   el0_sync_handler+0xb0/0xb8
> [  521.686193]   el0_sync+0x17c/0x180
> [  521.686198]   vectors+0x0/0x7d8

[...]

> If you approve, the above will become RFC Patch v3 1/8 in the next version.

As above, I don't think we should repurpose an existing function here,
and my preference is to use 0x0.

> Let me know.
> 
> Also, I could introduce an extra frame in the EL1 exception stack trace that
> includes vectors so the stack trace would look like this (timer interrupt example):
> 
> call_timer_fn
> run_timer_softirq
> __do_softirq
> irq_exit
> __handle_domain_irq
> gic_handle_irq
> el1_irq
> vectors
> 
> This way, if the unwinder finds vectors, it knows that it is an exception frame.

I can see this might make it simpler to detect exception boundaries, but
I suspect that we need other information anyway, so this doesn't become
all that helpful. For EL0<->EL1 exception boundaries we want to
successfully terminate a robust stacktrace whereas for EL1<->EL1
exception boundaries we want to fail a robust stacktrace.

I reckon we have to figure that out from the el1_* and el0_* entry
points (which I am working to reduce/simplify as part of the entry
assembly conversion to C). With that we can terminate unwind at the
el0_* parts, and reject unwinding across any other bit of .entry.text.

Thanks,
Mark.
Madhavan T. Venkataraman March 23, 2021, 12:39 p.m. UTC | #8
On 3/23/21 5:24 AM, Mark Rutland wrote:
> On Fri, Mar 19, 2021 at 05:03:09PM -0500, Madhavan T. Venkataraman wrote:
>> I solved this by using existing functions logically instead of inventing a
>> dummy function. I initialize pt_regs->stackframe[1] to an existing function
>> so that the stack trace will not show a 0x0 entry as well as the kernel and
>> gdb will show identical stack traces.
>>
>> For all task stack traces including the idle tasks, the stack trace will
>> end at copy_thread() as copy_thread() is the function that initializes the
>> pt_regs and the first stack frame for a task.
> 
> I don't think this is a good idea, as it will mean that copy_thread()
> will appear to be live in every thread, and therefore will not be
> patchable.
> 
> There are other things people need to be aware of when using an external
> debugger (e.g. during EL0<->ELx transitions there are periods when X29
> and X30 contain the EL0 values, and cannot be used to unwind), so I
> don't think there's a strong need to make this look prettier to an
> external debugger.
> 

OK.

>> For EL0 exceptions, the stack trace will end with vectors() as vectors
>> entries call the EL0 handlers.
>>
>> Here are sample stack traces (I only show the ending of each trace):
>>
>> Idle task on primary CPU
>> ========================
>>
>> 		 ...
>> [    0.022557]   start_kernel+0x5b8/0x5f4
>> [    0.022570]   __primary_switched+0xa8/0xb8
>> [    0.022578]   copy_thread+0x0/0x188
>>
>> Idle task on secondary CPU
>> ==========================
>>
>> 		 ...
>> [    0.023397]   secondary_start_kernel+0x188/0x1e0
>> [    0.023406]   __secondary_switched+0x40/0x88
>> [    0.023415]   copy_thread+0x0/0x188
>>
>> All other kernel threads
>> ========================
>>
>> 		 ...
>> [   13.501062]   ret_from_fork+0x10/0x18
>> [   13.507998]   copy_thread+0x0/0x188
>>
>> User threads (EL0 exception)
>> ============
>>
>> write(2) system call example:
>>
>> 		 ...
>> [  521.686148]   vfs_write+0xc8/0x2c0
>> [  521.686156]   ksys_write+0x74/0x108
>> [  521.686161]   __arm64_sys_write+0x24/0x30
>> [  521.686166]   el0_svc_common.constprop.0+0x70/0x1a8
>> [  521.686175]   do_el0_svc+0x2c/0x98
>> [  521.686180]   el0_svc+0x2c/0x70
>> [  521.686188]   el0_sync_handler+0xb0/0xb8
>> [  521.686193]   el0_sync+0x17c/0x180
>> [  521.686198]   vectors+0x0/0x7d8
> 
> [...]
> 
>> If you approve, the above will become RFC Patch v3 1/8 in the next version.
> 
> As above, I don't think we should repurpose an existing function here,
> and my preference is to use 0x0.
> 

OK.

>> Let me know.
>>
>> Also, I could introduce an extra frame in the EL1 exception stack trace that
>> includes vectors so the stack trace would look like this (timer interrupt example):
>>
>> call_timer_fn
>> run_timer_softirq
>> __do_softirq
>> irq_exit
>> __handle_domain_irq
>> gic_handle_irq
>> el1_irq
>> vectors
>>
>> This way, if the unwinder finds vectors, it knows that it is an exception frame.
> 
> I can see this might make it simpler to detect exception boundaries, but
> I suspect that we need other information anyway, so this doesn't become
> all that helpful. For EL0<->EL1 exception boundaries we want to
> successfully terminate a robust stacktrace whereas for EL1<->EL1
> exception boundaries we want to fail a robust stacktrace.
> 
> I reckon we have to figure that out from the el1_* and el0_* entry
> points (which I am working to reduce/simplify as part of the entry
> assembly conversion to C). With that we can terminate unwind at the
> el0_* parts, and reject unwinding across any other bit of .entry.text.
> 

OK. That is fine.

Thanks.

Madhavan
diff mbox series

Patch

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index a31a0a713c85..e2dc2e998934 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -261,16 +261,18 @@  alternative_else_nop_endif
 	stp	lr, x21, [sp, #S_LR]
 
 	/*
-	 * For exceptions from EL0, terminate the callchain here.
+	 * For exceptions from EL0, terminate the callchain here at
+	 * task_pt_regs(current)->stackframe.
+	 *
 	 * For exceptions from EL1, create a synthetic frame record so the
 	 * interrupted code shows up in the backtrace.
 	 */
 	.if \el == 0
-	mov	x29, xzr
+	stp	xzr, xzr, [sp, #S_STACKFRAME]
 	.else
 	stp	x29, x22, [sp, #S_STACKFRAME]
-	add	x29, sp, #S_STACKFRAME
 	.endif
+	add	x29, sp, #S_STACKFRAME
 
 #ifdef CONFIG_ARM64_SW_TTBR0_PAN
 alternative_if_not ARM64_HAS_PAN
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 66b0e0b66e31..2769b20934d4 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -393,6 +393,28 @@  SYM_FUNC_START_LOCAL(__create_page_tables)
 	ret	x28
 SYM_FUNC_END(__create_page_tables)
 
+	/*
+	 * The boot task becomes the idle task for the primary CPU. The
+	 * CPU startup task on each secondary CPU becomes the idle task
+	 * for the secondary CPU.
+	 *
+	 * The idle task does not require pt_regs. But create a dummy
+	 * pt_regs so that task_pt_regs(idle_task)->stackframe can be
+	 * set up to be the last frame on the idle task stack just like
+	 * all the other kernel tasks. This helps the unwinder to
+	 * terminate the stack trace at a well-known stack offset.
+	 *
+	 * Also, set up the last return PC to be ret_from_fork() just
+	 * like all the other kernel tasks so that the stack trace of
+	 * all kernel tasks ends with the same function.
+	 */
+	.macro setup_last_frame
+	sub	sp, sp, #PT_REGS_SIZE
+	stp	xzr, xzr, [sp, #S_STACKFRAME]
+	add	x29, sp, #S_STACKFRAME
+	ldr	x30, =ret_from_fork
+	.endm
+
 /*
  * The following fragment of code is executed with the MMU enabled.
  *
@@ -447,8 +469,7 @@  SYM_FUNC_START_LOCAL(__primary_switched)
 #endif
 	bl	switch_to_vhe			// Prefer VHE if possible
 	add	sp, sp, #16
-	mov	x29, #0
-	mov	x30, #0
+	setup_last_frame
 	b	start_kernel
 SYM_FUNC_END(__primary_switched)
 
@@ -606,8 +627,7 @@  SYM_FUNC_START_LOCAL(__secondary_switched)
 	cbz	x2, __secondary_too_slow
 	msr	sp_el0, x2
 	scs_load x2, x3
-	mov	x29, #0
-	mov	x30, #0
+	setup_last_frame
 
 #ifdef CONFIG_ARM64_PTR_AUTH
 	ptrauth_keys_init_cpu x2, x3, x4, x5
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 325c83b1a24d..7ffa689e8b60 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -437,6 +437,11 @@  int copy_thread(unsigned long clone_flags, unsigned long stack_start,
 	}
 	p->thread.cpu_context.pc = (unsigned long)ret_from_fork;
 	p->thread.cpu_context.sp = (unsigned long)childregs;
+	/*
+	 * For the benefit of the unwinder, set up childregs->stackframe
+	 * as the last frame for the new task.
+	 */
+	p->thread.cpu_context.fp = (unsigned long)childregs->stackframe;
 
 	ptrace_hw_copy_thread(p);