diff mbox

arm64: fix infinite stacktrace

Message ID alpine.LRH.2.02.1807092300520.24277@file01.intranet.prod.int.rdu2.redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Mikulas Patocka July 10, 2018, 3:04 a.m. UTC
On Thu, 28 Jun 2018, Dave Martin wrote:

> On Wed, Jun 27, 2018 at 05:41:51PM +0100, Will Deacon wrote:
> > Hi all,
> > 
> > On Fri, Jun 15, 2018 at 12:58:23PM +0100, Mark Rutland wrote:
> > > On Thu, Jun 14, 2018 at 02:58:21PM -0400, Mikulas Patocka wrote:
> > > > I've got this infinite stacktrace when debugging another problem:
> > > > [  908.795225] INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > [  908.796176]  1-...!: (1 GPs behind) idle=952/1/4611686018427387904 softirq=1462/1462 fqs=355
> > > > [  908.797692]  2-...!: (1 GPs behind) idle=f42/1/4611686018427387904 softirq=1550/1551 fqs=355
> > > > [  908.799189]  (detected by 0, t=2109 jiffies, g=130, c=129, q=235)
> > > > [  908.800284] Task dump for CPU 1:
> > > > [  908.800871] kworker/1:1     R  running task        0    32      2 0x00000022
> > > > [  908.802127] Workqueue: writecache-writeabck writecache_writeback [dm_writecache]
> > > > [  908.820285] Call trace:
> > > > [  908.824785]  __switch_to+0x68/0x90
> > > > [  908.837661]  0xfffffe00603afd90
> > > > [  908.844119]  0xfffffe00603afd90
> > > > [  908.850091]  0xfffffe00603afd90
> > > > [  908.854285]  0xfffffe00603afd90
> > > > [  908.863538]  0xfffffe00603afd90
> > > > [  908.865523]  0xfffffe00603afd90
> > > > 
> > > > The machine just locked up and kept on printing the same line over and
> > > > over again. This patch fixes it.
> > > > 
> > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > > > Cc: stable@vger.kernel.org
> > > 
> > > Given this can only occur when there's a corrupted stack (where a frame
> > > record points to itself), I'm not sure this requires a cc stable.
> > > 
> > > > Index: linux-2.6/arch/arm64/kernel/stacktrace.c
> > > > ===================================================================
> > > > --- linux-2.6.orig/arch/arm64/kernel/stacktrace.c
> > > > +++ linux-2.6/arch/arm64/kernel/stacktrace.c
> > > > @@ -56,6 +56,9 @@ int notrace unwind_frame(struct task_str
> > > >  	frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
> > > >  	frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
> > > >  
> > > > +	if (frame->fp <= fp)
> > > > +		return -EINVAL;
> > > > +
> > > 
> > > Dave Martin had a series [1] which addressed this along with a number of
> > > other cases where stack traces might not terminate.
> > > 
> > > Dave, do you plan to respin that?
> > 
> > I'd be interested in an update on that; we clearly should be fixing this in
> > one way or another.
> > 
> > Mikulus -- would you be able to test and/or review it, please?
> 
> My patch was arguably over-engineered, and broken in some way connected
> with SDEI.  Unfortunately I've had too much other stuff to do...
> 
> I could take another look, but it may take time to get to it.
> 
> Alternatively, if someone wants to pick up my patch and take it forward,
> I'm happy to comment.
> 
> Cheers
> ---Dave

Or - what about this patch? It doesn't guarantee that infinite loop won't 
happen, but it at least catches the most obvious loop on the same address.


From: Mikulas Patocka <mpatocka@redhat.com>
Subject: [PATCH] arm64: fix infinite stacktrace

I've got this infinite stacktrace when debugging another problem:
[  908.795225] INFO: rcu_preempt detected stalls on CPUs/tasks:
[  908.796176]  1-...!: (1 GPs behind) idle=952/1/4611686018427387904 softirq=1462/1462 fqs=355
[  908.797692]  2-...!: (1 GPs behind) idle=f42/1/4611686018427387904 softirq=1550/1551 fqs=355
[  908.799189]  (detected by 0, t=2109 jiffies, g=130, c=129, q=235)
[  908.800284] Task dump for CPU 1:
[  908.800871] kworker/1:1     R  running task        0    32      2 0x00000022
[  908.802127] Workqueue: writecache-writeabck writecache_writeback [dm_writecache]
[  908.820285] Call trace:
[  908.824785]  __switch_to+0x68/0x90
[  908.837661]  0xfffffe00603afd90
[  908.844119]  0xfffffe00603afd90
[  908.850091]  0xfffffe00603afd90
[  908.854285]  0xfffffe00603afd90
[  908.863538]  0xfffffe00603afd90
[  908.865523]  0xfffffe00603afd90

The machine just locked up and kept on printing the same line over and
over again. This patch fixes it.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org

---
 arch/arm64/kernel/stacktrace.c |    3 +++
 1 file changed, 3 insertions(+)

Comments

Will Deacon July 10, 2018, 9:13 a.m. UTC | #1
On Mon, Jul 09, 2018 at 11:04:33PM -0400, Mikulas Patocka wrote:
> On Thu, 28 Jun 2018, Dave Martin wrote:
> > Alternatively, if someone wants to pick up my patch and take it forward,
> > I'm happy to comment.
> > 
> > Cheers
> > ---Dave
> 
> Or - what about this patch? It doesn't guarantee that infinite loop won't 
> happen, but it at least catches the most obvious loop on the same address.
> 
> 
> From: Mikulas Patocka <mpatocka@redhat.com>
> Subject: [PATCH] arm64: fix infinite stacktrace
> 
> I've got this infinite stacktrace when debugging another problem:
> [  908.795225] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [  908.796176]  1-...!: (1 GPs behind) idle=952/1/4611686018427387904 softirq=1462/1462 fqs=355
> [  908.797692]  2-...!: (1 GPs behind) idle=f42/1/4611686018427387904 softirq=1550/1551 fqs=355
> [  908.799189]  (detected by 0, t=2109 jiffies, g=130, c=129, q=235)
> [  908.800284] Task dump for CPU 1:
> [  908.800871] kworker/1:1     R  running task        0    32      2 0x00000022
> [  908.802127] Workqueue: writecache-writeabck writecache_writeback [dm_writecache]
> [  908.820285] Call trace:
> [  908.824785]  __switch_to+0x68/0x90
> [  908.837661]  0xfffffe00603afd90
> [  908.844119]  0xfffffe00603afd90
> [  908.850091]  0xfffffe00603afd90
> [  908.854285]  0xfffffe00603afd90
> [  908.863538]  0xfffffe00603afd90
> [  908.865523]  0xfffffe00603afd90
> 
> The machine just locked up and kept on printing the same line over and
> over again. This patch fixes it.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Cc: stable@vger.kernel.org
> 
> ---
>  arch/arm64/kernel/stacktrace.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> Index: linux-2.6/arch/arm64/kernel/stacktrace.c
> ===================================================================
> --- linux-2.6.orig/arch/arm64/kernel/stacktrace.c	2018-07-10 05:01:56.990000000 +0200
> +++ linux-2.6/arch/arm64/kernel/stacktrace.c	2018-07-10 05:02:51.650000000 +0200
> @@ -56,6 +56,9 @@ int notrace unwind_frame(struct task_str
>  	frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
>  	frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
>  
> +	if (frame->fp == fp)
> +		return -EINVAL;

I've already queued your previous patch using '<=' (it's in -next). If you
want to change it, please send patches on top of the arm64 for-next/core
branch.

Will
Mikulas Patocka July 10, 2018, 6:10 p.m. UTC | #2
On Tue, 10 Jul 2018, Will Deacon wrote:

> On Mon, Jul 09, 2018 at 11:04:33PM -0400, Mikulas Patocka wrote:
> > On Thu, 28 Jun 2018, Dave Martin wrote:
> > > Alternatively, if someone wants to pick up my patch and take it forward,
> > > I'm happy to comment.
> > > 
> > > Cheers
> > > ---Dave
> > 
> > Or - what about this patch? It doesn't guarantee that infinite loop won't 
> > happen, but it at least catches the most obvious loop on the same address.
> > 
> > 
> > From: Mikulas Patocka <mpatocka@redhat.com>
> > Subject: [PATCH] arm64: fix infinite stacktrace
> > 
> > I've got this infinite stacktrace when debugging another problem:
> > [  908.795225] INFO: rcu_preempt detected stalls on CPUs/tasks:
> > [  908.796176]  1-...!: (1 GPs behind) idle=952/1/4611686018427387904 softirq=1462/1462 fqs=355
> > [  908.797692]  2-...!: (1 GPs behind) idle=f42/1/4611686018427387904 softirq=1550/1551 fqs=355
> > [  908.799189]  (detected by 0, t=2109 jiffies, g=130, c=129, q=235)
> > [  908.800284] Task dump for CPU 1:
> > [  908.800871] kworker/1:1     R  running task        0    32      2 0x00000022
> > [  908.802127] Workqueue: writecache-writeabck writecache_writeback [dm_writecache]
> > [  908.820285] Call trace:
> > [  908.824785]  __switch_to+0x68/0x90
> > [  908.837661]  0xfffffe00603afd90
> > [  908.844119]  0xfffffe00603afd90
> > [  908.850091]  0xfffffe00603afd90
> > [  908.854285]  0xfffffe00603afd90
> > [  908.863538]  0xfffffe00603afd90
> > [  908.865523]  0xfffffe00603afd90
> > 
> > The machine just locked up and kept on printing the same line over and
> > over again. This patch fixes it.
> > 
> > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > Cc: stable@vger.kernel.org
> > 
> > ---
> >  arch/arm64/kernel/stacktrace.c |    3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > Index: linux-2.6/arch/arm64/kernel/stacktrace.c
> > ===================================================================
> > --- linux-2.6.orig/arch/arm64/kernel/stacktrace.c	2018-07-10 05:01:56.990000000 +0200
> > +++ linux-2.6/arch/arm64/kernel/stacktrace.c	2018-07-10 05:02:51.650000000 +0200
> > @@ -56,6 +56,9 @@ int notrace unwind_frame(struct task_str
> >  	frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
> >  	frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
> >  
> > +	if (frame->fp == fp)
> > +		return -EINVAL;
> 
> I've already queued your previous patch using '<=' (it's in -next). If you
> want to change it, please send patches on top of the arm64 for-next/core
> branch.
> 
> Will

I don't know - is this function supposed to backtrace against more 
discontiguous stacks? If yes, then the patch with (frame->fp <= fp) is 
incorrect. If no, then the condition (frame->fp <= fp) could be there.

Mikulas
Will Deacon July 12, 2018, 10:39 a.m. UTC | #3
On Tue, Jul 10, 2018 at 02:10:04PM -0400, Mikulas Patocka wrote:
> On Tue, 10 Jul 2018, Will Deacon wrote:
> > On Mon, Jul 09, 2018 at 11:04:33PM -0400, Mikulas Patocka wrote:
> > > Index: linux-2.6/arch/arm64/kernel/stacktrace.c
> > > ===================================================================
> > > --- linux-2.6.orig/arch/arm64/kernel/stacktrace.c	2018-07-10 05:01:56.990000000 +0200
> > > +++ linux-2.6/arch/arm64/kernel/stacktrace.c	2018-07-10 05:02:51.650000000 +0200
> > > @@ -56,6 +56,9 @@ int notrace unwind_frame(struct task_str
> > >  	frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
> > >  	frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
> > >  
> > > +	if (frame->fp == fp)
> > > +		return -EINVAL;
> > 
> > I've already queued your previous patch using '<=' (it's in -next). If you
> > want to change it, please send patches on top of the arm64 for-next/core
> > branch.
> > 
> > Will
> 
> I don't know - is this function supposed to backtrace against more 
> discontiguous stacks? If yes, then the patch with (frame->fp <= fp) is 
> incorrect. If no, then the condition (frame->fp <= fp) could be there.

Ah yes, your original patch breaks with irqstacks, so I'll need to revert
it. Using '==' feels like a big hack to me.

Will
diff mbox

Patch

Index: linux-2.6/arch/arm64/kernel/stacktrace.c
===================================================================
--- linux-2.6.orig/arch/arm64/kernel/stacktrace.c	2018-07-10 05:01:56.990000000 +0200
+++ linux-2.6/arch/arm64/kernel/stacktrace.c	2018-07-10 05:02:51.650000000 +0200
@@ -56,6 +56,9 @@  int notrace unwind_frame(struct task_str
 	frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
 	frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
 
+	if (frame->fp == fp)
+		return -EINVAL;
+
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	if (tsk->ret_stack &&
 			(frame->pc == (unsigned long)return_to_handler)) {