
[RFC,v1,0/4] arm64: Introduce new IPI as IPI_CALL_NMI_FUNC

Message ID 1587726554-32018-1-git-send-email-sumit.garg@linaro.org (mailing list archive)
Series arm64: Introduce new IPI as IPI_CALL_NMI_FUNC

Message

Sumit Garg April 24, 2020, 11:09 a.m. UTC
With pseudo-NMI support available, it's possible to configure SGIs to be
triggered as pseudo NMIs running in NMI context. Kernel features such as
kgdb rely on NMI support to round up CPUs that are stuck in a hard
lockup with interrupts disabled.
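Why a pseudo NMI can still reach a CPU spinning "with interrupts disabled" comes down to how arm64 implements pseudo NMIs: when they are enabled, local_irq_disable() raises the GICv3 priority mask (ICC_PMR_EL1) instead of setting PSTATE.I, so an interrupt configured with a higher priority is still signalled. The user-space model below is only a sketch of that rule. The constant values are modeled on the kernel's GIC_PRIO_IRQON/GIC_PRIO_IRQOFF and the GIC default/NMI interrupt priorities, and should be treated as assumptions to check against the headers of your tree. gic_would_signal() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed priority values (lower number = higher priority), modeled on
 * the kernel's pseudo-NMI constants; verify against your tree.
 */
#define PRIO_IRQON   0xe0u /* PMR value while IRQs are enabled */
#define PRIO_IRQOFF  0x60u /* PMR value after local_irq_disable() */
#define PRIO_DEFAULT 0xa0u /* priority of an ordinary IRQ or SGI */
#define PRIO_NMI     0x20u /* priority of an SGI configured as pseudo NMI */

/* The pseudo-NMI scheme only works if NMIs sit above the IRQ-off mask. */
static_assert(PRIO_NMI < PRIO_IRQOFF, "NMI priority must beat the IRQ-off mask");

/*
 * Hypothetical helper: the GIC signals an interrupt to the CPU only if
 * its priority value is numerically lower than the current PMR value.
 */
static bool gic_would_signal(uint8_t irq_prio, uint8_t pmr)
{
	return irq_prio < pmr;
}
```

With IRQs "off" (PMR at PRIO_IRQOFF), an ordinary IPI at PRIO_DEFAULT is masked while an SGI at PRIO_NMI still gets through, which is exactly the property a round-up IPI needs on a hard-locked CPU.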

This patch-set adds support for IPI_CALL_NMI_FUNC, which can be triggered
as a pseudo NMI and which kgdb in turn leverages to round up CPUs.
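The round-up flow can be illustrated with a small user-space model: kgdb signals every other CPU, and only an IPI delivered as a pseudo NMI is taken by a CPU spinning with interrupts masked. This is not the patch's code. All MODEL_* and model_* names are hypothetical stand-ins for what lives in arch/arm64/kernel/smp.c and kgdb.c, the CPU count of 24 matches the 0-23 range in the btc output of this message, and the 16-entry limit reflects GICv3 SGIs being INTIDs 0-15.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 24 /* CPUs 0-23, as in the btc output */
#define MAX_SGI       16 /* GICv3 SGIs are INTIDs 0-15 */

/* Hypothetical, trimmed-down IPI list; not the kernel's enum ipi_msg_type. */
enum model_ipi {
	MODEL_IPI_RESCHEDULE,
	MODEL_IPI_CALL_FUNC,     /* ordinary cross-call at IRQ priority */
	MODEL_IPI_CALL_NMI_FUNC, /* new IPI, SGI configured as pseudo NMI */
	MODEL_NR_IPI
};

/* Each IPI is backed by its own SGI, so the list must fit in 16 INTIDs. */
static_assert(MODEL_NR_IPI <= MAX_SGI, "not enough SGIs for all IPIs");

static bool irqs_disabled[MODEL_NR_CPUS];    /* CPU spinning with IRQs masked */
static bool entered_debugger[MODEL_NR_CPUS]; /* stands in for reaching kgdb_nmicallback() */

static void model_deliver_ipi(int cpu, enum model_ipi ipi)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);

	/* With IRQs masked, only the pseudo-NMI IPI is taken. */
	if (irqs_disabled[cpu] && ipi != MODEL_IPI_CALL_NMI_FUNC)
		return;

	switch (ipi) {
	case MODEL_IPI_CALL_FUNC:
	case MODEL_IPI_CALL_NMI_FUNC:
		entered_debugger[cpu] = true;
		break;
	default:
		break;
	}
}

/* Round up: signal every CPU except the one already in the debugger. */
static void model_roundup_cpus(int self, enum model_ipi ipi)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (cpu != self)
			model_deliver_ipi(cpu, ipi);
}
```

In this model, a round-up using MODEL_IPI_CALL_FUNC never reaches a CPU stuck like CPU 13 in the example trace, while MODEL_IPI_CALL_NMI_FUNC does; that difference is the whole point of giving the new IPI NMI priority.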

After this patch-set, we should be able to get a backtrace for a CPU
stuck in a hard lockup (lkdtm HARDLOCKUP). See the example below from a
test case run on a DeveloperBox:

$ echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT

# Enter kdb via Magic SysRq

[11]kdb> btc
btc: cpu status: Currently on cpu 11
Available cpus: 0-10(I), 11, 12(I), 13, 14-23(I)
<snip>
Stack traceback for pid 623
0xffff00086a644600      623      622  1   13   R  0xffff00086a644fc0  bash
CPU: 13 PID: 623 Comm: bash Not tainted 5.7.0-rc2 #27
Hardware name: Socionext SynQuacer E-series DeveloperBox, BIOS build #73 Apr  6 2020
Call trace:
 dump_backtrace+0x0/0x198
 show_stack+0x18/0x28
 dump_stack+0xb8/0x100
 kgdb_cpu_enter+0x5c0/0x5f8
 kgdb_nmicallback+0xa0/0xa8
 handle_IPI+0x190/0x200
 gic_handle_irq+0x2b8/0x2d8
 el1_irq+0xcc/0x180
 lkdtm_HARDLOCKUP+0x8/0x18
 direct_entry+0x124/0x1c0
 full_proxy_write+0x60/0xb0
 __vfs_write+0x1c/0x48
 vfs_write+0xe4/0x1d0
 ksys_write+0x6c/0xf8
 __arm64_sys_write+0x1c/0x28
 el0_svc_common.constprop.0+0x74/0x1f0
 do_el0_svc+0x24/0x90
 el0_sync_handler+0x178/0x2b8
 el0_sync+0x158/0x180
<snip>

Looking forward to your comments/feedback.

Sumit Garg (4):
  arm64: smp: Introduce a new IPI as IPI_CALL_NMI_FUNC
  irqchip/gic-v3: Add support to handle SGI as pseudo NMI
  irqchip/gic-v3: Enable arch specific IPI as pseudo NMI
  arm64: kgdb: Round up cpus using IPI_CALL_NMI_FUNC

 arch/arm64/include/asm/hardirq.h |  2 +-
 arch/arm64/include/asm/smp.h     |  1 +
 arch/arm64/kernel/kgdb.c         | 15 +++++++++++++++
 arch/arm64/kernel/smp.c          | 36 +++++++++++++++++++++++++++++++++++-
 drivers/irqchip/irq-gic-v3.c     | 36 +++++++++++++++++++++++++++++++-----
 5 files changed, 83 insertions(+), 7 deletions(-)

Comments

Doug Anderson April 24, 2020, 8:49 p.m. UTC | #1
Hi,

On Fri, Apr 24, 2020 at 4:11 AM Sumit Garg <sumit.garg@linaro.org> wrote:
>
> With pseudo-NMI support available, it's possible to configure SGIs to be
> triggered as pseudo NMIs running in NMI context. Kernel features such as
> kgdb rely on NMI support to round up CPUs that are stuck in a hard
> lockup with interrupts disabled.
>
> This patch-set adds support for IPI_CALL_NMI_FUNC which can be triggered
> as a pseudo NMI which in turn is leveraged via kgdb to round up CPUs.
>
> After this patch-set we should be able to get a backtrace for a CPU
> stuck in HARDLOCKUP. Have a look at an example below from a testcase run
> on Developerbox:
>
> $ echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
>
> # Enter kdb via Magic SysRq
>
> [11]kdb> btc
> btc: cpu status: Currently on cpu 11
> Available cpus: 0-10(I), 11, 12(I), 13, 14-23(I)
> <snip>
> Stack traceback for pid 623
> 0xffff00086a644600      623      622  1   13   R  0xffff00086a644fc0  bash
> CPU: 13 PID: 623 Comm: bash Not tainted 5.7.0-rc2 #27
> Hardware name: Socionext SynQuacer E-series DeveloperBox, BIOS build #73 Apr  6 2020
> Call trace:
>  dump_backtrace+0x0/0x198
>  show_stack+0x18/0x28
>  dump_stack+0xb8/0x100
>  kgdb_cpu_enter+0x5c0/0x5f8
>  kgdb_nmicallback+0xa0/0xa8
>  handle_IPI+0x190/0x200
>  gic_handle_irq+0x2b8/0x2d8
>  el1_irq+0xcc/0x180
>  lkdtm_HARDLOCKUP+0x8/0x18
>  direct_entry+0x124/0x1c0
>  full_proxy_write+0x60/0xb0
>  __vfs_write+0x1c/0x48
>  vfs_write+0xe4/0x1d0
>  ksys_write+0x6c/0xf8
>  __arm64_sys_write+0x1c/0x28
>  el0_svc_common.constprop.0+0x74/0x1f0
>  do_el0_svc+0x24/0x90
>  el0_sync_handler+0x178/0x2b8
>  el0_sync+0x158/0x180
> <snip>
>
> Looking forward to your comments/feedback.
>
> Sumit Garg (4):
>   arm64: smp: Introduce a new IPI as IPI_CALL_NMI_FUNC
>   irqchip/gic-v3: Add support to handle SGI as pseudo NMI
>   irqchip/gic-v3: Enable arch specific IPI as pseudo NMI
>   arm64: kgdb: Round up cpus using IPI_CALL_NMI_FUNC
>
>  arch/arm64/include/asm/hardirq.h |  2 +-
>  arch/arm64/include/asm/smp.h     |  1 +
>  arch/arm64/kernel/kgdb.c         | 15 +++++++++++++++
>  arch/arm64/kernel/smp.c          | 36 +++++++++++++++++++++++++++++++++++-
>  drivers/irqchip/irq-gic-v3.c     | 36 +++++++++++++++++++++++++++++++-----
>  5 files changed, 83 insertions(+), 7 deletions(-)

This is amazing!

* picked your patches back to my current 5.4 tree
* turned on "CONFIG_ARM64_PSEUDO_NMI"
* set the "irqchip.gicv3_pseudo_nmi=1" command line

...and bam I can trace on the locked up CPU instead of being left in the dark.
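The recipe above needs both the build option and the boot parameter. One quick way to confirm the boot-parameter half on a running system is to look for irqchip.gicv3_pseudo_nmi=1 as a whole token in /proc/cmdline. The helper below is a hypothetical sketch of that check (the function name is made up), and it does not verify CONFIG_ARM64_PSEUDO_NMI, which still has to be checked in the kernel config.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if `param` appears in `cmdline` as a
 * whole space-separated token (so "...=10" does not match "...=1").
 */
static bool cmdline_has_token(const char *cmdline, const char *param)
{
	size_t len = strlen(param);
	const char *p = cmdline;

	assert(len > 0);
	while ((p = strstr(p, param)) != NULL) {
		bool starts = (p == cmdline) || p[-1] == ' ';
		bool ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';

		if (starts && ends)
			return true;
		p++; /* keep scanning after a partial match */
	}
	return false;
}
```

Pass it the contents of /proc/cmdline, which ends in a newline; that is why '\n' is accepted as a token terminator.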

I'm not sure I'm going to be too much use in actually doing the review
of the code since I'm not really an expert at how SGIs work (it took
me a while to realize that it must stand for software generated
interrupts) nor the bowels of the GIC.  I tried to do what little
review I could.

In any case, I'll keep this in my local patch stack for now and keep
testing it to make sure I don't notice any weird problems.

-Doug

Sumit Garg April 27, 2020, 4:54 a.m. UTC | #2
On Sat, 25 Apr 2020 at 02:20, Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Fri, Apr 24, 2020 at 4:11 AM Sumit Garg <sumit.garg@linaro.org> wrote:
> >
> > With pseudo-NMI support available, it's possible to configure SGIs to be
> > triggered as pseudo NMIs running in NMI context. Kernel features such as
> > kgdb rely on NMI support to round up CPUs that are stuck in a hard
> > lockup with interrupts disabled.
> >
> > This patch-set adds support for IPI_CALL_NMI_FUNC which can be triggered
> > as a pseudo NMI which in turn is leveraged via kgdb to round up CPUs.
> >
> > After this patch-set we should be able to get a backtrace for a CPU
> > stuck in HARDLOCKUP. Have a look at an example below from a testcase run
> > on Developerbox:
> >
> > $ echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
> >
> > # Enter kdb via Magic SysRq
> >
> > [11]kdb> btc
> > btc: cpu status: Currently on cpu 11
> > Available cpus: 0-10(I), 11, 12(I), 13, 14-23(I)
> > <snip>
> > Stack traceback for pid 623
> > 0xffff00086a644600      623      622  1   13   R  0xffff00086a644fc0  bash
> > CPU: 13 PID: 623 Comm: bash Not tainted 5.7.0-rc2 #27
> > Hardware name: Socionext SynQuacer E-series DeveloperBox, BIOS build #73 Apr  6 2020
> > Call trace:
> >  dump_backtrace+0x0/0x198
> >  show_stack+0x18/0x28
> >  dump_stack+0xb8/0x100
> >  kgdb_cpu_enter+0x5c0/0x5f8
> >  kgdb_nmicallback+0xa0/0xa8
> >  handle_IPI+0x190/0x200
> >  gic_handle_irq+0x2b8/0x2d8
> >  el1_irq+0xcc/0x180
> >  lkdtm_HARDLOCKUP+0x8/0x18
> >  direct_entry+0x124/0x1c0
> >  full_proxy_write+0x60/0xb0
> >  __vfs_write+0x1c/0x48
> >  vfs_write+0xe4/0x1d0
> >  ksys_write+0x6c/0xf8
> >  __arm64_sys_write+0x1c/0x28
> >  el0_svc_common.constprop.0+0x74/0x1f0
> >  do_el0_svc+0x24/0x90
> >  el0_sync_handler+0x178/0x2b8
> >  el0_sync+0x158/0x180
> > <snip>
> >
> > Looking forward to your comments/feedback.
> >
> > Sumit Garg (4):
> >   arm64: smp: Introduce a new IPI as IPI_CALL_NMI_FUNC
> >   irqchip/gic-v3: Add support to handle SGI as pseudo NMI
> >   irqchip/gic-v3: Enable arch specific IPI as pseudo NMI
> >   arm64: kgdb: Round up cpus using IPI_CALL_NMI_FUNC
> >
> >  arch/arm64/include/asm/hardirq.h |  2 +-
> >  arch/arm64/include/asm/smp.h     |  1 +
> >  arch/arm64/kernel/kgdb.c         | 15 +++++++++++++++
> >  arch/arm64/kernel/smp.c          | 36 +++++++++++++++++++++++++++++++++++-
> >  drivers/irqchip/irq-gic-v3.c     | 36 +++++++++++++++++++++++++++++++-----
> >  5 files changed, 83 insertions(+), 7 deletions(-)
>
> This is amazing!
>
> * picked your patches back to my current 5.4 tree
> * turned on "CONFIG_ARM64_PSEUDO_NMI"
> * set the "irqchip.gicv3_pseudo_nmi=1" command line
>
> ...and bam I can trace on the locked up CPU instead of being left in the dark.
>
> I'm not sure I'm going to be too much use in actually doing the review
> of the code since I'm not really an expert at how SGIs work (it took
> me a while to realize that it must stand for software generated
> interrupts) nor the bowels of the GIC.  I tried to do what little
> review I could.
>
> In any case, I'll keep this in my local patch stack for now and keep
> testing it to make sure I don't notice any weird problems.

Thanks for your review and testing.

-Sumit

>
> -Doug