Message ID | 1587726554-32018-1-git-send-email-sumit.garg@linaro.org (mailing list archive) |
---|---|
Series | arm64: Introduce new IPI as IPI_CALL_NMI_FUNC |
Hi,

On Fri, Apr 24, 2020 at 4:11 AM Sumit Garg <sumit.garg@linaro.org> wrote:
>
> With pseudo NMI support available it's possible to configure SGIs to be
> triggered as pseudo NMIs running in NMI context. And kernel features
> such as kgdb rely on NMI support to round up CPUs which are stuck in
> hard lockup state with interrupts disabled.
>
> This patch-set adds support for IPI_CALL_NMI_FUNC which can be triggered
> as a pseudo NMI which in turn is leveraged via kgdb to round up CPUs.
>
> After this patch-set we should be able to get a backtrace for a CPU
> stuck in HARDLOCKUP. Have a look at an example below from a testcase run
> on Developerbox:
>
> $ echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
>
> # Enter kdb via Magic SysRq
>
> [11]kdb> btc
> btc: cpu status: Currently on cpu 11
> Available cpus: 0-10(I), 11, 12(I), 13, 14-23(I)
> <snip>
> Stack traceback for pid 623
> 0xffff00086a644600 623 622 1 13 R 0xffff00086a644fc0 bash
> CPU: 13 PID: 623 Comm: bash Not tainted 5.7.0-rc2 #27
> Hardware name: Socionext SynQuacer E-series DeveloperBox, BIOS build #73 Apr 6 2020
> Call trace:
>  dump_backtrace+0x0/0x198
>  show_stack+0x18/0x28
>  dump_stack+0xb8/0x100
>  kgdb_cpu_enter+0x5c0/0x5f8
>  kgdb_nmicallback+0xa0/0xa8
>  handle_IPI+0x190/0x200
>  gic_handle_irq+0x2b8/0x2d8
>  el1_irq+0xcc/0x180
>  lkdtm_HARDLOCKUP+0x8/0x18
>  direct_entry+0x124/0x1c0
>  full_proxy_write+0x60/0xb0
>  __vfs_write+0x1c/0x48
>  vfs_write+0xe4/0x1d0
>  ksys_write+0x6c/0xf8
>  __arm64_sys_write+0x1c/0x28
>  el0_svc_common.constprop.0+0x74/0x1f0
>  do_el0_svc+0x24/0x90
>  el0_sync_handler+0x178/0x2b8
>  el0_sync+0x158/0x180
> <snip>
>
> Looking forward to your comments/feedback.
>
> Sumit Garg (4):
>   arm64: smp: Introduce a new IPI as IPI_CALL_NMI_FUNC
>   irqchip/gic-v3: Add support to handle SGI as pseudo NMI
>   irqchip/gic-v3: Enable arch specific IPI as pseudo NMI
>   arm64: kgdb: Round up cpus using IPI_CALL_NMI_FUNC
>
>  arch/arm64/include/asm/hardirq.h |  2 +-
>  arch/arm64/include/asm/smp.h     |  1 +
>  arch/arm64/kernel/kgdb.c         | 15 +++++++++++++++
>  arch/arm64/kernel/smp.c          | 36 +++++++++++++++++++++++++++++++++++-
>  drivers/irqchip/irq-gic-v3.c     | 36 +++++++++++++++++++++++++++++++-----
>  5 files changed, 83 insertions(+), 7 deletions(-)

This is amazing!

* picked your patches back to my current 5.4 tree
* turned on "CONFIG_ARM64_PSEUDO_NMI"
* set the "irqchip.gicv3_pseudo_nmi=1" command line

...and bam I can trace on the locked up CPU instead of being left in the dark.

I'm not sure I'm going to be too much use in actually doing the review
of the code since I'm not really an expert at how SGIs work (it took
me a while to realize that it must stand for software generated
interrupts) nor the bowels of the GIC. I tried to do what little
review I could.

In any case, I'll keep this in my local patch stack for now and keep
testing it to make sure I don't notice any weird problems.

-Doug
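
For anyone repeating the setup steps listed above, here is a minimal sketch of how one might confirm that both prerequisites are in place before testing. The helper names `has_pseudo_nmi_cmdline` and `has_pseudo_nmi_config` are hypothetical and not part of the patch set; only the option name `CONFIG_ARM64_PSEUDO_NMI` and the parameter `irqchip.gicv3_pseudo_nmi=1` come from the thread.

```shell
#!/bin/sh
# Hypothetical helper (not from the patch set): succeeds if the given
# kernel command line string contains irqchip.gicv3_pseudo_nmi=1.
has_pseudo_nmi_cmdline() {
    # Pad with spaces so the parameter only matches as a whole word.
    case " $1 " in
        *" irqchip.gicv3_pseudo_nmi=1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper (not from the patch set): succeeds if the given
# kernel config text has CONFIG_ARM64_PSEUDO_NMI built in.
has_pseudo_nmi_config() {
    printf '%s\n' "$1" | grep -q '^CONFIG_ARM64_PSEUDO_NMI=y$'
}
```

On a running system this could be used as `has_pseudo_nmi_cmdline "$(cat /proc/cmdline)" && echo "pseudo NMI requested"`. Checking the config with `has_pseudo_nmi_config "$(zcat /proc/config.gz)"` assumes the kernel exposes its configuration there, which is not the case on every build.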
On Sat, 25 Apr 2020 at 02:20, Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Fri, Apr 24, 2020 at 4:11 AM Sumit Garg <sumit.garg@linaro.org> wrote:
> > [...]
>
> This is amazing!
>
> * picked your patches back to my current 5.4 tree
> * turned on "CONFIG_ARM64_PSEUDO_NMI"
> * set the "irqchip.gicv3_pseudo_nmi=1" command line
>
> ...and bam I can trace on the locked up CPU instead of being left in the dark.
>
> I'm not sure I'm going to be too much use in actually doing the review
> of the code since I'm not really an expert at how SGIs work (it took
> me a while to realize that it must stand for software generated
> interrupts) nor the bowels of the GIC. I tried to do what little
> review I could.
>
> In any case, I'll keep this in my local patch stack for now and keep
> testing it to make sure I don't notice any weird problems.

Thanks for your review and testing.

-Sumit

>
> -Doug