Message ID | 20200924044236.1245808-1-ito-yuichi@fujitsu.com (mailing list archive) |
---|---|
Series | Enable support IPI_CPU_CRASH_STOP to be pseudo-NMI |
Hi Marc, Sumit

I would appreciate if you have any advice on this patch.

Yuichi Ito

> -----Original Message-----
> From: Yuichi Ito <ito-yuichi@fujitsu.com>
> Sent: Thursday, September 24, 2020 1:43 PM
> To: maz@kernel.org; sumit.garg@linaro.org; tglx@linutronix.de;
> jason@lakedaemon.net; catalin.marinas@arm.com; will@kernel.org
> Cc: linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; Ito,
> Yuichi/伊藤 有一 <ito-yuichi@fujitsu.com>
> Subject: [PATCH 0/2] Enable support IPI_CPU_CRASH_STOP to be
> pseudo-NMI
>
> Enable support IPI_CPU_CRASH_STOP to be pseudo-NMI
>
> This patchset enables IPI_CPU_CRASH_STOP IPI to be pseudo-NMI.
> This allows kdump to collect system information even when the CPU is in a
> HARDLOCKUP state.
>
> Only IPI_CPU_CRASH_STOP uses NMI and the other IPIs remain normal
> IRQs.
>
> The patch has been tested on ThunderX.
>
> This patch assumes Marc's latest IPIs patch-set. [1] It also uses some of
> Sumit's IPI patch set for NMI. [2]
>
> [1]
> https://lore.kernel.org/linux-arm-kernel/20200901144324.1071694-1-maz@kernel.org/
> [2]
> https://lore.kernel.org/linux-arm-kernel/1599830924-13990-3-git-send-email-sumit.garg@linaro.org/
>
> $ echo 1 > /proc/sys/kernel/panic_on_rcu_stall
> $ echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
>  : kernel panics and crash kernel boots
>  : makedumpfile saves the system state at HARDLOCKUP in vmcore.
>
> crash utility:
> crash> bt
> PID: 3213 TASK: fffffd001adc5940 CPU: 8 COMMAND: "bash"
>  #0 [fffffe0022fefcf0] lkdtm_HARDLOCKUP at fffffe0010888ab4
>  #1 [fffffe0022fefd10] lkdtm_do_action at fffffe00108882bc
>  #2 [fffffe0022fefd20] direct_entry at fffffe0010888720
>  #3 [fffffe0022fefd70] full_proxy_write at fffffe001058cfe4
>  #4 [fffffe0022fefdb0] vfs_write at fffffe00104a4c2c
>  #5 [fffffe0022fefdf0] ksys_write at fffffe00104a4f0c
>  #6 [fffffe0022fefe40] __arm64_sys_write at fffffe00104a4fbc
>  #7 [fffffe0022fefe50] el0_svc_common.constprop.0 at fffffe0010159e38
>  #8 [fffffe0022fefe80] do_el0_svc at fffffe0010159fa0
>  #9 [fffffe0022fefe90] el0_svc at fffffe00101481d0
> #10 [fffffe0022fefea0] el0_sync_handler at fffffe00101484b4
> #11 [fffffe0022fefff0] el0_sync at fffffe0010142b7c
>
>
> Sumit Garg (1):
>   irqchip/gic-v3: Enable support for SGIs to act as NMIs
>
> Yuichi Ito (1):
>   Register IPI_CPU_CRASH_STOP IPI as pseudo-NMI
>
>  arch/arm64/kernel/smp.c      | 39 ++++++++++++++++++++++++++++--------
>  drivers/irqchip/irq-gic-v3.c | 13 ++++++++++--
>  2 files changed, 42 insertions(+), 10 deletions(-)
>
> --
> 2.25.1

On 2020-09-28 03:43, ito-yuichi@fujitsu.com wrote:
> Hi Marc, Sumit
>
> I would appreciate if you have any advice on this patch.

I haven't had a chance to look into it, as I'm not even sure I'll take
the core series in the first place (there are outstanding regressions
I can't reproduce, let alone fix them).

>
> Yuichi Ito
>
>> -----Original Message-----
>> From: Yuichi Ito <ito-yuichi@fujitsu.com>
>> Sent: Thursday, September 24, 2020 1:43 PM
>> To: maz@kernel.org; sumit.garg@linaro.org; tglx@linutronix.de;
>> jason@lakedaemon.net; catalin.marinas@arm.com; will@kernel.org
>> Cc: linux-arm-kernel@lists.infradead.org;
>> linux-kernel@vger.kernel.org; Ito,
>> Yuichi/伊藤 有一 <ito-yuichi@fujitsu.com>
>> Subject: [PATCH 0/2] Enable support IPI_CPU_CRASH_STOP to be
>> pseudo-NMI
>>
>> Enable support IPI_CPU_CRASH_STOP to be pseudo-NMI
>>
>> This patchset enables IPI_CPU_CRASH_STOP IPI to be pseudo-NMI.
>> This allows kdump to collect system information even when the CPU is
>> in a
>> HARDLOCKUP state.
>>
>> Only IPI_CPU_CRASH_STOP uses NMI and the other IPIs remain normal
>> IRQs.
>>
>> The patch has been tested on ThunderX.

Which ThunderX? TX2 (at least the incarnation I used in the past)
wasn't able to correctly deal with priorities.

        M.

Hi Marc

Thank you for your reply.

> On 2020-09-28 03:43, ito-yuichi@fujitsu.com wrote:
> > Hi Marc, Sumit
> >
> > I would appreciate if you have any advice on this patch.
>
> I haven't had a chance to look into it, as I'm not even sure I'll take the core
> series in the first place (there are outstanding regressions I can't reproduce,
> let alone fix them).
>

I understand it.
Please let me know if there is anything I can do.
I sincerely hope that your patches will be merged into the mainline.

> >
> > Yuichi Ito
> >
> >> Enable support IPI_CPU_CRASH_STOP to be pseudo-NMI
> >>
> >> This patchset enables IPI_CPU_CRASH_STOP IPI to be pseudo-NMI.
> >> This allows kdump to collect system information even when the CPU is
> >> in a HARDLOCKUP state.
> >>
> >> Only IPI_CPU_CRASH_STOP uses NMI and the other IPIs remain normal
> >> IRQs.
> >>
> >> The patch has been tested on ThunderX.
>
> Which ThunderX? TX2 (at least the incarnation I used in the past) wasn't able
> to correctly deal with priorities.

I tried it with ThunderX CN8890.
If you tell me steps to reproduce the problem of TX2, I will investigate it with TX as well.

>         M.
> --
> Jazz is not dead. It just smells funny...

Thank you and best regards,
Yuichi Ito

On 2020-09-29 06:50, ito-yuichi@fujitsu.com wrote:
> Hi Marc

[...]

>> >> The patch has been tested on ThunderX.
>>
>> Which ThunderX? TX2 (at least the incarnation I used in the past)
>> wasn't able
>> to correctly deal with priorities.
>
> I tried it with ThunderX CN8890.
> If you tell me steps to reproduce the problem of TX2, I will
> investigate it with TX as well.

PMR_EL1 reporting fantasy values, non-uniform priority support across
the interrupt classes, and generally prone to lockups. The original TX
is a very different machine though (TX 1 and 2 only share the engraving
of the manufacturer on the heat-spreader).

        M.

Hi Marc

> On 2020-09-29 06:50, ito-yuichi@fujitsu.com wrote:
> > Hi Marc
>
> [...]
>
> >> >> The patch has been tested on ThunderX.
> >>
> >> Which ThunderX? TX2 (at least the incarnation I used in the past)
> >> wasn't able
> >> to correctly deal with priorities.
> >
> > I tried it with ThunderX CN8890.
> > If you tell me steps to reproduce the problem of TX2, I will
> > investigate it with TX as well.
>
> PMR_EL1 reporting fantasy values, non-uniform priority support across
> the interrupt classes, and generally prone to lockups. The original TX
> is a very different machine though (TX 1 and 2 only share the engraving
> of the manufacturer on the heat-spreader).

Thank you for the information.
I will check if we have a ThunderX1 or X2 environment.
If we have either one, I will investigate it.

>         M.
> --
> Jazz is not dead. It just smells funny...

Thank you and best regards,
Yuichi Ito