Message ID | 20210617125023.7288-1-shijie@os.amperecomputing.com (mailing list archive) |
---|---|
State | New, archived |
Series | arm64: kexec: flush log to console in nmi_panic() |
On Thu, Jun 17, 2021 at 12:50:23PM +0000, Huang Shijie wrote:
> If kdump is configured, nmi_panic() may run to machine_kexec().
>
> But in NMI context, the log is put in PER-CPU nmi_print_seq.
> So we can not see any log on the console since we entered the NMI context,
> such as the "Bye!" in previous line.
>
> This patch fixes this issue by two steps:
> 1) Uses printk_safe_flush_on_panic() to flush the log from
> nmi_print_seq to global printk ring buffer,
> 2) Then uses console_flush_on_panic() to flush to console.
>
> After this patch, we can see the "Bye!" log in the panic console.

Does it matter? I'd be more inclined to remove the print altogether...

Will
On Thu, Jun 17, 2021 at 1:52 PM Will Deacon <will@kernel.org> wrote:
>
> On Thu, Jun 17, 2021 at 12:50:23PM +0000, Huang Shijie wrote:
> > If kdump is configured, nmi_panic() may run to machine_kexec().
> >
> > But in NMI context, the log is put in PER-CPU nmi_print_seq.
> > So we can not see any log on the console since we entered the NMI context,
> > such as the "Bye!" in previous line.
> >
> > This patch fixes this issue by two steps:
> > 1) Uses printk_safe_flush_on_panic() to flush the log from
> > nmi_print_seq to global printk ring buffer,
> > 2) Then uses console_flush_on_panic() to flush to console.
> >
> > After this patch, we can see the "Bye!" log in the panic console.
>
> Does it matter? I'd be more inclined to remove the print altogether...

I agree, the print could be removed entirely. But my assumption was
that this patch meant to flush other buffered prints besides this last
"Bye" one.

>
> Will
On Thu, Jun 17, 2021 at 01:55:08PM -0400, Pavel Tatashin wrote:
> On Thu, Jun 17, 2021 at 1:52 PM Will Deacon <will@kernel.org> wrote:
> >
> > On Thu, Jun 17, 2021 at 12:50:23PM +0000, Huang Shijie wrote:
> > > If kdump is configured, nmi_panic() may run to machine_kexec().
> > >
> > > But in NMI context, the log is put in PER-CPU nmi_print_seq.
> > > So we can not see any log on the console since we entered the NMI context,
> > > such as the "Bye!" in previous line.
> > >
> > > This patch fixes this issue by two steps:
> > > 1) Uses printk_safe_flush_on_panic() to flush the log from
> > > nmi_print_seq to global printk ring buffer,
> > > 2) Then uses console_flush_on_panic() to flush to console.
> > >
> > > After this patch, we can see the "Bye!" log in the panic console.
> >
> > Does it matter? I'd be more inclined to remove the print altogether...
>
> I agree, the print could be removed entirely. But my assumption was
> that this patch meant to flush other buffered prints besides this last
> "Bye" one.

That sounds like something which should be done in the core code, rather
than in the architecture backend (and it looks like panic() might do this
already?)

Will
On Thu, Jun 17, 2021 at 06:58:23PM +0100, Will Deacon wrote:
> On Thu, Jun 17, 2021 at 01:55:08PM -0400, Pavel Tatashin wrote:
> > On Thu, Jun 17, 2021 at 1:52 PM Will Deacon <will@kernel.org> wrote:
> > >
> > > On Thu, Jun 17, 2021 at 12:50:23PM +0000, Huang Shijie wrote:
> > > > If kdump is configured, nmi_panic() may run to machine_kexec().
> > > >
> > > > But in NMI context, the log is put in PER-CPU nmi_print_seq.
> > > > So we can not see any log on the console since we entered the NMI context,
> > > > such as the "Bye!" in previous line.
> > > >
> > > > This patch fixes this issue by two steps:
> > > > 1) Uses printk_safe_flush_on_panic() to flush the log from
> > > > nmi_print_seq to global printk ring buffer,
> > > > 2) Then uses console_flush_on_panic() to flush to console.
> > > >
> > > > After this patch, we can see the "Bye!" log in the panic console.
> > >
> > > Does it matter? I'd be more inclined to remove the print altogether...

We may remove the log in the arm64 code.

But panic() itself still emits many log messages, such as:

	..............
	pr_emerg("Kernel panic - not syncing: %s\n", buf);
	..............
	dump_stack();
	..............
	kdb_printf("PANIC: %s\n", msg);

Without this patch, all of the log messages above will be lost.

> >
> > I agree, the print could be removed entirely. But my assumption was
> > that this patch meant to flush other buffered prints besides this last
> > "Bye" one.
>
> That sounds like something which should be done in the core code, rather
> than in the architecture backend (and it looks like panic() might do this
> already?)

In the non-kdump code path, the core code will take care of it; please read the
code in panic().

But in the kdump code path, the architecture code should take care of it.

Thanks
Huang Shijie
On Fri, Jun 18, 2021 at 09:03:26AM +0000, Huang Shijie wrote:
> On Thu, Jun 17, 2021 at 06:58:23PM +0100, Will Deacon wrote:
> > On Thu, Jun 17, 2021 at 01:55:08PM -0400, Pavel Tatashin wrote:
> > > On Thu, Jun 17, 2021 at 1:52 PM Will Deacon <will@kernel.org> wrote:
> > > >
> > > > On Thu, Jun 17, 2021 at 12:50:23PM +0000, Huang Shijie wrote:
> > > > > If kdump is configured, nmi_panic() may run to machine_kexec().
> > > > >
> > > > > But in NMI context, the log is put in PER-CPU nmi_print_seq.
> > > > > So we can not see any log on the console since we entered the NMI context,
> > > > > such as the "Bye!" in previous line.
> > > > >
> > > > > This patch fixes this issue by two steps:
> > > > > 1) Uses printk_safe_flush_on_panic() to flush the log from
> > > > > nmi_print_seq to global printk ring buffer,
> > > > > 2) Then uses console_flush_on_panic() to flush to console.
> > > > >
> > > > > After this patch, we can see the "Bye!" log in the panic console.
> > > >
> > > > Does it matter? I'd be more inclined to remove the print altogether...
> We may remove the log in the arm64 code.
>
> But panic() itself still emits many log messages, such as:
>
> 	..............
> 	pr_emerg("Kernel panic - not syncing: %s\n", buf);
> 	..............
> 	dump_stack();
> 	..............
> 	kdb_printf("PANIC: %s\n", msg);
>
> Without this patch, all of the log messages above will be lost.
>
> > >
> > > I agree, the print could be removed entirely. But my assumption was
> > > that this patch meant to flush other buffered prints besides this last
> > > "Bye" one.
> >
> > That sounds like something which should be done in the core code, rather
> > than in the architecture backend (and it looks like panic() might do this
> > already?)
> In the non-kdump code path, the core code will take care of it; please read the
> code in panic().
>
> But in the kdump code path, the architecture code should take care of it.

Why the discrepancy? Wouldn't it make more sense to do this in panic() for
both cases, if the prints that we want to display are coming from panic()
itself?

Will
Hi Will,

On Mon, Jun 21, 2021 at 11:08:37AM +0100, Will Deacon wrote:
> > > That sounds like something which should be done in the core code, rather
> > > than in the architecture backend (and it looks like panic() might do this
> > > already?)
> > In the non-kdump code path, the core code will take care of it; please read the
> > code in panic().
> >
> > But in the kdump code path, the architecture code should take care of it.
>
> Why the discrepancy? Wouldn't it make more sense to do this in panic() for
> both cases, if the prints that we want to display are coming from panic()
> itself?

In the kdump code path, the call chain looks like this:

	panic() --> __crash_kexec() --> machine_kexec();

Once we reach arm64's machine_kexec(), we can __NOT__ return to
panic(); we jump to the kdump Linux kernel via cpu_soft_restart().
So we cannot depend on panic() to print the log. :)

By the way, here are some of the arm64 log messages emitted after we enter
__crash_kexec() in NMI context:

1.) the log in machine_crash_shutdown()
	..............
	pr_crit("SMP: stopping secondary CPUs\n");
	..............
	pr_info("Starting crashdump kernel...\n");
	..............

2.) the log in machine_kexec()
	..............
	WARN(in_kexec_crash && (stuck_cpus || smp_crash_stop_failed()),
		"Some CPUs may be stale, kdump will be unreliable.\n");
	..............
	the logs in kexec_segment_flush(kimage);
	..............
	pr_info("Bye!\n");

We cannot remove them all, so we need to flush all of the logs above to the
console in NMI context.

Thanks
Huang Shijie
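The control-flow argument in the message above can be illustrated with a small userspace model. This is only a sketch of the reasoning as stated in the thread, not kernel code: every `model_*` name is hypothetical, and `longjmp()` merely stands in for the hand-off to the crash kernel that the message attributes to cpu_soft_restart().

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

static jmp_buf model_handoff;        /* hypothetical: "control has left this kernel" */
static bool model_kdump_loaded;      /* hypothetical: is a crash kernel loaded? */
static bool model_core_flush_ran;    /* did code placed after the kexec call run? */

/* Models the arch backend: it hands control away and never returns. */
static void model_machine_kexec(void)
{
	assert(model_kdump_loaded);
	longjmp(model_handoff, 1);
}

/* Models the crash-kexec step: it returns only if no crash kernel is loaded. */
static void model_crash_kexec(void)
{
	if (model_kdump_loaded)
		model_machine_kexec();
}

/* Models a panic path whose flushing sits after the crash-kexec call. */
static void model_panic(void)
{
	model_crash_kexec();
	/* Reached only on the non-kdump path in this model. */
	model_core_flush_ran = true;
}

/* Runs the model once; returns true if the post-kexec flush was reached. */
static bool model_run(bool kdump_loaded)
{
	model_kdump_loaded = kdump_loaded;
	model_core_flush_ran = false;
	if (setjmp(model_handoff) == 0)
		model_panic();
	return model_core_flush_ran;
}
```

In this model, `model_run(false)` reaches the flush and `model_run(true)` does not, which is the point being made: anything that must reach the console on the kdump path has to be flushed before control is handed off.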
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 213d56c14f60..0ab841dab9db 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -6,6 +6,7 @@
  * Copyright (C) Huawei Futurewei Technologies.
  */
 
+#include <linux/console.h>
 #include <linux/interrupt.h>
 #include <linux/irq.h>
 #include <linux/kernel.h>
@@ -189,6 +190,12 @@ void machine_kexec(struct kimage *kimage)
 
 	pr_info("Bye!\n");
 
+	if (in_nmi()) {
+		/* Flush the log to console if we are in NMI context */
+		printk_safe_flush_on_panic();
+		console_flush_on_panic(CONSOLE_FLUSH_PENDING);
+	}
+
 	local_daif_mask();
 
 	/*
If kdump is configured, nmi_panic() may run to machine_kexec().

But in NMI context, the log is put in PER-CPU nmi_print_seq.
So we can not see any log on the console since we entered the NMI context,
such as the "Bye!" in previous line.

This patch fixes this issue by two steps:
 1) Uses printk_safe_flush_on_panic() to flush the log from
    nmi_print_seq to global printk ring buffer,
 2) Then uses console_flush_on_panic() to flush to console.

After this patch, we can see the "Bye!" log in the panic console.

Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>
---
 arch/arm64/kernel/machine_kexec.c | 7 +++++++
 1 file changed, 7 insertions(+)
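The two steps in the commit message can be pictured with a toy userspace model. It is a sketch under stated assumptions, not the printk implementation: the buffers, sizes, and every `model_*` function are hypothetical stand-ins for the per-CPU nmi_print_seq buffer, the global printk ring buffer, and the console.

```c
#include <assert.h>
#include <string.h>

#define MODEL_NR_CPUS  2     /* hypothetical CPU count */
#define MODEL_BUF_SIZE 256   /* hypothetical buffer size */

/* Stand-in for the per-CPU buffer that collects messages in NMI context. */
struct model_percpu_buf {
	char data[MODEL_BUF_SIZE];
	size_t len;
};

static struct model_percpu_buf model_nmi_buf[MODEL_NR_CPUS];
static char model_ring[MODEL_BUF_SIZE * MODEL_NR_CPUS];     /* global ring buffer */
static size_t model_ring_len;
static char model_console[MODEL_BUF_SIZE * MODEL_NR_CPUS];  /* text seen on console */
static size_t model_console_pos;

/* In NMI context a message only lands in the per-CPU buffer. */
static void model_nmi_printk(int cpu, const char *msg)
{
	size_t n = strlen(msg);

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	assert(model_nmi_buf[cpu].len + n < MODEL_BUF_SIZE);
	memcpy(model_nmi_buf[cpu].data + model_nmi_buf[cpu].len, msg, n);
	model_nmi_buf[cpu].len += n;
}

/* Step 1: move every per-CPU buffer into the global ring buffer. */
static void model_flush_percpu(void)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		size_t n = model_nmi_buf[cpu].len;

		assert(model_ring_len + n < sizeof(model_ring));
		memcpy(model_ring + model_ring_len, model_nmi_buf[cpu].data, n);
		model_ring_len += n;
		model_nmi_buf[cpu].len = 0;
	}
}

/* Step 2: push whatever is still pending in the ring buffer to the console. */
static void model_flush_console(void)
{
	size_t n = model_ring_len - model_console_pos;

	memcpy(model_console + model_console_pos, model_ring + model_console_pos, n);
	model_console_pos = model_ring_len;
	model_console[model_console_pos] = '\0';
}
```

In the model, flushing the console alone shows nothing, because the message is still parked in the per-CPU buffer. Only running step 1 and then step 2 makes "Bye!" visible, which mirrors the ordering the patch uses.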