Message ID | 20151211124159.GB3729@pathway.suse.cz (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Friday 11 December 2015 13:41:59 Petr Mladek wrote: > diff --git a/init/Kconfig b/init/Kconfig > index efcff25a112d..61cfd96a3c96 100644 > --- a/init/Kconfig > +++ b/init/Kconfig > @@ -870,7 +870,7 @@ config NMI_LOG_BUF_SHIFT > int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)" > range 10 21 > default 13 > - depends on PRINTK && HAVE_NMI > + depends on PRINTK_NMI > help > Select the size of a per-CPU buffer where NMI messages are temporary > stored. They are copied to the main log buffer in a safe context > > Acked-by: Arnd Bergmann <arnd@arndb.de> I found this on linux-next as well today and came to the same conclusion. Arnd
On Fri, Dec 11, 2015 at 1:41 PM, Petr Mladek <pmladek@suse.com> wrote: > On Fri 2015-12-11 12:10:02, Geert Uytterhoeven wrote: >> On Wed, Dec 9, 2015 at 2:21 PM, Petr Mladek <pmladek@suse.com> wrote: >> > --- a/init/Kconfig >> > +++ b/init/Kconfig >> > @@ -866,6 +866,28 @@ config LOG_CPU_MAX_BUF_SHIFT >> > 13 => 8 KB for each CPU >> > 12 => 4 KB for each CPU >> > >> > +config NMI_LOG_BUF_SHIFT >> > + int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)" >> > + range 10 21 >> > + default 13 >> > + depends on PRINTK && HAVE_NMI >> >> Symbol NMI_LOG_BUF_SHIFT does not exist if its dependencies are not met. > > Åh, the NMI buffer is enabled on arm via NEED_PRINTK_NMI. > > The buffer is compiled when CONFIG_PRINTK_NMI is defined. I am going > to fix it the following way: > > > diff --git a/init/Kconfig b/init/Kconfig > index efcff25a112d..61cfd96a3c96 100644 > --- a/init/Kconfig > +++ b/init/Kconfig > @@ -870,7 +870,7 @@ config NMI_LOG_BUF_SHIFT > int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)" > range 10 21 > default 13 > - depends on PRINTK && HAVE_NMI > + depends on PRINTK_NMI > help > Select the size of a per-CPU buffer where NMI messages are temporary > stored. They are copied to the main log buffer in a safe context Makes sense, as kernel/printk/nmi.c is compiled if PRINTK_NMI is set. Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
On Fri, 11 Dec 2015 13:41:59 +0100 Petr Mladek <pmladek@suse.com> wrote: > On Fri 2015-12-11 12:10:02, Geert Uytterhoeven wrote: > > On Wed, Dec 9, 2015 at 2:21 PM, Petr Mladek <pmladek@suse.com> wrote: > > > --- a/init/Kconfig > > > +++ b/init/Kconfig > > > @@ -866,6 +866,28 @@ config LOG_CPU_MAX_BUF_SHIFT > > > 13 => 8 KB for each CPU > > > 12 => 4 KB for each CPU > > > > > > +config NMI_LOG_BUF_SHIFT > > > + int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)" > > > + range 10 21 > > > + default 13 > > > + depends on PRINTK && HAVE_NMI > > > > Symbol NMI_LOG_BUF_SHIFT does not exist if its dependencies are not met. > > __h, the NMI buffer is enabled on arm via NEED_PRINTK_NMI. > > The buffer is compiled when CONFIG_PRINTK_NMI is defined. I am going > to fix it the following way: > > > diff --git a/init/Kconfig b/init/Kconfig > index efcff25a112d..61cfd96a3c96 100644 > --- a/init/Kconfig > +++ b/init/Kconfig > @@ -870,7 +870,7 @@ config NMI_LOG_BUF_SHIFT > int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)" > range 10 21 > default 13 > - depends on PRINTK && HAVE_NMI > + depends on PRINTK_NMI > help > Select the size of a per-CPU buffer where NMI messages are temporary > stored. They are copied to the main log buffer in a safe context I'm wondering why we're building kernel/printk/nmi.o if HAVE_NMI is not set. obj-$(CONFIG_PRINTK_NMI) += nmi.o and config PRINTK_NMI def_bool y depends on PRINTK depends on HAVE_NMI || NEED_PRINTK_NMI This is a bit messy. NEED_PRINTK_NMI is an added-on hack for one particular arm variant. From the changelog: "One exception is arm where the deferred printing is used for printing backtraces even without NMI. For this purpose, we define NEED_PRINTK_NMI Kconfig flag. The alternative printk_func is explicitly set when IPI_CPU_BACKTRACE is handled." - why does arm needs deferred printing for backtraces? - why is this specific to CONFIG_CPU_V7M? - can this Kconfig logic be cleaned up a bit?
On Fri, Dec 11, 2015 at 02:57:25PM -0800, Andrew Morton wrote: > This is a bit messy. NEED_PRINTK_NMI is an added-on hack for one > particular arm variant. From the changelog: > > "One exception is arm where the deferred printing is used for > printing backtraces even without NMI. For this purpose, we define > NEED_PRINTK_NMI Kconfig flag. The alternative printk_func is > explicitly set when IPI_CPU_BACKTRACE is handled." > > > - why does arm needs deferred printing for backtraces? > > - why is this specific to CONFIG_CPU_V7M? > > - can this Kconfig logic be cleaned up a bit? I think this comes purely from this attempt to apply another round of cleanups to the nmi backtrace work I did. As I explained when I did that work, the vast majority of ARM platforms are unable to trigger anything like a NMI - the FIQ is something that's generally a property of the secure monitor, and is not accessible to Linux. However, there are platforms where it is accessible. The work to add the FIQ-based variant never happened (I've no idea what happened to that part, Daniel seems to have lost interest in working on it.) So, what we have is the IRQ-based variant merged in mainline, which would be the fallback for the "FIQ not available" cases, and I carry a local hack in my tree which provides the FIQ-based version - but if it were to trigger, it takes out all interrupts (hence why I've not merged my hack.) I think the reason that the FIQ-based variant has never really happened is that hooking into the interrupt controller code to clear down the FIQ creates such a horrid layering violation, and also a locking mess that I suspect it's just been given up with. However, I've found my "hack" useful - it's turned a number of totally undebuggable hangs (where one CPU silently hangs leaving the others running with no way to find out where the hung CPU is) into something that can be debugged. Now, when we end up triggering the IRQ-based variant, we could already be in a situation where IRQs are off for the local CPU, so the IRQ is never delivered. Others decided that it wasn't acceptable to wait 10sec for the local CPU to time out, and (iirc) we'd also loose the local CPUs backtrace in certain situations. I'm personally happy with the existing code, and I've been wondering why there's this effort to apply further cleanups - to me, the changelogs don't seem to make that much sense, unless we want to start using printk() extensively in NMI functions - using the generic nmi backtrace code surely gets us something that works across all architectures... I've been assuming that I've missed something, which is why I've not said anything on that point until now.
On Fri, 11 Dec 2015, Russell King - ARM Linux wrote: > I'm personally happy with the existing code, and I've been wondering why > there's this effort to apply further cleanups - to me, the changelogs > don't seem to make that much sense, unless we want to start using > printk() extensively in NMI functions - using the generic nmi backtrace > code surely gets us something that works across all architectures... It is already being used extensively, and not only for all-CPU backtraces. For starters, please consider - WARN_ON(in_nmi()) - BUG_ON(in_nmi()) - anything being printed out from MCE handlers
On Fri, 11 Dec 2015 23:21:13 +0000 Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > On Fri, Dec 11, 2015 at 02:57:25PM -0800, Andrew Morton wrote: > > This is a bit messy. NEED_PRINTK_NMI is an added-on hack for one > > particular arm variant. From the changelog: > > > > "One exception is arm where the deferred printing is used for > > printing backtraces even without NMI. For this purpose, we define > > NEED_PRINTK_NMI Kconfig flag. The alternative printk_func is > > explicitly set when IPI_CPU_BACKTRACE is handled." > > > > > > - why does arm needs deferred printing for backtraces? > > > > - why is this specific to CONFIG_CPU_V7M? > > > > - can this Kconfig logic be cleaned up a bit? > > I think this comes purely from this attempt to apply another round of > cleanups to the nmi backtrace work I did. > > As I explained when I did that work, the vast majority of ARM platforms > are unable to trigger anything like a NMI - the FIQ is something that's > generally a property of the secure monitor, and is not accessible to > Linux. However, there are platforms where it is accessible. OK, thanks. So "not needed at present, might be needed in the future, useful for out-of-tree debug code"? > I'm personally happy with the existing code, and I've been wondering why > there's this effort to apply further cleanups - to me, the changelogs > don't seem to make that much sense, unless we want to start using > printk() extensively in NMI functions - using the generic nmi backtrace > code surely gets us something that works across all architectures... Yes, I was scratching my head over that. The patchset takes an nmi-safe all-cpu-backtrace and generalises that into an nmi-safe printk. That *sounds* like a good thing to do but yes, some additional justification would be helpful. What real-world value does this patchset really bring to real-world users?
On 11/12/15 23:21, Russell King - ARM Linux wrote: > As I explained when I did that work, the vast majority of ARM platforms > are unable to trigger anything like a NMI - the FIQ is something that's > generally a property of the secure monitor, and is not accessible to > Linux. However, there are platforms where it is accessible. > > The work to add the FIQ-based variant never happened (I've no idea what > happened to that part, Daniel seems to have lost interest in working on > it.) So, what we have is the IRQ-based variant merged in mainline, which > would be the fallback for the "FIQ not available" cases, and I carry a > local hack in my tree which provides the FIQ-based version - but if it > were to trigger, it takes out all interrupts (hence why I've not merged > my hack.) > > I think the reason that the FIQ-based variant has never really happened > is that hooking into the interrupt controller code to clear down the FIQ > creates such a horrid layering violation, and also a locking mess that > I suspect it's just been given up with. I haven't quite given up; I'm still looking into this stuff. However you're certainly right that connecting the FIQ handler to the GIC code in an elegant way is tough. I've been working in parallel on an arm64 implementation with the result that I'm now two lumps of code that are almost, but not quite, ready. Right now I hope to share latest arm code fairly late in the this devcycle (for review rather than merge) followed up with a new version very early in v4.6. Even now I think the code needs a long soak in -next just in case there are any lurking regressions on particular platforms. I don't expect anyone to base decisions on my aspirations above but would like to reassure Russell that I haven't given up on it. Daniel.
On Fri 2015-12-11 15:30:54, Andrew Morton wrote: > On Fri, 11 Dec 2015 23:21:13 +0000 Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > > > On Fri, Dec 11, 2015 at 02:57:25PM -0800, Andrew Morton wrote: > > > This is a bit messy. NEED_PRINTK_NMI is an added-on hack for one > > > particular arm variant. From the changelog: > > > > > > "One exception is arm where the deferred printing is used for > > > printing backtraces even without NMI. For this purpose, we define > > > NEED_PRINTK_NMI Kconfig flag. The alternative printk_func is > > > explicitly set when IPI_CPU_BACKTRACE is handled." > > > > > > > > > - why does arm needs deferred printing for backtraces? > > > > > > - why is this specific to CONFIG_CPU_V7M? > > > - can this Kconfig logic be cleaned up a bit? > > > > I think this comes purely from this attempt to apply another round of > > cleanups to the nmi backtrace work I did. > > > > As I explained when I did that work, the vast majority of ARM platforms > > are unable to trigger anything like a NMI - the FIQ is something that's > > generally a property of the secure monitor, and is not accessible to > > Linux. However, there are platforms where it is accessible. > > OK, thanks. So "not needed at present, might be needed in the future, > useful for out-of-tree debug code"? It is possible that I got it a wrong way on arm. The NMI buffer is usable there on two locations. First, the temporary is currently used to handle IPI_CPU_BACKTRACE. It seems that it is not a real NMI. But it seems to be available (compiled) on all arm system. This is why I introduced NEED_PRINTK_NMI Kconfig flag to avoid confusion with a real NMI. Second, there is the FIQ "NMI" handler that is called from /arch/arm/kernel/entry-armv.S. It is compiled only if _not_ defined $(CONFIG_CPU_V7M). It calls nmi_enter() and nmi_stop(). It looks like a real NMI handler. This is why I defined HAVE_NMI if (!CPU_V7M). A solution would be to define HAVE_NMI on all Arm systems and get rid of NEED_PRINTK_NMI. If you think that it would cause less confusion... > > there's this effort to apply further cleanups - to me, the changelogs > > don't seem to make that much sense, unless we want to start using > > printk() extensively in NMI functions - using the generic nmi backtrace > > code surely gets us something that works across all architectures... > > Yes, I was scratching my head over that. The patchset takes an nmi-safe > all-cpu-backtrace and generalises that into an nmi-safe printk. That > *sounds* like a good thing to do but yes, some additional justification > would be helpful. What real-world value does this patchset really > bring to real-world users? The patchset brings two big advantages. First, it makes the NMI backtraces safe on all architectures for free. Second, it makes all NMI messages almost[*] safe on all architectures. Note that there already are several messages printed in NMI context. See the mail from Jiri Kosina. They are not easy to avoid. [*] The temporary buffer is limited. We still should keep the number of messages in NMI context at minimum. Best Regards, Petr
On Tue, 15 Dec 2015 15:26:21 +0100 Petr Mladek <pmladek@suse.com> wrote: > > OK, thanks. So "not needed at present, might be needed in the future, > > useful for out-of-tree debug code"? > > It is possible that I got it a wrong way on arm. The NMI buffer is > usable there on two locations. > > First, the temporary is currently used to handle IPI_CPU_BACKTRACE. > It seems that it is not a real NMI. But it seems to be available > (compiled) on all arm system. This is why I introduced NEED_PRINTK_NMI > Kconfig flag to avoid confusion with a real NMI. > > Second, there is the FIQ "NMI" handler that is called from > /arch/arm/kernel/entry-armv.S. It is compiled only if _not_ > defined $(CONFIG_CPU_V7M). It calls nmi_enter() and nmi_stop(). > It looks like a real NMI handler. This is why I defined HAVE_NMI > if (!CPU_V7M). > > A solution would be to define HAVE_NMI on all Arm systems and get rid > of NEED_PRINTK_NMI. If you think that it would cause less confusion... So does this mean that the patch will be updated? > > > > there's this effort to apply further cleanups - to me, the changelogs > > > don't seem to make that much sense, unless we want to start using > > > printk() extensively in NMI functions - using the generic nmi backtrace > > > code surely gets us something that works across all architectures... > > > > Yes, I was scratching my head over that. The patchset takes an nmi-safe > > all-cpu-backtrace and generalises that into an nmi-safe printk. That > > *sounds* like a good thing to do but yes, some additional justification > > would be helpful. What real-world value does this patchset really > > bring to real-world users? > > The patchset brings two big advantages. First, it makes the NMI > backtraces safe on all architectures for free. Second, it makes > all NMI messages almost[*] safe on all architectures. > > Note that there already are several messages printed in NMI context. > See the mail from Jiri Kosina. They are not easy to avoid. > > [*] The temporary buffer is limited. We still should keep > the number of messages in NMI context at minimum. This is important info - in fact a paragraph which starts with "The patchset brings two big advantages" is *the most* important info. I added the below text to the [1/n] changelog: : The patchset brings two big advantages. First, it makes the NMI : backtraces safe on all architectures for free. Second, it makes all NMI : messages almost safe on all architectures (the temporary buffer is : limited. We still should keep the number of messages in NMI context at : minimum). : : Note that there already are several messages printed in NMI context: : WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE : handlers. These are not easy to avoid.
On 11/12/15 23:26, Jiri Kosina wrote: > On Fri, 11 Dec 2015, Russell King - ARM Linux wrote: > >> I'm personally happy with the existing code, and I've been wondering why >> there's this effort to apply further cleanups - to me, the changelogs >> don't seem to make that much sense, unless we want to start using >> printk() extensively in NMI functions - using the generic nmi backtrace >> code surely gets us something that works across all architectures... > > It is already being used extensively, and not only for all-CPU backtraces. > For starters, please consider > > - WARN_ON(in_nmi()) > - BUG_ON(in_nmi()) Sorry to join in so late but... Today we risk deadlock when we try to issue these diagnostic errors directly from NMI context. After this change we will still risk deadlock, because that's what the diagnostic code is trying to tell us, *and* we delay actually reporting the error until, and only if, the NMI handler completes. I'm not entirely sure that this is an improvement. > - anything being printed out from MCE handlers The MCE handlers should only call printk() when they decide to panic and *after* busting the spinlocks. At this point deferring printk() until it is safe is not very helpful. When we bust the spinlocks we should probably restore the normal printk() function to give best chance of the failure messages making it out. Daniel.
On Fri, Dec 18, 2015 at 10:18:08AM +0000, Daniel Thompson wrote:
> I'm not entirely sure that this is an improvement.
What I do these days is delete everything in vprintk_emit() and simply
call early_printk().
Kill the useless kmsg buffer crap and locking, just pound bytes to the
UART registers without anything in between.
The other semi usable solution is redirecting to trace_printk() and
recovering the trace buffers from your kdump. But I've found that
typically kdump doesn't work anymore if you properly wedge the machine.
So this is very much a second rate solution.
But this globally locked buffer, calling out to console drivers that do
locking and even scheduling, is an unreliable unfixable trainwreck that
I've given up on.
On Fri, Dec 18, 2015 at 12:29:02PM +0100, Peter Zijlstra wrote: > On Fri, Dec 18, 2015 at 10:18:08AM +0000, Daniel Thompson wrote: > > I'm not entirely sure that this is an improvement. > > What I do these days is delete everything in vprintk_emit() and simply > call early_printk(). On that, whoever made the device model use vprintk_emit() broke the debugger (KGDB/KDB) printk intercept, and the whole vprintk_func redirection scheme.
On Fri 2015-12-18 10:18:08, Daniel Thompson wrote: > On 11/12/15 23:26, Jiri Kosina wrote: > >On Fri, 11 Dec 2015, Russell King - ARM Linux wrote: > > > >>I'm personally happy with the existing code, and I've been wondering why > >>there's this effort to apply further cleanups - to me, the changelogs > >>don't seem to make that much sense, unless we want to start using > >>printk() extensively in NMI functions - using the generic nmi backtrace > >>code surely gets us something that works across all architectures... > > > >It is already being used extensively, and not only for all-CPU backtraces. > >For starters, please consider > > > >- WARN_ON(in_nmi()) > >- BUG_ON(in_nmi()) > > Sorry to join in so late but... > > Today we risk deadlock when we try to issue these diagnostic errors > directly from NMI context. > > After this change we will still risk deadlock, because that's what > the diagnostic code is trying to tell us, *and* we delay actually > reporting the error until, and only if, the NMI handler completes. I think that NMI messages about a possible deadlock are the ones from kernel/locking/rtmutex.c kernel/irq_work.c include/linux/hardirq.h You are right that if the deadlock happens, this patch set lowers the chance to see the message. On the other hand, all the other printk's in NMI seems to be non-fatal warnings. In this case, this patch set increases the chance to see them. A compromise might be to explicitly call printk_nmi_flush() in the few fatal cases. Alternatively we could force the messages on the early_console when available. > >- anything being printed out from MCE handlers > > The MCE handlers should only call printk() when they decide to panic > and *after* busting the spinlocks. At this point deferring printk() > until it is safe is not very helpful. > > When we bust the spinlocks we should probably restore the normal > printk() function to give best chance of the failure messages making > it out. The problem is that we do not know what locks need to be busted. There are too many consoles and too many locks involved. Also busting locks open another can of worms. Best Regards, Petr
On 18/12/15 14:52, Petr Mladek wrote: > On Fri 2015-12-18 10:18:08, Daniel Thompson wrote: >> On 11/12/15 23:26, Jiri Kosina wrote: >>> On Fri, 11 Dec 2015, Russell King - ARM Linux wrote: >>> >>>> I'm personally happy with the existing code, and I've been wondering why >>>> there's this effort to apply further cleanups - to me, the changelogs >>>> don't seem to make that much sense, unless we want to start using >>>> printk() extensively in NMI functions - using the generic nmi backtrace >>>> code surely gets us something that works across all architectures... >>> >>> It is already being used extensively, and not only for all-CPU backtraces. >>> For starters, please consider >>> >>> - WARN_ON(in_nmi()) >>> - BUG_ON(in_nmi()) >> >> Sorry to join in so late but... >> >> Today we risk deadlock when we try to issue these diagnostic errors >> directly from NMI context. >> >> After this change we will still risk deadlock, because that's what >> the diagnostic code is trying to tell us, *and* we delay actually >> reporting the error until, and only if, the NMI handler completes. > > I think that NMI messages about a possible deadlock are the ones > from > > kernel/locking/rtmutex.c > kernel/irq_work.c > include/linux/hardirq.h > > You are right that if the deadlock happens, this patch set lowers the > chance to see the message. > > On the other hand, all the other printk's in NMI seems to be non-fatal > warnings. In this case, this patch set increases the chance to see > them. Maybe for a WARN_ON() the trade off is worth it but I don't think a BUG_ON() trace would ever make it out. > A compromise might be to explicitly call printk_nmi_flush() in the few > fatal cases. Alternatively we could force the messages on the > early_console when available. > > >>> - anything being printed out from MCE handlers >> >> The MCE handlers should only call printk() when they decide to panic >> and *after* busting the spinlocks. At this point deferring printk() >> until it is safe is not very helpful. >> >> When we bust the spinlocks we should probably restore the normal >> printk() function to give best chance of the failure messages making >> it out. > > The problem is that we do not know what locks need to be busted. There > are too many consoles and too many locks involved. Also busting locks > open another can of worms. Yes, I agree that busting the spinlocks doesn't avoid all risk of deadlock. Probably I've been placing too much weight on the importance of getting messages out when dying. You're right that surviving far enough through a panic to trigger kdump or reset is equally (or more) important in many scenarios than getting a failure message out. However on a system with nothing but "while(1) {}" hooked up to panic() then its worth risking a lock up. In this case restoring normal printk() behavior and dumping the NMI buffers would be worthwhile. Daniel.
On Fri, 18 Dec 2015 13:11:41 +0100 Peter Zijlstra <peterz@infradead.org> wrote: > On Fri, Dec 18, 2015 at 12:29:02PM +0100, Peter Zijlstra wrote: > > On Fri, Dec 18, 2015 at 10:18:08AM +0000, Daniel Thompson wrote: > > > I'm not entirely sure that this is an improvement. > > > > What I do these days is delete everything in vprintk_emit() and simply > > call early_printk(). > > On that, whoever made the device model use vprintk_emit() broke the > debugger (KGDB/KDB) printk intercept, and the whole vprintk_func > redirection scheme. crap, we have a whole set of interfaces which are broken this way. printk_emit(), vprintk(), vprintk_emit(). commit 7ff9554bb578ba02166071d2d487b7fc7d860d62 Author: Kay Sievers <kay@vrfy.org> AuthorDate: Thu May 3 02:29:13 2012 +0200 Commit: Greg Kroah-Hartman <gregkh@linuxfoundation.org> CommitDate: Mon May 7 16:53:02 2012 -0700 printk: convert byte-buffer to variable-length record buffer
On 18/12/15 17:00, Daniel Thompson wrote: >>> The MCE handlers should only call printk() when they decide to panic >>> and *after* busting the spinlocks. At this point deferring printk() >>> until it is safe is not very helpful. >>> >>> When we bust the spinlocks we should probably restore the normal >>> printk() function to give best chance of the failure messages making >>> it out. >> >> The problem is that we do not know what locks need to be busted. There >> are too many consoles and too many locks involved. Also busting locks >> open another can of worms. > > Yes, I agree that busting the spinlocks doesn't avoid all risk of deadlock. > > Probably I've been placing too much weight on the importance of getting > messages out when dying. You're right that surviving far enough through > a panic to trigger kdump or reset is equally (or more) important in many > scenarios than getting a failure message out. > > However on a system with nothing but "while(1) {}" hooked up to panic() > then its worth risking a lock up. In this case restoring normal printk() > behavior and dumping the NMI buffers would be worthwhile. An a (much) later thread[1] Andrew Morton described this comment as non-committal. Sorry for that. I don't object to the overall approach taken by Petr, merely that I think there are cases where the current patchset doesn't put in quite enough effort to issue messages. Panic triggered during NMI is the only case I can think of and that, I think, could be addressed by adding an extra call to printk_nmi_flush() during panic(). It should probably only cover cases where we *don't* call kdump and the panic handler doesn't restart the machine... so just after the pr_emerg("...end kernel panic") would be OK for me. Daniel. [1]: http://thread.gmane.org/gmane.linux.ports.arm.kernel/482845
diff --git a/init/Kconfig b/init/Kconfig index efcff25a112d..61cfd96a3c96 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -870,7 +870,7 @@ config NMI_LOG_BUF_SHIFT int "Temporary per-CPU NMI log buffer size (12 => 4KB, 13 => 8KB)" range 10 21 default 13 - depends on PRINTK && HAVE_NMI + depends on PRINTK_NMI help Select the size of a per-CPU buffer where NMI messages are temporary stored. They are copied to the main log buffer in a safe context