diff mbox

[v5,1/4] printk/nmi: generic solution for safe printk in NMI

Message ID 20170427121458.2be577cc@gandalf.local.home (mailing list archive)
State New, archived
Headers show

Commit Message

Steven Rostedt April 27, 2017, 4:14 p.m. UTC
On Thu, 20 Apr 2017 15:11:54 +0200
Petr Mladek <pmladek@suse.com> wrote:



> 
> >From c530d9dee91c74db5e6a198479e2e63b24cb84a2 Mon Sep 17 00:00:00 2001  
> From: Petr Mladek <pmladek@suse.com>
> Date: Thu, 20 Apr 2017 10:52:31 +0200
> Subject: [PATCH] printk: Use the main logbuf in NMI when logbuf_lock is
>  available

I tried this patch. It's better because I get the end of the trace, but
I do lose the beginning of it:

** 196358 printk messages dropped ** [  102.321182]     perf-5981    0.... 12983650us : d_path <-seq_path

The way I tested it was by adding this:



Then doing the following:

# echo 1 > /proc/sys/kernel/ftrace_dump_on_oops 
# trace-cmd start -p function
# echo nmi_handle:cpudump > /debug/tracing/set_ftrace_filter
# perf record -c 100 -a sleep 1

And that triggers the crash.

-- Steve

Comments

Sergey Senozhatsky April 28, 2017, 1:35 a.m. UTC | #1
On (04/27/17 12:14), Steven Rostedt wrote:
[..]
> I tried this patch. It's better because I get the end of the trace, but
> I do lose the beginning of it:
> 
> ** 196358 printk messages dropped ** [  102.321182]     perf-5981    0.... 12983650us : d_path <-seq_path

many thanks!

so we now drop messages from logbuf, not from per-CPU buffers. that
"queue printk_deferred irq_work on every online CPU when we bypass per-CPU
buffers from NMI" idea *probably* might help here - we need someone to emit
messages from the logbuf while we printk from NMI. there is still a
possibility that we can drop messages, though, since log_store() from NMI
CPU can be much-much faster than call_console_drivers() on other CPU.

	-ss
Petr Mladek April 28, 2017, 12:57 p.m. UTC | #2
On Fri 2017-04-28 10:35:32, Sergey Senozhatsky wrote:
> On (04/27/17 12:14), Steven Rostedt wrote:
> [..]
> > I tried this patch. It's better because I get the end of the trace, but
> > I do lose the beginning of it:
> > 
> > ** 196358 printk messages dropped ** [  102.321182]     perf-5981    0.... 12983650us : d_path <-seq_path
>
> many thanks!
> 
> so we now drop messages from logbuf, not from per-CPU buffers. that
> "queue printk_deferred irq_work on every online CPU when we bypass per-CPU
> buffers from NMI" idea *probably* might help here - we need someone to emit
> messages from the logbuf while we printk from NMI. there is still a
> possibility that we can drop messages, though, since log_store() from NMI
> CPU can be much-much faster than call_console_drivers() on other CPU.

ftrace log is dumped via trace_panic_notifier. It is done after
smp_send_stop(). It means that only a single CPU is available and
it is NMI context at the moment.

One possibility might be to put printk into a special mode and
drop the last messages instead of the first ones. But this would
need to be configurable.

Of course, if the problem is reproducible, the easiest solution
is to use bigger main log buffer, for example boot with
log_buf_len=32M.

Best Regards,
Petr
Steven Rostedt April 28, 2017, 2:16 p.m. UTC | #3
On Fri, 28 Apr 2017 14:57:25 +0200
Petr Mladek <pmladek@suse.com> wrote:


> Of course, if the problem is reproducible, the easiest solution
> is to use bigger main log buffer, for example boot with
> log_buf_len=32M.

Of course that may not be enough. Especially when I have a machine with
240 CPUs. But it also has a ton of RAM, I could easily do
log_buf_len=32G

-- Steve
diff mbox

Patch

Index: linux-trace.git/kernel/trace/trace_functions.c
===================================================================
--- linux-trace.git.orig/kernel/trace/trace_functions.c
+++ linux-trace.git/kernel/trace/trace_functions.c
@@ -469,8 +469,11 @@  ftrace_cpudump_probe(unsigned long ip, u
 		     struct trace_array *tr, struct ftrace_probe_ops *ops,
 		     void *data)
 {
-	if (update_count(ops, ip, data))
-		ftrace_dump(DUMP_ORIG);
+	char *killer = NULL;
+
+	panic_on_oops = 1;	/* force panic */
+	wmb();
+	*killer = 1;
 }
 
 static int