Message ID | 20170421015724.GA586@jagdpanzerIV.localdomain (mailing list archive)
---|---
State | New, archived
On Fri 2017-04-21 10:57:25, Sergey Senozhatsky wrote:
> On (04/20/17 15:11), Petr Mladek wrote:
> [..]
> > void printk_nmi_enter(void)
> > {
> > -	this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> > +	/*
> > +	 * The size of the extra per-CPU buffer is limited. Use it
> > +	 * only when really needed.
> > +	 */
> > +	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK ||
> > +	    raw_spin_is_locked(&logbuf_lock)) {
> > +		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> > +	} else {
> > +		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);
> > +	}
> > }
>
> well... the logbuf_lock can temporarily be locked from another CPU. I'd
> say that spin_is_locked() has better chances for a false positive than
> this_cpu_read(printk_context), because this_cpu_read(printk_context)
> depends only on this CPU's state, while spin_is_locked() depends on all
> CPUs. and the idea with this_cpu_read(printk_context) was that we check
> if the logbuf_lock was locked from this particular CPU.

I finally see the point. I confess that I did not think much about this
approach yesterday because it looked too tricky (indirect) and possibly
error prone.

> I agree that this_cpu_read(printk_context) covers slightly more than
> logbuf_lock scope, so we may get positive this_cpu_read(printk_context)
> with unlocked logbuf_lock, but I don't tend to think that it's a big
> problem.

PRINTK_SAFE_CONTEXT is set also in call_console_drivers(). It might take
rather long and logbuf_lock is available. So, it is a noticeable source
of false positives.

Hmm, my version actually checks both the lock and the context. It is
very defensive, to always be on the safe side. We could get the best
results with both checks and by using "&&":

void printk_nmi_enter(void)
{
	/*
	 * The size of the extra per-CPU buffer is limited. Use it
	 * only when the lock for the main log buffer is not
	 * available.
	 *
	 * logbuf_lock might be taken on another CPU. But it must be
	 * in PRINTK_SAFE context. Reduce false positives by a check
	 * of the context.
	 */
	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK &&
	    raw_spin_is_locked(&logbuf_lock)) {
		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
	} else {
		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);
	}
}

But after all, I would prefer to keep it simple and straightforward, and
check only the logbuf_lock:

	if (raw_spin_is_locked(&logbuf_lock))
		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
	else
		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);

> wouldn't something as simple as below do the trick?
> // absolutely and completely untested //
>
> diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
> index 033e50a7d706..c7477654c5b1 100644
> --- a/kernel/printk/printk_safe.c
> +++ b/kernel/printk/printk_safe.c
> @@ -303,7 +303,10 @@ static int vprintk_nmi(const char *fmt, va_list args)
>  {
>  	struct printk_safe_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
>
> -	return printk_safe_log_store(s, fmt, args);
> +	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK)
> +		return printk_safe_log_store(s, fmt, args);
> +
> +	return vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
>  }

It looks simple but some things are missing. It will be used also
outside panic/oops, so it should queue the irq_work to flush the
console. Also the serialization of nmi_cpu_backtrace() calls is needed.
All in all, we could get rid only of the extra
PRINTK_NMI_DEFERRED_CONTEXT_MASK with this approach. Other than that it
looks more tricky to me.

Sigh, I hate problems without a single best solution.

Best Regards,
Petr
On (04/21/17 14:06), Petr Mladek wrote:
[..]
> > I agree that this_cpu_read(printk_context) covers slightly more than
> > logbuf_lock scope, so we may get positive this_cpu_read(printk_context)
> > with unlocked logbuf_lock, but I don't tend to think that it's a big
> > problem.
>
> PRINTK_SAFE_CONTEXT is set also in call_console_drivers().
> It might take rather long and logbuf_lock is available. So, it is
> a noticeable source of false positives.

yes, agree.

probably we need additional printk_safe annotations for
"logbuf_lock is locked from _this_ CPU"

false positives there can be very painful.

[..]
> 	if (raw_spin_is_locked(&logbuf_lock))
> 		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> 	else
> 		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);

well, if everyone is fine with logbuf_lock access from every CPU from
every NMI then I won't object either. but maybe it makes sense to reduce
the possibility of false positives. Steven is losing critically
important logs, after all.

by the way,
does this `nmi_print_seq' bypass even fix anything for Steven? it sort
of can, in theory, but just in theory. so maybe we need direct message
flush from the NMI handler (printk->console_unlock), which will be a
really big problem. logbuf might not be big enough for 4890096 messages
(Steven's report mentions "Lost 4890096 message(s)!"). we are counting
on the fact that in case of `nmi_print_seq' bypass some other CPU will
call console_unlock() and print pending logbuf messages, but this is
not guaranteed and the messages can be dropped even from logbuf.

I don't know, should we try to queue printk_deferred irq_work for all
online CPUs from vprintk_nmi() when it bypasses printk_safe_log_store()?
in order to minimize possibilities of logbuf overflow. printk_deferred()
will queue work on the vprintk_nmi() CPU, sure, but we don't know how
many messages we are going to add to logbuf from NMI.
> > @@ -303,7 +303,10 @@ static int vprintk_nmi(const char *fmt, va_list args)
> >  {
> >  	struct printk_safe_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
> >
> > -	return printk_safe_log_store(s, fmt, args);
> > +	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK)
> > +		return printk_safe_log_store(s, fmt, args);
> > +
> > +	return vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
> >  }
>
> It looks simple but some things are missing. It will be used also
> outside panic/oops, so it should queue the irq_work to flush the
> console.

you are right. I thought about moving irq_work to vprintk_emit(), but
completely forgot about it. without that missing bit the proposed
two-liner is not complete.

	-ss
On Mon 2017-04-24 11:17:47, Sergey Senozhatsky wrote:
> On (04/21/17 14:06), Petr Mladek wrote:
> [..]
> > > I agree that this_cpu_read(printk_context) covers slightly more than
> > > logbuf_lock scope, so we may get positive this_cpu_read(printk_context)
> > > with unlocked logbuf_lock, but I don't tend to think that it's a big
> > > problem.
> >
> > PRINTK_SAFE_CONTEXT is set also in call_console_drivers().
> > It might take rather long and logbuf_lock is available. So, it is
> > a noticeable source of false positives.
>
> yes, agree.
>
> probably we need additional printk_safe annotations for
> "logbuf_lock is locked from _this_ CPU"
>
> false positives there can be very painful.
>
> [..]
> > 	if (raw_spin_is_locked(&logbuf_lock))
> > 		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> > 	else
> > 		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);
>
> well, if everyone is fine with logbuf_lock access from every CPU from
> every NMI then I won't object either. but maybe it makes sense to reduce
> the possibility of false positives. Steven is losing critically
> important logs, after all.
>
> by the way,
> does this `nmi_print_seq' bypass even fix anything for Steven?

I think that this is the most important question.

Steven, does the patch from
https://lkml.kernel.org/r/20170420131154.GL3452@pathway.suse.cz
help you to see the debug messages, please?

> it sort of can, in theory, but just in theory. so maybe we need direct
> message flush from the NMI handler (printk->console_unlock), which will
> be a really big problem.

I thought about it a lot and got scared where this might go. We need to
balance the usefulness and the complexity of the solution.

It took one year to discover this regression. Before, it was suggested
to avoid calling printk() in NMI context at all. Now, we are trying to
fix printk() to handle MBs of messages in NMI context.

If my proposed patch solves the problem for Steven, I would still like
to get a similar solution in. It is not that complex and helps to bypass
the limited per-CPU buffer in most cases. I always thought that 8kB
might not be enough in some cases.

Note that my patch is very defensive. It uses the main log buffer only
when it is really safe. It has higher potential for unneeded fallback
but if it works for Steven (a really existing usecase), ...

On the other hand, I would prefer to avoid any much more complex
solution until we have real reports that they are needed.

Also we need to look for alternatives. There is a chance to create a
crashdump and get the ftrace messages from it. Also this might be a
scenario where we might need to suggest the early_printk() patchset
from Peter Zijlstra.

> logbuf might not be big enough for 4890096 messages (Steven's report
> mentions "Lost 4890096 message(s)!"). we are counting on the fact that
> in case of `nmi_print_seq' bypass some other CPU will call
> console_unlock() and print pending logbuf messages, but this is not
> guaranteed and the messages can be dropped even from logbuf.

Yup. I tested the patch here and I needed to increase the main log
buffer size to see all ftrace messages. Fortunately, it was possible to
use a really huge global buffer. But it is not realistic to use huge
per-CPU ones.

Best Regards,
Petr
On Thu, 27 Apr 2017 15:38:19 +0200
Petr Mladek <pmladek@suse.com> wrote:

> > by the way,
> > does this `nmi_print_seq' bypass even fix anything for Steven?
>
> I think that this is the most important question.
>
> Steven, does the patch from
> https://lkml.kernel.org/r/20170420131154.GL3452@pathway.suse.cz
> help you to see the debug messages, please?

You'll have to wait for a bit. The box that I was debugging takes 45
minutes to reboot. And I don't have much more time to play on it before
I have to give it back. I already found the bug I was looking for and
I'm trying not to crash it again (due to the huge bring-up time).

When I get a chance, I'll see if I can insert a trigger to crash the
kernel from NMI on another box and see if this patch helps.

Thanks,

-- Steve
On Thu 2017-04-27 10:31:18, Steven Rostedt wrote:
> On Thu, 27 Apr 2017 15:38:19 +0200
> Petr Mladek <pmladek@suse.com> wrote:
>
> > > by the way,
> > > does this `nmi_print_seq' bypass even fix anything for Steven?
> >
> > I think that this is the most important question.
> >
> > Steven, does the patch from
> > https://lkml.kernel.org/r/20170420131154.GL3452@pathway.suse.cz
> > help you to see the debug messages, please?
>
> You'll have to wait for a bit. The box that I was debugging takes 45
> minutes to reboot. And I don't have much more time to play on it before
> I have to give it back. I already found the bug I was looking for and
> I'm trying not to crash it again (due to the huge bring-up time).

I see.

> When I get a chance, I'll see if I can insert a trigger to crash the
> kernel from NMI on another box and see if this patch helps.

I actually tested it here using this hack:

diff --cc lib/nmi_backtrace.c
index d531f85c0c9b,0bc0a3535a8a..000000000000
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@@ -89,8 -90,7 +90,9 @@@ bool nmi_cpu_backtrace(struct pt_regs *
 	int cpu = smp_processor_id();
 
 	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
+		if (in_nmi())
+			panic("Simulating panic in NMI\n");
+
 		arch_spin_lock(&lock);
 		if (regs && cpu_in_idle(instruction_pointer(regs))) {
 			pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n",
 				cpu, instruction_pointer(regs));

and triggered by:

	echo l > /proc/sysrq-trigger

The patch really helped to see much more (all) messages from the ftrace
buffers in NMI mode.

But the test is a bit artificial. The patch might not help when there
is big printk() activity on the system when the panic() is triggered.
We might wrongly use the small per-CPU buffer when the logbuf_lock is
tested and taken on another CPU at the same time. It means that it will
not always help.

I personally think that the patch might be good enough. I am not sure
if a perfect (more complex) solution is worth it.

Best Regards,
Petr
On Thu, 27 Apr 2017 17:28:07 +0200
Petr Mladek <pmladek@suse.com> wrote:

> > When I get a chance, I'll see if I can insert a trigger to crash the
> > kernel from NMI on another box and see if this patch helps.
>
> I actually tested it here using this hack:
>
> diff --cc lib/nmi_backtrace.c
> index d531f85c0c9b,0bc0a3535a8a..000000000000
> --- a/lib/nmi_backtrace.c
> +++ b/lib/nmi_backtrace.c
> @@@ -89,8 -90,7 +90,9 @@@ bool nmi_cpu_backtrace(struct pt_regs *
>  	int cpu = smp_processor_id();
>  
>  	if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
> +		if (in_nmi())
> +			panic("Simulating panic in NMI\n");
> +
>  		arch_spin_lock(&lock);

I was going to create an ftrace trigger, to crash on demand, but this
may do as well.

>  		if (regs && cpu_in_idle(instruction_pointer(regs))) {
>  			pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n",
>  				cpu, instruction_pointer(regs));
>
> and triggered by:
>
> 	echo l > /proc/sysrq-trigger
>
> The patch really helped to see much more (all) messages from the ftrace
> buffers in NMI mode.
>
> But the test is a bit artificial. The patch might not help when there
> is big printk() activity on the system when the panic() is triggered.
> We might wrongly use the small per-CPU buffer when the logbuf_lock is
> tested and taken on another CPU at the same time. It means that it will
> not always help.
>
> I personally think that the patch might be good enough. I am not sure
> if a perfect (more complex) solution is worth it.

I wasn't asking for perfect, as the previous solutions never were
either. I just want an optimistic dump if possible.

I'll try to get some time today to test this, and let you know. But it
won't be on the machine that I originally had the issue with.

Thanks,

-- Steve
On Thu, Apr 27, 2017 at 03:38:19PM +0200, Petr Mladek wrote:
> Also we need to look for alternatives. There is a chance
> to create crashdump and get the ftrace messages from it.
> Also this might be scenario when we might need to suggest
> the early_printk() patchset from Peter Zijlstra.

I'd be happy to repost those. I still carry them in my tree.
On Fri 2017-04-28 11:02:26, Peter Zijlstra wrote:
> On Thu, Apr 27, 2017 at 03:38:19PM +0200, Petr Mladek wrote:
> > Also we need to look for alternatives. There is a chance
> > to create crashdump and get the ftrace messages from it.
> > Also this might be scenario when we might need to suggest
> > the early_printk() patchset from Peter Zijlstra.
>
> I'd be happy to repost those. I still carry them in my tree.

You do not need to if they are still the same as
https://lkml.kernel.org/r/20161018170830.405990950@infradead.org

I rather do not promise anything, but I would like to look at them
within the next few weeks (after the merge window).

Best Regards,
Petr
On Fri, Apr 28, 2017 at 03:44:23PM +0200, Petr Mladek wrote:
> On Fri 2017-04-28 11:02:26, Peter Zijlstra wrote:
> > On Thu, Apr 27, 2017 at 03:38:19PM +0200, Petr Mladek wrote:
> > > Also we need to look for alternatives. There is a chance
> > > to create crashdump and get the ftrace messages from it.
> > > Also this might be scenario when we might need to suggest
> > > the early_printk() patchset from Peter Zijlstra.
> >
> > I'd be happy to repost those. I still carry them in my tree.
>
> You do not need to if they are still the same as
> https://lkml.kernel.org/r/20161018170830.405990950@infradead.org

More or less; I think I fixed an intermediate compile fail reported by
you and the 0-day thing, and I wrote a few comments in the last patch.

They live here fwiw:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git debug/tmp

> I rather do not promise anything but I would like to look at them
> within the next few weeks (after the merge window).

Sure, I'll post them again during/after the merge window.
On Fri, 28 Apr 2017 15:58:10 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Apr 28, 2017 at 03:44:23PM +0200, Petr Mladek wrote:
> > On Fri 2017-04-28 11:02:26, Peter Zijlstra wrote:
> > > On Thu, Apr 27, 2017 at 03:38:19PM +0200, Petr Mladek wrote:
> > > > Also we need to look for alternatives. There is a chance
> > > > to create crashdump and get the ftrace messages from it.
> > > > Also this might be scenario when we might need to suggest
> > > > the early_printk() patchset from Peter Zijlstra.
> > >
> > > I'd be happy to repost those. I still carry them in my tree.
> >
> > You do not need to if they are still the same as
> > https://lkml.kernel.org/r/20161018170830.405990950@infradead.org
>
> More or less, I think I fixed an intermediate compile fail reported by
> you and the 0-day thing and I wrote a few comments in the last patch.
>
> They live here fwiw:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git debug/tmp
>
> > I rather do not promise anything but I would like to look at them
> > within next few weeks (after the merge window).
>
> Sure, I'll post them again during/after the merge window.

BTW, this is what I ended up using to get the debug information I
needed.

-- Steve
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 033e50a7d706..c7477654c5b1 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -303,7 +303,10 @@ static int vprintk_nmi(const char *fmt, va_list args)
 {
 	struct printk_safe_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
 
-	return printk_safe_log_store(s, fmt, args);
+	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK)
+		return printk_safe_log_store(s, fmt, args);
+
+	return vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
 }

-ss