diff mbox

[v7,4/4] nmi_backtrace: generate one-line reports for idle cpus

Message ID 20160811152538.GH13300@pathway.suse.cz (mailing list archive)
State New, archived
Headers show

Commit Message

Petr Mladek Aug. 11, 2016, 3:25 p.m. UTC
On Tue 2016-08-09 12:43:58, Chris Metcalf wrote:
> On 8/9/2016 8:43 AM, Petr Mladek wrote:
> >On Mon 2016-08-08 12:03:38, Chris Metcalf wrote:
> >>When doing an nmi backtrace of many cores, most of which are idle,
> >>the output is a little overwhelming and very uninformative.  Suppress
> >>messages for cpus that are idling when they are interrupted and just
> >>emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".
> >Hmm, I do not see this message even though the CPU is in the idle state:
> >
> >[ 7918.884535] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.8.0-rc1-4-default+ #3088
> >[ 7918.884538] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> >[ 7918.884539] task: ffff88013a594380 task.stack: ffff88013a598000
> >[ 7918.884541] RIP: 0010:[<ffffffff81050bc6>]  [<ffffffff81050bc6>] native_safe_halt+0x6/0x10
> >[ 7918.884543] RSP: 0018:ffff88013a59bea8  EFLAGS: 00000206
> >[ 7918.884544] RAX: ffff88013a594380 RBX: 0000000000000003 RCX: 0000000000000000
> >[ 7918.884546] RDX: ffff88013a594380 RSI: 0000000000000001 RDI: ffff88013a594380
> >[ 7918.884548] RBP: ffff88013a59bea8 R08: 0000000000000000 R09: 0000000000000000
> >[ 7918.884550] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000003
> >[ 7918.884551] R13: 0000000000000000 R14: ffff88013a598000 R15: ffff88013a598000
> >[ 7918.884553] FS:  0000000000000000(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000
> >[ 7918.884554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >[ 7918.884556] CR2: 00007f8afc65e000 CR3: 00000001383b8000 CR4: 00000000000006e0
> >[ 7918.884557] Stack:
> >[ 7918.884559]  ffff88013a59bec8 ffffffff819573d3 0000000000000003 0000000000000000
> >[ 7918.884561]  ffff88013a59bed8 ffffffff8102628f ffff88013a59bee8 ffffffff819579ea
> >[ 7918.884562]  ffff88013a59bf30 ffffffff810bfe1a ffff88013a598000 ffff88013a598000
> >[ 7918.884563] Call Trace:
> >[ 7918.884565]  [<ffffffff819573d3>] default_idle+0x23/0x170
> >[ 7918.884566]  [<ffffffff8102628f>] arch_cpu_idle+0xf/0x20
> >[ 7918.884568]  [<ffffffff819579ea>] default_idle_call+0x2a/0x50
> >[ 7918.884570]  [<ffffffff810bfe1a>] cpu_startup_entry+0x16a/0x260
> >[ 7918.884571]  [<ffffffff8103faf6>] start_secondary+0xf6/0x100
> >[ 7918.884573] Code: 00 00 00 00 00 55 48 89 e5 fa 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55
> >  48 89 e5 f4 5d c3 66 0f 1f 84
> >
> >Note that I test it in a virtual machine using qemu.
> >
> >The strange thing is that I do not see .cpuidle.text section in
> >the vmlinux binary. But it is possible that I have misunderstood
> >the concept.
> 
> The .cpuidle.text sections are merged into the final kernel's overall
> text segment.  What you should see is something like this in the "nm -n"
> output from the built vmlinux:
> 
> [...]
> ffffffff81922aa8 T __cpuidle_text_start
> ffffffff81922ab0 T default_idle
> ffffffff81922b90 t mwait_idle
> ffffffff81922d20 T acpi_processor_ffh_cstate_enter
> ffffffff81922df0 T default_idle_call
> ffffffff81922e30 t cpu_idle_poll
> ffffffff81922f50 t intel_idle
> ffffffff81923085 t acpi_idle_do_entry
> ffffffff819230d0 t poll_idle
> ffffffff81923143 T __cpuidle_text_end
> [...]
> 
> In other words, all the cpuidle functions grouped together and bracketed by
> the __cpuidle_text_{start,end} symbols.
> 
> Perhaps you were running with a kernel that didn't have the actual patch 4/4
> applied, but just the earlier patches?  Or perhaps your host Linux has been
> patched, but not the guest Linux running in qemu?  Or perhaps you are
> ending up doing an NMI backtrace on the host kernel, not the guest?

Hmm, the problem is that native_safe_halt() is called from default_idle()
here. The function is marked as inline but the compiler did not inline
it.

It helped me to put native_safe_halt() into the __cpuidle_text section:



I did not use __cpuidle macro because I was not able to include
linux/cpu.h into that header.

I wonder if it would be possible to detect the idle thread an other
way. For example, I wonder if it would be enough to check for the
PID 0.


Best Regards,
Petr

Comments

Peter Zijlstra Aug. 11, 2016, 3:36 p.m. UTC | #1
On Thu, Aug 11, 2016 at 05:25:38PM +0200, Petr Mladek wrote:
> diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
> index b77f5edb03b0..e31d50acd491 100644
> --- a/arch/x86/include/asm/irqflags.h
> +++ b/arch/x86/include/asm/irqflags.h
> @@ -44,7 +44,7 @@ static inline void native_irq_enable(void)
>  	asm volatile("sti": : :"memory");
>  }
>  
> -static inline void native_safe_halt(void)
> +static inline __attribute__((__section__(".cpuidle.text"))) void native_safe_halt(void)
>  {
>  	asm volatile("sti; hlt": : :"memory");
>  }

An alternative is to use __always_inline I suppose.
Chris Metcalf Aug. 15, 2016, 4:41 p.m. UTC | #2
On 8/11/2016 11:25 AM, Petr Mladek wrote:
> On Mon 2016-08-08 12:03:38, Chris Metcalf wrote:
>>>> When doing an nmi backtrace of many cores, most of which are idle,
>>>> the output is a little overwhelming and very uninformative.  Suppress
>>>> messages for cpus that are idling when they are interrupted and just
>>>> emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".
> Hmm, the problem is that native_safe_halt() is called from default_idle()
> here. The function is marked as inline but the compiler did not inline
> it.
>
> It helped me to put native_safe_halt() into the __cpuidle_text section:

Following Peter Z's suggestion, I have added an __always_inline marker
to native_safe_halt.  For consistency, I also did arch_safe_halt(), since that
invokes native_safe_halt, and then also native_halt() and halt(), so that
we're not being weirdly inconsistent with markings for halt inlines.

There are also the native_irq_enable(), etc., accessors in that same header
that are still only marked "inline" not "always_inline", but I wanted to stop
before I was touching too much unrelated code.

> I wonder if it would be possible to detect the idle thread an other
> way. For example, I wonder if it would be enough to check for the
> PID 0.

No, the problem is that pid 0 can also go off and run "interesting" code
for things like power management, etc., so we really just want to
focus on being quite sure that the running code isn't interesting before
we suppress the backtrace information.

See the thread around here:

https://lkml.kernel.org/r/20160307204317.GR6344@twins.programming.kicks-ass.net
Petr Mladek Aug. 16, 2016, 8:04 a.m. UTC | #3
On Mon 2016-08-15 12:41:54, Chris Metcalf wrote:
> On 8/11/2016 11:25 AM, Petr Mladek wrote:
> >On Mon 2016-08-08 12:03:38, Chris Metcalf wrote:
> >>>>When doing an nmi backtrace of many cores, most of which are idle,
> >>>>the output is a little overwhelming and very uninformative.  Suppress
> >>>>messages for cpus that are idling when they are interrupted and just
> >>>>emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".
> >Hmm, the problem is that native_safe_halt() is called from default_idle()
> >here. The function is marked as inline but the compiler did not inline
> >it.
> >
> >It helped me to put native_safe_halt() into the __cpuidle_text section:
> 
> Following Peter Z's suggestion, I have added an __always_inline marker
> to native_safe_halt.  For consistency, I also did arch_safe_halt(), since that
> invokes native_safe_halt, and then also native_halt() and halt(), so that
> we're not being weirdly inconsistent with markings for halt inlines.
> 
> There are also the native_irq_enable(), etc., accessors in that same header
> that are still only marked "inline" not "always_inline", but I wanted to stop
> before I was touching too much unrelated code.

Sounds fine.

> >I wonder if it would be possible to detect the idle thread an other
> >way. For example, I wonder if it would be enough to check for the
> >PID 0.
> 
> No, the problem is that pid 0 can also go off and run "interesting" code
> for things like power management, etc., so we really just want to
> focus on being quite sure that the running code isn't interesting before
> we suppress the backtrace information.
> 
> See the thread around here:
> 
> https://lkml.kernel.org/r/20160307204317.GR6344@twins.programming.kicks-ass.net

Makes sense. Thanks for the poitner.

Best Regards,
Petr
diff mbox

Patch

diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index b77f5edb03b0..e31d50acd491 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -44,7 +44,7 @@  static inline void native_irq_enable(void)
 	asm volatile("sti": : :"memory");
 }
 
-static inline void native_safe_halt(void)
+static inline __attribute__((__section__(".cpuidle.text"))) void native_safe_halt(void)
 {
 	asm volatile("sti; hlt": : :"memory");
 }