Message ID | 1458147733-29338-5-git-send-email-cmetcalf@mellanox.com (mailing list archive) |
---|---|
State | New, archived |
Hi Chris,

[auto build test ERROR on tile/master]
[also build test ERROR on v4.5]
[cannot apply to next-20160316]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url:    https://github.com/0day-ci/linux/commits/Chris-Metcalf/improvements-to-the-nmi_backtrace-code/20160317-010929
base:   https://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile.git master
config: xtensa-common_defconfig (attached as .config)
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=xtensa

All errors (new ones prefixed by >>):

   kernel/built-in.o: In function `SyS_setgroups':
>> (.text+0x16688): undefined reference to `__cpuidle_text_start'
   kernel/built-in.o: In function `SyS_setgroups':
>> (.text+0x1668c): undefined reference to `__cpuidle_text_end'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

On Wed, Mar 16, 2016 at 01:02:13PM -0400, Chris Metcalf wrote:
> diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
> index 48958d3cec9e..37afd721ec99 100644
> --- a/scripts/mod/modpost.c
> +++ b/scripts/mod/modpost.c
> @@ -887,8 +887,8 @@ static void check_section(const char *modname, struct elf_info *elf,
>  #define ALL_EXIT_SECTIONS EXIT_SECTIONS, ALL_XXXEXIT_SECTIONS
>
>  #define DATA_SECTIONS ".data", ".data.rel"
> -#define TEXT_SECTIONS ".text", ".text.unlikely", ".sched.text", \
> -		".kprobes.text"
> +#define TEXT_SECTIONS ".text", ".text.unlikely", \
> +		".kprobes.text", ".cpuidle.text"

Where did .sched.text go?

>  #define OTHER_TEXT_SECTIONS ".ref.text", ".head.text", ".spinlock.text", \
>  		".fixup", ".entry.text", ".exception.text", ".text.*", \
>  		".coldtext"

On Wed, Mar 16, 2016 at 01:02:13PM -0400, Chris Metcalf wrote:
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 9f7c21c22477..d569ae7fde37 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -298,7 +298,7 @@ void arch_cpu_idle(void)
>  /*
>   * We use this if we don't have any better idle routine..
>   */
> -void default_idle(void)
> +void __cpuidle default_idle(void)
>  {
>  	trace_cpu_idle_rcuidle(1, smp_processor_id());
>  	safe_halt();
> @@ -413,7 +413,7 @@ static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
>   * with interrupts enabled and no flags, which is backwards compatible with the
>   * original MWAIT implementation.
>   */
> -static void mwait_idle(void)
> +static __cpuidle void mwait_idle(void)
>  {
>  	if (!current_set_polling_and_test()) {
>  		trace_cpu_idle_rcuidle(1, smp_processor_id());

The most common idle function for x86 is: mwait_idle_with_hints(),
trouble is, its an inline, so I'm not sure adding __cpuidle to it does
anything.

I've yet to find the magic objdump incantation to check. Or rather
objdump -h doesn't appear to list .cpuidle.text at all :/

I'm probably doing something silly...

On 03/21/2016 11:38 AM, Peter Zijlstra wrote:
> On Wed, Mar 16, 2016 at 01:02:13PM -0400, Chris Metcalf wrote:
>> diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
>> index 48958d3cec9e..37afd721ec99 100644
>> --- a/scripts/mod/modpost.c
>> +++ b/scripts/mod/modpost.c
>> @@ -887,8 +887,8 @@ static void check_section(const char *modname, struct elf_info *elf,
>>  #define ALL_EXIT_SECTIONS EXIT_SECTIONS, ALL_XXXEXIT_SECTIONS
>>
>>  #define DATA_SECTIONS ".data", ".data.rel"
>> -#define TEXT_SECTIONS ".text", ".text.unlikely", ".sched.text", \
>> -		".kprobes.text"
>> +#define TEXT_SECTIONS ".text", ".text.unlikely", \
>> +		".kprobes.text", ".cpuidle.text"
> Where did .sched.text go?

Indeed! Good catch. I can't even speculate as to how I managed to
delete the thing on the previous line while adding something on the
following line :-)

On 03/21/2016 11:42 AM, Peter Zijlstra wrote:
> On Wed, Mar 16, 2016 at 01:02:13PM -0400, Chris Metcalf wrote:
>> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>> index 9f7c21c22477..d569ae7fde37 100644
>> --- a/arch/x86/kernel/process.c
>> +++ b/arch/x86/kernel/process.c
>> @@ -298,7 +298,7 @@ void arch_cpu_idle(void)
>>  /*
>>   * We use this if we don't have any better idle routine..
>>   */
>> -void default_idle(void)
>> +void __cpuidle default_idle(void)
>>  {
>>  	trace_cpu_idle_rcuidle(1, smp_processor_id());
>>  	safe_halt();
>> @@ -413,7 +413,7 @@ static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
>>   * with interrupts enabled and no flags, which is backwards compatible with the
>>   * original MWAIT implementation.
>>   */
>> -static void mwait_idle(void)
>> +static __cpuidle void mwait_idle(void)
>>  {
>>  	if (!current_set_polling_and_test()) {
>>  		trace_cpu_idle_rcuidle(1, smp_processor_id());
> The most common idle function for x86 is: mwait_idle_with_hints(),
> trouble is, its an inline, so I'm not sure adding __cpuidle to it does
> anything.

No, you're right, it wouldn't help. I didn't look at the drivers/cpuidle
subsystem at all in my patch, since I'm not that familiar with it,
but it seems like tagging acpi_processor_ffh_cstate_enter(), as the
only user of mwait_idle_with_hints(), will do the job.

I do see that native_play_dead() also uses mwait/monitor, but since
that's hotplug I don't think it's relevant to this patch series.

> I've yet to find the magic objdump incantation to check. Or rather
> objdump -h doesn't appear to list .cpuidle.text at all :/
>
> I'm probably doing something silly...

The easiest way to check for a given function is just to look
at the "nm -n" output and see that all the functions you expect
to reflect idle behavior are in the cpuidle begin/end range.
Or, to look at "objdump -dr" and search for monitor/mwait.

objdump -h certainly works to show .cpuidle.text if you look at
individual objects (e.g. arch/x86/kernel/process.o) but by the time
you're looking at the linked vmlinux image they have all been linked
into the giant .text section.

On Mon, Mar 21, 2016 at 12:15:12PM -0400, Chris Metcalf wrote:
> On 03/21/2016 11:42 AM, Peter Zijlstra wrote:
> >The most common idle function for x86 is: mwait_idle_with_hints(),
> >trouble is, its an inline, so I'm not sure adding __cpuidle to it does
> >anything.
>
> No, you're right, it wouldn't help. I didn't look at the drivers/cpuidle
> subsystem at all in my patch, since I'm not that familiar with it,
> but it seems like tagging acpi_processor_ffh_cstate_enter(), as the
> only user of mwait_idle_with_hints(), will do the job.

intel_idle() also uses it.

> >I've yet to find the magic objdump incantation to check. Or rather
> >objdump -h doesn't appear to list .cpuidle.text at all :/
> >
> >I'm probably doing something silly...
>
> The easiest way to check for a given function is just to look
> at the "nm -n" output and see that all the functions you expect
> to reflect idle behavior are in the cpuidle begin/end range.

# nm -n ivb-ep-build/vmlinux | awk '/__cpuidle_text_start/ {p=1} {if (p) print $0} /__cpuidle_text_end/ {p=0}'
ffffffff81b16ca8 T __cpuidle_text_start
ffffffff81b16cb0 T default_idle
ffffffff81b16e50 t mwait_idle
ffffffff81b17080 t cpu_idle_poll
ffffffff81b17280 T default_idle_call
ffffffff81b172be T __cpuidle_text_end

So no intel_idle for me..

> objdump -h certainly works to show .cpuidle.text if you look at
> individual objects (e.g. arch/x86/kernel/process.o) but by the time
> you're looking at the linked vmlinux image they have all been linked
> into the giant .text section.

Indeed.
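
The awk one-liner in the message above is a two-state filter: it starts printing at the `__cpuidle_text_start` line and stops after the `__cpuidle_text_end` line, both inclusive. For readers less used to awk, the same logic might be sketched in C roughly as follows. This is only an illustration; the helper name `cpuidle_filter_line` is hypothetical and is not part of the patch or of any tool mentioned in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch). Mirrors the awk filter
 *   /__cpuidle_text_start/ {p=1} {if (p) print $0} /__cpuidle_text_end/ {p=0}
 * Returns true if `line` should be printed. `*inside` carries the state
 * between calls and must start out false.
 */
bool cpuidle_filter_line(const char *line, bool *inside)
{
	bool print;

	assert(line != NULL && inside != NULL);

	if (strstr(line, "__cpuidle_text_start"))
		*inside = true;		/* the start marker itself is printed */
	print = *inside;
	if (strstr(line, "__cpuidle_text_end"))
		*inside = false;	/* the end marker is printed, later lines are not */
	return print;
}
```

Feeding it the lines of `nm -n vmlinux` one at a time and printing those for which it returns true should give the same listing as the awk command.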

On 03/21/2016 12:32 PM, Peter Zijlstra wrote:
> On Mon, Mar 21, 2016 at 12:15:12PM -0400, Chris Metcalf wrote:
>> On 03/21/2016 11:42 AM, Peter Zijlstra wrote:
>>> The most common idle function for x86 is: mwait_idle_with_hints(),
>>> trouble is, its an inline, so I'm not sure adding __cpuidle to it does
>>> anything.
>> No, you're right, it wouldn't help. I didn't look at the drivers/cpuidle
>> subsystem at all in my patch, since I'm not that familiar with it,
>> but it seems like tagging acpi_processor_ffh_cstate_enter(), as the
>> only user of mwait_idle_with_hints(), will do the job.
> intel_idle() also uses it.

Ah, of course. I was only looking at the config options enabled in the
kernel I was building. I've added INTEL_IDLE now and grep'ed the whole
kernel tree as well, finding a couple of extra possibilities:

I do see mwait used in the ACPI 4.0 Processor Aggregator Device driver, but
this seems sufficiently far removed from regular cpuidle that I don't
think it's appropriate to tag the power_saving_thread() function -
the initial commit talks about using the mechanism "to ride-out
transient electrical and thermal emergencies."

There's also the thermal "powerclamp" driver that enforces a particular
amount of idle time across the system. For this one it's less clear to
me whether this is a valid "idle" state that we should ignore when doing
NMI backtracing. This would be the clamp_thread() function in
drivers/thermal/intel_powerclamp.c. For now I'm not including it,
but what do you think?

> # nm -n ivb-ep-build/vmlinux | awk '/__cpuidle_text_start/ {p=1} {if (p) print $0} /__cpuidle_text_end/ {p=0}'
> ffffffff81b16ca8 T __cpuidle_text_start
> ffffffff81b16cb0 T default_idle
> ffffffff81b16e50 t mwait_idle
> ffffffff81b17080 t cpu_idle_poll
> ffffffff81b17280 T default_idle_call
> ffffffff81b172be T __cpuidle_text_end
>
> So no intel_idle for me..

With the changes discussed so far in this email thread, we've gotten to:

ffffffff818df178 T __cpuidle_text_start
ffffffff818df180 T default_idle
ffffffff818df260 t mwait_idle
ffffffff818df3f0 T acpi_processor_ffh_cstate_enter
ffffffff818df4a0 T default_idle_call
ffffffff818df4e0 t cpu_idle_poll
ffffffff818df600 t intel_idle_freeze
ffffffff818df6a0 t intel_idle
ffffffff818df7b5 T __cpuidle_text_end

This is about 1,600 bytes (or about 450 instructions) that will cause
NMI to skip doing a backtrace if the PC is anywhere in the range.
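
The "about 1,600 bytes" figure can be checked directly from the two marker addresses in the listing: 0xffffffff818df7b5 - 0xffffffff818df178 = 0x63d = 1597 bytes, which at the quoted estimate of about 450 instructions works out to roughly 3.5 bytes per instruction. A minimal sketch of that arithmetic, with a helper name (`text_range_bytes`) that is purely illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of the half-open address range [start, end). */
uint64_t text_range_bytes(uint64_t start, uint64_t end)
{
	assert(end >= start);
	return end - start;
}
```

Applied to the earlier listing from the ivb-ep build (0xffffffff81b16ca8 to 0xffffffff81b172be), the same subtraction gives 1558 bytes.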

On Mon, Mar 21, 2016 at 01:12:39PM -0400, Chris Metcalf wrote:
> I do see mwait used in the ACPI 4.0 Processor Aggregator Device driver, but
> this seems sufficiently far removed from regular cpuidle that I don't
> think it's appropriate to tag the power_saving_thread() function -
> the initial commit talks about using the mechanism "to ride-out
> transient electrical and thermal emergencies."
>
> There's also the thermal "powerclamp" driver that enforces a particular
> amount of idle time across the system. For this one it's less clear to
> me whether this is a valid "idle" state that we should ignore when doing
> NMI backtracing. This would be the clamp_thread() function in
> drivers/thermal/intel_powerclamp.c. For now I'm not including it,
> but what do you think?

Both the acpi power aggregator and the powerclamp driver are forced idle
and have some serious issues, so are safe to ignore for now.

Also, I would explicitly not include them, because forced idle might
still be interesting.

> ># nm -n ivb-ep-build/vmlinux | awk '/__cpuidle_text_start/ {p=1} {if (p) print $0} /__cpuidle_text_end/ {p=0}'
> >ffffffff81b16ca8 T __cpuidle_text_start
> >ffffffff81b16cb0 T default_idle
> >ffffffff81b16e50 t mwait_idle
> >ffffffff81b17080 t cpu_idle_poll
> >ffffffff81b17280 T default_idle_call
> >ffffffff81b172be T __cpuidle_text_end
> >
> >So no intel_idle for me..
>
> With the changes discussed so far in this email thread, we've gotten to:
>
> ffffffff818df178 T __cpuidle_text_start
> ffffffff818df180 T default_idle
> ffffffff818df260 t mwait_idle
> ffffffff818df3f0 T acpi_processor_ffh_cstate_enter
> ffffffff818df4a0 T default_idle_call
> ffffffff818df4e0 t cpu_idle_poll
> ffffffff818df600 t intel_idle_freeze

You can skip this one, that only happens when you suspend to idle.

> ffffffff818df6a0 t intel_idle
> ffffffff818df7b5 T __cpuidle_text_end
>
> This is about 1,600 bytes (or about 450 instructions) that will cause
> NMI to skip doing a backtrace if the PC is anywhere in the range.

Yeah, the alternative is making mwait_idle_with_hints an actual
function, but then we get to somehow exclude the other users like the
forced idle stuff.

On Wed, Mar 16, 2016 at 01:02:13PM -0400, Chris Metcalf wrote:
> When doing an nmi backtrace of many cores, most of which are idle,
> the output is a little overwhelming and very uninformative. Suppress
> messages for cpus that are idling when they are interrupted and just
> emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".

This is still 100+ lines on a modern system, but better than the many
many thousands it would otherwise generate.

> We do this by grouping all the cpuidle code together into a new
> .cpuidle.text section, and then checking the address of the
> interrupted PC to see if it lies within that section.
>
> Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Please Cc Rafael on the next posting.
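
The decision described in the quoted commit message can be modeled in userspace as a half-open range check on the interrupted PC followed by a choice between the one-line report and the normal header. The sketch below is an assumption-laden illustration, not the kernel code: `struct idle_range` and `format_backtrace_header` are hypothetical names, the bounds are passed in explicitly where the kernel uses linker-provided symbols, and `snprintf` stands in for `pr_warn`. The message format is the one used in the patch's lib/nmi_backtrace.c hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the linker-provided section bounds. */
struct idle_range {
	unsigned long long start;	/* first byte of the idle text */
	unsigned long long end;		/* one past the last byte */
};

/* Half-open check: start is inside the range, end is not. */
bool pc_in_idle(const struct idle_range *r, unsigned long long pc)
{
	assert(r != NULL);
	return pc >= r->start && pc < r->end;
}

/*
 * Writes the report header for one cpu into buf. Returns true if the
 * full backtrace should be skipped because the cpu was idling.
 */
bool format_backtrace_header(char *buf, size_t len,
			     const struct idle_range *r,
			     int cpu, unsigned long long pc)
{
	if (pc_in_idle(r, pc)) {
		snprintf(buf, len,
			 "NMI backtrace for cpu %d skipped: idling at pc %#llx\n",
			 cpu, pc);
		return true;
	}
	snprintf(buf, len, "NMI backtrace for cpu %d\n", cpu);
	return false;
}
```

With the addresses from the listing earlier in the thread, a PC at default_idle (0xffffffff818df180) falls inside the range and yields the one-line form, while a PC equal to `__cpuidle_text_end` is already outside it.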

From the version 1 cover letter:

This patch series modifies the trigger_xxx_backtrace() NMI-based
remote backtracing code to make it more flexible, and makes a few
small improvements along the way.

The motivation comes from the task isolation code, where there are
scenarios where we want to be able to diagnose a case where some cpu
is about to interrupt a task-isolated cpu. It can be helpful to
see both where the interrupting cpu is, and also an approximation
of where the cpu that is being interrupted is. The nmi_backtrace
framework allows us to discover the stack of the interrupted cpu.

I've tested that the change works as desired on tile, and build-tested
x86, arm64, and arm. For x86 and arm64 I confirmed that the generic
cpuidle stuff as well as the architecture-specific routines are in the
new cpuidle section. For arm I just build-tested it and made sure the
generic cpuidle routines were in the new cpuidle section, but I didn't
attempt to tease apart the tangle of platform-specific idle routines
that arm has and tag them with __cpuidle. That might be more usefully
done by someone with arm platform experience in a follow-up patch.

I have also pushed it up to kernel.org to pull if that's easier:

  git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile.git nmi-backtrace

v3: Various improvements to the set of __cpuidle functions;
    Add back in a missing section accidentally removed in modpost.c (PeterZ)

v2: Switch to using __cpuidle tagging, switch S-O-B to Mellanox
    https://lkml.kernel.org/r/1458147733-29338-1-git-send-email-cmetcalf@mellanox.com

Chris Metcalf (4):
  nmi_backtrace: add more trigger_*_cpu_backtrace() methods
  nmi_backtrace: do a local dump_stack() instead of a self-NMI
  arch/tile: adopt the new nmi_backtrace framework
  nmi_backtrace: generate one-line reports for idle cpus

 arch/alpha/kernel/vmlinux.lds.S      |  1 +
 arch/arc/kernel/vmlinux.lds.S        |  1 +
 arch/arm/include/asm/irq.h           |  4 +-
 arch/arm/kernel/smp.c                | 13 +------
 arch/arm/kernel/vmlinux.lds.S        |  1 +
 arch/arm64/kernel/vmlinux.lds.S      |  1 +
 arch/arm64/mm/proc.S                 |  2 +
 arch/avr32/kernel/vmlinux.lds.S      |  1 +
 arch/blackfin/kernel/vmlinux.lds.S   |  1 +
 arch/c6x/kernel/vmlinux.lds.S        |  1 +
 arch/cris/kernel/vmlinux.lds.S       |  1 +
 arch/frv/kernel/vmlinux.lds.S        |  1 +
 arch/h8300/kernel/vmlinux.lds.S      |  1 +
 arch/hexagon/kernel/vmlinux.lds.S    |  1 +
 arch/ia64/kernel/vmlinux.lds.S       |  1 +
 arch/m32r/kernel/vmlinux.lds.S       |  1 +
 arch/m68k/kernel/vmlinux-nommu.lds   |  1 +
 arch/m68k/kernel/vmlinux-std.lds     |  1 +
 arch/m68k/kernel/vmlinux-sun3.lds    |  1 +
 arch/metag/kernel/vmlinux.lds.S      |  1 +
 arch/microblaze/kernel/vmlinux.lds.S |  1 +
 arch/mips/kernel/vmlinux.lds.S       |  1 +
 arch/mn10300/kernel/vmlinux.lds.S    |  1 +
 arch/nios2/kernel/vmlinux.lds.S      |  1 +
 arch/openrisc/kernel/vmlinux.lds.S   |  1 +
 arch/parisc/kernel/vmlinux.lds.S     |  1 +
 arch/powerpc/kernel/vmlinux.lds.S    |  1 +
 arch/s390/kernel/vmlinux.lds.S       |  1 +
 arch/score/kernel/vmlinux.lds.S      |  1 +
 arch/sh/kernel/vmlinux.lds.S         |  1 +
 arch/sparc/kernel/vmlinux.lds.S      |  1 +
 arch/tile/include/asm/irq.h          |  4 +-
 arch/tile/kernel/entry.S             |  2 +-
 arch/tile/kernel/pmc.c               |  3 --
 arch/tile/kernel/process.c           | 72 ++++++++----------------------------
 arch/tile/kernel/traps.c             |  7 +++-
 arch/tile/kernel/vmlinux.lds.S       |  1 +
 arch/um/kernel/dyn.lds.S             |  1 +
 arch/um/kernel/uml.lds.S             |  1 +
 arch/unicore32/kernel/vmlinux.lds.S  |  1 +
 arch/x86/include/asm/irq.h           |  4 +-
 arch/x86/kernel/acpi/cstate.c        |  2 +-
 arch/x86/kernel/apic/hw_nmi.c        |  6 +--
 arch/x86/kernel/process.c            |  4 +-
 arch/x86/kernel/vmlinux.lds.S        |  1 +
 arch/xtensa/kernel/vmlinux.lds.S     |  3 ++
 drivers/idle/intel_idle.c            |  4 +-
 include/asm-generic/vmlinux.lds.h    |  6 +++
 include/linux/cpu.h                  |  5 +++
 include/linux/nmi.h                  | 63 ++++++++++++++++++++++++-------
 kernel/sched/idle.c                  | 13 ++++++-
 lib/nmi_backtrace.c                  | 40 +++++++++++++-------
 scripts/mod/modpost.c                |  2 +-
 scripts/recordmcount.c               |  1 +
 scripts/recordmcount.pl              |  1 +
 55 files changed, 177 insertions(+), 117 deletions(-)

diff --git a/arch/alpha/kernel/vmlinux.lds.S b/arch/alpha/kernel/vmlinux.lds.S
index 647b84c15382..cebecfb76fbf 100644
--- a/arch/alpha/kernel/vmlinux.lds.S
+++ b/arch/alpha/kernel/vmlinux.lds.S
@@ -22,6 +22,7 @@ SECTIONS
 		HEAD_TEXT
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		*(.fixup)
 		*(.gnu.warning)
diff --git a/arch/arc/kernel/vmlinux.lds.S b/arch/arc/kernel/vmlinux.lds.S
index 894e696bddaa..65652160cfda 100644
--- a/arch/arc/kernel/vmlinux.lds.S
+++ b/arch/arc/kernel/vmlinux.lds.S
@@ -97,6 +97,7 @@ SECTIONS
 		_text = .;
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		KPROBES_TEXT
 		*(.fixup)
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 8b60fde5ce48..6c13d570e9c9 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -107,6 +107,7 @@ SECTIONS
 			IRQENTRY_TEXT
 			TEXT_TEXT
 			SCHED_TEXT
+			CPUIDLE_TEXT
 			LOCK_TEXT
 			KPROBES_TEXT
 			*(.gnu.warning)
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index e3928f578891..a5cbecf8a74c 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -104,6 +104,7 @@ SECTIONS
 			IRQENTRY_TEXT
 			TEXT_TEXT
 			SCHED_TEXT
+			CPUIDLE_TEXT
 			LOCK_TEXT
 			HYPERVISOR_TEXT
 			IDMAP_TEXT
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index c164d2cb35c0..b1b60fc438f6 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -48,11 +48,13 @@
  *
  *	Idle the processor (wait for interrupt).
  */
+	.pushsection ".cpuidle.text","ax"
 ENTRY(cpu_do_idle)
 	dsb	sy				// WFI may enter a low-power mode
 	wfi
 	ret
 ENDPROC(cpu_do_idle)
+	.popsection
 
 #ifdef CONFIG_CPU_PM
 /**
diff --git a/arch/avr32/kernel/vmlinux.lds.S b/arch/avr32/kernel/vmlinux.lds.S
index a4589176bed5..17f2730eb497 100644
--- a/arch/avr32/kernel/vmlinux.lds.S
+++ b/arch/avr32/kernel/vmlinux.lds.S
@@ -52,6 +52,7 @@ SECTIONS
 		KPROBES_TEXT
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		*(.fixup)
 		*(.gnu.warning)
diff --git a/arch/blackfin/kernel/vmlinux.lds.S b/arch/blackfin/kernel/vmlinux.lds.S
index c9eec84aa258..63a02c342830 100644
--- a/arch/blackfin/kernel/vmlinux.lds.S
+++ b/arch/blackfin/kernel/vmlinux.lds.S
@@ -33,6 +33,7 @@ SECTIONS
 #ifndef CONFIG_SCHEDULE_L1
 		SCHED_TEXT
 #endif
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		IRQENTRY_TEXT
 		KPROBES_TEXT
diff --git a/arch/c6x/kernel/vmlinux.lds.S b/arch/c6x/kernel/vmlinux.lds.S
index 5a6e141d1641..9cabd962ab36 100644
--- a/arch/c6x/kernel/vmlinux.lds.S
+++ b/arch/c6x/kernel/vmlinux.lds.S
@@ -70,6 +70,7 @@ SECTIONS
 		_stext = .;
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		IRQENTRY_TEXT
 		KPROBES_TEXT
diff --git a/arch/cris/kernel/vmlinux.lds.S b/arch/cris/kernel/vmlinux.lds.S
index 7552c2557506..979586261520 100644
--- a/arch/cris/kernel/vmlinux.lds.S
+++ b/arch/cris/kernel/vmlinux.lds.S
@@ -43,6 +43,7 @@ SECTIONS
 		HEAD_TEXT
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		*(.fixup)
 		*(.text.__*)
diff --git a/arch/frv/kernel/vmlinux.lds.S b/arch/frv/kernel/vmlinux.lds.S
index 7e958d829ec9..aa6e573d57da 100644
--- a/arch/frv/kernel/vmlinux.lds.S
+++ b/arch/frv/kernel/vmlinux.lds.S
@@ -63,6 +63,7 @@ SECTIONS
 	*(.text..tlbmiss)
 	TEXT_TEXT
 	SCHED_TEXT
+	CPUIDLE_TEXT
 	LOCK_TEXT
 #ifdef CONFIG_DEBUG_INFO
 	INIT_TEXT
diff --git a/arch/h8300/kernel/vmlinux.lds.S b/arch/h8300/kernel/vmlinux.lds.S
index cb5dfb02c88d..7f11da1b895e 100644
--- a/arch/h8300/kernel/vmlinux.lds.S
+++ b/arch/h8300/kernel/vmlinux.lds.S
@@ -29,6 +29,7 @@ SECTIONS
 	_stext = . ;
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 #if defined(CONFIG_ROMKERNEL)
 		*(.int_redirect)
diff --git a/arch/hexagon/kernel/vmlinux.lds.S b/arch/hexagon/kernel/vmlinux.lds.S
index 5f268c1071b3..ec87e67feb19 100644
--- a/arch/hexagon/kernel/vmlinux.lds.S
+++ b/arch/hexagon/kernel/vmlinux.lds.S
@@ -50,6 +50,7 @@ SECTIONS
 		_text = .;
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		KPROBES_TEXT
 		*(.fixup)
diff --git a/arch/ia64/kernel/vmlinux.lds.S b/arch/ia64/kernel/vmlinux.lds.S
index dc506b05ffbd..f89d20c97412 100644
--- a/arch/ia64/kernel/vmlinux.lds.S
+++ b/arch/ia64/kernel/vmlinux.lds.S
@@ -46,6 +46,7 @@ SECTIONS {
 		__end_ivt_text = .;
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		KPROBES_TEXT
 		*(.gnu.linkonce.t*)
diff --git a/arch/m32r/kernel/vmlinux.lds.S b/arch/m32r/kernel/vmlinux.lds.S
index 018e4a711d79..ad1fe56455aa 100644
--- a/arch/m32r/kernel/vmlinux.lds.S
+++ b/arch/m32r/kernel/vmlinux.lds.S
@@ -31,6 +31,7 @@ SECTIONS
 	HEAD_TEXT
 	TEXT_TEXT
 	SCHED_TEXT
+	CPUIDLE_TEXT
 	LOCK_TEXT
 	*(.fixup)
 	*(.gnu.warning)
diff --git a/arch/m68k/kernel/vmlinux-nommu.lds b/arch/m68k/kernel/vmlinux-nommu.lds
index 06a763f49fd3..d2c8abf1c8c4 100644
--- a/arch/m68k/kernel/vmlinux-nommu.lds
+++ b/arch/m68k/kernel/vmlinux-nommu.lds
@@ -45,6 +45,7 @@ SECTIONS {
 		HEAD_TEXT
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		*(.fixup)
 		. = ALIGN(16);
diff --git a/arch/m68k/kernel/vmlinux-std.lds b/arch/m68k/kernel/vmlinux-std.lds
index d0993594f558..5b5ce1e4d1ed 100644
--- a/arch/m68k/kernel/vmlinux-std.lds
+++ b/arch/m68k/kernel/vmlinux-std.lds
@@ -16,6 +16,7 @@ SECTIONS
 	HEAD_TEXT
 	TEXT_TEXT
 	SCHED_TEXT
+	CPUIDLE_TEXT
 	LOCK_TEXT
 	*(.fixup)
 	*(.gnu.warning)
diff --git a/arch/m68k/kernel/vmlinux-sun3.lds b/arch/m68k/kernel/vmlinux-sun3.lds
index 8080469ee6c1..fe5ea1974b16 100644
--- a/arch/m68k/kernel/vmlinux-sun3.lds
+++ b/arch/m68k/kernel/vmlinux-sun3.lds
@@ -16,6 +16,7 @@ SECTIONS
 	HEAD_TEXT
 	TEXT_TEXT
 	SCHED_TEXT
+	CPUIDLE_TEXT
 	LOCK_TEXT
 	*(.fixup)
 	*(.gnu.warning)
diff --git a/arch/metag/kernel/vmlinux.lds.S b/arch/metag/kernel/vmlinux.lds.S
index e12055e88bfe..9fc48354d519 100644
--- a/arch/metag/kernel/vmlinux.lds.S
+++ b/arch/metag/kernel/vmlinux.lds.S
@@ -21,6 +21,7 @@ SECTIONS
 	.text : {
 	TEXT_TEXT
 	SCHED_TEXT
+	CPUIDLE_TEXT
 	LOCK_TEXT
 	KPROBES_TEXT
 	IRQENTRY_TEXT
diff --git a/arch/microblaze/kernel/vmlinux.lds.S b/arch/microblaze/kernel/vmlinux.lds.S
index be9488d69734..5913c7863067 100644
--- a/arch/microblaze/kernel/vmlinux.lds.S
+++ b/arch/microblaze/kernel/vmlinux.lds.S
@@ -33,6 +33,7 @@ SECTIONS {
 		EXIT_TEXT
 		EXIT_CALL
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		KPROBES_TEXT
 		IRQENTRY_TEXT
diff --git a/arch/mips/kernel/vmlinux.lds.S b/arch/mips/kernel/vmlinux.lds.S
index 0a93e83cd014..e0fc08cb0c89 100644
--- a/arch/mips/kernel/vmlinux.lds.S
+++ b/arch/mips/kernel/vmlinux.lds.S
@@ -55,6 +55,7 @@ SECTIONS
 	.text : {
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		KPROBES_TEXT
 		IRQENTRY_TEXT
diff --git a/arch/mn10300/kernel/vmlinux.lds.S b/arch/mn10300/kernel/vmlinux.lds.S
index 13c4814c29f8..2d5f1c3f1afb 100644
--- a/arch/mn10300/kernel/vmlinux.lds.S
+++ b/arch/mn10300/kernel/vmlinux.lds.S
@@ -30,6 +30,7 @@ SECTIONS
 	HEAD_TEXT
 	TEXT_TEXT
 	SCHED_TEXT
+	CPUIDLE_TEXT
 	LOCK_TEXT
 	KPROBES_TEXT
 	*(.fixup)
diff --git a/arch/nios2/kernel/vmlinux.lds.S b/arch/nios2/kernel/vmlinux.lds.S
index 326fab40a9de..340c7ab1d8b0 100644
--- a/arch/nios2/kernel/vmlinux.lds.S
+++ b/arch/nios2/kernel/vmlinux.lds.S
@@ -37,6 +37,7 @@ SECTIONS
 	.text : {
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		IRQENTRY_TEXT
 		KPROBES_TEXT
diff --git a/arch/openrisc/kernel/vmlinux.lds.S b/arch/openrisc/kernel/vmlinux.lds.S
index 2d69a853b742..6c3cf834b5d8 100644
--- a/arch/openrisc/kernel/vmlinux.lds.S
+++ b/arch/openrisc/kernel/vmlinux.lds.S
@@ -47,6 +47,7 @@ SECTIONS
 	  _stext = .;
 	  TEXT_TEXT
 	  SCHED_TEXT
+	  CPUIDLE_TEXT
 	  LOCK_TEXT
 	  KPROBES_TEXT
 	  IRQENTRY_TEXT
diff --git a/arch/parisc/kernel/vmlinux.lds.S b/arch/parisc/kernel/vmlinux.lds.S
index 308f29081d46..7e53bf44fdd2 100644
--- a/arch/parisc/kernel/vmlinux.lds.S
+++ b/arch/parisc/kernel/vmlinux.lds.S
@@ -69,6 +69,7 @@ SECTIONS
 	.text ALIGN(PAGE_SIZE) : {
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		KPROBES_TEXT
 		IRQENTRY_TEXT
diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
index d41fd0af8980..bf423392b20a 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -52,6 +52,7 @@ SECTIONS
 		/* careful! __ftr_alt_* sections need to be close to .text */
 		*(.text .fixup __ftr_alt_* .ref.text)
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		KPROBES_TEXT
 		IRQENTRY_TEXT
diff --git a/arch/s390/kernel/vmlinux.lds.S b/arch/s390/kernel/vmlinux.lds.S
index 445657fe658c..cbc74fd4a6db 100644
--- a/arch/s390/kernel/vmlinux.lds.S
+++ b/arch/s390/kernel/vmlinux.lds.S
@@ -25,6 +25,7 @@ SECTIONS
 		HEAD_TEXT
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		KPROBES_TEXT
 		IRQENTRY_TEXT
diff --git a/arch/score/kernel/vmlinux.lds.S b/arch/score/kernel/vmlinux.lds.S
index 7274b5c4287e..4117890b1db1 100644
--- a/arch/score/kernel/vmlinux.lds.S
+++ b/arch/score/kernel/vmlinux.lds.S
@@ -40,6 +40,7 @@ SECTIONS
 		_text = .;	/* Text and read-only data */
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		KPROBES_TEXT
 		*(.text.*)
diff --git a/arch/sh/kernel/vmlinux.lds.S b/arch/sh/kernel/vmlinux.lds.S
index db88cbf9eafd..989500c17358 100644
--- a/arch/sh/kernel/vmlinux.lds.S
+++ b/arch/sh/kernel/vmlinux.lds.S
@@ -36,6 +36,7 @@ SECTIONS
 		TEXT_TEXT
 		EXTRA_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		KPROBES_TEXT
 		IRQENTRY_TEXT
diff --git a/arch/sparc/kernel/vmlinux.lds.S b/arch/sparc/kernel/vmlinux.lds.S
index f1a2f688b28a..93029a4b5299 100644
--- a/arch/sparc/kernel/vmlinux.lds.S
+++ b/arch/sparc/kernel/vmlinux.lds.S
@@ -45,6 +45,7 @@ SECTIONS
 		HEAD_TEXT
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		KPROBES_TEXT
 		IRQENTRY_TEXT
diff --git a/arch/tile/kernel/entry.S b/arch/tile/kernel/entry.S
index 670a3569450f..101de132e363 100644
--- a/arch/tile/kernel/entry.S
+++ b/arch/tile/kernel/entry.S
@@ -50,7 +50,7 @@ STD_ENTRY(smp_nap)
  * When interrupted at _cpu_idle_nap, we bump the PC forward 8, and
  * as a result return to the function that called _cpu_idle().
  */
-STD_ENTRY(_cpu_idle)
+STD_ENTRY_SECTION(_cpu_idle, .cpuidle.text)
 	movei r1, 1
 	IRQ_ENABLE_LOAD(r2, r3)
 	mtspr INTERRUPT_CRITICAL_SECTION, r1
diff --git a/arch/tile/kernel/vmlinux.lds.S b/arch/tile/kernel/vmlinux.lds.S
index 0e059a0101ea..a92931e8c4f9 100644
--- a/arch/tile/kernel/vmlinux.lds.S
+++ b/arch/tile/kernel/vmlinux.lds.S
@@ -42,6 +42,7 @@ SECTIONS
   .text : AT (ADDR(.text) - LOAD_OFFSET) {
     HEAD_TEXT
     SCHED_TEXT
+    CPUIDLE_TEXT
     LOCK_TEXT
     KPROBES_TEXT
     IRQENTRY_TEXT
diff --git a/arch/um/kernel/dyn.lds.S b/arch/um/kernel/dyn.lds.S
index adde088aeeff..4fdbcf958cd5 100644
--- a/arch/um/kernel/dyn.lds.S
+++ b/arch/um/kernel/dyn.lds.S
@@ -68,6 +68,7 @@ SECTIONS
     _stext = .;
     TEXT_TEXT
     SCHED_TEXT
+    CPUIDLE_TEXT
     LOCK_TEXT
     *(.fixup)
     *(.stub .text.* .gnu.linkonce.t.*)
diff --git a/arch/um/kernel/uml.lds.S b/arch/um/kernel/uml.lds.S
index 6899195602b7..1840f55ed042 100644
--- a/arch/um/kernel/uml.lds.S
+++ b/arch/um/kernel/uml.lds.S
@@ -28,6 +28,7 @@ SECTIONS
     _stext = .;
     TEXT_TEXT
     SCHED_TEXT
+    CPUIDLE_TEXT
     LOCK_TEXT
     *(.fixup)
     /* .gnu.warning sections are handled specially by elf32.em. */
diff --git a/arch/unicore32/kernel/vmlinux.lds.S b/arch/unicore32/kernel/vmlinux.lds.S
index 77e407e49a63..56e788e8ee83 100644
--- a/arch/unicore32/kernel/vmlinux.lds.S
+++ b/arch/unicore32/kernel/vmlinux.lds.S
@@ -37,6 +37,7 @@ SECTIONS
 	.text : {		/* Real text segment */
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 
 		*(.fixup)
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 9f7c21c22477..d569ae7fde37 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -298,7 +298,7 @@ void arch_cpu_idle(void)
 /*
  * We use this if we don't have any better idle routine..
  */
-void default_idle(void)
+void __cpuidle default_idle(void)
 {
 	trace_cpu_idle_rcuidle(1, smp_processor_id());
 	safe_halt();
@@ -413,7 +413,7 @@ static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
  * with interrupts enabled and no flags, which is backwards compatible with the
  * original MWAIT implementation.
  */
-static void mwait_idle(void)
+static __cpuidle void mwait_idle(void)
 {
 	if (!current_set_polling_and_test()) {
 		trace_cpu_idle_rcuidle(1, smp_processor_id());
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 74e4bf11f562..95f80be7632f 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -98,6 +98,7 @@ SECTIONS
 		_stext = .;
 		TEXT_TEXT
 		SCHED_TEXT
+		CPUIDLE_TEXT
 		LOCK_TEXT
 		KPROBES_TEXT
 		ENTRY_TEXT
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index c4bd0e2c173c..18af5199f97c 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -444,6 +444,12 @@
 		*(.spinlock.text)					\
 		VMLINUX_SYMBOL(__lock_text_end) = .;
 
+#define CPUIDLE_TEXT							\
+		ALIGN_FUNCTION();					\
+		VMLINUX_SYMBOL(__cpuidle_text_start) = .;		\
+		*(.cpuidle.text)					\
+		VMLINUX_SYMBOL(__cpuidle_text_end) = .;
+
 #define KPROBES_TEXT							\
 		ALIGN_FUNCTION();					\
 		VMLINUX_SYMBOL(__kprobes_text_start) = .;		\
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index d2ca8c38f9c4..0cbe214e8f4b 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -274,6 +274,11 @@ void cpu_startup_entry(enum cpuhp_state state);
 
 void cpu_idle_poll_ctrl(bool enable);
 
+/* Attach to any functions which should be considered cpuidle. */
+#define __cpuidle	__attribute__((__section__(".cpuidle.text")))
+
+bool cpu_in_idle(unsigned long pc);
+
 void arch_cpu_idle(void);
 void arch_cpu_idle_prepare(void);
 void arch_cpu_idle_enter(void);
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 544a7133cbd1..ffca482beab5 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -15,6 +15,9 @@
 
 #include "sched.h"
 
+/* Linker adds these: start and end of __cpuidle functions */
+extern char __cpuidle_text_start[], __cpuidle_text_end[];
+
 /**
  * sched_idle_set_state - Record idle state for the current CPU.
  * @idle_state: State to record.
@@ -52,7 +55,7 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
 __setup("hlt", cpu_idle_nopoll_setup);
 #endif
 
-static inline int cpu_idle_poll(void)
+static int noinline __cpuidle cpu_idle_poll(void)
 {
 	rcu_idle_enter();
 	trace_cpu_idle_rcuidle(0, smp_processor_id());
@@ -83,7 +86,7 @@ void __weak arch_cpu_idle(void)
  *
  * To use when the cpuidle framework cannot be used.
  */
-void default_idle_call(void)
+void __cpuidle default_idle_call(void)
 {
 	if (current_clr_polling_and_test()) {
 		local_irq_enable();
@@ -273,6 +276,12 @@ static void cpu_idle_loop(void)
 	}
 }
 
+bool cpu_in_idle(unsigned long pc)
+{
+	return pc >= (unsigned long)__cpuidle_text_start &&
+		pc < (unsigned long)__cpuidle_text_end;
+}
+
 void cpu_startup_entry(enum cpuhp_state state)
 {
 	/*
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 9375c0279b73..ac41f3c84e8d 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -17,6 +17,7 @@
 #include <linux/kprobes.h>
 #include <linux/nmi.h>
 #include <linux/seq_buf.h>
+#include <linux/cpu.h>
 
 #ifdef arch_trigger_cpumask_backtrace
 /* For reliability, we're prepared to waste bits here. */
@@ -160,11 +161,16 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
 
 		/* Replace printk to write into the NMI seq */
 		this_cpu_write(printk_func, nmi_vprintk);
-		pr_warn("NMI backtrace for cpu %d\n", cpu);
-		if (regs)
-			show_regs(regs);
-		else
-			dump_stack();
+		if (regs != NULL && cpu_in_idle(instruction_pointer(regs))) {
+			pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n",
+				cpu, instruction_pointer(regs));
+		} else {
+			pr_warn("NMI backtrace for cpu %d\n", cpu);
+			if (regs)
+				show_regs(regs);
+			else
+				dump_stack();
+		}
 		this_cpu_write(printk_func, printk_func_save);
 
 		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 48958d3cec9e..37afd721ec99 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -887,8 +887,8 @@ static void check_section(const char *modname, struct elf_info *elf,
 #define ALL_EXIT_SECTIONS EXIT_SECTIONS, ALL_XXXEXIT_SECTIONS
 
 #define DATA_SECTIONS ".data", ".data.rel"
-#define TEXT_SECTIONS ".text", ".text.unlikely", ".sched.text", \
-		".kprobes.text"
+#define TEXT_SECTIONS ".text", ".text.unlikely", \
+		".kprobes.text", ".cpuidle.text"
 #define OTHER_TEXT_SECTIONS ".ref.text", ".head.text", ".spinlock.text", \
 		".fixup", ".entry.text", ".exception.text", ".text.*", \
 		".coldtext"
diff --git a/scripts/recordmcount.c b/scripts/recordmcount.c
index e167592793a7..9a6ec6ce00b5 100644
--- a/scripts/recordmcount.c
+++ b/scripts/recordmcount.c
@@ -357,6 +357,7 @@ is_mcounted_section_name(char const *const txtname)
 		strcmp(".spinlock.text", txtname) == 0 ||
 		strcmp(".irqentry.text", txtname) == 0 ||
 		strcmp(".kprobes.text", txtname) == 0 ||
+		strcmp(".cpuidle.text", txtname) == 0 ||
 		strcmp(".text.unlikely", txtname) == 0;
 }
 
diff --git a/scripts/recordmcount.pl b/scripts/recordmcount.pl
index 96e2486a6fc4..29cecf9b504f 100755
--- a/scripts/recordmcount.pl
+++ b/scripts/recordmcount.pl
@@ -135,6 +135,7 @@ my %text_sections = (
      ".spinlock.text" => 1,
      ".irqentry.text" => 1,
      ".kprobes.text" => 1,
+     ".cpuidle.text" => 1,
      ".text.unlikely" => 1,
 );
 

When doing an nmi backtrace of many cores, most of which are idle,
the output is a little overwhelming and very uninformative. Suppress
messages for cpus that are idling when they are interrupted and just
emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".

We do this by grouping all the cpuidle code together into a new
.cpuidle.text section, and then checking the address of the
interrupted PC to see if it lies within that section.

Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
---
 arch/alpha/kernel/vmlinux.lds.S      |  1 +
 arch/arc/kernel/vmlinux.lds.S        |  1 +
 arch/arm/kernel/vmlinux.lds.S        |  1 +
 arch/arm64/kernel/vmlinux.lds.S      |  1 +
 arch/arm64/mm/proc.S                 |  2 ++
 arch/avr32/kernel/vmlinux.lds.S      |  1 +
 arch/blackfin/kernel/vmlinux.lds.S   |  1 +
 arch/c6x/kernel/vmlinux.lds.S        |  1 +
 arch/cris/kernel/vmlinux.lds.S       |  1 +
 arch/frv/kernel/vmlinux.lds.S        |  1 +
 arch/h8300/kernel/vmlinux.lds.S      |  1 +
 arch/hexagon/kernel/vmlinux.lds.S    |  1 +
 arch/ia64/kernel/vmlinux.lds.S       |  1 +
 arch/m32r/kernel/vmlinux.lds.S       |  1 +
 arch/m68k/kernel/vmlinux-nommu.lds   |  1 +
 arch/m68k/kernel/vmlinux-std.lds     |  1 +
 arch/m68k/kernel/vmlinux-sun3.lds    |  1 +
 arch/metag/kernel/vmlinux.lds.S      |  1 +
 arch/microblaze/kernel/vmlinux.lds.S |  1 +
 arch/mips/kernel/vmlinux.lds.S       |  1 +
 arch/mn10300/kernel/vmlinux.lds.S    |  1 +
 arch/nios2/kernel/vmlinux.lds.S      |  1 +
 arch/openrisc/kernel/vmlinux.lds.S   |  1 +
 arch/parisc/kernel/vmlinux.lds.S     |  1 +
 arch/powerpc/kernel/vmlinux.lds.S    |  1 +
 arch/s390/kernel/vmlinux.lds.S       |  1 +
 arch/score/kernel/vmlinux.lds.S      |  1 +
 arch/sh/kernel/vmlinux.lds.S         |  1 +
 arch/sparc/kernel/vmlinux.lds.S      |  1 +
 arch/tile/kernel/entry.S             |  2 +-
 arch/tile/kernel/vmlinux.lds.S       |  1 +
 arch/um/kernel/dyn.lds.S             |  1 +
 arch/um/kernel/uml.lds.S             |  1 +
 arch/unicore32/kernel/vmlinux.lds.S  |  1 +
 arch/x86/kernel/process.c            |  4 ++--
 arch/x86/kernel/vmlinux.lds.S        |  1 +
 include/asm-generic/vmlinux.lds.h    |  6 ++++++
 include/linux/cpu.h                  |  5 +++++
 kernel/sched/idle.c                  | 13 +++++++++++--
 lib/nmi_backtrace.c                  | 16 +++++++++++-----
 scripts/mod/modpost.c                |  4 ++--
 scripts/recordmcount.c               |  1 +
 scripts/recordmcount.pl              |  1 +
 43 files changed, 75 insertions(+), 12 deletions(-)
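
The section-grouping technique in the commit message can be tried out in userspace. The sketch below is an analogue under stated assumptions, not the kernel mechanism itself: it assumes GCC or Clang with GNU ld or lld on an ELF target, where the linker automatically defines `__start_<name>` and `__stop_<name>` symbols for any section whose name is a valid C identifier. The kernel patch instead defines `__cpuidle_text_start` and `__cpuidle_text_end` explicitly in its linker scripts. The section name `demo_cpuidle_text` and all `demo_*` identifiers are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace analogue of the __cpuidle tag. The section name is a
 * hypothetical choice; it must be a valid C identifier so that the
 * linker emits the __start_/__stop_ bounds used below.
 */
#define demo_cpuidle \
	__attribute__((noinline, used, section("demo_cpuidle_text")))

/* Provided by the linker on ELF targets (assumption, see lead-in). */
extern char __start_demo_cpuidle_text[], __stop_demo_cpuidle_text[];

/* Tagged: placed in the demo section, so it counts as "idle" code. */
demo_cpuidle int demo_idle(void)
{
	return 0;
}

/* Untagged: stays in the regular .text section. */
__attribute__((noinline)) int demo_busy(void)
{
	return 1;
}

/* Same half-open range test as cpu_in_idle() in the patch. */
bool demo_in_idle(uintptr_t pc)
{
	uintptr_t start = (uintptr_t)__start_demo_cpuidle_text;
	uintptr_t end = (uintptr_t)__stop_demo_cpuidle_text;

	assert(start <= end);
	return pc >= start && pc < end;
}
```

Calling `demo_in_idle((uintptr_t)demo_idle)` is expected to report true and `demo_in_idle((uintptr_t)demo_busy)` false on such a toolchain. As noted earlier in the thread, an inline function inherits the section of whatever function it is inlined into, which is why the tag has to go on the out-of-line callers.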