Message ID | 20230504151100.v4.8.I818492c326b632560b09f20d2608455ecf9d3650@changeid (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | watchdog/hardlockup: Add the buddy hardlockup detector | expand |
On Fri May 5, 2023 at 8:13 AM AEST, Douglas Anderson wrote: > These are tiny style changes: > - Add a blank line before a "return". > - Renames two globals to use the "watchdog_hld" prefix. Particularly static ones don't really need the namespace prefixes. Not sure if processed is better than warn. allcpu_dumped is better than dumped_stacks though because the all-CPUs-dump is a particular thing. Other style bits seem fine. Thanks, Nick > - Store processor id in "unsigned int" rather than "int". > - Minor comment rewording. > - Use "else" rather than extra returns since it seemed more symmetric. > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > --- > > Changes in v4: > - ("Style changes to watchdog_hardlockup_check ...") new for v4. > > kernel/watchdog.c | 32 +++++++++++++++----------------- > 1 file changed, 15 insertions(+), 17 deletions(-) > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c > index 2d319cdf64b9..f46669c1671d 100644 > --- a/kernel/watchdog.c > +++ b/kernel/watchdog.c > @@ -89,8 +89,8 @@ __setup("nmi_watchdog=", hardlockup_panic_setup); > > static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts); > static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved); > -static DEFINE_PER_CPU(bool, hard_watchdog_warn); > -static unsigned long hardlockup_allcpu_dumped; > +static DEFINE_PER_CPU(bool, watchdog_hardlockup_processed); > +static unsigned long watchdog_hardlockup_dumped_stacks; > > static bool watchdog_hardlockup_is_lockedup(void) > { > @@ -100,6 +100,7 @@ static bool watchdog_hardlockup_is_lockedup(void) > return true; > > __this_cpu_write(hrtimer_interrupts_saved, hrint); > + > return false; > } > > @@ -110,21 +111,20 @@ static void watchdog_hardlockup_interrupt_count(void) > > void watchdog_hardlockup_check(struct pt_regs *regs) > { > - /* check for a hardlockup > - * This is done by making sure our timer interrupt > - * is incrementing. The timer interrupt should have > - * fired multiple times before we overflow'd. If it hasn't > + /* > + * Check for a hardlockup by making sure the CPU's timer > + * interrupt is incrementing. The timer interrupt should have > + * fired multiple times before we overflow'd. If it hasn't > * then this is a good indication the cpu is stuck > */ > if (watchdog_hardlockup_is_lockedup()) { > - int this_cpu = smp_processor_id(); > + unsigned int this_cpu = smp_processor_id(); > > - /* only print hardlockups once */ > - if (__this_cpu_read(hard_watchdog_warn) == true) > + /* Only handle hardlockups once. */ > + if (__this_cpu_read(watchdog_hardlockup_processed)) > return; > > - pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", > - this_cpu); > + pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", this_cpu); > print_modules(); > print_irqtrace_events(current); > if (regs) > @@ -137,18 +137,16 @@ void watchdog_hardlockup_check(struct pt_regs *regs) > * generating interleaving traces > */ > if (sysctl_hardlockup_all_cpu_backtrace && > - !test_and_set_bit(0, &hardlockup_allcpu_dumped)) > + !test_and_set_bit(0, &watchdog_hardlockup_dumped_stacks)) > trigger_allbutself_cpu_backtrace(); > > if (hardlockup_panic) > nmi_panic(regs, "Hard LOCKUP"); > > - __this_cpu_write(hard_watchdog_warn, true); > - return; > + __this_cpu_write(watchdog_hardlockup_processed, true); > + } else { > + __this_cpu_write(watchdog_hardlockup_processed, false); > } > - > - __this_cpu_write(hard_watchdog_warn, false); > - return; > } > > #else /* CONFIG_HARDLOCKUP_DETECTOR_PERF */ > -- > 2.40.1.521.gf1e218fcd8-goog
Hi, On Thu, May 4, 2023 at 8:02 PM Nicholas Piggin <npiggin@gmail.com> wrote: > > On Fri May 5, 2023 at 8:13 AM AEST, Douglas Anderson wrote: > > These are tiny style changes: > > - Add a blank line before a "return". > > - Renames two globals to use the "watchdog_hld" prefix. > > Particularly static ones don't really need the namespace prefixes. Renames are mostly at Petr's request. If I've misunderstood what he wants here that I'm happy to remove them. > Not sure if processed is better than warn. I can undo this one if you want. It felt like we were doing more than just warning, but if people think "warn" is a better way to describe it then that's fine with me. > allcpu_dumped is better > than dumped_stacks though because the all-CPUs-dump is a particular > thing. OK, I can undo this and leave it as "allcpu_dumped". [1] https://lore.kernel.org/r/ZFErmshcrcikrSU1@alley
diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 2d319cdf64b9..f46669c1671d 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -89,8 +89,8 @@ __setup("nmi_watchdog=", hardlockup_panic_setup); static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts); static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved); -static DEFINE_PER_CPU(bool, hard_watchdog_warn); -static unsigned long hardlockup_allcpu_dumped; +static DEFINE_PER_CPU(bool, watchdog_hardlockup_processed); +static unsigned long watchdog_hardlockup_dumped_stacks; static bool watchdog_hardlockup_is_lockedup(void) { @@ -100,6 +100,7 @@ static bool watchdog_hardlockup_is_lockedup(void) return true; __this_cpu_write(hrtimer_interrupts_saved, hrint); + return false; } @@ -110,21 +111,20 @@ static void watchdog_hardlockup_interrupt_count(void) void watchdog_hardlockup_check(struct pt_regs *regs) { - /* check for a hardlockup - * This is done by making sure our timer interrupt - * is incrementing. The timer interrupt should have - * fired multiple times before we overflow'd. If it hasn't + /* + * Check for a hardlockup by making sure the CPU's timer + * interrupt is incrementing. The timer interrupt should have + * fired multiple times before we overflow'd. If it hasn't * then this is a good indication the cpu is stuck */ if (watchdog_hardlockup_is_lockedup()) { - int this_cpu = smp_processor_id(); + unsigned int this_cpu = smp_processor_id(); - /* only print hardlockups once */ - if (__this_cpu_read(hard_watchdog_warn) == true) + /* Only handle hardlockups once. */ + if (__this_cpu_read(watchdog_hardlockup_processed)) return; - pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", - this_cpu); + pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", this_cpu); print_modules(); print_irqtrace_events(current); if (regs) @@ -137,18 +137,16 @@ void watchdog_hardlockup_check(struct pt_regs *regs) * generating interleaving traces */ if (sysctl_hardlockup_all_cpu_backtrace && - !test_and_set_bit(0, &hardlockup_allcpu_dumped)) + !test_and_set_bit(0, &watchdog_hardlockup_dumped_stacks)) trigger_allbutself_cpu_backtrace(); if (hardlockup_panic) nmi_panic(regs, "Hard LOCKUP"); - __this_cpu_write(hard_watchdog_warn, true); - return; + __this_cpu_write(watchdog_hardlockup_processed, true); + } else { + __this_cpu_write(watchdog_hardlockup_processed, false); } - - __this_cpu_write(hard_watchdog_warn, false); - return; } #else /* CONFIG_HARDLOCKUP_DETECTOR_PERF */
These are tiny style changes: - Add a blank line before a "return". - Renames two globals to use the "watchdog_hld" prefix. - Store processor id in "unsigned int" rather than "int". - Minor comment rewording. - Use "else" rather than extra returns since it seemed more symmetric. Signed-off-by: Douglas Anderson <dianders@chromium.org> --- Changes in v4: - ("Style changes to watchdog_hardlockup_check ...") new for v4. kernel/watchdog.c | 32 +++++++++++++++----------------- 1 file changed, 15 insertions(+), 17 deletions(-)