Message ID | 20230519101840.v5.14.I6bf789d21d0c3d75d382e7e51a804a7a51315f2c@changeid (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | watchdog/hardlockup: Add the buddy hardlockup detector | expand |
On Fri 2023-05-19 10:18:38, Douglas Anderson wrote: > Implement a hardlockup detector that doesn't doesn't need any extra > arch-specific support code to detect lockups. Instead of using > something arch-specific we will use the buddy system, where each CPU > watches out for another one. Specifically, each CPU will use its > softlockup hrtimer to check that the next CPU is processing hrtimer > interrupts by verifying that a counter is increasing. > > --- a/kernel/watchdog.c > +++ b/kernel/watchdog.c > @@ -85,7 +85,7 @@ __setup("nmi_watchdog=", hardlockup_panic_setup); > > #endif /* CONFIG_HARDLOCKUP_DETECTOR */ > > -#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) > +#if defined(CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER) > > static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts); > static DEFINE_PER_CPU(int, hrtimer_interrupts_saved); > @@ -106,6 +106,14 @@ notrace void arch_touch_nmi_watchdog(void) > } > EXPORT_SYMBOL(arch_touch_nmi_watchdog); > > +void watchdog_hardlockup_touch_cpu(unsigned int cpu) > +{ > + per_cpu(watchdog_hardlockup_touched, cpu) = true; > + > + /* Match with smp_rmb() in watchdog_hardlockup_check() */ > + smp_wmb(); It is great that you described where the related barrier is. Another important information is what exactly is synchronized. And I am actually not sure what we are synchronizing here. My understanding is that a write barrier should synchronize related writes, for example: X = ...; /* Make sure that X is modified before Y */ smp_wmb(); Y = ...; And the related read barrier should synchronize the related reads, for example: if (test(Y)) { /* * Make sure that we use the updated X when * we saw the updated Y. */ smp_rmb(); do_something(X); } IMHO, we do not need any barrier here because we have only one variable "watchdog_hardlockup_touched" here. watchdog_hardlockup_check() will either see the updated value or not. But it does not synchronize it against any other variables or values. > +} > + > static bool is_hardlockup(unsigned int cpu) > { > int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu)); > @@ -443,11 +454,15 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) > struct pt_regs *regs = get_irq_regs(); > int duration; > int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace; > + unsigned long hrtimer_interrupts; > > if (!watchdog_enabled) > return HRTIMER_NORESTART; > > - watchdog_hardlockup_kick(); > + hrtimer_interrupts = watchdog_hardlockup_kick(); > + > + /* test for hardlockups */ I would omit the comment. It is not valid when perf detector is used. And checking the buddy is clear from the function name. > + watchdog_buddy_check_hardlockup(hrtimer_interrupts); I would personally move this into watchdog_hardlockup_kick(). watchdog_timer_fn() is already complex enough. And checking the buddy when kicking a CPU makes sense. Also I would not pass "hrtimer_interrupts". I guess that it is just an optimization. It is an extra churn in the code. IMHO, is is not wort it. This code does not need to be super optimized. > + > > /* kick the softlockup detector */ > if (completion_done(this_cpu_ptr(&softlockup_completion))) { > diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c > new file mode 100644 > index 000000000000..fee45af2e5bd > --- /dev/null > +++ b/kernel/watchdog_buddy.c > @@ -0,0 +1,93 @@ > +// SPDX-License-Identifier: GPL-2.0 > + > +#include <linux/cpu.h> > +#include <linux/cpumask.h> > +#include <linux/kernel.h> > +#include <linux/nmi.h> > +#include <linux/percpu-defs.h> > + > +static cpumask_t __read_mostly watchdog_cpus; > + > +static unsigned int watchdog_next_cpu(unsigned int cpu) > +{ > + cpumask_t cpus = watchdog_cpus; A copy should be done by cpumask_copy(). But the question is why a copy would be needed. When called from watchdog_buddy_check_hardlockup(), this function is not sychronized against the CPU hotplug. And even the copying would be racy. IMHO, the copy does not help much and we do not need it. The only important this is that this function would return a valid CPU. And I think that it is guarantted because CPU0 could not be disabled. > + unsigned int next_cpu; > + > + next_cpu = cpumask_next(cpu, &cpus); > + if (next_cpu >= nr_cpu_ids) > + next_cpu = cpumask_first(&cpus); > + > + if (next_cpu == cpu) > + return nr_cpu_ids; >> + return next_cpu; > +} > + > +int __init watchdog_hardlockup_probe(void) > +{ > + return 0; > +} > + > +void watchdog_hardlockup_enable(unsigned int cpu) > +{ > + unsigned int next_cpu; > + > + /* > + * The new CPU will be marked online before the hrtimer interrupt > + * gets a chance to run on it. If another CPU tests for a > + * hardlockup on the new CPU before it has run its the hrtimer > + * interrupt, it will get a false positive. Touch the watchdog on > + * the new CPU to delay the check for at least 3 sampling periods > + * to guarantee one hrtimer has run on the new CPU. > + */ > + watchdog_hardlockup_touch_cpu(cpu); > + > + /* > + * We are going to check the next CPU. Our watchdog_hrtimer > + * need not be zero if the CPU has already been online earlier. > + * Touch the watchdog on the next CPU to avoid false positive > + * if we try to check it in less then 3 interrupts. > + */ > + next_cpu = watchdog_next_cpu(cpu); > + if (next_cpu < nr_cpu_ids) > + watchdog_hardlockup_touch_cpu(next_cpu); Thinking loudly: This feels racy when many CPUs are enabled/disabled in parallel. I am not 100% sure it it can happen though. My understanding is that it can't happen because the CPU hotplug is serialized by cpu_add_remove_lock. So, this seems to work after all. > + > + cpumask_set_cpu(cpu, &watchdog_cpus); > +} > + > --- a/lib/Kconfig.debug > +++ b/lib/Kconfig.debug > @@ -1035,10 +1035,55 @@ config BOOTPARAM_SOFTLOCKUP_PANIC > > Say N if unsure. > > -config HARDLOCKUP_DETECTOR_PERF > +# Both the "perf" and "buddy" hardlockup detectors count hrtimer > +# interrupts. This config enables functions managing this common code. > +config HARDLOCKUP_DETECTOR_COUNTS_HRTIMER > bool > select SOFTLOCKUP_DETECTOR > > +config HARDLOCKUP_DETECTOR_PERF > + bool > + depends on HAVE_HARDLOCKUP_DETECTOR_PERF > + select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER > + > +config HARDLOCKUP_DETECTOR_BUDDY > + bool > + depends on SMP > + select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER > + > +# For hardlockup detectors you can have one directly provided by the arch > +# or use a "non-arch" one. If you're using a "non-arch" one that is > +# further divided the perf hardlockup detector (which, confusingly, needs > +# arch-provided perf support) and the buddy hardlockup detector (which just > +# needs SMP). In either case, using the "non-arch" code conflicts with > +# the NMI watchdog code (which is sometimes used directly and sometimes used > +# by the arch-provided hardlockup detector). > +config HAVE_HARDLOCKUP_DETECTOR_NON_ARCH > + bool > + depends on (HAVE_HARDLOCKUP_DETECTOR_PERF || SMP) && !HAVE_NMI_WATCHDOG > + default y > + > +config HARDLOCKUP_DETECTOR_PREFER_BUDDY > + bool "Prefer the buddy CPU hardlockup detector" > + depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH && HAVE_HARDLOCKUP_DETECTOR_PERF && SMP Huh, I have big troubles to scratch my head around this check: HAVE_HARDLOCKUP_DETECTOR_NON_ARCH depends on HAVE_HARDLOCKUP_DETECTOR_PERF and SMP and this depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH and again on HAVE_HARDLOCKUP_DETECTOR_PERF and SMP. > + help > + Say Y here to prefer the buddy hardlockup detector over the perf one. > + > + With the buddy detector, each CPU uses its softlockup hrtimer > + to check that the next CPU is processing hrtimer interrupts by > + verifying that a counter is increasing. > + > + This hardlockup detector is useful on systems that don't have > + an arch-specific hardlockup detector or if resources needed > + for the hardlockup detector are better used for other things. > + > +# This will select the appropriate non-arch hardlockdup detector > +config HARDLOCKUP_DETECTOR_NON_ARCH > + bool > + depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH > + select HARDLOCKUP_DETECTOR_BUDDY if !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY > + select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY > + > # > # Enables a timestamp based low pass filter to compensate for perf based > # hard lockup detection which runs too fast due to turbo modes. > @@ -1053,9 +1098,10 @@ config HARDLOCKUP_CHECK_TIMESTAMP > config HARDLOCKUP_DETECTOR > bool "Detect Hard Lockups" > depends on DEBUG_KERNEL && !S390 Is there any reason why S390 could not or do not want to use the buddy hardlockup detector. > - depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_ARCH > + depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH || HAVE_HARDLOCKUP_DETECTOR_ARCH > select LOCKUP_DETECTOR > - select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF > + select HARDLOCKUP_DETECTOR_NON_ARCH if HAVE_HARDLOCKUP_DETECTOR_NON_ARCH Anyway, the configuration of the hard lockup detectors is insane and this patchset makes it even worse, especially the new HARDLOCKUP_DETECTOR_NON_ARCH stuff. It seems that sparc, powerpc and s390 are somehow special. Do you still have in mind how they are distinguished using the Kconfig variables? For example, I am pretty confused by the meaning of HAVE_NMI_WATCHDOG. And sparc has its own variant of watchdog_hardlockup_enable()/disable(). It means that it is arch-specific. Does it work with the 13th patch which made watchdog_hardlockup_enable()/disable() to be watchdog-hardlockup-type specific? Is is somehow related to HAVE_NMI_WATCHDOG? Does this replace the entire watchdog only only the enable part? I think that we need to make this more straightforward. But I first need to understand the existing maze of config variables. Best Regards, Petr
Hi, On Thu, May 25, 2023 at 9:27 AM Petr Mladek <pmladek@suse.com> wrote: > > On Fri 2023-05-19 10:18:38, Douglas Anderson wrote: > > Implement a hardlockup detector that doesn't doesn't need any extra > > arch-specific support code to detect lockups. Instead of using > > something arch-specific we will use the buddy system, where each CPU > > watches out for another one. Specifically, each CPU will use its > > softlockup hrtimer to check that the next CPU is processing hrtimer > > interrupts by verifying that a counter is increasing. > > > > --- a/kernel/watchdog.c > > +++ b/kernel/watchdog.c > > @@ -85,7 +85,7 @@ __setup("nmi_watchdog=", hardlockup_panic_setup); > > > > #endif /* CONFIG_HARDLOCKUP_DETECTOR */ > > > > -#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) > > +#if defined(CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER) > > > > static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts); > > static DEFINE_PER_CPU(int, hrtimer_interrupts_saved); > > @@ -106,6 +106,14 @@ notrace void arch_touch_nmi_watchdog(void) > > } > > EXPORT_SYMBOL(arch_touch_nmi_watchdog); > > > > +void watchdog_hardlockup_touch_cpu(unsigned int cpu) > > +{ > > + per_cpu(watchdog_hardlockup_touched, cpu) = true; > > + > > + /* Match with smp_rmb() in watchdog_hardlockup_check() */ > > + smp_wmb(); > > It is great that you described where the related barrier is. > > Another important information is what exactly is synchronized. > And I am actually not sure what we are synchronizing here. > > My understanding is that a write barrier should synchronize > related writes, for example: > > X = ...; > /* Make sure that X is modified before Y */ > smp_wmb(); > Y = ...; > > And the related read barrier should synchronize the related reads, > for example: > > if (test(Y)) { > /* > * Make sure that we use the updated X when > * we saw the updated Y. > */ > smp_rmb(); > do_something(X); > } > > IMHO, we do not need any barrier here because we have only > one variable "watchdog_hardlockup_touched" here. > watchdog_hardlockup_check() will either see the updated value > or not. But it does not synchronize it against any other > variables or values. Fair. These barriers were present in the original buddy lockup detector that we've been carrying in ChromeOS but that doesn't necessarily mean that they were there for a good reason. Reasoning about weakly ordered memory always makes my brain hurt and I never feel confident at the end that I got the right answer and, of course, this is coupled by the fact that if I have a logic error in my reasoning that it might cause a rare / subtle bug. :( When possible I try to write code that uses full blown locks to make it easier to reason about (even if less efficient), but that's not necessarily possible here. While we obviously don't just want to sprinkle barriers all over the code, IMO it's not a terrible sin to put a barrier in a case where it makes it easier to reason about the order of things. In any case, I guess in this case I would worry about some sort of ordering race when enabling / disabling the buddy lockup detector. At the end of the buddy's watchdog_hardlockup_enable() / watchdog_hardlockup_disable() we adjust the "watchdog_cpus" which changes buddy assignments. Without a barrier, I _think_ it would be possible for a new CPU to notice the change in buddies without noticing the touch. Does that match your understanding? Now when reasoning about CPUs going online/offline we need to consider this extra case and we have to decide if there's any chance it could lead to a false lockup detection. With the memory barriers here, it's a little easier to think about. Did the above convince you about keeping the barriers? If so, do you have any suggested comment that would make it clearer? > > +} > > + > > static bool is_hardlockup(unsigned int cpu) > > { > > int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu)); > > @@ -443,11 +454,15 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) > > struct pt_regs *regs = get_irq_regs(); > > int duration; > > int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace; > > + unsigned long hrtimer_interrupts; > > > > if (!watchdog_enabled) > > return HRTIMER_NORESTART; > > > > - watchdog_hardlockup_kick(); > > + hrtimer_interrupts = watchdog_hardlockup_kick(); > > + > > + /* test for hardlockups */ > > I would omit the comment. It is not valid when perf detector is used. > And checking the buddy is clear from the function name. > > > + watchdog_buddy_check_hardlockup(hrtimer_interrupts); > > I would personally move this into watchdog_hardlockup_kick(). > watchdog_timer_fn() is already complex enough. And checking > the buddy when kicking a CPU makes sense. Sure, I'll add that to my list of things to follow-up with. > Also I would not pass "hrtimer_interrupts". I guess that it is > just an optimization. It is an extra churn in the code. IMHO, > is is not wort it. This code does not need to be super optimized. The main reason I did it is that "hrtimer_interrupts" is static to watchdog.c now. If I don't pass it in then I have to make it non-static and add it to the header. That also means anyone looking at the variable and figuring out how it is read/written needs to go search for other people that reference it. I feel like it's cleaner to just pass it in. If you feel strongly that I should change this then let me know, but otherwise I'll plan to leave this how I have it. > > /* kick the softlockup detector */ > > if (completion_done(this_cpu_ptr(&softlockup_completion))) { > > diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c > > new file mode 100644 > > index 000000000000..fee45af2e5bd > > --- /dev/null > > +++ b/kernel/watchdog_buddy.c > > @@ -0,0 +1,93 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > + > > +#include <linux/cpu.h> > > +#include <linux/cpumask.h> > > +#include <linux/kernel.h> > > +#include <linux/nmi.h> > > +#include <linux/percpu-defs.h> > > + > > +static cpumask_t __read_mostly watchdog_cpus; > > + > > +static unsigned int watchdog_next_cpu(unsigned int cpu) > > +{ > > + cpumask_t cpus = watchdog_cpus; > > A copy should be done by cpumask_copy(). > > But the question is why a copy would be needed. When called from > watchdog_buddy_check_hardlockup(), this function is not sychronized > against the CPU hotplug. And even the copying would be racy. > > IMHO, the copy does not help much and we do not need it. > > The only important this is that this function would return > a valid CPU. And I think that it is guarantted because > CPU0 could not be disabled. Yup, I'll get rid of the copy. > > + unsigned int next_cpu; > > + > > + next_cpu = cpumask_next(cpu, &cpus); > > + if (next_cpu >= nr_cpu_ids) > > + next_cpu = cpumask_first(&cpus); > > + > > + if (next_cpu == cpu) > > + return nr_cpu_ids; > >> + return next_cpu; > > +} > > + > > +int __init watchdog_hardlockup_probe(void) > > +{ > > + return 0; > > +} > > + > > +void watchdog_hardlockup_enable(unsigned int cpu) > > +{ > > + unsigned int next_cpu; > > + > > + /* > > + * The new CPU will be marked online before the hrtimer interrupt > > + * gets a chance to run on it. If another CPU tests for a > > + * hardlockup on the new CPU before it has run its the hrtimer > > + * interrupt, it will get a false positive. Touch the watchdog on > > + * the new CPU to delay the check for at least 3 sampling periods > > + * to guarantee one hrtimer has run on the new CPU. > > + */ > > + watchdog_hardlockup_touch_cpu(cpu); > > + > > + /* > > + * We are going to check the next CPU. Our watchdog_hrtimer > > + * need not be zero if the CPU has already been online earlier. > > + * Touch the watchdog on the next CPU to avoid false positive > > + * if we try to check it in less then 3 interrupts. > > + */ > > + next_cpu = watchdog_next_cpu(cpu); > > + if (next_cpu < nr_cpu_ids) > > + watchdog_hardlockup_touch_cpu(next_cpu); > > Thinking loudly: > > This feels racy when many CPUs are enabled/disabled in parallel. > I am not 100% sure it it can happen though. > > My understanding is that it can't happen because the CPU hotplug > is serialized by cpu_add_remove_lock. > > So, this seems to work after all. > > > + > > + cpumask_set_cpu(cpu, &watchdog_cpus); > > +} > > + > > --- a/lib/Kconfig.debug > > +++ b/lib/Kconfig.debug > > @@ -1035,10 +1035,55 @@ config BOOTPARAM_SOFTLOCKUP_PANIC > > > > Say N if unsure. > > > > -config HARDLOCKUP_DETECTOR_PERF > > +# Both the "perf" and "buddy" hardlockup detectors count hrtimer > > +# interrupts. This config enables functions managing this common code. > > +config HARDLOCKUP_DETECTOR_COUNTS_HRTIMER > > bool > > select SOFTLOCKUP_DETECTOR > > > > +config HARDLOCKUP_DETECTOR_PERF > > + bool > > + depends on HAVE_HARDLOCKUP_DETECTOR_PERF > > + select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER > > + > > +config HARDLOCKUP_DETECTOR_BUDDY > > + bool > > + depends on SMP > > + select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER > > + > > +# For hardlockup detectors you can have one directly provided by the arch > > +# or use a "non-arch" one. If you're using a "non-arch" one that is > > +# further divided the perf hardlockup detector (which, confusingly, needs > > +# arch-provided perf support) and the buddy hardlockup detector (which just > > +# needs SMP). In either case, using the "non-arch" code conflicts with > > +# the NMI watchdog code (which is sometimes used directly and sometimes used > > +# by the arch-provided hardlockup detector). > > +config HAVE_HARDLOCKUP_DETECTOR_NON_ARCH > > + bool > > + depends on (HAVE_HARDLOCKUP_DETECTOR_PERF || SMP) && !HAVE_NMI_WATCHDOG > > + default y > > + > > +config HARDLOCKUP_DETECTOR_PREFER_BUDDY > > + bool "Prefer the buddy CPU hardlockup detector" > > + depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH && HAVE_HARDLOCKUP_DETECTOR_PERF && SMP > > Huh, I have big troubles to scratch my head around this check: > > HAVE_HARDLOCKUP_DETECTOR_NON_ARCH depends on HAVE_HARDLOCKUP_DETECTOR_PERF and SMP > > and this depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH and again > on HAVE_HARDLOCKUP_DETECTOR_PERF and SMP. The goal is to have "HARDLOCKUP_DETECTOR_PREFER_BUDDY" to show up as an option if there is an option _other_ than the buddy. If there's not more than one hardlockup detector to pick from then there's no reason to ask the person configuring the kernel which one they'd prefer. At the moment, if you have an "arch" lockup detector then you're stuck with it, so you only get a choice if a "perf" detector is available and you've got SMP. Ah, so I guess this could be simplified to: depends on HAVE_HARDLOCKUP_DETECTOR_PERF && SMP OK, I'll add that to the list. > > + help > > + Say Y here to prefer the buddy hardlockup detector over the perf one. > > + > > + With the buddy detector, each CPU uses its softlockup hrtimer > > + to check that the next CPU is processing hrtimer interrupts by > > + verifying that a counter is increasing. > > + > > + This hardlockup detector is useful on systems that don't have > > + an arch-specific hardlockup detector or if resources needed > > + for the hardlockup detector are better used for other things. > > + > > +# This will select the appropriate non-arch hardlockdup detector > > +config HARDLOCKUP_DETECTOR_NON_ARCH > > + bool > > + depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH > > + select HARDLOCKUP_DETECTOR_BUDDY if !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY > > + select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY > > + > > # > > # Enables a timestamp based low pass filter to compensate for perf based > > # hard lockup detection which runs too fast due to turbo modes. > > @@ -1053,9 +1098,10 @@ config HARDLOCKUP_CHECK_TIMESTAMP > > config HARDLOCKUP_DETECTOR > > bool "Detect Hard Lockups" > > depends on DEBUG_KERNEL && !S390 > > Is there any reason why S390 could not or do not want to use the buddy > hardlockup detector. This isn't a new dependency, but it's a good question. Looking at the git history, I see commit dea20a3fbdd0 ("[PATCH] Disable DETECT_SOFTLOCKUP for s390"). ...and it looks like the softlockup detector still says it's broken on s390. That would mean that the buddy detector is broken too. > > - depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_ARCH > > + depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH || HAVE_HARDLOCKUP_DETECTOR_ARCH > > select LOCKUP_DETECTOR > > - select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF > > + select HARDLOCKUP_DETECTOR_NON_ARCH if HAVE_HARDLOCKUP_DETECTOR_NON_ARCH > > Anyway, the configuration of the hard lockup detectors is insane and > this patchset makes it even worse, especially the new > HARDLOCKUP_DETECTOR_NON_ARCH stuff. > > It seems that sparc, powerpc and s390 are somehow special. Do you > still have in mind how they are distinguished using the Kconfig > variables? > > For example, I am pretty confused by the meaning of HAVE_NMI_WATCHDOG. > > And sparc has its own variant of > watchdog_hardlockup_enable()/disable(). It means that it is > arch-specific. Does it work with the 13th patch which made > watchdog_hardlockup_enable()/disable() to be watchdog-hardlockup-type > specific? Is is somehow related to HAVE_NMI_WATCHDOG? > Does this replace the entire watchdog only only the enable part? > > I think that we need to make this more straightforward. But I first > need to understand the existing maze of config variables. I agree that it's confusing. I'm obviously biased, but IMO it's less confusing after my patchset than before. ;-) The state of the world before my patchset set a pretty low bar. As far as I understand it, at an architecture-level you can choose any _ONE_ of the following: a) Implement bits needed for the the "perf" hardlockup detector. x86 has done this, some configs of powerpc do this, and arm64 now after my patch series. This is HAVE_HARDLOCKUP_DETECTOR_PERF. b) Implement your own totally separate hardlockup detector that doesn't use any of the common "perf" code but still looks the same to userspace (same sysctls, etc). Only powerpc does this (in some configs). As per conversations in previous versions of my patch series, apparently powerpc's version is quite fancy and maybe someday people can move some of these features to the common code. This is HAVE_HARDLOCKUP_DETECTOR_ARCH. c) Don't implement the full features of a hardlockup detector but still have the basics. In the very least, I think it doesn't support the sysctls "hardlockup_panic" and "hardlockup_all_cpu_backtrace". It doesn't support the kernel command line parameter "nmi_watchdog=". I don't know for sure if there are any other differences. Only sparc64 does this. This is HAVE_NMI_WATCHDOG. Confusingly, HAVE_HARDLOCKUP_DETECTOR_ARCH selects HAVE_NMI_WATCHDOG. d) Don't implement _any_ hardlockup detector of any sort. After my patchset you can still end up with "buddy" if you have SMP. One thing that would probably help would be to bring sparc64 to a full "arch" hardlockup implementation and then get rid of the special case. That seems a bit outside my scope, though if someone wanted to post patches for that I'd be willing to give them a review. I guess other than that, the best we could try to do is to rename some configs and/or add some subconfigs to describe certain features? Maybe HAVE_NMI_WATCHDOG => HAVE_HARDLOCKUP_DETECTOR_ARCH_BASIC_FEATURES would help? I'd love to come up with a better name for HAVE_HARDLOCKUP_DETECTOR_NON_ARCH but I couldn't come up with one. Maybe the unwieldy "HAVE_HARDLOCKUP_DETECTOR_THAT_COUNTS_HRTIMER"? If you have concrete suggestions for what would be cleaner, let me know and I can queue up a patch. ...or I'm happy to review a patch. -Doug
On Thu 2023-05-25 13:08:04, Doug Anderson wrote: > Hi, > > On Thu, May 25, 2023 at 9:27 AM Petr Mladek <pmladek@suse.com> wrote: > > > > On Fri 2023-05-19 10:18:38, Douglas Anderson wrote: > > > Implement a hardlockup detector that doesn't doesn't need any extra > > > arch-specific support code to detect lockups. Instead of using > > > something arch-specific we will use the buddy system, where each CPU > > > watches out for another one. Specifically, each CPU will use its > > > softlockup hrtimer to check that the next CPU is processing hrtimer > > > interrupts by verifying that a counter is increasing. > > > > > > --- a/kernel/watchdog.c > > > +++ b/kernel/watchdog.c > > > @@ -85,7 +85,7 @@ __setup("nmi_watchdog=", hardlockup_panic_setup); > > > > > > #endif /* CONFIG_HARDLOCKUP_DETECTOR */ > > > > > > -#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) > > > +#if defined(CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER) > > > > > > static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts); > > > static DEFINE_PER_CPU(int, hrtimer_interrupts_saved); > > > @@ -106,6 +106,14 @@ notrace void arch_touch_nmi_watchdog(void) > > > } > > > EXPORT_SYMBOL(arch_touch_nmi_watchdog); > > > > > > +void watchdog_hardlockup_touch_cpu(unsigned int cpu) > > > +{ > > > + per_cpu(watchdog_hardlockup_touched, cpu) = true; > > > + > > > + /* Match with smp_rmb() in watchdog_hardlockup_check() */ > > > + smp_wmb(); > > > > It is great that you described where the related barrier is. > > > > Another important information is what exactly is synchronized. > > And I am actually not sure what we are synchronizing here. > > > > My understanding is that a write barrier should synchronize > > related writes, for example: > > > > X = ...; > > /* Make sure that X is modified before Y */ > > smp_wmb(); > > Y = ...; > > > > And the related read barrier should synchronize the related reads, > > for example: > > > > if (test(Y)) { > > /* > > * Make sure that we use the updated X when > > * we saw the updated Y. > > */ > > smp_rmb(); > > do_something(X); > > } > > > > IMHO, we do not need any barrier here because we have only > > one variable "watchdog_hardlockup_touched" here. > > watchdog_hardlockup_check() will either see the updated value > > or not. But it does not synchronize it against any other > > variables or values. > > Fair. These barriers were present in the original buddy lockup > detector that we've been carrying in ChromeOS but that doesn't > necessarily mean that they were there for a good reason. > > Reasoning about weakly ordered memory always makes my brain hurt and I > never feel confident at the end that I got the right answer and, of > course, this is coupled by the fact that if I have a logic error in my > reasoning that it might cause a rare / subtle bug. :( Sure. Lockless code is complicated. > When possible I > try to write code that uses full blown locks to make it easier to > reason about (even if less efficient), Makes sense. There should be a good reason to use lockless code because it is complicated to do it right and maintain. > but that's not necessarily > possible here. While we obviously don't just want to sprinkle barriers > all over the code, IMO it's not a terrible sin to put a barrier in a > case where it makes it easier to reason about the order of things. I understand this. Well, it is always important to describe the the reason why the barrier was added there. Even when it is wrong, it gives a clue what was the motivation. Otherwise, it is hard to do any changes on the code later. I guess that it might be more problematic for you because you probably are not the original author. > In any case, I guess in this case I would worry about some sort of > ordering race when enabling / disabling the buddy lockup detector. At > the end of the buddy's watchdog_hardlockup_enable() / > watchdog_hardlockup_disable() we adjust the "watchdog_cpus" which > changes buddy assignments. Without a barrier, I _think_ it would be > possible for a new CPU to notice the change in buddies without > noticing the touch. Does that match your understanding? Now when > reasoning about CPUs going online/offline we need to consider this > extra case and we have to decide if there's any chance it could lead > to a false lockup detection. With the memory barriers here, it's a > little easier to think about. This makes sense. I did not think about the hotplug scenario. Well, I suggest to move the barriers into the buddy code and describe it there. It does not make sense to use the barriers for the perf hardlockup. I mean something like: diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c index fee45af2e5bd..ebe71dcb55e6 100644 --- a/kernel/watchdog_buddy.c +++ b/kernel/watchdog_buddy.c @@ -52,6 +52,13 @@ void watchdog_hardlockup_enable(unsigned int cpu) if (next_cpu < nr_cpu_ids) watchdog_hardlockup_touch_cpu(next_cpu); + /* + * Makes sure that watchdog is touched on this CPU before + * other CPUs could see it in watchdog_cpus. The counter + * part is in watchdog_buddy_check_hardlockup(). + */ + smp_wmb(); + cpumask_set_cpu(cpu, &watchdog_cpus); } @@ -69,6 +76,13 @@ void watchdog_hardlockup_disable(unsigned int cpu) if (next_cpu < nr_cpu_ids) watchdog_hardlockup_touch_cpu(next_cpu); + /* + * Makes sure that watchdog is touched on the next CPU before + * this CPU disappear in watchdog_cpus. The counter part is in + * watchdog_buddy_check_hardlockup(). + */ + smp_wmb(); + cpumask_clear_cpu(cpu, &watchdog_cpus); } @@ -89,5 +103,12 @@ void watchdog_buddy_check_hardlockup(unsigned long hrtimer_interrupts) if (next_cpu >= nr_cpu_ids) return; + /* + * Make sure that the watchdog was touched on next CPU when + * watchdog_next_cpu() returned another one because of + * a change in watchdog_hardlockup_enable()/disable(). + */ + smp_rmb(); + watchdog_hardlockup_check(next_cpu, NULL); } > Did the above convince you about keeping the barriers? If so, do you > have any suggested comment that would make it clearer? > > > > > +} > > > + > > > static bool is_hardlockup(unsigned int cpu) > > > { > > > int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu)); > > > @@ -443,11 +454,15 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) > > > struct pt_regs *regs = get_irq_regs(); > > > int duration; > > > int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace; > > > + unsigned long hrtimer_interrupts; > > > > > > if (!watchdog_enabled) > > > return HRTIMER_NORESTART; > > > > > > - watchdog_hardlockup_kick(); > > > + hrtimer_interrupts = watchdog_hardlockup_kick(); > > > + > > > + /* test for hardlockups */ > > > > I would omit the comment. It is not valid when perf detector is used. > > And checking the buddy is clear from the function name. > > > > > + watchdog_buddy_check_hardlockup(hrtimer_interrupts); > > > > I would personally move this into watchdog_hardlockup_kick(). > > watchdog_timer_fn() is already complex enough. And checking > > the buddy when kicking a CPU makes sense. > > Sure, I'll add that to my list of things to follow-up with. > > > > Also I would not pass "hrtimer_interrupts". I guess that it is > > just an optimization. It is an extra churn in the code. IMHO, > > is is not wort it. This code does not need to be super optimized. > > The main reason I did it is that "hrtimer_interrupts" is static to > watchdog.c now. If I don't pass it in then I have to make it > non-static and add it to the header. That also means anyone looking at > the variable and figuring out how it is read/written needs to go > search for other people that reference it. I feel like it's cleaner to > just pass it in. If you feel strongly that I should change this then > let me know, but otherwise I'll plan to leave this how I have it. I do not have strong opinion. For me, the more important change is to move watchdog_buddy_check_hardlockup() into watchdog_hardlockup_kick(). watchdog_timer_fn() is already too complex. > > > > /* kick the softlockup detector */ > > > if (completion_done(this_cpu_ptr(&softlockup_completion))) { > > > --- a/lib/Kconfig.debug > > > +++ b/lib/Kconfig.debug > > > @@ -1035,10 +1035,55 @@ config BOOTPARAM_SOFTLOCKUP_PANIC > > > > > > Say N if unsure. > > > > > > -config HARDLOCKUP_DETECTOR_PERF > > > +# Both the "perf" and "buddy" hardlockup detectors count hrtimer > > > +# interrupts. This config enables functions managing this common code. > > > +config HARDLOCKUP_DETECTOR_COUNTS_HRTIMER > > > bool > > > select SOFTLOCKUP_DETECTOR > > > > > > +config HARDLOCKUP_DETECTOR_PERF > > > + bool > > > + depends on HAVE_HARDLOCKUP_DETECTOR_PERF > > > + select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER > > > + > > > +config HARDLOCKUP_DETECTOR_BUDDY > > > + bool > > > + depends on SMP > > > + select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER > > > + > > > +# For hardlockup detectors you can have one directly provided by the arch > > > +# or use a "non-arch" one. If you're using a "non-arch" one that is > > > +# further divided the perf hardlockup detector (which, confusingly, needs > > > +# arch-provided perf support) and the buddy hardlockup detector (which just > > > +# needs SMP). In either case, using the "non-arch" code conflicts with > > > +# the NMI watchdog code (which is sometimes used directly and sometimes used > > > +# by the arch-provided hardlockup detector). > > > +config HAVE_HARDLOCKUP_DETECTOR_NON_ARCH > > > + bool > > > + depends on (HAVE_HARDLOCKUP_DETECTOR_PERF || SMP) && !HAVE_NMI_WATCHDOG > > > + default y > > > + > > > +config HARDLOCKUP_DETECTOR_PREFER_BUDDY > > > + bool "Prefer the buddy CPU hardlockup detector" > > > + depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH && HAVE_HARDLOCKUP_DETECTOR_PERF && SMP > > > > Huh, I have big troubles to scratch my head around this check: > > > > HAVE_HARDLOCKUP_DETECTOR_NON_ARCH depends on HAVE_HARDLOCKUP_DETECTOR_PERF and SMP > > > > and this depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH and again > > on HAVE_HARDLOCKUP_DETECTOR_PERF and SMP. > > The goal is to have "HARDLOCKUP_DETECTOR_PREFER_BUDDY" to show up as > an option if there is an option _other_ than the buddy. If there's not > more than one hardlockup detector to pick from then there's no reason > to ask the person configuring the kernel which one they'd prefer. At > the moment, if you have an "arch" lockup detector then you're stuck > with it, so you only get a choice if a "perf" detector is available > and you've got SMP. > > Ah, so I guess this could be simplified to: > > depends on HAVE_HARDLOCKUP_DETECTOR_PERF && SMP Yes, this is much better. > OK, I'll add that to the list. > > > > > + help > > > + Say Y here to prefer the buddy hardlockup detector over the perf one. > > > + > > > + With the buddy detector, each CPU uses its softlockup hrtimer > > > + to check that the next CPU is processing hrtimer interrupts by > > > + verifying that a counter is increasing. > > > + > > > + This hardlockup detector is useful on systems that don't have > > > + an arch-specific hardlockup detector or if resources needed > > > + for the hardlockup detector are better used for other things. > > > + > > > +# This will select the appropriate non-arch hardlockdup detector > > > +config HARDLOCKUP_DETECTOR_NON_ARCH > > > + bool > > > + depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH > > > + select HARDLOCKUP_DETECTOR_BUDDY if !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY > > > + select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY > > > + > > > # > > > # Enables a timestamp based low pass filter to compensate for perf based > > > # hard lockup detection which runs too fast due to turbo modes. > > > @@ -1053,9 +1098,10 @@ config HARDLOCKUP_CHECK_TIMESTAMP > > > config HARDLOCKUP_DETECTOR > > > bool "Detect Hard Lockups" > > > depends on DEBUG_KERNEL && !S390 > > > > Is there any reason why S390 could not or do not want to use the buddy > > hardlockup detector. > > This isn't a new dependency, but it's a good question. Looking at the > git history, I see commit dea20a3fbdd0 ("[PATCH] Disable > DETECT_SOFTLOCKUP for s390"). ...and it looks like the softlockup > detector still says it's broken on s390. That would mean that the > buddy detector is broken too. It seems that s390 wanted to disable the watchdog completely, see the commit dea20a3fbdd08e5 ("[PATCH] Disable DETECT_SOFTLOCKUP for s390") because they got too many false positives. > > > > - depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_ARCH > > > + depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH || HAVE_HARDLOCKUP_DETECTOR_ARCH > > > select LOCKUP_DETECTOR > > > - select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF > > > + select HARDLOCKUP_DETECTOR_NON_ARCH if HAVE_HARDLOCKUP_DETECTOR_NON_ARCH > > > > Anyway, the configuration of the hard lockup detectors is insane and > > this patchset makes it even worse, especially the new > > HARDLOCKUP_DETECTOR_NON_ARCH stuff. > > > > It seems that sparc, powerpc and s390 are somehow special. Do you > > still have in mind how they are distinguished using the Kconfig > > variables? > > > > For example, I am pretty confused by the meaning of HAVE_NMI_WATCHDOG. > > > > And sparc has its own variant of > > watchdog_hardlockup_enable()/disable(). It means that it is > > arch-specific. Does it work with the 13th patch which made > > watchdog_hardlockup_enable()/disable() to be watchdog-hardlockup-type > > specific? Is is somehow related to HAVE_NMI_WATCHDOG? > > Does this replace the entire watchdog only only the enable part? > > > > I think that we need to make this more straightforward. But I first > > need to understand the existing maze of config variables. > > I agree that it's confusing. I'm obviously biased, but IMO it's less > confusing after my patchset than before. ;-) The state of the world > before my patchset set a pretty low bar. > > As far as I understand it, at an architecture-level you can choose any > _ONE_ of the following: > > a) Implement bits needed for the the "perf" hardlockup detector. x86 > has done this, some configs of powerpc do this, and arm64 now after my > patch series. This is HAVE_HARDLOCKUP_DETECTOR_PERF. > > b) Implement your own totally separate hardlockup detector that > doesn't use any of the common "perf" code but still looks the same to > userspace (same sysctls, etc). Only powerpc does this (in some > configs). As per conversations in previous versions of my patch > series, apparently powerpc's version is quite fancy and maybe someday > people can move some of these features to the common code. This is > HAVE_HARDLOCKUP_DETECTOR_ARCH. > > c) Don't implement the full features of a hardlockup detector but > still have the basics. In the very least, I think it doesn't support > the sysctls "hardlockup_panic" and "hardlockup_all_cpu_backtrace". It > doesn't support the kernel command line parameter "nmi_watchdog=". I > don't know for sure if there are any other differences. Only sparc64 > does this. This is HAVE_NMI_WATCHDOG. Confusingly, > HAVE_HARDLOCKUP_DETECTOR_ARCH selects HAVE_NMI_WATCHDOG. > > d) Don't implement _any_ hardlockup detector of any sort. After my > patchset you can still end up with "buddy" if you have SMP. > > One thing that would probably help would be to bring sparc64 to a full > "arch" hardlockup implementation and then get rid of the special case. > That seems a bit outside my scope, though if someone wanted to post > patches for that I'd be willing to give them a review. It would be nice but it might be problematic if we do not have access to the hardware. > I guess other than that, the best we could try to do is to rename some > configs and/or add some subconfigs to describe certain features? Maybe > HAVE_NMI_WATCHDOG => HAVE_HARDLOCKUP_DETECTOR_ARCH_BASIC_FEATURES > would help? I'd love to come up with a better name for > HAVE_HARDLOCKUP_DETECTOR_NON_ARCH but I couldn't come up with one. > Maybe the unwieldy "HAVE_HARDLOCKUP_DETECTOR_THAT_COUNTS_HRTIMER"? Renaming the config variables seems to be the best solution at the moment. IMHO, it would be nice to have something like: + CONFIG_HARDLOCKUP_DETECTOR for code shared by all hardlockup detectors + CONFIG_HARDLOCKUP_DETECTOR_PERF for code using kernel/watchdog_perf.c + CONFIG_HARDLOCKUP_DETECTOR_BUDDY for code using kernel/watchdog_buddy.c + HAVE_HARDLOCKUP_DETECTOR_PERF set by architectures that support using kernel/watchdog_perf.c + HAVE_HARDLOCKUP_DETECTOR_ARCH set by architectures that have alternative implementation of the hardlockup detector + CONFIG_HARDLOCKUP_DETECTOR_PREFER_BUDDY Allow to prefer the buddy detector when _PERF or _ARCH is available as well. + HAVE_HARDLOCKUP_PANIC + HAVE_HARDLOCKUP_ALL_CPU_BACKTRACE set when the earchitecture support these features and used for the sysfs interface > If you have concrete suggestions for what would be cleaner, let me > know and I can queue up a patch. ...or I'm happy to review a patch. I am not sure how complicated it would be to rename the config variables to somehing sane. I am sorry I do not have time to prepare the patches at the moment. I'll let Andrew to decide if he would require this cleanup to accept the patchset. Best Regards, Petr
diff --git a/include/linux/nmi.h b/include/linux/nmi.h index c216e8a1be1f..47db14e7da52 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -87,8 +87,9 @@ extern unsigned int hardlockup_panic; static inline void hardlockup_detector_disable(void) {} #endif -#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) +#if defined(CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER) void arch_touch_nmi_watchdog(void); +void watchdog_hardlockup_touch_cpu(unsigned int cpu); void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs); #elif !defined(CONFIG_HAVE_NMI_WATCHDOG) static inline void arch_touch_nmi_watchdog(void) { } @@ -118,6 +119,12 @@ void watchdog_hardlockup_disable(unsigned int cpu); void lockup_detector_reconfigure(void); +#ifdef CONFIG_HARDLOCKUP_DETECTOR_BUDDY +void watchdog_buddy_check_hardlockup(unsigned long hrtimer_interrupts); +#else +static inline void watchdog_buddy_check_hardlockup(unsigned long hrtimer_interrupts) {} +#endif + /** * touch_nmi_watchdog - manually pet the hardlockup watchdog. * diff --git a/kernel/Makefile b/kernel/Makefile index 7eb72033143c..f9e3fd9195d9 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -91,6 +91,7 @@ obj-$(CONFIG_FAIL_FUNCTION) += fail_function.o obj-$(CONFIG_KGDB) += debug/ obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o +obj-$(CONFIG_HARDLOCKUP_DETECTOR_BUDDY) += watchdog_buddy.o obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_perf.o obj-$(CONFIG_SECCOMP) += seccomp.o obj-$(CONFIG_RELAY) += relay.o diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 08ce046f636d..4110f7ca23a5 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -85,7 +85,7 @@ __setup("nmi_watchdog=", hardlockup_panic_setup); #endif /* CONFIG_HARDLOCKUP_DETECTOR */ -#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) +#if defined(CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER) static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts); static DEFINE_PER_CPU(int, hrtimer_interrupts_saved); @@ -106,6 +106,14 @@ notrace void arch_touch_nmi_watchdog(void) } EXPORT_SYMBOL(arch_touch_nmi_watchdog); +void watchdog_hardlockup_touch_cpu(unsigned int cpu) +{ + per_cpu(watchdog_hardlockup_touched, cpu) = true; + + /* Match with smp_rmb() in watchdog_hardlockup_check() */ + smp_wmb(); +} + static bool is_hardlockup(unsigned int cpu) { int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu)); @@ -123,13 +131,16 @@ static bool is_hardlockup(unsigned int cpu) return false; } -static void watchdog_hardlockup_kick(void) +static unsigned long watchdog_hardlockup_kick(void) { - atomic_inc(raw_cpu_ptr(&hrtimer_interrupts)); + return atomic_inc_return(raw_cpu_ptr(&hrtimer_interrupts)); } void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs) { + /* Match with smp_wmb() in watchdog_hardlockup_touch_cpu() */ + smp_rmb(); + if (per_cpu(watchdog_hardlockup_touched, cpu)) { per_cpu(watchdog_hardlockup_touched, cpu) = false; return; @@ -180,11 +191,11 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs) } } -#else /* CONFIG_HARDLOCKUP_DETECTOR_PERF */ +#else /* CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER */ -static inline void watchdog_hardlockup_kick(void) { } +static inline unsigned long watchdog_hardlockup_kick(void) { return 0; } -#endif /* !CONFIG_HARDLOCKUP_DETECTOR_PERF */ +#endif /* !CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER */ /* * These functions can be overridden based on the configured hardlockdup detector. @@ -443,11 +454,15 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) struct pt_regs *regs = get_irq_regs(); int duration; int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace; + unsigned long hrtimer_interrupts; if (!watchdog_enabled) return HRTIMER_NORESTART; - watchdog_hardlockup_kick(); + hrtimer_interrupts = watchdog_hardlockup_kick(); + + /* test for hardlockups */ + watchdog_buddy_check_hardlockup(hrtimer_interrupts); /* kick the softlockup detector */ if (completion_done(this_cpu_ptr(&softlockup_completion))) { diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c new file mode 100644 index 000000000000..fee45af2e5bd --- /dev/null +++ b/kernel/watchdog_buddy.c @@ -0,0 +1,93 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <linux/cpu.h> +#include <linux/cpumask.h> +#include <linux/kernel.h> +#include <linux/nmi.h> +#include <linux/percpu-defs.h> + +static cpumask_t __read_mostly watchdog_cpus; + +static unsigned int watchdog_next_cpu(unsigned int cpu) +{ + cpumask_t cpus = watchdog_cpus; + unsigned int next_cpu; + + next_cpu = cpumask_next(cpu, &cpus); + if (next_cpu >= nr_cpu_ids) + next_cpu = cpumask_first(&cpus); + + if (next_cpu == cpu) + return nr_cpu_ids; + + return next_cpu; +} + +int __init watchdog_hardlockup_probe(void) +{ + return 0; +} + +void watchdog_hardlockup_enable(unsigned int cpu) +{ + unsigned int next_cpu; + + /* + * The new CPU will be marked online before the hrtimer interrupt + * gets a chance to run on it. If another CPU tests for a + * hardlockup on the new CPU before it has run its the hrtimer + * interrupt, it will get a false positive. Touch the watchdog on + * the new CPU to delay the check for at least 3 sampling periods + * to guarantee one hrtimer has run on the new CPU. + */ + watchdog_hardlockup_touch_cpu(cpu); + + /* + * We are going to check the next CPU. Our watchdog_hrtimer + * need not be zero if the CPU has already been online earlier. + * Touch the watchdog on the next CPU to avoid false positive + * if we try to check it in less then 3 interrupts. + */ + next_cpu = watchdog_next_cpu(cpu); + if (next_cpu < nr_cpu_ids) + watchdog_hardlockup_touch_cpu(next_cpu); + + cpumask_set_cpu(cpu, &watchdog_cpus); +} + +void watchdog_hardlockup_disable(unsigned int cpu) +{ + unsigned int next_cpu = watchdog_next_cpu(cpu); + + /* + * Offlining this CPU will cause the CPU before this one to start + * checking the one after this one. If this CPU just finished checking + * the next CPU and updating hrtimer_interrupts_saved, and then the + * previous CPU checks it within one sample period, it will trigger a + * false positive. Touch the watchdog on the next CPU to prevent it. + */ + if (next_cpu < nr_cpu_ids) + watchdog_hardlockup_touch_cpu(next_cpu); + + cpumask_clear_cpu(cpu, &watchdog_cpus); +} + +void watchdog_buddy_check_hardlockup(unsigned long hrtimer_interrupts) +{ + unsigned int next_cpu; + + /* + * Test for hardlockups every 3 samples. The sample period is + * watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over + * watchdog_thresh (over by 20%). + */ + if (hrtimer_interrupts % 3 != 0) + return; + + /* check for a hardlockup on the next CPU */ + next_cpu = watchdog_next_cpu(smp_processor_id()); + if (next_cpu >= nr_cpu_ids) + return; + + watchdog_hardlockup_check(next_cpu, NULL); +} diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index ce51d4dc6803..abcad0513a32 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1035,10 +1035,55 @@ config BOOTPARAM_SOFTLOCKUP_PANIC Say N if unsure. -config HARDLOCKUP_DETECTOR_PERF +# Both the "perf" and "buddy" hardlockup detectors count hrtimer +# interrupts. This config enables functions managing this common code. +config HARDLOCKUP_DETECTOR_COUNTS_HRTIMER bool select SOFTLOCKUP_DETECTOR +config HARDLOCKUP_DETECTOR_PERF + bool + depends on HAVE_HARDLOCKUP_DETECTOR_PERF + select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER + +config HARDLOCKUP_DETECTOR_BUDDY + bool + depends on SMP + select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER + +# For hardlockup detectors you can have one directly provided by the arch +# or use a "non-arch" one. If you're using a "non-arch" one that is +# further divided the perf hardlockup detector (which, confusingly, needs +# arch-provided perf support) and the buddy hardlockup detector (which just +# needs SMP). In either case, using the "non-arch" code conflicts with +# the NMI watchdog code (which is sometimes used directly and sometimes used +# by the arch-provided hardlockup detector). +config HAVE_HARDLOCKUP_DETECTOR_NON_ARCH + bool + depends on (HAVE_HARDLOCKUP_DETECTOR_PERF || SMP) && !HAVE_NMI_WATCHDOG + default y + +config HARDLOCKUP_DETECTOR_PREFER_BUDDY + bool "Prefer the buddy CPU hardlockup detector" + depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH && HAVE_HARDLOCKUP_DETECTOR_PERF && SMP + help + Say Y here to prefer the buddy hardlockup detector over the perf one. + + With the buddy detector, each CPU uses its softlockup hrtimer + to check that the next CPU is processing hrtimer interrupts by + verifying that a counter is increasing. + + This hardlockup detector is useful on systems that don't have + an arch-specific hardlockup detector or if resources needed + for the hardlockup detector are better used for other things. + +# This will select the appropriate non-arch hardlockdup detector +config HARDLOCKUP_DETECTOR_NON_ARCH + bool + depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH + select HARDLOCKUP_DETECTOR_BUDDY if !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY + select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY + # # Enables a timestamp based low pass filter to compensate for perf based # hard lockup detection which runs too fast due to turbo modes. @@ -1053,9 +1098,10 @@ config HARDLOCKUP_CHECK_TIMESTAMP config HARDLOCKUP_DETECTOR bool "Detect Hard Lockups" depends on DEBUG_KERNEL && !S390 - depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_ARCH + depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH || HAVE_HARDLOCKUP_DETECTOR_ARCH select LOCKUP_DETECTOR - select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF + select HARDLOCKUP_DETECTOR_NON_ARCH if HAVE_HARDLOCKUP_DETECTOR_NON_ARCH + help Say Y here to enable the kernel to act as a watchdog to detect hard lockups.