Message ID | 20230519101840.v5.2.I843b0d1de3e096ba111a179f3adb16d576bef5c7@changeid (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | watchdog/hardlockup: Add the buddy hardlockup detector | expand |
On Fri 2023-05-19 10:18:26, Douglas Anderson wrote: > Currently, in the watchdog_overflow_callback() we first check to see > if the watchdog had been touched and _then_ we handle the workaround > for turbo mode. This order should be reversed. > > Specifically, "touching" the hardlockup detector's watchdog should > avoid lockups being detected for one period that should be roughly the > same regardless of whether we're running turbo or not. That means that > we should do the extra accounting for turbo _before_ we look at (and > clear) the global indicating that we've been touched. The ideal solution would be to reset the turbo-mode-related variables when the watchdog is touched. And keep checking watchdog_nmi_touch first. But this ordering change should be good enough. It causes that we always check watchdog_nmi_touch when the turbo-more-related variables are already reset. > NOTE: this fix is made based on code inspection. I am not aware of any > reports where the old code would have generated false positives. That > being said, this order seems more correct and also makes it easier > down the line to share code with the "buddy" hardlockup detector. > > Fixes: 7edaeb6841df ("kernel/watchdog: Prevent false positives with turbo modes") > Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Best Regards, Petr
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c index 247bf0b1582c..1e8a49dc956e 100644 --- a/kernel/watchdog_hld.c +++ b/kernel/watchdog_hld.c @@ -114,14 +114,14 @@ static void watchdog_overflow_callback(struct perf_event *event, /* Ensure the watchdog never gets throttled */ event->hw.interrupts = 0; + if (!watchdog_check_timestamp()) + return; + if (__this_cpu_read(watchdog_nmi_touch) == true) { __this_cpu_write(watchdog_nmi_touch, false); return; } - if (!watchdog_check_timestamp()) - return; - /* check for a hardlockup * This is done by making sure our timer interrupt * is incrementing. The timer interrupt should have
Currently, in the watchdog_overflow_callback() we first check to see if the watchdog had been touched and _then_ we handle the workaround for turbo mode. This order should be reversed. Specifically, "touching" the hardlockup detector's watchdog should avoid lockups being detected for one period that should be roughly the same regardless of whether we're running turbo or not. That means that we should do the extra accounting for turbo _before_ we look at (and clear) the global indicating that we've been touched. NOTE: this fix is made based on code inspection. I am not aware of any reports where the old code would have generated false positives. That being said, this order seems more correct and also makes it easier down the line to share code with the "buddy" hardlockup detector. Fixes: 7edaeb6841df ("kernel/watchdog: Prevent false positives with turbo modes") Signed-off-by: Douglas Anderson <dianders@chromium.org> --- Changes in v5: - ("More properly prevent false ...") promoted to its own patch for v5. kernel/watchdog_hld.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)