diff mbox series

[1/2] watchdog/perf: Provide function for adjusting the event period

Message ID 20250307021811.46981-2-yangyicong@huawei.com (mailing list archive)
State New
Headers show
Series Update the watchdog period according to real CPU frequency | expand

Commit Message

Yicong Yang March 7, 2025, 2:18 a.m. UTC
From: Yicong Yang <yangyicong@hisilicon.com>

Architecture's using perf events for hard lockup detection needs to
convert the watchdog_thresh to the event's period, some architecture
for example arm64 perform this conversion using the CPU's maximum
frequency which will be acquired by cpufreq. However by the time
the lockup detector's initialized the cpufreq driver may not be
initialized, thus launch a watchdog with inaccurate period. Provide
a function dhardlockup_detector_perf_adjust_period() to allowing
adjust the event period. Then architecture can update with more
accurate period if cpufreq is initialized.

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
---
 include/linux/nmi.h    |  2 ++
 kernel/watchdog_perf.c | 21 +++++++++++++++++++++
 2 files changed, 23 insertions(+)

Comments

Doug Anderson March 7, 2025, 4:53 p.m. UTC | #1
Hi,

On Thu, Mar 6, 2025 at 6:18 PM Yicong Yang <yangyicong@huawei.com> wrote:
>
> @@ -211,6 +211,27 @@ void hardlockup_detector_perf_cleanup(void)
>         cpumask_clear(&dead_events_mask);
>  }
>
> +/**
> + * hardlockup_detector_perf_adjust_period - Adjust the events period due
> + *                                          to cpu frequency change
> + */
> +void hardlockup_detector_perf_adjust_period(int cpu, u64 period)

Running `scripts/kernel-doc -v kernel/watchdog_perf.c > /dev/null`
after your patch shows:

kernel/watchdog_perf.c:219: warning: Function parameter or struct
member 'cpu' not described in 'hardlockup_detector_perf_adjust_period'
kernel/watchdog_perf.c:219: warning: Function parameter or struct
member 'period' not described in
'hardlockup_detector_perf_adjust_period'


Other than that, this seems reasonable to me.

...but I'd also have to ask: is there a reason you're using the "perf"
hard-lockup detector instead of the buddy one? In my mind, the "buddy"
watchdog is better in almost all ways (I believe it's lower power,
doesn't waste a "perf" controller, and doesn't suffer from frequency
issues). It's even crossed my mind whether the "perf" lockup detector
should be deprecated. ;-)
Yicong Yang March 11, 2025, 8:06 a.m. UTC | #2
On 2025/3/8 0:53, Doug Anderson wrote:
> Hi,
> 
> On Thu, Mar 6, 2025 at 6:18 PM Yicong Yang <yangyicong@huawei.com> wrote:
>>
>> @@ -211,6 +211,27 @@ void hardlockup_detector_perf_cleanup(void)
>>         cpumask_clear(&dead_events_mask);
>>  }
>>
>> +/**
>> + * hardlockup_detector_perf_adjust_period - Adjust the events period due
>> + *                                          to cpu frequency change
>> + */
>> +void hardlockup_detector_perf_adjust_period(int cpu, u64 period)
> 
> Running `scripts/kernel-doc -v kernel/watchdog_perf.c > /dev/null`
> after your patch shows:
> 
> kernel/watchdog_perf.c:219: warning: Function parameter or struct
> member 'cpu' not described in 'hardlockup_detector_perf_adjust_period'
> kernel/watchdog_perf.c:219: warning: Function parameter or struct
> member 'period' not described in
> 'hardlockup_detector_perf_adjust_period'
> 

so the script will check whether parameters are documented as well,
need to add description.

I just followed the document convention like hardlockup_detector_perf_cleanup().

> 
> Other than that, this seems reasonable to me.
> 
> ...but I'd also have to ask: is there a reason you're using the "perf"
> hard-lockup detector instead of the buddy one? In my mind, the "buddy"
> watchdog is better in almost all ways (I believe it's lower power,
> doesn't waste a "perf" controller, and doesn't suffer from frequency
> issues).

ok, sounds reasonable to me especially for save one perf counter.

> It's even crossed my mind whether the "perf" lockup detector
> should be deprecated. ;-)
> .

I think it's mainly due to the policy of the distribution. watchdog_perf
is used by default on openEuler which is used on my board.

Thanks.
diff mbox series

Patch

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index a8dfb38c9bb6..2200ce481439 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -106,11 +106,13 @@  extern void hardlockup_detector_perf_stop(void);
 extern void hardlockup_detector_perf_restart(void);
 extern void hardlockup_detector_perf_cleanup(void);
 extern void hardlockup_config_perf_event(const char *str);
+extern void hardlockup_detector_perf_adjust_period(int cpu, u64 period);
 #else
 static inline void hardlockup_detector_perf_stop(void) { }
 static inline void hardlockup_detector_perf_restart(void) { }
 static inline void hardlockup_detector_perf_cleanup(void) { }
 static inline void hardlockup_config_perf_event(const char *str) { }
+static inline void hardlockup_detector_perf_adjust_period(int cpu, u64 period) { }
 #endif
 
 void watchdog_hardlockup_stop(void);
diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
index 59c1d86a73a2..bc00985d1f69 100644
--- a/kernel/watchdog_perf.c
+++ b/kernel/watchdog_perf.c
@@ -211,6 +211,27 @@  void hardlockup_detector_perf_cleanup(void)
 	cpumask_clear(&dead_events_mask);
 }
 
+/**
+ * hardlockup_detector_perf_adjust_period - Adjust the events period due
+ *                                          to cpu frequency change
+ */
+void hardlockup_detector_perf_adjust_period(int cpu, u64 period)
+{
+	struct perf_event *event = per_cpu(watchdog_ev, cpu);
+
+	if (!(watchdog_enabled & WATCHDOG_HARDLOCKUP_ENABLED))
+		return;
+
+	if (!event)
+		return;
+
+	if (event->attr.sample_period == period)
+		return;
+
+	if (perf_event_period(event, period))
+		pr_err("failed to change period to %llu\n", period);
+}
+
 /**
  * hardlockup_detector_perf_stop - Globally stop watchdog events
  *