diff mbox series

[v1] thermal: core: Fix the handling of invalid trip points

Message ID 12441937.O9o76ZdvQC@kreacher (mailing list archive)
State In Next
Delegated to: Rafael Wysocki
Headers show
Series [v1] thermal: core: Fix the handling of invalid trip points | expand

Commit Message

Rafael J. Wysocki May 16, 2024, 5:20 p.m. UTC
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Commit 9ad18043fb35 ("thermal: core: Send trip crossing notifications
at init time if needed") overlooked the case when a trip point that
has started as invalid is set to a valid temperature later.  Namely,
the initial threshold value for all trips is zero, so if a previously
invalid trip becomes valid and its (new) low temperature is above the
zone temperature, a spurious trip crossing notification will occur and
it may trigger the WARN_ON() in handle_thermal_trip().

To address this, set the initial threshold for all trips to INT_MAX.

There is also the case when a valid writable trip becomes invalid that
requires special handling.  First, in accordance with the change
mentioned above, the trip's threshold needs to be set to INT_MAX to
avoid the same issue.  Second, if the trip in question is passive and
it has been crossed by the thermal zone temperature on the way up, the
zone's passive count has been incremented and it is in the passive
polling mode, so its passive count needs to be adjusted to allow the
passive polling to be turned off eventually.

Fixes: 9ad18043fb35 ("thermal: core: Send trip crossing notifications at init time if needed")
Fixes: 042a3d80f118 ("thermal: core: Move passive polling management to the core")
Reported-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/thermal/thermal_core.c |    9 ++++++++-
 drivers/thermal/thermal_trip.c |   18 ++++++++++++++++++
 2 files changed, 26 insertions(+), 1 deletion(-)

Comments

Wendy Wang May 17, 2024, 10:32 a.m. UTC | #1
Hi Rafael,

I have tested your patch '[PATCH v1] thermal: core: Fix the handling of invalid trip points'
on my server, which is running the v6.9 kernel, the thermal throttling test case has passed.

Tested-by: Wendy Wang <wendy.wang@intel.com>

On 2024-05-16 at 19:20:19 +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Commit 9ad18043fb35 ("thermal: core: Send trip crossing notifications
> at init time if needed") overlooked the case when a trip point that
> has started as invalid is set to a valid temperature later.  Namely,
> the initial threshold value for all trips is zero, so if a previously
> invalid trip becomes valid and its (new) low temperature is above the
> zone temperature, a spurious trip crossing notification will occur and
> it may trigger the WARN_ON() in handle_thermal_trip().
> 
> To address this, set the initial threshold for all trips to INT_MAX.
> 
> There is also the case when a valid writable trip becomes invalid that
> requires special handling.  First, in accordance with the change
> mentioned above, the trip's threshold needs to be set to INT_MAX to
> avoid the same issue.  Second, if the trip in question is passive and
> it has been crossed by the thermal zone temperature on the way up, the
> zone's passive count has been incremented and it is in the passive
> polling mode, so its passive count needs to be adjusted to allow the
> passive polling to be turned off eventually.
> 
> Fixes: 9ad18043fb35 ("thermal: core: Send trip crossing notifications at init time if needed")
> Fixes: 042a3d80f118 ("thermal: core: Move passive polling management to the core")
> Reported-by: Zhang Rui <rui.zhang@intel.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/thermal/thermal_core.c |    9 ++++++++-
>  drivers/thermal/thermal_trip.c |   18 ++++++++++++++++++
>  2 files changed, 26 insertions(+), 1 deletion(-)
> 
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -1401,8 +1401,15 @@ thermal_zone_device_register_with_trips(
>  	tz->device.class = thermal_class;
>  	tz->devdata = devdata;
>  	tz->num_trips = num_trips;
> -	for_each_trip_desc(tz, td)
> +	for_each_trip_desc(tz, td) {
>  		td->trip = *trip++;
> +		/*
> +		 * Mark all thresholds as invalid to start with even though
> +		 * this only matters for the trips that start as invalid and
> +		 * become valid later.
> +		 */
> +		td->threshold = INT_MAX;
> +	}
>  
>  	thermal_set_delay_jiffies(&tz->passive_delay_jiffies, passive_delay);
>  	thermal_set_delay_jiffies(&tz->polling_delay_jiffies, polling_delay);
> Index: linux-pm/drivers/thermal/thermal_trip.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_trip.c
> +++ linux-pm/drivers/thermal/thermal_trip.c
> @@ -137,6 +137,24 @@ void thermal_zone_set_trip_temp(struct t
>  	if (trip->temperature == temp)
>  		return;
>  
> +	if (temp == THERMAL_TEMP_INVALID) {
> +		struct thermal_trip_desc *td = trip_to_trip_desc(trip);
> +
> +		if (trip->type == THERMAL_TRIP_PASSIVE &&
> +		    tz->temperature >= td->threshold) {
> +			/*
> +			 * The trip has been crossed, so the thermal zone's
> +			 * passive count needs to be adjusted.
> +			 */
> +			tz->passive--;
> +			WARN_ON_ONCE(tz->passive < 0);
> +		}
> +		/*
> +		 * Invalidate the threshold to avoid triggering a spurious
> +		 * trip crossing notification when the trip becomes valid.
> +		 */
> +		td->threshold = INT_MAX;
> +	}
>  	WRITE_ONCE(trip->temperature, temp);
>  	thermal_notify_tz_trip_change(tz, trip);
>  }
> 
> 
>
diff mbox series

Patch

Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -1401,8 +1401,15 @@  thermal_zone_device_register_with_trips(
 	tz->device.class = thermal_class;
 	tz->devdata = devdata;
 	tz->num_trips = num_trips;
-	for_each_trip_desc(tz, td)
+	for_each_trip_desc(tz, td) {
 		td->trip = *trip++;
+		/*
+		 * Mark all thresholds as invalid to start with even though
+		 * this only matters for the trips that start as invalid and
+		 * become valid later.
+		 */
+		td->threshold = INT_MAX;
+	}
 
 	thermal_set_delay_jiffies(&tz->passive_delay_jiffies, passive_delay);
 	thermal_set_delay_jiffies(&tz->polling_delay_jiffies, polling_delay);
Index: linux-pm/drivers/thermal/thermal_trip.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_trip.c
+++ linux-pm/drivers/thermal/thermal_trip.c
@@ -137,6 +137,24 @@  void thermal_zone_set_trip_temp(struct t
 	if (trip->temperature == temp)
 		return;
 
+	if (temp == THERMAL_TEMP_INVALID) {
+		struct thermal_trip_desc *td = trip_to_trip_desc(trip);
+
+		if (trip->type == THERMAL_TRIP_PASSIVE &&
+		    tz->temperature >= td->threshold) {
+			/*
+			 * The trip has been crossed, so the thermal zone's
+			 * passive count needs to be adjusted.
+			 */
+			tz->passive--;
+			WARN_ON_ONCE(tz->passive < 0);
+		}
+		/*
+		 * Invalidate the threshold to avoid triggering a spurious
+		 * trip crossing notification when the trip becomes valid.
+		 */
+		td->threshold = INT_MAX;
+	}
 	WRITE_ONCE(trip->temperature, temp);
 	thermal_notify_tz_trip_change(tz, trip);
 }