Message ID | 12441937.O9o76ZdvQC@kreacher (mailing list archive) |
---|---|
State | In Next |
Delegated to: | Rafael Wysocki |
Headers | show |
Series | [v1] thermal: core: Fix the handling of invalid trip points | expand |
Hi Rafael, I have tested your patch '[PATCH v1] thermal: core: Fix the handling of invalid trip points' on my server, which is running the v6.9 kernel, the thermal throttling test case has passed. Tested-by: Wendy Wang <wendy.wang@intel.com> On 2024-05-16 at 19:20:19 +0200, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > Commit 9ad18043fb35 ("thermal: core: Send trip crossing notifications > at init time if needed") overlooked the case when a trip point that > has started as invalid is set to a valid temperature later. Namely, > the initial threshold value for all trips is zero, so if a previously > invalid trip becomes valid and its (new) low temperature is above the > zone temperature, a spurious trip crossing notification will occur and > it may trigger the WARN_ON() in handle_thermal_trip(). > > To address this, set the initial threshold for all trips to INT_MAX. > > There is also the case when a valid writable trip becomes invalid that > requires special handling. First, in accordance with the change > mentioned above, the trip's threshold needs to be set to INT_MAX to > avoid the same issue. Second, if the trip in question is passive and > it has been crossed by the thermal zone temperature on the way up, the > zone's passive count has been incremented and it is in the passive > polling mode, so its passive count needs to be adjusted to allow the > passive polling to be turned off eventually. > > Fixes: 9ad18043fb35 ("thermal: core: Send trip crossing notifications at init time if needed") > Fixes: 042a3d80f118 ("thermal: core: Move passive polling management to the core") > Reported-by: Zhang Rui <rui.zhang@intel.com> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > --- > drivers/thermal/thermal_core.c | 9 ++++++++- > drivers/thermal/thermal_trip.c | 18 ++++++++++++++++++ > 2 files changed, 26 insertions(+), 1 deletion(-) > > Index: linux-pm/drivers/thermal/thermal_core.c > =================================================================== > --- linux-pm.orig/drivers/thermal/thermal_core.c > +++ linux-pm/drivers/thermal/thermal_core.c > @@ -1401,8 +1401,15 @@ thermal_zone_device_register_with_trips( > tz->device.class = thermal_class; > tz->devdata = devdata; > tz->num_trips = num_trips; > - for_each_trip_desc(tz, td) > + for_each_trip_desc(tz, td) { > td->trip = *trip++; > + /* > + * Mark all thresholds as invalid to start with even though > + * this only matters for the trips that start as invalid and > + * become valid later. > + */ > + td->threshold = INT_MAX; > + } > > thermal_set_delay_jiffies(&tz->passive_delay_jiffies, passive_delay); > thermal_set_delay_jiffies(&tz->polling_delay_jiffies, polling_delay); > Index: linux-pm/drivers/thermal/thermal_trip.c > =================================================================== > --- linux-pm.orig/drivers/thermal/thermal_trip.c > +++ linux-pm/drivers/thermal/thermal_trip.c > @@ -137,6 +137,24 @@ void thermal_zone_set_trip_temp(struct t > if (trip->temperature == temp) > return; > > + if (temp == THERMAL_TEMP_INVALID) { > + struct thermal_trip_desc *td = trip_to_trip_desc(trip); > + > + if (trip->type == THERMAL_TRIP_PASSIVE && > + tz->temperature >= td->threshold) { > + /* > + * The trip has been crossed, so the thermal zone's > + * passive count needs to be adjusted. > + */ > + tz->passive--; > + WARN_ON_ONCE(tz->passive < 0); > + } > + /* > + * Invalidate the threshold to avoid triggering a spurious > + * trip crossing notification when the trip becomes valid. > + */ > + td->threshold = INT_MAX; > + } > WRITE_ONCE(trip->temperature, temp); > thermal_notify_tz_trip_change(tz, trip); > } > > >
Index: linux-pm/drivers/thermal/thermal_core.c =================================================================== --- linux-pm.orig/drivers/thermal/thermal_core.c +++ linux-pm/drivers/thermal/thermal_core.c @@ -1401,8 +1401,15 @@ thermal_zone_device_register_with_trips( tz->device.class = thermal_class; tz->devdata = devdata; tz->num_trips = num_trips; - for_each_trip_desc(tz, td) + for_each_trip_desc(tz, td) { td->trip = *trip++; + /* + * Mark all thresholds as invalid to start with even though + * this only matters for the trips that start as invalid and + * become valid later. + */ + td->threshold = INT_MAX; + } thermal_set_delay_jiffies(&tz->passive_delay_jiffies, passive_delay); thermal_set_delay_jiffies(&tz->polling_delay_jiffies, polling_delay); Index: linux-pm/drivers/thermal/thermal_trip.c =================================================================== --- linux-pm.orig/drivers/thermal/thermal_trip.c +++ linux-pm/drivers/thermal/thermal_trip.c @@ -137,6 +137,24 @@ void thermal_zone_set_trip_temp(struct t if (trip->temperature == temp) return; + if (temp == THERMAL_TEMP_INVALID) { + struct thermal_trip_desc *td = trip_to_trip_desc(trip); + + if (trip->type == THERMAL_TRIP_PASSIVE && + tz->temperature >= td->threshold) { + /* + * The trip has been crossed, so the thermal zone's + * passive count needs to be adjusted. + */ + tz->passive--; + WARN_ON_ONCE(tz->passive < 0); + } + /* + * Invalidate the threshold to avoid triggering a spurious + * trip crossing notification when the trip becomes valid. + */ + td->threshold = INT_MAX; + } WRITE_ONCE(trip->temperature, temp); thermal_notify_tz_trip_change(tz, trip); }