diff mbox series

Revert "thermal: core: Don't update trip points inside the hysteresis range"

Message ID 20240325222424.4179635-1-daniel.lezcano@linaro.org (mailing list archive)
State Mainlined, archived
Headers show
Series Revert "thermal: core: Don't update trip points inside the hysteresis range" | expand

Commit Message

Daniel Lezcano March 25, 2024, 10:24 p.m. UTC
It has been reported the commit cf3986f8c01d3 introduced a regression
when the temperature is wavering in the hysteresis region. The
mitigation stops leading to an uncontrolled temperature increase until
reaching the critical trip point.

Here what happens:

 * 'throttle' is when the current temperature is greater than the trip
   point temperature
 * 'target' is the mitigation level
 * 'passive' is positive when there is a mitigation, zero otherwise
 * these values are computed in the step_wise governor

Configuration:

 trip point 1: temp=95°C, hyst=5°C (passive)
 trip point 2: temp=115°C, hyst=0°C (critical)
 governor: step_wise

1. The temperature crosses the way up the trip point 1 at 95°C

   - trend=raising
   - throttle=1, target=1
   - passive=1
   - set_trips: low=90°C, high=115°C

2. The temperature decreases but stays in the hysteresis region at
   93°C

   - trend=dropping
   - throttle=0, target=0
   - passive=1

   Before cf3986f8c01d3
   - set_trips: low=90°C, high=95°C

   After cf3986f8c01d3
   - set_trips: low=90°C, high=115°C

3. The temperature increases a bit but stays in the hysteresis region
   at 94°C (so below the trip point 1 temp 95°C)

   - trend=raising
   - throttle=0, target=0
   - passive=1

   Before cf3986f8c01d3
   - set_trips: low=90°C, high=95°C

   After cf3986f8c01d3
   - set_trips: low=90°C, high=115°C

4. The temperature decreases but stays in the hysteresis region at
   93°C

   - trend=dropping
   - throttle=0, target=THERMAL_NO_TARGET
   - passive=0

   Before cf3986f8c01d3
   - set_trips: low=90°C, high=95°C

   After cf3986f8c01d3
   - set_trips: low=90°C, high=115°C

At this point, the 'passive' value is zero, there is no mitigation,
the temperature is in the hysteresis region, the next trip point is
115°C. As 'passive' is zero, the timer to monitor the thermal zone is
disabled. Consequently if the temperature continues to increase, no
mitigation will happen and it will reach the 115°C trip point and
reboot.

Before the optimization, the high boundary would have been 95°C, thus
triggering the mitigation again and rearming the polling timer.

The optimization make sense but given the current implementation of
the step_wise governor collaborating via this 'passive' flag with the
core framework it can not work.

From a higher perspective it seems like there is a problem between the
governor which sets a variable to be used by the core framework. That
sounds akward and it would make much more sense if the core framework
controls the governor and not the opposite. But as the devil hides in
the details, there are some subtilities to be addressed before.

Elaborating those would be out of the scope this changelog. So let's
stay simple and revert the change first to fixup all broken mobile
platforms.

This reverts commit cf3986f8c01d355490d0ac6024391b989a9d1e9d.

This revert applies on top of v6.9-rc1.

Fixes: cf3986f8c01d3 ("thermal: core: Don't update trip points inside the hysteresis range")
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reported-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
Cc: Nícolas F. R. A. Prado <nfraprado@collabora.com>

Conflicts with commit 0c0c4740c9d26 in:
	drivers/thermal/thermal_trip.c

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 drivers/thermal/thermal_trip.c | 19 ++-----------------
 1 file changed, 2 insertions(+), 17 deletions(-)

Comments

Nícolas F. R. A. Prado March 25, 2024, 11:29 p.m. UTC | #1
On Mon, Mar 25, 2024 at 11:24:24PM +0100, Daniel Lezcano wrote:
> It has been reported the commit cf3986f8c01d3 introduced a regression
> when the temperature is wavering in the hysteresis region. The
> mitigation stops leading to an uncontrolled temperature increase until
> reaching the critical trip point.
> 
> Here what happens:
> 
>  * 'throttle' is when the current temperature is greater than the trip
>    point temperature
>  * 'target' is the mitigation level
>  * 'passive' is positive when there is a mitigation, zero otherwise
>  * these values are computed in the step_wise governor
> 
> Configuration:
> 
>  trip point 1: temp=95°C, hyst=5°C (passive)
>  trip point 2: temp=115°C, hyst=0°C (critical)
>  governor: step_wise
> 
> 1. The temperature crosses the way up the trip point 1 at 95°C
> 
>    - trend=raising
>    - throttle=1, target=1
>    - passive=1
>    - set_trips: low=90°C, high=115°C
> 
> 2. The temperature decreases but stays in the hysteresis region at
>    93°C
> 
>    - trend=dropping
>    - throttle=0, target=0
>    - passive=1
> 
>    Before cf3986f8c01d3
>    - set_trips: low=90°C, high=95°C
> 
>    After cf3986f8c01d3
>    - set_trips: low=90°C, high=115°C
> 
> 3. The temperature increases a bit but stays in the hysteresis region
>    at 94°C (so below the trip point 1 temp 95°C)
> 
>    - trend=raising
>    - throttle=0, target=0
>    - passive=1
> 
>    Before cf3986f8c01d3
>    - set_trips: low=90°C, high=95°C
> 
>    After cf3986f8c01d3
>    - set_trips: low=90°C, high=115°C
> 
> 4. The temperature decreases but stays in the hysteresis region at
>    93°C
> 
>    - trend=dropping
>    - throttle=0, target=THERMAL_NO_TARGET
>    - passive=0
> 
>    Before cf3986f8c01d3
>    - set_trips: low=90°C, high=95°C
> 
>    After cf3986f8c01d3
>    - set_trips: low=90°C, high=115°C
> 
> At this point, the 'passive' value is zero, there is no mitigation,
> the temperature is in the hysteresis region, the next trip point is
> 115°C. As 'passive' is zero, the timer to monitor the thermal zone is
> disabled. Consequently if the temperature continues to increase, no
> mitigation will happen and it will reach the 115°C trip point and
> reboot.
> 
> Before the optimization, the high boundary would have been 95°C, thus
> triggering the mitigation again and rearming the polling timer.
> 
> The optimization make sense but given the current implementation of
> the step_wise governor collaborating via this 'passive' flag with the
> core framework it can not work.
> 
> From a higher perspective it seems like there is a problem between the
> governor which sets a variable to be used by the core framework. That
> sounds akward and it would make much more sense if the core framework
> controls the governor and not the opposite. But as the devil hides in
> the details, there are some subtilities to be addressed before.
> 
> Elaborating those would be out of the scope this changelog. So let's
> stay simple and revert the change first to fixup all broken mobile
> platforms.
> 
> This reverts commit cf3986f8c01d355490d0ac6024391b989a9d1e9d.
> 
> This revert applies on top of v6.9-rc1.
> 
> Fixes: cf3986f8c01d3 ("thermal: core: Don't update trip points inside the hysteresis range")
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> Reported-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
> Cc: Nícolas F. R. A. Prado <nfraprado@collabora.com>

As mentioned in the commit, the issue is elsewhere, but given the original
commit was an optimization to prevent unnecessary trip point updates, and that
it seems to have caused a regression, sounds reasonable to revert at least while
a proper fix isn't found.

Acked-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>

Only thing is you might want to add a cc: stable tag to guarantee it is
backported (AFAIR Fixes: doesn't guarantee backport), even more so given there
are conflicts.

Thanks,
Nícolas
Rafael J. Wysocki March 26, 2024, 12:21 p.m. UTC | #2
On Tue, Mar 26, 2024 at 12:29 AM Nícolas F. R. A. Prado
<nfraprado@collabora.com> wrote:
>
> On Mon, Mar 25, 2024 at 11:24:24PM +0100, Daniel Lezcano wrote:
> > It has been reported the commit cf3986f8c01d3 introduced a regression
> > when the temperature is wavering in the hysteresis region. The
> > mitigation stops leading to an uncontrolled temperature increase until
> > reaching the critical trip point.
> >
> > Here what happens:
> >
> >  * 'throttle' is when the current temperature is greater than the trip
> >    point temperature
> >  * 'target' is the mitigation level
> >  * 'passive' is positive when there is a mitigation, zero otherwise
> >  * these values are computed in the step_wise governor
> >
> > Configuration:
> >
> >  trip point 1: temp=95°C, hyst=5°C (passive)
> >  trip point 2: temp=115°C, hyst=0°C (critical)
> >  governor: step_wise
> >
> > 1. The temperature crosses the way up the trip point 1 at 95°C
> >
> >    - trend=raising
> >    - throttle=1, target=1
> >    - passive=1
> >    - set_trips: low=90°C, high=115°C
> >
> > 2. The temperature decreases but stays in the hysteresis region at
> >    93°C
> >
> >    - trend=dropping
> >    - throttle=0, target=0
> >    - passive=1
> >
> >    Before cf3986f8c01d3
> >    - set_trips: low=90°C, high=95°C
> >
> >    After cf3986f8c01d3
> >    - set_trips: low=90°C, high=115°C
> >
> > 3. The temperature increases a bit but stays in the hysteresis region
> >    at 94°C (so below the trip point 1 temp 95°C)
> >
> >    - trend=raising
> >    - throttle=0, target=0
> >    - passive=1
> >
> >    Before cf3986f8c01d3
> >    - set_trips: low=90°C, high=95°C
> >
> >    After cf3986f8c01d3
> >    - set_trips: low=90°C, high=115°C
> >
> > 4. The temperature decreases but stays in the hysteresis region at
> >    93°C
> >
> >    - trend=dropping
> >    - throttle=0, target=THERMAL_NO_TARGET
> >    - passive=0
> >
> >    Before cf3986f8c01d3
> >    - set_trips: low=90°C, high=95°C
> >
> >    After cf3986f8c01d3
> >    - set_trips: low=90°C, high=115°C
> >
> > At this point, the 'passive' value is zero, there is no mitigation,
> > the temperature is in the hysteresis region, the next trip point is
> > 115°C. As 'passive' is zero, the timer to monitor the thermal zone is
> > disabled. Consequently if the temperature continues to increase, no
> > mitigation will happen and it will reach the 115°C trip point and
> > reboot.
> >
> > Before the optimization, the high boundary would have been 95°C, thus
> > triggering the mitigation again and rearming the polling timer.
> >
> > The optimization make sense but given the current implementation of
> > the step_wise governor collaborating via this 'passive' flag with the
> > core framework it can not work.
> >
> > From a higher perspective it seems like there is a problem between the
> > governor which sets a variable to be used by the core framework. That
> > sounds akward and it would make much more sense if the core framework
> > controls the governor and not the opposite. But as the devil hides in
> > the details, there are some subtilities to be addressed before.
> >
> > Elaborating those would be out of the scope this changelog. So let's
> > stay simple and revert the change first to fixup all broken mobile
> > platforms.
> >
> > This reverts commit cf3986f8c01d355490d0ac6024391b989a9d1e9d.
> >
> > This revert applies on top of v6.9-rc1.
> >
> > Fixes: cf3986f8c01d3 ("thermal: core: Don't update trip points inside the hysteresis range")
> > Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> > Reported-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
> > Cc: Nícolas F. R. A. Prado <nfraprado@collabora.com>
>
> As mentioned in the commit, the issue is elsewhere, but given the original
> commit was an optimization to prevent unnecessary trip point updates, and that
> it seems to have caused a regression, sounds reasonable to revert at least while
> a proper fix isn't found.
>
> Acked-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
>
> Only thing is you might want to add a cc: stable tag to guarantee it is
> backported (AFAIR Fixes: doesn't guarantee backport), even more so given there
> are conflicts.

Applied as 6.9-rc material, thanks!
diff mbox series

Patch

diff --git a/drivers/thermal/thermal_trip.c b/drivers/thermal/thermal_trip.c
index 09f6050dd041..497abf0d47ca 100644
--- a/drivers/thermal/thermal_trip.c
+++ b/drivers/thermal/thermal_trip.c
@@ -65,7 +65,6 @@  void __thermal_zone_set_trips(struct thermal_zone_device *tz)
 {
 	const struct thermal_trip *trip;
 	int low = -INT_MAX, high = INT_MAX;
-	bool same_trip = false;
 	int ret;
 
 	lockdep_assert_held(&tz->lock);
@@ -74,36 +73,22 @@  void __thermal_zone_set_trips(struct thermal_zone_device *tz)
 		return;
 
 	for_each_trip(tz, trip) {
-		bool low_set = false;
 		int trip_low;
 
 		trip_low = trip->temperature - trip->hysteresis;
 
-		if (trip_low < tz->temperature && trip_low > low) {
+		if (trip_low < tz->temperature && trip_low > low)
 			low = trip_low;
-			low_set = true;
-			same_trip = false;
-		}
 
 		if (trip->temperature > tz->temperature &&
-		    trip->temperature < high) {
+		    trip->temperature < high)
 			high = trip->temperature;
-			same_trip = low_set;
-		}
 	}
 
 	/* No need to change trip points */
 	if (tz->prev_low_trip == low && tz->prev_high_trip == high)
 		return;
 
-	/*
-	 * If "high" and "low" are the same, skip the change unless this is the
-	 * first time.
-	 */
-	if (same_trip && (tz->prev_low_trip != -INT_MAX ||
-	    tz->prev_high_trip != INT_MAX))
-		return;
-
 	tz->prev_low_trip = low;
 	tz->prev_high_trip = high;