Message ID | 20240325222424.4179635-1-daniel.lezcano@linaro.org (mailing list archive) |
---|---|
State | Mainlined, archived |
Headers | show |
Series | Revert "thermal: core: Don't update trip points inside the hysteresis range" | expand |
On Mon, Mar 25, 2024 at 11:24:24PM +0100, Daniel Lezcano wrote: > It has been reported the commit cf3986f8c01d3 introduced a regression > when the temperature is wavering in the hysteresis region. The > mitigation stops leading to an uncontrolled temperature increase until > reaching the critical trip point. > > Here what happens: > > * 'throttle' is when the current temperature is greater than the trip > point temperature > * 'target' is the mitigation level > * 'passive' is positive when there is a mitigation, zero otherwise > * these values are computed in the step_wise governor > > Configuration: > > trip point 1: temp=95°C, hyst=5°C (passive) > trip point 2: temp=115°C, hyst=0°C (critical) > governor: step_wise > > 1. The temperature crosses the way up the trip point 1 at 95°C > > - trend=raising > - throttle=1, target=1 > - passive=1 > - set_trips: low=90°C, high=115°C > > 2. The temperature decreases but stays in the hysteresis region at > 93°C > > - trend=dropping > - throttle=0, target=0 > - passive=1 > > Before cf3986f8c01d3 > - set_trips: low=90°C, high=95°C > > After cf3986f8c01d3 > - set_trips: low=90°C, high=115°C > > 3. The temperature increases a bit but stays in the hysteresis region > at 94°C (so below the trip point 1 temp 95°C) > > - trend=raising > - throttle=0, target=0 > - passive=1 > > Before cf3986f8c01d3 > - set_trips: low=90°C, high=95°C > > After cf3986f8c01d3 > - set_trips: low=90°C, high=115°C > > 4. The temperature decreases but stays in the hysteresis region at > 93°C > > - trend=dropping > - throttle=0, target=THERMAL_NO_TARGET > - passive=0 > > Before cf3986f8c01d3 > - set_trips: low=90°C, high=95°C > > After cf3986f8c01d3 > - set_trips: low=90°C, high=115°C > > At this point, the 'passive' value is zero, there is no mitigation, > the temperature is in the hysteresis region, the next trip point is > 115°C. As 'passive' is zero, the timer to monitor the thermal zone is > disabled. Consequently if the temperature continues to increase, no > mitigation will happen and it will reach the 115°C trip point and > reboot. > > Before the optimization, the high boundary would have been 95°C, thus > triggering the mitigation again and rearming the polling timer. > > The optimization make sense but given the current implementation of > the step_wise governor collaborating via this 'passive' flag with the > core framework it can not work. > > From a higher perspective it seems like there is a problem between the > governor which sets a variable to be used by the core framework. That > sounds akward and it would make much more sense if the core framework > controls the governor and not the opposite. But as the devil hides in > the details, there are some subtilities to be addressed before. > > Elaborating those would be out of the scope this changelog. So let's > stay simple and revert the change first to fixup all broken mobile > platforms. > > This reverts commit cf3986f8c01d355490d0ac6024391b989a9d1e9d. > > This revert applies on top of v6.9-rc1. > > Fixes: cf3986f8c01d3 ("thermal: core: Don't update trip points inside the hysteresis range") > Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> > Reported-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com> > Cc: Nícolas F. R. A. Prado <nfraprado@collabora.com> As mentioned in the commit, the issue is elsewhere, but given the original commit was an optimization to prevent unnecessary trip point updates, and that it seems to have caused a regression, sounds reasonable to revert at least while a proper fix isn't found. Acked-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Only thing is you might want to add a cc: stable tag to guarantee it is backported (AFAIR Fixes: doesn't guarantee backport), even more so given there are conflicts. Thanks, Nícolas
On Tue, Mar 26, 2024 at 12:29 AM Nícolas F. R. A. Prado <nfraprado@collabora.com> wrote: > > On Mon, Mar 25, 2024 at 11:24:24PM +0100, Daniel Lezcano wrote: > > It has been reported the commit cf3986f8c01d3 introduced a regression > > when the temperature is wavering in the hysteresis region. The > > mitigation stops leading to an uncontrolled temperature increase until > > reaching the critical trip point. > > > > Here what happens: > > > > * 'throttle' is when the current temperature is greater than the trip > > point temperature > > * 'target' is the mitigation level > > * 'passive' is positive when there is a mitigation, zero otherwise > > * these values are computed in the step_wise governor > > > > Configuration: > > > > trip point 1: temp=95°C, hyst=5°C (passive) > > trip point 2: temp=115°C, hyst=0°C (critical) > > governor: step_wise > > > > 1. The temperature crosses the way up the trip point 1 at 95°C > > > > - trend=raising > > - throttle=1, target=1 > > - passive=1 > > - set_trips: low=90°C, high=115°C > > > > 2. The temperature decreases but stays in the hysteresis region at > > 93°C > > > > - trend=dropping > > - throttle=0, target=0 > > - passive=1 > > > > Before cf3986f8c01d3 > > - set_trips: low=90°C, high=95°C > > > > After cf3986f8c01d3 > > - set_trips: low=90°C, high=115°C > > > > 3. The temperature increases a bit but stays in the hysteresis region > > at 94°C (so below the trip point 1 temp 95°C) > > > > - trend=raising > > - throttle=0, target=0 > > - passive=1 > > > > Before cf3986f8c01d3 > > - set_trips: low=90°C, high=95°C > > > > After cf3986f8c01d3 > > - set_trips: low=90°C, high=115°C > > > > 4. The temperature decreases but stays in the hysteresis region at > > 93°C > > > > - trend=dropping > > - throttle=0, target=THERMAL_NO_TARGET > > - passive=0 > > > > Before cf3986f8c01d3 > > - set_trips: low=90°C, high=95°C > > > > After cf3986f8c01d3 > > - set_trips: low=90°C, high=115°C > > > > At this point, the 'passive' value is zero, there is no mitigation, > > the temperature is in the hysteresis region, the next trip point is > > 115°C. As 'passive' is zero, the timer to monitor the thermal zone is > > disabled. Consequently if the temperature continues to increase, no > > mitigation will happen and it will reach the 115°C trip point and > > reboot. > > > > Before the optimization, the high boundary would have been 95°C, thus > > triggering the mitigation again and rearming the polling timer. > > > > The optimization make sense but given the current implementation of > > the step_wise governor collaborating via this 'passive' flag with the > > core framework it can not work. > > > > From a higher perspective it seems like there is a problem between the > > governor which sets a variable to be used by the core framework. That > > sounds akward and it would make much more sense if the core framework > > controls the governor and not the opposite. But as the devil hides in > > the details, there are some subtilities to be addressed before. > > > > Elaborating those would be out of the scope this changelog. So let's > > stay simple and revert the change first to fixup all broken mobile > > platforms. > > > > This reverts commit cf3986f8c01d355490d0ac6024391b989a9d1e9d. > > > > This revert applies on top of v6.9-rc1. > > > > Fixes: cf3986f8c01d3 ("thermal: core: Don't update trip points inside the hysteresis range") > > Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> > > Reported-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com> > > Cc: Nícolas F. R. A. Prado <nfraprado@collabora.com> > > As mentioned in the commit, the issue is elsewhere, but given the original > commit was an optimization to prevent unnecessary trip point updates, and that > it seems to have caused a regression, sounds reasonable to revert at least while > a proper fix isn't found. > > Acked-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> > > Only thing is you might want to add a cc: stable tag to guarantee it is > backported (AFAIR Fixes: doesn't guarantee backport), even more so given there > are conflicts. Applied as 6.9-rc material, thanks!
diff --git a/drivers/thermal/thermal_trip.c b/drivers/thermal/thermal_trip.c index 09f6050dd041..497abf0d47ca 100644 --- a/drivers/thermal/thermal_trip.c +++ b/drivers/thermal/thermal_trip.c @@ -65,7 +65,6 @@ void __thermal_zone_set_trips(struct thermal_zone_device *tz) { const struct thermal_trip *trip; int low = -INT_MAX, high = INT_MAX; - bool same_trip = false; int ret; lockdep_assert_held(&tz->lock); @@ -74,36 +73,22 @@ void __thermal_zone_set_trips(struct thermal_zone_device *tz) return; for_each_trip(tz, trip) { - bool low_set = false; int trip_low; trip_low = trip->temperature - trip->hysteresis; - if (trip_low < tz->temperature && trip_low > low) { + if (trip_low < tz->temperature && trip_low > low) low = trip_low; - low_set = true; - same_trip = false; - } if (trip->temperature > tz->temperature && - trip->temperature < high) { + trip->temperature < high) high = trip->temperature; - same_trip = low_set; - } } /* No need to change trip points */ if (tz->prev_low_trip == low && tz->prev_high_trip == high) return; - /* - * If "high" and "low" are the same, skip the change unless this is the - * first time. - */ - if (same_trip && (tz->prev_low_trip != -INT_MAX || - tz->prev_high_trip != INT_MAX)) - return; - tz->prev_low_trip = low; tz->prev_high_trip = high;