diff mbox series

[v2] thermal/core: Clear all mitigation when thermal zone is disabled

Message ID 1640761407-2028-1-git-send-email-quic_manafm@quicinc.com (mailing list archive)
State Changes Requested, archived
Headers show
Series [v2] thermal/core: Clear all mitigation when thermal zone is disabled | expand

Commit Message

Manaf Meethalavalappu Pallikunhi Dec. 29, 2021, 7:03 a.m. UTC
Whenever a thermal zone is in trip violated state, there is a chance
that the same thermal zone mode can be disabled either via thermal
core API or via thermal zone sysfs. Once it is disabled, the framework
bails out any re-evaluation of thermal zone. It leads to a case where
if it is already in mitigation state, it will stay the same state
until it is re-enabled.

To avoid above mentioned issue, on thermal zone disable request
reset thermal zone and clear mitigation for each trip explicitly.

Signed-off-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
---
 drivers/thermal/thermal_core.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Comments

Rafael J. Wysocki Dec. 30, 2021, 4:09 p.m. UTC | #1
On Wed, Dec 29, 2021 at 8:03 AM Manaf Meethalavalappu Pallikunhi
<quic_manafm@quicinc.com> wrote:
>
> Whenever a thermal zone is in trip violated state, there is a chance
> that the same thermal zone mode can be disabled either via thermal
> core API or via thermal zone sysfs. Once it is disabled, the framework
> bails out any re-evaluation of thermal zone. It leads to a case where
> if it is already in mitigation state, it will stay the same state
> until it is re-enabled.

You seem to be arguing that disabling a thermal zone should prevent it
from throttling anything, which is reasonable, but I'm not sure if the
change below is sufficient for that.

> To avoid above mentioned issue, on thermal zone disable request
> reset thermal zone and clear mitigation for each trip explicitly.
>
> Signed-off-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
> ---
>  drivers/thermal/thermal_core.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 51374f4..5f4e35b 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -427,6 +427,7 @@ static int thermal_zone_device_set_mode(struct thermal_zone_device *tz,
>                                         enum thermal_device_mode mode)
>  {
>         int ret = 0;
> +       int trip;

This can be declared in the block in which it is used.

>
>         mutex_lock(&tz->lock);
>
> @@ -449,8 +450,14 @@ static int thermal_zone_device_set_mode(struct thermal_zone_device *tz,
>
>         if (mode == THERMAL_DEVICE_ENABLED)

The coding style asks for braces here if they are used after the else.

>                 thermal_notify_tz_enable(tz->id);
> -       else
> +       else {
> +               /* make sure all previous throttlings are cleared */
> +               thermal_zone_device_init(tz);
> +               for (trip = 0; trip < tz->trips; trip++)
> +                       handle_thermal_trip(tz, trip);

So I'm not sure if this makes the throttling go away in all cases (eg.
what if the current temperature is still above the given trip at this
point?).

> +
>                 thermal_notify_tz_disable(tz->id);
> +       }
>
>         return ret;
>  }
>
Manaf Meethalavalappu Pallikunhi Dec. 30, 2021, 6:16 p.m. UTC | #2
On 12/30/2021 9:39 PM, Rafael J. Wysocki wrote:
> On Wed, Dec 29, 2021 at 8:03 AM Manaf Meethalavalappu Pallikunhi
> <quic_manafm@quicinc.com> wrote:
>> Whenever a thermal zone is in trip violated state, there is a chance
>> that the same thermal zone mode can be disabled either via thermal
>> core API or via thermal zone sysfs. Once it is disabled, the framework
>> bails out any re-evaluation of thermal zone. It leads to a case where
>> if it is already in mitigation state, it will stay the same state
>> until it is re-enabled.
> You seem to be arguing that disabling a thermal zone should prevent it
> from throttling anything, which is reasonable, but I'm not sure if the
> change below is sufficient for that.
>
>> To avoid above mentioned issue, on thermal zone disable request
>> reset thermal zone and clear mitigation for each trip explicitly.
>>
>> Signed-off-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
>> ---
>>   drivers/thermal/thermal_core.c | 9 ++++++++-
>>   1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
>> index 51374f4..5f4e35b 100644
>> --- a/drivers/thermal/thermal_core.c
>> +++ b/drivers/thermal/thermal_core.c
>> @@ -427,6 +427,7 @@ static int thermal_zone_device_set_mode(struct thermal_zone_device *tz,
>>                                          enum thermal_device_mode mode)
>>   {
>>          int ret = 0;
>> +       int trip;
> This can be declared in the block in which it is used.
Sure, will update in next patch version
>
>>          mutex_lock(&tz->lock);
>>
>> @@ -449,8 +450,14 @@ static int thermal_zone_device_set_mode(struct thermal_zone_device *tz,
>>
>>          if (mode == THERMAL_DEVICE_ENABLED)
> The coding style asks for braces here if they are used after the else.
Sure will update in next version
>>                  thermal_notify_tz_enable(tz->id);
>> -       else
>> +       else {
>> +               /* make sure all previous throttlings are cleared */
>> +               thermal_zone_device_init(tz);
>> +               for (trip = 0; trip < tz->trips; trip++)
>> +                       handle_thermal_trip(tz, trip);
> So I'm not sure if this makes the throttling go away in all cases (eg.
> what if the current temperature is still above the given trip at this
> point?).

The thermal_zone_device_init() will reset current temperature with THERMAL_TEMP_INVALID. Then the following API
thandle_thermal_trip() doesn't call get_temp to sensor driver again instead it will use this reset temperature
value for each trip re-evaluation. Hence  for above mentioned case, I think it will clear the throttling
irrespective of what is the actual current temperature at that instant. Otherwise please correct me

May I know is there any other possible cases where throttling will not go away completely ?

>> +
>>                  thermal_notify_tz_disable(tz->id);
>> +       }
>>
>>          return ret;
>>   }
>>
diff mbox series

Patch

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 51374f4..5f4e35b 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -427,6 +427,7 @@  static int thermal_zone_device_set_mode(struct thermal_zone_device *tz,
 					enum thermal_device_mode mode)
 {
 	int ret = 0;
+	int trip;
 
 	mutex_lock(&tz->lock);
 
@@ -449,8 +450,14 @@  static int thermal_zone_device_set_mode(struct thermal_zone_device *tz,
 
 	if (mode == THERMAL_DEVICE_ENABLED)
 		thermal_notify_tz_enable(tz->id);
-	else
+	else {
+		/* make sure all previous throttlings are cleared */
+		thermal_zone_device_init(tz);
+		for (trip = 0; trip < tz->trips; trip++)
+			handle_thermal_trip(tz, trip);
+
 		thermal_notify_tz_disable(tz->id);
+	}
 
 	return ret;
 }