Message ID | 20210105190107.30479-1-lukasz.luba@arm.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Daniel Lezcano |
Headers | show |
Series | thermal: power allocator: Add control for non-power actor devices | expand |
Hi Daniel, Gentle ping. On 1/5/21 7:01 PM, Lukasz Luba wrote: > The cooling devices which are used in IPA should provide power mapping > functions. The callback functions are used for power estimation and state > setting. When these functions are missing IPA ignores such cooling devices > and does not limit their performance. It could happen that the platform > configuration is missing these functions in example when the Energy Model > was not setup properly (missing DT entry 'dynamic-power-coefficient'). > > The patch adds basic control over these devices' performance. It > manages to throttle them to stay safe and not overheat. It also adds a > warning during the binding phase, so it can be captured during testing. > > The patch covers also a corner case when all of the cooling devices are > non-power actors. > > Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> > --- > drivers/thermal/gov_power_allocator.c | 71 +++++++++++++++++++++++++-- > 1 file changed, 68 insertions(+), 3 deletions(-) > Could you have a look on the patch and if OK take it into your tree, please? If you missed it, I can resend. I'm still going through emails received around my holidays. Regards, Lukasz
On 05/01/2021 20:01, Lukasz Luba wrote: > The cooling devices which are used in IPA should provide power mapping > functions. The callback functions are used for power estimation and state > setting. When these functions are missing IPA ignores such cooling devices > and does not limit their performance. It could happen that the platform > configuration is missing these functions in example when the Energy Model > was not setup properly (missing DT entry 'dynamic-power-coefficient'). > > The patch adds basic control over these devices' performance. It > manages to throttle them to stay safe and not overheat. It also adds a > warning during the binding phase, so it can be captured during testing. > > The patch covers also a corner case when all of the cooling devices are > non-power actors. In my opinion this is a user space problem. If a device does not have power information, then it should use the step-wise governor instead. It is not the power allocator to overcome a wrong or unsupported setup. Usually, the default governor is the step-wise and the userspace sets the power allocator policy. A solution can be to fail to change the policy or bind if the associated cooling devices are not all power actors. > Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> > --- > drivers/thermal/gov_power_allocator.c | 71 +++++++++++++++++++++++++-- > 1 file changed, 68 insertions(+), 3 deletions(-) > > diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c > index 7a4170a0b51f..bcd1d524a1ba 100644 > --- a/drivers/thermal/gov_power_allocator.c > +++ b/drivers/thermal/gov_power_allocator.c > @@ -276,6 +276,33 @@ static u32 pid_controller(struct thermal_zone_device *tz, > return power_range; > } > > +/** > + * control_non_power_actor() - control performance of a cooling device which > + * is not a power actor > + * @instance: thermal instance to update > + * @throttle: boolean flag indicating the action > + * > + * Set the min/max performance point for a given cooling device, with respect > + * to limits. It is needed only for devices which are not power actors and > + * don't provide the power mapping functions. These devices will be capped > + * more strictly to provide safe conditions and not overheat them. > + */ > +static void control_non_power_actor(struct thermal_instance *instance, > + bool throttle) > +{ > + struct thermal_cooling_device *cdev = instance->cdev; > + > + if (throttle) > + instance->target = instance->upper; > + else > + instance->target = instance->lower; > + > + mutex_lock(&cdev->lock); > + cdev->updated = false; > + mutex_unlock(&cdev->lock); > + thermal_cdev_update(cdev); > +} > + > /** > * power_actor_set_power() - limit the maximum power a cooling device consumes > * @cdev: pointer to &thermal_cooling_device > @@ -405,7 +432,7 @@ static int allocate_power(struct thermal_zone_device *tz, > > if (!num_actors) { > ret = -ENODEV; > - goto unlock; > + goto safety_throttling; > } > > /* > @@ -495,6 +522,16 @@ static int allocate_power(struct thermal_zone_device *tz, > control_temp - tz->temperature); > > kfree(req_power); > + > +safety_throttling: > + list_for_each_entry(instance, &tz->thermal_instances, tz_node) { > + if (instance->trip != trip_max_desired_temperature) > + continue; > + > + if (!cdev_is_power_actor(instance->cdev)) > + control_non_power_actor(instance, true); > + } > + > unlock: > mutex_unlock(&tz->lock); > > @@ -576,9 +613,13 @@ static void allow_maximum_power(struct thermal_zone_device *tz) > > mutex_lock(&tz->lock); > list_for_each_entry(instance, &tz->thermal_instances, tz_node) { > - if ((instance->trip != params->trip_max_desired_temperature) || > - (!cdev_is_power_actor(instance->cdev))) > + if (instance->trip != params->trip_max_desired_temperature) > + continue; > + > + if (!cdev_is_power_actor(instance->cdev)) { > + control_non_power_actor(instance, false); > continue; > + } > > instance->target = 0; > mutex_lock(&instance->cdev->lock); > @@ -589,6 +630,28 @@ static void allow_maximum_power(struct thermal_zone_device *tz) > mutex_unlock(&tz->lock); > } > > +/** > + * check_power_actors() - Check all cooling devices and warn when they are > + * not power actors > + * @tz: thermal zone to operate on > + * > + * Check all cooling devices in the @tz and warn when they are not power > + * actors. These cooling devices will be throttled aggressively because they > + * miss needed callbacks. The warning would help to investigate the issue, > + * which could be e.g. lack of Energy Model for a given device. > + */ > +static void check_power_actors(struct thermal_zone_device *tz) > +{ > + struct thermal_instance *instance; > + > + list_for_each_entry(instance, &tz->thermal_instances, tz_node) { > + if (!cdev_is_power_actor(instance->cdev)) { > + dev_warn(&tz->device, "power_allocator: %s is not a power actor\n", > + instance->cdev->type); > + } > + } > +} > + > /** > * power_allocator_bind() - bind the power_allocator governor to a thermal zone > * @tz: thermal zone to bind it to > @@ -637,6 +700,8 @@ static int power_allocator_bind(struct thermal_zone_device *tz) > > tz->governor_data = params; > > + check_power_actors(tz); > + > return 0; > > free_params: >
On 1/18/21 4:07 PM, Daniel Lezcano wrote: > On 05/01/2021 20:01, Lukasz Luba wrote: >> The cooling devices which are used in IPA should provide power mapping >> functions. The callback functions are used for power estimation and state >> setting. When these functions are missing IPA ignores such cooling devices >> and does not limit their performance. It could happen that the platform >> configuration is missing these functions in example when the Energy Model >> was not setup properly (missing DT entry 'dynamic-power-coefficient'). >> >> The patch adds basic control over these devices' performance. It >> manages to throttle them to stay safe and not overheat. It also adds a >> warning during the binding phase, so it can be captured during testing. >> >> The patch covers also a corner case when all of the cooling devices are >> non-power actors. > > In my opinion this is a user space problem. If a device does not have > power information, then it should use the step-wise governor instead. > > It is not the power allocator to overcome a wrong or unsupported setup. > > Usually, the default governor is the step-wise and the userspace sets > the power allocator policy. True, but currently there is even no warning to the user, that some device (quite big, like GPU) is not controlled. This could be dangerous. > > A solution can be to fail to change the policy or bind if the associated > cooling devices are not all power actors. I tend to agree, we can simply 'fail' during the binding and print some warning. The code in that case would be much simpler. Let me send a v2 then. Thank you for your comments. Regards, Lukasz
diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c index 7a4170a0b51f..bcd1d524a1ba 100644 --- a/drivers/thermal/gov_power_allocator.c +++ b/drivers/thermal/gov_power_allocator.c @@ -276,6 +276,33 @@ static u32 pid_controller(struct thermal_zone_device *tz, return power_range; } +/** + * control_non_power_actor() - control performance of a cooling device which + * is not a power actor + * @instance: thermal instance to update + * @throttle: boolean flag indicating the action + * + * Set the min/max performance point for a given cooling device, with respect + * to limits. It is needed only for devices which are not power actors and + * don't provide the power mapping functions. These devices will be capped + * more strictly to provide safe conditions and not overheat them. + */ +static void control_non_power_actor(struct thermal_instance *instance, + bool throttle) +{ + struct thermal_cooling_device *cdev = instance->cdev; + + if (throttle) + instance->target = instance->upper; + else + instance->target = instance->lower; + + mutex_lock(&cdev->lock); + cdev->updated = false; + mutex_unlock(&cdev->lock); + thermal_cdev_update(cdev); +} + /** * power_actor_set_power() - limit the maximum power a cooling device consumes * @cdev: pointer to &thermal_cooling_device @@ -405,7 +432,7 @@ static int allocate_power(struct thermal_zone_device *tz, if (!num_actors) { ret = -ENODEV; - goto unlock; + goto safety_throttling; } /* @@ -495,6 +522,16 @@ static int allocate_power(struct thermal_zone_device *tz, control_temp - tz->temperature); kfree(req_power); + +safety_throttling: + list_for_each_entry(instance, &tz->thermal_instances, tz_node) { + if (instance->trip != trip_max_desired_temperature) + continue; + + if (!cdev_is_power_actor(instance->cdev)) + control_non_power_actor(instance, true); + } + unlock: mutex_unlock(&tz->lock); @@ -576,9 +613,13 @@ static void allow_maximum_power(struct thermal_zone_device *tz) mutex_lock(&tz->lock); list_for_each_entry(instance, &tz->thermal_instances, tz_node) { - if ((instance->trip != params->trip_max_desired_temperature) || - (!cdev_is_power_actor(instance->cdev))) + if (instance->trip != params->trip_max_desired_temperature) + continue; + + if (!cdev_is_power_actor(instance->cdev)) { + control_non_power_actor(instance, false); continue; + } instance->target = 0; mutex_lock(&instance->cdev->lock); @@ -589,6 +630,28 @@ static void allow_maximum_power(struct thermal_zone_device *tz) mutex_unlock(&tz->lock); } +/** + * check_power_actors() - Check all cooling devices and warn when they are + * not power actors + * @tz: thermal zone to operate on + * + * Check all cooling devices in the @tz and warn when they are not power + * actors. These cooling devices will be throttled aggressively because they + * miss needed callbacks. The warning would help to investigate the issue, + * which could be e.g. lack of Energy Model for a given device. + */ +static void check_power_actors(struct thermal_zone_device *tz) +{ + struct thermal_instance *instance; + + list_for_each_entry(instance, &tz->thermal_instances, tz_node) { + if (!cdev_is_power_actor(instance->cdev)) { + dev_warn(&tz->device, "power_allocator: %s is not a power actor\n", + instance->cdev->type); + } + } +} + /** * power_allocator_bind() - bind the power_allocator governor to a thermal zone * @tz: thermal zone to bind it to @@ -637,6 +700,8 @@ static int power_allocator_bind(struct thermal_zone_device *tz) tz->governor_data = params; + check_power_actors(tz); + return 0; free_params:
The cooling devices which are used in IPA should provide power mapping functions. The callback functions are used for power estimation and state setting. When these functions are missing IPA ignores such cooling devices and does not limit their performance. It could happen that the platform configuration is missing these functions in example when the Energy Model was not setup properly (missing DT entry 'dynamic-power-coefficient'). The patch adds basic control over these devices' performance. It manages to throttle them to stay safe and not overheat. It also adds a warning during the binding phase, so it can be captured during testing. The patch covers also a corner case when all of the cooling devices are non-power actors. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> --- drivers/thermal/gov_power_allocator.c | 71 +++++++++++++++++++++++++-- 1 file changed, 68 insertions(+), 3 deletions(-)