Message ID | 20210419084536.25000-3-lukasz.luba@arm.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Daniel Lezcano |
Headers | show |
Series | Improve IPA mechanisms in low temperature state | expand |
On 19/04/2021 10:45, Lukasz Luba wrote: > The cooling device state change generates an event, also when there is no > need, because temperature is low and device is not throttled. Avoid to > unnecessary update the cooling device which means also not sending event. > The cooling device state has not changed because the temperature is still > below the first activation trip point value, so we can do this. > Add a tracking mechanism to make sure it updates cooling devices only > once - when the temperature dropps below first trip point. > > Reported-by: Daniel Lezcano <daniel.lezcano@linaro.org> > Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> > --- > drivers/thermal/gov_power_allocator.c | 14 ++++++++++---- > 1 file changed, 10 insertions(+), 4 deletions(-) > > diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c > index d393409fb786..f379f1aaa3b5 100644 > --- a/drivers/thermal/gov_power_allocator.c > +++ b/drivers/thermal/gov_power_allocator.c > @@ -571,7 +571,7 @@ static void reset_pid_controller(struct power_allocator_params *params) > params->prev_err = 0; > } > > -static void allow_maximum_power(struct thermal_zone_device *tz) > +static void allow_maximum_power(struct thermal_zone_device *tz, bool update) > { > struct thermal_instance *instance; > struct power_allocator_params *params = tz->governor_data; > @@ -594,9 +594,13 @@ static void allow_maximum_power(struct thermal_zone_device *tz) > */ > cdev->ops->get_requested_power(cdev, &req_power); > > - instance->cdev->updated = false; > + if (update) > + instance->cdev->updated = false; > + > mutex_unlock(&instance->cdev->lock); > - (instance->cdev); > + > + if (update) > + thermal_cdev_update(instance->cdev); This cdev update has something bad IMHO. It is protected by a mutex but the 'updated' field is left unprotected before calling thermal_cdev_update(). It is not the fault of this code but how the cooling device are updated and how it interacts with the thermal instances. IMO, part of the core code needs to revisited. This change tight a bit more the knot. Would it make sense to you if we create a function eg. __thermal_cdev_update() And then we have: void thermal_cdev_update(struct thermal_cooling_device *cdev) { mutex_lock(&cdev->lock); /* cooling device is updated*/ if (cdev->updated) { mutex_unlock(&cdev->lock); return; } __thermal_cdev_update(cdev); thermal_cdev_set_cur_state(cdev, target); cdev->updated = true; mutex_unlock(&cdev->lock); trace_cdev_update(cdev, target); dev_dbg(&cdev->device, "set to state %lu\n", target); } And in this file we do instead: - instance->cdev->updated = false; + if (update) + __thermal_cdev_update(instance->cdev); mutex_unlock(&instance->cdev->lock); - thermal_cdev_update(instance->cdev); > } > mutex_unlock(&tz->lock); > } > @@ -710,6 +714,7 @@ static int power_allocator_throttle(struct thermal_zone_device *tz, int trip) > int ret; > int switch_on_temp, control_temp; > struct power_allocator_params *params = tz->governor_data; > + bool update; > > /* > * We get called for every trip point but we only need to do > @@ -721,9 +726,10 @@ static int power_allocator_throttle(struct thermal_zone_device *tz, int trip) > ret = tz->ops->get_trip_temp(tz, params->trip_switch_on, > &switch_on_temp); > if (!ret && (tz->temperature < switch_on_temp)) { > + update = (tz->last_temperature >= switch_on_temp); > tz->passive = 0; > reset_pid_controller(params); > - allow_maximum_power(tz); > + allow_maximum_power(tz, update); > return 0; > } > >
Hi Daniel, On 4/20/21 2:30 PM, Daniel Lezcano wrote: > On 19/04/2021 10:45, Lukasz Luba wrote: [snip] >> - instance->cdev->updated = false; >> + if (update) >> + instance->cdev->updated = false; >> + >> mutex_unlock(&instance->cdev->lock); >> - (instance->cdev); >> + >> + if (update) >> + thermal_cdev_update(instance->cdev); > > This cdev update has something bad IMHO. It is protected by a mutex but > the 'updated' field is left unprotected before calling > thermal_cdev_update(). > > It is not the fault of this code but how the cooling device are updated > and how it interacts with the thermal instances. > > IMO, part of the core code needs to revisited. I agree, but please check my comments below. > > This change tight a bit more the knot. > > Would it make sense to you if we create a function eg. > __thermal_cdev_update() I'm not sure if I assume it right that the function would only have the: list_for_each_entry(instance, &cdev->thermal_instances, cdev_node) loop from the thermal_cdev_update(). But if it has only this loop then it's too little. > > And then we have: > > void thermal_cdev_update(struct thermal_cooling_device *cdev) > { > mutex_lock(&cdev->lock); > /* cooling device is updated*/ > if (cdev->updated) { > mutex_unlock(&cdev->lock); > return; > } > > __thermal_cdev_update(cdev); > > thermal_cdev_set_cur_state(cdev, target); Here we are actually setting the 'target' state via: cdev->ops->set_cur_state(cdev, target) then we notify, then updating stats. > > cdev->updated = true; > mutex_unlock(&cdev->lock); > trace_cdev_update(cdev, target); Also this trace is something that I'm using in my tests... > dev_dbg(&cdev->device, "set to state %lu\n", target); > } > > And in this file we do instead: > > - instance->cdev->updated = false; > + if (update) > + __thermal_cdev_update(instance->cdev); > mutex_unlock(&instance->cdev->lock); > - thermal_cdev_update(instance->cdev); Without the line above, we are not un-throttling the devices. Regards, Lukasz
On 20/04/2021 16:21, Lukasz Luba wrote: > Hi Daniel, > > On 4/20/21 2:30 PM, Daniel Lezcano wrote: >> On 19/04/2021 10:45, Lukasz Luba wrote: > > [snip] > >>> - instance->cdev->updated = false; >>> + if (update) >>> + instance->cdev->updated = false; >>> + >>> mutex_unlock(&instance->cdev->lock); >>> - (instance->cdev); >>> + >>> + if (update) >>> + thermal_cdev_update(instance->cdev); >> >> This cdev update has something bad IMHO. It is protected by a mutex but >> the 'updated' field is left unprotected before calling >> thermal_cdev_update(). >> >> It is not the fault of this code but how the cooling device are updated >> and how it interacts with the thermal instances. >> >> IMO, part of the core code needs to revisited. > > I agree, but please check my comments below. > >> >> This change tight a bit more the knot. >> >> Would it make sense to you if we create a function eg. >> __thermal_cdev_update() > > I'm not sure if I assume it right that the function would only have the: > list_for_each_entry(instance, &cdev->thermal_instances, cdev_node) > > loop from the thermal_cdev_update(). But if it has only this loop then > it's too little. > >> >> And then we have: >> >> void thermal_cdev_update(struct thermal_cooling_device *cdev) >> { >> mutex_lock(&cdev->lock); >> /* cooling device is updated*/ >> if (cdev->updated) { >> mutex_unlock(&cdev->lock); >> return; >> } >> >> __thermal_cdev_update(cdev); >> >> thermal_cdev_set_cur_state(cdev, target); > > Here we are actually setting the 'target' state via: > cdev->ops->set_cur_state(cdev, target) > > then we notify, then updating stats. > >> >> cdev->updated = true; >> mutex_unlock(&cdev->lock); >> trace_cdev_update(cdev, target); > > Also this trace is something that I'm using in my tests... Yeah, I noticed right after sending the comments. All that should be moved in the lockless function. So this function becomes: void thermal_cdev_update(struct thermal_cooling_device *cdev) { mutex_lock(&cdev->lock); if (!cdev->updated) { __thermal_cdev_update(cdev); cdev->updated = true; } mutex_unlock(&cdev->lock); dev_dbg(&cdev->device, "set to state %lu\n", target); } We end up with the trace_cdev_update(cdev, target) inside the mutex section but that should be fine. >> dev_dbg(&cdev->device, "set to state %lu\n", target); >> } >> >> And in this file we do instead: >> >> - instance->cdev->updated = false; >> + if (update) >> + __thermal_cdev_update(instance->cdev); >> mutex_unlock(&instance->cdev->lock); >> - thermal_cdev_update(instance->cdev); > > Without the line above, we are not un-throttling the devices. Is it still true with the amended function thermal_cdev_update() ?
On 4/20/21 4:24 PM, Daniel Lezcano wrote: > On 20/04/2021 16:21, Lukasz Luba wrote: >> Hi Daniel, >> >> On 4/20/21 2:30 PM, Daniel Lezcano wrote: >>> On 19/04/2021 10:45, Lukasz Luba wrote: >> >> [snip] >> >>>> - instance->cdev->updated = false; >>>> + if (update) >>>> + instance->cdev->updated = false; >>>> + >>>> mutex_unlock(&instance->cdev->lock); >>>> - (instance->cdev); >>>> + >>>> + if (update) >>>> + thermal_cdev_update(instance->cdev); >>> >>> This cdev update has something bad IMHO. It is protected by a mutex but >>> the 'updated' field is left unprotected before calling >>> thermal_cdev_update(). >>> >>> It is not the fault of this code but how the cooling device are updated >>> and how it interacts with the thermal instances. >>> >>> IMO, part of the core code needs to revisited. >> >> I agree, but please check my comments below. >> >>> >>> This change tight a bit more the knot. >>> >>> Would it make sense to you if we create a function eg. >>> __thermal_cdev_update() >> >> I'm not sure if I assume it right that the function would only have the: >> list_for_each_entry(instance, &cdev->thermal_instances, cdev_node) >> >> loop from the thermal_cdev_update(). But if it has only this loop then >> it's too little. >> >>> >>> And then we have: >>> >>> void thermal_cdev_update(struct thermal_cooling_device *cdev) >>> { >>> mutex_lock(&cdev->lock); >>> /* cooling device is updated*/ >>> if (cdev->updated) { >>> mutex_unlock(&cdev->lock); >>> return; >>> } >>> >>> __thermal_cdev_update(cdev); >>> >>> thermal_cdev_set_cur_state(cdev, target); >> >> Here we are actually setting the 'target' state via: >> cdev->ops->set_cur_state(cdev, target) >> >> then we notify, then updating stats. >> >>> >>> cdev->updated = true; >>> mutex_unlock(&cdev->lock); >>> trace_cdev_update(cdev, target); >> >> Also this trace is something that I'm using in my tests... > > Yeah, I noticed right after sending the comments. All that should be > moved in the lockless function. Agree > > So this function becomes: > > void thermal_cdev_update(struct thermal_cooling_device *cdev) > { > mutex_lock(&cdev->lock); > if (!cdev->updated) { > __thermal_cdev_update(cdev); > cdev->updated = true; > } > mutex_unlock(&cdev->lock); > > dev_dbg(&cdev->device, "set to state %lu\n", target); > } > > We end up with the trace_cdev_update(cdev, target) inside the mutex > section but that should be fine. True, this shouldn't be an issue. > >>> dev_dbg(&cdev->device, "set to state %lu\n", target); >>> } >>> >>> And in this file we do instead: >>> >>> - instance->cdev->updated = false; >>> + if (update) >>> + __thermal_cdev_update(instance->cdev); >>> mutex_unlock(&instance->cdev->lock); >>> - thermal_cdev_update(instance->cdev); >> >> Without the line above, we are not un-throttling the devices. > > Is it still true with the amended function thermal_cdev_update() ? > > That new approach should work. I can test your patch with this new functions and re-base my work on top of it. Or you like me to write such patch and send it?
On 20/04/2021 22:01, Lukasz Luba wrote: > > > On 4/20/21 4:24 PM, Daniel Lezcano wrote: >> On 20/04/2021 16:21, Lukasz Luba wrote: >>> Hi Daniel, >>> >>> On 4/20/21 2:30 PM, Daniel Lezcano wrote: >>>> On 19/04/2021 10:45, Lukasz Luba wrote: >>> >>> [snip] [ ... ] > > That new approach should work. I can test your patch with this new > functions and re-base my work on top of it. > Or you like me to write such patch and send it? At your convenience. I'm pretty busy ATM with more patches to review, so if you can handle that change that will be nice. Otherwise, I can take care of it but later. Thanks -- Daniel
On 4/20/21 10:03 PM, Daniel Lezcano wrote: > On 20/04/2021 22:01, Lukasz Luba wrote: >> >> >> On 4/20/21 4:24 PM, Daniel Lezcano wrote: >>> On 20/04/2021 16:21, Lukasz Luba wrote: >>>> Hi Daniel, >>>> >>>> On 4/20/21 2:30 PM, Daniel Lezcano wrote: >>>>> On 19/04/2021 10:45, Lukasz Luba wrote: >>>> >>>> [snip] > > [ ... ] > >> >> That new approach should work. I can test your patch with this new >> functions and re-base my work on top of it. >> Or you like me to write such patch and send it? > > At your convenience. I'm pretty busy ATM with more patches to review, so > if you can handle that change that will be nice. Otherwise, I can take > care of it but later. > OK, so I will create such patch and add your name in tags: Co-developed-by: and Signed-off-by: plus also a lore.kernel.org link in the patch commit message into this discussion. Thanks for having a look at this. Regards, Lukasz
diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c index d393409fb786..f379f1aaa3b5 100644 --- a/drivers/thermal/gov_power_allocator.c +++ b/drivers/thermal/gov_power_allocator.c @@ -571,7 +571,7 @@ static void reset_pid_controller(struct power_allocator_params *params) params->prev_err = 0; } -static void allow_maximum_power(struct thermal_zone_device *tz) +static void allow_maximum_power(struct thermal_zone_device *tz, bool update) { struct thermal_instance *instance; struct power_allocator_params *params = tz->governor_data; @@ -594,9 +594,13 @@ static void allow_maximum_power(struct thermal_zone_device *tz) */ cdev->ops->get_requested_power(cdev, &req_power); - instance->cdev->updated = false; + if (update) + instance->cdev->updated = false; + mutex_unlock(&instance->cdev->lock); - thermal_cdev_update(instance->cdev); + + if (update) + thermal_cdev_update(instance->cdev); } mutex_unlock(&tz->lock); } @@ -710,6 +714,7 @@ static int power_allocator_throttle(struct thermal_zone_device *tz, int trip) int ret; int switch_on_temp, control_temp; struct power_allocator_params *params = tz->governor_data; + bool update; /* * We get called for every trip point but we only need to do @@ -721,9 +726,10 @@ static int power_allocator_throttle(struct thermal_zone_device *tz, int trip) ret = tz->ops->get_trip_temp(tz, params->trip_switch_on, &switch_on_temp); if (!ret && (tz->temperature < switch_on_temp)) { + update = (tz->last_temperature >= switch_on_temp); tz->passive = 0; reset_pid_controller(params); - allow_maximum_power(tz); + allow_maximum_power(tz, update); return 0; }
The cooling device state change generates an event, also when there is no need, because temperature is low and device is not throttled. Avoid to unnecessary update the cooling device which means also not sending event. The cooling device state has not changed because the temperature is still below the first activation trip point value, so we can do this. Add a tracking mechanism to make sure it updates cooling devices only once - when the temperature dropps below first trip point. Reported-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> --- drivers/thermal/gov_power_allocator.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-)