Message ID | 20240306085428.88011-1-daniel.lezcano@linaro.org (mailing list archive) |
---|---|
State | RFC, archived |
Headers | show |
Series | [RFC] thermal/core: Fix trip point crossing events ordering | expand |
On Wed, Mar 6, 2024 at 9:54 AM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: > > Let's assume the following setup: > > - trip 0 = 65°C > - trip 1 = 70°C > - trip 2 = 75°C > > The current temperature is 35°C. > > The interrupt is setup to fire at 65°C. If the thermal capacity is > saturated it is possible the temperature jumps to 72°c when reading > the temperature after the interrupt fired when 65°C was crossed. That > means we should have two events notified to userspace. The first one > for trip 0 and the second one for trip 1. > > When the function thermal_zone_update() is called from the threaded > interrupt, it will read the temperature and then call for_each_trip() > which in turns call handle_trip_point(). > > This function will check: > > if (tz->last_temperature < trip->temperature && > tz->temperature >= trip->temperature) > thermal_notify_tz_trip_up() For the mainline: $ git grep handle_trip_point | cat $ Do you mean handle_thermal_trip()? But it doesn't do the above in the mainline. It does (comments omitted) if (tz->last_temperature < trip->threshold) { if (tz->temperature >= trip->temperature) { thermal_notify_tz_trip_up(tz, trip); thermal_debug_tz_trip_up(tz, trip); trip->threshold = trip->temperature - trip->hysteresis; } else { trip->threshold = trip->temperature; } } > > So here, we will call this function with trip0 followed by trip1. That > will result in an event for each trip point, reflecting the trip point > being crossed the way up with a temperature raising. So far, so good. > > Usually the sensors have an interrupt when the temperature is crossed > the way up but not the way down, so there an extra delay corresponding > to the passive polling where the temperature could have dropped and > crossed more than one trip point. This scenario is likely to happen > more often when multiple trip points are specified. So considering the > same setup after crossing the trip 2, we stop the workload responsible > of the heat and the temperature drops suddenly to 62°C. In this case, > the next polling will call thermal_zone_device_update(), then > for_each_trip() and handle_trip_point(). > > This function will check: > > if (tz->last_temperature >= trip->temperature && > tz->temperature < trip->temperature - trip->hysteresis) > thermal_notify_tz_trip_down() Again, assuming that you mean handle_thermal_trip(), the above is not the current mainline code, which is (comments omitted) if (tz->last_temperature >= trip->threshold) { if (tz->temperature < trip->temperature - trip->hysteresis) { thermal_notify_tz_trip_down(tz, trip); thermal_debug_tz_trip_down(tz, trip); trip->threshold = trip->temperature; } else { trip->threshold = trip->temperature - trip->hysteresis; } } I guess this doesn't matter here? > The loop for_each_trip() will call trip0, 1 and 2. That will result in > generating the events for trip0, 1 and 2, in the wrong order. That is > not reflecting the thermal dynamic and puzzles the userspace > monitoring the temperature. Only if the trips are ordered in a specific way, but they don't need to be ordered in any way. > Fix this by inspecting the trend of the temperature. If it is raising, > then we browse the trip point in the ascending order, if it is falling > then we browse in the descending order. > > Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> > --- > drivers/thermal/thermal_core.c | 8 ++++++-- > drivers/thermal/thermal_core.h | 3 +++ > 2 files changed, 9 insertions(+), 2 deletions(-) > > diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c > index dfaa6341694a..abb8ee5c9afe 100644 > --- a/drivers/thermal/thermal_core.c > +++ b/drivers/thermal/thermal_core.c > @@ -473,8 +473,12 @@ void __thermal_zone_device_update(struct thermal_zone_device *tz, > > tz->notify_event = event; > > - for_each_trip(tz, trip) > - handle_thermal_trip(tz, trip); > + if (tz->last_temperature < tz->temperature) > + for_each_trip(tz, trip) > + handle_thermal_trip(tz, trip); > + else > + for_each_trip_reverse(tz, trip) > + handle_thermal_trip(tz, trip); This works assuming a "proper" ordering of the trips. > > monitor_thermal_zone(tz); > } > diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h > index e9c099ecdd0f..0072b3d4039e 100644 > --- a/drivers/thermal/thermal_core.h > +++ b/drivers/thermal/thermal_core.h > @@ -123,6 +123,9 @@ void thermal_governor_update_tz(struct thermal_zone_device *tz, > #define for_each_trip(__tz, __trip) \ > for (__trip = __tz->trips; __trip - __tz->trips < __tz->num_trips; __trip++) > > +#define for_each_trip_reverse(__tz, __trip) \ > + for (__trip = &__tz->trips[__tz->num_trips - 1]; __trip >= __tz->trips ; __trip--) > + > void __thermal_zone_set_trips(struct thermal_zone_device *tz); > int thermal_zone_trip_id(const struct thermal_zone_device *tz, > const struct thermal_trip *trip); > -- Generally speaking, this is a matter of getting alignment on the expectations between the kernel and user space. It looks like user space expects to get the notifications in the order of either growing or falling temperatures, depending on the direction of the temperature change. Ordering the trips in the kernel is not practical, but the notifications can be ordered in principle. Is this what you'd like to do? Or can user space be bothered with recognizing that it may get the notifications for different trips out of order?
On 06/03/2024 13:02, Rafael J. Wysocki wrote: [ ... ] >> +#define for_each_trip_reverse(__tz, __trip) \ >> + for (__trip = &__tz->trips[__tz->num_trips - 1]; __trip >= __tz->trips ; __trip--) >> + >> void __thermal_zone_set_trips(struct thermal_zone_device *tz); >> int thermal_zone_trip_id(const struct thermal_zone_device *tz, >> const struct thermal_trip *trip); >> -- > > Generally speaking, this is a matter of getting alignment on the > expectations between the kernel and user space. > > It looks like user space expects to get the notifications in the order > of either growing or falling temperatures, depending on the direction > of the temperature change. Ordering the trips in the kernel is not > practical, but the notifications can be ordered in principle. Is this > what you'd like to do? Yes > Or can user space be bothered with recognizing that it may get the > notifications for different trips out of order? IMO it is a bad information if the trip points events are coming unordered. The temperature signal is a time related measurements, the userspace should receive thermal information from this signal in the right order. It sounds strange to track the temperature signal in the kernel, then scramble the information, pass it to the userspace and except it to apply some kind of logic to unscramble it.
On Wed, Mar 6, 2024 at 1:43 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: > > On 06/03/2024 13:02, Rafael J. Wysocki wrote: > > [ ... ] > > >> +#define for_each_trip_reverse(__tz, __trip) \ > >> + for (__trip = &__tz->trips[__tz->num_trips - 1]; __trip >= __tz->trips ; __trip--) > >> + > >> void __thermal_zone_set_trips(struct thermal_zone_device *tz); > >> int thermal_zone_trip_id(const struct thermal_zone_device *tz, > >> const struct thermal_trip *trip); > >> -- > > > > Generally speaking, this is a matter of getting alignment on the > > expectations between the kernel and user space. > > > > It looks like user space expects to get the notifications in the order > > of either growing or falling temperatures, depending on the direction > > of the temperature change. Ordering the trips in the kernel is not > > practical, but the notifications can be ordered in principle. Is this > > what you'd like to do? > > Yes > > > Or can user space be bothered with recognizing that it may get the > > notifications for different trips out of order? > > IMO it is a bad information if the trip points events are coming > unordered. The temperature signal is a time related measurements, the > userspace should receive thermal information from this signal in the > right order. It sounds strange to track the temperature signal in the > kernel, then scramble the information, pass it to the userspace and > except it to apply some kind of logic to unscramble it. So the notifications can be ordered before sending them out, as long as they are produced by a single __thermal_zone_device_update() call. I guess you also would like the thermal_debug_tz_trip_up/down() calls to be ordered, wouldn't you?
On 06/03/2024 13:53, Rafael J. Wysocki wrote: > On Wed, Mar 6, 2024 at 1:43 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: >> >> On 06/03/2024 13:02, Rafael J. Wysocki wrote: >> >> [ ... ] >> >>>> +#define for_each_trip_reverse(__tz, __trip) \ >>>> + for (__trip = &__tz->trips[__tz->num_trips - 1]; __trip >= __tz->trips ; __trip--) >>>> + >>>> void __thermal_zone_set_trips(struct thermal_zone_device *tz); >>>> int thermal_zone_trip_id(const struct thermal_zone_device *tz, >>>> const struct thermal_trip *trip); >>>> -- >>> >>> Generally speaking, this is a matter of getting alignment on the >>> expectations between the kernel and user space. >>> >>> It looks like user space expects to get the notifications in the order >>> of either growing or falling temperatures, depending on the direction >>> of the temperature change. Ordering the trips in the kernel is not >>> practical, but the notifications can be ordered in principle. Is this >>> what you'd like to do? >> >> Yes >> >>> Or can user space be bothered with recognizing that it may get the >>> notifications for different trips out of order? >> >> IMO it is a bad information if the trip points events are coming >> unordered. The temperature signal is a time related measurements, the >> userspace should receive thermal information from this signal in the >> right order. It sounds strange to track the temperature signal in the >> kernel, then scramble the information, pass it to the userspace and >> except it to apply some kind of logic to unscramble it. > > So the notifications can be ordered before sending them out, as long > as they are produced by a single __thermal_zone_device_update() call. > > I guess you also would like the thermal_debug_tz_trip_up/down() calls > to be ordered, wouldn't you? Right
On Wed, Mar 6, 2024 at 2:16 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: > > On 06/03/2024 13:53, Rafael J. Wysocki wrote: > > On Wed, Mar 6, 2024 at 1:43 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: > >> > >> On 06/03/2024 13:02, Rafael J. Wysocki wrote: > >> > >> [ ... ] > >> > >>>> +#define for_each_trip_reverse(__tz, __trip) \ > >>>> + for (__trip = &__tz->trips[__tz->num_trips - 1]; __trip >= __tz->trips ; __trip--) > >>>> + > >>>> void __thermal_zone_set_trips(struct thermal_zone_device *tz); > >>>> int thermal_zone_trip_id(const struct thermal_zone_device *tz, > >>>> const struct thermal_trip *trip); > >>>> -- > >>> > >>> Generally speaking, this is a matter of getting alignment on the > >>> expectations between the kernel and user space. > >>> > >>> It looks like user space expects to get the notifications in the order > >>> of either growing or falling temperatures, depending on the direction > >>> of the temperature change. Ordering the trips in the kernel is not > >>> practical, but the notifications can be ordered in principle. Is this > >>> what you'd like to do? > >> > >> Yes > >> > >>> Or can user space be bothered with recognizing that it may get the > >>> notifications for different trips out of order? > >> > >> IMO it is a bad information if the trip points events are coming > >> unordered. The temperature signal is a time related measurements, the > >> userspace should receive thermal information from this signal in the > >> right order. It sounds strange to track the temperature signal in the > >> kernel, then scramble the information, pass it to the userspace and > >> except it to apply some kind of logic to unscramble it. > > > > So the notifications can be ordered before sending them out, as long > > as they are produced by a single __thermal_zone_device_update() call. > > > > I guess you also would like the thermal_debug_tz_trip_up/down() calls > > to be ordered, wouldn't you? > > Right I have an idea how to do this, but it is based on a couple of patches that I've been working on in the meantime. Let me post these patches first and then I'll send a prototype patch addressing this on top of them.
On 06/03/2024 16:41, Rafael J. Wysocki wrote: > On Wed, Mar 6, 2024 at 2:16 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: >> >> On 06/03/2024 13:53, Rafael J. Wysocki wrote: >>> On Wed, Mar 6, 2024 at 1:43 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: >>>> >>>> On 06/03/2024 13:02, Rafael J. Wysocki wrote: >>>> >>>> [ ... ] >>>> >>>>>> +#define for_each_trip_reverse(__tz, __trip) \ >>>>>> + for (__trip = &__tz->trips[__tz->num_trips - 1]; __trip >= __tz->trips ; __trip--) >>>>>> + >>>>>> void __thermal_zone_set_trips(struct thermal_zone_device *tz); >>>>>> int thermal_zone_trip_id(const struct thermal_zone_device *tz, >>>>>> const struct thermal_trip *trip); >>>>>> -- >>>>> >>>>> Generally speaking, this is a matter of getting alignment on the >>>>> expectations between the kernel and user space. >>>>> >>>>> It looks like user space expects to get the notifications in the order >>>>> of either growing or falling temperatures, depending on the direction >>>>> of the temperature change. Ordering the trips in the kernel is not >>>>> practical, but the notifications can be ordered in principle. Is this >>>>> what you'd like to do? >>>> >>>> Yes >>>> >>>>> Or can user space be bothered with recognizing that it may get the >>>>> notifications for different trips out of order? >>>> >>>> IMO it is a bad information if the trip points events are coming >>>> unordered. The temperature signal is a time related measurements, the >>>> userspace should receive thermal information from this signal in the >>>> right order. It sounds strange to track the temperature signal in the >>>> kernel, then scramble the information, pass it to the userspace and >>>> except it to apply some kind of logic to unscramble it. >>> >>> So the notifications can be ordered before sending them out, as long >>> as they are produced by a single __thermal_zone_device_update() call. >>> >>> I guess you also would like the thermal_debug_tz_trip_up/down() calls >>> to be ordered, wouldn't you? >> >> Right > > I have an idea how to do this, but it is based on a couple of patches > that I've been working on in the meantime. > > Let me post these patches first and then I'll send a prototype patch > addressing this on top of them. That is awesome, thanks !
On Wednesday, March 6, 2024 4:55:51 PM CET Daniel Lezcano wrote: > On 06/03/2024 16:41, Rafael J. Wysocki wrote: > > On Wed, Mar 6, 2024 at 2:16 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: > >> > >> On 06/03/2024 13:53, Rafael J. Wysocki wrote: > >>> On Wed, Mar 6, 2024 at 1:43 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: > >>>> > >>>> On 06/03/2024 13:02, Rafael J. Wysocki wrote: > >>>> > >>>> [ ... ] > >>>> > >>>>>> +#define for_each_trip_reverse(__tz, __trip) \ > >>>>>> + for (__trip = &__tz->trips[__tz->num_trips - 1]; __trip >= __tz->trips ; __trip--) > >>>>>> + > >>>>>> void __thermal_zone_set_trips(struct thermal_zone_device *tz); > >>>>>> int thermal_zone_trip_id(const struct thermal_zone_device *tz, > >>>>>> const struct thermal_trip *trip); > >>>>>> -- > >>>>> > >>>>> Generally speaking, this is a matter of getting alignment on the > >>>>> expectations between the kernel and user space. > >>>>> > >>>>> It looks like user space expects to get the notifications in the order > >>>>> of either growing or falling temperatures, depending on the direction > >>>>> of the temperature change. Ordering the trips in the kernel is not > >>>>> practical, but the notifications can be ordered in principle. Is this > >>>>> what you'd like to do? > >>>> > >>>> Yes > >>>> > >>>>> Or can user space be bothered with recognizing that it may get the > >>>>> notifications for different trips out of order? > >>>> > >>>> IMO it is a bad information if the trip points events are coming > >>>> unordered. The temperature signal is a time related measurements, the > >>>> userspace should receive thermal information from this signal in the > >>>> right order. It sounds strange to track the temperature signal in the > >>>> kernel, then scramble the information, pass it to the userspace and > >>>> except it to apply some kind of logic to unscramble it. > >>> > >>> So the notifications can be ordered before sending them out, as long > >>> as they are produced by a single __thermal_zone_device_update() call. > >>> > >>> I guess you also would like the thermal_debug_tz_trip_up/down() calls > >>> to be ordered, wouldn't you? > >> > >> Right > > > > I have an idea how to do this, but it is based on a couple of patches > > that I've been working on in the meantime. > > > > Let me post these patches first and then I'll send a prototype patch > > addressing this on top of them. > > That is awesome, thanks ! Anytime! Now that I've posted this series: https://lore.kernel.org/linux-pm/4558384.LvFx2qVVIh@kreacher/ I can append the patch below that is based on it. The idea is really straightforward: Instead of sending the notifications and recording the stats right away, create two lists of trips for which they need to be send, sort them and then send the notifications etc in the right order. I want to avoid explicit memory allocations that can fail in principle, which is why lists are used. The reason why two lists are used is in case the trips are updated and that's why they appear to be crossed (which may not depend on the actual temperature change). One caveat is that the lists are sorted by trip thresholds (because they are the real values take into account in the code), but user space may expect them to be sorted by trip temperatures instead. That can be changed. --- drivers/thermal/thermal_core.c | 39 +++++++++++++++++++++++++++++++++------ drivers/thermal/thermal_core.h | 1 + 2 files changed, 34 insertions(+), 6 deletions(-) Index: linux-pm/drivers/thermal/thermal_core.c =================================================================== --- linux-pm.orig/drivers/thermal/thermal_core.c +++ linux-pm/drivers/thermal/thermal_core.c @@ -15,6 +15,7 @@ #include <linux/slab.h> #include <linux/kdev_t.h> #include <linux/idr.h> +#include <linux/list_sort.h> #include <linux/thermal.h> #include <linux/reboot.h> #include <linux/string.h> @@ -361,7 +362,9 @@ static void handle_critical_trips(struct } static void handle_thermal_trip(struct thermal_zone_device *tz, - struct thermal_trip_desc *td) + struct thermal_trip_desc *td, + struct list_head *way_up_list, + struct list_head *way_down_list) { const struct thermal_trip *trip = &td->trip; @@ -382,8 +385,7 @@ static void handle_thermal_trip(struct t * the threshold and the trip temperature will be equal. */ if (tz->temperature >= trip->temperature) { - thermal_notify_tz_trip_up(tz, trip); - thermal_debug_tz_trip_up(tz, trip); + list_add_tail(&td->notify_list_node, way_up_list); td->threshold = trip->temperature - trip->hysteresis; } else { td->threshold = trip->temperature; @@ -400,8 +402,7 @@ static void handle_thermal_trip(struct t * the trip. */ if (tz->temperature < trip->temperature - trip->hysteresis) { - thermal_notify_tz_trip_down(tz, trip); - thermal_debug_tz_trip_down(tz, trip); + list_add(&td->notify_list_node, way_down_list); td->threshold = trip->temperature; } else { td->threshold = trip->temperature - trip->hysteresis; @@ -457,10 +458,24 @@ static void thermal_zone_device_init(str pos->initialized = false; } +static int thermal_trip_notify_cmp(void *ascending, const struct list_head *a, + const struct list_head *b) +{ + struct thermal_trip_desc *tda = container_of(a, struct thermal_trip_desc, + notify_list_node); + struct thermal_trip_desc *tdb = container_of(b, struct thermal_trip_desc, + notify_list_node); + int ret = tdb->threshold - tda->threshold; + + return ascending ? ret : -ret; +} + void __thermal_zone_device_update(struct thermal_zone_device *tz, enum thermal_notify_event event) { struct thermal_trip_desc *td; + LIST_HEAD(way_down_list); + LIST_HEAD(way_up_list); if (tz->suspended) return; @@ -475,7 +490,19 @@ void __thermal_zone_device_update(struct tz->notify_event = event; for_each_trip_desc(tz, td) - handle_thermal_trip(tz, td); + handle_thermal_trip(tz, td, &way_up_list, &way_down_list); + + list_sort((void *)true, &way_up_list, thermal_trip_notify_cmp); + list_for_each_entry(td, &way_up_list, notify_list_node) { + thermal_notify_tz_trip_up(tz, &td->trip); + thermal_debug_tz_trip_up(tz, &td->trip); + } + + list_sort(NULL, &way_down_list, thermal_trip_notify_cmp); + list_for_each_entry(td, &way_down_list, notify_list_node) { + thermal_notify_tz_trip_down(tz, &td->trip); + thermal_debug_tz_trip_down(tz, &td->trip); + } monitor_thermal_zone(tz); } Index: linux-pm/drivers/thermal/thermal_core.h =================================================================== --- linux-pm.orig/drivers/thermal/thermal_core.h +++ linux-pm/drivers/thermal/thermal_core.h @@ -17,6 +17,7 @@ struct thermal_trip_desc { struct thermal_trip trip; + struct list_head notify_list_node; int threshold; };
On Wed, Mar 6, 2024 at 8:32 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > On Wednesday, March 6, 2024 4:55:51 PM CET Daniel Lezcano wrote: > > On 06/03/2024 16:41, Rafael J. Wysocki wrote: > > > On Wed, Mar 6, 2024 at 2:16 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: > > >> > > >> On 06/03/2024 13:53, Rafael J. Wysocki wrote: > > >>> On Wed, Mar 6, 2024 at 1:43 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: > > >>>> > > >>>> On 06/03/2024 13:02, Rafael J. Wysocki wrote: > > >>>> > > >>>> [ ... ] > > >>>> > > >>>>>> +#define for_each_trip_reverse(__tz, __trip) \ > > >>>>>> + for (__trip = &__tz->trips[__tz->num_trips - 1]; __trip >= __tz->trips ; __trip--) > > >>>>>> + > > >>>>>> void __thermal_zone_set_trips(struct thermal_zone_device *tz); > > >>>>>> int thermal_zone_trip_id(const struct thermal_zone_device *tz, > > >>>>>> const struct thermal_trip *trip); > > >>>>>> -- > > >>>>> > > >>>>> Generally speaking, this is a matter of getting alignment on the > > >>>>> expectations between the kernel and user space. > > >>>>> > > >>>>> It looks like user space expects to get the notifications in the order > > >>>>> of either growing or falling temperatures, depending on the direction > > >>>>> of the temperature change. Ordering the trips in the kernel is not > > >>>>> practical, but the notifications can be ordered in principle. Is this > > >>>>> what you'd like to do? > > >>>> > > >>>> Yes > > >>>> > > >>>>> Or can user space be bothered with recognizing that it may get the > > >>>>> notifications for different trips out of order? > > >>>> > > >>>> IMO it is a bad information if the trip points events are coming > > >>>> unordered. The temperature signal is a time related measurements, the > > >>>> userspace should receive thermal information from this signal in the > > >>>> right order. It sounds strange to track the temperature signal in the > > >>>> kernel, then scramble the information, pass it to the userspace and > > >>>> except it to apply some kind of logic to unscramble it. > > >>> > > >>> So the notifications can be ordered before sending them out, as long > > >>> as they are produced by a single __thermal_zone_device_update() call. > > >>> > > >>> I guess you also would like the thermal_debug_tz_trip_up/down() calls > > >>> to be ordered, wouldn't you? > > >> > > >> Right > > > > > > I have an idea how to do this, but it is based on a couple of patches > > > that I've been working on in the meantime. > > > > > > Let me post these patches first and then I'll send a prototype patch > > > addressing this on top of them. > > > > That is awesome, thanks ! > > Anytime! > > Now that I've posted this series: > > https://lore.kernel.org/linux-pm/4558384.LvFx2qVVIh@kreacher/ > > I can append the patch below that is based on it. > > The idea is really straightforward: Instead of sending the notifications > and recording the stats right away, create two lists of trips for which > they need to be send, sort them and then send the notifications etc in > the right order. I want to avoid explicit memory allocations that can > fail in principle, which is why lists are used. > > The reason why two lists are used is in case the trips are updated and > that's why they appear to be crossed (which may not depend on the actual > temperature change). > > One caveat is that the lists are sorted by trip thresholds (because they > are the real values take into account in the code), but user space may > expect them to be sorted by trip temperatures instead. That can be changed. I've concluded that it is better to sort by trip temperature, because the thresholds used here are the new ones (computed after the trips have been crossed) which may be confusing. > --- > drivers/thermal/thermal_core.c | 39 +++++++++++++++++++++++++++++++++------ > drivers/thermal/thermal_core.h | 1 + > 2 files changed, 34 insertions(+), 6 deletions(-) > > Index: linux-pm/drivers/thermal/thermal_core.c > =================================================================== > --- linux-pm.orig/drivers/thermal/thermal_core.c > +++ linux-pm/drivers/thermal/thermal_core.c > @@ -15,6 +15,7 @@ > #include <linux/slab.h> > #include <linux/kdev_t.h> > #include <linux/idr.h> > +#include <linux/list_sort.h> > #include <linux/thermal.h> > #include <linux/reboot.h> > #include <linux/string.h> > @@ -361,7 +362,9 @@ static void handle_critical_trips(struct > } > > static void handle_thermal_trip(struct thermal_zone_device *tz, > - struct thermal_trip_desc *td) > + struct thermal_trip_desc *td, > + struct list_head *way_up_list, > + struct list_head *way_down_list) > { > const struct thermal_trip *trip = &td->trip; > > @@ -382,8 +385,7 @@ static void handle_thermal_trip(struct t > * the threshold and the trip temperature will be equal. > */ > if (tz->temperature >= trip->temperature) { > - thermal_notify_tz_trip_up(tz, trip); > - thermal_debug_tz_trip_up(tz, trip); > + list_add_tail(&td->notify_list_node, way_up_list); > td->threshold = trip->temperature - trip->hysteresis; > } else { > td->threshold = trip->temperature; > @@ -400,8 +402,7 @@ static void handle_thermal_trip(struct t > * the trip. > */ > if (tz->temperature < trip->temperature - trip->hysteresis) { > - thermal_notify_tz_trip_down(tz, trip); > - thermal_debug_tz_trip_down(tz, trip); > + list_add(&td->notify_list_node, way_down_list); > td->threshold = trip->temperature; > } else { > td->threshold = trip->temperature - trip->hysteresis; > @@ -457,10 +458,24 @@ static void thermal_zone_device_init(str > pos->initialized = false; > } > > +static int thermal_trip_notify_cmp(void *ascending, const struct list_head *a, > + const struct list_head *b) > +{ > + struct thermal_trip_desc *tda = container_of(a, struct thermal_trip_desc, > + notify_list_node); > + struct thermal_trip_desc *tdb = container_of(b, struct thermal_trip_desc, > + notify_list_node); > + int ret = tdb->threshold - tda->threshold; So this will become + int ret = tdb->trip.temperature - tda->trip.temperature; > + > + return ascending ? ret : -ret; > +} > + > void __thermal_zone_device_update(struct thermal_zone_device *tz, > enum thermal_notify_event event) > { > struct thermal_trip_desc *td; > + LIST_HEAD(way_down_list); > + LIST_HEAD(way_up_list); > > if (tz->suspended) > return; > @@ -475,7 +490,19 @@ void __thermal_zone_device_update(struct > tz->notify_event = event; > > for_each_trip_desc(tz, td) > - handle_thermal_trip(tz, td); > + handle_thermal_trip(tz, td, &way_up_list, &way_down_list); > + > + list_sort((void *)true, &way_up_list, thermal_trip_notify_cmp); > + list_for_each_entry(td, &way_up_list, notify_list_node) { > + thermal_notify_tz_trip_up(tz, &td->trip); > + thermal_debug_tz_trip_up(tz, &td->trip); > + } > + > + list_sort(NULL, &way_down_list, thermal_trip_notify_cmp); > + list_for_each_entry(td, &way_down_list, notify_list_node) { > + thermal_notify_tz_trip_down(tz, &td->trip); > + thermal_debug_tz_trip_down(tz, &td->trip); > + } > > monitor_thermal_zone(tz); > } > Index: linux-pm/drivers/thermal/thermal_core.h > =================================================================== > --- linux-pm.orig/drivers/thermal/thermal_core.h > +++ linux-pm/drivers/thermal/thermal_core.h > @@ -17,6 +17,7 @@ > > struct thermal_trip_desc { > struct thermal_trip trip; > + struct list_head notify_list_node; > int threshold; > }; >
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c index dfaa6341694a..abb8ee5c9afe 100644 --- a/drivers/thermal/thermal_core.c +++ b/drivers/thermal/thermal_core.c @@ -473,8 +473,12 @@ void __thermal_zone_device_update(struct thermal_zone_device *tz, tz->notify_event = event; - for_each_trip(tz, trip) - handle_thermal_trip(tz, trip); + if (tz->last_temperature < tz->temperature) + for_each_trip(tz, trip) + handle_thermal_trip(tz, trip); + else + for_each_trip_reverse(tz, trip) + handle_thermal_trip(tz, trip); monitor_thermal_zone(tz); } diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h index e9c099ecdd0f..0072b3d4039e 100644 --- a/drivers/thermal/thermal_core.h +++ b/drivers/thermal/thermal_core.h @@ -123,6 +123,9 @@ void thermal_governor_update_tz(struct thermal_zone_device *tz, #define for_each_trip(__tz, __trip) \ for (__trip = __tz->trips; __trip - __tz->trips < __tz->num_trips; __trip++) +#define for_each_trip_reverse(__tz, __trip) \ + for (__trip = &__tz->trips[__tz->num_trips - 1]; __trip >= __tz->trips ; __trip--) + void __thermal_zone_set_trips(struct thermal_zone_device *tz); int thermal_zone_trip_id(const struct thermal_zone_device *tz, const struct thermal_trip *trip);
Let's assume the following setup: - trip 0 = 65°C - trip 1 = 70°C - trip 2 = 75°C The current temperature is 35°C. The interrupt is setup to fire at 65°C. If the thermal capacity is saturated it is possible the temperature jumps to 72°c when reading the temperature after the interrupt fired when 65°C was crossed. That means we should have two events notified to userspace. The first one for trip 0 and the second one for trip 1. When the function thermal_zone_update() is called from the threaded interrupt, it will read the temperature and then call for_each_trip() which in turns call handle_trip_point(). This function will check: if (tz->last_temperature < trip->temperature && tz->temperature >= trip->temperature) thermal_notify_tz_trip_up() So here, we will call this function with trip0 followed by trip1. That will result in an event for each trip point, reflecting the trip point being crossed the way up with a temperature raising. So far, so good. Usually the sensors have an interrupt when the temperature is crossed the way up but not the way down, so there an extra delay corresponding to the passive polling where the temperature could have dropped and crossed more than one trip point. This scenario is likely to happen more often when multiple trip points are specified. So considering the same setup after crossing the trip 2, we stop the workload responsible of the heat and the temperature drops suddenly to 62°C. In this case, the next polling will call thermal_zone_device_update(), then for_each_trip() and handle_trip_point(). This function will check: if (tz->last_temperature >= trip->temperature && tz->temperature < trip->temperature - trip->hysteresis) thermal_notify_tz_trip_down() The loop for_each_trip() will call trip0, 1 and 2. That will result in generating the events for trip0, 1 and 2, in the wrong order. That is not reflecting the thermal dynamic and puzzles the userspace monitoring the temperature. Fix this by inspecting the trend of the temperature. If it is raising, then we browse the trip point in the ascending order, if it is falling then we browse in the descending order. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> --- drivers/thermal/thermal_core.c | 8 ++++++-- drivers/thermal/thermal_core.h | 3 +++ 2 files changed, 9 insertions(+), 2 deletions(-)