Message ID | 20200407174926.23971-1-andrzej.p@collabora.com (mailing list archive) |
---|---|
Headers | show |
Series | Stop monitoring disabled devices | expand |
On 07/04/2020 19:49, Andrzej Pietrasiewicz wrote: > The current kernel behavior is to keep polling the thermal zone devices > regardless of their current mode. This is not desired, as all such "disabled" > devices are meant to be handled by userspace,> so polling them makes no sense. Thanks for proposing these changes. I've been (quickly) through the series and the description below. I have the feeling the series makes more complex while the current code which would deserve a cleanup. Why not first: - Add a 'mode' field in the thermal zone device - Kill all set/get_mode callbacks in the drivers which are duplicated code. - Add a function: enum thermal_device_mode thermal_zone_get_mode( *tz) { ... if (tz->ops->get_mode) return tz->ops->get_mode(); return tz->mode; } int thermal_zone_set_mode(..*tz, enum thermal_device_mode mode) { ... if (tz->ops->set_mode) return tz->ops->set_mode(tz, mode); tz->mode = mode; return 0; } static inline thermal_zone_enable(... *tz) { thermal_zone_set_mode(tz, THERMAL_DEVICE_ENABLED); } static inline thermal_zone_disable(... *tz) { thermal_zone_set_mode(tz, THERMAL_DEVICE_DISABLED); } And then when the code is consolidated, use the mode to enable/disable the polling and continue killing the duplicated code in of-thermal.c and anywhere else. > There was an attempt to solve this issue: > > https://lkml.org/lkml/2018/2/26/498 > > and it ultimately has not succeeded: > > https://lkml.org/lkml/2018/2/27/910 > > This is a new attempt addressing all the relevant drivers, and I have > identified them with: > > $ git grep "thermal_zone_device_ops" | grep "= {" | cut -f1 -d: | sort | uniq > > The idea is to modify thermal_zone_device_update() and monitor_thermal_zone() > in such a way that they stop polling a disabled device. To do decide what to > do they should call ->get_mode() operation of the specialized thermal zone > device in question (e.g. drivers/acpi/thermal.c's). But here comes problem: > sometimes a thermal zone device must be initially disabled and becomes enabled > only after its sensors appear on the system. If such thermal zone's > ->get_mode() /* in the context of thermal_zone_device_update() or > monitor_thermal_zone() */ is called _before_ the sensors are available, it will > be reported as "disabled" and consequently polling it will be ceased. This is > a change in behavior from userspace's perspective. > > To solve the above described problem I want to introduce the third mode of a > thermal_zone_device: initial. The idea is that when the device is in its > initial mode, then its polling will be handled as it is now. This is a good > thing: should the temperature be just about hitting the critical treshnold > early during the boot process, it might be too late if we wait for the > userspace to run to save the system from overheating. The initial mode should > be reported in sysfs as "enabled" to keep the userspace interface intact. > From the initial mode there will be two possible transitions: to enabled or > disabled mode, but there will be no transition back to initial. If the > transition is from initial to enabled, then keep polling. If the transition is > from initial to disabled, then stop polling. If the transition is from enabled > to disabled, then stop polling. The transition from disabled to enabled must > be handled in a special way: there must be a mandatory call to > monitor_thermal_zone(), otherwise the polling will not start. If this > transition is triggeted from sysfs, then it can be easily handled at the > thermal framework level. However, if drivers call their own ->set_mode() > operation then they must also call "monitor_thermal_zone()" afterwards. > The latter being a sensible thing anyway, so perhaps all/most of the drivers > in question do. The plan for implementation is this: > > - ensure ALL users use symbolic enum names (THERMAL_DEVICE_DISABLED, > THERMAL_DEVICE_ENABLED) for thermal device mode rather than the numeric > values of enum thermal_device_mode elements > - add THERMAL_DEVICE_INITIAL to the said enum making its value 0 (so that > kzalloc() results in the initial state) > - modify thermal zone device's mode_show() (thermal framework level) so that > it reports "enabled" for THERMAL_DEVICE_INITIAL > - modify thermal zone device's mode_store() (thermal framework level) so that > it calls monitor_thermal_zone() upon mode change > - modify ALL thermal drivers so that their code is prepared to return > THERMAL_DEVICE_INITIAL before they call thermal_zone_device_register(); when > the invocation of the latter completes then polling is expected to be started > - verify ALL drivers which call their own ->set_mode() to ensure they do call > monitor_thermal_zone() afterwards > - modify thermal_zone_device_update() and monitor_thermal_zone() so that they > cancel polling for disabled thermal zone devices (but not for those in > THERMAL_DEVICE_INITIAL mode) > > This RFC series does all the above steps in more or less that order. > > I kindly ask for comments/suggestions/improvements. > > Rebased onto v5.6. > > Andrzej Pietrasiewicz (8): > thermal: int3400_thermal: Statically initialize > .get_mode()/.set_mode() ops > thermal: Properly handle mode values in .set_mode() > thermal: Store thermal mode in a dedicated enum > thermal: core: Introduce THERMAL_DEVICE_INITIAL > thermal: core: Monitor thermal zone after mode change > thermal: Set initial state to THERMAL_DEVICE_INITIAL > thermal: of: Monitor thermal zone after enabling it > thermal: Stop polling DISABLED thermal devices > > drivers/acpi/thermal.c | 28 +++++----- > .../ethernet/mellanox/mlxsw/core_thermal.c | 11 +++- > drivers/platform/x86/acerhdf.c | 17 ++++-- > drivers/thermal/da9062-thermal.c | 2 +- > drivers/thermal/imx_thermal.c | 5 +- > .../intel/int340x_thermal/int3400_thermal.c | 24 ++++----- > .../thermal/intel/intel_quark_dts_thermal.c | 6 ++- > drivers/thermal/of-thermal.c | 9 +++- > drivers/thermal/thermal_core.c | 52 ++++++++++++++++++- > drivers/thermal/thermal_core.h | 2 + > drivers/thermal/thermal_sysfs.c | 12 +++-- > include/linux/thermal.h | 3 +- > 12 files changed, 123 insertions(+), 48 deletions(-) >
Hi Daniel, W dniu 09.04.2020 o 12:29, Daniel Lezcano pisze: > On 07/04/2020 19:49, Andrzej Pietrasiewicz wrote: >> The current kernel behavior is to keep polling the thermal zone devices >> regardless of their current mode. This is not desired, as all such "disabled" >> devices are meant to be handled by userspace,> so polling them makes no sense. > > Thanks for proposing these changes. > > I've been (quickly) through the series and the description below. I have > the feeling the series makes more complex while the current code which > would deserve a cleanup. > > Why not first: > > - Add a 'mode' field in the thermal zone device > - Kill all set/get_mode callbacks in the drivers which are duplicated code. > - Add a function: > > enum thermal_device_mode thermal_zone_get_mode( *tz) > { > ... > if (tz->ops->get_mode) > return tz->ops->get_mode(); > > return tz->mode; > } > > > int thermal_zone_set_mode(..*tz, enum thermal_device_mode mode) > { > ... > if (tz->ops->set_mode) > return tz->ops->set_mode(tz, mode); > > tz->mode = mode; > > return 0; > } > > static inline thermal_zone_enable(... *tz) > { > thermal_zone_set_mode(tz, THERMAL_DEVICE_ENABLED); > } > > static inline thermal_zone_disable(... *tz) { > thermal_zone_set_mode(tz, THERMAL_DEVICE_DISABLED); > } > > And then when the code is consolidated, use the mode to enable/disable > the polling and continue killing the duplicated code in of-thermal.c and > anywhere else. > > Thanks for feedback. Anyone else? Andrzej
This is the second iteration of this RFC. The series now focuses on cleaning up the code in the first place. After the cleanups in patches 1-3 struct thermal_zone_device is extended so that it contains a "mode" member (patch 4/9). The next patch (5/9) makes all thermal zone devices use the "mode" member. This patch makes drivers' ->get_mode() methods redundant, so the next one (6/9) removes the method altogether. Patches 7-8/9 ensure that after changing thermal zone device's mode an attempt will be made to monitor the device. And finally patch 9/9 prevents DISABLED devices from being monitored. It also adds THERMAL_DEVICE_INITIAL to accommodate the devices, which should be monitored but cannot be initially ENABLED. Andrzej Pietrasiewicz (9): thermal: int3400_thermal: Statically initialize .get_mode()/.set_mode() ops thermal: Eliminate an always-false condition thermal: Properly handle mode values in .set_mode() thermal: core: Let thermal zone device's mode be stored in its struct thermal: Store mode in thermal_zone_device thermal: Remove get_mode() method thermal: core: Monitor thermal zone after mode change thermal: of: Monitor thermal zone after enabling it thermal: core: Stop polling DISABLED thermal devices drivers/acpi/thermal.c | 44 +++++---------- .../ethernet/mellanox/mlxsw/core_thermal.c | 43 ++++----------- drivers/platform/x86/acerhdf.c | 28 +++++----- drivers/thermal/da9062-thermal.c | 12 +--- drivers/thermal/imx_thermal.c | 30 ++++------ .../intel/int340x_thermal/int3400_thermal.c | 39 +++---------- .../thermal/intel/intel_quark_dts_thermal.c | 27 ++++----- drivers/thermal/of-thermal.c | 28 ++++------ drivers/thermal/thermal_core.c | 40 ++++++++++++-- drivers/thermal/thermal_core.h | 2 + drivers/thermal/thermal_sysfs.c | 40 ++++---------- include/linux/thermal.h | 55 ++++++++++++++++++- 12 files changed, 180 insertions(+), 208 deletions(-)
Hi Andrzej, Thanks for your patches. On Tue, 14 Apr 2020 at 15:01, Andrzej Pietrasiewicz <andrzej.p@collabora.com> wrote: > > This is the second iteration of this RFC. > > The series now focuses on cleaning up the code in the first place. > [..] > 12 files changed, 180 insertions(+), 208 deletions(-) > Compared with the previous iteration, and just judging by this diffstat, I think it's a step in the right direction. Nice job :-) Ezequiel
On 4/9/20 1:10 PM, Andrzej Pietrasiewicz wrote: > Hi Daniel, > > W dniu 09.04.2020 o 12:29, Daniel Lezcano pisze: >> On 07/04/2020 19:49, Andrzej Pietrasiewicz wrote: >>> The current kernel behavior is to keep polling the thermal zone devices >>> regardless of their current mode. This is not desired, as all such "disabled" >>> devices are meant to be handled by userspace,> so polling them makes no sense. >> >> Thanks for proposing these changes. >> >> I've been (quickly) through the series and the description below. I have >> the feeling the series makes more complex while the current code which >> would deserve a cleanup. >> >> Why not first: >> >> - Add a 'mode' field in the thermal zone device >> - Kill all set/get_mode callbacks in the drivers which are duplicated code. >> - Add a function: >> >> enum thermal_device_mode thermal_zone_get_mode( *tz) >> { >> ... >> if (tz->ops->get_mode) >> return tz->ops->get_mode(); >> >> return tz->mode; >> } >> >> >> int thermal_zone_set_mode(..*tz, enum thermal_device_mode mode) >> { >> ... >> if (tz->ops->set_mode) >> return tz->ops->set_mode(tz, mode); >> >> tz->mode = mode; >> >> return 0; >> } >> >> static inline thermal_zone_enable(... *tz) >> { >> thermal_zone_set_mode(tz, THERMAL_DEVICE_ENABLED); >> } >> >> static inline thermal_zone_disable(... *tz) { >> thermal_zone_set_mode(tz, THERMAL_DEVICE_DISABLED); >> } >> >> And then when the code is consolidated, use the mode to enable/disable >> the polling and continue killing the duplicated code in of-thermal.c and >> anywhere else. >> >> > > Thanks for feedback. > > Anyone else? Yes. :) Please take a look at the following patchset (which I'm reviving currently): https://lkml.org/lkml/2018/10/17/926 It overlaps partially with your work so we need to coordinate our efforts. Best regards, -- Bartlomiej Zolnierkiewicz Samsung R&D Institute Poland Samsung Electronics
Hi Barlomiej, >> Thanks for feedback. >> >> Anyone else? > > Yes. :) > > Please take a look at the following patchset (which I'm reviving currently): > > https://lkml.org/lkml/2018/10/17/926 > > It overlaps partially with your work so we need to coordinate our efforts. > I've just sent a v3. After addressing your and Daniel's comments my series now looks pretty compact. Let's see if there's more feedback. Is your work on reviving the above mentioned 2018 series ready? Andrzej
Hi, 17. April 2020 18:23, "Andrzej Pietrasiewicz" <andrzej.p@collabora.com> schrieb: [...] > I've just sent a v3. After addressing your and Daniel's comments my series > now looks pretty compact. Let's see if there's more feedback. Is your work on > reviving the above mentioned 2018 series ready? I agree, v3 looks pretty good, I will test it within next 2 days for acerhdf. Thanks for your work!
On 4/17/20 6:23 PM, Andrzej Pietrasiewicz wrote: > Hi Barlomiej, > >>> Thanks for feedback. >>> >>> Anyone else? >> >> Yes. :) >> >> Please take a look at the following patchset (which I'm reviving currently): >> >> https://protect2.fireeye.com/url?k=5d37badf-00a92135-5d363190-0cc47a6cba04-376ae45aa028b19a&q=1&u=https%3A%2F%2Flkml.org%2Flkml%2F2018%2F10%2F17%2F926 >> >> It overlaps partially with your work so we need to coordinate our efforts. >> > > I've just sent a v3. After addressing your and Daniel's comments my series > now looks pretty compact. Let's see if there's more feedback. Is your work on > reviving the above mentioned 2018 series ready? Not yet, also I think now that it will be the best if I rebase my changes on top of your patchset (once it is ready/finished). Best regards, -- Bartlomiej Zolnierkiewicz Samsung R&D Institute Poland Samsung Electronics