Message ID | 20210917072732.611140-1-abailon@baylibre.com (mailing list archive) |
---|---|
Headers | show |
Series | Add a generic virtual thermal sensor | expand |
On 17/09/2021 09:27, Alexandre Bailon wrote: > This series add a virtual thermal sensor. > It could be used to get a temperature using some thermal sensors. > Currently, the supported operations are max, min and avg. > The virtual sensor could be easily extended to support others operations. > > Note: > Currently, thermal drivers must explicitly register their sensors to make them > available to the virtual sensor. > This doesn't seem a good solution to me and I think it would be preferable to > update the framework to register the list of each available sensors. Why must the drivers do that ? > Changes in v2: > - Fix some warnings / errors reported by kernel test robot > - rename some struct and functions with a more accurate name > - update the dt bindings: rename type attribute to aggregation-function > - factorize a little bit the aggregation functions > > Alexandre Bailon (2): > dt-bindings: Add bindings for the virtual thermal sensor > thermal: add a virtual sensor to aggregate temperatures > > .../thermal/virtual,thermal-sensor.yaml | 67 +++ > drivers/thermal/Kconfig | 8 + > drivers/thermal/Makefile | 1 + > drivers/thermal/virtual-sensor.h | 51 +++ > drivers/thermal/virtual_sensor.c | 400 ++++++++++++++++++ > include/dt-bindings/thermal/virtual-sensor.h | 15 + > 6 files changed, 542 insertions(+) > create mode 100644 Documentation/devicetree/bindings/thermal/virtual,thermal-sensor.yaml > create mode 100644 drivers/thermal/virtual-sensor.h > create mode 100644 drivers/thermal/virtual_sensor.c > create mode 100644 include/dt-bindings/thermal/virtual-sensor.h >
Hi Daniel, On 9/17/21 2:41 PM, Daniel Lezcano wrote: > On 17/09/2021 09:27, Alexandre Bailon wrote: >> This series add a virtual thermal sensor. >> It could be used to get a temperature using some thermal sensors. >> Currently, the supported operations are max, min and avg. >> The virtual sensor could be easily extended to support others operations. >> >> Note: >> Currently, thermal drivers must explicitly register their sensors to make them >> available to the virtual sensor. >> This doesn't seem a good solution to me and I think it would be preferable to >> update the framework to register the list of each available sensors. > Why must the drivers do that ? Because there are no central place where thermal sensor are registered. The only other way I found was to update thermal_of.c, to register the thermal sensors and make them available later to the virtual thermal sensor. To work, the virtual thermal need to get the sensor_data the ops from the thermal sensor. And as far I know, this is only registered in thermal_of.c, in the thermal zone data but I can't access it directly from the virtual thermal sensor. How would you do it ? Thanks, Alexandre > >> Changes in v2: >> - Fix some warnings / errors reported by kernel test robot >> - rename some struct and functions with a more accurate name >> - update the dt bindings: rename type attribute to aggregation-function >> - factorize a little bit the aggregation functions >> >> Alexandre Bailon (2): >> dt-bindings: Add bindings for the virtual thermal sensor >> thermal: add a virtual sensor to aggregate temperatures >> >> .../thermal/virtual,thermal-sensor.yaml | 67 +++ >> drivers/thermal/Kconfig | 8 + >> drivers/thermal/Makefile | 1 + >> drivers/thermal/virtual-sensor.h | 51 +++ >> drivers/thermal/virtual_sensor.c | 400 ++++++++++++++++++ >> include/dt-bindings/thermal/virtual-sensor.h | 15 + >> 6 files changed, 542 insertions(+) >> create mode 100644 Documentation/devicetree/bindings/thermal/virtual,thermal-sensor.yaml >> create mode 100644 drivers/thermal/virtual-sensor.h >> create mode 100644 drivers/thermal/virtual_sensor.c >> create mode 100644 include/dt-bindings/thermal/virtual-sensor.h >> >
On 17/09/2021 15:33, Alexandre Bailon wrote: > Hi Daniel, > > On 9/17/21 2:41 PM, Daniel Lezcano wrote: >> On 17/09/2021 09:27, Alexandre Bailon wrote: >>> This series add a virtual thermal sensor. >>> It could be used to get a temperature using some thermal sensors. >>> Currently, the supported operations are max, min and avg. >>> The virtual sensor could be easily extended to support others >>> operations. >>> >>> Note: >>> Currently, thermal drivers must explicitly register their sensors to >>> make them >>> available to the virtual sensor. >>> This doesn't seem a good solution to me and I think it would be >>> preferable to >>> update the framework to register the list of each available sensors. >> Why must the drivers do that ? > Because there are no central place where thermal sensor are registered. > The only other way I found was to update thermal_of.c, > to register the thermal sensors and make them available later to the > virtual thermal sensor. > > To work, the virtual thermal need to get the sensor_data the ops from > the thermal sensor. > And as far I know, this is only registered in thermal_of.c, in the > thermal zone data > but I can't access it directly from the virtual thermal sensor. > > How would you do it ? Via the phandles when registering the virtual sensor ? Or is the problem you are mentioning related to the sensor module loading after the virtual sensor ?
On 9/17/21 4:03 PM, Daniel Lezcano wrote: > On 17/09/2021 15:33, Alexandre Bailon wrote: >> Hi Daniel, >> >> On 9/17/21 2:41 PM, Daniel Lezcano wrote: >>> On 17/09/2021 09:27, Alexandre Bailon wrote: >>>> This series add a virtual thermal sensor. >>>> It could be used to get a temperature using some thermal sensors. >>>> Currently, the supported operations are max, min and avg. >>>> The virtual sensor could be easily extended to support others >>>> operations. >>>> >>>> Note: >>>> Currently, thermal drivers must explicitly register their sensors to >>>> make them >>>> available to the virtual sensor. >>>> This doesn't seem a good solution to me and I think it would be >>>> preferable to >>>> update the framework to register the list of each available sensors. >>> Why must the drivers do that ? >> Because there are no central place where thermal sensor are registered. >> The only other way I found was to update thermal_of.c, >> to register the thermal sensors and make them available later to the >> virtual thermal sensor. >> >> To work, the virtual thermal need to get the sensor_data the ops from >> the thermal sensor. >> And as far I know, this is only registered in thermal_of.c, in the >> thermal zone data >> but I can't access it directly from the virtual thermal sensor. >> >> How would you do it ? > Via the phandles when registering the virtual sensor ? As far I know, we can't get the ops or the sensor_data from the phandle of a thermal sensor. The closest solution I found so far would be to aggregate the thermal zones instead of thermal sensors. thermal_zone_device has the data needed and a thermal zone could be find easily using its name. But, using a thermal_zone_device, I don't see how to handle module unloading. Thanks, Alexandre > Or is the problem you are mentioning related to the sensor module > loading after the virtual sensor ? >
On 20/09/2021 15:12, Alexandre Bailon wrote: > > On 9/17/21 4:03 PM, Daniel Lezcano wrote: >> On 17/09/2021 15:33, Alexandre Bailon wrote: >>> Hi Daniel, >>> >>> On 9/17/21 2:41 PM, Daniel Lezcano wrote: >>>> On 17/09/2021 09:27, Alexandre Bailon wrote: >>>>> This series add a virtual thermal sensor. >>>>> It could be used to get a temperature using some thermal sensors. >>>>> Currently, the supported operations are max, min and avg. >>>>> The virtual sensor could be easily extended to support others >>>>> operations. >>>>> >>>>> Note: >>>>> Currently, thermal drivers must explicitly register their sensors to >>>>> make them >>>>> available to the virtual sensor. >>>>> This doesn't seem a good solution to me and I think it would be >>>>> preferable to >>>>> update the framework to register the list of each available sensors. >>>> Why must the drivers do that ? >>> Because there are no central place where thermal sensor are registered. >>> The only other way I found was to update thermal_of.c, >>> to register the thermal sensors and make them available later to the >>> virtual thermal sensor. >>> >>> To work, the virtual thermal need to get the sensor_data the ops from >>> the thermal sensor. >>> And as far I know, this is only registered in thermal_of.c, in the >>> thermal zone data >>> but I can't access it directly from the virtual thermal sensor. >>> >>> How would you do it ? >> Via the phandles when registering the virtual sensor ? > As far I know, we can't get the ops or the sensor_data from the phandle > of a thermal sensor. > The closest solution I found so far would be to aggregate the thermal > zones instead of thermal sensors. > thermal_zone_device has the data needed and a thermal zone could be find > easily using its name. Yeah, the concept of the thermal zone and the sensor are very close. There is the function in thermal_core.h: -> for_each_thermal_zone() You should be able for each 'slave' sensor, do a lookup to find the corresponding thermal_zone_device_ops. > But, using a thermal_zone_device, I don't see how to handle module > unloading. I think try_module_get() / module_put() are adequate for this situation as it is done on an external module and we can not rely on the exported symbols.
On 9/22/21 10:10 AM, Daniel Lezcano wrote: > On 20/09/2021 15:12, Alexandre Bailon wrote: >> On 9/17/21 4:03 PM, Daniel Lezcano wrote: >>> On 17/09/2021 15:33, Alexandre Bailon wrote: >>>> Hi Daniel, >>>> >>>> On 9/17/21 2:41 PM, Daniel Lezcano wrote: >>>>> On 17/09/2021 09:27, Alexandre Bailon wrote: >>>>>> This series add a virtual thermal sensor. >>>>>> It could be used to get a temperature using some thermal sensors. >>>>>> Currently, the supported operations are max, min and avg. >>>>>> The virtual sensor could be easily extended to support others >>>>>> operations. >>>>>> >>>>>> Note: >>>>>> Currently, thermal drivers must explicitly register their sensors to >>>>>> make them >>>>>> available to the virtual sensor. >>>>>> This doesn't seem a good solution to me and I think it would be >>>>>> preferable to >>>>>> update the framework to register the list of each available sensors. >>>>> Why must the drivers do that ? >>>> Because there are no central place where thermal sensor are registered. >>>> The only other way I found was to update thermal_of.c, >>>> to register the thermal sensors and make them available later to the >>>> virtual thermal sensor. >>>> >>>> To work, the virtual thermal need to get the sensor_data the ops from >>>> the thermal sensor. >>>> And as far I know, this is only registered in thermal_of.c, in the >>>> thermal zone data >>>> but I can't access it directly from the virtual thermal sensor. >>>> >>>> How would you do it ? >>> Via the phandles when registering the virtual sensor ? >> As far I know, we can't get the ops or the sensor_data from the phandle >> of a thermal sensor. >> The closest solution I found so far would be to aggregate the thermal >> zones instead of thermal sensors. >> thermal_zone_device has the data needed and a thermal zone could be find >> easily using its name. > Yeah, the concept of the thermal zone and the sensor are very close. > > There is the function in thermal_core.h: > > -> for_each_thermal_zone() > > You should be able for each 'slave' sensor, do a lookup to find the > corresponding thermal_zone_device_ops. > >> But, using a thermal_zone_device, I don't see how to handle module >> unloading. > I think try_module_get() / module_put() are adequate for this situation > as it is done on an external module and we can not rely on the exported > symbols. I don't see how it would be possible to use these functions. The thermal zone doesn't have the data required to use it. Maybe a more easier way is to use the thermal_zone_device mutex. If I get a lock before to use the thermal_zone_device ops, I have the guaranty that module won't be unloaded. When a "thermal of sensor" is unloaded, it calls thermal_zone_of_sensor_unregister which takes a lock before update ops. Thanks, Alexandre >
On 04/10/2021 12:24, Alexandre Bailon wrote: > > On 9/22/21 10:10 AM, Daniel Lezcano wrote: >> On 20/09/2021 15:12, Alexandre Bailon wrote: >>> On 9/17/21 4:03 PM, Daniel Lezcano wrote: >>>> On 17/09/2021 15:33, Alexandre Bailon wrote: >>>>> Hi Daniel, >>>>> >>>>> On 9/17/21 2:41 PM, Daniel Lezcano wrote: >>>>>> On 17/09/2021 09:27, Alexandre Bailon wrote: >>>>>>> This series add a virtual thermal sensor. >>>>>>> It could be used to get a temperature using some thermal sensors. >>>>>>> Currently, the supported operations are max, min and avg. >>>>>>> The virtual sensor could be easily extended to support others >>>>>>> operations. >>>>>>> >>>>>>> Note: >>>>>>> Currently, thermal drivers must explicitly register their sensors to >>>>>>> make them >>>>>>> available to the virtual sensor. >>>>>>> This doesn't seem a good solution to me and I think it would be >>>>>>> preferable to >>>>>>> update the framework to register the list of each available sensors. >>>>>> Why must the drivers do that ? >>>>> Because there are no central place where thermal sensor are >>>>> registered. >>>>> The only other way I found was to update thermal_of.c, >>>>> to register the thermal sensors and make them available later to the >>>>> virtual thermal sensor. >>>>> >>>>> To work, the virtual thermal need to get the sensor_data the ops from >>>>> the thermal sensor. >>>>> And as far I know, this is only registered in thermal_of.c, in the >>>>> thermal zone data >>>>> but I can't access it directly from the virtual thermal sensor. >>>>> >>>>> How would you do it ? >>>> Via the phandles when registering the virtual sensor ? >>> As far I know, we can't get the ops or the sensor_data from the phandle >>> of a thermal sensor. >>> The closest solution I found so far would be to aggregate the thermal >>> zones instead of thermal sensors. >>> thermal_zone_device has the data needed and a thermal zone could be find >>> easily using its name. >> Yeah, the concept of the thermal zone and the sensor are very close. >> >> There is the function in thermal_core.h: >> >> -> for_each_thermal_zone() >> >> You should be able for each 'slave' sensor, do a lookup to find the >> corresponding thermal_zone_device_ops. >> >>> But, using a thermal_zone_device, I don't see how to handle module >>> unloading. >> I think try_module_get() / module_put() are adequate for this situation >> as it is done on an external module and we can not rely on the exported >> symbols. > I don't see how it would be possible to use these functions. > The thermal zone doesn't have the data required to use it. Actually I was able to crash the kernel by doing: console 1: while $(true); do insmod <module> && rmmod <module>; done console 2: while $(true); cat /sys/class/thermal/thermal_zone0/temp; done So there is something wrong already in the thermal framework. IMO, the first thing would be to fix this critical issue by getting the sensor module refcount when the thermal zone is enabled and dropping it when it is disabled. With that fixed, perhaps it will possible to get the device associated with the sensor and then try_module_get(dev->driver->owner) > Maybe a more easier way is to use the thermal_zone_device mutex. > If I get a lock before to use the thermal_zone_device ops, I have the > guaranty that module won't be unloaded. > > When a "thermal of sensor" is unloaded, it calls > thermal_zone_of_sensor_unregister which takes a lock before > update ops. I'm not sure to understand. The goal is to have the refcount on the modules to be incremented when the virtual sensor is using them. Until the virtual sensor is registered, it will prevent the other modules to be unloaded.
On Mon, Oct 4, 2021 at 3:42 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: > > On 04/10/2021 12:24, Alexandre Bailon wrote: > > > > On 9/22/21 10:10 AM, Daniel Lezcano wrote: > >> On 20/09/2021 15:12, Alexandre Bailon wrote: > >>> On 9/17/21 4:03 PM, Daniel Lezcano wrote: > >>>> On 17/09/2021 15:33, Alexandre Bailon wrote: > >>>>> Hi Daniel, > >>>>> > >>>>> On 9/17/21 2:41 PM, Daniel Lezcano wrote: > >>>>>> On 17/09/2021 09:27, Alexandre Bailon wrote: > >>>>>>> This series add a virtual thermal sensor. > >>>>>>> It could be used to get a temperature using some thermal sensors. > >>>>>>> Currently, the supported operations are max, min and avg. > >>>>>>> The virtual sensor could be easily extended to support others > >>>>>>> operations. > >>>>>>> > >>>>>>> Note: > >>>>>>> Currently, thermal drivers must explicitly register their sensors to > >>>>>>> make them > >>>>>>> available to the virtual sensor. > >>>>>>> This doesn't seem a good solution to me and I think it would be > >>>>>>> preferable to > >>>>>>> update the framework to register the list of each available sensors. > >>>>>> Why must the drivers do that ? > >>>>> Because there are no central place where thermal sensor are > >>>>> registered. > >>>>> The only other way I found was to update thermal_of.c, > >>>>> to register the thermal sensors and make them available later to the > >>>>> virtual thermal sensor. > >>>>> > >>>>> To work, the virtual thermal need to get the sensor_data the ops from > >>>>> the thermal sensor. > >>>>> And as far I know, this is only registered in thermal_of.c, in the > >>>>> thermal zone data > >>>>> but I can't access it directly from the virtual thermal sensor. > >>>>> > >>>>> How would you do it ? > >>>> Via the phandles when registering the virtual sensor ? > >>> As far I know, we can't get the ops or the sensor_data from the phandle > >>> of a thermal sensor. > >>> The closest solution I found so far would be to aggregate the thermal > >>> zones instead of thermal sensors. > >>> thermal_zone_device has the data needed and a thermal zone could be find > >>> easily using its name. > >> Yeah, the concept of the thermal zone and the sensor are very close. > >> > >> There is the function in thermal_core.h: > >> > >> -> for_each_thermal_zone() > >> > >> You should be able for each 'slave' sensor, do a lookup to find the > >> corresponding thermal_zone_device_ops. > >> > >>> But, using a thermal_zone_device, I don't see how to handle module > >>> unloading. > >> I think try_module_get() / module_put() are adequate for this situation > >> as it is done on an external module and we can not rely on the exported > >> symbols. > > I don't see how it would be possible to use these functions. > > The thermal zone doesn't have the data required to use it. > > Actually I was able to crash the kernel by doing: > > console 1: > > while $(true); do insmod <module> && rmmod <module>; done > > console 2: > > while $(true); cat /sys/class/thermal/thermal_zone0/temp; done > > So there is something wrong already in the thermal framework. Hmmm. > IMO, the first thing would be to fix this critical issue by getting the > sensor module refcount when the thermal zone is enabled and dropping it > when it is disabled. > > With that fixed, perhaps it will possible to get the device associated > with the sensor and then try_module_get(dev->driver->owner) > > > Maybe a more easier way is to use the thermal_zone_device mutex. > > If I get a lock before to use the thermal_zone_device ops, I have the > > guaranty that module won't be unloaded. That would be my approach too. > > When a "thermal of sensor" is unloaded, it calls > > thermal_zone_of_sensor_unregister which takes a lock before > > update ops. > > I'm not sure to understand. The goal is to have the refcount on the > modules to be incremented when the virtual sensor is using them. IMO the goal is to prevent the code from crashing when modules get unloaded. I'm not really sure if refcounts alone are sufficient for that. > Until the virtual sensor is registered, it will prevent the other > modules to be unloaded. Unless they are forced to unload that is, AFAICS. IMO it would be better to make the code survive unloading of a module.
On 05/10/2021 18:45, Rafael J. Wysocki wrote: > On Mon, Oct 4, 2021 at 3:42 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: >> >> On 04/10/2021 12:24, Alexandre Bailon wrote: >>> >>> On 9/22/21 10:10 AM, Daniel Lezcano wrote: >>>> On 20/09/2021 15:12, Alexandre Bailon wrote: >>>>> On 9/17/21 4:03 PM, Daniel Lezcano wrote: >>>>>> On 17/09/2021 15:33, Alexandre Bailon wrote: >>>>>>> Hi Daniel, >>>>>>> >>>>>>> On 9/17/21 2:41 PM, Daniel Lezcano wrote: >>>>>>>> On 17/09/2021 09:27, Alexandre Bailon wrote: >>>>>>>>> This series add a virtual thermal sensor. >>>>>>>>> It could be used to get a temperature using some thermal sensors. >>>>>>>>> Currently, the supported operations are max, min and avg. >>>>>>>>> The virtual sensor could be easily extended to support others >>>>>>>>> operations. >>>>>>>>> >>>>>>>>> Note: >>>>>>>>> Currently, thermal drivers must explicitly register their sensors to >>>>>>>>> make them >>>>>>>>> available to the virtual sensor. >>>>>>>>> This doesn't seem a good solution to me and I think it would be >>>>>>>>> preferable to >>>>>>>>> update the framework to register the list of each available sensors. >>>>>>>> Why must the drivers do that ? >>>>>>> Because there are no central place where thermal sensor are >>>>>>> registered. >>>>>>> The only other way I found was to update thermal_of.c, >>>>>>> to register the thermal sensors and make them available later to the >>>>>>> virtual thermal sensor. >>>>>>> >>>>>>> To work, the virtual thermal need to get the sensor_data the ops from >>>>>>> the thermal sensor. >>>>>>> And as far I know, this is only registered in thermal_of.c, in the >>>>>>> thermal zone data >>>>>>> but I can't access it directly from the virtual thermal sensor. >>>>>>> >>>>>>> How would you do it ? >>>>>> Via the phandles when registering the virtual sensor ? >>>>> As far I know, we can't get the ops or the sensor_data from the phandle >>>>> of a thermal sensor. >>>>> The closest solution I found so far would be to aggregate the thermal >>>>> zones instead of thermal sensors. >>>>> thermal_zone_device has the data needed and a thermal zone could be find >>>>> easily using its name. >>>> Yeah, the concept of the thermal zone and the sensor are very close. >>>> >>>> There is the function in thermal_core.h: >>>> >>>> -> for_each_thermal_zone() >>>> >>>> You should be able for each 'slave' sensor, do a lookup to find the >>>> corresponding thermal_zone_device_ops. >>>> >>>>> But, using a thermal_zone_device, I don't see how to handle module >>>>> unloading. >>>> I think try_module_get() / module_put() are adequate for this situation >>>> as it is done on an external module and we can not rely on the exported >>>> symbols. >>> I don't see how it would be possible to use these functions. >>> The thermal zone doesn't have the data required to use it. >> >> Actually I was able to crash the kernel by doing: >> >> console 1: >> >> while $(true); do insmod <module> && rmmod <module>; done >> >> console 2: >> >> while $(true); cat /sys/class/thermal/thermal_zone0/temp; done >> >> So there is something wrong already in the thermal framework. > > Hmmm. > >> IMO, the first thing would be to fix this critical issue by getting the >> sensor module refcount when the thermal zone is enabled and dropping it >> when it is disabled. >> >> With that fixed, perhaps it will possible to get the device associated >> with the sensor and then try_module_get(dev->driver->owner) >> >>> Maybe a more easier way is to use the thermal_zone_device mutex. >>> If I get a lock before to use the thermal_zone_device ops, I have the >>> guaranty that module won't be unloaded. > > That would be my approach too. The mutex is private to the thermal core. The virtual sensor should not touch it :/ Perhaps, it can work with a private spin_lock with a try_spinlock() ? >>> When a "thermal of sensor" is unloaded, it calls >>> thermal_zone_of_sensor_unregister which takes a lock before >>> update ops. >> >> I'm not sure to understand. The goal is to have the refcount on the >> modules to be incremented when the virtual sensor is using them. > > IMO the goal is to prevent the code from crashing when modules get > unloaded. I'm not really sure if refcounts alone are sufficient for > that. The problem is in the loop: +static int virtual_thermal_sensor_get_temp(void *data, int *temperature) +{ + struct virtual_thermal_sensor *sensor = data; + int max_temp = INT_MIN; + int temp; + int i; + + for (i = 0; i < sensor->count; i++) { + struct thermal_sensor_data *hw_sensor; + + hw_sensor = &sensor->sensors[i]; + if (!hw_sensor->ops) + return -ENODEV; + + hw_sensor->ops->get_temp(hw_sensor->sensor_data, &temp); + max_temp = sensor->aggr_temp(max_temp, temp); + } + + *temperature = max_temp; + + return 0; +} If one of the sensor is unloaded when get_temp is called, hw_sensor->ops->get_temp will crash. So the proposal is virtual_sensor_add_sensor() does try_get_module() and virtual_sensor_remove_sensor() does put_module(). The ref on the 'slave' modules will be release only if the virtual sensor is unregistered. So until, the virtual sensor is unregistered, the 'slaves' modules can not be unloaded. That is what we find with eg. the wifi modules. >> Until the virtual sensor is registered, it will prevent the other >> modules to be unloaded. > > Unless they are forced to unload that is, AFAICS. > > IMO it would be better to make the code survive unloading of a module.
On Wed, Oct 6, 2021 at 6:06 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: > > On 05/10/2021 18:45, Rafael J. Wysocki wrote: > > On Mon, Oct 4, 2021 at 3:42 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: > >> > >> On 04/10/2021 12:24, Alexandre Bailon wrote: > >>> > >>> On 9/22/21 10:10 AM, Daniel Lezcano wrote: > >>>> On 20/09/2021 15:12, Alexandre Bailon wrote: > >>>>> On 9/17/21 4:03 PM, Daniel Lezcano wrote: > >>>>>> On 17/09/2021 15:33, Alexandre Bailon wrote: > >>>>>>> Hi Daniel, > >>>>>>> > >>>>>>> On 9/17/21 2:41 PM, Daniel Lezcano wrote: > >>>>>>>> On 17/09/2021 09:27, Alexandre Bailon wrote: > >>>>>>>>> This series add a virtual thermal sensor. > >>>>>>>>> It could be used to get a temperature using some thermal sensors. > >>>>>>>>> Currently, the supported operations are max, min and avg. > >>>>>>>>> The virtual sensor could be easily extended to support others > >>>>>>>>> operations. > >>>>>>>>> > >>>>>>>>> Note: > >>>>>>>>> Currently, thermal drivers must explicitly register their sensors to > >>>>>>>>> make them > >>>>>>>>> available to the virtual sensor. > >>>>>>>>> This doesn't seem a good solution to me and I think it would be > >>>>>>>>> preferable to > >>>>>>>>> update the framework to register the list of each available sensors. > >>>>>>>> Why must the drivers do that ? > >>>>>>> Because there are no central place where thermal sensor are > >>>>>>> registered. > >>>>>>> The only other way I found was to update thermal_of.c, > >>>>>>> to register the thermal sensors and make them available later to the > >>>>>>> virtual thermal sensor. > >>>>>>> > >>>>>>> To work, the virtual thermal need to get the sensor_data the ops from > >>>>>>> the thermal sensor. > >>>>>>> And as far I know, this is only registered in thermal_of.c, in the > >>>>>>> thermal zone data > >>>>>>> but I can't access it directly from the virtual thermal sensor. > >>>>>>> > >>>>>>> How would you do it ? > >>>>>> Via the phandles when registering the virtual sensor ? > >>>>> As far I know, we can't get the ops or the sensor_data from the phandle > >>>>> of a thermal sensor. > >>>>> The closest solution I found so far would be to aggregate the thermal > >>>>> zones instead of thermal sensors. > >>>>> thermal_zone_device has the data needed and a thermal zone could be find > >>>>> easily using its name. > >>>> Yeah, the concept of the thermal zone and the sensor are very close. > >>>> > >>>> There is the function in thermal_core.h: > >>>> > >>>> -> for_each_thermal_zone() > >>>> > >>>> You should be able for each 'slave' sensor, do a lookup to find the > >>>> corresponding thermal_zone_device_ops. > >>>> > >>>>> But, using a thermal_zone_device, I don't see how to handle module > >>>>> unloading. > >>>> I think try_module_get() / module_put() are adequate for this situation > >>>> as it is done on an external module and we can not rely on the exported > >>>> symbols. > >>> I don't see how it would be possible to use these functions. > >>> The thermal zone doesn't have the data required to use it. > >> > >> Actually I was able to crash the kernel by doing: > >> > >> console 1: > >> > >> while $(true); do insmod <module> && rmmod <module>; done > >> > >> console 2: > >> > >> while $(true); cat /sys/class/thermal/thermal_zone0/temp; done > >> > >> So there is something wrong already in the thermal framework. > > > > Hmmm. > > > >> IMO, the first thing would be to fix this critical issue by getting the > >> sensor module refcount when the thermal zone is enabled and dropping it > >> when it is disabled. > >> > >> With that fixed, perhaps it will possible to get the device associated > >> with the sensor and then try_module_get(dev->driver->owner) > >> > >>> Maybe a more easier way is to use the thermal_zone_device mutex. > >>> If I get a lock before to use the thermal_zone_device ops, I have the > >>> guaranty that module won't be unloaded. > > > > That would be my approach too. > > The mutex is private to the thermal core. The virtual sensor should not > touch it :/ > > Perhaps, it can work with a private spin_lock with a try_spinlock() ? IIUC this is a case when module A refers to some memory in module B and if the latter goes away, an access to that memory from the former is a use-after-free, so it is not sufficient to use a local spinlock. This can be avoided by having a lock and a flag such that the flag is set under the lock by module B when making the memory in question available and cleared under the lock when freeing that memory. Then, module A needs to check the flag under the lock on every access to that memory. Also, the lock and the flag must be accessible all the time to both modules (ie. must not go away along with any of them if they don't depend on each other). > >>> When a "thermal of sensor" is unloaded, it calls > >>> thermal_zone_of_sensor_unregister which takes a lock before > >>> update ops. > >> > >> I'm not sure to understand. The goal is to have the refcount on the > >> modules to be incremented when the virtual sensor is using them. > > > > IMO the goal is to prevent the code from crashing when modules get > > unloaded. I'm not really sure if refcounts alone are sufficient for > > that. > > The problem is in the loop: > > +static int virtual_thermal_sensor_get_temp(void *data, int *temperature) > +{ > + struct virtual_thermal_sensor *sensor = data; > + int max_temp = INT_MIN; > + int temp; > + int i; > + > + for (i = 0; i < sensor->count; i++) { > + struct thermal_sensor_data *hw_sensor; > + > + hw_sensor = &sensor->sensors[i]; > + if (!hw_sensor->ops) > + return -ENODEV; > + > + hw_sensor->ops->get_temp(hw_sensor->sensor_data, &temp); > + max_temp = sensor->aggr_temp(max_temp, temp); > + } > + > + *temperature = max_temp; > + > + return 0; > +} > > If one of the sensor is unloaded when get_temp is called, > hw_sensor->ops->get_temp will crash. Right. Dereferencing hw_sensor itself is not safe in this loop if the module holding the memory pointed to by it may go away. However, presumably, the hw_sensor object needs to be registered with the core in order to be used here and unregistered when it goes away, so it looks like this loop could use a wrapper like thermal_get_sensor_temp(hw_sensor, &temp) which would return a negative error code if the sensor in question went away. Of course, that would require the core to check the list of available sensors every time, but that may be implemented efficiently with the help of an xarray, for example. > So the proposal is virtual_sensor_add_sensor() does try_get_module() > and virtual_sensor_remove_sensor() does put_module(). > > The ref on the 'slave' modules will be release only if the virtual > sensor is unregistered. > > So until, the virtual sensor is unregistered, the 'slaves' modules can > not be unloaded. > > That is what we find with eg. the wifi modules. Yes, but that's a bit cumbersome from the sysadmin perspective IMO, especially when one wants to unload one of the modules and doesn't know exactly what other modules hold references to it. IIUC ref_module() might be nicer, but it is still more convenient if the modules can just go away independently.
On 06/10/2021 20:00, Rafael J. Wysocki wrote: > On Wed, Oct 6, 2021 at 6:06 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: >> >> On 05/10/2021 18:45, Rafael J. Wysocki wrote: >>> On Mon, Oct 4, 2021 at 3:42 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: >>>> >>>> On 04/10/2021 12:24, Alexandre Bailon wrote: >>>>> >>>>> On 9/22/21 10:10 AM, Daniel Lezcano wrote: >>>>>> On 20/09/2021 15:12, Alexandre Bailon wrote: >>>>>>> On 9/17/21 4:03 PM, Daniel Lezcano wrote: >>>>>>>> On 17/09/2021 15:33, Alexandre Bailon wrote: >>>>>>>>> Hi Daniel, >>>>>>>>> >>>>>>>>> On 9/17/21 2:41 PM, Daniel Lezcano wrote: >>>>>>>>>> On 17/09/2021 09:27, Alexandre Bailon wrote: >>>>>>>>>>> This series add a virtual thermal sensor. >>>>>>>>>>> It could be used to get a temperature using some thermal sensors. >>>>>>>>>>> Currently, the supported operations are max, min and avg. >>>>>>>>>>> The virtual sensor could be easily extended to support others >>>>>>>>>>> operations. >>>>>>>>>>> >>>>>>>>>>> Note: >>>>>>>>>>> Currently, thermal drivers must explicitly register their sensors to >>>>>>>>>>> make them >>>>>>>>>>> available to the virtual sensor. >>>>>>>>>>> This doesn't seem a good solution to me and I think it would be >>>>>>>>>>> preferable to >>>>>>>>>>> update the framework to register the list of each available sensors. >>>>>>>>>> Why must the drivers do that ? >>>>>>>>> Because there are no central place where thermal sensor are >>>>>>>>> registered. >>>>>>>>> The only other way I found was to update thermal_of.c, >>>>>>>>> to register the thermal sensors and make them available later to the >>>>>>>>> virtual thermal sensor. >>>>>>>>> >>>>>>>>> To work, the virtual thermal need to get the sensor_data the ops from >>>>>>>>> the thermal sensor. >>>>>>>>> And as far I know, this is only registered in thermal_of.c, in the >>>>>>>>> thermal zone data >>>>>>>>> but I can't access it directly from the virtual thermal sensor. >>>>>>>>> >>>>>>>>> How would you do it ? >>>>>>>> Via the phandles when registering the virtual sensor ? >>>>>>> As far I know, we can't get the ops or the sensor_data from the phandle >>>>>>> of a thermal sensor. >>>>>>> The closest solution I found so far would be to aggregate the thermal >>>>>>> zones instead of thermal sensors. >>>>>>> thermal_zone_device has the data needed and a thermal zone could be find >>>>>>> easily using its name. >>>>>> Yeah, the concept of the thermal zone and the sensor are very close. >>>>>> >>>>>> There is the function in thermal_core.h: >>>>>> >>>>>> -> for_each_thermal_zone() >>>>>> >>>>>> You should be able for each 'slave' sensor, do a lookup to find the >>>>>> corresponding thermal_zone_device_ops. >>>>>> >>>>>>> But, using a thermal_zone_device, I don't see how to handle module >>>>>>> unloading. >>>>>> I think try_module_get() / module_put() are adequate for this situation >>>>>> as it is done on an external module and we can not rely on the exported >>>>>> symbols. >>>>> I don't see how it would be possible to use these functions. >>>>> The thermal zone doesn't have the data required to use it. >>>> >>>> Actually I was able to crash the kernel by doing: >>>> >>>> console 1: >>>> >>>> while $(true); do insmod <module> && rmmod <module>; done >>>> >>>> console 2: >>>> >>>> while $(true); cat /sys/class/thermal/thermal_zone0/temp; done >>>> >>>> So there is something wrong already in the thermal framework. >>> >>> Hmmm. >>> >>>> IMO, the first thing would be to fix this critical issue by getting the >>>> sensor module refcount when the thermal zone is enabled and dropping it >>>> when it is disabled. >>>> >>>> With that fixed, perhaps it will possible to get the device associated >>>> with the sensor and then try_module_get(dev->driver->owner) >>>> >>>>> Maybe a more easier way is to use the thermal_zone_device mutex. >>>>> If I get a lock before to use the thermal_zone_device ops, I have the >>>>> guaranty that module won't be unloaded. >>> >>> That would be my approach too. >> >> The mutex is private to the thermal core. The virtual sensor should not >> touch it :/ >> >> Perhaps, it can work with a private spin_lock with a try_spinlock() ? > > IIUC this is a case when module A refers to some memory in module B > and if the latter goes away, an access to that memory from the former > is a use-after-free, so it is not sufficient to use a local spinlock. > This can be avoided by having a lock and a flag such that the flag is > set under the lock by module B when making the memory in question > available and cleared under the lock when freeing that memory. Then, > module A needs to check the flag under the lock on every access to > that memory. Also, the lock and the flag must be accessible all the > time to both modules (ie. must not go away along with any of them if > they don't depend on each other). Just to clarify, the idea behind the virtual sensor is the same as the network bridge: the network interfaces are not aware they are attached to a bridge. The virtual sensor should attach and detach the physical sensors. The sensors should not un|register themselves from|to the virtual sensor, nor having specific code related to the virtual sensor. De facto, all the existing sensors will be compatible with the virtual sensor. Do we have a solution other than grabbing a module reference and self-contained to the virtual sensor ? >>>>> When a "thermal of sensor" is unloaded, it calls >>>>> thermal_zone_of_sensor_unregister which takes a lock before >>>>> update ops. >>>> >>>> I'm not sure to understand. The goal is to have the refcount on the >>>> modules to be incremented when the virtual sensor is using them. >>> >>> IMO the goal is to prevent the code from crashing when modules get >>> unloaded. I'm not really sure if refcounts alone are sufficient for >>> that. >> >> The problem is in the loop: >> >> +static int virtual_thermal_sensor_get_temp(void *data, int *temperature) >> +{ >> + struct virtual_thermal_sensor *sensor = data; >> + int max_temp = INT_MIN; >> + int temp; >> + int i; >> + >> + for (i = 0; i < sensor->count; i++) { >> + struct thermal_sensor_data *hw_sensor; >> + >> + hw_sensor = &sensor->sensors[i]; >> + if (!hw_sensor->ops) >> + return -ENODEV; >> + >> + hw_sensor->ops->get_temp(hw_sensor->sensor_data, &temp); >> + max_temp = sensor->aggr_temp(max_temp, temp); >> + } >> + >> + *temperature = max_temp; >> + >> + return 0; >> +} >> >> If one of the sensor is unloaded when get_temp is called, >> hw_sensor->ops->get_temp will crash. > > Right. > > Dereferencing hw_sensor itself is not safe in this loop if the module > holding the memory pointed to by it may go away. > > However, presumably, the hw_sensor object needs to be registered with > the core in order to be used here and unregistered when it goes away, > so it looks like this loop could use a wrapper like > thermal_get_sensor_temp(hw_sensor, &temp) which would return a > negative error code if the sensor in question went away. > > Of course, that would require the core to check the list of available > sensors every time, but that may be implemented efficiently with the > help of an xarray, for example. > >> So the proposal is virtual_sensor_add_sensor() does try_get_module() >> and virtual_sensor_remove_sensor() does put_module(). >> >> The ref on the 'slave' modules will be release only if the virtual >> sensor is unregistered. >> >> So until, the virtual sensor is unregistered, the 'slaves' modules can >> not be unloaded. >> >> That is what we find with eg. the wifi modules. > > Yes, but that's a bit cumbersome from the sysadmin perspective IMO, > especially when one wants to unload one of the modules and doesn't > know exactly what other modules hold references to it. Well, sysadmin are not supposed to unload a thermal sensor and I would say if he unloads the module, which is not an usual operation, up to him to 'lsmod' and check what module is holding the reference. > IIUC ref_module() might be nicer, but it is still more convenient if > the modules can just go away independently. "for desesperate users" :)
On Wed, Oct 6, 2021 at 9:51 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: > > On 06/10/2021 20:00, Rafael J. Wysocki wrote: > > On Wed, Oct 6, 2021 at 6:06 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: > >> > >> On 05/10/2021 18:45, Rafael J. Wysocki wrote: > >>> On Mon, Oct 4, 2021 at 3:42 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: > >>>> > >>>> On 04/10/2021 12:24, Alexandre Bailon wrote: > >>>>> > >>>>> On 9/22/21 10:10 AM, Daniel Lezcano wrote: > >>>>>> On 20/09/2021 15:12, Alexandre Bailon wrote: > >>>>>>> On 9/17/21 4:03 PM, Daniel Lezcano wrote: > >>>>>>>> On 17/09/2021 15:33, Alexandre Bailon wrote: > >>>>>>>>> Hi Daniel, > >>>>>>>>> > >>>>>>>>> On 9/17/21 2:41 PM, Daniel Lezcano wrote: > >>>>>>>>>> On 17/09/2021 09:27, Alexandre Bailon wrote: > >>>>>>>>>>> This series add a virtual thermal sensor. > >>>>>>>>>>> It could be used to get a temperature using some thermal sensors. > >>>>>>>>>>> Currently, the supported operations are max, min and avg. > >>>>>>>>>>> The virtual sensor could be easily extended to support others > >>>>>>>>>>> operations. > >>>>>>>>>>> > >>>>>>>>>>> Note: > >>>>>>>>>>> Currently, thermal drivers must explicitly register their sensors to > >>>>>>>>>>> make them > >>>>>>>>>>> available to the virtual sensor. > >>>>>>>>>>> This doesn't seem a good solution to me and I think it would be > >>>>>>>>>>> preferable to > >>>>>>>>>>> update the framework to register the list of each available sensors. > >>>>>>>>>> Why must the drivers do that ? > >>>>>>>>> Because there are no central place where thermal sensor are > >>>>>>>>> registered. > >>>>>>>>> The only other way I found was to update thermal_of.c, > >>>>>>>>> to register the thermal sensors and make them available later to the > >>>>>>>>> virtual thermal sensor. > >>>>>>>>> > >>>>>>>>> To work, the virtual thermal need to get the sensor_data the ops from > >>>>>>>>> the thermal sensor. > >>>>>>>>> And as far I know, this is only registered in thermal_of.c, in the > >>>>>>>>> thermal zone data > >>>>>>>>> but I can't access it directly from the virtual thermal sensor. > >>>>>>>>> > >>>>>>>>> How would you do it ? > >>>>>>>> Via the phandles when registering the virtual sensor ? > >>>>>>> As far I know, we can't get the ops or the sensor_data from the phandle > >>>>>>> of a thermal sensor. > >>>>>>> The closest solution I found so far would be to aggregate the thermal > >>>>>>> zones instead of thermal sensors. > >>>>>>> thermal_zone_device has the data needed and a thermal zone could be find > >>>>>>> easily using its name. > >>>>>> Yeah, the concept of the thermal zone and the sensor are very close. > >>>>>> > >>>>>> There is the function in thermal_core.h: > >>>>>> > >>>>>> -> for_each_thermal_zone() > >>>>>> > >>>>>> You should be able for each 'slave' sensor, do a lookup to find the > >>>>>> corresponding thermal_zone_device_ops. > >>>>>> > >>>>>>> But, using a thermal_zone_device, I don't see how to handle module > >>>>>>> unloading. > >>>>>> I think try_module_get() / module_put() are adequate for this situation > >>>>>> as it is done on an external module and we can not rely on the exported > >>>>>> symbols. > >>>>> I don't see how it would be possible to use these functions. > >>>>> The thermal zone doesn't have the data required to use it. > >>>> > >>>> Actually I was able to crash the kernel by doing: > >>>> > >>>> console 1: > >>>> > >>>> while $(true); do insmod <module> && rmmod <module>; done > >>>> > >>>> console 2: > >>>> > >>>> while $(true); cat /sys/class/thermal/thermal_zone0/temp; done > >>>> > >>>> So there is something wrong already in the thermal framework. > >>> > >>> Hmmm. > >>> > >>>> IMO, the first thing would be to fix this critical issue by getting the > >>>> sensor module refcount when the thermal zone is enabled and dropping it > >>>> when it is disabled. > >>>> > >>>> With that fixed, perhaps it will possible to get the device associated > >>>> with the sensor and then try_module_get(dev->driver->owner) > >>>> > >>>>> Maybe a more easier way is to use the thermal_zone_device mutex. > >>>>> If I get a lock before to use the thermal_zone_device ops, I have the > >>>>> guaranty that module won't be unloaded. > >>> > >>> That would be my approach too. > >> > >> The mutex is private to the thermal core. The virtual sensor should not > >> touch it :/ > >> > >> Perhaps, it can work with a private spin_lock with a try_spinlock() ? > > > > IIUC this is a case when module A refers to some memory in module B > > and if the latter goes away, an access to that memory from the former > > is a use-after-free, so it is not sufficient to use a local spinlock. > > > This can be avoided by having a lock and a flag such that the flag is > > set under the lock by module B when making the memory in question > > available and cleared under the lock when freeing that memory. Then, > > module A needs to check the flag under the lock on every access to > > that memory. Also, the lock and the flag must be accessible all the > > time to both modules (ie. must not go away along with any of them if > > they don't depend on each other). > > Just to clarify, the idea behind the virtual sensor is the same as the > network bridge: the network interfaces are not aware they are attached > to a bridge. But there's no code handling the bridge that needs to access memory controlled by network interface driver modules. > The virtual sensor should attach and detach the physical sensors. > > The sensors should not un|register themselves from|to the virtual > sensor, nor having specific code related to the virtual sensor. De > facto, all the existing sensors will be compatible with the virtual sensor. If I understand this patch series correctly, somebody is expected to call thermal_virtual_sensor_register() (or the devm_ variant of it) in order to add a sensor to the thermal_sensors list (which appears to be racy given the lack of locking or any kind of synchronization whatsoever). This appears to assume that all of the sensors will be added before running virtual_thermal_sensor_probe() which effectively is the only reader of the thermal_sensors list and runs as a ->probe() callbacks of virtual_thermal_sensor (there is a name clash between this and a struct data type which isn't useful) which gets registered as a platform driver when the module loads, as per module_platform_driver(). So whoever called thermal_virtual_sensor_register() to register a sensor, will be required to call thermal_virtual_sensor_unregister() or devm_thermal_virtual_sensor_release() will be called for them when they are going away (again, no locking there). That should also reconfigure all of the virtual sensors using the given sensor going away and which should take care of the problem we are discussing, shouldn't it? Let me respond to the patches, though.