Message ID | 20230324070807.6342-2-rui.zhang@intel.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Series | [v2,1/5] thermal/core: Update cooling device during thermal zone unregistration |
On Fri, Mar 24, 2023 at 8:08 AM Zhang Rui <rui.zhang@intel.com> wrote:
>
> When unregistering a cooling device, it is possible that the cooling
> device has been activated. And once the cooling device is unregistered,
> no one will deactivate it anymore.
>
> Reset cooling state during cooling device unregistration.
>
> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> ---
> In theory, this problem that this patch fixes can be triggered on a
> platform with ACPI Active cooling, by
> 1. overheat the system to trigger ACPI active cooling
> 2. unload ACPI fan driver
> 3. check if the fan is still spinning
> But I don't have such a system so I didn't trigger then problem and I
> only did build & boot test.

So I'm not sure if this change is actually safe.

In the example above, the system will still need the fan to spin after
the ACPI fan driver is unloaded in order to cool down, won't it?

> ---
>  drivers/thermal/thermal_core.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 30ff39154598..fd54e6c10b60 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -1192,6 +1192,10 @@ void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
>  		}
>  	}
>
> +	mutex_lock(&cdev->lock);
> +	cdev->ops->set_cur_state(cdev, 0);
> +	mutex_unlock(&cdev->lock);
> +
>  	mutex_unlock(&thermal_list_lock);
>
>  	device_unregister(&cdev->device);
> --
> 2.25.1
>
On Fri, 2023-03-24 at 14:19 +0100, Rafael J. Wysocki wrote:
> On Fri, Mar 24, 2023 at 8:08 AM Zhang Rui <rui.zhang@intel.com> wrote:
> > When unregistering a cooling device, it is possible that the cooling
> > device has been activated. And once the cooling device is unregistered,
> > no one will deactivate it anymore.
> >
> > Reset cooling state during cooling device unregistration.
> >
> > Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> > ---
> > In theory, this problem that this patch fixes can be triggered on a
> > platform with ACPI Active cooling, by
> > 1. overheat the system to trigger ACPI active cooling
> > 2. unload ACPI fan driver
> > 3. check if the fan is still spinning
> > But I don't have such a system so I didn't trigger then problem and I
> > only did build & boot test.
>
> So I'm not sure if this change is actually safe.
>
> In the example above, the system will still need the fan to spin after
> the ACPI fan driver is unloaded in order to cool down, won't it?

Then we can argue that the ACPI fan driver should not be unloaded in
this case.

Actually, this is the same situation as patch 1/5.
Patch 1/5 fixes the problem that cooling state not restored to 0 when
unloading the thermal driver, and this fixes the same problem when
unloading the cooling device driver.

thanks,
rui

> > ---
> >  drivers/thermal/thermal_core.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> > index 30ff39154598..fd54e6c10b60 100644
> > --- a/drivers/thermal/thermal_core.c
> > +++ b/drivers/thermal/thermal_core.c
> > @@ -1192,6 +1192,10 @@ void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
> >  		}
> >  	}
> >
> > +	mutex_lock(&cdev->lock);
> > +	cdev->ops->set_cur_state(cdev, 0);
> > +	mutex_unlock(&cdev->lock);
> > +
> >  	mutex_unlock(&thermal_list_lock);
> >
> >  	device_unregister(&cdev->device);
> > --
> > 2.25.1
> >
On Mon, Mar 27, 2023 at 4:50 PM Zhang, Rui <rui.zhang@intel.com> wrote:
>
> On Fri, 2023-03-24 at 14:19 +0100, Rafael J. Wysocki wrote:
> > On Fri, Mar 24, 2023 at 8:08 AM Zhang Rui <rui.zhang@intel.com> wrote:
> > > When unregistering a cooling device, it is possible that the cooling
> > > device has been activated. And once the cooling device is unregistered,
> > > no one will deactivate it anymore.
> > >
> > > Reset cooling state during cooling device unregistration.
> > >
> > > Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> > > ---
> > > In theory, this problem that this patch fixes can be triggered on a
> > > platform with ACPI Active cooling, by
> > > 1. overheat the system to trigger ACPI active cooling
> > > 2. unload ACPI fan driver
> > > 3. check if the fan is still spinning
> > > But I don't have such a system so I didn't trigger then problem and I
> > > only did build & boot test.
> >
> > So I'm not sure if this change is actually safe.
> >
> > In the example above, the system will still need the fan to spin after
> > the ACPI fan driver is unloaded in order to cool down, won't it?
>
> Then we can argue that the ACPI fan driver should not be unloaded in
> this case.

I don't think that whether or not the driver is expected to be
unloaded at a given time has any bearing on how it should behave when
actually unloaded.

Leaving the cooling device in its current state is "safe" from the
thermal control perspective, but it may affect the general user
experience (which may include performance too) going forward, so there
is a tradeoff.

You can argue that even if the cooling device is reset on the driver
removal, there should be another thermal control mechanism in place
that will take care of the overheat condition instead of it, but that
mechanism may be an emergency system shutdown.

What do the other cooling device drivers do in general when they get
removed?

> Actually, this is the same situation as patch 1/5.
> Patch 1/5 fixes the problem that cooling state not restored to 0 when
> unloading the thermal driver, and this fixes the same problem when
> unloading the cooling device driver.

Right, it is analogous.
On Mon, 2023-03-27 at 17:13 +0200, Rafael J. Wysocki wrote:
> On Mon, Mar 27, 2023 at 4:50 PM Zhang, Rui <rui.zhang@intel.com> wrote:
> > On Fri, 2023-03-24 at 14:19 +0100, Rafael J. Wysocki wrote:
> > > On Fri, Mar 24, 2023 at 8:08 AM Zhang Rui <rui.zhang@intel.com> wrote:
> > > > When unregistering a cooling device, it is possible that the cooling
> > > > device has been activated. And once the cooling device is unregistered,
> > > > no one will deactivate it anymore.
> > > >
> > > > Reset cooling state during cooling device unregistration.
> > > >
> > > > Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> > > > ---
> > > > In theory, this problem that this patch fixes can be triggered on a
> > > > platform with ACPI Active cooling, by
> > > > 1. overheat the system to trigger ACPI active cooling
> > > > 2. unload ACPI fan driver
> > > > 3. check if the fan is still spinning
> > > > But I don't have such a system so I didn't trigger then problem and I
> > > > only did build & boot test.
> > >
> > > So I'm not sure if this change is actually safe.
> > >
> > > In the example above, the system will still need the fan to spin after
> > > the ACPI fan driver is unloaded in order to cool down, won't it?
> >
> > Then we can argue that the ACPI fan driver should not be unloaded in
> > this case.
>
> I don't think that whether or not the driver is expected to be
> unloaded at a given time has any bearing on how it should behave when
> actually unloaded.
>
> Leaving the cooling device in its current state is "safe" from the
> thermal control perspective, but it may affect the general user
> experience (which may include performance too) going forward, so there
> is a tradeoff.

Right.
If we don't have a third choice, then the question is simple.
"thermal safety" vs. "user experience"?

I'd vote for "thermal safety" and drop this patch series.

>
> What do the other cooling device drivers do in general when they get
> removed?

No cooling device driver has extra handling after cdev unregistration.

thanks,
rui
On Tue, Mar 28, 2023 at 4:46 AM Zhang, Rui <rui.zhang@intel.com> wrote:
>
> On Mon, 2023-03-27 at 17:13 +0200, Rafael J. Wysocki wrote:
> > On Mon, Mar 27, 2023 at 4:50 PM Zhang, Rui <rui.zhang@intel.com> wrote:
> > > On Fri, 2023-03-24 at 14:19 +0100, Rafael J. Wysocki wrote:
> > > > On Fri, Mar 24, 2023 at 8:08 AM Zhang Rui <rui.zhang@intel.com> wrote:
> > > > > When unregistering a cooling device, it is possible that the cooling
> > > > > device has been activated. And once the cooling device is unregistered,
> > > > > no one will deactivate it anymore.
> > > > >
> > > > > Reset cooling state during cooling device unregistration.
> > > > >
> > > > > Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> > > > > ---
> > > > > In theory, this problem that this patch fixes can be triggered on a
> > > > > platform with ACPI Active cooling, by
> > > > > 1. overheat the system to trigger ACPI active cooling
> > > > > 2. unload ACPI fan driver
> > > > > 3. check if the fan is still spinning
> > > > > But I don't have such a system so I didn't trigger then problem and I
> > > > > only did build & boot test.
> > > >
> > > > So I'm not sure if this change is actually safe.
> > > >
> > > > In the example above, the system will still need the fan to spin after
> > > > the ACPI fan driver is unloaded in order to cool down, won't it?
> > >
> > > Then we can argue that the ACPI fan driver should not be unloaded in
> > > this case.
> >
> > I don't think that whether or not the driver is expected to be
> > unloaded at a given time has any bearing on how it should behave when
> > actually unloaded.
> >
> > Leaving the cooling device in its current state is "safe" from the
> > thermal control perspective, but it may affect the general user
> > experience (which may include performance too) going forward, so there
> > is a tradeoff.
>
> Right.
> If we don't have a third choice, then the question is simple.
> "thermal safety" vs. "user experience"?
>
> I'd vote for "thermal safety" and drop this patch series.

Works for me.

> > What do the other cooling device drivers do in general when they get
> > removed?
>
> No cooling device driver has extra handling after cdev unregistration.

However, the question regarding what to do when the driver of a
cooling device in use is being removed is a valid one.

One possible approach that comes to mind could be to defer the driver
removal until the overheat condition goes away, but anyway it would be
better to do that in the core IMV.
On Tue, 2023-03-28 at 19:54 +0200, Rafael J. Wysocki wrote:
> > > What do the other cooling device drivers do in general when they
> > > get removed?
> >
> > No cooling device driver has extra handling after cdev unregistration.
>
> However, the question regarding what to do when the driver of a
> cooling device in use is being removed is a valid one.
>
> One possible approach that comes to mind could be to defer the driver
> removal until the overheat condition goes away, but anyway it would be
> better to do that in the core IMV.

In this case, we should guarantee that the thermal zone driver is still
functional. i.e. it still can get temperature change notifications and
update the thermal zone. I doubt if current thermal zone drivers can
guarantee this.

Given that this is a rare case, and the current behavior is not perfect
but still acceptable, maybe we can leave this low priority for now.

thanks,
rui
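The deferral idea raised above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `defer_cdev`, `defer_request_removal()` and `defer_zone_update()` are hypothetical names invented here, not thermal core APIs, and locking is left out. It also assumes the thermal zone keeps delivering updates after removal is requested, which is exactly the guarantee questioned in the last reply.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct defer_cdev {
	unsigned long cur_state;  /* 0 means the cooling device is idle */
	bool registered;
	bool removal_pending;     /* removal was requested while still active */
};

/* Request removal: completes at once if idle, otherwise it is deferred. */
bool defer_request_removal(struct defer_cdev *cdev)
{
	assert(cdev && cdev->registered);
	if (cdev->cur_state == 0) {
		cdev->registered = false;
		return true;
	}
	cdev->removal_pending = true;
	return false;
}

/*
 * Called when the (assumed still functional) thermal zone picks a new state.
 * Completes a deferred removal once the device is no longer needed.
 */
void defer_zone_update(struct defer_cdev *cdev, unsigned long new_state)
{
	assert(cdev && cdev->registered);
	cdev->cur_state = new_state;
	if (cdev->removal_pending && new_state == 0) {
		cdev->removal_pending = false;
		cdev->registered = false;
	}
}
```

In this model a fan at state 2 stays registered after a removal request and only goes away once an update brings it back to state 0. If updates stop arriving, the removal never completes, which mirrors the concern about thermal zone drivers above.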
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 30ff39154598..fd54e6c10b60 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -1192,6 +1192,10 @@ void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
 		}
 	}
 
+	mutex_lock(&cdev->lock);
+	cdev->ops->set_cur_state(cdev, 0);
+	mutex_unlock(&cdev->lock);
+
 	mutex_unlock(&thermal_list_lock);
 
 	device_unregister(&cdev->device);
When unregistering a cooling device, it is possible that the cooling
device has been activated. And once the cooling device is unregistered,
no one will deactivate it anymore.

Reset cooling state during cooling device unregistration.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
---
In theory, the problem that this patch fixes can be triggered on a
platform with ACPI Active cooling, by
1. overheat the system to trigger ACPI active cooling
2. unload ACPI fan driver
3. check if the fan is still spinning
But I don't have such a system so I didn't trigger the problem and I
only did build & boot test.
---
 drivers/thermal/thermal_core.c | 4 ++++
 1 file changed, 4 insertions(+)
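The difference between the current behavior and the one proposed in this patch can be seen in a rough userspace model. Everything named `model_*` below is a hypothetical stand-in, not the kernel's `struct thermal_cooling_device` or its ops, and the `cdev->lock` handling from the patch is omitted for brevity. Unlike the patch, the sketch also returns the result of `set_cur_state`, on the assumption that a caller might want to know if the reset failed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's structures. */
struct model_cdev;

struct model_cdev_ops {
	/* Returns 0 on success, a negative value on failure. */
	int (*set_cur_state)(struct model_cdev *cdev, unsigned long state);
};

struct model_cdev {
	const struct model_cdev_ops *ops;
	unsigned long hw_state;  /* what the "hardware" (e.g. a fan) is doing */
	int registered;
};

static int model_fan_set_state(struct model_cdev *cdev, unsigned long state)
{
	cdev->hw_state = state;
	return 0;
}

static const struct model_cdev_ops model_fan_ops = {
	.set_cur_state = model_fan_set_state,
};

/* Register a model fan that is already running at the given state. */
void model_fan_init(struct model_cdev *cdev, unsigned long state)
{
	assert(cdev != NULL);
	cdev->ops = &model_fan_ops;
	cdev->hw_state = state;
	cdev->registered = 1;
}

/* Current behavior as described in the thread: state is left untouched. */
void model_unregister_keep(struct model_cdev *cdev)
{
	assert(cdev != NULL && cdev->registered);
	cdev->registered = 0;
}

/* Behavior proposed by the patch: force state 0 before dropping the device. */
int model_unregister_reset(struct model_cdev *cdev)
{
	int ret;

	assert(cdev != NULL && cdev->registered);
	ret = cdev->ops->set_cur_state(cdev, 0);
	cdev->registered = 0;
	return ret;
}
```

With a fan initialised at state 3, `model_unregister_keep()` leaves it spinning at 3 with nobody left to turn it down, while `model_unregister_reset()` stops it even if the system is still hot. That is the "thermal safety" versus "user experience" tradeoff discussed in the replies.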