diff mbox

thermal: tell cooling devices when a trip_point changes

Message ID 53D97B70.1050305@nvidia.com (mailing list archive)
State Rejected
Delegated to: Zhang Rui
Headers show

Commit Message

Matthew Longnecker July 30, 2014, 11:10 p.m. UTC
Some hardware can react autonomously at a programmed temperature.
For example, an SoC might implement a last ditch throttle or a
hardware thermal shutdown. The driver for such a device can
register itself as a cooling_device with the thermal framework.

With this change, the thermal framework notifies such a driver
when userspace alters the relevant trip temperature so that
the driver can reprogram its hardware

Signed-off-by: Matt Longnecker <mlongnecker@nvidia.com>
---
 drivers/thermal/thermal_core.c |   30 ++++++++++++++++++++++++++++++
 include/linux/thermal.h        |    2 ++
 2 files changed, 32 insertions(+)

Comments

Javi Merino July 31, 2014, 7:59 a.m. UTC | #1
On Thu, Jul 31, 2014 at 12:10:40AM +0100, Matt Longnecker wrote:
> Some hardware can react autonomously at a programmed temperature.
> For example, an SoC might implement a last ditch throttle or a
> hardware thermal shutdown. The driver for such a device can
> register itself as a cooling_device with the thermal framework.
> 
> With this change, the thermal framework notifies such a driver
> when userspace alters the relevant trip temperature so that
> the driver can reprogram its hardware

Why can't you just use the existing cooling device interface?  Cooling
devices can be bound to trip points.  Most thermal governors will
increase cooling for that cooling device when the trip point is hit.
The last ditch throttle or hardware thermal shutdown will then kick
when the cooling state changes to 1.

If the existing governors are too complex for what you want, you can
have a look at the bang bang governor[0] which (I think) is bound to
be merged soon.

[0] http://article.gmane.org/gmane.linux.kernel/1753348

Cheers,
Javi

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zhang Rui July 31, 2014, 8:30 a.m. UTC | #2
On Wed, 2014-07-30 at 16:10 -0700, Matt Longnecker wrote:
> Some hardware can react autonomously at a programmed temperature.

if you have a temperature sensor, then you should have a thermal zone
device driver for it, right?

> For example, an SoC might implement a last ditch throttle or a
> hardware thermal shutdown.

And it should be handled by the thermal zone driver, who has the
knowledge of when to do throttle/shutdown.

>  The driver for such a device can
> register itself as a cooling_device with the thermal framework.
> 
cooling device just exports its cooling ability for thermal framework to
use. It never makes any decision about thermal control.

> With this change, the thermal framework notifies such a driver
> when userspace alters the relevant trip temperature so that
> the driver can reprogram its hardware
> 
When user space alters the trip temperature, the thermal zone device
driver is aware is this change, and it can optionally program the
hardware.

thanks,
rui
> Signed-off-by: Matt Longnecker <mlongnecker@nvidia.com>
> ---
>  drivers/thermal/thermal_core.c |   30 ++++++++++++++++++++++++++++++
>  include/linux/thermal.h        |    2 ++
>  2 files changed, 32 insertions(+)
> 
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 71b0ec0..f25272e 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -597,6 +597,7 @@ trip_point_temp_store(struct device *dev, struct device_attribute *attr,
>  	struct thermal_zone_device *tz = to_thermal_zone(dev);
>  	int trip, ret;
>  	unsigned long temperature;
> +	struct thermal_instance *pos = NULL;
>  
>  	if (!tz->ops->set_trip_temp)
>  		return -EPERM;
> @@ -609,6 +610,20 @@ trip_point_temp_store(struct device *dev, struct device_attribute *attr,
>  
>  	ret = tz->ops->set_trip_temp(tz, trip, temperature);
>  
> +	/*
> +	 * Notify bound cooling devices that this trip point changed.
> +	 * This is useful for cooling devices which represent a behavior
> +	 * which trips in hardware (e.g. catastrophic shutdown)
> +	 */
> +	list_for_each_entry(pos, &tz->thermal_instances, tz_node) {
> +		if (pos->tz == tz && pos->trip == trip && pos->cdev) {
> +			if (pos->cdev->ops->trip_point_changed)
> +				pos->cdev->ops->trip_point_changed(pos->cdev,
> +								   pos->tz,
> +								   trip);
> +		}
> +	}
> +
>  	return ret ? ret : count;
>  }
>  
> @@ -641,6 +656,7 @@ trip_point_hyst_store(struct device *dev, struct device_attribute *attr,
>  	struct thermal_zone_device *tz = to_thermal_zone(dev);
>  	int trip, ret;
>  	unsigned long temperature;
> +	struct thermal_instance *pos = NULL;
>  
>  	if (!tz->ops->set_trip_hyst)
>  		return -EPERM;
> @@ -658,6 +674,20 @@ trip_point_hyst_store(struct device *dev, struct device_attribute *attr,
>  	 */
>  	ret = tz->ops->set_trip_hyst(tz, trip, temperature);
>  
> +	/*
> +	 * Notify bound cooling devices that this trip point changed.
> +	 * This is useful for cooling devices which represent a behavior
> +	 * which trips in hardware (e.g. catastrophic shutdown)
> +	 */
> +	list_for_each_entry(pos, &tz->thermal_instances, tz_node) {
> +		if (pos->tz == tz && pos->trip == trip && pos->cdev) {
> +			if (pos->cdev->ops->trip_point_changed)
> +				pos->cdev->ops->trip_point_changed(pos->cdev,
> +								   pos->tz,
> +								   trip);
> +		}
> +	}
> +
>  	return ret ? ret : count;
>  }
>  
> diff --git a/include/linux/thermal.h b/include/linux/thermal.h
> index f7e11c7..7da7fc5 100644
> --- a/include/linux/thermal.h
> +++ b/include/linux/thermal.h
> @@ -138,6 +138,8 @@ struct thermal_cooling_device_ops {
>  	int (*get_max_state) (struct thermal_cooling_device *, unsigned long *);
>  	int (*get_cur_state) (struct thermal_cooling_device *, unsigned long *);
>  	int (*set_cur_state) (struct thermal_cooling_device *, unsigned long);
> +	void (*trip_point_changed) (struct thermal_cooling_device *,
> +				    struct thermal_zone_device *, int);
>  };
>  
>  struct thermal_cooling_device {


--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matthew Longnecker July 31, 2014, 5:17 p.m. UTC | #3
I should have labeled my message RFC. My intent was to generate some 
public discussion about a problem I've hit -- not to get this particular 
patch merged.

On 07/31/2014 01:30 AM, Zhang Rui wrote:
> On Wed, 2014-07-30 at 16:10 -0700, Matt Longnecker wrote:
>> Some hardware can react autonomously at a programmed temperature.
>
> if you have a temperature sensor, then you should have a thermal zone
> device driver for it, right?
>

Yes, but....

In existing "downstream" Tegra kernels, thermal sensor drivers directly 
register thermal zones with thermal_core.c. Drivers get information for 
the zone (e.g. trip points, cooling maps, etc). Each driver has its own 
format for this zone-related information.

Now we're moving away from board files and toward device tree.

of-thermal offers a canonical way to encode zone-related information in 
device tree. I don't want to invent a competing alternative.

As of-thermal parses device tree, it creates thermal zones. Sensor 
drivers no longer need to register zones -- they just register with 
of-thermal as sensor drivers.

of-thermal hides trip points and cooling maps from sensor drivers. This 
is good and bad. It's good because it eliminates a _lot_ of thermal zone 
boilerplate code from sensor drivers. It's bad because it breaks an 
existing usecase:
* the kernel configures the hardware thermal reaction (e.g. catastrophic 
shutdown) according to platform-data provided at boot
* a user adjusts the shutdown temperature at runtime by poking at 
/sys/class/thermal/thermal_zone*/trip_point_*_temp

My patch offers one way to support this usecase in systems using 
of-thermal to manage thermal zones. I'm happy to discuss alternatives.

>> For example, an SoC might implement a last ditch throttle or a
>> hardware thermal shutdown.
>
> And it should be handled by the thermal zone driver, who has the
> knowledge of when to do throttle/shutdown.
>

of-thermal doesn't support this because it doesn't propagate trip point 
changes to sensor drivers.

Thanks,
Matt Longnecker
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 71b0ec0..f25272e 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -597,6 +597,7 @@  trip_point_temp_store(struct device *dev, struct device_attribute *attr,
 	struct thermal_zone_device *tz = to_thermal_zone(dev);
 	int trip, ret;
 	unsigned long temperature;
+	struct thermal_instance *pos = NULL;
 
 	if (!tz->ops->set_trip_temp)
 		return -EPERM;
@@ -609,6 +610,20 @@  trip_point_temp_store(struct device *dev, struct device_attribute *attr,
 
 	ret = tz->ops->set_trip_temp(tz, trip, temperature);
 
+	/*
+	 * Notify bound cooling devices that this trip point changed.
+	 * This is useful for cooling devices which represent a behavior
+	 * which trips in hardware (e.g. catastrophic shutdown)
+	 */
+	list_for_each_entry(pos, &tz->thermal_instances, tz_node) {
+		if (pos->tz == tz && pos->trip == trip && pos->cdev) {
+			if (pos->cdev->ops->trip_point_changed)
+				pos->cdev->ops->trip_point_changed(pos->cdev,
+								   pos->tz,
+								   trip);
+		}
+	}
+
 	return ret ? ret : count;
 }
 
@@ -641,6 +656,7 @@  trip_point_hyst_store(struct device *dev, struct device_attribute *attr,
 	struct thermal_zone_device *tz = to_thermal_zone(dev);
 	int trip, ret;
 	unsigned long temperature;
+	struct thermal_instance *pos = NULL;
 
 	if (!tz->ops->set_trip_hyst)
 		return -EPERM;
@@ -658,6 +674,20 @@  trip_point_hyst_store(struct device *dev, struct device_attribute *attr,
 	 */
 	ret = tz->ops->set_trip_hyst(tz, trip, temperature);
 
+	/*
+	 * Notify bound cooling devices that this trip point changed.
+	 * This is useful for cooling devices which represent a behavior
+	 * which trips in hardware (e.g. catastrophic shutdown)
+	 */
+	list_for_each_entry(pos, &tz->thermal_instances, tz_node) {
+		if (pos->tz == tz && pos->trip == trip && pos->cdev) {
+			if (pos->cdev->ops->trip_point_changed)
+				pos->cdev->ops->trip_point_changed(pos->cdev,
+								   pos->tz,
+								   trip);
+		}
+	}
+
 	return ret ? ret : count;
 }
 
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index f7e11c7..7da7fc5 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -138,6 +138,8 @@  struct thermal_cooling_device_ops {
 	int (*get_max_state) (struct thermal_cooling_device *, unsigned long *);
 	int (*get_cur_state) (struct thermal_cooling_device *, unsigned long *);
 	int (*set_cur_state) (struct thermal_cooling_device *, unsigned long);
+	void (*trip_point_changed) (struct thermal_cooling_device *,
+				    struct thermal_zone_device *, int);
 };
 
 struct thermal_cooling_device {