
[v3,1/5] PM / OPP: Introduce a power estimation helper

Message ID 20190201093101.31869-2-quentin.perret@arm.com
State Superseded, archived
Series Register an Energy Model for Arm reference platforms

Commit Message

Quentin Perret Feb. 1, 2019, 9:30 a.m. UTC
The Energy Model (EM) framework provides an API to let drivers register
the active power of CPUs. The drivers are expected to provide a callback
method which estimates the power consumed by a CPU at each available
performance level. How exactly this should be implemented, however,
depends on the platform.

On some systems, PM_OPP knows the voltage and frequency at which CPUs
can run. When coupled with the CPU 'capacitance' (as provided by the
'dynamic-power-coefficient' devicetree binding), it is possible to
estimate the dynamic power consumption of a CPU as P = C * V^2 * f, with
C its capacitance and V and f respectively the voltage and frequency of
the OPP. The Intelligent Power Allocator (IPA) thermal governor already
implements this estimation method in the thermal framework.

However, this power estimation method can be applied to any platform
where all the parameters are known (C, V and f), and not only those
suffering from thermal issues. As such, the code implementing this feature
can be re-used to also populate the EM framework now used by EAS.

As a first step, introduce a helper function in PM_OPP that CPUFreq
drivers can use to register with the EM framework. This duplicates the
power estimation done in IPA until it can be migrated to using the EM
framework. This will be done later, once the EM framework has support
for at least all platforms currently supported by IPA.

Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Tested-by: Matthias Kaehlcke <mka@chromium.org>
---
 drivers/opp/of.c       | 96 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/pm_opp.h |  6 +++
 2 files changed, 102 insertions(+)

Comments

Sudeep Holla Feb. 1, 2019, 12:04 p.m. UTC | #1
On Fri, Feb 01, 2019 at 09:30:57AM +0000, Quentin Perret wrote:
> [...]
> +void dev_pm_opp_of_register_em(struct cpumask *cpus, int nr_opp)
> +{
> +	struct em_data_callback em_cb = EM_DATA_CB(_get_cpu_power);
> +	int ret, cpu = cpumask_first(cpus);
> +	struct device *cpu_dev;
> +	struct device_node *np;
> +	u32 cap;
> +
> +	cpu_dev = get_cpu_device(cpu);
> +	if (!cpu_dev)
> +		return;
> +
> +	np = of_node_get(cpu_dev->of_node);
> +	if (!np)
> +		return;
> +

Does it make sense to add the OPP count check here? Then you need not pass
that as a parameter. It makes one less thing to check in new drivers adding
this support. Thoughts?

--
Regards,
Sudeep
Quentin Perret Feb. 1, 2019, 12:09 p.m. UTC | #2
Hi Sudeep,

On Friday 01 Feb 2019 at 12:04:53 (+0000), Sudeep Holla wrote:
> On Fri, Feb 01, 2019 at 09:30:57AM +0000, Quentin Perret wrote:
> > +void dev_pm_opp_of_register_em(struct cpumask *cpus, int nr_opp)
> > +{
> > +	struct em_data_callback em_cb = EM_DATA_CB(_get_cpu_power);
> > +	int ret, cpu = cpumask_first(cpus);
> > +	struct device *cpu_dev;
> > +	struct device_node *np;
> > +	u32 cap;
> > +
> > +	cpu_dev = get_cpu_device(cpu);
> > +	if (!cpu_dev)
> > +		return;
> > +
> > +	np = of_node_get(cpu_dev->of_node);
> > +	if (!np)
> > +		return;
> > +
> 
> Does it make sense to add the check for OPP count here. You need not pass
> that as parameter. Just makes one less thing to check in new drivers adding
> this support. Thoughts ?

Yeah, Matthias had the same suggestion. I don't mind moving it here TBH.
It's just that some users already count the OPPs before calling this
function, so I figured I might as well use that data instead of counting
again.

But yeah, that's one less thing to worry about on the driver side so
I'll move the OPP count in there for v4 and we'll see if people ask me
to move it out to optimize things ;-)

Thanks,
Quentin
Sudeep Holla Feb. 1, 2019, 12:27 p.m. UTC | #3
On Fri, Feb 01, 2019 at 12:09:53PM +0000, Quentin Perret wrote:
> Hi Sudeep,
>
> On Friday 01 Feb 2019 at 12:04:53 (+0000), Sudeep Holla wrote:
> > On Fri, Feb 01, 2019 at 09:30:57AM +0000, Quentin Perret wrote:
> > > +void dev_pm_opp_of_register_em(struct cpumask *cpus, int nr_opp)
> > > +{
> > > +	struct em_data_callback em_cb = EM_DATA_CB(_get_cpu_power);
> > > +	int ret, cpu = cpumask_first(cpus);
> > > +	struct device *cpu_dev;
> > > +	struct device_node *np;
> > > +	u32 cap;
> > > +
> > > +	cpu_dev = get_cpu_device(cpu);
> > > +	if (!cpu_dev)
> > > +		return;
> > > +
> > > +	np = of_node_get(cpu_dev->of_node);
> > > +	if (!np)
> > > +		return;
> > > +

I forgot earlier: you can use of_cpu_device_node_get() to combine the above two.

> >
> > Does it make sense to add the check for OPP count here. You need not pass
> > that as parameter. Just makes one less thing to check in new drivers adding
> > this support. Thoughts ?
>
> Yeah Matthias had the same suggestion. I don't mind moving it here TBH.
> It's just that some users already do the opp count before calling this
> function, so I figured I could as well use that data instead of counting
> again.
>

Indeed, I was of the same opinion after seeing it in 2 patches, and then the 3rd
made me think why not. Also, since you already fetch cpu_dev there, it
should be fine.

> But yeah, that's one less thing to worry about on the driver side so
> I'll move the OPP count in there for v4 and we'll see if people ask me
> to move it out to optimize things ;-)
>

Yes, but I will leave it to Viresh's taste :)

--
Regards,
Sudeep
Sudeep Holla Feb. 1, 2019, 12:44 p.m. UTC | #4
On Fri, Feb 01, 2019 at 12:27:45PM +0000, Sudeep Holla wrote:
> On Fri, Feb 01, 2019 at 12:09:53PM +0000, Quentin Perret wrote:
> > Hi Sudeep,
> >
> > On Friday 01 Feb 2019 at 12:04:53 (+0000), Sudeep Holla wrote:
> > > On Fri, Feb 01, 2019 at 09:30:57AM +0000, Quentin Perret wrote:
> > > > +void dev_pm_opp_of_register_em(struct cpumask *cpus, int nr_opp)
> > > > +{
> > > > +	struct em_data_callback em_cb = EM_DATA_CB(_get_cpu_power);
> > > > +	int ret, cpu = cpumask_first(cpus);
> > > > +	struct device *cpu_dev;
> > > > +	struct device_node *np;
> > > > +	u32 cap;
> > > > +
> > > > +	cpu_dev = get_cpu_device(cpu);
> > > > +	if (!cpu_dev)
> > > > +		return;
> > > > +
> > > > +	np = of_node_get(cpu_dev->of_node);
> > > > +	if (!np)
> > > > +		return;
> > > > +
>
> Forgot earlier, you can use of_cpu_device_node_get to combine the above 2.
>

Scratch that, you need cpu_dev anyway. So ignore it.

--
Regards,
Sudeep
Matthias Kaehlcke Feb. 1, 2019, 6:16 p.m. UTC | #5
On Fri, Feb 01, 2019 at 12:09:53PM +0000, Quentin Perret wrote:
> Hi Sudeep,
> 
> On Friday 01 Feb 2019 at 12:04:53 (+0000), Sudeep Holla wrote:
> > On Fri, Feb 01, 2019 at 09:30:57AM +0000, Quentin Perret wrote:
> > > +void dev_pm_opp_of_register_em(struct cpumask *cpus, int nr_opp)
> > > +{
> > > +	struct em_data_callback em_cb = EM_DATA_CB(_get_cpu_power);
> > > +	int ret, cpu = cpumask_first(cpus);
> > > +	struct device *cpu_dev;
> > > +	struct device_node *np;
> > > +	u32 cap;
> > > +
> > > +	cpu_dev = get_cpu_device(cpu);
> > > +	if (!cpu_dev)
> > > +		return;
> > > +
> > > +	np = of_node_get(cpu_dev->of_node);
> > > +	if (!np)
> > > +		return;
> > > +
> > 
> > Does it make sense to add the check for OPP count here. You need not pass
> > that as parameter. Just makes one less thing to check in new drivers adding
> > this support. Thoughts ?
> 
> Yeah Matthias had the same suggestion. I don't mind moving it here TBH.
> It's just that some users already do the opp count before calling this
> function, so I figured I could as well use that data instead of counting
> again.
> 
> But yeah, that's one less thing to worry about on the driver side so
> I'll move the OPP count in there for v4 and we'll see if people ask me
> to move it out to optimize things ;-)

From an API perspective it would be nice to get rid of the nr_opp
parameter; it seems somewhat arbitrary. Moving dev_pm_opp_get_opp_count()
from the drivers into dev_pm_opp_of_register_em() (instead of calling
it twice) also sounds good in general, as long as the error handling
doesn't become too messy. In the current version
dev_pm_opp_of_register_em() doesn't return a value; with the change it
would have to return one to catch an empty OPP table, and that error needs
to be distinguished from other cases where the EM registration fails
but the cpufreq driver is still functional (e.g. no
'dynamic-power-coefficient'). Maybe return -ENOTSUPP in those cases?

Well, let's see what it looks like :)

Patch

diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index 06f0f632ec47..4bde13d94dc2 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -20,6 +20,7 @@ 
 #include <linux/pm_domain.h>
 #include <linux/slab.h>
 #include <linux/export.h>
+#include <linux/energy_model.h>
 
 #include "opp.h"
 
@@ -1047,3 +1048,98 @@  struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp)
 	return of_node_get(opp->np);
 }
 EXPORT_SYMBOL_GPL(dev_pm_opp_get_of_node);
+
+/*
+ * Callback function provided to the Energy Model framework upon registration.
+ * This computes the power estimated by @CPU at @kHz if it is the frequency
+ * of an existing OPP, or at the frequency of the first OPP above @kHz otherwise
+ * (see dev_pm_opp_find_freq_ceil()). This function updates @kHz to the ceiled
+ * frequency and @mW to the associated power. The power is estimated as
+ * P = C * V^2 * f with C being the CPU's capacitance and V and f respectively
+ * the voltage and frequency of the OPP.
+ *
+ * Returns -ENODEV if the CPU device cannot be found, -EINVAL if the power
+ * calculation failed because of missing parameters, 0 otherwise.
+ */
+static int __maybe_unused _get_cpu_power(unsigned long *mW, unsigned long *kHz,
+					 int cpu)
+{
+	struct device *cpu_dev;
+	struct dev_pm_opp *opp;
+	struct device_node *np;
+	unsigned long mV, Hz;
+	u32 cap;
+	u64 tmp;
+	int ret;
+
+	cpu_dev = get_cpu_device(cpu);
+	if (!cpu_dev)
+		return -ENODEV;
+
+	np = of_node_get(cpu_dev->of_node);
+	if (!np)
+		return -EINVAL;
+
+	ret = of_property_read_u32(np, "dynamic-power-coefficient", &cap);
+	of_node_put(np);
+	if (ret)
+		return -EINVAL;
+
+	Hz = *kHz * 1000;
+	opp = dev_pm_opp_find_freq_ceil(cpu_dev, &Hz);
+	if (IS_ERR(opp))
+		return -EINVAL;
+
+	mV = dev_pm_opp_get_voltage(opp) / 1000;
+	dev_pm_opp_put(opp);
+	if (!mV)
+		return -EINVAL;
+
+	tmp = (u64)cap * mV * mV * (Hz / 1000000);
+	do_div(tmp, 1000000000);
+
+	*mW = (unsigned long)tmp;
+	*kHz = Hz / 1000;
+
+	return 0;
+}
+
+/**
+ * dev_pm_opp_of_register_em() - Attempt to register an Energy Model
+ * @cpus	: CPUs for which an Energy Model has to be registered
+ * @nr_opp	: Number of OPPs to register in the Energy Model
+ *
+ * This checks whether the "dynamic-power-coefficient" devicetree property has
+ * been specified, and tries to register an Energy Model with it if it has.
+ */
+void dev_pm_opp_of_register_em(struct cpumask *cpus, int nr_opp)
+{
+	struct em_data_callback em_cb = EM_DATA_CB(_get_cpu_power);
+	int ret, cpu = cpumask_first(cpus);
+	struct device *cpu_dev;
+	struct device_node *np;
+	u32 cap;
+
+	cpu_dev = get_cpu_device(cpu);
+	if (!cpu_dev)
+		return;
+
+	np = of_node_get(cpu_dev->of_node);
+	if (!np)
+		return;
+
+	/*
+	 * Register an EM _only_ if the 'dynamic-power-coefficient' property is
+	 * set in devicetree. It is assumed the voltage values are known if that
+	 * property is set since it is useless otherwise. If voltages are not
+	 * known, just let the EM registration fail with an error to alert the
+	 * user about the inconsistent configuration.
+	 */
+	ret = of_property_read_u32(np, "dynamic-power-coefficient", &cap);
+	of_node_put(np);
+	if (ret || !cap)
+		return;
+
+	em_register_perf_domain(cpus, nr_opp, &em_cb);
+}
+EXPORT_SYMBOL_GPL(dev_pm_opp_of_register_em);
diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h
index b895f4e79868..58ae08b024bd 100644
--- a/include/linux/pm_opp.h
+++ b/include/linux/pm_opp.h
@@ -327,6 +327,7 @@  int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev, struct cpumask *cpuma
 struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device *dev);
 struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp);
 int of_get_required_opp_performance_state(struct device_node *np, int index);
+void dev_pm_opp_of_register_em(struct cpumask *cpus, int nr_opp);
 #else
 static inline int dev_pm_opp_of_add_table(struct device *dev)
 {
@@ -365,6 +366,11 @@  static inline struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp)
 {
 	return NULL;
 }
+
+static inline void dev_pm_opp_of_register_em(struct cpumask *cpus, int nr_opp)
+{
+}
+
 static inline int of_get_required_opp_performance_state(struct device_node *np, int index)
 {
 	return -ENOTSUPP;