Message ID | 20190130170506.20450-2-quentin.perret@arm.com (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
Series | Register an Energy Model for Arm reference platforms | expand |
Hi Quentin, On Wed, Jan 30, 2019 at 05:05:02PM +0000, Quentin Perret wrote: > The Energy Model (EM) framework provides an API to let drivers register > the active power of CPUs. The drivers are expected to provide a callback > method which estimates the power consumed by a CPU at each available > performance levels. How exactly this should be implemented, however, > depends on the platform. > > On some systems, PM_OPP knows the voltage and frequency at which CPUs > can run. When coupled with the CPU 'capacitance' (as provided by the > 'dynamic-power-coefficient' devicetree binding), it is possible to > estimate the dynamic power consumption of a CPU as P = C * V^2 * f, with > C its capacitance and V and f respectively the voltage and frequency of > the OPP. The Intelligent Power Allocator (IPA) thermal governor already > implements that estimation method, in the thermal framework. > > However, this power estimation method can be applied to any platform > where all the parameters are known (C, V and f), and not only those > suffering thermal issues. As such, the code implementing this feature > can be re-used to also populate the EM framework now used by EAS. > > As a first step, introduce in PM_OPP a helper function which CPUFreq > drivers can use to register into the EM framework. This duplicates the > power estimation done in IPA until it can be migrated to using the EM > framework. This will be done later, once the EM framework has support > for at least all platforms currently supported by IPA. > > Signed-off-by: Quentin Perret <quentin.perret@arm.com> > > --- > > Matthias: Given this patch changed a bit I dropped your Reviewed-by and > Tested-by, but let me know if you think they still hold. > --- > drivers/opp/of.c | 88 ++++++++++++++++++++++++++++++++++++++++++ > include/linux/pm_opp.h | 6 +++ > 2 files changed, 94 insertions(+) > > diff --git a/drivers/opp/of.c b/drivers/opp/of.c > index 06f0f632ec47..4c8bf172e9ed 100644 > --- a/drivers/opp/of.c > +++ b/drivers/opp/of.c > @@ -20,6 +20,7 @@ > #include <linux/pm_domain.h> > #include <linux/slab.h> > #include <linux/export.h> > +#include <linux/energy_model.h> nit: AFAIK typically alphabetical order is used for includes, though this file doesn't exactly adhere to it. > #include "opp.h" > > @@ -1047,3 +1048,90 @@ struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp) > return of_node_get(opp->np); > } > EXPORT_SYMBOL_GPL(dev_pm_opp_get_of_node); > + > +/* > + * Callback function provided to the Energy Model framework upon registration. > + * This computes the power estimated by @CPU at the first OPP above @kHz (ceil), that's not entirely correct, it could be the OPP at @kHz. > + * and updates @kHz and @mW accordingly. The power is estimated as > + * P = C * V^2 * f with C being the CPU's capacitance and V and f respectively > + * the voltage and frequency of the OPP. > + * > + * Returns -ENODEV if the CPU device cannot be found, -EINVAL if the power > + * calculation failed because of missing parameters, 0 otherwise. > + */ > +static int __maybe_unused _get_cpu_power(unsigned long *mW, unsigned long *kHz, > + int cpu) why __maybe_unused? > +{ > + struct device *cpu_dev; > + struct dev_pm_opp *opp; > + struct device_node *np; > + unsigned long mV, Hz; > + u32 cap; > + u64 tmp; > + int ret; > + > + cpu_dev = get_cpu_device(cpu); > + if (!cpu_dev) > + return -ENODEV; > + > + np = of_node_get(cpu_dev->of_node); > + if (!np) > + return -EINVAL; > + > + ret = of_property_read_u32(np, "dynamic-power-coefficient", &cap); > + of_node_put(np); > + if (ret) > + return -EINVAL; > + > + Hz = *kHz * 1000; > + opp = dev_pm_opp_find_freq_ceil(cpu_dev, &Hz); > + if (IS_ERR(opp)) > + return -EINVAL; > + > + mV = dev_pm_opp_get_voltage(opp) / 1000; > + dev_pm_opp_put(opp); > + if (!mV) > + return -EINVAL; > + > + tmp = (u64)cap * mV * mV * (Hz / 1000000); > + do_div(tmp, 1000000000); > + > + *mW = (unsigned long)tmp; > + *kHz = Hz / 1000; > + > + return 0; > +} > + > +/** > + * dev_pm_opp_of_register_em() - Attempt to register an Energy Model > + * @cpus : CPUs for which an Energy Model has to be registered > + * @nr_opp : Number of OPPs to register in the Energy Model > + * > + * This checks whether the "dynamic-power-coefficient" devicetree binding has s/binding/property/ ? > + * been specified, and tries to register an Energy Model with it if it has. > + */ > +void dev_pm_opp_of_register_em(struct cpumask *cpus, int nr_opp) Is the nr_opp parameter really needed? The function looks up the CPU device and hence could determine the OPP count itself with dev_pm_opp_get_opp_count(). I see most cpufreq drivers call dev_pm_opp_get_opp_count() anyway, so passing the count as parameter can be considered a small optimization, not sure how relevant it is though, since dev_pm_opp_of_register_em() isn't called frequently. > +{ > + struct em_data_callback em_cb = EM_DATA_CB(_get_cpu_power); > + int ret, cpu = cpumask_first(cpus); > + struct device *cpu_dev; > + struct device_node *np; > + u32 cap; > + > + cpu_dev = get_cpu_device(cpu); > + if (!cpu_dev) > + return; > + > + np = of_node_get(cpu_dev->of_node); > + if (!np) > + return; > + > + /* Don't register an EM without the right DT binding */ > + ret = of_property_read_u32(np, "dynamic-power-coefficient", &cap); > + of_node_put(np); > + if (ret || !cap) > + return; > + > + em_register_perf_domain(cpus, nr_opp, &em_cb); > +} > +EXPORT_SYMBOL_GPL(dev_pm_opp_of_register_em); > diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h > index b895f4e79868..58ae08b024bd 100644 > --- a/include/linux/pm_opp.h > +++ b/include/linux/pm_opp.h > @@ -327,6 +327,7 @@ int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev, struct cpumask *cpuma > struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device *dev); > struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp); > int of_get_required_opp_performance_state(struct device_node *np, int index); > +void dev_pm_opp_of_register_em(struct cpumask *cpus, int nr_opp); > #else > static inline int dev_pm_opp_of_add_table(struct device *dev) > { > @@ -365,6 +366,11 @@ static inline struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp) > { > return NULL; > } > + > +static inline void dev_pm_opp_of_register_em(struct cpumask *cpus, int nr_opp) > +{ > +} > + > static inline int of_get_required_opp_performance_state(struct device_node *np, int index) > { > return -ENOTSUPP; Tested-by: Matthias Kaehlcke <mka@chromium.org>
On 30-01-19, 11:07, Matthias Kaehlcke wrote: > On Wed, Jan 30, 2019 at 05:05:02PM +0000, Quentin Perret wrote: > > +static int __maybe_unused _get_cpu_power(unsigned long *mW, unsigned long *kHz, > > + int cpu) > > why __maybe_unused? Yeah, it isn't required I think. He probably added it for the case where CONFIG_ENERGY_MODEL=n, but even then an inline routine is defined which will accept it as argument and wouldn't do anything with it. Had it been a macro, we would have required __maybe_unused but not now.
On 30-01-19, 17:05, Quentin Perret wrote: > +static int __maybe_unused _get_cpu_power(unsigned long *mW, unsigned long *kHz, > + int cpu) > +{ > + struct device *cpu_dev; > + struct dev_pm_opp *opp; > + struct device_node *np; > + unsigned long mV, Hz; > + u32 cap; > + u64 tmp; > + int ret; > + > + cpu_dev = get_cpu_device(cpu); > + if (!cpu_dev) > + return -ENODEV; > + > + np = of_node_get(cpu_dev->of_node); > + if (!np) > + return -EINVAL; > + > + ret = of_property_read_u32(np, "dynamic-power-coefficient", &cap); > + of_node_put(np); > + if (ret) > + return -EINVAL; > + > + Hz = *kHz * 1000; > + opp = dev_pm_opp_find_freq_ceil(cpu_dev, &Hz); > + if (IS_ERR(opp)) > + return -EINVAL; > + > + mV = dev_pm_opp_get_voltage(opp) / 1000; The voltage is also stored as triplet now a days and we must consider the higher value for these calculations. Also what about the case of multiple regulators here or performance-states ? > + dev_pm_opp_put(opp); > + if (!mV) > + return -EINVAL; > + > + tmp = (u64)cap * mV * mV * (Hz / 1000000); > + do_div(tmp, 1000000000); > + > + *mW = (unsigned long)tmp; I was thinking will it be better if we just save this information in opp->power field during init, so we can just read a value here instead. But I am still not sure :( > + *kHz = Hz / 1000; > + > + return 0; > +} > + > +/** > + * dev_pm_opp_of_register_em() - Attempt to register an Energy Model > + * @cpus : CPUs for which an Energy Model has to be registered > + * @nr_opp : Number of OPPs to register in the Energy Model > + * > + * This checks whether the "dynamic-power-coefficient" devicetree binding has > + * been specified, and tries to register an Energy Model with it if it has. > + */ > +void dev_pm_opp_of_register_em(struct cpumask *cpus, int nr_opp) > +{ > + struct em_data_callback em_cb = EM_DATA_CB(_get_cpu_power); > + int ret, cpu = cpumask_first(cpus); > + struct device *cpu_dev; > + struct device_node *np; > + u32 cap; > + > + cpu_dev = get_cpu_device(cpu); > + if (!cpu_dev) > + return; > + > + np = of_node_get(cpu_dev->of_node); > + if (!np) > + return; > + > + /* Don't register an EM without the right DT binding */ > + ret = of_property_read_u32(np, "dynamic-power-coefficient", &cap); > + of_node_put(np); > + if (ret || !cap) > + return; What if no voltage is supplied in DT ? > + > + em_register_perf_domain(cpus, nr_opp, &em_cb); > +} > +EXPORT_SYMBOL_GPL(dev_pm_opp_of_register_em); > diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h > index b895f4e79868..58ae08b024bd 100644 > --- a/include/linux/pm_opp.h > +++ b/include/linux/pm_opp.h > @@ -327,6 +327,7 @@ int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev, struct cpumask *cpuma > struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device *dev); > struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp); > int of_get_required_opp_performance_state(struct device_node *np, int index); > +void dev_pm_opp_of_register_em(struct cpumask *cpus, int nr_opp); > #else > static inline int dev_pm_opp_of_add_table(struct device *dev) > { > @@ -365,6 +366,11 @@ static inline struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp) > { > return NULL; > } > + > +static inline void dev_pm_opp_of_register_em(struct cpumask *cpus, int nr_opp) > +{ > +} > + > static inline int of_get_required_opp_performance_state(struct device_node *np, int index) > { > return -ENOTSUPP; > -- > 2.20.1
On Thursday 31 Jan 2019 at 12:52:09 (+0530), Viresh Kumar wrote: > On 30-01-19, 11:07, Matthias Kaehlcke wrote: > > On Wed, Jan 30, 2019 at 05:05:02PM +0000, Quentin Perret wrote: > > > +static int __maybe_unused _get_cpu_power(unsigned long *mW, unsigned long *kHz, > > > + int cpu) > > > > why __maybe_unused? > > Yeah, it isn't required I think. He probably added it for the case > where CONFIG_ENERGY_MODEL=n, but even then an inline routine is > defined which will accept it as argument and wouldn't do anything with > it. Had it been a macro, we would have required __maybe_unused but not > now. The thing is, the EM_DATA_CB() macro _is_ stubbed for CONFIG_ENERGY_MODEL=n: https://elixir.bootlin.com/linux/v5.0-rc4/source/include/linux/energy_model.h#L165 So, without __maybe_unused you get do get a compiler warning. Thanks, Quentin > > -- > viresh
On 31-01-19, 09:34, Quentin Perret wrote: > On Thursday 31 Jan 2019 at 12:52:09 (+0530), Viresh Kumar wrote: > > On 30-01-19, 11:07, Matthias Kaehlcke wrote: > > > On Wed, Jan 30, 2019 at 05:05:02PM +0000, Quentin Perret wrote: > > > > +static int __maybe_unused _get_cpu_power(unsigned long *mW, unsigned long *kHz, > > > > + int cpu) > > > > > > why __maybe_unused? > > > > Yeah, it isn't required I think. He probably added it for the case > > where CONFIG_ENERGY_MODEL=n, but even then an inline routine is > > defined which will accept it as argument and wouldn't do anything with > > it. Had it been a macro, we would have required __maybe_unused but not > > now. > > The thing is, the EM_DATA_CB() macro _is_ stubbed for > CONFIG_ENERGY_MODEL=n: > > https://elixir.bootlin.com/linux/v5.0-rc4/source/include/linux/energy_model.h#L165 > > So, without __maybe_unused you get do get a compiler warning. Yeah, I almost got to it and still missed it :)
Hi Matthias, On Wednesday 30 Jan 2019 at 11:07:03 (-0800), Matthias Kaehlcke wrote: > > diff --git a/drivers/opp/of.c b/drivers/opp/of.c > > index 06f0f632ec47..4c8bf172e9ed 100644 > > --- a/drivers/opp/of.c > > +++ b/drivers/opp/of.c > > @@ -20,6 +20,7 @@ > > #include <linux/pm_domain.h> > > #include <linux/slab.h> > > #include <linux/export.h> > > +#include <linux/energy_model.h> > > nit: AFAIK typically alphabetical order is used for includes, though > this file doesn't exactly adhere to it. Yeah that's what I was thinking too. Since nothing is in order here I figured there wasn't a best place to put it so I just stick it there. I happy to re-order all of them if necessary. > > > #include "opp.h" > > > > @@ -1047,3 +1048,90 @@ struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp) > > return of_node_get(opp->np); > > } > > EXPORT_SYMBOL_GPL(dev_pm_opp_get_of_node); > > + > > +/* > > + * Callback function provided to the Energy Model framework upon registration. > > + * This computes the power estimated by @CPU at the first OPP above @kHz (ceil), > > that's not entirely correct, it could be the OPP at @kHz. Right, I'll update that. > > > + * and updates @kHz and @mW accordingly. The power is estimated as > > + * P = C * V^2 * f with C being the CPU's capacitance and V and f respectively > > + * the voltage and frequency of the OPP. > > + * > > + * Returns -ENODEV if the CPU device cannot be found, -EINVAL if the power > > + * calculation failed because of missing parameters, 0 otherwise. > > + */ > > +static int __maybe_unused _get_cpu_power(unsigned long *mW, unsigned long *kHz, > > + int cpu) > > why __maybe_unused? To avoid compiler warnings with CONFIG_ENERGY_MODEL=n, see my other email ;-) > > +{ > > + struct device *cpu_dev; > > + struct dev_pm_opp *opp; > > + struct device_node *np; > > + unsigned long mV, Hz; > > + u32 cap; > > + u64 tmp; > > + int ret; > > + > > + cpu_dev = get_cpu_device(cpu); > > + if (!cpu_dev) > > + return -ENODEV; > > + > > + np = of_node_get(cpu_dev->of_node); > > + if (!np) > > + return -EINVAL; > > + > > + ret = of_property_read_u32(np, "dynamic-power-coefficient", &cap); > > + of_node_put(np); > > + if (ret) > > + return -EINVAL; > > + > > + Hz = *kHz * 1000; > > + opp = dev_pm_opp_find_freq_ceil(cpu_dev, &Hz); > > + if (IS_ERR(opp)) > > + return -EINVAL; > > + > > + mV = dev_pm_opp_get_voltage(opp) / 1000; > > + dev_pm_opp_put(opp); > > + if (!mV) > > + return -EINVAL; > > + > > + tmp = (u64)cap * mV * mV * (Hz / 1000000); > > + do_div(tmp, 1000000000); > > + > > + *mW = (unsigned long)tmp; > > + *kHz = Hz / 1000; > > + > > + return 0; > > +} > > + > > +/** > > + * dev_pm_opp_of_register_em() - Attempt to register an Energy Model > > + * @cpus : CPUs for which an Energy Model has to be registered > > + * @nr_opp : Number of OPPs to register in the Energy Model > > + * > > + * This checks whether the "dynamic-power-coefficient" devicetree binding has > > s/binding/property/ ? Sounds good. > > > + * been specified, and tries to register an Energy Model with it if it has. > > + */ > > +void dev_pm_opp_of_register_em(struct cpumask *cpus, int nr_opp) > > Is the nr_opp parameter really needed? The function looks up the CPU > device and hence could determine the OPP count itself with > dev_pm_opp_get_opp_count(). I see most cpufreq drivers call > dev_pm_opp_get_opp_count() anyway, so passing the count as parameter > can be considered a small optimization, not sure how relevant it is > though, since dev_pm_opp_of_register_em() isn't called frequently. Yeah, I figured since most callers of dev_pm_opp_of_register_em() already counted the OPPs, I could as well use the data instead of counting again. I mean, dev_pm_opp_get_count() has to traverse the whole list every time, so there is no point in doing that twice. Not a huge deal I guess. > > > +{ > > + struct em_data_callback em_cb = EM_DATA_CB(_get_cpu_power); > > + int ret, cpu = cpumask_first(cpus); > > + struct device *cpu_dev; > > + struct device_node *np; > > + u32 cap; > > + > > + cpu_dev = get_cpu_device(cpu); > > + if (!cpu_dev) > > + return; > > + > > + np = of_node_get(cpu_dev->of_node); > > + if (!np) > > + return; > > + > > + /* Don't register an EM without the right DT binding */ > > + ret = of_property_read_u32(np, "dynamic-power-coefficient", &cap); > > + of_node_put(np); > > + if (ret || !cap) > > + return; > > + > > + em_register_perf_domain(cpus, nr_opp, &em_cb); > > +} > > +EXPORT_SYMBOL_GPL(dev_pm_opp_of_register_em); > > diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h > > index b895f4e79868..58ae08b024bd 100644 > > --- a/include/linux/pm_opp.h > > +++ b/include/linux/pm_opp.h > > @@ -327,6 +327,7 @@ int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev, struct cpumask *cpuma > > struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device *dev); > > struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp); > > int of_get_required_opp_performance_state(struct device_node *np, int index); > > +void dev_pm_opp_of_register_em(struct cpumask *cpus, int nr_opp); > > #else > > static inline int dev_pm_opp_of_add_table(struct device *dev) > > { > > @@ -365,6 +366,11 @@ static inline struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp) > > { > > return NULL; > > } > > + > > +static inline void dev_pm_opp_of_register_em(struct cpumask *cpus, int nr_opp) > > +{ > > +} > > + > > static inline int of_get_required_opp_performance_state(struct device_node *np, int index) > > { > > return -ENOTSUPP; > > Tested-by: Matthias Kaehlcke <mka@chromium.org> Thank you very much ! Quentin
On Thursday 31 Jan 2019 at 12:56:33 (+0530), Viresh Kumar wrote: > On 30-01-19, 17:05, Quentin Perret wrote: > > +static int __maybe_unused _get_cpu_power(unsigned long *mW, unsigned long *kHz, > > + int cpu) > > +{ > > + struct device *cpu_dev; > > + struct dev_pm_opp *opp; > > + struct device_node *np; > > + unsigned long mV, Hz; > > + u32 cap; > > + u64 tmp; > > + int ret; > > + > > + cpu_dev = get_cpu_device(cpu); > > + if (!cpu_dev) > > + return -ENODEV; > > + > > + np = of_node_get(cpu_dev->of_node); > > + if (!np) > > + return -EINVAL; > > + > > + ret = of_property_read_u32(np, "dynamic-power-coefficient", &cap); > > + of_node_put(np); > > + if (ret) > > + return -EINVAL; > > + > > + Hz = *kHz * 1000; > > + opp = dev_pm_opp_find_freq_ceil(cpu_dev, &Hz); > > + if (IS_ERR(opp)) > > + return -EINVAL; > > + > > + mV = dev_pm_opp_get_voltage(opp) / 1000; > > The voltage is also stored as triplet now a days and we must consider > the higher value for these calculations. Also what about the case of > multiple regulators here or performance-states ? Well at least this is not worst than what we already do for IPA :-) https://elixir.bootlin.com/linux/latest/source/drivers/thermal/cpu_cooling.c#L245 In the case of multiple regulators, then maybe that should be dealt with at the dev_pm_op_get_voltage() ? Not sure. > > > + dev_pm_opp_put(opp); > > + if (!mV) > > + return -EINVAL; > > + > > + tmp = (u64)cap * mV * mV * (Hz / 1000000); > > + do_div(tmp, 1000000000); > > + > > + *mW = (unsigned long)tmp; > > I was thinking will it be better if we just save this information in > opp->power field during init, so we can just read a value here > instead. But I am still not sure :( Yeah, I had the exact same question. But, then I thought, we're only gonna use that once, so it's not clear we need to cache the value. And I don't think we want other subsystems to ask PM_OPP for power values directly. Those subsystems should ask the EM framework instead (which exists for that very reason). So we're probably not gonna expose a dev_pm_opp_get_power() accessor or so, I think. That's why I went that way. > > > + *kHz = Hz / 1000; > > + > > + return 0; > > +} > > + > > +/** > > + * dev_pm_opp_of_register_em() - Attempt to register an Energy Model > > + * @cpus : CPUs for which an Energy Model has to be registered > > + * @nr_opp : Number of OPPs to register in the Energy Model > > + * > > + * This checks whether the "dynamic-power-coefficient" devicetree binding has > > + * been specified, and tries to register an Energy Model with it if it has. > > + */ > > +void dev_pm_opp_of_register_em(struct cpumask *cpus, int nr_opp) > > +{ > > + struct em_data_callback em_cb = EM_DATA_CB(_get_cpu_power); > > + int ret, cpu = cpumask_first(cpus); > > + struct device *cpu_dev; > > + struct device_node *np; > > + u32 cap; > > + > > + cpu_dev = get_cpu_device(cpu); > > + if (!cpu_dev) > > + return; > > + > > + np = of_node_get(cpu_dev->of_node); > > + if (!np) > > + return; > > + > > + /* Don't register an EM without the right DT binding */ > > + ret = of_property_read_u32(np, "dynamic-power-coefficient", &cap); > > + of_node_put(np); > > + if (ret || !cap) > > + return; > > What if no voltage is supplied in DT ? Then don't provide 'dynamic-power-coefficient' ? There is nothing you can do with that without voltages I think. With this implementation you'll get an error message at some point, which is probably sane. > > > + > > + em_register_perf_domain(cpus, nr_opp, &em_cb); > > +} > > +EXPORT_SYMBOL_GPL(dev_pm_opp_of_register_em); > > diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h > > index b895f4e79868..58ae08b024bd 100644 > > --- a/include/linux/pm_opp.h > > +++ b/include/linux/pm_opp.h > > @@ -327,6 +327,7 @@ int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev, struct cpumask *cpuma > > struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device *dev); > > struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp); > > int of_get_required_opp_performance_state(struct device_node *np, int index); > > +void dev_pm_opp_of_register_em(struct cpumask *cpus, int nr_opp); > > #else > > static inline int dev_pm_opp_of_add_table(struct device *dev) > > { > > @@ -365,6 +366,11 @@ static inline struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp) > > { > > return NULL; > > } > > + > > +static inline void dev_pm_opp_of_register_em(struct cpumask *cpus, int nr_opp) > > +{ > > +} > > + > > static inline int of_get_required_opp_performance_state(struct device_node *np, int index) > > { > > return -ENOTSUPP; > > -- > > 2.20.1 > > -- > viresh Thanks, Quentin
diff --git a/drivers/opp/of.c b/drivers/opp/of.c index 06f0f632ec47..4c8bf172e9ed 100644 --- a/drivers/opp/of.c +++ b/drivers/opp/of.c @@ -20,6 +20,7 @@ #include <linux/pm_domain.h> #include <linux/slab.h> #include <linux/export.h> +#include <linux/energy_model.h> #include "opp.h" @@ -1047,3 +1048,90 @@ struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp) return of_node_get(opp->np); } EXPORT_SYMBOL_GPL(dev_pm_opp_get_of_node); + +/* + * Callback function provided to the Energy Model framework upon registration. + * This computes the power estimated by @CPU at the first OPP above @kHz (ceil), + * and updates @kHz and @mW accordingly. The power is estimated as + * P = C * V^2 * f with C being the CPU's capacitance and V and f respectively + * the voltage and frequency of the OPP. + * + * Returns -ENODEV if the CPU device cannot be found, -EINVAL if the power + * calculation failed because of missing parameters, 0 otherwise. + */ +static int __maybe_unused _get_cpu_power(unsigned long *mW, unsigned long *kHz, + int cpu) +{ + struct device *cpu_dev; + struct dev_pm_opp *opp; + struct device_node *np; + unsigned long mV, Hz; + u32 cap; + u64 tmp; + int ret; + + cpu_dev = get_cpu_device(cpu); + if (!cpu_dev) + return -ENODEV; + + np = of_node_get(cpu_dev->of_node); + if (!np) + return -EINVAL; + + ret = of_property_read_u32(np, "dynamic-power-coefficient", &cap); + of_node_put(np); + if (ret) + return -EINVAL; + + Hz = *kHz * 1000; + opp = dev_pm_opp_find_freq_ceil(cpu_dev, &Hz); + if (IS_ERR(opp)) + return -EINVAL; + + mV = dev_pm_opp_get_voltage(opp) / 1000; + dev_pm_opp_put(opp); + if (!mV) + return -EINVAL; + + tmp = (u64)cap * mV * mV * (Hz / 1000000); + do_div(tmp, 1000000000); + + *mW = (unsigned long)tmp; + *kHz = Hz / 1000; + + return 0; +} + +/** + * dev_pm_opp_of_register_em() - Attempt to register an Energy Model + * @cpus : CPUs for which an Energy Model has to be registered + * @nr_opp : Number of OPPs to register in the Energy Model + * + * This checks whether the "dynamic-power-coefficient" devicetree binding has + * been specified, and tries to register an Energy Model with it if it has. + */ +void dev_pm_opp_of_register_em(struct cpumask *cpus, int nr_opp) +{ + struct em_data_callback em_cb = EM_DATA_CB(_get_cpu_power); + int ret, cpu = cpumask_first(cpus); + struct device *cpu_dev; + struct device_node *np; + u32 cap; + + cpu_dev = get_cpu_device(cpu); + if (!cpu_dev) + return; + + np = of_node_get(cpu_dev->of_node); + if (!np) + return; + + /* Don't register an EM without the right DT binding */ + ret = of_property_read_u32(np, "dynamic-power-coefficient", &cap); + of_node_put(np); + if (ret || !cap) + return; + + em_register_perf_domain(cpus, nr_opp, &em_cb); +} +EXPORT_SYMBOL_GPL(dev_pm_opp_of_register_em); diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index b895f4e79868..58ae08b024bd 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -327,6 +327,7 @@ int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev, struct cpumask *cpuma struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device *dev); struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp); int of_get_required_opp_performance_state(struct device_node *np, int index); +void dev_pm_opp_of_register_em(struct cpumask *cpus, int nr_opp); #else static inline int dev_pm_opp_of_add_table(struct device *dev) { @@ -365,6 +366,11 @@ static inline struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp) { return NULL; } + +static inline void dev_pm_opp_of_register_em(struct cpumask *cpus, int nr_opp) +{ +} + static inline int of_get_required_opp_performance_state(struct device_node *np, int index) { return -ENOTSUPP;
The Energy Model (EM) framework provides an API to let drivers register the active power of CPUs. The drivers are expected to provide a callback method which estimates the power consumed by a CPU at each available performance levels. How exactly this should be implemented, however, depends on the platform. On some systems, PM_OPP knows the voltage and frequency at which CPUs can run. When coupled with the CPU 'capacitance' (as provided by the 'dynamic-power-coefficient' devicetree binding), it is possible to estimate the dynamic power consumption of a CPU as P = C * V^2 * f, with C its capacitance and V and f respectively the voltage and frequency of the OPP. The Intelligent Power Allocator (IPA) thermal governor already implements that estimation method, in the thermal framework. However, this power estimation method can be applied to any platform where all the parameters are known (C, V and f), and not only those suffering thermal issues. As such, the code implementing this feature can be re-used to also populate the EM framework now used by EAS. As a first step, introduce in PM_OPP a helper function which CPUFreq drivers can use to register into the EM framework. This duplicates the power estimation done in IPA until it can be migrated to using the EM framework. This will be done later, once the EM framework has support for at least all platforms currently supported by IPA. Signed-off-by: Quentin Perret <quentin.perret@arm.com> --- Matthias: Given this patch changed a bit I dropped your Reviewed-by and Tested-by, but let me know if you think they still hold. --- drivers/opp/of.c | 88 ++++++++++++++++++++++++++++++++++++++++++ include/linux/pm_opp.h | 6 +++ 2 files changed, 94 insertions(+)