diff mbox series

[v2,08/17] PM: EM: Introduce runtime modifiable table

Message ID 20230512095743.3393563-9-lukasz.luba@arm.com (mailing list archive)
State RFC
Series Introduce runtime modifiable Energy Model

Commit Message

Lukasz Luba May 12, 2023, 9:57 a.m. UTC
This patch introduces a new feature: a modifiable EM perf_state table.
The new runtime table is populated with new power data to better
reflect the actual power. The power can vary over time, e.g. due to
SoC temperature changes: a higher temperature can increase power values.
In longer-running scenarios, such as gaming or camera use, where other
devices (e.g. GPU, ISP) are also active, the CPU power can change. The new
EM framework is able to address this issue and change the data
at runtime safely. The runtime modifiable EM data is used by the Energy
Aware Scheduler (EAS) for task placement.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 include/linux/energy_model.h | 13 +++++++++++++
 kernel/power/energy_model.c  | 24 ++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

Comments

kernel test robot May 14, 2023, 4:28 a.m. UTC | #1
Hi Lukasz,

kernel test robot noticed the following build warnings:

[auto build test WARNING on rafael-pm/linux-next]
[also build test WARNING on rafael-pm/thermal linus/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Lukasz-Luba/PM-EM-Refactor-em_cpufreq_update_efficiencies-arguments/20230512-180158
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
patch link:    https://lore.kernel.org/r/20230512095743.3393563-9-lukasz.luba%40arm.com
patch subject: [PATCH v2 08/17] PM: EM: Introduce runtime modifiable table
config: arm64-randconfig-s041-20230514 (https://download.01.org/0day-ci/archive/20230514/202305141200.aaTHzYOJ-lkp@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 12.1.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.4-39-gce1a6720-dirty
        # https://github.com/intel-lab-lkp/linux/commit/d12d8d1010d7b093d6b64c204d77484d6fc268ab
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Lukasz-Luba/PM-EM-Refactor-em_cpufreq_update_efficiencies-arguments/20230512-180158
        git checkout d12d8d1010d7b093d6b64c204d77484d6fc268ab
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=arm64 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=arm64 SHELL=/bin/bash kernel/power/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202305141200.aaTHzYOJ-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
>> kernel/power/energy_model.c:472:13: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct em_perf_table *tmp @@     got struct em_perf_table [noderef] __rcu *runtime_table @@
   kernel/power/energy_model.c:472:13: sparse:     expected struct em_perf_table *tmp
   kernel/power/energy_model.c:472:13: sparse:     got struct em_perf_table [noderef] __rcu *runtime_table

vim +472 kernel/power/energy_model.c

   444	
   445	/**
   446	 * em_dev_unregister_perf_domain() - Unregister Energy Model (EM) for a device
   447	 * @dev		: Device for which the EM is registered
   448	 *
   449	 * Unregister the EM for the specified @dev (but not a CPU device).
   450	 */
   451	void em_dev_unregister_perf_domain(struct device *dev)
   452	{
   453		struct em_perf_domain *pd;
   454		struct em_perf_table *tmp;
   455	
   456		if (IS_ERR_OR_NULL(dev) || !dev->em_pd)
   457			return;
   458	
   459		if (_is_cpu_device(dev))
   460			return;
   461	
   462		pd = dev->em_pd;
   463		/*
   464		 * The mutex separates all register/unregister requests and protects
   465		 * from potential clean-up/setup issues in the debugfs directories.
   466		 * The debugfs directory name is the same as device's name.
   467		 */
   468		mutex_lock(&em_pd_mutex);
   469	
   470		em_debug_remove_pd(dev);
   471	
 > 472		tmp = pd->runtime_table;
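
The warning above comes from assigning the __rcu-annotated
pd->runtime_table to the plain pointer tmp without an RCU accessor. On an
update-side path that already holds a lock, one commonly used accessor is
rcu_dereference_protected(). Whether the author picked that for a later
revision is not stated in this thread, so the following is only a minimal
userspace sketch. The three macros are stand-ins for the kernel ones,
em_pd_mutex_held is a hypothetical flag modelling
lockdep_is_held(&em_pd_mutex), and em_detach_runtime_table() is a
hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel annotations; NOT the real definitions. */
#define __rcu
#define rcu_dereference_protected(p, cond) (assert(cond), (p))
#define rcu_assign_pointer(p, v) ((p) = (v))

struct em_perf_state { unsigned long frequency, power; };
struct em_perf_table { struct em_perf_state *state; };
struct em_perf_domain {
	struct em_perf_state *table;
	struct em_perf_table __rcu *runtime_table;
};

/* Hypothetical flag: models lockdep_is_held(&em_pd_mutex). */
static int em_pd_mutex_held;

/* Hypothetical helper: detach the runtime table on the update side. */
static struct em_perf_table *em_detach_runtime_table(struct em_perf_domain *pd)
{
	struct em_perf_table *tmp;

	/* The accessor documents that the lock, not rcu_read_lock(), protects us. */
	tmp = rcu_dereference_protected(pd->runtime_table, em_pd_mutex_held);
	rcu_assign_pointer(pd->runtime_table, NULL);

	/* The kernel would call synchronize_rcu() here, then kfree(tmp). */
	return tmp;
}
```

In the kernel the condition would be evaluated by lockdep rather than by
assert(), and sparse would accept the assignment because the accessor
strips the __rcu address space.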
Dietmar Eggemann May 30, 2023, 10:18 a.m. UTC | #2
On 12/05/2023 11:57, Lukasz Luba wrote:
> This patch introduces the new feature: modifiable EM perf_state table.
> The new runtime table would be populated with a new power data to better
> reflect the actual power. The power can vary over time e.g. due to the
> SoC temperature change. Higher temperature can increase power values.
> For longer running scenarios, such as game or camera, when also other
> devices are used (e.g. GPU, ISP) the CPU power can change. The new
> EM framework is able to address this issue and change the data
> at runtime safely. The runtime modifiable EM data is used by the Energy
> Aware Scheduler (EAS) for the task placement.

It's important to say that EAS is the _only_ user of the `runtime
modifiable EM`. All the other users (thermal, etc.) are still using the
default (basic) EM. IMHO, this fact drove the design here.

> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> ---
>  include/linux/energy_model.h | 13 +++++++++++++
>  kernel/power/energy_model.c  | 24 ++++++++++++++++++++++++
>  2 files changed, 37 insertions(+)
> 
> diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
> index cc2bf607191e..a616006a8130 100644
> --- a/include/linux/energy_model.h
> +++ b/include/linux/energy_model.h
> @@ -36,9 +36,21 @@ struct em_perf_state {
>   */
>  #define EM_PERF_STATE_INEFFICIENT BIT(0)
>  
> +/**
> + * struct em_perf_table - Performance states table, which can be
> + *		runtime modifiable and protected with RCU

which is `runtime modifiable` ? So `runtime modifiable performance state
table`? RCU is obvious since we have `struct rcu_head rcu`.

> + * @state:	List of performance states, in ascending order
> + * @rcu:	RCU used for safe access and destruction
> + */
> +struct em_perf_table {
> +	struct em_perf_state *state;
> +	struct rcu_head rcu;
> +};
> +
>  /**
>   * struct em_perf_domain - Performance domain
>   * @table:		List of performance states, in ascending order
> + * @runtime_table:	Pointer to the runtime modified em_perf_table

s/modified/modifiable

[...]

> @@ -237,12 +238,23 @@ static int em_create_pd(struct device *dev, int nr_states,
>  			return -ENOMEM;
>  	}
>  
> +	runtime_table = kzalloc(sizeof(*runtime_table), GFP_KERNEL);
> +	if (!runtime_table) {
> +		kfree(pd);
> +		return -ENOMEM;
> +	}
> +
>  	ret = em_create_perf_table(dev, pd, nr_states, cb, flags);
>  	if (ret) {
>  		kfree(pd);
> +		kfree(runtime_table);
>  		return ret;
>  	}
>  
> +	/* Re-use temporally (till 1st modification) the memory */

So this means that the runtime (modifiable) table
(pd->runtime_table->state) is mapped to the default (basic) table
(pd->default_table->state) until the first call to
em_dev_update_perf_domain() (here mentioned as the 1st modification)?

IMHO, not easy to understand since neither the cover letter, nor
documentation patch 15/17 describes this in a consistent story.

[...]
Lukasz Luba July 3, 2023, 3:58 p.m. UTC | #3
On 5/30/23 11:18, Dietmar Eggemann wrote:
> On 12/05/2023 11:57, Lukasz Luba wrote:
>> This patch introduces the new feature: modifiable EM perf_state table.
>> The new runtime table would be populated with a new power data to better
>> reflect the actual power. The power can vary over time e.g. due to the
>> SoC temperature change. Higher temperature can increase power values.
>> For longer running scenarios, such as game or camera, when also other
>> devices are used (e.g. GPU, ISP) the CPU power can change. The new
>> EM framework is able to address this issue and change the data
>> at runtime safely. The runtime modifiable EM data is used by the Energy
>> Aware Scheduler (EAS) for the task placement.
> 
> It's important to say that EAS is the _only_ user of the `runtime
> modifiable EM`. All the other users (thermal, etc.) are still using the
> default (basic) EM. IMHO, this fact drove the design here.

OK, I'll add that information in the header.

> 
>> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
>> ---
>>   include/linux/energy_model.h | 13 +++++++++++++
>>   kernel/power/energy_model.c  | 24 ++++++++++++++++++++++++
>>   2 files changed, 37 insertions(+)
>>
>> diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
>> index cc2bf607191e..a616006a8130 100644
>> --- a/include/linux/energy_model.h
>> +++ b/include/linux/energy_model.h
>> @@ -36,9 +36,21 @@ struct em_perf_state {
>>    */
>>   #define EM_PERF_STATE_INEFFICIENT BIT(0)
>>   
>> +/**
>> + * struct em_perf_table - Performance states table, which can be
>> + *		runtime modifiable and protected with RCU
> 
> which is `runtime modifiable` ? So `runtime modifiable performance state
> table`? RCU is obvious since we have `struct rcu_head rcu`.

Thanks, 'Runtime modifiable performance state table' sounds better.

> 
>> + * @state:	List of performance states, in ascending order
>> + * @rcu:	RCU used for safe access and destruction
>> + */
>> +struct em_perf_table {
>> +	struct em_perf_state *state;
>> +	struct rcu_head rcu;
>> +};
>> +
>>   /**
>>    * struct em_perf_domain - Performance domain
>>    * @table:		List of performance states, in ascending order
>> + * @runtime_table:	Pointer to the runtime modified em_perf_table
> 
> s/modified/modifiable
> 
> [...]
> 
>> @@ -237,12 +238,23 @@ static int em_create_pd(struct device *dev, int nr_states,
>>   			return -ENOMEM;
>>   	}
>>   
>> +	runtime_table = kzalloc(sizeof(*runtime_table), GFP_KERNEL);
>> +	if (!runtime_table) {
>> +		kfree(pd);
>> +		return -ENOMEM;
>> +	}
>> +
>>   	ret = em_create_perf_table(dev, pd, nr_states, cb, flags);
>>   	if (ret) {
>>   		kfree(pd);
>> +		kfree(runtime_table);
>>   		return ret;
>>   	}
>>   
>> +	/* Re-use temporally (till 1st modification) the memory */
> 
> So this means that the runtime (modifiable) table
> (pd->runtime_table->state) is mapped to the default (basic) table
> (pd->default_table->state) until the first call to
> em_dev_update_perf_domain() (here mentioned as the 1st modification)?

correct

> 
> IMHO, not easy to understand since neither the cover letter, nor
> documentation patch 15/17 describes this in a consistent story.

I'll add that to the patch header and also to the documentation patch
which is later in the series.
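
The aliasing confirmed above (the runtime table points at the default
table's memory until the first modification) can be illustrated with a
single-threaded userspace model. This is a sketch under assumptions, not
the series' implementation: model_create() and model_update() are
hypothetical names, malloc()/free() stand in for the kernel allocators,
and the RCU grace period is only noted in a comment.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct em_perf_state { unsigned long frequency, power; };
struct em_perf_table { struct em_perf_state *state; };
struct em_perf_domain {
	struct em_perf_state *table;         /* default (basic) table */
	struct em_perf_table *runtime_table; /* __rcu in the kernel */
	int nr_perf_states;
};

/* Hypothetical: after creation the runtime table aliases pd->table. */
static int model_create(struct em_perf_domain *pd,
			const struct em_perf_state *src, int n)
{
	assert(n > 0);
	pd->table = malloc(n * sizeof(*src));
	pd->runtime_table = calloc(1, sizeof(*pd->runtime_table));
	if (!pd->table || !pd->runtime_table) {
		free(pd->table);
		free(pd->runtime_table);
		return -1;
	}
	memcpy(pd->table, src, n * sizeof(*src));
	pd->nr_perf_states = n;
	pd->runtime_table->state = pd->table; /* re-use the memory for now */
	return 0;
}

/* Hypothetical: a modification gives the runtime table its own memory. */
static int model_update(struct em_perf_domain *pd,
			const struct em_perf_state *new_states)
{
	size_t size = pd->nr_perf_states * sizeof(*new_states);
	struct em_perf_table *new_tbl = calloc(1, sizeof(*new_tbl));
	struct em_perf_table *old;

	if (!new_tbl)
		return -1;
	new_tbl->state = malloc(size);
	if (!new_tbl->state) {
		free(new_tbl);
		return -1;
	}
	memcpy(new_tbl->state, new_states, size);

	old = pd->runtime_table;
	pd->runtime_table = new_tbl; /* rcu_assign_pointer() in the kernel */

	/* Kernel: wait for a grace period before freeing the old wrapper. */
	if (old->state != pd->table)
		free(old->state); /* never free the default table here */
	free(old);
	return 0;
}
```

In this model the default table is left untouched by updates, which
matches the point made earlier in the thread that users other than EAS
keep reading the default EM.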
Patch

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index cc2bf607191e..a616006a8130 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -36,9 +36,21 @@  struct em_perf_state {
  */
 #define EM_PERF_STATE_INEFFICIENT BIT(0)
 
+/**
+ * struct em_perf_table - Performance states table, which can be
+ *		runtime modifiable and protected with RCU
+ * @state:	List of performance states, in ascending order
+ * @rcu:	RCU used for safe access and destruction
+ */
+struct em_perf_table {
+	struct em_perf_state *state;
+	struct rcu_head rcu;
+};
+
 /**
  * struct em_perf_domain - Performance domain
  * @table:		List of performance states, in ascending order
+ * @runtime_table:	Pointer to the runtime modified em_perf_table
  * @nr_perf_states:	Number of performance states
  * @flags:		See "em_perf_domain flags"
  * @cpus:		Cpumask covering the CPUs of the domain. It's here
@@ -54,6 +66,7 @@  struct em_perf_state {
  */
 struct em_perf_domain {
 	struct em_perf_state *table;
+	struct em_perf_table __rcu *runtime_table;
 	int nr_perf_states;
 	unsigned long flags;
 	unsigned long cpus[];
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 8866d217714e..39d47028ef3d 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -213,6 +213,7 @@  static int em_create_pd(struct device *dev, int nr_states,
 			struct em_data_callback *cb, cpumask_t *cpus,
 			unsigned long flags)
 {
+	struct em_perf_table *runtime_table;
 	struct em_perf_domain *pd;
 	struct device *cpu_dev;
 	int cpu, ret, num_cpus;
@@ -237,12 +238,23 @@  static int em_create_pd(struct device *dev, int nr_states,
 			return -ENOMEM;
 	}
 
+	runtime_table = kzalloc(sizeof(*runtime_table), GFP_KERNEL);
+	if (!runtime_table) {
+		kfree(pd);
+		return -ENOMEM;
+	}
+
 	ret = em_create_perf_table(dev, pd, nr_states, cb, flags);
 	if (ret) {
 		kfree(pd);
+		kfree(runtime_table);
 		return ret;
 	}
 
+	/* Re-use temporarily (till 1st modification) the memory */
+	runtime_table->state = pd->table;
+	rcu_assign_pointer(pd->runtime_table, runtime_table);
+
 	if (_is_cpu_device(dev))
 		for_each_cpu(cpu, cpus) {
 			cpu_dev = get_cpu_device(cpu);
@@ -438,20 +450,32 @@  EXPORT_SYMBOL_GPL(em_dev_register_perf_domain);
  */
 void em_dev_unregister_perf_domain(struct device *dev)
 {
+	struct em_perf_domain *pd;
+	struct em_perf_table *tmp;
+
 	if (IS_ERR_OR_NULL(dev) || !dev->em_pd)
 		return;
 
 	if (_is_cpu_device(dev))
 		return;
 
+	pd = dev->em_pd;
 	/*
 	 * The mutex separates all register/unregister requests and protects
 	 * from potential clean-up/setup issues in the debugfs directories.
 	 * The debugfs directory name is the same as device's name.
 	 */
 	mutex_lock(&em_pd_mutex);
+
 	em_debug_remove_pd(dev);
 
+	tmp = pd->runtime_table;
+
+	rcu_assign_pointer(pd->runtime_table, NULL);
+	synchronize_rcu();
+
+	kfree(tmp);
+
 	kfree(dev->em_pd->table);
 	kfree(dev->em_pd);
 	dev->em_pd = NULL;