[v2,2/6] topology: add a new arch_scale_freq_reference

Message ID	20231009103621.374412-3-vincent.guittot@linaro.org (mailing list archive)
State	Superseded
Headers	Return-Path: <linux-riscv-bounces+linux-riscv=archiver.kernel.org@lists.infradead.org> From: Vincent Guittot <vincent.guittot@linaro.org> To: linux@armlinux.org.uk, catalin.marinas@arm.com, will@kernel.org, paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, sudeep.holla@arm.com, gregkh@linuxfoundation.org, rafael@kernel.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, viresh.kumar@linaro.org, lukasz.luba@arm.com, ionela.voinescu@arm.com, pierre.gondois@arm.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, linux-pm@vger.kernel.org Cc: conor.dooley@microchip.com, suagrfillet@gmail.com, ajones@ventanamicro.com, lftan@kernel.org, Vincent Guittot <vincent.guittot@linaro.org> Subject: [PATCH v2 2/6] topology: add a new arch_scale_freq_reference Date: Mon, 9 Oct 2023 12:36:17 +0200 Message-Id: <20231009103621.374412-3-vincent.guittot@linaro.org> In-Reply-To: <20231009103621.374412-1-vincent.guittot@linaro.org> References: <20231009103621.374412-1-vincent.guittot@linaro.org> MIME-Version: 1.0 Precedence: list Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-riscv" <linux-riscv-bounces@lists.infradead.org> Errors-To: linux-riscv-bounces+linux-riscv=archiver.kernel.org@lists.infradead.org
Series	consolidate and cleanup CPU capacity: [v2,0/6] consolidate and cleanup CPU capacity [v2,1/6] sched: consolidate and cleanup access to CPU's max compute capacity [v2,2/6] topology: add a new arch_scale_freq_reference [v2,3/6] cpufreq: use the fixed and coherent frequency for scaling capacity [v2,4/6] cpufreq/schedutil: use a fixed reference frequency [v2,5/6] energy_model: use a fixed reference frequency [v2,6/6] cpufreq/cppc: set the frequency used for capacity computation

Context	Check	Description
conchuod/vmtest-for-next-PR	fail	PR summary
conchuod/patch-2-test-1	success	.github/scripts/patches/build_rv32_defconfig.sh
conchuod/patch-2-test-2	success	.github/scripts/patches/build_rv64_clang_allmodconfig.sh
conchuod/patch-2-test-3	success	.github/scripts/patches/build_rv64_gcc_allmodconfig.sh
conchuod/patch-2-test-4	success	.github/scripts/patches/build_rv64_nommu_k210_defconfig.sh
conchuod/patch-2-test-5	success	.github/scripts/patches/build_rv64_nommu_virt_defconfig.sh
conchuod/patch-2-test-6	success	.github/scripts/patches/checkpatch.sh
conchuod/patch-2-test-7	success	.github/scripts/patches/dtb_warn_rv64.sh
conchuod/patch-2-test-8	success	.github/scripts/patches/header_inline.sh
conchuod/patch-2-test-9	success	.github/scripts/patches/kdoc.sh
conchuod/patch-2-test-10	success	.github/scripts/patches/maintainers_patterns.sh
conchuod/patch-2-test-11	success	.github/scripts/patches/module_param.sh
conchuod/patch-2-test-12	success	.github/scripts/patches/verify_fixes.sh
conchuod/patch-2-test-13	success	.github/scripts/patches/verify_signedoff.sh

Commit Message

Vincent Guittot Oct. 9, 2023, 10:36 a.m. UTC

Create a new method to get a unique and fixed max frequency. Currently,
cpuinfo.max_freq or the highest (or last) state of the performance domain is
used as the max frequency when computing the frequency for a level of
utilization, but:
- cpuinfo.max_freq can change at runtime. Boost is one example of
  such a change.
- cpuinfo.max_freq and the last item of the PD can be different, leading to
  different results between cpufreq and the energy model.

We need to save the reference frequency that has been used when computing
the CPU's capacity and use this fixed and coherent value to convert between
frequency and CPU capacity.

In fact, we already save the frequency that has been used when computing
the capacity of each CPU. We extend the precision to save kHz instead of
the current MHz, and we change the type to align it with the other variables
used when converting frequency to capacity and back.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 arch/arm/include/asm/topology.h   |  1 +
 arch/arm64/include/asm/topology.h |  1 +
 arch/riscv/include/asm/topology.h |  1 +
 drivers/base/arch_topology.c      | 29 ++++++++++++++---------------
 include/linux/arch_topology.h     |  7 +++++++
 5 files changed, 24 insertions(+), 15 deletions(-)
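
For illustration only, here is a minimal userspace sketch of why the
reference frequency has to be fixed. It is not code from this series:
util_to_freq_khz() is a hypothetical helper, and the linear mapping from
utilization to frequency is an assumption made for the example.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of the patch): frequency in kHz needed to
 * serve `util` on a CPU whose capacity `max_cap` was computed at
 * `ref_freq_khz`. A 64-bit intermediate avoids overflow on 32-bit builds.
 */
static unsigned long util_to_freq_khz(unsigned long util,
				      unsigned long max_cap,
				      unsigned long ref_freq_khz)
{
	assert(max_cap != 0);
	return (unsigned long)((unsigned long long)ref_freq_khz * util /
			       max_cap);
}
```

With util = 512 and max_cap = 1024, a 2000000 kHz reference gives
1000000 kHz, while a boosted max of 2400000 kHz gives 1200000 kHz for the
same utilization. Two callers using different "max" values would therefore
disagree, which is the mismatch the commit message describes.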

Comments

Pierre Gondois Oct. 11, 2023, 10:27 a.m. UTC | #1

Hello Vincent,

On 10/9/23 12:36, Vincent Guittot wrote:
> Create a new method to get a unique and fixed max frequency. Currently
> cpuinfo.max_freq or the highest (or last) state of performance domain are
> used as the max frequency when computing the frequency for a level of
> utilization but:
> - cpuinfo_max_freq can change at runtime. boost is one example of
>    such change.
> - cpuinfo.max_freq and last item of the PD can be different leading to
>    different results between cpufreq and energy model.
> 
> We need to save the reference frequency that has been used when computing
> the CPUs capacity and use this fixed and coherent value to convert between
> frequency and CPU's capacity.
> 
> In fact, we already save the frequency that has been used when computing
> the capacity of each CPU. We extend the precision to save khZ instead of
> Mhz currently and we modify the type to be aligned with other variables
> used when converting frequency to capacity and the other way.
> 
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>   arch/arm/include/asm/topology.h   |  1 +
>   arch/arm64/include/asm/topology.h |  1 +
>   arch/riscv/include/asm/topology.h |  1 +
>   drivers/base/arch_topology.c      | 29 ++++++++++++++---------------
>   include/linux/arch_topology.h     |  7 +++++++
>   5 files changed, 24 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/arm/include/asm/topology.h b/arch/arm/include/asm/topology.h
> index c7d2510e5a78..853c4f81ba4a 100644
> --- a/arch/arm/include/asm/topology.h
> +++ b/arch/arm/include/asm/topology.h
> @@ -13,6 +13,7 @@
>   #define arch_set_freq_scale topology_set_freq_scale
>   #define arch_scale_freq_capacity topology_get_freq_scale
>   #define arch_scale_freq_invariant topology_scale_freq_invariant
> +#define arch_scale_freq_ref topology_get_freq_ref
>   #endif
>   
>   /* Replace task scheduler's default cpu-invariant accounting */
> diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
> index 9fab663dd2de..a323b109b9c4 100644
> --- a/arch/arm64/include/asm/topology.h
> +++ b/arch/arm64/include/asm/topology.h
> @@ -23,6 +23,7 @@ void update_freq_counters_refs(void);
>   #define arch_set_freq_scale topology_set_freq_scale
>   #define arch_scale_freq_capacity topology_get_freq_scale
>   #define arch_scale_freq_invariant topology_scale_freq_invariant
> +#define arch_scale_freq_ref topology_get_freq_ref
>   
>   #ifdef CONFIG_ACPI_CPPC_LIB
>   #define arch_init_invariance_cppc topology_init_cpu_capacity_cppc
> diff --git a/arch/riscv/include/asm/topology.h b/arch/riscv/include/asm/topology.h
> index e316ab3b77f3..61183688bdd5 100644
> --- a/arch/riscv/include/asm/topology.h
> +++ b/arch/riscv/include/asm/topology.h
> @@ -9,6 +9,7 @@
>   #define arch_set_freq_scale		topology_set_freq_scale
>   #define arch_scale_freq_capacity	topology_get_freq_scale
>   #define arch_scale_freq_invariant	topology_scale_freq_invariant
> +#define arch_scale_freq_ref		topology_get_freq_ref
>   
>   /* Replace task scheduler's default cpu-invariant accounting */
>   #define arch_scale_cpu_capacity	topology_get_cpu_scale
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index b741b5ba82bd..9a073c2d2086 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -19,6 +19,7 @@
>   #include <linux/init.h>
>   #include <linux/rcupdate.h>
>   #include <linux/sched.h>
> +#include <linux/units.h>
>   
>   #define CREATE_TRACE_POINTS
>   #include <trace/events/thermal_pressure.h>
> @@ -26,7 +27,8 @@
>   static DEFINE_PER_CPU(struct scale_freq_data __rcu *, sft_data);
>   static struct cpumask scale_freq_counters_mask;
>   static bool scale_freq_invariant;
> -static DEFINE_PER_CPU(u32, freq_factor) = 1;
> +DEFINE_PER_CPU(unsigned long, capacity_ref_freq) = 1;
> +EXPORT_PER_CPU_SYMBOL_GPL(capacity_ref_freq);
>   
>   static bool supports_scale_freq_counters(const struct cpumask *cpus)
>   {
> @@ -170,9 +172,9 @@ DEFINE_PER_CPU(unsigned long, thermal_pressure);
>    * operating on stale data when hot-plug is used for some CPUs. The
>    * @capped_freq reflects the currently allowed max CPUs frequency due to
>    * thermal capping. It might be also a boost frequency value, which is bigger
> - * than the internal 'freq_factor' max frequency. In such case the pressure
> - * value should simply be removed, since this is an indication that there is
> - * no thermal throttling. The @capped_freq must be provided in kHz.
> + * than the internal 'capacity_ref_freq' max frequency. In such case the
> + * pressure value should simply be removed, since this is an indication that
> + * there is no thermal throttling. The @capped_freq must be provided in kHz.
>    */
>   void topology_update_thermal_pressure(const struct cpumask *cpus,
>   				      unsigned long capped_freq)
> @@ -183,10 +185,7 @@ void topology_update_thermal_pressure(const struct cpumask *cpus,
>   
>   	cpu = cpumask_first(cpus);
>   	max_capacity = arch_scale_cpu_capacity(cpu);
> -	max_freq = per_cpu(freq_factor, cpu);
> -
> -	/* Convert to MHz scale which is used in 'freq_factor' */
> -	capped_freq /= 1000;
> +	max_freq = arch_scale_freq_ref(cpu);
>   
>   	/*
>   	 * Handle properly the boost frequencies, which should simply clean
> @@ -279,13 +278,13 @@ void topology_normalize_cpu_scale(void)
>   
>   	capacity_scale = 1;
>   	for_each_possible_cpu(cpu) {
> -		capacity = raw_capacity[cpu] * per_cpu(freq_factor, cpu);
> +		capacity = raw_capacity[cpu] * per_cpu(capacity_ref_freq, cpu);
>   		capacity_scale = max(capacity, capacity_scale);
>   	}
>   
>   	pr_debug("cpu_capacity: capacity_scale=%llu\n", capacity_scale);
>   	for_each_possible_cpu(cpu) {
> -		capacity = raw_capacity[cpu] * per_cpu(freq_factor, cpu);
> +		capacity = raw_capacity[cpu] * per_cpu(capacity_ref_freq, cpu);
>   		capacity = div64_u64(capacity << SCHED_CAPACITY_SHIFT,
>   			capacity_scale);
>   		topology_set_cpu_scale(cpu, capacity);
> @@ -321,15 +320,15 @@ bool __init topology_parse_cpu_capacity(struct device_node *cpu_node, int cpu)
>   			cpu_node, raw_capacity[cpu]);
>   
>   		/*
> -		 * Update freq_factor for calculating early boot cpu capacities.
> +		 * Update capacity_ref_freq for calculating early boot cpu capacities.
>   		 * For non-clk CPU DVFS mechanism, there's no way to get the
>   		 * frequency value now, assuming they are running at the same
> -		 * frequency (by keeping the initial freq_factor value).
> +		 * frequency (by keeping the initial capacity_ref_freq value).
>   		 */
>   		cpu_clk = of_clk_get(cpu_node, 0);
>   		if (!PTR_ERR_OR_ZERO(cpu_clk)) {
> -			per_cpu(freq_factor, cpu) =
> -				clk_get_rate(cpu_clk) / 1000;
> +			per_cpu(capacity_ref_freq, cpu) =
> +				clk_get_rate(cpu_clk) / HZ_PER_KHZ;
>   			clk_put(cpu_clk);
>   		}
>   	} else {
> @@ -411,7 +410,7 @@ init_cpu_capacity_callback(struct notifier_block *nb,
>   	cpumask_andnot(cpus_to_visit, cpus_to_visit, policy->related_cpus);
>   
>   	for_each_cpu(cpu, policy->related_cpus)
> -		per_cpu(freq_factor, cpu) = policy->cpuinfo.max_freq / 1000;
> +		per_cpu(capacity_ref_freq, cpu) = policy->cpuinfo.max_freq;

It seems init_cpu_capacity_callback() is only called when a policy is created
(cf. CPUFREQ_CREATE_POLICY). This means that CPU capacities are only updated
in this specific case, missing the cases where:
- boost is enabled. 'policy->cpuinfo.max_freq' is updated in the cpufreq driver,
   but it seems it doesn't have any consequence regarding CPU capacities.
- a cpufreq driver is unplugged (e.g. rmmod cpufreq_driver.ko). In this case
   the CPU capacities are only updated when plugging in the driver (e.g. insmod
   cpufreq_driver.ko).

Regards,
Pierre


>   
>   	if (cpumask_empty(cpus_to_visit)) {
>   		topology_normalize_cpu_scale();
> diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
> index a07b510e7dc5..38ca6c76af56 100644
> --- a/include/linux/arch_topology.h
> +++ b/include/linux/arch_topology.h
> @@ -27,6 +27,13 @@ static inline unsigned long topology_get_cpu_scale(int cpu)
>   
>   void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity);
>   
> +DECLARE_PER_CPU(unsigned long, capacity_ref_freq);
> +
> +static inline unsigned long topology_get_freq_ref(int cpu)
> +{
> +	return per_cpu(capacity_ref_freq, cpu);
> +}
> +
>   DECLARE_PER_CPU(unsigned long, arch_freq_scale);
>   
>   static inline unsigned long topology_get_freq_scale(int cpu)

Vincent Guittot Oct. 11, 2023, 1:48 p.m. UTC | #2

On Wed, 11 Oct 2023 at 12:27, Pierre Gondois <pierre.gondois@arm.com> wrote:
>
> Hello Vincent,
>
> On 10/9/23 12:36, Vincent Guittot wrote:
> > Create a new method to get a unique and fixed max frequency. Currently
> > cpuinfo.max_freq or the highest (or last) state of performance domain are
> > used as the max frequency when computing the frequency for a level of
> > utilization but:
> > - cpuinfo_max_freq can change at runtime. boost is one example of
> >    such change.
> > - cpuinfo.max_freq and last item of the PD can be different leading to
> >    different results between cpufreq and energy model.
> >
> > We need to save the reference frequency that has been used when computing
> > the CPUs capacity and use this fixed and coherent value to convert between
> > frequency and CPU's capacity.
> >
> > In fact, we already save the frequency that has been used when computing
> > the capacity of each CPU. We extend the precision to save khZ instead of
> > Mhz currently and we modify the type to be aligned with other variables
> > used when converting frequency to capacity and the other way.
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > ---
> >   arch/arm/include/asm/topology.h   |  1 +
> >   arch/arm64/include/asm/topology.h |  1 +
> >   arch/riscv/include/asm/topology.h |  1 +
> >   drivers/base/arch_topology.c      | 29 ++++++++++++++---------------
> >   include/linux/arch_topology.h     |  7 +++++++
> >   5 files changed, 24 insertions(+), 15 deletions(-)
> >
> > diff --git a/arch/arm/include/asm/topology.h b/arch/arm/include/asm/topology.h
> > index c7d2510e5a78..853c4f81ba4a 100644
> > --- a/arch/arm/include/asm/topology.h
> > +++ b/arch/arm/include/asm/topology.h
> > @@ -13,6 +13,7 @@
> >   #define arch_set_freq_scale topology_set_freq_scale
> >   #define arch_scale_freq_capacity topology_get_freq_scale
> >   #define arch_scale_freq_invariant topology_scale_freq_invariant
> > +#define arch_scale_freq_ref topology_get_freq_ref
> >   #endif
> >
> >   /* Replace task scheduler's default cpu-invariant accounting */
> > diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
> > index 9fab663dd2de..a323b109b9c4 100644
> > --- a/arch/arm64/include/asm/topology.h
> > +++ b/arch/arm64/include/asm/topology.h
> > @@ -23,6 +23,7 @@ void update_freq_counters_refs(void);
> >   #define arch_set_freq_scale topology_set_freq_scale
> >   #define arch_scale_freq_capacity topology_get_freq_scale
> >   #define arch_scale_freq_invariant topology_scale_freq_invariant
> > +#define arch_scale_freq_ref topology_get_freq_ref
> >
> >   #ifdef CONFIG_ACPI_CPPC_LIB
> >   #define arch_init_invariance_cppc topology_init_cpu_capacity_cppc
> > diff --git a/arch/riscv/include/asm/topology.h b/arch/riscv/include/asm/topology.h
> > index e316ab3b77f3..61183688bdd5 100644
> > --- a/arch/riscv/include/asm/topology.h
> > +++ b/arch/riscv/include/asm/topology.h
> > @@ -9,6 +9,7 @@
> >   #define arch_set_freq_scale         topology_set_freq_scale
> >   #define arch_scale_freq_capacity    topology_get_freq_scale
> >   #define arch_scale_freq_invariant   topology_scale_freq_invariant
> > +#define arch_scale_freq_ref          topology_get_freq_ref
> >
> >   /* Replace task scheduler's default cpu-invariant accounting */
> >   #define arch_scale_cpu_capacity     topology_get_cpu_scale
> > diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> > index b741b5ba82bd..9a073c2d2086 100644
> > --- a/drivers/base/arch_topology.c
> > +++ b/drivers/base/arch_topology.c
> > @@ -19,6 +19,7 @@
> >   #include <linux/init.h>
> >   #include <linux/rcupdate.h>
> >   #include <linux/sched.h>
> > +#include <linux/units.h>
> >
> >   #define CREATE_TRACE_POINTS
> >   #include <trace/events/thermal_pressure.h>
> > @@ -26,7 +27,8 @@
> >   static DEFINE_PER_CPU(struct scale_freq_data __rcu *, sft_data);
> >   static struct cpumask scale_freq_counters_mask;
> >   static bool scale_freq_invariant;
> > -static DEFINE_PER_CPU(u32, freq_factor) = 1;
> > +DEFINE_PER_CPU(unsigned long, capacity_ref_freq) = 1;
> > +EXPORT_PER_CPU_SYMBOL_GPL(capacity_ref_freq);
> >
> >   static bool supports_scale_freq_counters(const struct cpumask *cpus)
> >   {
> > @@ -170,9 +172,9 @@ DEFINE_PER_CPU(unsigned long, thermal_pressure);
> >    * operating on stale data when hot-plug is used for some CPUs. The
> >    * @capped_freq reflects the currently allowed max CPUs frequency due to
> >    * thermal capping. It might be also a boost frequency value, which is bigger
> > - * than the internal 'freq_factor' max frequency. In such case the pressure
> > - * value should simply be removed, since this is an indication that there is
> > - * no thermal throttling. The @capped_freq must be provided in kHz.
> > + * than the internal 'capacity_ref_freq' max frequency. In such case the
> > + * pressure value should simply be removed, since this is an indication that
> > + * there is no thermal throttling. The @capped_freq must be provided in kHz.
> >    */
> >   void topology_update_thermal_pressure(const struct cpumask *cpus,
> >                                     unsigned long capped_freq)
> > @@ -183,10 +185,7 @@ void topology_update_thermal_pressure(const struct cpumask *cpus,
> >
> >       cpu = cpumask_first(cpus);
> >       max_capacity = arch_scale_cpu_capacity(cpu);
> > -     max_freq = per_cpu(freq_factor, cpu);
> > -
> > -     /* Convert to MHz scale which is used in 'freq_factor' */
> > -     capped_freq /= 1000;
> > +     max_freq = arch_scale_freq_ref(cpu);
> >
> >       /*
> >        * Handle properly the boost frequencies, which should simply clean
> > @@ -279,13 +278,13 @@ void topology_normalize_cpu_scale(void)
> >
> >       capacity_scale = 1;
> >       for_each_possible_cpu(cpu) {
> > -             capacity = raw_capacity[cpu] * per_cpu(freq_factor, cpu);
> > +             capacity = raw_capacity[cpu] * per_cpu(capacity_ref_freq, cpu);
> >               capacity_scale = max(capacity, capacity_scale);
> >       }
> >
> >       pr_debug("cpu_capacity: capacity_scale=%llu\n", capacity_scale);
> >       for_each_possible_cpu(cpu) {
> > -             capacity = raw_capacity[cpu] * per_cpu(freq_factor, cpu);
> > +             capacity = raw_capacity[cpu] * per_cpu(capacity_ref_freq, cpu);
> >               capacity = div64_u64(capacity << SCHED_CAPACITY_SHIFT,
> >                       capacity_scale);
> >               topology_set_cpu_scale(cpu, capacity);
> > @@ -321,15 +320,15 @@ bool __init topology_parse_cpu_capacity(struct device_node *cpu_node, int cpu)
> >                       cpu_node, raw_capacity[cpu]);
> >
> >               /*
> > -              * Update freq_factor for calculating early boot cpu capacities.
> > +              * Update capacity_ref_freq for calculating early boot cpu capacities.
> >                * For non-clk CPU DVFS mechanism, there's no way to get the
> >                * frequency value now, assuming they are running at the same
> > -              * frequency (by keeping the initial freq_factor value).
> > +              * frequency (by keeping the initial capacity_ref_freq value).
> >                */
> >               cpu_clk = of_clk_get(cpu_node, 0);
> >               if (!PTR_ERR_OR_ZERO(cpu_clk)) {
> > -                     per_cpu(freq_factor, cpu) =
> > -                             clk_get_rate(cpu_clk) / 1000;
> > +                     per_cpu(capacity_ref_freq, cpu) =
> > +                             clk_get_rate(cpu_clk) / HZ_PER_KHZ;
> >                       clk_put(cpu_clk);
> >               }
> >       } else {
> > @@ -411,7 +410,7 @@ init_cpu_capacity_callback(struct notifier_block *nb,
> >       cpumask_andnot(cpus_to_visit, cpus_to_visit, policy->related_cpus);
> >
> >       for_each_cpu(cpu, policy->related_cpus)
> > -             per_cpu(freq_factor, cpu) = policy->cpuinfo.max_freq / 1000;
> > +             per_cpu(capacity_ref_freq, cpu) = policy->cpuinfo.max_freq;
>
> It seems init_cpu_capacity_callback() is only called when a policy is created
> (cf. CPUFREQ_CREATE_POLICY). This means that CPU capacities are only updated
> in this specific case, eluding the cases where:
> - boost is enabled. 'policy->cpuinfo.max_freq' is updated in the cpufreq driver,
>    but it seems it doesn't have any consequence regarding CPU capacities.

Yes, that's why we have to rely on something other than max_freq. And
one might not want to take boost into account in the compute capacity
because it's not sustainable. Nothing has changed on this side.

> - a cpufreq driver is unplugged (e.g. rmmod cpufreq_driver.ko). In this case
>    the CPU capacities are only updated when plugging in the driver (e.g. insmod
>    cpufreq_driver.ko).

same here

>
> Regards,
> Pierre
>
>
> >
> >       if (cpumask_empty(cpus_to_visit)) {
> >               topology_normalize_cpu_scale();
> > diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
> > index a07b510e7dc5..38ca6c76af56 100644
> > --- a/include/linux/arch_topology.h
> > +++ b/include/linux/arch_topology.h
> > @@ -27,6 +27,13 @@ static inline unsigned long topology_get_cpu_scale(int cpu)
> >
> >   void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity);
> >
> > +DECLARE_PER_CPU(unsigned long, capacity_ref_freq);
> > +
> > +static inline unsigned long topology_get_freq_ref(int cpu)
> > +{
> > +     return per_cpu(capacity_ref_freq, cpu);
> > +}
> > +
> >   DECLARE_PER_CPU(unsigned long, arch_freq_scale);
> >
> >   static inline unsigned long topology_get_freq_scale(int cpu)

Lukasz Luba Oct. 18, 2023, 11:06 a.m. UTC | #3

On 10/9/23 11:36, Vincent Guittot wrote:
> Create a new method to get a unique and fixed max frequency. Currently
> cpuinfo.max_freq or the highest (or last) state of performance domain are
> used as the max frequency when computing the frequency for a level of
> utilization but:
> - cpuinfo_max_freq can change at runtime. boost is one example of
>    such change.
> - cpuinfo.max_freq and last item of the PD can be different leading to
>    different results between cpufreq and energy model.
> 
> We need to save the reference frequency that has been used when computing
> the CPUs capacity and use this fixed and coherent value to convert between
> frequency and CPU's capacity.
> 
> In fact, we already save the frequency that has been used when computing
> the capacity of each CPU. We extend the precision to save khZ instead of
> Mhz currently and we modify the type to be aligned with other variables
> used when converting frequency to capacity and the other way.
> 
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>   arch/arm/include/asm/topology.h   |  1 +
>   arch/arm64/include/asm/topology.h |  1 +
>   arch/riscv/include/asm/topology.h |  1 +
>   drivers/base/arch_topology.c      | 29 ++++++++++++++---------------
>   include/linux/arch_topology.h     |  7 +++++++
>   5 files changed, 24 insertions(+), 15 deletions(-)

[snip]

> @@ -170,9 +172,9 @@ DEFINE_PER_CPU(unsigned long, thermal_pressure);
>    * operating on stale data when hot-plug is used for some CPUs. The
>    * @capped_freq reflects the currently allowed max CPUs frequency due to
>    * thermal capping. It might be also a boost frequency value, which is bigger
> - * than the internal 'freq_factor' max frequency. In such case the pressure
> - * value should simply be removed, since this is an indication that there is
> - * no thermal throttling. The @capped_freq must be provided in kHz.
> + * than the internal 'capacity_ref_freq' max frequency. In such case the
> + * pressure value should simply be removed, since this is an indication that
> + * there is no thermal throttling. The @capped_freq must be provided in kHz.
>    */
>   void topology_update_thermal_pressure(const struct cpumask *cpus,
>   				      unsigned long capped_freq)
> @@ -183,10 +185,7 @@ void topology_update_thermal_pressure(const struct cpumask *cpus,
>   
>   	cpu = cpumask_first(cpus);
>   	max_capacity = arch_scale_cpu_capacity(cpu);
> -	max_freq = per_cpu(freq_factor, cpu);
> -
> -	/* Convert to MHz scale which is used in 'freq_factor' */
> -	capped_freq /= 1000;

I do like this small speed-up and I'm happy that it's possible with this
new design. (IIRC some of your platforms can update the thermal
pressure many times per second, which might be the reason why
you see some of the speed-ups mentioned in the cover letter.)

> +	max_freq = arch_scale_freq_ref(cpu);
>   


LGTM

Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
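
As a rough illustration of the hunk discussed in the review above, the
thermal pressure computation can be sketched in userspace as follows. The
function name and the plain 64-bit arithmetic are assumptions made for the
example; the kernel function operates on a cpumask and per-CPU data. The
point is that both frequencies are already in kHz, so no per-call division
by 1000 is needed.

```c
#include <assert.h>

/*
 * Hypothetical model of the thermal pressure computation. Both
 * frequencies are in kHz, so they can be compared and divided directly.
 */
static unsigned long thermal_pressure(unsigned long max_capacity,
				      unsigned long capped_freq_khz,
				      unsigned long ref_freq_khz)
{
	unsigned long long capacity;

	assert(ref_freq_khz != 0);

	/* A cap at or above the reference (e.g. boost) means no pressure. */
	if (ref_freq_khz <= capped_freq_khz)
		return 0;

	capacity = (unsigned long long)max_capacity * capped_freq_khz /
		   ref_freq_khz;
	return max_capacity - (unsigned long)capacity;
}
```

For a CPU of capacity 1024 with a 2000000 kHz reference, a cap at
1500000 kHz leaves a capacity of 768 and therefore a pressure of 256; a
cap at a boost frequency of 2400000 kHz yields a pressure of 0.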

diff --git a/arch/arm/include/asm/topology.h b/arch/arm/include/asm/topology.h
index c7d2510e5a78..853c4f81ba4a 100644
--- a/arch/arm/include/asm/topology.h
+++ b/arch/arm/include/asm/topology.h
@@ -13,6 +13,7 @@ 
 #define arch_set_freq_scale topology_set_freq_scale
 #define arch_scale_freq_capacity topology_get_freq_scale
 #define arch_scale_freq_invariant topology_scale_freq_invariant
+#define arch_scale_freq_ref topology_get_freq_ref
 #endif
 
 /* Replace task scheduler's default cpu-invariant accounting */
diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index 9fab663dd2de..a323b109b9c4 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -23,6 +23,7 @@  void update_freq_counters_refs(void);
 #define arch_set_freq_scale topology_set_freq_scale
 #define arch_scale_freq_capacity topology_get_freq_scale
 #define arch_scale_freq_invariant topology_scale_freq_invariant
+#define arch_scale_freq_ref topology_get_freq_ref
 
 #ifdef CONFIG_ACPI_CPPC_LIB
 #define arch_init_invariance_cppc topology_init_cpu_capacity_cppc
diff --git a/arch/riscv/include/asm/topology.h b/arch/riscv/include/asm/topology.h
index e316ab3b77f3..61183688bdd5 100644
--- a/arch/riscv/include/asm/topology.h
+++ b/arch/riscv/include/asm/topology.h
@@ -9,6 +9,7 @@ 
 #define arch_set_freq_scale		topology_set_freq_scale
 #define arch_scale_freq_capacity	topology_get_freq_scale
 #define arch_scale_freq_invariant	topology_scale_freq_invariant
+#define arch_scale_freq_ref		topology_get_freq_ref
 
 /* Replace task scheduler's default cpu-invariant accounting */
 #define arch_scale_cpu_capacity	topology_get_cpu_scale
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index b741b5ba82bd..9a073c2d2086 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -19,6 +19,7 @@ 
 #include <linux/init.h>
 #include <linux/rcupdate.h>
 #include <linux/sched.h>
+#include <linux/units.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/thermal_pressure.h>
@@ -26,7 +27,8 @@ 
 static DEFINE_PER_CPU(struct scale_freq_data __rcu *, sft_data);
 static struct cpumask scale_freq_counters_mask;
 static bool scale_freq_invariant;
-static DEFINE_PER_CPU(u32, freq_factor) = 1;
+DEFINE_PER_CPU(unsigned long, capacity_ref_freq) = 1;
+EXPORT_PER_CPU_SYMBOL_GPL(capacity_ref_freq);
 
 static bool supports_scale_freq_counters(const struct cpumask *cpus)
 {
@@ -170,9 +172,9 @@  DEFINE_PER_CPU(unsigned long, thermal_pressure);
  * operating on stale data when hot-plug is used for some CPUs. The
  * @capped_freq reflects the currently allowed max CPUs frequency due to
  * thermal capping. It might be also a boost frequency value, which is bigger
- * than the internal 'freq_factor' max frequency. In such case the pressure
- * value should simply be removed, since this is an indication that there is
- * no thermal throttling. The @capped_freq must be provided in kHz.
+ * than the internal 'capacity_ref_freq' max frequency. In such case the
+ * pressure value should simply be removed, since this is an indication that
+ * there is no thermal throttling. The @capped_freq must be provided in kHz.
  */
 void topology_update_thermal_pressure(const struct cpumask *cpus,
 				      unsigned long capped_freq)
@@ -183,10 +185,7 @@  void topology_update_thermal_pressure(const struct cpumask *cpus,
 
 	cpu = cpumask_first(cpus);
 	max_capacity = arch_scale_cpu_capacity(cpu);
-	max_freq = per_cpu(freq_factor, cpu);
-
-	/* Convert to MHz scale which is used in 'freq_factor' */
-	capped_freq /= 1000;
+	max_freq = arch_scale_freq_ref(cpu);
 
 	/*
 	 * Handle properly the boost frequencies, which should simply clean
@@ -279,13 +278,13 @@  void topology_normalize_cpu_scale(void)
 
 	capacity_scale = 1;
 	for_each_possible_cpu(cpu) {
-		capacity = raw_capacity[cpu] * per_cpu(freq_factor, cpu);
+		capacity = raw_capacity[cpu] * per_cpu(capacity_ref_freq, cpu);
 		capacity_scale = max(capacity, capacity_scale);
 	}
 
 	pr_debug("cpu_capacity: capacity_scale=%llu\n", capacity_scale);
 	for_each_possible_cpu(cpu) {
-		capacity = raw_capacity[cpu] * per_cpu(freq_factor, cpu);
+		capacity = raw_capacity[cpu] * per_cpu(capacity_ref_freq, cpu);
 		capacity = div64_u64(capacity << SCHED_CAPACITY_SHIFT,
 			capacity_scale);
 		topology_set_cpu_scale(cpu, capacity);
@@ -321,15 +320,15 @@  bool __init topology_parse_cpu_capacity(struct device_node *cpu_node, int cpu)
 			cpu_node, raw_capacity[cpu]);
 
 		/*
-		 * Update freq_factor for calculating early boot cpu capacities.
+		 * Update capacity_ref_freq for calculating early boot cpu capacities.
 		 * For non-clk CPU DVFS mechanism, there's no way to get the
 		 * frequency value now, assuming they are running at the same
-		 * frequency (by keeping the initial freq_factor value).
+		 * frequency (by keeping the initial capacity_ref_freq value).
 		 */
 		cpu_clk = of_clk_get(cpu_node, 0);
 		if (!PTR_ERR_OR_ZERO(cpu_clk)) {
-			per_cpu(freq_factor, cpu) =
-				clk_get_rate(cpu_clk) / 1000;
+			per_cpu(capacity_ref_freq, cpu) =
+				clk_get_rate(cpu_clk) / HZ_PER_KHZ;
 			clk_put(cpu_clk);
 		}
 	} else {
@@ -411,7 +410,7 @@  init_cpu_capacity_callback(struct notifier_block *nb,
 	cpumask_andnot(cpus_to_visit, cpus_to_visit, policy->related_cpus);
 
 	for_each_cpu(cpu, policy->related_cpus)
-		per_cpu(freq_factor, cpu) = policy->cpuinfo.max_freq / 1000;
+		per_cpu(capacity_ref_freq, cpu) = policy->cpuinfo.max_freq;
 
 	if (cpumask_empty(cpus_to_visit)) {
 		topology_normalize_cpu_scale();
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index a07b510e7dc5..38ca6c76af56 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -27,6 +27,13 @@  static inline unsigned long topology_get_cpu_scale(int cpu)
 
 void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity);
 
+DECLARE_PER_CPU(unsigned long, capacity_ref_freq);
+
+static inline unsigned long topology_get_freq_ref(int cpu)
+{
+	return per_cpu(capacity_ref_freq, cpu);
+}
+
 DECLARE_PER_CPU(unsigned long, arch_freq_scale);
 
 static inline unsigned long topology_get_freq_scale(int cpu)
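
The normalization loop touched by the patch can be modelled in userspace
as shown below. This is a sketch under stated assumptions, not the kernel
code: the arrays stand in for raw_capacity[] and the per-CPU
capacity_ref_freq, and CAPACITY_SHIFT mirrors SCHED_CAPACITY_SHIFT with an
assumed value of 10. It shows that the reference frequency only enters as
a ratio between CPUs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CAPACITY_SHIFT 10	/* assumed to match SCHED_CAPACITY_SHIFT */

/*
 * Hypothetical model of topology_normalize_cpu_scale(): scale each CPU's
 * raw_capacity * ref_freq product so that the largest one maps to 1024.
 */
static void normalize_cpu_scale(const uint32_t *raw_capacity,
				const unsigned long *ref_freq,
				unsigned long *scale, size_t nr_cpus)
{
	uint64_t capacity, capacity_scale = 1;
	size_t cpu;

	assert(raw_capacity && ref_freq && scale);

	/* First pass: find the largest raw_capacity * ref_freq product. */
	for (cpu = 0; cpu < nr_cpus; cpu++) {
		capacity = (uint64_t)raw_capacity[cpu] * ref_freq[cpu];
		if (capacity > capacity_scale)
			capacity_scale = capacity;
	}

	/* Second pass: normalize every CPU against that maximum. */
	for (cpu = 0; cpu < nr_cpus; cpu++) {
		capacity = (uint64_t)raw_capacity[cpu] * ref_freq[cpu];
		scale[cpu] = (unsigned long)((capacity << CAPACITY_SHIFT) /
					     capacity_scale);
	}
}
```

With raw capacities {1024, 446} and reference frequencies of
{2000000, 1500000} kHz, the result is {1024, 334}. Passing the same
frequencies as {2000, 1500} MHz gives the same values in this case, since
only the ratio matters; kHz simply preserves frequencies that are not
whole multiples of 1 MHz.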
