[v10,4/4] arm64: topology: Provide relative power numbers for cores

Message ID 1389287106-29979-5-git-send-email-broonie@kernel.org (mailing list archive)
State New, archived
Commit Message

Mark Brown Jan. 9, 2014, 5:05 p.m. UTC
From: Mark Brown <broonie@linaro.org>

Provide performance numbers to the scheduler to help it fill the cores in
the system on big.LITTLE systems. With the current scheduler this may
perform poorly for applications that try to do OpenMP style work over all
cores but should help for more common workloads.

The power numbers are the same as for ARMv7 since it seems that the
expected differential between the big and little cores is very similar on
both ARMv7 and ARMv8. These numbers are just an initial and basic
approximation for use with the current scheduler, it is likely that both
experience with silicon and ongoing work on improving the scheduler will
lead to further tuning. In both ARMv7 and ARMv8 cases the numbers were
based on the published DMIPS numbers.

Signed-off-by: Mark Brown <broonie@linaro.org>
---
 arch/arm64/kernel/topology.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Paul Walmsley Jan. 14, 2014, 1:16 a.m. UTC | #1
Hi

On 01/09/2014 09:05 AM, Mark Brown wrote:
> The power numbers are the same as for ARMv7 since it seems that the
> expected differential between the big and little cores is very similar on
> both ARMv7 and ARMv8. These numbers are just an initial and basic
> approximation for use with the current scheduler, it is likely that both
> experience with silicon and ongoing work on improving the scheduler will
> lead to further tuning. In both ARMv7 and ARMv8 cases the numbers were
> based on the published DMIPS numbers.

Just to chime in on this.

As I think someone else proposed, ideally both arch/arm32 and arch/arm64 
should be determining these metrics dynamically, and allowing the 
metrics to be overridden from DT data.  These metrics can vary 
significantly based on variables other than just the core type.

For example, for some ARMv7 CPUs, we know that different revision levels 
of the same core type can result in significant performance 
differences.  Also CPU bug workarounds applied early by the bootloader, 
or different integrator hardware core configurations, can impact these 
metrics.


- Paul
Mark Brown Jan. 14, 2014, 1:43 p.m. UTC | #2
On Mon, Jan 13, 2014 at 05:16:41PM -0800, Paul Walmsley wrote:

> As I think someone else proposed, ideally both arch/arm32 and
> arch/arm64 should be determining these metrics dynamically, and
> allowing the metrics to be overridden from DT data.  These metrics
> can vary significantly based on variables other than just the core
> type.

> For example, for some ARMv7 CPUs, we know that different revision
> levels of the same core type can result in significant performance
> differences.  Also CPU bug workarounds applied early by the
> bootloader, or different integrator hardware core configurations,
> can impact these metrics.

So, determining any numbers dynamically is obviously the ideal (and
shouldn't really be an architecture thing - there is nothing obviously
architecture specific about that) and we ought to take things like core
revision differences into account.

However doing this by putting specific tunable numbers into the DT isn't
good practice since the implementation that interprets those numbers is
subject to change.  The numbers are basically just benchmark numbers,
with all the specificity that this entails.  Saying "this is a core
with rating X" isn't robust for the long term since it's not really
objective information and is heavily tied to the specific software
doing the parsing (and most likely also some particular application
stack that was being tuned).

The best practice is therefore for any relevant information that needs
to be factored in and can't be read from the silicon at runtime to be
put in the DT in a way that describes something objective like the core
revisions.  This lets the system take account of these factors in a way
that makes sense within the implementation rather than having to try to
interpret numbers which may not apply well to the current implementation
and may hide information which would help the implementation tune itself.

The other option here is runtime tunables via sysfs or whatever which
avoids baking numbers into the DT in the same way, though that is partly
just a punt of the same problem to userspace.  That's probably also more
helpful to people doing the tuning since it can be tweaked dynamically.
Mark Brown Jan. 14, 2014, 7:16 p.m. UTC | #3
On Tue, Jan 14, 2014 at 01:43:10PM +0000, Mark Brown wrote:

> The other option here is runtime tunables via sysfs or whatever which
> avoids baking numbers into the DT in the same way, though that is partly
> just a punt of the same problem to userspace.  That's probably also more
> helpful to people doing the tuning since it can be tweaked dynamically.

One other thing on this which occurred to me later on: if we do want to
make these values directly tunable we probably want to do that from the
scheduler rather than individual architectures.  These are scheduler
interfaces and the issues which make tuning desirable are essentially
architecture neutral ones.
Patch

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 1f086fc7665d..022ad0905a2a 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -177,6 +177,8 @@  struct cpu_efficiency {
  * use the default SCHED_POWER_SCALE value for cpu_scale.
  */
 static const struct cpu_efficiency table_efficiency[] = {
+	{ "arm,cortex-a57", 3891 },
+	{ "arm,cortex-a53", 2048 },
 	{ NULL, },
 };
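
To make the two added values easier to compare, here is a minimal
standalone sketch of a lookup over a table of the same shape, plus a ratio
between two entries. It is not the kernel's code: lookup_efficiency() and
relative_scale() are hypothetical helpers introduced only for illustration,
and the 1/1024 fixed-point unit is an arbitrary choice for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same shape as the table in the patch: a compatible string paired with
 * a relative efficiency figure. */
struct cpu_efficiency {
	const char *compatible;
	unsigned long efficiency;
};

static const struct cpu_efficiency table_efficiency[] = {
	{ "arm,cortex-a57", 3891 },
	{ "arm,cortex-a53", 2048 },
	{ NULL, 0 },
};

/* Hypothetical helper: return the efficiency for a compatible string,
 * or 0 if the core is not listed in the table. */
static unsigned long lookup_efficiency(const char *compatible)
{
	const struct cpu_efficiency *e;

	for (e = table_efficiency; e->compatible; e++)
		if (strcmp(e->compatible, compatible) == 0)
			return e->efficiency;
	return 0;
}

/* Hypothetical helper: express one core's efficiency relative to a
 * reference core in units of 1/1024 (1024 means "equal"), using
 * integer arithmetic only. */
static unsigned long relative_scale(const char *core, const char *reference)
{
	unsigned long ref = lookup_efficiency(reference);

	assert(ref != 0);	/* the reference core must be in the table */
	return lookup_efficiency(core) * 1024 / ref;
}
```

With the values from the patch, relative_scale("arm,cortex-a57",
"arm,cortex-a53") evaluates to 1945, i.e. the A57 is rated at roughly 1.9
times the A53.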