Message ID | 1389287106-29979-5-git-send-email-broonie@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Hi

On 01/09/2014 09:05 AM, Mark Brown wrote:
> The power numbers are the same as for ARMv7 since it seems that the
> expected differential between the big and little cores is very similar on
> both ARMv7 and ARMv8. These numbers are just an initial and basic
> approximation for use with the current scheduler, it is likely that both
> experience with silicon and ongoing work on improving the scheduler will
> lead to further tuning. In both ARMv7 and ARMv8 cases the numbers were
> based on the published DMIPS numbers.

Just to chime in on this. As I think someone else proposed, ideally both arch/arm32 and arch/arm64 should be determining these metrics dynamically, and allowing the metrics to be overridden from DT data. These metrics can vary significantly based on variables other than just the core type.

For example, for some ARMv7 CPUs, we know that different revision levels of the same core type can result in significant performance differences. Also CPU bug workarounds applied early by the bootloader, or different integrator hardware core configurations, can impact these metrics.

- Paul
On Mon, Jan 13, 2014 at 05:16:41PM -0800, Paul Walmsley wrote:
> As I think someone else proposed, ideally both arch/arm32 and
> arch/arm64 should be determining these metrics dynamically, and
> allowing the metrics to be overridden from DT data. These metrics
> can vary significantly based on variables other than just the core
> type.

> For example, for some ARMv7 CPUs, we know that different revision
> levels of the same core type can result in significant performance
> differences. Also CPU bug workarounds applied early by the
> bootloader, or different integrator hardware core configurations,
> can impact these metrics.

So, determining any numbers dynamically is obviously the ideal (and shouldn't really be an architecture thing - there is nothing obviously architecture specific about that) and we ought to take things like core revision differences into account.

However, doing this by putting specific tunable numbers into the DT isn't good practice since the implementation that interprets those numbers is subject to change. The numbers are basically just benchmark numbers with all the specificness and so on that this entails. Saying "this is a core with rating X" isn't robust for the long term since it's not really objective information and is heavily tied to the specific software doing the parsing (and most likely also some particular application stack that was being tuned).

The best practice is therefore for any relevant information that needs to be factored in and can't be read from the silicon at runtime to be put in the DT in a way that describes something objective like the core revisions. This lets the system take account of these factors in a way that makes sense within the implementation rather than having to try to interpret numbers which may not apply well to the current implementation and may hide information which would help the implementation tune itself.
The other option here is runtime tunables via sysfs or whatever which avoids baking numbers into the DT in the same way, though that is partly just a punt of the same problem to userspace. That's probably also more helpful to people doing the tuning since it can be tweaked dynamically.
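For readers following the argument in the reply above, here is a minimal user-space sketch of the split it describes: the device tree states only objective facts (core type and revision), and the implementation keeps its own rating table that can be retuned without touching any DT. Everything in it is an assumption made for illustration: the names `core_desc`, `rating_rule`, `rate_core` and `DEFAULT_RATING`, the compatible string `vendor,example-core`, and the rating values, which are placeholders and not measurements. It is not kernel code and not something proposed in the thread.

```c
#include <stddef.h>
#include <string.h>

/* Objective facts a DT node could state: which core, and which revision. */
struct core_desc {
	const char *compatible;
	unsigned int revision;
};

/*
 * Implementation-private rating rules (hypothetical). Because they live in
 * the implementation, they can change whenever the scheduler changes.
 */
struct rating_rule {
	const char *compatible;
	unsigned int min_revision;	/* rule applies from this revision on */
	unsigned long rating;		/* placeholder value, not a measurement */
};

const struct rating_rule rating_rules[] = {
	{ "vendor,example-core", 0, 1000 },
	{ "vendor,example-core", 2, 1200 },
	{ NULL, 0, 0 },
};

/* Assumed fallback for cores with no matching rule. */
#define DEFAULT_RATING 1024UL

/* Pick the matching rule with the highest min_revision the core satisfies. */
unsigned long rate_core(const struct core_desc *core)
{
	const struct rating_rule *r;
	const struct rating_rule *best = NULL;

	for (r = rating_rules; r->compatible != NULL; r++) {
		if (strcmp(r->compatible, core->compatible) != 0)
			continue;
		if (core->revision < r->min_revision)
			continue;
		if (best == NULL || r->min_revision >= best->min_revision)
			best = r;
	}
	return best != NULL ? best->rating : DEFAULT_RATING;
}
```

With these placeholder rules, a revision 1 part would get the base rating, a revision 3 part would get the later-revision rating, and an unlisted core would fall back to the default. The point of the design is that only `rating_rules` has to change when the interpretation changes; the hardware description stays valid.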
On Tue, Jan 14, 2014 at 01:43:10PM +0000, Mark Brown wrote:
> The other option here is runtime tunables via sysfs or whatever which
> avoids baking numbers into the DT in the same way, though that is partly
> just a punt of the same problem to userspace. That's probably also more
> helpful to people doing the tuning since it can be tweaked dynamically.

One other thing on this which occurred to me later on: if we do want to make these values directly tunable we probably want to do that from the scheduler rather than individual architectures. These are scheduler interfaces and the issues which make tuning desirable are essentially architecture neutral ones.
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 1f086fc7665d..022ad0905a2a 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -177,6 +177,8 @@ struct cpu_efficiency {
  * use the default SCHED_POWER_SCALE value for cpu_scale.
  */
 static const struct cpu_efficiency table_efficiency[] = {
+	{ "arm,cortex-a57", 3891 },
+	{ "arm,cortex-a53", 2048 },
 	{ NULL, },
 };
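To make the effect of the two new entries concrete, the following is a stand-alone user-space sketch of a lookup over a table of this shape. The two values are the ones in the patch; the field names and the helper `lookup_efficiency` are assumptions for illustration, and the real kernel code matches device tree nodes rather than plain strings.

```c
#include <stddef.h>
#include <string.h>

/* Same shape as the table in the patch; the field names are assumed. */
struct cpu_efficiency {
	const char *compatible;
	unsigned long efficiency;
};

const struct cpu_efficiency table_efficiency[] = {
	{ "arm,cortex-a57", 3891 },
	{ "arm,cortex-a53", 2048 },
	{ NULL, 0 },
};

/*
 * Hypothetical helper: walk the table until the NULL sentinel and return
 * the efficiency for a compatible string, or 0 if the core is not listed.
 * A caller would treat 0 as "no data" and keep the default scale.
 */
unsigned long lookup_efficiency(const char *compatible)
{
	const struct cpu_efficiency *e;

	for (e = table_efficiency; e->compatible != NULL; e++) {
		if (strcmp(e->compatible, compatible) == 0)
			return e->efficiency;
	}
	return 0;
}
```

With these entries, `lookup_efficiency("arm,cortex-a57")` returns 3891 and `lookup_efficiency("arm,cortex-a53")` returns 2048, a ratio of roughly 1.9 between the big and little cores. That ratio is the "differential" referred to in the patch description; the absolute values only matter relative to each other.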