From patchwork Tue May 12 19:38:41 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Morten Rasmussen <morten.rasmussen@arm.com>
X-Patchwork-Id: 6391061
Return-Path: <linux-pm-owner@kernel.org>
X-Original-To: patchwork-linux-pm@patchwork.kernel.org
Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.136])
	by patchwork1.web.kernel.org (Postfix) with ESMTP id 964A79F32B
	for <patchwork-linux-pm@patchwork.kernel.org>;
	Tue, 12 May 2015 19:46:24 +0000 (UTC)
Received: from mail.kernel.org (localhost [127.0.0.1])
	by mail.kernel.org (Postfix) with ESMTP id 8D209203AB
	for <patchwork-linux-pm@patchwork.kernel.org>;
	Tue, 12 May 2015 19:46:23 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id 5AFD5200D4
	for <patchwork-linux-pm@patchwork.kernel.org>;
	Tue, 12 May 2015 19:46:21 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S933734AbbELTiF (ORCPT
	<rfc822;patchwork-linux-pm@patchwork.kernel.org>);
	Tue, 12 May 2015 15:38:05 -0400
Received: from foss.arm.com ([217.140.101.70]:33773 "EHLO foss.arm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S933733AbbELTiC (ORCPT <rfc822;linux-pm@vger.kernel.org>);
	Tue, 12 May 2015 15:38:02 -0400
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D949A69B;
	Tue, 12 May 2015 12:37:24 -0700 (PDT)
Received: from e105550-lin.cambridge.arm.com (e105550-lin.cambridge.arm.com
	[10.2.131.193])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id
	22E693F218; Tue, 12 May 2015 12:37:58 -0700 (PDT)
From: Morten Rasmussen <morten.rasmussen@arm.com>
To: peterz@infradead.org, mingo@redhat.com
Cc: vincent.guittot@linaro.org, Dietmar Eggemann <Dietmar.Eggemann@arm.com>,
	yuyang.du@intel.com, preeti@linux.vnet.ibm.com,
	mturquette@linaro.org, rjw@rjwysocki.net,
	Juri Lelli <Juri.Lelli@arm.com>, sgurrappadi@nvidia.com,
	pang.xunlei@zte.com.cn, linux-kernel@vger.kernel.org,
	linux-pm@vger.kernel.org, morten.rasmussen@arm.com
Subject: [RFCv4 PATCH 06/34] sched: Make usage tracking cpu scale-invariant
Date: Tue, 12 May 2015 20:38:41 +0100
Message-Id: <1431459549-18343-7-git-send-email-morten.rasmussen@arm.com>
X-Mailer: git-send-email 1.9.1
In-Reply-To: <1431459549-18343-1-git-send-email-morten.rasmussen@arm.com>
References: <1431459549-18343-1-git-send-email-morten.rasmussen@arm.com>
Sender: linux-pm-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-pm.vger.kernel.org>
X-Mailing-List: linux-pm@vger.kernel.org
X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI,
	T_RP_MATCHES_RCVD,
	UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

From: Dietmar Eggemann <dietmar.eggemann@arm.com>

Besides the existing frequency scale-invariance correction factor, apply
a cpu scale-invariance correction factor to usage tracking.

Cpu scale-invariance takes into consideration cpu performance deviations
due to micro-architectural differences (i.e. instructions per second)
between cpus in HMP systems (e.g. big.LITTLE) and differences in the
frequency value of the highest OPP between cpus in SMP systems.

Each segment of the sched_avg::running_avg_sum geometric series is now
also scaled by the cpu performance factor, so the
sched_avg::utilization_avg_contrib of each entity is independent of the
particular cpu of the HMP/SMP system it is gathered on.

As a result, the usage level returned by get_cpu_usage() stays relative
to the max cpu performance of the system.
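
The effect of the extra scaling step can be sketched in user space as
below. This is only an illustration, not the kernel code:
scale_by_cpu_capacity() is a hypothetical helper, and the capacity and
running-time values in the usage note are assumed for the example.

```c
#include <assert.h>

#define SCHED_CAPACITY_SHIFT	10
#define SCHED_CAPACITY_SCALE	(1UL << SCHED_CAPACITY_SHIFT)

/*
 * Hypothetical helper: scale one running-time segment by the cpu
 * capacity factor, mirroring the multiply-and-shift added by this
 * patch. scale_cpu is expected to lie in [0, SCHED_CAPACITY_SCALE].
 */
static unsigned long scale_by_cpu_capacity(unsigned long delta,
					   unsigned long scale_cpu)
{
	assert(scale_cpu <= SCHED_CAPACITY_SCALE);

	return (delta * scale_cpu) >> SCHED_CAPACITY_SHIFT;
}
```

Assuming a fixed amount of work that keeps a cpu with capacity 1024
running for 500 time units and a cpu with capacity 512 running for 1000
time units, scale_by_cpu_capacity(500, 1024) and
scale_by_cpu_capacity(1000, 512) both evaluate to 500, so the running
contribution no longer depends on the cpu it was gathered on.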

In contrast to usage, load (sched_avg::runnable_avg_sum) is not made cpu
scale-invariant for now, because doing so would have a negative effect
on the existing load balance code based on s[dg]_lb_stats::avg_load in
overload scenarios.

example: 7 always running tasks
         4 on cluster 0 (2 cpus w/ cpu_capacity=512)
         3 on cluster 1 (1 cpu w/ cpu_capacity=1024)

                 cluster 0     cluster 1

capacity         1024 (2*512)  1024 (1*1024)
load             4096          3072
cpu-scaled load  2048          3072

Simply using cpu-scaled load in the existing lb code would declare
cluster 1 busier than cluster 0, although the compute capacity budget
for one task is higher on cluster 1 (1024/3 = 341) than on cluster 0
(2*512/4 = 256).
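
The numbers in the example can be reproduced with the following
user-space sketch. struct cluster and the helper functions are
hypothetical and exist only for this illustration; the sketch also
assumes that every always-running task contributes a load of
SCHED_CAPACITY_SCALE (1024).

```c
#include <assert.h>

#define SCHED_CAPACITY_SHIFT	10
#define SCHED_CAPACITY_SCALE	(1UL << SCHED_CAPACITY_SHIFT)

/* Hypothetical description of one cluster from the example. */
struct cluster {
	unsigned long nr_cpus;
	unsigned long cpu_capacity;	/* per-cpu capacity */
	unsigned long nr_tasks;		/* always-running tasks */
};

/* Total compute capacity of the cluster. */
static unsigned long cluster_capacity(const struct cluster *c)
{
	return c->nr_cpus * c->cpu_capacity;
}

/* Load without cpu scaling: each task counts as 1024 (assumption). */
static unsigned long cluster_load(const struct cluster *c)
{
	return c->nr_tasks * SCHED_CAPACITY_SCALE;
}

/* Load after applying the cpu capacity factor. */
static unsigned long cluster_scaled_load(const struct cluster *c)
{
	return (cluster_load(c) * c->cpu_capacity) >> SCHED_CAPACITY_SHIFT;
}

/* Compute capacity budget available to each task. */
static unsigned long capacity_per_task(const struct cluster *c)
{
	assert(c->nr_tasks > 0);

	return cluster_capacity(c) / c->nr_tasks;
}
```

With cluster 0 described as { 2, 512, 4 } and cluster 1 as
{ 1, 1024, 3 }, the cpu-scaled load is 2048 versus 3072 while the
per-task budget is 256 versus 341, i.e. the cluster that looks busier
by cpu-scaled load is the one that offers more capacity per task.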

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 kernel/sched/fair.c  | 13 +++++++++++++
 kernel/sched/sched.h |  2 +-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d71d0ca..af55982 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2540,6 +2540,7 @@ static __always_inline int __update_entity_runnable_avg(u64 now, int cpu,
 	u32 runnable_contrib, scaled_runnable_contrib;
 	int delta_w, scaled_delta_w, decayed = 0;
 	unsigned long scale_freq = arch_scale_freq_capacity(NULL, cpu);
+	unsigned long scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
 
 	delta = now - sa->last_runnable_update;
 	/*
@@ -2576,6 +2577,10 @@ static __always_inline int __update_entity_runnable_avg(u64 now, int cpu,
 
 		if (runnable)
 			sa->runnable_avg_sum += scaled_delta_w;
+
+		scaled_delta_w *= scale_cpu;
+		scaled_delta_w >>= SCHED_CAPACITY_SHIFT;
+
 		if (running)
 			sa->running_avg_sum += scaled_delta_w;
 		sa->avg_period += delta_w;
@@ -2600,6 +2605,10 @@ static __always_inline int __update_entity_runnable_avg(u64 now, int cpu,
 
 		if (runnable)
 			sa->runnable_avg_sum += scaled_runnable_contrib;
+
+		scaled_runnable_contrib *= scale_cpu;
+		scaled_runnable_contrib >>= SCHED_CAPACITY_SHIFT;
+
 		if (running)
 			sa->running_avg_sum += scaled_runnable_contrib;
 		sa->avg_period += runnable_contrib;
@@ -2610,6 +2619,10 @@ static __always_inline int __update_entity_runnable_avg(u64 now, int cpu,
 
 	if (runnable)
 		sa->runnable_avg_sum += scaled_delta;
+
+	scaled_delta *= scale_cpu;
+	scaled_delta >>= SCHED_CAPACITY_SHIFT;
+
 	if (running)
 		sa->running_avg_sum += scaled_delta;
 	sa->avg_period += delta;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b422e08..3193025 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1404,7 +1404,7 @@ unsigned long arch_scale_freq_capacity(struct sched_domain *sd, int cpu)
 static __always_inline
 unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
 {
-	if ((sd->flags & SD_SHARE_CPUCAPACITY) && (sd->span_weight > 1))
+	if (sd && (sd->flags & SD_SHARE_CPUCAPACITY) && (sd->span_weight > 1))
 		return sd->smt_gain / sd->span_weight;
 
 	return SCHED_CAPACITY_SCALE;