[v9,13/15] sched/fair: Introduce an energy estimation helper function

Message ID 20181119141857.8625-14-quentin.perret@arm.com (mailing list archive)
State Superseded, archived
Series Energy Aware Scheduling

Commit Message

Quentin Perret Nov. 19, 2018, 2:18 p.m. UTC
In preparation for the definition of an energy-aware wakeup path,
introduce a helper function to estimate the impact on system energy
when a specific task wakes up on a specific CPU. compute_energy()
estimates the capacity state to be reached by all performance domains
and the consumption of each online CPU according to its Energy Model
and its percentage of busy time.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
---
 kernel/sched/fair.c | 76 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)
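
[Editor's note: the per-domain estimate in the patch below boils down to
picking the lowest capacity state able to serve the highest utilization
in the domain (that is what sets the OPP), then charging every CPU's
busy time at that state's cost. A minimal sketch of that computation,
with a made-up capacity-state table and helper name rather than the real
Energy Model structures from earlier in this series:

	/* Illustrative only: struct and function names are hypothetical.
	 * In the EM patches, cost is pre-computed from power and frequency
	 * (power * max_freq / freq), so cost * sum_util / scale_cpu
	 * approximates power * percentage_of_busy_time at the chosen OPP.
	 */
	struct cap_state {
		unsigned long cap;	/* compute capacity at this OPP */
		unsigned long cost;	/* pre-computed energy coefficient */
	};

	static unsigned long pd_energy_estimate(const struct cap_state *table,
						int nr_states,
						unsigned long max_util,
						unsigned long sum_util,
						unsigned long scale_cpu)
	{
		int i = 0;

		/* Lowest state whose capacity covers the biggest request. */
		while (i < nr_states - 1 && table[i].cap < max_util)
			i++;

		return table[i].cost * sum_util / scale_cpu;
	}]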

Comments

Peter Zijlstra Nov. 21, 2018, 2:28 p.m. UTC | #1
On Mon, Nov 19, 2018 at 02:18:55PM +0000, Quentin Perret wrote:
> +static long
> +compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)
> +{
> +	long util, max_util, sum_util, energy = 0;
> +	int cpu;
> +
> +	for (; pd; pd = pd->next) {
> +		max_util = sum_util = 0;
> +		/*
> +		 * The capacity state of CPUs of the current rd can be driven by
> +		 * CPUs of another rd if they belong to the same performance
> +		 * domain. So, account for the utilization of these CPUs too
> +		 * by masking pd with cpu_online_mask instead of the rd span.
> +		 *
> +		 * If an entire performance domain is outside of the current rd,
> +		 * it will not appear in its pd list and will not be accounted
> +		 * by compute_energy().
> +		 */
> +		for_each_cpu_and(cpu, perf_domain_span(pd), cpu_online_mask) {

Should that not be cpu_active_mask?

> +			util = cpu_util_next(cpu, p, dst_cpu);
> +			util = schedutil_energy_util(cpu, util);
> +			max_util = max(util, max_util);
> +			sum_util += util;
> +		}
> +
> +		energy += em_pd_energy(pd->em_pd, max_util, sum_util);
> +	}
> +
> +	return energy;
> +}
Quentin Perret Nov. 21, 2018, 4:05 p.m. UTC | #2
On Wednesday 21 Nov 2018 at 15:28:03 (+0100), Peter Zijlstra wrote:
> On Mon, Nov 19, 2018 at 02:18:55PM +0000, Quentin Perret wrote:
> > +static long
> > +compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)
> > +{
> > +	long util, max_util, sum_util, energy = 0;
> > +	int cpu;
> > +
> > +	for (; pd; pd = pd->next) {
> > +		max_util = sum_util = 0;
> > +		/*
> > +		 * The capacity state of CPUs of the current rd can be driven by
> > +		 * CPUs of another rd if they belong to the same performance
> > +		 * domain. So, account for the utilization of these CPUs too
> > +		 * by masking pd with cpu_online_mask instead of the rd span.
> > +		 *
> > +		 * If an entire performance domain is outside of the current rd,
> > +		 * it will not appear in its pd list and will not be accounted
> > +		 * by compute_energy().
> > +		 */
> > +		for_each_cpu_and(cpu, perf_domain_span(pd), cpu_online_mask) {
> 
> Should that not be cpu_active_mask?

Hmm, I must admit I'm sometimes a bit confused by the exact difference
between these masks, so maybe yeah ...

IIUC, cpu_active_mask is basically the set of CPUs on which the
scheduler is actually allowed to migrate tasks. Is that correct?

I have always seen cpu_online_mask as a superset of cpu_active_mask
which can also include CPUs that are still running 'special' tasks
(kthreads and things like that, I assume) although they are not allowed
as migration targets any more (or not yet) because we're in the process
of hotplugging that CPU.

So, the thing is, I'm not trying to select a CPU candidate for my task
here, I'm trying to understand what the energy impact of a migration is.
That involves all CPUs that are running _something_ in a perf domain,
no matter whether they're allowed to run more tasks or not. I mean, raising
the OPP will make running online && !active CPUs more expensive as well.
That's why I thought cpu_online_mask was a good match here.

Or maybe I'm confused again :-)
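
To put illustrative numbers on it: in a two-CPU perf domain where the
active CPU has util 200 and an online && !active CPU is still running a
kthread at util 300, it is the !active CPU that drives max_util = 300
(and thus the OPP), and sum_util = 500 is what gets charged at that
OPP's cost. Masking with cpu_active_mask would underestimate both the
OPP and the busy time.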

> 
> > +			util = cpu_util_next(cpu, p, dst_cpu);
> > +			util = schedutil_energy_util(cpu, util);
> > +			max_util = max(util, max_util);
> > +			sum_util += util;
> > +		}
> > +
> > +		energy += em_pd_energy(pd->em_pd, max_util, sum_util);
> > +	}
> > +
> > +	return energy;
> > +}

Thanks,
Quentin
Peter Zijlstra Nov. 22, 2018, 1:56 p.m. UTC | #3
On Wed, Nov 21, 2018 at 04:05:27PM +0000, Quentin Perret wrote:
> On Wednesday 21 Nov 2018 at 15:28:03 (+0100), Peter Zijlstra wrote:
> > On Mon, Nov 19, 2018 at 02:18:55PM +0000, Quentin Perret wrote:
> > > +static long
> > > +compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)
> > > +{
> > > +	long util, max_util, sum_util, energy = 0;
> > > +	int cpu;
> > > +
> > > +	for (; pd; pd = pd->next) {
> > > +		max_util = sum_util = 0;
> > > +		/*
> > > +		 * The capacity state of CPUs of the current rd can be driven by
> > > +		 * CPUs of another rd if they belong to the same performance
> > > +		 * domain. So, account for the utilization of these CPUs too
> > > +		 * by masking pd with cpu_online_mask instead of the rd span.
> > > +		 *
> > > +		 * If an entire performance domain is outside of the current rd,
> > > +		 * it will not appear in its pd list and will not be accounted
> > > +		 * by compute_energy().
> > > +		 */
> > > +		for_each_cpu_and(cpu, perf_domain_span(pd), cpu_online_mask) {
> > 
> > Should that not be cpu_active_mask?
> 
> Hmm, I must admit I'm sometimes a bit confused by the exact difference
> between these masks, so maybe yeah ...
> 
> IIUC, cpu_active_mask is basically the set of CPUs on which the
> scheduler is actually allowed to migrate tasks. Is that correct?

Yep. Which is a strict subset of online. The difference only matters
during hotplug. We take a CPU out of active before we take it offline,
and we add it to active only after the CPU is fully online and
scheduling.
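
Stated as an invariant (cpumask_subset() is the real cpumask helper,
the check itself is just an illustrative way of putting it):

	/* active is always a subset of online; the two only diverge
	 * while a CPU is being hotplugged in or out. */
	WARN_ON(!cpumask_subset(cpu_active_mask, cpu_online_mask));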

> I have always seen cpu_online_mask as a superset of cpu_active_mask
> which can also include CPUs that are still running 'special' tasks
> (kthreads and things like that, I assume) although they are not allowed
> as migration targets any more (or not yet) because we're in the process
> of hotplugging that CPU.

Right.

> So, the thing is, I'm not trying to select a CPU candidate for my task
> here, I'm trying to understand what the energy impact of a migration is.
> That involves all CPUs that are running _something_ in a perf domain,
> no matter whether they're allowed to run more tasks or not. I mean, raising
> the OPP will make running online && !active CPUs more expensive as well.
> That's why I thought cpu_online_mask was a good match here.

Ah, fair enough. Thanks!
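
[Editor's note: for context, the helper gets wired into the wakeup path
by the next patch in the series. A rough usage sketch (illustrative
pseudo-usage only, not the actual find_energy_efficient_cpu() code from
patch 14/15):

	long prev_energy, cur_energy;

	/* Estimated system energy if @p stays on prev_cpu vs. moves to @cpu. */
	prev_energy = compute_energy(p, prev_cpu, pd);
	cur_energy = compute_energy(p, cpu, pd);

	/* Only pick @cpu if the migration is expected to save energy. */
	if (cur_energy < prev_energy)
		target = cpu;]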

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c3b2dad72c9c..a20018ad9236 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6377,6 +6377,82 @@  static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
 	return !task_fits_capacity(p, min_cap);
 }
 
+/*
+ * Predicts what cpu_util(@cpu) would return if @p was migrated (and enqueued)
+ * to @dst_cpu.
+ */
+static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
+{
+	struct cfs_rq *cfs_rq = &cpu_rq(cpu)->cfs;
+	unsigned long util_est, util = READ_ONCE(cfs_rq->avg.util_avg);
+
+	/*
+	 * If @p migrates from @cpu to another, remove its contribution. Or,
+	 * if @p migrates from another CPU to @cpu, add its contribution. In
+	 * the other cases, @cpu is not impacted by the migration, so the
+	 * util_avg should already be correct.
+	 */
+	if (task_cpu(p) == cpu && dst_cpu != cpu)
+		sub_positive(&util, task_util(p));
+	else if (task_cpu(p) != cpu && dst_cpu == cpu)
+		util += task_util(p);
+
+	if (sched_feat(UTIL_EST)) {
+		util_est = READ_ONCE(cfs_rq->avg.util_est.enqueued);
+
+		/*
+		 * During wake-up, the task isn't enqueued yet and doesn't
+		 * appear in the cfs_rq->avg.util_est.enqueued of any rq,
+		 * so just add it (if needed) to "simulate" what will be
+		 * cpu_util() after the task has been enqueued.
+		 */
+		if (dst_cpu == cpu)
+			util_est += _task_util_est(p);
+
+		util = max(util, util_est);
+	}
+
+	return min(util, capacity_orig_of(cpu));
+}
+
+/*
+ * compute_energy(): Estimates the energy that would be consumed if @p was
+ * migrated to @dst_cpu. compute_energy() predicts what will be the utilization
+ * landscape of the CPUs after the task migration, and uses the Energy Model
+ * to compute what would be the energy if we decided to actually migrate that
+ * task.
+ */
+static long
+compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)
+{
+	long util, max_util, sum_util, energy = 0;
+	int cpu;
+
+	for (; pd; pd = pd->next) {
+		max_util = sum_util = 0;
+		/*
+		 * The capacity state of CPUs of the current rd can be driven by
+		 * CPUs of another rd if they belong to the same performance
+		 * domain. So, account for the utilization of these CPUs too
+		 * by masking pd with cpu_online_mask instead of the rd span.
+		 *
+		 * If an entire performance domain is outside of the current rd,
+		 * it will not appear in its pd list and will not be accounted
+		 * by compute_energy().
+		 */
+		for_each_cpu_and(cpu, perf_domain_span(pd), cpu_online_mask) {
+			util = cpu_util_next(cpu, p, dst_cpu);
+			util = schedutil_energy_util(cpu, util);
+			max_util = max(util, max_util);
+			sum_util += util;
+		}
+
+		energy += em_pd_energy(pd->em_pd, max_util, sum_util);
+	}
+
+	return energy;
+}
+
 /*
  * select_task_rq_fair: Select target runqueue for the waking task in domains
  * that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE,