[v6,12/16] sched/core: uclamp: Extend CPU's cgroup controller

The cgroup CPU bandwidth controller allows assigning a specified
(maximum) bandwidth to the tasks of a group. However, this bandwidth is
defined and enforced only on a temporal basis, without considering the
actual frequency a CPU is running at. Thus, the amount of computation
completed by a task within an allocated bandwidth can be very different
depending on the actual frequency at which the CPU runs that task.
The amount of computation can also be affected by the specific CPU a
task is running on, especially on asymmetric capacity systems like
Arm's big.LITTLE.
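
To make the point above concrete, here is a minimal arithmetic sketch. The
helper effective_work_us() and its parameters are hypothetical and not part
of this patch; it assumes computation scales linearly with both frequency
and CPU capacity, which is a simplification.

```c
#include <stdint.h>

#define SCHED_CAPACITY_SCALE 1024 /* capacity of the biggest CPU at max frequency */

/*
 * Hypothetical helper, for illustration only: estimate how much work a
 * time quota is worth, expressed as microseconds on the biggest CPU at
 * its maximum frequency. Assumes linear scaling with frequency and
 * with CPU capacity.
 */
static inline uint64_t effective_work_us(uint64_t quota_us,
					 uint64_t freq_khz,
					 uint64_t max_freq_khz,
					 uint64_t cpu_capacity)
{
	/* Scale the time quota by the frequency ratio... */
	uint64_t scaled = quota_us * freq_khz / max_freq_khz;

	/* ...then by the capacity of this CPU relative to the biggest one. */
	return scaled * cpu_capacity / SCHED_CAPACITY_SCALE;
}
```

Under these assumptions, the same 50ms quota is worth 50ms of work on a
big CPU at full speed, 25ms at half frequency, and 12.5ms on a
half-capacity CPU at half frequency.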

With the availability of schedutil, the scheduler is now able
to drive frequency selections based on actual task utilization.
Moreover, the utilization clamping support provides a mechanism to
bias the frequency selection performed by schedutil depending on
constraints assigned to the tasks currently RUNNABLE on a CPU.

Given the mechanisms described above, it is now possible to extend the
cpu controller to specify the minimum (or maximum) utilization which
should be considered for tasks RUNNABLE on a cpu.
This makes it possible to better define the actual computational
power assigned to task groups, thus improving the cgroup CPU bandwidth
controller, which is currently based just on time constraints.

Extend the CPU controller with a couple of new attributes, util.{min,max},
which allow enforcing utilization boosting and capping for all the
tasks in a group. Specifically:

- util.min: defines the minimum utilization which should be considered,
            i.e. the RUNNABLE tasks of this group will run at least at
            a minimum frequency which corresponds to the min_util
            utilization

- util.max: defines the maximum utilization which should be considered,
            i.e. the RUNNABLE tasks of this group will run up to a
            maximum frequency which corresponds to the max_util
            utilization
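
The effect of the two attributes can be sketched as a clamp applied to the
utilization before it is turned into a frequency request. The helpers
clamp_util() and util_to_freq_khz() below are hypothetical names, not code
from this series, and the linear utilization-to-frequency mapping is a
simplification (schedutil also adds headroom on top of the utilization).

```c
#include <stdint.h>

#define SCHED_CAPACITY_SCALE 1024 /* utilization of a fully busy biggest CPU */

/* Hypothetical helper: restrict a utilization to [util_min, util_max]. */
static inline unsigned int clamp_util(unsigned int util,
				      unsigned int util_min,
				      unsigned int util_max)
{
	if (util < util_min)
		return util_min;	/* boosting: small tasks are raised */
	if (util > util_max)
		return util_max;	/* capping: big tasks are lowered */
	return util;
}

/*
 * Hypothetical helper: simplified linear mapping from utilization to a
 * frequency. The headroom applied by schedutil is deliberately ignored.
 */
static inline unsigned int util_to_freq_khz(unsigned int util,
					    unsigned int max_freq_khz)
{
	return (unsigned int)((uint64_t)max_freq_khz * util /
			      SCHED_CAPACITY_SCALE);
}
```

With this simplified mapping, a task with utilization 100 in a group
with util.min = 512 is treated as if its utilization were 512, which
corresponds to half of the maximum frequency.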

These attributes:

a) are available only for non-root nodes, both on default and legacy
   hierarchies, while system wide clamps are defined by a generic
   interface which does not depend on cgroups

b) do not enforce any constraints and/or dependencies between the parent
   and its child nodes, thus relying:
   - on permission settings defined by the system management software,
     to define if subgroups can configure their clamp values
   - on the delegation model, to ensure that effective clamps are
     updated to consider both subgroup requests and parent group
     constraints

c) have higher priority than task-specific clamps, defined via
   sched_setattr(), thus making it possible to control and restrict task
   requests
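
One possible reading of point (c) is sketched below. It assumes that
"restrict" means a task never gets a clamp value larger than the one of
its group; this is an assumption made for illustration, and struct
clamp_pair and effective_clamps() are hypothetical names, not code from
this patch.

```c
/* Hypothetical pair of clamp values, in the [0..1024] utilization range. */
struct clamp_pair {
	unsigned int util_min;
	unsigned int util_max;
};

static inline unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Assumed aggregation rule: the group value is an upper bound for the
 * task-specific request, for both the minimum and the maximum clamp.
 */
static inline struct clamp_pair effective_clamps(struct clamp_pair task,
						 struct clamp_pair group)
{
	struct clamp_pair eff;

	eff.util_min = min_uint(task.util_min, group.util_min);
	eff.util_max = min_uint(task.util_max, group.util_max);
	return eff;
}
```

Under this assumption, a task requesting util_min = 800 in a group with
util.min = 512 is boosted only up to 512, while a request below the
group value is left unchanged.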

This patch provides the basic support to expose the two new attributes
and to validate their run-time updates; it does not (yet) actually
allocate clamp buckets.
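
The kind of run-time validation mentioned above can be sketched as
follows. This is a user-space illustration under stated assumptions:
struct group_clamps, the two setters and the specific error codes are
hypothetical, and the checks shown (range check, min not above max) are
the minimal ones one would expect rather than the exact ones in the
patch.

```c
#include <errno.h>

#define SCHED_CAPACITY_SCALE 1024 /* upper bound for any clamp value */

/* Hypothetical per-group clamp state. */
struct group_clamps {
	unsigned int util_min;
	unsigned int util_max;
};

/* Reject out-of-range values and a minimum above the current maximum. */
static int group_set_util_min(struct group_clamps *g, unsigned int value)
{
	if (value > SCHED_CAPACITY_SCALE)
		return -ERANGE;
	if (value > g->util_max)
		return -EINVAL;
	g->util_min = value;
	return 0;
}

/* Reject out-of-range values and a maximum below the current minimum. */
static int group_set_util_max(struct group_clamps *g, unsigned int value)
{
	if (value > SCHED_CAPACITY_SCALE)
		return -ERANGE;
	if (value < g->util_min)
		return -EINVAL;
	g->util_max = value;
	return 0;
}
```

Starting from the unclamped state (util_min = 0, util_max = 1024), a
write is accepted only if it keeps util_min <= util_max within the
[0..1024] range; a rejected write leaves the previous values untouched.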

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>

---

NOTEs:

1) The delegation model described above is provided in one of the
following patches of this series.

2) Utilization clamping constraints are useful not only to bias frequency
selection when a task is running, but also to better support certain
scheduler decisions regarding task placement. For example, on
asymmetric capacity systems, a utilization clamp value can be
conveniently used to place important interactive tasks on more capable
CPUs or to run low-priority and background tasks on more
energy-efficient CPUs.

The ultimate goal of utilization clamping is thus to enable:

- boosting: by selecting a higher capacity CPU and/or higher execution
            frequency for small tasks which affect the interactive
            user experience.

- capping: by selecting more energy-efficient CPUs or a lower execution
           frequency for big tasks which are mainly related to
           background activities, and thus have no direct impact on
           the user experience.

Thus, a proper extension of the cpu controller with utilization clamping
support will make this controller even more suitable for integration
with advanced system management software (e.g. Android).
Indeed, an informed user-space can provide rich information hints to the
scheduler regarding the tasks it's going to schedule.

The bits related to task placement biasing are left for a further
extension once the basic support introduced by this series is
merged. In any case, they will not affect the integration with cgroups.

Changes in v6:
 Others:
 - wholesale s/group/bucket/
 - wholesale s/_{get,put}/_{inc,dec}/ to match refcount APIs
---
 Documentation/admin-guide/cgroup-v2.rst |  25 +++++
 init/Kconfig                            |  22 ++++
 kernel/sched/core.c                     | 131 ++++++++++++++++++++++++
 kernel/sched/sched.h                    |   5 +
 4 files changed, 183 insertions(+)

Message ID	20190115101513.2822-13-patrick.bellasi@arm.com (mailing list archive)
From: Patrick Bellasi <patrick.bellasi@arm.com>
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
    linux-api@vger.kernel.org
Cc: Ingo Molnar <mingo@redhat.com>, Peter Zijlstra <peterz@infradead.org>,
    Tejun Heo <tj@kernel.org>, "Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
    Vincent Guittot <vincent.guittot@linaro.org>,
    Viresh Kumar <viresh.kumar@linaro.org>, Paul Turner <pjt@google.com>,
    Quentin Perret <quentin.perret@arm.com>,
    Dietmar Eggemann <dietmar.eggemann@arm.com>,
    Morten Rasmussen <morten.rasmussen@arm.com>,
    Juri Lelli <juri.lelli@redhat.com>, Todd Kjos <tkjos@google.com>,
    Joel Fernandes <joelaf@google.com>, Steve Muckle <smuckle@google.com>,
    Suren Baghdasaryan <surenb@google.com>
Subject: [PATCH v6 12/16] sched/core: uclamp: Extend CPU's cgroup controller
Date: Tue, 15 Jan 2019 10:15:09 +0000
In-Reply-To: <20190115101513.2822-1-patrick.bellasi@arm.com>
Series	Add utilization clamping support
