[v11,4/5] sched/core: uclamp: Use TG's clamps to restrict TASK's clamps

Message ID 20190708084357.12944-5-patrick.bellasi@arm.com (mailing list archive)
State Not Applicable, archived
Series Add utilization clamping support (CGroups API)

Commit Message

Patrick Bellasi July 8, 2019, 8:43 a.m. UTC
When a task specific clamp value is configured via sched_setattr(2), this
value is accounted in the corresponding clamp bucket every time the task is
{en,de}queued. However, when cgroups are also in use, the task specific
clamp values could be restricted by the task_group (TG) clamp values.

Update uclamp_cpu_inc() to aggregate task and TG clamp values. Every time a
task is enqueued, it's accounted in the clamp bucket tracking the smaller
clamp between the task specific value and its TG effective value. This
allows us to:

1. ensure cgroup clamps are always used to restrict task specific requests,
   i.e. a task is boosted no more than its TG's effective protection and
   capped at least as much as its TG's effective limit.

2. implement a "nice-like" policy, where tasks are still allowed to request
   less than what is enforced by their TG's effective limits and protections.

This mimics what already happens for a task's CPU affinity mask when the
task is also in a cpuset, i.e. cgroup attributes are always used to
restrict per-task attributes.

Do this by exploiting the concept of "effective" clamp, which is already
used by a TG to track parent-enforced restrictions.

Apply task group clamp restrictions only to tasks belonging to a child
group, while for tasks in the root group or in an autogroup only system
defaults are enforced.
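
As an illustration, here is a minimal user-space sketch of the restriction
described above (not part of the patch; the helper name tg_restrict() and
all values are hypothetical): the aggregation always keeps the smaller of
the task's request and the TG's effective value.

  #include <stdio.h>

  /* cgroup attributes always restrict per-task attributes */
  static unsigned int tg_restrict(unsigned int task_clamp,
                                  unsigned int tg_clamp)
  {
          return task_clamp < tg_clamp ? task_clamp : tg_clamp;
  }

  int main(void)
  {
          /* task requests a boost of 80, TG's effective value is 50 */
          printf("%u\n", tg_restrict(80, 50)); /* -> 50: TG restricts */
          /* "nice-like": a task may still request less than the TG allows */
          printf("%u\n", tg_restrict(20, 50)); /* -> 20: task value kept */
          return 0;
  }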

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
---
 kernel/sched/core.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

Comments

Michal Koutný July 15, 2019, 4:42 p.m. UTC | #1
On Mon, Jul 08, 2019 at 09:43:56AM +0100, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
> This mimics what already happens for a task's CPU affinity mask when the
> task is also in a cpuset, i.e. cgroup attributes are always used to
> restrict per-task attributes.
If I am not mistaken, when a sched_setaffinity(2) call is made that results
in an empty cpuset, the call fails with EINVAL [1].

If I track the code correctly, the values passed to sched_setattr(2) are
checked for trivial validity (umin <= umax) and later on, they
are adjusted to match the effective clamping of the containing
task_group. Is that correct?

If the user attempted to sched_setattr [a, b], and the effective uclamp
was [c, d] such that [a, b] ∩ [c, d] = ∅, the set uclamp will be
silently moved out of their intended range. Wouldn't it be better to
return with EINVAL too when the intersection is empty (since the user
supplied range won't be attained)?

Michal

[1] http://www.linux-arm.org/git?p=linux-pb.git;a=blob;f=kernel/sched/core.c;h=ddc5fcd4b9cfaa95496b24d8599c03bc140e768e;hb=2c15043a2a2b5d86eb409556cbe1e36d4fd275b5#l1660
Patrick Bellasi July 16, 2019, 2:34 p.m. UTC | #2
On 15-Jul 18:42, Michal Koutný wrote:
> On Mon, Jul 08, 2019 at 09:43:56AM +0100, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
> > This mimics what already happens for a task's CPU affinity mask when the
> > task is also in a cpuset, i.e. cgroup attributes are always used to
> > restrict per-task attributes.
> If I am not mistaken, when a sched_setaffinity(2) call is made that results
> in an empty cpuset, the call fails with EINVAL [1].
> 
> If I track the code correctly, the values passed to sched_setattr(2) are
> checked for trivial validity (umin <= umax) and later on, they
> are adjusted to match the effective clamping of the containing
> task_group. Is that correct?
> 
> If the user attempted to sched_setattr [a, b], and the effective uclamp
> was [c, d] such that [a, b] ∩ [c, d] = ∅, the set uclamp will be
> silently moved out of their intended range. Wouldn't it be better to
> return with EINVAL too when the intersection is empty (since the user
> supplied range won't be attained)?

You're right for the cpuset case, but I don't think we can ever end up
with an "empty" set in the case of utilization clamping.

We limit clamps hierarchically in such a way that:

  clamp[clamp_id] = min(task::clamp[clamp_id],
                        tg::clamp[clamp_id],
                        system::clamp[clamp_id])

and we ensure, on top of the above, that:

  clamp[UCLAMP_MIN] = min(clamp[UCLAMP_MIN], clamp[UCLAMP_MAX])

Since it's all and only about "capping" values, in the most extreme
case you can end up with:

  clamp[UCLAMP_MIN] = clamp[UCLAMP_MAX] = 0

but that's still a valid configuration.
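
To make that concrete, here is a user-space sketch of the aggregation
above (a hypothetical helper, not kernel code): since every step only
takes a min(), the resulting range can degenerate to [0, 0] but can
never become empty (min > max is impossible).

  #include <stdio.h>

  #define MIN(a, b)  ((a) < (b) ? (a) : (b))

  /* clamp[0] = UCLAMP_MIN, clamp[1] = UCLAMP_MAX */
  static void effective(const unsigned int task[2], const unsigned int tg[2],
                        const unsigned int sys[2], unsigned int out[2])
  {
          out[0] = MIN(MIN(task[0], tg[0]), sys[0]);
          out[1] = MIN(MIN(task[1], tg[1]), sys[1]);
          out[0] = MIN(out[0], out[1]); /* UCLAMP_MIN <= UCLAMP_MAX */
  }

  int main(void)
  {
          const unsigned int task[2] = { 80, 100 }, tg[2] = { 0, 0 };
          const unsigned int sys[2] = { 100, 100 };
          unsigned int eff[2];

          effective(task, tg, sys, eff);
          printf("[%u, %u]\n", eff[0], eff[1]); /* -> [0, 0], still valid */
          return 0;
  }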

Am I missing something?

Otherwise, I think the changelog sentence you quoted is just
misleading.  I'll remove it from v12 since it does not really clarify
anything more than the rest of the message.

Cheers,
Patrick
Michal Koutný July 16, 2019, 3:58 p.m. UTC | #3
On Tue, Jul 16, 2019 at 03:34:35PM +0100, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
> Am I missing something?
No, it's rather my misinterpretation of the syscall semantics.

> Otherwise, I think the changelog sentence you quoted is just
> misleading.
It certainly misled me into thinking of the sched_setattr calls as
requests for utilization lying in the given interval (substituting 0 or 1
when only one boundary is given, further constrained by the tg's interval).

I see your point, those are actually two (mostly) independent controls.
Makes sense now.

Thanks,
Michal

Patch

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 276f9c2f6103..2591a70c85cf 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -873,16 +873,42 @@  unsigned int uclamp_rq_max_value(struct rq *rq, unsigned int clamp_id,
 	return uclamp_idle_value(rq, clamp_id, clamp_value);
 }
 
+static inline struct uclamp_se
+uclamp_tg_restrict(struct task_struct *p, unsigned int clamp_id)
+{
+	struct uclamp_se uc_req = p->uclamp_req[clamp_id];
+#ifdef CONFIG_UCLAMP_TASK_GROUP
+	struct uclamp_se uc_max;
+
+	/*
+	 * Tasks in autogroups or root task group will be
+	 * restricted by system defaults.
+	 */
+	if (task_group_is_autogroup(task_group(p)))
+		return uc_req;
+	if (task_group(p) == &root_task_group)
+		return uc_req;
+
+	uc_max = task_group(p)->uclamp[clamp_id];
+	if (uc_req.value > uc_max.value || !uc_req.user_defined)
+		return uc_max;
+#endif
+
+	return uc_req;
+}
+
 /*
  * The effective clamp bucket index of a task depends on, by increasing
  * priority:
  * - the task specific clamp value, when explicitly requested from userspace
+ * - the task group effective clamp value, for tasks not either in the root
+ *   group or in an autogroup
  * - the system default clamp value, defined by the sysadmin
  */
 static inline struct uclamp_se
 uclamp_eff_get(struct task_struct *p, unsigned int clamp_id)
 {
-	struct uclamp_se uc_req = p->uclamp_req[clamp_id];
+	struct uclamp_se uc_req = uclamp_tg_restrict(p, clamp_id);
 	struct uclamp_se uc_max = uclamp_default[clamp_id];
 
 	/* System default restrictions always apply */