Message ID: 20190708084357.12944-5-patrick.bellasi@arm.com (mailing list archive)
State: Not Applicable, archived
Series: Add utilization clamping support (CGroups API)
On Mon, Jul 08, 2019 at 09:43:56AM +0100, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
> This mimics what already happens for a task's CPU affinity mask when the
> task is also in a cpuset, i.e. cgroup attributes are always used to
> restrict per-task attributes.
If I am not mistaken, when a sched_setaffinity(2) call is made that results
in an empty cpuset, the call fails with EINVAL [1].

If I track the code correctly, the values passed to sched_setattr(2) are
checked against the trivial validity condition (umin <= umax) and later on,
they are adjusted to match the effective clamping of the containing
task_group. Is that correct?

If the user attempted to sched_setattr [a, b], and the effective uclamp
was [c, d] such that [a, b] ∩ [c, d] = ∅, the set uclamp will be silently
moved out of their intended range. Wouldn't it be better to return EINVAL
here too when the intersection is empty (since the user-supplied range
won't be attained)?

Michal

[1] http://www.linux-arm.org/git?p=linux-pb.git;a=blob;f=kernel/sched/core.c;h=ddc5fcd4b9cfaa95496b24d8599c03bc140e768e;hb=2c15043a2a2b5d86eb409556cbe1e36d4fd275b5#l1660
On 15-Jul 18:42, Michal Koutný wrote:
> On Mon, Jul 08, 2019 at 09:43:56AM +0100, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
> > This mimics what already happens for a task's CPU affinity mask when the
> > task is also in a cpuset, i.e. cgroup attributes are always used to
> > restrict per-task attributes.
> If I am not mistaken, when a sched_setaffinity(2) call is made that results
> in an empty cpuset, the call fails with EINVAL [1].
>
> If I track the code correctly, the values passed to sched_setattr(2) are
> checked against the trivial validity condition (umin <= umax) and later on,
> they are adjusted to match the effective clamping of the containing
> task_group. Is that correct?
>
> If the user attempted to sched_setattr [a, b], and the effective uclamp
> was [c, d] such that [a, b] ∩ [c, d] = ∅, the set uclamp will be silently
> moved out of their intended range. Wouldn't it be better to return EINVAL
> here too when the intersection is empty (since the user-supplied range
> won't be attained)?

You're right for the cpuset case, but I don't think we can ever end up with
an "empty" set in the case of utilization clamping.

We limit clamps hierarchically in such a way that:

  clamp[clamp_id] = min(task::clamp[clamp_id],
                        tg::clamp[clamp_id],
                        system::clamp[clamp_id])

and we ensure, on top of the above, that:

  clamp[UCLAMP_MIN] = min(clamp[UCLAMP_MIN], clamp[UCLAMP_MAX])

Since it's all and only about "capping" values, at the very extreme you can
end up with:

  clamp[UCLAMP_MIN] = clamp[UCLAMP_MAX] = 0

but that's still a valid configuration.

Am I missing something?

Otherwise, I think the changelog sentence you quoted is just misleading.
I'll remove it from v12 since it does not really clarify anything more than
the rest of the message.

Cheers,
Patrick
On Tue, Jul 16, 2019 at 03:34:35PM +0100, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
> Am I missing something?
No, it's rather my misinterpretation of the syscall semantics.

> Otherwise, I think the changelog sentence you quoted is just
> misleading.
It certainly misled me into thinking of the sched_setattr calls as requests
for utilization within a given interval (substituting 0 or 1 when only one
boundary is given, further constrained by the tg's interval). I see your
point, those are actually two (mostly) independent controls.

Makes sense now.

Thanks,
Michal
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 276f9c2f6103..2591a70c85cf 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -873,16 +873,42 @@ unsigned int uclamp_rq_max_value(struct rq *rq, unsigned int clamp_id,
 	return uclamp_idle_value(rq, clamp_id, clamp_value);
 }
 
+static inline struct uclamp_se
+uclamp_tg_restrict(struct task_struct *p, unsigned int clamp_id)
+{
+	struct uclamp_se uc_req = p->uclamp_req[clamp_id];
+#ifdef CONFIG_UCLAMP_TASK_GROUP
+	struct uclamp_se uc_max;
+
+	/*
+	 * Tasks in autogroups or root task group will be
+	 * restricted by system defaults.
+	 */
+	if (task_group_is_autogroup(task_group(p)))
+		return uc_req;
+	if (task_group(p) == &root_task_group)
+		return uc_req;
+
+	uc_max = task_group(p)->uclamp[clamp_id];
+	if (uc_req.value > uc_max.value || !uc_req.user_defined)
+		return uc_max;
+#endif
+
+	return uc_req;
+}
+
 /*
  * The effective clamp bucket index of a task depends on, by increasing
  * priority:
  * - the task specific clamp value, when explicitly requested from userspace
+ * - the task group effective clamp value, for tasks not either in the root
+ *   group or in an autogroup
  * - the system default clamp value, defined by the sysadmin
  */
 static inline struct uclamp_se
 uclamp_eff_get(struct task_struct *p, unsigned int clamp_id)
 {
-	struct uclamp_se uc_req = p->uclamp_req[clamp_id];
+	struct uclamp_se uc_req = uclamp_tg_restrict(p, clamp_id);
 	struct uclamp_se uc_max = uclamp_default[clamp_id];
 
 	/* System default restrictions always apply */
When a task specific clamp value is configured via sched_setattr(2), this
value is accounted in the corresponding clamp bucket every time the task is
{en,de}queued. However, when cgroups are also in use, the task specific
clamp values could be restricted by the task_group (TG) clamp values.

Update uclamp_cpu_inc() to aggregate task and TG clamp values. Every time a
task is enqueued, it's accounted in the clamp bucket tracking the smaller
clamp between the task specific value and its TG effective value. This
allows us to:

1. ensure cgroup clamps are always used to restrict task specific requests,
   i.e. a task is boosted no more than its TG effective protection and
   capped at least as much as its TG effective limit

2. implement a "nice-like" policy, where tasks are still allowed to request
   less than what is enforced by their TG effective limits and protections

This mimics what already happens for a task's CPU affinity mask when the
task is also in a cpuset, i.e. cgroup attributes are always used to
restrict per-task attributes.

Do this by exploiting the concept of "effective" clamp, which is already
used by a TG to track parent-enforced restrictions. Apply task group clamp
restrictions only to tasks belonging to a child group, while for tasks in
the root group or in an autogroup only system defaults are enforced.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
---
 kernel/sched/core.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)