[v11,3/5] sched/core: uclamp: Propagate system defaults to root group

Message ID 20190708084357.12944-4-patrick.bellasi@arm.com (mailing list archive)
State Not Applicable, archived
Series Add utilization clamping support (CGroups API)

Commit Message

Patrick Bellasi July 8, 2019, 8:43 a.m. UTC
The clamp values are not tunable at the level of the root task group.
That's for two main reasons:

 - the root group represents "system resources" which are always
   entirely available from the cgroup standpoint.

 - when tuning/restricting "system resources" makes sense, tuning must
   be done via a system-wide API which should also be available when
   control groups are not in use.

When a system-wide restriction is in place, cgroups should be aware of
its value in order to know exactly how much of the "system resources"
is available to their subgroups.

Utilization clamping already supports the concepts of:

 - system defaults: which define the maximum possible clamp values
   usable by tasks.

 - effective clamps: which allow a parent cgroup to constrain (perhaps
   temporarily) its descendants without losing the information about
   the values "requested" by them.

Exploit these two concepts and bind them together in such a way that,
whenever the system defaults are tuned, the new values are propagated
to (possibly) restrict or relax the "effective" value of nested cgroups.
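
For illustration, the propagation boils down to capping each group's
"requested" value with its parent's "effective" value while walking the
hierarchy top-down. Below is a minimal user-space sketch of that
combination rule; the struct and helper names are made up for the
example, and this is not the kernel implementation (which lives in
cpu_util_update_eff()):

#include <stdio.h>

/* Each group keeps both the value it "requested" and the value it
 * was actually granted ("effective") after hierarchy propagation. */
struct group {
	unsigned int req;
	unsigned int eff;
};

/* Top-down rule: a group never gets more than its parent's effective
 * value; for the root group the "parent" value is the system default. */
static void update_eff(struct group *g, unsigned int parent_eff)
{
	g->eff = g->req < parent_eff ? g->req : parent_eff;
}

int main(void)
{
	struct group root  = { .req =  800 };	/* system default */
	struct group child = { .req = 1024 };	/* asks for more  */

	update_eff(&root, 1024);	/* 1024 == SCHED_CAPACITY_SCALE */
	update_eff(&child, root.eff);

	/* Prints effective=800: the child is restricted, but its
	 * request is preserved, so relaxing the system default later
	 * restores it. */
	printf("child: requested=%u effective=%u\n", child.req, child.eff);
	return 0;
}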

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
---
 kernel/sched/core.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

Comments

Michal Koutný July 15, 2019, 4:42 p.m. UTC | #1
On Mon, Jul 08, 2019 at 09:43:55AM +0100, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
> +static void uclamp_update_root_tg(void)
> +{
> +	struct task_group *tg = &root_task_group;
> +
> +	uclamp_se_set(&tg->uclamp_req[UCLAMP_MIN],
> +		      sysctl_sched_uclamp_util_min, false);
> +	uclamp_se_set(&tg->uclamp_req[UCLAMP_MAX],
> +		      sysctl_sched_uclamp_util_max, false);
> +
> +	cpu_util_update_eff(&root_task_group.css);
> +}
cpu_util_update_eff internally calls css_for_each_descendant_pre() so
this should be protected with rcu_read_lock().
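The shape of that fix would presumably be just wrapping the hierarchy
walk in the RCU read side (a sketch, not necessarily the exact v12
change):

	rcu_read_lock();
	cpu_util_update_eff(&root_task_group.css);
	rcu_read_unlock();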
Patrick Bellasi July 16, 2019, 2:34 p.m. UTC | #2
On 15-Jul 18:42, Michal Koutný wrote:
> On Mon, Jul 08, 2019 at 09:43:55AM +0100, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
> > +static void uclamp_update_root_tg(void)
> > +{
> > +	struct task_group *tg = &root_task_group;
> > +
> > +	uclamp_se_set(&tg->uclamp_req[UCLAMP_MIN],
> > +		      sysctl_sched_uclamp_util_min, false);
> > +	uclamp_se_set(&tg->uclamp_req[UCLAMP_MAX],
> > +		      sysctl_sched_uclamp_util_max, false);
> > +
> > +	cpu_util_update_eff(&root_task_group.css);
> > +}
> cpu_util_update_eff internally calls css_for_each_descendant_pre() so
> this should be protected with rcu_read_lock().

Right, good catch! Will add in v12.

Cheers,
Patrick
Michal Koutný July 16, 2019, 3:36 p.m. UTC | #3
On Tue, Jul 16, 2019 at 03:34:17PM +0100, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
> > cpu_util_update_eff internally calls css_for_each_descendant_pre() so
> > this should be protected with rcu_read_lock().
> 
> Right, good catch! Will add in v12.
When I responded to your other patch, it occurred to me that since
cpu_util_update_eff writes down to child csses, it should also take
uclamp_mutex here to avoid racing with direct cgroup attribute
writers.
Joel Fernandes July 16, 2019, 3:46 p.m. UTC | #4
On Tue, Jul 16, 2019 at 10:34 AM Patrick Bellasi
<patrick.bellasi@arm.com> wrote:
>
> On 15-Jul 18:42, Michal Koutný wrote:
> > On Mon, Jul 08, 2019 at 09:43:55AM +0100, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
> > > +static void uclamp_update_root_tg(void)
> > > +{
> > > +   struct task_group *tg = &root_task_group;
> > > +
> > > +   uclamp_se_set(&tg->uclamp_req[UCLAMP_MIN],
> > > +                 sysctl_sched_uclamp_util_min, false);
> > > +   uclamp_se_set(&tg->uclamp_req[UCLAMP_MAX],
> > > +                 sysctl_sched_uclamp_util_max, false);
> > > +
> > > +   cpu_util_update_eff(&root_task_group.css);
> > > +}
> > cpu_util_update_eff internally calls css_for_each_descendant_pre() so
> > this should be protected with rcu_read_lock().
>
> Right, good catch! Will add in v12.
>

Hopefully these can catch it in the near future!
https://lore.kernel.org/rcu/20190715143705.117908-1-joel@joelfernandes.org/T/#t
Patrick Bellasi July 16, 2019, 6 p.m. UTC | #5
On 16-Jul 17:36, Michal Koutný wrote:
> On Tue, Jul 16, 2019 at 03:34:17PM +0100, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
> > > cpu_util_update_eff internally calls css_for_each_descendant_pre() so
> > > this should be protected with rcu_read_lock().
> > 
> > Right, good catch! Will add in v12.
> When I responded to your other patch, it occurred to me that since
> cpu_util_update_eff writes down to child csses, it should also take
> uclamp_mutex here to avoid racing with direct cgroup attribute
> writers.

Yep, I should drop the "dedicated" mutex we have now in
sysctl_sched_uclamp_handler() and use the uclamp_mutex we already
have.

Thanks, Patrick
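
Folding both review points in, a sketch of where this could land for
v12 (assuming the file-scope uclamp_mutex introduced earlier in the
series; a sketch of the direction, not the final code):

static void uclamp_update_root_tg(void)
{
	struct task_group *tg = &root_task_group;

	uclamp_se_set(&tg->uclamp_req[UCLAMP_MIN],
		      sysctl_sched_uclamp_util_min, false);
	uclamp_se_set(&tg->uclamp_req[UCLAMP_MAX],
		      sysctl_sched_uclamp_util_max, false);

	/* cpu_util_update_eff() walks the css hierarchy with
	 * css_for_each_descendant_pre(), hence the RCU read side. */
	rcu_read_lock();
	cpu_util_update_eff(&root_task_group.css);
	rcu_read_unlock();
}

with sysctl_sched_uclamp_handler() taking uclamp_mutex instead of its
local static DEFINE_MUTEX(mutex), so that sysctl and cgroup attribute
writers are serialized against each other.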

Patch

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ec91f4518752..276f9c2f6103 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1017,12 +1017,30 @@ static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
 		uclamp_rq_dec_id(rq, p, clamp_id);
 }
 
+#ifdef CONFIG_UCLAMP_TASK_GROUP
+static void cpu_util_update_eff(struct cgroup_subsys_state *css);
+static void uclamp_update_root_tg(void)
+{
+	struct task_group *tg = &root_task_group;
+
+	uclamp_se_set(&tg->uclamp_req[UCLAMP_MIN],
+		      sysctl_sched_uclamp_util_min, false);
+	uclamp_se_set(&tg->uclamp_req[UCLAMP_MAX],
+		      sysctl_sched_uclamp_util_max, false);
+
+	cpu_util_update_eff(&root_task_group.css);
+}
+#else
+static void uclamp_update_root_tg(void) { }
+#endif
+
 int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
 				void __user *buffer, size_t *lenp,
 				loff_t *ppos)
 {
-	int old_min, old_max;
+	bool update_root_tg = false;
 	static DEFINE_MUTEX(mutex);
+	int old_min, old_max;
 	int result;
 
 	mutex_lock(&mutex);
@@ -1044,12 +1062,17 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
 	if (old_min != sysctl_sched_uclamp_util_min) {
 		uclamp_se_set(&uclamp_default[UCLAMP_MIN],
 			      sysctl_sched_uclamp_util_min, false);
+		update_root_tg = true;
 	}
 	if (old_max != sysctl_sched_uclamp_util_max) {
 		uclamp_se_set(&uclamp_default[UCLAMP_MAX],
 			      sysctl_sched_uclamp_util_max, false);
+		update_root_tg = true;
 	}
 
+	if (update_root_tg)
+		uclamp_update_root_tg();
+
 	/*
 	 * Updating all the RUNNABLE tasks is expensive, keep it simple and do
 	 * just a lazy update at each next enqueue time.
 	 */
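
The lazy update works because the enqueue path recomputes each task's
effective clamp from scratch, so a new system default (and the root
group clamps derived from it) is picked up the next time a task is
enqueued. Roughly, the per-task logic looks like this (a sketch of the
idea, not the exact in-tree function; uclamp_tg_restrict() is added
earlier in this series):

static inline struct uclamp_se
uclamp_eff_get(struct task_struct *p, unsigned int clamp_id)
{
	/* Task request, already capped by its task group's clamps */
	struct uclamp_se uc_req = uclamp_tg_restrict(p, clamp_id);
	struct uclamp_se uc_max = uclamp_default[clamp_id];

	/* System default restrictions always apply */
	if (unlikely(uc_req.value > uc_max.value))
		return uc_max;

	return uc_req;
}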