[sched_ext/for-6.11] sched_ext: Disallow loading BPF scheduler if isolcpus= domain isolation is in effect

Message ID Zny_5syk1K74HP0D@slm.duckdns.org (mailing list archive)
State Not Applicable
Series [sched_ext/for-6.11] sched_ext: Disallow loading BPF scheduler if isolcpus= domain isolation is in effect

Commit Message

Tejun Heo June 27, 2024, 1:27 a.m. UTC
sched_domains regulate the load balancing for sched_classes. A machine can
be partitioned into multiple sections, with no load balancing across them,
using either the isolcpus= boot param or cpuset partitions. In such cases,
tasks that are in one partition are expected to stay within that partition.

cpuset configured partitions are always reflected in each member task's
cpumask. As SCX always honors the task cpumasks, the BPF scheduler is
automatically in compliance with the configured partitions.

However, for isolcpus= domain isolation, the isolated CPUs are simply
omitted from the top-level sched_domain[s] without further restrictions on
tasks' cpumasks. For example, a task currently running on an isolated CPU
may have more CPUs in its allowed cpumask while still being expected to
remain on that CPU.

There is no straightforward way to enforce this partitioning preemptively on
BPF schedulers, and erroring out after a violation can be surprising.
isolcpus= domain isolation is being replaced with cpuset partitions anyway,
so keep it simple and disallow loading a BPF scheduler if isolcpus= domain
isolation is in effect.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20240626082342.GY31592@noisy.programming.kicks-ass.net
Cc: David Vernet <void@manifault.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/sched/build_policy.c |    1 +
 kernel/sched/ext.c          |    6 ++++++
 2 files changed, 7 insertions(+)
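
As an illustration only (this is not part of the patch), user space can usually tell in advance whether isolcpus= domain isolation is in effect by looking at /sys/devices/system/cpu/isolated, which lists the CPUs left out of the domain housekeeping mask in the usual CPU list format, for example "2-5,8". The sketch below parses such a string and counts the CPUs in it. The helper name count_isolated_cpus() is hypothetical, and the assumption that a nonzero count means the load would be refused follows from the commit message rather than from any documented interface.

```c
#include <stdlib.h>

/*
 * Hypothetical helper, not a kernel or libc API: count the CPUs named in a
 * kernel-style CPU list such as "2-5,8". An empty list (or a bare newline)
 * yields 0. Malformed input yields -1.
 */
static int count_isolated_cpus(const char *list)
{
	const char *p = list;
	int count = 0;

	while (*p && *p != '\n') {
		char *end;
		long first = strtol(p, &end, 10);
		long last = first;

		/* No digits consumed, or a negative CPU number. */
		if (end == p || first < 0)
			return -1;

		/* Optional "-last" part of a range. */
		if (*end == '-') {
			p = end + 1;
			last = strtol(p, &end, 10);
			if (end == p || last < first)
				return -1;
		}

		count += (int)(last - first + 1);

		/* Entries are separated by commas; anything else is an error. */
		if (*end == ',')
			end++;
		else if (*end && *end != '\n')
			return -1;
		p = end;
	}

	return count;
}
```

A caller would read the first line of the sysfs file with fgets() and pass the buffer in. A result greater than zero suggests that at least one CPU is domain-isolated, so a BPF scheduler load on a kernel with this patch would be expected to fail with -EINVAL.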

Comments

Tejun Heo July 8, 2024, 7:30 p.m. UTC | #1
Applied to cgroup/for-6.11.

Thanks.

Patch

--- a/kernel/sched/build_policy.c
+++ b/kernel/sched/build_policy.c
@@ -16,6 +16,7 @@ 
 #include <linux/sched/clock.h>
 #include <linux/sched/cputime.h>
 #include <linux/sched/hotplug.h>
+#include <linux/sched/isolation.h>
 #include <linux/sched/posix-timers.h>
 #include <linux/sched/rt.h>
 
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4400,6 +4400,12 @@  static int scx_ops_enable(struct sched_e
 	unsigned long timeout;
 	int i, cpu, ret;
 
+	if (!cpumask_equal(housekeeping_cpumask(HK_TYPE_DOMAIN),
+			   cpu_possible_mask)) {
+		pr_err("sched_ext: Not compatible with \"isolcpus=\" domain isolation\n");
+		return -EINVAL;
+	}
+
 	mutex_lock(&scx_ops_enable_mutex);
 
 	if (!scx_ops_helper) {
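
The condition added to scx_ops_enable() above can be modelled outside the kernel as a plain mask comparison. This is a minimal sketch under stated assumptions: toy_cpumask_t and toy_scx_enable_check() are hypothetical stand-ins, a 64-bit integer replaces struct cpumask, and the two arguments play the roles of housekeeping_cpumask(HK_TYPE_DOMAIN) and cpu_possible_mask.

```c
#include <errno.h>
#include <stdint.h>

/* Toy stand-in for a cpumask: bit N set means CPU N is in the mask. */
typedef uint64_t toy_cpumask_t;

/*
 * Hypothetical model of the check in the patch. Loading is refused unless
 * the domain housekeeping mask is exactly the set of possible CPUs, that
 * is, unless no CPU has been removed from the scheduling domains.
 */
static int toy_scx_enable_check(toy_cpumask_t hk_domain, toy_cpumask_t possible)
{
	if (hk_domain != possible)
		return -EINVAL;	/* some CPU is domain-isolated */
	return 0;		/* no isolation, loading may proceed */
}
```

With four possible CPUs (mask 0xF), booting with CPUs 2 and 3 isolated would leave a housekeeping mask of 0x3, so the model returns -EINVAL. With no isolation both masks are 0xF and it returns 0.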