From patchwork Fri Mar 14 09:45:33 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrea Righi <arighi@nvidia.com>
X-Patchwork-Id: 14016529
From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>,
	David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>
Cc: bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH 1/8] sched_ext: idle: Honor idle flags in the built-in idle
 selection policy
Date: Fri, 14 Mar 2025 10:45:33 +0100
Message-ID: <20250314094827.167563-2-arighi@nvidia.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250314094827.167563-1-arighi@nvidia.com>
References: <20250314094827.167563-1-arighi@nvidia.com>
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org
List-Id: <bpf.vger.kernel.org>
List-Subscribe: <mailto:bpf+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:bpf+unsubscribe@vger.kernel.org>

Allow idle flags (%SCX_PICK_IDLE_*) to be passed to scx_select_cpu_dfl()
to enforce strict selection criteria, such as picking an idle CPU only
within @prev_cpu's node or only from a fully idle SMT core. Existing
callers pass 0, so their behavior is unchanged.

This functionality will be exposed through a dedicated kfunc in a
separate patch.

Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
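For illustration only, the effect of the new flags argument can be
modelled in a small user-space C sketch. Everything here is an
assumption made for the example: the MODEL_* names and values, the toy
2-node / SMT-2 topology, and the helper functions are hypothetical and
are not the kernel implementation. The sketch only mirrors the shape of
the flow: try a fully idle core first, then any idle CPU unless the
caller asked strictly for a full core, and never leave prev_cpu's node
when the in-node flag is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for %SCX_PICK_IDLE_*; values are illustrative. */
#define MODEL_PICK_IDLE_CORE	(1ULL << 0)
#define MODEL_PICK_IDLE_IN_NODE	(1ULL << 1)

/* Toy topology: 2 nodes x 2 cores x 2 SMT siblings (sibling = cpu ^ 1). */
#define MODEL_NR_CPUS 8

static int model_cpu_to_node(int cpu)
{
	return cpu / 4;
}

static bool model_core_idle(const bool idle[], int cpu)
{
	return idle[cpu] && idle[cpu ^ 1];
}

/* Return an idle CPU matching @flags, or -1 if there is none. */
static int model_pick_idle_cpu(const bool idle[], int prev_cpu, uint64_t flags)
{
	int node, pass, cpu;

	assert(prev_cpu >= 0 && prev_cpu < MODEL_NR_CPUS);
	node = model_cpu_to_node(prev_cpu);

	/* Pass 0 scans prev_cpu's node, pass 1 scans the other node. */
	for (pass = 0; pass < 2; pass++) {
		if (pass == 1 && (flags & MODEL_PICK_IDLE_IN_NODE))
			break;
		for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
			bool local = model_cpu_to_node(cpu) == node;

			if (local != (pass == 0))
				continue;
			if (flags & MODEL_PICK_IDLE_CORE) {
				if (model_core_idle(idle, cpu))
					return cpu;
			} else if (idle[cpu]) {
				return cpu;
			}
		}
	}
	return -1;
}

/* Full-idle core first; any idle CPU only if the caller is not strict. */
static int model_select_cpu(const bool idle[], int prev_cpu, uint64_t flags,
			    bool *found)
{
	int cpu;

	*found = true;
	cpu = model_pick_idle_cpu(idle, prev_cpu, flags | MODEL_PICK_IDLE_CORE);
	if (cpu >= 0)
		return cpu;

	/* Give up if we are strictly looking for a full-idle core. */
	if (!(flags & MODEL_PICK_IDLE_CORE)) {
		cpu = model_pick_idle_cpu(idle, prev_cpu, flags);
		if (cpu >= 0)
			return cpu;
	}

	*found = false;
	return prev_cpu;
}
```

With only CPUs 3, 4 and 5 idle and prev_cpu = 0, this model picks CPU 4
(a full core on the remote node) when flags is 0, CPU 3 when only the
in-node flag is set, and reports "not found" when both flags are set.
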
 kernel/sched/ext.c      |  2 +-
 kernel/sched/ext_idle.c | 41 ++++++++++++++++++++++++++++++-----------
 kernel/sched/ext_idle.h |  2 +-
 3 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index db5bc4d57dba4..1756fbb8a668f 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -3396,7 +3396,7 @@ static int select_task_rq_scx(struct task_struct *p, int prev_cpu, int wake_flag
 		bool found;
 		s32 cpu;
 
-		cpu = scx_select_cpu_dfl(p, prev_cpu, wake_flags, &found);
+		cpu = scx_select_cpu_dfl(p, prev_cpu, wake_flags, 0, &found);
 		p->scx.selected_cpu = cpu;
 		if (found) {
 			p->scx.slice = SCX_SLICE_DFL;
diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
index 15e9d1c8b2815..16981456ec1ed 100644
--- a/kernel/sched/ext_idle.c
+++ b/kernel/sched/ext_idle.c
@@ -418,7 +418,7 @@ void scx_idle_update_selcpu_topology(struct sched_ext_ops *ops)
  * NOTE: tasks that can only run on 1 CPU are excluded by this logic, because
  * we never call ops.select_cpu() for them, see select_task_rq().
  */
-s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, bool *found)
+s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64 flags, bool *found)
 {
 	const struct cpumask *llc_cpus = NULL;
 	const struct cpumask *numa_cpus = NULL;
@@ -455,12 +455,13 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, bool
 	 * If WAKE_SYNC, try to migrate the wakee to the waker's CPU.
 	 */
 	if (wake_flags & SCX_WAKE_SYNC) {
-		cpu = smp_processor_id();
+		int waker_node;
 
 		/*
 		 * If the waker's CPU is cache affine and prev_cpu is idle,
 		 * then avoid a migration.
 		 */
+		cpu = smp_processor_id();
 		if (cpus_share_cache(cpu, prev_cpu) &&
 		    scx_idle_test_and_clear_cpu(prev_cpu)) {
 			cpu = prev_cpu;
@@ -480,9 +481,11 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, bool
 		 * piled up on it even if there is an idle core elsewhere on
 		 * the system.
 		 */
+		waker_node = cpu_to_node(cpu);
 		if (!(current->flags & PF_EXITING) &&
 		    cpu_rq(cpu)->scx.local_dsq.nr == 0 &&
-		    !cpumask_empty(idle_cpumask(cpu_to_node(cpu))->cpu)) {
+		    (!(flags & SCX_PICK_IDLE_IN_NODE) || (waker_node == node)) &&
+		    !cpumask_empty(idle_cpumask(waker_node)->cpu)) {
 			if (cpumask_test_cpu(cpu, p->cpus_ptr))
 				goto cpu_found;
 		}
@@ -521,15 +524,25 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, bool
 		}
 
 		/*
-		 * Search for any full idle core usable by the task.
+		 * Search for any full-idle core usable by the task.
 		 *
-		 * If NUMA aware idle selection is enabled, the search will
+		 * If the node-aware idle CPU selection policy is enabled
+		 * (%SCX_OPS_BUILTIN_IDLE_PER_NODE), the search will always
 		 * begin in prev_cpu's node and proceed to other nodes in
 		 * order of increasing distance.
 		 */
-		cpu = scx_pick_idle_cpu(p->cpus_ptr, node, SCX_PICK_IDLE_CORE);
+		cpu = scx_pick_idle_cpu(p->cpus_ptr, node, flags | SCX_PICK_IDLE_CORE);
 		if (cpu >= 0)
 			goto cpu_found;
+
+		/*
+		 * Give up if we're strictly looking for a full-idle SMT
+		 * core.
+		 */
+		if (flags & SCX_PICK_IDLE_CORE) {
+			cpu = prev_cpu;
+			goto out_unlock;
+		}
 	}
 
 	/*
@@ -560,18 +573,24 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, bool
 
 	/*
 	 * Search for any idle CPU usable by the task.
+	 *
+	 * If the node-aware idle CPU selection policy is enabled
+	 * (%SCX_OPS_BUILTIN_IDLE_PER_NODE), the search will always begin
+	 * in prev_cpu's node and proceed to other nodes in order of
+	 * increasing distance.
 	 */
-	cpu = scx_pick_idle_cpu(p->cpus_ptr, node, 0);
+	cpu = scx_pick_idle_cpu(p->cpus_ptr, node, flags);
 	if (cpu >= 0)
 		goto cpu_found;
 
-	rcu_read_unlock();
-	return prev_cpu;
+	cpu = prev_cpu;
+	goto out_unlock;
 
 cpu_found:
+	*found = true;
+out_unlock:
 	rcu_read_unlock();
 
-	*found = true;
 	return cpu;
 }
 
@@ -810,7 +829,7 @@ __bpf_kfunc s32 scx_bpf_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
 		goto prev_cpu;
 
 #ifdef CONFIG_SMP
-	return scx_select_cpu_dfl(p, prev_cpu, wake_flags, is_idle);
+	return scx_select_cpu_dfl(p, prev_cpu, wake_flags, 0, is_idle);
 #endif
 
 prev_cpu:
diff --git a/kernel/sched/ext_idle.h b/kernel/sched/ext_idle.h
index 68c4307ce4f6f..5c1db6b315f7a 100644
--- a/kernel/sched/ext_idle.h
+++ b/kernel/sched/ext_idle.h
@@ -27,7 +27,7 @@ static inline s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, int node
 }
 #endif /* CONFIG_SMP */
 
-s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, bool *found);
+s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64 flags, bool *found);
 void scx_idle_enable(struct sched_ext_ops *ops);
 void scx_idle_disable(void);
 int scx_idle_init(void);

From patchwork Fri Mar 14 09:45:34 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrea Righi <arighi@nvidia.com>
X-Patchwork-Id: 14016530
From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>,
	David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>
Cc: bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH 2/8] sched_ext: idle: Refactor scx_select_cpu_dfl()
Date: Fri, 14 Mar 2025 10:45:34 +0100
Message-ID: <20250314094827.167563-3-arighi@nvidia.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250314094827.167563-1-arighi@nvidia.com>
References: <20250314094827.167563-1-arighi@nvidia.com>
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org
List-Id: <bpf.vger.kernel.org>
List-Subscribe: <mailto:bpf+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:bpf+unsubscribe@vger.kernel.org>

Make scx_select_cpu_dfl() more consistent with the other idle-related
APIs by returning a negative value when an idle CPU isn't found. Callers
now fall back to @prev_cpu themselves.

No functional change intended; this is purely a refactoring.

Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
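As a rough illustration of the calling convention described above, the
user-space C sketch below contrasts a selector that returns a negative
value on failure with a thin wrapper that rebuilds the older
"CPU plus bool" contract. The model_* names, the 4-CPU array and the
choice of -EBUSY as the negative value are assumptions made for the
example; the patch itself only states that a negative value is returned.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Return an idle CPU (preferring @prev_cpu), or a negative errno. */
static int model_select_idle(const bool idle[], int prev_cpu)
{
	int cpu;

	assert(prev_cpu >= 0 && prev_cpu < MODEL_NR_CPUS);

	if (idle[prev_cpu])
		return prev_cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (idle[cpu])
			return cpu;
	}

	/* Assumed error code, chosen only for this sketch. */
	return -EBUSY;
}

/* Caller-side fallback: report idleness and default to @prev_cpu. */
static int model_select_compat(const bool idle[], int prev_cpu, bool *is_idle)
{
	int cpu = model_select_idle(idle, prev_cpu);

	if (cpu >= 0) {
		*is_idle = true;
		return cpu;
	}

	*is_idle = false;
	return prev_cpu;
}
```

Keeping the fallback in the caller means the selector has a single
failure signal (cpu < 0), and each caller decides what "no idle CPU"
should translate to.
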
 kernel/sched/ext.c      |  9 ++++----
 kernel/sched/ext_idle.c | 50 +++++++++++++++++++++++------------------
 kernel/sched/ext_idle.h |  2 +-
 3 files changed, 34 insertions(+), 27 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 1756fbb8a668f..06561d6717c9a 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -3393,16 +3393,17 @@ static int select_task_rq_scx(struct task_struct *p, int prev_cpu, int wake_flag
 		else
 			return prev_cpu;
 	} else {
-		bool found;
 		s32 cpu;
 
-		cpu = scx_select_cpu_dfl(p, prev_cpu, wake_flags, 0, &found);
-		p->scx.selected_cpu = cpu;
-		if (found) {
+		cpu = scx_select_cpu_dfl(p, prev_cpu, wake_flags, 0);
+		if (cpu >= 0) {
 			p->scx.slice = SCX_SLICE_DFL;
 			p->scx.ddsp_dsq_id = SCX_DSQ_LOCAL;
 			__scx_add_event(SCX_EV_ENQ_SLICE_DFL, 1);
+		} else {
+			cpu = prev_cpu;
 		}
+		p->scx.selected_cpu = cpu;
 
 		if (rq_bypass)
 			__scx_add_event(SCX_EV_BYPASS_DISPATCH, 1);
diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
index 16981456ec1ed..52c36a70a3d04 100644
--- a/kernel/sched/ext_idle.c
+++ b/kernel/sched/ext_idle.c
@@ -411,22 +411,26 @@ void scx_idle_update_selcpu_topology(struct sched_ext_ops *ops)
  *
  * 5. Pick any idle CPU usable by the task.
  *
- * Step 3 and 4 are performed only if the system has, respectively, multiple
- * LLC domains / multiple NUMA nodes (see scx_selcpu_topo_llc and
- * scx_selcpu_topo_numa).
+ * Steps 3 and 4 are performed only if the system has, respectively,
+ * multiple LLCs / multiple NUMA nodes (see scx_selcpu_topo_llc and
+ * scx_selcpu_topo_numa) and they don't contain the same subset of CPUs.
+ *
+ * If %SCX_OPS_BUILTIN_IDLE_PER_NODE is enabled, the search will always
+ * begin in @prev_cpu's node and proceed to other nodes in order of
+ * increasing distance.
+ *
+ * Return the picked CPU if idle, or a negative value otherwise.
  *
  * NOTE: tasks that can only run on 1 CPU are excluded by this logic, because
  * we never call ops.select_cpu() for them, see select_task_rq().
  */
-s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64 flags, bool *found)
+s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64 flags)
 {
 	const struct cpumask *llc_cpus = NULL;
 	const struct cpumask *numa_cpus = NULL;
 	int node = scx_cpu_node_if_enabled(prev_cpu);
 	s32 cpu;
 
-	*found = false;
-
 	/*
 	 * This is necessary to protect llc_cpus.
 	 */
@@ -465,7 +469,7 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64
 		if (cpus_share_cache(cpu, prev_cpu) &&
 		    scx_idle_test_and_clear_cpu(prev_cpu)) {
 			cpu = prev_cpu;
-			goto cpu_found;
+			goto out_unlock;
 		}
 
 		/*
@@ -487,7 +491,7 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64
 		    (!(flags & SCX_PICK_IDLE_IN_NODE) || (waker_node == node)) &&
 		    !cpumask_empty(idle_cpumask(waker_node)->cpu)) {
 			if (cpumask_test_cpu(cpu, p->cpus_ptr))
-				goto cpu_found;
+				goto out_unlock;
 		}
 	}
 
@@ -502,7 +506,7 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64
 		if (cpumask_test_cpu(prev_cpu, idle_cpumask(node)->smt) &&
 		    scx_idle_test_and_clear_cpu(prev_cpu)) {
 			cpu = prev_cpu;
-			goto cpu_found;
+			goto out_unlock;
 		}
 
 		/*
@@ -511,7 +515,7 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64
 		if (llc_cpus) {
 			cpu = pick_idle_cpu_in_node(llc_cpus, node, SCX_PICK_IDLE_CORE);
 			if (cpu >= 0)
-				goto cpu_found;
+				goto out_unlock;
 		}
 
 		/*
@@ -520,7 +524,7 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64
 		if (numa_cpus) {
 			cpu = pick_idle_cpu_in_node(numa_cpus, node, SCX_PICK_IDLE_CORE);
 			if (cpu >= 0)
-				goto cpu_found;
+				goto out_unlock;
 		}
 
 		/*
@@ -533,7 +537,7 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64
 		 */
 		cpu = scx_pick_idle_cpu(p->cpus_ptr, node, flags | SCX_PICK_IDLE_CORE);
 		if (cpu >= 0)
-			goto cpu_found;
+			goto out_unlock;
 
 		/*
 		 * Give up if we're strictly looking for a full-idle SMT
@@ -550,7 +554,7 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64
 	 */
 	if (scx_idle_test_and_clear_cpu(prev_cpu)) {
 		cpu = prev_cpu;
-		goto cpu_found;
+		goto out_unlock;
 	}
 
 	/*
@@ -559,7 +563,7 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64
 	if (llc_cpus) {
 		cpu = pick_idle_cpu_in_node(llc_cpus, node, 0);
 		if (cpu >= 0)
-			goto cpu_found;
+			goto out_unlock;
 	}
 
 	/*
@@ -568,7 +572,7 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64
 	if (numa_cpus) {
 		cpu = pick_idle_cpu_in_node(numa_cpus, node, 0);
 		if (cpu >= 0)
-			goto cpu_found;
+			goto out_unlock;
 	}
 
 	/*
@@ -581,13 +585,8 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64
 	 */
 	cpu = scx_pick_idle_cpu(p->cpus_ptr, node, flags);
 	if (cpu >= 0)
-		goto cpu_found;
-
-	cpu = prev_cpu;
-	goto out_unlock;
+		goto out_unlock;
 
-cpu_found:
-	*found = true;
 out_unlock:
 	rcu_read_unlock();
 
@@ -819,6 +818,9 @@ __bpf_kfunc int scx_bpf_cpu_node(s32 cpu)
 __bpf_kfunc s32 scx_bpf_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
 				       u64 wake_flags, bool *is_idle)
 {
+#ifdef CONFIG_SMP
+	s32 cpu;
+#endif
 	if (!ops_cpu_valid(prev_cpu, NULL))
 		goto prev_cpu;
 
@@ -829,7 +831,11 @@ __bpf_kfunc s32 scx_bpf_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
 		goto prev_cpu;
 
 #ifdef CONFIG_SMP
-	return scx_select_cpu_dfl(p, prev_cpu, wake_flags, 0, is_idle);
+	cpu = scx_select_cpu_dfl(p, prev_cpu, wake_flags, 0);
+	if (cpu >= 0) {
+		*is_idle = true;
+		return cpu;
+	}
 #endif
 
 prev_cpu:
diff --git a/kernel/sched/ext_idle.h b/kernel/sched/ext_idle.h
index 5c1db6b315f7a..511cc2221f7a8 100644
--- a/kernel/sched/ext_idle.h
+++ b/kernel/sched/ext_idle.h
@@ -27,7 +27,7 @@ static inline s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, int node
 }
 #endif /* CONFIG_SMP */
 
-s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64 flags, bool *found);
+s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64 flags);
 void scx_idle_enable(struct sched_ext_ops *ops);
 void scx_idle_disable(void);
 int scx_idle_init(void);

From patchwork Fri Mar 14 09:45:35 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrea Righi <arighi@nvidia.com>
X-Patchwork-Id: 14016531
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass
 smtp.mailfrom=nvidia.com; dmarc=pass action=none header.from=nvidia.com;
 dkim=pass header.d=nvidia.com; arc=none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com;
 s=selector2;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=YPm/U4270S6e++2WO+Sw7SXsT8WrGZ1yMKmTeOvebvQ=;
 b=hHCzZtaeg9DgoNCPxaPXFb0xvu9Kx1RCJdbsPr4NwR16BnetMENNSXIvm0sJcjUXkLAEsRXxaoPDpeQ9rpoSl7UxY5WsrEaJD7fYfM4nvIjbfm1vNl0W/yLG8+jd/E3lQezR0mQLwKh3xSSNqXUKdhJDCpxH9ZPi0jP1U65+UJ76piXeAX48jW4y7s4GsjCZAjlNaeMI5+3xWvdqK11zagfYq86VxY3aH41zde9U/yZh54dRzsivj8gWY+8T5WLhsHVf1NqQyeOqXUnKQ6wGHPjbRpDLuCUpm6kh33UOCRST+Sa2oX5PaPHVml0vjUOwK7J5QHZr5SyGNNu1/u5Ijw==
Authentication-Results: dkim=none (message not signed)
 header.d=none;dmarc=none action=none header.from=nvidia.com;
Received: from CY5PR12MB6405.namprd12.prod.outlook.com (2603:10b6:930:3e::17)
 by CY5PR12MB6431.namprd12.prod.outlook.com (2603:10b6:930:39::8) with
 Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8511.27; Fri, 14 Mar
 2025 09:49:08 +0000
Received: from CY5PR12MB6405.namprd12.prod.outlook.com
 ([fe80::2119:c96c:b455:53b5]) by CY5PR12MB6405.namprd12.prod.outlook.com
 ([fe80::2119:c96c:b455:53b5%5]) with mapi id 15.20.8511.031; Fri, 14 Mar 2025
 09:49:07 +0000
From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>,
	David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>
Cc: bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH 3/8] sched_ext: idle: Extend topology optimizations to all
 tasks
Date: Fri, 14 Mar 2025 10:45:35 +0100
Message-ID: <20250314094827.167563-4-arighi@nvidia.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250314094827.167563-1-arighi@nvidia.com>
References: <20250314094827.167563-1-arighi@nvidia.com>
X-ClientProxiedBy: ZR0P278CA0015.CHEP278.PROD.OUTLOOK.COM
 (2603:10a6:910:16::25) To CY5PR12MB6405.namprd12.prod.outlook.com
 (2603:10b6:930:3e::17)
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org
List-Id: <bpf.vger.kernel.org>
List-Subscribe: <mailto:bpf+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:bpf+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
X-MS-PublicTrafficType: Email
X-MS-TrafficTypeDiagnostic: CY5PR12MB6405:EE_|CY5PR12MB6431:EE_
X-MS-Office365-Filtering-Correlation-Id: 19d6974d-a277-469f-5b08-08dd62dd76d6
X-MS-Exchange-SenderADCheck: 1
X-MS-Exchange-AntiSpam-Relay: 0
X-Microsoft-Antispam: BCL:0;ARA:13230040|1800799024|366016|376014;
X-Microsoft-Antispam-Message-Info: 
 JP3qVEQoz9h59JjnIAXrxTNdl0ZNWWsC0+o8Ihi00DgY+9GOxa7XWXzPxOge6meUkob+Mi9DW+HcYFtkvETvtPRDSoYlLlc1uxzhHGJglDnucylp00X4+Q13scXcBL+xefe7jGFtigmaZB8CwDulvk2zvuAsIX+vvL1ixC5NpaZWFYs5BtQIgXnlUtDgDW8R5mssmjjwV8AHDICaIh7JOPchTcfB0QLzUS9O114BTWNPs5lNth25cn57nCRfcbWz3XsFqRXmKTbjmT8aAanYBv9QGsarbTfXhOyr7rOMPF6KEpRqO9WEUtn+Dhhjb5y3qxTrAH1jijLyOn1PqeTaVDygidcMUybRsRZ4HLsBCxYkHH/J2IoWnbhHtV+sfNCFN+IxYHMG7aiBpLrXZ5aZzC0IpOy7SDWJSlsBMk94SfMDLHVG1q462Jhe2mxtcamvbiHECAyvrSyjnFm8yT0lxpaejAYASLpNmXEc8cHvFUHGw/3kDWcRw2VqIkMP+zsEhD0BvPo7AE/cZtvvvR9Yuzp6rc6E3CukIAmiZ74QQG2Gjh/qj8pJyo1l3/OJ5x6z/EETiqF1dj2cwNJ5NUmO59X1oDJRBOAW1aIQcskJA3CebxlSUZk/2YY2Aae5XvUifr4czoM7vxbm5fGAzvrHVTed/4btVdJrVfaRy4GuoAWGEPopxfBa4LsSOSpFzd+l4TtnwP3wWet0iMGUttWfmVAwAEJFUq6J7sN+ZdnlqEcPNNicRJuGbplZERjuYVIlVMyFTrSSBHCdJlZ66zLxSWUH1ZZc4W1EvhPcFnPl7kweKdBi2rDGQD6MyyK/j+cWHpkXYuUZYg70Y50SEq0EDpLAhI5LYuKBW5GDEU/36gR+I0srUwUSJKjUCT/mdgDNLf10JMloMWH7p3OEpwXjUn3RO825jmzXbZ8M21LnQ1TaLXVDSGiNPuMoWTVnFtxjor2OfBrz7jJdmqfP5vPnBv2q9+L93Rf8338vKIXno1bn3a5hDoftyZd9rSZ/ciZknn6djgRXv/IDtqszJl7IHzsgWbbRFKJMa6+ftUpE8CwykmA1iuEdNZVBjPWhsQsxTxbCs9QL6oXLEyBbOi1ZgKI+PIz6pkFZUzxZ0WwUVEGS0g+baGVjuQoThC9aVk0wioeZZERnO6Lyv4vz6wfLVY1B6Q77OGuRszYukTICP8fTsetOXKWrd96dEfuUrvwRKvNXL06MnjL8iYStXnV2MoAkLOS/iJWKFmx2xAX0R01EEtYnXTRFQI5EoN9BvSWO8iP35lDwJrOGOjP/vZbe1W+CqLYbzqh+KGGqMwfWVAUJK3MrGlkJY4pqNr58rEQ0mH6yuZ+pCvNcg5jDQN6WQ+kLUfcscj1hXKMBt6U5/y9P7RvXoV/SDJig/qepTpS+
X-Forefront-Antispam-Report: 
	CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:CY5PR12MB6405.namprd12.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230040)(1800799024)(366016)(376014);DIR:OUT;SFP:1101;
X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1
X-MS-Exchange-AntiSpam-MessageData-0: 
 ULigGzKBW8Re2l2aoc1YgVnCZYx39ecZ1KIsZFOzWOcN2lgtGPkZF7lRxndrqiPNTEkLfBZsTRfx3zra5PNLnljwmsua8IfihLI9ow76s99erpo6lOeGZeTZKRYppR0F+0W0carvgdTdDo1eV49PdEnkBk40iqML6VYYc5pEFBicB7ezbdtH3m5lLE7TnFv1kcrYffbxCzewGiX0VmIbM2DgB4Jf/sj7C2xktlKOjKEtPyXJp50DjNeWZdV2Y9iMoS6z4aqEGPekhpC4TKSzK+2FjLlLWhsJQJOT4hmzGOwQ757RREdCOXoHBXIAcHgFjb+r/pCIkLqr3QJjPnfk5bwRAnougI5CXvf/AGjaghau1qU21WEG6srEgs4E5fx0wmerQpMQhXf8Pzy4Sviz0KUKqpkFtD91Ey0oO1pnNQJCAiJikmsuzZIwIwV4UYCoe8DO86rXZifEIYNazm/9O9CxQsQZCRj7lw+BeJUZgAqGuzM5n4IPHiKDmOhONlP1RmIotY0V5Ez0o72iGWoDghLPjPH/ZLZ7O/9ymEyP573mH0tVTopxLYM2Deic1feEbKidEa9iyWAthgSuekDroNcxzchuqt6FHGmUosfzoW1+b6Mr2y3z9NJIdzjC21rFR6/PoKDOQaE2BRXRj2zH7CZSSO2CnxOUokOcyc0AB0eaoVGQH9SmJotxAwxdi2Y0nXy6QtHnYunVhQj5NNOfmbS//1shFPtzjfHg2Tm2SKHbNWGaT5mX4L6RhGAcprDfZ0WPcRKdZxINODp1IBSx4IG0iULubIXpkNjseJV5dIxPRJdPfYwxp+yvARsV81hloMbZQQNkT46kIotlM9NgtahcyDVfS8gd92BXplduvxOLUe9sgVYJkfSeUm4MEHVksVD850a20yCLeuSnFVA7mbu1DUjCGP1Xyi4Y0bh/D7wR5msYnTuFKvy3XfWstz1KY9IyurHXsjbAsqtZ8MIGCsODmrVvzPGFBbRu4276MOzanJvCsVLRfYpLjsian/1s53lePZK6J0rk7vhnp3o7MVELaMx4pxNgVBBd7TAA5UbzYjcw4qUUUNVho0+3nZVj8FzicnjicWWODCDynltgIARginSwngnIn/gOpKPbpCXSW1zM+2RHnZTqaTQR4jQB0is7/8J+RNgx0GwFCgGDDG7WuX558fsxOsQNs3NMcjc6EGG0DPsDp1CPDV7AAQW67YoXOetBh4P3wHLZym97OleRUdz6HDB4qVuSupplJBoWXLnWkL+Q3kma86e/TxZOT+HUeLaa/wCPWCCkV9FW4vyypOjQkAIJowjr7fSdbwW7b42g0PwyTIuRvbBth5wMip8d4+186JwdcjxPz0uaZ0Ivc/ohB3I2Qx10ZwvlzNjO0PQELL0PU8BbcF7rmN/YJ2JcWekzYUS+TYjVtSH8heRtlb4X6ro2LJX5bwP9WbDyQUDbKSm89kHdXOYXLgrZhqXIaVVLkS9E3LAOc8DyesgpPB2AwT+wAdDTuq9IeWUPg9T2NINQdmrlCFS263/J1pZ/4zFJPFraKcn2922TocJiQf3U6adqy9kX+dsS8v7aZkjfFxHeBNlpVb2H5Tp5
X-OriginatorOrg: Nvidia.com
X-MS-Exchange-CrossTenant-Network-Message-Id: 
 19d6974d-a277-469f-5b08-08dd62dd76d6
X-MS-Exchange-CrossTenant-AuthSource: CY5PR12MB6405.namprd12.prod.outlook.com
X-MS-Exchange-CrossTenant-AuthAs: Internal
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 14 Mar 2025 09:49:07.8335
 (UTC)
X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted
X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a
X-MS-Exchange-CrossTenant-MailboxType: HOSTED
X-MS-Exchange-CrossTenant-UserPrincipalName: 
 DlvlXBdn/gxWhkF+5jDDEnA9fMKc8TJUOEWeqURR2s5hvM6InqAzlucAm4Rbi9VZEuXEBKs7uVohFePMiq5LRg==
X-MS-Exchange-Transport-CrossTenantHeadersStamped: CY5PR12MB6431

The built-in idle selection policy, scx_select_cpu_dfl(), prioritizes
picking idle CPUs within the same LLC or NUMA node as the previous CPU,
but these optimizations are currently applied only when a task has no
CPU affinity constraints.

This is done primarily for efficiency, as it avoids the overhead of
updating a cpumask every time we need to select an idle CPU (which can
be costly in large SMP systems).

However, this approach limits the effectiveness of the built-in idle
policy and results in inconsistent behavior, as affinity-restricted
tasks don't benefit from topology-aware optimizations.

To address this, modify the policy to apply LLC and NUMA-aware
optimizations even when a task is constrained to a subset of CPUs.

We can still avoid updating the cpumasks in the common case by checking
whether the LLC and node CPUs are a subset of the CPUs the task is
allowed to use (which is true in most cases, i.e., for tasks that
don't have affinity constraints).

Otherwise, use temporary local per-CPU cpumasks to determine the LLC and
node subsets, minimizing potential overhead even on large SMP systems.

Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
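Note: the mask selection described above can be sketched in user space
as follows. This is only an illustrative model, not the kernel code: a
cpumask is approximated by a single 64-bit word, and toy_cpumask_t,
toy_subset() and toy_pick_domain_mask() are hypothetical names that do
not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for struct cpumask: one bit per CPU, at most 64 CPUs. */
typedef uint64_t toy_cpumask_t;

/* True if every CPU set in @a is also set in @b. */
static bool toy_subset(toy_cpumask_t a, toy_cpumask_t b)
{
	return (a & ~b) == 0;
}

/*
 * Pick the mask to use for the LLC/node-local search.
 *
 * @span:    CPUs in the LLC or NUMA node of prev_cpu (may be NULL)
 * @allowed: CPUs the task is allowed to use
 * @scratch: caller-provided temporary mask (models the per-CPU cpumask)
 *
 * Returns @span when it can be used as-is, @scratch when an intersection
 * had to be computed, or NULL when the domain-local search is skipped.
 */
static const toy_cpumask_t *toy_pick_domain_mask(const toy_cpumask_t *span,
						 const toy_cpumask_t *allowed,
						 toy_cpumask_t *scratch)
{
	assert(allowed && scratch);

	/* No domain, or domain identical to the allowed set: skip it. */
	if (!span || *span == *allowed)
		return NULL;

	/* Fast path: the whole domain is usable, no mask update needed. */
	if (toy_subset(*span, *allowed))
		return span;

	/* Slow path: restrict the domain to the allowed CPUs. */
	*scratch = *span & *allowed;
	return *scratch ? scratch : NULL;
}
```

In this model the scratch mask is written only on the slow path, which
mirrors the intent of keeping the common (no affinity) case free of
cpumask updates.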
 kernel/sched/ext_idle.c | 73 +++++++++++++++++++++++++++--------------
 1 file changed, 49 insertions(+), 24 deletions(-)

diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
index 52c36a70a3d04..1940baedde157 100644
--- a/kernel/sched/ext_idle.c
+++ b/kernel/sched/ext_idle.c
@@ -46,6 +46,12 @@ static struct scx_idle_cpus scx_idle_global_masks;
  */
 static struct scx_idle_cpus **scx_idle_node_masks;
 
+/*
+ * Local per-CPU cpumasks (used to generate temporary idle cpumasks).
+ */
+static DEFINE_PER_CPU(cpumask_var_t, local_llc_idle_cpumask);
+static DEFINE_PER_CPU(cpumask_var_t, local_numa_idle_cpumask);
+
 /*
  * Return the idle masks associated to a target @node.
  *
@@ -426,8 +432,7 @@ void scx_idle_update_selcpu_topology(struct sched_ext_ops *ops)
  */
 s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64 flags)
 {
-	const struct cpumask *llc_cpus = NULL;
-	const struct cpumask *numa_cpus = NULL;
+	struct cpumask *llc_cpus = NULL, *numa_cpus = NULL;
 	int node = scx_cpu_node_if_enabled(prev_cpu);
 	s32 cpu;
 
@@ -437,22 +442,34 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64
 	rcu_read_lock();
 
 	/*
-	 * Determine the scheduling domain only if the task is allowed to run
-	 * on all CPUs.
-	 *
-	 * This is done primarily for efficiency, as it avoids the overhead of
-	 * updating a cpumask every time we need to select an idle CPU (which
-	 * can be costly in large SMP systems), but it also aligns logically:
-	 * if a task's scheduling domain is restricted by user-space (through
-	 * CPU affinity), the task will simply use the flat scheduling domain
-	 * defined by user-space.
+	 * Determine the subset of CPUs that the task can use in its
+	 * current LLC and node.
 	 */
-	if (p->nr_cpus_allowed >= num_possible_cpus()) {
-		if (static_branch_maybe(CONFIG_NUMA, &scx_selcpu_topo_numa))
-			numa_cpus = numa_span(prev_cpu);
-
-		if (static_branch_maybe(CONFIG_SCHED_MC, &scx_selcpu_topo_llc))
-			llc_cpus = llc_span(prev_cpu);
+	if (static_branch_maybe(CONFIG_NUMA, &scx_selcpu_topo_numa)) {
+		struct cpumask *cpus = numa_span(prev_cpu);
+
+		if (cpus && !cpumask_equal(cpus, p->cpus_ptr)) {
+			if (cpumask_subset(cpus, p->cpus_ptr)) {
+				numa_cpus = cpus;
+			} else {
+				numa_cpus = this_cpu_cpumask_var_ptr(local_numa_idle_cpumask);
+				if (!cpumask_and(numa_cpus, cpus, p->cpus_ptr))
+					numa_cpus = NULL;
+			}
+		}
+	}
+	if (static_branch_maybe(CONFIG_SCHED_MC, &scx_selcpu_topo_llc)) {
+		struct cpumask *cpus = llc_span(prev_cpu);
+
+		if (cpus && !cpumask_equal(cpus, p->cpus_ptr)) {
+			if (cpumask_subset(cpus, p->cpus_ptr)) {
+				llc_cpus = cpus;
+			} else {
+				llc_cpus = this_cpu_cpumask_var_ptr(local_llc_idle_cpumask);
+				if (!cpumask_and(llc_cpus, cpus, p->cpus_ptr))
+					llc_cpus = NULL;
+			}
+		}
 	}
 
 	/*
@@ -598,7 +615,7 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64
  */
 void scx_idle_init_masks(void)
 {
-	int node;
+	int i;
 
 	/* Allocate global idle cpumasks */
 	BUG_ON(!alloc_cpumask_var(&scx_idle_global_masks.cpu, GFP_KERNEL));
@@ -609,13 +626,21 @@ void scx_idle_init_masks(void)
 				      sizeof(*scx_idle_node_masks), GFP_KERNEL);
 	BUG_ON(!scx_idle_node_masks);
 
-	for_each_node(node) {
-		scx_idle_node_masks[node] = kzalloc_node(sizeof(**scx_idle_node_masks),
-							 GFP_KERNEL, node);
-		BUG_ON(!scx_idle_node_masks[node]);
+	for_each_node(i) {
+		scx_idle_node_masks[i] = kzalloc_node(sizeof(**scx_idle_node_masks),
+							 GFP_KERNEL, i);
+		BUG_ON(!scx_idle_node_masks[i]);
+
+		BUG_ON(!alloc_cpumask_var_node(&scx_idle_node_masks[i]->cpu, GFP_KERNEL, i));
+		BUG_ON(!alloc_cpumask_var_node(&scx_idle_node_masks[i]->smt, GFP_KERNEL, i));
+	}
 
-		BUG_ON(!alloc_cpumask_var_node(&scx_idle_node_masks[node]->cpu, GFP_KERNEL, node));
-		BUG_ON(!alloc_cpumask_var_node(&scx_idle_node_masks[node]->smt, GFP_KERNEL, node));
+	/* Allocate local per-cpu idle cpumasks */
+	for_each_possible_cpu(i) {
+		BUG_ON(!alloc_cpumask_var_node(&per_cpu(local_llc_idle_cpumask, i),
+					       GFP_KERNEL, cpu_to_node(i)));
+		BUG_ON(!alloc_cpumask_var_node(&per_cpu(local_numa_idle_cpumask, i),
+					       GFP_KERNEL, cpu_to_node(i)));
 	}
 }
 

From patchwork Fri Mar 14 09:45:36 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrea Righi <arighi@nvidia.com>
X-Patchwork-Id: 14016532
Received: from NAM12-DM6-obe.outbound.protection.outlook.com
 (mail-dm6nam12on2058.outbound.protection.outlook.com [40.107.243.58])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id C95991F30BB;
	Fri, 14 Mar 2025 09:49:14 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=fail smtp.client-ip=40.107.243.58
ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1741945760; cv=fail;
 b=JpVxDSXVp7nzt27+lPbjCOD1Kk5wWMwHEHdf1q/OzlhTdT7phbmjVF4H9HLUcWSn19yD4xRR3Yz99T5r/T6KFsHERSPemBqneZA4bjP8KBHt6hfBZeCSy2giF0lTNrGA0CgmvEyAICnwHpIJMfErivGRgXXyALe9fUEmFA0b134=
ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1741945760; c=relaxed/simple;
	bh=hlXkbZyUl44Q+iEOLwaldzCyKc1R+CvoSPRbRbt0v7I=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 Content-Type:MIME-Version;
 b=cp5Z2DMRnVsppLR9QuKaaVYXHrzJDXT+qZZuV2yfJRXP9d5YAT1xr+AkvBqH8k66/1F2doPkZxP8+8IVi0T1yOMIMUoItOnRXTmsOzBrYsxaPq0ZTylReINVwuMdteqV3B0tXZXBtnWRuPB4E2kqaL2PBi6Lpnsjo0KSQsRRBVw=
ARC-Authentication-Results: i=2; smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=nvidia.com;
 spf=fail smtp.mailfrom=nvidia.com;
 dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com
 header.b=gHYtc4Og; arc=fail smtp.client-ip=40.107.243.58
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=nvidia.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=fail smtp.mailfrom=nvidia.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com
 header.b="gHYtc4Og"
ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none;
 b=j758joNP5Dx6/E3rZQBWDauQsjEd8eeuzayqyOz9xxZIKbl7uGWNLmC6jLqkRjc6BHEy5Gwy2JtcQE6r599B4WOl9ubpJWWpXI7X3nd1NC5jTlL6q+8HBrQ0gY3Kj28dOeuQFVMQA6HMP1qHhgnBRDua8g1ZW4V2vUe6OI8EGEHKUJWGeShZMzGtNMlCMOR9LU+GzDHVOIabkbLorzkcX9gokVdmLX4TbOcBIliIZ4Uo1UNdmgYumpoPGfqfGeoxdS+PV+xaerwQUkb94gUU3S6OwtOKjj3FEcIgF88x1oDYQ2QRrDDoe5ER34Fl/Hxs2VZYuC/2ESP9zi88ct4JOw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com;
 s=arcselector10001;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;
 bh=DMn8rp9O9EwslskOheB3gYvdo2XTrbiXlwMHGYAFQGM=;
 b=rW46T8AUx0Rptz8rs8PrkLI8Jk+wdMOsuMtsspvj7DpvzOgWjWr3D3ymf24amRMSpeX12VcOAOPqd8U1AfBLhXTacwm/tE2Q6Xe7HfS9aTsHlsGnBWtyjX6SrmooLoTE4vS2Y49R5w0W4uPZf/rvnlZzv8LXMIV/ORm3uy/kTpA4xm3l6bBRJMNDzHvJJa08Rw6W+Mnp3kS0ZIUFC8lTUj3Tperqvpn+rTQpeztdScfxDAU02dLdRn+2mj39THS17GdQtUewuVs+IXgRA33rdtdrkXNdSnkZpo5gQRMKzU1mkR5U+/lzgDAo308ve4kOdrVprqInhh+6IadrAt0K8w==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass
 smtp.mailfrom=nvidia.com; dmarc=pass action=none header.from=nvidia.com;
 dkim=pass header.d=nvidia.com; arc=none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com;
 s=selector2;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=DMn8rp9O9EwslskOheB3gYvdo2XTrbiXlwMHGYAFQGM=;
 b=gHYtc4OgFj/9yfQdP93p07m7KjB5DDSciN2djnTv9MV2pE5atwot+TNbxiwBV/jvYO89ffQybyVTiyF23waAXy87TIcx8BIQuqsLiuu+OtCp7dgjOD6Jyp6IDIwSqO6DiiAxhkkmDQ0/IwvhSQLynw/kbOjTIZ4fD5264wac5m1Gk7ib2a9BNzbhOrcaKDcrWSln8sr2AT0AmkmlPWl/05+aCI11C+qzo2WVg1ahc80bZviqKK8lSIFQmG+MHKq2T+4rkwQfMncmrR+nAeWwZtUGIpTTErM2AAs8ijS1frMh371g0dx6uf37AMHH4fYtn2/6qWT+Ss9zwFqpyox/7g==
Authentication-Results: dkim=none (message not signed)
 header.d=none;dmarc=none action=none header.from=nvidia.com;
Received: from CY5PR12MB6405.namprd12.prod.outlook.com (2603:10b6:930:3e::17)
 by CY5PR12MB6431.namprd12.prod.outlook.com (2603:10b6:930:39::8) with
 Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8511.27; Fri, 14 Mar
 2025 09:49:12 +0000
Received: from CY5PR12MB6405.namprd12.prod.outlook.com
 ([fe80::2119:c96c:b455:53b5]) by CY5PR12MB6405.namprd12.prod.outlook.com
 ([fe80::2119:c96c:b455:53b5%5]) with mapi id 15.20.8511.031; Fri, 14 Mar 2025
 09:49:12 +0000
From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>,
	David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>
Cc: bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH 4/8] sched_ext: idle: Explicitly pass allowed cpumask to
 scx_select_cpu_dfl()
Date: Fri, 14 Mar 2025 10:45:36 +0100
Message-ID: <20250314094827.167563-5-arighi@nvidia.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250314094827.167563-1-arighi@nvidia.com>
References: <20250314094827.167563-1-arighi@nvidia.com>
X-ClientProxiedBy: BYAPR08CA0062.namprd08.prod.outlook.com
 (2603:10b6:a03:117::39) To CY5PR12MB6405.namprd12.prod.outlook.com
 (2603:10b6:930:3e::17)
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org
List-Id: <bpf.vger.kernel.org>
List-Subscribe: <mailto:bpf+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:bpf+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
X-MS-PublicTrafficType: Email
X-MS-TrafficTypeDiagnostic: CY5PR12MB6405:EE_|CY5PR12MB6431:EE_
X-MS-Office365-Filtering-Correlation-Id: 59d545f8-847f-4c6f-9aec-08dd62dd79a0
X-MS-Exchange-SenderADCheck: 1
X-MS-Exchange-AntiSpam-Relay: 0
X-Microsoft-Antispam: BCL:0;ARA:13230040|1800799024|366016|376014;
X-Microsoft-Antispam-Message-Info: 
 W72TTPr3Bga15BkFY6GHyBQF+rgwIb8fbGsTDO/AGrDDiNDRVyKzADScw2ON1jRFVr47aK/yYHdm7OQIvE5zelIb9rUYYpjxlOEQDbIj217I9IvymYsWbt1yzU6Q2J83Z3UB4oP9jUnkpf7IqDfRRFOpgMClsIkqHaA53m6jw5IADluy31gRX9dVIVjeIjUje0fyAnXQItD9clccUPM58/zf6L6S5iwatBmoyrm5CpjsRAeIMVqOPlYyjNJ7tWFDviXh9YqP8k3hSIT+zv05Mq0W/9FEM6RjClJKXn0Iu0EwBdJCvFFpTRlfA7jNDfcCy/8vI2gwMrUAQRrEEwEEvXlOgswJvvMvq27VA/SHdzra/KEaFRf5TZeRdFU+L60URgwxmFeAzGi1RLzq79E2xox2I94F014Q4ZpgjeJxwdjHWfE//0tSSGxjNVULmlnuYpH8QXT3GhzzM6ejGkj7YyEuUkbYNX8Y6sAAuebKzC+Ap7ginJlrbxAk5w3XL7+gbUln8Xj7NqyIpfSjEX7Wc/xOpGRRYU8Ow5GDAZ5WjWzx8zOuSA1cb2hzy0H+pNgt7kyPp7V53Z8YXoW/MOjvKuyP6GL9A1VioJaIt8X/FBIZDD+qhHQ5pRX0ac6MsRyPNhPflLeAgAC1hwrjdUDYvaruDOCmcmyqfOWMgeuyQMaeEvQWZ8azz0WcKdQh29FB9VnRdLS0tjJgPY7YQamEasiF3H2dpZgfVFJIJgFhU99I3oCozf6eaXFkG0p6yETlLX1FPrV75SC3jcs3B1Ea3WauYheuuo89bnXbP2zinHvD6BYab301eQdt6AjNTvXrC6zwNaLlgCksT90ScWggVK69epgrDKzyzOnXOXDKo7PZlFMr9aahKSbYUoc2asQbQ0BjYBDBydUsdgEReCHWguxDWh/kbiHGiXuOIiq9fyZ2Yoq5W4ZpUuuITd7slgsyX0erbsidD+CWqOD+OZYuUt7EzHEjynTJkeATC+W6yqFmfCQjfylt1uFZRhvUaTDmFqj+JjTfeJmravccd4hNQvuQrtdZJdbQ2SLJKjay9m/8QYTVNIvp5s1aOreC79UbTbB3olORrllVqJfhvGUqx72Qyw+pUGLz/3lOMKglp5qub/pXHEeDdTU5I0hoJYafmF71opD924isZ/8C8GjlwFysy82U48wJ3eoeII7NDxev4PYdmdS+d78B2BXv7mzwt34Sbpu03kFQ5nYeTasNJzcd+Dam8Wmi99BtSGFnl+AQYpKqtIa+PKXaX5Oj82yQDyIl6x3jUE62rqa7nKQgcWPoOKhuIsc73cAm0YX5u54c7/wD0O+B96FImwyUMOyF9r7eFFqh7cuh4RPYfgDxouJ+jycl3rwSENEaUAkOhSR1nhru/uO/N1g7KufeCZ4/
X-Forefront-Antispam-Report: 
	CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:CY5PR12MB6405.namprd12.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230040)(1800799024)(366016)(376014);DIR:OUT;SFP:1101;
X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1
X-MS-Exchange-AntiSpam-MessageData-0: 
 WIVg2QbfgfxJlz10GUHjN28XnAT3uQMQC5pNyYhHc1KvzgS3thY0VcX9W19XUDKatoyZ9TPb62Vsd6pRjChRIcHg5jWXAK9lSVesyxpwekWlzjFfKaq1mcL7Sx0SDQ1U/dMu0OUN+Dl1pzl3RAJMWrSOjMqmAgBPnG/9h2E4cNScVAq+jwpmp3ZssTSvoGA3c2MX91dunxU031mqFGgPP8p03/pWqHLA8l/tz2cSci3dc6nE85/1xmAtj3s8eqyBvem4htHzaZQGzlIcbNGQFEmBZ70ikjHERWXblnqu4VxqMMYJezemxhrB/VZ2n/hJyNeKJojIDg4Us7yVDoDsKIFRSWEqU36QbUmbxLHM5qUPhUToQKUy6g8Pj2HWxGu6Q87hYABDRaKtsQ2AVvHwQV/O+f9DFakRlrqSx4zRlx1m7APRGIP01bVK/oNreebk8woVNLQZXXhI+YKBnRMwi/BdRN0LKI9Q+N1wBDuvIyR9aA1SjIaWG1WKfAPtVCCUOUIjdXJumEcHb0HTzhj2xrZ2xELztXuaS+gYPuElVXMY5gYiWcVd8JciC2hKnzJXPyfTgwo+chh/I6KZrSpg+MMNkp+5Hpe76pEyioIS4+WeRbxh7MOTeS2W5WiKn1e0zQVAWBcPTekjzEa4fLfc/tGuK2rHpnn9zO0N9DRvZPZAbDD5EPJ4Jr0qi+9BqR+CrwfgVKvK+Se1+r6N6I1TKj3B1MNmAW1KmKZtr5q7Wj2wngDuFfYgyKHFs7/9rf2J7+WLj0Qwr+Z9Xk2G6nrg/oGRrUKZKxO96iaEtN+KH7HjhAeQW4Ubp89weW2CoUG7Uhs2kObQjZpyHRd0lBy43gKP17iJvVVefGtIIcjXtPNnwYPOc7KT0C3Zp04+f+xiVcY5SCY2QJHnCfdodMRvXDoBgZMVxCEXEfc19WJDkwyIwSzo8nbpup7NGOm9fQmqLR3glJWiq+Qv8IpqUy7av4ggxhUeJa5XzudHulPX/qUi8HYhS8yjg6/2d58DpWYdbpx5FcchhaeA26Qofwf/Bq+HwXkmVCxWm5pyA8TtEMyzi25rS/wWLvr8PezaayroQIL2V3NXx3MTvK9aYVi+TbxS0BYUhe7PV6szXKj4oJ0/Jhz+mMSNqtBjYTud2lDB2ZtrJffy48+KStFXoUSp2AXF3Vv55zoN7VdGLPetWaeHx1Toa7/gO/F87Ea5kvVYszbaYH9HVMBHYACmZJOAL/y3lQ7G3CvBnNoF9BE7yrhXzclUmeUQhdXsg0H2kKJnKztwuhd9+rtinByLd2JIivq5aNAROAu/46JZQYpNFALzE8lNtseeKCSp2GISHVTu3MArKc5EUDJCgbdB4aFrYUQTPq+3bSMFvrCeNYNTGfTcYIppqUO3fmTHvAld/d3T3z2HDQ6f8IVGBaXZkbVeOy5QQTQnXG1yJd2eLwlL/1Yo5GZssldjEJHAXS9ug5UqrsbEDr7jEslyXkbd3nP1YMqsMKnZOh2nF0ZP05WBGSPtMsNhmzjwh7NvDskx2fGsNlvOQBo98q394IIre61xowp9vqXMwQ37ha08RvTXt7Opgw1jbkTu8N2TwFV0Othc
X-OriginatorOrg: Nvidia.com
X-MS-Exchange-CrossTenant-Network-Message-Id: 
 59d545f8-847f-4c6f-9aec-08dd62dd79a0
X-MS-Exchange-CrossTenant-AuthSource: CY5PR12MB6405.namprd12.prod.outlook.com
X-MS-Exchange-CrossTenant-AuthAs: Internal
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 14 Mar 2025 09:49:12.7024
 (UTC)
X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted
X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a
X-MS-Exchange-CrossTenant-MailboxType: HOSTED
X-MS-Exchange-CrossTenant-UserPrincipalName: 
 9jNA+ruZNygPJbRj8Zg+g3+5nqwkGbEpFB36TAP/095esRgSQI9+yyCsCdHOnmQa1EvXkvyG6fGwiMEbJf/g7w==
X-MS-Exchange-Transport-CrossTenantHeadersStamped: CY5PR12MB6431

Modify scx_select_cpu_dfl() to take the allowed cpumask as an explicit
argument, instead of implicitly using @p->cpus_ptr.

This prepares for future changes where arbitrary cpumasks may be passed
to the built-in idle CPU selection policy.

This is a pure refactoring with no functional changes.

Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
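Note: the shape of this refactoring can be illustrated with a small
user-space model. It is a sketch under simplifying assumptions, not the
kernel implementation: struct toy_task and toy_select_cpu() are
hypothetical, a cpumask is reduced to a 64-bit word, and "selection"
simply means the first allowed CPU, preferring prev_cpu.

```c
#include <assert.h>
#include <stdint.h>

/* Toy task: only carries its affinity mask (models p->cpus_ptr). */
struct toy_task {
	uint64_t cpus_mask;
};

/*
 * The candidate set is an explicit argument instead of being read from
 * the task. Returns prev_cpu if it is allowed, otherwise the lowest
 * allowed CPU, or -1 if @cpus_allowed is empty.
 */
static int toy_select_cpu(const struct toy_task *p, int prev_cpu,
			  uint64_t cpus_allowed)
{
	int cpu;

	assert(p); /* @p is otherwise unused in this toy model */

	if (prev_cpu >= 0 && prev_cpu < 64 && ((cpus_allowed >> prev_cpu) & 1))
		return prev_cpu;

	for (cpu = 0; cpu < 64; cpu++)
		if ((cpus_allowed >> cpu) & 1)
			return cpu;

	return -1;
}
```

Callers that pass the task's own mask get the previous behavior, while
a caller could also pass a narrower mask, which is the kind of use the
explicit argument prepares for.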
 kernel/sched/ext.c      |  2 +-
 kernel/sched/ext_idle.c | 23 ++++++++++++-----------
 kernel/sched/ext_idle.h |  3 ++-
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 06561d6717c9a..f42352e8d889e 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -3395,7 +3395,7 @@ static int select_task_rq_scx(struct task_struct *p, int prev_cpu, int wake_flag
 	} else {
 		s32 cpu;
 
-		cpu = scx_select_cpu_dfl(p, prev_cpu, wake_flags, 0);
+		cpu = scx_select_cpu_dfl(p, prev_cpu, wake_flags, p->cpus_ptr, 0);
 		if (cpu >= 0) {
 			p->scx.slice = SCX_SLICE_DFL;
 			p->scx.ddsp_dsq_id = SCX_DSQ_LOCAL;
diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
index 1940baedde157..27aaadf14cb44 100644
--- a/kernel/sched/ext_idle.c
+++ b/kernel/sched/ext_idle.c
@@ -430,7 +430,8 @@ void scx_idle_update_selcpu_topology(struct sched_ext_ops *ops)
  * NOTE: tasks that can only run on 1 CPU are excluded by this logic, because
  * we never call ops.select_cpu() for them, see select_task_rq().
  */
-s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64 flags)
+s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
+		       const struct cpumask *cpus_allowed, u64 flags)
 {
 	struct cpumask *llc_cpus = NULL, *numa_cpus = NULL;
 	int node = scx_cpu_node_if_enabled(prev_cpu);
@@ -448,12 +449,12 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64
 	if (static_branch_maybe(CONFIG_NUMA, &scx_selcpu_topo_numa)) {
 		struct cpumask *cpus = numa_span(prev_cpu);
 
-		if (cpus && !cpumask_equal(cpus, p->cpus_ptr)) {
-			if (cpumask_subset(cpus, p->cpus_ptr)) {
+		if (cpus && !cpumask_equal(cpus, cpus_allowed)) {
+			if (cpumask_subset(cpus, cpus_allowed)) {
 				numa_cpus = cpus;
 			} else {
 				numa_cpus = this_cpu_cpumask_var_ptr(local_numa_idle_cpumask);
-				if (!cpumask_and(numa_cpus, cpus, p->cpus_ptr))
+				if (!cpumask_and(numa_cpus, cpus, cpus_allowed))
 					numa_cpus = NULL;
 			}
 		}
@@ -461,12 +462,12 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64
 	if (static_branch_maybe(CONFIG_SCHED_MC, &scx_selcpu_topo_llc)) {
 		struct cpumask *cpus = llc_span(prev_cpu);
 
-		if (cpus && !cpumask_equal(cpus, p->cpus_ptr)) {
-			if (cpumask_subset(cpus, p->cpus_ptr)) {
+		if (cpus && !cpumask_equal(cpus, cpus_allowed)) {
+			if (cpumask_subset(cpus, cpus_allowed)) {
 				llc_cpus = cpus;
 			} else {
 				llc_cpus = this_cpu_cpumask_var_ptr(local_llc_idle_cpumask);
-				if (!cpumask_and(llc_cpus, cpus, p->cpus_ptr))
+				if (!cpumask_and(llc_cpus, cpus, cpus_allowed))
 					llc_cpus = NULL;
 			}
 		}
@@ -507,7 +508,7 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64
 		    cpu_rq(cpu)->scx.local_dsq.nr == 0 &&
 		    (!(flags & SCX_PICK_IDLE_IN_NODE) || (waker_node == node)) &&
 		    !cpumask_empty(idle_cpumask(waker_node)->cpu)) {
-			if (cpumask_test_cpu(cpu, p->cpus_ptr))
+			if (cpumask_test_cpu(cpu, cpus_allowed))
 				goto out_unlock;
 		}
 	}
@@ -552,7 +553,7 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64
 		 * begin in prev_cpu's node and proceed to other nodes in
 		 * order of increasing distance.
 		 */
-		cpu = scx_pick_idle_cpu(p->cpus_ptr, node, flags | SCX_PICK_IDLE_CORE);
+		cpu = scx_pick_idle_cpu(cpus_allowed, node, flags | SCX_PICK_IDLE_CORE);
 		if (cpu >= 0)
 			goto out_unlock;
 
@@ -600,7 +601,7 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64
 	 * in prev_cpu's node and proceed to other nodes in order of
 	 * increasing distance.
 	 */
-	cpu = scx_pick_idle_cpu(p->cpus_ptr, node, flags);
+	cpu = scx_pick_idle_cpu(cpus_allowed, node, flags);
 	if (cpu >= 0)
 		goto out_unlock;
 
@@ -856,7 +857,7 @@ __bpf_kfunc s32 scx_bpf_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
 		goto prev_cpu;
 
 #ifdef CONFIG_SMP
-	cpu = scx_select_cpu_dfl(p, prev_cpu, wake_flags, 0);
+	cpu = scx_select_cpu_dfl(p, prev_cpu, wake_flags, p->cpus_ptr, 0);
 	if (cpu >= 0) {
 		*is_idle = true;
 		return cpu;
diff --git a/kernel/sched/ext_idle.h b/kernel/sched/ext_idle.h
index 511cc2221f7a8..37be78a7502b3 100644
--- a/kernel/sched/ext_idle.h
+++ b/kernel/sched/ext_idle.h
@@ -27,7 +27,8 @@ static inline s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, int node
 }
 #endif /* CONFIG_SMP */
 
-s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64 flags);
+s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
+		       const struct cpumask *cpus_allowed, u64 flags);
 void scx_idle_enable(struct sched_ext_ops *ops);
 void scx_idle_disable(void);
 int scx_idle_init(void);

From patchwork Fri Mar 14 09:45:37 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrea Righi <arighi@nvidia.com>
X-Patchwork-Id: 14016533
Received: from NAM12-DM6-obe.outbound.protection.outlook.com
 (mail-dm6nam12on2050.outbound.protection.outlook.com [40.107.243.50])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id C18021F3B92;
	Fri, 14 Mar 2025 09:49:23 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=fail smtp.client-ip=40.107.243.50
ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1741945765; cv=fail;
 b=J/4+VlSLrMRB47GM8n2eQ0MRGRifhKV/plk4XrYyNxH5L1Xe8UMfLMSJStl9k0PIVpey99aRZCqSnistT8B3GUdOc1YIJ5oSByCO+J1YayXRvWjSH3KzRNvvZB745ThfLqBDa/PCO98kd8lHM7xNU+Nu1/uiRDN87Wv+H+hio18=
ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1741945765; c=relaxed/simple;
	bh=SfdEfH4cc/aHWCx+Xr3G/UTBrdcTwCgo69aGxQ+pbzg=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 Content-Type:MIME-Version;
 b=LksvX1aTHo1JbpH7aHIwJnipG4xiz/4wVNA2/OBx++MCt+8wq+6t029ID2OZD+0orxQfdQ/1ZSHhteDEaYkZO5rFdVVX9QK70QB4LqHx2yNzblknCdd/KHF4hbJt1xXYghb+Ny1BLVpXpAsm0+i1rJjGmFc6ZxsJDJNm20ZgtUw=
ARC-Authentication-Results: i=2; smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=nvidia.com;
 spf=fail smtp.mailfrom=nvidia.com;
 dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com
 header.b=WAlFgHOP; arc=fail smtp.client-ip=40.107.243.50
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=nvidia.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=fail smtp.mailfrom=nvidia.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com
 header.b="WAlFgHOP"
ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none;
 b=qWh6P5kaPA9C9drd/xqsE9naOuWegiq7xRREF/PH3w07RoYUKfkrsUNLtiC7yi5N6W6dC5mKMYFmxWGTvsHr4z/nfWMafcA4h4FLwqzgeXZTmY10G2f1xwZ9OW3VOn+KwsvAy0w9u+3QbTpeKN2l5/ARuCNOXS53mwy13vzwa2g+SzIMxBlcIeFdOTGpaVJ+jk7zM84hVOWMdnOxXtiIt9HWd/LOGceymTLue892F60ujlWXn0utVujanBp+MqyVyurkwKbX+KmA/cAc9CN2jq1LoDoPOlHvHK68CD966fGQi5mPtek/dG/h2N5bYGXuf5UtrcH461X/1g9a7t/enQ==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com;
 s=arcselector10001;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;
 bh=tHmoiO9QRf/tJ4qq+cAaLGe76qcC3ppIfupXhwrfpjg=;
 b=pWYlm7GRvNAtxbhL+RwL3W755hmAXL/55hRXHB9yUpkA0QmWaNHIEaU0s8gIeEyA9yJlbLPz48j6TR9+E4asP62LOFjtlI/zJGgcoz52IwucoBWUwpgEL8zEi5mTb/Wwxtz7iSvhdHulfDInzuhMGl9vUAI1Y/pVTWyutFni8yAnXIUKhfP92KZy2oOrs2FxU32P2L7bRM2feqsBlXBo2yHVlDsPPmcaIWLV5JvVjgJAOzDdv30+8Sow8i/jaYuzVC07ybWprGBR+dGTuR7F/v9Evu8nC53YrI9cbii6SrRtfoWVh1/56+LUILzap7xa3aQ6nqW7lULkbdn9jdnq6g==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass
 smtp.mailfrom=nvidia.com; dmarc=pass action=none header.from=nvidia.com;
 dkim=pass header.d=nvidia.com; arc=none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com;
 s=selector2;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=tHmoiO9QRf/tJ4qq+cAaLGe76qcC3ppIfupXhwrfpjg=;
 b=WAlFgHOPpG7gq6lCr+uZPxscRGOT7A9cqxm7NzAvSgFulEO11kFeUQLQi2MnjCgsoA31zp2YkFpOC/dwXbXvqPx+48EER1WSZvwwSikMKry3E96o5ncvr9qyCFLmLxjBhnQuOaM6xYSfK1pZwGB8SIy4Wr+Y68B+54/pxaatr+CC/+xgYHtFxr5DBOl1X+Kp7GKrYx9DEmjivK9WqOv/00qBEPSdfILoSvOHtbQDwkCdBtXwAPLI2TlMJ8oAhbHVmy/pNXLbVQYDkWi3z2igKVw4TbvS3r6CJ9JbIg35sWP+izbV3HgNf8QiY+wLw4j+fRg8IAOpWLnUevyn1+9RKQ==
Authentication-Results: dkim=none (message not signed)
 header.d=none;dmarc=none action=none header.from=nvidia.com;
Received: from CY5PR12MB6405.namprd12.prod.outlook.com (2603:10b6:930:3e::17)
 by CY5PR12MB6431.namprd12.prod.outlook.com (2603:10b6:930:39::8) with
 Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8511.27; Fri, 14 Mar
 2025 09:49:21 +0000
Received: from CY5PR12MB6405.namprd12.prod.outlook.com
 ([fe80::2119:c96c:b455:53b5]) by CY5PR12MB6405.namprd12.prod.outlook.com
 ([fe80::2119:c96c:b455:53b5%5]) with mapi id 15.20.8511.031; Fri, 14 Mar 2025
 09:49:21 +0000
From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>,
	David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>
Cc: bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH 5/8] sched_ext: idle: Accept an arbitrary cpumask in
 scx_select_cpu_dfl()
Date: Fri, 14 Mar 2025 10:45:37 +0100
Message-ID: <20250314094827.167563-6-arighi@nvidia.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250314094827.167563-1-arighi@nvidia.com>
References: <20250314094827.167563-1-arighi@nvidia.com>
X-ClientProxiedBy: ZR2P278CA0054.CHEP278.PROD.OUTLOOK.COM
 (2603:10a6:910:53::17) To CY5PR12MB6405.namprd12.prod.outlook.com
 (2603:10b6:930:3e::17)
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org
List-Id: <bpf.vger.kernel.org>
List-Subscribe: <mailto:bpf+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:bpf+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
X-MS-PublicTrafficType: Email
X-MS-TrafficTypeDiagnostic: CY5PR12MB6405:EE_|CY5PR12MB6431:EE_
X-MS-Office365-Filtering-Correlation-Id: 608ed469-6547-4876-8551-08dd62dd7eb6
X-MS-Exchange-SenderADCheck: 1
X-MS-Exchange-AntiSpam-Relay: 0
X-Microsoft-Antispam: BCL:0;ARA:13230040|1800799024|366016|376014;
X-Microsoft-Antispam-Message-Info: 
 1IHDBbG78yx3Jogf8KUS/Z6wU5h2bPH/Yc5JA6wmVudbVWTEuAzjTyXzWpDgcOIKTpxnf1K7VZxPV21lZ2QC7VeQ6QAzm20lFt9y0ubBELbDHdYwZEjocO8+odYjInyBluUlhdnS4iJJVv7IXQCNeZB2S+haX6resGtYKfcTjbb8AyGF2SItdExcX/dWuzxkeUuwWNW5E9lxzzRykOW4dRSKKRqQR3+6x/VEMdOP2c28echM/GTeHyJj1eFI17E1/dJl2IyIbj+jQL/G1u+oyJ5nSUN/L2IXWyjY1TnyoQjqOn8u0zCEYv+N1JBvMlqbzmQvb2UrzBqqERYEOZiIapmILJ96Skv+LU88fK5Jc5GZmgmF7gvjDEhYy3I8gPbyPN19NJAda2r+YyMAK63hVNVvjj/V8lVOepwkwQUJ4eYWZ4rvPetPdM+cCEzyWtS5Q8VFwxCla1nT69OLIOrA92yHs/pfvTyOpdOO61kCjL8LspmjMikzFzvnev4akugl62mpy0Y4wlDJG7c9TAxdj1BTOam2CfI6BAUjHSAoi3uH4eO8pe0r7nHG57okxVBLyaZOU7fOFW2cveCSircTigHNfJ9pjjCGyQW4hzVB8u0b5jvAuYOOFlH1ZUZ/mEsGUxfGLbS/E4rv2AIqe/F3Dc4n3AFALMf5XfPuOWoLFyCXWSAXVt6SaMfBh7wnSq6WaQLYqmcqyUTRqvS8CezydHplkOg3LQdYG+GHqYetdn+EkkQIlvrlVL1flUEpQx5B/z9lp3PtKRWhlqCWn9hdtaA6FaMGlnBx7Jt/Z9TgmvyYo2N0EdlWeY8S0QuxpAr7PPc8t/F4GlUAa82dqwEOw4G7DhFzRGjkrAABgBDSgc7QPj8ICP4sO2kJMTdMWuMWFY6PaxX8ImC8JsqKbvkXKlQoZzaH2I5qz9DuM5lb8+Y4a9eD6cKsH/Lc7VIOB1nV/YIXrJV1kD97KL1rDf9KHEYlzTDgqD5Qhd8AVs9ZZHai26yAn3bQCp3rYbWy7sm24yxTEUsh/icd3q/L0r7y9NRNOKordXunEm/60Uv7lJ2tKS14N2cbi1cXZuoyNX+bMbeSJbB8irseHRMbgfbJSYoS1ar7s4bEmhaeYXaszXsEUsew6nQ11FNh+w8L7IYIJRSq7bjeoncXTUNIU4mtf29Qy3yJkbRWmO5alYota/F+yk5zQOUfXONxTU8kjo/9YsdpobbebkcZLmwSjK1KosqjbrFA7UsvNFK3Ll3jLh8TDv23PaTLAwvcN35U3ovKvY/JPNgh1dh3L/7V+w+f3NPhyYN7atOxY5a1MtJHRBkX2ZsXREB+gMInVgx0tuZ5yPOR62a3d3snZtupnJt9GhWyAPZorqI1t+YNWPn9ofiEC3mQaVJe8yTDH/uO5+Yn
X-Forefront-Antispam-Report: 
	CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:CY5PR12MB6405.namprd12.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230040)(1800799024)(366016)(376014);DIR:OUT;SFP:1101;
X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1
X-MS-Exchange-AntiSpam-MessageData-0: 
 aIqcJTBS0oi5wSr78KuvAHq/yw9epFe1FT4TC5BxbmLjiy33UYCWxBt/mLSataEtbWByoQ4hGOfsh/B5UFFCTib6ccLLE4ZhoLL/4387FPjfvKrJvIzNfNi17xUJeS5pT5l36XVCrGKR5F8kPyP3mEhdbpIKyx37SZGS0vWOLsZnKmYwXLMfoj4cBkl16eN6Cf9zEDMfY69wHal5SXlHDGGyA2Zy0cl7JozUKQoz/Bq7H2pd8o3tBj4epvW01eKjhMSSLWvjRkjRRe9cL+7I00jCKWIwul8qwgKFuaab1YjyChcdaVHdc18qPGGfGLiRYB72o7pnCZNKcUUGNDjvkSzglV+HjTKYBLR0lES7daSdt/yPyrGOdXJ71MeidMZ8gmUM7Hl8eYcbVk8aNE17W6lcaYeYlRYBMSSvK5lTINQlt3E1rDlbrnm+QfUu9o2PLmqqh1lFRl3KqTPx0oLdmX36lpvMx8nORav/w59Kq3PUZH+opJEBuLMd5yFOlkqWfjmI27XN530GeZc0CiDiZe6+YPu/vpixjoVHLDHhUfBZ3k62i4sfOCgKyQUOJedx8b7JJxxFBsjgtYL9MwZQnfvd4Xpt4adttPhCBxIdQTAFw2ZrhXtbjEJmdEQx5dcjmM/6pGNVMDawvjon1Wuud9COtUr79GFW623YfolkHuyHzit+9rC9b+n1z++VnsziCQ4ZxQJS4Q05zXypihPp+abjkO49s+oW14VTxs51Xk2tedz7MWfPfUGSmauCN1wbgUg7SgK9HwavtaTFMQ7sjgprTbr9eHc6/Fzs2es6VB4pL/aJ701UpDdOs76cBceiMwtYwyNIoTEFQief3z8EOxJLHXdJpdnNuJ9+Pbau/tFB84svvsKFgYXaRlon6nFBKBHV9NdrcAa4dkrWZmvwys5ZBQ7BlfagdRXLZKb/X00Goqt4J1yCzTxr01a3dArl4x8hx0+dpfS11fE5JLMGI3hUIKIcOZp8bx7UkigkEdwhFWVuIcMXyRFh6VkXMYHjOmkg7iGAOlsjFXeB7+12D4l9d0pgA2RT6b41eLmSNrTWJPxhmyqQUVDtp9A+EDrwF5z6HMpKFyklip3BDZqOKgnkljbZkKoBalD4YF941ETG3m7xfblFopKqczcu45LoP2U5Eqm0KkCh1+kPu+9VMBl4GM7sA6VtPr5KMVAfzZWZZhk7CKr2KVq0YOZN48W40UZIds4pZspmUx8gtZLtAr0GNAIKyEGurWVhdqgkICTSfN3IGH7zSykxUqPFHAqo0l1RwOjw0vIrB9wHFjnePGYlMt793BB8D3tcnNfFd0S41b+PTlNlUx7J7XBbiRv+TnuziMADpseJ/moC5hsHayxjvwKZa4ZKxT7kDCjhzlt45i6wT7bGRsJsRzqyCJblZm9PbtjS4bOaoND2jUf79weY/NrPBAjrBt9rvsKumqalYIzKL5C3iX5ITbZmKAmTm7V/udcRVCrX0EZnuICNEwTP0/jx3ujYer2bwEBAaoJ2n5m8vu7T+RI2J4MDJ4Uf3FcFi6cdbFu0wTbX5V9jxjk27u7GhJm/epsLdoYN7VEvBxaaCEUcWRo3ROdUgkED
X-OriginatorOrg: Nvidia.com
X-MS-Exchange-CrossTenant-Network-Message-Id: 
 608ed469-6547-4876-8551-08dd62dd7eb6
X-MS-Exchange-CrossTenant-AuthSource: CY5PR12MB6405.namprd12.prod.outlook.com
X-MS-Exchange-CrossTenant-AuthAs: Internal
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 14 Mar 2025 09:49:21.0403
 (UTC)
X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted
X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a
X-MS-Exchange-CrossTenant-MailboxType: HOSTED
X-MS-Exchange-CrossTenant-UserPrincipalName: 
 Py9egnfFX8n0N18W94YLhgkKgFGS8BRmXB9q0sQWoqn06v7ypEXEabtk6TqVe5HZAamW+0Pt3z8b5KmQ7LfWMA==
X-MS-Exchange-Transport-CrossTenantHeadersStamped: CY5PR12MB6431

Many scx schedulers implement their own hard or soft-affinity rules
to support topology characteristics, such as heterogeneous architectures
(e.g., big.LITTLE, P-cores/E-cores), or to categorize tasks based on
specific properties (e.g., running certain tasks only on a subset of
CPUs).

Currently, there is no mechanism to apply the built-in idle CPU
selection policy to an arbitrary subset of CPUs. As a result,
schedulers often implement their own idle CPU selection policies, which
are typically similar to one another, leading to a lot of code
duplication.

To address this, modify scx_select_cpu_dfl() to accept an arbitrary
cpumask that BPF schedulers can use to apply the existing built-in idle
CPU selection policy to a subset of allowed CPUs.

With this change, the idle CPU selection policy becomes the following:
 - always prioritize CPUs from fully idle SMT cores (if SMT is enabled),
 - select the same CPU if it's idle and in the allowed CPUs,
 - select an idle CPU within the same LLC, if the LLC cpumask is a
   subset of the allowed CPUs,
 - select an idle CPU within the same node, if the node cpumask is a
   subset of the allowed CPUs,
 - select an idle CPU within the allowed CPUs.
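
The ordering above can be sketched as a small userspace model. This is
only an illustration under stated assumptions, not the kernel code:
cpumasks are modeled as 64-bit words (one bit per CPU), SMT and
wake-sync handling are left out, the LLC and node spans of @prev_cpu are
passed in by the caller, and every name below (model_select_cpu() and
friends) is hypothetical. As in the diff, a span that is not a subset of
the allowed domain is intersected with it rather than discarded, and a
span equal to the allowed domain is skipped because the final step
already covers it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, at most 64 CPUs. */
typedef uint64_t model_mask_t;

/* Return the lowest CPU set in @mask, or -1 if @mask is empty. */
int model_first_cpu(model_mask_t mask)
{
	for (int i = 0; i < 64; i++)
		if (mask & ((uint64_t)1 << i))
			return i;
	return -1;
}

/*
 * Intersect the task affinity with the caller-provided mask.
 * Returns false when the resulting domain is empty.
 */
bool model_allowed_domain(model_mask_t task_cpus, model_mask_t cpus_allowed,
			  model_mask_t *allowed)
{
	*allowed = task_cpus & cpus_allowed;
	return *allowed != 0;
}

/*
 * Narrow a topology span (LLC or node) to the allowed domain.
 * An empty result means "skip this step".
 */
model_mask_t model_narrow_span(model_mask_t span, model_mask_t allowed)
{
	if (span == allowed)
		return 0;		/* the final step covers this case */
	if ((span & ~allowed) == 0)
		return span;		/* span is a subset: use it as-is */
	return span & allowed;		/* otherwise use the intersection */
}

/* Returns the selected CPU, or -1 when no idle CPU is in the domain. */
int model_select_cpu(int prev_cpu, model_mask_t idle,
		     model_mask_t llc_span, model_mask_t node_span,
		     model_mask_t task_cpus, model_mask_t cpus_allowed)
{
	model_mask_t allowed;
	int cpu;

	if (!model_allowed_domain(task_cpus, cpus_allowed, &allowed))
		return -1;

	/* Same CPU, if it is idle and allowed. */
	if (prev_cpu >= 0 && prev_cpu < 64 &&
	    (idle & allowed & ((uint64_t)1 << prev_cpu)))
		return prev_cpu;

	/* Idle CPU in the same LLC. */
	cpu = model_first_cpu(idle & model_narrow_span(llc_span, allowed));
	if (cpu >= 0)
		return cpu;

	/* Idle CPU in the same node. */
	cpu = model_first_cpu(idle & model_narrow_span(node_span, allowed));
	if (cpu >= 0)
		return cpu;

	/* Any idle CPU in the allowed domain. */
	return model_first_cpu(idle & allowed);
}
```

For example, with 0xff00 as the allowed mask, CPUs 8-11 as both the LLC
and node span, and only CPUs 12-15 idle, the model skips the LLC and
node steps and falls through to the last one.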

This functionality will be exposed through a dedicated kfunc in a
separate patch.

Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 kernel/sched/ext_idle.c | 73 +++++++++++++++++++++++++++++++++--------
 1 file changed, 59 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
index 27aaadf14cb44..549551bc97a7b 100644
--- a/kernel/sched/ext_idle.c
+++ b/kernel/sched/ext_idle.c
@@ -49,6 +49,7 @@ static struct scx_idle_cpus **scx_idle_node_masks;
 /*
  * Local per-CPU cpumasks (used to generate temporary idle cpumasks).
  */
+static DEFINE_PER_CPU(cpumask_var_t, local_idle_cpumask);
 static DEFINE_PER_CPU(cpumask_var_t, local_llc_idle_cpumask);
 static DEFINE_PER_CPU(cpumask_var_t, local_numa_idle_cpumask);
 
@@ -397,6 +398,21 @@ void scx_idle_update_selcpu_topology(struct sched_ext_ops *ops)
 		static_branch_disable_cpuslocked(&scx_selcpu_topo_numa);
 }
 
+static const struct cpumask *
+task_allowed_cpumask(const struct task_struct *p, const struct cpumask *cpus_allowed, s32 prev_cpu)
+{
+	struct cpumask *allowed;
+
+	if (cpus_allowed == p->cpus_ptr || p->nr_cpus_allowed >= num_possible_cpus())
+		return cpus_allowed;
+
+	allowed = this_cpu_cpumask_var_ptr(local_idle_cpumask);
+	if (!cpumask_and(allowed, p->cpus_ptr, cpus_allowed))
+		return NULL;
+
+	return allowed;
+}
+
 /*
  * Built-in CPU idle selection policy:
  *
@@ -409,13 +425,15 @@ void scx_idle_update_selcpu_topology(struct sched_ext_ops *ops)
  *     branch prediction optimizations.
  *
  * 3. Pick a CPU within the same LLC (Last-Level Cache):
- *   - if the above conditions aren't met, pick a CPU that shares the same LLC
- *     to maintain cache locality.
+ *   - if the above conditions aren't met, pick a CPU that shares the same
+ *     LLC, if the LLC domain is a subset of @cpus_allowed, to maintain
+ *     cache locality.
  *
  * 4. Pick a CPU within the same NUMA node, if enabled:
- *   - choose a CPU from the same NUMA node to reduce memory access latency.
+ *   - choose a CPU from the same NUMA node, if the node cpumask is a
+ *     subset of @cpus_allowed, to reduce memory access latency.
  *
- * 5. Pick any idle CPU usable by the task.
+ * 5. Pick any idle CPU within the @cpus_allowed domain.
  *
  * Step 3 and 4 are performed only if the system has, respectively,
  * multiple LLCs / multiple NUMA nodes (see scx_selcpu_topo_llc and
@@ -434,9 +452,32 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
 		       const struct cpumask *cpus_allowed, u64 flags)
 {
 	struct cpumask *llc_cpus = NULL, *numa_cpus = NULL;
-	int node = scx_cpu_node_if_enabled(prev_cpu);
+	const struct cpumask *allowed;
+	int node;
 	s32 cpu;
 
+	preempt_disable();
+
+	/*
+	 * Determine the allowed scheduling domain of the task.
+	 */
+	allowed = task_allowed_cpumask(p, cpus_allowed, prev_cpu);
+	if (!allowed) {
+		cpu = -EBUSY;
+		goto out_enable;
+	}
+
+	/*
+	 * If @prev_cpu is not in the allowed domain, try to assign a new
+	 * arbitrary CPU in the allowed domain.
+	 */
+	if (!cpumask_test_cpu(prev_cpu, allowed)) {
+		cpu = cpumask_any_and_distribute(p->cpus_ptr, allowed);
+		if (cpu < nr_cpu_ids)
+			prev_cpu = cpu;
+	}
+	node = scx_cpu_node_if_enabled(prev_cpu);
+
 	/*
 	 * This is necessary to protect llc_cpus.
 	 */
@@ -449,12 +490,12 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
 	if (static_branch_maybe(CONFIG_NUMA, &scx_selcpu_topo_numa)) {
 		struct cpumask *cpus = numa_span(prev_cpu);
 
-		if (cpus && !cpumask_equal(cpus, cpus_allowed)) {
-			if (cpumask_subset(cpus, cpus_allowed)) {
+		if (cpus && !cpumask_equal(cpus, allowed)) {
+			if (cpumask_subset(cpus, allowed)) {
 				numa_cpus = cpus;
 			} else {
 				numa_cpus = this_cpu_cpumask_var_ptr(local_numa_idle_cpumask);
-				if (!cpumask_and(numa_cpus, cpus, cpus_allowed))
+				if (!cpumask_and(numa_cpus, cpus, allowed))
 					numa_cpus = NULL;
 			}
 		}
@@ -462,12 +503,12 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
 	if (static_branch_maybe(CONFIG_SCHED_MC, &scx_selcpu_topo_llc)) {
 		struct cpumask *cpus = llc_span(prev_cpu);
 
-		if (cpus && !cpumask_equal(cpus, cpus_allowed)) {
-			if (cpumask_subset(cpus, cpus_allowed)) {
+		if (cpus && !cpumask_equal(cpus, allowed)) {
+			if (cpumask_subset(cpus, allowed)) {
 				llc_cpus = cpus;
 			} else {
 				llc_cpus = this_cpu_cpumask_var_ptr(local_llc_idle_cpumask);
-				if (!cpumask_and(llc_cpus, cpus, cpus_allowed))
+				if (!cpumask_and(llc_cpus, cpus, allowed))
 					llc_cpus = NULL;
 			}
 		}
@@ -508,7 +549,7 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
 		    cpu_rq(cpu)->scx.local_dsq.nr == 0 &&
 		    (!(flags & SCX_PICK_IDLE_IN_NODE) || (waker_node == node)) &&
 		    !cpumask_empty(idle_cpumask(waker_node)->cpu)) {
-			if (cpumask_test_cpu(cpu, cpus_allowed))
+			if (cpumask_test_cpu(cpu, allowed))
 				goto out_unlock;
 		}
 	}
@@ -553,7 +594,7 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
 		 * begin in prev_cpu's node and proceed to other nodes in
 		 * order of increasing distance.
 		 */
-		cpu = scx_pick_idle_cpu(cpus_allowed, node, flags | SCX_PICK_IDLE_CORE);
+		cpu = scx_pick_idle_cpu(allowed, node, flags | SCX_PICK_IDLE_CORE);
 		if (cpu >= 0)
 			goto out_unlock;
 
@@ -601,12 +642,14 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
 	 * in prev_cpu's node and proceed to other nodes in order of
 	 * increasing distance.
 	 */
-	cpu = scx_pick_idle_cpu(cpus_allowed, node, flags);
+	cpu = scx_pick_idle_cpu(allowed, node, flags);
 	if (cpu >= 0)
 		goto out_unlock;
 
 out_unlock:
 	rcu_read_unlock();
+out_enable:
+	preempt_enable();
 
 	return cpu;
 }
@@ -638,6 +681,8 @@ void scx_idle_init_masks(void)
 
 	/* Allocate local per-cpu idle cpumasks */
 	for_each_possible_cpu(i) {
+		BUG_ON(!alloc_cpumask_var_node(&per_cpu(local_idle_cpumask, i),
+					       GFP_KERNEL, cpu_to_node(i)));
 		BUG_ON(!alloc_cpumask_var_node(&per_cpu(local_llc_idle_cpumask, i),
 					       GFP_KERNEL, cpu_to_node(i)));
 		BUG_ON(!alloc_cpumask_var_node(&per_cpu(local_numa_idle_cpumask, i),

From patchwork Fri Mar 14 09:45:38 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrea Righi <arighi@nvidia.com>
X-Patchwork-Id: 14016534
Received: from NAM12-DM6-obe.outbound.protection.outlook.com
 (mail-dm6nam12on2073.outbound.protection.outlook.com [40.107.243.73])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7657A1F239B;
	Fri, 14 Mar 2025 09:49:28 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=fail smtp.client-ip=40.107.243.73
ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1741945770; cv=fail;
 b=piFY8zLjHBrqralZGiWoYaxuFmHzWKoPKRzwjL5me1u9WZp5mtbjl9E/t1kyqANH/W3zEAagtaGQNPlnYJz2M9ywLs36tOTpUNBZWu6oepSu35E27lXQtdvYBwbF9jiJSQ70/J4HHvzjiEAmbPw59Za0x+pVcb/311/wZN/gczY=
ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1741945770; c=relaxed/simple;
	bh=WkGodVuQcYvOugyWCBMeFXpHN74cb6jdrTxbpVS7r60=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 Content-Type:MIME-Version;
 b=Jcj9hz2h5IhouRQUQujZHJ/B+jFohHCi7VtYPz/o8FswHlqSFdkkMVR6OYGbQhPsOeCFr/2FI0QBCe0F65kT/js5DHGcQqRUtrIC+wbhxRPF2p/FQ5c5POI2/T1F9nN6IrEwctXKTlfUpaYlr2R1y3GMwyYDzM/G5XI/Suhzrvk=
ARC-Authentication-Results: i=2; smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=nvidia.com;
 spf=fail smtp.mailfrom=nvidia.com;
 dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com
 header.b=UbnAsQGA; arc=fail smtp.client-ip=40.107.243.73
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=nvidia.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=fail smtp.mailfrom=nvidia.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com
 header.b="UbnAsQGA"
ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none;
 b=Pu6IqvQnRXwUvfmL8Rr06k+fz4BqdXzpuvbaeBk4T4VC3pYXSKf9u9512IHQb5dZdmNZGB5rC/dbQRq6lAYC2lp1diKnyUavESkjmeZNlHGZnolgpx5LMOnoMYpixwD/506bGJzUprMgYgoy6oimuHVh//CmGk3HXc6sB/UMxj8EO1iY5It6PXfPc39GpaTZAU2RJ0G4dTpGqWt6t5Drk0VevtlHV4TLimAkBySI9F0Am0IXjZ7JWJWyTXIUC3S9615rtXwygaDUZEErqsWqZudoIyxS3RNDz0kZSJ+2hnVyHWOLmPuTrBknwLsnARb6Pxw9dh1SUx9cIFTyhuo/gw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com;
 s=arcselector10001;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;
 bh=8gz5mnrPVymvN08Ipaw5GDyV8d9ebpS82xi3qU+raYI=;
 b=EB7+2mJDRaOKiOewsQ+aT1JVb+PIJfTaQKvVNBpGcYclGiaeqSP7m5GRJEU6b3ZU/H0K5dVYrsHKMyfWjjSoUsoxVPBqHBKqP+0VhZYLWoY9V/fwpNltMJH7KSJxigcsBHyEYle8m2hcJkYRzKroeYKZhwTptv4e7jzInLx4IEhHRfGdjd7DLZ+whiViDjFnnBahNwFsymnzwpHNvAJ6Y6wlTwF0W2upe9T9lro3mOLbqX2TC0pkhKnZjU/6OZGuXXTD7/fPPHL/43lnslY6jSiNnz+PpWjmy+2t3q0pmcdsCBMZ3tEF+5FYqGCCLFkH9uD7d54YRYfNpvcDBiI/sw==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass
 smtp.mailfrom=nvidia.com; dmarc=pass action=none header.from=nvidia.com;
 dkim=pass header.d=nvidia.com; arc=none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com;
 s=selector2;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=8gz5mnrPVymvN08Ipaw5GDyV8d9ebpS82xi3qU+raYI=;
 b=UbnAsQGAjjsQ2nSULuzuaI55syav9irG5tukw2NkAPITjiiIFvEew/xfLBPDkJpT5fnSmTvRivGo+P5ZUTkzHjlqWxo3uRcvu8lnl3Yw/0OpKQUF8aUL1eslU10c0ofvUPUx6+WpZzQ7Iry+uTY8/00s4XaCZceM0EMCTSNqHrwle6wQzxlB7exnl4DUfL+/vj0c01h2fTjhkbAUIB8S8Ac0GjVOEeTaOMzv+3we1x4wqGlmUpqRnlpqV7T1VQrgVXwO04/JGFe2ZvkFhsCqqBXdZlcX+dBRKk34juiU0IJcJ6ue3Jy+7o0/CmQuRzf6V0zYPDjASbI1Uov8HjOAeA==
Authentication-Results: dkim=none (message not signed)
 header.d=none;dmarc=none action=none header.from=nvidia.com;
Received: from CY5PR12MB6405.namprd12.prod.outlook.com (2603:10b6:930:3e::17)
 by CY5PR12MB6431.namprd12.prod.outlook.com (2603:10b6:930:39::8) with
 Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8511.27; Fri, 14 Mar
 2025 09:49:26 +0000
Received: from CY5PR12MB6405.namprd12.prod.outlook.com
 ([fe80::2119:c96c:b455:53b5]) by CY5PR12MB6405.namprd12.prod.outlook.com
 ([fe80::2119:c96c:b455:53b5%5]) with mapi id 15.20.8511.031; Fri, 14 Mar 2025
 09:49:26 +0000
From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>,
	David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>
Cc: bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH 6/8] sched_ext: idle: Introduce scx_bpf_select_cpu_and()
Date: Fri, 14 Mar 2025 10:45:38 +0100
Message-ID: <20250314094827.167563-7-arighi@nvidia.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250314094827.167563-1-arighi@nvidia.com>
References: <20250314094827.167563-1-arighi@nvidia.com>
X-ClientProxiedBy: SJ0PR03CA0084.namprd03.prod.outlook.com
 (2603:10b6:a03:331::29) To CY5PR12MB6405.namprd12.prod.outlook.com
 (2603:10b6:930:3e::17)
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org
List-Id: <bpf.vger.kernel.org>
List-Subscribe: <mailto:bpf+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:bpf+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
X-MS-PublicTrafficType: Email
X-MS-TrafficTypeDiagnostic: CY5PR12MB6405:EE_|CY5PR12MB6431:EE_
X-MS-Office365-Filtering-Correlation-Id: 126d1b91-c84f-4e00-4187-08dd62dd81ee
X-MS-Exchange-SenderADCheck: 1
X-MS-Exchange-AntiSpam-Relay: 0
X-Microsoft-Antispam: BCL:0;ARA:13230040|1800799024|366016|376014;
X-Microsoft-Antispam-Message-Info: 
 IhQPKofA9nL38uNaEgVyEaRc0NkBecvyW9kiYiUTtBUA/mslc8t/Zzww0e4caJjhLDFx/Eec1IxPSpvuIFG17MV1jog/cCaH7fwAGKDhvqSquZK+OkCYzB6frfszU0bYjV/NLvbm/PI+nokodU36/2nbH1p37nIrYY0CYxjHkN9TB6R9WHdDEPOy9m/020iQAXiadZo+JmsfP6+jITpDWPngZc/Po4ljQJbI0u8aJ6OR5570iywht1fB0QpB9HARaLdJnmpHXypg+6no4QywYZZOlS9txcVGd8SdmIUYtTCI+swXYSyvsNMV+gu8pxATlY8sPhRY7+oNFRj/71B76Ga6XjhGHyvSew7ngU7OXglx953NuAK7ha4Hfx8VGhkNrZz3dU3Lgvq9IVW6zpnM7VI4eZi1h1WcThfUSVybkEyreFgsw3moA+hkKr79VsDfNMqTRHBN9bzr7sWznU+528ad5t9aOS+aM+kPGSrWdQFY4Vl3svh27d1ZE/rwJ0B9tSQji4Hf3gddRGk977YglaS8NQOWwY7ZgIE1Eo84l7vBPuFML6e/U153oEoQy6g8KhFT08l4aLY1efeoJbHR/X60cYGAf6YhlbpBUv3apTr8JzkJfsyUK2gkzLQIVAdm23AduTOZrlPM8xZXVcZcj4Fzw6WNaOaZzyrLiCwUWTW9YfnnAfDBm1kH4SoTzCPXYBcHbNK2cT2dhTrFbol9f9FSPycjllm/Ni/9muTOSrcKNQEVZwjq6u3sspv31mpeoSxWifbH/DwYVZqoqlh8XkBKGMLhCPhFLsWCWbGFdN70pF3LUcEqIY8TNnWaFtjykH5vPwchVR4Egt4LOSSR7ggvoxgemEF/kJNj5qXZNVVE4OvOtv8acYryQbSFHvtsWuttYTgy1UAezjtrLTO0tt9xlqD/K66MuvjrYBmatEMUF2/BQeegTAVxhreNNDDst4W5TnMo9Xnwk0gfcOH2nE6ETowQHtTdKnplWjhSOHknu6/qfeje5sW8Xi3jG8F9yabHiiqqBNogcEVfr2KcTPJQX2tmJPYqqbbe7wvcGWyI8jrOt+qLO+HjSAvY/4V2V3u+aXRfFP3yJnyRqfyL2Tqq6mZW39FDdpcos3BG+GHdfzHkz+Mn8Ap0bjp9rlwCiUunBB9Z6YzJceS63nch2GrEy01taNI8fgiJ2BIBb9EsS7yoQm46NawmvFh0ODr2d8KU+K17m4P3VEw58jaf+KxyY2JWsynL5hEd/zdOP8G5/9EW+ByVNI/hURWSjmBecTfYMwDc4JcsLTz3MHtObjYGh+51039HfH4C0EI7SMZdWWSYNDhwD4ji6w7Mn7cxoKsWTL6nGMAUtqXD16v1vampYHlegfoYPqkCk0ltMYchl+G6In4YYs+Ui5qzu3Iy
X-Forefront-Antispam-Report: 
	CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:CY5PR12MB6405.namprd12.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230040)(1800799024)(366016)(376014);DIR:OUT;SFP:1101;
X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1
X-MS-Exchange-AntiSpam-MessageData-0: 
 oLsTwtPjghIphSw3LhkWSy+uwsMTwsthLSIEzNGVfbVXIHMKm5BYtpzTi0iIhn2d6MyirkwW5/FhIkGWwysLukggDDtbGZZ0m5O9SSTG7oYhW0sk2DPb2O3EqGZHtFAoScpuAcoNnRvnM9Qog2gdEa5o5YaY5IpRZq132chqw0npuY0tuqg0ZsXy2hbNcxVvmhc+RBLkxCOwS1Msk4KauUhxvsH+NjZ596w8j2rlpe94X2M26TDg7RT5SQB/M3j3t416bmBZso77lnG8sw963i56afGRMRwOwAk/PRjpneb3AcLA/Qut0XcAK8j2EFTBwYw+mbt1Std41fY/Y+6eLMPclcm1d39aKKwld6KB8F/jutmSi9RrfwqurnTFxjXqdLznvK2TtcKVhpIqSf0K981yrw5efPpv5iHErECUfee+WzI21yTiS2MDKFYRIpVd+ap/sfMBlzlNH3JQeJB9CfB4GNSF+q96x03ua7hPTYelyfhtKATRGb1UGRtxdbyuyge8NT4W9SUI9NJCn3cByBEqj/SxDWLRqtZCm4V9D9oYlSfc56bi0NlSU6oj2CNnZG+5OHn8fmvztYLXNhUTCoDyhmHWg4xWybjA8xWaAVUYWBXa+alpLRr4dkgJCUxk++CSuDfsLXSRYctNZ5I3WcfPfZuxyyc45ko55B950i7NHnXOyL1aZ6zVncdSRifZpTDFu0RWPQD4LfRmmpCxDElgL6k8chEeuqCQWaypmC8eL3O5cpv+XlHkwPXS3Wa7KZ7Ra8gilXi4y0fDMXODtuS0SpCsuWTdirEVjJk5eJ2Tz4VCHX3lSJ50HVsg5IxMdJ9SZcG4SiJsJiR3wnEt7vLGQe8FEHZP1hm78wwmjp2xHHi4QX/2ytH4PxBO/rQrtslBEGcf3cnu2TL5k4wgJjIzUIe0hG/1T1eRFOt24GNH7xfuRqljns8FGdfqGS/xGZ0xoh7ABXughgI8XudGz9zDzfYxu4M5JxymKUMX2WfQ6vDDr1q00nDkxTRMoIzTQLviJA1iN8C1Gp1L/hidNh5jJADpsFJWbv0W7PUry3DkLVElFHfvGTmJIajOfVejDBPhwCayXHkfEICcygNRIi5d8cYijnQ9P1BIkTZM1wIk4D5/spejL53eaBJ3p7hs7QAjIjl2ZVQIv5OiynoxhtPUwvFKo4LqnsReGX2svvdgyWEVf90BlcVWXAWkKqUNIBed7r+l7cQCDB/y7YefUuUphcVVvcLbOC+pZu/3syPzeC9bafsgzC3hnyyVWbe+MPQSYtwFgLaanGYF+wKl2vOl9kfHoDuiOm7/Ar0ovE8mx8nK9vHSlEwpBjxYwhCyXDuhBbyqccIZkCDC8HEpBP4g+kNKyoorps6kAaTPUOM7iVYaf5isleEcVeEvazWUEPwsuHLn3hw+C80knNvVfth3KWzc3ayHkS1S3BymWZS3jwT0Q4oUMQHQqCM/yQ0TB0l6zZD5tEVongNNXC9RuG9NlP/D9L7HnmFbfrZ7p/9Jn3rzNXDkkdzcibno05Pmht4+INXRr8WdNi0jXbZVL6bSNteucQ7ve5iKWFXhRtWDqX5QI33AnEP37a/nXvBB
X-OriginatorOrg: Nvidia.com
X-MS-Exchange-CrossTenant-Network-Message-Id: 
 126d1b91-c84f-4e00-4187-08dd62dd81ee
X-MS-Exchange-CrossTenant-AuthSource: CY5PR12MB6405.namprd12.prod.outlook.com
X-MS-Exchange-CrossTenant-AuthAs: Internal
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 14 Mar 2025 09:49:26.4376
 (UTC)
X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted
X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a
X-MS-Exchange-CrossTenant-MailboxType: HOSTED
X-MS-Exchange-CrossTenant-UserPrincipalName: 
 Y1O7xxk9CkdTOmlKQRD1gr9578A9jxc916yU/0KgYidcdZNQudaEkDTldBjloZ1DPiHcr4oibKHqpOCJhXwAmQ==
X-MS-Exchange-Transport-CrossTenantHeadersStamped: CY5PR12MB6431

Provide a new kfunc, scx_bpf_select_cpu_and(), that can be used to apply
the built-in idle CPU selection policy to a subset of allowed CPUs.

This new helper is basically an extension of scx_bpf_select_cpu_dfl().
However, when an idle CPU can't be found, it returns a negative value
instead of @prev_cpu, aligning its behavior more closely with
scx_bpf_pick_idle_cpu().
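
The difference in return contract can be illustrated with a userspace
model. This is a sketch under assumptions, not the kernel
implementation: cpumasks are 64-bit words, the built-in policy is
reduced to "lowest idle CPU in the domain", and all model_* names are
hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the built-in policy: lowest idle CPU or -EBUSY. */
int model_pick_idle(uint64_t idle, uint64_t domain)
{
	uint64_t mask = idle & domain;

	for (int i = 0; i < 64; i++)
		if (mask & ((uint64_t)1 << i))
			return i;
	return -EBUSY;
}

/*
 * Models the scx_bpf_select_cpu_dfl() contract: fall back to @prev_cpu
 * when nothing is idle and report the outcome through @is_idle.
 */
int model_select_cpu_dfl(int prev_cpu, uint64_t idle, uint64_t task_cpus,
			 bool *is_idle)
{
	int cpu = model_pick_idle(idle, task_cpus);

	*is_idle = cpu >= 0;
	return cpu >= 0 ? cpu : prev_cpu;
}

/*
 * Models the scx_bpf_select_cpu_and() contract: restrict the search to
 * @cpus_allowed and return a negative value when nothing is idle.
 */
int model_select_cpu_and(int prev_cpu, uint64_t idle, uint64_t task_cpus,
			 uint64_t cpus_allowed)
{
	(void)prev_cpu;	/* only used for locality in the real policy */
	return model_pick_idle(idle, task_cpus & cpus_allowed);
}
```

With the second form the caller checks the sign of the result directly,
instead of inspecting a separate is_idle output.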

It also accepts %SCX_PICK_IDLE_* flags, which can be used to restrict
the selection strictly to @prev_cpu's node (%SCX_PICK_IDLE_IN_NODE), or
to request only a fully idle SMT core (%SCX_PICK_IDLE_CORE), while
applying the built-in selection logic.

With this helper, BPF schedulers can apply the built-in idle CPU
selection policy restricted to any arbitrary subset of CPUs.

Example usage
=============

Possible usage in ops.select_cpu():

s32 BPF_STRUCT_OPS(foo_select_cpu, struct task_struct *p,
		   s32 prev_cpu, u64 wake_flags)
{
	const struct cpumask *cpus = task_allowed_cpus(p) ?: p->cpus_ptr;
	s32 cpu;

	cpu = scx_bpf_select_cpu_and(p, prev_cpu, wake_flags, cpus, 0);
	if (cpu >= 0) {
		scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);
		return cpu;
	}

	return prev_cpu;
}

Results
=======

Load distribution on a 4-socket system with 4 cores per socket,
simulated using virtme-ng, running a modified version of scx_bpfland
that uses scx_bpf_select_cpu_and() with 0xff00 as the allowed subset of
CPUs:

 $ vng --cpu 16,sockets=4,cores=4,threads=1
 ...
 $ stress-ng -c 16
 ...
 $ htop
 ...
   0[                         0.0%]   8[||||||||||||||||||||||||100.0%]
   1[                         0.0%]   9[||||||||||||||||||||||||100.0%]
   2[                         0.0%]  10[||||||||||||||||||||||||100.0%]
   3[                         0.0%]  11[||||||||||||||||||||||||100.0%]
   4[                         0.0%]  12[||||||||||||||||||||||||100.0%]
   5[                         0.0%]  13[||||||||||||||||||||||||100.0%]
   6[                         0.0%]  14[||||||||||||||||||||||||100.0%]
   7[                         0.0%]  15[||||||||||||||||||||||||100.0%]

With scx_bpf_select_cpu_dfl(), tasks would instead be distributed evenly
across all the available CPUs.

Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 kernel/sched/ext.c                       |  1 +
 kernel/sched/ext_idle.c                  | 41 ++++++++++++++++++++++++
 tools/sched_ext/include/scx/common.bpf.h |  2 ++
 3 files changed, 44 insertions(+)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index f42352e8d889e..343f066c1185d 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -465,6 +465,7 @@ struct sched_ext_ops {
 	 * idle CPU tracking and the following helpers become unavailable:
 	 *
 	 * - scx_bpf_select_cpu_dfl()
+	 * - scx_bpf_select_cpu_and()
 	 * - scx_bpf_test_and_clear_cpu_idle()
 	 * - scx_bpf_pick_idle_cpu()
 	 *
diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
index 549551bc97a7b..c0de7b64771d4 100644
--- a/kernel/sched/ext_idle.c
+++ b/kernel/sched/ext_idle.c
@@ -914,6 +914,46 @@ __bpf_kfunc s32 scx_bpf_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
 	return prev_cpu;
 }
 
+/**
+ * scx_bpf_select_cpu_and - Pick an idle CPU usable by task @p,
+ *			    prioritizing those in @cpus_allowed
+ * @p: task_struct to select a CPU for
+ * @prev_cpu: CPU @p was on previously
+ * @wake_flags: %SCX_WAKE_* flags
+ * @cpus_allowed: cpumask of allowed CPUs
+ * @flags: %SCX_PICK_IDLE* flags
+ *
+ * Can only be called from ops.select_cpu() if the built-in CPU selection is
+ * enabled - ops.update_idle() is missing or %SCX_OPS_KEEP_BUILTIN_IDLE is set.
+ * @p, @prev_cpu and @wake_flags match ops.select_cpu().
+ *
+ * Returns the selected idle CPU, which will be automatically awakened upon
+ * returning from ops.select_cpu() and can be used for direct dispatch, or
+ * a negative value if no idle CPU is available.
+ */
+__bpf_kfunc s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
+				       const struct cpumask *cpus_allowed, u64 flags)
+{
+	s32 cpu;
+
+	if (!ops_cpu_valid(prev_cpu, NULL))
+		return -EINVAL;
+
+	if (!check_builtin_idle_enabled())
+		return -EBUSY;
+
+	if (!scx_kf_allowed(SCX_KF_SELECT_CPU))
+		return -EPERM;
+
+#ifdef CONFIG_SMP
+	cpu = scx_select_cpu_dfl(p, prev_cpu, wake_flags, cpus_allowed, flags);
+#else
+	cpu = -EBUSY;
+#endif
+
+	return cpu;
+}
+
 /**
  * scx_bpf_get_idle_cpumask_node - Get a referenced kptr to the
  * idle-tracking per-CPU cpumask of a target NUMA node.
@@ -1222,6 +1262,7 @@ static const struct btf_kfunc_id_set scx_kfunc_set_idle = {
 
 BTF_KFUNCS_START(scx_kfunc_ids_select_cpu)
 BTF_ID_FLAGS(func, scx_bpf_select_cpu_dfl, KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_select_cpu_and, KF_RCU)
 BTF_KFUNCS_END(scx_kfunc_ids_select_cpu)
 
 static const struct btf_kfunc_id_set scx_kfunc_set_select_cpu = {
diff --git a/tools/sched_ext/include/scx/common.bpf.h b/tools/sched_ext/include/scx/common.bpf.h
index dc4333d23189f..6f1da61cf7f17 100644
--- a/tools/sched_ext/include/scx/common.bpf.h
+++ b/tools/sched_ext/include/scx/common.bpf.h
@@ -48,6 +48,8 @@ static inline void ___vmlinux_h_sanity_check___(void)
 
 s32 scx_bpf_create_dsq(u64 dsq_id, s32 node) __ksym;
 s32 scx_bpf_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, bool *is_idle) __ksym;
+s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
+			   const struct cpumask *cpus_allowed, u64 flags) __ksym __weak;
 void scx_bpf_dsq_insert(struct task_struct *p, u64 dsq_id, u64 slice, u64 enq_flags) __ksym __weak;
 void scx_bpf_dsq_insert_vtime(struct task_struct *p, u64 dsq_id, u64 slice, u64 vtime, u64 enq_flags) __ksym __weak;
 u32 scx_bpf_dispatch_nr_slots(void) __ksym;

From patchwork Fri Mar 14 09:45:39 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrea Righi <arighi@nvidia.com>
X-Patchwork-Id: 14016535
Received: from NAM12-BN8-obe.outbound.protection.outlook.com
 (mail-bn8nam12on2086.outbound.protection.outlook.com [40.107.237.86])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 598401F236B;
	Fri, 14 Mar 2025 09:49:37 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=fail smtp.client-ip=40.107.237.86
ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1741945779; cv=fail;
 b=VpKnalZCfyCGoXhyfqdhLh1iS8+pDsh6TwyplmWTVJH5uiG69Q3tt7V7Ft13Z6h38XiMMaEbeokI3AJ9LdoWuwfp8+ANPYEVksp5kK23MrWW1pBN2QWd2Z64Vi2kpb6yN4B9qdJpa6F+cEvDK1M+dclHslYKjtUp0NZZjEX3bWI=
ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1741945779; c=relaxed/simple;
	bh=bX0KGLg7VcArnhLDCA+9MB2g3vldI0RocwxNRE/VC50=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 Content-Type:MIME-Version;
 b=rBawMMCKBl7nmf52OrXWHPTnHVKShM50/xe1lccI0qOxN7Z2bopeBwTyYRKvsxPMur9Aab9OVhZ0UP/BrrDjaFIw0MLCmMYZfDUj2NKykLFZQpHB299rFYz3VbkAXWrd008XkEMkBKSG1vpRNNNA97tci4qE7VXhRoY0vgHYS1g=
ARC-Authentication-Results: i=2; smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=nvidia.com;
 spf=fail smtp.mailfrom=nvidia.com;
 dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com
 header.b=IIwL6oAa; arc=fail smtp.client-ip=40.107.237.86
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=nvidia.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=fail smtp.mailfrom=nvidia.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com
 header.b="IIwL6oAa"
ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none;
 b=pz+VrFbxTzQbrD6O0/3wKcnwfvCAtuYGL0zKq/5v+Ygf0ZD7vn2+WXPcHupniYxvVj7TGIENzpBiutxeU/Zc8mQx5yl2HJAmAkNL/N/182JvFn6A7V+sJJ6Z4uu5t7oJR/OpYrz5+LkaBX2skUKmpiXHT2vtJ3p0A0v/u7GXsxHKVpc5NGgLNW9aTdg8U4iS2G1kKlsQI8AD/EV+vBx7uBhts/zRLs/AFpXnU1LyqrHfixf1vg+VbwPsWTclV92G1hGJFhVqdqovZ6LtscjSKvLG9sYnvpyCFDJYOYSPqrrBwlgTA0awwOT70wIkOhhjsbZpNV86igE6B2+C/PwpTg==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com;
 s=arcselector10001;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;
 bh=jl70aBtHhZmGxOT2RbPGdbxZB5r1Gb8jki0+zEhKmdo=;
 b=toenK1F0gwCo3LLL0Wi4LHygOtapvARoo1e+0Zpih1pwkiJ6vC8hWiIBltq9RITe4XQb35hTeh4Ig8ApfD/5IQRlRrj9FpCGZzWN8slVDRUKc/MGL/W85dwGa1zM0nIljJe9YQj7usP587gSncrDB8ZKQc2KXAfUf9FvBw2gmstlre9Amsqqox8yqL8hslZo1BvAUz/Sf66ELCSdPT+5CjmOSegrYx1V6EEjLpz2o5zpLJrtP4DvSXR3eu0gmSNZYEGxrXE5HUyCh2uT46UGFEdGVgjwZ74vJJ+wyigw+l7aB+f/DTf/Dx6TucSQdG08R4PoOywHeOXOf0keKs4nsw==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass
 smtp.mailfrom=nvidia.com; dmarc=pass action=none header.from=nvidia.com;
 dkim=pass header.d=nvidia.com; arc=none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com;
 s=selector2;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=jl70aBtHhZmGxOT2RbPGdbxZB5r1Gb8jki0+zEhKmdo=;
 b=IIwL6oAawe+8Fhe7gbEv/dvAL2RD2CTTaKziIMgu7koPCFHG5OiBe6F5D/hNtXTniY4nRB8LqZLpnkccC+k/0cU7x4U804GA2d3XRqqjCTgrdBu2WYzFXJeF9O65Gs+LSqwZu2aPDVL2ctKrGAnmM7E0chBkh0xxr0KN1hS6L03dNiR5kYkTMw91YCEP8rtN1QpqbEfSjRuoR9N+vztVumImMoVxP6vYFL2w87ML8ED2H3Q1ENus5q7eVjdOMzjPlqqZ2Z8oG6MxA+fa/ZwUfmPs2nK3R8JtOQKHLuBih4iMmvpaXyb/v6GO5e6nvsvB+QdsGkpCkeEV8rR09olB5g==
Authentication-Results: dkim=none (message not signed)
 header.d=none;dmarc=none action=none header.from=nvidia.com;
Received: from CY5PR12MB6405.namprd12.prod.outlook.com (2603:10b6:930:3e::17)
 by CY5PR12MB6431.namprd12.prod.outlook.com (2603:10b6:930:39::8) with
 Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8511.27; Fri, 14 Mar
 2025 09:49:34 +0000
Received: from CY5PR12MB6405.namprd12.prod.outlook.com
 ([fe80::2119:c96c:b455:53b5]) by CY5PR12MB6405.namprd12.prod.outlook.com
 ([fe80::2119:c96c:b455:53b5%5]) with mapi id 15.20.8511.031; Fri, 14 Mar 2025
 09:49:34 +0000
From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>,
	David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>
Cc: bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH 7/8] selftests/sched_ext: Add test for
 scx_bpf_select_cpu_and()
Date: Fri, 14 Mar 2025 10:45:39 +0100
Message-ID: <20250314094827.167563-8-arighi@nvidia.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250314094827.167563-1-arighi@nvidia.com>
References: <20250314094827.167563-1-arighi@nvidia.com>
X-ClientProxiedBy: ZR0P278CA0204.CHEP278.PROD.OUTLOOK.COM
 (2603:10a6:910:6a::13) To CY5PR12MB6405.namprd12.prod.outlook.com
 (2603:10b6:930:3e::17)
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org
List-Id: <bpf.vger.kernel.org>
List-Subscribe: <mailto:bpf+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:bpf+unsubscribe@vger.kernel.org>
MIME-Version: 1.0

Add a selftest to validate the behavior of the built-in idle CPU
selection policy applied to a subset of allowed CPUs, using
scx_bpf_select_cpu_and().

Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
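The checks performed by the selftest can be illustrated with a small
userspace model. This is only a sketch under stated assumptions, not
the kernel implementation: model_select_cpu_and(), model_pick_is_valid()
and MODEL_NO_IDLE_CPU are hypothetical names, CPUs are modeled as bits
in a 64-bit mask, and task affinity (p->cpus_ptr) is ignored.

```c
#include <stdint.h>

/* Hypothetical stand-in for "no idle CPU found" (any negative value). */
#define MODEL_NO_IDLE_CPU (-1)

/*
 * Userspace model, not kernel code: pick an idle CPU that is also in
 * @allowed, preferring @prev_cpu, and mark the chosen CPU busy in *@idle.
 */
static int model_select_cpu_and(uint64_t *idle, uint64_t allowed, int prev_cpu)
{
	uint64_t candidates = *idle & allowed;
	int cpu;

	if (!candidates)
		return MODEL_NO_IDLE_CPU;

	if (prev_cpu >= 0 && prev_cpu < 64 &&
	    (candidates & (1ULL << prev_cpu))) {
		cpu = prev_cpu;
	} else {
		/* Fall back to the lowest-numbered candidate. */
		for (cpu = 0; cpu < 64; cpu++)
			if (candidates & (1ULL << cpu))
				break;
	}

	*idle &= ~(1ULL << cpu);
	return cpu;
}

/*
 * The two invariants the selftest asserts after a successful pick:
 * the CPU is no longer marked idle, and it belongs to the allowed set.
 */
static int model_pick_is_valid(uint64_t idle_after, uint64_t allowed, int cpu)
{
	int still_idle = !!(idle_after & (1ULL << cpu));
	int in_allowed = !!(allowed & (1ULL << cpu));

	return !still_idle && in_allowed;
}
```

With CPUs 0-3 idle and only CPU 0 allowed, the model picks CPU 0 once
and then reports that no idle CPU is available, which mirrors the
single-CPU allowed domain set up by allowed_cpus_init().
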
 tools/testing/selftests/sched_ext/Makefile    |  1 +
 .../selftests/sched_ext/allowed_cpus.bpf.c    | 91 +++++++++++++++++++
 .../selftests/sched_ext/allowed_cpus.c        | 57 ++++++++++++
 3 files changed, 149 insertions(+)
 create mode 100644 tools/testing/selftests/sched_ext/allowed_cpus.bpf.c
 create mode 100644 tools/testing/selftests/sched_ext/allowed_cpus.c

diff --git a/tools/testing/selftests/sched_ext/Makefile b/tools/testing/selftests/sched_ext/Makefile
index f4531327b8e76..e9d5bc575f806 100644
--- a/tools/testing/selftests/sched_ext/Makefile
+++ b/tools/testing/selftests/sched_ext/Makefile
@@ -173,6 +173,7 @@ auto-test-targets :=			\
 	maybe_null			\
 	minimal				\
 	numa				\
+	allowed_cpus			\
 	prog_run			\
 	reload_loop			\
 	select_cpu_dfl			\
diff --git a/tools/testing/selftests/sched_ext/allowed_cpus.bpf.c b/tools/testing/selftests/sched_ext/allowed_cpus.bpf.c
new file mode 100644
index 0000000000000..0c9de334d4427
--- /dev/null
+++ b/tools/testing/selftests/sched_ext/allowed_cpus.bpf.c
@@ -0,0 +1,91 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * A scheduler that validates the behavior of scx_bpf_select_cpu_and() by
+ * selecting idle CPUs strictly within a subset of allowed CPUs.
+ *
+ * Copyright (c) 2025 Andrea Righi <arighi@nvidia.com>
+ */
+
+#include <scx/common.bpf.h>
+
+char _license[] SEC("license") = "GPL";
+
+UEI_DEFINE(uei);
+
+private(PREF_CPUS) struct bpf_cpumask __kptr * allowed_cpumask;
+
+s32 BPF_STRUCT_OPS(allowed_cpus_select_cpu,
+		   struct task_struct *p, s32 prev_cpu, u64 wake_flags)
+{
+	const struct cpumask *allowed;
+	s32 cpu;
+
+	allowed = cast_mask(allowed_cpumask);
+	if (!allowed) {
+		scx_bpf_error("allowed domain not initialized");
+		return -EINVAL;
+	}
+
+	/*
+	 * Select an idle CPU strictly within the allowed domain.
+	 */
+	cpu = scx_bpf_select_cpu_and(p, prev_cpu, wake_flags, allowed, 0);
+	if (cpu >= 0) {
+		if (scx_bpf_test_and_clear_cpu_idle(cpu))
+			scx_bpf_error("CPU %d should be marked as busy", cpu);
+
+		if (bpf_cpumask_subset(allowed, p->cpus_ptr) &&
+		    !bpf_cpumask_test_cpu(cpu, allowed))
+			scx_bpf_error("CPU %d not in the allowed domain for %d (%s)",
+				      cpu, p->pid, p->comm);
+
+		scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);
+
+		return cpu;
+	}
+
+	return prev_cpu;
+}
+
+s32 BPF_STRUCT_OPS_SLEEPABLE(allowed_cpus_init)
+{
+	struct bpf_cpumask *mask;
+
+	mask = bpf_cpumask_create();
+	if (!mask)
+		return -ENOMEM;
+
+	mask = bpf_kptr_xchg(&allowed_cpumask, mask);
+	if (mask)
+		bpf_cpumask_release(mask);
+
+	bpf_rcu_read_lock();
+
+	/*
+	 * Assign the first online CPU to the allowed domain.
+	 */
+	mask = allowed_cpumask;
+	if (mask) {
+		const struct cpumask *online = scx_bpf_get_online_cpumask();
+
+		bpf_cpumask_set_cpu(bpf_cpumask_first(online), mask);
+		scx_bpf_put_cpumask(online);
+	}
+
+	bpf_rcu_read_unlock();
+
+	return 0;
+}
+
+void BPF_STRUCT_OPS(allowed_cpus_exit, struct scx_exit_info *ei)
+{
+	UEI_RECORD(uei, ei);
+}
+
+SEC(".struct_ops.link")
+struct sched_ext_ops allowed_cpus_ops = {
+	.select_cpu		= (void *)allowed_cpus_select_cpu,
+	.init			= (void *)allowed_cpus_init,
+	.exit			= (void *)allowed_cpus_exit,
+	.name			= "allowed_cpus",
+};
diff --git a/tools/testing/selftests/sched_ext/allowed_cpus.c b/tools/testing/selftests/sched_ext/allowed_cpus.c
new file mode 100644
index 0000000000000..a001a3a0e9f1f
--- /dev/null
+++ b/tools/testing/selftests/sched_ext/allowed_cpus.c
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2025 Andrea Righi <arighi@nvidia.com>
+ */
+#include <bpf/bpf.h>
+#include <scx/common.h>
+#include <sys/wait.h>
+#include <unistd.h>
+#include "allowed_cpus.bpf.skel.h"
+#include "scx_test.h"
+
+static enum scx_test_status setup(void **ctx)
+{
+	struct allowed_cpus *skel;
+
+	skel = allowed_cpus__open();
+	SCX_FAIL_IF(!skel, "Failed to open");
+	SCX_ENUM_INIT(skel);
+	SCX_FAIL_IF(allowed_cpus__load(skel), "Failed to load skel");
+
+	*ctx = skel;
+
+	return SCX_TEST_PASS;
+}
+
+static enum scx_test_status run(void *ctx)
+{
+	struct allowed_cpus *skel = ctx;
+	struct bpf_link *link;
+
+	link = bpf_map__attach_struct_ops(skel->maps.allowed_cpus_ops);
+	SCX_FAIL_IF(!link, "Failed to attach scheduler");
+
+	/* Just sleeping is fine, plenty of scheduling events happening */
+	sleep(1);
+
+	SCX_EQ(skel->data->uei.kind, EXIT_KIND(SCX_EXIT_NONE));
+	bpf_link__destroy(link);
+
+	return SCX_TEST_PASS;
+}
+
+static void cleanup(void *ctx)
+{
+	struct allowed_cpus *skel = ctx;
+
+	allowed_cpus__destroy(skel);
+}
+
+struct scx_test allowed_cpus = {
+	.name = "allowed_cpus",
+	.description = "Verify scx_bpf_select_cpu_and()",
+	.setup = setup,
+	.run = run,
+	.cleanup = cleanup,
+};
+REGISTER_SCX_TEST(&allowed_cpus)

From patchwork Fri Mar 14 09:45:40 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrea Righi <arighi@nvidia.com>
X-Patchwork-Id: 14016536
Received: from NAM12-MW2-obe.outbound.protection.outlook.com
 (mail-mw2nam12on2047.outbound.protection.outlook.com [40.107.244.47])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id AB7691F236B;
	Fri, 14 Mar 2025 09:49:46 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=fail smtp.client-ip=40.107.244.47
ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1741945788; cv=fail;
 b=eQbrfIpUcwHeBS2MTKdY9z5VraVEdYmBgM2MzqCH5AREj1mfTpZxH3SM/O4aGl2H89Hy5estlHLfk8LpN908YpdiMfd/gzOrPRaOqrjvpxhEf0xwFFLnEveYpBh1qVK8HUcxUIp7ZUFxHR+tGYtiDRcd6drSr74+0QISbQO8bQQ=
ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1741945788; c=relaxed/simple;
	bh=2z2tl6EGi0Fkp+AOarwDWTm6SH+nMXLI4PoENK3cf2s=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 Content-Type:MIME-Version;
 b=J0whWku3Qr/MG/ofKmFwL9O/Onjapj3RjOHFdTxmaL1Tp2C2nWp5fsYcn9qjfi1xzM4K/cdkJGE/REE0U0IVJ65T4GIgnczIE28AT7GacThiyD7pPimTXMfAMK/2rSnmRDuEB2XxMvsuZSnasMmqLu/YZiJx101iMqm8iFTS4nU=
ARC-Authentication-Results: i=2; smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=nvidia.com;
 spf=fail smtp.mailfrom=nvidia.com;
 dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com
 header.b=IkpTlegp; arc=fail smtp.client-ip=40.107.244.47
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=nvidia.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=fail smtp.mailfrom=nvidia.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com
 header.b="IkpTlegp"
ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none;
 b=hzklExXiqKrULfeCoh6XOMBHNyNJ+YZ3EtPzL/aQIhV/h7/VcQ3hbK1+/AnT1w5nO9JTmZeB9Ck/CfC8o7ki0Kecw3hU9IBvaD25p91Z4FS2iC0vmfmwhUepuKVbJ2mXbHwJcHPjUqbu8h8wlCQAbJ/LWxQ4zTQGreK7ScJD7Puw8WigrudUdc+wll+lG4GkTFo6KEZZTFUh/G4tXpcxAorx6ZNbKiJZsEzR387pDZsYktVJiFf+Fj8LupzYP0X2lzecsy0KGzQ+AnYuXjP50Q0wFdgRJAkPXQOpn8X2W55sG0iN9Hp5lFyaJX2D4jKrdlhwLMDsVe/b+vp/pTJYQw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com;
 s=arcselector10001;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;
 bh=lbrcSYpYfXxpoYSv8bqXb1Cw0q/XhBW9hmEoKDk2yz0=;
 b=ZGoW/gwgJkJNoRfgqAer/WaKkVEIBMRlu8UNTf/bIZP7rLMKwXP017mBwVu+wUFFKEBHyAM+GnigsEmWtbcmWstJl/eI9qjUpSNCFRbyVbIOoANj/5BRfNtsSimBYcgPV1gIEO70u9q2khc2rxeeK1GqUwfucQjxz6ZfCu4/NC/2RvmogvRgFuPodR07CWwTTKvJGKRO1RDUZpXVC9nkp2xMtuO+mFGVZi0sbIpxpFLkgH4VNVr3k1rUNVJeBLZSADVas193120ZmmbvDi/Fh87C70UpUtUq1r9PwEZgnud0heWHyuqIu3E2KnY+JnAMe26C/YahjPBPIjGV9QbIGA==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass
 smtp.mailfrom=nvidia.com; dmarc=pass action=none header.from=nvidia.com;
 dkim=pass header.d=nvidia.com; arc=none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com;
 s=selector2;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=lbrcSYpYfXxpoYSv8bqXb1Cw0q/XhBW9hmEoKDk2yz0=;
 b=IkpTlegp+tgCCbQruqeLYPWVpvrBd+engqa6u+YNive8CHs+tsOIMDrsB82jCVxXCOcNJ3fV3AiV0wbAfbuqcL736kd4VV1lqDM5H+I+Zy941CWjjbIw8yjAPysA5lucZLBHb/fDTZ0r38r7xlNdWxQ0Zx2z3iot72iipuoXxQfxTovXRlnlBYZyHJW1+pZs+tx77F9AF6VaO1ElPY6+0XkyYr/t/LT0C7ZT1USK73eGeea0XqdgR3ZhNhH54IQSBtu9OJkN7K9v2Tocw3K/vaC9eUVF2sgzSOPCcxToTL0sxd/IUnL/WT3CJ9QKaOPO00RLcWppdWxSnoWTa8jUww==
Authentication-Results: dkim=none (message not signed)
 header.d=none;dmarc=none action=none header.from=nvidia.com;
Received: from CY5PR12MB6405.namprd12.prod.outlook.com (2603:10b6:930:3e::17)
 by CY5PR12MB6431.namprd12.prod.outlook.com (2603:10b6:930:39::8) with
 Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8511.27; Fri, 14 Mar
 2025 09:49:43 +0000
Received: from CY5PR12MB6405.namprd12.prod.outlook.com
 ([fe80::2119:c96c:b455:53b5]) by CY5PR12MB6405.namprd12.prod.outlook.com
 ([fe80::2119:c96c:b455:53b5%5]) with mapi id 15.20.8511.031; Fri, 14 Mar 2025
 09:49:43 +0000
From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>,
	David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>
Cc: bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH 8/8] sched_ext: idle: Deprecate scx_bpf_select_cpu_dfl()
Date: Fri, 14 Mar 2025 10:45:40 +0100
Message-ID: <20250314094827.167563-9-arighi@nvidia.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250314094827.167563-1-arighi@nvidia.com>
References: <20250314094827.167563-1-arighi@nvidia.com>
X-ClientProxiedBy: MI1P293CA0012.ITAP293.PROD.OUTLOOK.COM
 (2603:10a6:290:2::10) To CY5PR12MB6405.namprd12.prod.outlook.com
 (2603:10b6:930:3e::17)
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org
List-Id: <bpf.vger.kernel.org>
List-Subscribe: <mailto:bpf+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:bpf+unsubscribe@vger.kernel.org>
MIME-Version: 1.0

With the introduction of scx_bpf_select_cpu_and(), we can deprecate
scx_bpf_select_cpu_dfl(): the old helper offers only a subset of the
new one's features, and the new one is more consistent with the other
idle-related APIs (it returns a negative value when no idle CPU is
found).

Therefore, mark scx_bpf_select_cpu_dfl() as deprecated (printing a
warning when it's used), update all the scheduler examples and
kselftests to adopt the new API, and ensure backward (source and binary)
compatibility by providing the necessary macros and hooks.

Support for scx_bpf_select_cpu_dfl() can be maintained until v6.17.

Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
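The relationship between the two return conventions can be sketched in
plain userspace C. This is an illustrative model only, assuming CPUs
are bits in a 64-bit mask; sketch_select_cpu_and() and
sketch_select_cpu_dfl() are hypothetical names and do not reproduce the
kernel's idle selection policy.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the new convention: return the lowest idle CPU
 * in @allowed and mark it busy, or a negative value if there is none.
 */
static int sketch_select_cpu_and(uint64_t *idle, uint64_t allowed)
{
	uint64_t candidates = *idle & allowed;
	int cpu;

	for (cpu = 0; cpu < 64; cpu++) {
		if (candidates & (1ULL << cpu)) {
			*idle &= ~(1ULL << cpu);
			return cpu;
		}
	}

	return -1;
}

/*
 * Old convention rebuilt on top of the new one, in the spirit of the
 * compat macro in this patch: always return a usable CPU and report
 * through *@is_idle whether it was found idle.
 */
static int sketch_select_cpu_dfl(uint64_t *idle, uint64_t allowed,
				 int prev_cpu, bool *is_idle)
{
	int cpu = sketch_select_cpu_and(idle, allowed);

	*is_idle = cpu >= 0;

	return *is_idle ? cpu : prev_cpu;
}
```

Callers of the new convention test "cpu >= 0" instead of reading an
out parameter, and fall back to prev_cpu themselves.
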
 Documentation/scheduler/sched-ext.rst         | 11 +++---
 kernel/sched/ext.c                            |  3 +-
 kernel/sched/ext_idle.c                       | 18 ++-------
 tools/sched_ext/include/scx/common.bpf.h      |  3 +-
 tools/sched_ext/include/scx/compat.bpf.h      | 37 +++++++++++++++++++
 tools/sched_ext/scx_flatcg.bpf.c              | 12 +++---
 tools/sched_ext/scx_simple.bpf.c              |  9 +++--
 .../sched_ext/enq_select_cpu_fails.bpf.c      | 12 +-----
 .../sched_ext/enq_select_cpu_fails.c          |  2 +-
 tools/testing/selftests/sched_ext/exit.bpf.c  |  6 ++-
 .../sched_ext/select_cpu_dfl_nodispatch.bpf.c | 13 +++----
 .../sched_ext/select_cpu_dfl_nodispatch.c     |  2 +-
 12 files changed, 73 insertions(+), 55 deletions(-)

diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst
index 0993e41353db7..7f36f4fcf5f31 100644
--- a/Documentation/scheduler/sched-ext.rst
+++ b/Documentation/scheduler/sched-ext.rst
@@ -142,15 +142,14 @@ optional. The following modified excerpt is from
                        s32 prev_cpu, u64 wake_flags)
     {
             s32 cpu;
-            /* Need to initialize or the BPF verifier will reject the program */
-            bool direct = false;
 
-            cpu = scx_bpf_select_cpu_dfl(p, prev_cpu, wake_flags, &direct);
-
-            if (direct)
+            cpu = scx_bpf_select_cpu_and(p, prev_cpu, wake_flags, p->cpus_ptr, 0);
+            if (cpu >= 0) {
                     scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);
+                    return cpu;
+            }
 
-            return cpu;
+            return prev_cpu;
     }
 
     /*
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 343f066c1185d..d82e9d3cbc0dc 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -464,13 +464,12 @@ struct sched_ext_ops {
 	 * state. By default, implementing this operation disables the built-in
 	 * idle CPU tracking and the following helpers become unavailable:
 	 *
-	 * - scx_bpf_select_cpu_dfl()
 	 * - scx_bpf_select_cpu_and()
 	 * - scx_bpf_test_and_clear_cpu_idle()
 	 * - scx_bpf_pick_idle_cpu()
 	 *
 	 * The user also must implement ops.select_cpu() as the default
-	 * implementation relies on scx_bpf_select_cpu_dfl().
+	 * implementation relies on scx_bpf_select_cpu_and().
 	 *
 	 * Specify the %SCX_OPS_KEEP_BUILTIN_IDLE flag to keep the built-in idle
 	 * tracking.
diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
index c0de7b64771d4..2fc5e7972eed1 100644
--- a/kernel/sched/ext_idle.c
+++ b/kernel/sched/ext_idle.c
@@ -872,26 +872,16 @@ __bpf_kfunc int scx_bpf_cpu_node(s32 cpu)
 #endif
 }
 
-/**
- * scx_bpf_select_cpu_dfl - The default implementation of ops.select_cpu()
- * @p: task_struct to select a CPU for
- * @prev_cpu: CPU @p was on previously
- * @wake_flags: %SCX_WAKE_* flags
- * @is_idle: out parameter indicating whether the returned CPU is idle
- *
- * Can only be called from ops.select_cpu() if the built-in CPU selection is
- * enabled - ops.update_idle() is missing or %SCX_OPS_KEEP_BUILTIN_IDLE is set.
- * @p, @prev_cpu and @wake_flags match ops.select_cpu().
- *
- * Returns the picked CPU with *@is_idle indicating whether the picked CPU is
- * currently idle and thus a good candidate for direct dispatching.
- */
+/* Provided for backward binary compatibility, will be removed in v6.17. */
 __bpf_kfunc s32 scx_bpf_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
 				       u64 wake_flags, bool *is_idle)
 {
 #ifdef CONFIG_SMP
 	s32 cpu;
 #endif
+	printk_deferred_once(KERN_WARNING
+			"sched_ext: scx_bpf_select_cpu_dfl() deprecated in favor of scx_bpf_select_cpu_and()\n");
+
 	if (!ops_cpu_valid(prev_cpu, NULL))
 		goto prev_cpu;
 
diff --git a/tools/sched_ext/include/scx/common.bpf.h b/tools/sched_ext/include/scx/common.bpf.h
index 6f1da61cf7f17..1eb790eb90d40 100644
--- a/tools/sched_ext/include/scx/common.bpf.h
+++ b/tools/sched_ext/include/scx/common.bpf.h
@@ -47,7 +47,8 @@ static inline void ___vmlinux_h_sanity_check___(void)
 }
 
 s32 scx_bpf_create_dsq(u64 dsq_id, s32 node) __ksym;
-s32 scx_bpf_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, bool *is_idle) __ksym;
+s32 scx_bpf_select_cpu_dfl(struct task_struct *p,
+			   s32 prev_cpu, u64 wake_flags, bool *is_idle) __ksym __weak;
 s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
 			   const struct cpumask *cpus_allowed, u64 flags) __ksym __weak;
 void scx_bpf_dsq_insert(struct task_struct *p, u64 dsq_id, u64 slice, u64 enq_flags) __ksym __weak;
diff --git a/tools/sched_ext/include/scx/compat.bpf.h b/tools/sched_ext/include/scx/compat.bpf.h
index 9252e1a00556f..f9caa7baf356c 100644
--- a/tools/sched_ext/include/scx/compat.bpf.h
+++ b/tools/sched_ext/include/scx/compat.bpf.h
@@ -225,6 +225,43 @@ static inline bool __COMPAT_is_enq_cpu_selected(u64 enq_flags)
 	 scx_bpf_pick_any_cpu_node(cpus_allowed, node, flags) :			\
 	 scx_bpf_pick_any_cpu(cpus_allowed, flags))
 
+/**
+ * scx_bpf_select_cpu_dfl - The default implementation of ops.select_cpu().
+ * We will preserve this compatibility helper until v6.17.
+ *
+ * @p: task_struct to select a CPU for
+ * @prev_cpu: CPU @p was on previously
+ * @wake_flags: %SCX_WAKE_* flags
+ * @is_idle: out parameter indicating whether the returned CPU is idle
+ *
+ * Can only be called from ops.select_cpu() if the built-in CPU selection is
+ * enabled - ops.update_idle() is missing or %SCX_OPS_KEEP_BUILTIN_IDLE is set.
+ * @p, @prev_cpu and @wake_flags match ops.select_cpu().
+ *
+ * Returns the picked CPU with *@is_idle indicating whether the picked CPU is
+ * currently idle and thus a good candidate for direct dispatching.
+ */
+#define scx_bpf_select_cpu_dfl(p, prev_cpu, wake_flags, is_idle)		\
+({										\
+	s32 __cpu;								\
+										\
+	if (bpf_ksym_exists(scx_bpf_select_cpu_and)) {				\
+		__cpu = scx_bpf_select_cpu_and((p), (prev_cpu), (wake_flags),	\
+					       (p)->cpus_ptr, 0);		\
+		if (__cpu >= 0) {						\
+			*(is_idle) = true;					\
+		} else {							\
+			*(is_idle) = false;					\
+			__cpu = (prev_cpu);					\
+		}								\
+	} else {								\
+		__cpu = scx_bpf_select_cpu_dfl((p), (prev_cpu),			\
+					       (wake_flags), (is_idle));	\
+	}									\
+										\
+	__cpu;									\
+})
+
 /*
  * Define sched_ext_ops. This may be expanded to define multiple variants for
  * backward compatibility. See compat.h::SCX_OPS_LOAD/ATTACH().
diff --git a/tools/sched_ext/scx_flatcg.bpf.c b/tools/sched_ext/scx_flatcg.bpf.c
index 2c720e3ecad59..0075bff928893 100644
--- a/tools/sched_ext/scx_flatcg.bpf.c
+++ b/tools/sched_ext/scx_flatcg.bpf.c
@@ -317,15 +317,12 @@ static void set_bypassed_at(struct task_struct *p, struct fcg_task_ctx *taskc)
 s32 BPF_STRUCT_OPS(fcg_select_cpu, struct task_struct *p, s32 prev_cpu, u64 wake_flags)
 {
 	struct fcg_task_ctx *taskc;
-	bool is_idle = false;
 	s32 cpu;
 
-	cpu = scx_bpf_select_cpu_dfl(p, prev_cpu, wake_flags, &is_idle);
-
 	taskc = bpf_task_storage_get(&task_ctx, p, 0, 0);
 	if (!taskc) {
 		scx_bpf_error("task_ctx lookup failed");
-		return cpu;
+		return prev_cpu;
 	}
 
 	/*
@@ -333,13 +330,16 @@ s32 BPF_STRUCT_OPS(fcg_select_cpu, struct task_struct *p, s32 prev_cpu, u64 wake
 	 * idle. Follow it and charge the cgroup later in fcg_stopping() after
 	 * the fact.
 	 */
-	if (is_idle) {
+	cpu = scx_bpf_select_cpu_and(p, prev_cpu, wake_flags, p->cpus_ptr, 0);
+	if (cpu >= 0) {
 		set_bypassed_at(p, taskc);
 		stat_inc(FCG_STAT_LOCAL);
 		scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);
+
+		return cpu;
 	}
 
-	return cpu;
+	return prev_cpu;
 }
 
 void BPF_STRUCT_OPS(fcg_enqueue, struct task_struct *p, u64 enq_flags)
diff --git a/tools/sched_ext/scx_simple.bpf.c b/tools/sched_ext/scx_simple.bpf.c
index e6de99dba7db6..0e48b2e46a683 100644
--- a/tools/sched_ext/scx_simple.bpf.c
+++ b/tools/sched_ext/scx_simple.bpf.c
@@ -54,16 +54,17 @@ static void stat_inc(u32 idx)
 
 s32 BPF_STRUCT_OPS(simple_select_cpu, struct task_struct *p, s32 prev_cpu, u64 wake_flags)
 {
-	bool is_idle = false;
 	s32 cpu;
 
-	cpu = scx_bpf_select_cpu_dfl(p, prev_cpu, wake_flags, &is_idle);
-	if (is_idle) {
+	cpu = scx_bpf_select_cpu_and(p, prev_cpu, wake_flags, p->cpus_ptr, 0);
+	if (cpu >= 0) {
 		stat_inc(0);	/* count local queueing */
 		scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);
+
+		return cpu;
 	}
 
-	return cpu;
+	return prev_cpu;
 }
 
 void BPF_STRUCT_OPS(simple_enqueue, struct task_struct *p, u64 enq_flags)
diff --git a/tools/testing/selftests/sched_ext/enq_select_cpu_fails.bpf.c b/tools/testing/selftests/sched_ext/enq_select_cpu_fails.bpf.c
index a7cf868d5e311..d3c0716aa79c9 100644
--- a/tools/testing/selftests/sched_ext/enq_select_cpu_fails.bpf.c
+++ b/tools/testing/selftests/sched_ext/enq_select_cpu_fails.bpf.c
@@ -9,10 +9,6 @@
 
 char _license[] SEC("license") = "GPL";
 
-/* Manually specify the signature until the kfunc is added to the scx repo. */
-s32 scx_bpf_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
-			   bool *found) __ksym;
-
 s32 BPF_STRUCT_OPS(enq_select_cpu_fails_select_cpu, struct task_struct *p,
 		   s32 prev_cpu, u64 wake_flags)
 {
@@ -22,14 +18,8 @@ s32 BPF_STRUCT_OPS(enq_select_cpu_fails_select_cpu, struct task_struct *p,
 void BPF_STRUCT_OPS(enq_select_cpu_fails_enqueue, struct task_struct *p,
 		    u64 enq_flags)
 {
-	/*
-	 * Need to initialize the variable or the verifier will fail to load.
-	 * Improving these semantics is actively being worked on.
-	 */
-	bool found = false;
-
 	/* Can only call from ops.select_cpu() */
-	scx_bpf_select_cpu_dfl(p, 0, 0, &found);
+	scx_bpf_select_cpu_and(p, 0, 0, p->cpus_ptr, 0);
 
 	scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags);
 }
diff --git a/tools/testing/selftests/sched_ext/enq_select_cpu_fails.c b/tools/testing/selftests/sched_ext/enq_select_cpu_fails.c
index a80e3a3b3698c..c964444998667 100644
--- a/tools/testing/selftests/sched_ext/enq_select_cpu_fails.c
+++ b/tools/testing/selftests/sched_ext/enq_select_cpu_fails.c
@@ -52,7 +52,7 @@ static void cleanup(void *ctx)
 
 struct scx_test enq_select_cpu_fails = {
 	.name = "enq_select_cpu_fails",
-	.description = "Verify we fail to call scx_bpf_select_cpu_dfl() "
+	.description = "Verify we fail to call scx_bpf_select_cpu_and() "
 		       "from ops.enqueue()",
 	.setup = setup,
 	.run = run,
diff --git a/tools/testing/selftests/sched_ext/exit.bpf.c b/tools/testing/selftests/sched_ext/exit.bpf.c
index 4bc36182d3ffc..8122421856c1b 100644
--- a/tools/testing/selftests/sched_ext/exit.bpf.c
+++ b/tools/testing/selftests/sched_ext/exit.bpf.c
@@ -20,12 +20,14 @@ UEI_DEFINE(uei);
 s32 BPF_STRUCT_OPS(exit_select_cpu, struct task_struct *p,
 		   s32 prev_cpu, u64 wake_flags)
 {
-	bool found;
+	s32 cpu;
 
 	if (exit_point == EXIT_SELECT_CPU)
 		EXIT_CLEANLY();
 
-	return scx_bpf_select_cpu_dfl(p, prev_cpu, wake_flags, &found);
+	cpu = scx_bpf_select_cpu_and(p, prev_cpu, wake_flags, p->cpus_ptr, 0);
+
+	return cpu >= 0 ? cpu : prev_cpu;
 }
 
 void BPF_STRUCT_OPS(exit_enqueue, struct task_struct *p, u64 enq_flags)
diff --git a/tools/testing/selftests/sched_ext/select_cpu_dfl_nodispatch.bpf.c b/tools/testing/selftests/sched_ext/select_cpu_dfl_nodispatch.bpf.c
index 815f1d5d61ac4..4e1b698f710e7 100644
--- a/tools/testing/selftests/sched_ext/select_cpu_dfl_nodispatch.bpf.c
+++ b/tools/testing/selftests/sched_ext/select_cpu_dfl_nodispatch.bpf.c
@@ -27,10 +27,6 @@ struct {
 	__type(value, struct task_ctx);
 } task_ctx_stor SEC(".maps");
 
-/* Manually specify the signature until the kfunc is added to the scx repo. */
-s32 scx_bpf_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
-			   bool *found) __ksym;
-
 s32 BPF_STRUCT_OPS(select_cpu_dfl_nodispatch_select_cpu, struct task_struct *p,
 		   s32 prev_cpu, u64 wake_flags)
 {
@@ -43,10 +39,13 @@ s32 BPF_STRUCT_OPS(select_cpu_dfl_nodispatch_select_cpu, struct task_struct *p,
 		return -ESRCH;
 	}
 
-	cpu = scx_bpf_select_cpu_dfl(p, prev_cpu, wake_flags,
-				     &tctx->force_local);
+	cpu = scx_bpf_select_cpu_and(p, prev_cpu, wake_flags, p->cpus_ptr, 0);
+	if (cpu >= 0) {
+		tctx->force_local = true;
+		return cpu;
+	}
 
-	return cpu;
+	return prev_cpu;
 }
 
 void BPF_STRUCT_OPS(select_cpu_dfl_nodispatch_enqueue, struct task_struct *p,
diff --git a/tools/testing/selftests/sched_ext/select_cpu_dfl_nodispatch.c b/tools/testing/selftests/sched_ext/select_cpu_dfl_nodispatch.c
index 9b5d232efb7f6..2f450bb14e8d9 100644
--- a/tools/testing/selftests/sched_ext/select_cpu_dfl_nodispatch.c
+++ b/tools/testing/selftests/sched_ext/select_cpu_dfl_nodispatch.c
@@ -66,7 +66,7 @@ static void cleanup(void *ctx)
 
 struct scx_test select_cpu_dfl_nodispatch = {
 	.name = "select_cpu_dfl_nodispatch",
-	.description = "Verify behavior of scx_bpf_select_cpu_dfl() in "
+	.description = "Verify behavior of scx_bpf_select_cpu_and() in "
 		       "ops.select_cpu()",
 	.setup = setup,
 	.run = run,