From patchwork Tue Nov 12 14:22:41 2024
X-Patchwork-Submitter: Frederic Weisbecker
X-Patchwork-Id: 13872309
From: Frederic Weisbecker <frederic@kernel.org>
To: LKML
Cc: Frederic Weisbecker, Andrew Morton, Kees Cook, Peter Zijlstra,
    Thomas Gleixner, Michal Hocko, Vlastimil Babka, linux-mm@kvack.org,
    "Paul E. McKenney", Neeraj Upadhyay, Joel Fernandes, Boqun Feng,
    Uladzislau Rezki, Zqiang, rcu@vger.kernel.org
Subject: [PATCH 17/21] kthread: Implement preferred affinity
Date: Tue, 12 Nov 2024 15:22:41 +0100
Message-ID: <20241112142248.20503-18-frederic@kernel.org>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20241112142248.20503-1-frederic@kernel.org>
References: <20241112142248.20503-1-frederic@kernel.org>
MIME-Version: 1.0

Affining kthreads follows one of four existing patterns:

1) Per-CPU kthreads must stay affine to a single CPU and never execute
   relevant code on any other CPU. This is currently handled by smpboot
   code which takes care of CPU-hotplug operations.

2) Kthreads that _have_ to be affine to a specific set of CPUs and can't
   run anywhere else. The affinity is set through kthread_bind_mask()
   and the subsystem takes care by itself to handle CPU-hotplug
   operations.

3) Kthreads that prefer to be affine to a specific NUMA node. That
   preferred affinity is applied by default when an actual node ID is
   passed on kthread creation, provided the kthread is not per-CPU and
   no call to kthread_bind_mask() has been issued before the first
   wake-up.

4) Similar to the previous point, but the kthread's preferred affinity
   is something other than a node. It is set manually like for any other
   task, and CPU-hotplug is supposed to be handled by the relevant
   subsystem so that the task is properly reaffined whenever a given CPU
   from the preferred affinity comes up. Care must also be taken so that
   the preferred affinity doesn't cross housekeeping cpumask boundaries.

Provide a function to handle the last use case, mostly reusing the
current node default affinity infrastructure. kthread_affine_preferred()
is introduced, to be used just like kthread_bind_mask(): right after
kthread creation and before the first wake-up. The kthread is then
affined right away to the cpumask passed through the API, provided it
contains online housekeeping CPUs. Otherwise it is affined to all online
housekeeping CPUs as a last resort.
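For illustration, a caller of the new API would look roughly like this.
This is a hypothetical sketch, not part of the patch: the thread
function, the subsystem names and the mask argument are made up; only
the kthread_*() calls come from the kernel API described above.

```c
/* Hypothetical usage sketch: everything except the kthread_*() and
 * scheduler calls is invented for illustration. */
static int my_worker_fn(void *data)
{
	while (!kthread_should_stop())
		schedule_timeout_interruptible(HZ);
	return 0;
}

static int my_subsys_start(const struct cpumask *preferred)
{
	struct task_struct *t;
	int ret;

	t = kthread_create(my_worker_fn, NULL, "my_worker");
	if (IS_ERR(t))
		return PTR_ERR(t);

	/* Must be called after creation and before the first wake-up,
	 * just like kthread_bind_mask(). */
	ret = kthread_affine_preferred(t, preferred);
	if (ret) {
		kthread_stop(t);
		return ret;
	}

	wake_up_process(t);
	return 0;
}
```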
As with node affinity, the preferred affinity is aware of CPU-hotplug
events, such that:

* When a housekeeping CPU goes up that is part of the preferred affinity
  of a given kthread, the related task is re-affined to that preferred
  affinity if it was previously running on the default last-resort
  online housekeeping set.

* When a housekeeping CPU goes down while it was part of the preferred
  affinity of a kthread, the running task is migrated (or the sleeping
  task is woken up) automatically by the scheduler to other housekeepers
  within the preferred affinity or, as a last resort, to all
  housekeepers from other nodes.

Acked-by: Vlastimil Babka
Signed-off-by: Frederic Weisbecker
---
 include/linux/kthread.h |  1 +
 kernel/kthread.c        | 68 ++++++++++++++++++++++++++++++++++++-----
 2 files changed, 62 insertions(+), 7 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index b11f53c1ba2e..30209bdf83a2 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -85,6 +85,7 @@ kthread_run_on_cpu(int (*threadfn)(void *data), void *data,
 void free_kthread_struct(struct task_struct *k);
 void kthread_bind(struct task_struct *k, unsigned int cpu);
 void kthread_bind_mask(struct task_struct *k, const struct cpumask *mask);
+int kthread_affine_preferred(struct task_struct *p, const struct cpumask *mask);
 int kthread_stop(struct task_struct *k);
 int kthread_stop_put(struct task_struct *k);
 bool kthread_should_stop(void);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index df6a0551e8ba..43724fc6e021 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -70,6 +70,7 @@ struct kthread {
 	char *full_name;
 	struct task_struct *task;
 	struct list_head hotplug_node;
+	struct cpumask *preferred_affinity;
 };
 
 enum KTHREAD_BITS {
@@ -327,6 +328,11 @@ void __noreturn kthread_exit(long result)
 		mutex_lock(&kthreads_hotplug_lock);
 		list_del(&kthread->hotplug_node);
 		mutex_unlock(&kthreads_hotplug_lock);
+
+		if (kthread->preferred_affinity) {
+			kfree(kthread->preferred_affinity);
+			kthread->preferred_affinity = NULL;
+		}
 	}
 	do_exit(0);
 }
@@ -355,9 +361,17 @@ EXPORT_SYMBOL(kthread_complete_and_exit);
 
 static void kthread_fetch_affinity(struct kthread *kthread, struct cpumask *cpumask)
 {
-	cpumask_and(cpumask, cpumask_of_node(kthread->node),
-		    housekeeping_cpumask(HK_TYPE_KTHREAD));
+	const struct cpumask *pref;
+
+	if (kthread->preferred_affinity) {
+		pref = kthread->preferred_affinity;
+	} else {
+		if (WARN_ON_ONCE(kthread->node == NUMA_NO_NODE))
+			return;
+		pref = cpumask_of_node(kthread->node);
+	}
+
+	cpumask_and(cpumask, pref, housekeeping_cpumask(HK_TYPE_KTHREAD));
 	if (cpumask_empty(cpumask))
 		cpumask_copy(cpumask, housekeeping_cpumask(HK_TYPE_KTHREAD));
 }
@@ -440,7 +454,7 @@ static int kthread(void *_create)
 	self->started = 1;
 
-	if (!(current->flags & PF_NO_SETAFFINITY))
+	if (!(current->flags & PF_NO_SETAFFINITY) && !self->preferred_affinity)
 		kthread_affine_node();
 
 	ret = -EINTR;
@@ -839,12 +853,53 @@ int kthreadd(void *unused)
 	return 0;
 }
 
+int kthread_affine_preferred(struct task_struct *p, const struct cpumask *mask)
+{
+	struct kthread *kthread = to_kthread(p);
+	cpumask_var_t affinity;
+	unsigned long flags;
+	int ret = 0;
+
+	if (!wait_task_inactive(p, TASK_UNINTERRUPTIBLE) || kthread->started) {
+		WARN_ON(1);
+		return -EINVAL;
+	}
+
+	WARN_ON_ONCE(kthread->preferred_affinity);
+
+	if (!zalloc_cpumask_var(&affinity, GFP_KERNEL))
+		return -ENOMEM;
+
+	kthread->preferred_affinity = kzalloc(sizeof(struct cpumask), GFP_KERNEL);
+	if (!kthread->preferred_affinity) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	mutex_lock(&kthreads_hotplug_lock);
+	cpumask_copy(kthread->preferred_affinity, mask);
+	WARN_ON_ONCE(!list_empty(&kthread->hotplug_node));
+	list_add_tail(&kthread->hotplug_node, &kthreads_hotplug);
+	kthread_fetch_affinity(kthread, affinity);
+
+	/* It's safe because the task is inactive. */
+	raw_spin_lock_irqsave(&p->pi_lock, flags);
+	do_set_cpus_allowed(p, affinity);
+	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+
+	mutex_unlock(&kthreads_hotplug_lock);
+out:
+	free_cpumask_var(affinity);
+
+	return ret;
+}
+
 /*
  * Re-affine kthreads according to their preferences
  * and the newly online CPU. The CPU down part is handled
  * by select_fallback_rq() which default re-affines to
- * housekeepers in case the preferred affinity doesn't
- * apply anymore.
+ * housekeepers from other nodes in case the preferred
+ * affinity doesn't apply anymore.
  */
 static int kthreads_online_cpu(unsigned int cpu)
 {
@@ -864,8 +919,7 @@ static int kthreads_online_cpu(unsigned int cpu)
 
 	list_for_each_entry(k, &kthreads_hotplug, hotplug_node) {
 		if (WARN_ON_ONCE((k->task->flags & PF_NO_SETAFFINITY) ||
-				 kthread_is_per_cpu(k->task) ||
-				 k->node == NUMA_NO_NODE)) {
+				 kthread_is_per_cpu(k->task))) {
 			ret = -EINVAL;
 			continue;
 		}