From patchwork Wed Dec 11 15:40:25 2024
X-Patchwork-Submitter: Frederic Weisbecker
X-Patchwork-Id: 13903663
From: Frederic Weisbecker <frederic@kernel.org>
To: LKML
Cc: Frederic Weisbecker, Andrew Morton, Kees Cook, Peter Zijlstra,
    Thomas Gleixner, Michal Hocko, Vlastimil Babka, linux-mm@kvack.org,
    "Paul E. McKenney", Neeraj Upadhyay, Joel Fernandes, Boqun Feng,
    Uladzislau Rezki, Zqiang, rcu@vger.kernel.org
Subject: [PATCH 12/19] kthread: Default affine kthread to its preferred NUMA node
Date: Wed, 11 Dec 2024 16:40:25 +0100
Message-ID: <20241211154035.75565-13-frederic@kernel.org>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20241211154035.75565-1-frederic@kernel.org>
References: <20241211154035.75565-1-frederic@kernel.org>

Kthreads attached to a preferred NUMA node for their task structure
allocation can also be assumed to run preferably within that same node.

A more precise affinity is usually applied by calling
kthread_create_on_cpu() or kthread_bind[_mask]() before the first
wakeup. For the others, a default affinity to the node is desired, and
is currently implemented with varying degrees of success when it comes
to dealing with hotplug events and nohz_full / CPU isolation
interactions:

- kcompactd is affine to its node and handles hotplug, but not CPU
  isolation.

- kswapd is affine to its node and ignores both hotplug and CPU
  isolation.

- A number of drivers create their kthreads on a specific node and take
  no further care of their affinity.

Handle that default node affinity preference at the generic level
instead, provided a kthread is created on an actual node and doesn't
apply any specific affinity, such as a given CPU or a custom cpumask to
bind to, before its first wake-up.
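As an illustration only (not part of this patch): the sketch below
contrasts the two creation patterns mentioned above, assuming the
hypothetical names my_worker_fn() and my_start_workers() and made-up
CPU/node numbers. A caller wanting a precise affinity binds the kthread
before its first wakeup (which sets PF_NO_SETAFFINITY and opts out of
the generic handling), while a node-preferring caller can simply create
the kthread on a node and rely on the default affinity introduced here.

#include <linux/err.h>
#include <linux/jiffies.h>
#include <linux/kthread.h>
#include <linux/sched.h>

/* Hypothetical example thread function: just sleeps until stopped. */
static int my_worker_fn(void *data)
{
	while (!kthread_should_stop())
		schedule_timeout_interruptible(HZ);
	return 0;
}

/* Hypothetical caller showing both affinity patterns. */
static void my_start_workers(void)
{
	struct task_struct *t;

	/* Precise affinity: bind to CPU 3 before the first wakeup. */
	t = kthread_create(my_worker_fn, NULL, "my_worker/%u", 3);
	if (!IS_ERR(t)) {
		kthread_bind(t, 3);
		wake_up_process(t);
	}

	/*
	 * Node preference only: no explicit binding. With this patch the
	 * generic code affines the task to the housekeeping CPUs of node 0
	 * on its first run.
	 */
	t = kthread_create_on_node(my_worker_fn, NULL, 0, "my_node_worker");
	if (!IS_ERR(t))
		wake_up_process(t);
}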
This generic handling is aware of CPU hotplug events and CPU isolation
such that:

* When a housekeeping CPU goes up that is part of the node of a given
  kthread, the related task is re-affined to its own node if it was
  previously running on the default last-resort online housekeeping set
  from other nodes.

* When a housekeeping CPU goes down while it was part of the node of a
  kthread, the running task is migrated (or the sleeping task is woken
  up) automatically by the scheduler to other housekeepers within the
  same node or, as a last resort, to all housekeepers from other nodes.

Acked-by: Vlastimil Babka
Signed-off-by: Frederic Weisbecker
---
 include/linux/cpuhotplug.h |   1 +
 kernel/kthread.c           | 106 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index a04b73c40173..6cc5e484547c 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -240,6 +240,7 @@ enum cpuhp_state {
 	CPUHP_AP_WORKQUEUE_ONLINE,
 	CPUHP_AP_RANDOM_ONLINE,
 	CPUHP_AP_RCUTREE_ONLINE,
+	CPUHP_AP_KTHREADS_ONLINE,
 	CPUHP_AP_BASE_CACHEINFO_ONLINE,
 	CPUHP_AP_ONLINE_DYN,
 	CPUHP_AP_ONLINE_DYN_END = CPUHP_AP_ONLINE_DYN + 40,
diff --git a/kernel/kthread.c b/kernel/kthread.c
index b6f9ce475a4f..3394ff024a5a 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -35,6 +35,9 @@ static DEFINE_SPINLOCK(kthread_create_lock);
 static LIST_HEAD(kthread_create_list);
 struct task_struct *kthreadd_task;
 
+static LIST_HEAD(kthreads_hotplug);
+static DEFINE_MUTEX(kthreads_hotplug_lock);
+
 struct kthread_create_info
 {
 	/* Information passed to kthread() from kthreadd. */
@@ -53,6 +56,7 @@ struct kthread_create_info
 struct kthread {
 	unsigned long flags;
 	unsigned int cpu;
+	unsigned int node;
 	int started;
 	int result;
 	int (*threadfn)(void *);
@@ -64,6 +68,8 @@ struct kthread {
 #endif
 	/* To store the full name if task comm is truncated. */
 	char *full_name;
+	struct task_struct *task;
+	struct list_head hotplug_node;
 };
 
 enum KTHREAD_BITS {
@@ -122,8 +128,11 @@ bool set_kthread_struct(struct task_struct *p)
 
 	init_completion(&kthread->exited);
 	init_completion(&kthread->parked);
+	INIT_LIST_HEAD(&kthread->hotplug_node);
 	p->vfork_done = &kthread->exited;
 
+	kthread->task = p;
+	kthread->node = tsk_fork_get_node(current);
 	p->worker_private = kthread;
 	return true;
 }
@@ -314,6 +323,11 @@ void __noreturn kthread_exit(long result)
 {
 	struct kthread *kthread = to_kthread(current);
 	kthread->result = result;
+	if (!list_empty(&kthread->hotplug_node)) {
+		mutex_lock(&kthreads_hotplug_lock);
+		list_del(&kthread->hotplug_node);
+		mutex_unlock(&kthreads_hotplug_lock);
+	}
 	do_exit(0);
 }
 EXPORT_SYMBOL(kthread_exit);
@@ -339,6 +353,48 @@ void __noreturn kthread_complete_and_exit(struct completion *comp, long code)
 }
 EXPORT_SYMBOL(kthread_complete_and_exit);
 
+static void kthread_fetch_affinity(struct kthread *kthread, struct cpumask *cpumask)
+{
+	cpumask_and(cpumask, cpumask_of_node(kthread->node),
+		    housekeeping_cpumask(HK_TYPE_KTHREAD));
+
+	if (cpumask_empty(cpumask))
+		cpumask_copy(cpumask, housekeeping_cpumask(HK_TYPE_KTHREAD));
+}
+
+static void kthread_affine_node(void)
+{
+	struct kthread *kthread = to_kthread(current);
+	cpumask_var_t affinity;
+
+	WARN_ON_ONCE(kthread_is_per_cpu(current));
+
+	if (kthread->node == NUMA_NO_NODE) {
+		housekeeping_affine(current, HK_TYPE_KTHREAD);
+	} else {
+		if (!zalloc_cpumask_var(&affinity, GFP_KERNEL)) {
+			WARN_ON_ONCE(1);
+			return;
+		}
+
+		mutex_lock(&kthreads_hotplug_lock);
+		WARN_ON_ONCE(!list_empty(&kthread->hotplug_node));
+		list_add_tail(&kthread->hotplug_node, &kthreads_hotplug);
+		/*
+		 * The node cpumask is racy when read from kthread() but:
+		 * - a racing CPU going down will either fail on the subsequent
+		 *   call to set_cpus_allowed_ptr() or be migrated to housekeepers
+		 *   afterwards by the scheduler.
+		 * - a racing CPU going up will be handled by kthreads_online_cpu()
+		 */
+		kthread_fetch_affinity(kthread, affinity);
+		set_cpus_allowed_ptr(current, affinity);
+		mutex_unlock(&kthreads_hotplug_lock);
+
+		free_cpumask_var(affinity);
+	}
+}
+
 static int kthread(void *_create)
 {
 	static const struct sched_param param = { .sched_priority = 0 };
@@ -369,7 +425,6 @@ static int kthread(void *_create)
 	 * back to default in case they have been changed.
 	 */
 	sched_setscheduler_nocheck(current, SCHED_NORMAL, &param);
-	set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_KTHREAD));
 
 	/* OK, tell user we're spawned, wait for stop or wakeup */
 	__set_current_state(TASK_UNINTERRUPTIBLE);
@@ -385,6 +440,9 @@ static int kthread(void *_create)
 
 	self->started = 1;
 
+	if (!(current->flags & PF_NO_SETAFFINITY))
+		kthread_affine_node();
+
 	ret = -EINTR;
 	if (!test_bit(KTHREAD_SHOULD_STOP, &self->flags)) {
 		cgroup_kthread_ready();
@@ -781,6 +839,52 @@ int kthreadd(void *unused)
 	return 0;
 }
 
+/*
+ * Re-affine kthreads according to their preferences
+ * and the newly online CPU. The CPU down part is handled
+ * by select_fallback_rq() which default re-affines to
+ * housekeepers in case the preferred affinity doesn't
+ * apply anymore.
+ */
+static int kthreads_online_cpu(unsigned int cpu)
+{
+	cpumask_var_t affinity;
+	struct kthread *k;
+	int ret;
+
+	guard(mutex)(&kthreads_hotplug_lock);
+
+	if (list_empty(&kthreads_hotplug))
+		return 0;
+
+	if (!zalloc_cpumask_var(&affinity, GFP_KERNEL))
+		return -ENOMEM;
+
+	ret = 0;
+
+	list_for_each_entry(k, &kthreads_hotplug, hotplug_node) {
+		if (WARN_ON_ONCE((k->task->flags & PF_NO_SETAFFINITY) ||
+				 kthread_is_per_cpu(k->task) ||
+				 k->node == NUMA_NO_NODE)) {
+			ret = -EINVAL;
+			continue;
+		}
+		kthread_fetch_affinity(k, affinity);
+		set_cpus_allowed_ptr(k->task, affinity);
+	}
+
+	free_cpumask_var(affinity);
+
+	return ret;
+}
+
+static int kthreads_init(void)
+{
+	return cpuhp_setup_state(CPUHP_AP_KTHREADS_ONLINE, "kthreads:online",
+				 kthreads_online_cpu, NULL);
+}
+early_initcall(kthreads_init);
+
 void __kthread_init_worker(struct kthread_worker *worker,
 				const char *name,
 				struct lock_class_key *key)
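A closing illustration (hypothetical helper, not part of the patch,
mirroring the kthread_fetch_affinity() logic added above): the
effective cpumask a node-preferring kthread ends up with, which is what
the two changelog bullets about CPU up/down describe. The helper name
and its standalone form are assumptions made for the example.

#include <linux/cpumask.h>
#include <linux/sched/isolation.h>
#include <linux/topology.h>

/*
 * Hypothetical helper, for illustration only: compute the affinity a
 * kthread preferring node 'nid' ends up with under this scheme.
 */
static void example_node_kthread_affinity(int nid, struct cpumask *mask)
{
	/* Preferred: housekeeping CPUs that belong to the kthread's node. */
	cpumask_and(mask, cpumask_of_node(nid),
		    housekeeping_cpumask(HK_TYPE_KTHREAD));

	/*
	 * Last resort: if the node currently has no usable housekeeping CPU
	 * (all of them offline or isolated), fall back to the housekeeping
	 * CPUs of the other nodes. kthreads_online_cpu() re-applies the
	 * preferred mask once a housekeeping CPU of the node comes back up.
	 */
	if (cpumask_empty(mask))
		cpumask_copy(mask, housekeeping_cpumask(HK_TYPE_KTHREAD));
}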