From patchwork Mon Nov 4 14:23:18 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Herton R. Krzesinski"
X-Patchwork-Id: 13861487
From: "Herton R. Krzesinski"
To: Andrew Morton
Cc: Nicholas Piggin, Linus Torvalds, Peter Zijlstra, Michael Ellerman,
 Thomas Gleixner, linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org,
 herton@redhat.com
Subject: [PATCH v2] lazy tlb: fix hotplug exit race with MMU_LAZY_TLB_SHOOTDOWN
Date: Mon, 4 Nov 2024 11:23:18 -0300
Message-ID: <20241104142318.3295663-1-herton@redhat.com>

From: Nicholas Piggin

CPU unplug first calls __cpu_disable(), and that's where powerpc calls
cleanup_cpu_mmu_context(), which clears this CPU from mm_cpumask() of all
mms in the system.

However, this CPU may still be using a lazy tlb mm, and its mm_cpumask
bit will be cleared from it. The CPU does not switch away from the lazy
tlb mm until arch_cpu_idle_dead() calls idle_task_exit().

If that user mm exits in this window, it will not be subject to the lazy
tlb mm shootdown and may be freed while in use as a lazy mm by the CPU
that is being unplugged.

cleanup_cpu_mmu_context() could be moved later, but it looks better to
move the lazy tlb mm switching earlier. The problem with doing the lazy
mm switching in idle_task_exit() is explained in commit bf2c59fce4074
("sched/core: Fix illegal RCU from offline CPUs"), which added a wart to
switch away from the mm but leave it set in active_mm to be cleaned up
later.

So instead, switch away from the lazy tlb mm at sched_cpu_wait_empty(),
which is the last hotplug state before teardown
(CPUHP_AP_SCHED_WAIT_EMPTY). This CPU will never switch to a user thread
from this point, so it has no chance to pick up a new lazy tlb mm. This
removes the lazy tlb mm handling wart in CPU unplug.

With this, idle_task_exit() is not needed anymore and can be cleaned up.
This patch leaves the prototype alone, to be cleaned up in a follow-up
change.

herton: took the suggestions from
https://lore.kernel.org/all/87jzvyprsw.ffs@tglx/
and made adjustments to the initial patch proposed by Nicholas.

Link: https://lkml.kernel.org/r/20230524060455.147699-1-npiggin@gmail.com
Link: https://lore.kernel.org/all/20230525205253.E2FAEC433EF@smtp.kernel.org/
Fixes: 2655421ae69fa ("lazy tlb: shoot lazies, non-refcounting lazy tlb mm reference handling scheme")
Signed-off-by: Nicholas Piggin
Cc: Linus Torvalds
Cc: Peter Zijlstra
Suggested-by: Thomas Gleixner
Signed-off-by: Herton R. Krzesinski
---
 include/linux/sched/hotplug.h |  4 ----
 kernel/cpu.c                  | 11 ++++++-----
 kernel/sched/core.c           | 22 +++++++++++++++-------
 3 files changed, 21 insertions(+), 16 deletions(-)

Herton: I contacted Nicholas by email, and he was ok with me going ahead
and posting this. I saw that the original patch had stalled and didn't go
forward, so I'm posting this but keeping his From/authorship, since he is
the original author of the patch, so that we can move this forward. I have
a report and also reproduced a warning similar to the one reported at
https://github.com/linuxppc/issues/issues/469 - which can be triggered by
running a CPU offline/online loop with CONFIG_DEBUG_VM enabled. This patch
fixes the problem. I updated the changelog/patch based on the suggestions
given and to the best of my knowledge/investigation of this issue; a
thorough review is appreciated. If this is ok, I can then submit a
follow-up to clean up idle_task_exit().

v2: fix warning reported by kernel test robot
https://lore.kernel.org/oe-kbuild-all/202411022220.0u2CXCAM-lkp@intel.com/
- sched_force_init_mm is only used under CONFIG_HOTPLUG_CPU at
  sched_cpu_wait_empty, so we don't need to define it for
  !CONFIG_HOTPLUG_CPU

diff --git a/include/linux/sched/hotplug.h b/include/linux/sched/hotplug.h
index 412cdaba33eb..17e04859b9a4 100644
--- a/include/linux/sched/hotplug.h
+++ b/include/linux/sched/hotplug.h
@@ -18,10 +18,6 @@ extern int sched_cpu_dying(unsigned int cpu);
 # define sched_cpu_dying	NULL
 #endif
 
-#ifdef CONFIG_HOTPLUG_CPU
-extern void idle_task_exit(void);
-#else
 static inline void idle_task_exit(void) {}
-#endif
 
 #endif /* _LINUX_SCHED_HOTPLUG_H */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index d293d52a3e00..fb4f46885cb2 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -904,13 +904,14 @@ static int finish_cpu(unsigned int cpu)
 	struct task_struct *idle = idle_thread_get(cpu);
 	struct mm_struct *mm = idle->active_mm;
 
-	/*
-	 * idle_task_exit() will have switched to &init_mm, now
-	 * clean up any remaining active_mm state.
+	/*
+	 * sched_force_init_mm() ensured the use of &init_mm,
+	 * drop that refcount now that the CPU has stopped.
 	 */
-	if (mm != &init_mm)
-		idle->active_mm = &init_mm;
+	WARN_ON(mm != &init_mm);
+	idle->active_mm = NULL;
 	mmdrop_lazy_tlb(mm);
+
 	return 0;
 }
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index dbfb5717d6af..7d8f47a8f000 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7826,19 +7826,26 @@ void sched_setnuma(struct task_struct *p, int nid)
 
 #ifdef CONFIG_HOTPLUG_CPU
 /*
- * Ensure that the idle task is using init_mm right before its CPU goes
- * offline.
+ * Invoked on the outgoing CPU in context of the CPU hotplug thread
+ * after ensuring that there are no user space tasks left on the CPU.
+ *
+ * If there is a lazy mm in use on the hotplug thread, drop it and
+ * switch to init_mm.
+ *
+ * The reference count on init_mm is dropped in finish_cpu().
  */
-void idle_task_exit(void)
+static void sched_force_init_mm(void)
 {
 	struct mm_struct *mm = current->active_mm;
 
-	BUG_ON(cpu_online(smp_processor_id()));
-	BUG_ON(current != this_rq()->idle);
-
 	if (mm != &init_mm) {
-		switch_mm(mm, &init_mm, current);
+		mmgrab_lazy_tlb(&init_mm);
+		local_irq_disable();
+		current->active_mm = &init_mm;
+		switch_mm_irqs_off(mm, &init_mm, current);
+		local_irq_enable();
 		finish_arch_post_lock_switch();
+		mmdrop_lazy_tlb(mm);
 	}
 
 	/* finish_cpu(), as ran on the BP, will clean up the active_mm state */
@@ -8240,6 +8247,7 @@ int sched_cpu_starting(unsigned int cpu)
 int sched_cpu_wait_empty(unsigned int cpu)
 {
 	balance_hotplug_wait();
+	sched_force_init_mm();
 	return 0;
 }
 
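
For anyone who wants the ordering problem from the changelog in miniature,
below is a rough user-space sketch. It is only a toy model under stated
assumptions, not kernel code: every toy_* name is hypothetical, a bitmask
stands in for mm_cpumask(), a NULL lazy_mm stands in for init_mm, and the
shootdown is reduced to "only CPUs still in the mask are asked to let go".
It shows why switching away from the lazy mm before the cpumask bit is
cleared closes the window.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins (hypothetical, not kernel types). */
struct toy_mm {
	unsigned long cpumask;	/* models mm_cpumask(): CPUs the exit path shoots down */
	bool freed;		/* set once the mm has exited */
};

struct toy_cpu {
	int id;
	struct toy_mm *lazy_mm;	/* models the lazy active_mm; NULL models init_mm */
};

/* Models cleanup_cpu_mmu_context(): drop this CPU from the mm's cpumask. */
static void toy_cpu_disable(struct toy_cpu *cpu, struct toy_mm *mm)
{
	mm->cpumask &= ~(1UL << cpu->id);
}

/* Models switching the outgoing CPU away from its lazy mm to init_mm. */
static void toy_force_init_mm(struct toy_cpu *cpu)
{
	cpu->lazy_mm = NULL;
}

/*
 * Models mm exit with lazy tlb shootdown: only CPUs still present in the
 * mask are made to drop the lazy mm, then the mm is freed.
 */
static void toy_mm_exit(struct toy_mm *mm, struct toy_cpu *cpu)
{
	if ((mm->cpumask & (1UL << cpu->id)) && cpu->lazy_mm == mm)
		cpu->lazy_mm = NULL;
	mm->freed = true;
}

/* Returns true if the outgoing CPU ends up pointing at a freed mm. */
static bool toy_unplug_races(bool switch_early)
{
	struct toy_mm mm = { .cpumask = 1UL << 3, .freed = false };
	struct toy_cpu cpu = { .id = 3, .lazy_mm = &mm };
	bool stale;

	if (switch_early)
		toy_force_init_mm(&cpu);	/* new order: before teardown */
	toy_cpu_disable(&cpu, &mm);		/* mask bit cleared */
	toy_mm_exit(&mm, &cpu);			/* user mm exits in the window */
	stale = (cpu.lazy_mm == &mm) && mm.freed;
	if (!switch_early)
		toy_force_init_mm(&cpu);	/* old order: switch comes too late */
	return stale;
}
```

In this model toy_unplug_races(false) reports a stale lazy mm, while
toy_unplug_races(true) does not, which mirrors the reasoning for moving the
switch to sched_cpu_wait_empty(). The real code paths involve refcounts,
IPIs and the hotplug state machine, none of which are modelled here.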