From patchwork Wed Jan 18 08:00:07 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nicholas Piggin
X-Patchwork-Id: 13105806
From: Nicholas Piggin
To: Andrew Morton
Cc: Nicholas Piggin, Andy Lutomirski, Linus Torvalds, linux-arch, linux-mm, linuxppc-dev@lists.ozlabs.org
Subject: [PATCH v6 1/5] lazy tlb: introduce lazy tlb mm refcount helper functions
Date: Wed, 18 Jan 2023 18:00:07 +1000
Message-Id: <20230118080011.2258375-2-npiggin@gmail.com>
X-Mailer: git-send-email 2.37.2
In-Reply-To: <20230118080011.2258375-1-npiggin@gmail.com>
References: <20230118080011.2258375-1-npiggin@gmail.com>

Add explicit _lazy_tlb annotated functions for lazy tlb mm refcounting.
This makes the lazy tlb mm references more obvious, and allows the
refcounting scheme to be modified in later changes.

The only functional change is in kthread_use_mm/kthread_unuse_mm, because
they are clever with refcounting: if the kthread's lazy tlb mm (active_mm)
happens to be the same as the mm to be used, the code doesn't touch the
refcount but rather transfers the lazy refcount to the used-mm refcount.
If the lazy tlb mm refcount is no longer equivalent to the regular
refcount, this trick cannot be used. Instead, mmgrab a regular reference
on the mm to use, and mmdrop_lazy_tlb the previous active_mm.

Signed-off-by: Nicholas Piggin
---
 arch/arm/mach-rpc/ecard.c            |  2 +-
 arch/powerpc/kernel/smp.c            |  2 +-
 arch/powerpc/mm/book3s64/radix_tlb.c |  4 ++--
 fs/exec.c                            |  2 +-
 include/linux/sched/mm.h             | 16 ++++++++++++++++
 kernel/cpu.c                         |  2 +-
 kernel/exit.c                        |  2 +-
 kernel/kthread.c                     | 21 +++++++++++++--------
 kernel/sched/core.c                  | 15 ++++++++-------
 9 files changed, 44 insertions(+), 22 deletions(-)

diff --git a/arch/arm/mach-rpc/ecard.c b/arch/arm/mach-rpc/ecard.c
index 53813f9464a2..c30df1097c52 100644
--- a/arch/arm/mach-rpc/ecard.c
+++ b/arch/arm/mach-rpc/ecard.c
@@ -253,7 +253,7 @@ static int ecard_init_mm(void)
 	current->mm = mm;
 	current->active_mm = mm;
 	activate_mm(active_mm, mm);
-	mmdrop(active_mm);
+	mmdrop_lazy_tlb(active_mm);
 	ecard_init_pgtables(mm);
 	return 0;
 }
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 6b90f10a6c81..7db6b3faea65 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1611,7 +1611,7 @@ void start_secondary(void *unused)
 	if (IS_ENABLED(CONFIG_PPC32))
 		setup_kup();
 
-	mmgrab(&init_mm);
+	mmgrab_lazy_tlb(&init_mm);
 	current->active_mm = &init_mm;
 
 	smp_store_cpu_info(cpu);
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index 4e29b619578c..282359ab525b 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -794,10 +794,10 @@ void exit_lazy_flush_tlb(struct mm_struct *mm, bool always_flush)
 	if (current->active_mm == mm) {
 		WARN_ON_ONCE(current->mm != NULL);
 		/* Is a kernel thread and is using mm as the lazy tlb */
-		mmgrab(&init_mm);
+		mmgrab_lazy_tlb(&init_mm);
 		current->active_mm = &init_mm;
 		switch_mm_irqs_off(mm, &init_mm, current);
-		mmdrop(mm);
+		mmdrop_lazy_tlb(mm);
 	}
 
 	/*
diff --git a/fs/exec.c b/fs/exec.c
index ab913243a367..1a32a88db173 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1033,7 +1033,7 @@ static int exec_mmap(struct mm_struct *mm)
 		mmput(old_mm);
 		return 0;
 	}
-	mmdrop(active_mm);
+	mmdrop_lazy_tlb(active_mm);
 	return 0;
 }
 
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 2a243616f222..5376caf6fcf3 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -79,6 +79,22 @@ static inline void mmdrop_sched(struct mm_struct *mm)
 }
 #endif
 
+/* Helpers for lazy TLB mm refcounting */
+static inline void mmgrab_lazy_tlb(struct mm_struct *mm)
+{
+	mmgrab(mm);
+}
+
+static inline void mmdrop_lazy_tlb(struct mm_struct *mm)
+{
+	mmdrop(mm);
+}
+
+static inline void mmdrop_lazy_tlb_sched(struct mm_struct *mm)
+{
+	mmdrop_sched(mm);
+}
+
 /**
  * mmget() - Pin the address space associated with a &struct mm_struct.
  * @mm: The address space to pin.
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 6c0a92ca6bb5..189895288d9d 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -623,7 +623,7 @@ static int finish_cpu(unsigned int cpu)
 	 */
 	if (mm != &init_mm)
 		idle->active_mm = &init_mm;
-	mmdrop(mm);
+	mmdrop_lazy_tlb(mm);
 	return 0;
 }
 
diff --git a/kernel/exit.c b/kernel/exit.c
index 15dc2ec80c46..1a4608d765e4 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -537,7 +537,7 @@ static void exit_mm(void)
 		return;
 	sync_mm_rss(mm);
 	mmap_read_lock(mm);
-	mmgrab(mm);
+	mmgrab_lazy_tlb(mm);
 	BUG_ON(mm != current->active_mm);
 	/* more a memory barrier than a real lock */
 	task_lock(current);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index f97fd01a2932..691b213e578f 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1410,14 +1410,19 @@ void kthread_use_mm(struct mm_struct *mm)
 	WARN_ON_ONCE(!(tsk->flags & PF_KTHREAD));
 	WARN_ON_ONCE(tsk->mm);
 
+	/*
+	 * It's possible that tsk->active_mm == mm here, but we must
+	 * still mmgrab(mm) and mmdrop_lazy_tlb(active_mm), because lazy
+	 * mm may not have its own refcount (see mmgrab/drop_lazy_tlb()).
+	 */
+	mmgrab(mm);
+
 	task_lock(tsk);
 	/* Hold off tlb flush IPIs while switching mm's */
 	local_irq_disable();
 	active_mm = tsk->active_mm;
-	if (active_mm != mm) {
-		mmgrab(mm);
+	if (active_mm != mm)
 		tsk->active_mm = mm;
-	}
 	tsk->mm = mm;
 	membarrier_update_current_mm(mm);
 	switch_mm_irqs_off(active_mm, mm, tsk);
@@ -1434,12 +1439,9 @@ void kthread_use_mm(struct mm_struct *mm)
 	 * memory barrier after storing to tsk->mm, before accessing
 	 * user-space memory. A full memory barrier for membarrier
 	 * {PRIVATE,GLOBAL}_EXPEDITED is implicitly provided by
-	 * mmdrop(), or explicitly with smp_mb().
+	 * mmdrop_lazy_tlb().
 	 */
-	if (active_mm != mm)
-		mmdrop(active_mm);
-	else
-		smp_mb();
+	mmdrop_lazy_tlb(active_mm);
 }
 EXPORT_SYMBOL_GPL(kthread_use_mm);
 
@@ -1467,10 +1469,13 @@ void kthread_unuse_mm(struct mm_struct *mm)
 	local_irq_disable();
 	tsk->mm = NULL;
 	membarrier_update_current_mm(NULL);
+	mmgrab_lazy_tlb(mm);
 	/* active_mm is still 'mm' */
 	enter_lazy_tlb(mm, tsk);
 	local_irq_enable();
 	task_unlock(tsk);
+
+	mmdrop(mm);
 }
 EXPORT_SYMBOL_GPL(kthread_unuse_mm);
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 25b582b6ee5f..26aaa974ee6d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5140,13 +5140,14 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	 * rq->curr, before returning to userspace, so provide them here:
 	 *
 	 * - a full memory barrier for {PRIVATE,GLOBAL}_EXPEDITED, implicitly
-	 *   provided by mmdrop(),
+	 *   provided by mmdrop_lazy_tlb(),
 	 * - a sync_core for SYNC_CORE.
 	 */
 	if (mm) {
 		membarrier_mm_sync_core_before_usermode(mm);
-		mmdrop_sched(mm);
+		mmdrop_lazy_tlb_sched(mm);
 	}
+
 	if (unlikely(prev_state == TASK_DEAD)) {
 		if (prev->sched_class->task_dead)
 			prev->sched_class->task_dead(prev);
@@ -5203,9 +5204,9 @@ context_switch(struct rq *rq, struct task_struct *prev,
 
 	/*
 	 * kernel -> kernel   lazy + transfer active
-	 *   user -> kernel   lazy + mmgrab() active
+	 *   user -> kernel   lazy + mmgrab_lazy_tlb() active
 	 *
-	 * kernel ->   user   switch + mmdrop() active
+	 * kernel ->   user   switch + mmdrop_lazy_tlb() active
 	 *   user ->   user   switch
 	 */
 	if (!next->mm) {                                // to kernel
@@ -5213,7 +5214,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
 
 		next->active_mm = prev->active_mm;
 		if (prev->mm)                           // from user
-			mmgrab(prev->active_mm);
+			mmgrab_lazy_tlb(prev->active_mm);
 		else
 			prev->active_mm = NULL;
 	} else {                                        // to user
@@ -5230,7 +5231,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
 		lru_gen_use_mm(next->mm);
 
 		if (!prev->mm) {                        // from kernel
-			/* will mmdrop() in finish_task_switch(). */
+			/* will mmdrop_lazy_tlb() in finish_task_switch(). */
 			rq->prev_mm = prev->active_mm;
 			prev->active_mm = NULL;
 		}
@@ -9859,7 +9860,7 @@ void __init sched_init(void)
 	/*
 	 * The boot idle thread does lazy MMU switching as well:
 	 */
-	mmgrab(&init_mm);
+	mmgrab_lazy_tlb(&init_mm);
 	enter_lazy_tlb(&init_mm, current);
 
 	/*

From patchwork Wed Jan 18 08:00:08 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nicholas Piggin
X-Patchwork-Id: 13105807
From: Nicholas Piggin
To: Andrew Morton
Cc: Nicholas Piggin, Andy Lutomirski, Linus Torvalds, linux-arch, linux-mm, linuxppc-dev@lists.ozlabs.org
Subject: [PATCH v6 2/5] lazy tlb: allow lazy tlb mm refcounting to be configurable
Date: Wed, 18 Jan 2023 18:00:08 +1000
Message-Id: <20230118080011.2258375-3-npiggin@gmail.com>
X-Mailer: git-send-email 2.37.2
In-Reply-To: <20230118080011.2258375-1-npiggin@gmail.com>
References: <20230118080011.2258375-1-npiggin@gmail.com>

Add CONFIG_MMU_LAZY_TLB_REFCOUNT, which enables refcounting of the lazy
tlb mm when it is context switched. This can be disabled by architectures
that don't require this refcounting if they clean up lazy tlb mms when the
last refcount is dropped. Currently this is always enabled, which is what
existing code does, so the patch is effectively a no-op.

Rename rq->prev_mm to rq->prev_lazy_mm, because that's what it is.

Signed-off-by: Nicholas Piggin
---
 Documentation/mm/active_mm.rst |  6 ++++++
 arch/Kconfig                   | 17 +++++++++++++++++
 include/linux/sched/mm.h       | 18 +++++++++++++++---
 kernel/sched/core.c            | 22 ++++++++++++++++++----
 kernel/sched/sched.h           |  4 +++-
 5 files changed, 59 insertions(+), 8 deletions(-)

diff --git a/Documentation/mm/active_mm.rst b/Documentation/mm/active_mm.rst
index 6f8269c284ed..2b0d08332400 100644
--- a/Documentation/mm/active_mm.rst
+++ b/Documentation/mm/active_mm.rst
@@ -4,6 +4,12 @@
 Active MM
 =========
 
+Note, the mm_count refcount may no longer include the "lazy" users
+(running tasks with ->active_mm == mm && ->mm == NULL) on kernels
+with CONFIG_MMU_LAZY_TLB_REFCOUNT=n. Taking and releasing these lazy
+references must be done with mmgrab_lazy_tlb() and mmdrop_lazy_tlb()
+helpers which abstract this config option.
+
 ::
 
  List:       linux-kernel
diff --git a/arch/Kconfig b/arch/Kconfig
index 12e3ddabac9d..b07d36f08fea 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -465,6 +465,23 @@ config ARCH_WANT_IRQS_OFF_ACTIVATE_MM
 	  irqs disabled over activate_mm. Architectures that do IPI based TLB
 	  shootdowns should enable this.
 
+# Use normal mm refcounting for MMU_LAZY_TLB kernel thread references.
+# MMU_LAZY_TLB_REFCOUNT=n can improve the scalability of context switching
+# to/from kernel threads when the same mm is running on a lot of CPUs (a large
+# multi-threaded application), by reducing contention on the mm refcount.
+#
+# This can be disabled if the architecture ensures no CPUs are using an mm as a
+# "lazy tlb" beyond its final refcount (i.e., by the time __mmdrop frees the mm
+# or its kernel page tables). This could be arranged by arch_exit_mmap(), or
+# final exit(2) TLB flush, for example.
+#
+# To implement this, an arch *must*:
+# Ensure the _lazy_tlb variants of mmgrab/mmdrop are used when dropping the
+# lazy reference of a kthread's ->active_mm (non-arch code has been converted
+# already).
+config MMU_LAZY_TLB_REFCOUNT
+	def_bool y
+
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
 	bool
 
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 5376caf6fcf3..68bbe8d90c2e 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -82,17 +82,29 @@ static inline void mmdrop_sched(struct mm_struct *mm)
 /* Helpers for lazy TLB mm refcounting */
 static inline void mmgrab_lazy_tlb(struct mm_struct *mm)
 {
-	mmgrab(mm);
+	if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_REFCOUNT))
+		mmgrab(mm);
 }
 
 static inline void mmdrop_lazy_tlb(struct mm_struct *mm)
 {
-	mmdrop(mm);
+	if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_REFCOUNT)) {
+		mmdrop(mm);
+	} else {
+		/*
+		 * mmdrop_lazy_tlb must provide a full memory barrier, see the
+		 * membarrier comment finish_task_switch which relies on this.
+		 */
+		smp_mb();
+	}
 }
 
 static inline void mmdrop_lazy_tlb_sched(struct mm_struct *mm)
 {
-	mmdrop_sched(mm);
+	if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_REFCOUNT))
+		mmdrop_sched(mm);
+	else
+		smp_mb(); // see above
 }
 
 /**
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 26aaa974ee6d..1ea14d849a0d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5081,7 +5081,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	__releases(rq->lock)
 {
 	struct rq *rq = this_rq();
-	struct mm_struct *mm = rq->prev_mm;
+	struct mm_struct *mm = NULL;
 	unsigned int prev_state;
 
 	/*
@@ -5100,7 +5100,10 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 		      current->comm, current->pid, preempt_count()))
 		preempt_count_set(FORK_PREEMPT_COUNT);
 
-	rq->prev_mm = NULL;
+#ifdef CONFIG_MMU_LAZY_TLB_REFCOUNT
+	mm = rq->prev_lazy_mm;
+	rq->prev_lazy_mm = NULL;
+#endif
 
 	/*
 	 * A task struct has one reference for the use as "current".
@@ -5231,9 +5234,20 @@ context_switch(struct rq *rq, struct task_struct *prev,
 		lru_gen_use_mm(next->mm);
 
 		if (!prev->mm) {                        // from kernel
-			/* will mmdrop_lazy_tlb() in finish_task_switch(). */
-			rq->prev_mm = prev->active_mm;
+#ifdef CONFIG_MMU_LAZY_TLB_REFCOUNT
+			/* Will mmdrop_lazy_tlb() in finish_task_switch(). */
+			rq->prev_lazy_mm = prev->active_mm;
 			prev->active_mm = NULL;
+#else
+			/*
+			 * Without MMU_LAZY_TLB_REFCOUNT there is no lazy
+			 * tracking (because no rq->prev_lazy_mm) in
+			 * finish_task_switch, so no mmdrop_lazy_tlb(), so no
+			 * memory barrier for membarrier (see the membarrier
+			 * comment in finish_task_switch()). Do it here.
+ */ + smp_mb(); +#endif } } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 771f8ddb7053..33da8fa8b5a5 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1009,7 +1009,9 @@ struct rq { struct task_struct *idle; struct task_struct *stop; unsigned long next_balance; - struct mm_struct *prev_mm; +#ifdef CONFIG_MMU_LAZY_TLB_REFCOUNT + struct mm_struct *prev_lazy_mm; +#endif unsigned int clock_update_flags; u64 clock; From patchwork Wed Jan 18 08:00:09 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicholas Piggin X-Patchwork-Id: 13105808 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D0925C004D4 for ; Wed, 18 Jan 2023 08:00:37 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 727CD6B007B; Wed, 18 Jan 2023 03:00:37 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 6D8476B007D; Wed, 18 Jan 2023 03:00:37 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5A0316B007E; Wed, 18 Jan 2023 03:00:37 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 4DCC46B007B for ; Wed, 18 Jan 2023 03:00:37 -0500 (EST) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 2952540347 for ; Wed, 18 Jan 2023 08:00:37 +0000 (UTC) X-FDA: 80367172914.15.C742D5E Received: from mail-pj1-f48.google.com (mail-pj1-f48.google.com [209.85.216.48]) by imf21.hostedemail.com (Postfix) with ESMTP id 6E70F1C000F for ; Wed, 18 Jan 2023 08:00:35 +0000 (UTC) Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 
header.b=KfdJtVwV; spf=pass (imf21.hostedemail.com: domain of npiggin@gmail.com designates 209.85.216.48 as permitted sender) smtp.mailfrom=npiggin@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1674028835; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=fqIE8OQntd0W4a4N2kg2lnLwGGF3XcgL5F+Chwm9T4Q=; b=6JqmlrPlhyp1khxYahxw4IIbF6IEmmnwt2PY+pmU4vMTyccTscKw4aU1T6Bvbhx6rp6Y3x sFqDSgKBnhOK2FPjRZFER1B9MVS0xcwnQy4mE7ICICYB1g07Bb/3cLq5wFQ14MmqnGa4T3 8qsMc/WuWZPDlR5z8UE4azu50rUxdm4= ARC-Authentication-Results: i=1; imf21.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=KfdJtVwV; spf=pass (imf21.hostedemail.com: domain of npiggin@gmail.com designates 209.85.216.48 as permitted sender) smtp.mailfrom=npiggin@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1674028835; a=rsa-sha256; cv=none; b=cX2Mlmd7BrWgi/bkkkmNvwOIKEYev9urAHL3r5jtMt+gjz68fxZzySRQ0hYR7QSbLHKXm/ U9ujCWLEW5SrCmpPErtbWQ3O5P57InDhFsTkLA/8ZsspfkPdPvJ9/7uB3Nco/4bI7NfpC6 ZZEORlNgq1HUJqBVp1h3aaamQyTF47c= Received: by mail-pj1-f48.google.com with SMTP id a14-20020a17090a70ce00b00229a2f73c56so1505581pjm.3 for ; Wed, 18 Jan 2023 00:00:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=fqIE8OQntd0W4a4N2kg2lnLwGGF3XcgL5F+Chwm9T4Q=; b=KfdJtVwV3zcqhh5famP1bouR06n0PTfilksZbm1hFiESF1H43nvbTHW7Mb75MaGACp zxfqqjGqGn88kJjs1uiSW/iHLDh4mIVynPbPCrU6IgmQSUDMRtetaS50F04kKSoGNz9g WONXHWze2mU/V3dtxmyHdPzr4FeUSFZg7tJTmveXFrYiPNYRsjHJbd3hg7AjrtA34J5h 
Io7y9+STzomQq1Rz0BArK9zI6oqFyEPUifvhz+IyNLNv5/K9JJDuVyedlkHTRoqirdGz Tca1XzhJjeMflbybvMkbUbVUj9LNEmaXTfCe9fq/YPFk+TQN1MRX0mvIM+RvTMI7GgXY 1JAQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=fqIE8OQntd0W4a4N2kg2lnLwGGF3XcgL5F+Chwm9T4Q=; b=XnQp2z8HGtMzgaCb62tzt4VTdgZOQbX2dSoDhcETpqTyWfvFNHzR4TfJZy6I2dJ+2T Hn1uRWm5eVjMILWC35Z8eYxXOvoOB6csBaXWODFduR8CT+8lYn+sRkCAO0+ESOUECcd5 JUbdylNeTP1W9/u0vRszBdpfLMNTnTXVduCl7X46ApSEQI0oGfGSe6t4NhQfQyZkBGYm HehRj8x6kdohtp48xDnrbdCZIhxHHVPP6u7Kc+7Tre2PKr6j2hZv8kGCifWojVnYqHls nJ7pzf640wScsQru/9BohBWOCgxdak5HAoqZEyY1e3BBXuI4y1HhMN7aspV/JV+lR60M QSGQ== X-Gm-Message-State: AFqh2krbo4i+p4JL0pD79Wg+TFxUkGzr6AneH3KAcyvEXGPY0xIVEgPy tYzZT4o7nFzeP3vSFpb32Os= X-Google-Smtp-Source: AMrXdXsdbUkyarSYAzgpmXfctJIjd4K16TFeyd7Cq4zYvGF3a+5S7WArti19aUKfPU+yLC30TT4/9Q== X-Received: by 2002:a17:90a:4606:b0:226:620b:6ae5 with SMTP id w6-20020a17090a460600b00226620b6ae5mr5974187pjg.22.1674028834284; Wed, 18 Jan 2023 00:00:34 -0800 (PST) Received: from bobo.ibm.com (193-116-102-45.tpgi.com.au. 
[193.116.102.45]) by smtp.gmail.com with ESMTPSA id y2-20020a17090a16c200b002272616d3e1sm738462pje.40.2023.01.18.00.00.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 18 Jan 2023 00:00:33 -0800 (PST) From: Nicholas Piggin To: Andrew Morton Cc: Nicholas Piggin , Andy Lutomirski , Linus Torvalds , linux-arch , linux-mm , linuxppc-dev@lists.ozlabs.org Subject: [PATCH v6 3/5] lazy tlb: shoot lazies, non-refcounting lazy tlb mm reference handling scheme Date: Wed, 18 Jan 2023 18:00:09 +1000 Message-Id: <20230118080011.2258375-4-npiggin@gmail.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20230118080011.2258375-1-npiggin@gmail.com> References: <20230118080011.2258375-1-npiggin@gmail.com> MIME-Version: 1.0 X-Rspam-User: X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 6E70F1C000F X-Stat-Signature: jfubpqtg8w58cjpmiuhzztnkgw7ribrh X-HE-Tag: 1674028835-50629 X-HE-Meta: U2FsdGVkX1/rxjwH2bri3y9aAfz5dCC10dWOUtsrz0h2NEPlKLA7+7C+Uq7XSwJ42b2i3aiCh0JkP5DtYaI3XyClqNNhZ/fQqTrI2G/bxwUP5ogujDcSe47sIXpEkIXbOs5v4IYwtgFxPzmM7tKXchaR5ogailr1KZ8x+G19WA94wewNeaF7QrjyRcQE9+GGrNeXniZhwxMgv0alcOAvm2E8A3ECge6j0cftHmgSyAii2BeiSKEKw7MSTGAJ+lb4y1KKgRmKBJz3fugErB4XubZ5Cd+NYdPFBX87I3j2cV3tPR6O1uEy0LyveAm4SYY76TO+cZ67NqPRGGwHd4cqThT+OpPqulpxkCQ8SH6KBdj9JSyjteHqLUXg0M90199Q3oh+pCJa4jP0GSP6oQ7zxanoI82IUURFRlmjFIayQLrSUvtlCuERhDqERYzuh8M6oTXxWBaVy1ZvnAF6cxd2jQV0M3/1Bf5x5/yhl5URwWwgJlw7oO73LQ8Msqyyb/gWBQxixSzKJptjrLBSe4UCY4N7m2sBFrF0xDanvhjquEmzo6mxVdTftSKSEC1Hfk7W+XzAyySbslTVkQHJCOZQ4e9pZmIr4GELdyTWW3utqNxk58Cf+ew1CShDcXXV+ing6NZHDg7HSqfQErZUIdM8KaZOoHQuapIZitgk2IvosnVdufWbnrUJxfjyhrk0hdjT0VN9Afbg+qeHFih3RiaKqsK+Icta3tkRz4p5GBdym7p7ZLW1CWw63CUi8fpWIAqs2uQaBSWbI2/XzXaNOTW/XS+TkbophQJ8ErOEL4teDPJM+G/DtVLFX7BwjbxSmFh9zYknHanb/Ia4gg2/HhM6t8Bk3HMYqo64OxdhNd53kefk2ZLE+jv4Fu6+sWl770ihr9XvZg+SNcSFh+KBrkdHQENBJuOsOPPWzz6PSxWAECsVP92WGqZbNFBNsflTyoqYRVjwcJPHI+kA99sMggY zG84VBYb 
ARjXzj2NiSix0aCxecjvJLR4VWY49Dl3RJnTDvwgCkU7OlQnjvCRibQQ4YpsGPgTxtAIznzISACKk6f+qVYDH5VQ4Yu8DC1fKrTQMNBTLA8TzwXDCe4RJyDc9DROGB/bvoGgL
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

On big systems, the mm refcount can become highly contended when doing
a lot of context switching with threaded applications (particularly
switching between the idle thread and an application thread).

Abandoning lazy tlb slows switching down quite a bit in the important
user->idle->user cases, so instead implement a non-refcounted scheme
that causes __mmdrop() to IPI all CPUs in the mm_cpumask and shoot down
any remaining lazy ones.

Shootdown IPI cost could be an issue, but it has not been observed to
be a serious problem with this scheme, because short-lived processes
tend not to migrate CPUs much, therefore they don't get much chance to
leave lazy tlb mm references on remote CPUs. There are a lot of options
to reduce the IPIs if necessary.

Signed-off-by: Nicholas Piggin
---
 arch/Kconfig  | 15 ++++++++++++
 kernel/fork.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 80 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index b07d36f08fea..f7da34e4bc62 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -481,6 +481,21 @@ config ARCH_WANT_IRQS_OFF_ACTIVATE_MM
 # already).
 config MMU_LAZY_TLB_REFCOUNT
 	def_bool y
+	depends on !MMU_LAZY_TLB_SHOOTDOWN
+
+# This option allows MMU_LAZY_TLB_REFCOUNT=n. It ensures no CPUs are using an
+# mm as a lazy tlb beyond its last reference count, by shooting down these
+# users before the mm is deallocated. __mmdrop() first IPIs all CPUs that may
+# be using the mm as a lazy tlb, so that they may switch themselves to using
+# init_mm for their active mm. mm_cpumask(mm) is used to determine which CPUs
+# may be using mm as a lazy tlb mm.
+#
+# To implement this, an arch *must*:
+# - At the time of the final mmdrop of the mm, ensure mm_cpumask(mm) contains
+#   at least all possible CPUs in which the mm is lazy.
+# - It must meet the requirements for MMU_LAZY_TLB_REFCOUNT=n (see above).
+config MMU_LAZY_TLB_SHOOTDOWN
+	bool
 
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
 	bool
diff --git a/kernel/fork.c b/kernel/fork.c
index 9f7fe3541897..263660e78c2a 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -780,6 +780,67 @@ static void check_mm(struct mm_struct *mm)
 #define allocate_mm() (kmem_cache_alloc(mm_cachep, GFP_KERNEL))
 #define free_mm(mm) (kmem_cache_free(mm_cachep, (mm)))
 
+static void do_check_lazy_tlb(void *arg)
+{
+	struct mm_struct *mm = arg;
+
+	WARN_ON_ONCE(current->active_mm == mm);
+}
+
+static void do_shoot_lazy_tlb(void *arg)
+{
+	struct mm_struct *mm = arg;
+
+	if (current->active_mm == mm) {
+		WARN_ON_ONCE(current->mm);
+		current->active_mm = &init_mm;
+		switch_mm(mm, &init_mm, current);
+	}
+}
+
+static void cleanup_lazy_tlbs(struct mm_struct *mm)
+{
+	if (!IS_ENABLED(CONFIG_MMU_LAZY_TLB_SHOOTDOWN)) {
+		/*
+		 * In this case, lazy tlb mms are refcounted and would not reach
+		 * __mmdrop until all CPUs have switched away and mmdrop()ed.
+		 */
+		return;
+	}
+
+	/*
+	 * Lazy TLB shootdown does not refcount "lazy tlb mm" usage, rather it
+	 * requires lazy mm users to switch to another mm when the refcount
+	 * drops to zero, before the mm is freed. This requires IPIs here to
+	 * switch kernel threads to init_mm.
+	 *
+	 * archs that use IPIs to flush TLBs can piggy-back that lazy tlb mm
+	 * switch with the final userspace teardown TLB flush which leaves the
+	 * mm lazy on this CPU but no others, reducing the need for additional
+	 * IPIs here. There are cases where a final IPI is still required here,
+	 * such as the final mmdrop being performed on a different CPU than the
+	 * one exiting, or kernel threads using the mm when userspace exits.
+	 *
+	 * IPI overheads have not been found to be expensive, but they could be
+	 * reduced in a number of possible ways, for example (roughly
+	 * increasing order of complexity):
+	 * - The last lazy reference created by exit_mm() could instead switch
+	 *   to init_mm, however it's probable this will run on the same CPU
+	 *   immediately afterwards, so this may not reduce IPIs much.
+	 * - A batch of mms requiring IPIs could be gathered and freed at once.
+	 * - CPUs store active_mm where it can be remotely checked without a
+	 *   lock, to filter out false-positives in the cpumask.
+	 * - After mm_users or mm_count reaches zero, switching away from the
+	 *   mm could clear mm_cpumask to reduce some IPIs, perhaps together
+	 *   with some batching or delaying of the final IPIs.
+	 * - A delayed freeing and RCU-like quiescing sequence based on mm
+	 *   switching to avoid IPIs completely.
+	 */
+	on_each_cpu_mask(mm_cpumask(mm), do_shoot_lazy_tlb, (void *)mm, 1);
+	if (IS_ENABLED(CONFIG_DEBUG_VM))
+		on_each_cpu(do_check_lazy_tlb, (void *)mm, 1);
+}
+
 /*
  * Called when the last reference to the mm
  * is dropped: either by a lazy thread or by
@@ -791,6 +852,10 @@ void __mmdrop(struct mm_struct *mm)
 
 	BUG_ON(mm == &init_mm);
 	WARN_ON_ONCE(mm == current->mm);
+
+	/* Ensure no CPUs are using this as their lazy tlb mm */
+	cleanup_lazy_tlbs(mm);
+
 	WARN_ON_ONCE(mm == current->active_mm);
 	mm_free_pgd(mm);
 	destroy_context(mm);

From patchwork Wed Jan 18 08:00:10 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nicholas Piggin
X-Patchwork-Id: 13105809
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D7D07C004D4 for ; Wed, 18 Jan 2023 08:00:41 +0000 (UTC)
Received: by kanga.kvack.org (Postfix) id 664D06B007D; Wed, 18 Jan 2023 03:00:41 -0500 (EST)
Received: by
kanga.kvack.org (Postfix, from userid 40) id 614C56B007E; Wed, 18 Jan 2023 03:00:41 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 503C26B0080; Wed, 18 Jan 2023 03:00:41 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 41E6C6B007D for ; Wed, 18 Jan 2023 03:00:41 -0500 (EST) Received: from smtpin18.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 1B1E5A0630 for ; Wed, 18 Jan 2023 08:00:41 +0000 (UTC) X-FDA: 80367173082.18.F2FF744 Received: from mail-pl1-f175.google.com (mail-pl1-f175.google.com [209.85.214.175]) by imf05.hostedemail.com (Postfix) with ESMTP id 61AC0100015 for ; Wed, 18 Jan 2023 08:00:39 +0000 (UTC) Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=i4bw46JJ; spf=pass (imf05.hostedemail.com: domain of npiggin@gmail.com designates 209.85.214.175 as permitted sender) smtp.mailfrom=npiggin@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1674028839; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=CrPiR9IjSMnA6UKTOp8OkTYk/Ghbyk4227JxOnG3eIc=; b=N4YMB1LyYdpverJkIUmedZZa/Ste76ZSrwdBgAYBtcyBvJ1Q22Le5C7tnm6g01aEZz1ymr 8Gap3nrQxNBKoD/FefMV9pvQUpRQtulrsS+IARR8hanRWxXAQvKw/G7IR2ORkj3in/FnWl L5ZvyHJKOMLTnlrt5etdPasO/ePjw1I= ARC-Authentication-Results: i=1; imf05.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=i4bw46JJ; spf=pass (imf05.hostedemail.com: domain of npiggin@gmail.com designates 209.85.214.175 as permitted sender) 
smtp.mailfrom=npiggin@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1674028839; a=rsa-sha256; cv=none; b=fev7L79VPJOd+z+m2D9OJFHdbc8zxwwtoaWe3D+aA4dhxjTeasJFNl/1kldHEnwN3HoMJi ykyffPAFX6ZSZgfUM0+h0cm5vYj+wjhC4ruCwz3qoKD6RolwwJbaYmFCJpMHeiPmymOyUo QkO0sYycSIV6gHsjTac6OXdNoq2pDl8= Received: by mail-pl1-f175.google.com with SMTP id 20so8787514plo.3 for ; Wed, 18 Jan 2023 00:00:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=CrPiR9IjSMnA6UKTOp8OkTYk/Ghbyk4227JxOnG3eIc=; b=i4bw46JJNHRCls6t57srDxlnFl9Ixtn2MdZ/C1ywWOJAZ6c4MzBYu4pOZeTJcb/q7N UsxGeUe0RQcuI8CsYp+iwUQSUUwXzJJH/aoZLqbs8vXZiAVTsoNPzhMrgcJ56rVs7UyC 3wO94DAvEHxcY8Wc7kErGUl9QIJzkStrhbVKKsW0tMVDhEBg0wBmovNzmHyWbXyE6NdP TYBWmk/p0jbSH5WF00hnqcEwOS109b2suq6lAXft0IQlzKGzp7MmKdgJLgQioXXYUHz4 D6ia+/6u4/Dws5LDHKM6NQv/orJW01MoD44qPN/ZLK3HdidkLzBV8lr0dSmdnyYamY+9 Kkpw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=CrPiR9IjSMnA6UKTOp8OkTYk/Ghbyk4227JxOnG3eIc=; b=12juvQLmWOBgJwLTn6PJZccA1LjIgMcA0BDUyEIu9hDDtey55C5raEXR5jdCDXV6Il zaDaHzbxC4/s8bkQyHMUZgAv181L4+HaxIpmNsQacykOC6ecas9Wpn/Y1GPbHH7BNOO4 0x+M4NcpZELc0daojltNUxJ1VWWqt66U6zaSAe8VMGcWzLdRmb/KmyHjewjc/oGILlxF rsJE/C85z9rZfMSyGCqAjlCkUIzN4YTRlUWHlHDcV1I7TArF6K6vCtGEp3rICzOryJRy ac/oszhw3kYRZ02L2XhByGoy7VcVpfEeLN385oQQPPbE/REWAxNl3CDJh8OmXPy/LEDB yMEQ== X-Gm-Message-State: AFqh2koDFtXWTyhYgmsIDwP1KEKUbuDKNoO54k58vkX3glTdcpuRF6js 5ZE4P6nmvIcAPXi2KySKams= X-Google-Smtp-Source: AMrXdXtTl64G4cCab9NpNyGKytSh9JoZppnQQfip0s76mKtZrzwRQdlnqZiCFVAJ79+Rk/NIj37VRw== X-Received: by 2002:a17:90a:e543:b0:229:a2:a265 with SMTP 
id ei3-20020a17090ae54300b0022900a2a265mr6031348pjb.3.1674028838309; Wed, 18 Jan 2023 00:00:38 -0800 (PST) Received: from bobo.ibm.com (193-116-102-45.tpgi.com.au. [193.116.102.45]) by smtp.gmail.com with ESMTPSA id y2-20020a17090a16c200b002272616d3e1sm738462pje.40.2023.01.18.00.00.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 18 Jan 2023 00:00:37 -0800 (PST) From: Nicholas Piggin To: Andrew Morton Cc: Nicholas Piggin , Andy Lutomirski , Linus Torvalds , linux-arch , linux-mm , linuxppc-dev@lists.ozlabs.org Subject: [PATCH v6 4/5] powerpc/64s: enable MMU_LAZY_TLB_SHOOTDOWN Date: Wed, 18 Jan 2023 18:00:10 +1000 Message-Id: <20230118080011.2258375-5-npiggin@gmail.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20230118080011.2258375-1-npiggin@gmail.com> References: <20230118080011.2258375-1-npiggin@gmail.com> MIME-Version: 1.0 X-Rspam-User: X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 61AC0100015 X-Stat-Signature: 6unwi98nx6b95miukn1xezipssc6j6pj X-HE-Tag: 1674028839-758787 X-HE-Meta: 
U2FsdGVkX19F+v2fbV1IykpP0cwQQEhs0dUhtAuaDtH87P/HriCfzebV0tkcRMq2Xh6nKFOXO/vv3zSkMsbJ9th9Pa8fRQmvqzY/EGqppiIQSAQTH6J7jSsDX/wF/4JT3XkeVAWF9VXq6wHRwkQcLEulf0dYyh4iAEHaYCRENVsPf05whR3WFReVZdatlTgXDCvx63Lf3cIsYW2L0jWHYY2ZF/ZDwWg21YkRVTetViUCJx4+ePDY+1TJVwuK7+S6LpkqqLkTU2InZ+fW82ZVgS/WZ1y8mperRiORns/NmsCzCLOOo6kKXOtyqelxxBavBj+XZU6Xee0Wqo7jTg9EUgueji0IcCrafxfTH6Z1O6LN+DP9BUfHQA0S2aTGKpJC/qNBKlHxnvcQflL/5QUmIlFQJBFtKPWXCxWYHcB6/m18GKPdpL/KWCXU7ZBOV0omLyYW/F8ls6jCiutjpudTX1OIj21LMlHg90f202EV3CDva+/pQPQcyM3YN1eTQJe4/B/cTMsvNgbWOJ4fBsksGbwPsoNul7/uY+998kM8YClyKqh6XEJiJDGbGCDZORRO3n6tIWAEY3TGDrjykgQfQQnWPiEQz3Pn08dfRkj4G72AC/NwumC0xIHdsMZPJTQ3f0M24b5k/rohAO1Z82hXYFkfKOH98qGNJJXZU73a+fiqAWGj2SjElbp/4wSCJmf0EGtzQyPTXYOn+PEBuIQnPZAWwHpzzApktzh85gSdednAuM332RJjvGQCxpYqvC36/Z9ZtCMeXODf1NX33eDdoxOdfzZHg2SGb/sny23d9K0hX9nfpCxjp9ci93xOMDDSUYpQvCb1s4QQK8a9dX5D8DoXOuyq74Lpht73d941Xi60Dg9XSlUGoOxvfbbds8nbv6MUo1C0XpGQR1aYzcFgF/Y9jT/cPwwObewmck9xHesPY7pYxib/AcWGI261HJZXM77VZX10h08erZ1pEZI FGrGeUtY oRsaglW5aS1ObkRjx1BlgxPQnXXSmt0ZbtEOB5HCSotIGz/Yvs9g6652NxPujvlQpuc1jpLV8Y8VlshYm1drGkAWFpC6nNLU5XFV2SA+B7UosnhGsCTSmxyzCb9DD+yFS8QV3sHu163SXYTEc2e1z7e1hC3E7IOp8kQWOZkBC1O4nNo8=
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

On a 16-socket 192-core POWER8 system, running a context switching
benchmark with as many software threads as CPUs (so each switch goes in
and out of idle), upstream achieves a rate of about 1 million context
switches per second, due to contention on the mm refcount.

64s meets the prerequisites for CONFIG_MMU_LAZY_TLB_SHOOTDOWN, so enable
the option. This increases the above benchmark to 118 million context
switches per second.

This generates 314 additional IPI interrupts on a 144 CPU system doing
a kernel compile, which is in the noise in terms of kernel cycles.

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index b8c4ac56bddc..600ace5a7f1a 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -265,6 +265,7 @@ config PPC
 	select MMU_GATHER_PAGE_SIZE
 	select MMU_GATHER_RCU_TABLE_FREE
 	select MMU_GATHER_MERGE_VMAS
+	select MMU_LAZY_TLB_SHOOTDOWN if PPC_BOOK3S_64
 	select MODULES_USE_ELF_RELA
 	select NEED_DMA_MAP_STATE if PPC64 || NOT_COHERENT_CACHE
 	select NEED_PER_CPU_EMBED_FIRST_CHUNK if PPC64

From patchwork Wed Jan 18 08:00:11 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nicholas Piggin
X-Patchwork-Id: 13105810
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 001AAC004D4 for ; Wed, 18 Jan 2023 08:00:45 +0000 (UTC)
Received: by kanga.kvack.org (Postfix) id 8D5DF6B0074; Wed, 18 Jan 2023 03:00:45 -0500 (EST)
Received: by kanga.kvack.org (Postfix, from userid 40) id 885CB6B007E; Wed, 18 Jan 2023 03:00:45 -0500 (EST)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042) id 74E036B0080; Wed, 18 Jan 2023 03:00:45 -0500 (EST)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 6839E6B007E for ; Wed, 18 Jan 2023 03:00:45 -0500 (EST)
Received: from smtpin26.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id E8947160541 for ; Wed, 18 Jan 2023 08:00:44 +0000 (UTC)
X-FDA: 80367173208.26.D44AA38
Received: from mail-pl1-f172.google.com (mail-pl1-f172.google.com [209.85.214.172]) by imf02.hostedemail.com (Postfix) with ESMTP id 30E6380025 for ; Wed, 18 Jan 2023 08:00:43
+0000 (UTC) Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=WcdwzEYp; spf=pass (imf02.hostedemail.com: domain of npiggin@gmail.com designates 209.85.214.172 as permitted sender) smtp.mailfrom=npiggin@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1674028843; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=cadyF8CD/E3oK1yfSJSphX1/cZjS+kj7x9tW6+ar6sc=; b=MRM/B5HN2LacKeqt9gLLjVyUVumTXuZMAOHEfXWl/9rA0KBjDPqyGU81C7g5WUMlc9wCc5 QSclt4xBouUdfv/ve5RSMeD8/cmQlgaPBZIskLy3KDKViGnhRyTKZkaEaAaD/2ZiKrDict jZzsn448s33HfCNw1JexvT6y9ozIobM= ARC-Authentication-Results: i=1; imf02.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=WcdwzEYp; spf=pass (imf02.hostedemail.com: domain of npiggin@gmail.com designates 209.85.214.172 as permitted sender) smtp.mailfrom=npiggin@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1674028843; a=rsa-sha256; cv=none; b=6V4atMrBxomoVbRhCjlClWxsrITI1MWtBONxvfV9XZzzZ7fKfQXvSJDY47EeWKqhk9HdAv +ECY41nvNWXJlRaYce+M5V/BJ+y+qXtfGAmqaqB01lUXjFvKujq1PGZBqZ7Kq++6j87ub6 PF6pPYyb5NjXltcH/PuhWa4+V5s38pY= Received: by mail-pl1-f172.google.com with SMTP id d9so36081416pll.9 for ; Wed, 18 Jan 2023 00:00:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=cadyF8CD/E3oK1yfSJSphX1/cZjS+kj7x9tW6+ar6sc=; b=WcdwzEYpRynxGJWgQlKjA9zr5Ugl+tLrV/aeoKmaEyWm5I+eofmuFjC8v+O5CZJC8w tOZB+tAJR4+wiFQ1iaQWMCDXko5KxSau59i0CeKJSVmyvX/0i6O5Zr39SstJW0Z3rE1B 
Fyu5ijjz/jR31RZrsNZLyh0clWoOshZNgMIiETyrsUXvb3eq/iyNx1mkClYVLAY7+64k zYOIM7lzO4YVQSlFFkESwH80wBohrcEE2EqtboWnWa9tbyNk7GkZ4n+1ZHeUM7FQ71rU UAR97RL8R4zHQ490YhXWXkQbBJPh185TldI1Fz1/+yE5zazmCzPESNz0kF9nXmNoebc3 uPoA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=cadyF8CD/E3oK1yfSJSphX1/cZjS+kj7x9tW6+ar6sc=; b=NiEPr/P/sm9SGtZdsIC9x9sxzpV90rc935Vz+cirKAZT2dp4EiKu0fg18hGThFptSl j1/cTBsScyXd5lK3oFi+WnOt8dSY/CBIdsSc9ImrG7PXLmtaLXFnt3HLxsKpkHL8uMJ+ C+YSKvW3sj+kRqvrMDIscQo4FO2MEouF7HZ+nmOGI8sfypNYStXmukjLMu5YxlBwAELZ gvfYMO1IdS2bbSMxLuSz24JcK9eLISWXpNV1NtJjT+jVt3ukh9mdFavD3z76NpAoU6TI jdyRdikyVJfZoUAb+ro84loRMX93/KvUFBrIg6fAuKzUdx0L8oJUYUa6/CTivku8NDcg KiOw== X-Gm-Message-State: AFqh2kqKmrId+xciA+iCfvz8nM9tknR53hUJ6KRyoej7ksCT4v5DdNhg Gyj7OhjC9O0L/c5WFLfuGzs= X-Google-Smtp-Source: AMrXdXvru18rhSMREurCixBPKsRCor9uxoV61Ex5rHoJVQ7gmSRE7YHzcemWD4ARplQOcaVlqvp+VQ== X-Received: by 2002:a17:90a:3fca:b0:227:161a:6318 with SMTP id u10-20020a17090a3fca00b00227161a6318mr6162197pjm.47.1674028842208; Wed, 18 Jan 2023 00:00:42 -0800 (PST) Received: from bobo.ibm.com (193-116-102-45.tpgi.com.au. 
[193.116.102.45]) by smtp.gmail.com with ESMTPSA id y2-20020a17090a16c200b002272616d3e1sm738462pje.40.2023.01.18.00.00.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 18 Jan 2023 00:00:41 -0800 (PST) From: Nicholas Piggin To: Andrew Morton Cc: Nicholas Piggin , Andy Lutomirski , Linus Torvalds , linux-arch , linux-mm , linuxppc-dev@lists.ozlabs.org Subject: [PATCH v6 5/5] powerpc/64s/radix: combine final TLB flush and lazy tlb mm shootdown IPIs Date: Wed, 18 Jan 2023 18:00:11 +1000 Message-Id: <20230118080011.2258375-6-npiggin@gmail.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20230118080011.2258375-1-npiggin@gmail.com> References: <20230118080011.2258375-1-npiggin@gmail.com> MIME-Version: 1.0 X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 30E6380025 X-Stat-Signature: wpd8kndcxk8z38b8gtsacuywgsb1dhyc X-Rspam-User: X-HE-Tag: 1674028842-496516 X-HE-Meta: U2FsdGVkX18pCBq/QkpQUQ8NbHograEEKsvRMUmwk6x/8T01dN+8/7QjVSDvXSwWdeVREfo/zfIWkIU9XXVDoSZmPWf5VNyT+uOzaLBaSq3fWTV9BujVoHuWom0Huz/a0Jios8BfgS6eIhAgzYZ0ktFdDohNSkASkF/7xT5DpMl+TJ/NfMxveXrIEgVyANMU4WlPoAcV6IT3Srz4tSeefEaM1VKPvfhVQv8xD/w9EcT9UAopdBtYTN0QOtsw7+U1bL3BZbawaqUtaBh96C3m+RDld8uwNPCRMCVHxryWzTLyke6ba+K89ly1OsgTF8Q0M39oUiMZ8hLZCDRVR/HkYjlSUngofiSJCxlPTBNPtnxIC1iKQ3WUFiTSUdp12HutgZELpRNUNu0KLIbUDedZWSglYAognAm2kQQc8yd+YRpFDEcUuQmON5axmAs3hzSp7iU4m9hDxBq3LjRj1LzcxQpbXFYMAt8gYNSr9qyBTyaih/PEsDNowoFqsq0bMy9Ka23lVHnXYT8seSM1cK9eub9L0KtJtr+fjKC2N2Pf3lyInFdjU+r0zrHjoebdBbIVcrU62Tr/1x2eNqszV5JCkYY+mK9gc7FW3hZH7QALYxyhG51StQAeFxKOAFKPuXuB2C3c8MAeSwT32slc9ZUthj4g1TOM1YgkS7V9XHKwv5FXA/Odek2/0f6nsKMh92rVlP+rooRyHwEaF5xOVb6V2uG9diiUImQFWZ2iq3+8T7VUfs2KkzAG7RXxHVfY2026eicYVE+spJisonJGlN2DoHvNa9J8dASz+Sh+OfXaoiTCcu6KD4BOB8wO9u4hqvYDWa6a/tfP9WtW6jJNswMa+Koer3JvXJvhrMTuWscHayzXkc5i8KwjYbJ8ZxlaSPTik59qYy//Mlt2BzOynEpzmiT3F+3YYIsPWtaKqMcSgRwXIN0VWIIPMJVdW8v9QlCxw/hYaTjI0k1aqq0ovNG O8/nKCgm 
oq+ekzEqVQsd0+3kP7n7TPXo7/IXsyDVDoOLUjvtDn/3xwdqmx33tsjpz5KBQ9db0zobqlUxD+3oI65iYirYEGLKueu0UTN/5GEaZya5xBezDUVvwURvc5l2z5uLbwh9y2gXE
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

** Not for merge **

CONFIG_MMU_LAZY_TLB_SHOOTDOWN requires IPIs to clear the "lazy tlb"
references to an mm that is being freed. With the radix MMU, the final
userspace exit TLB flush can be performed with IPIs, and those IPIs can
also clear lazy tlb mm references, which mostly eliminates the final
IPIs required by MMU_LAZY_TLB_SHOOTDOWN.

This does mean the final TLB flush is not done with TLBIE, which can be
faster than IPI+TLBIEL, but we would have to do those IPIs for lazy
shootdown so using TLBIEL should be a win.

The final cpumask test and possible IPIs are still needed to clean up
some rare race cases. We could prevent those entirely (e.g., prevent new
lazy tlb mm references if userspace has gone away, or move the final
TLB flush later), but I'd have to see actual numbers that matter before
adding any more complexity for it. I can't imagine it would ever be
worthwhile.

This takes lazy tlb mm shootdown IPI interrupts from 314 to 3 on a 144
CPU system doing a kernel compile. It also takes care of the one
potential problem workload which is a short-lived process with multiple
CPU-bound threads that want to be spread to other CPUs, because the mm
exit happens after the process is back to single-threaded.

---
 arch/powerpc/mm/book3s64/radix_tlb.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index 282359ab525b..f34b78cb4c7d 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -1303,7 +1303,31 @@ void radix__tlb_flush(struct mmu_gather *tlb)
 	 * See the comment for radix in arch_exit_mmap().
 	 */
 	if (tlb->fullmm || tlb->need_flush_all) {
-		__flush_all_mm(mm, true);
+		if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_SHOOTDOWN)) {
+			/*
+			 * Shootdown based lazy tlb mm refcounting means we
+			 * have to IPI everyone in the mm_cpumask anyway soon
+			 * when the mm goes away, so might as well do it as
+			 * part of the final flush now.
+			 *
+			 * If lazy shootdown was improved to reduce IPIs (e.g.,
+			 * by batching), then it may end up being better to use
+			 * tlbies here instead.
+			 */
+			smp_mb(); /* see radix__flush_tlb_mm */
+			exit_flush_lazy_tlbs(mm);
+			_tlbiel_pid(mm->context.id, RIC_FLUSH_ALL);
+
+			/*
+			 * It should not be possible to have coprocessors still
+			 * attached here.
+			 */
+			if (WARN_ON_ONCE(atomic_read(&mm->context.copros) > 0))
+				__flush_all_mm(mm, true);
+		} else {
+			__flush_all_mm(mm, true);
+		}
+
 	} else if ( (psize = radix_get_mmu_psize(page_size)) == -1) {
 		if (!tlb->freed_tables)
 			radix__flush_tlb_mm(mm);