From patchwork Thu Sep 2 21:56:17 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12473065 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F3F25C433FE for ; Thu, 2 Sep 2021 21:56:19 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A9FC060F21 for ; Thu, 2 Sep 2021 21:56:19 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org A9FC060F21 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 4CDA76B0111; Thu, 2 Sep 2021 17:56:19 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 47DAE6B0112; Thu, 2 Sep 2021 17:56:19 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 36D9C8D0001; Thu, 2 Sep 2021 17:56:19 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0171.hostedemail.com [216.40.44.171]) by kanga.kvack.org (Postfix) with ESMTP id 27B256B0111 for ; Thu, 2 Sep 2021 17:56:19 -0400 (EDT) Received: from smtpin07.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id E86EC182CFFC1 for ; Thu, 2 Sep 2021 21:56:18 +0000 (UTC) X-FDA: 78543992436.07.E96E4DD Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf25.hostedemail.com (Postfix) with ESMTP id 88F48B00008E for ; Thu, 2 Sep 2021 21:56:18 +0000 (UTC) Received: by mail.kernel.org (Postfix) with ESMTPSA id 89EAC603E9; Thu, 2 Sep 2021 21:56:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1630619777; bh=3NXjCoh43DrkBrEDm3MrAaB/Ju4wQDC8CWHHEAYe2So=; h=Date:From:To:Subject:In-Reply-To:From; b=w2dPbSI1LsoUtamRSwHlnUNYdJW5pvr5WLF0rpYVnYR1NoYJkyo7kQFiyWnbeQRa2 v9Gu3oldQj/VOL4sPuYh3lKaCRstZ7IfrKl1R+aiAX9d0EbteFUsHvRgCAHt2L/mmM gVlHFRB7oUkPJySvpJYWs4xKDw6gOx6ch+WIOy0o= Date: Thu, 02 Sep 2021 14:56:17 -0700 From: Andrew Morton To: akpm@linux-foundation.org, anton@ozlabs.org, benh@kernel.crashing.org, linux-mm@kvack.org, luto@kernel.org, mm-commits@vger.kernel.org, npiggin@gmail.com, paulus@ozlabs.org, rdunlap@infradead.org, torvalds@linux-foundation.org Subject: [patch 118/212] lazy tlb: allow lazy tlb mm refcounting to be configurable Message-ID: <20210902215617.sA6qrA5sw%akpm@linux-foundation.org> In-Reply-To: <20210902144820.78957dff93d7bea620d55a89@linux-foundation.org> User-Agent: s-nail v14.8.16 X-Rspamd-Queue-Id: 88F48B00008E Authentication-Results: imf25.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=w2dPbSI1; dmarc=none; spf=pass (imf25.hostedemail.com: domain of akpm@linux-foundation.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org X-Rspamd-Server: rspam01 X-Stat-Signature: 6yncscshpp6o11zefchakfsj9y7khnpf X-HE-Tag: 1630619778-36759 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Nicholas Piggin Subject: lazy tlb: allow lazy tlb mm refcounting to be configurable Add CONFIG_MMU_TLB_REFCOUNT which enables refcounting of the lazy tlb mm when it is context switched. This can be disabled by architectures that don't require this refcounting if they clean up lazy tlb mms when the last refcount is dropped. Currently this is always enabled, which is what existing code does, so the patch is effectively a no-op. Rename rq->prev_mm to rq->prev_lazy_mm, because that's what it is. [akpm@linux-foundation.org: fix comment] [npiggin@gmail.com: update comments] Link: https://lkml.kernel.org/r/1623121605.j47gdpccep.astroid@bobo.none Link: https://lkml.kernel.org/r/20210605014216.446867-3-npiggin@gmail.com Signed-off-by: Nicholas Piggin Cc: Andy Lutomirski Cc: Anton Blanchard Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Randy Dunlap Signed-off-by: Andrew Morton --- arch/Kconfig | 14 ++++++++++++++ include/linux/sched/mm.h | 14 ++++++++++++-- kernel/sched/core.c | 22 ++++++++++++++++++---- kernel/sched/sched.h | 4 +++- 4 files changed, 47 insertions(+), 7 deletions(-) --- a/arch/Kconfig~lazy-tlb-allow-lazy-tlb-mm-refcounting-to-be-configurable +++ a/arch/Kconfig @@ -425,6 +425,20 @@ config ARCH_WANT_IRQS_OFF_ACTIVATE_MM irqs disabled over activate_mm. Architectures that do IPI based TLB shootdowns should enable this. +# Use normal mm refcounting for MMU_LAZY_TLB kernel thread references. +# MMU_LAZY_TLB_REFCOUNT=n can improve the scalability of context switching +# to/from kernel threads when the same mm is running on a lot of CPUs (a large +# multi-threaded application), by reducing contention on the mm refcount. +# +# This can be disabled if the architecture ensures no CPUs are using an mm as a +# "lazy tlb" beyond its final refcount (i.e., by the time __mmdrop frees the mm +# or its kernel page tables). This could be arranged by arch_exit_mmap(), or +# final exit(2) TLB flush, for example. arch code must also ensure the +# _lazy_tlb variants of mmgrab/mmdrop are used when dropping the lazy reference +# to a kthread ->active_mm (non-arch code has been converted already). +config MMU_LAZY_TLB_REFCOUNT + def_bool y + config ARCH_HAVE_NMI_SAFE_CMPXCHG bool --- a/include/linux/sched/mm.h~lazy-tlb-allow-lazy-tlb-mm-refcounting-to-be-configurable +++ a/include/linux/sched/mm.h @@ -52,12 +52,22 @@ static inline void mmdrop(struct mm_stru /* Helpers for lazy TLB mm refcounting */ static inline void mmgrab_lazy_tlb(struct mm_struct *mm) { - mmgrab(mm); + if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_REFCOUNT)) + mmgrab(mm); } static inline void mmdrop_lazy_tlb(struct mm_struct *mm) { - mmdrop(mm); + if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_REFCOUNT)) { + mmdrop(mm); + } else { + /* + * mmdrop_lazy_tlb must provide a full memory barrier, see the + * membarrier comment in finish_task_switch which relies on + * this. + */ + smp_mb(); + } } /** --- a/kernel/sched/core.c~lazy-tlb-allow-lazy-tlb-mm-refcounting-to-be-configurable +++ a/kernel/sched/core.c @@ -4527,7 +4527,7 @@ static struct rq *finish_task_switch(str __releases(rq->lock) { struct rq *rq = this_rq(); - struct mm_struct *mm = rq->prev_mm; + struct mm_struct *mm = NULL; long prev_state; /* @@ -4546,7 +4546,10 @@ static struct rq *finish_task_switch(str current->comm, current->pid, preempt_count())) preempt_count_set(FORK_PREEMPT_COUNT); - rq->prev_mm = NULL; +#ifdef CONFIG_MMU_LAZY_TLB_REFCOUNT + mm = rq->prev_lazy_mm; + rq->prev_lazy_mm = NULL; +#endif /* * A task struct has one reference for the use as "current". @@ -4682,9 +4685,20 @@ context_switch(struct rq *rq, struct tas switch_mm_irqs_off(prev->active_mm, next->mm, next); if (!prev->mm) { // from kernel - /* will mmdrop_lazy_tlb() in finish_task_switch(). */ - rq->prev_mm = prev->active_mm; +#ifdef CONFIG_MMU_LAZY_TLB_REFCOUNT + /* Will mmdrop_lazy_tlb() in finish_task_switch(). */ + rq->prev_lazy_mm = prev->active_mm; prev->active_mm = NULL; +#else + /* + * Without MMU_LAZY_TLB_REFCOUNT there is no lazy + * tracking (because no rq->prev_lazy_mm) in + * finish_task_switch, so no mmdrop_lazy_tlb(), so no + * memory barrier for membarrier (see the membarrier + * comment in finish_task_switch()). Do it here. + */ + smp_mb(); +#endif } } --- a/kernel/sched/sched.h~lazy-tlb-allow-lazy-tlb-mm-refcounting-to-be-configurable +++ a/kernel/sched/sched.h @@ -967,7 +967,9 @@ struct rq { struct task_struct *idle; struct task_struct *stop; unsigned long next_balance; - struct mm_struct *prev_mm; +#ifdef CONFIG_MMU_LAZY_TLB_REFCOUNT + struct mm_struct *prev_lazy_mm; +#endif unsigned int clock_update_flags; u64 clock;