From patchwork Thu Aug 17 08:05:54 2023
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13356106
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com, namit@vmware.com, xhao@linux.alibaba.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, dave.hansen@linux.intel.com
Subject: [RFC v2 1/6] mm/rmap: Recognize non-writable TLB entries during TLB batch flush
Date: Thu, 17 Aug 2023 17:05:54 +0900
Message-Id: <20230817080559.43200-2-byungchul@sk.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20230817080559.43200-1-byungchul@sk.com>
References: <20230817080559.43200-1-byungchul@sk.com>

Functionally, no change. This is a preparation for CONFIG_MIGRC, which
requires recognizing non-writable TLB entries and makes use of them to
batch more aggressively or even to skip TLB flushes.

While at it, change struct tlbflush_unmap_batch's ->flush_required
(boolean) to ->nr_flush_required (int) so that it tracks not only
whether a flush has been requested but also the exact number of
requests. That count will be used by the CONFIG_MIGRC implementation.
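For illustration, a minimal userspace sketch of the intended bookkeeping
follows (a toy model only, not kernel code; the names merely mirror the
hunks below): requests for still-writable PTEs keep going to tlb_ubc,
requests for non-writable PTEs go to a separate tlb_ubc_nowr, and the
two are folded together before an actual flush.

#include <stdbool.h>
#include <stdio.h>

/* Toy model of the per-task batches: one for writable entries and one
 * ("nowr") for entries known to be non-writable. */
struct ubc {
	int nr_flush_required;	/* count of requests, not just a flag */
	bool writable;		/* any dirty/writable PTE batched? */
};

static struct ubc tlb_ubc, tlb_ubc_nowr;

/* Mirror of the folding step: merge the non-writable batch into the
 * main one so that a single flush covers both. */
static void fold_ubc_nowr(void)
{
	if (!tlb_ubc_nowr.nr_flush_required)
		return;
	tlb_ubc.writable = tlb_ubc.writable || tlb_ubc_nowr.writable;
	tlb_ubc.nr_flush_required += tlb_ubc_nowr.nr_flush_required;
	tlb_ubc_nowr.nr_flush_required = 0;
	tlb_ubc_nowr.writable = false;
}

static void add_request(bool pte_writable, bool pte_dirty)
{
	struct ubc *ubc = pte_writable ? &tlb_ubc : &tlb_ubc_nowr;

	ubc->nr_flush_required += 1;
	if (pte_dirty)
		ubc->writable = true;
}

int main(void)
{
	add_request(true, true);	/* writable mapping  -> tlb_ubc      */
	add_request(false, false);	/* read-only mapping -> tlb_ubc_nowr */
	add_request(false, false);
	fold_ubc_nowr();		/* what a flush would do first */
	printf("flushes required: %d\n", tlb_ubc.nr_flush_required);	/* 3 */
	return 0;
}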
Signed-off-by: Byungchul Park
---
 arch/x86/include/asm/tlbflush.h |  2 ++
 arch/x86/mm/tlb.c               |  7 +++++++
 include/linux/mm_types_task.h   |  4 ++--
 include/linux/sched.h           |  1 +
 mm/internal.h                   |  4 ++++
 mm/rmap.c                       | 29 ++++++++++++++++++++++++-----
 6 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 75bfaa421030..63504cde364b 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -279,6 +279,8 @@ static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch,
 }
 
 extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
+extern void arch_tlbbatch_fold(struct arch_tlbflush_unmap_batch *bdst,
+			       struct arch_tlbflush_unmap_batch *bsrc);
 
 static inline bool pte_flags_need_flush(unsigned long oldflags,
 					unsigned long newflags,
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 267acf27480a..69d145f1fff1 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1265,6 +1265,13 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 	put_cpu();
 }
 
+void arch_tlbbatch_fold(struct arch_tlbflush_unmap_batch *bdst,
+			struct arch_tlbflush_unmap_batch *bsrc)
+{
+	cpumask_or(&bdst->cpumask, &bdst->cpumask, &bsrc->cpumask);
+	cpumask_clear(&bsrc->cpumask);
+}
+
 /*
  * Blindly accessing user memory from NMI context can be dangerous
  * if we're in the middle of switching the current user task or
diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h
index 5414b5c6a103..6f3bb757eb46 100644
--- a/include/linux/mm_types_task.h
+++ b/include/linux/mm_types_task.h
@@ -59,8 +59,8 @@ struct tlbflush_unmap_batch {
 	 */
 	struct arch_tlbflush_unmap_batch arch;
 
-	/* True if a flush is needed. */
-	bool flush_required;
+	/* The number of flush requested. */
+	int nr_flush_required;
 
 	/*
 	 * If true then the PTE was dirty when unmapped. The entry must be
diff --git a/include/linux/sched.h b/include/linux/sched.h
index eed5d65b8d1f..2232b2cdfce8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1322,6 +1322,7 @@ struct task_struct {
 #endif
 
 	struct tlbflush_unmap_batch	tlb_ubc;
+	struct tlbflush_unmap_batch	tlb_ubc_nowr;
 
 	/* Cache last used pipe for splice(): */
 	struct pipe_inode_info		*splice_pipe;
diff --git a/mm/internal.h b/mm/internal.h
index 68410c6d97ac..b90d516ad41f 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -840,6 +840,7 @@ extern struct workqueue_struct *mm_percpu_wq;
 void try_to_unmap_flush(void);
 void try_to_unmap_flush_dirty(void);
 void flush_tlb_batched_pending(struct mm_struct *mm);
+void fold_ubc_nowr(void);
 #else
 static inline void try_to_unmap_flush(void)
 {
@@ -850,6 +851,9 @@ static inline void try_to_unmap_flush_dirty(void)
 static inline void flush_tlb_batched_pending(struct mm_struct *mm)
 {
 }
+static inline void fold_ubc_nowr(void)
+{
+}
 #endif /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */
 
 extern const struct trace_print_flags pageflag_names[];
diff --git a/mm/rmap.c b/mm/rmap.c
index 19392e090bec..d18460a48485 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -605,6 +605,22 @@ struct anon_vma *folio_lock_anon_vma_read(struct folio *folio,
 }
 
 #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
+
+void fold_ubc_nowr(void)
+{
+	struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;
+	struct tlbflush_unmap_batch *tlb_ubc_nowr = &current->tlb_ubc_nowr;
+
+	if (!tlb_ubc_nowr->nr_flush_required)
+		return;
+
+	arch_tlbbatch_fold(&tlb_ubc->arch, &tlb_ubc_nowr->arch);
+	tlb_ubc->writable = tlb_ubc->writable || tlb_ubc_nowr->writable;
+	tlb_ubc->nr_flush_required += tlb_ubc_nowr->nr_flush_required;
+	tlb_ubc_nowr->nr_flush_required = 0;
+	tlb_ubc_nowr->writable = false;
+}
+
 /*
  * Flush TLB entries for recently unmapped pages from remote CPUs. It is
  * important if a PTE was dirty when it was unmapped that it's flushed
@@ -615,11 +631,12 @@ void try_to_unmap_flush(void)
 {
 	struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;
 
-	if (!tlb_ubc->flush_required)
+	fold_ubc_nowr();
+	if (!tlb_ubc->nr_flush_required)
 		return;
 
 	arch_tlbbatch_flush(&tlb_ubc->arch);
-	tlb_ubc->flush_required = false;
+	tlb_ubc->nr_flush_required = 0;
 	tlb_ubc->writable = false;
 }
 
@@ -627,8 +644,9 @@ void try_to_unmap_flush(void)
 void try_to_unmap_flush_dirty(void)
 {
 	struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;
+	struct tlbflush_unmap_batch *tlb_ubc_nowr = &current->tlb_ubc_nowr;
 
-	if (tlb_ubc->writable)
+	if (tlb_ubc->writable || tlb_ubc_nowr->writable)
 		try_to_unmap_flush();
 }
 
@@ -644,15 +662,16 @@ void try_to_unmap_flush_dirty(void)
 static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval)
 {
-	struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;
+	struct tlbflush_unmap_batch *tlb_ubc;
 	int batch;
 	bool writable = pte_dirty(pteval);
 
 	if (!pte_accessible(mm, pteval))
 		return;
 
+	tlb_ubc = pte_write(pteval) ?
+			&current->tlb_ubc : &current->tlb_ubc_nowr;
 	arch_tlbbatch_add_mm(&tlb_ubc->arch, mm);
-	tlb_ubc->flush_required = true;
+	tlb_ubc->nr_flush_required += 1;
 
 	/*
 	 * Ensure compiler does not re-order the setting of tlb_flush_batched

From patchwork Thu Aug 17 08:05:55 2023
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13356109
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com, namit@vmware.com, xhao@linux.alibaba.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, dave.hansen@linux.intel.com
Subject: [RFC v2 2/6] mm: Defer TLB flush by keeping both src and dst folios at migration
Date: Thu, 17 Aug 2023 17:05:55 +0900
Message-Id: <20230817080559.43200-3-byungchul@sk.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20230817080559.43200-1-byungchul@sk.com>
References: <20230817080559.43200-1-byungchul@sk.com>

Implementation of CONFIG_MIGRC, which stands for 'Migration Read Copy'.
While working with tiered memory, e.g. CXL memory, we always face
migration overhead at either promotion or demotion, and we found that
TLB shootdown is a significant part of it that should be eliminated if
possible.
Fortunately, a TLB flush can be deferred or even skipped if both the
source and destination folios of a migration are kept around until all
the required TLB flushes have been done, as long as the target PTE
entries are read-only, or more precisely, do not have write permission.
Otherwise, the folio might get corrupted.

To achieve that:

   1. For folios that map only to non-writable TLB entries, prevent the
      TLB flush at migration by keeping both the source and destination
      folios, which will be handled later at a better time.

   2. When any non-writable TLB entry changes to writable, e.g. through
      the fault handler, give up the CONFIG_MIGRC mechanism and perform
      the required TLB flush right away.

The measurement result:

   Architecture - x86_64
   QEMU - kvm enabled, host cpu, 2 nodes {(4 cpus, 2GB), (cpuless, 6GB)}
   Linux Kernel - v6.4, numa balancing tiering on, demotion enabled
   Benchmark - XSBench with no parameter changed

   run 'perf stat' using events:
      1) itlb.itlb_flush
      2) tlb_flush.dtlb_thread
      3) tlb_flush.stlb_any

   run 'cat /proc/vmstat' and pick up:
      1) pgdemote_kswapd
      2) numa_pages_migrated
      3) pgmigrate_success
      4) nr_tlb_remote_flush
      5) nr_tlb_remote_flush_received
      6) nr_tlb_local_flush_all
      7) nr_tlb_local_flush_one

BEFORE - mainline v6.4
------------------------------------------
$ perf stat -e itlb.itlb_flush,tlb_flush.dtlb_thread,tlb_flush.stlb_any ./XSBench

Performance counter stats for './XSBench':

        426856      itlb.itlb_flush
       6900414      tlb_flush.dtlb_thread
       7303137      tlb_flush.stlb_any

  33.500486566 seconds time elapsed
  92.852128000 seconds user
  10.526718000 seconds sys

$ cat /proc/vmstat
...
pgdemote_kswapd 1052596
numa_pages_migrated 1052359
pgmigrate_success 2161846
nr_tlb_remote_flush 72370
nr_tlb_remote_flush_received 213711
nr_tlb_local_flush_all 3385
nr_tlb_local_flush_one 198679
...

AFTER - mainline v6.4 + CONFIG_MIGRC
------------------------------------------
$ perf stat -e itlb.itlb_flush,tlb_flush.dtlb_thread,tlb_flush.stlb_any ./XSBench

Performance counter stats for './XSBench':

        179537      itlb.itlb_flush
       6131135      tlb_flush.dtlb_thread
       6920979      tlb_flush.stlb_any

  30.396700625 seconds time elapsed
  80.331252000 seconds user
  10.303761000 seconds sys

$ cat /proc/vmstat
...
pgdemote_kswapd 1044602
numa_pages_migrated 1044202
pgmigrate_success 2157808
nr_tlb_remote_flush 30453
nr_tlb_remote_flush_received 88840
nr_tlb_local_flush_all 3039
nr_tlb_local_flush_one 198875
...
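For illustration, a minimal userspace sketch of the core idea follows (a
toy model only, not the kernel implementation; the real bookkeeping
lives in the migrc_req machinery added below): the old copy of a
migrated page is parked on a pending list and is only freed once the
deferred TLB flush has actually run.

#include <stdlib.h>
#include <string.h>

/* Toy model: a "migrated" source page is not freed immediately; it is
 * parked on a pending list so stale (read-only) TLB entries still see
 * valid contents, and it is freed only after the deferred flush. */
struct pending_page {
	void *src;			/* old copy kept alive */
	struct pending_page *next;
};

static struct pending_page *pending;	/* plays the role of a migrc request */

static void migrate_deferring_flush(void **mapping, size_t size)
{
	void *dst = malloc(size);
	struct pending_page *p = malloc(sizeof(*p));

	memcpy(dst, *mapping, size);	/* copy contents to the new page */
	p->src = *mapping;		/* keep the old page instead of freeing */
	p->next = pending;
	pending = p;
	*mapping = dst;			/* the new "PTE" points at the copy */
}

static void deferred_tlb_flush(void)
{
	/* A real implementation would flush the TLBs here; only then is
	 * it safe to free the old copies. */
	while (pending) {
		struct pending_page *p = pending;

		pending = p->next;
		free(p->src);
		free(p);
	}
}

int main(void)
{
	void *page = malloc(4096);

	memset(page, 0, 4096);
	migrate_deferring_flush(&page, 4096);
	deferred_tlb_flush();		/* old copy is freed only here */
	free(page);
	return 0;
}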
Signed-off-by: Byungchul Park
---
 arch/x86/include/asm/tlbflush.h |   1 +
 arch/x86/mm/tlb.c               |   5 +
 include/linux/mm.h              |  28 +++
 include/linux/mm_types.h        |  47 +++++
 include/linux/mmzone.h          |   3 +
 include/linux/sched.h           |   4 +
 init/Kconfig                    |  13 ++
 mm/internal.h                   |  10 ++
 mm/memory.c                     |  17 +-
 mm/migrate.c                    | 296 +++++++++++++++++++++++++++++++-
 mm/mm_init.c                    |   1 +
 mm/page_alloc.c                 |   3 +
 mm/rmap.c                       | 103 +++++++++++
 13 files changed, 526 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 63504cde364b..752d72ea209b 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -279,6 +279,7 @@ static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch,
 }
 
 extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
+extern void arch_tlbbatch_clean(struct arch_tlbflush_unmap_batch *batch);
 extern void arch_tlbbatch_fold(struct arch_tlbflush_unmap_batch *bdst,
 			       struct arch_tlbflush_unmap_batch *bsrc);
 
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 69d145f1fff1..2dabf0f340fb 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1265,6 +1265,11 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 	put_cpu();
 }
 
+void arch_tlbbatch_clean(struct arch_tlbflush_unmap_batch *batch)
+{
+	cpumask_clear(&batch->cpumask);
+}
+
 void arch_tlbbatch_fold(struct arch_tlbflush_unmap_batch *bdst,
 			struct arch_tlbflush_unmap_batch *bsrc)
 {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 27ce77080c79..1ceec7f3591e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3816,4 +3816,32 @@ madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
 }
 #endif
 
+#ifdef CONFIG_MIGRC
+void migrc_init_page(struct page *p);
+bool migrc_pending(struct folio *f);
+void migrc_shrink(struct llist_head *h);
+void migrc_req_start(void);
+void migrc_req_end(void);
+bool migrc_req_processing(void);
+bool migrc_try_flush_free_folios(void);
+void migrc_try_flush_free_folios_dirty(void);
+struct migrc_req *fold_ubc_nowr_to_migrc(void);
+void free_migrc_req(struct migrc_req *req);
+
+extern atomic_t migrc_gen;
+extern struct llist_head migrc_reqs;
+extern struct llist_head migrc_reqs_dirty;
+#else
+static inline void migrc_init_page(struct page *p) {}
+static inline bool migrc_pending(struct folio *f) { return false; }
+static inline void migrc_shrink(struct llist_head *h) {}
+static inline void migrc_req_start(void) {}
+static inline void migrc_req_end(void) {}
+static inline bool migrc_req_processing(void) { return false; }
+static inline bool migrc_try_flush_free_folios(void) { return false; }
+static inline void migrc_try_flush_free_folios_dirty(void) {}
+static inline struct migrc_req *fold_ubc_nowr_to_migrc(void) { return NULL; }
+static inline void free_migrc_req(struct migrc_req *req) {}
+#endif
+
 #endif /* _LINUX_MM_H */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 306a3d1a0fa6..56011670a6fe 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -228,6 +228,23 @@ struct page {
 #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
 	int _last_cpupid;
 #endif
+#ifdef CONFIG_MIGRC
+	/*
+	 * XXX: Need to get rid of the following additional fields.
+	 */
+
+	/*
+	 * for hanging onto a request(struct migrc_req), waiting for TLB
+	 * flushes to free up this page
+	 */
+	struct llist_node migrc_node;
+
+	/*
+	 * for keeping a state of this page e.g. whether pending for TLB
+	 * flushes, whether duplicated or whether in the initial state
+	 */
+	unsigned int migrc_state;
+#endif
 } _struct_page_alignment;
 
 /*
@@ -1255,4 +1272,34 @@ enum {
 	/* See also internal only FOLL flags in mm/internal.h */
 };
 
+#ifdef CONFIG_MIGRC
+struct migrc_req {
+	/*
+	 * pages hung onto this, pending for TLB flush
+	 */
+	struct llist_head pages;
+
+	/*
+	 * llist_node of the last page in 'pages'
+	 */
+	struct llist_node *last;
+
+	/*
+	 * for hanging onto the global llist, 'migrc_reqs'
+	 */
+	struct llist_node llnode;
+
+	/*
+	 * architecture specific batch data
+	 */
+	struct arch_tlbflush_unmap_batch arch;
+
+	/*
+	 * when this hung onto the global llist, 'migrc_reqs'
+	 */
+	int gen;
+};
+#else
+struct migrc_req {};
+#endif
 #endif /* _LINUX_MM_TYPES_H */
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index a4889c9d4055..6d645beaf7a6 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1371,6 +1371,9 @@ typedef struct pglist_data {
 #ifdef CONFIG_MEMORY_FAILURE
 	struct memory_failure_stats mf_stats;
 #endif
+#ifdef CONFIG_MIGRC
+	atomic_t migrc_pending_nr;
+#endif
 } pg_data_t;
 
 #define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2232b2cdfce8..d0a46089959d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1323,6 +1323,10 @@ struct task_struct {
 
 	struct tlbflush_unmap_batch	tlb_ubc;
 	struct tlbflush_unmap_batch	tlb_ubc_nowr;
+#ifdef CONFIG_MIGRC
+	struct migrc_req		*mreq;
+	struct migrc_req		*mreq_dirty;
+#endif
 
 	/* Cache last used pipe for splice(): */
 	struct pipe_inode_info		*splice_pipe;
diff --git a/init/Kconfig b/init/Kconfig
index 32c24950c4ce..9f9d0f7e15d2 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -907,6 +907,19 @@ config NUMA_BALANCING_DEFAULT_ENABLED
 	  If set, automatic NUMA balancing will be enabled if running on a NUMA
 	  machine.
 
+config MIGRC
+	bool "Deferring TLB flush by keeping read copies on migration"
+	depends on ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
+	depends on NUMA_BALANCING
+	default n
+	help
+	  TLB flush is necessary when PTE changes by migration. However,
+	  TLB flush can be deferred if both copies of the src page and
+	  the dst page are kept until TLB flush if they are non-writable.
+	  System performance will be improved especially in case that
+	  promotion and demotion types of migrations are heavily
+	  happening.
+
 menuconfig CGROUPS
 	bool "Control Group support"
 	select KERNFS
diff --git a/mm/internal.h b/mm/internal.h
index b90d516ad41f..a8e3168614d6 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -841,6 +841,8 @@ void try_to_unmap_flush(void);
 void try_to_unmap_flush_dirty(void);
 void flush_tlb_batched_pending(struct mm_struct *mm);
 void fold_ubc_nowr(void);
+int nr_flush_required(void);
+int nr_flush_required_nowr(void);
 #else
 static inline void try_to_unmap_flush(void)
 {
@@ -854,6 +856,14 @@ static inline void flush_tlb_batched_pending(struct mm_struct *mm)
 static inline void fold_ubc_nowr(void)
 {
 }
+static inline int nr_flush_required(void)
+{
+	return 0;
+}
+static inline int nr_flush_required_nowr(void)
+{
+	return 0;
+}
 #endif /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */
 
 extern const struct trace_print_flags pageflag_names[];
diff --git a/mm/memory.c b/mm/memory.c
index f69fbc251198..066b7d5b7217 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3345,6 +3345,20 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 
 	vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
 
+	if (vmf->page)
+		folio = page_folio(vmf->page);
+
+	/*
+	 * This folio has its read copy to prevent inconsistency while
+	 * deferring TLB flushes. However, the problem might arise if
+	 * it's going to become writable.
+	 *
+	 * To prevent it, give up the deferring TLB flushes and perform
+	 * TLB flush right away.
+	 */
+	if (folio && migrc_pending(folio))
+		migrc_try_flush_free_folios();
+
 	/*
 	 * Shared mapping: we are guaranteed to have VM_WRITE and
 	 * FAULT_FLAG_WRITE set at this point.
@@ -3362,9 +3376,6 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 		return wp_page_shared(vmf);
 	}
 
-	if (vmf->page)
-		folio = page_folio(vmf->page);
-
 	/*
 	 * Private mapping: create an exclusive anonymous page copy if reuse
 	 * is impossible. We might miss VM_WRITE for FOLL_FORCE handling.
diff --git a/mm/migrate.c b/mm/migrate.c
index 01cac26a3127..f9446f5b312a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -58,6 +58,230 @@
 
 #include "internal.h"
 
+#ifdef CONFIG_MIGRC
+
+/*
+ * TODO: Yeah, it's a non-sense magic number. This simple value manages
+ *	 to work conservatively anyway. However, the value needs to be
+ *	 tuned and adjusted based on the internal condition of memory
+ *	 management subsystem later.
+ *
+ *	 Let's start with a simple value for now.
+ */
+static const int migrc_pending_max = 512;	/* unit: page */
+
+atomic_t migrc_gen;
+LLIST_HEAD(migrc_reqs);
+LLIST_HEAD(migrc_reqs_dirty);
+
+enum {
+	MIGRC_STATE_NONE,
+	MIGRC_SRC_PENDING,
+	MIGRC_DST_PENDING,
+};
+
+static struct migrc_req *alloc_migrc_req(void)
+{
+	return kmalloc(sizeof(struct migrc_req), GFP_KERNEL);
+}
+
+void free_migrc_req(struct migrc_req *req)
+{
+	kfree(req);
+}
+
+static bool migrc_is_full(int nid)
+{
+	struct pglist_data *node = NODE_DATA(nid);
+
+	if (migrc_pending_max == -1)
+		return false;
+
+	return atomic_read(&node->migrc_pending_nr) >= migrc_pending_max;
+}
+
+void migrc_init_page(struct page *p)
+{
+	WRITE_ONCE(p->migrc_state, MIGRC_STATE_NONE);
+}
+
+/*
+ * The list should be isolated before.
+ */
+void migrc_shrink(struct llist_head *h)
+{
+	struct page *p, *p2;
+	struct llist_node *n;
+
+	n = llist_del_all(h);
+	llist_for_each_entry_safe(p, p2, n, migrc_node) {
+		if (p->migrc_state == MIGRC_SRC_PENDING) {
+			struct pglist_data *node;
+
+			node = NODE_DATA(page_to_nid(p));
+			atomic_dec(&node->migrc_pending_nr);
+		}
+
+		if (WARN_ON(!migrc_pending(page_folio(p))))
+			continue;
+
+		WRITE_ONCE(p->migrc_state, MIGRC_STATE_NONE);
+
+		/*
+		 * Ensure the folio is in the initial state once it has
+		 * been freed and then allocated.
+		 */
+		smp_wmb();
+
+		folio_put(page_folio(p));
+	}
+}
+
+static inline bool migrc_src_pending(struct folio *f)
+{
+	/*
+	 * For the case called from page fault handler, make sure the
+	 * order between seeing updated PTE and reading migrc_state.
+	 *
+	 * Or should be able to see the initial state if no one has
+	 * touched the state since allocation.
+	 */
+	smp_rmb();
+	return READ_ONCE(f->page.migrc_state) == MIGRC_SRC_PENDING;
+}
+
+static inline bool migrc_dst_pending(struct folio *f)
+{
+	/*
+	 * For the case called from page fault handler, make sure the
+	 * order between seeing updated PTE and reading migrc_state.
+	 *
+	 * Or should be able to see the initial state if no one has
+	 * touched the state since allocation.
+	 */
+	smp_rmb();
+	return READ_ONCE(f->page.migrc_state) == MIGRC_DST_PENDING;
+}
+
+bool migrc_pending(struct folio *f)
+{
+	return migrc_src_pending(f) || migrc_dst_pending(f);
+}
+
+static void migrc_expand_req(struct folio *fsrc, struct folio *fdst)
+{
+	struct migrc_req *req;
+	struct pglist_data *node;
+
+	req = fold_ubc_nowr_to_migrc();
+	if (!req)
+		return;
+
+	folio_get(fsrc);
+	WRITE_ONCE(fsrc->page.migrc_state, MIGRC_SRC_PENDING);
+	WRITE_ONCE(fdst->page.migrc_state, MIGRC_DST_PENDING);
+
+	/*
+	 * Keep the order between migrc_state update and PTE update.
+	 */
+	smp_wmb();
+
+	if (llist_add(&fsrc->page.migrc_node, &req->pages))
+		req->last = &fsrc->page.migrc_node;
+
+	node = NODE_DATA(folio_nid(fsrc));
+	atomic_inc(&node->migrc_pending_nr);
+
+	if (migrc_is_full(folio_nid(fsrc)))
+		migrc_try_flush_free_folios();
+}
+
+/*
+ * To start to gather pages pending for TLB flushes, try to allocate
+ * objects needed, initialize them and make them ready.
+ */
+void migrc_req_start(void)
+{
+	struct migrc_req *req;
+	struct migrc_req *req_dirty;
+
+	if (WARN_ON(current->mreq || current->mreq_dirty))
+		return;
+
+	req = alloc_migrc_req();
+	req_dirty = alloc_migrc_req();
+
+	if (!req || !req_dirty)
+		goto fail;
+
+	arch_tlbbatch_clean(&req->arch);
+	init_llist_head(&req->pages);
+	req->last = NULL;
+	current->mreq = req;
+
+	/*
+	 * Gather pages having a mapping, pte_dirty() == true, in a
+	 * separate request to handle try_to_unmap_flush_dirty().
+	 */
+	arch_tlbbatch_clean(&req_dirty->arch);
+	init_llist_head(&req_dirty->pages);
+	req_dirty->last = NULL;
+	current->mreq_dirty = req_dirty;
+	return;
+fail:
+	if (req_dirty)
+		free_migrc_req(req_dirty);
+	if (req)
+		free_migrc_req(req);
+}
+
+/*
+ * Hang the request with the collected pages onto the global llist,
+ * 'migrc_reqs', which will be referred when performing TLB flush via
+ * migrc_try_flush_free_folios().
+ */
+void migrc_req_end(void)
+{
+	struct migrc_req *req = current->mreq;
+	struct migrc_req *req_dirty = current->mreq_dirty;
+
+	WARN_ON((!req && req_dirty) || (req && !req_dirty));
+
+	if (!req || !req_dirty)
+		return;
+
+	if (llist_empty(&req->pages)) {
+		free_migrc_req(req);
+	} else {
+		req->gen = atomic_inc_return(&migrc_gen);
+		llist_add(&req->llnode, &migrc_reqs);
+	}
+	current->mreq = NULL;
+
+	/*
+	 * Gather pages having a mapping, pte_dirty() == true, in a
+	 * separate request to handle try_to_unmap_flush_dirty().
+	 */
+	if (llist_empty(&req_dirty->pages)) {
+		free_migrc_req(req_dirty);
+	} else {
+		req_dirty->gen = atomic_inc_return(&migrc_gen);
+		llist_add(&req_dirty->llnode, &migrc_reqs_dirty);
+	}
+	current->mreq_dirty = NULL;
+}
+
+bool migrc_req_processing(void)
+{
+	return current->mreq && current->mreq_dirty;
+}
+#else
+static inline bool migrc_src_pending(struct folio *f) { return false; }
+static inline bool migrc_dst_pending(struct folio *f) { return false; }
+static inline bool migrc_is_full(int nid) { return true; }
+static inline void migrc_expand_req(struct folio *fsrc, struct folio *fdst) {}
+#endif
+
 bool isolate_movable_page(struct page *page, isolate_mode_t mode)
 {
 	struct folio *folio = folio_get_nontail_page(page);
@@ -383,6 +607,9 @@ static int folio_expected_refs(struct address_space *mapping,
 		struct folio *folio)
 {
 	int refs = 1;
+
+	refs += migrc_src_pending(folio) ? 1 : 0;
+
 	if (!mapping)
 		return refs;
 
@@ -1060,6 +1287,12 @@ static void migrate_folio_undo_src(struct folio *src,
 			bool locked,
 			struct list_head *ret)
 {
+	/*
+	 * TODO: There might be folios already pending for migrc.
+	 *	 However, there's no way to cancel those on failure for now.
+	 *	 Let's reflect the requirement when needed.
+	 */
+
 	if (page_was_mapped)
 		remove_migration_ptes(src, src, false);
 	/* Drop an anon_vma reference if we took one */
@@ -1627,10 +1860,17 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 	LIST_HEAD(unmap_folios);
 	LIST_HEAD(dst_folios);
 	bool nosplit = (reason == MR_NUMA_MISPLACED);
+	bool migrc_cond1;
+	bool need_migrc_flush = false;
 
 	VM_WARN_ON_ONCE(mode != MIGRATE_ASYNC &&
 			!list_empty(from) && !list_is_singular(from));
 
+	migrc_cond1 = (reason == MR_DEMOTION && current_is_kswapd()) ||
+		      (reason == MR_NUMA_MISPLACED);
+
+	if (migrc_cond1)
+		migrc_req_start();
 	for (pass = 0; pass < nr_pass && (retry || large_retry); pass++) {
 		retry = 0;
 		large_retry = 0;
@@ -1638,6 +1878,10 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 		nr_retry_pages = 0;
 
 		list_for_each_entry_safe(folio, folio2, from, lru) {
+			int nr_required;
+			bool migrc_cond2;
+			bool migrc;
+
 			/*
 			 * Large folio statistics is based on the source large
 			 * folio. Capture required information that might get
@@ -1671,8 +1915,33 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 				continue;
 			}
 
+			nr_required = nr_flush_required();
 			rc = migrate_folio_unmap(get_new_page, put_new_page, private,
 						 folio, &dst, mode, reason, ret_folios);
+			migrc_cond2 = nr_required == nr_flush_required() &&
+				      nr_flush_required_nowr() &&
+				      !migrc_is_full(folio_nid(folio));
+			migrc = migrc_cond1 && migrc_cond2;
+
+			/*
+			 * This folio already has been participating in
+			 * migrc mechanism previously.
+			 *
+			 * If it was the destination at the migration,
+			 * then TLB flush is needed to keep consistency
+			 * because the folio is about to move to
+			 * somewhere else again.
+			 *
+			 * If it was the source at the migration which
+			 * cannot happen because the folio was isolated
+			 * at that time so it couldn't play here, then
+			 * it should be warned.
+			 */
+			if (migrc_dst_pending(folio))
+				need_migrc_flush = true;
+			else if (migrc_src_pending(folio))
+				WARN_ON(1);
+
 			/*
 			 * The rules are:
 			 *	Success: folio will be freed
@@ -1722,9 +1991,11 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 				nr_large_failed += large_retry;
 				stats->nr_thp_failed += thp_retry;
 				rc_saved = rc;
-				if (list_empty(&unmap_folios))
+				if (list_empty(&unmap_folios)) {
+					if (migrc_cond1)
+						migrc_req_end();
 					goto out;
-				else
+				} else
 					goto move;
 			case -EAGAIN:
 				if (is_large) {
@@ -1742,6 +2013,13 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 			case MIGRATEPAGE_UNMAP:
 				list_move_tail(&folio->lru, &unmap_folios);
 				list_add_tail(&dst->lru, &dst_folios);
+
+				if (migrc)
+					/*
+					 * XXX: On migration failure,
+					 * extra TLB flush might happen.
+					 */
+					migrc_expand_req(folio, dst);
 				break;
 			default:
 				/*
@@ -1760,6 +2038,14 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 				stats->nr_failed_pages += nr_pages;
 				break;
 			}
+
+			/*
+			 * Setting up TLB batch information for this
+			 * folio is done. It's about to move on to the
+			 * next folio. Fold the remaining TLB batch
+			 * information if it exists.
+			 */
+			fold_ubc_nowr();
 		}
 	}
 	nr_failed += retry;
 	nr_large_failed += large_retry;
 	stats->nr_thp_failed += thp_retry;
 	stats->nr_failed_pages += nr_retry_pages;
 move:
+	if (migrc_cond1)
+		migrc_req_end();
+
 	/* Flush TLBs for all unmapped folios */
 	try_to_unmap_flush();
 
+	if (need_migrc_flush)
+		migrc_try_flush_free_folios();
+
 	retry = 1;
 	for (pass = 0; pass < nr_pass && (retry || large_retry); pass++) {
 		retry = 0;
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 7f7f9c677854..87cbddc7d780 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -558,6 +558,7 @@ static void __meminit __init_single_page(struct page *page, unsigned long pfn,
 	page_mapcount_reset(page);
 	page_cpupid_reset_last(page);
 	page_kasan_tag_reset(page);
+	migrc_init_page(page);
 
 	INIT_LIST_HEAD(&page->lru);
 #ifdef WANT_PAGE_VIRTUAL
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 47421bedc12b..c51cbdb45d86 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1730,6 +1730,9 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
 
 	set_page_owner(page, order, gfp_flags);
 	page_table_check_alloc(page, order);
+
+	for (i = 0; i != 1 << order; ++i)
+		migrc_init_page(page + i);
 }
 
 static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
diff --git a/mm/rmap.c b/mm/rmap.c
index d18460a48485..0652d25206ee 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -606,6 +606,97 @@ struct anon_vma *folio_lock_anon_vma_read(struct folio *folio,
 
 #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
 
+#ifdef CONFIG_MIGRC
+static bool __migrc_try_flush_free_folios(struct llist_head *h)
+{
+	struct arch_tlbflush_unmap_batch arch;
+	struct llist_node *reqs;
+	struct migrc_req *req;
+	struct migrc_req *req2;
+	LLIST_HEAD(pages);
+
+	reqs = llist_del_all(h);
+	if (!reqs)
+		return false;
+
+	arch_tlbbatch_clean(&arch);
+
+	/*
+	 * TODO: Optimize the time complexity.
+	 */
+	llist_for_each_entry_safe(req, req2, reqs, llnode) {
+		struct llist_node *n;
+
+		arch_tlbbatch_fold(&arch, &req->arch);
+
+		n = llist_del_all(&req->pages);
+		llist_add_batch(n, req->last, &pages);
+		free_migrc_req(req);
+	}
+
+	arch_tlbbatch_flush(&arch);
+	migrc_shrink(&pages);
+	return true;
+}
+
+bool migrc_try_flush_free_folios(void)
+{
+	bool ret;
+
+	/*
+	 * If building up a request collecting pages is in progress,
+	 * the request needs to be finalized and hung onto the global
+	 * llist, 'migrc_reqs', so that it can be used in migrc's TLB
+	 * flush routine, i.e. __migrc_try_flush_free_folios().
+	 */
+	if (migrc_req_processing()) {
+		migrc_req_end();
+		migrc_req_start();
+	}
+	ret = __migrc_try_flush_free_folios(&migrc_reqs);
+	ret = ret || __migrc_try_flush_free_folios(&migrc_reqs_dirty);
+
+	return ret;
+}
+
+void migrc_try_flush_free_folios_dirty(void)
+{
+	/*
+	 * If building up a request collecting pages is in progress,
+	 * the request needs to be finalized and hung onto the global
+	 * llist, 'migrc_reqs', so that it can be used in migrc's TLB
+	 * flush routine, i.e. __migrc_try_flush_free_folios().
+	 */
+	if (migrc_req_processing()) {
+		migrc_req_end();
+		migrc_req_start();
+	}
+	__migrc_try_flush_free_folios(&migrc_reqs_dirty);
+}
+
+struct migrc_req *fold_ubc_nowr_to_migrc(void)
+{
+	struct tlbflush_unmap_batch *tlb_ubc_nowr = &current->tlb_ubc_nowr;
+	struct migrc_req *req;
+	bool dirty;
+
+	if (!tlb_ubc_nowr->nr_flush_required)
+		return NULL;
+
+	dirty = tlb_ubc_nowr->writable;
+	req = dirty ? current->mreq_dirty : current->mreq;
+	if (!req) {
+		fold_ubc_nowr();
+		return NULL;
+	}
+
+	arch_tlbbatch_fold(&req->arch, &tlb_ubc_nowr->arch);
+	tlb_ubc_nowr->nr_flush_required = 0;
+	tlb_ubc_nowr->writable = false;
+	return req;
+}
+#endif
+
 void fold_ubc_nowr(void)
 {
 	struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;
@@ -621,6 +712,16 @@ void fold_ubc_nowr(void)
 	tlb_ubc_nowr->writable = false;
 }
 
+int nr_flush_required(void)
+{
+	return current->tlb_ubc.nr_flush_required;
+}
+
+int nr_flush_required_nowr(void)
+{
+	return current->tlb_ubc_nowr.nr_flush_required;
+}
+
 /*
  * Flush TLB entries for recently unmapped pages from remote CPUs. It is
  * important if a PTE was dirty when it was unmapped that it's flushed
@@ -648,6 +749,8 @@ void try_to_unmap_flush_dirty(void)
 
 	if (tlb_ubc->writable || tlb_ubc_nowr->writable)
 		try_to_unmap_flush();
+
+	migrc_try_flush_free_folios_dirty();
 }
 
 /*

From patchwork Thu Aug 17 08:05:56 2023
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13356107
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com, namit@vmware.com, xhao@linux.alibaba.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, dave.hansen@linux.intel.com
Subject: [RFC v2 3/6] mm, migrc: Skip TLB flushes at the CPUs that already have been done
Date: Thu, 17 Aug 2023 17:05:56 +0900
Message-Id: <20230817080559.43200-4-byungchul@sk.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20230817080559.43200-1-byungchul@sk.com>
References: <20230817080559.43200-1-byungchul@sk.com>

TLB flushes can be skipped if the requested flushes have already been
performed for any reason, not necessarily because of migration. This
can be tracked by keeping a timestamp (= migrc_gen) of when a flush is
requested and of when a flush is actually performed.
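For illustration, a minimal userspace sketch of the generation tracking
follows (a toy model only; migrc_gen and the per-CPU migrc_done counter
appear in the hunks below): a request records the generation at which it
was queued, and any CPU whose last local flush is at least that recent
can be skipped.

#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 4

/* Toy model: a global generation counter is bumped whenever a deferred
 * flush request is queued; each CPU records the generation it had
 * already reached when it last flushed its own TLB. */
static int migrc_gen;			/* bumped per queued request */
static int migrc_done[NR_CPUS];		/* last gen each CPU flushed at */

/* Wrap-around-safe "a is older than b" comparison. */
static bool before(int a, int b)
{
	return a - b < 0;
}

static void cpu_local_flush(int cpu)
{
	migrc_done[cpu] = migrc_gen;	/* this CPU is now up to date */
}

static bool cpu_needs_flush(int cpu, int req_gen)
{
	return before(migrc_done[cpu], req_gen);
}

int main(void)
{
	int req_gen = ++migrc_gen;	/* a deferred request is queued */

	cpu_local_flush(1);		/* CPU 1 flushes for another reason */
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		printf("cpu%d needs flush: %d\n",
		       cpu, cpu_needs_flush(cpu, req_gen));
	return 0;			/* only CPU 1 can be skipped */
}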
Signed-off-by: Byungchul Park
---
 arch/x86/include/asm/tlbflush.h |  6 ++++
 arch/x86/mm/tlb.c               | 55 +++++++++++++++++++++++++++++++++
 mm/migrate.c                    | 10 ++++++
 mm/rmap.c                       |  1 +
 4 files changed, 72 insertions(+)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 752d72ea209b..da987c15049e 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -283,6 +283,12 @@ extern void arch_tlbbatch_clean(struct arch_tlbflush_unmap_batch *batch);
 extern void arch_tlbbatch_fold(struct arch_tlbflush_unmap_batch *bdst,
 			       struct arch_tlbflush_unmap_batch *bsrc);
 
+#ifdef CONFIG_MIGRC
+extern void arch_migrc_adj(struct arch_tlbflush_unmap_batch *batch, int gen);
+#else
+static inline void arch_migrc_adj(struct arch_tlbflush_unmap_batch *batch, int gen) {}
+#endif
+
 static inline bool pte_flags_need_flush(unsigned long oldflags,
 					unsigned long newflags,
 					bool ignore_access)
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 2dabf0f340fb..913cad013979 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1210,9 +1210,48 @@ STATIC_NOPV void native_flush_tlb_local(void)
 	native_write_cr3(__native_read_cr3());
 }
 
+#ifdef CONFIG_MIGRC
+DEFINE_PER_CPU(int, migrc_done);
+
+static inline int migrc_tlb_local_begin(void)
+{
+	int ret = atomic_read(&migrc_gen);
+
+	/*
+	 * XXX: barrier() would be sufficient if the architecture
+	 * guarantees the order between memory access and TLB flush.
+	 */
+	smp_mb();
+	return ret;
+}
+
+static inline void migrc_tlb_local_end(int gen)
+{
+	/*
+	 * XXX: barrier() would be sufficient if the architecture
+	 * guarantees the order between TLB flush and memory access.
+	 */
+	smp_mb();
+	WRITE_ONCE(*this_cpu_ptr(&migrc_done), gen);
+}
+#else
+static inline int migrc_tlb_local_begin(void)
+{
+	return 0;
+}
+
+static inline void migrc_tlb_local_end(int gen)
+{
+}
+#endif
+
 void flush_tlb_local(void)
 {
+	unsigned int gen;
+
+	gen = migrc_tlb_local_begin();
 	__flush_tlb_local();
+	migrc_tlb_local_end(gen);
 }
 
 /*
@@ -1237,6 +1276,22 @@ void __flush_tlb_all(void)
 }
 EXPORT_SYMBOL_GPL(__flush_tlb_all);
 
+#ifdef CONFIG_MIGRC
+static inline bool before(int a, int b)
+{
+	return a - b < 0;
+}
+
+void arch_migrc_adj(struct arch_tlbflush_unmap_batch *batch, int gen)
+{
+	int cpu;
+
+	for_each_cpu(cpu, &batch->cpumask)
+		if (!before(READ_ONCE(*per_cpu_ptr(&migrc_done, cpu)), gen))
+			cpumask_clear_cpu(cpu, &batch->cpumask);
+}
+#endif
+
 void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 {
 	struct flush_tlb_info *info;
diff --git a/mm/migrate.c b/mm/migrate.c
index f9446f5b312a..c7b72d275b2a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2053,6 +2053,16 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 	stats->nr_thp_failed += thp_retry;
 	stats->nr_failed_pages += nr_retry_pages;
 move:
+	/*
+	 * Should be prior to try_to_unmap_flush() so that
+	 * migrc_try_flush_free_folios() that will be called later
+	 * can take benefit from the TLB flushes in try_to_unmap_flush().
+	 *
+	 * migrc_req_end() will store the timestamp for pending, and
+	 * TLB flushes will also store the timestamp for TLB flush so
+	 * that unnecessary TLB flushes can be skipped using the time
+	 * information.
+	 */
 	if (migrc_cond1)
 		migrc_req_end();
 
diff --git a/mm/rmap.c b/mm/rmap.c
index 0652d25206ee..2ae1b1324f84 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -627,6 +627,7 @@ static bool __migrc_try_flush_free_folios(struct llist_head *h)
 	llist_for_each_entry_safe(req, req2, reqs, llnode) {
 		struct llist_node *n;
 
+		arch_migrc_adj(&req->arch, req->gen);
 		arch_tlbbatch_fold(&arch, &req->arch);
 
 		n = llist_del_all(&req->pages);

From patchwork Thu Aug 17 08:05:57 2023
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13356111
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com, namit@vmware.com, xhao@linux.alibaba.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, dave.hansen@linux.intel.com
Subject: [RFC v2 4/6] mm, migrc: Adjust __zone_watermark_ok() with the amount of pending folios
Date: Thu, 17 Aug 2023 17:05:57 +0900
Message-Id: <20230817080559.43200-5-byungchul@sk.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20230817080559.43200-1-byungchul@sk.com>
References: <20230817080559.43200-1-byungchul@sk.com>

CONFIG_MIGRC duplicates folios that participated in migration in order
to avoid TLB flushes and to provide a consistent view to CPUs that are
still caching the old mapping in their TLB.

However, the duplicated folios can be freed and made available right away
through appropriate TLB flushes if needed. Adjust the watermark check
routine, __zone_watermark_ok(), to take the number of duplicated folios
into account, and make the allocation path perform TLB flushes and free the
duplicated folios when it gets into trouble due to memory pressure, even
more aggressively for high-order allocations.

Signed-off-by: Byungchul Park
---
 include/linux/mm.h     |  2 ++
 include/linux/mmzone.h |  3 +++
 mm/migrate.c           | 12 ++++++++++++
 mm/page_alloc.c        | 16 ++++++++++++++++
 4 files changed, 33 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1ceec7f3591e..9df393074e6a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3827,6 +3827,7 @@
 bool migrc_try_flush_free_folios(void);
 void migrc_try_flush_free_folios_dirty(void);
 struct migrc_req *fold_ubc_nowr_to_migrc(void);
 void free_migrc_req(struct migrc_req *req);
+int migrc_pending_nr_in_zone(struct zone *z);
 extern atomic_t migrc_gen;
 extern struct llist_head migrc_reqs;
@@ -3842,6 +3843,7 @@
 static inline bool migrc_try_flush_free_folios(void) { return false; }
 static inline void migrc_try_flush_free_folios_dirty(void) {}
 static inline struct migrc_req *fold_ubc_nowr_to_migrc(void) { return NULL; }
 static inline void free_migrc_req(struct migrc_req *req) {}
+static inline int migrc_pending_nr_in_zone(struct zone *z) { return 0; }
 #endif

 #endif /* _LINUX_MM_H */

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6d645beaf7a6..1ec79bb63ba7 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -958,6 +958,9 @@ struct zone {
 	/* Zone statistics */
 	atomic_long_t		vm_stat[NR_VM_ZONE_STAT_ITEMS];
 	atomic_long_t		vm_numa_event[NR_VM_NUMA_EVENT_ITEMS];
+#ifdef CONFIG_MIGRC
+	atomic_t		migrc_pending_nr;
+#endif
 } ____cacheline_internodealigned_in_smp;

 enum pgdat_flags {

diff --git a/mm/migrate.c b/mm/migrate.c
index c7b72d275b2a..badef3d89c6c 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -117,9 +117,12 @@ void migrc_shrink(struct llist_head *h)
 	llist_for_each_entry_safe(p, p2, n, migrc_node) {
 		if (p->migrc_state == MIGRC_SRC_PENDING) {
 			struct pglist_data *node;
+			struct zone *zone;

 			node = NODE_DATA(page_to_nid(p));
+			zone = page_zone(p);
 			atomic_dec(&node->migrc_pending_nr);
+			atomic_dec(&zone->migrc_pending_nr);
 		}

 		if (WARN_ON(!migrc_pending(page_folio(p))))
@@ -172,6 +175,7 @@ static void migrc_expand_req(struct folio *fsrc, struct folio *fdst)
 {
 	struct migrc_req *req;
 	struct pglist_data *node;
+	struct zone *zone;

 	req = fold_ubc_nowr_to_migrc();
 	if (!req)
@@ -190,7 +194,9 @@ static void migrc_expand_req(struct folio *fsrc, struct folio *fdst)
 	req->last = &fsrc->page.migrc_node;

 	node = NODE_DATA(folio_nid(fsrc));
+	zone = page_zone(&fsrc->page);
 	atomic_inc(&node->migrc_pending_nr);
+	atomic_inc(&zone->migrc_pending_nr);

 	if (migrc_is_full(folio_nid(fsrc)))
 		migrc_try_flush_free_folios();
@@ -275,6 +281,12 @@ bool migrc_req_processing(void)
 {
 	return current->mreq && current->mreq_dirty;
 }
+
+int migrc_pending_nr_in_zone(struct zone *z)
+{
+	return atomic_read(&z->migrc_pending_nr);
+}
 #else
 static inline bool migrc_src_pending(struct folio *f) { return false; }
 static inline bool migrc_dst_pending(struct folio *f) { return false; }

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c51cbdb45d86..9f791c0fa15d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3179,6 +3179,11 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 	long min = mark;
 	int o;

+	/*
+	 * There are pages that can be freed by migrc_try_flush_free_folios().
+	 */
+	free_pages += migrc_pending_nr_in_zone(z);
+
 	/* free_pages may go negative - that's OK */
 	free_pages -= __zone_watermark_unusable_free(z, order, alloc_flags);

@@ -4257,6 +4262,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	unsigned int zonelist_iter_cookie;
 	int reserve_flags;

+	migrc_try_flush_free_folios();
 restart:
 	compaction_retries = 0;
 	no_progress_loops = 0;
@@ -4772,6 +4778,16 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
 	if (likely(page))
 		goto out;

+	if (order && migrc_try_flush_free_folios()) {
+		/*
+		 * Try again after freeing migrc's pending pages in case
+		 * of high order allocation.
+		 */
+		page = get_page_from_freelist(alloc_gfp, order, alloc_flags, &ac);
+		if (likely(page))
+			goto out;
+	}
+
 	alloc_gfp = gfp;
 	ac.spread_dirty_pages = false;
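The watermark tweak above counts migrc's pending folios (pages whose only
remaining cost is a deferred TLB flush) as if they were already free, so the
duplicated folios do not make a zone look artificially short of memory; the
hooks in the allocation paths then make sure those pages can actually be
reclaimed by flushing once pressure shows up. A toy model of the watermark
side, in plain C with made-up names (it is not the kernel's
__zone_watermark_ok(), which also checks per-order free lists and lowmem
reserves):

#include <stdbool.h>

struct zone_model {
	long free_pages;	/* pages free right now */
	long migrc_pending;	/* freeable after a deferred TLB flush */
	long watermark;		/* min/low watermark for the zone */
};

static bool watermark_ok_model(const struct zone_model *z, unsigned int order)
{
	/* Pages that only need a TLB flush are treated as available... */
	long usable = z->free_pages + z->migrc_pending;

	/* ...but the allocation itself still consumes 2^order pages. */
	return usable - (1L << order) >= z->watermark;
}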
From patchwork Thu Aug 17 08:05:58 2023
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13356110
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com,
 namit@vmware.com, xhao@linux.alibaba.com, mgorman@techsingularity.net,
 hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org,
 luto@kernel.org, dave.hansen@linux.intel.com
Subject: [RFC v2 5/6] mm, migrc: Add a sysctl knob to enable/disable MIGRC
 mechanism
Date: Thu, 17 Aug 2023 17:05:58 +0900
Message-Id: <20230817080559.43200-6-byungchul@sk.com>
In-Reply-To: <20230817080559.43200-1-byungchul@sk.com>
References: <20230817080559.43200-1-byungchul@sk.com>

Add a sysctl knob, '/proc/sys/vm/migrc_enable', to switch migrc on and off.

Signed-off-by: Byungchul Park
---
 mm/migrate.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 46 insertions(+), 2 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index badef3d89c6c..c57536a0b2a6 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -59,6 +59,48 @@
 #include "internal.h"

 #ifdef CONFIG_MIGRC
+static int sysctl_migrc_enable = 1;
+#ifdef CONFIG_SYSCTL
+static int sysctl_migrc_enable_handler(struct ctl_table *table, int write,
+		void *buffer, size_t *lenp, loff_t *ppos)
+{
+	struct ctl_table t;
+	int err;
+	int enabled = sysctl_migrc_enable;
+
+	if (write && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	t = *table;
+	t.data = &enabled;
+	err = proc_dointvec_minmax(&t, write, buffer, lenp, ppos);
+	if (err < 0)
+		return err;
+	if (write)
+		sysctl_migrc_enable = enabled;
+	return err;
+}
+
+static struct ctl_table migrc_sysctls[] = {
+	{
+		.procname	= "migrc_enable",
+		.data		= NULL, /* filled in by handler */
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= sysctl_migrc_enable_handler,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
+	{}
+};
+
+static int __init migrc_sysctl_init(void)
+{
+	register_sysctl_init("vm", migrc_sysctls);
+	return 0;
+}
+late_initcall(migrc_sysctl_init);
+#endif

 /*
  * TODO: Yeah, it's a non-sense magic number. This simple value manages
@@ -288,6 +330,7 @@ int migrc_pending_nr_in_zone(struct zone *z)
 }
 #else
+static const int sysctl_migrc_enable;
 static inline bool migrc_src_pending(struct folio *f) { return false; }
 static inline bool migrc_dst_pending(struct folio *f) { return false; }
 static inline bool migrc_is_full(int nid) { return true; }
@@ -1878,8 +1921,9 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 	VM_WARN_ON_ONCE(mode != MIGRATE_ASYNC &&
 			!list_empty(from) && !list_is_singular(from));

-	migrc_cond1 = (reason == MR_DEMOTION && current_is_kswapd()) ||
-		      (reason == MR_NUMA_MISPLACED);
+	migrc_cond1 = sysctl_migrc_enable &&
+		      ((reason == MR_DEMOTION && current_is_kswapd()) ||
+		       (reason == MR_NUMA_MISPLACED));

 	if (migrc_cond1)
 		migrc_req_start();
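Since the table is registered under "vm", the knob shows up as
/proc/sys/vm/migrc_enable (vm.migrc_enable to sysctl(8)). A small,
hypothetical user-space helper that reads the current state and then
disables migrc; the file exists only on kernels built with CONFIG_MIGRC and
this series applied, and writes need CAP_SYS_ADMIN as enforced by the
handler above:

#include <stdio.h>

#define MIGRC_KNOB "/proc/sys/vm/migrc_enable"

int main(void)
{
	FILE *f = fopen(MIGRC_KNOB, "r");
	int enabled = -1;

	if (!f) {
		perror(MIGRC_KNOB);	/* kernel without migrc */
		return 1;
	}
	if (fscanf(f, "%d", &enabled) != 1)
		enabled = -1;
	fclose(f);
	printf("migrc currently: %s\n", enabled == 1 ? "enabled" : "disabled");

	f = fopen(MIGRC_KNOB, "w");	/* now switch it off */
	if (!f) {
		perror(MIGRC_KNOB);	/* likely missing CAP_SYS_ADMIN */
		return 1;
	}
	fputs("0\n", f);
	fclose(f);
	return 0;
}

Note that the handler parses a write into a local copy and only commits it
to sysctl_migrc_enable on success, so a malformed write leaves the previous
setting untouched.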
From patchwork Thu Aug 17 08:05:59 2023
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13356112
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, ying.huang@intel.com,
 namit@vmware.com, xhao@linux.alibaba.com, mgorman@techsingularity.net,
 hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org,
 luto@kernel.org, dave.hansen@linux.intel.com
Subject: [RFC v2 6/6] mm, migrc: Implement internal allocator to minimize
 impact onto vm
Date: Thu, 17 Aug 2023 17:05:59 +0900
Message-Id: <20230817080559.43200-7-byungchul@sk.com>
In-Reply-To: <20230817080559.43200-1-byungchul@sk.com>
References: <20230817080559.43200-1-byungchul@sk.com>

(Not sure if this patch works meaningfully. Ignore if not.)

Signed-off-by: Byungchul Park
---
 mm/migrate.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index c57536a0b2a6..6b5113d5a1e2 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -122,14 +122,33 @@ enum {
 	MIGRC_DST_PENDING,
 };

+#define MAX_MIGRC_REQ_NR 4096
+static struct migrc_req migrc_req_pool_static[MAX_MIGRC_REQ_NR];
+static atomic_t migrc_req_pool_idx = ATOMIC_INIT(-1);
+static LLIST_HEAD(migrc_req_pool_llist);
+static DEFINE_SPINLOCK(migrc_req_pool_lock);
+
 static struct migrc_req *alloc_migrc_req(void)
 {
-	return kmalloc(sizeof(struct migrc_req), GFP_KERNEL);
+	int idx = atomic_read(&migrc_req_pool_idx);
+	struct llist_node *n;
+
+	if (idx < MAX_MIGRC_REQ_NR - 1) {
+		idx = atomic_inc_return(&migrc_req_pool_idx);
+		if (idx < MAX_MIGRC_REQ_NR)
+			return migrc_req_pool_static + idx;
+	}
+
+	spin_lock(&migrc_req_pool_lock);
+	n = llist_del_first(&migrc_req_pool_llist);
+	spin_unlock(&migrc_req_pool_lock);
+
+	return n ? llist_entry(n, struct migrc_req, llnode) : NULL;
 }

 void free_migrc_req(struct migrc_req *req)
 {
-	kfree(req);
+	llist_add(&req->llnode, &migrc_req_pool_llist);
 }

 static bool migrc_is_full(int nid)
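The replacement allocator never goes back to the page allocator that migrc
is trying to relieve: requests are handed out from a fixed static array
until it is exhausted, after which entries returned via free_migrc_req()
are recycled through the llist, and alloc_migrc_req() returns NULL when
nothing is left (the existing !req check in migrc_expand_req() then simply
skips migrc for that folio). A stand-alone model of the same
fixed-pool-plus-freelist idea, with made-up names and a mutex standing in
for the kernel's spinlock-protected llist (the kernel free path is the
lock-free llist_add()):

#include <stddef.h>
#include <pthread.h>
#include <stdatomic.h>

#define POOL_NR 4096

struct req_sketch {
	struct req_sketch *next;	/* freelist linkage */
	/* ... payload ... */
};

static struct req_sketch pool[POOL_NR];
static atomic_int pool_idx = -1;	/* last handed-out static slot */
static struct req_sketch *freelist;
static pthread_mutex_t freelist_lock = PTHREAD_MUTEX_INITIALIZER;

static struct req_sketch *req_alloc(void)
{
	struct req_sketch *r = NULL;

	/* Fast path: grab a never-used slot from the static pool. */
	if (atomic_load(&pool_idx) < POOL_NR - 1) {
		int idx = atomic_fetch_add(&pool_idx, 1) + 1;

		if (idx < POOL_NR)
			return &pool[idx];
	}

	/* Slow path: recycle an entry that was freed earlier. */
	pthread_mutex_lock(&freelist_lock);
	if (freelist) {
		r = freelist;
		freelist = r->next;
	}
	pthread_mutex_unlock(&freelist_lock);
	return r;			/* NULL means "skip migrc this time" */
}

static void req_free(struct req_sketch *r)
{
	pthread_mutex_lock(&freelist_lock);
	r->next = freelist;
	freelist = r;
	pthread_mutex_unlock(&freelist_lock);
}

One property worth noting: slots handed out of the static array are never
returned to it, so once the first 4096 requests have been allocated the
allocator lives entirely off the freelist.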