From patchwork Thu May 18 06:59:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yicong Yang X-Patchwork-Id: 13246227 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8F524C7EE25 for ; Thu, 18 May 2023 07:01:02 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D37B8900005; Thu, 18 May 2023 03:01:01 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id CE897900003; Thu, 18 May 2023 03:01:01 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B89E7900005; Thu, 18 May 2023 03:01:01 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id A52FE900003 for ; Thu, 18 May 2023 03:01:01 -0400 (EDT) Received: from smtpin09.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 825DE8077B for ; Thu, 18 May 2023 07:01:01 +0000 (UTC) X-FDA: 80802478722.09.2FF935F Received: from szxga03-in.huawei.com (szxga03-in.huawei.com [45.249.212.189]) by imf01.hostedemail.com (Postfix) with ESMTP id B69664001C for ; Thu, 18 May 2023 07:00:57 +0000 (UTC) Authentication-Results: imf01.hostedemail.com; dkim=none; spf=pass (imf01.hostedemail.com: domain of yangyicong@huawei.com designates 45.249.212.189 as permitted sender) smtp.mailfrom=yangyicong@huawei.com; dmarc=pass (policy=quarantine) header.from=huawei.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1684393259; a=rsa-sha256; cv=none; b=EndjftPqMREbk1AZ2k83brJWE9GaE5KEGBAZvIq33s0jZ5nncyUd0Z4vAb71C/2xMuvGS9 Name++gJWlv6TGHLVxZrp+2DvPlKJLQjIxEgHW93Vz/H454ELmhy4Bwu0FcNcpNR4Vg6jd YM4rsZ8CI6kodAeMCm7x7lpkZjTNzz8= ARC-Authentication-Results: i=1; imf01.hostedemail.com; dkim=none; spf=pass (imf01.hostedemail.com: domain of yangyicong@huawei.com designates 45.249.212.189 as permitted sender) smtp.mailfrom=yangyicong@huawei.com; dmarc=pass (policy=quarantine) header.from=huawei.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1684393259; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=YtxXaoBQfTsdnDdbbI3w3fKSPoA3Phmt+SkoMz5BraM=; b=dUyw8uU8akQ40RFPlEzWNNd3t8AVgTWK/Ka2V3/CQZRY6Kos0vnt5lnGoVNM+6ZV5ogzcx oPLfMraBkN+58hr9ZXoBO4Dn1OkCBYVB3llua7szBn023V7qDt6mkbcvHn+8jlqPLoIg5t kGG/xCYftOrYFidRjc5V4Y4KWx/lcto= Received: from canpemm500009.china.huawei.com (unknown [172.30.72.53]) by szxga03-in.huawei.com (SkyGuard) with ESMTP id 4QMLSN351FzLmM5; Thu, 18 May 2023 14:59:32 +0800 (CST) Received: from localhost.localdomain (10.50.163.32) by canpemm500009.china.huawei.com (7.192.105.203) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Thu, 18 May 2023 15:00:53 +0800 From: Yicong Yang To: , , , , , , , , , CC: , , , , , , , , , , , , , , , , , Barry Song <21cnbao@gmail.com>, , , , , Barry Song , Nadav Amit , Mel Gorman Subject: [RESEND PATCH v9 2/2] arm64: support batched/deferred tlb shootdown during page reclamation/migration Date: Thu, 18 May 2023 14:59:34 +0800 Message-ID: <20230518065934.12877-3-yangyicong@huawei.com> X-Mailer: git-send-email 2.31.0 In-Reply-To: <20230518065934.12877-1-yangyicong@huawei.com> References: <20230518065934.12877-1-yangyicong@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.50.163.32] X-ClientProxiedBy: dggems704-chm.china.huawei.com (10.3.19.181) To canpemm500009.china.huawei.com (7.192.105.203) X-CFilter-Loop: Reflected X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: B69664001C X-Stat-Signature: bumyungkcoxy5qp9i11bt89n84n3jnst X-HE-Tag: 1684393257-781726 X-HE-Meta: U2FsdGVkX1+kjLwxT04fpB1dZtpIQFlSwq/dNvSrCBg0gVCMeL/vjuSgOFtYKO5RtldjJSoqS8jctMUDqSrmW2HhFmJlsfPx32QFORcV6TVj4JZXEiJglNgq7CwRyTMn242meljWgVwoxZpT3yyWuFPdHYDJGvhyoxTwzJoNJFJJ4i58BX3KOkGoHGJxOie3PxXjy59oK9Gev/oKnVg274PJVEHxNLEuz/tgz1Oeab066gAxHkPuFzvxWZXzHvRr6H9eJv5xmmkTP8WB5iTZ8hFODGZQn0vuTOHJx1WJ7H4bmAGI6RrfJkFAZfsjfX+UMXv2v5Z+9/71ee7OeZmeuwLxgx+VwubghjCD1i4mTz110wXFxK/HvuC3IA26yQvn0sIIT9MA31eZO2ZartbvO69vQEHzvIqPqL6QRpN7NM1GYvhMI5unsn7Lewf1q9y4iSrALZTriRQuWL2H/lHfmKCy1c6MEDh2Su+zWDc1ZJINSuM22OQWtK0OCZrB6n44FFQjp5nM3f/MsGB0y70GuWUJUAiMKbabdvDZARQmnrucHx+7CY3Vn53xEn3+QNlslizyobD9+9MSX0FzNAkRkKz6NTayhmlO/EHvdEwSVW9Gp96NuenwJ/bZ/7Jow8PF5ZmnlsQr6iO8WlX/OoKbnOM/GaBWdnXCgYlG9BYqZCk+moNclOQ5YbHhtm+/ITUknn6/D0SKu0eiXxot/ctuVyeKeJrcLAqu2wl6K52JGDYgnmIL1kf6+bDn7eqe1rdGktt6wu2LaRIEyzTedHbTDvo3rcGSKy4VwuJo0ab0Q5kVHUYJghIAfbrRYaKXbhsdB1CmFLfOsxD4yLiOwLi1AUIVjz/yvODzayQnIayNZNl/DYafs8m/c8d5a3VbfqZB0X96+ARNkVBUZ2TurpQAm0sXMTxNcXdXn/MSD/19vrfSjeHFnM0gI31SYno8sVdstBsZ0Lx9ehihC6jhGBX 6JaJXlqC 3tzuRjXu3fTYWvEKm9AVzbrNFUwSgChrB6ebeQSaO02zw+gUv221O/x8YdEWzUlkQhpoaLHKhTpHuiW7Wr/cWbHA8OkLjQdoSUqM6Od5DMAhEOiunDupN+WIvIrc3T1DGOdWUQifbgx4x5xSp7MWzqadkRYNL8tqlLh0ENiFySSB5sE6N7MUceiWMILmHWX2h0vuA8tVtm9ZrqcV71ovwD6nJXg4qHYZ0XnGvMMtelelwzewb/++X5ocoK/pjkPx5tKDRGxDyFeY/RnMDoCw25J3WSpoGKd/6PI0flgi+dnJ/Mnwjwo2VMbnnPMo5g8+dj1ZP01OU/f7wNBXJtCYXPtzPuk6dqLTbB16t+wA2GNnWTdWO6+oUmYZ402F2AUlsTY6jDH/xAYvRBL7SbJRXD/MNai7CAeSHs9zQcHfZgH4cidsZlPtJ09gjs2L1uWEMEhn/jexm1+k1PBbwaLGSym0dDpov0ID1mx2iFK0LNtDLp3bFiDR9l2dYiQ== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Barry Song on x86, batched and deferred tlb shootdown has lead to 90% performance increase on tlb shootdown. on arm64, HW can do tlb shootdown without software IPI. But sync tlbi is still quite expensive. Even running a simplest program which requires swapout can prove this is true, #include #include #include #include int main() { #define SIZE (1 * 1024 * 1024) volatile unsigned char *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0); memset(p, 0x88, SIZE); for (int k = 0; k < 10000; k++) { /* swap in */ for (int i = 0; i < SIZE; i += 4096) { (void)p[i]; } /* swap out */ madvise(p, SIZE, MADV_PAGEOUT); } } Perf result on snapdragon 888 with 8 cores by using zRAM as the swap block device. ~ # perf record taskset -c 4 ./a.out [ perf record: Woken up 10 times to write data ] [ perf record: Captured and wrote 2.297 MB perf.data (60084 samples) ] ~ # perf report # To display the perf.data header info, please use --header/--header-only options. # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 60K of event 'cycles' # Event count (approx.): 35706225414 # # Overhead Command Shared Object Symbol # ........ ....... ................. ............................................................................. # 21.07% a.out [kernel.kallsyms] [k] _raw_spin_unlock_irq 8.23% a.out [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore 6.67% a.out [kernel.kallsyms] [k] filemap_map_pages 6.16% a.out [kernel.kallsyms] [k] __zram_bvec_write 5.36% a.out [kernel.kallsyms] [k] ptep_clear_flush 3.71% a.out [kernel.kallsyms] [k] _raw_spin_lock 3.49% a.out [kernel.kallsyms] [k] memset64 1.63% a.out [kernel.kallsyms] [k] clear_page 1.42% a.out [kernel.kallsyms] [k] _raw_spin_unlock 1.26% a.out [kernel.kallsyms] [k] mod_zone_state.llvm.8525150236079521930 1.23% a.out [kernel.kallsyms] [k] xas_load 1.15% a.out [kernel.kallsyms] [k] zram_slot_lock ptep_clear_flush() takes 5.36% CPU in the micro-benchmark swapping in/out a page mapped by only one process. If the page is mapped by multiple processes, typically, like more than 100 on a phone, the overhead would be much higher as we have to run tlb flush 100 times for one single page. Plus, tlb flush overhead will increase with the number of CPU cores due to the bad scalability of tlb shootdown in HW, so those ARM64 servers should expect much higher overhead. Further perf annonate shows 95% cpu time of ptep_clear_flush is actually used by the final dsb() to wait for the completion of tlb flush. This provides us a very good chance to leverage the existing batched tlb in kernel. The minimum modification is that we only send async tlbi in the first stage and we send dsb while we have to sync in the second stage. With the above simplest micro benchmark, collapsed time to finish the program decreases around 5%. Typical collapsed time w/o patch: ~ # time taskset -c 4 ./a.out 0.21user 14.34system 0:14.69elapsed w/ patch: ~ # time taskset -c 4 ./a.out 0.22user 13.45system 0:13.80elapsed Also, Yicong Yang added the following observation. Tested with benchmark in the commit on Kunpeng920 arm64 server, observed an improvement around 12.5% with command `time ./swap_bench`. w/o w/ real 0m13.460s 0m11.771s user 0m0.248s 0m0.279s sys 0m12.039s 0m11.458s Originally it's noticed a 16.99% overhead of ptep_clear_flush() which has been eliminated by this patch: [root@localhost yang]# perf record -- ./swap_bench && perf report [...] 16.99% swap_bench [kernel.kallsyms] [k] ptep_clear_flush It is tested on 4,8,128 CPU platforms and shows to be beneficial on large systems but may not have improvement on small systems like on a 4 CPU platform. So make ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH depends on CONFIG_EXPERT for this stage and add a runtime tunable to allow to disable it according to the scenario. Also this patch improve the performance of page migration. Using pmbench and tries to migrate the pages of pmbench between node 0 and node 1 for 20 times, this patch decrease the time used more than 50% and saved the time used by ptep_clear_flush(). This patch extends arch_tlbbatch_add_mm() to take an address of the target page to support the feature on arm64. Also rename it to arch_tlbbatch_add_pending() to better match its function since we don't need to handle the mm on arm64 and add_mm is not proper. add_pending will make sense to both as on x86 we're pending the TLB flush operations while on arm64 we're pending the synchronize operations. Cc: Anshuman Khandual Cc: Jonathan Corbet Cc: Nadav Amit Cc: Mel Gorman Tested-by: Yicong Yang Tested-by: Xin Hao Tested-by: Punit Agrawal Signed-off-by: Barry Song Signed-off-by: Yicong Yang Reviewed-by: Kefeng Wang Reviewed-by: Xin Hao Reviewed-by: Anshuman Khandual --- .../features/vm/TLB/arch-support.txt | 2 +- arch/arm64/Kconfig | 1 + arch/arm64/include/asm/tlbbatch.h | 12 ++++ arch/arm64/include/asm/tlbflush.h | 33 ++++++++- arch/arm64/mm/flush.c | 69 +++++++++++++++++++ arch/x86/include/asm/tlbflush.h | 5 +- include/linux/mm_types_task.h | 4 +- mm/rmap.c | 12 ++-- 8 files changed, 126 insertions(+), 12 deletions(-) create mode 100644 arch/arm64/include/asm/tlbbatch.h diff --git a/Documentation/features/vm/TLB/arch-support.txt b/Documentation/features/vm/TLB/arch-support.txt index 7f049c251a79..76208db88f3b 100644 --- a/Documentation/features/vm/TLB/arch-support.txt +++ b/Documentation/features/vm/TLB/arch-support.txt @@ -9,7 +9,7 @@ | alpha: | TODO | | arc: | TODO | | arm: | TODO | - | arm64: | N/A | + | arm64: | ok | | csky: | TODO | | hexagon: | TODO | | ia64: | TODO | diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index b1201d25a8a4..b3fc652dc902 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -96,6 +96,7 @@ config ARM64 select ARCH_SUPPORTS_NUMA_BALANCING select ARCH_SUPPORTS_PAGE_TABLE_CHECK select ARCH_SUPPORTS_PER_VMA_LOCK + select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH if EXPERT select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT select ARCH_WANT_DEFAULT_BPF_JIT select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT diff --git a/arch/arm64/include/asm/tlbbatch.h b/arch/arm64/include/asm/tlbbatch.h new file mode 100644 index 000000000000..fedb0b87b8db --- /dev/null +++ b/arch/arm64/include/asm/tlbbatch.h @@ -0,0 +1,12 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ARCH_ARM64_TLBBATCH_H +#define _ARCH_ARM64_TLBBATCH_H + +struct arch_tlbflush_unmap_batch { + /* + * For arm64, HW can do tlb shootdown, so we don't + * need to record cpumask for sending IPI + */ +}; + +#endif /* _ARCH_ARM64_TLBBATCH_H */ diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h index 412a3b9a3c25..8041905e26b9 100644 --- a/arch/arm64/include/asm/tlbflush.h +++ b/arch/arm64/include/asm/tlbflush.h @@ -254,17 +254,23 @@ static inline void flush_tlb_mm(struct mm_struct *mm) dsb(ish); } -static inline void flush_tlb_page_nosync(struct vm_area_struct *vma, +static inline void __flush_tlb_page_nosync(struct mm_struct *mm, unsigned long uaddr) { unsigned long addr; dsb(ishst); - addr = __TLBI_VADDR(uaddr, ASID(vma->vm_mm)); + addr = __TLBI_VADDR(uaddr, ASID(mm)); __tlbi(vale1is, addr); __tlbi_user(vale1is, addr); } +static inline void flush_tlb_page_nosync(struct vm_area_struct *vma, + unsigned long uaddr) +{ + return __flush_tlb_page_nosync(vma->vm_mm, uaddr); +} + static inline void flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr) { @@ -272,6 +278,29 @@ static inline void flush_tlb_page(struct vm_area_struct *vma, dsb(ish); } +#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH + +extern struct static_key_false batched_tlb_enabled; + +static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm) +{ + return static_branch_likely(&batched_tlb_enabled); +} + +static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch, + struct mm_struct *mm, + unsigned long uaddr) +{ + __flush_tlb_page_nosync(mm, uaddr); +} + +static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch) +{ + dsb(ish); +} + +#endif /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */ + /* * This is meant to avoid soft lock-ups on large TLB flushing ranges and not * necessarily a performance improvement. diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c index 5f9379b3c8c8..84a8e15cda96 100644 --- a/arch/arm64/mm/flush.c +++ b/arch/arm64/mm/flush.c @@ -7,8 +7,10 @@ */ #include +#include #include #include +#include #include #include @@ -107,3 +109,70 @@ void arch_invalidate_pmem(void *addr, size_t size) } EXPORT_SYMBOL_GPL(arch_invalidate_pmem); #endif + +#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH + +DEFINE_STATIC_KEY_FALSE(batched_tlb_enabled); + +static bool batched_tlb_flush_supported(void) +{ +#ifdef CONFIG_ARM64_WORKAROUND_REPEAT_TLBI + /* + * TLB flush deferral is not required on systems, which are affected with + * ARM64_WORKAROUND_REPEAT_TLBI, as __tlbi()/__tlbi_user() implementation + * will have two consecutive TLBI instructions with a dsb(ish) in between + * defeating the purpose (i.e save overall 'dsb ish' cost). + */ + if (unlikely(cpus_have_const_cap(ARM64_WORKAROUND_REPEAT_TLBI))) + return false; +#endif + return true; +} + +int batched_tlb_enabled_handler(struct ctl_table *table, int write, + void *buffer, size_t *lenp, loff_t *ppos) +{ + unsigned int enabled = static_branch_unlikely(&batched_tlb_enabled); + struct ctl_table t; + int err; + + if (write && !capable(CAP_SYS_ADMIN)) + return -EPERM; + + t = *table; + t.data = &enabled; + err = proc_dointvec_minmax(&t, write, buffer, lenp, ppos); + if (!err && write) { + if (enabled && batched_tlb_flush_supported()) + static_branch_enable(&batched_tlb_enabled); + else + static_branch_disable(&batched_tlb_enabled); + } + + return err; +} + +static struct ctl_table batched_tlb_sysctls[] = { + { + .procname = "batched_tlb_enabled", + .data = NULL, + .maxlen = sizeof(unsigned int), + .mode = 0644, + .proc_handler = batched_tlb_enabled_handler, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }, + {} +}; + +static int __init batched_tlb_sysctls_init(void) +{ + if (batched_tlb_flush_supported()) + static_branch_enable(&batched_tlb_enabled); + + register_sysctl_init("vm", batched_tlb_sysctls); + return 0; +} +late_initcall(batched_tlb_sysctls_init); + +#endif diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 46bdff73217c..2eb4b69ce38b 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -283,8 +283,9 @@ static inline u64 inc_mm_tlb_gen(struct mm_struct *mm) return atomic64_inc_return(&mm->context.tlb_gen); } -static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch, - struct mm_struct *mm) +static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch, + struct mm_struct *mm, + unsigned long uaddr) { inc_mm_tlb_gen(mm); cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm)); diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h index 5414b5c6a103..aa44fff8bb9d 100644 --- a/include/linux/mm_types_task.h +++ b/include/linux/mm_types_task.h @@ -52,8 +52,8 @@ struct tlbflush_unmap_batch { #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH /* * The arch code makes the following promise: generic code can modify a - * PTE, then call arch_tlbbatch_add_mm() (which internally provides all - * needed barriers), then call arch_tlbbatch_flush(), and the entries + * PTE, then call arch_tlbbatch_add_pending() (which internally provides + * all needed barriers), then call arch_tlbbatch_flush(), and the entries * will be flushed on all CPUs by the time that arch_tlbbatch_flush() * returns. */ diff --git a/mm/rmap.c b/mm/rmap.c index b45f95ab0c04..9ef497228d45 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -642,7 +642,8 @@ void try_to_unmap_flush_dirty(void) #define TLB_FLUSH_BATCH_PENDING_LARGE \ (TLB_FLUSH_BATCH_PENDING_MASK / 2) -static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval) +static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, + unsigned long uaddr) { struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; int batch; @@ -651,7 +652,7 @@ static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval) if (!pte_accessible(mm, pteval)) return; - arch_tlbbatch_add_mm(&tlb_ubc->arch, mm); + arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr); tlb_ubc->flush_required = true; /* @@ -726,7 +727,8 @@ void flush_tlb_batched_pending(struct mm_struct *mm) } } #else -static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval) +static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, + unsigned long uaddr) { } @@ -1577,7 +1579,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, */ pteval = ptep_get_and_clear(mm, address, pvmw.pte); - set_tlb_ubc_flush_pending(mm, pteval); + set_tlb_ubc_flush_pending(mm, pteval, address); } else { pteval = ptep_clear_flush(vma, address, pvmw.pte); } @@ -1958,7 +1960,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, */ pteval = ptep_get_and_clear(mm, address, pvmw.pte); - set_tlb_ubc_flush_pending(mm, pteval); + set_tlb_ubc_flush_pending(mm, pteval, address); } else { pteval = ptep_clear_flush(vma, address, pvmw.pte); }