Message ID | 20230518065934.12877-3-yangyicong@huawei.com |
---|---|
State | New, archived |
Series | arm64: support batched/deferred tlb shootdown during page reclamation/migration |
On Thu, May 18, 2023 at 02:59:34PM +0800, Yicong Yang wrote:
> From: Barry Song <v-songbaohua@oppo.com>
>
> on x86, batched and deferred tlb shootdown has led to 90%
> performance increase on tlb shootdown. on arm64, HW can do
> tlb shootdown without software IPI. But sync tlbi is still
> quite expensive.
[...]
>  .../features/vm/TLB/arch-support.txt |  2 +-
>  arch/arm64/Kconfig                   |  1 +
>  arch/arm64/include/asm/tlbbatch.h    | 12 ++++
>  arch/arm64/include/asm/tlbflush.h    | 33 ++++++++-
>  arch/arm64/mm/flush.c                | 69 +++++++++++++++++++
>  arch/x86/include/asm/tlbflush.h      |  5 +-
>  include/linux/mm_types_task.h        |  4 +-
>  mm/rmap.c                            | 12 ++--

First of all, this patch needs to be split into some preparatory patches
introducing/renaming functions with no functional change for x86. Once
done, you can add the arm64-only changes.

Now, on the implementation, I had some comments on v7 but we didn't get
to a conclusion and the thread eventually died:

https://lore.kernel.org/linux-mm/Y7cToj5mWd1ZbMyQ@arm.com/

I know I said a command line argument is better than Kconfig or some
random number of CPUs heuristics but it would be even better if we don't
bother with any, just make this always on. Barry had some comments
around mprotect() being racy and that's why we have
flush_tlb_batched_pending() but I don't think it's needed (or, for
arm64, it can be a DSB since this patch issues the TLBIs but without the
DVM Sync). So we need to clarify this (see Barry's last email on the
above thread) before attempting new versions of this patchset. With
flush_tlb_batched_pending() removed (or DSB), I have a suspicion such an
implementation would be faster on any SoC irrespective of the number of
CPUs.
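The point that the patch "issues the TLBIs but without the DVM Sync" can be made concrete with a toy cost model. The sketch below is not kernel code: it assumes one TLBI per unmapped page, ignores `__tlbi_user()`, and uses hypothetical names (`struct tlb_cost`, `unmap_sync()`, `unmap_batched()`). It only counts what a per-page `flush_tlb_page()` would issue versus per-page `arch_tlbbatch_add_pending()` followed by a single `arch_tlbbatch_flush()`.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy cost model (hypothetical names, not kernel code): counts the
 * operations issued when `npages` PTEs are unmapped. __tlbi_user(),
 * the extra TLBI used when the kernel is unmapped at EL0, is ignored.
 */
struct tlb_cost {
	size_t dsb_ishst; /* store barrier issued before each TLBI */
	size_t tlbi;      /* broadcast TLBI VALE1IS instructions */
	size_t dsb_ish;   /* waits for TLBI completion (the costly sync) */
};

/* Per-page flush_tlb_page(): nosync TLBI followed by its own DSB ISH. */
static struct tlb_cost unmap_sync(size_t npages)
{
	struct tlb_cost c = { 0, 0, 0 };

	for (size_t i = 0; i < npages; i++) {
		c.dsb_ishst++;
		c.tlbi++;
		c.dsb_ish++;
	}
	return c;
}

/*
 * Batched: each arch_tlbbatch_add_pending() issues the TLBI without
 * waiting; one arch_tlbbatch_flush() then waits for all of them.
 */
static struct tlb_cost unmap_batched(size_t npages)
{
	struct tlb_cost c = { 0, 0, 0 };

	for (size_t i = 0; i < npages; i++) {
		c.dsb_ishst++;
		c.tlbi++;
	}
	if (npages > 0) /* no flush is required if nothing was batched */
		c.dsb_ish++;
	return c;
}
```

For 32 pages the model gives 32 TLBIs either way, but 32 `DSB ISH` waits in the synchronous case versus one in the batched case; the number of TLBIs is unchanged, only the completion waits are amortised.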
On Thu, Jun 29, 2023 at 05:31:36PM +0100, Catalin Marinas wrote:
> On Thu, May 18, 2023 at 02:59:34PM +0800, Yicong Yang wrote:
[...]
> I know I said a command line argument is better than Kconfig or some
> random number of CPUs heuristics but it would be even better if we don't
> bother with any, just make this always on. Barry had some comments
> around mprotect() being racy and that's why we have
> flush_tlb_batched_pending() but I don't think it's needed (or, for
> arm64, it can be a DSB since this patch issues the TLBIs but without the
> DVM Sync). So we need to clarify this (see Barry's last email on the
> above thread) before attempting new versions of this patchset. With
> flush_tlb_batched_pending() removed (or DSB), I have a suspicion such an
> implementation would be faster on any SoC irrespective of the number of
> CPUs.

I think I got the need for flush_tlb_batched_pending(). If
try_to_unmap() marks the pte !present and we have a pending TLBI,
change_pte_range() will skip the TLB maintenance altogether since it did
not change the pte. So we could be left with stale TLB entries after
mprotect() before TTU does the batch flushing.

We can have an arch-specific flush_tlb_batched_pending() that can be a
DSB only on arm64 and a full mm flush on x86.
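The race described above can be walked through with a toy single-page model. This is a sketch under simplifying assumptions: one page, one cached TLB entry, a boolean standing in for `mm->tlb_flush_batched`, and hypothetical helper names. `mprotect_readonly()` only mimics the property of `change_pte_range()` that matters here: it does TLB maintenance only for PTEs it actually changes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-page state; every name here is hypothetical. */
struct page_state {
	bool pte_present;
	bool pte_writable;
	bool tlb_valid;     /* a cached translation exists */
	bool tlb_writable;  /* permission held in the cached entry */
	bool batch_pending; /* stands in for mm->tlb_flush_batched */
};

static void tlb_flush(struct page_state *p)
{
	p->tlb_valid = false;
	p->tlb_writable = false;
}

/* try_to_unmap() with deferral: clear the PTE, postpone the TLB flush. */
static void try_to_unmap_deferred(struct page_state *p)
{
	p->pte_present = false;
	p->pte_writable = false;
	p->batch_pending = true;
}

/* Stand-in for flush_tlb_batched_pending(): flush only if a batch is pending. */
static void flush_batched_pending(struct page_state *p)
{
	if (p->batch_pending) {
		tlb_flush(p);
		p->batch_pending = false;
	}
}

/* mprotect(PROT_READ): like change_pte_range(), skip PTEs that are not present. */
static void mprotect_readonly(struct page_state *p, bool check_pending)
{
	if (check_pending)
		flush_batched_pending(p);
	if (!p->pte_present)
		return; /* PTE unchanged, so no TLB maintenance is done */
	p->pte_writable = false;
	tlb_flush(p);
}

/* True if, after mprotect(), a write could still hit a stale writable entry. */
static bool stale_write_possible(bool check_pending)
{
	struct page_state p = { true, true, true, true, false };

	try_to_unmap_deferred(&p);
	mprotect_readonly(&p, check_pending);
	return p.tlb_valid && p.tlb_writable;
}
```

`stale_write_possible(false)` returns true: the writable TLB entry survives `mprotect()` until the reclaiming task finally flushes its batch. `stale_write_possible(true)` returns false, which is the job `flush_tlb_batched_pending()` does in the real `change_pte_range()`.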
On 2023/6/30 1:26, Catalin Marinas wrote:
> On Thu, Jun 29, 2023 at 05:31:36PM +0100, Catalin Marinas wrote:
>> On Thu, May 18, 2023 at 02:59:34PM +0800, Yicong Yang wrote:
[...]
>> First of all, this patch needs to be split into some preparatory patches
>> introducing/renaming functions with no functional change for x86. Once
>> done, you can add the arm64-only changes.
>>

got it. will try to split this patch as suggested.

>> Now, on the implementation, I had some comments on v7 but we didn't get
>> to a conclusion and the thread eventually died:
>>
>> https://lore.kernel.org/linux-mm/Y7cToj5mWd1ZbMyQ@arm.com/
>>
>> I know I said a command line argument is better than Kconfig or some
>> random number of CPUs heuristics but it would be even better if we don't
>> bother with any, just make this always on.

ok, will make this always on.

[...]
>
> I think I got the need for flush_tlb_batched_pending(). If
> try_to_unmap() marks the pte !present and we have a pending TLBI,
> change_pte_range() will skip the TLB maintenance altogether since it did
> not change the pte. So we could be left with stale TLB entries after
> mprotect() before TTU does the batch flushing.
>
> We can have an arch-specific flush_tlb_batched_pending() that can be a
> DSB only on arm64 and a full mm flush on x86.
>

We need to do a flush/dsb in flush_tlb_batched_pending() only in a race
condition, so we first check whether there's a pending batched flush and,
if so, do the tlb flush. The pending check is common; the difference
between the archs is how to flush the TLB within flush_tlb_batched_pending(),
and on arm64 it should only be a dsb.

As we only need to maintain the TLBs already pending in the batched flush,
does it make sense to only handle those TLBs in flush_tlb_batched_pending()?
Then we can use arch_tlbbatch_flush() rather than flush_tlb_mm() in
flush_tlb_batched_pending(), and no arch-specific function is needed.

Thanks.
On Tue, Jul 4, 2023 at 10:36 PM Yicong Yang <yangyicong@huawei.com> wrote:
>
> On 2023/6/30 1:26, Catalin Marinas wrote:
> > On Thu, Jun 29, 2023 at 05:31:36PM +0100, Catalin Marinas wrote:
[...]
> > I think I got the need for flush_tlb_batched_pending(). If
> > try_to_unmap() marks the pte !present and we have a pending TLBI,
> > change_pte_range() will skip the TLB maintenance altogether since it did
> > not change the pte. So we could be left with stale TLB entries after
> > mprotect() before TTU does the batch flushing.
> >

Good catch.
This could also be true for MADV_DONTNEED: if, after try_to_unmap(), we
run MADV_DONTNEED on this area, the pte is not present, so we don't do
anything with this PTE in zap_pte_range() afterwards.

> > We can have an arch-specific flush_tlb_batched_pending() that can be a
> > DSB only on arm64 and a full mm flush on x86.
> >
>
> We need to do a flush/dsb in flush_tlb_batched_pending() only in a race
> condition, so we first check whether there's a pending batched flush and,
> if so, do the tlb flush. The pending check is common; the difference
> between the archs is how to flush the TLB within flush_tlb_batched_pending(),
> and on arm64 it should only be a dsb.
>
> As we only need to maintain the TLBs already pending in the batched flush,
> does it make sense to only handle those TLBs in flush_tlb_batched_pending()?
> Then we can use arch_tlbbatch_flush() rather than flush_tlb_mm() in
> flush_tlb_batched_pending(), and no arch-specific function is needed.

As we have issued no-sync tlbi on those pending addresses, our hardware
has already "recorded" what should be flushed in the specific mm, so a
DSB alone will flush them correctly, right?

Barry
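The argument that a DSB alone completes invalidations already issued without sync, and why that is cheaper than `flush_tlb_mm()`, can be sketched with a toy TLB. This is an illustration under loud assumptions: a single issuing CPU, a single ASID, and a no-sync TLBI modelled as "issued but not yet guaranteed complete". `struct toy_tlb` and the `toy_*` helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_TLB_ENTRIES 8 /* one entry per page index, single ASID */

/*
 * Toy TLB as seen from a single issuing CPU (hypothetical model).
 * An "issued" invalidation may not have taken effect yet; the entry
 * can still be used until a DSB ISH guarantees completion.
 */
struct toy_tlb {
	bool valid[TOY_TLB_ENTRIES];
	bool issued[TOY_TLB_ENTRIES];
};

/* Like arch_tlbbatch_add_pending() on arm64: issue the TLBI, do not wait. */
static void toy_tlbi_nosync(struct toy_tlb *t, size_t page)
{
	t->issued[page] = true;
}

/* Like dsb(ish): every TLBI issued so far is now complete. */
static void toy_dsb_ish(struct toy_tlb *t)
{
	for (size_t i = 0; i < TOY_TLB_ENTRIES; i++) {
		if (t->issued[i]) {
			t->valid[i] = false;
			t->issued[i] = false;
		}
	}
}

/* Like flush_tlb_mm(): drop every entry of the ASID, related or not. */
static void toy_flush_mm(struct toy_tlb *t)
{
	for (size_t i = 0; i < TOY_TLB_ENTRIES; i++) {
		t->valid[i] = false;
		t->issued[i] = false;
	}
}

static size_t toy_count_valid(const struct toy_tlb *t)
{
	size_t n = 0;

	for (size_t i = 0; i < TOY_TLB_ENTRIES; i++)
		if (t->valid[i])
			n++;
	return n;
}
```

In this model both paths remove the two pending entries, but `toy_flush_mm()` also discards the six unrelated entries, which later accesses would have to refill through page-table walks; `toy_dsb_ish()` only waits for work that was already issued.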
On 2023/7/5 16:43, Barry Song wrote:
> On Tue, Jul 4, 2023 at 10:36 PM Yicong Yang <yangyicong@huawei.com> wrote:
>>
>> On 2023/6/30 1:26, Catalin Marinas wrote:
[...]
>>> I think I got the need for flush_tlb_batched_pending(). If
>>> try_to_unmap() marks the pte !present and we have a pending TLBI,
>>> change_pte_range() will skip the TLB maintenance altogether since it did
>>> not change the pte. So we could be left with stale TLB entries after
>>> mprotect() before TTU does the batch flushing.
>>>
>
> Good catch.
> This could also be true for MADV_DONTNEED: if, after try_to_unmap(), we
> run MADV_DONTNEED on this area, the pte is not present, so we don't do
> anything with this PTE in zap_pte_range() afterwards.
>
>>> We can have an arch-specific flush_tlb_batched_pending() that can be a
>>> DSB only on arm64 and a full mm flush on x86.
>>>
>>
>> We need to do a flush/dsb in flush_tlb_batched_pending() only in a race
>> condition, so we first check whether there's a pending batched flush and,
>> if so, do the tlb flush. The pending check is common; the difference
>> between the archs is how to flush the TLB within flush_tlb_batched_pending(),
>> and on arm64 it should only be a dsb.
>>
>> As we only need to maintain the TLBs already pending in the batched flush,
>> does it make sense to only handle those TLBs in flush_tlb_batched_pending()?
>> Then we can use arch_tlbbatch_flush() rather than flush_tlb_mm() in
>> flush_tlb_batched_pending(), and no arch-specific function is needed.
>
> As we have issued no-sync tlbi on those pending addresses, our hardware
> has already "recorded" what should be flushed in the specific mm, so a
> DSB alone will flush them correctly, right?
>

yes, it's right. I was just thinking of something like the change below.
arch_tlbbatch_flush() will only be a dsb on arm64, so this would match what
Catalin wants. But, as you said, this may be incorrect on x86, so we'd
better have an arch-specific implementation of flush_tlb_batched_pending()
as suggested.

diff --git a/mm/rmap.c b/mm/rmap.c
index 9699c6011b0e..afa3571503a0 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -717,7 +717,7 @@ void flush_tlb_batched_pending(struct mm_struct *mm)
 	int flushed = batch >> TLB_FLUSH_BATCH_FLUSHED_SHIFT;
 
 	if (pending != flushed) {
-		flush_tlb_mm(mm);
+		arch_tlbbatch_flush(&current->tlb_ubc.arch);
 		/*
 		 * If the new TLB flushing is pending during flushing, leave
 		 * mm->tlb_flush_batched as is, to avoid losing flushing.
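The inline diff relies on the encoding of `mm->tlb_flush_batched`: the low bits count batches added for the mm ("pending"), and the bits above `TLB_FLUSH_BATCH_FLUSHED_SHIFT` record the pending value last handled by `flush_tlb_batched_pending()` ("flushed"). A userspace sketch of that bookkeeping with C11 atomics follows. The shift of 16, the mask, and the helper names are assumptions for illustration, and the overflow reset guarded by `TLB_FLUSH_BATCH_PENDING_LARGE` is omitted.

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumed layout for this sketch: "flushed" lives above bit 16. */
#define FLUSHED_SHIFT 16
#define PENDING_MASK ((1 << (FLUSHED_SHIFT - 1)) - 1)

static int full_flushes; /* counts stand-in flush_tlb_mm() (or DSB) calls */

/* Like the counter update in set_tlb_ubc_flush_pending(), minus overflow. */
static void add_pending(atomic_int *batched)
{
	atomic_fetch_add(batched, 1);
}

/* Like flush_tlb_batched_pending(): flush only if pending != flushed. */
static void batched_pending_flush(atomic_int *batched)
{
	int batch = atomic_load(batched);
	int pending = batch & PENDING_MASK;
	int flushed = batch >> FLUSHED_SHIFT;

	if (pending != flushed) {
		full_flushes++;
		/*
		 * Record flushed = pending. If another batch was added
		 * meanwhile, the compare-exchange fails and the value is
		 * left alone, so that later flush is not lost.
		 */
		atomic_compare_exchange_strong(batched, &batch,
					       pending | (pending << FLUSHED_SHIFT));
	}
}
```

Calling `batched_pending_flush()` twice after two `add_pending()` calls flushes once; a third `add_pending()` makes the next call flush again. The check itself is arch-independent, which is why only the flush step differs between the x86 and arm64 proposals above.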
diff --git a/Documentation/features/vm/TLB/arch-support.txt b/Documentation/features/vm/TLB/arch-support.txt
index 7f049c251a79..76208db88f3b 100644
--- a/Documentation/features/vm/TLB/arch-support.txt
+++ b/Documentation/features/vm/TLB/arch-support.txt
@@ -9,7 +9,7 @@
     |       alpha: | TODO |
     |         arc: | TODO |
     |         arm: | TODO |
-    |       arm64: |  N/A |
+    |       arm64: |  ok  |
     |        csky: | TODO |
     |     hexagon: | TODO |
     |        ia64: | TODO |
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index b1201d25a8a4..b3fc652dc902 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -96,6 +96,7 @@ config ARM64
 	select ARCH_SUPPORTS_NUMA_BALANCING
 	select ARCH_SUPPORTS_PAGE_TABLE_CHECK
 	select ARCH_SUPPORTS_PER_VMA_LOCK
+	select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH if EXPERT
 	select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
 	select ARCH_WANT_DEFAULT_BPF_JIT
 	select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
diff --git a/arch/arm64/include/asm/tlbbatch.h b/arch/arm64/include/asm/tlbbatch.h
new file mode 100644
index 000000000000..fedb0b87b8db
--- /dev/null
+++ b/arch/arm64/include/asm/tlbbatch.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ARCH_ARM64_TLBBATCH_H
+#define _ARCH_ARM64_TLBBATCH_H
+
+struct arch_tlbflush_unmap_batch {
+	/*
+	 * For arm64, HW can do tlb shootdown, so we don't
+	 * need to record cpumask for sending IPI
+	 */
+};
+
+#endif /* _ARCH_ARM64_TLBBATCH_H */
diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 412a3b9a3c25..8041905e26b9 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -254,17 +254,23 @@ static inline void flush_tlb_mm(struct mm_struct *mm)
 	dsb(ish);
 }
 
-static inline void flush_tlb_page_nosync(struct vm_area_struct *vma,
+static inline void __flush_tlb_page_nosync(struct mm_struct *mm,
 					 unsigned long uaddr)
 {
 	unsigned long addr;
 
 	dsb(ishst);
-	addr = __TLBI_VADDR(uaddr, ASID(vma->vm_mm));
+	addr = __TLBI_VADDR(uaddr, ASID(mm));
 	__tlbi(vale1is, addr);
 	__tlbi_user(vale1is, addr);
 }
 
+static inline void flush_tlb_page_nosync(struct vm_area_struct *vma,
+					 unsigned long uaddr)
+{
+	return __flush_tlb_page_nosync(vma->vm_mm, uaddr);
+}
+
 static inline void flush_tlb_page(struct vm_area_struct *vma,
 				  unsigned long uaddr)
 {
@@ -272,6 +278,29 @@ static inline void flush_tlb_page(struct vm_area_struct *vma,
 	dsb(ish);
 }
 
+#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
+
+extern struct static_key_false batched_tlb_enabled;
+
+static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm)
+{
+	return static_branch_likely(&batched_tlb_enabled);
+}
+
+static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
+					     struct mm_struct *mm,
+					     unsigned long uaddr)
+{
+	__flush_tlb_page_nosync(mm, uaddr);
+}
+
+static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
+{
+	dsb(ish);
+}
+
+#endif /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */
+
 /*
  * This is meant to avoid soft lock-ups on large TLB flushing ranges and not
  * necessarily a performance improvement.
diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index 5f9379b3c8c8..84a8e15cda96 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -7,8 +7,10 @@
  */
 
 #include <linux/export.h>
+#include <linux/jump_label.h>
 #include <linux/mm.h>
 #include <linux/pagemap.h>
+#include <linux/sysctl.h>
 
 #include <asm/cacheflush.h>
 #include <asm/cache.h>
@@ -107,3 +109,70 @@ void arch_invalidate_pmem(void *addr, size_t size)
 }
 EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
 #endif
+
+#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
+
+DEFINE_STATIC_KEY_FALSE(batched_tlb_enabled);
+
+static bool batched_tlb_flush_supported(void)
+{
+#ifdef CONFIG_ARM64_WORKAROUND_REPEAT_TLBI
+	/*
+	 * TLB flush deferral is not required on systems, which are affected with
+	 * ARM64_WORKAROUND_REPEAT_TLBI, as __tlbi()/__tlbi_user() implementation
+	 * will have two consecutive TLBI instructions with a dsb(ish) in between
+	 * defeating the purpose (i.e save overall 'dsb ish' cost).
+	 */
+	if (unlikely(cpus_have_const_cap(ARM64_WORKAROUND_REPEAT_TLBI)))
+		return false;
+#endif
+	return true;
+}
+
+int batched_tlb_enabled_handler(struct ctl_table *table, int write,
+				void *buffer, size_t *lenp, loff_t *ppos)
+{
+	unsigned int enabled = static_branch_unlikely(&batched_tlb_enabled);
+	struct ctl_table t;
+	int err;
+
+	if (write && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	t = *table;
+	t.data = &enabled;
+	err = proc_dointvec_minmax(&t, write, buffer, lenp, ppos);
+	if (!err && write) {
+		if (enabled && batched_tlb_flush_supported())
+			static_branch_enable(&batched_tlb_enabled);
+		else
+			static_branch_disable(&batched_tlb_enabled);
+	}
+
+	return err;
+}
+
+static struct ctl_table batched_tlb_sysctls[] = {
+	{
+		.procname	= "batched_tlb_enabled",
+		.data		= NULL,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= batched_tlb_enabled_handler,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
+	{}
+};
+
+static int __init batched_tlb_sysctls_init(void)
+{
+	if (batched_tlb_flush_supported())
+		static_branch_enable(&batched_tlb_enabled);
+
+	register_sysctl_init("vm", batched_tlb_sysctls);
+	return 0;
+}
+late_initcall(batched_tlb_sysctls_init);
+
+#endif
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 46bdff73217c..2eb4b69ce38b 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -283,8 +283,9 @@ static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
 	return atomic64_inc_return(&mm->context.tlb_gen);
 }
 
-static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch,
-					struct mm_struct *mm)
+static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
+					     struct mm_struct *mm,
+					     unsigned long uaddr)
 {
 	inc_mm_tlb_gen(mm);
 	cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h
index 5414b5c6a103..aa44fff8bb9d 100644
--- a/include/linux/mm_types_task.h
+++ b/include/linux/mm_types_task.h
@@ -52,8 +52,8 @@ struct tlbflush_unmap_batch {
 #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
 	/*
 	 * The arch code makes the following promise: generic code can modify a
-	 * PTE, then call arch_tlbbatch_add_mm() (which internally provides all
-	 * needed barriers), then call arch_tlbbatch_flush(), and the entries
+	 * PTE, then call arch_tlbbatch_add_pending() (which internally provides
+	 * all needed barriers), then call arch_tlbbatch_flush(), and the entries
 	 * will be flushed on all CPUs by the time that arch_tlbbatch_flush()
 	 * returns.
 	 */
diff --git a/mm/rmap.c b/mm/rmap.c
index b45f95ab0c04..9ef497228d45 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -642,7 +642,8 @@ void try_to_unmap_flush_dirty(void)
 #define TLB_FLUSH_BATCH_PENDING_LARGE			\
 	(TLB_FLUSH_BATCH_PENDING_MASK / 2)
 
-static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval)
+static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
+				      unsigned long uaddr)
 {
 	struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;
 	int batch;
@@ -651,7 +652,7 @@ static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval)
 	if (!pte_accessible(mm, pteval))
 		return;
 
-	arch_tlbbatch_add_mm(&tlb_ubc->arch, mm);
+	arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr);
 	tlb_ubc->flush_required = true;
 
 	/*
@@ -726,7 +727,8 @@ void flush_tlb_batched_pending(struct mm_struct *mm)
 	}
 }
 #else
-static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval)
+static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
+				      unsigned long uaddr)
 {
 }
 
@@ -1577,7 +1579,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			 */
 			pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-			set_tlb_ubc_flush_pending(mm, pteval);
+			set_tlb_ubc_flush_pending(mm, pteval, address);
 		} else {
 			pteval = ptep_clear_flush(vma, address, pvmw.pte);
 		}
@@ -1958,7 +1960,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			 */
 			pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-			set_tlb_ubc_flush_pending(mm, pteval);
+			set_tlb_ubc_flush_pending(mm, pteval, address);
 		} else {
 			pteval = ptep_clear_flush(vma, address, pvmw.pte);
 		}
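The write path of `batched_tlb_enabled_handler()` in this version can be summarised with a small userspace model. It is a sketch, not the kernel code: the static key is a plain `bool`, the `ARM64_WORKAROUND_REPEAT_TLBI` capability is a flag the caller sets, the `CAP_SYS_ADMIN` check is left out, and all names are hypothetical. It assumes, as for `proc_dointvec_minmax()`, that values outside `[extra1, extra2] = [0, 1]` are rejected with `-EINVAL`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the static key and the CPU capability. */
static bool batched_tlb_enabled;   /* stands in for the static key */
static bool cpu_needs_repeat_tlbi; /* stands in for cpus_have_const_cap() */

/* Repeat-TLBI parts already pay a DSB ISH per TLBI, so deferral saves nothing. */
static bool flush_deferral_supported(void)
{
	return !cpu_needs_repeat_tlbi;
}

/*
 * Models the write branch of the handler: reject out-of-range values,
 * then set the key to "enabled && supported".
 */
static int write_batched_tlb_enabled(int value)
{
	if (value < 0 || value > 1)
		return -EINVAL;
	batched_tlb_enabled = value && flush_deferral_supported();
	return 0;
}
```

In this model, as in the handler, writing 1 on an affected CPU returns success but leaves batching disabled, so reading the sysctl back is the only way to see that the request had no effect.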