Message ID: 20210131001132.3368247-16-namit@vmware.com
State: New, archived
Series: TLB batching consolidation and enhancements
> On Jan 30, 2021, at 4:11 PM, Nadav Amit <nadav.amit@gmail.com> wrote:
>
> From: Nadav Amit <namit@vmware.com>
>
> Currently, deferred TLB flushes are detected at the mm granularity: if
> there is any deferred TLB flush in the entire address space due to NUMA
> migration, pte_accessible() on x86 would return true, and
> ptep_clear_flush() would require a TLB flush. This would happen even if
> the PTE resides in a completely different vma.

[ snip ]

> +static inline void read_defer_tlb_flush_gen(struct mmu_gather *tlb)
> +{
> +	struct mm_struct *mm = tlb->mm;
> +	u64 mm_gen;
> +
> +	/*
> +	 * Any change of PTE before calling __track_deferred_tlb_flush() must
> +	 * be performed using an RMW atomic operation that provides a memory
> +	 * barrier, such as ptep_modify_prot_start(). The barrier ensures the
> +	 * PTEs are written before the current generation is read,
> +	 * synchronizing (implicitly) with flush_tlb_mm_range().
> +	 */
> +	smp_mb__after_atomic();
> +
> +	mm_gen = atomic64_read(&mm->tlb_gen);
> +
> +	/*
> +	 * This condition checks both for the first deferred TLB flush and for
> +	 * other TLB flushes, pending or executed, after the last table that we
> +	 * updated. In the latter case, we are going to skip a generation,
> +	 * which would lead to a full TLB flush. This should therefore not
> +	 * cause correctness issues, and should not induce overheads, since in
> +	 * TLB storms it is anyhow better to perform a full TLB flush.
> +	 */
> +	if (mm_gen != tlb->defer_gen) {
> +		VM_BUG_ON(mm_gen < tlb->defer_gen);
> +
> +		tlb->defer_gen = inc_mm_tlb_gen(mm);
> +	}
> +}

Andy's comments managed to make me realize this code is wrong. We must
call inc_mm_tlb_gen(mm) every time.

Otherwise, a CPU that saw the old tlb_gen and updated it in its local
cpu_tlbstate on a context-switch would miss the flush: since the process
was not running when the TLB flush was issued, no IPI is sent to that CPU,
so a later switch_mm_irqs_off() back to the process will not flush the
local TLB.

I need to think whether there is a better solution. Multiple calls to
inc_mm_tlb_gen() during deferred flushes would trigger a full TLB flush
instead of one that is specific to the ranges, once the flush actually
takes place. On x86 it is practically a non-issue, since any update of more
than 33 entries or so would anyhow cause a full TLB flush, but this is
still ugly.
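To make the fix being discussed concrete, here is a minimal sketch of the unconditional variant of the helper above ("call inc_mm_tlb_gen(mm) every time"). It assumes the definitions from the quoted patch and is only an illustration of the fix under discussion, not code from a later revision of the series.

/*
 * Illustrative sketch only -- the unconditional variant discussed above,
 * not code posted in the series.
 */
static inline void read_defer_tlb_flush_gen(struct mmu_gather *tlb)
{
	struct mm_struct *mm = tlb->mm;

	/*
	 * As in the quoted patch: pairs with the RMW atomic PTE update so
	 * that the PTEs are written before the generation is advanced.
	 */
	smp_mb__after_atomic();

	/*
	 * Unconditionally advance the generation, so a CPU that recorded an
	 * older tlb_gen at context-switch time can never appear up to date
	 * without flushing. The cost, as noted above, is that several
	 * increments between flushes can turn the eventual flush into a
	 * full one.
	 */
	tlb->defer_gen = inc_mm_tlb_gen(mm);
}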
> On Feb 1, 2021, at 2:04 PM, Nadav Amit <nadav.amit@gmail.com> wrote:
>
>> On Jan 30, 2021, at 4:11 PM, Nadav Amit <nadav.amit@gmail.com> wrote:
>>
>> From: Nadav Amit <namit@vmware.com>
>>
>> Currently, deferred TLB flushes are detected at the mm granularity: if
>> there is any deferred TLB flush in the entire address space due to NUMA
>> migration, pte_accessible() on x86 would return true, and
>> ptep_clear_flush() would require a TLB flush. This would happen even if
>> the PTE resides in a completely different vma.
>
> [ snip ]
>
>> +static inline void read_defer_tlb_flush_gen(struct mmu_gather *tlb)
>> +{
>> +	struct mm_struct *mm = tlb->mm;
>> +	u64 mm_gen;
>> +
>> +	/*
>> +	 * Any change of PTE before calling __track_deferred_tlb_flush() must
>> +	 * be performed using an RMW atomic operation that provides a memory
>> +	 * barrier, such as ptep_modify_prot_start(). The barrier ensures the
>> +	 * PTEs are written before the current generation is read,
>> +	 * synchronizing (implicitly) with flush_tlb_mm_range().
>> +	 */
>> +	smp_mb__after_atomic();
>> +
>> +	mm_gen = atomic64_read(&mm->tlb_gen);
>> +
>> +	/*
>> +	 * This condition checks both for the first deferred TLB flush and for
>> +	 * other TLB flushes, pending or executed, after the last table that we
>> +	 * updated. In the latter case, we are going to skip a generation,
>> +	 * which would lead to a full TLB flush. This should therefore not
>> +	 * cause correctness issues, and should not induce overheads, since in
>> +	 * TLB storms it is anyhow better to perform a full TLB flush.
>> +	 */
>> +	if (mm_gen != tlb->defer_gen) {
>> +		VM_BUG_ON(mm_gen < tlb->defer_gen);
>> +
>> +		tlb->defer_gen = inc_mm_tlb_gen(mm);
>> +	}
>> +}
>
> Andy's comments managed to make me realize this code is wrong. We must
> call inc_mm_tlb_gen(mm) every time.
>
> Otherwise, a CPU that saw the old tlb_gen and updated it in its local
> cpu_tlbstate on a context-switch would miss the flush: since the process
> was not running when the TLB flush was issued, no IPI is sent to that CPU,
> so a later switch_mm_irqs_off() back to the process will not flush the
> local TLB.
>
> I need to think whether there is a better solution. Multiple calls to
> inc_mm_tlb_gen() during deferred flushes would trigger a full TLB flush
> instead of one that is specific to the ranges, once the flush actually
> takes place. On x86 it is practically a non-issue, since any update of more
> than 33 entries or so would anyhow cause a full TLB flush, but this is
> still ugly.

What if we had a per-mm ring buffer of flushes? When starting a flush, we
would stick the range in the ring buffer and, when flushing, we would read
the ring buffer to catch up. This would mostly replace the flush_tlb_info
struct, and it would let us process multiple partial flushes together.
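As one concrete reading of this suggestion, here is a minimal data-structure sketch. Everything in it (tlb_ring_entry, tlb_flush_ring, TLB_RING_SIZE) is an invented name used only for illustration, not something proposed verbatim in the thread; the later sketches below build on it.

/* Illustrative only: a possible per-mm ring of deferred-flush descriptors. */
#include <linux/spinlock.h>
#include <linux/types.h>

#define TLB_RING_SIZE	64	/* power of two; indexed by generation % size */

struct tlb_ring_entry {
	u64		new_tlb_gen;	/* generation this entry describes; 0 = invalid */
	unsigned long	start, end;	/* virtual address range to flush */
	unsigned int	stride_shift;
	bool		freed_tables;
};

struct tlb_flush_ring {
	spinlock_t		lock;	/* serializes writers */
	struct tlb_ring_entry	entries[TLB_RING_SIZE];
};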
> On Feb 1, 2021, at 4:14 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
>> On Feb 1, 2021, at 2:04 PM, Nadav Amit <nadav.amit@gmail.com> wrote:
>>
>> Andy's comments managed to make me realize this code is wrong. We must
>> call inc_mm_tlb_gen(mm) every time.
>>
>> Otherwise, a CPU that saw the old tlb_gen and updated it in its local
>> cpu_tlbstate on a context-switch would miss the flush: since the process
>> was not running when the TLB flush was issued, no IPI is sent to that CPU,
>> so a later switch_mm_irqs_off() back to the process will not flush the
>> local TLB.
>>
>> I need to think whether there is a better solution. Multiple calls to
>> inc_mm_tlb_gen() during deferred flushes would trigger a full TLB flush
>> instead of one that is specific to the ranges, once the flush actually
>> takes place. On x86 it is practically a non-issue, since any update of more
>> than 33 entries or so would anyhow cause a full TLB flush, but this is
>> still ugly.
>
> What if we had a per-mm ring buffer of flushes? When starting a flush, we
> would stick the range in the ring buffer and, when flushing, we would read
> the ring buffer to catch up. This would mostly replace the flush_tlb_info
> struct, and it would let us process multiple partial flushes together.

I wanted to sleep on it, and went back and forth on whether it is the right
direction, hence the late response.

I think that what you say makes sense. I think that I even once tried to do
something similar for some reason, but my memory plays tricks on me.

So tell me what you think of this ring-based solution. As you said, you keep
a per-mm ring of flush_tlb_info. When you queue an entry, you do something
like:

#define RING_ENTRY_INVALID (0)

gen = inc_mm_tlb_gen(mm);
struct flush_tlb_info *info = &mm->ring[gen % RING_SIZE];
spin_lock(&mm->ring_lock);
WRITE_ONCE(info->new_tlb_gen, RING_ENTRY_INVALID);
smp_wmb();
info->start = start;
info->end = end;
info->stride_shift = stride_shift;
info->freed_tables = freed_tables;
smp_store_release(&info->new_tlb_gen, gen);
spin_unlock(&mm->ring_lock);

When you flush, you use the entry generation as a sequence lock. On overflow
of the ring (i.e., sequence number mismatch) you perform a full flush:

for (gen = mm->tlb_gen_completed; gen < mm->tlb_gen; gen++) {
	struct flush_tlb_info *info = &mm->ring[gen % RING_SIZE];

	// detect overflow and invalid entries
	if (smp_load_acquire(&info->new_tlb_gen) != gen)
		goto full_flush;

	start = min(start, info->start);
	end = max(end, info->end);
	stride_shift = min(stride_shift, info->stride_shift);
	freed_tables |= info->freed_tables;
	smp_rmb();

	// seqlock-like check that the information was not updated
	if (READ_ONCE(info->new_tlb_gen) != gen)
		goto full_flush;
}

On x86 I suspect that performing a full TLB flush would anyhow be the best
thing to do if there is more than a single entry. I am also not sure that it
makes sense to check the ring from flush_tlb_func_common() (i.e., in each
IPI handler), as it might cause cache thrashing.

Instead it may be better to do so from flush_tlb_mm_range(), when the
flushes are initiated, and use an aggregated flush_tlb_info for the flush.

It may also be better to make the ring arch-independent, so it would more
closely resemble mmu_gather (the parts about the TLB flush information,
without the freed-pages stuff).

We can detect deferred TLB flushes either by storing "deferred_gen" in the
page-tables/VMA (as I did) or by going over the ring, from tlb_gen_completed
to tlb_gen, and checking for an overlap. I think page-tables would be the
most efficient/scalable, but perhaps going over the ring would make for
easier-to-understand logic.

Makes sense? Thoughts?
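For readers following along, here is a self-contained kernel-style sketch of the two operations described above, written against the hypothetical tlb_flush_ring structure introduced earlier. It is an illustration of the scheme, not code from the thread, and it aggregates the generations in (completed, target] rather than mirroring the exact loop bounds in the mail.

#include <linux/minmax.h>	/* plus the headers from the earlier sketch */

/* Queue one deferred flush; 'gen' is the value returned by inc_mm_tlb_gen(). */
static void ring_queue_flush(struct tlb_flush_ring *ring, u64 gen,
			     unsigned long start, unsigned long end,
			     unsigned int stride_shift, bool freed_tables)
{
	struct tlb_ring_entry *e = &ring->entries[gen % TLB_RING_SIZE];

	spin_lock(&ring->lock);
	/* Invalidate the slot before rewriting it, seqlock-style. */
	WRITE_ONCE(e->new_tlb_gen, 0);
	smp_wmb();
	e->start = start;
	e->end = end;
	e->stride_shift = stride_shift;
	e->freed_tables = freed_tables;
	/* Publish: readers that see 'gen' also see the fields above. */
	smp_store_release(&e->new_tlb_gen, gen);
	spin_unlock(&ring->lock);
}

/*
 * Aggregate every generation in (completed, target]. Returns false if any
 * entry was overwritten or never written, in which case the caller should
 * fall back to a full flush.
 */
static bool ring_catch_up(struct tlb_flush_ring *ring, u64 completed, u64 target,
			  unsigned long *start, unsigned long *end,
			  unsigned int *stride_shift, bool *freed_tables)
{
	u64 gen;

	for (gen = completed + 1; gen <= target; gen++) {
		struct tlb_ring_entry *e = &ring->entries[gen % TLB_RING_SIZE];

		if (smp_load_acquire(&e->new_tlb_gen) != gen)
			return false;

		*start = min(*start, e->start);
		*end = max(*end, e->end);
		*stride_shift = min(*stride_shift, e->stride_shift);
		*freed_tables |= e->freed_tables;

		smp_rmb();
		/* Seqlock-style recheck: the entry must not have changed. */
		if (READ_ONCE(e->new_tlb_gen) != gen)
			return false;
	}

	return true;
}

The invalidate-then-publish pattern in the writer is what lets a lockless reader detect a slot that was reused mid-read, which is the "sequence lock" behavior described in the mail.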
> On Feb 2, 2021, at 12:52 PM, Nadav Amit <nadav.amit@gmail.com> wrote:
>
>> On Feb 1, 2021, at 4:14 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>
>>> On Feb 1, 2021, at 2:04 PM, Nadav Amit <nadav.amit@gmail.com> wrote:
>>>
>>> Andy's comments managed to make me realize this code is wrong. We must
>>> call inc_mm_tlb_gen(mm) every time.
>>>
>>> Otherwise, a CPU that saw the old tlb_gen and updated it in its local
>>> cpu_tlbstate on a context-switch would miss the flush: since the process
>>> was not running when the TLB flush was issued, no IPI is sent to that CPU,
>>> so a later switch_mm_irqs_off() back to the process will not flush the
>>> local TLB.
>>>
>>> I need to think whether there is a better solution. Multiple calls to
>>> inc_mm_tlb_gen() during deferred flushes would trigger a full TLB flush
>>> instead of one that is specific to the ranges, once the flush actually
>>> takes place. On x86 it is practically a non-issue, since any update of
>>> more than 33 entries or so would anyhow cause a full TLB flush, but this
>>> is still ugly.
>>
>> What if we had a per-mm ring buffer of flushes? When starting a flush, we
>> would stick the range in the ring buffer and, when flushing, we would read
>> the ring buffer to catch up. This would mostly replace the flush_tlb_info
>> struct, and it would let us process multiple partial flushes together.
>
> I wanted to sleep on it, and went back and forth on whether it is the right
> direction, hence the late response.
>
> I think that what you say makes sense. I think that I even once tried to do
> something similar for some reason, but my memory plays tricks on me.
>
> So tell me what you think of this ring-based solution. As you said, you keep
> a per-mm ring of flush_tlb_info. When you queue an entry, you do something
> like:
>
> #define RING_ENTRY_INVALID (0)
>
> gen = inc_mm_tlb_gen(mm);
> struct flush_tlb_info *info = &mm->ring[gen % RING_SIZE];
> spin_lock(&mm->ring_lock);

Once you are holding the lock, you should presumably check that the ring
didn't overflow while you were getting the lock: if new_tlb_gen > gen,
abort.

> WRITE_ONCE(info->new_tlb_gen, RING_ENTRY_INVALID);
> smp_wmb();
> info->start = start;
> info->end = end;
> info->stride_shift = stride_shift;
> info->freed_tables = freed_tables;
> smp_store_release(&info->new_tlb_gen, gen);
> spin_unlock(&mm->ring_lock);

Seems reasonable. I'm curious how this ends up getting used.

> When you flush, you use the entry generation as a sequence lock. On overflow
> of the ring (i.e., sequence number mismatch) you perform a full flush:
>
> for (gen = mm->tlb_gen_completed; gen < mm->tlb_gen; gen++) {
> 	struct flush_tlb_info *info = &mm->ring[gen % RING_SIZE];
>
> 	// detect overflow and invalid entries
> 	if (smp_load_acquire(&info->new_tlb_gen) != gen)
> 		goto full_flush;
>
> 	start = min(start, info->start);
> 	end = max(end, info->end);
> 	stride_shift = min(stride_shift, info->stride_shift);
> 	freed_tables |= info->freed_tables;
> 	smp_rmb();
>
> 	// seqlock-like check that the information was not updated
> 	if (READ_ONCE(info->new_tlb_gen) != gen)
> 		goto full_flush;
> }
>
> On x86 I suspect that performing a full TLB flush would anyhow be the best
> thing to do if there is more than a single entry. I am also not sure that it
> makes sense to check the ring from flush_tlb_func_common() (i.e., in each
> IPI handler), as it might cause cache thrashing.
>
> Instead it may be better to do so from flush_tlb_mm_range(), when the
> flushes are initiated, and use an aggregated flush_tlb_info for the flush.
>
> It may also be better to make the ring arch-independent, so it would more
> closely resemble mmu_gather (the parts about the TLB flush information,
> without the freed-pages stuff).
>
> We can detect deferred TLB flushes either by storing "deferred_gen" in the
> page-tables/VMA (as I did) or by going over the ring, from tlb_gen_completed
> to tlb_gen, and checking for an overlap. I think page-tables would be the
> most efficient/scalable, but perhaps going over the ring would make for
> easier-to-understand logic.
>
> Makes sense? Thoughts?
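Andy's point about checking for overflow once the lock is held could look like this in the illustrative ring_queue_flush() sketch above. Again, the names are invented; the check after spin_lock() is the only new part.

/*
 * Variant of the illustrative ring_queue_flush() with the overflow check
 * suggested above: if a later generation already claimed this slot while we
 * waited for the lock, give up and let the caller do a full flush.
 */
static bool ring_queue_flush_checked(struct tlb_flush_ring *ring, u64 gen,
				     unsigned long start, unsigned long end,
				     unsigned int stride_shift, bool freed_tables)
{
	struct tlb_ring_entry *e = &ring->entries[gen % TLB_RING_SIZE];

	spin_lock(&ring->lock);
	if (READ_ONCE(e->new_tlb_gen) > gen) {
		/* The ring wrapped past us: abort, as suggested above. */
		spin_unlock(&ring->lock);
		return false;
	}
	WRITE_ONCE(e->new_tlb_gen, 0);
	smp_wmb();
	e->start = start;
	e->end = end;
	e->stride_shift = stride_shift;
	e->freed_tables = freed_tables;
	smp_store_release(&e->new_tlb_gen, gen);
	spin_unlock(&ring->lock);
	return true;
}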
diff --git a/arch/x86/include/asm/tlb.h b/arch/x86/include/asm/tlb.h
index 580636cdc257..ecf538e6c6d5 100644
--- a/arch/x86/include/asm/tlb.h
+++ b/arch/x86/include/asm/tlb.h
@@ -9,15 +9,23 @@ static inline void tlb_flush(struct mmu_gather *tlb);
 
 static inline void tlb_flush(struct mmu_gather *tlb)
 {
-	unsigned long start = 0UL, end = TLB_FLUSH_ALL;
 	unsigned int stride_shift = tlb_get_unmap_shift(tlb);
 
-	if (!tlb->fullmm && !tlb->need_flush_all) {
-		start = tlb->start;
-		end = tlb->end;
+	/* Perform full flush when needed */
+	if (tlb->fullmm || tlb->need_flush_all) {
+		flush_tlb_mm_range(tlb->mm, 0, TLB_FLUSH_ALL, stride_shift,
+				   tlb->freed_tables);
+		return;
 	}
 
-	flush_tlb_mm_range(tlb->mm, start, end, stride_shift, tlb->freed_tables);
+	/* Check if flush was already performed */
+	if (!tlb->freed_tables && !tlb->cleared_puds &&
+	    !tlb->cleared_p4ds &&
+	    atomic64_read(&tlb->mm->tlb_gen_completed) > tlb->defer_gen)
+		return;
+
+	flush_tlb_mm_range_gen(tlb->mm, tlb->start, tlb->end, stride_shift,
+			       tlb->freed_tables, tlb->defer_gen);
 }
 
 /*
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 2110b98026a7..296a00545056 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -225,6 +225,11 @@ void flush_tlb_others(const struct cpumask *cpumask,
 				: PAGE_SHIFT, false)
 
 extern void flush_tlb_all(void);
+
+extern void flush_tlb_mm_range_gen(struct mm_struct *mm, unsigned long start,
+				   unsigned long end, unsigned int stride_shift,
+				   bool freed_tables, u64 gen);
+
 extern void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 				unsigned long end, unsigned int stride_shift,
 				bool freed_tables);
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index d17b5575531e..48f4b56fc4a7 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -883,12 +883,11 @@ static inline void put_flush_tlb_info(void)
 #endif
 }
 
-void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
+void flush_tlb_mm_range_gen(struct mm_struct *mm, unsigned long start,
 				unsigned long end, unsigned int stride_shift,
-				bool freed_tables)
+				bool freed_tables, u64 new_tlb_gen)
 {
 	struct flush_tlb_info *info;
-	u64 new_tlb_gen;
 	int cpu;
 
 	cpu = get_cpu();
@@ -923,6 +922,15 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 	put_cpu();
 }
 
+void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
+				unsigned long end, unsigned int stride_shift,
+				bool freed_tables)
+{
+	u64 new_tlb_gen = inc_mm_tlb_gen(mm);
+
+	flush_tlb_mm_range_gen(mm, start, end, stride_shift, freed_tables,
+			       new_tlb_gen);
+}
 
 static void do_flush_tlb_all(void *info)
 {
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 10690763090a..f25d2d955076 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -295,6 +295,11 @@ struct mmu_gather {
 	unsigned int		cleared_puds : 1;
 	unsigned int		cleared_p4ds : 1;
 
+	/*
+	 * Whether a TLB flush was needed for PTEs in the current table
+	 */
+	unsigned int		cleared_ptes_in_table : 1;
+
 	unsigned int		batch_count;
 
 #ifndef CONFIG_MMU_GATHER_NO_GATHER
@@ -305,6 +310,10 @@ struct mmu_gather {
 #ifdef CONFIG_MMU_GATHER_PAGE_SIZE
 	unsigned int page_size;
 #endif
+
+#ifdef CONFIG_ARCH_HAS_TLB_GENERATIONS
+	u64			defer_gen;
+#endif
 #endif
 };
 
@@ -381,7 +390,8 @@ static inline void tlb_flush(struct mmu_gather *tlb)
 #endif
 
 #if __is_defined(tlb_flush) || \
-	IS_ENABLED(CONFIG_ARCH_WANT_AGGRESSIVE_TLB_FLUSH_BATCHING)
+	IS_ENABLED(CONFIG_ARCH_WANT_AGGRESSIVE_TLB_FLUSH_BATCHING) || \
+	IS_ENABLED(CONFIG_ARCH_HAS_TLB_GENERATIONS)
 
 static inline void tlb_update_vma(struct mmu_gather *tlb,
 				  struct vm_area_struct *vma)
@@ -472,7 +482,8 @@ static inline unsigned long tlb_get_unmap_size(struct mmu_gather *tlb)
  */
 static inline void tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
 {
-	if (tlb->fullmm)
+	if (IS_ENABLED(CONFIG_ARCH_WANT_AGGRESSIVE_TLB_FLUSH_BATCHING) &&
+	    tlb->fullmm)
 		return;
 
 	tlb_update_vma(tlb, vma);
@@ -530,16 +541,87 @@ static inline void mark_mm_tlb_gen_done(struct mm_struct *mm, u64 gen)
 	tlb_update_generation(&mm->tlb_gen_completed, gen);
 }
 
-#endif /* CONFIG_ARCH_HAS_TLB_GENERATIONS */
+static inline void read_defer_tlb_flush_gen(struct mmu_gather *tlb)
+{
+	struct mm_struct *mm = tlb->mm;
+	u64 mm_gen;
+
+	/*
+	 * Any change of PTE before calling __track_deferred_tlb_flush() must
+	 * be performed using an RMW atomic operation that provides a memory
+	 * barrier, such as ptep_modify_prot_start(). The barrier ensures the
+	 * PTEs are written before the current generation is read,
+	 * synchronizing (implicitly) with flush_tlb_mm_range().
+	 */
+	smp_mb__after_atomic();
+
+	mm_gen = atomic64_read(&mm->tlb_gen);
+
+	/*
+	 * This condition checks both for the first deferred TLB flush and for
+	 * other TLB flushes, pending or executed, after the last table that we
+	 * updated. In the latter case, we are going to skip a generation,
+	 * which would lead to a full TLB flush. This should therefore not
+	 * cause correctness issues, and should not induce overheads, since in
+	 * TLB storms it is anyhow better to perform a full TLB flush.
+	 */
+	if (mm_gen != tlb->defer_gen) {
+		VM_BUG_ON(mm_gen < tlb->defer_gen);
+
+		tlb->defer_gen = inc_mm_tlb_gen(mm);
+	}
+}
+
+/*
+ * Store the deferred TLB generation in the VMA
+ */
+static inline void store_deferred_tlb_gen(struct mmu_gather *tlb)
+{
+	tlb_update_generation(&tlb->vma->defer_tlb_gen, tlb->defer_gen);
+}
+
+/*
+ * Track deferred TLB flushes for PTEs and PMDs to allow fine-granularity
+ * checks of whether a PTE is accessible. The TLB generation after the PTE is
+ * flushed is saved in the mmu_gather struct. Once a flush is performed, the
+ * generation is advanced.
+ */
+static inline void track_defer_tlb_flush(struct mmu_gather *tlb)
+{
+	if (tlb->fullmm)
+		return;
+
+	BUG_ON(!tlb->vma);
+
+	read_defer_tlb_flush_gen(tlb);
+	store_deferred_tlb_gen(tlb);
+}
+
+#define init_vma_tlb_generation(vma)			\
+	atomic64_set(&(vma)->defer_tlb_gen, 0)
+#else
+static inline void init_vma_tlb_generation(struct vm_area_struct *vma) { }
+#endif
 
 #define tlb_start_ptes(tlb)						\
 	do {								\
 		struct mmu_gather *_tlb = (tlb);			\
 									\
 		flush_tlb_batched_pending(_tlb->mm);			\
+		if (IS_ENABLED(CONFIG_ARCH_HAS_TLB_GENERATIONS))	\
+			_tlb->cleared_ptes_in_table = 0;		\
 	} while (0)
 
-static inline void tlb_end_ptes(struct mmu_gather *tlb) { }
+static inline void tlb_end_ptes(struct mmu_gather *tlb)
+{
+	if (!IS_ENABLED(CONFIG_ARCH_HAS_TLB_GENERATIONS))
+		return;
+
+	if (tlb->cleared_ptes_in_table)
+		track_defer_tlb_flush(tlb);
+
+	tlb->cleared_ptes_in_table = 0;
+}
 
 /*
  * tlb_flush_{pte|pmd|pud|p4d}_range() adjust the tlb->start and tlb->end,
@@ -550,15 +632,25 @@ static inline void tlb_flush_pte_range(struct mmu_gather *tlb,
 {
 	__tlb_adjust_range(tlb, address, size);
 	tlb->cleared_ptes = 1;
+
+	if (IS_ENABLED(CONFIG_ARCH_HAS_TLB_GENERATIONS))
+		tlb->cleared_ptes_in_table = 1;
 }
 
-static inline void tlb_flush_pmd_range(struct mmu_gather *tlb,
+static inline void __tlb_flush_pmd_range(struct mmu_gather *tlb,
 				     unsigned long address, unsigned long size)
 {
 	__tlb_adjust_range(tlb, address, size);
 	tlb->cleared_pmds = 1;
 }
 
+static inline void tlb_flush_pmd_range(struct mmu_gather *tlb,
+				     unsigned long address, unsigned long size)
+{
+	__tlb_flush_pmd_range(tlb, address, size);
+	track_defer_tlb_flush(tlb);
+}
+
 static inline void tlb_flush_pud_range(struct mmu_gather *tlb,
 				     unsigned long address, unsigned long size)
 {
@@ -649,7 +741,7 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb,
 #ifndef pte_free_tlb
 #define pte_free_tlb(tlb, ptep, address)			\
 	do {							\
-		tlb_flush_pmd_range(tlb, address, PAGE_SIZE);	\
+		__tlb_flush_pmd_range(tlb, address, PAGE_SIZE);	\
 		tlb->freed_tables = 1;				\
 		__pte_free_tlb(tlb, ptep, address);		\
 	} while (0)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 676795dfd5d4..bbe5d4a422f7 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -367,6 +367,9 @@ struct vm_area_struct {
 #endif
 #ifdef CONFIG_NUMA
 	struct mempolicy *vm_policy;	/* NUMA policy for the VMA */
+#endif
+#ifdef CONFIG_ARCH_HAS_TLB_GENERATIONS
+	atomic64_t defer_tlb_gen;	/* Deferred TLB flushes generation */
 #endif
 	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
 } __randomize_layout;
@@ -628,6 +631,21 @@ static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
 	return atomic_read(&mm->tlb_flush_pending);
 }
 
+#ifdef CONFIG_ARCH_HAS_TLB_GENERATIONS
+static inline bool pte_tlb_flush_pending(struct vm_area_struct *vma, pte_t *pte)
+{
+	struct mm_struct *mm = vma->vm_mm;
+
+	return atomic64_read(&vma->defer_tlb_gen) < atomic64_read(&mm->tlb_gen_completed);
+}
+
+static inline bool pmd_tlb_flush_pending(struct vm_area_struct *vma, pmd_t *pmd)
+{
+	struct mm_struct *mm = vma->vm_mm;
+
+	return atomic64_read(&vma->defer_tlb_gen) < atomic64_read(&mm->tlb_gen_completed);
+}
+#else /* CONFIG_ARCH_HAS_TLB_GENERATIONS */
 static inline bool pte_tlb_flush_pending(struct vm_area_struct *vma, pte_t *pte)
 {
 	return mm_tlb_flush_pending(vma->vm_mm);
@@ -637,6 +655,7 @@ static inline bool pmd_tlb_flush_pending(struct vm_area_struct *vma, pmd_t *pmd)
 {
 	return mm_tlb_flush_pending(vma->vm_mm);
 }
+#endif /* CONFIG_ARCH_HAS_TLB_GENERATIONS */
 
 static inline bool mm_tlb_flush_nested(struct mm_struct *mm)
 {
diff --git a/mm/mmap.c b/mm/mmap.c
index 90673febce6a..a81ef902e296 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3337,6 +3337,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
 			get_file(new_vma->vm_file);
 		if (new_vma->vm_ops && new_vma->vm_ops->open)
 			new_vma->vm_ops->open(new_vma);
+		init_vma_tlb_generation(new_vma);
 		vma_link(mm, new_vma, prev, rb_link, rb_parent);
 		*need_rmap_locks = false;
 	}
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 13338c096cc6..0d554f2f92ac 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -329,6 +329,9 @@ static void __tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
 #endif
 	tlb_table_init(tlb);
+#ifdef CONFIG_ARCH_HAS_TLB_GENERATIONS
+	tlb->defer_gen = 0;
+#endif
 #ifdef CONFIG_MMU_GATHER_PAGE_SIZE
 	tlb->page_size = 0;
 #endif
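The diff above relies on generation helpers introduced by earlier patches in the series (mm->tlb_gen, mm->tlb_gen_completed, tlb_update_generation(), inc_mm_tlb_gen()). Their exact definitions are not part of this patch; the sketch below shows one plausible shape, inferred from how they are used here, purely to make the diff easier to follow.

/* Inferred sketch of the helpers this patch builds on; not copied from the series. */
static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
{
	/* Bump the mm-wide TLB generation and return the new value. */
	return atomic64_inc_return(&mm->tlb_gen);
}

static inline void tlb_update_generation(atomic64_t *gen, u64 new_gen)
{
	s64 cur = atomic64_read(gen);

	/* Monotonically advance the generation; never move it backwards. */
	while (cur < (s64)new_gen && !atomic64_try_cmpxchg(gen, &cur, new_gen))
		;
}

static inline void mark_mm_tlb_gen_done(struct mm_struct *mm, u64 gen)
{
	/* Record that all flushes up to and including 'gen' have completed. */
	tlb_update_generation(&mm->tlb_gen_completed, gen);
}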