Message ID: 20210131001132.3368247-9-namit@vmware.com (mailing list archive)
State: New, archived
Series: TLB batching consolidation and enhancements
On Sat, Jan 30, 2021 at 4:16 PM Nadav Amit <nadav.amit@gmail.com> wrote:
>
> From: Nadav Amit <namit@vmware.com>
>
> To detect deferred TLB flushes in fine granularity, we need to keep
> track on the completed TLB flush generation for each mm.
>
> Add logic to track for each mm the tlb_gen_completed, which tracks the
> completed TLB generation. It is the arch responsibility to call
> mark_mm_tlb_gen_done() whenever a TLB flush is completed.
>
> Start the generation numbers from 1 instead of 0. This would allow later
> to detect whether flushes of a certain generation were completed.

Can you elaborate on how this helps?

I think you should document that tlb_gen_completed only means that no
outdated TLB entries will be observably used. In the x86 implementation
it's possible for older TLB entries to still exist, unused, in TLBs of
cpus running other mms.

How does this work with arch_tlbbatch_flush()?

> Signed-off-by: Nadav Amit <namit@vmware.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Will Deacon <will@kernel.org>
> Cc: Yu Zhao <yuzhao@google.com>
> Cc: Nick Piggin <npiggin@gmail.com>
> Cc: x86@kernel.org
> ---
>  arch/x86/mm/tlb.c         | 10 ++++++++++
>  include/asm-generic/tlb.h | 33 +++++++++++++++++++++++++++++++++
>  include/linux/mm_types.h  | 15 ++++++++++++++-
>  3 files changed, 57 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 7ab21430be41..d17b5575531e 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -14,6 +14,7 @@
>  #include <asm/nospec-branch.h>
>  #include <asm/cache.h>
>  #include <asm/apic.h>
> +#include <asm/tlb.h>
>
>  #include "mm_internal.h"
>
> @@ -915,6 +916,9 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
>  	if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids)
>  		flush_tlb_others(mm_cpumask(mm), info);
>
> +	/* Update the completed generation */
> +	mark_mm_tlb_gen_done(mm, new_tlb_gen);
> +
>  	put_flush_tlb_info();
>  	put_cpu();
>  }
> @@ -1147,6 +1151,12 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
>
>  	cpumask_clear(&batch->cpumask);
>
> +	/*
> +	 * We cannot call mark_mm_tlb_gen_done() since we do not know which
> +	 * mm's should be flushed. This may lead to some unwarranted TLB
> +	 * flushes, but not to correction problems.
> +	 */
> +
>  	put_cpu();
>  }
>
> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> index 517c89398c83..427bfcc6cdec 100644
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -513,6 +513,39 @@ static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vm
>  }
>  #endif
>
> +#ifdef CONFIG_ARCH_HAS_TLB_GENERATIONS
> +
> +/*
> + * Helper function to update a generation to have a new value, as long as new
> + * value is greater or equal to gen.
> + */

I read this a couple of times, and I don't understand it. How about:

Helper function to atomically set *gen = max(*gen, new_gen)

> +static inline void tlb_update_generation(atomic64_t *gen, u64 new_gen)
> +{
> +	u64 cur_gen = atomic64_read(gen);
> +
> +	while (cur_gen < new_gen) {
> +		u64 old_gen = atomic64_cmpxchg(gen, cur_gen, new_gen);
> +
> +		/* Check if we succeeded in the cmpxchg */
> +		if (likely(cur_gen == old_gen))
> +			break;
> +
> +		cur_gen = old_gen;
> +	};
> +}
> +
> +
> +static inline void mark_mm_tlb_gen_done(struct mm_struct *mm, u64 gen)
> +{
> +	/*
> +	 * Update the completed generation to the new generation if the new
> +	 * generation is greater than the previous one.
> +	 */
> +	tlb_update_generation(&mm->tlb_gen_completed, gen);
> +}
> +
> +#endif /* CONFIG_ARCH_HAS_TLB_GENERATIONS */
> +
>  /*
>   * tlb_flush_{pte|pmd|pud|p4d}_range() adjust the tlb->start and tlb->end,
>   * and set corresponding cleared_*.
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 2035ac319c2b..8a5eb4bfac59 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -571,6 +571,13 @@ struct mm_struct {
>  	 * This is not used on Xen PV.
>  	 */
>  	atomic64_t tlb_gen;
> +
> +	/*
> +	 * TLB generation which is guarnateed to be flushed, including
> +	 * all the PTE changes that were performed before tlb_gen was
> +	 * incremented.
> +	 */

I will defer judgment to future patches before I believe that this isn't racy :)

> +	atomic64_t tlb_gen_completed;
>  #endif
>  } __randomize_layout;
>
> @@ -690,7 +697,13 @@ static inline bool mm_tlb_flush_nested(struct mm_struct *mm)
>  #ifdef CONFIG_ARCH_HAS_TLB_GENERATIONS
>  static inline void init_mm_tlb_gen(struct mm_struct *mm)
>  {
> -	atomic64_set(&mm->tlb_gen, 0);
> +	/*
> +	 * Start from generation of 1, so default generation 0 will be
> +	 * considered as flushed and would not be regarded as an outstanding
> +	 * deferred invalidation.
> +	 */

Aha, this makes sense.

> +	atomic64_set(&mm->tlb_gen, 1);
> +	atomic64_set(&mm->tlb_gen_completed, 1);
>  }
>
>  static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
> --
> 2.25.1
>
> On Jan 31, 2021, at 12:32 PM, Andy Lutomirski <luto@kernel.org> wrote:
>
> On Sat, Jan 30, 2021 at 4:16 PM Nadav Amit <nadav.amit@gmail.com> wrote:
>> From: Nadav Amit <namit@vmware.com>
>>
>> To detect deferred TLB flushes in fine granularity, we need to keep
>> track on the completed TLB flush generation for each mm.
>>
>> Add logic to track for each mm the tlb_gen_completed, which tracks the
>> completed TLB generation. It is the arch responsibility to call
>> mark_mm_tlb_gen_done() whenever a TLB flush is completed.
>>
>> Start the generation numbers from 1 instead of 0. This would allow later
>> to detect whether flushes of a certain generation were completed.
>
> Can you elaborate on how this helps?

I guess it should have gone to patch 15.

The relevant code it interacts with is in read_defer_tlb_flush_gen(). It
allows using a single check to detect an “outdated” deferred TLB gen.
Initially tlb->defer_gen is zero. We are going to do inc_mm_tlb_gen() both
on the first time we defer TLB entries and whenever we see mm_gen is newer
than tlb->defer_gen:

+	mm_gen = atomic64_read(&mm->tlb_gen);
+
+	/*
+	 * This condition checks for both first deferred TLB flush and for other
+	 * TLB pending or executed TLB flushes after the last table that we
+	 * updated. In the latter case, we are going to skip a generation, which
+	 * would lead to a full TLB flush. This should therefore not cause
+	 * correctness issues, and should not induce overheads, since anyhow in
+	 * TLB storms it is better to perform full TLB flush.
+	 */
+	if (mm_gen != tlb->defer_gen) {
+		VM_BUG_ON(mm_gen < tlb->defer_gen);
+
+		tlb->defer_gen = inc_mm_tlb_gen(mm);
+	}

> I think you should document that tlb_gen_completed only means that no
> outdated TLB entries will be observably used. In the x86
> implementation it's possible for older TLB entries to still exist,
> unused, in TLBs of cpus running other mms.

You mean entries that will later be flushed during switch_mm_irqs_off(),
right? I think that overall my comments need some work. Yes.

> How does this work with arch_tlbbatch_flush()?

completed_gen is not updated by arch_tlbbatch_flush(), since I couldn’t find
a way to combine them. completed_gen might not catch up with tlb_gen in this
case until another TLB flush takes place. I do not see a correctness issue,
but it might result in a redundant TLB flush.
[ fully quoted patch trimmed ]

>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>> index 2035ac319c2b..8a5eb4bfac59 100644
>> --- a/include/linux/mm_types.h
>> +++ b/include/linux/mm_types.h
>> @@ -571,6 +571,13 @@ struct mm_struct {
>>  	 * This is not used on Xen PV.
>>  	 */
>>  	atomic64_t tlb_gen;
>> +
>> +	/*
>> +	 * TLB generation which is guarnateed to be flushed, including
>
> guaranteed
>
>> +	 * all the PTE changes that were performed before tlb_gen was
>> +	 * incremented.
>> +	 */
>
> I will defer judgment to future patches before I believe that this isn't racy :)

Fair enough. Thanks for the review.
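[ To make the start-from-1 point above concrete: with tlb_gen and tlb_gen_completed initialized to 1, a zero-initialized deferred generation always compares as already flushed, so a single numeric check suffices. A hypothetical sketch, not code from the series; the helper name is ours:

/*
 * Hypothetical sketch of the start-from-1 trick: a defer_gen that was
 * never set (still 0) compares as already completed against the
 * initial tlb_gen_completed of 1, so no separate "was anything ever
 * deferred?" flag is needed.
 */
static inline bool tlb_flush_outstanding(struct mm_struct *mm, u64 defer_gen)
{
	/* defer_gen == 0 (never deferred) can never be outstanding. */
	return defer_gen > atomic64_read(&mm->tlb_gen_completed);
}
]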
On Sat, Jan 30, 2021 at 04:11:20PM -0800, Nadav Amit wrote:
> +static inline void tlb_update_generation(atomic64_t *gen, u64 new_gen)
> +{
> +	u64 cur_gen = atomic64_read(gen);
> +
> +	while (cur_gen < new_gen) {
> +		u64 old_gen = atomic64_cmpxchg(gen, cur_gen, new_gen);
> +
> +		/* Check if we succeeded in the cmpxchg */
> +		if (likely(cur_gen == old_gen))
> +			break;
> +
> +		cur_gen = old_gen;
> +	};
> +}

	u64 cur_gen = atomic64_read(gen);

	while (cur_gen < new_gen &&
	       !atomic64_try_cmpxchg(gen, &cur_gen, new_gen))
		;
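[ For readers unfamiliar with the API: atomic64_try_cmpxchg() returns true on success and, on failure, writes the value it observed back through its second argument, so the explicit re-read and success check collapse into the loop condition. A sketch of the helper reworked along these lines, i.e. PeterZ's fragment wrapped back into the function, not a committed form:

static inline void tlb_update_generation(atomic64_t *gen, u64 new_gen)
{
	u64 cur_gen = atomic64_read(gen);

	/*
	 * On failure, atomic64_try_cmpxchg() updates cur_gen to the
	 * value it found, so the next iteration re-tests against fresh
	 * state. The loop ends once *gen >= new_gen, whether set by us
	 * or by a concurrent updater.
	 */
	while (cur_gen < new_gen &&
	       !atomic64_try_cmpxchg(gen, &cur_gen, new_gen))
		;
}
]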
On Sun, Jan 31, 2021 at 11:28 PM Nadav Amit <namit@vmware.com> wrote:
>
> > On Jan 31, 2021, at 12:32 PM, Andy Lutomirski <luto@kernel.org> wrote:
> >
> > On Sat, Jan 30, 2021 at 4:16 PM Nadav Amit <nadav.amit@gmail.com> wrote:
> >> From: Nadav Amit <namit@vmware.com>

[ commit message and the read_defer_tlb_flush_gen() explanation, quoted in full above, trimmed ]

> > I think you should document that tlb_gen_completed only means that no
> > outdated TLB entries will be observably used. In the x86
> > implementation it's possible for older TLB entries to still exist,
> > unused, in TLBs of cpus running other mms.
>
> You mean entries that will later be flushed during switch_mm_irqs_off(),
> right? I think that overall my comments need some work. Yes.

That's exactly what I mean.

> > How does this work with arch_tlbbatch_flush()?
>
> completed_gen is not updated by arch_tlbbatch_flush(), since I couldn’t find
> a way to combine them. completed_gen might not catch up with tlb_gen in this
> case until another TLB flush takes place. I do not see a correctness issue,
> but it might result in a redundant TLB flush.

Please at least document this.

FWIW, arch_tlbbatch_flush() is gross. I'm not convinced it's really
supportable with proper broadcast invalidation. I suppose we could remove it
or explicitly track the set of mms that need flushing.
[ rest of quoted patch and review trimmed ]
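[ One possible shape for the "explicitly track the set of mms" idea floated above, purely as an illustration: names and sizing are ours, mm refcounting and locking are elided, and the real unmap batching structure is per-cpu and more constrained.

#define TLB_BATCH_NR_MMS 8

/* Illustrative only: remember each mm whose pages enter the batch,
 * together with the tlb_gen it had at that point. Recording the gen
 * before the shootdown keeps the later completion report conservative. */
struct tlb_batch_mms {
	struct mm_struct *mm[TLB_BATCH_NR_MMS];
	u64 gen[TLB_BATCH_NR_MMS];
	unsigned int nr;
	bool overflow;	/* too many mms: give up on completion tracking */
};

static void tlb_batch_note_mm(struct tlb_batch_mms *b, struct mm_struct *mm)
{
	unsigned int i;

	for (i = 0; i < b->nr; i++)
		if (b->mm[i] == mm)
			return;

	if (b->nr == TLB_BATCH_NR_MMS) {
		b->overflow = true;
		return;
	}

	b->mm[b->nr] = mm;
	b->gen[b->nr] = atomic64_read(&mm->tlb_gen);
	b->nr++;
}

/* After the shootdown, report completion for every recorded mm. */
static void tlb_batch_mark_done(struct tlb_batch_mms *b)
{
	unsigned int i;

	if (b->overflow)
		return;	/* cannot tell which generations completed */

	for (i = 0; i < b->nr; i++)
		mark_mm_tlb_gen_done(b->mm[i], b->gen[i]);
}
]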
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 7ab21430be41..d17b5575531e 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -14,6 +14,7 @@
 #include <asm/nospec-branch.h>
 #include <asm/cache.h>
 #include <asm/apic.h>
+#include <asm/tlb.h>
 
 #include "mm_internal.h"
 
@@ -915,6 +916,9 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 	if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids)
 		flush_tlb_others(mm_cpumask(mm), info);
 
+	/* Update the completed generation */
+	mark_mm_tlb_gen_done(mm, new_tlb_gen);
+
 	put_flush_tlb_info();
 	put_cpu();
 }
@@ -1147,6 +1151,12 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 
 	cpumask_clear(&batch->cpumask);
 
+	/*
+	 * We cannot call mark_mm_tlb_gen_done() since we do not know which
+	 * mm's should be flushed. This may lead to some unwarranted TLB
+	 * flushes, but not to correction problems.
+	 */
+
 	put_cpu();
 }
 
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 517c89398c83..427bfcc6cdec 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -513,6 +513,39 @@ static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vm
 }
 #endif
 
+#ifdef CONFIG_ARCH_HAS_TLB_GENERATIONS
+
+/*
+ * Helper function to update a generation to have a new value, as long as new
+ * value is greater or equal to gen.
+ */
+static inline void tlb_update_generation(atomic64_t *gen, u64 new_gen)
+{
+	u64 cur_gen = atomic64_read(gen);
+
+	while (cur_gen < new_gen) {
+		u64 old_gen = atomic64_cmpxchg(gen, cur_gen, new_gen);
+
+		/* Check if we succeeded in the cmpxchg */
+		if (likely(cur_gen == old_gen))
+			break;
+
+		cur_gen = old_gen;
+	};
+}
+
+
+static inline void mark_mm_tlb_gen_done(struct mm_struct *mm, u64 gen)
+{
+	/*
+	 * Update the completed generation to the new generation if the new
+	 * generation is greater than the previous one.
+	 */
+	tlb_update_generation(&mm->tlb_gen_completed, gen);
+}
+
+#endif /* CONFIG_ARCH_HAS_TLB_GENERATIONS */
+
 /*
  * tlb_flush_{pte|pmd|pud|p4d}_range() adjust the tlb->start and tlb->end,
  * and set corresponding cleared_*.
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 2035ac319c2b..8a5eb4bfac59 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -571,6 +571,13 @@ struct mm_struct {
 	 * This is not used on Xen PV.
 	 */
 	atomic64_t tlb_gen;
+
+	/*
+	 * TLB generation which is guarnateed to be flushed, including
+	 * all the PTE changes that were performed before tlb_gen was
+	 * incremented.
+	 */
+	atomic64_t tlb_gen_completed;
 #endif
 } __randomize_layout;
 
@@ -690,7 +697,13 @@ static inline bool mm_tlb_flush_nested(struct mm_struct *mm)
 #ifdef CONFIG_ARCH_HAS_TLB_GENERATIONS
 static inline void init_mm_tlb_gen(struct mm_struct *mm)
 {
-	atomic64_set(&mm->tlb_gen, 0);
+	/*
+	 * Start from generation of 1, so default generation 0 will be
+	 * considered as flushed and would not be regarded as an outstanding
+	 * deferred invalidation.
+	 */
+	atomic64_set(&mm->tlb_gen, 1);
+	atomic64_set(&mm->tlb_gen_completed, 1);
 }
 
 static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
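[ Read together, the patch establishes a simple protocol between generation producers and the flush path: bump the generation before changing PTEs' visibility requirements, flush, then report completion. A minimal sketch of the pairing, using only helpers from this patch; example_flush_mm() is our name, and the real call sites are flush_tlb_mm_range() and friends:

static void example_flush_mm(struct mm_struct *mm)
{
	/* Covers all PTE changes performed before this increment. */
	u64 new_tlb_gen = inc_mm_tlb_gen(mm);

	/* ... issue the architectural TLB shootdown for @mm ... */

	/* Everything up to and including new_tlb_gen is now flushed. */
	mark_mm_tlb_gen_done(mm, new_tlb_gen);
}
]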