
[RFC,01/20] mm/tlb: fix fullmm semantics

Message ID 20210131001132.3368247-2-namit@vmware.com (mailing list archive)
State New, archived
Series TLB batching consolidation and enhancements

Commit Message

Nadav Amit Jan. 31, 2021, 12:11 a.m. UTC
From: Nadav Amit <namit@vmware.com>

fullmm in mmu_gather is supposed to indicate that the mm is being torn
down (e.g., on process exit) and can therefore allow certain
optimizations. However, tlb_finish_mmu() sets fullmm when in fact it
wants to say that the TLB should be fully flushed.

Change tlb_finish_mmu() to set need_flush_all and check this flag in
tlb_flush_mmu_tlbonly() when deciding whether a flush is needed.
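
To make the distinction concrete, the intent is roughly the following (a
sketch of the semantics, not the exact kernel code):

	/* The mm is really going away (e.g., exit_mmap()):
	 * tear-down-only optimizations remain valid. */
	tlb->fullmm = 1;

	/* The mm stays alive, but the entire TLB must be invalidated: */
	tlb->need_flush_all = 1;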

Signed-off-by: Nadav Amit <namit@vmware.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: x86@kernel.org
---
 include/asm-generic/tlb.h | 2 +-
 mm/mmu_gather.c           | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Comments

Andy Lutomirski Jan. 31, 2021, 1:02 a.m. UTC | #1
On Sat, Jan 30, 2021 at 4:16 PM Nadav Amit <nadav.amit@gmail.com> wrote:
>
> From: Nadav Amit <namit@vmware.com>
>
> fullmm in mmu_gather is supposed to indicate that the mm is being torn
> down (e.g., on process exit) and can therefore allow certain
> optimizations. However, tlb_finish_mmu() sets fullmm when in fact it
> wants to say that the TLB should be fully flushed.

Maybe also rename fullmm?
Nadav Amit Jan. 31, 2021, 1:19 a.m. UTC | #2
> On Jan 30, 2021, at 5:02 PM, Andy Lutomirski <luto@kernel.org> wrote:
> 
> On Sat, Jan 30, 2021 at 4:16 PM Nadav Amit <nadav.amit@gmail.com> wrote:
>> From: Nadav Amit <namit@vmware.com>
>> 
>> fullmm in mmu_gather is supposed to indicate that the mm is being torn
>> down (e.g., on process exit) and can therefore allow certain
>> optimizations. However, tlb_finish_mmu() sets fullmm when in fact it
>> wants to say that the TLB should be fully flushed.
> 
> Maybe also rename fullmm?

Possible. How about mm_torn_down?

I should have also changed the comment in tlb_finish_mmu().
Andy Lutomirski Jan. 31, 2021, 2:57 a.m. UTC | #3
On Sat, Jan 30, 2021 at 5:19 PM Nadav Amit <nadav.amit@gmail.com> wrote:
>
> > On Jan 30, 2021, at 5:02 PM, Andy Lutomirski <luto@kernel.org> wrote:
> >
> > On Sat, Jan 30, 2021 at 4:16 PM Nadav Amit <nadav.amit@gmail.com> wrote:
> >> From: Nadav Amit <namit@vmware.com>
> >>
> >> fullmm in mmu_gather is supposed to indicate that the mm is being torn
> >> down (e.g., on process exit) and can therefore allow certain
> >> optimizations. However, tlb_finish_mmu() sets fullmm when in fact it
> >> wants to say that the TLB should be fully flushed.
> >
> > Maybe also rename fullmm?
>
> Possible. How about mm_torn_down?

Sure.  Or mm_exiting, perhaps?

>
> I should have also changed the comment in tlb_finish_mmu().
Nadav Amit Feb. 1, 2021, 7:30 a.m. UTC | #4
> On Jan 30, 2021, at 6:57 PM, Andy Lutomirski <luto@kernel.org> wrote:
> 
> On Sat, Jan 30, 2021 at 5:19 PM Nadav Amit <nadav.amit@gmail.com> wrote:
>>> On Jan 30, 2021, at 5:02 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>> 
>>> On Sat, Jan 30, 2021 at 4:16 PM Nadav Amit <nadav.amit@gmail.com> wrote:
>>>> From: Nadav Amit <namit@vmware.com>
>>>> 
>>>> fullmm in mmu_gather is supposed to indicate that the mm is being torn
>>>> down (e.g., on process exit) and can therefore allow certain
>>>> optimizations. However, tlb_finish_mmu() sets fullmm when in fact it
>>>> wants to say that the TLB should be fully flushed.
>>> 
>>> Maybe also rename fullmm?
>> 
>> Possible. How about mm_torn_down?
> 
> Sure.  Or mm_exiting, perhaps?

mm_exiting indeed sounds better.
Peter Zijlstra Feb. 1, 2021, 11:36 a.m. UTC | #5
https://lkml.kernel.org/r/20210127235347.1402-1-will@kernel.org
Nadav Amit Feb. 2, 2021, 9:32 a.m. UTC | #6
> On Feb 1, 2021, at 3:36 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> 
> https://lkml.kernel.org/r/20210127235347.1402-1-will@kernel.org

I have seen this series, and applied my patches on it.

Despite Will’s patches, there were still inconsistencies between fullmm
and need_flush_all.

Am I missing something?
Peter Zijlstra Feb. 2, 2021, 11 a.m. UTC | #7
On Tue, Feb 02, 2021 at 01:32:36AM -0800, Nadav Amit wrote:
> > On Feb 1, 2021, at 3:36 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > 
> > https://lkml.kernel.org/r/20210127235347.1402-1-will@kernel.org
> 
> I have seen this series, and applied my patches on it.
> 
> Despite Will’s patches, there were still inconsistencies between fullmm
> and need_flush_all.
> 
> Am I missing something?

I wasn't aware you were on top. I'll look again.
Nadav Amit Feb. 2, 2021, 9:35 p.m. UTC | #8
> On Feb 2, 2021, at 3:00 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Tue, Feb 02, 2021 at 01:32:36AM -0800, Nadav Amit wrote:
>>> On Feb 1, 2021, at 3:36 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>> 
>>> 
>>> https://lkml.kernel.org/r/20210127235347.1402-1-will@kernel.org
>> 
>> I have seen this series, and applied my patches on it.
>> 
>> Despite Will’s patches, there were still inconsistencies between fullmm
>> and need_flush_all.
>> 
>> Am I missing something?
> 
> I wasn't aware you were on top. I'll look again.

Looking at arm64’s tlb_flush() makes me think that there is currently a bug
that this patch fixes. Arm64’s tlb_flush() does:

       /*
        * If we're tearing down the address space then we only care about
        * invalidating the walk-cache, since the ASID allocator won't
        * reallocate our ASID without invalidating the entire TLB.
        */
       if (tlb->fullmm) {
               if (!last_level)
                       flush_tlb_mm(tlb->mm);
               return;
        }

But currently tlb_finish_mmu() can mistakenly set fullmm (if
mm_tlb_flush_nested() is true), which might cause the TLB flush to be skipped.

Lucky for us, arm64 flushes each VMA separately (which, as we discussed
elsewhere, may not be necessary), so the only PTEs that might not be flushed
are PTEs that are updated concurrently by another thread that also defers
its flushes. It therefore seems that the implications are limited to the
correctness of certain syscalls (e.g., madvise(MADV_DONTNEED)) without
implications for security or memory corruption.
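
For reference, the nested-flush path in tlb_finish_mmu() currently does,
roughly (see the diff at the bottom):

       if (mm_tlb_flush_nested(tlb->mm)) {
               /* Claims a tear-down that is not happening; arm64's
                * tlb_flush() above then takes the fullmm shortcut. */
               tlb->fullmm = 1;
               __tlb_reset_range(tlb);
               tlb->freed_tables = 1;
       }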

Let me know if you want me to send this patch separately with an updated
commit log for faster inclusion.
Will Deacon Feb. 3, 2021, 9:44 a.m. UTC | #9
On Tue, Feb 02, 2021 at 01:35:38PM -0800, Nadav Amit wrote:
> > On Feb 2, 2021, at 3:00 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > On Tue, Feb 02, 2021 at 01:32:36AM -0800, Nadav Amit wrote:
> >>> On Feb 1, 2021, at 3:36 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >>> 
> >>> 
> >>> https://lkml.kernel.org/r/20210127235347.1402-1-will@kernel.org
> >> 
> >> I have seen this series, and applied my patches on it.
> >> 
> >> Despite Will’s patches, there were still inconsistencies between fullmm
> >> and need_flush_all.
> >> 
> >> Am I missing something?
> > 
> > I wasn't aware you were on top. I'll look again.
> 
> Looking at arm64’s tlb_flush() makes me think that there is currently a bug
> that this patch fixes. Arm64’s tlb_flush() does:
> 
>        /*
>         * If we're tearing down the address space then we only care about
>         * invalidating the walk-cache, since the ASID allocator won't
>         * reallocate our ASID without invalidating the entire TLB.
>         */
>        if (tlb->fullmm) {
>                if (!last_level)
>                        flush_tlb_mm(tlb->mm);
>                return;
>        } 
> 
> But currently tlb_finish_mmu() can mistakenly set fullmm (if
> mm_tlb_flush_nested() is true), which might cause the TLB flush to be skipped.

But in that case isn't 'freed_tables' set to 1, so 'last_level' will be
false and we'll do the flush in the code above?
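
(For context, arm64 computes it as roughly

        bool last_level = !tlb->freed_tables;

so freed_tables == 1 forces the !last_level path, and hence the
flush_tlb_mm() call in the fullmm branch.)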

Will
Nadav Amit Feb. 4, 2021, 3:20 a.m. UTC | #10
> On Feb 3, 2021, at 1:44 AM, Will Deacon <will@kernel.org> wrote:
> 
> On Tue, Feb 02, 2021 at 01:35:38PM -0800, Nadav Amit wrote:
>>> On Feb 2, 2021, at 3:00 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>> 
>>> On Tue, Feb 02, 2021 at 01:32:36AM -0800, Nadav Amit wrote:
>>>>> On Feb 1, 2021, at 3:36 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>>>> 
>>>>> 
>>>>> https://lkml.kernel.org/r/20210127235347.1402-1-will@kernel.org
>>>> 
>>>> I have seen this series, and applied my patches on it.
>>>> 
>>>> Despite Will’s patches, there were still inconsistencies between fullmm
>>>> and need_flush_all.
>>>> 
>>>> Am I missing something?
>>> 
>>> I wasn't aware you were on top. I'll look again.
>> 
>> Looking at arm64’s tlb_flush() makes me think that there is currently a bug
>> that this patch fixes. Arm64’s tlb_flush() does:
>> 
>>       /*
>>        * If we're tearing down the address space then we only care about
>>        * invalidating the walk-cache, since the ASID allocator won't
>>        * reallocate our ASID without invalidating the entire TLB.
>>        */
>>       if (tlb->fullmm) {
>>               if (!last_level)
>>                       flush_tlb_mm(tlb->mm);
>>               return;
>>       } 
>> 
>> But currently tlb_finish_mmu() can mistakenly set fullmm (if
>> mm_tlb_flush_nested() is true), which might cause the TLB flush to be skipped.
> 
> But in that case isn't 'freed_tables' set to 1, so 'last_level' will be
> false and we'll do the flush in the code above?

Indeed. You are right. So no rush.

Patch

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 2c68a545ffa7..eea113323468 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -420,7 +420,7 @@  static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
 	 * these bits.
 	 */
 	if (!(tlb->freed_tables || tlb->cleared_ptes || tlb->cleared_pmds ||
-	      tlb->cleared_puds || tlb->cleared_p4ds))
+	      tlb->cleared_puds || tlb->cleared_p4ds || tlb->need_flush_all))
 		return;
 
 	tlb_flush(tlb);
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 0dc7149b0c61..5a659d4e59eb 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -323,7 +323,7 @@  void tlb_finish_mmu(struct mmu_gather *tlb)
 		 * On x86 non-fullmm doesn't yield significant difference
 		 * against fullmm.
 		 */
-		tlb->fullmm = 1;
+		tlb->need_flush_all = 1;
 		__tlb_reset_range(tlb);
 		tlb->freed_tables = 1;
 	}