diff mbox series

[1/2] mm: use_mm: fix for arches checking mm_users to optimize TLB flushes

Message ID 20200203201745.29986-2-aarcange@redhat.com (mailing list archive)
State New, archived
Headers show
Series arm64: tlb: skip tlbi broadcast for single threaded TLB flushes | expand

Commit Message

Andrea Arcangeli Feb. 3, 2020, 8:17 p.m. UTC
alpha, ia64, mips, powerpc, sh, sparc are relying on a check on
mm->mm_users to know if they can skip some remote TLB flushes for
single threaded processes.

Most callers of use_mm() tend to invoke mmget_not_zero() or
get_task_mm() before use_mm() to ensure the mm will remain alive in
between use_mm() and unuse_mm().

Some callers however don't increase mm_users and they instead rely on
serialization in __mmput() to ensure the mm will remain alive in
between use_mm() and unuse_mm(). Not increasing mm_users during
use_mm() is however unsafe for aforementioned arch TLB flushes
optimizations. So either mmget()/mmput() should be added to the
problematic callers of use_mm()/unuse_mm() or we can embed them in
use_mm()/unuse_mm() which is more robust.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 mm/mmu_context.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Michal Hocko Feb. 18, 2020, 11:31 a.m. UTC | #1
On Mon 03-02-20 15:17:44, Andrea Arcangeli wrote:
> alpha, ia64, mips, powerpc, sh, sparc are relying on a check on
> mm->mm_users to know if they can skip some remote TLB flushes for
> single threaded processes.
> 
> Most callers of use_mm() tend to invoke mmget_not_zero() or
> get_task_mm() before use_mm() to ensure the mm will remain alive in
> between use_mm() and unuse_mm().
> 
> Some callers however don't increase mm_users and they instead rely on
> serialization in __mmput() to ensure the mm will remain alive in
> between use_mm() and unuse_mm(). Not increasing mm_users during
> use_mm() is however unsafe for aforementioned arch TLB flushes
> optimizations. So either mmget()/mmput() should be added to the
> problematic callers of use_mm()/unuse_mm() or we can embed them in
> use_mm()/unuse_mm() which is more robust.

I would prefer we do not do that because then the real owner of the mm
cannot really tear down the address space and the life time of it is
bound to a kernel thread doing the use_mm. This is undesirable I would
really prefer if the existing few users would use mmget only when they
really need to access mm.

> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>  mm/mmu_context.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/mmu_context.c b/mm/mmu_context.c
> index 3e612ae748e9..ced0e1218c0f 100644
> --- a/mm/mmu_context.c
> +++ b/mm/mmu_context.c
> @@ -30,6 +30,7 @@ void use_mm(struct mm_struct *mm)
>  		mmgrab(mm);
>  		tsk->active_mm = mm;
>  	}
> +	mmget(mm);
>  	tsk->mm = mm;
>  	switch_mm(active_mm, mm, tsk);
>  	task_unlock(tsk);
> @@ -57,6 +58,7 @@ void unuse_mm(struct mm_struct *mm)
>  	task_lock(tsk);
>  	sync_mm_rss(mm);
>  	tsk->mm = NULL;
> +	mmput(mm);
>  	/* active_mm is still 'mm' */
>  	enter_lazy_tlb(mm, tsk);
>  	task_unlock(tsk);
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Andrea Arcangeli Feb. 18, 2020, 6:56 p.m. UTC | #2
Hi Michal!

On Tue, Feb 18, 2020 at 12:31:03PM +0100, Michal Hocko wrote:
> On Mon 03-02-20 15:17:44, Andrea Arcangeli wrote:
> > alpha, ia64, mips, powerpc, sh, sparc are relying on a check on
> > mm->mm_users to know if they can skip some remote TLB flushes for
> > single threaded processes.
> > 
> > Most callers of use_mm() tend to invoke mmget_not_zero() or
> > get_task_mm() before use_mm() to ensure the mm will remain alive in
> > between use_mm() and unuse_mm().
> > 
> > Some callers however don't increase mm_users and they instead rely on
> > serialization in __mmput() to ensure the mm will remain alive in
> > between use_mm() and unuse_mm(). Not increasing mm_users during
> > use_mm() is however unsafe for aforementioned arch TLB flushes
> > optimizations. So either mmget()/mmput() should be added to the
> > problematic callers of use_mm()/unuse_mm() or we can embed them in
> > use_mm()/unuse_mm() which is more robust.
> 
> I would prefer we do not do that because then the real owner of the mm
> cannot really tear down the address space and the life time of it is
> bound to a kernel thread doing the use_mm. This is undesirable I would
> really prefer if the existing few users would use mmget only when they
> really need to access mm.

If the existing few users that don't already do the explicit mmget
will have to start doing it too, the end result will be exactly the
same that you described in your "cons" (lieftime of the mm will still
be up to who did mmget;use_mm and didn't call unuse_mm;mmput yet).

One reason to prefer adding the mmget to the callers to forget it,
would be to avoid an atomic op in use_mm (for those callers that
didn't forget it), but if anybody is doing use_mm in a fast path that
won't be very fast anyway so I didn't think this was worth the
risk. If that microoptimization in a slow path is the reason we should
add mmget to the callers that forgot it that would be fine with me
although I think it's risky because if already happened once and it
could happen again (and when it happens it only bits a few arches if
used with a few drivers).

On a side note the patch 2/2 should be dropped for now, I'm looking if
we can optimize away TLB-i broadcasts from multithreaded apps too.

Thanks,
Andrea

> 
> > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> > ---
> >  mm/mmu_context.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/mm/mmu_context.c b/mm/mmu_context.c
> > index 3e612ae748e9..ced0e1218c0f 100644
> > --- a/mm/mmu_context.c
> > +++ b/mm/mmu_context.c
> > @@ -30,6 +30,7 @@ void use_mm(struct mm_struct *mm)
> >  		mmgrab(mm);
> >  		tsk->active_mm = mm;
> >  	}
> > +	mmget(mm);
> >  	tsk->mm = mm;
> >  	switch_mm(active_mm, mm, tsk);
> >  	task_unlock(tsk);
> > @@ -57,6 +58,7 @@ void unuse_mm(struct mm_struct *mm)
> >  	task_lock(tsk);
> >  	sync_mm_rss(mm);
> >  	tsk->mm = NULL;
> > +	mmput(mm);
> >  	/* active_mm is still 'mm' */
> >  	enter_lazy_tlb(mm, tsk);
> >  	task_unlock(tsk);
> > 
> > 
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-kernel@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 
> -- 
> Michal Hocko
> SUSE Labs
>
Michal Hocko Feb. 19, 2020, 11:58 a.m. UTC | #3
On Tue 18-02-20 13:56:18, Andrea Arcangeli wrote:
> Hi Michal!
> 
> On Tue, Feb 18, 2020 at 12:31:03PM +0100, Michal Hocko wrote:
> > On Mon 03-02-20 15:17:44, Andrea Arcangeli wrote:
> > > alpha, ia64, mips, powerpc, sh, sparc are relying on a check on
> > > mm->mm_users to know if they can skip some remote TLB flushes for
> > > single threaded processes.
> > > 
> > > Most callers of use_mm() tend to invoke mmget_not_zero() or
> > > get_task_mm() before use_mm() to ensure the mm will remain alive in
> > > between use_mm() and unuse_mm().
> > > 
> > > Some callers however don't increase mm_users and they instead rely on
> > > serialization in __mmput() to ensure the mm will remain alive in
> > > between use_mm() and unuse_mm(). Not increasing mm_users during
> > > use_mm() is however unsafe for aforementioned arch TLB flushes
> > > optimizations. So either mmget()/mmput() should be added to the
> > > problematic callers of use_mm()/unuse_mm() or we can embed them in
> > > use_mm()/unuse_mm() which is more robust.
> > 
> > I would prefer we do not do that because then the real owner of the mm
> > cannot really tear down the address space and the life time of it is
> > bound to a kernel thread doing the use_mm. This is undesirable I would
> > really prefer if the existing few users would use mmget only when they
> > really need to access mm.
> 
> If the existing few users that don't already do the explicit mmget
> will have to start doing it too, the end result will be exactly the
> same that you described in your "cons" (lieftime of the mm will still
> be up to who did mmget;use_mm and didn't call unuse_mm;mmput yet).

Well, they should use mmget only for moments when they access the
address space.

> One reason to prefer adding the mmget to the callers to forget it,
> would be to avoid an atomic op in use_mm (for those callers that
> didn't forget it), but if anybody is doing use_mm in a fast path that
> won't be very fast anyway so I didn't think this was worth the
> risk. If that microoptimization in a slow path is the reason we should
> add mmget to the callers that forgot it that would be fine with me
> although I think it's risky because if already happened once and it
> could happen again (and when it happens it only bits a few arches if
> used with a few drivers).

The primary concern really is that use_mm is usually not bound in time
and we do not want to pin the address space for such a long time without
any binding to a killable process.
Andrea Arcangeli Feb. 19, 2020, 8:04 p.m. UTC | #4
On Wed, Feb 19, 2020 at 12:58:55PM +0100, Michal Hocko wrote:
> On Tue 18-02-20 13:56:18, Andrea Arcangeli wrote:
> > Hi Michal!
> > 
> > On Tue, Feb 18, 2020 at 12:31:03PM +0100, Michal Hocko wrote:
> > > On Mon 03-02-20 15:17:44, Andrea Arcangeli wrote:
> > > > alpha, ia64, mips, powerpc, sh, sparc are relying on a check on
> > > > mm->mm_users to know if they can skip some remote TLB flushes for
> > > > single threaded processes.
> > > > 
> > > > Most callers of use_mm() tend to invoke mmget_not_zero() or
> > > > get_task_mm() before use_mm() to ensure the mm will remain alive in
> > > > between use_mm() and unuse_mm().
> > > > 
> > > > Some callers however don't increase mm_users and they instead rely on
> > > > serialization in __mmput() to ensure the mm will remain alive in
> > > > between use_mm() and unuse_mm(). Not increasing mm_users during
> > > > use_mm() is however unsafe for aforementioned arch TLB flushes
> > > > optimizations. So either mmget()/mmput() should be added to the
> > > > problematic callers of use_mm()/unuse_mm() or we can embed them in
> > > > use_mm()/unuse_mm() which is more robust.
> > > 
> > > I would prefer we do not do that because then the real owner of the mm
> > > cannot really tear down the address space and the life time of it is
> > > bound to a kernel thread doing the use_mm. This is undesirable I would
> > > really prefer if the existing few users would use mmget only when they
> > > really need to access mm.
> > 
> > If the existing few users that don't already do the explicit mmget
> > will have to start doing it too, the end result will be exactly the
> > same that you described in your "cons" (lieftime of the mm will still
> > be up to who did mmget;use_mm and didn't call unuse_mm;mmput yet).
> 
> Well, they should use mmget only for moments when they access the
> address space.

I don't think so because mmget is a pure atomic_inc. How can you serialize
that against a tlb flush that just does an atomic_read() on mm_users?

At the very least you will have to invent something new slower than
mmget that adds some serialization against the TLB flush code and the
common code would need to learn to use that. And the question is how
much faster that can be than switch_mm() that use_mm already invokes.

It doesn't need to block but it needs a smp_mb() on both sides.

i.e. after atomic_inc(&mm_users) you need a smp_mb() and a
test_and_clear_bit and conditional local tlb flush. In the TLB flush
code you need to set the bit, smp_mb() and then atomic_read(&mm_users).

As things are now (no new mmget_local_tlb_flush() call in common code)
the mmget has to happen regardless before use_mm() so there's a slight
chance to serialize it in switch_mm arch code. Note I did put a
smp_mb() myself in the switch_mm path to make it safe in 2/2.

With your solution it'd be already possible to call mmput at any time
after using the mm and before calling unuse_mm. But then you'd have to
still call unuse_mm; mmget;use_mm again before you can use the mm
again so why not to call unuse_mm immediately instead of only mmput?

> > One reason to prefer adding the mmget to the callers to forget it,
> > would be to avoid an atomic op in use_mm (for those callers that
> > didn't forget it), but if anybody is doing use_mm in a fast path that
> > won't be very fast anyway so I didn't think this was worth the
> > risk. If that microoptimization in a slow path is the reason we should
> > add mmget to the callers that forgot it that would be fine with me
> > although I think it's risky because if already happened once and it
> > could happen again (and when it happens it only bits a few arches if
> > used with a few drivers).
> 
> The primary concern really is that use_mm is usually not bound in time
> and we do not want to pin the address space for such a long time without
> any binding to a killable process.

Almost all use_mm already do mmget before use_mm and mmput after
unuse_mm, or perhaps I wouldn't have to find about this by source
review only. So the cons you describe already affects the vast
majority of use_mm cases (the non buggy ones).

I see you concern for adding a "cons" to the minority of remaining
cases, but it's still better to introduce that "cons" than corrupting
userland memory.

Thanks,
Andrea
diff mbox series

Patch

diff --git a/mm/mmu_context.c b/mm/mmu_context.c
index 3e612ae748e9..ced0e1218c0f 100644
--- a/mm/mmu_context.c
+++ b/mm/mmu_context.c
@@ -30,6 +30,7 @@  void use_mm(struct mm_struct *mm)
 		mmgrab(mm);
 		tsk->active_mm = mm;
 	}
+	mmget(mm);
 	tsk->mm = mm;
 	switch_mm(active_mm, mm, tsk);
 	task_unlock(tsk);
@@ -57,6 +58,7 @@  void unuse_mm(struct mm_struct *mm)
 	task_lock(tsk);
 	sync_mm_rss(mm);
 	tsk->mm = NULL;
+	mmput(mm);
 	/* active_mm is still 'mm' */
 	enter_lazy_tlb(mm, tsk);
 	task_unlock(tsk);