| Message ID | 20200203201745.29986-2-aarcange@redhat.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | arm64: tlb: skip tlbi broadcast for single threaded TLB flushes |
On Mon 03-02-20 15:17:44, Andrea Arcangeli wrote:
> alpha, ia64, mips, powerpc, sh, sparc are relying on a check on
> mm->mm_users to know if they can skip some remote TLB flushes for
> single threaded processes.
>
> Most callers of use_mm() tend to invoke mmget_not_zero() or
> get_task_mm() before use_mm() to ensure the mm will remain alive in
> between use_mm() and unuse_mm().
>
> Some callers however don't increase mm_users and they instead rely on
> serialization in __mmput() to ensure the mm will remain alive in
> between use_mm() and unuse_mm(). Not increasing mm_users during
> use_mm() is however unsafe for the aforementioned arch TLB flush
> optimizations. So either mmget()/mmput() should be added to the
> problematic callers of use_mm()/unuse_mm() or we can embed them in
> use_mm()/unuse_mm(), which is more robust.

I would prefer we do not do that, because then the real owner of the mm
cannot really tear down the address space, and its lifetime is bound to
a kernel thread doing the use_mm. This is undesirable. I would really
prefer if the existing few users would use mmget only when they really
need to access the mm.

> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>  mm/mmu_context.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/mm/mmu_context.c b/mm/mmu_context.c
> index 3e612ae748e9..ced0e1218c0f 100644
> --- a/mm/mmu_context.c
> +++ b/mm/mmu_context.c
> @@ -30,6 +30,7 @@ void use_mm(struct mm_struct *mm)
>  		mmgrab(mm);
>  		tsk->active_mm = mm;
>  	}
> +	mmget(mm);
>  	tsk->mm = mm;
>  	switch_mm(active_mm, mm, tsk);
>  	task_unlock(tsk);
> @@ -57,6 +58,7 @@ void unuse_mm(struct mm_struct *mm)
>  	task_lock(tsk);
>  	sync_mm_rss(mm);
>  	tsk->mm = NULL;
> +	mmput(mm);
>  	/* active_mm is still 'mm' */
>  	enter_lazy_tlb(mm, tsk);
>  	task_unlock(tsk);
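For context, the pattern Michal is asking for already exists in most use_mm() callers: hold only a long-term mmgrab() reference, and take an mm_users pin just around the window where user memory is actually accessed. Below is a minimal sketch of that caller-side pinning; it is not taken from any particular in-tree driver, and my_worker() / my_access_user_pages() are hypothetical names.

```c
/*
 * Minimal sketch of the caller-side pinning discussed above, assuming a
 * kernel worker that was handed an mm kept alive with mmgrab() at setup
 * time. my_worker() and my_access_user_pages() are hypothetical names.
 */
static void my_worker(struct mm_struct *mm)
{
	/* Take an mm_users pin only while we actually touch user memory. */
	if (!mmget_not_zero(mm))
		return;			/* the owner already exited, nothing to do */

	use_mm(mm);			/* adopt mm for this kernel thread */
	my_access_user_pages();		/* ... copy_from_user()/copy_to_user() ... */
	unuse_mm(mm);

	mmput(mm);			/* drop the pin so the owner can tear down the mm */
}
```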
Hi Michal!

On Tue, Feb 18, 2020 at 12:31:03PM +0100, Michal Hocko wrote:
> On Mon 03-02-20 15:17:44, Andrea Arcangeli wrote:
> > alpha, ia64, mips, powerpc, sh, sparc are relying on a check on
> > mm->mm_users to know if they can skip some remote TLB flushes for
> > single threaded processes.
> >
> > Most callers of use_mm() tend to invoke mmget_not_zero() or
> > get_task_mm() before use_mm() to ensure the mm will remain alive in
> > between use_mm() and unuse_mm().
> >
> > Some callers however don't increase mm_users and they instead rely on
> > serialization in __mmput() to ensure the mm will remain alive in
> > between use_mm() and unuse_mm(). Not increasing mm_users during
> > use_mm() is however unsafe for the aforementioned arch TLB flush
> > optimizations. So either mmget()/mmput() should be added to the
> > problematic callers of use_mm()/unuse_mm() or we can embed them in
> > use_mm()/unuse_mm(), which is more robust.
>
> I would prefer we do not do that, because then the real owner of the mm
> cannot really tear down the address space, and its lifetime is bound to
> a kernel thread doing the use_mm. This is undesirable. I would really
> prefer if the existing few users would use mmget only when they really
> need to access the mm.

If the existing few users that don't already do the explicit mmget will
have to start doing it too, the end result will be exactly the same as
what you described in your "cons" (the lifetime of the mm will still be
up to whoever did mmget;use_mm and didn't call unuse_mm;mmput yet).

One reason to prefer adding the mmget to the callers that forgot it
would be to avoid an atomic op in use_mm (for those callers that didn't
forget it), but if anybody is doing use_mm in a fast path that won't be
very fast anyway, so I didn't think this was worth the risk. If that
microoptimization in a slow path is the reason we should add mmget to
the callers that forgot it, that would be fine with me, although I
think it's risky because it already happened once and it could happen
again (and when it happens it only bites a few arches if used with a
few drivers).

On a side note, the patch 2/2 should be dropped for now: I'm looking
into whether we can optimize away TLB-i broadcasts for multithreaded
apps too.

Thanks,
Andrea
On Tue 18-02-20 13:56:18, Andrea Arcangeli wrote:
> Hi Michal!
>
> On Tue, Feb 18, 2020 at 12:31:03PM +0100, Michal Hocko wrote:
> > On Mon 03-02-20 15:17:44, Andrea Arcangeli wrote:
> > > alpha, ia64, mips, powerpc, sh, sparc are relying on a check on
> > > mm->mm_users to know if they can skip some remote TLB flushes for
> > > single threaded processes.
> > >
> > > Most callers of use_mm() tend to invoke mmget_not_zero() or
> > > get_task_mm() before use_mm() to ensure the mm will remain alive in
> > > between use_mm() and unuse_mm().
> > >
> > > Some callers however don't increase mm_users and they instead rely on
> > > serialization in __mmput() to ensure the mm will remain alive in
> > > between use_mm() and unuse_mm(). Not increasing mm_users during
> > > use_mm() is however unsafe for the aforementioned arch TLB flush
> > > optimizations. So either mmget()/mmput() should be added to the
> > > problematic callers of use_mm()/unuse_mm() or we can embed them in
> > > use_mm()/unuse_mm(), which is more robust.
> >
> > I would prefer we do not do that, because then the real owner of the mm
> > cannot really tear down the address space, and its lifetime is bound to
> > a kernel thread doing the use_mm. This is undesirable. I would really
> > prefer if the existing few users would use mmget only when they really
> > need to access the mm.
>
> If the existing few users that don't already do the explicit mmget will
> have to start doing it too, the end result will be exactly the same as
> what you described in your "cons" (the lifetime of the mm will still be
> up to whoever did mmget;use_mm and didn't call unuse_mm;mmput yet).

Well, they should use mmget only for moments when they access the
address space.

> One reason to prefer adding the mmget to the callers that forgot it
> would be to avoid an atomic op in use_mm (for those callers that didn't
> forget it), but if anybody is doing use_mm in a fast path that won't be
> very fast anyway, so I didn't think this was worth the risk. If that
> microoptimization in a slow path is the reason we should add mmget to
> the callers that forgot it, that would be fine with me, although I
> think it's risky because it already happened once and it could happen
> again (and when it happens it only bites a few arches if used with a
> few drivers).

The primary concern really is that use_mm is usually not bound in time
and we do not want to pin the address space for such a long time without
any binding to a killable process.
On Wed, Feb 19, 2020 at 12:58:55PM +0100, Michal Hocko wrote:
> On Tue 18-02-20 13:56:18, Andrea Arcangeli wrote:
> > Hi Michal!
> >
> > On Tue, Feb 18, 2020 at 12:31:03PM +0100, Michal Hocko wrote:
> > > On Mon 03-02-20 15:17:44, Andrea Arcangeli wrote:
> > > > alpha, ia64, mips, powerpc, sh, sparc are relying on a check on
> > > > mm->mm_users to know if they can skip some remote TLB flushes for
> > > > single threaded processes.
> > > >
> > > > Most callers of use_mm() tend to invoke mmget_not_zero() or
> > > > get_task_mm() before use_mm() to ensure the mm will remain alive in
> > > > between use_mm() and unuse_mm().
> > > >
> > > > Some callers however don't increase mm_users and they instead rely on
> > > > serialization in __mmput() to ensure the mm will remain alive in
> > > > between use_mm() and unuse_mm(). Not increasing mm_users during
> > > > use_mm() is however unsafe for the aforementioned arch TLB flush
> > > > optimizations. So either mmget()/mmput() should be added to the
> > > > problematic callers of use_mm()/unuse_mm() or we can embed them in
> > > > use_mm()/unuse_mm(), which is more robust.
> > >
> > > I would prefer we do not do that, because then the real owner of the mm
> > > cannot really tear down the address space, and its lifetime is bound to
> > > a kernel thread doing the use_mm. This is undesirable. I would really
> > > prefer if the existing few users would use mmget only when they really
> > > need to access the mm.
> >
> > If the existing few users that don't already do the explicit mmget will
> > have to start doing it too, the end result will be exactly the same as
> > what you described in your "cons" (the lifetime of the mm will still be
> > up to whoever did mmget;use_mm and didn't call unuse_mm;mmput yet).
>
> Well, they should use mmget only for moments when they access the
> address space.

I don't think so, because mmget is a pure atomic_inc. How can you
serialize that against a TLB flush that just does an atomic_read() on
mm_users? At the very least you will have to invent something new,
slower than mmget, that adds some serialization against the TLB flush
code, and the common code would need to learn to use it. And the
question is how much faster that can be than the switch_mm() that
use_mm already invokes. It doesn't need to block, but it needs a
smp_mb() on both sides: i.e. after atomic_inc(&mm_users) you need a
smp_mb(), a test_and_clear_bit and a conditional local TLB flush. In
the TLB flush code you need to set the bit, smp_mb() and then
atomic_read(&mm_users).

As things are now (no new mmget_local_tlb_flush() call in common code)
the mmget has to happen regardless before use_mm(), so there's a slight
chance to serialize it in the switch_mm arch code. Note I did put a
smp_mb() myself in the switch_mm path to make it safe in 2/2.

With your solution it'd already be possible to call mmput at any time
after using the mm and before calling unuse_mm. But then you'd still
have to call unuse_mm; mmget; use_mm again before you can use the mm
again, so why not call unuse_mm immediately instead of only mmput?

> > One reason to prefer adding the mmget to the callers that forgot it
> > would be to avoid an atomic op in use_mm (for those callers that didn't
> > forget it), but if anybody is doing use_mm in a fast path that won't be
> > very fast anyway, so I didn't think this was worth the risk. If that
> > microoptimization in a slow path is the reason we should add mmget to
> > the callers that forgot it, that would be fine with me, although I
> > think it's risky because it already happened once and it could happen
> > again (and when it happens it only bites a few arches if used with a
> > few drivers).
>
> The primary concern really is that use_mm is usually not bound in time
> and we do not want to pin the address space for such a long time without
> any binding to a killable process.

Almost all use_mm callers already do mmget before use_mm and mmput
after unuse_mm, or perhaps I wouldn't have had to find out about this
by source review only. So the cons you describe already affect the vast
majority of use_mm cases (the non buggy ones). I see your concern about
adding that "cons" to the minority of remaining cases, but it's still
better to introduce that "cons" than to corrupt userland memory.

Thanks,
Andrea
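To make the ordering Andrea sketches here concrete, the following is a rough illustration, not the code from patch 2/2: the MY_MM_TLB_FLUSH_PENDING bit and the my_* helpers are invented names. The new user of the mm increments mm_users and then checks a pending-flush bit, while the flush side sets the bit and then reads mm_users; the full barriers guarantee that at least one side observes the other, so a flush is never lost for a CPU that just started using the mm.

```c
/*
 * Rough sketch of the ordering described above; not the actual patch 2/2.
 * MY_MM_TLB_FLUSH_PENDING and the my_* helpers are invented names.
 */

/* Side A: a new user of the mm shows up (what mmget/use_mm would do). */
static void my_mm_attach(struct mm_struct *mm)
{
	atomic_inc(&mm->mm_users);
	smp_mb();	/* pairs with the smp_mb() in my_flush_tlb_mm() */
	if (test_and_clear_bit(MY_MM_TLB_FLUSH_PENDING, &mm->flags))
		my_local_flush_tlb_mm(mm);	/* pick up a flush we may have raced with */
}

/* Side B: a TLB flush that would like to skip the remote broadcast/IPIs. */
static void my_flush_tlb_mm(struct mm_struct *mm)
{
	set_bit(MY_MM_TLB_FLUSH_PENDING, &mm->flags);
	smp_mb();	/* pairs with the smp_mb() in my_mm_attach() */
	if (atomic_read(&mm->mm_users) <= 1)
		my_local_flush_tlb_mm(mm);	/* single threaded: local flush is enough */
	else
		my_remote_flush_tlb_mm(mm);	/* others may run this mm: broadcast */
}
```

Either side B sees the elevated mm_users and broadcasts, or side A sees the pending bit and flushes locally; that is the store-buffering pattern the two smp_mb() calls close.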
diff --git a/mm/mmu_context.c b/mm/mmu_context.c
index 3e612ae748e9..ced0e1218c0f 100644
--- a/mm/mmu_context.c
+++ b/mm/mmu_context.c
@@ -30,6 +30,7 @@ void use_mm(struct mm_struct *mm)
 		mmgrab(mm);
 		tsk->active_mm = mm;
 	}
+	mmget(mm);
 	tsk->mm = mm;
 	switch_mm(active_mm, mm, tsk);
 	task_unlock(tsk);
@@ -57,6 +58,7 @@ void unuse_mm(struct mm_struct *mm)
 	task_lock(tsk);
 	sync_mm_rss(mm);
 	tsk->mm = NULL;
+	mmput(mm);
 	/* active_mm is still 'mm' */
 	enter_lazy_tlb(mm, tsk);
 	task_unlock(tsk);
alpha, ia64, mips, powerpc, sh, sparc are relying on a check on
mm->mm_users to know if they can skip some remote TLB flushes for
single threaded processes.

Most callers of use_mm() tend to invoke mmget_not_zero() or
get_task_mm() before use_mm() to ensure the mm will remain alive in
between use_mm() and unuse_mm().

Some callers however don't increase mm_users and they instead rely on
serialization in __mmput() to ensure the mm will remain alive in
between use_mm() and unuse_mm(). Not increasing mm_users during
use_mm() is however unsafe for the aforementioned arch TLB flush
optimizations. So either mmget()/mmput() should be added to the
problematic callers of use_mm()/unuse_mm() or we can embed them in
use_mm()/unuse_mm(), which is more robust.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 mm/mmu_context.c | 2 ++
 1 file changed, 2 insertions(+)
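As a rough illustration of the failure mode the changelog describes: the helpers below are hypothetical, only loosely modeled on the mm_users checks made by the architectures listed above, and the kthread caller is made up for the example.

```c
/*
 * Hypothetical illustration of the race the patch closes; the my_*
 * helpers are invented, loosely modeled on the mm_users checks made by
 * the arches listed in the changelog.
 */

/* Arch-side optimization: a single threaded mm is assumed to run on one CPU. */
void my_flush_tlb_mm(struct mm_struct *mm)
{
	if (atomic_read(&mm->mm_users) <= 1) {
		my_local_flush_tlb_mm(mm);	/* skip the remote flush/IPIs */
		return;
	}
	my_remote_flush_tlb_mm(mm);		/* flush on every CPU running mm */
}

/*
 * Problematic kthread caller: it never raises mm_users, so the check
 * above can observe mm_users == 1 while this kernel thread is still
 * running with mm loaded and holds stale TLB entries on its CPU.
 * With the patch, use_mm() itself takes the mmget() and closes the race.
 */
void my_kthread_access(struct mm_struct *mm)
{
	use_mm(mm);
	/* ... dereference user memory; a flush above could be skipped ... */
	unuse_mm(mm);
}
```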