Message ID: 20200814164358.4783-2-mathieu.desnoyers@efficios.com (mailing list archive)
State:      New, archived
Series:     [RFC,1/3] sched: fix exit_mm vs membarrier (v2)
Hi Mathieu,

On Fri, Aug 14, 2020 at 12:43:56PM -0400, Mathieu Desnoyers wrote:
> exit_mm should issue memory barriers after user-space memory accesses,
> before clearing current->mm, to order user-space memory accesses
> performed prior to exit_mm before clearing tsk->mm, which has the
> effect of skipping the membarrier private expedited IPIs.
>
> The membarrier system call can be issued concurrently with do_exit
> if we have thread groups created with CLONE_VM but not CLONE_THREAD.
>
> Here is the scenario I have in mind:
>
> Two thread groups are created, A and B. Thread group B is created by
> issuing clone from group A with flag CLONE_VM set, but not CLONE_THREAD.
> Let's assume we have a single thread within each thread group (Thread A
> and Thread B).
>
> AFAIU we can have:
>
> Userspace variables:
>
> int x = 0, y = 0;
>
>         CPU 0                           CPU 1
>         Thread A                        Thread B
>         (in thread group A)             (in thread group B)
>
>         x = 1
>         barrier()
>         y = 1
>         exit()
>         exit_mm()
>           current->mm = NULL;
>                                         r1 = load y
>                                         membarrier()
>                                           skips CPU 0 (no IPI) because its
>                                           current mm is NULL
>                                         r2 = load x
>                                         BUG_ON(r1 == 1 && r2 == 0)
>
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Will Deacon <will@kernel.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: linux-mm@kvack.org
> ---
> Changes since v1:
> - Use smp_mb__after_spinlock rather than smp_mb.
> - Document race scenario in commit message.
> ---
>  kernel/exit.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 733e80f334e7..fe64e6e28dd5 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -475,6 +475,14 @@ static void exit_mm(void)
>  	BUG_ON(mm != current->active_mm);
>  	/* more a memory barrier than a real lock */
>  	task_lock(current);
> +	/*
> +	 * When a thread stops operating on an address space, the loop
> +	 * in membarrier_{private,global}_expedited() may not observe

Is it accurate to say that the correctness of
membarrier_global_expedited() relies on the observation of ->mm? Because
IIUC membarrier_global_expedited() loop doesn't check ->mm.

Regards,
Boqun

> +	 * that tsk->mm, and not issue an IPI. Membarrier requires a
> +	 * memory barrier after accessing user-space memory, before
> +	 * clearing tsk->mm.
> +	 */
> +	smp_mb__after_spinlock();
>  	current->mm = NULL;
>  	mmap_read_unlock(mm);
>  	enter_lazy_tlb(mm, current);
> --
> 2.11.0
>
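For readers following the exchange, here is a simplified, paraphrased sketch of the two CPU scans under discussion in kernel/sched/membarrier.c (details such as the current-CPU and kthread checks are elided; this is not the exact kernel source). It shows why the private expedited variant keys off the ->mm of each runqueue's current task, while the global expedited variant keys off per-runqueue membarrier state rather than ->mm, which is the distinction Boqun raises:

	/* Private expedited: only IPI CPUs whose current task runs this mm. */
	for_each_online_cpu(cpu) {
		struct task_struct *p;

		p = rcu_dereference(cpu_rq(cpu)->curr);
		if (p && p->mm == mm)	/* false once exit_mm() clears ->mm */
			__cpumask_set_cpu(cpu, tmpmask);
	}

	/* Global expedited: keyed off per-rq state, not the task's ->mm. */
	for_each_online_cpu(cpu) {
		if (!(READ_ONCE(cpu_rq(cpu)->membarrier_state) &
		      MEMBARRIER_STATE_GLOBAL_EXPEDITED))
			continue;
		__cpumask_set_cpu(cpu, tmpmask);
	}

Once current->mm is cleared without an intervening full barrier, the private scan can skip a CPU whose earlier user-space stores are not yet ordered, which is exactly the window the patch closes.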
----- On Aug 16, 2020, at 11:23 AM, Boqun Feng boqun.feng@gmail.com wrote:

> Hi Mathieu,
>
> On Fri, Aug 14, 2020 at 12:43:56PM -0400, Mathieu Desnoyers wrote:
>> exit_mm should issue memory barriers after user-space memory accesses,
>> before clearing current->mm, to order user-space memory accesses
>> performed prior to exit_mm before clearing tsk->mm, which has the
>> effect of skipping the membarrier private expedited IPIs.
>>
>> The membarrier system call can be issued concurrently with do_exit
>> if we have thread groups created with CLONE_VM but not CLONE_THREAD.
>>
>> Here is the scenario I have in mind:
>>
>> Two thread groups are created, A and B. Thread group B is created by
>> issuing clone from group A with flag CLONE_VM set, but not CLONE_THREAD.
>> Let's assume we have a single thread within each thread group (Thread A
>> and Thread B).
>>
>> AFAIU we can have:
>>
>> Userspace variables:
>>
>> int x = 0, y = 0;
>>
>>         CPU 0                           CPU 1
>>         Thread A                        Thread B
>>         (in thread group A)             (in thread group B)
>>
>>         x = 1
>>         barrier()
>>         y = 1
>>         exit()
>>         exit_mm()
>>           current->mm = NULL;
>>                                         r1 = load y
>>                                         membarrier()
>>                                           skips CPU 0 (no IPI) because its
>>                                           current mm is NULL
>>                                         r2 = load x
>>                                         BUG_ON(r1 == 1 && r2 == 0)
>>
>> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Paul E. McKenney <paulmck@kernel.org>
>> Cc: Nicholas Piggin <npiggin@gmail.com>
>> Cc: Andy Lutomirski <luto@amacapital.net>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Linus Torvalds <torvalds@linux-foundation.org>
>> Cc: Alan Stern <stern@rowland.harvard.edu>
>> Cc: linux-mm@kvack.org
>> ---
>> Changes since v1:
>> - Use smp_mb__after_spinlock rather than smp_mb.
>> - Document race scenario in commit message.
>> ---
>>  kernel/exit.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/kernel/exit.c b/kernel/exit.c
>> index 733e80f334e7..fe64e6e28dd5 100644
>> --- a/kernel/exit.c
>> +++ b/kernel/exit.c
>> @@ -475,6 +475,14 @@ static void exit_mm(void)
>>  	BUG_ON(mm != current->active_mm);
>>  	/* more a memory barrier than a real lock */
>>  	task_lock(current);
>> +	/*
>> +	 * When a thread stops operating on an address space, the loop
>> +	 * in membarrier_{private,global}_expedited() may not observe
>
> Is it accurate to say that the correctness of
> membarrier_global_expedited() relies on the observation of ->mm? Because
> IIUC membarrier_global_expedited() loop doesn't check ->mm.

Good point, I was wrong. Will instead reword as:

	/*
	 * When a thread stops operating on an address space, the loop
	 * in membarrier_private_expedited() may not observe that
	 * tsk->mm, and the loop in membarrier_global_expedited() may
	 * not observe a MEMBARRIER_STATE_GLOBAL_EXPEDITED
	 * rq->membarrier_state, so those would not issue an IPI.
	 * Membarrier requires a memory barrier after accessing
	 * user-space memory, before clearing tsk->mm or the
	 * rq->membarrier_state.
	 */

And I'll make sure exit_mm clears this_rq()->membarrier_state as well.

Thanks,

Mathieu

>
> Regards,
> Boqun
>
>> +	 * that tsk->mm, and not issue an IPI. Membarrier requires a
>> +	 * memory barrier after accessing user-space memory, before
>> +	 * clearing tsk->mm.
>> +	 */
>> +	smp_mb__after_spinlock();
>>  	current->mm = NULL;
>>  	mmap_read_unlock(mm);
>>  	enter_lazy_tlb(mm, current);
>> --
>> 2.11.0
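Putting Mathieu's reply together, the revised exit_mm() hunk might look roughly like the sketch below. The membarrier_exit_mm() helper is a hypothetical name used purely to illustrate "clear this_rq()->membarrier_state as well"; the actual follow-up patch may structure this differently:

	task_lock(current);
	/*
	 * When a thread stops operating on an address space, the loop
	 * in membarrier_private_expedited() may not observe that
	 * tsk->mm, and the loop in membarrier_global_expedited() may
	 * not observe a MEMBARRIER_STATE_GLOBAL_EXPEDITED
	 * rq->membarrier_state, so those would not issue an IPI.
	 * Membarrier requires a memory barrier after accessing
	 * user-space memory, before clearing tsk->mm or the
	 * rq->membarrier_state.
	 */
	smp_mb__after_spinlock();
	current->mm = NULL;
	membarrier_exit_mm(current);	/* hypothetical: clears this_rq()->membarrier_state */
	mmap_read_unlock(mm);
	enter_lazy_tlb(mm, current);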
diff --git a/kernel/exit.c b/kernel/exit.c
index 733e80f334e7..fe64e6e28dd5 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -475,6 +475,14 @@ static void exit_mm(void)
 	BUG_ON(mm != current->active_mm);
 	/* more a memory barrier than a real lock */
 	task_lock(current);
+	/*
+	 * When a thread stops operating on an address space, the loop
+	 * in membarrier_{private,global}_expedited() may not observe
+	 * that tsk->mm, and not issue an IPI. Membarrier requires a
+	 * memory barrier after accessing user-space memory, before
+	 * clearing tsk->mm.
+	 */
+	smp_mb__after_spinlock();
 	current->mm = NULL;
 	mmap_read_unlock(mm);
 	enter_lazy_tlb(mm, current);
exit_mm should issue memory barriers after user-space memory accesses,
before clearing current->mm, to order user-space memory accesses
performed prior to exit_mm before clearing tsk->mm, which has the
effect of skipping the membarrier private expedited IPIs.

The membarrier system call can be issued concurrently with do_exit
if we have thread groups created with CLONE_VM but not CLONE_THREAD.

Here is the scenario I have in mind:

Two thread groups are created, A and B. Thread group B is created by
issuing clone from group A with flag CLONE_VM set, but not CLONE_THREAD.
Let's assume we have a single thread within each thread group (Thread A
and Thread B).

AFAIU we can have:

Userspace variables:

int x = 0, y = 0;

        CPU 0                           CPU 1
        Thread A                        Thread B
        (in thread group A)             (in thread group B)

        x = 1
        barrier()
        y = 1
        exit()
        exit_mm()
          current->mm = NULL;
                                        r1 = load y
                                        membarrier()
                                          skips CPU 0 (no IPI) because its
                                          current mm is NULL
                                        r2 = load x
                                        BUG_ON(r1 == 1 && r2 == 0)

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: linux-mm@kvack.org
---
Changes since v1:
- Use smp_mb__after_spinlock rather than smp_mb.
- Document race scenario in commit message.
---
 kernel/exit.c | 8 ++++++++
 1 file changed, 8 insertions(+)
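The scenario above maps onto a small userspace sketch along the following lines. It is illustrative only: it assumes an unfixed kernel, the race window is tiny (a real reproducer would loop and add timing pressure), the diagram's barrier()/load pseudo-ops become a compiler barrier and volatile accesses, and here the exiting side is the clone()d child while the membarrier caller is the original process (which thread group plays which role does not matter for the race).

/* Illustrative reproducer sketch for the scenario above; not part of the patch. */
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/membarrier.h>

static volatile int x, y;	/* shared via CLONE_VM */

static inline int membarrier(int cmd, int flags)
{
	return syscall(__NR_membarrier, cmd, flags);
}

/* "Thread A": its own thread group, sharing the mm with the caller. */
static int thread_a(void *arg)
{
	x = 1;
	__asm__ __volatile__("" ::: "memory");	/* barrier() in the diagram */
	y = 1;
	return 0;	/* returning exits -> exit_mm() clears current->mm */
}

int main(void)
{
	static char stack[64 * 1024];
	int r1, r2;

	/* Registration is per-mm, so it also covers the clone()d thread group. */
	if (membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0))
		exit(1);

	/* CLONE_VM without CLONE_THREAD: a separate thread group sharing the mm. */
	if (clone(thread_a, stack + sizeof(stack), CLONE_VM | SIGCHLD, NULL) < 0)
		exit(1);

	r1 = y;
	membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0);
	r2 = x;

	assert(!(r1 == 1 && r2 == 0));	/* the BUG_ON() condition from the diagram */

	wait(NULL);
	return 0;
}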