Message ID | 20240906051205.530219-2-andrii@kernel.org (mailing list archive)
---|---
State | Superseded
Series | uprobes,mm: speculative lockless VMA-to-uprobe lookup

On Fri, Sep 6, 2024 at 7:12 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> +static inline bool mmap_lock_speculation_end(struct mm_struct *mm, int seq)
> +{
> +       /* Pairs with RELEASE semantics in inc_mm_lock_seq(). */
> +       return seq == smp_load_acquire(&mm->mm_lock_seq);
> +}

A load-acquire can't provide "end of locked section" semantics - a
load-acquire is a one-way barrier, you can basically use it for
"acquire lock" semantics but not for "release lock" semantics, because
the CPU will prevent reordering the load with *later* loads but not
with *earlier* loads. So if you do:

mmap_lock_speculation_start()
[locked reads go here]
mmap_lock_speculation_end()

then the CPU is allowed to reorder your instructions like this:

mmap_lock_speculation_start()
mmap_lock_speculation_end()
[locked reads go here]

so the lock is broken.

> static inline void mmap_write_lock(struct mm_struct *mm)
> {
>        __mmap_lock_trace_start_locking(mm, true);
>        down_write(&mm->mmap_lock);
> +      inc_mm_lock_seq(mm);
>        __mmap_lock_trace_acquire_returned(mm, true, true);
> }

Similarly, inc_mm_lock_seq(), which does a store-release, can only
provide "release lock" semantics, not "take lock" semantics, because
the CPU can reorder it with later stores; for example, this code:

inc_mm_lock_seq()
[locked stuff goes here]
inc_mm_lock_seq()

can be reordered into this:

[locked stuff goes here]
inc_mm_lock_seq()
inc_mm_lock_seq()

so the lock is broken.

For "taking a lock" with a memory store, or "dropping a lock" with a
memory load, you need heavier memory barriers, see
Documentation/memory-barriers.txt.

On Mon, Sep 9, 2024 at 5:35 AM Jann Horn <jannh@google.com> wrote:
>
> On Fri, Sep 6, 2024 at 7:12 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > +static inline bool mmap_lock_speculation_end(struct mm_struct *mm, int seq)
> > +{
> > +       /* Pairs with RELEASE semantics in inc_mm_lock_seq(). */
> > +       return seq == smp_load_acquire(&mm->mm_lock_seq);
> > +}
>
> A load-acquire can't provide "end of locked section" semantics - a
> load-acquire is a one-way barrier, you can basically use it for
> "acquire lock" semantics but not for "release lock" semantics, because
> the CPU will prevent reordering the load with *later* loads but not
> with *earlier* loads. So if you do:
>
> mmap_lock_speculation_start()
> [locked reads go here]
> mmap_lock_speculation_end()
>
> then the CPU is allowed to reorder your instructions like this:
>
> mmap_lock_speculation_start()
> mmap_lock_speculation_end()
> [locked reads go here]
>
> so the lock is broken.

Hi Jann,
Thanks for the review!
Yeah, you are right, we do need an smp_rmb() before we compare
mm->mm_lock_seq with the stored seq.

Otherwise reads might get reordered this way:

CPU1                                          CPU2
mmap_lock_speculation_start() // seq = mm->mm_lock_seq
reloaded_seq = mm->mm_lock_seq; // reordered read
                                              mmap_write_lock() // inc_mm_lock_seq(mm)
                                              vma->vm_file = ...;
                                              mmap_write_unlock() // inc_mm_lock_seq(mm)
<speculate>
mmap_lock_speculation_end() // return (reloaded_seq == seq)
>
> > static inline void mmap_write_lock(struct mm_struct *mm)
> > {
> >        __mmap_lock_trace_start_locking(mm, true);
> >        down_write(&mm->mmap_lock);
> > +      inc_mm_lock_seq(mm);
> >        __mmap_lock_trace_acquire_returned(mm, true, true);
> > }
>
> Similarly, inc_mm_lock_seq(), which does a store-release, can only
> provide "release lock" semantics, not "take lock" semantics, because
> the CPU can reorder it with later stores; for example, this code:
>
> inc_mm_lock_seq()
> [locked stuff goes here]
> inc_mm_lock_seq()
>
> can be reordered into this:
>
> [locked stuff goes here]
> inc_mm_lock_seq()
> inc_mm_lock_seq()
>
> so the lock is broken.

Ugh, yes. We do need smp_wmb() AFTER the inc_mm_lock_seq(). Whenever
we use inc_mm_lock_seq() for "take lock" semantics, it's preceded by a
down_write(&mm->mmap_lock) with implied ACQUIRE ordering. So I thought
we can use it but I realize now that this reordering is still
possible:
CPU1                                          CPU2
mmap_write_lock()
    down_write(&mm->mmap_lock);
vma->vm_file = ...;

                                              mmap_lock_speculation_start() // seq = mm->mm_lock_seq
                                              <speculate>
                                              mmap_lock_speculation_end() // return (mm->mm_lock_seq == seq)

    inc_mm_lock_seq(mm);
mmap_write_unlock() // inc_mm_lock_seq(mm)

Is that what you were describing?
Thanks,
Suren.
>
> For "taking a lock" with a memory store, or "dropping a lock" with a
> memory load, you need heavier memory barriers, see
> Documentation/memory-barriers.txt.

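For reference, the seqcount read side (read_seqcount_retry() in include/linux/seqlock.h) handles this same problem with an smp_rmb() before re-reading the counter. A minimal sketch of the validation helper with that barrier, assuming the rest of the posted patch stays as is (this is an illustration of the barrier placement being discussed, not code that was posted):

static inline bool mmap_lock_speculation_end(struct mm_struct *mm, int seq)
{
	/*
	 * Order all speculative reads done since mmap_lock_speculation_start()
	 * before this re-read of the counter; pairs with the write-side
	 * ordering on inc_mm_lock_seq(). A load-acquire alone only orders
	 * against *later* loads, which is not enough here.
	 */
	smp_rmb();
	return seq == READ_ONCE(mm->mm_lock_seq);
}
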
On Tue, Sep 10, 2024 at 4:09 AM Suren Baghdasaryan <surenb@google.com> wrote:
> On Mon, Sep 9, 2024 at 5:35 AM Jann Horn <jannh@google.com> wrote:
> > On Fri, Sep 6, 2024 at 7:12 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > > static inline void mmap_write_lock(struct mm_struct *mm)
> > > {
> > >        __mmap_lock_trace_start_locking(mm, true);
> > >        down_write(&mm->mmap_lock);
> > > +      inc_mm_lock_seq(mm);
> > >        __mmap_lock_trace_acquire_returned(mm, true, true);
> > > }
> >
> > Similarly, inc_mm_lock_seq(), which does a store-release, can only
> > provide "release lock" semantics, not "take lock" semantics, because
> > the CPU can reorder it with later stores; for example, this code:
> >
> > inc_mm_lock_seq()
> > [locked stuff goes here]
> > inc_mm_lock_seq()
> >
> > can be reordered into this:
> >
> > [locked stuff goes here]
> > inc_mm_lock_seq()
> > inc_mm_lock_seq()
> >
> > so the lock is broken.
>
> Ugh, yes. We do need smp_wmb() AFTER the inc_mm_lock_seq(). Whenever
> we use inc_mm_lock_seq() for "take lock" semantics, it's preceded by a
> down_write(&mm->mmap_lock) with implied ACQUIRE ordering. So I thought
> we can use it but I realize now that this reordering is still
> possible:
> CPU1                                          CPU2
> mmap_write_lock()
>     down_write(&mm->mmap_lock);
> vma->vm_file = ...;
>
>                                               mmap_lock_speculation_start() // seq = mm->mm_lock_seq
>                                               <speculate>
>                                               mmap_lock_speculation_end() // return (mm->mm_lock_seq == seq)
>
>     inc_mm_lock_seq(mm);
> mmap_write_unlock() // inc_mm_lock_seq(mm)
>
> Is that what you were describing?

Yeah, that's the scenario I was thinking of (though I did not spend
the time to look at the surroundings to see if there are other
implied barriers that happen to stop this).

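On the write side, a minimal sketch of where the smp_wmb() discussed above could go, mirroring the increment-then-smp_wmb() pattern used by the seqcount write side (again only an illustration, not the posted patch):

static inline void mmap_write_lock(struct mm_struct *mm)
{
	__mmap_lock_trace_start_locking(mm, true);
	down_write(&mm->mmap_lock);
	inc_mm_lock_seq(mm);
	/*
	 * Order the odd (write-locked) counter value before any stores done
	 * under the lock, so a speculating reader that still observes an even
	 * value cannot also observe partial updates; pairs with the read-side
	 * barriers in mmap_lock_speculation_start()/_end().
	 */
	smp_wmb();
	__mmap_lock_trace_acquire_returned(mm, true, true);
}

mmap_write_lock_nested() and mmap_write_lock_killable() would presumably need the same treatment; the unlock side already orders the critical-section stores before the counter increment through the smp_store_release() in inc_mm_lock_seq().
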
On Mon, Sep 9, 2024 at 7:09 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Mon, Sep 9, 2024 at 5:35 AM Jann Horn <jannh@google.com> wrote:
> >
> > On Fri, Sep 6, 2024 at 7:12 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > > +static inline bool mmap_lock_speculation_end(struct mm_struct *mm, int seq)
> > > +{
> > > +       /* Pairs with RELEASE semantics in inc_mm_lock_seq(). */
> > > +       return seq == smp_load_acquire(&mm->mm_lock_seq);
> > > +}
> >
> > A load-acquire can't provide "end of locked section" semantics - a
> > load-acquire is a one-way barrier, you can basically use it for
> > "acquire lock" semantics but not for "release lock" semantics, because
> > the CPU will prevent reordering the load with *later* loads but not
> > with *earlier* loads. So if you do:
> >
> > mmap_lock_speculation_start()
> > [locked reads go here]
> > mmap_lock_speculation_end()
> >
> > then the CPU is allowed to reorder your instructions like this:
> >
> > mmap_lock_speculation_start()
> > mmap_lock_speculation_end()
> > [locked reads go here]
> >
> > so the lock is broken.
>
> Hi Jann,
> Thanks for the review!
> Yeah, you are right, we do need an smp_rmb() before we compare
> mm->mm_lock_seq with the stored seq.
>
> Otherwise reads might get reordered this way:
>
> CPU1                                          CPU2
> mmap_lock_speculation_start() // seq = mm->mm_lock_seq
> reloaded_seq = mm->mm_lock_seq; // reordered read
>                                               mmap_write_lock() // inc_mm_lock_seq(mm)
>                                               vma->vm_file = ...;
>                                               mmap_write_unlock() // inc_mm_lock_seq(mm)
> <speculate>
> mmap_lock_speculation_end() // return (reloaded_seq == seq)
> >
> > > static inline void mmap_write_lock(struct mm_struct *mm)
> > > {
> > >        __mmap_lock_trace_start_locking(mm, true);
> > >        down_write(&mm->mmap_lock);
> > > +      inc_mm_lock_seq(mm);
> > >        __mmap_lock_trace_acquire_returned(mm, true, true);
> > > }
> >
> > Similarly, inc_mm_lock_seq(), which does a store-release, can only
> > provide "release lock" semantics, not "take lock" semantics, because
> > the CPU can reorder it with later stores; for example, this code:
> >
> > inc_mm_lock_seq()
> > [locked stuff goes here]
> > inc_mm_lock_seq()
> >
> > can be reordered into this:
> >
> > [locked stuff goes here]
> > inc_mm_lock_seq()
> > inc_mm_lock_seq()
> >
> > so the lock is broken.
>
> Ugh, yes. We do need smp_wmb() AFTER the inc_mm_lock_seq(). Whenever

Suren, can you share with me an updated patch for mm_lock_seq with the
right memory barriers? Do you think this might have a noticeable
impact on performance? What sort of benchmark do mm folks use to
quantify changes like that?

> we use inc_mm_lock_seq() for "take lock" semantics, it's preceded by a
> down_write(&mm->mmap_lock) with implied ACQUIRE ordering. So I thought
> we can use it but I realize now that this reordering is still
> possible:
> CPU1                                          CPU2
> mmap_write_lock()
>     down_write(&mm->mmap_lock);
> vma->vm_file = ...;
>
>                                               mmap_lock_speculation_start() // seq = mm->mm_lock_seq
>                                               <speculate>
>                                               mmap_lock_speculation_end() // return (mm->mm_lock_seq == seq)
>
>     inc_mm_lock_seq(mm);
> mmap_write_unlock() // inc_mm_lock_seq(mm)
>
> Is that what you were describing?
> Thanks,
> Suren.
> >
> > For "taking a lock" with a memory store, or "dropping a lock" with a
> > memory load, you need heavier memory barriers, see
> > Documentation/memory-barriers.txt.

On Wed, Sep 11, 2024 at 2:35 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Sep 9, 2024 at 7:09 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Mon, Sep 9, 2024 at 5:35 AM Jann Horn <jannh@google.com> wrote:
> > >
> > > On Fri, Sep 6, 2024 at 7:12 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > > > +static inline bool mmap_lock_speculation_end(struct mm_struct *mm, int seq)
> > > > +{
> > > > +       /* Pairs with RELEASE semantics in inc_mm_lock_seq(). */
> > > > +       return seq == smp_load_acquire(&mm->mm_lock_seq);
> > > > +}
> > >
> > > A load-acquire can't provide "end of locked section" semantics - a
> > > load-acquire is a one-way barrier, you can basically use it for
> > > "acquire lock" semantics but not for "release lock" semantics, because
> > > the CPU will prevent reordering the load with *later* loads but not
> > > with *earlier* loads. So if you do:
> > >
> > > mmap_lock_speculation_start()
> > > [locked reads go here]
> > > mmap_lock_speculation_end()
> > >
> > > then the CPU is allowed to reorder your instructions like this:
> > >
> > > mmap_lock_speculation_start()
> > > mmap_lock_speculation_end()
> > > [locked reads go here]
> > >
> > > so the lock is broken.
> >
> > Hi Jann,
> > Thanks for the review!
> > Yeah, you are right, we do need an smp_rmb() before we compare
> > mm->mm_lock_seq with the stored seq.
> >
> > Otherwise reads might get reordered this way:
> >
> > CPU1                                          CPU2
> > mmap_lock_speculation_start() // seq = mm->mm_lock_seq
> > reloaded_seq = mm->mm_lock_seq; // reordered read
> >                                               mmap_write_lock() // inc_mm_lock_seq(mm)
> >                                               vma->vm_file = ...;
> >                                               mmap_write_unlock() // inc_mm_lock_seq(mm)
> > <speculate>
> > mmap_lock_speculation_end() // return (reloaded_seq == seq)
> > >
> > > > static inline void mmap_write_lock(struct mm_struct *mm)
> > > > {
> > > >        __mmap_lock_trace_start_locking(mm, true);
> > > >        down_write(&mm->mmap_lock);
> > > > +      inc_mm_lock_seq(mm);
> > > >        __mmap_lock_trace_acquire_returned(mm, true, true);
> > > > }
> > >
> > > Similarly, inc_mm_lock_seq(), which does a store-release, can only
> > > provide "release lock" semantics, not "take lock" semantics, because
> > > the CPU can reorder it with later stores; for example, this code:
> > >
> > > inc_mm_lock_seq()
> > > [locked stuff goes here]
> > > inc_mm_lock_seq()
> > >
> > > can be reordered into this:
> > >
> > > [locked stuff goes here]
> > > inc_mm_lock_seq()
> > > inc_mm_lock_seq()
> > >
> > > so the lock is broken.
> >
> > Ugh, yes. We do need smp_wmb() AFTER the inc_mm_lock_seq(). Whenever
>
> Suren, can you share with me an updated patch for mm_lock_seq with the
> right memory barriers? Do you think this might have a noticeable
> impact on performance? What sort of benchmark do mm folks use to
> quantify changes like that?

Yes, I think I can get it to you before leaving for LPC. It might end
up affecting paths where we take mmap_lock for write
(mmap/munmap/mprotect/etc) but these are not considered fast paths.
I'll think about possible tests we can run to evaluate it.

>
> > we use inc_mm_lock_seq() for "take lock" semantics, it's preceded by a
> > down_write(&mm->mmap_lock) with implied ACQUIRE ordering. So I thought
> > we can use it but I realize now that this reordering is still
> > possible:
> > CPU1                                          CPU2
> > mmap_write_lock()
> >     down_write(&mm->mmap_lock);
> > vma->vm_file = ...;
> >
> >                                               mmap_lock_speculation_start() // seq = mm->mm_lock_seq
> >                                               <speculate>
> >                                               mmap_lock_speculation_end() // return (mm->mm_lock_seq == seq)
> >
> >     inc_mm_lock_seq(mm);
> > mmap_write_unlock() // inc_mm_lock_seq(mm)
> >
> > Is that what you were describing?
> > Thanks,
> > Suren.
> > >
> > > For "taking a lock" with a memory store, or "dropping a lock" with a
> > > memory load, you need heavier memory barriers, see
> > > Documentation/memory-barriers.txt.

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 485424979254..d5e3f907eea4 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -876,6 +876,9 @@ struct mm_struct {
 		 * Roughly speaking, incrementing the sequence number is
 		 * equivalent to releasing locks on VMAs; reading the sequence
 		 * number can be part of taking a read lock on a VMA.
+		 * Incremented every time mmap_lock is write-locked/unlocked.
+		 * Initialized to 0, therefore odd values indicate mmap_lock
+		 * is write-locked and even values that it's released.
 		 *
 		 * Can be modified under write mmap_lock using RELEASE
 		 * semantics.
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index de9dc20b01ba..5410ce741d75 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -71,15 +71,12 @@ static inline void mmap_assert_write_locked(const struct mm_struct *mm)
 }
 
 #ifdef CONFIG_PER_VMA_LOCK
-/*
- * Drop all currently-held per-VMA locks.
- * This is called from the mmap_lock implementation directly before releasing
- * a write-locked mmap_lock (or downgrading it to read-locked).
- * This should normally NOT be called manually from other places.
- * If you want to call this manually anyway, keep in mind that this will release
- * *all* VMA write locks, including ones from further up the stack.
- */
-static inline void vma_end_write_all(struct mm_struct *mm)
+static inline void init_mm_lock_seq(struct mm_struct *mm)
+{
+	mm->mm_lock_seq = 0;
+}
+
+static inline void inc_mm_lock_seq(struct mm_struct *mm)
 {
 	mmap_assert_write_locked(mm);
 	/*
@@ -91,19 +88,52 @@ static inline void vma_end_write_all(struct mm_struct *mm)
 	 */
 	smp_store_release(&mm->mm_lock_seq, mm->mm_lock_seq + 1);
 }
+
+static inline bool mmap_lock_speculation_start(struct mm_struct *mm, int *seq)
+{
+	/* Pairs with RELEASE semantics in inc_mm_lock_seq(). */
+	*seq = smp_load_acquire(&mm->mm_lock_seq);
+	/* Allow speculation if mmap_lock is not write-locked */
+	return (*seq & 1) == 0;
+}
+
+static inline bool mmap_lock_speculation_end(struct mm_struct *mm, int seq)
+{
+	/* Pairs with RELEASE semantics in inc_mm_lock_seq(). */
+	return seq == smp_load_acquire(&mm->mm_lock_seq);
+}
+
 #else
-static inline void vma_end_write_all(struct mm_struct *mm) {}
+static inline void init_mm_lock_seq(struct mm_struct *mm) {}
+static inline void inc_mm_lock_seq(struct mm_struct *mm) {}
+static inline bool mmap_lock_speculation_start(struct mm_struct *mm, int *seq) { return false; }
+static inline bool mmap_lock_speculation_end(struct mm_struct *mm, int seq) { return false; }
 #endif
 
+/*
+ * Drop all currently-held per-VMA locks.
+ * This is called from the mmap_lock implementation directly before releasing
+ * a write-locked mmap_lock (or downgrading it to read-locked).
+ * This should normally NOT be called manually from other places.
+ * If you want to call this manually anyway, keep in mind that this will release
+ * *all* VMA write locks, including ones from further up the stack.
+ */
+static inline void vma_end_write_all(struct mm_struct *mm)
+{
+	inc_mm_lock_seq(mm);
+}
+
 static inline void mmap_init_lock(struct mm_struct *mm)
 {
 	init_rwsem(&mm->mmap_lock);
+	init_mm_lock_seq(mm);
 }
 
 static inline void mmap_write_lock(struct mm_struct *mm)
 {
 	__mmap_lock_trace_start_locking(mm, true);
 	down_write(&mm->mmap_lock);
+	inc_mm_lock_seq(mm);
 	__mmap_lock_trace_acquire_returned(mm, true, true);
 }
 
@@ -111,6 +141,7 @@ static inline void mmap_write_lock_nested(struct mm_struct *mm, int subclass)
 {
 	__mmap_lock_trace_start_locking(mm, true);
 	down_write_nested(&mm->mmap_lock, subclass);
+	inc_mm_lock_seq(mm);
 	__mmap_lock_trace_acquire_returned(mm, true, true);
 }
 
@@ -120,6 +151,8 @@ static inline int mmap_write_lock_killable(struct mm_struct *mm)
 
 	__mmap_lock_trace_start_locking(mm, true);
 	ret = down_write_killable(&mm->mmap_lock);
+	if (!ret)
+		inc_mm_lock_seq(mm);
 	__mmap_lock_trace_acquire_returned(mm, true, ret == 0);
 	return ret;
 }
diff --git a/kernel/fork.c b/kernel/fork.c
index 18bdc87209d0..c44b71d354ee 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1259,9 +1259,6 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	seqcount_init(&mm->write_protect_seq);
 	mmap_init_lock(mm);
 	INIT_LIST_HEAD(&mm->mmlist);
-#ifdef CONFIG_PER_VMA_LOCK
-	mm->mm_lock_seq = 0;
-#endif
 	mm_pgtables_bytes_init(mm);
 	mm->map_count = 0;
 	mm->locked_vm = 0;
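
For context, this is roughly how a caller (such as the speculative uprobe lookup this series is building towards) might use these helpers. The function below is purely illustrative, is not part of the series, and assumes the rest of the series makes VMA objects safe to access under RCU:

/*
 * Speculatively check, without taking mmap_lock, whether @addr is covered
 * by a file-backed VMA in @mm. Illustrative sketch only.
 */
static bool addr_has_file_vma_speculative(struct mm_struct *mm, unsigned long addr)
{
	struct vm_area_struct *vma;
	bool ret = false;
	int seq;

	/* Odd sequence count: a writer holds mmap_lock, let the caller fall back. */
	if (!mmap_lock_speculation_start(mm, &seq))
		return false;

	rcu_read_lock();
	vma = vma_lookup(mm, addr);
	if (vma && vma->vm_file)
		ret = true;
	rcu_read_unlock();

	/* Discard the speculative result if a writer bumped the count meanwhile. */
	if (!mmap_lock_speculation_end(mm, seq))
		return false;

	return ret;
}

On !CONFIG_PER_VMA_LOCK kernels both helpers are stubbed out to return false, so a caller written this way automatically takes the locked fallback path.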