Message ID | 20230109205336.3665937-29-surenb@google.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Per-VMA locks | expand |
On Mon 09-01-23 12:53:23, Suren Baghdasaryan wrote: > Introduce lock_vma_under_rcu function to lookup and lock a VMA during > page fault handling. When VMA is not found, can't be locked or changes > after being locked, the function returns NULL. The lookup is performed > under RCU protection to prevent the found VMA from being destroyed before > the VMA lock is acquired. VMA lock statistics are updated according to > the results. > For now only anonymous VMAs can be searched this way. In other cases the > function returns NULL. Could you describe why only anonymous vmas are handled at this stage and what (roughly) has to be done to support other vmas? lock_vma_under_rcu doesn't seem to have any anonymous vma specific requirements AFAICS. Also isn't lock_vma_under_rcu effectively find_read_lock_vma? Not that the naming is really the most important part but the rcu locking is internal to the function so why should we spread this implementation detail to the world... > Signed-off-by: Suren Baghdasaryan <surenb@google.com> > --- > include/linux/mm.h | 3 +++ > mm/memory.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 54 insertions(+) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index c464fc8a514c..d0fddf6a1de9 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -687,6 +687,9 @@ static inline void vma_assert_no_reader(struct vm_area_struct *vma) > vma); > } > > +struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, > + unsigned long address); > + > #else /* CONFIG_PER_VMA_LOCK */ > > static inline void vma_init_lock(struct vm_area_struct *vma) {} > diff --git a/mm/memory.c b/mm/memory.c > index 9ece18548db1..a658e26d965d 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -5242,6 +5242,57 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, > } > EXPORT_SYMBOL_GPL(handle_mm_fault); > > +#ifdef CONFIG_PER_VMA_LOCK > +/* > + * Lookup and lock a VMA under RCU protection. Returned VMA is guaranteed to be > + * stable and not isolated. If the VMA is not found or is being modified the > + * function returns NULL. > + */ > +struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, > + unsigned long address) > +{ > + MA_STATE(mas, &mm->mm_mt, address, address); > + struct vm_area_struct *vma, *validate; > + > + rcu_read_lock(); > + vma = mas_walk(&mas); > +retry: > + if (!vma) > + goto inval; > + > + /* Only anonymous vmas are supported for now */ > + if (!vma_is_anonymous(vma)) > + goto inval; > + > + if (!vma_read_trylock(vma)) > + goto inval; > + > + /* Check since vm_start/vm_end might change before we lock the VMA */ > + if (unlikely(address < vma->vm_start || address >= vma->vm_end)) { > + vma_read_unlock(vma); > + goto inval; > + } > + > + /* Check if the VMA got isolated after we found it */ > + mas.index = address; > + validate = mas_walk(&mas); > + if (validate != vma) { > + vma_read_unlock(vma); > + count_vm_vma_lock_event(VMA_LOCK_MISS); > + /* The area was replaced with another one. */ > + vma = validate; > + goto retry; > + } > + > + rcu_read_unlock(); > + return vma; > +inval: > + rcu_read_unlock(); > + count_vm_vma_lock_event(VMA_LOCK_ABORT); > + return NULL; > +} > +#endif /* CONFIG_PER_VMA_LOCK */ > + > #ifndef __PAGETABLE_P4D_FOLDED > /* > * Allocate p4d page table. > -- > 2.39.0
On Mon, Jan 9, 2023 at 9:54 PM Suren Baghdasaryan <surenb@google.com> wrote: > Introduce lock_vma_under_rcu function to lookup and lock a VMA during > page fault handling. When VMA is not found, can't be locked or changes > after being locked, the function returns NULL. The lookup is performed > under RCU protection to prevent the found VMA from being destroyed before > the VMA lock is acquired. VMA lock statistics are updated according to > the results. > For now only anonymous VMAs can be searched this way. In other cases the > function returns NULL. [...] > +struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, > + unsigned long address) > +{ > + MA_STATE(mas, &mm->mm_mt, address, address); > + struct vm_area_struct *vma, *validate; > + > + rcu_read_lock(); > + vma = mas_walk(&mas); > +retry: > + if (!vma) > + goto inval; > + > + /* Only anonymous vmas are supported for now */ > + if (!vma_is_anonymous(vma)) > + goto inval; > + > + if (!vma_read_trylock(vma)) > + goto inval; > + > + /* Check since vm_start/vm_end might change before we lock the VMA */ > + if (unlikely(address < vma->vm_start || address >= vma->vm_end)) { > + vma_read_unlock(vma); > + goto inval; > + } > + > + /* Check if the VMA got isolated after we found it */ > + mas.index = address; > + validate = mas_walk(&mas); Question for Maple Tree experts: Are you allowed to use mas_walk() like this? If the first mas_walk() call encountered a single-entry tree, it would store mas->node = MAS_ROOT, right? And then the second call would go into mas_state_walk(), mas_start() would return NULL, mas_is_ptr() would be true, and then mas_state_walk() would return the result of mas_start(), which is NULL? And we'd end up with mas_walk() returning NULL on the second run even though the tree hasn't changed? > + if (validate != vma) { > + vma_read_unlock(vma); > + count_vm_vma_lock_event(VMA_LOCK_MISS); > + /* The area was replaced with another one. */ > + vma = validate; > + goto retry; > + } > + > + rcu_read_unlock(); > + return vma; > +inval: > + rcu_read_unlock(); > + count_vm_vma_lock_event(VMA_LOCK_ABORT); > + return NULL; > +}
* Jann Horn <jannh@google.com> [230117 16:04]: > On Mon, Jan 9, 2023 at 9:54 PM Suren Baghdasaryan <surenb@google.com> wrote: > > Introduce lock_vma_under_rcu function to lookup and lock a VMA during > > page fault handling. When VMA is not found, can't be locked or changes > > after being locked, the function returns NULL. The lookup is performed > > under RCU protection to prevent the found VMA from being destroyed before > > the VMA lock is acquired. VMA lock statistics are updated according to > > the results. > > For now only anonymous VMAs can be searched this way. In other cases the > > function returns NULL. > [...] > > +struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, > > + unsigned long address) > > +{ > > + MA_STATE(mas, &mm->mm_mt, address, address); > > + struct vm_area_struct *vma, *validate; > > + > > + rcu_read_lock(); > > + vma = mas_walk(&mas); > > +retry: > > + if (!vma) > > + goto inval; > > + > > + /* Only anonymous vmas are supported for now */ > > + if (!vma_is_anonymous(vma)) > > + goto inval; > > + > > + if (!vma_read_trylock(vma)) > > + goto inval; > > + > > + /* Check since vm_start/vm_end might change before we lock the VMA */ > > + if (unlikely(address < vma->vm_start || address >= vma->vm_end)) { > > + vma_read_unlock(vma); > > + goto inval; > > + } > > + > > + /* Check if the VMA got isolated after we found it */ > > + mas.index = address; > > + validate = mas_walk(&mas); > > Question for Maple Tree experts: > > Are you allowed to use mas_walk() like this? If the first mas_walk() > call encountered a single-entry tree, it would store mas->node = > MAS_ROOT, right? And then the second call would go into > mas_state_walk(), mas_start() would return NULL, mas_is_ptr() would be > true, and then mas_state_walk() would return the result of > mas_start(), which is NULL? And we'd end up with mas_walk() returning > NULL on the second run even though the tree hasn't changed? This is safe for VMAs. There might be a bug in the tree regarding re-walking with a pointer, but it won't matter here. A single entry tree will be a pointer if the entry is of the range 0 - 0 (mas.index == 0, mas.last == 0). This would be a zero sized VMA - which is not valid. The second walk will check if the maple node is dead and restart the walk if it is dead. If the node isn't dead (almost always the case), then it will be a very quick walk. After a mas_walk(), the maple state has mas.index = vma->vm_start and mas.last = (vma->vm_end - 1). The address is set prior to the second walk in case of a vma split where mas.index from the first walk is on the other side of the split than address. > > > + if (validate != vma) { > > + vma_read_unlock(vma); > > + count_vm_vma_lock_event(VMA_LOCK_MISS); > > + /* The area was replaced with another one. */ > > + vma = validate; > > + goto retry; > > + } > > + > > + rcu_read_unlock(); > > + return vma; > > +inval: > > + rcu_read_unlock(); > > + count_vm_vma_lock_event(VMA_LOCK_ABORT); > > + return NULL; > > +}
On Tue, Jan 17, 2023 at 7:47 AM Michal Hocko <mhocko@suse.com> wrote: > > On Mon 09-01-23 12:53:23, Suren Baghdasaryan wrote: > > Introduce lock_vma_under_rcu function to lookup and lock a VMA during > > page fault handling. When VMA is not found, can't be locked or changes > > after being locked, the function returns NULL. The lookup is performed > > under RCU protection to prevent the found VMA from being destroyed before > > the VMA lock is acquired. VMA lock statistics are updated according to > > the results. > > For now only anonymous VMAs can be searched this way. In other cases the > > function returns NULL. > > Could you describe why only anonymous vmas are handled at this stage and > what (roughly) has to be done to support other vmas? lock_vma_under_rcu > doesn't seem to have any anonymous vma specific requirements AFAICS. TBH I haven't spent too much time looking into file-backed page faults yet but a couple of tasks I can think of are: - Ensure that all vma->vm_ops->fault() handlers do not rely on mmap_lock being read-locked; - vma->vm_file freeing like VMA freeing will need to be done after RCU grace period since page fault handlers use it. This will require some caution because simply adding it into __vm_area_free() called via call_rcu() will cause corresponding fops->release() to be called asynchronously. I had to solve this issue with out-of-tree SPF implementation when asynchronously called snd_pcm_release() was problematic. I'm sure I'm missing more potential issues and maybe Matthew and Michel can pinpoint more things to resolve here? > > Also isn't lock_vma_under_rcu effectively find_read_lock_vma? Not that > the naming is really the most important part but the rcu locking is > internal to the function so why should we spread this implementation > detail to the world... I wanted the name to indicate that the lookup is done with no locks held. But I'm open to suggestions. > > > Signed-off-by: Suren Baghdasaryan <surenb@google.com> > > --- > > include/linux/mm.h | 3 +++ > > mm/memory.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++ > > 2 files changed, 54 insertions(+) > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h > > index c464fc8a514c..d0fddf6a1de9 100644 > > --- a/include/linux/mm.h > > +++ b/include/linux/mm.h > > @@ -687,6 +687,9 @@ static inline void vma_assert_no_reader(struct vm_area_struct *vma) > > vma); > > } > > > > +struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, > > + unsigned long address); > > + > > #else /* CONFIG_PER_VMA_LOCK */ > > > > static inline void vma_init_lock(struct vm_area_struct *vma) {} > > diff --git a/mm/memory.c b/mm/memory.c > > index 9ece18548db1..a658e26d965d 100644 > > --- a/mm/memory.c > > +++ b/mm/memory.c > > @@ -5242,6 +5242,57 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, > > } > > EXPORT_SYMBOL_GPL(handle_mm_fault); > > > > +#ifdef CONFIG_PER_VMA_LOCK > > +/* > > + * Lookup and lock a VMA under RCU protection. Returned VMA is guaranteed to be > > + * stable and not isolated. If the VMA is not found or is being modified the > > + * function returns NULL. > > + */ > > +struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, > > + unsigned long address) > > +{ > > + MA_STATE(mas, &mm->mm_mt, address, address); > > + struct vm_area_struct *vma, *validate; > > + > > + rcu_read_lock(); > > + vma = mas_walk(&mas); > > +retry: > > + if (!vma) > > + goto inval; > > + > > + /* Only anonymous vmas are supported for now */ > > + if (!vma_is_anonymous(vma)) > > + goto inval; > > + > > + if (!vma_read_trylock(vma)) > > + goto inval; > > + > > + /* Check since vm_start/vm_end might change before we lock the VMA */ > > + if (unlikely(address < vma->vm_start || address >= vma->vm_end)) { > > + vma_read_unlock(vma); > > + goto inval; > > + } > > + > > + /* Check if the VMA got isolated after we found it */ > > + mas.index = address; > > + validate = mas_walk(&mas); > > + if (validate != vma) { > > + vma_read_unlock(vma); > > + count_vm_vma_lock_event(VMA_LOCK_MISS); > > + /* The area was replaced with another one. */ > > + vma = validate; > > + goto retry; > > + } > > + > > + rcu_read_unlock(); > > + return vma; > > +inval: > > + rcu_read_unlock(); > > + count_vm_vma_lock_event(VMA_LOCK_ABORT); > > + return NULL; > > +} > > +#endif /* CONFIG_PER_VMA_LOCK */ > > + > > #ifndef __PAGETABLE_P4D_FOLDED > > /* > > * Allocate p4d page table. > > -- > > 2.39.0 > > -- > Michal Hocko > SUSE Labs
On Tue, Jan 17, 2023 at 05:06:57PM -0800, Suren Baghdasaryan wrote: > On Tue, Jan 17, 2023 at 7:47 AM Michal Hocko <mhocko@suse.com> wrote: > > > > On Mon 09-01-23 12:53:23, Suren Baghdasaryan wrote: > > > Introduce lock_vma_under_rcu function to lookup and lock a VMA during > > > page fault handling. When VMA is not found, can't be locked or changes > > > after being locked, the function returns NULL. The lookup is performed > > > under RCU protection to prevent the found VMA from being destroyed before > > > the VMA lock is acquired. VMA lock statistics are updated according to > > > the results. > > > For now only anonymous VMAs can be searched this way. In other cases the > > > function returns NULL. > > > > Could you describe why only anonymous vmas are handled at this stage and > > what (roughly) has to be done to support other vmas? lock_vma_under_rcu > > doesn't seem to have any anonymous vma specific requirements AFAICS. > > TBH I haven't spent too much time looking into file-backed page faults > yet but a couple of tasks I can think of are: > - Ensure that all vma->vm_ops->fault() handlers do not rely on > mmap_lock being read-locked; I think this way lies madness. There are just too many device drivers that implement ->fault. My plan is to call the ->map_pages() method under RCU without even read-locking the VMA. If that doesn't satisfy the fault, then drop all the way back to taking the mmap_sem for read before calling into ->fault.
On Tue, Jan 17, 2023 at 6:44 PM Matthew Wilcox <willy@infradead.org> wrote: > > On Tue, Jan 17, 2023 at 05:06:57PM -0800, Suren Baghdasaryan wrote: > > On Tue, Jan 17, 2023 at 7:47 AM Michal Hocko <mhocko@suse.com> wrote: > > > > > > On Mon 09-01-23 12:53:23, Suren Baghdasaryan wrote: > > > > Introduce lock_vma_under_rcu function to lookup and lock a VMA during > > > > page fault handling. When VMA is not found, can't be locked or changes > > > > after being locked, the function returns NULL. The lookup is performed > > > > under RCU protection to prevent the found VMA from being destroyed before > > > > the VMA lock is acquired. VMA lock statistics are updated according to > > > > the results. > > > > For now only anonymous VMAs can be searched this way. In other cases the > > > > function returns NULL. > > > > > > Could you describe why only anonymous vmas are handled at this stage and > > > what (roughly) has to be done to support other vmas? lock_vma_under_rcu > > > doesn't seem to have any anonymous vma specific requirements AFAICS. > > > > TBH I haven't spent too much time looking into file-backed page faults > > yet but a couple of tasks I can think of are: > > - Ensure that all vma->vm_ops->fault() handlers do not rely on > > mmap_lock being read-locked; > > I think this way lies madness. There are just too many device drivers > that implement ->fault. My plan is to call the ->map_pages() method > under RCU without even read-locking the VMA. If that doesn't satisfy > the fault, then drop all the way back to taking the mmap_sem for read > before calling into ->fault. Sounds reasonable to me but I guess the devil is in the details... >
diff --git a/include/linux/mm.h b/include/linux/mm.h index c464fc8a514c..d0fddf6a1de9 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -687,6 +687,9 @@ static inline void vma_assert_no_reader(struct vm_area_struct *vma) vma); } +struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, + unsigned long address); + #else /* CONFIG_PER_VMA_LOCK */ static inline void vma_init_lock(struct vm_area_struct *vma) {} diff --git a/mm/memory.c b/mm/memory.c index 9ece18548db1..a658e26d965d 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -5242,6 +5242,57 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, } EXPORT_SYMBOL_GPL(handle_mm_fault); +#ifdef CONFIG_PER_VMA_LOCK +/* + * Lookup and lock a VMA under RCU protection. Returned VMA is guaranteed to be + * stable and not isolated. If the VMA is not found or is being modified the + * function returns NULL. + */ +struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, + unsigned long address) +{ + MA_STATE(mas, &mm->mm_mt, address, address); + struct vm_area_struct *vma, *validate; + + rcu_read_lock(); + vma = mas_walk(&mas); +retry: + if (!vma) + goto inval; + + /* Only anonymous vmas are supported for now */ + if (!vma_is_anonymous(vma)) + goto inval; + + if (!vma_read_trylock(vma)) + goto inval; + + /* Check since vm_start/vm_end might change before we lock the VMA */ + if (unlikely(address < vma->vm_start || address >= vma->vm_end)) { + vma_read_unlock(vma); + goto inval; + } + + /* Check if the VMA got isolated after we found it */ + mas.index = address; + validate = mas_walk(&mas); + if (validate != vma) { + vma_read_unlock(vma); + count_vm_vma_lock_event(VMA_LOCK_MISS); + /* The area was replaced with another one. */ + vma = validate; + goto retry; + } + + rcu_read_unlock(); + return vma; +inval: + rcu_read_unlock(); + count_vm_vma_lock_event(VMA_LOCK_ABORT); + return NULL; +} +#endif /* CONFIG_PER_VMA_LOCK */ + #ifndef __PAGETABLE_P4D_FOLDED /* * Allocate p4d page table.
Introduce lock_vma_under_rcu function to lookup and lock a VMA during page fault handling. When VMA is not found, can't be locked or changes after being locked, the function returns NULL. The lookup is performed under RCU protection to prevent the found VMA from being destroyed before the VMA lock is acquired. VMA lock statistics are updated according to the results. For now only anonymous VMAs can be searched this way. In other cases the function returns NULL. Signed-off-by: Suren Baghdasaryan <surenb@google.com> --- include/linux/mm.h | 3 +++ mm/memory.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 54 insertions(+)