Message ID | 20230216051750.3125598-17-surenb@google.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Per-VMA locks | expand |
On Wed, Feb 15, 2023 at 09:17:31PM -0800, Suren Baghdasaryan wrote: > Decisions about whether VMAs can be merged, split or expanded must be > made while VMAs are protected from the changes which can affect that > decision. For example, merge_vma uses vma->anon_vma in its decision Did you mean vma_merge()? > whether the VMA can be merged. Meanwhile, page fault handler changes > vma->anon_vma during COW operation. > Write-lock all VMAs which might be affected by a merge or split operation > before making decision how such operations should be performed. > It doesn't make sense (to me) to update vma->anon_vma during COW fault. AFAIK children's vma->anon_vma is allocated during fork() and page->anon_vma is updated on COW to reduce rmap walking because it's now unshared, no? As patch 26 just falls back to mmap_lock if no anon_vma is set, I think we can assume nothing updates vma->anon_vma (and its interval tree) if we are holding mmap_lock in write mode. Or am I missing something? -- Regards, Hyeonggon
On Thu, Feb 23, 2023 at 02:51:27PM +0000, Hyeonggon Yoo wrote: > On Wed, Feb 15, 2023 at 09:17:31PM -0800, Suren Baghdasaryan wrote: > > Decisions about whether VMAs can be merged, split or expanded must be > > made while VMAs are protected from the changes which can affect that > > decision. For example, merge_vma uses vma->anon_vma in its decision > > Did you mean vma_merge()? > > > whether the VMA can be merged. Meanwhile, page fault handler changes > > vma->anon_vma during COW operation. > > Write-lock all VMAs which might be affected by a merge or split operation > > before making decision how such operations should be performed. > > > > It doesn't make sense (to me) to update vma->anon_vma during COW fault. > > AFAIK children's vma->anon_vma is allocated during fork() and > page->anon_vma is updated on COW to reduce rmap walking because it's now > unshared, no? > > As patch 26 just falls back to mmap_lock if no anon_vma is set, > I think we can assume nothing updates vma->anon_vma (and its interval > tree) if we are holding mmap_lock in write mode. *Not interval tree, as it can be modified by other processes* > > Or am I missing something? > > -- > Regards, > Hyeonggon >
On Thu, Feb 23, 2023 at 6:51 AM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote: > > On Wed, Feb 15, 2023 at 09:17:31PM -0800, Suren Baghdasaryan wrote: > > Decisions about whether VMAs can be merged, split or expanded must be > > made while VMAs are protected from the changes which can affect that > > decision. For example, merge_vma uses vma->anon_vma in its decision > > Did you mean vma_merge()? Correct. > > > whether the VMA can be merged. Meanwhile, page fault handler changes > > vma->anon_vma during COW operation. > > Write-lock all VMAs which might be affected by a merge or split operation > > before making decision how such operations should be performed. > > > > It doesn't make sense (to me) to update vma->anon_vma during COW fault. > > AFAIK children's vma->anon_vma is allocated during fork() and > page->anon_vma is updated on COW to reduce rmap walking because it's now > unshared, no? > > As patch 26 just falls back to mmap_lock if no anon_vma is set, > I think we can assume nothing updates vma->anon_vma (and its interval > tree) if we are holding mmap_lock in write mode. > > Or am I missing something? No, I think you are right. Patch 26 was added in the later versions of this patchset and at the time this patch was written vma->anon_vma could change during page fault handling under only per-VMA lock protection. So, this patch can be simplified. We still want to prevent concurrent page faults while VMA is being merged or split (simply because par-VMA lock that page fault handler holds might become the wrong one if the VMA is split or merged from under it) but the timing of taking per-VMA lock does not have to be *before* can_vma_merge_{before|after}. Does that make sense? > > -- > Regards, > Hyeonggon > > -- > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com. >
diff --git a/mm/mmap.c b/mm/mmap.c index c5f2ddf17b87..ec2f8d0af280 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -269,8 +269,11 @@ SYSCALL_DEFINE1(brk, unsigned long, brk) */ vma_iter_init(&vmi, mm, oldbrk); next = vma_find(&vmi, newbrk + PAGE_SIZE + stack_guard_gap); - if (next && newbrk + PAGE_SIZE > vm_start_gap(next)) - goto out; + if (next) { + vma_start_write(next); + if (newbrk + PAGE_SIZE > vm_start_gap(next)) + goto out; + } brkvma = vma_prev_limit(&vmi, mm->start_brk); /* Ok, looks good - let it rip. */ @@ -912,10 +915,17 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, if (vm_flags & VM_SPECIAL) return NULL; + if (prev) + vma_start_write(prev); next = find_vma(mm, prev ? prev->vm_end : 0); + if (next) + vma_start_write(next); mid = next; - if (next && next->vm_end == end) /* cases 6, 7, 8 */ + if (next && next->vm_end == end) { /* cases 6, 7, 8 */ next = find_vma(mm, next->vm_end); + if (next) + vma_start_write(next); + } /* verify some invariant that must be enforced by the caller */ VM_WARN_ON(prev && addr <= prev->vm_start); @@ -2163,6 +2173,7 @@ int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma, WARN_ON(vma->vm_start >= addr); WARN_ON(vma->vm_end <= addr); + vma_start_write(vma); if (vma->vm_ops && vma->vm_ops->may_split) { err = vma->vm_ops->may_split(vma, addr); if (err) @@ -2518,6 +2529,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr, /* Attempt to expand an old mapping */ /* Check next */ + if (next) + vma_start_write(next); if (next && next->vm_start == end && !vma_policy(next) && can_vma_merge_before(next, vm_flags, NULL, file, pgoff+pglen, NULL_VM_UFFD_CTX, NULL)) { @@ -2527,6 +2540,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr, } /* Check prev */ + if (prev) + vma_start_write(prev); if (prev && prev->vm_end == addr && !vma_policy(prev) && (vma ? can_vma_merge_after(prev, vm_flags, vma->anon_vma, file, pgoff, vma->vm_userfaultfd_ctx, NULL) : @@ -2900,6 +2915,8 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma, if (security_vm_enough_memory_mm(mm, len >> PAGE_SHIFT)) return -ENOMEM; + if (vma) + vma_start_write(vma); /* * Expand the existing vma if possible; Note that singular lists do not * occur after forking, so the expand will only happen on new VMAs.
Decisions about whether VMAs can be merged, split or expanded must be made while VMAs are protected from the changes which can affect that decision. For example, merge_vma uses vma->anon_vma in its decision whether the VMA can be merged. Meanwhile, page fault handler changes vma->anon_vma during COW operation. Write-lock all VMAs which might be affected by a merge or split operation before making decision how such operations should be performed. Signed-off-by: Suren Baghdasaryan <surenb@google.com> --- mm/mmap.c | 23 ++++++++++++++++++++--- 1 file changed, 20 insertions(+), 3 deletions(-)