Message ID | 20221207203034.650899-6-peterx@redhat.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v2,01/10] mm/hugetlb: Let vma_offset_start() to return start | expand |
On 12/7/22 12:30, Peter Xu wrote: > We can take the hugetlb walker lock, here taking vma lock directly. > > Reviewed-by: David Hildenbrand <david@redhat.com> > Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > fs/userfaultfd.c | 18 ++++++++++++++---- > 1 file changed, 14 insertions(+), 4 deletions(-) > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c > index 07c81ab3fd4d..a602f008dde5 100644 > --- a/fs/userfaultfd.c > +++ b/fs/userfaultfd.c > @@ -376,7 +376,8 @@ static inline unsigned int userfaultfd_get_blocking_state(unsigned int flags) > */ > vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) > { > - struct mm_struct *mm = vmf->vma->vm_mm; > + struct vm_area_struct *vma = vmf->vma; > + struct mm_struct *mm = vma->vm_mm; > struct userfaultfd_ctx *ctx; > struct userfaultfd_wait_queue uwq; > vm_fault_t ret = VM_FAULT_SIGBUS; > @@ -403,7 +404,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) > */ > mmap_assert_locked(mm); > > - ctx = vmf->vma->vm_userfaultfd_ctx.ctx; > + ctx = vma->vm_userfaultfd_ctx.ctx; > if (!ctx) > goto out; > > @@ -493,6 +494,13 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) > > blocking_state = userfaultfd_get_blocking_state(vmf->flags); > > + /* > + * This stablizes pgtable for hugetlb on e.g. pmd unsharing. Need > + * to be before setting current state. > + */ Looking at this code, I am not able to come up with a reason for why the vma lock/unlock placement is exactly where it is. It looks quite arbitrary. Why not, for example, take and drop the vma lock within userfaultfd_huge_must_wait()? That makes more sense to me, but I'm not familiar with userfaultfd so of course I'm missing something. But the comment above certainly doesn't supply that something. thanks,
On Wed, Dec 07, 2022 at 03:19:55PM -0800, John Hubbard wrote: > On 12/7/22 12:30, Peter Xu wrote: > > We can take the hugetlb walker lock, here taking vma lock directly. > > > > Reviewed-by: David Hildenbrand <david@redhat.com> > > Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> > > Signed-off-by: Peter Xu <peterx@redhat.com> > > --- > > fs/userfaultfd.c | 18 ++++++++++++++---- > > 1 file changed, 14 insertions(+), 4 deletions(-) > > > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c > > index 07c81ab3fd4d..a602f008dde5 100644 > > --- a/fs/userfaultfd.c > > +++ b/fs/userfaultfd.c > > @@ -376,7 +376,8 @@ static inline unsigned int userfaultfd_get_blocking_state(unsigned int flags) > > */ > > vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) > > { > > - struct mm_struct *mm = vmf->vma->vm_mm; > > + struct vm_area_struct *vma = vmf->vma; > > + struct mm_struct *mm = vma->vm_mm; > > struct userfaultfd_ctx *ctx; > > struct userfaultfd_wait_queue uwq; > > vm_fault_t ret = VM_FAULT_SIGBUS; > > @@ -403,7 +404,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) > > */ > > mmap_assert_locked(mm); > > - ctx = vmf->vma->vm_userfaultfd_ctx.ctx; > > + ctx = vma->vm_userfaultfd_ctx.ctx; > > if (!ctx) > > goto out; > > @@ -493,6 +494,13 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) > > blocking_state = userfaultfd_get_blocking_state(vmf->flags); > > + /* > > + * This stablizes pgtable for hugetlb on e.g. pmd unsharing. Need > > + * to be before setting current state. > > + */ > > Looking at this code, I am not able to come up with a reason for why the > vma lock/unlock placement is exactly where it is. It looks quite arbitrary. > > Why not, for example, take and drop the vma lock within > userfaultfd_huge_must_wait()? That makes more sense to me, but I'm not familiar > with userfaultfd so of course I'm missing something. > > But the comment above certainly doesn't supply that something. The part that matters in the comment is "need to be before setting current state". blocking_state = userfaultfd_get_blocking_state(vmf->flags); if (is_vm_hugetlb_page(vma)) hugetlb_vma_lock_read(vma); set_current_state(blocking_state); down_read() can sleep and also modify the task state, we cannot take the lock after that point because otherwise the task state will be messed up.
On 12/7/22 15:44, Peter Xu wrote: > The part that matters in the comment is "need to be before setting current > state". > > blocking_state = userfaultfd_get_blocking_state(vmf->flags); > if (is_vm_hugetlb_page(vma)) > hugetlb_vma_lock_read(vma); > set_current_state(blocking_state); > > down_read() can sleep and also modify the task state, we cannot take the > lock after that point because otherwise the task state will be messed up. > OK, how about: /* * Take the vma lock now, in order to safely call * userfaultfd_huge_must_wait() a little bit later. Because acquiring * the (sleepable) vma lock potentially modifies the current task state, * that must be before explicitly calling set_current_state(). */ Other than that, the patch looks good, so: Reviewed-by: John Hubbard <jhubbard@nvidia.com> thanks,
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 07c81ab3fd4d..a602f008dde5 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -376,7 +376,8 @@ static inline unsigned int userfaultfd_get_blocking_state(unsigned int flags) */ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) { - struct mm_struct *mm = vmf->vma->vm_mm; + struct vm_area_struct *vma = vmf->vma; + struct mm_struct *mm = vma->vm_mm; struct userfaultfd_ctx *ctx; struct userfaultfd_wait_queue uwq; vm_fault_t ret = VM_FAULT_SIGBUS; @@ -403,7 +404,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) */ mmap_assert_locked(mm); - ctx = vmf->vma->vm_userfaultfd_ctx.ctx; + ctx = vma->vm_userfaultfd_ctx.ctx; if (!ctx) goto out; @@ -493,6 +494,13 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) blocking_state = userfaultfd_get_blocking_state(vmf->flags); + /* + * This stablizes pgtable for hugetlb on e.g. pmd unsharing. Need + * to be before setting current state. + */ + if (is_vm_hugetlb_page(vma)) + hugetlb_vma_lock_read(vma); + spin_lock_irq(&ctx->fault_pending_wqh.lock); /* * After the __add_wait_queue the uwq is visible to userland @@ -507,13 +515,15 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) set_current_state(blocking_state); spin_unlock_irq(&ctx->fault_pending_wqh.lock); - if (!is_vm_hugetlb_page(vmf->vma)) + if (!is_vm_hugetlb_page(vma)) must_wait = userfaultfd_must_wait(ctx, vmf->address, vmf->flags, reason); else - must_wait = userfaultfd_huge_must_wait(ctx, vmf->vma, + must_wait = userfaultfd_huge_must_wait(ctx, vma, vmf->address, vmf->flags, reason); + if (is_vm_hugetlb_page(vma)) + hugetlb_vma_unlock_read(vma); mmap_read_unlock(mm); if (likely(must_wait && !READ_ONCE(ctx->released))) {