Message ID | 20221005011707.514612-2-mike.kravetz@oracle.com (mailing list archive) |
---|---|
State | New |
Series | hugetlb: fixes for new vma lock series |
Sorry for late respond. It's a really busy week. :)

On 2022/10/5 9:17, Mike Kravetz wrote:
> The hugetlb vma lock hangs off the vm_private_data field and is specific
> to the vma. When vm_area_dup() is called as part of vma splitting, the

Oh, I checked vm_area_dup() from callsite of copy_vma and dup_mmap but split_vma
is missed... And yes, vma splitting can occur but vma merging won't for hugetlb
vma. Thanks for catching this, Mike.

> vma lock pointer is copied to the new vma. This will result in issues
> such as double freeing of the structure. Update the hugetlb open vm_ops
> to allocate a new vma lock for the new vma.
>
> The routine __unmap_hugepage_range_final unconditionally unset
> VM_MAYSHARE to prevent subsequent pmd sharing. hugetlb_vma_lock_free
> attempted to anticipate this by checking both VM_MAYSHARE and VM_SHARED.
> However, if only VM_MAYSHARE was set we would miss the free. With the
> introduction of the vma lock, a vma can not participate in pmd sharing
> if vm_private_data is NULL. Instead of clearing VM_MAYSHARE in
> __unmap_hugepage_range_final, free the vma lock to prevent sharing. Also,
> update the sharing code to make sure vma lock is indeed a condition for
> pmd sharing. hugetlb_vma_lock_free can then key off VM_MAYSHARE and not
> miss any vmas.
>
> Fixes: "hugetlb: add vma based lock for pmd sharing"
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  mm/hugetlb.c | 43 +++++++++++++++++++++++++++----------------
>  mm/memory.c  |  4 ----
>  2 files changed, 27 insertions(+), 20 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4443e87e814b..0129d371800c 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4612,7 +4612,14 @@ static void hugetlb_vm_op_open(struct vm_area_struct *vma)
>  		kref_get(&resv->refs);
>  	}
>
> -	hugetlb_vma_lock_alloc(vma);
> +	/*
> +	 * vma_lock structure for sharable mappings is vma specific.
> +	 * Clear old pointer (if copied via vm_area_dup) and create new.
> +	 */
> +	if (vma->vm_flags & VM_MAYSHARE) {
> +		vma->vm_private_data = NULL;
> +		hugetlb_vma_lock_alloc(vma);
> +	}

IMHO this would lead to memoryleak. Think about the below move_vma() flow:
  move_vma
    copy_vma
      new_vma = vm_area_dup(vma);
      new_vma->vm_ops->open(new_vma); --> new_vma has its own vma lock.
    is_vm_hugetlb_page(vma)
    clear_vma_resv_huge_pages
      hugetlb_dup_vma_private --> vma->vm_private_data is set to NULL
      without put ref. So vma lock is *leaked*?

Other part looks good to me. Thanks for your work.

Thanks,
Miaohe Lin
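To make the leak in that flow concrete, here is a minimal sketch of the helper as described above; the body is an assumption based on this discussion, illustrative only and not a quote of the series:

/*
 * Illustrative sketch, not the actual mm-unstable code.  In the
 * move_vma() path above, the *old* vma still owns its vma_lock when
 * clear_vma_resv_huge_pages() -> hugetlb_dup_vma_private() runs.  If
 * the helper simply clears the pointer, no kref_put() against
 * hugetlb_vma_lock_release() ever happens, so the structure is never freed.
 */
void hugetlb_dup_vma_private(struct vm_area_struct *vma)
{
	/*
	 * For VM_MAYSHARE vmas, vm_private_data points at the vma_lock;
	 * clearing it here drops the only pointer to the lock without
	 * dropping its reference -- the leak Miaohe describes.
	 */
	vma->vm_private_data = NULL;
}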
On 10/15/22 09:25, Miaohe Lin wrote:
> Sorry for late respond. It's a really busy week. :)
>
> On 2022/10/5 9:17, Mike Kravetz wrote:
> > The hugetlb vma lock hangs off the vm_private_data field and is specific
> > to the vma. When vm_area_dup() is called as part of vma splitting, the
>
> Oh, I checked vm_area_dup() from callsite of copy_vma and dup_mmap but split_vma
> is missed... And yes, vma splitting can occur but vma merging won't for hugetlb
> vma. Thanks for catching this, Mike.
>
> > vma lock pointer is copied to the new vma. This will result in issues
> > such as double freeing of the structure. Update the hugetlb open vm_ops
> > to allocate a new vma lock for the new vma.
> >
> > The routine __unmap_hugepage_range_final unconditionally unset
> > VM_MAYSHARE to prevent subsequent pmd sharing. hugetlb_vma_lock_free
> > attempted to anticipate this by checking both VM_MAYSHARE and VM_SHARED.
> > However, if only VM_MAYSHARE was set we would miss the free. With the
> > introduction of the vma lock, a vma can not participate in pmd sharing
> > if vm_private_data is NULL. Instead of clearing VM_MAYSHARE in
> > __unmap_hugepage_range_final, free the vma lock to prevent sharing. Also,
> > update the sharing code to make sure vma lock is indeed a condition for
> > pmd sharing. hugetlb_vma_lock_free can then key off VM_MAYSHARE and not
> > miss any vmas.
> >
> > Fixes: "hugetlb: add vma based lock for pmd sharing"
> > Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> > ---
> >  mm/hugetlb.c | 43 +++++++++++++++++++++++++++----------------
> >  mm/memory.c  |  4 ----
> >  2 files changed, 27 insertions(+), 20 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 4443e87e814b..0129d371800c 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -4612,7 +4612,14 @@ static void hugetlb_vm_op_open(struct vm_area_struct *vma)
> >  		kref_get(&resv->refs);
> >  	}
> >
> > -	hugetlb_vma_lock_alloc(vma);
> > +	/*
> > +	 * vma_lock structure for sharable mappings is vma specific.
> > +	 * Clear old pointer (if copied via vm_area_dup) and create new.
> > +	 */
> > +	if (vma->vm_flags & VM_MAYSHARE) {
> > +		vma->vm_private_data = NULL;
> > +		hugetlb_vma_lock_alloc(vma);
> > +	}
>
> IMHO this would lead to memoryleak. Think about the below move_vma() flow:
>   move_vma
>     copy_vma
>       new_vma = vm_area_dup(vma);
>       new_vma->vm_ops->open(new_vma); --> new_vma has its own vma lock.
>     is_vm_hugetlb_page(vma)
>     clear_vma_resv_huge_pages
>       hugetlb_dup_vma_private --> vma->vm_private_data is set to NULL
>       without put ref. So vma lock is *leaked*?

You are right, that could lead to a leak.

I have an idea about setting vma->vm_private_data to NULL for VM_MAYSHARE
vmas in routines like hugetlb_dup_vma_private(). We can check
hugetlb_vma_lock->vma and only set to NULL if,

vma->(hugetlb_vma_lock)vma->vm_private_data->vma != vma

Got sidetracked chasing down another leak today. Will send a patch
implementing this idea soon.

Thanks for looking at this!
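Spelling the proposed check out as code may help; this is a sketch of the idea only, where the hugetlb_vma_lock back-pointer field and the exact placement are assumptions drawn from this thread, not the eventual patch:

/*
 * Sketch of the idea: only forget a vma_lock pointer that was blindly
 * copied by vm_area_dup(), i.e. one whose back-pointer names a
 * different vma.  If the lock really belongs to this vma, keep the
 * pointer so its reference can be dropped properly later.
 */
void hugetlb_dup_vma_private(struct vm_area_struct *vma)
{
	if (vma->vm_flags & VM_MAYSHARE) {
		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;

		/* Pointer copied from another vma: not our lock, drop it. */
		if (vma_lock && vma_lock->vma != vma)
			vma->vm_private_data = NULL;
	} else {
		/* Private mapping: the reserve map does not apply to children. */
		vma->vm_private_data = NULL;
	}
}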
On 2022/10/18 10:56, Mike Kravetz wrote:
> On 10/15/22 09:25, Miaohe Lin wrote:
>> Sorry for late respond. It's a really busy week. :)
>>
>> On 2022/10/5 9:17, Mike Kravetz wrote:
>>> The hugetlb vma lock hangs off the vm_private_data field and is specific
>>> to the vma. When vm_area_dup() is called as part of vma splitting, the
>>
>> Oh, I checked vm_area_dup() from callsite of copy_vma and dup_mmap but split_vma
>> is missed... And yes, vma splitting can occur but vma merging won't for hugetlb
>> vma. Thanks for catching this, Mike.
>>
>>> vma lock pointer is copied to the new vma. This will result in issues
>>> such as double freeing of the structure. Update the hugetlb open vm_ops
>>> to allocate a new vma lock for the new vma.
>>>
>>> The routine __unmap_hugepage_range_final unconditionally unset
>>> VM_MAYSHARE to prevent subsequent pmd sharing. hugetlb_vma_lock_free
>>> attempted to anticipate this by checking both VM_MAYSHARE and VM_SHARED.
>>> However, if only VM_MAYSHARE was set we would miss the free. With the
>>> introduction of the vma lock, a vma can not participate in pmd sharing
>>> if vm_private_data is NULL. Instead of clearing VM_MAYSHARE in
>>> __unmap_hugepage_range_final, free the vma lock to prevent sharing. Also,
>>> update the sharing code to make sure vma lock is indeed a condition for
>>> pmd sharing. hugetlb_vma_lock_free can then key off VM_MAYSHARE and not
>>> miss any vmas.
>>>
>>> Fixes: "hugetlb: add vma based lock for pmd sharing"
>>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>>> ---
>>>  mm/hugetlb.c | 43 +++++++++++++++++++++++++++----------------
>>>  mm/memory.c  |  4 ----
>>>  2 files changed, 27 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index 4443e87e814b..0129d371800c 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -4612,7 +4612,14 @@ static void hugetlb_vm_op_open(struct vm_area_struct *vma)
>>>  		kref_get(&resv->refs);
>>>  	}
>>>
>>> -	hugetlb_vma_lock_alloc(vma);
>>> +	/*
>>> +	 * vma_lock structure for sharable mappings is vma specific.
>>> +	 * Clear old pointer (if copied via vm_area_dup) and create new.
>>> +	 */
>>> +	if (vma->vm_flags & VM_MAYSHARE) {
>>> +		vma->vm_private_data = NULL;
>>> +		hugetlb_vma_lock_alloc(vma);
>>> +	}
>>
>> IMHO this would lead to memoryleak. Think about the below move_vma() flow:
>>   move_vma
>>     copy_vma
>>       new_vma = vm_area_dup(vma);
>>       new_vma->vm_ops->open(new_vma); --> new_vma has its own vma lock.
>>     is_vm_hugetlb_page(vma)
>>     clear_vma_resv_huge_pages
>>       hugetlb_dup_vma_private --> vma->vm_private_data is set to NULL
>>       without put ref. So vma lock is *leaked*?
>
> You are right, that could lead to a leak.
>
> I have an idea about setting vma->vm_private_data to NULL for VM_MAYSHARE
> vmas in routines like hugetlb_dup_vma_private(). We can check
> hugetlb_vma_lock->vma and only set to NULL if,
>
> vma->(hugetlb_vma_lock)vma->vm_private_data->vma != vma

Looks feasible. Thanks for your work, Mike.

Thanks,
Miaohe Lin

>
> Got sidetracked chasing down another leak today. Will send a patch
> implementing this idea soon.
>
> Thanks for looking at this!
>
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4443e87e814b..0129d371800c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4612,7 +4612,14 @@ static void hugetlb_vm_op_open(struct vm_area_struct *vma)
 		kref_get(&resv->refs);
 	}
 
-	hugetlb_vma_lock_alloc(vma);
+	/*
+	 * vma_lock structure for sharable mappings is vma specific.
+	 * Clear old pointer (if copied via vm_area_dup) and create new.
+	 */
+	if (vma->vm_flags & VM_MAYSHARE) {
+		vma->vm_private_data = NULL;
+		hugetlb_vma_lock_alloc(vma);
+	}
 }
 
 static void hugetlb_vm_op_close(struct vm_area_struct *vma)
@@ -5168,19 +5175,23 @@ void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 			  unsigned long end, struct page *ref_page,
 			  zap_flags_t zap_flags)
 {
+	hugetlb_vma_lock_write(vma);
+	i_mmap_lock_write(vma->vm_file->f_mapping);
+
 	__unmap_hugepage_range(tlb, vma, start, end, ref_page, zap_flags);
 
 	/*
-	 * Clear this flag so that x86's huge_pmd_share page_table_shareable
-	 * test will fail on a vma being torn down, and not grab a page table
-	 * on its way out. We're lucky that the flag has such an appropriate
-	 * name, and can in fact be safely cleared here. We could clear it
-	 * before the __unmap_hugepage_range above, but all that's necessary
-	 * is to clear it before releasing the i_mmap_rwsem. This works
-	 * because in the context this is called, the VMA is about to be
-	 * destroyed and the i_mmap_rwsem is held.
+	 * Unlock and free the vma lock before releasing i_mmap_rwsem. When
+	 * the vma_lock is freed, this makes the vma ineligible for pmd
+	 * sharing. And, i_mmap_rwsem is required to set up pmd sharing.
+	 * This is important as page tables for this unmapped range will
+	 * be asynchrously deleted. If the page tables are shared, there
+	 * will be issues when accessed by someone else.
 	 */
-	vma->vm_flags &= ~VM_MAYSHARE;
+	hugetlb_vma_unlock_write(vma);
+	hugetlb_vma_lock_free(vma);
+
+	i_mmap_unlock_write(vma->vm_file->f_mapping);
 }
 
 void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
@@ -6730,10 +6741,13 @@ static unsigned long page_table_shareable(struct vm_area_struct *svma,
 	/*
 	 * match the virtual addresses, permission and the alignment of the
 	 * page table page.
+	 *
+	 * Also, vma_lock (vm_private_data) is required for sharing.
 	 */
 	if (pmd_index(addr) != pmd_index(saddr) ||
 	    vm_flags != svm_flags ||
-	    !range_in_vma(svma, sbase, s_end))
+	    !range_in_vma(svma, sbase, s_end) ||
+	    !svma->vm_private_data)
 		return 0;
 
 	return saddr;
@@ -6883,12 +6897,9 @@ void hugetlb_vma_lock_release(struct kref *kref)
 static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
 {
 	/*
-	 * Only present in sharable vmas. See comment in
-	 * __unmap_hugepage_range_final about how VM_SHARED could
-	 * be set without VM_MAYSHARE. As a result, we need to
-	 * check if either is set in the free path.
+	 * Only present in sharable vmas.
 	 */
-	if (!vma || !(vma->vm_flags & (VM_MAYSHARE | VM_SHARED)))
+	if (!vma || !__vma_shareable_flags_pmd(vma))
 		return;
 
 	if (vma->vm_private_data) {
diff --git a/mm/memory.c b/mm/memory.c
index 1b994a55f176..81cc75e71888 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1685,12 +1685,8 @@ static void unmap_single_vma(struct mmu_gather *tlb,
 			if (vma->vm_file) {
 				zap_flags_t zap_flags = details ?
 				    details->zap_flags : 0;
-				hugetlb_vma_lock_write(vma);
-				i_mmap_lock_write(vma->vm_file->f_mapping);
 				__unmap_hugepage_range_final(tlb, vma, start, end,
 							     NULL, zap_flags);
-				i_mmap_unlock_write(vma->vm_file->f_mapping);
-				hugetlb_vma_unlock_write(vma);
 			}
 		} else
 			unmap_page_range(tlb, vma, start, end, details);
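For reference, hugetlb_vma_lock_free() above now keys off __vma_shareable_flags_pmd(), which is defined elsewhere in the series. A rough sketch of the condition that helper is expected to encode, per the commit message; this is an assumption, not the series' exact definition:

/*
 * Sketch of the shareability condition implied by the commit message:
 * pmd sharing requires a VM_MAYSHARE mapping that still has a vma_lock
 * hanging off vm_private_data, so the free path can test the same thing.
 */
static inline bool __vma_shareable_flags_pmd(struct vm_area_struct *vma)
{
	return (vma->vm_flags & VM_MAYSHARE) && vma->vm_private_data;
}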
The hugetlb vma lock hangs off the vm_private_data field and is specific
to the vma. When vm_area_dup() is called as part of vma splitting, the
vma lock pointer is copied to the new vma. This will result in issues
such as double freeing of the structure. Update the hugetlb open vm_ops
to allocate a new vma lock for the new vma.

The routine __unmap_hugepage_range_final unconditionally unset
VM_MAYSHARE to prevent subsequent pmd sharing. hugetlb_vma_lock_free
attempted to anticipate this by checking both VM_MAYSHARE and VM_SHARED.
However, if only VM_MAYSHARE was set we would miss the free. With the
introduction of the vma lock, a vma can not participate in pmd sharing
if vm_private_data is NULL. Instead of clearing VM_MAYSHARE in
__unmap_hugepage_range_final, free the vma lock to prevent sharing. Also,
update the sharing code to make sure vma lock is indeed a condition for
pmd sharing. hugetlb_vma_lock_free can then key off VM_MAYSHARE and not
miss any vmas.

Fixes: "hugetlb: add vma based lock for pmd sharing"
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/hugetlb.c | 43 +++++++++++++++++++++++++++----------------
 mm/memory.c  |  4 ----
 2 files changed, 27 insertions(+), 20 deletions(-)