| Message ID | 3a190892355989d42f59cf9f2f98b94694b0d24d.1718090413.git.baolin.wang@linux.alibaba.com (mailing list archive) |
|---|---|
| State | New |
| Series | add mTHP support for anonymous shmem |
On 11 Jun 2024, at 6:11, Baolin Wang wrote:

> Add large folio mapping establishment support for finish_fault() as a
> preparation, to support multi-size THP allocation of anonymous shmem pages
> in the following patches.
>
> Keep the same behavior (per-page fault) for non-anon shmem to avoid inflating
> the RSS unintentionally, and we can discuss what size of mapping to build
> when extending mTHP to control non-anon shmem in the future.
>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>  mm/memory.c | 57 +++++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 47 insertions(+), 10 deletions(-)
>

LGTM. Reviewed-by: Zi Yan <ziy@nvidia.com>

--
Best Regards,
Yan, Zi
On 2024/6/11 22:38, Zi Yan wrote:
> On 11 Jun 2024, at 6:11, Baolin Wang wrote:
>
>> Add large folio mapping establishment support for finish_fault() as a
>> preparation, to support multi-size THP allocation of anonymous shmem pages
>> in the following patches.
>>
>> Keep the same behavior (per-page fault) for non-anon shmem to avoid inflating
>> the RSS unintentionally, and we can discuss what size of mapping to build
>> when extending mTHP to control non-anon shmem in the future.
>>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>>  mm/memory.c | 57 +++++++++++++++++++++++++++++++++++++++++++----------
>>  1 file changed, 47 insertions(+), 10 deletions(-)
>>
> LGTM. Reviewed-by: Zi Yan <ziy@nvidia.com>

Thanks Zi for reviewing.
On 2024/6/11 18:11, Baolin Wang wrote:
> Add large folio mapping establishment support for finish_fault() as a
> preparation, to support multi-size THP allocation of anonymous shmem pages
> in the following patches.
>
> Keep the same behavior (per-page fault) for non-anon shmem to avoid inflating
> the RSS unintentionally, and we can discuss what size of mapping to build
> when extending mTHP to control non-anon shmem in the future.
>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>  mm/memory.c | 57 +++++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 47 insertions(+), 10 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index eef4e482c0c2..72775ee99ff3 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4831,9 +4831,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>  {
>  	struct vm_area_struct *vma = vmf->vma;
>  	struct page *page;
> +	struct folio *folio;
>  	vm_fault_t ret;
>  	bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
>  		      !(vma->vm_flags & VM_SHARED);
> +	int type, nr_pages;
> +	unsigned long addr = vmf->address;
>
>  	/* Did we COW the page? */
>  	if (is_cow)
> @@ -4864,24 +4867,58 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>  			return VM_FAULT_OOM;
>  	}
>
> +	folio = page_folio(page);
> +	nr_pages = folio_nr_pages(folio);
> +
> +	/*
> +	 * Using per-page fault to maintain the uffd semantics, and same
> +	 * approach also applies to non-anonymous-shmem faults to avoid
> +	 * inflating the RSS of the process.
> +	 */
> +	if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma))) {
> +		nr_pages = 1;
> +	} else if (nr_pages > 1) {
> +		pgoff_t idx = folio_page_idx(folio, page);
> +		/* The page offset of vmf->address within the VMA. */
> +		pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
> +

vma->vm_pgoff

> +		/*
> +		 * Fallback to per-page fault in case the folio size in page
> +		 * cache beyond the VMA limits.
> +		 */
> +		if (unlikely(vma_off < idx ||
> +			     vma_off + (nr_pages - idx) > vma_pages(vma))) {
> +			nr_pages = 1;
> +		} else {
> +			/* Now we can set mappings for the whole large folio. */
> +			addr = vmf->address - idx * PAGE_SIZE;

addr -= idx * PAGE_SIZE;

> +			page = &folio->page;
> +		}
> +	}
> +
>  	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
> -				       vmf->address, &vmf->ptl);
> +				       addr, &vmf->ptl);

no newline now,

>  	if (!vmf->pte)
>  		return VM_FAULT_NOPAGE;
>
>  	/* Re-check under ptl */
> -	if (likely(!vmf_pte_changed(vmf))) {
> -		struct folio *folio = page_folio(page);
> -		int type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
> -
> -		set_pte_range(vmf, folio, page, 1, vmf->address);
> -		add_mm_counter(vma->vm_mm, type, 1);
> -		ret = 0;
> -	} else {
> -		update_mmu_tlb(vma, vmf->address, vmf->pte);
> +	if (nr_pages == 1 && unlikely(vmf_pte_changed(vmf))) {
> +		update_mmu_tlb(vma, addr, vmf->pte);
> +		ret = VM_FAULT_NOPAGE;
> +		goto unlock;
> +	} else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
> +		update_mmu_tlb_range(vma, addr, vmf->pte, nr_pages);
>  		ret = VM_FAULT_NOPAGE;
> +		goto unlock;
>  	}

We may add a vmf_pte_range_changed(), but separate it.

Some very small nits, up to you,

Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

>
> +	folio_ref_add(folio, nr_pages - 1);
> +	set_pte_range(vmf, folio, page, nr_pages, addr);
> +	type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
> +	add_mm_counter(vma->vm_mm, type, nr_pages);
> +	ret = 0;
> +
> +unlock:
>  	pte_unmap_unlock(vmf->pte, vmf->ptl);
>  	return ret;
>  }
On 2024/6/12 21:40, Kefeng Wang wrote:
>
>
> On 2024/6/11 18:11, Baolin Wang wrote:
>> Add large folio mapping establishment support for finish_fault() as a
>> preparation, to support multi-size THP allocation of anonymous shmem
>> pages
>> in the following patches.
>>
>> Keep the same behavior (per-page fault) for non-anon shmem to avoid
>> inflating
>> the RSS unintentionally, and we can discuss what size of mapping to build
>> when extending mTHP to control non-anon shmem in the future.
>>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>>   mm/memory.c | 57 +++++++++++++++++++++++++++++++++++++++++++----------
>>   1 file changed, 47 insertions(+), 10 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index eef4e482c0c2..72775ee99ff3 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -4831,9 +4831,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>   {
>>       struct vm_area_struct *vma = vmf->vma;
>>       struct page *page;
>> +    struct folio *folio;
>>       vm_fault_t ret;
>>       bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
>>                 !(vma->vm_flags & VM_SHARED);
>> +    int type, nr_pages;
>> +    unsigned long addr = vmf->address;
>>
>>       /* Did we COW the page? */
>>       if (is_cow)
>> @@ -4864,24 +4867,58 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>               return VM_FAULT_OOM;
>>       }
>>
>> +    folio = page_folio(page);
>> +    nr_pages = folio_nr_pages(folio);
>> +
>> +    /*
>> +     * Using per-page fault to maintain the uffd semantics, and same
>> +     * approach also applies to non-anonymous-shmem faults to avoid
>> +     * inflating the RSS of the process.
>> +     */
>> +    if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma))) {
>> +        nr_pages = 1;
>> +    } else if (nr_pages > 1) {
>> +        pgoff_t idx = folio_page_idx(folio, page);
>> +        /* The page offset of vmf->address within the VMA. */
>> +        pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
>> +
>
> vma->vm_pgoff
>
>> +        /*
>> +         * Fallback to per-page fault in case the folio size in page
>> +         * cache beyond the VMA limits.
>> +         */
>> +        if (unlikely(vma_off < idx ||
>> +                 vma_off + (nr_pages - idx) > vma_pages(vma))) {
>> +            nr_pages = 1;
>> +        } else {
>> +            /* Now we can set mappings for the whole large folio. */
>> +            addr = vmf->address - idx * PAGE_SIZE;
>
> addr -= idx * PAGE_SIZE;
>
>> +            page = &folio->page;
>> +        }
>> +    }
>> +
>>       vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
>> -                   vmf->address, &vmf->ptl);
>> +                   addr, &vmf->ptl);
>
> no newline now,
>
>>       if (!vmf->pte)
>>           return VM_FAULT_NOPAGE;
>>
>>       /* Re-check under ptl */
>> -    if (likely(!vmf_pte_changed(vmf))) {
>> -        struct folio *folio = page_folio(page);
>> -        int type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
>> -
>> -        set_pte_range(vmf, folio, page, 1, vmf->address);
>> -        add_mm_counter(vma->vm_mm, type, 1);
>> -        ret = 0;
>> -    } else {
>> -        update_mmu_tlb(vma, vmf->address, vmf->pte);
>> +    if (nr_pages == 1 && unlikely(vmf_pte_changed(vmf))) {
>> +        update_mmu_tlb(vma, addr, vmf->pte);
>> +        ret = VM_FAULT_NOPAGE;
>> +        goto unlock;
>> +    } else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
>> +        update_mmu_tlb_range(vma, addr, vmf->pte, nr_pages);
>>           ret = VM_FAULT_NOPAGE;
>> +        goto unlock;
>>       }
>
> We may add a vmf_pte_range_changed(), but separate it.
>
> Some very small nits, up to you,
>
> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>

Thanks for reviewing. If a new version is needed, then I will clean up these
coding style issues.
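As a reference for the vmf_pte_range_changed() idea mentioned above, here is a minimal sketch of one possible shape for such a helper. It is not part of the posted patch; it simply assumes the helper would fold the two PTL re-check branches into a single predicate, reusing the existing vmf_pte_changed() and pte_range_none() helpers already used in mm/memory.c:

```c
/*
 * Sketch only (not part of this series): one possible vmf_pte_range_changed()
 * that combines the two re-check conditions used in finish_fault() above.
 */
static bool vmf_pte_range_changed(struct vm_fault *vmf, int nr_pages)
{
	/* The single-page case keeps the existing uffd-wp aware check. */
	if (nr_pages == 1)
		return vmf_pte_changed(vmf);

	/* A batched mapping may only proceed if the whole PTE range is still none. */
	return !pte_range_none(vmf->pte, nr_pages);
}
```

With such a helper, the re-check under the PTL could become a single `if (vmf_pte_range_changed(vmf, nr_pages))` test; the exact shape would of course be up to a follow-up patch, as Kefeng notes.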
diff --git a/mm/memory.c b/mm/memory.c
index eef4e482c0c2..72775ee99ff3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4831,9 +4831,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct page *page;
+	struct folio *folio;
 	vm_fault_t ret;
 	bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
 		      !(vma->vm_flags & VM_SHARED);
+	int type, nr_pages;
+	unsigned long addr = vmf->address;
 
 	/* Did we COW the page? */
 	if (is_cow)
@@ -4864,24 +4867,58 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 			return VM_FAULT_OOM;
 	}
 
+	folio = page_folio(page);
+	nr_pages = folio_nr_pages(folio);
+
+	/*
+	 * Using per-page fault to maintain the uffd semantics, and same
+	 * approach also applies to non-anonymous-shmem faults to avoid
+	 * inflating the RSS of the process.
+	 */
+	if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma))) {
+		nr_pages = 1;
+	} else if (nr_pages > 1) {
+		pgoff_t idx = folio_page_idx(folio, page);
+		/* The page offset of vmf->address within the VMA. */
+		pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
+
+		/*
+		 * Fallback to per-page fault in case the folio size in page
+		 * cache beyond the VMA limits.
+		 */
+		if (unlikely(vma_off < idx ||
+			     vma_off + (nr_pages - idx) > vma_pages(vma))) {
+			nr_pages = 1;
+		} else {
+			/* Now we can set mappings for the whole large folio. */
+			addr = vmf->address - idx * PAGE_SIZE;
+			page = &folio->page;
+		}
+	}
+
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
-				       vmf->address, &vmf->ptl);
+				       addr, &vmf->ptl);
 	if (!vmf->pte)
 		return VM_FAULT_NOPAGE;
 
 	/* Re-check under ptl */
-	if (likely(!vmf_pte_changed(vmf))) {
-		struct folio *folio = page_folio(page);
-		int type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
-
-		set_pte_range(vmf, folio, page, 1, vmf->address);
-		add_mm_counter(vma->vm_mm, type, 1);
-		ret = 0;
-	} else {
-		update_mmu_tlb(vma, vmf->address, vmf->pte);
+	if (nr_pages == 1 && unlikely(vmf_pte_changed(vmf))) {
+		update_mmu_tlb(vma, addr, vmf->pte);
+		ret = VM_FAULT_NOPAGE;
+		goto unlock;
+	} else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
+		update_mmu_tlb_range(vma, addr, vmf->pte, nr_pages);
 		ret = VM_FAULT_NOPAGE;
+		goto unlock;
 	}
 
+	folio_ref_add(folio, nr_pages - 1);
+	set_pte_range(vmf, folio, page, nr_pages, addr);
+	type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
+	add_mm_counter(vma->vm_mm, type, nr_pages);
+	ret = 0;
+
+unlock:
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
 	return ret;
 }
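To illustrate the fallback check with made-up numbers: a 64K folio has nr_pages = 16; if the faulting page sits at idx = 5 inside that folio but vmf->address is only two pages past vm_start (vma_off = 2), then vma_off < idx, mapping the whole folio would start before the VMA, and the code falls back to nr_pages = 1. The second condition, vma_off + (nr_pages - idx) > vma_pages(vma), catches the symmetric case where the tail of the folio would run past vm_end.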
Add large folio mapping establishment support for finish_fault() as a
preparation, to support multi-size THP allocation of anonymous shmem pages
in the following patches.

Keep the same behavior (per-page fault) for non-anon shmem to avoid inflating
the RSS unintentionally, and we can discuss what size of mapping to build
when extending mTHP to control non-anon shmem in the future.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/memory.c | 57 +++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 47 insertions(+), 10 deletions(-)
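For context, "anonymous shmem" here means MAP_ANONYMOUS | MAP_SHARED memory, which is backed by shmem and therefore reaches finish_fault() through the shmem fault path. Below is a minimal userspace sketch (illustrative only, not part of the series) of the kind of mapping this patch prepares for:

```c
/*
 * Illustrative sketch: the first write fault on a shared anonymous
 * (shmem-backed) mapping goes through the shmem fault path and ends in
 * finish_fault(). Once the follow-up patches enable an mTHP size for
 * shmem, that single fault can map a whole large folio instead of one
 * page at a time.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 2UL << 20;	/* 2 MB region, larger than any mTHP size */
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_ANONYMOUS | MAP_SHARED, -1, 0);

	if (buf == MAP_FAILED) {
		perror("mmap");
		return EXIT_FAILURE;
	}

	buf[0] = 1;		/* first fault on the anonymous shmem region */
	memset(buf, 0, len);	/* touch the rest of the mapping */

	munmap(buf, len);
	return 0;
}
```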