Message ID | 20210211000322.159437-3-mike.kravetz@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Add hugetlb soft dirty support | expand |
On Wed, Feb 10, 2021 at 04:03:19PM -0800, Mike Kravetz wrote: > hugetlb fault processing code would COW all write faults where the > pte was not writable. Soft dirty will write protect ptes as part > of it's tracking mechanism. The existing hugetlb_cow code will do > the right thing for PRIVATE mappings as it checks map_count. However, > for SHARED mappings it would actually allocate and install a COW page. > Modify the code to not call hugetlb_cow for SHARED mappings and just > update the pte. > > Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> > --- > mm/hugetlb.c | 23 ++++++++++++++++------- > 1 file changed, 16 insertions(+), 7 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 47f3123afd1a..b561b6867ec1 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -4584,8 +4584,10 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, > * spinlock. For private mappings, we also lookup the pagecache > * page now as it is used to determine if a reservation has been > * consumed. > + * Only non-shared mappings are sent to hugetlb_cow. > */ > - if ((flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry)) { > + if ((flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry) && > + !(vma->vm_flags & VM_SHARED)) { > if (vma_needs_reservation(h, vma, haddr) < 0) { > ret = VM_FAULT_OOM; > goto out_mutex; > @@ -4593,9 +4595,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, > /* Just decrements count, does not deallocate */ > vma_end_reservation(h, vma, haddr); > > - if (!(vma->vm_flags & VM_MAYSHARE)) > - pagecache_page = hugetlbfs_pagecache_page(h, > - vma, haddr); > + pagecache_page = hugetlbfs_pagecache_page(h, vma, haddr); Pure question: I see that the check actually changed from VM_MAYSHARE into VM_SHARE, then I noticed I'm actually unclear on the difference.. Say, when VM_MAYSHARE is set, could VM_SHARED be cleared in any case? Or say, is this change intended? I see that vma_set_page_prot() tried to remove VM_SHARED if soft dirty enabled (which should cause vma_wants_writenotify() to return true, iiuc), however that's temporary just to calculate vm_page_prot, and it's not applied to the vma->vm_flags. I failed to find a place where VM_SHARED of the vma is cleared while VM_MAYSHARE is set.. > } > > ptl = huge_pte_lock(h, mm, ptep); > @@ -4620,9 +4620,18 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, > > if (flags & FAULT_FLAG_WRITE) { > if (!huge_pte_write(entry)) { > - ret = hugetlb_cow(mm, vma, address, ptep, > - pagecache_page, ptl); > - goto out_put_page; > + if (!(vma->vm_flags & VM_SHARED)) { > + ret = hugetlb_cow(mm, vma, address, ptep, > + pagecache_page, ptl); > + goto out_put_page; > + } > + > + /* write protected for soft dirty processing */ > + if ((vma->vm_flags & VM_WRITE) && This VM_WRITE check seems to be redundant. As example, do_user_addr_fault() of x86 code will check this right after vma lookup by access_error(). So when reach here if "flags & FAULT_FLAG_WRITE", then VM_WRITE must be set, imho. > + (vma->vm_flags & VM_SHARED)) > + entry = huge_pte_mkwrite(entry); Same question to VM_SHARED, since "(vma->vm_flags & VM_SHARED)" is just checked above and we'll go hugetlb_cow() otherwise. > + > + entry = huge_pte_mkdirty(entry); There's another huge_pte_mkdirty() right below; likely we could merge them somehow? Thanks, > } > entry = huge_pte_mkdirty(entry); > } > -- > 2.29.2 >
On 2/17/21 11:32 AM, Peter Xu wrote: > On Wed, Feb 10, 2021 at 04:03:19PM -0800, Mike Kravetz wrote: >> hugetlb fault processing code would COW all write faults where the >> pte was not writable. Soft dirty will write protect ptes as part >> of it's tracking mechanism. The existing hugetlb_cow code will do >> the right thing for PRIVATE mappings as it checks map_count. However, >> for SHARED mappings it would actually allocate and install a COW page. >> Modify the code to not call hugetlb_cow for SHARED mappings and just >> update the pte. >> >> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> >> --- >> mm/hugetlb.c | 23 ++++++++++++++++------- >> 1 file changed, 16 insertions(+), 7 deletions(-) >> >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >> index 47f3123afd1a..b561b6867ec1 100644 >> --- a/mm/hugetlb.c >> +++ b/mm/hugetlb.c >> @@ -4584,8 +4584,10 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, >> * spinlock. For private mappings, we also lookup the pagecache >> * page now as it is used to determine if a reservation has been >> * consumed. >> + * Only non-shared mappings are sent to hugetlb_cow. >> */ >> - if ((flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry)) { >> + if ((flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry) && >> + !(vma->vm_flags & VM_SHARED)) { >> if (vma_needs_reservation(h, vma, haddr) < 0) { >> ret = VM_FAULT_OOM; >> goto out_mutex; >> @@ -4593,9 +4595,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, >> /* Just decrements count, does not deallocate */ >> vma_end_reservation(h, vma, haddr); >> >> - if (!(vma->vm_flags & VM_MAYSHARE)) >> - pagecache_page = hugetlbfs_pagecache_page(h, >> - vma, haddr); >> + pagecache_page = hugetlbfs_pagecache_page(h, vma, haddr); > > Pure question: I see that the check actually changed from VM_MAYSHARE into > VM_SHARE, then I noticed I'm actually unclear on the difference.. Say, when > VM_MAYSHARE is set, could VM_SHARED be cleared in any case? Or say, is this > change intended? The change was not intended. I will use VM_MAYSHARE. > > I see that vma_set_page_prot() tried to remove VM_SHARED if soft dirty enabled > (which should cause vma_wants_writenotify() to return true, iiuc), however > that's temporary just to calculate vm_page_prot, and it's not applied to the > vma->vm_flags. I failed to find a place where VM_SHARED of the vma is cleared > while VM_MAYSHARE is set.. I am not 100% sure about differences. Here is a snippet from do_mmap() where you can have VM_MAYSHARE and not VM_SHARED vm_flags |= VM_SHARED | VM_MAYSHARE; if (!(file->f_mode & FMODE_WRITE)) vm_flags &= ~(VM_MAYWRITE | VM_SHARED); fallthrough; > >> } >> >> ptl = huge_pte_lock(h, mm, ptep); >> @@ -4620,9 +4620,18 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, >> >> if (flags & FAULT_FLAG_WRITE) { >> if (!huge_pte_write(entry)) { >> - ret = hugetlb_cow(mm, vma, address, ptep, >> - pagecache_page, ptl); >> - goto out_put_page; >> + if (!(vma->vm_flags & VM_SHARED)) { >> + ret = hugetlb_cow(mm, vma, address, ptep, >> + pagecache_page, ptl); >> + goto out_put_page; >> + } >> + >> + /* write protected for soft dirty processing */ >> + if ((vma->vm_flags & VM_WRITE) && > > This VM_WRITE check seems to be redundant. As example, do_user_addr_fault() of > x86 code will check this right after vma lookup by access_error(). So when > reach here if "flags & FAULT_FLAG_WRITE", then VM_WRITE must be set, imho. Thanks, that sounds reasonable. I will check to make sure and drop the redundant check. > >> + (vma->vm_flags & VM_SHARED)) >> + entry = huge_pte_mkwrite(entry); > > Same question to VM_SHARED, since "(vma->vm_flags & VM_SHARED)" is just checked > above and we'll go hugetlb_cow() otherwise. Yes, certainly redundant here. > >> + >> + entry = huge_pte_mkdirty(entry); > > There's another huge_pte_mkdirty() right below; likely we could merge them somehow? > Yes, Thanks for taking a look!
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 47f3123afd1a..b561b6867ec1 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4584,8 +4584,10 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, * spinlock. For private mappings, we also lookup the pagecache * page now as it is used to determine if a reservation has been * consumed. + * Only non-shared mappings are sent to hugetlb_cow. */ - if ((flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry)) { + if ((flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry) && + !(vma->vm_flags & VM_SHARED)) { if (vma_needs_reservation(h, vma, haddr) < 0) { ret = VM_FAULT_OOM; goto out_mutex; @@ -4593,9 +4595,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, /* Just decrements count, does not deallocate */ vma_end_reservation(h, vma, haddr); - if (!(vma->vm_flags & VM_MAYSHARE)) - pagecache_page = hugetlbfs_pagecache_page(h, - vma, haddr); + pagecache_page = hugetlbfs_pagecache_page(h, vma, haddr); } ptl = huge_pte_lock(h, mm, ptep); @@ -4620,9 +4620,18 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, if (flags & FAULT_FLAG_WRITE) { if (!huge_pte_write(entry)) { - ret = hugetlb_cow(mm, vma, address, ptep, - pagecache_page, ptl); - goto out_put_page; + if (!(vma->vm_flags & VM_SHARED)) { + ret = hugetlb_cow(mm, vma, address, ptep, + pagecache_page, ptl); + goto out_put_page; + } + + /* write protected for soft dirty processing */ + if ((vma->vm_flags & VM_WRITE) && + (vma->vm_flags & VM_SHARED)) + entry = huge_pte_mkwrite(entry); + + entry = huge_pte_mkdirty(entry); } entry = huge_pte_mkdirty(entry); }
hugetlb fault processing code would COW all write faults where the pte was not writable. Soft dirty will write protect ptes as part of it's tracking mechanism. The existing hugetlb_cow code will do the right thing for PRIVATE mappings as it checks map_count. However, for SHARED mappings it would actually allocate and install a COW page. Modify the code to not call hugetlb_cow for SHARED mappings and just update the pte. Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> --- mm/hugetlb.c | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-)