Message ID | 20250217190836.435039-2-willy@infradead.org
---|---
State | New
Series | Add folio_mk_pte() and simplify mk_pte()
On 17.02.25 20:08, Matthew Wilcox (Oracle) wrote:
> If the first access to a folio is a read that is then followed by a
> write, we can save a page fault. s390 implemented this in their
> mk_pte() in commit abf09bed3cce ("s390/mm: implement software dirty
> bits"), but other architectures can also benefit from this.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  arch/s390/include/asm/pgtable.h | 7 +------
>  mm/memory.c                     | 2 ++
>  2 files changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
> index 3ca5af4cfe43..3ee495b5171e 100644
> --- a/arch/s390/include/asm/pgtable.h
> +++ b/arch/s390/include/asm/pgtable.h
> @@ -1451,12 +1451,7 @@ static inline pte_t mk_pte_phys(unsigned long physpage, pgprot_t pgprot)
>  
>  static inline pte_t mk_pte(struct page *page, pgprot_t pgprot)
>  {
> -	unsigned long physpage = page_to_phys(page);
> -	pte_t __pte = mk_pte_phys(physpage, pgprot);
> -
> -	if (pte_write(__pte) && PageDirty(page))
> -		__pte = pte_mkdirty(__pte);
> -	return __pte;
> +	return mk_pte_phys(page_to_phys(page), pgprot);
>  }
>  
>  #define pgd_index(address) (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD-1))
> diff --git a/mm/memory.c b/mm/memory.c
> index 539c0f7c6d54..4330560eee55 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5124,6 +5124,8 @@ void set_pte_range(struct vm_fault *vmf, struct folio *folio,
>  
>  	if (write)
>  		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> +	else if (pte_write(entry) && folio_test_dirty(folio))
> +		entry = pte_mkdirty(entry);
>  	if (unlikely(vmf_orig_pte_uffd_wp(vmf)))
>  		entry = pte_mkuffd_wp(entry);
>  	/* copy-on-write page */

Yes, that looks sane

Acked-by: David Hildenbrand <david@redhat.com>
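(For context when reading this archive: the hunk below is the heart of the change, re-quoted with an explanatory gloss added as comments; the comments are annotation, not part of the patch itself.)

	/*
	 * Read fault on a shared, writable mapping whose pagecache folio
	 * is already dirty (e.g. from an earlier write()): the old generic
	 * code installed a clean PTE, so the first store took a second
	 * minor fault purely to set the dirty bit.  Pre-dirtying the PTE
	 * up front, as s390 already did in its mk_pte(), avoids that fault.
	 */
	if (write)
		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
	else if (pte_write(entry) && folio_test_dirty(folio))
		/* Writable PTE, already-dirty folio: dirty the PTE now. */
		entry = pte_mkdirty(entry);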
On Mon, Feb 17, 2025 at 07:08:28PM +0000, Matthew Wilcox (Oracle) wrote:

Hi Matthew,

> If the first access to a folio is a read that is then followed by a
> write, we can save a page fault. s390 implemented this in their
> mk_pte() in commit abf09bed3cce ("s390/mm: implement software dirty
> bits"), but other architectures can also benefit from this.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  arch/s390/include/asm/pgtable.h | 7 +------
>  mm/memory.c                     | 2 ++
>  2 files changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
> index 3ca5af4cfe43..3ee495b5171e 100644
> --- a/arch/s390/include/asm/pgtable.h
> +++ b/arch/s390/include/asm/pgtable.h
> @@ -1451,12 +1451,7 @@ static inline pte_t mk_pte_phys(unsigned long physpage, pgprot_t pgprot)
>  
>  static inline pte_t mk_pte(struct page *page, pgprot_t pgprot)
>  {
> -	unsigned long physpage = page_to_phys(page);
> -	pte_t __pte = mk_pte_phys(physpage, pgprot);
> -
> -	if (pte_write(__pte) && PageDirty(page))
> -		__pte = pte_mkdirty(__pte);
> -	return __pte;
> +	return mk_pte_phys(page_to_phys(page), pgprot);
>  }

With the above the implicit dirtifying of hugetlb PTEs (as result of
mk_huge_pte() -> mk_pte()) in make_huge_pte() is removed:

static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
			   bool try_mkwrite)
{
	...
	if (try_mkwrite && (vma->vm_flags & VM_WRITE)) {
		entry = huge_pte_mkwrite(huge_pte_mkdirty(mk_huge_pte(page,
					 vma->vm_page_prot)));
	} else {
		entry = huge_pte_wrprotect(mk_huge_pte(page,
					   vma->vm_page_prot));
	}
	...
}

What is your take on this?

> #define pgd_index(address) (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD-1))
> diff --git a/mm/memory.c b/mm/memory.c
> index 539c0f7c6d54..4330560eee55 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5124,6 +5124,8 @@ void set_pte_range(struct vm_fault *vmf, struct folio *folio,
>  
>  	if (write)
>  		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> +	else if (pte_write(entry) && folio_test_dirty(folio))
> +		entry = pte_mkdirty(entry);
>  	if (unlikely(vmf_orig_pte_uffd_wp(vmf)))
>  		entry = pte_mkuffd_wp(entry);
>  	/* copy-on-write page */

Thanks!
On Tue, Feb 18, 2025 at 05:20:28PM +0100, Alexander Gordeev wrote:
> > +++ b/arch/s390/include/asm/pgtable.h
> > @@ -1451,12 +1451,7 @@ static inline pte_t mk_pte_phys(unsigned long physpage, pgprot_t pgprot)
> >  
> >  static inline pte_t mk_pte(struct page *page, pgprot_t pgprot)
> >  {
> > -	unsigned long physpage = page_to_phys(page);
> > -	pte_t __pte = mk_pte_phys(physpage, pgprot);
> > -
> > -	if (pte_write(__pte) && PageDirty(page))
> > -		__pte = pte_mkdirty(__pte);
> > -	return __pte;
> > +	return mk_pte_phys(page_to_phys(page), pgprot);
> >  }
> 
> With the above the implicit dirtifying of hugetlb PTEs (as result of
> mk_huge_pte() -> mk_pte()) in make_huge_pte() is removed:
> 
> static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
> 			   bool try_mkwrite)
> {
> 	...
> 	if (try_mkwrite && (vma->vm_flags & VM_WRITE)) {
> 		entry = huge_pte_mkwrite(huge_pte_mkdirty(mk_huge_pte(page,
> 					 vma->vm_page_prot)));
> 	} else {
> 		entry = huge_pte_wrprotect(mk_huge_pte(page,
> 					   vma->vm_page_prot));
> 	}

Took me a moment to spot how this was getting invoked; for anyone else
playing along, it's mk_huge_pte() which calls mk_pte().

But I'm not sure how you lose out on the PTE being marked dirty. In
the first arm that you've quoted, the pte is made dirty anyway. In the
second arm, it's being writeprotected, so marking it dirty isn't a
helpful thing to do because writing to it will cause a fault anyway?

I know s390 is a little different, so there's probably something I'm
missing.
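(For anyone tracing the chain Matthew mentions: on architectures that do not provide their own definition, mk_huge_pte() is just the asm-generic wrapper below, which is how make_huge_pte() ends up in mk_pte(). Quoted from memory from include/asm-generic/hugetlb.h; check the tree for the authoritative definition.)

static inline pte_t mk_huge_pte(struct page *page, pgprot_t pgprot)
{
	/* Generic fallback: a huge PTE is built exactly like a normal one. */
	return mk_pte(page, pgprot);
}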
On Tue, Feb 18, 2025 at 05:06:38PM +0000, Matthew Wilcox wrote:
...
> > With the above the implicit dirtifying of hugetlb PTEs (as result of
> > mk_huge_pte() -> mk_pte()) in make_huge_pte() is removed:
> > 
> > static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
> > 			   bool try_mkwrite)
> > {
> > 	...
> > 	if (try_mkwrite && (vma->vm_flags & VM_WRITE)) {
> > 		entry = huge_pte_mkwrite(huge_pte_mkdirty(mk_huge_pte(page,
> > 					 vma->vm_page_prot)));
> > 	} else {
> > 		entry = huge_pte_wrprotect(mk_huge_pte(page,
> > 					   vma->vm_page_prot));
> > 	}
> 
> Took me a moment to spot how this was getting invoked; for anyone else
> playing along, it's mk_huge_pte() which calls mk_pte().
> 
> But I'm not sure how you lose out on the PTE being marked dirty. In
> the first arm that you've quoted, the pte is made dirty anyway. In the
> second arm, it's being writeprotected, so marking it dirty isn't a
> helpful thing to do because writing to it will cause a fault anyway?
> 
> I know s390 is a little different, so there's probably something I'm
> missing.

No, it is just me missing the obvious.
On Mon, Feb 17, 2025 at 07:08:28PM +0000, Matthew Wilcox (Oracle) wrote:
> If the first access to a folio is a read that is then followed by a
> write, we can save a page fault. s390 implemented this in their
> mk_pte() in commit abf09bed3cce ("s390/mm: implement software dirty
> bits"), but other architectures can also benefit from this.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  arch/s390/include/asm/pgtable.h | 7 +------
>  mm/memory.c                     | 2 ++
>  2 files changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
> index 3ca5af4cfe43..3ee495b5171e 100644
> --- a/arch/s390/include/asm/pgtable.h
> +++ b/arch/s390/include/asm/pgtable.h
> @@ -1451,12 +1451,7 @@ static inline pte_t mk_pte_phys(unsigned long physpage, pgprot_t pgprot)
> 
>  static inline pte_t mk_pte(struct page *page, pgprot_t pgprot)
>  {
> -	unsigned long physpage = page_to_phys(page);
> -	pte_t __pte = mk_pte_phys(physpage, pgprot);
> -
> -	if (pte_write(__pte) && PageDirty(page))
> -		__pte = pte_mkdirty(__pte);
> -	return __pte;
> +	return mk_pte_phys(page_to_phys(page), pgprot);
>  }
> 
>  #define pgd_index(address) (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD-1))
> diff --git a/mm/memory.c b/mm/memory.c
> index 539c0f7c6d54..4330560eee55 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5124,6 +5124,8 @@ void set_pte_range(struct vm_fault *vmf, struct folio *folio,
> 
>  	if (write)
>  		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> +	else if (pte_write(entry) && folio_test_dirty(folio))
> +		entry = pte_mkdirty(entry);
>  	if (unlikely(vmf_orig_pte_uffd_wp(vmf)))
>  		entry = pte_mkuffd_wp(entry);
>  	/* copy-on-write page */

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>

Thanks!
> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Sorry, I meant for s390.
I cannot judge the impact on the other architectures.
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 3ca5af4cfe43..3ee495b5171e 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1451,12 +1451,7 @@ static inline pte_t mk_pte_phys(unsigned long physpage, pgprot_t pgprot)
 
 static inline pte_t mk_pte(struct page *page, pgprot_t pgprot)
 {
-	unsigned long physpage = page_to_phys(page);
-	pte_t __pte = mk_pte_phys(physpage, pgprot);
-
-	if (pte_write(__pte) && PageDirty(page))
-		__pte = pte_mkdirty(__pte);
-	return __pte;
+	return mk_pte_phys(page_to_phys(page), pgprot);
 }
 
 #define pgd_index(address) (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD-1))
diff --git a/mm/memory.c b/mm/memory.c
index 539c0f7c6d54..4330560eee55 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5124,6 +5124,8 @@ void set_pte_range(struct vm_fault *vmf, struct folio *folio,
 
 	if (write)
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+	else if (pte_write(entry) && folio_test_dirty(folio))
+		entry = pte_mkdirty(entry);
 	if (unlikely(vmf_orig_pte_uffd_wp(vmf)))
 		entry = pte_mkuffd_wp(entry);
 	/* copy-on-write page */
If the first access to a folio is a read that is then followed by a
write, we can save a page fault. s390 implemented this in their
mk_pte() in commit abf09bed3cce ("s390/mm: implement software dirty
bits"), but other architectures can also benefit from this.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 arch/s390/include/asm/pgtable.h | 7 +------
 mm/memory.c                     | 2 ++
 2 files changed, 3 insertions(+), 6 deletions(-)
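(The series title also promises folio_mk_pte(); that helper is added by a different patch in the series and is not shown here. As an illustration only, a folio-based constructor along the lines below would let callers pass the folio directly instead of a struct page; this is a sketch, not the series' actual code.)

static inline pte_t folio_mk_pte(struct folio *folio, pgprot_t pgprot)
{
	/* Sketch: build a PTE for the folio's first page. */
	return pfn_pte(folio_pfn(folio), pgprot);
}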