
[0/7] Add folio_mk_pte() and simplify mk_pte()

Message ID 20250217190836.435039-1-willy@infradead.org (mailing list archive)

Message

Matthew Wilcox (Oracle) Feb. 17, 2025, 7:08 p.m. UTC
The intent is to add folio_mk_pte() to remove the conversion from folio
to page necessary to call mk_pte().  Eventually we might end up removing
mk_pte(), but that's not what's being proposed today.

I didn't want to add folio_mk_pte() to each architecture, and I didn't
want to lose any optimisations that architectures have from their own
implementation of mk_pte().  Fortunately, most architectures have by
now turned their mk_pte() into a fairly bland variant of pfn_pte(),
but s390 is different.

So patch 1 hoists the optimisation of calling pte_mkdirty() from s390
to generic code.  I'd appreciate some eyes on this from mm people who
understand this better than I do.  I originally had

-	if (write)
+	if (write || folio_test_dirty(folio))
		entry = maybe_mkwrite(pte_mkdirty(entry), vma);

and I think that broke COW under some circumstances that 01.org could
reproduce and I couldn't.

The various architecture maintainers might care to make sure that what
I've done is an equivalent transformation.  x86 was particularly tricky.
The build bots say it works ... at least now I've dealt with the pesky
!MMU problem.

The last patch, which actually uses folio_mk_pte(), ought to be the least
likely to have a problem, since it's equivalent to calling
mk_pte(&folio->page).
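
To make the equivalence claim concrete, here is a minimal userspace sketch.
It is not the code from the patches: every model_* name, the 16-entry
model_mem_map array and the PTE bit layout are assumptions made up for
illustration. It only assumes what the cover letter states, namely that
mk_pte() is a bland wrapper around pfn_pte() and that folio_mk_pte() must
give the same result as mk_pte(&folio->page).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel types; the layout is illustrative only. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_NR_PAGES   16

typedef struct { uint64_t val; } model_pte_t;
typedef struct { uint64_t prot; } model_pgprot_t;

struct model_page { unsigned long flags; };
/* As in the kernel, the folio overlays its first (head) page. */
struct model_folio { struct model_page page; };

static struct model_page model_mem_map[MODEL_NR_PAGES];

/* The pfn is simply the index of the page in the flat model_mem_map. */
static unsigned long model_page_to_pfn(const struct model_page *page)
{
	return (unsigned long)(page - model_mem_map);
}

static unsigned long model_folio_pfn(const struct model_folio *folio)
{
	return model_page_to_pfn(&folio->page);
}

/* Combine a frame number and protection bits into a PTE value. */
static model_pte_t model_pfn_pte(unsigned long pfn, model_pgprot_t prot)
{
	model_pte_t pte = { ((uint64_t)pfn << MODEL_PAGE_SHIFT) | prot.prot };
	return pte;
}

/* The "bland" mk_pte(): nothing but pfn_pte() on the page's pfn. */
static model_pte_t model_mk_pte(const struct model_page *page,
				model_pgprot_t prot)
{
	return model_pfn_pte(model_page_to_pfn(page), prot);
}

/* folio_mk_pte(): the same, without the folio-to-page conversion. */
static model_pte_t model_folio_mk_pte(const struct model_folio *folio,
				      model_pgprot_t prot)
{
	return model_pfn_pte(model_folio_pfn(folio), prot);
}

/* Returns 1 if both helpers build the same PTE for the folio at idx. */
static int model_same_pte(unsigned int idx, uint64_t protbits)
{
	assert(idx < MODEL_NR_PAGES);

	struct model_folio *folio = (struct model_folio *)&model_mem_map[idx];
	model_pgprot_t prot = { protbits };
	model_pte_t a = model_mk_pte(&folio->page, prot);
	model_pte_t b = model_folio_mk_pte(folio, prot);

	return a.val == b.val;
}
```

Calling model_same_pte(3, 0x7) compares the two construction paths for the
folio whose head page is entry 3 of the model array.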

Matthew Wilcox (Oracle) (7):
  mm: Set the pte dirty if the folio is already dirty
  mm: Introduce a common definition of mk_pte()
  sparc32: Remove custom definition of mk_pte()
  x86: Remove custom definition of mk_pte()
  um: Remove custom definition of mk_pte()
  mm: Make mk_pte() definition unconditional
  mm: Add folio_mk_pte()

 arch/alpha/include/asm/pgtable.h         |  7 -------
 arch/arc/include/asm/pgtable-levels.h    |  1 -
 arch/arm/include/asm/pgtable.h           |  1 -
 arch/arm64/include/asm/pgtable.h         |  6 ------
 arch/csky/include/asm/pgtable.h          |  5 -----
 arch/hexagon/include/asm/pgtable.h       |  3 ---
 arch/loongarch/include/asm/pgtable.h     |  6 ------
 arch/m68k/include/asm/mcf_pgtable.h      |  6 ------
 arch/m68k/include/asm/motorola_pgtable.h |  6 ------
 arch/m68k/include/asm/sun3_pgtable.h     |  6 ------
 arch/microblaze/include/asm/pgtable.h    |  8 --------
 arch/mips/include/asm/pgtable.h          |  6 ------
 arch/nios2/include/asm/pgtable.h         |  6 ------
 arch/openrisc/include/asm/pgtable.h      |  2 --
 arch/parisc/include/asm/pgtable.h        |  6 ------
 arch/powerpc/include/asm/pgtable.h       |  3 +--
 arch/riscv/include/asm/pgtable.h         |  2 --
 arch/s390/include/asm/pgtable.h          | 10 ----------
 arch/sh/include/asm/pgtable_32.h         |  8 --------
 arch/sparc/include/asm/pgtable_32.h      | 15 +++++----------
 arch/sparc/include/asm/pgtable_64.h      |  1 -
 arch/um/include/asm/pgtable-2level.h     |  1 -
 arch/um/include/asm/pgtable-4level.h     |  9 ---------
 arch/um/include/asm/pgtable.h            | 18 ++++++++----------
 arch/x86/include/asm/pgtable.h           | 19 +++----------------
 arch/xtensa/include/asm/pgtable.h        |  6 ------
 include/linux/mm.h                       | 22 ++++++++++++++++++++++
 mm/memory.c                              |  8 +++++---
 mm/userfaultfd.c                         |  2 +-
 29 files changed, 45 insertions(+), 154 deletions(-)

Comments

David Hildenbrand Feb. 18, 2025, 10:29 a.m. UTC | #1
On 17.02.25 20:08, Matthew Wilcox (Oracle) wrote:
> The intent is to add folio_mk_pte() to remove the conversion from folio
> to page necessary to call mk_pte().  Eventually we might end up removing
> mk_pte(), but that's not what's being proposed today.
> 
> I didn't want to add folio_mk_pte() to each architecture, and I didn't
> want to lose any optimisations that architectures have from their own
> implementation of mk_pte().  Fortunately, most architectures have by
> now turned their mk_pte() into a fairly bland variant of pfn_pte(),
> but s390 is different.
> 
> So patch 1 hoists the optimisation of calling pte_mkdirty() from s390
> to generic code.  I'd appreciate some eyes on this from mm people who
> understand this better than I do.  I originally had
> 
> -	if (write)
> +	if (write || folio_test_dirty(folio))
> 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> 
> and I think that broke COW under some circumstances that 01.org could
> reproduce and I couldn't.

If it's an anon folio, that logic would be broken, yes (anon CoW). We do
have can_change_pte_writable(), which tells you when it is safe to upgrade
write permissions for a PTE.

Looking at can_change_pte_writable(), I don't know if filesystems with 
writenotify might have a problem when setting the PTE dirty and allowing 
for write access, just because the folio is dirty.

So I assume that it would break fs-level CoW indeed.
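
A minimal userspace sketch of why the quoted condition is a problem, under
stated assumptions: the model_* names and bit values are invented for
illustration, and model_maybe_mkwrite() only mimics the kernel helper's
behaviour of granting write access when the VMA has VM_WRITE. The scenario
is a read fault on a writable private mapping whose folio happens to be
dirty.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; real architectures differ. */
#define MODEL_PTE_WRITE 0x1u
#define MODEL_PTE_DIRTY 0x2u
#define MODEL_VM_WRITE  0x1u

struct model_vma { unsigned int vm_flags; };

static unsigned int model_pte_mkdirty(unsigned int pte)
{
	return pte | MODEL_PTE_DIRTY;
}

/* Grants write access whenever the VMA itself is writable. */
static unsigned int model_maybe_mkwrite(unsigned int pte,
					const struct model_vma *vma)
{
	if (vma->vm_flags & MODEL_VM_WRITE)
		pte |= MODEL_PTE_WRITE;
	return pte;
}

/* Existing logic: only a write fault upgrades the entry. */
static unsigned int model_entry_current(unsigned int entry, bool write,
					bool folio_dirty,
					const struct model_vma *vma)
{
	(void)folio_dirty;	/* deliberately ignored here */
	if (write)
		entry = model_maybe_mkwrite(model_pte_mkdirty(entry), vma);
	return entry;
}

/* Variant from the quoted diff: a dirty folio also triggers the upgrade. */
static unsigned int model_entry_rejected(unsigned int entry, bool write,
					 bool folio_dirty,
					 const struct model_vma *vma)
{
	if (write || folio_dirty)
		entry = model_maybe_mkwrite(model_pte_mkdirty(entry), vma);
	return entry;
}

/* A later store traps (and can trigger CoW) only if the PTE is read-only. */
static bool model_write_would_fault(unsigned int pte)
{
	return !(pte & MODEL_PTE_WRITE);
}

/* Read fault, dirty folio, writable private mapping. */
static void model_check_cow(void)
{
	struct model_vma vma = { MODEL_VM_WRITE };
	unsigned int cur = model_entry_current(0, false, true, &vma);
	unsigned int rej = model_entry_rejected(0, false, true, &vma);

	assert(model_write_would_fault(cur));	/* CoW still gets its fault */
	assert(!model_write_would_fault(rej));	/* write slips through */
}
```

With the variant, the entry is already writable after a mere read fault, so
the first store never traps and the copy is skipped. That is the upgrade
that can_change_pte_writable() is meant to gate.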