diff mbox series

[4/7] x86: Remove custom definition of mk_pte()

Message ID 20250217190836.435039-5-willy@infradead.org (mailing list archive)
State New
Headers show
Series Add folio_mk_pte() and simplify mk_pte() | expand

Commit Message

Matthew Wilcox Feb. 17, 2025, 7:08 p.m. UTC
Move the shadow stack check to pfn_pte() which lets us use the common
definition of mk_pte().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 arch/x86/include/asm/pgtable.h | 19 +++----------------
 1 file changed, 3 insertions(+), 16 deletions(-)

Comments

Dave Hansen Feb. 19, 2025, 8:53 p.m. UTC | #1
On 2/17/25 11:08, Matthew Wilcox (Oracle) wrote:
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index 593f10aabd45..9f480bdafd20 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -784,6 +784,9 @@ static inline pgprotval_t check_pgprot(pgprot_t pgprot)
>  static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
>  {
>  	phys_addr_t pfn = (phys_addr_t)page_nr << PAGE_SHIFT;
> +	/* This bit combination is used to mark shadow stacks */
> +	WARN_ON_ONCE((pgprot_val(pgprot) & (_PAGE_DIRTY | _PAGE_RW)) ==
> +			_PAGE_DIRTY);

Looks sane to me. Good riddance to unnecessary arch-specific code.

Acked-by: Dave Hansen <dave.hansen@linux.intel.com>

Just one note (in case anyone ever trips over that WARN_ON_ONCE()): This
is a problem with the existing code and with your patch, but the
'pgprot' going to mk_pte() or pfn_pte() probably can't come from a
hardware PTE that has the dirty bit clear.

Old, pre-shadow-stack hardware could race between:

	1. Software transitioning _PAGE_RW 1=>0
	2. The CPU page walker trying to set
	   _PAGE_DIRTY in response to a write

and end up with a Write=0,Dirty=1 PTE.

That doesn't happen for kernel memory because most or all of the PTEs
that the kernel establishes have _PAGE_DIRTY=1 so the page walker never
tries to set _PAGE_DIRTY. It's also generally some kind of a bug if one
CPU is in the kernel trying to write to a page while another is making
the page read-only.

Anyway, I can basically see cases where this warning might trip, but
it's very likely on older hardware when the kernel is doing something
else silly.
diff mbox series

Patch

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 593f10aabd45..9f480bdafd20 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -784,6 +784,9 @@  static inline pgprotval_t check_pgprot(pgprot_t pgprot)
 static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
 {
 	phys_addr_t pfn = (phys_addr_t)page_nr << PAGE_SHIFT;
+	/* This bit combination is used to mark shadow stacks */
+	WARN_ON_ONCE((pgprot_val(pgprot) & (_PAGE_DIRTY | _PAGE_RW)) ==
+			_PAGE_DIRTY);
 	pfn ^= protnone_mask(pgprot_val(pgprot));
 	pfn &= PTE_PFN_MASK;
 	return __pte(pfn | check_pgprot(pgprot));
@@ -1080,22 +1083,6 @@  static inline unsigned long pmd_page_vaddr(pmd_t pmd)
  */
 #define pmd_page(pmd)	pfn_to_page(pmd_pfn(pmd))
 
-/*
- * Conversion functions: convert a page and protection to a page entry,
- * and a page entry and page directory to the page they refer to.
- *
- * (Currently stuck as a macro because of indirect forward reference
- * to linux/mm.h:page_to_nid())
- */
-#define mk_pte(page, pgprot)						  \
-({									  \
-	pgprot_t __pgprot = pgprot;					  \
-									  \
-	WARN_ON_ONCE((pgprot_val(__pgprot) & (_PAGE_DIRTY | _PAGE_RW)) == \
-		    _PAGE_DIRTY);					  \
-	pfn_pte(page_to_pfn(page), __pgprot);				  \
-})
-
 static inline int pmd_bad(pmd_t pmd)
 {
 	return (pmd_flags(pmd) & ~(_PAGE_USER | _PAGE_ACCESSED)) !=