diff mbox series

[17/35] mm: Fixup places that call pte_mkwrite() directly

Message ID 20220130211838.8382-18-rick.p.edgecombe@intel.com (mailing list archive)
State New
Headers show
Series Shadow stacks for userspace | expand

Commit Message

Rick Edgecombe Jan. 30, 2022, 9:18 p.m. UTC
From: Yu-cheng Yu <yu-cheng.yu@intel.com>

When serving a page fault, maybe_mkwrite() makes a PTE writable if it is in
a writable vma.  A shadow stack vma is writable, but its PTEs need
_PAGE_DIRTY to be set to become writable.  For this reason, maybe_mkwrite()
has been updated.

There are a few places that call pte_mkwrite() directly, but have the
same result as from maybe_mkwrite().  These sites need to be updated for
shadow stack as well.  Thus, change them to maybe_mkwrite():

- do_anonymous_page() and migrate_vma_insert_page() check VM_WRITE directly
  and call pte_mkwrite(), which is the same as maybe_mkwrite().  Change
  them to maybe_mkwrite().

- In do_numa_page(), if the numa entry was writable, then pte_mkwrite()
  is called directly.  Fix it by doing maybe_mkwrite().  Make the same
  changes to do_huge_pmd_numa_page().

- In change_pte_range(), pte_mkwrite() is called directly.  Replace it with
  maybe_mkwrite().

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Kees Cook <keescook@chromium.org>
---

Yu-cheng v25:
 - Apply same changes to do_huge_pmd_numa_page() as to do_numa_page().

 mm/huge_memory.c | 2 +-
 mm/memory.c      | 5 ++---
 mm/migrate.c     | 3 +--
 mm/mprotect.c    | 2 +-
 4 files changed, 5 insertions(+), 7 deletions(-)

Comments

Dave Hansen Feb. 9, 2022, 9:51 p.m. UTC | #1
On 1/30/22 13:18, Rick Edgecombe wrote:
> - do_anonymous_page() and migrate_vma_insert_page() check VM_WRITE directly
>   and call pte_mkwrite(), which is the same as maybe_mkwrite().  Change
>   them to maybe_mkwrite().

Those look OK.

> - In do_numa_page(), if the numa entry was writable, then pte_mkwrite()
>   is called directly.  Fix it by doing maybe_mkwrite().  Make the same
>   changes to do_huge_pmd_numa_page().

This is another "what", not "why" changelog.  This change puzzles me.

*Why* is this needed?  It sounds like pte_mkwrite() doesn't work for
shadow stack PTEs.  Let's say that explicitly.

I also this this is ab/misuse of maybe_mkwrite().

The shadow stack VMA *REQUIRES* PTEs with Dirty=1.  There's no *maybe*
about it.  The rest of this is essentially a hack to get
VM_SHADOW_STACK-required bits into the PTE.  We have a place where we
store those VMA-required bits: vma->vm_page_prot.  Look at how we store
the pkey bits in there for instance.

Let's say we set _PAGE_DIRTY in vma->vm_page_prot.  We'd come into
do_anonymous_page() for instance and do this:

>         entry = mk_pte(page, vma->vm_page_prot); <--- PTE is Write=0,Dirty=1 Yay!
>         entry = pte_sw_mkyoung(entry);
>         if (vma->vm_flags & VM_WRITE) <--- False, skip the pte_mkwrite()
>                 entry = pte_mkwrite(pte_mkdirty(entry));

In other words, it "just works" because shadow stack VMAs don't have
VM_WRITE set.

I think the other VM_WRITE checks would be fine too, although I'm unsure
about the change_page_attr() one.
diff mbox series

Patch

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2adedcfca00b..3588e9fefbe0 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1489,7 +1489,7 @@  vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
 	pmd = pmd_modify(oldpmd, vma->vm_page_prot);
 	pmd = pmd_mkyoung(pmd);
 	if (was_writable)
-		pmd = pmd_mkwrite(pmd);
+		pmd = maybe_pmd_mkwrite(pmd, vma);
 	set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd);
 	update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
 	spin_unlock(vmf->ptl);
diff --git a/mm/memory.c b/mm/memory.c
index c125c4969913..c79444603d5d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3793,8 +3793,7 @@  static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 
 	entry = mk_pte(page, vma->vm_page_prot);
 	entry = pte_sw_mkyoung(entry);
-	if (vma->vm_flags & VM_WRITE)
-		entry = pte_mkwrite(pte_mkdirty(entry));
+	entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
 			&vmf->ptl);
@@ -4428,7 +4427,7 @@  static vm_fault_t do_numa_page(struct vm_fault *vmf)
 	pte = pte_modify(old_pte, vma->vm_page_prot);
 	pte = pte_mkyoung(pte);
 	if (was_writable)
-		pte = pte_mkwrite(pte);
+		pte = maybe_mkwrite(pte, vma);
 	ptep_modify_prot_commit(vma, vmf->address, vmf->pte, old_pte, pte);
 	update_mmu_cache(vma, vmf->address, vmf->pte);
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
diff --git a/mm/migrate.c b/mm/migrate.c
index c7da064b4781..438f1e21b9c7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2697,8 +2697,7 @@  static void migrate_vma_insert_page(struct migrate_vma *migrate,
 		}
 	} else {
 		entry = mk_pte(page, vma->vm_page_prot);
-		if (vma->vm_flags & VM_WRITE)
-			entry = pte_mkwrite(pte_mkdirty(entry));
+		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 	}
 
 	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 0138dfcdb1d8..b0012c13a00e 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -135,7 +135,7 @@  static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 			if (dirty_accountable && pte_dirty(ptent) &&
 					(pte_soft_dirty(ptent) ||
 					 !(vma->vm_flags & VM_SOFTDIRTY))) {
-				ptent = pte_mkwrite(ptent);
+				ptent = maybe_mkwrite(ptent, vma);
 			}
 			ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
 			pages++;