Message ID | 20220929222936.14584-18-rick.p.edgecombe@intel.com (mailing list archive) |
---|---|
State | New |
Series | Shadowstacks for userspace |
On Thu, Sep 29, 2022 at 03:29:14PM -0700, Rick Edgecombe wrote:
> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
>
> With the introduction of shadow stack memory there are two ways a pte can
> be writable: regular writable memory and shadow stack memory.
>
> In past patches, maybe_mkwrite() has been updated to apply pte_mkwrite()
> or pte_mkwrite_shstk() depending on the VMA flag. This covers most cases
> where a PTE is made writable. However, there are places where pte_mkwrite()
> is called directly and the logic should now also create a shadow stack PTE
> in the case of a shadow stack VMA.
>
> - do_anonymous_page() and migrate_vma_insert_page() check VM_WRITE
>   directly and call pte_mkwrite(), which is the same as maybe_mkwrite()
>   in logic and intention. Just change them to maybe_mkwrite().
>
> - When userfaultfd is creating a PTE after userspace handles the fault
>   it calls pte_mkwrite() directly. Teach it about pte_mkwrite_shstk()
>
> In other cases where pte_mkwrite() is called directly, the VMA will not
> be VM_SHADOW_STACK, and so shadow stack memory should not be created.
>  - In the case of pte_savedwrite(), shadow stack VMA's are excluded.
>  - In the case of the "dirty_accountable" optimization in mprotect(),
>    shadow stack VMA's won't be VM_SHARED, so it is not necessary.
>
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>

Reviewed-by: Kees Cook <keescook@chromium.org>
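The dispatch that the commit message attributes to maybe_mkwrite() can be sketched as a small userspace model. This is only an illustration, not the kernel implementation: the `SK_` flag values, `struct sk_vma`, and `enum sk_pte_kind` are hypothetical names, and the sketch assumes that a shadow stack VMA also carries the write flag, mirroring the nesting of the userfaultfd hunk in this patch.

```c
#include <assert.h>	/* only needed for checks like those in the note below */

/* Hypothetical flag values for a toy model; not the kernel's definitions. */
enum { SK_VM_WRITE = 1 << 0, SK_VM_SHADOW_STACK = 1 << 1 };

/* Hypothetical labels for what happens to the PTE. */
enum sk_pte_kind {
	SK_PTE_UNCHANGED,	/* VMA is not writable: PTE left as it was */
	SK_PTE_WRITABLE,	/* stands in for pte_mkwrite() */
	SK_PTE_SHADOW_STACK,	/* stands in for pte_mkwrite_shstk() */
};

struct sk_vma {
	unsigned int vm_flags;
};

/* Model of the two-way choice: the VMA flag selects the kind of "writable". */
static enum sk_pte_kind sk_maybe_mkwrite(const struct sk_vma *vma)
{
	if (!(vma->vm_flags & SK_VM_WRITE))
		return SK_PTE_UNCHANGED;
	if (vma->vm_flags & SK_VM_SHADOW_STACK)
		return SK_PTE_SHADOW_STACK;
	return SK_PTE_WRITABLE;
}
```

In this model a VMA with only `SK_VM_WRITE` yields `SK_PTE_WRITABLE`, one with both flags yields `SK_PTE_SHADOW_STACK`, and one with neither leaves the PTE unchanged.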
On Thu, Sep 29, 2022 at 03:29:14PM -0700, Rick Edgecombe wrote:
> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
>
> With the introduction of shadow stack memory there are two ways a pte can
> be writable: regular writable memory and shadow stack memory.
>
> In past patches, maybe_mkwrite() has been updated to apply pte_mkwrite()
> or pte_mkwrite_shstk() depending on the VMA flag. This covers most cases
> where a PTE is made writable. However, there are places where pte_mkwrite()
> is called directly and the logic should now also create a shadow stack PTE
> in the case of a shadow stack VMA.
>
> - do_anonymous_page() and migrate_vma_insert_page() check VM_WRITE
>   directly and call pte_mkwrite(), which is the same as maybe_mkwrite()
>   in logic and intention. Just change them to maybe_mkwrite().

Looks like you folded change for do_anonymous_page() into the wrong patch.
I see the relevant change in the previous patch.
Hopefully I will not waste your time again… If it has been discussed in the
last 26 iterations, just tell me and ignore.

On Sep 29, 2022, at 3:29 PM, Rick Edgecombe <rick.p.edgecombe@intel.com> wrote:

> --- a/mm/migrate_device.c
> +++ b/mm/migrate_device.c
> @@ -606,8 +606,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
>  			goto abort;
>  		}
>  		entry = mk_pte(page, vma->vm_page_prot);
> -		if (vma->vm_flags & VM_WRITE)
> -			entry = pte_mkwrite(pte_mkdirty(entry));
> +		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
>  	}

This is not exactly the same logic. You might dirty read-only pages since you
call pte_mkdirty() unconditionally. It has been known not to be very robust
(e.g., dirty-COW and friends). Perhaps it is not dangerous following some
recent enhancements, but why do you want to take the risk?

Instead, although it might seem redundant, the compiler will hopefully make
it efficient:

	if (vma->vm_flags & VM_WRITE) {
		entry = pte_mkdirty(entry);
		entry = maybe_mkwrite(entry, vma);
	}
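The difference between the two forms can be checked with a small userspace model. This is a sketch under stated assumptions, not kernel code: the `TOY_` flag values, `struct toy_vma`, `toy_pte`, and all `toy_*` and `install_*` helpers are hypothetical stand-ins, and the shadow stack case is left out so that only the dirty-bit behaviour is visible.

```c
#include <assert.h>	/* only needed for checks like those in the note below */

/* Hypothetical flag values for a toy model; not the kernel's definitions. */
enum { TOY_VM_WRITE = 1 << 0 };
enum { TOY_PTE_WRITE = 1 << 0, TOY_PTE_DIRTY = 1 << 1 };

struct toy_vma {
	unsigned int vm_flags;
};
typedef unsigned int toy_pte;

static toy_pte toy_mkdirty(toy_pte pte) { return pte | TOY_PTE_DIRTY; }
static toy_pte toy_mkwrite(toy_pte pte) { return pte | TOY_PTE_WRITE; }

/* Models maybe_mkwrite(): sets the write bit only for a writable VMA. */
static toy_pte toy_maybe_mkwrite(toy_pte pte, const struct toy_vma *vma)
{
	if (vma->vm_flags & TOY_VM_WRITE)
		pte = toy_mkwrite(pte);
	return pte;
}

/* Form from the patch: the dirty bit is set before the VMA is consulted. */
static toy_pte install_unconditional_dirty(toy_pte entry,
					   const struct toy_vma *vma)
{
	return toy_maybe_mkwrite(toy_mkdirty(entry), vma);
}

/* Form suggested in review: dirty and write are both gated on the VMA flag. */
static toy_pte install_guarded_dirty(toy_pte entry, const struct toy_vma *vma)
{
	if (vma->vm_flags & TOY_VM_WRITE) {
		entry = toy_mkdirty(entry);
		entry = toy_maybe_mkwrite(entry, vma);
	}
	return entry;
}
```

For a VMA without the write flag, `install_unconditional_dirty(0, &vma)` returns an entry with the dirty bit set, while `install_guarded_dirty(0, &vma)` returns 0. For a writable VMA both return the same bits.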
On Tue, 2022-10-04 at 02:56 +0300, Kirill A. Shutemov wrote:
> On Thu, Sep 29, 2022 at 03:29:14PM -0700, Rick Edgecombe wrote:
> > From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> >
> > With the introduction of shadow stack memory there are two ways a pte can
> > be writable: regular writable memory and shadow stack memory.
> >
> > In past patches, maybe_mkwrite() has been updated to apply pte_mkwrite()
> > or pte_mkwrite_shstk() depending on the VMA flag. This covers most cases
> > where a PTE is made writable. However, there are places where pte_mkwrite()
> > is called directly and the logic should now also create a shadow stack PTE
> > in the case of a shadow stack VMA.
> >
> > - do_anonymous_page() and migrate_vma_insert_page() check VM_WRITE
> >   directly and call pte_mkwrite(), which is the same as maybe_mkwrite()
> >   in logic and intention. Just change them to maybe_mkwrite().
>
> Looks like you folded change for do_anonymous_page() into the wrong patch.
> I see the relevant change in the previous patch.

Arg, yep thanks. It got moved accidentally.
On Mon, 2022-10-03 at 18:56 -0700, Nadav Amit wrote:
> Hopefully I will not waste your time again… If it has been discussed in the
> last 26 iterations, just tell me and ignore.
>
> On Sep 29, 2022, at 3:29 PM, Rick Edgecombe <rick.p.edgecombe@intel.com> wrote:
>
> > --- a/mm/migrate_device.c
> > +++ b/mm/migrate_device.c
> > @@ -606,8 +606,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
> >  			goto abort;
> >  		}
> >  		entry = mk_pte(page, vma->vm_page_prot);
> > -		if (vma->vm_flags & VM_WRITE)
> > -			entry = pte_mkwrite(pte_mkdirty(entry));
> > +		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> >  	}
>
> This is not exactly the same logic. You might dirty read-only pages since you
> call pte_mkdirty() unconditionally. It has been known not to be very robust
> (e.g., dirty-COW and friends). Perhaps it is not dangerous following some
> recent enhancements, but why do you want to take the risk?

Yea those changes let me drop a patch. But, it's a good point.

> Instead, although it might seem redundant, the compiler will hopefully make
> it efficient:
>
> 	if (vma->vm_flags & VM_WRITE) {
> 		entry = pte_mkdirty(entry);
> 		entry = maybe_mkwrite(entry, vma);
> 	}

Thanks Nadav. I think you're right, it should have the open coded logic here
and in the do_anonymous_page() chunk that got moved to the previous patch by
accident.
On Thu, Sep 29, 2022 at 03:29:14PM -0700, Rick Edgecombe wrote:
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index 7327b2573f7c..b49372c7de41 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -63,6 +63,7 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
>  	int ret;
>  	pte_t _dst_pte, *dst_pte;
>  	bool writable = dst_vma->vm_flags & VM_WRITE;
> +	bool shstk = dst_vma->vm_flags & VM_SHADOW_STACK;
>  	bool vm_shared = dst_vma->vm_flags & VM_SHARED;
>  	bool page_in_cache = page->mapping;
>  	spinlock_t *ptl;
> @@ -83,9 +84,12 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
>  			writable = false;
>  	}
>
> -	if (writable)
> -		_dst_pte = pte_mkwrite(_dst_pte);
> -	else
> +	if (writable) {
> +		if (shstk)
> +			_dst_pte = pte_mkwrite_shstk(_dst_pte);
> +		else
> +			_dst_pte = pte_mkwrite(_dst_pte);
> +	} else
>  		/*
>  		 * We need this to make sure write bit removed; as mk_pte()
>  		 * could return a pte with write bit set.

Urgh.. that's unfortunate. But yeah, I don't see a way to make that pretty
either.
On Fri, 2022-10-14 at 17:52 +0200, Peter Zijlstra wrote:
> On Thu, Sep 29, 2022 at 03:29:14PM -0700, Rick Edgecombe wrote:
> > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> > index 7327b2573f7c..b49372c7de41 100644
> > --- a/mm/userfaultfd.c
> > +++ b/mm/userfaultfd.c
> > @@ -63,6 +63,7 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
> >  	int ret;
> >  	pte_t _dst_pte, *dst_pte;
> >  	bool writable = dst_vma->vm_flags & VM_WRITE;
> > +	bool shstk = dst_vma->vm_flags & VM_SHADOW_STACK;
> >  	bool vm_shared = dst_vma->vm_flags & VM_SHARED;
> >  	bool page_in_cache = page->mapping;
> >  	spinlock_t *ptl;
> > @@ -83,9 +84,12 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
> >  			writable = false;
> >  	}
> >
> > -	if (writable)
> > -		_dst_pte = pte_mkwrite(_dst_pte);
> > -	else
> > +	if (writable) {
> > +		if (shstk)
> > +			_dst_pte = pte_mkwrite_shstk(_dst_pte);
> > +		else
> > +			_dst_pte = pte_mkwrite(_dst_pte);
> > +	} else
> >  		/*
> >  		 * We need this to make sure write bit removed; as mk_pte()
> >  		 * could return a pte with write bit set.
>
> Urgh.. that's unfortunate. But yeah, I don't see a way to make that
> pretty either.

Nadav pointed out that:

	entry = maybe_mkwrite(pte_mkdirty(entry), vma);

and:

	if (vma->vm_flags & VM_WRITE)
		entry = pte_mkwrite(pte_mkdirty(entry));

are not actually the same, because in the former the non-writable PTE gets
marked dirty. So I was actually going to add two more cases like the ugly
case.
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 27fb37d65476..eba3164736b3 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -606,8 +606,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
 			goto abort;
 		}
 		entry = mk_pte(page, vma->vm_page_prot);
-		if (vma->vm_flags & VM_WRITE)
-			entry = pte_mkwrite(pte_mkdirty(entry));
+		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 	}
 
 	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 7327b2573f7c..b49372c7de41 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -63,6 +63,7 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 	int ret;
 	pte_t _dst_pte, *dst_pte;
 	bool writable = dst_vma->vm_flags & VM_WRITE;
+	bool shstk = dst_vma->vm_flags & VM_SHADOW_STACK;
 	bool vm_shared = dst_vma->vm_flags & VM_SHARED;
 	bool page_in_cache = page->mapping;
 	spinlock_t *ptl;
@@ -83,9 +84,12 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 			writable = false;
 	}
 
-	if (writable)
-		_dst_pte = pte_mkwrite(_dst_pte);
-	else
+	if (writable) {
+		if (shstk)
+			_dst_pte = pte_mkwrite_shstk(_dst_pte);
+		else
+			_dst_pte = pte_mkwrite(_dst_pte);
+	} else
 		/*
 		 * We need this to make sure write bit removed; as mk_pte()
 		 * could return a pte with write bit set.
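
The userfaultfd hunk ends up with three outcomes rather than two, because the local `writable` flag can be cleared before the branch (the `writable = false;` context line). A minimal userspace sketch of that decision follows. It is an illustration only: the `UF_` flag values, `enum uf_pte_kind`, `uf_install_kind()`, and the `forced_readonly` parameter are hypothetical, with `forced_readonly` standing in for whatever condition clears `writable` in the real function.

```c
#include <assert.h>	/* only needed for checks like those in the note below */
#include <stdbool.h>

/* Hypothetical flag values for a toy model; not the kernel's definitions. */
enum { UF_VM_WRITE = 1 << 0, UF_VM_SHADOW_STACK = 1 << 1 };

/* Hypothetical labels for the PTE that ends up installed. */
enum uf_pte_kind {
	UF_PTE_WRPROTECTED,	/* else branch: write bit removed */
	UF_PTE_WRITABLE,	/* stands in for pte_mkwrite() */
	UF_PTE_SHADOW_STACK,	/* stands in for pte_mkwrite_shstk() */
};

static enum uf_pte_kind uf_install_kind(unsigned int vm_flags,
					bool forced_readonly)
{
	bool writable = vm_flags & UF_VM_WRITE;
	bool shstk = vm_flags & UF_VM_SHADOW_STACK;

	/* Models the earlier override seen in the hunk's context lines. */
	if (forced_readonly)
		writable = false;

	if (writable) {
		if (shstk)
			return UF_PTE_SHADOW_STACK;
		return UF_PTE_WRITABLE;
	}
	return UF_PTE_WRPROTECTED;
}
```

In this model the shadow stack flag only matters while `writable` is still true: `uf_install_kind(UF_VM_WRITE | UF_VM_SHADOW_STACK, true)` yields `UF_PTE_WRPROTECTED`, so the decision depends on the local flag and not on the VMA flags alone.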