Message ID | 20180710222639.8241-17-yu-cheng.yu@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 07/10/2018 03:26 PM, Yu-cheng Yu wrote: > There are three possible shadow stack PTE settings: > > Normal SHSTK PTE: (R/O + DIRTY_HW) > SHSTK PTE COW'ed: (R/O + DIRTY_HW) > SHSTK PTE shared as R/O data: (R/O + DIRTY_SW) > > Update can_follow_write_pte/pmd for the shadow stack. First of all, thanks for the excellent patch headers. It's nice to have that reference every time even though it's repeated. > -static inline bool can_follow_write_pte(pte_t pte, unsigned int flags) > +static inline bool can_follow_write_pte(pte_t pte, unsigned int flags, > + bool shstk) > { > + bool pte_cowed = shstk ? is_shstk_pte(pte):pte_dirty(pte); > + > return pte_write(pte) || > - ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte)); > + ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_cowed); > } Can we just pass the VMA in here? This use is OK-ish, but I generally detest true/false function arguments because you can't tell what they are when they show up without a named variable. But... Why does this even matter? Your own example showed that all shadowstack PTEs have either DIRTY_HW or DIRTY_SW set, and pte_dirty() checks both. That makes this check seem a bit superfluous.
On Tue, Jul 10, 2018 at 03:26:28PM -0700, Yu-cheng Yu wrote: > There are three possible shadow stack PTE settings: > > Normal SHSTK PTE: (R/O + DIRTY_HW) > SHSTK PTE COW'ed: (R/O + DIRTY_HW) > SHSTK PTE shared as R/O data: (R/O + DIRTY_SW) I count _2_ distinct states there. > Update can_follow_write_pte/pmd for the shadow stack. So the below disallows can_follow_write when shstk && _PAGE_DIRTY_SW, but this here Changelog doesn't explain why. Doesn't even get close. Also, the code is a right mess :/ Can't we try harder to not let this shadow stack stuff escape arch code.
On Tue, 2018-07-10 at 16:37 -0700, Dave Hansen wrote: > On 07/10/2018 03:26 PM, Yu-cheng Yu wrote: > > > > There are three possible shadow stack PTE settings: > > > > Normal SHSTK PTE: (R/O + DIRTY_HW) > > SHSTK PTE COW'ed: (R/O + DIRTY_HW) > > SHSTK PTE shared as R/O data: (R/O + DIRTY_SW) > > > > Update can_follow_write_pte/pmd for the shadow stack. > First of all, thanks for the excellent patch headers. It's nice to > have > that reference every time even though it's repeated. > > > > > -static inline bool can_follow_write_pte(pte_t pte, unsigned int > > flags) > > +static inline bool can_follow_write_pte(pte_t pte, unsigned int > > flags, > > + bool shstk) > > { > > + bool pte_cowed = shstk ? is_shstk_pte(pte):pte_dirty(pte); > > + > > return pte_write(pte) || > > - ((flags & FOLL_FORCE) && (flags & FOLL_COW) && > > pte_dirty(pte)); > > + ((flags & FOLL_FORCE) && (flags & FOLL_COW) && > > pte_cowed); > > } > Can we just pass the VMA in here? This use is OK-ish, but I > generally > detest true/false function arguments because you can't tell what they > are when they show up without a named variable. > > But... Why does this even matter? Your own example showed that all > shadowstack PTEs have either DIRTY_HW or DIRTY_SW set, and > pte_dirty() > checks both. > > That makes this check seem a bit superfluous. My understanding is that we don't want to follow write pte if the page is shared as read-only. For a SHSTK page, that is (R/O + DIRTY_SW), which means the SHSTK page has not been COW'ed. Is that right? Thanks, Yu-cheng
On 07/11/2018 10:05 AM, Yu-cheng Yu wrote: > My understanding is that we don't want to follow write pte if the page > is shared as read-only. For a SHSTK page, that is (R/O + DIRTY_SW), > which means the SHSTK page has not been COW'ed. Is that right? Let's look at the code again: > -static inline bool can_follow_write_pte(pte_t pte, unsigned int flags) > +static inline bool can_follow_write_pte(pte_t pte, unsigned int flags, > + bool shstk) > { > + bool pte_cowed = shstk ? is_shstk_pte(pte):pte_dirty(pte); > + > return pte_write(pte) || > - ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte)); > + ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_cowed); > } This is another case where the naming of pte_*() is biting us vs. the perversion of the PTE bits. The lack of comments and explanation inthe patch is compounding the confusion. We need to find a way to differentiate "someone can write to this PTE" from "the write bit is set in this PTE". In this particular hunk, we need to make it clear that pte_write() is *never* true for shadowstack PTEs. In other words, shadow stack VMAs will (should?) never even *see* a pte_write() PTE. I think this is a case where you just need to bite the bullet and bifurcate can_follow_write_pte(). Just separate the shadowstack and non-shadowstack parts.
On Wed, 2018-07-11 at 11:29 +0200, Peter Zijlstra wrote: > On Tue, Jul 10, 2018 at 03:26:28PM -0700, Yu-cheng Yu wrote: > > > > There are three possible shadow stack PTE settings: > > > > Normal SHSTK PTE: (R/O + DIRTY_HW) > > SHSTK PTE COW'ed: (R/O + DIRTY_HW) > > SHSTK PTE shared as R/O data: (R/O + DIRTY_SW) > I count _2_ distinct states there. > > > > > Update can_follow_write_pte/pmd for the shadow stack. > So the below disallows can_follow_write when shstk && _PAGE_DIRTY_SW, > but this here Changelog doesn't explain why. Doesn't even get close. Can we add the following to the log: When a SHSTK PTE is shared, it is (R/O + DIRTY_SW); otherwise it is (R/O + DIRTY_HW). When we (FOLL_WRITE | FOLL_FORCE) on a SHSTK PTE, the following must be true: - It has been COW'ed at least once (FOLL_COW is set); - It still is not shared, i.e. PTE is (R/O + DIRTY_HW); > > Also, the code is a right mess :/ Can't we try harder to not let this > shadow stack stuff escape arch code. We either check here if the VMA is SHSTK mapping or move the logic to pte_dirty(). The latter would be less obvious. Or can we create a can_follow_write_shstk_pte()? Yu-cheng
diff --git a/mm/gup.c b/mm/gup.c index b70d7ba7cc13..00171ee847af 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -64,10 +64,13 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address, * FOLL_FORCE can write to even unwritable pte's, but only * after we've gone through a COW cycle and they are dirty. */ -static inline bool can_follow_write_pte(pte_t pte, unsigned int flags) +static inline bool can_follow_write_pte(pte_t pte, unsigned int flags, + bool shstk) { + bool pte_cowed = shstk ? is_shstk_pte(pte):pte_dirty(pte); + return pte_write(pte) || - ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte)); + ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_cowed); } static struct page *follow_page_pte(struct vm_area_struct *vma, @@ -78,7 +81,9 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, struct page *page; spinlock_t *ptl; pte_t *ptep, pte; + bool shstk; + shstk = is_shstk_mapping(vma->vm_flags); retry: if (unlikely(pmd_bad(*pmd))) return no_page_table(vma, flags); @@ -105,7 +110,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, } if ((flags & FOLL_NUMA) && pte_protnone(pte)) goto no_page; - if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags)) { + if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags, shstk)) { pte_unmap_unlock(ptep, ptl); return NULL; } diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 7f3e11d3b64a..db4c689a960a 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1389,10 +1389,13 @@ int do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd) * FOLL_FORCE can write to even unwritable pmd's, but only * after we've gone through a COW cycle and they are dirty. */ -static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags) +static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags, + bool shstk) { + bool pmd_cowed = shstk ? is_shstk_pmd(pmd):pmd_dirty(pmd); + return pmd_write(pmd) || - ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd)); + ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_cowed); } struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, @@ -1402,10 +1405,11 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, { struct mm_struct *mm = vma->vm_mm; struct page *page = NULL; + bool shstk = is_shstk_mapping(vma->vm_flags); assert_spin_locked(pmd_lockptr(mm, pmd)); - if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, flags)) + if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, flags, shstk)) goto out; /* Avoid dumping huge zero page */
There are three possible shadow stack PTE settings: Normal SHSTK PTE: (R/O + DIRTY_HW) SHSTK PTE COW'ed: (R/O + DIRTY_HW) SHSTK PTE shared as R/O data: (R/O + DIRTY_SW) Update can_follow_write_pte/pmd for the shadow stack. Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> --- mm/gup.c | 11 ++++++++--- mm/huge_memory.c | 10 +++++++--- 2 files changed, 15 insertions(+), 6 deletions(-)