| Message ID | 20180710222639.8241-15-yu-cheng.yu@intel.com (mailing list archive) |
|---|---|
| State | New, archived |
On 07/10/2018 03:26 PM, Yu-cheng Yu wrote:
> @@ -1347,6 +1353,8 @@ int do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
> 		pmd_t entry;
> 		entry = mk_huge_pmd(new_page, vma->vm_page_prot);
> 		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
> +		if (is_shstk_mapping(vma->vm_flags))
> +			entry = pmd_mkdirty_shstk(entry);

This pattern is repeated enough that it makes me wonder if we should
just be doing the shadow stack PTE creation in mk_huge_pmd() itself.
Or should we just set the shadow stack PTE bit combination in
vma->vm_page_prot so we don't have to go set it explicitly every time?
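[Editor's note: a minimal sketch of the first option, folding the shadow
stack fixup into mk_huge_pmd() so call sites no longer repeat the
is_shstk_mapping() check. Assumptions are flagged in the comment: the
real mk_huge_pmd() in mm/huge_memory.c takes a pgprot_t rather than the
vma, and is_shstk_mapping()/pmd_mkdirty_shstk() are helpers from the
patch set under review, so this is an illustration, not the actual
change.]

/*
 * Sketch only: do the shadow stack conversion once, inside
 * mk_huge_pmd().  Passing the vma instead of a pgprot_t is an
 * assumption made for this illustration.
 */
static inline pmd_t mk_huge_pmd(struct page *page, struct vm_area_struct *vma)
{
	pmd_t entry = pmd_mkhuge(mk_pmd(page, vma->vm_page_prot));

	if (is_shstk_mapping(vma->vm_flags))
		entry = pmd_mkdirty_shstk(entry);
	return entry;
}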
On Tue, Jul 10, 2018 at 03:26:26PM -0700, Yu-cheng Yu wrote:
> diff --git a/mm/memory.c b/mm/memory.c
> index a2695dbc0418..f7c46d61eaea 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4108,7 +4108,13 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> 		if (pmd_protnone(orig_pmd) && vma_is_accessible(vma))
> 			return do_huge_pmd_numa_page(&vmf, orig_pmd);
>
> -		if (dirty && !pmd_write(orig_pmd)) {
> +		/*
> +		 * Shadow stack trans huge PMDs are copy-on-access,
> +		 * so do wp_huge_pmd() on them no matter if we have
> +		 * a write fault or not.
> +		 */
> +		if (is_shstk_mapping(vma->vm_flags) ||
> +		    (dirty && !pmd_write(orig_pmd))) {
> 			ret = wp_huge_pmd(&vmf, orig_pmd);
> 			if (!(ret & VM_FAULT_FALLBACK))
> 				return ret;

Can't we do this (and the do_wp_page thing) by setting FAULT_FLAG_WRITE
in the arch fault handler on shadow stack faults?
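[Editor's note: a sketch of what Peter is suggesting, assuming the
series' x86 fault handler can recognize shadow stack access faults.
X86_PF_SHSTK here stands for the shadow stack bit (bit 6) of the page
fault error code defined by the CET architecture; the helper name and
its placement are made up for illustration.]

/*
 * Sketch only: map a shadow stack access fault to FAULT_FLAG_WRITE in
 * the arch fault handler, so the generic mm COW paths fire without
 * any is_shstk_mapping() checks in generic code.
 */
#define X86_PF_SHSTK	(1 << 6)	/* CET: shadow stack access fault */

static unsigned int mm_fault_flags(unsigned long error_code)
{
	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;

	if (error_code & X86_PF_WRITE)
		flags |= FAULT_FLAG_WRITE;
	/*
	 * Shadow stack accesses fault like reads, but the fixup they
	 * need is the write-fault (COW) path, so report them to the
	 * generic code as writes.
	 */
	if (error_code & X86_PF_SHSTK)
		flags |= FAULT_FLAG_WRITE;
	return flags;
}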
On Wed, 2018-07-11 at 11:10 +0200, Peter Zijlstra wrote:
> On Tue, Jul 10, 2018 at 03:26:26PM -0700, Yu-cheng Yu wrote:
> > diff --git a/mm/memory.c b/mm/memory.c
> > index a2695dbc0418..f7c46d61eaea 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -4108,7 +4108,13 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> > 		if (pmd_protnone(orig_pmd) && vma_is_accessible(vma))
> > 			return do_huge_pmd_numa_page(&vmf, orig_pmd);
> >
> > -		if (dirty && !pmd_write(orig_pmd)) {
> > +		/*
> > +		 * Shadow stack trans huge PMDs are copy-on-access,
> > +		 * so do wp_huge_pmd() on them no matter if we have
> > +		 * a write fault or not.
> > +		 */
> > +		if (is_shstk_mapping(vma->vm_flags) ||
> > +		    (dirty && !pmd_write(orig_pmd))) {
> > 			ret = wp_huge_pmd(&vmf, orig_pmd);
> > 			if (!(ret & VM_FAULT_FALLBACK))
> > 				return ret;
>
> Can't we do this (and the do_wp_page thing) by setting FAULT_FLAG_WRITE
> in the arch fault handler on shadow stack faults?

This can work. I don't know if it will create other issues; let me
think about it.

Yu-cheng
On 07/10/2018 03:26 PM, Yu-cheng Yu wrote:
> @@ -1193,6 +1195,8 @@ static int do_huge_pmd_wp_page_fallback(struct vm_fault *vmf, pmd_t orig_pmd,
> 		pte_t entry;
> 		entry = mk_pte(pages[i], vma->vm_page_prot);
> 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> +		if (is_shstk_mapping(vma->vm_flags))
> +			entry = pte_mkdirty_shstk(entry);

Peter Z was pointing out that we should get rid of all this generic
code manipulation. We might not easily be able to do it *all*, but we
can do better than what we've got here.

Basically, if you have code outside of arch/x86 in your patch set that
refers to shadow stacks, you should consider it a bug (for now),
especially if you have to hack .c files.

For instance, in the code above, you could move the is_shstk_mapping()
into:

static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
{
	if (likely(vma->vm_flags & VM_WRITE))
		pte = pte_mkwrite(pte);

+	pte = arch_pte_mkwrite(pte, vma);
+
	return pte;
}

... and add an arch callback that does:

static inline pte_t arch_pte_mkwrite(pte_t pte, struct vm_area_struct *vma)
{
	if (!is_shstk_mapping(vma->vm_flags))
		return pte;

	WARN_ON(... pte bits incompatible with shadow stacks?);

	/* Lots of comments of course */
	pte = pte_mkdirty_shstk(pte);

	return pte;
}

This is just one example. You are probably going to need a couple of
similar things. Just remember: the bar is very high to make changes to
.c files outside of arch/x86. You can do a _bit_ more in non-x86
headers, but you have the most freedom to patch what you want as long
as it's in arch/x86.
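[Editor's note: to make the callback pattern concrete, a hedged sketch
of how such a hook could be wired up, with a generic no-op default that
x86 overrides. arch_pte_mkwrite() is only Dave's proposed name here,
not an existing kernel API, and the header placements are suggestions.]

/* Generic fallback, e.g. in include/asm-generic/pgtable.h: a no-op. */
#ifndef arch_pte_mkwrite
static inline pte_t arch_pte_mkwrite(pte_t pte, struct vm_area_struct *vma)
{
	return pte;
}
#endif

/* x86 override, e.g. in arch/x86/include/asm/pgtable.h. */
static inline pte_t arch_pte_mkwrite(pte_t pte, struct vm_area_struct *vma)
{
	if (!is_shstk_mapping(vma->vm_flags))
		return pte;
	/*
	 * Convert to the shadow stack encoding (Write=0, Dirty=1 on
	 * x86) so CALL/RET can push to the page but plain stores fault.
	 */
	return pte_mkdirty_shstk(pte);
}
#define arch_pte_mkwrite arch_pte_mkwrite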
On Fri, 2018-07-20 at 07:20 -0700, Dave Hansen wrote:
> On 07/10/2018 03:26 PM, Yu-cheng Yu wrote:
> > @@ -1193,6 +1195,8 @@ static int do_huge_pmd_wp_page_fallback(struct vm_fault *vmf, pmd_t orig_pmd,
> > 		pte_t entry;
> > 		entry = mk_pte(pages[i], vma->vm_page_prot);
> > 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> > +		if (is_shstk_mapping(vma->vm_flags))
> > +			entry = pte_mkdirty_shstk(entry);
>
> Peter Z was pointing out that we should get rid of all this generic
> code manipulation. We might not easily be able to do it *all*, but we
> can do better than what we've got here.
>
> Basically, if you have code outside of arch/x86 in your patch set that
> refers to shadow stacks, you should consider it a bug (for now),
> especially if you have to hack .c files.
>
> [ maybe_mkwrite()/arch_pte_mkwrite() example snipped; see above ]
>
> This is just one example. You are probably going to need a couple of
> similar things. Just remember: the bar is very high to make changes to
> .c files outside of arch/x86. You can do a _bit_ more in non-x86
> headers, but you have the most freedom to patch what you want as long
> as it's in arch/x86.

Ok, I will work on that. Thanks!

Yu-cheng
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1cd7c1a57a14..7f3e11d3b64a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -597,6 +597,8 @@ static int __do_huge_pmd_anonymous_page(struct vm_fault *vmf, struct page *page,
 
 		entry = mk_huge_pmd(page, vma->vm_page_prot);
 		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
+		if (is_shstk_mapping(vma->vm_flags))
+			entry = pmd_mkdirty_shstk(entry);
 		page_add_new_anon_rmap(page, vma, haddr, true);
 		mem_cgroup_commit_charge(page, memcg, false, true);
 		lru_cache_add_active_or_unevictable(page, vma);
@@ -1193,6 +1195,8 @@ static int do_huge_pmd_wp_page_fallback(struct vm_fault *vmf, pmd_t orig_pmd,
 		pte_t entry;
 		entry = mk_pte(pages[i], vma->vm_page_prot);
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+		if (is_shstk_mapping(vma->vm_flags))
+			entry = pte_mkdirty_shstk(entry);
 		memcg = (void *)page_private(pages[i]);
 		set_page_private(pages[i], 0);
 		page_add_new_anon_rmap(pages[i], vmf->vma, haddr, false);
@@ -1277,6 +1281,8 @@ int do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
 		pmd_t entry;
 		entry = pmd_mkyoung(orig_pmd);
 		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
+		if (is_shstk_mapping(vma->vm_flags))
+			entry = pmd_mkdirty_shstk(entry);
 		if (pmdp_set_access_flags(vma, haddr, vmf->pmd, entry, 1))
 			update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
 		ret |= VM_FAULT_WRITE;
@@ -1347,6 +1353,8 @@ int do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
 		pmd_t entry;
 		entry = mk_huge_pmd(new_page, vma->vm_page_prot);
 		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
+		if (is_shstk_mapping(vma->vm_flags))
+			entry = pmd_mkdirty_shstk(entry);
 		pmdp_huge_clear_flush_notify(vma, haddr, vmf->pmd);
 		page_add_new_anon_rmap(new_page, vma, haddr, true);
 		mem_cgroup_commit_charge(new_page, memcg, false, true);
diff --git a/mm/memory.c b/mm/memory.c
index a2695dbc0418..f7c46d61eaea 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4108,7 +4108,13 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 		if (pmd_protnone(orig_pmd) && vma_is_accessible(vma))
 			return do_huge_pmd_numa_page(&vmf, orig_pmd);
 
-		if (dirty && !pmd_write(orig_pmd)) {
+		/*
+		 * Shadow stack trans huge PMDs are copy-on-access,
+		 * so do wp_huge_pmd() on them no matter if we have
+		 * a write fault or not.
+		 */
+		if (is_shstk_mapping(vma->vm_flags) ||
+		    (dirty && !pmd_write(orig_pmd))) {
 			ret = wp_huge_pmd(&vmf, orig_pmd);
 			if (!(ret & VM_FAULT_FALLBACK))
 				return ret;
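[Editor's note: as background for the +pmd_mkdirty_shstk() lines above,
in the CET architecture a shadow stack page is one whose PTE/PMD reads
Write=0, Dirty=1. An illustrative x86 definition could look like the
following; this is a sketch, not the series' actual helper, which must
additionally keep software dirty tracking separate from the hardware
dirty bit.]

/*
 * Illustration only: encode a PMD as a shadow stack entry, Write=0,
 * Dirty=1, so ordinary stores fault but the CPU's shadow stack pushes
 * (CALL) are permitted.
 */
static inline pmd_t pmd_mkdirty_shstk(pmd_t pmd)
{
	pmdval_t v = native_pmd_val(pmd);

	v &= ~_PAGE_RW;		/* Write=0: ordinary stores fault */
	v |= _PAGE_DIRTY;	/* Dirty=1: valid shadow stack encoding */
	return native_make_pmd(v);
}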
This patch implements THP shadow stack memory copying in the same way
as the previous patch does for regular PTEs: in copy_huge_pmd(), we
clear the dirty bit from the PMD. On the next shadow stack access to
the PMD, a page fault occurs; at that time, the page is copied or
re-used and the PMD is fixed up.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 mm/huge_memory.c | 8 ++++++++
 mm/memory.c      | 8 +++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)
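[Editor's note: the copy_huge_pmd() hunk itself is not visible in this
archive view. The following reconstructs the behavior the changelog
describes; the helper name and exact form are assumed, not taken from
the patch.]

/*
 * Sketch of the fork-time handling the changelog describes: the huge
 * PMD is write-protected as usual, and for a shadow stack VMA the
 * dirty bit is cleared too.  That makes the PMD stop being a valid
 * shadow stack entry (Write=0, Dirty=1), so the next shadow stack
 * access faults and takes the copy-on-access path above.
 */
static pmd_t fork_protect_huge_pmd(pmd_t pmd, struct vm_area_struct *vma)
{
	pmd = pmd_mkold(pmd_wrprotect(pmd));	/* existing fork behavior */
	if (is_shstk_mapping(vma->vm_flags))
		pmd = pmd_mkclean(pmd);		/* Dirty=0: next access faults */
	return pmd;
}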