Message ID: 20180710222639.8241-14-yu-cheng.yu@intel.com (mailing list archive)
State: New, archived
On 07/10/2018 03:26 PM, Yu-cheng Yu wrote:
> +	if (is_shstk_mapping(vma->vm_flags))
> +		entry = pte_mkdirty_shstk(entry);
> +	else
> +		entry = pte_mkdirty(entry);
> +
> +	entry = maybe_mkwrite(entry, vma);
> 	if (ptep_set_access_flags(vma, vmf->address, vmf->pte, entry, 1))
> 		update_mmu_cache(vma, vmf->address, vmf->pte);
> 	pte_unmap_unlock(vmf->pte, vmf->ptl);
> @@ -2526,7 +2532,11 @@ static int wp_page_copy(struct vm_fault *vmf)
> 	}
> 	flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
> 	entry = mk_pte(new_page, vma->vm_page_prot);
> -	entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> +	if (is_shstk_mapping(vma->vm_flags))
> +		entry = pte_mkdirty_shstk(entry);
> +	else
> +		entry = pte_mkdirty(entry);
> +	entry = maybe_mkwrite(entry, vma);

Do we want to lift this hunk of code and put it elsewhere? Maybe:

	entry = pte_set_vma_features(entry, vma);

and then:

pte_t pte_set_vma_features(pte_t entry, struct vm_area_struct *vma)
{
	/*
	 * Shadow stack PTEs are always dirty and always
	 * writable. They have a different encoding for
	 * this than normal PTEs, though.
	 */
	if (is_shstk_mapping(vma->vm_flags))
		entry = pte_mkdirty_shstk(entry);
	else
		entry = pte_mkdirty(entry);

	entry = maybe_mkwrite(entry, vma);

	return entry;
}

> 	/*
> 	 * Clear the pte entry and flush it first, before updating the
> 	 * pte with the new entry. This will avoid a race condition
> @@ -3201,6 +3211,14 @@ static int do_anonymous_page(struct vm_fault *vmf)
> 	mem_cgroup_commit_charge(page, memcg, false, false);
> 	lru_cache_add_active_or_unevictable(page, vma);
> setpte:
> +	/*
> +	 * If this is within a shadow stack mapping, mark
> +	 * the PTE dirty. We don't use pte_mkdirty(),
> +	 * because the PTE must have _PAGE_DIRTY_HW set.
> +	 */
> +	if (is_shstk_mapping(vma->vm_flags))
> +		entry = pte_mkdirty_shstk(entry);
> +
> 	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);

I'm not sure this is the right spot to do this. The other code does
pte_mkdirty_shstk() near where we do the pte_mkwrite(). Why not here?

I think you might have picked this because it's a common path used by
both allocated pages and zero pages. But, we can't have the zero pages
in shadow stack areas since they can't be read-only.

I think you need to move this up. Can you even consolidate it with the
other two pte_mkdirty_shstk() call sites?

> 	/* No need to invalidate - it was non-present before */
> @@ -3983,6 +4001,14 @@ static int handle_pte_fault(struct vm_fault *vmf)
> 	entry = vmf->orig_pte;
> 	if (unlikely(!pte_same(*vmf->pte, entry)))
> 		goto unlock;
> +
> +	/*
> +	 * Shadow stack PTEs are copy-on-access, so do_wp_page()
> +	 * handling on them no matter if we have write fault or not.
> +	 */

I'd say this differently:

	Shadow stack PTEs can not be read-only and because of that can
	not have traditional copy-on-write semantics. This essentially
	performs a copy-on-write operation, but on *any* access, not
	just actual writes.
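As an aside (this sketch is not part of the posted patch or of the mail
above): with a helper along the lines of the pte_set_vma_features()
suggestion, the two open-coded hunks in wp_page_reuse() and
wp_page_copy() would each collapse to a single call, roughly:

	/* wp_page_reuse(), sketch: */
	entry = pte_mkyoung(vmf->orig_pte);
	entry = pte_set_vma_features(entry, vma);
	if (ptep_set_access_flags(vma, vmf->address, vmf->pte, entry, 1))
		update_mmu_cache(vma, vmf->address, vmf->pte);

	/* wp_page_copy(), sketch: */
	entry = mk_pte(new_page, vma->vm_page_prot);
	entry = pte_set_vma_features(entry, vma);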
On Tue, Jul 10, 2018 at 04:06:25PM -0700, Dave Hansen wrote:
> On 07/10/2018 03:26 PM, Yu-cheng Yu wrote:
> > +	if (is_shstk_mapping(vma->vm_flags))
> > +		entry = pte_mkdirty_shstk(entry);
> > +	else
> > +		entry = pte_mkdirty(entry);
> > +
> > +	entry = maybe_mkwrite(entry, vma);
> > 	if (ptep_set_access_flags(vma, vmf->address, vmf->pte, entry, 1))
> > 		update_mmu_cache(vma, vmf->address, vmf->pte);
> > 	pte_unmap_unlock(vmf->pte, vmf->ptl);
> > @@ -2526,7 +2532,11 @@ static int wp_page_copy(struct vm_fault *vmf)
> > 	}
> > 	flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
> > 	entry = mk_pte(new_page, vma->vm_page_prot);
> > -	entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> > +	if (is_shstk_mapping(vma->vm_flags))
> > +		entry = pte_mkdirty_shstk(entry);
> > +	else
> > +		entry = pte_mkdirty(entry);
> > +	entry = maybe_mkwrite(entry, vma);
>
> Do we want to lift this hunk of code and put it elsewhere? Maybe:
>
> 	entry = pte_set_vma_features(entry, vma);
>
> and then:
>
> pte_t pte_set_vma_features(pte_t entry, struct vm_area_struct *vma)
> {
> 	/*
> 	 * Shadow stack PTEs are always dirty and always
> 	 * writable. They have a different encoding for
> 	 * this than normal PTEs, though.
> 	 */
> 	if (is_shstk_mapping(vma->vm_flags))
> 		entry = pte_mkdirty_shstk(entry);
> 	else
> 		entry = pte_mkdirty(entry);
>
> 	entry = maybe_mkwrite(entry, vma);
>
> 	return entry;
> }

Yes, that wants a helper like that. Not sold on the name, but whatever.

Is there any way we can hide all the shadow stack magic in arch code?
On Wed, 2018-07-11 at 11:06 +0200, Peter Zijlstra wrote:
> On Tue, Jul 10, 2018 at 04:06:25PM -0700, Dave Hansen wrote:
> > On 07/10/2018 03:26 PM, Yu-cheng Yu wrote:
> > > +	if (is_shstk_mapping(vma->vm_flags))
> > > +		entry = pte_mkdirty_shstk(entry);
> > > +	else
> > > +		entry = pte_mkdirty(entry);
> > > +
> > > +	entry = maybe_mkwrite(entry, vma);
> > > 	if (ptep_set_access_flags(vma, vmf->address, vmf->pte, entry, 1))
> > > 		update_mmu_cache(vma, vmf->address, vmf->pte);
> > > 	pte_unmap_unlock(vmf->pte, vmf->ptl);
> > > @@ -2526,7 +2532,11 @@ static int wp_page_copy(struct vm_fault *vmf)
> > > 	}
> > > 	flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
> > > 	entry = mk_pte(new_page, vma->vm_page_prot);
> > > -	entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> > > +	if (is_shstk_mapping(vma->vm_flags))
> > > +		entry = pte_mkdirty_shstk(entry);
> > > +	else
> > > +		entry = pte_mkdirty(entry);
> > > +	entry = maybe_mkwrite(entry, vma);
> >
> > Do we want to lift this hunk of code and put it elsewhere? Maybe:
> >
> > 	entry = pte_set_vma_features(entry, vma);
> >
> > and then:
> >
> > pte_t pte_set_vma_features(pte_t entry, struct vm_area_struct *vma)
> > {
> > 	/*
> > 	 * Shadow stack PTEs are always dirty and always
> > 	 * writable. They have a different encoding for
> > 	 * this than normal PTEs, though.
> > 	 */
> > 	if (is_shstk_mapping(vma->vm_flags))
> > 		entry = pte_mkdirty_shstk(entry);
> > 	else
> > 		entry = pte_mkdirty(entry);
> >
> > 	entry = maybe_mkwrite(entry, vma);
> >
> > 	return entry;
> > }
>
> Yes, that wants a helper like that. Not sold on the name, but whatever.
>
> Is there any way we can hide all the shadow stack magic in arch code?

We use is_shstk_mapping() only to determine whether _PAGE_DIRTY_SW or
_PAGE_DIRTY_HW should be set in a PTE. One way to remove this shadow
stack code from generic code is to change pte_mkdirty(pte) to
pte_mkdirty(pte, vma) and handle the shadow stack case in the arch
code. Is this acceptable?

Thanks,
Yu-cheng
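One possible shape for the pte_mkdirty(pte, vma) idea above, sketched
purely for illustration (none of these names, file locations or macros
come from the posted series; a separately named pte_mkdirty_vma() is
used instead of changing pte_mkdirty()'s signature so the sketch stays
self-contained):

/* include/asm-generic/pgtable.h -- hypothetical generic default */
#ifndef pte_mkdirty_vma
#define pte_mkdirty_vma(pte, vma)	pte_mkdirty(pte)
#endif

/* arch/x86/include/asm/pgtable.h -- hypothetical x86 override */
static inline pte_t x86_pte_mkdirty_vma(pte_t pte, struct vm_area_struct *vma)
{
	/* Shadow stack PTEs need the hardware dirty bit (_PAGE_DIRTY_HW). */
	if (is_shstk_mapping(vma->vm_flags))
		return pte_mkdirty_shstk(pte);
	return pte_mkdirty(pte);
}
#define pte_mkdirty_vma(pte, vma)	x86_pte_mkdirty_vma(pte, vma)

Generic code such as wp_page_reuse() would then do
"entry = maybe_mkwrite(pte_mkdirty_vma(entry, vma), vma);" with no
shadow stack knowledge of its own.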
diff --git a/mm/memory.c b/mm/memory.c
index 7206a634270b..a2695dbc0418 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2453,7 +2453,13 @@ static inline void wp_page_reuse(struct vm_fault *vmf)
 
 	flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
 	entry = pte_mkyoung(vmf->orig_pte);
-	entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+
+	if (is_shstk_mapping(vma->vm_flags))
+		entry = pte_mkdirty_shstk(entry);
+	else
+		entry = pte_mkdirty(entry);
+
+	entry = maybe_mkwrite(entry, vma);
 	if (ptep_set_access_flags(vma, vmf->address, vmf->pte, entry, 1))
 		update_mmu_cache(vma, vmf->address, vmf->pte);
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
@@ -2526,7 +2532,11 @@ static int wp_page_copy(struct vm_fault *vmf)
 	}
 	flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
 	entry = mk_pte(new_page, vma->vm_page_prot);
-	entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+	if (is_shstk_mapping(vma->vm_flags))
+		entry = pte_mkdirty_shstk(entry);
+	else
+		entry = pte_mkdirty(entry);
+	entry = maybe_mkwrite(entry, vma);
 	/*
 	 * Clear the pte entry and flush it first, before updating the
 	 * pte with the new entry. This will avoid a race condition
@@ -3201,6 +3211,14 @@ static int do_anonymous_page(struct vm_fault *vmf)
 	mem_cgroup_commit_charge(page, memcg, false, false);
 	lru_cache_add_active_or_unevictable(page, vma);
 setpte:
+	/*
+	 * If this is within a shadow stack mapping, mark
+	 * the PTE dirty. We don't use pte_mkdirty(),
+	 * because the PTE must have _PAGE_DIRTY_HW set.
+	 */
+	if (is_shstk_mapping(vma->vm_flags))
+		entry = pte_mkdirty_shstk(entry);
+
 	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
 
 	/* No need to invalidate - it was non-present before */
@@ -3983,6 +4001,14 @@ static int handle_pte_fault(struct vm_fault *vmf)
 	entry = vmf->orig_pte;
 	if (unlikely(!pte_same(*vmf->pte, entry)))
 		goto unlock;
+
+	/*
+	 * Shadow stack PTEs are copy-on-access, so do_wp_page()
+	 * handling on them no matter if we have write fault or not.
+	 */
+	if (is_shstk_mapping(vmf->vma->vm_flags))
+		return do_wp_page(vmf);
+
 	if (vmf->flags & FAULT_FLAG_WRITE) {
 		if (!pte_write(entry))
 			return do_wp_page(vmf);
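For reference, this is roughly what the x86 pte_mkdirty_shstk() helper
used in the hunks above is expected to do, based on the
_PAGE_DIRTY_HW/_PAGE_DIRTY_SW discussion in this thread (a sketch, not
code taken from the series):

/* arch/x86/include/asm/pgtable.h -- sketch only */
static inline pte_t pte_mkdirty_shstk(pte_t pte)
{
	/* Drop any software-dirty marking... */
	pte = pte_clear_flags(pte, _PAGE_DIRTY_SW);
	/* ...and set the hardware dirty bit the shadow stack encoding needs. */
	return pte_set_flags(pte, _PAGE_DIRTY_HW);
}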
When a task does fork(), its shadow stack must be duplicated for the
child. However, the child may not actually use all pages of the copied
shadow stack. This patch implements a flow that is similar to
copy-on-write of an anonymous page, but for shadow stack memory.

A shadow stack PTE needs to be RO and dirty. We use this dirty bit
requirement to effect the copying of shadow stack pages. In
copy_one_pte(), we clear the dirty bit from the shadow stack PTE. On
the next shadow stack access to the PTE, a page fault occurs. At that
time, we then copy/re-use the page and fix the PTE.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 mm/memory.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)
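To illustrate the fork-time half of the flow described above: the
changelog refers to copy_one_pte() clearing the dirty bit, which is not
part of the diff in this message. The sketch below is an assumption
about what that step could look like; the helper name is invented, and
real code would update the PTE through the usual ptep_* accessors under
the page table lock:

/* Hypothetical helper called from copy_one_pte() for a shadow stack VMA. */
static inline pte_t shstk_fork_wrprotect(pte_t pte)
{
	/* Write-protect the PTE, as for any other copy-on-write page. */
	pte = pte_wrprotect(pte);
	/*
	 * Clear the hardware dirty bit so the RO+dirty shadow stack
	 * encoding is gone; the child's first shadow stack access then
	 * faults and do_wp_page() copies the page and restores the
	 * encoding via pte_mkdirty_shstk().
	 */
	pte = pte_clear_flags(pte, _PAGE_DIRTY_HW);
	/* Remember that the page was dirty, using the software bit. */
	return pte_set_flags(pte, _PAGE_DIRTY_SW);
}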