Message ID | 20230213045351.3945824-12-debug@rivosinc.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | None | expand |
On 13.02.23 05:53, Deepak Gupta wrote: > maybe_mkwrite creates PTEs with WRITE encodings for underlying arch if > VM_WRITE is turned on in vma->vm_flags. Shadow stack memory is a write- > able memory except it can only be written by certain specific > instructions. This patch allows maybe_mkwrite to create shadow stack PTEs > if vma is shadow stack VMA. Each arch can define which combination of VMA > flags means a shadow stack. > > Additionally pte_mkshdwstk must be provided by arch specific PTE > construction headers to create shadow stack PTEs. (in arch specific > pgtable.h). > > This patch provides dummy/stub pte_mkshdwstk if CONFIG_USER_SHADOW_STACK > is not selected. > > Signed-off-by: Deepak Gupta <debug@rivosinc.com> > --- > include/linux/mm.h | 23 +++++++++++++++++++++-- > include/linux/pgtable.h | 4 ++++ > 2 files changed, 25 insertions(+), 2 deletions(-) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 8f857163ac89..a7705bc49bfe 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -1093,6 +1093,21 @@ static inline unsigned long thp_size(struct page *page) > void free_compound_page(struct page *page); > > #ifdef CONFIG_MMU > + > +#ifdef CONFIG_USER_SHADOW_STACK > +bool arch_is_shadow_stack_vma(struct vm_area_struct *vma); > +#endif > + > +static inline bool > +is_shadow_stack_vma(struct vm_area_struct *vma) > +{ > +#ifdef CONFIG_USER_SHADOW_STACK > + return arch_is_shadow_stack_vma(vma); > +#else > + return false; > +#endif > +} > + > /* > * Do pte_mkwrite, but only if the vma says VM_WRITE. We do this when > * servicing faults for write access. In the normal case, do always want > @@ -1101,8 +1116,12 @@ void free_compound_page(struct page *page); > */ > static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) > { > - if (likely(vma->vm_flags & VM_WRITE)) > - pte = pte_mkwrite(pte); > + if (likely(vma->vm_flags & VM_WRITE)) { > + if (unlikely(is_shadow_stack_vma(vma))) > + pte = pte_mkshdwstk(pte); > + else > + pte = pte_mkwrite(pte); > + } > return pte; Exactly what we are trying to avoid in the x86 approach right now. Please see the x86 series on details, we shouldn't try reinventing the wheel but finding a core-mm approach that fits multiple architectures. https://lkml.kernel.org/r/20230119212317.8324-1-rick.p.edgecombe@intel.com
On Mon, Feb 13, 2023 at 01:05:16PM +0100, David Hildenbrand wrote: >On 13.02.23 05:53, Deepak Gupta wrote: >>maybe_mkwrite creates PTEs with WRITE encodings for underlying arch if >>VM_WRITE is turned on in vma->vm_flags. Shadow stack memory is a write- >>able memory except it can only be written by certain specific >>instructions. This patch allows maybe_mkwrite to create shadow stack PTEs >>if vma is shadow stack VMA. Each arch can define which combination of VMA >>flags means a shadow stack. >> >>Additionally pte_mkshdwstk must be provided by arch specific PTE >>construction headers to create shadow stack PTEs. (in arch specific >>pgtable.h). >> >>This patch provides dummy/stub pte_mkshdwstk if CONFIG_USER_SHADOW_STACK >>is not selected. >> >>Signed-off-by: Deepak Gupta <debug@rivosinc.com> >>--- >> include/linux/mm.h | 23 +++++++++++++++++++++-- >> include/linux/pgtable.h | 4 ++++ >> 2 files changed, 25 insertions(+), 2 deletions(-) >> >>diff --git a/include/linux/mm.h b/include/linux/mm.h >>index 8f857163ac89..a7705bc49bfe 100644 >>--- a/include/linux/mm.h >>+++ b/include/linux/mm.h >>@@ -1093,6 +1093,21 @@ static inline unsigned long thp_size(struct page *page) >> void free_compound_page(struct page *page); >> #ifdef CONFIG_MMU >>+ >>+#ifdef CONFIG_USER_SHADOW_STACK >>+bool arch_is_shadow_stack_vma(struct vm_area_struct *vma); >>+#endif >>+ >>+static inline bool >>+is_shadow_stack_vma(struct vm_area_struct *vma) >>+{ >>+#ifdef CONFIG_USER_SHADOW_STACK >>+ return arch_is_shadow_stack_vma(vma); >>+#else >>+ return false; >>+#endif >>+} >>+ >> /* >> * Do pte_mkwrite, but only if the vma says VM_WRITE. We do this when >> * servicing faults for write access. In the normal case, do always want >>@@ -1101,8 +1116,12 @@ void free_compound_page(struct page *page); >> */ >> static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) >> { >>- if (likely(vma->vm_flags & VM_WRITE)) >>- pte = pte_mkwrite(pte); >>+ if (likely(vma->vm_flags & VM_WRITE)) { >>+ if (unlikely(is_shadow_stack_vma(vma))) >>+ pte = pte_mkshdwstk(pte); >>+ else >>+ pte = pte_mkwrite(pte); >>+ } >> return pte; > >Exactly what we are trying to avoid in the x86 approach right now. >Please see the x86 series on details, we shouldn't try reinventing the >wheel but finding a core-mm approach that fits multiple architectures. > >https://lkml.kernel.org/r/20230119212317.8324-1-rick.p.edgecombe@intel.com Thanks David for comment here. I looked at x86 approach. This patch actually written in a way which is not re-inventing wheel and is following a core-mm approach that fits multiple architectures. Change above checks `is_shadow_stack_vma` and if it returns true then only it manufactures shadow stack pte else it'll make a regular writeable mapping. Now if we look at `is_shadow_stack_vma` implementation, it returns false if `CONFIG_USER_SHADOW_STACK` is not defined. If `CONFIG_USER_SHADOW_STACK is defined then it calls `arch_is_shadow_stack_vma` which should be implemented by arch specific code. This allows each architecture to define their own vma flag encodings for shadow stack (riscv chooses presence of only `VM_WRITE` which is analogous to choosen PTE encodings on riscv W=1,R=0,X=0) Additionally pte_mkshdwstk will be nop if not implemented by architecture. Let me know if this make sense. If I am missing something here, let me know. > >-- >Thanks, > >David / dhildenb >
On 13.02.23 15:37, Deepak Gupta wrote: > On Mon, Feb 13, 2023 at 01:05:16PM +0100, David Hildenbrand wrote: >> On 13.02.23 05:53, Deepak Gupta wrote: >>> maybe_mkwrite creates PTEs with WRITE encodings for underlying arch if >>> VM_WRITE is turned on in vma->vm_flags. Shadow stack memory is a write- >>> able memory except it can only be written by certain specific >>> instructions. This patch allows maybe_mkwrite to create shadow stack PTEs >>> if vma is shadow stack VMA. Each arch can define which combination of VMA >>> flags means a shadow stack. >>> >>> Additionally pte_mkshdwstk must be provided by arch specific PTE >>> construction headers to create shadow stack PTEs. (in arch specific >>> pgtable.h). >>> >>> This patch provides dummy/stub pte_mkshdwstk if CONFIG_USER_SHADOW_STACK >>> is not selected. >>> >>> Signed-off-by: Deepak Gupta <debug@rivosinc.com> >>> --- >>> include/linux/mm.h | 23 +++++++++++++++++++++-- >>> include/linux/pgtable.h | 4 ++++ >>> 2 files changed, 25 insertions(+), 2 deletions(-) >>> >>> diff --git a/include/linux/mm.h b/include/linux/mm.h >>> index 8f857163ac89..a7705bc49bfe 100644 >>> --- a/include/linux/mm.h >>> +++ b/include/linux/mm.h >>> @@ -1093,6 +1093,21 @@ static inline unsigned long thp_size(struct page *page) >>> void free_compound_page(struct page *page); >>> #ifdef CONFIG_MMU >>> + >>> +#ifdef CONFIG_USER_SHADOW_STACK >>> +bool arch_is_shadow_stack_vma(struct vm_area_struct *vma); >>> +#endif >>> + >>> +static inline bool >>> +is_shadow_stack_vma(struct vm_area_struct *vma) >>> +{ >>> +#ifdef CONFIG_USER_SHADOW_STACK >>> + return arch_is_shadow_stack_vma(vma); >>> +#else >>> + return false; >>> +#endif >>> +} >>> + >>> /* >>> * Do pte_mkwrite, but only if the vma says VM_WRITE. We do this when >>> * servicing faults for write access. In the normal case, do always want >>> @@ -1101,8 +1116,12 @@ void free_compound_page(struct page *page); >>> */ >>> static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) >>> { >>> - if (likely(vma->vm_flags & VM_WRITE)) >>> - pte = pte_mkwrite(pte); >>> + if (likely(vma->vm_flags & VM_WRITE)) { >>> + if (unlikely(is_shadow_stack_vma(vma))) >>> + pte = pte_mkshdwstk(pte); >>> + else >>> + pte = pte_mkwrite(pte); >>> + } >>> return pte; >> >> Exactly what we are trying to avoid in the x86 approach right now. >> Please see the x86 series on details, we shouldn't try reinventing the >> wheel but finding a core-mm approach that fits multiple architectures. >> >> https://lkml.kernel.org/r/20230119212317.8324-1-rick.p.edgecombe@intel.com > > Thanks David for comment here. I looked at x86 approach. This patch > actually written in a way which is not re-inventing wheel and is following > a core-mm approach that fits multiple architectures. > > Change above checks `is_shadow_stack_vma` and if it returns true then only > it manufactures shadow stack pte else it'll make a regular writeable mapping. > > Now if we look at `is_shadow_stack_vma` implementation, it returns false if > `CONFIG_USER_SHADOW_STACK` is not defined. If `CONFIG_USER_SHADOW_STACK is > defined then it calls `arch_is_shadow_stack_vma` which should be implemented > by arch specific code. This allows each architecture to define their own vma > flag encodings for shadow stack (riscv chooses presence of only `VM_WRITE` > which is analogous to choosen PTE encodings on riscv W=1,R=0,X=0) > > Additionally pte_mkshdwstk will be nop if not implemented by architecture. > > Let me know if this make sense. If I am missing something here, let me know. See the discussion in that thread. The idea is to pass a VMA to pte_mkwrite() and let it handle how to actually set it writable.
On Mon, Feb 13, 2023 at 03:56:22PM +0100, David Hildenbrand wrote: >On 13.02.23 15:37, Deepak Gupta wrote: >>On Mon, Feb 13, 2023 at 01:05:16PM +0100, David Hildenbrand wrote: >>>On 13.02.23 05:53, Deepak Gupta wrote: >>>>maybe_mkwrite creates PTEs with WRITE encodings for underlying arch if >>>>VM_WRITE is turned on in vma->vm_flags. Shadow stack memory is a write- >>>>able memory except it can only be written by certain specific >>>>instructions. This patch allows maybe_mkwrite to create shadow stack PTEs >>>>if vma is shadow stack VMA. Each arch can define which combination of VMA >>>>flags means a shadow stack. >>>> >>>>Additionally pte_mkshdwstk must be provided by arch specific PTE >>>>construction headers to create shadow stack PTEs. (in arch specific >>>>pgtable.h). >>>> >>>>This patch provides dummy/stub pte_mkshdwstk if CONFIG_USER_SHADOW_STACK >>>>is not selected. >>>> >>>>Signed-off-by: Deepak Gupta <debug@rivosinc.com> >>>>--- >>>> include/linux/mm.h | 23 +++++++++++++++++++++-- >>>> include/linux/pgtable.h | 4 ++++ >>>> 2 files changed, 25 insertions(+), 2 deletions(-) >>>> >>>>diff --git a/include/linux/mm.h b/include/linux/mm.h >>>>index 8f857163ac89..a7705bc49bfe 100644 >>>>--- a/include/linux/mm.h >>>>+++ b/include/linux/mm.h >>>>@@ -1093,6 +1093,21 @@ static inline unsigned long thp_size(struct page *page) >>>> void free_compound_page(struct page *page); >>>> #ifdef CONFIG_MMU >>>>+ >>>>+#ifdef CONFIG_USER_SHADOW_STACK >>>>+bool arch_is_shadow_stack_vma(struct vm_area_struct *vma); >>>>+#endif >>>>+ >>>>+static inline bool >>>>+is_shadow_stack_vma(struct vm_area_struct *vma) >>>>+{ >>>>+#ifdef CONFIG_USER_SHADOW_STACK >>>>+ return arch_is_shadow_stack_vma(vma); >>>>+#else >>>>+ return false; >>>>+#endif >>>>+} >>>>+ >>>> /* >>>> * Do pte_mkwrite, but only if the vma says VM_WRITE. We do this when >>>> * servicing faults for write access. In the normal case, do always want >>>>@@ -1101,8 +1116,12 @@ void free_compound_page(struct page *page); >>>> */ >>>> static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) >>>> { >>>>- if (likely(vma->vm_flags & VM_WRITE)) >>>>- pte = pte_mkwrite(pte); >>>>+ if (likely(vma->vm_flags & VM_WRITE)) { >>>>+ if (unlikely(is_shadow_stack_vma(vma))) >>>>+ pte = pte_mkshdwstk(pte); >>>>+ else >>>>+ pte = pte_mkwrite(pte); >>>>+ } >>>> return pte; >>> >>>Exactly what we are trying to avoid in the x86 approach right now. >>>Please see the x86 series on details, we shouldn't try reinventing the >>>wheel but finding a core-mm approach that fits multiple architectures. >>> >>>https://lkml.kernel.org/r/20230119212317.8324-1-rick.p.edgecombe@intel.com >> >>Thanks David for comment here. I looked at x86 approach. This patch >>actually written in a way which is not re-inventing wheel and is following >>a core-mm approach that fits multiple architectures. >> >>Change above checks `is_shadow_stack_vma` and if it returns true then only >>it manufactures shadow stack pte else it'll make a regular writeable mapping. >> >>Now if we look at `is_shadow_stack_vma` implementation, it returns false if >>`CONFIG_USER_SHADOW_STACK` is not defined. If `CONFIG_USER_SHADOW_STACK is >>defined then it calls `arch_is_shadow_stack_vma` which should be implemented >>by arch specific code. This allows each architecture to define their own vma >>flag encodings for shadow stack (riscv chooses presence of only `VM_WRITE` >>which is analogous to choosen PTE encodings on riscv W=1,R=0,X=0) >> >>Additionally pte_mkshdwstk will be nop if not implemented by architecture. >> >>Let me know if this make sense. If I am missing something here, let me know. > >See the discussion in that thread. The idea is to pass a VMA to >pte_mkwrite() and let it handle how to actually set it writable. > Thanks. I see. Instances where `pte_mkwrite` is directly invoked by checking VM_WRITE and thus instead of fixing all those instance, make pte_mkwrite itself take vma flag or vma. I'll revise. >-- >Thanks, > >David / dhildenb >
On 13.02.23 21:01, Deepak Gupta wrote: > On Mon, Feb 13, 2023 at 03:56:22PM +0100, David Hildenbrand wrote: >> On 13.02.23 15:37, Deepak Gupta wrote: >>> On Mon, Feb 13, 2023 at 01:05:16PM +0100, David Hildenbrand wrote: >>>> On 13.02.23 05:53, Deepak Gupta wrote: >>>>> maybe_mkwrite creates PTEs with WRITE encodings for underlying arch if >>>>> VM_WRITE is turned on in vma->vm_flags. Shadow stack memory is a write- >>>>> able memory except it can only be written by certain specific >>>>> instructions. This patch allows maybe_mkwrite to create shadow stack PTEs >>>>> if vma is shadow stack VMA. Each arch can define which combination of VMA >>>>> flags means a shadow stack. >>>>> >>>>> Additionally pte_mkshdwstk must be provided by arch specific PTE >>>>> construction headers to create shadow stack PTEs. (in arch specific >>>>> pgtable.h). >>>>> >>>>> This patch provides dummy/stub pte_mkshdwstk if CONFIG_USER_SHADOW_STACK >>>>> is not selected. >>>>> >>>>> Signed-off-by: Deepak Gupta <debug@rivosinc.com> >>>>> --- >>>>> include/linux/mm.h | 23 +++++++++++++++++++++-- >>>>> include/linux/pgtable.h | 4 ++++ >>>>> 2 files changed, 25 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h >>>>> index 8f857163ac89..a7705bc49bfe 100644 >>>>> --- a/include/linux/mm.h >>>>> +++ b/include/linux/mm.h >>>>> @@ -1093,6 +1093,21 @@ static inline unsigned long thp_size(struct page *page) >>>>> void free_compound_page(struct page *page); >>>>> #ifdef CONFIG_MMU >>>>> + >>>>> +#ifdef CONFIG_USER_SHADOW_STACK >>>>> +bool arch_is_shadow_stack_vma(struct vm_area_struct *vma); >>>>> +#endif >>>>> + >>>>> +static inline bool >>>>> +is_shadow_stack_vma(struct vm_area_struct *vma) >>>>> +{ >>>>> +#ifdef CONFIG_USER_SHADOW_STACK >>>>> + return arch_is_shadow_stack_vma(vma); >>>>> +#else >>>>> + return false; >>>>> +#endif >>>>> +} >>>>> + >>>>> /* >>>>> * Do pte_mkwrite, but only if the vma says VM_WRITE. We do this when >>>>> * servicing faults for write access. In the normal case, do always want >>>>> @@ -1101,8 +1116,12 @@ void free_compound_page(struct page *page); >>>>> */ >>>>> static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) >>>>> { >>>>> - if (likely(vma->vm_flags & VM_WRITE)) >>>>> - pte = pte_mkwrite(pte); >>>>> + if (likely(vma->vm_flags & VM_WRITE)) { >>>>> + if (unlikely(is_shadow_stack_vma(vma))) >>>>> + pte = pte_mkshdwstk(pte); >>>>> + else >>>>> + pte = pte_mkwrite(pte); >>>>> + } >>>>> return pte; >>>> >>>> Exactly what we are trying to avoid in the x86 approach right now. >>>> Please see the x86 series on details, we shouldn't try reinventing the >>>> wheel but finding a core-mm approach that fits multiple architectures. >>>> >>>> https://lkml.kernel.org/r/20230119212317.8324-1-rick.p.edgecombe@intel.com >>> >>> Thanks David for comment here. I looked at x86 approach. This patch >>> actually written in a way which is not re-inventing wheel and is following >>> a core-mm approach that fits multiple architectures. >>> >>> Change above checks `is_shadow_stack_vma` and if it returns true then only >>> it manufactures shadow stack pte else it'll make a regular writeable mapping. >>> >>> Now if we look at `is_shadow_stack_vma` implementation, it returns false if >>> `CONFIG_USER_SHADOW_STACK` is not defined. If `CONFIG_USER_SHADOW_STACK is >>> defined then it calls `arch_is_shadow_stack_vma` which should be implemented >>> by arch specific code. This allows each architecture to define their own vma >>> flag encodings for shadow stack (riscv chooses presence of only `VM_WRITE` >>> which is analogous to choosen PTE encodings on riscv W=1,R=0,X=0) >>> >>> Additionally pte_mkshdwstk will be nop if not implemented by architecture. >>> >>> Let me know if this make sense. If I am missing something here, let me know. >> >> See the discussion in that thread. The idea is to pass a VMA to >> pte_mkwrite() and let it handle how to actually set it writable. >> > > Thanks. I see. Instances where `pte_mkwrite` is directly invoked by checking > VM_WRITE and thus instead of fixing all those instance, make pte_mkwrite itself > take vma flag or vma. > > I'll revise. Thanks, it would be great to discuss in the other threads what else you would need to make it work for you. I assume Rick will have something to play with soonish (Right, Rick? :) ).
On Tue, 2023-02-14 at 13:10 +0100, David Hildenbrand wrote: > I assume Rick will have something to > play with soonish (Right, Rick? :) ). Yes, Deepak and I were discussing on the x86 series. I haven't heard anything from 0-day for a few days so looking good. There was discussion happening with Boris on the pte_modify() patch, so might wait a day more to post a new version.
diff --git a/include/linux/mm.h b/include/linux/mm.h index 8f857163ac89..a7705bc49bfe 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1093,6 +1093,21 @@ static inline unsigned long thp_size(struct page *page) void free_compound_page(struct page *page); #ifdef CONFIG_MMU + +#ifdef CONFIG_USER_SHADOW_STACK +bool arch_is_shadow_stack_vma(struct vm_area_struct *vma); +#endif + +static inline bool +is_shadow_stack_vma(struct vm_area_struct *vma) +{ +#ifdef CONFIG_USER_SHADOW_STACK + return arch_is_shadow_stack_vma(vma); +#else + return false; +#endif +} + /* * Do pte_mkwrite, but only if the vma says VM_WRITE. We do this when * servicing faults for write access. In the normal case, do always want @@ -1101,8 +1116,12 @@ void free_compound_page(struct page *page); */ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) { - if (likely(vma->vm_flags & VM_WRITE)) - pte = pte_mkwrite(pte); + if (likely(vma->vm_flags & VM_WRITE)) { + if (unlikely(is_shadow_stack_vma(vma))) + pte = pte_mkshdwstk(pte); + else + pte = pte_mkwrite(pte); + } return pte; } diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 1159b25b0542..94b157218c73 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1736,4 +1736,8 @@ pgprot_t vm_get_page_prot(unsigned long vm_flags) \ } \ EXPORT_SYMBOL(vm_get_page_prot); +#ifndef CONFIG_USER_SHADOW_STACK +#define pte_mkshdwstk(pte) pte +#endif + #endif /* _LINUX_PGTABLE_H */
maybe_mkwrite creates PTEs with WRITE encodings for underlying arch if VM_WRITE is turned on in vma->vm_flags. Shadow stack memory is a write- able memory except it can only be written by certain specific instructions. This patch allows maybe_mkwrite to create shadow stack PTEs if vma is shadow stack VMA. Each arch can define which combination of VMA flags means a shadow stack. Additionally pte_mkshdwstk must be provided by arch specific PTE construction headers to create shadow stack PTEs. (in arch specific pgtable.h). This patch provides dummy/stub pte_mkshdwstk if CONFIG_USER_SHADOW_STACK is not selected. Signed-off-by: Deepak Gupta <debug@rivosinc.com> --- include/linux/mm.h | 23 +++++++++++++++++++++-- include/linux/pgtable.h | 4 ++++ 2 files changed, 25 insertions(+), 2 deletions(-)