Message ID | 20230218211433.26859-19-rick.p.edgecombe@intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Shadow stacks for userspace | expand |
On 18.02.23 22:14, Rick Edgecombe wrote: > From: Yu-cheng Yu <yu-cheng.yu@intel.com> > > The x86 Control-flow Enforcement Technology (CET) feature includes a new > type of memory called shadow stack. This shadow stack memory has some > unusual properties, which requires some core mm changes to function > properly. > > A shadow stack PTE must be read-only and have _PAGE_DIRTY set. However, > read-only and Dirty PTEs also exist for copy-on-write (COW) pages. These > two cases are handled differently for page faults. Introduce > VM_SHADOW_STACK to track shadow stack VMAs. I suggest simplifying and abstracting that description. "New hardware extensions implement support for shadow stack memory, such as x86 Control-flow Enforcement Technology (CET). Let's add a new VM flag to identify these areas, for example, to be used to properly indicate shadow stack PTEs to the hardware." > > Reviewed-by: Kees Cook <keescook@chromium.org> > Tested-by: Pengfei Xu <pengfei.xu@intel.com> > Tested-by: John Allen <john.allen@amd.com> > Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> > Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> > Cc: Kees Cook <keescook@chromium.org> > > --- > v6: > - Add comment about VM_SHADOW_STACK not being allowed with VM_SHARED > (David Hildenbrand) Might want to add some more meat to the patch description why that is the case. > > v3: > - Drop arch specific change in arch_vma_name(). The memory can show as > anonymous (Kirill) > - Change CONFIG_ARCH_HAS_SHADOW_STACK to CONFIG_X86_USER_SHADOW_STACK > in show_smap_vma_flags() (Boris) > --- > Documentation/filesystems/proc.rst | 1 + > fs/proc/task_mmu.c | 3 +++ > include/linux/mm.h | 8 ++++++++ > 3 files changed, 12 insertions(+) > > diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst > index e224b6d5b642..115843e8cce3 100644 > --- a/Documentation/filesystems/proc.rst > +++ b/Documentation/filesystems/proc.rst > @@ -564,6 +564,7 @@ encoded manner. The codes are the following: > mt arm64 MTE allocation tags are enabled > um userfaultfd missing tracking > uw userfaultfd wr-protect tracking > + ss shadow stack page > == ======================================= > > Note that there is no guarantee that every flag and associated mnemonic will > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > index af1c49ae11b1..9e2cefe47749 100644 > --- a/fs/proc/task_mmu.c > +++ b/fs/proc/task_mmu.c > @@ -711,6 +711,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) > #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR > [ilog2(VM_UFFD_MINOR)] = "ui", > #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */ > +#ifdef CONFIG_X86_USER_SHADOW_STACK > + [ilog2(VM_SHADOW_STACK)] = "ss", > +#endif > }; > size_t i; > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index e6f1789c8e69..76e0a09aeffe 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -315,11 +315,13 @@ extern unsigned int kobjsize(const void *objp); > #define VM_HIGH_ARCH_BIT_2 34 /* bit only usable on 64-bit architectures */ > #define VM_HIGH_ARCH_BIT_3 35 /* bit only usable on 64-bit architectures */ > #define VM_HIGH_ARCH_BIT_4 36 /* bit only usable on 64-bit architectures */ > +#define VM_HIGH_ARCH_BIT_5 37 /* bit only usable on 64-bit architectures */ > #define VM_HIGH_ARCH_0 BIT(VM_HIGH_ARCH_BIT_0) > #define VM_HIGH_ARCH_1 BIT(VM_HIGH_ARCH_BIT_1) > #define VM_HIGH_ARCH_2 BIT(VM_HIGH_ARCH_BIT_2) > #define VM_HIGH_ARCH_3 BIT(VM_HIGH_ARCH_BIT_3) > #define VM_HIGH_ARCH_4 BIT(VM_HIGH_ARCH_BIT_4) > +#define VM_HIGH_ARCH_5 BIT(VM_HIGH_ARCH_BIT_5) > #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */ > > #ifdef CONFIG_ARCH_HAS_PKEYS > @@ -335,6 +337,12 @@ extern unsigned int kobjsize(const void *objp); > #endif > #endif /* CONFIG_ARCH_HAS_PKEYS */ > > +#ifdef CONFIG_X86_USER_SHADOW_STACK Should we abstract this to CONFIG_ARCH_USER_SHADOW_STACK, seeing that other architectures might similarly need it?
On Mon, 2023-02-20 at 13:56 +0100, David Hildenbrand wrote: > On 18.02.23 22:14, Rick Edgecombe wrote: > > From: Yu-cheng Yu <yu-cheng.yu@intel.com> > > > > The x86 Control-flow Enforcement Technology (CET) feature includes > > a new > > type of memory called shadow stack. This shadow stack memory has > > some > > unusual properties, which requires some core mm changes to function > > properly. > > > > A shadow stack PTE must be read-only and have _PAGE_DIRTY set. > > However, > > read-only and Dirty PTEs also exist for copy-on-write (COW) pages. > > These > > two cases are handled differently for page faults. Introduce > > VM_SHADOW_STACK to track shadow stack VMAs. > > I suggest simplifying and abstracting that description. > > "New hardware extensions implement support for shadow stack memory, > such > as x86 Control-flow Enforcement Technology (CET). Let's add a new VM > flag to identify these areas, for example, to be used to properly > indicate shadow stack PTEs to the hardware." Ah yea, that top blurb was added to all the non-x86 arch patches after some feedback from Andrew Morton. He had said basically (in some more colorful language) that the changelogs (at the time) were written assuming the reader knows what a shadow stack is. So it might be worth keeping a little more info in the log? > > > > > Reviewed-by: Kees Cook <keescook@chromium.org> > > Tested-by: Pengfei Xu <pengfei.xu@intel.com> > > Tested-by: John Allen <john.allen@amd.com> > > Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> > > Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > > Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> > > Cc: Kees Cook <keescook@chromium.org> > > > > --- > > v6: > > - Add comment about VM_SHADOW_STACK not being allowed with > > VM_SHARED > > (David Hildenbrand) > > Might want to add some more meat to the patch description why that > is > the case. Sure. > > > > > v3: > > - Drop arch specific change in arch_vma_name(). The memory can > > show as > > anonymous (Kirill) > > - Change CONFIG_ARCH_HAS_SHADOW_STACK to > > CONFIG_X86_USER_SHADOW_STACK > > in show_smap_vma_flags() (Boris) > > --- > > Documentation/filesystems/proc.rst | 1 + > > fs/proc/task_mmu.c | 3 +++ > > include/linux/mm.h | 8 ++++++++ > > 3 files changed, 12 insertions(+) > > > > diff --git a/Documentation/filesystems/proc.rst > > b/Documentation/filesystems/proc.rst > > index e224b6d5b642..115843e8cce3 100644 > > --- a/Documentation/filesystems/proc.rst > > +++ b/Documentation/filesystems/proc.rst > > @@ -564,6 +564,7 @@ encoded manner. The codes are the following: > > mt arm64 MTE allocation tags are enabled > > um userfaultfd missing tracking > > uw userfaultfd wr-protect tracking > > + ss shadow stack page > > == ======================================= > > > > Note that there is no guarantee that every flag and associated > > mnemonic will > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > > index af1c49ae11b1..9e2cefe47749 100644 > > --- a/fs/proc/task_mmu.c > > +++ b/fs/proc/task_mmu.c > > @@ -711,6 +711,9 @@ static void show_smap_vma_flags(struct seq_file > > *m, struct vm_area_struct *vma) > > #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR > > [ilog2(VM_UFFD_MINOR)] = "ui", > > #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */ > > +#ifdef CONFIG_X86_USER_SHADOW_STACK > > + [ilog2(VM_SHADOW_STACK)] = "ss", > > +#endif > > }; > > size_t i; > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h > > index e6f1789c8e69..76e0a09aeffe 100644 > > --- a/include/linux/mm.h > > +++ b/include/linux/mm.h > > @@ -315,11 +315,13 @@ extern unsigned int kobjsize(const void > > *objp); > > #define VM_HIGH_ARCH_BIT_2 34 /* bit only usable on 64- > > bit architectures */ > > #define VM_HIGH_ARCH_BIT_3 35 /* bit only usable on 64- > > bit architectures */ > > #define VM_HIGH_ARCH_BIT_4 36 /* bit only usable on 64- > > bit architectures */ > > +#define VM_HIGH_ARCH_BIT_5 37 /* bit only usable on 64-bit > > architectures */ > > #define VM_HIGH_ARCH_0 BIT(VM_HIGH_ARCH_BIT_0) > > #define VM_HIGH_ARCH_1 BIT(VM_HIGH_ARCH_BIT_1) > > #define VM_HIGH_ARCH_2 BIT(VM_HIGH_ARCH_BIT_2) > > #define VM_HIGH_ARCH_3 BIT(VM_HIGH_ARCH_BIT_3) > > #define VM_HIGH_ARCH_4 BIT(VM_HIGH_ARCH_BIT_4) > > +#define VM_HIGH_ARCH_5 BIT(VM_HIGH_ARCH_BIT_5) > > #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */ > > > > #ifdef CONFIG_ARCH_HAS_PKEYS > > @@ -335,6 +337,12 @@ extern unsigned int kobjsize(const void > > *objp); > > #endif > > #endif /* CONFIG_ARCH_HAS_PKEYS */ > > > > +#ifdef CONFIG_X86_USER_SHADOW_STACK > > > Should we abstract this to CONFIG_ARCH_USER_SHADOW_STACK, seeing > that > other architectures might similarly need it? There was an ARCH_HAS_SHADOW_STACK but it got removed following this discussion: https://lore.kernel.org/lkml/d09e952d8ae696f687f0787dfeb7be7699c02913.camel@intel.com/ Now we have this new RFC for riscv as potentially a second implementation. But it is still very early, and I'm not sure anyone knows exactly what the similarities will be in a mature version. So I think it would be better to refactor in an ARCH_HAS_SHADOW_STACK later (and similar abstractions) once that series is more mature and we have an idea of what pieces will be shared. I don't have a problem in principle with an ARCH config, just don't think we should do it yet. >
On 20.02.23 23:08, Edgecombe, Rick P wrote: > On Mon, 2023-02-20 at 13:56 +0100, David Hildenbrand wrote: >> On 18.02.23 22:14, Rick Edgecombe wrote: >>> From: Yu-cheng Yu <yu-cheng.yu@intel.com> >>> >>> The x86 Control-flow Enforcement Technology (CET) feature includes >>> a new >>> type of memory called shadow stack. This shadow stack memory has >>> some >>> unusual properties, which requires some core mm changes to function >>> properly. >>> >>> A shadow stack PTE must be read-only and have _PAGE_DIRTY set. >>> However, >>> read-only and Dirty PTEs also exist for copy-on-write (COW) pages. >>> These >>> two cases are handled differently for page faults. Introduce >>> VM_SHADOW_STACK to track shadow stack VMAs. >> >> I suggest simplifying and abstracting that description. >> >> "New hardware extensions implement support for shadow stack memory, >> such >> as x86 Control-flow Enforcement Technology (CET). Let's add a new VM >> flag to identify these areas, for example, to be used to properly >> indicate shadow stack PTEs to the hardware." > > Ah yea, that top blurb was added to all the non-x86 arch patches after > some feedback from Andrew Morton. He had said basically (in some more > colorful language) that the changelogs (at the time) were written > assuming the reader knows what a shadow stack is. Okay. It's a bit repetitive, though. Ideally, we'd just explain it in the cover letter in detail and Andrews's script would include the cover letter in the first commit. IIRC, that's what usually happens. > > So it might be worth keeping a little more info in the log? Copying the same paragraph into each commit is IMHO a bit repetitive. But these are just my 2 cents. [...] >> Should we abstract this to CONFIG_ARCH_USER_SHADOW_STACK, seeing >> that >> other architectures might similarly need it? > > There was an ARCH_HAS_SHADOW_STACK but it got removed following this > discussion: > > https://lore.kernel.org/lkml/d09e952d8ae696f687f0787dfeb7be7699c02913.camel@intel.com/ > > Now we have this new RFC for riscv as potentially a second > implementation. But it is still very early, and I'm not sure anyone > knows exactly what the similarities will be in a mature version. So I > think it would be better to refactor in an ARCH_HAS_SHADOW_STACK later > (and similar abstractions) once that series is more mature and we have > an idea of what pieces will be shared. I don't have a problem in > principle with an ARCH config, just don't think we should do it yet. Okay, easy to factor out later. Acked-by: David Hildenbrand <david@redhat.com>
On Tue, Feb 21, 2023 at 09:34:35AM +0100, David Hildenbrand wrote: >On 20.02.23 23:08, Edgecombe, Rick P wrote: >>On Mon, 2023-02-20 at 13:56 +0100, David Hildenbrand wrote: >>>On 18.02.23 22:14, Rick Edgecombe wrote: >>>>From: Yu-cheng Yu <yu-cheng.yu@intel.com> >>>> >>>>The x86 Control-flow Enforcement Technology (CET) feature includes >>>>a new >>>>type of memory called shadow stack. This shadow stack memory has >>>>some >>>>unusual properties, which requires some core mm changes to function >>>>properly. >>>> >>>>A shadow stack PTE must be read-only and have _PAGE_DIRTY set. >>>>However, >>>>read-only and Dirty PTEs also exist for copy-on-write (COW) pages. >>>>These >>>>two cases are handled differently for page faults. Introduce >>>>VM_SHADOW_STACK to track shadow stack VMAs. >>> >>>I suggest simplifying and abstracting that description. >>> >>>"New hardware extensions implement support for shadow stack memory, >>>such >>>as x86 Control-flow Enforcement Technology (CET). Let's add a new VM >>>flag to identify these areas, for example, to be used to properly >>>indicate shadow stack PTEs to the hardware." >> >>Ah yea, that top blurb was added to all the non-x86 arch patches after >>some feedback from Andrew Morton. He had said basically (in some more >>colorful language) that the changelogs (at the time) were written >>assuming the reader knows what a shadow stack is. > >Okay. It's a bit repetitive, though. > >Ideally, we'd just explain it in the cover letter in detail and >Andrews's script would include the cover letter in the first commit. >IIRC, that's what usually happens. > >> >>So it might be worth keeping a little more info in the log? > >Copying the same paragraph into each commit is IMHO a bit repetitive. >But these are just my 2 cents. > >[...] > >>>Should we abstract this to CONFIG_ARCH_USER_SHADOW_STACK, seeing >>>that >>>other architectures might similarly need it? >> >>There was an ARCH_HAS_SHADOW_STACK but it got removed following this >>discussion: >> >>https://lore.kernel.org/lkml/d09e952d8ae696f687f0787dfeb7be7699c02913.camel@intel.com/ >> >>Now we have this new RFC for riscv as potentially a second >>implementation. But it is still very early, and I'm not sure anyone >>knows exactly what the similarities will be in a mature version. So I >>think it would be better to refactor in an ARCH_HAS_SHADOW_STACK later >>(and similar abstractions) once that series is more mature and we have >>an idea of what pieces will be shared. I don't have a problem in >>principle with an ARCH config, just don't think we should do it yet. > >Okay, easy to factor out later. I would be more than happy if this config name would've been abstracted out and arches can choose to implement. It's a bit sad that it was generic earlier and was later changed due to lack of support from other architectures. Now there are three architectures who either already support shadow stack (x86), announced the support (aarch64) or are planning to support (riscv). However given patch reduction I will get due to `pte_mkwrite` refactor, I am in favor of future refactor for config. > >Acked-by: David Hildenbrand <david@redhat.com> > >-- >Thanks, > >David / dhildenb >
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst index e224b6d5b642..115843e8cce3 100644 --- a/Documentation/filesystems/proc.rst +++ b/Documentation/filesystems/proc.rst @@ -564,6 +564,7 @@ encoded manner. The codes are the following: mt arm64 MTE allocation tags are enabled um userfaultfd missing tracking uw userfaultfd wr-protect tracking + ss shadow stack page == ======================================= Note that there is no guarantee that every flag and associated mnemonic will diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index af1c49ae11b1..9e2cefe47749 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -711,6 +711,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR [ilog2(VM_UFFD_MINOR)] = "ui", #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */ +#ifdef CONFIG_X86_USER_SHADOW_STACK + [ilog2(VM_SHADOW_STACK)] = "ss", +#endif }; size_t i; diff --git a/include/linux/mm.h b/include/linux/mm.h index e6f1789c8e69..76e0a09aeffe 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -315,11 +315,13 @@ extern unsigned int kobjsize(const void *objp); #define VM_HIGH_ARCH_BIT_2 34 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_BIT_3 35 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_BIT_4 36 /* bit only usable on 64-bit architectures */ +#define VM_HIGH_ARCH_BIT_5 37 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_0 BIT(VM_HIGH_ARCH_BIT_0) #define VM_HIGH_ARCH_1 BIT(VM_HIGH_ARCH_BIT_1) #define VM_HIGH_ARCH_2 BIT(VM_HIGH_ARCH_BIT_2) #define VM_HIGH_ARCH_3 BIT(VM_HIGH_ARCH_BIT_3) #define VM_HIGH_ARCH_4 BIT(VM_HIGH_ARCH_BIT_4) +#define VM_HIGH_ARCH_5 BIT(VM_HIGH_ARCH_BIT_5) #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */ #ifdef CONFIG_ARCH_HAS_PKEYS @@ -335,6 +337,12 @@ extern unsigned int kobjsize(const void *objp); #endif #endif /* CONFIG_ARCH_HAS_PKEYS */ +#ifdef CONFIG_X86_USER_SHADOW_STACK +# define VM_SHADOW_STACK VM_HIGH_ARCH_5 /* Should not be set with VM_SHARED */ +#else +# define VM_SHADOW_STACK VM_NONE +#endif + #if defined(CONFIG_X86) # define VM_PAT VM_ARCH_1 /* PAT reserves whole VMA at once (x86) */ #elif defined(CONFIG_PPC)