diff mbox series

[v2,22/39] mm: Don't allow write GUPs to shadow stack memory

Message ID 20220929222936.14584-23-rick.p.edgecombe@intel.com (mailing list archive)
State New
Headers show
Series Shadowstacks for userspace | expand

Commit Message

Rick Edgecombe Sept. 29, 2022, 10:29 p.m. UTC
Shadow stack memory is writable only in very specific, controlled ways.
However, since it is writable, the kernel treats it as such. As a result
there remain many ways for userspace to trigger the kernel to write to
shadow stack's via get_user_pages(, FOLL_WRITE) operations. To make this a
little less exposed, block writable GUPs for shadow stack VMAs.

Still allow FOLL_FORCE to write through shadow stack protections, as it
does for read-only protections.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---

v2:
 - New patch

 arch/x86/include/asm/pgtable.h | 3 +++
 mm/gup.c                       | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

Comments

Dave Hansen Sept. 30, 2022, 7:16 p.m. UTC | #1
On 9/29/22 15:29, Rick Edgecombe wrote:
> @@ -1633,6 +1633,9 @@ static inline bool __pte_access_permitted(unsigned long pteval, bool write)
>  {
>  	unsigned long need_pte_bits = _PAGE_PRESENT|_PAGE_USER;
>  
> +	if (write && (pteval & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY)
> +		return 0;

Do we not have a helper for this?  Seems a bit messy to open-code these
shadow-stack permissions.  Definitely at least needs a comment.
Rick Edgecombe Sept. 30, 2022, 8:30 p.m. UTC | #2
On Fri, 2022-09-30 at 12:16 -0700, Dave Hansen wrote:
> On 9/29/22 15:29, Rick Edgecombe wrote:
> > @@ -1633,6 +1633,9 @@ static inline bool
> > __pte_access_permitted(unsigned long pteval, bool write)
> >   {
> >        unsigned long need_pte_bits = _PAGE_PRESENT|_PAGE_USER;
> >   
> > +     if (write && (pteval & (_PAGE_RW | _PAGE_DIRTY)) ==
> > _PAGE_DIRTY)
> > +             return 0;
> 
> Do we not have a helper for this?  Seems a bit messy to open-code
> these
> shadow-stack permissions.  Definitely at least needs a comment.

It's because pteval is an unsigned long. We could create a pte_t, and
use the helpers, but then we would be using pte_foo() on pmd's, etc. So
probably comment is the better option?
Dave Hansen Sept. 30, 2022, 8:37 p.m. UTC | #3
On 9/30/22 13:30, Edgecombe, Rick P wrote:
> On Fri, 2022-09-30 at 12:16 -0700, Dave Hansen wrote:
>> On 9/29/22 15:29, Rick Edgecombe wrote:
>>> @@ -1633,6 +1633,9 @@ static inline bool
>>> __pte_access_permitted(unsigned long pteval, bool write)
>>>   {
>>>        unsigned long need_pte_bits = _PAGE_PRESENT|_PAGE_USER;
>>>
>>> +     if (write && (pteval & (_PAGE_RW | _PAGE_DIRTY)) ==
>>> _PAGE_DIRTY)
>>> +             return 0;
>> Do we not have a helper for this?  Seems a bit messy to open-code
>> these
>> shadow-stack permissions.  Definitely at least needs a comment.
> It's because pteval is an unsigned long. We could create a pte_t, and
> use the helpers, but then we would be using pte_foo() on pmd's, etc. So
> probably comment is the better option?

Yeah, a comment is probably best.

This is one of those "generic" page table functions that doesn't work
well with the p{te,md,ud}_* types.  It's either this or cast over to a
pteval_t for pmd/pud and pretend this is a pte-only function.
Jann Horn Sept. 30, 2022, 11 p.m. UTC | #4
On Fri, Sep 30, 2022 at 9:16 PM Dave Hansen <dave.hansen@intel.com> wrote:
> On 9/29/22 15:29, Rick Edgecombe wrote:
> > @@ -1633,6 +1633,9 @@ static inline bool __pte_access_permitted(unsigned long pteval, bool write)
> >  {
> >       unsigned long need_pte_bits = _PAGE_PRESENT|_PAGE_USER;
> >
> > +     if (write && (pteval & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY)
> > +             return 0;
>
> Do we not have a helper for this?  Seems a bit messy to open-code these
> shadow-stack permissions.  Definitely at least needs a comment.

FWIW, if you look at more context around this diff, the function looks
like this:

 static inline bool __pte_access_permitted(unsigned long pteval, bool write)
 {
        unsigned long need_pte_bits = _PAGE_PRESENT|_PAGE_USER;

+       if (write && (pteval & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY)
+               return 0;
+
        if (write)
                need_pte_bits |= _PAGE_RW;

        if ((pteval & need_pte_bits) != need_pte_bits)
                return 0;

        return __pkru_allows_pkey(pte_flags_pkey(pteval), write);
 }

So I think this change is actually a no-op - the only thing it does is
to return 0 if write==1, !_PAGE_RW, and _PAGE_DIRTY. But the check
below will always return 0 if !_PAGE_RW, unless I'm misreading it? And
this is the only patch in the series that touches this function, so
it's not like this becomes necessary with a later patch in the series
either.

Should this check go in anyway for clarity reasons, or should this
instead be a comment explaining that __pte_access_permitted() behaves
just like the hardware access check, which means shadow pages are
treated as readonly?
Jann Horn Sept. 30, 2022, 11:02 p.m. UTC | #5
On Sat, Oct 1, 2022 at 1:00 AM Jann Horn <jannh@google.com> wrote:
> So I think this change is actually a no-op - the only thing it does is
> to return 0 if write==1, !_PAGE_RW, and _PAGE_DIRTY. But the check
> below will always return 0 if !_PAGE_RW, unless I'm misreading it?

Er, to be precise, it will always return 0 if write==1 and !_PAGE_RW.
Rick Edgecombe Sept. 30, 2022, 11:04 p.m. UTC | #6
On Sat, 2022-10-01 at 01:00 +0200, Jann Horn wrote:
> On Fri, Sep 30, 2022 at 9:16 PM Dave Hansen <dave.hansen@intel.com>
> wrote:
> > On 9/29/22 15:29, Rick Edgecombe wrote:
> > > @@ -1633,6 +1633,9 @@ static inline bool
> > > __pte_access_permitted(unsigned long pteval, bool write)
> > >   {
> > >        unsigned long need_pte_bits = _PAGE_PRESENT|_PAGE_USER;
> > > 
> > > +     if (write && (pteval & (_PAGE_RW | _PAGE_DIRTY)) ==
> > > _PAGE_DIRTY)
> > > +             return 0;
> > 
> > Do we not have a helper for this?  Seems a bit messy to open-code
> > these
> > shadow-stack permissions.  Definitely at least needs a comment.
> 
> FWIW, if you look at more context around this diff, the function
> looks
> like this:
> 
>  static inline bool __pte_access_permitted(unsigned long pteval, bool
> write)
>  {
>         unsigned long need_pte_bits = _PAGE_PRESENT|_PAGE_USER;
> 
> +       if (write && (pteval & (_PAGE_RW | _PAGE_DIRTY)) ==
> _PAGE_DIRTY)
> +               return 0;
> +
>         if (write)
>                 need_pte_bits |= _PAGE_RW;
> 
>         if ((pteval & need_pte_bits) != need_pte_bits)
>                 return 0;
> 
>         return __pkru_allows_pkey(pte_flags_pkey(pteval), write);
>  }
> 
> So I think this change is actually a no-op - the only thing it does
> is
> to return 0 if write==1, !_PAGE_RW, and _PAGE_DIRTY. But the check
> below will always return 0 if !_PAGE_RW, unless I'm misreading it?
> And
> this is the only patch in the series that touches this function, so
> it's not like this becomes necessary with a later patch in the series
> either.
> 
> Should this check go in anyway for clarity reasons, or should this
> instead be a comment explaining that __pte_access_permitted() behaves
> just like the hardware access check, which means shadow pages are
> treated as readonly?

Thanks Jann, I was just realizing the same thing. Yes, I think it
doesn't do anything. I can add a comment of why there is no check, but
otherwise the check seems like unnecessary work.
Kees Cook Oct. 3, 2022, 6:39 p.m. UTC | #7
On Thu, Sep 29, 2022 at 03:29:19PM -0700, Rick Edgecombe wrote:
> [...]
> Still allow FOLL_FORCE to write through shadow stack protections, as it
> does for read-only protections.

As I asked in the cover letter: why do we need to add this for shstk? It
was a mistake for general memory. :P

> [...]
> diff --git a/mm/gup.c b/mm/gup.c
> index 5abdaf487460..56da98f3335c 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1043,7 +1043,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
>  		return -EFAULT;
>  
>  	if (write) {
> -		if (!(vm_flags & VM_WRITE)) {
> +		if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK)) {
>  			if (!(gup_flags & FOLL_FORCE))
>  				return -EFAULT;
>  			/*

How about this instead:

  		return -EFAULT;
  
 	if (write) {
+		if (vm_flags & VM_SHADOW_STACK)
+			return -EFAULT;
 		if (!(vm_flags & VM_WRITE)) {
 			if (!(gup_flags & FOLL_FORCE))
 				return -EFAULT;
Andy Lutomirski Oct. 3, 2022, 10:49 p.m. UTC | #8
On 10/3/22 11:39, Kees Cook wrote:
> On Thu, Sep 29, 2022 at 03:29:19PM -0700, Rick Edgecombe wrote:
>> [...]
>> Still allow FOLL_FORCE to write through shadow stack protections, as it
>> does for read-only protections.
> 
> As I asked in the cover letter: why do we need to add this for shstk? It
> was a mistake for general memory. :P

For debuggers, which use FOLL_FORCE, quite intentionally, to modify 
text.  And once a debugger has ptrace write access to a target, shadow 
stacks provide exactly no protection -- ptrace can modify text and all 
registers.

But /proc/.../mem may be a different story, and I'd be okay with having 
FOLL_PROC_MEM for legacy compatibility via /proc/.../mem and not 
allowing that to access shadow stacks.  This does seem like it may not 
be very useful, though.
Kees Cook Oct. 4, 2022, 4:21 a.m. UTC | #9
On Mon, Oct 03, 2022 at 03:49:18PM -0700, Andy Lutomirski wrote:
> On 10/3/22 11:39, Kees Cook wrote:
> > On Thu, Sep 29, 2022 at 03:29:19PM -0700, Rick Edgecombe wrote:
> > > [...]
> > > Still allow FOLL_FORCE to write through shadow stack protections, as it
> > > does for read-only protections.
> > 
> > As I asked in the cover letter: why do we need to add this for shstk? It
> > was a mistake for general memory. :P
> 
> For debuggers, which use FOLL_FORCE, quite intentionally, to modify text.
> And once a debugger has ptrace write access to a target, shadow stacks
> provide exactly no protection -- ptrace can modify text and all registers.

i.e. via ptrace? Yeah, I grudgingly accept the ptrace need for
FOLL_FORCE.

> But /proc/.../mem may be a different story, and I'd be okay with having
> FOLL_PROC_MEM for legacy compatibility via /proc/.../mem and not allowing
> that to access shadow stacks.  This does seem like it may not be very
> useful, though.

I *really* don't like the /mem use of FOLL_FORCE, though. I think the
rationale has been "using PTRACE_POKE is too slow". Again, I can live
with it, I was just hoping we could avoid expanding that questionable
behavior, especially since it's a bypass of WRSS.
diff mbox series

Patch

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 7a769c4dbc1c..2e6a5ee70034 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1633,6 +1633,9 @@  static inline bool __pte_access_permitted(unsigned long pteval, bool write)
 {
 	unsigned long need_pte_bits = _PAGE_PRESENT|_PAGE_USER;
 
+	if (write && (pteval & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY)
+		return 0;
+
 	if (write)
 		need_pte_bits |= _PAGE_RW;
 
diff --git a/mm/gup.c b/mm/gup.c
index 5abdaf487460..56da98f3335c 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1043,7 +1043,7 @@  static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 		return -EFAULT;
 
 	if (write) {
-		if (!(vm_flags & VM_WRITE)) {
+		if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK)) {
 			if (!(gup_flags & FOLL_FORCE))
 				return -EFAULT;
 			/*