diff mbox series

[RFC,v2] mm: Enforce the stack gap when changing inaccessible VMAs

Message ID 20241011-stack-gap-inaccessible-v2-1-111b6a0ee2cb@google.com (mailing list archive)
State New
Headers show
Series [RFC,v2] mm: Enforce the stack gap when changing inaccessible VMAs | expand

Commit Message

Jann Horn Oct. 11, 2024, 3:50 p.m. UTC
As explained in the comment block this change adds, we can't tell what
userspace's intent is when the stack grows towards an inaccessible VMA.

We should ensure that, as long as code is compiled with something like
-fstack-check, a stack overflow in this code can never cause the main stack
to overflow into adjacent heap memory - so the bottom of a stack should
never be directly adjacent to an accessible VMA.

As suggested by Lorenzo, enforce this by blocking attempts to:

 - make an inaccessible VMA accessible with mprotect() when it is too close
   to a stack
 - replace an inaccessible VMA with another VMA using MAP_FIXED when it is
   too close to a stack


I have a (highly contrived) C testcase for 32-bit x86 userspace with glibc
that mixes malloc(), pthread creation, and recursion in just the right way
such that the main stack overflows into malloc() arena memory, see the
linked list post.

I don't know of any specific scenario where this is actually exploitable,
but it seems like it could be a security problem for sufficiently unlucky
userspace.

Link: https://lore.kernel.org/r/CAG48ez2v=r9-37JADA5DgnZdMLCjcbVxAjLt5eH5uoBohRdqsw@mail.gmail.com/
Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Fixes: 561b5e0709e4 ("mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
---
This is an attempt at the alternate fix approach suggested by Lorenzo.

This turned into more code than I would prefer to have for a scenario
like this...

Also, the way the gap is enforced in my patch for MAP_FIXED_NOREPLACE is
a bit ugly. In the existing code, __get_unmapped_area() normally already
enforces the stack gap even when it is called with a hint; but when
MAP_FIXED_NOREPLACE is used, we kinda lie to __get_unmapped_area() and
tell it we'll do a MAP_FIXED mapping (introduced in commit
a4ff8e8620d3f when MAP_FIXED_NOREPLACE was created), then afterwards
manually reject overlapping mappings.
So I ended up also doing the gap check separately for
MAP_FIXED_NOREPLACE.

The following test program exercises scenarios that could lead to the
stack becoming directly adjacent to another accessible VMA,
and passes with this patch applied:
<<<

int main(void) {
  setbuf(stdout, NULL);

  char *ptr = (char*)(  (unsigned long)(STACK_POINTER() - (1024*1024*4)/*4MiB*/) & ~0xfffUL  );
  if (mmap(ptr, 0x1000, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0) != ptr)
    err(1, "mmap distant-from-stack");
  *(volatile char *)(ptr + 0x1000); /* expand stack */
  system("echo;cat /proc/$PPID/maps;echo");

  /* test transforming PROT_NONE mapping adjacent to stack */
  if (mprotect(ptr, 0x1000, PROT_READ|PROT_WRITE|PROT_EXEC) == 0)
    errx(1, "mprotect adjacent to stack allowed");
  if (mmap(ptr, 0x1000, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED, -1, 0) != MAP_FAILED)
    errx(1, "MAP_FIXED adjacent to stack allowed");

  if (munmap(ptr, 0x1000))
    err(1, "munmap failed???");

  /* test creating new mapping adjacent to stack */
  if (mmap(ptr, 0x1000, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED_NOREPLACE, -1, 0) != MAP_FAILED)
    errx(1, "MAP_FIXED_NOREPLACE adjacent to stack allowed");
  if (mmap(ptr, 0x1000, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0) == ptr)
    errx(1, "mmap hint adjacent to stack accepted");

  printf("all tests passed\n");
}
>>>
---
Changes in v2:
- Entirely new approach (suggested by Lorenzo)
- Link to v1: https://lore.kernel.org/r/20241008-stack-gap-inaccessible-v1-1-848d4d891f21@google.com
---
 include/linux/mm.h |  1 +
 mm/mmap.c          | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/mprotect.c      |  6 +++++
 3 files changed, 72 insertions(+)


---
base-commit: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b
change-id: 20241008-stack-gap-inaccessible-c7319f7d4b1b

Comments

Liam R. Howlett Oct. 11, 2024, 5:54 p.m. UTC | #1
* Jann Horn <jannh@google.com> [241011 11:51]:
> As explained in the comment block this change adds, we can't tell what
> userspace's intent is when the stack grows towards an inaccessible VMA.
> 
> We should ensure that, as long as code is compiled with something like
> -fstack-check, a stack overflow in this code can never cause the main stack
> to overflow into adjacent heap memory - so the bottom of a stack should
> never be directly adjacent to an accessible VMA.
> 
> As suggested by Lorenzo, enforce this by blocking attempts to:
> 
>  - make an inaccessible VMA accessible with mprotect() when it is too close
>    to a stack
>  - replace an inaccessible VMA with another VMA using MAP_FIXED when it is
>    too close to a stack
> 
> 
> I have a (highly contrived) C testcase for 32-bit x86 userspace with glibc
> that mixes malloc(), pthread creation, and recursion in just the right way
> such that the main stack overflows into malloc() arena memory, see the
> linked list post.
> 
> I don't know of any specific scenario where this is actually exploitable,
> but it seems like it could be a security problem for sufficiently unlucky
> userspace.
> 
> Link: https://lore.kernel.org/r/CAG48ez2v=r9-37JADA5DgnZdMLCjcbVxAjLt5eH5uoBohRdqsw@mail.gmail.com/
> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Fixes: 561b5e0709e4 ("mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack")
> Cc: stable@vger.kernel.org
> Signed-off-by: Jann Horn <jannh@google.com>
> ---
> This is an attempt at the alternate fix approach suggested by Lorenzo.
> 
> This turned into more code than I would prefer to have for a scenario
> like this...
> 
> Also, the way the gap is enforced in my patch for MAP_FIXED_NOREPLACE is
> a bit ugly. In the existing code, __get_unmapped_area() normally already
> enforces the stack gap even when it is called with a hint; but when
> MAP_FIXED_NOREPLACE is used, we kinda lie to __get_unmapped_area() and
> tell it we'll do a MAP_FIXED mapping (introduced in commit
> a4ff8e8620d3f when MAP_FIXED_NOREPLACE was created), then afterwards
> manually reject overlapping mappings.
> So I ended up also doing the gap check separately for
> MAP_FIXED_NOREPLACE.
> 
> The following test program exercises scenarios that could lead to the
> stack becoming directly adjacent to another accessible VMA,
> and passes with this patch applied:
> <<<
> 
> int main(void) {
>   setbuf(stdout, NULL);
> 
>   char *ptr = (char*)(  (unsigned long)(STACK_POINTER() - (1024*1024*4)/*4MiB*/) & ~0xfffUL  );
>   if (mmap(ptr, 0x1000, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0) != ptr)
>     err(1, "mmap distant-from-stack");
>   *(volatile char *)(ptr + 0x1000); /* expand stack */
>   system("echo;cat /proc/$PPID/maps;echo");
> 
>   /* test transforming PROT_NONE mapping adjacent to stack */
>   if (mprotect(ptr, 0x1000, PROT_READ|PROT_WRITE|PROT_EXEC) == 0)
>     errx(1, "mprotect adjacent to stack allowed");
>   if (mmap(ptr, 0x1000, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED, -1, 0) != MAP_FAILED)
>     errx(1, "MAP_FIXED adjacent to stack allowed");
> 
>   if (munmap(ptr, 0x1000))
>     err(1, "munmap failed???");
> 
>   /* test creating new mapping adjacent to stack */
>   if (mmap(ptr, 0x1000, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED_NOREPLACE, -1, 0) != MAP_FAILED)
>     errx(1, "MAP_FIXED_NOREPLACE adjacent to stack allowed");
>   if (mmap(ptr, 0x1000, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0) == ptr)
>     errx(1, "mmap hint adjacent to stack accepted");
> 
>   printf("all tests passed\n");
> }
> >>>
> ---
> Changes in v2:
> - Entirely new approach (suggested by Lorenzo)
> - Link to v1: https://lore.kernel.org/r/20241008-stack-gap-inaccessible-v1-1-848d4d891f21@google.com
> ---
>  include/linux/mm.h |  1 +
>  mm/mmap.c          | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  mm/mprotect.c      |  6 +++++
>  3 files changed, 72 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ecf63d2b0582..ecd4afc304ca 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3520,6 +3520,7 @@ unsigned long change_prot_numa(struct vm_area_struct *vma,
>  
>  struct vm_area_struct *find_extend_vma_locked(struct mm_struct *,
>  		unsigned long addr);
> +bool overlaps_stack_gap(struct mm_struct *mm, unsigned long addr, unsigned long len);
>  int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
>  			unsigned long pfn, unsigned long size, pgprot_t);
>  int remap_pfn_range_notrack(struct vm_area_struct *vma, unsigned long addr,
> diff --git a/mm/mmap.c b/mm/mmap.c
> index dd4b35a25aeb..937361be3c48 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -359,6 +359,20 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>  			return -EEXIST;
>  	}
>  
> +	/*
> +	 * This does two things:
> +	 *
> +	 * 1. Disallow MAP_FIXED replacing a PROT_NONE VMA adjacent to a stack
> +	 * with an accessible VMA.
> +	 * 2. Disallow MAP_FIXED_NOREPLACE creating a new accessible VMA
> +	 * adjacent to a stack.
> +	 */
> +	if ((flags & (MAP_FIXED_NOREPLACE | MAP_FIXED)) &&
> +	    (prot & (PROT_READ | PROT_WRITE | PROT_EXEC)) &&
> +	    !(vm_flags & (VM_GROWSUP|VM_GROWSDOWN)) &&
> +	    overlaps_stack_gap(mm, addr, len))
> +		return (flags & MAP_FIXED) ? -ENOMEM : -EEXIST;
> +

This is probably going to impact performance for allocators by causing
two walks of the tree any time they protect a portion of mmaped area.

In the mmap_region() code, there is a place we know next/prev on
MAP_FIXED, and next for MAP_FIXED_NOREPLACE - which has a vma iterator
that would be lower cost than a tree walk.  That area may be a better
place to check these requirements.  Unfortunately, it may cause a vma
split in the vms_gather_munmap_vmas() call prior to this check, but
considering the rarity it may not be that big of a deal?

>  	if (flags & MAP_LOCKED)
>  		if (!can_do_mlock())
>  			return -EPERM;
> @@ -1341,6 +1355,57 @@ struct vm_area_struct *expand_stack(struct mm_struct *mm, unsigned long addr)
>  	return vma;
>  }
>  
> +/*
> + * Does the specified VA range overlap the stack gap of a preceding or following
> + * stack VMA?
> + * Overlapping stack VMAs are ignored - so if someone deliberately creates a
> + * MAP_FIXED mapping in the middle of a stack or such, we let that go through.
> + *
> + * This is needed partly because userspace's intent when making PROT_NONE
> + * mappings is unclear; there are two different reasons for creating PROT_NONE
> + * mappings:
> + *
> + * A) Userspace wants to create its own guard mapping, for example for stacks.
> + * According to
> + * <https://lore.kernel.org/all/1499126133.2707.20.camel@decadent.org.uk/T/>,
> + * some Rust/Java programs do this with the main stack.
> + * Enforcing the kernel's stack gap between these userspace guard mappings and
> + * the main stack breaks stuff.
> + *
> + * B) Userspace wants to reserve some virtual address space for later mappings.
> + * This is done by memory allocators.
> + * In this case, we want to enforce a stack gap between the mapping and the
> + * stack.
> + *
> + * Because we can't tell these cases apart when a PROT_NONE mapping is created,
> + * we instead enforce the stack gap when a PROT_NONE mapping is made accessible
> + * (using mprotect()) or replaced with an accessible one (using MAP_FIXED).
> + */
> +bool overlaps_stack_gap(struct mm_struct *mm, unsigned long addr, unsigned long len)
> +{
> +
> +	struct vm_area_struct *vma, *prev_vma;
> +
> +	/* step 1: search for a non-overlapping following stack VMA */
> +	vma = find_vma(mm, addr+len);
> +	if (vma && vma->vm_start >= addr+len) {
> +		/* is it too close? */
> +		if (vma->vm_start - (addr+len) < stack_guard_start_gap(vma))
> +			return true;
> +	}
> +
> +	/* step 2: search for a non-overlapping preceding stack VMA */
> +	if (!IS_ENABLED(CONFIG_STACK_GROWSUP))
> +		return false;
> +	vma = find_vma_prev(mm, addr, &prev_vma);
> +	/* don't handle cases where the VA start overlaps a VMA */
> +	if (vma && vma->vm_start < addr)
> +		return false;
> +	if (!prev_vma || !(prev_vma->vm_flags & VM_GROWSUP))
> +		return false;
> +	return addr - prev_vma->vm_end < stack_guard_gap;
> +}
> +
>  /* do_munmap() - Wrapper function for non-maple tree aware do_munmap() calls.
>   * @mm: The mm_struct
>   * @start: The start address to munmap
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 0c5d6d06107d..2300e2eff956 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -772,6 +772,12 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
>  		}
>  	}
>  
> +	error = -ENOMEM;
> +	if ((prot & (PROT_READ | PROT_WRITE | PROT_EXEC)) &&
> +	    !(vma->vm_flags & (VM_GROWSUP|VM_GROWSDOWN)) &&
> +	    overlaps_stack_gap(current->mm, start, end - start))
> +		goto out;
> +

We have prev just below your call here, so we could reuse that.  Getting
the vma after the mprotect range doesn't seem that easy.  I guess we
need to make the loop even more complicated and find the next vma (and
remember the fixup can merge).  This isn't as straight forward as what
you have, but would be faster.

>  	prev = vma_prev(&vmi);
>  	if (start > vma->vm_start)
>  		prev = vma;
> 
> ---
> base-commit: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b
> change-id: 20241008-stack-gap-inaccessible-c7319f7d4b1b
> -- 
> Jann Horn <jannh@google.com>
>
Jann Horn Oct. 11, 2024, 6:46 p.m. UTC | #2
On Fri, Oct 11, 2024 at 7:55 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
> * Jann Horn <jannh@google.com> [241011 11:51]:
> > As explained in the comment block this change adds, we can't tell what
> > userspace's intent is when the stack grows towards an inaccessible VMA.
> >
> > We should ensure that, as long as code is compiled with something like
> > -fstack-check, a stack overflow in this code can never cause the main stack
> > to overflow into adjacent heap memory - so the bottom of a stack should
> > never be directly adjacent to an accessible VMA.
[...]
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index dd4b35a25aeb..937361be3c48 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -359,6 +359,20 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> >                       return -EEXIST;
> >       }
> >
> > +     /*
> > +      * This does two things:
> > +      *
> > +      * 1. Disallow MAP_FIXED replacing a PROT_NONE VMA adjacent to a stack
> > +      * with an accessible VMA.
> > +      * 2. Disallow MAP_FIXED_NOREPLACE creating a new accessible VMA
> > +      * adjacent to a stack.
> > +      */
> > +     if ((flags & (MAP_FIXED_NOREPLACE | MAP_FIXED)) &&
> > +         (prot & (PROT_READ | PROT_WRITE | PROT_EXEC)) &&
> > +         !(vm_flags & (VM_GROWSUP|VM_GROWSDOWN)) &&
> > +         overlaps_stack_gap(mm, addr, len))
> > +             return (flags & MAP_FIXED) ? -ENOMEM : -EEXIST;
> > +
>
> This is probably going to impact performance for allocators by causing
> two walks of the tree any time they protect a portion of mmaped area.

Well, it's one extra walk except on parisc, thanks to the "if
(!IS_ENABLED(CONFIG_STACK_GROWSUP))" bailout - but point taken, it
would be better to avoid that.

> In the mmap_region() code, there is a place we know next/prev on
> MAP_FIXED, and next for MAP_FIXED_NOREPLACE - which has a vma iterator
> that would be lower cost than a tree walk.  That area may be a better
> place to check these requirements.  Unfortunately, it may cause a vma
> split in the vms_gather_munmap_vmas() call prior to this check, but
> considering the rarity it may not be that big of a deal?

Hmm, yeah, that sounds fine to me.

[...]
> > diff --git a/mm/mprotect.c b/mm/mprotect.c
> > index 0c5d6d06107d..2300e2eff956 100644
> > --- a/mm/mprotect.c
> > +++ b/mm/mprotect.c
> > @@ -772,6 +772,12 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
> >               }
> >       }
> >
> > +     error = -ENOMEM;
> > +     if ((prot & (PROT_READ | PROT_WRITE | PROT_EXEC)) &&
> > +         !(vma->vm_flags & (VM_GROWSUP|VM_GROWSDOWN)) &&
> > +         overlaps_stack_gap(current->mm, start, end - start))
> > +             goto out;
> > +
>
> We have prev just below your call here, so we could reuse that.  Getting
> the vma after the mprotect range doesn't seem that easy.  I guess we
> need to make the loop even more complicated and find the next vma (and
> remember the fixup can merge).  This isn't as straight forward as what
> you have, but would be faster.

For mprotect, maybe one option would be to do it inside the loop?
Something like this:

```
diff --git a/mm/mprotect.c b/mm/mprotect.c
index d0e3ebfadef8..2873cc254eaf 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -790,6 +790,24 @@ static int do_mprotect_pkey(unsigned long start,
size_t len,
                        break;
                }

+               if (IS_ENABLED(CONFIG_STACK_GROWSUP) && vma->vm_start
== start) {
+                       /* just do an extra lookup here, we do this
only on parisc */
+                       if (overlaps_stack_gap_growsup([...])) {
+                               error = -ENOMEM;
+                               break;
+                       }
+               }
+               if (vma->vm_end == end) {
+                       /* peek ahead */
+                       struct vma_iterator vmi_peek = vmi;
+                       struct vm_area_struct *next = vma_next(&vmi_peek);
+
+                       if (next && overlaps_stack_gap_growsdown([...], next)) {
+                               error = -ENOMEM;
+                               break;
+                       }
+               }
+
                /* Does the application expect PROT_READ to imply PROT_EXEC */
                if (rier && (vma->vm_flags & VM_MAYEXEC))
                        prot |= PROT_EXEC;
```

Assuming that well-behaved userspace only calls mprotect() ranges that
are fully covered by VMAs, that should be good enough?

(I don't know how you feel about the idea of peeking ahead from a VMA
iterator by copying the iterator, I imagine you might have a better
way to do that...)

> >       prev = vma_prev(&vmi);
> >       if (start > vma->vm_start)
> >               prev = vma;
Liam R. Howlett Oct. 11, 2024, 8:09 p.m. UTC | #3
* Jann Horn <jannh@google.com> [241011 14:47]:
> On Fri, Oct 11, 2024 at 7:55 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
> > * Jann Horn <jannh@google.com> [241011 11:51]:
> > > As explained in the comment block this change adds, we can't tell what
> > > userspace's intent is when the stack grows towards an inaccessible VMA.
> > >
> > > We should ensure that, as long as code is compiled with something like
> > > -fstack-check, a stack overflow in this code can never cause the main stack
> > > to overflow into adjacent heap memory - so the bottom of a stack should
> > > never be directly adjacent to an accessible VMA.
> [...]
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index dd4b35a25aeb..937361be3c48 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -359,6 +359,20 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> > >                       return -EEXIST;
> > >       }
> > >
> > > +     /*
> > > +      * This does two things:
> > > +      *
> > > +      * 1. Disallow MAP_FIXED replacing a PROT_NONE VMA adjacent to a stack
> > > +      * with an accessible VMA.
> > > +      * 2. Disallow MAP_FIXED_NOREPLACE creating a new accessible VMA
> > > +      * adjacent to a stack.
> > > +      */
> > > +     if ((flags & (MAP_FIXED_NOREPLACE | MAP_FIXED)) &&
> > > +         (prot & (PROT_READ | PROT_WRITE | PROT_EXEC)) &&
> > > +         !(vm_flags & (VM_GROWSUP|VM_GROWSDOWN)) &&
> > > +         overlaps_stack_gap(mm, addr, len))
> > > +             return (flags & MAP_FIXED) ? -ENOMEM : -EEXIST;
> > > +
> >
> > This is probably going to impact performance for allocators by causing
> > two walks of the tree any time they protect a portion of mmaped area.
> 
> Well, it's one extra walk except on parisc, thanks to the "if
> (!IS_ENABLED(CONFIG_STACK_GROWSUP))" bailout - but point taken, it
> would be better to avoid that.
> 
> > In the mmap_region() code, there is a place we know next/prev on
> > MAP_FIXED, and next for MAP_FIXED_NOREPLACE - which has a vma iterator
> > that would be lower cost than a tree walk.  That area may be a better
> > place to check these requirements.  Unfortunately, it may cause a vma
> > split in the vms_gather_munmap_vmas() call prior to this check, but
> > considering the rarity it may not be that big of a deal?
> 
> Hmm, yeah, that sounds fine to me.
> 
> [...]
> > > diff --git a/mm/mprotect.c b/mm/mprotect.c
> > > index 0c5d6d06107d..2300e2eff956 100644
> > > --- a/mm/mprotect.c
> > > +++ b/mm/mprotect.c
> > > @@ -772,6 +772,12 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
> > >               }
> > >       }
> > >
> > > +     error = -ENOMEM;
> > > +     if ((prot & (PROT_READ | PROT_WRITE | PROT_EXEC)) &&
> > > +         !(vma->vm_flags & (VM_GROWSUP|VM_GROWSDOWN)) &&
> > > +         overlaps_stack_gap(current->mm, start, end - start))
> > > +             goto out;
> > > +
> >
> > We have prev just below your call here, so we could reuse that.  Getting
> > the vma after the mprotect range doesn't seem that easy.  I guess we
> > need to make the loop even more complicated and find the next vma (and
> > remember the fixup can merge).  This isn't as straight forward as what
> > you have, but would be faster.
> 
> For mprotect, maybe one option would be to do it inside the loop?
> Something like this:
> 
> ```
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index d0e3ebfadef8..2873cc254eaf 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -790,6 +790,24 @@ static int do_mprotect_pkey(unsigned long start,
> size_t len,
>                         break;
>                 }
> 
> +               if (IS_ENABLED(CONFIG_STACK_GROWSUP) && vma->vm_start
> == start) {
> +                       /* just do an extra lookup here, we do this
> only on parisc */
> +                       if (overlaps_stack_gap_growsup([...])) {
> +                               error = -ENOMEM;
> +                               break;
> +                       }
> +               }

Okay, so this part, before you were checking the next vma.  Since this
is only going to be run for the first vma (vma->vm_start == start), we
can probably move this outside the loop and just get the next vma then
move the vma iterator back (see notes below).

> +               if (vma->vm_end == end) {
> +                       /* peek ahead */
> +                       struct vma_iterator vmi_peek = vmi;
> +                       struct vm_area_struct *next = vma_next(&vmi_peek);
> +
> +                       if (next && overlaps_stack_gap_growsdown([...], next)) {
> +                               error = -ENOMEM;
> +                               break;
> +                       }
> +               }
> +
>                 /* Does the application expect PROT_READ to imply PROT_EXEC */
>                 if (rier && (vma->vm_flags & VM_MAYEXEC))
>                         prot |= PROT_EXEC;
> ```
> 
> Assuming that well-behaved userspace only calls mprotect() ranges that
> are fully covered by VMAs, that should be good enough?

mprotect can split and merge, but I think the side effect here would be
doing an earlier lookup in that rare case.  And it would only matter if
we were not going to split, so vma->vm_end == end works here (since
splitting means the soon-to-be-next vma is already validated).

Annoyingly the merge will re-find the next vma.

> 
> (I don't know how you feel about the idea of peeking ahead from a VMA
> iterator by copying the iterator, I imagine you might have a better
> way to do that...)

vma_next() maps to mas_find(), while vma_prev() maps to mas_prev().  A
valid operation is to find the value then go back one.  The maple state
will do the right thing and return the state to the previous entry even
if there is not a next entry.

All that to say you can probably avoid copying the iterator and just get
vma_next(); then move back with vma_prev().

There is also vma_iter_next_range() and vma_iter_prev_range(), which may
make a better choice as the next/prev range will either be NULL or have
a vma that touches the current one - and that's the case we are
interested in checking.

> 
> > >       prev = vma_prev(&vmi);
> > >       if (start > vma->vm_start)
> > >               prev = vma;
diff mbox series

Patch

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ecf63d2b0582..ecd4afc304ca 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3520,6 +3520,7 @@  unsigned long change_prot_numa(struct vm_area_struct *vma,
 
 struct vm_area_struct *find_extend_vma_locked(struct mm_struct *,
 		unsigned long addr);
+bool overlaps_stack_gap(struct mm_struct *mm, unsigned long addr, unsigned long len);
 int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
 			unsigned long pfn, unsigned long size, pgprot_t);
 int remap_pfn_range_notrack(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/mmap.c b/mm/mmap.c
index dd4b35a25aeb..937361be3c48 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -359,6 +359,20 @@  unsigned long do_mmap(struct file *file, unsigned long addr,
 			return -EEXIST;
 	}
 
+	/*
+	 * This does two things:
+	 *
+	 * 1. Disallow MAP_FIXED replacing a PROT_NONE VMA adjacent to a stack
+	 * with an accessible VMA.
+	 * 2. Disallow MAP_FIXED_NOREPLACE creating a new accessible VMA
+	 * adjacent to a stack.
+	 */
+	if ((flags & (MAP_FIXED_NOREPLACE | MAP_FIXED)) &&
+	    (prot & (PROT_READ | PROT_WRITE | PROT_EXEC)) &&
+	    !(vm_flags & (VM_GROWSUP|VM_GROWSDOWN)) &&
+	    overlaps_stack_gap(mm, addr, len))
+		return (flags & MAP_FIXED) ? -ENOMEM : -EEXIST;
+
 	if (flags & MAP_LOCKED)
 		if (!can_do_mlock())
 			return -EPERM;
@@ -1341,6 +1355,57 @@  struct vm_area_struct *expand_stack(struct mm_struct *mm, unsigned long addr)
 	return vma;
 }
 
+/*
+ * Does the specified VA range overlap the stack gap of a preceding or following
+ * stack VMA?
+ * Overlapping stack VMAs are ignored - so if someone deliberately creates a
+ * MAP_FIXED mapping in the middle of a stack or such, we let that go through.
+ *
+ * This is needed partly because userspace's intent when making PROT_NONE
+ * mappings is unclear; there are two different reasons for creating PROT_NONE
+ * mappings:
+ *
+ * A) Userspace wants to create its own guard mapping, for example for stacks.
+ * According to
+ * <https://lore.kernel.org/all/1499126133.2707.20.camel@decadent.org.uk/T/>,
+ * some Rust/Java programs do this with the main stack.
+ * Enforcing the kernel's stack gap between these userspace guard mappings and
+ * the main stack breaks stuff.
+ *
+ * B) Userspace wants to reserve some virtual address space for later mappings.
+ * This is done by memory allocators.
+ * In this case, we want to enforce a stack gap between the mapping and the
+ * stack.
+ *
+ * Because we can't tell these cases apart when a PROT_NONE mapping is created,
+ * we instead enforce the stack gap when a PROT_NONE mapping is made accessible
+ * (using mprotect()) or replaced with an accessible one (using MAP_FIXED).
+ */
+bool overlaps_stack_gap(struct mm_struct *mm, unsigned long addr, unsigned long len)
+{
+
+	struct vm_area_struct *vma, *prev_vma;
+
+	/* step 1: search for a non-overlapping following stack VMA */
+	vma = find_vma(mm, addr+len);
+	if (vma && vma->vm_start >= addr+len) {
+		/* is it too close? */
+		if (vma->vm_start - (addr+len) < stack_guard_start_gap(vma))
+			return true;
+	}
+
+	/* step 2: search for a non-overlapping preceding stack VMA */
+	if (!IS_ENABLED(CONFIG_STACK_GROWSUP))
+		return false;
+	vma = find_vma_prev(mm, addr, &prev_vma);
+	/* don't handle cases where the VA start overlaps a VMA */
+	if (vma && vma->vm_start < addr)
+		return false;
+	if (!prev_vma || !(prev_vma->vm_flags & VM_GROWSUP))
+		return false;
+	return addr - prev_vma->vm_end < stack_guard_gap;
+}
+
 /* do_munmap() - Wrapper function for non-maple tree aware do_munmap() calls.
  * @mm: The mm_struct
  * @start: The start address to munmap
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 0c5d6d06107d..2300e2eff956 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -772,6 +772,12 @@  static int do_mprotect_pkey(unsigned long start, size_t len,
 		}
 	}
 
+	error = -ENOMEM;
+	if ((prot & (PROT_READ | PROT_WRITE | PROT_EXEC)) &&
+	    !(vma->vm_flags & (VM_GROWSUP|VM_GROWSDOWN)) &&
+	    overlaps_stack_gap(current->mm, start, end - start))
+		goto out;
+
 	prev = vma_prev(&vmi);
 	if (start > vma->vm_start)
 		prev = vma;