Message ID: 20241008-stack-gap-inaccessible-v1-1-848d4d891f21@google.com (mailing list archive)
State: New
Series: mm: Enforce a minimal stack gap even against inaccessible VMAs
This is touching mm/mmap.c, please ensure to cc- the reviewers (me and Liam, I have cc'd Liam here) as listed in MAINTAINERS when submitting patches for this file. Also this seems like a really speculative 'please discuss' change so should be an RFC imo. On Tue, Oct 08, 2024 at 12:55:39AM +0200, Jann Horn wrote: > As explained in the comment block this change adds, we can't tell what > userspace's intent is when the stack grows towards an inaccessible VMA. > > I have a (highly contrived) C testcase for 32-bit x86 userspace with glibc > that mixes malloc(), pthread creation, and recursion in just the right way > such that the main stack overflows into malloc() arena memory. > (Let me know if you want me to share that.) You are claiming a fixes from 2017 and cc'ing stable on a non-RFC so yeah... we're going to need more than your word for it :) we will want to be really sure this is a thing before we backport to basically every stable kernel. Surely this isn't that hard to demonstrate though? You mmap something PROT_NONE just stack gap below the stack, then intentionally extend stack to it, before mprotect()'ing the PROT_NONE region? > > I don't know of any specific scenario where this is actually exploitable, > but it seems like it could be a security problem for sufficiently unlucky > userspace. > > I believe we should ensure that, as long as code is compiled with something > like -fstack-check, a stack overflow in it can never cause the main stack > to overflow into adjacent heap memory. > > My fix effectively reverts the behavior for !vma_is_accessible() VMAs to > the behavior before commit 1be7107fbe18 ("mm: larger stack guard gap, > between vmas"), so I think it should be a fairly safe change even in > case A. > > Fixes: 561b5e0709e4 ("mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack") > Cc: stable@vger.kernel.org > Signed-off-by: Jann Horn <jannh@google.com> > --- > I have tested that Libreoffice still launches after this change, though > I don't know if that means anything. > > Note that I haven't tested the growsup code. > --- > mm/mmap.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++------- > 1 file changed, 46 insertions(+), 7 deletions(-) > > diff --git a/mm/mmap.c b/mm/mmap.c > index dd4b35a25aeb..971bfd6c1cea 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -1064,10 +1064,12 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address) > gap_addr = TASK_SIZE; > > next = find_vma_intersection(mm, vma->vm_end, gap_addr); > - if (next && vma_is_accessible(next)) { > - if (!(next->vm_flags & VM_GROWSUP)) > + if (next && !(next->vm_flags & VM_GROWSUP)) { > + /* see comments in expand_downwards() */ > + if (vma_is_accessible(prev)) > + return -ENOMEM; > + if (address == next->vm_start) > return -ENOMEM; > - /* Check that both stack segments have the same anon_vma? */ I hate that we even maintain this for one single arch I believe at this point? > } > > if (next) > @@ -1155,10 +1157,47 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address) > /* Enforce stack_guard_gap */ > prev = vma_prev(&vmi); > /* Check that both stack segments have the same anon_vma? 
*/ > - if (prev) { > - if (!(prev->vm_flags & VM_GROWSDOWN) && > - vma_is_accessible(prev) && > - (address - prev->vm_end < stack_guard_gap)) > + if (prev && !(prev->vm_flags & VM_GROWSDOWN) && > + (address - prev->vm_end < stack_guard_gap)) { > + /* > + * If the previous VMA is accessible, this is the normal case > + * where the main stack is growing down towards some unrelated > + * VMA. Enforce the full stack guard gap. > + */ > + if (vma_is_accessible(prev)) > + return -ENOMEM; > + > + /* > + * If the previous VMA is not accessible, we have a problem: > + * We can't tell what userspace's intent is. > + * > + * Case A: > + * Maybe userspace wants to use the previous VMA as a > + * "guard region" at the bottom of the main stack, in which case > + * userspace wants us to grow the stack until it is adjacent to > + * the guard region. Apparently some Java runtime environments > + * and Rust do that? > + * That is kind of ugly, and in that case userspace really ought > + * to ensure that the stack is fully expanded immediately, but > + * we have to handle this case. Yeah we can't break userspace on this, no doubt somebody is relying on this _somewhere_. That said, I wish we disallowed this altogether regardless of accessibility. > + * > + * Case B: > + * But maybe the previous VMA is entirely unrelated to the stack > + * and is only *temporarily* PROT_NONE. For example, glibc > + * malloc arenas create a big PROT_NONE region and then > + * progressively mark parts of it as writable. > + * In that case, we must not let the stack become adjacent to > + * the previous VMA. Otherwise, after the region later becomes > + * writable, a stack overflow will cause the stack to grow into > + * the previous VMA, and we won't have any stack gap to protect > + * against this. Should be careful with terminology here, an mprotect() will not allow a merge so by 'grow into' you mean that a stack VMA could become immediately adjacent to a non-stack VMA prior to it which was later made writable. Perhaps I am being pedantic... > + * > + * As an ugly tradeoff, enforce a single-page gap. > + * A single page will hopefully be small enough to not be > + * noticed in case A, while providing the same level of > + * protection in case B that normal userspace threads get. > + */ > + if (address == prev->vm_end) > return -ENOMEM; Ugh, yuck. Not a fan of this at all. Feels like a dreadful hack. You do raise an important point here, but it strikes me that we should be doing this check in the mprotect()/mmap() MAP_FIXED scenarios where it shouldn't be too costly to check against the next VMA (which we will be obtaining anyway for merge checks)? That way we don't need a hack like this, and can just disallow the operation. That'd probably be as liable to break the program as an -ENOMEM on a stack expansion would... > } > > > --- > base-commit: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b > change-id: 20241008-stack-gap-inaccessible-c7319f7d4b1b > -- > Jann Horn <jannh@google.com> >
On Tue, Oct 8, 2024 at 1:40 PM Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote: > This is touching mm/mmap.c, please ensure to cc- the reviewers (me and > Liam, I have cc'd Liam here) as listed in MAINTAINERS when submitting > patches for this file. Ah, my bad, sorry about that. > Also this seems like a really speculative 'please discuss' change so should > be an RFC imo. > > On Tue, Oct 08, 2024 at 12:55:39AM +0200, Jann Horn wrote: > > As explained in the comment block this change adds, we can't tell what > > userspace's intent is when the stack grows towards an inaccessible VMA. > > > > I have a (highly contrived) C testcase for 32-bit x86 userspace with glibc > > that mixes malloc(), pthread creation, and recursion in just the right way > > such that the main stack overflows into malloc() arena memory. > > (Let me know if you want me to share that.) > > You are claiming a fixes from 2017 and cc'ing stable on a non-RFC so > yeah... we're going to need more than your word for it :) we will want to > be really sure this is a thing before we backport to basically every > stable kernel. > > Surely this isn't that hard to demonstrate though? You mmap something > PROT_NONE just stack gap below the stack, then intentionally extend stack > to it, before mprotect()'ing the PROT_NONE region? I've attached my test case that demonstrates this using basically only malloc, free, pthread_create() and recursion, plus a bunch of ugly read-only gunk and synchronization. It assumes that it runs under glibc, as a 32-bit x86 binary. Usage: $ clang -O2 -fstack-check -m32 -o grow-32 grow-32.c -pthread -ggdb && for i in {0..10}; do ./grow-32; done corrupted thread_obj2 at depth 190528 corrupted thread_obj2 at depth 159517 corrupted thread_obj2 at depth 209777 corrupted thread_obj2 at depth 200119 corrupted thread_obj2 at depth 208093 corrupted thread_obj2 at depth 167705 corrupted thread_obj2 at depth 234523 corrupted thread_obj2 at depth 174528 corrupted thread_obj2 at depth 223823 corrupted thread_obj2 at depth 199816 grow-32: malloc failed: Cannot allocate memory This demonstrates that it is possible for a userspace program that is just using basic libc functionality, and whose only bug is unbounded recursion, to corrupt memory despite being built with -fstack-check. 
As you said, to just demonstrate the core issue in a more contrived way, you can also use a simpler example: $ cat basic-grow-repro.c #include <err.h> #include <stdlib.h> #include <sys/mman.h> #define STACK_POINTER() ({ void *__stack; asm volatile("mov %%rsp, %0":"=r"(__stack)); __stack; }) int main(void) { char *ptr = (char*)( (unsigned long)(STACK_POINTER() - (1024*1024*4)/*4MiB*/) & ~0xfffUL ); if (mmap(ptr, 0x1000, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0) != ptr) err(1, "mmap"); *(volatile char *)(ptr + 0x1000); /* expand stack */ if (mprotect(ptr, 0x1000, PROT_READ|PROT_WRITE|PROT_EXEC)) err(1, "mprotect"); system("cat /proc/$PPID/maps"); } $ gcc -o basic-grow-repro basic-grow-repro.c $ ./basic-grow-repro 5600a0fef000-5600a0ff0000 r--p 00000000 fd:01 28313737 [...]/basic-grow-repro 5600a0ff0000-5600a0ff1000 r-xp 00001000 fd:01 28313737 [...]/basic-grow-repro 5600a0ff1000-5600a0ff2000 r--p 00002000 fd:01 28313737 [...]/basic-grow-repro 5600a0ff2000-5600a0ff3000 r--p 00002000 fd:01 28313737 [...]/basic-grow-repro 5600a0ff3000-5600a0ff4000 rw-p 00003000 fd:01 28313737 [...]/basic-grow-repro 7f9a88553000-7f9a88556000 rw-p 00000000 00:00 0 7f9a88556000-7f9a8857c000 r--p 00000000 fd:01 3417714 /usr/lib/x86_64-linux-gnu/libc.so.6 7f9a8857c000-7f9a886d2000 r-xp 00026000 fd:01 3417714 /usr/lib/x86_64-linux-gnu/libc.so.6 7f9a886d2000-7f9a88727000 r--p 0017c000 fd:01 3417714 /usr/lib/x86_64-linux-gnu/libc.so.6 7f9a88727000-7f9a8872b000 r--p 001d0000 fd:01 3417714 /usr/lib/x86_64-linux-gnu/libc.so.6 7f9a8872b000-7f9a8872d000 rw-p 001d4000 fd:01 3417714 /usr/lib/x86_64-linux-gnu/libc.so.6 7f9a8872d000-7f9a8873a000 rw-p 00000000 00:00 0 7f9a88754000-7f9a88756000 rw-p 00000000 00:00 0 7f9a88756000-7f9a8875a000 r--p 00000000 00:00 0 [vvar] 7f9a8875a000-7f9a8875c000 r-xp 00000000 00:00 0 [vdso] 7f9a8875c000-7f9a8875d000 r--p 00000000 fd:01 3409055 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 7f9a8875d000-7f9a88782000 r-xp 00001000 fd:01 3409055 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 7f9a88782000-7f9a8878c000 r--p 00026000 fd:01 3409055 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 7f9a8878c000-7f9a8878e000 r--p 00030000 fd:01 3409055 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 7f9a8878e000-7f9a88790000 rw-p 00032000 fd:01 3409055 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 7fff84664000-7fff84665000 rwxp 00000000 00:00 0 7fff84665000-7fff84a67000 rw-p 00000000 00:00 0 [stack] $ Though, while writing the above reproducer, I noticed another dodgy scenario regarding the stack gap: MAP_FIXED_NOREPLACE apparently ignores the stack guard region, because it only checks for VMA intersection, see this example: $ cat basic-grow-repro-ohno.c #include <err.h> #include <stdio.h> #include <stdlib.h> #include <sys/mman.h> #define STACK_POINTER() ({ void *__stack; asm volatile("mov %%rsp, %0":"=r"(__stack)); __stack; }) int main(void) { setbuf(stdout, NULL); char *ptr = (char*)( (unsigned long)(STACK_POINTER() - (1024*1024*4)/*4MiB*/) & ~0xfffUL ); *(volatile char *)(ptr + 0x1000); /* expand stack to just above ptr */ printf("trying to map at %p\n", ptr); system("cat /proc/$PPID/maps;echo"); if (mmap(ptr, 0x1000, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_FIXED_NOREPLACE|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0) != ptr) err(1, "mmap"); system("cat /proc/$PPID/maps"); } $ gcc -o basic-grow-repro-ohno basic-grow-repro-ohno.c $ ./basic-grow-repro-ohno trying to map at 0x7ffc344ca000 560ee371d000-560ee371e000 r--p 00000000 fd:01 28313842 [...]/basic-grow-repro-ohno 560ee371e000-560ee371f000 r-xp 00001000 fd:01 
28313842 [...]/basic-grow-repro-ohno 560ee371f000-560ee3720000 r--p 00002000 fd:01 28313842 [...]/basic-grow-repro-ohno 560ee3720000-560ee3721000 r--p 00002000 fd:01 28313842 [...]/basic-grow-repro-ohno 560ee3721000-560ee3722000 rw-p 00003000 fd:01 28313842 [...]/basic-grow-repro-ohno 7f0d636ed000-7f0d636f0000 rw-p 00000000 00:00 0 7f0d636f0000-7f0d63716000 r--p 00000000 fd:01 3417714 /usr/lib/x86_64-linux-gnu/libc.so.6 7f0d63716000-7f0d6386c000 r-xp 00026000 fd:01 3417714 /usr/lib/x86_64-linux-gnu/libc.so.6 7f0d6386c000-7f0d638c1000 r--p 0017c000 fd:01 3417714 /usr/lib/x86_64-linux-gnu/libc.so.6 7f0d638c1000-7f0d638c5000 r--p 001d0000 fd:01 3417714 /usr/lib/x86_64-linux-gnu/libc.so.6 7f0d638c5000-7f0d638c7000 rw-p 001d4000 fd:01 3417714 /usr/lib/x86_64-linux-gnu/libc.so.6 7f0d638c7000-7f0d638d4000 rw-p 00000000 00:00 0 7f0d638ee000-7f0d638f0000 rw-p 00000000 00:00 0 7f0d638f0000-7f0d638f4000 r--p 00000000 00:00 0 [vvar] 7f0d638f4000-7f0d638f6000 r-xp 00000000 00:00 0 [vdso] 7f0d638f6000-7f0d638f7000 r--p 00000000 fd:01 3409055 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 7f0d638f7000-7f0d6391c000 r-xp 00001000 fd:01 3409055 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 7f0d6391c000-7f0d63926000 r--p 00026000 fd:01 3409055 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 7f0d63926000-7f0d63928000 r--p 00030000 fd:01 3409055 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 7f0d63928000-7f0d6392a000 rw-p 00032000 fd:01 3409055 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 7ffc344cb000-7ffc348cd000 rw-p 00000000 00:00 0 [stack] 560ee371d000-560ee371e000 r--p 00000000 fd:01 28313842 [...]/basic-grow-repro-ohno 560ee371e000-560ee371f000 r-xp 00001000 fd:01 28313842 [...]/basic-grow-repro-ohno 560ee371f000-560ee3720000 r--p 00002000 fd:01 28313842 [...]/basic-grow-repro-ohno 560ee3720000-560ee3721000 r--p 00002000 fd:01 28313842 [...]/basic-grow-repro-ohno 560ee3721000-560ee3722000 rw-p 00003000 fd:01 28313842 [...]/basic-grow-repro-ohno 7f0d636ed000-7f0d636f0000 rw-p 00000000 00:00 0 7f0d636f0000-7f0d63716000 r--p 00000000 fd:01 3417714 /usr/lib/x86_64-linux-gnu/libc.so.6 7f0d63716000-7f0d6386c000 r-xp 00026000 fd:01 3417714 /usr/lib/x86_64-linux-gnu/libc.so.6 7f0d6386c000-7f0d638c1000 r--p 0017c000 fd:01 3417714 /usr/lib/x86_64-linux-gnu/libc.so.6 7f0d638c1000-7f0d638c5000 r--p 001d0000 fd:01 3417714 /usr/lib/x86_64-linux-gnu/libc.so.6 7f0d638c5000-7f0d638c7000 rw-p 001d4000 fd:01 3417714 /usr/lib/x86_64-linux-gnu/libc.so.6 7f0d638c7000-7f0d638d4000 rw-p 00000000 00:00 0 7f0d638ee000-7f0d638f0000 rw-p 00000000 00:00 0 7f0d638f0000-7f0d638f4000 r--p 00000000 00:00 0 [vvar] 7f0d638f4000-7f0d638f6000 r-xp 00000000 00:00 0 [vdso] 7f0d638f6000-7f0d638f7000 r--p 00000000 fd:01 3409055 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 7f0d638f7000-7f0d6391c000 r-xp 00001000 fd:01 3409055 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 7f0d6391c000-7f0d63926000 r--p 00026000 fd:01 3409055 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 7f0d63926000-7f0d63928000 r--p 00030000 fd:01 3409055 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 7f0d63928000-7f0d6392a000 rw-p 00032000 fd:01 3409055 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 7ffc344ca000-7ffc344cb000 rwxp 00000000 00:00 0 7ffc344cb000-7ffc348cd000 rw-p 00000000 00:00 0 [stack] $ That could also be bad: MAP_FIXED_NOREPLACE exists, from what I understand, partly so that malloc implementations can use it to grow heap memory chunks (though glibc doesn't use it, I'm not sure who actually uses it that way). 
We wouldn't want a malloc implementation to grow a heap memory chunk until it is directly adjacent to a stack. > > I don't know of any specific scenario where this is actually exploitable, > > but it seems like it could be a security problem for sufficiently unlucky > > userspace. > > > > I believe we should ensure that, as long as code is compiled with something > > like -fstack-check, a stack overflow in it can never cause the main stack > > to overflow into adjacent heap memory. > > > > My fix effectively reverts the behavior for !vma_is_accessible() VMAs to > > the behavior before commit 1be7107fbe18 ("mm: larger stack guard gap, > > between vmas"), so I think it should be a fairly safe change even in > > case A. > > > > Fixes: 561b5e0709e4 ("mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack") > > Cc: stable@vger.kernel.org > > Signed-off-by: Jann Horn <jannh@google.com> > > --- > > I have tested that Libreoffice still launches after this change, though > > I don't know if that means anything. > > > > Note that I haven't tested the growsup code. > > --- > > mm/mmap.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++------- > > 1 file changed, 46 insertions(+), 7 deletions(-) > > > > diff --git a/mm/mmap.c b/mm/mmap.c > > index dd4b35a25aeb..971bfd6c1cea 100644 > > --- a/mm/mmap.c > > +++ b/mm/mmap.c > > @@ -1064,10 +1064,12 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address) > > gap_addr = TASK_SIZE; > > > > next = find_vma_intersection(mm, vma->vm_end, gap_addr); > > - if (next && vma_is_accessible(next)) { > > - if (!(next->vm_flags & VM_GROWSUP)) > > + if (next && !(next->vm_flags & VM_GROWSUP)) { > > + /* see comments in expand_downwards() */ > > + if (vma_is_accessible(prev)) > > + return -ENOMEM; > > + if (address == next->vm_start) > > return -ENOMEM; > > - /* Check that both stack segments have the same anon_vma? */ > > I hate that we even maintain this for one single arch I believe at this point? Looks that way, just parisc? It would be so nice if we could somehow just get rid of this concept of growing stacks in general... > > } > > > > if (next) > > @@ -1155,10 +1157,47 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address) > > /* Enforce stack_guard_gap */ > > prev = vma_prev(&vmi); > > /* Check that both stack segments have the same anon_vma? */ > > - if (prev) { > > - if (!(prev->vm_flags & VM_GROWSDOWN) && > > - vma_is_accessible(prev) && > > - (address - prev->vm_end < stack_guard_gap)) > > + if (prev && !(prev->vm_flags & VM_GROWSDOWN) && > > + (address - prev->vm_end < stack_guard_gap)) { > > + /* > > + * If the previous VMA is accessible, this is the normal case > > + * where the main stack is growing down towards some unrelated > > + * VMA. Enforce the full stack guard gap. > > + */ > > + if (vma_is_accessible(prev)) > > + return -ENOMEM; > > + > > + /* > > + * If the previous VMA is not accessible, we have a problem: > > + * We can't tell what userspace's intent is. > > + * > > + * Case A: > > + * Maybe userspace wants to use the previous VMA as a > > + * "guard region" at the bottom of the main stack, in which case > > + * userspace wants us to grow the stack until it is adjacent to > > + * the guard region. Apparently some Java runtime environments > > + * and Rust do that? > > + * That is kind of ugly, and in that case userspace really ought > > + * to ensure that the stack is fully expanded immediately, but > > + * we have to handle this case. 
> > Yeah we can't break userspace on this, no doubt somebody is relying on this > _somewhere_. It would have to be a new user who appeared after commit 1be7107fbe18. And they'd have to install a "guard vma" somewhere below the main stack, and they'd have to care so much about the size of the stack that a single page makes a difference. > That said, I wish we disallowed this altogether regardless of accessibility. > > > + * > > + * Case B: > > + * But maybe the previous VMA is entirely unrelated to the stack > > + * and is only *temporarily* PROT_NONE. For example, glibc > > + * malloc arenas create a big PROT_NONE region and then > > + * progressively mark parts of it as writable. > > + * In that case, we must not let the stack become adjacent to > > + * the previous VMA. Otherwise, after the region later becomes > > + * writable, a stack overflow will cause the stack to grow into > > + * the previous VMA, and we won't have any stack gap to protect > > + * against this. > > Should be careful with terminology here, an mprotect() will not allow a > merge so by 'grow into' you mean that a stack VMA could become immediately > adjacent to a non-stack VMA prior to it which was later made writable. > > Perhaps I am being pedantic... Ah, sorry, I worded that very confusingly. By "a stack overflow will cause the stack to grow into the previous VMA", I meant that the stack pointer moves into the previous VMA and the program uses part of the previous VMA as stack memory, clobbering whatever was stored there before. That part was not meant to talk about a change of VMA bounds. > > + * > > + * As an ugly tradeoff, enforce a single-page gap. > > + * A single page will hopefully be small enough to not be > > + * noticed in case A, while providing the same level of > > + * protection in case B that normal userspace threads get. > > + */ > > + if (address == prev->vm_end) > > return -ENOMEM; > > Ugh, yuck. Not a fan of this at all. Feels like a dreadful hack. Oh, I agree, I just didn't see a better way to do it. > You do raise an important point here, but it strikes me that we should be > doing this check in the mprotect()/mmap() MAP_FIXED scenarios where it > shouldn't be too costly to check against the next VMA (which we will be > obtaining anyway for merge checks)? > > That way we don't need a hack like this, and can just disallow the > operation. That'd probably be as liable to break the program as an -ENOMEM > on a stack expansion would... Hmm... yeah, I guess that would work. If someone hits this case, it would probably be less obvious to the programmer what went wrong based on the error code, but on the other hand, it would give userspace a slightly better chance to recover from the issue... I guess I can see if I can code that up.
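(For illustration only, a rough sketch of the kind of mprotect()/MAP_FIXED-side check discussed above. The helper name, signature, and call site are hypothetical, not existing mm/ code; the idea is simply to refuse making a range accessible if doing so would leave less than stack_guard_gap below a VM_GROWSDOWN stack. It assumes the usual linux/mm.h declarations of find_vma() and stack_guard_gap.)

/*
 * Hypothetical sketch, not existing mm/ code: refuse to make a range
 * accessible (via mprotect() or a MAP_FIXED-style mapping) if that would
 * leave less than stack_guard_gap between the end of the range and a
 * VM_GROWSDOWN stack above it.
 */
static bool range_would_violate_stack_gap(struct mm_struct *mm,
					  unsigned long end,
					  vm_flags_t newflags)
{
	struct vm_area_struct *next;

	/* A range that stays inaccessible keeps the guard gap intact. */
	if (!(newflags & (VM_READ | VM_WRITE | VM_EXEC)))
		return false;

	/*
	 * First VMA ending above 'end'; only a downward-growing stack
	 * strictly above the range matters here.
	 */
	next = find_vma(mm, end);
	if (next && (next->vm_flags & VM_GROWSDOWN) &&
	    next->vm_start >= end &&
	    next->vm_start - end < stack_guard_gap)
		return true;

	return false;
}

The mprotect()/mmap(MAP_FIXED) paths already look up the following VMA for merge checks, so the extra cost of a test like this would presumably be small; the trade-off is surfacing an error to callers that succeed today.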
Hi Jann, kernel test robot noticed the following build errors: [auto build test ERROR on 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b] url: https://github.com/intel-lab-lkp/linux/commits/Jann-Horn/mm-Enforce-a-minimal-stack-gap-even-against-inaccessible-VMAs/20241008-065733 base: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b patch link: https://lore.kernel.org/r/20241008-stack-gap-inaccessible-v1-1-848d4d891f21%40google.com patch subject: [PATCH] mm: Enforce a minimal stack gap even against inaccessible VMAs config: parisc-randconfig-r072-20241009 (https://download.01.org/0day-ci/archive/20241009/202410090632.brLG8w0b-lkp@intel.com/config) compiler: hppa-linux-gcc (GCC) 14.1.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241009/202410090632.brLG8w0b-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202410090632.brLG8w0b-lkp@intel.com/ All errors (new ones prefixed by >>): mm/mmap.c: In function 'expand_upwards': >> mm/mmap.c:1069:39: error: 'prev' undeclared (first use in this function) 1069 | if (vma_is_accessible(prev)) | ^~~~ mm/mmap.c:1069:39: note: each undeclared identifier is reported only once for each function it appears in vim +/prev +1069 mm/mmap.c 1036 1037 #if defined(CONFIG_STACK_GROWSUP) 1038 /* 1039 * PA-RISC uses this for its stack. 1040 * vma is the last one with address > vma->vm_end. Have to extend vma. 1041 */ 1042 static int expand_upwards(struct vm_area_struct *vma, unsigned long address) 1043 { 1044 struct mm_struct *mm = vma->vm_mm; 1045 struct vm_area_struct *next; 1046 unsigned long gap_addr; 1047 int error = 0; 1048 VMA_ITERATOR(vmi, mm, vma->vm_start); 1049 1050 if (!(vma->vm_flags & VM_GROWSUP)) 1051 return -EFAULT; 1052 1053 /* Guard against exceeding limits of the address space. */ 1054 address &= PAGE_MASK; 1055 if (address >= (TASK_SIZE & PAGE_MASK)) 1056 return -ENOMEM; 1057 address += PAGE_SIZE; 1058 1059 /* Enforce stack_guard_gap */ 1060 gap_addr = address + stack_guard_gap; 1061 1062 /* Guard against overflow */ 1063 if (gap_addr < address || gap_addr > TASK_SIZE) 1064 gap_addr = TASK_SIZE; 1065 1066 next = find_vma_intersection(mm, vma->vm_end, gap_addr); 1067 if (next && !(next->vm_flags & VM_GROWSUP)) { 1068 /* see comments in expand_downwards() */ > 1069 if (vma_is_accessible(prev)) 1070 return -ENOMEM; 1071 if (address == next->vm_start) 1072 return -ENOMEM; 1073 } 1074 1075 if (next) 1076 vma_iter_prev_range_limit(&vmi, address); 1077 1078 vma_iter_config(&vmi, vma->vm_start, address); 1079 if (vma_iter_prealloc(&vmi, vma)) 1080 return -ENOMEM; 1081 1082 /* We must make sure the anon_vma is allocated. */ 1083 if (unlikely(anon_vma_prepare(vma))) { 1084 vma_iter_free(&vmi); 1085 return -ENOMEM; 1086 } 1087 1088 /* Lock the VMA before expanding to prevent concurrent page faults */ 1089 vma_start_write(vma); 1090 /* 1091 * vma->vm_start/vm_end cannot change under us because the caller 1092 * is required to hold the mmap_lock in read mode. We need the 1093 * anon_vma lock to serialize against concurrent expand_stacks. 
1094 */ 1095 anon_vma_lock_write(vma->anon_vma); 1096 1097 /* Somebody else might have raced and expanded it already */ 1098 if (address > vma->vm_end) { 1099 unsigned long size, grow; 1100 1101 size = address - vma->vm_start; 1102 grow = (address - vma->vm_end) >> PAGE_SHIFT; 1103 1104 error = -ENOMEM; 1105 if (vma->vm_pgoff + (size >> PAGE_SHIFT) >= vma->vm_pgoff) { 1106 error = acct_stack_growth(vma, size, grow); 1107 if (!error) { 1108 /* 1109 * We only hold a shared mmap_lock lock here, so 1110 * we need to protect against concurrent vma 1111 * expansions. anon_vma_lock_write() doesn't 1112 * help here, as we don't guarantee that all 1113 * growable vmas in a mm share the same root 1114 * anon vma. So, we reuse mm->page_table_lock 1115 * to guard against concurrent vma expansions. 1116 */ 1117 spin_lock(&mm->page_table_lock); 1118 if (vma->vm_flags & VM_LOCKED) 1119 mm->locked_vm += grow; 1120 vm_stat_account(mm, vma->vm_flags, grow); 1121 anon_vma_interval_tree_pre_update_vma(vma); 1122 vma->vm_end = address; 1123 /* Overwrite old entry in mtree. */ 1124 vma_iter_store(&vmi, vma); 1125 anon_vma_interval_tree_post_update_vma(vma); 1126 spin_unlock(&mm->page_table_lock); 1127 1128 perf_event_mmap(vma); 1129 } 1130 } 1131 } 1132 anon_vma_unlock_write(vma->anon_vma); 1133 vma_iter_free(&vmi); 1134 validate_mm(mm); 1135 return error; 1136 } 1137 #endif /* CONFIG_STACK_GROWSUP */ 1138
On Tue, Oct 08, 2024 at 04:20:13PM +0200, Jann Horn wrote: > On Tue, Oct 8, 2024 at 1:40 PM Lorenzo Stoakes > <lorenzo.stoakes@oracle.com> wrote: > > This is touching mm/mmap.c, please ensure to cc- the reviewers (me and > > Liam, I have cc'd Liam here) as listed in MAINTAINERS when submitting > > patches for this file. > > Ah, my bad, sorry about that. > > > Also this seems like a really speculative 'please discuss' change so should > > be an RFC imo. > > > > On Tue, Oct 08, 2024 at 12:55:39AM +0200, Jann Horn wrote: > > > As explained in the comment block this change adds, we can't tell what > > > userspace's intent is when the stack grows towards an inaccessible VMA. > > > > > > I have a (highly contrived) C testcase for 32-bit x86 userspace with glibc > > > that mixes malloc(), pthread creation, and recursion in just the right way > > > such that the main stack overflows into malloc() arena memory. > > > (Let me know if you want me to share that.) > > > > You are claiming a fixes from 2017 and cc'ing stable on a non-RFC so > > yeah... we're going to need more than your word for it :) we will want to > > be really sure this is a thing before we backport to basically every > > stable kernel. > > > > Surely this isn't that hard to demonstrate though? You mmap something > > PROT_NONE just stack gap below the stack, then intentionally extend stack > > to it, before mprotect()'ing the PROT_NONE region? > > I've attached my test case that demonstrates this using basically only > malloc, free, pthread_create() and recursion, plus a bunch of ugly > read-only gunk and synchronization. It assumes that it runs under > glibc, as a 32-bit x86 binary. Usage: Thanks! > > $ clang -O2 -fstack-check -m32 -o grow-32 grow-32.c -pthread -ggdb && > for i in {0..10}; do ./grow-32; done > corrupted thread_obj2 at depth 190528 > corrupted thread_obj2 at depth 159517 > corrupted thread_obj2 at depth 209777 > corrupted thread_obj2 at depth 200119 > corrupted thread_obj2 at depth 208093 > corrupted thread_obj2 at depth 167705 > corrupted thread_obj2 at depth 234523 > corrupted thread_obj2 at depth 174528 > corrupted thread_obj2 at depth 223823 > corrupted thread_obj2 at depth 199816 > grow-32: malloc failed: Cannot allocate memory > > This demonstrates that it is possible for a userspace program that is > just using basic libc functionality, and whose only bug is unbounded > recursion, to corrupt memory despite being built with -fstack-check. 
> > As you said, to just demonstrate the core issue in a more contrived > way, you can also use a simpler example: > > $ cat basic-grow-repro.c > #include <err.h> > #include <stdlib.h> > #include <sys/mman.h> > > #define STACK_POINTER() ({ void *__stack; asm volatile("mov %%rsp, > %0":"=r"(__stack)); __stack; }) > > int main(void) { > char *ptr = (char*)( (unsigned long)(STACK_POINTER() - > (1024*1024*4)/*4MiB*/) & ~0xfffUL ); > if (mmap(ptr, 0x1000, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0) != ptr) > err(1, "mmap"); > *(volatile char *)(ptr + 0x1000); /* expand stack */ > if (mprotect(ptr, 0x1000, PROT_READ|PROT_WRITE|PROT_EXEC)) > err(1, "mprotect"); > system("cat /proc/$PPID/maps"); > } > $ gcc -o basic-grow-repro basic-grow-repro.c > $ ./basic-grow-repro > 5600a0fef000-5600a0ff0000 r--p 00000000 fd:01 28313737 > [...]/basic-grow-repro > 5600a0ff0000-5600a0ff1000 r-xp 00001000 fd:01 28313737 > [...]/basic-grow-repro > 5600a0ff1000-5600a0ff2000 r--p 00002000 fd:01 28313737 > [...]/basic-grow-repro > 5600a0ff2000-5600a0ff3000 r--p 00002000 fd:01 28313737 > [...]/basic-grow-repro > 5600a0ff3000-5600a0ff4000 rw-p 00003000 fd:01 28313737 > [...]/basic-grow-repro > 7f9a88553000-7f9a88556000 rw-p 00000000 00:00 0 > 7f9a88556000-7f9a8857c000 r--p 00000000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f9a8857c000-7f9a886d2000 r-xp 00026000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f9a886d2000-7f9a88727000 r--p 0017c000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f9a88727000-7f9a8872b000 r--p 001d0000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f9a8872b000-7f9a8872d000 rw-p 001d4000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f9a8872d000-7f9a8873a000 rw-p 00000000 00:00 0 > 7f9a88754000-7f9a88756000 rw-p 00000000 00:00 0 > 7f9a88756000-7f9a8875a000 r--p 00000000 00:00 0 [vvar] > 7f9a8875a000-7f9a8875c000 r-xp 00000000 00:00 0 [vdso] > 7f9a8875c000-7f9a8875d000 r--p 00000000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7f9a8875d000-7f9a88782000 r-xp 00001000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7f9a88782000-7f9a8878c000 r--p 00026000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7f9a8878c000-7f9a8878e000 r--p 00030000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7f9a8878e000-7f9a88790000 rw-p 00032000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7fff84664000-7fff84665000 rwxp 00000000 00:00 0 > 7fff84665000-7fff84a67000 rw-p 00000000 00:00 0 [stack] > $ > > > Though, while writing the above reproducer, I noticed another dodgy > scenario regarding the stack gap: MAP_FIXED_NOREPLACE apparently > ignores the stack guard region, because it only checks for VMA > intersection, see this example: Oh my. 
> > $ cat basic-grow-repro-ohno.c > #include <err.h> > #include <stdio.h> > #include <stdlib.h> > #include <sys/mman.h> > > #define STACK_POINTER() ({ void *__stack; asm volatile("mov %%rsp, > %0":"=r"(__stack)); __stack; }) > > int main(void) { > setbuf(stdout, NULL); > char *ptr = (char*)( (unsigned long)(STACK_POINTER() - > (1024*1024*4)/*4MiB*/) & ~0xfffUL ); > *(volatile char *)(ptr + 0x1000); /* expand stack to just above ptr */ > > printf("trying to map at %p\n", ptr); > system("cat /proc/$PPID/maps;echo"); > if (mmap(ptr, 0x1000, PROT_READ|PROT_WRITE|PROT_EXEC, > MAP_FIXED_NOREPLACE|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0) != ptr) > err(1, "mmap"); > system("cat /proc/$PPID/maps"); > } > $ gcc -o basic-grow-repro-ohno basic-grow-repro-ohno.c > $ ./basic-grow-repro-ohno > trying to map at 0x7ffc344ca000 > 560ee371d000-560ee371e000 r--p 00000000 fd:01 28313842 > [...]/basic-grow-repro-ohno > 560ee371e000-560ee371f000 r-xp 00001000 fd:01 28313842 > [...]/basic-grow-repro-ohno > 560ee371f000-560ee3720000 r--p 00002000 fd:01 28313842 > [...]/basic-grow-repro-ohno > 560ee3720000-560ee3721000 r--p 00002000 fd:01 28313842 > [...]/basic-grow-repro-ohno > 560ee3721000-560ee3722000 rw-p 00003000 fd:01 28313842 > [...]/basic-grow-repro-ohno > 7f0d636ed000-7f0d636f0000 rw-p 00000000 00:00 0 > 7f0d636f0000-7f0d63716000 r--p 00000000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f0d63716000-7f0d6386c000 r-xp 00026000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f0d6386c000-7f0d638c1000 r--p 0017c000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f0d638c1000-7f0d638c5000 r--p 001d0000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f0d638c5000-7f0d638c7000 rw-p 001d4000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f0d638c7000-7f0d638d4000 rw-p 00000000 00:00 0 > 7f0d638ee000-7f0d638f0000 rw-p 00000000 00:00 0 > 7f0d638f0000-7f0d638f4000 r--p 00000000 00:00 0 [vvar] > 7f0d638f4000-7f0d638f6000 r-xp 00000000 00:00 0 [vdso] > 7f0d638f6000-7f0d638f7000 r--p 00000000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7f0d638f7000-7f0d6391c000 r-xp 00001000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7f0d6391c000-7f0d63926000 r--p 00026000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7f0d63926000-7f0d63928000 r--p 00030000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7f0d63928000-7f0d6392a000 rw-p 00032000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7ffc344cb000-7ffc348cd000 rw-p 00000000 00:00 0 [stack] > > 560ee371d000-560ee371e000 r--p 00000000 fd:01 28313842 > [...]/basic-grow-repro-ohno > 560ee371e000-560ee371f000 r-xp 00001000 fd:01 28313842 > [...]/basic-grow-repro-ohno > 560ee371f000-560ee3720000 r--p 00002000 fd:01 28313842 > [...]/basic-grow-repro-ohno > 560ee3720000-560ee3721000 r--p 00002000 fd:01 28313842 > [...]/basic-grow-repro-ohno > 560ee3721000-560ee3722000 rw-p 00003000 fd:01 28313842 > [...]/basic-grow-repro-ohno > 7f0d636ed000-7f0d636f0000 rw-p 00000000 00:00 0 > 7f0d636f0000-7f0d63716000 r--p 00000000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f0d63716000-7f0d6386c000 r-xp 00026000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f0d6386c000-7f0d638c1000 r--p 0017c000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f0d638c1000-7f0d638c5000 r--p 001d0000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f0d638c5000-7f0d638c7000 rw-p 001d4000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f0d638c7000-7f0d638d4000 
rw-p 00000000 00:00 0 > 7f0d638ee000-7f0d638f0000 rw-p 00000000 00:00 0 > 7f0d638f0000-7f0d638f4000 r--p 00000000 00:00 0 [vvar] > 7f0d638f4000-7f0d638f6000 r-xp 00000000 00:00 0 [vdso] > 7f0d638f6000-7f0d638f7000 r--p 00000000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7f0d638f7000-7f0d6391c000 r-xp 00001000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7f0d6391c000-7f0d63926000 r--p 00026000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7f0d63926000-7f0d63928000 r--p 00030000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7f0d63928000-7f0d6392a000 rw-p 00032000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7ffc344ca000-7ffc344cb000 rwxp 00000000 00:00 0 > 7ffc344cb000-7ffc348cd000 rw-p 00000000 00:00 0 [stack] > $ > > That could also be bad: MAP_FIXED_NOREPLACE exists, from what I > understand, partly so that malloc implementations can use it to grow > heap memory chunks (though glibc doesn't use it, I'm not sure who > actually uses it that way). We wouldn't want a malloc implementation > to grow a heap memory chunk until it is directly adjacent to a stack. It seems... weird to use it that way as you couldn't be sure you weren't overwriting another VMA. > > > > I don't know of any specific scenario where this is actually exploitable, > > > but it seems like it could be a security problem for sufficiently unlucky > > > userspace. > > > > > > I believe we should ensure that, as long as code is compiled with something > > > like -fstack-check, a stack overflow in it can never cause the main stack > > > to overflow into adjacent heap memory. > > > > > > My fix effectively reverts the behavior for !vma_is_accessible() VMAs to > > > the behavior before commit 1be7107fbe18 ("mm: larger stack guard gap, > > > between vmas"), so I think it should be a fairly safe change even in > > > case A. > > > > > > Fixes: 561b5e0709e4 ("mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack") > > > Cc: stable@vger.kernel.org > > > Signed-off-by: Jann Horn <jannh@google.com> > > > --- > > > I have tested that Libreoffice still launches after this change, though > > > I don't know if that means anything. > > > > > > Note that I haven't tested the growsup code. > > > --- > > > mm/mmap.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++------- > > > 1 file changed, 46 insertions(+), 7 deletions(-) > > > > > > diff --git a/mm/mmap.c b/mm/mmap.c > > > index dd4b35a25aeb..971bfd6c1cea 100644 > > > --- a/mm/mmap.c > > > +++ b/mm/mmap.c > > > @@ -1064,10 +1064,12 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address) > > > gap_addr = TASK_SIZE; > > > > > > next = find_vma_intersection(mm, vma->vm_end, gap_addr); > > > - if (next && vma_is_accessible(next)) { > > > - if (!(next->vm_flags & VM_GROWSUP)) > > > + if (next && !(next->vm_flags & VM_GROWSUP)) { > > > + /* see comments in expand_downwards() */ > > > + if (vma_is_accessible(prev)) > > > + return -ENOMEM; > > > + if (address == next->vm_start) > > > return -ENOMEM; > > > - /* Check that both stack segments have the same anon_vma? */ > > > > I hate that we even maintain this for one single arch I believe at this point? > > Looks that way, just parisc? > > It would be so nice if we could somehow just get rid of this concept > of growing stacks in general... 
> > > > } > > > > > > if (next) > > > @@ -1155,10 +1157,47 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address) > > > /* Enforce stack_guard_gap */ > > > prev = vma_prev(&vmi); > > > /* Check that both stack segments have the same anon_vma? */ > > > - if (prev) { > > > - if (!(prev->vm_flags & VM_GROWSDOWN) && > > > - vma_is_accessible(prev) && > > > - (address - prev->vm_end < stack_guard_gap)) > > > + if (prev && !(prev->vm_flags & VM_GROWSDOWN) && > > > + (address - prev->vm_end < stack_guard_gap)) { > > > + /* > > > + * If the previous VMA is accessible, this is the normal case > > > + * where the main stack is growing down towards some unrelated > > > + * VMA. Enforce the full stack guard gap. > > > + */ > > > + if (vma_is_accessible(prev)) > > > + return -ENOMEM; > > > + > > > + /* > > > + * If the previous VMA is not accessible, we have a problem: > > > + * We can't tell what userspace's intent is. > > > + * > > > + * Case A: > > > + * Maybe userspace wants to use the previous VMA as a > > > + * "guard region" at the bottom of the main stack, in which case > > > + * userspace wants us to grow the stack until it is adjacent to > > > + * the guard region. Apparently some Java runtime environments > > > + * and Rust do that? > > > + * That is kind of ugly, and in that case userspace really ought > > > + * to ensure that the stack is fully expanded immediately, but > > > + * we have to handle this case. > > > > Yeah we can't break userspace on this, no doubt somebody is relying on this > > _somewhere_. > > It would have to be a new user who appeared after commit 1be7107fbe18. > And they'd have to install a "guard vma" somewhere below the main > stack, and they'd have to care so much about the size of the stack > that a single page makes a difference. You did say 'Apparently some Java runtime environments and Rust do that' though right? Or am I misunderstanding? > > > That said, I wish we disallowed this altogether regardless of accessibility. > > > > > + * > > > + * Case B: > > > + * But maybe the previous VMA is entirely unrelated to the stack > > > + * and is only *temporarily* PROT_NONE. For example, glibc > > > + * malloc arenas create a big PROT_NONE region and then > > > + * progressively mark parts of it as writable. > > > + * In that case, we must not let the stack become adjacent to > > > + * the previous VMA. Otherwise, after the region later becomes > > > + * writable, a stack overflow will cause the stack to grow into > > > + * the previous VMA, and we won't have any stack gap to protect > > > + * against this. > > > > Should be careful with terminology here, an mprotect() will not allow a > > merge so by 'grow into' you mean that a stack VMA could become immediately > > adjacent to a non-stack VMA prior to it which was later made writable. > > > > Perhaps I am being pedantic... > > Ah, sorry, I worded that very confusingly. By "a stack overflow will > cause the stack to grow into the previous VMA", I meant that the stack > pointer moves into the previous VMA and the program uses part of the > previous VMA as stack memory, clobbering whatever was stored there > before. That part was not meant to talk about a change of VMA bounds. Sure, yes this is what I assumed. Might be worth clarifying that just to be _totally_ clear. > > > > + * > > > + * As an ugly tradeoff, enforce a single-page gap. 
> > > + * A single page will hopefully be small enough to not be > > > + * noticed in case A, while providing the same level of > > > + * protection in case B that normal userspace threads get. > > > + */ > > > + if (address == prev->vm_end) > > > return -ENOMEM; > > > > Ugh, yuck. Not a fan of this at all. Feels like a dreadful hack. > > Oh, I agree, I just didn't see a better way to do it. > > > You do raise an important point here, but it strikes me that we should be > > doing this check in the mprotect()/mmap() MAP_FIXED scenarios where it > > shouldn't be too costly to check against the next VMA (which we will be > > obtaining anyway for merge checks)? > > > > That way we don't need a hack like this, and can just disallow the > > operation. That'd probably be as liable to break the program as an -ENOMEM > > on a stack expansion would... > > Hmm... yeah, I guess that would work. If someone hits this case, it > would probably be less obvious to the programmer what went wrong based > on the error code, but on the other hand, it would give userspace a > slightly better chance to recover from the issue... > > I guess I can see if I can code that up. Thanks!
On Wed, Oct 09, 2024 at 06:45:08AM +0800, kernel test robot wrote: > Hi Jann, > > kernel test robot noticed the following build errors: > > [auto build test ERROR on 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b] > > url: https://github.com/intel-lab-lkp/linux/commits/Jann-Horn/mm-Enforce-a-minimal-stack-gap-even-against-inaccessible-VMAs/20241008-065733 > base: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b > patch link: https://lore.kernel.org/r/20241008-stack-gap-inaccessible-v1-1-848d4d891f21%40google.com > patch subject: [PATCH] mm: Enforce a minimal stack gap even against inaccessible VMAs > config: parisc-randconfig-r072-20241009 (https://download.01.org/0day-ci/archive/20241009/202410090632.brLG8w0b-lkp@intel.com/config) > compiler: hppa-linux-gcc (GCC) 14.1.0 > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241009/202410090632.brLG8w0b-lkp@intel.com/reproduce) > > If you fix the issue in a separate patch/commit (i.e. not just a new version of > the same patch/commit), kindly add following tags > | Reported-by: kernel test robot <lkp@intel.com> > | Closes: https://lore.kernel.org/oe-kbuild-all/202410090632.brLG8w0b-lkp@intel.com/ > > All errors (new ones prefixed by >>): > > mm/mmap.c: In function 'expand_upwards': > >> mm/mmap.c:1069:39: error: 'prev' undeclared (first use in this function) > 1069 | if (vma_is_accessible(prev)) Suspect this is just a simple typo and should be next rather than prev :>) > | ^~~~ > mm/mmap.c:1069:39: note: each undeclared identifier is reported only once for each function it appears in > > > vim +/prev +1069 mm/mmap.c > > 1036 > 1037 #if defined(CONFIG_STACK_GROWSUP) > 1038 /* > 1039 * PA-RISC uses this for its stack. > 1040 * vma is the last one with address > vma->vm_end. Have to extend vma. > 1041 */ > 1042 static int expand_upwards(struct vm_area_struct *vma, unsigned long address) > 1043 { > 1044 struct mm_struct *mm = vma->vm_mm; > 1045 struct vm_area_struct *next; > 1046 unsigned long gap_addr; > 1047 int error = 0; > 1048 VMA_ITERATOR(vmi, mm, vma->vm_start); > 1049 > 1050 if (!(vma->vm_flags & VM_GROWSUP)) > 1051 return -EFAULT; > 1052 > 1053 /* Guard against exceeding limits of the address space. */ > 1054 address &= PAGE_MASK; > 1055 if (address >= (TASK_SIZE & PAGE_MASK)) > 1056 return -ENOMEM; > 1057 address += PAGE_SIZE; > 1058 > 1059 /* Enforce stack_guard_gap */ > 1060 gap_addr = address + stack_guard_gap; > 1061 > 1062 /* Guard against overflow */ > 1063 if (gap_addr < address || gap_addr > TASK_SIZE) > 1064 gap_addr = TASK_SIZE; > 1065 > 1066 next = find_vma_intersection(mm, vma->vm_end, gap_addr); > 1067 if (next && !(next->vm_flags & VM_GROWSUP)) { > 1068 /* see comments in expand_downwards() */ > > 1069 if (vma_is_accessible(prev)) > 1070 return -ENOMEM; > 1071 if (address == next->vm_start) > 1072 return -ENOMEM; > 1073 } > 1074 > 1075 if (next) > 1076 vma_iter_prev_range_limit(&vmi, address); > 1077 > 1078 vma_iter_config(&vmi, vma->vm_start, address); > 1079 if (vma_iter_prealloc(&vmi, vma)) > 1080 return -ENOMEM; > 1081 > 1082 /* We must make sure the anon_vma is allocated. */ > 1083 if (unlikely(anon_vma_prepare(vma))) { > 1084 vma_iter_free(&vmi); > 1085 return -ENOMEM; > 1086 } > 1087 > 1088 /* Lock the VMA before expanding to prevent concurrent page faults */ > 1089 vma_start_write(vma); > 1090 /* > 1091 * vma->vm_start/vm_end cannot change under us because the caller > 1092 * is required to hold the mmap_lock in read mode. We need the > 1093 * anon_vma lock to serialize against concurrent expand_stacks. 
> 1094 */ > 1095 anon_vma_lock_write(vma->anon_vma); > 1096 > 1097 /* Somebody else might have raced and expanded it already */ > 1098 if (address > vma->vm_end) { > 1099 unsigned long size, grow; > 1100 > 1101 size = address - vma->vm_start; > 1102 grow = (address - vma->vm_end) >> PAGE_SHIFT; > 1103 > 1104 error = -ENOMEM; > 1105 if (vma->vm_pgoff + (size >> PAGE_SHIFT) >= vma->vm_pgoff) { > 1106 error = acct_stack_growth(vma, size, grow); > 1107 if (!error) { > 1108 /* > 1109 * We only hold a shared mmap_lock lock here, so > 1110 * we need to protect against concurrent vma > 1111 * expansions. anon_vma_lock_write() doesn't > 1112 * help here, as we don't guarantee that all > 1113 * growable vmas in a mm share the same root > 1114 * anon vma. So, we reuse mm->page_table_lock > 1115 * to guard against concurrent vma expansions. > 1116 */ > 1117 spin_lock(&mm->page_table_lock); > 1118 if (vma->vm_flags & VM_LOCKED) > 1119 mm->locked_vm += grow; > 1120 vm_stat_account(mm, vma->vm_flags, grow); > 1121 anon_vma_interval_tree_pre_update_vma(vma); > 1122 vma->vm_end = address; > 1123 /* Overwrite old entry in mtree. */ > 1124 vma_iter_store(&vmi, vma); > 1125 anon_vma_interval_tree_post_update_vma(vma); > 1126 spin_unlock(&mm->page_table_lock); > 1127 > 1128 perf_event_mmap(vma); > 1129 } > 1130 } > 1131 } > 1132 anon_vma_unlock_write(vma->anon_vma); > 1133 vma_iter_free(&vmi); > 1134 validate_mm(mm); > 1135 return error; > 1136 } > 1137 #endif /* CONFIG_STACK_GROWSUP */ > 1138 > > -- > 0-DAY CI Kernel Test Service > https://github.com/intel/lkp-tests/wiki
On Wed, Oct 9, 2024 at 4:53 PM Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote: > On Tue, Oct 08, 2024 at 04:20:13PM +0200, Jann Horn wrote: > > Though, while writing the above reproducer, I noticed another dodgy > > scenario regarding the stack gap: MAP_FIXED_NOREPLACE apparently > > ignores the stack guard region, because it only checks for VMA > > intersection, see this example: [snip] > > That could also be bad: MAP_FIXED_NOREPLACE exists, from what I > > understand, partly so that malloc implementations can use it to grow > > heap memory chunks (though glibc doesn't use it, I'm not sure who > > actually uses it that way). We wouldn't want a malloc implementation > > to grow a heap memory chunk until it is directly adjacent to a stack. > > It seems... weird to use it that way as you couldn't be sure you weren't > overwriting another VMA. Here I'm talking about MAP_FIXED_NOREPLACE, not MAP_FIXED. MAP_FIXED_NOREPLACE is supposed to be sort of like calling mmap() with an address hint, except that if creating the VMA at the provided hint is not possible, it fails. I remember Daniel Micay talking about using it in his memory allocator at some point... > > > > @@ -1155,10 +1157,47 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address) > > > > /* Enforce stack_guard_gap */ > > > > prev = vma_prev(&vmi); > > > > /* Check that both stack segments have the same anon_vma? */ > > > > - if (prev) { > > > > - if (!(prev->vm_flags & VM_GROWSDOWN) && > > > > - vma_is_accessible(prev) && > > > > - (address - prev->vm_end < stack_guard_gap)) > > > > + if (prev && !(prev->vm_flags & VM_GROWSDOWN) && > > > > + (address - prev->vm_end < stack_guard_gap)) { > > > > + /* > > > > + * If the previous VMA is accessible, this is the normal case > > > > + * where the main stack is growing down towards some unrelated > > > > + * VMA. Enforce the full stack guard gap. > > > > + */ > > > > + if (vma_is_accessible(prev)) > > > > + return -ENOMEM; > > > > + > > > > + /* > > > > + * If the previous VMA is not accessible, we have a problem: > > > > + * We can't tell what userspace's intent is. > > > > + * > > > > + * Case A: > > > > + * Maybe userspace wants to use the previous VMA as a > > > > + * "guard region" at the bottom of the main stack, in which case > > > > + * userspace wants us to grow the stack until it is adjacent to > > > > + * the guard region. Apparently some Java runtime environments > > > > + * and Rust do that? > > > > + * That is kind of ugly, and in that case userspace really ought > > > > + * to ensure that the stack is fully expanded immediately, but > > > > + * we have to handle this case. > > > > > > Yeah we can't break userspace on this, no doubt somebody is relying on this > > > _somewhere_. > > > > It would have to be a new user who appeared after commit 1be7107fbe18. > > And they'd have to install a "guard vma" somewhere below the main > > stack, and they'd have to care so much about the size of the stack > > that a single page makes a difference. > > You did say 'Apparently some Java runtime environments and Rust do that' > though right? Or am I misunderstanding? Ah, sorry, the context for this is in the commit message of commit 561b5e0709e4, and the upstream discussion leading up to it (https://lore.kernel.org/all/1499126133.2707.20.camel@decadent.org.uk/T/). 
So before commit 1be7107fbe18, these workloads worked fine despite the kernel unconditionally enforcing a single-page gap; only when 1be7107fbe18 widened that gap to 1MB did people start seeing issues, which 561b5e0709e4 was then supposed to address.

My idea with this patch was to revert the behavior for such workloads to the pre-1be7107fbe18 situation.
On Wed, 9 Oct 2024 15:53:50 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:

> > All errors (new ones prefixed by >>):
> >
> >    mm/mmap.c: In function 'expand_upwards':
> > >> mm/mmap.c:1069:39: error: 'prev' undeclared (first use in this function)
> >     1069 |                 if (vma_is_accessible(prev))
>
> Suspect this is just a simple typo and should be next rather than prev :>)

Agree, I'll make that change.

CONFIG_STACK_GROWSUP is only a parisc thing.  That makes runtime testing difficult.
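(Build-only coverage of the GROWSUP path is at least possible with a cross toolchain; a sketch, assuming an hppa-linux-gnu- cross compiler is installed and the default parisc config enables CONFIG_STACK_GROWSUP:)

# Compile-test mm/mmap.o for parisc without parisc hardware.
make ARCH=parisc CROSS_COMPILE=hppa-linux-gnu- defconfig
make ARCH=parisc CROSS_COMPILE=hppa-linux-gnu- mm/mmap.o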
On Wed, Oct 09, 2024 at 02:08:22PM -0700, Andrew Morton wrote:
> On Wed, 9 Oct 2024 15:53:50 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
> > > All errors (new ones prefixed by >>):
> > >
> > >    mm/mmap.c: In function 'expand_upwards':
> > > >> mm/mmap.c:1069:39: error: 'prev' undeclared (first use in this function)
> > >     1069 |                 if (vma_is_accessible(prev))
> >
> > Suspect this is just a simple typo and should be next rather than prev :>)
>
> Agree, I'll make that change.
>
> CONFIG_STACK_GROWSUP is only a parisc thing.  That makes runtime
> testing difficult.

Hi Andrew,

Would it be possible to drop this for now, as it is not clear we want to take this approach and there is some concern it could break some things?

I believe Jann was going to provide an alternative that attacked this from the mmap()/mprotect() side also?

Thanks!
diff --git a/mm/mmap.c b/mm/mmap.c
index dd4b35a25aeb..971bfd6c1cea 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1064,10 +1064,12 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address)
 		gap_addr = TASK_SIZE;
 
 	next = find_vma_intersection(mm, vma->vm_end, gap_addr);
-	if (next && vma_is_accessible(next)) {
-		if (!(next->vm_flags & VM_GROWSUP))
+	if (next && !(next->vm_flags & VM_GROWSUP)) {
+		/* see comments in expand_downwards() */
+		if (vma_is_accessible(prev))
+			return -ENOMEM;
+		if (address == next->vm_start)
 			return -ENOMEM;
-		/* Check that both stack segments have the same anon_vma? */
 	}
 
 	if (next)
@@ -1155,10 +1157,47 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
 	/* Enforce stack_guard_gap */
 	prev = vma_prev(&vmi);
 	/* Check that both stack segments have the same anon_vma? */
-	if (prev) {
-		if (!(prev->vm_flags & VM_GROWSDOWN) &&
-		    vma_is_accessible(prev) &&
-		    (address - prev->vm_end < stack_guard_gap))
+	if (prev && !(prev->vm_flags & VM_GROWSDOWN) &&
+	    (address - prev->vm_end < stack_guard_gap)) {
+		/*
+		 * If the previous VMA is accessible, this is the normal case
+		 * where the main stack is growing down towards some unrelated
+		 * VMA. Enforce the full stack guard gap.
+		 */
+		if (vma_is_accessible(prev))
+			return -ENOMEM;
+
+		/*
+		 * If the previous VMA is not accessible, we have a problem:
+		 * We can't tell what userspace's intent is.
+		 *
+		 * Case A:
+		 * Maybe userspace wants to use the previous VMA as a
+		 * "guard region" at the bottom of the main stack, in which case
+		 * userspace wants us to grow the stack until it is adjacent to
+		 * the guard region. Apparently some Java runtime environments
+		 * and Rust do that?
+		 * That is kind of ugly, and in that case userspace really ought
+		 * to ensure that the stack is fully expanded immediately, but
+		 * we have to handle this case.
+		 *
+		 * Case B:
+		 * But maybe the previous VMA is entirely unrelated to the stack
+		 * and is only *temporarily* PROT_NONE. For example, glibc
+		 * malloc arenas create a big PROT_NONE region and then
+		 * progressively mark parts of it as writable.
+		 * In that case, we must not let the stack become adjacent to
+		 * the previous VMA. Otherwise, after the region later becomes
+		 * writable, a stack overflow will cause the stack to grow into
+		 * the previous VMA, and we won't have any stack gap to protect
+		 * against this.
+		 *
+		 * As an ugly tradeoff, enforce a single-page gap.
+		 * A single page will hopefully be small enough to not be
+		 * noticed in case A, while providing the same level of
+		 * protection in case B that normal userspace threads get.
+		 */
+		if (address == prev->vm_end)
 			return -ENOMEM;
 	}
As explained in the comment block this change adds, we can't tell what userspace's intent is when the stack grows towards an inaccessible VMA.

I have a (highly contrived) C testcase for 32-bit x86 userspace with glibc that mixes malloc(), pthread creation, and recursion in just the right way such that the main stack overflows into malloc() arena memory. (Let me know if you want me to share that.)

I don't know of any specific scenario where this is actually exploitable, but it seems like it could be a security problem for sufficiently unlucky userspace.

I believe we should ensure that, as long as code is compiled with something like -fstack-check, a stack overflow in it can never cause the main stack to overflow into adjacent heap memory.

My fix effectively reverts the behavior for !vma_is_accessible() VMAs to the behavior before commit 1be7107fbe18 ("mm: larger stack guard gap, between vmas"), so I think it should be a fairly safe change even in case A.

Fixes: 561b5e0709e4 ("mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
---
I have tested that Libreoffice still launches after this change, though I don't know if that means anything.

Note that I haven't tested the growsup code.
---
 mm/mmap.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 46 insertions(+), 7 deletions(-)

---
base-commit: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b
change-id: 20241008-stack-gap-inaccessible-c7319f7d4b1b
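(For context, a minimal illustration of the -fstack-check assumption above; it is not part of the patch, and the file name and frame size are only illustrative. With an intact guard gap, the per-frame stack probe faults and the process dies with SIGSEGV; with an accessible mapping directly adjacent to the stack, the same probe succeeds and the overflow silently clobbers that mapping.)

/*
 * Illustration only: unbounded recursion with a roughly page-sized frame,
 * built e.g. as "gcc -O0 -fstack-check -o stack-probe-demo stack-probe-demo.c".
 * -fstack-check makes each new frame touch memory a little below the current
 * stack pointer before using it; with a guard gap in place that probe faults,
 * but an accessible VMA directly below the stack would absorb it instead.
 */
#include <string.h>

static unsigned recurse(unsigned depth)
{
	unsigned char frame[4096];	/* roughly one page per call */

	memset(frame, (unsigned char)depth, sizeof(frame));
	/* Not a tail call, so the compiler keeps one frame per level. */
	return frame[0] + recurse(depth + 1);
}

int main(void)
{
	return (int)recurse(0);
}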