Message ID | CAEjxPJ5ZuBGVDah0f3g0-7t2v1uSXTmp_cTT3g_MSP3J9QtoeA@mail.gmail.com (mailing list archive) |
---|---|
State | Handled Elsewhere |
Delegated to: | Paul Moore |
Headers | show |
Series | None | expand |
On Mon, Jul 31, 2023 at 10:26 AM Stephen Smalley <stephen.smalley.work@gmail.com> wrote: > I believe this patch yields a semantic change in the SELinux execheap > permission check. That said, I think the change is for the better. Agreed. I'm also in favor of using a helper which is maintained by the VM folks over open coded logic in the SELinux code.
(Restoring part of the Cc list to include more relevant lists & people... If you are lost, the original email is here: https://lore.kernel.org/selinux/20230728050043.59880-4-wangkefeng.wang@huawei.com/) On Tue, Aug 1, 2023 at 1:08 AM Paul Moore <paul@paul-moore.com> wrote: > > On Mon, Jul 31, 2023 at 4:02 PM Stephen Smalley > <stephen.smalley.work@gmail.com> wrote: > > On Mon, Jul 31, 2023 at 12:19 PM Paul Moore <paul@paul-moore.com> wrote: > > > > > > On Mon, Jul 31, 2023 at 10:26 AM Stephen Smalley > > > <stephen.smalley.work@gmail.com> wrote: > > > > I believe this patch yields a semantic change in the SELinux execheap > > > > permission check. That said, I think the change is for the better. > > > > > > Agreed. I'm also in favor of using a helper which is maintained by > > > the VM folks over open coded logic in the SELinux code. > > > > Yes, only caveat is in theory it could trigger new execheap denials > > under existing policies. > > Trying to construct an example based on the > > selinux-testsuite/tests/mmap/mprot_heap.c example but coming up empty > > so far on something that both works and yields different results > > before and after this patch. > > My gut feeling is that this will not be an issue, but I could very > well be wrong. If it becomes a significant issue we can revert the > SELinux portion of the patch. > > Of course, if you have any luck demonstrating this with reasonable > code, that would be good input too. So, it turns out this does affect actual code. Thus far, we know about gcl [1] and wine [2]. The gcl case is easy to reproduce (just install gcl on Fedora and run gcl without arguments), so I was able to dig a bit deeper. gcl has the following relevant memory mappings as captured by gdb: Start Addr End Addr Size Offset Perms objfile 0x413000 0xf75000 0xb62000 0x3fa000 rw-p /usr/lib/gcl-2.6.14/unixport/saved_ansi_gcl 0xf75000 0xf79000 0x4000 0x0 rwxp [heap] It tries to call mprotect(0x883000, 7282688, PROT_READ|PROT_WRITE|PROT_EXEC), i.e. it tries to make the region 0x883000 - 0xf75000 executable. Before this patch it was allowed, whereas now it triggers an execheap SELinux denial. But this seems wrong - the affected region is merely adjacent to the [heap] region, it does not actually overlap with it. So even if we accept that the correct semantics is to catch any region that overlaps with the heap (before only subregions of the heap were subject to the execheap check), this corner case doesn't seem to be handled correctly by the new check (and the same bug seems to have been in fs/proc/task_mmu.c before commit 11250fd12eb8 ("mm: factor out VMA stack and heap checks")). I didn't analyze the wine case ([2]), but it may be the same situation. Unless I'm mistaken, those <= & >= in should in fact be just < & >. And the expression in vma_is_initial_stack() is also suspicious (but I'm not going to make any assumption on what is the intended semantics there...) [1] https://bugzilla.redhat.com/show_bug.cgi?id=2252391 [2] https://bugzilla.redhat.com/show_bug.cgi?id=2247299 -- Ondrej Mosnacek Senior Software Engineer, Linux Security - SELinux kernel Red Hat, Inc.
On Wed, Dec 6, 2023 at 9:22 AM Ondrej Mosnacek <omosnace@redhat.com> wrote: > > (Restoring part of the Cc list to include more relevant lists & > people... If you are lost, the original email is here: > https://lore.kernel.org/selinux/20230728050043.59880-4-wangkefeng.wang@huawei.com/) > > On Tue, Aug 1, 2023 at 1:08 AM Paul Moore <paul@paul-moore.com> wrote: > > On Mon, Jul 31, 2023 at 4:02 PM Stephen Smalley > > <stephen.smalley.work@gmail.com> wrote: > > > On Mon, Jul 31, 2023 at 12:19 PM Paul Moore <paul@paul-moore.com> wrote: > > > > > > > > On Mon, Jul 31, 2023 at 10:26 AM Stephen Smalley > > > > <stephen.smalley.work@gmail.com> wrote: > > > > > I believe this patch yields a semantic change in the SELinux execheap > > > > > permission check. That said, I think the change is for the better. > > > > > > > > Agreed. I'm also in favor of using a helper which is maintained by > > > > the VM folks over open coded logic in the SELinux code. > > > > > > Yes, only caveat is in theory it could trigger new execheap denials > > > under existing policies. > > > Trying to construct an example based on the > > > selinux-testsuite/tests/mmap/mprot_heap.c example but coming up empty > > > so far on something that both works and yields different results > > > before and after this patch. > > > > My gut feeling is that this will not be an issue, but I could very > > well be wrong. If it becomes a significant issue we can revert the > > SELinux portion of the patch. > > > > Of course, if you have any luck demonstrating this with reasonable > > code, that would be good input too. > > So, it turns out this does affect actual code. Thus far, we know about > gcl [1] and wine [2]. The gcl case is easy to reproduce (just install > gcl on Fedora and run gcl without arguments), so I was able to dig a > bit deeper. > > gcl has the following relevant memory mappings as captured by gdb: > Start Addr End Addr Size Offset Perms objfile > 0x413000 0xf75000 0xb62000 0x3fa000 rw-p > /usr/lib/gcl-2.6.14/unixport/saved_ansi_gcl > 0xf75000 0xf79000 0x4000 0x0 rwxp [heap] > > It tries to call mprotect(0x883000, 7282688, > PROT_READ|PROT_WRITE|PROT_EXEC), i.e. it tries to make the region > 0x883000 - 0xf75000 executable. Before this patch it was allowed, > whereas now it triggers an execheap SELinux denial. But this seems > wrong - the affected region is merely adjacent to the [heap] region, > it does not actually overlap with it. So even if we accept that the > correct semantics is to catch any region that overlaps with the heap > (before only subregions of the heap were subject to the execheap > check), this corner case doesn't seem to be handled correctly by the > new check (and the same bug seems to have been in fs/proc/task_mmu.c > before commit 11250fd12eb8 ("mm: factor out VMA stack and heap > checks")). > > I didn't analyze the wine case ([2]), but it may be the same situation. > > Unless I'm mistaken, those <= & >= in should in fact be just < & >. > And the expression in vma_is_initial_stack() is also suspicious (but > I'm not going to make any assumption on what is the intended semantics > there...) > > [1] https://bugzilla.redhat.com/show_bug.cgi?id=2252391 > [2] https://bugzilla.redhat.com/show_bug.cgi?id=2247299 Thanks Ondrej. I'm hoping the mm folks will comment on this as it looks like this is an issue with the helper functions, but just in case I'm going to prep a revert for just the SELinux changes. If we don't hear anything in the next couple of days I'll send the revert up to Linus with the idea that we can eventually shift back to the helpers when this is sorted.
On 2023/12/7 4:47, Paul Moore wrote: > On Wed, Dec 6, 2023 at 9:22 AM Ondrej Mosnacek <omosnace@redhat.com> wrote: >> >> (Restoring part of the Cc list to include more relevant lists & >> people... If you are lost, the original email is here: >> https://lore.kernel.org/selinux/20230728050043.59880-4-wangkefeng.wang@huawei.com/) >> >> On Tue, Aug 1, 2023 at 1:08 AM Paul Moore <paul@paul-moore.com> wrote: >>> On Mon, Jul 31, 2023 at 4:02 PM Stephen Smalley >>> <stephen.smalley.work@gmail.com> wrote: >>>> On Mon, Jul 31, 2023 at 12:19 PM Paul Moore <paul@paul-moore.com> wrote: >>>>> >>>>> On Mon, Jul 31, 2023 at 10:26 AM Stephen Smalley >>>>> <stephen.smalley.work@gmail.com> wrote: >>>>>> I believe this patch yields a semantic change in the SELinux execheap >>>>>> permission check. That said, I think the change is for the better. >>>>> >>>>> Agreed. I'm also in favor of using a helper which is maintained by >>>>> the VM folks over open coded logic in the SELinux code. >>>> >>>> Yes, only caveat is in theory it could trigger new execheap denials >>>> under existing policies. >>>> Trying to construct an example based on the >>>> selinux-testsuite/tests/mmap/mprot_heap.c example but coming up empty >>>> so far on something that both works and yields different results >>>> before and after this patch. >>> >>> My gut feeling is that this will not be an issue, but I could very >>> well be wrong. If it becomes a significant issue we can revert the >>> SELinux portion of the patch. >>> >>> Of course, if you have any luck demonstrating this with reasonable >>> code, that would be good input too. >> >> So, it turns out this does affect actual code. Thus far, we know about >> gcl [1] and wine [2]. The gcl case is easy to reproduce (just install >> gcl on Fedora and run gcl without arguments), so I was able to dig a >> bit deeper. >> >> gcl has the following relevant memory mappings as captured by gdb: >> Start Addr End Addr Size Offset Perms objfile >> 0x413000 0xf75000 0xb62000 0x3fa000 rw-p >> /usr/lib/gcl-2.6.14/unixport/saved_ansi_gcl >> 0xf75000 0xf79000 0x4000 0x0 rwxp [heap] >> >> It tries to call mprotect(0x883000, 7282688, >> PROT_READ|PROT_WRITE|PROT_EXEC), i.e. it tries to make the region >> 0x883000 - 0xf75000 executable. Before this patch it was allowed, >> whereas now it triggers an execheap SELinux denial. But this seems >> wrong - the affected region is merely adjacent to the [heap] region, >> it does not actually overlap with it. So even if we accept that the >> correct semantics is to catch any region that overlaps with the heap >> (before only subregions of the heap were subject to the execheap >> check), this corner case doesn't seem to be handled correctly by the >> new check (and the same bug seems to have been in fs/proc/task_mmu.c >> before commit 11250fd12eb8 ("mm: factor out VMA stack and heap >> checks")). Yes, the heap check exists for a long time, [start_brk brk] case1: [vm_start,vm_end] case2: [vm_start,vm_end] case3: [vm_start,vm_end] case4: [vm_start, vm_end] case5: [vm_start,vm_end] case6: [vm_start,vm_end] old check: vma->vm_start >= vma->vm_mm->start_brk && vma->vm_end <= vma->vm_mm->brk Only include case1, vma range must be within heap new check: vma->vm_start <= vma->vm_mm->brk && vma->vm_end >= vma->vm_mm->start_brk Include case1~case6, but case5(vm_end=start_brk) and case6(vm_start=brk) are the corner cases, gcl issue matchs the case5. >> >> I didn't analyze the wine case ([2]), but it may be the same situation. >> >> Unless I'm mistaken, those <= & >= in should in fact be just < & >. I support this change. vma->vm_start < vma->vm_mm->brk && vma->vm_end > vma->vm_mm->start_brk >> And the expression in vma_is_initial_stack() is also suspicious (but >> I'm not going to make any assumption on what is the intended semantics >> there...) >> >> [1] https://bugzilla.redhat.com/show_bug.cgi?id=2252391 >> [2] https://bugzilla.redhat.com/show_bug.cgi?id=2247299 Could you quickly verify after the above change? > > Thanks Ondrej. > > I'm hoping the mm folks will comment on this as it looks like this is > an issue with the helper functions, but just in case I'm going to prep > a revert for just the SELinux changes. If we don't hear anything in > the next couple of days I'll send the revert up to Linus with the idea > that we can eventually shift back to the helpers when this is sorted. >
On Thu, Dec 7, 2023 at 5:50 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote: > > > > On 2023/12/7 4:47, Paul Moore wrote: > > On Wed, Dec 6, 2023 at 9:22 AM Ondrej Mosnacek <omosnace@redhat.com> wrote: > >> > >> (Restoring part of the Cc list to include more relevant lists & > >> people... If you are lost, the original email is here: > >> https://lore.kernel.org/selinux/20230728050043.59880-4-wangkefeng.wang@huawei.com/) > >> > >> On Tue, Aug 1, 2023 at 1:08 AM Paul Moore <paul@paul-moore.com> wrote: > >>> On Mon, Jul 31, 2023 at 4:02 PM Stephen Smalley > >>> <stephen.smalley.work@gmail.com> wrote: > >>>> On Mon, Jul 31, 2023 at 12:19 PM Paul Moore <paul@paul-moore.com> wrote: > >>>>> > >>>>> On Mon, Jul 31, 2023 at 10:26 AM Stephen Smalley > >>>>> <stephen.smalley.work@gmail.com> wrote: > >>>>>> I believe this patch yields a semantic change in the SELinux execheap > >>>>>> permission check. That said, I think the change is for the better. > >>>>> > >>>>> Agreed. I'm also in favor of using a helper which is maintained by > >>>>> the VM folks over open coded logic in the SELinux code. > >>>> > >>>> Yes, only caveat is in theory it could trigger new execheap denials > >>>> under existing policies. > >>>> Trying to construct an example based on the > >>>> selinux-testsuite/tests/mmap/mprot_heap.c example but coming up empty > >>>> so far on something that both works and yields different results > >>>> before and after this patch. > >>> > >>> My gut feeling is that this will not be an issue, but I could very > >>> well be wrong. If it becomes a significant issue we can revert the > >>> SELinux portion of the patch. > >>> > >>> Of course, if you have any luck demonstrating this with reasonable > >>> code, that would be good input too. > >> > >> So, it turns out this does affect actual code. Thus far, we know about > >> gcl [1] and wine [2]. The gcl case is easy to reproduce (just install > >> gcl on Fedora and run gcl without arguments), so I was able to dig a > >> bit deeper. > >> > >> gcl has the following relevant memory mappings as captured by gdb: > >> Start Addr End Addr Size Offset Perms objfile > >> 0x413000 0xf75000 0xb62000 0x3fa000 rw-p > >> /usr/lib/gcl-2.6.14/unixport/saved_ansi_gcl > >> 0xf75000 0xf79000 0x4000 0x0 rwxp [heap] > >> > >> It tries to call mprotect(0x883000, 7282688, > >> PROT_READ|PROT_WRITE|PROT_EXEC), i.e. it tries to make the region > >> 0x883000 - 0xf75000 executable. Before this patch it was allowed, > >> whereas now it triggers an execheap SELinux denial. But this seems > >> wrong - the affected region is merely adjacent to the [heap] region, > >> it does not actually overlap with it. So even if we accept that the > >> correct semantics is to catch any region that overlaps with the heap > >> (before only subregions of the heap were subject to the execheap > >> check), this corner case doesn't seem to be handled correctly by the > >> new check (and the same bug seems to have been in fs/proc/task_mmu.c > >> before commit 11250fd12eb8 ("mm: factor out VMA stack and heap > >> checks")). > > Yes, the heap check exists for a long time, > > [start_brk brk] > case1: [vm_start,vm_end] > case2: [vm_start,vm_end] > case3: [vm_start,vm_end] > case4: [vm_start, vm_end] > > case5: [vm_start,vm_end] > case6: [vm_start,vm_end] > > old check: > vma->vm_start >= vma->vm_mm->start_brk && vma->vm_end <= vma->vm_mm->brk > > Only include case1, vma range must be within heap > > new check: > vma->vm_start <= vma->vm_mm->brk && vma->vm_end >= vma->vm_mm->start_brk > > Include case1~case6, but case5(vm_end=start_brk) and case6(vm_start=brk) > are the corner cases, gcl issue matchs the case5. > > >> > >> I didn't analyze the wine case ([2]), but it may be the same situation. > >> > >> Unless I'm mistaken, those <= & >= in should in fact be just < & >. > > I support this change. > vma->vm_start < vma->vm_mm->brk && vma->vm_end > vma->vm_mm->start_brk > > >> And the expression in vma_is_initial_stack() is also suspicious (but > >> I'm not going to make any assumption on what is the intended semantics > >> there...) > >> > >> [1] https://bugzilla.redhat.com/show_bug.cgi?id=2252391 > >> [2] https://bugzilla.redhat.com/show_bug.cgi?id=2247299 > > Could you quickly verify after the above change? Yes, changing the operators as suggested fixes the gcl case. BTW, the vma_is_*() helpers introduced by your patch also have wrong indentation - the "return" lines are indented by 7 spaces instead of a tab. If you are going to submit a fixing patch you might want to fix that, too.
On 2023/12/7 16:37, Ondrej Mosnacek wrote: > On Thu, Dec 7, 2023 at 5:50 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote: >> >> >> >> On 2023/12/7 4:47, Paul Moore wrote: >>> On Wed, Dec 6, 2023 at 9:22 AM Ondrej Mosnacek <omosnace@redhat.com> wrote: >>>> >>>> (Restoring part of the Cc list to include more relevant lists & >>>> people... If you are lost, the original email is here: >>>> https://lore.kernel.org/selinux/20230728050043.59880-4-wangkefeng.wang@huawei.com/) >>>> >>>> On Tue, Aug 1, 2023 at 1:08 AM Paul Moore <paul@paul-moore.com> wrote: >>>>> On Mon, Jul 31, 2023 at 4:02 PM Stephen Smalley >>>>> <stephen.smalley.work@gmail.com> wrote: >>>>>> On Mon, Jul 31, 2023 at 12:19 PM Paul Moore <paul@paul-moore.com> wrote: >>>>>>> >>>>>>> On Mon, Jul 31, 2023 at 10:26 AM Stephen Smalley >>>>>>> <stephen.smalley.work@gmail.com> wrote: >>>>>>>> I believe this patch yields a semantic change in the SELinux execheap >>>>>>>> permission check. That said, I think the change is for the better. >>>>>>> >>>>>>> Agreed. I'm also in favor of using a helper which is maintained by >>>>>>> the VM folks over open coded logic in the SELinux code. >>>>>> >>>>>> Yes, only caveat is in theory it could trigger new execheap denials >>>>>> under existing policies. >>>>>> Trying to construct an example based on the >>>>>> selinux-testsuite/tests/mmap/mprot_heap.c example but coming up empty >>>>>> so far on something that both works and yields different results >>>>>> before and after this patch. >>>>> >>>>> My gut feeling is that this will not be an issue, but I could very >>>>> well be wrong. If it becomes a significant issue we can revert the >>>>> SELinux portion of the patch. >>>>> >>>>> Of course, if you have any luck demonstrating this with reasonable >>>>> code, that would be good input too. >>>> >>>> So, it turns out this does affect actual code. Thus far, we know about >>>> gcl [1] and wine [2]. The gcl case is easy to reproduce (just install >>>> gcl on Fedora and run gcl without arguments), so I was able to dig a >>>> bit deeper. >>>> >>>> gcl has the following relevant memory mappings as captured by gdb: >>>> Start Addr End Addr Size Offset Perms objfile >>>> 0x413000 0xf75000 0xb62000 0x3fa000 rw-p >>>> /usr/lib/gcl-2.6.14/unixport/saved_ansi_gcl >>>> 0xf75000 0xf79000 0x4000 0x0 rwxp [heap] >>>> >>>> It tries to call mprotect(0x883000, 7282688, >>>> PROT_READ|PROT_WRITE|PROT_EXEC), i.e. it tries to make the region >>>> 0x883000 - 0xf75000 executable. Before this patch it was allowed, >>>> whereas now it triggers an execheap SELinux denial. But this seems >>>> wrong - the affected region is merely adjacent to the [heap] region, >>>> it does not actually overlap with it. So even if we accept that the >>>> correct semantics is to catch any region that overlaps with the heap >>>> (before only subregions of the heap were subject to the execheap >>>> check), this corner case doesn't seem to be handled correctly by the >>>> new check (and the same bug seems to have been in fs/proc/task_mmu.c >>>> before commit 11250fd12eb8 ("mm: factor out VMA stack and heap >>>> checks")). >> >> Yes, the heap check exists for a long time, >> >> [start_brk brk] >> case1: [vm_start,vm_end] >> case2: [vm_start,vm_end] >> case3: [vm_start,vm_end] >> case4: [vm_start, vm_end] >> >> case5: [vm_start,vm_end] >> case6: [vm_start,vm_end] >> >> old check: >> vma->vm_start >= vma->vm_mm->start_brk && vma->vm_end <= vma->vm_mm->brk >> >> Only include case1, vma range must be within heap >> >> new check: >> vma->vm_start <= vma->vm_mm->brk && vma->vm_end >= vma->vm_mm->start_brk >> >> Include case1~case6, but case5(vm_end=start_brk) and case6(vm_start=brk) >> are the corner cases, gcl issue matchs the case5. >> >>>> >>>> I didn't analyze the wine case ([2]), but it may be the same situation. >>>> >>>> Unless I'm mistaken, those <= & >= in should in fact be just < & >. >> >> I support this change. >> vma->vm_start < vma->vm_mm->brk && vma->vm_end > vma->vm_mm->start_brk >> >>>> And the expression in vma_is_initial_stack() is also suspicious (but >>>> I'm not going to make any assumption on what is the intended semantics >>>> there...) >>>> >>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=2252391 >>>> [2] https://bugzilla.redhat.com/show_bug.cgi?id=2247299 >> >> Could you quickly verify after the above change? > > Yes, changing the operators as suggested fixes the gcl case. Thanks for your confirm,win[2] fixed too, right? > > BTW, the vma_is_*() helpers introduced by your patch also have wrong > indentation - the "return" lines are indented by 7 spaces instead of a > tab. If you are going to submit a fixing patch you might want to fix > that, too. Sure. will fix.
It looks like this issue is still not fixed. There has been some investigation on going in this Bugzilla for Fedora: https://bugzilla.redhat.com/show_bug.cgi?id=2254434 The behavior we are seeing is that when a process has no heap and mmap(2) is called with MAP_PRIVATE | MAP_ANONYMOUS, it allocates memory on the heap. If the address space returned by mmap(2) is later on made executable with mprotect(2), that triggers an execheap avc. We have a fairly simple reproducer. We widdled it down from picking an address to pass to mmap(2) using getrandom(2) to using the same address every time. Sometimes it triggers the behavior, sometimes it does not. We also observe that disabling ASLR via sysctl kernel.randomize_va_space=0 works around the issue, with obvious caveats. Here is a reproducer, which raises SIGSTOP so that you can analyze /proc/<pid>/maps. It allocates a 512 MB address space and then attempts to give it execute permissions. Running it multiple times will eventually trigger the behavior. #include <stdint.h> #include <unistd.h> #include <errno.h> #include <sys/mman.h> #include <sys/random.h> #include <signal.h> int main(void) { uintptr_t raw_addr = 0x25085000; int length = 512 * 1024 * 1024; void* pointer = mmap((void *)raw_addr, length, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); if (!pointer) { raise(SIGSTOP); return 1; } if (mprotect(pointer, length, PROT_READ | PROT_WRITE | PROT_EXEC) == -1) { raise(SIGSTOP); return 1; } return 0; } As far as impact goes, this is mostly causing Chromium/Electron based applications to fail randomly, but I believe other applications such as wine-preloader (which I do not think is based on Chromium) are also affected. This has been reproduced on kernels >= 6.6. In reviewing the code, my best guess is that this is caused by the scenario where brk == start_brk not being handled, though I am not expert enough in kernel code to know. If the start address allocated by mmap is before the starting program break, and the end address is after the starting program break, then the avc will trigger. However, I don't know how mmap deals with determining an address and if it takes into account the program break, or if calling brk(2) later on will just pick a new location.
On Wed, Aug 7, 2024 at 5:26 PM Marc Reisner <reisner.marc@gmail.com> wrote: > > It looks like this issue is still not fixed. There has been some > investigation on going in this Bugzilla for Fedora: > > https://bugzilla.redhat.com/show_bug.cgi?id=2254434 FWIW, I recently learned there was also a report in the kernel bugzilla about the same issue: https://bugzilla.kernel.org/show_bug.cgi?id=218258 > The behavior we are seeing is that when a process has no heap and > mmap(2) is called with MAP_PRIVATE | MAP_ANONYMOUS, it allocates memory > on the heap. > > If the address space returned by mmap(2) is later on made executable > with mprotect(2), that triggers an execheap avc. ... > This has been reproduced on kernels >= 6.6. > > In reviewing the code, my best guess is that this is caused by the > scenario where brk == start_brk not being handled, though I am not > expert enough in kernel code to know. If the start address allocated > by mmap is before the starting program break, and the end address is > after the starting program break, then the avc will trigger. However, > I don't know how mmap deals with determining an address and if it takes > into account the program break, or if calling brk(2) later on will just > pick a new location. I'm not a mm expert, but thankfully we have some on the To/CC line so I'm hopeful they will be able to take a look and provide some insight. To bring the relevant code into this email, prior to using the vma_is_initial_heap() helper the SELinux execheap logic looked like this: /* WORKING */ if (vma->vm_start >= vma->vm_mm->start_brk && vma->vm_end <= vma->vm_mm->brk) /* execheap denial */ ... while the current vma_is_initial_heap() helper has logic that looks like this: /* BUGGY */ if (vma->vm_start < vma->vm_mm->brk && vma->vm_end > vma->vm_mm->start_brk) /* execheap denial */
On 2024/8/8 9:10, Paul Moore wrote: > On Wed, Aug 7, 2024 at 5:26 PM Marc Reisner <reisner.marc@gmail.com> wrote: >> >> It looks like this issue is still not fixed. There has been some >> investigation on going in this Bugzilla for Fedora: >> >> https://bugzilla.redhat.com/show_bug.cgi?id=2254434 > > FWIW, I recently learned there was also a report in the kernel > bugzilla about the same issue: > > https://bugzilla.kernel.org/show_bug.cgi?id=218258 > >> The behavior we are seeing is that when a process has no heap and >> mmap(2) is called with MAP_PRIVATE | MAP_ANONYMOUS, it allocates memory >> on the heap. >> >> If the address space returned by mmap(2) is later on made executable >> with mprotect(2), that triggers an execheap avc. > > ... > >> This has been reproduced on kernels >= 6.6. I try the reproducer, but fails to reproduce on my 6.6 >> >> In reviewing the code, my best guess is that this is caused by the >> scenario where brk == start_brk not being handled, though I am not Is there vma_start,end and start_brk,brk infos, adding some debug print. diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index 81fbfa5b80d4..b66c381c558e 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -3848,6 +3848,9 @@ static int selinux_file_mprotect(struct vm_area_struct *vma, if (vma_is_initial_heap(vma)) { rc = avc_has_perm(sid, sid, SECCLASS_PROCESS, PROCESS__EXECHEAP, NULL); + if (rc) + pr_err("DEBUG: vma_start,end=%lx,%lx, start_brk,brk=%lx,%lx\n", + vma->vm_start, vma->vm_end, vma->vm_mm->start_brk, vma->vm_mm->brk); } else if (!vma->vm_file && (vma_is_initial_stack(vma) || vma_is_stack_for_current(vma))) { rc = avc_has_perm(sid, sid, SECCLASS_PROCESS, >> expert enough in kernel code to know. If the start address allocated >> by mmap is before the starting program break, and the end address is >> after the starting program break, then the avc will trigger. However, >> I don't know how mmap deals with determining an address and if it takes >> into account the program break, or if calling brk(2) later on will just >> pick a new location. > > I'm not a mm expert, but thankfully we have some on the To/CC line so > I'm hopeful they will be able to take a look and provide some insight. +Cc mmap maintainers too. > > To bring the relevant code into this email, prior to using the > vma_is_initial_heap() helper the SELinux execheap logic looked like > this: > > /* WORKING */ > if (vma->vm_start >= vma->vm_mm->start_brk && > vma->vm_end <= vma->vm_mm->brk) > /* execheap denial */ > > ... while the current vma_is_initial_heap() helper has logic that > looks like this: > > /* BUGGY */ > if (vma->vm_start < vma->vm_mm->brk && > vma->vm_end > vma->vm_mm->start_brk) > /* execheap denial */ >
On 2024/8/8 14:43, Kefeng Wang wrote: > > On 2024/8/8 9:10, Paul Moore wrote: >> On Wed, Aug 7, 2024 at 5:26 PM Marc Reisner <reisner.marc@gmail.com> >> wrote: >>> >>> It looks like this issue is still not fixed. There has been some >>> investigation on going in this Bugzilla for Fedora: >>> >>> https://bugzilla.redhat.com/show_bug.cgi?id=2254434 >> >> FWIW, I recently learned there was also a report in the kernel >> bugzilla about the same issue: >> >> https://bugzilla.kernel.org/show_bug.cgi?id=218258 >> >>> The behavior we are seeing is that when a process has no heap and >>> mmap(2) is called with MAP_PRIVATE | MAP_ANONYMOUS, it allocates memory >>> on the heap. >>> >>> If the address space returned by mmap(2) is later on made executable >>> with mprotect(2), that triggers an execheap avc. >> >> ... >> >>> This has been reproduced on kernels >= 6.6. > > I try the reproducer, but fails to reproduce on my 6.6 After enable selinux then trigger the issue. >>> >>> In reviewing the code, my best guess is that this is caused by the >>> scenario where brk == start_brk not being handled, though I am not > > Is there vma_start,end and start_brk,brk infos, adding some debug print. > > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c > index 81fbfa5b80d4..b66c381c558e 100644 > --- a/security/selinux/hooks.c > +++ b/security/selinux/hooks.c > @@ -3848,6 +3848,9 @@ static int selinux_file_mprotect(struct > vm_area_struct *vma, > if (vma_is_initial_heap(vma)) { > rc = avc_has_perm(sid, sid, SECCLASS_PROCESS, > PROCESS__EXECHEAP, NULL); > + if (rc) > + pr_err("DEBUG: vma_start,end=%lx,%lx, > start_brk,brk=%lx,%lx\n", > + vma->vm_start, vma->vm_end, > vma->vm_mm->start_brk, vma->vm_mm->brk); > } else if (!vma->vm_file && (vma_is_initial_stack(vma) || > vma_is_stack_for_current(vma))) { > rc = avc_has_perm(sid, sid, SECCLASS_PROCESS, [ 3520.913952] DEBUG: pid=769674,comm=test-vma, vma:start,end=1e9c000,21e9c000, start_brk,brk=2310000,2310000 vma_start < start_brk = brk < vma_end > >>> expert enough in kernel code to know. If the start address allocated >>> by mmap is before the starting program break, and the end address is >>> after the starting program break, then the avc will trigger. However, >>> I don't know how mmap deals with determining an address and if it takes >>> into account the program break, or if calling brk(2) later on will just >>> pick a new location. >> >> I'm not a mm expert, but thankfully we have some on the To/CC line so >> I'm hopeful they will be able to take a look and provide some insight. > > +Cc mmap maintainers too. > >> >> To bring the relevant code into this email, prior to using the >> vma_is_initial_heap() helper the SELinux execheap logic looked like >> this: >> >> /* WORKING */ >> if (vma->vm_start >= vma->vm_mm->start_brk && >> vma->vm_end <= vma->vm_mm->brk) >> /* execheap denial */ >> vma range must be within heap. >> ... while the current vma_is_initial_heap() helper has logic that >> looks like this: >> >> /* BUGGY */ >> if (vma->vm_start < vma->vm_mm->brk && >> vma->vm_end > vma->vm_mm->start_brk) >> /* execheap denial */ vma range intersection with brk range will be rejected, and there's a cross in above case, so execheap denial, I not sure it should be allowed or not, hope maintainers give some comments here. Thanks.
On Thu, Aug 8, 2024 at 7:09 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote: > > > > On 2024/8/8 14:43, Kefeng Wang wrote: > > > > On 2024/8/8 9:10, Paul Moore wrote: > >> On Wed, Aug 7, 2024 at 5:26 PM Marc Reisner <reisner.marc@gmail.com> > >> wrote: > >>> > >>> It looks like this issue is still not fixed. There has been some > >>> investigation on going in this Bugzilla for Fedora: > >>> > >>> https://bugzilla.redhat.com/show_bug.cgi?id=2254434 > >> > >> FWIW, I recently learned there was also a report in the kernel > >> bugzilla about the same issue: > >> > >> https://bugzilla.kernel.org/show_bug.cgi?id=218258 > >> > >>> The behavior we are seeing is that when a process has no heap and > >>> mmap(2) is called with MAP_PRIVATE | MAP_ANONYMOUS, it allocates memory > >>> on the heap. > >>> > >>> If the address space returned by mmap(2) is later on made executable > >>> with mprotect(2), that triggers an execheap avc. > >> > >> ... > >> > >>> This has been reproduced on kernels >= 6.6. > > > > I try the reproducer, but fails to reproduce on my 6.6 > > After enable selinux then trigger the issue. > > >>> > >>> In reviewing the code, my best guess is that this is caused by the > >>> scenario where brk == start_brk not being handled, though I am not > > > > Is there vma_start,end and start_brk,brk infos, adding some debug print. > > > > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c > > index 81fbfa5b80d4..b66c381c558e 100644 > > --- a/security/selinux/hooks.c > > +++ b/security/selinux/hooks.c > > @@ -3848,6 +3848,9 @@ static int selinux_file_mprotect(struct > > vm_area_struct *vma, > > if (vma_is_initial_heap(vma)) { > > rc = avc_has_perm(sid, sid, SECCLASS_PROCESS, > > PROCESS__EXECHEAP, NULL); > > + if (rc) > > + pr_err("DEBUG: vma_start,end=%lx,%lx, > > start_brk,brk=%lx,%lx\n", > > + vma->vm_start, vma->vm_end, > > vma->vm_mm->start_brk, vma->vm_mm->brk); > > } else if (!vma->vm_file && (vma_is_initial_stack(vma) || > > vma_is_stack_for_current(vma))) { > > rc = avc_has_perm(sid, sid, SECCLASS_PROCESS, > > > [ 3520.913952] DEBUG: pid=769674,comm=test-vma, > vma:start,end=1e9c000,21e9c000, start_brk,brk=2310000,2310000 > > > vma_start < start_brk = brk < vma_end > > > > >>> expert enough in kernel code to know. If the start address allocated > >>> by mmap is before the starting program break, and the end address is > >>> after the starting program break, then the avc will trigger. However, > >>> I don't know how mmap deals with determining an address and if it takes > >>> into account the program break, or if calling brk(2) later on will just > >>> pick a new location. > >> > >> I'm not a mm expert, but thankfully we have some on the To/CC line so > >> I'm hopeful they will be able to take a look and provide some insight. > > > > +Cc mmap maintainers too. > > > >> > >> To bring the relevant code into this email, prior to using the > >> vma_is_initial_heap() helper the SELinux execheap logic looked like > >> this: > >> > >> /* WORKING */ > >> if (vma->vm_start >= vma->vm_mm->start_brk && > >> vma->vm_end <= vma->vm_mm->brk) > >> /* execheap denial */ > >> > > vma range must be within heap. > > >> ... while the current vma_is_initial_heap() helper has logic that > >> looks like this: > >> > >> /* BUGGY */ > >> if (vma->vm_start < vma->vm_mm->brk && > >> vma->vm_end > vma->vm_mm->start_brk) > >> /* execheap denial */ > > vma range intersection with brk range will be rejected, and > there's a cross in above case, so execheap denial, I not sure it > should be allowed or not, hope maintainers give some comments here. It's a kernel regression so we need to revert the change to SELinux at least - can't start denying permission without a new/updated policy (e.g. making the change in the check conditional on a new policy capability).
On 2024/8/8 19:41, Stephen Smalley wrote: > On Thu, Aug 8, 2024 at 7:09 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote: >> >> >> >> On 2024/8/8 14:43, Kefeng Wang wrote: >>> >>> On 2024/8/8 9:10, Paul Moore wrote: >>>> On Wed, Aug 7, 2024 at 5:26 PM Marc Reisner <reisner.marc@gmail.com> >>>> wrote: >>>>> >>>>> It looks like this issue is still not fixed. There has been some >>>>> investigation on going in this Bugzilla for Fedora: >>>>> >>>>> https://bugzilla.redhat.com/show_bug.cgi?id=2254434 >>>> >>>> FWIW, I recently learned there was also a report in the kernel >>>> bugzilla about the same issue: >>>> >>>> https://bugzilla.kernel.org/show_bug.cgi?id=218258 >>>> >>>>> The behavior we are seeing is that when a process has no heap and >>>>> mmap(2) is called with MAP_PRIVATE | MAP_ANONYMOUS, it allocates memory >>>>> on the heap. >>>>> >>>>> If the address space returned by mmap(2) is later on made executable >>>>> with mprotect(2), that triggers an execheap avc. >>>> >>>> ... >>>> >>>>> This has been reproduced on kernels >= 6.6. >>> >>> I try the reproducer, but fails to reproduce on my 6.6 >> >> After enable selinux then trigger the issue. >> >>>>> >>>>> In reviewing the code, my best guess is that this is caused by the >>>>> scenario where brk == start_brk not being handled, though I am not >>> >>> Is there vma_start,end and start_brk,brk infos, adding some debug print. >>> >>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c >>> index 81fbfa5b80d4..b66c381c558e 100644 >>> --- a/security/selinux/hooks.c >>> +++ b/security/selinux/hooks.c >>> @@ -3848,6 +3848,9 @@ static int selinux_file_mprotect(struct >>> vm_area_struct *vma, >>> if (vma_is_initial_heap(vma)) { >>> rc = avc_has_perm(sid, sid, SECCLASS_PROCESS, >>> PROCESS__EXECHEAP, NULL); >>> + if (rc) >>> + pr_err("DEBUG: vma_start,end=%lx,%lx, >>> start_brk,brk=%lx,%lx\n", >>> + vma->vm_start, vma->vm_end, >>> vma->vm_mm->start_brk, vma->vm_mm->brk); >>> } else if (!vma->vm_file && (vma_is_initial_stack(vma) || >>> vma_is_stack_for_current(vma))) { >>> rc = avc_has_perm(sid, sid, SECCLASS_PROCESS, >> >> >> [ 3520.913952] DEBUG: pid=769674,comm=test-vma, >> vma:start,end=1e9c000,21e9c000, start_brk,brk=2310000,2310000 >> >> >> vma_start < start_brk = brk < vma_end >> >>> >>>>> expert enough in kernel code to know. If the start address allocated >>>>> by mmap is before the starting program break, and the end address is >>>>> after the starting program break, then the avc will trigger. However, >>>>> I don't know how mmap deals with determining an address and if it takes >>>>> into account the program break, or if calling brk(2) later on will just >>>>> pick a new location. >>>> >>>> I'm not a mm expert, but thankfully we have some on the To/CC line so >>>> I'm hopeful they will be able to take a look and provide some insight. >>> >>> +Cc mmap maintainers too. >>> >>>> >>>> To bring the relevant code into this email, prior to using the >>>> vma_is_initial_heap() helper the SELinux execheap logic looked like >>>> this: >>>> >>>> /* WORKING */ >>>> if (vma->vm_start >= vma->vm_mm->start_brk && >>>> vma->vm_end <= vma->vm_mm->brk) >>>> /* execheap denial */ >>>> >> >> vma range must be within heap. >> >>>> ... while the current vma_is_initial_heap() helper has logic that >>>> looks like this: >>>> >>>> /* BUGGY */ >>>> if (vma->vm_start < vma->vm_mm->brk && >>>> vma->vm_end > vma->vm_mm->start_brk) >>>> /* execheap denial */ >> >> vma range intersection with brk range will be rejected, and >> there's a cross in above case, so execheap denial, I not sure it >> should be allowed or not, hope maintainers give some comments here. > > It's a kernel regression so we need to revert the change to SELinux at > least - can't start denying permission without a new/updated policy > (e.g. making the change in the check conditional on a new policy > capability). > OK,revert patch is sent, but I am also curious about it. https://lore.kernel.org/all/20240808130909.1027860-1-wangkefeng.wang@huawei.com/
On Thu, Aug 08, 2024 at 09:12:59PM +0800, Kefeng Wang wrote: > > OK,revert patch is sent, but I am also curious about it. > > https://lore.kernel.org/all/20240808130909.1027860-1-wangkefeng.wang@huawei.com/ I am also curious. It seems like the "real" fix would be in mmap - my understanding is that it should not intersect with heap, even when heap is empty (start_brk == brk). It looks like start_brk is fixed in place when the ELF is loaded in fs/binfmt_elf.c:load_elf_binary (line 1288). if ((current->flags & PF_RANDOMIZE) && (snapshot_randomize_va_space > 1)) { /* * For architectures with ELF randomization, when executing * a loader directly (i.e. no interpreter listed in ELF * headers), move the brk area out of the mmap region * (since it grows up, and may collide early with the stack * growing down), and into the unused ELF_ET_DYN_BASE region. */ if (IS_ENABLED(CONFIG_ARCH_HAS_ELF_RANDOMIZE) && elf_ex->e_type == ET_DYN && !interpreter) { mm->brk = mm->start_brk = ELF_ET_DYN_BASE; } else { /* Otherwise leave a gap between .bss and brk. */ mm->brk = mm->start_brk = mm->brk + PAGE_SIZE; } mm->brk = mm->start_brk = arch_randomize_brk(mm); #ifdef compat_brk_randomized current->brk_randomized = 1; #endif }
* Marc Reisner <reisner.marc@gmail.com> [240808 11:03]: > On Thu, Aug 08, 2024 at 09:12:59PM +0800, Kefeng Wang wrote: > > > > OK,revert patch is sent, but I am also curious about it. > > > > https://lore.kernel.org/all/20240808130909.1027860-1-wangkefeng.wang@huawei.com/ > > I am also curious. It seems like the "real" fix would be in mmap - my > understanding is that it should not intersect with heap, even when heap > is empty (start_brk == brk). > > It looks like start_brk is fixed in place when the ELF is > loaded in fs/binfmt_elf.c:load_elf_binary (line 1288). > > if ((current->flags & PF_RANDOMIZE) && (snapshot_randomize_va_space > 1)) { > /* > * For architectures with ELF randomization, when executing > * a loader directly (i.e. no interpreter listed in ELF > * headers), move the brk area out of the mmap region > * (since it grows up, and may collide early with the stack > * growing down), and into the unused ELF_ET_DYN_BASE region. > */ > if (IS_ENABLED(CONFIG_ARCH_HAS_ELF_RANDOMIZE) && > elf_ex->e_type == ET_DYN && !interpreter) { > mm->brk = mm->start_brk = ELF_ET_DYN_BASE; > } else { > /* Otherwise leave a gap between .bss and brk. */ > mm->brk = mm->start_brk = mm->brk + PAGE_SIZE; > } > > mm->brk = mm->start_brk = arch_randomize_brk(mm); > #ifdef compat_brk_randomized > current->brk_randomized = 1; > #endif > } Have a look at the mmapstress 3 test in ltp [1]. The tests pokes holes and mmaps into those holes throughout the brk range. [1]. https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/mmapstress/mmapstress03.c
On Thu, Aug 08, 2024 at 02:00:09PM -0400, Liam R. Howlett wrote: > Have a look at the mmapstress 3 test in ltp [1]. The tests pokes holes > and mmaps into those holes throughout the brk range. > > [1]. https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/mmapstress/mmapstress03.c In investigating this further, with additional reproducers, I believe that the whole bug is in vma_is_initial_heap(). Here is what I have tested so far: 1. Use only sbrk to allocate heap memory - sbrk(0) returns the start_brk before calling sbrk(increment). Afterwards, sbrk(0) returns start_brk + increment. 2. Use sbrk(0) to obtain start_brk, then request 512 MB of address space from mmap starting at start_brk. mmap allocates 512 MB of address space starting at start_brk and ending at start_brk + 0x20000000. sbrk(0) still returns start_brk. However, /proc/PID/maps flags the mmapped address space with "[heap]" 3. Use sbrk(0) to obtain start_brk, then request 512 MB of address space from mmap starting at start_brk + _SC_PAGESIZE. mmap allocates 512 MB of address space starting at start_brk + _SC_PAGESIZE and ending at start_brk + _SC_PAGESIZE + 0x20000000. sbrk(0) still returns start_brk, and /proc/PID/maps does NOT flag the mmapped address space with "[heap]". I believe that the entire bug may reside in vma_is_initial_heap because /proc/PID/maps also uses vma_is_initial_heap to flag entries with "[heap]" [1]. Also, sbrk(0) is not actually getting updated after a call to mmap, so mmap is not actually allocating heap memory. What do you all think about this patch? If it doesn't have any obvious flaws I can submit it (along with a revert for the revert). diff --git a/include/linux/mm.h b/include/linux/mm.h index c4b238a20b76..1dd588833af8 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -918,7 +918,8 @@ static inline bool vma_is_anonymous(struct vm_area_struct *vma) */ static inline bool vma_is_initial_heap(const struct vm_area_struct *vma) { - return vma->vm_start < vma->vm_mm->brk && + return vma->vm_mm->brk != vma->vm_mm->start_brk && + vma->vm_start < vma->vm_mm->brk && vma->vm_end > vma->vm_mm->start_brk; } -- [1]. https://github.com/torvalds/linux/blob/6a0e38264012809afa24113ee2162dc07f4ed22b/fs/proc/task_mmu.c#L287
On Thu, Aug 8, 2024 at 3:35 PM Marc Reisner <reisner.marc@gmail.com> wrote: > On Thu, Aug 08, 2024 at 02:00:09PM -0400, Liam R. Howlett wrote: > > Have a look at the mmapstress 3 test in ltp [1]. The tests pokes holes > > and mmaps into those holes throughout the brk range. > > > > [1]. https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/mmapstress/mmapstress03.c > > In investigating this further, with additional reproducers, I believe > that the whole bug is in vma_is_initial_heap(). That's my feeling at this point too. Unfortunately, there are a few callers other than SELinux so I don't want to change the helper function without an explicit ACK from the mm folks and I think now that we understand the problem we want to get this fixed ASAP in Linus' tree (and get it marked for -stable). I just posted a patch that reverts just our use of vma_is_initial_heap() in favor of our old logic and adds a few lines of comments about the problem with vma_is_initial_heap(). I'm okay with moving back to vma_is_initial_heap() when it's fixed, but I'd prefer it to be fixed for a while before we transition back to it. We've gotten burned twice now with vma_is_initial_heap() so I'm going to be a little extra cautious here. https://lore.kernel.org/selinux/20240808203353.202352-2-paul@paul-moore.com > What do you all think about this patch? If it doesn't have any obvious > flaws I can submit it (along with a revert for the revert). I'll leave the mm folks to weigh in on the fix to vma_is_initial_heap(), but as I said above, please don't submit a patch to SELinux right now, I want the fixed version of vma_is_initial_heap() to "soak" for a bit before we go back to it.
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index c87b79a29fad..ac582c046c51 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -3800,13 +3800,10 @@ static int selinux_file_mprotect(struct vm_area_struct *vma, if (default_noexec && (prot & PROT_EXEC) && !(vma->vm_flags & VM_EXEC)) { int rc = 0; - if (vma->vm_start >= vma->vm_mm->start_brk && - vma->vm_end <= vma->vm_mm->brk) { + if (vma_is_initial_heap(vma)) { rc = avc_has_perm(sid, sid, SECCLASS_PROCESS, PROCESS__EXECHEAP, NULL); - } else if (!vma->vm_file && - ((vma->vm_start <= vma->vm_mm->start_stack && - vma->vm_end >= vma->vm_mm->start_stack) || + } else if (!vma->vm_file && (vma_is_initial_stack(vma) || vma_is_stack_for_current(vma))) { rc = avc_has_perm(sid, sid, SECCLASS_PROCESS,