diff mbox series

[v3,3/4] selinux: use vma_is_initial_stack() and vma_is_initial_heap()

Message ID CAEjxPJ5ZuBGVDah0f3g0-7t2v1uSXTmp_cTT3g_MSP3J9QtoeA@mail.gmail.com (mailing list archive)
State Handled Elsewhere
Delegated to: Paul Moore
Headers show
Series None | expand

Commit Message

Stephen Smalley July 31, 2023, 2:26 p.m. UTC
I believe this patch yields a semantic change in the SELinux execheap
permission check. That said, I think the change is for the better. For
ease of comparison, is_initial_heap() is defined in patch 1 of the
series as:
+/*
+ * Indicate if the VMA is a heap for the given task; for
+ * /proc/PID/maps that is the heap of the main task.
+ */
+static inline bool vma_is_initial_heap(const struct vm_area_struct *vma)
+{
+       return vma->vm_start <= vma->vm_mm->brk &&
+               vma->vm_end >= vma->vm_mm->start_brk;
+}
+

This is a check for whether the mapping has a non-empty intersection
with the heap range.
Whereas the existing test in the SELinux code only appears to check
whether the mapping is _within_ the heap range.

---------- Forwarded message ---------
From: Kefeng Wang <wangkefeng.wang@huawei.com>
Date: Fri, Jul 28, 2023 at 12:48 AM
Subject: [PATCH v3 3/4] selinux: use vma_is_initial_stack() and
vma_is_initial_heap()
To: Andrew Morton <akpm@linux-foundation.org>
Cc: <amd-gfx@lists.freedesktop.org>,
<dri-devel@lists.freedesktop.org>, <linux-kernel@vger.kernel.org>,
<linux-fsdevel@vger.kernel.org>, <linux-mm@kvack.org>,
<linux-perf-users@vger.kernel.org>, <selinux@vger.kernel.org>,
Christian Göttsche <cgzones@googlemail.com>, David Hildenbrand
<david@redhat.com>, Felix Kuehling <Felix.Kuehling@amd.com>, Alex
Deucher <alexander.deucher@amd.com>, <christian.koenig@amd.com>,
<Xinhui.Pan@amd.com>, <airlied@gmail.com>, <daniel@ffwll.ch>,
<paul@paul-moore.com>, <stephen.smalley.work@gmail.com>,
<eparis@parisplace.org>, <peterz@infradead.org>, <acme@kernel.org>,
Kefeng Wang <wangkefeng.wang@huawei.com>


Use the helpers to simplify code.

Cc: Paul Moore <paul@paul-moore.com>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Eric Paris <eparis@parisplace.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 security/selinux/hooks.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

                                          PROCESS__EXECSTACK, NULL);
--
2.41.0

Comments

Paul Moore July 31, 2023, 4:19 p.m. UTC | #1
On Mon, Jul 31, 2023 at 10:26 AM Stephen Smalley
<stephen.smalley.work@gmail.com> wrote:
> I believe this patch yields a semantic change in the SELinux execheap
> permission check. That said, I think the change is for the better.

Agreed.  I'm also in favor of using a helper which is maintained by
the VM folks over open coded logic in the SELinux code.
Ondrej Mosnacek Dec. 6, 2023, 2:22 p.m. UTC | #2
(Restoring part of the Cc list to include more relevant lists &
people... If you are lost, the original email is here:
https://lore.kernel.org/selinux/20230728050043.59880-4-wangkefeng.wang@huawei.com/)

On Tue, Aug 1, 2023 at 1:08 AM Paul Moore <paul@paul-moore.com> wrote:
>
> On Mon, Jul 31, 2023 at 4:02 PM Stephen Smalley
> <stephen.smalley.work@gmail.com> wrote:
> > On Mon, Jul 31, 2023 at 12:19 PM Paul Moore <paul@paul-moore.com> wrote:
> > >
> > > On Mon, Jul 31, 2023 at 10:26 AM Stephen Smalley
> > > <stephen.smalley.work@gmail.com> wrote:
> > > > I believe this patch yields a semantic change in the SELinux execheap
> > > > permission check. That said, I think the change is for the better.
> > >
> > > Agreed.  I'm also in favor of using a helper which is maintained by
> > > the VM folks over open coded logic in the SELinux code.
> >
> > Yes, only caveat is in theory it could trigger new execheap denials
> > under existing policies.
> > Trying to construct an example based on the
> > selinux-testsuite/tests/mmap/mprot_heap.c example but coming up empty
> > so far on something that both works and yields different results
> > before and after this patch.
>
> My gut feeling is that this will not be an issue, but I could very
> well be wrong.  If it becomes a significant issue we can revert the
> SELinux portion of the patch.
>
> Of course, if you have any luck demonstrating this with reasonable
> code, that would be good input too.

So, it turns out this does affect actual code. Thus far, we know about
gcl [1] and wine [2]. The gcl case is easy to reproduce (just install
gcl on Fedora and run gcl without arguments), so I was able to dig a
bit deeper.

gcl has the following relevant memory mappings as captured by gdb:
Start Addr           End Addr       Size     Offset  Perms  objfile
  0x413000           0xf75000   0xb62000   0x3fa000  rw-p
/usr/lib/gcl-2.6.14/unixport/saved_ansi_gcl
  0xf75000           0xf79000     0x4000        0x0  rwxp   [heap]

It tries to call mprotect(0x883000, 7282688,
PROT_READ|PROT_WRITE|PROT_EXEC), i.e. it tries to make the region
0x883000 - 0xf75000 executable. Before this patch it was allowed,
whereas now it triggers an execheap SELinux denial. But this seems
wrong - the affected region is merely adjacent to the [heap] region,
it does not actually overlap with it. So even if we accept that the
correct semantics is to catch any region that overlaps with the heap
(before only subregions of the heap were subject to the execheap
check), this corner case doesn't seem to be handled correctly by the
new check (and the same bug seems to have been in fs/proc/task_mmu.c
before commit 11250fd12eb8 ("mm: factor out VMA stack and heap
checks")).

I didn't analyze the wine case ([2]), but it may be the same situation.

Unless I'm mistaken, those <= & >= in should in fact be just < & >.
And the expression in vma_is_initial_stack() is also suspicious (but
I'm not going to make any assumption on what is the intended semantics
there...)

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2252391
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2247299

--
Ondrej Mosnacek
Senior Software Engineer, Linux Security - SELinux kernel
Red Hat, Inc.
Paul Moore Dec. 6, 2023, 8:47 p.m. UTC | #3
On Wed, Dec 6, 2023 at 9:22 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
>
> (Restoring part of the Cc list to include more relevant lists &
> people... If you are lost, the original email is here:
> https://lore.kernel.org/selinux/20230728050043.59880-4-wangkefeng.wang@huawei.com/)
>
> On Tue, Aug 1, 2023 at 1:08 AM Paul Moore <paul@paul-moore.com> wrote:
> > On Mon, Jul 31, 2023 at 4:02 PM Stephen Smalley
> > <stephen.smalley.work@gmail.com> wrote:
> > > On Mon, Jul 31, 2023 at 12:19 PM Paul Moore <paul@paul-moore.com> wrote:
> > > >
> > > > On Mon, Jul 31, 2023 at 10:26 AM Stephen Smalley
> > > > <stephen.smalley.work@gmail.com> wrote:
> > > > > I believe this patch yields a semantic change in the SELinux execheap
> > > > > permission check. That said, I think the change is for the better.
> > > >
> > > > Agreed.  I'm also in favor of using a helper which is maintained by
> > > > the VM folks over open coded logic in the SELinux code.
> > >
> > > Yes, only caveat is in theory it could trigger new execheap denials
> > > under existing policies.
> > > Trying to construct an example based on the
> > > selinux-testsuite/tests/mmap/mprot_heap.c example but coming up empty
> > > so far on something that both works and yields different results
> > > before and after this patch.
> >
> > My gut feeling is that this will not be an issue, but I could very
> > well be wrong.  If it becomes a significant issue we can revert the
> > SELinux portion of the patch.
> >
> > Of course, if you have any luck demonstrating this with reasonable
> > code, that would be good input too.
>
> So, it turns out this does affect actual code. Thus far, we know about
> gcl [1] and wine [2]. The gcl case is easy to reproduce (just install
> gcl on Fedora and run gcl without arguments), so I was able to dig a
> bit deeper.
>
> gcl has the following relevant memory mappings as captured by gdb:
> Start Addr           End Addr       Size     Offset  Perms  objfile
>   0x413000           0xf75000   0xb62000   0x3fa000  rw-p
> /usr/lib/gcl-2.6.14/unixport/saved_ansi_gcl
>   0xf75000           0xf79000     0x4000        0x0  rwxp   [heap]
>
> It tries to call mprotect(0x883000, 7282688,
> PROT_READ|PROT_WRITE|PROT_EXEC), i.e. it tries to make the region
> 0x883000 - 0xf75000 executable. Before this patch it was allowed,
> whereas now it triggers an execheap SELinux denial. But this seems
> wrong - the affected region is merely adjacent to the [heap] region,
> it does not actually overlap with it. So even if we accept that the
> correct semantics is to catch any region that overlaps with the heap
> (before only subregions of the heap were subject to the execheap
> check), this corner case doesn't seem to be handled correctly by the
> new check (and the same bug seems to have been in fs/proc/task_mmu.c
> before commit 11250fd12eb8 ("mm: factor out VMA stack and heap
> checks")).
>
> I didn't analyze the wine case ([2]), but it may be the same situation.
>
> Unless I'm mistaken, those <= & >= in should in fact be just < & >.
> And the expression in vma_is_initial_stack() is also suspicious (but
> I'm not going to make any assumption on what is the intended semantics
> there...)
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=2252391
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=2247299

Thanks Ondrej.

I'm hoping the mm folks will comment on this as it looks like this is
an issue with the helper functions, but just in case I'm going to prep
a revert for just the SELinux changes.  If we don't hear anything in
the next couple of days I'll send the revert up to Linus with the idea
that we can eventually shift back to the helpers when this is sorted.
Kefeng Wang Dec. 7, 2023, 4:50 a.m. UTC | #4
On 2023/12/7 4:47, Paul Moore wrote:
> On Wed, Dec 6, 2023 at 9:22 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
>>
>> (Restoring part of the Cc list to include more relevant lists &
>> people... If you are lost, the original email is here:
>> https://lore.kernel.org/selinux/20230728050043.59880-4-wangkefeng.wang@huawei.com/)
>>
>> On Tue, Aug 1, 2023 at 1:08 AM Paul Moore <paul@paul-moore.com> wrote:
>>> On Mon, Jul 31, 2023 at 4:02 PM Stephen Smalley
>>> <stephen.smalley.work@gmail.com> wrote:
>>>> On Mon, Jul 31, 2023 at 12:19 PM Paul Moore <paul@paul-moore.com> wrote:
>>>>>
>>>>> On Mon, Jul 31, 2023 at 10:26 AM Stephen Smalley
>>>>> <stephen.smalley.work@gmail.com> wrote:
>>>>>> I believe this patch yields a semantic change in the SELinux execheap
>>>>>> permission check. That said, I think the change is for the better.
>>>>>
>>>>> Agreed.  I'm also in favor of using a helper which is maintained by
>>>>> the VM folks over open coded logic in the SELinux code.
>>>>
>>>> Yes, only caveat is in theory it could trigger new execheap denials
>>>> under existing policies.
>>>> Trying to construct an example based on the
>>>> selinux-testsuite/tests/mmap/mprot_heap.c example but coming up empty
>>>> so far on something that both works and yields different results
>>>> before and after this patch.
>>>
>>> My gut feeling is that this will not be an issue, but I could very
>>> well be wrong.  If it becomes a significant issue we can revert the
>>> SELinux portion of the patch.
>>>
>>> Of course, if you have any luck demonstrating this with reasonable
>>> code, that would be good input too.
>>
>> So, it turns out this does affect actual code. Thus far, we know about
>> gcl [1] and wine [2]. The gcl case is easy to reproduce (just install
>> gcl on Fedora and run gcl without arguments), so I was able to dig a
>> bit deeper.
>>
>> gcl has the following relevant memory mappings as captured by gdb:
>> Start Addr           End Addr       Size     Offset  Perms  objfile
>>    0x413000           0xf75000   0xb62000   0x3fa000  rw-p
>> /usr/lib/gcl-2.6.14/unixport/saved_ansi_gcl
>>    0xf75000           0xf79000     0x4000        0x0  rwxp   [heap]
>>
>> It tries to call mprotect(0x883000, 7282688,
>> PROT_READ|PROT_WRITE|PROT_EXEC), i.e. it tries to make the region
>> 0x883000 - 0xf75000 executable. Before this patch it was allowed,
>> whereas now it triggers an execheap SELinux denial. But this seems
>> wrong - the affected region is merely adjacent to the [heap] region,
>> it does not actually overlap with it. So even if we accept that the
>> correct semantics is to catch any region that overlaps with the heap
>> (before only subregions of the heap were subject to the execheap
>> check), this corner case doesn't seem to be handled correctly by the
>> new check (and the same bug seems to have been in fs/proc/task_mmu.c
>> before commit 11250fd12eb8 ("mm: factor out VMA stack and heap
>> checks")).

Yes, the heap check exists for a long time,

                          [start_brk        brk]
case1:                     [vm_start,vm_end]
case2:             [vm_start,vm_end]
case3:                               [vm_start,vm_end]
case4:         [vm_start,                      vm_end]

case5:   [vm_start,vm_end]
case6:                                          [vm_start,vm_end]

old check:
vma->vm_start >= vma->vm_mm->start_brk && vma->vm_end <= vma->vm_mm->brk

Only include case1, vma range must be within heap

new check:
vma->vm_start <= vma->vm_mm->brk && vma->vm_end >= vma->vm_mm->start_brk

Include case1~case6, but case5(vm_end=start_brk) and case6(vm_start=brk)
are the corner cases, gcl issue matchs the case5.

>>
>> I didn't analyze the wine case ([2]), but it may be the same situation.
>>
>> Unless I'm mistaken, those <= & >= in should in fact be just < & >.

I support this change.
  vma->vm_start < vma->vm_mm->brk && vma->vm_end > vma->vm_mm->start_brk

>> And the expression in vma_is_initial_stack() is also suspicious (but
>> I'm not going to make any assumption on what is the intended semantics
>> there...)
>>
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=2252391
>> [2] https://bugzilla.redhat.com/show_bug.cgi?id=2247299

Could you quickly verify after the above change?
> 
> Thanks Ondrej.
> 
> I'm hoping the mm folks will comment on this as it looks like this is
> an issue with the helper functions, but just in case I'm going to prep
> a revert for just the SELinux changes.  If we don't hear anything in
> the next couple of days I'll send the revert up to Linus with the idea
> that we can eventually shift back to the helpers when this is sorted.
>
Ondrej Mosnacek Dec. 7, 2023, 8:37 a.m. UTC | #5
On Thu, Dec 7, 2023 at 5:50 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>
>
>
> On 2023/12/7 4:47, Paul Moore wrote:
> > On Wed, Dec 6, 2023 at 9:22 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
> >>
> >> (Restoring part of the Cc list to include more relevant lists &
> >> people... If you are lost, the original email is here:
> >> https://lore.kernel.org/selinux/20230728050043.59880-4-wangkefeng.wang@huawei.com/)
> >>
> >> On Tue, Aug 1, 2023 at 1:08 AM Paul Moore <paul@paul-moore.com> wrote:
> >>> On Mon, Jul 31, 2023 at 4:02 PM Stephen Smalley
> >>> <stephen.smalley.work@gmail.com> wrote:
> >>>> On Mon, Jul 31, 2023 at 12:19 PM Paul Moore <paul@paul-moore.com> wrote:
> >>>>>
> >>>>> On Mon, Jul 31, 2023 at 10:26 AM Stephen Smalley
> >>>>> <stephen.smalley.work@gmail.com> wrote:
> >>>>>> I believe this patch yields a semantic change in the SELinux execheap
> >>>>>> permission check. That said, I think the change is for the better.
> >>>>>
> >>>>> Agreed.  I'm also in favor of using a helper which is maintained by
> >>>>> the VM folks over open coded logic in the SELinux code.
> >>>>
> >>>> Yes, only caveat is in theory it could trigger new execheap denials
> >>>> under existing policies.
> >>>> Trying to construct an example based on the
> >>>> selinux-testsuite/tests/mmap/mprot_heap.c example but coming up empty
> >>>> so far on something that both works and yields different results
> >>>> before and after this patch.
> >>>
> >>> My gut feeling is that this will not be an issue, but I could very
> >>> well be wrong.  If it becomes a significant issue we can revert the
> >>> SELinux portion of the patch.
> >>>
> >>> Of course, if you have any luck demonstrating this with reasonable
> >>> code, that would be good input too.
> >>
> >> So, it turns out this does affect actual code. Thus far, we know about
> >> gcl [1] and wine [2]. The gcl case is easy to reproduce (just install
> >> gcl on Fedora and run gcl without arguments), so I was able to dig a
> >> bit deeper.
> >>
> >> gcl has the following relevant memory mappings as captured by gdb:
> >> Start Addr           End Addr       Size     Offset  Perms  objfile
> >>    0x413000           0xf75000   0xb62000   0x3fa000  rw-p
> >> /usr/lib/gcl-2.6.14/unixport/saved_ansi_gcl
> >>    0xf75000           0xf79000     0x4000        0x0  rwxp   [heap]
> >>
> >> It tries to call mprotect(0x883000, 7282688,
> >> PROT_READ|PROT_WRITE|PROT_EXEC), i.e. it tries to make the region
> >> 0x883000 - 0xf75000 executable. Before this patch it was allowed,
> >> whereas now it triggers an execheap SELinux denial. But this seems
> >> wrong - the affected region is merely adjacent to the [heap] region,
> >> it does not actually overlap with it. So even if we accept that the
> >> correct semantics is to catch any region that overlaps with the heap
> >> (before only subregions of the heap were subject to the execheap
> >> check), this corner case doesn't seem to be handled correctly by the
> >> new check (and the same bug seems to have been in fs/proc/task_mmu.c
> >> before commit 11250fd12eb8 ("mm: factor out VMA stack and heap
> >> checks")).
>
> Yes, the heap check exists for a long time,
>
>                           [start_brk        brk]
> case1:                     [vm_start,vm_end]
> case2:             [vm_start,vm_end]
> case3:                               [vm_start,vm_end]
> case4:         [vm_start,                      vm_end]
>
> case5:   [vm_start,vm_end]
> case6:                                          [vm_start,vm_end]
>
> old check:
> vma->vm_start >= vma->vm_mm->start_brk && vma->vm_end <= vma->vm_mm->brk
>
> Only include case1, vma range must be within heap
>
> new check:
> vma->vm_start <= vma->vm_mm->brk && vma->vm_end >= vma->vm_mm->start_brk
>
> Include case1~case6, but case5(vm_end=start_brk) and case6(vm_start=brk)
> are the corner cases, gcl issue matchs the case5.
>
> >>
> >> I didn't analyze the wine case ([2]), but it may be the same situation.
> >>
> >> Unless I'm mistaken, those <= & >= in should in fact be just < & >.
>
> I support this change.
>   vma->vm_start < vma->vm_mm->brk && vma->vm_end > vma->vm_mm->start_brk
>
> >> And the expression in vma_is_initial_stack() is also suspicious (but
> >> I'm not going to make any assumption on what is the intended semantics
> >> there...)
> >>
> >> [1] https://bugzilla.redhat.com/show_bug.cgi?id=2252391
> >> [2] https://bugzilla.redhat.com/show_bug.cgi?id=2247299
>
> Could you quickly verify after the above change?

Yes, changing the operators as suggested fixes the gcl case.

BTW, the vma_is_*() helpers introduced by your patch also have wrong
indentation - the "return" lines are indented by 7 spaces instead of a
tab. If you are going to submit a fixing patch you might want to fix
that, too.
Kefeng Wang Dec. 7, 2023, 9:23 a.m. UTC | #6
On 2023/12/7 16:37, Ondrej Mosnacek wrote:
> On Thu, Dec 7, 2023 at 5:50 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>>
>>
>>
>> On 2023/12/7 4:47, Paul Moore wrote:
>>> On Wed, Dec 6, 2023 at 9:22 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
>>>>
>>>> (Restoring part of the Cc list to include more relevant lists &
>>>> people... If you are lost, the original email is here:
>>>> https://lore.kernel.org/selinux/20230728050043.59880-4-wangkefeng.wang@huawei.com/)
>>>>
>>>> On Tue, Aug 1, 2023 at 1:08 AM Paul Moore <paul@paul-moore.com> wrote:
>>>>> On Mon, Jul 31, 2023 at 4:02 PM Stephen Smalley
>>>>> <stephen.smalley.work@gmail.com> wrote:
>>>>>> On Mon, Jul 31, 2023 at 12:19 PM Paul Moore <paul@paul-moore.com> wrote:
>>>>>>>
>>>>>>> On Mon, Jul 31, 2023 at 10:26 AM Stephen Smalley
>>>>>>> <stephen.smalley.work@gmail.com> wrote:
>>>>>>>> I believe this patch yields a semantic change in the SELinux execheap
>>>>>>>> permission check. That said, I think the change is for the better.
>>>>>>>
>>>>>>> Agreed.  I'm also in favor of using a helper which is maintained by
>>>>>>> the VM folks over open coded logic in the SELinux code.
>>>>>>
>>>>>> Yes, only caveat is in theory it could trigger new execheap denials
>>>>>> under existing policies.
>>>>>> Trying to construct an example based on the
>>>>>> selinux-testsuite/tests/mmap/mprot_heap.c example but coming up empty
>>>>>> so far on something that both works and yields different results
>>>>>> before and after this patch.
>>>>>
>>>>> My gut feeling is that this will not be an issue, but I could very
>>>>> well be wrong.  If it becomes a significant issue we can revert the
>>>>> SELinux portion of the patch.
>>>>>
>>>>> Of course, if you have any luck demonstrating this with reasonable
>>>>> code, that would be good input too.
>>>>
>>>> So, it turns out this does affect actual code. Thus far, we know about
>>>> gcl [1] and wine [2]. The gcl case is easy to reproduce (just install
>>>> gcl on Fedora and run gcl without arguments), so I was able to dig a
>>>> bit deeper.
>>>>
>>>> gcl has the following relevant memory mappings as captured by gdb:
>>>> Start Addr           End Addr       Size     Offset  Perms  objfile
>>>>     0x413000           0xf75000   0xb62000   0x3fa000  rw-p
>>>> /usr/lib/gcl-2.6.14/unixport/saved_ansi_gcl
>>>>     0xf75000           0xf79000     0x4000        0x0  rwxp   [heap]
>>>>
>>>> It tries to call mprotect(0x883000, 7282688,
>>>> PROT_READ|PROT_WRITE|PROT_EXEC), i.e. it tries to make the region
>>>> 0x883000 - 0xf75000 executable. Before this patch it was allowed,
>>>> whereas now it triggers an execheap SELinux denial. But this seems
>>>> wrong - the affected region is merely adjacent to the [heap] region,
>>>> it does not actually overlap with it. So even if we accept that the
>>>> correct semantics is to catch any region that overlaps with the heap
>>>> (before only subregions of the heap were subject to the execheap
>>>> check), this corner case doesn't seem to be handled correctly by the
>>>> new check (and the same bug seems to have been in fs/proc/task_mmu.c
>>>> before commit 11250fd12eb8 ("mm: factor out VMA stack and heap
>>>> checks")).
>>
>> Yes, the heap check exists for a long time,
>>
>>                            [start_brk        brk]
>> case1:                     [vm_start,vm_end]
>> case2:             [vm_start,vm_end]
>> case3:                               [vm_start,vm_end]
>> case4:         [vm_start,                      vm_end]
>>
>> case5:   [vm_start,vm_end]
>> case6:                                          [vm_start,vm_end]
>>
>> old check:
>> vma->vm_start >= vma->vm_mm->start_brk && vma->vm_end <= vma->vm_mm->brk
>>
>> Only include case1, vma range must be within heap
>>
>> new check:
>> vma->vm_start <= vma->vm_mm->brk && vma->vm_end >= vma->vm_mm->start_brk
>>
>> Include case1~case6, but case5(vm_end=start_brk) and case6(vm_start=brk)
>> are the corner cases, gcl issue matchs the case5.
>>
>>>>
>>>> I didn't analyze the wine case ([2]), but it may be the same situation.
>>>>
>>>> Unless I'm mistaken, those <= & >= in should in fact be just < & >.
>>
>> I support this change.
>>    vma->vm_start < vma->vm_mm->brk && vma->vm_end > vma->vm_mm->start_brk
>>
>>>> And the expression in vma_is_initial_stack() is also suspicious (but
>>>> I'm not going to make any assumption on what is the intended semantics
>>>> there...)
>>>>
>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=2252391
>>>> [2] https://bugzilla.redhat.com/show_bug.cgi?id=2247299
>>
>> Could you quickly verify after the above change?
> 
> Yes, changing the operators as suggested fixes the gcl case.

Thanks for your confirm,win[2] fixed too, right?
> 
> BTW, the vma_is_*() helpers introduced by your patch also have wrong
> indentation - the "return" lines are indented by 7 spaces instead of a
> tab. If you are going to submit a fixing patch you might want to fix
> that, too.

Sure. will fix.
Marc Reisner Aug. 7, 2024, 9:26 p.m. UTC | #7
It looks like this issue is still not fixed. There has been some
investigation on going in this Bugzilla for Fedora:

https://bugzilla.redhat.com/show_bug.cgi?id=2254434

The behavior we are seeing is that when a process has no heap and
mmap(2) is called with MAP_PRIVATE | MAP_ANONYMOUS, it allocates memory
on the heap.

If the address space returned by mmap(2) is later on made executable
with mprotect(2), that triggers an execheap avc.

We have a fairly simple reproducer. We widdled it down from picking an
address to pass to mmap(2) using getrandom(2) to using the same address
every time. Sometimes it triggers the behavior, sometimes it does not.

We also observe that disabling ASLR via sysctl
kernel.randomize_va_space=0 works around the issue, with obvious
caveats.

Here is a reproducer, which raises SIGSTOP so that you can analyze
/proc/<pid>/maps. It allocates a 512 MB address space and then attempts
to give it execute permissions. Running it multiple times will
eventually trigger the behavior.

#include <stdint.h>
#include <unistd.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/random.h>
#include <signal.h>

int main(void)
{
    uintptr_t raw_addr = 0x25085000;

    int length = 512 * 1024 * 1024;
    void* pointer = mmap((void *)raw_addr, length, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (!pointer)
    {
        raise(SIGSTOP);
        return 1;
    }

    if (mprotect(pointer, length, PROT_READ | PROT_WRITE | PROT_EXEC) == -1)
    {
        raise(SIGSTOP);
        return 1;
    }

    return 0;
}

As far as impact goes, this is mostly causing Chromium/Electron based
applications to fail randomly, but I believe other applications such as
wine-preloader (which I do not think is based on Chromium) are also
affected.

This has been reproduced on kernels >= 6.6.

In reviewing the code, my best guess is that this is caused by the
scenario where brk == start_brk not being handled, though I am not
expert enough in kernel code to know. If the start address allocated
by mmap is before the starting program break, and the end address is
after the starting program break, then the avc will trigger. However,
I don't know how mmap deals with determining an address and if it takes
into account the program break, or if calling brk(2) later on will just
pick a new location.
Paul Moore Aug. 8, 2024, 1:10 a.m. UTC | #8
On Wed, Aug 7, 2024 at 5:26 PM Marc Reisner <reisner.marc@gmail.com> wrote:
>
> It looks like this issue is still not fixed. There has been some
> investigation on going in this Bugzilla for Fedora:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=2254434

FWIW, I recently learned there was also a report in the kernel
bugzilla about the same issue:

https://bugzilla.kernel.org/show_bug.cgi?id=218258

> The behavior we are seeing is that when a process has no heap and
> mmap(2) is called with MAP_PRIVATE | MAP_ANONYMOUS, it allocates memory
> on the heap.
>
> If the address space returned by mmap(2) is later on made executable
> with mprotect(2), that triggers an execheap avc.

...

> This has been reproduced on kernels >= 6.6.
>
> In reviewing the code, my best guess is that this is caused by the
> scenario where brk == start_brk not being handled, though I am not
> expert enough in kernel code to know. If the start address allocated
> by mmap is before the starting program break, and the end address is
> after the starting program break, then the avc will trigger. However,
> I don't know how mmap deals with determining an address and if it takes
> into account the program break, or if calling brk(2) later on will just
> pick a new location.

I'm not a mm expert, but thankfully we have some on the To/CC line so
I'm hopeful they will be able to take a look and provide some insight.

To bring the relevant code into this email, prior to using the
vma_is_initial_heap() helper the SELinux execheap logic looked like
this:

  /* WORKING */
  if (vma->vm_start >= vma->vm_mm->start_brk &&
      vma->vm_end <= vma->vm_mm->brk)
    /* execheap denial */

... while the current vma_is_initial_heap() helper has logic that
looks like this:

  /* BUGGY */
  if (vma->vm_start < vma->vm_mm->brk &&
      vma->vm_end > vma->vm_mm->start_brk)
    /* execheap denial */
Kefeng Wang Aug. 8, 2024, 6:43 a.m. UTC | #9
On 2024/8/8 9:10, Paul Moore wrote:
> On Wed, Aug 7, 2024 at 5:26 PM Marc Reisner <reisner.marc@gmail.com> wrote:
>>
>> It looks like this issue is still not fixed. There has been some
>> investigation on going in this Bugzilla for Fedora:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=2254434
> 
> FWIW, I recently learned there was also a report in the kernel
> bugzilla about the same issue:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=218258
> 
>> The behavior we are seeing is that when a process has no heap and
>> mmap(2) is called with MAP_PRIVATE | MAP_ANONYMOUS, it allocates memory
>> on the heap.
>>
>> If the address space returned by mmap(2) is later on made executable
>> with mprotect(2), that triggers an execheap avc.
> 
> ...
> 
>> This has been reproduced on kernels >= 6.6.

I try the reproducer, but fails to reproduce on my 6.6
>>
>> In reviewing the code, my best guess is that this is caused by the
>> scenario where brk == start_brk not being handled, though I am not

Is there vma_start,end and start_brk,brk infos, adding some debug print.

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 81fbfa5b80d4..b66c381c558e 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3848,6 +3848,9 @@ static int selinux_file_mprotect(struct 
vm_area_struct *vma,
                 if (vma_is_initial_heap(vma)) {
                         rc = avc_has_perm(sid, sid, SECCLASS_PROCESS,
                                           PROCESS__EXECHEAP, NULL);
+                       if (rc)
+                               pr_err("DEBUG: vma_start,end=%lx,%lx, 
start_brk,brk=%lx,%lx\n",
+                                       vma->vm_start, vma->vm_end, 
vma->vm_mm->start_brk, vma->vm_mm->brk);
                 } else if (!vma->vm_file && (vma_is_initial_stack(vma) ||
                             vma_is_stack_for_current(vma))) {
                         rc = avc_has_perm(sid, sid, SECCLASS_PROCESS,

>> expert enough in kernel code to know. If the start address allocated
>> by mmap is before the starting program break, and the end address is
>> after the starting program break, then the avc will trigger. However,
>> I don't know how mmap deals with determining an address and if it takes
>> into account the program break, or if calling brk(2) later on will just
>> pick a new location.
> 
> I'm not a mm expert, but thankfully we have some on the To/CC line so
> I'm hopeful they will be able to take a look and provide some insight.

+Cc mmap maintainers too.

> 
> To bring the relevant code into this email, prior to using the
> vma_is_initial_heap() helper the SELinux execheap logic looked like
> this:
> 
>    /* WORKING */
>    if (vma->vm_start >= vma->vm_mm->start_brk &&
>        vma->vm_end <= vma->vm_mm->brk)
>      /* execheap denial */
> 
> ... while the current vma_is_initial_heap() helper has logic that
> looks like this:
> 
>    /* BUGGY */
>    if (vma->vm_start < vma->vm_mm->brk &&
>        vma->vm_end > vma->vm_mm->start_brk)
>      /* execheap denial */
>
Kefeng Wang Aug. 8, 2024, 11:09 a.m. UTC | #10
On 2024/8/8 14:43, Kefeng Wang wrote:
> 
> On 2024/8/8 9:10, Paul Moore wrote:
>> On Wed, Aug 7, 2024 at 5:26 PM Marc Reisner <reisner.marc@gmail.com> 
>> wrote:
>>>
>>> It looks like this issue is still not fixed. There has been some
>>> investigation on going in this Bugzilla for Fedora:
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=2254434
>>
>> FWIW, I recently learned there was also a report in the kernel
>> bugzilla about the same issue:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=218258
>>
>>> The behavior we are seeing is that when a process has no heap and
>>> mmap(2) is called with MAP_PRIVATE | MAP_ANONYMOUS, it allocates memory
>>> on the heap.
>>>
>>> If the address space returned by mmap(2) is later on made executable
>>> with mprotect(2), that triggers an execheap avc.
>>
>> ...
>>
>>> This has been reproduced on kernels >= 6.6.
> 
> I try the reproducer, but fails to reproduce on my 6.6

After enable selinux then trigger the issue.

>>>
>>> In reviewing the code, my best guess is that this is caused by the
>>> scenario where brk == start_brk not being handled, though I am not
> 
> Is there vma_start,end and start_brk,brk infos, adding some debug print.
> 
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 81fbfa5b80d4..b66c381c558e 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -3848,6 +3848,9 @@ static int selinux_file_mprotect(struct 
> vm_area_struct *vma,
>                  if (vma_is_initial_heap(vma)) {
>                          rc = avc_has_perm(sid, sid, SECCLASS_PROCESS,
>                                            PROCESS__EXECHEAP, NULL);
> +                       if (rc)
> +                               pr_err("DEBUG: vma_start,end=%lx,%lx, 
> start_brk,brk=%lx,%lx\n",
> +                                       vma->vm_start, vma->vm_end, 
> vma->vm_mm->start_brk, vma->vm_mm->brk);
>                  } else if (!vma->vm_file && (vma_is_initial_stack(vma) ||
>                              vma_is_stack_for_current(vma))) {
>                          rc = avc_has_perm(sid, sid, SECCLASS_PROCESS,


[ 3520.913952] DEBUG: pid=769674,comm=test-vma, 
vma:start,end=1e9c000,21e9c000, start_brk,brk=2310000,2310000


vma_start <  start_brk = brk  < vma_end

> 
>>> expert enough in kernel code to know. If the start address allocated
>>> by mmap is before the starting program break, and the end address is
>>> after the starting program break, then the avc will trigger. However,
>>> I don't know how mmap deals with determining an address and if it takes
>>> into account the program break, or if calling brk(2) later on will just
>>> pick a new location.
>>
>> I'm not a mm expert, but thankfully we have some on the To/CC line so
>> I'm hopeful they will be able to take a look and provide some insight.
> 
> +Cc mmap maintainers too.
> 
>>
>> To bring the relevant code into this email, prior to using the
>> vma_is_initial_heap() helper the SELinux execheap logic looked like
>> this:
>>
>>    /* WORKING */
>>    if (vma->vm_start >= vma->vm_mm->start_brk &&
>>        vma->vm_end <= vma->vm_mm->brk)
>>      /* execheap denial */
>>

vma range must be within heap.

>> ... while the current vma_is_initial_heap() helper has logic that
>> looks like this:
>>
>>    /* BUGGY */
>>    if (vma->vm_start < vma->vm_mm->brk &&
>>        vma->vm_end > vma->vm_mm->start_brk)
>>      /* execheap denial */

vma range intersection with brk range will be rejected, and
there's a cross in above case, so execheap denial, I not sure it
should be allowed or not, hope maintainers give some comments here.

Thanks.
Stephen Smalley Aug. 8, 2024, 11:41 a.m. UTC | #11
On Thu, Aug 8, 2024 at 7:09 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>
>
>
> On 2024/8/8 14:43, Kefeng Wang wrote:
> >
> > On 2024/8/8 9:10, Paul Moore wrote:
> >> On Wed, Aug 7, 2024 at 5:26 PM Marc Reisner <reisner.marc@gmail.com>
> >> wrote:
> >>>
> >>> It looks like this issue is still not fixed. There has been some
> >>> investigation on going in this Bugzilla for Fedora:
> >>>
> >>> https://bugzilla.redhat.com/show_bug.cgi?id=2254434
> >>
> >> FWIW, I recently learned there was also a report in the kernel
> >> bugzilla about the same issue:
> >>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=218258
> >>
> >>> The behavior we are seeing is that when a process has no heap and
> >>> mmap(2) is called with MAP_PRIVATE | MAP_ANONYMOUS, it allocates memory
> >>> on the heap.
> >>>
> >>> If the address space returned by mmap(2) is later on made executable
> >>> with mprotect(2), that triggers an execheap avc.
> >>
> >> ...
> >>
> >>> This has been reproduced on kernels >= 6.6.
> >
> > I try the reproducer, but fails to reproduce on my 6.6
>
> After enable selinux then trigger the issue.
>
> >>>
> >>> In reviewing the code, my best guess is that this is caused by the
> >>> scenario where brk == start_brk not being handled, though I am not
> >
> > Is there vma_start,end and start_brk,brk infos, adding some debug print.
> >
> > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> > index 81fbfa5b80d4..b66c381c558e 100644
> > --- a/security/selinux/hooks.c
> > +++ b/security/selinux/hooks.c
> > @@ -3848,6 +3848,9 @@ static int selinux_file_mprotect(struct
> > vm_area_struct *vma,
> >                  if (vma_is_initial_heap(vma)) {
> >                          rc = avc_has_perm(sid, sid, SECCLASS_PROCESS,
> >                                            PROCESS__EXECHEAP, NULL);
> > +                       if (rc)
> > +                               pr_err("DEBUG: vma_start,end=%lx,%lx,
> > start_brk,brk=%lx,%lx\n",
> > +                                       vma->vm_start, vma->vm_end,
> > vma->vm_mm->start_brk, vma->vm_mm->brk);
> >                  } else if (!vma->vm_file && (vma_is_initial_stack(vma) ||
> >                              vma_is_stack_for_current(vma))) {
> >                          rc = avc_has_perm(sid, sid, SECCLASS_PROCESS,
>
>
> [ 3520.913952] DEBUG: pid=769674,comm=test-vma,
> vma:start,end=1e9c000,21e9c000, start_brk,brk=2310000,2310000
>
>
> vma_start <  start_brk = brk  < vma_end
>
> >
> >>> expert enough in kernel code to know. If the start address allocated
> >>> by mmap is before the starting program break, and the end address is
> >>> after the starting program break, then the avc will trigger. However,
> >>> I don't know how mmap deals with determining an address and if it takes
> >>> into account the program break, or if calling brk(2) later on will just
> >>> pick a new location.
> >>
> >> I'm not a mm expert, but thankfully we have some on the To/CC line so
> >> I'm hopeful they will be able to take a look and provide some insight.
> >
> > +Cc mmap maintainers too.
> >
> >>
> >> To bring the relevant code into this email, prior to using the
> >> vma_is_initial_heap() helper the SELinux execheap logic looked like
> >> this:
> >>
> >>    /* WORKING */
> >>    if (vma->vm_start >= vma->vm_mm->start_brk &&
> >>        vma->vm_end <= vma->vm_mm->brk)
> >>      /* execheap denial */
> >>
>
> vma range must be within heap.
>
> >> ... while the current vma_is_initial_heap() helper has logic that
> >> looks like this:
> >>
> >>    /* BUGGY */
> >>    if (vma->vm_start < vma->vm_mm->brk &&
> >>        vma->vm_end > vma->vm_mm->start_brk)
> >>      /* execheap denial */
>
> vma range intersection with brk range will be rejected, and
> there's a cross in above case, so execheap denial, I not sure it
> should be allowed or not, hope maintainers give some comments here.

It's a kernel regression so we need to revert the change to SELinux at
least - can't start denying permission without a new/updated policy
(e.g. making the change in the check conditional on a new policy
capability).
Kefeng Wang Aug. 8, 2024, 1:12 p.m. UTC | #12
On 2024/8/8 19:41, Stephen Smalley wrote:
> On Thu, Aug 8, 2024 at 7:09 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>>
>>
>>
>> On 2024/8/8 14:43, Kefeng Wang wrote:
>>>
>>> On 2024/8/8 9:10, Paul Moore wrote:
>>>> On Wed, Aug 7, 2024 at 5:26 PM Marc Reisner <reisner.marc@gmail.com>
>>>> wrote:
>>>>>
>>>>> It looks like this issue is still not fixed. There has been some
>>>>> investigation on going in this Bugzilla for Fedora:
>>>>>
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=2254434
>>>>
>>>> FWIW, I recently learned there was also a report in the kernel
>>>> bugzilla about the same issue:
>>>>
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=218258
>>>>
>>>>> The behavior we are seeing is that when a process has no heap and
>>>>> mmap(2) is called with MAP_PRIVATE | MAP_ANONYMOUS, it allocates memory
>>>>> on the heap.
>>>>>
>>>>> If the address space returned by mmap(2) is later on made executable
>>>>> with mprotect(2), that triggers an execheap avc.
>>>>
>>>> ...
>>>>
>>>>> This has been reproduced on kernels >= 6.6.
>>>
>>> I try the reproducer, but fails to reproduce on my 6.6
>>
>> After enable selinux then trigger the issue.
>>
>>>>>
>>>>> In reviewing the code, my best guess is that this is caused by the
>>>>> scenario where brk == start_brk not being handled, though I am not
>>>
>>> Is there vma_start,end and start_brk,brk infos, adding some debug print.
>>>
>>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>>> index 81fbfa5b80d4..b66c381c558e 100644
>>> --- a/security/selinux/hooks.c
>>> +++ b/security/selinux/hooks.c
>>> @@ -3848,6 +3848,9 @@ static int selinux_file_mprotect(struct
>>> vm_area_struct *vma,
>>>                   if (vma_is_initial_heap(vma)) {
>>>                           rc = avc_has_perm(sid, sid, SECCLASS_PROCESS,
>>>                                             PROCESS__EXECHEAP, NULL);
>>> +                       if (rc)
>>> +                               pr_err("DEBUG: vma_start,end=%lx,%lx,
>>> start_brk,brk=%lx,%lx\n",
>>> +                                       vma->vm_start, vma->vm_end,
>>> vma->vm_mm->start_brk, vma->vm_mm->brk);
>>>                   } else if (!vma->vm_file && (vma_is_initial_stack(vma) ||
>>>                               vma_is_stack_for_current(vma))) {
>>>                           rc = avc_has_perm(sid, sid, SECCLASS_PROCESS,
>>
>>
>> [ 3520.913952] DEBUG: pid=769674,comm=test-vma,
>> vma:start,end=1e9c000,21e9c000, start_brk,brk=2310000,2310000
>>
>>
>> vma_start <  start_brk = brk  < vma_end
>>
>>>
>>>>> expert enough in kernel code to know. If the start address allocated
>>>>> by mmap is before the starting program break, and the end address is
>>>>> after the starting program break, then the avc will trigger. However,
>>>>> I don't know how mmap deals with determining an address and if it takes
>>>>> into account the program break, or if calling brk(2) later on will just
>>>>> pick a new location.
>>>>
>>>> I'm not a mm expert, but thankfully we have some on the To/CC line so
>>>> I'm hopeful they will be able to take a look and provide some insight.
>>>
>>> +Cc mmap maintainers too.
>>>
>>>>
>>>> To bring the relevant code into this email, prior to using the
>>>> vma_is_initial_heap() helper the SELinux execheap logic looked like
>>>> this:
>>>>
>>>>     /* WORKING */
>>>>     if (vma->vm_start >= vma->vm_mm->start_brk &&
>>>>         vma->vm_end <= vma->vm_mm->brk)
>>>>       /* execheap denial */
>>>>
>>
>> vma range must be within heap.
>>
>>>> ... while the current vma_is_initial_heap() helper has logic that
>>>> looks like this:
>>>>
>>>>     /* BUGGY */
>>>>     if (vma->vm_start < vma->vm_mm->brk &&
>>>>         vma->vm_end > vma->vm_mm->start_brk)
>>>>       /* execheap denial */
>>
>> vma range intersection with brk range will be rejected, and
>> there's a cross in above case, so execheap denial, I not sure it
>> should be allowed or not, hope maintainers give some comments here.
> 
> It's a kernel regression so we need to revert the change to SELinux at
> least - can't start denying permission without a new/updated policy
> (e.g. making the change in the check conditional on a new policy
> capability).
> 

OK,revert patch is sent, but I am also curious about it.

https://lore.kernel.org/all/20240808130909.1027860-1-wangkefeng.wang@huawei.com/
Marc Reisner Aug. 8, 2024, 3:03 p.m. UTC | #13
On Thu, Aug 08, 2024 at 09:12:59PM +0800, Kefeng Wang wrote:
>
> OK,revert patch is sent, but I am also curious about it.
>
> https://lore.kernel.org/all/20240808130909.1027860-1-wangkefeng.wang@huawei.com/

I am also curious. It seems like the "real" fix would be in mmap - my
understanding is that it should not intersect with heap, even when heap
is empty (start_brk == brk).

It looks like start_brk is fixed in place when the ELF is
loaded in fs/binfmt_elf.c:load_elf_binary (line 1288).

        if ((current->flags & PF_RANDOMIZE) && (snapshot_randomize_va_space > 1)) {
                /*
                 * For architectures with ELF randomization, when executing
                 * a loader directly (i.e. no interpreter listed in ELF
                 * headers), move the brk area out of the mmap region
                 * (since it grows up, and may collide early with the stack
                 * growing down), and into the unused ELF_ET_DYN_BASE region.
                 */
                if (IS_ENABLED(CONFIG_ARCH_HAS_ELF_RANDOMIZE) &&
                    elf_ex->e_type == ET_DYN && !interpreter) {
                        mm->brk = mm->start_brk = ELF_ET_DYN_BASE;
                } else {
                        /* Otherwise leave a gap between .bss and brk. */
                        mm->brk = mm->start_brk = mm->brk + PAGE_SIZE;
                }

                mm->brk = mm->start_brk = arch_randomize_brk(mm);
#ifdef compat_brk_randomized
                current->brk_randomized = 1;
#endif
        }
Liam R. Howlett Aug. 8, 2024, 6 p.m. UTC | #14
* Marc Reisner <reisner.marc@gmail.com> [240808 11:03]:
> On Thu, Aug 08, 2024 at 09:12:59PM +0800, Kefeng Wang wrote:
> >
> > OK,revert patch is sent, but I am also curious about it.
> >
> > https://lore.kernel.org/all/20240808130909.1027860-1-wangkefeng.wang@huawei.com/
> 
> I am also curious. It seems like the "real" fix would be in mmap - my
> understanding is that it should not intersect with heap, even when heap
> is empty (start_brk == brk).
> 
> It looks like start_brk is fixed in place when the ELF is
> loaded in fs/binfmt_elf.c:load_elf_binary (line 1288).
> 
>         if ((current->flags & PF_RANDOMIZE) && (snapshot_randomize_va_space > 1)) {
>                 /*
>                  * For architectures with ELF randomization, when executing
>                  * a loader directly (i.e. no interpreter listed in ELF
>                  * headers), move the brk area out of the mmap region
>                  * (since it grows up, and may collide early with the stack
>                  * growing down), and into the unused ELF_ET_DYN_BASE region.
>                  */
>                 if (IS_ENABLED(CONFIG_ARCH_HAS_ELF_RANDOMIZE) &&
>                     elf_ex->e_type == ET_DYN && !interpreter) {
>                         mm->brk = mm->start_brk = ELF_ET_DYN_BASE;
>                 } else {
>                         /* Otherwise leave a gap between .bss and brk. */
>                         mm->brk = mm->start_brk = mm->brk + PAGE_SIZE;
>                 }
> 
>                 mm->brk = mm->start_brk = arch_randomize_brk(mm);
> #ifdef compat_brk_randomized
>                 current->brk_randomized = 1;
> #endif
>         }


Have a look at the mmapstress 3 test in ltp [1].  The tests pokes holes
and mmaps into those holes throughout the brk range.

[1]. https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/mmapstress/mmapstress03.c
Marc Reisner Aug. 8, 2024, 7:35 p.m. UTC | #15
On Thu, Aug 08, 2024 at 02:00:09PM -0400, Liam R. Howlett wrote:
> Have a look at the mmapstress 3 test in ltp [1].  The tests pokes holes
> and mmaps into those holes throughout the brk range.
> 
> [1]. https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/mmapstress/mmapstress03.c

In investigating this further, with additional reproducers, I believe
that the whole bug is in vma_is_initial_heap(). Here is what I have
tested so far:

1. Use only sbrk to allocate heap memory - sbrk(0) returns the start_brk
before calling sbrk(increment). Afterwards, sbrk(0) returns start_brk +
increment.
2. Use sbrk(0) to obtain start_brk, then request 512 MB of address space
from mmap starting at start_brk. mmap allocates 512 MB of address space
starting at start_brk and ending at start_brk + 0x20000000. sbrk(0)
still returns start_brk. However, /proc/PID/maps flags the mmapped
address space with "[heap]"
3. Use sbrk(0) to obtain start_brk, then request 512 MB of address space
from mmap starting at start_brk + _SC_PAGESIZE. mmap allocates 512 MB of
address space starting at start_brk + _SC_PAGESIZE and ending at
start_brk + _SC_PAGESIZE + 0x20000000. sbrk(0) still returns start_brk,
and /proc/PID/maps does NOT flag the mmapped address space with
"[heap]".

I believe that the entire bug may reside in vma_is_initial_heap because
/proc/PID/maps also uses vma_is_initial_heap to flag entries with
"[heap]" [1]. Also, sbrk(0) is not actually getting updated after a call
to mmap, so mmap is not actually allocating heap memory.

What do you all think about this patch? If it doesn't have any obvious
flaws I can submit it (along with a revert for the revert).

diff --git a/include/linux/mm.h b/include/linux/mm.h
index c4b238a20b76..1dd588833af8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -918,7 +918,8 @@ static inline bool vma_is_anonymous(struct vm_area_struct *vma)
  */
 static inline bool vma_is_initial_heap(const struct vm_area_struct *vma)
 {
-       return vma->vm_start < vma->vm_mm->brk &&
+       return vma->vm_mm->brk != vma->vm_mm->start_brk &&
+               vma->vm_start < vma->vm_mm->brk &&
                vma->vm_end > vma->vm_mm->start_brk;
 }

--


[1]. https://github.com/torvalds/linux/blob/6a0e38264012809afa24113ee2162dc07f4ed22b/fs/proc/task_mmu.c#L287
Paul Moore Aug. 8, 2024, 8:40 p.m. UTC | #16
On Thu, Aug 8, 2024 at 3:35 PM Marc Reisner <reisner.marc@gmail.com> wrote:
> On Thu, Aug 08, 2024 at 02:00:09PM -0400, Liam R. Howlett wrote:
> > Have a look at the mmapstress 3 test in ltp [1].  The tests pokes holes
> > and mmaps into those holes throughout the brk range.
> >
> > [1]. https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/mmapstress/mmapstress03.c
>
> In investigating this further, with additional reproducers, I believe
> that the whole bug is in vma_is_initial_heap().

That's my feeling at this point too.  Unfortunately, there are a few
callers other than SELinux so I don't want to change the helper
function without an explicit ACK from the mm folks and I think now
that we understand the problem we want to get this fixed ASAP in
Linus' tree (and get it marked for -stable).

I just posted a patch that reverts just our use of
vma_is_initial_heap() in favor of our old logic and adds a few lines
of comments about the problem with vma_is_initial_heap().  I'm okay
with moving back to vma_is_initial_heap() when it's fixed, but I'd
prefer it to be fixed for a while before we transition back to it.
We've gotten burned twice now with vma_is_initial_heap() so I'm going
to be a little extra cautious here.

https://lore.kernel.org/selinux/20240808203353.202352-2-paul@paul-moore.com

> What do you all think about this patch? If it doesn't have any obvious
> flaws I can submit it (along with a revert for the revert).

I'll leave the mm folks to weigh in on the fix to
vma_is_initial_heap(), but as I said above, please don't submit a
patch to SELinux right now, I want the fixed version of
vma_is_initial_heap() to "soak" for a bit before we go back to it.
diff mbox series

Patch

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index c87b79a29fad..ac582c046c51 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3800,13 +3800,10 @@  static int selinux_file_mprotect(struct
vm_area_struct *vma,
        if (default_noexec &&
            (prot & PROT_EXEC) && !(vma->vm_flags & VM_EXEC)) {
                int rc = 0;
-               if (vma->vm_start >= vma->vm_mm->start_brk &&
-                   vma->vm_end <= vma->vm_mm->brk) {
+               if (vma_is_initial_heap(vma)) {
                        rc = avc_has_perm(sid, sid, SECCLASS_PROCESS,
                                          PROCESS__EXECHEAP, NULL);
-               } else if (!vma->vm_file &&
-                          ((vma->vm_start <= vma->vm_mm->start_stack &&
-                            vma->vm_end >= vma->vm_mm->start_stack) ||
+               } else if (!vma->vm_file && (vma_is_initial_stack(vma) ||
                            vma_is_stack_for_current(vma))) {
                        rc = avc_has_perm(sid, sid, SECCLASS_PROCESS,