diff mbox series

[RFC,RESEND,v2,1/2] mm/memfd: Add support for F_SEAL_FUTURE_EXEC to memfd

Message ID 20250102233255.1180524-2-isaacmanjarres@google.com (mailing list archive)
State New
Headers show
Series Add file seal to prevent future exec mappings | expand

Commit Message

Isaac Manjarres Jan. 2, 2025, 11:32 p.m. UTC
Android currently uses the ashmem driver [1] for creating shared memory
regions between processes. Ashmem buffers can initially be mapped with
PROT_READ, PROT_WRITE, and PROT_EXEC. Processes can then use the
ASHMEM_SET_PROT_MASK ioctl command to restrict--never add--the
permissions that the buffer can be mapped with.

Processes can remove the ability to map ashmem buffers as executable to
ensure that those buffers cannot be exploited to run unintended code.

For instance, suppose process A allocates a memfd that is meant to be
read and written by itself and another process, call it B.

Process A shares the buffer with process B, but process B injects code
into the buffer, and compromises process A, such that it makes A map
the buffer with PROT_EXEC. This provides an opportunity for process A
to run the code that process B injected into the buffer.

If process A had the ability to seal the buffer against future
executable mappings before sharing the buffer with process B, this
attack would not be possible.

Android is currently trying to replace ashmem with memfd. However, memfd
does not have a provision to permanently remove the ability to map a
buffer as executable, and leaves itself open to the type of attack
described earlier. However, this should be something that can be
achieved via a new file seal.

There are known usecases (e.g. CursorWindow [2]) where a process
maps a buffer with read/write permissions before restricting the buffer
to being mapped as read-only for future mappings.

The resulting VMA from the writable mapping has VM_MAYEXEC set, meaning
that mprotect() can change the mapping to be executable. Therefore,
implementing the seal similar to F_SEAL_WRITE would not be appropriate,
since it would not work with the CursorWindow usecase. This is because
the CursorWindow process restricts the mapping permissions to read-only
after the writable mapping is created. So, adding a file seal for
executable mappings that operates like F_SEAL_WRITE would fail.

Therefore, add support for F_SEAL_FUTURE_EXEC, which is handled
similarly to F_SEAL_FUTURE_WRITE. This ensures that CursorWindow can
continue to create a writable mapping initially, and then restrict the
permissions on the buffer to be mappable as read-only by using both
F_SEAL_FUTURE_WRITE and F_SEAL_FUTURE_EXEC. After the seal is
applied, any calls to mmap() with PROT_EXEC will fail.

[1] https://cs.android.com/android/kernel/superproject/+/common-android-mainline:common/drivers/staging/android/ashmem.c
[2] https://developer.android.com/reference/android/database/CursorWindow

Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
---
 include/uapi/linux/fcntl.h |  1 +
 mm/memfd.c                 | 39 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 39 insertions(+), 1 deletion(-)

Comments

Jann Horn Jan. 3, 2025, 3:03 p.m. UTC | #1
On Fri, Jan 3, 2025 at 12:32 AM Isaac J. Manjarres
<isaacmanjarres@google.com> wrote:
> Android currently uses the ashmem driver [1] for creating shared memory
> regions between processes. Ashmem buffers can initially be mapped with
> PROT_READ, PROT_WRITE, and PROT_EXEC. Processes can then use the
> ASHMEM_SET_PROT_MASK ioctl command to restrict--never add--the
> permissions that the buffer can be mapped with.
>
> Processes can remove the ability to map ashmem buffers as executable to
> ensure that those buffers cannot be exploited to run unintended code.

Is there really code out there that first maps an ashmem buffer with
PROT_EXEC, then uses the ioctl to remove execute permission for future
mappings? I don't see why anyone would do that.

> For instance, suppose process A allocates a memfd that is meant to be
> read and written by itself and another process, call it B.
>
> Process A shares the buffer with process B, but process B injects code
> into the buffer, and compromises process A, such that it makes A map
> the buffer with PROT_EXEC. This provides an opportunity for process A
> to run the code that process B injected into the buffer.
>
> If process A had the ability to seal the buffer against future
> executable mappings before sharing the buffer with process B, this
> attack would not be possible.

I think if you want to enforce such restrictions in a scenario where
the attacker can already make the target process perform
semi-arbitrary syscalls, it would probably be more reliable to enforce
rules on executable mappings with something like SELinux policy and/or
F_SEAL_EXEC.

> Android is currently trying to replace ashmem with memfd. However, memfd
> does not have a provision to permanently remove the ability to map a
> buffer as executable, and leaves itself open to the type of attack
> described earlier. However, this should be something that can be
> achieved via a new file seal.
>
> There are known usecases (e.g. CursorWindow [2]) where a process
> maps a buffer with read/write permissions before restricting the buffer
> to being mapped as read-only for future mappings.

Here you're talking about write permission, but the patch is about
execute permission?

> The resulting VMA from the writable mapping has VM_MAYEXEC set, meaning
> that mprotect() can change the mapping to be executable. Therefore,
> implementing the seal similar to F_SEAL_WRITE would not be appropriate,
> since it would not work with the CursorWindow usecase. This is because
> the CursorWindow process restricts the mapping permissions to read-only
> after the writable mapping is created. So, adding a file seal for
> executable mappings that operates like F_SEAL_WRITE would fail.
>
> Therefore, add support for F_SEAL_FUTURE_EXEC, which is handled
> similarly to F_SEAL_FUTURE_WRITE. This ensures that CursorWindow can
> continue to create a writable mapping initially, and then restrict the
> permissions on the buffer to be mappable as read-only by using both
> F_SEAL_FUTURE_WRITE and F_SEAL_FUTURE_EXEC. After the seal is
> applied, any calls to mmap() with PROT_EXEC will fail.
>
> [1] https://cs.android.com/android/kernel/superproject/+/common-android-mainline:common/drivers/staging/android/ashmem.c
> [2] https://developer.android.com/reference/android/database/CursorWindow
>
> Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
> ---
>  include/uapi/linux/fcntl.h |  1 +
>  mm/memfd.c                 | 39 +++++++++++++++++++++++++++++++++++++-
>  2 files changed, 39 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> index 6e6907e63bfc..ef066e524777 100644
> --- a/include/uapi/linux/fcntl.h
> +++ b/include/uapi/linux/fcntl.h
> @@ -49,6 +49,7 @@
>  #define F_SEAL_WRITE   0x0008  /* prevent writes */
>  #define F_SEAL_FUTURE_WRITE    0x0010  /* prevent future writes while mapped */
>  #define F_SEAL_EXEC    0x0020  /* prevent chmod modifying exec bits */
> +#define F_SEAL_FUTURE_EXEC     0x0040 /* prevent future executable mappings */
>  /* (1U << 31) is reserved for signed error codes */
>
>  /*
> diff --git a/mm/memfd.c b/mm/memfd.c
> index 5f5a23c9051d..cfd62454df5e 100644
> --- a/mm/memfd.c
> +++ b/mm/memfd.c
> @@ -184,6 +184,7 @@ static unsigned int *memfd_file_seals_ptr(struct file *file)
>  }
>
>  #define F_ALL_SEALS (F_SEAL_SEAL | \
> +                    F_SEAL_FUTURE_EXEC |\
>                      F_SEAL_EXEC | \
>                      F_SEAL_SHRINK | \
>                      F_SEAL_GROW | \
> @@ -357,14 +358,50 @@ static int check_write_seal(unsigned long *vm_flags_ptr)
>         return 0;
>  }
>
> +static inline bool is_exec_sealed(unsigned int seals)
> +{
> +       return seals & F_SEAL_FUTURE_EXEC;
> +}
> +
> +static int check_exec_seal(unsigned long *vm_flags_ptr)
> +{
> +       unsigned long vm_flags = *vm_flags_ptr;
> +       unsigned long mask = vm_flags & (VM_SHARED | VM_EXEC);
> +
> +       /* Executability is not a concern for private mappings. */
> +       if (!(mask & VM_SHARED))
> +               return 0;

Why is it not a concern for private mappings?

> +       /*
> +        * New PROT_EXEC and MAP_SHARED mmaps are not allowed when exec seal
> +        * is active.
> +        */
> +       if (mask & VM_EXEC)
> +               return -EPERM;
> +
> +       /*
> +        * Prevent mprotect() from making an exec-sealed mapping executable in
> +        * the future.
> +        */
> +       *vm_flags_ptr &= ~VM_MAYEXEC;
> +
> +       return 0;
> +}
> +
>  int memfd_check_seals_mmap(struct file *file, unsigned long *vm_flags_ptr)
>  {
>         int err = 0;
>         unsigned int *seals_ptr = memfd_file_seals_ptr(file);
>         unsigned int seals = seals_ptr ? *seals_ptr : 0;
>
> -       if (is_write_sealed(seals))
> +       if (is_write_sealed(seals)) {
>                 err = check_write_seal(vm_flags_ptr);
> +               if (err)
> +                       return err;
> +       }
> +
> +       if (is_exec_sealed(seals))
> +               err = check_exec_seal(vm_flags_ptr);
>
>         return err;
>  }
> --
> 2.47.1.613.gc27f4b7a9f-goog
>
>
>
Jeff Xu Jan. 6, 2025, 5:35 p.m. UTC | #2
+ Kees because this is related to W^X memfd and security.

On Fri, Jan 3, 2025 at 7:04 AM Jann Horn <jannh@google.com> wrote:
>
> On Fri, Jan 3, 2025 at 12:32 AM Isaac J. Manjarres
> <isaacmanjarres@google.com> wrote:
> > Android currently uses the ashmem driver [1] for creating shared memory
> > regions between processes. Ashmem buffers can initially be mapped with
> > PROT_READ, PROT_WRITE, and PROT_EXEC. Processes can then use the
> > ASHMEM_SET_PROT_MASK ioctl command to restrict--never add--the
> > permissions that the buffer can be mapped with.
> >
> > Processes can remove the ability to map ashmem buffers as executable to
> > ensure that those buffers cannot be exploited to run unintended code.
>
> Is there really code out there that first maps an ashmem buffer with
> PROT_EXEC, then uses the ioctl to remove execute permission for future
> mappings? I don't see why anyone would do that.
>
> > For instance, suppose process A allocates a memfd that is meant to be
> > read and written by itself and another process, call it B.
> >
> > Process A shares the buffer with process B, but process B injects code
> > into the buffer, and compromises process A, such that it makes A map
> > the buffer with PROT_EXEC. This provides an opportunity for process A
> > to run the code that process B injected into the buffer.
> >
> > If process A had the ability to seal the buffer against future
> > executable mappings before sharing the buffer with process B, this
> > attack would not be possible.
>
> I think if you want to enforce such restrictions in a scenario where
> the attacker can already make the target process perform
> semi-arbitrary syscalls, it would probably be more reliable to enforce
> rules on executable mappings with something like SELinux policy and/or
> F_SEAL_EXEC.
>
I would like to second on the suggestion of  making this as part of F_SEAL_EXEC.

> > Android is currently trying to replace ashmem with memfd. However, memfd
> > does not have a provision to permanently remove the ability to map a
> > buffer as executable, and leaves itself open to the type of attack
> > described earlier. However, this should be something that can be
> > achieved via a new file seal.
> >
> > There are known usecases (e.g. CursorWindow [2]) where a process
> > maps a buffer with read/write permissions before restricting the buffer
> > to being mapped as read-only for future mappings.
>
> Here you're talking about write permission, but the patch is about
> execute permission?
>
> > The resulting VMA from the writable mapping has VM_MAYEXEC set, meaning
> > that mprotect() can change the mapping to be executable. Therefore,
> > implementing the seal similar to F_SEAL_WRITE would not be appropriate,
> > since it would not work with the CursorWindow usecase. This is because
> > the CursorWindow process restricts the mapping permissions to read-only
> > after the writable mapping is created. So, adding a file seal for
> > executable mappings that operates like F_SEAL_WRITE would fail.
> >
> > Therefore, add support for F_SEAL_FUTURE_EXEC, which is handled
> > similarly to F_SEAL_FUTURE_WRITE. This ensures that CursorWindow can
> > continue to create a writable mapping initially, and then restrict the
> > permissions on the buffer to be mappable as read-only by using both
> > F_SEAL_FUTURE_WRITE and F_SEAL_FUTURE_EXEC. After the seal is
> > applied, any calls to mmap() with PROT_EXEC will fail.
> >
> > [1] https://cs.android.com/android/kernel/superproject/+/common-android-mainline:common/drivers/staging/android/ashmem.c
> > [2] https://developer.android.com/reference/android/database/CursorWindow
> >
> > Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
> > ---
> >  include/uapi/linux/fcntl.h |  1 +
> >  mm/memfd.c                 | 39 +++++++++++++++++++++++++++++++++++++-
> >  2 files changed, 39 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > index 6e6907e63bfc..ef066e524777 100644
> > --- a/include/uapi/linux/fcntl.h
> > +++ b/include/uapi/linux/fcntl.h
> > @@ -49,6 +49,7 @@
> >  #define F_SEAL_WRITE   0x0008  /* prevent writes */
> >  #define F_SEAL_FUTURE_WRITE    0x0010  /* prevent future writes while mapped */
> >  #define F_SEAL_EXEC    0x0020  /* prevent chmod modifying exec bits */
> > +#define F_SEAL_FUTURE_EXEC     0x0040 /* prevent future executable mappings */
> >  /* (1U << 31) is reserved for signed error codes */
> >
> >  /*
> > diff --git a/mm/memfd.c b/mm/memfd.c
> > index 5f5a23c9051d..cfd62454df5e 100644
> > --- a/mm/memfd.c
> > +++ b/mm/memfd.c
> > @@ -184,6 +184,7 @@ static unsigned int *memfd_file_seals_ptr(struct file *file)
> >  }
> >
> >  #define F_ALL_SEALS (F_SEAL_SEAL | \
> > +                    F_SEAL_FUTURE_EXEC |\
> >                      F_SEAL_EXEC | \
> >                      F_SEAL_SHRINK | \
> >                      F_SEAL_GROW | \
> > @@ -357,14 +358,50 @@ static int check_write_seal(unsigned long *vm_flags_ptr)
> >         return 0;
> >  }
> >
> > +static inline bool is_exec_sealed(unsigned int seals)
> > +{
> > +       return seals & F_SEAL_FUTURE_EXEC;
> > +}
> > +
> > +static int check_exec_seal(unsigned long *vm_flags_ptr)
> > +{
> > +       unsigned long vm_flags = *vm_flags_ptr;
> > +       unsigned long mask = vm_flags & (VM_SHARED | VM_EXEC);
> > +
> > +       /* Executability is not a concern for private mappings. */
> > +       if (!(mask & VM_SHARED))
> > +               return 0;
>
> Why is it not a concern for private mappings?
>
> > +       /*
> > +        * New PROT_EXEC and MAP_SHARED mmaps are not allowed when exec seal
> > +        * is active.
> > +        */
> > +       if (mask & VM_EXEC)
> > +               return -EPERM;
> > +
> > +       /*
> > +        * Prevent mprotect() from making an exec-sealed mapping executable in
> > +        * the future.
> > +        */
> > +       *vm_flags_ptr &= ~VM_MAYEXEC;
> > +
> > +       return 0;
> > +}
> > +
> >  int memfd_check_seals_mmap(struct file *file, unsigned long *vm_flags_ptr)
> >  {
> >         int err = 0;
> >         unsigned int *seals_ptr = memfd_file_seals_ptr(file);
> >         unsigned int seals = seals_ptr ? *seals_ptr : 0;
> >
> > -       if (is_write_sealed(seals))
> > +       if (is_write_sealed(seals)) {
> >                 err = check_write_seal(vm_flags_ptr);
> > +               if (err)
> > +                       return err;
> > +       }
> > +
> > +       if (is_exec_sealed(seals))
> > +               err = check_exec_seal(vm_flags_ptr);
> >
memfd_check_seals_mmap is only for mmap() path, right ?

How about the mprotect()  path ? i.e.  An attacker can first create a
RW VMA mapping for the memfd and later mprotect the VMA to be
executable.

Similar to the check_write_seal call , we might want to block mprotect
for write seal as well.

> >         return err;
> >  }
> > --
> > 2.47.1.613.gc27f4b7a9f-goog
> >
> >
> >
>
Isaac Manjarres Jan. 7, 2025, 1:14 a.m. UTC | #3
On Fri, Jan 03, 2025 at 04:03:44PM +0100, Jann Horn wrote:
> On Fri, Jan 3, 2025 at 12:32 AM Isaac J. Manjarres
> <isaacmanjarres@google.com> wrote:
> > Android currently uses the ashmem driver [1] for creating shared memory
> > regions between processes. Ashmem buffers can initially be mapped with
> > PROT_READ, PROT_WRITE, and PROT_EXEC. Processes can then use the
> > ASHMEM_SET_PROT_MASK ioctl command to restrict--never add--the
> > permissions that the buffer can be mapped with.
> >
> > Processes can remove the ability to map ashmem buffers as executable to
> > ensure that those buffers cannot be exploited to run unintended code.
> 
> Is there really code out there that first maps an ashmem buffer with
> PROT_EXEC, then uses the ioctl to remove execute permission for future
> mappings? I don't see why anyone would do that.

Hi Jann,

Thanks for your feedback and for taking the time to review these
patches!

Not that I'm aware of. The reason why I made this seal have semantics
where it prevents future executable mappings is because there are
existing applications that allocate an ashmem buffer (default
permissions are RWX), map the buffer as RW, and then restrict
the permissions to just R.

When the buffer is mapped as RW, do_mmap() unconditionally sets
VM_MAYEXEC on the VMA for the mapping, which means that the mapping
could later be mapped as executable via mprotect(). Therefore, having
the semantics of the seal be that it prevents any executable mappings
would break existing code that has already been released. It would
make transitioning clients to memfd difficult, because to amend that,
the ashmem users would have to first restrict the permissions of the
buffer to be RW, then map it as RW, and then restrict the permissions
again to be just R, which also means an additional system call.

> > For instance, suppose process A allocates a memfd that is meant to be
> > read and written by itself and another process, call it B.
> >
> > Process A shares the buffer with process B, but process B injects code
> > into the buffer, and compromises process A, such that it makes A map
> > the buffer with PROT_EXEC. This provides an opportunity for process A
> > to run the code that process B injected into the buffer.
> >
> > If process A had the ability to seal the buffer against future
> > executable mappings before sharing the buffer with process B, this
> > attack would not be possible.
> 
> I think if you want to enforce such restrictions in a scenario where
> the attacker can already make the target process perform
> semi-arbitrary syscalls, it would probably be more reliable to enforce
> rules on executable mappings with something like SELinux policy and/or
> F_SEAL_EXEC.
>

For SELinux policy, do you mean to not allow execmem permissions? What
about scenarios where a process wants to use JIT compilation, but
doesn't want memfd data buffers to be executable? My thought was to use
this new seal to have a finer granularity to control what buffers can
be mapped as executable. If not, could you please clarify?

Also, F_SEAL_EXEC just seals the memfd's current executable permissions,
and doesn't affect the mapping permissions at all. Are you saying that
F_SEAL_EXEC should be extended to cover mappings as well? If so, it is
not clear to me what the semantics of that would be.

For instance, if a memfd is non-executable and F_SEAL_EXEC is applied,
we can also prevent any executable mappings at that point. I'm not sure
if that's the right thing to do though. For instance, there are shared
object files that don't have executable permissions, but their code
sections should be mapped as executable. So, drawing from that, I'm not
sure if it makes sense to tie the file execution permissions to the
mapping permissions.

There's also the case where F_SEAL_EXEC is invoked on an executable
memfd. In that case, there doesn't seem to be anything to do from a
mapping perspective since memfds can be mapped as executable by
default?

> > Android is currently trying to replace ashmem with memfd. However, memfd
> > does not have a provision to permanently remove the ability to map a
> > buffer as executable, and leaves itself open to the type of attack
> > described earlier. However, this should be something that can be
> > achieved via a new file seal.
> >
> > There are known usecases (e.g. CursorWindow [2]) where a process
> > maps a buffer with read/write permissions before restricting the buffer
> > to being mapped as read-only for future mappings.
> 
> Here you're talking about write permission, but the patch is about
> execute permission?
> 

Sorry, I used this example about write permission to show why I implemented
the seal with support for preventing future mappings, since the writable
mappings that get created can become executable in the future, as
described later in the commit text.

> > The resulting VMA from the writable mapping has VM_MAYEXEC set, meaning
> > that mprotect() can change the mapping to be executable. Therefore,
> > implementing the seal similar to F_SEAL_WRITE would not be appropriate,
> > since it would not work with the CursorWindow usecase. This is because
> > the CursorWindow process restricts the mapping permissions to read-only
> > after the writable mapping is created. So, adding a file seal for
> > executable mappings that operates like F_SEAL_WRITE would fail.
> >
> > Therefore, add support for F_SEAL_FUTURE_EXEC, which is handled
> > similarly to F_SEAL_FUTURE_WRITE. This ensures that CursorWindow can
> > continue to create a writable mapping initially, and then restrict the
> > permissions on the buffer to be mappable as read-only by using both
> > F_SEAL_FUTURE_WRITE and F_SEAL_FUTURE_EXEC. After the seal is
> > applied, any calls to mmap() with PROT_EXEC will fail.
> >
> > [1] https://cs.android.com/android/kernel/superproject/+/common-android-mainline:common/drivers/staging/android/ashmem.c
> > [2] https://developer.android.com/reference/android/database/CursorWindow
> >
> > Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
> > ---
> >  include/uapi/linux/fcntl.h |  1 +
> >  mm/memfd.c                 | 39 +++++++++++++++++++++++++++++++++++++-
> >  2 files changed, 39 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > index 6e6907e63bfc..ef066e524777 100644
> > --- a/include/uapi/linux/fcntl.h
> > +++ b/include/uapi/linux/fcntl.h
> > @@ -49,6 +49,7 @@
> >  #define F_SEAL_WRITE   0x0008  /* prevent writes */
> >  #define F_SEAL_FUTURE_WRITE    0x0010  /* prevent future writes while mapped */
> >  #define F_SEAL_EXEC    0x0020  /* prevent chmod modifying exec bits */
> > +#define F_SEAL_FUTURE_EXEC     0x0040 /* prevent future executable mappings */
> >  /* (1U << 31) is reserved for signed error codes */
> >
> >  /*
> > diff --git a/mm/memfd.c b/mm/memfd.c
> > index 5f5a23c9051d..cfd62454df5e 100644
> > --- a/mm/memfd.c
> > +++ b/mm/memfd.c
> > @@ -184,6 +184,7 @@ static unsigned int *memfd_file_seals_ptr(struct file *file)
> >  }
> >
> >  #define F_ALL_SEALS (F_SEAL_SEAL | \
> > +                    F_SEAL_FUTURE_EXEC |\
> >                      F_SEAL_EXEC | \
> >                      F_SEAL_SHRINK | \
> >                      F_SEAL_GROW | \
> > @@ -357,14 +358,50 @@ static int check_write_seal(unsigned long *vm_flags_ptr)
> >         return 0;
> >  }
> >
> > +static inline bool is_exec_sealed(unsigned int seals)
> > +{
> > +       return seals & F_SEAL_FUTURE_EXEC;
> > +}
> > +
> > +static int check_exec_seal(unsigned long *vm_flags_ptr)
> > +{
> > +       unsigned long vm_flags = *vm_flags_ptr;
> > +       unsigned long mask = vm_flags & (VM_SHARED | VM_EXEC);
> > +
> > +       /* Executability is not a concern for private mappings. */
> > +       if (!(mask & VM_SHARED))
> > +               return 0;
> 
> Why is it not a concern for private mappings?
>
I didn't consider private mappings since it wasn't clear as to how
they could be a threat to another process. A process can copy the
contents of the buffer into another executable region of memory
and just run it from there? Or are you saying that because it
can do that, is there any value in differentiating between
shared and private mappings?

Thanks,
Isaac
Isaac Manjarres Jan. 7, 2025, 1:26 a.m. UTC | #4
On Mon, Jan 06, 2025 at 09:35:09AM -0800, Jeff Xu wrote:
> + Kees because this is related to W^X memfd and security.
> 
> On Fri, Jan 3, 2025 at 7:04 AM Jann Horn <jannh@google.com> wrote:
> >
> > On Fri, Jan 3, 2025 at 12:32 AM Isaac J. Manjarres
> > <isaacmanjarres@google.com> wrote:
> > > Android currently uses the ashmem driver [1] for creating shared memory
> > > regions between processes. Ashmem buffers can initially be mapped with
> > > PROT_READ, PROT_WRITE, and PROT_EXEC. Processes can then use the
> > > ASHMEM_SET_PROT_MASK ioctl command to restrict--never add--the
> > > permissions that the buffer can be mapped with.
> > >
> > > Processes can remove the ability to map ashmem buffers as executable to
> > > ensure that those buffers cannot be exploited to run unintended code.
> >
> > Is there really code out there that first maps an ashmem buffer with
> > PROT_EXEC, then uses the ioctl to remove execute permission for future
> > mappings? I don't see why anyone would do that.
> >
> > > For instance, suppose process A allocates a memfd that is meant to be
> > > read and written by itself and another process, call it B.
> > >
> > > Process A shares the buffer with process B, but process B injects code
> > > into the buffer, and compromises process A, such that it makes A map
> > > the buffer with PROT_EXEC. This provides an opportunity for process A
> > > to run the code that process B injected into the buffer.
> > >
> > > If process A had the ability to seal the buffer against future
> > > executable mappings before sharing the buffer with process B, this
> > > attack would not be possible.
> >
> > I think if you want to enforce such restrictions in a scenario where
> > the attacker can already make the target process perform
> > semi-arbitrary syscalls, it would probably be more reliable to enforce
> > rules on executable mappings with something like SELinux policy and/or
> > F_SEAL_EXEC.
> >
> I would like to second on the suggestion of  making this as part of F_SEAL_EXEC.

Thanks for taking a look at this patch Jeff! Can you please elaborate
some more on how F_SEAL_EXEC should be extended to restricting executable
mappings?

I understand that if a memfd file is non-executable (either because it
was made non-executable via fchmod() or by being created with
MFD_NOEXEC_SEAL) one could argue that applying F_SEAL_EXEC to that file
would also mean preventing any executable mappings. However, it is not
clear to me if we should tie a file's executable permissions to whether
or not if it can be mapped as executable. For example, shared object
files don't have to have executable permissions, but processes should
be able to map them as executable.

The case where we apply F_SEAL_EXEC on an executable memfd also seems
awkward to me, since memfds can be mapped as executable by default
so what would happen in that scenario?

I also shared the same concerns in my response to Jann in [1].

> > > diff --git a/mm/memfd.c b/mm/memfd.c
> > > index 5f5a23c9051d..cfd62454df5e 100644
> > > --- a/mm/memfd.c
> > > +++ b/mm/memfd.c
> > > @@ -184,6 +184,7 @@ static unsigned int *memfd_file_seals_ptr(struct file *file)
> > >  }
> > >
> > >  #define F_ALL_SEALS (F_SEAL_SEAL | \
> > > +                    F_SEAL_FUTURE_EXEC |\
> > >                      F_SEAL_EXEC | \
> > >                      F_SEAL_SHRINK | \
> > >                      F_SEAL_GROW | \
> > > @@ -357,14 +358,50 @@ static int check_write_seal(unsigned long *vm_flags_ptr)
> > >         return 0;
> > >  }
> > >
> > > +static inline bool is_exec_sealed(unsigned int seals)
> > > +{
> > > +       return seals & F_SEAL_FUTURE_EXEC;
> > > +}
> > > +
> > > +static int check_exec_seal(unsigned long *vm_flags_ptr)
> > > +{
> > > +       unsigned long vm_flags = *vm_flags_ptr;
> > > +       unsigned long mask = vm_flags & (VM_SHARED | VM_EXEC);
> > > +
> > > +       /* Executability is not a concern for private mappings. */
> > > +       if (!(mask & VM_SHARED))
> > > +               return 0;
> >
> > Why is it not a concern for private mappings?
> >
> > > +       /*
> > > +        * New PROT_EXEC and MAP_SHARED mmaps are not allowed when exec seal
> > > +        * is active.
> > > +        */
> > > +       if (mask & VM_EXEC)
> > > +               return -EPERM;
> > > +
> > > +       /*
> > > +        * Prevent mprotect() from making an exec-sealed mapping executable in
> > > +        * the future.
> > > +        */
> > > +       *vm_flags_ptr &= ~VM_MAYEXEC;
> > > +
> > > +       return 0;
> > > +}
> > > +
> > >  int memfd_check_seals_mmap(struct file *file, unsigned long *vm_flags_ptr)
> > >  {
> > >         int err = 0;
> > >         unsigned int *seals_ptr = memfd_file_seals_ptr(file);
> > >         unsigned int seals = seals_ptr ? *seals_ptr : 0;
> > >
> > > -       if (is_write_sealed(seals))
> > > +       if (is_write_sealed(seals)) {
> > >                 err = check_write_seal(vm_flags_ptr);
> > > +               if (err)
> > > +                       return err;
> > > +       }
> > > +
> > > +       if (is_exec_sealed(seals))
> > > +               err = check_exec_seal(vm_flags_ptr);
> > >
> memfd_check_seals_mmap is only for mmap() path, right ?
> 
> How about the mprotect()  path ? i.e.  An attacker can first create a
> RW VMA mapping for the memfd and later mprotect the VMA to be
> executable.
> 
> Similar to the check_write_seal call , we might want to block mprotect
> for write seal as well.
>

So when memfd_check_seals_mmap() is called, if the file is exec_sealed,
check_exec_seal() will not only just check that VM_EXEC is not set,
but it will also clear VM_MAYEXEC, which will prevent the mapping from
being changed to executable via mprotect() later.

[1] https://lore.kernel.org/all/Z3x_8uFn2e0EpDqM@google.com/

Thanks,
Isaac
Jeff Xu Jan. 7, 2025, 5:21 a.m. UTC | #5
On Mon, Jan 6, 2025 at 5:26 PM Isaac Manjarres
<isaacmanjarres@google.com> wrote:
>
> On Mon, Jan 06, 2025 at 09:35:09AM -0800, Jeff Xu wrote:
> > + Kees because this is related to W^X memfd and security.
> >
> > On Fri, Jan 3, 2025 at 7:04 AM Jann Horn <jannh@google.com> wrote:
> > >
> > > On Fri, Jan 3, 2025 at 12:32 AM Isaac J. Manjarres
> > > <isaacmanjarres@google.com> wrote:
> > > > Android currently uses the ashmem driver [1] for creating shared memory
> > > > regions between processes. Ashmem buffers can initially be mapped with
> > > > PROT_READ, PROT_WRITE, and PROT_EXEC. Processes can then use the
> > > > ASHMEM_SET_PROT_MASK ioctl command to restrict--never add--the
> > > > permissions that the buffer can be mapped with.
> > > >
> > > > Processes can remove the ability to map ashmem buffers as executable to
> > > > ensure that those buffers cannot be exploited to run unintended code.
> > >
> > > Is there really code out there that first maps an ashmem buffer with
> > > PROT_EXEC, then uses the ioctl to remove execute permission for future
> > > mappings? I don't see why anyone would do that.
> > >
> > > > For instance, suppose process A allocates a memfd that is meant to be
> > > > read and written by itself and another process, call it B.
> > > >
> > > > Process A shares the buffer with process B, but process B injects code
> > > > into the buffer, and compromises process A, such that it makes A map
> > > > the buffer with PROT_EXEC. This provides an opportunity for process A
> > > > to run the code that process B injected into the buffer.
> > > >
> > > > If process A had the ability to seal the buffer against future
> > > > executable mappings before sharing the buffer with process B, this
> > > > attack would not be possible.
> > >
> > > I think if you want to enforce such restrictions in a scenario where
> > > the attacker can already make the target process perform
> > > semi-arbitrary syscalls, it would probably be more reliable to enforce
> > > rules on executable mappings with something like SELinux policy and/or
> > > F_SEAL_EXEC.
> > >
> > I would like to second on the suggestion of  making this as part of F_SEAL_EXEC.
>
> Thanks for taking a look at this patch Jeff! Can you please elaborate
> some more on how F_SEAL_EXEC should be extended to restricting executable
> mappings?
>
> I understand that if a memfd file is non-executable (either because it
> was made non-executable via fchmod() or by being created with
> MFD_NOEXEC_SEAL) one could argue that applying F_SEAL_EXEC to that file
> would also mean preventing any executable mappings. However, it is not
> clear to me if we should tie a file's executable permissions to whether
> or not if it can be mapped as executable. For example, shared object
> files don't have to have executable permissions, but processes should
> be able to map them as executable.
>
> The case where we apply F_SEAL_EXEC on an executable memfd also seems
> awkward to me, since memfds can be mapped as executable by default
> so what would happen in that scenario?
>
> I also shared the same concerns in my response to Jann in [1].
>
Apology  for not being clear. I meant this below:
when
1> memfd is created with MFD_NOEXEC_SEAL or
2> memfd is no-exec (NX)  and F_SEAL_EXEC is set.
We could also block the memfd from being mapped as executable.

MFD_NOEXEC_SEAL/F_SEAL_EXEC  is added in 6fd7353829ca, which is about
2 years old, I m not sure any application uses the case of creating a
MFD_NOEXEC_SEAL memfd and still wants to mmap it as executable memory,
that is a strange user case.  It is more logical that  applications
want to block both execve() and mmap() for a non-executable memfd.
Therefore I think we could reuse the F_SEAL_EXEC bit + NX state for
this feature, for simplicity.

> > > > diff --git a/mm/memfd.c b/mm/memfd.c
> > > > index 5f5a23c9051d..cfd62454df5e 100644
> > > > --- a/mm/memfd.c
> > > > +++ b/mm/memfd.c
> > > > @@ -184,6 +184,7 @@ static unsigned int *memfd_file_seals_ptr(struct file *file)
> > > >  }
> > > >
> > > >  #define F_ALL_SEALS (F_SEAL_SEAL | \
> > > > +                    F_SEAL_FUTURE_EXEC |\
> > > >                      F_SEAL_EXEC | \
> > > >                      F_SEAL_SHRINK | \
> > > >                      F_SEAL_GROW | \
> > > > @@ -357,14 +358,50 @@ static int check_write_seal(unsigned long *vm_flags_ptr)
> > > >         return 0;
> > > >  }
> > > >
> > > > +static inline bool is_exec_sealed(unsigned int seals)
> > > > +{
> > > > +       return seals & F_SEAL_FUTURE_EXEC;
> > > > +}
> > > > +
> > > > +static int check_exec_seal(unsigned long *vm_flags_ptr)
> > > > +{
> > > > +       unsigned long vm_flags = *vm_flags_ptr;
> > > > +       unsigned long mask = vm_flags & (VM_SHARED | VM_EXEC);
> > > > +
> > > > +       /* Executability is not a concern for private mappings. */
> > > > +       if (!(mask & VM_SHARED))
> > > > +               return 0;
> > >
> > > Why is it not a concern for private mappings?
> > >
> > > > +       /*
> > > > +        * New PROT_EXEC and MAP_SHARED mmaps are not allowed when exec seal
> > > > +        * is active.
> > > > +        */
> > > > +       if (mask & VM_EXEC)
> > > > +               return -EPERM;
> > > > +
> > > > +       /*
> > > > +        * Prevent mprotect() from making an exec-sealed mapping executable in
> > > > +        * the future.
> > > > +        */
> > > > +       *vm_flags_ptr &= ~VM_MAYEXEC;
> > > > +
> > > > +       return 0;
> > > > +}
> > > > +
> > > >  int memfd_check_seals_mmap(struct file *file, unsigned long *vm_flags_ptr)
> > > >  {
> > > >         int err = 0;
> > > >         unsigned int *seals_ptr = memfd_file_seals_ptr(file);
> > > >         unsigned int seals = seals_ptr ? *seals_ptr : 0;
> > > >
> > > > -       if (is_write_sealed(seals))
> > > > +       if (is_write_sealed(seals)) {
> > > >                 err = check_write_seal(vm_flags_ptr);
> > > > +               if (err)
> > > > +                       return err;
> > > > +       }
> > > > +
> > > > +       if (is_exec_sealed(seals))
> > > > +               err = check_exec_seal(vm_flags_ptr);
> > > >
> > memfd_check_seals_mmap is only for mmap() path, right ?
> >
> > How about the mprotect()  path ? i.e.  An attacker can first create a
> > RW VMA mapping for the memfd and later mprotect the VMA to be
> > executable.
> >
> > Similar to the check_write_seal call , we might want to block mprotect
> > for write seal as well.
> >
>
> So when memfd_check_seals_mmap() is called, if the file is exec_sealed,
> check_exec_seal() will not only just check that VM_EXEC is not set,
> but it will also clear VM_MAYEXEC, which will prevent the mapping from
> being changed to executable via mprotect() later.
>
Thanks for clarification.

The name of check_exec_seal() is misleading , check implies a read
operation, but this function actually does update. Maybe renaming to
check_and_update_exec_seal or something like that ?

Do you know which code checks for VM_MAYEXEC flag in the mprotect code
path ?  it isn't obvious to me, i.e. when I grep the VM_MAYEXEC inside
mm path, it only shows one place in mprotect and that doesn't do the
work.

~/mm/mm$ grep VM_MAYEXEC *
mmap.c: mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
mmap.c: vm_flags &= ~VM_MAYEXEC;
mprotect.c: if (rier && (vma->vm_flags & VM_MAYEXEC))
nommu.c: vm_flags |= VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
nommu.c: vm_flags |= VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;

Thanks
-Jeff




> [1] https://lore.kernel.org/all/Z3x_8uFn2e0EpDqM@google.com/
>
> Thanks,
> Isaac
diff mbox series

Patch

diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 6e6907e63bfc..ef066e524777 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -49,6 +49,7 @@ 
 #define F_SEAL_WRITE	0x0008	/* prevent writes */
 #define F_SEAL_FUTURE_WRITE	0x0010  /* prevent future writes while mapped */
 #define F_SEAL_EXEC	0x0020  /* prevent chmod modifying exec bits */
+#define F_SEAL_FUTURE_EXEC	0x0040 /* prevent future executable mappings */
 /* (1U << 31) is reserved for signed error codes */
 
 /*
diff --git a/mm/memfd.c b/mm/memfd.c
index 5f5a23c9051d..cfd62454df5e 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -184,6 +184,7 @@  static unsigned int *memfd_file_seals_ptr(struct file *file)
 }
 
 #define F_ALL_SEALS (F_SEAL_SEAL | \
+		     F_SEAL_FUTURE_EXEC |\
 		     F_SEAL_EXEC | \
 		     F_SEAL_SHRINK | \
 		     F_SEAL_GROW | \
@@ -357,14 +358,50 @@  static int check_write_seal(unsigned long *vm_flags_ptr)
 	return 0;
 }
 
+static inline bool is_exec_sealed(unsigned int seals)
+{
+	return seals & F_SEAL_FUTURE_EXEC;
+}
+
+static int check_exec_seal(unsigned long *vm_flags_ptr)
+{
+	unsigned long vm_flags = *vm_flags_ptr;
+	unsigned long mask = vm_flags & (VM_SHARED | VM_EXEC);
+
+	/* Executability is not a concern for private mappings. */
+	if (!(mask & VM_SHARED))
+		return 0;
+
+	/*
+	 * New PROT_EXEC and MAP_SHARED mmaps are not allowed when exec seal
+	 * is active.
+	 */
+	if (mask & VM_EXEC)
+		return -EPERM;
+
+	/*
+	 * Prevent mprotect() from making an exec-sealed mapping executable in
+	 * the future.
+	 */
+	*vm_flags_ptr &= ~VM_MAYEXEC;
+
+	return 0;
+}
+
 int memfd_check_seals_mmap(struct file *file, unsigned long *vm_flags_ptr)
 {
 	int err = 0;
 	unsigned int *seals_ptr = memfd_file_seals_ptr(file);
 	unsigned int seals = seals_ptr ? *seals_ptr : 0;
 
-	if (is_write_sealed(seals))
+	if (is_write_sealed(seals)) {
 		err = check_write_seal(vm_flags_ptr);
+		if (err)
+			return err;
+	}
+
+	if (is_exec_sealed(seals))
+		err = check_exec_seal(vm_flags_ptr);
 
 	return err;
 }